08.07.2020 13:06, Mikko Perttunen writes:
On 7/7/20 2:06 PM, Dmitry Osipenko wrote:
02.07.2020 15:10, Mikko Perttunen writes:
Ok, so we would have two kinds of syncpoints for the job: one for kernel job tracking, and one that userspace can manipulate as it wants.
Could we handle the job tracking syncpoint completely inside the kernel, i.e. allocate it in the kernel during job submission, and add an increment for it at the end of the job (with condition OP_DONE)? For MLOCKing, the kernel already needs to insert a SYNCPT_INCR(OP_DONE) + WAIT + MLOCK_RELEASE sequence at the end of each job.
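For illustration, the epilogue could look something like the sketch below. All of the opcode encodings, register offsets and the packing of the value words here are made up for the example; they are not the real host1x definitions:

#include <stdint.h>

/* Assumed encodings, for illustration only. */
#define OPCODE_NONINCR(reg, count)  ((2u << 28) | ((reg) << 16) | (count))
#define OPCODE_RELEASE_MLOCK(id)    ((7u << 28) | (id))  /* placeholder */
#define UCLASS_INCR_SYNCPT          0x00  /* assumed register offset */
#define UCLASS_WAIT_SYNCPT          0x08  /* assumed register offset */
#define COND_OP_DONE                1u    /* assumed condition encoding */

/* End-of-job sequence: increment the job-tracking syncpoint once the
 * engine raises OP_DONE, wait for that increment to land, and only
 * then release the MLOCK so the engine can move on to the next job. */
static uint32_t *emit_job_epilogue(uint32_t *pb, uint32_t syncpt,
				   uint32_t threshold, uint32_t mlock)
{
	*pb++ = OPCODE_NONINCR(UCLASS_INCR_SYNCPT, 1);
	*pb++ = (COND_OP_DONE << 8) | syncpt;            /* assumed packing */

	*pb++ = OPCODE_NONINCR(UCLASS_WAIT_SYNCPT, 1);
	*pb++ = (syncpt << 24) | (threshold & 0xffffff); /* assumed packing */

	*pb++ = OPCODE_RELEASE_MLOCK(mlock);
	return pb;
}

(The WAIT in the middle is what guarantees the MLOCK isn't released before the engine has actually finished the job.)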
If the sync point is allocated within the kernel, then we'll always need to patch all of the job's sync point increments with the ID of the allocated sync point, regardless of whether the firewall is enabled.
The idea was that the job tracking increment would also be added to the pushbuffer in the kernel, so gathers would only have increments for the "user syncpoints", if any. I think this should work for THI-based engines (e.g. VIC), you probably have better information about GR2D/GR3D. On newer Tegras we could use CHANNEL/APPID protection to prevent the gathers from incrementing these job tracking syncpoints.
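Roughly like this, again with made-up encodings, just to show the split between the user increments and the kernel-owned tracking increment:

#include <stdint.h>

/* Assumed encodings, for illustration only. */
#define OPCODE_GATHER(count)        ((6u << 28) | (count))
#define OPCODE_NONINCR(reg, count)  ((2u << 28) | ((reg) << 16) | (count))
#define UCLASS_INCR_SYNCPT          0x00  /* assumed register offset */
#define COND_OP_DONE                1u    /* assumed condition encoding */

/* One pushbuffer entry per job: the user-built gather runs first and
 * may only increment user syncpoints; the kernel then appends the
 * job-tracking increment, which userspace never gets to touch. */
static uint32_t *emit_job(uint32_t *pb, uint32_t gather_iova,
			  uint32_t tracking_syncpt)
{
	*pb++ = OPCODE_GATHER(1);
	*pb++ = gather_iova;

	*pb++ = OPCODE_NONINCR(UCLASS_INCR_SYNCPT, 1);
	*pb++ = (COND_OP_DONE << 8) | tracking_syncpt;   /* assumed packing */

	return pb;
}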
Could you please clarify what THI is?
Secondly, I'm now recalling that only one sync point can be assigned to a channel at a time on newer Tegras that support sync point protection. So it sounds like we don't really have any option other than to allocate one sync point per channel for job usage if we want to be able to push multiple jobs into a channel's pushbuffer, correct?
The other way around; each syncpoint can be assigned to one channel. One channel may have multiple syncpoints.
Okay! So we could actually use a single sync point per job for user increments + job tracking, correct?
...
Hmm, we should actually be able to have one sync point per channel for job submission, similar to what the current driver does!
I keep forgetting that waitbases exist!
Tegra194 doesn't have waitbases, but if we are resubmitting all the jobs anyway, can't we just recalculate wait thresholds at that time?
Yes, the thresholds could be recalculated and the job re-formed at push time then.
It also means that jobs should always be formed only at push time whenever the cmdstream uses a wait command, since the waits always need to be patched: we won't know the thresholds until the scheduler actually runs the job.
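Something along these lines, with assumed structures: waits would be recorded relative to the job's starting syncpoint value and turned into absolute thresholds only at push time:

#include <stdint.h>

struct wait_fixup {
	uint32_t pb_offset;  /* index of the threshold word in the pushbuffer */
	uint32_t rel_value;  /* threshold relative to the job's base value */
};

/* Patch every WAIT with a freshly computed absolute threshold before
 * handing the job to hardware; returns the job's completion threshold. */
static uint32_t push_job(uint32_t *pb, uint32_t syncpt_value,
			 uint32_t num_incrs, const struct wait_fixup *waits,
			 uint32_t num_waits)
{
	uint32_t i;

	for (i = 0; i < num_waits; i++)
		pb[waits[i].pb_offset] = syncpt_value + waits[i].rel_value;

	/* The job is complete once the syncpoint reaches this value. */
	return syncpt_value + num_incrs;
}

This way the same job could be re-formed and resubmitted after a channel recovery, with fresh thresholds each time.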
Could we keep the job tracking syncpoints entirely within the kernel, and have all wait commands and other stuff that userspace does use the user syncpoints? Then kernel job tracking and userspace activity would be separate from each other.
I think this should work, but it's not clear to me what benefit this brings if we can re-use the same user-allocated sync point for both user increments and kernel job tracking.
Is the idea to protect against userspace incrementing the sync point too many times? I.e., a job could be treated as completed before CDMA is actually done with it.
Alternatively, if we know that jobs can only be removed from the middle of pushbuffers, and not added, we could replace any removed jobs with syncpoint increments in the pushbuffer and any thresholds would stay intact.
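Something like this, once more with made-up encodings: the removed job's slot is overwritten with immediate increments matching what the job would have done, padded with NOPs, so the thresholds of all later jobs in the queue stay valid:

#include <stdint.h>

/* Assumed encodings, for illustration only. */
#define OPCODE_NOP                  0u    /* assumed NOP encoding */
#define OPCODE_NONINCR(reg, count)  ((2u << 28) | ((reg) << 16) | (count))
#define UCLASS_INCR_SYNCPT          0x00  /* assumed register offset */
#define COND_IMMEDIATE              0u    /* assumed "increment now" cond */

/* Overwrite a removed job's slot in place; assumes the slot is large
 * enough, i.e. slot_words >= 2 * num_incrs. */
static void cancel_job_in_place(uint32_t *slot, uint32_t slot_words,
				uint32_t syncpt, uint32_t num_incrs)
{
	uint32_t i;

	for (i = 0; i < num_incrs; i++) {
		slot[2 * i] = OPCODE_NONINCR(UCLASS_INCR_SYNCPT, 1);
		slot[2 * i + 1] = (COND_IMMEDIATE << 8) | syncpt;
	}
	for (i = 2 * num_incrs; i < slot_words; i++)
		slot[i] = OPCODE_NOP;
}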
It seems a bit unlikely that jobs could ever be removed from the middle of the hardware queue by the DRM scheduler.
Anyway, it would be nicer to shoot down everything and restart with a clean slate when necessary, as the scheduler is already supposed to do.