On Thu, May 06, 2021 at 10:06:32PM -0400, Kenny Ho wrote:
Sorry for the late reply (I have been working on other stuff.)
On Fri, Feb 5, 2021 at 8:49 AM Daniel Vetter <daniel@ffwll.ch> wrote:
So I agree that on one side CU mask can be used for low-level quality of service guarantees (like the CLOS cache stuff on intel cpus as an example), and that's going to be rather hw specific no matter what.
But my understanding of AMD's plans here is that CU mask is the only thing you'll have to partition gpu usage in a multi-tenant environment - whether that's cloud or also whether that's containing apps to make sure the compositor can still draw the desktop (except for fullscreen ofc) doesn't really matter I think.
This is not correct. Even in the original cgroup proposal, it supports both mask and count as a way to define unit(s) of sub-device. For AMD, we already have SRIOV that supports GPU partitioning in a time-sliced-of-a-whole-GPU fashion.
Hm I missed that. I feel like time-sliced-of-a-whole gpu is the easier gpu cgroups controller to get started with, since it's much closer to other cgroups that control bandwidth of some kind. Whether it's i/o bandwidth or compute bandwidth is kinda a wash.
CU mask feels a lot more like an isolation/guaranteed forward progress kind of thing, and I suspect that's always going to be a lot more gpu hw specific than anything we can reasonably put into a general cgroups controller.
Also for the time slice cgroups thing, can you pls give me pointers to these old patches that had it, and how it's done? I very obviously missed that part.
Thanks, Daniel