On Wed, Mar 23, 2022 at 02:44:17PM +0100, Christian König wrote:
Am 23.03.22 um 14:36 schrieb Daniel Vetter:
On Wed, 23 Mar 2022 at 13:20, Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
Am 23.03.22 um 12:59 schrieb Daniel Vetter:
On Mon, Mar 21, 2022 at 02:25:56PM +0100, Christian König wrote:
This way we finally fix the problem that new resources are not immediately evictable after allocation.
That has caused numerous problems, including OOMs in GDS handling and not being able to use TTM as a general resource manager.
v2: stop assuming in ttm_resource_fini that res->bo is still valid
v3: cleanup kerneldoc, add more lockdep annotations
v4: consistently use res->num_pages
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

+/**
+ * struct ttm_lru_bulk_move
+ *
+ * @tt: first/last lru entry for resources in the TT domain
+ * @vram: first/last lru entry for resources in the VRAM domain
+ *
+ * Helper structure for bulk moves on the LRU list.
+ */
+struct ttm_lru_bulk_move {
+	struct ttm_lru_bulk_move_pos tt[TTM_MAX_BO_PRIORITY];
+	struct ttm_lru_bulk_move_pos vram[TTM_MAX_BO_PRIORITY];
+};
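For reading the hunk above: each ttm_lru_bulk_move_pos entry presumably just remembers the first and last resource of one contiguous range on the LRU, one entry per priority. A rough sketch (the real definition lives in the TTM headers, so treat the field names as illustrative):

```c
#include <stddef.h>

struct ttm_resource;  /* opaque here; defined by TTM in the kernel */

/*
 * Sketch of the per-priority position the bulk move tracks: the
 * first and last resource of one contiguous range on the LRU list.
 */
struct ttm_lru_bulk_move_pos {
	struct ttm_resource *first;
	struct ttm_resource *last;
};
```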
Not really needed, just a thought: Should we track the associated dma_resv object here to make sure the locking is all done correctly (and also check that the bulk move BOs have the same dma_resv)? It wouldn't really be any overhead for the !CONFIG_LOCKDEP case, and we could sprinkle a lot more dma_resv_held checks all over the place.
You made a similar comment on the last revision and I already tried to play around with that idea a bit.
But I've completely abandoned that idea after realizing that the BOs in the bulk move actually don't need to have the same dma_resv object, nor do they all need to be locked.
Uh how does that work? If you evict that bo while someone else is doing a bulk move, then at least the result might be rather funny, and I have no idea how it could work.
The LRU is still protected by the common spinlock.
So that will synchronize any modification to both the bulk move structure as well as the individual list_heads making up the LRU.
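The scheme Christian describes can be modeled in a self-contained sketch: a plain C list with a pthread mutex standing in for the LRU spinlock, and first/last positions per bulk range. All names below are illustrative mocks, not the actual TTM API; the point is only that the lock alone keeps the structure consistent:

```c
#include <pthread.h>
#include <stddef.h>

/* Minimal doubly linked list, modeled after the kernel's list_head. */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }

static void list_del_entry(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

static void list_add_tail(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

/* Stand-in for ttm_lru_bulk_move_pos: first/last entry of one range. */
struct bulk_pos { struct list_head *first, *last; };

/* One LRU list plus its protecting lock (the "common spinlock"). */
struct lru {
	pthread_mutex_t lock;	/* models the TTM LRU spinlock */
	struct list_head head;
};

/*
 * Bump a whole bulk range to the LRU tail in one go.  Both the bulk
 * positions and the individual list_heads are only ever touched under
 * the lock, which is why the BOs need not share a dma_resv for this
 * to be structurally safe (whether it is still a useful LRU is the
 * semantic question debated below).
 */
static void bulk_move_to_tail(struct lru *lru, struct bulk_pos *pos)
{
	struct list_head *e, *next;

	pthread_mutex_lock(&lru->lock);
	e = pos->first;
	for (;;) {
		next = e->next;
		list_del_entry(e);
		list_add_tail(e, &lru->head);
		if (e == pos->last)
			break;
		e = next;
	}
	pthread_mutex_unlock(&lru->lock);
}
```

After moving the range, the rest of the list keeps its order and the bulk range sits at the tail in its original internal order.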
Like even if you then make the rule that you have to lock all bos for the bulk move, the bo could still be moved independently, and that would again break the bulk lru properties.
And if you do none of the above, there's no reason for that bo to have a distinct dma_resv.
Like maybe the data structure won't fall apart, but semantically it just doesn't make any sense to me to allow this. What would you want to use this for?
Yeah, that's a good point.
It's not technically necessary as far as I can see, but I'm not sure if there is a valid use case either.
Yeah I think nothing obviously bad will happen to the list, but it will very much no longer be an LRU. Which kinda defeats the point - there's easier ways to not have an LRU like never updating anything :-)
Aside, this is actually what i915-gem folks did without realizing, so that's also a bit why I think being strict here would be good. It's tricky and if you're too clever it's way too easy to end up with something which isn't anything useful anymore.
It just happens that amdgpu is currently using it that way, but I can't see any technical necessity to restrict the bulk move like that.
Yeah we can do that later on in a follow up patch, or I figure out why it's not a good idea :-) Just figured this might be good to lock down before other drivers start adopting this.
I'm just wondering if it's really more defensive to restrict the handling like that.
On the other hand we can still lift the restriction when anybody really comes along with a valid use case.
Yeah if we do have a use-case we can lift this easily, it's just a few dma_resv_assert_held. I don't think we need this included for merging, but it'd be great as a follow-up patch while only amdgpu is using this. -Daniel
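The follow-up Daniel has in mind would boil down to something like the following mock: a toy reservation object with a "held" flag standing in for dma_resv, and an add path that remembers the first BO's resv and asserts every later BO shares it and holds it. In the kernel this would be dma_resv_assert_held() under lockdep; every name here is an illustrative stub, not the real API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy reservation object: just tracks whether it is "locked". */
struct dma_resv_mock { bool held; };

static void resv_lock(struct dma_resv_mock *r)   { r->held = true; }
static void resv_unlock(struct dma_resv_mock *r) { r->held = false; }

/* Models dma_resv_assert_held() from the real kernel. */
static void resv_assert_held(struct dma_resv_mock *r)
{
	assert(r->held);
}

struct bo_mock { struct dma_resv_mock *resv; };

/*
 * Bulk move that remembers the dma_resv of the first BO added and
 * insists every later BO shares it and has it locked -- the
 * restriction discussed above.  Returns false instead of asserting
 * on a resv mismatch so the caller can observe the rejection.
 */
struct bulk_move_mock { struct dma_resv_mock *resv; };

static bool bulk_move_add(struct bulk_move_mock *bulk, struct bo_mock *bo)
{
	resv_assert_held(bo->resv);
	if (!bulk->resv)
		bulk->resv = bo->resv;	/* first BO sets the rule */
	else if (bulk->resv != bo->resv)
		return false;		/* mixed reservations: reject */
	return true;
}
```

Lifting the restriction later would then just mean dropping the resv comparison while keeping the held assertion.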
Christian.
-Daniel
Regards, Christian.
-Daniel