On Wed, Mar 23, 2022 at 02:44:17PM +0100, Christian König wrote:
Am 23.03.22 um 14:36 schrieb Daniel Vetter:
On Wed, 23 Mar 2022 at 13:20, Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
Am 23.03.22 um 12:59 schrieb Daniel Vetter:
On Mon, Mar 21, 2022 at 02:25:56PM +0100, Christian König wrote:
This way we finally fix the problem that new resources are not immediately evictable after allocation.
That has caused numerous problems, including OOMs in GDS handling and not being able to use TTM as a general resource manager.
v2: stop assuming in ttm_resource_fini that res->bo is still valid
v3: cleanup kerneldoc, add more lockdep annotations
v4: consistently use res->num_pages
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

+/**
+ * struct ttm_lru_bulk_move
+ *
+ * @tt: first/last lru entry for resources in the TT domain
+ * @vram: first/last lru entry for resources in the VRAM domain
+ *
+ * Helper structure for bulk moves on the LRU list.
+ */
+struct ttm_lru_bulk_move {
+	struct ttm_lru_bulk_move_pos tt[TTM_MAX_BO_PRIORITY];
+	struct ttm_lru_bulk_move_pos vram[TTM_MAX_BO_PRIORITY];
+};
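For reading the hunk above: each ttm_lru_bulk_move_pos entry presumably just remembers the first and last resource of one contiguous range on the LRU, one entry per priority. A rough sketch (the real definition lives in the TTM headers, so treat the field names as illustrative):

```c
#include <stddef.h>

struct ttm_resource;  /* opaque here; defined by TTM in the kernel */

/*
 * Sketch of the per-priority position the bulk move tracks: the
 * first and last resource of one contiguous range on the LRU list.
 */
struct ttm_lru_bulk_move_pos {
	struct ttm_resource *first;
	struct ttm_resource *last;
};
```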
Not really needed, just a thought: Should we track the associated dma_resv object here to make sure the locking is all done correctly (and also check that the bulk move BOs have the same dma_resv)? It wouldn't really be any overhead for the !CONFIG_LOCKDEP case, and we could sprinkle a lot more dma_resv_held checks all over the place.
You made a similar comment on the last revision and I already tried to play around with that idea a bit.
But I've completely abandoned that idea after realizing that the BOs in the bulk move actually don't need to have the same dma_resv object, nor do they all need to be locked.
Uh how does that work? If you evict that bo while someone else is doing a bulk move, then at least the result might be rather funny, and I have no idea how it could work.
The LRU is still protected by the common spinlock.
So that will synchronize any modification to both the bulk move structure as well as the individual list_heads making up the LRU.
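The scheme Christian describes can be modeled in a self-contained sketch: a plain C list with a pthread mutex standing in for the LRU spinlock, and first/last positions per bulk range. All names below are illustrative mocks, not the actual TTM API; the point is only that the lock alone keeps the structure consistent:

```c
#include <pthread.h>
#include <stddef.h>

/* Minimal doubly linked list, modeled after the kernel's list_head. */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }

static void list_del_entry(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

static void list_add_tail(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

/* Stand-in for ttm_lru_bulk_move_pos: first/last entry of one range. */
struct bulk_pos { struct list_head *first, *last; };

/* One LRU list plus its protecting lock (the "common spinlock"). */
struct lru {
	pthread_mutex_t lock;	/* models the TTM LRU spinlock */
	struct list_head head;
};

/*
 * Bump a whole bulk range to the LRU tail in one go.  Both the bulk
 * positions and the individual list_heads are only ever touched under
 * the lock, which is why the BOs need not share a dma_resv for this
 * to be structurally safe (whether it is still a useful LRU is the
 * semantic question debated below).
 */
static void bulk_move_to_tail(struct lru *lru, struct bulk_pos *pos)
{
	struct list_head *e, *next;

	pthread_mutex_lock(&lru->lock);
	e = pos->first;
	for (;;) {
		next = e->next;
		list_del_entry(e);
		list_add_tail(e, &lru->head);
		if (e == pos->last)
			break;
		e = next;
	}
	pthread_mutex_unlock(&lru->lock);
}
```

After moving the range, the rest of the list keeps its order and the bulk range sits at the tail in its original internal order.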
Like even if you then make the rule that you have to lock all bos for the bulk move, the bo could still be moved independently, and that would again break the bulk lru properties.
And if you do none of the above, there's no reason for that bo to have a distinct dma_resv.
Like maybe the data structure won't fall apart, but semantically it just doesn't make any sense to me to allow this. What would you want to use this for?
Yeah, that's a good point.
It's not technically necessary as far as I can see, but I'm not sure if there is a valid use case either.
Yeah I think nothing obviously bad will happen to the list, but it will very much no longer be an LRU. Which kinda defeats the point - there's easier ways to not have an LRU like never updating anything :-)
Aside, this is actually what i915-gem folks did without realizing, so that's also a bit why I think being strict here would be good. It's tricky and if you're too clever it's way too easy to end up with something which isn't anything useful anymore.
It just happens that amdgpu is currently using it that way, but I can't see any technical necessity to restrict the bulk move like that.
Yeah we can do that later on in a follow up patch, or I figure out why it's not a good idea :-) Just figured this might be good to lock down before other drivers start adopting this.
I'm just wondering if it's really more defensive to restrict the handling like that.
On the other hand we can still lift the restriction when anybody really comes along with a valid use case.
Yeah if we do have a use-case we can lift this easily, it's just a few dma_resv_assert_held. I don't think we need this included for merging, but it'd be great as a follow-up patch while only amdgpu is using this. -Daniel
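The follow-up Daniel has in mind would boil down to something like the following mock: a toy reservation object with a "held" flag standing in for dma_resv, and an add path that remembers the first BO's resv and asserts every later BO shares it and holds it. In the kernel this would be dma_resv_assert_held() under lockdep; every name here is an illustrative stub, not the real API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy reservation object: just tracks whether it is "locked". */
struct dma_resv_mock { bool held; };

static void resv_lock(struct dma_resv_mock *r)   { r->held = true; }
static void resv_unlock(struct dma_resv_mock *r) { r->held = false; }

/* Models dma_resv_assert_held() from the real kernel. */
static void resv_assert_held(struct dma_resv_mock *r)
{
	assert(r->held);
}

struct bo_mock { struct dma_resv_mock *resv; };

/*
 * Bulk move that remembers the dma_resv of the first BO added and
 * insists every later BO shares it and has it locked -- the
 * restriction discussed above.  Returns false instead of asserting
 * on a resv mismatch so the caller can observe the rejection.
 */
struct bulk_move_mock { struct dma_resv_mock *resv; };

static bool bulk_move_add(struct bulk_move_mock *bulk, struct bo_mock *bo)
{
	resv_assert_held(bo->resv);
	if (!bulk->resv)
		bulk->resv = bo->resv;	/* first BO sets the rule */
	else if (bulk->resv != bo->resv)
		return false;		/* mixed reservations: reject */
	return true;
}
```

Lifting the restriction later would then just mean dropping the resv comparison while keeping the held assertion.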
Christian.
-Daniel
Regards, Christian.
-Daniel