On Mon, Mar 15, 2021 at 6:57 PM Zack Rusin zackr@vmware.com wrote:
On 3/12/21 5:06 AM, Thomas Hellström (Intel) wrote:
On 3/12/21 12:02 AM, Zack Rusin wrote:
On Mar 11, 2021, at 17:35, Thomas Hellström (Intel) thomas_os@shipmail.org wrote:
Hi, Zack
On 3/11/21 10:07 PM, Zack Rusin wrote:
On Mar 11, 2021, at 05:46, Thomas Hellström (Intel) thomas_os@shipmail.org wrote:
Hi,
I tried latest drm-fixes today and saw a lot of these: Fallout from ttm rework?
Yes, I fixed this in d1a73c641afd2617bd80bce8b71a096fc5b74b7e it was in drm-misc-next in the drm-misc tree for a while but hasn’t been merged for 5.12.
z
Hmm, yes but doesn't that fix trip the ttm_bo_unpin() dma_resv_assert_held(bo->base.resv)?
No, doesn’t seem to. TBH I’m not sure why myself, but it seems to be working fine.
With CONFIG_PROVE_LOCKING=y I see this:
[ 7.117145] [drm] FIFO at 0x00000000fe000000 size is 8192 kiB [ 7.117284] [drm] VRAM at 0x00000000e8000000 size is 131072 kiB [ 7.117291] INFO: trying to register non-static key. [ 7.117295] the code is fine but needs lockdep annotation. [ 7.117298] turning off the locking correctness validator
Which will probably mask that dma_resv_assert_held(bo->base.resv)
Ah, yes, you're right. After fixing that I do see the dma_resv_assert_held triggered. Technically trivially fixable with ttm_bo_reserve/ttm_bo_unreserve around ttm_bo_unpin but it's a little ugly that some places e.g. vmw_bo_unreference will require ttm_bo_reserve/ttm_bo_unreserve around ttm_bo_unpin but some won't e.g. vmw_mob_destroy won't because its lock is held by ttm_bo_delayed_delete without a very clear indication within the function which is which.
Not sure it applies here, but if the refcount is down to 0 we know we're in destruction code and can't race with anything anymore, so maybe we can lift the debug check.
Otoh I think at that point we might still be on lru lists, so the rules become rather tricky whether it's really always legal to skip the dma_resv_lock. But we could perhaps figure out something if it's too annoying to have a consistent calling context in drivers.
I'm not a huge fan of dropping the requirement from unpin and switching to atomic_t for the pin count, since pin/unpin is an extremely slow path, adding complexity in how we protect stuff for a function that's maybe called 60/s (for page flipping we pin/unpin) doesn't strike me as the right balance. And atomic commit helpers are explicitly designed to allow drivers to grab dma_resv_lock in their ->cleanup_fb hook, since that part is _not_ in the dma_fence signalling critical path for the atomic flip. But if all else fails I guess that's an option too. -Daniel