https://bugs.freedesktop.org/show_bug.cgi?id=100712
--- Comment #4 from Michel Dänzer <michel@daenzer.net> ---
(In reply to Julien Isorce from comment #0)
> In kernel radeon_object.c::radeon_bo_list_validate, once "bytes_moved > bytes_moved_threshold" is reached (this is the case for 850 bo in the same list_for_each_entry loop), I can see that radeon_ib_schedule emits a fence that takes more than the radeon.lockup_timeout to be signaled.
radeon_ib_schedule is called for submitting the command stream from userspace, not for any BO moves directly, right?
How did you determine that this hang is directly related to bytes_moved / bytes_moved_threshold? Maybe it's only indirectly related, e.g. due to the threshold preventing a BO from being moved to VRAM despite userspace's preference.
> Also it seems the fence is signaled by swapper after more than 10 seconds but it is too late. It requires reducing the "15" param above to 4 to see that.
How does "swapper" (what is that exactly?) signal the fence?
> Is it normal that radeon_bo_list_validate still tries to move the bo if bytes_moved_threshold is reached?
There are circumstances where a BO has to be moved even though the threshold is reached.
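To illustrate the point about forced moves, here is a rough userspace model of the per-BO domain decision. It is a simplified sketch, not the driver code: `struct bo_model`, `pick_domain` and the `DOMAIN_*` values are hypothetical names made up for this example, and only `bytes_moved` / `bytes_moved_threshold` come from the discussion above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the memory domain bits; values are illustrative. */
enum { DOMAIN_GTT = 0x2, DOMAIN_VRAM = 0x4 };

/* Simplified model of one buffer in the validation list (assumed fields). */
struct bo_model {
    uint32_t preferred_domains; /* where userspace would like the BO */
    uint32_t allowed_domains;   /* where the BO may legally live */
    uint32_t current_domain;    /* where the BO is right now */
};

/*
 * Pick the domain to request for this BO.
 *
 * The move is skipped only when ALL of the following hold:
 *   - the BO's current domain is one of the allowed domains,
 *   - honoring the preferred domain would require a move,
 *   - the byte budget for this submission is already exceeded.
 * If the current domain is not allowed, the preferred domain is still
 * requested, so the BO is moved regardless of the budget.
 */
static uint32_t pick_domain(const struct bo_model *bo,
                            uint64_t bytes_moved,
                            uint64_t bytes_moved_threshold)
{
    uint32_t domain = bo->preferred_domains;
    bool currently_allowed = (bo->allowed_domains & bo->current_domain) != 0;
    bool would_move = (domain & bo->current_domain) == 0;

    if (currently_allowed && would_move &&
        bytes_moved > bytes_moved_threshold)
        domain = bo->current_domain; /* over budget: leave it in place */

    return domain;
}
```

In this model, a BO sitting in GTT whose only allowed domain is VRAM still gets VRAM requested once the threshold is exceeded, whereas a BO that is allowed in both domains stays in GTT.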
> Indeed ttm_bo_validate is always called
ttm_bo_validate must be called for every BO referenced by the command stream from userspace for correct lifetime management of its memory.
> (it blits from vram to vram).
It might be worth looking into why this happens, though. If domain == current_domain == RADEON_GEM_DOMAIN_VRAM, I wouldn't expect ttm_bo_validate to trigger a blit.
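As a starting point for that investigation, the following is a simplified model of a placement compatibility check. It is only a sketch under assumptions: `struct place_model`, `struct mem_model` and `placement_compatible` are hypothetical names, and the idea that a page range restriction on the requested placement could be involved is a guess, not something established in this report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Requested placement (assumed model); lpfn == 0 means no upper limit. */
struct place_model {
    uint32_t fpfn;     /* first allowed page frame */
    uint32_t lpfn;     /* one past the last allowed page frame, or 0 */
    uint32_t mem_type; /* illustrative memory type id */
};

/* Where the buffer currently resides (assumed model). */
struct mem_model {
    uint32_t start;     /* first page frame occupied */
    uint32_t num_pages; /* size in pages */
    uint32_t mem_type;  /* illustrative memory type id */
};

/*
 * Return true if the buffer already satisfies the requested placement,
 * i.e. no move would be needed in this model. A buffer can be in the
 * right memory type and still fail if it lies outside [fpfn, lpfn).
 */
static bool placement_compatible(const struct place_model *p,
                                 const struct mem_model *m)
{
    if (p->mem_type != m->mem_type)
        return false;
    if (m->start < p->fpfn)
        return false;
    if (p->lpfn != 0 && m->start + m->num_pages > p->lpfn)
        return false;
    return true;
}
```

If something like this turns out to apply, logging the requested range next to the BO's current offset at the point where the blit is triggered would show whether the BO was already in VRAM but outside the requested range.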