https://bugs.freedesktop.org/show_bug.cgi?id=62959
--- Comment #13 from Marek Olšák maraeo@gmail.com ---
This kernel patch fixes everything:
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 70d3824..748a933 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -459,6 +459,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 	if (r) {
 		goto out;
 	}
+	radeon_fence_wait(vm->fence, false);
 	radeon_cs_sync_rings(parser);
 	radeon_ib_sync_to(&parser->ib, vm->fence);
 	radeon_ib_sync_to(&parser->ib, radeon_vm_grab_id(
It's merely a workaround and it kills performance, but it's now pretty clear that there is a synchronization issue in the kernel affecting all NI chips with virtual memory, and it should now be easier to find the bug.

I'm not really familiar with the kernel code, so I had to do some code reading before I found the right place to put the wait call.
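
To illustrate why the added wait costs performance, here is a toy userspace model. It is not the radeon code: the `toy_*` names, the sequence-number fence, and the tick accounting are all assumptions made up for this sketch. It only contrasts a CPU-side blocking wait (what the added radeon_fence_wait() call does, as far as I understand it) with recording a dependency on the IB so that the submitting thread can return immediately.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these types exist in the kernel. */
struct toy_fence { uint64_t seq; };            /* signaled once the ring reaches seq */
struct toy_ring  { uint64_t completed_seq;     /* last job the "GPU" has finished */
                   uint64_t ticks_per_job; };  /* hypothetical cost of one job */
struct toy_ib    { uint64_t wait_seq; };       /* dependency recorded on the IB */

static bool toy_fence_signaled(const struct toy_ring *ring,
                               const struct toy_fence *fence)
{
	return ring->completed_seq >= fence->seq;
}

/*
 * CPU-side blocking wait: the submitting thread makes no progress until
 * the fence has signaled. Returns the number of ticks the CPU stalled.
 */
static uint64_t toy_fence_wait_cpu(struct toy_ring *ring,
                                   const struct toy_fence *fence)
{
	uint64_t stalled = 0;

	while (!toy_fence_signaled(ring, fence)) {
		ring->completed_seq++;            /* the "GPU" retires one job */
		stalled += ring->ticks_per_job;   /* the CPU sat idle meanwhile */
	}
	return stalled;
}

/*
 * Dependency-style sync: only remember the latest fence the IB must wait
 * for. The CPU does not stall, so this returns 0 ticks.
 */
static uint64_t toy_ib_sync_to(struct toy_ib *ib, const struct toy_fence *fence)
{
	if (fence->seq > ib->wait_seq)
		ib->wait_seq = fence->seq;
	return 0;
}
```

In this model, a ring that has completed job 2 and a fence at job 5 with 5 ticks per job stalls the CPU for 15 ticks in toy_fence_wait_cpu(), while toy_ib_sync_to() returns right away. That the blocking wait hides the problem suggests the dependency path is where the real bug is.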