https://bugs.freedesktop.org/show_bug.cgi?id=85207
Bug ID: 85207
Summary: agd5f drm-next-3.19-wip + Unreal Elemental sometimes = list_add corruption/hung task
Product: DRI
Version: XOrg CVS
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: DRM/Radeon
Assignee: dri-devel@lists.freedesktop.org
Reporter: adf.lists@gmail.com
Created attachment 108075 --> https://bugs.freedesktop.org/attachment.cgi?id=108075&action=edit dmesg when Unreal Elemental hangs on start
R9 270X. Sometimes the Unreal Elemental demo hangs at startup, with the errors shown in the attached dmesg.
This doesn't always happen.
Mesa is currently at commit "winsys/radeon: Use a single buffer cache manager again"; the hang was previously reproduced with a slightly older Mesa.
Haven't seen on drm-next-3.18-wip (but really need to test more with current mesa)
Possibly unrelated, but new with drm-next-3.19-wip: I get the messages below when running Unigine Valley, although it runs OK.
Oct 17 11:15:35 ph4 kernel: radeon 0000:01:00.0: GPU fault detected: 146 0x0af03504
Oct 17 11:15:35 ph4 kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00010E57
Oct 17 11:15:35 ph4 kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x10035004
Oct 17 11:15:35 ph4 kernel: VM fault (0x04, vmid 8) at page 69207, read from VGT (53)
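In the fault messages above, the page number in the last line appears to be the decimal form of the VM_CONTEXT1_PROTECTION_FAULT_ADDR value (0x00010E57 is 69207). The following is a minimal Python sketch for pulling those fields out of a dmesg excerpt so that faults from different runs can be compared. The function name `parse_vm_fault` and both regular expressions are hypothetical helpers written for this illustration; they are not part of any radeon tooling.

```python
import re

# Hypothetical patterns, written only against the message format seen in this report.
FAULT_ADDR_RE = re.compile(r"VM_CONTEXT1_PROTECTION_FAULT_ADDR\s+0x([0-9A-Fa-f]+)")
FAULT_PAGE_RE = re.compile(r"VM fault \(0x[0-9A-Fa-f]+, vmid (\d+)\) at page (\d+)")


def parse_vm_fault(log_text):
    """Return the fault address register, VMID and page from a dmesg excerpt.

    Returns None if either the register line or the summary line is missing.
    """
    addr = FAULT_ADDR_RE.search(log_text)
    page = FAULT_PAGE_RE.search(log_text)
    if addr is None or page is None:
        return None
    return {
        "fault_addr": int(addr.group(1), 16),  # register value, parsed as hex
        "vmid": int(page.group(1)),
        "page": int(page.group(2)),            # decimal page from the summary line
    }
```

Comparing `fault_addr` with `page` for each fault is a quick way to check that a register line and a summary line belong to the same event.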
--- Comment #1 from Andy Furniss adf.lists@gmail.com ---
I also noticed, in that dmesg and when searching the kernel log, that with this kernel I sometimes get the following, apparently without effect:

kernel: [drm:radeon_gem_va_update_vm] *ERROR* Couldn't update BO_VA (-512)
--- Comment #2 from Michel Dänzer michel@daenzer.net ---
(In reply to Andy Furniss from comment #0)
> Haven't seen on drm-next-3.18-wip

Can you bisect the kernel?
--- Comment #3 from Andy Furniss adf.lists@gmail.com ---
(In reply to Michel Dänzer from comment #2)
> (In reply to Andy Furniss from comment #0)
> > Haven't seen on drm-next-3.18-wip
>
> Can you bisect the kernel?
It may be a bit early to say, and I will stay on the commit before it for a while to confirm, but it looks like the head commit:
commit bb9a49819ed30f3f5782b2504066547a8507a591
Author: Christian König christian.koenig@amd.com
Date: Mon Oct 13 12:41:47 2014 +0200

    drm/radeon: update the VM after setting BO address

    This way the necessary VM update is kicked off immediately if all BOs involved are in GPU accessible memory.
So far I haven't managed to trigger a lockup or a Valley GPU fault on the commit before it.
FWIW, I noticed that even on head the Valley fault doesn't always happen. It seems I need to have set my CPUs to the performance governor (which I nearly always do when testing things like this); with the cpufreq ondemand governor I didn't see the fault.
Christian König deathsimple@vodafone.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |deathsimple@vodafone.de

--- Comment #4 from Christian König deathsimple@vodafone.de ---
Created attachment 108165 --> https://bugs.freedesktop.org/attachment.cgi?id=108165&action=edit
Possible fix
Ups! Forgotten to take the VM lock in radeon_gem_va_update_vm. Fix is attached.
Thanks for testing, Christian.
--- Comment #5 from Andy Furniss adf.lists@gmail.com ---
(In reply to Christian König from comment #4)
> Created attachment 108165
> Possible fix
>
> Ups! Forgotten to take the VM lock in radeon_gem_va_update_vm. Fix is attached.
>
> Thanks for testing, Christian.
I don't know about Elemental as it's far harder to trigger, but first try with valley produced -
[ 156.617954] radeon 0000:01:00.0: GPU fault detected: 146 0x02e83504
[ 156.617960] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00010F17
[ 156.617961] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08035004
[ 156.617963] VM fault (0x04, vmid 4) at page 69399, read from VGT (53)
--- Comment #6 from Christian König deathsimple@vodafone.de ---
(In reply to Andy Furniss from comment #5)
> I don't know about Elemental as it's far harder to trigger, but first try with valley produced -
>
> [ 156.617954] radeon 0000:01:00.0: GPU fault detected: 146 0x02e83504
> [ 156.617960] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00010F17
> [ 156.617961] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08035004
> [ 156.617963] VM fault (0x04, vmid 4) at page 69399, read from VGT (53)
Sounds like a different problem triggered by the same patchset to me.
But first things first, is the original issue with the list corruption fixed? If yes we can start to look into this one as well.
--- Comment #7 from Andy Furniss adf.lists@gmail.com ---
(In reply to Christian König from comment #6)
> (In reply to Andy Furniss from comment #5)
> > I don't know about Elemental as it's far harder to trigger, but first try with valley produced -
> >
> > [ 156.617954] radeon 0000:01:00.0: GPU fault detected: 146 0x02e83504
> > [ 156.617960] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00010F17
> > [ 156.617961] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08035004
> > [ 156.617963] VM fault (0x04, vmid 4) at page 69399, read from VGT (53)
>
> Sounds like a different problem triggered by the same patchset to me.
>
> But first things first, is the original issue with the list corruption fixed? If yes we can start to look into this one as well.
It's OK so far, but then I need more time as I don't really know how to trigger it and last time I called it as OK (in another bug) it wasn't.
--- Comment #8 from Andy Furniss adf.lists@gmail.com ---
(In reply to Andy Furniss from comment #7)
> (In reply to Christian König from comment #6)
> > (In reply to Andy Furniss from comment #5)
> > > I don't know about Elemental as it's far harder to trigger, but first try with valley produced -
> > >
> > > [ 156.617954] radeon 0000:01:00.0: GPU fault detected: 146 0x02e83504
> > > [ 156.617960] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00010F17
> > > [ 156.617961] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08035004
> > > [ 156.617963] VM fault (0x04, vmid 4) at page 69399, read from VGT (53)
> >
> > Sounds like a different problem triggered by the same patchset to me.
> >
> > But first things first, is the original issue with the list corruption fixed? If yes we can start to look into this one as well.
>
> It's OK so far, but then I need more time as I don't really know how to trigger it and last time I called it as OK (in another bug) it wasn't.
Still haven't crashed Elemental but have got -
[29066.333908] [drm:radeon_gem_va_update_vm] *ERROR* Couldn't update BO_VA (-512)
[29066.335653] [drm:radeon_gem_va_update_vm] *ERROR* Couldn't update BO_VA (-512)
--- Comment #9 from Andy Furniss adf.lists@gmail.com ---
(In reply to Christian König from comment #6)
> But first things first, is the original issue with the list corruption fixed? If yes we can start to look into this one as well.
Enough time has passed now, so I do think that the patch fixed the list corruption.
--- Comment #10 from Lorenzo Bona lorenz.bona@gmail.com ---
I see the same issue here.

[ 1384.901951] [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-512)
[ 1453.198866] [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-512)
[ 2215.773607] [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-512)
[ 2351.238014] [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-512)
[ 3877.903397] [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-512)
Self compiled kernel from Linus git. 3.19-rc2+ right now.
--- Comment #11 from Michel Dänzer michel@daenzer.net ---
(In reply to Lorenzo Bona from comment #10)
> [ 1384.901951] [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-512)
Christian, any ideas for these? Various people including myself are still hitting them occasionally.
--- Comment #12 from Christian König deathsimple@vodafone.de ---
Created attachment 111961 --> https://bugs.freedesktop.org/attachment.cgi?id=111961&action=edit
Fix for printing the error message
--- Comment #13 from Christian König deathsimple@vodafone.de ---
(In reply to Michel Dänzer from comment #11)
> (In reply to Lorenzo Bona from comment #10)
> > [ 1384.901951] [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-512)
>
> Christian, any ideas for these? Various people including myself are still hitting them occasionally.
Oops, yeah, trivial to fix.
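For readers triaging similar logs: the value -512 in these messages matches the kernel-internal ERESTARTSYS code, which normally means a wait was interrupted by a signal and the call is to be restarted. That would be consistent with the messages appearing "apparently without effect", but it is an assumption here, since the attached patch is not reproduced in this thread. The sketch below separates such entries from other BO_VA errors in a kernel log; `classify_bo_va_errors` and the regular expression are hypothetical helpers written for this illustration.

```python
import re

# Kernel-internal value (include/linux/errno.h); Python's errno module does not expose it.
ERESTARTSYS = 512

# Hypothetical pattern matching the message text quoted in this report.
BO_VA_RE = re.compile(r"Couldn't update BO_VA \((-?\d+)\)")


def classify_bo_va_errors(log_text):
    """Count BO_VA update errors, separating -ERESTARTSYS from everything else."""
    counts = {"interrupted": 0, "other": 0}
    for match in BO_VA_RE.finditer(log_text):
        code = int(match.group(1))
        key = "interrupted" if code == -ERESTARTSYS else "other"
        counts[key] += 1
    return counts
```

If every entry lands in the "interrupted" bucket, the log is showing restarted calls rather than distinct failures, under the assumption stated above.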
Andy Furniss adf.lists@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #14 from Andy Furniss adf.lists@gmail.com ---
Should have been closed some time ago
Christian König deathsimple@vodafone.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |commiethebeastie@gmail.com

--- Comment #15 from Christian König deathsimple@vodafone.de ---
*** Bug 88211 has been marked as a duplicate of this bug. ***
Christian König deathsimple@vodafone.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #16 from Christian König deathsimple@vodafone.de ---
Let's close this.
dri-devel@lists.freedesktop.org