https://bugs.freedesktop.org/show_bug.cgi?id=97980
Bug ID: 97980
Summary: [amdgpu] New kernel warning during shutdown
Product: DRI
Version: XOrg git
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: mike@fireburn.co.uk
Created attachment 126886 --> https://bugs.freedesktop.org/attachment.cgi?id=126886&action=edit
Screenshot
I might have spoken too soon about the memory manager patches: I'm seeing a stack trace just as the machine is about to switch off.
Also, it now takes about 30 seconds to switch off my laptop. I think it's amdgpu related: it seems to wait, then fire up the card, then switch off. It could also be hard disk or even systemd related, though.
I'm attaching the screenshot, but it looks like an issue with ttm_bo_force_list_clean.
Sorry about the bad quality, but I had to record a video in slow motion to capture it, then take a screenshot of that.
--- Comment #1 from Alex Deucher alexdeucher@gmail.com ---
Does cherry-picking this patch over help?
https://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=a951e...
Mike Lothian mike@fireburn.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |mike@fireburn.co.uk
--- Comment #2 from Mike Lothian mike@fireburn.co.uk ---
Yes that fixes it
I've been having a more and more difficult time testing stuff of late, there's been quite a few regressions and I've been carrying more and more patches amongst various branches - let's hope the next cycle will be better
What's your handle on IRC?
--- Comment #3 from Alex Deucher alexdeucher@gmail.com ---
(In reply to Mike Lothian from comment #2)
> Yes that fixes it
> I've been having a more and more difficult time testing stuff of late, there's been quite a few regressions and I've been carrying more and more patches amongst various branches - let's hope the next cycle will be better
Well, bug fixes go to -fixes and new features go to -next. If you want everything, you'd need to merge -fixes into -next.
> What's your handle on IRC?
agd5f
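The merge described in comment #3 might look like the sketch below. The function name and both arguments are hypothetical; substitute whichever local -next branch and -fixes ref you actually track. Only standard git commands are used.

```shell
# Minimal sketch: merge a -fixes ref into a local -next branch.
# merge_fixes_into_next and both arguments are illustrative names, not
# anything defined by the drm trees themselves.
merge_fixes_into_next() {
    next_branch="$1"   # local branch carrying the -next work
    fixes_ref="$2"     # branch or remote-tracking ref carrying the -fixes work
    git checkout "$next_branch" &&
        git merge --no-edit "$fixes_ref"
}
```

A call such as `merge_fixes_into_next my-next origin/drm-fixes` (hypothetical names) would leave the combined result on the -next branch; any conflicts still have to be resolved by hand.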
--- Comment #4 from Mike Lothian mike@fireburn.co.uk ---
Sorry, I spoke too soon: the issue is still there, it's just more difficult to see as the reboot is so quick now.
--- Comment #5 from Andy Furniss adf.lists@gmail.com ---
Maybe a different issue, but I've just started getting shutdown issues with agd5f drm-next-4.9-wip.
It seems the monitor blanks early so I don't get to see anything; it's just that with halt it doesn't power off.
On the current kernel, reverting
0ea8cba5ef7b783f11cb1a0b900b7c18d2ce0b6 drm/amdgpu: always apply pci shutdown callbacks (v2)
apparently fixes it, but it's not that simple. I first saw the issue on the 25th, but it went away with the next update the branch got, so I thought it was fixed. It reappeared with more recent updates.
Unfortunately it seems my recent working kernel (26th) has the above commit, so maybe there is some interaction/timing issue with something else.
pankaj pankaj.baware1@gmail.com changed:
What    |Removed |Added
----------------------------------------------------------------------------
Keywords|        |have-backtrace
Hardware|Other   |IA64 (Itanium)
Priority|medium  |lowest
Severity|normal  |blocker
OS      |All     |NetBSD
URL     |        |https://bugs.freedesktop.org
--- Comment #6 from Mike Lothian mike@fireburn.co.uk ---
I'm still seeing this issue on the 4.9-wip branch and that has this patch included:
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1708,11 +1708,11 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
 	DRM_INFO("amdgpu: finishing device.\n");
 	adev->shutdown = true;
+	drm_crtc_force_disable_all(adev->ddev);
 	/* evict vram memory */
 	amdgpu_bo_evict_vram(adev);
 	amdgpu_ib_pool_fini(adev);
 	amdgpu_fence_driver_fini(adev);
-	drm_crtc_force_disable_all(adev->ddev);
 	amdgpu_fbdev_fini(adev);
 	r = amdgpu_fini(adev);
 	kfree(adev->ip_block_status);
Mike Lothian mike@fireburn.co.uk changed:
What                          |Removed |Added
----------------------------------------------------------------------------
Attachment #126886 is obsolete|0       |1
--- Comment #7 from Mike Lothian mike@fireburn.co.uk ---
Created attachment 127331 --> https://bugs.freedesktop.org/attachment.cgi?id=127331&action=edit
Updated screenshot
--- Comment #8 from Mike Lothian mike@fireburn.co.uk ---
OK, I followed the advice you gave in the other bug about compiling amdgpu as a module, and got the following dmesg using:
modprobe -r amdgpu && dmesg > dmesg && sync
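The one-liner above chains three steps with `&&`, so the kernel log is only saved if the unload returns successfully, and it is written to a file named dmesg in the current directory. A minimal sketch of the same sequence as a reusable function follows; capture_after and the output file name are hypothetical, and it assumes dmesg is readable by the calling user.

```shell
# Run a command, then save the kernel log and flush it to disk.
# capture_after is an illustrative helper, not an existing tool.
capture_after() {
    out="$1"   # file that receives the kernel log
    shift      # the remaining arguments are the command to run first
    "$@" && dmesg > "$out" && sync
}

# Usage (as root): capture_after dmesg.txt modprobe -r amdgpu
```

If the first command fails or never returns, the log file is never written, which is worth keeping in mind when the command being tested can hang the machine.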
--- Comment #9 from Mike Lothian mike@fireburn.co.uk ---
Created attachment 127340 --> https://bugs.freedesktop.org/attachment.cgi?id=127340&action=edit
Dmesg
--- Comment #10 from Mike Lothian mike@fireburn.co.uk ---
After I issue the modprobe -r amdgpu command, the system freezes up entirely.
I took a screenshot of the final messages. Could this be TTM related?
Mike Lothian mike@fireburn.co.uk changed:
What                          |Removed |Added
----------------------------------------------------------------------------
Attachment #127331 is obsolete|0       |1
--- Comment #11 from Mike Lothian mike@fireburn.co.uk ---
Created attachment 127341 --> https://bugs.freedesktop.org/attachment.cgi?id=127341&action=edit
Updated screenshot
This captures the BUG that freezes up the system
--- Comment #12 from Mike Lothian mike@fireburn.co.uk ---
Created attachment 127565 --> https://bugs.freedesktop.org/attachment.cgi?id=127565&action=edit
New Screenshot
The first stack trace in the dmesg is the same; the one captured after the system freezes up is slightly different.
Mike Lothian mike@fireburn.co.uk changed:
What                          |Removed |Added
----------------------------------------------------------------------------
Attachment #127340 is obsolete|0       |1
--- Comment #13 from Mike Lothian mike@fireburn.co.uk ---
Created attachment 127566 --> https://bugs.freedesktop.org/attachment.cgi?id=127566&action=edit
Updated dmesg
--- Comment #14 from Mike Lothian mike@fireburn.co.uk ---
I've tested this again with the latest drm-next-4.10-wip branch and I still get the same errors.
--- Comment #15 from Alex Deucher alexdeucher@gmail.com ---
Created attachment 128355 --> https://bugs.freedesktop.org/attachment.cgi?id=128355&action=edit
possible fix
Does this patch help?
Alex Deucher alexdeucher@gmail.com changed:
What    |Removed |Added
----------------------------------------------------------------------------
See Also|        |https://bugs.freedesktop.org/show_bug.cgi?id=98638
--- Comment #16 from Mike Lothian mike@fireburn.co.uk ---
It helps the original issue where I saw a panic / stack trace on shutdown and shutdown took a while - so that's great news
I've retested compiling amdgpu as a module and modprobe -r(ing) it - this still kills my machine. Would you be interested in me taking more diagnostics? Or can that now be considered a separate bug?
--- Comment #17 from Alex Deucher alexdeucher@gmail.com ---
(In reply to Mike Lothian from comment #16)
> It helps the original issue where I saw a panic / stack trace on shutdown and shutdown took a while - so that's great news
> I've retested compiling amdgpu as a module and modprobe -r(ing) it - this still kills my machine. Would you be interested in me taking more diagnostics? Or can that now be considered a separate bug?
Separate bug. With this patch, the two code paths (module unload and shutdown) are now separate.
Alex Deucher alexdeucher@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |ernstp@gmail.com
--- Comment #18 from Alex Deucher alexdeucher@gmail.com ---
*** Bug 98638 has been marked as a duplicate of this bug. ***
--- Comment #19 from Alex Deucher alexdeucher@gmail.com ---
Created attachment 128372 --> https://bugs.freedesktop.org/attachment.cgi?id=128372&action=edit
alternative patch
Does this patch also work?
--- Comment #20 from Mike Lothian mike@fireburn.co.uk ---
So I removed your previous patch and applied the new one, and I get a panic in shutdown again.
Alex Deucher alexdeucher@gmail.com changed:
What      |Removed |Added
----------------------------------------------------------------------------
Status    |NEW     |RESOLVED
Resolution|---     |FIXED