https://bugs.freedesktop.org/show_bug.cgi?id=100222
Bug ID: 100222
Summary: Hang regression with R7 M370, identified possible culprit commit
Product: DRI
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/Radeon
Assignee: dri-devel@lists.freedesktop.org
Reporter: registo.mailling@gmail.com
Created attachment 130246 --> https://bugs.freedesktop.org/attachment.cgi?id=130246&action=edit
lspci -nnk and dmesg output
After updating from the 4.9 kernel series to the 4.10 series, I have identified a regression when using the discrete GPU on my laptop (Lenovo ThinkPad E560).
Running any demanding application with DRI_PRIME=1 makes the card hang; one example is 'DRI_PRIME=1 glmark2 -b texture'.
I noticed that the content of /sys/kernel/debug/dri/1/radeon_pm_info differs between kernels 4.9 and 4.10 while glmark2 is running.
With 4.9: power level 4 sclk: 75000 mclk: 80000 vddc: 1050 vddci: 0 pcie gen: 2
With 4.10: power level 4 sclk: 87500 mclk: 90000 vddc: 1050 vddci: 0 pcie gen: 2
This led me to revert commit 3a69adfe5617ceba04ad3cff0f9ccad470503fb2; with it reverted, the card no longer hangs.
The attachment contains the lspci and dmesg output with commit 3a69adfe5617ceba04ad3cff0f9ccad470503fb2 reverted.
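The sclk and mclk values in radeon_pm_info are given in units of 10 kHz, which is why 75000 corresponds to the 750MHz core clock discussed later in this thread. A small C sketch of the conversion, using the power level 4 numbers reported above (the helper and struct names are hypothetical, for illustration only):

```c
#include <stdint.h>

/* Convert a radeon_pm_info clock value (10 kHz units) to MHz:
 * 100 x 10 kHz = 1 MHz, so 75000 -> 750 and 87500 -> 875. */
static uint32_t radeon_clk_to_mhz(uint32_t clk_10khz)
{
	return clk_10khz / 100;
}

/* Hypothetical holder for one power level as printed by radeon_pm_info. */
struct pm_level_sample {
	uint32_t sclk; /* engine (core) clock, 10 kHz units */
	uint32_t mclk; /* memory clock, 10 kHz units */
};

/* Power level 4 while running glmark2, as reported above. */
static const struct pm_level_sample level4_v4_9  = { 75000, 80000 };
static const struct pm_level_sample level4_v4_10 = { 87500, 90000 };
```

So 4.9 ran the card at 750MHz core / 800MHz memory, while 4.10 raised it to 875MHz core / 900MHz memory.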
--- Comment #1 from Alex Deucher alexdeucher@gmail.com ---
Created attachment 130250 --> https://bugs.freedesktop.org/attachment.cgi?id=130250&action=edit
patch 1/2
The attached patches should fix it.
--- Comment #2 from Alex Deucher alexdeucher@gmail.com ---
Created attachment 130251 --> https://bugs.freedesktop.org/attachment.cgi?id=130251&action=edit
patch 2/2
--- Comment #3 from Mauro Santos registo.mailling@gmail.com --- The build fails after applying patch 1 followed by patch 2, with:
drivers/gpu/drm/radeon/si_dpm.c: In function ‘si_get_vce_clock_voltage’:
drivers/gpu/drm/radeon/si_dpm.c:2977:4: error: ‘else’ without a previous ‘if’
    } else if (rdev->family == CHIP_OLAND) {
      ^~~~
drivers/gpu/drm/radeon/si_dpm.c:2985:4: error: ‘max_sclk’ undeclared (first use in this function)
    max_sclk = 75000;
    ^~~~~~~~
drivers/gpu/drm/radeon/si_dpm.c:2985:4: note: each undeclared identifier is reported only once for each function it appears in
The patch changes code inside the si_get_vce_clock_voltage function, but I suppose the changes should go a few lines below that, in the si_apply_state_adjust_rules function after the quirks for Pitcairn and Hainan, right?
Another thing I'm curious about: any guesses as to why the card needs its maximum core clock limited to 750MHz on Linux but seems to work fine on Windows 10 at 875MHz? I tried it on Windows 10 (all drivers downloaded via Windows Update) with Unigine Heaven and CPU-Z to monitor the frequencies, and it runs happily with 875MHz core and 900MHz memory clocks.
--- Comment #4 from Alex Deucher alexdeucher@gmail.com --- (In reply to Mauro Santos from comment #3)
> Build fails after applying patch 1 followed by patch 2 with:
>
> drivers/gpu/drm/radeon/si_dpm.c: In function ‘si_get_vce_clock_voltage’:
> drivers/gpu/drm/radeon/si_dpm.c:2977:4: error: ‘else’ without a previous ‘if’
>     } else if (rdev->family == CHIP_OLAND) {
>       ^~~~
> drivers/gpu/drm/radeon/si_dpm.c:2985:4: error: ‘max_sclk’ undeclared (first use in this function)
>     max_sclk = 75000;
>     ^~~~~~~~
> drivers/gpu/drm/radeon/si_dpm.c:2985:4: note: each undeclared identifier is reported only once for each function it appears in
>
> The patch changes things inside the si_get_vce_clock_voltage function but I suppose the changes should be made a few lines bellow that to the si_apply_state_adjust_rules function after the quirks for pitcairn and hainan right?
The patch modifies si_apply_state_adjust_rules, I guess it's not applying cleanly to your kernel.
> Another thing that I'm curious about, any guesses as to why the card needs the maximum core clock limited to 750MHz on linux but seems to work fine on windows 10 at 875MHz? I've tried it on Windows 10 (all drivers downloaded via windows update) with unigine heaven + cpu-z to monitor the frequencies and it seems to go along happily with 875MHz core and 900MHz memory clocks.
There is still some bug in the driver that prevents the higher clocks from working stably on your card. We fixed some issues, and the driver was working on the hardware samples we had in house (which is why I removed the workaround), but apparently there are still some variants that do not work correctly.
--- Comment #5 from Mauro Santos registo.mailling@gmail.com --- (In reply to Alex Deucher from comment #4)
> The patch modifies si_apply_state_adjust_rules, I guess it's not applying cleanly to your kernel.
I retried with the current git tree and the patches apply properly; previously I was trying with kernel 4.9.2. I can confirm that with the provided patches the card does not hang.
I have also tried reverting commit 3a69adfe5617ceba04ad3cff0f9ccad470503fb2 on kernel 4.9.2 (keeping only the sclk limit), and that also works: no hangs with sclk=750MHz and mclk=900MHz.
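A clock-limit quirk of the kind discussed here amounts to clamping every performance level of a power state to a maximum clock. The sketch below is a simplified illustration, not the driver's code: the struct and function names are hypothetical, and the real si_apply_state_adjust_rules works on the driver's own power-state types and selects the affected boards by PCI device/revision IDs, which are not reproduced here. Leaving max_mclk at 0 mirrors the sclk-only configuration tested in comment #5.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver's SI power-state types. */
struct perf_level {
	uint32_t sclk; /* engine clock, 10 kHz units */
	uint32_t mclk; /* memory clock, 10 kHz units */
};

struct power_state {
	struct perf_level levels[5];
	size_t level_count;
};

/* Clamp each performance level to the given limits.
 * A limit of 0 means "no limit", i.e. the quirk leaves that clock alone. */
static void apply_clock_quirk(struct power_state *ps,
			      uint32_t max_sclk, uint32_t max_mclk)
{
	for (size_t i = 0; i < ps->level_count; i++) {
		if (max_sclk && ps->levels[i].sclk > max_sclk)
			ps->levels[i].sclk = max_sclk;
		if (max_mclk && ps->levels[i].mclk > max_mclk)
			ps->levels[i].mclk = max_mclk;
	}
}
```

With max_sclk = 75000 and max_mclk = 0, a level at sclk 87500 / mclk 90000 ends up at 750MHz core and 900MHz memory, the combination reported as stable above, while lower levels are left untouched.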
Martin Peres martin.peres@free.fr changed:
What       | Removed | Added
-----------+---------+---------
Status     | NEW     | RESOLVED
Resolution | ---     | MOVED
--- Comment #6 from Martin Peres martin.peres@free.fr --- -- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/784.