https://bugzilla.kernel.org/show_bug.cgi?id=205277
Bug ID: 205277 Summary: [amd powerplay] vega10: soc voltage for power state 7 is not changed by overdrive. Product: Drivers Version: 2.5 Kernel Version: 5.4.0-rc3 Hardware: All OS: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Video(DRI - non Intel) Assignee: drivers_video-dri@kernel-bugs.osdl.org Reporter: pelle@vangils.xyz Regression: No
Created attachment 285583 --> https://bugzilla.kernel.org/attachment.cgi?id=285583&action=edit debug patch
Using Overdrive to set voltage and frequency on a vega10 card does not set the voltage for the highest power state (state 7).
To reproduce:
boot with kernel parameter 'amdgpu.ppfeaturemask=0xffffffff'
cat pp_od_clk_voltage on boot:
OD_SCLK: 0: 852Mhz 800mV 1: 991Mhz 900mV 2: 1138Mhz 950mV 3: 1269Mhz 1000mV 4: 1312Mhz 1050mV 5: 1474Mhz 1100mV 6: 1538Mhz 1150mV 7: 1590Mhz 1200mV OD_MCLK: 0: 167Mhz 800mV 1: 500Mhz 800mV 2: 700Mhz 900mV 3: 800Mhz 950mV OD_RANGE: SCLK: 852MHz 2400MHz MCLK: 167MHz 1500MHz VDDC: 800mV 1200mV
set pp_od_clk_voltage:
# cd /sys/class/drm/card0/device
# echo "s 2 1138 910" > pp_od_clk_voltage # echo "s 3 1269 920" > pp_od_clk_voltage # echo "s 4 1312 930" > pp_od_clk_voltage # echo "s 5 1474 940" > pp_od_clk_voltage # echo "s 6 1538 950" > pp_od_clk_voltage # echo "s 7 1590 980" > pp_od_clk_voltage
# echo "c" > pp_od_clk_voltage
cat pp_od_clk_voltage:
OD_SCLK: 0: 852Mhz 800mV 1: 991Mhz 900mV 2: 1138Mhz 910mV 3: 1269Mhz 920mV 4: 1269Mhz 920mV 5: 1474Mhz 940mV 6: 1538Mhz 950mV 7: 1590Mhz 980mV OD_MCLK: 0: 167Mhz 800mV 1: 500Mhz 800mV 2: 700Mhz 900mV 3: 800Mhz 950mV OD_RANGE: SCLK: 852MHz 2400MHz MCLK: 167MHz 1500MHz VDDC: 800mV 1200mV
This all seems fine. The voltages are set for all the power states. But when stressing the gpu it still uses it's default of 1200mV for power state 7, as can be observed in amdgpu_pm_info:
# cat /sys/kernel/debug/dri/0/amdgpu_pm_info ... GFX Clocks and Power: 800 MHz (MCLK) 1484 MHz (SCLK) 1269 MHz (PSTATE_SCLK) 700 MHz (PSTATE_MCLK) 1200 mV (VDDGFX) 260.0 W (average GPU) ...
Using the attached patch to print the voltages that are actually being set to the vddc_lookup_table the output to dmesg is:
... [ 521.364502] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 0 vddc: 800 [ 521.364504] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 1 vddc: 900 [ 521.364504] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 2 vddc: 910 [ 521.364505] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 3 vddc: 920 [ 521.364505] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 4 vddc: 920 [ 521.364506] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 5 vddc: 940 [ 521.364506] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 6 vddc: 950 ...
_Not_ printing state 7. So it appears the vddc value for state 7 is never set in the new lookup table.
https://bugzilla.kernel.org/show_bug.cgi?id=205277
--- Comment #1 from Pelle van Gils (pelle@vangils.xyz) --- Created attachment 285585 --> https://bugzilla.kernel.org/attachment.cgi?id=285585&action=edit proposed patch
added proposed fix
https://bugzilla.kernel.org/show_bug.cgi?id=205277
--- Comment #2 from Pelle van Gils (pelle@vangils.xyz) --- (In reply to Pelle van Gils from comment #1)
Created attachment 285585 [details] proposed patch
added proposed fix
with this patch applied (and the debug patch) dmesg output is: ... [ 107.149105] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 0 vddc: 800 [ 107.149107] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 1 vddc: 900 [ 107.149108] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 2 vddc: 910 [ 107.149109] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 3 vddc: 920 [ 107.149109] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 4 vddc: 930 [ 107.149110] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 5 vddc: 940 [ 107.149111] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 6 vddc: 950 [ 107.149112] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 7 vddc: 980 ...
And the soc voltage under stress stays at the set value:
# cat /sys/kernel/debug/dri/0/amdgpu_pm_info ... GFX Clocks and Power: 800 MHz (MCLK) 1541 MHz (SCLK) 1269 MHz (PSTATE_SCLK) 700 MHz (PSTATE_MCLK) 981 mV (VDDGFX) 161.0 W (average GPU) ...
https://bugzilla.kernel.org/show_bug.cgi?id=205277
haro41@gmx.de changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |haro41@gmx.de
--- Comment #3 from haro41@gmx.de --- Did you debug this issue? I think the problem could be outside this code.
I would outcomment the if-statement following for-loop in your proposed patch, because otherwise 'i' points outside the array boundarys here.
https://bugzilla.kernel.org/show_bug.cgi?id=205277
stefanspr94@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |stefanspr94@gmail.com
--- Comment #4 from stefanspr94@gmail.com --- (In reply to haro41 from comment #3)
Did you debug this issue? I think the problem could be outside this code.
I would outcomment the if-statement following for-loop in your proposed patch, because otherwise 'i' points outside the array boundarys here.
I think the if statement is fine as both od_vddc_lookup_table->entries[] and podn_vdd_dep->entries[] both hold MAX_REGULAR_DPM_NUMBER members, which is 8, so accessing entries[7] is not out of bounds.
Btw, the patch works for me aswell. Card behaves as it should after loading my pp_table, which was not the case before.
https://bugzilla.kernel.org/show_bug.cgi?id=205277
Pelle van Gils (pelle@vangils.xyz) changed:
What |Removed |Added ---------------------------------------------------------------------------- Attachment #285585|0 |1 is obsolete| |
https://bugzilla.kernel.org/show_bug.cgi?id=205277
--- Comment #5 from haro41@gmx.de --- In the (now obsolete) proposed code, the variable 'i' will become 8, when the for-loop is done. The following if-statement will access something outside the array memory.
Something like this may work without problems, but it can trigger a new problem too.
https://bugzilla.kernel.org/show_bug.cgi?id=205277
--- Comment #6 from Pelle van Gils (pelle@vangils.xyz) --- Created attachment 285633 --> https://bugzilla.kernel.org/attachment.cgi?id=285633&action=edit proposed patch v2
https://bugzilla.kernel.org/show_bug.cgi?id=205277
--- Comment #7 from Pelle van Gils (pelle@vangils.xyz) --- (In reply to haro41 from comment #3)
Did you debug this issue? I think the problem could be outside this code.
I would outcomment the if-statement following for-loop in your proposed patch, because otherwise 'i' points outside the array boundarys here.
Thank you for your reply. I have uploaded a new patch with your suggestion.
It looks to me now that this is not so much a bug but intended beviour. I would still like to see this changed though.
https://bugzilla.kernel.org/show_bug.cgi?id=205277
Pelle van Gils (pelle@vangils.xyz) changed:
What |Removed |Added ---------------------------------------------------------------------------- Kernel Version|5.4.0-rc3 |5.4.0-rc4
https://bugzilla.kernel.org/show_bug.cgi?id=205277
--- Comment #8 from haro41@gmx.de --- I have to agree, the code in its current state, only allows overvolting for dpm level 7.
Since the highest performance level is the most interesting one, if it comes to undervolting, energy saving and performance maximization, that should be fixed asap.
Thanks for your effort, btw.
https://bugzilla.kernel.org/show_bug.cgi?id=205277
Alex Deucher (alexdeucher@gmail.com) changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |alexdeucher@gmail.com
--- Comment #9 from Alex Deucher (alexdeucher@gmail.com) --- (In reply to Pelle van Gils from comment #6)
Created attachment 285633 [details] proposed patch v2
Applied. thanks!
https://bugzilla.kernel.org/show_bug.cgi?id=205277
Pelle van Gils (pelle@vangils.xyz) changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution|--- |CODE_FIX
dri-devel@lists.freedesktop.org