https://bugzilla.kernel.org/show_bug.cgi?id=213569
Bug ID: 213569
Summary: Amdgpu temperature reaching dangerous levels
Product: Drivers
Version: 2.5
Kernel Version: 5.12, 5.11
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: blocking
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: martin.tk@gmx.com
Regression: No

Ever since upgrading to 5.11, and later 5.12, the fan speed on my Radeon RX550 has been erratic, causing the temperature to reach dangerous levels.
sensors output:
amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:      825.00 mV
fan1:        200 RPM  (min = 0 RPM, max = 3500 RPM)
edge:        +69.0°C  (crit = +97.0°C, hyst = -273.1°C)
power1:      7.03 W  (cap = 36.00 W)
I'm afraid it'll eventually kill my gpu.
I've already reported another bug for 5.11: https://bugzilla.kernel.org/show_bug.cgi?id=212107
From what I gather there were changes in fan control in 5.11. Is it possible to disable those changes? There were no issues on 5.10: the fan ran at roughly 1000 RPM, and the card was cool and quiet.
The behaviour from 5.11 onward is dangerous and can cause hardware damage.
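For anyone comparing readings across kernel versions, the sensors output in this report can be turned into numbers with a small script. This is only a rough sketch: the function name `parse_sensors` and the regular expressions are illustrative assumptions, and it only understands lines shaped like the ones quoted in this report, not every format `sensors` can print.

```python
import re

# Sample taken from the report above, one reading per line.
SAMPLE = """\
amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:      825.00 mV
fan1:        200 RPM  (min = 0 RPM, max = 3500 RPM)
edge:        +69.0°C  (crit = +97.0°C, hyst = -273.1°C)
power1:      7.03 W  (cap = 36.00 W)
"""

# "label: <number><unit>" at the start of a line (hypothetical helper patterns).
_READING = re.compile(
    r"^\s*(?P<label>\w+):\s*(?P<value>[+-]?\d+(?:\.\d+)?)\s*(?P<unit>\S+)"
)
# "name = <number>" pairs inside the trailing parentheses.
_LIMIT = re.compile(r"(?P<name>\w+)\s*=\s*(?P<value>[+-]?\d+(?:\.\d+)?)")


def parse_sensors(text):
    """Return {label: {"value": float, "unit": str, "limits": {name: float}}}."""
    readings = {}
    for line in text.splitlines():
        match = _READING.match(line)
        if not match:
            continue  # chip name, "Adapter:" line, or blank line
        rest = line[match.end():]
        limits = {m["name"]: float(m["value"]) for m in _LIMIT.finditer(rest)}
        readings[match["label"]] = {
            "value": float(match["value"]),
            "unit": match["unit"],
            "limits": limits,
        }
    return readings


if __name__ == "__main__":
    for label, reading in parse_sensors(SAMPLE).items():
        print(label, reading["value"], reading["unit"], reading["limits"])
```

Logging the `fan1` and `edge` values this way on 5.10 and on 5.11 or later under the same load would make the difference in fan behaviour easier to show.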
miloog (mileikasjos@mailbox.org) changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |mileikasjos@mailbox.org

--- Comment #1 from miloog (mileikasjos@mailbox.org) ---
I can confirm.
But in a different scenario. I'm using Debian bullseye with the LTS kernel and the latest amdgpu firmware. I don't change any fan control settings.
5.10.44 and 5.10.45 work fine, but with 5.10.46, if I only start sway (a Wayland window manager), my GPU usage is at 100% without doing anything.
It's a Vega 56.
--- Comment #2 from Martin (martin.tk@gmx.com) ---
In my case it was watching a video that made the GPU reach 70°C.
James (mrjameshennig@gmail.com) changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |mrjameshennig@gmail.com

--- Comment #3 from James (mrjameshennig@gmail.com) ---
This is a legitimate bug that is present starting with 5.12.13; the issue is said to have been fixed as of 5.13-rc8. I wanted to reassure you that a 70°C edge temperature cannot damage that GPU. Note "crit = +97.0°C", which is the throttle temperature.
The computer should shut down at the "emerg" temperature, which is not present in your sensors output but should be 5.0°C above "crit" for your GPU.
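To make the thresholds above concrete, here is a minimal sketch of how a reading compares against them. The function name `classify_edge_temp` is hypothetical, and the 5.0°C margin used when no "emerg" value is reported is only the assumption stated in this comment, not something read from the driver.

```python
def classify_edge_temp(edge_c, crit_c, emerg_c=None, emerg_margin_c=5.0):
    """Classify an edge temperature against the crit/emerg thresholds.

    If no emergency threshold is reported, assume it sits emerg_margin_c
    above crit (the assumption made in this comment).
    """
    if emerg_c is None:
        emerg_c = crit_c + emerg_margin_c
    if edge_c >= emerg_c:
        return "emergency"   # the machine is expected to shut down here
    if edge_c >= crit_c:
        return "throttling"  # the GPU is expected to throttle here
    return "ok"


def headroom_to_crit(edge_c, crit_c):
    """Degrees Celsius left before the throttle temperature is reached."""
    return crit_c - edge_c


if __name__ == "__main__":
    # Values from the sensors output in the original report.
    print(classify_edge_temp(69.0, 97.0))
    print(headroom_to_crit(69.0, 97.0))
```

With the reported values (edge 69.0°C, crit 97.0°C) the reading is still 28°C below the throttle point.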
Frank Kruger (fkrueger@mailbox.org) changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |fkrueger@mailbox.org

--- Comment #4 from Frank Kruger (fkrueger@mailbox.org) ---
(In reply to miloog from comment #1)
> I can confirm.
> But in a different scenario. I'm using Debian bullseye with the LTS kernel and the latest amdgpu firmware. I don't change any fan control settings.
> 5.10.44 and 5.10.45 work fine, but with 5.10.46, if I only start sway (a Wayland window manager), my GPU usage is at 100% without doing anything.
> It's a Vega 56.

You are probably hit by a recent regression introduced with kernels 5.10.46 and 5.12.13 (cf. https://bugzilla.kernel.org/show_bug.cgi?id=213561), for which patches are on their way (https://lists.freedesktop.org/archives/amd-gfx/2021-June/065612.html). This is not related to the original bug report here, I presume.
--- Comment #5 from Martin (martin.tk@gmx.com) ---
(In reply to James from comment #3)
> This is a legitimate bug that is present starting with 5.12.13; the issue is said to have been fixed as of 5.13-rc8. I wanted to reassure you that a 70°C edge temperature cannot damage that GPU. Note "crit = +97.0°C", which is the throttle temperature.
> The computer should shut down at the "emerg" temperature, which is not present in your sensors output but should be 5.0°C above "crit" for your GPU.

Thank you for the explanation. I've never seen 70°C on my GPU before, so to me it looked scary.
Before those changes landed in 5.11, the usual temperature on my GPU was around 40°C. The fan ran at around 1000 RPM, which on my GPU doesn't produce any perceptible sound.
Martin (martin.tk@gmx.com) changed:
What          |Removed                     |Added
----------------------------------------------------------------------------
Kernel Version|5.12, 5.11                  |5.13