https://bugzilla.kernel.org/show_bug.cgi?id=112491
Bug ID: 112491
Summary: Radeon: HD 7400G / A4-4355M System overheats with 3D graphics active.
Product: Drivers
Version: 2.5
Kernel Version: 4.2
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: djtm@gmx.net
Regression: No
I've already tried several radeon boot flags, such as turning DPM and BAPM on and off, without success. The problem does not occur with the fglrx or Windows driver. I'm using glamor acceleration. I'm using an external 4K display at Full HD resolution; the internal laptop display is disabled.
I've read similar bugs here: https://bugzilla.kernel.org/show_bug.cgi?id=63101 https://bugzilla.kernel.org/show_bug.cgi?id=68571 https://bugs.freedesktop.org/show_bug.cgi?id=73053
But I'm not sure which of the tips still apply to the current tree.
The results, playing a simple car racing game:
- dpm=0: 2 laps, then the system shuts down due to thermal zone overheating (a regular shutdown, not a sudden one).
- dpm=-1: ~1 lap, then the system suddenly turns off hard without warning.
- bapm=1: ~1 lap, then a hard turn-off.
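To check which of these radeon boot flags actually took effect, the values can be read back from sysfs. A minimal sketch, assuming the parameters are exposed read-only under /sys/module/radeon/parameters; show_radeon_params and the MODROOT override are hypothetical names added here for illustration and testing:

```bash
# Print name=value for each requested radeon module parameter.
# Assumption: parameters live under /sys/module/radeon/parameters.
# MODROOT (hypothetical) lets the sysfs module root be overridden for testing.
show_radeon_params() {
    local dir="${MODROOT:-/sys/module}/radeon/parameters"
    local p
    for p in "$@"; do
        if [ -r "$dir/$p" ]; then
            printf '%s=%s\n' "$p" "$(cat "$dir/$p")"
        else
            # Parameter file missing or unreadable (module not loaded, typo, etc.)
            printf '%s=<not available>\n' "$p"
        fi
    done
}
```

Example: show_radeon_params dpm bapm runpm hard_reset. Running it alongside cat /proc/cmdline makes it easier to confirm that a flag was passed and picked up as intended.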
--- Comment #1 from Dionisus Torimens djtm@gmx.net --- Created attachment 203661 --> https://bugzilla.kernel.org/attachment.cgi?id=203661&action=edit dmesg 4.2
--- Comment #2 from Dionisus Torimens djtm@gmx.net --- Created attachment 203671 --> https://bugzilla.kernel.org/attachment.cgi?id=203671&action=edit dmesg 4.2 bapm=1 hang on radeon load.
This is a case where the system hung during boot with bapm active. That doesn't always happen, though.
--- Comment #3 from Dionisus Torimens djtm@gmx.net --- Also note I can use the system pretty much without problems if I don't use 3D.
--- Comment #4 from Alex Deucher alexdeucher@gmail.com --- 3D is used for everything (even "2D" apps). Is the problem specific to this car racing game? It might be a bug in the mesa driver which causes the GPU to lock up. Also make sure the fan and heatsink are free of dust.
--- Comment #5 from Dionisus Torimens djtm@gmx.net --- No, the problem occurs with any 3D load, but usually not with 2D load. Maybe even glxgears is enough.
Yes, I've cleaned the fan and heatsink area. It didn't help. :/
--- Comment #6 from Dionisus Torimens djtm@gmx.net --- Actually, I forgot to mention that I used to have problems without active 3D use as well, but I now switch the DPM profile to battery during boot, and that mostly solved that part. High CPU use by itself is not a problem either. When not using DPM, I didn't switch to the battery/low-power profile.
--- Comment #7 from Dionisus Torimens djtm@gmx.net --- Created attachment 203781 --> https://bugzilla.kernel.org/attachment.cgi?id=203781&action=edit Temperature Graph 1
I'm having some doubts about the temperature hypothesis now. With DPM active, temperatures during boot seem to be higher than during the freezes or reboots (visible as gaps in the graph; measurements are taken at 5-second intervals).
So with DPM there seems to be a different issue than without it. I've tried disabling HyperZ, to no avail.
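A temperature graph like this one can be produced with a simple logger. A minimal sketch, assuming the radeon driver registers a hwmon device named "radeon" whose temp1_input reports millidegrees Celsius; the function names are hypothetical, and the 5-second interval matches the measurements described above:

```bash
# Find the hwmon directory whose "name" file reads "radeon".
# Assumption: radeon registers its hwmon device under that name.
find_radeon_hwmon() {
    local root="${1:-/sys/class/hwmon}"
    local dir
    for dir in "$root"/hwmon*; do
        if [ -r "$dir/name" ] && [ "$(cat "$dir/name")" = "radeon" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Print one "unix-timestamp degrees-C" line; temp1_input is in millidegrees.
log_temp_once() {
    local dir="$1"
    local milli
    milli=$(cat "$dir/temp1_input") || return 1
    echo "$(date +%s) $((milli / 1000))"
}

# Log every 5 seconds until reading the sensor fails.
log_temp_loop() {
    local dir
    dir=$(find_radeon_hwmon "$@") || { echo "no radeon hwmon found" >&2; return 1; }
    while log_temp_once "$dir"; do
        sleep 5
    done
}
```

Example: log_temp_loop >> gpu-temp.log, started before the 3D load. Each line carries a Unix timestamp, so a freeze or hard turn-off shows up as a gap or as the point where the log stops; the last few lines may be lost if they were not yet written to disk.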
--- Comment #8 from Dionisus Torimens djtm@gmx.net --- Created attachment 203791 --> https://bugzilla.kernel.org/attachment.cgi?id=203791&action=edit Temperature Graph BAPM=1
With BAPM, overheating looks like the more likely cause. But I can't reproduce the lockups/sudden reboots at all at the moment.
--- Comment #9 from Dionisus Torimens djtm@gmx.net --- Booting with radeon.hard_reset=1 I get this error at the point where I usually get a hang: GPU lockup (current fence id 0x0000000000004aa1 last fence id 0x0000000000004aa8 on ring 0)
--- Comment #10 from Dionisus Torimens djtm@gmx.net --- Created attachment 203801 --> https://bugzilla.kernel.org/attachment.cgi?id=203801&action=edit Temperature Graph Hard off radeon.fastfb=1 radeon.pcie_gen2=1 radeon.audio=0 radeon.hard_reset=1
OK, I reproduced the hard turn-off. It reaches almost 110°C.
--- Comment #11 from Dionisus Torimens djtm@gmx.net --- If you'd like any more information, please let me know now. There doesn't seem to be much interest in finding the problem, so I'll have to switch back to the proprietary driver.
(It turned out that I had used a different version of the game, which crashed the card instead. By the way, I also get the hard turn-off without any parameters.)
--- Comment #12 from Dionisus Torimens djtm@gmx.net --- [Wonderful, fglrx doesn't work at all with kernel 4.2... ...]
The fastest way to get the system to overheat is to:
- enable redshift (or probably xgamma)
- disable vsync
- set DPM to performance*: echo performance | sudo tee /sys/class/drm/card0/device/power_dpm_state
- activate CPU turbo mode*: echo 1 | sudo tee /sys/devices/system/cpu/cpufreq/boost
- activate BAPM
- activate DRI3
- stay in the game menu (tested with blazrush, kotor)

* For these items the effect is less certain or less serious.
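The two sysfs steps marked with * can be wrapped so that the previous values are saved and can be restored after testing. A sketch, assuming card0 is the radeon device and that the cpufreq driver exposes a global boost file (not every driver does); SYSFS, apply_stress_settings and restore_settings are hypothetical names, with SYSFS defaulting to /sys:

```bash
# Paths relative to the sysfs root, taken from the steps above.
DPM_STATE_REL=class/drm/card0/device/power_dpm_state
BOOST_REL=devices/system/cpu/cpufreq/boost

# Save the current DPM state and boost flag, switch to performance + boost,
# and print the saved values ("state boost") for a later restore.
apply_stress_settings() {
    local sysfs="${SYSFS:-/sys}"
    local old_state old_boost
    old_state=$(cat "$sysfs/$DPM_STATE_REL") || return 1
    old_boost=$(cat "$sysfs/$BOOST_REL") || return 1
    echo performance > "$sysfs/$DPM_STATE_REL" || return 1
    echo 1 > "$sysfs/$BOOST_REL" || return 1
    echo "$old_state $old_boost"
}

# Write back the values printed by apply_stress_settings.
restore_settings() {
    local sysfs="${SYSFS:-/sys}"
    echo "$1" > "$sysfs/$DPM_STATE_REL"
    echo "$2" > "$sysfs/$BOOST_REL"
}
```

Example, from a root shell: saved=$(apply_stress_settings), run the game, then restore_settings $saved (unquoted, so the two saved values become two arguments).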
Things that help to avoid overheating:
- boot with nomodeset radeon.modeset=1
Running 'vblank_mode=0 glmark2 --run-forever' does not cause a hang; some of the tests seem less demanding, so the temperature goes up and down.
GALLIUM_HUD=temperature is helpful for watching how fast the temperature climbs.
Dionisus Torimens djtm@gmx.net changed:
What | Removed | Added
Attachment #203781 is obsolete | 0 | 1
Attachment #203781 description | Temperature Graph 1 | Temperature Graph 1 (crash not heat related in this case)
--- Comment #13 from Dionisus Torimens djtm@gmx.net --- Created attachment 206301 --> https://bugzilla.kernel.org/attachment.cgi?id=206301&action=edit Temperature Graph with nomodeset radeon.modeset=1 and redshift
--- Comment #14 from Dionisus Torimens djtm@gmx.net --- Created attachment 206311 --> https://bugzilla.kernel.org/attachment.cgi?id=206311&action=edit Temperature Graph with nomodeset radeon.modeset=1 without redshift
Dionisus Torimens djtm@gmx.net changed:
What | Removed | Added
Summary | Radeon: HD 7400G / A4-4355M System overheats with 3D graphics active. | Radeon: HD 7400G / A4-4355M System overheats with active graphics card use.
Alias | | radeon_heat
--- Comment #15 from Dionisus Torimens djtm@gmx.net --- The problems get worse as summer approaches. They are still present in 4.6. The system also shuts down hard during VDPAU video playback. The odd thing is that the hard shutdowns occur at a lower temperature with DPM active than without it.
Any hints for debugging this?
Vedran Miletić rivanvx@gmail.com changed:
What | Removed | Added
CC | | rivanvx@gmail.com
--- Comment #16 from Vedran Miletić rivanvx@gmail.com --- So, OpenGL and VDPAU crash the GPU. Any chance you could also test OpenCL? Not sure if [1] works on R600 OpenCL, but [2] does.
[1] https://github.com/matszpk/clgpustress
[2] https://github.com/lachesis/scallion
--- Comment #17 from Dionisus Torimens djtm@gmx.net --- I think I've solved it. The kernel parameter
radeon.runpm=0
seems to work around the issue. The performance is degraded, but the temperatures mostly stay below 80°C. Generally the system appears to stay much cooler.
--- Comment #18 from Alex Deucher alexdeucher@gmail.com --- (In reply to Dionisus Torimens from comment #17)
> I think I've solved it. The kernel parameter
> radeon.runpm=0
> seems to work around the issue. The performance is degraded, but the temperatures mostly stay below 80°C. Generally the system appears to stay much cooler.
Is this a multi-GPU notebook? That option only affects Hybrid laptops with multiple GPUs.
--- Comment #19 from Dionisus Torimens djtm@gmx.net --- Ok, true, the issue is still there. No, not multi-GPU.
dri-devel@lists.freedesktop.org