https://bugs.freedesktop.org/show_bug.cgi?id=59089
Priority: medium
Bug ID: 59089
Assignee: dri-devel@lists.freedesktop.org
Summary: [bisected, regression] flood of GPU fault detected in logs caused by 9af20... drm/radeon: fix fence locking in the pageflip callback
Severity: normal
Classification: Unclassified
OS: All
Reporter: alexandre.f.demers@gmail.com
Hardware: All
Status: NEW
Version: git
Component: Drivers/Gallium/r600
Product: Mesa
GPU fault detected flood in logs (dmesg, kernel, errors and everything) of the following form:

[ 533.928472] radeon 0000:03:00.0: GPU fault detected: 146 0x00335514
[ 533.928477] radeon 0000:03:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[ 533.928483] radeon 0000:03:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
From time to time, there will be an address value different from 0x00000000.
They are produced at an awful rate, producing GB of logs in no time.
Appeared in kernel 3.8.0-rcX using a cayman GPU (HD 6950).
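To put a number on the flood, the three-line fault records can be tallied with a short script. This is a minimal sketch, assuming the log lines keep exactly the form quoted in this report; `summarize_faults` and the pattern names are illustrative and not part of any radeon or Mesa tooling.

```python
import re
from collections import Counter

# Patterns follow the record form quoted in this report; other kernels may word it differently.
FAULT_RE = re.compile(r"GPU fault detected: (\d+) (0x[0-9a-fA-F]+)")
ADDR_RE = re.compile(r"VM_CONTEXT1_PROTECTION_FAULT_ADDR\s+(0x[0-9a-fA-F]+)")


def summarize_faults(log_text):
    """Return (number of fault records, Counter of fault addresses as ints)."""
    faults = 0
    addresses = Counter()
    for line in log_text.splitlines():
        if FAULT_RE.search(line):
            faults += 1
        match = ADDR_RE.search(line)
        if match:
            addresses[int(match.group(1), 16)] += 1
    return faults, addresses


# Sample copied from the report above.
sample = """\
[ 533.928472] radeon 0000:03:00.0: GPU fault detected: 146 0x00335514
[ 533.928477] radeon 0000:03:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[ 533.928483] radeon 0000:03:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
"""

count, addresses = summarize_faults(sample)
print(count, dict(addresses))
```

Feeding it saved dmesg output instead of `sample` would also show how often the address differs from 0x00000000.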
Bisecting to identify when the flood first appeared points at:

Commit: 4ac0533abaec2b83a7f2c675010eedd55664bc26
Author: Jerome Glisse <jglisse@redhat.com> 2012-12-13 12:08:11
Committer: Alex Deucher <alexander.deucher@amd.com> 2012-12-14 10:45:24
Parent: 9af20792124850369e764965690b99b20623dfc4 (drm/radeon: fix fence locking in the pageflip callback)
Branch: remotes/origin/master
Follows: v3.7-rc7
Precedes: v3.8-rc1

    drm/radeon: fix htile buffer size computation for command stream checker

    Fix the size computation of the htile buffer.

    Signed-off-by: Jerome Glisse <jglisse@redhat.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
May be related to some of the crashes seen in bug 58667.
Alexandre Demers <alexandre.f.demers@gmail.com> changed:

What      |Removed |Added
----------------------------------------------------------------------------
See Also  |        |https://bugs.freedesktop.org/show_bug.cgi?id=58667
--- Comment #1 from Alex Deucher <agd5f@yahoo.com> ---
Should be fixed with this mesa commit:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=4332f6fc185f968e7563e748b8c...
--- Comment #2 from Alexandre Demers <alexandre.f.demers@gmail.com> ---
Seems good now.
Alexandre Demers <alexandre.f.demers@gmail.com> changed:

What        |Removed  |Added
----------------------------------------------------------------------------
Status      |NEW      |RESOLVED
Resolution  |---      |FIXED
Thomas Rohloff <v10lator@myway.de> changed:

What        |Removed   |Added
----------------------------------------------------------------------------
Status      |RESOLVED  |REOPENED
Resolution  |FIXED     |---

--- Comment #3 from Thomas Rohloff <v10lator@myway.de> ---
I don't want to play the bad guy but for me this is not fixed, just reduced.
--- Comment #4 from Alexandre Demers <alexandre.f.demers@gmail.com> ---
(In reply to comment #3)
> I don't want to play the bad guy but for me this is not fixed, just reduced.
Well, I closed it because I don't have the continuous GPU fault flood happening anymore. However, I was unable to determine if there was still a GPU fault happening. This bug is really about the flood.
So, I don't have any problem in reopening it if you do experience a flood of GPU faults. I was getting GB of logs in no time.
Are you still seeing GPU faults only in some circumstances (games or specific applications), or is just opening a session (for me it's with Gnome Shell) enough? Also, keep in mind this bug is pinpointing a specific commit.
--- Comment #5 from Anthony Waters <awaters1@gmail.com> ---
I get the GPU faults starting with mesa commit

3e163a137be7f9a80ec720903c4bda028de5681f is the first bad commit
commit 3e163a137be7f9a80ec720903c4bda028de5681f
Author: Marek Olšák <maraeo@gmail.com>
Date:   Thu Nov 29 02:55:01 2012 +0100

    gallium/postprocess: share pipe_context and cso_context with the state tracker

    Using one context instead of two is more efficient and we can skip another context flush.

    Reviewed-by: Brian Paul <brianp@vmware.com>
--- Comment #6 from Alexandre Demers <alexandre.f.demers@gmail.com> ---
(In reply to comment #5)
> I get the GPU faults starting with mesa commit
>
> 3e163a137be7f9a80ec720903c4bda028de5681f is the first bad commit
> commit 3e163a137be7f9a80ec720903c4bda028de5681f
> Author: Marek Olšák <maraeo@gmail.com>
> Date:   Thu Nov 29 02:55:01 2012 +0100
>
>     gallium/postprocess: share pipe_context and cso_context with the state
>     tracker
>
>     Using one context instead of two is more efficient and we can skip
>     another context flush.
>
>     Reviewed-by: Brian Paul <brianp@vmware.com>
Is it a flood? Other commits may create GPU faults, but it shouldn't flood your logs. I think it would be better to track different sources of GPU faults in different bugs.
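One way to make "flood" versus "occasional fault" concrete is to compute a fault rate from the dmesg timestamps. The following is only a sketch under stated assumptions: the timestamp format is the one shown in the original report, the 10 faults per second threshold is an arbitrary illustrative value, and `fault_rate` and `is_flood` are hypothetical helper names, not existing tools.

```python
import re

# Matches a dmesg-style "[ seconds.micros]" prefix on a "GPU fault detected" line.
TS_FAULT_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s.*GPU fault detected")


def fault_rate(log_text):
    """Faults per second between the first and last fault line (0.0 if fewer than two)."""
    stamps = []
    for line in log_text.splitlines():
        match = TS_FAULT_RE.search(line)
        if match:
            stamps.append(float(match.group(1)))
    if len(stamps) < 2:
        return 0.0
    span = stamps[-1] - stamps[0]
    return (len(stamps) - 1) / span if span > 0 else float("inf")


def is_flood(log_text, threshold_per_sec=10.0):
    """Assumed cutoff: treat anything at or above the threshold as a flood."""
    return fault_rate(log_text) >= threshold_per_sec


# Synthetic timestamps for illustration only, not taken from a real log.
demo = "\n".join(
    f"[ {t:.6f}] radeon 0000:03:00.0: GPU fault detected: 146 0x00335514"
    for t in (1.0, 1.5, 2.0)
)
print(fault_rate(demo), is_flood(demo))
```

Because the rate is averaged from the first fault to the last, a short burst inside a long, mostly quiet log is diluted; a sliding window would be needed to catch that case.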
--- Comment #7 from Anthony Waters <awaters1@gmail.com> ---
I would consider it a flood, the message continually appears until glxgears is exited. Can you confirm whether 3e163a137be7f9a80ec720903c4bda028de5681f in mesa stops all of the GPU faults? If it does I will open another bug report seeing as it may be different.
--- Comment #8 from Alexandre Demers <alexandre.f.demers@gmail.com> ---
(In reply to comment #7)
> I would consider it a flood, the message continually appears until glxgears
> is exited. Can you confirm whether 3e163a137be7f9a80ec720903c4bda028de5681f
> in mesa stops all of the GPU faults? If it does I will open another bug
> report seeing as it may be different.
Neither glxgears nor Heroes of Newerth has produced a GPU fault flood since the fix was applied in mesa. If you are still experiencing a flood when glxgears is running, I'm pretty sure it is not the same thing under the hood. Which makes me think: what kernel version are you using? Did you test with the latest mesa version since the fix was pushed to git? Which GPU do you have?
I'm running latest kernel from Linus' git, latest mesa git and I'm using an HD 6950.
--- Comment #9 from Anthony Waters <awaters1@gmail.com> ---
Mesa is at latest git, however, my kernel wasn't at Linus' git, so that may be the issue, using HD 6950. I will try the newest kernel and if it doesn't work I'll create a new bug report.
--- Comment #10 from Alexandre Demers <alexandre.f.demers@gmail.com> ---
(In reply to comment #9)
> Mesa is at latest git, however, my kernel wasn't at Linus' git, so that may
> be the issue, using HD 6950. I will try the newest kernel and if it doesn't
> work I'll create a new bug report.
Ok, let me know. Meanwhile, I'll test something on my side to see if your bad commit could be related to another bug I'm experiencing with Tropics.
--- Comment #11 from Alexandre Demers <alexandre.f.demers@gmail.com> ---
(In reply to comment #9)
> Mesa is at latest git, however, my kernel wasn't at Linus' git, so that may
> be the issue, using HD 6950. I will try the newest kernel and if it doesn't
> work I'll create a new bug report.
Also, you could see whether updating libdrm and/or the ddx driver helps; there were some commits for both of them in the last couple of weeks.
--- Comment #12 from Anthony Waters <awaters1@gmail.com> ---
I have been able to get rid of the flood; the message still appears sometimes in dmesg, so there is some other bug that exists.
--- Comment #13 from Alexandre Demers <alexandre.f.demers@gmail.com> ---
(In reply to comment #12)
> I have been able to get rid of the flood, the message appears sometimes in
> dmesg so there is some other bug that exists.
I agree with you on that point. Thomas and I are also experiencing GPU faults in some cases.
Alexandre Demers <alexandre.f.demers@gmail.com> changed:

What        |Removed   |Added
----------------------------------------------------------------------------
Status      |REOPENED  |RESOLVED
Resolution  |---       |FIXED

--- Comment #14 from Alexandre Demers <alexandre.f.demers@gmail.com> ---
Closing since original bug/commit was fixed. The remaining GPU faults must have a different root.