https://bugs.freedesktop.org/show_bug.cgi?id=43655
Bug #: 43655
Summary: Latest radeon dri driver on HD6950 with kernel 3.2 flickers
Classification: Unclassified
Product: DRI
Version: XOrg CVS
Platform: x86-64 (AMD64)
OS/Version: Linux (All)
Status: NEW
Severity: critical
Priority: medium
Component: DRM/Radeon
AssignedTo: dri-devel@lists.freedesktop.org
ReportedBy: alexandre.f.demers@gmail.com
My new HD6950 flickers like hell right from initialization. Kernel 3.1.0 works fine, but kernels 3.2.0-rc3 and later make the screen flicker (I haven't tested the versions in between yet). Also, the picture seems shifted by a bit more than half my monitor's width.
I also have an integrated HD3200 and I have no problem at all when selecting this integrated chipset over my PCI-E 6950 with the same driver and kernel combinations.
--- Comment #1 from Alex Deucher agd5f@yahoo.com 2011-12-09 06:00:10 PST --- Can you bisect? Did you update any other components (mesa, xf86-video-ati) or just the kernel?
--- Comment #2 from Alexandre Demers alexandre.f.demers@gmail.com 2011-12-11 09:00:57 UTC --- More info about this bug: I have both kernel 3.1.0 and 3.2.0-rc4 installed right now (compiled from kernel.org). I had 3.2.0-rc3 installed, before moving to rc4 to test if the bug had been solved.
Have I updated other components? Of course, I'm testing with latest versions of both mesa and xf86-video-ati (I'll have to test today's versions though). But then, it shouldn't be a problem since I'm testing the same components with both kernels.
I'll bisect kernel's commits in the next couple of days to find which one is breaking things.
--- Comment #3 from Alexandre Demers alexandre.f.demers@gmail.com 2011-12-12 01:03:23 PST --- OK, so after first testing all the RCs, I narrowed the problem down to somewhere between RC3 and RC4. Bisecting then gave me the following culprit:
commit 9b5a4d4f65e260a109eaeea8bbc8062a7c58b55e
Merge: cb35999 67589c7
Author: Linus Torvalds torvalds@linux-foundation.org
Date: Mon Nov 28 13:49:43 2011 -0800

    Merge branch 'for-3.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/gi

    * 'for-3.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
      percpu: explain why per_cpu_ptr_to_phys() is more complicated than necessa
      percpu: fix chunk range calculation
      percpu: rename pcpu_mem_alloc to pcpu_mem_zalloc
It has nothing to do with drm in itself. But it must be related at some point... I'll reset my tree tomorrow and retest to be sure by compiling just before this commit.
--- Comment #4 from Alexandre Demers alexandre.f.demers@gmail.com 2011-12-13 17:48:47 PST --- Strangely, when rebisecting, I found commit a34815b96f9a21b3a2e2912dfd0d994acd2855e3 to be the bad one... It is really near to the first one. So, I'm retesting both to be sure.
--- Comment #5 from Michel Dänzer michel@daenzer.net 2011-12-15 10:18:01 UTC --- It sounds like the problem may happen or not with a certain probability with any given kernel. You should probably test each kernel a certain number of times before declaring it as good, or the bisection may not work correctly.
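To illustrate the advice above: when a failure only shows up on some boots, a single clean boot says little about a commit. The following Python sketch (not part of the original thread) turns a series of per-boot observations into the exit codes that `git bisect run` understands: 0 for good, 1 for bad, 125 for "skip this commit". The function name, the boolean encoding of a boot and the `min_clean_boots` threshold are assumptions made for illustration only.

```python
# Exit codes understood by `git bisect run`:
# 0 = good, 1..127 except 125 = bad, 125 = cannot decide (skip).
GOOD, BAD, SKIP = 0, 1, 125


def bisect_verdict(boot_flickered, min_clean_boots=10):
    """Map repeated boot observations of one commit to a bisect exit code.

    boot_flickered: list of booleans, True when that boot showed the flicker.
    min_clean_boots: hypothetical number of clean boots required before
        a commit is trusted as good.
    """
    if any(boot_flickered):
        # One reproduction is enough: the commit contains the bug.
        return BAD
    if len(boot_flickered) < min_clean_boots:
        # Too few clean boots to tell a good commit from a lucky bad one.
        return SKIP
    return GOOD
```

Returning 125 when there is too little evidence is a deliberate choice in this sketch: it keeps an under-tested commit from being recorded as good, which is the failure mode that sends a bisection in the wrong direction.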
Alexandre Demers alexandre.f.demers@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID
--- Comment #6 from Alexandre Demers alexandre.f.demers@gmail.com 2011-12-15 14:03:15 PST --- I tested today's latest kernel version after fighting with the beast for the last couple of days. Just to be sure, I made a clean compilation and it now works properly without any problem. I'll assume for the moment it was related to something stuck in the compilation.
If anything goes wrong again, I'll reopen the bug.
Alexandre Demers alexandre.f.demers@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |
--- Comment #7 from Alexandre Demers alexandre.f.demers@gmail.com 2011-12-15 16:31:19 PST --- This one is driving me crazy. You were right, it is not reproducible every time. I have to reboot a couple of times to trigger it or to make it go away... Going back to bisection.
--- Comment #8 from Alexandre Demers alexandre.f.demers@gmail.com 2011-12-15 17:50:38 PST --- I think I've found a hint. Here's the thing:
Whatever kernel version is the first entry in my Grub's list, the problem will appear. If I select a different kernel manually or if I change the default menu entry to a different one, everything is fine. The only exception is if my first menu entry is Windows. Then, there is never any problem.
Here is what I see when selecting the first entry. First, the Grub's background stays for a moment and then it switches to the boot screen (using Ubuntu, it shows the Ubuntu loading screen). However, most of the time, it will flicker, usually showing only a couple of clear lines at the top of the screen.
If I select another entry, it switches to the kernel initialization (showing step by step what is being done) and then it switches to the boot screen only after having initialized correctly the screen. The only difference I can see between the first entries and the others is the following in my grub.cfg: set gfxpayload=$linux_gfx_mode
I suspect a bad interaction between Grub and the rest of the initialization process. Does my suspicion make sense?
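The comparison between menu entries described above can be made mechanical. The sketch below is a minimal Python illustration, not something used in the thread: it scans the text of a grub.cfg and reports, per `menuentry`, the value of any `set gfxpayload=` line found inside it. The sample entries and kernel paths are hypothetical, and the parser is deliberately simplified (it assumes each entry closes with a `}` on its own line and ignores submenus).

```python
import re

# Start of a menu entry; captures the quoted title.
MENUENTRY_RE = re.compile(r"""^\s*menuentry\s+(['"])(.*?)\1""")
# A "set gfxpayload=VALUE" line inside an entry.
GFXPAYLOAD_RE = re.compile(r"^\s*set\s+gfxpayload=(\S+)")


def gfxpayload_by_entry(grub_cfg_text):
    """Return {entry title: gfxpayload value, or None if the entry sets none}."""
    entries = {}
    current = None
    for line in grub_cfg_text.splitlines():
        start = MENUENTRY_RE.match(line)
        if start:
            current = start.group(2)
            entries[current] = None
            continue
        if current is None:
            continue
        payload = GFXPAYLOAD_RE.match(line)
        if payload:
            entries[current] = payload.group(1)
        elif line.strip() == "}":
            current = None  # end of this entry
    return entries


# Hypothetical grub.cfg fragment for illustration.
SAMPLE = """\
menuentry 'Linux 3.2.0-rc4' {
    set gfxpayload=$linux_gfx_mode
    linux /boot/vmlinuz-3.2.0-rc4
}
menuentry 'Linux 3.1.0' {
    linux /boot/vmlinuz-3.1.0
}
"""
```

Calling `gfxpayload_by_entry(SAMPLE)` lists which entries carry the setting, which is the one difference between the first entry and the others that the comment points at.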
--- Comment #9 from Michel Dänzer michel@daenzer.net 2011-12-16 01:38:38 PST --- (In reply to comment #8)
> I suspect a bad interaction between Grub and the rest of the initialization process. Does my suspicion make sense?
Quite possibly. Can you test your hypothesis by moving this line between entries?
--- Comment #10 from Alexandre Demers alexandre.f.demers@gmail.com 2011-12-16 06:51:47 PST --- (In reply to comment #9)
> (In reply to comment #8)
> > I suspect a bad interaction between Grub and the rest of the initialization process. Does my suspicion make sense?
> Quite possibly. Can you test your hypothesis by moving this line between entries?
Tested and confirmed. Whenever I added "set gfxpayload=$linux_gfx_mode", there was a really high chance of hitting this bug (near 90% of the time). Without it, I booted flawlessly.
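A rough way to read the "near 90%" figure: if each boot of an affected setup flickers independently with probability p, the chance of seeing n clean boots in a row anyway is (1 - p)^n. The Python sketch below works through that arithmetic. Independence between boots is an assumption, and the function names are made up for this illustration.

```python
def false_good_probability(p_flicker, clean_boots):
    """Chance that an affected setup shows `clean_boots` clean boots in a row."""
    if not 0.0 <= p_flicker <= 1.0:
        raise ValueError("p_flicker must be between 0 and 1")
    return (1.0 - p_flicker) ** clean_boots


def clean_boots_needed(p_flicker, max_false_good):
    """Smallest number of clean boots that pushes the false-good chance
    down to `max_false_good` or below."""
    if not 0.0 < p_flicker <= 1.0:
        raise ValueError("p_flicker must be in (0, 1]")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be in (0, 1)")
    boots = 1
    while false_good_probability(p_flicker, boots) > max_false_good:
        boots += 1
    return boots
```

With p = 0.9, two clean boots already leave only about a 1% chance of a false "good" (0.1 squared). If the bug reproduced on only half the boots, `clean_boots_needed(0.5, 0.01)` would ask for 7 boots, which shows how quickly the testing effort grows when a failure is less frequent.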
Alexandre Demers alexandre.f.demers@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Latest radeon dri driver on |Latest radeon dri driver on
                   |HD6950 with kernel 3.2      |HD6950 with GRUB set
                   |flickers                    |gfxpayload=$linux_gfx_mode
                   |                            |put the display in a
                   |                            |flickering state
--- Comment #11 from Alexandre Demers alexandre.f.demers@gmail.com 2011-12-17 10:23:06 PST --- Should I try to bisect drm driver to see if there is a version without that problem? I've had this new 6950 for less than 2 weeks, so I don't even know if it worked correctly at some point.
Peter Wang blinxwang@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |blinxwang@gmail.com
--- Comment #12 from Peter Wang blinxwang@gmail.com 2012-04-29 12:45:29 PDT --- *** Bug 49262 has been marked as a duplicate of this bug. ***
--- Comment #13 from Alexandre Demers alexandre.f.demers@gmail.com 2012-07-21 03:49:15 PDT --- It's been some time now. Since my initial report, I moved from Ubuntu to Arch. Today, it was officially announced that Arch is moving to Grub2, so I updated my setup (I had been using Grub legacy since my move to Arch). Surprise, this bug is still valid.
So I played around with the grub default options. GRUB_GFXMODE=auto works fine, but GRUB_GFXPAYLOAD_LINUX=keep triggers the bug. Removing the latter option makes everything run smoothly.
I read bug 49262 and two things are common with my setup: we are both using a 69XX radeon card and we are both using a DVI-to-VGA adaptor. I'm wondering whether the combination of card AND adaptor is the root of the problem. Before getting this 6950 card, I was using a Radeon HD 3200 IGP without any adaptor and I had no problem.
I don't have any other monitor here and I don't have a DVI or HDMI input on my monitor, so I can't tell yet. But still, what would you suggest to try to help figure out what's going on? Any comment from Alex or Michel would be appreciated. I could have access to a different monitor if I ask for it.
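For readers who want to check whether a machine has the option from the comment above enabled, here is a small Python sketch (not from the thread) that parses the text of a GRUB defaults file, typically /etc/default/grub, and reports whether GRUB_GFXPAYLOAD_LINUX is set to `keep`. The helper names are hypothetical, and the parser only handles plain `KEY=value` shell assignments.

```python
import shlex


def parse_grub_defaults(text):
    """Parse simple KEY=value lines; comments and surrounding quotes are dropped."""
    settings = {}
    for line in text.splitlines():
        # shlex handles shell-style quoting and strips '#' comments.
        for token in shlex.split(line, comments=True):
            key, sep, value = token.partition("=")
            if sep:
                settings[key] = value
    return settings


def keeps_boot_video_mode(settings):
    """True when the kernel is asked to inherit GRUB's graphics mode."""
    return settings.get("GRUB_GFXPAYLOAD_LINUX") == "keep"
```

For example, `keeps_boot_video_mode(parse_grub_defaults(open("/etc/default/grub").read()))` would tell whether the setting is active; the file path is the usual location but may differ per distribution, and the generated grub.cfg only changes after the configuration is regenerated.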
--- Comment #14 from Alexandre Demers alexandre.f.demers@gmail.com 2012-07-26 05:33:08 PDT --- This may well be the same as bug 42373. I'll try to dig into it by following the repro steps from 42373.
--- Comment #15 from Jerome Glisse glisse@freedesktop.org 2012-07-30 14:59:56 UTC --- If it's same as https://bugs.freedesktop.org/show_bug.cgi?id=42373 then patch there should fix your issue.
--- Comment #16 from Alexandre Demers alexandre.f.demers@gmail.com 2012-07-30 15:12:24 UTC --- (In reply to comment #15)
> If it's same as https://bugs.freedesktop.org/show_bug.cgi?id=42373 then patch there should fix your issue.
I'll try it as soon as I'll have time. Thank you Jerome for your follow-up.
--- Comment #17 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-04 04:58:03 UTC --- (In reply to comment #16)
> (In reply to comment #15)
> > If it's same as https://bugs.freedesktop.org/show_bug.cgi?id=42373 then patch there should fix your issue.
> I'll try it as soon as I'll have time. Thank you Jerome for your follow-up.
(In reply to comment #15)
> If it's same as https://bugs.freedesktop.org/show_bug.cgi?id=42373 then patch there should fix your issue.
It fixes it. With the patch applied, I rebooted 3 times without a problem. I went back to 3.6-rc1 (no patch) and the problem appeared; I went back to the patched kernel and there was still no problem.
Alex Deucher agd5f@yahoo.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |DUPLICATE
--- Comment #18 from Alex Deucher agd5f@yahoo.com 2012-08-04 13:26:36 UTC ---
*** This bug has been marked as a duplicate of bug 42373 ***
--- Comment #19 from Alexandre Demers alexandre.f.demers@gmail.com 2012-08-20 03:00:22 UTC --- Fixed by attachment 64759 (proposed in bug 42373 which is similar to this bug but is not the same since it is not fixed by the attachment)
Alexandre Demers alexandre.f.demers@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugs.freedesktop.org/show_bug.cgi?id=56139
Alexandre Demers alexandre.f.demers@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|DUPLICATE                   |---
--- Comment #20 from Alexandre Demers alexandre.f.demers@gmail.com --- I'm reopening this bug for two reasons:
- It is still happening with kernel 3.9.0-rc4, because attachment 64759 from bug 42373 seems never to have been pushed.
- It is not a duplicate of bug 42373, since attachment 64759 fixes the current bug but not 42373.
It would be nice to have a revised version of attachment 64759 that applies correctly on latest kernel, then to have it tested and pushed to kernel's git.
--- Comment #21 from Alexandre Demers alexandre.f.demers@gmail.com --- So I'm trying to narrow down what is going on. Kernel 3.5 + patch 64759 works OK. I'm now testing kernel's commit 81ee8fb6b52ec69eeed37fe7943446af1dccecc5 that was supposed to supersede patch 64759 in kernel 3.6. I'll see what I get.
My feeling is we are not saving/restoring an address (VM, VRAM, TTM, whatever) correctly somewhere along the path.
Alexandre Demers alexandre.f.demers@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Latest radeon dri driver on |Latest radeon dri driver on
                   |HD6950 with GRUB set        |HD6950 with GRUB set
                   |gfxpayload=$linux_gfx_mode  |"GRUB_GFXPAYLOAD_LINUX=keep
                   |put the display in a        |" put the display in a
                   |flickering state            |flickering state
--- Comment #22 from Alex Deucher agd5f@yahoo.com --- The current code should do the right thing with respect to disabling display access to vram when we reconfigure the memory controller. The current code disables memory reads but leaves the display controllers enabled while we change the MC setup. Turning off the crtcs, as the patch you mentioned does, has two problems:
1. it breaks some systems which the current method fixes
2. it defeats the purpose of GRUB_GFXPAYLOAD_LINUX=keep, which is to avoid turning off the displays for flickerless boot up. If you turn off the crtcs you have to re-init the entire display pipeline.
The problem seems to be that disabling the crtc memory reads takes longer than expected on some systems, which leads to invalid reads while the MC is being reprogrammed. One possible solution may be to leave the MC as configured by the vbios and try to put the gart aperture either before or after the location of vram in the GPU's address space.
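The last idea, leaving VRAM where the vbios mapped it and fitting the GART aperture around it, boils down to interval arithmetic. The Python sketch below illustrates only that arithmetic. It is not the radeon driver's code, and the function name, the preference for the region after VRAM and the 1 MiB alignment are all assumptions.

```python
def place_gart(vram_start, vram_size, gart_size, addr_space_size, align=1 << 20):
    """Pick a GART base that does not overlap VRAM, leaving VRAM untouched.

    All arguments are byte counts or byte addresses in the GPU's address
    space. Returns the chosen base address, or None if the aperture fits
    neither after nor before VRAM.
    """
    vram_end = vram_start + vram_size  # first address past VRAM

    # Try the region after VRAM first: round up to the next aligned address.
    after = (vram_end + align - 1) // align * align
    if after + gart_size <= addr_space_size:
        return after

    # Otherwise try the region before VRAM: the highest aligned base whose
    # aperture still ends at or below the start of VRAM.
    if gart_size <= vram_start:
        return (vram_start - gart_size) // align * align

    return None
```

As a usage example with made-up numbers: for 2 GiB of VRAM at address 0 in a 4 GiB space, a 512 MiB aperture lands right after VRAM at 2 GiB. If VRAM instead occupied the top 1 GiB of that space, the same aperture would be placed just below it.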
--- Comment #23 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to comment #22)
> The current code should do the right thing with respect to disabling display access to vram when we reconfigure the memory controller. The current code disables memory reads but leaves the display controllers enabled while we change the MC setup. Turning off the crtcs as the patch you mentioned does has two problems:
> 1. it breaks some systems which the current method fixes
> 2. it defeats the purpose of GRUB_GFXPAYLOAD_LINUX=keep which is to avoid turning off the displays for flickerless boot up. If you turn off the crtcs you have to re-init the entire display pipeline.
> The problem seems to be that disabling the crtc memory reads seems to take longer than expected on some systems which leads to invalid reads while the MC is being reprogrammed. One possible solution may be to leave the MC as configured by the vbios and try and put the gart aperture either before or after the location of vram in the GPU's address space.
I understand what you are explaining. Meanwhile, I'm bisecting to find out where it was broken again, since commit 81ee8fb6b52ec69eeed37fe7943446af1dccecc5 does indeed do what it is supposed to do (no problem when using GRUB_GFXPAYLOAD_LINUX=keep). So, somewhere between commit 81ee8fb6b52ec69eeed37fe7943446af1dccecc5 and 3.9.0-rcx, something went wrong. I'll keep you posted.
--- Comment #24 from Alexandre Demers alexandre.f.demers@gmail.com ---
62444b7462a2b98bc78d68736c03a7c4e66ba7e2 is the first bad commit
commit 62444b7462a2b98bc78d68736c03a7c4e66ba7e2
Author: Alex Deucher alexander.deucher@amd.com
Date: Wed Aug 15 17:18:42 2012 -0400

    drm/radeon: properly handle mc_stop/mc_resume on evergreen+ (v2)

    - Stop the displays from accessing the FB
    - Block CPU access
    - Turn off MC client access

    This should fix issues some users have seen, especially with UEFI, when changing the MC FB location that result in hangs or display corruption.

    v2: fix crtc enabled check noticed by Luca Tettamanti

    Signed-off-by: Alex Deucher alexander.deucher@amd.com

:040000 040000 3e0d33c9b4eda29ced814fe9a863efe63e53f14c 4932561607b160734ec1eade927a9fe18c9f3f1b M drivers
So in other words, Alex, your explanation seems to be right. Let me know if there is anything you want me to test.
--- Comment #25 from Alex Deucher agd5f@yahoo.com --- Created attachment 77332
  --> https://bugs.freedesktop.org/attachment.cgi?id=77332&action=edit
possible fix
Does this patch help?
--- Comment #26 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to comment #25)
> Created attachment 77332
> possible fix
>
> Does this patch help?
Applied on 3.9-rc5 and it doesn't help.
--- Comment #27 from Alex Deucher agd5f@yahoo.com --- (In reply to comment #26)
> Applied on 3.9-rc5 and it doesn't help.
Can you attach your dmesg output with the patch applied?
--- Comment #28 from Alexandre Demers alexandre.f.demers@gmail.com --- Created attachment 77348
  --> https://bugs.freedesktop.org/attachment.cgi?id=77348&action=edit
dmesg from 3.9-rc5 with patch
Et voilà, as asked
Alexandre Demers alexandre.f.demers@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #77348|0                           |1
        is obsolete|                            |
--- Comment #29 from Alexandre Demers alexandre.f.demers@gmail.com --- Created attachment 77350
  --> https://bugs.freedesktop.org/attachment.cgi?id=77350&action=edit
3.9-rc5 with patch and drm.debug=14
With more debug info
Alexandre Demers alexandre.f.demers@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugs.freedesktop.org/show_bug.cgi?id=57567
--- Comment #30 from Alex Deucher agd5f@yahoo.com --- does attachment 77441 help?
--- Comment #31 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to comment #30)
> does attachment 77441 help?
Still the same. 1 boot in 4 was OK; the other three showed the same kind of corruption as before.
Alexandre Demers alexandre.f.demers@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED
--- Comment #32 from Alexandre Demers alexandre.f.demers@gmail.com --- Closing this bug since it has been fixed for a few releases now.
dri-devel@lists.freedesktop.org