https://bugzilla.kernel.org/show_bug.cgi?id=75241
Bug ID: 75241
Summary: radeon_compute_pll_avivo broken in 3.15-rc3
Product: Drivers
Version: 2.5
Kernel Version: 3.15-rc3
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: clemens@ladisch.de
Regression: Yes
After upgrading from rc2 to rc3, my RS880 no longer outputs a signal that my monitor is able to show.
Bisected to this:
commit c2fb3094669a3205f16a32f4119d0afe40b1a1fd
Author: Christian König christian.koenig@amd.com
Date: Sun Apr 20 13:24:32 2014 +0200
drm/radeon: improve PLL limit handling in post div calculation
This improves the PLL parameters when we work at the limits of the allowed ranges.
Debug output with black screen:
kernel: [drm:drm_crtc_helper_set_config] attempting to set mode from userspace
kernel: [drm:drm_mode_debug_printmodeline] Modeline 29:"1600x1200" 60 162000 1600 1664 1856 2160 1200 1201 1204 1250 0x48 0x5
kernel: [drm:radeon_encoder_set_active_device] setting active device to 00000200 from 00000200 00000200 for encoder 2
kernel: [drm:drm_crtc_helper_set_mode] [CRTC:14]
kernel: [drm:radeon_atom_encoder_dpms] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
kernel: [drm:radeon_compute_pll_avivo] 162000 - 161990, pll dividers - fb: 2036.3 ref: 30, post 6
With this commit reverted (and working screen):
kernel: [drm:radeon_compute_pll_avivo] 162000 - 16106, pll dividers - fb: 1023.5 ref: 13, post 7
Christian König deathsimple@vodafone.de changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |deathsimple@vodafone.de
--- Comment #1 from Christian König deathsimple@vodafone.de ---
Thanks for the info. Could you provide the debug output of 3.14 as well? I especially need the line with radeon_compute_pll_avivo.
The problem isn't really caused by the patch you bisected; it's more an issue with the new PLL code.
Thanks in advance, Christian.
--- Comment #2 from Christian König deathsimple@vodafone.de ---
Created attachment 134611 --> https://bugzilla.kernel.org/attachment.cgi?id=134611&action=edit
Possible fix.
Please try the attached patch, it might fix the issue.
--- Comment #3 from Clemens Ladisch clemens@ladisch.de ---
3.14: 16205, pll dividers - fb: 135.8 ref: 2, post 6
With the patch: 162000 - 161990, pll dividers - fb: 271.5 ref: 4, post 6
And the patch indeed fixes this.
--- Comment #4 from Christian König deathsimple@vodafone.de ---
(In reply to Clemens Ladisch from comment #3)
Thanks a lot for the info. I'm going to push the patch with the next bugfix release.
Could you try higher values for the limit as well, and try to figure out the maximum your monitor can still handle?
It might also make sense to temporarily comment out the following line and see what parameters you get and whether they still work:
avivo_reduce_ratio(&fb_div, &ref_div, fb_div_min, ref_div_min);
And by the way: What monitor is this?
--- Comment #5 from Clemens Ladisch clemens@ladisch.de ---
It's an Eizo S2100, but this should not matter because the clocks seen by the monitor are always about the same (162MHz/75kHz/60Hz). If some were out of range, the monitor would show an error message, but with the PLL problem, the monitor does not appear to detect even an out-of-range signal. I'd guess the PLL itself cannot handle the parameters.
The largest working ref_div_max limit is 131.
with 131: 162000 - 161990, pll dividers - fb: 1425.4 ref: 21, post 6
with 132: 162000 - 162000, pll dividers - fb: 1493.3 ref: 22, post 6
avivo_reduce_ratio does not change these values.
--- Comment #6 from Christian König deathsimple@vodafone.de ---
(In reply to Clemens Ladisch from comment #5)
The PLL should be able to handle this just fine. It's just that when you increase the reference and post dividers, you can match the wanted frequency more closely, at the cost of increased jitter and reduced signal stability.
I have one monitor here that works with practically everything I give it, another that can't handle it when the frequency doesn't match precisely, and a third that doesn't like high jitter in the signal.
The trick is to find the right sweet spot where you can make everybody happy.
> The largest working ref_div_max limit is 131.
Thanks a lot. I'm going to use 128 then (just because it's a nice round number) until somebody else starts to complain that his monitor doesn't like the signal.
Christian.
Tasev Nikola tasev.stefanoska@skynet.be changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |tasev.stefanoska@skynet.be
--- Comment #7 from Tasev Nikola tasev.stefanoska@skynet.be ---
Hi
I'm still randomly getting the frequency-out-of-range problem with kernel 3.15-rc5: twice today when booting, and once yesterday after a suspend/resume. My screen is a Belinea 2080S2 (1600x1200). I first reported this as bug 75471, where the dmesg logs are attached.
Nikola
Alex Deucher alexdeucher@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |alexdeucher@gmail.com
--- Comment #8 from Alex Deucher alexdeucher@gmail.com ---
Does this patch help: http://lists.freedesktop.org/archives/dri-devel/2014-May/059469.html
--- Comment #9 from Christian König deathsimple@vodafone.de ---
(In reply to Alex Deucher from comment #8)
> Does this patch help: http://lists.freedesktop.org/archives/dri-devel/2014-May/059469.html
Unlikely. Tasev has an RS780, and on those the feedback divider is usually in the ~1000 range. This patch only moves the feedback divider limit for .5 from 14 down to 13.
Does it help if you suspend/resume again after this issue? Maybe we are seeing a crash somewhere else?
--- Comment #10 from Tasev Nikola tasev.stefanoska@skynet.be ---
I'm compiling a patched kernel now. I will test it, but to be sure I will probably need 4-5 days, because I have been using 3.15-rc5 since Sunday and the problem only appeared now.
--- Comment #11 from Tasev Nikola tasev.stefanoska@skynet.be ---
Hi
I just noticed that 3.15-rc5 boots successfully only once in 5-6 attempts. When I first tried it on Sunday, I was just lucky that it booted the first time. I suspend/resume this computer rather than shutting it down, and I did not shut it down until yesterday, after the failure on suspend/resume. Now, with or without the patch, it boots without the out-of-range frequency problem only once in 5-6 attempts. Attached are dmesg logs from a working and a non-working boot without the patch.
--- Comment #12 from Tasev Nikola tasev.stefanoska@skynet.be ---
Created attachment 136241 --> https://bugzilla.kernel.org/attachment.cgi?id=136241&action=edit
dmesg working 3.15-rc5
--- Comment #13 from Tasev Nikola tasev.stefanoska@skynet.be ---
Created attachment 136251 --> https://bugzilla.kernel.org/attachment.cgi?id=136251&action=edit
dmesg broken 3.15-rc5
--- Comment #14 from Tasev Nikola tasev.stefanoska@skynet.be ---
Hi
Today I tried a Medion 1280x1024 monitor and everything works without problems. It seems that only the combination RS880 + Belinea 2080S2 has problems with the new PLL code. I tried different values from 128 down to 90 for ref_div_max, but none of them work with my Belinea 1600x1200 screen.
--- Comment #15 from Christian König deathsimple@vodafone.de ---
(In reply to Tasev Nikola from comment #14)
> I tried different value from 128 to 90 for the ref_div_max but none work with my Belinea 1600x1200 screen.
Try going down to at least 32; that would match the behaviour of 3.14.
The problem is that in both the working and broken case the calculated parameters are the same.
Broken: [ 23.511041] [drm:radeon_compute_pll_avivo] 162000 - 161990, pll dividers - fb: 1425.4 ref: 21, post 6
Working: [ 23.560826] [drm:radeon_compute_pll_avivo] 162000 - 161990, pll dividers - fb: 1425.4 ref: 21, post 6
So I'm not really sure what else could go wrong here.
--- Comment #16 from Tasev Nikola tasev.stefanoska@skynet.be ---
Hi
I tried 64, 48 and 32 for ref_div_max. The only one that works at boot is 32, but after the first suspend/resume the out-of-range frequency problem appears again. I tried a second suspend/resume with the same result. I also tried the patch from comment 8, with the same result: boot successful, failure after resume. And you're right, the calculated parameters are again the same in both the working and broken cases. The dmesg logs after boot and after suspend/resume are attached.
--- Comment #17 from Tasev Nikola tasev.stefanoska@skynet.be ---
Created attachment 136831 --> https://bugzilla.kernel.org/attachment.cgi?id=136831&action=edit
dmesg after boot working with max divider 32
--- Comment #18 from Tasev Nikola tasev.stefanoska@skynet.be ---
Created attachment 136841 --> https://bugzilla.kernel.org/attachment.cgi?id=136841&action=edit
dmesg after suspend resume broken
--- Comment #19 from Christian König deathsimple@vodafone.de ---
From the logs, you are always getting the same set of parameters, even when you change the maximum used in the fix:
[drm:radeon_compute_pll_avivo] 162000 - 161990, pll dividers - fb: 1425.4 ref: 21, post 6
With a maximum of 32 and a post divider of 6 the ref divider shouldn't be more than 5, but it still stays at 21.
This means there is something wrong with the way you install the kernel module (or with the modification you make). Please double-check that you have the right kernel module loaded.
--- Comment #20 from Tasev Nikola tasev.stefanoska@skynet.be ---
You're right again.
It seems that just building the module doesn't work for me. I built a new kernel from source with ref_div_max 124 and it seems to work for now:
[drm:radeon_compute_pll_avivo] 162000 - 161990, pll dividers - fb: 1346.2 ref: 17, post 7
I rebooted 3 times and it always booted fine. I will test it for a few days and report whether everything works. Sorry for my previous post.
--- Comment #21 from Tasev Nikola tasev.stefanoska@skynet.be ---
Hi
With ref_div_max 124 everything works fine. If I should try another value, just let me know.
--- Comment #22 from Christian König deathsimple@vodafone.de ---
(In reply to Tasev Nikola from comment #21)
I'm going to submit a patch with value 114, just to have some more room for errors.
I know that values below 100 cause problems for another user, so if 114 works for you, we have probably found the sweet spot.
--- Comment #23 from Tasev Nikola tasev.stefanoska@skynet.be ---
With ref_div_max 114 everything works fine for me.
--- Comment #24 from Tasev Nikola tasev.stefanoska@skynet.be ---
The new line
ref_div_max = max(min(100 / post_div, ref_div_max), 1u);
works fine with my Belinea 1600x1200 screen.
Dan Merillat bugzilla@dan.merillat.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |bugzilla@dan.merillat.org
--- Comment #25 from Dan Merillat bugzilla@dan.merillat.org ---
Unfortunately, I had to set this down to 32 to work on my system.
GPU: Radeon HD 3200 (onboard, RS780)
Monitor: Viewsonic G225f
Kernel: 3.16-rc3
Nonworking: [drm:radeon_compute_pll_avivo] 229500 - 229500, pll dividers - fb: 1602.7 ref: 25, post 4
Working: [drm:radeon_compute_pll_avivo] 229500 - 229500, pll dividers - fb: 240.4 ref: 3, post 5
CRTs are getting increasingly rare - perhaps a tunable for this so us fogies with 100 pound monitors can set it where it works on our system? For me, it's a trivial patch to carry forward but setting something like drm.ref_div_tweak=32 in my grub config would be easier.
I haven't been able to use a kernel since commit 3216701 drm/radeon: rework finding display PLL numbers v2.
Christian König deathsimple@vodafone.de changed:
What               |Removed |Added
----------------------------------------------------------------------------
Attachment #134611 |0       |1
is obsolete        |        |
--- Comment #26 from Christian König deathsimple@vodafone.de ---
Created attachment 142051 --> https://bugzilla.kernel.org/attachment.cgi?id=142051&action=edit
Possible fix v2.
Does this patch fix the issue for you?
--- Comment #27 from Dan Merillat bugzilla@dan.merillat.org ---
No, I reverted to a clean 3.16-rc3 (changed the 32 back to 100) and applied the patch:
[drm:radeon_compute_pll_avivo] 229500 - 229500, pll dividers - fb: 1602.7 ref: 20, post 5
fb: is the same, ref and post are different. Same results as without the patch - the monitor wakes up out of sleep, but doesn't display anything.
I can't get the OSD to display, so I don't know what it thinks the sync rates are.
Christian König deathsimple@vodafone.de changed:
What               |Removed |Added
----------------------------------------------------------------------------
Attachment #142051 |0       |1
is obsolete        |        |
--- Comment #28 from Christian König deathsimple@vodafone.de ---
Created attachment 142281 --> https://bugzilla.kernel.org/attachment.cgi?id=142281&action=edit
Possible fix v3.
How about this one? Does it fix the issue as well?
--- Comment #29 from Dan Merillat bugzilla@dan.merillat.org ---
(In reply to Christian König from comment #28)
Sorry for the long delay in getting back to you.
3.16 stock does not work on my monitor, this patch (alone) fixes it.
I don't have a scope at my house, but at the office when this happens all signal lines on the VGA are idle.
Benjamin Herrenschmidt benh@kernel.crashing.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |benh@kernel.crashing.org
--- Comment #30 from Benjamin Herrenschmidt benh@kernel.crashing.org ---
Your latest change broke it for me. Sorry for the delay in noticing; that combination of machine and monitor was stuck in the dark ages for a while...
The combo is Radeon R9 290 (from Sapphire) and good old Apple Cinema Display 23" (1920x1200x60 fixed resolution display) on DVI.
I get a black screen with radeon. It works with Alex's amdgpu. The one liner that fixes it is in the PLL calculation:
-ref_div_max = max(min(100 / post_div, ref_div_max), 1u);
+ref_div_max = max(min(128 / post_div, ref_div_max), 1u);
I noticed other differences, though: the max fb div is 2047 with radeon and 4095 with amdgpu. But the above is the key.
This is a trace of amdgpu calculation (which works) after I sprinkled printk's around:
[ 3.471131] fb_div_min/max=4/4095 pll_flags=400
[ 3.471132] by 10 ! fb_div_min/max=40/40950
[ 3.471133] ref_div_min=2 (from 0/2)
[ 3.471133] ref_div_max=1023 (from 0/1023)
[ 3.471134] vco_min/max=600000/1200000
[ 3.471134] post_div_min/max=4/7
[ 3.471135] initial nom=153970, den=2700
[ 3.471136] reduced nom=15397, den=270
[ 3.471136] - trying post_div 4, ref_div_max=32
[ 3.471137] tentative ref_div=32m, fb_div=7299
[ 3.471137] adjusted ref_div=32m, fb_div=7299
[ 3.471138] diff=7, diff_best=-1
[ 3.471138] - trying post_div 5, ref_div_max=25
[ 3.471139] tentative ref_div=25m, fb_div=7128
[ 3.471139] adjusted ref_div=25m, fb_div=7128
[ 3.471139] diff=6, diff_best=7
[ 3.471140] - trying post_div 6, ref_div_max=21
[ 3.471140] tentative ref_div=21m, fb_div=7185
[ 3.471141] adjusted ref_div=21m, fb_div=7185
[ 3.471141] diff=6, diff_best=6
[ 3.471141] - trying post_div 7, ref_div_max=18
[ 3.471142] tentative ref_div=18m, fb_div=7185
[ 3.471142] adjusted ref_div=18m, fb_div=7185
[ 3.471150] diff=6, diff_best=6
[ 3.471150] post_div_best=7
[ 3.471151] - trying post_div 7, ref_div_max=18
[ 3.471151] tentative ref_div=18m, fb_div=7185
[ 3.471152] adjusted ref_div=18m, fb_div=7185
[ 3.471153] [drm:amdgpu_pll_compute] 153970 - 153960, pll dividers - fb: 239.5 ref: 6, post 7
Now this is with radeon (NOTE: I bumped the max fb div to the same value as amdgpu when taking this trace, but that had no effect):
[ 4.718126] fb_div_min/max=4/4095 pll_flags=410
[ 4.718126] by 10 ! fb_div_min/max=40/40950
[ 4.718127] ref_div_min=2 (from 0/2)
[ 4.718128] ref_div_max=1023 (from 0/1023)
[ 4.718128] vco_min/max=600000/1200000
[ 4.718129] post_div_min/max=4/7
[ 4.718129] initial nom=153970, den=2700
[ 4.718130] reduced nom=15397, den=270
[ 4.718130] - trying post_div 4, ref_div_max=25
[ 4.718131] tentative ref_div=25m, fb_div=5703
[ 4.718131] adjusted ref_div=25m, fb_div=5703
[ 4.718132] diff=11, diff_best=-1
[ 4.718133] - trying post_div 5, ref_div_max=20
[ 4.718133] tentative ref_div=20m, fb_div=5703
[ 4.718133] adjusted ref_div=20m, fb_div=5703
[ 4.718134] diff=11, diff_best=11
[ 4.718134] - trying post_div 6, ref_div_max=16
[ 4.718135] tentative ref_div=16m, fb_div=5474
[ 4.718135] adjusted ref_div=16m, fb_div=5474
[ 4.718136] diff=14, diff_best=11
[ 4.718136] - trying post_div 7, ref_div_max=14
[ 4.718136] tentative ref_div=14m, fb_div=5589
[ 4.718137] adjusted ref_div=14m, fb_div=5589
[ 4.718137] diff=12, diff_best=11
[ 4.718138] post_div_best=5
[ 4.718138] - trying post_div 5, ref_div_max=20
[ 4.718139] tentative ref_div=20m, fb_div=5703
[ 4.718139] adjusted ref_div=20m, fb_div=5703
[ 4.718141] [drm:radeon_compute_pll_avivo] 153970 - 153980, pll dividers - fb: 570.3 ref: 20, post 5
The modeline is:
Modeline 55:"1920x1200" 60 153970 1920 1968 2000 2080 1200 1203 1209 1235 0x48 0x9
And is consistent between the 2 drivers.
--- Comment #31 from Benjamin Herrenschmidt benh@kernel.crashing.org ---
Note: It's an LCD :-) It's one of those fixed-mode panels Apple has always been fond of, one of the very first 1920x1200 out there.
--- Comment #32 from Benjamin Herrenschmidt benh@kernel.crashing.org ---
Note 2: Catalyst and the Windows driver both work fine. Is there any way to know what formula these two use (I assume it's the same code)?
--- Comment #33 from Christian König deathsimple@vodafone.de ---
(In reply to Benjamin Herrenschmidt from comment #30)
> The combo is Radeon R9 290 (from Sapphire) and good old Apple Cinema Display 23" (1920x1200x60 fixed resolution display) on DVI.
Well, this bug report is about nearly ten-year-old hardware and was fixed almost two years ago (we just forgot to close it).
So please open a separate bug report, preferably in the FDO Bugzilla.
--- Comment #34 from Benjamin Herrenschmidt benh@kernel.crashing.org ---
Well, the Apple Cinema Display is nearly 10 years old too :-) But at least it's an LCD... I will open a new bug on FDO.
--- Comment #35 from Benjamin Herrenschmidt benh@kernel.crashing.org ---
https://bugs.freedesktop.org/show_bug.cgi?id=96789
dri-devel@lists.freedesktop.org