On Mon, Mar 24, 2014 at 4:04 PM, Bjorn Helgaas bhelgaas@google.com wrote:
On Sat, Mar 22, 2014 at 9:18 AM, Andy Lutomirski luto@amacapital.net wrote:
On Fri, Mar 21, 2014 at 9:37 AM, Bjorn Helgaas bhelgaas@google.com wrote:
On Fri, Mar 21, 2014 at 9:49 AM, Andy Lutomirski luto@amacapital.net wrote:
On Fri, Mar 21, 2014 at 7:41 AM, Alex Deucher alexdeucher@gmail.com wrote:
On Thu, Mar 20, 2014 at 10:17 PM, Andy Lutomirski luto@amacapital.net wrote:
My system works on a 3.13 Fedora kernel. It does not work on a more-or-less identically configured 3.14-rc7+ kernel. The symptom is that the Plymouth password prompt flashes and then the screen goes blank. Hitting escape brings back the text console, and all is well until X tries to start. Then I get a blank screen. Running killall -9 Xorg from ssh causes these errors to be logged:
[ 226.239747] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 226.239751] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD34 (len 55, WS 0, PS 0) @ 0xCD57
[ 231.241492] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 231.241496] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD6C (len 62, WS 0, PS 0) @ 0xCD88
[ 236.243111] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 236.243115] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD6C (len 62, WS 0, PS 0) @ 0xCD88
[ 241.244625] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 241.244628] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD6C (len 62, WS 0, PS 0) @ 0xCD88
lspci -vvvxxxnn on 3.14-rc7+ says:
09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] [1002:6779] (rev ff) (prog-if ff)
        !!! Unknown header type 7f
        Kernel driver in use: radeon
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

09:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos HDMI Audio [Radeon HD 6400 Series] [1002:aa98] (rev ff) (prog-if ff)
        !!! Unknown header type 7f
        Kernel driver in use: snd_hda_intel
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
(oops!)
On 3.13, it says:
09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] [1002:6779] (prog-if 00 [VGA controller])
        Subsystem: PC Partner Limited / Sapphire Technology Radeon HD 6450 1 GB DDR3 [174b:e164]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 92
        Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 2: Memory at f4a20000 (64-bit, non-prefetchable) [size=128K]
        Region 4: I/O ports at c000 [size=256]
        Expansion ROM at f4a00000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: radeon
00: 02 10 79 67 07 04 10 00 00 00 00 03 10 00 80 00
10: 0c 00 00 e0 00 00 00 00 04 00 a2 f4 00 00 00 00
20: 01 c0 00 00 00 00 00 00 00 00 00 00 4b 17 64 e1
30: 00 00 a0 f4 50 00 00 00 00 00 00 00 0a 01 00 00

09:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos HDMI Audio [Radeon HD 6400 Series] [1002:aa98]
        Subsystem: PC Partner Limited / Sapphire Technology Radeon HD 6450 1GB DDR3 [174b:aa98]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin B routed to IRQ 96
        Region 0: Memory at f4a40000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: snd_hda_intel
00: 02 10 98 aa 06 04 10 00 00 00 03 04 10 00 80 00
10: 04 00 a4 f4 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 4b 17 98 aa
30: 00 00 00 00 50 00 00 00 00 00 00 00 05 02 00 00
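For anyone comparing the two dumps by hand, here is a rough Python sketch (not part of the original report) that decodes the first 16 bytes of config space from an lspci -x style hex line. The helper names parse_dump and decode_header are made up for illustration. The field offsets are the standard PCI header layout, and a vendor ID of 0xffff is taken to mean the device did not answer the config read. It also shows where lspci's "Unknown header type 7f" comes from: 0xff with the multifunction bit masked off.

```python
import struct

def parse_dump(text: str) -> bytes:
    """Turn lspci -x style lines ('00: 02 10 ...') into raw bytes."""
    out = bytearray()
    for line in text.strip().splitlines():
        # Drop the 'NN:' offset prefix, keep the hex bytes.
        _, _, hexpart = line.partition(":")
        out += bytes.fromhex(hexpart)
    return bytes(out)

def decode_header(cfg: bytes) -> dict:
    """Decode a few standard fields from the start of PCI config space."""
    # Vendor, device, command and status are little-endian 16-bit fields.
    vendor, device, command, status = struct.unpack_from("<HHHH", cfg, 0)
    header_type = cfg[0x0E]
    return {
        "vendor": vendor,
        "device": device,
        "command": command,
        "status": status,
        "header_type": header_type & 0x7F,       # low 7 bits: layout type
        "multifunction": bool(header_type & 0x80),
        "responding": vendor != 0xFFFF,          # all-ones means no response
    }

# First row of 09:00.0 on 3.13 and on 3.14-rc7+, copied from the dumps above.
good = decode_header(parse_dump("00: 02 10 79 67 07 04 10 00 00 00 00 03 10 00 80 00"))
dead = decode_header(parse_dump("00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff"))

print(f"3.13: {good['vendor']:04x}:{good['device']:04x} command={good['command']:04x}")
print(f"3.14: responding={dead['responding']} header_type={dead['header_type']:02x}")
```

On the 3.13 row this gives 1002:6779 with command 0x0407, which matches the "I/O+ Mem+ BusMaster+ ... DisINTx+" line lspci printed.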
Logs attached.
Hi Andy,
I'm really sorry that you tripped over this, but thanks a lot for the report. Is there any chance the box is currently running v3.13, and you could collect the dmesg log from it? I don't see anything unusual from a PCI perspective in the v3.14-rc7 dmesg; all the PCI device resources look fine, and we didn't reassign anything. It seems like the 0000:09:00.x devices just stopped responding for some reason, and the PCI core shouldn't really be involved after the radeon driver claims and enables those devices. But it's possible I'd get a clue by comparing the v3.13 and v3.14-rc7 dmesg logs.
Attached. I also clearly screwed something up about my 3.14 config -- I meant for it to match the Fedora config, but it doesn't. At least NR_CPUS is too low. That shouldn't break radeon, but maybe something odd happens.
3.14 also complains that it can't find an AGP bridge. 3.13 does not complain about that.
CONFIG_GART_IOMMU is not defined for the 3.13.6-200.rc20.x86_64 kernel, but apparently it is for your v3.14-rc7 kernel. That explains the "No AGP bridge found" difference.
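Since the two configs were meant to match but don't, a quick way to list every option that differs is a small script along these lines. This is only a sketch: config_diff and enabled_options are hypothetical helper names, and the two config fragments are illustrative stand-ins rather than the real Fedora and v3.14-rc7 configs. It relies only on the usual .config conventions ("CONFIG_FOO=value" for set options, "# CONFIG_FOO is not set" for unset ones).

```python
def enabled_options(config_text: str) -> dict:
    """Map each set CONFIG_ option to its value; unset options are omitted."""
    opts = {}
    for line in config_text.splitlines():
        line = line.strip()
        # '# CONFIG_FOO is not set' lines start with '#', so they are skipped.
        if line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            opts[name] = value
    return opts

def config_diff(old_text: str, new_text: str) -> dict:
    """Return {option: (old_value, new_value)} for options that differ."""
    old, new = enabled_options(old_text), enabled_options(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }

# Illustrative fragments only, not the actual configs from this report.
old_cfg = "CONFIG_PCI=y\n# CONFIG_GART_IOMMU is not set\n"
new_cfg = "CONFIG_PCI=y\nCONFIG_GART_IOMMU=y\n"

for name, (old, new) in config_diff(old_cfg, new_cfg).items():
    print(f"{name}: {old} -> {new}")
```

Running the same comparison on the real config files would show NR_CPUS and anything else that drifted, which might narrow down whether the regression is a config difference or a code change.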
I'm afraid I still can't shed any light on the problem with the radeon device.
Is there any news on this? It would be a shame to release v3.14 with a known regression.
I opened https://bugzilla.kernel.org/show_bug.cgi?id=73041 as a place to archive the dmesg, etc.
I looked at the lspci output again (by the way, if you have occasion to collect that again, do it as root so we can see the capabilities as well). Apart from the fact that 09:00.0 and 09:00.1 stopped responding completely, the only differences are that the "Received Master-Abort" bit is set in some of the bridges. I think this difference is related to 5b764b834ea9 ("PCI: Stop clearing bridge Secondary Status when setting up I/O aperture"), which appeared in v3.14-rc1. I don't think this is related to the problem with the Radeon device, though.
Bjorn
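For reference, the "Received Master-Abort" bit mentioned above can be checked directly in a bridge's config space. The following is a minimal sketch, assuming the input is the raw config space of a type 1 (PCI-to-PCI bridge) header, for example as read from sysfs as root. The function name is hypothetical. The offset and mask follow the standard layout (Secondary Status at 0x1e, Received Master Abort at bit 13) and match the names used in the kernel's pci_regs.h.

```python
import struct

PCI_SEC_STATUS = 0x1E                  # Secondary Status, type 1 headers only
PCI_STATUS_REC_MASTER_ABORT = 0x2000   # bit 13: Received Master Abort

def received_master_abort(bridge_cfg: bytes) -> bool:
    """Report whether a bridge's Secondary Status has Received Master-Abort set.

    Assumes bridge_cfg holds at least the first 32 bytes of a type 1 header.
    """
    (sec_status,) = struct.unpack_from("<H", bridge_cfg, PCI_SEC_STATUS)
    return bool(sec_status & PCI_STATUS_REC_MASTER_ABORT)

# Synthetic 64-byte headers for demonstration, not data from this system.
clean = bytes(64)
flagged = bytearray(64)
flagged[0x1F] = 0x20                   # high byte of Secondary Status = 0x20

print(received_master_abort(clean), received_master_abort(bytes(flagged)))
```

The bit only records that the bridge saw a master abort on its secondary side at some point, so by itself it does not say which transaction failed.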