[ Adding the dri-devel list ]
On Mon, 2011-04-11 at 15:31 +0200, Gabriel Paubert wrote:
On Thu, Apr 07, 2011 at 04:04:35PM +0200, Michel Dänzer wrote:
On Wed, 2011-04-06 at 22:43 +0200, Gabriel Paubert wrote:
The problem is that, at least on one of my machines, the new driver does not work: the system hangs (apparently solid, but it's before networking starts up and I've not yet hooked up a serial console), after the "radeon: ib pool ready" message.
Does radeon.agpmode=-1 radeon.no_wb=1 help?
You might be able to get more information via netconsole if you prevent the radeon module from loading automatically (or load it with radeon.modeset=0 first) and then load it e.g. via ssh with modeset=1.
Loading the module with modeset=1 results in insmod blocked in kernel state (not consuming CPU cycles either). The last kernel message is always the same (ib pool ready). This seems to be independent of agpmode and no_wb. The kernel messages when loading the driver are:
kernel: [drm] radeon kernel modesetting enabled.
kernel: checking generic (c0000000 140000) vs hw (c0000000 10000000)
kernel: fb: conflicting fb hw usage radeondrmfb vs OFfb vga,Displa - removing generic driver
kernel: [drm] initializing kernel modesetting (RV530 0x1002:0x71C7).
kernel: radeon 0000:f1:00.0: Using 64-bit DMA iommu bypass
kernel: [drm] register mmio base: 0xE8000000
kernel: [drm] register mmio size: 65536
kernel: radeon 0000:f1:00.0: Invalid ROM contents
kernel: ATOM BIOS: X1650PRO
kernel: [drm] Generation 2 PCI interface, using max accessible memory
kernel: radeon 0000:f1:00.0: VRAM: 512M 0x0000000000000000 - 0x000000001FFFFFFF (512M used)
kernel: radeon 0000:f1:00.0: GTT: 512M 0x0000000020000000 - 0x000000003FFFFFFF
kernel: [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
kernel: [drm] Driver supports precise vblank timestamp query.
kernel: irq: irq 9 on host /mpic mapped to virtual irq 24
kernel: u3msi: allocated virq 0x18 (hw 0x9) addr 0xf8004090
kernel: radeon 0000:f1:00.0: radeon: using MSI.
Have you ruled out any MSI related problems? I think the IRQ not working could explain the symptoms...
kernel: [drm] radeon: irq initialized.
kernel: [drm] Detected VRAM RAM=512M, BAR=256M
kernel: [drm] RAM width 128bits DDR
kernel: [TTM] Zone kernel: Available graphics memory: 1002914 kiB.
kernel: [TTM] Initializing pool allocator.
kernel: [drm] radeon: 512M of VRAM memory ready
kernel: [drm] radeon: 512M of GTT memory ready.
kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
kernel: [drm] radeon: 1 quad pipes, 2 z pipes initialized.
kernel: [drm] PCIE GART of 512M enabled (table at 0x00040000).
kernel: radeon 0000:f1:00.0: WB enabled
Make sure this line changes to 'WB disabled' with no_wb=1. There's a writeback endianness bug with modeset=1, see http://lists.freedesktop.org/archives/dri-devel/2011-April/009960.html .
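[ Editor's sketch: one way to confirm the writeback state from a saved kernel log, rather than by eye. It only assumes the "WB enabled" / "WB disabled" line format shown in the logs in this thread; the function name writeback_state and the sample text are made up for illustration. ]

```python
import re

# Matches the writeback status line as it appears in the logs above, e.g.
# "kernel: radeon 0000:f1:00.0: WB disabled"
WB_LINE = re.compile(r"radeon\s+\S+:\s+WB (enabled|disabled)")


def writeback_state(log_text):
    """Return 'enabled', 'disabled', or None if no WB line is present.

    If the line appears several times (for example after reloading the
    module), the last occurrence wins.
    """
    state = None
    for line in log_text.splitlines():
        match = WB_LINE.search(line)
        if match:
            state = match.group(1)
    return state


# Hypothetical excerpt in the same format as the log quoted in this mail.
sample = """\
kernel: [drm] PCIE GART of 512M enabled (table at 0x00040000).
kernel: radeon 0000:f1:00.0: WB enabled
kernel: [drm] Loading R500 Microcode
"""
print(writeback_state(sample))
```

With radeon.no_wb=1 taking effect, the same check on the new log should report 'disabled'.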
kernel: [drm] Loading R500 Microcode
kernel: [drm] radeon: ring at 0x0000000020001000
kernel: [drm] ring test succeeded in 6 usecs
kernel: [drm] radeon: ib pool ready.
For now, with modeset=0, agpmode=-1 and no_wb=1, the driver seems to work.
The agpmode and no_wb options only have an effect with modeset=1, and you don't seem to be using AGP anyway. :)
Hi Michel,
On Mon, Apr 11, 2011 at 05:32:43PM +0200, Michel Dänzer wrote:
[ Adding the dri-devel list ]
Have you ruled out any MSI related problems? I think the IRQ not working could explain the symptoms...
Booting with MSI disabled does not change anything. Actually on this machine the Ethernet (tigon3) uses MSI and everything is fine. OTOH, on my home PC (dual core Athlon64 4 1/2 years old), MSI has never worked.
Make sure this line changes to 'WB disabled' with no_wb=1. There's a writeback endianness bug with modeset=1, see http://lists.freedesktop.org/archives/dri-devel/2011-April/009960.html .
With no_wb=1 the driver goes a bit further but the X server ends up in an infinite ioctl loop and the logs are:
kernel: [drm] radeon kernel modesetting enabled.
kernel: checking generic (c0000000 140000) vs hw (c0000000 10000000)
kernel: fb: conflicting fb hw usage radeondrmfb vs OFfb vga,Displa - removing generic driver
kernel: [drm] initializing kernel modesetting (RV530 0x1002:0x71C7).
kernel: radeon 0000:f1:00.0: Using 64-bit DMA iommu bypass
kernel: [drm] register mmio base: 0xE8000000
kernel: [drm] register mmio size: 65536
kernel: radeon 0000:f1:00.0: Invalid ROM contents
kernel: ATOM BIOS: X1650PRO
kernel: [drm] Generation 2 PCI interface, using max accessible memory
kernel: radeon 0000:f1:00.0: VRAM: 512M 0x0000000000000000 - 0x000000001FFFFFFF (512M used)
kernel: radeon 0000:f1:00.0: GTT: 512M 0x0000000020000000 - 0x000000003FFFFFFF
kernel: [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
kernel: [drm] Driver supports precise vblank timestamp query.
kernel: [drm] radeon: irq initialized.
kernel: [drm] Detected VRAM RAM=512M, BAR=256M
kernel: [drm] RAM width 128bits DDR
kernel: [TTM] Zone kernel: Available graphics memory: 1003018 kiB.
kernel: [TTM] Initializing pool allocator.
kernel: [drm] radeon: 512M of VRAM memory ready
kernel: [drm] radeon: 512M of GTT memory ready.
kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
kernel: [drm] radeon: 1 quad pipes, 2 z pipes initialized.
kernel: [drm] PCIE GART of 512M enabled (table at 0x00040000).
kernel: radeon 0000:f1:00.0: WB disabled
kernel: [drm] Loading R500 Microcode
kernel: [drm] radeon: ring at 0x0000000020001000
kernel: [drm] ring test succeeded in 6 usecs
kernel: [drm] radeon: ib pool ready.
kernel: [drm] ib test succeeded in 0 usecs
kernel: [drm] Radeon Display Connectors
kernel: [drm] Connector 0:
kernel: [drm] DVI-I
kernel: [drm] HPD1
kernel: [drm] DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
kernel: [drm] Encoders:
kernel: [drm] CRT1: INTERNAL_KLDSCP_DAC1
kernel: [drm] DFP1: INTERNAL_KLDSCP_TMDS1
kernel: [drm] Connector 1:
kernel: [drm] S-video
kernel: [drm] Encoders:
kernel: [drm] TV1: INTERNAL_KLDSCP_DAC2
kernel: [drm] Connector 2:
kernel: [drm] DVI-I
kernel: [drm] HPD2
kernel: [drm] DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
kernel: [drm] Encoders:
kernel: [drm] CRT2: INTERNAL_KLDSCP_DAC2
kernel: [drm] DFP3: INTERNAL_LVTM1
kernel: [drm] Possible lm63 thermal controller at 0x4c
kernel: [drm] fb mappable at 0xC00C0000
kernel: [drm] vram apper at 0xC0000000
kernel: [drm] size 9216000
kernel: [drm] fb depth is 24
kernel: [drm] pitch is 7680
kernel: checking generic (c0000000 140000) vs hw (c0000000 10000000)
kernel: fb: conflicting fb hw usage radeondrmfb vs OFfb vga,Displa - removing generic driver
kernel: fb1: radeondrmfb frame buffer device
kernel: drm: registered panic notifier
kernel: [drm] Initialized radeon 2.8.0 20080528 for 0000:f1:00.0 on minor 0
kernel: [drm:drm_mode_getfb] *ERROR* invalid framebuffer id
There is only one display connected and it is to the first DVI connector, BTW.
Regards, Gabriel
On Tue, 2011-04-12 at 13:30 +0200, Gabriel Paubert wrote:
On Mon, Apr 11, 2011 at 05:32:43PM +0200, Michel Dänzer wrote:
Have you ruled out any MSI related problems? I think the IRQ not working could explain the symptoms...
Booting with MSI disabled does not change anything. Actually on this machine the Ethernet (tigon3) uses MSI and everything is fine. OTOH, on my home PC (dual core Athlon64 4 1/2 years old), MSI has never worked.
Okay, the fact no_wb helps probably rules out an IRQ problem anyway.
Make sure this line changes to 'WB disabled' with no_wb=1. There's a writeback endianness bug with modeset=1, see http://lists.freedesktop.org/archives/dri-devel/2011-April/009960.html .
With no_wb=1 the driver goes a bit further but the X server ends up in an infinite ioctl loop and the logs are:
Which ioctl does it loop on? Please provide the Xorg.0.log file as well.
kernel: [drm] radeon kernel modesetting enabled.
kernel: checking generic (c0000000 140000) vs hw (c0000000 10000000)
kernel: fb: conflicting fb hw usage radeondrmfb vs OFfb vga,Displa - removing generic driver
kernel: [drm] initializing kernel modesetting (RV530 0x1002:0x71C7).
kernel: radeon 0000:f1:00.0: Using 64-bit DMA iommu bypass
kernel: [drm] register mmio base: 0xE8000000
kernel: [drm] register mmio size: 65536
kernel: radeon 0000:f1:00.0: Invalid ROM contents
kernel: ATOM BIOS: X1650PRO
kernel: [drm] Generation 2 PCI interface, using max accessible memory
kernel: radeon 0000:f1:00.0: VRAM: 512M 0x0000000000000000 - 0x000000001FFFFFFF (512M used)
kernel: radeon 0000:f1:00.0: GTT: 512M 0x0000000020000000 - 0x000000003FFFFFFF
kernel: [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
kernel: [drm] Driver supports precise vblank timestamp query.
kernel: [drm] radeon: irq initialized.
kernel: [drm] Detected VRAM RAM=512M, BAR=256M
kernel: [drm] RAM width 128bits DDR
kernel: [TTM] Zone kernel: Available graphics memory: 1003018 kiB.
kernel: [TTM] Initializing pool allocator.
kernel: [drm] radeon: 512M of VRAM memory ready
kernel: [drm] radeon: 512M of GTT memory ready.
kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
kernel: [drm] radeon: 1 quad pipes, 2 z pipes initialized.
kernel: [drm] PCIE GART of 512M enabled (table at 0x00040000).
kernel: radeon 0000:f1:00.0: WB disabled
kernel: [drm] Loading R500 Microcode
kernel: [drm] radeon: ring at 0x0000000020001000
kernel: [drm] ring test succeeded in 6 usecs
kernel: [drm] radeon: ib pool ready.
kernel: [drm] ib test succeeded in 0 usecs
kernel: [drm] Radeon Display Connectors
kernel: [drm] Connector 0:
kernel: [drm] DVI-I
kernel: [drm] HPD1
kernel: [drm] DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
kernel: [drm] Encoders:
kernel: [drm] CRT1: INTERNAL_KLDSCP_DAC1
kernel: [drm] DFP1: INTERNAL_KLDSCP_TMDS1
kernel: [drm] Connector 1:
kernel: [drm] S-video
kernel: [drm] Encoders:
kernel: [drm] TV1: INTERNAL_KLDSCP_DAC2
kernel: [drm] Connector 2:
kernel: [drm] DVI-I
kernel: [drm] HPD2
kernel: [drm] DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
kernel: [drm] Encoders:
kernel: [drm] CRT2: INTERNAL_KLDSCP_DAC2
kernel: [drm] DFP3: INTERNAL_LVTM1
kernel: [drm] Possible lm63 thermal controller at 0x4c
kernel: [drm] fb mappable at 0xC00C0000
kernel: [drm] vram apper at 0xC0000000
kernel: [drm] size 9216000
kernel: [drm] fb depth is 24
kernel: [drm] pitch is 7680
kernel: checking generic (c0000000 140000) vs hw (c0000000 10000000)
kernel: fb: conflicting fb hw usage radeondrmfb vs OFfb vga,Displa - removing generic driver
kernel: fb1: radeondrmfb frame buffer device
Hmm, I think this should say fb0, but that should only matter for console, not X.
kernel: drm: registered panic notifier
kernel: [drm] Initialized radeon 2.8.0 20080528 for 0000:f1:00.0 on minor 0
kernel: [drm:drm_mode_getfb] *ERROR* invalid framebuffer id
BTW, if your kernel contains commit 69a07f0b117a40fcc1a479358d8e1f41793617f2, can you try if reverting that helps?
On Tue, Apr 12, 2011 at 01:46:10PM +0200, Michel Dänzer wrote:
With no_wb=1 the driver goes a bit further but the X server ends up in an infinite ioctl loop and the logs are:
Which ioctl does it loop on? Please provide the Xorg.0.log file as well.
From memory, the code was 0x64, which is DRM_RADEON_GEM_WAIT_IDLE.
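[ Editor's sketch: how the 0x64 number maps to a name. Driver-private DRM ioctls are numbered from DRM_COMMAND_BASE (0x40), so 0x64 corresponds to radeon command 0x24. The constants below are quoted from memory of the kernel's drm.h and radeon_drm.h headers and should be checked against your own tree; the table is deliberately partial and the function name is hypothetical. ]

```python
# Assumed values from drm.h / radeon_drm.h (verify against your headers).
DRM_COMMAND_BASE = 0x40

# Partial table of radeon driver-private command numbers.
RADEON_COMMANDS = {
    0x1C: "DRM_RADEON_GEM_INFO",
    0x1D: "DRM_RADEON_GEM_CREATE",
    0x1E: "DRM_RADEON_GEM_MMAP",
    0x23: "DRM_RADEON_GEM_SET_DOMAIN",
    0x24: "DRM_RADEON_GEM_WAIT_IDLE",
    0x26: "DRM_RADEON_CS",
    0x2A: "DRM_RADEON_GEM_BUSY",
}


def decode_radeon_ioctl_nr(nr):
    """Map an ioctl number (the low 8 bits of the request) to a name."""
    if nr < DRM_COMMAND_BASE:
        return "core DRM ioctl 0x%02x" % nr
    driver_nr = nr - DRM_COMMAND_BASE
    return RADEON_COMMANDS.get(
        driver_nr, "unknown driver ioctl 0x%02x" % driver_nr
    )


print(decode_radeon_ioctl_nr(0x64))
```

This agrees with the name given above: 0x64 - 0x40 = 0x24.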
The Xorg.0.log from the previous boot is attached.
Gabriel
On Tue, 2011-04-12 at 14:00 +0200, Gabriel Paubert wrote:
On Tue, Apr 12, 2011 at 01:46:10PM +0200, Michel Dänzer wrote:
With no_wb=1 the driver goes a bit further but the X server ends up in an infinite ioctl loop and the logs are:
Which ioctl does it loop on? Please provide the Xorg.0.log file as well.
From memory, the code was 0x64, which is DRM_RADEON_GEM_WAIT_IDLE.
Note that it's normal for this ioctl to be called every time before the GPU accessible pixmap memory is accessed by the CPU. Unless the ioctl always returns an error, this may not indicate a problem on its own.
The Xorg.0.log from the previous boot is attached.
I don't see any obvious problems in it. Can you describe the symptoms of the problem you're having with X a bit more?
One thing I notice is that the X server/driver are rather oldish. Maybe you can try newer versions from testing, sid or even experimental to see if that makes any difference.
On Tue, Apr 12, 2011 at 07:29:22PM +0200, Michel Dänzer wrote:
On Tue, 2011-04-12 at 14:00 +0200, Gabriel Paubert wrote:
On Tue, Apr 12, 2011 at 01:46:10PM +0200, Michel Dänzer wrote:
With no_wb=1 the driver goes a bit further but the X server ends up in an infinite ioctl loop and the logs are:
Which ioctl does it loop on? Please provide the Xorg.0.log file as well.
From memory, the code was 0x64, which is DRM_RADEON_GEM_WAIT_IDLE.
Note that it's normal for this ioctl to be called every time before the GPU accessible pixmap memory is accessed by the CPU. Unless the ioctl always returns an error, this may not indicate a problem on its own.
It seems to be an infinite loop, always returning EINTR because of regular SIGALRM delivery.
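[ Editor's sketch: why a wait that keeps being interrupted looks like an infinite ioctl loop. ioctl wrappers such as libdrm's drmIoctl() retry when the call fails with EINTR or EAGAIN, so if the GPU never becomes idle and a periodic SIGALRM keeps interrupting the wait, the caller never gets out. The code below is a pure simulation under that assumption: FakeWaitIdle is a hypothetical stand-in, not the real ioctl, and max_attempts exists only so the demo terminates. ]

```python
import errno


def ioctl_with_retry(call, max_attempts=None):
    """Retry `call` while it fails with EINTR or EAGAIN.

    Returns (result, attempts). `max_attempts` is a safety cap added for
    this sketch; a plain retry loop has no such limit.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return call(), attempts
        except OSError as exc:
            if exc.errno not in (errno.EINTR, errno.EAGAIN):
                raise
            if max_attempts is not None and attempts >= max_attempts:
                raise


class FakeWaitIdle:
    """Hypothetical stand-in for a wait-idle call.

    The call is interrupted `interruptions` times before succeeding;
    None models a GPU that never becomes idle.
    """

    def __init__(self, interruptions):
        self.interruptions = interruptions
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.interruptions is None or self.calls <= self.interruptions:
            raise InterruptedError(errno.EINTR, "Interrupted system call")
        return 0


# Healthy case: a few interruptions, then the wait completes.
print(ioctl_with_retry(FakeWaitIdle(3)))

# Hung case: every attempt is interrupted, so only the cap stops the loop.
hung = FakeWaitIdle(None)
try:
    ioctl_with_retry(hung, max_attempts=1000)
except InterruptedError:
    print("still interrupted after", hung.calls, "attempts")
```

In the hung case every attempt returns EINTR, which is what the trace of the X server shows.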
The Xorg.0.log from the previous boot is attached.
I don't see any obvious problems in it. Can you describe the symptoms of the problem you're having with X a bit more?
Well, X is dead, or rather in an infinite ioctl loop as described above. IIRC, the display enters a power-down mode and there is nothing to see.
One thing I notice is that the X server/driver are rather oldish. Maybe you can try newer versions from testing, sid or even experimental to see if that makes any difference.
I lack time to do it until early May (being away for 2 weeks starting on Friday and busy on urgent things). I'm indeed on Debian stable (Squeeze), which is rather recent, and the machine is about 2 1/2 years old.
Gabriel
On Wed, 2011-04-13 at 09:59 +0200, Gabriel Paubert wrote:
Well, X is dead, or rather in an infinite ioctl loop as described above. IIRC, the display enters a power-down mode and there is nothing to see.
So basically the card crashed. There's about an infinite amount of reasons why radeons do so, sometimes it has to do with them not liking what you ate that day...
The only thing I can see that could be of use would be a bisect.
Cheers, Ben.
On Wed, Apr 13, 2011 at 06:16:13PM +1000, Benjamin Herrenschmidt wrote:
On Wed, 2011-04-13 at 09:59 +0200, Gabriel Paubert wrote:
Well, X is dead, or rather in an infinite ioctl loop as described above. IIRC, the display enters a power-down mode and there is nothing to see.
So basically the card crashed. There's about an infinite amount of reasons why radeons do so, sometimes it has to do with them not liking what you ate that day...
The only thing I can see that could be of use would be a bisect.
Bisecting for something which I have never got to work (radeon with KMS) on this machine is something I don't know how to do...
Note that radeon without KMS also always ends up crashing, but it may take hours. The only case where the machine works reliably is when glxinfo claims that it is using software rendering.
Regards, Gabriel
On Wed, 2011-04-13 at 09:59 +0200, Gabriel Paubert wrote:
On Tue, Apr 12, 2011 at 07:29:22PM +0200, Michel Dänzer wrote:
On Tue, 2011-04-12 at 14:00 +0200, Gabriel Paubert wrote:
On Tue, Apr 12, 2011 at 01:46:10PM +0200, Michel Dänzer wrote:
With no_wb=1 the driver goes a bit further but the X server ends up in an infinite ioctl loop and the logs are:
Which ioctl does it loop on? Please provide the Xorg.0.log file as well.
From memory, the code was 0x64, which is DRM_RADEON_GEM_WAIT_IDLE.
Note that it's normal for this ioctl to be called every time before the GPU accessible pixmap memory is accessed by the CPU. Unless the ioctl always returns an error, this may not indicate a problem on its own.
It seems to be an infinite loop, always returning EINTR because of regular SIGALRM delivery.
That does sound like the GPU locks up. Do you get any messages in dmesg about lockups and attempts to reset the GPU at any time?
On Wed, Apr 13, 2011 at 02:12:16PM +0200, Michel Dänzer wrote:
On Wed, 2011-04-13 at 09:59 +0200, Gabriel Paubert wrote:
On Tue, Apr 12, 2011 at 07:29:22PM +0200, Michel Dänzer wrote:
On Tue, 2011-04-12 at 14:00 +0200, Gabriel Paubert wrote:
On Tue, Apr 12, 2011 at 01:46:10PM +0200, Michel Dänzer wrote:
With no_wb=1 the driver goes a bit further but the X server ends up in an infinite ioctl loop and the logs are:
Which ioctl does it loop on? Please provide the Xorg.0.log file as well.
From memory, the code was 0x64, which is DRM_RADEON_GEM_WAIT_IDLE.
Note that it's normal for this ioctl to be called every time before the GPU accessible pixmap memory is accessed by the CPU. Unless the ioctl always returns an error, this may not indicate a problem on its own.
It seems to be an infinite loop, always returning EINTR because of regular SIGALRM delivery.
That does sound like the GPU locks up. Do you get any messages in dmesg about lockups and attempts to reset the GPU at any time?
No.
Gabriel
On Wed, 2011-04-13 at 14:27 +0200, Gabriel Paubert wrote:
On Wed, Apr 13, 2011 at 02:12:16PM +0200, Michel Dänzer wrote:
On Wed, 2011-04-13 at 09:59 +0200, Gabriel Paubert wrote:
On Tue, Apr 12, 2011 at 07:29:22PM +0200, Michel Dänzer wrote:
On Tue, 2011-04-12 at 14:00 +0200, Gabriel Paubert wrote:
On Tue, Apr 12, 2011 at 01:46:10PM +0200, Michel Dänzer wrote:
With no_wb=1 the driver goes a bit further but the X server ends up in an infinite ioctl loop and the logs are:
Which ioctl does it loop on? Please provide the Xorg.0.log file as well.
From memory, the code was 0x64, which is DRM_RADEON_GEM_WAIT_IDLE.
Note that it's normal for this ioctl to be called every time before the GPU accessible pixmap memory is accessed by the CPU. Unless the ioctl always returns an error, this may not indicate a problem on its own.
It seems to be an infinite loop, always returning EINTR because of regular SIGALRM delivery.
That does sound like the GPU locks up. Do you get any messages in dmesg about lockups and attempts to reset the GPU at any time?
No.
Hmm, I guess the constant SIGALRMs might prevent the lockup detection from kicking in... Maybe you can try starting the X server with -dumbSched to see if that gets things along any further, but in the end there's probably no way around figuring out what causes the lockup and fixing that anyway.
Michel Dänzer wrote:
That does sound like the GPU locks up. Do you get any messages in dmesg about lockups and attempts to reset the GPU at any time?
No.
Hmm, I guess the constant SIGALRMs might prevent the lockup detection from kicking in... Maybe you can try starting the X server with -dumbSched to see if that gets things along any further, but in the end there's probably no way around figuring out what causes the lockup and fixing that anyway.
I have an old AGP box that locks with 600g + agpgart - It used to give GPU lockup to dmesg/log, but (I only test it occasionally) it doesn't anymore. I can still sysrq OK.
I wonder if something changed in recent months in the drm/whatever code that has changed/blocked the logging.
On Tue, Apr 12, 2011 at 01:46:10PM +0200, Michel Dänzer wrote:
BTW, if your kernel contains commit 69a07f0b117a40fcc1a479358d8e1f41793617f2, can you try if reverting that helps?
My kernel is pristine 2.6.38 and does not include this commit (was introduced before 2.6.39-rc1 according to gitk).
Gabriel
On Wed, Apr 13, 2011 at 10:02:04AM +0200, Gabriel Paubert wrote:
On Tue, Apr 12, 2011 at 01:46:10PM +0200, Michel Dänzer wrote:
BTW, if your kernel contains commit 69a07f0b117a40fcc1a479358d8e1f41793617f2, can you try if reverting that helps?
My kernel is pristine 2.6.38 and does not include this commit (was introduced before 2.6.39-rc1 according to gitk).
gitk is not the best tool to find this out.
$ git name-rev --refs=refs/tags/v2.6* 69a07f0b117a40fcc1a479358d8e1f41793617f2
69a07f0b117a40fcc1a479358d8e1f41793617f2 tags/v2.6.39-rc2~3^2~43^2~4
so it was introduced just before -rc2.
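[ Editor's sketch: reading the tag out of the name-rev output mechanically. The output names the commit relative to a tag it is reachable from, followed by ~N and ^N steps back through the history, so everything up to the first '~' or '^' is the tag. The helper name containing_tag is made up for illustration, and the parsing only assumes the "<sha> tags/<tag>..." layout shown above. ]

```python
import re


def containing_tag(name_rev_output):
    """Extract the tag name from a line of `git name-rev` output.

    Expects '<sha> tags/<tag>[~N|^N ...]' and returns '<tag>', or None
    if the commit is not named relative to a tag.
    """
    parts = name_rev_output.split()
    if len(parts) < 2:
        return None
    # Stop at the first '~' or '^', which start the ancestry suffix.
    match = re.match(r"tags/([^~^]+)", parts[-1])
    return match.group(1) if match else None


line = "69a07f0b117a40fcc1a479358d8e1f41793617f2 tags/v2.6.39-rc2~3^2~43^2~4"
print(containing_tag(line))
```

For the line above this yields v2.6.39-rc2, so a pristine 2.6.38 kernel does not contain the commit.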
Best regards Uwe