https://bugs.freedesktop.org/show_bug.cgi?id=84500
Priority: medium
Bug ID: 84500
Assignee: dri-devel@lists.freedesktop.org
Summary: [radeonsi] radeon 0000:01:00.0: Packet0 not allowed!
Severity: normal
Classification: Unclassified
OS: All
Reporter: alexandre.f.demers@gmail.com
Hardware: Other
Status: NEW
Version: XOrg CVS
Component: DRM/Radeon
Product: DRI
On a 7950, I keep getting this error from time to time in dmesg: radeon 0000:01:00.0: Packet0 not allowed!
I have associated this error with playing either html5 or flash videos. It may happen when playing offline movies, but I can't tell since I haven't tested it.
When the error happens, there is a slight "stuttering" (from a fraction of a second to a few seconds), and then playback continues.
There is nothing in Xorg.0.log about it, and no other message than "radeon 0000:01:00.0: Packet0 not allowed!" in dmesg.
--- Comment #1 from Alexandre Demers alexandre.f.demers@gmail.com --- Even when UVD is manually disabled, the error still shows in dmesg.
--- Comment #2 from Michel Dänzer michel@daenzer.net --- Can you run the browser with the environment variable RADEON_DUMP_CS=1, and attach any command stream dumps that this generates on stderr?
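A minimal sketch of one way to capture what is being asked for here, assuming the dump really is written to the process's stderr as the comment says. Only the variable name RADEON_DUMP_CS comes from the thread; the helper name `run_with_cs_dump` and the log file name are made up for illustration.

```python
import os
import subprocess


def run_with_cs_dump(command, dump_path):
    """Run `command` with RADEON_DUMP_CS=1 set and save its stderr to `dump_path`.

    Hypothetical helper: nothing here is part of Mesa or the radeon driver.
    """
    # Copy the current environment and add the debug variable on top of it.
    env = dict(os.environ, RADEON_DUMP_CS="1")
    with open(dump_path, "wb") as dump_file:
        completed = subprocess.run(command, env=env, stderr=dump_file)
    return completed.returncode
```

For example, `run_with_cs_dump(["firefox"], "cs_dump.log")` would leave whatever the browser printed on stderr in `cs_dump.log`, ready to attach.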
--- Comment #3 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to comment #2)
> Can you run the browser with the environment variable RADEON_DUMP_CS=1, and attach any command stream dumps that this generates on stderr?
I'll run firefox with this env var later today.
--- Comment #4 from Alex Deucher agd5f@yahoo.com --- Created attachment 107128 --> https://bugs.freedesktop.org/attachment.cgi?id=107128&action=edit dump full CS when we hit a packet 0
This kernel patch should make it much easier to debug. When you hit the error, please attach the full output of the CS.
--- Comment #5 from Alexandre Demers alexandre.f.demers@gmail.com --- Created attachment 107162 --> https://bugs.freedesktop.org/attachment.cgi?id=107162&action=edit One CS dump
Got this CS dump when using Firefox while playing a few streams in Flash (it happens often when there is more than one stream playing). I was playing the live stream from radio-canada.ca, a show from tou.tv and another one from telequebec.tv.
Shortly after, I experienced a GPU reset. Obviously, Flash had been killed in the process.
Here is the log from that hang/reset:
[25590.472377] radeon 0000:01:00.0: ring 0 stalled for more than 10020msec
[25590.472383] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000003d58ba last fence id 0x00000000003d58b5 on ring 0)
[25590.488409] radeon 0000:01:00.0: ring 3 stalled for more than 10036msec
[25590.488415] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000014a8e2 last fence id 0x000000000014a8e0 on ring 3)
[25590.979347] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x00000000c039ec40 flags=0x0010]
[25590.979352] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x00000000c039ec70 flags=0x0030]
[25590.979354] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x00000000c0000100 flags=0x0030]
[25590.979355] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x00000000c039eb00 flags=0x0010]
[25590.979357] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x00000000c039eb40 flags=0x0010]
[25590.979386] radeon 0000:01:00.0: Saved 321 dwords of commands on ring 0.
[25590.979432] radeon 0000:01:00.0: GPU softreset: 0x0000006C
[25590.979434] radeon 0000:01:00.0: GRBM_STATUS = 0xA0003028
[25590.979437] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000006
[25590.979439] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000006
[25590.979441] radeon 0000:01:00.0: SRBM_STATUS = 0x200000C0
[25590.979476] radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
[25590.979478] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[25590.979480] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00010000
[25590.979482] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00400002
[25590.979484] radeon 0000:01:00.0: R_008680_CP_STAT = 0x84010243
[25590.979486] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44483106
[25590.979488] radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C84246
[25590.979490] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[25590.979492] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[25591.482479] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
[25591.482532] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00100140
[25591.483688] radeon 0000:01:00.0: GRBM_STATUS = 0x00003028
[25591.483690] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000006
[25591.483692] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000006
[25591.483694] radeon 0000:01:00.0: SRBM_STATUS = 0x200002C0
[25591.483728] radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
[25591.483730] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[25591.483731] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[25591.483734] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
[25591.483735] radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000
[25591.483741] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[25591.483743] radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57
[25591.483826] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[25591.524037] [drm] probing gen 2 caps for device 1002:5a16 = 31cd02/0
[25591.524039] [drm] PCIE gen 2 link speeds already enabled
[25591.525213] [drm] PCIE GART of 1024M enabled (table at 0x0000000000276000).
[25591.525303] radeon 0000:01:00.0: WB enabled
[25591.525306] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x00000000c0000c00 and cpu addr 0xffff8804113f2c00
[25591.525308] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x00000000c0000c04 and cpu addr 0xffff8804113f2c04
[25591.525310] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x00000000c0000c08 and cpu addr 0xffff8804113f2c08
[25591.525311] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x00000000c0000c0c and cpu addr 0xffff8804113f2c0c
[25591.525313] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x00000000c0000c10 and cpu addr 0xffff8804113f2c10
[25591.528260] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xffffc90015cb5a18
[25591.693423] [drm] ring test on 0 succeeded in 1 usecs
[25591.693427] [drm] ring test on 1 succeeded in 1 usecs
[25591.693431] [drm] ring test on 2 succeeded in 1 usecs
[25591.693440] [drm] ring test on 3 succeeded in 2 usecs
[25591.693446] [drm] ring test on 4 succeeded in 1 usecs
[25591.693471] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
[25591.693475] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
[25591.693476] radeon 0000:01:00.0: ib ring test failed (-35).
[25592.186042] radeon 0000:01:00.0: GPU softreset: 0x00000048
[25592.186044] radeon 0000:01:00.0: GRBM_STATUS = 0xA0003028
[25592.186046] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000006
[25592.186048] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000006
[25592.186050] radeon 0000:01:00.0: SRBM_STATUS = 0x200000C0
[25592.186084] radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
[25592.186086] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[25592.186088] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00010000
[25592.186090] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000002
[25592.186091] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80010243
[25592.186093] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[25592.186095] radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57
[25592.186097] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[25592.186099] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[25592.439595] Watchdog[906]: segfault at 0 ip 00007f4d1c491c2e sp 00007f4d0a258770 error 6 in chrome[7f4d1833b000+547e000]
[25592.674414] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
[25592.674468] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[25592.675624] radeon 0000:01:00.0: GRBM_STATUS = 0x00003028
[25592.675627] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000006
[25592.675629] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000006
[25592.675631] radeon 0000:01:00.0: SRBM_STATUS = 0x200000C0
[25592.675665] radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
[25592.675667] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[25592.675669] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[25592.675671] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
[25592.675673] radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000
[25592.675675] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[25592.675677] radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57
[25592.675761] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[25592.701553] [drm] probing gen 2 caps for device 1002:5a16 = 31cd02/0
[25592.701556] [drm] PCIE gen 2 link speeds already enabled
[25592.702716] [drm] PCIE GART of 1024M enabled (table at 0x0000000000276000).
[25592.702806] radeon 0000:01:00.0: WB enabled
[25592.702809] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x00000000c0000c00 and cpu addr 0xffff8804113f2c00
[25592.702811] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x00000000c0000c04 and cpu addr 0xffff8804113f2c04
[25592.702812] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x00000000c0000c08 and cpu addr 0xffff8804113f2c08
[25592.702814] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x00000000c0000c0c and cpu addr 0xffff8804113f2c0c
[25592.702816] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x00000000c0000c10 and cpu addr 0xffff8804113f2c10
[25592.706024] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xffffc90015cb5a18
[25592.873022] [drm] ring test on 0 succeeded in 1 usecs
[25592.873026] [drm] ring test on 1 succeeded in 1 usecs
[25592.873030] [drm] ring test on 2 succeeded in 1 usecs
[25592.873039] [drm] ring test on 3 succeeded in 2 usecs
[25592.873045] [drm] ring test on 4 succeeded in 1 usecs
[25592.873066] [drm] ib test on ring 0 succeeded in 0 usecs
[25592.873084] [drm] ib test on ring 1 succeeded in 0 usecs
[25592.873124] [drm] ib test on ring 2 succeeded in 0 usecs
[25592.873141] [drm] ib test on ring 3 succeeded in 0 usecs
[25592.873158] [drm] ib test on ring 4 succeeded in 0 usecs
[25592.873294] switching from power state:
[25592.873295] ui class: none
[25592.873297] internal class: boot
[25592.873298] caps:
[25592.873299] uvd vclk: 0 dclk: 0
[25592.873301] power level 0 sclk: 50000 mclk: 15000 vddc: 950 vddci: 875 pcie gen: 2
[25592.873302] status: c b
[25592.873303] switching to power state:
[25592.873304] ui class: performance
[25592.873305] internal class: none
[25592.873306] caps:
[25592.873307] uvd vclk: 0 dclk: 0
[25592.873308] power level 0 sclk: 30000 mclk: 15000 vddc: 850 vddci: 875 pcie gen: 2
[25592.873309] power level 1 sclk: 50100 mclk: 125000 vddc: 950 vddci: 875 pcie gen: 2
[25592.873310] power level 2 sclk: 88000 mclk: 125000 vddc: 1090 vddci: 875 pcie gen: 2
[25592.873311] status: r
--- Comment #6 from Alexandre Demers alexandre.f.demers@gmail.com --- As a side note: apart from the attached CS dump, other dumps are present in dmesg, but they did not trigger a GPU reset. It seems the last one I encountered, which is the one I attached, was just too much for some reason, and the GPU reset again when I restarted the streams.
--- Comment #7 from Michel Dänzer michel@daenzer.net ---
[25322.031213] 0x0000000b
This value is written to the R_028A3C_VGT_GROUP_VECT_1_FMT_CNTL register. However, the driver only ever writes 0 to that register, in si_init_config().
[25322.031214] 0x00000000 <---
[25322.031215] 0x00000295
[25322.031215] 0x00000080
[25322.031216] 0x00000040
[25322.031217] 0x00000002
The values after the arrow look like the following series of register writes to R_028A54_VGT_GS_PER_ES and the two following registers.
So, it looks like the value for the R_028A3C_VGT_GROUP_VECT_1_FMT_CNTL register and the following PKT3_SET_CONTEXT_REG header were scribbled over with the value 0x0000000b00000000. Looks like memory corruption to me.
Running firefox in valgrind or with something like the GCC / clang address sanitizers might give a clue, but might be painful.
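To make the reading of the dump above easier to follow, here is a rough sketch of how the dwords can be decoded. The bit layout (type in bits 31:30, count in bits 29:16, type-3 opcode in bits 15:8), the context register base of 0x28000 and the SET_CONTEXT_REG opcode 0x69 are assumptions based on the radeon kernel driver's packet macros, not something stated in this thread; the function names are made up.

```python
CONTEXT_REG_BASE = 0x28000    # assumed start of the SI context register range
PKT3_SET_CONTEXT_REG = 0x69   # assumed opcode value


def decode_pm4_header(header):
    """Split a 32-bit PM4 header dword into its fields (assumed layout)."""
    packet_type = (header >> 30) & 0x3
    count = (header >> 16) & 0x3FFF
    if packet_type == 3:
        return {"type": 3, "count": count, "opcode": (header >> 8) & 0xFF}
    if packet_type == 0:
        # Type-0 packets carry a register dword index in the low 16 bits.
        return {"type": 0, "count": count, "reg": (header & 0xFFFF) << 2}
    return {"type": packet_type, "count": count}


def context_reg_address(dword_offset):
    """Turn the offset dword of a SET_CONTEXT_REG packet into a register address."""
    return CONTEXT_REG_BASE + (dword_offset << 2)


# The dword marked with the arrow decodes as a type-0 header.
print(decode_pm4_header(0x00000000))
# The next dword, read as a SET_CONTEXT_REG offset, gives 0x28a54.
print(hex(context_reg_address(0x295)))
```

Under those assumptions, 0x00000295 points at R_028A54, which matches the analysis above, and an all-zero dword sitting where a PKT3 header should be would be parsed as a type-0 packet, which would fit the "Packet0 not allowed!" message.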
--- Comment #8 from Christian König deathsimple@vodafone.de --- (In reply to Michel Dänzer from comment #7)
> So, it looks like the value for the R_028A3C_VGT_GROUP_VECT_1_FMT_CNTL register and the following PKT3_SET_CONTEXT_REG header were scribbled over with the value 0x0000000b00000000. Looks like memory corruption to me.
Yeah, agreed, that strongly looks like memory corruption, which would also explain all the crashes.
--- Comment #9 from José Suárez j.suarez.agapito@gmail.com --- I have been trying linux 3.16 with the packet0 patch and after some testing I haven't got any Packet0 message in the dmesg log. So I guess it must be related to the 3.17 rc's. I don't remember getting similar crashes just by watching youtube videos in firefox with previous kernels.
I will try to build 3.17 rc6 (which was the version that gave the Packet0 logs and system hangs) with the Packet 0 patch and report back.
@Alexandre: Can you try linux 3.16 and see if it works properly for you?
--- Comment #10 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to José Suárez from comment #9)
> I have been trying linux 3.16 with the packet0 patch and after some testing I haven't got any Packet0 message in the dmesg log. So I guess it must be related to the 3.17 rc's. I don't remember getting similar crashes just by watching youtube videos in firefox with previous kernels.
> I will try to build 3.17 rc6 (which was the version that gave the Packet0 logs and system hangs) with the Packet 0 patch and report back.
> @Alexandre: Can you try linux 3.16 and see if it works properly for you?
Built and testing. I'll report back ASAP.
--- Comment #11 from José Suárez j.suarez.agapito@gmail.com --- Created attachment 107281 --> https://bugs.freedesktop.org/attachment.cgi?id=107281&action=edit Dmesg while hitting a packet0
--- Comment #12 from José Suárez j.suarez.agapito@gmail.com --- I compiled 3.17rc7 with the packet0 patch. You can find a dmesg log just above this message.
Running firefox with RADEON_DUMP_CS=1 didn't produce any dump. Is it because I need the mesa dbg packages (not currently installed)? I guess it should appear on the console output, right? Or is it written to a file? (Sorry about those noob questions. First time debugging this kind of problem...)
--- Comment #13 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to José Suárez from comment #12)
> I compiled 3.17rc7 with the packet0 patch. You can find a dmesg log just above this message.
> Running firefox with RADEON_DUMP_CS=1 didn't produce any dump. Is it because I need the mesa dbg packages (not currently installed)? I guess it should appear on the console output, right? Or is it written to a file? (Sorry about those noob questions. First time debugging this kind of problem...)
It seems pretty much the same "signature" as the CS dump I had attached.
The CS dump is written in your dmesg log or systemd journal if I remember correctly.
On my side, I've been running 3.16 with the patch applied and I've been unable to get a Packet0 error. So, it seems to have been introduced somewhere between 3.16 and 3.17-rc7. I'll try to bisect as soon as I have time (maybe not before next week).
--- Comment #14 from Andy Furniss adf.lists@gmail.com --- (In reply to Alexandre Demers from comment #13)
> (In reply to José Suárez from comment #12)
>> I compiled 3.17rc7 with the packet0 patch. You can find a dmesg log just above this message.
>> Running firefox with RADEON_DUMP_CS=1 didn't produce any dump. Is it because I need the mesa dbg packages (not currently installed)? I guess it should appear on the console output, right? Or is it written to a file? (Sorry about those noob questions. First time debugging this kind of problem...)
> It seems pretty much the same "signature" as the CS dump I had attached.
> The CS dump is written in your dmesg log or systemd journal if I remember correctly.
> On my side, I've been running 3.16 with the patch applied and I've been unable to get a Packet0 error. So, it seems to have been introduced somewhere between 3.16 and 3.17-rc7. I'll try to bisect as soon as I have time (maybe not before next week).
FWIW I just grepped my kern.log for Packet0 and have 47 between Jul 4 and now.
Doing grep Packet0 /var/log/kern.log -B 860 | grep Microcode
Only comes up with pitcairn (lowercase = new firmware = 3.17)
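The grep pipeline above looks back a fixed 860 lines from each hit. As an illustrative variant (not what was actually run), the sketch below pairs every Packet0 line with the most recent earlier line mentioning Microcode, however far back it is. The function name is made up, and it only assumes the log contains the two marker strings used in the grep command.

```python
def firmware_before_errors(lines, error_marker="Packet0", firmware_marker="Microcode"):
    """For each line containing `error_marker`, return the most recent earlier
    line containing `firmware_marker`, or None if none has been seen yet."""
    last_firmware = None
    results = []
    for line in lines:
        if firmware_marker in line:
            last_firmware = line.rstrip("\n")
        elif error_marker in line:
            results.append(last_firmware)
    return results
```

It could be fed an open log file directly, for example `firmware_before_errors(open("/var/log/kern.log", errors="replace"))`, and the distinct values in the result show which firmware load preceded the errors.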
--- Comment #15 from Alexandre Demers alexandre.f.demers@gmail.com --- Slowly bisecting: b401796 would be good (haven't had a Packet0 error since yesterday) and 005f8005 would be bad. Continuing.
--- Comment #16 from Alexandre Demers alexandre.f.demers@gmail.com --- It may be related to general GPU crashes seen in other bugs: while bisecting, I hit a loop of GPU resets just after logging in until I rebooted.
--- Comment #17 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to Alexandre Demers from comment #16)
> It may be related to general GPU crashes seen in other bugs: while bisecting, I hit a loop of GPU resets just after logging in until I rebooted.
Referring to commit 3c2ea70 (for tracking purposes).
--- Comment #18 from Alexandre Demers alexandre.f.demers@gmail.com --- I had to go back in my bisection because the result didn't make sense. It's taking longer than expected...
--- Comment #19 from Alexandre Demers alexandre.f.demers@gmail.com --- Hmmm, dummy question but I must ask: isn't a HD 7950 supposed to be a Tahiti GPU? Because when looking in dmesg, it seems to load a Pitcairn ucode... My research seems to say it is indeed a Tahiti GPU... I'm puzzled.
--- Comment #20 from John Bridgman john.bridgman@amd.com --- Yes, AFAIK HD 78xx is Pitcairn and HD 79xx is Tahiti.
--- Comment #21 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to John Bridgman from comment #20)
> Yes, AFAIK HD 78xx is Pitcairn and HD 79xx is Tahiti.
Well, I'm sick (literally) and I mixed José's dmesg with mine. Everything is fine with the device ID then.
--- Comment #22 from Christian König deathsimple@vodafone.de --- Keep in mind that this might actually be a user space problem and that different kernel versions work or don't work only by coincidence.
If you can get me SSH access to the box I could take a look as well. Attaching a debugger to the process in question shouldn't be too hard.
--- Comment #23 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to Christian König from comment #22)
> Keep in mind that this might actually be a user space problem and that different kernel versions work or don't work only by coincidence.
> If you can get me SSH access to the box I could take a look as well. Attaching a debugger to the process in question shouldn't be too hard.
I've been having a hard time getting the error lately (not encountered in the last two days with a kernel 3.17-rc4). I'll go back to a newer kernel and I'll see if the Packet0 bug still happens as often as before.
About the SSH connection, that could be possible if needed in time.
--- Comment #24 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to Alexandre Demers from comment #23)
> (In reply to Christian König from comment #22)
>> Keep in mind that this might actually be a user space problem and that different kernel versions work or don't work only by coincidence.
>> If you can get me SSH access to the box I could take a look as well. Attaching a debugger to the process in question shouldn't be too hard.
> I've been having a hard time getting the error lately (not encountered in the last two days with a kernel 3.17-rc4). I'll go back to a newer kernel and I'll see if the Packet0 bug still happens as often as before.
> About the SSH connection, that could be possible if needed in time.
Well, at last, I've been able to hit the error with a 3.17-rc4 and something.
--- Comment #25 from Zoltán Böszörményi zboszor@pr.hu --- I see "Packet0 not allowed" messages on 3.17.0 / 3.17.1 under Fedora 21. The video card is R9 270X, also Pitcairn.
--- Comment #26 from Erich Seifert eseifert@error-reports.org --- I'm also getting "Packet0 not allowed!" messages with a Radeon HD 7770 (Cape Verde XT) video card on kernel 3.17.0 and 3.17.1. I experienced several random crashes with 3.17.0, but I'm not sure they are related to this problem yet. I'll apply the patch and report back soon.
--- Comment #27 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to Christian König from comment #22)
> Keep in mind that this might actually be a user space problem and that different kernel versions work or don't work only by coincidence.
> If you can get me SSH access to the box I could take a look as well. Attaching a debugger to the process in question shouldn't be too hard.
By the way, if I understand correctly, if the bug is in userspace and was introduced around the same time kernel 3.17-rcX went out, would it appear when using a previous kernel version? I'm trying to figure out a way to distinguish one from the other because from where I am in the bisection, I was unable to reproduce the bug with a 3.16 kernel, but it does appear before 3.17-rc1...
--- Comment #28 from Christian König deathsimple@vodafone.de --- (In reply to Alexandre Demers from comment #27)
> (In reply to Christian König from comment #22)
>> Keep in mind that this might actually be a user space problem and that different kernel versions work or don't work only by coincidence.
>> If you can get me SSH access to the box I could take a look as well. Attaching a debugger to the process in question shouldn't be too hard.
> By the way, if I understand correctly, if the bug is in userspace and was introduced around the same time kernel 3.17-rcX went out, would it appear when using a previous kernel version? I'm trying to figure out a way to distinguish one from the other because from where I am in the bisection, I was unable to reproduce the bug with a 3.16 kernel, but it does appear before 3.17-rc1...
It is possible that a new kernel lets this problem surface by coincidence, e.g. through a slightly different memory layout or allocation timing: instead of changing two random pixels on the screen, the corruption hits the command buffer and the whole box crashes and/or shows this error.
All you can do is to try to figure out when the corruption happens. The kernel copies the command buffer content from userspace to a kernel buffer and then checks the content of the kernel buffer. Might be a good idea to print the content of the userspace buffer as well and compare both?
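As a rough illustration of the comparison suggested above, the sketch below takes two lists of dwords (one captured in userspace, one from the kernel's copy) and reports where they first differ. How the two lists are obtained is left open, and the function name is made up.

```python
from itertools import zip_longest


def first_mismatches(userspace_dwords, kernel_dwords, limit=8):
    """Return up to `limit` tuples (index, userspace_value, kernel_value) for
    positions where the two dumps differ. A missing value (one dump is
    shorter than the other) is reported as None."""
    mismatches = []
    pairs = zip_longest(userspace_dwords, kernel_dwords)
    for index, (user_value, kernel_value) in enumerate(pairs):
        if user_value != kernel_value:
            mismatches.append((index, user_value, kernel_value))
            if len(mismatches) >= limit:
                break
    return mismatches
```

If the two dumps are identical, the bad dwords were presumably already in the buffer that userspace handed over; if they differ, the damage would have happened after userspace captured its copy.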
--- Comment #29 from Michel Dänzer michel@daenzer.net --- (In reply to Alexandre Demers from comment #27)
> I'm trying to figure out a way to distinguish one from the other because from where I am in the bisection, I was unable to reproduce the bug with a 3.16 kernel, but it does appear before 3.17-rc1...
That's fine. Once you've finished bisecting the kernel, we'll decide where to go from there based on the result.
--- Comment #30 from Maciej gutigen@outlook.com --- Imagine bug like this happening on Windows, customers would go nuts and it would be fixed asap by AMD... But hey, Linux is not second class citizen, right?
--- Comment #31 from José Suárez j.suarez.agapito@gmail.com --- I've been testing linux 3.18 rc1 for a few days and I've found it to be quite stable with regard to this bug. No hangs for me yet, but the Packet0 messages still show up in dmesg.
--- Comment #32 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to José Suárez from comment #31)
> I've been testing linux 3.18 rc1 for a few days and I've found it to be quite stable with regard to this bug. No hangs for me yet, but the Packet0 messages still show up in dmesg.
Indeed, pretty much the same over here.
I'm still bisecting. Everything points to something introduced between 3.16 and 3.17-rc1. It just takes a while since the problem doesn't appear every time.
--- Comment #33 from Dieter Nützel Dieter@nuetzel-hh.de --- Hello Alexandre,
maybe you can take a look here to speed things up? https://bugzilla.kernel.org/show_bug.cgi?id=86891 (comments #3 and #4).
--- Comment #34 from Dieter Nützel Dieter@nuetzel-hh.de --- (In reply to Dieter Nützel from comment #33)
> Hello Alexandre,
> maybe you can take a look here to speed things up? https://bugzilla.kernel.org/show_bug.cgi?id=86891 (comments #3 and #4).
I'm now testing it on RV730 AGP with a git revert of:
59bc1d89d6a4d67c94a9b70fa81bda1d5b04f0cb is the first bad commit
commit 59bc1d89d6a4d67c94a9b70fa81bda1d5b04f0cb
Author: Lauri Kasanen cand@gmx.com
Date: Sun Apr 20 20:29:33 2014 +0300
drm/radeon: Inline r100_mm_rreg, -wreg, v3
--- Comment #35 from Michel Dänzer michel@daenzer.net --- (In reply to Alexandre Demers from comment #32)
> I'm still bisecting.
Did you get somewhere with the bisection? If not (or regardless), might be worth testing the Mesa patches I attached to bug 85647.
--- Comment #36 from Alexandre Demers alexandre.f.demers@gmail.com --- Almost, I will be testing my last commit tonight (if I made no mistake along the way). I'll have a look at the patch after that.
--- Comment #37 from Alexandre Demers alexandre.f.demers@gmail.com --- Well, the bisection was not conclusive... A branch's head commit produced the error, but I was unable to reproduce it earlier in that branch... I'll have to dig again in that branch and make sure it is related to that branch only.
--- Comment #38 from Alexandre Demers alexandre.f.demers@gmail.com --- For the last couple of days, I've been playing with kernel 3.19 drm-next and with some previously problematic 3.18 kernel versions. I was unable to reproduce the problem.
Mesa was updated a couple of times since the beginning of the bisection, as was the DDX driver. I'll keep this bug open for a couple more days, but I may end up closing it if I don't encounter the bug anymore.
--- Comment #39 from drago01@gmail.com --- I am seeing those messages too here:
radeon 0000:02:00.0: Packet0 not allowed!
on a R9 270X ... no hangs or anything else just the message in the log (3.17.3 / mesa 10.3.3 on F20).
--- Comment #40 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to drago01 from comment #39)
> I am seeing those messages too here:
> radeon 0000:02:00.0: Packet0 not allowed!
> on a R9 270X ... no hangs or anything else just the message in the log (3.17.3 / mesa 10.3.3 on F20).
I haven't hit it for a while. But I'm testing a 3.18 kernel with latest mesa from git. This could be a clue.
--- Comment #41 from Alexandre Demers alexandre.f.demers@gmail.com --- There is an application that still triggers the Packet0 error: Serious Sam 3. I could get an apitrace if someone thinks it could be useful.
--- Comment #42 from Öyvind Saether oyvinds@everdot.org --- Happens with 3.19.0-rc6, no idea what triggered it.
--- Comment #43 from Öyvind Saether oyvinds@everdot.org --- Created attachment 113076 --> https://bugs.freedesktop.org/attachment.cgi?id=113076&action=edit dmesg with packet not allowed error
--- Comment #44 from Lorenzo Bona lorenz.bona@gmail.com --- I'm hitting this error too.
Playing Dota2 (my only game) causes this to appear in dmesg.
drm-fixes-3.19, mesa/ddx/xserver/drm from git. The GPU is a R7-265. The distribution is debian sid.
--- Comment #45 from Lorenzo Bona lorenz.bona@gmail.com --- Since yesterday I've been testing the latest drm-fixes-3.19 kernel with the old radeon firmware, i.e. the files from before the big upgrade on the 24th of July.
I've played Dota2 and watched videos on flash and on mpv with vdpau, and I can't reproduce those warnings anymore.
But while I play I can see these:
[10319.747657] radeon 0000:07:00.0: GPU fault detected: 146 0x0b080404
[10319.747665] radeon 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00017258
[10319.747670] radeon 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08004004
[10319.747675] VM fault (0x04, vmid 4) at page 94808, read from TC (4)
[12134.226711] radeon 0000:07:00.0: GPU fault detected: 146 0x0b084404
[12134.226719] radeon 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00017258
[12134.226724] radeon 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044004
[12134.226728] VM fault (0x04, vmid 4) at page 94808, read from TC (68)
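For readers trying to relate the raw register values to the "VM fault" summary lines, here is a small decoding sketch. The field layout (bits 3:0 protections, bits 19:12 memory client id, bit 24 read/write, bits 28:25 VMID) is an assumption chosen because it reproduces the kernel's own summary for both faults in the log above; the function name is made up.

```python
def decode_vm_fault(status, addr):
    """Decode VM_CONTEXT1_PROTECTION_FAULT_STATUS and _ADDR values.

    The bit positions are assumed, not taken from this thread.
    """
    return {
        "protections": status & 0xF,
        "client_id": (status >> 12) & 0xFF,
        "access": "write" if (status >> 24) & 0x1 else "read",
        "vmid": (status >> 25) & 0xF,
        "page": addr,
    }


# First fault in the log: protections 0x04, vmid 4, page 94808, read, client 4.
print(decode_vm_fault(0x08004004, 0x00017258))
# Second fault: same page and vmid, but memory client id 68.
print(decode_vm_fault(0x08044004, 0x00017258))
```

Both faults hit the same page, which suggests the same buffer is involved each time; only the reported client id differs.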
--- Comment #46 from Alexandre Demers alexandre.f.demers@gmail.com --- (In reply to Lorenzo Bona from comment #45)
> Since yesterday I've been testing the latest drm-fixes-3.19 kernel with the old radeon firmware, i.e. the files from before the big upgrade on the 24th of July.
> I've played Dota2 and watched videos on flash and on mpv with vdpau, and I can't reproduce those warnings anymore.
> But while I play I can see these:
> [10319.747657] radeon 0000:07:00.0: GPU fault detected: 146 0x0b080404
> [10319.747665] radeon 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00017258
> [10319.747670] radeon 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08004004
> [10319.747675] VM fault (0x04, vmid 4) at page 94808, read from TC (4)
> [12134.226711] radeon 0000:07:00.0: GPU fault detected: 146 0x0b084404
> [12134.226719] radeon 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00017258
> [12134.226724] radeon 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044004
> [12134.226728] VM fault (0x04, vmid 4) at page 94808, read from TC (68)
Your VM errors may be related to bug 87278, which was also reopened after a reverted commit in LLVM.
Alexandre Demers alexandre.f.demers@gmail.com changed:
What    |Removed |Added
----------------------------------------------------------------------------
See Also|        |https://bugs.freedesktop.org/show_bug.cgi?id=87278
--- Comment #47 from Chernovsky Oleg adonai@xaker.ru --- Got this bug today.
How can I tell whether it is user-space IB assembly corruption or a kernel-space error? I have all the sources at hand; where should I look?
--- Comment #48 from Alex Deucher alexdeucher@gmail.com --- Dump the IB in userspace before submission and compare it to what gets dumped in the kernel.
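For the kernel half of that comparison, a sketch like the following could pull the dwords back out of dmesg. It assumes the debug patch prints one dword per line in the form shown earlier in this thread (for example `[25322.031213] 0x0000000b`, optionally followed by an arrow marker); the function name and the regular expression are made up for illustration.

```python
import re

# A timestamp in brackets followed directly by an 8-digit hex dword.
DWORD_LINE = re.compile(r"\[\s*\d+\.\d+\]\s+(0x[0-9a-fA-F]{8})\b")


def parse_kernel_cs_dump(dmesg_text):
    """Extract the dword values from dump lines, skipping other log lines."""
    dwords = []
    for line in dmesg_text.splitlines():
        match = DWORD_LINE.match(line.strip())
        if match:
            dwords.append(int(match.group(1), 16))
    return dwords
```

The resulting list can then be compared element by element against whatever was dumped in userspace before submission.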
--- Comment #49 from patches@portaildulibre.fr --- Happens with 3.18.9-100.fc20.x86_64
--- Comment #50 from patches@portaildulibre.fr --- Created attachment 114385 --> https://bugs.freedesktop.org/attachment.cgi?id=114385&action=edit dmesg for M6700
--- Comment #51 from Vladimir Ysikov grantipak@gmail.com --- Created attachment 114754 --> https://bugs.freedesktop.org/attachment.cgi?id=114754&action=edit dmesg log
I got the same issue while playing Dota 2 and Alt+Tabbing in Firefox. Radeon 7950, kernel 4.0rc4, mesa-git, llvm-svn, KDE 5, Archlinux x86-64.
--- Comment #52 from Chernovsky Oleg adonai@xaker.ru --- Created attachment 118502 --> https://bugs.freedesktop.org/attachment.cgi?id=118502&action=edit dmesg log
Just got this bug and GPU lockup while trying to play Guild Wars through wine. Happens when I rotate camera at the start of the game extensively, forcing Mesa to compile all the shaders at once.
It's repeatable so I can provide some logs here.
Radeon R9 270
Arch x86_64, Linux 4.2.1, brand new Mesa 11.0.1
--- Comment #53 from Christian König deathsimple@vodafone.de --- (In reply to Chernovsky Oleg from comment #52)
> Created attachment 118502 dmesg log
> Just got this bug and GPU lockup while trying to play Guild Wars through wine. Happens when I rotate camera at the start of the game extensively, forcing Mesa to compile all the shaders at once.
> It's repeatable so I can provide some logs here.
> Radeon R9 270
> Arch x86_64, Linux 4.2.1, brand new Mesa 11.0.1
Great! You are the guy who also did the fan control patches aren't you?
As first step please try to catch an apitrace of it.
If that doesn't work and you still want to get your hands dirty with the code again contact me by mail (christian.koenig@amd.com) and we can discuss how to dig deeper into this issue.
Best regards, Christian.
--- Comment #54 from Chernovsky Oleg adonai@xaker.ru ---
> Great! You are the guy who also did the fan control patches aren't you?
Yep, that's me, thanks! I also stalked Michel Dänzer for explanations of GTT and VMM at some point :)
> As first step please try to catch an apitrace of it.
> If that doesn't work and you still want to get your hands dirty with the code again contact me by mail (christian.koenig@amd.com) and we can discuss how to dig deeper into this issue.
> Best regards, Christian.
Will do on weekend and mail you about results.
Martin Peres martin.peres@free.fr changed:
What      |Removed |Added
----------------------------------------------------------------------------
Status    |NEW     |RESOLVED
Resolution|---     |MOVED
--- Comment #55 from Martin Peres martin.peres@free.fr --- -- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/540.