New subject: [Bug 93341] GPU lockups on RadeonHD 7770 (radeonsi driver) when running OpenGL games or after extended periods of time

11 Dec 2015


      https://bugs.freedesktop.org/show_bug.cgi?id=93341
Bug ID: 93341
           Summary: GPU lockups on RadeonHD 7770 (radeonsi driver) when
                    running OpenGL games
           Product: Mesa
           Version: 11.0
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: major
          Priority: medium
         Component: Drivers/Gallium/radeonsi
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: nekohayo@gmail.com
        QA Contact: dri-devel@lists.freedesktop.org
Fedora 23 with xorg-x11-drv-ati, on a Dell Precision T3500 (latest BIOS, A17)
with a RadeonHD 7770 GPU, running up-to-date stock packages from Fedora.
If I start a game like Xonotic (from the Fedora repos) or Unvanquished (latest
alpha binary build downloaded from their GitHub repo), then after a minute or
two of just looking around as a spectator, the monitor suddenly turns off.
Sound keeps playing for a while, then may stop or loop. A few seconds later the
kernel is locked up and the CapsLock LED no longer responds.
This also happened to me once while simply watching a fullscreen video in Totem
(I'm running GNOME Shell, FWIW), but that is a much rarer occurrence.
Unfortunately I don't know how to debug this kind of thing, and ABRT considers
my kernel tainted with the "I" flag (meaning it is "working around a severe
firmware bug"), which I suppose might be the radeon microcode, so I can't get
ABRT to produce an automated retrace or full debug report. It still has data
stored on disk, though, if there's anything in there you need:
# ls -lh /var/spool/abrt/oops-2015-12-10-21:50:22-777-1/
-rw-r----- 1 root abrt    5 10 déc 21:50 abrt_version
-rw-r----- 1 root abrt    9 10 déc 21:50 analyzer
-rw-r----- 1 root abrt    6 10 déc 21:50 architecture
-rw-r----- 1 root abrt 3,7K 10 déc 21:50 backtrace
-rw-r----- 1 root abrt  124 10 déc 21:50 cmdline
-rw-r----- 1 root abrt   16 10 déc 21:50 component
-rw-r----- 1 root abrt    1 10 déc 21:50 count
-rw-r----- 1 root abrt  71K 10 déc 21:50 dmesg
-rw-r----- 1 root abrt   40 10 déc 21:50 duphash
-rw-r----- 1 root abrt   23 10 déc 21:50 extra-cc
-rw-r----- 1 root abrt    8 10 déc 21:50 hostname
-rw-r----- 1 root abrt   21 10 déc 21:50 kernel
-rw-r----- 1 root abrt   25 10 déc 21:50 kernel_tainted_long
-rw-r----- 1 root abrt    3 10 déc 21:50 kernel_tainted_short
-rw-r----- 1 root abrt   10 10 déc 21:50 last_occurrence
-rw-r----- 1 root abrt  173 10 déc 21:50 not-reportable
-rw-r----- 1 root abrt  518 10 déc 21:50 os_info
-rw-r----- 1 root abrt   32 10 déc 21:50 os_release
-rw-r----- 1 root abrt    6 10 déc 21:50 package
-rw-r----- 1 root abrt    7 10 déc 21:50 pkg_arch
-rw-r----- 1 root abrt    2 10 déc 21:50 pkg_epoch
-rw-r----- 1 root abrt   12 10 déc 21:50 pkg_name
-rw-r----- 1 root abrt    9 10 déc 21:50 pkg_release
-rw-r----- 1 root abrt    6 10 déc 21:50 pkg_version
-rw-r----- 1 root abrt 4,4K 10 déc 21:50 proc_modules
-rw-r----- 1 root abrt   37 10 déc 21:50 reason
-rw-r----- 1 root abrt    8 10 déc 21:50 runlevel
-rw-r----- 1 root abrt  269 10 déc 21:50 suspend_stats
-rw-r----- 1 root abrt   10 10 déc 21:50 time
-rw-r----- 1 root abrt   10 10 déc 21:50 type
-rw-r----- 1 root abrt   40 10 déc 21:50 uuid
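For reference on the taint flag mentioned above, here is a minimal sketch of
how the kernel taint bitmask (as exposed in /proc/sys/kernel/tainted) maps to
the letters shown after "Tainted:" in an oops. The bit table follows the
kernel's tainted-kernels documentation as I understand it; the function name
decode_taint is hypothetical and this is not something ABRT itself provides.

```python
# Sketch: decode a kernel taint bitmask into its flag letters.
# The bit positions below are taken from the kernel's tainted-kernels
# documentation; decode_taint is an illustrative helper, not an ABRT API.

TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force loaded"),
    2: ("S", "kernel running on out-of-spec system"),
    3: ("R", "module was force unloaded"),
    4: ("M", "machine check exception occurred"),
    5: ("B", "bad page referenced"),
    6: ("U", "taint requested by userspace"),
    7: ("D", "kernel died recently (oops or BUG)"),
    8: ("A", "ACPI table overridden by user"),
    9: ("W", "kernel issued a warning"),
    10: ("C", "staging driver was loaded"),
    11: ("I", "working around a severe firmware bug"),
    12: ("O", "out-of-tree module was loaded"),
    13: ("E", "unsigned module was loaded"),
    14: ("L", "soft lockup occurred"),
    15: ("K", "kernel has been live patched"),
}


def decode_taint(value):
    """Return (letter, description) pairs for every bit set in value."""
    return [
        (letter, description)
        for bit, (letter, description) in sorted(TAINT_FLAGS.items())
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    # 2048 is bit 11 alone, i.e. only the "I" flag.
    for letter, description in decode_taint(2048):
        print(letter, "-", description)
```

On a running system the value can be read with
`cat /proc/sys/kernel/tainted` and passed to the function as an integer.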
This is what I get in journalctl/dmesg:
-- Logs begin at lun 2015-11-30 21:48:19 EST, end at jeu 2015-12-10 23:48:33
EST. --
déc 10 21:49:00 the_PC kernel: radeon 0000:02:00.0: ring 3 stalled for more
than 10115msec
déc 10 21:49:00 the_PC kernel: radeon 0000:02:00.0: GPU lockup (current fence
id 0x000000000000a5fe last fence id 0x000000000000a600 on ring 3)
déc 10 21:49:01 the_PC kernel: BUG: unable to handle kernel paging request at
ffffc90404239ffc
déc 10 21:49:01 the_PC kernel: IP: [<ffffffffa00f850a>]
radeon_ring_backup+0xda/0x190 [radeon]
déc 10 21:49:01 the_PC kernel: PGD 6068a8067 PUD 0 
déc 10 21:49:01 the_PC kernel: Oops: 0000 [#1] SMP 
déc 10 21:49:01 the_PC kernel: Modules linked in: fuse xt_CHECKSUM
iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge
stp llc ebtable
déc 10 21:49:01 the_PC kernel:  radeon i2c_algo_bit drm_kms_helper ttm drm
serio_raw
déc 10 21:49:01 the_PC kernel: CPU: 3 PID: 153 Comm: kworker/u64:7 Tainted: G  
       I     4.2.6-301.fc23.x86_64 #1
déc 10 21:49:01 the_PC kernel: Hardware name: Dell Inc. Precision WorkStation
T3500  /0K095G, BIOS A17 05/28/2013
déc 10 21:49:01 the_PC kernel: Workqueue: radeon-crtc radeon_flip_work_func
[radeon]
déc 10 21:49:01 the_PC kernel: task: ffff88060299b880 ti: ffff8805ff5c0000
task.ti: ffff8805ff5c0000
déc 10 21:49:01 the_PC kernel: RIP: 0010:[<ffffffffa00f850a>] 
[<ffffffffa00f850a>] radeon_ring_backup+0xda/0x190 [radeon]
déc 10 21:49:01 the_PC kernel: RSP: 0018:ffff8805ff5c3c98  EFLAGS: 00010206
déc 10 21:49:01 the_PC kernel: RAX: ffffc9000fe50000 RBX: 00000000ffffffff RCX:
0000000000000000
déc 10 21:49:01 the_PC kernel: RDX: 0000000000000000 RSI: ffffc90404239ffc RDI:
0000000000080500
déc 10 21:49:01 the_PC kernel: RBP: ffff8805ff5c3cd8 R08: ffff8805771f8cc0 R09:
0000000000082000
déc 10 21:49:01 the_PC kernel: R10: 8000000000000163 R11: ffffffff81a609e9 R12:
ffff880036a654d8
déc 10 21:49:01 the_PC kernel: R13: ffff880036a654b0 R14: 0000000000020141 R15:
ffff8805ff5c3d30
déc 10 21:49:01 the_PC kernel: FS:  0000000000000000(0000)
GS:ffff880606ec0000(0000) knlGS:0000000000000000
déc 10 21:49:01 the_PC kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
000000008005003b
déc 10 21:49:01 the_PC kernel: CR2: ffffc90404239ffc CR3: 0000000001c0b000 CR4:
00000000000006e0
déc 10 21:49:01 the_PC kernel: Stack:
déc 10 21:49:01 the_PC kernel:  ffff8805ff5c3cc8 ffffffffa00f9413
ffff880036a64000 ffff880036a64000
déc 10 21:49:01 the_PC kernel:  ffff880036a654d8 ffff8805ff5c3d30
ffff880036a654d8 0000000000000000
déc 10 21:49:01 the_PC kernel:  ffff8805ff5c3da8 ffffffffa00c6c80
ffffffff810df990 ffff880036a64738
déc 10 21:49:01 the_PC kernel: Call Trace:
déc 10 21:49:01 the_PC kernel:  [<ffffffffa00f9413>] ?
radeon_irq_kms_disable_hpd+0x73/0x80 [radeon]
déc 10 21:49:01 the_PC kernel:  [<ffffffffa00c6c80>]
radeon_gpu_reset+0xd0/0x330 [radeon]
déc 10 21:49:01 the_PC kernel:  [<ffffffff810df990>] ?
wake_atomic_t_function+0x70/0x70
déc 10 21:49:01 the_PC kernel:  [<ffffffffa00e058f>] ?
radeon_fence_wait+0x9f/0xe0 [radeon]
déc 10 21:49:01 the_PC kernel:  [<ffffffffa00ed960>]
radeon_flip_work_func+0x130/0x170 [radeon]
déc 10 21:49:01 the_PC kernel:  [<ffffffff810b650e>]
process_one_work+0x19e/0x3f0
déc 10 21:49:01 the_PC kernel:  [<ffffffff810b67ae>] worker_thread+0x4e/0x450
déc 10 21:49:01 the_PC kernel:  [<ffffffff810b6760>] ?
process_one_work+0x3f0/0x3f0
déc 10 21:49:01 the_PC kernel:  [<ffffffff810b6760>] ?
process_one_work+0x3f0/0x3f0
déc 10 21:49:01 the_PC kernel:  [<ffffffff810bc8b8>] kthread+0xd8/0xf0
déc 10 21:49:01 the_PC kernel:  [<ffffffff810bc7e0>] ?
kthread_worker_fn+0x160/0x160
déc 10 21:49:01 the_PC kernel:  [<ffffffff817797df>] ret_from_fork+0x3f/0x70
déc 10 21:49:01 the_PC kernel:  [<ffffffff810bc7e0>] ?
kthread_worker_fn+0x160/0x160
déc 10 21:49:01 the_PC kernel: Code: 10 e1 48 85 c0 49 89 07 74 6c 41 8d 7e ff
31 d2 48 c1 e7 02 eb 07 49 8b 07 48 83 c2 04 49 8b 74 24 08 8d 4b 01 89 db 48
8d 34 9e <8b> 36 89 34 10 41 23 4c 24 54 48 39 d7 89 cb 75 da 4c 89 ef e8 
déc 10 21:49:01 the_PC kernel: RIP  [<ffffffffa00f850a>]
radeon_ring_backup+0xda/0x190 [radeon]
déc 10 21:49:01 the_PC kernel:  RSP <ffff8805ff5c3c98>
déc 10 21:49:01 the_PC kernel: CR2: ffffc90404239ffc
déc 10 21:49:01 the_PC kernel: ---[ end trace 37e2470f6b251992 ]---
déc 10 21:49:01 the_PC kernel: BUG: unable to handle kernel paging request at
ffffffffffffffd8
déc 10 21:49:01 the_PC kernel: IP: [<ffffffff810bcd40>] kthread_data+0x10/0x20
déc 10 21:49:01 the_PC kernel: PGD 1c0e067 PUD 1c10067 PMD 0 
déc 10 21:49:01 the_PC kernel: Oops: 0000 [#2] SMP 
déc 10 21:49:01 the_PC kernel: Modules linked in: fuse xt_CHECKSUM
iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge
stp llc ebtable
déc 10 21:49:01 the_PC kernel:  radeon i2c_algo_bit drm_kms_helper ttm drm
serio_raw
déc 10 21:49:01 the_PC kernel: CPU: 3 PID: 153 Comm: kworker/u64:7 Tainted: G  
   D   I     4.2.6-301.fc23.x86_64 #1
déc 10 21:49:01 the_PC kernel: Hardware name: Dell Inc. Precision WorkStation
T3500  /0K095G, BIOS A17 05/28/2013
déc 10 21:49:01 the_PC kernel: task: ffff88060299b880 ti: ffff8805ff5c0000
task.ti: ffff8805ff5c0000
déc 10 21:49:01 the_PC kernel: RIP: 0010:[<ffffffff810bcd40>] 
[<ffffffff810bcd40>] kthread_data+0x10/0x20
déc 10 21:49:01 the_PC kernel: RSP: 0018:ffff8805ff5c3918  EFLAGS: 00010096
déc 10 21:49:01 the_PC kernel: RAX: 0000000000000000 RBX: 0000000000000003 RCX:
0000000000000005
déc 10 21:49:01 the_PC kernel: RDX: 0000000000000005 RSI: 0000000000000003 RDI:
ffff88060299b880
déc 10 21:49:01 the_PC kernel: RBP: ffff8805ff5c3918 R08: ffff88060299b910 R09:
0000000000000000
déc 10 21:49:01 the_PC kernel: R10: 0000000000000000 R11: 0000000000000000 R12:
00000000000167c0
déc 10 21:49:01 the_PC kernel: R13: ffff88060299b880 R14: ffff880606ed67c0 R15:
0000000000000003
déc 10 21:49:01 the_PC kernel: FS:  0000000000000000(0000)
GS:ffff880606ec0000(0000) knlGS:0000000000000000
déc 10 21:49:01 the_PC kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
000000008005003b
déc 10 21:49:01 the_PC kernel: CR2: 0000000000000028 CR3: 0000000001c0b000 CR4:
00000000000006e0
déc 10 21:49:01 the_PC kernel: Stack:
déc 10 21:49:01 the_PC kernel:  ffff8805ff5c3938 ffffffff810b7385
ffff8805ff5c3938 ffff880606ed67c0
déc 10 21:49:01 the_PC kernel:  ffff8805ff5c3988 ffffffff81774fc0
ffff880500000000 ffff88060299b880
déc 10 21:49:01 the_PC kernel:  ffff8805ff5c3988 ffff8805ff5c4000
ffff8805ff5c39f0 ffff8805ff5c39f0
déc 10 21:49:01 the_PC kernel: Call Trace:
déc 10 21:49:01 the_PC kernel:  [<ffffffff810b7385>]
wq_worker_sleeping+0x15/0xa0
déc 10 21:49:01 the_PC kernel:  [<ffffffff81774fc0>] __schedule+0x620/0x950
déc 10 21:49:01 the_PC kernel:  [<ffffffff81775327>] schedule+0x37/0x80
déc 10 21:49:01 the_PC kernel:  [<ffffffff810a103a>] do_exit+0x80a/0xae0
déc 10 21:49:01 the_PC kernel:  [<ffffffff810180fe>] oops_end+0x9e/0xd0
déc 10 21:49:01 the_PC kernel:  [<ffffffff81064c25>] no_context+0x135/0x380
déc 10 21:49:01 the_PC kernel:  [<ffffffff81064ef0>]
__bad_area_nosemaphore+0x80/0x1f0
déc 10 21:49:01 the_PC kernel:  [<ffffffff81065073>]
bad_area_nosemaphore+0x13/0x20
déc 10 21:49:01 the_PC kernel:  [<ffffffff81065357>] __do_page_fault+0xb7/0x400
déc 10 21:49:01 the_PC kernel:  [<ffffffff810656cf>] do_page_fault+0x2f/0x80
déc 10 21:49:01 the_PC kernel:  [<ffffffff8177b378>] page_fault+0x28/0x30
déc 10 21:49:01 the_PC kernel:  [<ffffffffa00f850a>] ?
radeon_ring_backup+0xda/0x190 [radeon]
déc 10 21:49:01 the_PC kernel:  [<ffffffffa00f85b0>] ?
radeon_ring_backup+0x180/0x190 [radeon]
déc 10 21:49:01 the_PC kernel:  [<ffffffffa00f9413>] ?
radeon_irq_kms_disable_hpd+0x73/0x80 [radeon]
déc 10 21:49:01 the_PC kernel:  [<ffffffffa00c6c80>]
radeon_gpu_reset+0xd0/0x330 [radeon]
déc 10 21:49:01 the_PC kernel:  [<ffffffff810df990>] ?
wake_atomic_t_function+0x70/0x70
déc 10 21:49:01 the_PC kernel:  [<ffffffffa00e058f>] ?
radeon_fence_wait+0x9f/0xe0 [radeon]
déc 10 21:49:01 the_PC kernel:  [<ffffffffa00ed960>]
radeon_flip_work_func+0x130/0x170 [radeon]
déc 10 21:49:01 the_PC kernel:  [<ffffffff810b650e>]
process_one_work+0x19e/0x3f0
déc 10 21:49:01 the_PC kernel:  [<ffffffff810b67ae>] worker_thread+0x4e/0x450
déc 10 21:49:01 the_PC kernel:  [<ffffffff810b6760>] ?
process_one_work+0x3f0/0x3f0
déc 10 21:49:01 the_PC kernel:  [<ffffffff810b6760>] ?
process_one_work+0x3f0/0x3f0
déc 10 21:49:01 the_PC kernel:  [<ffffffff810bc8b8>] kthread+0xd8/0xf0
déc 10 21:49:01 the_PC kernel:  [<ffffffff810bc7e0>] ?
kthread_worker_fn+0x160/0x160
déc 10 21:49:01 the_PC kernel:  [<ffffffff817797df>] ret_from_fork+0x3f/0x70
déc 10 21:49:01 the_PC kernel:  [<ffffffff810bc7e0>] ?
kthread_worker_fn+0x160/0x160
déc 10 21:49:01 the_PC kernel: Code: c4 08 44 89 e8 5b 41 5c 41 5d 5d c3 4c 89
e7 e8 e7 eb fd ff eb 88 0f 1f 44 00 00 66 66 66 66 90 48 8b 87 90 05 00 00 55
48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 
déc 10 21:49:01 the_PC kernel: RIP  [<ffffffff810bcd40>] kthread_data+0x10/0x20
déc 10 21:49:01 the_PC kernel:  RSP <ffff8805ff5c3918>
déc 10 21:49:01 the_PC kernel: CR2: ffffffffffffffd8
déc 10 21:49:01 the_PC kernel: ---[ end trace 37e2470f6b251993 ]---
déc 10 21:49:01 the_PC kernel: Fixing recursive fault but reboot is needed!
-- Reboot --
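To help triage logs like the one above, here is a minimal sketch that pulls the
ring number and fence ids out of the "GPU lockup" message, tolerating the hard
line wrapping of this mail. The function name parse_lockup and the "pending"
field are hypothetical, and reading the difference between the two ids as the
number of fences emitted but not yet signaled is my assumption about what the
driver prints, not something confirmed in this report.

```python
import re

# Matches the radeon lockup message once wrapped lines are joined by spaces.
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)


def parse_lockup(text):
    """Return ring and fence ids from a lockup message, or None if absent."""
    # Collapse all whitespace (including newlines) so that a message split
    # across wrapped lines still matches the single-line pattern.
    flat = " ".join(text.split())
    match = LOCKUP_RE.search(flat)
    if match is None:
        return None
    current = int(match.group(1), 16)
    last = int(match.group(2), 16)
    return {
        "ring": int(match.group(3)),
        "current": current,
        "last": last,
        # Assumption: "last" minus "current" is the count of outstanding fences.
        "pending": last - current,
    }


if __name__ == "__main__":
    sample = (
        "radeon 0000:02:00.0: GPU lockup (current fence\n"
        "id 0x000000000000a5fe last fence id 0x000000000000a600 on ring 3)"
    )
    print(parse_lockup(sample))
```

For the message in this report, that gives ring 3 with the two fence ids
differing by 2.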