Dear Alex,

I've built your drm-next-3.19-wip branch (commit be762d181e130d0e6e630f823400e9e1ba3bafd8), and with that kernel and the graphics stack detailed below I can't run any of my Steam games anymore. All of them immediately go into the defunct state when their 3D engine is powered up (i.e. I get to see the intro/vendor videos but almost nothing after that, not even the main menus). When I attach gdb to a game process, the game stops on a SIGPWR. If I hit c, I usually get a SIGXCPU next, sometimes another SIGPWR. No matter how often I hit c, I always get one of those two signals, and the game is just dead. The normal desktop with 3D acceleration and effects (KDE) works fine.
This is a regression compared to my previous kernel (Git:git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git:v3.18-rc1 + https://bugs.freedesktop.org/attachment.cgi?id=107451 and https://bugs.freedesktop.org/attachment.cgi?id=107544).
I haven't had time to bisect this yet, but maybe you have an idea right away what might cause it. I'm also not sure whether it's a bug in the DRI portion of the kernel at all.
Is there anything besides a bisect you would need to debug this?
Cheers, Kai
Kai Wasserbäch wrote on 15.11.2014 16:33:
> I've built your drm-next-3.19-wip branch (commit be762d181e130d0e6e630f823400e9e1ba3bafd8) and with that kernel and the graphics stack detailed below, [...]
Sorry, forgot to include the stack, here we go:
My current stack is (Debian testing as a base):
GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
Mesa: Git:master/a4ffc2a445
libdrm: Git:master/00847fa48b
LLVM: SVN:trunk/r222082 (3.6 devel)
X.Org: 2:1.16.1-1
Firmware: http://people.freedesktop.org/~agd5f/radeon_ucode/
  # 9e05820da42549ce9c89d147cf1f8e19 hawaii_ce.bin
  # c8bab593090fc54f239c8d7596c8d846 hawaii_mc.bin
  # 3618dbb955d8a84970e262bb2e6d2a16 hawaii_me.bin
  # c000b0fc9ff6582145f66504b0ec9597 hawaii_mec.bin
  # 0643ad24b3beff2214cce533e094c1b7 hawaii_pfp.bin
  # ba6054b7d78184a74602fd81607e1386 hawaii_rlc.bin
  # 11288f635737331b69de9ee82fe04898 hawaii_sdma.bin
  # 284429675a5560e0fad42aa982965fc2 hawaii_smc.bin
libclc: Git:master/7f6f5bff1f
DDX: 1:7.5.0-1
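The `#` lines above appear to be MD5 checksums of the firmware files. As a minimal sketch, assuming the blobs are installed under /lib/firmware/radeon (the usual Debian location, not stated in the thread), a few lines of Python can check installed files against such a list. `md5_hex`, `check_firmware` and the two sample entries are hypothetical names chosen for illustration:

```python
import hashlib
from pathlib import Path
from typing import Dict

# Two entries copied from the list above; extend as needed.
EXPECTED_MD5 = {
    "hawaii_ce.bin": "9e05820da42549ce9c89d147cf1f8e19",
    "hawaii_smc.bin": "284429675a5560e0fad42aa982965fc2",
}

# Assumption: Debian's location for radeon firmware.
FIRMWARE_DIR = Path("/lib/firmware/radeon")


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def check_firmware(expected: Dict[str, str], fw_dir: Path) -> Dict[str, bool]:
    """Map each file name to True if it exists in fw_dir and its MD5 matches."""
    results = {}
    for name, digest in expected.items():
        path = fw_dir / name
        results[name] = path.is_file() and md5_hex(path.read_bytes()) == digest
    return results


if __name__ == "__main__":
    for name, ok in check_firmware(EXPECTED_MD5, FIRMWARE_DIR).items():
        print(f"{name}: {'OK' if ok else 'missing or mismatch'}")
```

A missing file and a mismatching file are reported the same way here, which is enough for a quick sanity check before reporting a stack.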
Kai Wasserbäch wrote:
> Dear Alex, I've built your drm-next-3.19-wip branch (commit be762d181e130d0e6e630f823400e9e1ba3bafd8) and with that kernel and the graphics stack detailed below, I can't run any of my Steam games anymore. All of them immediately go into the defunct state, when their 3D engine is powered up (ie. I get to see the intro/vendor videos but almost nothing after that, including the main menus). When I attach gdb to a game process, the games stop on a SIGPWR. If I hit c I usually get a SIGXCPU next, sometimes it's another SIGPWR. But no matter how often you hit c you always get one of those two signals and the game is just dead. Normal desktop with 3D acceleration and effects (KDE) is working fine.
> This is a regression over my previous kernel (Git:git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git:v3.18-rc1 + https://bugs.freedesktop.org/attachment.cgi?id=107451 and https://bugs.freedesktop.org/attachment.cgi?id=107544).
> I didn't have time to bisect this yet, but maybe you have an idea what might cause this right away. I'm also not sure if it's a bug in the DRI portion of the kernel.
> Is there anything besides a bisect you would need to debug this?
I also see an issue with the current drm-next-3.19-wip branch on Pitcairn with demos like Unigine Valley and Unreal Elemental.
I was running last week's drm-next-3.19-wip fine with these, so I guess the issue is near the head, though the way other things get merged in makes it a bit hard to see what's new just by looking at cgit.
This is what I get in dmesg:
[ 730.830960] BUG: unable to handle kernel paging request at ffffeb0400404000
[ 730.830993] IP: [<ffffffff81156c72>] kfree+0x62/0x1c0
[ 730.831013] PGD 0
[ 730.831019] Oops: 0000 [#1] PREEMPT SMP
[ 730.831034] Modules linked in: radeon fbcon bitblit fbcon_rotate fbcon_ccw fbcon_ud fbcon_cw softcursor snd_hda_codec_hdmi font snd_hda_codec_realtek tileblit snd_hda_codec_generic i2c_algo_bit snd_usb_audio snd_hda_intel drm_kms_helper snd_hda_controller snd_usbmidi_lib snd_hda_codec snd_rawmidi ttm snd_seq_device snd_hwdep snd_pcm drm snd_timer ata_generic r8169 pata_acpi snd xhci_pci xhci_hcd plusb soundcore asus_atk0110 pcspkr acpi_cpufreq usbnet mii k10temp i2c_piix4 pata_jmicron
[ 730.831247] CPU: 0 PID: 576 Comm: RenderThread 1 Not tainted 3.18.0-rc4-gbe762d1 #1
[ 730.831274] Hardware name: System manufacturer System Product Name/M4A89GTD-PRO/USB3, BIOS 1456 05/04/2010
[ 730.831308] task: ffff8802254e60c0 ti: ffff8801c69a4000 task.ti: ffff8801c69a4000
[ 730.831333] RIP: 0010:[<ffffffff81156c72>] [<ffffffff81156c72>] kfree+0x62/0x1c0
[ 730.831360] RSP: 0018:ffff8801c69a7c98 EFLAGS: 00010286
[ 730.831377] RAX: ffffeb0400404000 RBX: ffffc90010100000 RCX: 0000000000000100
[ 730.831400] RDX: ffffea0000000000 RSI: ffff88008a528118 RDI: ffffc90010100000
[ 730.831424] RBP: ffff8801c69a7cd8 R08: ffffc90010100000 R09: ffff880224d942e8
[ 730.831447] R10: 00000000000002d0 R11: ffff880224d941b8 R12: ffff8801bf3c9a80
[ 730.831471] R13: ffffffffa030983c R14: ffff8801c69a7d00 R15: ffff8801c69a7dc8
[ 730.831495] FS: 00007f3911d62700(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
[ 730.831523] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 730.831542] CR2: ffffeb0400404000 CR3: 00000001d1de8000 CR4: 00000000000007f0
[ 730.832968] Stack:
[ 730.834373] ffff8801c69a7ca8 ffffffff815379c1 ffff8801c69a7cd8 ffff8801be8796c0
[ 730.835811] ffff8801bf3c9a80 ffff880224d94000 ffff8801c69a7d00 ffff8801c69a7dc8
[ 730.837257] ffff8801c69a7d78 ffffffffa030983c ffffc90010100000 ffffc90000000000
[ 730.838699] Call Trace:
[ 730.840126] [<ffffffff815379c1>] ? _raw_spin_unlock+0x11/0x30
[ 730.841597] [<ffffffffa030983c>] radeon_gem_va_ioctl+0x4fc/0x610 [radeon]
[ 730.843073] [<ffffffffa013f1ac>] drm_ioctl+0x1ac/0x630 [drm]
[ 730.844541] [<ffffffff8106b205>] ? preempt_count_add+0x55/0xb0
[ 730.846007] [<ffffffff815379f3>] ? _raw_spin_unlock_irqrestore+0x13/0x30
[ 730.847475] [<ffffffff8136f586>] ? __pm_runtime_resume+0x56/0x70
[ 730.848947] [<ffffffffa02d5053>] radeon_drm_ioctl+0x53/0x90 [radeon]
[ 730.850423] [<ffffffff811710f0>] do_vfs_ioctl+0x2d0/0x4b0
[ 730.851885] [<ffffffff8117ae14>] ? __fget+0x74/0xb0
[ 730.853347] [<ffffffff81171317>] SyS_ioctl+0x47/0x90
[ 730.854812] [<ffffffff81538516>] system_call_fastpath+0x16/0x1b
[ 730.856284] Code: 00 00 00 80 ff 77 00 00 48 01 d8 48 0f 42 15 b6 03 8c 00 48 01 d0 48 ba 00 00 00 00 00 ea ff ff 48 c1 e8 0c 48 c1 e0 06 48 01 d0 <48> 8b 10 80 e6 80 0f 85 2f 01 00 00 49 89 c6 49 8b 06 a8 80 0f
[ 730.857914] RIP [<ffffffff81156c72>] kfree+0x62/0x1c0
[ 730.859465] RSP <ffff8801c69a7c98>
[ 730.861021] CR2: ffffeb0400404000
[ 730.872919] ---[ end trace c7fce73a27e29045 ]---
Andy Furniss wrote:
>> I didn't have time to bisect this yet, but maybe you have an idea what might cause this right away. I'm also not sure if it's a bug in the DRI portion of the kernel.
>> Is there anything besides a bisect you would need to debug this?
I am just a user, but after some testing maybe it would be better if you could reproduce this on a kernel similar to your old one and bisect that.
On drm-next-3.19-wip I only have to go back four commits to reach ones I've been running for weeks (and it still fails for me if I reset to there), so for me at least, bisecting this tree isn't really an option: I wouldn't be testing like for like.
> I also see an issue with current drm-next-3.19-wip branch with pitcairn and demos line Unigine Valley and Unreal Elemental.
> I was running last weeks drm-next-3.19-wip OK with these, so I guess the issue is near head, though the way other things get merged in etc makes it a bit hard to see what's new just looking at cgit.
Stable for me is drm-next-3.19-wip at 3.18.0-rc2-gc76c717; failing is the current 3.18.0-rc4-gbe762d1. So it seems to be something in the merge that took the branch from rc2 to rc4.
Kai Wasserbäch wrote on 15.11.2014 16:33:
> Is there anything besides a bisect you would need to debug this?
Ok, I did a bisection, but that time was definitely wasted: my "first bad commit" isn't bad at all. Is there any way to improve that experience? I'm really loath to go through a dozen boots again just to get another broken bisection.
I noticed, however, that the following line shows up in dmesg with be762d181e:
[drm:ci_dpm_init [radeon]] *ERROR* Invalid PCC GPIO!
== power state 0 ==
And when I run any game and hit that SIGPWR, the following shows up in dmesg:
[ 154.120246] BUG: unable to handle kernel paging request at ffffeae38016fec8
[ 154.120272] IP: [<ffffffff8111e6b1>] virt_to_head_page+0x33/0x4a
[ 154.120293] PGD 0
[ 154.120300] Oops: 0000 [#1] SMP
[ 154.120312] Modules linked in: serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic blowfish_x86_64 blowfish_common ecb cmac sha512_ssse3 sha512_generic sha256_ssse3 sha256_generic nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc nls_utf8 nls_cp437 vfat fat x86_pkg_temp_thermal coretemp iTCO_wdt kvm_intel snd_hda_codec_realtek radeon snd_hda_codec_generic snd_hda_codec_hdmi joydev snd_hda_intel iTCO_vendor_support drm_kms_helper ttm snd_hda_controller kvm snd_hda_codec evdev snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_timer lpc_ich mfd_core mei_me mei snd i2c_i801 soundcore processor efivars button video serio_raw pcspkr fuse parport_pc ppdev lp parport ext4 crc16 mbcache jbd2 btrfs xor raid6_pq twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common
[ 154.120568] xts af_alg hid_generic usbhid dm_crypt dm_mod microcode hid_lg_g710_plus(O) hid sg sr_mod cdrom sd_mod crct10dif_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ahci libahci libata atl1c thermal fan thermal_sys
[ 154.120659] CPU: 0 PID: 2041 Comm: Dreamfall Chapt Tainted: G O 3.18.0-rc4-citadel-3.18-rc1+agd5f-3.19-wip.0.git-be762d181e #1
[ 154.120691] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77-DS3H, BIOS F11a 11/13/2013
[ 154.120717] task: ffff8803dba9ca90 ti: ffff8803f5bdc000 task.ti: ffff8803f5bdc000
[ 154.120737] RIP: 0010:[<ffffffff8111e6b1>] [<ffffffff8111e6b1>] virt_to_head_page+0x33/0x4a
[ 154.120760] RSP: 0018:ffff8803f5bdfcf0 EFLAGS: 00010082
[ 154.120774] RAX: ffffeae38016fec8 RBX: 0000000000000286 RCX: 000077ff80000000
[ 154.120793] RDX: ffffea0000000000 RSI: ffff8803f5bdfd30 RDI: ffffc9000691f000
[ 154.120812] RBP: ffffc9000691f000 R08: 0000000000000000 R09: ffff8800dedf5008
[ 154.120830] R10: ffff8800dedf4fe0 R11: 000000000007ffff R12: ffffffffa049adde
[ 154.120849] R13: ffff8803f5bdfd30 R14: ffff88038f465ec0 R15: ffff88038f1166c0
[ 154.120867] FS: 00007f6efa8ed780(0000) GS:ffff88041ec00000(0000) knlGS:0000000000000000
[ 154.120888] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 154.120903] CR2: ffffeae38016fec8 CR3: 00000003f69e6000 CR4: 00000000001407f0
[ 154.120922] Stack:
[ 154.120928] ffffffff8111eda1 ffff8803f5bdfde0 0000000000000000 ffff8800dedf4000
[ 154.120950] ffffffffa049adde ffffc9000691f000 00000000a049a458 0000000000000000
[ 154.120973] ffffc900069200f0 ffff8803f5bdfd58 ffff8803dba9ca90 00000000000021af
[ 154.120995] Call Trace:
[ 154.121004] [<ffffffff8111eda1>] ? kfree+0x2e/0x6d
[ 154.121028] [<ffffffffa049adde>] ? radeon_gem_va_ioctl+0x284/0x2dc [radeon]
[ 154.121055] [<ffffffffa049a601>] ? radeon_gem_create_ioctl+0xa6/0xc3 [radeon]
[ 154.121076] [<ffffffff812be431>] ? drm_ioctl+0x35b/0x3e1
[ 154.121097] [<ffffffffa049ab5a>] ? radeon_gem_get_tiling_ioctl+0x8e/0x8e [radeon]
[ 154.121118] [<ffffffff81440d8c>] ? _raw_spin_unlock_irqrestore+0xc/0xd
[ 154.121137] [<ffffffff8106fcf1>] ? set_next_entity+0x37/0x89
[ 154.121157] [<ffffffffa047604b>] ? radeon_drm_ioctl+0x4b/0x7a [radeon]
[ 154.121176] [<ffffffff8113e795>] ? do_vfs_ioctl+0x34e/0x404
[ 154.121192] [<ffffffff8106849d>] ? finish_task_switch+0x85/0xe0
[ 154.121209] [<ffffffff8143e85e>] ? __schedule+0x376/0x524
[ 154.121225] [<ffffffff8113e89c>] ? SyS_ioctl+0x51/0x77
[ 154.121239] [<ffffffff81441329>] ? system_call_fastpath+0x12/0x17
[ 154.121256] Code: 00 00 80 ff 77 00 00 48 01 fa 48 0f 42 0d 78 99 6f 00 48 8d 04 11 48 ba 00 00 00 00 00 ea ff ff 48 c1 e8 0c 48 6b c0 38 48 01 d0 <48> 8b 10 80 e6 80 74 0e 48 8b 50 30 48 8b 08 80 e5 80 48 0f 45
[ 154.121371] RIP [<ffffffff8111e6b1>] virt_to_head_page+0x33/0x4a
[ 154.121389] RSP <ffff8803f5bdfcf0>
[ 154.121398] CR2: ffffeae38016fec8
[ 154.133289] ---[ end trace f4a1176ae520c6e2 ]---
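Both oopses fault at the point where kfree() turns its argument into a struct page pointer, and in both the argument (RBX/RDI in the Pitcairn trace, RDI in the Hawaii trace) starts with ffffc9. A minimal Python sketch, assuming the standard x86-64 memory map of 3.18 kernels without KASLR (the constants below; RDX = ffffea0000000000 in both dumps agrees with the assumed vmemmap base) and struct page sizes of 64 and 56 bytes read off the `shl $0x6` and `imul $0x38` in the two Code: lines, reproduces both CR2 values. `region` and `page_struct_addr` are hypothetical helpers. The result suggests the pointers came from the vmalloc area rather than from kmalloc(), but that is an inference from the arithmetic, not a confirmed diagnosis of radeon_gem_va_ioctl():

```python
# Assumed x86-64 constants for 3.18-era kernels without KASLR.
PAGE_OFFSET = 0xFFFF880000000000      # start of the direct mapping of RAM
DIRECT_MAP_END = 0xFFFFC7FFFFFFFFFF   # end of the 64 TB direct mapping
VMALLOC_START = 0xFFFFC90000000000
VMALLOC_END = 0xFFFFE8FFFFFFFFFF
VMEMMAP_BASE = 0xFFFFEA0000000000     # start of the struct page array
PAGE_SHIFT = 12
MASK64 = (1 << 64) - 1


def region(vaddr: int) -> str:
    """Classify a kernel virtual address by the assumed memory map."""
    if PAGE_OFFSET <= vaddr <= DIRECT_MAP_END:
        return "direct map"
    if VMALLOC_START <= vaddr <= VMALLOC_END:
        return "vmalloc"
    return "other"


def page_struct_addr(vaddr: int, struct_page_size: int) -> int:
    """Mimic virt_to_page() for an address that is treated as direct-mapped.

    __pa() subtracts PAGE_OFFSET (modulo 2**64), the page frame number is the
    result shifted right by PAGE_SHIFT, and the struct page is assumed to
    live at VMEMMAP_BASE + pfn * struct_page_size. In a non-debug build
    nothing checks that vaddr really lies in the direct map.
    """
    pfn = ((vaddr - PAGE_OFFSET) & MASK64) >> PAGE_SHIFT
    return (VMEMMAP_BASE + pfn * struct_page_size) & MASK64


# (pointer handed to kfree, struct page size from the Code: bytes, CR2)
TRACES = [
    (0xFFFFC90010100000, 64, 0xFFFFEB0400404000),  # Pitcairn, kfree+0x62
    (0xFFFFC9000691F000, 56, 0xFFFFEAE38016FEC8),  # Hawaii, virt_to_head_page+0x33
]

if __name__ == "__main__":
    for ptr, size, cr2 in TRACES:
        computed = page_struct_addr(ptr, size)
        print(f"{ptr:#x} ({region(ptr)}) -> {computed:#x}, "
              f"CR2 {cr2:#x}, match={computed == cr2}")
```

Run as a script, both lines should report `vmalloc` and `match=True`.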
gdb shows the following for Dreamfall Chapters (taken from the first time I hit this bug; as I've written before, other Steam games like Borderlands 2/The Pre-Sequel fail as well):
0x00007fabd96db18d in poll () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) c
Continuing.

Program received signal SIGPWR, Power fail/restart.
[Switching to Thread 0x7fabd4f37700 (LWP 1959)]
sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
85 ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S: No such file or directory.
(gdb) bt full
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
No locals.
#1 0x00007fabd58984f3 in mono_sem_wait () from /home/kai/.local/share/Steam/SteamApps/common/Dreamfall Chapters/Dreamfall Chapters_Data/Mono/x86_64/libmono.so
No symbol table info available.
#2 0x00007fabd57fea3b in ?? () from /home/kai/.local/share/Steam/SteamApps/common/Dreamfall Chapters/Dreamfall Chapters_Data/Mono/x86_64/libmono.so
No symbol table info available.
#3 0x00007fabd5869085 in ?? () from /home/kai/.local/share/Steam/SteamApps/common/Dreamfall Chapters/Dreamfall Chapters_Data/Mono/x86_64/libmono.so
No symbol table info available.
#4 0x00007fabd5890118 in ?? () from /home/kai/.local/share/Steam/SteamApps/common/Dreamfall Chapters/Dreamfall Chapters_Data/Mono/x86_64/libmono.so
No symbol table info available.
#5 0x00007fabd58b1389 in ?? () from /home/kai/.local/share/Steam/SteamApps/common/Dreamfall Chapters/Dreamfall Chapters_Data/Mono/x86_64/libmono.so
No symbol table info available.
#6 0x00007fabdaf2e0a4 in start_thread (arg=0x7fabd4f37700) at pthread_create.c:309
        __res = <optimized out>
        pd = 0x7fabd4f37700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140375988860672, -6352670935743070686, 0, 140376098332768, 140375998660696, 140375988860672, 6377500145404476962, 6377504531018677794}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#7 0x00007fabd96e3ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
No locals.
(gdb) info registers all
rax 0xfffffffffffffe00 -512
rbx 0x7fabd5bbbfd0 140376001986512
rcx 0xffffffffffffffff -1
rdx 0x0 0
rsi 0x80 128
rdi 0x7fabd5bbbfd0 140376001986512
rbp 0x0 0x0
rsp 0x7fabd4f36d70 0x7fabd4f36d70
r8 0x0 0
r9 0x0 0
r10 0x0 0
r11 0x246 582
r12 0x7fabd57fe9e5 140375998065125
r13 0x0 0
r14 0x0 0
r15 0x7fabd4f36dd8 140375988858328
rip 0x7fabdaf34050 0x7fabdaf34050 <sem_wait+48>
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
st0 0 (raw 0x00000000000000000000)
st1 0 (raw 0x00000000000000000000)
st2 0 (raw 0x00000000000000000000)
st3 0 (raw 0x00000000000000000000)
st4 0 (raw 0x00000000000000000000)
st5 0 (raw 0x00000000000000000000)
st6 0 (raw 0x00000000000000000000)
st7 0 (raw 0x00000000000000000000)
fctrl 0x27f 639
fstat 0x0 0
ftag 0xffff 65535
fiseg 0x0 0
fioff 0x0 0
foseg 0x0 0
fooff 0x0 0
fop 0x0 0
mxcsr 0x1fa0 [ PE IM DM ZM OM UM PM ]
ymm0 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x8000000000000000, 0x0, 0x0, 0x0}, v32_int8 = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x0 <repeats 24 times>}, v16_int16 = {0xffff, 0xffff, 0xffff, 0xffff, 0x0 <repeats 12 times>}, v8_int32 = {0xffffffff, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0xffffffffffffffff, 0x0, 0x0, 0x0}, v2_int128 = {0x0000000000000000ffffffffffffffff, 0x00000000000000000000000000000000}}
ymm1 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000000000000000000000, 0x00000000000000000000000000000000}}
ymm2 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000000000000000000000, 0x00000000000000000000000000000000}}
ymm3 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000000000000000000000, 0x00000000000000000000000000000000}}
ymm4 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x20, 0x6f, 0x62, 0x6a, 0x65, 0x63, 0x74, 0x20, 0x0 <repeats 24 times>}, v16_int16 = {0x6f20, 0x6a62, 0x6365, 0x2074, 0x0 <repeats 12 times>}, v8_int32 = {0x6a626f20, 0x20746365, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x207463656a626f20, 0x0, 0x0, 0x0}, v2_int128 = {0x0000000000000000207463656a626f20, 0x00000000000000000000000000000000}}
ymm5 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x8000000000000000, 0x0, 0x0, 0x0}, v32_int8 = {0x68, 0x65, 0x63, 0x6b, 0x70, 0x6f, 0x69, 0x6e, 0x0 <repeats 24 times>}, v16_int16 = {0x6568, 0x6b63, 0x6f70, 0x6e69, 0x0 <repeats 12 times>}, v8_int32 = {0x6b636568, 0x6e696f70, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x6e696f706b636568, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000006e696f706b636568, 0x00000000000000000000000000000000}}
ymm6 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x8000000000000000, 0x0, 0x0, 0x0}, v32_int8 = {0x65, 0x5f, 0x69, 0x6e, 0x74, 0x65, 0x72, 0x72, 0x0 <repeats 24 times>}, v16_int16 = {0x5f65, 0x6e69, 0x6574, 0x7272, 0x0 <repeats 12 times>}, v8_int32 = {0x6e695f65, 0x72726574, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x727265746e695f65, 0x0, 0x0, 0x0}, v2_int128 = {0x0000000000000000727265746e695f65, 0x00000000000000000000000000000000}}
ymm7 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000000000000000000000, 0x00000000000000000000000000000000}}
ymm8 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000000000000000000000, 0x00000000000000000000000000000000}}
ymm9 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0 <repeats 21 times>}, v16_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0x0, 0x0, 0xff0000, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0xff0000, 0x0, 0x0}, v2_int128 = {0x0000000000ff00000000000000000000, 0x00000000000000000000000000000000}}
ymm10 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000000000000000000000, 0x00000000000000000000000000000000}}
ymm11 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x8000000000000000, 0x0, 0x0, 0x0}, v32_int8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0x0 <repeats 24 times>}, v16_int16 = {0x0, 0x0, 0x0, 0xff00, 0x0 <repeats 12 times>}, v8_int32 = {0x0, 0xff000000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0xff00000000000000, 0x0, 0x0, 0x0}, v2_int128 = {0x0000000000000000ff00000000000000, 0x00000000000000000000000000000000}}
ymm12 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000000000000000000000, 0x00000000000000000000000000000000}}
ymm13 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000000000000000000000, 0x00000000000000000000000000000000}}
ymm14 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000000000000000000000, 0x00000000000000000000000000000000}}
ymm15 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000000000000000000000, 0x00000000000000000000000000000000}}
Stuff like KDE's desktop effects (renderer settings: OpenGL 3.1, native) or glxgears don't trigger this bug.
Kai Wasserbäch wrote:
> Kai Wasserbäch wrote on 15.11.2014 16:33:
>> Is there anything besides a bisect you would need to debug this?
> Ok, I did a bisection, but that time was wasted for sure. My "first bad commit" isn't bad at all. Is there any way to improve that experience? I'm really loathe to go through the dozen boots again, just to get another broken bisection.
So you don't have any bad commits at all on Linus' kernel?
I agree bisecting kernels is a pain :-)
Of course it's possible our issues aren't the same anyway.
Andy Furniss wrote on 15.11.2014 23:29:
> Kai Wasserbäch wrote:
>> Kai Wasserbäch wrote on 15.11.2014 16:33:
>>> Is there anything besides a bisect you would need to debug this?
>> Ok, I did a bisection, but that time was wasted for sure. My "first bad commit" isn't bad at all. Is there any way to improve that experience? I'm really loathe to go through the dozen boots again, just to get another broken bisection.
> So you don't have any bads at all on linus kernel?
Oh yes, that was the problem. The bad commit was so obviously not the cause of my issue, and so far down in Linus' merge land, that I was sure git bisect must have skipped too much somewhere (or that I had marked a commit wrongly).
> Of course it's possible our issues aren't the same anyway.
Looking at the backtrace you've posted, I would say so. Your bug seems to happen somewhere else. In any case, I managed to hunt my commit down.
Cheers, Kai
Kai Wasserbäch wrote on 15.11.2014 22:22:
> Kai Wasserbäch wrote on 15.11.2014 16:33:
>> Is there anything besides a bisect you would need to debug this?
> Ok, I did a bisection, but that time was wasted for sure. My "first bad commit" isn't bad at all. Is there any way to improve that experience? I'm really loathe to go through the dozen boots again, just to get another broken bisection.
Ok, after looking at the changes for radeon, I decided to try the HEAD of drm-next-3.19 (c81b99423bd9d3fc35ac8752ca5fb4c50eab063c). That was still good. Armed with this much smaller bisection range, I came up with a result that at least sounds believable:
3cd76f3901e73a4a61d78c4526dcbf6d87c9a928 is the first bad commit
commit 3cd76f3901e73a4a61d78c4526dcbf6d87c9a928
Author: Christian König <christian.koenig@amd.com>
Date: Mon Oct 13 12:41:47 2014 +0200
drm/radeon: update the VM after setting BO address (v2)
This way the necessary VM update is kicked off immediately if all BOs involved are in GPU accessible memory.
v2: fix vm lock handling
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
:040000 040000 5bb1cc4497cc2e5b9af3a59170d7a8d5c99fd0a0 3df4996bc1a25f9275b833c778cdc33a30b28838 M drivers
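Why testing the drm-next-3.19 HEAD first paid off: each bisection test roughly halves the set of candidate commits, so N candidates take about log2(N) builds and boots. The sketch below is a plain binary search over a *linear* list of commits; real git bisect walks the commit graph and merges, so it only approximates git's step count. `bisect_first_bad` is a hypothetical helper, and the commit counts in the usage lines are made up for illustration. In git terms, the extra good endpoint corresponds to marking the tested drm-next-3.19 HEAD good (e.g. `git bisect good c81b99423bd9`).

```python
from typing import Callable, Tuple


def bisect_first_bad(n_commits: int, is_bad: Callable[[int], bool]) -> Tuple[int, int]:
    """Find the first bad commit in a linear history by binary search.

    Commits are numbered 0..n_commits-1 from oldest to newest. Commit 0 is
    assumed known good and the last commit known bad. Returns
    (index of the first bad commit, number of commits that were tested).
    """
    good, bad = 0, n_commits - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1                    # one build + boot + test run
        if is_bad(mid):
            bad = mid                 # first bad commit is at or before mid
        else:
            good = mid                # first bad commit is after mid
    return bad, tests


if __name__ == "__main__":
    # Hypothetical ranges: 4096 candidates vs. a range narrowed to 64.
    print(bisect_first_bad(4097, lambda i: i >= 3000))  # (3000, 12)
    print(bisect_first_bad(65, lambda i: i >= 40))      # (40, 6)
```

Going from thousands of candidates to a few dozen turns "a dozen boots" into about half as many, which matches the experience described above.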
CCing Christian; maybe he has an idea what's up.
Let me know if you need anything else to debug this, or if I should file a bug.
> I noticed, however, that the following line is showing up with be762d181e in dmesg:
> [drm:ci_dpm_init [radeon]] *ERROR* Invalid PCC GPIO!
> == power state 0 ==
This seems to be unrelated, as it shows up with every commit from the 3.19 branch that I tested during this bisection run, even the ones that worked. (Might it, however, be related to https://bugs.freedesktop.org/show_bug.cgi?id=82201?)
Cheers, Kai
Kai Wasserbäch wrote:
> Kai Wasserbäch wrote on 15.11.2014 22:22:
>> Kai Wasserbäch wrote on 15.11.2014 16:33:
>>> Is there anything besides a bisect you would need to debug this?
>> Ok, I did a bisection, but that time was wasted for sure. My "first bad commit" isn't bad at all. Is there any way to improve that experience? I'm really loathe to go through the dozen boots again, just to get another broken bisection.
> Ok, after looking at the changes for radeon I decided to try the HEAD of drm-next-3.19 (c81b99423bd9d3fc35ac8752ca5fb4c50eab063c). That was still good. Armed with this much smaller bisection range, I came up with a result that sounds at least believable:
> 3cd76f3901e73a4a61d78c4526dcbf6d87c9a928 is the first bad commit
> commit 3cd76f3901e73a4a61d78c4526dcbf6d87c9a928
> Author: Christian König christian.koenig@amd.com
> Date: Mon Oct 13 12:41:47 2014 +0200
> drm/radeon: update the VM after setting BO address (v2)
Yes, that seems to be it for me as well, but to confuse things, I've been running with that commit for several weeks without incident, so something brought in by the recent merges, or a new patch, doesn't play nicely with it.
If you want to test, you can get back to the state drm-next-3.19-wip was in last week with
git reset --hard 3.18.0-rc2-gc76c717
and you can see with git log that the commit is there, about six or seven commits down.
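The argument 3.18.0-rc2-gc76c717 is a kernel release string, not a branch or tag name. It works with git reset because git also accepts `git describe`-style names (the `<describe>` form in gitrevisions) and resolves them by the abbreviated object name after the trailing `-g`, so the command should be equivalent to `git reset --hard c76c717`. The Python sketch below only illustrates that suffix rule: `abbrev_from_describe` is a hypothetical helper, not git's implementation, and the 4-character minimum is an assumption mirroring git's minimum abbreviation length.

```python
import re
from typing import Optional

# "-g" followed by at least 4 hex digits, anchored at the end of the name.
_DESCRIBE_SUFFIX = re.compile(r"-g([0-9a-fA-F]{4,})$")


def abbrev_from_describe(name: str) -> Optional[str]:
    """Return the abbreviated commit id from a describe-style name, or None."""
    m = _DESCRIBE_SUFFIX.search(name)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(abbrev_from_describe("3.18.0-rc2-gc76c717"))  # c76c717
    print(abbrev_from_describe("drm-next-3.19-wip"))    # None
```

The same rule maps the failing 3.18.0-rc4-gbe762d1 kernel back to commit be762d1.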
On Sat, Nov 15, 2014 at 6:59 PM, Kai Wasserbäch kai@dev.carbon-project.org wrote:
> Kai Wasserbäch wrote on 15.11.2014 22:22:
>> Kai Wasserbäch wrote on 15.11.2014 16:33:
>>> Is there anything besides a bisect you would need to debug this?
>> Ok, I did a bisection, but that time was wasted for sure. My "first bad commit" isn't bad at all. Is there any way to improve that experience? I'm really loathe to go through the dozen boots again, just to get another broken bisection.
> Ok, after looking at the changes for radeon I decided to try the HEAD of drm-next-3.19 (c81b99423bd9d3fc35ac8752ca5fb4c50eab063c). That was still good. Armed with this much smaller bisection range, I came up with a result that sounds at least believable:
> 3cd76f3901e73a4a61d78c4526dcbf6d87c9a928 is the first bad commit
> commit 3cd76f3901e73a4a61d78c4526dcbf6d87c9a928
> Author: Christian König christian.koenig@amd.com
> Date: Mon Oct 13 12:41:47 2014 +0200
>
> drm/radeon: update the VM after setting BO address (v2)
>
> This way the necessary VM update is kicked off immediately if all BOs involved are in GPU accessible memory.
>
> v2: fix vm lock handling
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>
> :040000 040000 5bb1cc4497cc2e5b9af3a59170d7a8d5c99fd0a0 3df4996bc1a25f9275b833c778cdc33a30b28838 M drivers
> CCing Christian, maybe he has an idea, what's up.
> Let me know, if you need something else, to debug this or if I should file this as a bug.
>> I noticed, however, that the following line is showing up with be762d181e in dmesg:
>> [drm:ci_dpm_init [radeon]] *ERROR* Invalid PCC GPIO!
>> == power state 0 ==
> This seems to be unrelated as it shows up with all the commits from the 3.19 branch, I've tested for this bisection run, even those, that worked. (Might this be, however, related to https://bugs.freedesktop.org/show_bug.cgi?id=82201?)
Yes, that is just some additional debugging output for dpm on newer kernels. It doesn't change anything from a hw perspective.
Alex
> Cheers, Kai
Dear Alex,

Alex Deucher wrote on 17.11.2014 17:58:
> On Sat, Nov 15, 2014 at 6:59 PM, Kai Wasserbäch kai@dev.carbon-project.org wrote:
>> Kai Wasserbäch wrote on 15.11.2014 22:22:
>> [...]
>>> I noticed, however, that the following line is showing up with be762d181e in dmesg:
>>> [drm:ci_dpm_init [radeon]] *ERROR* Invalid PCC GPIO!
>>> == power state 0 ==
>> This seems to be unrelated as it shows up with all the commits from the 3.19 branch, I've tested for this bisection run, even those, that worked. (Might this be, however, related to https://bugs.freedesktop.org/show_bug.cgi?id=82201?)
> Yes, that is just some additional debugging output for dpm on newer kernels. It doesn't change anything from a hw perspective.
In case you're interested: with your latest drm-next-3.19-wip branch I get a more detailed message:
[drm:ci_dpm_init [radeon]] *ERROR* Invalid PCC GPIO: 13!
The full dmesg for radeon/drm looks like:
[ 34.648649] [drm] radeon kernel modesetting enabled.
[ 34.648707] checking generic (e0000000 300000) vs hw (e0000000 10000000)
[ 34.648708] fb: switching to radeondrmfb from EFI VGA
[ 34.648735] Console: switching to colour dummy device 80x25
[ 34.649082] [drm] initializing kernel modesetting (HAWAII 0x1002:0x67B1 0x1682:0x9295).
[ 34.649093] [drm] register mmio base: 0xF7E00000
[ 34.649094] [drm] register mmio size: 262144
[ 34.649099] [drm] doorbell mmio base: 0xF0000000
[ 34.649100] [drm] doorbell mmio size: 8388608
[ 34.649119] radeon 0000:01:00.0: Invalid ROM contents
[ 34.649164] ATOM BIOS: C67111
[ 34.649212] radeon 0000:01:00.0: VRAM: 4096M 0x0000000000000000 - 0x00000000FFFFFFFF (4096M used)
[ 34.649215] radeon 0000:01:00.0: GTT: 1024M 0x0000000100000000 - 0x000000013FFFFFFF
[ 34.649216] [drm] Detected VRAM RAM=4096M, BAR=256M
[ 34.649218] [drm] RAM width 512bits DDR
[ 34.649317] [TTM] Zone kernel: Available graphics memory: 8215284 kiB
[ 34.649318] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
[ 34.649319] [TTM] Initializing pool allocator
[ 34.649324] [TTM] Initializing DMA pool allocator
[ 34.649340] [drm] radeon: 4096M of VRAM memory ready
[ 34.649341] [drm] radeon: 1024M of GTT memory ready.
[ 34.649354] [drm] Loading hawaii Microcode
[ 35.111851] [drm] Internal thermal controller with fan control
[ 35.111914] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[ 35.112580] [drm:ci_dpm_init [radeon]] *ERROR* Invalid PCC GPIO: 13!
[ 35.112587] == power state 0 ==
[ 35.112588] ui class: none
[ 35.112591] internal class: boot
[ 35.112593] caps:
[ 35.112595] uvd vclk: 0 dclk: 0
[ 35.112598] power level 0 sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16
[ 35.112600] status: c r b
[ 35.112603] == power state 1 ==
[ 35.112604] ui class: performance
[ 35.112605] internal class: none
[ 35.112606] caps:
[ 35.112608] uvd vclk: 0 dclk: 0
[ 35.112609] power level 0 sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16
[ 35.112611] power level 1 sclk: 98000 mclk: 125000 pcie gen: 3 pcie lanes: 16
[ 35.112612] status:
[ 35.121930] [drm] radeon: dpm initialized
[ 35.301180] [drm] Found VCE firmware/feedback version 40.2.2 / 15!
[ 35.301195] [drm] GART: num cpu pages 262144, num gpu pages 262144
[ 35.302014] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[ 35.302018] [drm] PCIE gen 3 link speeds already enabled
[ 35.318708] [drm] PCIE GART of 1024M enabled (table at 0x000000000078C000).
[ 35.318825] radeon 0000:01:00.0: WB enabled
[ 35.318837] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000100000c00 and cpu addr 0xffff88040918ac00
[ 35.318839] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000100000c04 and cpu addr 0xffff88040918ac04
[ 35.318841] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000100000c08 and cpu addr 0xffff88040918ac08
[ 35.318843] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000100000c0c and cpu addr 0xffff88040918ac0c
[ 35.318845] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000100000c10 and cpu addr 0xffff88040918ac10
[ 35.319238] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xffffc90005d36c98
[ 35.319303] radeon 0000:01:00.0: fence driver on ring 6 use gpu addr 0x0000000100000c18 and cpu addr 0xffff88040918ac18
[ 35.319306] radeon 0000:01:00.0: fence driver on ring 7 use gpu addr 0x0000000100000c1c and cpu addr 0xffff88040918ac1c
[ 35.319308] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 35.319309] [drm] Driver supports precise vblank timestamp query.
[ 35.319333] radeon 0000:01:00.0: irq 30 for MSI/MSI-X
[ 35.319346] radeon 0000:01:00.0: radeon: using MSI.
[ 35.319372] [drm] radeon: irq initialized.
[ 35.322366] [drm] ring test on 0 succeeded in 3 usecs
[ 35.322445] [drm] ring test on 1 succeeded in 3 usecs
[ 35.322465] [drm] ring test on 2 succeeded in 3 usecs
[ 35.322605] [drm] ring test on 3 succeeded in 4 usecs
[ 35.322614] [drm] ring test on 4 succeeded in 4 usecs
[ 35.368313] [drm] ring test on 5 succeeded in 2 usecs
[ 35.388178] [drm] UVD initialized successfully.
[ 35.497411] [drm] ring test on 6 succeeded in 23 usecs
[ 35.497420] [drm] ring test on 7 succeeded in 3 usecs
[ 35.497421] [drm] VCE initialized successfully.
[ 35.499507] [drm] ib test on ring 0 succeeded in 0 usecs
[ 35.499641] [drm] ib test on ring 1 succeeded in 0 usecs
[ 35.499775] [drm] ib test on ring 2 succeeded in 0 usecs
[ 35.499909] [drm] ib test on ring 3 succeeded in 0 usecs
[ 35.500042] [drm] ib test on ring 4 succeeded in 0 usecs
[ 36.018648] [drm] ib test on ring 5 succeeded
[ 36.039436] [drm] ib test on ring 6 succeeded
[ 36.040242] [drm] ib test on ring 7 succeeded
[ 36.041280] [drm] Radeon Display Connectors
[ 36.041283] [drm] Connector 0:
[ 36.041284] [drm] DP-1
[ 36.041285] [drm] HPD2
[ 36.041288] [drm] DDC: 0x6530 0x6530 0x6534 0x6534 0x6538 0x6538 0x653c 0x653c
[ 36.041289] [drm] Encoders:
[ 36.041290] [drm] DFP1: INTERNAL_UNIPHY2
[ 36.041292] [drm] Connector 1:
[ 36.041293] [drm] HDMI-A-1
[ 36.041294] [drm] HPD3
[ 36.041296] [drm] DDC: 0x6550 0x6550 0x6554 0x6554 0x6558 0x6558 0x655c 0x655c
[ 36.041297] [drm] Encoders:
[ 36.041299] [drm] DFP2: INTERNAL_UNIPHY2
[ 36.041300] [drm] Connector 2:
[ 36.041301] [drm] DVI-D-1
[ 36.041302] [drm] HPD1
[ 36.041304] [drm] DDC: 0x6560 0x6560 0x6564 0x6564 0x6568 0x6568 0x656c 0x656c
[ 36.041305] [drm] Encoders:
[ 36.041307] [drm] DFP3: INTERNAL_UNIPHY1
[ 36.041308] [drm] Connector 3:
[ 36.041309] [drm] DVI-D-2
[ 36.041310] [drm] HPD6
[ 36.041312] [drm] DDC: 0x6580 0x6580 0x6584 0x6584 0x6588 0x6588 0x658c 0x658c
[ 36.041313] [drm] Encoders:
[ 36.041315] [drm] DFP4: INTERNAL_UNIPHY
[ 36.041505] switching from power state:
[ 36.041506] ui class: none
[ 36.041508] internal class: boot
[ 36.041510] caps:
[ 36.041512] uvd vclk: 0 dclk: 0
[ 36.041514] power level 0 sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16
[ 36.041515] status: c b
[ 36.041518] switching to power state:
[ 36.041519] ui class: performance
[ 36.041520] internal class: none
[ 36.041522] caps:
[ 36.041523] uvd vclk: 0 dclk: 0
[ 36.041525] power level 0 sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16
[ 36.041527] power level 1 sclk: 98000 mclk: 125000 pcie gen: 3 pcie lanes: 16
[ 36.041528] status: r
[ 36.117075] [drm] fb mappable at 0xE098F000
[ 36.117078] [drm] vram apper at 0xE0000000
[ 36.117079] [drm] size 14745600
[ 36.117080] [drm] fb depth is 24
[ 36.117081] [drm] pitch is 10240
[ 36.117225] fbcon: radeondrmfb (fb0) is primary device
[ 36.117321] switching from power state:
[ 36.117322] ui class: performance
[ 36.117323] internal class: none
[ 36.117323] caps:
[ 36.117324] uvd vclk: 0 dclk: 0
[ 36.117325] power level 0 sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16
[ 36.117326] power level 1 sclk: 98000 mclk: 125000 pcie gen: 3 pcie lanes: 16
[ 36.117326] status: c r
[ 36.117327] switching to power state:
[ 36.117327] ui class: performance
[ 36.117328] internal class: none
[ 36.117328] caps:
[ 36.117329] uvd vclk: 0 dclk: 0
[ 36.117329] power level 0 sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16
[ 36.117330] power level 1 sclk: 98000 mclk: 125000 pcie gen: 3 pcie lanes: 16
[ 36.117331] status: c r
[ 36.144422] Console: switching to colour frame buffer device 320x90
[ 36.153721] radeon 0000:01:00.0: fb0: radeondrmfb frame buffer device
[ 36.153722] radeon 0000:01:00.0: registered panic notifier
[ 36.165392] [drm] Initialized radeon
2.40.0 20080528 for 0000:01:00.0 on minor 0
Let me know if you need anything else.
Cheers, Kai
dri-devel@lists.freedesktop.org