[Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding - dri-devel - freedesktop.org experimental mailing list

29 Jan 2018


https://bugs.freedesktop.org/show_bug.cgi?id=104825

            Bug ID: 104825
           Summary: [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled
                    failed (scratch(0xC040)=0x00000000) when unbinding
           Product: DRI
           Version: XOrg git
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: mlen@mlen.pl
I use two RX 480 cards driven by amdgpu. During boot, one of them is rebound
to the vfio-pci driver using the following script:
pci_ids=("0000:03:00.0" "0000:03:00.1")
for id in "${pci_ids[@]}"; do
  vendor="$(cat "/sys/bus/pci/devices/$id/vendor")"
  device="$(cat "/sys/bus/pci/devices/$id/device")"
  if [ -e "/sys/bus/pci/devices/$id/driver/unbind" ]; then
    echo "$id" >"/sys/bus/pci/devices/$id/driver/unbind"
  fi
  echo "$vendor $device" >/sys/bus/pci/drivers/vfio-pci/new_id
done
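For comparison, here is a minimal sketch of the same rebind done through the
kernel's per-device driver_override attribute instead of new_id. It is not
part of the original report and does not avoid the crash described below,
since the unbind step is unchanged. It only pins the vfio-pci choice to one
PCI function rather than to a vendor/device pair shared by both cards. The
function name rebind_to_vfio and the SYSFS variable are hypothetical, added so
the function can be pointed at a mock directory tree.

```shell
#!/bin/bash
# Sketch only: rebind one PCI function to vfio-pci via driver_override.
# SYSFS is a hypothetical knob (not in the original script); it defaults to
# the real sysfs mount and can be redirected to a mock tree for a dry run.
SYSFS="${SYSFS:-/sys}"

rebind_to_vfio() {
  local id="$1"
  local dev="$SYSFS/bus/pci/devices/$id"

  # Pin the driver choice to this one device instead of registering a
  # vendor/device pair that also matches the identical second card.
  echo "vfio-pci" >"$dev/driver_override"

  # Detach the currently bound driver, if there is one.
  if [ -e "$dev/driver/unbind" ]; then
    echo "$id" >"$dev/driver/unbind"
  fi

  # Ask the PCI core to probe the device again; the override set above
  # decides which driver is allowed to claim it.
  echo "$id" >"$SYSFS/bus/pci/drivers_probe"
}
```

Assuming the same two functions as in the script above, it would be called as
rebind_to_vfio "0000:03:00.0" followed by rebind_to_vfio "0000:03:00.1".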
Starting with Linux 4.15 with amdgpu DC enabled (I wanted to use it for HDMI
audio), the unbind operation causes a general protection fault:
[   68.011473] [drm] amdgpu: finishing device.
[   68.377945] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
[   68.575193] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
[   68.770107] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
[   68.971775] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
[   69.164265] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
[   69.350089] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
[   69.538302] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
[   69.729260] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
[   69.729733] general protection fault: 0000 [#1] PREEMPT SMP PTI
[   69.730901] Modules linked in:
[   69.731936] CPU: 2 PID: 3934 Comm: openrc-run.sh Not tainted 4.15.0-gentoo #2
[   69.733009] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407 03/10/2017
[   69.734240] RIP: 0010:dm_read_reg_func.isra.0+0x3/0xc
[   69.735314] RSP: 0018:ffffa80d8bd4fc40 EFLAGS: 00010282
[   69.736353] RAX: ccea607dac10c354 RBX: ffff95af35bfca80 RCX: 0000000180200008
[   69.737408] RDX: 0000000180200009 RSI: 0000000000005c02 RDI: ffff95af362b91c0
[   69.738494] RBP: ffff95af35b75c90 R08: 0000000000000001 R09: ffffffffb30fced2
[   69.739561] R10: ffff95af2f770a40 R11: 00000000ffffff80 R12: ffff95af35f30100
[   69.740638] R13: 0000000000000000 R14: ffff95af358a3c20 R15: ffff95bf2d5f9c20
[   69.742007] FS:  00007fe7530af740(0000) GS:ffff95af3da00000(0000) knlGS:0000000000000000
[   69.743083] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   69.744134] CR2: 000000c420870008 CR3: 0000001027426003 CR4: 00000000003606e0
[   69.745190] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   69.746246] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   69.747320] Call Trace:
[   69.748359]  destroy+0x21/0x9c
[   69.749395]  dal_i2caux_destruct+0x6f/0xab
[   69.750459]  destroy+0x15/0x27
[   69.751495]  dal_i2caux_destroy+0x26/0x2f
[   69.752537]  destruct+0x86/0xfd
[   69.753571]  dc_destroy+0x11/0x22
[   69.754626]  dm_hw_fini+0x1e/0x22
[   69.755632]  amdgpu_fini+0xf3/0x2d6
[   69.756616]  amdgpu_device_fini+0x5c/0x158
[   69.757597]  amdgpu_driver_unload_kms+0x6b/0x7e
[   69.758601]  drm_dev_unregister+0x4c/0xc6
[   69.759584]  amdgpu_pci_remove+0x19/0x37
[   69.760576]  pci_device_remove+0x3b/0x8b
[   69.761563]  device_release_driver_internal+0x125/0x1f9
[   69.762580]  unbind_store+0x60/0x90
[   69.763568]  kernfs_fop_write+0x111/0x159
[   69.764557]  __vfs_write+0x33/0xd7
[   69.765543]  ? preempt_count_sub+0x8b/0x94
[   69.766552]  ? __sb_start_write+0xc0/0x180
[   69.767525]  vfs_write+0xa5/0xe2
[   69.768490]  SyS_write+0x5f/0xa3
[   69.769439]  do_syscall_64+0x72/0x81
[   69.770419]  entry_SYSCALL64_slow_path+0x25/0x25
[   69.771373] RIP: 0033:0x7fe7529a1408
[   69.772324] RSP: 002b:00007ffe1885c200 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   69.773300] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fe7529a1408
[   69.774290] RDX: 000000000000000d RSI: 0000558994641890 RDI: 0000000000000001
[   69.775290] RBP: 0000558994641890 R08: 000000000000000a R09: 00005589946475f0
[   69.776260] R10: 000000000000009b R11: 0000000000000246 R12: 000000000000000d
[   69.777248] R13: 0000000000000001 R14: 00007fe752c6e740 R15: 000000000000000d
[   69.778244] Code: f7 74 24 04 89 43 5c 48 8b 4c 24 08 65 48 33 0c 25 28 00 00 00 4c 89 e0 74 05 e8 39 58 9e ff 48 83 c4 10 5b 5d 41 5c c3 48 8b 07 <48> 8b 40 30 e9 03 4a 70 00 0f 1f 44 00 00 48 8b 47 30 8b 70 04
[   69.779406] RIP: dm_read_reg_func.isra.0+0x3/0xc RSP: ffffa80d8bd4fc40
[   69.780491] ---[ end trace 6aa4681ba3a43ec3 ]---
[   71.815503] [drm:amdgpu_fill_buffer] *ERROR* Trying to clear memory with ring turned off.
[   71.899258] amdgpu 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[   71.899263] amdgpu 0000:02:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[   72.217918] [drm:amdgpu_fill_buffer] *ERROR* Trying to clear memory with ring turned off.