https://bugs.freedesktop.org/show_bug.cgi?id=108814
Bug ID: 108814
Summary: VMC page fault on POLARIS&RAVEN
Product: DRI
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: domen.stangar@gmail.com
I tried it on two computers.
Linux (none) 4.20.0-rc1+ #8 SMP PREEMPT Tue Nov 20 00:24:49 CET 2018 x86_64 AMD Athlon PRO 200GE w/ Radeon Vega Graphics AuthenticAMD GNU/Linux
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: X.Org (0x1002)
    Device: AMD RAVEN (DRM 3.27.0, 4.20.0-rc1+, LLVM 7.0.0) (0x15dd)
    Version: 18.2.5

[ 80.221112] amdgpu 0000:38:00.0: [gfxhub] VMC page fault (src_id:0 ring:32 vmid:2 pasid:32768, for process roles pid 358 thread roles:cs0 pid 359)
[ 80.221116] amdgpu 0000:38:00.0: in page starting at address 0x0000800000a94000 from 27
[ 80.221118] amdgpu 0000:38:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00240C40
Other computer:
Linux amd1.blue.org 4.19.2 #1 SMP PREEMPT Tue Nov 20 21:41:52 CET 2018 x86_64 AMD Ryzen 7 1700X Eight-Core Processor AuthenticAMD GNU/Linux

Extended renderer info (GLX_MESA_query_renderer):
    Vendor: X.Org (0x1002)
    Device: AMD Radeon (TM) RX 460 Graphics (POLARIS11, DRM 3.27.0, 4.19.2, LLVM 7.0.0) (0x67ef)
    Version: 18.2.5

[ 1253.329906] amdgpu 0000:0e:00.0: GPU fault detected: 147 0x09004802 for process roles pid 1119 thread roles:cs0 pid 1120
[ 1253.329910] amdgpu 0000:0e:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0000EB20
[ 1253.329911] amdgpu 0000:0e:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C048002
[ 1253.329914] amdgpu 0000:0e:00.0: VM fault (0x02, vmid 6, pasid 32769) at page 60192, read from 'TC0' (0x54433000) (72)
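The Polaris "VM fault" line can be cross-checked by hand: the page number is the FAULT_ADDR register value printed in decimal (0xEB20 = 60192), and the hex word after the client name is simply its ASCII bytes (0x54433000 spells 'TC0'). A minimal C sketch of that decoding follows. The function names are hypothetical and are not kernel APIs, and the conversion to a byte address assumes 4 KiB GPU pages.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unpack the four ASCII bytes of the memory-client ID shown in the
 * "VM fault" line, most significant byte first.
 * 'out' must have room for 5 bytes (4 characters plus terminator). */
static void decode_mc_client(uint32_t mc_client, char out[5])
{
    out[0] = (char)(mc_client >> 24);
    out[1] = (char)((mc_client >> 16) & 0xff);
    out[2] = (char)((mc_client >> 8) & 0xff);
    out[3] = (char)(mc_client & 0xff);
    out[4] = '\0';
}

/* Assumption: the fault register holds a page number and GPU pages
 * are 4 KiB, so the byte address is the page number shifted by 12. */
static uint64_t fault_page_to_byte_address(uint32_t page)
{
    return (uint64_t)page << 12;
}
```

Calling decode_mc_client(0x54433000, name) fills name with "TC0", which matches the client printed in the log line.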
Is this an LLVM or a Mesa issue? I also tried an older kernel, 4.16, with the same result.
What reports do you need?
--- Comment #1 from Domen domen.stangar@gmail.com ---
Created attachment 142535
  --> https://bugs.freedesktop.org/attachment.cgi?id=142535&action=edit
umr dump
--- Comment #2 from Domen domen.stangar@gmail.com ---
Created attachment 142536
  --> https://bugs.freedesktop.org/attachment.cgi?id=142536&action=edit
gallium dump t1
--- Comment #3 from Domen domen.stangar@gmail.com ---
Created attachment 142537
  --> https://bugs.freedesktop.org/attachment.cgi?id=142537&action=edit
gallium dump t0
--- Comment #4 from Domen domen.stangar@gmail.com ---
Created attachment 142538
  --> https://bugs.freedesktop.org/attachment.cgi?id=142538&action=edit
trace events amdgpu
--- Comment #5 from Domen domen.stangar@gmail.com ---
Attached logs

[ 332.004841] amdgpu 0000:0e:00.0: GPU fault detected: 147 0x0f800802 for process roles pid 1043 thread roles:cs0 pid 1044
[ 332.004844] amdgpu 0000:0e:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000EA1F0
[ 332.004845] amdgpu 0000:0e:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x04008002
[ 332.004848] amdgpu 0000:0e:00.0: VM fault (0x02, vmid 2, pasid 32769) at page 958960, read from 'TC2' (0x54433200) (8)
Domen domen.stangar@gmail.com changed:
      What |Removed                         |Added
----------------------------------------------------------------------------
 Component |DRM/AMDgpu                      |Drivers/Gallium/radeonsi
   Version |unspecified                     |18.3
   Product |DRI                             |Mesa
QA Contact |                                |dri-devel@lists.freedesktop.org
Domen domen.stangar@gmail.com changed:
      What |Removed                         |Added
----------------------------------------------------------------------------
   Summary |VMC page fault on POLARIS&RAVEN |[radeonsi] page fault, umr dump
--- Comment #6 from Domen domen.stangar@gmail.com ---
Created attachment 142598
  --> https://bugs.freedesktop.org/attachment.cgi?id=142598&action=edit
another gallium dump
Another dump. I also tried with the proprietary NVIDIA drivers; it works fine there.
--- Comment #7 from Domen domen.stangar@gmail.com ---
It looks like sctx->bindless_descriptors->gpu_address is not accessible by the GPU: 0x02e00000 is not in the buffer list.
c0017600 SET_SH_REG: 0000014d 02e00000 SPI_SHADER_USER_DATA_COMMON_1 <- 0x02e00000
[ 174.469016] amdgpu 0000:38:00.0: [gfxhub] VMC page fault (src_id:0 ring:32 vmid:2 pasid:32769, for process roles pid 398 thread roles:cs0 pid 399)
[ 174.469021] amdgpu 0000:38:00.0: in page starting at address 0x0000800002e04000 from 27
[ 174.469023] amdgpu 0000:38:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00240C40
[ 184.763074] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=583, emitted seq=585
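The reasoning in this comment is a range check: the faulting address lies just past the value written to SPI_SHADER_USER_DATA_COMMON_1, and no buffer in the submission's buffer list covers it. A minimal C sketch of that check follows. The struct bo_range type and both function names are hypothetical, the buffer list contents have to come from the actual dump, and the offset helper assumes the user-data register holds only the low 32 bits of the descriptor address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one entry of a submission's buffer list:
 * GPU virtual address of the buffer start and its size in bytes. */
struct bo_range {
    uint64_t va;
    uint64_t size;
};

/* Return true if 'va' falls inside any buffer of the list. */
static bool va_in_buffer_list(const struct bo_range *list, size_t count,
                              uint64_t va)
{
    for (size_t i = 0; i < count; i++) {
        if (va >= list[i].va && va - list[i].va < list[i].size)
            return true;
    }
    return false;
}

/* Assumption: the register value is the low 32 bits of the descriptor
 * base, so the offset is computed on the low 32 bits of the fault. */
static uint64_t offset_from_descriptor_base(uint64_t fault_va,
                                            uint32_t base_lo32)
{
    return (uint32_t)fault_va - base_lo32;
}
```

With the values from the log, offset_from_descriptor_base(0x0000800002e04000, 0x02e00000) gives 0x4000, so the fault is 16 KiB past the address loaded into the user-data register.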
Domen domen.stangar@gmail.com changed:
      What |Removed                         |Added
----------------------------------------------------------------------------
  See Also |                                |https://bugs.freedesktop.org/show_bug.cgi?id=108261
--- Comment #8 from Domen domen.stangar@gmail.com ---
Well, this is a bug that occurs when using bindless textures together with a framebuffer that is also resident as a bindless texture. The fault no longer occurs if I comment out the si_upload_bindless_descriptor function.
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 2 + num_dwords, 0));
radeon_emit(cs, S_370_DST_SEL(V_370_TC_L2) |
                S_370_WR_CONFIRM(1) |
                S_370_ENGINE_SEL(V_370_ME));
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
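The first radeon_emit call writes a PM4 type-3 packet header. As a rough illustration of what that dword contains, here is a C sketch of the header packing, assuming the usual layout (packet type in bits 31:30, count in bits 29:16, opcode in bits 15:8, count equal to the number of body dwords minus one). The helper names are hypothetical and are not Mesa macros. The layout can be checked against the SET_SH_REG header 0xc0017600 from the dump in comment #7.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a PM4 type-3 header. 'count' is the number of dwords that
 * follow the header, minus one. The predicate bit is left at 0. */
static uint32_t pm4_type3_header(uint32_t opcode, uint32_t count)
{
    return (3u << 30) | ((count & 0x3fffu) << 16) | ((opcode & 0xffu) << 8);
}

/* Extract the opcode field (bits 15:8) from a type-3 header. */
static uint32_t pm4_opcode(uint32_t header)
{
    return (header >> 8) & 0xffu;
}

/* Number of dwords that follow the header (count field plus one). */
static uint32_t pm4_body_dwords(uint32_t header)
{
    return ((header >> 16) & 0x3fffu) + 1u;
}
```

Under this layout, 0xc0017600 decodes to opcode 0x76 with two body dwords, which matches the register offset and value shown on the SET_SH_REG line. By the same rule, a count of 2 + num_dwords in the WRITE_DATA packet covers the control dword, the two address dwords and num_dwords of data.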
Domen domen.stangar@gmail.com changed:
      What |Removed                         |Added
----------------------------------------------------------------------------
    Status |NEW                             |RESOLVED
Resolution |---                             |FIXED