https://bugs.freedesktop.org/show_bug.cgi?id=102322
--- Comment #15 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #13)
(In reply to Andrey Grodzovsky from comment #12)
Can you boot the kernel with amdgpu.vm_update_mode=3 on the GRUB command line to force CPU VM update mode and see if this helps?
Sure. It is too early to say "hurray", but at a current uptime of one hour, 4.17.2 with amdgpu.vm_update_mode=3 has already survived about 20 times longer than it did without that option before the first crash.
One (probably just informational) message is emitted by the kernel:
[ 19.319565] CPU update of VM recommended only for large BAR system
Can you explain a little: What is a "large BAR system", and what does the vm_update_mode=3 option actually cause? Are there any weird side effects I should look out for?
I think it just means a system with a large amount of VRAM, which requires a large BAR to map it, but I am not sure on that point. vm_update_mode=3 means that GPUVM page table updates are done by the CPU. By default we do them using the DMA engine on the ASIC. The log showed a hang in this engine, so I assumed there is something wrong with the SDMA commands we submit. As side effects I would expect higher CPU utilization and maybe slower rendering.
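For reference, the value passed to vm_update_mode is a bitmask. Below is a minimal Python sketch of how it decodes, assuming the bit meanings given in the amdgpu module parameter description (bit 0 selects CPU updates for graphics, bit 1 for compute, a negative value leaves the choice to the driver) and assuming the parameter is readable under /sys/module/amdgpu/parameters/. The function and constant names are made up for illustration and are not part of the driver.

```python
from pathlib import Path

# Assumed bit layout, taken from the amdgpu module parameter description:
# 0 = never use the CPU, 1 = graphics only, 2 = compute only, 3 = both.
CPU_FOR_GFX = 1 << 0
CPU_FOR_COMPUTE = 1 << 1

# Assumed sysfs location of the module parameter (read-only).
PARAM_PATH = Path("/sys/module/amdgpu/parameters/vm_update_mode")


def describe_vm_update_mode(mode: int) -> str:
    """Return a human-readable description of a vm_update_mode value."""
    if mode < 0:
        # A negative value means the option was not set explicitly.
        return "driver default"
    gfx = "CPU" if mode & CPU_FOR_GFX else "SDMA"
    compute = "CPU" if mode & CPU_FOR_COMPUTE else "SDMA"
    return f"graphics: {gfx}, compute: {compute}"


def read_vm_update_mode(path: Path = PARAM_PATH) -> int:
    """Read the current parameter value from sysfs."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    if PARAM_PATH.exists():
        print(describe_vm_update_mode(read_vm_update_mode()))
    else:
        print("amdgpu module parameter not found on this system")
```

With amdgpu.vm_update_mode=3 both bits are set, so page table updates for graphics and compute alike would bypass SDMA.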
BTW: Not a result of that option, but of the kernel version, seems to be that the shader clock stays at a pretty high frequency all the time, even without any 3D or compute load, just displaying a quiet 4k/60Hz desktop image:
cat pp_dpm_sclk
0: 214Mhz
1: 481Mhz
2: 760Mhz
3: 1020Mhz
4: 1102Mhz
5: 1138Mhz
6: 1180Mhz *
7: 1220Mhz
Much lower shader clocks are used only if I lower the refresh rate of the screen. Is there a reason why the shader clocks should stay high even in the absence of 3d/compute load?
(I would have understood it better if the minimum memory clock depended on the refresh rate, but the memory clock stays as low as with the older kernels.)