https://bugs.freedesktop.org/show_bug.cgi?id=100465
--- Comment #10 from Alex Deucher alexdeucher@gmail.com ---
(In reply to Julien Isorce from comment #9)
So I have 4 questions: 1: Can an application cause a "ring 0 stalled"? Or is it a driver bug (kernel side, mesa/drm, or xserver)?
It's a driver bug, probably in mesa or the kernel.
2: About these atombios failures, does it mean that it fails to load the gpu microcode/firmware ?
Most likely the GPU reset was not actually successful and the atombios errors are a symptom of that.
3: Does it try to do a gpu softreset because I added R600_DEBUG=check_vm? Or does this one just help to flush the traces on a vm fault (as mentioned in a commit msg related to that env var in mesa)?
check_vm does not change anything with respect to gpu reset.
4: For the deallocation failure / leak above (radeon_ttm_bo_destroy warning), does it mean the memory is lost until the next reboot, or does a gpu soft reset allow these leaks to be recovered?
I'm not quite sure what you are referring to, but if the GPU reset is successful, all fences should be signalled so any memory that is pinned due to a command buffer being in flight could be freed.
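
To illustrate the last point, here is a rough toy model in Python. It is not radeon, TTM, or mesa code; the names `Fence`, `Buffer`, `ToyGpu`, `submit`, `reset`, and `reclaim` are all hypothetical and only sketch the idea that memory stays pinned while the fence of an in-flight command buffer is unsignalled, and becomes freeable once a successful reset signals every fence.

```python
from dataclasses import dataclass, field


@dataclass
class Fence:
    """Hypothetical stand-in for a GPU fence."""
    signalled: bool = False


@dataclass
class Buffer:
    """Hypothetical buffer referenced by an in-flight command buffer."""
    name: str
    fence: Fence

    def is_pinned(self) -> bool:
        # The memory cannot be freed while its fence is unsignalled.
        return not self.fence.signalled


@dataclass
class ToyGpu:
    buffers: list = field(default_factory=list)

    def submit(self, name: str) -> Buffer:
        # Submitting work pins the buffer behind a fresh, unsignalled fence.
        buf = Buffer(name, Fence())
        self.buffers.append(buf)
        return buf

    def reset(self, successful: bool) -> bool:
        # Assumption of this model: only a successful reset signals all fences.
        if successful:
            for buf in self.buffers:
                buf.fence.signalled = True
        return successful

    def reclaim(self) -> list:
        # Free every buffer whose fence has been signalled.
        freed = [b.name for b in self.buffers if not b.is_pinned()]
        self.buffers = [b for b in self.buffers if b.is_pinned()]
        return freed
```

In this model a failed reset leaves every buffer pinned, which would match the situation where the reset does not actually succeed and the memory stays unavailable.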