Hi,
The only way I see for this to happen is that the guest is preempted between qxl_push_{cursor,command}_ring_release() and qxl_release_fence_buffer_objects() calls. The host can complete the qxl command then, signal the guest, and the IRQ handler calls qxl_release_free_list() before qxl_release_fence_buffer_objects() runs.
We think the same: qxl_release was freed by garbage collector before original thread had called qxl_release_fence_buffer_objects().
Ok, nice, I think we can consider the issue being analyzed then ;)
Looking through the code I think it should be safe to simply swap the qxl_release_fence_buffer_objects() + qxl_push_{cursor,command}_ring_release() calls to close that race window. Can you try that and see if it fixes the bug for you?
I'm going to prepare and test such patch but I have one question here: qxl_push_*_ring_release can be called with interruptible=true and fail How to correctly handle this case? Is the hunk below correct from your POV?
Oh, right, the error code path will be quite different, checking ...
--- a/drivers/gpu/drm/qxl/qxl_ioctl.c +++ b/drivers/gpu/drm/qxl/qxl_ioctl.c @@ -261,12 +261,8 @@ static int qxl_process_single_command(struct qxl_device *qdev, apply_surf_reloc(qdev, &reloc_info[i]); }
qxl_release_fence_buffer_objects(release); ret = qxl_push_command_ring_release(qdev, release, cmd->type, true);
if (ret)
qxl_release_backoff_reserve_list(release); <<<< ????
else
qxl_release_fence_buffer_objects(release);
out_free_bos: out_free_release:
if (ret) qxl_release_free(qdev, release);
[ code context added ]
qxl_release_free() checks whenever a release is fenced and signals the fence in case it is so it doesn't wait for the signal forever. So, yes, I think qxl_release_free() should cleanup the release properly in any case and the patch chunk should be correct.
take care, Gerd