https://bugzilla.kernel.org/show_bug.cgi?id=207383
--- Comment #96 from mnrzk@protonmail.com --- (In reply to Nicholas Kazlauskas from comment #95)
Created attachment 290583 [details] 0001-drm-amd-display-Force-add-all-CRTCs-to-state-when-us.patch
So the sequence looks like the following:
- Non-blocking commit #1 requested, checked, swaps state and deferred to
work queue.
- Non-blocking commit #2 requested, checked, swaps state and deferred to
work queue.
Commits #1 and #2 don't touch any of the same core DRM objects (CRTCs, Planes, Connectors) so Commit #2 does not stall for Commit #1. DRM Private Objects have always been avoided in stall checks, so we have no safety from DRM core in this regard.
- Due to system load commit #2 executes first and finishes its commit tail
work. At the end of commit tail, as part of DRM core, it calls drm_atomic_state_put().
Since this was the pageflip IOCTL we likely already dropped the reference on the state held by the IOCTL itself. So it's going to actually free at this point.
This eventually calls drm_atomic_state_clear() which does the following:
obj->funcs->atomic_destroy_state(obj, state->private_objs[i].state);
Note that it clears "state" here. Commit sets "state" to the following:
state->private_objs[i].state = old_obj_state; obj->state = new_obj_state;
What line number roughly does that happen on? I can't seem to find that anywhere in amdgpu_dm.c
Since Commit #1 swapped first this means Commit #2 actually does free Commit #1's private object.
- Commit #1 then executes and we get a use after free.
Same bug, it's just this was never corrupted before by the slab changes. It's been sitting dormant for 5.0~5.8.
Attached is a patch that might help resolve this.
I actually just started testing my own patch, but I'll apply your patch and see if it works though.
My patch is based on how you solved bug 204181 [1] and instead of setting the new dc_state to the old dc_state, it frees the dm_state and removes the associated private object.
If I understand correctly, if dm_state is set to NULL (i.e. new state cannot be found), commit_tail retains the current state and context. Since dm_state only contains the context (which is unused), I don't see why freeing the state and clearing the private object beforehand would be an issue.
I would attach the patch but I'll need to clean up my code first. If the patch works for the next few hours, I'll clean it up and attach it.
[1] https://patchwork.freedesktop.org/patch/320797/