On Saturday, July 25, 2020 12:59 AM, Duncan 1i5t5.duncan@cox.net wrote:
On Sat, 25 Jul 2020 03:03:52 +0000 Mazin Rezk mnrzk@protonmail.com wrote:
On 24.07.20 at 19:33, Kees Cook wrote:
There was a fix to disable the async path for this driver that worked around the bug too, yes? That seems like a safer and more focused change that doesn't revert the SLUB defense for all users, and would actually provide a complete, I think, workaround.
That said, I haven't seen the async disabling patch. If you could link to it, I'd be glad to test it out and perhaps we can use that instead.
I'm confused. Not to put words in Kees' mouth; /I/ am confused (which admittedly could well be just because I make no claims to be a coder and am simply reading the bug and thread, but I'd appreciate some "unconfusing" anyway).
My interpretation of the "async disabling" reference was that it was to comment #30 on the bug:
https://bugzilla.kernel.org/show_bug.cgi?id=207383#c30
... which (if I'm not confused on this point too) appears to be yours. There it was stated...
"I've also found that this bug exclusively occurs when commit_work is on the workqueue. After forcing drm_atomic_helper_commit to run all of the commits without adding to the workqueue and running the OS, the issue seems to have disappeared."
Would not forcing all commits to run directly, without placing them on the workqueue, be "async disabling"? That's what I /thought/ he was referencing.
Oh, I thought he was referring to a different patch. Kees, could I get your confirmation on this?
The change I made actually affected all of the DRM code, although this could easily be changed to be specific to amdgpu. (By forcing blocking on amdgpu_dm's non-blocking commit code)
That said, I'd still need to test further, because I only tested it for a couple of hours at the time, although it should work in theory.
OTOH your base/context swap idea sounds like a possibly "less disturbance" workaround, if it works, and given the point in the commit cycle... (But if it's out Sunday it's likely too late to test and get it in now anyway; if it's another week, tho...)
The base/context swap idea should make the use-after-free behave how it did in 5.6. Since the bug doesn't cause an issue in 5.6, it's less of a "less disturbance" workaround and more of a "no disturbance" workaround.
Thanks, Mazin Rezk