Thanks for your comment.
-----Original Message----- From: Al Viro [mailto:viro@ftp.linux.org.uk] On Behalf Of Al Viro Sent: Monday, September 23, 2013 10:07 PM To: Inki Dae Cc: 'YoungJun Cho'; dri-devel@lists.freedesktop.org Subject: Re: [RFC] deadlock in "drm/exynos: fix wrong pointer access at vm close"
On Mon, Sep 23, 2013 at 04:49:30PM +0900, Inki Dae wrote:
Hi,
-----Original Message----- From: Al Viro [mailto:viro@ftp.linux.org.uk] On Behalf Of Al Viro Sent: Monday, September 23, 2013 6:29 AM To: YoungJun Cho Cc: dri-devel@lists.freedesktop.org; Inki Dae Subject: [RFC] deadlock in "drm/exynos: fix wrong pointer access at vm close"
You have drm_dev->struct_mutex grabbed before ->mmap_sem in exynos_drm_gem_mmap_ioctl() and after - in exynos_drm_gem_fault() (since ->fault() is always called with ->mmap_sem held). Looks like a garden-variety AB-BA deadlock...
Right, if mmap system call is requested by another process between - f_op pointer changing and restoring, the deadlock can be incurred.
For this, I think we can resolve this issue like below,
At exynos_drm_gem_mmap_ioctl() down_write(&mm->mmap_sem); mutex_lock(&dev->struct_mutex); ...
Umm... If you do it that way, why bother with changing ->private_data at all? You can as well stash obj in dev->dev_private->something after you've grabbed the mutex and have ->mmap() pick it there...
I changed ->private_data to a gem object so that mmap_buffer function can get the gem object. But yes, we can use dev->dev_private->something instead, and that seems like better way. For this, I missed that we can get drm_device object from file pointer.
Said that, I really don't like that approach - both playing with ->f_op and the games with private vmas;
Agree. But is there other better way to support direct mapping; not needing page fault handler to map, and indirect mapping interfaces; needing the page fault handler?
exynos_gem_get_vma(), AFAICS, calls find_vma() (without bothering to hold ->mmap_sem, BTW - there's nothing to prevent the result of find_vma() being freed just as it's returned
I can't see to hold ->mmap_sem when it calls find_vma() anywhere else.
to caller) and clones it manually, regardless of whether that vma allows to clone itself or not. Quite a few drivers rely on that not happening...
I think that has no any problem because exynos_gem_get_vma() takes a reference to vma, and also v4l2 side is using same way. I and v4l2 guys might be missing something what you are concerning. So Could you give us more comments?
And, I found a bug that we never call exynos_gem_put_vma() when userptr is freed. That should be fixed.
IOW, you are already digging deep inside VM guts and this only makes it deeper ;-/
And no, the deadlock doesn't depend on race between ioctl() and mmap() from another process; all it takes is
Yes, there was my wrong comments. That is a mmap callback(exynos_drm_gem_mmap_buffer) we don't want can be called if we called mmap system call when ->f_op points to exynos_drm_gem_fops.
- thread A does clone(), creating thread B that shares address space with
it. 2) thread A does that ioctl, creating a mapping 3) thread A does that ioctl again, creating another mapping, while thread B dereferences an address in the first mapping and triggers a page fault.
The only race is on step 3 in the above; the question about mmap() vs. ioctl() was about mmap(2) _during_ that ioctl() hitting exynos_drm_gem_mmap_buffer() instead of exynos_drm_gem_mmap() it would've called before or after ioctl(). Here the interesting case is when callers of mmap() and ioctl() do *not* share the address space, since in that case mmap(2) won't notice ->mmap_sem held by you - it's on the different mm_struct, so mmap(2) will get to calling the ->f_op->mmap() without waiting for you to restore ->f_op...
Yes, it's a problem. When two more threads are running, and when thread A called mmap system call, exynos_drm_gem_mmap_buffer() will be called if thread B called ioctl() first and f_op was not restored yet. I have no idea to resolve this issue yet. So do you have any good idea? At any rate, we should fix this issue.
For another bug in the same area, try building that driver modular and watch what happens to use count after a round of this switching ->f_op and restoring it back to original; fops_get() in there is wrong and AFAICS pointless.
Yes, it would also be another bug. Will check it out. It seems that it's not good way to change and restore ->f_op, and this way makes mmap part to be complicated. I will try to find a way to resolve this issue or to find another better way to support direct and indirect mapping interfaces.
I'd appreciate all the advices I can get.
Thanks, Inki Dae