Hi
On Mon, Jun 30, 2014 at 9:04 PM, Jerome Glisse j.glisse@gmail.com wrote:
On Mon, Jun 30, 2014 at 08:47:31PM +0200, David Herrmann wrote:
Additionally to what AIO and Direct-IO do, intel userptr adds the range_start callback to release pinned pages whenever the pages are unmapped. However, anyone who truncates inode pages, schedules writeback, etc., has to lock the page. Thus, any following GUP-fast from userptr will fail and the slowpath will wait on mmap_sem. So I'd really prefer if you could elaborate on your race?
Some writback code path (and other cpu page table modificiation) will not call range_start but only invalidate_page. More over once the range_start is call a GUP that is done before range_end is call will return what ever it sees inside the cpu page table at the time which might be new pages or old pages.
range_start/end are usually called by unmap_*() functions, which normally are called under page-lock. Therefore, _no_ new PTEs can be established as long as the page is locked. This means, once range_end is called, you're guaranteed any racing GUP-fast will fail and any racing GUP will wait on the page-lock. However, I wonder why i915 uses range_start instead of range_end. The PTEs are still in place when range_start is called, therefore a racing GUP-fast will still succeed.
Regarding "invalidate_page": This is called _after_ the PTE has been removed (see rmap and try_unmap_page()), also with the page-locked. Same scenario as range_end. However, I also wonder why i915 doesn't have a callback for that? They definitely need to, afaics.
Writeback code races with parallel GUP writes and keeps pages in place. They rely on GUP-writers to call SetDirty() once they're done. I've never liked that, but I don't see any code protecting writeback against GUP writes, do you?
Thus you can imagine i915 trying to use an userptr object right after a range_start but before a range_end, the i915 will read the page table (GUP is not serializing anything here) and will assume that whatever it got from there is current while it might just be soon to be discarded/replaced pages. Hence you can not garanty that pages you use are the same backing the user space range. Note that the mmap_sem does not protect page migration or thing like that. It only protect vma modifications.
As i said for the gpu only accessing those buffer in read mode is fine but i am sure userspace will start relying on the gpu writting to those page and being able to read back what the gpu wrote through the user space mapping. This will work often but this can only work because you are lucky and there is no single way to make it work reliably.
i915 userptr is a 3.16 feature, right? So we can still revert it if it's really broken.
However, I'd still like to see an example _call-trace_ with a *real race*. I cannot see any writeback code that replaces pages despite elevated page-refs. page-migration is quite strict about page-refs and fails in those cases. truncate() is the only place I can see, but this should be fixed by using range_end() instead of range_start(), right?
Thanks David