Hi all,
I've finally managed to clean up my pwrite/pread rework. Major changes since the first submission:
- fixed a bunch of bugs, some of them discovered by the intel-gpu-tools gem test suite
- extracted the clflush helper into common drm code, as suggested by Chris Wilson
- extended the prefault helpers in pagemap.h instead of rolling our own. Also added a patch to not prefault for writes (which change the userspace memory) before we're committed to the write. Both address issues raised by Keith Packard.
Why this is cool stuff:
- Fixes the spurious -EFAULTs when handing in a pointer to a gem gtt mapping. See https://bugs.freedesktop.org/show_bug.cgi?id=38115
- Kills a bunch of code by dropping the practically useless ranged cpu read domain tracking and integrating the slow paths into the fast paths.
- Fixes pwrite/pread to use snooped cpu access on llc machines, thereby decently speeding up microbenchmarks.
- Paves the way for further clever tricks to reduce pressure on the mappable gtt area.
Reviews and comments highly welcome.
Yours, Daniel
Daniel Vetter (13):
  drm/i915: fall through pwrite_gtt_slow to the shmem slow path
  drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
  drm/i915: rewrite shmem_pread_slow to use copy_to_user
  drm/i915: merge shmem_pwrite slow&fast-path
  drm/i915: merge shmem_pread slow&fast-path
  drm: add helper to clflush a virtual address range
  drm/i915: move clflushing into shmem_pread
  drm/i915: kill ranged cpu read domain support
  drm/i915: don't use gtt_pwrite on LLC cached objects
  drm/i915: don't call shmem_read_mapping unnecessarily
  mm: extend prefault helpers to fault in more than PAGE_SIZE
  drm/i915: drop gtt slowpath
  drm/i915: don't clobber userspace memory before commiting to the pread
 drivers/gpu/drm/drm_cache.c     |   23 ++
 drivers/gpu/drm/i915/i915_drv.h |    7 -
 drivers/gpu/drm/i915/i915_gem.c |  824 +++++++++++----------------------------
 include/drm/drmP.h              |    1 +
 include/linux/pagemap.h         |   28 +-
 5 files changed, 273 insertions(+), 610 deletions(-)
The gtt_pwrite slowpath grabs the userspace memory with get_user_pages. This will not work for non-page-backed memory, like a gtt mmapped gem object. Hence fall through to the shmem paths if we hit -EFAULT in the gtt paths.
Now the shmem paths have exactly the same problem, but this way we only need to rearrange the code in one write path.
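For readers skimming the series, the resulting control flow condenses to roughly the following (a sketch taken from the diff below; the gtt pwrite fast/slow calls elided here sit between the two hunks):

	if (obj->phys_obj) {
		ret = i915_gem_phys_pwrite(dev, obj, args, file);
		goto out;	/* phys objects never fall through */
	} else if (obj->gtt_space &&
		   obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
		/* ... pin, gtt pwrite fast/slow paths, unpin ... */
		if (ret != -EFAULT)
			goto out;
		/* -EFAULT may just mean the user pointer is a gtt mmap with
		 * no backing struct pages, so fall through to shmem. */
	}

	ret = i915_gem_object_set_to_cpu_domain(obj, 1);
	if (ret)
		goto out;

	ret = -EFAULT;
	if (!i915_gem_object_needs_bit17_swizzle(obj))
		ret = i915_gem_shmem_pwrite_fast(dev, obj, args, file);
	if (ret == -EFAULT)
		ret = i915_gem_shmem_pwrite_slow(dev, obj, args, file);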
v2: v1 accidentally fell back to shmem pwrite for phys objects. Fixed.
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
---
 drivers/gpu/drm/i915/i915_gem.c |   31 +++++++++++++++++++------------
 1 files changed, 19 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index a838597..6fa24bc 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -996,10 +996,11 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data, * pread/pwrite currently are reading and writing from the CPU * perspective, requiring manual detiling by the client. */ - if (obj->phys_obj) + if (obj->phys_obj) { ret = i915_gem_phys_pwrite(dev, obj, args, file); - else if (obj->gtt_space && - obj->base.write_domain != I915_GEM_DOMAIN_CPU) { + goto out; + } else if (obj->gtt_space && + obj->base.write_domain != I915_GEM_DOMAIN_CPU) { ret = i915_gem_object_pin(obj, 0, true); if (ret) goto out; @@ -1018,18 +1019,24 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
out_unpin: i915_gem_object_unpin(obj); - } else { - ret = i915_gem_object_set_to_cpu_domain(obj, 1); - if (ret) - goto out;
- ret = -EFAULT; - if (!i915_gem_object_needs_bit17_swizzle(obj)) - ret = i915_gem_shmem_pwrite_fast(dev, obj, args, file); - if (ret == -EFAULT) - ret = i915_gem_shmem_pwrite_slow(dev, obj, args, file); + if (ret != -EFAULT) + goto out; + /* Fall through to the shmfs paths because the gtt paths might + * fail with non-page-backed user pointers (e.g. gtt mappings + * when moving data between textures). */ }
+ ret = i915_gem_object_set_to_cpu_domain(obj, 1); + if (ret) + goto out; + + ret = -EFAULT; + if (!i915_gem_object_needs_bit17_swizzle(obj)) + ret = i915_gem_shmem_pwrite_fast(dev, obj, args, file); + if (ret == -EFAULT) + ret = i915_gem_shmem_pwrite_slow(dev, obj, args, file); + out: drm_gem_object_unreference(&obj->base); unlock:
On Sun, 6 Nov 2011 20:13:48 +0100 Daniel Vetter daniel.vetter@ffwll.ch wrote:
The gtt_pwrite slowpath grabs the userspace memory with get_user_pages. This will not work for non-page-backed memory, like a gtt mmapped gem object. Hence fall through to the shmem paths if we hit -EFAULT in the gtt paths.
Now the shmem paths have exactly the same problem, but this way we only need to rearrange the code in one write path.
v2: v1 accidentally fell back to shmem pwrite for phys objects. Fixed.
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
It would be nice if there was some way to notify users that pwriting a gtt mmapped address can be really damn slow. That's also the one behavior change this patch introduces. It's possible that some SW was expecting to get a "fast path" and would deal with the -EFAULT if it didn't get it.
Presumably subsequent patches in this series fix this up further, so instead of figuring out all the failure conditions pwrite can cause, let's just go with Acked-by: Ben Widawsky ben@bwidawsk.net
On Sun, 20 Nov 2011 19:09:18 -0800, Ben Widawsky ben@bwidawsk.net wrote:
On Sun, 6 Nov 2011 20:13:48 +0100 Daniel Vetter daniel.vetter@ffwll.ch wrote:
The gtt_pwrite slowpath grabs the userspace memory with get_user_pages. This will not work for non-page-backed memory, like a gtt mmapped gem object. Hence fall through to the shmem paths if we hit -EFAULT in the gtt paths.
Now the shmem paths have exactly the same problem, but this way we only need to rearrange the code in one write path.
v2: v1 accidentally fell back to shmem pwrite for phys objects. Fixed.
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
It would be nice if there was some way to notify users that pwriting a gtt mmapped address can be really damn slow. That's also the one behavior change this patch introduces. It's possible that some SW was expecting to get a "fast path" and would deal with the -EFAULT if it didn't get it.
The behaviour change is intentional. Before this patch we would deadlock... -Chris
... instead of get_user_pages, because that fails on non-page-backed user addresses, e.g. a gtt mapping of a bo.
To get there, essentially copy the vfs read path into pagecache. We can't call that right away because we have to take care of bit17 swizzling. To not deadlock with our own pagefault handler we need to completely drop struct_mutex, reducing the atomicity guarantees of our userspace abi. Implications for racing with other gem ioctls:
- execbuf, pwrite, pread: Due to the -EFAULT fallback to slow paths there's already the risk of the pwrite call not being atomic, no degradation.
- read/write access to mmaps: already fully racy, no degradation.
- set_tiling: Calling set_tiling while reading/writing is already pretty much undefined, now it just got a bit worse. set_tiling is only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any read/write access, e.g. for synchronization), we might leave unflushed data in the cpu caches. The clflush_object at the end of pwrite_slow takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could reinstate backing storage for truncated objects. The check at the end of pwrite_slow takes care of this.
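As a rough sketch (condensed from the diff below, with error handling and the fixup details omitted), the new slow path therefore looks like:

	/* struct_mutex is held on entry and on return */
	mutex_unlock(&dev->struct_mutex);	/* copy_from_user may fault, and
						 * our own fault handler needs
						 * struct_mutex */
	while (remain > 0) {
		page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
		vaddr = kmap(page);
		ret = __copy_from_user(vaddr + shmem_page_offset,
				       user_data, page_length);
		kunmap(page);
		/* ... dirty/release the page, advance offsets ... */
	}
	mutex_lock(&dev->struct_mutex);
	/* fixups for the races listed above: re-truncate purged objects and
	 * clflush if the object left the cpu write domain meanwhile */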
v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggested by Chris Wilson.
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
---
 drivers/gpu/drm/i915/i915_gem.c |  126 ++++++++++++++++++++-------------------
 1 files changed, 64 insertions(+), 62 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 6fa24bc..f9efebb 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -58,6 +58,7 @@ static void i915_gem_free_object_tail(struct drm_i915_gem_object *obj);
static int i915_gem_inactive_shrink(struct shrinker *shrinker, struct shrink_control *sc); +static void i915_gem_object_truncate(struct drm_i915_gem_object *obj);
/* some bookkeeping */ static void i915_gem_info_add_obj(struct drm_i915_private *dev_priv, @@ -385,6 +386,32 @@ i915_gem_shmem_pread_fast(struct drm_device *dev, return 0; }
+static inline int +__copy_from_user_swizzled(char __user *gpu_vaddr, int gpu_offset, + const char *cpu_vaddr, + int length) +{ + int ret, cpu_offset = 0; + + while (length > 0) { + int cacheline_end = ALIGN(gpu_offset + 1, 64); + int this_length = min(cacheline_end - gpu_offset, length); + int swizzled_gpu_offset = gpu_offset ^ 64; + + ret = __copy_from_user(gpu_vaddr + swizzled_gpu_offset, + cpu_vaddr + cpu_offset, + this_length); + if (ret) + return ret + length; + + cpu_offset += this_length; + gpu_offset += this_length; + length -= this_length; + } + + return 0; +} + /** * This is the fallback shmem pread path, which allocates temporary storage * in kernel space to copy_to_user into outside of the struct_mutex, so we @@ -841,71 +868,36 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev, struct drm_file *file) { struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping; - struct mm_struct *mm = current->mm; - struct page **user_pages; ssize_t remain; - loff_t offset, pinned_pages, i; - loff_t first_data_page, last_data_page, num_pages; - int shmem_page_offset; - int data_page_index, data_page_offset; - int page_length; - int ret; - uint64_t data_ptr = args->data_ptr; - int do_bit17_swizzling; + loff_t offset; + char __user *user_data; + int shmem_page_offset, page_length, ret; + int obj_do_bit17_swizzling, page_do_bit17_swizzling;
+ user_data = (char __user *) (uintptr_t) args->data_ptr; remain = args->size;
- /* Pin the user pages containing the data. We can't fault while - * holding the struct mutex, and all of the pwrite implementations - * want to hold it while dereferencing the user data. - */ - first_data_page = data_ptr / PAGE_SIZE; - last_data_page = (data_ptr + args->size - 1) / PAGE_SIZE; - num_pages = last_data_page - first_data_page + 1; - - user_pages = drm_malloc_ab(num_pages, sizeof(struct page *)); - if (user_pages == NULL) - return -ENOMEM; - - mutex_unlock(&dev->struct_mutex); - down_read(&mm->mmap_sem); - pinned_pages = get_user_pages(current, mm, (uintptr_t)args->data_ptr, - num_pages, 0, 0, user_pages, NULL); - up_read(&mm->mmap_sem); - mutex_lock(&dev->struct_mutex); - if (pinned_pages < num_pages) { - ret = -EFAULT; - goto out; - } - - ret = i915_gem_object_set_to_cpu_domain(obj, 1); - if (ret) - goto out; - - do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj); + obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
offset = args->offset; obj->dirty = 1;
+ mutex_unlock(&dev->struct_mutex); + while (remain > 0) { struct page *page; + char *vaddr;
/* Operation in this page * * shmem_page_offset = offset within page in shmem file - * data_page_index = page number in get_user_pages return - * data_page_offset = offset with data_page_index page. * page_length = bytes to copy for this page */ shmem_page_offset = offset_in_page(offset); - data_page_index = data_ptr / PAGE_SIZE - first_data_page; - data_page_offset = offset_in_page(data_ptr);
page_length = remain; if ((shmem_page_offset + page_length) > PAGE_SIZE) page_length = PAGE_SIZE - shmem_page_offset; - if ((data_page_offset + page_length) > PAGE_SIZE) - page_length = PAGE_SIZE - data_page_offset;
page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT); if (IS_ERR(page)) { @@ -913,34 +905,44 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev, goto out; }
- if (do_bit17_swizzling) { - slow_shmem_bit17_copy(page, - shmem_page_offset, - user_pages[data_page_index], - data_page_offset, - page_length, - 0); - } else { - slow_shmem_copy(page, - shmem_page_offset, - user_pages[data_page_index], - data_page_offset, - page_length); - } + page_do_bit17_swizzling = (page_to_phys(page) & (1 << 17)) == 0; + + vaddr = kmap(page); + if (obj_do_bit17_swizzling && page_do_bit17_swizzling) + ret = __copy_from_user_swizzled(vaddr, shmem_page_offset, + user_data, + page_length); + else + ret = __copy_from_user(vaddr + shmem_page_offset, + user_data, + page_length); + kunmap(page);
set_page_dirty(page); mark_page_accessed(page); page_cache_release(page);
+ if (ret) { + ret = -EFAULT; + goto out; + } + remain -= page_length; - data_ptr += page_length; + user_data += page_length; offset += page_length; }
out: - for (i = 0; i < pinned_pages; i++) - page_cache_release(user_pages[i]); - drm_free_large(user_pages); + mutex_lock(&dev->struct_mutex); + /* Fixup: Kill any reinstated backing storage pages */ + if (obj->madv == __I915_MADV_PURGED) + i915_gem_object_truncate(obj); + /* and flush dirty cachelines in case the object isn't in the cpu write + * domain anymore. */ + if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) { + i915_gem_clflush_object(obj); + intel_gtt_chipset_flush(); + }
return ret; }
On Sun, 6 Nov 2011 20:13:49 +0100 Daniel Vetter daniel.vetter@ffwll.ch wrote:
... instead of get_user_pages, because that fails on non-page-backed user addresses, e.g. a gtt mapping of a bo.
To get there, essentially copy the vfs read path into pagecache. We can't call that right away because we have to take care of bit17 swizzling. To not deadlock with our own pagefault handler we need to completely drop struct_mutex, reducing the atomicity guarantees of our userspace abi. Implications for racing with other gem ioctls:
- execbuf, pwrite, pread: Due to the -EFAULT fallback to slow paths there's already the risk of the pwrite call not being atomic, no degradation.
- read/write access to mmaps: already fully racy, no degradation.
- set_tiling: Calling set_tiling while reading/writing is already pretty much undefined, now it just got a bit worse. set_tiling is only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any read/write access, e.g. for synchronization), we might leave unflushed data in the cpu caches. The clflush_object at the end of pwrite_slow takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could reinstate backing storage for truncated objects. The check at the end of pwrite_slow takes care of this.
v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggested by Chris Wilson.
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
drivers/gpu/drm/i915/i915_gem.c | 126 ++++++++++++++++++++------------------- 1 files changed, 64 insertions(+), 62 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 6fa24bc..f9efebb 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -58,6 +58,7 @@ static void i915_gem_free_object_tail(struct drm_i915_gem_object *obj);
static int i915_gem_inactive_shrink(struct shrinker *shrinker, struct shrink_control *sc); +static void i915_gem_object_truncate(struct drm_i915_gem_object *obj);
/* some bookkeeping */ static void i915_gem_info_add_obj(struct drm_i915_private *dev_priv, @@ -385,6 +386,32 @@ i915_gem_shmem_pread_fast(struct drm_device *dev, return 0; }
+static inline int +__copy_from_user_swizzled(char __user *gpu_vaddr, int gpu_offset,
const char *cpu_vaddr,
int length)
+{
- int ret, cpu_offset = 0;
- while (length > 0) {
int cacheline_end = ALIGN(gpu_offset + 1, 64);
int this_length = min(cacheline_end - gpu_offset, length);
int swizzled_gpu_offset = gpu_offset ^ 64;
ret = __copy_from_user(gpu_vaddr + swizzled_gpu_offset,
cpu_vaddr + cpu_offset,
this_length);
if (ret)
return ret + length;
cpu_offset += this_length;
gpu_offset += this_length;
length -= this_length;
- }
- return 0;
+}
Bikeshed, but I would much prefer a #define for the swizzle bit/cacheline size.
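Concretely, the suggestion would amount to something like the following (hypothetical constant name, not part of the patch):

	/* bit17 swizzling swaps 64-byte cachelines within a 128-byte pair */
	#define I915_BIT17_SWIZZLE_SIZE 64

	int cacheline_end = ALIGN(gpu_offset + 1, I915_BIT17_SWIZZLE_SIZE);
	int this_length = min(cacheline_end - gpu_offset, length);
	int swizzled_gpu_offset = gpu_offset ^ I915_BIT17_SWIZZLE_SIZE;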
/**
- This is the fallback shmem pread path, which allocates temporary storage
- in kernel space to copy_to_user into outside of the struct_mutex, so we
@@ -841,71 +868,36 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev, struct drm_file *file) { struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping;
- struct mm_struct *mm = current->mm;
- struct page **user_pages; ssize_t remain;
- loff_t offset, pinned_pages, i;
- loff_t first_data_page, last_data_page, num_pages;
- int shmem_page_offset;
- int data_page_index, data_page_offset;
- int page_length;
- int ret;
- uint64_t data_ptr = args->data_ptr;
- int do_bit17_swizzling;
loff_t offset;
char __user *user_data;
int shmem_page_offset, page_length, ret;
int obj_do_bit17_swizzling, page_do_bit17_swizzling;
user_data = (char __user *) (uintptr_t) args->data_ptr; remain = args->size;
- /* Pin the user pages containing the data. We can't fault while
* holding the struct mutex, and all of the pwrite implementations
* want to hold it while dereferencing the user data.
*/
- first_data_page = data_ptr / PAGE_SIZE;
- last_data_page = (data_ptr + args->size - 1) / PAGE_SIZE;
- num_pages = last_data_page - first_data_page + 1;
- user_pages = drm_malloc_ab(num_pages, sizeof(struct page *));
- if (user_pages == NULL)
return -ENOMEM;
- mutex_unlock(&dev->struct_mutex);
- down_read(&mm->mmap_sem);
- pinned_pages = get_user_pages(current, mm, (uintptr_t)args->data_ptr,
num_pages, 0, 0, user_pages, NULL);
- up_read(&mm->mmap_sem);
- mutex_lock(&dev->struct_mutex);
- if (pinned_pages < num_pages) {
ret = -EFAULT;
goto out;
- }
- ret = i915_gem_object_set_to_cpu_domain(obj, 1);
- if (ret)
goto out;
- do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
offset = args->offset; obj->dirty = 1;
mutex_unlock(&dev->struct_mutex);
while (remain > 0) { struct page *page;
char *vaddr;
/* Operation in this page
- shmem_page_offset = offset within page in shmem file
* data_page_index = page number in get_user_pages return
* data_page_offset = offset with data_page_index page.
- page_length = bytes to copy for this page
*/ shmem_page_offset = offset_in_page(offset);
data_page_index = data_ptr / PAGE_SIZE - first_data_page;
data_page_offset = offset_in_page(data_ptr);
page_length = remain; if ((shmem_page_offset + page_length) > PAGE_SIZE) page_length = PAGE_SIZE - shmem_page_offset;
if ((data_page_offset + page_length) > PAGE_SIZE)
page_length = PAGE_SIZE - data_page_offset;
page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT); if (IS_ERR(page)) {
@@ -913,34 +905,44 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev, goto out; }
if (do_bit17_swizzling) {
slow_shmem_bit17_copy(page,
shmem_page_offset,
user_pages[data_page_index],
data_page_offset,
page_length,
0);
} else {
slow_shmem_copy(page,
shmem_page_offset,
user_pages[data_page_index],
data_page_offset,
page_length);
}
page_do_bit17_swizzling = (page_to_phys(page) & (1 << 17)) == 0;
vaddr = kmap(page);
if (obj_do_bit17_swizzling && page_do_bit17_swizzling)
ret = __copy_from_user_swizzled(vaddr, shmem_page_offset,
user_data,
page_length);
else
ret = __copy_from_user(vaddr + shmem_page_offset,
user_data,
page_length);
kunmap(page);
set_page_dirty(page); mark_page_accessed(page); page_cache_release(page);
if (ret) {
ret = -EFAULT;
goto out;
}
remain -= page_length;
data_ptr += page_length;
offset += page_length; }user_data += page_length;
out:
- for (i = 0; i < pinned_pages; i++)
page_cache_release(user_pages[i]);
- drm_free_large(user_pages);
mutex_lock(&dev->struct_mutex);
/* Fixup: Kill any reinstated backing storage pages */
if (obj->madv == __I915_MADV_PURGED)
i915_gem_object_truncate(obj);
/* and flush dirty cachelines in case the object isn't in the cpu write
* domain anymore. */
if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
i915_gem_clflush_object(obj);
intel_gtt_chipset_flush();
}
return ret;
}
I must be missing something obvious here... Can you explain how this can possibly be considered safe without holding struct_mutex?
On Sun, Nov 20, 2011 at 09:56:32PM -0800, Ben Widawsky wrote: [snip the patch]
Bikeshed, but I would much prefer a #define for the swizzle bit/cacheline size.
I've looked at this stuff way too long, so I'm biased, but 64 = cacheline = dram fetch size = 1 << 6 feels about as natural to me as 4096 = PAGE_SIZE ...
[snip the patch]
I must be missing something obvious here... Can you explain how this can possibly be considered safe without holding struct_mutex?
That's the reason why the commit msg goes through every case and explains why I think it's safe. The large thing here is that we need to drop the mutex when calling copy_*_user (at least in the non-atomic slow-paths) because otherwise we might deadlock with our own pagefault handler. -Daniel
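For reference, the deadlock being avoided looks roughly like this (call chain simplified):

	/* with struct_mutex held: */
	copy_from_user(vaddr, user_ptr, len)
	  -> page fault on user_ptr (a gtt mmap of some bo)
	    -> i915_gem_fault()
	      -> mutex_lock(&dev->struct_mutex)	/* already held: deadlock */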
On Mon, Nov 21, 2011 at 05:02:44PM +0100, Daniel Vetter wrote:
On Sun, Nov 20, 2011 at 09:56:32PM -0800, Ben Widawsky wrote: [snip the patch]
Bikeshed, but I would much prefer a #define for the swizzle bit/cacheline size.
I've looked at this stuff way too long, so I'm biased, but 64 = cacheline = dram fetch size = 1 << 6 feels about as natural to me as 4096 = PAGE_SIZE ...
[snip the patch]
I must be missing something obvious here... Can you explain how this can possibly be considered safe without holding struct_mutex?
That's the reason why the commit msg goes through every case and explains why I think it's safe. The large thing here is that we need to drop the mutex when calling copy_*_user (at least in the non-atomic slow-paths) because otherwise we might deadlock with our own pagefault handler. -Daniel
The part about dropping struct_mutex is clear to me.
The bit that I'm missing: I just don't see how you guarantee the page you're reading from (assuming it's a GTT mmapped page) doesn't get moved out from under you. For instance, if the page isn't there when you do the initial __copy_from_user, it will get faulted in... cool - but what if somewhere in that loop the object gets swapped out and something else is put in its place? How is that prevented?
Sorry if it's a stupid question, I just don't get it. Ben
On Mon, Nov 21, 2011 at 09:55:07AM -0800, Ben Widawsky wrote:
On Mon, Nov 21, 2011 at 05:02:44PM +0100, Daniel Vetter wrote:
On Sun, Nov 20, 2011 at 09:56:32PM -0800, Ben Widawsky wrote: [snip the patch]
Bikeshed, but I would much prefer a #define for the swizzle bit/cacheline size.
I've looked at this stuff way too long, so I'm biased, but 64 = cacheline = dram fetch size = 1 << 6 feels about as natural to me as 4096 = PAGE_SIZE ...
[snip the patch]
I must be missing something obvious here... Can you explain how this can possibly be considered safe without holding struct_mutex?
That's the reason why the commit msg goes through every case and explains why I think it's safe. The large thing here is that we need to drop the mutex when calling copy_*_user (at least in the non-atomic slow-paths) because otherwise we might deadlock with our own pagefault handler. -Daniel
The part about dropping struct_mutex is clear to me.
The bit that I'm missing: I just don't see how you guarantee the page you're reading from (assuming it's a GTT mmapped page) doesn't get moved out from under you. For instance, if the page isn't there when you do the initial __copy_from_user, it will get faulted in... cool - but what if somewhere in that loop the object gets swapped out and something else is put in its place? How is that prevented?
Sorry if it's a stupid question, I just don't get it. Ben
Okay, I got what I was missing from IRC. Anytime the object is unmapped we shoot down the userspace mapping. I couldn't find it in the code, but it turned out to be right in front of me.
Reviewed-by: Ben Widawsky ben@bwidawsk.net
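For reference, the shoot-down Ben refers to happens when an object's gtt binding goes away: the kernel zaps any userspace ptes pointing at the object's fake mmap offset so the next access faults again. Roughly, from i915_gem_release_mmap() of that era (shown here only for illustration, not part of this series):

	if (obj->fault_mappable) {
		/* invalidate all userspace gtt mmaps of this object */
		unmap_mapping_range(obj->base.dev->dev_mapping,
				    (loff_t)obj->base.map_list.hash.key << PAGE_SHIFT,
				    obj->base.size, 1);
		obj->fault_mappable = false;
	}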
Like for shmem_pwrite_slow. The only difference is that because we read data, we can leave the fetched cachelines in the cpu cache: in the case that the object isn't in the cpu read domain anymore, the clflush for the next cpu read domain invalidation will simply drop these cachelines.
slow_shmem_bit17_copy is now unused, so kill it.
With this patch tests/gem_mmap_gtt now actually works.
v2: add __ to copy_to_user_swizzled as suggested by Chris Wilson.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38115
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
---
 drivers/gpu/drm/i915/i915_gem.c |  172 +++++++++++++--------------------------
 1 files changed, 56 insertions(+), 116 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index f9efebb..be1dd4f 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -277,55 +277,6 @@ slow_shmem_copy(struct page *dst_page, kunmap(dst_page); }
-static inline void -slow_shmem_bit17_copy(struct page *gpu_page, - int gpu_offset, - struct page *cpu_page, - int cpu_offset, - int length, - int is_read) -{ - char *gpu_vaddr, *cpu_vaddr; - - /* Use the unswizzled path if this page isn't affected. */ - if ((page_to_phys(gpu_page) & (1 << 17)) == 0) { - if (is_read) - return slow_shmem_copy(cpu_page, cpu_offset, - gpu_page, gpu_offset, length); - else - return slow_shmem_copy(gpu_page, gpu_offset, - cpu_page, cpu_offset, length); - } - - gpu_vaddr = kmap(gpu_page); - cpu_vaddr = kmap(cpu_page); - - /* Copy the data, XORing A6 with A17 (1). The user already knows he's - * XORing with the other bits (A9 for Y, A9 and A10 for X) - */ - while (length > 0) { - int cacheline_end = ALIGN(gpu_offset + 1, 64); - int this_length = min(cacheline_end - gpu_offset, length); - int swizzled_gpu_offset = gpu_offset ^ 64; - - if (is_read) { - memcpy(cpu_vaddr + cpu_offset, - gpu_vaddr + swizzled_gpu_offset, - this_length); - } else { - memcpy(gpu_vaddr + swizzled_gpu_offset, - cpu_vaddr + cpu_offset, - this_length); - } - cpu_offset += this_length; - gpu_offset += this_length; - length -= this_length; - } - - kunmap(cpu_page); - kunmap(gpu_page); -} - /** * This is the fast shmem pread path, which attempts to copy_from_user directly * from the backing pages of the object to the user's address space. On a @@ -387,6 +338,32 @@ i915_gem_shmem_pread_fast(struct drm_device *dev, }
static inline int +__copy_to_user_swizzled(char __user *cpu_vaddr, + const char *gpu_vaddr, int gpu_offset, + int length) +{ + int ret, cpu_offset = 0; + + while (length > 0) { + int cacheline_end = ALIGN(gpu_offset + 1, 64); + int this_length = min(cacheline_end - gpu_offset, length); + int swizzled_gpu_offset = gpu_offset ^ 64; + + ret = __copy_to_user(cpu_vaddr + cpu_offset, + gpu_vaddr + swizzled_gpu_offset, + this_length); + if (ret) + return ret + length; + + cpu_offset += this_length; + gpu_offset += this_length; + length -= this_length; + } + + return 0; +} + +static inline int __copy_from_user_swizzled(char __user *gpu_vaddr, int gpu_offset, const char *cpu_vaddr, int length) @@ -425,72 +402,34 @@ i915_gem_shmem_pread_slow(struct drm_device *dev, struct drm_file *file) { struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping; - struct mm_struct *mm = current->mm; - struct page **user_pages; + char __user *user_data; ssize_t remain; - loff_t offset, pinned_pages, i; - loff_t first_data_page, last_data_page, num_pages; - int shmem_page_offset; - int data_page_index, data_page_offset; - int page_length; - int ret; - uint64_t data_ptr = args->data_ptr; - int do_bit17_swizzling; + loff_t offset; + int shmem_page_offset, page_length, ret; + int obj_do_bit17_swizzling, page_do_bit17_swizzling;
+ user_data = (char __user *) (uintptr_t) args->data_ptr; remain = args->size;
- /* Pin the user pages containing the data. We can't fault while - * holding the struct mutex, yet we want to hold it while - * dereferencing the user data. - */ - first_data_page = data_ptr / PAGE_SIZE; - last_data_page = (data_ptr + args->size - 1) / PAGE_SIZE; - num_pages = last_data_page - first_data_page + 1; + obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
- user_pages = drm_malloc_ab(num_pages, sizeof(struct page *)); - if (user_pages == NULL) - return -ENOMEM; + offset = args->offset;
mutex_unlock(&dev->struct_mutex); - down_read(&mm->mmap_sem); - pinned_pages = get_user_pages(current, mm, (uintptr_t)args->data_ptr, - num_pages, 1, 0, user_pages, NULL); - up_read(&mm->mmap_sem); - mutex_lock(&dev->struct_mutex); - if (pinned_pages < num_pages) { - ret = -EFAULT; - goto out; - } - - ret = i915_gem_object_set_cpu_read_domain_range(obj, - args->offset, - args->size); - if (ret) - goto out; - - do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj); - - offset = args->offset;
while (remain > 0) { struct page *page; + char *vaddr;
/* Operation in this page * * shmem_page_offset = offset within page in shmem file - * data_page_index = page number in get_user_pages return - * data_page_offset = offset with data_page_index page. * page_length = bytes to copy for this page */ shmem_page_offset = offset_in_page(offset); - data_page_index = data_ptr / PAGE_SIZE - first_data_page; - data_page_offset = offset_in_page(data_ptr); - page_length = remain; if ((shmem_page_offset + page_length) > PAGE_SIZE) page_length = PAGE_SIZE - shmem_page_offset; - if ((data_page_offset + page_length) > PAGE_SIZE) - page_length = PAGE_SIZE - data_page_offset;
page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT); if (IS_ERR(page)) { @@ -498,36 +437,37 @@ i915_gem_shmem_pread_slow(struct drm_device *dev, goto out; }
- if (do_bit17_swizzling) { - slow_shmem_bit17_copy(page, - shmem_page_offset, - user_pages[data_page_index], - data_page_offset, - page_length, - 1); - } else { - slow_shmem_copy(user_pages[data_page_index], - data_page_offset, - page, - shmem_page_offset, - page_length); - } + page_do_bit17_swizzling = (page_to_phys(page) & (1 << 17)) == 0; + + vaddr = kmap(page); + if (obj_do_bit17_swizzling && page_do_bit17_swizzling) + ret = __copy_to_user_swizzled(user_data, + vaddr, shmem_page_offset, + page_length); + else + ret = __copy_to_user(user_data, + vaddr + shmem_page_offset, + page_length); + kunmap(page);
mark_page_accessed(page); page_cache_release(page);
+ if (ret) { + ret = -EFAULT; + goto out; + } + remain -= page_length; - data_ptr += page_length; + user_data += page_length; offset += page_length; }
out: - for (i = 0; i < pinned_pages; i++) { - SetPageDirty(user_pages[i]); - mark_page_accessed(user_pages[i]); - page_cache_release(user_pages[i]); - } - drm_free_large(user_pages); + mutex_lock(&dev->struct_mutex); + /* Fixup: Kill any reinstated backing storage pages */ + if (obj->madv == __I915_MADV_PURGED) + i915_gem_object_truncate(obj);
return ret; }
With the previous rewrite, they've become essentially identical.
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
---
 drivers/gpu/drm/i915/i915_gem.c |  126 ++++++++++----------------------------
 1 files changed, 33 insertions(+), 93 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index be1dd4f..52fc0b5 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -728,84 +728,11 @@ out_unpin_pages: return ret; }
-/** - * This is the fast shmem pwrite path, which attempts to directly - * copy_from_user into the kmapped pages backing the object. - */ -static int -i915_gem_shmem_pwrite_fast(struct drm_device *dev, - struct drm_i915_gem_object *obj, - struct drm_i915_gem_pwrite *args, - struct drm_file *file) -{ - struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping; - ssize_t remain; - loff_t offset; - char __user *user_data; - int page_offset, page_length; - - user_data = (char __user *) (uintptr_t) args->data_ptr; - remain = args->size; - - offset = args->offset; - obj->dirty = 1; - - while (remain > 0) { - struct page *page; - char *vaddr; - int ret; - - /* Operation in this page - * - * page_offset = offset within page - * page_length = bytes to copy for this page - */ - page_offset = offset_in_page(offset); - page_length = remain; - if ((page_offset + remain) > PAGE_SIZE) - page_length = PAGE_SIZE - page_offset; - - page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT); - if (IS_ERR(page)) - return PTR_ERR(page); - - vaddr = kmap_atomic(page); - ret = __copy_from_user_inatomic(vaddr + page_offset, - user_data, - page_length); - kunmap_atomic(vaddr); - - set_page_dirty(page); - mark_page_accessed(page); - page_cache_release(page); - - /* If we get a fault while copying data, then (presumably) our - * source page isn't available. Return the error and we'll - * retry in the slow path. - */ - if (ret) - return -EFAULT; - - remain -= page_length; - user_data += page_length; - offset += page_length; - } - - return 0; -} - -/** - * This is the fallback shmem pwrite path, which uses get_user_pages to pin - * the memory and maps it using kmap_atomic for copying. - * - * This avoids taking mmap_sem for faulting on the user's address while the - * struct_mutex is held. - */ static int -i915_gem_shmem_pwrite_slow(struct drm_device *dev, - struct drm_i915_gem_object *obj, - struct drm_i915_gem_pwrite *args, - struct drm_file *file) +i915_gem_shmem_pwrite(struct drm_device *dev, + struct drm_i915_gem_object *obj, + struct drm_i915_gem_pwrite *args, + struct drm_file *file) { struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping; ssize_t remain; @@ -813,6 +740,7 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev, char __user *user_data; int shmem_page_offset, page_length, ret; int obj_do_bit17_swizzling, page_do_bit17_swizzling; + int hit_slowpath = 0;
user_data = (char __user *) (uintptr_t) args->data_ptr; remain = args->size; @@ -822,8 +750,6 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev, offset = args->offset; obj->dirty = 1;
- mutex_unlock(&dev->struct_mutex); - while (remain > 0) { struct page *page; char *vaddr; @@ -847,6 +773,21 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
page_do_bit17_swizzling = (page_to_phys(page) & (1 << 17)) == 0;
+ if (!obj_do_bit17_swizzling || !page_do_bit17_swizzling) { + vaddr = kmap_atomic(page); + ret = __copy_from_user_inatomic(vaddr + shmem_page_offset, + user_data, + page_length); + kunmap_atomic(vaddr); + + if (ret == 0) + goto next_page; + } + + hit_slowpath = 1; + + mutex_unlock(&dev->struct_mutex); + vaddr = kmap(page); if (obj_do_bit17_swizzling && page_do_bit17_swizzling) ret = __copy_from_user_swizzled(vaddr, shmem_page_offset, @@ -858,6 +799,8 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev, page_length); kunmap(page);
+ mutex_lock(&dev->struct_mutex); +next_page: set_page_dirty(page); mark_page_accessed(page); page_cache_release(page); @@ -873,15 +816,16 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev, }
out: - mutex_lock(&dev->struct_mutex); - /* Fixup: Kill any reinstated backing storage pages */ - if (obj->madv == __I915_MADV_PURGED) - i915_gem_object_truncate(obj); - /* and flush dirty cachelines in case the object isn't in the cpu write - * domain anymore. */ - if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) { - i915_gem_clflush_object(obj); - intel_gtt_chipset_flush(); + if (hit_slowpath) { + /* Fixup: Kill any reinstated backing storage pages */ + if (obj->madv == __I915_MADV_PURGED) + i915_gem_object_truncate(obj); + /* and flush dirty cachelines in case the object isn't in the cpu write + * domain anymore. */ + if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) { + i915_gem_clflush_object(obj); + intel_gtt_chipset_flush(); + } }
return ret; @@ -973,11 +917,7 @@ out_unpin: if (ret) goto out;
- ret = -EFAULT; - if (!i915_gem_object_needs_bit17_swizzle(obj)) - ret = i915_gem_shmem_pwrite_fast(dev, obj, args, file); - if (ret == -EFAULT) - ret = i915_gem_shmem_pwrite_slow(dev, obj, args, file); + ret = i915_gem_shmem_pwrite(dev, obj, args, file);
out: drm_gem_object_unreference(&obj->base);
With the previous rewrite, they've become essentially identical.
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
---
 drivers/gpu/drm/i915/i915_gem.c |  108 ++++++++++-----------------------------
 1 files changed, 27 insertions(+), 81 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 52fc0b5..5eec933 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -277,66 +277,6 @@ slow_shmem_copy(struct page *dst_page, kunmap(dst_page); }
-/** - * This is the fast shmem pread path, which attempts to copy_from_user directly - * from the backing pages of the object to the user's address space. On a - * fault, it fails so we can fall back to i915_gem_shmem_pwrite_slow(). - */ -static int -i915_gem_shmem_pread_fast(struct drm_device *dev, - struct drm_i915_gem_object *obj, - struct drm_i915_gem_pread *args, - struct drm_file *file) -{ - struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping; - ssize_t remain; - loff_t offset; - char __user *user_data; - int page_offset, page_length; - - user_data = (char __user *) (uintptr_t) args->data_ptr; - remain = args->size; - - offset = args->offset; - - while (remain > 0) { - struct page *page; - char *vaddr; - int ret; - - /* Operation in this page - * - * page_offset = offset within page - * page_length = bytes to copy for this page - */ - page_offset = offset_in_page(offset); - page_length = remain; - if ((page_offset + remain) > PAGE_SIZE) - page_length = PAGE_SIZE - page_offset; - - page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT); - if (IS_ERR(page)) - return PTR_ERR(page); - - vaddr = kmap_atomic(page); - ret = __copy_to_user_inatomic(user_data, - vaddr + page_offset, - page_length); - kunmap_atomic(vaddr); - - mark_page_accessed(page); - page_cache_release(page); - if (ret) - return -EFAULT; - - remain -= page_length; - user_data += page_length; - offset += page_length; - } - - return 0; -} - static inline int __copy_to_user_swizzled(char __user *cpu_vaddr, const char *gpu_vaddr, int gpu_offset, @@ -389,17 +329,11 @@ __copy_from_user_swizzled(char __user *gpu_vaddr, int gpu_offset, return 0; }
-/** - * This is the fallback shmem pread path, which allocates temporary storage - * in kernel space to copy_to_user into outside of the struct_mutex, so we - * can copy out of the object's backing pages while holding the struct mutex - * and not take page faults. - */ static int -i915_gem_shmem_pread_slow(struct drm_device *dev, - struct drm_i915_gem_object *obj, - struct drm_i915_gem_pread *args, - struct drm_file *file) +i915_gem_shmem_pread(struct drm_device *dev, + struct drm_i915_gem_object *obj, + struct drm_i915_gem_pread *args, + struct drm_file *file) { struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping; char __user *user_data; @@ -407,6 +341,7 @@ i915_gem_shmem_pread_slow(struct drm_device *dev, loff_t offset; int shmem_page_offset, page_length, ret; int obj_do_bit17_swizzling, page_do_bit17_swizzling; + int hit_slowpath = 0;
user_data = (char __user *) (uintptr_t) args->data_ptr; remain = args->size; @@ -415,8 +350,6 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
offset = args->offset;
- mutex_unlock(&dev->struct_mutex); - while (remain > 0) { struct page *page; char *vaddr; @@ -439,6 +372,20 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
page_do_bit17_swizzling = (page_to_phys(page) & (1 << 17)) == 0;
+ if (!obj_do_bit17_swizzling || !page_do_bit17_swizzling) { + vaddr = kmap_atomic(page); + ret = __copy_to_user_inatomic(user_data, + vaddr + shmem_page_offset, + page_length); + kunmap_atomic(vaddr); + if (ret == 0) + goto next_page; + } + + hit_slowpath = 1; + + mutex_unlock(&dev->struct_mutex); + vaddr = kmap(page); if (obj_do_bit17_swizzling && page_do_bit17_swizzling) ret = __copy_to_user_swizzled(user_data, @@ -450,6 +397,8 @@ i915_gem_shmem_pread_slow(struct drm_device *dev, page_length); kunmap(page);
+ mutex_lock(&dev->struct_mutex); +next_page: mark_page_accessed(page); page_cache_release(page);
@@ -464,10 +413,11 @@ i915_gem_shmem_pread_slow(struct drm_device *dev, }
out: - mutex_lock(&dev->struct_mutex); - /* Fixup: Kill any reinstated backing storage pages */ - if (obj->madv == __I915_MADV_PURGED) - i915_gem_object_truncate(obj); + if (hit_slowpath) { + /* Fixup: Kill any reinstated backing storage pages */ + if (obj->madv == __I915_MADV_PURGED) + i915_gem_object_truncate(obj); + }
return ret; } @@ -523,11 +473,7 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data, if (ret) goto out;
- ret = -EFAULT; - if (!i915_gem_object_needs_bit17_swizzle(obj)) - ret = i915_gem_shmem_pread_fast(dev, obj, args, file); - if (ret == -EFAULT) - ret = i915_gem_shmem_pread_slow(dev, obj, args, file); + ret = i915_gem_shmem_pread(dev, obj, args, file);
out: drm_gem_object_unreference(&obj->base);
Useful when the page is already mapped to copy data in/out.
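A usage sketch of the new helper (this mirrors how the i915 patches later in the series end up calling it; the surrounding variable names are illustrative):

	/* Flush the cpu cachelines covering a sub-range of an already
	 * kmapped page, e.g. before copying the data out to userspace: */
	char *vaddr = kmap(page);
	drm_clflush_virt_range(vaddr + page_offset, page_length);
	ret = __copy_to_user(user_data, vaddr + page_offset, page_length);
	kunmap(page);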
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
---
 drivers/gpu/drm/drm_cache.c |   23 +++++++++++++++++++++++
 include/drm/drmP.h          |    1 +
 2 files changed, 24 insertions(+), 0 deletions(-)
diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c index 0e3bd5b..502771a 100644 --- a/drivers/gpu/drm/drm_cache.c +++ b/drivers/gpu/drm/drm_cache.c @@ -97,3 +97,26 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages) #endif } EXPORT_SYMBOL(drm_clflush_pages); + +void +drm_clflush_virt_range(char *addr, unsigned long length) +{ +#if defined(CONFIG_X86) + if (cpu_has_clflush) { + char *end = addr + length; + mb(); + for (; addr < end; addr += boot_cpu_data.x86_clflush_size) + clflush(addr); + clflush(end - 1); + mb(); + return; + } + + if (on_each_cpu(drm_clflush_ipi_handler, NULL, 1) != 0) + printk(KERN_ERR "Timed out waiting for cache flush.\n"); +#else + printk(KERN_ERR "Architecture has no drm_cache.c support\n"); + WARN_ON_ONCE(1); +#endif +} +EXPORT_SYMBOL(drm_clflush_virt_range); diff --git a/include/drm/drmP.h b/include/drm/drmP.h index 43538b6..a082640 100644 --- a/include/drm/drmP.h +++ b/include/drm/drmP.h @@ -1328,6 +1328,7 @@ extern int drm_authmagic(struct drm_device *dev, void *data,
/* Cache management (drm_cache.c) */ void drm_clflush_pages(struct page *pages[], unsigned long num_pages); +void drm_clflush_virt_range(char *addr, unsigned long length);
/* Locking IOCTL support (drm_lock.h) */ extern int drm_lock(struct drm_device *dev, void *data,
On Sun, Nov 06, 2011 at 08:13:53PM +0100, Daniel Vetter wrote:
Useful when the page is already mapped to copy data in/out.
Cc: dri-devel@lists.freedesktop.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
drivers/gpu/drm/drm_cache.c | 23 +++++++++++++++++++++++ include/drm/drmP.h | 1 + 2 files changed, 24 insertions(+), 0 deletions(-)
diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c index 0e3bd5b..502771a 100644 --- a/drivers/gpu/drm/drm_cache.c +++ b/drivers/gpu/drm/drm_cache.c @@ -97,3 +97,26 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages) #endif } EXPORT_SYMBOL(drm_clflush_pages);
+void +drm_clflush_virt_range(char *addr, unsigned long length) +{ +#if defined(CONFIG_X86)
- if (cpu_has_clflush) {
char *end = addr + length;
mb();
for (; addr < end; addr += boot_cpu_data.x86_clflush_size)
clflush(addr);
clflush(end - 1);
mb();
return;
- }
- if (on_each_cpu(drm_clflush_ipi_handler, NULL, 1) != 0)
printk(KERN_ERR "Timed out waiting for cache flush.\n");
+#else
- printk(KERN_ERR "Architecture has no drm_cache.c support\n");
- WARN_ON_ONCE(1);
+#endif +} +EXPORT_SYMBOL(drm_clflush_virt_range);
I'd feel more comfortable with a BUG_ON(irqs_disabled()); before the IPI... though I don't even know how many platforms that actually pertains to (if any).
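Concretely, the suggested check would slot in right before the fallback IPI (illustrative only, not part of the patch):

	/* Sending IPIs and waiting for the acks with interrupts off would hang. */
	BUG_ON(irqs_disabled());
	if (on_each_cpu(drm_clflush_ipi_handler, NULL, 1) != 0)
		printk(KERN_ERR "Timed out waiting for cache flush.\n");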
This is obviously gonna slow down pread. But for a half-way realistic micro-benchmark, it doesn't matter: Non-broken userspace reads back data from the gpu once before the gpu again dirties it.
So all this ranged clflush tracking is just a waste of time.
No pread performance change (neglecting the dumb benchmark of constantly reading the same data) measured.
As an added bonus, this avoids clflush on read on coherent objects, which means that partial preads on snb are now roughly 4x as fast. This will be useful for e.g. the libva encoder - when I finally get around to fixing that up.
v2: Properly sync with the gpu on LLC machines.
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
---
 drivers/gpu/drm/i915/i915_gem.c |   26 ++++++++++++++++++++------
 1 files changed, 20 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 5eec933..0e9cc81 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -342,12 +342,25 @@ i915_gem_shmem_pread(struct drm_device *dev, int shmem_page_offset, page_length, ret; int obj_do_bit17_swizzling, page_do_bit17_swizzling; int hit_slowpath = 0; + int needs_clflush = 0;
user_data = (char __user *) (uintptr_t) args->data_ptr; remain = args->size;
obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
+ if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) { + /* If we're not in the cpu read domain, set ourself into the gtt + * read domain and manually flush cachelines (if required). This + * optimizes for the case when the gpu will dirty the data + * anyway again before the next pread happens. */ + if (obj->cache_level == I915_CACHE_NONE) + needs_clflush = 1; + ret = i915_gem_object_set_to_gtt_domain(obj, false); + if (ret) + return ret; + } + offset = args->offset;
while (remain > 0) { @@ -374,6 +387,9 @@ i915_gem_shmem_pread(struct drm_device *dev,
if (!obj_do_bit17_swizzling || !page_do_bit17_swizzling) { vaddr = kmap_atomic(page); + if (needs_clflush) + drm_clflush_virt_range(vaddr + shmem_page_offset, + page_length); ret = __copy_to_user_inatomic(user_data, vaddr + shmem_page_offset, page_length); @@ -387,6 +403,10 @@ i915_gem_shmem_pread(struct drm_device *dev, mutex_unlock(&dev->struct_mutex);
vaddr = kmap(page); + if (needs_clflush) + drm_clflush_virt_range(vaddr + shmem_page_offset, + page_length); + if (obj_do_bit17_swizzling && page_do_bit17_swizzling) ret = __copy_to_user_swizzled(user_data, vaddr, shmem_page_offset, @@ -467,12 +487,6 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
trace_i915_gem_object_pread(obj, args->offset, args->size);
- ret = i915_gem_object_set_cpu_read_domain_range(obj, - args->offset, - args->size); - if (ret) - goto out; - ret = i915_gem_shmem_pread(dev, obj, args, file);
out:
No longer needed.
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
---
 drivers/gpu/drm/i915/i915_drv.h |    7 --
 drivers/gpu/drm/i915/i915_gem.c |  117 ---------------------------------------
 2 files changed, 0 insertions(+), 124 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 25036f5..fe4b680 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -861,13 +861,6 @@ struct drm_i915_gem_object { /** Record of address bit 17 of each page at last unbind. */ unsigned long *bit_17;
- - /** - * If present, while GEM_DOMAIN_CPU is in the read domain this array - * flags which individual pages are valid. - */ - uint8_t *page_cpu_valid; - /** User space pin count and filp owning the pin */ uint32_t user_pin_count; struct drm_file *pin_filp; diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 0e9cc81..0048917 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -41,10 +41,6 @@ static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *o static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj); static __must_check int i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write); -static __must_check int i915_gem_object_set_cpu_read_domain_range(struct drm_i915_gem_object *obj, - uint64_t offset, - uint64_t size); -static void i915_gem_object_set_to_full_cpu_read_domain(struct drm_i915_gem_object *obj); static __must_check int i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj, unsigned alignment, bool map_and_fenceable); @@ -2964,11 +2960,6 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
i915_gem_object_flush_gtt_write_domain(obj);
- /* If we have a partially-valid cache of the object in the CPU, - * finish invalidating it and free the per-page flags. - */ - i915_gem_object_set_to_full_cpu_read_domain(obj); - old_write_domain = obj->base.write_domain; old_read_domains = obj->base.read_domains;
@@ -2999,113 +2990,6 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write) return 0; }
-/** - * Moves the object from a partially CPU read to a full one. - * - * Note that this only resolves i915_gem_object_set_cpu_read_domain_range(), - * and doesn't handle transitioning from !(read_domains & I915_GEM_DOMAIN_CPU). - */ -static void -i915_gem_object_set_to_full_cpu_read_domain(struct drm_i915_gem_object *obj) -{ - if (!obj->page_cpu_valid) - return; - - /* If we're partially in the CPU read domain, finish moving it in. - */ - if (obj->base.read_domains & I915_GEM_DOMAIN_CPU) { - int i; - - for (i = 0; i <= (obj->base.size - 1) / PAGE_SIZE; i++) { - if (obj->page_cpu_valid[i]) - continue; - drm_clflush_pages(obj->pages + i, 1); - } - } - - /* Free the page_cpu_valid mappings which are now stale, whether - * or not we've got I915_GEM_DOMAIN_CPU. - */ - kfree(obj->page_cpu_valid); - obj->page_cpu_valid = NULL; -} - -/** - * Set the CPU read domain on a range of the object. - * - * The object ends up with I915_GEM_DOMAIN_CPU in its read flags although it's - * not entirely valid. The page_cpu_valid member of the object flags which - * pages have been flushed, and will be respected by - * i915_gem_object_set_to_cpu_domain() if it's called on to get a valid mapping - * of the whole object. - * - * This function returns when the move is complete, including waiting on - * flushes to occur. - */ -static int -i915_gem_object_set_cpu_read_domain_range(struct drm_i915_gem_object *obj, - uint64_t offset, uint64_t size) -{ - uint32_t old_read_domains; - int i, ret; - - if (offset == 0 && size == obj->base.size) - return i915_gem_object_set_to_cpu_domain(obj, 0); - - ret = i915_gem_object_flush_gpu_write_domain(obj); - if (ret) - return ret; - - ret = i915_gem_object_wait_rendering(obj); - if (ret) - return ret; - - i915_gem_object_flush_gtt_write_domain(obj); - - /* If we're already fully in the CPU read domain, we're done. */ - if (obj->page_cpu_valid == NULL && - (obj->base.read_domains & I915_GEM_DOMAIN_CPU) != 0) - return 0; - - /* Otherwise, create/clear the per-page CPU read domain flag if we're - * newly adding I915_GEM_DOMAIN_CPU - */ - if (obj->page_cpu_valid == NULL) { - obj->page_cpu_valid = kzalloc(obj->base.size / PAGE_SIZE, - GFP_KERNEL); - if (obj->page_cpu_valid == NULL) - return -ENOMEM; - } else if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) - memset(obj->page_cpu_valid, 0, obj->base.size / PAGE_SIZE); - - /* Flush the cache on any pages that are still invalid from the CPU's - * perspective. - */ - for (i = offset / PAGE_SIZE; i <= (offset + size - 1) / PAGE_SIZE; - i++) { - if (obj->page_cpu_valid[i]) - continue; - - drm_clflush_pages(obj->pages + i, 1); - - obj->page_cpu_valid[i] = 1; - } - - /* It should now be out of any other write domains, and we can update - * the domain values for our changes. - */ - BUG_ON((obj->base.write_domain & ~I915_GEM_DOMAIN_CPU) != 0); - - old_read_domains = obj->base.read_domains; - obj->base.read_domains |= I915_GEM_DOMAIN_CPU; - - trace_i915_gem_object_change_domain(obj, - old_read_domains, - obj->base.write_domain); - - return 0; -} - /* Throttle our rendering by waiting until the ring has completed our requests * emitted over 20 msec ago. * @@ -3521,7 +3405,6 @@ static void i915_gem_free_object_tail(struct drm_i915_gem_object *obj) drm_gem_object_release(&obj->base); i915_gem_info_remove_obj(dev_priv, obj->base.size);
- kfree(obj->page_cpu_valid); kfree(obj->bit_17); kfree(obj); }
~120 µs instead of ~210 µs to write 1mb on my snb. I like this.
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
---
 drivers/gpu/drm/i915/i915_gem.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 0048917..8fd175c 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -842,6 +842,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data, ret = i915_gem_phys_pwrite(dev, obj, args, file); goto out; } else if (obj->gtt_space && + obj->cache_level == I915_CACHE_NONE && obj->base.write_domain != I915_GEM_DOMAIN_CPU) { ret = i915_gem_object_pin(obj, 0, true); if (ret)
On Sun, 6 Nov 2011 20:13:56 +0100, Daniel Vetter daniel.vetter@ffwll.ch wrote:
~120 µs instead of ~210 µs to write 1mb on my snb. I like this.
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
drivers/gpu/drm/i915/i915_gem.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 0048917..8fd175c 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -842,6 +842,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data, ret = i915_gem_phys_pwrite(dev, obj, args, file); goto out; } else if (obj->gtt_space &&
obj->base.write_domain != I915_GEM_DOMAIN_CPU) { ret = i915_gem_object_pin(obj, 0, true); if (ret)obj->cache_level == I915_CACHE_NONE &&
I still think you want to include an obj->map_and_fenceable test here. When doing 2D benchmarks, the stall incurred here to evict an old object and map the to-be-written object into the mappable GTT causes measurable pain (obviously on non-LLC architectures).
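The suggested extra check would presumably look something like this (illustrative, on top of the patch above):

	} else if (obj->gtt_space &&
		   obj->map_and_fenceable &&
		   obj->cache_level == I915_CACHE_NONE &&
		   obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
		ret = i915_gem_object_pin(obj, 0, true);
		/* ... */
	}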
The series looks good and I'll look at the impact upon 2D for pnv and snb over the next couple of days. With and without the extra check ;-) -Chris
On Sun, Nov 06, 2011 at 09:16:00PM +0000, Chris Wilson wrote:
On Sun, 6 Nov 2011 20:13:56 +0100, Daniel Vetter daniel.vetter@ffwll.ch wrote:
~120 µs instead of ~210 µs to write 1mb on my snb. I like this.
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
drivers/gpu/drm/i915/i915_gem.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 0048917..8fd175c 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -842,6 +842,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data, ret = i915_gem_phys_pwrite(dev, obj, args, file); goto out; } else if (obj->gtt_space &&
obj->base.write_domain != I915_GEM_DOMAIN_CPU) { ret = i915_gem_object_pin(obj, 0, true); if (ret)obj->cache_level == I915_CACHE_NONE &&
I still think you want to include an obj->map_and_fenceable test here. When doing 2D benchmarks, the stall incurred here to evict an old object and map the to-be-written object into the mappable GTT causes measurable pain (obviously on non-LLC architectures).
That's one of "further tricks". I think we need to also implement the same in-place clflush trick like for pread, too, to avoid penalizing partial pwrites too much.
The other trick is to do reloc fixups through llc/clflushed cpu writes. This way we'd completely eliminate mappable pressure for all untiled objects. The only thing left would be scanout, tiled gtt uploads and tiled blts (only on pre-gen4). -Daniel
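The in-place clflush trick for pwrite mentioned here would presumably mirror what the pread patch in this series already does - keep the object out of the cpu domain and flush only the touched range while copying. A very rough sketch (an assumption about future work, not part of this series):

	vaddr = kmap(page);
	if (needs_clflush_before)	/* partial cachelines of an uncached object */
		drm_clflush_virt_range(vaddr + shmem_page_offset, page_length);
	ret = __copy_from_user(vaddr + shmem_page_offset, user_data, page_length);
	if (needs_clflush_after)	/* push the new data out to memory */
		drm_clflush_virt_range(vaddr + shmem_page_offset, page_length);
	kunmap(page);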
This speeds up pwrite and pread from ~120 µs to ~100 µs for reading/writing 1mb on my snb (if the backing storage pages are already pinned, of course).
v2: Chris Wilson pointed out a glaring page reference bug - I've unconditionally dropped the reference. With that fixed (and the associated reduction of dirt in dmesg) it's now even a notch faster.
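The reference handling the v2 note alludes to condenses to the following pattern (taken from the diff below):

	if (obj->pages) {
		/* backing pages already pinned by the object: no extra ref */
		page = obj->pages[offset >> PAGE_SHIFT];
		release_page = 0;
	} else {
		page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
		if (IS_ERR(page)) {
			ret = PTR_ERR(page);
			goto out;
		}
		release_page = 1;	/* we hold a reference we must drop */
	}
	/* ... */
	/* before dropping struct_mutex for the slow path: the object's page
	 * array may go away while unlocked, so grab our own reference */
	if (!release_page)
		page_cache_get(page);
	release_page = 1;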
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
---
 drivers/gpu/drm/i915/i915_gem.c |   42 +++++++++++++++++++++++++++++---------
 1 files changed, 32 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 8fd175c..5b8a7c5 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -339,6 +339,7 @@ i915_gem_shmem_pread(struct drm_device *dev, int obj_do_bit17_swizzling, page_do_bit17_swizzling; int hit_slowpath = 0; int needs_clflush = 0; + int release_page;
user_data = (char __user *) (uintptr_t) args->data_ptr; remain = args->size; @@ -373,10 +374,16 @@ i915_gem_shmem_pread(struct drm_device *dev, if ((shmem_page_offset + page_length) > PAGE_SIZE) page_length = PAGE_SIZE - shmem_page_offset;
- page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT); - if (IS_ERR(page)) { - ret = PTR_ERR(page); - goto out; + if (obj->pages) { + page = obj->pages[offset >> PAGE_SHIFT]; + release_page = 0; + } else { + page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT); + if (IS_ERR(page)) { + ret = PTR_ERR(page); + goto out; + } + release_page = 1; }
page_do_bit17_swizzling = (page_to_phys(page) & (1 << 17)) == 0; @@ -395,6 +402,9 @@ i915_gem_shmem_pread(struct drm_device *dev, }
hit_slowpath = 1; + if (!release_page) + page_cache_get(page); + release_page = 1;
mutex_unlock(&dev->struct_mutex);
@@ -416,7 +426,8 @@ i915_gem_shmem_pread(struct drm_device *dev, mutex_lock(&dev->struct_mutex); next_page: mark_page_accessed(page); - page_cache_release(page); + if (release_page) + page_cache_release(page);
if (ret) { ret = -EFAULT; @@ -697,6 +708,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev, int shmem_page_offset, page_length, ret; int obj_do_bit17_swizzling, page_do_bit17_swizzling; int hit_slowpath = 0; + int release_page;
user_data = (char __user *) (uintptr_t) args->data_ptr; remain = args->size; @@ -721,10 +733,16 @@ i915_gem_shmem_pwrite(struct drm_device *dev, if ((shmem_page_offset + page_length) > PAGE_SIZE) page_length = PAGE_SIZE - shmem_page_offset;
- page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT); - if (IS_ERR(page)) { - ret = PTR_ERR(page); - goto out; + if (obj->pages) { + page = obj->pages[offset >> PAGE_SHIFT]; + release_page = 0; + } else { + page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT); + if (IS_ERR(page)) { + ret = PTR_ERR(page); + goto out; + } + release_page = 1; }
page_do_bit17_swizzling = (page_to_phys(page) & (1 << 17)) == 0; @@ -741,6 +759,9 @@ i915_gem_shmem_pwrite(struct drm_device *dev, }
hit_slowpath = 1; + if (!release_page) + page_cache_get(page); + release_page = 1;
mutex_unlock(&dev->struct_mutex);
@@ -759,7 +780,8 @@ i915_gem_shmem_pwrite(struct drm_device *dev, next_page: set_page_dirty(page); mark_page_accessed(page); - page_cache_release(page); + if (release_page) + page_cache_release(page);
if (ret) { ret = -EFAULT;
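Condensed, the pattern this patch introduces looks roughly like the sketch below (the surrounding pread/pwrite loop and error paths are elided): if the object's backing pages are already pinned we borrow the page from obj->pages, otherwise we read it from shmem and own a reference; before dropping struct_mutex for the sleeping slowpath we always upgrade to our own reference so the page can't disappear while we sleep.

	/* Sketch of the lookup-and-reference pattern from this patch; the
	 * surrounding copy loop is elided. */
	struct page *page;
	int release_page;

	if (obj->pages) {
		/* backing storage already pinned: borrow the page, no extra ref */
		page = obj->pages[offset >> PAGE_SHIFT];
		release_page = 0;
	} else {
		page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
		if (IS_ERR(page))
			return PTR_ERR(page);
		release_page = 1;	/* we own this reference */
	}

	/* slowpath: we're about to drop struct_mutex and sleep, so make sure
	 * we hold our own reference on the page */
	if (!release_page) {
		page_cache_get(page);
		release_page = 1;
	}
	mutex_unlock(&dev->struct_mutex);
	/* ... sleeping copy to/from userspace ... */
	mutex_lock(&dev->struct_mutex);

	mark_page_accessed(page);
	if (release_page)
		page_cache_release(page);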
drm/i915 wants to read/write more than one page in its fastpath and hence needs to prefault more than PAGE_SIZE bytes.
I've checked the callsites, and they all already clamp the size passed to fault_in_pages_* to the same value used for the subsequent __copy_to|from_user, so they don't rely on the implicit clamping to PAGE_SIZE.
Also kill a copy&pasted spurious space in both functions while at it.
Cc: linux-mm@kvack.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch --- include/linux/pagemap.h | 28 ++++++++++++++++++---------- 1 files changed, 18 insertions(+), 10 deletions(-)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index cfaaa69..689527d 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -408,6 +408,7 @@ extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter); static inline int fault_in_pages_writeable(char __user *uaddr, int size) { int ret; + char __user *end = uaddr + size - 1;
if (unlikely(size == 0)) return 0; @@ -416,17 +417,20 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size) * Writing zeroes into userspace here is OK, because we know that if * the zero gets there, we'll be overwriting it. */ - ret = __put_user(0, uaddr); + while (uaddr <= end) { + ret = __put_user(0, uaddr); + if (ret != 0) + return ret; + uaddr += PAGE_SIZE; + } if (ret == 0) { - char __user *end = uaddr + size - 1; - /* * If the page was already mapped, this will get a cache miss * for sure, so try to avoid doing it. */ - if (((unsigned long)uaddr & PAGE_MASK) != + if (((unsigned long)uaddr & PAGE_MASK) == ((unsigned long)end & PAGE_MASK)) - ret = __put_user(0, end); + ret = __put_user(0, end); } return ret; } @@ -435,17 +439,21 @@ static inline int fault_in_pages_readable(const char __user *uaddr, int size) { volatile char c; int ret; + const char __user *end = uaddr + size - 1;
if (unlikely(size == 0)) return 0;
- ret = __get_user(c, uaddr); + while (uaddr <= end) { + ret = __get_user(c, uaddr); + if (ret != 0) + return ret; + uaddr += PAGE_SIZE; + } if (ret == 0) { - const char __user *end = uaddr + size - 1; - - if (((unsigned long)uaddr & PAGE_MASK) != + if (((unsigned long)uaddr & PAGE_MASK) == ((unsigned long)end & PAGE_MASK)) { - ret = __get_user(c, end); + ret = __get_user(c, end); (void)c; } }
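As a usage example (a sketch following the i915 pwrite path from earlier in this series), a caller can now prefault the whole user buffer that the subsequent copy will touch, rather than just its first page:

	/* Sketch of a caller after this change: prefault the full size that
	 * the later __copy_from_user will read, not just one page. */
	ret = fault_in_pages_readable((char __user *)(uintptr_t)args->data_ptr,
				      args->size);
	if (ret)
		return -EFAULT;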
On Sun, 6 Nov 2011 20:13:58 +0100, Daniel Vetter daniel.vetter@ffwll.ch wrote:
drm/i915 wants to read/write more than one page in its fastpath and hence needs to prefault more than PAGE_SIZE bytes.
I've checked the callsites, and they all already clamp the size passed to fault_in_pages_* to the same value used for the subsequent __copy_to|from_user, so they don't rely on the implicit clamping to PAGE_SIZE.
Also kill a copy&pasted spurious space in both functions while at it.
Cc: linux-mm@kvack.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
include/linux/pagemap.h | 28 ++++++++++++++++++---------- 1 files changed, 18 insertions(+), 10 deletions(-)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index cfaaa69..689527d 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -408,6 +408,7 @@ extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter); static inline int fault_in_pages_writeable(char __user *uaddr, int size) { int ret;
char __user *end = uaddr + size - 1;
if (unlikely(size == 0)) return 0;
@@ -416,17 +417,20 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size) * Writing zeroes into userspace here is OK, because we know that if * the zero gets there, we'll be overwriting it. */
-	ret = __put_user(0, uaddr);
+	while (uaddr <= end) {
+		ret = __put_user(0, uaddr);
+		if (ret != 0)
+			return ret;
+		uaddr += PAGE_SIZE;
+	}
 	if (ret == 0) {
-		char __user *end = uaddr + size - 1;
-
 		/*
 		 * If the page was already mapped, this will get a cache miss
 		 * for sure, so try to avoid doing it.
 		 */
-		if (((unsigned long)uaddr & PAGE_MASK) !=
+		if (((unsigned long)uaddr & PAGE_MASK) ==
 		    ((unsigned long)end & PAGE_MASK))
-			ret = __put_user(0, end);
+			ret = __put_user(0, end);
 	}
 	return ret;
 }
You leave these functions in a worse mess: the now-redundant test against zero introduces a false compiler warning about an uninitialized ret, and a do {} while loop would make it clearer that the original behaviour is merely being extended. And please replace the open-coded offset_in_page(). -Chris
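For what it's worth, the do {} while shape Chris is asking for might look roughly like the sketch below (writeable variant only; the trailing-page check is kept as in the posted patch, so this is a sketch rather than the final version):

	/* Sketch only: fault_in_pages_writeable() restructured as a do {} while
	 * loop so it's obvious that ret is always initialized; not the final
	 * upstream code. */
	static inline int fault_in_pages_writeable(char __user *uaddr, int size)
	{
		int ret = 0;
		char __user *end = uaddr + size - 1;

		if (unlikely(size == 0))
			return 0;

		/*
		 * Writing zeroes into userspace here is OK, because we know that if
		 * the zero gets there, we'll be overwriting it.
		 */
		do {
			ret = __put_user(0, uaddr);
			if (ret != 0)
				return ret;
			uaddr += PAGE_SIZE;
		} while (uaddr <= end);

		/* The loop can stop one page short of end; touch that page too. */
		if (((unsigned long)uaddr & PAGE_MASK) ==
		    ((unsigned long)end & PAGE_MASK))
			ret = __put_user(0, end);

		return ret;
	}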
With the proper prefault, it's extremely unlikely that we fall back to the gtt slowpath.
So just kill it and use the shmem_pwrite path as fallback.
To further clean up the code, move the preparatory gem calls into the respective pwrite functions. This way the gtt_fast->shmem fallback is much more obvious.
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch --- drivers/gpu/drm/i915/i915_gem.c | 183 +++++++-------------------------------- 1 files changed, 30 insertions(+), 153 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 5b8a7c5..0a374e1 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -523,30 +523,6 @@ fast_user_write(struct io_mapping *mapping, return unwritten; }
-/* Here's the write path which can sleep for - * page faults - */ - -static inline void -slow_kernel_write(struct io_mapping *mapping, - loff_t gtt_base, int gtt_offset, - struct page *user_page, int user_offset, - int length) -{ - char __iomem *dst_vaddr; - char *src_vaddr; - - dst_vaddr = io_mapping_map_wc(mapping, gtt_base); - src_vaddr = kmap(user_page); - - memcpy_toio(dst_vaddr + gtt_offset, - src_vaddr + user_offset, - length); - - kunmap(user_page); - io_mapping_unmap(dst_vaddr); -} - /** * This is the fast pwrite path, where we copy the data directly from the * user into the GTT, uncached. @@ -561,7 +537,19 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev, ssize_t remain; loff_t offset, page_base; char __user *user_data; - int page_offset, page_length; + int page_offset, page_length, ret; + + ret = i915_gem_object_pin(obj, 0, true); + if (ret) + goto out; + + ret = i915_gem_object_set_to_gtt_domain(obj, true); + if (ret) + goto out_unpin; + + ret = i915_gem_object_put_fence(obj); + if (ret) + goto out_unpin;
user_data = (char __user *) (uintptr_t) args->data_ptr; remain = args->size; @@ -586,112 +574,19 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev, * retry in the slow path. */ if (fast_user_write(dev_priv->mm.gtt_mapping, page_base, - page_offset, user_data, page_length)) - return -EFAULT; + page_offset, user_data, page_length)) { + ret = -EFAULT; + goto out_unpin; + }
remain -= page_length; user_data += page_length; offset += page_length; }
- return 0; -} - -/** - * This is the fallback GTT pwrite path, which uses get_user_pages to pin - * the memory and maps it using kmap_atomic for copying. - * - * This code resulted in x11perf -rgb10text consuming about 10% more CPU - * than using i915_gem_gtt_pwrite_fast on a G45 (32-bit). - */ -static int -i915_gem_gtt_pwrite_slow(struct drm_device *dev, - struct drm_i915_gem_object *obj, - struct drm_i915_gem_pwrite *args, - struct drm_file *file) -{ - drm_i915_private_t *dev_priv = dev->dev_private; - ssize_t remain; - loff_t gtt_page_base, offset; - loff_t first_data_page, last_data_page, num_pages; - loff_t pinned_pages, i; - struct page **user_pages; - struct mm_struct *mm = current->mm; - int gtt_page_offset, data_page_offset, data_page_index, page_length; - int ret; - uint64_t data_ptr = args->data_ptr; - - remain = args->size; - - /* Pin the user pages containing the data. We can't fault while - * holding the struct mutex, and all of the pwrite implementations - * want to hold it while dereferencing the user data. - */ - first_data_page = data_ptr / PAGE_SIZE; - last_data_page = (data_ptr + args->size - 1) / PAGE_SIZE; - num_pages = last_data_page - first_data_page + 1; - - user_pages = drm_malloc_ab(num_pages, sizeof(struct page *)); - if (user_pages == NULL) - return -ENOMEM; - - mutex_unlock(&dev->struct_mutex); - down_read(&mm->mmap_sem); - pinned_pages = get_user_pages(current, mm, (uintptr_t)args->data_ptr, - num_pages, 0, 0, user_pages, NULL); - up_read(&mm->mmap_sem); - mutex_lock(&dev->struct_mutex); - if (pinned_pages < num_pages) { - ret = -EFAULT; - goto out_unpin_pages; - } - - ret = i915_gem_object_set_to_gtt_domain(obj, true); - if (ret) - goto out_unpin_pages; - - ret = i915_gem_object_put_fence(obj); - if (ret) - goto out_unpin_pages; - - offset = obj->gtt_offset + args->offset; - - while (remain > 0) { - /* Operation in this page - * - * gtt_page_base = page offset within aperture - * gtt_page_offset = offset within page in aperture - * data_page_index = page number in get_user_pages return - * data_page_offset = offset with data_page_index page. - * page_length = bytes to copy for this page - */ - gtt_page_base = offset & PAGE_MASK; - gtt_page_offset = offset_in_page(offset); - data_page_index = data_ptr / PAGE_SIZE - first_data_page; - data_page_offset = offset_in_page(data_ptr); - - page_length = remain; - if ((gtt_page_offset + page_length) > PAGE_SIZE) - page_length = PAGE_SIZE - gtt_page_offset; - if ((data_page_offset + page_length) > PAGE_SIZE) - page_length = PAGE_SIZE - data_page_offset; - - slow_kernel_write(dev_priv->mm.gtt_mapping, - gtt_page_base, gtt_page_offset, - user_pages[data_page_index], - data_page_offset, - page_length); - - remain -= page_length; - offset += page_length; - data_ptr += page_length; - } - -out_unpin_pages: - for (i = 0; i < pinned_pages; i++) - page_cache_release(user_pages[i]); - drm_free_large(user_pages); - +out_unpin: + i915_gem_object_unpin(obj); +out: return ret; }
@@ -710,6 +605,10 @@ i915_gem_shmem_pwrite(struct drm_device *dev, int hit_slowpath = 0; int release_page;
+ ret = i915_gem_object_set_to_cpu_domain(obj, 1); + if (ret) + return ret; + user_data = (char __user *) (uintptr_t) args->data_ptr; remain = args->size;
@@ -854,6 +753,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
trace_i915_gem_object_pwrite(obj, args->offset, args->size);
+ ret = -EFAULT; /* We can only do the GTT pwrite on untiled buffers, as otherwise * it would end up going through the fenced access, and we'll get * different detiling behavior between reading and writing. @@ -866,37 +766,14 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data, } else if (obj->gtt_space && obj->cache_level == I915_CACHE_NONE && obj->base.write_domain != I915_GEM_DOMAIN_CPU) { - ret = i915_gem_object_pin(obj, 0, true); - if (ret) - goto out; - - ret = i915_gem_object_set_to_gtt_domain(obj, true); - if (ret) - goto out_unpin; - - ret = i915_gem_object_put_fence(obj); - if (ret) - goto out_unpin; - ret = i915_gem_gtt_pwrite_fast(dev, obj, args, file); - if (ret == -EFAULT) - ret = i915_gem_gtt_pwrite_slow(dev, obj, args, file); - -out_unpin: - i915_gem_object_unpin(obj); - - if (ret != -EFAULT) - goto out; - /* Fall through to the shmfs paths because the gtt paths might - * fail with non-page-backed user pointers (e.g. gtt mappings - * when moving data between textures). */ + /* Note that the gtt paths might fail with non-page-backed user + * pointers (e.g. gtt mappings when moving data between + * textures). Fallback to the shmem path in that case. */ }
- ret = i915_gem_object_set_to_cpu_domain(obj, 1); - if (ret) - goto out; - - ret = i915_gem_shmem_pwrite(dev, obj, args, file); + if (ret == -EFAULT) + ret = i915_gem_shmem_pwrite(dev, obj, args, file);
out: drm_gem_object_unreference(&obj->base);
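Put together, the write path after this patch boils down to roughly the following (a heavily simplified sketch of i915_gem_pwrite_ioctl; locking, tiling checks and the out: cleanup label are elided):

	/* Simplified sketch of the fallback structure after this patch. */
	ret = -EFAULT;
	if (obj->phys_obj) {
		ret = i915_gem_phys_pwrite(dev, obj, args, file);
		goto out;	/* phys objects never fall back to shmem */
	} else if (obj->gtt_space &&
		   obj->cache_level == I915_CACHE_NONE &&
		   obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
		/* pin, set-to-gtt-domain and put-fence now live inside the
		 * fast path itself */
		ret = i915_gem_gtt_pwrite_fast(dev, obj, args, file);
		/* -EFAULT here usually means a non-page-backed user pointer,
		 * e.g. a gtt mmap; fall through to the shmem path below */
	}

	if (ret == -EFAULT)
		ret = i915_gem_shmem_pwrite(dev, obj, args, file);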
The pagemap.h prefault helpers do the prefaulting by simply writing some data into every page. Hence we should not prefault when we're not yet committed to actually writing data to userspace. The problem is that
- we can't prefault while holding dev->struct_mutex, because we could deadlock with our own pagefault handler, and
- we need to grab dev->struct_mutex before copying, to sync up with any outstanding gpu writes.
Therefore only prefault when we're dropping the lock for the first time in the pread slowpath - at that point we're committed to the write, no longer wait on the gpu and hence won't return early (with e.g. -EINTR).
Signed-Off-by: Daniel Vetter daniel.vetter@ffwll.ch --- drivers/gpu/drm/i915/i915_gem.c | 16 +++++++++++----- 1 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 0a374e1..ebaa0f0 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -338,6 +338,7 @@ i915_gem_shmem_pread(struct drm_device *dev, int shmem_page_offset, page_length, ret; int obj_do_bit17_swizzling, page_do_bit17_swizzling; int hit_slowpath = 0; + int prefaulted = 0; int needs_clflush = 0; int release_page;
@@ -408,6 +409,16 @@ i915_gem_shmem_pread(struct drm_device *dev,
mutex_unlock(&dev->struct_mutex);
+ if (!prefaulted) { + ret = fault_in_pages_writeable(user_data, remain); + /* Userspace is tricking us, but we've already clobbered + * its pages with the prefault and promised to write the + * data up to the first fault. Hence ignore any errors + * and just continue. */ + (void)ret; + prefaulted = 1; + } + vaddr = kmap(page); if (needs_clflush) drm_clflush_virt_range(vaddr + shmem_page_offset, @@ -470,11 +481,6 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data, args->size)) return -EFAULT;
- ret = fault_in_pages_writeable((char __user *)(uintptr_t)args->data_ptr, - args->size); - if (ret) - return -EFAULT; - ret = i915_mutex_lock_interruptible(dev); if (ret) return ret;
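To make the ordering constraint explicit, here's a minimal sketch of the slowpath structure this patch aims for; try_fast_copy() and slow_copy() are hypothetical stand-ins for the atomic and sleeping copy helpers, not the driver's real functions:

	/* Minimal sketch of the prefault ordering; try_fast_copy() and
	 * slow_copy() are hypothetical stand-ins. */
	static int pread_sketch(struct drm_device *dev,
				char __user *user_data, ssize_t remain)
	{
		int prefaulted = 0;
		int ret = 0;

		mutex_lock(&dev->struct_mutex);	/* syncs with outstanding gpu writes */
		while (remain > 0) {
			if (try_fast_copy(&user_data, &remain) == 0)
				continue;	/* atomic copy worked, no fault taken */

			/* We may not fault while holding struct_mutex - our own
			 * pagefault handler takes it - so drop the lock first. */
			mutex_unlock(&dev->struct_mutex);

			if (!prefaulted) {
				/* Only now are we committed to writing to the user
				 * buffer, so clobbering it via prefault is fine;
				 * errors are deliberately ignored. */
				fault_in_pages_writeable(user_data, remain);
				prefaulted = 1;
			}

			ret = slow_copy(&user_data, &remain);	/* may sleep and fault */
			mutex_lock(&dev->struct_mutex);
			if (ret)
				break;
		}
		mutex_unlock(&dev->struct_mutex);
		return ret;
	}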