On Sun, Jan 19, 2014 at 2:28 AM, Linus Torvalds torvalds@linux-foundation.org wrote:
Testing running out of file descriptors shows a NULL pointer dereference in i915_gem_alloc_object() because base.filp ends up being NULL. So the line
mapping = file_inode(obj->base.filp)->i_mapping;
will cause an oops. The call chain is
SyS_ioctl -> do_vfs_ioctl -> drm_ioctl -> i915_gem_create_ioctl -> i915_gem_create -> i915_gem_alloc_object
Now, some functions do test "base.filp" for NULL (see for example i915_gem_pread_ioctl()) so clearly people know that the filp may not exist. But that path does not.
This is unrelated since only shmem backed objects (should) have base.filp set, but other gem objects (backed by stolen mem or from dma-buf) don't have that. A bunch of gem ioclts aren't supported with these special objects (like pread/pwrite) and so we check for that and bail out.
Comments? Patches even?
If we fail to allocate the shmem file drm_gem_object_init should fail and i915_gem_alloc_object should bail out early. Apparently that doesn't work as it should. On a quick check it's not the lack of _OR_NULL in drm_gem_object_init so I'm not really sure what's going on here. I'll try to reproduce this here meanwhile.
Adding dri-devel since other gem drivers should be equally affected.
Cheers, Daniel
On Sun, Jan 19, 2014 at 12:36:20PM +0100, Daniel Vetter wrote:
On Sun, Jan 19, 2014 at 2:28 AM, Linus Torvalds torvalds@linux-foundation.org wrote:
Testing running out of file descriptors shows a NULL pointer dereference in i915_gem_alloc_object() because base.filp ends up being NULL. So the line
mapping = file_inode(obj->base.filp)->i_mapping;
will cause an oops. The call chain is
SyS_ioctl -> do_vfs_ioctl -> drm_ioctl -> i915_gem_create_ioctl -> i915_gem_create -> i915_gem_alloc_object
Now, some functions do test "base.filp" for NULL (see for example i915_gem_pread_ioctl()) so clearly people know that the filp may not exist. But that path does not.
This is unrelated since only shmem backed objects (should) have base.filp set, but other gem objects (backed by stolen mem or from dma-buf) don't have that. A bunch of gem ioclts aren't supported with these special objects (like pread/pwrite) and so we check for that and bail out.
Comments? Patches even?
If we fail to allocate the shmem file drm_gem_object_init should fail and i915_gem_alloc_object should bail out early. Apparently that doesn't work as it should. On a quick check it's not the lack of _OR_NULL in drm_gem_object_init so I'm not really sure what's going on here. I'll try to reproduce this here meanwhile.
Adding dri-devel since other gem drivers should be equally affected.
This took much longer than I'll ever dare to admit, but I think I've stitched a testcase together for this and pushed it to our i915 test repo:
http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/commit/?id=70082e367771...
But I couldn't blow up my system here with that. Same on stable kernels, so this isn't something we've fixed by accident in drm-intel-next. -Daniel
On Sun, Jan 19, 2014 at 9:31 AM, Daniel Vetter daniel@ffwll.ch wrote:
This took much longer than I'll ever dare to admit, but I think I've stitched a testcase together for this and pushed it to our i915 test repo:
So I was testing a patch that limits one user to fewer than the global system resource files, and the oops happened just once, so it might have been a lucky fluke, with the failure *usually* happening somewhere else.
And I'm beginning to think I misread the oops. I was *assuming* it was due to a NULL file pointer access in that function because of the way I triggered it, but now that I look closer at the disassembly, what seems to actually be happening is that the (inlined) i915_gem_object_free() function does
struct drm_i915_private *dev_priv = obj->base.dev->dev_private; kmem_cache_free(dev_priv->slab, obj);
and it's "base.dev" that is NULL, so the dev_private access is the one that oopses.
And it is related to running out of file descriptors, but indirectly: the lack of file descriptors is making drm_gem_object_init() fail (correctly), but it has failed *before* it has done the
drm_gem_private_object_init(dev, obj, size);
which is why that oops then happens.
So my suggestion is that drm_gem_object_init() should do that drm_gem_private_object_init() part regardless of whether the shm_file_setup() failed or not.
Makes sense?
Linus
On Sun, Jan 19, 2014 at 8:39 PM, Linus Torvalds torvalds@linux-foundation.org wrote:
On Sun, Jan 19, 2014 at 9:31 AM, Daniel Vetter daniel@ffwll.ch wrote:
This took much longer than I'll ever dare to admit, but I think I've stitched a testcase together for this and pushed it to our i915 test repo:
So I was testing a patch that limits one user to fewer than the global system resource files, and the oops happened just once, so it might have been a lucky fluke, with the failure *usually* happening somewhere else.
And I'm beginning to think I misread the oops. I was *assuming* it was due to a NULL file pointer access in that function because of the way I triggered it, but now that I look closer at the disassembly, what seems to actually be happening is that the (inlined) i915_gem_object_free() function does
struct drm_i915_private *dev_priv = obj->base.dev->dev_private; kmem_cache_free(dev_priv->slab, obj);
and it's "base.dev" that is NULL, so the dev_private access is the one that oopses.
And it is related to running out of file descriptors, but indirectly: the lack of file descriptors is making drm_gem_object_init() fail (correctly), but it has failed *before* it has done the
drm_gem_private_object_init(dev, obj, size);
which is why that oops then happens.
So my suggestion is that drm_gem_object_init() should do that drm_gem_private_object_init() part regardless of whether the shm_file_setup() failed or not.
Makes sense?
Yup. And I've also figured out why my testcase couldn't kill the kernel - the testsuite is run as root to be able to do generally nasty things, and apparently the kernel doesn't enforce the limits for root. Now that I drop root privs in the testcase it blows up in a nice oops. And is fixed with your suggestion applied.
But then my testcase now dies up on an assert where it shouldn't, so I think I'll look at all this again tomorrow with a fresh mind, clean up the patches and then send them out.
Cheers, Daniel
dri-devel@lists.freedesktop.org