The file is not part of the global drm resource and can be released prior to take the global mutex to drop the open_count (and potentially close) the drm device. As the global mutex is indeed global, not only within the device but across devices, a slow file release mechanism can bottleneck the entire system.
However, inside drm_close_helper() there are a number of dev->driver callbacks that take the drm_device as the first parameter... Worryingly some of those callbacks may be (implicitly) depending on the global mutex.
v2: Drop the debug message for the open-count, it's included with the drm_file_free() debug message -- and for good measure make that up as reading outside of the mutex.
v3: Separate the calling of the filp cleanup outside of drm_global_mutex into a new drm_release_noglobal() hook, so that we can phase the transition. drm/savage relies on the global mutex, and there may be more, so be cautious.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Hellström (VMware) thomas_os@shipmail.org Acked-by: Thomas Hellström (VMware) thomas_os@shipmail.org --- drivers/gpu/drm/drm_file.c | 36 ++++++++++++++++++++++++++++++++- drivers/gpu/drm/i915/i915_drv.c | 2 +- include/drm/drm_file.h | 1 + 3 files changed, 37 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 92d16724f949..e25306c49cc6 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -220,7 +220,7 @@ void drm_file_free(struct drm_file *file) DRM_DEBUG("pid = %d, device = 0x%lx, open_count = %d\n", task_pid_nr(current), (long)old_encode_dev(file->minor->kdev->devt), - dev->open_count); + READ_ONCE(dev->open_count));
if (drm_core_check_feature(dev, DRIVER_LEGACY) && dev->driver->preclose) @@ -455,6 +455,40 @@ int drm_release(struct inode *inode, struct file *filp) } EXPORT_SYMBOL(drm_release);
+/** + * drm_release_noglobal - release method for DRM file + * @inode: device inode + * @filp: file pointer. + * + * This function may be used by drivers as their &file_operations.release + * method. It frees any resources associated with the open file prior to taking + * the drm_global_mutex, which then calls the &drm_driver.postclose driver + * callback. If this is the last open file for the DRM device also proceeds to + * call the &drm_driver.lastclose driver callback. + * + * RETURNS: + * + * Always succeeds and returns 0. + */ +int drm_release_noglobal(struct inode *inode, struct file *filp) +{ + struct drm_file *file_priv = filp->private_data; + struct drm_minor *minor = file_priv->minor; + struct drm_device *dev = minor->dev; + + drm_close_helper(filp); + + mutex_lock(&drm_global_mutex); + if (!--dev->open_count) + drm_lastclose(dev); + mutex_unlock(&drm_global_mutex); + + drm_minor_release(minor); + + return 0; +} +EXPORT_SYMBOL(drm_release_noglobal); + /** * drm_read - read method for DRM file * @filp: file pointer diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index e9b42e962032..5a5846d892f4 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -2673,7 +2673,7 @@ const struct dev_pm_ops i915_pm_ops = { static const struct file_operations i915_driver_fops = { .owner = THIS_MODULE, .open = drm_open, - .release = drm_release, + .release = drm_release_noglobal, .unlocked_ioctl = drm_ioctl, .mmap = i915_gem_mmap, .poll = drm_poll, diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h index 8b099b347817..19df8028a6c4 100644 --- a/include/drm/drm_file.h +++ b/include/drm/drm_file.h @@ -374,6 +374,7 @@ int drm_open(struct inode *inode, struct file *filp); ssize_t drm_read(struct file *filp, char __user *buffer, size_t count, loff_t *offset); int drm_release(struct inode *inode, struct file *filp); +int drm_release_noglobal(struct inode *inode, struct file *filp); __poll_t drm_poll(struct file *filp, struct poll_table_struct *wait); int drm_event_reserve_init_locked(struct drm_device *dev, struct drm_file *file_priv,
Since drm_global_mutex is a true global mutex across devices, we don't want to acquire it unless absolutely necessary. For maintaining the device local open_count, we can use atomic operations on the counter itself, except when making the transition to/from 0. Here, we tackle the easy portion of delaying acquiring the drm_global_mutex for the final release by using atomic_dec_and_mutex_lock(), leaving the global serialisation across the device opens.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Hellström (VMware) thomas_os@shipmail.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- drivers/gpu/drm/drm_file.c | 14 ++++++-------- drivers/gpu/drm/i915/i915_switcheroo.c | 2 +- drivers/gpu/drm/nouveau/nouveau_vga.c | 2 +- drivers/gpu/drm/radeon/radeon_device.c | 2 +- include/drm/drm_device.h | 2 +- 6 files changed, 11 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 53d882000101..c3c0356dfa61 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -1136,7 +1136,7 @@ static bool amdgpu_switcheroo_can_switch(struct pci_dev *pdev) * locking inversion with the driver load path. And the access here is * completely racy anyway. So don't bother with locking for now. */ - return dev->open_count == 0; + return atomic_read(&dev->open_count) == 0; }
static const struct vga_switcheroo_client_ops amdgpu_switcheroo_ops = { diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index e25306c49cc6..c4e9ef86c589 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -220,7 +220,7 @@ void drm_file_free(struct drm_file *file) DRM_DEBUG("pid = %d, device = 0x%lx, open_count = %d\n", task_pid_nr(current), (long)old_encode_dev(file->minor->kdev->devt), - READ_ONCE(dev->open_count)); + atomic_read(&dev->open_count));
if (drm_core_check_feature(dev, DRIVER_LEGACY) && dev->driver->preclose) @@ -379,7 +379,7 @@ int drm_open(struct inode *inode, struct file *filp) return PTR_ERR(minor);
dev = minor->dev; - if (!dev->open_count++) + if (!atomic_fetch_inc(&dev->open_count)) need_setup = 1;
/* share address_space across all char-devs of a single device */ @@ -398,7 +398,7 @@ int drm_open(struct inode *inode, struct file *filp) return 0;
err_undo: - dev->open_count--; + atomic_dec(&dev->open_count); drm_minor_release(minor); return retcode; } @@ -440,11 +440,11 @@ int drm_release(struct inode *inode, struct file *filp)
mutex_lock(&drm_global_mutex);
- DRM_DEBUG("open_count = %d\n", dev->open_count); + DRM_DEBUG("open_count = %d\n", atomic_read(&dev->open_count));
drm_close_helper(filp);
- if (!--dev->open_count) + if (atomic_dec_and_test(&dev->open_count)) drm_lastclose(dev);
mutex_unlock(&drm_global_mutex); @@ -478,10 +478,8 @@ int drm_release_noglobal(struct inode *inode, struct file *filp)
drm_close_helper(filp);
- mutex_lock(&drm_global_mutex); - if (!--dev->open_count) + if (atomic_dec_and_mutex_lock(&dev->open_count, &drm_global_mutex)) drm_lastclose(dev); - mutex_unlock(&drm_global_mutex);
drm_minor_release(minor);
diff --git a/drivers/gpu/drm/i915/i915_switcheroo.c b/drivers/gpu/drm/i915/i915_switcheroo.c index 39c79e1c5b52..ed69b5d4a375 100644 --- a/drivers/gpu/drm/i915/i915_switcheroo.c +++ b/drivers/gpu/drm/i915/i915_switcheroo.c @@ -43,7 +43,7 @@ static bool i915_switcheroo_can_switch(struct pci_dev *pdev) * locking inversion with the driver load path. And the access here is * completely racy anyway. So don't bother with locking for now. */ - return i915 && i915->drm.open_count == 0; + return i915 && atomic_read(&i915->drm.open_count) == 0; }
static const struct vga_switcheroo_client_ops i915_switcheroo_ops = { diff --git a/drivers/gpu/drm/nouveau/nouveau_vga.c b/drivers/gpu/drm/nouveau/nouveau_vga.c index d865d8aeac3c..c85dd8afa3c3 100644 --- a/drivers/gpu/drm/nouveau/nouveau_vga.c +++ b/drivers/gpu/drm/nouveau/nouveau_vga.c @@ -72,7 +72,7 @@ nouveau_switcheroo_can_switch(struct pci_dev *pdev) * locking inversion with the driver load path. And the access here is * completely racy anyway. So don't bother with locking for now. */ - return dev->open_count == 0; + return atomic_read(&dev->open_count) == 0; }
static const struct vga_switcheroo_client_ops diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index a522e092038b..266e3cbbd09b 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -1263,7 +1263,7 @@ static bool radeon_switcheroo_can_switch(struct pci_dev *pdev) * locking inversion with the driver load path. And the access here is * completely racy anyway. So don't bother with locking for now. */ - return dev->open_count == 0; + return atomic_read(&dev->open_count) == 0; }
static const struct vga_switcheroo_client_ops radeon_switcheroo_ops = { diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h index 1acfc3bbd3fb..bb60a949f416 100644 --- a/include/drm/drm_device.h +++ b/include/drm/drm_device.h @@ -144,7 +144,7 @@ struct drm_device { * Usage counter for outstanding files open, * protected by drm_global_mutex */ - int open_count; + atomic_t open_count;
/** @filelist_mutex: Protects @filelist. */ struct mutex filelist_mutex;
Since drm_global_mutex is a true global mutex across devices, we don't want to acquire it unless absolutely necessary. For maintaining the device local open_count, we can use atomic operations on the counter itself, except when making the transition to/from 0. Here, we tackle the easy portion of delaying acquiring the drm_global_mutex for the final release by using atomic_dec_and_mutex_lock(), leaving the global serialisation across the device opens.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Thomas Hellström (VMware) thomas_os@shipmail.org --- atomic_dec_and_mutex_lock needs pairing with mutex_unlock (you fool) --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- drivers/gpu/drm/drm_file.c | 16 ++++++++-------- drivers/gpu/drm/i915/i915_switcheroo.c | 2 +- drivers/gpu/drm/nouveau/nouveau_vga.c | 2 +- drivers/gpu/drm/radeon/radeon_device.c | 2 +- include/drm/drm_device.h | 2 +- 6 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 53d882000101..c3c0356dfa61 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -1136,7 +1136,7 @@ static bool amdgpu_switcheroo_can_switch(struct pci_dev *pdev) * locking inversion with the driver load path. And the access here is * completely racy anyway. So don't bother with locking for now. */ - return dev->open_count == 0; + return atomic_read(&dev->open_count) == 0; }
static const struct vga_switcheroo_client_ops amdgpu_switcheroo_ops = { diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index e25306c49cc6..1075b3a8b5b1 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -220,7 +220,7 @@ void drm_file_free(struct drm_file *file) DRM_DEBUG("pid = %d, device = 0x%lx, open_count = %d\n", task_pid_nr(current), (long)old_encode_dev(file->minor->kdev->devt), - READ_ONCE(dev->open_count)); + atomic_read(&dev->open_count));
if (drm_core_check_feature(dev, DRIVER_LEGACY) && dev->driver->preclose) @@ -379,7 +379,7 @@ int drm_open(struct inode *inode, struct file *filp) return PTR_ERR(minor);
dev = minor->dev; - if (!dev->open_count++) + if (!atomic_fetch_inc(&dev->open_count)) need_setup = 1;
/* share address_space across all char-devs of a single device */ @@ -398,7 +398,7 @@ int drm_open(struct inode *inode, struct file *filp) return 0;
err_undo: - dev->open_count--; + atomic_dec(&dev->open_count); drm_minor_release(minor); return retcode; } @@ -440,11 +440,11 @@ int drm_release(struct inode *inode, struct file *filp)
mutex_lock(&drm_global_mutex);
- DRM_DEBUG("open_count = %d\n", dev->open_count); + DRM_DEBUG("open_count = %d\n", atomic_read(&dev->open_count));
drm_close_helper(filp);
- if (!--dev->open_count) + if (atomic_dec_and_test(&dev->open_count)) drm_lastclose(dev);
mutex_unlock(&drm_global_mutex); @@ -478,10 +478,10 @@ int drm_release_noglobal(struct inode *inode, struct file *filp)
drm_close_helper(filp);
- mutex_lock(&drm_global_mutex); - if (!--dev->open_count) + if (atomic_dec_and_mutex_lock(&dev->open_count, &drm_global_mutex)) { drm_lastclose(dev); - mutex_unlock(&drm_global_mutex); + mutex_unlock(&drm_global_mutex); + }
drm_minor_release(minor);
diff --git a/drivers/gpu/drm/i915/i915_switcheroo.c b/drivers/gpu/drm/i915/i915_switcheroo.c index 39c79e1c5b52..ed69b5d4a375 100644 --- a/drivers/gpu/drm/i915/i915_switcheroo.c +++ b/drivers/gpu/drm/i915/i915_switcheroo.c @@ -43,7 +43,7 @@ static bool i915_switcheroo_can_switch(struct pci_dev *pdev) * locking inversion with the driver load path. And the access here is * completely racy anyway. So don't bother with locking for now. */ - return i915 && i915->drm.open_count == 0; + return i915 && atomic_read(&i915->drm.open_count) == 0; }
static const struct vga_switcheroo_client_ops i915_switcheroo_ops = { diff --git a/drivers/gpu/drm/nouveau/nouveau_vga.c b/drivers/gpu/drm/nouveau/nouveau_vga.c index d865d8aeac3c..c85dd8afa3c3 100644 --- a/drivers/gpu/drm/nouveau/nouveau_vga.c +++ b/drivers/gpu/drm/nouveau/nouveau_vga.c @@ -72,7 +72,7 @@ nouveau_switcheroo_can_switch(struct pci_dev *pdev) * locking inversion with the driver load path. And the access here is * completely racy anyway. So don't bother with locking for now. */ - return dev->open_count == 0; + return atomic_read(&dev->open_count) == 0; }
static const struct vga_switcheroo_client_ops diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index a522e092038b..266e3cbbd09b 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -1263,7 +1263,7 @@ static bool radeon_switcheroo_can_switch(struct pci_dev *pdev) * locking inversion with the driver load path. And the access here is * completely racy anyway. So don't bother with locking for now. */ - return dev->open_count == 0; + return atomic_read(&dev->open_count) == 0; }
static const struct vga_switcheroo_client_ops radeon_switcheroo_ops = { diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h index 1acfc3bbd3fb..bb60a949f416 100644 --- a/include/drm/drm_device.h +++ b/include/drm/drm_device.h @@ -144,7 +144,7 @@ struct drm_device { * Usage counter for outstanding files open, * protected by drm_global_mutex */ - int open_count; + atomic_t open_count;
/** @filelist_mutex: Protects @filelist. */ struct mutex filelist_mutex;
On 1/24/20 2:01 PM, Chris Wilson wrote:
For the series:
Reviewed-by: Thomas Hellström thellstrom@vmware.com
Now the only remaining (though pre-existing) problem I can see is that there is no corresponding mutex lock in drm_open() so that firstopen might race with lastclose.. Or I might be missing something..
/Thomas
Quoting Thomas Hellström (VMware) (2020-01-24 13:37:47)
Now being opt-in, it is fairly limited in scope and will not randomly break others (touch wood) and the close() racing in BAT didn't throw anything up, so pushed to drm-misc-next. Thanks for the review and suggestions,
Next task is to suggest others might like to use it as well. -Chris
On Fri, Jan 24, 2020 at 06:39:26PM +0000, Chris Wilson wrote:
Yeah this version looks reasonable compared to the previous few (I'm catching up on dri-devel). I've looked at getting rid of the global_mutex, and all I have is a simple patch with a pile of notes. It's real nasty.
This one here is a neat trick that I missed, and I'm semi-convinced it's safe :-)
Next task is to suggest others might like to use it as well.
My idea for the opt-in was to look at whether ->load/->unload exists. And ofc not bother with any of this for DRIVER_LEGACY. So maybe next step would be to define a
drm_can_noglobal() { return !DRIVER_LEGACY && !->load && !->unload; }
and inline the close helper again and see what breaks? At least from what I've looked trying to duplicate paths and opt-in is going to be real tough on the open side of things. Best I've done thus far is minor pushing of the critical section. -Daniel
On Fri, Jan 24, 2020 at 12:56:26PM +0000, Chris Wilson wrote:
btw my rough idea for lastclose is that we're just going to make it racy, and then use the master lock and drm_client infrastructure to handle fights between fbcon and a restarted compositor. That's already how that's handled everywhere else. We might need to cut over drivers to the generic fbcon stuff (or at least steal parts of the drm_client stuff I think), but not sure. -Daniel
dri-devel@lists.freedesktop.org