Hello,
This series contains patches suggested by Daniel Vetter to fix a use-after-free error in the fb_release() function, due a fb_info associated with a fbdev being freed too early while a user-space process still has the fbdev dev node opened.
That is caused by a wrong management of the struct fb_info lifetime in drivers, but the fbdev core can also be made more resilient about it an leak
This can easily be reproduced with the simplefb driver doing the following:
$ cat < /dev/fb0 & $ echo simple-framebuffer.0 > /sys/bus/platform/drivers/simple-framebuffer/unbind $ kill %1
[ 257.490471] ------------[ cut here ]------------ ... [ 257.495125] refcount_t: underflow; use-after-free. [ 257.495222] WARNING: CPU: 0 PID: 975 at lib/refcount.c:28 refcount_warn_saturate+0xf4/0x144 ... [ 257.637482] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 257.644441] pc : refcount_warn_saturate+0xf4/0x144 [ 257.649226] lr : refcount_warn_saturate+0xf4/0x144 [ 257.654009] sp : ffff80000a06bbf0 [ 257.657315] x29: ffff80000a06bbf0 x28: 000000000000000a x27: 000000000000000a [ 257.664448] x26: 0000000000000000 x25: ffff470b88c6a180 x24: 000000000000000a [ 257.671581] x23: ffff470b81706480 x22: ffff470b808c2160 x21: ffff470b8922ba20 [ 257.678713] x20: ffff470b891f5810 x19: ffff470b891f5800 x18: ffffffffffffffff [ 257.685846] x17: 3a725f7463656a62 x16: ffffbb18c6465fd4 x15: 0720072007200720 [ 257.692978] x14: 0720072d072d072d x13: 0a2e656572662d72 x12: 657466612d657375 [ 257.700110] x11: 203b776f6c667265 x10: 646e75203a745f74 x9 : ffffbb18c58f6c90 [ 257.707242] x8 : 75203b776f6c6672 x7 : 65646e75203a745f x6 : 0000000000000001 [ 257.714373] x5 : ffff470bff8ec418 x4 : 0000000000000000 x3 : 0000000000000027 [ 257.721506] x2 : 0000000000000000 x1 : 0000000000000027 x0 : 0000000000000026 [ 257.728638] Call trace: [ 257.731075] refcount_warn_saturate+0xf4/0x144 [ 257.735513] put_fb_info+0x70/0x7c [ 257.738916] fb_release+0x60/0x74 [ 257.742225] __fput+0x88/0x240 [ 257.745276] ____fput+0x1c/0x30 [ 257.748410] task_work_run+0xc4/0x21c [ 257.752066] do_exit+0x170/0x370 [ 257.755288] do_group_exit+0x40/0xb4 [ 257.758858] get_signal+0x8e0/0x90c [ 257.762339] do_signal+0x1a0/0x280 [ 257.765733] do_notify_resume+0xc8/0x390 [ 257.769650] el0_da+0xe8/0xf0 [ 257.772613] el0t_64_sync_handler+0xe8/0x130 [ 257.776877] el0t_64_sync+0x190/0x194 [ 257.780534] ---[ end trace 0000000000000000 ]---
Patch #1 adds a WARN_ON() to framebuffer_release() to prevent the use-after-free to happen.
Patch #2, #3 and #4 fix the simplefb, efifb and vesafb drivers respectively, to free the resources at the correct place.
Changes in v3: - Add Fixes: tag (Daniel Vetter). - Include in commit message a note that drivers are still broken but at least reverts to the previous behavior (Daniel Vetter). - Only move framebuffer_release() and don't do any other change (Daniel Vetter).
Changes in v2: - Also do the change for vesafb (Thomas Zimmermann).
Daniel Vetter (1): fbdev: Prevent possible use-after-free in fb_release()
Javier Martinez Canillas (3): fbdev: simplefb: Cleanup fb_info in .fb_destroy rather than .remove fbdev: efifb: Cleanup fb_info in .fb_destroy rather than .remove fbdev: vesafb: Cleanup fb_info in .fb_destroy rather than .remove
drivers/video/fbdev/core/fbsysfs.c | 4 ++++ drivers/video/fbdev/efifb.c | 9 ++++++++- drivers/video/fbdev/simplefb.c | 8 +++++++- drivers/video/fbdev/vesafb.c | 8 +++++++- 4 files changed, 26 insertions(+), 3 deletions(-)
From: Daniel Vetter daniel.vetter@ffwll.ch
Most fbdev drivers have issues with the fb_info lifetime, because call to framebuffer_release() from their driver's .remove callback, rather than doing from fbops.fb_destroy callback.
Doing that will destroy the fb_info too early, while references to it may still exist, leading to a use-after-free error.
To prevent this, check the fb_info reference counter when attempting to kfree the data structure in framebuffer_release(). That will leak it but at least will prevent the mentioned error.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Signed-off-by: Javier Martinez Canillas javierm@redhat.com Reviewed-by: Thomas Zimmermann tzimmermann@suse.de ---
(no changes since v1)
drivers/video/fbdev/core/fbsysfs.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/video/fbdev/core/fbsysfs.c b/drivers/video/fbdev/core/fbsysfs.c index 8c1ee9ecec3d..c2a60b187467 100644 --- a/drivers/video/fbdev/core/fbsysfs.c +++ b/drivers/video/fbdev/core/fbsysfs.c @@ -80,6 +80,10 @@ void framebuffer_release(struct fb_info *info) { if (!info) return; + + if (WARN_ON(refcount_read(&info->count))) + return; + kfree(info->apertures); kfree(info); }
On 06.05.2022 00:04, Javier Martinez Canillas wrote:
From: Daniel Vetter daniel.vetter@ffwll.ch
Most fbdev drivers have issues with the fb_info lifetime, because call to framebuffer_release() from their driver's .remove callback, rather than doing from fbops.fb_destroy callback.
Doing that will destroy the fb_info too early, while references to it may still exist, leading to a use-after-free error.
To prevent this, check the fb_info reference counter when attempting to kfree the data structure in framebuffer_release(). That will leak it but at least will prevent the mentioned error.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Signed-off-by: Javier Martinez Canillas javierm@redhat.com Reviewed-by: Thomas Zimmermann tzimmermann@suse.de
(no changes since v1)
drivers/video/fbdev/core/fbsysfs.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/video/fbdev/core/fbsysfs.c b/drivers/video/fbdev/core/fbsysfs.c index 8c1ee9ecec3d..c2a60b187467 100644 --- a/drivers/video/fbdev/core/fbsysfs.c +++ b/drivers/video/fbdev/core/fbsysfs.c @@ -80,6 +80,10 @@ void framebuffer_release(struct fb_info *info) { if (!info) return;
- if (WARN_ON(refcount_read(&info->count)))
return;
Regarding drm: What about drm_fb_helper_fini? It calls also framebuffer_release and is called often from _remove paths (checked intel/radeon/nouveau). I guess it should be fixed as well. Do you plan to fix it?
Regarding fb drivers, just for stats: git grep -p framebuffer_release | grep _remove | wc -l Suggests there is at least 70 incorrect users of this :)
Regards Andrzej
Hello Andrzej,
On 5/9/22 16:56, Andrzej Hajda wrote:
On 06.05.2022 00:04, Javier Martinez Canillas wrote:
From: Daniel Vetter daniel.vetter@ffwll.ch
Most fbdev drivers have issues with the fb_info lifetime, because call to framebuffer_release() from their driver's .remove callback, rather than doing from fbops.fb_destroy callback.
Doing that will destroy the fb_info too early, while references to it may still exist, leading to a use-after-free error.
To prevent this, check the fb_info reference counter when attempting to kfree the data structure in framebuffer_release(). That will leak it but at least will prevent the mentioned error.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Signed-off-by: Javier Martinez Canillas javierm@redhat.com Reviewed-by: Thomas Zimmermann tzimmermann@suse.de
(no changes since v1)
drivers/video/fbdev/core/fbsysfs.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/video/fbdev/core/fbsysfs.c b/drivers/video/fbdev/core/fbsysfs.c index 8c1ee9ecec3d..c2a60b187467 100644 --- a/drivers/video/fbdev/core/fbsysfs.c +++ b/drivers/video/fbdev/core/fbsysfs.c @@ -80,6 +80,10 @@ void framebuffer_release(struct fb_info *info) { if (!info) return;
- if (WARN_ON(refcount_read(&info->count)))
return;
Regarding drm: What about drm_fb_helper_fini? It calls also framebuffer_release and is called often from _remove paths (checked intel/radeon/nouveau). I guess it should be fixed as well. Do you plan to fix it?
I think you are correct. Maybe we need something like the following?
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index d265a73313c9..b09598f7af28 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) if (info) { if (info->cmap.len) fb_dealloc_cmap(&info->cmap); - framebuffer_release(info); } fb_helper->fbdev = NULL;
@@ -2112,6 +2111,7 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper) static void drm_fbdev_fb_destroy(struct fb_info *info) { drm_fbdev_release(info->par); + framebuffer_release(info); }
static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
Regarding fb drivers, just for stats: git grep -p framebuffer_release | grep _remove | wc -l Suggests there is at least 70 incorrect users of this :)
Yes, Daniel already mentioned that most of them get this wrong but I was mostly interested in {simple,efi,vesa}fb since are used with "nomodeset".
But given that I only touched those tree and still managed to introduce a regression, I won't attempt to fix the others.
On 09.05.2022 17:30, Javier Martinez Canillas wrote:
Hello Andrzej,
On 5/9/22 16:56, Andrzej Hajda wrote:
On 06.05.2022 00:04, Javier Martinez Canillas wrote:
From: Daniel Vetter daniel.vetter@ffwll.ch
Most fbdev drivers have issues with the fb_info lifetime, because call to framebuffer_release() from their driver's .remove callback, rather than doing from fbops.fb_destroy callback.
Doing that will destroy the fb_info too early, while references to it may still exist, leading to a use-after-free error.
To prevent this, check the fb_info reference counter when attempting to kfree the data structure in framebuffer_release(). That will leak it but at least will prevent the mentioned error.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Signed-off-by: Javier Martinez Canillas javierm@redhat.com Reviewed-by: Thomas Zimmermann tzimmermann@suse.de
(no changes since v1)
drivers/video/fbdev/core/fbsysfs.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/video/fbdev/core/fbsysfs.c b/drivers/video/fbdev/core/fbsysfs.c index 8c1ee9ecec3d..c2a60b187467 100644 --- a/drivers/video/fbdev/core/fbsysfs.c +++ b/drivers/video/fbdev/core/fbsysfs.c @@ -80,6 +80,10 @@ void framebuffer_release(struct fb_info *info) { if (!info) return;
- if (WARN_ON(refcount_read(&info->count)))
return;
Regarding drm: What about drm_fb_helper_fini? It calls also framebuffer_release and is called often from _remove paths (checked intel/radeon/nouveau). I guess it should be fixed as well. Do you plan to fix it?
I think you are correct. Maybe we need something like the following?
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index d265a73313c9..b09598f7af28 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) if (info) { if (info->cmap.len) fb_dealloc_cmap(&info->cmap);
framebuffer_release(info);
What about cmap? I am not an fb expert, but IMO cmap can be accessed from userspace as well.
Regards Andrzej
} fb_helper->fbdev = NULL;
@@ -2112,6 +2111,7 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper) static void drm_fbdev_fb_destroy(struct fb_info *info) { drm_fbdev_release(info->par);
framebuffer_release(info);
}
static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
Regarding fb drivers, just for stats: git grep -p framebuffer_release | grep _remove | wc -l Suggests there is at least 70 incorrect users of this :)
Yes, Daniel already mentioned that most of them get this wrong but I was mostly interested in {simple,efi,vesa}fb since are used with "nomodeset".
But given that I only touched those tree and still managed to introduce a regression, I won't attempt to fix the others.
On 5/9/22 17:51, Andrzej Hajda wrote:
[snip]
Regarding drm: What about drm_fb_helper_fini? It calls also framebuffer_release and is called often from _remove paths (checked intel/radeon/nouveau). I guess it should be fixed as well. Do you plan to fix it?
I think you are correct. Maybe we need something like the following?
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index d265a73313c9..b09598f7af28 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) if (info) { if (info->cmap.len) fb_dealloc_cmap(&info->cmap);
framebuffer_release(info);
What about cmap? I am not an fb expert, but IMO cmap can be accessed from userspace as well.
I actually thought about the same but then remembered what Daniel said in [0] (AFAIU at least) that these should be done in .remove() so the current code looks like matches that and only framebuffer_release() should be moved.
For vesafb a previous patch proposed to also move a release_region() call to .fb_destroy() and Daniel also said that it was iffy and shouldn't be done [1].
But I'm also not fb expert so happy to move fb_dealloc_cmap() as well if that is the correct thing to do.
[0]: https://www.spinics.net/lists/dri-devel/msg346257.html [1]: https://www.spinics.net/lists/dri-devel/msg346261.html
Hi Javier
Am 09.05.22 um 18:33 schrieb Javier Martinez Canillas:
On 5/9/22 17:51, Andrzej Hajda wrote:
[snip]
Regarding drm: What about drm_fb_helper_fini? It calls also framebuffer_release and is called often from _remove paths (checked intel/radeon/nouveau). I guess it should be fixed as well. Do you plan to fix it?
I think you are correct. Maybe we need something like the following?
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index d265a73313c9..b09598f7af28 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) if (info) { if (info->cmap.len) fb_dealloc_cmap(&info->cmap);
framebuffer_release(info);
What about cmap? I am not an fb expert, but IMO cmap can be accessed from userspace as well.
I actually thought about the same but then remembered what Daniel said in [0] (AFAIU at least) that these should be done in .remove() so the current code looks like matches that and only framebuffer_release() should be moved.
For vesafb a previous patch proposed to also move a release_region() call to .fb_destroy() and Daniel also said that it was iffy and shouldn't be done [1].
But I'm also not fb expert so happy to move fb_dealloc_cmap() as well if that is the correct thing to do.
The cmap data structure is software state that can be accessed via icotl as long as the devfile is open. Drivers update the hardware from it. See [1]. Moving that cleanup into fb_destroy seems appropriate to me.
Best regards Thomas
[1] https://elixir.bootlin.com/linux/v5.17.6/source/drivers/video/fbdev/core/fbc...
On 5/9/22 20:12, Thomas Zimmermann wrote:
[snip]
I actually thought about the same but then remembered what Daniel said in [0] (AFAIU at least) that these should be done in .remove() so the current code looks like matches that and only framebuffer_release() should be moved.
For vesafb a previous patch proposed to also move a release_region() call to .fb_destroy() and Daniel also said that it was iffy and shouldn't be done [1].
But I'm also not fb expert so happy to move fb_dealloc_cmap() as well if that is the correct thing to do.
The cmap data structure is software state that can be accessed via icotl as long as the devfile is open. Drivers update the hardware from it. See [1]. Moving that cleanup into fb_destroy seems appropriate to me.
I see, that makes sense. Then something like the following instead?
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index d265a73313c9..ce0d89c49e42 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -627,12 +627,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) cancel_work_sync(&fb_helper->resume_work); cancel_work_sync(&fb_helper->damage_work);
- info = fb_helper->fbdev; - if (info) { - if (info->cmap.len) - fb_dealloc_cmap(&info->cmap); - framebuffer_release(info); - } fb_helper->fbdev = NULL;
mutex_lock(&kernel_fb_helper_lock); @@ -2111,7 +2105,11 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper) */ static void drm_fbdev_fb_destroy(struct fb_info *info) { + if (info->cmap.len) + fb_dealloc_cmap(&info->cmap); + drm_fbdev_release(info->par); + framebuffer_release(info); }
static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
On 09.05.2022 22:03, Javier Martinez Canillas wrote:
On 5/9/22 20:12, Thomas Zimmermann wrote:
[snip]
I actually thought about the same but then remembered what Daniel said in [0] (AFAIU at least) that these should be done in .remove() so the current code looks like matches that and only framebuffer_release() should be moved.
For vesafb a previous patch proposed to also move a release_region() call to .fb_destroy() and Daniel also said that it was iffy and shouldn't be done [1].
But I'm also not fb expert so happy to move fb_dealloc_cmap() as well if that is the correct thing to do.
The cmap data structure is software state that can be accessed via icotl as long as the devfile is open. Drivers update the hardware from it. See [1]. Moving that cleanup into fb_destroy seems appropriate to me.
I see, that makes sense. Then something like the following instead?
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index d265a73313c9..ce0d89c49e42 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -627,12 +627,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) cancel_work_sync(&fb_helper->resume_work); cancel_work_sync(&fb_helper->damage_work);
info = fb_helper->fbdev;
if (info) {
if (info->cmap.len)
fb_dealloc_cmap(&info->cmap);
framebuffer_release(info);
} fb_helper->fbdev = NULL; mutex_lock(&kernel_fb_helper_lock);
@@ -2111,7 +2105,11 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper) */ static void drm_fbdev_fb_destroy(struct fb_info *info) {
if (info->cmap.len)
fb_dealloc_cmap(&info->cmap);
drm_fbdev_release(info->par);
framebuffer_release(info);
I would put drm_fbdev_release at the beginning - it cancels workers which could expect cmap to be still valid.
Regards Andrzej
}
static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
On 5/10/22 00:22, Andrzej Hajda wrote:
[snip]
static void drm_fbdev_fb_destroy(struct fb_info *info) {
if (info->cmap.len)
fb_dealloc_cmap(&info->cmap);
drm_fbdev_release(info->par);
framebuffer_release(info);
I would put drm_fbdev_release at the beginning - it cancels workers which could expect cmap to be still valid.
Indeed, you are correct again. [0] is the final version of the patch I've but don't have an i915 test machine to give it a try. I'll test tomorrow on my test systems to verify that it doesn't cause any regressions since with other DRM drivers.
I think that besides this patch, drivers shouldn't need to call to the drm_fb_helper_fini() function directly. Since that would be called during drm_fbdev_fb_destroy() anyways.
We should probably remove that call in all drivers and make this helper function static and just private to drm_fb_helper functions.
Or am I missing something here ?
[0]:
From 5170cafcf2936da8f1c53231e3baa7d7a2b16c61 Mon Sep 17 00:00:00 2001
From: Javier Martinez Canillas javierm@redhat.com Date: Tue May 10 00:39:55 2022 +0200 Subject: [RFT PATCH] drm/fb-helper: Don't deallocate fb colormap and free fb info too early
Currently these are done in drm_fb_helper_fini() but this helper is called by drivers in their .remove callback, which could lead to a use-after-free if a process has opened the emulated fbdev node while a driver is removed.
For example, in i915 driver the call chain during remove is the following:
struct pci_driver i915_pci_driver = { ... .remove = i915_pci_remove, ... };
i915_pci_remove i915_driver_remove intel_modeset_driver_remove_noirq intel_fbdev_fini intel_fbdev_destroy drm_fb_helper_fini framebuffer_release
Later the process will close the fbdev node file descriptor leading to the mentioned use-after-free bug in drm_fbdev_fb_destroy(), due the following:
drm_fbdev_fb_destroy drm_fbdev_release(info->par); <-- info was already freed on .remove
To prevent that, let's move the framebuffer_release() call to the end of the drm_fbdev_fb_destroy() function.
Also, the call to fb_dealloc_cmap() in drm_fb_helper_fini() is too early and is more correct to do it in drm_fbdev_fb_destroy() as well. After a call to drm_fbdev_release() has been made.
Reported-by: Andrzej Hajda andrzej.hajda@intel.com Signed-off-by: Javier Martinez Canillas javierm@redhat.com --- drivers/gpu/drm/drm_fb_helper.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index d265a73313c9..7288fbd26bcc 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -627,12 +627,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) cancel_work_sync(&fb_helper->resume_work); cancel_work_sync(&fb_helper->damage_work);
- info = fb_helper->fbdev; - if (info) { - if (info->cmap.len) - fb_dealloc_cmap(&info->cmap); - framebuffer_release(info); - } fb_helper->fbdev = NULL;
mutex_lock(&kernel_fb_helper_lock); @@ -2112,6 +2106,9 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper) static void drm_fbdev_fb_destroy(struct fb_info *info) { drm_fbdev_release(info->par); + if (info->cmap.len) + fb_dealloc_cmap(&info->cmap); + framebuffer_release(info); }
static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
On 10.05.2022 00:42, Javier Martinez Canillas wrote:
On 5/10/22 00:22, Andrzej Hajda wrote:
[snip]
static void drm_fbdev_fb_destroy(struct fb_info *info) {
if (info->cmap.len)
fb_dealloc_cmap(&info->cmap);
drm_fbdev_release(info->par);
framebuffer_release(info);
I would put drm_fbdev_release at the beginning - it cancels workers which could expect cmap to be still valid.
Indeed, you are correct again. [0] is the final version of the patch I've but don't have an i915 test machine to give it a try. I'll test tomorrow on my test systems to verify that it doesn't cause any regressions since with other DRM drivers.
I think that besides this patch, drivers shouldn't need to call to the drm_fb_helper_fini() function directly. Since that would be called during drm_fbdev_fb_destroy() anyways.
We should probably remove that call in all drivers and make this helper function static and just private to drm_fb_helper functions.
Or am I missing something here ?
This is question for experts :) I do not know what are user API/ABI expectations regarding removal of fbdev driver, I wonder if they are documented somewhere :) Apparently we have some process of 'zombification' here - we need to remove the driver without waiting for userspace closing framebuffer(???) (to unbind ops-es and remove references to driver related things), but we need to leave some structures to fool userspace, 'info' seems to be one of them. So I guess there should be something called on driver's _remove path, and sth on destroy path.
Regards Andrzej
[0]: From 5170cafcf2936da8f1c53231e3baa7d7a2b16c61 Mon Sep 17 00:00:00 2001 From: Javier Martinez Canillas javierm@redhat.com Date: Tue May 10 00:39:55 2022 +0200 Subject: [RFT PATCH] drm/fb-helper: Don't deallocate fb colormap and free fb info too early
Currently these are done in drm_fb_helper_fini() but this helper is called by drivers in their .remove callback, which could lead to a use-after-free if a process has opened the emulated fbdev node while a driver is removed.
For example, in i915 driver the call chain during remove is the following:
struct pci_driver i915_pci_driver = { ... .remove = i915_pci_remove, ... };
i915_pci_remove i915_driver_remove intel_modeset_driver_remove_noirq intel_fbdev_fini intel_fbdev_destroy drm_fb_helper_fini framebuffer_release
Later the process will close the fbdev node file descriptor leading to the mentioned use-after-free bug in drm_fbdev_fb_destroy(), due the following:
drm_fbdev_fb_destroy drm_fbdev_release(info->par); <-- info was already freed on .remove
To prevent that, let's move the framebuffer_release() call to the end of the drm_fbdev_fb_destroy() function.
Also, the call to fb_dealloc_cmap() in drm_fb_helper_fini() is too early and is more correct to do it in drm_fbdev_fb_destroy() as well. After a call to drm_fbdev_release() has been made.
Reported-by: Andrzej Hajda andrzej.hajda@intel.com Signed-off-by: Javier Martinez Canillas javierm@redhat.com
drivers/gpu/drm/drm_fb_helper.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index d265a73313c9..7288fbd26bcc 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -627,12 +627,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) cancel_work_sync(&fb_helper->resume_work); cancel_work_sync(&fb_helper->damage_work);
info = fb_helper->fbdev;
if (info) {
if (info->cmap.len)
fb_dealloc_cmap(&info->cmap);
framebuffer_release(info);
} fb_helper->fbdev = NULL;
mutex_lock(&kernel_fb_helper_lock);
@@ -2112,6 +2106,9 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper) static void drm_fbdev_fb_destroy(struct fb_info *info) { drm_fbdev_release(info->par);
if (info->cmap.len)
fb_dealloc_cmap(&info->cmap);
framebuffer_release(info); }
static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
On 5/10/22 09:19, Andrzej Hajda wrote:
On 10.05.2022 00:42, Javier Martinez Canillas wrote:
On 5/10/22 00:22, Andrzej Hajda wrote:
[snip]
static void drm_fbdev_fb_destroy(struct fb_info *info) {
if (info->cmap.len)
fb_dealloc_cmap(&info->cmap);
drm_fbdev_release(info->par);
framebuffer_release(info);
I would put drm_fbdev_release at the beginning - it cancels workers which could expect cmap to be still valid.
Indeed, you are correct again. [0] is the final version of the patch I've but don't have an i915 test machine to give it a try. I'll test tomorrow on my test systems to verify that it doesn't cause any regressions since with other DRM drivers.
I think that besides this patch, drivers shouldn't need to call to the drm_fb_helper_fini() function directly. Since that would be called during drm_fbdev_fb_destroy() anyways.
We should probably remove that call in all drivers and make this helper function static and just private to drm_fb_helper functions.
Or am I missing something here ?
This is question for experts :)
Fair. I'm definitely not one of them :)
I do not know what are user API/ABI expectations regarding removal of fbdev driver, I wonder if they are documented somewhere :)
I don't know. At least I haven't found them.
Apparently we have some process of 'zombification' here - we need to remove the driver without waiting for userspace closing framebuffer(???) (to unbind ops-es and remove references to driver related things), but we need to leave some structures to fool userspace, 'info' seems to be one of them.
That's correct, yes. I think that any driver that provides a .mmap file operation would have the same issue. But drivers keep an internal state and just return -ENODEV or whatever on read/write/close after a removal.
The fbdev subsystem is different though since as you said it, the fbdev core unconditionally calls to the driver .fb_release() callback with a struct fb_info reference as argument.
I tried to prevent that with commit aafa025c76dc ("fbdev: Make fb_release() return -ENODEV if fbdev was unregistered") but Daniel pointed out that is was wrong since could leak memory allocated and was expected to be freed on release.
That's why I instead fixed the issue in the fbdev drivers and just added a warn on fb_release(), that is $SUBJECT.
So I guess there should be something called on driver's _remove path, and sth on destroy path.
That was my question actually, do we need something to be called in the destroy path ? Since that could just be internal to the DRM fb helpers.
In other words, drivers should only care about setting a generic fbdev by calling drm_fbdev_generic_setup(), and then do any HW cleanup in the removal path, but let the fb helpers to handle the SW cleanup in destroy.
On Tue, May 10, 2022 at 09:50:38AM +0200, Javier Martinez Canillas wrote:
On 5/10/22 09:19, Andrzej Hajda wrote:
On 10.05.2022 00:42, Javier Martinez Canillas wrote:
On 5/10/22 00:22, Andrzej Hajda wrote:
[snip]
static void drm_fbdev_fb_destroy(struct fb_info *info) {
if (info->cmap.len)
fb_dealloc_cmap(&info->cmap);
drm_fbdev_release(info->par);
framebuffer_release(info);
I would put drm_fbdev_release at the beginning - it cancels workers which could expect cmap to be still valid.
Indeed, you are correct again. [0] is the final version of the patch I've but don't have an i915 test machine to give it a try. I'll test tomorrow on my test systems to verify that it doesn't cause any regressions since with other DRM drivers.
I think that besides this patch, drivers shouldn't need to call to the drm_fb_helper_fini() function directly. Since that would be called during drm_fbdev_fb_destroy() anyways.
We should probably remove that call in all drivers and make this helper function static and just private to drm_fb_helper functions.
Or am I missing something here ?
This is question for experts :)
Fair. I'm definitely not one of them :)
I do not know what are user API/ABI expectations regarding removal of fbdev driver, I wonder if they are documented somewhere :)
I don't know. At least I haven't found them.
Apparently we have some process of 'zombification' here - we need to remove the driver without waiting for userspace closing framebuffer(???) (to unbind ops-es and remove references to driver related things), but we need to leave some structures to fool userspace, 'info' seems to be one of them.
That's correct, yes. I think that any driver that provides a .mmap file operation would have the same issue. But drivers keep an internal state and just return -ENODEV or whatever on read/write/close after a removal.
Just commenting on the mmap part here. I think there's two options:
- shadow buffer for fbdev defio, and keep the shadow buffer around until fb_destroy
- redirect fbdev mmap fully to gem mmap, and make sure the gem mmap is hotunplug safe. The approach amd folks are pushing for that we discussed is to replace them all with a dummy r/w page, because removing the ptes means you can get a SIGBUS almost anywhere in application code, and that violates like all the assumptions behind gl/vk and would just crash your desktop. Reading/writing garbage otoh is generally much better.
So yeah hotunplug safe fbdev mmap is still quite a bit of work ...
Cheers, Daniel
The fbdev subsystem is different though since as you said it, the fbdev core unconditionally calls to the driver .fb_release() callback with a struct fb_info reference as argument.
I tried to prevent that with commit aafa025c76dc ("fbdev: Make fb_release() return -ENODEV if fbdev was unregistered") but Daniel pointed out that is was wrong since could leak memory allocated and was expected to be freed on release.
That's why I instead fixed the issue in the fbdev drivers and just added a warn on fb_release(), that is $SUBJECT.
So I guess there should be something called on driver's _remove path, and sth on destroy path.
That was my question actually, do we need something to be called in the destroy path ? Since that could just be internal to the DRM fb helpers.
In other words, drivers should only care about setting a generic fbdev by calling drm_fbdev_generic_setup(), and then do any HW cleanup in the removal path, but let the fb helpers to handle the SW cleanup in destroy.
-- Best regards,
Javier Martinez Canillas Linux Engineering Red Hat
Hi
Am 10.05.22 um 00:42 schrieb Javier Martinez Canillas:
On 5/10/22 00:22, Andrzej Hajda wrote:
[snip]
static void drm_fbdev_fb_destroy(struct fb_info *info) {
if (info->cmap.len)
fb_dealloc_cmap(&info->cmap);
drm_fbdev_release(info->par);
framebuffer_release(info);
I would put drm_fbdev_release at the beginning - it cancels workers which could expect cmap to be still valid.
Indeed, you are correct again. [0] is the final version of the patch I've but don't have an i915 test machine to give it a try. I'll test tomorrow on my test systems to verify that it doesn't cause any regressions since with other DRM drivers.
You have to go through all DRM drivers that call drm_fb_helper_fini() and make sure that they free fb_info. For example armada appears to be leaking now. [1]
Best regards Thomas
[1] https://elixir.bootlin.com/linux/v5.17.6/source/drivers/gpu/drm/armada/armad...
I think that besides this patch, drivers shouldn't need to call to the drm_fb_helper_fini() function directly. Since that would be called during drm_fbdev_fb_destroy() anyways.
We should probably remove that call in all drivers and make this helper function static and just private to drm_fb_helper functions.
Or am I missing something here ?
[0]: From 5170cafcf2936da8f1c53231e3baa7d7a2b16c61 Mon Sep 17 00:00:00 2001 From: Javier Martinez Canillas javierm@redhat.com Date: Tue May 10 00:39:55 2022 +0200 Subject: [RFT PATCH] drm/fb-helper: Don't deallocate fb colormap and free fb info too early
Currently these are done in drm_fb_helper_fini() but this helper is called by drivers in their .remove callback, which could lead to a use-after-free if a process has opened the emulated fbdev node while a driver is removed.
For example, in i915 driver the call chain during remove is the following:
struct pci_driver i915_pci_driver = { ... .remove = i915_pci_remove, ... };
i915_pci_remove i915_driver_remove intel_modeset_driver_remove_noirq intel_fbdev_fini intel_fbdev_destroy drm_fb_helper_fini framebuffer_release
Later the process will close the fbdev node file descriptor leading to the mentioned use-after-free bug in drm_fbdev_fb_destroy(), due the following:
drm_fbdev_fb_destroy drm_fbdev_release(info->par); <-- info was already freed on .remove
To prevent that, let's move the framebuffer_release() call to the end of the drm_fbdev_fb_destroy() function.
Also, the call to fb_dealloc_cmap() in drm_fb_helper_fini() is too early and is more correct to do it in drm_fbdev_fb_destroy() as well. After a call to drm_fbdev_release() has been made.
Reported-by: Andrzej Hajda andrzej.hajda@intel.com Signed-off-by: Javier Martinez Canillas javierm@redhat.com
drivers/gpu/drm/drm_fb_helper.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index d265a73313c9..7288fbd26bcc 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -627,12 +627,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) cancel_work_sync(&fb_helper->resume_work); cancel_work_sync(&fb_helper->damage_work);
info = fb_helper->fbdev;
if (info) {
if (info->cmap.len)
fb_dealloc_cmap(&info->cmap);
framebuffer_release(info);
} fb_helper->fbdev = NULL;
mutex_lock(&kernel_fb_helper_lock);
@@ -2112,6 +2106,9 @@ static void drm_fbdev_release(struct drm_fb_helper *fb_helper) static void drm_fbdev_fb_destroy(struct fb_info *info) { drm_fbdev_release(info->par);
if (info->cmap.len)
fb_dealloc_cmap(&info->cmap);
framebuffer_release(info); }
static int drm_fbdev_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
Hello Thomas,
On 5/10/22 10:04, Thomas Zimmermann wrote:
Hi
Am 10.05.22 um 00:42 schrieb Javier Martinez Canillas:
On 5/10/22 00:22, Andrzej Hajda wrote:
[snip]
static void drm_fbdev_fb_destroy(struct fb_info *info) {
if (info->cmap.len)
fb_dealloc_cmap(&info->cmap);
drm_fbdev_release(info->par);
framebuffer_release(info);
I would put drm_fbdev_release at the beginning - it cancels workers which could expect cmap to be still valid.
Indeed, you are correct again. [0] is the final version of the patch I've but don't have an i915 test machine to give it a try. I'll test tomorrow on my test systems to verify that it doesn't cause any regressions since with other DRM drivers.
You have to go through all DRM drivers that call drm_fb_helper_fini() and make sure that they free fb_info. For example armada appears to be leaking now. [1]
But shouldn't fb_info be freed when unregister_framebuffer() is called through drm_dev_unregister() ? AFAICT the call chain is the following:
drm_put_dev() drm_dev_unregister() drm_client_dev_unregister() drm_fbdev_client_unregister() drm_fb_helper_unregister_fbi() unregister_framebuffer() do_unregister_framebuffer() put_fb_info() drm_fbdev_fb_destroy() framebuffer_release()
which is the reason why I believe that drm_fb_helper_fini() should be an internal static function and only called from drm_fbdev_fb_destroy().
Drivers shouldn't really explicitly call this helper in my opinion.
Hi
Am 10.05.22 um 10:30 schrieb Javier Martinez Canillas:
Hello Thomas,
On 5/10/22 10:04, Thomas Zimmermann wrote:
Hi
Am 10.05.22 um 00:42 schrieb Javier Martinez Canillas:
On 5/10/22 00:22, Andrzej Hajda wrote:
[snip]
static void drm_fbdev_fb_destroy(struct fb_info *info) {
if (info->cmap.len)
fb_dealloc_cmap(&info->cmap);
drm_fbdev_release(info->par);
framebuffer_release(info);
I would put drm_fbdev_release at the beginning - it cancels workers which could expect cmap to be still valid.
Indeed, you are correct again. [0] is the final version of the patch I've but don't have an i915 test machine to give it a try. I'll test tomorrow on my test systems to verify that it doesn't cause any regressions since with other DRM drivers.
You have to go through all DRM drivers that call drm_fb_helper_fini() and make sure that they free fb_info. For example armada appears to be leaking now. [1]
But shouldn't fb_info be freed when unregister_framebuffer() is called through drm_dev_unregister() ? AFAICT the call chain is the following:
drm_put_dev() drm_dev_unregister() drm_client_dev_unregister() drm_fbdev_client_unregister() drm_fb_helper_unregister_fbi() unregister_framebuffer() do_unregister_framebuffer() put_fb_info() drm_fbdev_fb_destroy() framebuffer_release()
which is the reason why I believe that drm_fb_helper_fini() should be an internal static function and only called from drm_fbdev_fb_destroy().
Drivers shouldn't really explicitly call this helper in my opinion.
Thanks. That makes sense.
Best regards Thomas
Hi
Am 10.05.22 um 10:37 schrieb Thomas Zimmermann: ...
You have to go through all DRM drivers that call drm_fb_helper_fini() and make sure that they free fb_info. For example armada appears to be leaking now. [1]
But shouldn't fb_info be freed when unregister_framebuffer() is called through drm_dev_unregister() ? AFAICT the call chain is the following:
drm_put_dev() drm_dev_unregister() drm_client_dev_unregister() drm_fbdev_client_unregister() drm_fb_helper_unregister_fbi() unregister_framebuffer() do_unregister_framebuffer() put_fb_info() drm_fbdev_fb_destroy() framebuffer_release()
which is the reason why I believe that drm_fb_helper_fini() should be an internal static function and only called from drm_fbdev_fb_destroy().
Drivers shouldn't really explicitly call this helper in my opinion.
One more stupid question: does armada actually use drm_fbdev_fb_destroy()? It's supposed to be a callback for struct fb_ops. Armada uses it's own instance of fb_ops, which apparently doesn't contain fb_destroy. [1]
Best regards Thomas
[1] https://elixir.bootlin.com/linux/v5.17.6/source/drivers/gpu/drm/armada/armad...
Thanks. That makes sense.
Best regards Thomas
Hello Thomas,
On 5/10/22 10:50, Thomas Zimmermann wrote:
[snip]
Drivers shouldn't really explicitly call this helper in my opinion.
One more stupid question: does armada actually use drm_fbdev_fb_destroy()? It's supposed to be a callback for struct fb_ops. Armada uses it's own instance of fb_ops, which apparently doesn't contain fb_destroy. [1]
No stupid question at all. You are correct on this. So I guess we still need this call in the drivers that don't provide a .fb_destroy() handler.
I see many options here:
1) Document in drm_fb_helper_alloc_fbi() that drivers only need to call drm_fb_helper_fini() explicitly if they are not setting up a fbdev with drm_fbdev_generic_setup(), otherwise is not needed.
2) Make drm_fbdev_fb_destroy() an exported symbol so drivers that have custom fb_ops can use it.
3) Set .fb_destroy to drm_fbdev_fb_destroy() if isn't set by drivers when they call drm_fb_helper_initial_config() or drm_fb_helper_fill_info().
I'm leaning towards option (3). Then the fb_info release will be automatic whether drivers are using the generic setup or a custom one.
Hi Javier
Am 10.05.22 um 11:06 schrieb Javier Martinez Canillas:
Hello Thomas,
On 5/10/22 10:50, Thomas Zimmermann wrote:
[snip]
Drivers shouldn't really explicitly call this helper in my opinion.
One more stupid question: does armada actually use drm_fbdev_fb_destroy()? It's supposed to be a callback for struct fb_ops. Armada uses it's own instance of fb_ops, which apparently doesn't contain fb_destroy. [1]
No stupid question at all. You are correct on this. So I guess we still need this call in the drivers that don't provide a .fb_destroy() handler.
I see many options here:
Document in drm_fb_helper_alloc_fbi() that drivers only need to call drm_fb_helper_fini() explicitly if they are not setting up a fbdev with drm_fbdev_generic_setup(), otherwise is not needed.
Make drm_fbdev_fb_destroy() an exported symbol so drivers that have custom fb_ops can use it.
Set .fb_destroy to drm_fbdev_fb_destroy() if isn't set by drivers when they call drm_fb_helper_initial_config() or drm_fb_helper_fill_info().
I'm leaning towards option (3). Then the fb_info release will be automatic whether drivers are using the generic setup or a custom one.
IMHO this would just be another glitch to paper over all the broken code. And if you follow through drm_fbdev_fb_helper(), [1] it'll call _fini at some point and probably blow up in some other way. Instances of struct fb_ops are also usually const.
The only reliable way AFAICT is to do what generic fbdev does: use unregister_framebuffer and do the software cleanup somewhere within fb_destroy. And then fix all drivers to use that pattern.
Best regards Thomas
[1] https://elixir.bootlin.com/linux/v5.17.6/source/drivers/gpu/drm/drm_fb_helpe...
On 5/10/22 11:39, Thomas Zimmermann wrote:
[snip]
- Set .fb_destroy to drm_fbdev_fb_destroy() if isn't set by drivers when they call drm_fb_helper_initial_config() or drm_fb_helper_fill_info().
I'm leaning towards option (3). Then the fb_info release will be automatic whether drivers are using the generic setup or a custom one.
IMHO this would just be another glitch to paper over all the broken code. And if you follow through drm_fbdev_fb_helper(), [1] it'll call _fini at some point and probably blow up in some other way. Instances of struct fb_ops are also usually const.
The only reliable way AFAICT is to do what generic fbdev does: use unregister_framebuffer and do the software cleanup somewhere within fb_destroy. And then fix all drivers to use that pattern.
Right. We can't really abstract this away from drivers that are not using the generic fbdev helpers. So then they will have to provide their own .fb_destroy() callback and do the cleanup.
Hi
Am 09.05.22 um 18:33 schrieb Javier Martinez Canillas:
On 5/9/22 17:51, Andrzej Hajda wrote:
[snip]
Regarding drm: What about drm_fb_helper_fini? It calls also framebuffer_release and is called often from _remove paths (checked intel/radeon/nouveau). I guess it should be fixed as well. Do you plan to fix it?
I think you are correct. Maybe we need something like the following?
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index d265a73313c9..b09598f7af28 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) if (info) { if (info->cmap.len) fb_dealloc_cmap(&info->cmap);
framebuffer_release(info);
After reviewing that code, drm_fb_helper_fini() appears to be called from .fb_destroy (see drm_fbdev_release). The code is hard to follow though. If there another way of releasing the framebuffer here?
Best regards Thomas
What about cmap? I am not an fb expert, but IMO cmap can be accessed from userspace as well.
I actually thought about the same but then remembered what Daniel said in [0] (AFAIU at least) that these should be done in .remove() so the current code looks like matches that and only framebuffer_release() should be moved.
For vesafb a previous patch proposed to also move a release_region() call to .fb_destroy() and Daniel also said that it was iffy and shouldn't be done [1].
But I'm also not fb expert so happy to move fb_dealloc_cmap() as well if that is the correct thing to do.
Hello Thomas,
On 5/9/22 20:32, Thomas Zimmermann wrote:
Hi
Am 09.05.22 um 18:33 schrieb Javier Martinez Canillas:
On 5/9/22 17:51, Andrzej Hajda wrote:
[snip]
Regarding drm: What about drm_fb_helper_fini? It calls also framebuffer_release and is called often from _remove paths (checked intel/radeon/nouveau). I guess it should be fixed as well. Do you plan to fix it?
I think you are correct. Maybe we need something like the following?
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index d265a73313c9..b09598f7af28 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) if (info) { if (info->cmap.len) fb_dealloc_cmap(&info->cmap);
framebuffer_release(info);
After reviewing that code, drm_fb_helper_fini() appears to be called from .fb_destroy (see drm_fbdev_release). The code is hard to follow though. If there another way of releasing the framebuffer here?
Andrzej mentioned intel/radeon/nouveau as example, I only looked at i915 and the call chain is the following as far as I can tell:
struct pci_driver i915_pci_driver = { ... .remove = i915_pci_remove, ... };
i915_driver_remove intel_modeset_driver_remove_noirq intel_fbdev_fini intel_fbdev_destroy drm_fb_helper_fini framebuffer_release
So my underdestanding is that if a program has the emulated fbdev device opened and the i915 module is removed, then a use-after-free would be triggered on drm_fbdev_fb_destroy() once the program closes the device:
drm_fbdev_fb_destroy drm_fbdev_release(info->par); <-- info was already freed on .remove
On Mon, May 09, 2022 at 10:00:41PM +0200, Javier Martinez Canillas wrote:
Hello Thomas,
On 5/9/22 20:32, Thomas Zimmermann wrote:
Hi
Am 09.05.22 um 18:33 schrieb Javier Martinez Canillas:
On 5/9/22 17:51, Andrzej Hajda wrote:
[snip]
> + Regarding drm: What about drm_fb_helper_fini? It calls also framebuffer_release and is called often from _remove paths (checked intel/radeon/nouveau). I guess it should be fixed as well. Do you plan to fix it?
I think you are correct. Maybe we need something like the following?
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index d265a73313c9..b09598f7af28 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -631,7 +631,6 @@ void drm_fb_helper_fini(struct drm_fb_helper *fb_helper) if (info) { if (info->cmap.len) fb_dealloc_cmap(&info->cmap);
framebuffer_release(info);
After reviewing that code, drm_fb_helper_fini() appears to be called from .fb_destroy (see drm_fbdev_release). The code is hard to follow though. If there another way of releasing the framebuffer here?
Andrzej mentioned intel/radeon/nouveau as example, I only looked at i915 and the call chain is the following as far as I can tell:
struct pci_driver i915_pci_driver = { ... .remove = i915_pci_remove, ... };
i915_driver_remove intel_modeset_driver_remove_noirq intel_fbdev_fini intel_fbdev_destroy drm_fb_helper_fini framebuffer_release
So my underdestanding is that if a program has the emulated fbdev device opened and the i915 module is removed, then a use-after-free would be triggered on drm_fbdev_fb_destroy() once the program closes the device:
drm_fbdev_fb_destroy drm_fbdev_release(info->par); <-- info was already freed on .remove
Yeah the old drivers that haven't switched over to the drm_client based fbdev emulations are all kinds of wrong and release it too early.
Note that they don't use the provided ->remove hook, since that would result in double-cleanup like you point out. Those old drivers work more like all the other fbdev drivers where all the cleanup is done in ->remove, and if it's a real hotunplug you just die in fire :-/
Switching them all over to the drm_client based fbdev helpers and unexporting these old (dangerous!) functions would be really neat. But it's also a loooooot of work, and generally those big drivers don't get hotunplugged. -Daniel
The driver is calling framebuffer_release() in its .remove callback, but this will cause the struct fb_info to be freed too early. Since it could be that a reference is still hold to it if user-space opened the fbdev.
This would lead to a use-after-free error if the framebuffer device was unregistered but later a user-space process tries to close the fbdev fd.
To prevent this, move the framebuffer_release() call to fb_ops.fb_destroy instead of doing it in the driver's .remove callback.
Strictly speaking, the code flow in the driver is still wrong because all the hardware cleanupd (i.e: iounmap) should be done in .remove while the software cleanup (i.e: releasing the framebuffer) should be done in the .fb_destroy handler. But this at least makes to match the behavior before commit 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal").
Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal") Suggested-by: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Javier Martinez Canillas javierm@redhat.com Reviewed-by: Thomas Zimmermann tzimmermann@suse.de Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch ---
(no changes since v1)
drivers/video/fbdev/simplefb.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/simplefb.c b/drivers/video/fbdev/simplefb.c index 94fc9c6d0411..2c198561c338 100644 --- a/drivers/video/fbdev/simplefb.c +++ b/drivers/video/fbdev/simplefb.c @@ -84,6 +84,10 @@ struct simplefb_par { static void simplefb_clocks_destroy(struct simplefb_par *par); static void simplefb_regulators_destroy(struct simplefb_par *par);
+/* + * fb_ops.fb_destroy is called by the last put_fb_info() call at the end + * of unregister_framebuffer() or fb_release(). Do any cleanup here. + */ static void simplefb_destroy(struct fb_info *info) { struct simplefb_par *par = info->par; @@ -94,6 +98,8 @@ static void simplefb_destroy(struct fb_info *info) if (info->screen_base) iounmap(info->screen_base);
+ framebuffer_release(info); + if (mem) release_mem_region(mem->start, resource_size(mem)); } @@ -545,8 +551,8 @@ static int simplefb_remove(struct platform_device *pdev) { struct fb_info *info = platform_get_drvdata(pdev);
+ /* simplefb_destroy takes care of info cleanup */ unregister_framebuffer(info); - framebuffer_release(info);
return 0; }
The driver is calling framebuffer_release() in its .remove callback, but this will cause the struct fb_info to be freed too early. Since it could be that a reference is still hold to it if user-space opened the fbdev.
This would lead to a use-after-free error if the framebuffer device was unregistered but later a user-space process tries to close the fbdev fd.
To prevent this, move the framebuffer_release() call to fb_ops.fb_destroy instead of doing it in the driver's .remove callback.
Strictly speaking, the code flow in the driver is still wrong because all the hardware cleanupd (i.e: iounmap) should be done in .remove while the software cleanup (i.e: releasing the framebuffer) should be done in the .fb_destroy handler. But this at least makes to match the behavior before commit 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal").
Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal") Suggested-by: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Javier Martinez Canillas javierm@redhat.com Reviewed-by: Thomas Zimmermann tzimmermann@suse.de Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch ---
(no changes since v1)
drivers/video/fbdev/efifb.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/efifb.c b/drivers/video/fbdev/efifb.c index ea42ba6445b2..cfa3dc0b4eee 100644 --- a/drivers/video/fbdev/efifb.c +++ b/drivers/video/fbdev/efifb.c @@ -243,6 +243,10 @@ static void efifb_show_boot_graphics(struct fb_info *info) static inline void efifb_show_boot_graphics(struct fb_info *info) {} #endif
+/* + * fb_ops.fb_destroy is called by the last put_fb_info() call at the end + * of unregister_framebuffer() or fb_release(). Do any cleanup here. + */ static void efifb_destroy(struct fb_info *info) { if (efifb_pci_dev) @@ -254,6 +258,9 @@ static void efifb_destroy(struct fb_info *info) else memunmap(info->screen_base); } + + framebuffer_release(info); + if (request_mem_succeeded) release_mem_region(info->apertures->ranges[0].base, info->apertures->ranges[0].size); @@ -620,9 +627,9 @@ static int efifb_remove(struct platform_device *pdev) { struct fb_info *info = platform_get_drvdata(pdev);
+ /* efifb_destroy takes care of info cleanup */ unregister_framebuffer(info); sysfs_remove_groups(&pdev->dev.kobj, efifb_groups); - framebuffer_release(info);
return 0; }
On 06.05.2022 00:05, Javier Martinez Canillas wrote:
The driver is calling framebuffer_release() in its .remove callback, but this will cause the struct fb_info to be freed too early. Since it could be that a reference is still hold to it if user-space opened the fbdev.
This would lead to a use-after-free error if the framebuffer device was unregistered but later a user-space process tries to close the fbdev fd.
To prevent this, move the framebuffer_release() call to fb_ops.fb_destroy instead of doing it in the driver's .remove callback.
Strictly speaking, the code flow in the driver is still wrong because all the hardware cleanupd (i.e: iounmap) should be done in .remove while the software cleanup (i.e: releasing the framebuffer) should be done in the .fb_destroy handler. But this at least makes to match the behavior before commit 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal").
Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal") Suggested-by: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Javier Martinez Canillas javierm@redhat.com Reviewed-by: Thomas Zimmermann tzimmermann@suse.de Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
(no changes since v1)
drivers/video/fbdev/efifb.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/efifb.c b/drivers/video/fbdev/efifb.c index ea42ba6445b2..cfa3dc0b4eee 100644 --- a/drivers/video/fbdev/efifb.c +++ b/drivers/video/fbdev/efifb.c @@ -243,6 +243,10 @@ static void efifb_show_boot_graphics(struct fb_info *info) static inline void efifb_show_boot_graphics(struct fb_info *info) {} #endif
+/*
- fb_ops.fb_destroy is called by the last put_fb_info() call at the end
- of unregister_framebuffer() or fb_release(). Do any cleanup here.
- */ static void efifb_destroy(struct fb_info *info) { if (efifb_pci_dev)
@@ -254,6 +258,9 @@ static void efifb_destroy(struct fb_info *info) else memunmap(info->screen_base); }
- framebuffer_release(info);
- if (request_mem_succeeded) release_mem_region(info->apertures->ranges[0].base, info->apertures->ranges[0].size);
You are releasing info, then you are using it.
I suspect it is responsible for multiple failures of Intel CI [1].
[1]: http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_11615/fi-adl-ddr5/boot0.txt
Regards Andrzej
@@ -620,9 +627,9 @@ static int efifb_remove(struct platform_device *pdev) { struct fb_info *info = platform_get_drvdata(pdev);
- /* efifb_destroy takes care of info cleanup */ unregister_framebuffer(info); sysfs_remove_groups(&pdev->dev.kobj, efifb_groups);
framebuffer_release(info);
return 0; }
Hello Andrzej,
On 5/6/22 15:07, Andrzej Hajda wrote:
On 06.05.2022 00:05, Javier Martinez Canillas wrote:
[snip]
- framebuffer_release(info);
- if (request_mem_succeeded) release_mem_region(info->apertures->ranges[0].base, info->apertures->ranges[0].size);
You are releasing info, then you are using it.
I suspect it is responsible for multiple failures of Intel CI [1].
Yes, it is :( sorry about the mess. Ville already reported this to me. I'll post a patch in a minute.
I wonder how this didn't happen before since .remove() happens before .fb_destroy() AFAIU...
Greeting,
FYI, we noticed the following commit (built with gcc-11):
commit: c6a2b1a99968f3fd74b6b16250d871e5b239e5a8 ("[PATCH v3 3/4] fbdev: efifb: Cleanup fb_info in .fb_destroy rather than .remove") url: https://github.com/intel-lab-lkp/linux/commits/Javier-Martinez-Canillas/fbde... base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 107c948d1d3e61d10aee9d0f7c3d81bbee9842af patch link: https://lore.kernel.org/lkml/20220505220540.366218-1-javierm@redhat.com
in testcase: igt version: igt-x86_64-eddc67c5-1_20220430 with following parameters:
group: group-07 ucode: 0xec
on test machine: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz with 32G memory
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
If you fix the issue, kindly add following tag Reported-by: kernel test robot oliver.sang@intel.com
[ 25.615590][ T318] BUG: KASAN: use-after-free in efifb_destroy (drivers/video/fbdev/efifb.c:265) [ 25.622534][ T318] Read of size 8 at addr ffff888143978358 by task systemd-udevd/318 [ 25.630335][ T318] [ 25.632531][ T318] CPU: 8 PID: 318 Comm: systemd-udevd Not tainted 5.18.0-rc5-00019-gc6a2b1a99968 #1 [ 25.641707][ T318] Hardware name: Dell Inc. OptiPlex 7060/0C96W1, BIOS 1.4.2 06/11/2019 [ 25.641712][ T318] Call Trace: [ 25.641714][ T318] <TASK> [ 25.641715][ T318] ? efifb_destroy (drivers/video/fbdev/efifb.c:265) [ 25.641722][ T318] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) [ 25.641726][ T318] print_address_description+0x1f/0x200 [ 25.641732][ T318] print_report.cold (mm/kasan/report.c:430) [ 25.641736][ T318] ? _raw_spin_lock_irqsave (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162) [ 25.687074][ T318] kasan_report (mm/kasan/report.c:162 mm/kasan/report.c:493) [ 25.691341][ T318] ? efifb_destroy (drivers/video/fbdev/efifb.c:265) [ 25.695954][ T318] efifb_destroy (drivers/video/fbdev/efifb.c:265) [ 25.700394][ T318] ? put_fb_info (arch/x86/include/asm/atomic.h:190 include/linux/atomic/atomic-instrumented.h:177 include/linux/refcount.h:272 include/linux/refcount.h:315 include/linux/refcount.h:333 drivers/video/fbdev/core/fbmem.c:79) [ 25.704662][ T318] efifb_remove (drivers/video/fbdev/efifb.c:632) [ 25.714635][ T318] device_release_driver_internal (drivers/base/dd.c:1202 drivers/base/dd.c:1223) 1;39mLSB: Load k[ 25.720538][ T318] ? klist_put (include/linux/kref.h:66 lib/klist.c:206 lib/klist.c:217) [ 25.726157][ T318] bus_remove_device (drivers/base/bus.c:530) [ 25.730943][ T318] device_del (drivers/base/core.c:3593) ernel image with[ 25.735119][ T318] ? __device_link_del (drivers/base/core.c:3548) [ 25.741431][ T318] ? _printk (kernel/printk/printk.c:2288) [ 25.745350][ T318] ? record_print_text.cold (kernel/printk/printk.c:2288) [ 25.751663][ T318] platform_device_del (drivers/base/platform.c:760) [ 25.757138][ T318] platform_device_unregister (drivers/base/platform.c:547 drivers/base/platform.c:790) [ 25.762523][ T318] do_remove_conflicting_framebuffers (drivers/video/fbdev/core/fbmem.c:1591) [ 25.768768][ T318] remove_conflicting_framebuffers (drivers/video/fbdev/core/fbmem.c:1780) [ 25.774666][ T318] remove_conflicting_pci_framebuffers (drivers/video/fbdev/core/fbmem.c:1879) [ 25.780995][ T318] ? __mutex_unlock_slowpath+0x2c0/0x2c0 [ 25.787501][ T318] ? memcpy (mm/kasan/shadow.c:65 (discriminator 1)) [ 25.791331][ T318] drm_aperture_remove_conflicting_pci_framebuffers (drivers/gpu/drm/drm_aperture.c:349) drm [ 25.799343][ T318] i915_driver_hw_probe (drivers/gpu/drm/i915/i915_driver.c:588) i915 [ 25.805153][ T318] i915_driver_probe (drivers/gpu/drm/i915/i915_driver.c:854) i915 [ 25.810660][ T318] ? __mutex_unlock_slowpath+0x2c0/0x2c0 [ 25.817167][ T318] ? i915_print_iommu_status (drivers/gpu/drm/i915/i915_driver.c:824) i915 [ 25.823380][ T318] ? drm_privacy_screen_get (drivers/gpu/drm/drm_privacy_screen.c:167) drm [ 25.829320][ T318] i915_pci_probe (drivers/gpu/drm/i915/i915_pci.c:1191) i915 [ 25.834593][ T318] ? i915_pci_remove (drivers/gpu/drm/i915/i915_pci.c:1191) i915 [ 25.839932][ T318] ? _raw_read_unlock_irqrestore (kernel/locking/spinlock.c:161) [ 25.845575][ T318] ? __cond_resched (kernel/sched/core.c:8177) [ 25.850098][ T318] ? i915_pci_remove (drivers/gpu/drm/i915/i915_pci.c:1191) i915 [ 25.855450][ T318] local_pci_probe (drivers/pci/pci-driver.c:323) [ 25.859973][ T318] pci_call_probe (drivers/pci/pci-driver.c:391) [ 25.864496][ T318] ? _raw_spin_lock (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154) [ 25.869102][ T318] ? pci_pm_suspend_noirq (drivers/pci/pci-driver.c:351) [ 25.874308][ T318] ? pci_assign_irq (drivers/pci/setup-irq.c:25) [ 25.878919][ T318] ? pci_match_device (drivers/pci/pci-driver.c:107 drivers/pci/pci-driver.c:158) [ 25.883784][ T318] ? kernfs_put (arch/x86/include/asm/atomic.h:123 (discriminator 1) include/linux/atomic/atomic-instrumented.h:576 (discriminator 1) fs/kernfs/dir.c:513 (discriminator 1)) [ 25.887959][ T318] pci_device_probe (drivers/pci/pci-driver.c:460) [ 25.892575][ T318] ? pci_dma_configure (drivers/pci/pci-driver.c:1620) [ 25.897448][ T318] really_probe (drivers/base/dd.c:542 drivers/base/dd.c:621) [ 25.901803][ T318] __driver_probe_device (drivers/base/dd.c:752) [ 25.906936][ T318] driver_probe_device (drivers/base/dd.c:782) [ 25.911806][ T318] __driver_attach (drivers/base/dd.c:1142) [ 25.916420][ T318] ? __device_attach_driver (drivers/base/dd.c:1094) [ 25.921804][ T318] bus_for_each_dev (drivers/base/bus.c:301) [ 25.926500][ T318] ? subsys_dev_iter_exit (drivers/base/bus.c:290) [ 25.931541][ T318] ? klist_add_tail (include/linux/list.h:69 include/linux/list.h:102 lib/klist.c:104 lib/klist.c:137) [ 25.936234][ T318] bus_add_driver (drivers/base/bus.c:619) [ 25.940756][ T318] driver_register (drivers/base/driver.c:171) [ 25.945365][ T318] i915_init (drivers/gpu/drm/i915/i915_driver.c:1004) i915 [ 25.950136][ T318] ? 0xffffffffc0b7d000 [ 25.954138][ T318] do_one_initcall (init/main.c:1298) [ 25.958665][ T318] ? trace_event_raw_event_initcall_level (init/main.c:1289) [ 25.965258][ T318] ? __asan_register_globals (mm/kasan/generic.c:214 (discriminator 3) mm/kasan/generic.c:226 (discriminator 3)) [ 25.970556][ T318] ? kasan_unpoison (mm/kasan/shadow.c:108 mm/kasan/shadow.c:142) [ 25.975079][ T318] do_init_module (kernel/module.c:3731) [ 25.979599][ T318] __do_sys_finit_module (kernel/module.c:4222) [ 25.984720][ T318] ? __ia32_sys_init_module (kernel/module.c:4190) [ 25.989927][ T318] ? __seccomp_filter (arch/x86/include/asm/bitops.h:214 include/asm-generic/bitops/instrumented-non-atomic.h:135 kernel/seccomp.c:351 kernel/seccomp.c:378 kernel/seccomp.c:410 kernel/seccomp.c:1183) [ 25.994794][ T318] ? vm_mmap_pgoff (mm/util.c:523) [ 25.999405][ T318] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [ 26.003668][ T318] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115) [ 26.009398][ T318] RIP: 0033:0x7f280dd1b989 [ 26.013659][ T318] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d7 64 0c 00 f7 d8 64 89 01 48 All code ======== 0: 00 c3 add %al,%bl 2: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 9: 00 00 00 c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 11: 48 89 f8 mov %rdi,%rax 14: 48 89 f7 mov %rsi,%rdi 17: 48 89 d6 mov %rdx,%rsi 1a: 48 89 ca mov %rcx,%rdx 1d: 4d 89 c2 mov %r8,%r10 20: 4d 89 c8 mov %r9,%r8 23: 4c 8b 4c 24 08 mov 0x8(%rsp),%r9 28: 0f 05 syscall 2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction 30: 73 01 jae 0x33 32: c3 retq 33: 48 8b 0d d7 64 0c 00 mov 0xc64d7(%rip),%rcx # 0xc6511 3a: f7 d8 neg %eax 3c: 64 89 01 mov %eax,%fs:(%rcx) 3f: 48 rex.W
Code starting with the faulting instruction =========================================== 0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax 6: 73 01 jae 0x9 8: c3 retq 9: 48 8b 0d d7 64 0c 00 mov 0xc64d7(%rip),%rcx # 0xc64e7 10: f7 d8 neg %eax 12: 64 89 01 mov %eax,%fs:(%rcx) 15: 48 rex.W [ 26.033004][ T318] RSP: 002b:00007ffcd7609228 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 26.041236][ T318] RAX: ffffffffffffffda RBX: 0000561117222e00 RCX: 00007f280dd1b989 [ 26.049030][ T318] RDX: 0000000000000000 RSI: 00007f280dc20cad RDI: 0000000000000018 [ 26.056823][ T318] RBP: 00007f280dc20cad R08: 0000000000000000 R09: 0000000000000000 [ 26.064618][ T318] R10: 0000000000000018 R11: 0000000000000246 R12: 0000000000000000 [ 26.072414][ T318] R13: 000056111726f6d0 R14: 0000000000020000 R15: 0000561117222e00 [ 26.080214][ T318] </TASK> [ 26.083093][ T318] [ 26.085278][ T318] Allocated by task 1: [ 26.089189][ T318] kasan_save_stack (mm/kasan/common.c:39) [ 26.093708][ T318] __kasan_kmalloc (mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:515 mm/kasan/common.c:524) [ 26.098143][ T318] framebuffer_alloc (drivers/video/fbdev/core/fbsysfs.c:49) [ 26.102833][ T318] efifb_probe.cold (drivers/video/fbdev/efifb.c:461) [ 26.107611][ T318] platform_probe (drivers/base/platform.c:1416) [ 26.112048][ T318] really_probe (drivers/base/dd.c:542 drivers/base/dd.c:621) [ 26.116399][ T318] __driver_probe_device (drivers/base/dd.c:752) [ 26.121521][ T318] driver_probe_device (drivers/base/dd.c:782) [ 26.126389][ T318] __device_attach_driver (drivers/base/dd.c:900) [ 26.131606][ T318] bus_for_each_drv (drivers/base/bus.c:385 drivers/base/bus.c:426) [ 26.136302][ T318] __device_attach (drivers/base/dd.c:970) [ 26.140917][ T318] bus_probe_device (drivers/base/bus.c:489) [ 26.145617][ T318] device_add (drivers/base/core.c:3412) [ 26.149882][ T318] platform_device_add (drivers/base/platform.c:713) [ 26.154842][ T318] sysfb_init (drivers/firmware/sysfb.c:72) [ 26.158844][ T318] do_one_initcall (init/main.c:1298) [ 26.163364][ T318] do_initcalls (init/main.c:1370 init/main.c:1387) [ 26.167709][ T318] kernel_init_freeable (init/main.c:1617) [ 26.172747][ T318] kernel_init (init/main.c:1504) [ 26.176921][ T318] ret_from_fork (arch/x86/entry/entry_64.S:304) [ 26.181184][ T318] [ 26.183379][ T318] Freed by task 318: [ 26.187123][ T318] kasan_save_stack (mm/kasan/common.c:39) [ 26.191644][ T318] kasan_set_track (mm/kasan/common.c:45) [ 26.196082][ T318] kasan_set_free_info (mm/kasan/generic.c:372) [ 26.200864][ T318] __kasan_slab_free (mm/kasan/common.c:368 mm/kasan/common.c:328 mm/kasan/common.c:374) [ 26.205644][ T318] kfree (mm/slub.c:1754 mm/slub.c:3510 mm/slub.c:4552) [ 26.209302][ T318] efifb_destroy (drivers/video/fbdev/efifb.c:264) [ 26.213651][ T318] efifb_remove (drivers/video/fbdev/efifb.c:632) [ 26.217829][ T318] platform_remove (drivers/base/platform.c:1438) [ 26.222265][ T318] device_release_driver_internal (drivers/base/dd.c:1202 drivers/base/dd.c:1223) [ 26.228165][ T318] bus_remove_device (drivers/base/bus.c:530) [ 26.232943][ T318] device_del (drivers/base/core.c:3593) [ 26.237116][ T318] platform_device_del (drivers/base/platform.c:760) [ 26.242585][ T318] platform_device_unregister (drivers/base/platform.c:547 drivers/base/platform.c:790)
To reproduce:
git clone https://github.com/intel/lkp-tests.git cd lkp-tests sudo bin/lkp install job.yaml # job file is attached in this email bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run sudo bin/lkp run generated-yaml-file
# if come across any failure that blocks the test, # please remove ~/.lkp and /lkp dir to run from a clean state.
The driver is calling framebuffer_release() in its .remove callback, but this will cause the struct fb_info to be freed too early. Since it could be that a reference is still hold to it if user-space opened the fbdev.
This would lead to a use-after-free error if the framebuffer device was unregistered but later a user-space process tries to close the fbdev fd.
To prevent this, move the framebuffer_release() call to fb_ops.fb_destroy instead of doing it in the driver's .remove callback.
Strictly speaking, the code flow in the driver is still wrong because all the hardware cleanupd (i.e: iounmap) should be done in .remove while the software cleanup (i.e: releasing the framebuffer) should be done in the .fb_destroy handler. But this at least makes to match the behavior before commit 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal").
Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal") Suggested-by: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Javier Martinez Canillas javierm@redhat.com Reviewed-by: Thomas Zimmermann tzimmermann@suse.de Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch ---
Changes in v3: - Only move framebuffer_release() and don't do any other change (Daniel Vetter).
Changes in v2: - Also do the change for vesafb (Thomas Zimmermann).
drivers/video/fbdev/vesafb.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/vesafb.c b/drivers/video/fbdev/vesafb.c index df6de5a9dd4c..e25e8de5ff67 100644 --- a/drivers/video/fbdev/vesafb.c +++ b/drivers/video/fbdev/vesafb.c @@ -179,6 +179,10 @@ static int vesafb_setcolreg(unsigned regno, unsigned red, unsigned green, return err; }
+/* + * fb_ops.fb_destroy is called by the last put_fb_info() call at the end + * of unregister_framebuffer() or fb_release(). Do any cleanup here. + */ static void vesafb_destroy(struct fb_info *info) { struct vesafb_par *par = info->par; @@ -188,6 +192,8 @@ static void vesafb_destroy(struct fb_info *info) if (info->screen_base) iounmap(info->screen_base); release_mem_region(info->apertures->ranges[0].base, info->apertures->ranges[0].size); + + framebuffer_release(info); }
static struct fb_ops vesafb_ops = { @@ -484,10 +490,10 @@ static int vesafb_remove(struct platform_device *pdev) { struct fb_info *info = platform_get_drvdata(pdev);
+ /* vesafb_destroy takes care of info cleanup */ unregister_framebuffer(info); if (((struct vesafb_par *)(info->par))->region) release_region(0x3c0, 32); - framebuffer_release(info);
return 0; }
On 5/5/22 23:59, Javier Martinez Canillas wrote:
Hello,
This series contains patches suggested by Daniel Vetter to fix a use-after-free error in the fb_release() function, due a fb_info associated with a fbdev being freed too early while a user-space process still has the fbdev dev node opened.
Pushed this to drm-misc (drm-misc-fixes).
dri-devel@lists.freedesktop.org