(was: drm/vram-helper: Fix performance regression in fbdev)
Generic fbdev emulation maps and unmaps the console BO for updating its content from the shadow buffer. If this involves an actual mapping operation (instead of reusing an existing mapping), lots of debug messages may be printed, such as
  x86/PAT: Overlap at 0xd0000000-0xd1000000
  x86/PAT: reserve_memtype added [mem 0xd0000000-0xd02fffff], track write-combining, req write-combining, ret write-combining
  x86/PAT: free_memtype request [mem 0xd0000000-0xd02fffff]
as reported at [1]. Drivers using VRAM helpers may also see reduced performance, because the mapping operations add overhead.
This patch set fixes the problem by adding a reference counter to the kmap operation of GEM VRAM buffers, and by keeping the fbdev console buffer mapped while the console is being displayed. These changes avoid the frequent mappings in the fbdev code. The ast and mgag200 drivers map the console's buffer when it becomes visible, and the fbdev code reuses this mapping. The original fbdev code in ast and mgag200 used the same strategy.
[1] https://lists.freedesktop.org/archives/dri-devel/2019-September/234308.html
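To make the effect concrete, here is a small userspace model (not kernel code). It counts how many real mappings a series of console updates triggers, first without and then with the driver holding its own reference while the console is displayed. The struct and function names are hypothetical. Each real mapping is reduced to a counter increment, on the assumption that each one corresponds to one round of the x86/PAT messages quoted above.

```c
#include <assert.h>

/* Hypothetical stand-in for the console BO's mapping state. */
struct console_bo {
	unsigned int use_count; /* outstanding kmap references */
	unsigned int map_ops;   /* real mappings; each may emit x86/PAT messages */
};

static void bo_kmap(struct console_bo *bo)
{
	if (bo->use_count++ == 0)
		++bo->map_ops; /* first reference performs the real mapping */
}

static void bo_kunmap(struct console_bo *bo)
{
	assert(bo->use_count > 0);
	--bo->use_count; /* the real unmap happens when this reaches zero */
}

/* One update of the console from the shadow buffer. */
static void fbdev_update(struct console_bo *bo)
{
	bo_kmap(bo);
	/* ... copy the damaged area from the shadow buffer ... */
	bo_kunmap(bo);
}

/* Returns how many real mappings 'updates' console updates require. */
static unsigned int count_map_ops(unsigned int updates, int driver_holds_ref)
{
	struct console_bo bo = { 0, 0 };
	unsigned int i;

	if (driver_holds_ref)
		bo_kmap(&bo); /* driver maps the BO when the console is shown */
	for (i = 0; i < updates; ++i)
		fbdev_update(&bo);
	if (driver_holds_ref)
		bo_kunmap(&bo); /* ...and unmaps it when the console is hidden */
	assert(bo.use_count == 0);
	return bo.map_ops;
}
```

Without the driver's reference, every update pays for a full map/unmap cycle. With the reference, only the first kmap does.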
v2:
 * fixed comment typos
Thomas Zimmermann (3):
  drm/vram: Add kmap ref-counting to GEM VRAM objects
  drm/ast: Map fbdev framebuffer while it's being displayed
  drm/mgag200: Map fbdev framebuffer while it's being displayed
 drivers/gpu/drm/ast/ast_mode.c         | 19 +++++++
 drivers/gpu/drm/drm_gem_vram_helper.c  | 74 +++++++++++++++++++-------
 drivers/gpu/drm/mgag200/mgag200_mode.c | 20 +++++++
 include/drm/drm_gem_vram_helper.h      | 19 +++++++
 4 files changed, 114 insertions(+), 18 deletions(-)
-- 2.23.0
The kmap and kunmap operations of GEM VRAM buffers can now be called in nested pairs. The first call to drm_gem_vram_kmap() maps the buffer's memory into kernel address space, and the final call to drm_gem_vram_kunmap() unmaps the memory. Intermediate calls to these functions only increment or decrement a reference counter.
This change allows buffer memory to stay mapped for longer and reduces the number of changes to TLBs, page tables, etc.
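Below is a minimal userspace sketch of the nested kmap/kunmap pairing described above. It assumes a pthread mutex in place of the kernel mutex and a static array in place of VRAM. All names are hypothetical, and the sketch leaves out the `map` and `is_iomem` parameters that the real drm_gem_vram_kmap() takes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace model of a GEM VRAM object's kmap state. */
struct model_bo {
	pthread_mutex_t kmap_lock;   /* protects vaddr and kmap_use_count */
	unsigned int kmap_use_count; /* outstanding kmap() references */
	void *vaddr;                 /* stands in for kmap->virtual */
	unsigned int map_ops;        /* real (expensive) mappings performed */
	unsigned int unmap_ops;      /* real unmappings performed */
	char backing[4096];          /* stands in for the buffer's VRAM */
};

static void model_bo_init(struct model_bo *bo)
{
	pthread_mutex_init(&bo->kmap_lock, NULL);
	bo->kmap_use_count = 0;
	bo->vaddr = NULL;
	bo->map_ops = 0;
	bo->unmap_ops = 0;
}

/* The first caller creates the mapping; later callers only take a reference. */
static void *model_bo_kmap(struct model_bo *bo)
{
	void *vaddr;

	pthread_mutex_lock(&bo->kmap_lock);
	if (bo->kmap_use_count == 0) {
		bo->vaddr = bo->backing; /* ttm_bo_kmap() in the real helper */
		++bo->map_ops;
	}
	++bo->kmap_use_count;
	vaddr = bo->vaddr;
	pthread_mutex_unlock(&bo->kmap_lock);
	return vaddr;
}

/* Only the caller dropping the last reference removes the mapping. */
static void model_bo_kunmap(struct model_bo *bo)
{
	pthread_mutex_lock(&bo->kmap_lock);
	assert(bo->kmap_use_count > 0); /* unbalanced kunmap() */
	if (--bo->kmap_use_count == 0) {
		bo->vaddr = NULL; /* ttm_bo_kunmap() in the real helper */
		++bo->unmap_ops;
	}
	pthread_mutex_unlock(&bo->kmap_lock);
}

static void model_bo_fini(struct model_bo *bo)
{
	assert(bo->kmap_use_count == 0); /* mirrors the WARN_ON() in cleanup */
	pthread_mutex_destroy(&bo->kmap_lock);
}
```

As in the patch, the lock makes the "check count, map, increment" sequence atomic. Without it, two concurrent first callers could both map the buffer.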
Signed-off-by: Thomas Zimmermann tzimmermann@suse.de
Cc: Davidlohr Bueso dave@stgolabs.net
---
 drivers/gpu/drm/drm_gem_vram_helper.c | 74 ++++++++++++++++++++-------
 include/drm/drm_gem_vram_helper.h     | 19 +++++++
 2 files changed, 75 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c
index fd751078bae1..6c7912092876 100644
--- a/drivers/gpu/drm/drm_gem_vram_helper.c
+++ b/drivers/gpu/drm/drm_gem_vram_helper.c
@@ -26,7 +26,11 @@ static void drm_gem_vram_cleanup(struct drm_gem_vram_object *gbo)
 	 * TTM buffer object in 'bo' has already been cleaned
 	 * up; only release the GEM object.
 	 */
+
+	WARN_ON(gbo->kmap_use_count);
+
 	drm_gem_object_release(&gbo->bo.base);
+	mutex_destroy(&gbo->kmap_lock);
 }

 static void drm_gem_vram_destroy(struct drm_gem_vram_object *gbo)
@@ -100,6 +104,8 @@ static int drm_gem_vram_init(struct drm_device *dev,
 	if (ret)
 		goto err_drm_gem_object_release;

+	mutex_init(&gbo->kmap_lock);
+
 	return 0;

 err_drm_gem_object_release:
@@ -283,6 +289,34 @@ int drm_gem_vram_unpin(struct drm_gem_vram_object *gbo)
 }
 EXPORT_SYMBOL(drm_gem_vram_unpin);

+static void *drm_gem_vram_kmap_locked(struct drm_gem_vram_object *gbo,
+				      bool map, bool *is_iomem)
+{
+	int ret;
+	struct ttm_bo_kmap_obj *kmap = &gbo->kmap;
+
+	if (gbo->kmap_use_count > 0)
+		goto out;
+
+	if (kmap->virtual || !map)
+		goto out;
+
+	ret = ttm_bo_kmap(&gbo->bo, 0, gbo->bo.num_pages, kmap);
+	if (ret)
+		return ERR_PTR(ret);
+
+out:
+	if (!kmap->virtual) {
+		if (is_iomem)
+			*is_iomem = false;
+		return NULL; /* not mapped; don't increment ref */
+	}
+	++gbo->kmap_use_count;
+	if (is_iomem)
+		return ttm_kmap_obj_virtual(kmap, is_iomem);
+	return kmap->virtual;
+}
+
 /**
  * drm_gem_vram_kmap() - Maps a GEM VRAM object into kernel address space
  * @gbo: the GEM VRAM object
@@ -304,40 +338,44 @@ void *drm_gem_vram_kmap(struct drm_gem_vram_object *gbo, bool map,
 			bool *is_iomem)
 {
 	int ret;
-	struct ttm_bo_kmap_obj *kmap = &gbo->kmap;
-
-	if (kmap->virtual || !map)
-		goto out;
+	void *virtual;

-	ret = ttm_bo_kmap(&gbo->bo, 0, gbo->bo.num_pages, kmap);
+	ret = mutex_lock_interruptible(&gbo->kmap_lock);
 	if (ret)
 		return ERR_PTR(ret);
+	virtual = drm_gem_vram_kmap_locked(gbo, map, is_iomem);
+	mutex_unlock(&gbo->kmap_lock);

-out:
-	if (!is_iomem)
-		return kmap->virtual;
-	if (!kmap->virtual) {
-		*is_iomem = false;
-		return NULL;
-	}
-	return ttm_kmap_obj_virtual(kmap, is_iomem);
+	return virtual;
 }
 EXPORT_SYMBOL(drm_gem_vram_kmap);

-/**
- * drm_gem_vram_kunmap() - Unmaps a GEM VRAM object
- * @gbo: the GEM VRAM object
- */
-void drm_gem_vram_kunmap(struct drm_gem_vram_object *gbo)
+static void drm_gem_vram_kunmap_locked(struct drm_gem_vram_object *gbo)
 {
 	struct ttm_bo_kmap_obj *kmap = &gbo->kmap;

+	if (WARN_ON_ONCE(!gbo->kmap_use_count))
+		return;
+	if (--gbo->kmap_use_count > 0)
+		return;
+
 	if (!kmap->virtual)
 		return;

 	ttm_bo_kunmap(kmap);
 	kmap->virtual = NULL;
 }
+
+/**
+ * drm_gem_vram_kunmap() - Unmaps a GEM VRAM object
+ * @gbo: the GEM VRAM object
+ */
+void drm_gem_vram_kunmap(struct drm_gem_vram_object *gbo)
+{
+	mutex_lock(&gbo->kmap_lock);
+	drm_gem_vram_kunmap_locked(gbo);
+	mutex_unlock(&gbo->kmap_lock);
+}
 EXPORT_SYMBOL(drm_gem_vram_kunmap);

 /**
diff --git a/include/drm/drm_gem_vram_helper.h b/include/drm/drm_gem_vram_helper.h
index ac217d768456..8c08bc87b788 100644
--- a/include/drm/drm_gem_vram_helper.h
+++ b/include/drm/drm_gem_vram_helper.h
@@ -34,11 +34,30 @@ struct vm_area_struct;
  * backed by VRAM. It can be used for simple framebuffer devices with
  * dedicated memory. The buffer object can be evicted to system memory if
  * video memory becomes scarce.
+ *
+ * GEM VRAM objects perform reference counting for pin and mapping
+ * operations. So a buffer object that has been pinned N times with
+ * drm_gem_vram_pin() must be unpinned N times with
+ * drm_gem_vram_unpin(). The same applies to pairs of
+ * drm_gem_vram_kmap() and drm_gem_vram_kunmap().
  */
 struct drm_gem_vram_object {
 	struct ttm_buffer_object bo;
 	struct ttm_bo_kmap_obj kmap;

+	/**
+	 * @kmap_lock: Protects the kmap address and use count
+	 */
+	struct mutex kmap_lock;
+
+	/**
+	 * @kmap_use_count:
+	 *
+	 * Reference count on the virtual address.
+	 * The address are un-mapped when the count reaches zero.
+	 */
+	unsigned int kmap_use_count;
+
 	/* Supported placements are %TTM_PL_VRAM and %TTM_PL_SYSTEM */
 	struct ttm_placement placement;
 	struct ttm_place placements[2];
On Wed, Sep 04, 2019 at 01:56:42PM +0200, Thomas Zimmermann wrote:
Reviewed-by: Gerd Hoffmann kraxel@redhat.com
The generic fbdev emulation will map and unmap the framebuffer's memory if required. As consoles are most often updated while being on screen, we map the fbdev buffer while it's being displayed. This avoids frequent map/unmap operations in the fbdev code. The original fbdev code in ast used the same trick to improve performance.
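The ordering used in the ast patch can be modeled in a few lines: first drop the old console mapping, then map the new scanout buffer if it is the console. This is a hypothetical userspace sketch. model_fb and model_crtc stand in for the DRM framebuffer, CRTC and GEM VRAM object, and the counters replace the real pin and kmap reference counts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a framebuffer and its GEM VRAM BO counters. */
struct model_fb {
	unsigned int pin_count;
	unsigned int kmap_count;
};

struct model_crtc {
	struct model_fb *primary_fb; /* framebuffer being scanned out */
	struct model_fb *fbcon;      /* plays the role of fb_helper->buffer->fb */
};

/* Follows the structure of the patched ast_crtc_do_set_base(). */
static void model_set_base(struct model_crtc *crtc, struct model_fb *old_fb)
{
	if (old_fb) {
		assert(old_fb->pin_count > 0);
		--old_fb->pin_count;
		/* The console is leaving the screen: drop its long-lived mapping. */
		if (old_fb == crtc->fbcon) {
			assert(old_fb->kmap_count > 0);
			--old_fb->kmap_count;
		}
	}

	++crtc->primary_fb->pin_count;
	/* ... program the scanout address ... */

	/* Keep the console mapped while visible so fbdev reuses the mapping. */
	if (crtc->primary_fb == crtc->fbcon)
		++crtc->primary_fb->kmap_count;
}

/* Switches scanout to 'next', as a mode set would. */
static void model_switch(struct model_crtc *crtc, struct model_fb *next)
{
	struct model_fb *old = crtc->primary_fb;

	crtc->primary_fb = next;
	model_set_base(crtc, old);
}
```

Switching back and forth between the console and another framebuffer leaves the console's kmap count at 1 while it is shown and 0 otherwise. The disable path in the patch (ast_crtc_disable()) performs the same kunmap for the console framebuffer, which this model omits.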
v2:
 * fix typo in comment
Signed-off-by: Thomas Zimmermann tzimmermann@suse.de
Cc: Noralf Trønnes noralf@tronnes.org
Cc: Dave Airlie airlied@redhat.com
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Sam Ravnborg sam@ravnborg.org
Cc: Gerd Hoffmann kraxel@redhat.com
Cc: Oleksandr Andrushchenko oleksandr_andrushchenko@epam.com
Cc: CK Hu ck.hu@mediatek.com
Cc: Daniel Vetter daniel.vetter@ffwll.ch
Cc: Alex Deucher alexander.deucher@amd.com
Cc: "Christian König" christian.koenig@amd.com
Cc: YueHaibing yuehaibing@huawei.com
Cc: Sam Bobroff sbobroff@linux.ibm.com
Cc: Huang Rui ray.huang@amd.com
Cc: "Y.C. Chen" yc_chen@aspeedtech.com
Cc: Rong Chen rong.a.chen@intel.com
Cc: Feng Tang feng.tang@intel.com
Cc: Huang Ying ying.huang@intel.com
Cc: Davidlohr Bueso dave@stgolabs.net
---
 drivers/gpu/drm/ast/ast_mode.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
diff --git a/drivers/gpu/drm/ast/ast_mode.c b/drivers/gpu/drm/ast/ast_mode.c
index d349c721501c..c10fff652228 100644
--- a/drivers/gpu/drm/ast/ast_mode.c
+++ b/drivers/gpu/drm/ast/ast_mode.c
@@ -529,13 +529,20 @@ static int ast_crtc_do_set_base(struct drm_crtc *crtc,
 				struct drm_framebuffer *fb,
 				int x, int y, int atomic)
 {
+	struct drm_fb_helper *fb_helper = crtc->dev->fb_helper;
 	struct drm_gem_vram_object *gbo;
 	int ret;
 	s64 gpu_addr;
+	void *base;

 	if (!atomic && fb) {
 		gbo = drm_gem_vram_of_gem(fb->obj[0]);
 		drm_gem_vram_unpin(gbo);
+
+		// Unmap fbdev FB if it's not being displayed
+		// any longer.
+		if (fb == fb_helper->buffer->fb)
+			drm_gem_vram_kunmap(gbo);
 	}

 	gbo = drm_gem_vram_of_gem(crtc->primary->fb->obj[0]);
@@ -552,6 +559,14 @@ static int ast_crtc_do_set_base(struct drm_crtc *crtc,
 	ast_set_offset_reg(crtc);
 	ast_set_start_address_crt1(crtc, (u32)gpu_addr);

+	// Map fbdev FB while it's being displayed. This avoids frequent
+	// mapping and unmapping within the fbdev code.
+	if (crtc->primary->fb == fb_helper->buffer->fb) {
+		base = drm_gem_vram_kmap(gbo, true, NULL);
+		if (IS_ERR(base))
+			DRM_ERROR("failed to kmap fbcon\n");
+	}
+
 	return 0;

 err_drm_gem_vram_unpin:
@@ -605,10 +620,14 @@ static void ast_crtc_disable(struct drm_crtc *crtc)
 	DRM_DEBUG_KMS("\n");
 	ast_crtc_dpms(crtc, DRM_MODE_DPMS_OFF);
 	if (crtc->primary->fb) {
+		struct drm_fb_helper *fb_helper = crtc->dev->fb_helper;
 		struct drm_framebuffer *fb = crtc->primary->fb;
 		struct drm_gem_vram_object *gbo =
 			drm_gem_vram_of_gem(fb->obj[0]);

+		// Unmap if it's the fbdev FB.
+		if (fb == fb_helper->buffer->fb)
+			drm_gem_vram_kunmap(gbo);
 		drm_gem_vram_unpin(gbo);
 	}
 	crtc->primary->fb = NULL;
On Wed, Sep 04, 2019 at 01:56:43PM +0200, Thomas Zimmermann wrote:
diff --git a/drivers/gpu/drm/ast/ast_mode.c b/drivers/gpu/drm/ast/ast_mode.c
index d349c721501c..c10fff652228 100644
--- a/drivers/gpu/drm/ast/ast_mode.c
+++ b/drivers/gpu/drm/ast/ast_mode.c
@@ -529,13 +529,20 @@ static int ast_crtc_do_set_base(struct drm_crtc *crtc,
 				struct drm_framebuffer *fb,
 				int x, int y, int atomic)
 {
+	struct drm_fb_helper *fb_helper = crtc->dev->fb_helper;

struct drm_framebuffer *fbcon = crtc->dev->fb_helper->buffer->fb ?
makes clear what is going on without excessive commenting ;)

 	struct drm_gem_vram_object *gbo;
 	int ret;
 	s64 gpu_addr;
+	void *base;

 	if (!atomic && fb) {
 		gbo = drm_gem_vram_of_gem(fb->obj[0]);
 		drm_gem_vram_unpin(gbo);
+
+		// Unmap fbdev FB if it's not being displayed
+		// any longer.

I'd drop the comment. It says *what* the code is doing. You should be able to figure that out by just reading the code. Comments should explain *why* the code does something ...

+		if (fb == fb_helper->buffer->fb)
+			drm_gem_vram_kunmap(gbo);
 	}

 	gbo = drm_gem_vram_of_gem(crtc->primary->fb->obj[0]);
@@ -552,6 +559,14 @@ static int ast_crtc_do_set_base(struct drm_crtc *crtc,
 	ast_set_offset_reg(crtc);
 	ast_set_start_address_crt1(crtc, (u32)gpu_addr);

+	// Map fbdev FB while it's being displayed. This avoids frequent
+	// mapping and unmapping within the fbdev code.

... like this one (avoid frequent map/unmap).

Comments should use /* */ style, especially multi-line comments. See also the comment section in Documentation/process/coding-style.rst
cheers, Gerd
The generic fbdev emulation will map and unmap the framebuffer's memory if required. As consoles are most often updated while being on screen, we map the fbdev buffer while it's being displayed. This avoids frequent map/unmap operations in the fbdev code. The original fbdev code in mgag200 used the same trick to improve performance.
v2:
 * fix typo in comment
Signed-off-by: Thomas Zimmermann tzimmermann@suse.de
Fixes: 90f479ae51af ("drm/mgag200: Replace struct mga_fbdev with generic framebuffer emulation")
Cc: Thomas Zimmermann tzimmermann@suse.de
Cc: Noralf Trønnes noralf@tronnes.org
Cc: Dave Airlie airlied@redhat.com
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Gerd Hoffmann kraxel@redhat.com
Cc: Alex Deucher alexander.deucher@amd.com
Cc: "Christian König" christian.koenig@amd.com
Cc: Sam Ravnborg sam@ravnborg.org
Cc: Daniel Vetter daniel.vetter@ffwll.ch
Cc: Huang Rui ray.huang@amd.com
Cc: Bartlomiej Zolnierkiewicz b.zolnierkie@samsung.com
Cc: "Michał Mirosław" mirq-linux@rere.qmqm.pl
Cc: Armijn Hemel armijn@tjaldur.nl
Cc: Rong Chen rong.a.chen@intel.com
Cc: Feng Tang feng.tang@intel.com
Cc: Huang Ying ying.huang@intel.com
Cc: Davidlohr Bueso dave@stgolabs.net
---
 drivers/gpu/drm/mgag200/mgag200_mode.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/mgag200/mgag200_mode.c b/drivers/gpu/drm/mgag200/mgag200_mode.c
index 5e778b5f1a10..7b95c59341f5 100644
--- a/drivers/gpu/drm/mgag200/mgag200_mode.c
+++ b/drivers/gpu/drm/mgag200/mgag200_mode.c
@@ -860,13 +860,20 @@ static int mga_crtc_do_set_base(struct drm_crtc *crtc,
 				struct drm_framebuffer *fb,
 				int x, int y, int atomic)
 {
+	struct drm_fb_helper *fb_helper = crtc->dev->fb_helper;
 	struct drm_gem_vram_object *gbo;
 	int ret;
 	s64 gpu_addr;
+	void *base;

 	if (!atomic && fb) {
 		gbo = drm_gem_vram_of_gem(fb->obj[0]);
 		drm_gem_vram_unpin(gbo);
+
+		// Unmap fbdev FB if it's not being displayed
+		// any longer.
+		if (fb == fb_helper->buffer->fb)
+			drm_gem_vram_kunmap(gbo);
 	}

 	gbo = drm_gem_vram_of_gem(crtc->primary->fb->obj[0]);
@@ -882,6 +889,14 @@ static int mga_crtc_do_set_base(struct drm_crtc *crtc,

 	mga_set_start_address(crtc, (u32)gpu_addr);

+	// Map fbdev FB while it's being displayed. This avoids frequent
+	// mapping and unmapping within the fbdev code.
+	if (crtc->primary->fb == fb_helper->buffer->fb) {
+		base = drm_gem_vram_kmap(gbo, true, NULL);
+		if (IS_ERR(base))
+			DRM_ERROR("failed to kmap fbcon\n");
+	}
+
 	return 0;

 err_drm_gem_vram_unpin:
@@ -1403,9 +1418,14 @@ static void mga_crtc_disable(struct drm_crtc *crtc)
 	DRM_DEBUG_KMS("\n");
 	mga_crtc_dpms(crtc, DRM_MODE_DPMS_OFF);
 	if (crtc->primary->fb) {
+		struct drm_fb_helper *fb_helper = crtc->dev->fb_helper;
 		struct drm_framebuffer *fb = crtc->primary->fb;
 		struct drm_gem_vram_object *gbo =
 			drm_gem_vram_of_gem(fb->obj[0]);
+
+		// Unmap if it's the fbdev FB.
+		if (fb == fb_helper->buffer->fb)
+			drm_gem_vram_kunmap(gbo);
 		drm_gem_vram_unpin(gbo);
 	}
 	crtc->primary->fb = NULL;
Please use C style comments rather than C++.
Alex
________________________________
From: Thomas Zimmermann tzimmermann@suse.de
Sent: Wednesday, September 4, 2019 7:56 AM
To: daniel@ffwll.ch; noralf@tronnes.org; airlied@linux.ie; rong.a.chen@intel.com; feng.tang@intel.com; ying.huang@intel.com; sean@poorly.run; maxime.ripard@bootlin.com; maarten.lankhorst@linux.intel.com; dave@stgolabs.net
Cc: dri-devel@lists.freedesktop.org; Thomas Zimmermann tzimmermann@suse.de; Dave Airlie airlied@redhat.com; Greg Kroah-Hartman gregkh@linuxfoundation.org; Thomas Gleixner tglx@linutronix.de; Gerd Hoffmann kraxel@redhat.com; Deucher, Alexander Alexander.Deucher@amd.com; Koenig, Christian Christian.Koenig@amd.com; Sam Ravnborg sam@ravnborg.org; Daniel Vetter daniel.vetter@ffwll.ch; Huang, Ray Ray.Huang@amd.com; Bartlomiej Zolnierkiewicz b.zolnierkie@samsung.com; Michał Mirosław mirq-linux@rere.qmqm.pl; Armijn Hemel armijn@tjaldur.nl
Subject: [PATCH v2 3/3] drm/mgag200: Map fbdev framebuffer while it's being displayed
On Wed, Sep 4, 2019 at 1:56 PM Thomas Zimmermann tzimmermann@suse.de wrote:
(was: drm/vram-helper: Fix performance regression in fbdev)
Generic fbdev emulation maps and unmaps the console BO for updating its content from the shadow buffer. If this involves an actual mapping operation (instead of reusing an existing mapping), lots of debug messages may be printed, such as
  x86/PAT: Overlap at 0xd0000000-0xd1000000
  x86/PAT: reserve_memtype added [mem 0xd0000000-0xd02fffff], track write-combining, req write-combining, ret write-combining
  x86/PAT: free_memtype request [mem 0xd0000000-0xd02fffff]
as reported at [1]. Drivers using VRAM helpers may also see reduced performance, because the mapping operations add overhead.
This patch set fixes the problem by adding a reference counter to the kmap operation of GEM VRAM buffers, and by keeping the fbdev console buffer mapped while the console is being displayed. These changes avoid the frequent mappings in the fbdev code. The ast and mgag200 drivers map the console's buffer when it becomes visible, and the fbdev code reuses this mapping. The original fbdev code in ast and mgag200 used the same strategy.
[1] https://lists.freedesktop.org/archives/dri-devel/2019-September/234308.html
As discussed on IRC a bit, here are my thoughts:
- IMO we should fix this by using the io_mapping stuff, which avoids the overhead of repeated PAT checks on map/unmap. It should also cut down on remapping costs a lot in general (at least on 64-bit kernels, which is nearly everything nowadays). But it's a lot more work to roll out, I guess. I think this would be the much better long-term fix.
- This only works while fbcon is active; the noise will come back when you start X or Wayland. We should probably check whether the display is active with drm_master_internal_acquire() (and re-upload when we restore the entire console in the restore function; we could just launch the worker for that).
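Daniel's second point can be modeled as follows: skip console uploads while userspace owns the display, and re-upload on restore. This is a hypothetical sketch. model_master_acquire() only stands in for drm_master_internal_acquire(). The dirty flag is an assumption about how a skipped update might be remembered and is not taken from the fbdev helper code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state; not the DRM structures. */
struct model_dev {
	bool userspace_master; /* X or Wayland currently owns the display */
	bool console_dirty;    /* console updates were skipped */
	unsigned int blits;    /* shadow-to-VRAM copies, each needing a mapping */
};

/* Stands in for drm_master_internal_acquire(): fails while userspace is master. */
static bool model_master_acquire(struct model_dev *dev)
{
	return !dev->userspace_master;
}

static void model_damage_work(struct model_dev *dev)
{
	if (!model_master_acquire(dev)) {
		/* Console not visible: skip the copy and redo it on restore. */
		dev->console_dirty = true;
		return;
	}
	++dev->blits;
	/* the real code would release the master reference here */
}

/* Console restore: re-upload once if anything was skipped. */
static void model_restore(struct model_dev *dev)
{
	dev->userspace_master = false;
	if (dev->console_dirty) {
		dev->console_dirty = false;
		model_damage_work(dev);
	}
	assert(!dev->console_dirty);
}
```

While userspace is master, no mapping is touched at all. The console catches up with a single upload once it is restored.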
I'm also not sure whether we have a real problem here, it's just debug noise that we're fighting here? -Daniel
On Wed, 04 Sep 2019, Daniel Vetter wrote:
I'm also not sure whether we have a real problem here, it's just debug noise that we're fighting here?
It is non-stop debug noise, as the memory range in question is being added and deleted over and over. I doubt we want to be burning cycles like this.
Thanks, Davidlohr
On Wed, Sep 4, 2019 at 7:14 PM Davidlohr Bueso dave@stgolabs.net wrote:
It is non-stop debug noise, as the memory range in question is being added and deleted over and over. I doubt we want to be burning cycles like this.
Yeah, the proper fix is setting up an io_mapping in TTM (or in drivers) so the PAT tracking is cached, and then using the right PTE wrangling functions. But that's a much more involved fix, and from all the testing we've done, the PTE rewriting itself doesn't seem to be the biggest reason mgag200 is slow...
-Daniel
Hi,
- imo we should fix this by using the io_mapping stuff, that avoids
the overhead of repeated pat checks for map/unmap.
Another idea: IIRC ttm has a move_notify callback. So we could simply keep mappings active even when the refcount goes down to zero. Then do the actual unmap either in the move_notify or in the destroy callback.
cheers, Gerd
On Thu, Sep 5, 2019 at 9:01 AM Gerd Hoffmann kraxel@redhat.com wrote:
Another idea: IIRC ttm has a move_notify callback. So we could simply keep mappings active even when the refcount goes down to zero. Then do the actual unmap either in the move_notify or in the destroy callback.
Yeah that should be a really clean solution, and only needs changes in the vram helpers. Which is nice, more common code! -Daniel
Hi
On 05.09.19 at 09:56, Daniel Vetter wrote:
Yeah that should be a really clean solution, and only needs changes in the vram helpers. Which is nice, more common code!
But the console's BO is special wrt mapping. Putting special code for console handling into the VRAM helpers doesn't seem right. I think it's preferable to keep the special cases in the fbdev emulation, or even better in the DRM client code, so that other future internal clients automatically do the right thing.
Best regards Thomas
On Thu, Sep 05, 2019 at 10:19:40AM +0200, Thomas Zimmermann wrote:
But the console's BO is special wrt mapping. Putting special code for console handling into the VRAM helpers doesn't seem right.
I have no special handling in mind. I think we can simply do that for all GEM objects, no matter whether they are created by fbcon or by userspace (Wayland/Xorg/whatever). vmap will create a mapping (or increase the reference count). vunmap decreases the reference count; when it goes down to zero, unpin the object but keep the mapping. The actual unmap happens when TTM wants to move the object from VRAM to SYSTEM because VRAM is full. In case VRAM has room for all our objects, we simply never unmap.
hope this clarifies, Gerd
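Gerd's scheme can be sketched as a small state machine:

- vmap() pins the BO and maps it only when no cached mapping exists.
- vunmap() unpins when the count reaches zero, but keeps the mapping.
- A move notification drops the mapping.

This is a hypothetical userspace model. lazy_move_notify() plays the role of the TTM move_notify callback mentioned in the thread, and the unmap in the destroy callback is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a BO whose kernel mapping outlives its users. */
struct lazy_bo {
	unsigned int vmap_count; /* outstanding vmap() references */
	unsigned int pin_count;  /* BO is pinned in VRAM while > 0 */
	bool mapped;             /* a kernel mapping currently exists */
	unsigned int map_ops;    /* real (expensive) mapping operations */
};

static void lazy_vmap(struct lazy_bo *bo)
{
	if (bo->vmap_count++ == 0)
		++bo->pin_count;   /* keep the BO in place while it is used */
	if (!bo->mapped) {
		bo->mapped = true; /* map only if no cached mapping exists */
		++bo->map_ops;
	}
}

static void lazy_vunmap(struct lazy_bo *bo)
{
	assert(bo->vmap_count > 0);
	if (--bo->vmap_count == 0)
		--bo->pin_count;   /* unpin, but keep the mapping cached */
}

/* Called when the BO is moved, e.g. evicted from VRAM to system memory. */
static void lazy_move_notify(struct lazy_bo *bo)
{
	assert(bo->pin_count == 0); /* pinned BOs are not moved */
	bo->mapped = false;         /* cached mapping is stale: unmap now */
}
```

Repeated vmap/vunmap cycles cost a single real mapping until the BO is actually moved. This matches the "in case VRAM has room for all our objects we simply never unmap" case.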
Hi
On 05.09.19 at 11:29, Gerd Hoffmann wrote:
I have no special handling in mind. I think we can simply do that for all GEM objects, no matter whether they are created by fbcon or by userspace (Wayland/Xorg/whatever). vmap will create a mapping (or increase the reference count). vunmap decreases the reference count; when it goes down to zero, unpin the object but keep the mapping. The actual unmap happens when TTM wants to move the object from VRAM to SYSTEM because VRAM is full. In case VRAM has room for all our objects, we simply never unmap.
That's pretty cool. Thanks for clarifying. I think it's the solution I was looking for.
Best regards Thomas
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel