VRAM can easily fragment, which leads to OOM errors on devices with little VRAM available (i.e., all of them). This patchset addresses the problem by placing BOs near the bottom and top ends of the VRAM and keeping available areas near the middle.
Thomas Zimmermann (2):
  drm/vram-helper: Don't put new BOs into VRAM
  drm/vram-helper: Alternate between bottom-up and top-down placement
 drivers/gpu/drm/drm_gem_vram_helper.c | 30 +++++++++++++++++++++------
 include/drm/drm_gem_vram_helper.h     | 10 +++++----
 2 files changed, 30 insertions(+), 10 deletions(-)
-- 2.26.0
Most drivers that use VRAM helpers have only a few MiB of framebuffer memory available. To reduce fragmentation, new BOs are now put into system memory by default. Only pin operations are allowed to move BOs into VRAM.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/gpu/drm/drm_gem_vram_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c
index 8b2d5c945c95c..5d5bfb38bbed4 100644
--- a/drivers/gpu/drm/drm_gem_vram_helper.c
+++ b/drivers/gpu/drm/drm_gem_vram_helper.c
@@ -197,7 +197,7 @@ static int drm_gem_vram_init(struct drm_device *dev,
 	acc_size = ttm_bo_dma_acc_size(bdev, size, sizeof(*gbo));
 
 	gbo->bo.bdev = bdev;
-	drm_gem_vram_placement(gbo, TTM_PL_FLAG_VRAM | TTM_PL_FLAG_SYSTEM);
+	drm_gem_vram_placement(gbo, TTM_PL_FLAG_SYSTEM);
 
 	ret = ttm_bo_init(bdev, &gbo->bo, size, ttm_bo_type_device,
 			  &gbo->placement, pg_align, false, acc_size,
With limited VRAM available, fragmentation can lead to OOM errors. Alternating between bottom-up and top-down placement keeps BOs near the ends of the VRAM and the available pages consecutively near the middle.
A real-world example with 16 MiB of VRAM is shown below.
cat /sys/kernel/debug/dri/0/vram-mm
0x0000000000000000-0x000000000000057f: 1407: free
0x000000000000057f-0x0000000000000b5b: 1500: used
0x0000000000000b5b-0x0000000000000ff0: 1173: free
The first free area was the location of the fbdev framebuffer. The used area is Weston's current framebuffer of 1500 pages. Weston now cannot do a pageflip to another 1500-page framebuffer, even though enough pages are free in total. The patch resolves this problem; with it applied, the layout becomes
cat /sys/kernel/debug/dri/0/vram-mm
0x0000000000000000-0x00000000000005dc: 1500: used
0x00000000000005dc-0x0000000000000a14: 1080: free
0x0000000000000a14-0x0000000000000ff0: 1500: used
with both of Weston's framebuffers located near the ends of the VRAM memory.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/gpu/drm/drm_gem_vram_helper.c | 28 ++++++++++++++++++++++-----
 include/drm/drm_gem_vram_helper.h     | 10 ++++++----
 2 files changed, 29 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c
index 5d5bfb38bbed4..2d0b7474288de 100644
--- a/drivers/gpu/drm/drm_gem_vram_helper.c
+++ b/drivers/gpu/drm/drm_gem_vram_helper.c
@@ -146,15 +146,33 @@ static void drm_gem_vram_placement(struct drm_gem_vram_object *gbo,
 	unsigned int i;
 	unsigned int c = 0;
 	u32 invariant_flags = pl_flag & TTM_PL_FLAG_TOPDOWN;
+	struct drm_device *dev = gbo->bo.base.dev;
+	struct drm_vram_mm *vmm = dev->vram_mm;
 
 	gbo->placement.placement = gbo->placements;
 	gbo->placement.busy_placement = gbo->placements;
 
-	if (pl_flag & TTM_PL_FLAG_VRAM)
-		gbo->placements[c++].flags = TTM_PL_FLAG_WC |
-					     TTM_PL_FLAG_UNCACHED |
-					     TTM_PL_FLAG_VRAM |
-					     invariant_flags;
+	if (pl_flag & TTM_PL_FLAG_VRAM) {
+		/*
+		 * We usually have at most 2 pinned BOs during pageflips,
+		 * plus cursor BOs. Even with a significant number of free
+		 * pages, always placing bottom-up can lead to fragmentation
+		 * and OOM situations. So if there's no explicit request for
+		 * top-down placement, we alternatingly place BOs bottom-up
+		 * and top-down. The placement strategy should help to keep
+		 * free VRAM pages available near the middle of the VRAM.
+		 */
+		gbo->placements[c].flags = TTM_PL_FLAG_WC |
+					   TTM_PL_FLAG_UNCACHED |
+					   TTM_PL_FLAG_VRAM |
+					   invariant_flags;
+		if (!(invariant_flags & TTM_PL_FLAG_TOPDOWN)) {
+			if (vmm->place_topdown)
+				gbo->placements[c].flags |= TTM_PL_FLAG_TOPDOWN;
+			vmm->place_topdown = !vmm->place_topdown;
+		}
+		++c;
+	}
 
 	if (pl_flag & TTM_PL_FLAG_SYSTEM)
 		gbo->placements[c++].flags = TTM_PL_MASK_CACHING |
diff --git a/include/drm/drm_gem_vram_helper.h b/include/drm/drm_gem_vram_helper.h
index b63bcd1b996da..04767d0ff23a6 100644
--- a/include/drm/drm_gem_vram_helper.h
+++ b/include/drm/drm_gem_vram_helper.h
@@ -165,10 +165,10 @@ void drm_gem_vram_simple_display_pipe_cleanup_fb(
 
 /**
  * struct drm_vram_mm - An instance of VRAM MM
- * @vram_base:	Base address of the managed video memory
- * @vram_size:	Size of the managed video memory in bytes
- * @bdev:	The TTM BO device.
- * @funcs:	TTM BO functions
+ * @vram_base:	Base address of the managed video memory
+ * @vram_size:	Size of the managed video memory in bytes
+ * @bdev:	The TTM BO device
+ * @place_topdown: Flags the next BO to be placed at the VRAM's high end
  *
  * The fields &struct drm_vram_mm.vram_base and
  * &struct drm_vram_mm.vrm_size are managed by VRAM MM, but are
@@ -180,6 +180,8 @@ struct drm_vram_mm {
 	size_t vram_size;
 
 	struct ttm_bo_device bdev;
+
+	bool place_topdown;
 };
 
 /**
On Wed, Apr 22, 2020 at 04:40:55PM +0200, Thomas Zimmermann wrote:
With limited VRAM available, fragmentation can lead to OOM errors. Alternating between bottom-up and top-down placement keeps BOs near the ends of the VRAM and the available pages consecutively near the middle.
A real-world example with 16 MiB of VRAM is shown below.
cat /sys/kernel/debug/dri/0/vram-mm
0x0000000000000000-0x000000000000057f: 1407: free
0x000000000000057f-0x0000000000000b5b: 1500: used
0x0000000000000b5b-0x0000000000000ff0: 1173: free
The first free area was the location of the fbdev framebuffer. The used area is Weston's current framebuffer of 1500 pages. Weston now cannot do a pageflip to another 1500-page framebuffer, even though enough pages are free in total. The patch resolves this problem; with it applied, the layout becomes
cat /sys/kernel/debug/dri/0/vram-mm
0x0000000000000000-0x00000000000005dc: 1500: used
0x00000000000005dc-0x0000000000000a14: 1080: free
0x0000000000000a14-0x0000000000000ff0: 1500: used
with both of Weston's framebuffers located near the ends of the VRAM memory.
I don't think it is that simple.
First: How will that interact with cursor bo allocations? IIRC the strategy for them is to allocate top-down, for similar reasons (to avoid having small cursor bo allocs fragment vram memory).
Second: I think ttm will move bo's from vram to system only on memory pressure. So you can still end up with fragmented memory. To make the scheme with one fb @ top and one @ bottom work reliably you have to be more aggressive about pushing out framebuffers.
Third: I'd suggest making topdown allocations depend on the current state instead of simply alternating, i.e. if there is a pinned framebuffer @ offset 0, then go for top-down.
I also think using this scheme should be optional. In the simplest case we can allow drivers to opt in. Or we try to do something clever automatically: using the strategy only for framebuffers larger than 1/4 or 1/3 of total vram memory (which makes alloc failures due to fragmentation much more likely).
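(Illustrative sketch of that size threshold, not code from the thread; the helper name and the 1/3 cutoff are made up for the example.)

#include <linux/types.h>

/* Hypothetical helper, not existing code: only apply the special
 * (alternating/top-down) placement to BOs that consume a large fraction
 * of VRAM, where an allocation failure due to fragmentation is likely. */
static bool bo_wants_special_placement(u64 bo_size, u64 vram_size)
{
	return bo_size >= vram_size / 3;
}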
cheers, Gerd
Hi Gerd,
thanks for the feedback.
On 23.04.20 at 13:18, Gerd Hoffmann wrote:
On Wed, Apr 22, 2020 at 04:40:55PM +0200, Thomas Zimmermann wrote:
With limited VRAM available, fragmentation can lead to OOM errors. Alternating between bottom-up and top-down placement keeps BOs near the ends of the VRAM and the available pages consecutively near the middle.
A real-world example with 16 MiB of VRAM is shown below.
cat /sys/kernel/debug/dri/0/vram-mm
0x0000000000000000-0x000000000000057f: 1407: free
0x000000000000057f-0x0000000000000b5b: 1500: used
0x0000000000000b5b-0x0000000000000ff0: 1173: free
The first free area was the location of the fbdev framebuffer. The used area is Weston's current framebuffer of 1500 pages. Weston now cannot do a pageflip to another 1500-page framebuffer, even though enough pages are free in total. The patch resolves this problem; with it applied, the layout becomes
cat /sys/kernel/debug/dri/0/vram-mm
0x0000000000000000-0x00000000000005dc: 1500: used
0x00000000000005dc-0x0000000000000a14: 1080: free
0x0000000000000a14-0x0000000000000ff0: 1500: used
with both of Weston's framebuffers located near the ends of the VRAM memory.
I don't think it is that simple.
First: How will that interact with cursor bo allocations? IIRC the strategy for them is to allocate top-down, for similar reasons (to avoid having small cursor bo allocs fragment vram memory).
In ast, 2 cursor BOs are allocated during driver initialization and kept permanently at the vram's top end. I don't know about other drivers.
But cursor BOs are small, so they don't make much of a difference. What is needed is space for 2 primary framebuffers during pageflips, with one of them pinned. The other framebuffer can be located anywhere.
Second: I think ttm will move bo's from vram to system only on memory pressure. So you can still end up with fragmented memory. To make the scheme with one fb @ top and one @ bottom work reliably you have to be more aggressive about pushing out framebuffers.
I'm in the process of converting mgag200 to atomic modesetting. The given example is what I observed. I'm not claiming that the placement scheme is perfect, but it is required to get mgag200 working with atomic modesetting's pageflip logic. So we're solving a real problem here.
The bug comes from Weston's allocation strategy. Looking at the debug output:
0x0000000000000000-0x000000000000057f: 1407: free
This was fbdev's framebuffer with 1600x900@32bpp
0x000000000000057f-0x0000000000000b5b: 1500: used
This is Weston's framebuffer also with 1600x900@32bpp. But Weston allocates an additional, unused 60 scanlines. That is to render with tiles of 64x64px, I suppose. fbdev doesn't do that, hence Weston's second framebuffer doesn't fit into the free location of the fbdev framebuffer.
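(For reference, assuming 4 KiB pages: 1600 x 900 x 4 bytes = 5,760,000 bytes, which rounds up to the 1407 pages of the free area above, while 1600 x (900 + 60) x 4 bytes = 6,144,000 bytes is exactly the 1500 pages of Weston's framebuffer.)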
The other drivers with a small amount of vram are also prone to this problem. They simply have not yet encountered such a setup.
Third: I'd suggest making topdown allocations depend on the current state instead of simply alternating, i.e. if there is a pinned framebuffer @ offset 0, then go for top-down.
That's what the current patch does. If the last pin was at the bottom, the next goes to the top. And then the other way around. Without alternating between both ends of vram, the problem would occur again when fragmentation happens near the top end.
I also think using this scheme should be optional. In the simplest case we can allow drivers to opt in. Or we try to do something clever automatically: using the strategy only for framebuffers larger than 1/4 or 1/3 of total vram memory (which makes alloc failures due to fragmentation much more likely).
I'd like to not change behavior automatically, but we can surely make this optional.
Looking again at the vram helpers, this functionality could be implemented in drm_gem_vram_plane_helper_prepare_fb(). Drivers with other placement strategies could implement their own helper for prepare_fb.
Best regards Thomas
cheers, Gerd
I don't think it is that simple.
First: How will that interact with cursor bo allocations? IIRC the strategy for them is to allocate top-down, for similar reasons (to avoid having small cursor bo allocs fragment vram memory).
In ast, 2 cursor BOs are allocated during driver initialization and kept permanently at the vram's top end. I don't know about other drivers.
One-time allocation at init time shouldn't be a problem.
But cursor BOs are small, so they don't make much of a difference. What is needed is space for 2 primary framebuffers during pageflips, with one of them pinned. The other framebuffer can be located anywhere.
The problem isn't the size. The problem is dynamically allocated cursor BOs can also fragment vram, especially if top-bottom allocation is also used for large framebuffers so cursor BOs can end up somewhere in the middle of vram.
Second: I think ttm will move bo's from vram to system only on memory pressure. So you can still end up with fragmented memory. To make the scheme with one fb @ top and one @ bottom work reliably you have to be more aggressive about pushing out framebuffers.
I'm in the process of converting mgag200 to atomic modesetting. The given example is what I observed. I'm not claiming that the placement scheme is perfect, but it is required to get mgag200 working with atomic modesetting's pageflip logic. So we're solving a real problem here.
I don't doubt this is a real problem.
The bug comes from Weston's allocation strategy. Looking at the debug output:
0x0000000000000000-0x000000000000057f: 1407: free
This was fbdev's framebuffer with 1600x900@32bpp
0x000000000000057f-0x0000000000000b5b: 1500: used
This is Weston's framebuffer also with 1600x900@32bpp. But Weston allocates an additional, unused 60 scanlines. That is to render with tiles of 64x64px, I suppose. fbdev doesn't do that, hence Weston's second framebuffer doesn't fit into the free location of the fbdev framebuffer.
Sure. Assume there is just enough vram to fit in fbdev and two weston framebuffers. fbdev is allocated from bottom, first weston fb from top, second weston fb from bottom again. fbdev is not pushed out (no memory pressure yet) so the second weston fb ends up in the middle of vram fragmenting things. And now you are again in the situation where you might have enough free vram for an allocation but can't use it due to fragmentation (probably harder to trigger in practice though).
That's why I would suggest to explicitly move out unpinned framebuffers (aka fbdev) before pinning a new one (second weston fb) instead of depending on ttm moving things out on OOM, to make sure you never allocate something in the middle of vram.
Third: I'd suggest making topdown allocations depend on the current state instead of simply alternating, i.e. if there is a pinned framebuffer @ offset 0, then go for top-down.
That's what the current patch does. If the last pin was at the bottom, the next goes to the top. And then the other way around. Without alternating between both ends of vram, the problem would occur again when fragmentation happens near the top end.
I'd feel better when checking the state of my current pins to figure out whether I should alloc top-bottom or not, for robustness reasons.
Looking again at the vram helpers, this functionality could be implemented in drm_gem_vram_plane_helper_prepare_fb(). Drivers with other placement strategies could implement their own helper for prepare_fb.
vram helpers could also simply offer two prepare_fb variants.
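(Hypothetical sketch of such an opt-in, not code from the thread; drm_gem_vram_plane_helper_prepare_fb() is the helper named above, while the *_alternating variant and the struct name are made up for illustration.)

static const struct drm_plane_helper_funcs my_primary_plane_helper_funcs = {
	/* opt-in variant using the alternating placement strategy (hypothetical name) */
	.prepare_fb = drm_gem_vram_plane_helper_prepare_fb_alternating,
	/* drivers that want the default behavior would instead set
	 * .prepare_fb = drm_gem_vram_plane_helper_prepare_fb */
};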
cheers, Gerd
Hi guys,
one thing you don't seem to have considered yet is that not setting the top-down flag also won't get you the bottom-up approach, but rather the best fit.
In addition to that, the top-down allocations have quite a bit more CPU overhead since they don't walk the size tree, but the address tree in search for a fitting hole. Nirmoy is currently working on improving this.
On 23.04.20 at 15:57, Gerd Hoffmann wrote:
But cursor BOs are small, so they don't make much of a difference. What is needed is space for 2 primary framebuffers during pageflips, with one of them pinned. The other framebuffer can be located anywhere.
The problem isn't the size. The problem is dynamically allocated cursor BOs can also fragment vram, especially if top-bottom allocation is also used for large framebuffers so cursor BOs can end up somewhere in the middle of vram.
Well maybe instead of a ping/pong approach between top and bottom use a size based approach to place small BOs at the top and only the larger ones at the bottom (or the other way around).
Cheers, Christian.
Hi Christian
On 23.04.20 at 17:04, Christian König wrote:
Hi guys,
one thing you don't seem to have considered yet is that not setting the top-down flag also won't get you the bottom-up approach, but rather the best fit.
Kind of unrelated: Would it pick the smallest free area that can hold a BO? IOW, would a small cursor BO fill up a small free area even if there's a much larger free area somewhere?
In addition to that, the top-down allocations have quite a bit more CPU overhead since they don't walk the size tree, but the address tree in search for a fitting hole. Nirmoy is currently working on improving this.
I'm not worried about that. Compositors alternate between only two framebuffer instances. Once their BOs have been placed, they remain in vram.
On 23.04.20 at 15:57, Gerd Hoffmann wrote:
But cursor BOs are small, so they don't make much of a difference. What is needed is space for 2 primary framebuffers during pageflips, with one of them pinned. The other framebuffer can be located anywhere.
The problem isn't the size. The problem is dynamically allocated cursor BOs can also fragment vram, especially if top-bottom allocation is also used for large framebuffers so cursor BOs can end up somewhere in the middle of vram.
Well maybe instead of a ping/pong approach between top and bottom use a size based approach to place small BOs at the top and only the larger ones at the bottom (or the other way around).
That's what the current code does and it results in the OOM. Basically, there's a large BO in the middle of vram and the pageflip's destination BO does not fit before or after the large one.
Best regards Thomas
Cheers, Christian.
On 24.04.20 at 08:27, Thomas Zimmermann wrote:
Hi Christian
On 23.04.20 at 17:04, Christian König wrote:
Hi guys,
one thing you don't seem to have considered yet is that not setting the top-down flag also won't get you the bottom-up approach, but rather the best fit.
Kind of unrelated: Would it pick the smallest free area that can hold a BO? IOW, would a small cursor BO fill up a small free area even if there's a much larger free area somewhere?
Yes, exactly that.
The bottom-up method searches for the lowest hole which can fit the requirements.
The top-down method searches for the highest hole which can fit the requirements.
Both of those walk the holes by the address index tree, but there is also the best method which looks at the holes by their size and tries to find the best fit.
The best method usually only needs a single tree walk instead of a linear search.
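(For readers following along, a rough sketch of how these strategies map onto drm_mm's insertion modes; the function, sizes and range below are made up for illustration and this is not the actual TTM/vram-helper code path.)

#include <drm/drm_mm.h>
#include <linux/limits.h>

/* Illustration only: request one of the three search strategies described
 * above when inserting a node into a drm_mm address space. */
static int example_insert(struct drm_mm *mm, struct drm_mm_node *node,
			  u64 num_pages, bool topdown)
{
	/* default: best fit, found via a walk of the size tree */
	enum drm_mm_insert_mode mode = DRM_MM_INSERT_BEST;

	if (topdown)
		mode = DRM_MM_INSERT_HIGH; /* highest fitting hole, walks the address tree */
	/* DRM_MM_INSERT_LOW would request the lowest fitting hole instead */

	return drm_mm_insert_node_in_range(mm, node, num_pages, 0, 0,
					   0, U64_MAX, mode);
}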
On 23.04.20 at 15:57, Gerd Hoffmann wrote:
But cursor BOs are small, so they don't make much of a difference. What is needed is space for 2 primary framebuffers during pageflips, with one of them pinned. The other framebuffer can be located anywhere.
The problem isn't the size. The problem is dynamically allocated cursor BOs can also fragment vram, especially if top-bottom allocation is also used for large framebuffers so cursor BOs can end up somewhere in the middle of vram.
Well maybe instead of a ping/pong approach between top and bottom use a size based approach to place small BOs at the top and only the larger ones at the bottom (or the other way around).
That's what the current code does and it results in the OOM. Basically, there's a large BO in the middle of vram and the pageflip's destination BO does not fit before or after the large one.
Tough problem. No idea how to otherwise fix this without making the large BO moveable somehow.
Christian.
Best regards Thomas
Cheers, Christian.
Hi
On 24.04.20 at 08:56, Christian König wrote:
On 24.04.20 at 08:27, Thomas Zimmermann wrote:
Hi Christian
On 23.04.20 at 17:04, Christian König wrote:
Hi guys,
one thing you don't seem to have considered yet is that not setting the top-down flag also won't get you the bottom-up approach, but rather the best fit.
Kind of unrelated: Would it pick the smallest free area that can hold a BO? IOW, would a small cursor BO fill up a small free area even if there's a much larger free area somewhere?
Yes, exactly that.
I see.
The bottom-up method searches for the lowest hole which can fit the requirements.
But there's no way to explicitly select bottom-up? What is TTM_PL_FLAG_CONTIGUOUS doing?
Best regards Thomas
The top-down method searches for the highest hole which can fit the requirements.
Both of those walk the holes by the address index tree, but there is also the best method which looks at the holes by their size and tries to find the best fit.
The best method usually only needs a single tree walk instead of a linear search.
On 23.04.20 at 15:57, Gerd Hoffmann wrote:
But cursor BOs are small, so they don't make much of a difference. What is needed is space for 2 primary framebuffers during pageflips, with one of them pinned. The other framebuffer can be located anywhere.
The problem isn't the size. The problem is dynamically allocated cursor BOs can also fragment vram, especially if top-bottom allocation is also used for large framebuffers so cursor BOs can end up somewhere in the middle of vram.
Well maybe instead of a ping/pong approach between top and bottom use a size based approach to place small BOs at the top and only the larger ones at the bottom (or the other way around).
That's what the current code does and it results in the OOM. Basically, there's a large BO in the middle of vram and the pageflip's destination BO does not fit before or after the large one.
Tough problem. No idea how to otherwise fix this without making the large BO moveable somehow.
Christian.
Best regards Thomas
Cheers, Christian.
On 24.04.20 at 09:03, Thomas Zimmermann wrote:
Hi
On 24.04.20 at 08:56, Christian König wrote:
On 24.04.20 at 08:27, Thomas Zimmermann wrote:
Hi Christian
On 23.04.20 at 17:04, Christian König wrote:
Hi guys,
one thing you don't seem to have considered yet is that not setting the top-down flag also won't get you the bottom-up approach, but rather the best fit.
Kind of unrelated: Would it pick the smallest free area that can hold a BO? IOW, would a small cursor BO fill up a small free area even if there's a much larger free area somewhere?
Yes, exactly that.
I see.
The bottom-up method searches for the lowest hole which can fit the requirements.
But there's no way to explicitly select bottom-up? What is TTM_PL_FLAG_CONTIGUOUS doing?
Making sure that we allocate contiguously for amdgpu. Added that because I couldn't fix a few problems otherwise.
Should probably be a driver private placement flag instead, but I didn't have time to clean that up.
Regards, Christian.
Best regards Thomas
The top-down method searches for the highest hole which can fit the requirements.
Both of those walk the holes by the address index tree, but there is also the best method which looks at the holes by their size and tries to find the best fit.
The best method usually only needs a single tree walk instead of a linear search.
On 23.04.20 at 15:57, Gerd Hoffmann wrote:
But cursor BOs are small, so they don't make much of a difference. What is needed is space for 2 primary framebuffers during pageflips, with one of them pinned. The other framebuffer can be located anywhere.
The problem isn't the size. The problem is dynamically allocated cursor BOs can also fragment vram, especially if top-bottom allocation is also used for large framebuffers so cursor BOs can end up somewhere in the middle of vram.
Well maybe instead of a ping/pong approach between top and bottom use a size based approach to place small BOs at the top and only the larger ones at the bottom (or the other way around).
That's what the current code does and it results in the OOM. Basically, there's a large BO in the middle of vram and the pageflip's destination BO does not fit before or after the large one.
Tough problem. No idea how to otherwise fix this without making the large BO moveable somehow.
Christian.
Best regards Thomas
Cheers, Christian.
Hi Gerd
On 23.04.20 at 15:57, Gerd Hoffmann wrote:
I don't think it is that simple.
First: How will that interact with cursor bo allocations? IIRC the strategy for them is to allocate top-down, for similar reasons (to avoid having small cursor bo allocs fragment vram memory).
In ast, 2 cursor BOs are allocated during driver initialization and kept permanently at the vram's top end. I don't know about other drivers.
One-time allocation at init time shouldn't be a problem.
But cursor BOs are small, so they don't make much of a difference. What is needed is space for 2 primary framebuffers during pageflips, with one of them pinned. The other framebuffer can be located anywhere.
The problem isn't the size. The problem is dynamically allocated cursor BOs can also fragment vram, especially if top-bottom allocation is also used for large framebuffers so cursor BOs can end up somewhere in the middle of vram.
But the problem is the size. Cursor BOs are unrelated. Of the vram-helper-based drivers, only ast and mgag200 use cursors, and both perma-pin them to the end of vram.
Second: I think ttm will move bo's from vram to system only on memory pressure. So you can still end up with fragmented memory. To make the scheme with one fb @ top and one @ bottom work reliably you have to be more aggressive about pushing out framebuffers.
I'm in the process of converting mgag200 to atomic modesetting. The given example is what I observed. I'm not claiming that the placement scheme is perfect, but it is required to get mgag200 working with atomic modesetting's pageflip logic. So we're solving a real problem here.
I don't doubt this is a real problem.
The bug comes from Weston's allocation strategy. Looking at the debug output:
0x0000000000000000-0x000000000000057f: 1407: free
This was fbdev's framebuffer with 1600x900@32bpp
0x000000000000057f-0x0000000000000b5b: 1500: used
This is Weston's framebuffer also with 1600x900@32bpp. But Weston allocates an additional, unused 60 scanlines. That is to render with tiles of 64x64px, I suppose. fbdev doesn't do that, hence Weston's second framebuffer doesn't fit into the free location of the fbdev framebuffer.
Sure. Assume there is just enough vram to fit in fbdev and two weston framebuffers. fbdev is allocated from bottom, first weston fb from top, second weston fb from bottom again. fbdev is not pushed out (no memory pressure yet) so the second weston fb ends up in the middle of vram fragmenting things. And now you are again in the situation where you might have enough free vram for an allocation but can't use it due to fragmentation (probably harder to trigger in practice though).
Not quite. Framebuffer BOs of the current or smaller size will fit into vram. It's only a problem if the size of the new framebuffer BO increases. And that is exactly what currently happens with mgag200.
That aside, it's not a fair point, as you constructed an example that no memory manager can resolve.
That's why I would suggest to explicitly move out unpinned framebuffers (aka fbdev) before pinning a new one (second weston fb) instead of depending on ttm moving things out on OOM, to make sure you never allocate something in the middle of vram.
We cannot do that. Evicting BOs from vram involves an unmap operation. We did that in an earlier version of the code and received reports about performance regressions and CPU cycles spent on TLB updates.
So we added the lazy-unmap feature, where BOs are only unmapped and evicted when necessary. I think it was even you who suggested this idea. :)
Third: I'd suggest making topdown allocations depend on the current state instead of simply alternating, i.e. if there is a pinned framebuffer @ offset 0, then go for top-down.
That's what the current patch does. If the last pin was at the bottom, the next goes to the top. And then the other way around. Without alternating between both ends of vram, the problem would occur again when fragmentation happens near the top end.
I'd feel better when checking the state of my current pins to figure out whether I should alloc top-bottom or not, for robustness reasons.
I don't understand why this is more robust. For example, if you pin a larger number of BOs, and then evict every other BO, there will always be free areas among the remaining BOs. All that changes is the location of those areas.
For a strategy, one cannot look at the BO size alone. In my initial example, BOs are ~6 MiB and vram is 16 MiB. So any strategy a la 'choose top-bottom for BOs >1/3 of vram' selects top-bottom for all framebuffer BOs. This would result in the same OOM, but from top to bottom.
At some point one has to choose to switch to top-down, and then back again at one of the next BOs. So the current patch effectively splits vram into a lower half and an upper half and puts BOs in alternating halves.
Best regards Thomas
Looking again at the vram helpers, this functionality could be implemented in drm_gem_vram_plane_helper_prepare_fb(). Drivers with other placement strategies could implement their own helper for prepare_fb.
vram helpers could also simply offer two prepare_fb variants.
cheers, Gerd
Hi,
At some point one has to choose to switch to top-down, and then back again at one of the next BOs. So the current patch effectively splits vram into a lower half and an upper half and puts BOs in alternating halves.
Hmm, so maybe just make the split explicit instead of playing tricks with top_bottom? Register the lower vram half as TT_VRAM, register the upper half as TT_PRIV. That'll make sure you never have a big allocation in the middle blocking things. Or maybe add an allocation strategy flag to ttm which effectively does the same.
take care, Gerd
Hi Gerd
On 24.04.20 at 11:38, Gerd Hoffmann wrote:
Hi,
At some point one has to choose to switch to top-down, and then back again at one of the next BOs. So the current patch effectively splits vram into a lower half and an upper half and puts BOs in alternating halves.
Hmm, so maybe just make the split explicit instead of playing tricks with top_bottom? Register the lower vram half as TT_VRAM, register the upper half as TT_PRIV. That'll make sure you never have a big allocation in the middle blocking things. Or maybe add an allocation strategy flag to ttm which effectively does the same.
It's not that easy. Current cursors in ast are statically allocated. As soon as you add dynamic cursors into the mix, you'd get OOMs. If the framebuffer BO goes into VRAM and the cursor BO goes into PRIV, pinning the next framebuffer BO during a pageflip would send it to VRAM. But VRAM is still filled with the first framebuffer BO. So the memory manager needs some flexibility with the placement. Handling all that is already TTM's job.
I'm preparing v2 of this patch set. It'll get static cursors out of the way and make the feature opt-in.
Best regards Thomas
take care, Gerd
Hi,
It's not that easy. Current cursors in ast are statically allocated. As soon as you add dynamic cursors into the mix, you'd get OOMs.
Well, with the split you can simply allocate dynamic cursors with top-bottom to keep them out of the way. It's not 100% perfect, the area where the cursors are allocated from has a bit less free vram, so the split effectively isn't 50/50 but more like 49/51. But hey, still a lot better than what we have today.
With two static cursors you can allocate one from TT_VRAM and one from TT_PRIV so the remaining free vram is the same in both regions.
If the framebuffer BO goes into VRAM and the cursor BO goes into PRIV, pinning the next framebuffer BO during a pageflip would send it to VRAM.
Oh, seems my idea outline was a bit too short. The plan is *not* to alternate between VRAM and PRIV on allocations. The plan is to add both PRIV and VRAM to the placement array when pinning the framebuffer.
ttm can simply place the framebuffers as it pleases and where it wants. Due to the split it can't do a big blocking allocation in the middle of vram though.
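(Rough sketch of such a pin-time placement list, not code from the thread; it assumes the two halves are registered as TTM's VRAM and PRIV memory types, and the array/struct names are made up for illustration.)

#include <linux/kernel.h>
#include <drm/ttm/ttm_placement.h>

/* Illustration only: list both video-memory regions when pinning a
 * framebuffer and let TTM pick whichever one has room. */
static const struct ttm_place fb_pin_places[] = {
	{ .flags = TTM_PL_FLAG_WC | TTM_PL_FLAG_UNCACHED | TTM_PL_FLAG_VRAM },
	{ .flags = TTM_PL_FLAG_WC | TTM_PL_FLAG_UNCACHED | TTM_PL_FLAG_PRIV },
};

static const struct ttm_placement fb_pin_placement = {
	.num_placement = ARRAY_SIZE(fb_pin_places),
	.placement = fb_pin_places,
	.num_busy_placement = ARRAY_SIZE(fb_pin_places),
	.busy_placement = fb_pin_places,
};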
If you are going to pageflip you should have one framebuffer already pinned (the one currently displayed). If that happens to live in TT_VRAM, ttm should be able to make room in TT_PRIV to pin the new framebuffer you want to pageflip to, and vice versa.
ttm will only evict unpinned BOs if needed; when running with smaller framebuffers ttm will happily keep more than two framebuffers in vram. All fully automatic without vram helpers having to pay much attention to the allocation strategy.
I'm preparing v2 of this patch set. It'll get static cursors out of the way and make the feature opt-in.
I think with the split model there is no need to make that optional.
cheers, Gerd
On 29.04.20 at 09:25, Gerd Hoffmann wrote:
Hi,
It's not that easy. Current cursors in ast are statically allocated. As soon as you add dynamic cursors into the mix, you'd get OOMs.
Well, with the split you can simply allocate dynamic cursors with top-bottom to keep them out of the way. It's not 100% perfect, the area where the cursors are allocated from has a bit less free vram, so the split effectively isn't 50/50 but more like 49/51. But hey, still a lot better than what we have today.
With two static cursors you can allocate one from TT_VRAM and one from TT_PRIV so the remaining free vram is the same in both regions.
If the framebuffer BO goes into VRAM and the cursor BO goes into PRIV, pinning the next framebuffer BO during a pageflip would send it to VRAM.
Oh, seems my idea outline was a bit too short. The plan is *not* to alternate between VRAM and PRIV on allocations. The plan is to add both PRIV and VRAM to the placement array when pinning the framebuffer.
ttm can simply place the framebuffers as it pleases and where it wants. Due to the split it can't do a big blocking allocation in the middle of vram though.
There is an easier and most likely less CPU-intensive way of doing this.
All you need to do is make allocations below half of the address space allocate from the beginning of a hole, and allocations above half of the address space allocate from the end of a hole.
This way you should always keep the largest hole in the middle.
This can be done with another flag to drm_mm and would most likely be a nice addition to other drivers as well.
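(Pseudocode sketch of that rule, not actual drm_mm code; the function and its parameters are made up, and in practice the decision would live inside drm_mm's hole search.)

#include <linux/types.h>

/* Illustration only: allocations in the lower half of the address space
 * grow upward from the start of a hole, allocations in the upper half grow
 * downward from the end of a hole, so the largest free area tends to stay
 * in the middle. Assumes the hole is large enough for the request. */
static u64 place_in_hole(u64 hole_start, u64 hole_end, u64 size, u64 vram_size)
{
	if (hole_start + size <= vram_size / 2)
		return hole_start;	/* bottom-up in the lower half */

	return hole_end - size;		/* top-down in the upper half */
}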
Regards, Christian.
If you are going to pageflip you should have one framebuffer already pinned (the one currently displayed). If that happens to live in TT_VRAM, ttm should be able to make room in TT_PRIV to pin the new framebuffer you want to pageflip to, and vice versa.
ttm will only evict unpinned BOs if needed; when running with smaller framebuffers ttm will happily keep more than two framebuffers in vram. All fully automatic without vram helpers having to pay much attention to the allocation strategy.
I'm preparing v2 of this patch set. It'll get static cursors out of the way and make the feature opt-in.
I think with the split model there is no need to make that optional.
cheers, Gerd