Re: [PATCH v11] drm/amdgpu: add drm buddy support to amdgpu

29 Mar 2022

      On 29/03/22 9:30 pm, Arunpravin Paneer Selvam wrote:
...
On 29/03/22 4:54 pm, Christian König wrote:
...
Am 29.03.22 um 13:19 schrieb Arunpravin Paneer Selvam:
...
[SNIP]
...
...

pages_left = node->base.num_pages;
i = 0;

spin_lock(&mgr->lock);
while (pages_left) {
uint32_t alignment = tbo->page_alignment;

if (tbo->page_alignment)

	min_page_size = tbo->page_alignment << PAGE_SHIFT;

else

	min_page_size = mgr->default_page_size;

The handling here looks extremely awkward to me.
min_page_size should be determined outside of the loop, based on default_page_size, alignment and contiguous flag.
I kept min_page_size determine logic inside the loop for cases 2GiB+
requirements, Since now we are round up the size to the required
alignment, I modified the min_page_size determine logic outside of the
loop in v12. Please review.
Ah! So do we only have the loop so that each allocation isn't bigger 
than 2GiB? If yes couldn't we instead add a max_alloc_size or something 
similar?
yes we have the loop to limit the allocation not bigger than 2GiB, I
think we cannot avoid the loop since we need to allocate the remaining
pages if any, to complete the 2GiB+ size request. In other words, first
iteration we limit the max size to 2GiB and in the second iteration we
allocate the left over pages if any.
Hi Christian,
Here my understanding might be incorrect, should we limit the max size =
2GiB and skip all the remaining pages for a 2GiB+ request?
Thanks,
Arun
...
...
BTW: I strongly suggest that you rename min_page_size to min_alloc_size. 
Otherwise somebody could think that those numbers are in pages and not 
bytes.
modified in v12
...
...
...
Then why do you drop the lock and grab it again inside the loop? And what is "i" actually good for?
modified the lock/unlock placement in v12.
"i" is to track when there is 2GiB+ contiguous allocation request, first
we allocate 2GiB (due to SG table limit) continuously and the remaining
pages in the next iteration, hence this request can't be a continuous.
To set the placement flag we make use of "i" value. In our case "i"
value becomes 2 and we don't set the below flag.
node->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
If we don't get such requests, I will remove "i".
I'm not sure if that works.
As far as I can see drm_buddy_alloc_blocks() can allocate multiple 
blocks at the same time, but i is only incremented when we loop.
So what you should do instead is to check if node->blocks just contain 
exactly one element after the allocation but before the trim.
ok
...
...
...
...

/* Limit maximum size to 2GB due to SG table limitations */

pages = min(pages_left, 2UL << (30 - PAGE_SHIFT));

if (pages >= pages_per_node)

	alignment = pages_per_node;

r = drm_mm_insert_node_in_range(mm, &node->mm_nodes[i], pages,

				alignment, 0, place->fpfn,

				lpfn, mode);

if (unlikely(r)) {

	if (pages > pages_per_node) {

		if (is_power_of_2(pages))

			pages = pages / 2;

		else

			pages = rounddown_pow_of_two(pages);

		continue;

	}

	goto error_free;

	min_page_size = pages_per_node << PAGE_SHIFT;

if (!is_contiguous && !IS_ALIGNED(pages, min_page_size >> PAGE_SHIFT))

	is_contiguous = 1;

if (is_contiguous) {

	pages = roundup_pow_of_two(pages);

	min_page_size = pages << PAGE_SHIFT;

	if (pages > lpfn)

		lpfn = pages;
}

vis_usage += amdgpu_vram_mgr_vis_size(adev, &node->mm_nodes[i]);

amdgpu_vram_mgr_virt_start(&node->base, &node->mm_nodes[i]);

pages_left -= pages;

BUG_ON(min_page_size < mm->chunk_size);

mutex_lock(&mgr->lock);

r = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,

			   (u64)lpfn << PAGE_SHIFT,

			   (u64)pages << PAGE_SHIFT,

			   min_page_size,

			   &node->blocks,

			   node->flags);

mutex_unlock(&mgr->lock);

if (unlikely(r))

	goto error_free_blocks;

++i;

if (pages > pages_left)

	pages = pages_left;

	pages_left = 0;

else

	pages_left -= pages;

}

spin_unlock(&mgr->lock);

if (i == 1)

/* Free unused pages for contiguous allocation */
if (is_contiguous) {

Well that looks really odd, why is trimming not part of
drm_buddy_alloc_blocks() ?
we didn't place trim function part of drm_buddy_alloc_blocks since we
thought this function can be a generic one and it can be used by any
other application as well. For example, now we are using it for trimming
the last block in case of size non-alignment with min_page_size.
Good argument. Another thing I just realized is that we probably want to 
double check if we only allocated one block before the trim.
ok
...
Thanks,
Christian.

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

Re: [PATCH v11] drm/amdgpu: add drm buddy support to amdgpu