Re: [PATCH v2 1/2] drm/ttm: Only allocate huge pages with new flag TTM_PAGE_FLAG_TRANSHUGE

30 Apr 2018


      On 2018-04-29 09:02 AM, Christian König wrote:
...
Am 29.04.2018 um 01:02 schrieb Michel Dänzer:
...
On 2018-04-28 06:30 PM, Ilia Mirkin wrote:
...
On Fri, Apr 27, 2018 at 9:08 AM, Michel Dänzer michel@daenzer.net
wrote:
...
From: Michel Dänzer michel.daenzer@amd.com
Previously, TTM would always (with CONFIG_TRANSPARENT_HUGEPAGE enabled)
try to allocate huge pages. However, not all drivers can take advantage
of huge pages, but they would incur the overhead for allocating and
freeing them anyway.
Now, drivers which can take advantage of huge pages need to set the new
flag TTM_PAGE_FLAG_TRANSHUGE to get them. Drivers not setting this flag
no longer incur any overhead for allocating or freeing huge pages.
v2:

Also guard swapping of consecutive pages in ttm_get_pages
Reword commit log, hopefully clearer now

Cc: stable@vger.kernel.org
Signed-off-by: Michel Dänzer michel.daenzer@amd.com
Both I and lots of other people, based on reports, are still seeing
plenty of issues with this as late as 4.16.4.
"lots of other people", "plenty of issues" sounds a bit exaggerated from
what I've seen. FWIW, while I did see the original messages myself, I
haven't seen any since Christian's original fix (see below), neither
with amdgpu nor radeon, even before this patch you followed up to.
...
Admittedly I'm on nouveau, but others have reported issues with
radeon/amdgpu as well. It's been going on since the feature was merged
in v4.15, with what seems like little investigation from the authors
introducing the feature.
That's not a fair assessment. See
https://bugs.freedesktop.org/show_bug.cgi?id=104082#c40 and following
comments.
Christian fixed the original issue in
d0bc0c2a31c95002d37c3cc511ffdcab851b3256 "swiotlb: suppress warning when
__GFP_NOWARN is set". Christian did his best to try and get the fix in
before 4.15 final, but for reasons beyond his control, it was delayed
until 4.16-rc1 and then backported to 4.15.5.
Unfortunately, there was an swiotlb regression (not directly related to
Christian's work) shortly after this fix, also in 4.16-rc1, which is now
fixed in 4.17-rc1 and will be backported to 4.16.y.
And that's exactly the reason why I intentionally kept this enabled for
all users of the TTM DMA page pool and not put it behind a flag.
This change has surfaced quite a number of bugs in the swiotlb code
which could have caused issues before. It's just that those code path
where never exercised massively before.
Additional to that using huge pages is beneficial for the MM and CPU TLB
(not implemented yet) even when the GPU driver can't make much use of it.
Do I understand correctly that you're against this patch?
AFAIU the only benefit of huge pages with a driver which doesn't take
advantage of them directly is "for the MM". Can you describe a bit more
what that benefit is exactly? Is it expected to outweigh the cost of
allocating / freeing huge pages?
...
...
It looks like there's at least one more bug left, but it's not clear yet
when that was introduced, whether it's directly related to Christian's
work, or indeed what the impact is. Let's not get ahead of ourselves.
Well my patches surfaced the problems, but the underlying issues where
present even before those changes and I'm very well involved in fixing
the underlying issues.
I even considered to just revert the huge page path for the DMA pool
allocator, but it's just that the TTM patches seem to work exactly as
they are intended. So that doesn't feel like doing the right thing here.
I agree. Has anyone reported this to the DMA/SWIOTLB developers?
-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

Re: [PATCH v2 1/2] drm/ttm: Only allocate huge pages with new flag TTM_PAGE_FLAG_TRANSHUGE