Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use

30 Nov 2020


On Fri, Nov 27, 2020 at 11:04:55AM -0500, Andrey Grodzovsky wrote:
...
On 11/27/20 9:59 AM, Daniel Vetter wrote:
...
On Wed, Nov 25, 2020 at 02:34:44PM -0500, Andrey Grodzovsky wrote:
...
On 11/25/20 11:36 AM, Daniel Vetter wrote:
...
On Wed, Nov 25, 2020 at 01:57:40PM +0100, Christian König wrote:
...
Am 25.11.20 um 11:40 schrieb Daniel Vetter:
...
On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
> Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
> > On 11/24/20 2:41 AM, Christian König wrote:
> > > Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
> > > > On 11/23/20 3:41 PM, Christian König wrote:
> > > > > Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
> > > > > > On 11/23/20 3:20 PM, Christian König wrote:
> > > > > > > Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
> > > > > > > > On 11/25/20 5:42 AM, Christian König wrote:
> > > > > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > > > > > > > > > It's needed to drop iommu backed pages on device unplug
> > > > > > > > > > before device's IOMMU group is released.
> > > > > > > > > It would be cleaner if we could do the whole
> > > > > > > > > handling in TTM. I also need to double check
> > > > > > > > > what you are doing with this function.
> > > > > > > > > 
> > > > > > > > > Christian.
> > > > > > > > Check patch "drm/amdgpu: Register IOMMU topology
> > > > > > > > notifier per device." to see
> > > > > > > > > how I use it. I don't see why this should go
> > > > > > > > into TTM mid-layer - the stuff I do inside
> > > > > > > > is vendor specific and also I don't think TTM is
> > > > > > > > explicitly aware of IOMMU ?
> > > > > > > > Do you mean you prefer the IOMMU notifier to be
> > > > > > > > registered from within TTM
> > > > > > > > and then use a hook to call into vendor specific handler ?
> > > > > > > No, that is really vendor specific.
> > > > > > > 
> > > > > > > What I meant is to have a function like
> > > > > > > ttm_resource_manager_evict_all() which you only need
> > > > > > > to call and all tt objects are unpopulated.
> > > > > > > So instead of this BO list I create and later iterate in
> > > > > > amdgpu from the IOMMU patch you just want to do it
> > > > > > within
> > > > > > TTM with a single function ? Makes much more sense.
> > > > > Yes, exactly.
> > > > > 
> > > > > The list_empty() checks we have in TTM for the LRU are
> > > > > actually not the best idea, we should now check the
> > > > > pin_count instead. This way we could also have a list of the
> > > > > pinned BOs in TTM.
> > > > So from my IOMMU topology handler I will iterate the TTM LRU for
> > > > > the unpinned BOs and this new function for the pinned ones ?
> > > > It's probably a good idea to combine both iterations into this
> > > > new function to cover all the BOs allocated on the device.
> > > Yes, that's what I had in my mind as well.
> > > 
> > > > > BTW: Have you thought about what happens when we unpopulate
> > > > > a BO while we still try to use a kernel mapping for it? That
> > > > > could have unforeseen consequences.
> > > > Are you asking what happens to kmap or vmap style mapped CPU
> > > > accesses once we drop all the DMA backing pages for a particular
> > > > BO ? Because for user mappings
> > > > (mmap) we took care of this with dummy page reroute but indeed
> > > > nothing was done for in kernel CPU mappings.
> > > Yes exactly that.
> > > 
> > > In other words what happens if we free the ring buffer while the
> > > kernel still writes to it?
> > > 
> > > Christian.
> > While we can't control user application accesses to the mapped buffers
> > explicitly and hence we use page fault rerouting
> > > I am thinking that in this case we may be able to sprinkle
> > > drm_dev_enter/exit in any such sensitive place where we might
> > > CPU access a DMA buffer from the kernel ?
> Yes, I fear we are going to need that.
Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we
could stuff this into begin/end_cpu_access
Do you mean guarding with drm_dev_enter/exit in dma_buf_ops.begin/end_cpu_access
driver specific hook ?
...
...
...
(but only for the kernel, so a
bit tricky)?
Why only kernel ? Why is it a problem to do it if it comes from dma_buf_ioctl by
some user process ? And if we do need this distinction I think we should be able to
differentiate by looking at current->mm (i.e. mm_struct) pointer being NULL
for kernel thread.
Userspace mmap is handled by punching out the pte. So we don't need to do
anything special there.
For kernel mmap the begin/end should be all in the same context (so we
could use the srcu lock that works underneath drm_dev_enter/exit), since
at least right now kernel vmaps of dma-buf are very long-lived.
If by same context you mean the right drm_device (the exporter's one)
then this should be ok as I am seeing from amdgpu implementation
of the callback - amdgpu_dma_buf_begin_cpu_access. We just need to add
handler for .end_cpu_access callback to call drm_dev_exit there.
Same context = same system call essentially. You cannot hold locks while
returning to userspace. And current userspace can call the
begin/end_cpu_access callbacks through ioctls, so just putting a
drm_dev_enter/exit in them will break really badly. Iirc there's an igt
also for testing these ioctl - if there isn't we really should have one.
Hence why we need to be more careful here about who's calling and where we
can put the drm_dev_enter/exit.
-Daniel
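
To make the "same context" constraint concrete, here is a minimal userspace sketch of the guard pattern being discussed. It is not kernel code: the real drm_dev_enter(dev, &idx)/drm_dev_exit(idx) pair is backed by SRCU, while this model uses a plain flag and counter, and every name in it (model_dev, model_dev_enter, model_dev_exit, model_ring_write) is hypothetical. The point it tries to show is that enter and exit sit inside a single call, so no guard is still held when the call returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for a drm_device; all names here are hypothetical. */
struct model_dev {
	bool unplugged;      /* set once the device has been unplugged */
	int readers;         /* sections currently between enter and exit */
	uint32_t ring[4];    /* stands in for a kernel-mapped ring buffer */
};

/* Modelled on drm_dev_enter(): refuse entry once the device is gone. */
static bool model_dev_enter(struct model_dev *dev)
{
	if (dev->unplugged)
		return false;
	dev->readers++;
	return true;
}

/* Modelled on drm_dev_exit(): must pair with a successful enter. */
static void model_dev_exit(struct model_dev *dev)
{
	assert(dev->readers > 0);
	dev->readers--;
}

/* Enter and exit happen in the same call, so nothing is held on return. */
static bool model_ring_write(struct model_dev *dev, size_t idx, uint32_t val)
{
	if (idx >= 4 || !model_dev_enter(dev))
		return false;    /* backing pages may be gone: skip the access */
	dev->ring[idx] = val;
	model_dev_exit(dev);
	return true;
}
```

Splitting the enter into a begin_cpu_access-style entry point and the exit into a separate end_cpu_access-style entry point would leave readers non-zero across a return to the caller, which is the situation described above as breaking when userspace drives those callbacks through ioctls.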
...
Andrey
...
But the good news is that Thomas Zimmermann is working on this problem
already for different reasons, so it might be that we won't have any
long-lived kernel vmap anymore. And we could put the drm_dev_enter/exit in
there.
...
...
...
Oh very very good point! I haven't thought about DMA-buf mmaps in this
context yet.
...
btw the other issue with dma-buf (and even worse with dma_fence) is
refcounting of the underlying drm_device. I'd expect that all your
callbacks go boom if the dma_buf outlives your drm_device. That part isn't
yet solved in your series here.
Well thinking more about this, it seems to be another really good argument
why mapping pages from DMA-bufs into application address space directly is a
very bad idea :)
But yes, we essentially can't remove the device as long as there is a
DMA-buf with mappings. No idea how to clean that one up.
drm_dev_get/put in drm_prime helpers should get us like 90% there I think.
What are the other 10% ?
dma_fence, which is also about 90% of the work probably. But I'm
guesstimating only 10% of the oopses you can hit. Since generally the
dma_fences for a buffer don't outlive the underlying buffer. So usually no
problems happen when we've solved the dma-buf sharing, but the dma_fence
can outlive the dma-buf, so there's still possibilities of crashing.
...
...
The even more worrying thing is random dma_fence attached to the dma_resv
object. We could try to clean all of ours up, but they could have escaped
already into some other driver. And since we're talking about egpu
hotunplug, dma_fence escaping to the igpu is a pretty reasonable use-case.
I have no idea how to fix that one :-/
-Daniel
I assume you are referring to sync_file_create/sync_file_get_fence API  for
dma_fence export/import ?
So dma_fence is a general issue, there's a pile of interfaces that result
in sharing with other drivers:

dma_resv in the dma_buf
sync_file
drm_syncobj (but I think that's not yet cross driver, but probably
 changes)

In each of these cases drivers can pick up the dma_fence and use it
internally for all kinds of purposes (could end up in the scheduler or
wherever).
...
So with DMA bufs we have the drm_gem_object as exporter specific private data
and so we can do drm_dev_get and put at the drm_gem_object layer to bind
device life cycle
to that of each GEM object but, we don't have such mid-layer for dma_fence
which could allow
us to increment device reference for each fence out there related to that
device - is my understanding correct ?
Yeah that's the annoying part with dma-fence. No existing generic place to
put the drm_dev_get/put. tbf I'd note this as a todo and try to solve the
other problems first.
-Daniel
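
As a rough illustration of binding the device life cycle to each exported object, the following userspace sketch models the drm_dev_get/put idea with a plain reference count. It is only a model under stated assumptions: ref_dev, ref_buf and all the ref_* functions are hypothetical names, not DRM or dma-buf API, and the released flag exists only so the teardown can be observed from outside.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted drm_device. */
struct ref_dev {
	int refcount;
	bool *released;   /* set to true when the last reference is dropped */
};

/* Hypothetical stand-in for an exported buffer (GEM object / dma-buf). */
struct ref_buf {
	struct ref_dev *dev;
};

static void ref_dev_get(struct ref_dev *dev)
{
	dev->refcount++;
}

static void ref_dev_put(struct ref_dev *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0) {
		*dev->released = true;
		free(dev);
	}
}

/* The creator holds the initial reference, as a driver would. */
static struct ref_dev *ref_dev_create(bool *released)
{
	struct ref_dev *dev = malloc(sizeof(*dev));

	if (!dev)
		return NULL;
	dev->refcount = 1;
	dev->released = released;
	*released = false;
	return dev;
}

/* Exporting a buffer pins the device until the buffer is released. */
static struct ref_buf *ref_buf_export(struct ref_dev *dev)
{
	struct ref_buf *buf = malloc(sizeof(*buf));

	if (!buf)
		return NULL;
	ref_dev_get(dev);
	buf->dev = dev;
	return buf;
}

static void ref_buf_release(struct ref_buf *buf)
{
	struct ref_dev *dev = buf->dev;

	free(buf);
	ref_dev_put(dev);
}
```

In this model the driver can drop its own reference at unplug while the device structure survives until the last exported buffer is released. A fence has no comparable owner object in the sketch, which mirrors the gap noted above for dma_fence.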
...
Andrey
Andrey
...
...
Christian.
...
-Daniel
> > Things like CPU page table updates, ring buffer accesses and FW memcpy ?
> > Are there other places ?
> Puh, good question. I have no idea.
> 
> > Another point is that at this point the driver shouldn't access any such
> > buffers as we are in the process of finishing the device.
> > AFAIK there is no page fault mechanism for kernel mappings so I don't
> > think there is anything else to do ?
> Well there is a page fault handler for kernel mappings, but that one just
> prints the stack trace into the system log and calls BUG(); :)
> 
> Long story short we need to avoid any access to released pages after unplug.
> No matter if it's from the kernel or userspace.
> 
> Regards,
> Christian.
> 
> > Andrey
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
