Re: ✗ Fi.CI.BAT: failure for series starting with [1/6] dma-buf: add dynamic DMA-buf handling v13

8 Aug 2019

On Thu, Aug 8, 2019 at 9:09 AM Koenig, Christian <Christian.Koenig@amd.com> wrote:
...
Am 07.08.19 um 23:19 schrieb Daniel Vetter:
...
On Wed, Jul 31, 2019 at 10:55:02AM +0200, Daniel Vetter wrote:
...
On Thu, Jun 27, 2019 at 09:28:11AM +0200, Christian König wrote:
...
Hi Daniel,
those failures look random to me and unrelated to my patch set.
Correct?
First one I looked at has the reservation_obj all over:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13438/fi-cml-u/igt@gem_ex...
So 5 second guess is ... probably real?
Note that with the entire lmem stuff going on right now there's massive
discussions about how we're doing resv_obj vs obj->mm.lock the wrong way
round in i915, so I'm not surprised at all that you managed to trip this.
The way I see it right now is that obj->mm.lock needs to be limited to
dealing with the i915 shrinker interactions only, and only for i915 native
objects. And for dma-bufs we need to make sure it's not anywhere in the
callchain. Unfortunately that's a massive refactor I guess ...
Thought about this some more: aside from just breaking i915 or waiting
until it's refactored (both not awesome), I think the only option is to get
back to the original caching. And figure out whether we really need to
take the direction into account for that, or whether upgrading to
bidirectional unconditionally would be ok. I think there are really only two
cases where this matters:

- Display drivers using the cma/dma_alloc helpers. Everything is allocated
  fully coherent, cpu side wc, no flushing.

- Everyone else (on platforms where there's actually some flushing going
  on) is for rendering gpus, and those always map bidirectional and want
  the mapping cached for as long as possible.

With that we could go back to creating the cached mapping at attach time
and avoid inflicting the reservation object lock on places that would keel
over.
Thoughts?
Actually we had a not-so-nice internal mail thread with our hardware
guys, and it looks like we have tons of hardware bugs/exceptions where
PCIe BARs are sometimes only readable or only writable. So it turned out
that always caching with bidirectional won't work for us either.
In addition to that, I'm not sure how i915 actually triggered the issue,
because with the current code that shouldn't be possible.
Forgot to explain this: i915 has its own lock for managing buffer
state, originally struct_mutex, now also i915_gem_obj->mm.lock. When
importing we take both of these before calling into the exporter; when
exporting we need these when getting called from the importer. If the
importer uses the reservation_object lock you get a classic ABBA
deadlock.
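To make the inversion concrete, here is a minimal user-space sketch of the two call paths described above. It is not kernel code and not how lockdep is implemented: the tracker, the lock class names and the two path functions are all hypothetical, and it only records the order in which two lock classes are taken and flags the reverse order when it shows up.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the two locks discussed above. */
enum lock_class { LOCK_I915_MM, LOCK_RESV, LOCK_CLASS_COUNT };

/* Toy lock-order tracker: taken_under[a][b] means "a was held while b was taken". */
struct order_tracker {
    bool taken_under[LOCK_CLASS_COUNT][LOCK_CLASS_COUNT];
    bool held[LOCK_CLASS_COUNT];
};

void tracker_init(struct order_tracker *t)
{
    memset(t, 0, sizeof(*t));
}

/* Returns false if taking cls now inverts an order that was recorded earlier. */
bool tracker_acquire(struct order_tracker *t, enum lock_class cls)
{
    bool ok = true;

    for (int h = 0; h < LOCK_CLASS_COUNT; h++) {
        if (!t->held[h] || h == (int)cls)
            continue;
        if (t->taken_under[cls][h])
            ok = false;               /* seen before: cls first, then h */
        t->taken_under[h][cls] = true; /* now: h first, then cls */
    }
    t->held[cls] = true;
    return ok;
}

void tracker_release(struct order_tracker *t, enum lock_class cls)
{
    t->held[cls] = false;
}

/* i915 imports: it holds its own lock, then the exporter takes the resv lock. */
bool i915_as_importer(struct order_tracker *t)
{
    bool ok = tracker_acquire(t, LOCK_I915_MM);

    ok = tracker_acquire(t, LOCK_RESV) && ok;
    tracker_release(t, LOCK_RESV);
    tracker_release(t, LOCK_I915_MM);
    return ok;
}

/* i915 exports: the importer holds the resv lock, then i915 takes its own lock. */
bool i915_as_exporter(struct order_tracker *t)
{
    bool ok = tracker_acquire(t, LOCK_RESV);

    ok = tracker_acquire(t, LOCK_I915_MM) && ok;
    tracker_release(t, LOCK_I915_MM);
    tracker_release(t, LOCK_RESV);
    return ok;
}
```

Running i915_as_importer() and then i915_as_exporter() against the same tracker reports the second path as an inversion, since the first path recorded mm.lock before the reservation lock. With two real threads taking real mutexes in those two orders, each could end up holding one lock and waiting for the other.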
I thought the plan was to push struct_mutex down and obj->mm.lock up
until they meet in the middle and we can start using the resv_obj
ww_mutex for everything. But looking at some of the latest in-flight
patches (I cc'ed you on them) that seems to not really be the plan,
which is bad :-/
-Daniel
...
But independent of that, I came to the conclusion that we first need to
get to a common view of what the fences in the reservation object mean,
otherwise the whole stuff here isn't going to work smoothly either.
So I'm working on that for now, and when that's finished I will come back
to this problem here again.
Regards,
Christian.
...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
