Re: [pull] amdgpu, radeon, ttm, sched drm-next-5.13

8 Apr 2021

      On Thu, Apr 8, 2021 at 6:28 AM Christian König
ckoenig.leichtzumerken@gmail.com wrote:
...
Am 08.04.21 um 09:13 schrieb Christian König:
...
Am 07.04.21 um 21:04 schrieb Alex Deucher:
...
On Wed, Apr 7, 2021 at 3:23 AM Dave Airlie airlied@gmail.com wrote:
...
On Wed, 7 Apr 2021 at 06:54, Alex Deucher alexdeucher@gmail.com
wrote:
...
On Fri, Apr 2, 2021 at 12:22 PM Christian König
ckoenig.leichtzumerken@gmail.com wrote:
...
Hey Alex,
the TTM and scheduler changes should already be in the drm-misc-next
branch (not 100% sure about the TTM patch, need to double check
next week).
The TTM change is not in drm-misc yet.
...
Could that cause problems when both are merged into drm-next?
Dave, Daniel, how do you want to handle this?  The duplicated patch
is this one:
https://cgit.freedesktop.org/drm/drm-misc/commit/?id=ac4eb83ab255de9c31184df...
amdgpu has changes which depend on it.  The same patch is included
in this PR.
Ouch not sure how best to sync up here, maybe get misc-next into my
tree then rebase your tree on top of it?
I can do that.
Please let me double check later today that we have everything we need
in drm-misc-next.
There were two patches for TTM (one from Felix and one from Oak) which
still needed to be pushed to drm-misc-next. I've done that just a minute
ago.
They were included in this PR.
...
Then we have this patch which fixes a bug in code removed on
drm-misc-next. I think it should be dropped when amd-staging-drm-next is
based on drm-next/drm-misc-next.
Author: xinhui pan xinhui.pan@amd.com
Date:   Wed Feb 24 11:28:08 2021 +0800
 drm/ttm: Do not add non-system domain BO into swap list

Ok.
...
I've also found the following patch which is problematic as well:
commit c8a921d49443025e10794342d4433b3f29616409
Author: Jack Zhang Jack.Zhang1@amd.com
Date:   Mon Mar 8 12:41:27 2021 +0800
 drm/amd/amdgpu implement tdr advanced mode

 [Why]
 Previous tdr design treats the first job in job_timeout as the bad job.
 But sometimes a later bad compute job can block a good gfx job and
 cause an unexpected gfx job timeout because gfx and compute ring share
 internal GC HW mutually.

 [How]
 This patch implements an advanced tdr mode. It involves an additional
 synchronous pre-resubmit step (Step0 Resubmit) before normal resubmit
 step in order to find the real bad job.

 1. At Step0 Resubmit stage, it synchronously submits and pends for the
 first job being signaled. If it gets timeout, we identify it as guilty
 and do hw reset. After that, we would do the normal resubmit step to
 resubmit left jobs.

 2. For whole gpu reset(vram lost), do resubmit as the old way.

 Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
 Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
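
The two steps in the commit message above can be modelled roughly as follows. This is only a toy sketch in Python, not the amdgpu or scheduler code: the names `Job`, `TdrResult` and `advanced_tdr`, and the `hangs` flag standing in for "the job never signals", are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    name: str
    hangs: bool  # hypothetical stand-in for "this job never signals"


@dataclass
class TdrResult:
    guilty: Optional[str] = None   # name of the job identified as bad, if any
    hw_reset: bool = False         # whether Step0 triggered a hw reset
    resubmitted: List[str] = field(default_factory=list)


def advanced_tdr(pending: List[Job], vram_lost: bool = False) -> TdrResult:
    """Toy model of the recovery flow described in the commit message."""
    if vram_lost or not pending:
        # Whole-GPU reset (VRAM lost): resubmit everything the old way,
        # without trying to single out a guilty job.
        return TdrResult(resubmitted=[j.name for j in pending])

    first, rest = pending[0], pending[1:]
    result = TdrResult()

    # Step0 Resubmit: submit only the first job and wait for it synchronously.
    if first.hangs:
        # It timed out on its own, so it is the real bad job: reset the hw.
        result.guilty = first.name
        result.hw_reset = True

    # Normal resubmit step: resubmit the jobs that are left.
    result.resubmitted = [j.name for j in rest]
    return result
```

In this model, a good gfx job queued ahead of a hanging compute job is no longer blamed: `advanced_tdr([Job("gfx", False), Job("compute", True)])` reports no guilty job from Step0 and resubmits only `compute`.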

That one is modifying both amdgpu as well as the scheduler code. IIRC I
actually requested that the patch is split into two, but that was
somehow not done.
How should we proceed here? Should I separate the patch, push the
changes to drm-misc-next and then we merge with drm-next and rebase
amd-staging-drm-next on top of that?
That's most likely the cleanest approach as far as I can see.
That's fine with me.  We could have included them in my PR.  Now we
have to wait for drm-misc-next to be merged again before we can merge the
amdgpu code.  Is anyone planning to do another drm-misc merge at this
point?
Alex
...
Thanks,
Christian.
...
Regards,
Christian.
...
Alex
...
Dave.
