https://bugs.freedesktop.org/show_bug.cgi?id=97634
Bug ID: 97634
Summary: [amdgpu SI] multigpu setup crashes during boot when dpm=1
Product: DRI
Version: DRI git
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: arek.rusi@gmail.com
Created attachment 126300 --> https://bugs.freedesktop.org/attachment.cgi?id=126300&action=edit kernel log
Tested on drm-next-4.9-wip at: 1) 832c6ef plus 2 patches from Tom and Michel (for other bugs); 2) 2c0d731. Same behavior on both.
With dpm=0, amdgpu doesn't complain and works alongside the Intel GPU.
It's hard to say whether this is a regression, because when I tried DRI_PRIME a few months ago, dpm didn't work at all.
--- Comment #1 from Arek Ruśniak arek.rusi@gmail.com --- Created attachment 126508 --> https://bugs.freedesktop.org/attachment.cgi?id=126508&action=edit dmesg log with latest drm-next-4.9-wip: 72bb0f5
In the attached log, `modprobe -r amdgpu` is at ~148 s and `modprobe amdgpu` is at ~162 s.
It's some kind of progress, because intel-gfx works, but amdgpu doesn't start at all. The error in dmesg looks the same as in https://bugs.freedesktop.org/show_bug.cgi?id=97801
kernel: drm-next-4.9-wip: 72bb0f5
--- Comment #2 from Alex Deucher alexdeucher@gmail.com --- Probably a duplicate of bug 97801.
--- Comment #3 from Arek Ruśniak arek.rusi@gmail.com --- Created attachment 126519 --> https://bugs.freedesktop.org/attachment.cgi?id=126519&action=edit dmesg log with latest drm-next-4.9-wip: 97231a9
Unfortunately it still fails: bug 97801 is solved now, but the kernel log looks similar to the first one:
[ 90.544035] amdgpu 0000:01:00.0: fb1: amdgpudrmfb frame buffer device
[drm:gfx_v6_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out
[drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on GFX ring (-110).
[drm:amdgpu_device_init [amdgpu]] *ERROR* ib ring test failed (-110).
[drm] Initialized amdgpu 3.6.0 20150101 for 0000:01:00.0 on minor 1
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
--- Comment #4 from Arek Ruśniak arek.rusi@gmail.com --- With the latest drm-next-4.9-wip and drm-next-4.9, the symptoms are the same as before.
--- Comment #5 from Arek Ruśniak arek.rusi@gmail.com --- I've just tried the newest drm-next-4.9-wip (1c22b05623e5e03ada5a767951eac3203b246be9),
and there is something new in the kernel log:
[ 3430.379659] amdgpu 0000:01:00.0: fb1: amdgpudrmfb frame buffer device
[ 3431.438851] [drm:gfx_v6_0_ring_test_ib] *ERROR* amdgpu: IB test timed out
[ 3431.438862] [drm:amdgpu_ib_ring_tests] *ERROR* amdgpu: failed testing IB on GFX ring (-110).
[ 3431.438866] [drm:amdgpu_device_init] *ERROR* ib ring test failed (-110).
[ 3431.871374] [drm:amdgpu_late_init] *ERROR* late_init of IP block <amdgpu_powerplay> failed -22
[ 3431.871381] amdgpu 0000:01:00.0: amdgpu_late_init failed
[ 3431.871386] amdgpu 0000:01:00.0: Fatal error during GPU init
[ 3431.871390] [drm] amdgpu: finishing device.
After that, only SysRq (and ssh) works.
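The error codes in these logs are negative Linux errno values: -110 is ETIMEDOUT (the IB test on the GFX ring timed out) and -22 is EINVAL (returned by the powerplay late_init). When comparing dmesg captures across kernel builds, a small helper can pull out the DRM `*ERROR*` lines and decode those codes. The following is a minimal, hypothetical Python sketch, not part of any kernel or DRM tooling; its regular expressions assume only the two line formats seen in this thread (with and without the ` [amdgpu]` suffix), and its errno table covers only the two values that appear here.

```python
import re

# Linux errno values seen in this thread (hypothetical minimal table, not exhaustive).
LINUX_ERRNO = {22: "EINVAL", 110: "ETIMEDOUT"}

# "[drm:<function>] *ERROR* <message>", optionally "[drm:<function> [amdgpu]]".
ERROR_RE = re.compile(r"\[drm:(\w+)(?: \[amdgpu\])?\] \*ERROR\* (.*)")
# A trailing negative error code, e.g. "(-110)." or "failed -22".
CODE_RE = re.compile(r"\(?(-\d+)\)?\.?\s*$")


def drm_errors(log_lines):
    """Return (function, message, errno_name) for every DRM *ERROR* line.

    errno_name is None when the message carries no trailing error code,
    and the bare number as a string when the code is not in LINUX_ERRNO.
    """
    results = []
    for line in log_lines:
        match = ERROR_RE.search(line)
        if match is None:
            continue  # not a DRM error line (e.g. "amdgpu_late_init failed")
        func, message = match.group(1), match.group(2).strip()
        code = CODE_RE.search(message)
        name = None
        if code is not None:
            value = -int(code.group(1))  # kernel returns -errno; flip the sign
            name = LINUX_ERRNO.get(value, str(value))
        results.append((func, message, name))
    return results
```

Fed the Comment #5 log, this returns three entries: the IB test timeout (no code), `amdgpu_ib_ring_tests` with ETIMEDOUT, and `amdgpu_late_init` with EINVAL.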
--- Comment #6 from Arek Ruśniak arek.rusi@gmail.com --- Created attachment 127149 --> https://bugs.freedesktop.org/attachment.cgi?id=127149&action=edit full dmesg for next-drm-4.9-wip: 1c22b05
--- Comment #7 from Robin KERDILES kerdiles.robin@orange.fr --- This is reproducible on kernel 4.9-rc7, even with the drm-fixes-4.9 branch from the agd5f repository merged: https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-fixes-4.9
The branch was pointing at bcfdd5d5105087e6f33dfeb08a1ca6b2c0287b61 when I merged it. I can use amdgpu with dpm=0, but performance is bad (not surprising, given that Southern Islands support is experimental). I would like to bisect, but as far as I know, dpm has never worked on amdgpu SI.
--- Comment #8 from Arek Ruśniak arek.rusi@gmail.com --- I don't have the multi-GPU hardware to test it anymore, so feel free to close it at any time, in any way you want.
--- Comment #9 from Robin KERDILES kerdiles.robin@orange.fr --- I was still getting issues with amdgpu when I tested Linux 4.9 with drm-next-4.10 merged, but the dmesg output changed (a different error code).
I will try another merge in the next few days and post the new error codes.
Martin Peres martin.peres@free.fr changed:
What       | Removed | Added
-----------+---------+---------
Resolution | ---     | MOVED
Status     | NEW     | RESOLVED
--- Comment #10 from Martin Peres martin.peres@free.fr --- -- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/96.