I'm trying to bring up amdgpu on a Carrizo A10 (Thinkpad e565 in case it matters) on FreeBSD. The driver is essentially unmodified from what is found in Linux 4.6 - relying on an extended version of FreeBSD's linuxkpi shims. The shims work well enough that i915/drm from 4.6 works extremely well on most hardware (I have yet to diagnose / fix the severe artifacts on Cherry Trail and Atom).
On my A10 ring 11 test is failing: https://gist.github.com/mattmacy/8e4a85072648eceb2445ad227dcc447c
On my friend's A12 based EliteBook ring initialization succeeds: https://gist.github.com/mattmacy/d1fac64ab5190bb2568d6480dfbd7ee6
With minor timing perturbations ring tests will fail as early as ring 0.
I'm hoping that one of the amdgpu developers might give me pointers on how to diagnose further and/or what bugs in the linuxkpi might be causing this. I know that I can selectively disable the rings, but that doesn't help fix the underlying problem.
Thanks in advance.
-M
Hi Matthew,
sounds like the UVD block doesn't want to initialize. No idea off hand why, could be anything. I would need the hardware here for a closer inspection.
For a workaround you can try to disable the UVD block using the ip_block_mask module parameter (it's a bitmask of enabled blocks, e.g. 0xffffffff means all blocks enabled; UVD is bit 7 on Carrizo IIRC).
Regards, Christian.
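A minimal sketch of the mask arithmetic behind that suggestion. It assumes UVD really is bit 7 on Carrizo, which the message above only recalls "IIRC"; the helper name `clear_block` is made up for illustration and is not part of the driver.

```python
ALL_BLOCKS = 0xFFFFFFFF  # every IP block enabled, as described in the message above


def clear_block(mask: int, bit: int) -> int:
    """Return mask with one block bit cleared, truncated to 32 bits."""
    return mask & ~(1 << bit) & 0xFFFFFFFF


uvd_bit = 7  # assumption from the thread ("IIRC"), not verified against the driver
mask = clear_block(ALL_BLOCKS, uvd_bit)
print(hex(mask))  # 0xffffff7f
```

On Linux the resulting value would be passed as the amdgpu ip_block_mask module parameter; how the linuxkpi port exposes that parameter is not stated in this thread.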
On 13.06.2016 at 03:35, Matthew Macy wrote:
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
---- On Mon, 13 Jun 2016 01:35:34 -0700 Christian König christian.koenig@amd.com wrote ----
Hi Matthew,
sounds like the UVD block doesn't want to initialize. No idea off hand why, could be anything. I would need the hardware here for a closer inspection.
For a workaround you can try to disable the UVD block using the ip_block_mask module parameter (it's a bitmask of enabled blocks, e.g. 0xffffffff means all blocks enabled; UVD is bit 7 on Carrizo IIRC).
When I clear bit 7 I get the following now:
Jun 14 07:58:18 trainwreck kernel: drmn0: fence driver on ring 10 use gpu addr 0x00000000400000b0, cpu addr 0x0xfffff800bd4320b0
Jun 14 07:58:18 trainwreck kernel: drmn0: fence driver on ring 11 use gpu addr 0x00000000400000c0, cpu addr 0x0xfffff800bd4320c0
Jun 14 07:58:19 trainwreck kernel: drmn0: SMU check loaded firmware failed, expecting 0x17f, getting 0x0[drm:0xffffffff826d4dc4s] *ERROR* amdgpu: smc start failed
Jun 14 07:58:19 trainwreck kernel: [drm:0xffffffff8269fc40s] *ERROR* hw_init 3 failed -22
Jun 14 07:58:19 trainwreck kernel: drmn0: amdgpu_init failed
Which is hard to correlate without spending a lot more quality time with the driver than I've had time for yet.
One thing that occurs to me is that Linux is usually compiled with gcc6 - has amdgpu ever been tested as compiled with clang?
Below is a list of the warnings I have to disable in order to get amdgpu to compile without disabling -Werror altogether. The -Wno-format is an artifact of clang or FreeBSD treating long long and uint64_t as distinct types, and the -Wno-pointer-arith is to accept the Linux convention of doing pointer arithmetic on void pointers. All the others are arguably oversights in the code (similar silencing has to be done in i915, but I've had better luck with it to date). I haven't fixed the warnings because I try to treat it as vendor code and minimize any local changes. Will you accept quasi-cosmetic patches from other operating systems / compilers?
Thanks.
-M
CWARNFLAGS+= -Wno-pointer-arith
CWARNFLAGS+= -Wno-pointer-sign ${CWARNFLAGS.${.IMPSRC:T}}

CWARNFLAGS.amdgpu_acpi.c= -Wno-int-conversion -Wno-missing-prototypes -Wno-unused-variable
CWARNFLAGS.amdgpu_amdkfd.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_bo_list.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_cs.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_device.c= -Wno-format -Wno-cast-qual
CWARNFLAGS.amdgpu_fence.c= -Wno-format
CWARNFLAGS.amdgpu_gfx.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_amdkfd_gfx_v7.c= -Wno-cast-qual
CWARNFLAGS.amdgpu_amdkfd_gfx_v8.c= -Wno-cast-qual
CWARNFLAGS.amdgpu_atpx_handler.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_ih.c= -Wno-cast-qual
CWARNFLAGS.amdgpu_ioc32.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_object.c= -Wno-format
CWARNFLAGS.amdgpu_mn.c= -Wno-unused-variable
CWARNFLAGS.amdgpu_pll.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_pm.c= -Wno-missing-prototypes -Wno-enum-conversion
CWARNFLAGS.amdgpu_ring.c= -Wno-cast-qual
CWARNFLAGS.amdgpu_ttm.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_ucode.c= -Wno-incompatible-pointer-types-discards-qualifiers -Wno-cast-qual
CWARNFLAGS.amdgpu_uvd.c= -Wno-format
CWARNFLAGS.amdgpu_vce.c= -Wno-format
CWARNFLAGS.amdgpu_vce.c= -Wno-format
CWARNFLAGS.amdgpu_vm.c= -Wno-format
CWARNFLAGS.amdgpu_test.c= -Wno-format
CWARNFLAGS.amdgpu_vm.c= -Wno-format
CWARNFLAGS.atombios_crtc.c= -Wno-missing-prototypes
CWARNFLAGS.atombios_dp.c= -Wno-format
CWARNFLAGS.atombios_i2c.c= -Wno-missing-prototypes
CWARNFLAGS.ci_dpm.c= -Wno-unused-const-variable
CWARNFLAGS.cz_smc.c= -Wno-missing-prototypes
CWARNFLAGS.fiji_smc.c= -Wno-cast-qual
CWARNFLAGS.gfx_v7_0.c= -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.gfx_v8_0.c= -Wno-missing-prototypes
CWARNFLAGS.iceland_smc.c= -Wno-missing-prototypes
CWARNFLAGS.kv_dpm.c= -Wno-unused-const-variable
CWARNFLAGS.tonga_smc.c= -Wno-cast-qual
CWARNFLAGS.gpu_scheduler.c= -Wno-format -Wno-missing-prototypes
CWARNFLAGS.amd_powerplay.c= -Wno-missing-prototypes
CWARNFLAGS.eventtasks.c= -Wno-missing-prototypes
CWARNFLAGS.cz_clockpowergating.c= -Wno-missing-prototypes -Wno-enum-conversion
CWARNFLAGS.cz_hwmgr.c= -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.fiji_hwmgr.c= -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.fiji_thermal.c= -Wno-missing-prototypes
CWARNFLAGS.pp_acpi.c= -Wno-missing-prototypes
CWARNFLAGS.ppatomctrl.c= -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.processpptables.c= -Wno-missing-prototypes -Wno-sometimes-uninitialized
CWARNFLAGS.tonga_clockpowergating.c= -Wno-missing-prototypes -Wno-enum-conversion
CWARNFLAGS.tonga_hwmgr.c= -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.tonga_processpptables.c= -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.tonga_thermal.c= -Wno-missing-prototypes
CWARNFLAGS.tonga_smumgr.c= -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.fiji_smumgr.c= -Wno-missing-prototypes -Wno-cast-qual
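To get a rough idea of which warning classes dominate a list like the one above, the per-file flags can be tallied. This is only a sketch: the `tally` helper is hypothetical, and the fragment is a four-line excerpt copied from the list above rather than the full set.

```python
from collections import Counter

# Excerpt of the per-file flags quoted above (not the complete list).
fragment = """\
CWARNFLAGS.amdgpu_acpi.c= -Wno-int-conversion -Wno-missing-prototypes -Wno-unused-variable
CWARNFLAGS.amdgpu_device.c= -Wno-format -Wno-cast-qual
CWARNFLAGS.amdgpu_mn.c= -Wno-unused-variable
CWARNFLAGS.cz_hwmgr.c= -Wno-missing-prototypes -Wno-cast-qual
"""


def tally(text: str) -> Counter:
    """Count how many per-file entries silence each warning flag."""
    counts = Counter()
    for line in text.splitlines():
        name, sep, flags = line.partition("=")
        # Only per-file entries of the form CWARNFLAGS.<file>= are counted.
        if sep and name.startswith("CWARNFLAGS."):
            counts.update(flags.split())
    return counts


counts = tally(fragment)
for flag, n in counts.most_common():
    print(n, flag)
```

Running the same tally over the whole list would show which classes of cleanup patch touch the most files.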
Hi Matthew,
see inline below.
On 14.06.2016 at 00:03, Matthew Macy wrote:
When I clear bit 7 I get the following now:
Jun 14 07:58:18 trainwreck kernel: drmn0: fence driver on ring 10 use gpu addr 0x00000000400000b0, cpu addr 0x0xfffff800bd4320b0
Jun 14 07:58:18 trainwreck kernel: drmn0: fence driver on ring 11 use gpu addr 0x00000000400000c0, cpu addr 0x0xfffff800bd4320c0
Jun 14 07:58:19 trainwreck kernel: drmn0: SMU check loaded firmware failed, expecting 0x17f, getting 0x0[drm:0xffffffff826d4dc4s] *ERROR* amdgpu: smc start failed
Jun 14 07:58:19 trainwreck kernel: [drm:0xffffffff8269fc40s] *ERROR* hw_init 3 failed -22
Jun 14 07:58:19 trainwreck kernel: drmn0: amdgpu_init failed
UVD is optional (as long as you don't want to do hardware video decoding) but the SMU isn't. Alex, Rex any idea what's going wrong here?
Which is hard to correlate without spending a lot more quality time with the driver than I've had time for yet.
Yeah, I don't see why some blocks should fail while others seem to initialize just fine. Especially since you reported it seems to work on other hardware.
One thing that occurs to me is that Linux is usually compiled with gcc6 - has amdgpu ever been tested as compiled with clang?
Not as far as I know. We had some problems in the past even with some gcc versions because of some odd things in the BIOS headers (e.g. zero sized arrays). But those issues should be fixed by now.
Below is a list of the warnings I have to disable in order to get amdgpu to compile without disabling Werror altogether. The -Wno-format is an artifact of clang or FreeBSD treating long long and uint64_t as distinct types and the -Wno-pointer-arith is to accept the linux convention of doing pointer arithmetic on void pointers. All the others are arguably oversights in the code (similar silencing has to be done in i915, but I've had better luck with it to date). I haven't fixed the warnings because I try to treat it as vendor code and minimize any local changes. Will you accept quasi-cosmetic patches from other operating systems / compilers?
Yeah, sure, feel free to provide patches. As long as it is only cleanup and not structural changes it should be trivial to get them merged.
Especially "-Wno-missing-prototypes" and "-Wno-unused-variable" sound like something which should be trivial to fix.
Regards, Christian.
On Tue, Jun 14, 2016 at 4:10 AM, Christian König christian.koenig@amd.com wrote:
UVD is optional (as long as you don't want to do hardware video decoding) but the SMU isn't. Alex, Rex any idea what's going wrong here?
Seems like maybe the two issues are related. Maybe some general MMIO issue on that particular system or an issue with the MC or gart setup? The firmware that the SMU loads is stored in gart and all of the engine rings are in gart. Maybe a problem with the IOMMU setup on the CPU?
Alex
---- On Tue, 14 Jun 2016 06:02:09 -0700 Alex Deucher wrote ----
Seems like maybe the two issues are related. Maybe some general MMIO issue on that particular system or an issue with the MC or gart setup? The firmware that the SMU loads is stored in gart and all of the engine rings are in gart. Maybe a problem with the IOMMU setup on the CPU?
The two issues are definitely related. They both go through a bounded delay loop waiting for some operation to complete.
By default FreeBSD doesn't use the IOMMU on x86 so that's not an issue.
One thing that is different between the Elitebook (A12) and the Thinkpad (A10) is that the Thinkpad has both integrated and discrete GPUs. 0x6660 matches Hainan in drm_pciids.h, which I guess is GCN 1.0? Could that possibly be an issue? I know amdgpu doesn't support pre GCN 1.1 currently, so I would assume it would just be ignored. Nonetheless, I thought I should bring it up just in case.
vgapci0@pci0:0:1:0: class=0x030000 card=0x511617aa chip=0x98741002 rev=0xc5 hdr=0x00
    vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device = 'Carrizo'
    class = display
    subclass = VGA
<...>
vgapci1@pci0:5:0:0: class=0x038000 card=0x511617aa chip=0x66601002 rev=0x83 hdr=0x00
    vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device = 'Sun XT [Radeon HD 8670A/8670M/8690M / R5 M330]'
    class = display
Thanks. -M
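For readers matching the pciconf output above against a PCI ID table: the 32-bit chip value packs the device ID in the upper 16 bits and the vendor ID in the lower 16. A small sketch of the split follows; the helper name `split_chip` is made up for illustration.

```python
def split_chip(chip: int) -> tuple[int, int]:
    """Split a pciconf 'chip' value into (device_id, vendor_id)."""
    return (chip >> 16) & 0xFFFF, chip & 0xFFFF


# The two chip values from the pciconf listing above.
for chip in (0x98741002, 0x66601002):
    device, vendor = split_chip(chip)
    print(f"vendor=0x{vendor:04x} device=0x{device:04x}")
# vendor=0x1002 device=0x9874
# vendor=0x1002 device=0x6660
```

The second device ID, 0x6660, is the one the message above matches to Hainan in drm_pciids.h.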
The two issues are definitely related. They both go through a bounded delay loop waiting for some operation to complete.
I realized that sounded really dumb after I sent it. But what makes me think it's all related is that timing perturbations / random, seemingly unrelated code changes can cause it to fail in about 3 or 4 distinct ways: sometimes as early as ring 0, other times ring 1, other times ring 11, and in this case on the SMU firmware check. Whereas no matter what I do, on my friend's EliteBook it gets to the point of switching from the efifb to the framebuffer set up by amdgpu. So it looks like I just made a really unfortunate choice of bring-up device.
-M
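For readers unfamiliar with the pattern, a bounded delay loop of the kind mentioned above can be sketched roughly as follows. This is only an illustration and not the amdgpu code: poll_until, read_fn, delay_fn, fake_dev and the value 0xDEADBEEF are hypothetical names and values chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callbacks: read a status word, and wait for `usec` microseconds. */
typedef uint32_t (*read_fn)(void *ctx);
typedef void (*delay_fn)(unsigned usec);

/*
 * Poll read(ctx) until it returns `expected`, calling delay(1) between polls
 * and giving up after `max_polls` attempts. Returns true on success; the
 * number of polls performed is stored in *polls_used when it is non-NULL.
 */
static bool poll_until(read_fn read, void *ctx, uint32_t expected,
                       unsigned max_polls, delay_fn delay, unsigned *polls_used)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (read(ctx) == expected) {
            if (polls_used)
                *polls_used = i + 1;
            return true;
        }
        delay(1);
    }
    if (polls_used)
        *polls_used = max_polls;
    return false;
}

/* Hypothetical stand-in for hardware: becomes "ready" after a number of reads. */
struct fake_dev { unsigned reads_until_ready; };

static uint32_t fake_read(void *ctx)
{
    struct fake_dev *d = ctx;
    if (d->reads_until_ready > 0) {
        d->reads_until_ready--;
        return 0;
    }
    return 0xDEADBEEF;
}

/* Delay stub so the sketch runs instantly outside a kernel. */
static void no_delay(unsigned usec) { (void)usec; }
```

In a loop of this shape the total wait is at most max_polls times the delay, so whether it succeeds depends directly on how long the delay primitive really waits and on how quickly the hardware responds.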
By default FreeBSD doesn't use the IOMMU on x86 so that's not an issue.
One thing that is different between the EliteBook (A12) and the Thinkpad (A10) is that the Thinkpad has both integrated and discrete GPUs. 0x6660 matches Hainan in drm_pciids.h, which I guess is GCN 1.0? Could that possibly be an issue? I know amdgpu doesn't support pre-GCN 1.1 currently, so I would assume it would just be ignored. Nonetheless, I thought I should bring it up just in case.
vgapci0@pci0:0:1:0: class=0x030000 card=0x511617aa chip=0x98741002 rev=0xc5 hdr=0x00
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Carrizo'
    class      = display
    subclass   = VGA
<...>
vgapci1@pci0:5:0:0: class=0x038000 card=0x511617aa chip=0x66601002 rev=0x83 hdr=0x00
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Sun XT [Radeon HD 8670A/8670M/8690M / R5 M330]'
    class      = display
Thanks. -M
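As a reading aid for the pciconf output above: the chip= field packs the PCI device ID into the upper 16 bits and the vendor ID into the lower 16 bits, which is where the 0x6660 comes from. A minimal sketch of the split (struct pci_id and decode_chip are names made up for this example):

```c
#include <stdint.h>

/* Hypothetical container for the two halves of a pciconf "chip=" value. */
struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* Split chip=0xDDDDVVVV into device ID (high 16 bits) and vendor ID (low 16 bits). */
static struct pci_id decode_chip(uint32_t chip)
{
    struct pci_id id = {
        .vendor = (uint16_t)(chip & 0xffff),
        .device = (uint16_t)(chip >> 16),
    };
    return id;
}
```

With the values from the listing, decode_chip(0x66601002) gives vendor 0x1002 and device 0x6660 (the discrete GPU), and decode_chip(0x98741002) gives device 0x9874 (the Carrizo APU).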
Alex
Which is hard to correlate without spending a lot more quality time with the driver than I've had time for yet.
Yeah, I don't see why some blocks should fail while others seem to initialize just fine. Especially since you reported it seems to work on other hardware.
One thing that occurs to me is that Linux is usually compiled with gcc6 - has amdgpu ever been tested as compiled with clang?
Not as far as I know. We had some problems in the past even with some gcc versions because of some odd things in the BIOS headers (e.g. zero sized arrays). But those issues should be fixed by now.
Below is a list of the warnings I have to disable in order to get amdgpu to compile without disabling -Werror altogether. The -Wno-format is an artifact of clang or FreeBSD treating long long and uint64_t as distinct types, and the -Wno-pointer-arith is to accept the Linux convention of doing pointer arithmetic on void pointers. All the others are arguably oversights in the code (similar silencing has to be done in i915, but I've had better luck with it to date). I haven't fixed the warnings because I try to treat it as vendor code and minimize any local changes. Will you accept quasi-cosmetic patches from other operating systems / compilers?
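To make the two artifacts above concrete, here is a small userspace sketch of code that avoids both warnings. It is an illustration only, not a proposed patch: format_gpu_addr and ptr_add are hypothetical helpers. In kernel code the usual equivalent of the first one is, as far as I know, a cast to unsigned long long combined with %llx rather than PRIx64.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit value without assuming that uint64_t is unsigned long long.
 * PRIx64 expands to the right conversion for the platform, so -Wformat stays quiet.
 */
static int format_gpu_addr(char *buf, size_t len, uint64_t addr)
{
    return snprintf(buf, len, "0x%016" PRIx64, addr);
}

/*
 * Advance a pointer by `off` bytes. Going through char * avoids arithmetic
 * on void *, which is a GNU extension and triggers -Wpointer-arith.
 */
static void *ptr_add(void *base, size_t off)
{
    return (char *)base + off;
}
```

Calling format_gpu_addr(buf, sizeof(buf), 0x400000b0) writes 0x00000000400000b0 into buf, the same layout as the gpu addr fields in the fence driver log lines.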
Yeah, sure, feel free to provide patches. As long as it is only cleanup and not structural changes it should be trivial to get them merged.
Especially "-Wno-missing-prototypes" and "-Wno-unused-variable" sound like something which should be trivial to fix.
Regards, Christian.
Thanks.
-M
CWARNFLAGS+= -Wno-pointer-arith
CWARNFLAGS+= -Wno-pointer-sign ${CWARNFLAGS.${.IMPSRC:T}}

CWARNFLAGS.amdgpu_acpi.c= -Wno-int-conversion -Wno-missing-prototypes -Wno-unused-variable
CWARNFLAGS.amdgpu_amdkfd.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_bo_list.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_cs.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_device.c= -Wno-format -Wno-cast-qual
CWARNFLAGS.amdgpu_fence.c= -Wno-format
CWARNFLAGS.amdgpu_gfx.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_amdkfd_gfx_v7.c= -Wno-cast-qual
CWARNFLAGS.amdgpu_amdkfd_gfx_v8.c= -Wno-cast-qual
CWARNFLAGS.amdgpu_atpx_handler.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_ih.c= -Wno-cast-qual
CWARNFLAGS.amdgpu_ioc32.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_object.c= -Wno-format
CWARNFLAGS.amdgpu_mn.c= -Wno-unused-variable
CWARNFLAGS.amdgpu_pll.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_pm.c= -Wno-missing-prototypes -Wno-enum-conversion
CWARNFLAGS.amdgpu_ring.c= -Wno-cast-qual
CWARNFLAGS.amdgpu_ttm.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_ucode.c= -Wno-incompatible-pointer-types-discards-qualifiers -Wno-cast-qual
CWARNFLAGS.amdgpu_uvd.c= -Wno-format
CWARNFLAGS.amdgpu_vce.c= -Wno-format
CWARNFLAGS.amdgpu_vce.c= -Wno-format
CWARNFLAGS.amdgpu_vm.c= -Wno-format
CWARNFLAGS.amdgpu_test.c= -Wno-format
CWARNFLAGS.amdgpu_vm.c= -Wno-format
CWARNFLAGS.atombios_crtc.c= -Wno-missing-prototypes
CWARNFLAGS.atombios_dp.c= -Wno-format
CWARNFLAGS.atombios_i2c.c= -Wno-missing-prototypes
CWARNFLAGS.ci_dpm.c= -Wno-unused-const-variable
CWARNFLAGS.cz_smc.c= -Wno-missing-prototypes
CWARNFLAGS.fiji_smc.c= -Wno-cast-qual
CWARNFLAGS.gfx_v7_0.c= -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.gfx_v8_0.c= -Wno-missing-prototypes
CWARNFLAGS.iceland_smc.c= -Wno-missing-prototypes
CWARNFLAGS.kv_dpm.c= -Wno-unused-const-variable
CWARNFLAGS.tonga_smc.c= -Wno-cast-qual
CWARNFLAGS.gpu_scheduler.c= -Wno-format -Wno-missing-prototypes
CWARNFLAGS.amd_powerplay.c= -Wno-missing-prototypes
CWARNFLAGS.eventtasks.c= -Wno-missing-prototypes
CWARNFLAGS.cz_clockpowergating.c= -Wno-missing-prototypes -Wno-enum-conversion
CWARNFLAGS.cz_hwmgr.c= -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.fiji_hwmgr.c= -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.fiji_thermal.c= -Wno-missing-prototypes
CWARNFLAGS.pp_acpi.c= -Wno-missing-prototypes
CWARNFLAGS.ppatomctrl.c= -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.processpptables.c= -Wno-missing-prototypes -Wno-sometimes-uninitialized
CWARNFLAGS.tonga_clockpowergating.c= -Wno-missing-prototypes -Wno-enum-conversion
CWARNFLAGS.tonga_hwmgr.c= -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.tonga_processpptables.c= -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.tonga_thermal.c= -Wno-missing-prototypes
CWARNFLAGS.tonga_smumgr.c= -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.fiji_smumgr.c= -Wno-missing-prototypes -Wno-cast-qual
Regards, Christian.
-----Original Message-----
From: Matthew Macy [mailto:mmacy@nextbsd.org]
Sent: Tuesday, June 14, 2016 4:03 PM
To: Alex Deucher
Cc: Koenig, Christian; Deucher, Alexander; Zhu, Rex; dri-devel@lists.freedesktop.org
Subject: Re: Re: Looking for pointers on diagnosing ring test failure in amdgpu
---- On Tue, 14 Jun 2016 06:02:09 -0700 Alex Deucher wrote ----
On Tue, Jun 14, 2016 at 4:10 AM, Christian König christian.koenig@amd.com wrote:
Hi Matthew,
see inline below.
Am 14.06.2016 um 00:03 schrieb Matthew Macy:
---- On Mon, 13 Jun 2016 01:35:34 -0700 Christian König christian.koenig@amd.com wrote ----
Hi Matthew,
sounds like the UVD block doesn't want to initialize. No idea off hand why, could be anything. I would need the hardware here for a closer inspection.
For a workaround you can try to disable the UVD block using the ip_block_mask module parameter (it's a bitmask of enabled blocks, e.g. 0xffffffff means all blocks enabled; UVD is bit 7 on Carrizo IIRC).
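The mask arithmetic behind that suggestion is small enough to spell out. In the sketch below, ip_block_mask_disable is a hypothetical helper written for this example, and bit 7 for UVD is taken from the "IIRC" above rather than checked against the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Return `mask` with IP block number `bit` disabled, i.e. with that bit cleared. */
static uint32_t ip_block_mask_disable(uint32_t mask, unsigned bit)
{
    assert(bit < 32); /* the mask is 32 bits wide */
    return mask & ~(UINT32_C(1) << bit);
}
```

Starting from all blocks enabled, ip_block_mask_disable(0xffffffff, 7) yields 0xffffff7f, which would be the value to pass as ip_block_mask if UVD really is block 7.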
When I clear bit 7 I get the following now:
Jun 14 07:58:18 trainwreck kernel: drmn0: fence driver on ring 10 use gpu addr 0x00000000400000b0, cpu addr 0x0xfffff800bd4320b0
Jun 14 07:58:18 trainwreck kernel: drmn0: fence driver on ring 11 use gpu addr 0x00000000400000c0, cpu addr 0x0xfffff800bd4320c0
Jun 14 07:58:19 trainwreck kernel: drmn0: SMU check loaded firmware failed, expecting 0x17f, getting 0x0[drm:0xffffffff826d4dc4s] *ERROR* amdgpu: smc start failed
Jun 14 07:58:19 trainwreck kernel: [drm:0xffffffff8269fc40s] *ERROR* hw_init 3 failed -22
Jun 14 07:58:19 trainwreck kernel: drmn0: amdgpu_init failed
UVD is optional (as long as you don't want to do hardware video decoding) but the SMU isn't. Alex, Rex, any idea what's going wrong here?
Seems like maybe the two issues are related. Maybe some general MMIO issue on that particular system or an issue with the MC or gart setup? The firmware that the SMU loads is stored in gart and all of the engine rings are in gart. Maybe a problem with the IOMMU setup on the CPU?
The two issues are definitely related. They both go through a bounded delay loop waiting for some operation to complete.
By default FreeBSD doesn't use the IOMMU on x86 so that's not an issue.
One thing that is different between the EliteBook (A12) and the Thinkpad (A10) is that the Thinkpad has both integrated and discrete GPUs. 0x6660 matches Hainan in drm_pciids.h, which I guess is GCN 1.0? Could that possibly be an issue? I know amdgpu doesn't support pre-GCN 1.1 currently, so I would assume it would just be ignored. Nonetheless, I thought I should bring it up just in case.
amdgpu won't try and bind to it, but radeon will. If you've ported radeon to freebsd, you might want to blacklist that driver while you are testing. Some bioses also have an option to disable the dGPU, if yours does, you could try that. It shouldn't cause an issue if nothing is trying to load on it.
Alex