qxl use ioremap to map ram_header and rom, in the arm64 implementation, the device is mapped as DEVICE_nGnRE, it can not support unaligned access.
6.620515] pc : setup_hw_slot+0x24/0x60 [qxl] [ 6.620961] lr : setup_slot+0x34/0xf0 [qxl] [ 6.621376] sp : ffff800012b73760 [ 6.621701] x29: ffff800012b73760 x28: 0000000000000001 x27: 0000000010000000 [ 6.622400] x26: 0000000000000001 x25: 0000000004000000 x24: ffffcf376848c000 [ 6.623099] x23: ffff0000c4087400 x22: ffffcf3718e17140 x21: 0000000000000000 [ 6.623823] x20: ffff0000c4086000 x19: ffff0000c40870b0 x18: 0000000000000014 [ 6.624519] x17: 000000004d3605ab x16: 00000000bb3b6129 x15: 000000006e771809 [ 6.625214] x14: 0000000000000001 x13: 007473696c5f7974 x12: 696e696666615f65 [ 6.625909] x11: 00000000d543656a x10: 0000000000000000 x9 : ffffcf3718e085a4 [ 6.626616] x8 : 00000000006c7871 x7 : 000000000000000a x6 : 0000000000000017 [ 6.627343] x5 : 0000000000001400 x4 : ffff800011f63400 x3 : 0000000014000000 [ 6.628047] x2 : 0000000000000000 x1 : ffff0000c40870b0 x0 : ffff0000c4086000 [ 6.628751] Call trace: [ 6.628994] setup_hw_slot+0x24/0x60 [qxl] [ 6.629404] setup_slot+0x34/0xf0 [qxl] [ 6.629790] qxl_device_init+0x6f0/0x7f0 [qxl] [ 6.630235] qxl_pci_probe+0xdc/0x1d0 [qxl] [ 6.630654] local_pci_probe+0x48/0xb8 [ 6.631027] pci_device_probe+0x194/0x208 [ 6.631464] really_probe+0xd0/0x458 [ 6.631818] __driver_probe_device+0x124/0x1c0 [ 6.632256] driver_probe_device+0x48/0x130 [ 6.632669] __driver_attach+0xc4/0x1a8 [ 6.633049] bus_for_each_dev+0x78/0xd0 [ 6.633437] driver_attach+0x2c/0x38 [ 6.633789] bus_add_driver+0x154/0x248 [ 6.634168] driver_register+0x6c/0x128 [ 6.635205] __pci_register_driver+0x4c/0x58 [ 6.635628] qxl_init+0x48/0x1000 [qxl] [ 6.636013] do_one_initcall+0x50/0x240 [ 6.636390] do_init_module+0x60/0x238 [ 6.636768] load_module+0x2458/0x2900 [ 6.637136] __do_sys_finit_module+0xbc/0x128 [ 6.637561] __arm64_sys_finit_module+0x28/0x38 [ 6.638004] invoke_syscall+0x74/0xf0 [ 6.638366] el0_svc_common.constprop.0+0x58/0x1a8 [ 6.638836] do_el0_svc+0x2c/0x90 [ 6.639216] el0_svc+0x40/0x190 [ 6.639526] el0t_64_sync_handler+0xb0/0xb8 [ 6.639934] el0t_64_sync+0x1a4/0x1a8 [ 6.640294] Code: 910003fd f9484804 f9400c23 8b050084 (f809c083) [ 6.640889] ---[ end trace 95615d89b7c87f95 ]---
Signed-off-by: Cong Liu liucong2@kylinos.cn --- drivers/gpu/drm/qxl/qxl_kms.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c index 4dc5ad13f12c..0e61ac04d8ad 100644 --- a/drivers/gpu/drm/qxl/qxl_kms.c +++ b/drivers/gpu/drm/qxl/qxl_kms.c @@ -165,7 +165,11 @@ int qxl_device_init(struct qxl_device *qdev, (int)qdev->surfaceram_size / 1024, (sb == 4) ? "64bit" : "32bit");
+#ifdef CONFIG_ARM64 + qdev->rom = ioremap_cache(qdev->rom_base, qdev->rom_size); +#else qdev->rom = ioremap(qdev->rom_base, qdev->rom_size); +#endif if (!qdev->rom) { pr_err("Unable to ioremap ROM\n"); r = -ENOMEM; @@ -183,9 +187,15 @@ int qxl_device_init(struct qxl_device *qdev, goto rom_unmap; }
+#ifdef CONFIG_ARM64 + qdev->ram_header = ioremap_cache(qdev->vram_base + + qdev->rom->ram_header_offset, + sizeof(*qdev->ram_header)); +#else qdev->ram_header = ioremap(qdev->vram_base + qdev->rom->ram_header_offset, sizeof(*qdev->ram_header)); +#endif if (!qdev->ram_header) { DRM_ERROR("Unable to ioremap RAM header\n"); r = -ENOMEM;
Arm64 also need the function in commit b849bec29a99 ("drm/ttm: ioremap buffer according to TTM mem caching setting"), so enable it. The following Call Trace captured in arm64 with qxl card.
[ 5.609923] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [ 5.610592] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 5.611271] pc : __memset+0x90/0x188 [ 5.611641] lr : qxl_create_monitors_object+0xe0/0x180 [qxl] [ 5.612208] sp : ffff800012cd37a0 [ 5.612533] x29: ffff800012cd37a0 x28: 0000000000000000 x27: 0000000000000001 [ 5.613228] x26: ffff800012cd3d30 x25: ffffb70116ef5f10 x24: ffffb70116ef5ed8 [ 5.613953] x23: ffffb70116ef5000 x22: 0000000000000000 x21: ffff000300020000 [ 5.614645] x20: ffff0003008e4000 x19: 0000000000000074 x18: 0000000000000014 [ 5.615331] x17: 0000000038ca76f1 x16: ffffb70138fd0600 x15: ffffb7013b1a9950 [ 5.616018] x14: ffff800010000000 x13: ffffb7013a8a2d78 x12: ffffb7013a8a2d78 [ 5.616709] x11: ffffb7013a8aa767 x10: 0000000000000000 x9 : ffffb701398fc5bc [ 5.617409] x8 : ffff8000100ad074 x7 : 0000000000000000 x6 : 0000000000000002 [ 5.618206] x5 : ffff0003008e50b0 x4 : 0000000000000000 x3 : 0000000000000030 [ 5.618933] x2 : 0000000000000004 x1 : 0000000000000000 x0 : ffff8000100ad000 [ 5.619624] Call trace: [ 5.619872] __memset+0x90/0x188 [ 5.620188] qxl_modeset_init+0x4c/0x320 [qxl] [ 5.620627] qxl_pci_probe+0x11c/0x1d0 [qxl] [ 5.621029] local_pci_probe+0x48/0xb8 [ 5.621390] pci_device_probe+0x194/0x208 [ 5.621762] really_probe+0xd0/0x458 [ 5.622122] __driver_probe_device+0x124/0x1c0 [ 5.622534] driver_probe_device+0x48/0x130 [ 5.622923] __driver_attach+0xc4/0x1a8 [ 5.623280] bus_for_each_dev+0x78/0xd0 [ 5.623636] driver_attach+0x2c/0x38 [ 5.623969] bus_add_driver+0x154/0x248 [ 5.624324] driver_register+0x6c/0x128 [ 5.624678] __pci_register_driver+0x4c/0x58 [ 5.625072] qxl_init+0x48/0x1000 [qxl] [ 5.625439] do_one_initcall+0x50/0x240 [ 5.625825] do_init_module+0x60/0x238 [ 5.626189] load_module+0x2458/0x2900 [ 5.626543] __do_sys_finit_module+0xbc/0x128 [ 5.626952] __arm64_sys_finit_module+0x28/0x38 [ 5.627384] invoke_syscall+0x74/0xf0 [ 5.627732] el0_svc_common.constprop.0+0x58/0x1a8 [ 5.628190] do_el0_svc+0x2c/0x90 [ 5.628503] el0_svc+0x40/0x190 [ 5.628811] el0t_64_sync_handler+0xb0/0xb8 [ 5.629206] el0t_64_sync+0x1a4/0x1a8 [ 5.629552] Code: a8811d07 f2400c42 b4000062 8b020108 (a93f1d07) [ 5.630152] ---[ end trace 35a380fcdcd5b8f7 ]---
Signed-off-by: Cong Liu liucong2@kylinos.cn --- drivers/gpu/drm/ttm/ttm_bo_util.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index 72a94301bc95..3df96e76c424 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -281,7 +281,7 @@ static int ttm_bo_ioremap(struct ttm_buffer_object *bo, map->bo_kmap_type = ttm_bo_map_iomap; if (mem->bus.caching == ttm_write_combined) map->virtual = ioremap_wc(res, size); -#ifdef CONFIG_X86 +#if (defined CONFIG_X86) || (defined CONFIG_ARM64) else if (mem->bus.caching == ttm_cached) map->virtual = ioremap_cache(res, size); #endif @@ -402,7 +402,7 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, struct dma_buf_map *map) else if (mem->bus.caching == ttm_write_combined) vaddr_iomem = ioremap_wc(mem->bus.offset, bo->base.size); -#ifdef CONFIG_X86 +#if (defined CONFIG_X86) || (defined CONFIG_ARM64) else if (mem->bus.caching == ttm_cached) vaddr_iomem = ioremap_cache(mem->bus.offset, bo->base.size);
Am 22.03.22 um 10:34 schrieb Cong Liu:
qxl use ioremap to map ram_header and rom, in the arm64 implementation, the device is mapped as DEVICE_nGnRE, it can not support unaligned access.
Well that some ARM boards doesn't allow unaligned access to MMIO space is a well known bug of those ARM boards.
So as far as I know this is a hardware bug you are trying to workaround here and I'm not 100% sure that this is correct.
Christian.
6.620515] pc : setup_hw_slot+0x24/0x60 [qxl] [ 6.620961] lr : setup_slot+0x34/0xf0 [qxl] [ 6.621376] sp : ffff800012b73760 [ 6.621701] x29: ffff800012b73760 x28: 0000000000000001 x27: 0000000010000000 [ 6.622400] x26: 0000000000000001 x25: 0000000004000000 x24: ffffcf376848c000 [ 6.623099] x23: ffff0000c4087400 x22: ffffcf3718e17140 x21: 0000000000000000 [ 6.623823] x20: ffff0000c4086000 x19: ffff0000c40870b0 x18: 0000000000000014 [ 6.624519] x17: 000000004d3605ab x16: 00000000bb3b6129 x15: 000000006e771809 [ 6.625214] x14: 0000000000000001 x13: 007473696c5f7974 x12: 696e696666615f65 [ 6.625909] x11: 00000000d543656a x10: 0000000000000000 x9 : ffffcf3718e085a4 [ 6.626616] x8 : 00000000006c7871 x7 : 000000000000000a x6 : 0000000000000017 [ 6.627343] x5 : 0000000000001400 x4 : ffff800011f63400 x3 : 0000000014000000 [ 6.628047] x2 : 0000000000000000 x1 : ffff0000c40870b0 x0 : ffff0000c4086000 [ 6.628751] Call trace: [ 6.628994] setup_hw_slot+0x24/0x60 [qxl] [ 6.629404] setup_slot+0x34/0xf0 [qxl] [ 6.629790] qxl_device_init+0x6f0/0x7f0 [qxl] [ 6.630235] qxl_pci_probe+0xdc/0x1d0 [qxl] [ 6.630654] local_pci_probe+0x48/0xb8 [ 6.631027] pci_device_probe+0x194/0x208 [ 6.631464] really_probe+0xd0/0x458 [ 6.631818] __driver_probe_device+0x124/0x1c0 [ 6.632256] driver_probe_device+0x48/0x130 [ 6.632669] __driver_attach+0xc4/0x1a8 [ 6.633049] bus_for_each_dev+0x78/0xd0 [ 6.633437] driver_attach+0x2c/0x38 [ 6.633789] bus_add_driver+0x154/0x248 [ 6.634168] driver_register+0x6c/0x128 [ 6.635205] __pci_register_driver+0x4c/0x58 [ 6.635628] qxl_init+0x48/0x1000 [qxl] [ 6.636013] do_one_initcall+0x50/0x240 [ 6.636390] do_init_module+0x60/0x238 [ 6.636768] load_module+0x2458/0x2900 [ 6.637136] __do_sys_finit_module+0xbc/0x128 [ 6.637561] __arm64_sys_finit_module+0x28/0x38 [ 6.638004] invoke_syscall+0x74/0xf0 [ 6.638366] el0_svc_common.constprop.0+0x58/0x1a8 [ 6.638836] do_el0_svc+0x2c/0x90 [ 6.639216] el0_svc+0x40/0x190 [ 6.639526] el0t_64_sync_handler+0xb0/0xb8 [ 6.639934] el0t_64_sync+0x1a4/0x1a8 [ 6.640294] Code: 910003fd f9484804 f9400c23 8b050084 (f809c083) [ 6.640889] ---[ end trace 95615d89b7c87f95 ]---
Signed-off-by: Cong Liu liucong2@kylinos.cn
drivers/gpu/drm/qxl/qxl_kms.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c index 4dc5ad13f12c..0e61ac04d8ad 100644 --- a/drivers/gpu/drm/qxl/qxl_kms.c +++ b/drivers/gpu/drm/qxl/qxl_kms.c @@ -165,7 +165,11 @@ int qxl_device_init(struct qxl_device *qdev, (int)qdev->surfaceram_size / 1024, (sb == 4) ? "64bit" : "32bit");
+#ifdef CONFIG_ARM64
- qdev->rom = ioremap_cache(qdev->rom_base, qdev->rom_size);
+#else qdev->rom = ioremap(qdev->rom_base, qdev->rom_size); +#endif if (!qdev->rom) { pr_err("Unable to ioremap ROM\n"); r = -ENOMEM; @@ -183,9 +187,15 @@ int qxl_device_init(struct qxl_device *qdev, goto rom_unmap; }
+#ifdef CONFIG_ARM64
- qdev->ram_header = ioremap_cache(qdev->vram_base +
qdev->rom->ram_header_offset,
sizeof(*qdev->ram_header));
+#else qdev->ram_header = ioremap(qdev->vram_base + qdev->rom->ram_header_offset, sizeof(*qdev->ram_header)); +#endif if (!qdev->ram_header) { DRM_ERROR("Unable to ioremap RAM header\n"); r = -ENOMEM;
Hi Cong,
well than Dave must decide what to do here.
When QXL emulates a device it should also not use memory accesses which won't work on a physical device.
BTW: Your patch is really buggy, it misses the cases in ttm_module.c
Regards, Christian.
Am 23.03.22 um 09:48 schrieb liucong2@kylinos.cn:
Hi Christian,
according to 'Arm Architecture Reference Manual Armv8,for Armv8-A
architecture profile' pdf E2.6.5
E2.6.5 Generation of Alignment faults by load/store multiple accesses to
Device memory
When FEAT_LSMAOC is implemented and the value of the applicable nTLSMD
field is 0, any memory access by an AArch32 Load Multiple or Store
Multiple instruction to an address that the stage 1 translation
assigns as Device-nGRE, Device-nGnRE, or Device-nGnRnE generates
an Alignment fault.
so it seems not just some ARM boards doesn't allow unaligned access to MMIO
space, all pci memory mapped as Device-nGnRE in arm64 cannot support
unaligned access. and qxl is a device simulated by qemu, it should be able to access
to MMIO space in a more flexible way(PROT_NORMAL) than the real physical
graphics card.
Cong.
*主 题:*Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64 *日 期:*2022-03-23 15:15 *发件人:*Christian König *收件人:*Cong Liuairlied@redhat.comkraxel@redhat.comairlied@linux.iedaniel@ffwll.chray.huang@amd.comvirtualization@lists.linux-foundation.orgspice-devel@lists.freedesktop.orgdri-devel@lists.freedesktop.org
Am 22.03.22 um 10:34 schrieb Cong Liu:
qxl use ioremap to map ram_header and rom, in the arm64 implementation, the device is mapped as DEVICE_nGnRE, it can not support unaligned access.
Well that some ARM boards doesn't allow unaligned access to MMIO space is a well known bug of those ARM boards.
So as far as I know this is a hardware bug you are trying to workaround here and I'm not 100% sure that this is correct.
Christian.
6.620515] pc : setup_hw_slot+0x24/0x60 [qxl] [ 6.620961] lr : setup_slot+0x34/0xf0 [qxl] [ 6.621376] sp : ffff800012b73760 [ 6.621701] x29: ffff800012b73760 x28: 0000000000000001 x27:
0000000010000000
[ 6.622400] x26: 0000000000000001 x25: 0000000004000000 x24:
ffffcf376848c000
[ 6.623099] x23: ffff0000c4087400 x22: ffffcf3718e17140 x21:
0000000000000000
[ 6.623823] x20: ffff0000c4086000 x19: ffff0000c40870b0 x18:
0000000000000014
[ 6.624519] x17: 000000004d3605ab x16: 00000000bb3b6129 x15:
000000006e771809
[ 6.625214] x14: 0000000000000001 x13: 007473696c5f7974 x12:
696e696666615f65
[ 6.625909] x11: 00000000d543656a x10: 0000000000000000 x9 :
ffffcf3718e085a4
[ 6.626616] x8 : 00000000006c7871 x7 : 000000000000000a x6 :
0000000000000017
[ 6.627343] x5 : 0000000000001400 x4 : ffff800011f63400 x3 :
0000000014000000
[ 6.628047] x2 : 0000000000000000 x1 : ffff0000c40870b0 x0 :
ffff0000c4086000
[ 6.628751] Call trace: [ 6.628994] setup_hw_slot+0x24/0x60 [qxl] [ 6.629404] setup_slot+0x34/0xf0 [qxl] [ 6.629790] qxl_device_init+0x6f0/0x7f0 [qxl] [ 6.630235] qxl_pci_probe+0xdc/0x1d0 [qxl] [ 6.630654] local_pci_probe+0x48/0xb8 [ 6.631027] pci_device_probe+0x194/0x208 [ 6.631464] really_probe+0xd0/0x458 [ 6.631818] __driver_probe_device+0x124/0x1c0 [ 6.632256] driver_probe_device+0x48/0x130 [ 6.632669] __driver_attach+0xc4/0x1a8 [ 6.633049] bus_for_each_dev+0x78/0xd0 [ 6.633437] driver_attach+0x2c/0x38 [ 6.633789] bus_add_driver+0x154/0x248 [ 6.634168] driver_register+0x6c/0x128 [ 6.635205] __pci_register_driver+0x4c/0x58 [ 6.635628] qxl_init+0x48/0x1000 [qxl] [ 6.636013] do_one_initcall+0x50/0x240 [ 6.636390] do_init_module+0x60/0x238 [ 6.636768] load_module+0x2458/0x2900 [ 6.637136] __do_sys_finit_module+0xbc/0x128 [ 6.637561] __arm64_sys_finit_module+0x28/0x38 [ 6.638004] invoke_syscall+0x74/0xf0 [ 6.638366] el0_svc_common.constprop.0+0x58/0x1a8 [ 6.638836] do_el0_svc+0x2c/0x90 [ 6.639216] el0_svc+0x40/0x190 [ 6.639526] el0t_64_sync_handler+0xb0/0xb8 [ 6.639934] el0t_64_sync+0x1a4/0x1a8 [ 6.640294] Code: 910003fd f9484804 f9400c23 8b050084 (f809c083) [ 6.640889] ---[ end trace 95615d89b7c87f95 ]---
Signed-off-by: Cong Liu
drivers/gpu/drm/qxl/qxl_kms.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c
b/drivers/gpu/drm/qxl/qxl_kms.c
index 4dc5ad13f12c..0e61ac04d8ad 100644 --- a/drivers/gpu/drm/qxl/qxl_kms.c +++ b/drivers/gpu/drm/qxl/qxl_kms.c @@ -165,7 +165,11 @@ int qxl_device_init(struct qxl_device *qdev, (int)qdev->surfaceram_size / 1024, (sb == 4) ? "64bit" : "32bit");
+#ifdef CONFIG_ARM64
- qdev->rom = ioremap_cache(qdev->rom_base, qdev->rom_size);
+#else qdev->rom = ioremap(qdev->rom_base, qdev->rom_size); +#endif if (!qdev->rom) { pr_err("Unable to ioremap ROM\n"); r = -ENOMEM; @@ -183,9 +187,15 @@ int qxl_device_init(struct qxl_device *qdev, goto rom_unmap; }
+#ifdef CONFIG_ARM64
- qdev->ram_header = ioremap_cache(qdev->vram_base +
- qdev->rom->ram_header_offset,
- sizeof(*qdev->ram_header));
+#else qdev->ram_header = ioremap(qdev->vram_base + qdev->rom->ram_header_offset, sizeof(*qdev->ram_header)); +#endif if (!qdev->ram_header) { DRM_ERROR("Unable to ioremap RAM header\n"); r = -ENOMEM;
Hi Cong,
yes I've seen that, but that is still not sufficient.
You need to update the check in ttm_module.c as well or otherwise your userspace mapping might not work correctly either.
Regards, Christian.
Am 23.03.22 um 11:00 schrieb liucong2@kylinos.cn:
Hi Christian,
another commit fix the case in ttm. I send two patches at the same time, but seems I miss
'--cover-letter' when run format-patch or some other bad operation.
Regards,
Cong.
*主 题:*Re: 回复: Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64 *日 期:*2022-03-23 17:34 *发件人:*Christian König *收件人:*liucong2@kylinos.cnairlied@redhat.comkraxel@redhat.comairlied@linux.iedaniel@ffwll.chray.huang@amd.comvirtualization@lists.linux-foundation.orgspice-devel@lists.freedesktop.orgdri-devel@lists.freedesktop.org
Hi Cong,
well than Dave must decide what to do here.
When QXL emulates a device it should also not use memory accesses which won't work on a physical device.
BTW: Your patch is really buggy, it misses the cases in ttm_module.c
Regards, Christian.
Am 23.03.22 um 09:48 schrieb liucong2@kylinos.cn:
Hi Christian,
according to 'Arm Architecture Reference Manual Armv8,for Armv8-A
architecture profile' pdf E2.6.5
E2.6.5 Generation of Alignment faults by load/store multiple accesses to
Device memory
When FEAT_LSMAOC is implemented and the value of the applicable nTLSMD
field is 0, any memory access by an AArch32 Load Multiple or Store
Multiple instruction to an address that the stage 1 translation
assigns as Device-nGRE, Device-nGnRE, or Device-nGnRnE generates
an Alignment fault.
so it seems not just some ARM boards doesn't allow unaligned access to MMIO
space, all pci memory mapped as Device-nGnRE in arm64 cannot support
unaligned access. and qxl is a device simulated by qemu, it should be able to access
to MMIO space in a more flexible way(PROT_NORMAL) than the real physical
graphics card.
Cong.
*主 题:*Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64 *日 期:*2022-03-23 15:15 *发件人:*Christian König *收件人:*CongLiuairlied@redhat.comkraxel@redhat.comairlied@linux.iedaniel@ffwll.chray.huang@amd.comvirtualization@lists.linux-foundation.orgspice-devel@lists.freedesktop.orgdri-devel@lists.freedesktop.org
Am 22.03.22 um 10:34 schrieb Cong Liu: > qxl use ioremap to map ram_header and rom, in the arm64 implementation, > the device is mapped as DEVICE_nGnRE, it can not support unaligned > access.
Well that some ARM boards doesn't allow unaligned access to MMIO space is a well known bug of those ARM boards.
So as far as I know this is a hardware bug you are trying to workaround here and I'm not 100% sure that this is correct.
Christian.
> > 6.620515] pc : setup_hw_slot+0x24/0x60 [qxl] > [ 6.620961] lr : setup_slot+0x34/0xf0 [qxl] > [ 6.621376] sp : ffff800012b73760 > [ 6.621701] x29: ffff800012b73760 x28: 0000000000000001 x27: 0000000010000000 > [ 6.622400] x26: 0000000000000001 x25: 0000000004000000 x24: ffffcf376848c000 > [ 6.623099] x23: ffff0000c4087400 x22: ffffcf3718e17140 x21: 0000000000000000 > [ 6.623823] x20: ffff0000c4086000 x19: ffff0000c40870b0 x18: 0000000000000014 > [ 6.624519] x17: 000000004d3605ab x16: 00000000bb3b6129 x15: 000000006e771809 > [ 6.625214] x14: 0000000000000001 x13: 007473696c5f7974 x12: 696e696666615f65 > [ 6.625909] x11: 00000000d543656a x10: 0000000000000000 x9 : ffffcf3718e085a4 > [ 6.626616] x8 : 00000000006c7871 x7 : 000000000000000a x6 : 0000000000000017 > [ 6.627343] x5 : 0000000000001400 x4 : ffff800011f63400 x3 : 0000000014000000 > [ 6.628047] x2 : 0000000000000000 x1 : ffff0000c40870b0 x0 : ffff0000c4086000 > [ 6.628751] Call trace: > [ 6.628994] setup_hw_slot+0x24/0x60 [qxl] > [ 6.629404] setup_slot+0x34/0xf0 [qxl] > [ 6.629790] qxl_device_init+0x6f0/0x7f0 [qxl] > [ 6.630235] qxl_pci_probe+0xdc/0x1d0 [qxl] > [ 6.630654] local_pci_probe+0x48/0xb8 > [ 6.631027] pci_device_probe+0x194/0x208 > [ 6.631464] really_probe+0xd0/0x458 > [ 6.631818] __driver_probe_device+0x124/0x1c0 > [ 6.632256] driver_probe_device+0x48/0x130 > [ 6.632669] __driver_attach+0xc4/0x1a8 > [ 6.633049] bus_for_each_dev+0x78/0xd0 > [ 6.633437] driver_attach+0x2c/0x38 > [ 6.633789] bus_add_driver+0x154/0x248 > [ 6.634168] driver_register+0x6c/0x128 > [ 6.635205] __pci_register_driver+0x4c/0x58 > [ 6.635628] qxl_init+0x48/0x1000 [qxl] > [ 6.636013] do_one_initcall+0x50/0x240 > [ 6.636390] do_init_module+0x60/0x238 > [ 6.636768] load_module+0x2458/0x2900 > [ 6.637136] __do_sys_finit_module+0xbc/0x128 > [ 6.637561] __arm64_sys_finit_module+0x28/0x38 > [ 6.638004] invoke_syscall+0x74/0xf0 > [ 6.638366] el0_svc_common.constprop.0+0x58/0x1a8 > [ 6.638836] do_el0_svc+0x2c/0x90 > [ 6.639216] el0_svc+0x40/0x190 > [ 6.639526] el0t_64_sync_handler+0xb0/0xb8 > [ 6.639934] el0t_64_sync+0x1a4/0x1a8 > [ 6.640294] Code: 910003fd f9484804 f9400c23 8b050084 (f809c083) > [ 6.640889] ---[ end trace 95615d89b7c87f95 ]--- > > Signed-off-by: Cong Liu > --- > drivers/gpu/drm/qxl/qxl_kms.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c > index 4dc5ad13f12c..0e61ac04d8ad 100644 > --- a/drivers/gpu/drm/qxl/qxl_kms.c > +++ b/drivers/gpu/drm/qxl/qxl_kms.c > @@ -165,7 +165,11 @@ int qxl_device_init(struct qxl_device *qdev, > (int)qdev->surfaceram_size / 1024, > (sb == 4) ? "64bit" : "32bit"); > > +#ifdef CONFIG_ARM64 > + qdev->rom = ioremap_cache(qdev->rom_base, qdev->rom_size); > +#else > qdev->rom = ioremap(qdev->rom_base, qdev->rom_size); > +#endif > if (!qdev->rom) { > pr_err("Unable to ioremap ROM\n"); > r = -ENOMEM; > @@ -183,9 +187,15 @@ int qxl_device_init(struct qxl_device *qdev, > goto rom_unmap; > } > > +#ifdef CONFIG_ARM64 > + qdev->ram_header = ioremap_cache(qdev->vram_base + > + qdev->rom->ram_header_offset, > + sizeof(*qdev->ram_header)); > +#else > qdev->ram_header = ioremap(qdev->vram_base + > qdev->rom->ram_header_offset, > sizeof(*qdev->ram_header)); > +#endif > if (!qdev->ram_header) { > DRM_ERROR("Unable to ioremap RAM header\n"); > r = -ENOMEM;
Hi Cong,
when I understand Robin correctly all mapping (host, guest, kernel, userspace etc..) must have the same caching attributes unless you use the S2FWB feature introduced with Armv8.4.
If you don't follow those rules you usually run into coherency issues or even worse system hangs. So you not only need to adjust the kernel mappings, but also the function for userspace mappings to follow those rules.
Additional to that I think I read Robin correctly that mapping the resource write combined should be sufficient to solve your problem. So I suggest to use that instead.
On the other hand if you manage to fix this completely for arm64 by updating all the places to use cached mappings I'm fine with it as well. It's then up to Dave to decide if that's something we want because QXL is intentionally emulating a hardware device as far as I know.
Regards, Christian.
Am 24.03.22 um 08:04 schrieb liucong2@kylinos.cn:
Hi Christian,
Can you help explain more detail under what circumstances userspace mapping
will be problematic, you mean cached mappings also need adjustment in
function ttm_prot_from_caching?
Regards,
Cong.
*主 题:*Re: 回复: Re: 回复: Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64 *日 期:*2022-03-23 18:17 *发件人:*Christian König *收件人:*liucong2@kylinos.cnairlied@redhat.comkraxel@redhat.comairlied@linux.iedaniel@ffwll.chray.huang@amd.comvirtualization@lists.linux-foundation.orgspice-devel@lists.freedesktop.orgdri-devel@lists.freedesktop.org
Hi Cong,
yes I've seen that, but that is still not sufficient.
You need to update the check in ttm_module.c as well or otherwise your userspace mapping might not work correctly either.
Regards, Christian.
Am 23.03.22 um 11:00 schrieb liucong2@kylinos.cn:
Hi Christian,
another commit fix the case in ttm. I send two patches at the same time, but seems I miss
'--cover-letter' when run format-patch or some other bad operation.
Regards,
Cong.
*主 题:*Re: 回复: Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64 *日 期:*2022-03-23 17:34 *发件人:*Christian König *收件人:*liucong2@kylinos.cnairlied@redhat.comkraxel@redhat.comairlied@linux.iedaniel@ffwll.chray.huang@amd.comvirtualization@lists.linux-foundation.orgspice-devel@lists.freedesktop.orgdri-devel@lists.freedesktop.org
Hi Cong,
well than Dave must decide what to do here.
When QXL emulates a device it should also not use memory accesses which won't work on a physical device.
BTW: Your patch is really buggy, it misses the cases in ttm_module.c
Regards, Christian.
Am 23.03.22 um 09:48 schrieb liucong2@kylinos.cn:
Hi Christian,
according to 'Arm Architecture Reference Manual Armv8,for Armv8-A
architecture profile' pdf E2.6.5
E2.6.5 Generation of Alignment faults by load/store multiple accesses to
Device memory
When FEAT_LSMAOC is implemented and the value of the applicable nTLSMD
field is 0, any memory access by an AArch32 Load Multiple or Store
Multiple instruction to an address that the stage 1 translation
assigns as Device-nGRE, Device-nGnRE, or Device-nGnRnE generates
an Alignment fault.
so it seems not just some ARM boards doesn't allow unaligned access to MMIO
space, all pci memory mapped as Device-nGnRE in arm64 cannot support
unaligned access. and qxl is a device simulated by qemu, it should be able to access
to MMIO space in a more flexible way(PROT_NORMAL) than the real physical
graphics card.
Cong.
*主 题:*Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64 *日 期:*2022-03-23 15:15 *发件人:*Christian König *收件人:*CongLiuairlied@redhat.comkraxel@redhat.comairlied@linux.iedaniel@ffwll.chray.huang@amd.comvirtualization@lists.linux-foundation.orgspice-devel@lists.freedesktop.orgdri-devel@lists.freedesktop.org
Am 22.03.22 um 10:34 schrieb Cong Liu: > qxl use ioremap to map ram_header and rom, in the arm64 implementation, > the device is mapped as DEVICE_nGnRE, it can not support unaligned > access.
Well that some ARM boards doesn't allow unaligned access to MMIO space is a well known bug of those ARM boards.
So as far as I know this is a hardware bug you are trying to workaround here and I'm not 100% sure that this is correct.
Christian.
> > 6.620515] pc : setup_hw_slot+0x24/0x60 [qxl] > [ 6.620961] lr : setup_slot+0x34/0xf0 [qxl] > [ 6.621376] sp : ffff800012b73760 > [ 6.621701] x29: ffff800012b73760 x28: 0000000000000001 x27: 0000000010000000 > [ 6.622400] x26: 0000000000000001 x25: 0000000004000000 x24: ffffcf376848c000 > [ 6.623099] x23: ffff0000c4087400 x22: ffffcf3718e17140 x21: 0000000000000000 > [ 6.623823] x20: ffff0000c4086000 x19: ffff0000c40870b0 x18: 0000000000000014 > [ 6.624519] x17: 000000004d3605ab x16: 00000000bb3b6129 x15: 000000006e771809 > [ 6.625214] x14: 0000000000000001 x13: 007473696c5f7974 x12: 696e696666615f65 > [ 6.625909] x11: 00000000d543656a x10: 0000000000000000 x9 : ffffcf3718e085a4 > [ 6.626616] x8 : 00000000006c7871 x7 : 000000000000000a x6 : 0000000000000017 > [ 6.627343] x5 : 0000000000001400 x4 : ffff800011f63400 x3 : 0000000014000000 > [ 6.628047] x2 : 0000000000000000 x1 : ffff0000c40870b0 x0 : ffff0000c4086000 > [ 6.628751] Call trace: > [ 6.628994] setup_hw_slot+0x24/0x60 [qxl] > [ 6.629404] setup_slot+0x34/0xf0 [qxl] > [ 6.629790] qxl_device_init+0x6f0/0x7f0 [qxl] > [ 6.630235] qxl_pci_probe+0xdc/0x1d0 [qxl] > [ 6.630654] local_pci_probe+0x48/0xb8 > [ 6.631027] pci_device_probe+0x194/0x208 > [ 6.631464] really_probe+0xd0/0x458 > [ 6.631818] __driver_probe_device+0x124/0x1c0 > [ 6.632256] driver_probe_device+0x48/0x130 > [ 6.632669] __driver_attach+0xc4/0x1a8 > [ 6.633049] bus_for_each_dev+0x78/0xd0 > [ 6.633437] driver_attach+0x2c/0x38 > [ 6.633789] bus_add_driver+0x154/0x248 > [ 6.634168] driver_register+0x6c/0x128 > [ 6.635205] __pci_register_driver+0x4c/0x58 > [ 6.635628] qxl_init+0x48/0x1000 [qxl] > [ 6.636013] do_one_initcall+0x50/0x240 > [ 6.636390] do_init_module+0x60/0x238 > [ 6.636768] load_module+0x2458/0x2900 > [ 6.637136] __do_sys_finit_module+0xbc/0x128 > [ 6.637561] __arm64_sys_finit_module+0x28/0x38 > [ 6.638004] invoke_syscall+0x74/0xf0 > [ 6.638366] el0_svc_common.constprop.0+0x58/0x1a8 > [ 6.638836] do_el0_svc+0x2c/0x90 > [ 6.639216] el0_svc+0x40/0x190 > [ 6.639526] el0t_64_sync_handler+0xb0/0xb8 > [ 6.639934] el0t_64_sync+0x1a4/0x1a8 > [ 6.640294] Code: 910003fd f9484804 f9400c23 8b050084 (f809c083) > [ 6.640889] ---[ end trace 95615d89b7c87f95 ]--- > > Signed-off-by: Cong Liu > --- > drivers/gpu/drm/qxl/qxl_kms.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c > index 4dc5ad13f12c..0e61ac04d8ad 100644 > --- a/drivers/gpu/drm/qxl/qxl_kms.c > +++ b/drivers/gpu/drm/qxl/qxl_kms.c > @@ -165,7 +165,11 @@ int qxl_device_init(struct qxl_device *qdev, > (int)qdev->surfaceram_size / 1024, > (sb == 4) ? "64bit" : "32bit"); > > +#ifdef CONFIG_ARM64 > + qdev->rom = ioremap_cache(qdev->rom_base, qdev->rom_size); > +#else > qdev->rom = ioremap(qdev->rom_base, qdev->rom_size); > +#endif > if (!qdev->rom) { > pr_err("Unable to ioremap ROM\n"); > r = -ENOMEM; > @@ -183,9 +187,15 @@ int qxl_device_init(struct qxl_device *qdev, > goto rom_unmap; > } > > +#ifdef CONFIG_ARM64 > + qdev->ram_header = ioremap_cache(qdev->vram_base + > + qdev->rom->ram_header_offset, > + sizeof(*qdev->ram_header)); > +#else > qdev->ram_header = ioremap(qdev->vram_base + > qdev->rom->ram_header_offset, > sizeof(*qdev->ram_header)); > +#endif > if (!qdev->ram_header) { > DRM_ERROR("Unable to ioremap RAM header\n"); > r = -ENOMEM;
On Thu, Mar 24, 2022 at 10:20:40AM +0100, Christian König wrote:
Hi Cong,
when I understand Robin correctly all mapping (host, guest, kernel, userspace etc..) must have the same caching attributes unless you use the S2FWB feature introduced with Armv8.4.
If you don't follow those rules you usually run into coherency issues or even worse system hangs. So you not only need to adjust the kernel mappings, but also the function for userspace mappings to follow those rules.
That matches my understanding.
For qxl specifically: when using the xork qxl driver getting the userspace mappings right is essential because userspace will write qxl command buffers then. When using the xorg modesetting driver or wayland the worst thing happening would be display corruption because userspace will only map dumb bo's for pixel data.
I'm wondering though why you are keen on getting qxl work instead of just using virtio-gpu?
take care, Gerd
On Thu, Mar 24, 2022 at 06:34:02PM +0800, liucong2@kylinos.cn wrote:
ok, thanks, a lot of our customer use qxl on x86 before, so it still need to supoort qxl on arm64.
Well, qxl isn't the best choice even on x86. The main advantage it offers (2d acceleration) is basically useless today because pretty much everything moved on to use 3d acceleration instead. So qxl ends up being used as dumb framebuffer with software 3d rendering.
So, I'm still recommending to just use virtio-gpu ...
take care, Gerd
On Thu, Mar 24, 2022 at 02:21:09PM +0100, Gerd Hoffmann wrote:
On Thu, Mar 24, 2022 at 06:34:02PM +0800, liucong2@kylinos.cn wrote:
ok, thanks, a lot of our customer use qxl on x86 before, so it still need to supoort qxl on arm64.
Well, qxl isn't the best choice even on x86. The main advantage it offers (2d acceleration) is basically useless today because pretty much everything moved on to use 3d acceleration instead. So qxl ends up being used as dumb framebuffer with software 3d rendering.
So, I'm still recommending to just use virtio-gpu ...
Also bear in mind that while almost no one uses the 2d acceleration in QXL, the QEMU device still implements all the 2d functionality. This exposes an attack surface to the guest VM, from code that is largely ignored by maintainers today, as attention is focused on virtio-gpu instead.
With regards, Daniel
qxl use ioremap to map ram_header and rom, in the arm64 implementation, the device is mapped as DEVICE_nGnRE, it can not support unaligned access. and qxl is a virtual device, it can be treated more like RAM than actual MMIO registers. use ioremap_wc() replace it.
Signed-off-by: Cong Liu liucong2@kylinos.cn --- drivers/gpu/drm/qxl/qxl_kms.c | 4 ++-- drivers/gpu/drm/qxl/qxl_ttm.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c index 4dc5ad13f12c..a054e4a00fe8 100644 --- a/drivers/gpu/drm/qxl/qxl_kms.c +++ b/drivers/gpu/drm/qxl/qxl_kms.c @@ -165,7 +165,7 @@ int qxl_device_init(struct qxl_device *qdev, (int)qdev->surfaceram_size / 1024, (sb == 4) ? "64bit" : "32bit");
- qdev->rom = ioremap(qdev->rom_base, qdev->rom_size); + qdev->rom = ioremap_wc(qdev->rom_base, qdev->rom_size); if (!qdev->rom) { pr_err("Unable to ioremap ROM\n"); r = -ENOMEM; @@ -183,7 +183,7 @@ int qxl_device_init(struct qxl_device *qdev, goto rom_unmap; }
- qdev->ram_header = ioremap(qdev->vram_base + + qdev->ram_header = ioremap_wc(qdev->vram_base + qdev->rom->ram_header_offset, sizeof(*qdev->ram_header)); if (!qdev->ram_header) { diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c index b2e33d5ba5d0..95df5750f47f 100644 --- a/drivers/gpu/drm/qxl/qxl_ttm.c +++ b/drivers/gpu/drm/qxl/qxl_ttm.c @@ -82,13 +82,13 @@ int qxl_ttm_io_mem_reserve(struct ttm_device *bdev, case TTM_PL_VRAM: mem->bus.is_iomem = true; mem->bus.offset = (mem->start << PAGE_SHIFT) + qdev->vram_base; - mem->bus.caching = ttm_cached; + mem->bus.caching = ttm_write_combined; break; case TTM_PL_PRIV: mem->bus.is_iomem = true; mem->bus.offset = (mem->start << PAGE_SHIFT) + qdev->surfaceram_base; - mem->bus.caching = ttm_cached; + mem->bus.caching = ttm_write_combined; break; default: return -EINVAL;
Am 24.03.22 um 11:49 schrieb Cong Liu:
qxl use ioremap to map ram_header and rom, in the arm64 implementation, the device is mapped as DEVICE_nGnRE, it can not support unaligned access. and qxl is a virtual device, it can be treated more like RAM than actual MMIO registers. use ioremap_wc() replace it.
Signed-off-by: Cong Liu liucong2@kylinos.cn
Looks sane to me, but I'm really not involved enough to fully judge.
Acked-by: Christian König christian.koenig@amd.com
drivers/gpu/drm/qxl/qxl_kms.c | 4 ++-- drivers/gpu/drm/qxl/qxl_ttm.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c index 4dc5ad13f12c..a054e4a00fe8 100644 --- a/drivers/gpu/drm/qxl/qxl_kms.c +++ b/drivers/gpu/drm/qxl/qxl_kms.c @@ -165,7 +165,7 @@ int qxl_device_init(struct qxl_device *qdev, (int)qdev->surfaceram_size / 1024, (sb == 4) ? "64bit" : "32bit");
- qdev->rom = ioremap(qdev->rom_base, qdev->rom_size);
- qdev->rom = ioremap_wc(qdev->rom_base, qdev->rom_size); if (!qdev->rom) { pr_err("Unable to ioremap ROM\n"); r = -ENOMEM;
@@ -183,7 +183,7 @@ int qxl_device_init(struct qxl_device *qdev, goto rom_unmap; }
- qdev->ram_header = ioremap(qdev->vram_base +
- qdev->ram_header = ioremap_wc(qdev->vram_base + qdev->rom->ram_header_offset, sizeof(*qdev->ram_header)); if (!qdev->ram_header) {
diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c index b2e33d5ba5d0..95df5750f47f 100644 --- a/drivers/gpu/drm/qxl/qxl_ttm.c +++ b/drivers/gpu/drm/qxl/qxl_ttm.c @@ -82,13 +82,13 @@ int qxl_ttm_io_mem_reserve(struct ttm_device *bdev, case TTM_PL_VRAM: mem->bus.is_iomem = true; mem->bus.offset = (mem->start << PAGE_SHIFT) + qdev->vram_base;
mem->bus.caching = ttm_cached;
break; case TTM_PL_PRIV: mem->bus.is_iomem = true; mem->bus.offset = (mem->start << PAGE_SHIFT) + qdev->surfaceram_base;mem->bus.caching = ttm_write_combined;
mem->bus.caching = ttm_cached;
break; default: return -EINVAL;mem->bus.caching = ttm_write_combined;
any suggestions or extra test I can do now?
Regards, Cong
On 2022/3/25 15:45, Christian König wrote:
Am 24.03.22 um 11:49 schrieb Cong Liu:
qxl use ioremap to map ram_header and rom, in the arm64 implementation, the device is mapped as DEVICE_nGnRE, it can not support unaligned access. and qxl is a virtual device, it can be treated more like RAM than actual MMIO registers. use ioremap_wc() replace it.
Signed-off-by: Cong Liu liucong2@kylinos.cn
Looks sane to me, but I'm really not involved enough to fully judge.
Acked-by: Christian König christian.koenig@amd.com
drivers/gpu/drm/qxl/qxl_kms.c | 4 ++-- drivers/gpu/drm/qxl/qxl_ttm.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c index 4dc5ad13f12c..a054e4a00fe8 100644 --- a/drivers/gpu/drm/qxl/qxl_kms.c +++ b/drivers/gpu/drm/qxl/qxl_kms.c @@ -165,7 +165,7 @@ int qxl_device_init(struct qxl_device *qdev, (int)qdev->surfaceram_size / 1024, (sb == 4) ? "64bit" : "32bit"); - qdev->rom = ioremap(qdev->rom_base, qdev->rom_size); + qdev->rom = ioremap_wc(qdev->rom_base, qdev->rom_size); if (!qdev->rom) { pr_err("Unable to ioremap ROM\n"); r = -ENOMEM; @@ -183,7 +183,7 @@ int qxl_device_init(struct qxl_device *qdev, goto rom_unmap; } - qdev->ram_header = ioremap(qdev->vram_base + + qdev->ram_header = ioremap_wc(qdev->vram_base + qdev->rom->ram_header_offset, sizeof(*qdev->ram_header)); if (!qdev->ram_header) { diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c index b2e33d5ba5d0..95df5750f47f 100644 --- a/drivers/gpu/drm/qxl/qxl_ttm.c +++ b/drivers/gpu/drm/qxl/qxl_ttm.c @@ -82,13 +82,13 @@ int qxl_ttm_io_mem_reserve(struct ttm_device *bdev, case TTM_PL_VRAM: mem->bus.is_iomem = true; mem->bus.offset = (mem->start << PAGE_SHIFT) + qdev->vram_base; - mem->bus.caching = ttm_cached; + mem->bus.caching = ttm_write_combined; break; case TTM_PL_PRIV: mem->bus.is_iomem = true; mem->bus.offset = (mem->start << PAGE_SHIFT) + qdev->surfaceram_base; - mem->bus.caching = ttm_cached; + mem->bus.caching = ttm_write_combined; break; default: return -EINVAL;
I'd like to make sure this has no side effects on x86 guests, it probably is safe, but keep an eye for regression reports.
Reviewed-by: Dave Airlie airlied@redhat.com Dave.
On Wed, Mar 30, 2022 at 8:20 PM Cong Liu liucong2@kylinos.cn wrote:
any suggestions or extra test I can do now?
Regards, Cong
On 2022/3/25 15:45, Christian König wrote:
Am 24.03.22 um 11:49 schrieb Cong Liu:
qxl use ioremap to map ram_header and rom, in the arm64 implementation, the device is mapped as DEVICE_nGnRE, it can not support unaligned access. and qxl is a virtual device, it can be treated more like RAM than actual MMIO registers. use ioremap_wc() replace it.
Signed-off-by: Cong Liu liucong2@kylinos.cn
Looks sane to me, but I'm really not involved enough to fully judge.
Acked-by: Christian König christian.koenig@amd.com
drivers/gpu/drm/qxl/qxl_kms.c | 4 ++-- drivers/gpu/drm/qxl/qxl_ttm.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c index 4dc5ad13f12c..a054e4a00fe8 100644 --- a/drivers/gpu/drm/qxl/qxl_kms.c +++ b/drivers/gpu/drm/qxl/qxl_kms.c @@ -165,7 +165,7 @@ int qxl_device_init(struct qxl_device *qdev, (int)qdev->surfaceram_size / 1024, (sb == 4) ? "64bit" : "32bit");
- qdev->rom = ioremap(qdev->rom_base, qdev->rom_size);
- qdev->rom = ioremap_wc(qdev->rom_base, qdev->rom_size); if (!qdev->rom) { pr_err("Unable to ioremap ROM\n"); r = -ENOMEM;
@@ -183,7 +183,7 @@ int qxl_device_init(struct qxl_device *qdev, goto rom_unmap; }
- qdev->ram_header = ioremap(qdev->vram_base +
- qdev->ram_header = ioremap_wc(qdev->vram_base + qdev->rom->ram_header_offset, sizeof(*qdev->ram_header)); if (!qdev->ram_header) {
diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c index b2e33d5ba5d0..95df5750f47f 100644 --- a/drivers/gpu/drm/qxl/qxl_ttm.c +++ b/drivers/gpu/drm/qxl/qxl_ttm.c @@ -82,13 +82,13 @@ int qxl_ttm_io_mem_reserve(struct ttm_device *bdev, case TTM_PL_VRAM: mem->bus.is_iomem = true; mem->bus.offset = (mem->start << PAGE_SHIFT) + qdev->vram_base;
mem->bus.caching = ttm_cached;
mem->bus.caching = ttm_write_combined; break; case TTM_PL_PRIV: mem->bus.is_iomem = true; mem->bus.offset = (mem->start << PAGE_SHIFT) + qdev->surfaceram_base;
mem->bus.caching = ttm_cached;
mem->bus.caching = ttm_write_combined; break; default: return -EINVAL;
On 2022-03-23 07:15, Christian König wrote:
Am 22.03.22 um 10:34 schrieb Cong Liu:
qxl use ioremap to map ram_header and rom, in the arm64 implementation, the device is mapped as DEVICE_nGnRE, it can not support unaligned access.
Well that some ARM boards doesn't allow unaligned access to MMIO space is a well known bug of those ARM boards.
So as far as I know this is a hardware bug you are trying to workaround here and I'm not 100% sure that this is correct.
No, this one's not a bug. The Device memory type used for iomem mappings is *architecturally* defined to enforce properties like aligned accesses, no speculation, no reordering, etc. If something wants to be treated more like RAM than actual MMIO registers, then ioremap_wc() or ioremap_cache() is the appropriate thing to do in general (with the former being a bit more portable according to Documentation/driver-api/device-io.rst).
Of course *then* you might find that on some systems the interconnect/PCIe implementation/endpoint doesn't actually like unaligned accesses once the CPU MMU starts allowing them to be sent out, but hey, one step at a time ;)
Robin.
Christian.
6.620515] pc : setup_hw_slot+0x24/0x60 [qxl] [ 6.620961] lr : setup_slot+0x34/0xf0 [qxl] [ 6.621376] sp : ffff800012b73760 [ 6.621701] x29: ffff800012b73760 x28: 0000000000000001 x27: 0000000010000000 [ 6.622400] x26: 0000000000000001 x25: 0000000004000000 x24: ffffcf376848c000 [ 6.623099] x23: ffff0000c4087400 x22: ffffcf3718e17140 x21: 0000000000000000 [ 6.623823] x20: ffff0000c4086000 x19: ffff0000c40870b0 x18: 0000000000000014 [ 6.624519] x17: 000000004d3605ab x16: 00000000bb3b6129 x15: 000000006e771809 [ 6.625214] x14: 0000000000000001 x13: 007473696c5f7974 x12: 696e696666615f65 [ 6.625909] x11: 00000000d543656a x10: 0000000000000000 x9 : ffffcf3718e085a4 [ 6.626616] x8 : 00000000006c7871 x7 : 000000000000000a x6 : 0000000000000017 [ 6.627343] x5 : 0000000000001400 x4 : ffff800011f63400 x3 : 0000000014000000 [ 6.628047] x2 : 0000000000000000 x1 : ffff0000c40870b0 x0 : ffff0000c4086000 [ 6.628751] Call trace: [ 6.628994] setup_hw_slot+0x24/0x60 [qxl] [ 6.629404] setup_slot+0x34/0xf0 [qxl] [ 6.629790] qxl_device_init+0x6f0/0x7f0 [qxl] [ 6.630235] qxl_pci_probe+0xdc/0x1d0 [qxl] [ 6.630654] local_pci_probe+0x48/0xb8 [ 6.631027] pci_device_probe+0x194/0x208 [ 6.631464] really_probe+0xd0/0x458 [ 6.631818] __driver_probe_device+0x124/0x1c0 [ 6.632256] driver_probe_device+0x48/0x130 [ 6.632669] __driver_attach+0xc4/0x1a8 [ 6.633049] bus_for_each_dev+0x78/0xd0 [ 6.633437] driver_attach+0x2c/0x38 [ 6.633789] bus_add_driver+0x154/0x248 [ 6.634168] driver_register+0x6c/0x128 [ 6.635205] __pci_register_driver+0x4c/0x58 [ 6.635628] qxl_init+0x48/0x1000 [qxl] [ 6.636013] do_one_initcall+0x50/0x240 [ 6.636390] do_init_module+0x60/0x238 [ 6.636768] load_module+0x2458/0x2900 [ 6.637136] __do_sys_finit_module+0xbc/0x128 [ 6.637561] __arm64_sys_finit_module+0x28/0x38 [ 6.638004] invoke_syscall+0x74/0xf0 [ 6.638366] el0_svc_common.constprop.0+0x58/0x1a8 [ 6.638836] do_el0_svc+0x2c/0x90 [ 6.639216] el0_svc+0x40/0x190 [ 6.639526] el0t_64_sync_handler+0xb0/0xb8 [ 6.639934] el0t_64_sync+0x1a4/0x1a8 [ 6.640294] Code: 910003fd f9484804 f9400c23 8b050084 (f809c083) [ 6.640889] ---[ end trace 95615d89b7c87f95 ]---
Signed-off-by: Cong Liu liucong2@kylinos.cn
drivers/gpu/drm/qxl/qxl_kms.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c index 4dc5ad13f12c..0e61ac04d8ad 100644 --- a/drivers/gpu/drm/qxl/qxl_kms.c +++ b/drivers/gpu/drm/qxl/qxl_kms.c @@ -165,7 +165,11 @@ int qxl_device_init(struct qxl_device *qdev, (int)qdev->surfaceram_size / 1024, (sb == 4) ? "64bit" : "32bit"); +#ifdef CONFIG_ARM64 + qdev->rom = ioremap_cache(qdev->rom_base, qdev->rom_size); +#else qdev->rom = ioremap(qdev->rom_base, qdev->rom_size); +#endif if (!qdev->rom) { pr_err("Unable to ioremap ROM\n"); r = -ENOMEM; @@ -183,9 +187,15 @@ int qxl_device_init(struct qxl_device *qdev, goto rom_unmap; } +#ifdef CONFIG_ARM64 + qdev->ram_header = ioremap_cache(qdev->vram_base + + qdev->rom->ram_header_offset, + sizeof(*qdev->ram_header)); +#else qdev->ram_header = ioremap(qdev->vram_base + qdev->rom->ram_header_offset, sizeof(*qdev->ram_header)); +#endif if (!qdev->ram_header) { DRM_ERROR("Unable to ioremap RAM header\n"); r = -ENOMEM;
Am 23.03.22 um 10:45 schrieb Robin Murphy:
On 2022-03-23 07:15, Christian König wrote:
Am 22.03.22 um 10:34 schrieb Cong Liu:
qxl use ioremap to map ram_header and rom, in the arm64 implementation, the device is mapped as DEVICE_nGnRE, it can not support unaligned access.
Well that some ARM boards doesn't allow unaligned access to MMIO space is a well known bug of those ARM boards.
So as far as I know this is a hardware bug you are trying to workaround here and I'm not 100% sure that this is correct.
No, this one's not a bug. The Device memory type used for iomem mappings is *architecturally* defined to enforce properties like aligned accesses, no speculation, no reordering, etc. If something wants to be treated more like RAM than actual MMIO registers, then ioremap_wc() or ioremap_cache() is the appropriate thing to do in general (with the former being a bit more portable according to Documentation/driver-api/device-io.rst).
Of course *then* you might find that on some systems the interconnect/PCIe implementation/endpoint doesn't actually like unaligned accesses once the CPU MMU starts allowing them to be sent out, but hey, one step at a time ;)
Ah, good point! I was already wondering why that sometimes work and sometimes doesn't.
Thanks, Christian.
Robin.
Christian.
6.620515] pc : setup_hw_slot+0x24/0x60 [qxl] [ 6.620961] lr : setup_slot+0x34/0xf0 [qxl] [ 6.621376] sp : ffff800012b73760 [ 6.621701] x29: ffff800012b73760 x28: 0000000000000001 x27: 0000000010000000 [ 6.622400] x26: 0000000000000001 x25: 0000000004000000 x24: ffffcf376848c000 [ 6.623099] x23: ffff0000c4087400 x22: ffffcf3718e17140 x21: 0000000000000000 [ 6.623823] x20: ffff0000c4086000 x19: ffff0000c40870b0 x18: 0000000000000014 [ 6.624519] x17: 000000004d3605ab x16: 00000000bb3b6129 x15: 000000006e771809 [ 6.625214] x14: 0000000000000001 x13: 007473696c5f7974 x12: 696e696666615f65 [ 6.625909] x11: 00000000d543656a x10: 0000000000000000 x9 : ffffcf3718e085a4 [ 6.626616] x8 : 00000000006c7871 x7 : 000000000000000a x6 : 0000000000000017 [ 6.627343] x5 : 0000000000001400 x4 : ffff800011f63400 x3 : 0000000014000000 [ 6.628047] x2 : 0000000000000000 x1 : ffff0000c40870b0 x0 : ffff0000c4086000 [ 6.628751] Call trace: [ 6.628994] setup_hw_slot+0x24/0x60 [qxl] [ 6.629404] setup_slot+0x34/0xf0 [qxl] [ 6.629790] qxl_device_init+0x6f0/0x7f0 [qxl] [ 6.630235] qxl_pci_probe+0xdc/0x1d0 [qxl] [ 6.630654] local_pci_probe+0x48/0xb8 [ 6.631027] pci_device_probe+0x194/0x208 [ 6.631464] really_probe+0xd0/0x458 [ 6.631818] __driver_probe_device+0x124/0x1c0 [ 6.632256] driver_probe_device+0x48/0x130 [ 6.632669] __driver_attach+0xc4/0x1a8 [ 6.633049] bus_for_each_dev+0x78/0xd0 [ 6.633437] driver_attach+0x2c/0x38 [ 6.633789] bus_add_driver+0x154/0x248 [ 6.634168] driver_register+0x6c/0x128 [ 6.635205] __pci_register_driver+0x4c/0x58 [ 6.635628] qxl_init+0x48/0x1000 [qxl] [ 6.636013] do_one_initcall+0x50/0x240 [ 6.636390] do_init_module+0x60/0x238 [ 6.636768] load_module+0x2458/0x2900 [ 6.637136] __do_sys_finit_module+0xbc/0x128 [ 6.637561] __arm64_sys_finit_module+0x28/0x38 [ 6.638004] invoke_syscall+0x74/0xf0 [ 6.638366] el0_svc_common.constprop.0+0x58/0x1a8 [ 6.638836] do_el0_svc+0x2c/0x90 [ 6.639216] el0_svc+0x40/0x190 [ 6.639526] el0t_64_sync_handler+0xb0/0xb8 [ 6.639934] el0t_64_sync+0x1a4/0x1a8 [ 6.640294] Code: 910003fd f9484804 f9400c23 8b050084 (f809c083) [ 6.640889] ---[ end trace 95615d89b7c87f95 ]---
Signed-off-by: Cong Liu liucong2@kylinos.cn
drivers/gpu/drm/qxl/qxl_kms.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c index 4dc5ad13f12c..0e61ac04d8ad 100644 --- a/drivers/gpu/drm/qxl/qxl_kms.c +++ b/drivers/gpu/drm/qxl/qxl_kms.c @@ -165,7 +165,11 @@ int qxl_device_init(struct qxl_device *qdev, (int)qdev->surfaceram_size / 1024, (sb == 4) ? "64bit" : "32bit"); +#ifdef CONFIG_ARM64 + qdev->rom = ioremap_cache(qdev->rom_base, qdev->rom_size); +#else qdev->rom = ioremap(qdev->rom_base, qdev->rom_size); +#endif if (!qdev->rom) { pr_err("Unable to ioremap ROM\n"); r = -ENOMEM; @@ -183,9 +187,15 @@ int qxl_device_init(struct qxl_device *qdev, goto rom_unmap; } +#ifdef CONFIG_ARM64 + qdev->ram_header = ioremap_cache(qdev->vram_base + + qdev->rom->ram_header_offset, + sizeof(*qdev->ram_header)); +#else qdev->ram_header = ioremap(qdev->vram_base + qdev->rom->ram_header_offset, sizeof(*qdev->ram_header)); +#endif if (!qdev->ram_header) { DRM_ERROR("Unable to ioremap RAM header\n"); r = -ENOMEM;
On Wed, Mar 23, 2022 at 09:45:13AM +0000, Robin Murphy wrote:
On 2022-03-23 07:15, Christian König wrote:
Am 22.03.22 um 10:34 schrieb Cong Liu:
qxl use ioremap to map ram_header and rom, in the arm64 implementation, the device is mapped as DEVICE_nGnRE, it can not support unaligned access.
Well that some ARM boards doesn't allow unaligned access to MMIO space is a well known bug of those ARM boards.
So as far as I know this is a hardware bug you are trying to workaround here and I'm not 100% sure that this is correct.
No, this one's not a bug. The Device memory type used for iomem mappings is *architecturally* defined to enforce properties like aligned accesses, no speculation, no reordering, etc. If something wants to be treated more like RAM than actual MMIO registers, then ioremap_wc() or ioremap_cache() is the appropriate thing to do in general (with the former being a bit more portable according to Documentation/driver-api/device-io.rst).
Well, qxl is a virtual device, so it *is* ram.
I'm wondering whenever qxl actually works on arm? As far I know all virtual display devices with (virtual) pci memory bars for vram do not work on arm due to the guest mapping vram as io memory and the host mapping vram as normal ram and the mapping attribute mismatch causes caching troubles (only noticeable on real arm hardware, not in emulation). Did something change here recently?
take care, Gerd
On 2022-03-23 10:11, Gerd Hoffmann wrote:
On Wed, Mar 23, 2022 at 09:45:13AM +0000, Robin Murphy wrote:
On 2022-03-23 07:15, Christian K�nig wrote:
Am 22.03.22 um 10:34 schrieb Cong Liu:
qxl use ioremap to map ram_header and rom, in the arm64 implementation, the device is mapped as DEVICE_nGnRE, it can not support unaligned access.
Well that some ARM boards doesn't allow unaligned access to MMIO space is a well known bug of those ARM boards.
So as far as I know this is a hardware bug you are trying to workaround here and I'm not 100% sure that this is correct.
No, this one's not a bug. The Device memory type used for iomem mappings is *architecturally* defined to enforce properties like aligned accesses, no speculation, no reordering, etc. If something wants to be treated more like RAM than actual MMIO registers, then ioremap_wc() or ioremap_cache() is the appropriate thing to do in general (with the former being a bit more portable according to Documentation/driver-api/device-io.rst).
Well, qxl is a virtual device, so it *is* ram.
I'm wondering whenever qxl actually works on arm? As far I know all virtual display devices with (virtual) pci memory bars for vram do not work on arm due to the guest mapping vram as io memory and the host mapping vram as normal ram and the mapping attribute mismatch causes caching troubles (only noticeable on real arm hardware, not in emulation). Did something change here recently?
Indeed, Armv8.4 introduced the S2FWB feature to cope with situations like this - essentially it allows the hypervisor to share RAM-backed pages with the guest without losing coherency regardless of how the guest maps them.
Robin.
dri-devel@lists.freedesktop.org