On Wed, Nov 18, 2020 at 4:12 AM David Laight <David.Laight@aculab.com> wrote:
> I've got the 'splat' below during boot.
> This is an 8-core C2758 Atom cpu using the on-board/cpu graphics.
> User space is Ubuntu 20.04.
>
> Additionally the X display has all the colours and alignment slightly
> messed up.
> 5.9.0 was ok.
> I'm just guessing the two issues are related.
Sounds likely. But it would be lovely if you could bisect when exactly the problem(s) started to both verify that, and just to pinpoint the exact change..
I'm adding Thomas Zimmermann to the cc, because he did that "drm/ast: Program display mode in CRTC's atomic_enable" which looks relevant in that it's right in that call-chain.
Did some initialization perhaps get overlooked?
And Dave and Daniel and the drm list cc'd as well..
Full splat left quoted below for new people and list.
Linus
> [ 20.809891] WARNING: CPU: 0 PID: 973 at drivers/gpu/drm/drm_gem_vram_helper.c:284 drm_gem_vram_offset+0x35/0x40 [drm_vram_helper]
> [ 20.821543] Modules linked in: nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif intel_powerclamp coretemp kvm_intel kvm joydev input_leds ipmi_si intel_cstate ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear ast drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel gpio_ich drm aesni_intel hid_generic glue_helper crypto_simd igb usbhid cryptd ahci i2c_i801 hid libahci i2c_smbus lpc_ich dca i2c_ismt i2c_algo_bit
> [ 20.887477] CPU: 0 PID: 973 Comm: gnome-shell Not tainted 5.10.0-rc4+ #78
> [ 20.894274] Hardware name: Supermicro A1SAi/A1SRi, BIOS 1.1a 08/27/2015
> [ 20.900896] RIP: 0010:drm_gem_vram_offset+0x35/0x40 [drm_vram_helper]
> [ 20.907342] Code: 00 48 89 e5 85 c0 74 17 48 83 bf 78 01 00 00 00 74 18 48 8b 87 80 01 00 00 5d 48 c1 e0 0c c3 0f 0b 48 c7 c0 ed ff ff ff 5d c3 <0f> 0b 31 c0 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 55 48 8b 87 18 06
> [ 20.926100] RSP: 0018:ffff9f59811d3a68 EFLAGS: 00010246
> [ 20.931339] RAX: 0000000000000002 RBX: ffff8b46861e20c0 RCX: ffffffffc032d600
> [ 20.938479] RDX: ffff8b468f47a000 RSI: ffff8b46861e2000 RDI: ffff8b468f9acc00
> [ 20.945622] RBP: ffff9f59811d3a68 R08: 0000000000000040 R09: ffff8b46864ce288
> [ 20.952769] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8b468f47a000
> [ 20.959915] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8b468ad2bf00
> [ 20.967057] FS: 00007f5b37ac5cc0(0000) GS:ffff8b49efc00000(0000) knlGS:0000000000000000
> [ 20.975149] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 20.980904] CR2: 00007f5b3d093f00 CR3: 0000000103438000 CR4: 00000000001006f0
> [ 20.988047] Call Trace:
> [ 20.990506] ast_cursor_page_flip+0x22/0x100 [ast]
> [ 20.995313] ast_cursor_plane_helper_atomic_update+0x46/0x70 [ast]
> [ 21.001524] drm_atomic_helper_commit_planes+0xbd/0x220 [drm_kms_helper]
> [ 21.008243] drm_atomic_helper_commit_tail_rpm+0x3a/0x70 [drm_kms_helper]
> [ 21.015062] commit_tail+0x99/0x130 [drm_kms_helper]
> [ 21.020050] drm_atomic_helper_commit+0x123/0x150 [drm_kms_helper]
> [ 21.026269] drm_atomic_commit+0x4a/0x50 [drm]
> [ 21.030737] drm_atomic_helper_update_plane+0xe7/0x140 [drm_kms_helper]
> [ 21.037384] __setplane_atomic+0xcc/0x110 [drm]
> [ 21.041953] drm_mode_cursor_universal+0x13e/0x260 [drm]
> [ 21.047299] drm_mode_cursor_common+0xef/0x220 [drm]
> [ 21.052287] ? alloc_set_pte+0x10d/0x6d0
> [ 21.056244] ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
> [ 21.061242] drm_mode_cursor2_ioctl+0xe/0x10 [drm]
> [ 21.066067] drm_ioctl_kernel+0xae/0xf0 [drm]
> [ 21.070455] drm_ioctl+0x241/0x3f0 [drm]
> [ 21.074415] ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
> [ 21.079401] __x64_sys_ioctl+0x91/0xc0
> [ 21.083167] do_syscall_64+0x38/0x90
> [ 21.086755] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 21.091813] RIP: 0033:0x7f5b3cf1350b
> [ 21.095403] Code: 0f 1e fa 48 8b 05 85 39 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 39 0d 00 f7 d8 64 89 01 48
> [ 21.114154] RSP: 002b:00007ffef1966588 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 21.121730] RAX: ffffffffffffffda RBX: 00007ffef19665c0 RCX: 00007f5b3cf1350b
> [ 21.128870] RDX: 00007ffef19665c0 RSI: 00000000c02464bb RDI: 0000000000000009
> [ 21.136013] RBP: 00000000c02464bb R08: 0000000000000040 R09: 0000000000000004
> [ 21.143157] R10: 0000000000000002 R11: 0000000000000246 R12: 0000561ec9d10060
> [ 21.150295] R13: 0000000000000009 R14: 0000561eca2cc9a0 R15: 0000000000000040
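The user-space half of the register dump shows which call gnome-shell made: on x86-64, ORIG_RAX 0x10 is the ioctl syscall, RDI (9) is its first argument (the file descriptor) and RSI is the request number. The sketch below decodes RSI with the generic _IOC field layout from asm-generic/ioctl.h (8-bit number, 8-bit type, 14-bit size, 2-bit direction); the helper name `decode_ioctl` is purely illustrative. The result, type 'd' (the DRM ioctl base), number 0xbb and a 36-byte read/write argument, appears consistent with DRM_IOCTL_MODE_CURSOR2 and so with the drm_mode_cursor2_ioctl frame in the call trace.

```python
# Generic Linux ioctl request layout (asm-generic/ioctl.h), as used on x86-64:
# bits 0-7 = number, 8-15 = type, 16-29 = argument size, 30-31 = direction.
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS, IOC_DIRBITS = 8, 8, 14, 2
IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS      # 8
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS  # 16
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS   # 30
IOC_WRITE, IOC_READ = 1, 2                    # direction bits


def decode_ioctl(req: int) -> dict:
    """Split an ioctl request number into its four fields (illustrative helper)."""
    def field(shift: int, bits: int) -> int:
        return (req >> shift) & ((1 << bits) - 1)

    return {
        "dir": field(IOC_DIRSHIFT, IOC_DIRBITS),
        "type": chr(field(IOC_TYPESHIFT, IOC_TYPEBITS)),
        "nr": field(IOC_NRSHIFT, IOC_NRBITS),
        "size": field(IOC_SIZESHIFT, IOC_SIZEBITS),
    }


if __name__ == "__main__":
    fields = decode_ioctl(0xC02464BB)  # RSI from the splat
    print(fields)  # {'dir': 3, 'type': 'd', 'nr': 187, 'size': 36}
    print(hex(fields["nr"]), fields["dir"] == IOC_READ | IOC_WRITE)  # 0xbb True
```

A 36-byte argument matches nine 32-bit fields, which is what a cursor-with-hotspot request needs (flags, CRTC id, position, size, buffer handle, hotspot x/y).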
From: Linus Torvalds
Sent: 18 November 2020 18:11
> Sounds likely. But it would be lovely if you could bisect when exactly the problem(s) started to both verify that, and just to pinpoint the exact change..
I'm working on it - have been all afternoon. (I'm on holiday and it is raining...)
5.10-rc1 fails, so it is something in the merge window. I suspect I'll just hit the pull of the drm changes. The bisect suddenly built a 5.9-rc5+ kernel! So I'm retesting a good/bad pair with likely dates and will restart it.
Annoyingly, the test system defaults to booting the highest-version kernel, not the one I've just built, so I may have given it a wrong answer. The builds also all take 20 minutes, so the bisect is slow.
David
- Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Hi
On 18.11.20 at 19:10, Linus Torvalds wrote:
> I'm adding Thomas Zimmermann to the cc, because he did that "drm/ast: Program display mode in CRTC's atomic_enable" which looks relevant in that it's right in that call-chain.
> Did some initialization perhaps get overlooked?
> [ 20.809891] WARNING: CPU: 0 PID: 973 at drivers/gpu/drm/drm_gem_vram_helper.c:284 drm_gem_vram_offset+0x35/0x40 [drm_vram_helper]
That line is at [1], which comes from
46642a7d4d80 ("drm/vram-helper: don't use ttm bo->offset v4")
But the patch was merged in 5.9-rc1, so it's probably something else.
We've had a lot of TTM-related changes recently, so my best guess is that it's something in TTM with BO initialization.
From some grepping, it looks like we have to call ttm_bo_mem_space() to fill mm_node (i.e., the pointer that causes the warning). But I cannot find where vram helpers do this. Maybe that's a good starting point.
I'm adding the TTM devs to cc.
Best regards Thomas
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv...
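Thomas's reading can be cross-checked against the Code: bytes in the splat. Decoded by hand (an interpretation, not tool output), they appear to hold two warning traps (ud2, bytes 0f 0b). The first is taken when a value tested in EAX, presumably the pin count, is zero, and returns -19 (-ENODEV); the second is taken when the pointer at offset 0x178 from RDI is NULL, and returns 0. The splat's <0f> marker sits on that second trap. Otherwise the code returns the value at offset 0x180 shifted left by 12. A minimal Python model of that logic follows, assuming (as Thomas suggests) that the NULL pointer is mm_node and that the shifted value is the buffer's start page; VramBo, MemRegion and vram_offset are hypothetical stand-ins, not kernel structures.

```python
import warnings
from dataclasses import dataclass
from typing import Optional

PAGE_SHIFT = 12  # 4 KiB pages; matches "shl $0xc" (48 c1 e0 0c) in the Code: bytes
ENODEV = 19      # 0xffffffed in the Code: bytes is -19


@dataclass
class MemRegion:                # hypothetical stand-in for the BO's placement info
    mm_node: Optional[object]   # None = no placement was ever assigned
    start: int                  # first VRAM page of the buffer


@dataclass
class VramBo:                   # hypothetical stand-in for a VRAM-helper GEM object
    pin_count: int
    mem: MemRegion


def vram_offset(gbo: VramBo) -> int:
    """Model of the two checks seen in drm_gem_vram_offset()'s machine code."""
    if gbo.pin_count == 0:
        warnings.warn("offset requested for an unpinned BO")
        return -ENODEV
    if gbo.mem.mm_node is None:
        # The branch the <0f> 0b marker in the splat points at: warn, return 0.
        warnings.warn("pinned BO has no mm_node")
        return 0
    return gbo.mem.start << PAGE_SHIFT


print(vram_offset(VramBo(pin_count=1, mem=MemRegion(mm_node=object(), start=3))))  # 12288
```

If this model is right, the cursor code in the trace is handed offset 0 rather than an error, so it carries on with a bogus VRAM address instead of failing.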
From: Thomas Zimmermann
Sent: 18 November 2020 19:37
> Sounds likely. But it would be lovely if you could bisect when exactly the problem(s) started to both verify that, and just to pinpoint the exact change..
I don't quite understand what 'git bisect' did. I was bisecting between v5.9 and v5.10-rc1 but it suddenly started generating v5.9.0-rc5+ kernels.
The identified commit was 13a8f46d803 drm/ttm: move ghost object created. (retyped - hope it is right). But the diff to that last 'good' commit is massive.
So I don't know if that is anywhere near right.
David
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Managing Director: Felix Imendörffer
On Wed, Nov 18, 2020 at 11:01 PM David Laight David.Laight@aculab.com wrote:
> I don't quite understand what 'git bisect' did. I was bisecting between v5.9 and v5.10-rc1 but it suddenly started generating v5.9.0-rc5+ kernels.
We queue up patches for -rc1 way before the previous kernel is released, so this is normal.
> The identified commit was 13a8f46d803 drm/ttm: move ghost object created. (retyped - hope it is right). But the diff to that last 'good' commit is massive.
Yeah that's also normal for non-linear history. If you want to double-check, re-test the parent of that commit (which is 2ee476f77ffe ("drm/ttm: add a simple assign mem to bo wrapper")), which should work, and then the bad commit.
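Why a v5.9..v5.10-rc1 bisect can build a "5.9.0-rc5+" kernel can be shown on a toy history. git bisect only tests commits reachable from the bad revision but not from the good one (the set `git rev-list bad ^good` lists), and a subsystem tree that branched from an earlier -rc and was merged during the merge window contributes commits whose Makefile still says 5.9-rc5. The commit graph below is made up for illustration; only the set arithmetic reflects how bisect picks its candidates.

```python
from typing import Dict, List, Set

# Made-up, heavily simplified history: commit -> list of parent commits.
PARENTS: Dict[str, List[str]] = {
    "v5.9-rc5": [],
    "drm-1": ["v5.9-rc5"],           # drm work queued on top of 5.9-rc5
    "drm-2": ["drm-1"],
    "v5.9": ["v5.9-rc5"],            # mainline continues to the release
    "merge-drm": ["v5.9", "drm-2"],  # merge window pulls the drm tree
    "v5.10-rc1": ["merge-drm"],
}


def ancestors(commit: str) -> Set[str]:
    """All commits reachable from `commit`, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen


def bisect_candidates(good: str, bad: str) -> Set[str]:
    """Commits a bisect may test: reachable from `bad` but not from `good`."""
    return ancestors(bad) - ancestors(good)


print(sorted(bisect_candidates("v5.9", "v5.10-rc1")))
# ['drm-1', 'drm-2', 'merge-drm', 'v5.10-rc1']
```

drm-1 and drm-2 sit on top of v5.9-rc5, so kernels built from them report 5.9.0-rc5+. A diff between one of them and a mainline commit tested earlier also includes everything mainline gained after rc5, which is one way the "massive" diff arises.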
Also is this the first bad commit for both the splat and the screen corruption issues?
> So I don't know if that is anywhere near right.
Thomas guessed it could be a ttm change, you hit one, and it looks like it could be the culprit. Now I guess it's up to Dave. Also adding Christian, in case he has an idea.
-Daniel
On Thu, 19 Nov 2020 at 08:15, Daniel Vetter daniel.vetter@ffwll.ch wrote:
> Thomas guessed it could be a ttm change, you hit one, and it looks like it could be the culprit. Now I guess it's up to Dave. Also adding Christian, in case he has an idea.
I'd be mildly surprised if it's that commit, since it just refactors what looks to me to be two identical code pieces into one instance (within the scope of me screwing that up, but reading it I can't see it).
I'll dig into this today.
Dave.
On Thu, 19 Nov 2020 at 08:25, Dave Airlie airlied@gmail.com wrote:
> I'll dig into this today.
https://patchwork.freedesktop.org/patch/401559/
should fix it.
We had a report in the rc1 thread, but it got lost in the nouveau stuff as well. I've cc'd that reporter too.
Please test.
Dave.
From: Dave Airlie
Sent: 19 November 2020 01:16
> https://patchwork.freedesktop.org/patch/401559/
> should fix it.
Nope, and probably not relevant. pl_flags is 2 or 3 and it is testing for 4.
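David's dismissal rests on a plain bitmask test: if the code in question only acts when the bit with value 4 is set in pl_flags, then the values he sees (2 = 0b010 and 3 = 0b011) never take that path. A minimal illustration; `has_flag` is a made-up helper, and which TTM placement flags the values 2, 3 and 4 stand for is not stated in the thread.

```python
def has_flag(pl_flags: int, mask: int) -> bool:
    """True if any bit of `mask` is set in `pl_flags` (illustrative helper)."""
    return (pl_flags & mask) != 0


for pl_flags in (2, 3):
    # 2 = 0b010 and 3 = 0b011: neither has the 0b100 bit set
    print(pl_flags, bin(pl_flags), has_flag(pl_flags, 4))
# 2 0b10 False
# 3 0b11 False
```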
The oldest kernel doesn't generate the 'splat' either. Just the f*cked up display output.
I'll put a screenshot (photo) into another email.
David
From: David Laight David.Laight@ACULAB.COM
Sent: 19 November 2020 09:31
> ...
> Additionally the X display has all the colours and alignment slightly messed up.
> 5.9.0 was ok.
> ...
> I'll put a screenshot (photo) into another email.
An oddity I'd not noticed is that if I let the screensaver blank the screen, when it is re-initialised the colours are ok (a nice? purple).
So it might just be some (major) race condition in the startup scripts.
David
Hi David
On 18.11.20 at 23:01, David Laight wrote:
> I don't quite understand what 'git bisect' did. I was bisecting between v5.9 and v5.10-rc1 but it suddenly started generating v5.9.0-rc5+ kernels.
> The identified commit was 13a8f46d803 drm/ttm: move ghost object created. (retyped - hope it is right). But the diff to that last 'good' commit is massive.
> So I don't know if that is anywhere near right.
Did you try Daniel's suggestion of testing with the direct parent commit?
Best regards Thomas
David
I'm adding Thomas Zimmermann to the cc, because he did that "drm/ast: Program display mode in CRTC's atomic_enable" which looks relevant in that it's right in that call-chain.
Did some initialization perhaps get overlooked?
And Dave and Daniel and the drm list cc'd as well..
Full splat left quoted below for new people and list.
Linus
[ 20.809891] WARNING: CPU: 0 PID: 973 at drivers/gpu/drm/drm_gem_vram_helper.c:284 drm_gem_vram_offset+0x35/0x40 [drm_vram_helper]
That line is at [1], which comes from
46642a7d4d80 ("drm/vram-helper: don't use ttm bo->offset v4")
But the patch was merged in 5.9-rc1, so it's probably something else.
We've had a lot of TTM-related changes recently, so my best guess is that it's something in TTM with BO initialization.
From some grepping, it looks like we have to call ttm_bo_mem_space() to fill mm_node (i.e., the pointer that causes the warning). But I cannot find where vram helpers do this. Maybe that's a good starting point.
I'm adding the TTM devs to cc.
Best regards Thomas
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv... elper.c?h=v5.10-rc4#n284
[ 20.821543] Modules linked in: nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif intel_powerclamp coretemp kvm_intel kvm joydev input_leds ipmi_si intel_cstate ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear ast drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel gpio_ich drm aesni_intel hid_generic glue_helper crypto_simd igb usbhid cryptd ahci i2c_i801 hid libahci i2c_smbus lpc_ich dca i2c_ismt i2c_algo_bit
[ 20.887477] CPU: 0 PID: 973 Comm: gnome-shell Not tainted 5.10.0-rc4+ #78
[ 20.894274] Hardware name: Supermicro A1SAi/A1SRi, BIOS 1.1a 08/27/2015
[ 20.900896] RIP: 0010:drm_gem_vram_offset+0x35/0x40 [drm_vram_helper]
[ 20.907342] Code: 00 48 89 e5 85 c0 74 17 48 83 bf 78 01 00 00 00 74 18 48 8b 87 80 01 00 00 5d 48 c1 e0 0c c3 0f 0b 48 c7 c0 ed ff ff ff 5d c3 <0f> 0b 31 c0 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 55 48 8b 87 18 06
[ 20.926100] RSP: 0018:ffff9f59811d3a68 EFLAGS: 00010246
[ 20.931339] RAX: 0000000000000002 RBX: ffff8b46861e20c0 RCX: ffffffffc032d600
[ 20.938479] RDX: ffff8b468f47a000 RSI: ffff8b46861e2000 RDI: ffff8b468f9acc00
[ 20.945622] RBP: ffff9f59811d3a68 R08: 0000000000000040 R09: ffff8b46864ce288
[ 20.952769] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8b468f47a000
[ 20.959915] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8b468ad2bf00
[ 20.967057] FS: 00007f5b37ac5cc0(0000) GS:ffff8b49efc00000(0000) knlGS:0000000000000000
[ 20.975149] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 20.980904] CR2: 00007f5b3d093f00 CR3: 0000000103438000 CR4: 00000000001006f0
[ 20.988047] Call Trace:
[ 20.990506] ast_cursor_page_flip+0x22/0x100 [ast]
[ 20.995313] ast_cursor_plane_helper_atomic_update+0x46/0x70 [ast]
[ 21.001524] drm_atomic_helper_commit_planes+0xbd/0x220 [drm_kms_helper]
[ 21.008243] drm_atomic_helper_commit_tail_rpm+0x3a/0x70 [drm_kms_helper]
[ 21.015062] commit_tail+0x99/0x130 [drm_kms_helper]
[ 21.020050] drm_atomic_helper_commit+0x123/0x150 [drm_kms_helper]
[ 21.026269] drm_atomic_commit+0x4a/0x50 [drm]
[ 21.030737] drm_atomic_helper_update_plane+0xe7/0x140 [drm_kms_helper]
[ 21.037384] __setplane_atomic+0xcc/0x110 [drm]
[ 21.041953] drm_mode_cursor_universal+0x13e/0x260 [drm]
[ 21.047299] drm_mode_cursor_common+0xef/0x220 [drm]
[ 21.052287] ? alloc_set_pte+0x10d/0x6d0
[ 21.056244] ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
[ 21.061242] drm_mode_cursor2_ioctl+0xe/0x10 [drm]
[ 21.066067] drm_ioctl_kernel+0xae/0xf0 [drm]
[ 21.070455] drm_ioctl+0x241/0x3f0 [drm]
[ 21.074415] ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
[ 21.079401] __x64_sys_ioctl+0x91/0xc0
[ 21.083167] do_syscall_64+0x38/0x90
[ 21.086755] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 21.091813] RIP: 0033:0x7f5b3cf1350b
[ 21.095403] Code: 0f 1e fa 48 8b 05 85 39 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 39 0d 00 f7 d8 64 89 01 48
[ 21.114154] RSP: 002b:00007ffef1966588 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 21.121730] RAX: ffffffffffffffda RBX: 00007ffef19665c0 RCX: 00007f5b3cf1350b
[ 21.128870] RDX: 00007ffef19665c0 RSI: 00000000c02464bb RDI: 0000000000000009
[ 21.136013] RBP: 00000000c02464bb R08: 0000000000000040 R09: 0000000000000004
[ 21.143157] R10: 0000000000000002 R11: 0000000000000246 R12: 0000561ec9d10060
[ 21.150295] R13: 0000000000000009 R14: 0000561eca2cc9a0 R15: 0000000000000040
-- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Felix Imendörffer
_______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel