On Tue, May 23, 2017 at 12:00:16PM +0200, Lukas Wunner wrote:
On Mon, May 22, 2017 at 09:24:34PM +0200, Daniel Vetter wrote:
On Thu, May 18, 2017 at 09:33:44PM +0200, Lukas Wunner wrote:
Nicolai Stange reports the following oops which is caused by dereferencing rdev->pdev before it's subsequently set by radeon_device_init(). Fix it.
BUG: unable to handle kernel NULL pointer dereference at 00000000000007cb
IP: radeon_driver_load_kms+0xeb/0x230 [radeon]
PGD 0 P4D 0
Oops: 0000 [#1] SMP
Modules linked in: amdkfd amd_iommu_v2 i915(+) radeon(+) i2c_algo_bit drm_kms_helper ttm e1000e drm sdhci_pci sdhci_acpi ptp sdhci crc32c_intel serio_raw mmc_core pps_core video i2c_hid hid_plantronics
CPU: 4 PID: 389 Comm: systemd-udevd Not tainted 4.12.0-rc1-next-20170515+ #1
Hardware name: Dell Inc. Latitude E6540/0725FP, BIOS A10 06/26/2014
task: ffff97d62c8f0000 task.stack: ffffb96f01478000
RIP: 0010:radeon_driver_load_kms+0xeb/0x230 [radeon]
RSP: 0018:ffffb96f0147b9d0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff97d620085000 RCX: 0000000000610037
RDX: 0000000000000000 RSI: 000000000000002b RDI: 0000000000000000
RBP: ffffb96f0147b9e8 R08: 0000000000000002 R09: ffffb96f0147b924
R10: 0000000000000000 R11: ffff97d62edd2ec0 R12: ffff97d628d5c000
R13: 0000000000610037 R14: ffffffffc0698280 R15: 0000000000000000
FS: 00007f496363d8c0(0000) GS:ffff97d62eb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000007cb CR3: 000000022c14c000 CR4: 00000000001406e0
Call Trace:
 drm_dev_register+0x146/0x1d0 [drm]
 drm_get_pci_dev+0x9a/0x180 [drm]
 radeon_pci_probe+0xb8/0xe0 [radeon]
 local_pci_probe+0x45/0xa0
 pci_device_probe+0x14f/0x1a0
 driver_probe_device+0x29c/0x450
 __driver_attach+0xdf/0xf0
 ? driver_probe_device+0x450/0x450
 bus_for_each_dev+0x6c/0xc0
 driver_attach+0x1e/0x20
 bus_add_driver+0x170/0x270
 driver_register+0x60/0xe0
 ? 0xffffffffc0508000
 __pci_register_driver+0x4c/0x50
 drm_pci_init+0xeb/0x100 [drm]
 ? vga_switcheroo_register_handler+0x6a/0x90
 ? 0xffffffffc0508000
 radeon_init+0x98/0xb6 [radeon]
 do_one_initcall+0x52/0x1a0
 ? __vunmap+0x81/0xb0
 ? kmem_cache_alloc_trace+0x159/0x1b0
 ? do_init_module+0x27/0x1f8
 do_init_module+0x5f/0x1f8
 load_module+0x27ce/0x2be0
 SYSC_finit_module+0xdf/0x110
 ? SYSC_finit_module+0xdf/0x110
 SyS_finit_module+0xe/0x10
 do_syscall_64+0x67/0x150
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f4962295679
RSP: 002b:00007ffdd8c4f878 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055c014ed8200 RCX: 00007f4962295679
RDX: 0000000000000000 RSI: 00007f4962dd19c5 RDI: 0000000000000010
RBP: 00007f4962dd19c5 R08: 0000000000000000 R09: 00007ffdd8c4f990
R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
R13: 000055c014ed81a0 R14: 0000000000020000 R15: 000055c0149d1fca
Code: 5d 5d c3 8b 05 a7 05 14 00 49 81 cd 00 00 08 00 85 c0 74 a3 e8 e7 c0 0e 00 84 c0 74 9a 41 f7 c5 00 00 02 00 75 91 49 8b 44 24 10 <0f> b6 90 cb 07 00 00 f6 c2 20 74 1e e9 7b ff ff ff 48 8b 40 38
RIP: radeon_driver_load_kms+0xeb/0x230 [radeon] RSP: ffffb96f0147b9d0
CR2: 00000000000007cb
---[ end trace 89cc4ba7e569c65c ]---
Reported-by: Nicolai Stange <nicstange@gmail.com>
Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
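For readers following along, the ordering problem described in the commit message can be sketched in a small standalone C program. This is not the radeon code: the structs, the `thunderbolt` field, and the `device_init()`, `rdev_pdev_unset()` and `load()` helpers are hypothetical stand-ins, and reading the flag through the already-populated `ddev->pdev` is only an assumption about one way to avoid the early dereference. The oops itself is consistent with this pattern: RAX is 0 and the faulting address 0x7cb in CR2 matches the displacement in the marked instruction bytes (`<0f> b6 90 cb 07 00 00`), i.e. a byte load at a field offset from a NULL base pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct pci_dev { bool thunderbolt; };
struct drm_device { struct pci_dev *pdev; };    /* valid before load() runs */
struct radeon_device { struct pci_dev *pdev; }; /* set only by device_init() */

/* Stand-in for the init step that populates rdev->pdev. */
static void device_init(struct radeon_device *rdev, struct drm_device *ddev)
{
    rdev->pdev = ddev->pdev;
}

/* True if reading through rdev->pdev right now would dereference NULL. */
static bool rdev_pdev_unset(const struct radeon_device *rdev)
{
    return rdev->pdev == NULL;
}

/* Stand-in for the load path: the flag is needed before device_init(). */
static bool load(struct drm_device *ddev, struct radeon_device *rdev)
{
    /* At this point rdev->pdev has not been set yet, so a read such as
     * rdev->pdev->thunderbolt would fault. */
    assert(rdev_pdev_unset(rdev));

    /* Assumed workaround: use the pointer that is already valid. */
    bool tbt = ddev->pdev->thunderbolt;

    device_init(rdev, ddev);
    return tbt;
}
```

Calling `load()` with a zero-initialized `struct radeon_device` returns the flag without touching the unset pointer, and `rdev->pdev` is valid afterwards.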
Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and needs to be fixed, so sending out with a proper commit message now. The bug was only introduced to radeon, not amdgpu.
@Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't land before -rc3 because Sean Paul has already sent out the -rc2 pull. I notice you haven't sent out a pull for -rc2 yet, so maybe you want to take it yourself? Whichever you prefer. Thanks & sorry for the breakage!
Just noticed that this has landed already in drm-misc-fixes, without any r-b or at least an ack from radeon driver folks. That's breaking the drm-misc rules, we need at least an ack for small drivers (which radeon really isn't) and a full reviewed-by tag on everything else.
Patch doesn't look wrong, so not much harm, but please follow the ground rules and especially don't ever push your own patches without any peer feedback.
I was aware of that rule and that the available peer feedback (Nicolai's Tested-by) was thin. I misinterpreted Christian's remark that he has "all hands full replacing" Alex to mean that he is swamped with work and hadn't had the chance to look at my patch so far. Christian was already cc'ed on Nicolai's regression report and on every single e-mail that followed. I figured that nagging Christian wouldn't be helpful if he's already overloaded, yet I didn't want to miss another rc cycle, so I pushed without waiting further for a response. I'm sorry for the irritation this has caused. I guess in the future, nagging Christian despite regrets is the only option in such a case.
Don't worry too much. The entire point of drm-misc is to make contributing to drm/gpu drivers as painless as possible. Occasionally things go a bit wrong, but that's why I then want to focus on tooling & documentation to make sure we'll get this right in the future (if every committer needed to learn every implicit rule we have, we'd get nowhere at all). Assigning blame to people doesn't help in getting better at this stuff as a community.
Anyway, just my 2 cents of debriefing, looks like we're all good.
-Daniel