https://bugzilla.kernel.org/show_bug.cgi?id=209987
Bug ID: 209987
Summary: Memory leak in amdgpu_dm_update_connector_after_detect
Product: Drivers
Version: 2.5
Kernel Version: 5.9.1
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: lstarnes1024@gmail.com
Regression: No
Created attachment 293341
  --> https://bugzilla.kernel.org/attachment.cgi?id=293341&action=edit
/sys/kernel/debug/kmemleak
It looks like there's a memory leak in drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c:amdgpu_dm_update_connector_after_detect. It appears to be calling drm_add_edid_modes, which indirectly calls into either do_detailed_mode or drm_mode_duplicate.
This has caused me to run out of memory a handful of times, which could only be resolved by rebooting.
I only experienced this after upgrading to 5.9.1, and it looks like commit b24bdc37d03a0478189e20a50286092840f414fa added the call to drm_add_edid_modes in amdgpu_dm_update_connector_after_detect.
--- Comment #1 from Lee Starnes (lstarnes1024@gmail.com) ---
Created attachment 293343
  --> https://bugzilla.kernel.org/attachment.cgi?id=293343&action=edit
dmesg with oom-killer invocations
Note that the stack has amdgpu_dm_update_connector_after_detect+0x28d/0x330 > drm_add_edid_modes+0x6e1/0x1860. This dmesg was recorded on Linux 5.9.1, but the kmemleak output was captured on Linux 5.9.2.
--- Comment #2 from Lee Starnes (lstarnes1024@gmail.com) ---
It looks like this can be fixed by setting aconnector->num_modes to the return value from drm_add_edid_modes. At least one other place in amdgpu_dm.c sets struct amdgpu_dm_connector.num_modes to the return value of drm_add_edid_modes like this. I'm not familiar enough with AMDGPU or DRM internals to know if this will mess anything up.
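For readers following along, the idea in comment #2 can be sketched in userspace C. This is only an illustration under stated assumptions, not the attached patch: the `*_stub` types and functions are hypothetical stand-ins for struct amdgpu_dm_connector, struct edid and drm_add_edid_modes (which, in the kernel, returns the number of modes it added). Whether recording that count actually prevents the leak is not established in this report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct edid; the real type is a raw EDID block. */
struct edid_stub {
    int mode_count;
};

/* Hypothetical stand-in for struct amdgpu_dm_connector (many fields omitted). */
struct dm_connector_stub {
    struct edid_stub *edid;
    int num_modes;
};

/* Stand-in for drm_add_edid_modes(): returns how many modes were added. */
static int add_edid_modes_stub(struct dm_connector_stub *connector,
                               const struct edid_stub *edid)
{
    (void)connector;
    return edid ? edid->mode_count : 0;
}

/* The proposed idea: keep the return value instead of discarding it. */
static void update_connector_after_detect_stub(struct dm_connector_stub *aconnector)
{
    assert(aconnector != NULL);
    aconnector->num_modes = add_edid_modes_stub(aconnector, aconnector->edid);
}
```

The only behavioural difference from discarding the return value is that num_modes reflects the number of modes just added to the connector.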
--- Comment #3 from Lee Starnes (lstarnes1024@gmail.com) ---
Created attachment 293577
  --> https://bugzilla.kernel.org/attachment.cgi?id=293577&action=edit
proposed patch
youling257@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |youling257@gmail.com
--- Comment #4 from youling257@gmail.com ---
I have the same memory leak.
android_x86:/ # echo scan > /sys/kernel/debug/kmemleak
android_x86:/ # cat /sys/kernel/debug/kmemleak
android_x86:/ # echo scan > /sys/kernel/debug/kmemleak
android_x86:/ # cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8edad8208580 (size 128):
  comm "ueventd", pid 1498, jiffies 4294676333 (age 65.106s)
  hex dump (first 32 bytes):
    22 16 04 00 00 0a 30 0a 50 0a a0 0a 00 00 40 06  ".....0.P.....@.
    43 06 48 06 69 06 00 00 05 00 00 00 00 00 00 00  C.H.i...........
  backtrace:
    [<0000000080ce8e0b>] do_detailed_mode+0x27c/0x520 [drm]
    [<000000000427e646>] drm_for_each_detailed_block.part.0+0x35/0x110 [drm]
    [<00000000566583b3>] drm_add_edid_modes+0x22b/0x1880 [drm]
    [<00000000f63b328b>] amdgpu_dm_update_connector_after_detect+0x385/0x4f0 [amdgpu]
    [<000000009f1bbb4c>] dm_helpers_read_local_edid+0xaa/0x170 [amdgpu]
    [<0000000005f6f065>] dc_link_detect_helper+0x29b/0xd70 [amdgpu]
    [<00000000a096d0f5>] dc_link_detect+0x31/0x50 [amdgpu]
    [<000000009a977098>] amdgpu_dm_init.isra.0.cold+0xf81/0x1297 [amdgpu]
    [<00000000cfd3da50>] dm_hw_init+0xe/0x20 [amdgpu]
    [<00000000128bd3d5>] amdgpu_device_init.cold+0x13c7/0x16b5 [amdgpu]
    [<0000000039b2a07d>] amdgpu_driver_load_kms+0x2b/0x200 [amdgpu]
    [<000000009b370228>] amdgpu_pci_probe+0x129/0x1b0 [amdgpu]
    [<0000000066485d99>] pci_device_probe+0xd2/0x150
    [<00000000c858be29>] really_probe+0x232/0x460
    [<00000000f84cda17>] driver_probe_device+0x5d/0x150
    [<00000000103f2cc3>] device_driver_attach+0xa1/0xb0
unreferenced object 0xffff8edad828f280 (size 128):
  comm "ueventd", pid 1498, jiffies 4294676333 (age 65.107s)
  hex dump (first 32 bytes):
    14 44 02 00 80 07 d8 07 04 08 98 08 00 00 38 04  .D............8.
    3c 04 41 04 65 04 00 00 0a 00 00 00 00 00 00 00  <.A.e...........
  backtrace:
    [<0000000017977f42>] drm_mode_duplicate+0x1f/0x90 [drm]
    [<00000000c4367b7e>] drm_mode_std+0x1fe/0x5e0 [drm]
    [<00000000d7555cdd>] drm_add_edid_modes+0x2c7/0x1880 [drm]
    [<00000000f63b328b>] amdgpu_dm_update_connector_after_detect+0x385/0x4f0 [amdgpu]
    [<000000009f1bbb4c>] dm_helpers_read_local_edid+0xaa/0x170 [amdgpu]
    [<0000000005f6f065>] dc_link_detect_helper+0x29b/0xd70 [amdgpu]
    [<00000000a096d0f5>] dc_link_detect+0x31/0x50 [amdgpu]
    [<000000009a977098>] amdgpu_dm_init.isra.0.cold+0xf81/0x1297 [amdgpu]
    [<00000000cfd3da50>] dm_hw_init+0xe/0x20 [amdgpu]
    [<00000000128bd3d5>] amdgpu_device_init.cold+0x13c7/0x16b5 [amdgpu]
    [<0000000039b2a07d>] amdgpu_driver_load_kms+0x2b/0x200 [amdgpu]
    [<000000009b370228>] amdgpu_pci_probe+0x129/0x1b0 [amdgpu]
    [<0000000066485d99>] pci_device_probe+0xd2/0x150
    [<00000000c858be29>] really_probe+0x232/0x460
    [<00000000f84cda17>] driver_probe_device+0x5d/0x150
    [<00000000103f2cc3>] device_driver_attach+0xa1/0xb0
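As a cross-check on the two hex dumps above, the leading bytes can be decoded as display mode timings. The sketch below is an editorial illustration, not part of the report. It assumes that the leaked objects are struct drm_display_mode instances and that, on these kernels, the struct begins with a 32-bit little-endian clock in kHz followed by 16-bit horizontal and vertical timing fields; the `mode_timings` struct and helper names are hypothetical. Under that assumption the first object reads as a 2560x1600 mode and the second as a 1920x1080 mode, which would fit the do_detailed_mode and drm_mode_duplicate backtraces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the ASSUMED leading fields of struct drm_display_mode. */
struct mode_timings {
    int32_t clock_khz;
    uint16_t hdisplay, hsync_start, hsync_end, htotal, hskew;
    uint16_t vdisplay, vsync_start, vsync_end, vtotal;
};

/* Read little-endian values byte by byte, independent of host endianness. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static int32_t le32(const uint8_t *p)
{
    return (int32_t)((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}

/* Decode the first 22 bytes of a dump under the assumed layout. */
static struct mode_timings decode_mode(const uint8_t *buf, size_t len)
{
    struct mode_timings m;

    assert(len >= 22);
    m.clock_khz   = le32(buf);
    m.hdisplay    = le16(buf + 4);
    m.hsync_start = le16(buf + 6);
    m.hsync_end   = le16(buf + 8);
    m.htotal      = le16(buf + 10);
    m.hskew       = le16(buf + 12);
    m.vdisplay    = le16(buf + 14);
    m.vsync_start = le16(buf + 16);
    m.vsync_end   = le16(buf + 18);
    m.vtotal      = le16(buf + 20);
    return m;
}

/* Bytes copied from the kmemleak report for object 0xffff8edad8208580. */
static const uint8_t first_leak[32] = {
    0x22, 0x16, 0x04, 0x00, 0x00, 0x0a, 0x30, 0x0a,
    0x50, 0x0a, 0xa0, 0x0a, 0x00, 0x00, 0x40, 0x06,
    0x43, 0x06, 0x48, 0x06, 0x69, 0x06, 0x00, 0x00,
    0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

/* Bytes copied from the kmemleak report for object 0xffff8edad828f280. */
static const uint8_t second_leak[32] = {
    0x14, 0x44, 0x02, 0x00, 0x80, 0x07, 0xd8, 0x07,
    0x04, 0x08, 0x98, 0x08, 0x00, 0x00, 0x38, 0x04,
    0x3c, 0x04, 0x41, 0x04, 0x65, 0x04, 0x00, 0x00,
    0x0a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};
```

Calling decode_mode(first_leak, sizeof first_leak) and printing the fields is enough to compare the result against the modes the attached monitor advertises.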
--- Comment #5 from youling257@gmail.com ---
(In reply to Lee Starnes from comment #3)
> Created attachment 293577 [details]
> proposed patch
This patch does not seem to help for me; I tested it on a Linux 5.10 kernel. Thanks for pointing out the bad commit. I can revert "drm/amd/display: Fix EDID parsing after resume from suspend" to fix the memory leak.
Alex Deucher (alexdeucher@gmail.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alexdeucher@gmail.com
--- Comment #6 from Alex Deucher (alexdeucher@gmail.com) ---
Does this patch work any better?
https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg54780.html
--- Comment #7 from youling257@gmail.com ---
(In reply to Alex Deucher from comment #6)
> Does this patch work any better?
> https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg54780.html
Nice! I tested this patch and it fixes my memleak problem.
--- Comment #8 from Lee Starnes (lstarnes1024@gmail.com) ---
(In reply to Alex Deucher from comment #6)
> Does this patch work any better?
> https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg54780.html
This looks better than my patch. I've been using it for the last week or so with my RX 480 and it has been working.
Oleksandr Natalenko (oleksandr@natalenko.name) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |oleksandr@natalenko.name
--- Comment #9 from Oleksandr Natalenko (oleksandr@natalenko.name) ---
This change caused a regression that leads to inability to light up the display after powering it off.

See:

* https://lore.kernel.org/lkml/e5d9703f-42a4-f154-cf13-55a3eba10859@tomt.net/
* https://bugzilla.kernel.org/show_bug.cgi?id=211033
* https://bugs.archlinux.org/task/69202
--- Comment #10 from youling257@gmail.com ---
I can't stand the memory leak, so I will revert "Revert "drm/amd/display: Fix memory leaks in S3 resume"".

I am reverting 5efc1f4b454c6179d35e7b0c3eda0ad5763a00fc in today's Linux 5.11-rc3. I use an rc kernel every week.