https://bugs.freedesktop.org/show_bug.cgi?id=111527
Bug ID: 111527
Summary: obs-studio + latest mesa on amdgpu/vega64 leaks kernel memory rapidly
Product: Mesa
Version: git
Hardware: Other
OS: All
Status: NEW
Severity: not set
Priority: not set
Component: Drivers/Gallium/radeonsi
Assignee: dri-devel@lists.freedesktop.org
Reporter: john@pointysoftware.net
QA Contact: dri-devel@lists.freedesktop.org
As of at least mesa 19.3/bfac462d929 on a Vega 64:
Running obs-studio, even without starting a broadcast, triggers a seemingly exponential memory leak. It is fine for a few minutes, then rapidly begins consuming what appears to be kernel memory (nothing is attributed to the app, but total usage skyrockets). With 32 GB of RAM I exhaust system memory after about three minutes, but the OOM killer doesn't know what to take down because OBS itself remains low in the list. This can then take down the whole system.
However, killing OBS causes most of the memory to be freed. I say most because after reproducing on a fresh boot, there were apparently a few gigabytes of unaccounted-for memory that never returned. Subsequent repros of the bug on that same boot returned to the same baseline, however. Some caching mechanism gone wrong?
I've noticed this going back at least a few weeks, but haven't done a proper bisect. It should be very easy to reproduce, and happens on both Vega 64 systems I have available.
Steps to reproduce (they may not all be necessary, but I confirmed this sequence triggers it from a fresh state):
- Launch obs-studio
- Enable Studio Mode by clicking the button on the right
- Add two sources: "desktop capture" (select any monitor) and a single "Image" source (any image)
- Press Fade/Cut up top to make that state live. No need to actually start recording/broadcasting.
- Wait a few minutes or until your system hangs. Memory usage will appear stable for at least a full minute before taking off unprompted. It will not be attributed to the app, however, being apparently kernel memory.
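To make the "not attributed to the app" observation measurable, a monitoring sketch along these lines could be run next to OBS. It is only an illustration, not something used in this report: the function names are hypothetical, it assumes a Linux `/proc` filesystem with the `MemAvailable` field, and "used minus process RSS" is a rough proxy rather than an exact measure of kernel memory.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style "Key: value kB" lines into a dict of integers."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        # Keep only numeric fields; lines such as "Name: obs" are skipped.
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def used_kb(meminfo):
    """System-wide memory in use, in kB (MemTotal minus MemAvailable)."""
    return meminfo["MemTotal"] - meminfo["MemAvailable"]


def process_rss_kb(pid):
    """Resident set size of one process in kB, read from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        return parse_meminfo(f.read()).get("VmRSS", 0)


def watch(pid, interval_s=5.0, samples=60):
    """Print system-wide usage next to the RSS of the given process."""
    for _ in range(samples):
        with open("/proc/meminfo") as f:
            system = used_kb(parse_meminfo(f.read()))
        rss = process_rss_kb(pid)
        # If system usage climbs while RSS stays flat, the growth is not
        # accounted to the process itself.
        print(f"system used: {system // 1024} MB, process RSS: {rss // 1024} MB")
        time.sleep(interval_s)
```

Calling `watch(<pid of obs>)` while following the steps above would show whether total usage grows while the OBS process itself stays small.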
Reproduces with 19.3 - bfac462d929
Does not reproduce with 19.1.4
Kernel versions 5.2.8 and 5.2.11 show the same behavior.
--- Comment #1 from Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> ---
> Reproduces with 19.3 - bfac462d929
> Does not reproduce with 19.1.4
Could you bisect to find when the issue was introduced?
tele42k3@hotmail.com changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |tele42k3@hotmail.com
--- Comment #2 from tele42k3@hotmail.com ---
Thanks for the clear steps to reproduce this issue. I managed to reproduce this on my RX 480 and it bisected to:
commit 11a3679e3aba3524cf987f1f808d92c25f16e080
Author: Michel Dänzer <michel.daenzer@amd.com>
Date: Fri Jun 28 18:35:56 2019 +0200
winsys/amdgpu: Make KMS handles valid for original DRM file descriptor
Getting a DMA-buf fd and converting that to a handle using our duplicate of that file descriptor (getting at which requires passing a radeon_winsys pointer to the buffer_get_handle hook) makes sure of this, since duplicated file descriptors reference the same file description and therefore the same GEM handle namespace.
This is necessary because libdrm_amdgpu may use a different DRM file descriptor with a separate handle namespace internally, e.g. because it always reuses any existing amdgpu_device_handle for the same device. amdgpu_bo_export returns a handle which is valid for that internal file descriptor.
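The property the commit message relies on, that duplicated file descriptors reference the same open file description while a separately opened descriptor gets its own, can be illustrated with an ordinary file. This sketch is only an analogy: it uses the shared file offset as a stand-in for per-description state, does not touch a DRM device or any libdrm call, and the function name is hypothetical.

```python
import os
import tempfile


def offsets_after_seek():
    """Seek on one descriptor, then report the offsets seen by a dup() of it
    and by an independent open() of the same file."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"0123456789")
        tmp.flush()

        original = os.open(tmp.name, os.O_RDONLY)
        duplicate = os.dup(original)                # same open file description
        separate = os.open(tmp.name, os.O_RDONLY)   # new open file description
        try:
            os.lseek(original, 4, os.SEEK_SET)
            # State attached to the description (here: the offset) is visible
            # through the duplicate, but not through the separate open().
            dup_offset = os.lseek(duplicate, 0, os.SEEK_CUR)
            sep_offset = os.lseek(separate, 0, os.SEEK_CUR)
            return dup_offset, sep_offset
        finally:
            for fd in (original, duplicate, separate):
                os.close(fd)
```

On a POSIX system the duplicate reports the moved offset and the separate descriptor does not. According to the commit message, GEM handle namespaces behave the same way: they belong to the file description, so a handle obtained through a duplicate of the original DRM file descriptor is valid for the original, whereas one obtained through a separately opened descriptor need not be.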
Bugzilla: https://bugs.freedesktop.org/110903
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
While testing I saw a slow leak of 0.8 to 1 MB/s, which appeared immediately on opening OBS with the test scene. It felt like it consistently hit some obscure threshold, something like 64 MB, before the major memory leak started. Having that early, visible symptom helped bisect the issue.
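A leak rate like the one described can be estimated from a handful of timed memory readings. The following is a minimal sketch with hypothetical function names; the sample readings in the usage note are made up for illustration and are not measurements from this bug.

```python
def leak_rate_mb_per_s(samples):
    """Least-squares slope of (seconds, used_mb) samples, in MB per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def seconds_to_reach(threshold_mb, rate_mb_per_s):
    """Time needed to accumulate threshold_mb at a constant leak rate."""
    return threshold_mb / rate_mb_per_s
```

For example, `leak_rate_mb_per_s([(0, 100), (10, 109), (20, 118)])` gives a slope of about 0.9 MB/s. At the 0.8 to 1 MB/s quoted above, `seconds_to_reach(64, rate)` works out to roughly 64 to 80 seconds, which would be in line with the reporter's observation that usage looks stable for at least a full minute, although the 64 MB figure is only an impression and the match may be coincidental.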
I reverted the commit on top of f8887909c6683986990474b61afd6d4335a69e41 with good results.
--- Comment #3 from Michel Dänzer <michel@daenzer.net> ---
Does https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1907 help by any chance?
--- Comment #4 from tele42k3@hotmail.com ---
I reproduced the issue with 7d28e9ddd62eeccf6c528beee6b1a58fdfb7f5a0 + merge request 1907. No visible effect.
tele42k3@hotmail.com changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
Blocks  |                            |111444
Referenced Bugs:
https://bugs.freedesktop.org/show_bug.cgi?id=111444
[Bug 111444] [TRACKER] Mesa 19.2 release tracker
GitLab Migration User gitlab-migration@fdo.invalid changed:
What       |Removed                     |Added
----------------------------------------------------------------------------
Resolution |---                         |MOVED
Status     |NEW                         |RESOLVED
--- Comment #5 from GitLab Migration User <gitlab-migration@fdo.invalid> ---
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1426.