https://bugs.freedesktop.org/show_bug.cgi?id=105308
Bug ID: 105308
Summary: X log ballooning in size with "drmmode_wait_vblank failed for scanout update" and "get vblank counter failed"
Product: DRI
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: rah@settrans.net
Created attachment 137717
--> https://bugs.freedesktop.org/attachment.cgi?id=137717&action=edit
First 3900 lines of Xorg log
Hi,
My X log is on occasion filling up my hard disk, reaching a size of 18G, mostly with repeats of two lines. The X server is working and running fine; I'm typing this on it now. The first indication that there is a problem is seeing "No space left on device". It looks like the woes begin with this line:
[ 668.860] (WW) AMDGPU(0): flip queue failed in amdgpu_scanout_flip: Invalid argument, TearFree inactive until next modeset
The kernel also reported this although I've no idea when in relation to other events:
[57903.995504] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_cs_list_validate(validated) failed.
[57903.995539] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -22!
[58014.990950] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_cs_list_validate(validated) failed.
[58014.990998] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -22!
rah@lotus:~/store-f$ ls -lh Xorg.0.log
-rw-r--r-- 1 rah rah 18G Mar 1 09:52 Xorg.0.log
rah@lotus:~/store-f$ grep "drmmode_wait_vblank failed for scanout update" Xorg.0.log | head -n 1
[ 668.860] (WW) AMDGPU(0): drmmode_wait_vblank failed for scanout update: Invalid argument
rah@lotus:~/store-f$ grep "get vblank counter failed" Xorg.0.log | head -n 1
[ 10749.146] (WW) AMDGPU(0): get vblank counter failed: Invalid argument
rah@lotus:~/store-f$ wc -l Xorg.0.log
203212490 Xorg.0.log
rah@lotus:~/store-f$ egrep -vc "(drmmode_wait_vblank failed for scanout update|get vblank counter failed)" Xorg.0.log
1509
rah@lotus:~/store-f$ grep -c "drmmode_wait_vblank failed for scanout update" Xorg.0.log
200689290
rah@lotus:~/store-f$ grep -c "get vblank counter failed" Xorg.0.log
2521692
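The grep counts above show two warnings accounting for all but 1509 of the roughly 203 million lines. A quick way to get the same picture for any runaway log, without knowing the messages in advance, is to strip the Xorg timestamp and tally identical messages. This is a sketch, assuming a POSIX-compatible `awk` plus `sort` and `head`; `summarize_xorg_log` is a hypothetical helper name, not part of any X.Org tooling. It keeps one counter per distinct message in memory instead of sorting the whole file, which matters when the disk is already full.

```shell
# Hypothetical helper: list the most frequent messages in an Xorg log.
# The leading "[ seconds]" timestamp is removed so that repeats of the
# same warning collapse into a single entry.
summarize_xorg_log() {
    local log=$1        # path to the log, e.g. /var/log/Xorg.0.log
    local n=${2:-10}    # how many entries to print (default 10)
    # Count in an awk hash (one entry per distinct message), so only the
    # small summary gets sorted, not the multi-gigabyte file itself.
    awk '{ sub(/^\[ *[0-9]+\.[0-9]+] /, ""); count[$0]++ }
         END { for (m in count) print count[m] "\t" m }' "$log" |
        sort -rn | head -n "$n"
}
```

For example, `summarize_xorg_log /var/log/Xorg.0.log 5` prints the five most frequent messages, each preceded by its count and a tab.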
I'll attach the first 3900 lines of the Xorg log and the kernel log.
Thanks,
Bob
--- Comment #1 from Bob Ham rah@settrans.net ---
Created attachment 137718
--> https://bugs.freedesktop.org/attachment.cgi?id=137718&action=edit
Kernel log
--- Comment #2 from Michel Dänzer michel@daenzer.net ---
(In reply to Bob Ham from comment #0)
> [ 668.860] (WW) AMDGPU(0): flip queue failed in amdgpu_scanout_flip: Invalid argument, TearFree inactive until next modeset
Did anything in particular happen, or did you do anything in particular, around this time (about 11 minutes after system boot)?
> The kernel also reported this although I've no idea when in relation to other events:
> [57903.995504] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_cs_list_validate(validated) failed.
> [57903.995539] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -22!
The timestamps should be comparable, so this happened much later and is probably not directly related. Looks like the kernel ran out of memory or some other resource while processing GPU commands.
Please attach the output of
xrandr --verbose
while the problem is occurring.
--- Comment #3 from Bob Ham rah@settrans.net ---
Created attachment 137719
--> https://bugs.freedesktop.org/attachment.cgi?id=137719&action=edit
Output of xrandr --verbose
(In reply to Michel Dänzer from comment #2)
> (In reply to Bob Ham from comment #0)
> > [ 668.860] (WW) AMDGPU(0): flip queue failed in amdgpu_scanout_flip: Invalid argument, TearFree inactive until next modeset
> Did anything in particular happen, or did you do anything in particular, around this time (about 11 minutes after system boot)?
Not that I recall, no. FYI, current uptime is 2 days, 17:25.
> Please attach the output of
> xrandr --verbose
> while the problem is occurring.
I'll attach the output as requested. It's from the same X session but I ran "truncate /var/log/Xorg.0.log" and since then the file has been at 0 size so I can only assume the problem is still occurring.
Thanks,
Bob
--- Comment #4 from Michel Dänzer michel@daenzer.net ---
While the problem is occurring, and the screen contents are being updated at least once per second, please run the following as root:
echo 255 >/sys/module/drm/parameters/debug; sleep 1; echo 0 >/sys/module/drm/parameters/debug
and attach the resulting dmesg output.
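The same capture can be wrapped in a small function so that the debug mask is switched back off and the output lands in a file ready to attach. This is a sketch, not a tool from this report: `capture_drm_debug` and the `DRM_DEBUG_PARAM` override are hypothetical names (the override exists only so the function can be exercised without root). It still has to run as root while the problem is occurring.

```shell
# Hypothetical helper: set the DRM debug mask to 255 (bits 0-7) for about
# one second, set it back to 0, then save the kernel log to a file.
# Run as root while the problem is occurring.
capture_drm_debug() {
    local out=${1:-drm-debug.txt}
    # DRM_DEBUG_PARAM is a hypothetical override; the default is the real
    # sysfs path used in the command above.
    local param=${DRM_DEBUG_PARAM:-/sys/module/drm/parameters/debug}
    echo 255 > "$param" || return 1
    sleep 1
    echo 0 > "$param"
    dmesg > "$out"
}
```

For example, `capture_drm_debug drm-debug.txt` leaves the debug output in `drm-debug.txt`; note that `dmesg` saves the whole kernel ring buffer, not only the one-second window.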
--- Comment #5 from Bob Ham rah@settrans.net ---
Created attachment 137723
--> https://bugs.freedesktop.org/attachment.cgi?id=137723&action=edit
Kernel output while setting /sys/module/drm/parameters/debug to 255
This was done while playing video on the display.
Michel Dänzer michel@daenzer.net changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #137723|text/x-log                  |text/plain
          mime type|                            |
--- Comment #6 from Michel Dänzer michel@daenzer.net ---
There's nothing in there that would correspond to the Xorg log messages. Please make sure the problem is actually occurring when you capture the dmesg debugging output.
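One way to confirm that the problem is live before capturing is to check that the Xorg log is still growing. A minimal sketch, assuming GNU coreutils `stat` (`-c %s` prints the size in bytes); `log_is_growing` is a hypothetical helper name. It only shows that something is being written, so a look at the last few lines of the log is still needed to see which message it is.

```shell
# Hypothetical helper: succeed (exit status 0) if the file grows within
# the given number of seconds; return 2 if its size cannot be read.
# Assumes GNU coreutils stat.
log_is_growing() {
    local log=${1:-/var/log/Xorg.0.log}
    local interval=${2:-2}
    local before after
    before=$(stat -c %s "$log") || return 2
    sleep "$interval"
    after=$(stat -c %s "$log") || return 2
    [ "$after" -gt "$before" ]
}
```

For example, `log_is_growing && tail -n 3 /var/log/Xorg.0.log` shows the latest messages only when the log is actively being written.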
--- Comment #7 from Bob Ham rah@settrans.net ---
(In reply to Michel Dänzer from comment #6)
> Please make sure the problem is actually occurring when you capture the dmesg debugging output.
I've discovered that the problem occurs when the monitor is switched off. With just the monitor off, the "get vblank counter failed: Invalid argument" error appears. With the monitor off while playing a video, both that error and "drmmode_wait_vblank failed for scanout update: Invalid argument" appear. I'll attach two kernel logs as instructed, one for each condition.
Bob Ham rah@settrans.net changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #137723|0                           |1
        is obsolete|                            |
--- Comment #8 from Bob Ham rah@settrans.net ---
Created attachment 137774
--> https://bugs.freedesktop.org/attachment.cgi?id=137774&action=edit
Kernel output while setting /sys/module/drm/parameters/debug to 255 with monitor off
--- Comment #9 from Bob Ham rah@settrans.net ---
Created attachment 137775
--> https://bugs.freedesktop.org/attachment.cgi?id=137775&action=edit
Kernel output while setting /sys/module/drm/parameters/debug to 255 with monitor off while playing video
Bob Ham rah@settrans.net changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #137719|0                           |1
        is obsolete|                            |
--- Comment #10 from Bob Ham rah@settrans.net ---
Created attachment 137776
--> https://bugs.freedesktop.org/attachment.cgi?id=137776&action=edit
Output of xrandr --verbose
Output of xrandr --verbose while the error is really occurring
--- Comment #11 from Bob Ham rah@settrans.net ---
Created attachment 137777
--> https://bugs.freedesktop.org/attachment.cgi?id=137777&action=edit
First 1000 lines of /var/log/Xorg.0.log (2018-03-04)
Here is an updated version of the Xorg.0.log. The "flip queue failed in amdgpu_scanout_flip" error is present again.
--- Comment #12 from Michel Dänzer michel@daenzer.net ---
This might work better if you enable DC with
amdgpu.dc=1
on the kernel command line.
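After rebooting with the parameter, it can be worth confirming that it actually reached the driver. The sketch below reads the amdgpu `dc` module parameter from sysfs (the module's parameter description gives 1 = enable, 0 = disable, -1 = auto; check `modinfo amdgpu` on your kernel) and looks for an `amdgpu.dc=` token in `/proc/cmdline`. `show_amdgpu_dc` and the two path overrides are hypothetical names used only to make the function testable; if the sysfs file is absent, it prints "unavailable".

```shell
# Hypothetical helper: report whether amdgpu Display Core (DC) was requested.
# AMDGPU_DC_PARAM and CMDLINE_FILE are hypothetical overrides for testing;
# the defaults are the real sysfs and procfs paths.
show_amdgpu_dc() {
    local param=${AMDGPU_DC_PARAM:-/sys/module/amdgpu/parameters/dc}
    local cmdline=${CMDLINE_FILE:-/proc/cmdline}
    # Value the loaded driver reports, if the parameter is exposed.
    echo "module parameter: $(cat "$param" 2>/dev/null || echo unavailable)"
    # Any amdgpu.dc=... token passed on the kernel command line.
    echo "command line: $(grep -o 'amdgpu\.dc=[^ ]*' "$cmdline" || echo 'not set')"
}
```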
--- Comment #13 from Michel Dänzer michel@daenzer.net ---
https://patchwork.freedesktop.org/patch/209464/ fixes this for me with DC disabled.
--- Comment #14 from Mariusz Mazur mariusz.g.mazur@gmail.com ---
So, I'm not 100% sure yet; I need time to rebuild 4.14.28 with Michel's patch and run it for a bit. But at this point I'm about 90% certain that a patch in 4.14.29 introduces a regression, as described in bug 106529. And from reading the changelog and looking at the code, it does seem to me that this patch is the most likely culprit.
Short version of what's wrong: on a multi-monitor setup, if the primary display is on DP and this code toggles it off/on, the window manager doesn't take it well and you end up with windows all over the place.
(I don't know if this is the same code path as the one used with amdgpu.dc=1 on more recent kernels, though I do know the same bug occurs on 4.16+ with dc=1.)
Michel Dänzer michel@daenzer.net changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED
--- Comment #15 from Michel Dänzer michel@daenzer.net ---
Resolving this report, as it should be fixed or at least much better with the kernel change referenced in comment 13.
Mariusz, please keep the discussion about your issue on bug 106529, or maybe file a new report for the non-DC case.
dri-devel@lists.freedesktop.org