Hi Linus,
On Thu, Nov 11, 2021 at 2:03 PM Sudip Mukherjee sudipm.mukherjee@gmail.com wrote:
Did a bisect and this is a separate issue than the one we saw in 5.10.79-rc1.
The bisect log: # bad: [ce840177930f591a181f55515fc6ac9e1f56b84a] Merge tag 'defconfig-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc # good: [6f2b76a4a384e05ac8d3349831f29dff5de1e1e2] Merge tag 'Smack-for-5.16' of https://github.com/cschaufler/smack-next git bisect start 'ce840177930f5' '6f2b76a4a384e05ac8d3349831f29dff5de1e1e2' # good: [a64a325bf6313aa5cde7ecd691927e92892d1b7f] Merge tag 'afs-next-20211102' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs git bisect good a64a325bf6313aa5cde7ecd691927e92892d1b7f # bad: [dcd68326d29b62f3039e4f4d23d3e38f24d37360] Merge tag 'devicetree-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux git bisect bad dcd68326d29b62f3039e4f4d23d3e38f24d37360 # bad: [c7c774fe09389fc806bbe4b487c18e45f576c1ae] Merge tag 'drm-intel-next-2021-10-04' of git://anongit.freedesktop.org/drm/drm-intel into drm-next git bisect bad c7c774fe09389fc806bbe4b487c18e45f576c1ae # good: [8017ecb11ebbcdfcbdff14c5edbdf1efc14991f4] drm/amd/display: Added root clock optimization flags git bisect good 8017ecb11ebbcdfcbdff14c5edbdf1efc14991f4 # good: [8a1ec3f3275479292613273a7be2ac87f2a7f6e6] drm/i915: Remove DP_PORT_EN stuff from link training code git bisect good 8a1ec3f3275479292613273a7be2ac87f2a7f6e6 # bad: [9962601ca5719050906915c3c33a63744ac7b15c] drm/bridge: dw-hdmi-cec: Make use of the helper function devm_add_action_or_reset() git bisect bad 9962601ca5719050906915c3c33a63744ac7b15c # bad: [606b102876e3741851dfb09d53f3ee57f650a52c] drm: fb_helper: fix CONFIG_FB dependency git bisect bad 606b102876e3741851dfb09d53f3ee57f650a52c # good: [c43da06c24a485308e80d709737b446e8cad175d] dt-bindings: drm/panel: boe-tv101wum-nl6: Support enabling a 3.3V rail git bisect good c43da06c24a485308e80d709737b446e8cad175d # good: [8d6b006e1f51c99016aa39ca9e03947cbdd024e3] drm/virtio: implement context init: handle VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK git bisect good 8d6b006e1f51c99016aa39ca9e03947cbdd024e3 # bad: [d0f5d790ae863079025398015eb59347b01db455] drm/ttm: remove TTM_PAGE_FLAG_NO_RETRY git bisect bad d0f5d790ae863079025398015eb59347b01db455 # bad: [f5d28856b89baab4232a9f841e565763fcebcdf9] drm/ttm: stop calling tt_swapin in vm_access git bisect bad f5d28856b89baab4232a9f841e565763fcebcdf9 # bad: [78aa20fa4381623cf59a85d053486f98784ca3a0] drm/virtio: implement context init: advertise feature to userspace git bisect bad 78aa20fa4381623cf59a85d053486f98784ca3a0 # bad: [cd7f5ca33585918febe5e2f6dc090a21cfa775b0] drm/virtio: implement context init: add virtio_gpu_fence_event git bisect bad cd7f5ca33585918febe5e2f6dc090a21cfa775b0 # first bad commit: [cd7f5ca33585918febe5e2f6dc090a21cfa775b0] drm/virtio: implement context init: add virtio_gpu_fence_event
And, indeed reverting cd7f5ca33585 on top of debe436e77c7 has fixed the problem I was seeing on my qemu test of x86_64. The qemu image is based on Ubuntu.
Will be happy to test any fix or more debugging if needed.
On Thu, Nov 11, 2021 at 8:51 PM Sudip Mukherjee sudipm.mukherjee@gmail.com wrote:
Last night's test on 66f4beaa6c1d worked fine. So I guess this has now been fixed. Thanks.
[ Hmm. This email was marked as spam for me. I see no obvious reason for it being marked as spam, but it happens.. ]
On Thu, Nov 11, 2021 at 12:52 PM Sudip Mukherjee sudipm.mukherjee@gmail.com wrote:
# first bad commit: [cd7f5ca33585918febe5e2f6dc090a21cfa775b0] drm/virtio: implement context init: add virtio_gpu_fence_event
Hmm. Judging from your automated screenshots, the login never appears.
Presumably either that commit is somehow buggy in itself - or it does exactly what it means to do, and the new poll() semantics just confuses the heck out of the X server (or wayland or whatever).
And honestly, if I read that thing correctly, the patch is entirely broken. The new poll function (virtio_gpu_poll()) will unconditionally remove the first event from the event list, and then report "Yeah, I had events".
This is completely bogus for a few reasons:
- poll() really should be idempotent, because the poll function gets called multiple times
- it doesn't even seem to check that the event that it removes is the new VIRTGPU_EVENT_FENCE_SIGNALED_INTERNAL kind of event, so it will unconditionally just remove random events.
- it does seem to check the "vfpriv->ring_idx_mask" and do the old thing if that is zero, but I see absolutely no reason for that (and that check itself has caused problems, see below)
Honestly, my reaction to this all is that that commit is fundamentally broken and probably should be reverted regardless as "this commit does bad things".
HOWEVER - it has had a fix for a NULL pointer dereference in the meantime - can you check whether the current top of tree happens to work for you? Maybe your problem isn't due to "that commit does unnatural things", but simply due to the bug fixed in d89c0c8322ec ("drm/virtio: Fix NULL dereference error in virtio_gpu_poll").
And if it's still broken with that commit, I'll happily revert it and people need to go back to the drawing board.
In fact, I would really suggest that people look at that virtio_gpu_poll() function regardless. That odd "let's unconditionally just drop events in the poll function is really REALLY broken behavior.
Linus
Hi Linus,
On Sat, Nov 13, 2021 at 5:07 PM Linus Torvalds torvalds@linux-foundation.org wrote:
<snip>
I sent another mail yesterday which is now at https://lore.kernel.org/lkml/CADVatmOOzCxAgLhCu1tTz=44sgRDXds5-oMZ3V0w4f5kLC... I will just pase that here for you.
Last night's test on 66f4beaa6c1d worked fine. So I guess this has now been fixed.
I have not done a bisect to see what has fixed it, but looking at the log I think it will be that NULL pointer fix.
On Sat, Nov 13, 2021 at 09:06:57AM -0800, Linus Torvalds wrote:
Greg also reported a regression, plus I looked at the commit and had questions too.
Tbh I haven't looked at the poll implementation, but it's fishy on principles as in drm drivers shouldn't reinvent them. The commit message cites vmwgfx as example for a private driver drm_event, but that works perfectly fine with standard drm_poll (and is meant to work perfectly fine with standard drm_poll).
So if it's buggy on top of questionable I think revert might be the right choice irrespective of whether there's some fixes in-flight.
So if you end up pushing that revert:
References: https://lore.kernel.org/dri-devel/YZJrutLaiwozLfSw@phenom.ffwll.local/ Acked-by: Daniel Vetter daniel.vetter@ffwll.ch
Plus credit Greg too and all that ofc.
But lets wait a bit for virtio folks to chime in, it's only Monday :-)
Cheers, Daniel
dri-devel@lists.freedesktop.org