I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month ago against 2.6.38. Now 2.6.39 was just released without the regression being addressed. This bug makes the system unusable... Some guys on IRC suggested I email, so here it is.
[ Adding Chris Wilson (author of the problematic patch) and Rafael Wysocki to the message ]
On Fri, May 20, 2011 at 10:06 AM, Luke-Jr luke@dashjr.org wrote:
I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month ago against 2.6.38. Now 2.6.39 was just released without the regression being addressed. This bug makes the system unusable... Some guys on IRC suggested I email, so here it is.
See the bugzilla entry for the bisection history.
~r.
On Friday, May 20, 2011, Ray Lee wrote:
[ Adding Chris Wilson (author of the problematic patch) and Rafael Wysocki to the message ]
It is on the list of known regressions from 2.6.37, but we're not tracking them any more now that 2.6.39 is out.
Thanks, Rafael
On Fri, May 20, 2011 at 10:06 AM, Luke-Jr luke@dashjr.org wrote:
I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month ago against 2.6.38. Now 2.6.39 was just released without the regression being addressed. This bug makes the system unusable... Some guys on IRC suggested I email, so here it is.
See the bugzilla entry for the bisection history.
~r.
2011/5/20 Rafael J. Wysocki rjw@sisk.pl
It is on the list of known regressions from 2.6.37, but we're not tracking them any more now that 2.6.39 is out.
Hopefully Chris is still tracking them, even if you aren't.
Chris? What other information can the affected person provide, or what tests can he run to help close this out?
On Fri, 20 May 2011 11:08:56 -0700, Ray Lee ray-lk@madrabbit.org wrote:
[ Adding Chris Wilson (author of the problematic patch) and Rafael Wysocki to the message ]
On Fri, May 20, 2011 at 10:06 AM, Luke-Jr luke@dashjr.org wrote:
I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month ago against 2.6.38. Now 2.6.39 was just released without the regression being addressed. This bug makes the system unusable... Some guys on IRC suggested I email, so here it is.
See the bugzilla entry for the bisection history.
Which has nothing to do with Luke's bug. Considering the thousand things that can go wrong during X starting, without a hint as to which it is nigh on impossible to debug except by trial and error. If you set up netconsole, does the kernel emit an OOPS with it's last dying breath? -Chris
On Saturday, May 21, 2011 4:41:45 AM Chris Wilson wrote:
On Fri, 20 May 2011 11:08:56 -0700, Ray Lee ray-lk@madrabbit.org wrote:
[ Adding Chris Wilson (author of the problematic patch) and Rafael Wysocki to the message ]
On Fri, May 20, 2011 at 10:06 AM, Luke-Jr luke@dashjr.org wrote:
I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month ago against 2.6.38. Now 2.6.39 was just released without the regression being addressed. This bug makes the system unusable... Some guys on IRC suggested I email, so here it is.
See the bugzilla entry for the bisection history.
Which has nothing to do with Luke's bug. Considering the thousand things that can go wrong during X starting, without a hint as to which it is nigh on impossible to debug except by trial and error. If you set up netconsole, does the kernel emit an OOPS with it's last dying breath?
Why assume it's a different bug? I would almost wonder if it might affect all Sandy Bridge GPUs. In any case, I no longer have the original motherboard (it was recalled, as I said in the first post), nor even the revision of it (it had other issues that weren't being fixed). I *assume* I will have the same problem with my new motherboard (Intel DQ67SW), but I haven't verified that yet. I'll be sure to try a netconsole when I have to reboot next and get a chance to try the most recent 2.6.38 and .39 kernels, but at the moment it seems reasonable to address the problem bisected in the bug, even if it turns out to be different.
On Sat, 21 May 2011 11:23:53 -0400, "Luke-Jr" luke@dashjr.org wrote:
On Saturday, May 21, 2011 4:41:45 AM Chris Wilson wrote:
On Fri, 20 May 2011 11:08:56 -0700, Ray Lee ray-lk@madrabbit.org wrote:
[ Adding Chris Wilson (author of the problematic patch) and Rafael Wysocki to the message ]
On Fri, May 20, 2011 at 10:06 AM, Luke-Jr luke@dashjr.org wrote:
I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month ago against 2.6.38. Now 2.6.39 was just released without the regression being addressed. This bug makes the system unusable... Some guys on IRC suggested I email, so here it is.
See the bugzilla entry for the bisection history.
Which has nothing to do with Luke's bug. Considering the thousand things that can go wrong during X starting, without a hint as to which it is nigh on impossible to debug except by trial and error. If you set up netconsole, does the kernel emit an OOPS with it's last dying breath?
Why assume it's a different bug? I would almost wonder if it might affect all Sandy Bridge GPUs. In any case, I no longer have the original motherboard (it was recalled, as I said in the first post), nor even the revision of it (it had other issues that weren't being fixed). I *assume* I will have the same problem with my new motherboard (Intel DQ67SW), but I haven't verified that yet. I'll be sure to try a netconsole when I have to reboot next and get a chance to try the most recent 2.6.38 and .39 kernels, but at the moment it seems reasonable to address the problem bisected in the bug, even if it turns out to be different.
The bisection is into an old DRI1 bug on 945GM. That DRI has inadequate locking between release and IRQ and so is prone to such races as befell Kirill should not surprise anyone. As neither UMS nor DRI supported SNB, I can quite confidently state they are separate bugs. -Chris
On Saturday, May 21, 2011 11:40:17 AM Chris Wilson wrote:
The bisection is into an old DRI1 bug on 945GM. That DRI has inadequate locking between release and IRQ and so is prone to such races as befell Kirill should not surprise anyone. As neither UMS nor DRI supported SNB, I can quite confidently state they are separate bugs.
Unfortunately, I cannot help troubleshoot that bug any further, as I no longer have the affected motherboard. I was unable to reproduce it on my Intel DQ67SW.
However, I did encounter a new regression, which I have reported as: https://bugzilla.kernel.org/show_bug.cgi?id=35552 This one is related to Intel HD Audio, not Graphics.
Hello Chris, everyone,
On Sat, May 21, 2011 at 04:40:17PM +0100, Chris Wilson wrote:
On Sat, 21 May 2011 11:23:53 -0400, "Luke-Jr" luke@dashjr.org wrote:
On Saturday, May 21, 2011 4:41:45 AM Chris Wilson wrote:
On Fri, 20 May 2011 11:08:56 -0700, Ray Lee ray-lk@madrabbit.org wrote:
[ Adding Chris Wilson (author of the problematic patch) and Rafael Wysocki to the message ]
On Fri, May 20, 2011 at 10:06 AM, Luke-Jr luke@dashjr.org wrote:
I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month ago against 2.6.38. Now 2.6.39 was just released without the regression being addressed. This bug makes the system unusable... Some guys on IRC suggested I email, so here it is.
See the bugzilla entry for the bisection history.
Which has nothing to do with Luke's bug. Considering the thousand things that can go wrong during X starting, without a hint as to which it is nigh on impossible to debug except by trial and error. If you set up netconsole, does the kernel emit an OOPS with it's last dying breath?
Why assume it's a different bug? I would almost wonder if it might affect all Sandy Bridge GPUs. In any case, I no longer have the original motherboard (it was recalled, as I said in the first post), nor even the revision of it (it had other issues that weren't being fixed). I *assume* I will have the same problem with my new motherboard (Intel DQ67SW), but I haven't verified that yet. I'll be sure to try a netconsole when I have to reboot next and get a chance to try the most recent 2.6.38 and .39 kernels, but at the moment it seems reasonable to address the problem bisected in the bug, even if it turns out to be different.
The bisection is into an old DRI1 bug on 945GM. That DRI has inadequate locking between release and IRQ and so is prone to such races as befell Kirill should not surprise anyone. As neither UMS nor DRI supported SNB, I can quite confidently state they are separate bugs. -Chris
I see DRI1 is maybe buggy and old, but still, pre-kms X used to work ok on kernels < 2.6.38, and starting from 2.6.38 the system is just unusable because X either crashes the kernel (2.6.38), or does not start at all (2.6.39):
https://bugzilla.kernel.org/show_bug.cgi?id=36052
It's a regression. It's blocking me to upgrade to newer kernels. I've done my homework -- digged it and came with detailed OOPS on netconsole and bisected to single commit. Could this please be fixed?
Thanks, Kirill
On Sat, May 28, 2011 at 05:19:20PM +0400, Kirill Smelkov wrote:
Hello Chris, everyone,
On Sat, May 21, 2011 at 04:40:17PM +0100, Chris Wilson wrote:
On Sat, 21 May 2011 11:23:53 -0400, "Luke-Jr" luke@dashjr.org wrote:
On Saturday, May 21, 2011 4:41:45 AM Chris Wilson wrote:
On Fri, 20 May 2011 11:08:56 -0700, Ray Lee ray-lk@madrabbit.org wrote:
[ Adding Chris Wilson (author of the problematic patch) and Rafael Wysocki to the message ]
On Fri, May 20, 2011 at 10:06 AM, Luke-Jr luke@dashjr.org wrote:
I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month ago against 2.6.38. Now 2.6.39 was just released without the regression being addressed. This bug makes the system unusable... Some guys on IRC suggested I email, so here it is.
See the bugzilla entry for the bisection history.
Which has nothing to do with Luke's bug. Considering the thousand things that can go wrong during X starting, without a hint as to which it is nigh on impossible to debug except by trial and error. If you set up netconsole, does the kernel emit an OOPS with it's last dying breath?
Why assume it's a different bug? I would almost wonder if it might affect all Sandy Bridge GPUs. In any case, I no longer have the original motherboard (it was recalled, as I said in the first post), nor even the revision of it (it had other issues that weren't being fixed). I *assume* I will have the same problem with my new motherboard (Intel DQ67SW), but I haven't verified that yet. I'll be sure to try a netconsole when I have to reboot next and get a chance to try the most recent 2.6.38 and .39 kernels, but at the moment it seems reasonable to address the problem bisected in the bug, even if it turns out to be different.
The bisection is into an old DRI1 bug on 945GM. That DRI has inadequate locking between release and IRQ and so is prone to such races as befell Kirill should not surprise anyone. As neither UMS nor DRI supported SNB, I can quite confidently state they are separate bugs. -Chris
I see DRI1 is maybe buggy and old, but still, pre-kms X used to work ok on kernels < 2.6.38, and starting from 2.6.38 the system is just unusable because X either crashes the kernel (2.6.38), or does not start at all (2.6.39):
https://bugzilla.kernel.org/show_bug.cgi?id=36052
It's a regression. It's blocking me to upgrade to newer kernels. I've done my homework -- digged it and came with detailed OOPS on netconsole and bisected to single commit. Could this please be fixed?
Silence...
Still, reverting the bisected patch helps even for 3.0:
On Tue, Jul 12, 2011 at 8:17 PM, Kirill Smelkov kirr@mns.spb.ru wrote:
On Sat, May 28, 2011 at 05:19:20PM +0400, Kirill Smelkov wrote:
Hello Chris, everyone,
On Sat, May 21, 2011 at 04:40:17PM +0100, Chris Wilson wrote:
On Sat, 21 May 2011 11:23:53 -0400, "Luke-Jr" luke@dashjr.org wrote:
On Saturday, May 21, 2011 4:41:45 AM Chris Wilson wrote:
On Fri, 20 May 2011 11:08:56 -0700, Ray Lee ray-lk@madrabbit.org wrote:
[ Adding Chris Wilson (author of the problematic patch) and Rafael Wysocki to the message ]
On Fri, May 20, 2011 at 10:06 AM, Luke-Jr luke@dashjr.org wrote: > I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month > ago against 2.6.38. Now 2.6.39 was just released without the > regression being addressed. This bug makes the system unusable... Some > guys on IRC suggested I > email, so here it is.
See the bugzilla entry for the bisection history.
Which has nothing to do with Luke's bug. Considering the thousand things that can go wrong during X starting, without a hint as to which it is nigh on impossible to debug except by trial and error. If you set up netconsole, does the kernel emit an OOPS with it's last dying breath?
Why assume it's a different bug? I would almost wonder if it might affect all Sandy Bridge GPUs. In any case, I no longer have the original motherboard (it was recalled, as I said in the first post), nor even the revision of it (it had other issues that weren't being fixed). I *assume* I will have the same problem with my new motherboard (Intel DQ67SW), but I haven't verified that yet. I'll be sure to try a netconsole when I have to reboot next and get a chance to try the most recent 2.6.38 and .39 kernels, but at the moment it seems reasonable to address the problem bisected in the bug, even if it turns out to be different.
The bisection is into an old DRI1 bug on 945GM. That DRI has inadequate locking between release and IRQ and so is prone to such races as befell Kirill should not surprise anyone. As neither UMS nor DRI supported SNB, I can quite confidently state they are separate bugs. -Chris
I see DRI1 is maybe buggy and old, but still, pre-kms X used to work ok on kernels < 2.6.38, and starting from 2.6.38 the system is just unusable because X either crashes the kernel (2.6.38), or does not start at all (2.6.39):
https://bugzilla.kernel.org/show_bug.cgi?id=36052
It's a regression. It's blocking me to upgrade to newer kernels. I've done my homework -- digged it and came with detailed OOPS on netconsole and bisected to single commit. Could this please be fixed?
Silence...
Still, reverting the bisected patch helps even for 3.0:
Keith, Chris, what's up with this regression from 2.6.38? It seems commit e8616b6 ("drm/i915: Initialise ring vfuncs for old DRI paths") caused problems on other machines.
[ Cc'ing Florian Mickler and Keith Packard ]
On Tue, Jul 12, 2011 at 09:07:47PM +0300, Pekka Enberg wrote:
On Tue, Jul 12, 2011 at 8:17 PM, Kirill Smelkov kirr@mns.spb.ru wrote:
On Sat, May 28, 2011 at 05:19:20PM +0400, Kirill Smelkov wrote:
Hello Chris, everyone,
On Sat, May 21, 2011 at 04:40:17PM +0100, Chris Wilson wrote:
On Sat, 21 May 2011 11:23:53 -0400, "Luke-Jr" luke@dashjr.org wrote:
On Saturday, May 21, 2011 4:41:45 AM Chris Wilson wrote:
On Fri, 20 May 2011 11:08:56 -0700, Ray Lee ray-lk@madrabbit.org wrote: > [ Adding Chris Wilson (author of the problematic patch) and Rafael > Wysocki to the message ] > > On Fri, May 20, 2011 at 10:06 AM, Luke-Jr luke@dashjr.org wrote: > > I submitted https://bugzilla.kernel.org/show_bug.cgi?id=33662 a month > > ago against 2.6.38. Now 2.6.39 was just released without the > > regression being addressed. This bug makes the system unusable... Some > > guys on IRC suggested I > > email, so here it is. > > See the bugzilla entry for the bisection history.
Which has nothing to do with Luke's bug. Considering the thousand things that can go wrong during X starting, without a hint as to which it is nigh on impossible to debug except by trial and error. If you set up netconsole, does the kernel emit an OOPS with it's last dying breath?
Why assume it's a different bug? I would almost wonder if it might affect all Sandy Bridge GPUs. In any case, I no longer have the original motherboard (it was recalled, as I said in the first post), nor even the revision of it (it had other issues that weren't being fixed). I *assume* I will have the same problem with my new motherboard (Intel DQ67SW), but I haven't verified that yet. I'll be sure to try a netconsole when I have to reboot next and get a chance to try the most recent 2.6.38 and .39 kernels, but at the moment it seems reasonable to address the problem bisected in the bug, even if it turns out to be different.
The bisection is into an old DRI1 bug on 945GM. That DRI has inadequate locking between release and IRQ and so is prone to such races as befell Kirill should not surprise anyone. As neither UMS nor DRI supported SNB, I can quite confidently state they are separate bugs. -Chris
I see DRI1 is maybe buggy and old, but still, pre-kms X used to work ok on kernels < 2.6.38, and starting from 2.6.38 the system is just unusable because X either crashes the kernel (2.6.38), or does not start at all (2.6.39):
https://bugzilla.kernel.org/show_bug.cgi?id=36052
It's a regression. It's blocking me to upgrade to newer kernels. I've done my homework -- digged it and came with detailed OOPS on netconsole and bisected to single commit. Could this please be fixed?
Silence...
Still, reverting the bisected patch helps even for 3.0:
Keith, Chris, what's up with this regression from 2.6.38? It seems commit e8616b6 ("drm/i915: Initialise ring vfuncs for old DRI paths") caused problems on other machines.
Silence again, and not surprising -- I was ringing this bell for 3 months already:
https://bugzilla.kernel.org/show_bug.cgi?id=33662#c10 https://bugzilla.kernel.org/show_bug.cgi?id=36052 (and on the list)
with detailed logs and bisected single patch, without even single reply from intel-gfx people.
And now after v3.0 is out, I've tested it again, and yes, like it was broken on v3.0-rc5, it is (now even more) broken on v3.0 -- after first bad io access the system freezes completely:
On netconsole:
# X starts here, then
[ 45.102377] ------------[ cut here ]------------ [ 45.102402] WARNING: at lib/iomap.c:43 bad_io_access+0x3d/0x40() [ 45.102411] Hardware name: PCISA-945GSE [ 45.102418] Bad IO access at port 0x84 (return inl(port)) [ 45.102425] Modules linked in: [ 45.102438] Pid: 2846, comm: sshd Not tainted 3.0.0--NAVY #33 [ 45.102445] Call Trace: [ 45.102460] [<c118e9fd>] ? bad_io_access+0x3d/0x40 [ 45.102473] [<c10287ec>] warn_slowpath_common+0x6c/0xa0 [ 45.102484] [<c118e9fd>] ? bad_io_access+0x3d/0x40 [ 45.102495] [<c102889e>] warn_slowpath_fmt+0x2e/0x30 [ 45.102506] [<c118e9fd>] bad_io_access+0x3d/0x40 [ 45.102516] [<c118edb2>] ioread32+0x22/0x40 [ 45.102528] [<c122cc7d>] i915_driver_irq_handler+0x1ad/0x660 [ 45.102541] [<c12c6a7e>] ? rtl8169_interrupt+0xee/0x370 [ 45.102554] [<c105c396>] handle_irq_event_percpu+0x36/0x140 [ 45.102565] [<c105e490>] ? handle_edge_irq+0x150/0x150 [ 45.102576] [<c105c4d9>] handle_irq_event+0x39/0x60 [ 45.102587] [<c105e4d5>] handle_fasteoi_irq+0x45/0xd0 [ 45.102594] <IRQ> [<c1003c29>] ? do_IRQ+0x39/0xb0 [ 45.102613] [<c103c9b3>] ? start_flush_work+0xc3/0x130 [ 45.102625] [<c13bc329>] ? common_interrupt+0x29/0x30 [ 45.102636] [<c13bc329>] ? common_interrupt+0x29/0x30 [ 45.102648] [<c11e007b>] ? pnpacpi_encode_resources+0x37b/0x7a0 [ 45.102659] [<c109971e>] ? fget_light+0xe/0xf0 [ 45.102671] [<c10a8f97>] ? do_select+0x2e7/0x680 [ 45.102685] [<c1341998>] ? sch_direct_xmit+0x58/0x1d0 [ 45.102695] [<c10a83e0>] ? poll_freewait+0xa0/0xa0 [ 45.102706] [<c102df37>] ? local_bh_enable+0x47/0xa0 [ 45.102718] [<c132e371>] ? dev_queue_xmit+0x101/0x4e0 [ 45.102729] [<c134ffba>] ? ip_finish_output+0x10a/0x2f0 [ 45.102740] [<c1350216>] ? ip_output+0x76/0x90 [ 45.102750] [<c134d715>] ? ip_local_out+0x65/0x70 [ 45.102762] [<c134fa3d>] ? ip_queue_xmit+0x1bd/0x3b0 [ 45.102775] [<c1362af8>] ? tcp_transmit_skb+0x468/0x7d0 [ 45.102788] [<c13215af>] ? sk_reset_timer+0xf/0x20 [ 45.102798] [<c1362446>] ? tcp_event_new_data_sent+0x86/0xc0 [ 45.102809] [<c1364fc1>] ? tcp_write_xmit+0x1e1/0x9a0 [ 45.102822] [<c1326925>] ? __alloc_skb+0x55/0x100 [ 45.102838] [<c102df37>] ? local_bh_enable+0x47/0xa0 [ 45.102849] [<c1321246>] ? release_sock+0xd6/0x110 [ 45.102859] [<c13657f7>] ? __tcp_push_pending_frames+0x27/0x80 [ 45.102870] [<c13584fa>] ? tcp_sendmsg+0x64a/0xac0
-*- and then system FREEZE -*-
For completeness `X -verbose` log is in "Appendix 1", (but who cares anyway? I've sent lots of such logs without a reply).
And again, after reverting e8616b6 ("drm/i915: Initialise ring vfuncs for old DRI paths") on top of v3.0, X works without any problem again.
So I wonder:
I thought people are trying to do "no regressions" rule in kernel. Should we then just apply the following patch? In case Intel people are not responding, should it just go directly into mainline?
Or would it be more fair to say that UMS is not supported anymore, is broken and just remove support for it?
Thanks, Kirill
P.S. Sometimes people change their hardware preferences based on software support quality. Knock, knock...
From ef91a178e6069ae07c7a3c1e39e13eea609953cd Mon Sep 17 00:00:00 2001
From: Kirill Smelkov kirr@mns.spb.ru Date: Wed, 29 Jun 2011 14:22:49 +0400 Subject: [PATCH] Revert "drm/i915: Initialise ring vfuncs for old DRI paths"
This reverts commit e8616b6ced6137085e6657cc63bc2fe3900b8616.
See https://bugzilla.kernel.org/show_bug.cgi?id=36052
Cc: Herbert Xu herbert@gondor.apana.org.au Cc: Florian Mickler florian@mickler.org Cc: Pekka Enberg penberg@kernel.org Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Keith Packard keithp@keithp.com Cc: stable@kernel.org
--- drivers/gpu/drm/i915/i915_dma.c | 25 +++++++++++++----- drivers/gpu/drm/i915/intel_ringbuffer.c | 42 ------------------------------- drivers/gpu/drm/i915/intel_ringbuffer.h | 3 -- 3 files changed, 18 insertions(+), 52 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 296fbd6..9300d18 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -160,7 +160,7 @@ static int i915_initialize(struct drm_device * dev, drm_i915_init_t * init) { drm_i915_private_t *dev_priv = dev->dev_private; struct drm_i915_master_private *master_priv = dev->primary->master->driver_priv; - int ret; + struct intel_ring_buffer *ring = LP_RING(dev_priv);
master_priv->sarea = drm_getsarea(dev); if (master_priv->sarea) { @@ -171,22 +171,33 @@ static int i915_initialize(struct drm_device * dev, drm_i915_init_t * init) }
if (init->ring_size != 0) { - if (LP_RING(dev_priv)->obj != NULL) { + if (ring->obj != NULL) { i915_dma_cleanup(dev); DRM_ERROR("Client tried to initialize ringbuffer in " "GEM mode\n"); return -EINVAL; }
- ret = intel_render_ring_init_dri(dev, - init->ring_start, - init->ring_size); - if (ret) { + ring->size = init->ring_size; + + ring->map.offset = init->ring_start; + ring->map.size = init->ring_size; + ring->map.type = 0; + ring->map.flags = 0; + ring->map.mtrr = 0; + + drm_core_ioremap_wc(&ring->map, dev); + + if (ring->map.handle == NULL) { i915_dma_cleanup(dev); - return ret; + DRM_ERROR("can not ioremap virtual address for" + " ring buffer\n"); + return -ENOMEM; } }
+ ring->virtual_start = ring->map.handle; + dev_priv->cpp = init->cpp; dev_priv->back_offset = init->back_offset; dev_priv->front_offset = init->front_offset; diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 95c4b14..8d2f610 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1304,48 +1304,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev) return intel_init_ring_buffer(dev, ring); }
-int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size) -{ - drm_i915_private_t *dev_priv = dev->dev_private; - struct intel_ring_buffer *ring = &dev_priv->ring[RCS]; - - *ring = render_ring; - if (INTEL_INFO(dev)->gen >= 6) { - ring->add_request = gen6_add_request; - ring->irq_get = gen6_render_ring_get_irq; - ring->irq_put = gen6_render_ring_put_irq; - } else if (IS_GEN5(dev)) { - ring->add_request = pc_render_add_request; - ring->get_seqno = pc_render_get_seqno; - } - - ring->dev = dev; - INIT_LIST_HEAD(&ring->active_list); - INIT_LIST_HEAD(&ring->request_list); - INIT_LIST_HEAD(&ring->gpu_write_list); - - ring->size = size; - ring->effective_size = ring->size; - if (IS_I830(ring->dev)) - ring->effective_size -= 128; - - ring->map.offset = start; - ring->map.size = size; - ring->map.type = 0; - ring->map.flags = 0; - ring->map.mtrr = 0; - - drm_core_ioremap_wc(&ring->map, dev); - if (ring->map.handle == NULL) { - DRM_ERROR("can not ioremap virtual address for" - " ring buffer\n"); - return -ENOMEM; - } - - ring->virtual_start = (void __force __iomem *)ring->map.handle; - return 0; -} - int intel_init_bsd_ring_buffer(struct drm_device *dev) { drm_i915_private_t *dev_priv = dev->dev_private; diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 39ac2b6..b6b0fd4 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -197,7 +197,4 @@ static inline void i915_trace_irq_get(struct intel_ring_buffer *ring, u32 seqno) ring->trace_irq_seqno = seqno; }
-/* DRI warts */ -int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size); - #endif /* _INTEL_RINGBUFFER_H_ */
Kirill Smelkov kirr@mns.spb.ru wrote:
Or would it be more fair to say that UMS is not supported anymore, is broken and just remove support for it?
It's kind of ironic that the intention of the offending patch was to avoid a UMS crash :)
Anyway I'm now using KMS (forced to due to new hardware) so I certainly don't have any objections to reverting this patch.
Cheers,
On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov kirr@mns.spb.ru wrote:
And now after v3.0 is out, I've tested it again, and yes, like it was broken on v3.0-rc5, it is (now even more) broken on v3.0 -- after first bad io access the system freezes completely:
I looked at this when I first saw it (a couple of weeks ago), and I couldn't see any obvious reason this patch would cause this particular problem. I didn't want to revert the patch at that point as I feared it would cause other subtle problems. Given that you've got a work-around, it seemed best to just push this off past 3.0.
Given the failing address passed to ioread32, this seems like it's probably the call to READ_BREADCRUMB -- I915_BREADCRUMB_INDEX is 0x21, which is an offset in 32-bit units within the hardware status page. If the status_page.page_addr value was zero, then the computed address would end up being 0x84.
And, it looks like status_page.page_addr *will* end up being zero as a result of the patch in question. The patch resets the entire ring structure contents back to the initial values, which includes smashing the status_page structure to zero, clearing the value of status_page.page_addr set in i915_init_phys_hws.
Here's an untested patch which moves the initialization of status_page.page_addr into intel_render_ring_init_dri. I note that intel_init_render_ring_buffer *already* has the setting of the status_page.page_addr value, and so I've removed the setting of status_page.page_addr from i915_init_phys_hws.
I suspect we could remove the memset from intel_init_render_ring_buffer; it seems entirely superfluous given the memset in i915_init_phys_hws.
From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00 2001 From: Keith Packard keithp@keithp.com Date: Fri, 22 Jul 2011 10:44:39 -0700 Subject: [PATCH] drm/i915: Initialize RCS ring status page address in intel_render_ring_init_dri
Physically-addressed hardware status pages are initialized early in the driver load process by i915_init_phys_hws. For UMS environments, the ring structure is not initialized until the X server starts. At that point, the entire ring structure is re-initialized with all new values. Any values set in the ring structure (including ring->status_page.page_addr) will be lost when the ring is re-initialized.
This patch moves the initialization of the status_page.page_addr value to intel_render_ring_init_dri.
Signed-off-by: Keith Packard keithp@keithp.com --- drivers/gpu/drm/i915/i915_dma.c | 6 ++---- drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++ 2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 1271282..8a3942c 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device *dev) static int i915_init_phys_hws(struct drm_device *dev) { drm_i915_private_t *dev_priv = dev->dev_private; - struct intel_ring_buffer *ring = LP_RING(dev_priv);
/* Program Hardware Status Page */ dev_priv->status_page_dmah = @@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device *dev) DRM_ERROR("Can not allocate hardware status page\n"); return -ENOMEM; } - ring->status_page.page_addr = - (void __force __iomem *)dev_priv->status_page_dmah->vaddr;
- memset_io(ring->status_page.page_addr, 0, PAGE_SIZE); + memset_io((void __force __iomem *)dev_priv->status_page_dmah->vaddr, + 0, PAGE_SIZE);
i915_write_hws_pga(dev);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index e961568..47b9b27 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size) ring->get_seqno = pc_render_get_seqno; }
+ if (!I915_NEED_GFX_HWS(dev)) + ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr; + ring->dev = dev; INIT_LIST_HEAD(&ring->active_list); INIT_LIST_HEAD(&ring->request_list);
Keith,
first of all thanks for your prompt reply. Then...
On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov kirr@mns.spb.ru wrote:
And now after v3.0 is out, I've tested it again, and yes, like it was broken on v3.0-rc5, it is (now even more) broken on v3.0 -- after first bad io access the system freezes completely:
I looked at this when I first saw it (a couple of weeks ago), and I couldn't see any obvious reason this patch would cause this particular problem. I didn't want to revert the patch at that point as I feared it would cause other subtle problems. Given that you've got a work-around, it seemed best to just push this off past 3.0.
What kind of a workaround are you talking about? Sorry, to me it all looked like "UMS is being ignored forever". Anyway, let's move on to try to solve the issue.
Given the failing address passed to ioread32, this seems like it's probably the call to READ_BREADCRUMB -- I915_BREADCRUMB_INDEX is 0x21, which is an offset in 32-bit units within the hardware status page. If the status_page.page_addr value was zero, then the computed address would end up being 0x84.
And, it looks like status_page.page_addr *will* end up being zero as a result of the patch in question. The patch resets the entire ring structure contents back to the initial values, which includes smashing the status_page structure to zero, clearing the value of status_page.page_addr set in i915_init_phys_hws.
Here's an untested patch which moves the initialization of status_page.page_addr into intel_render_ring_init_dri. I note that intel_init_render_ring_buffer *already* has the setting of the status_page.page_addr value, and so I've removed the setting of status_page.page_addr from i915_init_phys_hws.
I suspect we could remove the memset from intel_init_render_ring_buffer; it seems entirely superfluous given the memset in i915_init_phys_hws.
From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00 2001 From: Keith Packard keithp@keithp.com Date: Fri, 22 Jul 2011 10:44:39 -0700 Subject: [PATCH] drm/i915: Initialize RCS ring status page address in intel_render_ring_init_dri
Physically-addressed hardware status pages are initialized early in the driver load process by i915_init_phys_hws. For UMS environments, the ring structure is not initialized until the X server starts. At that point, the entire ring structure is re-initialized with all new values. Any values set in the ring structure (including ring->status_page.page_addr) will be lost when the ring is re-initialized.
This patch moves the initialization of the status_page.page_addr value to intel_render_ring_init_dri.
Signed-off-by: Keith Packard keithp@keithp.com
drivers/gpu/drm/i915/i915_dma.c | 6 ++---- drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++ 2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 1271282..8a3942c 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device *dev) static int i915_init_phys_hws(struct drm_device *dev) { drm_i915_private_t *dev_priv = dev->dev_private;
struct intel_ring_buffer *ring = LP_RING(dev_priv);
/* Program Hardware Status Page */ dev_priv->status_page_dmah =
@@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device *dev) DRM_ERROR("Can not allocate hardware status page\n"); return -ENOMEM; }
ring->status_page.page_addr =
(void __force __iomem *)dev_priv->status_page_dmah->vaddr;
memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
memset_io((void __force __iomem *)dev_priv->status_page_dmah->vaddr,
0, PAGE_SIZE);
i915_write_hws_pga(dev);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index e961568..47b9b27 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size) ring->get_seqno = pc_render_get_seqno; }
- if (!I915_NEED_GFX_HWS(dev))
ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
- ring->dev = dev; INIT_LIST_HEAD(&ring->active_list); INIT_LIST_HEAD(&ring->request_list);
I can't tell whether this is correct, because intel gfx driver is unknown to me, but from the first glance your description sounds reasonable.
I'm out of office till ~ next week's tuesday, and on return I'll try to test it on the hardware in question.
Thanks again, Kirill
On Sat, 23 Jul 2011 00:23:36 +0400, Kirill Smelkov kirr@mns.spb.ru wrote:
What kind of a workaround are you talking about?
Just reverting the commit -- that makes your machine work, even if it's wrong for other machines.
Sorry, to me it all looked like "UMS is being ignored forever".
You're right, of course -- UMS is a huge wart on the kernel driver at this point, keeping it working while also adding new functionality continues to cause challenges. We tend to expect that most people will run reasonably contemporaneous kernel and user space code, and so three years after the switch, it continues to surprise us when someone actually tries UMS.
I'm out of office till ~ next week's tuesday, and on return I'll try to test it on the hardware in question.
Let me know; I've pushed this patch to my drm-intel-fixes tree on kernel.org in the meantime; if it does solve the problem, I'd like to add your Tested-by: line.
On Fri, Jul 22, 2011 at 01:50:04PM -0700, Keith Packard wrote:
On Sat, 23 Jul 2011 00:23:36 +0400, Kirill Smelkov kirr@mns.spb.ru wrote:
What kind of a workaround are you talking about?
Just reverting the commit -- that makes your machine work, even if it's wrong for other machines.
Yes, I could revert it. But since the driver is reasonably complex, it is better to know what I'm doing and that the change makes sense, especially when it's not "my machine", but lots of target boards located all over the country.
That's why I wanted, and imho reasonably, because I did the homework, your feedback - to be not on my own, alone.
Sorry, to me it all looked like "UMS is being ignored forever".
You're right, of course -- UMS is a huge wart on the kernel driver at this point, keeping it working while also adding new functionality continues to cause challenges. We tend to expect that most people will run reasonably contemporaneous kernel and user space code, and so three years after the switch, it continues to surprise us when someone actually tries UMS.
We are planning upgrade to KMS too. The kernel is upgraded more often compared to userspace, because of already mentioned (thanks!) "no regression" rule. Userspace is more complex and more work in my context, so it is lagging, but eventually we'll get there.
So I hope some day, when everyone upgrades, UMS support could be "cleaned up" out from the driver.
I'm out of office till ~ next week's tuesday, and on return I'll try to test it on the hardware in question.
Let me know; I've pushed this patch to my drm-intel-fixes tree on kernel.org in the meantime; if it does solve the problem, I'd like to add your Tested-by: line.
Yes, sure, I'll let you know the results.
Thanks, Kirill
On Sat, Jul 23, 2011 at 01:08:14AM +0400, Kirill Smelkov wrote:
On Fri, Jul 22, 2011 at 01:50:04PM -0700, Keith Packard wrote:
You're right, of course -- UMS is a huge wart on the kernel driver at this point, keeping it working while also adding new functionality continues to cause challenges. We tend to expect that most people will run reasonably contemporaneous kernel and user space code, and so three years after the switch, it continues to surprise us when someone actually tries UMS.
We are planning upgrade to KMS too. The kernel is upgraded more often compared to userspace, because of already mentioned (thanks!) "no regression" rule. Userspace is more complex and more work in my context, so it is lagging, but eventually we'll get there.
Also wanted to say, that if whole X could be built, like the kernel, from one repo without multirepo-setup tool, with 100% reliable working incremental rebuild, etc... it would be a bit easier to upgrade X too.
Sorry for being a bit offtopic, could not resist. I was keeping that though in my head for ~ 2 years already, and now had a chance to mention it.
Thanks, Kirill
On Fri, Jul 22, 2011 at 5:31 PM, Kirill Smelkov kirr@mns.spb.ru wrote:
On Sat, Jul 23, 2011 at 01:08:14AM +0400, Kirill Smelkov wrote:
On Fri, Jul 22, 2011 at 01:50:04PM -0700, Keith Packard wrote:
You're right, of course -- UMS is a huge wart on the kernel driver at this point, keeping it working while also adding new functionality continues to cause challenges. We tend to expect that most people will run reasonably contemporaneous kernel and user space code, and so three years after the switch, it continues to surprise us when someone actually tries UMS.
We are planning upgrade to KMS too. The kernel is upgraded more often compared to userspace, because of already mentioned (thanks!) "no regression" rule. Userspace is more complex and more work in my context, so it is lagging, but eventually we'll get there.
Also wanted to say, that if whole X could be built, like the kernel, from one repo without multirepo-setup tool, with 100% reliable working incremental rebuild, etc... it would be a bit easier to upgrade X too.
Sorry for being a bit offtopic, could not resist. I was keeping that though in my head for ~ 2 years already, and now had a chance to mention it.
You don't have to rebuild all of X to use KMS. In most cases, you just need to update the ddx for your card.
Alex
On Sat, Jul 23, 2011 at 11:10:53AM -0400, Alex Deucher wrote:
On Fri, Jul 22, 2011 at 5:31 PM, Kirill Smelkov kirr@mns.spb.ru wrote:
On Sat, Jul 23, 2011 at 01:08:14AM +0400, Kirill Smelkov wrote:
On Fri, Jul 22, 2011 at 01:50:04PM -0700, Keith Packard wrote:
You're right, of course -- UMS is a huge wart on the kernel driver at this point, keeping it working while also adding new functionality continues to cause challenges. We tend to expect that most people will run reasonably contemporaneous kernel and user space code, and so three years after the switch, it continues to surprise us when someone actually tries UMS.
We are planning upgrade to KMS too. The kernel is upgraded more often compared to userspace, because of already mentioned (thanks!) "no regression" rule. Userspace is more complex and more work in my context, so it is lagging, but eventually we'll get there.
Also wanted to say, that if whole X could be built, like the kernel, from one repo without multirepo-setup tool, with 100% reliable working incremental rebuild, etc... it would be a bit easier to upgrade X too.
Sorry for being a bit offtopic, could not resist. I was keeping that though in my head for ~ 2 years already, and now had a chance to mention it.
You don't have to rebuild all of X to use KMS. In most cases, you just need to update the ddx for your card.
I meant the rebuilt not to use KMS, but general case. To me the kernel has one of the great advantage of being lots of self-consistent code because of being maintained in one repo + good build system + good development process. And as the result it is (relatively) easy to upgrade.
Anyway, this is just a note from both kernel and X stranger, so whatever...
Kirill
Hi Keith,
On Fri, 22 Jul 2011, Keith Packard wrote:
Sorry, to me it all looked like "UMS is being ignored forever".
You're right, of course -- UMS is a huge wart on the kernel driver at this point, keeping it working while also adding new functionality continues to cause challenges. We tend to expect that most people will run reasonably contemporaneous kernel and user space code, and so three years after the switch, it continues to surprise us when someone actually tries UMS.
I know I sound like a broken record but I really wish you i915 devs were little more eager to revert broken patches early rather than late. I mean, this particular breakage was already bisected but nobody said or did anything - and it's not like it's the first time either!
I suppose I need to bribe Linus somehow to be more strict with you folks.
Pekka
On Sat, 23 Jul 2011 18:55:48 +0300 (EEST), Pekka Enberg penberg@kernel.org wrote:
I know I sound like a broken record but I really wish you i915 devs were little more eager to revert broken patches early rather than late. I mean, this particular breakage was already bisected but nobody said or did anything - and it's not like it's the first time either!
We've switched processes starting with 2.6.39 and I think we're doing better in this regard. For this particular issue, the regression came with 2.6.38, and the revert was too large for me to consider merging just before 3.0 shipped -- I knew reverting it *would* cause problems for anyone using UMS on newer hardware.
I suppose I need to bribe Linus somehow to be more strict with you folks.
He nicely delivered the message for you a few months ago in person.
In any case, I'm hoping that my smaller fix will resolve the problem and also not cause regressions for other users.
On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote:
Keith,
first of all thanks for your prompt reply. Then...
On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov kirr@mns.spb.ru wrote:
And now after v3.0 is out, I've tested it again, and yes, like it was broken on v3.0-rc5, it is (now even more) broken on v3.0 -- after first bad io access the system freezes completely:
I looked at this when I first saw it (a couple of weeks ago), and I couldn't see any obvious reason this patch would cause this particular problem. I didn't want to revert the patch at that point as I feared it would cause other subtle problems. Given that you've got a work-around, it seemed best to just push this off past 3.0.
What kind of a workaround are you talking about? Sorry, to me it all looked like "UMS is being ignored forever". Anyway, let's move on to try to solve the issue.
Given the failing address passed to ioread32, this seems like it's probably the call to READ_BREADCRUMB -- I915_BREADCRUMB_INDEX is 0x21, which is an offset in 32-bit units within the hardware status page. If the status_page.page_addr value was zero, then the computed address would end up being 0x84.
And, it looks like status_page.page_addr *will* end up being zero as a result of the patch in question. The patch resets the entire ring structure contents back to the initial values, which includes smashing the status_page structure to zero, clearing the value of status_page.page_addr set in i915_init_phys_hws.
Here's an untested patch which moves the initialization of status_page.page_addr into intel_render_ring_init_dri. I note that intel_init_render_ring_buffer *already* has the setting of the status_page.page_addr value, and so I've removed the setting of status_page.page_addr from i915_init_phys_hws.
I suspect we could remove the memset from intel_init_render_ring_buffer; it seems entirely superfluous given the memset in i915_init_phys_hws.
From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00 2001 From: Keith Packard keithp@keithp.com Date: Fri, 22 Jul 2011 10:44:39 -0700 Subject: [PATCH] drm/i915: Initialize RCS ring status page address in intel_render_ring_init_dri
Physically-addressed hardware status pages are initialized early in the driver load process by i915_init_phys_hws. For UMS environments, the ring structure is not initialized until the X server starts. At that point, the entire ring structure is re-initialized with all new values. Any values set in the ring structure (including ring->status_page.page_addr) will be lost when the ring is re-initialized.
This patch moves the initialization of the status_page.page_addr value to intel_render_ring_init_dri.
Signed-off-by: Keith Packard keithp@keithp.com
drivers/gpu/drm/i915/i915_dma.c | 6 ++---- drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++ 2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 1271282..8a3942c 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device *dev) static int i915_init_phys_hws(struct drm_device *dev) { drm_i915_private_t *dev_priv = dev->dev_private;
struct intel_ring_buffer *ring = LP_RING(dev_priv);
/* Program Hardware Status Page */ dev_priv->status_page_dmah =
@@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device *dev) DRM_ERROR("Can not allocate hardware status page\n"); return -ENOMEM; }
ring->status_page.page_addr =
(void __force __iomem *)dev_priv->status_page_dmah->vaddr;
memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
memset_io((void __force __iomem *)dev_priv->status_page_dmah->vaddr,
0, PAGE_SIZE);
i915_write_hws_pga(dev);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index e961568..47b9b27 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size) ring->get_seqno = pc_render_get_seqno; }
- if (!I915_NEED_GFX_HWS(dev))
ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
- ring->dev = dev; INIT_LIST_HEAD(&ring->active_list); INIT_LIST_HEAD(&ring->request_list);
I can't tell whether this is correct, because intel gfx driver is unknown to me, but from the first glance your description sounds reasonable.
I'm out of office till ~ next week's tuesday, and on return I'll try to test it on the hardware in question.
Keith, thanks again for the patch. As promised I've tested it on the hardware in question and yes, bad_access is gone and X seems to work, so thank you, but...
I see there are more such bugs in introduced-in-guilty-patch intel_render_ring_init_dri(). For example ring->irq_queue is left uninitialized and also ring->irq_lock etc...
I'm X newbie, so if here is something stupid X-wise, please don't beat me too hard, but to me the gist of the problem is the original patch, where Chris does
( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 03e3370..51fbc5e 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct drm_device *dev) return intel_init_ring_buffer(dev, ring); }
+int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size) +{
drm_i915_private_t *dev_priv = dev->dev_private;
struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
*ring = render_ring;
^^^^^^^^^^^^^^^^^^^ here resets
if (INTEL_INFO(dev)->gen >= 6) {
ring->add_request = gen6_add_request;
ring->irq_get = gen6_render_ring_get_irq;
ring->irq_put = gen6_render_ring_put_irq;
} else if (IS_GEN5(dev)) {
ring->add_request = pc_render_add_request;
ring->get_seqno = pc_render_get_seqno;
}
and then the rest of the `ring` is initialized seemingly copy-pasted from intel_init_ring_buffer():
ring->dev = dev;
INIT_LIST_HEAD(&ring->active_list);
INIT_LIST_HEAD(&ring->request_list);
INIT_LIST_HEAD(&ring->gpu_write_list);
ring->size = size;
ring->effective_size = ring->size;
if (IS_I830(ring->dev))
ring->effective_size -= 128;
ring->map.offset = start;
ring->map.size = size;
ring->map.type = 0;
ring->map.flags = 0;
ring->map.mtrr = 0;
...
where both 3 chunks go almost exactly from intel_init_ring_buffer(), and ring->effective_size tweak even stripped original comment:
# original version from intel_init_ring_buffer(): /* Workaround an erratum on the i830 which causes a hang if * the TAIL pointer points to within the last 2 cachelines * of the buffer. */ ring->effective_size = ring->size; if (IS_I830(ring->dev)) ring->effective_size -= 128;
...
The line marked "here resets" resets all the fields, and maybe it's not a good idea to re-initialize them all afterwards (missing some as this thread show), or at least if it is really needed, share initialization code between intel_render_ring_init_dri() and intel_init_ring_buffer() ?
From the outside it looks like the offending patch was done as a quick
fix in a hurry (lots of copy-paste), and maybe it would be better to re-do it properly...
Thanks again, Kirill
On Tue, Jul 26, 2011 at 05:48:27PM +0400, Kirill Smelkov wrote:
On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote:
Keith,
first of all thanks for your prompt reply. Then...
On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov kirr@mns.spb.ru wrote:
And now after v3.0 is out, I've tested it again, and yes, like it was broken on v3.0-rc5, it is (now even more) broken on v3.0 -- after first bad io access the system freezes completely:
I looked at this when I first saw it (a couple of weeks ago), and I couldn't see any obvious reason this patch would cause this particular problem. I didn't want to revert the patch at that point as I feared it would cause other subtle problems. Given that you've got a work-around, it seemed best to just push this off past 3.0.
What kind of a workaround are you talking about? Sorry, to me it all looked like "UMS is being ignored forever". Anyway, let's move on to try to solve the issue.
Given the failing address passed to ioread32, this seems like it's probably the call to READ_BREADCRUMB -- I915_BREADCRUMB_INDEX is 0x21, which is an offset in 32-bit units within the hardware status page. If the status_page.page_addr value was zero, then the computed address would end up being 0x84.
And, it looks like status_page.page_addr *will* end up being zero as a result of the patch in question. The patch resets the entire ring structure contents back to the initial values, which includes smashing the status_page structure to zero, clearing the value of status_page.page_addr set in i915_init_phys_hws.
Here's an untested patch which moves the initialization of status_page.page_addr into intel_render_ring_init_dri. I note that intel_init_render_ring_buffer *already* has the setting of the status_page.page_addr value, and so I've removed the setting of status_page.page_addr from i915_init_phys_hws.
I suspect we could remove the memset from intel_init_render_ring_buffer; it seems entirely superfluous given the memset in i915_init_phys_hws.
From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00 2001 From: Keith Packard keithp@keithp.com Date: Fri, 22 Jul 2011 10:44:39 -0700 Subject: [PATCH] drm/i915: Initialize RCS ring status page address in intel_render_ring_init_dri
Physically-addressed hardware status pages are initialized early in the driver load process by i915_init_phys_hws. For UMS environments, the ring structure is not initialized until the X server starts. At that point, the entire ring structure is re-initialized with all new values. Any values set in the ring structure (including ring->status_page.page_addr) will be lost when the ring is re-initialized.
This patch moves the initialization of the status_page.page_addr value to intel_render_ring_init_dri.
Signed-off-by: Keith Packard keithp@keithp.com
drivers/gpu/drm/i915/i915_dma.c | 6 ++---- drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++ 2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 1271282..8a3942c 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device *dev) static int i915_init_phys_hws(struct drm_device *dev) { drm_i915_private_t *dev_priv = dev->dev_private;
struct intel_ring_buffer *ring = LP_RING(dev_priv);
/* Program Hardware Status Page */ dev_priv->status_page_dmah =
@@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device *dev) DRM_ERROR("Can not allocate hardware status page\n"); return -ENOMEM; }
ring->status_page.page_addr =
(void __force __iomem *)dev_priv->status_page_dmah->vaddr;
memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
memset_io((void __force __iomem *)dev_priv->status_page_dmah->vaddr,
0, PAGE_SIZE);
i915_write_hws_pga(dev);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index e961568..47b9b27 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size) ring->get_seqno = pc_render_get_seqno; }
- if (!I915_NEED_GFX_HWS(dev))
ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
- ring->dev = dev; INIT_LIST_HEAD(&ring->active_list); INIT_LIST_HEAD(&ring->request_list);
I can't tell whether this is correct, because intel gfx driver is unknown to me, but from the first glance your description sounds reasonable.
I'm out of office till ~ next week's tuesday, and on return I'll try to test it on the hardware in question.
Keith, thanks again for the patch. As promised I've tested it on the hardware in question and yes, bad_access is gone and X seems to work, so thank you, but...
I see there are more such bugs in introduced-in-guilty-patch intel_render_ring_init_dri(). For example ring->irq_queue is left uninitialized and also ring->irq_lock etc...
I'm X newbie, so if here is something stupid X-wise, please don't beat me too hard, but to me the gist of the problem is the original patch, where Chris does
( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 03e3370..51fbc5e 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct drm_device *dev) return intel_init_ring_buffer(dev, ring); }
+int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size) +{
drm_i915_private_t *dev_priv = dev->dev_private;
struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
*ring = render_ring;
^^^^^^^^^^^^^^^^^^^ here resets
if (INTEL_INFO(dev)->gen >= 6) {
ring->add_request = gen6_add_request;
ring->irq_get = gen6_render_ring_get_irq;
ring->irq_put = gen6_render_ring_put_irq;
} else if (IS_GEN5(dev)) {
ring->add_request = pc_render_add_request;
ring->get_seqno = pc_render_get_seqno;
}
and then the rest of the `ring` is initialized seemingly copy-pasted from intel_init_ring_buffer():
ring->dev = dev;
INIT_LIST_HEAD(&ring->active_list);
INIT_LIST_HEAD(&ring->request_list);
INIT_LIST_HEAD(&ring->gpu_write_list);
ring->size = size;
ring->effective_size = ring->size;
if (IS_I830(ring->dev))
ring->effective_size -= 128;
ring->map.offset = start;
ring->map.size = size;
ring->map.type = 0;
ring->map.flags = 0;
ring->map.mtrr = 0;
...
where both 3 chunks go almost exactly from intel_init_ring_buffer(), and ring->effective_size tweak even stripped original comment:
# original version from intel_init_ring_buffer(): /* Workaround an erratum on the i830 which causes a hang if * the TAIL pointer points to within the last 2 cachelines * of the buffer. */ ring->effective_size = ring->size; if (IS_I830(ring->dev)) ring->effective_size -= 128;
...
The line marked "here resets" resets all the fields, and maybe it's not a good idea to re-initialize them all afterwards (missing some as this thread show), or at least if it is really needed, share initialization code between intel_render_ring_init_dri() and intel_init_ring_buffer() ?
From the outside it looks like the offending patch was done as a quick
fix in a hurry (lots of copy-paste), and maybe it would be better to re-do it properly...
Silence... ?
I read UMS is still ignored, because e.g. that uninitialized ring->irq_lock which I've wrote about above is for sure used e.g. in gen6_render_ring_get_irq() added to ring vtable in intel_render_ring_init_dri().
And also is copy-pasting, instead of properly structuring things, ok?
Why not revert what caused trouble and introduced other subtle bugs, and redo things properly in the first place?
On Tuesday 09 August 2011 15:08:03 Kirill Smelkov wrote:
On Tue, Jul 26, 2011 at 05:48:27PM +0400, Kirill Smelkov wrote:
On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote:
Keith,
first of all thanks for your prompt reply. Then...
On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov kirr@mns.spb.ru
wrote:
And now after v3.0 is out, I've tested it again, and yes, like it was broken on v3.0-rc5, it is (now even more) broken on v3.0 -- after first
bad io access the system freezes completely:
I looked at this when I first saw it (a couple of weeks ago), and I couldn't see any obvious reason this patch would cause this particular problem. I didn't want to revert the patch at that point as I feared it would cause other subtle problems. Given that you've got a work-around, it seemed best to just push this off past 3.0.
What kind of a workaround are you talking about? Sorry, to me it all looked like "UMS is being ignored forever". Anyway, let's move on to try to solve the issue.
Given the failing address passed to ioread32, this seems like it's probably the call to READ_BREADCRUMB -- I915_BREADCRUMB_INDEX is 0x21, which is an offset in 32-bit units within the hardware status page. If the status_page.page_addr value was zero, then the computed address would end up being 0x84.
And, it looks like status_page.page_addr *will* end up being zero as a result of the patch in question. The patch resets the entire ring structure contents back to the initial values, which includes smashing the status_page structure to zero, clearing the value of status_page.page_addr set in i915_init_phys_hws.
Here's an untested patch which moves the initialization of status_page.page_addr into intel_render_ring_init_dri. I note that intel_init_render_ring_buffer *already* has the setting of the status_page.page_addr value, and so I've removed the setting of status_page.page_addr from i915_init_phys_hws.
I suspect we could remove the memset from intel_init_render_ring_buffer; it seems entirely superfluous given the memset in i915_init_phys_hws.
From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00 2001 From: Keith Packard keithp@keithp.com Date: Fri, 22 Jul 2011 10:44:39 -0700 Subject: [PATCH] drm/i915: Initialize RCS ring status page address in
intel_render_ring_init_dri
Physically-addressed hardware status pages are initialized early in the driver load process by i915_init_phys_hws. For UMS environments, the ring structure is not initialized until the X server starts. At that point, the entire ring structure is re-initialized with all new values. Any values set in the ring structure (including ring->status_page.page_addr) will be lost when the ring is re-initialized.
This patch moves the initialization of the status_page.page_addr value to intel_render_ring_init_dri.
Signed-off-by: Keith Packard keithp@keithp.com
drivers/gpu/drm/i915/i915_dma.c | 6 ++---- drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++ 2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 1271282..8a3942c 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device *dev)
static int i915_init_phys_hws(struct drm_device *dev) {
drm_i915_private_t *dev_priv = dev->dev_private;
struct intel_ring_buffer *ring = LP_RING(dev_priv);
/* Program Hardware Status Page */ dev_priv->status_page_dmah =
@@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device *dev)
DRM_ERROR("Can not allocate hardware status page\n"); return -ENOMEM;
}
ring->status_page.page_addr =
(void __force __iomem *)dev_priv->status_page_dmah->vaddr;
memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
- memset_io((void __force __iomem
*)dev_priv->status_page_dmah->vaddr, + 0, PAGE_SIZE);
i915_write_hws_pga(dev);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index e961568..47b9b27 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
ring->get_seqno = pc_render_get_seqno;
}
if (!I915_NEED_GFX_HWS(dev))
ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
ring->dev = dev; INIT_LIST_HEAD(&ring->active_list); INIT_LIST_HEAD(&ring->request_list);
I can't tell whether this is correct, because intel gfx driver is unknown to me, but from the first glance your description sounds reasonable.
I'm out of office till ~ next week's tuesday, and on return I'll try to test it on the hardware in question.
Keith, thanks again for the patch. As promised I've tested it on the hardware in question and yes, bad_access is gone and X seems to work, so thank you, but...
I see there are more such bugs in introduced-in-guilty-patch intel_render_ring_init_dri(). For example ring->irq_queue is left uninitialized and also ring->irq_lock etc...
I'm X newbie, so if here is something stupid X-wise, please don't beat me too hard, but to me the gist of the problem is the original patch, where Chris does
( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 03e3370..51fbc5e 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
return intel_init_ring_buffer(dev, ring);
}
+int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size) +{
drm_i915_private_t *dev_priv = dev->dev_private;
struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
*ring = render_ring;
^^^^^^^^^^^^^^^^^^^ here resets
if (INTEL_INFO(dev)->gen >= 6) {
ring->add_request = gen6_add_request;
ring->irq_get = gen6_render_ring_get_irq;
ring->irq_put = gen6_render_ring_put_irq;
} else if (IS_GEN5(dev)) {
ring->add_request = pc_render_add_request;
ring->get_seqno = pc_render_get_seqno;
}
and then the rest of the `ring` is initialized seemingly copy-pasted
from intel_init_ring_buffer():
ring->dev = dev;
INIT_LIST_HEAD(&ring->active_list);
INIT_LIST_HEAD(&ring->request_list);
INIT_LIST_HEAD(&ring->gpu_write_list);
ring->size = size;
ring->effective_size = ring->size;
if (IS_I830(ring->dev))
ring->effective_size -= 128;
ring->map.offset = start;
ring->map.size = size;
ring->map.type = 0;
ring->map.flags = 0;
ring->map.mtrr = 0;
...
where both 3 chunks go almost exactly from intel_init_ring_buffer(), and ring->effective_size tweak even stripped original comment:
# original version from intel_init_ring_buffer(): /* Workaround an erratum on the i830 which causes a hang if
* the TAIL pointer points to within the last 2 cachelines * of the buffer. */ ring->effective_size = ring->size; if (IS_I830(ring->dev)) ring->effective_size -= 128;
...
The line marked "here resets" resets all the fields, and maybe it's not a good idea to re-initialize them all afterwards (missing some as this thread show), or at least if it is really needed, share initialization code between intel_render_ring_init_dri() and intel_init_ring_buffer() ?
From the outside it looks like the offending patch was done as a quick
fix in a hurry (lots of copy-paste), and maybe it would be better to re-do it properly...
Silence... ?
I read UMS is still ignored, because e.g. that uninitialized ring->irq_lock which I've wrote about above is for sure used e.g. in gen6_render_ring_get_irq() added to ring vtable in intel_render_ring_init_dri().
I really doubt that UMS supports gen6 hardware.
Regards Vasily
On Tue, Aug 09, 2011 at 05:00:52PM +0300, Vasily Khoruzhick wrote:
On Tuesday 09 August 2011 15:08:03 Kirill Smelkov wrote:
On Tue, Jul 26, 2011 at 05:48:27PM +0400, Kirill Smelkov wrote:
On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote:
Keith,
first of all thanks for your prompt reply. Then...
On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov kirr@mns.spb.ru
wrote:
And now after v3.0 is out, I've tested it again, and yes, like it was broken on v3.0-rc5, it is (now even more) broken on v3.0 -- after first
bad io access the system freezes completely:
I looked at this when I first saw it (a couple of weeks ago), and I couldn't see any obvious reason this patch would cause this particular problem. I didn't want to revert the patch at that point as I feared it would cause other subtle problems. Given that you've got a work-around, it seemed best to just push this off past 3.0.
What kind of a workaround are you talking about? Sorry, to me it all looked like "UMS is being ignored forever". Anyway, let's move on to try to solve the issue.
Given the failing address passed to ioread32, this seems like it's probably the call to READ_BREADCRUMB -- I915_BREADCRUMB_INDEX is 0x21, which is an offset in 32-bit units within the hardware status page. If the status_page.page_addr value was zero, then the computed address would end up being 0x84.
And, it looks like status_page.page_addr *will* end up being zero as a result of the patch in question. The patch resets the entire ring structure contents back to the initial values, which includes smashing the status_page structure to zero, clearing the value of status_page.page_addr set in i915_init_phys_hws.
Here's an untested patch which moves the initialization of status_page.page_addr into intel_render_ring_init_dri. I note that intel_init_render_ring_buffer *already* has the setting of the status_page.page_addr value, and so I've removed the setting of status_page.page_addr from i915_init_phys_hws.
I suspect we could remove the memset from intel_init_render_ring_buffer; it seems entirely superfluous given the memset in i915_init_phys_hws.
From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00 2001 From: Keith Packard keithp@keithp.com Date: Fri, 22 Jul 2011 10:44:39 -0700 Subject: [PATCH] drm/i915: Initialize RCS ring status page address in
intel_render_ring_init_dri
Physically-addressed hardware status pages are initialized early in the driver load process by i915_init_phys_hws. For UMS environments, the ring structure is not initialized until the X server starts. At that point, the entire ring structure is re-initialized with all new values. Any values set in the ring structure (including ring->status_page.page_addr) will be lost when the ring is re-initialized.
This patch moves the initialization of the status_page.page_addr value to intel_render_ring_init_dri.
Signed-off-by: Keith Packard keithp@keithp.com
drivers/gpu/drm/i915/i915_dma.c | 6 ++---- drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++ 2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 1271282..8a3942c 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device *dev)
static int i915_init_phys_hws(struct drm_device *dev) {
drm_i915_private_t *dev_priv = dev->dev_private;
struct intel_ring_buffer *ring = LP_RING(dev_priv);
/* Program Hardware Status Page */ dev_priv->status_page_dmah =
@@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device *dev)
DRM_ERROR("Can not allocate hardware status page\n"); return -ENOMEM;
}
ring->status_page.page_addr =
(void __force __iomem *)dev_priv->status_page_dmah->vaddr;
memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
- memset_io((void __force __iomem
*)dev_priv->status_page_dmah->vaddr, + 0, PAGE_SIZE);
i915_write_hws_pga(dev);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index e961568..47b9b27 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
ring->get_seqno = pc_render_get_seqno;
}
if (!I915_NEED_GFX_HWS(dev))
ring->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
ring->dev = dev; INIT_LIST_HEAD(&ring->active_list); INIT_LIST_HEAD(&ring->request_list);
I can't tell whether this is correct, because intel gfx driver is unknown to me, but from the first glance your description sounds reasonable.
I'm out of office till ~ next week's tuesday, and on return I'll try to test it on the hardware in question.
Keith, thanks again for the patch. As promised I've tested it on the hardware in question and yes, bad_access is gone and X seems to work, so thank you, but...
I see there are more such bugs in introduced-in-guilty-patch intel_render_ring_init_dri(). For example ring->irq_queue is left uninitialized and also ring->irq_lock etc...
I'm X newbie, so if here is something stupid X-wise, please don't beat me too hard, but to me the gist of the problem is the original patch, where Chris does
( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 03e3370..51fbc5e 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
return intel_init_ring_buffer(dev, ring);
}
+int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size) +{
drm_i915_private_t *dev_priv = dev->dev_private;
struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
*ring = render_ring;
^^^^^^^^^^^^^^^^^^^ here resets
if (INTEL_INFO(dev)->gen >= 6) {
ring->add_request = gen6_add_request;
ring->irq_get = gen6_render_ring_get_irq;
ring->irq_put = gen6_render_ring_put_irq;
} else if (IS_GEN5(dev)) {
ring->add_request = pc_render_add_request;
ring->get_seqno = pc_render_get_seqno;
}
and then the rest of the `ring` is initialized seemingly copy-pasted
from intel_init_ring_buffer():
ring->dev = dev;
INIT_LIST_HEAD(&ring->active_list);
INIT_LIST_HEAD(&ring->request_list);
INIT_LIST_HEAD(&ring->gpu_write_list);
ring->size = size;
ring->effective_size = ring->size;
if (IS_I830(ring->dev))
ring->effective_size -= 128;
ring->map.offset = start;
ring->map.size = size;
ring->map.type = 0;
ring->map.flags = 0;
ring->map.mtrr = 0;
...
where both 3 chunks go almost exactly from intel_init_ring_buffer(), and ring->effective_size tweak even stripped original comment:
# original version from intel_init_ring_buffer(): /* Workaround an erratum on the i830 which causes a hang if
* the TAIL pointer points to within the last 2 cachelines * of the buffer. */ ring->effective_size = ring->size; if (IS_I830(ring->dev)) ring->effective_size -= 128;
...
The line marked "here resets" resets all the fields, and maybe it's not a good idea to re-initialize them all afterwards (missing some as this thread show), or at least if it is really needed, share initialization code between intel_render_ring_init_dri() and intel_init_ring_buffer() ?
From the outside it looks like the offending patch was done as a quick
fix in a hurry (lots of copy-paste), and maybe it would be better to re-do it properly...
Silence... ?
I read UMS is still ignored, because e.g. that uninitialized ring->irq_lock which I've wrote about above is for sure used e.g. in gen6_render_ring_get_irq() added to ring vtable in intel_render_ring_init_dri().
I really doubt that UMS supports gen6 hardware.
Then why it is there in intel_render_ring_init_dri():
int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size) { drm_i915_private_t *dev_priv = dev->dev_private; struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
*ring = render_ring; if (INTEL_INFO(dev)->gen >= 6) { ring->add_request = gen6_add_request; ring->irq_get = gen6_render_ring_get_irq; ring->irq_put = gen6_render_ring_put_irq; } else if (IS_GEN5(dev)) { ring->add_request = pc_render_add_request; ring->get_seqno = pc_render_get_seqno; }
?
Added by the same guilty commit e8616b6c I'm talking about.
On Tuesday 09 August 2011 17:47:56 Kirill Smelkov wrote:
On Tue, Aug 09, 2011 at 05:00:52PM +0300, Vasily Khoruzhick wrote:
On Tuesday 09 August 2011 15:08:03 Kirill Smelkov wrote:
On Tue, Jul 26, 2011 at 05:48:27PM +0400, Kirill Smelkov wrote:
On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote:
Keith,
first of all thanks for your prompt reply. Then...
On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote:
On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov kirr@mns.spb.ru
wrote:
> And now after v3.0 is out, I've tested it again, and yes, like > it was broken on v3.0-rc5, it is (now even more) broken on > v3.0 -- after first
> bad io access the system freezes completely: I looked at this when I first saw it (a couple of weeks ago), and I couldn't see any obvious reason this patch would cause this particular problem. I didn't want to revert the patch at that point as I feared it would cause other subtle problems. Given that you've got a work-around, it seemed best to just push this off past 3.0.
What kind of a workaround are you talking about? Sorry, to me it all looked like "UMS is being ignored forever". Anyway, let's move on to try to solve the issue.
Given the failing address passed to ioread32, this seems like it's probably the call to READ_BREADCRUMB -- I915_BREADCRUMB_INDEX is 0x21, which is an offset in 32-bit units within the hardware status page. If the status_page.page_addr value was zero, then the computed address would end up being 0x84.
And, it looks like status_page.page_addr *will* end up being zero as a result of the patch in question. The patch resets the entire ring structure contents back to the initial values, which includes smashing the status_page structure to zero, clearing the value of status_page.page_addr set in i915_init_phys_hws.
Here's an untested patch which moves the initialization of status_page.page_addr into intel_render_ring_init_dri. I note that intel_init_render_ring_buffer *already* has the setting of the status_page.page_addr value, and so I've removed the setting of status_page.page_addr from i915_init_phys_hws.
I suspect we could remove the memset from intel_init_render_ring_buffer; it seems entirely superfluous given the memset in i915_init_phys_hws.
From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00 2001 From: Keith Packard keithp@keithp.com Date: Fri, 22 Jul 2011 10:44:39 -0700 Subject: [PATCH] drm/i915: Initialize RCS ring status page address in
intel_render_ring_init_dri
Physically-addressed hardware status pages are initialized early in the driver load process by i915_init_phys_hws. For UMS environments, the ring structure is not initialized until the X server starts. At that point, the entire ring structure is re-initialized with all new values. Any values set in the ring structure (including ring->status_page.page_addr) will be lost when the ring is re-initialized.
This patch moves the initialization of the status_page.page_addr value to intel_render_ring_init_dri.
Signed-off-by: Keith Packard keithp@keithp.com
drivers/gpu/drm/i915/i915_dma.c | 6 ++---- drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++ 2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 1271282..8a3942c 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct drm_device *dev)
static int i915_init_phys_hws(struct drm_device *dev) {
drm_i915_private_t *dev_priv = dev->dev_private;
struct intel_ring_buffer *ring = LP_RING(dev_priv);
/* Program Hardware Status Page */ dev_priv->status_page_dmah =
@@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct drm_device *dev)
DRM_ERROR("Can not allocate hardware status page\n"); return -ENOMEM;
}
ring->status_page.page_addr =
(void __force __iomem *)dev_priv->status_page_dmah->vaddr;
memset_io(ring->status_page.page_addr, 0, PAGE_SIZE);
- memset_io((void __force __iomem
*)dev_priv->status_page_dmah->vaddr, + 0, PAGE_SIZE);
i915_write_hws_pga(dev);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index e961568..47b9b27 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
ring->get_seqno = pc_render_get_seqno;
}
- if (!I915_NEED_GFX_HWS(dev))
ring->status_page.page_addr =
dev_priv->status_page_dmah->vaddr; +
ring->dev = dev; INIT_LIST_HEAD(&ring->active_list); INIT_LIST_HEAD(&ring->request_list);
I can't tell whether this is correct, because intel gfx driver is unknown to me, but from the first glance your description sounds reasonable.
I'm out of office till ~ next week's tuesday, and on return I'll try to test it on the hardware in question.
Keith, thanks again for the patch. As promised I've tested it on the hardware in question and yes, bad_access is gone and X seems to work, so thank you, but...
I see there are more such bugs in introduced-in-guilty-patch intel_render_ring_init_dri(). For example ring->irq_queue is left uninitialized and also ring->irq_lock etc...
I'm X newbie, so if here is something stupid X-wise, please don't beat me too hard, but to me the gist of the problem is the original patch, where Chris does
( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 03e3370..51fbc5e 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
return intel_init_ring_buffer(dev, ring);
}
+int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size) +{
drm_i915_private_t *dev_priv = dev->dev_private;
struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
*ring = render_ring;
^^^^^^^^^^^^^^^^^^^ here resets
if (INTEL_INFO(dev)->gen >= 6) {
ring->add_request = gen6_add_request;
ring->irq_get = gen6_render_ring_get_irq;
ring->irq_put = gen6_render_ring_put_irq;
} else if (IS_GEN5(dev)) {
ring->add_request = pc_render_add_request;
ring->get_seqno = pc_render_get_seqno;
}
and then the rest of the `ring` is initialized seemingly copy-pasted
from intel_init_ring_buffer():
ring->dev = dev;
INIT_LIST_HEAD(&ring->active_list);
INIT_LIST_HEAD(&ring->request_list);
INIT_LIST_HEAD(&ring->gpu_write_list);
ring->size = size;
ring->effective_size = ring->size;
if (IS_I830(ring->dev))
ring->effective_size -= 128;
ring->map.offset = start;
ring->map.size = size;
ring->map.type = 0;
ring->map.flags = 0;
ring->map.mtrr = 0;
...
where both 3 chunks go almost exactly from intel_init_ring_buffer(), and ring->effective_size tweak even stripped original comment:
# original version from intel_init_ring_buffer(): /* Workaround an erratum on the i830 which causes a hang if
* the TAIL pointer points to within the last 2 cachelines * of the buffer. */ ring->effective_size = ring->size; if (IS_I830(ring->dev)) ring->effective_size -= 128;
...
The line marked "here resets" resets all the fields, and maybe it's not a good idea to re-initialize them all afterwards (missing some as this thread show), or at least if it is really needed, share initialization code between intel_render_ring_init_dri() and intel_init_ring_buffer() ?
From the outside it looks like the offending patch was done as a quick
fix in a hurry (lots of copy-paste), and maybe it would be better to re-do it properly...
Silence... ?
I read UMS is still ignored, because e.g. that uninitialized ring->irq_lock which I've wrote about above is for sure used e.g. in gen6_render_ring_get_irq() added to ring vtable in intel_render_ring_init_dri().
I really doubt that UMS supports gen6 hardware.
Then why it is there in intel_render_ring_init_dri():
int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32
size) { drm_i915_private_t *dev_priv = dev->dev_private; struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
*ring = render_ring; if (INTEL_INFO(dev)->gen >= 6) {
This branch executes only when hw generation is 6 or newer.
ring->add_request = gen6_add_request; ring->irq_get = gen6_render_ring_get_irq; ring->irq_put = gen6_render_ring_put_irq; } else if (IS_GEN5(dev)) { ring->add_request = pc_render_add_request; ring->get_seqno = pc_render_get_seqno; }
On Tue, Aug 09, 2011 at 06:09:57PM +0300, Vasily Khoruzhick wrote:
On Tuesday 09 August 2011 17:47:56 Kirill Smelkov wrote:
On Tue, Aug 09, 2011 at 05:00:52PM +0300, Vasily Khoruzhick wrote:
On Tuesday 09 August 2011 15:08:03 Kirill Smelkov wrote:
On Tue, Jul 26, 2011 at 05:48:27PM +0400, Kirill Smelkov wrote:
On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote:
Keith,
first of all thanks for your prompt reply. Then...
On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote: > On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov > kirr@mns.spb.ru
wrote:
> > And now after v3.0 is out, I've tested it again, and yes, like > > it was broken on v3.0-rc5, it is (now even more) broken on > > v3.0 -- after first > > > bad io access the system freezes completely: > I looked at this when I first saw it (a couple of weeks ago), and > I couldn't see any obvious reason this patch would cause this > particular problem. I didn't want to revert the patch at that > point as I feared it would cause other subtle problems. Given > that you've got a work-around, it seemed best to just push this > off past 3.0.
What kind of a workaround are you talking about? Sorry, to me it all looked like "UMS is being ignored forever". Anyway, let's move on to try to solve the issue.
> Given the failing address passed to ioread32, this seems like > it's probably the call to READ_BREADCRUMB -- > I915_BREADCRUMB_INDEX is 0x21, which is an offset in 32-bit > units within the hardware status page. If the > status_page.page_addr value was zero, then the computed address > would end up being 0x84. > > And, it looks like status_page.page_addr *will* end up being zero > as a result of the patch in question. The patch resets the > entire ring structure contents back to the initial values, which > includes smashing the status_page structure to zero, clearing > the value of status_page.page_addr set in i915_init_phys_hws. > > Here's an untested patch which moves the initialization of > status_page.page_addr into intel_render_ring_init_dri. I note > that intel_init_render_ring_buffer *already* has the setting of > the status_page.page_addr value, and so I've removed the setting > of status_page.page_addr from i915_init_phys_hws. > > I suspect we could remove the memset from > intel_init_render_ring_buffer; it seems entirely superfluous > given the memset in i915_init_phys_hws. > > From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 00:00:00 > 2001 From: Keith Packard keithp@keithp.com > Date: Fri, 22 Jul 2011 10:44:39 -0700 > Subject: [PATCH] drm/i915: Initialize RCS ring status page > address in > > intel_render_ring_init_dri > > Physically-addressed hardware status pages are initialized early > in the driver load process by i915_init_phys_hws. For UMS > environments, the ring structure is not initialized until the X > server starts. At that point, the entire ring structure is > re-initialized with all new values. Any values set in the ring > structure (including > ring->status_page.page_addr) will be lost when the ring is > re-initialized. > > This patch moves the initialization of the status_page.page_addr > value to intel_render_ring_init_dri. > > Signed-off-by: Keith Packard keithp@keithp.com > --- > > drivers/gpu/drm/i915/i915_dma.c | 6 ++---- > drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++ > 2 files changed, 5 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_dma.c > b/drivers/gpu/drm/i915/i915_dma.c index 1271282..8a3942c 100644 > --- a/drivers/gpu/drm/i915/i915_dma.c > +++ b/drivers/gpu/drm/i915/i915_dma.c > @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct > drm_device *dev) > > static int i915_init_phys_hws(struct drm_device *dev) > { > > drm_i915_private_t *dev_priv = dev->dev_private; > > - struct intel_ring_buffer *ring = LP_RING(dev_priv); > > /* Program Hardware Status Page */ > dev_priv->status_page_dmah = > > @@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct > drm_device *dev) > > DRM_ERROR("Can not allocate hardware status page\n"); > return -ENOMEM; > > } > > - ring->status_page.page_addr = > - (void __force __iomem *)dev_priv->status_page_dmah->vaddr; > > - memset_io(ring->status_page.page_addr, 0, PAGE_SIZE); > + memset_io((void __force __iomem > *)dev_priv->status_page_dmah->vaddr, + 0, PAGE_SIZE); > > i915_write_hws_pga(dev); > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c > b/drivers/gpu/drm/i915/intel_ringbuffer.c index e961568..47b9b27 > 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct > drm_device *dev, u64 start, u32 size) > > ring->get_seqno = pc_render_get_seqno; > > } > > + if (!I915_NEED_GFX_HWS(dev)) > + ring->status_page.page_addr = > dev_priv->status_page_dmah->vaddr; + > > ring->dev = dev; > INIT_LIST_HEAD(&ring->active_list); > INIT_LIST_HEAD(&ring->request_list);
I can't tell whether this is correct, because intel gfx driver is unknown to me, but from the first glance your description sounds reasonable.
I'm out of office till ~ next week's tuesday, and on return I'll try to test it on the hardware in question.
Keith, thanks again for the patch. As promised I've tested it on the hardware in question and yes, bad_access is gone and X seems to work, so thank you, but...
I see there are more such bugs in introduced-in-guilty-patch intel_render_ring_init_dri(). For example ring->irq_queue is left uninitialized and also ring->irq_lock etc...
I'm X newbie, so if here is something stupid X-wise, please don't beat me too hard, but to me the gist of the problem is the original patch, where Chris does
( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 03e3370..51fbc5e 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
return intel_init_ring_buffer(dev, ring);
}
+int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size) +{
drm_i915_private_t *dev_priv = dev->dev_private;
struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
*ring = render_ring;
^^^^^^^^^^^^^^^^^^^ here resets
if (INTEL_INFO(dev)->gen >= 6) {
ring->add_request = gen6_add_request;
ring->irq_get = gen6_render_ring_get_irq;
ring->irq_put = gen6_render_ring_put_irq;
} else if (IS_GEN5(dev)) {
ring->add_request = pc_render_add_request;
ring->get_seqno = pc_render_get_seqno;
}
and then the rest of the `ring` is initialized seemingly copy-pasted
from intel_init_ring_buffer():
ring->dev = dev;
INIT_LIST_HEAD(&ring->active_list);
INIT_LIST_HEAD(&ring->request_list);
INIT_LIST_HEAD(&ring->gpu_write_list);
ring->size = size;
ring->effective_size = ring->size;
if (IS_I830(ring->dev))
ring->effective_size -= 128;
ring->map.offset = start;
ring->map.size = size;
ring->map.type = 0;
ring->map.flags = 0;
ring->map.mtrr = 0;
...
where both 3 chunks go almost exactly from intel_init_ring_buffer(), and ring->effective_size tweak even stripped original comment:
# original version from intel_init_ring_buffer(): /* Workaround an erratum on the i830 which causes a hang if
* the TAIL pointer points to within the last 2 cachelines * of the buffer. */ ring->effective_size = ring->size; if (IS_I830(ring->dev)) ring->effective_size -= 128;
...
The line marked "here resets" resets all the fields, and maybe it's not a good idea to re-initialize them all afterwards (missing some as this thread show), or at least if it is really needed, share initialization code between intel_render_ring_init_dri() and intel_init_ring_buffer() ?
From the outside it looks like the offending patch was done as a quick
fix in a hurry (lots of copy-paste), and maybe it would be better to re-do it properly...
Silence... ?
I read UMS is still ignored, because e.g. that uninitialized ring->irq_lock which I've wrote about above is for sure used e.g. in gen6_render_ring_get_irq() added to ring vtable in intel_render_ring_init_dri().
I really doubt that UMS supports gen6 hardware.
Then why it is there in intel_render_ring_init_dri():
int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32
size) { drm_i915_private_t *dev_priv = dev->dev_private; struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
*ring = render_ring; if (INTEL_INFO(dev)->gen >= 6) {
This branch executes only when hw generation is 6 or newer.
and adds gen6_render_ring_get_irq() to vtable which uses ring->irq_lock which is left uninitialized.
I don't understand what you were trying to say. How does it matter if some branch executes only for such-and-such hardware, when this branch contains bugs? Could you please clarify?
ring->add_request = gen6_add_request; ring->irq_get = gen6_render_ring_get_irq; ring->irq_put = gen6_render_ring_put_irq; } else if (IS_GEN5(dev)) { ring->add_request = pc_render_add_request; ring->get_seqno = pc_render_get_seqno; }
On Tuesday 09 August 2011 18:34:46 Kirill Smelkov wrote:
On Tue, Aug 09, 2011 at 06:09:57PM +0300, Vasily Khoruzhick wrote:
On Tuesday 09 August 2011 17:47:56 Kirill Smelkov wrote:
On Tue, Aug 09, 2011 at 05:00:52PM +0300, Vasily Khoruzhick wrote:
On Tuesday 09 August 2011 15:08:03 Kirill Smelkov wrote:
On Tue, Jul 26, 2011 at 05:48:27PM +0400, Kirill Smelkov wrote:
On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote: > Keith, > > first of all thanks for your prompt reply. Then... > > On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote: > > On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov > > kirr@mns.spb.ru
wrote:
> > > And now after v3.0 is out, I've tested it again, and yes, > > > like it was broken on v3.0-rc5, it is (now even more) > > > broken on v3.0 -- after first > > > > > bad io access the system freezes completely: > > I looked at this when I first saw it (a couple of weeks ago), > > and I couldn't see any obvious reason this patch would cause > > this particular problem. I didn't want to revert the patch > > at that point as I feared it would cause other subtle > > problems. Given that you've got a work-around, it seemed > > best to just push this off past 3.0. > > What kind of a workaround are you talking about? Sorry, to me > it all looked like "UMS is being ignored forever". Anyway, > let's move on to try to solve the issue. > > > Given the failing address passed to ioread32, this seems like > > it's probably the call to READ_BREADCRUMB -- > > I915_BREADCRUMB_INDEX is 0x21, which is an offset in 32-bit > > units within the hardware status page. If the > > status_page.page_addr value was zero, then the computed > > address would end up being 0x84. > > > > And, it looks like status_page.page_addr *will* end up being > > zero as a result of the patch in question. The patch resets > > the entire ring structure contents back to the initial > > values, which includes smashing the status_page structure to > > zero, clearing the value of status_page.page_addr set in > > i915_init_phys_hws. > > > > Here's an untested patch which moves the initialization of > > status_page.page_addr into intel_render_ring_init_dri. I note > > that intel_init_render_ring_buffer *already* has the setting > > of the status_page.page_addr value, and so I've removed the > > setting of status_page.page_addr from i915_init_phys_hws. > > > > I suspect we could remove the memset from > > intel_init_render_ring_buffer; it seems entirely superfluous > > given the memset in i915_init_phys_hws. > > > > From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 > > 00:00:00 2001 From: Keith Packard keithp@keithp.com > > Date: Fri, 22 Jul 2011 10:44:39 -0700 > > Subject: [PATCH] drm/i915: Initialize RCS ring status page > > address in > > > > intel_render_ring_init_dri > > > > Physically-addressed hardware status pages are initialized > > early in the driver load process by i915_init_phys_hws. For > > UMS environments, the ring structure is not initialized > > until the X server starts. At that point, the entire ring > > structure is re-initialized with all new values. Any values > > set in the ring structure (including > > ring->status_page.page_addr) will be lost when the ring is > > re-initialized. > > > > This patch moves the initialization of the > > status_page.page_addr value to intel_render_ring_init_dri. > > > > Signed-off-by: Keith Packard keithp@keithp.com > > --- > > > > drivers/gpu/drm/i915/i915_dma.c | 6 ++---- > > drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++ > > 2 files changed, 5 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c > > b/drivers/gpu/drm/i915/i915_dma.c index 1271282..8a3942c > > 100644 --- a/drivers/gpu/drm/i915/i915_dma.c > > +++ b/drivers/gpu/drm/i915/i915_dma.c > > @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct > > drm_device *dev) > > > > static int i915_init_phys_hws(struct drm_device *dev) > > { > > > > drm_i915_private_t *dev_priv = dev->dev_private; > > > > - struct intel_ring_buffer *ring = LP_RING(dev_priv); > > > > /* Program Hardware Status Page */ > > dev_priv->status_page_dmah = > > > > @@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct > > drm_device *dev) > > > > DRM_ERROR("Can not allocate hardware status page\n"); > > return -ENOMEM; > > > > } > > > > - ring->status_page.page_addr = > > - (void __force __iomem *)dev_priv->status_page_dmah-
vaddr;
> > > > - memset_io(ring->status_page.page_addr, 0, PAGE_SIZE); > > + memset_io((void __force __iomem > > *)dev_priv->status_page_dmah->vaddr, + 0, PAGE_SIZE); > > > > i915_write_hws_pga(dev); > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c > > b/drivers/gpu/drm/i915/intel_ringbuffer.c index > > e961568..47b9b27 100644 > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > > @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct > > drm_device *dev, u64 start, u32 size) > > > > ring->get_seqno = pc_render_get_seqno; > > > > } > > > > + if (!I915_NEED_GFX_HWS(dev)) > > + ring->status_page.page_addr = > > dev_priv->status_page_dmah->vaddr; + > > > > ring->dev = dev; > > INIT_LIST_HEAD(&ring->active_list); > > INIT_LIST_HEAD(&ring->request_list); > > I can't tell whether this is correct, because intel gfx driver > is unknown to me, but from the first glance your description > sounds reasonable. > > I'm out of office till ~ next week's tuesday, and on return > I'll try to test it on the hardware in question.
Keith, thanks again for the patch. As promised I've tested it on the hardware in question and yes, bad_access is gone and X seems to work, so thank you, but...
I see there are more such bugs in introduced-in-guilty-patch intel_render_ring_init_dri(). For example ring->irq_queue is left uninitialized and also ring->irq_lock etc...
I'm X newbie, so if here is something stupid X-wise, please don't beat me too hard, but to me the gist of the problem is the original patch, where Chris does
( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 )
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c > b/drivers/gpu/drm/i915/intel_ringbuffer.c index > 03e3370..51fbc5e 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct > drm_device *dev) > > return intel_init_ring_buffer(dev, ring); > > } > > +int intel_render_ring_init_dri(struct drm_device *dev, u64 > start, u32 size) +{ > + drm_i915_private_t *dev_priv = dev->dev_private; > + struct intel_ring_buffer *ring = &dev_priv->ring[RCS]; > + > + *ring = render_ring; > ^^^^^^^^^^^^^^^^^^^ here resets > > + if (INTEL_INFO(dev)->gen >= 6) { > + ring->add_request = gen6_add_request; > + ring->irq_get = gen6_render_ring_get_irq; > + ring->irq_put = gen6_render_ring_put_irq; > + } else if (IS_GEN5(dev)) { > + ring->add_request = pc_render_add_request; > + ring->get_seqno = pc_render_get_seqno; > + }
and then the rest of the `ring` is initialized seemingly copy-pasted
from intel_init_ring_buffer(): > + ring->dev = dev; > + INIT_LIST_HEAD(&ring->active_list); > + INIT_LIST_HEAD(&ring->request_list); > + INIT_LIST_HEAD(&ring->gpu_write_list); > + > + ring->size = size; > + ring->effective_size = ring->size; > + if (IS_I830(ring->dev)) > + ring->effective_size -= 128; > + > + ring->map.offset = start; > + ring->map.size = size; > + ring->map.type = 0; > + ring->map.flags = 0; > + ring->map.mtrr = 0;
...
where both 3 chunks go almost exactly from intel_init_ring_buffer(), and ring->effective_size tweak even stripped original comment:
# original version from intel_init_ring_buffer(): /* Workaround an erratum on the i830 which causes a hang if
* the TAIL pointer points to within the last 2 cachelines * of the buffer. */ ring->effective_size = ring->size; if (IS_I830(ring->dev)) ring->effective_size -= 128;
...
The line marked "here resets" resets all the fields, and maybe it's not a good idea to re-initialize them all afterwards (missing some as this thread show), or at least if it is really needed, share initialization code between intel_render_ring_init_dri() and intel_init_ring_buffer() ?
>From the outside it looks like the offending patch was done as a >quick
fix in a hurry (lots of copy-paste), and maybe it would be better to re-do it properly...
Silence... ?
I read UMS is still ignored, because e.g. that uninitialized ring->irq_lock which I've wrote about above is for sure used e.g. in gen6_render_ring_get_irq() added to ring vtable in intel_render_ring_init_dri().
I really doubt that UMS supports gen6 hardware.
Then why it is there in intel_render_ring_init_dri(): int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32
size) {
drm_i915_private_t *dev_priv = dev->dev_private; struct intel_ring_buffer *ring = &dev_priv->ring[RCS]; *ring = render_ring; if (INTEL_INFO(dev)->gen >= 6) {
This branch executes only when hw generation is 6 or newer.
and adds gen6_render_ring_get_irq() to vtable which uses ring->irq_lock which is left uninitialized.
I don't understand what you were trying to say. How does it matter if some branch executes only for such-and-such hardware, when this branch contains bugs? Could you please clarify?
I want to say that xf86-video-intel with gen6 support does not support UMS. So you can't even hit this "bug".
On Tue, Aug 09, 2011 at 07:02:59PM +0300, Vasily Khoruzhick wrote:
On Tuesday 09 August 2011 18:34:46 Kirill Smelkov wrote:
On Tue, Aug 09, 2011 at 06:09:57PM +0300, Vasily Khoruzhick wrote:
On Tuesday 09 August 2011 17:47:56 Kirill Smelkov wrote:
On Tue, Aug 09, 2011 at 05:00:52PM +0300, Vasily Khoruzhick wrote:
On Tuesday 09 August 2011 15:08:03 Kirill Smelkov wrote:
On Tue, Jul 26, 2011 at 05:48:27PM +0400, Kirill Smelkov wrote: > On Sat, Jul 23, 2011 at 12:23:36AM +0400, Kirill Smelkov wrote: > > Keith, > > > > first of all thanks for your prompt reply. Then... > > > > On Fri, Jul 22, 2011 at 11:00:41AM -0700, Keith Packard wrote: > > > On Fri, 22 Jul 2011 15:08:06 +0400, Kirill Smelkov > > > kirr@mns.spb.ru
wrote:
> > > > And now after v3.0 is out, I've tested it again, and yes, > > > > like it was broken on v3.0-rc5, it is (now even more) > > > > broken on v3.0 -- after first > > > > > > > bad io access the system freezes completely: > > > I looked at this when I first saw it (a couple of weeks ago), > > > and I couldn't see any obvious reason this patch would cause > > > this particular problem. I didn't want to revert the patch > > > at that point as I feared it would cause other subtle > > > problems. Given that you've got a work-around, it seemed > > > best to just push this off past 3.0. > > > > What kind of a workaround are you talking about? Sorry, to me > > it all looked like "UMS is being ignored forever". Anyway, > > let's move on to try to solve the issue. > > > > > Given the failing address passed to ioread32, this seems like > > > it's probably the call to READ_BREADCRUMB -- > > > I915_BREADCRUMB_INDEX is 0x21, which is an offset in 32-bit > > > units within the hardware status page. If the > > > status_page.page_addr value was zero, then the computed > > > address would end up being 0x84. > > > > > > And, it looks like status_page.page_addr *will* end up being > > > zero as a result of the patch in question. The patch resets > > > the entire ring structure contents back to the initial > > > values, which includes smashing the status_page structure to > > > zero, clearing the value of status_page.page_addr set in > > > i915_init_phys_hws. > > > > > > Here's an untested patch which moves the initialization of > > > status_page.page_addr into intel_render_ring_init_dri. I note > > > that intel_init_render_ring_buffer *already* has the setting > > > of the status_page.page_addr value, and so I've removed the > > > setting of status_page.page_addr from i915_init_phys_hws. > > > > > > I suspect we could remove the memset from > > > intel_init_render_ring_buffer; it seems entirely superfluous > > > given the memset in i915_init_phys_hws. > > > > > > From 159ba1dd207fc52590ce8a3afd83f40bd2cedf46 Mon Sep 17 > > > 00:00:00 2001 From: Keith Packard keithp@keithp.com > > > Date: Fri, 22 Jul 2011 10:44:39 -0700 > > > Subject: [PATCH] drm/i915: Initialize RCS ring status page > > > address in > > > > > > intel_render_ring_init_dri > > > > > > Physically-addressed hardware status pages are initialized > > > early in the driver load process by i915_init_phys_hws. For > > > UMS environments, the ring structure is not initialized > > > until the X server starts. At that point, the entire ring > > > structure is re-initialized with all new values. Any values > > > set in the ring structure (including > > > ring->status_page.page_addr) will be lost when the ring is > > > re-initialized. > > > > > > This patch moves the initialization of the > > > status_page.page_addr value to intel_render_ring_init_dri. > > > > > > Signed-off-by: Keith Packard keithp@keithp.com > > > --- > > > > > > drivers/gpu/drm/i915/i915_dma.c | 6 ++---- > > > drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++ > > > 2 files changed, 5 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c > > > b/drivers/gpu/drm/i915/i915_dma.c index 1271282..8a3942c > > > 100644 --- a/drivers/gpu/drm/i915/i915_dma.c > > > +++ b/drivers/gpu/drm/i915/i915_dma.c > > > @@ -61,7 +61,6 @@ static void i915_write_hws_pga(struct > > > drm_device *dev) > > > > > > static int i915_init_phys_hws(struct drm_device *dev) > > > { > > > > > > drm_i915_private_t *dev_priv = dev->dev_private; > > > > > > - struct intel_ring_buffer *ring = LP_RING(dev_priv); > > > > > > /* Program Hardware Status Page */ > > > dev_priv->status_page_dmah = > > > > > > @@ -71,10 +70,9 @@ static int i915_init_phys_hws(struct > > > drm_device *dev) > > > > > > DRM_ERROR("Can not allocate hardware status page\n"); > > > return -ENOMEM; > > > > > > } > > > > > > - ring->status_page.page_addr = > > > - (void __force __iomem *)dev_priv->status_page_dmah-
vaddr;
> > > > > > - memset_io(ring->status_page.page_addr, 0, PAGE_SIZE); > > > + memset_io((void __force __iomem > > > *)dev_priv->status_page_dmah->vaddr, + 0, PAGE_SIZE); > > > > > > i915_write_hws_pga(dev); > > > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c > > > b/drivers/gpu/drm/i915/intel_ringbuffer.c index > > > e961568..47b9b27 100644 > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > > > @@ -1321,6 +1321,9 @@ int intel_render_ring_init_dri(struct > > > drm_device *dev, u64 start, u32 size) > > > > > > ring->get_seqno = pc_render_get_seqno; > > > > > > } > > > > > > + if (!I915_NEED_GFX_HWS(dev)) > > > + ring->status_page.page_addr = > > > dev_priv->status_page_dmah->vaddr; + > > > > > > ring->dev = dev; > > > INIT_LIST_HEAD(&ring->active_list); > > > INIT_LIST_HEAD(&ring->request_list); > > > > I can't tell whether this is correct, because intel gfx driver > > is unknown to me, but from the first glance your description > > sounds reasonable. > > > > I'm out of office till ~ next week's tuesday, and on return > > I'll try to test it on the hardware in question. > > Keith, thanks again for the patch. As promised I've tested it on > the hardware in question and yes, bad_access is gone and X seems > to work, so thank you, but... > > > I see there are more such bugs in introduced-in-guilty-patch > intel_render_ring_init_dri(). For example ring->irq_queue is > left uninitialized and also ring->irq_lock etc... > > > I'm X newbie, so if here is something stupid X-wise, please don't > beat me too hard, but to me the gist of the problem is the > original patch, where Chris does > > ( git show e8616b6ced6137085e6657cc63bc2fe3900b8616 ) > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c > > b/drivers/gpu/drm/i915/intel_ringbuffer.c index > > 03e3370..51fbc5e 100644 > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > > @@ -1291,6 +1291,48 @@ int intel_init_render_ring_buffer(struct > > drm_device *dev) > > > > return intel_init_ring_buffer(dev, ring); > > > > } > > > > +int intel_render_ring_init_dri(struct drm_device *dev, u64 > > start, u32 size) +{ > > + drm_i915_private_t *dev_priv = dev->dev_private; > > + struct intel_ring_buffer *ring = &dev_priv->ring[RCS]; > > + > > + *ring = render_ring; > > > ^^^^^^^^^^^^^^^^^^^ > here resets > > > > + if (INTEL_INFO(dev)->gen >= 6) { > > + ring->add_request = gen6_add_request; > > + ring->irq_get = gen6_render_ring_get_irq; > > + ring->irq_put = gen6_render_ring_put_irq; > > + } else if (IS_GEN5(dev)) { > > + ring->add_request = pc_render_add_request; > > + ring->get_seqno = pc_render_get_seqno; > > + } > > and then the rest of the `ring` is initialized seemingly > copy-pasted > > from intel_init_ring_buffer(): > > + ring->dev = dev; > > + INIT_LIST_HEAD(&ring->active_list); > > + INIT_LIST_HEAD(&ring->request_list); > > + INIT_LIST_HEAD(&ring->gpu_write_list); > > + > > + ring->size = size; > > + ring->effective_size = ring->size; > > + if (IS_I830(ring->dev)) > > + ring->effective_size -= 128; > > + > > + ring->map.offset = start; > > + ring->map.size = size; > > + ring->map.type = 0; > > + ring->map.flags = 0; > > + ring->map.mtrr = 0; > > ... > > where both 3 chunks go almost exactly from > intel_init_ring_buffer(), and ring->effective_size tweak even > stripped original comment: > > # original version from intel_init_ring_buffer(): > /* Workaround an erratum on the i830 which causes a hang > if > > * the TAIL pointer points to within the last 2 > cachelines * of the buffer. > */ > > ring->effective_size = ring->size; > if (IS_I830(ring->dev)) > > ring->effective_size -= 128; > > ... > > > The line marked "here resets" resets all the fields, and maybe > it's not a good idea to re-initialize them all afterwards > (missing some as this thread show), or at least if it is really > needed, share initialization code between > intel_render_ring_init_dri() and intel_init_ring_buffer() ? > > >From the outside it looks like the offending patch was done as a > >quick > > fix in a hurry (lots of copy-paste), and maybe it would be better > to re-do it properly...
Silence... ?
I read UMS is still ignored, because e.g. that uninitialized ring->irq_lock which I've wrote about above is for sure used e.g. in gen6_render_ring_get_irq() added to ring vtable in intel_render_ring_init_dri().
I really doubt that UMS supports gen6 hardware.
Then why it is there in intel_render_ring_init_dri(): int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32
size) {
drm_i915_private_t *dev_priv = dev->dev_private; struct intel_ring_buffer *ring = &dev_priv->ring[RCS]; *ring = render_ring; if (INTEL_INFO(dev)->gen >= 6) {
This branch executes only when hw generation is 6 or newer.
and adds gen6_render_ring_get_irq() to vtable which uses ring->irq_lock which is left uninitialized.
I don't understand what you were trying to say. How does it matter if some branch executes only for such-and-such hardware, when this branch contains bugs? Could you please clarify?
I want to say that xf86-video-intel with gen6 support does not support UMS. So you can't even hit this "bug".
Ok, but so then there is a dead code in the kernel, right? Or not dead at all because potentially some non-X userspace could trigger the bug.
Why it was added in the first place?
To me, intel_render_ring_init_dri() looks like being copy-pasted from several places in a hurry. And I was already beaten by one bug introduced in it, without a single response for 3 kernel cycles though I've asked for help several times and provided detailed info.
Finally Keith analyzed and plugged NULL-pointer dereference (thanks) but I'm telling, it seems there are more bugs introduced in e8616b6c.
The patch title says "drm/i915: Initialise ring vfuncs for old DRI paths" and one could ask, why couldn't it be done without bugs and regressions. Are we waiting for another one hitting left bugs instead of fix them in the first place?
Quite frankly, I don't understand intel-gfx developers attitude: why is it me, just random user who is nitpicking here? Why there is no interest/will to analyze now obviously buggy/duplicate code and fix it?
If support for UMS/old-dri/whatever is dropped, could you please say so and clean the driver from legacy code and move on. That would be at least fair for people not hoping their old setups will continue to work.
Thanks, Kirill
On Tue, Aug 9, 2011 at 9:32 AM, Kirill Smelkov kirr@mns.spb.ru wrote:
Quite frankly, I don't understand intel-gfx developers attitude: why is it me, just random user who is nitpicking here? Why there is no interest/will to analyze now obviously buggy/duplicate code and fix it?
Because they don't have an infinite amount of manpower. Actual bugs hitting actual users take precedence over 'cleanups' which always have a chance of causing regressions, as you're well aware. Code churn for the sake of abstract prettiness is discouraged, as it has a potential cost for little potential gain.
If you like, submit a patch. You may now be more up-to-date on those particular code paths than most of the intel-gfx developers.
On Tue, Aug 09, 2011 at 09:56:01AM -0700, Ray Lee wrote:
On Tue, Aug 9, 2011 at 9:32 AM, Kirill Smelkov kirr@mns.spb.ru wrote:
Quite frankly, I don't understand intel-gfx developers attitude: why is it me, just random user who is nitpicking here? Why there is no interest/will to analyze now obviously buggy/duplicate code and fix it?
Because they don't have an infinite amount of manpower. Actual bugs hitting actual users take precedence over 'cleanups' which always have a chance of causing regressions, as you're well aware. Code churn for the sake of abstract prettiness is discouraged, as it has a potential cost for little potential gain.
If you like, submit a patch. You may now be more up-to-date on those particular code paths than most of the intel-gfx developers.
Ray, I'd agree with you if the topic was about cleanups.
But here I was talking about copy-pasty commit which introduced regressions and bugs, and if now it's a user dilemma to either "clean up" it after developers himself, or accept that something is broken because developers lack manpower and so plug things in a hurry increasing entropy, I'd like to remind a good rule, at least to me one more time, not to break things in the first place.
I'm not talking about cleanup here. I'm talking about original commit which introduced problems, and that there is no need to clean it up, but better revert and redo properly to avoid subsequent code churn in lots of fixes.
Sorry, I won't submit a patch. If there is a need to find/fix/cleanup obvious things after company's developers, I have better things to do, and a todo item to re-evaluate hardware for my next project.
Thanks, Kirill
On Tue, Aug 9, 2011 at 10:40 AM, Kirill Smelkov kirr@mns.spb.ru wrote:
If you like, submit a patch. You may now be more up-to-date on those particular code paths than most of the intel-gfx developers.
Ray, I'd agree with you if the topic was about cleanups.
At this point it is about cleanups unless Keith's patch upthread does not work for you. Does it or not?
On Tue, Aug 09, 2011 at 10:43:08AM -0700, Ray Lee wrote:
On Tue, Aug 9, 2011 at 10:40 AM, Kirill Smelkov kirr@mns.spb.ru wrote:
If you like, submit a patch. You may now be more up-to-date on those particular code paths than most of the intel-gfx developers.
Ray, I'd agree with you if the topic was about cleanups.
At this point it is about cleanups unless Keith's patch upthread does not work for you. Does it or not?
I've already wrote two weeks ago it does, but if this needs to be restated one more time here it is: Keith's patch fixes the problem in a sense that X now starts and seemingly works (thanks), but several issues remain to be there imho. I've got the message, if it's ok for intel-gfx to leave them as is - it's ok for me.
Peace, Kirill
Sorry, I won't submit a patch. If there is a need to find/fix/cleanup obvious things after company's developers, I have better things to do, and a todo item to re-evaluate hardware for my next project.
You seem confused. If you have a support contract of some form with a Linux supplier or Intel please contact your support. This mailing list isn't for support services.
Alan
On Wed, Aug 10, 2011 at 10:41:44AM +0100, Alan Cox wrote:
Sorry, I won't submit a patch. If there is a need to find/fix/cleanup obvious things after company's developers, I have better things to do, and a todo item to re-evaluate hardware for my next project.
You seem confused. If you have a support contract of some form with a Linux supplier or Intel please contact your support. This mailing list isn't for support services.
Thanks for clarifying.
dri-devel@lists.freedesktop.org