Hello,
Since 2.6.38-rc I get screen corruption (mostly horizontal grabage stripes on the right side of the screen). After a long time bisecting the offending commit ends up being:
commit a00b10c360b35d6431a94cbf130a4e162870d661 Author: Chris Wilson chris@chris-wilson.co.uk Date: Fri Sep 24 21:15:47 2010 +0100
drm/i915: Only enforce fence limits inside the GTT.
So long as we adhere to the fence registers rules for alignment and no overlaps (including with unfenced accesses to linear memory) and account for the tiled access in our size allocation, we do not have to allocate the full fenced region for the object. This allows us to fight the bloat tiling imposed on pre-i965 chipsets and frees up RAM for real use. [Inside the GTT we still suffer the additional alignment constraints, so it doesn't magic allow us to render larger scenes without stalls -- we need the expanded GTT and fence pipelining to overcome those...]
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
This commit caused other problems too, which Daniel tried to fix with commits:
5e78330126e23e00950 drm/i915: fix relaxed tiling for gen <= 3 && !g33 75e9e9158f38e5cb21e drm/i915: kill mappable/fenceable disdinction 818f2a3cc34b0673dcc drm/i915: revert pageflip/mappable related abi breakage
But those don't fix my screen corruption.
Unfortunately, it's a big commit and it doesn't revert cleanly, and its size makes it unclear what the source of the problem is. Daniel's commits don't revert cleanly either, so reverting all of them didn't work.
I'll start poking at it and see if I can find anything.
Greetings,
Indan
Hi Indan,
Please provide the usual details about your system (especially what gpu this is on). Also, screenshots of what typical corruptions look like can help a lot in tracking down such things.
I've created two quick patches to check a few theories, please test them (both patches independently and both together). Patches attached.
Yours, Daniel
On Sat, Feb 19, 2011 at 06:58:06AM +0100, Indan Zupancic wrote:
Hello,
Since 2.6.38-rc I get screen corruption (mostly horizontal grabage stripes on the right side of the screen). After a long time bisecting the offending commit ends up being:
commit a00b10c360b35d6431a94cbf130a4e162870d661 Author: Chris Wilson chris@chris-wilson.co.uk Date: Fri Sep 24 21:15:47 2010 +0100
drm/i915: Only enforce fence limits inside the GTT. So long as we adhere to the fence registers rules for alignment and no overlaps (including with unfenced accesses to linear memory) and account for the tiled access in our size allocation, we do not have to allocate the full fenced region for the object. This allows us to fight the bloat tiling imposed on pre-i965 chipsets and frees up RAM for real use. [Inside the GTT we still suffer the additional alignment constraints, so it doesn't magic allow us to render larger scenes without stalls -- we need the expanded GTT and fence pipelining to overcome those...] Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This commit caused other problems too, which Daniel tried to fix with commits:
5e78330126e23e00950 drm/i915: fix relaxed tiling for gen <= 3 && !g33 75e9e9158f38e5cb21e drm/i915: kill mappable/fenceable disdinction 818f2a3cc34b0673dcc drm/i915: revert pageflip/mappable related abi breakage
But those don't fix my screen corruption.
Unfortunately, it's a big commit and it doesn't revert cleanly, and its size makes it unclear what the source of the problem is. Daniel's commits don't revert cleanly either, so reverting all of them didn't work.
I'll start poking at it and see if I can find anything.
Greetings,
Indan
Hi,
On Sat, February 19, 2011 19:25, Daniel Vetter wrote:
Hi Indan,
Please provide the usual details about your system (especially what gpu this is on). Also, screenshots of what typical corruptions look like can help a lot in tracking down such things.
Sorry, I plainly forgot to mention that. I'm on my Thinkpad X40 with gen 2 hardware, 855GM.
I uploaded a screenshot of Firefox to: http://img406.imageshack.us/img406/9159/31827167.png
It happens a lot when scrolling in FF on pages with images, pageup or pagedown don't exhibit it. Text gets corrupted as well, but in a less predictable way. It must be something that FF does that triggers it easily, I haven't managed to get it with other programs yet, though I do see some corruption in the window decoration too sometimes (which is Fluxbox). Scrolling horizontally doesn't show the same behaviour. The corruptions seems to be happening on the right side of a surface/window.
Forcing a refresh makes it go away again (e.g. switching to another window or opening another window on top of it. Moving a window doesn't though). I get it with and without xcompmgr running.
Okay, interesting titbit: For text corruption, the stripes are one pixel high for the default font size, no corruption for smaller sizes than the default, two pixel lines for 1 ctrl+ bigger than the default, 3 pixels for +2 etc. With bigger sizes (>=5) I get less corruption to the left of the text, but more in the text itself. The distance between the lines continues to increase, but the stripes thickness goes back to 1 pixel.
Perhaps an off-by-one error somewhere?
I've created two quick patches to check a few theories, please test them (both patches independently and both together). Patches attached.
Tried with both applied, doesn't change anything.
Greetings,
Indan
Huh, no idea why it didn't go to Daniel the first time, Squirrelmail never did that before.
On Sun, February 20, 2011 03:20, Indan Zupancic wrote:
Hi,
On Sat, February 19, 2011 19:25, Daniel Vetter wrote:
Hi Indan,
Please provide the usual details about your system (especially what gpu this is on). Also, screenshots of what typical corruptions look like can help a lot in tracking down such things.
Sorry, I plainly forgot to mention that. I'm on my Thinkpad X40 with gen 2 hardware, 855GM.
I uploaded a screenshot of Firefox to: http://img406.imageshack.us/img406/9159/31827167.png
It happens a lot when scrolling in FF on pages with images, pageup or pagedown don't exhibit it. Text gets corrupted as well, but in a less predictable way. It must be something that FF does that triggers it easily, I haven't managed to get it with other programs yet, though I do see some corruption in the window decoration too sometimes (which is Fluxbox). Scrolling horizontally doesn't show the same behaviour. The corruptions seems to be happening on the right side of a surface/window.
Forcing a refresh makes it go away again (e.g. switching to another window or opening another window on top of it. Moving a window doesn't though). I get it with and without xcompmgr running.
Okay, interesting titbit: For text corruption, the stripes are one pixel high for the default font size, no corruption for smaller sizes than the default, two pixel lines for 1 ctrl+ bigger than the default, 3 pixels for +2 etc. With bigger sizes (>=5) I get less corruption to the left of the text, but more in the text itself. The distance between the lines continues to increase, but the stripes thickness goes back to 1 pixel.
Perhaps an off-by-one error somewhere?
I've created two quick patches to check a few theories, please test them (both patches independently and both together). Patches attached.
Tried with both applied, doesn't change anything.
Greetings,
Indan
Hi,
On Sun, Feb 20, 2011 at 03:20:15AM +0100, Indan Zupancic wrote:
Sorry, I plainly forgot to mention that. I'm on my Thinkpad X40 with gen 2 hardware, 855GM.
Ah, craptastic i855gm. Is this with the HIC poke patch?
<snip>
I've created two quick patches to check a few theories, please test them (both patches independently and both together). Patches attached.
Tried with both applied, doesn't change anything.
Well, these two patches completely undo the patch by Chris Wilson (and I've re-reviewed it to check whether there are any other hidden bugs in it). I suspect that your bisect went wrong because the range a00b10c360b35d6431a94cb..5e78330126e23e00950 is known to be broken on your hw. Could you recheck these two endpoints to confirm that breakage was introduce in that range? If so, could you rebase this range and squash the three patches by me you've mentioned in the original report into the patch by Chris and re-run git bisect on this new branch?
Thanks, Daniel
Hi Daniel,
On Sun, February 20, 2011 10:20, Daniel Vetter wrote:
On Sun, Feb 20, 2011 at 03:20:15AM +0100, Indan Zupancic wrote:
Sorry, I plainly forgot to mention that. I'm on my Thinkpad X40 with gen 2 hardware, 855GM.
Ah, craptastic i855gm. Is this with the HIC poke patch?
Yeah, that one. That's what I first suspected, so I tried both with and without it, but this is something new.
Well, these two patches completely undo the patch by Chris Wilson (and I've re-reviewed it to check whether there are any other hidden bugs in it).
I can't find anything dodgy in it either, so it's either somewhere else or really subtle. I certainly don't see much gen2 specific stuff.
I suspect that your bisect went wrong because the range a00b10c360b35d6431a94cb..5e78330126e23e00950 is known to be broken on your hw.
It mostly worked after a bit of twiddling (after that I didn't need to skip commits any more), and the kind of corruption was the same too.
Only way I can see that this would make a difference is when a00b10c3 created the bug, the bug got fixed by your patches, but something else introduced it between that range in a different way. This actually makes sense when it's re-introduced by a bad merge. But then it should show up in the current code. If not for a bad merge it seems unlikely though.
Could you recheck these two endpoints to confirm that breakage was introduce in that range?
I'm pretty sure the breakage was introduced starting with a00b10c3, because the commit just before it (746537) works fine. so yes, it's in that range. I was getting a bit tired and less sharp though, so perhaps I made a mistake somewhere. I'll recheck, probably tomorrow (getting a bit late here).
If so, could you rebase this range and squash the three patches by me you've mentioned in the original report into the patch by Chris and re-run git bisect on this new branch?
As I have to apply patches anyway, it seems easier to just add your patches to the bunch, if they apply...
Annoying bug, it seems I'm always hitting the tricky ones.
Looking at: gitk a00b10c360b35d6431a94cb..5e78330126e23e00950 \ drivers/char/agp/ drivers/gpu/drm/i915/
There were three merges around that area, so it seems a bit messy and something might have gone wrong when merging there. But the right branch seems okay because all later good bisections were from there.
---
I just saw your new email with a new patch to try, I'll try that one first and report back, before digging further into all this.
Greetings,
Indan
---
First bisection info:
When I did the bisect I had to fix a compile error and an OOPS, patches I applied were:
commit ff75b9bc489710ff223bc0d805bf3b862dc347ea Author: Chris Wilson chris@chris-wilson.co.uk Date: Sat Oct 30 22:52:31 2010 +0100
drm/i915: Fix typo from e5281ccd in i915_gem_attach_phys_object()
Accessing the uninitialised obj->pages instead of the local page lead to an OOPs.
Reported-by: Xavier Chantry chantry.xavier@gmail.com Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 936ddd8..e3fc333 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -4925,7 +4925,7 @@ i915_gem_attach_phys_object(struct drm_device *dev, if (IS_ERR(page)) return PTR_ERR(page);
- src = kmap_atomic(obj_priv->pages[i]); + src = kmap_atomic(page); dst = obj_priv->phys_obj->handle->vaddr + (i * PAGE_SIZE); memcpy(dst, src, PAGE_SIZE); kunmap_atomic(src);
diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c index 75a3d7f..e1cc56a 100644 --- a/arch/x86/mm/iomap_32.c +++ b/arch/x86/mm/iomap_32.c @@ -61,7 +61,8 @@ void *kmap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
pagefault_disable();
- type = kmap_atomic_idx_push(); +// type = kmap_atomic_idx_push(); + type = 0; idx = type + KM_TYPE_NR * smp_processor_id(); vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx); set_pte(kmap_pte - idx, pfn_pte(pfn, prot)); @@ -98,7 +99,8 @@ iounmap_atomic(void __iomem *kvaddr) vaddr <= __fix_to_virt(FIX_KMAP_BEGIN)) { int idx, type;
- type = kmap_atomic_idx_pop(); +// type = kmap_atomic_idx_pop(); + type = 0; idx = type + KM_TYPE_NR * smp_processor_id();
#ifdef CONFIG_DEBUG_HIGHMEM
With these patches it worked for me and the only difference was that corruption showing up or not, and the kind of corruption seemed the same too.
I ran bisect on the drm/i915 and char/agp paths. My bisect log was:
# bad: [d2478521afc20227658a10a8c5c2bf1a2aa615b3] char/ipmi: fix OOPS caused by pnp_unregister_driver on unregistered driver # good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37 git bisect start 'd2478521afc20227658a10a8c5c2bf1a2aa615b3' 'v2.6.37' 'drivers/gpu/drm/i915/' 'drivers/char/agp/' # bad: [c6748e09eed072be077fe583976516b76daf42ec] drm/i915: Remove inactive LRU tracking from set_domain_ioctl git bisect bad c6748e09eed072be077fe583976516b76daf42ec # bad: [e624ae8e0d4243e71daedce7570e91290438eaca] Merge branch 'drm-intel-fixes' into drm-intel-next git bisect bad e624ae8e0d4243e71daedce7570e91290438eaca # good: [4ab0fbd3a29067e1540f05093ae4ed07645d18c8] Merge remote branch 'linus' into drm-intel-fixes git bisect good 4ab0fbd3a29067e1540f05093ae4ed07645d18c8 # good: [1bb95834bbcdc969e477a9284cf96c17a4c2616f] Merge remote branch 'airlied/drm-fixes' into drm-intel-fixes git bisect good 1bb95834bbcdc969e477a9284cf96c17a4c2616f # bad: [c94f28c383f58c9de74678e0f1624db9c5f8a8cb] Merge branch 'drm-intel-fixes' into drm-intel-next git bisect bad c94f28c383f58c9de74678e0f1624db9c5f8a8cb # bad: [46168f39360f419e59952d58cd08a862886ec8cd] Merge branch 'drm-intel-fixes' into drm-intel-next git bisect bad 46168f39360f419e59952d58cd08a862886ec8cd # good: [16a02cf08a2de0863daf7ebb91718d7c6bbe7f9c] agp/intel: fix cache control for sandybridge git bisect good 16a02cf08a2de0863daf7ebb91718d7c6bbe7f9c # bad: [ff75b9bc489710ff223bc0d805bf3b862dc347ea] drm/i915: Fix typo from e5281ccd in i915_gem_attach_phys_object() git bisect bad ff75b9bc489710ff223bc0d805bf3b862dc347ea # good: [dd2b379f071424f36f9f90ff83cb4ad058c7b6ed] drm/i915: Fix typo from "Enable DisplayPort Audio" git bisect good dd2b379f071424f36f9f90ff83cb4ad058c7b6ed # good: [b4ce0f85159f77f208a62930f67b4e548576a5a3] drm/i915: Use pci_iomap for remapping the MMIO registers. git bisect good b4ce0f85159f77f208a62930f67b4e548576a5a3 # good: [39a01d1fb63cf8ebc1a8cf436f5c0ba9657b55c6] drm/i915: Remove mmap_offset git bisect good 39a01d1fb63cf8ebc1a8cf436f5c0ba9657b55c6 # good: [e5281ccd2e0049e2b9e8ce82449630d25082372d] drm/i915: Eliminate nested get/put pages git bisect good e5281ccd2e0049e2b9e8ce82449630d25082372d # good: [e5281ccd2e0049e2b9e8ce82449630d25082372d] drm/i915: Eliminate nested get/put pages git bisect good e5281ccd2e0049e2b9e8ce82449630d25082372d # bad: [e380f60b22eddec7825224b8d788572c82b63161] agp/intel: Sandybridge doesn't require GMCH enabling git bisect bad e380f60b22eddec7825224b8d788572c82b63161 # bad: [e380f60b22eddec7825224b8d788572c82b63161] agp/intel: Sandybridge doesn't require GMCH enabling git bisect bad e380f60b22eddec7825224b8d788572c82b63161 # bad: [a00b10c360b35d6431a94cbf130a4e162870d661] drm/i915: Only enforce fence limits inside the GTT. git bisect bad a00b10c360b35d6431a94cbf130a4e162870d661 # good: [4a684a4117abd756291969336af454e8a958802f] drm/i915: Kill GTT mappings when moving from GTT domain git bisect good 4a684a4117abd756291969336af454e8a958802f # good: [7465378fd7c681f6cf2b74b3494c4f0991d8c8ac] drm/i915: Convert BUG_ON(pin_count) from an impossible condition git bisect good 7465378fd7c681f6cf2b74b3494c4f0991d8c8ac
Or in condensed form:
- d24785 char/ipmi: fix OOPS caused by pnp_unregister_driver on unregistered driver + 3c0eee Linux 2.6.37 - c6748e drm/i915: Remove inactive LRU tracking from set_domain_ioctl - e624ae Merge branch 'drm-intel-fixes' into drm-intel-next + 4ab0fb Merge remote branch 'linus' into drm-intel-fixes + 1bb958 Merge remote branch 'airlied/drm-fixes' into drm-intel-fixes - c94f28 Merge branch 'drm-intel-fixes' into drm-intel-next - 46168f Merge branch 'drm-intel-fixes' into drm-intel-next + 16a02c agp/intel: fix cache control for sandybridge - ff75b9 drm/i915: Fix typo from e5281ccd in i915_gem_attach_phys_object() + dd2b37 drm/i915: Fix typo from "Enable DisplayPort Audio" + b4ce0f drm/i915: Use pci_iomap for remapping the MMIO registers. + 39a01d drm/i915: Remove mmap_offset + e5281c drm/i915: Eliminate nested get/put pages - e380f6 agp/intel: Sandybridge doesn't require GMCH enabling - a00b10 drm/i915: Only enforce fence limits inside the GTT. + 4a684a drm/i915: Kill GTT mappings when moving from GTT domain + 746537 drm/i915: Convert BUG_ON(pin_count) from an impossible condition
Hi,
On Sun, Feb 20, 2011 at 03:20:15AM +0100, Indan Zupancic wrote:
I've created two quick patches to check a few theories, please test them (both patches independently and both together). Patches attached.
Tried with both applied, doesn't change anything.
I've just noticed that one of the patches (the 2nd one) doesn't work as advertised. Please retest with the attached one.
Yours, Daniel
Hi,
Good news:
On Sun, February 20, 2011 11:55, Daniel Vetter wrote:
I've created two quick patches to check a few theories, please test them (both patches independently and both together). Patches attached.
Tried with both applied, doesn't change anything.
I've just noticed that one of the patches (the 2nd one) doesn't work as advertised. Please retest with the attached one.
This one fixes it!
So ignore my other email, you nailed it. Thanks a heap Daniel.
Tomorrow I'll test more and double check, but it seems fixed.
Of course, looking at the patch, I guess it isn't generally applicable, so perhaps either more digging or a special exception for older chipsets is needed. I'm happy to test any further patches if that helps, triggering the corruption is really easy for me.
Thanks,
Indan
On Sun, Feb 20, 2011 at 12:19:14PM +0100, Indan Zupancic wrote:
Hi,
Good news:
On Sun, February 20, 2011 11:55, Daniel Vetter wrote:
I've created two quick patches to check a few theories, please test them (both patches independently and both together). Patches attached.
Tried with both applied, doesn't change anything.
I've just noticed that one of the patches (the 2nd one) doesn't work as advertised. Please retest with the attached one.
This one fixes it!
Well, don't start jumping around, yet. These patches are just to rule out some theories. Now: Is it fixed with just the 2nd patch alone or do you need both patches? This is very important, so please test extensively whether there are really no corruptions with just the 2nd patch.
Yours, Daniel
Hi,
On Sun, February 20, 2011 14:21, Daniel Vetter wrote:
Well, don't start jumping around, yet. These patches are just to rule out some theories.
I know, that's why I said I don't mind testing other patches.
Now: Is it fixed with just the 2nd patch alone or do you need both patches? This is very important, so please test extensively whether there are really no corruptions with just the 2nd patch.
I only applied the 2nd version of the 2nd patch, no other patches. The horizontal garbage stripes are gone for sure. It's a lot easier to prove that something doesn't work than to prove it does, but I'll keep running it for a while and see if I can spot any other badness. If it's hard to trigger it's also very hard to find out what causes it. Without your 2nd patch I always get the garbage on certain webpages, so I'm quite confident that the original source is located for this particular problem.
Greetings,
Indan
It looks like gen2 has a peculiar interleaved 2-row inter-tile layout. Probably inherited from i81x which had 2kb tiles (which naturally fit an even-number-of-tile-rows scheme to fit onto 4kb pages). There is no other mention of this in any docs (also not in the Intel internal documention according to Chris Wilson).
Problem manifests itself in corruptions in the second half of the last tile row (if the bo has an odd number of tiles). Which can only happen with relaxed tiling (introduced in a00b10c360b35d6431a9).
So reject set_tiling calls that don't satisfy this constrain to prevent broken userspace from causing havoc. While at it, also check the size for newer chipsets.
LKML: https://lkml.org/lkml/2011/2/19/5 Reported-by: Indan Zupancic indan@nul.nu Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch --- drivers/gpu/drm/i915/i915_gem_tiling.c | 16 +++++++++++++++- 1 files changed, 15 insertions(+), 1 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c index 22a32b9..79a04fd 100644 --- a/drivers/gpu/drm/i915/i915_gem_tiling.c +++ b/drivers/gpu/drm/i915/i915_gem_tiling.c @@ -184,7 +184,7 @@ i915_gem_detect_bit_6_swizzle(struct drm_device *dev) static bool i915_tiling_ok(struct drm_device *dev, int stride, int size, int tiling_mode) { - int tile_width; + int tile_width, tile_height;
/* Linear is always fine */ if (tiling_mode == I915_TILING_NONE) @@ -215,6 +215,20 @@ i915_tiling_ok(struct drm_device *dev, int stride, int size, int tiling_mode) } }
+ if (IS_GEN2(dev) || + (tiling_mode == I915_TILING_Y && HAS_128_BYTE_Y_TILING(dev))) + tile_height = 32; + else + tile_height = 8; + /* i8xx is strange: It has 2 interleaved rows of tiles, so needs an even + * number of tile rows. */ + if (IS_GEN2(dev)) + tile_height *= 2; + + /* Size needs to be aligned to a full tile row */ + if (size & (tile_height * stride - 1)) + return false; + /* 965+ just needs multiples of tile width */ if (INTEL_INFO(dev)->gen >= 4) { if (stride & (tile_width - 1))
On 22 February 2011 18:25, Daniel Vetter daniel.vetter@ffwll.ch wrote:
It looks like gen2 has a peculiar interleaved 2-row inter-tile layout. Probably inherited from i81x which had 2kb tiles (which naturally fit an even-number-of-tile-rows scheme to fit onto 4kb pages). There is no other mention of this in any docs (also not in the Intel internal documention according to Chris Wilson).
Giving that lack of documentation, could you put some more comments in the code? so that nobody cleans out that "strange workaround" in 6 monthes...
@@ -215,6 +215,20 @@ i915_tiling_ok(struct drm_device *dev, int stride, int size, int tiling_mode) } }
- if (IS_GEN2(dev) ||
- (tiling_mode == I915_TILING_Y && HAS_128_BYTE_Y_TILING(dev)))
- tile_height = 32;
- else
- tile_height = 8;
- /* i8xx is strange: It has 2 interleaved rows of tiles, so needs an even
- * number of tile rows. */
- if (IS_GEN2(dev))
- tile_height *= 2;
- /* Size needs to be aligned to a full tile row */
- if (size & (tile_height * stride - 1))
- return false;
/* 965+ just needs multiples of tile width */ if (INTEL_INFO(dev)->gen >= 4) { if (stride & (tile_width - 1))
On Tue, Feb 22, 2011 at 06:32:11PM +0100, Thierry Vignaud wrote:
Giving that lack of documentation, could you put some more comments in the code? so that nobody cleans out that "strange workaround" in 6 monthes...
It's not a workaround, it's how the hw works. If we'd add the explanation from the changelog everytime there's a little and/or badly documented hw oddity, you won't be able to read the code anymore.
Use git blame and dig into the history - drm/* is full of such stuff. -Daniel
Hi,
On Tue, February 22, 2011 18:25, Daniel Vetter wrote:
It looks like gen2 has a peculiar interleaved 2-row inter-tile layout. Probably inherited from i81x which had 2kb tiles (which naturally fit an even-number-of-tile-rows scheme to fit onto 4kb pages). There is no other mention of this in any docs (also not in the Intel internal documention according to Chris Wilson).
Problem manifests itself in corruptions in the second half of the last tile row (if the bo has an odd number of tiles). Which can only happen with relaxed tiling (introduced in a00b10c360b35d6431a9).
So reject set_tiling calls that don't satisfy this constrain to prevent broken userspace from causing havoc. While at it, also check the size for newer chipsets.
LKML: https://lkml.org/lkml/2011/2/19/5 Reported-by: Indan Zupancic indan@nul.nu Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
drivers/gpu/drm/i915/i915_gem_tiling.c | 16 +++++++++++++++- 1 files changed, 15 insertions(+), 1 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c index 22a32b9..79a04fd 100644 --- a/drivers/gpu/drm/i915/i915_gem_tiling.c +++ b/drivers/gpu/drm/i915/i915_gem_tiling.c @@ -184,7 +184,7 @@ i915_gem_detect_bit_6_swizzle(struct drm_device *dev) static bool i915_tiling_ok(struct drm_device *dev, int stride, int size, int tiling_mode) {
- int tile_width;
int tile_width, tile_height;
/* Linear is always fine */ if (tiling_mode == I915_TILING_NONE)
@@ -215,6 +215,20 @@ i915_tiling_ok(struct drm_device *dev, int stride, int size, int tiling_mode) } }
- if (IS_GEN2(dev) ||
(tiling_mode == I915_TILING_Y && HAS_128_BYTE_Y_TILING(dev)))
tile_height = 32;
- else
tile_height = 8;
- /* i8xx is strange: It has 2 interleaved rows of tiles, so needs an even
* number of tile rows. */
- if (IS_GEN2(dev))
tile_height *= 2;
- /* Size needs to be aligned to a full tile row */
- if (size & (tile_height * stride - 1))
return false;
- /* 965+ just needs multiples of tile width */ if (INTEL_INFO(dev)->gen >= 4) { if (stride & (tile_width - 1))
Tested-by: Indan Zupancic indan@nul.nu
I tested with this patch and without the other ones you send and the corruption is indeed gone.
Not sure why you dropped lkml from CC, now people who stuble upon it don't see the ending...
Greetings,
Indan
Am Mi, 23.02.2011, 07:59 schrieb Indan Zupancic:
On Tue, February 22, 2011 18:25, Daniel Vetter wrote:
It looks like gen2 has a peculiar interleaved 2-row inter-tile layout. Probably inherited from i81x which had 2kb tiles (which naturally fit an even-number-of-tile-rows scheme to fit onto 4kb pages). There is no other mention of this in any docs (also not in the Intel internal documention according to Chris Wilson).
Problem manifests itself in corruptions in the second half of the last tile row (if the bo has an odd number of tiles). Which can only happen with relaxed tiling (introduced in a00b10c360b35d6431a9).
So reject set_tiling calls that don't satisfy this constrain to prevent broken userspace from causing havoc. While at it, also check the size for newer chipsets.
LKML: https://lkml.org/lkml/2011/2/19/5 Reported-by: Indan Zupancic indan@nul.nu Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
drivers/gpu/drm/i915/i915_gem_tiling.c | 16 +++++++++++++++- 1 files changed, 15 insertions(+), 1 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c index 22a32b9..79a04fd 100644 --- a/drivers/gpu/drm/i915/i915_gem_tiling.c +++ b/drivers/gpu/drm/i915/i915_gem_tiling.c @@ -184,7 +184,7 @@ i915_gem_detect_bit_6_swizzle(struct drm_device *dev) static bool i915_tiling_ok(struct drm_device *dev, int stride, int size, int tiling_mode) {
- int tile_width;
int tile_width, tile_height;
/* Linear is always fine */ if (tiling_mode == I915_TILING_NONE)
@@ -215,6 +215,20 @@ i915_tiling_ok(struct drm_device *dev, int stride, int size, int tiling_mode) } }
- if (IS_GEN2(dev) ||
(tiling_mode == I915_TILING_Y && HAS_128_BYTE_Y_TILING(dev)))
tile_height = 32;
- else
tile_height = 8;
- /* i8xx is strange: It has 2 interleaved rows of tiles, so needs an
even
* number of tile rows. */
- if (IS_GEN2(dev))
tile_height *= 2;
- /* Size needs to be aligned to a full tile row */
- if (size & (tile_height * stride - 1))
return false;
- /* 965+ just needs multiples of tile width */ if (INTEL_INFO(dev)->gen >= 4) { if (stride & (tile_width - 1))
Tested-by: Indan Zupancic indan@nul.nu
I tested with this patch and without the other ones you send and the corruption is indeed gone.
Not sure why you dropped lkml from CC, now people who stuble upon it don't see the ending...
Random incoherency in my brain. Re-added to cc. - Daniel
Hi,
On Wed, February 23, 2011 08:10, Daniel Vetter wrote:
Am Mi, 23.02.2011, 07:59 schrieb Indan Zupancic:
On Tue, February 22, 2011 18:25, Daniel Vetter wrote:
It looks like gen2 has a peculiar interleaved 2-row inter-tile layout. Probably inherited from i81x which had 2kb tiles (which naturally fit an even-number-of-tile-rows scheme to fit onto 4kb pages). There is no other mention of this in any docs (also not in the Intel internal documention according to Chris Wilson).
Problem manifests itself in corruptions in the second half of the last tile row (if the bo has an odd number of tiles). Which can only happen with relaxed tiling (introduced in a00b10c360b35d6431a9).
So reject set_tiling calls that don't satisfy this constrain to prevent broken userspace from causing havoc. While at it, also check the size for newer chipsets.
LKML: https://lkml.org/lkml/2011/2/19/5 Reported-by: Indan Zupancic indan@nul.nu Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
drivers/gpu/drm/i915/i915_gem_tiling.c | 16 +++++++++++++++- 1 files changed, 15 insertions(+), 1 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c index 22a32b9..79a04fd 100644 --- a/drivers/gpu/drm/i915/i915_gem_tiling.c +++ b/drivers/gpu/drm/i915/i915_gem_tiling.c @@ -184,7 +184,7 @@ i915_gem_detect_bit_6_swizzle(struct drm_device *dev) static bool i915_tiling_ok(struct drm_device *dev, int stride, int size, int tiling_mode) {
- int tile_width;
int tile_width, tile_height;
/* Linear is always fine */ if (tiling_mode == I915_TILING_NONE)
@@ -215,6 +215,20 @@ i915_tiling_ok(struct drm_device *dev, int stride, int size, int tiling_mode) } }
- if (IS_GEN2(dev) ||
(tiling_mode == I915_TILING_Y && HAS_128_BYTE_Y_TILING(dev)))
tile_height = 32;
- else
tile_height = 8;
- /* i8xx is strange: It has 2 interleaved rows of tiles, so needs an
even
* number of tile rows. */
- if (IS_GEN2(dev))
tile_height *= 2;
- /* Size needs to be aligned to a full tile row */
- if (size & (tile_height * stride - 1))
return false;
- /* 965+ just needs multiples of tile width */ if (INTEL_INFO(dev)->gen >= 4) { if (stride & (tile_width - 1))
Tested-by: Indan Zupancic indan@nul.nu
I tested with this patch and without the other ones you send and the corruption is indeed gone.
Not sure why you dropped lkml from CC, now people who stuble upon it don't see the ending...
Random incoherency in my brain. Re-added to cc.
This isn't in rc8, can someone make sure it gets into 2.6.38-rc9/2.6.38?
Thanks,
Indan
Am Do, 10.03.2011, 06:06 schrieb Indan Zupancic:
This isn't in rc8, can someone make sure it gets into 2.6.38-rc9/2.6.38?
The patch unfortunately broke gen4+ (more precisely: the unwritten abi guarantee that userspace can tile buffers that don't have a complete last tile row, as long as it promises not to touch it). Hence it got reverted. It was just a enforcement check to prevent broken userspace from fooling itself. The proper fix is to upgrade your userspace (libdrm + xf86-video-intel).
[Aside: This only happens if you have new enough userspace that supports relaxed tiling. Old userspace and new kernels are _not_ broken.]
Cheers, Daniel
On Thu, March 10, 2011 08:52, Daniel Vetter wrote:
Am Do, 10.03.2011, 06:06 schrieb Indan Zupancic:
This isn't in rc8, can someone make sure it gets into 2.6.38-rc9/2.6.38?
The patch unfortunately broke gen4+ (more precisely: the unwritten abi guarantee that userspace can tile buffers that don't have a complete last tile row, as long as it promises not to touch it). Hence it got reverted. It was just a enforcement check to prevent broken userspace from fooling itself. The proper fix is to upgrade your userspace (libdrm + xf86-video-intel).
[Aside: This only happens if you have new enough userspace that supports relaxed tiling. Old userspace and new kernels are _not_ broken.]
Yes, I noticed it got reverted when digging into git log, but the commit message only said it broke gen4+, not how gen2 should be unbroken after the revert.
Which versions fix this, just for reference?
I got libdrm 2.4.23 and xf86-video-intel 2.14.0.
(As a side note, do you have version checking on the kernel side? If so, you could return 0 for the relaxed fencing feature check if the driver is the wrong version.)
To keep things manageable I either upgrade the kernel, or userspace, but never both at once. It seems that that doesn't work with graphics.
Thanks,
Indan
Am Do, 10.03.2011, 11:36 schrieb Indan Zupancic:
On Thu, March 10, 2011 08:52, Daniel Vetter wrote:
[Aside: This only happens if you have new enough userspace that supports relaxed tiling. Old userspace and new kernels are _not_ broken.]
Yes, I noticed it got reverted when digging into git log, but the commit message only said it broke gen4+, not how gen2 should be unbroken after the revert.
Which versions fix this, just for reference?
git master branch of libdrm and xf86-video-intel newer than 2011-02-22.
-Daniel
On Thu, March 10, 2011 14:31, Daniel Vetter wrote:
Am Do, 10.03.2011, 11:36 schrieb Indan Zupancic:
Which versions fix this, just for reference?
git master branch of libdrm and xf86-video-intel newer than 2011-02-22.
Thank you. If there will be no new releases of those packages within a couple weeks it might be better to temporarily add those checks back for gen 2 only, I think. Otherwise there will be a period where people who update regularly will have screen corruption, with no easy way of fixing it. I think this would avoid a few unnecessary bugreports. The check can be removed in 2.6.39-rc1, if you want I'll remind you about it too.
Greetings,
Indan
On Fri, Mar 11, 2011 at 02:08:46AM +0100, Indan Zupancic wrote:
On Thu, March 10, 2011 14:31, Daniel Vetter wrote:
Am Do, 10.03.2011, 11:36 schrieb Indan Zupancic:
Which versions fix this, just for reference?
git master branch of libdrm and xf86-video-intel newer than 2011-02-22.
Thank you. If there will be no new releases of those packages within a couple weeks it might be better to temporarily add those checks back for gen 2 only, I think. Otherwise there will be a period where people who update regularly will have screen corruption, with no easy way of fixing it. I think this would avoid a few unnecessary bugreports. The check can be removed in 2.6.39-rc1, if you want I'll remind you about it too.
Well, libdrm is already released (2.4.24) and for the ddx there's an rc (2.4.901) out there. So that should be sufficient. -Daniel
Hi,
Feel free to ignore this post, it's a it of rambling.
On Sun, February 20, 2011 11:55, Daniel Vetter wrote:
I've just noticed that one of the patches (the 2nd one) doesn't work as advertised. Please retest with the attached one.
I was wondering why that was, so I looked a bit closer:
Old patch:
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 17bd766..fbc21e3 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -763,7 +763,7 @@ static int i915_getparam(struct drm_device *dev, void *data, value = HAS_BLT(dev); break; case I915_PARAM_HAS_RELAXED_FENCING: - value = 1; + value = 0; break; case I915_PARAM_HAS_COHERENT_RINGS: value = 1;
New patch:
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 17bd766..d275c96 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -762,9 +762,6 @@ static int i915_getparam(struct drm_device *dev, void *data, case I915_PARAM_HAS_BLT: value = HAS_BLT(dev); break; - case I915_PARAM_HAS_RELAXED_FENCING: - value = 1; - break; case I915_PARAM_HAS_COHERENT_RINGS: value = 1; break;
Looking at userspace intel-dri code it becomes clear why:
gp.param = I915_PARAM_HAS_EXECBUF2; ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp); if (!ret) exec2 = 1;
gp.param = I915_PARAM_HAS_BSD; ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp); bufmgr_gem->has_bsd = ret == 0;
gp.param = I915_PARAM_HAS_BLT; ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp); bufmgr_gem->has_blt = ret == 0;
gp.param = I915_PARAM_HAS_RELAXED_FENCING; ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp); bufmgr_gem->has_relaxed_fencing = ret == 0;
The stupid code assumes that if an option is supported, the value is true too, while the kernel return code just tells whether it knows the option, and the value is copied to gp.value, but that's ignored by intel-dri!
And a quick look at the other code gives a strong impression of lots of code dubplication with the kernel driver.
More and more I'm getting the impression that you guys haven't gotten the interfaces between all bits right yet. All in all you seem to be somewhat in the same mess as filesystems are with overly complex interaction between the MM system, VFS and individual filesystems, all doing more or less the same in slightly different ways, instead of abstracting things properly. (Which is also just an impression, I haven't looked that close yet.)
To give you an idea of a driver subsystem that does get it right:
All the common libata code is 23k lines. (wc liba*) All the individual sata drivers code combined are 19k lines. (wc sata_*)
DRM does it differently (only counting .c files):
Common drm code is 21k i915 is 37k nouveau is 47k radeon is 68k
And then there are the userspace drivers: xf86-video-intel/src/*.c is 16k drm/intel/ is 4k mesa/drivers/dri/i915/ is 22k
Of course graphics drivers are a lot more complex than sata drivers, but because of that extra complexity it's a lot easier to make things even more compex by getting the interfaces wrong. So what I'm saying is, there seems a lot of room for improvements. The intel driver code I've seen is mostly busy with infrastructure stuff and talking to other bits and pieces, but it doesn't seem to do much actual work.
To come back to the I915_PARAM_HAS_RELAXED_FENCING thing:
Why is this interface there at all? It seems like a driver internal detail thing. Either it should be handled entirely in the kernel driver, and this interface wouldn't be there, or, if the userspace driver has to know about it, it should be entirely handled by the userspace driver and not done by the kernel driver at all.
Another slightly annoying thing: The code is littered with gen checks, while very often the only difference is a register or size value. Why not put those in a gen specific hardware description structure which is used unconditionally, instead of having a lot of almost the same code?
The current design seems overly complex and fragile, and I think you guys make it more difficult for yourself than necessary.
Greetings,
Indan
Daniel Vetter wrote:
Please provide the usual details about your system (especially what gpu this is on). Also, screenshots of what typical corruptions look like can help a lot in tracking down such things.
Confirm I also have this issue on my X40, but there are other bugs that are much more significant so I haven't bothered mentioning this.
On the other hand my kernel is nearly two months old.
Please don't mistake this as a complaint. I'm overall rather happy even with current state and it's issues, and especially I think the situation is consistently getting better, with some bumps here and there along the way.
Last time I used a projector I had some trouble, because the graphics driver was too clever; it automatically set the correct resolution for the projector, but I needed the resolution of my internal screen. :)
//Peter
Hi Peter,
On Sun, February 20, 2011 04:51, Peter Stuge wrote:
Confirm I also have this issue on my X40, but there are other bugs that are much more significant so I haven't bothered mentioning this.
What issues? If it's backlight related, try my patch at: http://lkml.org/lkml/2011/2/16/447
Or if the screen is black after suspend/screen blank then just try a newer kernel, that got recently fixed. Actually, a lot of bugs were recently introduced and fixed, with two months ago you're probably in the new-bugs-only period, so I can recommend trying 2.6.38-rc5.
This screen corruption is the only problem for me, but I don't do anything fancy with my laptop. The ipw2200 wireless driver is quite crappy, but it has always been as far as I know. (Though not too long ago the USB got burnt out, which probably took one RAM module with it, so now I'm running with 512MB instead of 1GB, which is really noticeable when doing kernel git stuff or compiles with this very very slow HD.)
Good luck,
Indan
P.S. Anyone any idea where that "Mail-Followup-To" header comes from? Or is everyone subscribed to dri-devel automatically excluded from Mail-Followup-To? If that's the case, pardon for the dubplicate mails.
Indan Zupancic wrote:
Confirm I also have this issue on my X40, but there are other bugs that are much more significant so I haven't bothered mentioning this.
What issues?
The one I confirmed is corrupted graphics within Gecko. I haven't had Xv working for a long time either. Not sure if I've tried it with this kernel though.
If it's backlight related, try my patch at: http://lkml.org/lkml/2011/2/16/447
Yeah also have backlight issue whenever the backlight level changes by other means than Fn+Home/End. (Lid switch, screen blank) I noticed the patch and that it solved the issue for someone. I'm not too inconvenienced by this issue though.
Or if the screen is black after suspend/screen blank
Oh I've stopped using suspend since using KMS. I get way too angry about all the state that I lose if resume fails so I don't risk it.
Actually, a lot of bugs were recently introduced and fixed, with two months ago you're probably in the new-bugs-only period, so I can recommend trying 2.6.38-rc5.
Yeah, I think it's time to pull Linus' git. I've been keeping an eye on things i915 on the list for a good while already.
This screen corruption is the only problem for me, but I don't do anything fancy with my laptop. The ipw2200 wireless driver is quite crappy, but it has always been as far as I know.
I've used ipw2200 with great success for many years, but these days I'm having fun (no, not at all) with ath9k where there is some very fundamental hardware issue between laptop and card. I'd need to hook up logic analyzer to say anything concrete, but I have no end of problems with internal ath9k in my machine. It's completely unusable.
The only other annoying issue I have is that as wine enumerates available screen resolutions i915 goes out to the VGA connector, which on 855 always means a 600ms timeout when nothing is connected, but this is a bit tricky because the hardware just can not tell if anything is connected.
I would be very happy if there was a knob for enabling/disabling the VGA connector though.
Good luck,
Thanks, you too!
//Peter
Hi Peter,
The new corruption is "fixed" with daniel's patch from https://lkml.org/lkml/2011/2/20/25
On Mon, February 21, 2011 05:10, Peter Stuge wrote:
Indan Zupancic wrote:
Confirm I also have this issue on my X40, but there are other bugs that are much more significant so I haven't bothered mentioning this.
What issues?
The one I confirmed is corrupted graphics within Gecko. I haven't had Xv working for a long time either. Not sure if I've tried it with this kernel though.
Hmm, I recommend upgrading all the graphics userspace too then, current stuff works pretty well for me.
If it's backlight related, try my patch at: http://lkml.org/lkml/2011/2/16/447
Yeah also have backlight issue whenever the backlight level changes by other means than Fn+Home/End. (Lid switch, screen blank) I noticed the patch and that it solved the issue for someone. I'm not too inconvenienced by this issue though.
Well, it was for me, because I suspend all the time.
Or if the screen is black after suspend/screen blank
Oh I've stopped using suspend since using KMS. I get way too angry about all the state that I lose if resume fails so I don't risk it.
I never had a failed resume with this laptop, not even once. But I avoided the early days of KMS because of all the instability.
Actually, a lot of bugs were recently introduced and fixed, with two months ago you're probably in the new-bugs-only period, so I can recommend trying 2.6.38-rc5.
Yeah, I think it's time to pull Linus' git. I've been keeping an eye on things i915 on the list for a good while already.
There were coherency bugs for 855, but those seem to be fixed in the newer kernel too (2.6.37?), so another reason to upgrade.
This screen corruption is the only problem for me, but I don't do anything fancy with my laptop. The ipw2200 wireless driver is quite crappy, but it has always been as far as I know.
I've used ipw2200 with great success for many years, but these days I'm having fun (no, not at all) with ath9k where there is some very fundamental hardware issue between laptop and card. I'd need to hook up logic analyzer to say anything concrete, but I have no end of problems with internal ath9k in my machine. It's completely unusable.
My impression is that ipw2200's firmware is crappy. Everything is fine with a good signal, but with a bad one things don't work out too well after a time.
The only other annoying issue I have is that as wine enumerates available screen resolutions i915 goes out to the VGA connector, which on 855 always means a 600ms timeout when nothing is connected, but this is a bit tricky because the hardware just can not tell if anything is connected.
I have the impression that got better too, but I haven't followed the development. Can't you just lower the timeout in the code, or let it cache the VGA status? The latter is what I'd expect the code to do.
Running xrandr to see the modes takes 0.3s here.
I would be very happy if there was a knob for enabling/disabling the VGA connector though.
Good luck,
Thanks, you too!
I can recommend using PHC too, if you don't already: www.linux-phc.org With it I almost never have the fan running and it uses a lot less power when under load. (But don't use it till you got your system stable.)
Greetings,
Indan
dri-devel@lists.freedesktop.org