https://bugs.freedesktop.org/show_bug.cgi?id=58272
Priority: medium
Bug ID: 58272
Assignee: dri-devel@lists.freedesktop.org
Summary: Rv670 AGP drm-next ttm errors
Severity: normal
Classification: Unclassified
OS: Linux (All)
Reporter: lists@andyfurniss.entadsl.com
Hardware: x86 (IA32)
Status: NEW
Version: XOrg CVS
Component: DRM/Radeon
Product: DRI
Created attachment 71476 --> https://bugs.freedesktop.org/attachment.cgi?id=71476&action=edit errors in kern log with dma
I haven't had time to test latest drm-next but will post this now as may be AFK tomorrow.
After finding a place on mesa where etqw seems OK with drm-fixes I am getting errors with drm-next.
On yesterday's head + the wb patch I got attachment 1.
With the tree reset to before the dma changes which required the patch -
drm/ttm: remove no_wait_reserve, v3
I got attachment 2, with the last lines repeating for 400k lines and the log also filling up with junk.
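A log that repeats the same message hundreds of thousands of times is easier to read and attach once consecutive duplicates are collapsed. The following is a minimal sketch, assuming each repeat is a single line that differs only in its leading kernel timestamp; the function name and the sample messages are hypothetical and not taken from the attached logs. It would not collapse a repeating multi-line block.

```python
import re
from itertools import groupby

# Leading kernel timestamp such as "[   10.000001] ", if present.
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def collapse_repeats(lines):
    """Yield (message, count) for each run of consecutive identical messages."""
    stripped = (TIMESTAMP_RE.sub("", line.rstrip("\n")) for line in lines)
    for message, run in groupby(stripped):
        yield message, sum(1 for _ in run)


# Hypothetical sample input, for illustration only.
sample = [
    "[   10.000001] message A\n",
    "[   10.000002] message A\n",
    "[   10.000003] message A\n",
    "[   10.000004] message B\n",
]

collapsed = list(collapse_repeats(sample))
for message, count in collapsed:
    print(f"{count:>7}x {message}")
```

Stripping the timestamp first is the design choice that matters here: without it every line is unique and nothing collapses.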
--- Comment #1 from Andy Furniss lists@andyfurniss.entadsl.com --- Created attachment 71477 --> https://bugs.freedesktop.org/attachment.cgi?id=71477&action=edit errors in kern log before dma changes
--- Comment #2 from Andy Furniss lists@andyfurniss.entadsl.com --- Hmm I see using the word a t t a c h m e n t does strange things - 1 and 2 are not mine.
--- Comment #3 from Maarten Lankhorst m.b.lankhorst@gmail.com --- It seems that ttm_mem_evict_first is being called in a nested fashion far more often than is healthy. Could you resolve the ttm_mem_evict_first address where it ends up calling itself back to a specific line?
--- Comment #4 from Maarten Lankhorst m.b.lankhorst@gmail.com --- It looks nasty though, could you also dump mem_type for each time it calls ttm_mem_evict_first?
--- Comment #5 from Maarten Lankhorst m.b.lankhorst@gmail.com --- Do any of your local patches touch radeon_evict_flags or radeon_ttm_placement_from_domain? I don't see why it would recurse so deeply otherwise.
A full public git tree to reproduce the problem and see which patches are applied would also be nice.
--- Comment #6 from Alex Deucher agd5f@yahoo.com --- Created attachment 71507 --> https://bugs.freedesktop.org/attachment.cgi?id=71507&action=edit fix
Should be fixed with this patch.
--- Comment #7 from Andy Furniss lists@andyfurniss.entadsl.com --- (In reply to comment #6)
> Created attachment 71507 fix
> Should be fixed with this patch.
Probably :-)
It seems that current drm-next head + fix has a different issue which makes etqw die quite quickly.
drm-next reset onto
drm/ttm: remove no_wait_reserve, v3 + the fix
is now stable with etqw.
The head issue is -
EE r600_texture.c:697 r600_texture_transfer_map - failed to create temporary texture to hold untiled copy
Mesa: User error: GL_OUT_OF_MEMORY in glTexSubImage
radeon: The kernel rejected CS, see dmesg for more information.
double fault: 'Segmentation fault', bailing out
in dmesg -
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (8192, 2, 4096, -12)
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
etqw.x86[2478]: segfault at 0 ip af5142ad sp bff8b310 error 4 in gamex86.so[af23f000+948000]
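The trailing -12 in these messages is a negated errno value; on Linux, errno 12 is ENOMEM, which matches the GL_OUT_OF_MEMORY reported by Mesa. A minimal sketch for pulling such codes out of a dmesg excerpt and naming them follows. The helper name and the regular expression are assumptions made for illustration, and the regex only recognises a negative number directly followed by `!` or `)`, as in the two message shapes above.

```python
import errno
import re

# Negative code followed by "!" or ")", as in
# "Failed to parse relocation -12!" and "(8192, 2, 4096, -12)".
CODE_RE = re.compile(r"-(\d+)[!)]")


def decode_kernel_errors(log_text):
    """Return {negative_code: (errno_symbol, occurrences)} for log_text."""
    counts = {}
    for match in CODE_RE.finditer(log_text):
        code = int(match.group(1))
        counts[code] = counts.get(code, 0) + 1
    return {
        -code: (errno.errorcode.get(code, "UNKNOWN"), n)
        for code, n in counts.items()
    }


sample = (
    "[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!\n"
    "[drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object "
    "(8192, 2, 4096, -12)\n"
)

result = decode_kernel_errors(sample)
print(result)
```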
--- Comment #8 from Maarten Lankhorst m.b.lankhorst@gmail.com --- Could you please run a git bisection to see where that error has been introduced, then?
--- Comment #9 from Andy Furniss lists@andyfurniss.entadsl.com --- Created attachment 71530 --> https://bugs.freedesktop.org/attachment.cgi?id=71530&action=edit gpu lock + oops on use async dma for ttm buffer moves on 6xx-SI
--- Comment #10 from Andy Furniss lists@andyfurniss.entadsl.com --- (In reply to comment #8)
> Could you please run a git bisection to see where that error has been introduced, then?
It seems that "drm/radeon: use async dma for ttm buffer moves on 6xx-SI" is the first non-working commit, but it fails differently from head. Log attached.
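For readers unfamiliar with bisection, it is a binary search over the commit history between a known-good and a known-bad commit. The sketch below models that search in plain Python; it is not how git implements it. The two real commit subjects come from this report, while the other list entries and the `is_bad` predicate are hypothetical placeholders. It also assumes a single good/bad verdict per commit, which is a simplification given that the first bad commit here fails differently from head.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    commits is ordered oldest to newest; commits[0] must be known good
    and commits[-1] known bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Only the two drm/... subjects are from the report; the rest are placeholders.
commits = [
    "drm/ttm: remove no_wait_reserve, v3",
    "hypothetical commit A",
    "hypothetical commit B",
    "drm/radeon: use async dma for ttm buffer moves on 6xx-SI",
    "hypothetical commit C",
    "drm-next head",
]

# Stand-in for "build this kernel, run etqw, see whether it fails".
culprit = first_bad(commits, lambda c: commits.index(c) >= 3)
print(culprit)
```

Each step halves the remaining range, so the number of kernels to build and test grows only logarithmically with the number of commits.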
Florian Mickler florian@mickler.org changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |florian@mickler.org
--- Comment #11 from Florian Mickler florian@mickler.org --- A patch referencing this bug report has been merged in Linux v3.8-rc1:
commit dd54fee7d440c4a9756cce2c24a50c15e4c17ccb
Author: Dave Airlie airlied@redhat.com
Date: Fri Dec 14 21:04:46 2012 +1000

    radeon: fix regression with eviction since evict caching changes
--- Comment #12 from Alex Deucher agd5f@yahoo.com --- Make sure your kernel has this patch: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=...
--- Comment #13 from Andy Furniss lists@andyfurniss.entadsl.com --- (In reply to comment #12)
> Make sure your kernel has this patch: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=0953e76e91f4b6206cef50bd680696dc6bf1ef99
I tested drm-next head when that went in and got the same results.
I've just rebuilt it to be sure and with etqw I get a segfault after about 10 secs and in dmesg -
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
I've also managed to reproduce the GPU lock + oops I reported earlier - this time with nexuiz on current drm-next head.
I am not getting ttm errors any more so I guess this bug should be closed?
--- Comment #14 from Andy Furniss lists@andyfurniss.entadsl.com --- (In reply to comment #13)
> (In reply to comment #12)
> > Make sure your kernel has this patch: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=0953e76e91f4b6206cef50bd680696dc6bf1ef99
> I tested drm-next head when that went in and got the same results.
> I've just rebuilt it to be sure and with etqw I get a segfault after about 10 secs and in dmesg -
> [drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
> I've also managed to reproduce the GPU lock + oops I reported earlier - this time with nexuiz on current drm-next head.
> I am not getting ttm errors any more so I guess this bug should be closed?
FWIW I tried current drm-next + patch -
0003-drm-radeon-fix-dma-copy-on-r6xx-r7xx-evergen-ni-si-g.patch
And I still fail with etqw after about 10 secs, but do get more info.
radeon: The kernel rejected CS, see dmesg for more information.
radeon: The kernel rejected CS, see dmesg for more information.
radeon: The kernel rejected CS, see dmesg for more information.
radeon: Failed to allocate a buffer:
radeon: size : 7168 bytes
radeon: alignment : 256 bytes
radeon: domains : 2
EE r600_texture.c:697 r600_texture_transfer_map - failed to create temporary texture to hold untiled copy
Mesa: User error: GL_OUT_OF_MEMORY in glTexSubImage
radeon: The kernel rejected CS, see dmesg for more information.
radeon: The kernel rejected CS, see dmesg for more information.
radeon: The kernel rejected CS, see dmesg for more information.
radeon: The kernel rejected CS, see dmesg for more information.
radeon: The kernel rejected CS, see dmesg for more information.
radeon: The kernel rejected CS, see dmesg for more information.
radeon: The kernel rejected CS, see dmesg for more information.
double fault: 'Segmentation fault', bailing out
shutdown terminal support
/home/andy/bin/etqw: line 1: 2472 Segmentation fault /usr/local/games/etqw/etqw
dmesg -
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (8192, 2, 4096, -12)
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
etqw.x86[2472]: segfault at 0 ip af5292ad sp bfbe3250 error 4 in gamex86.so[af254000+948000]
Andy Furniss lists@andyfurniss.entadsl.com changed:
What        |Removed                     |Added
----------------------------------------------------------------------------
Status      |NEW                         |RESOLVED
Resolution  |---                         |FIXED
--- Comment #15 from Andy Furniss lists@andyfurniss.entadsl.com --- Current drm-fixes is working for me now.
The remaining etqw issue was fixed by -
Revert "drm/radeon: do not move bo to different placement at each cs"