Re: [PATCH] drm/nouveau: fix ttm move notify callback

Konrad Rzeszutek Wilk

6 Jan 2012

2:57 p.m.

On Thu, Jan 05, 2012 at 09:14:10PM -0500, Konrad Rzeszutek Wilk wrote:

...

On Fri, Jan 06, 2012 at 07:53:13AM +1000, Ben Skeggs wrote:

...
On Thu, 2012-01-05 at 13:31 -0500, j.glisse@gmail.com wrote:

...
From: Jerome Glisse jglisse@redhat.com

ttm might call the move notify with null new mem placement, properly handle this case inside nouveau move notify callback.

This has been fixed already in a -next tree I sent to Dave.

I just tried -next with your patch (and two other fixes that I had sent):

drm/ttm/dma: Only call set_pages_array_wb when the page is not in WB pool
drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page and don't try to free freed pages

and Jerome's AGP fix: ttm: fix agp since ttm tt rework

and got the crash (but only with NVidia cards) after swapping between Xorg and the VCs. Look in drm-next.jpg

http://darnok.org/vga/drm-next.jpg

...

With your patch removed ("drm/nouveau/ttm: fix crash as a result of a recent ttm change") and the patch below by Jerome I still get it to crash (see drm-next-with-Jerome-fix-revert-Ben.jpg)..

http://darnok.org/vga/drm-next-with-Jerome-fix-revert-Ben.jpg

...

...
Ben.

...
Signed-off-by: Jerome Glisse jglisse@redhat.com

---
 drivers/gpu/drm/nouveau/nouveau_bo.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index f12dd0f..65f5b0b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -808,9 +808,8 @@ out:
 }

 static void
-nouveau_bo_move_ntfy(struct ttm_buffer_object *bo, struct ttm_mem_reg *new_mem)
+nouveau_bo_move_notify(struct ttm_buffer_object *bo, struct ttm_mem_reg *new_mem)
 {
-	struct nouveau_mem *node = new_mem->mm_node;
 	struct nouveau_bo *nvbo = nouveau_bo(bo);
 	struct nouveau_vma *vma;

@@ -820,6 +819,7 @@ nouveau_bo_move_ntfy(struct ttm_buffer_object *bo, struct ttm_mem_reg *new_mem)
 		} else
 		if (new_mem && new_mem->mem_type == TTM_PL_TT &&
 		    nvbo->page_shift == vma->vm->spg_shift) {
+			struct nouveau_mem *node = new_mem->mm_node;
 			nouveau_vm_map_sg(vma, 0, new_mem->
 					  num_pages << PAGE_SHIFT,
 					  node, node->pages);
@@ -1131,7 +1131,7 @@ struct ttm_bo_driver nouveau_bo_driver = {
 	.invalidate_caches = nouveau_bo_invalidate_caches,
 	.init_mem_type = nouveau_bo_init_mem_type,
 	.evict_flags = nouveau_bo_evict_flags,
-	.move_notify = nouveau_bo_move_ntfy,
+	.move_notify = nouveau_bo_move_notify,
 	.move = nouveau_bo_move,
 	.verify_access = nouveau_bo_verify_access,
 	.sync_obj_signaled = __nouveau_fence_signalled,
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
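
For readers following the patch above, here is a minimal standalone sketch of the pattern it describes: a move-notify style callback that may be handed a NULL new placement and must not dereference it before checking. The types and names (`struct mem_reg`, `enum notify_action`, `move_notify_action`) are hypothetical stand-ins, not the kernel's definitions, and the fallback to an unmap-like action when no placement is given is an assumption made for illustration. The real callback also compares page shifts, which is left out here.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the TTM/nouveau types; not the kernel definitions. */
enum mem_type { PL_SYSTEM, PL_TT, PL_VRAM };

struct mem_reg {
	enum mem_type mem_type;
	unsigned long num_pages;
	void *mm_node;
};

/* What the callback would do with each mapping (illustrative only). */
enum notify_action { ACT_UNMAP, ACT_MAP_VRAM, ACT_MAP_SG };

/*
 * Decide what a move-notify callback should do. new_mem may be NULL
 * (no new placement), so it is checked before any field is read.
 */
static enum notify_action move_notify_action(const struct mem_reg *new_mem)
{
	if (new_mem && new_mem->mem_type == PL_VRAM)
		return ACT_MAP_VRAM;

	if (new_mem && new_mem->mem_type == PL_TT) {
		/* Dereference only inside the NULL-checked branch. */
		const void *node = new_mem->mm_node;
		(void)node;
		return ACT_MAP_SG;
	}

	/* Assumption: with no usable placement, fall back to unmapping. */
	return ACT_UNMAP;
}
```

Calling `move_notify_action(NULL)` takes the fallback path without touching the pointer. The patch achieves the same thing by moving the `new_mem->mm_node` read out of the function prologue and into the checked branch.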

Jerome Glisse

6 Jan

4:51 p.m.

New subject: [PATCH] drm/nouveau: fix ttm move notify callback

On Fri, Jan 6, 2012 at 9:57 AM, Konrad Rzeszutek Wilk konrad.wilk@oracle.com wrote:

...

On Thu, Jan 05, 2012 at 09:14:10PM -0500, Konrad Rzeszutek Wilk wrote:

...
On Fri, Jan 06, 2012 at 07:53:13AM +1000, Ben Skeggs wrote:

...
On Thu, 2012-01-05 at 13:31 -0500, j.glisse@gmail.com wrote:

...
From: Jerome Glisse jglisse@redhat.com

ttm might call the move notify with null new mem placement, properly handle this case inside nouveau move notify callback.

This has been fixed already in a -next tree I sent to Dave.

I just tried -next with your patch (and two other fixes that I had sent):

drm/ttm/dma: Only call set_pages_array_wb when the page is not in WB pool
drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page and don't try to free freed pages

and Jerome's AGP fix: ttm: fix agp since ttm tt rework

and got the crash (but only with NVidia cards) after swapping between Xorg and the VCs. Look in drm-next.jpg

http://darnok.org/vga/drm-next.jpg

...
With your patch removed ("drm/nouveau/ttm: fix crash as a result of a recent ttm change") and the patch below by Jerome I still get it to crash (see drm-next-with-Jerome-fix-revert-Ben.jpg)..

http://darnok.org/vga/drm-next-with-Jerome-fix-revert-Ben.jpg

Anything special to trigger it ? I can't trigger it with simple gnome3 session (firefox evince ...)

Cheers, Jerome

Konrad Rzeszutek Wilk

4:53 p.m.

New subject: [PATCH] drm/nouveau: fix ttm move notify callback

On Fri, Jan 06, 2012 at 11:51:03AM -0500, Jerome Glisse wrote:

...

On Fri, Jan 6, 2012 at 9:57 AM, Konrad Rzeszutek Wilk konrad.wilk@oracle.com wrote:

...
On Thu, Jan 05, 2012 at 09:14:10PM -0500, Konrad Rzeszutek Wilk wrote:

...
On Fri, Jan 06, 2012 at 07:53:13AM +1000, Ben Skeggs wrote:

...
On Thu, 2012-01-05 at 13:31 -0500, j.glisse@gmail.com wrote:

...
From: Jerome Glisse jglisse@redhat.com

ttm might call the move notify with null new mem placement, properly handle this case inside nouveau move notify callback.

This has been fixed already in a -next tree I sent to Dave.

I just tried -next with your patch (and two other fixes that I had sent):

drm/ttm/dma: Only call set_pages_array_wb when the page is not in WB pool
drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page and don't try to free freed pages

and Jerome's AGP fix: ttm: fix agp since ttm tt rework

and got the crash (but only with NVidia cards) after swapping between Xorg and the VCs. Look in drm-next.jpg

http://darnok.org/vga/drm-next.jpg

...
With your patch removed ("drm/nouveau/ttm: fix crash as a result of a recent ttm change") and the patch below by Jerome I still get it to crash (see drm-next-with-Jerome-fix-revert-Ben.jpg)..

http://darnok.org/vga/drm-next-with-Jerome-fix-revert-Ben.jpg

Anything special to trigger it ? I can't trigger it with simple gnome3 session (firefox evince ...)

I ran etracer, then switched over to a framebuffer console (Alt-F2), logged in. Then ran perf record and switched back to etracer. Ran a couple of laps and when finished quit the perf top. On the PCI-e it took a while (so I had to run a couple of laps).

On the AGP one it happened immediately, which is no surprise since the code looks to be activated when we do garbage collection and the machine only had 2GB. The PCIe one has 8GB. Perhaps a better way would be to force the workqueue by setting the pool limits to smaller values.

...

Cheers, Jerome

Jerome Glisse

6:22 p.m.

New subject: [PATCH] drm/nouveau: fix ttm move notify callback

On Fri, Jan 06, 2012 at 11:53:35AM -0500, Konrad Rzeszutek Wilk wrote:

...

On Fri, Jan 06, 2012 at 11:51:03AM -0500, Jerome Glisse wrote:

...
On Fri, Jan 6, 2012 at 9:57 AM, Konrad Rzeszutek Wilk konrad.wilk@oracle.com wrote:

...
On Thu, Jan 05, 2012 at 09:14:10PM -0500, Konrad Rzeszutek Wilk wrote:

...
On Fri, Jan 06, 2012 at 07:53:13AM +1000, Ben Skeggs wrote:

...
On Thu, 2012-01-05 at 13:31 -0500, j.glisse@gmail.com wrote:

...
From: Jerome Glisse jglisse@redhat.com

ttm might call the move notify with null new mem placement, properly handle this case inside nouveau move notify callback.

This has been fixed already in a -next tree I sent to Dave.

I just tried -next with your patch (and two other fixes that I had sent):

drm/ttm/dma: Only call set_pages_array_wb when the page is not in WB pool
drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page and don't try to free freed pages

and Jerome's AGP fix: ttm: fix agp since ttm tt rework

and got the crash (but only with NVidia cards) after swapping between Xorg and the VCs. Look in drm-next.jpg

http://darnok.org/vga/drm-next.jpg

...
With your patch removed ("drm/nouveau/ttm: fix crash as a result of a recent ttm change") and the patch below by Jerome I still get it to crash (see drm-next-with-Jerome-fix-revert-Ben.jpg)..

http://darnok.org/vga/drm-next-with-Jerome-fix-revert-Ben.jpg

Anything special to trigger it ? I can't trigger it with simple gnome3 session (firefox evince ...)

I ran etracer, then switched over to a framebuffer console (Alt-F2), logged in. Then ran perf record and switched back to etracer. Ran a couple of laps and when finished quit the perf top. On the PCI-e it took a while (so I had to run a couple of laps).

On the AGP one it happened immediately, which is no surprise since the code looks to be activated when we do garbage collection and the machine only had 2GB. The PCIe one has 8GB. Perhaps a better way would be to force the workqueue by setting the pool limits to smaller values.

Still having difficulty reproducing. Can you reproduce with the attached printk debugging patch and provide the log? (Only the few printks preceding the oops or segfault are interesting.)

Cheers, Jerome

Konrad Rzeszutek Wilk

7:52 p.m.

New subject: [PATCH] drm/nouveau: fix ttm move notify callback

...

Still having difficulty reproducing. Can you reproduce with the attached printk debugging patch and provide the log? (Only the few printks preceding the oops or segfault are interesting.)

http://darnok.org/vga/move_notify-v212.log

...

Cheers, Jerome

...

...
From 862e2cc6d35d85404ed24d24c5a5c49c5ef45fc7 Mon Sep 17 00:00:00 2001
From: Jerome Glisse jglisse@redhat.com
Date: Fri, 6 Jan 2012 13:20:08 -0500
Subject: [PATCH] TTM-DEBUG-PRINTK

---
 drivers/gpu/drm/nouveau/nouveau_bo.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 724b41a..326b64a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -812,12 +812,14 @@ nouveau_bo_move_ntfy(struct ttm_buffer_object *bo, struct ttm_mem_reg *new_mem)
 	struct nouveau_bo *nvbo = nouveau_bo(bo);
 	struct nouveau_vma *vma;

+DRM_INFO("%s list (%p %p)\n", __func__, nvbo->vma_list.prev, nvbo->vma_list.next);
 	list_for_each_entry(vma, &nvbo->vma_list, head) {
 		if (new_mem && new_mem->mem_type == TTM_PL_VRAM) {
 			nouveau_vm_map(vma, new_mem->mm_node);
 		} else
 		if (new_mem && new_mem->mem_type == TTM_PL_TT &&
 		    nvbo->page_shift == vma->vm->spg_shift) {
+DRM_INFO("%s vma %p new mem %p %d pages\n", __func__, vma, new_mem, new_mem->num_pages);
 			nouveau_vm_map_sg(vma, 0, new_mem->
 					  num_pages << PAGE_SHIFT,
 					  new_mem->mm_node);
-- 
1.7.5.4
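
The debug patch above logs the `prev` and `next` pointers of the list head before walking it. The thread does not say why, but presumably the aim is to spot a list head that is uninitialised or stale when the callback fires. As a rough illustration, here is a small standalone sketch of a circular doubly linked list with the kind of sanity check those two pointers allow. The names (`struct list_node`, `list_init`, `list_add_tail_node`, `list_is_empty`, `list_head_sane`) are hypothetical and only mimic the shape of the kernel's list API; this is not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for a circular doubly linked list head (hypothetical). */
struct list_node {
	struct list_node *prev, *next;
};

/* An initialised, empty list points back at its own head. */
static void list_init(struct list_node *head)
{
	head->prev = head;
	head->next = head;
}

/* Link node in just before head, i.e. at the tail of the list. */
static void list_add_tail_node(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static bool list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

/*
 * Basic sanity check before walking: both links must be non-NULL and
 * the neighbours must point back at the head.
 */
static bool list_head_sane(const struct list_node *head)
{
	return head->prev && head->next &&
	       head->next->prev == head && head->prev->next == head;
}
```

A zero-filled head fails `list_head_sane`, while a freshly initialised one passes and reports empty. In a log, the same distinction shows up as NULL pointers versus two pointers equal to the head's own address.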

Jerome Glisse

9 p.m.

New subject: [PATCH] drm/nouveau: fix ttm move notify callback

On Fri, Jan 06, 2012 at 02:52:49PM -0500, Konrad Rzeszutek Wilk wrote:

...

...
Still having difficulty reproducing. Can you reproduce with the attached printk debugging patch and provide the log? (Only the few printks preceding the oops or segfault are interesting.)

http://darnok.org/vga/move_notify-v212.log

Looks like nouveau doesn't like move notify being called on driver shutdown or when something on nv50 is down. Ben, I think you will be better at finding a fix for that than me.

Cheers, Jerome

Ben Skeggs

10 Jan

3:46 a.m.

New subject: [PATCH] drm/nouveau: fix ttm move notify callback

On Fri, 2012-01-06 at 16:00 -0500, Jerome Glisse wrote:

...

On Fri, Jan 06, 2012 at 02:52:49PM -0500, Konrad Rzeszutek Wilk wrote:

...
...
Still having difficulty reproducing. Can you reproduce with the attached printk debugging patch and provide the log? (Only the few printks preceding the oops or segfault are interesting.)

http://darnok.org/vga/move_notify-v212.log

Looks like nouveau doesn't like move notify being called on driver shutdown or when something on nv50 is down. Ben, I think you will be better at finding a fix for that than me.

I'm also not able to reproduce this issue on an NV98 chipset (so I'd expect every nv50+ chipset to behave the same) with the current code in Dave's drm-core-next tree.

Am I missing something?

Ben.

...

Cheers, Jerome

Konrad Rzeszutek Wilk

2:34 p.m.

New subject: [PATCH] drm/nouveau: fix ttm move notify callback

On Tue, Jan 10, 2012 at 01:46:05PM +1000, Ben Skeggs wrote:

...

On Fri, 2012-01-06 at 16:00 -0500, Jerome Glisse wrote:

...
On Fri, Jan 06, 2012 at 02:52:49PM -0500, Konrad Rzeszutek Wilk wrote:

...
...
Still having difficulty reproducing. Can you reproduce with the attached printk debugging patch and provide the log? (Only the few printks preceding the oops or segfault are interesting.)

http://darnok.org/vga/move_notify-v212.log

Looks like nouveau doesn't like move notify being called on driver shutdown or when something on nv50 is down. Ben, I think you will be better at finding a fix for that than me.

I'm also not able to reproduce this issue on an NV98 chipset (so I'd expect every nv50+ chipset to behave the same) with the current code in Dave's drm-core-next tree.

I was using 3.2 and then merged drm-core-next tree on top of that.

...

Am I missing something?

I am using a stock Fedora 16 with X Server 1.11.2. Machine has 8GB, and one DVI monitor and is an AMD box. The kernel was compiled using the default Fedora Core .config and for any new options I just hit enter.

Don't have the experimental libGL code, so using: OpenGL version string: 2.1 Mesa 7.11.2

for the rendering. And the test setup is fairly easy - launch etracer, switch to a FB VC (Ctrl-Alt-F2), log in, find the etracer pid and run perf record --pid X and then switch back. Finish playing the game, exit it and then switch to the FB VC to turn it off, and it happens.

Sometimes it happens when I just finished the game.

I can also reproduce this with an AGP card (GeForce 4 Ti4200?) on an Intel Prescott box (2GB of memory) - also with stock Fedora 16. Though the crash is different:

http://darnok.org/vga/agp_nouveau_crash.jpg

Hmm, I can hook up a serial console to that box to get better output - but before I do that, is there a debug patch I should compile in?

Konrad Rzeszutek Wilk

24 Jan

3 p.m.

New subject: [PATCH] drm/nouveau: fix ttm move notify callback

On Tue, Jan 10, 2012 at 01:46:05PM +1000, Ben Skeggs wrote:

...

On Fri, 2012-01-06 at 16:00 -0500, Jerome Glisse wrote:

...
On Fri, Jan 06, 2012 at 02:52:49PM -0500, Konrad Rzeszutek Wilk wrote:

...
...
Still having difficulty reproducing. Can you reproduce with the attached printk debugging patch and provide the log? (Only the few printks preceding the oops or segfault are interesting.)

http://darnok.org/vga/move_notify-v212.log

Looks like nouveau doesn't like move notify being called on driver shutdown or when something on nv50 is down. Ben, I think you will be better at finding a fix for that than me.

I'm also not able to reproduce this issue on an NV98 chipset (so I'd expect every nv50+ chipset to behave the same) with the current code in Dave's drm-core-next tree.

There looks to be a bug opened about this, from when folks were using Firefox and viewing large pictures or scrolling through a large web page.

Any thoughts or things I could try out to narrow this down?
