Guys, I don't know if these kinds of things have been forwarded to you, but there's apparently been several things like this going on - with the finger pointing to the i915 driver apparently clearing random memory. Often the end result seems to be list corruption or a NULL pointer dereference in the filesystem layer.
Linus
---------- Forwarded message ---------- From: Jiri Kosina jkosina@suse.cz Date: Fri, Mar 16, 2012 at 8:25 AM Subject: Re: Oops in ext3_block_to_path.isra.40+0x26/0x11b To: George Spelvin linux@horizon.com Cc: jack@suse.cz, linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org
On Fri, 16 Mar 2012, George Spelvin wrote:
I am not aware of anything, but I have a question -- George, did the machine get suspended/resumed before this happened?
I'm pretty sure it didn't. It's a desktop machine, and I don't ever suspend it. It had been up for a while, and /sys/power/state exists, and *maybe* I played with it sometime since boot, but the backup runs nightly and I *definitely* did not suspend it the day (or even several days) before the oops.
(I tried to preserve the machine state, but processes started getting stuck in the kernel a few hours after the report, so I had to reboot it.)
Jan Kara asked:
And by any chance, do you use i915 driver? Because that one seems to cause corruption - see: https://lkml.org/lkml/2012/3/9/217. I believe Jiri's corruption is likely caused by that...
Yes! lspci -nn and abbreviated .config attached. But, as mentioned, the machine hasn't been suspended...
So it might be the culprit. As the reason of the corruption is not yet understood, it might be that suspend/resume cycle is not necessary pre-requisite for this to trigger, it might just make it more likely.
And the corruption is observed to be indeed several writes of 0x00000000, so it could easily lead to null pointer dereferences all over the place.
Are you able to reproduce the problem if you turn kernel modesetting off?
-- Jiri Kosina SUSE Labs
Guys, I don't know if these kinds of things have been forwarded to you, but there's apparently been several things like this going on - with the finger pointing to the i915 driver apparently clearing random memory. Often the end result seems to be list corruption or a NULL pointer dereference in the filesystem layer.
I think the hibernate problem isn't this problem.
The hibernate issue is known and I've been hoping someone from Intel would run with debugging it, they have a big enough team that I don't feel I can expend the personal time to look into it.
Maybe Keith can push someone or maybe I just refuse pull requests until one with a fix appears.
The latest thinking on the hibernate issues is kernel one sets up an fbcon, hibernate restores the old memory and the GTT still points at the pages, then something writes to the console and overwrites real memory., just a working theory, nobody has proven it yet.
Dave.
<#part sign=pgpmime> On Fri, 16 Mar 2012 16:47:46 +0000, Dave Airlie airlied@gmail.com wrote:
The hibernate issue is known and I've been hoping someone from Intel would run with debugging it, they have a big enough team that I don't feel I can expend the personal time to look into it.
Yeah, I've been chatting with a couple of intel folks; we should have a test patch ready shortly, but we haven't been able to reproduce anything like this...
Maybe Keith can push someone or maybe I just refuse pull requests until one with a fix appears.
From what I've seen, this is a problem only on Ironlake machines, which
makes me afraid of some weird GTT flushing issue, given the adventures we had with VT-d on that hardware where we idle the gpu before any GTT updates. I wonder what would happen if we idled the GPU before any GTT updates even when VT-d wasn't running...
The latest thinking on the hibernate issues is kernel one sets up an fbcon, hibernate restores the old memory and the GTT still points at the pages, then something writes to the console and overwrites real memory., just a working theory, nobody has proven it yet.
Presumably the resumed kernel will not be able to write to the console until the i915 driver is running again, at which point it will have updated all of the GTT entries. And, presumably the booted kernel won't be writing to the console after it has loaded the resumed kernel memory?
On Fri, Mar 16, 2012 at 11:01 PM, Keith Packard keithp@keithp.com wrote:
<#part sign=pgpmime> On Fri, 16 Mar 2012 16:47:46 +0000, Dave Airlie airlied@gmail.com wrote:
The hibernate issue is known and I've been hoping someone from Intel would run with debugging it, they have a big enough team that I don't feel I can expend the personal time to look into it.
Yeah, I've been chatting with a couple of intel folks; we should have a test patch ready shortly, but we haven't been able to reproduce anything like this...
Maybe Keith can push someone or maybe I just refuse pull requests until one with a fix appears.
From what I've seen, this is a problem only on Ironlake machines, which makes me afraid of some weird GTT flushing issue, given the adventures we had with VT-d on that hardware where we idle the gpu before any GTT updates. I wonder what would happen if we idled the GPU before any GTT updates even when VT-d wasn't running...
We've had reports on 965GM in Fedora, maybe davej has the specific bug.
Dave.
<#part sign=pgpmime> On Sat, 17 Mar 2012 07:41:14 +0000, Dave Airlie airlied@gmail.com wrote:
We've had reports on 965GM in Fedora, maybe davej has the specific bug.
Are these reports relatively recent, leading to a possible software bug introduced in the last couple of releases?
On Sat, Mar 17, 2012 at 6:59 PM, Keith Packard keithp@keithp.com wrote:
<#part sign=pgpmime> On Sat, 17 Mar 2012 07:41:14 +0000, Dave Airlie airlied@gmail.com wrote:
We've had reports on 965GM in Fedora, maybe davej has the specific bug.
Are these reports relatively recent, leading to a possible software bug introduced in the last couple of releases?
No I think they are since 0-day KMS reports.
but its St Patricks day so really not a reliable info source for a couple of days :-)
Dave.
On Sat, Mar 17, 2012 at 7:20 PM, Dave Airlie airlied@gmail.com wrote:
On Sat, Mar 17, 2012 at 6:59 PM, Keith Packard keithp@keithp.com wrote:
<#part sign=pgpmime> On Sat, 17 Mar 2012 07:41:14 +0000, Dave Airlie airlied@gmail.com wrote:
We've had reports on 965GM in Fedora, maybe davej has the specific bug.
Are these reports relatively recent, leading to a possible software bug introduced in the last couple of releases?
No I think they are since 0-day KMS reports.
but its St Patricks day so really not a reliable info source for a couple of days :-)
I did however get a flashback in google to this:
http://lists.fedoraproject.org/pipermail/scm-commits/2010-July/456636.html
Linus don't think we ever did work out why that worked, I wonder if we lost something after that. Dave.
On Sat, Mar 17, 2012 at 1:23 PM, Dave Airlie airlied@gmail.com wrote:
I did however get a flashback in google to this:
http://lists.fedoraproject.org/pipermail/scm-commits/2010-July/456636.html
Linus don't think we ever did work out why that worked, I wonder if we lost something after that.
Hmm. Maybe we should stop making the default gfp_mask be GFP_HIGHMEM_MOVABLE (see inode_init_always() in fs/inode.c), and instead add the _MOVABLE flag only for mappings that use "generic_file_mmap()".
It does sound a bit scary to just make default mappings use MOVABLE pages, when we know that can be incorrect for some cases.
But that would require that filesystems like ext4 etc that don't just use the generic_file_mmap() function would have add do that MOVABLE thing on their own. And the *normal* case is certainly that pages are movable and putting them in themovable zone should be ok. It's just *not* ok if you also map them into some hardware GTT thing or just otherwise keep track of the pages directly, like DRI does.
The other possibility is to just make this a shm thing, since it's generally that layer that has the case of "pages allocated with the mapping gfp masks". Hugh Dickins clearly tried to make sure that the DRM initialization of the gfp mask was honored, but maybe there is some case where it is missed. Hugh added a mapping_set_gfp_mask() to i915_gem_alloc_object(), but maybe we have i915 shmem mappings that get set up through some other path.
What about the ttm swap storage thing, for example? That also does shmem_file_setup(), without limiting the pages. But I don't think i915 uses ttm, right?
Al? Hugh? Opinions? Right now anybody who uses the shmem_read_mapping_page() function is in danger of getting a MOVABLE page by default. Which may or may not be right.
That said, I don't see how i915 would get a movable page. It *seems* to do the right setup for the gfp flags of the mapping.
Linus
On Sat, 17 Mar 2012, Linus Torvalds wrote:
On Sat, Mar 17, 2012 at 1:23 PM, Dave Airlie airlied@gmail.com wrote:
I did however get a flashback in google to this:
http://lists.fedoraproject.org/pipermail/scm-commits/2010-July/456636.html
Linus don't think we ever did work out why that worked, I wonder if we lost something after that.
Hmm. Maybe we should stop making the default gfp_mask be GFP_HIGHMEM_MOVABLE (see inode_init_always() in fs/inode.c), and instead add the _MOVABLE flag only for mappings that use "generic_file_mmap()".
It does sound a bit scary to just make default mappings use MOVABLE pages, when we know that can be incorrect for some cases.
But that would require that filesystems like ext4 etc that don't just use the generic_file_mmap() function would have add do that MOVABLE thing on their own. And the *normal* case is certainly that pages are movable and putting them in themovable zone should be ok. It's just *not* ok if you also map them into some hardware GTT thing or just otherwise keep track of the pages directly, like DRI does.
The other possibility is to just make this a shm thing, since it's generally that layer that has the case of "pages allocated with the mapping gfp masks". Hugh Dickins clearly tried to make sure that the DRM initialization of the gfp mask was honored, but maybe there is some case where it is missed. Hugh added a mapping_set_gfp_mask() to i915_gem_alloc_object(), but maybe we have i915 shmem mappings that get set up through some other path.
What about the ttm swap storage thing, for example? That also does shmem_file_setup(), without limiting the pages. But I don't think i915 uses ttm, right?
Al? Hugh? Opinions? Right now anybody who uses the shmem_read_mapping_page() function is in danger of getting a MOVABLE page by default. Which may or may not be right.
That said, I don't see how i915 would get a movable page. It *seems* to do the right setup for the gfp flags of the mapping.
Yes, i915_gem_alloc_object() does its own mapping_set_gfp_mask(mapping, GFP_HIGHUSER | __GFP_RECLAIMABLE), which should be as good as ever.
I've got to go out for an hour: I'll digest more and think more about this when I get back. If someone could explain the original problem with _MOVABLE, that would help me: I didn't see an explanation in the patch or in the bugzilla. I would expect using _MOVABLE for i915_gem would render a block of otherwise movable pages immovable because of the GEM pages pinned, but I wouldn't expect them to move in mysterious ways - or are references held to the pages without them being pinned??.
A factor which might turn out to be relevant: swapin_readahead() (or swapoff) brings in pages from swap before it knows who they belong to, so the swapped-in swapcache pages don't necessarily have the limitations mapping_set_gfp_mask() has asked for. It's been discussed with Alan, we know it will need fixing for gma500 when eventually that comes to be used on machines with more than 4GB, but for now I wasn't aware of a problem.
Hugh
On Sat, Mar 17, 2012 at 3:09 PM, Hugh Dickins hughd@google.com wrote:
I've got to go out for an hour: I'll digest more and think more about this when I get back. If someone could explain the original problem with _MOVABLE, that would help me:
I do not believe we actually ever uncovered the original problem with _MOVABLE: the problem was bisected down to the stable-backported version of commit 4bdadb978569 ("drm/i915: Selectively enable self-reclaim"), and I looked at the changes and decided that one of the main ones was the removal of the mapping_set_gfp_mask() - which resulted in __GFP_MOVABLE being on for the mapping.
So we just tried re-introducing that, and it fixed the problem.
Exactly *why* movable pages are a problem was never all that clear. The assumption was (and I guess still is) that there are physical pointers to the pages in the DRM pages themselves. Quoting from the original thread:
"I could easily see that something would get very unhappy and corrupt memory if the suspend-to-disk phase ends up compacting memory and moving the pages around from under the i915 driver."
but I didn't actually see why the i915 page pinning would be defeated by __GFP_MOVABLE. The code does get a reference to them afaik.
So no, we don't have a real first cause. We just have a band-aid for the symptom. It might be a pure bug in hibernation that just assumes that any pages in MOVABLE zones can always be moved without even checking any refcounts, so not using __GFP_MOVABLE might just be working around problems elsewhere.
Or it's entirely possible that the DRM layer does *not* increment the refcounts of the pages properly, and that hibernate with movable pages is just a very efficient way of showing the problems that results in.
A factor which might turn out to be relevant: swapin_readahead() (or swapoff) brings in pages from swap before it knows who they belong to, so the swapped-in swapcache pages don't necessarily have the limitations mapping_set_gfp_mask() has asked for. It's been discussed with Alan, we know it will need fixing for gma500 when eventually that comes to be used on machines with more than 4GB, but for now I wasn't aware of a problem.
So i915 shouldn't much care about highmem vs non-highmem, it's just that it does hold on to pages and take their physical address. But as far as I can tell, whenever the physical address is in use, it does have a refcount to the page.
So for example, i915_gem_object_get_pages_gtt() will use shmem_read_mapping_page_gfp() which will increment the page count for the page it gets, so all the obj->pages[] entries should have properly incremented page counts. And they get released by i915_gem_object_put_pages_gtt(), but maybe that is called too early while the pages are still in use by the GFX unit...
If there is a refcounting issue, hibernate + __GFP_MOVABLE might just be a great way to see it.
I don't really know the code, though.
Linus
<#part sign=pgpmime> On Sat, 17 Mar 2012 15:52:15 -0700, Linus Torvalds torvalds@linux-foundation.org wrote:
I do not believe we actually ever uncovered the original problem with _MOVABLE: the problem was bisected down to the stable-backported version of commit 4bdadb978569 ("drm/i915: Selectively enable self-reclaim"), and I looked at the changes and decided that one of the main ones was the removal of the mapping_set_gfp_mask() - which resulted in __GFP_MOVABLE being on for the mapping.
Can anyone explain what __GFP_MOVABLE even does? I can't understand what this flag would be for; if the page is locked (with get_page), then the page cannot move. If it isn't locked, then it's subject to swapping, in which case the page will almost certainly move when it returns from disk. Is it that the page won't move if it isn't swapped? That doesn't seem all that useful to me.
but I didn't actually see why the i915 page pinning would be defeated by __GFP_MOVABLE. The code does get a reference to them afaik.
GTT mapping and page locking are done in lock-step in the driver:
i915_gem_object_bind_to_gtt i915_gem_object_get_pages_gtt pins the pages i915_gem_gtt_bind_object maps to GTT
i915_gem_object_unbind i915_gem_gtt_unbind_object unmaps from GTT i915_gem_object_put_pages_gtt unpins the pages.
There are no other code paths to unmapping objects from the GTT or unpinning the pages that I can find.
So for example, i915_gem_object_get_pages_gtt() will use shmem_read_mapping_page_gfp() which will increment the page count for the page it gets, so all the obj->pages[] entries should have properly incremented page counts. And they get released by i915_gem_object_put_pages_gtt(), but maybe that is called too early while the pages are still in use by the GFX unit...
This seems the most likely problem -- there are so many caches and buffers involved. However, we're seeing troubles on hibernate resume, at which point there isn't any acceleration going on, just two fbdev drivers poking the hardware. Which really reduces the complexity quite a bit; it's just CPU reads/writes through the GTT aperture created for the two console frame buffers. That makes this an interesting place to look for trouble; we can ignore vast areas within the driver that deal with acceleration, at least for this case.
On Sat, 17 Mar 2012, Keith Packard wrote:
On Sat, 17 Mar 2012 15:52:15 -0700, Linus Torvalds torvalds@linux-foundation.org wrote:
I do not believe we actually ever uncovered the original problem with _MOVABLE: the problem was bisected down to the stable-backported version of commit 4bdadb978569 ("drm/i915: Selectively enable self-reclaim"), and I looked at the changes and decided that one of the main ones was the removal of the mapping_set_gfp_mask() - which resulted in __GFP_MOVABLE being on for the mapping.
Can anyone explain what __GFP_MOVABLE even does? I can't understand what this flag would be for; if the page is locked (with get_page), then the page cannot move. If it isn't locked, then it's subject to swapping, in which case the page will almost certainly move when it returns from disk. Is it that the page won't move if it isn't swapped? That doesn't seem all that useful to me.
__GFP_MOVABLE is a hint to page allocation that there's a good likelihood that this logical page can be migrated elsewhere in physical memory later on if mm wants, so it's a good idea to allocate it from a physical area of similarly MOVABLE pages; then if later on someone wants a large contiguous area for something (or wants to hot-unplug that memory), it should be easy to clear the whole area out, moving existing pages elsewhere. (I think that's right: several questions come to me as I write it, but now is not the time to research all those details.) Page migration can only be done later if it can account for all of page_count(page).
but I didn't actually see why the i915 page pinning would be defeated by __GFP_MOVABLE. The code does get a reference to them afaik.
GTT mapping and page locking are done in lock-step in the driver:
i915_gem_object_bind_to_gtt i915_gem_object_get_pages_gtt pins the pages i915_gem_gtt_bind_object maps to GTT
i915_gem_object_unbind i915_gem_gtt_unbind_object unmaps from GTT i915_gem_object_put_pages_gtt unpins the pages.
There are no other code paths to unmapping objects from the GTT or unpinning the pages that I can find.
So for example, i915_gem_object_get_pages_gtt() will use shmem_read_mapping_page_gfp() which will increment the page count for the page it gets, so all the obj->pages[] entries should have properly incremented page counts. And they get released by i915_gem_object_put_pages_gtt(), but maybe that is called too early while the pages are still in use by the GFX unit...
This seems the most likely problem -- there are so many caches and buffers involved. However, we're seeing troubles on hibernate resume, at which point there isn't any acceleration going on, just two fbdev drivers poking the hardware. Which really reduces the complexity quite a bit; it's just CPU reads/writes through the GTT aperture created for the two console frame buffers. That makes this an interesting place to look for trouble; we can ignore vast areas within the driver that deal with acceleration, at least for this case.
I keep worrying about the sequence when the machine is powered on again after hibernation: can i915 get up to anything before it is resumed from the hibernation image? Get to use certain pages at that stage, then continue to poke at them after the hibernation image is restored (which changes the story of what pages are free and what are used for what): lacking some kind of flush?
Hugh
<#part sign=pgpmime> On Sat, 17 Mar 2012 18:44:18 -0700 (PDT), Hugh Dickins hughd@google.com wrote:
__GFP_MOVABLE is a hint to page allocation that there's a good likelihood that this logical page can be migrated elsewhere in physical memory later on if mm wants, so it's a good idea to allocate it from a physical area of similarly MOVABLE pages;
So, allocating things with __GFP_MOVABLE may just change where in memory the graphics pages get allocated, moving whatever GPU-inspired damage to less sensitive bits of the kernel data. Sounds fabulous!
I keep worrying about the sequence when the machine is powered on again after hibernation: can i915 get up to anything before it is resumed from the hibernation image?
Well, the frame buffer is presumably still using whatever mapping it had before suspend occurred; is there any way it could be writing through that before the graphics driver was resumed?
What I don't understand is the relationship between the boot kernel and the resumed kernel; when does the boot kernel stop writing to the console, and how does it hand off control of the frame buffer at that time.
It would be great if we could separate out the boot kernel access to the graphics system from the resumed system -- if the boot kernel was run without the i915 driver loaded at all, and just used VGA text mode, then any damage as a result of resume wouldn't be caused by the boot kernel GTT mappings getting used at the wrong time.
Added Rafael to the Cc: Rafael, we're pondering over one or more of these recurrent threads about corruption after resume, seemingly related to i915.
On Sat, 17 Mar 2012, Keith Packard wrote:
On Sat, 17 Mar 2012 18:44:18 -0700 (PDT), Hugh Dickins hughd@google.com wrote:
I keep worrying about the sequence when the machine is powered on again after hibernation: can i915 get up to anything before it is resumed from the hibernation image?
Well, the frame buffer is presumably still using whatever mapping it had before suspend occurred; is there any way it could be writing through that before the graphics driver was resumed?
It's hibernation restore here, so I don't think it could be using the mapping from before hibernation until after resuming from hibernation snapshot: it would be using the rebooting kernel's mapping until then.
What I don't understand is the relationship between the boot kernel and the resumed kernel; when does the boot kernel stop writing to the console, and how does it hand off control of the frame buffer at that time.
I believe the handoff point comes in the late initcall software_resume(): which loads the image and calls hibernation_restore -> resume_target_kernel -> swsusp_arch_resume, which emerges into the restored hibernation image.
As a late initcall, I imagine some work has already been done via the framebuffer, but I have no conception of what kind of mappings that involves (would shmem objects come into it at all? and is that even a relevant question, could enough damage be done without them?), nor whether they're properly torn down before emerging into the hibernimage.
It would be great if we could separate out the boot kernel access to the graphics system from the resumed system -- if the boot kernel was run without the i915 driver loaded at all, and just used VGA text mode, then any damage as a result of resume wouldn't be caused by the boot kernel GTT mappings getting used at the wrong time.
But you're giving my worry more credence than it deserves there: we don't have any evidence that this is where the problem lies, that's just a suspicion of mine at the moment.
Hugh
On Sunday, March 18, 2012, Hugh Dickins wrote:
Added Rafael to the Cc: Rafael, we're pondering over one or more of these recurrent threads about corruption after resume, seemingly related to i915.
Thanks for letting me know. :-)
I actually have a confirmation that the issue isn't present if the KMS is not used.
On Sat, 17 Mar 2012, Keith Packard wrote:
On Sat, 17 Mar 2012 18:44:18 -0700 (PDT), Hugh Dickins hughd@google.com wrote:
I keep worrying about the sequence when the machine is powered on again after hibernation: can i915 get up to anything before it is resumed from the hibernation image?
Well, the frame buffer is presumably still using whatever mapping it had before suspend occurred; is there any way it could be writing through that before the graphics driver was resumed?
It's hibernation restore here, so I don't think it could be using the mapping from before hibernation until after resuming from hibernation snapshot: it would be using the rebooting kernel's mapping until then.
That's correct.
What I don't understand is the relationship between the boot kernel and the resumed kernel; when does the boot kernel stop writing to the console, and how does it hand off control of the frame buffer at that time.
I believe the handoff point comes in the late initcall software_resume(): which loads the image and calls hibernation_restore -> resume_target_kernel -> swsusp_arch_resume, which emerges into the restored hibernation image.
That's correct too. It may, however, be done through the SNAPSHOT_ATOMIC_RESTORE ioctl in kernel/power/user.c, that calls hibernation_restore() directly, too.
In either case, hibernation_restore() calls suspend_console() which is the point the console is supposed to be left alone at.
Later, it calls dpm_suspend_start() that executes .freeze() callbacks of all device drivers and it calls resume_target_kernel() that executes the driver's .freeze_noirq() callbacks. If that goes well, swsusp_arch_resume() is run that jumps into the image kernel and then it starts over in create_image() at the restore_processor_state() call.
As a late initcall, I imagine some work has already been done via the framebuffer, but I have no conception of what kind of mappings that involves (would shmem objects come into it at all? and is that even a relevant question, could enough damage be done without them?), nor whether they're properly torn down before emerging into the hibernimage.
During resume from hibernation we're very careful to restore the entire contents of RAM and to switch the CPU (there's only one of them executing code at that point) to the pre-hibernation page tables. As a result, all of the CPU's memory mappings will be the same as before the hibernation when that's been completed. However, I'm not sure what the contents of the graphics' registers is at this point and whether or not it may possibly access "wrong" memory regions through DMA.
It would be great if we could separate out the boot kernel access to the graphics system from the resumed system -- if the boot kernel was run without the i915 driver loaded at all, and just used VGA text mode, then any damage as a result of resume wouldn't be caused by the boot kernel GTT mappings getting used at the wrong time.
That shouldn't be very difficult to verify, if i915 is built as a module and is not loaded until software_resume() is run.
But you're giving my worry more credence than it deserves there: we don't have any evidence that this is where the problem lies, that's just a suspicion of mine at the moment.
Well, pretty much the only explanation of the observed symptoms I can imagine is some kind of "leakage" of the boot kernel's memory mappings through the graphics adapter into the post-restore system.
Thanks, Rafael
On Sun, 18 Mar 2012, Rafael J. Wysocki wrote:
On Sunday, March 18, 2012, Hugh Dickins wrote:
On Sat, 17 Mar 2012, Keith Packard wrote:
On Sat, 17 Mar 2012 18:44:18 -0700 (PDT), Hugh Dickins hughd@google.com wrote:
I keep worrying about the sequence when the machine is powered on again after hibernation: can i915 get up to anything before it is resumed from the hibernation image?
...
Well, pretty much the only explanation of the observed symptoms I can imagine is some kind of "leakage" of the boot kernel's memory mappings through the graphics adapter into the post-restore system.
Thanks for your detailed confirmations, Rafael. I see from other threads that pretty much everyone has now converged upon this hypothesis - which perhaps I read first elsewhere, forgot, and then thought it my idea! I'll leave it to the i915 experts: good luck in tracking it down.
Hugh
On Sat, 17 Mar 2012, Linus Torvalds wrote:
On Sat, Mar 17, 2012 at 3:09 PM, Hugh Dickins hughd@google.com wrote:
I've got to go out for an hour: I'll digest more and think more about this when I get back. If someone could explain the original problem with _MOVABLE, that would help me:
I do not believe we actually ever uncovered the original problem with _MOVABLE: the problem was bisected down to the stable-backported version of commit 4bdadb978569 ("drm/i915: Selectively enable self-reclaim"), and I looked at the changes and decided that one of the main ones was the removal of the mapping_set_gfp_mask() - which resulted in __GFP_MOVABLE being on for the mapping.
So we just tried re-introducing that, and it fixed the problem.
Exactly *why* movable pages are a problem was never all that clear. ...
Thanks a lot for your considered and detailed reply on this (before we added Rafael).
I've come to the (not entirely firm) conclusion that the addition and removal of __GFP_MOVABLE was just shifting the shmem object allocations from one block of memory (shared with others not using __GFP_MOVABLE) to another (shared with others using __GFP_MOVABLE) and back.
So when the __GFP_MOVABLE inadvertently went in, a new group of users who hadn't noticed the corruption before, now reported it; and when you removed the __GFP_MOVABLE (a good idea anyway, pinned pages just clogging up an otherwise movable block of memory), that group of users found their problem solved.
Not really a problem from i915 specifying __GFP_MOVABLE on shmem objects.
Hugh
On Sat, 2012-03-17 at 14:14 -0700, Linus Torvalds wrote:
On Sat, Mar 17, 2012 at 1:23 PM, Dave Airlie airlied@gmail.com wrote:
I did however get a flashback in google to this:
http://lists.fedoraproject.org/pipermail/scm-commits/2010-July/456636.html
Linus don't think we ever did work out why that worked, I wonder if we lost something after that.
Hmm. Maybe we should stop making the default gfp_mask be GFP_HIGHMEM_MOVABLE (see inode_init_always() in fs/inode.c), and instead add the _MOVABLE flag only for mappings that use "generic_file_mmap()".
It does sound a bit scary to just make default mappings use MOVABLE pages, when we know that can be incorrect for some cases.
But that would require that filesystems like ext4 etc that don't just use the generic_file_mmap() function would have add do that MOVABLE thing on their own. And the *normal* case is certainly that pages are movable and putting them in themovable zone should be ok. It's just *not* ok if you also map them into some hardware GTT thing or just otherwise keep track of the pages directly, like DRI does.
The other possibility is to just make this a shm thing, since it's generally that layer that has the case of "pages allocated with the mapping gfp masks". Hugh Dickins clearly tried to make sure that the DRM initialization of the gfp mask was honored, but maybe there is some case where it is missed. Hugh added a mapping_set_gfp_mask() to i915_gem_alloc_object(), but maybe we have i915 shmem mappings that get set up through some other path.
What about the ttm swap storage thing, for example? That also does shmem_file_setup(), without limiting the pages. But I don't think i915 uses ttm, right?
It's not an issue for ttm, as we never us page allocated through shmem for the hw. We use shmem to swapout buffer object (ttm_tt_swapout func in ttm_tt.c).
Cheers, Jerome
<#part sign=pgpmime> On Fri, 16 Mar 2012 09:11:54 -0700, Linus Torvalds torvalds@linux-foundation.org wrote:
I don't know if these kinds of things have been forwarded to you, but there's apparently been several things like this going on - with the finger pointing to the i915 driver apparently clearing random memory. Often the end result seems to be list corruption or a NULL pointer dereference in the filesystem layer.
Yikes! I've been participating in the hibernation thread, but I hadn't heard about this other random memory corruption.
We had a theory about hibernation -- unflushed FB writes that go astray when the pages underneath the GTT get reassigned when switching from the boot kernel to the resumed kernel.
Something similar could be happening at other times though? A graphics page is freed with writes outstanding, the GTT entry is cleared and the page allocated to something in the filesystem, but before the GTT entry update is recognized by the GPU, and before pending writes are flushed, the data gets written through the GTT to the middle of the file system structure. That stretches the bounds of credibility though; you'd have to have unflushed data *and* an unflushed GTT write, neither of which is supposed to happen.
So it might be the culprit. As the reason of the corruption is not yet understood, it might be that suspend/resume cycle is not necessary pre-requisite for this to trigger, it might just make it more likely.
That opens up a much broader scope of code that might contain problems...
On Fri, Mar 16, 2012 at 3:52 PM, Keith Packard keithp@keithp.com wrote:
We had a theory about hibernation -- unflushed FB writes that go astray when the pages underneath the GTT get reassigned when switching from the boot kernel to the resumed kernel.
Well, even without hibernation, one theory was about the VC switching - apparently that was one thing that at least one reporter does end up doing - switching from X into the virtual terminals and back.
That may have similar issues, and may be the deeper problem (do people still do the VC switching *for* hibernation?)
Maybe.
Linus
<#part sign=pgpmime> On Fri, 16 Mar 2012 16:24:35 -0700, Linus Torvalds torvalds@linux-foundation.org wrote:
Well, even without hibernation, one theory was about the VC switching
- apparently that was one thing that at least one reporter does end up
doing - switching from X into the virtual terminals and back.
Thanks for the extra info; it doesn't bring anything to mind as we just don't do that much when VT-switching anymore. But, yet another code path to explore.
That may have similar issues, and may be the deeper problem (do people still do the VC switching *for* hibernation?)
I haven't a clue -- I haven't tried hibernating a machine in years...
dri-devel@lists.freedesktop.org