On Mon, Apr 1, 2013 at 4:14 PM, Christoph Lameter cl@linux.com wrote:
On Wed, 27 Mar 2013, Ilia Mirkin wrote:
The GPF happens at +160, which is in the argument setup for the cmpxchg in slab_alloc_node. I think it's the call to get_freepointer(). There was a similar bug report a while back, https://lkml.org/lkml/2011/5/23/199, and the recommendation was to run with slub debugging. Is that still the case, or is there a simpler explanation? I can't reproduce this at will, not sure how many times this has happened but definitely not many.
slub debugging will help to track down the cause of the memory corruption.
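(Editorial aside: a rough userspace sketch of the kind of check slub debugging performs, for readers who have not used it. With poisoning, a freed object is filled with a known byte pattern, and with red zoning, guard bytes sit just past the object. Any byte that no longer matches shows that something wrote to memory it did not own. The struct layout and the toy_* names are hypothetical and are not kernel code. The byte values are the ones I believe the kernel defines in include/linux/poison.h, so treat them as an assumption.)

```c
#include <stddef.h>
#include <string.h>

/* Assumed to mirror the kernel's poison values; illustration only. */
#define POISON_FREE 0x6b   /* fill byte for a freed object */
#define POISON_END  0xa5   /* last byte of a freed object */
#define RED_ZONE    0xbb   /* guard bytes placed after the object */
#define RED_LEN     8

/* Hypothetical toy object: a payload followed directly by a red zone. */
struct toy_obj {
    unsigned char payload[32];
    unsigned char redzone[RED_LEN];
};

/* "Free" the object: poison the payload and (re)write the red zone. */
static void toy_free(struct toy_obj *o)
{
    memset(o->payload, POISON_FREE, sizeof(o->payload) - 1);
    o->payload[sizeof(o->payload) - 1] = POISON_END;
    memset(o->redzone, RED_ZONE, RED_LEN);
}

/*
 * Verify a freed object. Returns the offset of the first byte that does
 * not hold the expected pattern, or -1 if the object is intact.
 */
static long toy_check(const struct toy_obj *o)
{
    const unsigned char *p = (const unsigned char *)o;
    size_t n = sizeof(o->payload);

    for (size_t i = 0; i < n + RED_LEN; i++) {
        unsigned char want;

        if (i < n - 1)
            want = POISON_FREE;
        else if (i == n - 1)
            want = POISON_END;
        else
            want = RED_ZONE;

        if (p[i] != want)
            return (long)i;
    }
    return -1;
}
```

A check like this only notices the damage the next time the object is inspected, so it identifies the victim object rather than the code that did the writing.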
OK, with slub_debug=FZP, I get (after a while):
That definitely makes it look like something in the nouveau context/whatever alloc failure path causes some stomping to happen. (I don't suppose it's reasonable to catch the stomping as it happens through some sort of page protection... it would explode the memory use, since each n-byte object would take at least 4K, but it might be worth it for debugging...)
On Sat, Apr 6, 2013 at 5:01 AM, Ilia Mirkin imirkin@alum.mit.edu wrote:
OK, after staring at this code for a while, I found an issue, and it looks like it's already fixed by commit cfd376b6bfccf33782a0748a9c70f7f752f8b869 (drm/nouveau/vm: fix memory corruption when pgt allocation fails), which didn't make it into 3.7.9 but is in 3.7.10. Time to upgrade, I guess. Thanks for the various suggestions.