(2011/07/12 19:06), Chris Wilson wrote:
On Tue, 12 Jul 2011 18:36:50 +0900, KOSAKI Motohiro kosaki.motohiro@jp.fujitsu.com wrote:
Hi,
sorry for the delay.
On Wed, 29 Jun 2011 20:53:54 -0700, Keith Packard keithp@keithp.com wrote:
On Fri, 24 Jun 2011 17:03:22 +0900, KOSAKI Motohiro kosaki.motohiro@jp.fujitsu.com wrote:
Now, i915_gem_inactive_shrink() should return -1 instead of 0 if it can't take the lock. Otherwise vmscan gets badly confused, because it can't distinguish "can't take the lock temporarily" from "we've shrunk all of the i915 objects".
This doesn't look like the cleanest change possible. I think it would be better if the shrink function could uniformly return an error indication so that we wouldn't need the weird looking conditional return.
shrink_icache_memory() is a good example. It doesn't take a lock if sc->nr_to_scan==0. Ideally, i915_gem_inactive_shrink() should do the same.
My patch is only first aid.
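For illustration, the pattern described above (answer the nr_to_scan == 0 query without the lock, and return -1 only when a real scan can't get the lock) could look roughly like this in userspace C. This is only a sketch: struct cache, cache_shrink() and the atomic_flag standing in for dev->struct_mutex are made-up names, not i915 or VFS code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical cache; the flag stands in for dev->struct_mutex. */
struct cache {
    atomic_flag lock;
    atomic_int nr_objects;   /* readable without holding the lock */
};

void cache_init(struct cache *c, int nr_objects)
{
    atomic_flag_clear(&c->lock);
    atomic_init(&c->nr_objects, nr_objects);
}

/* Returns true if the lock was free and is now held by the caller. */
bool cache_trylock(struct cache *c)
{
    return !atomic_flag_test_and_set(&c->lock);
}

void cache_unlock(struct cache *c)
{
    atomic_flag_clear(&c->lock);
}

/*
 * nr_to_scan == 0: report the object count, never touch the lock.
 * nr_to_scan  > 0: try the lock; return -1 if it is busy, otherwise
 *                  drop up to nr_to_scan objects and report what is left.
 */
int cache_shrink(struct cache *c, int nr_to_scan)
{
    if (nr_to_scan == 0)
        return atomic_load(&c->nr_objects);

    if (!cache_trylock(c))
        return -1;

    int have = atomic_load(&c->nr_objects);
    int freed = nr_to_scan < have ? nr_to_scan : have;
    atomic_fetch_sub(&c->nr_objects, freed);
    cache_unlock(c);

    return atomic_load(&c->nr_objects);
}
```

With this shape, -1 can only come back from a scan request, so the count query always yields a usable number.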
Plus, if I understand correctly, i915_gem_inactive_shrink() has a more fundamental issue: shrinker code shouldn't use a mutex. It should use a spinlock instead.
Why? The shrinker code is run in a non-atomic context that is explicitly allowed to wait, or so I thought. Where's the caveat that prevents mutex? Why doesn't the kernel complain?
The issue is not contention. The problem happens when the mutex is already held by the thread that calls shrink_slab(). In that case i915_gem_inactive_shrink() has no way to shrink objects. How do you detect such a case?
IOW, don't call kmalloc(GFP_KERNEL) while holding dev->struct_mutex. Otherwise, the vmscan invoked in that call path completely fails to shrink the i915 cache, which badly confuses memory reclaim if i915 has a lot of shrinkable pages.
i915 can have several GiB of shrinkable pages, of which 2 GiB may be tied up in the GTT; for those we have to wait for the GPU to release them. In the future, we will be able to tie up all of physical memory.
There is only a single potential kmalloc in the shrinker path, for which we could preallocate a request so that we always have one available here.
Again, waiting is no problem if it is for a short enough time. BTW, I think preallocation must be implemented; otherwise the shrinker has no guarantee that it can shrink.
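To make the preallocation idea concrete, here is a minimal userspace sketch, assuming a single spare request is enough for the shrinker path. struct request, struct request_pool and the pool_*() helpers are hypothetical names for illustration, not the i915 request code; malloc() stands in for kmalloc(GFP_KERNEL).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request object; the real one would carry GPU state. */
struct request {
    int seqno;
};

/* Holds at most one spare request, allocated ahead of time. */
struct request_pool {
    struct request *spare;
};

/* Call from a context where allocating is safe (not from the shrinker).
 * Returns 0 if a spare is available afterwards, -1 on allocation failure. */
int pool_refill(struct request_pool *p)
{
    if (p->spare)
        return 0;
    p->spare = malloc(sizeof(*p->spare));
    return p->spare ? 0 : -1;
}

/* Shrinker-path helper: hands out the spare and never allocates.
 * Returns NULL if nobody refilled the pool. */
struct request *pool_take(struct request_pool *p)
{
    struct request *r = p->spare;
    p->spare = NULL;
    return r;
}

/* Give a request back; keep it as the spare if the slot is empty. */
void pool_release(struct request_pool *p, struct request *r)
{
    if (!r)
        return;
    if (!p->spare)
        p->spare = r;
    else
        free(r);
}
```

The point of the split is that pool_take() cannot recurse into reclaim, since it only moves a pointer.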
thanks.
Unless I am mistaken, and there are more patches in flight, the return code from i915_gem_inactive_shrink() is promoted to unsigned long and then used in the calculation of how many objects to evict...
shrinker->shrink returns an int. You can't change i915_gem_inactive_shrink() without also changing the generic shrinker code. Do you really want to change it?
No, just pointing out that the patch causes warnings from the shrinker code as it tries to process (unsigned long)-1 objects. shrink_slab() does not use <0 as an error code!
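The promotion being described can be reproduced in plain C. A small sketch, assuming the caller stores the shrinker's int return value in an unsigned long object count; the function names and the max_pass variable are illustrative here, not quoted from the shrinker code.

```c
#include <limits.h>

/* Stores the int return value in an unsigned long, as a caller that
 * treats it as an object count would. -1 wraps to ULONG_MAX. */
unsigned long count_from_shrinker(int shrinker_ret)
{
    unsigned long max_pass = (unsigned long)shrinker_ret;
    return max_pass;
}

/* One possible guard: treat any negative value as "nothing to scan"
 * before converting. */
unsigned long count_from_shrinker_checked(int shrinker_ret)
{
    if (shrinker_ret < 0)
        return 0;
    return (unsigned long)shrinker_ret;
}
```

So a -1 that reaches the count calculation unchecked looks like ULONG_MAX objects, not like an error.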
Look.
unsigned long shrink_slab(struct shrink_control *shrink,
			  unsigned long nr_pages_scanned,
			  unsigned long lru_pages)
{
(snip)
		while (total_scan >= SHRINK_BATCH) {
			long this_scan = SHRINK_BATCH;
			int shrink_ret;
			int nr_before;

			nr_before = do_shrinker_shrink(shrinker, shrink, 0);
			shrink_ret = do_shrinker_shrink(shrinker, shrink,
							this_scan);
			if (shrink_ret == -1)
				break;