Kernel side fence objects are used when unbinding resources and may thus be created as part of a memory reclaim operation. This might trigger recursive memory reclaims and result in the kernel running out of stack space.
So a simple way out is to avoid accounting of these fence objects. In principle this is OK since while user-space can trigger the creation of such objects, it can't really hold on to them. However, their lifetime is quite long, so some form of accounting should perhaps be implemented in the future.
Fixes kernel crashes when running, for example viewperf11 ensight-04 test 3 with low system memory settings.
Cc: stable@vger.kernel.org Signed-off-by: Thomas Hellstrom thellstrom@vmware.com Reviewed-by: Jakob Bornecrantz jakob@vmware.com Reviewed-by: Sinclair Yeh syeh@vmware.com --- drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 22 ++-------------------- 1 file changed, 2 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c index 197164f..6773938 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c @@ -545,35 +545,19 @@ void vmw_fence_obj_flush(struct vmw_fence_obj *fence)
static void vmw_fence_destroy(struct vmw_fence_obj *fence) { - struct vmw_fence_manager *fman = fman_from_fence(fence); - fence_free(&fence->base); - - /* - * Free kernel space accounting. - */ - ttm_mem_global_free(vmw_mem_glob(fman->dev_priv), - fman->fence_size); }
int vmw_fence_create(struct vmw_fence_manager *fman, uint32_t seqno, struct vmw_fence_obj **p_fence) { - struct ttm_mem_global *mem_glob = vmw_mem_glob(fman->dev_priv); struct vmw_fence_obj *fence; int ret;
- ret = ttm_mem_global_alloc(mem_glob, fman->fence_size, - false, false); - if (unlikely(ret != 0)) - return ret; - fence = kzalloc(sizeof(*fence), GFP_KERNEL); - if (unlikely(fence == NULL)) { - ret = -ENOMEM; - goto out_no_object; - } + if (unlikely(fence == NULL)) + return -ENOMEM;
ret = vmw_fence_obj_init(fman, fence, seqno, vmw_fence_destroy); @@ -585,8 +569,6 @@ int vmw_fence_create(struct vmw_fence_manager *fman,
out_err_init: kfree(fence); -out_no_object: - ttm_mem_global_free(mem_glob, fman->fence_size); return ret; }
The commit "vmwgfx: Rework fence event action" introduced a number of bugs that are fixed with this commit:
a) A forgotten return stateemnt. b) An if statement with identical branches.
Cc: stable@vger.kernel.org Reported-by: Rob Clark robdclark@gmail.com Signed-off-by: Thomas Hellstrom thellstrom@vmware.com Reviewed-by: Jakob Bornecrantz jakob@vmware.com Reviewed-by: Sinclair Yeh syeh@vmware.com --- drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c index 6773938..b7594cb 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c @@ -1087,6 +1087,8 @@ static int vmw_event_fence_action_create(struct drm_file *file_priv, if (ret != 0) goto out_no_queue;
+ return 0; + out_no_queue: event->base.destroy(&event->base); out_no_event: @@ -1162,17 +1164,10 @@ int vmw_fence_event_ioctl(struct drm_device *dev, void *data,
BUG_ON(fence == NULL);
- if (arg->flags & DRM_VMW_FE_FLAG_REQ_TIME) - ret = vmw_event_fence_action_create(file_priv, fence, - arg->flags, - arg->user_data, - true); - else - ret = vmw_event_fence_action_create(file_priv, fence, - arg->flags, - arg->user_data, - true); - + ret = vmw_event_fence_action_create(file_priv, fence, + arg->flags, + arg->user_data, + true); if (unlikely(ret != 0)) { if (ret != -ERESTARTSYS) DRM_ERROR("Failed to attach event to fence.\n");
This codepath is mostly hit when rebinding after a backup buffer swapout. It's amazing that this error hasn't been more obvious but probably the shaders are not reread from guest memory that often..
Signed-off-by: Thomas Hellstrom thellstrom@vmware.com Reviewed-by: Jakob Bornecrantz jakob@vmware.com Reviewed-by: Sinclair Yeh syeh@vmware.com --- drivers/gpu/drm/vmwgfx/vmwgfx_shader.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c index 8719fb3..6a4584a 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c @@ -198,7 +198,7 @@ static int vmw_gb_shader_bind(struct vmw_resource *res, cmd->header.size = sizeof(cmd->body); cmd->body.shid = res->id; cmd->body.mobid = bo->mem.start; - cmd->body.offsetInBytes = 0; + cmd->body.offsetInBytes = res->backup_offset; res->backup_dirty = false; vmw_fifo_commit(dev_priv, sizeof(*cmd));
On 2 December 2014 at 21:59, Thomas Hellstrom thellstrom@vmware.com wrote:
Kernel side fence objects are used when unbinding resources and may thus be created as part of a memory reclaim operation. This might trigger recursive memory reclaims and result in the kernel running out of stack space.
So a simple way out is to avoid accounting of these fence objects. In principle this is OK since while user-space can trigger the creation of such objects, it can't really hold on to them. However, their lifetime is quite long, so some form of accounting should perhaps be implemented in the future.
Fixes kernel crashes when running, for example viewperf11 ensight-04 test 3 with low system memory settings.
are these 3 intended for fixes? can you send me a git pull for them soon, so they don't miss Linus release.
Dave.
On 12/03/2014 02:06 AM, Dave Airlie wrote:
On 2 December 2014 at 21:59, Thomas Hellstrom thellstrom@vmware.com wrote:
Kernel side fence objects are used when unbinding resources and may thus be created as part of a memory reclaim operation. This might trigger recursive memory reclaims and result in the kernel running out of stack space.
So a simple way out is to avoid accounting of these fence objects. In principle this is OK since while user-space can trigger the creation of such objects, it can't really hold on to them. However, their lifetime is quite long, so some form of accounting should perhaps be implemented in the future.
Fixes kernel crashes when running, for example viewperf11 ensight-04 test 3 with low system memory settings.
are these 3 intended for fixes? can you send me a git pull for them soon, so they don't miss Linus release.
Dave.
Hi.
Actually, these bugs have been quite long-standing and I feel a bit uncomfortable including the fixes this late in the release cycle. I'd rather wait to 3.19 unless you think otherwise.
Thanks, Thomas
dri-devel@lists.freedesktop.org