On 17.09.21 at 19:53, Zack Rusin wrote:
On some hardware, in particular in virtualized environments, the system memory can be shared with the "hardware". In those cases the BOs allocated through the ttm system manager might be busy during ttm_bo_put, which results in them being scheduled for a delayed deletion.
While the patch itself is probably fine, the reasoning here is a clear NAK.
Buffers in the system domain are not GPU-accessible by definition, even in a shared environment, and so *must* be idle.
Otherwise you break quite a number of assumptions in the code.
Regards, Christian.
The problem is that the ttm system manager is disabled before the final delayed deletion is run in ttm_device_fini. This results in crashes during freeing of the BO resources because they're trying to remove themselves from a ttm_resource_manager that no longer exists (e.g. in IGT's core_hotunplug on vmwgfx).
In general, reloading any driver that could share system memory resources with "hardware" could hit this, because nothing prevents those resources from being scheduled for delayed deletion (apart from them probably never being busy outside virtualized environments).
Signed-off-by: Zack Rusin <zackr@vmware.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
---
 drivers/gpu/drm/ttm/ttm_device.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 9eb8f54b66fc..4ef19cafc755 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -225,10 +225,6 @@ void ttm_device_fini(struct ttm_device *bdev)
 	struct ttm_resource_manager *man;
 	unsigned i;
 
-	man = ttm_manager_type(bdev, TTM_PL_SYSTEM);
-	ttm_resource_manager_set_used(man, false);
-	ttm_set_driver_manager(bdev, TTM_PL_SYSTEM, NULL);
-
 	mutex_lock(&ttm_global_mutex);
 	list_del(&bdev->device_list);
 	mutex_unlock(&ttm_global_mutex);
@@ -238,6 +234,10 @@ void ttm_device_fini(struct ttm_device *bdev)
 	if (ttm_bo_delayed_delete(bdev, true))
 		pr_debug("Delayed destroy list was clean\n");
 
+	man = ttm_manager_type(bdev, TTM_PL_SYSTEM);
+	ttm_resource_manager_set_used(man, false);
+	ttm_set_driver_manager(bdev, TTM_PL_SYSTEM, NULL);
+
 	spin_lock(&bdev->lru_lock);
 	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
 		if (list_empty(&man->lru[0]))
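
For anyone following along, the ordering issue described in the commit message can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: every toy_* name is hypothetical and none of it is real TTM code. It assumes that a BO on the delayed-delete list has to reach its manager through the device in order to give its resource back, and it reports a missing manager as a failure where the kernel would instead dereference a stale pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT the real TTM types. */
#define TOY_PL_SYSTEM 0
#define TOY_NUM_PL    1

struct toy_manager {
	bool used;
	int nr_resources;	/* resources still accounted to this manager */
};

struct toy_bo {
	int placement;		/* index into toy_device.managers */
	struct toy_bo *next;	/* link on the delayed-delete list */
};

struct toy_device {
	struct toy_manager *managers[TOY_NUM_PL];
	struct toy_bo *delayed;	/* BOs that were busy when last put */
};

/* Give a BO's resource back to its manager; false if the manager is gone. */
static bool toy_bo_release(struct toy_device *dev, struct toy_bo *bo)
{
	struct toy_manager *man = dev->managers[bo->placement];

	if (man == NULL)
		return false;	/* the kernel would crash at this point */
	man->nr_resources--;
	return true;
}

/* Flush the delayed-delete list; returns how many BOs could not be released. */
static int toy_delayed_delete(struct toy_device *dev)
{
	int failures = 0;

	while (dev->delayed != NULL) {
		struct toy_bo *bo = dev->delayed;

		dev->delayed = bo->next;
		if (!toy_bo_release(dev, bo))
			failures++;
	}
	return failures;
}

/* Mirrors the three moved lines: mark unused, then unregister the manager. */
static void toy_disable_system_manager(struct toy_device *dev)
{
	dev->managers[TOY_PL_SYSTEM]->used = false;
	dev->managers[TOY_PL_SYSTEM] = NULL;
}

/* Tear down a device holding one delayed system BO, in either order. */
int toy_device_fini(bool flush_first)
{
	struct toy_manager sys = { .used = true, .nr_resources = 1 };
	struct toy_bo bo = { .placement = TOY_PL_SYSTEM, .next = NULL };
	struct toy_device dev = { .managers = { &sys }, .delayed = &bo };
	int failures;

	if (flush_first) {
		failures = toy_delayed_delete(&dev);
		toy_disable_system_manager(&dev);
	} else {
		toy_disable_system_manager(&dev);
		failures = toy_delayed_delete(&dev);
	}
	return failures;
}
```

In this model toy_device_fini(false), which disables the manager before flushing, leaves one BO that cannot be released, while toy_device_fini(true), the order the patch moves to, leaves none. Whether a system-domain BO can legitimately end up on the delayed list at all is the separate question raised in the review above, and the sketch takes no position on it.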