On Tue, Aug 06, 2019 at 08:15:45PM -0300, Jason Gunthorpe wrote:
From: Jason Gunthorpe <jgg@mellanox.com>
radeon uses a device-global hash table to track which mmu_notifiers have been registered on a struct mm. This is better served by the new get/put scheme.
radeon has a bug where it does not block the notifier release() until all the BOs have been invalidated. This could result in a use after free of the pages used by the BOs. This is tied to a second bug where radeon leaves the notifiers running endlessly even once the interval tree becomes empty, which could result in a use after free on module unload.
Both are fixed by changing the lifetime model: the BOs exist in the interval tree with their natural lifetimes, independent of the mm_struct lifetime, using the get/put scheme. The release runs synchronously and just does invalidate_start across the entire interval tree to create the required DMA fence.
Additions to the interval tree after release are already impossible as only current->mm is used during the add.
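To make the lifetime model above concrete, here is a minimal userspace sketch, not the patch itself. Every name in it (model_mm, model_notifier, model_bo and the model_* functions) is hypothetical and is not a kernel or radeon API. A linked list stands in for the interval tree and a boolean stands in for the DMA fence. It assumes a single-threaded caller, so there is no locking.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model: none of these names are kernel or radeon APIs. */
struct model_mm;
struct model_bo;

struct model_notifier {
    int refcount;              /* one reference per registered BO */
    struct model_mm *mm;       /* NULL once release has run */
    struct model_bo *bos;      /* linked list standing in for the interval tree */
};

struct model_mm {
    struct model_notifier *mn; /* notifier attached to this mm, if any */
};

struct model_bo {
    unsigned long start, last; /* inclusive address range */
    bool fenced;               /* set once the range has been invalidated */
    struct model_notifier *mn;
    struct model_bo *next;
};

/* get: find or create the notifier for this mm and take a reference. */
static struct model_notifier *model_mn_get(struct model_mm *mm)
{
    if (!mm->mn) {
        mm->mn = calloc(1, sizeof(*mm->mn));
        if (!mm->mn)
            return NULL;
        mm->mn->mm = mm;
    }
    mm->mn->refcount++;
    return mm->mn;
}

/* put: drop a reference; the last put detaches and frees the notifier. */
static void model_mn_put(struct model_notifier *mn)
{
    if (--mn->refcount > 0)
        return;
    if (mn->mm)
        mn->mm->mn = NULL;
    free(mn);
}

static int model_bo_register(struct model_mm *mm, struct model_bo *bo)
{
    struct model_notifier *mn = model_mn_get(mm);

    if (!mn)
        return -1;
    bo->mn = mn;
    bo->fenced = false;
    bo->next = mn->bos;
    mn->bos = bo;
    return 0;
}

static void model_bo_unregister(struct model_bo *bo)
{
    struct model_bo **p;

    for (p = &bo->mn->bos; *p; p = &(*p)->next) {
        if (*p == bo) {
            *p = bo->next;
            break;
        }
    }
    model_mn_put(bo->mn);      /* may free the notifier */
    bo->mn = NULL;
}

/* Mark every BO overlapping [start, last] as fenced; return how many. */
static int model_invalidate(struct model_notifier *mn,
                            unsigned long start, unsigned long last)
{
    struct model_bo *bo;
    int n = 0;

    for (bo = mn->bos; bo; bo = bo->next) {
        if (bo->start <= last && start <= bo->last) {
            bo->fenced = true;
            n++;
        }
    }
    return n;
}

/* release: synchronously invalidate the whole tree, then detach from the mm. */
static int model_release(struct model_mm *mm)
{
    struct model_notifier *mn = mm->mn;
    int n;

    if (!mn)
        return 0;
    n = model_invalidate(mn, 0, ULONG_MAX);
    mn->mm = NULL;
    mm->mn = NULL;
    return n;
}

/* The mm goes away while two BOs are still alive; returns the fenced count. */
int model_demo(void)
{
    struct model_mm mm = { 0 };
    struct model_bo a = { .start = 0x1000, .last = 0x1fff };
    struct model_bo b = { .start = 0x8000, .last = 0x8fff };
    int fenced;

    if (model_bo_register(&mm, &a) || model_bo_register(&mm, &b))
        return -1;
    fenced = model_release(&mm);   /* notifier survives: BOs still hold refs */
    assert(a.fenced && b.fenced && mm.mn == NULL);
    model_bo_unregister(&a);
    model_bo_unregister(&b);       /* last put frees the notifier */
    return fenced;
}
```

Calling model_demo() from a main() exercises the case of interest: release runs while BOs are still registered, every range is invalidated before release returns, and the notifier is freed only when the last BO drops its reference. The sketch does not model the "only current->mm is used during the add" rule; it simply never registers a BO after release.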
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/gpu/drm/radeon/radeon.h        |   3 -
 drivers/gpu/drm/radeon/radeon_device.c |   2 -
 drivers/gpu/drm/radeon/radeon_drv.c    |   2 +
 drivers/gpu/drm/radeon/radeon_mn.c     | 157 ++++++-------------------
 4 files changed, 38 insertions(+), 126 deletions(-)
AMD team: Are you OK with this patch?
Jason