Chris Morgan reported UBSAN errors in panfrost and tracked them down to the size computation in lock_region. This calculation is overcomplicated (seemingly cargo culted from kbase) and can be simplified with kernel helpers and some mathematical identities. The first patch in the series rewrites the calculation in a form avoiding undefined behaviour; Chris confirms it placates UBSAN.
While researching this function, I noticed a pair of other potential bugs: Bifrost can lock more than 4GiB at a time, but must lock at least 32KiB at a time. The latter patches in the series handle these cases.
The size computation was unit-tested in userspace. Relevant code below, just missing some copypaste definitions for fls64/clamp/etc:
#define MIN_LOCK (1ULL << 12) #define MAX_LOCK (1ULL << 48)
struct { uint64_t size; uint8_t encoded; } tests[] = { /* Clamping */ { 0, 11 }, { 1, 11 }, { 2, 11 }, { 4095, 11 }, /* Power of two */ { 4096, 11 }, /* Round up */ { 4097, 12 }, { 8192, 12 }, { 16384, 13 }, { 16385, 14 }, /* Maximum */ { ~0ULL, 47 }, };
static uint8_t region_width(uint64_t size) { size = clamp(size, MIN_LOCK, MAX_LOCK); return fls64(size - 1) - 1; }
int main(int argc, char **argv) { for (unsigned i = 0; i < ARRAY_SIZE(tests); ++i) { uint64_t test = tests[i].size; uint8_t expected = tests[i].encoded; uint8_t actual = region_width(test);
assert(expected == actual); } }
Alyssa Rosenzweig (3): drm/panfrost: Simplify lock_region calculation drm/panfrost: Use u64 for size in lock_region drm/panfrost: Clamp lock region to Bifrost minimum
drivers/gpu/drm/panfrost/panfrost_mmu.c | 31 +++++++++--------------- drivers/gpu/drm/panfrost/panfrost_regs.h | 2 ++ 2 files changed, 13 insertions(+), 20 deletions(-)
In lock_region, simplify the calculation of the region_width parameter. This field is the size, but encoded as log2(ceil(size)) - 1. log2(ceil(size)) may be computed directly as fls(size - 1). However, we want to use the 64-bit versions as the amount to lock can exceed 32-bits.
This avoids undefined behaviour when locking all memory (size ~0), caught by UBSAN.
Signed-off-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com Reported-and-tested-by: Chris Morgan macromorgan@hotmail.com Cc: stable@vger.kernel.org --- drivers/gpu/drm/panfrost/panfrost_mmu.c | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index 0da5b3100ab1..f6e02d0392f4 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -62,21 +62,12 @@ static void lock_region(struct panfrost_device *pfdev, u32 as_nr, { u8 region_width; u64 region = iova & PAGE_MASK; - /* - * fls returns: - * 1 .. 32 - * - * 10 + fls(num_pages) - * results in the range (11 .. 42) - */ - - size = round_up(size, PAGE_SIZE);
- region_width = 10 + fls(size >> PAGE_SHIFT); - if ((size >> PAGE_SHIFT) != (1ul << (region_width - 11))) { - /* not pow2, so must go up to the next pow2 */ - region_width += 1; - } + /* The size is encoded as ceil(log2) minus(1), which may be calculated + * with fls. The size must be clamped to hardware bounds. + */ + size = max_t(u64, size, PAGE_SIZE); + region_width = fls64(size - 1) - 1; region |= region_width;
/* Lock the region that needs to be updated */
On 20/08/2021 22:31, Alyssa Rosenzweig wrote:
It might have been useful to mention what it is that UBSAN specifically picked up (it took me a while to spot) - but anyway I think there's a bigger issue with it being completely wrong when size == ~0 (see below).
However, I've confirmed this returns the same values and is certainly more simple, so:
Reviewed-by: Steven Price steven.price@arm.com
This seems to be the first issue - ~0 will be 'rounded up' to 0.
fls(0) == 0, so region_width == 10.
Presumably here's where UBSAN objects - we're shifting by a negative value, which even it it happens to works means the lock region is tiny and certainly not what was intended! It might well be worth a:
Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
Note for anyone following along at (working-from-) home: although this code was cargo culted from kbase - kbase is fine because it takes a pfn and doesn't do the round_up() stage.
Which also exposes the second bug (fixed in patch 2): a size_t isn't big enough on 32 bit platforms (all Midgard/Bifrost GPUs have a VA size bigger than 32 bits). Again kbase gets away with a u32 because it's a pfn.
There is potentially a third bug which kbase only recently attempted to fix. The lock address is effectively rounded down by the hardware (the bottom bits are ignored). So if you have mask=(1<<region_width)-1 but (iova & mask) != ((iova + size) & mask) then you are potentially failing to lock the end of the intended region. kbase has added some code to handle this:
I guess we should too?
Steve
Indeed. I've updated the commit message in v2 to explain what goes wrong (your analysis was spot on, but a mailing list message is more ephermal than a commit message). I'll send out v2 tomorrow assuming nobody objects to v1 in the mean time.
Thanks for the review.
Oh, I missed this one. Guess we have 4 bugs with this code instead of just 3, yikes. How could such a short function be so deeply and horribly broken? 😃
Should I add a fourth patch to the series to fix this?
On 23 August 2021 22:09:08 BST, Alyssa Rosenzweig alyssa@collabora.com wrote:
Yes please!
Thanks, Steve
Mali virtual addresses are 48-bit. Use a u64 instead of size_t to ensure we can express the "lock everything" condition as ~0ULL without relying on platform-specific behaviour.
Signed-off-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com Suggested-by: Rob Herring robh@kernel.org Tested-by: Chris Morgan macromorgan@hotmail.com --- drivers/gpu/drm/panfrost/panfrost_mmu.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index f6e02d0392f4..3a795273e505 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -58,7 +58,7 @@ static int write_cmd(struct panfrost_device *pfdev, u32 as_nr, u32 cmd) }
static void lock_region(struct panfrost_device *pfdev, u32 as_nr, - u64 iova, size_t size) + u64 iova, u64 size) { u8 region_width; u64 region = iova & PAGE_MASK; @@ -78,7 +78,7 @@ static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
static int mmu_hw_do_operation_locked(struct panfrost_device *pfdev, int as_nr, - u64 iova, size_t size, u32 op) + u64 iova, u64 size, u32 op) { if (as_nr < 0) return 0; @@ -95,7 +95,7 @@ static int mmu_hw_do_operation_locked(struct panfrost_device *pfdev, int as_nr,
static int mmu_hw_do_operation(struct panfrost_device *pfdev, struct panfrost_mmu *mmu, - u64 iova, size_t size, u32 op) + u64 iova, u64 size, u32 op) { int ret;
@@ -112,7 +112,7 @@ static void panfrost_mmu_enable(struct panfrost_device *pfdev, struct panfrost_m u64 transtab = cfg->arm_mali_lpae_cfg.transtab; u64 memattr = cfg->arm_mali_lpae_cfg.memattr;
- mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0UL, AS_COMMAND_FLUSH_MEM); + mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0ULL, AS_COMMAND_FLUSH_MEM);
mmu_write(pfdev, AS_TRANSTAB_LO(as_nr), transtab & 0xffffffffUL); mmu_write(pfdev, AS_TRANSTAB_HI(as_nr), transtab >> 32); @@ -128,7 +128,7 @@ static void panfrost_mmu_enable(struct panfrost_device *pfdev, struct panfrost_m
static void panfrost_mmu_disable(struct panfrost_device *pfdev, u32 as_nr) { - mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0UL, AS_COMMAND_FLUSH_MEM); + mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0ULL, AS_COMMAND_FLUSH_MEM);
mmu_write(pfdev, AS_TRANSTAB_LO(as_nr), 0); mmu_write(pfdev, AS_TRANSTAB_HI(as_nr), 0); @@ -242,7 +242,7 @@ static size_t get_pgsize(u64 addr, size_t size)
static void panfrost_mmu_flush_range(struct panfrost_device *pfdev, struct panfrost_mmu *mmu, - u64 iova, size_t size) + u64 iova, u64 size) { if (mmu->as < 0) return;
On 20/08/2021 22:31, Alyssa Rosenzweig wrote:
'platform-specific behaviour' makes it sound like this is something to do with a particular board. This is 32bit/64bit - it's going to be broken on 32bit: large lock regions are not going to work.
Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
Reviewed-by: Steven Price steven.price@arm.com
Steve
Oh, my. I used the term as a weasel word since the spec is loose on how big a size_t actually is. But if this is causing actual breakage on armv7 we're in trouble. I'll add a Cc stable tag on v2, unless the Fixes implies that?
Thanks for pointing this out.
On 23 August 2021 22:11:09 BST, Alyssa Rosenzweig alyssa@collabora.com wrote:
The usual practice is to put CC: stable in the commit message or the fixes tag (no need to actually CC the stable email address). So just Fixes: should work
Thanks for pointing this out.
It's not actually quite so bad as it could be as >4GB regions are unlikely (especially on 32bit), but the GPU does of course support such a thing.
Thanks, Steve
When locking a region, we currently clamp to a PAGE_SIZE as the minimum lock region. While this is valid for Midgard, it is invalid for Bifrost, where the minimum locking size is 8x larger than the 4k page size. Add a hardware definition for the minimum lock region size (corresponding to KBASE_LOCK_REGION_MIN_SIZE_LOG2 in kbase) and respect it.
Signed-off-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com Tested-by: Chris Morgan macromorgan@hotmail.com Cc: stable@vger.kernel.org --- drivers/gpu/drm/panfrost/panfrost_mmu.c | 2 +- drivers/gpu/drm/panfrost/panfrost_regs.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index 3a795273e505..dfe5f1d29763 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -66,7 +66,7 @@ static void lock_region(struct panfrost_device *pfdev, u32 as_nr, /* The size is encoded as ceil(log2) minus(1), which may be calculated * with fls. The size must be clamped to hardware bounds. */ - size = max_t(u64, size, PAGE_SIZE); + size = max_t(u64, size, AS_LOCK_REGION_MIN_SIZE); region_width = fls64(size - 1) - 1; region |= region_width;
diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h index 1940ff86e49a..6c5a11ef1ee8 100644 --- a/drivers/gpu/drm/panfrost/panfrost_regs.h +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h @@ -316,6 +316,8 @@ #define AS_FAULTSTATUS_ACCESS_TYPE_READ (0x2 << 8) #define AS_FAULTSTATUS_ACCESS_TYPE_WRITE (0x3 << 8)
+#define AS_LOCK_REGION_MIN_SIZE (1ULL << 15) + #define gpu_write(dev, reg, data) writel(data, dev->iomem + reg) #define gpu_read(dev, reg) readl(dev->iomem + reg)
On 20/08/2021 22:31, Alyssa Rosenzweig wrote:
When locking a region, we currently clamp to a PAGE_SIZE as the minimum lock region. While this is valid for Midgard, it is invalid for Bifrost,
While the spec does seem to state it's invalid for Bifrost - kbase didn't bother with a lower clamp for a long time. I actually think this is in many ways more of a spec bug: i.e. implementation details of the round-up that the hardware does. But it's much safer following the spec ;) And it seems like kbase eventually caught up too.
Reviewed-by: Steven Price steven.price@arm.com
On 23 August 2021 22:13:45 BST, Alyssa Rosenzweig alyssa@collabora.com wrote:
I think it might still be worth fixing. Early Bifrost should be fine, but something triggered a bug report that caused kbase to be fixed, so I'm less confident that there's nothing out there that cares. Following both kbase and the spec seems the safest approach.
Thanks, Steve
dri-devel@lists.freedesktop.org