Chris Morgan reported UBSAN errors in panfrost and tracked them down to the size computation in lock_region. This calculation is overcomplicated (cargo culted from kbase) and can be simplified with kernel helpers and some mathematical identities. The first patch in the series rewrites the calculation in a form avoiding undefined behaviour; Chris confirms it placates UBSAN.
While researching this function, I noticed a pair of other potential bugs: Bifrost can lock more than 4GiB at a time, but must lock at least 32KiB at a time. The latter patches in the series handle these cases.
In review of v1 of this series, Steven pointed out a fourth potential bug: rounding down the iova can truncate the lock region. v2 adds a new patch for this case.
The size computation was unit-tested in userspace. Relevant code below, just missing some copypaste definitions for fls64/clamp/etc:
#define MIN_LOCK (1ULL << 12)
#define MAX_LOCK (1ULL << 48)

struct {
        uint64_t size;
        uint8_t encoded;
} tests[] = {
        /* Clamping */
        { 0, 11 },
        { 1, 11 },
        { 2, 11 },
        { 4095, 11 },
        /* Power of two */
        { 4096, 11 },
        /* Round up */
        { 4097, 12 },
        { 8192, 12 },
        { 16384, 13 },
        { 16385, 14 },
        /* Maximum */
        { ~0ULL, 47 },
};

static uint8_t region_width(uint64_t size)
{
        size = clamp(size, MIN_LOCK, MAX_LOCK);
        return fls64(size - 1) - 1;
}

int main(int argc, char **argv)
{
        for (unsigned i = 0; i < ARRAY_SIZE(tests); ++i) {
                uint64_t test = tests[i].size;
                uint8_t expected = tests[i].encoded;
                uint8_t actual = region_width(test);

                assert(expected == actual);
        }
}
Changes in v2:
* New patch for non-aligned lock addresses
* Commit message improvements.
* Add Steven's tags.

Alyssa Rosenzweig (4):
  drm/panfrost: Simplify lock_region calculation
  drm/panfrost: Use u64 for size in lock_region
  drm/panfrost: Clamp lock region to Bifrost minimum
  drm/panfrost: Handle non-aligned lock addresses

 drivers/gpu/drm/panfrost/panfrost_mmu.c  | 32 ++++++++++--------------
 drivers/gpu/drm/panfrost/panfrost_regs.h |  2 ++
 2 files changed, 15 insertions(+), 19 deletions(-)
In lock_region, simplify the calculation of the region_width parameter. This field is the size, but encoded as ceil(log2(size)) - 1. ceil(log2(size)) may be computed directly as fls(size - 1). However, we want to use the 64-bit version, fls64, as the amount to lock can exceed 32 bits.

This avoids undefined (and completely wrong) behaviour when locking all memory (size ~0). In this case, the old code would "round up" ~0 to the nearest page, overflowing to 0. Since fls(0) == 0, this would calculate a region width of 10 + 0 = 10. But then the code would shift by (region_width - 11) = -1. As shifting by a negative number is undefined, UBSAN flags the bug. Of course, even if it were defined, the behaviour would be wrong: instead of locking all memory, almost none would get locked.
The new form of the calculation corrects this special case and avoids the undefined behaviour.
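For reference, a tiny userspace sketch of the new calculation (illustrative only, not driver code; fls64_user() is a stand-in for the kernel's fls64 built on __builtin_clzll, and the PAGE_SIZE clamp is open-coded):

#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for fls64(): 1-based index of the highest set bit, 0 for 0. */
static unsigned fls64_user(uint64_t x)
{
        return x ? 64 - __builtin_clzll(x) : 0;
}

int main(void)
{
        /* Old path, traced for size = ~0: round_up(~0, PAGE_SIZE) wraps to 0,
         * fls(0) == 0, so region_width = 10 and the pow2 check shifts by
         * (10 - 11) == -1, which is undefined. The new path never underflows: */
        uint64_t size = ~0ULL;

        size = size > 4096 ? size : 4096;       /* max_t(u64, size, PAGE_SIZE) */
        assert(fls64_user(size - 1) - 1 == 63); /* covers the whole 64-bit space */

        /* A single-page request encodes to width 11 (a 4 KiB region). */
        assert(fls64_user(4096 - 1) - 1 == 11);
        return 0;
}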
Signed-off-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com
Reported-and-tested-by: Chris Morgan macromorgan@hotmail.com
Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 0da5b3100ab1..f6e02d0392f4 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -62,21 +62,12 @@ static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
 {
        u8 region_width;
        u64 region = iova & PAGE_MASK;
 
-       /*
-        * fls returns:
-        * 1 .. 32
-        *
-        * 10 + fls(num_pages)
-        * results in the range (11 .. 42)
-        */
-
-       size = round_up(size, PAGE_SIZE);
-       region_width = 10 + fls(size >> PAGE_SHIFT);
-       if ((size >> PAGE_SHIFT) != (1ul << (region_width - 11))) {
-               /* not pow2, so must go up to the next pow2 */
-               region_width += 1;
-       }
+       /* The size is encoded as ceil(log2) minus(1), which may be calculated
+        * with fls. The size must be clamped to hardware bounds.
+        */
+       size = max_t(u64, size, PAGE_SIZE);
+       region_width = fls64(size - 1) - 1;
        region |= region_width;
 
        /* Lock the region that needs to be updated */
On 24/08/2021 18:30, Alyssa Rosenzweig wrote:
In lock_region, simplify the calculation of the region_width parameter. This field is the size, but encoded as ceil(log2(size)) - 1. ceil(log2(size)) may be computed directly as fls(size - 1). However, we want to use the 64-bit version, fls64, as the amount to lock can exceed 32 bits.

This avoids undefined (and completely wrong) behaviour when locking all memory (size ~0). In this case, the old code would "round up" ~0 to the nearest page, overflowing to 0. Since fls(0) == 0, this would calculate a region width of 10 + 0 = 10. But then the code would shift by (region_width - 11) = -1. As shifting by a negative number is undefined, UBSAN flags the bug. Of course, even if it were defined, the behaviour would be wrong: instead of locking all memory, almost none would get locked.
The new form of the calculation corrects this special case and avoids the undefined behaviour.
Signed-off-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com
Reported-and-tested-by: Chris Morgan macromorgan@hotmail.com
Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
Cc: stable@vger.kernel.org
Reviewed-by: Steven Price steven.price@arm.com
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 0da5b3100ab1..f6e02d0392f4 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -62,21 +62,12 @@ static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
 {
        u8 region_width;
        u64 region = iova & PAGE_MASK;
 
-       /*
-        * fls returns:
-        * 1 .. 32
-        *
-        * 10 + fls(num_pages)
-        * results in the range (11 .. 42)
-        */
-
-       size = round_up(size, PAGE_SIZE);
-       region_width = 10 + fls(size >> PAGE_SHIFT);
-       if ((size >> PAGE_SHIFT) != (1ul << (region_width - 11))) {
-               /* not pow2, so must go up to the next pow2 */
-               region_width += 1;
-       }
+       /* The size is encoded as ceil(log2) minus(1), which may be calculated
+        * with fls. The size must be clamped to hardware bounds.
+        */
+       size = max_t(u64, size, PAGE_SIZE);
+       region_width = fls64(size - 1) - 1;
        region |= region_width;
 
        /* Lock the region that needs to be updated */
Mali virtual addresses are 48-bit. Use a u64 instead of a size_t to ensure we can express the "lock everything" condition as ~0ULL without truncation. This code was silently broken on any platform where a size_t is less than 48 bits; in particular, it was broken on 32-bit armv7 platforms, which remain in use with panfrost (mainly RK3288).
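To make the truncation concrete, a quick userspace sketch (illustrative; lock_width() is a hypothetical helper mirroring fls64(size - 1) - 1, and the cast emulates a 32-bit size_t):

#include <assert.h>
#include <stdint.h>

/* Mirror of the driver's encoding: region_width = fls64(size - 1) - 1 */
static unsigned lock_width(uint64_t size)
{
        return 64 - __builtin_clzll(size - 1) - 1;
}

int main(void)
{
        /* On armv7, size_t is 32-bit, so "lock everything" (~0UL) is only
         * 4 GiB - 1 and encodes a 4 GiB region... */
        assert(lock_width((uint32_t)~0UL) == 31);

        /* ...whereas a u64 size expresses the full address space. */
        assert(lock_width(~0ULL) == 63);
        return 0;
}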
Signed-off-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com
Suggested-by: Rob Herring robh@kernel.org
Tested-by: Chris Morgan macromorgan@hotmail.com
Reviewed-by: Steven Price steven.price@arm.com
Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index f6e02d0392f4..3a795273e505 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -58,7 +58,7 @@ static int write_cmd(struct panfrost_device *pfdev, u32 as_nr, u32 cmd)
 }
 
 static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
-                       u64 iova, size_t size)
+                       u64 iova, u64 size)
 {
        u8 region_width;
        u64 region = iova & PAGE_MASK;
@@ -78,7 +78,7 @@ static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
 
 static int mmu_hw_do_operation_locked(struct panfrost_device *pfdev, int as_nr,
-                                     u64 iova, size_t size, u32 op)
+                                     u64 iova, u64 size, u32 op)
 {
        if (as_nr < 0)
                return 0;
@@ -95,7 +95,7 @@ static int mmu_hw_do_operation_locked(struct panfrost_device *pfdev, int as_nr,
 
 static int mmu_hw_do_operation(struct panfrost_device *pfdev,
                               struct panfrost_mmu *mmu,
-                              u64 iova, size_t size, u32 op)
+                              u64 iova, u64 size, u32 op)
 {
        int ret;
 
@@ -112,7 +112,7 @@ static void panfrost_mmu_enable(struct panfrost_device *pfdev, struct panfrost_m
        u64 transtab = cfg->arm_mali_lpae_cfg.transtab;
        u64 memattr = cfg->arm_mali_lpae_cfg.memattr;
 
-       mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0UL, AS_COMMAND_FLUSH_MEM);
+       mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0ULL, AS_COMMAND_FLUSH_MEM);
 
        mmu_write(pfdev, AS_TRANSTAB_LO(as_nr), transtab & 0xffffffffUL);
        mmu_write(pfdev, AS_TRANSTAB_HI(as_nr), transtab >> 32);
@@ -128,7 +128,7 @@ static void panfrost_mmu_enable(struct panfrost_device *pfdev, struct panfrost_m
 
 static void panfrost_mmu_disable(struct panfrost_device *pfdev, u32 as_nr)
 {
-       mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0UL, AS_COMMAND_FLUSH_MEM);
+       mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0ULL, AS_COMMAND_FLUSH_MEM);
 
        mmu_write(pfdev, AS_TRANSTAB_LO(as_nr), 0);
        mmu_write(pfdev, AS_TRANSTAB_HI(as_nr), 0);
@@ -242,7 +242,7 @@ static size_t get_pgsize(u64 addr, size_t size)
 
 static void panfrost_mmu_flush_range(struct panfrost_device *pfdev,
                                     struct panfrost_mmu *mmu,
-                                    u64 iova, size_t size)
+                                    u64 iova, u64 size)
 {
        if (mmu->as < 0)
                return;
When locking a region, we currently clamp to PAGE_SIZE as the minimum lock region. While this is valid for Midgard, it is invalid for Bifrost, where the minimum lock region is 8x larger than the 4k page size, i.e. 32KiB. Add a hardware definition for the minimum lock region size (corresponding to KBASE_LOCK_REGION_MIN_SIZE_LOG2 in kbase) and respect it.
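As a worked example (userspace sketch; the constant mirrors the AS_LOCK_REGION_MIN_SIZE definition added below, and lock_width() is a hypothetical helper mirroring fls64(size - 1) - 1):

#include <assert.h>
#include <stdint.h>

#define AS_LOCK_REGION_MIN_SIZE (1ULL << 15)    /* 32 KiB */

static unsigned lock_width(uint64_t size)
{
        return 64 - __builtin_clzll(size - 1) - 1;      /* fls64(size - 1) - 1 */
}

int main(void)
{
        uint64_t size = 4096;   /* caller locks a single 4 KiB page */

        /* Clamping only to PAGE_SIZE encodes width 11 (a 4 KiB region),
         * below what Bifrost accepts... */
        assert(lock_width(size) == 11);

        /* ...clamping to the 32 KiB minimum encodes width 14 instead. */
        size = size > AS_LOCK_REGION_MIN_SIZE ? size : AS_LOCK_REGION_MIN_SIZE;
        assert(lock_width(size) == 14);
        return 0;
}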
Signed-off-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com
Tested-by: Chris Morgan macromorgan@hotmail.com
Reviewed-by: Steven Price steven.price@arm.com
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c  | 2 +-
 drivers/gpu/drm/panfrost/panfrost_regs.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 3a795273e505..dfe5f1d29763 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -66,7 +66,7 @@ static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
        /* The size is encoded as ceil(log2) minus(1), which may be calculated
         * with fls. The size must be clamped to hardware bounds.
         */
-       size = max_t(u64, size, PAGE_SIZE);
+       size = max_t(u64, size, AS_LOCK_REGION_MIN_SIZE);
        region_width = fls64(size - 1) - 1;
        region |= region_width;
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h
index 1940ff86e49a..6c5a11ef1ee8 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -316,6 +316,8 @@
 #define AS_FAULTSTATUS_ACCESS_TYPE_READ                (0x2 << 8)
 #define AS_FAULTSTATUS_ACCESS_TYPE_WRITE       (0x3 << 8)
 
+#define AS_LOCK_REGION_MIN_SIZE                (1ULL << 15)
+
 #define gpu_write(dev, reg, data) writel(data, dev->iomem + reg)
 #define gpu_read(dev, reg) readl(dev->iomem + reg)
When locking memory, the base address is rounded down to the nearest page. The current code does not adjust the size in this case, truncating the lock region:
    Input:        [----size----]
    Round:     [----size----]
To fix the truncation, extend the lock region by the amount rounded off.
    Input:        [----size----]
    Round:     [-------size------]
This bug is difficult to hit under current assumptions: since the size of the lock region is stored as a ceil(log2), the truncation must cause us to cross a power-of-two boundary. This is possible, for example if the caller tries to lock 65535 bytes starting at iova 0xCAFE0010. The existing code rounds down the iova to 0xCAFE0000 and rounds up the lock region to 65536 bytes, locking until 0xCAFF0000. This fails to lock the last 15 bytes.
In practice, the current callers pass PAGE_SIZE aligned inputs, avoiding the bug, so this doesn't need to be backported. Still, that's a happy accident and not a precondition of lock_region, so let's do the right thing to future-proof.
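A small userspace check of the 0xCAFE0010 example above (illustrative only; region_size() is a hypothetical helper returning the power-of-two span implied by the encoded width):

#include <assert.h>
#include <stdint.h>

/* Bytes covered by the encoded width for a given size: 2^(fls64(size - 1)) */
static uint64_t region_size(uint64_t size)
{
        return 1ULL << (64 - __builtin_clzll(size - 1));
}

int main(void)
{
        uint64_t iova = 0xCAFE0010, size = 65535;
        uint64_t start = iova & ~0xFFFULL;      /* page-rounded base */
        uint64_t end = iova + size;             /* one past the last byte wanted */

        /* Unadjusted: 64 KiB from 0xCAFE0000 ends at 0xCAFF0000, short of end. */
        assert(start + region_size(size) < end);

        /* Extending size by the rounded-off offset covers the full range. */
        assert(start + region_size(size + (iova - start)) >= end);
        return 0;
}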
Signed-off-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com
Reported-by: Steven Price steven.price@arm.com
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index dfe5f1d29763..14be32497ec3 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -63,6 +63,9 @@ static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
        u8 region_width;
        u64 region = iova & PAGE_MASK;
 
+       /* After rounding the address down, extend the size to lock the end. */
+       size += (iova - region);
+
        /* The size is encoded as ceil(log2) minus(1), which may be calculated
         * with fls. The size must be clamped to hardware bounds.
         */
On 24/08/2021 18:30, Alyssa Rosenzweig wrote:
When locking memory, the base address is rounded down to the nearest page. The current code does not adjust the size in this case, truncating the lock region:
    Input:        [----size----]
    Round:     [----size----]
To fix the truncation, extend the lock region by the amount rounded off.
    Input:        [----size----]
    Round:     [-------size------]
This bug is difficult to hit under current assumptions: since the size of the lock region is stored as a ceil(log2), the truncation must cause us to cross a power-of-two boundary. This is possible, for example if the caller tries to lock 65535 bytes starting at iova 0xCAFE0010. The existing code rounds down the iova to 0xCAFE0000 and rounds up the lock region to 65536 bytes, locking until 0xCAFF0000. This fails to lock the last 15 bytes.
In practice, the current callers pass PAGE_SIZE aligned inputs, avoiding the bug, so this doesn't need to be backported. Still, that's a happy accident and not a precondition of lock_region, so let's do the right thing to future-proof.
Actually it's worse than that due to the hardware behaviour, the spec states (for LOCKADDR_BASE):
Only the upper bits of the address are used. The address is aligned to a multiple of the region size, so a variable number of low-order bits are ignored, depending on the selected region size. It is recommended that software ensures that these low bits in the address are cleared, to avoid confusion.
It appears that indeed this has caused confusion ;)
So for a simple request like locking from 0xCAFE0000 - 0xCB010000 (size = 0x30000) the region width gets rounded up (to 0x40000) which causes the start address to be effectively rounded down (by the hardware) to 0xCAFC0000 and we fail to lock 0xCB000000-0xCB010000.
Interestingly (unless my reading of this is wrong) that means to lock 0xFFFF0000-0x100010000 (i.e. crossing the 4GB boundary) requires locking *at least* 0x00000000-0x200000000 (i.e. locking the first 8GB).
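If that reading is right, the arithmetic can be sketched as follows (purely illustrative, assuming naturally aligned power-of-two lock regions and a 4 KiB minimum):

#include <assert.h>
#include <stdint.h>

/* log2 of the smallest naturally aligned power-of-two region covering
 * [start, end), starting from a 4 KiB minimum. */
static unsigned covering_region_log2(uint64_t start, uint64_t end)
{
        unsigned k = 12;

        while (((start >> k) << k) + (1ULL << k) < end)
                k++;
        return k;
}

int main(void)
{
        /* 0xFFFF0000 - 0x100010000 straddles the 4 GiB boundary, so the
         * smallest aligned region covering it is 8 GiB starting at 0. */
        assert(covering_region_log2(0xFFFF0000ULL, 0x100010000ULL) == 33);
        return 0;
}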
This appears to be broken in kbase (which actually does zero out the low bits of the address) - I've raised a bug internally so hopefully someone will tell me if I've read the spec completely wrong here.
Steve
Signed-off-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com
Reported-by: Steven Price steven.price@arm.com

 drivers/gpu/drm/panfrost/panfrost_mmu.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index dfe5f1d29763..14be32497ec3 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -63,6 +63,9 @@ static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
        u8 region_width;
        u64 region = iova & PAGE_MASK;
 
+       /* After rounding the address down, extend the size to lock the end. */
+       size += (iova - region);
+
        /* The size is encoded as ceil(log2) minus(1), which may be calculated
         * with fls. The size must be clamped to hardware bounds.
         */
In practice, the current callers pass PAGE_SIZE aligned inputs, avoiding the bug, so this doesn't need to be backported. Still, that's a happy accident and not a precondition of lock_region, so let's do the right thing to future-proof.
Actually it's worse than that due to the hardware behaviour, the spec states (for LOCKADDR_BASE):
Only the upper bits of the address are used. The address is aligned to a multiple of the region size, so a variable number of low-order bits are ignored, depending on the selected region size. It is recommended that software ensures that these low bits in the address are cleared, to avoid confusion.
It appears that indeed this has caused confusion ;)
So for a simple request like locking from 0xCAFE0000 - 0xCB010000 (size = 0x30000) the region width gets rounded up (to 0x40000) which causes the start address to be effectively rounded down (by the hardware) to 0xCAFC0000 and we fail to lock 0xCB000000-0xCB010000.
Interestingly (unless my reading of this is wrong) that means to lock 0xFFFF0000-0x100010000 (i.e. crossing the 4GB boundary) requires locking *at least* 0x00000000-0x200000000 (i.e. locking the first 8GB).
This appears to be broken in kbase (which actually does zero out the low bits of the address) - I've raised a bug internally so hopefully someone will tell me if I've read the spec completely wrong here.
Horrifying, and not what I wanted to read on my last day before 2 weeks of leave. Let's drop this patch; hopefully by the time I'm back, your friends in GPU can confirm that's a spec bug and not an actual hardware/driver one...

Can you apply the other 3 patches in the meantime? Thanks :-)
On 25/08/2021 15:07, Alyssa Rosenzweig wrote:
In practice, the current callers pass PAGE_SIZE aligned inputs, avoiding the bug, so this doesn't need to be backported. Still, that's a happy accident and not a precondition of lock_region, so let's do the right thing to future-proof.
Actually it's worse than that due to the hardware behaviour, the spec states (for LOCKADDR_BASE):
Only the upper bits of the address are used. The address is aligned to a multiple of the region size, so a variable number of low-order bits are ignored, depending on the selected region size. It is recommended that software ensures that these low bits in the address are cleared, to avoid confusion.
It appears that indeed this has caused confusion ;)
So for a simple request like locking from 0xCAFE0000 - 0xCB010000 (size = 0x30000) the region width gets rounded up (to 0x40000) which causes the start address to be effectively rounded down (by the hardware) to 0xCAFC0000 and we fail to lock 0xCB000000-0xCB010000.
Interestingly (unless my reading of this is wrong) that means to lock 0xFFFF0000-0x100010000 (i.e. crossing the 4GB boundary) requires locking *at least* 0x00000000-0x200000000 (i.e. locking the first 8GB).
This appears to be broken in kbase (which actually does zero out the low bits of the address) - I've raised a bug internally so hopefully someone will tell me if I've read the spec completely wrong here.
Horrifying, and not what I wanted to read on my last day before 2 weeks of leave. Let's drop this patch; hopefully by the time I'm back, your friends in GPU can confirm that's a spec bug and not an actual hardware/driver one...

Can you apply the other 3 patches in the meantime? Thanks :-)
Yeah, sure. I'll push the first 3 to drm-misc-next-fixes (should land in v5.15).
It's interesting that if my (new) reading of the spec is correct then kbase has been horribly broken in this respect forever. So clearly it can't be something that crops up very often. It would have been good if the spec could have included wording such as "naturally aligned" if that's what was intended.
Enjoy your holiday!
Thanks, Steve
Horrifying, and not what I wanted to read on my last day before 2 weeks of leave. Let's drop this patch; hopefully by the time I'm back, your friends in GPU can confirm that's a spec bug and not an actual hardware/driver one...

Can you apply the other 3 patches in the meantime? Thanks :-)
Yeah, sure. I'll push the first 3 to drm-misc-next-fixes (should land in v5.15).
It's interesting that if my (new) reading of the spec is correct then kbase has been horribly broken in this respect forever. So clearly it can't be something that crops up very often. It would have been good if the spec could have included wording such as "naturally aligned" if that's what was intended.
Indeed. Fingers crossed this is a mix-up. Although the text you quoted seems pretty clear unfortunately :|
Enjoy your holiday!
Thanks!
On Tue, Aug 24, 2021 at 12:30 PM Alyssa Rosenzweig alyssa.rosenzweig@collabora.com wrote:
Chris Morgan reported UBSAN errors in panfrost and tracked them down to the size computation in lock_region. This calculation is overcomplicated (cargo culted from kbase) and can be simplified with kernel helpers and some mathematical identities. The first patch in the series rewrites the calculation in a form avoiding undefined behaviour; Chris confirms it placates UBSAN.
While researching this function, I noticed a pair of other potential bugs: Bifrost can lock more than 4GiB at a time, but must lock at least 32KiB at a time. The latter patches in the series handle these cases.
In review of v1 of this series, Steven pointed out a fourth potential bug: rounding down the iova can truncate the lock region. v2 adds a new patch for this case.
The size computation was unit-tested in userspace. Relevant code below, just missing some copypaste definitions for fls64/clamp/etc:
#define MIN_LOCK (1ULL << 12)
#define MAX_LOCK (1ULL << 48)

struct {
        uint64_t size;
        uint8_t encoded;
} tests[] = {
        /* Clamping */
        { 0, 11 },
        { 1, 11 },
        { 2, 11 },
        { 4095, 11 },
        /* Power of two */
        { 4096, 11 },
        /* Round up */
        { 4097, 12 },
        { 8192, 12 },
        { 16384, 13 },
        { 16385, 14 },
        /* Maximum */
        { ~0ULL, 47 },
};

static uint8_t region_width(uint64_t size)
{
        size = clamp(size, MIN_LOCK, MAX_LOCK);
        return fls64(size - 1) - 1;
}

int main(int argc, char **argv)
{
        for (unsigned i = 0; i < ARRAY_SIZE(tests); ++i) {
                uint64_t test = tests[i].size;
                uint8_t expected = tests[i].encoded;
                uint8_t actual = region_width(test);

                assert(expected == actual);
        }
}
Changes in v2:
- New patch for non-aligned lock addresses
- Commit message improvements.
- Add Steven's tags.
Alyssa Rosenzweig (4):
  drm/panfrost: Simplify lock_region calculation
  drm/panfrost: Use u64 for size in lock_region
  drm/panfrost: Clamp lock region to Bifrost minimum
  drm/panfrost: Handle non-aligned lock addresses

 drivers/gpu/drm/panfrost/panfrost_mmu.c  | 32 ++++++++++--------------
 drivers/gpu/drm/panfrost/panfrost_regs.h |  2 ++
 2 files changed, 15 insertions(+), 19 deletions(-)
For the series,
Reviewed-by: Rob Herring robh@kernel.org