[[PATCH][RESENT] 1/3] Replace i2f() in r600_blit.c with an optimized version.


Steven Fuerst

6 Aug 2012

11:11 p.m.

We use __fls() to find the most significant bit. Using that, the loop can be avoided. A second trick is to use the mod(32) behaviour of the rotate instructions on x86 to expand the range of the unsigned int to float conversion to the full 32 bits.

The routine is now exact up to 2^24. Above that, we truncate which is equivalent to rounding towards zero.

Signed-off-by: Steven Fuerst <svfuerst@gmail.com>
---
 drivers/gpu/drm/radeon/r600_blit.c | 52 +++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600_blit.c b/drivers/gpu/drm/radeon/r600_blit.c
index 3c031a4..f0ce441 100644
--- a/drivers/gpu/drm/radeon/r600_blit.c
+++ b/drivers/gpu/drm/radeon/r600_blit.c
@@ -489,29 +489,37 @@ set_default_state(drm_radeon_private_t *dev_priv)
 	ADVANCE_RING();
 }
 
-static uint32_t i2f(uint32_t input)
+/* 23 bits of float fractional data */
+#define I2F_FRAC_BITS 23
+#define I2F_MASK ((1 << I2F_FRAC_BITS) - 1)
+
+/*
+ * Converts unsigned integer into 32-bit IEEE floating point representation.
+ * Will be exact from 0 to 2^24. Above that, we round towards zero
+ * as the fractional bits will not fit in a float. (It would be better to
+ * round towards even as the fpu does, but that is slower.)
+ * This routine depends on the mod(32) behaviour of the rotate instructions
+ * on x86.
+ */
+static uint32_t i2f(uint32_t x)
 {
-	u32 result, i, exponent, fraction;
-
-	if ((input & 0x3fff) == 0)
-		result = 0; /* 0 is a special case */
-	else {
-		exponent = 140; /* exponent biased by 127; */
-		fraction = (input & 0x3fff) << 10; /* cheat and only
-			handle numbers below 2^^15 */
-		for (i = 0; i < 14; i++) {
-			if (fraction & 0x800000)
-				break;
-			else {
-				fraction = fraction << 1; /* keep
-					shifting left until top bit = 1 */
-				exponent = exponent - 1;
-			}
-		}
-		result = exponent << 23 | (fraction & 0x7fffff); /* mask
-			off top bit; assumed 1 */
-	}
-	return result;
+	uint32_t msb, exponent, fraction;
+
+	/* Zero is special */
+	if (!x) return 0;
+
+	/* Get location of the most significant bit */
+	msb = __fls(x);
+
+	/*
+	 * Use a rotate instead of a shift because that works both leftwards
+	 * and rightwards due to the mod(32) beahviour. This means we don't
+	 * need to check to see if we are above 2^24 or not.
+	 */
+	fraction = ror32(x, msb - I2F_FRAC_BITS) & I2F_MASK;
+	exponent = (127 + msb) << I2F_FRAC_BITS;
+
+	return fraction + exponent;
 }
--
1.7.10.4
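
For readers who want to try the conversion outside the kernel, the following is a minimal userspace sketch of the same idea, not the patch itself. The names `fls_index()`, `rotr32()`, `i2f_sketch()`, `float_bits()` and `i2f_self_check()` are hypothetical stand-ins introduced here; in particular `fls_index()` only imitates what `__fls()` is used for above, and `rotr32()` masks its shift counts so that it does not depend on any architecture's rotate behaviour. The check assumes that `float` is IEEE 754 binary32 and that the default rounding mode is in effect.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define I2F_FRAC_BITS 23
#define I2F_MASK ((1u << I2F_FRAC_BITS) - 1)

/* Hypothetical stand-in for __fls(): index of the highest set bit (x != 0). */
static inline uint32_t fls_index(uint32_t x)
{
	uint32_t msb = 0;

	while (x > 1) {
		x >>= 1;
		msb++;
	}
	return msb;
}

/* Rotate right; both shift counts are masked, so they stay within 0..31. */
static inline uint32_t rotr32(uint32_t word, uint32_t shift)
{
	return (word >> (shift & 31)) | (word << ((32 - shift) & 31));
}

/* Same arithmetic as the patched i2f(), built on the helpers above. */
uint32_t i2f_sketch(uint32_t x)
{
	uint32_t msb, exponent, fraction;

	if (!x)
		return 0;

	msb = fls_index(x);
	/* msb - 23 wraps for small inputs; the rotate then acts as a left shift. */
	fraction = rotr32(x, msb - I2F_FRAC_BITS) & I2F_MASK;
	exponent = (127 + msb) << I2F_FRAC_BITS;

	return fraction + exponent;
}

/* Bit pattern of a float, copied out with memcpy to avoid aliasing issues. */
static inline uint32_t float_bits(float f)
{
	uint32_t u;

	memcpy(&u, &f, sizeof(u));
	return u;
}

/* Compares the sketch with the compiler's own conversion on the exact range. */
void i2f_self_check(void)
{
	uint32_t x;

	for (x = 0; x <= (1u << 24); x++)
		assert(i2f_sketch(x) == float_bits((float)x));

	/* 2^24 + 3 is not representable; truncation yields 2^24 + 2. */
	assert(i2f_sketch(16777219u) == float_bits(16777218.0f));
}
```

Calling `i2f_self_check()` from a small `main()` exercises every input from 0 to 2^24 inclusive. The last assertion illustrates the rounding remark in the comment: 2^24 + 3 lies halfway between two representable values, and the sketch takes the lower one where round-to-nearest-even would take 2^24 + 4.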


Steven Fuerst

6 Aug 2012

11:11 p.m.

New subject: [[PATCH][RESENT] 2/3] Replace i2f() in r600_blit_kms.c with an optimized version.

The routine is now exact up to 2^24. Above that, we truncate which is equivalent to rounding towards zero.

Signed-off-by: Steven Fuerst <svfuerst@gmail.com>
---
 drivers/gpu/drm/radeon/r600_blit_kms.c | 53 ++++++++++++++------------------
 1 file changed, 23 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c
index 2bef854..8307558 100644
--- a/drivers/gpu/drm/radeon/r600_blit_kms.c
+++ b/drivers/gpu/drm/radeon/r600_blit_kms.c
@@ -455,44 +455,37 @@ set_default_state(struct radeon_device *rdev)
 	radeon_ring_write(ring, sq_stack_resource_mgmt_2);
 }
 
-#define I2F_MAX_BITS 15
-#define I2F_MAX_INPUT ((1 << I2F_MAX_BITS) - 1)
-#define I2F_SHIFT (24 - I2F_MAX_BITS)
+/* 23 bits of float fractional data */
+#define I2F_FRAC_BITS 23
+#define I2F_MASK ((1 << I2F_FRAC_BITS) - 1)
 
 /*
  * Converts unsigned integer into 32-bit IEEE floating point representation.
- * Conversion is not universal and only works for the range from 0
- * to 2^I2F_MAX_BITS-1. Currently we only use it with inputs between
- * 0 and 16384 (inclusive), so I2F_MAX_BITS=15 is enough. If necessary,
- * I2F_MAX_BITS can be increased, but that will add to the loop iterations
- * and slow us down. Conversion is done by shifting the input and counting
- * down until the first 1 reaches bit position 23. The resulting counter
- * and the shifted input are, respectively, the exponent and the fraction.
- * The sign is always zero.
+ * Will be exact from 0 to 2^24. Above that, we round towards zero
+ * as the fractional bits will not fit in a float. (It would be better to
+ * round towards even as the fpu does, but that is slower.)
+ * This routine depends on the mod(32) behaviour of the rotate instructions
+ * on x86.
  */
-static uint32_t i2f(uint32_t input)
+static uint32_t i2f(uint32_t x)
 {
-	u32 result, i, exponent, fraction;
+	uint32_t msb, exponent, fraction;
 
-	WARN_ON_ONCE(input > I2F_MAX_INPUT);
+	/* Zero is special */
+	if (!x) return 0;
 
-	if ((input & I2F_MAX_INPUT) == 0)
-		result = 0;
-	else {
-		exponent = 126 + I2F_MAX_BITS;
-		fraction = (input & I2F_MAX_INPUT) << I2F_SHIFT;
+	/* Get location of the most significant bit */
+	msb = __fls(x);
 
-		for (i = 0; i < I2F_MAX_BITS; i++) {
-			if (fraction & 0x800000)
-				break;
-			else {
-				fraction = fraction << 1;
-				exponent = exponent - 1;
-			}
-		}
-		result = exponent << 23 | (fraction & 0x7fffff);
-	}
-	return result;
+	/*
+	 * Use a rotate instead of a shift because that works both leftwards
+	 * and rightwards due to the mod(32) beahviour. This means we don't
+	 * need to check to see if we are above 2^24 or not.
+	 */
+	fraction = ror32(x, msb - I2F_FRAC_BITS) & I2F_MASK;
+	exponent = (127 + msb) << I2F_FRAC_BITS;
+
+	return fraction + exponent;
 }
 
 int r600_blit_init(struct radeon_device *rdev)
--
1.7.10.4
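
As a rough sanity check of this replacement, the sketch below compares the removed loop-based routine with the new arithmetic over the whole range the old code accepted (0 to 2^15 - 1). It is a userspace approximation, not kernel code: `i2f_loop()` transcribes the removed lines without `WARN_ON_ONCE()`, while `i2f_fls()`, `fls_index()`, `rotr32()` and `count_mismatches()` are hypothetical names introduced here, with `fls_index()` standing in for `__fls()` and `rotr32()` masking its shift counts.

```c
#include <assert.h>
#include <stdint.h>

#define I2F_MAX_BITS 15
#define I2F_MAX_INPUT ((1u << I2F_MAX_BITS) - 1)
#define I2F_SHIFT (24 - I2F_MAX_BITS)

#define I2F_FRAC_BITS 23
#define I2F_MASK ((1u << I2F_FRAC_BITS) - 1)

/* Removed routine: shift left until bit 23 is set, counting the exponent down. */
uint32_t i2f_loop(uint32_t input)
{
	uint32_t result, i, exponent, fraction;

	if ((input & I2F_MAX_INPUT) == 0)
		result = 0;
	else {
		exponent = 126 + I2F_MAX_BITS;
		fraction = (input & I2F_MAX_INPUT) << I2F_SHIFT;

		for (i = 0; i < I2F_MAX_BITS; i++) {
			if (fraction & 0x800000)
				break;
			fraction = fraction << 1;
			exponent = exponent - 1;
		}
		result = exponent << 23 | (fraction & 0x7fffff);
	}
	return result;
}

/* Hypothetical stand-in for __fls(): index of the highest set bit (x != 0). */
static inline uint32_t fls_index(uint32_t x)
{
	uint32_t msb = 0;

	while (x > 1) {
		x >>= 1;
		msb++;
	}
	return msb;
}

/* Rotate right with masked shift counts (always within 0..31). */
static inline uint32_t rotr32(uint32_t word, uint32_t shift)
{
	return (word >> (shift & 31)) | (word << ((32 - shift) & 31));
}

/* Loop-free routine, following the added lines of the patch. */
uint32_t i2f_fls(uint32_t x)
{
	uint32_t msb;

	if (!x)
		return 0;

	msb = fls_index(x);
	return (rotr32(x, msb - I2F_FRAC_BITS) & I2F_MASK) +
	       ((127 + msb) << I2F_FRAC_BITS);
}

/* Number of inputs in the old routine's range where the two results differ. */
uint32_t count_mismatches(void)
{
	uint32_t x, bad = 0;

	for (x = 0; x <= I2F_MAX_INPUT; x++)
		if (i2f_loop(x) != i2f_fls(x))
			bad++;
	return bad;
}
```

Under these assumptions `count_mismatches()` is expected to return 0. The two versions part ways only outside the old range: `i2f_loop(0x8000)` masks the input down to zero, whereas `i2f_fls(0x8000)` produces the bit pattern of 32768.0f.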

Steven Fuerst

11:11 p.m.

New subject: [[PATCH][RESENT] 3/3] Rename i2f() to int2float(), and make it global so one copy can be removed.

Remove the copy of i2f() in r600_blit_kms.c. We rename the function to something longer now that it is a global symbol. This reduces the likelihood of unintended clashes later.

This might be a candidate for inclusion inside general drm infrastructure. However, at the moment only the radeon driver uses it.

Signed-off-by: Steven Fuerst <svfuerst@gmail.com>
---
 drivers/gpu/drm/radeon/r600_blit.c         | 66 ++++++++++++++--------------
 drivers/gpu/drm/radeon/r600_blit_kms.c     | 45 +++----------------
 drivers/gpu/drm/radeon/r600_blit_shaders.h |  1 +
 3 files changed, 40 insertions(+), 72 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600_blit.c b/drivers/gpu/drm/radeon/r600_blit.c
index f0ce441..7c748ba 100644
--- a/drivers/gpu/drm/radeon/r600_blit.c
+++ b/drivers/gpu/drm/radeon/r600_blit.c
@@ -501,7 +501,7 @@ set_default_state(drm_radeon_private_t *dev_priv)
  * This routine depends on the mod(32) behaviour of the rotate instructions
  * on x86.
  */
-static uint32_t i2f(uint32_t x)
+uint32_t int2float(uint32_t x)
 {
 	uint32_t msb, exponent, fraction;
 
@@ -640,20 +640,20 @@ r600_blit_copy(struct drm_device *dev,
 		vb = r600_nomm_get_vb_ptr(dev);
 	}
 
-	vb[0] = i2f(dst_x);
+	vb[0] = int2float(dst_x);
 	vb[1] = 0;
-	vb[2] = i2f(src_x);
+	vb[2] = int2float(src_x);
 	vb[3] = 0;
 
-	vb[4] = i2f(dst_x);
-	vb[5] = i2f(h);
-	vb[6] = i2f(src_x);
-	vb[7] = i2f(h);
+	vb[4] = int2float(dst_x);
+	vb[5] = int2float(h);
+	vb[6] = int2float(src_x);
+	vb[7] = int2float(h);
 
-	vb[8] = i2f(dst_x + cur_size);
-	vb[9] = i2f(h);
-	vb[10] = i2f(src_x + cur_size);
-	vb[11] = i2f(h);
+	vb[8] = int2float(dst_x + cur_size);
+	vb[9] = int2float(h);
+	vb[10] = int2float(src_x + cur_size);
+	vb[11] = int2float(h);
 
 	/* src */
 	set_tex_resource(dev_priv, FMT_8,
@@ -729,20 +729,20 @@ r600_blit_copy(struct drm_device *dev,
 		vb = r600_nomm_get_vb_ptr(dev);
 	}
 
-	vb[0] = i2f(dst_x / 4);
+	vb[0] = int2float(dst_x / 4);
 	vb[1] = 0;
-	vb[2] = i2f(src_x / 4);
+	vb[2] = int2float(src_x / 4);
 	vb[3] = 0;
 
-	vb[4] = i2f(dst_x / 4);
-	vb[5] = i2f(h);
-	vb[6] = i2f(src_x / 4);
-	vb[7] = i2f(h);
+	vb[4] = int2float(dst_x / 4);
+	vb[5] = int2float(h);
+	vb[6] = int2float(src_x / 4);
+	vb[7] = int2float(h);
 
-	vb[8] = i2f((dst_x + cur_size) / 4);
-	vb[9] = i2f(h);
-	vb[10] = i2f((src_x + cur_size) / 4);
-	vb[11] = i2f(h);
+	vb[8] = int2float((dst_x + cur_size) / 4);
+	vb[9] = int2float(h);
+	vb[10] = int2float((src_x + cur_size) / 4);
+	vb[11] = int2float(h);
 
 	/* src */
 	set_tex_resource(dev_priv, FMT_8_8_8_8,
@@ -812,20 +812,20 @@ r600_blit_swap(struct drm_device *dev,
 	dx2 = dx + w;
 	dy2 = dy + h;
 
-	vb[0] = i2f(dx);
-	vb[1] = i2f(dy);
-	vb[2] = i2f(sx);
-	vb[3] = i2f(sy);
+	vb[0] = int2float(dx);
+	vb[1] = int2float(dy);
+	vb[2] = int2float(sx);
+	vb[3] = int2float(sy);
 
-	vb[4] = i2f(dx);
-	vb[5] = i2f(dy2);
-	vb[6] = i2f(sx);
-	vb[7] = i2f(sy2);
+	vb[4] = int2float(dx);
+	vb[5] = int2float(dy2);
+	vb[6] = int2float(sx);
+	vb[7] = int2float(sy2);
 
-	vb[8] = i2f(dx2);
-	vb[9] = i2f(dy2);
-	vb[10] = i2f(sx2);
-	vb[11] = i2f(sy2);
+	vb[8] = int2float(dx2);
+	vb[9] = int2float(dy2);
+	vb[10] = int2float(sx2);
+	vb[11] = int2float(sy2);
 
 	switch(cpp) {
 	case 4:
diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c
index 8307558..1c7ed3a 100644
--- a/drivers/gpu/drm/radeon/r600_blit_kms.c
+++ b/drivers/gpu/drm/radeon/r600_blit_kms.c
@@ -455,39 +455,6 @@ set_default_state(struct radeon_device *rdev)
 	radeon_ring_write(ring, sq_stack_resource_mgmt_2);
 }
 
-/* 23 bits of float fractional data */
-#define I2F_FRAC_BITS 23
-#define I2F_MASK ((1 << I2F_FRAC_BITS) - 1)
-
-/*
- * Converts unsigned integer into 32-bit IEEE floating point representation.
- * Will be exact from 0 to 2^24. Above that, we round towards zero
- * as the fractional bits will not fit in a float. (It would be better to
- * round towards even as the fpu does, but that is slower.)
- * This routine depends on the mod(32) behaviour of the rotate instructions
- * on x86.
- */
-static uint32_t i2f(uint32_t x)
-{
-	uint32_t msb, exponent, fraction;
-
-	/* Zero is special */
-	if (!x) return 0;
-
-	/* Get location of the most significant bit */
-	msb = __fls(x);
-
-	/*
-	 * Use a rotate instead of a shift because that works both leftwards
-	 * and rightwards due to the mod(32) beahviour. This means we don't
-	 * need to check to see if we are above 2^24 or not.
-	 */
-	fraction = ror32(x, msb - I2F_FRAC_BITS) & I2F_MASK;
-	exponent = (127 + msb) << I2F_FRAC_BITS;
-
-	return fraction + exponent;
-}
-
 int r600_blit_init(struct radeon_device *rdev)
 {
 	u32 obj_size;
@@ -759,14 +726,14 @@ void r600_kms_blit_copy(struct radeon_device *rdev,
 	vb_cpu_addr[3] = 0;
 
 	vb_cpu_addr[4] = 0;
-	vb_cpu_addr[5] = i2f(h);
+	vb_cpu_addr[5] = int2float(h);
 	vb_cpu_addr[6] = 0;
-	vb_cpu_addr[7] = i2f(h);
+	vb_cpu_addr[7] = int2float(h);
 
-	vb_cpu_addr[8] = i2f(w);
-	vb_cpu_addr[9] = i2f(h);
-	vb_cpu_addr[10] = i2f(w);
-	vb_cpu_addr[11] = i2f(h);
+	vb_cpu_addr[8] = int2float(w);
+	vb_cpu_addr[9] = int2float(h);
+	vb_cpu_addr[10] = int2float(w);
+	vb_cpu_addr[11] = int2float(h);
 
 	rdev->r600_blit.primitives.set_tex_resource(rdev, FMT_8_8_8_8, w, h,
 						    w, src_gpu_addr, size_in_bytes);
diff --git a/drivers/gpu/drm/radeon/r600_blit_shaders.h b/drivers/gpu/drm/radeon/r600_blit_shaders.h
index f437d36..e17c2cb 100644
--- a/drivers/gpu/drm/radeon/r600_blit_shaders.h
+++ b/drivers/gpu/drm/radeon/r600_blit_shaders.h
@@ -35,4 +35,5 @@ extern const u32 r6xx_default_state[];
 extern const u32 r6xx_ps_size, r6xx_vs_size;
 extern const u32 r6xx_default_size, r7xx_default_size;
 
+uint32_t int2float(uint32_t x);
 #endif
--
1.7.10.4

Michel Dänzer

7 Aug 2012

7:37 a.m.

New subject: [[PATCH][RESENT] 1/3] Replace i2f() in r600_blit.c with an optimized version.

On Mon, 2012-08-06 at 16:11 -0700, Steven Fuerst wrote:
> We use __fls() to find the most significant bit. Using that, the loop can be avoided. A second trick is to use the mod(32) behaviour of the rotate instructions on x86 to expand the range of the unsigned int to float conversion to the full 32 bits.
>
> The routine is now exact up to 2^24. Above that, we truncate which is equivalent to rounding towards zero.
>
> Signed-off-by: Steven Fuerst <svfuerst@gmail.com>
> ---
>  drivers/gpu/drm/radeon/r600_blit.c | 52 +++++++++++++++++++++--------------
>  1 file changed, 30 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/r600_blit.c b/drivers/gpu/drm/radeon/r600_blit.c
> index 3c031a4..f0ce441 100644
> --- a/drivers/gpu/drm/radeon/r600_blit.c
> +++ b/drivers/gpu/drm/radeon/r600_blit.c
> @@ -489,29 +489,37 @@ set_default_state(drm_radeon_private_t *dev_priv)
>  	ADVANCE_RING();
>  }
>
> -static uint32_t i2f(uint32_t input)
> +/* 23 bits of float fractional data */
> +#define I2F_FRAC_BITS 23
> +#define I2F_MASK ((1 << I2F_FRAC_BITS) - 1)
> +
> +/*
> + * Converts unsigned integer into 32-bit IEEE floating point representation.
> + * Will be exact from 0 to 2^24. Above that, we round towards zero
> + * as the fractional bits will not fit in a float. (It would be better to
> + * round towards even as the fpu does, but that is slower.)
> + * This routine depends on the mod(32) behaviour of the rotate instructions
> + * on x86.

The radeon driver works on other architectures than x86. It sounds (and looks, looking at ror32() in include/linux/bitops.h) like this change will break those, which is a no go.

> +	/*
> +	 * Use a rotate instead of a shift because that works both leftwards
> +	 * and rightwards due to the mod(32) beahviour.  This means we don't
> +	 * need to check to see if we are above 2^24 or not.
> +	 */
> +	fraction = ror32(x, msb - I2F_FRAC_BITS) & I2F_MASK;

Seems like you could write this as

	fraction = ror32(x, (msb - I2F_FRAC_BITS) & 31) & I2F_MASK;

to avoid that, and remove the mentions of relying on the mod(32) behaviour.

--
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast          |          Debian, X and DRI developer
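
To make the suggestion above concrete, here is a small userspace sketch of the masked-count variant. It illustrates the idea only and is not the code that was eventually merged. The names `rotate_count()`, `rotr32_small()`, `fls_index()` and `i2f_masked()` are hypothetical; `rotr32_small()` is written for counts already reduced to 0..31 and treats a count of 0 separately so that no shift by 32 can occur.

```c
#include <assert.h>
#include <stdint.h>

#define I2F_FRAC_BITS 23
#define I2F_MASK ((1u << I2F_FRAC_BITS) - 1)

/* Hypothetical stand-in for __fls(): index of the highest set bit (x != 0). */
static inline uint32_t fls_index(uint32_t x)
{
	uint32_t msb = 0;

	while (x > 1) {
		x >>= 1;
		msb++;
	}
	return msb;
}

/*
 * Rotate count as in the suggested line: the unsigned subtraction wraps for
 * msb < 23, and "& 31" reduces the result to 0..31 explicitly.
 */
uint32_t rotate_count(uint32_t msb)
{
	return (msb - I2F_FRAC_BITS) & 31;
}

/* Rotate right for counts in 0..31; a count of 0 returns the word unchanged. */
static inline uint32_t rotr32_small(uint32_t word, uint32_t shift)
{
	return shift ? (word >> shift) | (word << (32 - shift)) : word;
}

/* Conversion using the explicitly masked rotate count. */
uint32_t i2f_masked(uint32_t x)
{
	uint32_t msb, exponent, fraction;

	if (!x)
		return 0;

	msb = fls_index(x);
	fraction = rotr32_small(x, rotate_count(msb)) & I2F_MASK;
	exponent = (127 + msb) << I2F_FRAC_BITS;

	return fraction + exponent;
}
```

With the mask in place the count is 9 for `msb = 0` (a rotate right by 9, which is a rotate left by 23), 0 for `msb = 23`, and 8 for `msb = 31`, so every shift in the sketch stays below the word width regardless of what the hardware rotate instruction does with larger counts.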


dri-devel@lists.freedesktop.org
