Fast pow() With Adjustable Accuracy

Harrison Ainsworth

http://www.hxa.name/
artifex (ατ) hxa7241 (dοτ) org

2007-11-25

Summary

A fast alternative to the standard C/C++ pow() function, with an adjustable accuracy-space tradeoff. (1400 words)

subject
programming, optimization, C, C++, pow, floating-point, IEEE-754
uri
http://www.hxa.name/articles/content/fast-pow-adjustable_hxa7241_2007.html
license
Creative Commons BY-SA 3.0 License.
download
Source code in C and C++: http://www.hxa.name/articles/content/fast-pow-adjustable_hxa7241_2007.zip

Contents

  • Description
  • Explanation
  • Supplement
  • Download
  • License

Description

This technique is formed of two ideas: manipulation of IEEE-754 floating-point bits, and precalculation of a lookup table.

Float Bits

First, a fast pow:

‘A Fast, Compact Approximation of the Exponential Function’
Technical Report IDSIA-07-98;
Nicol N. Schraudolph;
IDSIA,
1998-06-24.
http://users.rsise.anu.edu.au/~nici/bib2html/b2hd-Schraudolph99.html
http://www.idsia.ch/idsiareport/IDSIA-07-98.ps.gz

— was rewritten for a float implementation instead of double (avoiding the endian platform dependence of the original). It works by inserting the submitted number into the exponent bits of the answer. The fractional part makes a linear interpolation in the mantissa (simple and fast, but inaccurate). Or it can be used as follows.

Lookup Table

Second, an adjustable fast log:

‘Revisiting a basic function on current CPUs: A fast logarithm implementation
with adjustable accuracy’

Technical Report ICSI TR-07-002;
Oriol Vinyals, Gerald Friedland, Nikki Mirghafori;
ICSI,
2007-06-21.
http://www.icsi.berkeley.edu/cgi-bin/pubs/publication.pl?ID=002209
http://www.icsi.berkeley.edu/pubs/techreports/TR-07-002.pdf

— was adapted to fit the fast pow core (and modified to be twice as accurate). This uses the mantissa bits to index a lookup table. The table value then replaces the mantissa.

Precision is adjustable from 0 to 18. This is how many mantissa bits are used as the index, and hence how large the table is: 4B to 1MB.

For precision 11 (8KB table), mean error is < 0.01% and max error is < 0.02% (proportional, i.e. abs(true - approx) / true). And, on one machine, it was over 9 times faster than standard pow.

Code

In C89 (and assuming 32-bit ints), the core is:

const float _2p23 = 8388608.0f;


/**
 * Initialize powFast lookup table.
 *
 * @pTable     length must be 2 ^ precision
 * @precision  number of mantissa bits used, >= 0 and <= 18
 */
void powFastSetTable
(
   unsigned int* const pTable,
   const unsigned int  precision
)
{
   /* step along table elements and x-axis positions */
   float zeroToOne = 1.0f / ((float)(1 << precision) * 2.0f);        /* A */
   int   i;                                                          /* B */
   for( i = 0;  i < (1 << precision);  ++i )                         /* C */
   {
      /* make y-axis value for table element */
      const float f = ((float)pow( 2.0f, zeroToOne ) - 1.0f) * _2p23;
      pTable[i] = (unsigned int)( f < _2p23 ? f : (_2p23 - 1.0f) );

      zeroToOne += 1.0f / (float)(1 << precision);
   }                                                                 /* D */
}


/**
 * Get pow (fast!).
 *
 * @val        power to raise radix to
 * @ilog2      one over log, to required radix, of two
 * @pTable     length must be 2 ^ precision
 * @precision  number of mantissa bits used, >= 0 and <= 18
 */
float powFastLookup
(
   const float         val,
   const float         ilog2,
   unsigned int* const pTable,
   const unsigned int  precision
)
{
   /* build float bits */
   const int i = (int)( (val * (_2p23 * ilog2)) + (127.0f * _2p23) );

   /* replace mantissa with lookup */
   const int it = (i & 0xFF800000) | pTable[(i & 0x7FFFFF) >>        /* E */
      (23 - precision)];                                             /* F */

   /* convert bits to float */
   return *(const float*)( &it );
}

Setting the ilog2 parameter:

  • for pow( 2, val), ilog2 = 1 / log2(2) = 1
  • for pow( e, val), ilog2 = 1 / log(2) = 1.44269504088896
  • for pow(10, val), ilog2 = 1 / log10(2) = 3.32192809488736
  • for pow( r, val), ilog2 = log(r) / log(2) = log(r) * 1.44269504088896

These two functions can easily be put in a small class/module that manages the storage and wraps calls for different radixes.

Explanation

Twiddling

An IEEE-754 single-precision floating point number comprises a sign, an exponent, and a mantissa, with value: (sign) mantissa * 2^exponent. The exponent ranges from -126 to +127, the mantissa from 1 to 2 - 2^-23 (and the sign is + or -). The bit representation is slightly different: the exponent is 8 bits with 127 added (the bias), and the mantissa is 23 bits with an implicit extra 1 assumed, but not stored, as its next higher bit. This is the form of a usual (normalised) number.

sign  exponent  mantissa
  S   EEEEEEEE  MMMMMMMMMMMMMMMMMMMMMMM
 31   30 -- 23  22 ------------------ 0

Considering the parts as plain integer numbers, the value is: (+/-)(1 + mantissa/2^23) * 2^(exponent - 127). So, for example, to make 2.5, set the exponent to 128 and the mantissa to 0x200000, shift them into position, and reinterpret as a float.

If you set the exponent to an integer number, and the mantissa to zero, you effectively get pow(2, number). That leaves fractional numbers. An Engineering Mathematics book will show that a^(b + c) = a^b * a^c, which in this case can be 2^(int + frac) = 2^int * 2^frac. This separates the fractional calculation into something constant for all integers. And it matches the IEEE-754 machinery of mantissa * 2^exponent: setting the exponent and mantissa bits to int and 2^frac, and reinterpreting, will do the calculation.

The range of 2^frac is 1 to <2 — the same as the usual mantissa. So if the fractional part of the power is simply placed in the mantissa bits, it gives an approximation by linear interpolation (0 ≤ x < 1 → 1 ≤ y < 2). This is simplest and fastest, if up to 7% error is tolerable. Or a higher-order, perhaps spline, interpolation can be calculated. Or, as here, a precalculated table can be queried.

There are 23 bits of mantissa, but the maximum precision is 18. This is because 5 bits are used by the integer part of the power.

(For other radixes, multiply the power by ln(radix) / ln(2).)

Rounding

The essential approximation is a ‘staircase’ function across the fraction range between successive integer powers. It has full float precision in its y values, but only at limited, regular x intervals. Accuracy is proportional to the number of table elements.

Those limited intervals imply rounding considerations when setting and looking up the table values. The mantissa could be rounded before truncation for table look-up, but this costs a few instructions in speed. Instead it can be treated when setting the table: set the table values for the middles of the x intervals — so line A (in the code) starts the iteration half an increment across. This (practically) minimises error. (Better still would be the mean of the y values across each interval, but the improvement would be small in these circumstances.)

Radix 2

This solution has a small weakness: for radix two it produces inexact results for integer powers when it need not. This may be undesirable, since they are ‘special’. One treatment is to round the mantissa, as mentioned before:

in powFastSetTable, change lines A-C to:

float zeroToOne = 0.0f;
int   i;
for( i = 0;  i < ((1 << precision) + 1);  ++i )

— in powFastLookup, change lines E-F to:

/* (rounding mantissa-index before use) */
const int it = (i & 0xFF800000) | pTable[((i & 0x7FFFFF) + (0x400000 >>
   precision)) >> (23 - precision)];

— and the table is one element longer for all precisions (so change the @pTable comments too).

Alternatively, add a special case that sets the first table element (the integer value) to zero. This avoids the rounding cost, but almost doubles the max error in the first interval.

in powFastSetTable, add after line D:

/* make integer power exact */
pTable[0] = 0;

Supplement

It is possible to have maximum precision with a medium-sized table (max error < 0.002%, 4KB instead of 1MB). The cost in speed is quite small — a few extra instructions.

The power equation above suggests a further reformulation. Just as the fractional calculation is separable from the integer calculation, the fractional part can itself be separated into parts: e.g. 2^(frac_high + frac_low) = 2^frac_high * 2^frac_low. Breaking it into two is probably the most useful.

Here is an excerpt from a C++ class:

PowFast2::PowFast2()
{
   setTable( tableH_m,  9, 9, false );
   setTable( tableL_m, 18, 9, true  );
}


void PowFast2::setTable
(
   float* const pTable,
   const udword precision,
   const udword extent,
   const bool   isRound
)
{
   // step along table elements and x-axis positions
   float zeroToOne = !isRound ?
      0.0f : (1.0f / (static_cast<float>(1 << precision) * 2.0f));
   for( int i = 0;  i < (1 << extent);  ++i )
   {
      // make y-axis value for table element
      pTable[i] = ::powf( 2.0f, zeroToOne );

      zeroToOne += 1.0f / static_cast<float>(1 << precision);
   }
}


inline
float PowFast2::lookup
(
   const float val,
   const float ilog2
) const
{
   const float _2p23 = 8388608.0f;

   // build float bits
   const int i = static_cast<int>( (val * (_2p23 * ilog2)) + (127.0f * _2p23) );

   // replace mantissa with combined lookups
   const float t  = tableH_m[(i >> 14) & 0x1FF] * tableL_m[(i >> 5) & 0x1FF];
   const int   it = (i & 0xFF800000) |
      (*reinterpret_cast<const int*>( &t ) & 0x7FFFFF);

   // convert bits to float
   return *reinterpret_cast<const float*>( &it );
}

In the limit, each bit is separated. This yields 18 two-element tables, combinable with 17 multiplies. Since the first element of each table is 1, only 8.5 multiplies are needed on average, if branching is used.

Download

Here are two source-code packages, including tests. In C++ as a simple constant class, and in C with an object interface:

http://www.hxa.name/articles/content/fast-pow-adjustable_hxa7241_2007.zip

License

The source code is available according to the (new) BSD license:

———

Copyright (c) 2007, Harrison Ainsworth / HXA7241.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.