src/lib/libc/gdtoa/README - annotate

Return to README CVS log
Up to [cvs.NetBSD.org] / src / lib / libc / gdtoa
Annotation of src/lib/libc/gdtoa/README, Revision 1.1.1.2

1.1       kleink      1: This directory contains source for a library of binary -> decimal
                      2: and decimal -> binary conversion routines, for single-, double-,
                      3: and extended-precision IEEE binary floating-point arithmetic, and
                      4: other IEEE-like binary floating-point, including "double double",
                      5: as in
                      6:
                      7:        T. J. Dekker, "A Floating-Point Technique for Extending the
                      8:        Available Precision", Numer. Math. 18 (1971), pp. 224-242
                      9:
                     10: and
                     11:
                     12:        "Inside Macintosh: PowerPC Numerics", Addison-Wesley, 1994
                     13:
                     14: The conversion routines use double-precision floating-point arithmetic
                     15: and, where necessary, high precision integer arithmetic.  The routines
                     16: are generalizations of the strtod and dtoa routines described in
                     17:
                     18:        David M. Gay, "Correctly Rounded Binary-Decimal and
                     19:        Decimal-Binary Conversions", Numerical Analysis Manuscript
                     20:        No. 90-10, Bell Labs, Murray Hill, 1990;
                     21:        http://cm.bell-labs.com/cm/cs/what/ampl/REFS/rounding.ps.gz
                     22:
                     23: (based in part on papers by Clinger and Steele & White: see the
                     24: references in the above paper).
                     25:
                     26: The present conversion routines should be able to use any of IEEE binary,
                     27: VAX, or IBM-mainframe double-precision arithmetic internally, but I (dmg)
                     28: have so far only had a chance to test them with IEEE double precision
                     29: arithmetic.
                     30:
                     31: The core conversion routines are strtodg for decimal -> binary conversions
                     32: and gdtoa for binary -> decimal conversions.  These routines operate
                     33: on arrays of unsigned 32-bit integers of type ULong, a signed 32-bit
                     34: exponent of type Long, and arithmetic characteristics described in
                     35: struct FPI; FPI, Long, and ULong are defined in gdtoa.h.  File arith.h
                     36: is supposed to provide #defines that cause gdtoa.h to define its
                     37: types correctly.  File arithchk.c is source for a program that
                     38: generates a suitable arith.h on all systems where I've been able to
                     39: test it.
                     40:
                     41: The core conversion routines are meant to be called by helper routines
                     42: that know details of the particular binary arithmetic of interest and
                     43: convert.  The present directory provides helper routines for 5 variants
                     44: of IEEE binary floating-point arithmetic, each indicated by one or
                     45: two letters:
                     46:
                     47:        f       IEEE single precision
                     48:        d       IEEE double precision
                     49:        x       IEEE extended precision, as on Intel 80x87
                     50:                and software emulations of Motorola 68xxx chips
                     51:                that do not pad the way the 68xxx does, but
                     52:                only store 80 bits
                     53:        xL      IEEE extended precision, as on Motorola 68xxx chips
                     54:        Q       quad precision, as on Sun Sparc chips
                     55:        dd      double double, pairs of IEEE double numbers
                     56:                whose sum is the desired value
                     57:
                     58: For decimal -> binary conversions, there are three families of
1.1.1.2 ! christos   59: helper routines: one for round-nearest (or the current rounding
        !            60: mode on IEEE-arithmetic systems that provide the C99 fegetround()
        !            61: function, if compiled with -DHonor_FLT_ROUNDS):
1.1       kleink     62:
                     63:        strtof
                     64:        strtod
                     65:        strtodd
                     66:        strtopd
                     67:        strtopf
                     68:        strtopx
                     69:        strtopxL
                     70:        strtopQ
                     71:
                     72: one with rounding direction specified:
                     73:
                     74:        strtorf
                     75:        strtord
                     76:        strtordd
                     77:        strtorx
                     78:        strtorxL
                     79:        strtorQ
                     80:
                     81: and one for computing an interval (at most one bit wide) that contains
                     82: the decimal number:
                     83:
                     84:        strtoIf
                     85:        strtoId
                     86:        strtoIdd
                     87:        strtoIx
                     88:        strtoIxL
                     89:        strtoIQ
                     90:
                     91: The latter call strtoIg, which makes one call on strtodg and adjusts
                     92: the result to provide the desired interval.  On systems where native
                     93: arithmetic can easily make one-ulp adjustments on values in the
                     94: desired floating-point format, it might be more efficient to use the
                     95: native arithmetic.  Routine strtodI is a variant of strtoId that
                     96: illustrates one way to do this for IEEE binary double-precision
                     97: arithmetic -- but whether this is more efficient remains to be seen.
                     98:
                     99: Functions strtod and strtof have "natural" return types, float and
                    100: double -- strtod is specified by the C standard, and strtof appears
                    101: in the stdlib.h of some systems, such as (at least some) Linux systems.
                    102: The other functions write their results to their final argument(s):
                    103: to the final two argument for the strtoI... (interval) functions,
                    104: and to the final argument for the others (strtop... and strtor...).
                    105: Where possible, these arguments have "natural" return types (double*
                    106: or float*), to permit at least some type checking.  In reality, they
                    107: are viewed as arrays of ULong (or, for the "x" functions, UShort)
                    108: values. On systems where long double is the appropriate type, one can
                    109: pass long double* final argument(s) to these routines.  The int value
                    110: that these routines return is the return value from the call they make
                    111: on strtodg; see the enum of possible return values in gdtoa.h.
                    112:
                    113: Source files g_ddfmt.c, misc.c, smisc.c, strtod.c, strtodg.c, and ulp.c
                    114: should use true IEEE double arithmetic (not, e.g., double extended),
                    115: at least for storing (and viewing the bits of) the variables declared
                    116: "double" within them.
                    117:
                    118: One detail indicated in struct FPI is whether the target binary
                    119: arithmetic departs from the IEEE standard by flushing denormalized
                    120: numbers to 0.  On systems that do this, the helper routines for
                    121: conversion to double-double format (when compiled with
                    122: Sudden_Underflow #defined) penalize the bottom of the exponent
                    123: range so that they return a nonzero result only when the least
                    124: significant bit of the less significant member of the pair of
                    125: double values returned can be expressed as a normalized double
                    126: value.  An alternative would be to drop to 53-bit precision near
                    127: the bottom of the exponent range.  To get correct rounding, this
                    128: would (in general) require two calls on strtodg (one specifying
                    129: 126-bit arithmetic, then, if necessary, one specifying 53-bit
                    130: arithmetic).
                    131:
                    132: By default, the core routine strtodg and strtod set errno to ERANGE
                    133: if the result overflows to +Infinity or underflows to 0.  Compile
                    134: these routines with NO_ERRNO #defined to inhibit errno assignments.
                    135:
                    136: Routine strtod is based on netlib's "dtoa.c from fp", and
                    137: (f = strtod(s,se)) is more efficient for some conversions than, say,
                    138: strtord(s,se,1,&f).  Parts of strtod require true IEEE double
                    139: arithmetic with the default rounding mode (round-to-nearest) and, on
                    140: systems with IEEE extended-precision registers, double-precision
                    141: (53-bit) rounding precision.  If the machine uses (the equivalent of)
                    142: Intel 80x87 arithmetic, the call
                    143:        _control87(PC_53, MCW_PC);
                    144: does this with many compilers.  Whether this or another call is
                    145: appropriate depends on the compiler; for this to work, it may be
                    146: necessary to #include "float.h" or another system-dependent header
                    147: file.
                    148:
                    149: Source file strtodnrp.c gives a strtod that does not require 53-bit
                    150: rounding precision on systems (such as Intel IA32 systems) that may
                    151: suffer double rounding due to use of extended-precision registers.
                    152: For some conversions this variant of strtod is less efficient than the
                    153: one in strtod.c when the latter is run with 53-bit rounding precision.
                    154:
                    155: The values that the strto* routines return for NaNs are determined by
                    156: gd_qnan.h, which the makefile generates by running the program whose
                    157: source is qnan.c.  Note that the rules for distinguishing signaling
                    158: from quiet NaNs are system-dependent.  For cross-compilation, you need
                    159: to determine arith.h and gd_qnan.h suitably, e.g., using the
                    160: arithmetic of the target machine.
                    161:
                    162: C99's hexadecimal floating-point constants are recognized by the
                    163: strto* routines (but this feature has not yet been heavily tested).
                    164: Compiling with NO_HEX_FP #defined disables this feature.
                    165:
                    166: When compiled with -DINFNAN_CHECK, the strto* routines recognize C99's
                    167: NaN and Infinity syntax.  Moreover, unless No_Hex_NaN is #defined, the
                    168: strto* routines also recognize C99's NaN(...) syntax: they accept
                    169: (case insensitively) strings of the form NaN(x), where x is a string
                    170: of hexadecimal digits and spaces; if there is only one string of
                    171: hexadecimal digits, it is taken for the fraction bits of the resulting
                    172: NaN; if there are two or more strings of hexadecimal digits, each
                    173: string is assigned to the next available sequence of 32-bit words of
                    174: fractions bits (starting with the most significant), right-aligned in
                    175: each sequence.
                    176:
                    177: For binary -> decimal conversions, I've provided just one family
                    178: of helper routines:
                    179:
                    180:        g_ffmt
                    181:        g_dfmt
                    182:        g_ddfmt
                    183:        g_xfmt
                    184:        g_xLfmt
                    185:        g_Qfmt
                    186:
                    187: which do a "%g" style conversion either to a specified number of decimal
                    188: places (if their ndig argument is positive), or to the shortest
                    189: decimal string that rounds to the given binary floating-point value
                    190: (if ndig <= 0).  They write into a buffer supplied as an argument
                    191: and return either a pointer to the end of the string (a null character)
                    192: in the buffer, if the buffer was long enough, or 0.  Other forms of
                    193: conversion are easily done with the help of gdtoa(), such as %e or %f
                    194: style and conversions with direction of rounding specified (so that, if
                    195: desired, the decimal value is either >= or <= the binary value).
1.1.1.2 ! christos  196: On IEEE-arithmetic systems that provide the C99 fegetround() function,
        !           197: if compiled with -DHonor_FLT_ROUNDS, these routines honor the current
        !           198: rounding mode.
1.1       kleink    199:
                    200: For an example of more general conversions based on dtoa(), see
                    201: netlib's "printf.c from ampl/solvers".
                    202:
                    203: For double-double -> decimal, g_ddfmt() assumes IEEE-like arithmetic
                    204: of precision max(126, #bits(input)) bits, where #bits(input) is the
                    205: number of mantissa bits needed to represent the sum of the two double
                    206: values in the input.
                    207:
                    208: The makefile creates a library, gdtoa.a.  To use the helper
                    209: routines, a program only needs to include gdtoa.h.  All the
                    210: source files for gdtoa.a include a more extensive gdtoaimp.h;
                    211: among other things, gdtoaimp.h has #defines that make "internal"
                    212: names end in _D2A.  To make a "system" library, one could modify
                    213: these #defines to make the names start with __.
                    214:
                    215: Various comments about possible #defines appear in gdtoaimp.h,
                    216: but for most purposes, arith.h should set suitable #defines.
                    217:
                    218: Systems with preemptive scheduling of multiple threads require some
                    219: manual intervention.  On such systems, it's necessary to compile
                    220: dmisc.c, dtoa.c gdota.c, and misc.c with MULTIPLE_THREADS #defined,
                    221: and to provide (or suitably #define) two locks, acquired by
                    222: ACQUIRE_DTOA_LOCK(n) and freed by FREE_DTOA_LOCK(n) for n = 0 or 1.
                    223: (The second lock, accessed in pow5mult, ensures lazy evaluation of
                    224: only one copy of high powers of 5; omitting this lock would introduce
                    225: a small probability of wasting memory, but would otherwise be harmless.)
                    226: Routines that call dtoa or gdtoa directly must also invoke freedtoa(s)
                    227: to free the value s returned by dtoa or gdtoa.  It's OK to do so whether
                    228: or not MULTIPLE_THREADS is #defined, and the helper g_*fmt routines
                    229: listed above all do this indirectly (in gfmt_D2A(), which they all call).
                    230:
                    231: By default, there is a private pool of memory of length 2000 bytes
                    232: for intermediate quantities, and MALLOC (see gdtoaimp.h) is called only
                    233: if the private pool does not suffice.   2000 is large enough that MALLOC
                    234: is called only under very unusual circumstances (decimal -> binary
                    235: conversion of very long strings) for conversions to and from double
                    236: precision.  For systems with preemptively scheduled multiple threads
                    237: or for conversions to extended or quad, it may be appropriate to
                    238: #define PRIVATE_MEM nnnn, where nnnn is a suitable value > 2000.
                    239: For extended and quad precisions, -DPRIVATE_MEM=20000 is probably
                    240: plenty even for many digits at the ends of the exponent range.
                    241: Use of the private pool avoids some overhead.
                    242:
                    243: Directory test provides some test routines.  See its README.
                    244: I've also tested this stuff (except double double conversions)
                    245: with Vern Paxson's testbase program: see
                    246:
                    247:        V. Paxson and W. Kahan, "A Program for Testing IEEE Binary-Decimal
                    248:        Conversion", manuscript, May 1991,
                    249:        ftp://ftp.ee.lbl.gov/testbase-report.ps.Z .
                    250:
                    251: (The same ftp directory has source for testbase.)
                    252:
                    253: Some system-dependent additions to CFLAGS in the makefile:
                    254:
                    255:        HU-UX: -Aa -Ae
                    256:        OSF (DEC Unix): -ieee_with_no_inexact
                    257:        SunOS 4.1x: -DKR_headers -DBad_float_h
                    258:
                    259: If you want to put this stuff into a shared library and your
                    260: operating system requires export lists for shared libraries,
                    261: the following would be an appropriate export list:
                    262:
                    263:        dtoa
                    264:        freedtoa
                    265:        g_Qfmt
                    266:        g_ddfmt
                    267:        g_dfmt
                    268:        g_ffmt
                    269:        g_xLfmt
                    270:        g_xfmt
                    271:        gdtoa
                    272:        strtoIQ
                    273:        strtoId
                    274:        strtoIdd
                    275:        strtoIf
                    276:        strtoIx
                    277:        strtoIxL
                    278:        strtod
                    279:        strtodI
                    280:        strtodg
                    281:        strtof
                    282:        strtopQ
                    283:        strtopd
                    284:        strtopdd
                    285:        strtopf
                    286:        strtopx
                    287:        strtopxL
                    288:        strtorQ
                    289:        strtord
                    290:        strtordd
                    291:        strtorf
                    292:        strtorx
                    293:        strtorxL
                    294:
                    295: When time permits, I (dmg) hope to write in more detail about the
                    296: present conversion routines; for now, this README file must suffice.
                    297: Meanwhile, if you wish to write helper functions for other kinds of
                    298: IEEE-like arithmetic, some explanation of struct FPI and the bits
                    299: array may be helpful.  Both gdtoa and strtodg operate on a bits array
                    300: described by FPI *fpi.  The bits array is of type ULong, a 32-bit
                    301: unsigned integer type.  Floating-point numbers have fpi->nbits bits,
                    302: with the least significant 32 bits in bits[0], the next 32 bits in
                    303: bits[1], etc.  These numbers are regarded as integers multiplied by
                    304: 2^e (i.e., 2 to the power of the exponent e), where e is the second
                    305: argument (be) to gdtoa and is stored in *exp by strtodg.  The minimum
                    306: and maximum exponent values fpi->emin and fpi->emax for normalized
                    307: floating-point numbers reflect this arrangement.  For example, the
                    308: P754 standard for binary IEEE arithmetic specifies doubles as having
                    309: 53 bits, with normalized values of the form 1.xxxxx... times 2^(b-1023),
                    310: with 52 bits (the x's) and the biased exponent b represented explicitly;
                    311: b is an unsigned integer in the range 1 <= b <= 2046 for normalized
                    312: finite doubles, b = 0 for denormals, and b = 2047 for Infinities and NaNs.
                    313: To turn an IEEE double into the representation used by strtodg and gdtoa,
                    314: we multiply 1.xxxx... by 2^52 (to make it an integer) and reduce the
                    315: exponent e = (b-1023) by 52:
                    316:
                    317:        fpi->emin = 1 - 1023 - 52
                    318:        fpi->emax = 1046 - 1023 - 52
                    319:
                    320: In various wrappers for IEEE double, we actually write -53 + 1 rather
                    321: than -52, to emphasize that there are 53 bits including one implicit bit.
                    322: Field fpi->rounding indicates the desired rounding direction, with
                    323: possible values
                    324:        FPI_Round_zero = toward 0,
                    325:        FPI_Round_near = unbiased rounding -- the IEEE default,
                    326:        FPI_Round_up = toward +Infinity, and
                    327:        FPI_Round_down = toward -Infinity
                    328: given in gdtoa.h.
                    329:
                    330: Field fpi->sudden_underflow indicates whether strtodg should return
                    331: denormals or flush them to zero.  Normal floating-point numbers have
                    332: bit fpi->nbits in the bits array on.  Denormals have it off, with
                    333: exponent = fpi->emin.  Strtodg provides distinct return values for normals
                    334: and denormals; see gdtoa.h.
                    335:
                    336: Compiling g__fmt.c, strtod.c, and strtodg.c with -DUSE_LOCALE causes
                    337: the decimal-point character to be taken from the current locale; otherwise
                    338: it is '.'.
                    339:
1.1.1.2 ! christos  340: Source files dtoa.c and strtod.c in this directory are derived from
        !           341: netlib's "dtoa.c from fp" and are meant to function equivalently.
        !           342: When compiled with Honor_FLT_ROUNDS #defined (on systems that provide
        !           343: FLT_ROUNDS and fegetround() as specified in the C99 standard), they
        !           344: honor the current rounding mode.  Because FLT_ROUNDS is buggy on some
        !           345: (Linux) systems -- not reflecting calls on fesetround(), as the C99
        !           346: standard says it should -- when Honor_FLT_ROUNDS is #defined, the
        !           347: current rounding mode is obtained from fegetround() rather than from
        !           348: FLT_ROUNDS, unless Trust_FLT_ROUNDS is also #defined.
        !           349:
        !           350: Compile with -DUSE_LOCALE to use the current locale; otherwise
        !           351: decimal points are assumed to be '.'.  With -DUSE_LOCALE, unless
        !           352: you also compile with -DNO_LOCALE_CACHE, the details about the
        !           353: current "decimal point" character string are cached and assumed not
        !           354: to change during the program's execution.
        !           355:
        !           356: On machines with a 64-bit long double and perhaps a 113-bit "quad"
        !           357: type, you can invoke "make Printf" to add Printf (and variants, such
        !           358: as Fprintf) to gdtoa.a.  These are analogs, declared in stdio1.h, of
        !           359: printf and fprintf, etc. in which %La, %Le, %Lf, and %Lg are for long
        !           360: double and (if appropriate) %Lqa, %Lqe, %Lqf, and %Lqg are for quad
        !           361: precision printing.
        !           362:
1.1       kleink    363: Please send comments to        David M. Gay (dmg at acm dot org, with " at "
                    364: changed at "@" and " dot " changed to ".").
CVSweb <webmaster@jp.NetBSD.org>