Annotation of src/lib/libc/gdtoa/README, Revision 1.1.1.2
1.1 kleink 1: This directory contains source for a library of binary -> decimal
2: and decimal -> binary conversion routines, for single-, double-,
3: and extended-precision IEEE binary floating-point arithmetic, and
4: other IEEE-like binary floating-point, including "double double",
5: as in
6:
7: T. J. Dekker, "A Floating-Point Technique for Extending the
8: Available Precision", Numer. Math. 18 (1971), pp. 224-242
9:
10: and
11:
12: "Inside Macintosh: PowerPC Numerics", Addison-Wesley, 1994
13:
14: The conversion routines use double-precision floating-point arithmetic
15: and, where necessary, high precision integer arithmetic. The routines
16: are generalizations of the strtod and dtoa routines described in
17:
18: David M. Gay, "Correctly Rounded Binary-Decimal and
19: Decimal-Binary Conversions", Numerical Analysis Manuscript
20: No. 90-10, Bell Labs, Murray Hill, 1990;
21: http://cm.bell-labs.com/cm/cs/what/ampl/REFS/rounding.ps.gz
22:
23: (based in part on papers by Clinger and Steele & White: see the
24: references in the above paper).
25:
26: The present conversion routines should be able to use any of IEEE binary,
27: VAX, or IBM-mainframe double-precision arithmetic internally, but I (dmg)
28: have so far only had a chance to test them with IEEE double precision
29: arithmetic.
30:
31: The core conversion routines are strtodg for decimal -> binary conversions
32: and gdtoa for binary -> decimal conversions. These routines operate
33: on arrays of unsigned 32-bit integers of type ULong, a signed 32-bit
34: exponent of type Long, and arithmetic characteristics described in
35: struct FPI; FPI, Long, and ULong are defined in gdtoa.h. File arith.h
36: is supposed to provide #defines that cause gdtoa.h to define its
37: types correctly. File arithchk.c is source for a program that
38: generates a suitable arith.h on all systems where I've been able to
39: test it.
40:
41: The core conversion routines are meant to be called by helper routines
42: that know details of the particular binary arithmetic of interest and
43: convert. The present directory provides helper routines for 5 variants
44: of IEEE binary floating-point arithmetic, each indicated by one or
45: two letters:
46:
47: f IEEE single precision
48: d IEEE double precision
49: x IEEE extended precision, as on Intel 80x87
50: and software emulations of Motorola 68xxx chips
51: that do not pad the way the 68xxx does, but
52: only store 80 bits
53: xL IEEE extended precision, as on Motorola 68xxx chips
54: Q quad precision, as on Sun Sparc chips
55: dd double double, pairs of IEEE double numbers
56: whose sum is the desired value
57:
58: For decimal -> binary conversions, there are three families of
1.1.1.2 ! christos 59: helper routines: one for round-nearest (or the current rounding
! 60: mode on IEEE-arithmetic systems that provide the C99 fegetround()
! 61: function, if compiled with -DHonor_FLT_ROUNDS):
1.1 kleink 62:
63: strtof
64: strtod
65: strtodd
66: strtopd
67: strtopf
68: strtopx
69: strtopxL
70: strtopQ
71:
72: one with rounding direction specified:
73:
74: strtorf
75: strtord
76: strtordd
77: strtorx
78: strtorxL
79: strtorQ
80:
81: and one for computing an interval (at most one bit wide) that contains
82: the decimal number:
83:
84: strtoIf
85: strtoId
86: strtoIdd
87: strtoIx
88: strtoIxL
89: strtoIQ
90:
91: The latter call strtoIg, which makes one call on strtodg and adjusts
92: the result to provide the desired interval. On systems where native
93: arithmetic can easily make one-ulp adjustments on values in the
94: desired floating-point format, it might be more efficient to use the
95: native arithmetic. Routine strtodI is a variant of strtoId that
96: illustrates one way to do this for IEEE binary double-precision
97: arithmetic -- but whether this is more efficient remains to be seen.
98:
99: Functions strtod and strtof have "natural" return types, float and
100: double -- strtod is specified by the C standard, and strtof appears
101: in the stdlib.h of some systems, such as (at least some) Linux systems.
102: The other functions write their results to their final argument(s):
103: to the final two argument for the strtoI... (interval) functions,
104: and to the final argument for the others (strtop... and strtor...).
105: Where possible, these arguments have "natural" return types (double*
106: or float*), to permit at least some type checking. In reality, they
107: are viewed as arrays of ULong (or, for the "x" functions, UShort)
108: values. On systems where long double is the appropriate type, one can
109: pass long double* final argument(s) to these routines. The int value
110: that these routines return is the return value from the call they make
111: on strtodg; see the enum of possible return values in gdtoa.h.
112:
113: Source files g_ddfmt.c, misc.c, smisc.c, strtod.c, strtodg.c, and ulp.c
114: should use true IEEE double arithmetic (not, e.g., double extended),
115: at least for storing (and viewing the bits of) the variables declared
116: "double" within them.
117:
118: One detail indicated in struct FPI is whether the target binary
119: arithmetic departs from the IEEE standard by flushing denormalized
120: numbers to 0. On systems that do this, the helper routines for
121: conversion to double-double format (when compiled with
122: Sudden_Underflow #defined) penalize the bottom of the exponent
123: range so that they return a nonzero result only when the least
124: significant bit of the less significant member of the pair of
125: double values returned can be expressed as a normalized double
126: value. An alternative would be to drop to 53-bit precision near
127: the bottom of the exponent range. To get correct rounding, this
128: would (in general) require two calls on strtodg (one specifying
129: 126-bit arithmetic, then, if necessary, one specifying 53-bit
130: arithmetic).
131:
132: By default, the core routine strtodg and strtod set errno to ERANGE
133: if the result overflows to +Infinity or underflows to 0. Compile
134: these routines with NO_ERRNO #defined to inhibit errno assignments.
135:
136: Routine strtod is based on netlib's "dtoa.c from fp", and
137: (f = strtod(s,se)) is more efficient for some conversions than, say,
138: strtord(s,se,1,&f). Parts of strtod require true IEEE double
139: arithmetic with the default rounding mode (round-to-nearest) and, on
140: systems with IEEE extended-precision registers, double-precision
141: (53-bit) rounding precision. If the machine uses (the equivalent of)
142: Intel 80x87 arithmetic, the call
143: _control87(PC_53, MCW_PC);
144: does this with many compilers. Whether this or another call is
145: appropriate depends on the compiler; for this to work, it may be
146: necessary to #include "float.h" or another system-dependent header
147: file.
148:
149: Source file strtodnrp.c gives a strtod that does not require 53-bit
150: rounding precision on systems (such as Intel IA32 systems) that may
151: suffer double rounding due to use of extended-precision registers.
152: For some conversions this variant of strtod is less efficient than the
153: one in strtod.c when the latter is run with 53-bit rounding precision.
154:
155: The values that the strto* routines return for NaNs are determined by
156: gd_qnan.h, which the makefile generates by running the program whose
157: source is qnan.c. Note that the rules for distinguishing signaling
158: from quiet NaNs are system-dependent. For cross-compilation, you need
159: to determine arith.h and gd_qnan.h suitably, e.g., using the
160: arithmetic of the target machine.
161:
162: C99's hexadecimal floating-point constants are recognized by the
163: strto* routines (but this feature has not yet been heavily tested).
164: Compiling with NO_HEX_FP #defined disables this feature.
165:
166: When compiled with -DINFNAN_CHECK, the strto* routines recognize C99's
167: NaN and Infinity syntax. Moreover, unless No_Hex_NaN is #defined, the
168: strto* routines also recognize C99's NaN(...) syntax: they accept
169: (case insensitively) strings of the form NaN(x), where x is a string
170: of hexadecimal digits and spaces; if there is only one string of
171: hexadecimal digits, it is taken for the fraction bits of the resulting
172: NaN; if there are two or more strings of hexadecimal digits, each
173: string is assigned to the next available sequence of 32-bit words of
174: fractions bits (starting with the most significant), right-aligned in
175: each sequence.
176:
177: For binary -> decimal conversions, I've provided just one family
178: of helper routines:
179:
180: g_ffmt
181: g_dfmt
182: g_ddfmt
183: g_xfmt
184: g_xLfmt
185: g_Qfmt
186:
187: which do a "%g" style conversion either to a specified number of decimal
188: places (if their ndig argument is positive), or to the shortest
189: decimal string that rounds to the given binary floating-point value
190: (if ndig <= 0). They write into a buffer supplied as an argument
191: and return either a pointer to the end of the string (a null character)
192: in the buffer, if the buffer was long enough, or 0. Other forms of
193: conversion are easily done with the help of gdtoa(), such as %e or %f
194: style and conversions with direction of rounding specified (so that, if
195: desired, the decimal value is either >= or <= the binary value).
1.1.1.2 ! christos 196: On IEEE-arithmetic systems that provide the C99 fegetround() function,
! 197: if compiled with -DHonor_FLT_ROUNDS, these routines honor the current
! 198: rounding mode.
1.1 kleink 199:
200: For an example of more general conversions based on dtoa(), see
201: netlib's "printf.c from ampl/solvers".
202:
203: For double-double -> decimal, g_ddfmt() assumes IEEE-like arithmetic
204: of precision max(126, #bits(input)) bits, where #bits(input) is the
205: number of mantissa bits needed to represent the sum of the two double
206: values in the input.
207:
208: The makefile creates a library, gdtoa.a. To use the helper
209: routines, a program only needs to include gdtoa.h. All the
210: source files for gdtoa.a include a more extensive gdtoaimp.h;
211: among other things, gdtoaimp.h has #defines that make "internal"
212: names end in _D2A. To make a "system" library, one could modify
213: these #defines to make the names start with __.
214:
215: Various comments about possible #defines appear in gdtoaimp.h,
216: but for most purposes, arith.h should set suitable #defines.
217:
218: Systems with preemptive scheduling of multiple threads require some
219: manual intervention. On such systems, it's necessary to compile
220: dmisc.c, dtoa.c gdota.c, and misc.c with MULTIPLE_THREADS #defined,
221: and to provide (or suitably #define) two locks, acquired by
222: ACQUIRE_DTOA_LOCK(n) and freed by FREE_DTOA_LOCK(n) for n = 0 or 1.
223: (The second lock, accessed in pow5mult, ensures lazy evaluation of
224: only one copy of high powers of 5; omitting this lock would introduce
225: a small probability of wasting memory, but would otherwise be harmless.)
226: Routines that call dtoa or gdtoa directly must also invoke freedtoa(s)
227: to free the value s returned by dtoa or gdtoa. It's OK to do so whether
228: or not MULTIPLE_THREADS is #defined, and the helper g_*fmt routines
229: listed above all do this indirectly (in gfmt_D2A(), which they all call).
230:
231: By default, there is a private pool of memory of length 2000 bytes
232: for intermediate quantities, and MALLOC (see gdtoaimp.h) is called only
233: if the private pool does not suffice. 2000 is large enough that MALLOC
234: is called only under very unusual circumstances (decimal -> binary
235: conversion of very long strings) for conversions to and from double
236: precision. For systems with preemptively scheduled multiple threads
237: or for conversions to extended or quad, it may be appropriate to
238: #define PRIVATE_MEM nnnn, where nnnn is a suitable value > 2000.
239: For extended and quad precisions, -DPRIVATE_MEM=20000 is probably
240: plenty even for many digits at the ends of the exponent range.
241: Use of the private pool avoids some overhead.
242:
243: Directory test provides some test routines. See its README.
244: I've also tested this stuff (except double double conversions)
245: with Vern Paxson's testbase program: see
246:
247: V. Paxson and W. Kahan, "A Program for Testing IEEE Binary-Decimal
248: Conversion", manuscript, May 1991,
249: ftp://ftp.ee.lbl.gov/testbase-report.ps.Z .
250:
251: (The same ftp directory has source for testbase.)
252:
253: Some system-dependent additions to CFLAGS in the makefile:
254:
255: HU-UX: -Aa -Ae
256: OSF (DEC Unix): -ieee_with_no_inexact
257: SunOS 4.1x: -DKR_headers -DBad_float_h
258:
259: If you want to put this stuff into a shared library and your
260: operating system requires export lists for shared libraries,
261: the following would be an appropriate export list:
262:
263: dtoa
264: freedtoa
265: g_Qfmt
266: g_ddfmt
267: g_dfmt
268: g_ffmt
269: g_xLfmt
270: g_xfmt
271: gdtoa
272: strtoIQ
273: strtoId
274: strtoIdd
275: strtoIf
276: strtoIx
277: strtoIxL
278: strtod
279: strtodI
280: strtodg
281: strtof
282: strtopQ
283: strtopd
284: strtopdd
285: strtopf
286: strtopx
287: strtopxL
288: strtorQ
289: strtord
290: strtordd
291: strtorf
292: strtorx
293: strtorxL
294:
295: When time permits, I (dmg) hope to write in more detail about the
296: present conversion routines; for now, this README file must suffice.
297: Meanwhile, if you wish to write helper functions for other kinds of
298: IEEE-like arithmetic, some explanation of struct FPI and the bits
299: array may be helpful. Both gdtoa and strtodg operate on a bits array
300: described by FPI *fpi. The bits array is of type ULong, a 32-bit
301: unsigned integer type. Floating-point numbers have fpi->nbits bits,
302: with the least significant 32 bits in bits[0], the next 32 bits in
303: bits[1], etc. These numbers are regarded as integers multiplied by
304: 2^e (i.e., 2 to the power of the exponent e), where e is the second
305: argument (be) to gdtoa and is stored in *exp by strtodg. The minimum
306: and maximum exponent values fpi->emin and fpi->emax for normalized
307: floating-point numbers reflect this arrangement. For example, the
308: P754 standard for binary IEEE arithmetic specifies doubles as having
309: 53 bits, with normalized values of the form 1.xxxxx... times 2^(b-1023),
310: with 52 bits (the x's) and the biased exponent b represented explicitly;
311: b is an unsigned integer in the range 1 <= b <= 2046 for normalized
312: finite doubles, b = 0 for denormals, and b = 2047 for Infinities and NaNs.
313: To turn an IEEE double into the representation used by strtodg and gdtoa,
314: we multiply 1.xxxx... by 2^52 (to make it an integer) and reduce the
315: exponent e = (b-1023) by 52:
316:
317: fpi->emin = 1 - 1023 - 52
318: fpi->emax = 1046 - 1023 - 52
319:
320: In various wrappers for IEEE double, we actually write -53 + 1 rather
321: than -52, to emphasize that there are 53 bits including one implicit bit.
322: Field fpi->rounding indicates the desired rounding direction, with
323: possible values
324: FPI_Round_zero = toward 0,
325: FPI_Round_near = unbiased rounding -- the IEEE default,
326: FPI_Round_up = toward +Infinity, and
327: FPI_Round_down = toward -Infinity
328: given in gdtoa.h.
329:
330: Field fpi->sudden_underflow indicates whether strtodg should return
331: denormals or flush them to zero. Normal floating-point numbers have
332: bit fpi->nbits in the bits array on. Denormals have it off, with
333: exponent = fpi->emin. Strtodg provides distinct return values for normals
334: and denormals; see gdtoa.h.
335:
336: Compiling g__fmt.c, strtod.c, and strtodg.c with -DUSE_LOCALE causes
337: the decimal-point character to be taken from the current locale; otherwise
338: it is '.'.
339:
1.1.1.2 ! christos 340: Source files dtoa.c and strtod.c in this directory are derived from
! 341: netlib's "dtoa.c from fp" and are meant to function equivalently.
! 342: When compiled with Honor_FLT_ROUNDS #defined (on systems that provide
! 343: FLT_ROUNDS and fegetround() as specified in the C99 standard), they
! 344: honor the current rounding mode. Because FLT_ROUNDS is buggy on some
! 345: (Linux) systems -- not reflecting calls on fesetround(), as the C99
! 346: standard says it should -- when Honor_FLT_ROUNDS is #defined, the
! 347: current rounding mode is obtained from fegetround() rather than from
! 348: FLT_ROUNDS, unless Trust_FLT_ROUNDS is also #defined.
! 349:
! 350: Compile with -DUSE_LOCALE to use the current locale; otherwise
! 351: decimal points are assumed to be '.'. With -DUSE_LOCALE, unless
! 352: you also compile with -DNO_LOCALE_CACHE, the details about the
! 353: current "decimal point" character string are cached and assumed not
! 354: to change during the program's execution.
! 355:
! 356: On machines with a 64-bit long double and perhaps a 113-bit "quad"
! 357: type, you can invoke "make Printf" to add Printf (and variants, such
! 358: as Fprintf) to gdtoa.a. These are analogs, declared in stdio1.h, of
! 359: printf and fprintf, etc. in which %La, %Le, %Lf, and %Lg are for long
! 360: double and (if appropriate) %Lqa, %Lqe, %Lqf, and %Lqg are for quad
! 361: precision printing.
! 362:
1.1 kleink 363: Please send comments to David M. Gay (dmg at acm dot org, with " at "
364: changed at "@" and " dot " changed to ".").
CVSweb <webmaster@jp.NetBSD.org>