"New C Standard" commentary

330
The characteristics of floating types are defined in terms of a model
that describes a representation of floating-point numbers and values
that provide information about an implementation's floating-point
arithmetic.^{16)}

331 The following parameters are used to define the model for each floating-point type:

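332
$s$ sign ($\pm 1$)

333
$b$ base or radix of exponent representation (an integer $> 1$)

334
$e$ exponent (an integer between a minimum $e_{min}$ and a maximum $e_{max}$)

335
$p$ precision (the number of base-$b$ digits in the significand)

336
$f_{k}$ nonnegative integers less than $b$ (the significand digits)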

337
A *floating-point number* (*x*) is defined by the following
model:

$$x = s\,b^{e} \sum_{k=1}^{p} f_{k}\,b^{-k}, \qquad e_{min} \le e \le e_{max}$$

338
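In addition to normalized floating-point numbers
($f_{1} > 0$ if $x \ne 0$), floating types may be able to contain other kinds of floating-point numbers, such as subnormal floating-point numbers ($x \ne 0$, $e = e_{min}$, $f_{1} = 0$) and unnormalized floating-point numbers ($x \ne 0$, $e > e_{min}$, $f_{1} = 0$), and values that are not floating-point numbers, such as infinities and NaNs.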

339
A *NaN* is an encoding signifying Not-a-Number.

340
A *quiet NaN* propagates through almost every arithmetic
operation without raising a floating-point exception;

341
a *signaling NaN* generally raises a floating-point exception
when occurring as an arithmetic operand.^{17)}

342

343

344 15) See 6.2.5.

345 16) The floating-point model is intended to clarify the description of each floating-point characteristic and does not require the floating-point arithmetic of the implementation to be identical.

346
The accuracy of the floating-point operations (

347 The implementation may state that the accuracy is unknown.

348
All integer values in the

349 all floating values shall be constant expressions.

350
All except

351
The floating-point model representation is provided for all values
except

352
The rounding mode for floating-point addition is characterized by the
implementation-defined value of ^{18)}

All other values for

353

354
The use of evaluation formats is characterized by the
implementation-defined value of
^{19)}

355

356

357 17) IEC 60559:1989 specifies quiet and signaling NaNs.

358 For implementations that do not support IEC 60559:1989, the terms quiet NaN and signaling NaN are intended to apply to encodings with similar behavior.

359
18) Evaluation of

360 19) The evaluation method determines evaluation formats of expressions involving all floating types, not just real types.

361
For example, if

362

363

364
All other negative values for

365 The values given in the following list shall be replaced by constant expressions with implementation-defined values that are greater or equal in magnitude (absolute value) to those shown, with the same sign:

366
— radix of exponent representation,

FLT_RADIX 2

367
— number of base-

FLT_MANT_DIG
DBL_MANT_DIG
LDBL_MANT_DIG

368
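— number of decimal digits, $n$, such that any floating-point number in the widest supported floating type with $p_{max}$ radix $b$ digits can be rounded to a floating-point number with $n$ decimal digits and back again without change to the value,

$$\begin{cases} p_{max}\log_{10} b & \text{if } b \text{ is a power of 10} \\ \lceil 1 + p_{max}\log_{10} b \rceil & \text{otherwise} \end{cases}$$

DECIMAL_DIG 10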

369
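— number of decimal digits, $q$, such that any floating-point number with $q$ decimal digits can be rounded into a floating-point number with $p$ radix $b$ digits and back again without change to the $q$ decimal digits,

$$\begin{cases} p\log_{10} b & \text{if } b \text{ is a power of 10} \\ \lfloor (p-1)\log_{10} b \rfloor & \text{otherwise} \end{cases}$$

FLT_DIG 6
DBL_DIG 10
LDBL_DIG 10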

370
— minimum negative integer such that
$e_{min}$

FLT_MIN_EXP
DBL_MIN_EXP
LDBL_MIN_EXP

371
— minimum negative integer such that 10 raised to that
power is in the range of normalized floating-point numbers,
$\lceil \log_{10} b^{e_{min}-1} \rceil$

FLT_MIN_10_EXP -37
DBL_MIN_10_EXP -37
LDBL_MIN_10_EXP -37

372
— maximum integer such that
$e_{max}$

FLT_MAX_EXP
DBL_MAX_EXP
LDBL_MAX_EXP

373
— maximum integer such that 10 raised to that power is
in the range of representable finite floating-point numbers,
$\lfloor \log_{10}((1 - b^{-p})\,b^{e_{max}}) \rfloor$

FLT_MAX_10_EXP +37
DBL_MAX_10_EXP +37
LDBL_MAX_10_EXP +37

374 The values given in the following list shall be replaced by constant expressions with implementation-defined values that are greater than or equal to those shown:

375
— maximum representable finite floating-point number,
$(1 - b^{-p})\,b^{e_{max}}$

FLT_MAX 1E+37
DBL_MAX 1E+37
LDBL_MAX 1E+37

376 The values given in the following list shall be replaced by constant expressions with implementation-defined (positive) values that are less than or equal to those shown:

377
— the difference between 1 and the least value greater
than 1 that is representable in the given floating point type,
$b^{1-p}$

FLT_EPSILON 1E-5
DBL_EPSILON 1E-9
LDBL_EPSILON 1E-9

378
— minimum normalized positive floating-point number,
$b^{e_{min}-1}$

FLT_MIN 1E-37
DBL_MIN 1E-37
LDBL_MIN 1E-37

379
Conversion from (at least)

380
EXAMPLE 1
The following describes an artificial floating-point representation
that meets the minimum requirements of this International Standard,
and the appropriate values in a

$$x = s\,16^{e} \sum_{k=1}^{6} f_{k}\,16^{-k}, \qquad -31 \le e \le +32$$

FLT_RADIX 16
FLT_MANT_DIG 6
FLT_EPSILON 9.53674316E-07F
FLT_DIG 6
FLT_MIN_EXP -31
FLT_MIN 2.93873588E-39F
FLT_MIN_10_EXP -38
FLT_MAX_EXP +32
FLT_MAX 3.40282347E+38F
FLT_MAX_10_EXP +38

381
EXAMPLE 2
The following describes floating-point representations that also meet
the requirements for single-precision and double-precision normalized
numbers in IEC 60559,^{20)} and the appropriate values
in a

$$x_{f} = s\,2^{e} \sum_{k=1}^{24} f_{k}\,2^{-k}, \qquad -125 \le e \le +128$$

$$x_{d} = s\,2^{e} \sum_{k=1}^{53} f_{k}\,2^{-k}, \qquad -1021 \le e \le +1024$$

FLT_RADIX 2
DECIMAL_DIG 17
FLT_MANT_DIG 24
FLT_EPSILON 1.19209290E-07F // decimal constant
FLT_EPSILON 0X1P-23F // hex constant
FLT_DIG 6
FLT_MIN_EXP -125
FLT_MIN 1.17549435E-38F // decimal constant
FLT_MIN 0X1P-126F // hex constant
FLT_MIN_10_EXP -37
FLT_MAX_EXP +128
FLT_MAX 3.40282347E+38F // decimal constant
FLT_MAX 0X1.fffffeP127F // hex constant
FLT_MAX_10_EXP +38
DBL_MANT_DIG 53
DBL_EPSILON 2.2204460492503131E-16 // decimal constant
DBL_EPSILON 0X1P-52 // hex constant
DBL_DIG 15
DBL_MIN_EXP -1021
DBL_MIN 2.2250738585072014E-308 // decimal constant
DBL_MIN 0X1P-1022 // hex constant
DBL_MIN_10_EXP -307
DBL_MAX_EXP +1024
DBL_MAX 1.7976931348623157E+308 // decimal constant
DBL_MAX 0X1.fffffffffffffP1023 // hex constant
DBL_MAX_10_EXP +308

If a type wider than

382
20) The floating-point model in that standard sums powers of

383
**Forward references:**
conditional inclusion (6.10.1), complex arithmetic

Created at: 2008-01-30 02:39:40 The text from WG14/N1256 is copyright © ISO