IEEE 754
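The IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754) is the most widely used standard for floating-point computation and is followed by many CPUs and floating-point units. The standard defines formats for representing floating-point numbers (including negative zero, −0, and denormal numbers), special values (infinities and NaNs, "not a number"), and a set of floating-point operations on these values. It also specifies four rounding modes and five exceptions, including when the exceptions occur and how they are handled.

IEEE 754 specifies four formats for representing floating-point values: single precision (32 bits), double precision (64 bits), single-extended precision (43 bits or more, rarely used), and double-extended precision (79 bits or more, usually implemented with 80 bits). Only the 32-bit format is required; the others are optional. Most programming languages provide IEEE formats and arithmetic, but some treat them as optional. For example, the C language, which predates IEEE 754, now includes IEEE arithmetic but does not make it mandatory (C's float is typically IEEE single precision and double is typically IEEE double precision).

The full title of the standard is IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985). It is also known as IEC 60559:1989, Binary floating-point arithmetic for microprocessor systems (originally numbered IEC 559:1989)[1]. A later standard, IEEE 854-1987, covers radix-independent floating-point arithmetic and specifies the cases of radix 2 and radix 10.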
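Anatomy of a floating-point number

The following describes the floating-point formats defined by the standard.

Bit conventions used in this article

The bits of a computer word of width W are numbered with the integers 0 to W−1. Bit 0 is usually the rightmost bit, so that the lowest-numbered bit coincides with the least significant bit (lsb), the bit whose change affects the value the least.

General layout

Binary floating-point numbers are stored in sign-magnitude form. The most significant bit is the sign bit. The exponent field, the next e bits, holds the exponent after the exponent bias has been applied. The fraction field, the remaining f bits, holds the significand without its most significant bit.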
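As a rough illustration of this layout, the following Python sketch splits a value into its three fields. The name split_fields and its parameters are assumptions made for this example, not part of the standard; the sketch relies on Python's struct module to obtain the stored bits and only handles the 32-bit and 64-bit widths.

```python
import struct

# struct format codes for the two widths handled by this sketch:
# (float code, unsigned integer code of the same size), big-endian.
_CODES = {32: (">f", ">I"), 64: (">d", ">Q")}


def split_fields(x, exp_bits=8, frac_bits=23):
    """Return (sign, exponent field, fraction field) of x.

    The defaults describe the 32-bit single format; use exp_bits=11 and
    frac_bits=52 for the 64-bit double format.
    """
    width = 1 + exp_bits + frac_bits
    float_code, int_code = _CODES[width]
    # Reinterpret the stored bytes of the float as one unsigned integer.
    (bits,) = struct.unpack(int_code, struct.pack(float_code, x))
    sign = bits >> (width - 1)                              # top bit
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)  # next e bits
    fraction = bits & ((1 << frac_bits) - 1)                # low f bits
    return sign, exponent, fraction


print(split_fields(0.15625))                         # (0, 124, 2097152)
print(split_fields(1.0, exp_bits=11, frac_bits=52))  # (0, 1023, 0)
```

The exponent value returned here is the stored (biased) field, not the true exponent.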
Exponent bias

The exponent is stored with a bias of 2^(e−1) − 1, where e is the width of the exponent field: the stored value is the true exponent plus the bias (see also the excess-N representation of signed numbers). Biasing is done because exponents have to be signed values in order to represent both tiny and huge values, but two's complement, the usual representation for signed values, would make comparison harder. To solve this, the exponent is biased before being stored, which moves it into an unsigned range suitable for comparison. For example, a number with an exponent of 17 is stored with the exponent field 17 + 2^(e−1) − 1.

Examples

The most significant bit of the significand (not stored) is determined by the value of the exponent field. If 0 < exponent < 2^e − 1, the most significant bit of the significand is 1, and the number is said to be normalized. If exponent is 0, the most significant bit of the significand is 0 and the number is said to be denormalized. Three special cases arise:

1. If exponent is 0 and fraction is 0, the number is ±0, depending on the sign bit.
2. If exponent is 2^e − 1 and fraction is 0, the number is ±infinity, again depending on the sign bit.
3. If exponent is 2^e − 1 and fraction is not 0, the value is not a number (NaN).
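This can be summarized as:

Type                  Exponent field   Fraction field
Zeros                 0                0
Denormalized numbers  0                nonzero
Normalized numbers    1 to 2^e − 2     any
Infinities            2^e − 1          0
NaNs                  2^e − 1          nonzero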
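The bias formula and the classification by exponent and fraction fields can be sketched in a few lines of Python. The function names exponent_bias and classify are hypothetical, chosen for this illustration; the inputs are the raw stored fields, not decoded values.

```python
def exponent_bias(exp_bits):
    """Bias for an exponent field of exp_bits bits: 2^(e-1) - 1."""
    return (1 << (exp_bits - 1)) - 1


def classify(exponent, fraction, exp_bits=8):
    """Name the kind of value encoded by the stored exponent and fraction."""
    all_ones = (1 << exp_bits) - 1  # largest stored exponent, 2^e - 1
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == all_ones:
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"


# A true exponent of 17 is stored as 17 + bias.
print(17 + exponent_bias(8))   # 144 for the 8-bit exponent field
print(17 + exponent_bias(11))  # 1040 for the 11-bit exponent field
print(classify(255, 0))        # infinity
print(classify(0, 1))          # denormalized
```

Because the stored exponent is unsigned, a larger true exponent always yields a larger stored field, which is what makes the bit patterns easy to compare.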
Single-precision 32 bit

A single-precision binary floating-point number is stored in 32 bits: 1 sign bit, 8 exponent bits, and 23 fraction bits. The exponent is biased by 2^(8−1) − 1 = 127, so exponents in the range −126 to +127 are representable (the reason for biasing is explained above). An exponent of −127 would be biased to the value 0, but this is reserved to encode a denormalized number or zero. An exponent of 128 would be biased to the value 255, but this is reserved to encode an infinity or not a number (NaN).

For normalized numbers, the most common case, Exp is the biased exponent and fraction is the significand without its most significant bit. The number has value v:

v = s × 2^e × m

where

s = +1 (positive numbers) when the sign bit is 0
s = −1 (negative numbers) when the sign bit is 1
e = Exp − 127 (in other words, the exponent is stored with 127 added to it, also called "biased with 127")
m = 1.fraction in binary (that is, the significand is the binary number 1 followed by the radix point followed by the binary bits of the fraction). Therefore, 1 ≤ m < 2.

For example, if the sign bit is 0, the exponent e is −3 (stored as Exp = 124), and the significand is 1.01 in binary (1.25 in decimal), the represented number is +1.25 × 2^−3, which is +0.15625.
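The formula v = s × 2^e × m can be checked with a short Python sketch. The function decode_single is a hypothetical helper written for this illustration; it handles normalized numbers only and takes the 32-bit pattern as an integer.

```python
import struct


def decode_single(bits):
    """Evaluate v = s * 2^e * m for a normalized single-precision pattern."""
    sign_bit = bits >> 31
    exp_field = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exp_field == 0 or exp_field == 255:
        # Zero, denormalized, infinity and NaN are outside this sketch.
        raise ValueError("not a normalized number")
    s = -1 if sign_bit else 1
    e = exp_field - 127          # remove the bias
    m = 1 + fraction / 2**23     # implicit leading 1, so 1 <= m < 2
    return s * 2.0**e * m


# Sign 0, Exp = 124 (e = -3), fraction = .01 in binary.
pattern = (0 << 31) | (124 << 23) | (0b01 << 21)
print(hex(pattern))             # 0x3e200000
print(decode_single(pattern))   # 0.15625

# Cross-check against the platform's own interpretation of the same bits.
(reference,) = struct.unpack(">f", struct.pack(">I", pattern))
print(reference == decode_single(pattern))  # True
```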
A more complex example

Let us encode the decimal number −118.625 using the IEEE 754 system.
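One way to trace this encoding is the Python sketch below, which normalizes the magnitude, biases the exponent, and assembles the 32 bits. The helper encode_single is an assumption of this illustration rather than a procedure taken from the standard, and it only covers nonzero finite values that are exactly representable as normalized single-precision numbers (it truncates instead of rounding).

```python
import math
import struct


def encode_single(x):
    """Return (sign, exponent field, fraction field, 32-bit pattern) for x."""
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("sketch handles nonzero finite values only")
    sign = 1 if x < 0 else 0
    m = abs(x)
    e = 0
    # Normalize so that 1 <= m < 2, tracking the power of two removed.
    while m >= 2:
        m /= 2
        e += 1
    while m < 1:
        m *= 2
        e -= 1
    if not -126 <= e <= 127:
        raise ValueError("exponent outside the normalized single range")
    exp_field = e + 127                 # apply the bias
    fraction = int((m - 1) * 2**23)     # drop the implicit leading 1
    bits = (sign << 31) | (exp_field << 23) | fraction
    return sign, exp_field, fraction, bits


sign, exp_field, fraction, bits = encode_single(-118.625)
# 118.625 = 1110110.101 in binary = 1.110110101 x 2^6, so Exp = 6 + 127 = 133.
print(f"{sign} {exp_field:08b} {fraction:023b}")
# 1 10000101 11011010100000000000000
print(hex(bits))  # 0xc2ed4000

# Cross-check against the bytes produced by the struct module.
print(struct.pack(">I", bits) == struct.pack(">f", -118.625))  # True
```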
Double-precision 64 bit

Double precision is essentially the same except that the fields are wider: 1 sign bit, 11 exponent bits, and 52 fraction bits. The fraction part is much larger, while the exponent is only slightly larger; the creators of the standard considered precision more important than range.

NaNs and infinities are represented with Exp being all 1s (2047). For normalized numbers the exponent bias is +1023 (so e is Exp − 1023). For denormalized numbers the exponent is −1022 (the minimum exponent for a normalized number; it is not −1023 because normalized numbers have a leading 1 digit before the binary point and denormalized numbers do not). As before, both infinity and zero are signed.
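The double-precision rules, including the fixed exponent of −1022 for denormalized numbers, can be illustrated by rebuilding a value from its fields. The function rebuild_double is a hypothetical name for this sketch; it assumes Python floats are IEEE doubles, which holds on common platforms.

```python
import struct


def rebuild_double(x):
    """Recompute a finite double from its sign, Exp and fraction fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign_bit = bits >> 63
    exp_field = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exp_field == 0x7FF:
        raise ValueError("infinity or NaN")
    if exp_field == 0:
        # Zero or denormalized: no implicit leading 1, exponent fixed at -1022.
        e = -1022
        m = fraction / 2**52
    else:
        # Normalized: implicit leading 1, bias of 1023.
        e = exp_field - 1023
        m = 1 + fraction / 2**52
    s = -1 if sign_bit else 1
    return s * m * 2.0**e


for value in (1.0, -118.625, 0.1, 5e-324, 2.2250738585072014e-308):
    print(value, rebuild_double(value) == value)
```

Each line printed by the loop should end in True, since every step of the reconstruction is exact in double precision.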
Comparing floating-point numbers

IEEE floating-point numbers are laid out so that they order lexicographically: if NaNs are excluded, they can be compared as sign-magnitude integers.

Rounding floating-point numbers

The IEEE standard has four rounding modes. The first, round to nearest (with ties going to the value whose last significand bit is even), is the default; the others, round toward 0, round toward +∞, and round toward −∞, are called directed roundings.
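Both points in this section can be tried out in Python, under the assumption that Python floats are IEEE doubles. The helper names sign_magnitude_key and nearest_double_of_int are invented for this sketch. The first maps a non-NaN double to an integer with the same ordering; the second uses Python's integer-to-float conversion, which rounds to nearest with ties to even, to show the default mode. Directed roundings are not shown because the standard library offers no portable way to select them for binary floats.

```python
import struct


def sign_magnitude_key(x):
    """Map a non-NaN double to an integer that sorts in the same order."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    magnitude = bits & ((1 << 63) - 1)   # exponent and fraction fields
    negative = bits >> 63                # sign bit
    return -magnitude if negative else magnitude


def nearest_double_of_int(n):
    """Round the integer n to the nearest double and return it as an int."""
    return int(float(n))


values = [1.0, float("-inf"), 0.15625, -118.625, 5e-324, float("inf"), -1.0]
print(sorted(values, key=sign_magnitude_key) == sorted(values))  # True

# 2^53 + 1 lies exactly halfway between two doubles; the tie goes to the
# neighbour whose last significand bit is 0.
print(nearest_double_of_int(2**53 + 1) == 2**53)      # True
print(nearest_double_of_int(2**53 + 3) == 2**53 + 4)  # True
```

Note that −0 and +0 both map to the integer 0, which matches the rule that the two zeros compare equal.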
Extending the real numbers

The IEEE standard employs (and extends) the affinely extended real number system, with separate positive and negative infinities. During drafting, there was a proposal for the standard to incorporate the projectively extended real number system, with a single unsigned infinity, by providing programmers with a mode selection option. To reduce the complexity of the final standard, however, the projective mode was dropped. The Intel 8087 and Intel 80287 floating-point co-processors both support this projective mode.[3][4][5]

Recommended functions and predicates
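The list of recommended functions is not reproduced in this copy of the article. As an illustration only, the sketch below uses Python's math.copysign, math.isnan and math.isinf, which play the role of a few such functions and predicates; treating them as stand-ins is an assumption of this example, and the helper describe is hypothetical. It also shows the signed infinities and signed zeros discussed above.

```python
import math


def describe(x):
    """Label a float using sign and class predicates only."""
    if math.isnan(x):
        return "NaN"
    # copysign transfers the sign bit, so it can tell -0.0 from +0.0.
    sign = "-" if math.copysign(1.0, x) < 0 else "+"
    if math.isinf(x):
        return sign + "infinity"
    if x == 0:
        return sign + "zero"
    return sign + "finite"


print(describe(math.inf))             # +infinity
print(describe(-math.inf))            # -infinity
print(describe(1.0 / -math.inf))      # -zero
print(describe(math.inf - math.inf))  # NaN
print(describe(-118.625))             # -finite
```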
References
Revision of the standard

The IEEE 754 standard is currently under revision; see IEEE 754r.

See also
External links


