Floating-Point Formats

The original IEEE 754 standard (IEEE 754-1985) defines four floating-point number formats:

Single-Precision
Double-Precision
Single-Extended Precision
Double-Extended Precision

Single-Precision

Single-precision floating-point numbers are 32 bits wide. The first bit (bit 31, the MSB) is the sign bit, the next 8 bits (bits 30-23) are the exponent, and the remaining 23 bits (bits 22-0) are the significand. Note that even though only 23 bits are stored for the significand, the precision ( $p$ ) is actually 24 bits. This is possible because the system is normalized with base $b = 2$ : the leading digit of a normalized significand is always 1, so it does not need to be stored. The exponent is stored with a bias of 127 (the stored value is the true exponent plus 127), so that negative exponents can be expressed without a separate sign.
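The layout described above can be checked with a short Python sketch. The function names `decode_single` and `single_value` are illustrative, not part of any library, and the reconstruction assumes a normalized number (biased exponent between 1 and 254); zero, subnormals, infinities, and NaNs are not handled.

```python
import struct

def decode_single(x):
    """Split x, rounded to single precision, into (sign, exponent, fraction)."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # bit 31
    exponent = (bits >> 23) & 0xFF   # bits 30-23, still biased by 127
    fraction = bits & 0x7FFFFF       # bits 22-0, the 23 stored significand bits
    return sign, exponent, fraction

def single_value(sign, exponent, fraction):
    """Rebuild the value of a normalized number from its three fields."""
    significand = 1 + fraction / 2**23   # restore the implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

sign, exponent, fraction = decode_single(-6.5)
print(sign, exponent, fraction)                 # 1 129 5242880
print(single_value(sign, exponent, fraction))   # -6.5
```

Here -6.5 is -1.625 × 2^2, so the stored exponent is 2 + 127 = 129 and the stored fraction holds only the .625 part.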

Double-Precision

Double-precision numbers are 64 bits wide. The MSB (bit 63) is the sign bit, the next 11 bits (bits 62-52) are the exponent, and the remaining 52 bits (bits 51-0) are the significand. Again, the precision is actually 53 bits (not 52), because the leading 1 of a normalized significand is not stored. The exponent is biased by 1023.

Review

Format	Width	Precision	Exponent	Stored significand
Single	32 bits	24 bits	bits 30-23	bits 22-0
Double	64 bits	53 bits	bits 62-52	bits 51-0
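One way to see what 24-bit and 53-bit precision mean in practice is to look for the first integer each format cannot hold exactly. The sketch below assumes Python's `float` is an IEEE 754 double; `round_to_single` is a hypothetical helper that rounds a value to single precision by packing and unpacking it.

```python
import struct

def round_to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# 2**24 fits in 24 significant bits; 2**24 + 1 would need 25.
print(round_to_single(2.0**24) == 2.0**24)           # True
print(round_to_single(2.0**24 + 1) == 2.0**24 + 1)   # False

# 2**53 fits in 53 significant bits; 2**53 + 1 would need 54.
print(float(2**53) == 2**53)           # True
print(float(2**53 + 1) == 2**53 + 1)   # False
```

In both cases the value that needs one extra bit is rounded to a neighboring representable number, so the comparison fails.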

This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.