- a mantissa, and
- an exponent.

- it has only one single digit left of the dot, and
- this digit is not zero.

Numbers themselves can also be negative, such as -1.234567. Because multiplying and dividing by the base does not change the sign, the mantissa also needs a sign bit. That way we can handle positive as well as negative numbers, such as -182.162°C as the boiling point of oxygen. Of course we'll have to divide this boiling point by 100 to get a normalized mantissa, and its exponent will be plus two. Normalized, we get -1.82162E+02 for that boiling point.
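
The normalization of the boiling point can be tried out with a few lines of Python. This is only a minimal sketch: the function name `normalize_decimal` is made up for this example, and the standard `decimal` module is used so that no binary rounding disturbs the decimal digits.

```python
from decimal import Decimal


def normalize_decimal(text: str) -> tuple[Decimal, int]:
    """Return (mantissa, exponent) with one non-zero digit left of the dot."""
    number = Decimal(text)
    if number == 0:
        raise ValueError("zero has no normalized form")
    exponent = number.adjusted()         # power of ten of the leading digit
    mantissa = number.scaleb(-exponent)  # move the dot behind that digit
    return mantissa, exponent


mantissa, exponent = normalize_decimal("-182.162")
print(f"{mantissa}E{exponent:+03d}")     # prints -1.82162E+02
```

The sign stays with the mantissa, only the position of the dot goes into the exponent.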

Converted to the binary world, where the base is 2, a floating point number needs at least two bytes: one for the mantissa and one for the exponent. Both are signed integers. The meaning of one bit in the mantissa and one bit in the exponent is very different:

- In the mantissa each bit, counted from the dot, or better: from its highest non-sign bit, stands for 1 divided by 2 to the power of n, where n is its position in the mantissa. So the first bit is 1 / 2^1 = 1 / 2, or in decimal 0.5. Each further bit stands for half of the previous bit, so the next in line is 0.25, the one after that is 0.125, and so on.
- The exponent is simpler to understand: an 8-bit exponent ranges from zero to 127 (hexadecimal 0x00 to 0x7F) for positive exponents and from -1 to -128 (hexadecimal 0xFF for -1, 0x80 for -128) for negative exponents. This means that for a positive exponent n the mantissa has to be shifted n positions to the left, and for a negative exponent -n it has to be shifted n positions to the right. A left shift multiplies the mantissa by two, a right shift divides it by two.
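
The two bit meanings described above can be sketched in Python as follows. The function names are invented for this illustration, and the sketch only decodes positive mantissas, because how the negative mantissa is encoded is not fixed by the description above.

```python
def to_signed8(byte: int) -> int:
    """Interpret a byte (0..255) as a two's-complement signed integer."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value between 0x00 and 0xFF")
    return byte - 0x100 if byte & 0x80 else byte


def mantissa_fraction(byte: int) -> float:
    """Sum the bit weights 1/2, 1/4, 1/8, ... of the seven non-sign bits."""
    total = 0.0
    for n in range(1, 8):           # n = position behind the dot
        if byte & (0x80 >> n):      # 0x40 is position 1, 0x01 is position 7
            total += 1 / 2 ** n
    return total


def decode(mantissa_byte: int, exponent_byte: int) -> float:
    """Value of a positive two-byte float: mantissa shifted by the exponent."""
    if mantissa_byte & 0x80:
        raise ValueError("negative mantissas are left out of this sketch")
    return mantissa_fraction(mantissa_byte) * 2.0 ** to_signed8(exponent_byte)


print(decode(0x40, 0x01))  # 0.5 shifted one position left:   1.0
print(decode(0x60, 0xFF))  # 0.75 shifted one position right: 0.375
print(decode(0x7F, 0x00))  # largest 8-bit mantissa:          0.9921875
```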

The variations that come with the mantissa are very small: the largest value of an 8-bit mantissa, 0x7F, is only 0.992 in decimal and therefore only 0.008 below one. So we can only handle numbers with slightly more than two digits (2 1/2) of precision in an 8-bit mantissa. That is by far not enough for calculating interest rates or other commercial stuff or for engineering, it is only suitable for rather rough technical measurements. 8-bit mantissas have the same accuracy as an ancient slide rule (for those who are still familiar with that kind of calculating machine).

To increase the precision we add another eight bits to the mantissa. The lowest of the mantissa's bits now stands for 0.0000305. This increases the precision to slightly more than four digits. If we add another byte to the mantissa, we are at slightly more than six decimal digits, and the complete number already has 32 bits or four bytes. 16-bit mantissas are not precise enough to calculate Mandelbrot sets, but are suitable for most technical applications.
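
To see where these precision figures come from, the weight of the lowest mantissa bit and the resulting number of decimal digits can be calculated. The following is a rough sketch only; the helper name `mantissa_precision` is made up here, and it assumes that one bit of the mantissa is used as sign bit.

```python
import math


def mantissa_precision(total_bits: int) -> tuple[float, float]:
    """Return (weight of the lowest bit, approximate decimal digits)."""
    fraction_bits = total_bits - 1              # one bit is the sign bit
    lowest_bit = 2.0 ** -fraction_bits          # 1 / 2^n for the last position
    digits = fraction_bits * math.log10(2)      # each bit is worth about 0.3 digits
    return lowest_bit, digits


for bits in (8, 16, 24):
    lowest_bit, digits = mantissa_precision(bits)
    print(f"{bits:2d}-bit mantissa: lowest bit {lowest_bit:.7f}, "
          f"about {digits:.1f} decimal digits")
```

For 8 and 16 bits the lowest bit comes out as 0.0078125 and roughly 0.0000305, which are the values used in the text above.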

If you need a higher resolution, pick a suitable format from this table.

Because one additional mantissa bit increases the precision by roughly a third of a decimal digit, the inventors of binary floats gained one more bit with a trick: because a normalized binary mantissa always starts with a one, this bit can be skipped, and an additional bit fits into the 16-bit mantissa at the end. These kinds of tricks increase the variety of floating point formats and make them more and more complicated to understand: of course the skipped one-bit on top has to be added back when calculating with the mantissa. It can replace the mantissa's sign bit, if that sign bit is stored elsewhere.
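
How such a skipped leading one can be handled is sketched below. This is not the layout of any particular float format: the 17-bit magnitude, the constant `HIDDEN_BIT` and both function names are assumptions made for this illustration only.

```python
HIDDEN_BIT = 1 << 16      # the leading one of a 17-bit normalized magnitude
STORED_MASK = 0xFFFF      # the 16 bits that are really stored


def pack_hidden(magnitude17: int) -> int:
    """Drop the leading one of a normalized 17-bit magnitude."""
    if (magnitude17 >> 17) or not (magnitude17 & HIDDEN_BIT):
        raise ValueError("mantissa is not normalized")
    return magnitude17 & STORED_MASK


def unpack_hidden(stored16: int) -> int:
    """Add the skipped one-bit on top again before calculating."""
    return (stored16 & STORED_MASK) | HIDDEN_BIT


stored = pack_hidden(0x1ABCD)
print(hex(stored))                 # 0xabcd
print(hex(unpack_hidden(stored)))  # 0x1abcd
```

So 17 significant bits are carried in 16 stored bits, at the price of one extra step before and after every calculation.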

Floats do have one advantage: they simplify the multiplication and division of two numbers. If we have to multiply two floats with their mantissas M1 and M2, we can simply multiply the two mantissas and, even simpler, add their two exponents E1 and E2. When dividing, we divide M1 by M2 and subtract E2 from E1.
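
A minimal sketch of both operations follows. It assumes a mantissa held as a signed Python integer with 15 bits behind the dot (so 0x4000 stands for 0.5) and a separate integer exponent; the names `multiply`, `divide` and `to_value` are chosen for this example and the results are truncated, not rounded.

```python
FRACTION_BITS = 15                      # bits behind the dot (assumption)
HALF = 1 << (FRACTION_BITS - 1)         # 0x4000 = 0.5, smallest normalized mantissa
ONE = 1 << FRACTION_BITS                # 0x8000 = 1.0, just above the largest one


def to_value(mantissa: int, exponent: int) -> float:
    """Convert (mantissa, exponent) to an ordinary Python float for checking."""
    return mantissa / ONE * 2.0 ** exponent


def multiply(m1: int, e1: int, m2: int, e2: int) -> tuple[int, int]:
    """Multiply the mantissas, add the exponents, then renormalize."""
    sign = -1 if (m1 < 0) != (m2 < 0) else 1
    product = abs(m1) * abs(m2)         # now has 2 * FRACTION_BITS behind the dot
    exponent = e1 + e2
    while product and product < (HALF << FRACTION_BITS):
        product <<= 1                   # mantissa below 0.5: shift left,
        exponent -= 1                   # and correct the exponent
    return sign * (product >> FRACTION_BITS), exponent


def divide(m1: int, e1: int, m2: int, e2: int) -> tuple[int, int]:
    """Divide the mantissas, subtract the exponents, then renormalize."""
    sign = -1 if (m1 < 0) != (m2 < 0) else 1
    quotient = (abs(m1) << FRACTION_BITS) // abs(m2)
    exponent = e1 - e2
    while quotient >= ONE:
        quotient >>= 1                  # mantissa reached 1.0: shift right
        exponent += 1
    return sign * quotient, exponent


# 1.5 = 0.75 * 2^1 and 2.0 = 0.5 * 2^2
print(to_value(*multiply(0x6000, 1, 0x4000, 2)))   # 3.0
print(to_value(*divide(0x6000, 1, 0x4000, 2)))     # 0.75
```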

The simplification when multiplying comes at the price of a higher effort when adding or subtracting. Before we can add the two mantissas we have to bring their exponents to the same value (by shifting the mantissa of the smaller number to the right and increasing its exponent accordingly). Only when both exponents are equal can we add the two mantissas.
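
The alignment step can be sketched like this, again under the assumption of a signed integer mantissa with 15 bits behind the dot and a separate integer exponent. The names `add`, `shift_right` and `to_value` are invented for this example, and bits shifted out to the right are simply lost.

```python
FRACTION_BITS = 15                      # bits behind the dot (assumption)
HALF = 1 << (FRACTION_BITS - 1)         # 0x4000 = 0.5
ONE = 1 << FRACTION_BITS                # 0x8000 = 1.0


def to_value(mantissa: int, exponent: int) -> float:
    """Convert (mantissa, exponent) to an ordinary Python float for checking."""
    return mantissa / ONE * 2.0 ** exponent


def shift_right(mantissa: int, positions: int) -> int:
    """Shift the magnitude right and keep the sign (truncates toward zero)."""
    shifted = abs(mantissa) >> positions
    return -shifted if mantissa < 0 else shifted


def add(m1: int, e1: int, m2: int, e2: int) -> tuple[int, int]:
    """Bring both exponents to the same value, add, then renormalize."""
    if e1 < e2:                         # make (m1, e1) the larger exponent
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 = shift_right(m2, e1 - e2)       # align the smaller number
    total, exponent = m1 + m2, e1
    if abs(total) >= ONE:               # overflow above 1.0: one shift right
        total = shift_right(total, 1)
        exponent += 1
    while total and abs(total) < HALF:  # result below 0.5: shift left
        total <<= 1
        exponent -= 1
    return total, exponent


# 1.5 = 0.75 * 2^1 and 0.375 = 0.75 * 2^-1
print(to_value(*add(0x6000, 1, 0x6000, -1)))    # 1.875
# 1.0 = 0.5 * 2^1 and -0.75 = -0.75 * 2^0
print(to_value(*add(0x4000, 1, -0x6000, 0)))    # 0.25
```

Subtracting works the same way with the sign of the second mantissa inverted.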


©2021 by http://www.avr-asm-tutorial.net