Beginner's introduction to AVR assembler language
Floating point numbers in assembler language
Floating points, if necessary
For those who want to make life more complicated than necessary: besides whole numbers (integers),
signed integers and fixed-point numbers, there are also floating point numbers. What higher-level
languages write simply as 1.234567 is rather complicated in assembler. Here is why.
The format of floating point numbers
Binary floating point numbers consist of two constituents:
- a mantissa, and
- an exponent.
In the decimal world the mantissa gives the normal number part: in 1.234567 this is the
1.234567. The precision of the number is given by its seven digits. The exponent says how
often the mantissa has to be multiplied by 10 (the base in the decimal world). In our
example the exponent is zero. The number could also be written as 0.1234567*10^1, or
shorter as 0.1234567E+01, which says: shift the mantissa one position to the left. It
could also be written as 12.34567E-01, which says: shift the mantissa one position to the
right. The formulation 1.234567E+00 is called normalized because
- it has only one single digit left of the dot, and
- this digit is not zero.
Numbers larger than 9.999999 are repeatedly divided by 10, each time increasing the exponent
by one. Numbers smaller than 1 are repeatedly multiplied by 10, each time decreasing the
exponent by one. So numbers smaller than one have a negative exponent. That is why the
exponent has to be a signed number.
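The decimal normalization rules above can be tried out with a few lines of code. The following is a minimal sketch in Python, chosen only so that it can be run on a PC and not on the AVR; the function name normalize_decimal is hypothetical, and zero is rejected because it has no normalized form.

```python
def normalize_decimal(x: float) -> tuple[float, int]:
    """Split x into (mantissa, exponent) with 1 <= |mantissa| < 10 and x == mantissa * 10**exponent."""
    if x == 0:
        raise ValueError("zero has no normalized form")
    mantissa, exponent = x, 0
    while abs(mantissa) >= 10:   # too large: divide by 10, exponent goes up by one
        mantissa /= 10
        exponent += 1
    while abs(mantissa) < 1:     # too small: multiply by 10, exponent goes down by one
        mantissa *= 10
        exponent -= 1
    return mantissa, exponent    # the sign simply stays with the mantissa

print(normalize_decimal(12.34567))   # mantissa about 1.234567, exponent 1
print(normalize_decimal(-0.05))      # mantissa about -5.0, exponent -2
```

Because Python's own floats are binary, the printed mantissas may deviate from the exact decimal values in their last digits.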
Numbers themselves can also be negative, such as -1.234567. Because multiplying or dividing
by 10 does not change the sign, the sign cannot be coded into the exponent: the mantissa needs
its own sign bit. That way we can handle positive as well as negative numbers, such as
-182.962°C, the boiling point of oxygen. Of course we have to divide this boiling point by 100
to get a normalized mantissa, and its exponent will be plus two. Normalized we get
-1.82962E+02 for that boiling point.
Converted to the binary world, where the base is 2, floating point numbers need at least
two bytes: one for the mantissa and one for the exponent. Both are signed integers. The
meaning of one bit in the mantissa and of one bit in the exponent is very different.
Because the exponent multiplies the number by a power of two (* 2^n), each of its bits is
far more powerful than a bit in the mantissa. So 2^127 is 1.7 multiplied by 10 to the power
of 38, or 1.7*10^38, or even shorter 1.7E38. Vice versa, negative exponents make the number
very small: 2^-128 is decimal 2.9E-39. With an eight-bit exponent alone we can cover the
range of numbers from 2.9E-39 to 1.7E+38. That may not be enough for an astronomer, but it
is large and small enough for most of the rest of calculating mankind. So an 8-bit exponent
is sufficient.
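The range figures quoted above can be checked quickly. This small Python sketch (Python serves only as a calculator here) derives the limits of a signed 8-bit exponent from its bit count and prints the corresponding powers of two; like the text, it ignores the factor contributed by the mantissa.

```python
EXP_BITS = 8
e_min = -(1 << (EXP_BITS - 1))      # -128, hexadecimal 0x80
e_max = (1 << (EXP_BITS - 1)) - 1   # +127, hexadecimal 0x7F

print(f"2^{e_max} = {2.0 ** e_max:.1e}")   # 2^127 = 1.7e+38
print(f"2^{e_min} = {2.0 ** e_min:.1e}")   # 2^-128 = 2.9e-39
```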
- In the mantissa each bit, starting from the dot, or better: from its highest
non-sign bit, stands for 1 divided by 2 to the power of n, where n is its position in
the mantissa. So the first bit is 1 / 2^1 = 1 / 2, or in decimal 0.5. Each further
bit stands for half of the previous bit, so the next in line is 0.25, the one after
that is 0.125, and so on.
- The exponent is simpler to understand: an 8-bit exponent reaches from zero to 127
(hexadecimal 0x00 to 0x7F) for positive exponents and from -1 to -128 (hexadecimal
0xFF for -1, 0x80 for -128) for negative exponents. A positive exponent n says that the
mantissa has to be shifted n positions to the left, a negative exponent -n that it has
to be shifted n positions to the right. A left shift means multiplying the mantissa by
two, a right shift dividing it by two.
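How the bit weights of mantissa and exponent combine can be sketched as a small decoder. It assumes a hypothetical format that is not taken from the original page: an 8-bit mantissa and an 8-bit exponent, both stored as two's-complement signed bytes, with the mantissa read as a fraction (bit 6 = 0.5, bit 5 = 0.25, ...). Python and the names to_signed and decode are used for illustration only.

```python
def to_signed(raw: int, bits: int) -> int:
    """Interpret an unsigned bit pattern as a two's-complement signed integer."""
    raw &= (1 << bits) - 1
    return raw - (1 << bits) if raw & (1 << (bits - 1)) else raw

def decode(mantissa_raw: int, exponent_raw: int, m_bits: int = 8, e_bits: int = 8) -> float:
    """Value of a hypothetical float made of a signed fraction mantissa and a signed exponent."""
    m = to_signed(mantissa_raw, m_bits)
    e = to_signed(exponent_raw, e_bits)         # 0xFF -> -1, 0x80 -> -128
    fraction = m / (1 << (m_bits - 1))          # highest non-sign bit = 0.5, next = 0.25, ...
    return fraction * 2.0 ** e                  # positive e: shift left, negative e: shift right

print(decode(0x40, 0x02))   # 0.5 shifted two positions left
print(decode(0x40, 0xFF))   # 0.5 shifted one position right
print(decode(0x7F, 0x00))   # largest positive 8-bit mantissa
```

With this assumed layout, 0x7F decodes to 0.9921875 (127/128), and 0xC0 with exponent 0x01 decodes to -1.0.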
The variations that come with the mantissa are very small: an 8-bit mantissa of 0x7F has the
decimal value 0.992 (127/128), only about 0.008 below one, and about 0.008 (1/128) is also the
value of its lowest bit. So with an 8-bit mantissa we can only handle numbers with slightly
more than two decimal digits of precision. That is by far not enough for calculating interest
rates or other commercial stuff, or for engineering; it is only suitable for rather rough
technical measurements. 8-bit mantissas are about as accurate as an ancient slide rule (for
those who are still familiar with that kind of calculating machine).
To increase the precision we add another eight bits to the mantissa. The lowest of the
mantissa's bits now stands for 0.0000305 (2^-15). This increases the precision to slightly
more than four digits. If we add yet another byte to the mantissa, we get slightly more than
six decimal digits, and the complete number already has 32 bits or four bytes.
16-bit mantissas are not precise enough to calculate Mandelbrot sets, but are suitable
for most technical applications.
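The resolution figures for 8-, 16- and 24-bit mantissas follow directly from the number of fraction bits. A Python sketch, assuming one sign bit and no hidden bit, so that an n-bit mantissa keeps n-1 fraction bits; the function name mantissa_precision is hypothetical.

```python
import math

def mantissa_precision(m_bits: int) -> tuple[float, float]:
    """Return (weight of the lowest bit, approximate decimal digits) for a signed mantissa.

    One bit is the sign, so m_bits - 1 bits remain for the fraction.
    """
    fraction_bits = m_bits - 1
    lowest_bit = 2.0 ** -fraction_bits
    decimal_digits = fraction_bits * math.log10(2)   # each bit adds about 0.3 decimal digits
    return lowest_bit, decimal_digits

for bits in (8, 16, 24):
    lowest, digits = mantissa_precision(bits)
    print(f"{bits:2d}-bit mantissa: lowest bit {lowest:.3g}, about {digits:.1f} decimal digits")
```

For 16 bits the lowest bit comes out as 2^-15, about 0.0000305, with roughly 4.5 decimal digits.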
If you need a higher resolution, pick a suitable format from this table.
Because each additional mantissa bit increases the precision by roughly a third of a decimal
digit (log10(2) is about 0.301), the inventors of binary floats gained one more bit with a
trick: because a normalized binary mantissa always starts with a one, this bit can be skipped,
and an additional bit fits into the 16-bit mantissa at the end. Tricks like this increase the
variety of floating point formats and make them more and more complicated to understand: of
course the skipped one-bit on top has to be added again when calculating with the mantissa.
It can replace the mantissa's sign bit, if that sign bit is stored elsewhere.
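The hidden-bit trick can be sketched with plain integers. This Python sketch assumes a hypothetical format: the magnitude of a normalized mantissa has 17 significant bits (so its top bit, bit 16, is always one), the sign is kept separately, and only the lower 16 bits are stored. The IEEE 754 binary formats use the same idea of an implicit leading one for normalized numbers; the names below are illustrative only.

```python
HIDDEN = 1 << 16   # bit 16: the always-set leading one of a normalized 17-bit magnitude

def pack_mantissa(m17: int) -> int:
    """Drop the always-set leading bit; only the lower 16 bits need to be stored."""
    if not HIDDEN <= m17 < 2 * HIDDEN:
        raise ValueError("mantissa magnitude is not normalized")
    return m17 & 0xFFFF

def unpack_mantissa(stored16: int) -> int:
    """Add the skipped leading one again before calculating with the mantissa."""
    return stored16 | HIDDEN

m17 = 0b1_0110_0000_0000_0001      # 17 significant bits, top bit set
stored = pack_mantissa(m17)        # fits into two bytes
print(hex(stored))                 # 0x6001
print(unpack_mantissa(stored) == m17)
```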
Floats do have one advantage: they simplify the multiplication and division of two floats.
If we have to multiply two floats with the mantissas M1 and M2, we simply multiply the two
mantissas and, even simpler, add their two exponents E1 and E2. When dividing, we divide M1
by M2 and subtract E2 from E1. Afterwards the resulting mantissa may need one shift (with a
matching exponent correction) to be normalized again.
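The multiply and divide rules can be sketched in Python with (mantissa, exponent) pairs. This is not the author's AVR method: it borrows Python's math.frexp and math.ldexp, which use the normalization 0.5 <= |mantissa| < 1, and the helper names renormalize, fmul and fdiv are hypothetical. Division by a zero mantissa is not handled.

```python
import math

def renormalize(m: float, e: int) -> tuple[float, int]:
    """Shift the mantissa back into 0.5 <= |m| < 1 and correct the exponent accordingly."""
    frac, shift = math.frexp(m)   # m == frac * 2**shift
    return frac, e + shift

def fmul(a: tuple[float, int], b: tuple[float, int]) -> tuple[float, int]:
    (m1, e1), (m2, e2) = a, b
    return renormalize(m1 * m2, e1 + e2)   # multiply mantissas, add exponents

def fdiv(a: tuple[float, int], b: tuple[float, int]) -> tuple[float, int]:
    (m1, e1), (m2, e2) = a, b
    return renormalize(m1 / m2, e1 - e2)   # divide mantissas, subtract exponents

x = math.frexp(6.0)   # (0.75, 3), because 6 = 0.75 * 2**3
y = math.frexp(2.5)   # (0.625, 2), because 2.5 = 0.625 * 2**2
print(fmul(x, y), math.ldexp(*fmul(x, y)))
print(fdiv(x, y), math.ldexp(*fdiv(x, y)))
```

Here the product of the mantissas, 0.46875, is below 0.5, so one left shift (exponent minus one) renormalizes it.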
This simplification of multiplication comes at the price of a higher effort when adding or
subtracting. Before we can add the two mantissas, we have to bring their exponents to the
same value, by shifting the mantissa of the smaller number to the right (and increasing its
exponent by one per shift). Only when both exponents are equal can we add the two mantissas,
and afterwards the sum may have to be normalized again.
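The alignment step can be sketched with integer mantissas, closer to what an AVR does with its registers. The format is hypothetical: a signed 16-bit mantissa m with value m / 2^15 * 2^e, normalized so that |m| lies in [2^14, 2^15). Python is used for illustration; its >> on negative integers is an arithmetic shift, like the AVR's ASR instruction, and bits shifted out are simply lost. Subtraction would be the addition of a negated mantissa.

```python
import math

FRAC_BITS = 15   # hypothetical format: signed 16-bit mantissa m, value = m / 2**15 * 2**e

def normalize16(m: int, e: int) -> tuple[int, int]:
    """Shift |m| into [2**14, 2**15) and correct the exponent for every shift."""
    while abs(m) >= 1 << FRAC_BITS:                   # carry out of the mantissa: shift right
        m >>= 1
        e += 1
    while m != 0 and abs(m) < 1 << (FRAC_BITS - 1):   # leading zeros after cancellation: shift left
        m <<= 1
        e -= 1
    return m, e

def from_float(x: float) -> tuple[int, int]:
    frac, e = math.frexp(x)                           # x == frac * 2**e with 0.5 <= |frac| < 1
    return normalize16(round(frac * (1 << FRAC_BITS)), e)

def to_float(m: int, e: int) -> float:
    return math.ldexp(m, e - FRAC_BITS)

def fadd(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    (m1, e1), (m2, e2) = a, b
    if e1 < e2:                                       # let a be the operand with the larger exponent
        (m1, e1), (m2, e2) = (m2, e2), (m1, e1)
    m2 >>= e1 - e2                                    # align: shift the smaller number right
    return normalize16(m1 + m2, e1)                   # add mantissas, then normalize again

print(to_float(*fadd(from_float(1.5), from_float(0.25))))
print(to_float(*fadd(from_float(1.0), from_float(2 ** -20))))
```

The second line shows the price of alignment: 2^-20 is shifted completely out of the 16-bit mantissa, so the sum stays 1.0.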
Conversion of binary to decimal number format
To demonstrate that handling binary float numbers is rather extensive, I have shown the
conversion of a 24-bit float with a 16-bit mantissa to decimal in detail. The software for
that has 410 code lines and needs a few milliseconds on an AVR. How this is done is
documented on this page. If you want to learn assembler: this is a more high-level example
with lots of pointers. I hope that you enjoy understanding a more complex task.
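The principle of such a conversion can be sketched in a few lines of Python, but this is not the 410-line AVR routine: it assumes a hypothetical format (signed 16-bit mantissa m, value m / 2^15 * 2^e), uses Python's exact Fraction type for the repeated division and multiplication by ten, and truncates the digits instead of rounding them. The name to_decimal is illustrative.

```python
from fractions import Fraction

FRAC_BITS = 15   # hypothetical format: value = m / 2**15 * 2**e, m a signed 16-bit mantissa

def to_decimal(m: int, e: int, digits: int = 5) -> str:
    """Format m / 2**FRAC_BITS * 2**e as d.dddE+xx (digits are truncated, not rounded)."""
    value = Fraction(m, 1 << FRAC_BITS) * Fraction(2) ** e   # exact, no rounding errors
    if value == 0:
        return "0"
    sign = "-" if value < 0 else ""
    value = abs(value)
    exp10 = 0
    while value >= 10:          # normalize decimally: divide by ten ...
        value /= 10
        exp10 += 1
    while value < 1:            # ... or multiply by ten
        value *= 10
        exp10 -= 1
    out = ""
    for _ in range(digits):     # peel off one decimal digit per step
        d = int(value)
        out += str(d)
        value = (value - d) * 10
    return f"{sign}{out[0]}.{out[1:]}E{exp10:+03d}"

print(to_decimal(28672, 1))     # 1.75 in this format
print(to_decimal(26214, -3))    # the closest this format gets to 0.1
```

The second call prints 9.9998E-02, a reminder that most decimal fractions have no exact binary representation.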
Those who are clever and do not need numbers up to 10^38 (or even larger) avoid floats and
rather use integers or fixed-point numbers (pseudo-floats). Those are by far simpler to
handle, easier to understand, and their precision is much easier to adjust to the practical
needs at hand.
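To show how simple such pseudo-floats can be, here is a Python sketch of a hypothetical signed 8.8 fixed-point format: one byte for the integer part and one byte for the fraction, in steps of 1/256. Addition is a plain integer addition, and multiplication needs only one shift to remove the doubled scale. The sketch ignores the 16-bit overflow an AVR register pair would run into, because Python integers are unbounded; the names are illustrative.

```python
FIX_FRAC_BITS = 8   # hypothetical 8.8 format: high byte = integer part, low byte = 1/256 steps

def fixed_from_float(x: float) -> int:
    return round(x * (1 << FIX_FRAC_BITS))

def fixed_to_float(raw: int) -> float:
    return raw / (1 << FIX_FRAC_BITS)

def fixed_add(a: int, b: int) -> int:
    return a + b   # same scale on both sides: no exponent alignment needed

def fixed_mul(a: int, b: int) -> int:
    # the product carries the scale twice, so shift right once;
    # adding half of the lowest bit first rounds to nearest (ties upward)
    return (a * b + (1 << (FIX_FRAC_BITS - 1))) >> FIX_FRAC_BITS

a, b = fixed_from_float(1.5), fixed_from_float(2.25)
print(fixed_to_float(fixed_add(a, b)), fixed_to_float(fixed_mul(a, b)))
```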
©2021 by http://www.avr-asm-tutorial.net