Path: Home => AVR-Overview => Binary calculations => 64bit binaries    (Diese Seite in Deutsch: Flag DE) Logo

Binary calculations with 64 bit unsigned integers in AVR assembler

You need to handle large numbers, e.g. unsigned integers? This page demonstrates the handling of 64-bit unsigned integer values with an 8-bit AVR.

  1. The structure of 64 bit numbers in an 8 bit controller
  2. Converting decimal strings to binary
  3. Binary adding of two large numbers
  4. Subtracting a large number from another large number
  5. Multiplication of a large number by an 8 bit binary
  6. Listing decimals in binary
  7. Converting large numbers to a decimal string
  8. Counting long times with 64-bit counter
  9. Conclusions

1 The structure of 64 bit numbers in an 8 bit controller

An 8 bit controller can handle 8 bit numbers only? No, it can handle nearly any length. As long there is enough memory space for storing and handling.

16 bits require two eight bit registers, e.g. the registers R0 and R1. In assembler you have the freedom to place those two bytes to wherever you want and in any row. Whether the most significant byte (MSB) of the two is in R0 or in R1 is up to the assembler programmer. By simple convention we write this either as R1:R0 (if the MSB is in R1) or as R0:R1 (if the MSB is in R0).

Extending the 16 bits to 64 bits is easy: just add more registers. The 64 bits are e.g. stored like this: R7:R6:R5:R4:R3:R2:R1:R0. You can also use R3:R7:R6:R0:R2:R1:R4:R3:R5, if you can remember the row. It is your own choice to make life a little more complicated.

To write understandable code I named R7:R6:R5:R4:R3:R2:R1:R0 as N1 and abbreviate this as R7:...:R0. If you need a second one of the same 64 bit size: R15:R14:R13:R12:R11:R10:R9:R8 would be a good choice. N2 or the abbreviation R14:...:R8 are simpler to remember.

As the AVRs have plenty registers available (not so in other controllers such as PICs or in the 3GHz monsters in the PC), it is not necessary to store those numbers in SRAM. If you run out of space in registers you can as well copy those to the SRAM of the controller. If your desired location in SRAM is sN1 (define this within the .DSEG data segment), you can perform that with the following code:

  sts sN1,R0 ; Store register R0 on location sN1
  sts sN1+1,R1
  sts sN1+2,R2
  sts sN1+3,R3
  sts sN1+4,R4
  sts sN1+5,R5
  sts sN1+6,R6
  sts sN1+7,R7

If you want to copy this 64 bit number to a different location, e.g. sN2, you'll have to copy all that and replace sN1 by sN2.

As copying always involves 8 bytesin a row, you can choose another way of doing that: by the use of pointers. Here we use the pointer registers Pointing X to the register R0 is simple: just set the address to zero. E.g. with the instructions clr XH and clr XL. With the instruction ld R16,X+ the content of register R0 is temporarily copied to register R16, with st Z+,R16 its content is written to the address in SRAM that Z points to. The "+" in X+ and Z+ means autoincrement the address: following load and store the addresses in X and Z already point to the next register resp. SRAM location. If X reaches 8, the copy action is over, otherwise additional runs are done.

The source code here provides a routine called CopyN1ToSramN1: that copies the content of the registers in N1 to a location sN1 in SRAM:

; Copy N1 to SRAM location
  ldi ZH,High(sN1) ; Point Z to Sram N1, MSB
  ldi ZL,Low(sN1) ; dto., LSB
  clr XH ; Point X to N1, MSB
  clr XL ; dto., LSB
  ld rmp,X+ ; Read register
  st Z+,rmp ; Write to SRAM
  cpi XL,8 ; All 8 bytes copied?
  brne CopyN1ToSram1

That routine is fast: at 1 MHh clock rate it consumes 68 µs time (including an RCALL and RET instruction. And it can be used for any SRAM location: just point Z to another location and call CopyN1ToSram with that.

Where did I get the 68 µs from? Now, This was a short introduction to simulation. If you want to know more, consult the handbook of avr_sim.

2 Converting decimal strings to binary

Such large numbers cannot be defined with an .EQU directive because the assembler uses only 32-bit integers internally. How can we start with a number of e.g. 123.456.789.012.345.678?

Now, the answer is simple: define a string in the source code

.db "123,456,789,012,345,678",0 ; A very large first number
This places a string of ASCII characters into flash memory at location ANumber1:, starting with a '1' as LSB and '2' as MSB, and so on. The number string ends with a zero byte, so our number conversion routine Dec2BinN1: recognizes the end of the number string. If the number of characters on the .db line is odd, just add another 0 at the end (or start the number with a blank). This routine also accepts commas, dots or spaces as thousands separators and ignores those ASCII characters.

Number conversion starts with clearing N1 (R7:...:R0). Pointer register Z points to the first character at the doubled address of ANumber1:. Doubled because the LPM instruction has to be able to read the LSB and the MSB at that 16-bit address. The first ASCII character is read, which is the '1' or hexadecimal 0x31. Only ASCII numbers between '0' and '9' and separator characters are correct, other characters lead to a return with the carry flag set, to signal an error to the calling routine.

The '1' is then converted to binary by subtracting 0x30 ('0') from it. Now, the previous N1 has to be multiplied by 10 to add the next digit to it. As N1 was zero, nothing will change with multiplicating by 10. If the next following numbers are read, the multiplication by 10 goes as follows:
  1. copy the eight bytes of N1 to another location (either to R15:...:R8 or to the SRAM),
  2. multiply N1 by two (left shifting R0, then rotate left R1 - with the carry bit going into bit 0 of the next byte -, then R2, and so on until R7),
  3. if the last rotate ends with a carry flag of one, an error has occurred (overflow error, number is too large to fit into 64 bits),
  4. then again multiply by two, so that N1 is multiplied by four now (check again for an overflow),
  5. now add the prevously stored number, by using the instruction add R0,R8 and adc R1,R9 and so on to account for any carry flags from the lower byte addition,
  6. if after the last adc R7,R15 the carry is set, signal an overflow error,
  7. then multiply the five-fold of the previous number by two again and yield the result of the multiplication by 10.
Now we can add the incoming new binary to N1: just add R0,R16. To account for a carry, we load R16 with 0 (with LDI, not with CLR because that would clear the carry flag, too) and add this zero to all other registers too by using adc Rn,R16.

If the next ASCII character is 0x00, the number is on its end and Dec2BinN1 can return with a cleared carry flag (no error condition).

To simulate the conversion of 123.456.789.012.345.678 to a 64 bit binary just single-step through the code from the beginning. Note that the conversion of that number consumes 2,235 Milliseconds or needs 2,235 instructions to be executed.

Later on in the source code a smaller number is converted. The time consumed is nearly proportional to the number of digits to be converted, because the multiplication by 10 is the most time consuming step.

All very fast for a controller handling 8 bits only.

3 Binary adding of two large numbers

That is a simple task: just add the LSB by add R0,R8, then adc R1,R9 to add the carry also and continue until adc R7,R15. If the last ADC ends with the carry flag set, an overflow error occurred.

Adding takes only 16 µs and is very fast.

The source code then copies the result to location sN3, so that the binaries N1 and N2 and the result N3 are all listed in SRAM and can be viewed there when simulating. Place a breakpoint there behind the rcall to stop at this point.

4 Subtracting a large number from another large number

This is also simple: just use sub R0,R8 and sbc R1,R9 etc. to care for a carry. In that case an underflow occurs if N2 is larger than N1, as signalled by the carry flag being set.

Subtraction is performed in very fast 16 µs.

The result of the subtraction is also written to sN3 in the SRAM.

5 Multiplication of a large number by an 8 bit binary

A little more sophisticated is multiplication. The first steps are to copy N1 to N2 and to clear N1.

Then bit by bit is right-shifted out of R16 to the carry flag. If it is a one, the number N2 is added to N1. Error checking is done after the last ADC.

Then it is checked if R16 is already zero (no more to multiplicate). If that is the case, the subroutine resturns with a clear carry flag.

If not, N2 is multiplied by two. Error checking is done after the last ROL, signalling an overflow.

Multiplication with an 8-bit binary of 255 (all bits to be added) needs very fast 205 µs. A hardware multiplicator would not be very much faster than this software multiplication.

The result is also copied to sN3.

6 Listing decimals in binary

A convenient tool follows in routine ListDecimals:: it calculates all binaries for decimals of 1, 100, 1000, etc., until it does not fit into 64 bit any more, and lists those in SRAM. The last decimal cannot be calculated using normal calculators because those use signed 64 bit integers and interpret the 64th bit as sign.

Calculation and listing needs 2.991 ms.

7 Converting large numbers to a decimal string

Now the master's discipline: converting 64 bit binaries back to ASCII strings. Leading zeroes are blanked and thousands separators added, so that 26 digits have to be handled.

This involves a DecimalTable:, as was generated from the above listing. It starts with the binary of 1019, the largest one to be handled in 64 bits, and goes down to 101. After that the table ends with a 0xFF to signal the end.

The math of conversion of the 64 bit binary in N1 goes like this:
  1. the 64 bit binary representation of the decimal is read from the flash table to N2, if the first byte is 0xFF the end of conversion has been reached,
  2. it is subtracted from N1 until an underflow occurs,
  3. the number of successful subtractions is converted to an ASCII digit subi R16,-'0' adds 0x30 to the digit,
  4. then it is checked if the T flag is set, which signals blanking of leading zeroes, if not set the characters is written directly to the SRAM buffer, if set it is checked whether the digit is still a zero, if yes, a blank is issued, if not, the T flag is cleared and the number is written to SRAM,
  5. a counter is decreased (initial from two downwards, later from three), if it reaches zero, a thousands separator is added.
The last digit is converted to ASCII by adding 0x30 to it and by writing the resulting character to the SRAM.

If you need the number left-adjusted just copy the X pointer when the T flag is cleared. On the beginning set this pointer to the last digit in SRAM to make sure that a single digit number is covered.

Conversion to an ASCII string requires 2.682 ms for the larger of the two numbers and 2.052 ms for the smaller one.

8 Counting long times with a 64-bit counter

Of course, if you need extremely long times, a 64-bit-wide counter is the correct solution. You can count any time between some tens of microseconds and up to several hundreds of years with that.

100 Years are
t [µs] = 100 [Years] * 365,25 [Days/Year] * 24 [Hours/Day] * 60 [Minutes/Day] * 60 [Seconds/Minute] * 1.000.000 [µs/second] = 9,467,280,000,000,000
Loop counter = t / 3 - t/3 DIV 256 - t/3 DIV 2562 - ... - t/3 DIV 2565 = 1,047,794,823,776,258 = 0x0003.B8F6.BE44.C802
and is by the 5,555-fold below the maximum number to be handled in an eight byte wide loop counter. If only the battery is enough longlived to work for 100 years!

See here for more on long counting.

9 Conclusions

Handling of 64 bit unsigned integers is a rather simple task over all. It involves some add/subtract with or without carry, some shift/rotate instructions and lots of error detection. Not at all operations that are beyond the intellect of a Java or C programmer, just something he does not care about when he declares variables.

Assembler provides fast (in the ms range) and efficient code to do all those operations. No need to involve lengthy and space consuming libraries or to change to a 16, 32, 48 or 64 bit controller.

One reason for that efficient code is that the AVRs have plenty of registers and a very efficient instruction set. Without that, the routines would be much longer and would consume very much longer times to execute.

To the top of that page

(©)2019 by