
Why AVRs are still the best controllers

Now, this is all nonsense. I started in 1999 with AVRs and have used them ever since for many different purposes. In all that time I never hit a limit with respect to storage space, speed (instructions per second), bit width, etc. But: I did not try to design a mobile phone, a wireless internet repeater, a navigation device or a PC. That would be a waste of time, because you can buy those things for little money; there is no need for efforts of your own here.

AVR's best design idea: Plenty of registers

Why should you use AVRs? Because they have 32 registers. That is more than most other 8-bit controllers provide, a real distinguishing feature.

Why is that so important? Because the content of all 32 registers can be fed directly into the central processing unit (CPU). With only one machine-code instruction, e. g. add R0,R1, two 8-bit numbers can be added. With two of those instructions, add R0,R2 and adc R1,R3, two 16-bit numbers in the register pairs R1:R0 and R3:R2 are added. This runs as fast as possible and needs only two controller clock cycles.
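The two-instruction addition can be sketched as a complete snippet. The values 0x00FF and 0x1234 and the use of R16 and R17 as helpers for loading are my assumptions, chosen just for illustration:

```asm
  ; Load the first 16-bit number 0x00FF into R1:R0
  ; (ldi only works with R16 and above, so load via R17:R16)
  ldi R16,Low(0x00FF)
  ldi R17,High(0x00FF)
  mov R0,R16
  mov R1,R17
  ; Load the second 16-bit number 0x1234 into R3:R2
  ldi R16,Low(0x1234)
  ldi R17,High(0x1234)
  mov R2,R16
  mov R3,R17
  ; The addition itself: two instructions, two clock cycles
  add R0,R2 ; Add the LSBs, overflow goes to the carry flag
  adc R1,R3 ; Add the MSBs plus the carry flag
  ; R1:R0 now holds 0x1333
```

Only the last two instructions belong to the addition; the loading is there to make the example complete.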

Try to do that with a PIC micro: those have only one single working register, and you repeatedly have to read from and write to the SRAM to get the same done. The two AVR instructions blow up to several more in a PIC, just because it lacks registers.

So a simple comparison of the clock rates of AVRs and PICs is misleading by a factor of three to four. If each AVR instruction translates to four in a PIC, the AVR runs accordingly faster.

In most cases, and in simple projects and tasks, one does not need more storage space than the 32 registers. Therefore you don't need the slower SRAM in AVRs (reading from and writing to the SRAM requires two clock cycles instead of one). Even the fact that several instruction types only work with registers R16 and above (e. g. ldi Register,Constant) is not really a disadvantage: you still have 16 registers left that can do it all, and that is still much more than in most other controllers. With a little planning you can avoid having to use two instructions (ldi to an upper register, then mov to a lower one) for that.
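What the restriction of ldi means in practice can be sketched like this (the register numbers and the constant 100 are just examples of mine):

```asm
  ; Constant into an upper register: one instruction
  ldi R16,100 ; Works, R16 is within R16..R31
  ; Constant into a lower register: ldi R0,100 is rejected
  ; by the assembler, so two instructions are needed
  ldi R17,100 ; Load via an upper register ...
  mov R0,R17 ; ... and copy to the lower register
  ; Zero is an exception: clr works with all 32 registers
  clr R1 ; R1 = 0
```

The planning therefore is simple: keep everything that is loaded with constants in R16 and above, and use R0 to R15 for values that are only copied, calculated or read from memory.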

Those who leave register allocation to their compiler, because they believe that a C compiler does it better, are relieved from that task, but what the compiler selects is not always the optimal solution.

(Nearly) Double speed by pre-fetch

Another property that makes AVRs twice as fast as PICs: while an instruction is executed, the next instruction is already read from the program's flash memory (pre-fetch). If you are lucky, execution lasts exactly one clock cycle: this is the case if the executed instruction is not a branching or jumping instruction or, in case of conditional branching, if the condition is not true. Then the execution of the next instruction follows immediately. If that is not the case, the flash read has to be repeated and the execution lasts two cycles (or more, e. g. for calls and returns).
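A small delay loop shows the effect in numbers. This is only a sketch: the loop count of 10 is an arbitrary choice of mine, and the cycle counts in the comments are those of the classic AVR cores.

```asm
  ldi R16,10 ; 1 cycle
Delay:
  dec R16 ; 1 cycle, next instruction is already fetched
  brne Delay ; 2 cycles if the branch is taken (pre-fetched
             ; instruction is discarded), 1 cycle if not
  ; Total: 1 + 10*1 + 9*2 + 1*1 = 30 cycles
```

The branch is taken nine times and falls through once, so only the nine taken branches pay the extra cycle.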

To see how often jumps or branches occur in source code, I have counted all 22,000 instructions in my self-written AVR source code and found that 28.11% of those were jump, call or branch instructions. If we assume that the count of instructions is a linear function of how often they are executed (which is not a perfect match), pre-fetch makes AVRs 1.72-fold faster than they would be without it. That is not double, but very near to that.
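How the 1.72 comes about is not spelled out here. My reading, and this is an assumption, is that the 71.89% non-branching instructions are counted as running at double speed and the remaining 28.11% at single speed:

```latex
\[
S_{\text{count}} = (1 - 0.2811)\cdot 2 + 0.2811\cdot 1 = 1.7189 \approx 1.72
\]
```

If one instead compares clock cycles (two per instruction without pre-fetch, one or two with it), the same 28.11% yields a somewhat lower factor:

```latex
\[
S_{\text{cycles}} = \frac{2}{(1 - 0.2811)\cdot 1 + 0.2811\cdot 2} = \frac{2}{1.2811} \approx 1.56
\]
```

Both figures rest on the same simplification, namely that every jump, call or branch counted in the source is actually taken and costs exactly two cycles.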

With the above factor of four due to registers and this 1.72, an AVR at 20 MHz runs at the same speed as a PIC at 137 MHz (do not try that at home, the PIC won't work at that speed).

More on pre-fetch and instruction execution in AVRs can be found here.

Optimal access to registers, ports and SRAM

Another speed-increasing factor is the much larger number of addressing modes that AVRs offer. Other controllers do not have that broad a variety. That makes code much more efficient in AVRs.

The reason behind that is the 16 bits of instruction coding, while other controllers have fewer (baseline PICs: 12 bits). These extra bits allow more variations, e. g. for addressing memory accesses.

The following address modes are implemented in AVRs:

| Address mode | Instruction words | Variations | Code examples | Available in PICs? |
|---|---|---|---|---|
| Direct | Two: instruction word followed by address as second word | Read, Write | LDS Register,Address; STS Address,Register | |
| Pointer register pair | One (instruction word) | Pointer register pairs X, Y, Z; Read, Write | LD Register,X/Y/Z; ST X/Y/Z,Register | Yes, but only one pointer |
| Pointer register pair with post-auto-increment | One (instruction word) | Pointer register pairs X, Y, Z; Read, Write | LD Register,X/Y/Z+; ST X/Y/Z+,Register | |
| Pointer register pair with pre-auto-decrement | One (instruction word) | Pointer register pairs X, Y, Z; Read, Write | LD Register,-X/Y/Z; ST -X/Y/Z,Register | |
| Pointer register pair with positive displacement | One (instruction word) | Pointer register pairs Y, Z; Read, Write | LDD Register,Y/Z+D; STD Y/Z+D,Register | |

From these broad opportunities anyone can select their favourite mode. You do not find many of these in other controller types.
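As a sketch of what two pointers with post-increment make possible, here is a block copy within the SRAM. The device (ATtiny13), the block length of 8 bytes and the location of the two blocks are assumptions of mine for illustration:

```asm
.include "tn13def.inc" ; Assumption: ATtiny13, defines SRAM_START, XL, XH etc.
.equ cLen = 8 ; Hypothetical block length in bytes
  ldi XH,High(SRAM_START) ; X points to the source block
  ldi XL,Low(SRAM_START)
  ldi YH,High(SRAM_START+cLen) ; Y points to the destination block
  ldi YL,Low(SRAM_START+cLen)
  ldi R17,cLen ; Byte counter
CopyLoop:
  ld R16,X+ ; Read source byte, then increment X
  st Y+,R16 ; Write destination byte, then increment Y
  dec R17 ; Count down
  brne CopyLoop ; Repeat until all bytes are copied
```

With only one pointer available, the pointer would have to be reloaded twice for every single byte.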

Addressing registers

Addressing does not only work with the built-in SRAM. The 32 registers can be accessed by these methods, too: they are located at addresses 0x0000 to 0x001F. Reading from those addresses yields the register content, writing to them changes the register content. This can be programmed in assembler like this:

  ; Register-Address as a constant
  .equ Address = 3 ; Address 3 = register R3
  ; Read from address, increment and write to address
  lds R16,Address ; Read from address (copy R3 to R16)
  inc R16 ; Increment by one
  sts Address,R16 ; And write back to register R3

With direct addressing the address is hard-coded: it follows the instruction word and cannot be changed at run time.

More flexible is to use a pointer register pair. In the following I use the X pair (R27:R26) for that.

  ; Register address as a constant
  .equ Address = 3 ; Address 3 = register R3
  ; Load pointer X with the address
  ldi XH,High(Address) ; MSB of address into the X pointer
  ldi XL,Low(Address) ; LSB of address into the X pointer
  ; Read from pointer address, increment and write back
  ld R16,X ; Register R16 from address (copy R3 to R16)
  inc R16 ; Increase by one
  st X,R16 ; And write back to register R3

If the pointer address in the register pair X is increased, e. g. with the post-increment instruction, or decreased, e. g. with the pre-decrement instruction, the same can be executed with the next or previous register.

As an example for that, here is a 24-bit counter in the registers R2:R1:R0:

  ; Point Z to R0
  clr ZH ; MSB of pointer Z clear
  clr ZL ; LSB of pointer Z clear
CountLoop:
  ld R16,Z ; Read register content
  inc R16 ; Increase counter
  st Z+,R16 ; Store increased value and increment pointer
  breq CountLoop ; Repeat with next higher byte, if zero
  ; Ready counting up

So whenever this is executed, it counts the value in R2:R1:R0 up by one. As the counter loop keeps running when R2:R1:R0 overflows from its maximum of 0xFFFFFF, the next bytes (in R3, R4, ...) would also count up if you do not include a check to avoid that. Without such a check the routine carries on through all registers up to R15, so it is in fact a 16-by-8 or 128-bit wide counter.
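One possible way to include such a check is sketched here: the loop ends as soon as the pointer has passed R2. The labels and the choice of comparing ZL with 3 are mine, other solutions are just as valid.

```asm
  ; Point Z to R0
  clr ZH ; MSB of pointer Z clear
  clr ZL ; LSB of pointer Z clear
CountLoop:
  ld R16,Z ; Read register content
  inc R16 ; Increase counter byte, Z flag set if overflow to zero
  st Z+,R16 ; Store (st does not change the flags) and increment pointer
  brne CountDone ; Byte not zero: no carry to the next byte, ready
  cpi ZL,3 ; Pointer already behind R2?
  brne CountLoop ; No, repeat with the next higher byte
CountDone:
  ; R2:R1:R0 is increased by one, R3 is never touched
```

On overflow from 0xFFFFFF the counter simply restarts at 0x000000.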

Addressing port registers

The same methods with pointers can be applied to the port registers. The 64 port registers have addresses between 0x0000 and 0x003F. Here is an excerpt of the port register list of an ATtiny13:

Port registers in an ATtiny13
The port registers of the external pins of port B are located at the addresses 0x0016 (Read or input port), 0x0017 (Direction port) and 0x0018 (Output or pull-up resistor port). To set all port pins as outputs that drive a high signal, you can use this:

  ldi R16,0x1F ; All physically available bits to one
  out PORTB,R16 ; and to the output port
  out DDRB,R16 ; as well as to the direction port

But you can also access the port registers PORTB and DDRB by using the fact that those ports have an offset address: all ports between 0x0000 and 0x003F appear at the addresses 0x0020 to 0x005F, with an offset of 32 or 0x20 (just because the registers are already in that space).

  ldi R16,0x1F ; All physically available bits to one
  sts PORTB+32,R16 ; and to the output port PORTB
  sts DDRB+32,R16 ; and to the direction port DDRB

This is absolutely equivalent to the above OUT formulation and does the same, but needs one additional clock cycle because STS is a two-word instruction. So this is not the standard formulation.
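With the offset address a pointer can be used for the port registers as well. The following is a sketch under the assumption of an ATtiny13, where PINB, DDRB and PORTB are neighbours at 0x0016 to 0x0018; R17 as the target of the read is my choice:

```asm
.include "tn13def.inc" ; Assumption: ATtiny13
  ldi ZH,High(DDRB+32) ; Z points to DDRB in the address space
  ldi ZL,Low(DDRB+32) ; (0x0017 + 0x20 = 0x0037)
  ldi R16,0x1F ; All physically available bits to one
  st Z+,R16 ; Write to DDRB, Z now points to PORTB (0x0038)
  st Z,R16 ; Write to PORTB
  sbiw ZL,2 ; Two back: Z points to PINB (0x0036)
  ld R17,Z ; Read the input port PINB
```

This is slower than OUT and IN, but the address is now a variable: the same code can serve any port, depending on what Z was loaded with.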

But it is the standard formulation for port registers that are located beyond the 0x003F/0x005F address space. If your AVR device has lots of internal hardware, the port register space is completely booked out and some registers have to be placed beyond the standard address space. Those port registers cannot be accessed via the OUT instruction, you'll have to use the STS instruction instead.

As an example, here is an excerpt from the ATmega48's list:

Port registers ATmega48

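Here, not the port register addresses are given (those would exceed 0x003F) but their offset addresses. Therefore the status register SREG, usually at port register address 0x003F, is listed here at offset address 0x005F. Which of the two addresses a name such as "SREG" stands for depends on the definition file that you include. If the names are associated with their offset address, you can use STS SREG,R16 to write the content of R16 to the SREG port register, instead of STS SREG+32,R16, and for port registers below an offset address of 0x0060 OUT can still be used if 32 is subtracted from that offset address: OUT SREG-32,R16. Atmel's own m48def.inc, however, defines the names below 0x0060 with their port register address (SREG = 0x3F), so there OUT SREG,R16 and STS SREG+32,R16 remain the correct forms; only the names beyond 0x005F carry their offset address.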
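For a port register beyond 0x005F the access can be sketched as follows. I assume Atmel's m48def.inc here, in which TCCR1B (0x0081) and TCNT1L/TCNT1H (0x0084/0x0085) are defined with their address in the memory space; the registers R16 to R18 are my choice.

```asm
.include "m48def.inc" ; Assumption: ATmega48 with Atmel's definition file
  ldi R16,1<<CS10 ; Timer 1 runs with the controller clock, no prescaler
  sts TCCR1B,R16 ; TCCR1B is at 0x0081, OUT cannot reach it
  lds R17,TCNT1L ; Read counter LSB first (this latches the MSB)
  lds R18,TCNT1H ; Then read the latched MSB
```

No offset has to be added or subtracted here, the name already is the address that STS and LDS need.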

Note that in many cases the SRAM starts at address 0x0100 because so many port registers are needed. The constant SRAM_START in the respective definition file reflects this.

Addressing multiple data structures with displacements

A very special sort of addressing, implemented in nearly all AVRs, is to add a positive displacement D to a base address pointer temporarily. The instruction LDD Register,Y+D adds D to the address in register pair Y and reads from that cell (register, port register or SRAM). The pointer itself remains unchanged.

This addressing can be used if several similar data sets have to be accessed. As an example I'll take a data set that consists of a measured 10-bit ADC value, which is converted to a 16-bit binary voltage and then to a 7-character wide string for display on an LCD. The device shall measure four channels and display the results on four lines of the LCD. The data set structure is as follows:

Four channel data structure

Of course you can place the ADC results of all four channels in a row in SRAM, but as each result has to be multiplied by the same constant and the resulting binary has to be converted to an ASCII string in the same manner, placing those data bytes into a sorted channel data set, onto which Y or Z can point, has immense advantages: just change the base address and do the same operations on the data set. Many tasks involving addresses are simplified by that data set structure: when the next result comes in, just add 11 to Y (adiw YL,11) and all the rest of the operations goes the same way.
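A minimal sketch of such a data set access follows. The order of the bytes within the 11-byte data set (ADC value, voltage, text), all names starting with d and c, the label Channels and the device are assumptions of mine; the conversion itself is left out and replaced by a plain copy.

```asm
.include "m48def.inc" ; Assumption: ATmega48, defines SRAM_START, YL, YH
; Displacements within one channel data set (assumed layout)
.equ dAdcL = 0 ; ADC result, LSB
.equ dAdcH = 1 ; ADC result, MSB
.equ dVoltL = 2 ; Voltage, LSB
.equ dVoltH = 3 ; Voltage, MSB
.equ dText = 4 ; First of the 7 display characters
.equ cSetSize = 11 ; Bytes per data set: 2 + 2 + 7
.equ cChannels = 4 ; Number of channels
.dseg
.org SRAM_START
Channels:
  .byte cChannels*cSetSize ; 44 bytes for the four data sets
.cseg
  ldi YH,High(Channels) ; Y points to the data set of channel 1
  ldi YL,Low(Channels)
  ldi R18,cChannels ; Channel counter
NextChannel:
  ldd R16,Y+dAdcL ; Read the ADC result of this channel
  ldd R17,Y+dAdcH
  ; Placeholder for the conversion: the value is stored unchanged
  std Y+dVoltL,R16 ; Write the voltage of this channel
  std Y+dVoltH,R17
  adiw YL,cSetSize ; Y to the next data set
  dec R18 ; Count channels down
  brne NextChannel ; Same operations for the next channel
```

The loop body never needs to know which channel it is working on; only the base address in Y changes.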

Try this kind of data structure with a controller that cannot handle displacements to pointers: you'll see that it is inefficient, not that easy to debug and ends in a lot more code.


The large number of registers, the pre-fetch mode and the large variety of addressing modes alone make AVRs still state of the art, even though they are 20 years old. PICs, and several more modern controllers, do not reach this design level.

Just forget the assumption that newer is always better. Sometimes it is the other way around.

AVRs are still the best you can get.


©2019 by