Why AVRs are still the best controllers
- Why should I start with AVRs? They are older than two decades; aren't there more modern and better controllers?
- Why should I start with 8 bits, which is so small, while 16, 32 and 64 bits are already available?
- Those who know nothing about controllers have it easy: they select what they were told is best, by whomever and on whatever basis. Good to have someone to assist.
- Or are sales figures the criterion? Does what sells better have to be better?
Now, this is all nonsense. I started in 1999 with AVRs and have used them ever since for many different purposes. In that time I never hit limits with respect to storage space, speed (instructions per second), bits, etc. But: I did not try to design a mobile phone, a cable-free internet repeater, a navigator or a PC. That would be a waste of time, because you can buy those things for little money; no need for your own efforts.
AVR's best design idea: plenty of registers
Why should you use AVRs? Because they have 32 registers. That is more than any other controller provides, a so-called unique selling point.
Why is that so important? Because the contents of the 32 registers can be fed directly into the central processing unit (CPU). With a single machine-code instruction, e. g. add R0,R1, two
8-bit numbers can be added. With two of those instructions,
add R0,R2 and adc R1,R3, the two 16-bit numbers in the
register pairs R1:R0 and R3:R2 are added. This runs as fast as possible
and needs only two controller clock cycles.
Try to do that with a PIC micro: those have only a single working register, so you repeatedly have to read from and write to the SRAM to solve the same task. The two AVR instructions are blown up to several more in a PIC, just because it lacks registers.
So a simple comparison of the clock rates of AVRs and PICs is misleading by a factor of three to four. If each AVR instruction translates to four in a PIC, the AVR runs correspondingly faster.
In most cases, and in simple projects and tasks, one does not need
more storage space than the 32 registers. Therefore you don't need
the lame SRAM in AVRs (reading and writing from/to the SRAM requires
two clock cycles instead of one). Even the fact that several instruction
types work only with registers R16 and above (e. g.
ldi Register,Constant only works with R16 and above) is not
really a disadvantage: you still have 16 registers left that can do it
all, and that is still far more than in any other controller. With a
little planning you can avoid having to use two instructions for that.
Those who leave register allocation to their compiler, because they
believe that a high-level C compiler does it better, are relieved of
that task, but what the compiler selects is not always the optimal choice.
(Nearly) Double speed by pre-fetch
Another property makes AVRs nearly twice as fast as PICs: while one
instruction is executed, the next instruction is already read from
the program's flash memory (pre-fetch). If you are lucky, execution
lasts exactly one clock cycle: if the executed instruction is not
a branch or jump instruction, or, in the case of a conditional branch,
if the condition is not met, the execution of the next instruction
follows immediately. If that is not the case, the flash read has to be
repeated and the execution lasts two cycles.
To see how often jumps or branches occur in source code, I have
counted all 22,000 instructions in my self-written AVR source code
and found that 28.11% of those were jump, call or branch
instructions. If we assume that the count of instructions is a
linear function of their execution frequency (which is not a perfect
match), pre-fetch makes AVRs 1.72 times faster than they would be
without it. That is not double, but very near to it.
With the above factor of four due to registers and this 1.72, an
AVR at 20 MHz runs at the same speed as a PIC at 137 MHz
(do not try that at home, the PIC won't work at that speed).
More on pre-fetch and instruction execution in AVRs can be found here.
Optimal access to registers, ports and SRAM
Another speed-increasing factor is the much larger set of addressing modes that AVRs offer. Other controllers do not have that broad a variety. That makes code in AVRs much more efficient.
The reason behind that are the 16 bits of instruction coding, while
other controllers have much less (PICs: 12 bits). These extra bits
allow more variations, e. g. for addressing memory accesses.
The following address modes are implemented in AVRs:
|Address mode|Instruction words|Variations|Code examples|Available in PICs?|
|Direct|Two: instruction word, followed by the address as second word|Read, write|LDS Register,Address / STS Address,Register|Yes|
|Pointer register pair|One (instruction word)|Pointer register pairs X, Y, Z|LD Register,X / ST X,Register|Yes, but only one pointer|
|Pointer register pair with post-increment|One (instruction word)|Pointer register pairs X, Y, Z|LD Register,X+ / ST X+,Register|No|
|Pointer register pair with pre-decrement|One (instruction word)|Pointer register pairs X, Y, Z|LD Register,-X / ST -X,Register|No|
|Pointer register pair with positive displacement|One (instruction word)|Pointer register pairs Y, Z|LDD Register,Y+Disp / STD Y+Disp,Register|No|
From this broad set of opportunities anyone can select their favourite mode.
You do not find much of this in other controller types.
Addressing not only works with built-in SRAM. The 32 registers can be
accessed by these methods, too: they are located at addresses 0x0000
to 0x001F. Reading from those addresses yields the register content,
writing to those changes the register content. This can be programmed
in assembler like this:
By using direct addressing, the address is hard-coded: it follows
the instruction word and is not subject to change.
; Register-Address as a constant
.equ Address = 3 ; Address 3 = register R3
; Read from address, increment and write to address
lds R16,Address ; Read from address (copy R3 to R16)
inc R16 ; Increment by one
sts Address,R16 ; And write back to register R3
More flexible is the use of a pointer register pair. In the following
I use the X pair (R27:R26) for that.
If the pointer address in the register pair X is increased,
e. g. with the post-increment instruction, or decreased,
e. g. with the pre-decrement instruction, the same operation can
be executed on the next or the previous register.
; Register address as a constant
.equ Address = 3 ; Address 3 = register R3
; Load pointer X with the address
ldi XH,High(Address) ; MSB of address into the X pointer
ldi XL,Low(Address) ; LSB of address into the X pointer
; Read from pointer address, increment and write back
ld R16,X ; Register R16 from address (copy R3 to R16)
inc R16 ; Increase by one
st X,R16 ; And write back to register R3
As an example for that, here is a 24-bit counter in the registers R2:R1:R0:
; Point Z to R0
clr ZH ; MSB of pointer Z clear
clr ZL ; LSB of pointer Z clear
CountLoop:
ld R16,Z ; Read register content
inc R16 ; Increase counter
st Z+,R16 ; Store increased value and increment pointer
breq CountLoop ; Repeat with next higher byte, if zero
; Ready counting up
Whenever this is executed, it counts the value in R2:R1:R0
up by one. As the counter loop is executed even when R2:R1:R0
exceeds its maximum of 0xFFFFFF, the next bytes (in R3, R4,
...) would also count up if you do not include a check to
avoid that. Unchecked for such an overflow, the routine would
count up through the whole register file up to R15, and so
would be a 16-by-8- or 128-bit-wide counter.
Addressing port registers
The same pointer methods can be applied to accesses to the port
registers. The 64 port registers have I/O addresses between 0x0000
and 0x003F; the port register list of the respective device gives
those addresses.
The port registers of the external pins of port B are located
at the I/O addresses 0x0016 (read or input port, PINB), 0x0017
(direction port, DDRB) and 0x0018 (output or pull-up resistor
port, PORTB). To switch all port pins to output a high signal
you can use this:
ldi R16,0x1F ; All physically available bits to one
out PORTB,R16 ; and to the output port
out DDRB,R16 ; as well as to the direction port
But you can also access the port registers PORTB and DDRB by
using the fact that those ports have an offset address: all
ports between 0x0000 and 0x003F also appear at the data addresses
0x0020 to 0x005F, with an offset of 32 (0x20), just because the
registers R0 to R31 already occupy that address space.
ldi R16,0x1F ; All physically available bits to one
sts PORTB+32,R16 ; and to the output port PORTB
sts DDRB+32,R16 ; and to the direction port DDRB
This is absolutely equivalent to the above OUT formulation
and does the same, but needs one additional clock cycle
because STS is a 2-word instruction. So this is not
the standard formulation.
But it is the standard formulation for port registers that
are located beyond the 0x003F/0x005F address space. If your
AVR device has lots of internal hardware, the port register
space is completely occupied and some registers have to be
placed beyond the standard address space. Those port registers
cannot be accessed via the OUT instruction; you'll
have to use the STS instruction instead.
As an example, take the port register list of the ATmega48.
There, not the I/O port register addresses are given (those would
exceed 0x003F) but their offset addresses. Therefore the status
register SREG, usually at I/O address 0x003F, is listed at the
offset address 0x005F. The names (such as "SREG") are
associated with their offset address, so you can use STS SREG,R16
if you want to write the content of R16 to the SREG port register,
instead of STS SREG+32,R16. For port registers below an offset
address of 0x0060, OUT can still be used, but 32 has to
be subtracted from that offset address: OUT SREG-32,R16.
Note that in many cases the SRAM starts at address 0x0100
because many port registers are needed. The constant SRAM_START
in the respective def.inc file reflects this.
Addressing multiple data structures with displacements
A very special sort of addressing, implemented in all AVRs,
is to add a positive displacement D to a base address pointer
temporarily. The instruction LDD Register,Y+D adds
D to the address in register pair Y and reads from that cell
(register, port register or SRAM).
This addressing can be used if several similar data sets have
to be accessed. As an example I'll take a data set that consists
of a measured 10-bit ADC value, which is converted to a 16-bit
binary voltage and then to a 7-character-wide string for
display on an LCD. The device shall measure four channels and
display the results on four lines of the LCD. The data set
structure is as follows:
|Displacement|Content|
|0, 1|ADC result (LSB, MSB)|
|2, 3|Voltage in binary (LSB, MSB)|
|4 to 10|Voltage as 7-character text|
Of course you can place the ADC results of all four channels
in a row in SRAM, but as each result has to be multiplied by
the same constant, and the resulting binary has to be converted
to an ASCII string in the same manner, placing those data
bytes into a sorted channel data set, to which Y or Z can
point, has immense advantages: just change the base address
and perform the same operations on the data set. Many tasks
involving addresses are simplified by that data set structure,
e. g. by
- writing the ADC result to Y (LSB) and Y+1 (MSB),
- converting this to a voltage by multiplication and
writing the result to Y+2 (LSB) and Y+3 (MSB), and by
- converting this to a decimal string with a decimal
point and the voltage unit, to be written to
Y+4 through Y+10.
When the next result comes in, just add 11 to Y (adiw YL,11)
and all the rest of the operations go the same way.
Try this kind of data structure with a controller that cannot
handle displacements to pointers: you'll see that it is
inefficient, not that easy to debug, and ends in a lot more code.
The large number of registers, the pre-fetch mechanism and
the large variety of addressing modes alone make AVRs still state of
the art, even though they are 20 years old. PICs, and several more modern
controllers, do not reach this design level.
Just forget the assumption that newer is always better. Sometimes
it is the other way around.
AVRs are still the best you can get.
©2019 by http://www.avr-asm-tutorial.net