Executing instructions in AVRs

1 How AVRs execute program instructions internally

Program instructions are stored as 16-bit words in the flash memory of AVRs. They are written there by the programming device.

1.1 The program counter

The program counter starts at the address 0x0000. When starting up and during a hardware reset this address is written to the program counter (PC). The instruction word that is located there is read, decoded and finally executed.

After each executed instruction the program counter advances by one, and the instruction located there is executed next. The exception is a jump instruction, which changes the program counter to point to a different location in memory. Instruction words are then taken from there, and the program counter advances from that address onwards.

What if the instructions run out because no more instructions have been programmed? If the AVR gets into that situation (which should be strictly avoided), it reads 0xFFFF from the unprogrammed flash location. This is an undefined instruction word, and the AVR does nothing at all. It just advances and reads the next word.

And what happens when the end of the program memory is reached? The AVR simply continues at address 0x0000. That is similar to a reset, but not exactly what happens during a reset: the registers are not cleared and the port registers are not set to their default values. This wrap-around is exactly what happens in an unprogrammed AVR: it starts all over again, because the complete memory is filled with 0xFFFF.
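
The behaviour of the program counter described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real AVR simulator: the flash size of 16 words and the names FLASH_WORDS, ERASED and run are made up for illustration, and an erased word is simply treated as "do nothing", as the text describes.

```python
FLASH_WORDS = 16   # hypothetical, tiny flash size for illustration only
ERASED = 0xFFFF    # content of an unprogrammed flash word


def run(flash, fetches):
    """Return the addresses the program counter visits during `fetches` fetches."""
    pc = 0x0000                      # value after power-up or hardware reset
    visited = []
    for _ in range(fetches):
        visited.append(pc)
        if flash[pc] != ERASED:
            raise NotImplementedError("this sketch only models erased words")
        # An erased word is modelled as "do nothing": just advance.
        # The modulo models the wrap-around at the end of the flash.
        pc = (pc + 1) % len(flash)
    return visited


blank_flash = [ERASED] * FLASH_WORDS
trace = run(blank_flash, FLASH_WORDS + 2)
print(trace[-3:])   # [15, 0, 1]: last word, then wrap-around to 0x0000
```

In a real program this situation is usually avoided by ending the code with an endless loop, so that the program counter never reaches unprogrammed flash.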

1.2 The ALU

Many of the AVR's instructions perform arithmetic or logic operations. That is done by the arithmetic/logical unit (ALU) of the AVRs: this unit takes up to two values on its input lines 1 and 2 and can add or subtract them, or can AND, OR or EXOR these two values. Which operation is performed is derived from the instruction word, see below for details.

The ALU writes the result of the operation to, e.g., a target register. Events such as an overflow (carry, C) or a zero result (zero, Z) are called flags and are copied to the status register SREG. There, those flags can be read and used by further instructions, e.g. for conditional jumps.
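
As a rough illustration of what the ALU does with its two inputs, here is a minimal Python sketch. The function name alu and its string operation names are assumptions made for this example, and only the two flags mentioned above (C and Z) are modelled; the remaining SREG flags are left out.

```python
def alu(op, in1, in2, carry=False):
    """Model one 8-bit ALU operation; return (result, carry, zero)."""
    if op == "add":
        raw = in1 + in2
        carry = raw > 0xFF            # overflow out of bit 7
    elif op == "sub":
        raw = in1 - in2
        carry = in2 > in1             # a borrow sets the carry flag
    elif op == "and":
        raw = in1 & in2               # logic operations leave the carry untouched
    elif op == "or":
        raw = in1 | in2
    elif op == "eor":
        raw = in1 ^ in2
    else:
        raise ValueError(f"unknown operation: {op}")
    result = raw & 0xFF               # the result is always 8 bits wide
    return result, carry, result == 0


print(alu("and", 0x55, 0xAA))   # (0, False, True): zero result sets Z
print(alu("add", 0xF0, 0x20))   # (16, True, False): overflow sets C
```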

1.3 Instruction decoding

Decoding is the selection of one of the more than 100 different operations that AVRs can perform, for example the LDI r,c instruction. This copies the 8-bit constant c, which is located within the instruction word in bits 0 to 3 and 8 to 11, to input 1 of the arithmetic/logical unit (ALU) and writes it, on the next edge of the clock signal, to the register r. The register number is located within the instruction word in the four bits 4 to 7; a fifth bit, which is always set high, is added in front of them.

The operation LDI is encoded in the four most significant bits 12 to 15 of the instruction word: it is 0b1110. Decoding these bits tells the ALU to copy the byte on input 1 to the register.

As the encoded register number has only four bits in the instruction word, only 16 of the 32 registers can be addressed. The decision to set the fifth bit high and to address the registers R16 to R31 with those four bits is justified because the upper half of the registers also holds the 16-bit pointer registers X, Y and Z, and the LDI often is used to manipulate these pointers.
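
The bit layout of LDI can be tried out with a short Python sketch. It follows the layout given in the instruction set manual (opcode in bits 12 to 15, constant in bits 8 to 11 and 0 to 3, register in bits 4 to 7); the function names encode_ldi and decode_ldi are invented for this example.

```python
LDI_OPCODE = 0b1110   # bits 12 to 15 of the instruction word


def encode_ldi(reg, const):
    """Build the 16-bit instruction word for LDI Rreg,const."""
    if not 16 <= reg <= 31:
        raise ValueError("LDI can only address R16 to R31")
    if not 0 <= const <= 0xFF:
        raise ValueError("the constant must fit into 8 bits")
    d = reg - 16                       # only four bits are stored, bit 4 is implied
    return (LDI_OPCODE << 12) | ((const >> 4) << 8) | (d << 4) | (const & 0x0F)


def decode_ldi(word):
    """Return (register number, constant) of an LDI instruction word."""
    if word >> 12 != LDI_OPCODE:
        raise ValueError("not an LDI instruction word")
    const = ((word >> 4) & 0xF0) | (word & 0x0F)   # join the two nibbles
    reg = 16 + ((word >> 4) & 0x0F)                # set the fifth bit high
    return reg, const


print(hex(encode_ldi(16, 0x55)))   # 0xe505
print(hex(encode_ldi(17, 0xAA)))   # 0xea1a
print(decode_ldi(0xE505))          # (16, 85)
```

Calling encode_ldi with a register below 16 raises an error, which mirrors the assembler refusing such an LDI.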

Similar to the LDI instruction, which can only address R16 and above, are the other instructions that work with an 8-bit constant, such as SUBI, SBCI, ANDI and ORI, all of which write the result back to the respective upper register r.

But this missing fifth bit is a rather rare construction: most instructions can address all 32 registers. For example, two-register instructions such as ADD, SUB, AND, OR or MOV allow all 32 registers to be addressed as source as well as target.

All bit combinations that an instruction word can consist of are documented in the "avr-instruction-set-manual", for every instruction under Description and 16-bit Opcode. The manual can be downloaded from Microchip's website.

1.4 Pre-Fetch-mechanism

Execution of instructions in AVRs is constructed so that the next instruction is already fetched and decoded while the previous instruction is being executed. When execution finishes, the next instruction can be executed immediately without first having to decode it.

This so-called Pre-Fetch only fails if an instruction alters the program counter: then the already decoded next instruction is useless, and the instruction at the new address has to be fetched and decoded in an extra cycle. This is the case if the instruction is a jump JMP address or a relative jump RJMP label, or if the condition of a conditional jump BRxC label or BRxS label is true and alters the program counter. RJMP and the conditional jumps need two clock cycles instead of one, in the conditional case only if the condition is true. JMP, which occupies two instruction words, needs three.

As an example, consider the binary AND of two 8-bit numbers in the registers R16 and R17. Those are loaded with the binary values 0b01010101 (0x55) and 0b10101010 (0xAA). The assembler source for that is:

  ldi R16,0x55 ; R16 to hexadecimal 55
  ldi R17,0xAA ; R17 to hexadecimal AA
  and R16,R17 ; AND R16 with R17, result to register R16

The simulation is made with avr_sim.

[Figures: Load instructions (left), AND instruction (right)]

Displayed to the left is the register content after the two load instructions have been executed: both registers are set to the desired hexadecimal values. To the right, the AND instruction has been executed after that: the ALU has written the result, zero, to R16.

Because the result of the last instruction is zero, the Z flag in the status register SREG is one. Further instructions can make use of that.

All three instructions together last exactly 3 µs at a clock rate of 1 MHz; each instruction lasts 1 µs. Without pre-fetch 6 µs would be consumed, doubling the time needed.

If the AND were followed by a conditional jump that only jumps if the result is zero, the source code would be:

  ldi R16,0x55 ; R16 to hexadecimal 55
  ldi R17,0xAA ; R17 to hexadecimal AA
  and R16,R17 ; AND, result to register R16
  breq Label ; If zero (equal) jump to label
  nop ; Here when not jumped
Label: ; Here when jumped and not jumped

Now the execution of the complete code has lasted 5 µs: the executed jump instruction BREQ Label has lasted two microseconds, because the pre-fetch failed due to the program counter alteration by the executed jump.
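
The cycle counting used in this chapter can be written down as a small Python sketch. It is a simplified model under stated assumptions: only single-cycle register instructions, RJMP and a few conditional jumps are covered, and the names CONDITIONAL_JUMPS and cycle_count are invented for this example.

```python
CONDITIONAL_JUMPS = {"breq", "brne", "brcc", "brcs"}   # subset, for illustration


def cycle_count(executed):
    """Sum the clock cycles of a list of (mnemonic, jump_taken) pairs."""
    total = 0
    for mnemonic, jump_taken in executed:
        if mnemonic == "rjmp":
            total += 2                       # pre-fetched word is always discarded
        elif mnemonic in CONDITIONAL_JUMPS:
            total += 2 if jump_taken else 1  # extra cycle only when jumping
        else:
            total += 1                       # assumed: single-cycle register instruction
    return total


without_jump = [("ldi", False), ("ldi", False), ("and", False)]
with_jump = without_jump + [("breq", True)]
print(cycle_count(without_jump))   # 3 cycles, 3 µs at 1 MHz
print(cycle_count(with_jump))      # 5 cycles, 5 µs at 1 MHz
```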

2 Instruction coding

To understand how the AVRs process instructions, the ADD rx,ry instruction is shown here in detail.

The 16-bit instruction word holds in its bits 10 to 15 the binary value 0b000011. That signals to the ALU that the two register contents on its inputs IN1 and IN2 have to be added.

Bits 4 to 8 determine which of the 32 registers (rx) is transported to IN1. Similarly, bits 0 to 3 plus bit 9 determine the source (ry) for IN2. Each of the two values is selected by a 1-out-of-32 MUX.

The ALU adds the two values; the result in OUT is written to the register that is selected for IN1, i.e. rx.

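The ALU further determines whether a carry occurred during adding and whether the result is zero, and sets the respective flags in SREG. All instructions that connect two registers, such as ADC, SUB, AND, OR or EOR, are encoded in the instruction word similarly; only the highest six bits are different.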
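
The encoding of two-register instructions can be reproduced with a few lines of Python. The sketch follows the field layout of the instruction set manual (target register rx in bits 4 to 8, source register ry in bits 0 to 3 plus bit 9, operation in bits 10 to 15); the table OPCODES and the function encode are names chosen for this example.

```python
# The six most significant bits select the operation.
OPCODES = {
    "add": 0b000011,
    "adc": 0b000111,
    "sub": 0b000110,
    "and": 0b001000,
    "eor": 0b001001,
    "or":  0b001010,
}


def encode(mnemonic, rx, ry):
    """Build the 16-bit instruction word for `mnemonic rx,ry`."""
    for reg in (rx, ry):
        if not 0 <= reg <= 31:
            raise ValueError("register number must be 0 to 31")
    return (
        (OPCODES[mnemonic] << 10)
        | ((ry >> 4) << 9)     # highest bit of ry goes to bit 9
        | (rx << 4)            # all five bits of rx go to bits 4 to 8
        | (ry & 0x0F)          # lower four bits of ry go to bits 0 to 3
    )


print(hex(encode("add", 16, 18)))   # 0xf02
print(hex(encode("and", 16, 17)))   # 0x2301
```

Both words differ only in their upper six bits and in the register fields, which is the similarity described above.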

As five bits in the instruction word are available for selecting each of the two registers, each of the 32 registers can serve as source as well as target. Both sources can even be the same, e.g. adding a register to itself, thereby multiplying its content by two. This has a funny effect: when ANDing or ORing a register with itself, the result is zero only if the register was zero from the beginning. ATMEL has given the AND of a register with itself a separate mnemonic named TST r, which sets the zero flag if the register r is zero. ORing the register with itself would do the same, only the binary code changes slightly.

The instruction ADC r,r has a similar effect: all bits in register r are shifted one position to the left, the lowest bit 0 receives the carry flag, and the highest bit 7 is shifted to the carry. Even though unnecessary, the instruction was given a mnemonic of its own named ROL r, for rotate r left.

Another pseudo instruction is EOR r,r, which clears all one-bits in register r, because EXORing a one with a one yields zero. This also got its own mnemonic: CLR r, which results in the same instruction word as EOR r,r.

That is how the sheer number of different instructions is increased without having to change the ALU's capabilities.
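
That these pseudo instructions really behave as described can be checked with a short Python model. It only imitates the arithmetic on 8-bit values; the function names are invented for this example, and the carry flag is passed in as 0 or 1.

```python
def adc_self(value, carry_in):
    """ADC r,r: add the register to itself plus the carry flag."""
    raw = value + value + carry_in
    return raw & 0xFF, raw > 0xFF


def rol(value, carry_in):
    """ROL r: shift left, bit 0 receives the carry, bit 7 goes to the carry."""
    return ((value << 1) | carry_in) & 0xFF, bool(value & 0x80)


def eor_self(value):
    """EOR r,r (CLR r): every bit is EXORed with itself."""
    return value ^ value


def and_self_zero_flag(value):
    """AND r,r (TST r): the register is unchanged, Z tells whether it is zero."""
    return (value & value) == 0


# Compare the pairs for every possible register content.
for value in range(256):
    for carry_in in (0, 1):
        assert adc_self(value, carry_in) == rol(value, carry_in)
    assert eor_self(value) == 0
    assert and_self_zero_flag(value) == (value == 0)

print(rol(0b10000001, 0))   # (2, True): bit 7 moved to the carry
```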

But even with that, all described operations require only one single clock cycle (and not more, as in PICs). No longer-lasting SRAM accesses are necessary; everything is available in the 32 fast-acting registers.

Example: A 16-bit adder

As an example of the execution of instructions in AVRs we look at a 16-bit adder with overflow recognition. The source code for that:

; Defining the two numbers to be added (no instruction words, assembler directives) 
.equ n16bitZahl1 = 12345 ; Define the first 16-bit number, no clock cycle
.equ n16bitZahl2 = 45678 ; Define the second 16-bit number, no clock cycle
  ldi R16,Low(n16bitZahl1) ; The LSB of number 1 to R16, one clock cycle
  ldi R17,High(n16bitZahl1) ; The respective MSB to R17, one clock cycle
  ldi R18,Low(n16bitZahl2) ; The LSB of number 2 to R18, one clock cycle
  ldi R19,High(n16bitZahl2) ; The respective MSB to R19, one clock cycle
  add R16,R18 ; Add the two LSB, one clock cycle
  adc R17,R19 ; And add the two MSBs plus the previous carry, one clock cycle
  brcc Ready ; If no overflow occurs: jump over the next instructions, two cycles when jumping
  ldi R16,0xFF ; Set result to maximum 16-bit value, LSB, one clock cycle
  ldi R17,0xFF ; The same for the MSB, one clock cycle
Ready: ; Here when jumped and not jumped

[Figures: Loading the 16-bit numbers (left), Loading and adding (right)]

On the left, the four load instructions have been executed and the numbers are in the four registers, lasting four clock cycles or 4 µs at 1 MHz. To the right, the addition has been executed. After the addition, the carry flag was not set, because the result is smaller than 65,536. The conditional jump has been executed and the result is available after 8 µs.

If an overflow is provoked, e.g. by defining the first 16-bit number as 23,456, the result looks different:

[Figures: Different 16-bit number (left), Adding the different number (right)]

Now the first number in R17:R16 is 0x5BA0 (to the left). The result (to the right) is different, and the carry flag has been set. Both registers therefore have to be set to 0xFF.

[Figure: Result to largest 16-bit value]

Here, the result has been corrected to not exceed the 16-bit range. Now 9 µs are needed. Instead of the two clock cycles in the previous addition, the jump instruction has now lasted only one clock cycle because it was not taken, but the two load instructions of 0xFF require two additional cycles: the first one compensates for the shorter jump and the second one adds one clock cycle.
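
The two simulation runs can be cross-checked with a Python model of the adder. This is a sketch that mirrors the assembler source step by step, including its cycle counts; the function name add16_saturating is invented for this example, and the variable names merely stand in for the registers.

```python
def add16_saturating(number1, number2):
    """Return (16-bit result, clock cycles) of the adder shown above."""
    r16, r17 = number1 & 0xFF, number1 >> 8      # ldi R16 / ldi R17
    r18, r19 = number2 & 0xFF, number2 >> 8      # ldi R18 / ldi R19
    cycles = 4

    raw = r16 + r18                              # add R16,R18
    carry = raw > 0xFF
    r16 = raw & 0xFF
    cycles += 1

    raw = r17 + r19 + carry                      # adc R17,R19
    carry = raw > 0xFF
    r17 = raw & 0xFF
    cycles += 1

    if not carry:
        cycles += 2                              # brcc jumps: two cycles
    else:
        cycles += 1                              # brcc does not jump: one cycle
        r16, r17 = 0xFF, 0xFF                    # two ldi with 0xFF
        cycles += 2
    return (r17 << 8) | r16, cycles


print(add16_saturating(12345, 45678))   # (58023, 8)
print(add16_saturating(23456, 45678))   # (65535, 9)
```

At a clock rate of 1 MHz the cycle counts translate directly into the 8 µs and 9 µs seen in the simulation.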

All timing questions can be resolved with the methods shown here; they can be analyzed using simulation and are completely transparent. The execution process is simple to understand; there are no unclear features or properties here.

©2019 by