Assembly

ARM

The ARM architecture is a load-store architecture, with a 32-bit addressing range (64-bit is also available). ARM processors are typical of RISC processors in that only load and store instructions can access memory. Data processing instructions operate on register contents only.
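A minimal C sketch of what the load-store rule means in practice. It is a hypothetical model: the array stands for memory and local variables stand for registers. Incrementing a word in memory takes three steps, because ADD can only operate on registers. The comments show the assembly each step corresponds to, assuming R0 holds the address of the word.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model: 'mem' stands for main memory (one 32-bit word per entry),
   local variables stand for registers. */
void increment_word(uint32_t mem[], size_t index) {
    uint32_t r1 = mem[index]; /* LDR R1, [R0]     : load from memory into a register */
    r1 = r1 + 1;              /* ADD R1, R1, #1   : data processing on registers only */
    mem[index] = r1;          /* STR R1, [R0]     : store the register back to memory */
}
```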

Instruction Cycle

If we have 32-bit instructions, then

Registers

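In 32-bit ARM, the register set (register file) consists of 16 registers, R0–R15. R13 is the stack pointer (SP), R14 the link register (LR) and R15 the program counter (PC).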

Immediates

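Aka constants. They are encoded in the instruction itself, so they need no register or memory access. The immediate field is 12 bits wide, but its meaning depends on the instruction set: AArch64 ADD/SUB take an unsigned 12-bit immediate (range 0–4095), whereas A32 data-processing instructions treat the 12 bits as an 8-bit value rotated right by an even amount, so only some 32-bit constants can be encoded directly.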
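In the A32 instruction set, the 12-bit immediate of a data-processing instruction is an 8-bit value rotated right by an even amount (0, 2, ..., 30), so not every number in 0–4095 can be encoded. A minimal C sketch, assuming that "modified immediate" rule: a value is encodable if rotating it left by some even amount leaves it in the range 0–255. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n taken modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned n) {
    n &= 31u;
    return n == 0 ? x : (x << n) | (x >> (32u - n));
}

/* True if 'value' equals some 8-bit constant rotated right by an even amount,
   i.e. it fits an A32 data-processing immediate.
   value == ROR(imm8, r)  <=>  ROL(value, r) == imm8 */
bool a32_imm_encodable(uint32_t value) {
    for (unsigned r = 0; r < 32u; r += 2u) {
        if (rotl32(value, r) <= 0xFFu) {
            return true;
        }
    }
    return false;
}
```

Under this rule 0xFF000000 and 0xF000000F are encodable, while 4095 (0xFFF, twelve consecutive set bits) and 0x101 are not; such constants need LDR Rd, =constant or, on ARMv7, a MOVW/MOVT pair.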

Examples

Instructions

; comment

Op Rd, S1, S2

Mnemonic/operation destination-operand, source-operand1, source-operand2

Operations

Some architectures provide:

Conditionals

Flags: N (negative), Z (zero), C (carry) and V (overflow) are the top 4 bits (bits 31–28) of the Current Program Status Register (CPSR).

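If an instruction mnemonic has the suffix S (e.g. ADDS), the instruction updates the condition flags: N and Z are set from the result, and arithmetic instructions such as ADDS and SUBS also set C and V. Comparison instructions such as CMP always set the flags and need no S suffix.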
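How the four flags come out of an addition can be sketched in C. This is a model of ADDS Rd, Rn, Rm, not real hardware; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } Nzcv;

/* Flags that ADDS Rd, Rn, Rm would produce for Rn = a, Rm = b. */
Nzcv adds_flags(uint32_t a, uint32_t b, uint32_t *result) {
    uint32_t r = a + b;               /* wraps modulo 2^32, like the hardware */
    Nzcv f;
    f.n = (r >> 31) != 0;             /* N: bit 31 of the result (negative as signed) */
    f.z = (r == 0);                   /* Z: result is zero */
    f.c = (r < a);                    /* C: unsigned carry out of bit 31 */
    /* V: signed overflow, i.e. both operands share a sign that the result lacks */
    f.v = (((~(a ^ b)) & (a ^ r)) >> 31) != 0;
    *result = r;
    return f;
}
```

For example, 0x7FFFFFFF + 1 sets N and V (signed overflow, no carry), while 0xFFFFFFFF + 1 sets Z and C (unsigned carry, no signed overflow).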

Comparison instructions

If an instruction mnemonic is followed by a condition code (e.g. ADDEQ), the instruction executes only if the current flags satisfy that condition; otherwise it has no effect.
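The flag tests behind the condition codes can be sketched in C. This is a model rather than real hardware; the type, enum and function names are hypothetical, and the enum constants follow ARM's standard condition mnemonics. CMP is modelled as a subtraction whose result is discarded.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } Nzcv;

/* Flags set by CMP a, b (computes a - b and discards the result). */
Nzcv cmp_flags(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Nzcv f;
    f.n = (r >> 31) != 0;
    f.z = (r == 0);
    f.c = (a >= b);                          /* C = NOT borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;  /* signed overflow */
    return f;
}

typedef enum { EQ, NE, HS, LO, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL } Cond;

/* Whether an instruction with this condition suffix would execute. */
bool cond_holds(Cond cc, Nzcv f) {
    switch (cc) {
    case EQ: return f.z;                 /* equal */
    case NE: return !f.z;                /* not equal */
    case HS: return f.c;                 /* unsigned >= (also written CS) */
    case LO: return !f.c;                /* unsigned <  (also written CC) */
    case MI: return f.n;                 /* negative */
    case PL: return !f.n;                /* positive or zero */
    case VS: return f.v;                 /* overflow */
    case VC: return !f.v;                /* no overflow */
    case HI: return f.c && !f.z;         /* unsigned >  */
    case LS: return !f.c || f.z;         /* unsigned <= */
    case GE: return f.n == f.v;          /* signed >=   */
    case LT: return f.n != f.v;          /* signed <    */
    case GT: return !f.z && f.n == f.v;  /* signed >    */
    case LE: return f.z || f.n != f.v;   /* signed <=   */
    case AL: return true;                /* always */
    }
    return false;
}
```

In this model, CMP R1, R2 followed by ADDEQ R0, R0, #1 corresponds to `if (cond_holds(EQ, cmp_flags(r1, r2))) r0 += 1;`. Note that the same flags can satisfy both LT and HI: comparing 0xFFFFFFFF with 1 is "less than" as signed (-1 < 1) but "higher" as unsigned.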

Conditional mnemonics

Conditional selection

CSEL RDest, RTrue, RFalse, CM (AArch64): RDest is set to RTrue if the conditional mnemonic CM holds for the currently set flags, otherwise to RFalse.
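CSEL turns a small if-else into straight-line code. A minimal C sketch of the idea, using the ternary operator as a stand-in for CSEL (the function name is hypothetical). For signed 64-bit integers, a compiler targeting AArch64 could emit CMP followed by CSEL with the GT condition for this function, though the actual output depends on the compiler and its flags.

```c
#include <stdint.h>

/* Models:  CMP  X1, X2
            CSEL X0, X1, X2, GT   ; X0 = (X1 > X2) ? X1 : X2, no branch */
int64_t max_csel(int64_t x1, int64_t x2) {
    int64_t x0 = (x1 > x2) ? x1 : x2;
    return x0;
}
```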

Conditional comparison

CCMP R1, R2, #nzcv, CM (AArch64): if condition CM holds for the current flags, R1 and R2 are compared and the flags are set as by CMP; otherwise the flags are set directly to the 4-bit immediate #nzcv.
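A typical use of CCMP is evaluating `a == b && c == d` without a branch. A minimal C model (hypothetical names) of CMP, CCMP and the final CSET; the immediate #nzcv holds N in bit 3, Z in bit 2, C in bit 1 and V in bit 0.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } Nzcv;

/* Flags set by CMP a, b on 64-bit registers. */
static Nzcv cmp64(uint64_t a, uint64_t b) {
    uint64_t r = a - b;
    Nzcv f = { (r >> 63) != 0, r == 0, a >= b, (((a ^ b) & (a ^ r)) >> 63) != 0 };
    return f;
}

/* CCMP Xa, Xb, #nzcv, cond: compare if cond held for the previous flags,
   otherwise load the flags from the 4-bit immediate. */
static Nzcv ccmp64(uint64_t a, uint64_t b, unsigned nzcv, bool cond_held) {
    if (cond_held) return cmp64(a, b);
    Nzcv f = { (nzcv & 8u) != 0, (nzcv & 4u) != 0, (nzcv & 2u) != 0, (nzcv & 1u) != 0 };
    return f;
}

/* (a == b && c == d):
       CMP  X0, X1
       CCMP X2, X3, #0, EQ   ; if the first test failed, force Z = 0
       CSET X4, EQ           ; X4 = 1 if Z is set, else 0          */
bool both_equal(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
    Nzcv f = cmp64(a, b);
    f = ccmp64(c, d, 0u, f.z);   /* EQ holds when Z is set */
    return f.z;
}
```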

Other conditional instructions

Labels

A label is a single word at the start of a line (usually on a line by itself). It represents the address of the instruction or data that follows it.

Branching

For non-branching instructions, the PC increases by 4 (one 32-bit instruction) after each instruction. A branch instead loads the PC with the branch target address, so the instruction at that address is the next one executed. Branches:

If performance is a priority, try to avoid branches. Successive branch instructions are especially inefficient.

Memory

Operations

Definitions

Barrel shifter

The shift operations can be applied to S2 as part of another instruction.
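The four shift types available to the A32 barrel shifter (LSL, LSR, ASR, ROR) can be modelled in C as below. This is a sketch with hypothetical function names, restricted to shift amounts 0–31; the last function shows how ADD R0, R1, R2, LSL #2 folds the shift into the addition.

```c
#include <stdint.h>

/* Shift amounts are assumed to be 0..31, as in an immediate shift. */
uint32_t lsl32(uint32_t x, unsigned n) { return x << n; }  /* logical shift left, fills with 0 */
uint32_t lsr32(uint32_t x, unsigned n) { return x >> n; }  /* logical shift right, fills with 0 */

/* Arithmetic shift right: vacated top bits are filled with copies of bit 31. */
uint32_t asr32(uint32_t x, unsigned n) {
    if (n == 0) return x;
    uint32_t r = x >> n;
    if (x & 0x80000000u) r |= ~(0xFFFFFFFFu >> n);  /* set the top n bits */
    return r;
}

/* Rotate right: bits shifted out at the bottom re-enter at the top. */
uint32_t ror32(uint32_t x, unsigned n) {
    n &= 31u;
    return n == 0 ? x : (x >> n) | (x << (32u - n));
}

/* ADD R0, R1, R2, LSL #n  ==  R0 = R1 + (R2 << n), in a single instruction */
uint32_t add_lsl(uint32_t r1, uint32_t r2, unsigned n) { return r1 + lsl32(r2, n); }
```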

The barrel shifter can be used to simplify indexing operations:

    LSL R2, R1, #2   ; R2 = R1 * 4, the byte offset of element R1 (elements are 4-byte words)
    STR R3, [R0, R2] ; R0 = array base, R2 = byte offset, R3 = value to store

; the above can be simplified to
    STR R3, [R0, R1, LSL #2]
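The address arithmetic behind STR R3, [R0, R1, LSL #2] can be modelled in C: the byte address is the array base plus the index shifted left by 2, which is exactly where element R1 lives when elements are 4-byte words. A minimal sketch with hypothetical names:

```c
#include <stdint.h>

/* Models STR R3, [R0, R1, LSL #2]:
   'base' plays R0, 'index' plays R1, 'value' plays R3. */
void store_scaled(uint32_t *base, uint32_t index, uint32_t value) {
    unsigned char *addr = (unsigned char *)base + (index << 2); /* byte offset = 4 * index */
    *(uint32_t *)addr = value;                                  /* same location as base[index] */
}
```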

ARM indexing modes

Directives

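DCB (define constant byte) reserves memory initialized with the given byte values. The label in front of it (here mydata) names the address of the first byte, which LDR R0, =mydata then loads into R0.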

mydata  DCB 0x0, 0x1, 0x2, 0x3, 0x4
        LDR R0, =mydata
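In C terms, DCB behaves roughly like an initialized byte array, and LDR R0, =mydata like taking a pointer to it. This is an analogy only (in C the compiler, not the programmer, decides where the array is placed); the function name is hypothetical.

```c
#include <stdint.h>

/* mydata  DCB 0x0, 0x1, 0x2, 0x3, 0x4   : five initialized bytes */
const uint8_t mydata[] = {0x0, 0x1, 0x2, 0x3, 0x4};

/* LDR R0, =mydata : R0 receives the address of the first byte.
   A later LDRB R1, [R0, #3] would then read the byte 0x3. */
const uint8_t *load_mydata_address(void) {
    return mydata;
}
```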

Functions

Aka procedures, subroutines. Take in arguments and output a return value.

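The calling convention is a scheme for the caller and the callee to agree on where to put the arguments and the return value. In the standard 32-bit ARM convention (AAPCS), the first four arguments are passed in R0–R3 and the return value comes back in R0.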

The stack

Pseudo-instructions

Loading Literals

Loads any 32-bit constant, including values that do not fit in an instruction's immediate field, by reading it from a literal pool in the text segment. It can also load the address of a labelled variable or pointer in the program.

LDR Rd, =literal
LDR Rd, =label

The assembler places the constant (or the label's address) in a literal pool and generates a PC-relative LDR instruction that reads it from there, e.g. LDR R1, [PC, #0x2A]. If the constant fits in a MOV or MVN immediate, the assembler may emit that instruction instead.
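Where the literal lives follows from how A32 reads the PC: in ARM state, an instruction that reads PC sees the address of the current instruction plus 8. A minimal sketch under that assumption, with hypothetical function names. It also checks the reach of the PC-relative LDR, whose offset is 12 bits plus a sign bit (±4095 bytes), which is why literal pools must stay close to the code that uses them.

```c
#include <stdbool.h>
#include <stdint.h>

/* A32 assumption: reading PC yields (address of this instruction) + 8. */
uint32_t literal_address(uint32_t instr_addr, int32_t offset) {
    return instr_addr + 8u + (uint32_t)offset;  /* unsigned wrap handles negative offsets */
}

/* The literal must lie within 4095 bytes of PC + 8 (either direction). */
bool literal_in_range(uint32_t instr_addr, uint32_t literal_addr) {
    int64_t offset = (int64_t)literal_addr - ((int64_t)instr_addr + 8);
    return offset >= -4095 && offset <= 4095;
}
```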

NOP

Used to achieve a delay or to align instructions in memory. Traditionally assembled as MOV R0, R0 (newer architecture versions also have a dedicated NOP encoding).

Exceptions

An exception is an unscheduled function call that branches to a new address. It can be caused by hardware or software, for example:
- a hardware exception triggered by I/O, which is called an interrupt;
- reading non-existent memory.
The program then branches to exception-handling code in the OS.

Saturated Arithmetic Instructions

QADD, QDADD, QDSUB, QSUB

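When overflow happens, the result is clamped (saturated) to the most positive or most negative representable number instead of wrapping around. ARM sets the Q flag to record that saturation has occurred; the flag is sticky, so it stays set until software clears it.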
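A minimal C model of QADD and QSUB (hypothetical function names; a 64-bit intermediate makes the overflow check straightforward). QADD Rd, Rm, Rn computes Rm + Rn and QSUB Rd, Rm, Rn computes Rm - Rn. The bool pointed to by q models the sticky Q flag: the functions only ever set it, never clear it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clamp a 64-bit intermediate to the int32_t range; set *q if clamping happened. */
static int32_t saturate32(int64_t r, bool *q) {
    if (r > INT32_MAX) { *q = true; return INT32_MAX; }
    if (r < INT32_MIN) { *q = true; return INT32_MIN; }
    return (int32_t)r;
}

/* QADD: saturating signed add */
int32_t qadd(int32_t a, int32_t b, bool *q) { return saturate32((int64_t)a + b, q); }

/* QSUB: saturating signed subtract */
int32_t qsub(int32_t a, int32_t b, bool *q) { return saturate32((int64_t)a - b, q); }
```

By contrast, a plain ADD of 0x7FFFFFFF and 1 wraps around to 0x80000000, the most negative 32-bit value.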

Memory Map

The memory address space is divided into:

In detail:

Advanced Topics

The ARM architecture has a weakly ordered memory model: the processor may reorder memory accesses to reduce the time spent waiting on memory.

Memory accesses are therefore not guaranteed to become visible to other cores in the order they are written in the program; they may be reordered depending on the cost of each access. A single-threaded program cannot observe this, because the core preserves the results of its own instruction stream, but a multi-threaded program running on a multicore machine can see another thread's writes in an unexpected order.

In such situations, instructions called "memory barriers" tell the processor not to reorder memory accesses across a given point. For example, the DMB (Data Memory Barrier) instruction in ARM64 prevents data accesses (loads and stores) from being reordered across it.
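In C11, the portable way to request such a barrier is atomic_thread_fence. The sketch below shows the classic message-passing pattern; the variable and function names are hypothetical. On AArch64, compilers such as GCC and Clang typically implement these fences with DMB instructions, but the exact instruction chosen is up to the compiler.

```c
#include <stdatomic.h>

static int payload;            /* ordinary data */
static atomic_int ready = 0;   /* flag published after the data */

/* Writer, e.g. running on core 0. */
void publish(int value) {
    payload = value;
    atomic_thread_fence(memory_order_release);  /* payload store may not move after the flag store */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader, e.g. running on core 1: returns the payload, or -1 if not yet published. */
int try_consume(void) {
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 1) {
        atomic_thread_fence(memory_order_acquire);  /* payload load may not move before the flag load */
        return payload;
    }
    return -1;
}
```

Without the two fences, a weakly ordered core could make the store to ready visible before the store to payload, and the reader could see the flag set but read a stale payload.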

Structured programming

If Statement

if (R0 == R1)
    // do if
// end if
    CMP R0, R1
    BNE DONE
    ; do if
DONE

If-else Statement

if (R0 == R1)
    // do if
else
    // do else
// end if
    CMP R0, R1
    BNE ELSE
    ; do if
    B DONE
ELSE
    ; do else
DONE 

Switch Statement

switch (R0)
    case R1:
        // do R1 case
        break;
    case R2:
        // do R2 case
        break;
    default:
        // do default
CASE1
    CMP R0, R1
    BNE CASE2
    ; do R1 case
    B DONE
CASE2
    CMP R0, R2
    BNE DEFAULT
    ; do R2 case
    B DONE
DEFAULT
    ; do default
DONE

While Loops

while (R0 == R1)
    // do while
WHILE
    CMP R0, R1
    BNE DONE
    ; do while
    B WHILE
DONE

Do-while Loops

do {
    // do dowhile
} while (R0 == R1);
DOWHILE
    ; do dowhile
    CMP R0, R1
    BEQ DOWHILE
DONE

For Loops

for (int R0 = 0; R0 < R1; R0++)
    // do for
    MOV R0, #0
FOR
    CMP R0, R1
    BGE DONE        ; signed comparison, since R0 is an int
    ; do for
    ADD R0, R0, #1
    B FOR
DONE

Arrays and Indexing

int array[] = {0, 1, 2};
    MOV R0, #0x10000 ; an example base address for array
    MOV R1, #0
    STR R1, [R0, #0] ; array[0] = 0
    MOV R1, #1
    STR R1, [R0, #4] ; array[1] = 1 (elements are 4 bytes apart)
    MOV R1, #2
    STR R1, [R0, #8] ; array[2] = 2

Function call

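    // ...
    int y = 3;
    int z = double_it(y);   // "double" is a C keyword, so the function needs another name
    // ...

int double_it(int x){
    return 2 * x;
}
    MOV R0, #3      ; R0 = y, the first argument
    BL DOUBLE       ; LR = return address, then branch to DOUBLE
    ; on return, R0 = z = 6; the caller continues here and must not fall through into DOUBLE

DOUBLE
    ADD R0, R0, R0  ; i.e. 2 * R0; the result is returned in R0
    MOV PC, LR      ; return to the caller (BX LR is the more common modern form)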

ARM Thumb

Thumb instructions are 16 bits long. This allows for

But this comes at the expense of:

To-Do

ARM (A32 and A64) uses a fixed-length 4-byte instruction encoding, in contrast to x86, which uses variable-length encoding. A return instruction (ret) on x86 can be as short as 1 byte, but on ARM64 it is always 4 bytes long.

References