The ARM architecture is a load-store architecture, with a 32-bit addressing range (64-bit is also available). ARM processors are typical of RISC processors in that only load and store instructions can access memory. Data processing instructions operate on register contents only.
If we have 32-bit instructions, then
The register set (register file) consists of 16 registers, R0–R15.
Immediates, aka constants, don't require a register or memory access. They are unsigned 12-bit numbers (range 0–4095).
Examples (a semicolon starts a comment):
#24, #-5, #0x18, #11000 ; comment
Op Rd, S1, S2
Mnemonic/operation destination-operand, source-operand1, source-operand2
ADD: r1 = r2 + r3
ADC: r1 = r2 + r3 + C
SUB: r1 = r2 - r3
SBC: r1 = r2 - r3 + C - 1
RSB: r1 = r3 - r2
RSC: r1 = r3 - r2 + C - 1
AND, ORR, EOR: bitwise and, or, exclusive or
BIC: bit clear (used for masking), BIC rd, r1, r2 sets rd = r1 and not(r2)
MVN: move not, MVN rd, rs sets rd = not(rs)
LSL: logical shift left
LSR: logical shift right (shift 0s into msb)
ASR: arithmetic shift right (shift sign-bit into msb)
ROR: rotate right
Some architectures provide:
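MUL: R1 = R2 * R3 (R1 receives the lower 32 bits of the product)
MULS: will set the N and Z flags
SMULL, UMULL: 64-bit product of two 32-bit registers, e.g. UMULL R1, R2, R3, R4 puts the lower 32 bits of R3 * R4 in R1 and the upper 32 bits in R2
SMUL: (32 <= 16 x 16) [no UMUL]
MLA: multiply-accumulate, MLA R1, R2, R3, R4 sets R1 = R2 * R3 + R4
MLAS: will set the N and Z flags
SMLA: signed MAC (32 <= 32 + 16 x 16)
SMLAL, UMLAL: (64 <= 64 + 32 x 32)
SDIV, UDIV: Rd = S1 / S2 (signed and unsigned division)
Flags: the top 4 bits (N, Z, C, V) of the current program status register (CPSR)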
If an instruction mnemonic is followed by S (e.g. ADDS), then the N and Z condition flags will be set.
Only some instructions (like CMP, ADDS, SUBS) will set the C and V flags.
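To make the flag rules concrete, here is a small C model of how ADDS and SUBS compute N, Z, C and V on 32-bit values. It is a sketch for illustration, not how any real core is implemented; the names Flags, adds and subs are hypothetical. CMP S1, S2 behaves like subs() with the result discarded.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four condition flags (top 4 bits of the CPSR), modelled as booleans. */
typedef struct {
    bool n; /* negative: bit 31 of the result */
    bool z; /* zero: the result is 0 */
    bool c; /* carry: unsigned carry out (ADDS) or "no borrow" (SUBS) */
    bool v; /* overflow: the signed result does not fit in 32 bits */
} Flags;

/* Sketch of ADDS Rd, S1, S2: returns the 32-bit result and updates *f. */
uint32_t adds(uint32_t s1, uint32_t s2, Flags *f) {
    uint32_t r = s1 + s2;                        /* wraps modulo 2^32 */
    f->n = (r >> 31) & 1u;
    f->z = (r == 0);
    f->c = (r < s1);                             /* wrapped around => carry out */
    f->v = ((~(s1 ^ s2) & (s1 ^ r)) >> 31) & 1u; /* same-sign inputs, result sign flipped */
    return r;
}

/* Sketch of SUBS Rd, S1, S2; CMP S1, S2 sets the same flags and discards r. */
uint32_t subs(uint32_t s1, uint32_t s2, Flags *f) {
    uint32_t r = s1 - s2;                        /* wraps modulo 2^32 */
    f->n = (r >> 31) & 1u;
    f->z = (r == 0);
    f->c = (s1 >= s2);                           /* ARM sets C when no borrow occurs */
    f->v = (((s1 ^ s2) & (s1 ^ r)) >> 31) & 1u;  /* different-sign inputs, result sign flipped */
    return r;
}
```

For example, subs(3, 5, &f) leaves N set and C clear, so both LT (signed <) and LO (unsigned <) hold for that comparison.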
Comparison instructions
CMP: compare, CMP R0, R1 sets flags based on R0 - R1
CMN: compare negative, CMN R0, R1 sets flags based on R0 + R1
TST: test, TST R0, R1 sets flags based on R0 and R1
TEQ: test equivalence, TEQ R0, R1 sets flags based on R0 xor R1
If an instruction mnemonic is followed by a condition code (e.g. ADDEQ), then the instruction is executed only if that condition holds for the current flags; otherwise it has no effect.
Conditional mnemonics
EQ: equal
NE: not equal
CS: carry set
CC: no carry
MI: negative
PL: non-negative
VS: overflow
VC: no overflow
HS: unsigned ≥ (same as CS)
LO: unsigned < (same as CC)
HI: unsigned >
LS: unsigned ≤
GE: signed ≥
LT: signed <
GT: signed >
LE: signed ≤
Conditional selection
CSEL RDest, RTrue, RFalse, CM, where CM is the conditional mnemonic evaluated against the currently set flags: RDest = RTrue if CM holds, otherwise RDest = RFalse.
Conditional comparison
CCMP R1, R2, flags, CM: if condition CM holds for the current flags, R1 and R2 are compared (as with CMP); otherwise the flags are set directly to the immediate value flags.
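The condition table, CSEL and CCMP can be modelled together in C. This is a sketch under stated assumptions: the enum Cond and the functions cond_holds, cmp, csel and ccmp are hypothetical names, and the immediate flags operand of CCMP is passed as a Flags value rather than a 4-bit immediate.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } Flags;

/* Hypothetical enum of the condition mnemonics; HS and LO alias CS and CC. */
typedef enum { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE,
               HS = CS, LO = CC } Cond;

/* Returns true if condition cm holds for the flags f. */
bool cond_holds(Flags f, Cond cm) {
    switch (cm) {
    case EQ: return f.z;
    case NE: return !f.z;
    case CS: return f.c;                    /* HS: unsigned >= */
    case CC: return !f.c;                   /* LO: unsigned < */
    case MI: return f.n;
    case PL: return !f.n;
    case VS: return f.v;
    case VC: return !f.v;
    case HI: return f.c && !f.z;            /* unsigned > */
    case LS: return !f.c || f.z;            /* unsigned <= */
    case GE: return f.n == f.v;             /* signed >= */
    case LT: return f.n != f.v;             /* signed < */
    case GT: return !f.z && f.n == f.v;     /* signed > */
    case LE: return f.z || f.n != f.v;      /* signed <= */
    }
    return false;
}

/* Flags produced by CMP a, b, i.e. a - b with the result discarded. */
Flags cmp(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Flags f = { (r >> 31) & 1u, r == 0, a >= b, (((a ^ b) & (a ^ r)) >> 31) & 1u };
    return f;
}

/* CSEL RDest, RTrue, RFalse, CM */
uint32_t csel(Flags f, uint32_t rtrue, uint32_t rfalse, Cond cm) {
    return cond_holds(f, cm) ? rtrue : rfalse;
}

/* CCMP R1, R2, flags, CM: compare only if CM holds, else load the given flags. */
Flags ccmp(Flags f, uint32_t r1, uint32_t r2, Flags imm, Cond cm) {
    return cond_holds(f, cm) ? cmp(r1, r2) : imm;
}
```

A typical use is evaluating (R0 == R1) && (R2 > R3) without a branch: CMP R0, R1, then CCMP R2, R3 with flags chosen so that GT fails (for example only Z set), then CSEL on GT.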
Other conditional instructions
CINC RDest, RSrc, CM: RDest = RSrc + 1 if CM holds, otherwise RDest = RSrc
CINV, CNEG: likewise, with bitwise inversion or negation in place of the increment
Labels are a single word on their own line. They represent the address of the next instruction.
B: branch
BL: branch and link
For non-branching instructions, PC increments by 4 after each instruction. After a branch instruction, PC is changed to contain the address of the next instruction to process.
Branches: if performance is a priority, try to avoid branches. Successive branch instructions are especially inefficient.
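One common way to avoid a branch is to turn a comparison result into a bit mask and select with AND/OR, the same masking idea behind BIC. A C sketch; max_branchy and max_branchless are hypothetical names, and an optimizing compiler may already turn the branchy version into a conditional select such as CSEL.

```c
#include <stdint.h>

/* Branching version: the processor has to predict which way the if goes. */
int32_t max_branchy(int32_t a, int32_t b) {
    if (a > b)
        return a;
    return b;
}

/* Branchless version: turn the comparison into an all-ones or all-zeros mask,
   then select with AND/OR, so there is no branch to mispredict. */
int32_t max_branchless(int32_t a, int32_t b) {
    uint32_t mask = -(uint32_t)(a > b);               /* 0xFFFFFFFF if a > b, else 0 */
    uint32_t picked = ((uint32_t)a & mask) | ((uint32_t)b & ~mask);
    return (int32_t)picked;                           /* assumes two's complement */
}
```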
Operations
LDR: load register, loads a word into a register: LDR Rd, [Rbase, offset]
STR: store register, stores a word into memory: STR Rd, [Rbase, offset]
LDRB, STRB: load, store a byte
LDRH, STRH: load, store a halfword
LDRSB: load a signed byte
LDRSH: load a signed halfword
LDM, STM: load, store multiple words
Definitions
Barrel shifter
The shift operations can be applied to S2 as part of
another instruction.
The barrel shifter can be used to simplify indexing operations:
LSL R2, R1, #2 ; the index/offset, which is a multiple of 4 (bytes)
STR R3, [R0, R2] ; R0 = array, R2 = index, R3 = value to store
; the above can be simplified to
STR R3, [R0, R1, LSL #2]
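In C terms, the shifted-register form STR R3, [R0, R1, LSL #2] computes the address base + (index << 2), which is exactly &array[index] for 4-byte elements. A minimal sketch, assuming 32-bit words; str_indexed is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Models STR R3, [R0, R1, LSL #2]: R0 = array base, R1 = index, R3 = value.
   The barrel shifter scales the index by 4 (LSL #2) before it is added to the base. */
void str_indexed(uint32_t *array, uint32_t index, uint32_t value) {
    unsigned char *address = (unsigned char *)array + ((size_t)index << 2);
    *(uint32_t *)address = value;  /* same location as array[index] */
}
```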
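ARM indexing modes
LDR Rdest, [Rbase, offset] ; offset: address = Rbase + offset, Rbase unchanged
LDR Rdest, [Rbase, offset]! ; pre-index: address = Rbase + offset, then Rbase = Rbase + offset
LDR Rdest, [Rbase], offset ; post-index: address = Rbase, then Rbase = Rbase + offset
Directives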
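DCB (define constant byte) places byte data in memory; the label in front of it names the address of that data.
mydata DCB 0x0, 0x1, 0x2, 0x3, 0x4
LDR R0, =mydata ; R0 = address of mydata
Functions
Aka procedures, subroutines. They take in arguments and output a return value.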
The calling convention is a scheme for the caller and the callee to agree on where to put the args and the return value.
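The callee returns to the point of call (LR) and must not overwrite registers or memory needed by the caller.
The caller uses branch and link (BL) to call a function. BL performs two tasks:
- it stores the return address (the address of the instruction after BL) in the link register (LR)
- it branches to the function
The callee must preserve R4-R11, saving and restoring them if it uses them.
To return, LR is moved to the PC: MOV PC, LR
Branch and exchange (BX) branches to an address contained in a register, so MOV PC, LR can simply be written BX LR.
The stack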
The stack pointer (SP, R13) is just a register that points to (i.e. stores the memory address of) the top of the stack.
The LDR pseudo-instruction allows loading 32-bit constants (from the literal pool in the text segment). It can also load the address of a labelled variable or pointer in the program.
LDR Rd, =literal
LDR Rd, =label
The assembler places the constant/label in a literal pool and
generates a PC-relative LDR instruction that reads the constant from the
literal pool e.g. LDR R1, [PC, #0x2A]
NOP: used to achieve a delay or to memory-align instructions. Translated to MOV R0, R0.
Exceptions: an exception is an unscheduled function call that branches to a new address. It can be caused by hardware or software, for example:
- a hardware exception triggered by I/O, which is called an interrupt
- reading non-existent memory
The program then branches to exception-handling code in the OS.
Saturating arithmetic: QADD, QDADD, QDSUB, QSUB
When overflow happens, the result is clamped (saturated) to the most positive or most negative number instead of wrapping around. ARM has a sticky Q flag to indicate that saturation has occurred.
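A C sketch of the saturating behaviour of QADD and QSUB, assuming 32-bit signed operands. The Q flag is modelled as a bool that is only ever set, mirroring its sticky behaviour; saturate32, qadd and qsub are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clamp an exact 64-bit result into the signed 32-bit range, setting *q on saturation. */
static int32_t saturate32(int64_t x, bool *q) {
    if (x > INT32_MAX) { *q = true; return INT32_MAX; }  /* most positive number */
    if (x < INT32_MIN) { *q = true; return INT32_MIN; }  /* most negative number */
    return (int32_t)x;  /* fits: *q is left unchanged (the Q flag is sticky) */
}

/* Sketch of QADD Rd, S1, S2: Rd = saturate(S1 + S2). */
int32_t qadd(int32_t s1, int32_t s2, bool *q) {
    return saturate32((int64_t)s1 + s2, q);  /* the 64-bit sum cannot overflow */
}

/* Sketch of QSUB Rd, S1, S2: Rd = saturate(S1 - S2). */
int32_t qsub(int32_t s1, int32_t s2, bool *q) {
    return saturate32((int64_t)s1 - s2, q);
}
```

Because the flag is sticky, a later non-saturating qadd leaves a previously set q true, so a whole sequence of operations can be checked for saturation once at the end.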
The memory address space is divided into:
In detail:
The ARM architecture has a weakly ordered memory model: the processor can reorder memory access instructions to reduce the time spent waiting on memory.
Memory accesses are therefore not guaranteed to be performed in the order in which they are written; they may instead be performed in an order that depends on the access cost of each one. This does not affect a single-threaded program, because a core always observes its own accesses in program order, but it can break a multi-threaded program running on a multicore machine, where another core may observe the accesses in a different order.
In such situations, there are instructions, called “memory barriers”,
that tell processors not to re-arrange memory access at a given point.
The dmb (Data Memory Barrier) instruction in ARM64 prevents
reordering of data accesses (loads, stores) across the DMB
instruction.
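Application code rarely writes DMB by hand; a common route is C11 fences, which compilers typically lower to DMB instructions on ARM. The sketch below is a classic message-passing pattern, assuming POSIX threads are available; payload, ready, producer, consumer and run_message_passing are hypothetical names. Without the two fences, a weakly ordered core could let the consumer see ready == true and still read the old payload.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload = 0;            /* ordinary, non-atomic data */
static atomic_bool ready = false;  /* flag that publishes the data */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;                                              /* 1. write the data */
    atomic_thread_fence(memory_order_release);                 /* payload write stays before the store to ready */
    atomic_store_explicit(&ready, true, memory_order_relaxed); /* 2. publish it */
    return NULL;
}

static void *consumer(void *arg) {
    while (!atomic_load_explicit(&ready, memory_order_relaxed))
        ;                                                      /* spin until published */
    atomic_thread_fence(memory_order_acquire);                 /* payload read stays after the load of ready */
    *(int *)arg = payload;                                     /* guaranteed to read 42 */
    return NULL;
}

/* Runs one producer and one consumer once; returns the value the consumer saw. */
int run_message_passing(void) {
    int seen = 0;
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```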
if (R0 == R1)
    // do if
// end if

    CMP R0, R1
    BNE DONE
    ; do if
DONE
if (R0 == R1)
    // do if
else
    // do else
// end if

    CMP R0, R1
    BNE ELSE
    ; do if
    B DONE
ELSE
    ; do else
DONE
switch (R0) {
case R1:
    // do R1 case
    break;
case R2:
    // do R2 case
    break;
default:
    // do default
}

CASE1
    CMP R0, R1
    BNE CASE2
    ; do R1 case
    B DONE
CASE2
    CMP R0, R2
    BNE DEFAULT
    ; do R2 case
    B DONE
DEFAULT
    ; do default
DONE
while (R0 == R1)
    // do while

WHILE
    CMP R0, R1
    BNE DONE
    ; do while
    B WHILE
DONE
do {
    // do dowhile
} while (R0 == R1);

DOWHILE
    ; do dowhile
    CMP R0, R1
    BEQ DOWHILE
DONE
for (int R0 = 0; R0 < R1; R0++)
    // do for

    MOV R0, #0
FOR
    CMP R0, R1
    BGE DONE
    ; do for
    ADD R0, R0, #1
    B FOR
DONE
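The branch structure of the for loop can be mirrored in C with goto, which makes the one-to-one correspondence with the labels FOR and DONE visible. A sketch; for_loop_with_goto is a hypothetical name, and its body (adding R0 to a running sum) is a hypothetical stand-in for ; do for.

```c
/* Mirrors the assembly for the for loop, label for label. */
int for_loop_with_goto(int r1) {
    int sum = 0;
    int r0 = 0;           /* MOV R0, #0 */
FOR:
    if (r0 >= r1)         /* CMP R0, R1 then BGE DONE (signed >=) */
        goto DONE;
    sum += r0;            /* ; do for */
    r0 = r0 + 1;          /* ADD R0, R0, #1 */
    goto FOR;             /* B FOR */
DONE:
    return sum;
}
```

Because BGE is a signed comparison, a negative R1 skips the body entirely, just as the C for loop does.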
int array[] = {0, 1, 2};

    MOV R0, #0x10000 ; the base address for array (e.g.)
    MOV R1, #0
    STR R1, [R0, #0] ; array[0] = 0
    MOV R1, #1
    STR R1, [R0, #4] ; array[1] = 1
    MOV R1, #2
    STR R1, [R0, #8] ; array[2] = 2
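// ...
int y = 3;
int z = double_it(y);
// ...
int double_it(int x) {
    return 2 * x;
}

    MOV R0, #3       ; y = 3, passed to the function in R0
    BL DOUBLE_IT     ; call; LR = return address
    MOV R1, R0       ; z = return value (R1 holds z here)
    ; ...
DOUBLE_IT
    ADD R0, R0, R0   ; i.e. 2 * R0
    MOV PC, LR       ; return to the caller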
Thumb instructions are 16 bits long. This allows for
But this comes at the expense of:
ARM (A32/A64) instructions have a fixed-length 4-byte encoding, in contrast to x86, which has a variable-length encoding. A return instruction (ret) on x86 can be as short as 1 byte, but on ARM64 it is always 4 bytes long.