x86-64 assembly language reference
x86-64 machine code is the native language of the processors in most desktop and laptop computers. x86-64 assembly language is a human-readable version of this machine code.
x86-64 has hundreds of instructions, and compiling programs to the most efficient machine code requires a good understanding of all of them; indeed, the fastest C compiler for x86-64 processors is developed by Intel! However, we'll be able to develop a perfectly functional compiler using only a small subset of x86-64 instructions.
This is a guide to that subset of x86-64, and to the OCaml library we have provided to produce x86-64 instructions.
x86-64 assembly language
The assembly programs produced by our compiler have the following form:

    ;; frontmatter: global, etc.
    entry:
            ;; instructions
    label1:
            ;; more instructions
    label2:
            ;; more instructions

At the top of the file are some special directives to the assembler, telling it which labels should be visible from outside the file (for now, just the special `entry` label). After that, each line is either a label, which indicates a position in the program that other lines can reference, or an instruction, which actually tells the processor what to do.
In this class, your compiler won't emit assembly code directly. Instead, you'll use an OCaml library developed by the course staff. This library handles some differences between operating systems and idiosyncrasies of x86-64. The rest of this document focuses on the library.
Directives
The main interface to our OCaml library is the `directive` type. A `directive` corresponds to a single line of assembly code; we will produce a `.s` file from a list of these directives. Directives, therefore, correspond directly to frontmatter declarations, labels, and instructions.
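
As a rough illustration of how a list of directives could map onto the lines of a `.s` file, the sketch below defines a small stand-in for the library's types and renders a four-line program. The constructor names are taken from the directive reference in this document; the `register` type, `string_of_directive`, and `render` are assumptions made for this example and are not the course library's actual API.

```ocaml
(* Stand-in types: a tiny subset mirroring the directive reference.
   The real library's definitions may differ. *)
type register = Rax | R8

type operand =
  | Reg of register
  | Imm of int

type directive =
  | Global of string
  | Label of string
  | Mov of (operand * operand)
  | Ret

let string_of_register = function
  | Rax -> "rax"
  | R8 -> "r8"

let string_of_operand = function
  | Reg r -> string_of_register r
  | Imm n -> string_of_int n

(* Hypothetical renderer: each directive becomes exactly one line. *)
let string_of_directive = function
  | Global l -> "global " ^ l
  | Label l -> l ^ ":"
  | Mov (dst, src) ->
      Printf.sprintf "    mov %s, %s"
        (string_of_operand dst) (string_of_operand src)
  | Ret -> "    ret"

(* Joins the rendered lines into the text of an assembly file. *)
let render (prog : directive list) : string =
  String.concat "\n" (List.map string_of_directive prog)

(* Frontmatter, a label, and two instructions. *)
let program = [ Global "entry"; Label "entry"; Mov (Reg Rax, Imm 42); Ret ]

let () = print_endline (render program)
```

Keeping the program as an ordinary OCaml list means a compiler can build it with the usual list operations and turn it into text only at the very end.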
Operands
Many directives take one or more arguments. For most instructions, these arguments are instances of the `operand` type. An operand can be any one of:
- A register, written `Reg <register>` (for instance, `Reg Rax` or `Reg R8`).
- An "immediate" numerical constant value, written `Imm <num>`.
- An offset into memory defined by an additional two operands. For instance, `MemOffset(Reg Rsp, Reg Rax)` refers to the memory location at `rsp + rax`.

Some directives (jumps, for instance) take a `string` naming a label instead of an `operand`.
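
To make the three operand forms concrete, here is a minimal sketch that models them as an OCaml variant and prints each in the bracketed style used by the example assembly in this document (for instance `[rsp + -8]`). The type below is a stand-in written for this example; `string_of_operand` and `stack_slot` are hypothetical helpers, not functions the library is known to provide.

```ocaml
(* Stand-in types for this example; the library's own may differ. *)
type register = Rax | Rsp | R8

type operand =
  | Reg of register
  | Imm of int
  | MemOffset of (operand * operand)

let string_of_register = function
  | Rax -> "rax"
  | Rsp -> "rsp"
  | R8 -> "r8"

(* A memory operand is printed as the sum of its two parts in brackets. *)
let rec string_of_operand = function
  | Reg r -> string_of_register r
  | Imm n -> string_of_int n
  | MemOffset (base, offset) ->
      Printf.sprintf "[%s + %s]"
        (string_of_operand base) (string_of_operand offset)

(* Hypothetical helper: the n-th 8-byte slot below the stack pointer. *)
let stack_slot (n : int) : operand = MemOffset (Reg Rsp, Imm (-8 * n))

let () =
  List.iter
    (fun o -> print_endline (string_of_operand o))
    [ Reg R8; Imm 15; MemOffset (Reg Rsp, Reg Rax); stack_slot 2 ]
```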
Register conventions
- The `ret` directive expects to find a memory location stored in the register `rsp`. At this location in memory, it expects to find a return address, that is, the address of the next assembly directive to execute.
- Function calls expect to find their first argument located in the register `rdi`.
- When a function terminates, its return value (if it has one) is expected to be located in the register `rax`.
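
The `rdi` and `rax` conventions can be traced with a toy example. The sketch below writes a hypothetical `add_one` function as a directive list and runs it through a very small interpreter that tracks only those two registers. The types are stand-ins for the library's, the interpreter is an assumption made purely for illustration, and it is not a faithful model of the processor.

```ocaml
(* Stand-in types mirroring a few constructors from the directive
   reference; the course library's real definitions may differ. *)
type register = Rax | Rdi

type operand =
  | Reg of register
  | Imm of int

type directive =
  | Label of string
  | Mov of (operand * operand)
  | Add of (operand * operand)
  | Ret

(* Hypothetical function: returns its first argument plus one.
   The argument arrives in rdi; the result is left in rax before ret. *)
let add_one : directive list =
  [ Label "add_one"; Mov (Reg Rax, Reg Rdi); Add (Reg Rax, Imm 1); Ret ]

(* Toy register file. OCaml's int is 63 bits wide on 64-bit systems,
   so this only traces the convention, not real 64-bit arithmetic. *)
type state = { rax : int; rdi : int }

let read st = function
  | Reg Rax -> st.rax
  | Reg Rdi -> st.rdi
  | Imm n -> n

let write st dst v =
  match dst with
  | Reg Rax -> { st with rax = v }
  | Reg Rdi -> { st with rdi = v }
  | Imm _ -> invalid_arg "cannot write to an immediate"

(* Runs straight-line code up to ret and reports what the caller
   would find in rax. *)
let rec run st = function
  | [] | Ret :: _ -> st.rax
  | Label _ :: rest -> run st rest
  | Mov (dst, src) :: rest -> run (write st dst (read st src)) rest
  | Add (dst, src) :: rest ->
      run (write st dst (read st dst + read st src)) rest

let () = Printf.printf "%d\n" (run { rax = 0; rdi = 41 } add_one)
```

Calling `run` with `rdi = 41` plays the role of a caller passing 41 as the first argument; the value it reads back from `rax` is 42.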
Directive reference
This table will be updated as the class progresses and homeworks require additional assembly directives. Notes on some instructions are below, as indicated.
Directive | Example asm | Description | Notes |
---|---|---|---|
`Global of string` | `global entry` | Tells the assembler to export a label | |
`Section of string` | `section .text` | Writes to a segment in the generated binary | |
`Label of string` | `label:` | Labels a program location | |
`DqLabel of string` | `dq label1` | Writes the address of a particular label | DqLabel |
`LeaLabel of (operand * string)` | `lea rax, [label1]` | Loads a label's address into a register | LeaLabel |
`Mov of (operand * operand)` | `mov rax, [rsp + -8]` | Moves data between locations | |
`Add of (operand * operand)` | `add r8, rsp` | Adds its arguments, storing the result in the first one | |
`Sub of (operand * operand)` | `sub rax, 4` | Subtracts its second argument from its first, storing the result in its first | |
`Div of operand` | `idiv r8` | Divides the signed 128-bit integer `rdx:rax` by its argument, storing the result in `rax` | Div and Mul |
`Mul of operand` | `imul [rsp + -8]` | Multiplies `rax` by its argument, storing the result in `rdx:rax` | Div and Mul |
`Cqo` | `cqo` | Sign-extends `rax` into `rdx` | |
`Shl of (operand * operand)` | `shl rax, 2` | Shifts its first argument left by its second argument | |
`Shr of (operand * operand)` | `shr rax, 3` | Shifts its first argument right by its second argument, padding with zeroes on the left | |
`Sar of (operand * operand)` | `sar rax, 3` | Shifts its first argument right by its second argument, padding with zeroes or ones to maintain the sign | Sar |
`Cmp of (operand * operand)` | `cmp r8, [rsp + -16]` | Compares its two arguments, setting RFLAGS | |
`And of (operand * operand)` | `and rax, r8` | Does a bitwise AND of its arguments, storing the result in its first argument | |
`Or of (operand * operand)` | `or r8, 15` | Does a bitwise OR of its arguments, storing the result in its first argument | |
`Setz of operand` | `setz al` | Sets its one-byte argument to the current value of ZF | Setz and al |
`Setl of operand` | `setl al` | Sets its one-byte argument to the current value of (SF != OF) | Setl |
`Jmp of string` | `jmp label1` | Jumps execution to the given label | |
`Je of string` | `je label1` | Jumps execution to the given label if ZF is set | Jumps |
`Jz of string` | `jz label1` | Jumps execution to the given label if ZF is set | same as Je |
`Jne of string` | `jne label1` | Jumps execution to the given label if ZF is not set | Jumps |
`Jnz of string` | `jnz label1` | Jumps execution to the given label if ZF is not set | same as Jne |
`Jl of string` | `jl label1` | Jumps execution to the given label if SF != OF | Jumps |
`Jnl of string` | `jnl label1` | Jumps execution to the given label if SF == OF | Jumps |
`Jg of string` | `jg label1` | Jumps execution to the given label if SF == OF AND !ZF | Jumps |
`Jng of string` | `jng label1` | Jumps execution to the given label if SF != OF OR ZF | Jumps |
`ComputedJmp of operand` | `jmp rax` | Jumps to the location in the given operand | |
`Ret` | `ret` | Returns control to the calling function | |
`Comment of string` | `;; helpful comment` | A comment | |

- DqLabel

  `DqLabel "label1"` writes the address of the given label into the program as data (`dq` is short for "data, quad-word"). You can then load this address with a `mov` instruction. You should make sure that your program's execution never gets to this directive: it's just data, not an instruction!

- LeaLabel

  `LeaLabel (Reg Rax, "label1")` loads the address of the given label into a register. You'll use this when doing a computed jump, or when trying to load data from a given label (e.g., in combination with `DqLabel`).

- Div and Mul

  `Div` and `Mul` work differently from `Add` and `Sub`. Because multiplying two 64-bit numbers will frequently overflow, the result of `imul` is stored in `rdx:rax` as a 128-bit number. Our compiler doesn't handle overflow, so you don't need to worry about this for multiplication; however, `idiv` does the inverse operation, dividing `rdx:rax` by its argument. If you just want to divide `rax`, you'll need to sign-extend `rax` into `rdx` with the `cqo` instruction. This sets `rdx` to all 0s if `rax` is positive or zero and all 1s if `rax` is negative.

  Finally, neither `Div` nor `Mul` can take an immediate value as its argument; the argument must be either a register or a memory offset.

- Sar

  `Sar` does an arithmetic right-shift, which maintains the sign of its argument while shifting it to the right.

- Setz

  `Setz(Reg Rax)` sets the low (least-significant) byte of `rax` to `0b00000001` if `ZF` is set and to `0b00000000` otherwise. In assembly it actually looks like `setz al`, because `al` is the name for the low byte of `rax`. The OCaml assembly library takes care of this for you.

- Setl

  `setl` is short for "set if less." Just as `Setz` sets its argument to 1 if the last `cmp` instruction compared equal arguments, `setl` sets its argument to 1 if, in the last `cmp` instruction, the first operand was less than the second. So

      cmp r8, 40
      setl al

  will set the low byte of `rax` to 1 if `r8` is less than `40`.

  This works because `cmp arg1, arg2` sets several flags:

  - `ZF` if `arg1 - arg2 = 0`
  - `SF` if the (possibly wrapped) result of `arg1 - arg2` is negative
  - `OF` if `arg1 - arg2` overflows

  `setl` sets its argument to 1 if `SF != OF`, which means that the signed value `arg1` is less than the signed value `arg2`.

  Most of the time, you won't need to worry about the specific flags. Just do a `cmp` instruction and use the `set` (or `j`, see below) instruction with the right mnemonic.

- Conditional jumps

  `je` and friends jump to the specified label if their condition is true. The mnemonics work as explained above. For instance:

      cmp r8, 40
      jng label1

  will jump to `label1` if the value in `r8` is Not Greater than 40.
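
The flag conditions in the notes above can be checked with a small model. The sketch below is a simplified simulation written for this document, not part of the course library: `cmp` computes the three flags for 64-bit signed values, and `setz`, `setl`, `jg`, and `jng` are hypothetical predicates that evaluate the conditions listed in the directive reference.

```ocaml
(* Simplified model of the three flags that cmp sets. The field o_f
   stands for OF, because "of" is an OCaml keyword. *)
type flags = { zf : bool; sf : bool; o_f : bool }

let is_negative (x : int64) : bool = Int64.compare x 0L < 0

(* cmp a b computes a - b with 64-bit wraparound and keeps only flags. *)
let cmp (a : int64) (b : int64) : flags =
  let diff = Int64.sub a b in
  { zf = Int64.equal diff 0L;
    sf = is_negative diff;
    (* Signed subtraction overflows when the operands have different
       signs and the wrapped result's sign differs from the first's. *)
    o_f = (is_negative a <> is_negative b)
          && (is_negative diff <> is_negative a) }

(* Conditions as given in the directive reference. *)
let setz f = f.zf
let setl f = f.sf <> f.o_f
let jg f = (f.sf = f.o_f) && not f.zf
let jng f = (f.sf <> f.o_f) || f.zf

let () =
  (* An ordinary comparison, then one where the subtraction overflows. *)
  Printf.printf "%b %b\n"
    (setl (cmp 5L 40L))
    (setl (cmp Int64.min_int 1L))
```

The second comparison is the interesting case: `min_int - 1` wraps around to a positive number, so `SF` is clear, but `OF` is set, and `SF != OF` still reports that the first operand is smaller.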