Unary operations
Today we'll work on extending the language supported by our compiler with a couple of unary operations on natural numbers. In particular:
(add1 x)
adds one tox
(sub1 x)
subtracts one fromx
We'll also talk about testing our compiler, and learn more about the semantics of x86-64 assembly.
Interpreting Assembly
Let's look at the assembly code our compiler produces when we compile the program 17
.
global entry entry: mov rax, 17 ret
Each of these four lines is an assembly directive.
global entry
: We want theentry
label to be available outside this program, since we're callingentry
from our runtime.entry:
The label itself, indicating that when someone tries to run the programentry
, they should run the code below.mov
: This is an instruction directive.rax
is a register, a location in memory where you store numbers. There aren't many registers available for us to use.17
is an immediate value, what we call a numeral in this context.mov rax, 17
says to store the value 17 in the specified register. We could also, for example, move a value from one register to another. This same pattern "name, desitination, source" will come up a lot!
ret
: This means "return control to the caller function." By convention, the registerrax
holds the value being returned.
Adding operations
Here's the compiler we ended up with after last class:
compile.ml
let compile (program : s_exp) : string = match program with | Num n -> String.concat "\n" [ "global entry"; "entry:"; Printf.sprintf "\tmov rax, %d" n; "\tret"; ] | e -> raise (BadExpression e)
Before we extend the compiler to support new operations, we're going to
make a couple of changes. First: right now, the compiler will only work
on Linux, not MacOS; on Linux, labels shouldn't have underscores,
whereas on MacOS they must. Relatedly: right now, it's sort of hard to
read! The mov
and ret
instructions both have tab characters,
there's that sprintf
call, and so on.
Just as we introduced a more usable representation for s-expressions,
we'll use an OCaml type for assembly language as well.
See the lecture capture for the details of how our Asm
library works.
Using the Asm
library, we can rewrite our compiler:
compile.ml
open Asm (* some definitions elided... *) let compile (exp : s_exp) : string = match exp with | Num n -> [Global "entry"; Label "entry"; Mov (Reg Rax, Imm n); Ret] |> List.map string_of_directive |> String.concat "\n" | _ -> raise (BadExpression exp)
This code uses the "pipeline" operator |>
. This operator gives us a
cleaner way of calling functions in OCaml; x |> f
is exactly the same
as f x
. It lets us write expressions from "inner" to "outer" rather
than the other way around; the pipeline expression in the code above
could be replaced with
String.concat "\n" (List.map string_of_directive [Global "entry"; Label "entry"; Mov (Reg Rax, Imm n); Ret])
Notice the other thing we're doing here: partial application. We can provide some of the arguments to an OCaml function and get back a function we can apply to the rest of the arguments. For instance, since
List.map : ('a -> 'b) -> 'a list -> 'b list
we know that
List.map string_of_directive : -> directive list -> string list
OK: now we're ready to add support for add1
and sub1
. These are both
unary operations: they take one argument. Each of them takes in an
integer and returns an integer (integers are still the only type our
language supports). We're starting with unary operations because we'll
be able to implement them using only a single register (namely our
friend rax
).
Let's start with the add1
operation. What code should our compiler
produce for, say, (add1 50)
?. How about something like this:
global entry entry: mov rax, 50 add rax, 1 ret
So, let's write some code. We know we're going to need to match against these new operations.
compile.ml
let compile (exp: s_exp) : string = match exp with | Num n -> [Global "entry"; Label "entry"; Mov (Reg Rax, Imm n); Ret] |> List.map string_of_directive |> String.concat "\n" | Lst [Sym "add1"; arg] -> (* what should we put here? *) | _ -> raise (BadExpression exp)
Once we've done that, we get a bit stuck. We're going to need to include the "global entry", "entry:", and "ret" directives again, which seems repetitive. We're also going to need to do something with the argument to our operation.
Let's reorganize our compiler one more time:
compile.ml
let compile_exp (exp: s_exp) : directive list = match exp with | Num n -> [Mov (Reg Rax, Imm n)] | Lst [Sym "add1"; arg] -> (* what should we put here? *) | _ -> raise (BadExpression exp) let compile (program : s_exp) : string = [Global "entry"; Label "entry"] @ compile_exp program @ [Ret] |> List.map string_of_directive |> String.concat "\n"
We now have a helper function compile_exp
in which we don't need to
worry about "boilerplate" like our label and our return instruction.
Now, we can complete our support for add1
:
compile.ml
let rec compile_exp (exp: s_exp) : directive list = match exp with | Num n -> [Mov (Reg Rax, Imm n)] | Lst [Sym "add1"; arg] -> compile_exp arg @ [Add (Reg Rax, Imm 1)] | _ -> raise (BadExpression exp)
In the add1
case, we first recursively compile the argument to the
operation. We don't know what it is–it could be a number, or another
unary operation. Regardless, we know what our compiler is going to do:
it's going to produce a sequence of instructions that will put the value
of that expression in rax
.
We can add sub1
support similarly:
compile.ml
let rec compile_exp (exp: s_exp) : directive list = match exp with | Num n -> [Mov (Reg Rax, Imm n)] | Lst [Sym "add1"; arg] -> compile_exp arg @ [Add (Reg Rax, Imm 1)] | Lst [Sym "sub1"; arg] -> compile_exp arg @ [Sub (Reg Rax, Imm 1)] | _ -> raise (BadExpression exp)
OK–this looks reasonable! But how do we know it's right?
Correctness and testing
If we take the long view, programmers have–to put it gently–a pretty mixed track record vis-a-vis bugs; writing bug-free software is really hard! At the same time, it's quite important to ensure that compilers are correct. If your compiler has a bug, it could impact anyone writing programs in your language and is likely to be very hard for end-users to track down. So: what can we do to make sure that our software is correct?
We have a number of options, all of which have been applied (to varying degrees of success) to compilers:
- Hand-written correctness arguments
- Testing on particular programs
- Testing different compilers against each other
- Formal verification
For this class, our approach is going to be as follows. As we grow our language, we're going to add features both to our compiler and to a definitional interpreter. This is an interpreter for our language where we don't care about performance at all; all we care about is that it's correct. Indeed, this interpreter will serve as the definition of our language.
Here's such an interpreter for the language we have so far:
let rec interp_exp (exp : s_exp) : int = match exp with | Num n -> n | Lst [Sym "add1"; arg] -> (interp_exp arg) + 1 | Lst [Sym "sub1"; arg] -> (interp_exp arg) - 1 | _ -> raise (BadExpression exp) let interp (program : string) = string_of_int (interp_exp (parse program))
Notice that our interpreter has a pretty similar structure to our compiler. We'll see this throughout the semester.
We asked in the first lecture about the "type" of an interpreter compared to the "type" of a compiler, and concluded that interpreters produce "values." Notice that here, the values are integers. Why did we choose that? How long will that last us?
Now that we have an interpreter, we can test our compiler against it:
let difftest (examples : string list) = let results = List.map (fun ex -> (compile_and_run ex, interp ex)) examples in List.for_all (fun (r1, r2) -> r1 = r2) results let test () = difftest ["43"; "(add1 (add1 3))"; "(sub1 4)"]
This isn't a very fancy differential tester. In particular, it just checks that our compiler and interpreter have the same output -- what if we expect a certain output from both and want to specify that? On the homeworks we'll provide a fancier differential tester, with better error reporting and the option to provide expected values.