Interacting with the environment

Up until this point, our compiler has had a curious property. Here's another compiler we could have written for our language:

compile.ml

let rec compile_value (stack_index : int) (v : Interp.value) =
  match v with
  | Number n ->
      [Mov (Reg Rax, operand_of_num n)]
  | Boolean b ->
      [Mov (Reg Rax, operand_of_bool b)]
  | Pair (v1, v2) ->
      compile_value stack_index v1
      @ [Mov (stack_address stack_index, Reg Rax)]
      @ compile_value (stack_index - 8) v2
      @ [ Mov (Reg R8, stack_address stack_index)
        ; Mov (MemOffset (Reg Rdi, Imm 0), Reg R8)
        ; Mov (MemOffset (Reg Rdi, Imm 8), Reg Rax)
        ; Mov (Reg Rax, Reg Rdi)
        ; Or (Reg Rax, Imm pair_tag)
        ; Add (Reg Rdi, Imm 16) ]

let compile (program : expr) : string =
  [Global "entry"; Label "entry"]
  @ compile_value (-8) (Interp.interp_exp Symtab.empty program)
  @ [Ret]
  |> List.map string_of_directive
  |> String.concat "\n"

That's quite a bit simpler! It's even a better compiler: the programs it outputs are short and execute very efficiently. Given that it has been much easier to write our interpreter than to write our compiler, why not just do this? Why does anyone work on compilers?

The answer, of course, is that most programs don't just compute the answer to a fixed expression! We usually write programs because we want to do the same operation multiple times on different inputs. However, in our current language, there's no way to get an input from the outside world. Let's fix that now!

Adding input to the AST

We want to add the language construct (read-num) to our input language. Notice a subtlety here: unlike true and false, which are values, read-num is a function, so we enclose it in parentheses to "call" it. It corresponds to the S-expression Lst [Sym "read-num"].

We'll introduce a type of 0-ary primitives in our AST.

ast.ml

type prim0 = ReadNum

let prim0_of_string (s : string) : prim0 option =
  match s with "read-num" -> Some ReadNum | _ -> None

type expr =
...
  | Prim0 of prim0
...

let rec expr_of_s_exp (e : s_exp) : expr =
  match e with
...
  | Lst [Sym op] when Option.is_some (prim0_of_string op) ->
      Prim0 (Option.get (prim0_of_string op))
...
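
To see the parsing step in isolation, here is a minimal self-contained sketch. It makes several assumptions that are not from the notes: the s_exp type is reduced to three constructors (Lst and Sym appear above; Num is assumed), expr is trimmed to two cases with a hypothetical Number constructor, and BadExpression is a hypothetical exception. It also matches on the option directly instead of using Option.is_some and Option.get; that is only a stylistic variation, not the version used in ast.ml.

```ocaml
(* Assumed minimal S-expression type. Lst and Sym appear in the notes;
   Num is an assumption made for this sketch. *)
type s_exp = Num of int | Sym of string | Lst of s_exp list

type prim0 = ReadNum

let prim0_of_string (s : string) : prim0 option =
  match s with "read-num" -> Some ReadNum | _ -> None

(* Trimmed-down AST: Number is a hypothetical constructor name. *)
type expr = Number of int | Prim0 of prim0

(* Hypothetical exception for S-expressions this sketch cannot parse. *)
exception BadExpression of s_exp

let expr_of_s_exp (e : s_exp) : expr =
  match e with
  | Num n -> Number n
  | Lst [Sym op] -> (
      (* A one-element list is a call to a 0-ary primitive. *)
      match prim0_of_string op with
      | Some p -> Prim0 p
      | None -> raise (BadExpression e))
  | _ -> raise (BadExpression e)
```

One design difference worth noting: with the guard form used in ast.ml, a one-element list that is not a known primitive falls through to later cases of the match, whereas the nested match in this sketch rejects it immediately.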

Adding input to the interpreter

interp.ml

let rec interp_exp env (exp : expr) : value =
  match exp with
  | Prim0 ReadNum ->
      Number (input_line stdin |> int_of_string)
  (* ... *)

Simple enough! We get slightly weird behavior if we do this, though (at least on my machine):

> interp "(pair (read-num) (read-num))";;
1
2
(pair 2 1)

What's going on here? As it turns out, the problem isn't with read-num; it's with pair!

interp.ml

let rec interp_exp env (exp : expr) : value =
  match exp with
  (* ... *)
  | Prim2 (Pair, e1, e2) ->
      Pair (interp_exp env e1, interp_exp env e2)
  (* ... *)

We're calling interp_exp twice, and each of those ends up reading input. But it seems like the second one is happening first!

The order in which OCaml evaluates arguments to functions (in this case, the function that constructs a pair) is actually unspecified: the implementation can evaluate them in whatever order it likes. This often doesn't matter: for pure expressions like most of the ones we've been dealing with, the result is the same no matter when they are evaluated. Now, though, we've introduced a side effect: reading input. It really does matter now when those calls to interp_exp get evaluated!

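Our compiler evaluates the arguments to binary operations like pair in left-to-right order. We'll do the same in the interpreter by binding each result with let, which forces the first call to finish before the second begins: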

interp.ml

let rec interp_exp env (exp : expr) : value =
  match exp with
  (* ... *)
  | Prim2 (Pair, e1, e2) ->
      let l = interp_exp env e1 in
      let r = interp_exp env e2 in
      Pair (l, r)
  (* ... *)


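(* string_of_value is assumed to be the value printer defined elsewhere in
   interp.ml; without it the result would be a value, not a string. *)
let interp (program : string) : string =
  interp_exp Symtab.empty (parse program) |> string_of_value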
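
The effect of sequencing with let can be observed without any real input. The sketch below uses a hypothetical next_input function (a counter standing in for read-num; it is not part of the interpreter) and compares a pair built directly from two calls with one built from let-bound results.

```ocaml
(* Hypothetical stand-in for read-num: returns 1, 2, 3, ... on successive
   calls, so we can see which call happened first. *)
let counter = ref 0

let next_input () : int =
  incr counter;
  !counter

(* The evaluation order of the two components is unspecified, so this may
   yield (1, 2) or (2, 1) depending on the OCaml implementation. *)
let unordered_pair () : int * int = (next_input (), next_input ())

(* let-bindings are evaluated in sequence: l is always read before r. *)
let ordered_pair () : int * int =
  let l = next_input () in
  let r = next_input () in
  (l, r)

let () =
  counter := 0;
  let a, b = unordered_pair () in
  Printf.printf "unordered: (%d, %d)\n" a b;
  counter := 0;
  let l, r = ordered_pair () in
  (* Always prints "ordered: (1, 2)". *)
  Printf.printf "ordered: (%d, %d)\n" l r
```

Only the second line of output is guaranteed; the first depends on the compiler, which is exactly the behavior seen with (pair (read-num) (read-num)).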

Adding input to the compiler

First, we'll add a function to the C runtime to read a number:

runtime.c

uint64_t read_num() {
  int r;
  scanf("%d", &r);
  return (uint64_t)(r) << num_shift;
}

We'll need to let the assembler know that the read_num label is defined in the runtime:

compile.ml

let compile (program : expr) : string =
  [ Global "entry"
  ; Extern "error"
  ; Extern "read_num"
  ; Label "entry" ]
  @ compile_exp Symtab.empty (-8) program
  @ [Ret]
  |> List.map string_of_directive
  |> String.concat "\n"

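Now we need to actually compile the (read-num) form into a call to this function. In our error-handling code, we were able to "call" C functions just by jumping to the right label. Fundamentally that's still what's going to happen, but we need to do some additional work to make sure our program can keep executing after the function returns. The callee is free to overwrite rdi, which holds our heap pointer, so we save it on the stack first. We also have to move rsp past our stack-allocated values, to a 16-byte-aligned address, before the call, and restore it afterward.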

Here's the code we end up with:

compile.ml

let align_stack_index (stack_index : int) : int =
  if stack_index mod 16 = -8 then stack_index else stack_index - 8

let rec compile_exp (tab : int symtab) (stack_index : int) (exp : expr) :
    directive list =
  match exp with
  | Prim0 ReadNum ->
      [ Mov (stack_address stack_index, Reg Rdi)
      ; Add (Reg Rsp, Imm (align_stack_index stack_index))
      ; Call "read_num"
      ; Sub (Reg Rsp, Imm (align_stack_index stack_index))
      ; Mov (Reg Rdi, stack_address stack_index) ]
  (* ... *)

See the lecture capture for an explanation of why this works, including some worked examples.
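
As a small supplement, the arithmetic behind align_stack_index can be checked directly. The sketch below assumes the System V AMD64 convention that rsp is congruent to 8 modulo 16 on entry to our code (the caller's call instruction pushed an 8-byte return address onto an aligned stack) and that rsp must be a multiple of 16 at the moment of the next call. The entry_rsp value and the rsp_at_call helper are hypothetical and exist only for this illustration.

```ocaml
let align_stack_index (stack_index : int) : int =
  if stack_index mod 16 = -8 then stack_index else stack_index - 8

(* Hypothetical value of rsp on entry; only its low four bits matter.
   Assumption: rsp mod 16 = 8 when our code starts running. *)
let entry_rsp = 0x7fff_0008

(* rsp at the moment of the Call instruction, after the Add in compile_exp. *)
let rsp_at_call (stack_index : int) : int =
  entry_rsp + align_stack_index stack_index

let () =
  List.iter
    (fun si ->
      let adjusted = align_stack_index si in
      (* The saved rdi lives at entry_rsp + si, which is 0 or 8 bytes above
         the adjusted rsp, so the return address pushed by Call (just below
         the adjusted rsp) cannot overwrite it. *)
      Printf.printf "stack_index = %4d  adjusted = %4d  rsp mod 16 = %d  gap = %d\n"
        si adjusted (rsp_at_call si mod 16) (si - adjusted))
    [-8; -16; -24; -32]
```

For stack indices -8, -16, -24 and -32 the adjusted offsets are -8, -24, -24 and -40. In each case rsp is a multiple of 16 at the call, and the slot holding the saved rdi sits at or just above the adjusted stack pointer.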