Booleans

Today we're going to add booleans to our interpreter and our compiler. Specifically, we're going to add support for these expressions:

Types in the interpreter

Up until now, our language has supported only one type: numbers. This means that:

For instance, add1 is an operation that takes in a number and produces a number.

Now we're going to add booleans to our language. This means adding some expressions that produce booleans, and also some operations that accept booleans.

Here's our interp_exp function from last time:

let rec interp_exp (exp : s_exp) : int =
  match exp with
  | Num n -> n
  | Lst [Sym "add1"; arg] ->
    (interp_exp arg) + 1
  | Lst [Sym "sub1"; arg] ->
      (interp_exp arg) - 1
  | _ -> raise (BadExpression exp)

Since all of the expressions in our language evaluate to integers, interp_exp can return an integer. How will we modify our interpreter to work with booleans?

One option would be to represent booleans as numbers. For instance, we could decide that true is 1 and false is 0. Then we could implement our operations like this:

let rec interp_exp (exp : s_exp) : int =
  match exp with
  | Num n -> n
  | Sym "true" -> 1
  | Sym "false" -> 0
  | Lst [Sym "add1"; arg] ->
    (interp_exp arg) + 1
  | Lst [Sym "sub1"; arg] ->
      (interp_exp arg) - 1
  | Lst [Sym "not"; arg] ->
      if (interp_exp arg) = 0 then 1 else 0
  | _ -> raise (BadExpression exp)

This is a perfectly valid approach–it's more or less how C encodes booleans. It's not going to work very well, though, once we have more complex types like strings and lists (though I guess we could use Gödel numbering if we really had to). We'll also have a hard time correctly implementing our num operator. What should (num true) return?

Instead of encoding booleans as numbers, we're going to introduce a new type: value. (We'll also take this opportunity to move our interpreter into its own file, interp.ml).

interp.ml

open S_exp

type value = Number of int | Boolean of bool

Our interp_exp function should return this type. First, we'll just modify it to support the same operations it did before:

interp.ml

exception BadExpression of s_exp

let rec interp_exp (exp : s_exp) : value =
  match exp with
  | Num n -> Number n
  | Lst [Sym "add1"; arg] as e -> (
    match interp_exp arg with
    | Number n -> Number (n + 1)
    | _ -> raise (BadExpression e) )
  | Lst [Sym "sub1"; arg] -> (
    match interp_exp arg with
    | Number n -> Number (n - 1)
    | _ -> raise (BadExpression e) )
  | e -> raise (BadExpression e)

Notice what we're doing in the add1 and sub1 cases: if their argument doesn't evaluate to a number, it's not a valid expression. So, for instance, (add1 false) won't evaluate to anything.

Now we can add booleans:

interp.ml

let rec interp_exp (exp : s_exp) : value =
  match exp with
  | Num n -> Number n
  | Sym "true" -> Boolean true
  | Sym "false" -> Boolean false
  | Lst [Sym "add1"; arg] as e -> (
    match interp_exp arg with
    | Number n -> Number (n + 1)
    | _ -> raise (BadExpression e) )
  | Lst [Sym "sub1"; arg] as e -> (
    match interp_exp arg with
    | Number n -> Number (n - 1)
    | _ -> raise (BadExpression e) )
  | Lst [Sym "not"; arg] ->
    if interp_exp arg = Boolean false then Boolean true else Boolean false
  | Lst [Sym "zero?"; arg] ->
    if interp_exp arg = (Number 0) then Boolean true else Boolean false
  | Lst [Sym "num?"; arg] -> (
    match interp_exp arg with
    | Number _ -> Boolean true
    | _ -> Boolean false )
  | e -> raise (BadExpression e)

Notice that our new operations can take in arguments of any type. The Lisp-like language we're implementing, like Python or Racket or Javascript, is dynamically typed.

Finally, we'll patch up our top-level interpreter function:

interp.ml

let string_of_value (v : value) : string =
  match v with
  | Number n -> string_of_int n
  | Boolean b -> if b then "true" else "false"

let interp (program : string) : string =
  parse program |> interp_exp |> string_of_value

Types in the compiler

Now that we have an interpreter to test against, we can extend our compiler to support our new operations!

When our interpreter is executing a program, values of expressions are instances of the value datatype we just defined. We won't be able to do that in the compiler–we can't define new datatypes in x86-64! Remember that when our program is executing, its values live in registers (actually, just rax). Registers store 64-bit integers. Right now the values in our program are all integers, so this works fine. But how will we add booleans? Take a second and think about how you might implement this.

Well, we know that all of our values need to be represented, at runtime, as 64-bit integers. So instead of representing integers as themselves:

0 -> 0b00
1 -> 0b01
2 -> 0b10
3 -> 0b11
...

We're going to represent the integer x as x << 2 (shifted left by two bits):

0 -> 0b0000
1 -> 0b0100
2 -> 0b1000
3 -> 0b1100

This is exactly equivalent to representing each integer x as x * 4.

This means our integers have to fit in 62 bits instead of 64. So our minimum integer is now -(2**61) and our maximum integer is (2**61) - 1.

This also means there are a bunch of 64-bit integers (how many?) that are no longer being used to represent values! All of our integer values now end with 00. So anything that ends with a different pair of bits won't be used to represent a number. This means we can use some of them to represent booleans, and other types!

First, though, let's update our compiler to use this new representation for integers. Integer constants will be easy–we'll just shift them left. How will we handle add1 and sub1? Well, remember that our runtime representations are the values multiplied by 4. Since multiplication distributes over addition (and subtraction), we can just add (or subtract) 4 instead of 1! So:

compiler.ml

let num_shift = 2
let num_mask = 0b11
let num_tag = 0b00

let rec compile_exp (exp : s_exp) : directive list =
  match exp with
  | Num n ->
    [Mov (Reg Rax, Imm (n lsl num_shift))]
  | Lst [Sym "add1"; arg] ->
    compile_exp arg @
    [Add (Reg Rax, Imm (1 lsl num_shift))]
  | Lst [Sym "sub1"; arg] ->
    compile_exp arg @
    [Sub (Reg Rax, Imm (1 lsl num_shift))]
  | e -> raise (BadExpression e)

(lsl is "logical shift left." We could also just multiply by 4, but it's clearer this way.)

What happens if we run a program now?

>>> compile_and_run "(add1 4)"
20

This makes sense–we're printing out the runtime representation! We'll need to fix that. We'll edit our C runtime:

runtime.c

#include <stdio.h>
#include <inttypes.h>

#define num_shift 2
#define num_mask 0b11
#define num_tag 0b00

extern uint64_t entry();

void print_value(uint64_t value) {
  if ((value & num_mask) == num_tag) {
    int64_t ivalue = (int64_t)value;
    printf("%" PRIi64, ivalue >> num_shift);
  } else {
    printf("BAD VALUE %" PRIu64, value);
  }
}

int main(int argc, char **argv) {
  print_value(entry());
  return 0;
}

Boolean support in the runtime

While we're editing the runtime, let's also add support for booleans.

runtime.c

#include <stdio.h>
#include <inttypes.h>

#define num_shift 2
#define num_mask  0b11
#define num_tag   0b00

#define bool_shift 7
#define bool_mask  0b1111111
#define bool_tag   0b0011111

extern uint64_t entry();

void print_value(uint64_t value) {
  if ((value & num_mask) == num_tag) {
    int64_t ivalue = (int64_t)value;
    printf("%" PRIi64, ivalue >> num_shift);
  } else if ((value & bool_mask) == bool_tag) {
    if (value >> bool_shift) {
      printf("true");
    } else {
      printf("false");
    }
  } else {
    printf("BAD VALUE %" PRIu64, value);
  }
}

int main(int argc, char **argv) {
  print_value(entry());
  return 0;
}

We'll need to recompile the runtime:

$ gcc -c runtime.c -o runtime.o

Boolean support in the compiler

We can now add support for true and false pretty easily:

let bool_shift = 7
let bool_mask = 0b1111111
let bool_tag = 0b0011111

let rec compile_exp (exp : s_exp) : directive list =
  match exp with
  (* some cases elided ... *)
  | Sym "true" -> [Mov (Reg Rax, Imm ((1 lsl bool_shift) lor bool_tag))]
  | Sym "false" -> [Mov (Reg Rax, Imm ((0 lsl bool_shift) lor bool_tag))]

Handling our other operations will be a little trickier: a job for next class.