Homework 4 - Error handling and the heap

Assignment Instructions

Accept the assignment on GitHub Classroom here.
Do the assignment 🐛.
Upload the assignment on Gradescope. The most convenient way to do this is by uploading your assignment repo through the integrated GitHub submission method on Gradescope, but you may also upload a .zip of your repo.

Introduction

In this homework, you'll implement lists using pairs and arrays. You'll get practice handling runtime errors and dealing with data on the heap.

Lists

As we saw in class, lists can be implemented using pairs. In fact, this is how lists are generally implemented in Scheme and other Lisp-like languages. A list is either:

The empty list, for which we used false in class
A pair where the second element is a list

Instead of using false to signal the empty list, Scheme includes a special value () (pronounced "nil").

Add support for () to both the interpreter and the compiler. In the compiler, use 0b11111111 as the runtime representation of (). You will also need to add support for displaying () to the runtime (see "The Runtime" section below).

Now we can build lists using pair and (). Here are a few such lists:

()
(pair 1 ())
(pair 1 (pair 2 ()))
(pair (pair 1 2) ())

Add a primitive (list? e). It should evaluate to true when its argument is a list and false otherwise.

Hint: implement list? in the interpreter with a recursive helper function.

Hint: implement list? in the compiler with a loop: you'll need to define a label and repeatedly jump back to that label.

Vectors

We can also implement lists using arrays. In Scheme, array-backed lists are called vectors. You'll implement the following vector operations:

(vector n e) creates a vector of length n where all of the elements are the expression e. n must evaluate to a positive integer. For example, (vector 3 10) produces a vector of length 3 where each entry is the integer 10.
(vector? e) returns true if e is a vector and false otherwise.
(vector-length v) returns the length of the vector v.
(vector-get v n) returns the element of the vector v at (0-based) index n. n must evaluate to a non-negative integer less than the length of v.
(vector-set v n e) sets the element at index n of the vector v to e and evaluates to the vector. n must evaluate to a non-negative integer less than the length of v.

Vectors should be printed as space-delimited lists of their values enclosed in square brackets. For instance, the following represent some simple vectors of numbers:

[1]
[1 2 3]

In the interpreter, you can implement vectors using OCaml's built-in array type and the functions in the Array module.

In the compiler, you should implement vectors on the heap. A vector of length n should occupy n+1 8-byte cells; the first cell should be used to store its length. Use the three-bit tag 0b101 for vector values.

Hint: you may need to make use of an additional register in order to implement some vector operations. If you do, you can use R9 in addition to the usual Rax and R8.

Note: unlike the other data types in our language so far, vectors are mutable. The following code should only store one vector, [10 21 10], on the heap:

(let ((v (vector 3 10))) (vector-set v 1 21))

It can be frustrating to nest together calls to vector-set to create test cases. We provide a new function for you, (do <expr> <expr> ...), that will help with this. Similar to OCaml's semicolon, which throws away the return value of a function with type Unit, the do function will run each of its arguments but only return the final one. For example, the program

(let ((v (vector 5 10)))
  (do 
    (vector-set v 0 4)
    (vector-set v 1 8)
    (vector-set v 3 200)
    (pair (vector-get v 2) (vector-get v 3))
  )
)

will return the pair (10, 200).

Error handling

For this assignment, unlike previous assignments, you are expected to implement error-handling in the compiler. Invalid expressions, such as vector-length on a non-vector or vector-get with an index greater than the vector's length, should always cause runtime errors. Take a look at ensure_num and ensure_pair for some examples of the error handling we've implemented so far.

We're particularly interested in catching runtime errors (and the corresponding interpreter errors). You can assume that all of our tests are well-formed s-expressions that do not contain unbound variables.

For error testing with .lisp, instead of creating the normal .out files, you should create a file example.err with empty content.

The Runtime

You'll need to change the runtime to handle displaying nil and vector values. In order to display vectors, it will likely be helpful to cast a vector's pointer to the heap into a C pointer to a uint64_t and then use pointer arithmetic to access each of the values in the vector.

Grammar

The grammar your new interpreter and compiler should support is as follows:

<expr> ::= <num>
         | <id>
         | true
         | false
+        | ()
         | (<un_prim> <expr>)
         | (<bin_prim> <expr> <expr>)
         | (<tri_prim> <expr> <expr> <expr>)
         | (if <expr> <expr> <expr>)
         | (let ((<id> <expr>) ...) <expr>)
         | (do <expr> <expr> ...)
             

<un_prim> ::= add1
            | sub1
            | zero?
            | num?
            | not
            | pair?
            | left
            | right
+           | list?
+           | vector?
+           | vector-length

<bin_prim> ::= +
             | -
             | =
             | <
             | pair
+            | vector
+            | vector-get

<tri_prim> ::=
+            vector-set