Homework 3 - DMAOC (div, mul, and, or, case)
Assignment Instructions
-
Accept the assignment on GitHub Classroom here.
-
Do the assignment 🐛.
-
Upload the assignment on Gradescope. The most convenient way to do this is by uploading your assignment repo through the integrated GitHub submission method on Gradescope, but you may also upload a .zip of your repo.
Testing
For this assignment, we're introducing a way to write test without the need to produce .lisp
and .out
files for each test. If you would like, place an index.in
file into your examples
folder. Here, write your tests in the following format
test|code|output, test2|code2|output2
Here's an example of a testing suite written like this
ischar| (char? #\a)| true, notchar| (char? 2)| false
This is functionally equivalent to writing 2 tests. Don't worry about whitespace, either.
You can still run your tests the same way, by calling dune runtest
.
Differences with the class compiler
The homework compiler we're building is a bit better than our class compiler. Let's start by highlighting some of these differences for your reference:
-
In class on Sep 27, we introduced ASTs. We mentioned that we could do a better software engineering job in our compiler by factoring out some repeated code. When we compile unary function applications, we always compile the argument first, and then do something with it. The stencil for this homework properly factors out this repetition into
compile_unary_primitive
andcompile_binary_primitive
functions, so there is only onePrim1
pattern match incompile_expr
:| Prim1 (f, arg) -> compile_expr tab stack_index arg @ compile_unary_primitive f
-
At the end of that same class, we started hinting at
let
bindings. We'll finish implementing these in class on Monday. But we've decided to provide an even betterlet
in the stencil for you: you can write programs like(let ((x 2) (y 5)) (x + y))
, which will evaluate to7
. This is fully implemented for you.
*
and /
Extend the interpreter and the compiler to support the *
and /
operators. These are binary operators that perform division and multiplication,
respectively. For instance, (* 3 (/ 10 5))
should evaluate to 6
.
We've already added these operations to our AST type in the stencil code. Specifically, they're binary primitives. You'll need to update the appropriate places in the interpreter and compiler, but don't need to think about the syntax for this part.
In the interpreter, these should be implemented just like +
and -
. In the
compiler, things get a little trickier:
- x86-64's
idiv
andimul
instructions, which do signed multiplication and division, are a bit unusual. Both only take one argument, a register, and divide (or multiply)rax
by the value contained in that register. Our OCamlasm
library hasdirective
constructorsIDiv
andIMul
. idiv
looks atrdx
for sign information. Before calling theidiv
instruction, you should callcqo
to sign-extendrax
intordx
.- With
add
andsub
, we were able to ignore our encoding of integers because multiplication distributes over addition. Forimul
andidiv
, that won't be the case. If you need to shift an integer right, make sure to usesar
instead ofshr
to keep the sign consistent.
Recall that each 64-bit register (e.g. rax
) can store an integer ranging from
-(2**63)
to (2**63)-1
(both inclusively). When two such integers are
multiplied, the result will range from -2**126 +2**63
to 2**126
, which
would fit not in a 64-bit register but an 128-bit one. One can simulate an
128-bit register with two 64-bit registers. This is why x86_64 assembly puts
the "outputs" of imul
in rdx
and rax
. Conversely, the dividend for idiv
is 128-bit and stored in rdx
and rax
.
You may use sar
and shl
to cast our 62-bit lisp numbers to and from int64s,
and cqo
for casts from int64 to int128. Of course, casting from a larger
integer set to a smaller one may fail (e.g. from 2**63
to a lisp number). You
are not required to handle these overflow cases in this assignment.
Here is a summary of cqo
, shl
and sar
(source):
CQO
: an instruction.RDX:RAX
:= sign-extend ofRAX
SHL r/m64, imm8
: multiplyr/m64
by2
,imm8
timesSAR r/m64, imm8
: signed divider/m64
by2
,imm8
times
and
and or
Implement and
and or
, binary operations that take operands of any type and
return one of them as described below. As usual, only false
is considered falsey.
(or <non-falsey-value-1> <value-2>) --> <non-falsey-value-1>
(or false <value-2>) --> <value-2>
(and <non-falsey-value-1> <value-2>) --> <value-2>
(and false <value-2>) -> false
Both and
and or
should short-circuit: if they are returning their first
argument, they should not evaluate their second argument. It might be useful to
write a test case that will fail if they do not short-circuit, perhaps using
your new /
operator with a 0
argument. For this reason, they must not
be implemented as binary primitives, since the arguments of binary primitives
are eagerly evaluated before compile_primitive
or interp_binary_primitive
are called.
Again, we have included these operations in our AST type, but you'll need to update the appropriate places in the interpreter and compiler.
The case
form
Note: this part will probably take more time than the other tasks! There's a little more creativity required from you here.
Common Lisp's case
expression, like C's switch
and OCaml's pattern-matching,
compares a single expression against a number of values:
(case 4
(1 1)
(2 4)
(3 9)
(4 16)
(5 25)) => 16
You'll implement a limited form of this expression:
- The argument must be an expression that evaluates to an integer
- The left-hand side of each case must be a literal integer
- There must be at least one case and no duplicate cases
- The last case in the expression is the default case
Add support for case
syntax to the AST
First, you'll have to add support for this expression to your AST.
This will involve adding a Case
to the expr type -- think about what input it should take!
You will also need to add another pattern to the expr_of_s_exp
function
located in lisp_syntax/lisp_syntax.ml
.
Add support for case
in the interpreter
In your interpreter, you can implement support for this expression however you'd like. See below for some helpful built-in OCaml functions.
Add support for case
in the compiler
In your compiler, implement support for case
expressions via branch
tables. A branch table (sometimes
called a jump table) avoids the need for many comparisons and jumps. For
instance, imagine converting the case expression above to if
expressions:
(if (= 4 1)
1
(if (= 4 2)
4
(if (= 4 3)
9
(if (= 4 4)
16
25))))
If we compiled this code and ran it, the processor would end up needing to
execute 4 cmp
instructions and 4 jmp
instructions. If we added more cases,
the worst case scenario gets worse; execution time will be linear in the number
of cases.
For 4 or 5 cases, this isn't usually too terrible. But for some applications, it's really important to be able to do this kind of switching in constant time in the number of cases. Branch tables let us do that.
Building a branch table
A branch table is a chunk of assembly directives that, rather than specifying instructions, just contain the addresses of labels. It looks like this:
branch_table:
dq case_label_1
dq case_label_2
dq case_label_3
...
dq case_label_n
(The dq
instruction puts data into the executable. The q
is for quadword
,
because label addresses are 8 bytes. See below for more on dq
.)
The first label in the branch table should be the label of the smallest (not necessarily first!) case. The last label should be the label of the largest (not necessarily last!) case. There should be a label for every value in between the minimum and the maximum; this means you might have more labels than cases. For the "holes" between cases, you should use the label of the last (not necessarily largest!) case, since this is the default.
After the branch table, you should emit code for each case's expression, labeled
with the correct label for that case. Like the "then" case of an if
, each case should jump
to the same "continue" label.
Using a branch table
To use the branch table, you'll need to:
- Compare the argument to the minimum and maximum cases. If it's outside those bounds (i.e., less than the minimum case or greater than the maximum case), jump to the default label.
- Compute the offset of the label for your argument. For an argument
x
, this is(x - min_case) * 8
. You need to multiply by 8 because each label address stored in the branch table takes up 8 bytes. - Load the label for the branch table into a register. You should use
LeaLabel
for this. - Jump to the computed label. You should use
ComputedJmp
with aMemOffset
argument for this.
Useful functions
A few functions you may find useful for implementing case
:
List.assoc
,List.assoc_opt
for using('a * 'b) list
s somewhat like maps.- See the "Association lists" section of OCaml's List docs.
get_cases_ast
, which parses a list of expressions into a list of (number, expression) pairs.- Defined in
lib/util.ml
, which can be opened withopen Util
.
- Defined in
List.range lo hi
, which produces a list of numbers fromlo
tohi
, inclusive.- Added to the
List
module inlib/util.ml
, which can be opened withopen Util
.
- Added to the
A bit more explanation on dq
When we write and execute an assembly program, the program itself gets loaded into memory. Each directive (with its arguments) is a value, and the directives are stored in sequence in memory; a label is effectively a memory address, and the directive at that label is stored in memory at that address.
dq
is a "meta-directive." It takes as input an 8-byte value. Instead of putting into memory a value corresponding to dq w
, it will put the value w directly into memory at that location. w
itself could be any value; in particular, it may not correspond to any assembly directive! So, you probably don't want to let the processor execute this directive. Effectively, you're using a location in memory where the processor expects to find part of your program, but storing arbitrary data there instead. When we call the assembler, this arbitrary data will still be in the machine code, hence, "data in the executable."
For our purposes in hw3, we're using dq
to store a label: that is, the address of a different part of our program. And we can compute, using only arithmetic, at which point in memory this label will be stored (as an offset from another label).
Grammar
The grammar your new interpreter and compiler should support is as follows:
<expr> ::= <num> | <id> | true | false | (<un_prim> <expr>) | (<bin_prim> <expr> <expr>) | (if <expr> <expr> <expr>) | (let ((<id> <expr>) ...) <expr>) + | (and <expr> <expr>) + | (or <expr> <expr>) + | (case <expr> (<num> <expr>) (<num> <expr>) ...) <un_prim> ::= add1 | sub1 | zero? | num? | not <bin_prim> ::= + | - | = | < + | / + | *