Homework 5 - Fun With files
Accept the assignment on GitHub Classroom here.
Do the assignment 🐫.
Upload the assignment on Gradescope. The most convenient way to do this is by uploading your assignment repo through the integrated GitHub submission method on Gradescope, but you may also upload a .zip of your repo.
In this homework, you'll implement string support and file handling. You'll get more practice dealing with data on the heap and learn how to extend your compiler's functionality via the C runtime.
Implement support for strings. For now, this just means adding support for
string literals (i.e., s-expressions built with the
Str constructor). String
literals are sequences of characters enclosed in double-quotes. Strings should
be displayed using the same double-quote-enclosed representation when they are
Strings may contain characters with special meaning (for example, newlines and
double-quotes). When displaying strings, the newline and double-quote special
characters should be printed as
\", respectively, to ensure the
resulting expression is well-formed. For instance, a string containing only a
double-quote should be displayed as
"\"" and not
You do not need to support special characters besides quote and newline.
Add string support to the interpreter by adding a constructor to
extending the interpreter's
display_value function to display strings. You may
use OCaml's built-in
string type for the underlying value, and you may use
String.escaped to escape
In the compiler, we'll represent strings like C does: null-terminated sequences
of characters. (The null terminator is simply a byte with value
0 [in C, a
char represented by
'\0'] that follows the bytes that contains the string's
characters.) The runtime value for a string should be a pointer to the first
character of the sequence, tagged with
Extend the runtime with support for displaying strings. In order to properly escape newlines and double-quotes, implement this as a loop over the string's characters, and check if each needs to be escaped.
There are multiple ways to store string values; for example, you could store string literals on the heap. In fact, we suggest that you do this when implementing
input in the latter part of this assignment. However, to implement string literals -- that is, explicit strings like
"abc" that appear in our Lisp programs directly -- we can use a more direct approach: the
dq assembly directive.
Remember in homework 3 that you used the
dq directive to "embed data in the executable". An executable is itself is a sequence of bytes, where most of those bytes represent machine code instructions. When the program is run, these bytes are loaded into memory, thus each byte of the executable has a memory address. When the assembler sees
dq with an argument, it inserts the argument directly into the bytes of the executable that it generates. Normally the argument to
dq is not itself an instruction, so we must make sure the processor will never try to execute these bytes of the executable. But if we label this location, we can find its address in memory using
lea, and thus access its contents.
But there's a catch. Remember that when we started putting pairs on the heap, we had to make sure that every address we used to represent pairs was a multiple 8, so that we had room on the end for a three-bit tag. We need to do the same here, this time by using the
align assembly directive. When the assembler sees
align 8, it ensures that the next instruction will be placed at a memory location that is a multiple of 8. It fills the required amount of padding by inserting some number of no-op instructions into the sequence of machine code. So if we
align 8 before storing a string value with
dq, the string will be stored in the resulting program at an address whose last three bits are
0. As noted above, you will need to tag string values with
For example, what would end up in
rax if you wrote the following assembly code?
lea rax, [my_label] jmp continue align 8 label my_label: dq 'its_friday' label continue:
After executing this code,
rax will contain the address of the string
"its_friday". Think about what problems you might encounter if you move the line
label my_label: above the line
align 8 (hint: think about where the "junk" bytes generated by
align will be placed in the generated machine code). Also, think about what problems you might encounter if you omit the
jmp continue line.
In the compiler, the
lea instruction above is represented by the
label directives are represented by the
Label constructor, and
dq directives (as shown above) are represented by the
DqString automatically adds a null-terminator to the specified string data.
File I/O using channels
You're going to add a miniature version of OCaml's channel-based I/O to your language. First, you'll need to add support for channel types. Channels can be either input channels or output channels. Input channels can be read from; output channels can be written to.
stdout refer to the program's input and output.
stdin is an input channel;
stdout is an output channel.
- In the interpreter, implement input channels with
in_channeland output channels with
out_channel. The AST elements
Stdoutshould evaluate to the OCaml built-in values
- In the compiler, implement both types of channels with the same mask,
0b111111111. Input channels should have tag
0b011111111; output channels should have tag
stdinshould be represented at runtime as
0shifted left and tagged with the input-channel tag;
stdoutshould be represented as
1shifted left and tagged with the output channel tag. The
1correspond to file numbers for the Unix standard input/output files.
Channels should be printed as
Now you'll implement some operations to open, close, and read and write via channels.
(open-in filename)takes in a string representing a filename and opens it as an input channel, returning the channel
(open-out filename)takes in a string representing a filename and opens it as an output channel, returning the channel
(close-in ch)takes in an input channel and closes it, returning
(close-out ch)takes in an output channel and closes it, returning
(input ch n)reads
nbytes from input channel
ch, returning a string of those bytes (plus a null terminator)
(output ch s)writes the (null-terminated) string
sout to output channel
In the interpreter, these should be implemented using OCaml's built-in functions:
open-inshould be implemented using
open-outshould be implemented using
close-inshould be implemented using
close-outshould be implemented using
inputshould be implemented using
outputshould be implemented using
In the compiler, all of these operations will be implemented with C functions
that you add to the runtime. You should define a C function for each operation
close-out, which should be implemented identically
in the runtime, using the same C function).
- Your functions for
open-outshould use the
open_for_writinghelper functions we have provided in
runtime.c. Both functions return a file descriptor as an integer. Turn this file descriptor into a channel by shifting it and tagging it with the correct channel tag.
- Your function(s) for
close-outshould use the
closeC function, passing the file descriptor obtained from
open_for_writing(remember to remove the tag and shift appropriately). Input and output channels can be handled exactly the same.
- Your function for
inputshould use the
read_allhelper function we've provided in
runtime.c; this helper function takes care of reading the correct number of bytes, adding a null terminator, and handling errors. Since
read_allreads into a buffer, you'll need to pass in a pointer to the Lisp heap (in other words, you'll have to pass the heap pointer as an argument to your function). After you call
read_all, you'll need to update your heap pointer to reflect the space used for reading; you'll also need to make sure that the heap pointer remains a multiple of 8. We recommend doing this by returning the heap adjustment from your C function as an integer (taking care to make sure it's a multiple of 8), but there are other ways of doing it.
outputshould be implemented using the
write_allhelper function we have provided in
This assignment adds a couple of testing features.
Providing program input
Your programs can now read from standard input, using both the
operator defined in class and the
input operator you'll write on this
assignment. For testing, you should provide inputs by writing
.in files for
.lisp file that expects an input. For example, if we defined a program
like this in
(print (pair (read-num) (pair (read-num) (read-num))))
We could define its input in
8 13 21
The testing system will provide this as the input to both the interpreter and the compiler.
When testing reading/writing to files, it can be easy for state to get mixed up
between tests (e.g. one test creates a file and writes some things to it, while
another test expects the file to not exist). To prevent this situation, our test
runner will create a temporary directory called
tmp, and will empty this
directory before each test, thus ensuring that every test starts with an empty
tmp directory. You must use this directory when writing files in your tests
(i.e. all paths passed to
open-out must start with
restriction is checked when your tests are run by the autograder.
For example, to test writing a file you might use a test program like the following:
(let ((f (open-out "tmp/test.txt"))) <do something with f>)
You may also add files to a
data folder located in the root folder for this
project. These files will be accessible by test programs, to allow you to test
input channels separate from output channels. However, you may not write to these files for the reasons explained above.
For example, you might add the file
data/test.txt with the contents
and then include the following program in
(let ((f (open-in "data/test.txt"))) (print (input f 4)))
The grammar your new interpreter and compiler should support is as follows:
<expr> ::= <num> | <id> + | <string> | true | false + | stdin + | stdout - | () | (<z-prim>) | (<un_prim> <expr>) | (<bin_prim> <expr> <expr>) | (if <expr> <expr> <expr>) | (let ((<id> <expr>) ...) <expr>) | (do <expr> <expr> ...) <z-prim> ::= read-num | newline <un_prim> ::= add1 | sub1 | zero? | num? | not | pair? | left | right | print + | open-in + | open-out + | close-in + | close-out <bin_prim> ::= + | - | = | < | pair + | input + | output