# OCaml and S-Expressions

Last time we wrote a compiler for a very small language: the integers. Next time, we'll grow our language a bit. Today, though, we're going to step back and learn some OCaml. We'll also talk about how to represent our language's syntax.

## OCaml basics

Here's the compiler we wrote last time:

`compile.ml`

let compile (program: string): string = String.concat "\n" [ "global entry"; "entry:"; Printf.sprintf "\tmov rax, %s" program; "\tret" ]

This program illustrates several OCaml features (see the lecture capture for details):

- Functions
- Types
- Lists

There are some list functions that will be helpful for us to know about.
Keep these in mind, and see what their output is by running the following in `utop`

:

let f x = x * 2;; List.map f [1; 2; 3];; List.map (fun x -> x * 2) [1; 2; 3];; List.filter (fun x -> x > 2) [1; 2; 3];; List.fold_left (fun x y -> x + y) 0 [1; 2; 3];; List.fold_left ( + ) 0 [1; 2; 3];;

## S-Expressions

We want to eventually extend our language with support for many new
features, such as conditionals and functions. We will talk in great
depth about the *semantics* of those features: what a conditional
expression *means*. We'll also need to figure out what the *syntax*
corresponding to those features is. For instance, conditionals look like
this in Javascript:

if (x) { return y; } else { return z; }

And like this in Python:

if x: return y else: return z

We will talk more about syntax much later in the course. For now, our language is going to support a very simple syntax: S-expressions.

S-expressions (short for "symbolic expressions") were introduced by John McCarthy in his paper on the LISP programming language. They were originally intended as an internal representation that programmers wouldn't manipulate directly, but LISP programmers liked writing them and the intended language syntax (M-Expressions) was never implemented. Here are some S-expressions:

(if x y z) (* (+ 2 6 4) 3) (define (f x) (if (= x 0) 1 (* x (f (- x 1)))))

S-expressions are a simple, convenient way to represent structured expressions. Because of their simple structure, they are very easy to represent and manipulate inside a compiler or an interpreter.

Look at the expressions above. What components are there? Ignoring
everything we might know about what the expressions *mean* and focusing
on how they *look*, we can see three different kinds of expression:

- Symbols

such as`f`

and`if`

- Numbers

such as`4`

and`0`

- Lists

sequences of other S-Expressions wrapper in parentheses, like`(+ 2 6 4)`

or`(* (+ 2 6 4) 3)`

## S-Expressions in OCaml

How should we represent S-expressions in OCaml? One option would be to use strings. So we could have

let e1 = "(if x y z)" let e2 = "(* (+ 2 6 4) 3)"

This representation is convenient to read in: programmers write programs
as sequences of characters in files. It is, however, very difficult to
work with. Imagine extending our compiler to support unary operations on
numbers (for instance, a function that increments a number). Doing so
using this string representation would be **very** difficult. We'd have
to see if the first character of the string is a `(`

, then look at the
next characters to find the operation, then find the argument, and so
on. Moreover, we'd need to repeat this effort for every operation we
want to support.

A better representation for S-expressions will use the structure we talked about above. We can add a new type to OCaml to encode this structure, as follows:

type s_exp = Sym of string | Num of int | Lst of s_exp list

`Sym`

, `Num`

, and `Lst`

are all *constructors*: ways of building
s-expressions. So what this type definition is saying is that an
s-expression can be a symbol **or** a number **or** a list.

The "of" after each constructor indicates that that constructor is
associated with some data of a particular type. For instance, the symbol
`if`

represented as an `s_exp`

looks like `Sym "if"`

; 4000000000000
looks like `Num 4000000000000`

.

`Lst`

is defined *recursively*: it takes as its argument a list of other
s-expressions. These could be build using the `Sym`

or `Num`

constructors or using more `Lst`

constructors. So for instance, `(if x
y z)` looks like

let e1 = Lst [Sym "if"; Sym "x"; Sym "y"; Sym "z"]

and `(* (+ 2 6 4) 3)`

looks like

let e2 = Lst [Sym "*"; Lst [Sym "+"; Num 2; Num 6; Num 4]; Num 3]

We can see that compared to the string representation, which is "flat",
this representation includes the nested structure of s-expressions. We
can access this structure using *pattern matching*:

let which_constructor (e: s_exp) : string = match e with | Sym x -> "a symbol" | Num n -> "a number" | Lst l -> "a list"

This looks like a repeated "if". The real power of pattern matching is
*destructuring*: we can use names defined on the left-hand side of each
pattern.

let rec total (e: s_exp) : int = match e with | Sym _ -> 0 | Num n -> n | Lst l -> List.fold_left ( + ) 0 (List.map total l)

For the `Lst`

constructor, it can sometimes be useful to destructure the
list as well:

let rec has_if (e: s_exp) : bool = match e with | Sym _ -> false | Num _ -> false | Lst (Sym "if" :: _) -> true | Lst l -> List.exists has_if l

## S-expressions in our compiler

For the purposes of this course, we have a small library for dealing with S-expressions.
HW1, out today, is designed to get you some practice working with this library.
The library defines the same `s_exp`

type listed above; it also defines
functions to convert strings to s-expressions.

If our OCaml project is set up correctly, we can use the library like this:

open S_exp let e1 = parse "(* (+ 2 6 4) 3)"

`parse`

takes in a string and returns the corresponding s-expression.
Later in the course, we will see how this function works.

The compiler we have developed can be modified to use s-expressions:

`compile.ml`

open S_exp let compile (program: s_exp): string = match program with | Num n -> String.concat "\n" [ "global entry"; "entry:"; Printf.sprintf "\tmov rax, %d" n; "\tret" ]

We only support numbers, so we haven't included the other cases. Our OCaml editor will complain about this program, saying something about "non-exhaustive pattern matching"; this means that there are some s-expressions that we haven't matched. We can "fix" this error like this:

`compile.ml`

open S_exp exception BadExpression of s_exp let compile (program: s_exp): string = match program with | Num n -> String.concat "\n" [ "global entry"; "entry:"; Printf.sprintf "\tmov rax, %d" n; "\tret" ] | e -> raise (BadExpression e)

Now we will get a somewhat-useful error if we try to compile something that isn't a number.