Lab 6 - Parser generators
In this lab, you'll be working with ocamllex
(a lexer generator) and menhir
(a parser generator) to lex and parse an arithmetic language.
Accept the assignment on Github Classroom here.
References
Menhir: http://gallium.inria.fr/~fpottier/menhir/manual.html
Starting point
We've provided a lexer (syntax/lex.mll
) and parser (syntax/parse.mly
) for a
simple language supporting integers, infix addition, and a print
operator.
Programs are composed of statements of the form print <expr>
. For example, the
following are valid programs as of now:
print 1
print 1 + 2
-
print 1 + 2 print 4
Running the interpreter
To run the interpreter, execute the interp
binary as follows:
dune exec ./bin/interp.exe -- -e "<PROGRAM>"
This will parse the program <PROGRAM>
with your parser and interpret the AST
that is produced. For instance, you could use the following to test the starter
code:
dune exec ./bin/interp.exe -- -e "print 1 + 2"
Once the programs you want to run become more complex, it'll be easier to write them as files. To run the interpreter on a file, execute:
dune exec ./bin/interp.exe -- <FILE>
Part 1: Extending the language
First, extend the lexer and parser to support the following:
- Float literals e.g.
1.23
- Scientific notation literals e.g.
3.47e5
(3.47 * 10^5 = 347000
)- OCaml's
float_of_string
function will accept these literals as input, so you just have to extend the number parse rule to recognize the new literal form
- OCaml's
(
and)
for grouping expressions e.g.print (1 + 2) * 3
-
,*
,/
, and^
(exponentiation) binary infix operators e.g.5 - 3
,1 * 2
,4 / 2
,2 ^ 3
-
should also be usable as a unary prefix operator for negation e.g.-5
- Prefix
log
,sin
,cos
,tan
operators e.g.log 5
- Variable assignments e.g.
foo = 1
(corresponding toAssign
in the AST)- Variable names may contain letters, underscores, or numbers, but must start with a letter
- The statements that make up a program may now either be
print
statements or variable assignments
Associativity
-
,*
, and/
should be left-associative (similar to+
)- When used for negation,
-
should be non-associative
- When used for negation,
^
should be right-associative (i.e.2 ^ 3 ^ 4 = 2 ^ (3 ^ 4)
)
Precedence
Of all the operators, log
, sin
, cos
, tan
, and negation should have the
highest precedence, followed by exponentiation, then multiplication and
division, then addition and subtraction.
Note: You may not use Menhir's %left
, %right
, %nonassoc
, or %prec
declarations yet, since we'll be using them in the next part. Instead, you
should encode associativity and precedence explicitly using production rules,
similar to how we've encoded the left-associativity of addition.
Checkoff: Call over a TA to go over your solution.
Part 2: Simplifying the parser
Next, we'll be simplifying the parser using some of Menhir's useful features.
Before beginning work on this section, save a copy of parse.mly
-- you'll
need it again for part 3.
Using Menhir's %left
, %right
, and %nonassoc
declarations (see the
reference linked above, and note that it calls precedence "priority"), establish
the same associativity and precedence rules for the operators in our language.
This should reduce the number of parse rules you need to represent the language.
Hint: You may need to use the %prec
declaration to override the precedence
of the -
operator when it is used for negation.
Checkoff: Call over a TA to go over your solution.
Part 3: Implicit multiplication
When writing mathematical expressions in non-programming contexts, it's common
to write expressions like 3 (4 + 5)
to implicitly mean 3 * (4 + 5)
. Add
support for this form of expressions to your parser, converting them into the
same ASTs that the explicit forms would parse to e.g.
Times (Num 3) (Plus (Num 4) (Num 5))
for the previous example.
In order to add support for this form of expression in the parse rules, we'll
need to go back to manually describing precedence and associativity instead of
using Menhir's succinct declarations, so swap your current parse.mly
file with
the copy saved at the beginning of part 2. The reason for this is that Menhir's
precedence and associativity hints help the parser generator resolve
shift-reduce conflicts. There's a set of rules that it uses to do this and they
involve looking at the token being parsed (e.g. PLUS
). In the case of implicit
multiplication, however, there isn't any such token to look at.
Checkoff: Call over a TA to get checked off for the lab.