Homework 8 - Optimizations
Introduction
In this homework, you'll implement some optimizations in your compiler. You'll also come up with benchmark programs and see how well your optimizations do on a collaboratively-developed benchmark suite.
You'll implement at least two optimizations, all of which we discussed in class:
- Constant propagation and at least one of:
- Inlining
- Unrolling
- Common subexpression elimination
- Peephole optimizations
- Static type inference/runtime typecheck elimination
In order to make inlining and common subexpression elimination easier to
implement, you'll also write an AST pass (i.e., a function of type program -> program) to make sure all variable names are globally unique.
If you're taking the class as a capstone project, you should do constant propagation and pick two of the remaining five optimizations to implement. (The optional part of constant propagation is still optional.) You'll also write a short document about how your optimizations work and what kind of results you end up with.
Due dates:
- Benchmark programs: Fri, Dec 5 at 11:59pm
- Final submission: Mon, Dec 15 at 11:59pm
You have some options as far as how much time and effort to put into this final homework. If you're short on time and want to be done with the semester--perfectly understandable!--we recommend implementing inlining and skipping the optional extension to constant propagation. If you feel like diving in a little deeper, implement common subexpression elimination and the optional extension to constant propagation. The static type inference option is new and experimental this year (2025) and presents a fun design challenge. It's up to you, and won't affect your grade.
If you choose to do static type inference, your stencil code will look a little different: we've built type information into the abstract syntax of our language for you. (This gets cumbersome if you're not doing any type checking.) Your stencil code is in a different GitHub Classrooms assignment. Only use this option if you're planning to implement static type inference.
Assignment Instructions
- Accept the assignment on GitHub Classroom here (without type inference) or here (with type inference).
- Do the assignment 🐛.
- Upload the assignment on Gradescope. The most convenient way to do this is by uploading your assignment repo through the integrated GitHub submission method on Gradescope, but you may also upload a .zip of your repo. You can upload either version (with or without type inference) to the same Gradescope assignment.
Benchmarks
There's a benchmarks repository at https://github.com/browncs1260/final-benchmarks-2025. You can add your benchmarks to that repository by forking the repository and then creating a pull request adding a file to the benchmarks directory. As part of your grade for this final homework, you should add at least three interesting benchmark programs to this repository by Friday, Dec 5. Please include your cslogin somewhere in the pull request title.
When you make a pull request, the CI action will show a green check if your benchmarks are valid according to our stencil code. If the check fails, please correct your benchmarks! Pushing to your fork will automatically update the pull request.
The benchmark repository readme has directions for testing your compiler on these benchmarks.
Capstone
If you are taking CSCI 1260 as a capstone, you should submit a short (1-2 page) PDF document describing your implementation of these optimizations and their effects on your compiler's performance (the benchmarking scripts may help with this). This will serve as your capstone summary!
Starting code
The starting code is the same as for HW7, with support for either our lisp-like syntax or MLB syntax. Lambda expressions and function pointers are not supported.
You should write all of your optimizations in the file lib/optimize.ml. You
can write tests in the usual way; the tester will run all of your optimizations
on every test case.
You can run the compiler with specific optimization passes enabled using the
bin/compile.exe executable, by passing the -p argument one or more
times. For instance:
dune exec bin/compile.exe -- examples/ex1.lisp output -r -p propagate-constants -p uniquify-variables -p inline -p type-inference
will execute the compiler with constant propagation, globally unique names,
inlining, and type inference enabled, and the passes will run in the order specified. You can also
use this to execute an optimization more than once--for instance, doing constant
propagation, then inlining, then constant propagation again. Executing the
compiler without any -p flags will run all optimizations once, while -noopt
will disable all optimizations.
Constant propagation
Constant propagation is a crucial optimization in which as much computation as possible is done at compile time instead of at run time. We implemented a sketch of a simple version of constant propagation in class. Your constant propagation implementation should support:
- Replacing the primitive operations add1, sub1, plus, minus, eq, and lt with their statically-determined result, when possible
- Replacing let-bound names with constant boolean or number values, when possible
- Eliminating if expressions where the test expression's value can be statically determined
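To make the folding step concrete, here is a minimal sketch over a made-up toy AST. The constructor and operator names below are illustrative assumptions, not the definitions in your stencil's ast.ml:

(* Toy AST for illustration only; your stencil's definitions will differ. *)
type prim1 = Add1 | Sub1
type prim2 = Plus | Minus | Eq | Lt
type expr =
  | Num of int
  | Bool of bool
  | Var of string
  | Prim1 of prim1 * expr
  | Prim2 of prim2 * expr * expr
  | If of expr * expr * expr

(* Fold primitives and if-tests whose arguments are statically known. A real
   pass would also thread an environment of let-bound constants so that Var
   nodes can be replaced, and would recurse through every other form. *)
let rec fold (e : expr) : expr =
  match e with
  | Prim1 (op, e1) -> (
      match (op, fold e1) with
      | Add1, Num n -> Num (n + 1)
      | Sub1, Num n -> Num (n - 1)
      | _, e1' -> Prim1 (op, e1'))
  | Prim2 (op, e1, e2) -> (
      match (op, fold e1, fold e2) with
      | Plus, Num a, Num b -> Num (a + b)
      | Minus, Num a, Num b -> Num (a - b)
      | Eq, Num a, Num b -> Bool (a = b)
      | Lt, Num a, Num b -> Bool (a < b)
      | _, e1', e2' -> Prim2 (op, e1', e2'))
  | If (test, thn, els) -> (
      match fold test with
      | Bool true -> fold thn
      | Bool false -> fold els
      | test' -> If (test', fold thn, fold els))
  | Num _ | Bool _ | Var _ -> e

Since one rewrite can expose another, it can help to run a pass like this more than once (the -p flag lets you experiment with this).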
Optionally, you can also implement re-associating binary operations (possibly in a separate pass) to find opportunities for constant propagation. For instance, consider the expression
(+ 5 (+ 2 (read-num)))
This expression won't be modified by the constant propagation algorithm described above, but with re-association it could be optimized to
(+ 7 (read-num))
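If you do implement re-association, the core rewrite can be a single traversal that pulls nested constants upward so the folding pass can see them. This sketch continues the illustrative toy AST from the previous sketch:

(* Rewrite (+ c1 (+ c2 e)) into (+ (c1 + c2) e). A fuller version would also
   handle constants appearing on either side, and minus. *)
let rec reassociate (e : expr) : expr =
  match e with
  | Prim2 (Plus, Num c1, e2) -> (
      match reassociate e2 with
      | Prim2 (Plus, Num c2, rest) -> Prim2 (Plus, Num (c1 + c2), rest)
      | e2' -> Prim2 (Plus, Num c1, e2'))
  | Prim1 (op, e1) -> Prim1 (op, reassociate e1)
  | Prim2 (op, e1, e2) -> Prim2 (op, reassociate e1, reassociate e2)
  | If (e1, e2, e3) -> If (reassociate e1, reassociate e2, reassociate e3)
  | Num _ | Bool _ | Var _ -> e

On an expression shaped like (+ 5 (+ 2 (read-num))), this yields (+ 7 (read-num)), as in the example above.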
Globally unique names
Many optimizations can benefit from a pass that ensures all names are globally
unique. Implement this pass using gensym. This pass should be run before
inlining and common subexpression elimination, and both of those optimizations
can then assume globally-unique names (this is an exception to the usual
principle that the order of optimizations shouldn't matter for correctness). The
validate_passes function in optimize.ml ensures that this optimization is
executed before inlining and common subexpression elimination.
Note: If you don't plan to implement inlining, you don't have to implement uniquify_variables.
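As a rough illustration of the renaming traversal, the sketch below threads an environment from source names to fresh names. It uses a self-contained toy AST and a local counter standing in for whatever fresh-name (gensym) utility your stencil provides; the real pass must also rename function parameters and handle every other binding and expression form:

(* A toy fresh-name generator; your stencil's gensym utility plays this role. *)
let counter = ref 0
let gensym (base : string) : string =
  incr counter;
  Printf.sprintf "%s__%d" base !counter

(* Toy AST for illustration only. *)
type expr =
  | Num of int
  | Var of string
  | Let of string * expr * expr
  | Prim2 of string * expr * expr

(* Rename every binding to a fresh name, mapping each in-scope source name to
   its unique replacement. (List.assoc raises Not_found on an unbound
   variable, which would be a separate error in a well-formed program.) *)
let rec uniquify (env : (string * string) list) (e : expr) : expr =
  match e with
  | Num _ -> e
  | Var x -> Var (List.assoc x env)
  | Let (x, bound, body) ->
      let x' = gensym x in
      Let (x', uniquify env bound, uniquify ((x, x') :: env) body)
  | Prim2 (op, e1, e2) -> Prim2 (op, uniquify env e1, uniquify env e2)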
Inlining
Implement function inlining for function definitions. In general, inlining functions can be tricky because of variable names; consider the following code:
(define (f x y) (+ x y))
(let ((x 2)) (let ((y 3)) (f y x)))
A naive inlining implementation might result in code like this:
(let ((x 2)) (let ((y 3)) (let ((x y)) (let ((y x)) (+ x y)))))
This expression, however, is not equivalent!
This problem can be solved by adding a simultaneous binding form like the one you implemented in HW3. It can also be solved by just ensuring that all variable and parameter names are globally unique.
You should implement a heuristic for when to inline a given function. This heuristic should involve both (1) the number of static call sites and (2) the size of the function body. For example, you could multiply some measure of the size of the function body by the number of call sites and see if this exceeds some target threshold. We recommend implementing your inliner as follows:
- Find a function to inline. This function should satisfy your heuristic and be a leaf function: one that doesn't contain any function calls.
- Inline static calls to the function and remove the function's definition.
- Go back to step 1. Now that you've inlined a function, other functions may have become leaf functions.
This process will never inline recursive functions, including mutually-recursive functions.
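Once names are globally unique, the substitution step itself can be very simple: wrap the callee's body in one let per parameter. The sketch below uses an illustrative toy AST; finding the call sites and deciding whether to inline are left to your traversal and heuristic:

(* Minimal toy AST, just enough for this sketch. *)
type expr = Num of int | Var of string | Let of string * expr * expr

(* Replace a call (f a1 ... an) with (let ((p1 a1)) ... (let ((pn an)) body)).
   This is only safe once all names are globally unique. Raises
   Invalid_argument if the argument count doesn't match the parameter count. *)
let inline_call (params : string list) (args : expr list) (body : expr) : expr =
  List.fold_right2 (fun p a acc -> Let (p, a, acc)) params args body

On the example above, if uniquify-variables has renamed f's parameters to x0 and y0, the call (f y x) becomes (let ((x0 y)) (let ((y0 x)) (+ x0 y0))), which evaluates the arguments exactly once and in their original order.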
Please describe your heuristic in a comment in the optimize.ml file.
Unrolling
This is a technique that involves 'unrolling' loops with a known finite number of iterations into copies of the loop body, thereby removing the loop control instructions. Although this often leads to an increased program size, the removal of comparisons and jumps can greatly improve its speed.
In the context of our language, this optimization should target recursive functions that the inlining pass will miss, such as:
(define (func n x) (if (= n 0) x (func (sub1 n) (+ 4 x))))
(print (func 5 0))
Without optimization, this will compile into a 'loop' in the assembly code which repeatedly compares the decrementing variable to 0 and jumps to different computations depending on the result.
With some precise pattern-matching, however, we can determine the number of iterations that this recursive function will go through based on the arguments passed in, and simply convert the function call into repeated computations.
In this case, unrolling should translate the above program into:
(print (+ 4 (+ 4 (+ 4 (+ 4 (+ 4 0))))))
Note that with Constant Propagation applied to this particular program after unrolling, it can be converted into something as simple as:
(print 20)
As with inlining, you should come up with a heuristic for when to apply unrolling. For recursive calls with a very large number of iterations, the increase in space may not be worth the removal of loop control instructions. It's up to you to decide which cases should be unrolled!
We recommend implementing unrolling as follows:
- Determine if a function call is recursive, and if it follows the form above (i.e. it has an incrementing or decrementing variable controlling the number of iterations, and the 'body' is independent from this counter variable)
- Determine the number of iterations that the loop will execute for with the given parameters, and check if it satisfies your heuristic.
- If it does, unroll the function call into successive iterations of the 'loop body', such as (+ 4 x) in the case above.
- If all calls to the function are unrolled, remove the function definition.
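For the expansion step itself, here is a minimal sketch over an illustrative toy AST. It assumes you have already matched the pattern, extracted the step body and the constant iteration count, and checked your heuristic:

(* Toy AST for illustration only. *)
type expr =
  | Num of int
  | Var of string
  | Prim2 of string * expr * expr

(* Replace free occurrences of `name` in `e` with `replacement`. *)
let rec subst (name : string) (replacement : expr) (e : expr) : expr =
  match e with
  | Num _ -> e
  | Var x -> if x = name then replacement else e
  | Prim2 (op, e1, e2) ->
      Prim2 (op, subst name replacement e1, subst name replacement e2)

(* Expand a counted recursion into `count` nested copies of the step body,
   where `acc_name` is the accumulator parameter. *)
let rec unroll (count : int) (acc_name : string) (step_body : expr)
    (acc_expr : expr) : expr =
  if count = 0 then acc_expr
  else subst acc_name (unroll (count - 1) acc_name step_body acc_expr) step_body

For the example above, unroll 5 "x" (Prim2 ("+", Num 4, Var "x")) (Num 0) produces the nested expression shown earlier.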
Common subexpression elimination
Implement common subexpression elimination. This optimization pass should find common subexpressions, add names for those subexpressions, and replace the subexpressions with variable references.
This optimization is more challenging to implement than inlining or unrolling. Our suggested approach is to:
- Optimize each definition (including the top-level program body) independently. For each definition:
- Make a list of all of the subexpressions in the program that don't include calls to (read-num) or (print)
- Find any such subexpressions that occur more than once
- Pick a new variable name for each expression that occurs more than once
- Replace each subexpression with this variable name
- Add a let-binding for each common subexpression
The most difficult part of this process is determining where to put the new
let-binding. Consider replacing the (identical) subexpressions e1, e2, and
e3 with the variable x. You'll need to find the lowest common ancestor e
of e1, e2, and e3, then replace it with
(let ((x e1)) e)
In order to find this lowest common ancestor, it will likely be useful to track the "path" to a given expression: how to get to that subexpression from the top level of the given definition. How exactly you do this is up to you.
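As a starting point, one way to find candidates is to count how often each compound subexpression occurs, using structural equality on expressions as the table key. The sketch below uses an illustrative toy AST and ignores both the read-num/print restriction and the placement problem described above, which you would still need to handle (for example, by recording a path alongside each count):

(* Toy AST for illustration only. *)
type expr =
  | Num of int
  | Var of string
  | Prim2 of string * expr * expr

(* Count occurrences of every compound subexpression. *)
let count_subexprs (e : expr) : (expr, int) Hashtbl.t =
  let counts = Hashtbl.create 16 in
  let bump e =
    let n = Option.value ~default:0 (Hashtbl.find_opt counts e) in
    Hashtbl.replace counts e (n + 1)
  in
  let rec go e =
    match e with
    | Num _ | Var _ -> ()
    | Prim2 (_, e1, e2) ->
        bump e;
        go e1;
        go e2
  in
  go e;
  counts

(* Candidates for CSE are the subexpressions that occur more than once. *)
let candidates (counts : (expr, int) Hashtbl.t) : expr list =
  Hashtbl.fold (fun sub n acc -> if n > 1 then sub :: acc else acc) counts []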
Peephole Optimizations
This is a very open-ended option, with a different flavor from the others.
All of the optimizations we've seen so far happen at the AST level. Peephole optimizations slot in somewhere else: they examine the list of assembly directives produced by the compiler and analyze them directly for patterns that can be simplified.
Because these optimizations can be very simple, you must implement at least three distinct peephole optimizations if you choose this option.
Note: If you do implement this optimization, provide example programs
where each of your peephole optimizations changes the assembly code (i.e., where the .s
file changes depending whether you use -p peephole or not), and specifically
mention these programs, along with the specific changes to the assembly code that
your optimization does, in your write-up. If you do not mention such programs,
you may not get credit for implementing your optimization.
You can find inspiration for possible peephole optimizations by simply opening
the assembly code (the .s file) generated after compiling some program. Simply
scan through the assembly code and find obvious opportunities to optimize, such
as redundant mov instructions. The entire point of this type of optimization
is to optimize the "obvious" opportunities that you would find by simply
scanning the assembly code. The name peephole comes from how these
optimizations typically scan the list of assembly directives in order, looking
at a "window" of just a few sequential instructions, and try to find ways to
generate simpler, but equivalent, assembly code.
For a simple example, if the compiler produced the code
mov rax, r8
mov r8, rax
mov rax, 10
this would be equivalent to a single directive
mov rax, 10
One way to "rewrite" this sequence of directives with general rules is to note that
mov R, S
mov S, R
simplifies to
mov R, S
and
mov R, V1
mov R, V2
simplifies to
mov R, V2
However, note that this is trickier than it might first appear! For example, if
V2 depends on R (for example, if it is a memory offset from R) the value
of V2 will be changed by the first mov, thus the "optimization" described
above may cause the program to error or compute incorrect values! You could
check that V2 does not reference R, or you could just apply this
optimization in cases where V2 is a constant. Similar considerations often
apply to other peephole optimizations, since many "simple" assembly instructions
may have non-obvious side-effects that you must consider to ensure that your
optimization does not change the program's behavior.
We've provided the structure for peephole optimizations in the stencil. Simply implement the peephole function in optimize.ml. The flag -p peephole will specifically enable this optimization pass. Note that validate_passes ensures that peephole is the last optimization, since it runs on the assembly code, which is generated after all the other optimizations are run.
Note that these optimizations are supposed to be simple! They operate on a simple data structure -- a flat list of assembly instructions -- and should implement simple optimizations. It is fine to implement peephole optimizations that are just several lines long each, as long as you describe how they make the compiled code simpler and how they maintain the program correctness in your write-up, and provide example programs where they apply.
Using OCaml pattern matching will make your life much easier while implementing
this optimization. For example, matching the mov pattern shown above could be
done as follows:
let rec peephole = function
  | Mov (Reg r1, _) :: Mov (Reg r2, src2) :: tl when r1 = r2 ->
    (* The first mov is dead: r1 is immediately overwritten by the second mov.
       (As discussed above, also make sure src2 doesn't reference r1, e.g. by
       only applying this when src2 is a constant.) You may also want to
       recursively apply `peephole` to the new instructions. *)
    Mov (Reg r2, src2) :: peephole tl
  | e :: tl ->
    (* Recursively apply `peephole` to `tl`. *)
    e :: peephole tl
  | [] -> []
This is the general form that your peephole optimizations should take. Note that
you need to be careful what you pass to recursive calls so that you don't end up
with infinite loops. You may also find it better to implement a couple
simple peephole optimization passes as separate functions which are chained
together (for example, by using the |> operator) in peephole.
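For instance, the top-level function might chain a few small sub-passes, each an ordinary function over the directive list. The helper below and its name are made up for illustration (it deletes self-moves like mov rax, rax, which have no effect):

(* A hypothetical sub-pass: drop moves from a register to itself. *)
let rec drop_self_moves = function
  | Mov (Reg r1, Reg r2) :: tl when r1 = r2 -> drop_self_moves tl
  | d :: tl -> d :: drop_self_moves tl
  | [] -> []

let peephole instrs =
  instrs
  |> drop_self_moves
  (* |> your_second_pass
     |> your_third_pass *)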
Type Inference
Implement type inference. The AST for the type inference stencil includes a type
called expr_type. This aptly named type denotes the type of an expr.
Additionally, all complex expr now include expr_type within them. The most
notable expr_type is HoleType of string. This is used to specify a type hole,
which means we currently do not know the type of this expr. When a program is
parsed, all complex expr are assigned an expr_type of HoleType, with a
unique numeric parameter (e.g. t0, t1, etc.). Your goal is to implement an
optimization pass that fills all these type holes.
Removing Runtime Type Checks
With compile time type inference, we can also remove runtime type checking. It's up
to you to decide when and how it's safe to do this. A complete strong type system will
need no runtime type checking at all. You will need to update compile.ml to
take advantage of the expr_type tags in the abstract syntax.
Implementation
There are many ways to implement type inference, but here is an outline of one way. You are not required to implement any particular inference algorithm; your type inference may be as aggressive as you would like.
- Assign Type Holes when Parsing
Done for you!
- Function Signatures
For each function definition, assign a fresh type hole to each argument and to the return type. Build a function environment mapping function names to their type signatures (using these holes). A function's type can be represented using PairTypes. For example, a function taking two arguments and returning a result has type: PairType(arg1_type, PairType(arg2_type, return_type))
- Collect Constraints
Traverse the AST and, for each expression, generate a set of type constraints that must be satisfied for the program to be well-typed. For example, Plus(e1, e2) requires that e1, e2, and the result are all NumType. You might think about creating a type for constraints, such as Eq of expr_type * expr_type. For Call, you can generate constraints from the arguments and one constraint saying the function's declared type equals the type we created in the previous step.
- Unification
Implement a unification algorithm to solve the set of constraints (a small sketch appears after this list). This algorithm should:
  - Substitute type holes with concrete types or other holes as constraints are solved.
  - Detect and report type errors (e.g., trying to unify NumType and BoolType).
  - Reject constraints like t0 = PairType(t0, NumType), since they are infinitely recursive.
  - Return a type substitution map from hole to expr_type (e.g. {"t0" -> NumType, ...}).
- Type Annotation
After unification, walk the AST and replace all type holes with their inferred types using the map returned from unification. Every expr should have its expr_type field filled with the correct type.
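To make the unification step more concrete, here is a compact sketch over a toy stand-in for expr_type, using a mutable table as the substitution map. Constraint collection and the final annotation walk are assumed to happen elsewhere, and the names are illustrative rather than the stencil's:

(* Toy stand-in for the stencil's expr_type. *)
type ty =
  | NumType
  | BoolType
  | PairType of ty * ty
  | HoleType of string

type constr = Eq of ty * ty

exception Type_error of string

(* Follow the substitution map until we reach a non-hole or an unbound hole. *)
let rec resolve (subst : (string, ty) Hashtbl.t) (t : ty) : ty =
  match t with
  | HoleType h -> (
      match Hashtbl.find_opt subst h with
      | Some t' -> resolve subst t'
      | None -> t)
  | _ -> t

(* Occurs check: reject constraints like t0 = PairType(t0, NumType). *)
let rec occurs (subst : (string, ty) Hashtbl.t) (h : string) (t : ty) : bool =
  match resolve subst t with
  | HoleType h' -> h = h'
  | PairType (a, b) -> occurs subst h a || occurs subst h b
  | NumType | BoolType -> false

let rec unify (subst : (string, ty) Hashtbl.t) (t1 : ty) (t2 : ty) : unit =
  match (resolve subst t1, resolve subst t2) with
  | NumType, NumType | BoolType, BoolType -> ()
  | HoleType h1, HoleType h2 when h1 = h2 -> ()
  | HoleType h, t | t, HoleType h ->
      if occurs subst h t then raise (Type_error "infinite type")
      else Hashtbl.replace subst h t
  | PairType (a1, b1), PairType (a2, b2) ->
      unify subst a1 a2;
      unify subst b1 b2
  | _, _ -> raise (Type_error "cannot unify")

(* Solve all constraints; the resulting table maps hole names to types. *)
let solve (constraints : constr list) : (string, ty) Hashtbl.t =
  let subst = Hashtbl.create 16 in
  List.iter (fun (Eq (t1, t2)) -> unify subst t1 t2) constraints;
  subst

For example, if Plus(e1, e2) contributed the constraints Eq (HoleType "t1", NumType), Eq (HoleType "t2", NumType), and Eq (HoleType "t3", NumType), then solve would map t1, t2, and t3 to NumType. When annotating, remember that a hole may resolve to a type that itself contains holes, so look types up through the map recursively.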
Notes
- The type inference system described here assigns each function a single, fixed type signature based on its usage. Thus we lose the polymorphism that previously allowed us to have functions take on and return multiple types.