home

compilers

Table of Contents

1. Overview

Early in January of 2024 I took a week off from work to write a compiler from scratch. I've always liked the idea of putting aside some time to dive into a technical project so when I saw I had plenty of PTO and this course by dabeaz I marked my calendar as OOO. I ended up having a great time and highly recommend the experience.

My goal going into the week was to follow along with the course and write a compiler in python and, if there's time, in clojure. There wasn't time but I wrote a compiler in clojure anyways. It's funny how that works. You can find the code here.

I had very high expectations for this project. I thought writing a compiler would unlock some sort of hidden wizard knowledge. I can't say that happened. And I can't say working on a compiler has given me knowledge that I can bring back to my day to day work. While compilers are very complex the general idea is simple:

  1. Convert raw text into a sequence of tokens.
  2. Organize tokens into a syntax tree.
  3. Manipulate the syntax tree to make it easier to work with.
  4. Produce your targeted code by essentially walking the.

Surprisingly, my biggest takeaways don't have anything to do with compilers.

2. Key Takeaways

Learning about zippers and getting to work with LLVM and WebAssembly were my technical highlights. But most of what I got out of the week wasn't technical. The big takeaways were high level ideas on writing software. Being constrained on time and complexity of writing a compiler pushed me out of my comfort zone and made me re-evaluate how I approach software.

2.1. Zippers

Zippers are one of the coolest data structures I've come across in a while. Zippers are a way to navigate tree like structures. They give you a "location" (essentially a pointer to a node in your tree) and a set of functions to call on that location. These functions essentially fall into two groups: manipulation and navigation.

Manipulation is fairly straightforward, it covers things like:

  • Adding children to the node (associated with the location).
  • Editing the node (associated with the location).
  • Replacing the node (associated with the location).
  • Removing the node (associated with the location).

Navigation is more interesting. You can navigate from a location in four directions:

  • Up: Go to the parent of the node (associated with the location).
  • Down: Go to the first child of the node (associated with the location).
  • Left: Go to the sibling before the node (associated with the location).
  • Right: Go to the sibling after the node (associated with the location).

This is very different from traditional tree navigationt. Normally, you use a top down approach. At any given node you consider the node and the children, not the nodes parent and siblings. Normally, you only go in the down direction.

What does this have to do with compilers? Well, a big part of compilers is working with syntax trees. In my case, I was getting stuck trying to do things like access the third child of the grandparent when a node matched a pattern. With zippers I just need to go up, down, right, right, right. It turned my spaghetti compiler code into what felt like an old school video game cheat code!

Here are some links with invaluable info for learning about zippers if you want to learn more:

2.2. Wait to Refactor

Wait to refactor until you have an E2E prototype. If you asked me about this before starting the project I would of told you "of course"! Everyone, except junior devs, knows you should build an MVP before investing time into development. Looks like I'm still junior.

On my first day I had a fairly simple task of writing a constant folding operation. Essentially, if someone writes some constant algebraic expressions in their code we should be able to evaluate it at compile time. I did this by writing a function that walks the syntax tree. When it comes across an addition node with two constant integers as children it replaces the addition node with a node representing the sum of the constants. At a high level, I figured I could break my compiler into different syntax tree transformations where each transformation would essentially be a simple pattern match and replace similar to what I had done with constant folding. So I built a tree walking and pattern matching system. It worked great and my constant folding routine could be represented in a couple lines of code! I felt great after finishing that.

Then I started to work on the deinitializer. The deinitializer takes each line that initializes a new variable and breaks it into two, a line to declare the variable and another to assign it a value. This lead to what I called the one to many node problem since I wanted to take one node and replace it with many. My perfect system syntax tree manipulation came crashing down because it wasn't designed to replace one node with many. I ended up hacking it into the system but later when more complicated transformation were required the system collapsed (RIP, 2024-2024). Luckily, this lead me to zippers.

This has changed my view of how I write software. Software is not about crafting the perfect system. It's about testing your assumptions. It's a way to test your ideas. And just like actual testing - you want to fail fast.

2.3. Developer Feedback Loops

I spent a lot of time writing tests for my code. This was a good idea because it allowed me to iterate quickly. It also helped me catch any breaking changes. As I said before, I wasn't able to come away with any wizard like knowledge but tight developer loops are pretty close because they allow you to develop fast without noticing it. It's funny because when your developing and your not running into any bugs it feels normal. And since this "is the way things should be" you don't notice all of the bugs you've avoided by having tests. And you avoid the context switching of having to manually test things, come up with new test cases, see if those test cases work, etc…

My takeaway from tight developer feedback loops is:

  • They really do help increase productivity.
  • You don't notice them when they're working (but they're working!).

3. Log

Day 1

  • Python: Added model
  • Python: Added formatter
  • Clojure formatter
  • Clojure: Constant Folding
  • Clojure: Deinitilaizer
  • Python: Constant Folding and deintilization
  • One to many node replacement issue

Day 2

  • Python: resolver
  • Python: Unscript

Day 3

  • Python: Part of LLVM
  • Python: Refactoring resolver
  • Python: Parser
  • Python: Complete LLVM
  • Clojure: Instaparse parser

Day 4

  • Clojure: Understood zippers
  • Clojure: Constant Folding, Deinitialization, resolver

Day 5

  • Clojure: Unscript
  • Clojure: Starting LLVM generation
  • Clojure: AST cleanup

Date:

Author: Zach Dingels

license