The State of Garnet, 2021
I am working on a new programming language called Garnet, so I wanted to write up a bit of a progress report about what it is, where it is, and where it’s going. This is partly to get my thoughts in order, partly to drum up interest (if it is deserving of any), and partly to solicit feedback from anyone who is interested. I’ve been working on it off and on for something like a year and a half now, fairly seriously for about a year, and I intend to keep going with it until either it has solved the questions I want to solve or it stops being useful because someone’s made something better.
What is Garnet?
The core question of Garnet is “what if Rust were simple?”
The origins of Garnet as a general idea are very old, but didn’t solidify until I learned Rust. I love Rust a lot, and it has basically replaced anything else I might use for most classes of problems I’m interested in. I essentially no longer write C#, F#, OCaml, Python or C for anything serious, if I can at all avoid it. For me, Rust actually is that good. However, Rust is not the last language that ever needs to exist, and like any real-world system that gets used for complicated stuff it has its share of flaws, downsides, and just outright mistakes. Garnet might or might not ever become anything large or popular, but I want to experiment with taking a lot of Rust’s ideas, making different decisions about how those ideas should work, and see what happens.
I’m not going to try to fully explain the language here. As a shortcut, imagine a simple Rust, or maybe a modern C, or maybe a systems-level Lua. Check out the WIP rationale notes for the design philosophy, and to see it in action you’re best off looking at the docs and tests directories in the source repo.
The Past
Or, what I did in 2021.
Basically, I got the compiler working end-to-end and compiling real programs that include types other than integers and booleans. There are integers of various sizes, there are basic enums and tuples and structs and lambdas, but no closures or generics yet. There is a Never type which works way better than I ever expected. There is some very basic expression-level type inference. All in all, it now looks like a programming language that can do something, rather than one that is fresh out of an undergrad compiler class. If I got around to adding arrays and pointers, it would basically be functionally on par with C.
Lots of small tooling and quality-of-life things were made, which are not exactly innovative but also make everything feel much more real. There is a command line program that you can actually use to compile programs. There is a test harness that compiles and runs a bunch of real programs written as standalone files and checks that their output is correct, which seems like a small thing but actually makes life so much better. There is a CI pipeline that runs tests and does code coverage, though it’s still a little jank. Error handling for type checking got way better. Symbol tables and string interning got way better. I even started writing an informal spec.
The backend was rewritten. I went around in circles on it about three times, going from outputting WebAssembly to creating an SSA intermediate representation to outputting WebAssembly again to realizing that I didn’t know enough to write my own Relooper equivalent and I should just do something simple, because a backend is not the important part right now. So now the backend simply outputs Rust code. This is stupid and also works very well. I’ll worry about the backend again once I know what the type system should look like.
Then… I spent about six months grappling with type checking, type systems, and generics. I am not finished doing that yet, either. The type checker has been rewritten about three times as well, and has grown to be about as sophisticated as is necessary to work without any kind of generics, but I banged my head into it a lot trying to go further without much luck. I did accidentally reinvent bidirectional type checking, which is cool, but only sort of got me closer to my goal. Plain HM type checking is not good enough, or at least did not help me with what I want. So I’ve spent the last couple of months hitting the papers, experimenting with the results, and writing a proof-of-concept type checker based on the “Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism” paper by Dunfield and Krishnaswami, 2020. Implementing the paper worked; now I am working on folding the rest of Garnet’s existing types into it, which also seems to be working so far. Then I should have a good basis for handling polymorphism in general, roughly on par with System F, and I can replace Garnet’s existing type checker with it.
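For the curious, the core idea of bidirectional type checking is small: two mutually recursive judgments, one that infers a type from an expression and one that checks an expression against an expected type, with annotations switching between the two modes. Here is a minimal sketch in Rust (the language the compiler happens to be written in); the names and the toy type language are purely illustrative and not Garnet’s actual internals.

```rust
// Minimal bidirectional type checking skeleton over a toy lambda calculus.
// Illustrative only; hypothetical names, not Garnet's real type checker.
#[derive(Clone, PartialEq, Debug)]
enum Type {
    Unit,
    Bool,
    Fn(Box<Type>, Box<Type>),
}

enum Expr {
    Unit,
    Bool(bool),
    Var(String),
    // Lambdas carry no annotation; they can only be *checked* against a function type.
    Lambda(String, Box<Expr>),
    Apply(Box<Expr>, Box<Expr>),
    // Type annotations are how checking mode gets kicked off.
    Annot(Box<Expr>, Type),
}

type Ctx = Vec<(String, Type)>;

/// Inference mode: figure out the type of `e` from `e` itself.
fn infer(ctx: &Ctx, e: &Expr) -> Result<Type, String> {
    match e {
        Expr::Unit => Ok(Type::Unit),
        Expr::Bool(_) => Ok(Type::Bool),
        Expr::Var(name) => ctx
            .iter()
            .rev()
            .find(|(n, _)| n == name)
            .map(|(_, t)| t.clone())
            .ok_or_else(|| format!("unbound variable {}", name)),
        Expr::Annot(body, ty) => {
            check(ctx, body, ty)?;
            Ok(ty.clone())
        }
        Expr::Apply(f, arg) => match infer(ctx, f)? {
            Type::Fn(param, ret) => {
                check(ctx, arg, &param)?;
                Ok(*ret)
            }
            other => Err(format!("applied non-function of type {:?}", other)),
        },
        Expr::Lambda(..) => Err("cannot infer type of unannotated lambda".into()),
    }
}

/// Checking mode: verify that `e` has type `expected`.
fn check(ctx: &Ctx, e: &Expr, expected: &Type) -> Result<(), String> {
    match (e, expected) {
        (Expr::Lambda(param, body), Type::Fn(param_ty, ret_ty)) => {
            let mut ctx2 = ctx.clone();
            ctx2.push((param.clone(), (**param_ty).clone()));
            check(&ctx2, body, ret_ty)
        }
        // Fallback: infer a type and compare it with the expected one.
        _ => {
            let got = infer(ctx, e)?;
            if &got == expected {
                Ok(())
            } else {
                Err(format!("expected {:?}, got {:?}", expected, got))
            }
        }
    }
}

fn main() {
    // (\x. x) annotated as Bool -> Bool, applied to true.
    let id = Expr::Annot(
        Box::new(Expr::Lambda("x".into(), Box::new(Expr::Var("x".into())))),
        Type::Fn(Box::new(Type::Bool), Box::new(Type::Bool)),
    );
    let app = Expr::Apply(Box::new(id), Box::new(Expr::Bool(true)));
    println!("{:?}", infer(&Vec::new(), &app)); // Ok(Bool)
}
```

The full algorithm from the paper adds an ordered context with existential variables so that the same infer/check split can handle higher-rank polymorphism, but the skeleton above is the shape of it.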
All in all the experience has given me a serious love/hate relationship with type theory and comp sci papers, mostly verging on the “hate” side. On the one hand, if I replace the variables α̂ and α̂′ in your algorithm with paramtype and rettype and then go “…wait, is THAT what it’s doing? That’s all? Oh.”, then you’re failing as a communicator. On the other hand, it turns out that simply typed lambda calculus is actually a really good baseline for doing type theory, because if you can express functions, a unit type, and the various kinds of generics involved with System F or whatever you’re going for, then adding a lot of the useful types is actually not much more work.
I’ve also concluded that I’m probably the only person on the face of the planet who wrote a parser and was basically happy with it on the first attempt. I guess some things have to go well sooner or later.
The Present
Or, what I want to do in 2022.
Garnet is very much still in the embryonic phase. A compiler exists, the language is technically Turing-complete, and it has a lot of the foundational features like functions, loops, basic tuples and structs, and so on. However, it’s not able to do anything particularly interesting yet.
Major features it does not have yet are:
- Borrow checking
- Module system
- Generics and something like traits/objects/interfaces/typeclasses
These are all sooooorta interrelated, but I will try to tease them apart.
Borrow checking
Okay I have to admit I have no idea what I’m doing here, at least in a formal sense. I am on very good terms with Rust’s borrow checker and we very rarely get into fights, but I have only a general idea about how it actually works under the hood and what the algorithms are like. IMO there’s no particular need to do better than Rust’s borrow checker, but there’s still a lot to learn to be equally good.
So, this is both easy and difficult.
Module system
This is the simplest feature at its core. Having namespaces in code is useful, and having those namespaces stuck into a separate file is pretty convenient, and they may or may not correspond to separate compilation units. On the other hand, structs produce namespaces already, so I really may end up doing something similar to Zig or Lua and making modules simply be structs. We can evaluate them at compile time, remove any unnecessary indirection, and compile the function calls to code just as if those structs never existed. But perhaps if we also preserve those structs explicitly, we can use them as interfaces, vtables, DLL export tables, or other handy bits of runtime metadata. There’s lots of potential for different features here; I just have to get the basic structure down and then experiment.
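To make that a bit more concrete, here is a tiny Rust sketch of the idea (illustrative only; the names are made up and Garnet’s actual syntax will differ): a “module” is nothing but a struct whose fields are functions, which the compiler can either constant-fold away or keep around as a vtable-like value.

```rust
// A "module" modeled as an ordinary struct whose fields are functions.
// Illustrative sketch only; names are hypothetical.
struct MathModule {
    add: fn(i32, i32) -> i32,
    mul: fn(i32, i32) -> i32,
}

fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn mul(a: i32, b: i32) -> i32 {
    a * b
}

// A concrete module instance, fully known at compile time, so calls through
// it can in principle be flattened into direct calls.
const MATH: MathModule = MathModule { add, mul };

fn main() {
    // Used statically...
    println!("{}", (MATH.add)(2, 3));

    // ...or passed around as a runtime value, at which point it is
    // effectively a vtable / interface / export table.
    let m: &MathModule = &MATH;
    println!("{}", (m.mul)(4, 5));
}
```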
Which leads us to…
Generics
This is the biggie. IMO some sort of mechanism for safe code generation based on types is essential for a strongly typed language in the 21st century. However, it makes everything exponentially more complicated.
It is easy to make types and functions accept arbitrary data types, as long as all they can do is move data around. The difficulty comes when you want to do stuff with your data based on partial information about it, and it becomes more difficult again when your data can itself be code. Various languages have different approaches to this, Rust traits and Haskell typeclasses and C# objects and SML modules; the basic principle of “here is a set of functions you can apply to this data type” is pretty clear. But when you nest these, or generate them based on other type constraints, things can quickly become impossible to typecheck or impossible to generate efficient code for.
So this is a design space that needs some exploration. Rust’s traits are very nice, but I dislike the constraints around trait objects and coherence, and how much they invite metaprogramming without quite living up to its full possibilities. Zig’s essentially macro/template-ish code generation approach is very compelling, but has the problem where things can be invisibly Wrong until they are instantiated and used. Object-ish systems are quite well explored, but have a lot of holes and do a lot of their work at runtime via slower dynamic dispatch. If our traits/interfaces are structs, then can we define our generics simply by having functions that take a type and return a struct implementing the generic for that type? SML’s modules work a lot like this, and the research language 1ML demonstrates you can make it work with just functions and first-class types. This starts out seeming easy, then you realize it means you can have structs with types as their members and have to figure out how to actually implement that, and it kinda goes downhill from there. Can we do something like that robustly, without needing arbitrary compile-time code execution that may take infinite amounts of time? Moreover, can we do it without it being a total pain in the ass?
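As a sketch of what that might feel like, here is the same idea crammed into today’s Rust (purely illustrative, with made-up names, and using boxed closures where a real compiler would want to specialize everything away at compile time): an “interface” is a struct of functions, and a “generic” is just an ordinary function from one such struct to another, much like an SML functor.

```rust
// An "interface" for ordering values of type T, modeled as a plain struct.
// Illustrative sketch only; names are hypothetical.
struct OrdModule<T> {
    less: Box<dyn Fn(&T, &T) -> bool>,
}

// A concrete module instance: how to order i32s.
fn i32_ord() -> OrdModule<i32> {
    OrdModule {
        less: Box::new(|a, b| a < b),
    }
}

// A "generic" as an ordinary function from a module to a module:
// given an ordering on T, build the lexicographic ordering on pairs.
fn pair_ord<T: 'static>(ord: OrdModule<T>) -> OrdModule<(T, T)> {
    OrdModule {
        less: Box::new(move |a, b| {
            (ord.less)(&a.0, &b.0)
                || (!(ord.less)(&b.0, &a.0) && (ord.less)(&a.1, &b.1))
        }),
    }
}

fn main() {
    let ord = pair_ord(i32_ord());
    assert!((ord.less)(&(1, 5), &(2, 0)));
    assert!(!(ord.less)(&(2, 0), &(1, 5)));
    println!("lexicographic ordering works");
}
```

The interesting (and hard) part is letting the compiler evaluate enough of this at compile time that none of the indirection survives, which is exactly the 1ML-style question above.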
I also really want it to add some useful special cases to a more general system. In Rust there are a lot of functions that have equivalents that operate on owned objects, on references, and on mutable references. This irks me, and occasionally (such as when making builder-pattern objects) causes real problems. I think being able to be generic over ownership would make the language both simpler and nicer. We will see, though.
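For a concrete example of the duplication I mean, here is the usual builder-pattern dilemma in today’s Rust (a generic illustration, not code from any particular crate): the same logical operation tends to get written once consuming self and once borrowing it, because the two styles have different call-site ergonomics.

```rust
#[derive(Default, Debug)]
struct Config {
    verbose: bool,
    threads: usize,
}

impl Config {
    // Consuming version: pleasant for one-shot chained construction,
    // but each call moves the value, which hurts conditional setup.
    fn verbose(mut self, v: bool) -> Self {
        self.verbose = v;
        self
    }

    // Borrowing version of the same operation: pleasant for tweaking a
    // value in place, but the chain no longer ends in an owned value.
    fn set_verbose(&mut self, v: bool) -> &mut Self {
        self.verbose = v;
        self
    }
}

fn main() {
    // Owned, chained style.
    let a = Config::default().verbose(true);

    // Borrowed, in-place style.
    let mut b = Config::default();
    b.set_verbose(true);

    println!("{:?} {:?}", a, b);
}
```

Being able to write that operation once, generic over whether self is owned or borrowed, is the kind of special case I would like the general mechanism to absorb.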
The Future
Or, what isn’t going to get done in 2022.
Backend
The compiler is written in Rust, and rather cheekily also outputs Rust code and calls rustc to compile it. Obviously, this does not really fulfill any of my “easy FFI” or “quick to compile” goals, but should be good enough to bootstrap itself with. Keeping a backend around that emits Rust, C, or some other high-level language might actually be quite useful for making archival snapshots or bootstrapping on new platforms – OCaml does something like this, for example. But the intention is to eventually have a Real Backend of some kind or another.
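Mechanically that pipeline stage is about as unglamorous as it sounds. A minimal sketch (hypothetical code, not the compiler’s actual internals) is just “write out a .rs file and shell out to rustc”:

```rust
use std::fs;
use std::process::Command;

// Hypothetical final stage of the compiler: take already-generated Rust
// source text, dump it to disk, and let rustc do the hard parts.
fn emit_and_compile(rust_source: &str, out_name: &str) -> std::io::Result<()> {
    let src_path = format!("{}.rs", out_name);
    fs::write(&src_path, rust_source)?;

    let status = Command::new("rustc")
        .arg(&src_path)
        .arg("-o")
        .arg(out_name)
        .status()?;

    if !status.success() {
        // Any rustc error here is really a bug in our code generator.
        panic!("rustc rejected generated code for {}", out_name);
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    emit_and_compile("fn main() { println!(\"hello from generated code\"); }", "hello")
}
```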
What to use for its backend, though? There aren’t actually that many general-purpose compiler backends out there; your choices are basically GCC or LLVM, and both of those are large hulking monsters I would really prefer not to have to deal with. This basically leaves two options: use a less-developed but more lightweight compiler backend like Cranelift or QBE, or write our own backend from scratch. On the one hand, using existing code lets us benefit from other people’s work, but also shackles us to the limits of that work. On the other hand, writing a fairly simple backend is a large but fairly manageable task, while writing a fancy backend is a lifelong adventure I’m not terribly interested in. Not sure which way is best right now. Both have upsides and risks.
Properties
I want the compiler to be able to explicitly prove and promise that certain functions or types have certain properties. Rust does this a lot with traits and derive macros, but that often comes down to Compiler Magic in the end, such as the Copy trait. Mainly I want to be able to say that certain functions do not panic, or do not allocate dynamic memory, or do not use unbounded stack space, things like that. Then if I violate one of these properties by accident the compiler can catch it for me. This seems fairly straightforward to begin with, but as it gets fancier you start edging into algebraic effects, which are a topic of open research. So we may go there, or we may cherry-pick the most useful bits of functionality from them without the complexity of the whole system, or the whole idea may be more trouble than it’s worth. We will see.
Tooling
Tooling is Important. Rust is a great example of what a difference modern tooling can make, and C++ is a great example of what life is like when a language makes that impossible. So, if I want Garnet to become a Big Boy Language, it must have at least:
- Build/packaging tool (cargo)
- Package index (crates.io)
- Formatter (rustfmt)
- Doc generator (rustdoc)
- Basic editor integration (syntax highlighting and basic context info)
If things get more serious, more powerful tools also become useful:
- Linter (clippy)
- IDE integration (language server)
Frankly, I’m halfway convinced I could just make cargo believe that Garnet code is Rust, and invoke garnetc appropriately. That way we could just use an existing build tool. This isn’t necessarily a good idea, but it would be hilarious if it works.
None of this exists yet.
Conclusions
So yeah! The easy stuff is mostly done. Spending six months grappling with type theory and still not having a real solution is… pretty demoralizing, but not particularly surprising in retrospect. I know the shape of what I want so I think I just have to hang in there and get there in my own fashion, and I am now reasonably certain I am on the right track. Math has never been my friend but I know I shall bend it to my cruel whim sooner or later. Always have before.