The State of Garnet, 2023
Garnet is my attempt at making a programming language that could be summed up as “What if Rust was simple?” The goal is to make a small, borrow-checked language aimed at lower-level programming, designed for portability and robustness. It’s the end of another year, so let’s take a look at what’s new.
Last updated December 2023.
Previous years:
What got done in 2023
- Finished writing a basic type checker with inference and generics; it now mostly works as long as you don't get very fancy
- Made newlines syntactically significant, which dramatically unfucked old syntax issues
- Tried to improve generic type inference, which dramatically conjured new syntax issues, so it’s probably not worth it
- Sum types now work, as do arrays and tuples
- Took a stab at monomorphization, which doesn’t work yet but is a blocker ’cause we need to be able to generate some code that isn’t valid Rust. Since we just compile to Rust for now, this is a bit of a nuisance.
- Experimented with a namespace/import system; it’s on hold until there’s a real plan for separate compilation.
- Burned myself out and decided to spend a couple (okay, four) months working on a side project, which conveniently made me get around to learning Elixir too. Elixir is a pretty wild ride and I love it.
I also did a lot of cleanup and minor convenience-level language improvements, like adding `const`s and propagating them into generated code where possible, sorting out type and operator syntax a bit, adding nicer support for compiler intrinsics, making a slightly better autoformatter and integrating it into the language test harness, adding a verification pass to the type checker to make sure it actually finds concrete types for everything in the program and doesn't leave anything un-inferred, etc etc. There's still a lot of small pieces left to do though; for example, bit operations like shifts and xors are currently just compiler builtin functions, there's no real way to convert values between types, and there's no characters or even constant strings. It would take a day or (more realistically) four to add each of those, they're just less important than getting the foundations down more solidly.
So right now, this is a small example of the hypothetical Garnet I am aiming for, implementing the PCG32 pseudorandom number generator:
```
type Rand32 = struct
    state: U64,
    increment: U64,
end

--- Mutably borrows a `Rand32` and produces a random
--- `U32` in the range `[0, 2^32-1]`.
fn rand_u32(self ^uniq Rand32) U32 =
    -- We have to explicitly dereference borrows with the ^ operator,
    -- which is postfix. Not sure if we'll eventually have Rust-style
    -- auto-deref of borrows, don't feel like we need it yet.
    let oldstate U64 = self^.state
    -- The "method" syntax foo:bar() is just equivalent to calling bar(foo).
    -- I'm considering replacing it with a more functional-style pipeline
    -- operator, foo |> bar(), might play with that someday.
    self^.state = oldstate
        :wrapping_mul(RAND32_MULTIPLIER)
        :wrapping_add(self^.increment)
    let xorshifted = cast(|U32| ((oldstate >> 18) xor oldstate) >> 27)
    let rot = cast(|U32| oldstate >> 59)
    xorshifted rotr rot
end

const RAND32_DEFAULT_INC U64 = 1442695040888963407
const RAND32_MULTIPLIER U64 = 6364136223846793005

--- Creates a new RNG state with the given seed
--- and the default increment value.
fn new_rng(seed U64) Rand32 =
    let mut rng = Rand32 {
        .state = 0,
        .increment = RAND32_DEFAULT_INC,
    }
    rng:rand_u32()
    rng.state = rng.state:wrapping_add(seed)
    rng:rand_u32()
    rng
end

fn main() =
    let mut rng = new_rng(4)
    let num = rng:rand_u32()
    println(num)
end
```
Then when you rewrite this into something the Garnet compiler can currently actually build, you get this:
```
type Rand32 = struct
    state: U64,
    increment: U64,
end

-- There's no borrows or references yet, or even raw pointers, so for now we just
-- take the struct as a value and return a tuple containing a new, modified
-- struct, immutable-language style.
fn rand_u32(self Rand32) {Rand32, U32} =
    let mut self = self
    -- Currently declared types are strictly nominal, you can't do much with them
    -- besides pass them to a function that expects that type. This includes
    -- accessing the fields of the struct. So the $ operator is an "unwrap"
    -- operator to access the inner value, like doing `foo.0` with a newtype
    -- struct in Rust. It should be able to be inferred away when I can bother.
    let oldstate = self$.state
    -- No wrapping/saturating/etc math operators yet, so this will actually
    -- panic on overflow.
    self$.state = oldstate * RAND32_MULTIPLIER + self$.increment
    -- Bitwise operators are just built-in functions right now.
    let xorshifted = __rshift_u64(__bxor_u64(__rshift_u64(oldstate, 18), oldstate), 27)
    let rot = __rshift_u64(oldstate, 59)
    -- Type conversions are also builtin placeholders.
    let res = __ror_u64(xorshifted, __u64_to_i32(rot))
    {self, __u64_to_u32(res)}
end

const RAND32_DEFAULT_INC U64 = 1442695040888963407
const RAND32_MULTIPLIER U64 = 6364136223846793005

fn new_rand32(seed U64) Rand32 =
    let mut rng = Rand32 {
        .state = 0,
        .increment = RAND32_DEFAULT_INC,
    }
    -- No pattern matching yet, so we index the returned tuple to access the
    -- first element.
    rng = rng:rand_u32().0
    rng$.state = rng$.state + seed
    rng = rng:rand_u32().0
    rng
end

fn main() =
    let mut rng = new_rand32(4)
    let res = rng:rand_u32()
    __println_u32(res.1)
end
```
I’d totally forgotten I’d implemented unsigned integers; neat! So there’s a lot of placeholders and fiddly edges, but honestly it’s better than I thought it was, which is of course part of the point of doing a retrospective like this.
Type system
The main blocker on handling a lot of those fiddly edges is improving Garnet's type system, which is really where 90% of the effort has gone. It's also the part where I expose how little I know, so let's get it out of the way first. Garnet's type system is basically the younger sibling of Rust's, which is a younger sibling of ML's, but there's still a lot of design space to play around in there. Beyond the modules-vs-typeclasses thing I went through last year, here are the main points that I've mostly settled on for Garnet:
- Garnet has Rust-like generics that get typechecked and monomorphized – as opposed to, for example, Zig, which treats generics as template-y macro-y things that generate new code before typechecking, or OCaml, which last I checked doesn't monomorphize most generics.
- Limited-at-most dependent types. Ie, you can't treat arbitrary values as part of types, such as being able to write `fn square(Int) returns an Int >= 0` and have the type system prove it correct. Arrays with integer lengths can be treated like a special case of this, and if there are other special cases they'll be limited to fairly-simple integers and maybe C-like enums. This is mainly because whole-hog dependent types tend to creep towards having a Turing machine in your type system, and also from what I've seen when people want dependent types they really want to be able to do math on type sizes or reflect on stuff at compile time. So I'm not sure it's worth the cost as a general-purpose tool.
- Higher-kinded types are a bit of a morass that I'm still undecided on. The question that HKT's answer is: can you write a generic type constructor that takes generic parameters? Can a function take a type like `Map(K,V)`, where `Map` is a generic type, and let you pass it a `BTreeMap(K,V)` or a `HashMap(K,V)`, without writing the equivalent of separate trait impls for both? HKT's are nice, but are they nice enough to be worth the added complexity? And is a language without them actually that much simpler, given that you can often work around their absence with some effort – and how nasty is that workaround, really? I dunno. Every time I look at them I come up with a different conclusion.
- So for the math-y among us, I guess this is fairly close to System Fω?
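For a concrete feel of the workaround in question, here's roughly what you end up writing in today's Rust instead of passing `Map(K,V)` around: a hand-written trait with the "separate trait impls for both" that HKT's would make unnecessary. (The `MapLike` trait and its methods are invented for illustration.)

```rust
use std::collections::{BTreeMap, HashMap};

// Without HKTs we can't abstract over the type constructor `Map` itself,
// so we define a trait and implement it once per concrete map type.
trait MapLike<K, V> {
    fn insert_kv(&mut self, k: K, v: V);
    fn size(&self) -> usize;
}

impl<K: Ord, V> MapLike<K, V> for BTreeMap<K, V> {
    fn insert_kv(&mut self, k: K, v: V) { self.insert(k, v); }
    fn size(&self) -> usize { self.len() }
}

impl<K: std::hash::Hash + Eq, V> MapLike<K, V> for HashMap<K, V> {
    fn insert_kv(&mut self, k: K, v: V) { self.insert(k, v); }
    fn size(&self) -> usize { self.len() }
}

// Generic over "any map-like thing", but only because we wrote
// the boilerplate impls above by hand.
fn fill<M: MapLike<u32, &'static str>>(m: &mut M) {
    m.insert_kv(1, "one");
    m.insert_kv(2, "two");
}

fn main() {
    let mut b = BTreeMap::new();
    let mut h = HashMap::new();
    fill(&mut b);
    fill(&mut h);
    assert_eq!(b.size(), 2);
    assert_eq!(h.size(), 2);
}
```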
Details about Garnet’s type system that don’t fall into these categories:
- Garnet will have move semantics with optional linear types. Values that are non-`Copy` will mostly be drop-checked automatically a la Rust, but you will have the option to label types "must be destroyed explicitly", in which case the compiler will force you to call their destructors when they go out of scope. This lets you handle the cases where you need to drop objects in a way that doesn't fit into the `Drop` API. Basically all the actual details of this are up in the air though.
- There's about 25% of an algebraic effects system, which I call "properties". "Effects systems" a la Koka seem to really be made of two parts which, to me, seem unrelated: declaring what side-effects functions can have, and attaching non-local control flow to those side-effects. I just want to be able to assert things about functions in Garnet and have the compiler check my work for me. For me the interesting effects are mostly fairly low-level things like `const`/`comptime`, "no infinite loops", "no dynamic allocation", "no panics", "no stack overflow", etc, and only like half of them actually need compiler intervention to check. For all the rest, all you really need is the ability to tell the compiler "this specific function has an effect called `X`", and then anything that doesn't call it has your "doesn't do `X`" property. For example, any function has the property "doesn't write to stdout" as long as it doesn't call another function that writes to stdout anywhere in its call tree, and you just have a low-level `write()` function somewhere that has the property "this writes to stdout". Then you can parameterize higher-order functions on it, so `map(List, F)` has the "doesn't write to stdout" property as long as `F` does as well. Naturally this gets tricky sometimes when you want to dodge side effects, such as with logging or debugging functions, but figuring out how to make that work well is part of the fun.
- As described, those properties are ways of describing constraints on functions, but the original motivation for properties was looking for something better than Rust's marker traits on data. Beyond `Send`/`Sync`, I want to be able to express things like `Zero` for types where zero initialization produces a valid value, or `ConstDefault` where a non-zero but fixed bit pattern is a valid initializer, or `Total` where any bit pattern is valid. Hopefully this lets you specify some things that tend to be sharp-edged UB in Rust or C, such as "this chunk of uninitialized memory will contain garbage but it's still a valid integer", or "casting this integer into a file handle is a bug because you can't just conjure valid file handles out of nowhere". Those sorts of things are honestly probably more useful than function properties, but the semantics around them are also harder to think about, so my ideas about how to use them are less mature.
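As a rough sketch of what a `Zero`-style property corresponds to in today's Rust, you'd reach for an unsafe marker trait. (The `Zero` trait here is hypothetical, mirroring the name in the text; crates like `bytemuck` provide the production version of this idea.)

```rust
// Hypothetical `Zero` marker: an unsafe trait asserting that the
// all-zeroes bit pattern is a valid value of the implementing type.
unsafe trait Zero: Sized {
    fn zeroed() -> Self {
        // Sound only because implementors promise all-zero bits are valid.
        unsafe { std::mem::zeroed() }
    }
}

// Fine: every bit pattern of these types is valid, including zero.
unsafe impl Zero for u64 {}
unsafe impl Zero for [u8; 16] {}

// Deliberately *not* implemented for, say, a file-handle newtype:
// you can't conjure a valid handle out of a zeroed integer.

fn main() {
    assert_eq!(u64::zeroed(), 0);
    assert_eq!(<[u8; 16]>::zeroed(), [0u8; 16]);
}
```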
Compilation model
This is another tough part that I haven’t gotten much real work done on yet, but it’s also one place where Garnet’s goals depart from Rust’s the most. Rust makes some very deliberate design decisions: Compilation units are big, everything is statically linked, and there is no stable ABI by default. This makes the compiler design easier, linking and versioning of code easier, and optimizations more aggressive. But it also has some downsides: It’s harder to make compilation parallel, it’s harder to make compilation incremental, and it’s more work to distribute code artifacts (which is not always necessary, but is pretty useful sometimes).
So instead Garnet will have:
- API versioning built into the toolchain by default, with semver checking. In a strongly typed language there’s basically no reason I know of why the compiler shouldn’t be able to detect breaking API changes 100% of the time, or at least 99% of the time with a 1% false-positive rate and 0% false-negative rate.
- Compilation needs to be embarrassingly parallel, or at least have as few single-threaded bottlenecks as we can manage. I’m not sure I’m quite as dedicated to it as Zig, which guarantees you can lex each line in a file in parallel, but there’s a lot you can do to design away those bottlenecks if you try.
- There will be a stable ABI, though that's honestly pretty damn hard and a project for the far future. The current lowest-common-denominator ABI is the C ABI for whatever OS you are on, but I would also like to have a stable Garnet ABI that can express things like generics and borrows, including metadata and maybe partially-compiled code for another compiler to pick up on and work with. Having this can also make fast parallel compilation easier, since the compiler can read some metadata about a compiled artifact and know what to do with it, rather than having to start entirely from the source. Interop with other languages like C and Rust is a necessity, but honestly my dream is that it will eventually be useful and desirable to add a `repr(Garnet)` directive to Rust and C compilers. How to achieve all this? Eeeeh, ask me in a couple years.
- Because of all of this, using the existing C-descended linking model is prooooobably a bad move? The biggest issue with it in my mind is inlining. Inlining is really important, and not being able to trust compilers to do it results in a surprisingly large pile of terrible shit in the C/C++ code that people actually write. But currently your options are basically "don't have inlining across compilation units" (C/C++), "have only one compilation unit" (Rust/Go), or "trust your linker's LTO magic to do something sensible". I don't like the last option, partially because a linker doing LTO magic tends to be one gigantic bottleneck in a program's otherwise-parallel compilation. (Linking is always the step at which building a giant program like Chrome or LLVM suddenly consumes 60 GB of memory and cacks out.) On the other hand, the issue of linking is a long-term performance concern, and I'm certainly not going to sit down and start writing a new linker tomorrow. Zig and Swift are both doing some fascinating work exploring this space, so there's nothing wrong with letting them lead the way on that a bit and learning from their work.
Neat features that exist right now
Okay, I've done lots of talking about what I want to do with Garnet, but there are some small things that Garnet does right now that just make me happy, so I'm going to talk about those. These are mostly places where I feel like Rust has some nasty rough edges and I wanted to make them nicer. There's also a big pile of smaller mostly-cosmetic things that I like in my own opinionated way, like having only one type of doc comment or making struct and tuple literal syntax mirrors of each other, but while they spark some joy in me they don't actually Matter outside my own head.
More composable unary operator syntax
Basically, most unary operators in Garnet are postfix, particularly the reference and dereference operators. This is similar to Zig, and hopefully will become the standard idiom for most languages in the future for reasons I shall now explain, somewhat similar to how types-compose-prefix is now an unspoken standard among Go, Zig, most of Rust, and a few other more minor languages. (Garnet does that, too.)
This solves one significant wart in Rust: the operator precedence of chained unary operators. Basically, in Rust when you write `*foo.bar.baz` the `*` binds less tightly than the `.`, so you're writing `*(foo.bar.baz)`. If you want to write `(*foo).bar.baz` then you have to do that explicitly. However, in Rust you usually don't notice, because the compiler will insert reference or dereference operators for you, with some consistent rules that actually work quite well. So if you need to write `(*foo).bar.baz` or even `*(&(*(*foo).bar).baz)` then you can actually write `*foo.bar.baz` and rustc will usually be able to just Figure It Out For You. Except that when you're using raw pointers you really want dereferences to be explicit, so rustc doesn't do ref/deref inference for raw pointers. So in that one damn case you're stuck writing `*(&(*(*foo).bar).baz)` anyway. It's cosmetic, but it also gets old really quick. C gets around this by having both `.` and `->` operators, where `foo->bar` is sugar for `(*foo).bar`, and it works but I kinda dislike that too.
Meanwhile, in Garnet `^` (aka `*`) and `&` are postfix, so to write Rust's `(*foo).bar.baz` in Garnet you write `foo^.bar.baz`, and to write `*(foo.bar.baz)` you write `foo.bar.baz^`. If you want to write `*(&(*(*foo).bar).baz)`, then well, you just do `foo^.bar^.baz&^`. No parens needed, and instead of having to read it from the inside out you just read it from beginning to end. Because of this Garnet has no auto-deref rules; it may grow some eventually, but IMO it doesn't really need them so far.
This syntax is hardly new or exotic: it's exactly the logic behind Rust's postfix `.await`, Zig's pointer dereferences are postfix `.*`s, and Pascal did the same thing with its own pointer syntax in the heckin' 1970s. But it still makes me happy to replace a subtle complication with a nice simplification.
Nominal vs. structural types
Ok, this one is actually more than cosmetic. In Rust, tuples are structurally typed, so every `(i32, String)` is interchangeable with every other one, while structs are nominally typed, so `struct Foo { x: i32, y: String }` and `struct Bar { x: i32, y: String }` are entirely distinct types and you can't assign some `Foo` to some `Bar` even though they have the same fields. Both nominal and structural types are useful, so in Rust you can turn anything into a new nominal type with the "newtype struct" pattern, `struct Foo(Whatever);`. But you can't make a struct into a structural type, and "newtype structs" always seemed like a kinda half-assed solution both syntactically and semantically, and there's no real reason for it that I can tell. It just ended up that way.
In Garnet, tuples are structural, so every `{I32, String}` is interchangeable with every other `{I32, String}`, and structs are also structural, so a variable of type `struct x: I32, y: String end` is interchangeable with every other variable declared with that type. (Yeah, the syntax is too verbose for inline types; idk what would be better.) It's just an anonymous struct type, and its only identifying features are its field names and types, so it's also the same type as a struct written `struct y: String, x: I32 end`. In this it's a little like OCaml's and C's structs, which I frankly never particularly liked. But then we make the `type Foo = ...` declaration create a new nominal type, just like Rust's "newtype structs", and suddenly everything just slots together.
`type Foo = struct ... end` creates a nominal struct like Rust's, and `type Thing = {...}` creates a nominal tuple like Rust's newtype structs. Then you can have a `$` operator to "unwrap" or "take apart" the nominal part of the type to get at the underlying structural type. A "type constructor" like `Foo { .x = 3, .y = "something" }` or `Thing {a, b, c}` is just a function which takes a single argument, which is an anonymous struct literal, and returns an instance of that struct. So you can do things like `some_array.map(Foo)` to turn an array of tuples or anonymous structs into an array of `Foo`s. In contrast, in Rust if you make a newtype struct `Foo` then its constructor is a function, but if you declare a struct type `Foo` then its constructor is not a function, and you just suck it up and deal. (And now that I think of it, Garnet's `$` "unwrap" operator is just a function too, which dispatches on the type of its argument and does the opposite operation.)
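For comparison, the Rust side of that asymmetry is easy to demonstrate: a tuple-struct constructor is a plain function you can pass to `map`, while a braced struct's is not. (A small sketch; the `Meters` and `Point` types are made up for illustration.)

```rust
struct Meters(f64);             // tuple struct: `Meters` *is* a function
struct Point { x: i32, y: i32 } // braced struct: no constructor function

fn main() {
    // The newtype constructor can be passed to map directly:
    let lengths = vec![1.0, 2.5];
    let in_meters: Vec<Meters> = lengths.into_iter().map(Meters).collect();
    assert_eq!(in_meters.len(), 2);

    // For a braced struct you have to write the closure by hand:
    let pts: Vec<Point> = vec![(1, 2)]
        .into_iter()
        .map(|(x, y)| Point { x, y })
        .collect();
    assert_eq!(pts[0].x, 1);
}
```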
Then under the hood, both tuples and structs compile to tuples, which is to say, ordered, heterogeneous sequences of values. So the plan is to have a "layout" concept of some kind that can turn a struct into a tuple or vice versa, so you can tell the compiler that you can turn a `{I32, String}` into a `struct x: I32, y: String end` by mapping the first field to `x` and the second field to `y`, or vice versa. This gives you a way to apply a layout that imposes an ordering on a struct, or field names on a tuple, and suddenly tuples and structs become interchangeable with each other and a lot of the annoying little decisions you have to make when choosing between them in Rust go away. And somewhere down the line (with additions to describe padding and alignment of fields) this will let you express things like Rust's `repr(C)`, which defines an ordering for struct fields in memory as well, except instead of magical compiler builtins with fixed rules they'll just be something you can define yourself.
Oh, I nearly forgot: this also meshes nicely with sum types. Basically each variant in a sum type, aka an `enum` in Rust, just takes one argument. That argument can be an anonymous struct or tuple, or it can be a nominal type, or whatever. It's a bit rough still, but have you ever wanted to write a function that only takes or returns one particular variant of a sum type while still being the same type? Or one that takes a type constructor function for a particular variant and calls it with some particular data? Or one that takes a sum type's discriminant and a tuple of values and constructs one of several variants with identical fields? In Garnet that should all be possible, or at least smoother and less manual than in Rust, because the contents of a type and the name attached to it are always separate things.
Coroutines
So in June 2023 Graydon Hoare, the creator of Rust, wrote a blog post titled The Rust I Wanted Had No Future. It’s full of fascinating little nuances here and there, many of them about design goals and how that interacts with people management/diplomacy/politics. It paints a pretty interesting story where Graydon wanted a higher-level language with some low-level features, something more in the vein of Swift, C# or Elixir. But other people saw that Rust could be a safe low-level language instead, and really wanted that, and Graydon basically ended up bowing to their wishes. I’m pretty glad he did, too. I wouldn’t say that the Rust he wanted had no future (Swift exists after all, and Graydon spent several years at Apple working on it) but I think the Rust we got fills a niche that has been unfilled for far too long.
One of the interesting features he mentioned losing the argument over was “exterior iteration”, which caught my attention ’cause I’d never heard the term before. All he really says about it is “Iteration used to be by stack / non-escaping coroutines, which we also called ‘interior’ iteration, as opposed to ‘exterior’ iteration by pointer-like things that live in variables you advance.” So now I had to figure out what “non-escaping coroutines” were. I’ve heard of coroutines of course, but never really grokked them particularly deeply. They were always one of those concepts, like type theory notation or thermodynamics, that I kinda get in the abstract but never understood well enough to figure out how to do anything useful with it. Plus every language that says it has coroutines seems to mean something slightly different, and it’s not a small list – there’s Lua, Modula-2, C# with tasks, Python with generators, Rust with async, etc etc. So I figured it was time to try to figure coroutines out for real.
This led me down a fascinating and awesome rabbit hole for months. The short version is that Garnet isn't going to do interior iteration like Graydon wanted Rust to do. The long version is really cool though. Basically, what Graydon was referring to by "interior" iteration is making iteration happen via functions that take closures, like Ruby does it. So instead of writing in Rust:
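Say, a plain loop like this (a stand-in example, not the original snippet):

```rust
fn main() {
    let list = vec![1, 2, 3];
    // Exterior iteration: the loop construct lives in *your* code and
    // advances an iterator value step by step.
    for item in &list {
        println!("{}", item);
    }
}
```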
in Ruby you write:
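Roughly like so (again, a stand-in example):

```ruby
# Interior iteration: you hand `each` a block, and `each` runs the
# loop for you, calling the block once per element.
list = [1, 2, 3]
list.each do |item|
  puts item
end
```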
Ok, so instead of a language construct for a loop you have a function that takes a closure. Does this really change anything? Why yes, because what do you do if you want to break the loop early? If you write a `return` in the loop in the first example then it returns from the function containing the loop. If you write `return` in the second one, it... returns from the closure and continues the loop from the top. Right? So with interior iteration you don't have any built-in way to write `break` or `return` from inside a loop, just `continue`. Riiiight? Well, that seems like a strict downside, 'cause now if you want to write a loop that breaks early you have to make the inner closure return some marker value like a bool or an `Option` that tells the outer function "keep iterating" or "stop here". That is exactly what Rust iterator adapters like `map_while` do, and while it works, it makes nesting loops and keeping track of what is what rather harder. It seems like a strange thing for Graydon to yearn for.
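For concreteness, here's that marker-value protocol with Rust's `map_while`: the closure returns `Some` to keep iterating and `None` to stop.

```rust
fn main() {
    let v = vec![1, 2, -1, 3];
    // The Option return value is the "keep iterating"/"stop here" marker:
    // iteration halts at the first non-positive number.
    let taken: Vec<i32> = v
        .iter()
        .map_while(|&x| if x > 0 { Some(x) } else { None })
        .collect();
    assert_eq!(taken, vec![1, 2]);
}
```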
But no, that's not how the second example works at all! Fire up `irb` and try it out! Add a `break` or `return` to your loop and see what happens. Nest a few loops! The block being passed to `each()` can `break` or `return` through multiple levels of function calls; the `each()` method can say "I'm calling this thing, and if it calls `return`, it returns all the way to here even if there's multiple function calls between this call and that `return`". But how the hell does that work? To figure that out, let's back up a little.
So a function call is, at its heart, saving the current code’s state onto a stack, jumping to a new piece of code and doing some stuff, and then going back to where it came from and restoring its state. A coroutine is an extension of this: if calling a function is a jump, a coroutine is a jump and a stack switch. This explains why coro’s are both very simple and fundamental (it’s not actually doing anything particularly fancy, just loading a new stack pointer, so implementing them in the 1970’s languages where the term originates wasn’t a huge stretch), and also arcane as heck (they look a lot like hardware interrupts, and hardware interrupts are where the theoretical underpinnings of the Turing machine model of computation goes to die).
Ok, so a coro is a jump and a stack switch. What the heck does that mean? Well, first it means that calling a coro is usually a distinct operation from a function call, called `resume`, so it doesn't get mixed up with function calls, because resuming a coro does things that a function call can't. And it means that to exit from inside a coro, along with `return` you have a new operation called `yield`. Inside a coro you can call a function and return from it and it all works like normal, but also instead of `return` you can call `yield`, which hops back up the call chain to where the coro was `resume`'d from – similar to throwing an exception. So it's like, `yield` is `throw` and `resume` is `catch`? Not quite; unlike an exception, the call stack of the coro doesn't go away when you call `yield`, so if you call `resume` on the same coro again you just hop right back up the call chain to where you left off and keep executing just after the `yield`, with all the function calls between those two points still intact. Instead of unwinding the stack and destroying everything on it until you hit a `catch`, it keeps the coro's piece of the stack around so you can switch back to it later. That's why the stack switch happens: so you can pick up a coro where it left off. So a coro also has state, the state of a new stack that you can jump into and out of.
Or, you can never `yield` and just `return` normally, or you can `resume` a different coro and never return to the old one. Or, you can recursively `resume` yourself and build up a chain of `yield` calls that would have to be made to go back to where the coro was started. Or you can `resume` a coro and pass it another coro and tell it "`resume` this thing when event Foo happens".
This is about where most people's heads start hurting, so let me try to restate it.
In its most general form:
- A coro is a first-class object with state, like a closure is. It uses memory that has to be allocated and freed.
- Like a closure, a coro can be passed arguments or capture its environment, so it can refer back to objects that existed where it was created.
- Unlike a closure (usually), when you call a coro it might never return to the caller; it's entirely possible for it to just wander off down a diverging control flow path without ever calling `yield` or `return`, such as having two coro's that just repeatedly `resume` each other.
- And that coro may create new coros, pass them as arguments to each other, dynamically decide which ones to call, etc.
- This means that a coro is a piece of data that incorporates arbitrary control flow.
For the Schemers out there, if this starts sounding like continuations, it's not a coincidence; there's a hierarchy of power out there that basically goes `exceptions < coroutines < continuations`. (Apparently coroutines are equal in power to continuations that can only be called once, and I don't even know how to start thinking about that.) For the Lispers out there, if it sounds like Common Lisp "conditions", also not a coincidence; afaict they're exactly the same thing. For Go or Erlang people, your lightweight threads are basically implemented by the compiler as coroutines that get stuffed into a global data structure and selected by a scheduler. The compiler inserts `yield` calls for you at various places in every function, so each process/goroutine runs for a little while and then does its jump+stack-switch back to the scheduler, which chooses a new one and does a jump+stack-switch into it to pick up where it left off. In fact, you can implement coroutines with lightweight threads very easily.
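Here's a sketch of that trick in Rust, using OS threads and channels instead of lightweight threads (heavier, but the same shape): `resume` becomes a channel send plus a blocking receive, and `yield` is the mirror image. The `running_sum` coroutine is an invented example.

```rust
use std::sync::mpsc;
use std::thread;

// A "coroutine" built from a thread plus two channels: sending on the
// returned Sender resumes it with a value, receiving on the Receiver
// waits for it to yield one back.
fn running_sum() -> (mpsc::Sender<i64>, mpsc::Receiver<i64>) {
    let (resume_tx, resume_rx) = mpsc::channel::<i64>();
    let (yield_tx, yield_rx) = mpsc::channel::<i64>();
    thread::spawn(move || {
        let mut total = 0;
        // Each recv() is where the coroutine sits suspended until resumed.
        while let Ok(v) = resume_rx.recv() {
            total += v;
            // "yield": hand the value (and control) back to the resumer.
            if yield_tx.send(total).is_err() {
                break;
            }
        }
    });
    (resume_tx, yield_rx)
}

fn main() {
    let (resume, yielded) = running_sum();
    resume.send(3).unwrap();
    println!("{}", yielded.recv().unwrap()); // 3
    resume.send(4).unwrap();
    println!("{}", yielded.recv().unwrap()); // 7
}
```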
There's also a few more details I will mention just for completeness:
- A coroutine usually has an "exit" control flow distinct from `yield` that says "this coro is over and cannot be resumed". That's what happens if a coroutine calls `return` instead of `yield` at the toplevel.
- Coro's have nothing at all to do with threads... apart from them both involving a jump+stack-switch! With coro's that operation is done explicitly, when the code says to do it; with threads it is done entirely by surprise, when the hardware says to do it. So in some ways they are very similar but in others they are very different beasts.
Ok so this is all awesome and brain-bending, but how are you actually supposed to use these? That's a very good question, and seems to be why so many variations of coroutines exist. Most of them have various limitations to specialize them for various purposes, and so you get a whole fucking zoo of distinguishing features:
- Asymmetric vs. symmetric coroutines – asymmetric are the more common variety currently; they have `yield` and `resume` as I've described, while symmetric ones only have `resume`. If you want something like `yield` in symmetric coro's, you build it out of `resume` and pass a parameter for where to yield to. They're equally powerful, you can implement either one in the other style, but people seem to like asymmetric ones more 'cause they act more like functions than goto's.
- Stackful vs. stackless coroutines – Stackful coro's are "normal" as described; stackless ones are basically Python's generators and can only call `yield` in the top-level function. This means they can't recurse, and also means that calling `yield` only ever yields through one function call of the stack; it can't do the jump-up-multiple-levels-of-calls that stackful coro's can. However, this also means that stackful coro's may need to allocate their own dynamically-sized stack, while stackless ones can always be turned into fixed-size objects at compile time. (I have a hunch that this is incomplete, and you can often make a coro fixed-size without restricting it to only yield from the top level, but I haven't dug into that too much.)
- Escaping vs. non-escaping coroutines – In GC'ed languages like Lua or Common Lisp, coroutines can capture their environment like closures can, and so can refer to variables on the stack of the function where they were created. Just like closures! And like closures, those coro's can be returned or passed to other functions or stuffed into arbitrary data structures, escaping the place they were created, so they have to be able to bring the fragments of that environment with them. So when Graydon was talking about "stack / non-escaping coroutines", he was talking about coroutines that are limited so they can't do this; these coroutines are essentially limited like Rust's closures are: you make sure a coro either can't outlive the function that called it, or there's some kind of explicit copy/move/GC/whatever involved to make sure the shared bits it uses stay alive as long as it needs them to.
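The fixed-size-object point is easiest to see by hand-rolling what a compiler lowers a stackless generator to: the whole coroutine becomes a small state machine. (This Rust sketch, with its invented `Counter` type, is roughly what generator compilation produces for you.)

```rust
// A stackless coroutine as a fixed-size state machine: `resume`
// advances it one yield at a time. This one "yields" 0, 1, 2.
enum Counter {
    Start,
    At(u32),
    Done,
}

impl Counter {
    fn resume(&mut self) -> Option<u32> {
        match *self {
            Counter::Start => {
                *self = Counter::At(1);
                Some(0) // first "yield"
            }
            Counter::At(n) if n < 3 => {
                *self = Counter::At(n + 1);
                Some(n) // subsequent "yields"
            }
            _ => {
                *self = Counter::Done;
                None // "exit": cannot be resumed again
            }
        }
    }
}

fn main() {
    let mut c = Counter::Start;
    let mut got = Vec::new();
    while let Some(v) = c.resume() {
        got.push(v);
    }
    assert_eq!(got, vec![0, 1, 2]);
}
```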
There’s probably more variations, and I’m not even going to try to go into promises/async/generators etc and what they actually mean. Right now, if you want to play with a language with the most archetypical, powerful and interesting coroutines around, check out Lua. If you want to learn more about the engineering and theory, check out the paper Revisiting Coroutines which is very easy to read, optionally followed by Coroutines in Lua which has a bit more math in it. (It’s not a coincidence that both were written by Lua implementers.) I feel like I’ve seen a third paper somewhere that’s somewhat newer, but I can’t find it?
Ok, so coroutines are all very cool, but still… so what?
Good question!
Oh right back to Garnet
Graydon’s original suggestion for non-escaping coroutines was to use them mainly for iteration, like Ruby does. Eeeeh, IMO iteration isn’t exactly a difficult problem that needs lots of interesting innovation done to it. Garnet is very goal-driven; if I’m going to consider adding coroutines to it, they need to have a purpose, so what else might we use coroutines for?
- State machines
- State machines
- State machines
- State machines
Ok, that’s a bit tongue-in-cheek, but it really is the essence. There are just a few kinds of state machines that are so common and special-purpose that they have their own names, so it’s more useful to think of it as:
- Iterators
- Error handling (especially involving transient errors or retries)
- Non-blocking I/O
- All other state machines
Great, if iterators aren’t too interesting and Garnet isn’t doing much exception-style stack-unwinding error handling due to all the various well-known issues with it, are the other use cases valuable enough to add a new feature? Specializing coro’s for non-blocking I/O pretty much gets you Rust’s `async`, and “do not have Rust’s `async`” was one of the OG goals for Garnet. I used to feel like Rust’s `async` was just a big heckin’ mess of unnecessary complexity, but withoutboats has written some compelling stuff on why it looks the way it does, and so I now think it’s a lot more of a useful local minimum. It contains lots of subtle details around moving values, ownership, and control flow which interlock to make it work, and you can’t really change much about one part without needing to change all the others. Still, I’d rather have a general-purpose tool than a special-purpose one, and Rust’s `async` is pretty lame compared to real coro’s, so maybe we can do better/nicer?
First, let’s get one thing clear: using coroutines for state machines is a feature of convenience, not necessity. If I understand correctly, any coroutine can be rewritten as a normal closure that takes a struct representing its current state and returns a new instance of that struct, optionally plus a global stack for the context of “where does `yield` yield to”. The thing is that keeping track of it all without the stack-switch part of the equation built in is just a gigantic pain in the ass, and doing it that way doesn’t compose.
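That rewrite is easy to show on a toy example (names and shapes here are mine, just a sketch): the generator below and the step function after it compute the same thing, with an explicit state dict playing the role of the suspended stack frame.

```python
def countdown(n):
    # Coroutine version: the suspended frame *is* the state.
    while n > 0:
        yield n
        n -= 1

def countdown_step(state):
    # Explicit-state version: take the state, return (output, new_state).
    n = state["n"]
    if n <= 0:
        return None, state
    return n, {"n": n - 1}

assert list(countdown(3)) == [3, 2, 1]

# Same result, but now *we* have to thread the state through every call,
# which is the part that stops composing once the machine gets bigger.
outs, st = [], {"n": 3}
while True:
    out, st = countdown_step(st)
    if out is None:
        break
    outs.append(out)
assert outs == [3, 2, 1]
```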
For example, writing non-blocking I/O in Rust without `async` isn’t that hard; whip out `mio` and just get to work. It’s a little tedious but far from difficult. You might have multiple states in a handshake protocol or error states or such, but writing those singly is also pretty easy. Boiler-plate-y, but I usually prefer simple boiler-plate to complicated hidden logic anyway. What gets difficult is composing these state machines, so that if you’re writing say a network protocol, you can have one portion that says “initial handshake”, one portion that says “talk to new peer of type A”, and a different portion that says “talk to new peer of type B”, then more portions that say “talk to previously-known peer of type A/B”, and so on and so on. It all becomes a big morass of state enums and fragile function calls back and forth dispatched by various giant `match` statements, and you’d better not forget any of them or mix them up or else you get subtle hard-to-spot bugs. Then if you need to add a new peer of type C with its own protocol variation? Be prepared to rip apart and replace big chunks of your existing code, and basically re-analyze the entire thing to make sure it all still works. Using coro’s instead, where each coro is a state and each `resume`/`yield` call is a state transition, is way nicer. I have some very ramble-y notes on how it looks to implement a small but non-trivial state machine this way, and it works out surprisingly well.
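The composition point can be sketched with Python generators (an illustrative made-up protocol, not from my notes): each protocol phase is its own coroutine, and delegation composes them with no central state enum or dispatch `match` anywhere.

```python
def handshake(peer):
    # One self-contained phase of the protocol.
    yield f"hello {peer}"
    yield "negotiate"

def talk(peer):
    yield f"data for {peer}"

def session(peer):
    # Composition is just delegation: adding a "peer of type C" phase
    # is one more `yield from` line, not a rewrite of a big match.
    yield from handshake(peer)
    yield from talk(peer)

assert list(session("A")) == ["hello A", "negotiate", "data for A"]
```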
Sounds good, right? Let’s do it! Wellllll… unfortunately there’s some wrinkles. Garnet has no GC; like Rust, the memory management the language understands is purely RAII+borrow-checking. So if it has coroutines, it is up to the user to tell the compiler whether a coro can live on the stack, in a `Box`, in an `Rc`, or something else. This is just like Rust’s closures, so it’s not an unsolvable problem, but with coro’s it’s way nastier. Borrow and drop checking works by analyzing the static control flow of the program and making sure there’s no code you can reach where Bad Stuff happens. Coroutines… are data that represents arbitrary control flow. So without getting too deep into the (fascinating) weeds about it, making a borrow checker that can try to figure out what the hell a coroutine is borrowing and how long those things live is a pretty hairy problem. Plus, remember that bit about a coro being a jump+stack-switch? The “stack switch” part means that creating a coro often means allocating dynamic memory, which Garnet needs to be able to function without. Stuff that lives on the stack needs to be fixed-size, so can you statically analyze a coro to make sure it can be fixed-size? Again, yes, you can do this like you can with closures, but that gets you the stackless coro’s described above, which really just… aren’t very useful in comparison to stackful ones. The best description I’ve heard for them is “weaksauce”. They work for promises or iterators but are way less useful for whole-hog state machines, so it’s really not very appealing.
And that’s without the whole morass you get with `Pin` in Rust’s async functions! Which is not a small deal, because if you don’t want every single coro to be boxed, then that means when you move them they move; you’re not just copying a pointer from one place to another. Which means you have the whole self-referential struct problem, but also you can have one object on one stack pointing to a different object on another stack, and that stack may suddenly get picked up and put somewhere else in memory. And unlike the call tree of function calls, because you can pass coro’s around as objects and stuff them into data structures and whatever, instead of a tree or DAG of function calls you can get a call graph with cycles in it that are constructed at runtime. And so as far as I can tell, when your compiler tries to figure out where to insert the `drop()` calls for you, there will always be cases where it just cries. This isn’t a problem in GC’ed languages because the GC has the power to move an object in memory and then fix up all the pointers pointing to it; manual memory management literally can’t do that. If you tried then you’d end up with 90% of a GC already. (Which sounds pretty cool now that I mention it, but not something for Garnet.) So if you want a coro that can be passed around as freely as it is in Lua or such, it would almost always need to be boxed.
So… yeah. Is this a solvable problem? Can we make:
- some sort of coroutine
- in a borrow-checked language without automatic memory allocation or a GC
- that can live on the stack and be copied around
- that isn’t weaksauce
- that can create and call new coro’s without having to box every single one
- and is still pretty nice to actually use?
I mean… probably? Eventually? If you took withoutboats, burntsushi and a bunch of other Rust lang design gurus and locked them in a room for a year or two to come up with a clean-slate design, I’m like 80% sure they’d come up with something that works really well. But Garnet’s goals are “small and obvious” and, more importantly, “don’t be a research project”. To me the dynamic control/data flow in coro’s makes them mesh really well with a garbage collector, and looks really hard without one. I still need to do a writeup on all the details to explain how I think about this, but there’s a lot to it and it’s not sorted very well in my head and I would have to spend like 15 hours making diagrams to think through all the various cases. So I haven’t gotten around to it yet.
So, unless an epiphany hits that makes everything fall together nicely without becoming massively more complicated, Garnet will not have coroutines.
wat
Yep! Sometimes you spend 3 months thinking real hard about something and then decide it’s not the path to take. This is kinda why I didn’t want Garnet to be a research project, but sometimes the allure of shiny objects is just too much to resist. I can’t call it wasted time though; I learned a shit-ton, and tied my brain up into many interesting pretzels that forever changed how I think about code. Don’t get me wrong, I would love it if Garnet had coro’s, but I’m not going to try to redesign the language around them any time soon.
Also, honestly the most useful part of coroutines, and also the hardest, is “writing general-purpose, composable state machines”, which doesn’t necessarily need to overlap with “coroutine” at all. There’s lots of ways to write state machines! So I’ve kinda pondered what it might look like to have “state machine” be a primitive of a programming language, like for loops or structs. Maybe coro’s are just too low-level, or the wrong abstraction, with all their detailed interactions with what they borrow on which stack and how the control flow interacts? If you just want to represent states and transitions in a way that’s easy to express, what would it look like? I think this idea has a lot of promise, but again, Garnet isn’t a research project, so it’s a problem to think about in the future once most of the current goals are sorted out. Maybe the next language!
So what’s left?
All the shit I don’t want to deal with but have to, alas.
Monomorphization
This is one of those bits of language design and compiler implementation that not enough people talk about, as far as I can tell, though that’s also not the same as nobody. It’s basically the process of taking a general-purpose function `fn append(&mut Vec<T>, T)` and turning it into specialized variants `fn append(&mut Vec<i32>, i32)` and `fn append(&mut Vec<SomeStruct>, SomeStruct)` and so on, so that you can actually compile them to efficient machine code without tons of pointer indirection. It’s very simple in concept and a total pain in the ass in reality, but at least I’m not the only one who thinks so. It’ll get done, it’s just tedious and error-prone. Basically most of the worst parts of type-checking without the payoff.
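The simple-in-concept part looks something like this toy sketch (a made-up illustration, nothing like garnetc’s actual pass): collect the concrete types used at call sites of a generic function, and stamp out one specialized copy per instantiation.

```python
# Hypothetical template for the generic fn; real monomorph works on an IR,
# not on strings, which is where all the actual pain lives.
GENERIC_TEMPLATE = "fn append_{ty}(v: &mut Vec<{ty}>, x: {ty})"

def monomorphize(call_site_types):
    # Deduplicate while preserving order, so each concrete instantiation
    # is emitted exactly once even if it's called from many places.
    seen = []
    for ty in call_site_types:
        if ty not in seen:
            seen.append(ty)
    return [GENERIC_TEMPLATE.format(ty=ty) for ty in seen]

sigs = monomorphize(["i32", "SomeStruct", "i32"])
assert sigs == [
    "fn append_i32(v: &mut Vec<i32>, x: i32)",
    "fn append_SomeStruct(v: &mut Vec<SomeStruct>, x: SomeStruct)",
]
```

The tedium comes from everything the sketch skips: nested generics, functions instantiating other generic functions, and making sure every call site points at the right specialized copy.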
The rest of modules
Sufficiently-trivial ML modules pretty much work in Garnet right now, and once monomorph works we’ll be able to write slightly-less-trivial functors that let you do actual type metaprogramming along the lines of Rust’s traits. Simple ones, at least, until I explore the boundaries more and figure out what else is missing. One thing is that modules can’t currently contain new type declarations, which is a limitation similar to having Rust traits without associated types, so there’s useful stuff they either can’t express or can express only very awkwardly. And then there’s the whole higher-kinded type question. And then there’s the whole modular implicits thing I want to do to try to make modules less of a pain in the ass to actually use, which may or may not need some special cases or nuanced design work to make it actually nice instead of merely less terrible. And then it would be really nice to have some notion of dynamic dispatch built into the language, ’cause it nicely avoids a lot of writing of giant match statements of function dispatches, but I haven’t even really begun to think about how to do that.
There’s lots of research papers about variations and nuances of ML modules and I really should read more of them than I have, but most of them seem to stop at the math or the simplest-case algorithms and say “so this proves all the rest of the stuff works”, but nobody heckin’ tells you how all the rest of the stuff works and what all the implications are. At least not outside of the OCaml compiler codebase, which is honestly another thing I should read more of. Anyone know of any good, thorough review papers on this topic that don’t descend into 40 pages of math? Or heck, even a mediocre one would be nice.
Borrow checker
Oh man, this is gonna suck and there’s no two ways about it. There’s very few people in the world who can tell you how to actually write a full borrow checker, and I am not one of them, but I’ll have to become one.
But, you can get a lot of the most immediately useful bits of borrow checking just by doing basically what Austral does and making non-escaping borrows, which seem actually pretty easy to implement. Basically with them you can write a function `fn foo(x: &Thing)` that borrows a thing and does stuff with it, but you can’t write a function `fn foo(x: &'a Thing) -> &'a OtherThing` that returns part of what it borrowed, or write a type `struct Thing<'a> { x: &'a OtherThing }` that borrows another value. That’s not really sufficient to be a full-blown solution IMO, but it’s a pretty okay place to start.
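The core of that check is small enough to sketch (toy AST, entirely hypothetical, nothing like garnetc’s internals): a borrowed parameter may be used inside the function body, but nothing derived from it may flow out through the return value.

```python
def check_non_escaping(func):
    # func: {"borrowed_params": [names], "returns": expr}
    # expr is ("param", name) for returning a parameter, or ("value", literal).
    kind, payload = func["returns"]
    if kind == "param" and payload in func["borrowed_params"]:
        return False  # the borrow would escape via the return value
    return True

# fn foo(x: &Thing) that returns a fresh value: fine.
assert check_non_escaping({"borrowed_params": ["x"], "returns": ("value", 42)})
# fn foo(x: &'a Thing) -> &'a ... returning its borrow: rejected.
assert not check_non_escaping({"borrowed_params": ["x"], "returns": ("param", "x")})
```

A real implementation also has to track borrows flowing into struct fields and other locals, but the “does it escape through the return?” question stays this shape.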
I occasionally dream of just including Datalog, Prolog or polonius whole-hog into `garnetc` and waiting for a random type-theory angel to wander along and use it to write a borrow checker for me, but that’s not a terribly actionable plan. Wandering type-theory angels usually have their own weird side-projects they’re working on; to coax them to contribute to your project you have to put in the work to make your project interesting to them first.
Pattern matching
This is a big, complex feature of its own, which I haven’t even started to touch for real, but it’s also pretty well-bounded. I don’t really have anything I want Garnet to do that Rust or Erlang can’t do already, so it’s just a matter of figuring out how to implement it. From what I’ve seen that’s its own side-quest that can consume infinite amounts of interesting time and energy, but a basic implementation can just turn a `match` into a series of `if` statements that are inefficient but correct and get on with life. There’s still questions that will need to be answered around exhaustiveness checking and overlapping match arms and such, but I think the simple-and-obvious solution will cover like 95% of what you usually need.
I’m probably wrong about that, now that I think about it. Look, let me dream!
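The naive lowering described above can be sketched like this (a hypothetical mini-IR of my own invention, not garnetc’s): each arm becomes one `if` that tests the pattern, tried top to bottom, with a runtime error as the non-exhaustiveness backstop.

```python
def lower_match(scrutinee, arms):
    # arms: list of (pattern, action); a pattern is a constructor tag
    # name like "Some", or "_" for a wildcard.
    tag, payload = scrutinee
    for pattern, action in arms:          # a chain of ifs; first match wins
        if pattern == "_" or pattern == tag:
            return action(payload)        # "bind" the payload into the arm
    raise ValueError("non-exhaustive match")

result = lower_match(("Some", 5), [
    ("Some", lambda x: x + 1),
    ("None", lambda _: 0),
])
assert result == 6
```

Inefficient (no decision trees, no jump tables) but correct, which is all a first implementation needs.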
The rest of the owl
The broad strokes of Garnet’s design are mostly settled in my mind but there’s tons more details that can be added more or less independently. This is actually where most of the actually interesting stuff goes, but I have to get the foundations down real good before I pay too much attention to them. So this includes things like:
- Properties. Probably 80% of the framework needed for function properties exists already, but actually doing anything with them is still TBD. Data/type properties are still mostly nebulous.
- Bit patterns and data layouts, and how they mesh with pattern matching and type conversions for real.
- How do linear types actually work? Do we have a `Drop` module or property, or something else?
- How do moves and copies actually work? Do we have a `Move` or a `Pin` property, or something else?
- Unsafe and pointer semantics. This is a Big Deal, and also a very important one, since one goal of Garnet is to make writing unsafe code less hazardous than it is in Rust. But it’s also low on the priority list for having a basic compiler work.
Conclusions
If I have monomorph and a solid module system done by the end of 2024, I’ll be pretty happy. If I have a non-escaping borrow checker too, or a prototype of something more sophisticated, I’ll be pretty ecstatic. Maybe then I can start designing more of the cool stuff.