Writing a Wayland Compositor in Rust

Wayland and the associated libraries are a little bit of a moving target. This information is, as far as I know, up to date as of May 2020.

Introduction

When the Rust programming language was bright and shiny and new, one of the more interesting/popular projects for it was Way Cooler, a Wayland compositor in Rust. Wayland is Linux’s next-gen API to replace X11 for graphics and user interaction, so this was pretty cool, two next-gen technologies playing nice together. So it was a little bit of an “oof” moment when Way Cooler announced it was giving up on Rust to rewrite in C instead.

To me this seemed the acme of foolishness, since IMHO even writing unsafe Rust is far nicer than writing C. Despite striving for memory safety everywhere, the set of programs that can easily be proved memory-safe at compile time is pretty small when compared to the set of programs that are actually memory safe, especially when dealing with spooky stuff like hardware. A lot of those not-provably-safe programs do pretty useful things, so we must keep some perspective: the problem is unnecessary unsafety. What is necessary depends on your problem domain and the design tradeoffs you decide to make, and so is up to you. The problem with ensuring there’s only necessary unsafety when interfacing Rust and C is that even very solid and battle-tested C code really does not map well to the rules that Rust uses to make sure the compiler can prove a program is memory-safe. You have to constrain your code so that it follows the rules. If you can’t do this, the main ways to turn an unsafe interface safe are to add runtime checks such as refcounted pointers, or to arrange things so that there are still some unsafe API’s but they are easy to contain and verify manually. The Way Cooler developers wanted to make wlroots-rs capital-S Safe everywhere, they wanted to do it without runtime checks, and they were working on this in 2017 and early 2018 when there was less community wisdom about how to do (or not do) these things most effectively. They wanted to achieve memory safety entirely by structuring the code so it could be proved safe at compile time, while working with C code that was foreign to Rust’s rules. They failed to meet all these goals in a reasonable way.

They are in very good company.

So, it was sad but not a huge surprise to me when, eight months later, there was another blog post announcing the end of the project. It had some interesting insights though, chronicling a tragic and familiar tale of misplaced expectations, development hell and questionable assumptions, made by skilled but relatively inexperienced programmers who weren’t sure what they wanted. The postmortem is definitely worth a read; if only we could all have so vivid and interesting a Learning Experience.

It left me with some questions though, mainly, what IS it like to write a Wayland compositor in Rust? How safe can you make it, or not make it? So now, a couple years later, I’ve taken a bit of time and translated the wlroots example code from C to Rust. Here are my findings.

Swot

First off, how the heck do you write a Wayland compositor? Well, let’s look for an example. The most complete and popular compositor out there that isn’t part of Gnome or something equally complex is probably Sway. Sway is a fairly mature, fairly simple compositor that works quite well once you get used to it. In fact it was using Sway on my Pinebook Pro that got me into this endeavor in the first place: it works far better on the Pinebook’s low-end hardware than any X11 WM I’ve tried, even the lightweight ones. Sway is written in C, using a C library called wlroots which is maintained mostly by the same people and seems to be becoming the de-facto lib for interfacing with Wayland at a (slightly) higher level than “do it all yourself”. The wlroots tagline of “50,000 lines of code you’d have to write anyway” is pretty good advertising. The only other such lib I am aware of is the smithay Rust crate which intends to do something fairly similar, but I chose not to use it ’cause wlroots is already demonstrated by Sway to More Or Less Work. I am still learning how Wayland works myself and didn’t want to multiply my problems by digging into a new lib when I wasn’t sure how complete it was, or even how to judge its completeness. A proper compare and contrast between wlroots and smithay may happen Real Soon.

Next, the most important part: The name. I took the name Sway, changed a couple letters, and got Swot. Apparently this is British slang for someone who is way too interested in a topic, which probably accurately describes most people who write Wayland compositors. The source code is here: https://hg.sr.ht/~icefox/swot.

Now, this is NOT a fully-functional, featureful, usable Wayland compositor. It’s a direct port of the tinywl minimum-viable-product example code that ships with wlroots. It’s also not perfect, it currently still has a bug or two and needs a lot of cleanup. My original plan was to make something like Sway but maybe a little less obtuse in its window arrangement, but I’ve since gotten used to Sway so I’m no longer motivated to do anything nearly as huge.

Finally, build setup: This uses the wlroots C library, and generates Rust wrappers to it and the Wayland C headers using bindgen. There is a wayland-sys crate which is maintained by smithay and looks perfectly fine, but the wlroots-sys crate was part of Way Cooler and is now tragically out of date. Making a new, proper wlroots-sys crate that is up to date and works with wayland-sys would be great, but a) now I’d have two projects, and b) I’d have to be responsible and maintain wlroots-sys for the foreseeable future. So instead I just used bindgen to create wrappers for them both as part of Swot. This does mean that things Mysteriously May Not Work unless you’re using Debian Bullseye with wlroots 0.10.1, and building the thing at all takes a bit of work. Sorry.

Reading list

There’s a lot of documentation out there on Wayland but most of it is dense and hard to get into without wider context. Instead I recommend the wlroots docs and examples, which often then direct you to more in-depth references or blog posts.

Also, https://wayland-book.com/ which came out just after I basically finished Swot. It looks somewhat WIP, but, that’s life. Written by the originator of Sway and wlroots, so, he probably knows what he’s talking about.

Learning Experiences

I’m not going to go through Swot in huge depth because the actual mechanics are pretty straightforward, and the tinywl example does a far better job of explaining what’s actually going on than I will. Instead, I’m going to talk about what I learned from doing it.

Making a basic Wayland compositor involves startlingly little actual drawing. Wayland is mostly a pile of protocols, with each protocol being an API defining functions, events and resources. That’s the compositor’s real job: it’s there to handle events such as key presses, windows resizing, new monitors being plugged in, and to manage resources such as key maps, cursors and memory buffers representing chunks of screen real-estate. Everything that happens starts from an event, all events are handled via callbacks, and some resources are the opaque “you can touch pointers to this but never the thing itself” kind while others are defined as C structs with various fields.

The real complexity comes from two sources: first, sorta like Vulkan, you start off not knowing anything about the system you’re running on and you have to figure out the hardware and capabilities to see what’s actually going on, and load the protocols associated with those at runtime. wlroots handles most of this for you though, and provides you with some reasonable abstractions, like a “seat” which is a collection of N displays, 0-1 keyboards, 0-1 pointers, and 0-1 touch devices. The second source of complexity is that when you decide to handle a protocol, it can result in you then handling other protocols associated with it, so the data structures and callbacks proliferate and you get to manage them all by hand.

Each of these protocols is defined as an XML file full of descriptions of structures and functions, and there’s a generator program which turns it into code. For C it’s called wayland-scanner. This is a nice-ish system for making language-independent API’s, OpenGL and Vulkan do the same thing. In practice all of these API’s encode C’s assumptions in them, but, that’s at least something to start with. (Also, it turns out that all these protocols do actually turn into messages in a binary wire format which gets sent over a Unix socket between a Wayland client and compositor. I didn’t actually know that while writing Swot!) There is a wayland-sys Rust crate that is just the output of its own version of wayland-scanner that generates Rust code, but for this effort I took the C Wayland code and wrapped it in Rust using bindgen. This caused some issues, because C, but was all in all fairly easy to get going. I had to rewrite Rust versions of a couple C things that were implemented as macros or static inline functions, which bindgen couldn’t create bindings to. Tracking them down was annoying but not actually troublesome.
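To give a feel for what hand-porting one of those static inline functions looks like, here is a minimal sketch of wl_signal_add(), which lives in the Wayland headers as a one-line inline wrapper around wl_list_insert(). This is not Swot’s actual code. The struct definitions and the wl_list_init()/wl_list_insert() bodies below are local stand-ins written to mirror the C ones so the example runs on its own; in a real build they would come from bindgen and libwayland. demo_signal_add() is a made-up helper that just exercises the port.

```rust
use std::os::raw::c_void;
use std::ptr;

// Local stand-ins for the C layouts. In a real build these come out of bindgen.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct wl_list {
    pub prev: *mut wl_list,
    pub next: *mut wl_list,
}

#[allow(non_camel_case_types)]
pub type wl_notify_func_t = Option<unsafe extern "C" fn(*mut wl_listener, *mut c_void)>;

#[allow(non_camel_case_types)]
#[repr(C)]
pub struct wl_listener {
    pub link: wl_list,
    pub notify: wl_notify_func_t,
}

#[allow(non_camel_case_types)]
#[repr(C)]
pub struct wl_signal {
    pub listener_list: wl_list,
}

/// Stand-in for libwayland's wl_list_init(): an empty list points at itself.
pub unsafe fn wl_list_init(list: *mut wl_list) {
    unsafe {
        (*list).prev = list;
        (*list).next = list;
    }
}

/// Stand-in for libwayland's wl_list_insert(): link `elm` in right after `list`.
pub unsafe fn wl_list_insert(list: *mut wl_list, elm: *mut wl_list) {
    unsafe {
        (*elm).prev = list;
        (*elm).next = (*list).next;
        (*list).next = elm;
        (*(*elm).next).prev = elm;
    }
}

/// Hand port of the `static inline` wl_signal_add(): append to the listener list.
pub unsafe fn wl_signal_add(signal: *mut wl_signal, listener: *mut wl_listener) {
    unsafe {
        wl_list_insert((*signal).listener_list.prev, ptr::addr_of_mut!((*listener).link));
    }
}

fn empty_link() -> wl_list {
    wl_list { prev: ptr::null_mut(), next: ptr::null_mut() }
}

/// Hypothetical demo: register two listeners, then return
/// (number of links in the list, whether they are in insertion order).
pub fn demo_signal_add() -> (usize, bool) {
    let mut signal = wl_signal { listener_list: empty_link() };
    let mut a = wl_listener { link: empty_link(), notify: None };
    let mut b = wl_listener { link: empty_link(), notify: None };
    // From here on these are only touched through raw pointers, and never moved.
    let sig: *mut wl_signal = &mut signal;
    let pa: *mut wl_listener = &mut a;
    let pb: *mut wl_listener = &mut b;
    unsafe {
        let head = ptr::addr_of_mut!((*sig).listener_list);
        wl_list_init(head);
        wl_signal_add(sig, pa);
        wl_signal_add(sig, pb);
        // Walk the circular list until we are back at the head.
        let mut count = 0;
        let mut cur = (*head).next;
        while cur != head {
            count += 1;
            cur = (*cur).next;
        }
        let in_order = (*head).next == ptr::addr_of_mut!((*pa).link)
            && (*head).prev == ptr::addr_of_mut!((*pb).link);
        (count, in_order)
    }
}
```

The port itself is the three-line wl_signal_add(); everything else is scaffolding. Note that nothing here may move once it has been linked in, which is a theme that comes back below.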

The more awkward parts are… well, Wayland code is REALLY C-ish. There’s a wl_list type for an intrusive linked list, and a pile of C macros for working with it, and Rust does not much like intrusive linked lists. (More on that later.) Then there’s wl_listener. wl_listener is a two-field struct that you use to define the callback for a particular event, and it contains a function pointer that gets called when that event occurs. All well and good. This callback gets a pointer to the event struct, and a pointer to the wl_listener. However, I’d expect the wl_listener to also contain a void pointer that you could stick arbitrary data into, so you could pass whatever other random data you wanted into the callback. That doesn’t exist. Well then, how do you get the callback the context it needs to actually do stuff in response to the event? For, as we all know, if you use a global variable then the ghost of Edsger W. Dijkstra will haunt you forevermore.

You get sneaky, of course. You stick your wl_listener into a struct that has everything you need in it. Then when your callback is called, it gets the pointer to the wl_listener and, since you know the type of the struct containing the wl_listener and where in the struct it is, you back out a pointer to that struct from the pointer aiming at the listener. So you keep together your listener and whatever the event handler in question needs, and it All Magically Just Works. There’s a macro for it, wl_container_of.
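The trick is easier to believe when you see it run, so here is a minimal sketch in Rust. None of this is Wayland’s or Swot’s real code: Listener is a cut-down stand-in for wl_listener (callback only, no list link), Keyboard and on_key are invented for the example, and container_of! is my own illustration of the idea rather than the actual wl_container_of. It assumes Rust 1.77 or newer for std::mem::offset_of!.

```rust
use std::os::raw::c_void;
use std::ptr;

/// Simplified stand-in for wl_listener: just the callback, no list link.
#[repr(C)]
pub struct Listener {
    pub notify: unsafe fn(listener: *mut Listener, data: *mut c_void),
}

/// The "struct that has everything you need in it" (hypothetical).
#[repr(C)]
pub struct Keyboard {
    pub presses: u32,
    pub last_key: u32,
    pub key: Listener,
}

/// Given a pointer to `$field` inside a `$Container`, back out a pointer to the
/// container by subtracting the field's byte offset. Must be used in an unsafe
/// context, and the pointer really has to point into a live `$Container`.
macro_rules! container_of {
    ($ptr:expr, $Container:ty, $field:ident) => {
        ($ptr as *mut u8)
            .sub(::std::mem::offset_of!($Container, $field))
            .cast::<$Container>()
    };
}

/// The callback only receives the listener and the event data, no context pointer.
unsafe fn on_key(listener: *mut Listener, data: *mut c_void) {
    unsafe {
        let kb: *mut Keyboard = container_of!(listener, Keyboard, key);
        (*kb).presses += 1;
        (*kb).last_key = *(data as *const u32);
    }
}

/// Hypothetical demo: fire one key event and return (presses, last_key).
pub fn demo_container_of() -> (u32, u32) {
    // Boxed so the Keyboard has a stable address while pointers to it exist.
    let mut kb = Box::new(Keyboard { presses: 0, last_key: 0, key: Listener { notify: on_key } });
    let kb_ptr: *mut Keyboard = &mut *kb;
    unsafe {
        // This is all an event source would ever know about: the listener.
        let listener = ptr::addr_of_mut!((*kb_ptr).key);
        let mut keycode: u32 = 42;
        ((*listener).notify)(listener, &mut keycode as *mut u32 as *mut c_void);
        ((*kb_ptr).presses, (*kb_ptr).last_key)
    }
}
```

Calling demo_container_of() gives back (1, 42): the callback found its Keyboard with nothing but a pointer to the listener. The #[repr(C)] matters if the struct is shared with C, and the listener pointer has to be derived from a pointer to the whole struct, otherwise you are reaching outside what the pointer is allowed to touch.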

Get it? No? I didn’t either for a while, then it took me a bit longer to believe it. I commented on the #sway-devel IRC channel that I’d never seen anything like that before and someone said “really? that sort of thing is pretty common in Linux kernel code”. I guess I need to read more hardcore C code, then. Apparently this is actually nice because it avoids having that mystery void pointer at all. If you did the same thing but passed a pointer to your context struct to the event, you’d need to make sure that pointer kept in sync with the listener containing it. Using wl_container_of instead, you only end up doing cruel and unusual things to one pointer instead of two.

Rust generally does not do this sort of thing. It certainly can, of course, and I wrote my own version of the wl_container_of macro, called conjure_heckin_ptr!(). However, every time I use this I imagine rustc looking at me with big sad puppy-dog eyes saying “this is a small struct getting given to a function, I want to just pass it in registers. You realize I can’t do that now, right? Why do you hate passing things in registers?” However, this actually gets us into an interesting design choice between C and Rust, and how that affects API’s.

C assumes things in memory do not move. Rust assumes things in memory may always move.

Here, “move” is meant quite literally, making a copy of something in memory and never referring to the old one again. If you reallocate an array and copy the contents of the old array into it, that’s a move. If you copy a struct into a bigger struct that contains it, that’s a move. In C, unless a value is an atom like an int or pointer, moving it generally requires an explicit memcpy(). Plus, if you move something, you have to fix up all the pointers referring to it to aim at the new version. This is annoying and error prone and so when writing C code people design their programs to avoid it. In Rust however, it’s assumed that things move all the time; every variable assignment and lots of function calls involve a move, and rustc will call memcpy() for you if it needs to move large things. The compiler optimizes out all the moves that don’t actually do anything useful, and the borrow checker makes sure that the moves can’t happen if you have any pointers that would need fixing up after. That’s exactly what the borrow checker is for, after all.
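A tiny sketch of what “move” means in practice, assuming nothing beyond the standard library. Surface and demo_move are made up for the example. It moves a value from the stack into a Box and compares the addresses before and after; the commented-out lines show the kind of code the borrow checker refuses to compile.

```rust
/// Hypothetical stand-in for some chunk of compositor state.
pub struct Surface {
    pub id: u32,
    pub pixels: [u8; 64],
}

fn addr<T>(r: &T) -> usize {
    r as *const T as usize
}

/// Returns (address before the move, address after the move, id after the move).
pub fn demo_move() -> (usize, usize, u32) {
    let on_stack = Surface { id: 7, pixels: [0; 64] };
    let before = addr(&on_stack);

    // Moving into a Box copies the bytes to a fresh heap allocation.
    // After this line `on_stack` can no longer be used at all.
    let boxed = Box::new(on_stack);
    let after = addr(&*boxed);

    // The borrow checker rejects a move while a reference is still live:
    //
    //     let r = &on_stack;
    //     let boxed = Box::new(on_stack); // error: cannot move out of
    //     println!("{}", r.id);           // `on_stack` because it is borrowed
    //
    // A raw pointer gets no such protection, which is where the trouble
    // with C-style APIs starts.
    (before, after, boxed.id)
}
```

The two addresses come back different while the contents are intact. Any raw pointer that had been aimed at the old location would now be dangling, and nothing would have warned you.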

These are very different and fundamental opinions about how a program works, and they result in very different sorts of designs for solving the same problems.

Other languages have their own opinions on these things, but like C they’re a bit more implicit. In my experience they generally fall into two categories: Either your language has a GC that is allowed to move stuff wherever it feels like, which is most languages these days, and you don’t think about this at all because the GC does all the moving and fixes up the pointers for you. The other category as far as I know consists only of C++, and you have a gross morass of copy constructors, move constructors, and Eris knows what else that try to let you tell the language what the heck it should do when you move something. (C-contemporary systems languages such as Pascal pretty much use C’s assumptions.)

Whichever opinion your language has, it reflects into API design in broad and subtle ways. C tends to go for patterns that don’t involve moving objects in memory. wl_list is exactly this, you can add and remove and reorder nodes just by shuffling pointers around inside a couple nodes. It’s super convenient. wl_listener is also this, a wl_listener stops working right if you move it in memory; you need to unregister it and re-register the new one. (A wl_listener contains a wl_list node, naturally.) But if you never move it or the object containing it, you get one less pointer to worry about. Contrast this with Rust’s Vec, which is a dynamic array that will resize itself if necessary… allocating new memory and moving its old contents into it. Vec is super common in Rust code, but you can’t just shove your Wayland types into a Rust Vec, because the next time someone stuffs a new object into it or reorders it or such, it will nuke relevant pointers. To make Wayland and Rust cooperate you have to know where that matters and tell Rust not to do that.
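One common way to “tell it not to do that” is to put each object in its own heap allocation and keep only the Box in the Vec. I’m not claiming this is exactly what Swot does; it’s a minimal sketch of the idea, with View and demo_stable_addresses invented for the example. Growing or reordering the Vec then only moves the Box pointers around, and the objects they point at stay put.

```rust
/// Hypothetical stand-in for a struct that C code holds pointers into.
pub struct View {
    pub id: u32,
}

/// Returns true if the first View is still at its original address after the
/// Vec holding it has been grown many times and reordered.
pub fn demo_stable_addresses() -> bool {
    let mut views: Vec<Box<View>> = Vec::new();
    views.push(Box::new(View { id: 0 }));

    // Pretend this address was handed to C, e.g. via a registered listener.
    let handed_to_c: *const View = &*views[0];

    // Grow the Vec well past its initial capacity, forcing its buffer to be
    // reallocated. Only the Box pointers get copied, not the Views.
    for id in 1..1000 {
        views.push(Box::new(View { id }));
    }

    // Reordering also just shuffles Box pointers.
    views.swap(0, 999);

    let idx = views.iter().position(|v| v.id == 0).unwrap();
    std::ptr::eq(handed_to_c, &*views[idx])
}
```

This costs an extra allocation and an extra pointer hop per object, and it still breaks if you move the View out of its Box, so it’s a convention to be careful about rather than something the compiler enforces. std::pin::Pin exists to express that kind of promise in the type system, at the price of more ceremony.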

This is a very interesting design space with a fair bit of engineering and not a lot of research behind it, as far as I can tell. Thinking about programs in terms of ownership and moving things in memory predates Rust, but seems to have become more common as Rust has. (This is my own impression, so maybe I’m wrong.) But I look forward to a new generation of languages that explore these ideas more.

So, that’s where the real impedance mismatch between C and Rust lies, at least as far as this experience with Wayland goes. Once you understand that, and know where to look to spot these implicit assumptions, you can start to design in ways that work better for both.

Bugs

So the whole argument behind Rust is that you can write hyper-efficient low-level code, like display systems, with fewer errors. Does it live up to it in this case? Well, Swot is not really a fair comparison of this, since it’s a transliteration of already more-or-less-bug-free C code. But even when basically just transcribing code, there will be bugs. Maybe more than writing your own version from scratch, ’cause your mental model is incomplete and it’s easy to overlook details. Swot is far from perfect, and getting it working as much as it does required some time in gdb. Here are the classes of bugs I’ve fought:

  • Pointer screwups bit me several times
  • User error in wl_container_of bit me several times
  • I got bit a couple times by Rust moving things that had raw pointers pointing to them, then I got very paranoid about anything getting moved and stopped having issues.
  • Hardest bug to find was actually just a missed ! in an if statement.

One thing I was curious about is whether I would find any existing bugs in tinywl by transcribing it to Rust. Answer is, not yet. A couple oddities though:

  • There were a couple unused variables in some functions, which have since been removed in git master
  • There’s a couple functions with a time parameter for some reason, which is unused.
  • The logic in process_cursor_motion() stood out as the most complex and stateful part of it by far, and it was tricky to understand its several interlocking states well enough to translate them to Rust. Or maybe it was just ’cause I did that part late at night.
  • …Actually, it’s bigger than that: desktop_view_at(), everything that calls it, and its associated functions are just a really good way to make a tangled up mess of which View and wlr_xdg_surface actually belong together. I’d consider that a strong candidate for refactoring, and I noticed it mainly because trying to translate more of its calls from pointers to safe references caused borrow checker issues. That made me realize that usually focus_view() is called with a View and then the wlr_surface that View contains, which seemed redundant, but there’s like one single case where this isn’t necessarily true. On the other hand, the core of all of this is wlr_xdg_surface_surface_at(), which looks like it does pretty delicate things in the first place, so maybe it’s just fundamentally complicated.
  • Now that I think of it, tinywl’s calloc() calls aren’t checked, nor are the return values of a variety of other system functions. It’s example code, so I’m fine with it, but it’s still irksome. I didn’t even realize until now.

Also notable is what I didn’t find:

  • Screwups with unions were absent – there aren’t many of them and their state is always very explicit
  • Screwups with uninitialized values were absent – tinywl is very careful about these
  • Screwups with implicit numerical conversions were absent as far as I could tell. However, said conversions are everywhere, which Feels Bad after basically only writing Rust for so long. Just taking a random f64, doing a bunch of math on it and then stuffing it into a u32 makes me uncomfortable.
  • wl_list didn’t bite me, though possibly because I got rid of it everywhere possible.
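For contrast on the numeric conversion point, here is roughly what the Rust side looks like. Rust makes you write the float-to-integer cast explicitly with `as`, and that cast saturates instead of doing something undefined. If you want out-of-range values rejected rather than clamped you have to write the check yourself; checked_f64_to_u32 below is a hypothetical helper, not something from the standard library or from Swot.

```rust
/// Explicit `as` cast: truncates toward zero, saturates at the bounds of u32,
/// and turns NaN into 0.
pub fn saturating_f64_to_u32(x: f64) -> u32 {
    x as u32
}

/// Hypothetical checked variant: returns None for NaN, infinities, negative
/// values and anything too large for a u32. In-range values are truncated
/// toward zero, same as the plain cast.
pub fn checked_f64_to_u32(x: f64) -> Option<u32> {
    if x.is_finite() && x >= 0.0 && x <= u32::MAX as f64 {
        Some(x as u32)
    } else {
        None
    }
}
```

So saturating_f64_to_u32(-1.5) is 0 and saturating_f64_to_u32(1e20) is u32::MAX, while checked_f64_to_u32 returns None for both. Whether clamping or rejecting is right for, say, a cursor coordinate is a design decision; the point is only that Rust makes you spell out that a conversion is happening.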

So, tinywl is in general an example of very tight and specific code that Works. There are no checks for things that can’t happen, no initialization of things that don’t need to be, and so on. It’s pretty nice when a program is small and specific enough that that’s possible, and C is very nice for writing such programs. There’s just also no safety nets for things that are trivially wrong, because, C.

Oh yeah, Wayland

Almost an afterthought by this point. Frankly, the links in the reading list section explain it better than I can, and I don’t trust my knowledge to be terribly complete or accurate yet. Instead I’ll just outline the general flow of how things in Swot fit together, which is basically the story of following wl_signal_add() calls. These all seem to come from three main sources: inputs, outputs, and xdg_shell.

You start everything with wlroots’s wlr_backend. The wlr_backend has two events that matter here: new_output and new_input. new_output happens when a new display device is attached/discovered (such as when the compositor first starts or a new screen is plugged in), and in it you should register a handler for a new event, frame, that occurs when it is time to draw on that display. Handling the frame event generally consists of clearing the screen, drawing all the windows in it however you feel like, drawing the mouse cursor, and then signaling you are done. You can do more sophisticated window damage/dirty-rect tracking, but Swot doesn’t. new_input occurs when a new input device is attached/discovered, such as plugging in a new keyboard, and its handler finds out what kind of input device it is (keyboard, tablet, etc) and registers new event handlers for that input device. For a keyboard, this appears to be key and modifier events, and the compositor either handles the key presses itself (such as swapping window focus in response to alt-tab) or passes them on to whatever program it decides should get them.
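The shape of that flow can be sketched as a toy model in plain Rust. To be loud about it: this is not the wlroots API and not Swot’s code. Every type and name here (Event, Compositor, Output, InputKind, KEY_TAB) is invented for illustration, and real wlroots delivers these as separate callbacks registered with wl_signal_add() rather than as one enum.

```rust
/// Hypothetical keycode for Tab (15 is KEY_TAB in Linux's input event codes).
pub const KEY_TAB: u32 = 15;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum InputKind {
    Keyboard,
    Pointer,
}

/// Toy versions of the events described above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Event {
    NewOutput { name: &'static str },
    NewInput { kind: InputKind },
    Frame { output: usize },
    Key { keycode: u32, alt_held: bool },
}

#[derive(Debug)]
pub struct Output {
    pub name: &'static str,
    pub frames: u32,
}

#[derive(Debug, Default)]
pub struct Compositor {
    pub outputs: Vec<Output>,
    pub keyboards: u32,
    pub pointers: u32,
    pub focus_switches: u32,
    pub forwarded_keys: Vec<u32>,
}

impl Compositor {
    pub fn handle(&mut self, event: Event) {
        match event {
            // A new display: start tracking it so its frame events mean something.
            Event::NewOutput { name } => self.outputs.push(Output { name, frames: 0 }),
            // A new input device: work out what it is and track it accordingly.
            Event::NewInput { kind: InputKind::Keyboard } => self.keyboards += 1,
            Event::NewInput { kind: InputKind::Pointer } => self.pointers += 1,
            // Time to draw: a real handler clears the screen, draws the windows
            // and the cursor, then signals it is done. Here we only count.
            Event::Frame { output } => {
                if let Some(out) = self.outputs.get_mut(output) {
                    out.frames += 1;
                }
            }
            // Either the compositor eats the key itself, or a client gets it.
            Event::Key { keycode, alt_held } => {
                if alt_held && keycode == KEY_TAB {
                    self.focus_switches += 1;
                } else {
                    self.forwarded_keys.push(keycode);
                }
            }
        }
    }
}
```

Feeding it a NewOutput, a couple of Frame events and some Key events is enough to see the pattern: discovery events set up state, and later events are only meaningful for things that were discovered first.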

wlroots’s seat abstraction has a lot of useful functions and state for mongling input events from various devices, keeping track of modifier keys, keeping track of what program has focus, setting up keymaps properly, drawing cursors, and so on. wlroots also handles mouse events but in the process makes things a little less orthogonal, since at its most basic level a mouse/trackball/whatever gives you motion and button press events just like a keyboard would, but in practice handling the mouse inputs and drawing the cursor at the same time gets more convoluted. wlroots has functions and events for doing things like changing the cursor image, notifying programs when the cursor enters/leaves a window, tracking where the cursor is on the screen in multi-screen setups, and so on. In general it does lots of the stuff that these days we expect to Just Happen. Nothing says that moving a mouse around has to result in corresponding movements of a little pointer icon on the screen, but it’d be hard to find a system where it doesn’t. You could definitely do things differently such as, say, each window having its own cursor position, or making it possible for a mouse cursor to hide under a window, but the utility of that seems pretty dubious.

So finally we come to the xdg_shell family of events. The “XDG shell” is a Wayland protocol that defines everything about windows and how they can behave. It’s technically optional, but if you want to have a system that displays programs in windows that work more or less how everyone has expected them to work since 1993, then it’s probably useful. It’s also what lets programs communicate their state back to the Wayland compositor by saying things like “set my title to this string” or “I am done drawing, display these pixels at your leisure”. Swot only handles the basic events that deal with manipulating surfaces, where a “surface” is a buffer of pixels to be drawn on (or GPU equivalent). There’s map and unmap events, which occur when the client window is shown or hidden, there’s request_move and request_resize which happen when the client asks to be moved or resized (such as for client-side window controls), and there’s destroy which happens when the surface is destroyed.
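The surface events can be modelled the same way. Again this is a toy with invented names (SurfaceEvent, View, Desktop, add_view), not the xdg_shell protocol or wlroots types, and the request_move handling is only my assumption about the general shape: the compositor notes which view is being dragged and lets later cursor motion do the actual moving.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SurfaceEvent {
    Map,
    Unmap,
    RequestMove,
    Destroy,
}

#[derive(Debug)]
pub struct View {
    pub id: u32,
    pub mapped: bool,
}

#[derive(Debug, Default)]
pub struct Desktop {
    pub views: Vec<View>,
    /// The view currently being dragged around, if any.
    pub grabbed: Option<u32>,
}

impl Desktop {
    /// A client created a new surface: track it, but do not show it yet.
    pub fn add_view(&mut self, id: u32) {
        self.views.push(View { id, mapped: false });
    }

    pub fn handle(&mut self, id: u32, event: SurfaceEvent) {
        // Events for surfaces we no longer know about are ignored.
        if let Some(idx) = self.views.iter().position(|v| v.id == id) {
            match event {
                SurfaceEvent::Map => self.views[idx].mapped = true,
                SurfaceEvent::Unmap => self.views[idx].mapped = false,
                SurfaceEvent::RequestMove => self.grabbed = Some(id),
                SurfaceEvent::Destroy => {
                    self.views.remove(idx);
                    if self.grabbed == Some(id) {
                        self.grabbed = None;
                    }
                }
            }
        }
    }

    /// The views a frame handler would actually draw.
    pub fn visible_ids(&self) -> Vec<u32> {
        self.views.iter().filter(|v| v.mapped).map(|v| v.id).collect()
    }
}
```

The thing to notice is the cleanup in Destroy: anything that refers to the view has to be cleared at that point, which in the real C API also means removing every listener that lived inside it.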

Hopefully with this you can start to get a feel for how everything fits together. When a compositor starts, it gets a pile of events telling it about new input and output devices, and does whatever it feels like to keep track of them and register event handlers for them. When a user pushes buttons on the keyboard, wiggles the mouse, or plugs a new monitor in, the compositor gets events notifying it of these things and it does stuff like pass events to the currently-focused window or rearrange the desktop to include the new monitor. When a program starts it generally asks the compositor for a new surface to draw stuff on (or two, or three), then draws whatever and however it feels like, but talks back and forth with the compositor to say “let me know when it’s time to draw” and “I’m done drawing”. Then when it shuts down it tells the compositor it’s done with its surfaces and the compositor stops listening for any events associated with that particular program. There’s a lot going on, and it’s very low-level, but it all seems to come together pretty nicely. The somewhat asynchronous event-and-callback-driven structure seems to be a very modern style, and it can get spaghettified very quickly if you let it, but if you keep a grip on what interacts with what then it can be very efficient.

To finish, let’s try to compare complexity at least a little bit between Wayland and X11. Lines of code is a crap measure of complexity, but until someone actually invents something better, we’ll use that:

  • The xorg X11 server, version 1.20, weighs in at about 400,000 lines of code without comments. This is actually a lot less than I expected. This is just the bare server, with no utilities, drivers or libs or other such things. It’s low enough in contrast to X11’s fearsome reputation that I’m actually wondering what I’m missing here; I don’t know much about how the guts of xorg are structured. (Edit from the future: Apparently what I missed is that during the transition from XFree86 to Xorg something like half the code got cleaned up and removed.)
  • wlroots version 0.10 is about 56,000 lines of code
  • The sway Wayland compositor, version 1.3, is about 40,000 lines of code, including utility code like swaybar. Between them wlroots and sway do pretty much everything that xorg plus a window manager do.
  • i3 version 4.18.1, the X11 window manager that directly inspired sway, is about 30,000 lines of code, including utilities.
  • The awesome version 4.3, another quite lightweight X11 window manager similar in spirit to i3, is about 60,000 lines of code without utilities.

So by this (bad) measure, writing a Wayland compositor is more work than an X11 window manager, but not by very much if you use wlroots.

Future work

To be done by any interested party. I’ve got other things to do this year.

  • Bug in Swot: draw ordering for selected windows still doesn’t work ’cause I couldn’t be arsed to figure it out.
  • Clean up the Swot code, restructure things in ways to make it more harmonious with Rust – more references, maybe a few wrappers to make wl_listener’s more convenient, etc.
  • Try to use wayland-sys and maybe an updated/custom wlroots-sys instead of generating the bindings yourself.
  • Make a version using smithay instead of wlroots, compare and contrast

Appendix: discussion

Good comment by levansfg on Reddit:

Hey there! As one of smithay’s main dev, I cannot not react to this. :)

Overall I’d say I mostly agree with your description of the state or Rust + Wayland, but I’d like to add a few details, for whoever may be interested:

Making a basic Wayland compositor involves startlingly little actual drawing. Wayland is mostly a pile of protocols, with each protocol being an API defining functions, events and resources. That’s the compositor’s real job: it’s there to handle events such as key presses, windows resizing, new monitors being plugged in, and to manage resources such as key maps, cursors and memory buffers representing chunks of screen real-estate.

While this is kind of true, I think you get this vision because wlroots does a huge part of the heavy-lifting for you. A very significant part of the job of a Wayland compositor is to interface with the OS. This means DRM, GBM, udev, logind, libinput (on Linux at least), and this is no small feat.

If you want a general idea of how much work this represent, Smithay’s codebase is roughtly 1/3 code for managing the Wayland clients, 1/3 code for managing the graphics stack (DRM/GBM/OpenGL), and 1/3 code for managing the rest of the system interfaces (udev, logind, libinput).

The actual “using OpenGL to draw” code though is indeed a really small part of the whole thing.

So, that’s where the real impedance mismatch is between C and Rust lies, at least as far as this experience with Wayland.

I agree with you that this question about whether and how things move is a significant part of the friction, but with my years working on wayland-rs I’d also add an other aspect (which I suspect is mostly hidden deep in the guts of wlroots so you probably didn’t need to face it): pointer lifetimes.

libwayland’s API gives you access to lots of pointer with a dynamically defined lifetimes, that are sometimes not even controlled by the app itself but by event coming from the Wayland socket. So you get some situations like “once you have received that event (via a callback), this other pointer is no longer valid”. This kind of things require a lot of runtime book-keeping to fit into a Rust API.

These two friction points were mainly the reason I started Smithay in the first place: Trying to reduce the impedance mismatch to a minimum by relegating it to the lowest-level possible places: actual FFI bindings to libwayland (and other system libraries), and then write the whole “50.000 lines of code you’ll write anyway” directly in Rust. As a result, Smithay’s API ends up being (I believe) quite different from the one of wlroots. And hopefully much more Rust-friendly. :)

Still, making a good comparison is difficult, given wlroots is a much larger and mature project compared to Smithay, which as of today remains a few-persons show.

Finally, to add to your measures comparing lines of code, according to tokei:

  • The whole set of wayland-rs crates is 10k lines of code
      • This includes both binding code to system libwayland as well as a pure Rust implementation of the Wayland protocol (you can choose which on you want using a cargo feature).
      • I’d say roughtly 1/4 is binding code, 1/2 is protocol implementation, and 1/4 is shared logic between them.
  • Smithay is 15k lines of code
      • This does not count the various FFI crates it uses, which each expose a Rust API on top of a system library
      • Keep in mind that the main reason Smithay is much smaller than wlroots is likely because it is much less feature-complete.
  • anvil is 2.6k lines of code
      • anvil is the standard compositor of Smithay, much like rootston for wlroots.

I found this recently too and it’s very nice: The Real Story Behind Wayland and X - Daniel Stone (linux.conf.au 2013)