WritingAWaylandCompositorInRust
Wayland and the associated libraries are a little bit of a moving target. This information is, as far as I know, up to date as of May 2020.
Introduction
When the Rust programming language was bright and shiny and new, one of the more interesting/popular projects for it was Way Cooler, a Wayland compositor in Rust. Wayland is Linux’s next-gen API to replace X11 for graphics and user interaction, so this was pretty cool, two next-gen technologies playing nice together. So it was a little bit of an “oof” moment when Way Cooler announced it was giving up on Rust to rewrite in C instead.
To me this seemed the acme of foolishness, since IMHO even writing
unsafe Rust is far nicer than writing C. Despite striving for memory safety
everywhere, the set of programs that can easily be proved
memory-safe at compile time is pretty small when compared to the set of
programs that are actually memory safe, especially when dealing with
spooky stuff like hardware. A lot of those not-provably-safe programs do
pretty useful things, so we must keep some perspective: the problem is
unnecessary unsafety. What is necessary depends on
your problem domain and the design tradeoffs you decide to make, and so
is up to you. The problem with ensuring there’s only necessary
unsafety when interfacing Rust and C is that even very solid and
battle-tested C code really does not map well to the rules that Rust
uses to make sure the compiler can prove a program is memory-safe. You
have to constrain your code so that it follows the rules. If you can’t
do this, the main ways to turn an unsafe interface safe are to add
runtime checks such as refcounted pointers, or arrange things so that
there are still some unsafe APIs but they are easy to contain and
verify manually. The Way Cooler developers wanted to make wlroots-rs
capital-S Safe everywhere, they wanted to do it
without runtime checks, and they were working on this in 2017
and early 2018 when there was less community wisdom about how to do (or
not do) these things most effectively. They wanted to achieve memory
safety entirely by structuring the code so it could be proved safe at
compile time, while working with C code that was foreign to Rust’s
rules. They failed to meet all these goals in a reasonable way.
They are in very good company.
So, it was sad but not a huge surprise to me when, eight months later, there was another blog post announcing the end of the project. It had some interesting insights though, chronicling a tragic and familiar tale of misplaced expectations, development hell and questionable assumptions, made by skilled but relatively inexperienced programmers who weren’t sure what they wanted. The postmortem is definitely worth a read; if only we could all have so vivid and interesting a Learning Experience.
It left me with some questions though, mainly, what IS it like to
write a Wayland compositor in Rust? How safe can you make it, or not
make it? So now, a couple years later, I’ve taken a bit of time and
translated the wlroots example code from C to Rust. Here are my
findings.
Swot
First off, how the heck do you write a Wayland compositor? Well,
let’s look for an example. The most complete and popular compositor out
there that isn’t part of Gnome or something equally complex is probably
Sway. Sway is a fairly mature, fairly
simple compositor that works quite well once you get used to it. In fact
it was using Sway on my Pinebook Pro that got me into this endeavor in
the first place: it works far better on the Pinebook’s low-end
hardware than any X11 WM I’ve tried, even the lightweight ones. Sway is
written in C, using a C library called wlroots, which is maintained
mostly by the same people and seems to be becoming the de-facto lib for
interfacing with Wayland at a (slightly) higher level than “do it all
yourself”. The wlroots tagline of “50,000 lines of code you’d have to
write anyway” is pretty good advertising. The only other such lib I am
aware of is the smithay Rust crate, which intends to do something fairly
similar, but I chose not to use it ’cause wlroots is already
demonstrated by Sway to More Or Less Work. I am still learning how
Wayland works myself and didn’t want to multiply my problems by digging
into a new lib when I wasn’t sure how complete it was, or even how to
judge its completeness. A proper compare and contrast between wlroots
and smithay may happen Real Soon.
Next, the most important part: The name. I took the name Sway, changed a couple letters, and got Swot. Apparently this is British slang for someone who is way too interested in a topic, which probably accurately describes most people who write Wayland compositors. The source code is here: https://hg.sr.ht/~icefox/swot.
Now, this is NOT a fully-functional, featureful, usable Wayland
compositor. It’s a direct port of the tinywl minimum-viable-product
example code that ships with wlroots. It’s also not perfect: it
currently still has a bug or two and needs a lot of cleanup. My original
plan was to make
something like Sway but maybe a little less obtuse in its window
arrangement, but I’ve since gotten used to Sway so I’m no longer
motivated to do anything nearly as huge.
Finally, build setup: This uses the wlroots C library, and generates
Rust wrappers to it and the Wayland C headers using bindgen. There is a
wayland-sys crate which is maintained by smithay and looks perfectly
fine, but the wlroots-sys crate was part of Way Cooler and is now
tragically out of date. Making a new, proper wlroots-sys crate that is
up to date and works with wayland-sys would be great, but a) now I’d
have two projects, and b) I’d have to be responsible and maintain
wlroots-sys for the foreseeable future. So instead I just used bindgen
to create wrappers for them both as part of Swot. This does mean that
things Mysteriously May Not Work unless you’re using Debian Bullseye
with wlroots 0.10.1, and building the thing at all takes a bit of work.
Sorry.
Reading list
There’s a lot of documentation out there on Wayland but most of it is
dense and hard to get into without wider context. Instead I recommend
the wlroots
docs and examples, which often then direct you to more in-depth
references or blog posts.
Also, https://wayland-book.com/ which came out just
after I basically finished Swot. It looks somewhat WIP, but,
that’s life. Written by the originator of Sway and wlroots,
so, he probably knows what he’s talking about.
Learning Experiences
I’m not going to go through Swot in huge depth because the actual
mechanics are pretty straightforward, and the tinywl
example does a far better job of explaining what’s actually going on
than I will. Instead, I’m going to talk about what I learned from doing
it.
Making a basic Wayland compositor involves startlingly little actual drawing. Wayland is mostly a pile of protocols, with each protocol being an API defining functions, events and resources. That’s the compositor’s real job: it’s there to handle events such as key presses, windows resizing, new monitors being plugged in, and to manage resources such as key maps, cursors and memory buffers representing chunks of screen real-estate. Everything that happens starts from an event, all events are handled via callbacks, and some resources are the opaque “you can touch pointers to this but never the thing itself” kind while others are defined as C structs with various fields.
The real complexity comes from two sources: first, sorta like Vulkan,
you start off not knowing anything about the system you’re running on
and you have to figure out the hardware and capabilities to see what’s
actually going on, and load the protocols associated with those at
runtime. wlroots
handles most of this for you though, and
provides you with some reasonable abstractions, like a “seat” which is a
collection of N displays, 0-1 keyboards, 0-1 pointers, and 0-1 touch
devices. The second source of complexity is that when you decide to
handle a protocol, it can result in you then handling other protocols
associated with it, so the data structures and callbacks proliferate and
you get to manage them all by hand.
Each of these protocols is defined as an XML file full of
descriptions of structures and functions, and there’s a generator
program which turns it into code. For C it’s called wayland-scanner.
This is a nice-ish system for making language-independent APIs; OpenGL
and Vulkan do the same thing. In practice all of these APIs encode C’s
assumptions in them, but, that’s at least something to start with.
(Also, it turns out that all these protocols do actually turn into
messages in a binary wire format which gets sent over a Unix socket
between a Wayland client and compositor. I didn’t actually know that
while writing Swot!) There is a wayland-sys Rust crate that is just the
output of its own version of wayland-scanner that generates Rust code,
but for this effort I took the C Wayland code and wrapped it in Rust
using bindgen. This caused some issues, because C, but was all in all
fairly easy to get going. I had to rewrite Rust versions of a couple C
things that were implemented as macros or static inline functions,
which bindgen couldn’t create bindings to. Tracking them down was
annoying but not actually troublesome.
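To give a feel for what “rewriting a static inline function” looks like, here is a minimal sketch of a Rust port of wl_signal_add(), which is a static inline function in the Wayland server headers and so a likely candidate. This is not the actual Swot code. The struct definitions are hand-written stand-ins so the sketch compiles on its own (in a real project they would come from the bindgen output), and the list helpers are re-implemented here rather than linked from libwayland.

```rust
use std::os::raw::c_void;
use std::ptr;

// Hand-written mirrors of the C structs (assumption: a real build would use
// the bindgen-generated definitions instead).
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct wl_list {
    pub prev: *mut wl_list,
    pub next: *mut wl_list,
}

#[allow(non_camel_case_types)]
#[repr(C)]
pub struct wl_listener {
    pub link: wl_list,
    pub notify: Option<unsafe extern "C" fn(listener: *mut wl_listener, data: *mut c_void)>,
}

#[allow(non_camel_case_types)]
#[repr(C)]
pub struct wl_signal {
    pub listener_list: wl_list,
}

/// Same logic as C's `wl_list_init()`: an empty list points at itself.
pub unsafe fn wl_list_init(list: *mut wl_list) {
    unsafe {
        (*list).prev = list;
        (*list).next = list;
    }
}

/// Same logic as C's `wl_list_insert()`: link `elm` in right after `list`.
pub unsafe fn wl_list_insert(list: *mut wl_list, elm: *mut wl_list) {
    unsafe {
        (*elm).prev = list;
        (*elm).next = (*list).next;
        (*list).next = elm;
        (*(*elm).next).prev = elm;
    }
}

/// Count the nodes in a list, not including the head itself.
pub unsafe fn wl_list_length(list: *mut wl_list) -> usize {
    unsafe {
        let mut count = 0;
        let mut node = (*list).next;
        while node != list {
            count += 1;
            node = (*node).next;
        }
        count
    }
}

/// Port of the `static inline` `wl_signal_init()`.
pub unsafe fn wl_signal_init(signal: *mut wl_signal) {
    unsafe { wl_list_init(ptr::addr_of_mut!((*signal).listener_list)) }
}

/// Port of the `static inline` `wl_signal_add()`: append the listener to the
/// end of the signal's listener list.
pub unsafe fn wl_signal_add(signal: *mut wl_signal, listener: *mut wl_listener) {
    unsafe {
        wl_list_insert(
            (*signal).listener_list.prev,
            ptr::addr_of_mut!((*listener).link),
        );
    }
}
```

Everything here works on raw pointers, and none of the structs may be moved once they are linked together, since each node stores the addresses of its neighbours.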
The more awkward parts are… well, Wayland code is REALLY C-ish.
There’s a wl_list type for an intrusive linked list, and a pile of C
macros for working with it, and Rust does not much like intrusive linked
lists. (More on that later.) Then there’s wl_listener. wl_listener is a
two-field struct that you use to define the callback for a particular
event, and it contains a function pointer that gets called when that
event occurs. All well and good. This callback gets a pointer to the
event struct, and a pointer to the wl_listener. However, I’d expect the
wl_listener to also contain a void pointer that you could stick
arbitrary data into, so you could pass whatever other random data you
wanted into the callback. That doesn’t exist. Well then, how do you get
the callback the context it needs to actually do stuff in response to
the event? For, as we all know, if you use a global variable then the
ghost of Edsger W. Dijkstra will haunt you forevermore.
You get sneaky, of course. You stick your wl_listener into a struct
that has everything you need in it. Then when your callback is called,
it gets the pointer to the wl_listener and, since you know the type of
the struct containing the wl_listener and where in the struct it is, you
back out a pointer to that struct from the pointer aiming at the
listener. So you keep together your listener and whatever the event
handler in question needs, and it All Magically Just Works. There’s a
macro for it, wl_container_of.
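Here is a minimal sketch of the same trick in Rust, in case seeing it helps. It is not Swot’s actual macro: the Listener and Keyboard types, the container_of! macro and the helper functions are all made up for illustration, and it assumes Rust 1.77 or newer for std::mem::offset_of!.

```rust
/// Stand-in for `wl_listener`; the real one also carries a `wl_list` link.
#[repr(C)]
pub struct Listener {
    pub notify: Option<unsafe fn(*mut Listener)>,
}

/// Hypothetical per-keyboard state. The listener lives inside the struct
/// that its callback needs access to.
#[repr(C)]
pub struct Keyboard {
    pub name: &'static str,
    pub presses: u32,
    pub key: Listener,
}

/// Rough equivalent of `wl_container_of`: step back from a pointer to a
/// field to a pointer to the struct containing it. Must be used inside an
/// `unsafe` block, and only on pointers that really point at that field.
macro_rules! container_of {
    ($ptr:expr, $Container:ty, $field:ident) => {
        ($ptr as *mut u8).sub(::std::mem::offset_of!($Container, $field)) as *mut $Container
    };
}

/// The callback only receives the listener pointer, yet it can still reach
/// the keyboard state around it.
pub unsafe fn handle_key(listener: *mut Listener) {
    unsafe {
        let keyboard = container_of!(listener, Keyboard, key);
        (*keyboard).presses += 1;
    }
}

/// Produce the listener pointer that would be handed to the C library. It is
/// derived from a pointer to the whole `Keyboard`, so walking back out to
/// the container stays within the memory the pointer was created from.
pub unsafe fn listener_of(keyboard: *mut Keyboard) -> *mut Listener {
    unsafe { std::ptr::addr_of_mut!((*keyboard).key) }
}
```

Calling the stored notify function with the pointer from listener_of() bumps presses on the surrounding Keyboard, even though the callback was never given a pointer to the Keyboard itself.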
Get it? No? I didn’t either for a while, then it took me a bit longer
to believe it. I commented on the #sway-devel
IRC channel
that I’d never seen anything like that before and someone said “really?
that sort of thing is pretty common in Linux kernel code”. I guess I
need to read more hardcore C code, then. Apparently this is actually
nice because it avoids having that mystery void pointer at all. If you
did the same thing but passed a pointer to your context struct to the
event, you’d need to make sure that pointer kept in sync with the
listener containing it. Using wl_container_of
instead, you
only end up doing cruel and unusual things to one pointer instead of
two.
Rust generally does not do this sort of thing. It certainly
can, of course, and I wrote my own version of the wl_container_of
macro, called conjure_heckin_ptr!(). However, every time I use this I
imagine rustc looking at me with big sad puppy-dog eyes saying “this is
a small struct getting given to a function, I want to just pass it in
registers. You realize I can’t do that now, right? Why do you hate
passing things in registers?” However, this actually gets us into an
interesting design choice between C and Rust, and how that affects
APIs.
C assumes things in memory do not move. Rust assumes things in memory may always move.
Here, “move” is meant quite literally: making a copy of something in
memory and never referring to the old one again. If you reallocate an
array and copy the contents of the old array into it, that’s a move. If
you copy a struct into a bigger struct that contains it, that’s a move.
In C, unless a value is an atom like an int or pointer, moving it
generally requires an explicit memcpy(). Plus, if you move something,
you have to fix up all the pointers referring to it to aim at the new
version. This is annoying and error prone and so when writing C code
people design their programs to avoid it. In Rust however, it’s assumed
that things move all the time; every variable assignment and lots of
function calls involve a move, and rustc will call memcpy() for you if
it needs to move large things. The compiler optimizes out all the moves
that don’t actually do anything useful, and the borrow checker makes
sure that the moves can’t happen if you have any pointers that would
need fixing up after. That’s exactly what the borrow checker is for,
after all.
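A tiny sketch of what that looks like in practice. The Output type and both function names are invented for the example. The function moves a value from the stack into a Box and reports the address before and after.

```rust
#[derive(Debug)]
pub struct Output {
    pub width: u32,
    pub height: u32,
}

/// The address a reference points at, as a plain number.
pub fn address_of(output: &Output) -> usize {
    output as *const Output as usize
}

/// Move an `Output` from the stack to the heap and return
/// (address before the move, address after the move).
pub fn move_to_heap(output: Output) -> (usize, usize) {
    let before = address_of(&output);

    // This is a move: the bytes are copied into a fresh heap allocation and
    // `output` may never be used again.
    let boxed: Box<Output> = Box::new(output);

    // If a reference to `output` were still in use at this point, the move
    // above would be rejected at compile time with error E0505 ("cannot move
    // out of `output` because it is borrowed").
    let after = address_of(&boxed);
    (before, after)
}
```

The two addresses differ, which is exactly the situation that breaks any raw pointer a C library was still holding to the old location. The borrow checker only protects references it knows about; raw pointers handed to C are invisible to it.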
These are very different and fundamental opinions about how a program works, and they result in very different sorts of designs for solving the same problems.
Other languages have their own opinions on these things, but like C they’re a bit more implicit. In my experience they generally fall into two categories: Either your language has a GC that is allowed to move stuff wherever it feels like, which is most languages these days, and you don’t think about this at all because the GC does all the moving and fixes up the pointers for you. The other category as far as I know consists only of C++, and you have a gross morass of copy constructors, move constructors, and Eris knows what else that try to let you tell the language what the heck it should do when you move something. (C-contemporary systems languages such as Pascal pretty much use C’s assumptions.)
Whichever opinion your language has, it reflects into API design in
broad and subtle ways. C tends to go for patterns that don’t involve
moving objects in memory. wl_list is exactly this: you can add and
remove and reorder nodes just by shuffling pointers around inside a
couple nodes. It’s super convenient. wl_listener is also this: a
wl_listener stops working right if you move it in memory; you need to
unregister it and re-register the new one. (A wl_listener contains a
wl_list node, naturally.) But if you never move it or the object
containing it, you get one less pointer to worry about. Contrast this
with Rust’s Vec, which is a dynamic array that will resize itself if
necessary… allocating new memory and moving its old contents into it.
Vec is super common in Rust code, but you can’t just shove your Wayland
types into a Rust Vec, because the next time someone stuffs a new object
into it or reorders it or such, it will nuke relevant pointers. To make
Wayland and Rust cooperate, you have to know where that matters and tell
Rust not to do that.
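One common way to “tell it not to do that” is to put each object in its own heap allocation and only store the Box in the Vec. This is a sketch of the idea, not necessarily what Swot does, and the View type and function names are made up. When the Vec grows it moves the boxes (which are just pointers) but the objects they point at stay where they are.

```rust
pub struct View {
    pub id: u32,
}

/// Store views inline in the Vec, push `count` of them (at least one), and
/// report whether the first one is still at its original address. Growing
/// the Vec may reallocate and move every element, so this can come back
/// false.
pub fn first_view_stays_put_inline(count: u32) -> bool {
    let mut views: Vec<View> = Vec::with_capacity(1);
    views.push(View { id: 0 });
    let first = &views[0] as *const View as usize;
    for id in 1..count {
        views.push(View { id });
    }
    first == &views[0] as *const View as usize
}

/// Same experiment, but every view lives in its own Box. The Vec only moves
/// the boxes around, so the address of the view itself never changes.
pub fn first_view_stays_put_boxed(count: u32) -> bool {
    let mut views: Vec<Box<View>> = Vec::new();
    views.push(Box::new(View { id: 0 }));
    let first = &*views[0] as *const View as usize;
    for id in 1..count {
        views.push(Box::new(View { id }));
    }
    first == &*views[0] as *const View as usize
}
```

The boxed version costs one allocation per object, which is more or less what the C code pays with calloc() anyway. Whether the inline version actually moves on a given run is up to the allocator, which is part of what makes this class of bug annoying to track down.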
This is a very interesting design space with a fair bit of engineering and not a lot of research behind it, as far as I can tell. Thinking about programs in terms of ownership and moving things in memory predates Rust, but seems to have become more common as Rust has. (This is my own impression, so maybe I’m wrong.) But I look forward to a new generation of languages that explore these ideas more.
So, that’s where the real impedance mismatch between C and Rust lies, at least as far as this experience with Wayland goes. Once you understand that, and know where to look to spot these implicit assumptions, you can start to design in ways that work better for both.
Bugs
So the whole argument behind Rust is that you can write
hyper-efficient low-level code, like display systems, with fewer errors.
Does it live up to it in this case? Well, Swot is not really a fair
comparison of this, since it’s a transliteration of already
more-or-less-bug-free C code. But even when basically just transcribing
code, there will be bugs. Maybe more than writing your own version from
scratch, ’cause your mental model is incomplete and it’s easy to
overlook details. Swot is far from perfect, and getting it working as
much as it does required some time in gdb. Here are the classes of bugs
I’ve fought:
- Pointer screwups bit me several times
- User error in wl_container_of bit me several times
- I got bit a couple times by Rust moving things that had raw pointers pointing to them, then I got very paranoid about anything getting moved and stopped having issues.
- Hardest bug to find was actually just a missed ! in an if statement.
One thing I was curious about is whether I would find any existing
bugs in tinywl
by transcribing it to Rust. Answer is, not
yet. A couple oddities though:
- There were a couple unused variables in some functions, which have since been removed in git master
- There’s a couple functions with a time parameter for some reason, which is unused.
- The logic in process_cursor_motion() stood out as the most complex and stateful part of it by far, and was tricky to understand the several interlocking states to translate it to Rust. Or maybe it was just ’cause I did that part late at night.
- …Actually, it’s bigger than that, the desktop_view_at() functions and everything that calls it and associated functions are just a really good way to make a tangled up mess of which View and wlr_xdg_surface actually belong together. I’d consider that a strong candidate for refactoring, and I noticed it mainly because trying to translate more of its calls from pointers to safe references caused borrow checker issues. That made me realize that usually focus_view() is called with a View and then the wlr_surface that View contains, which seemed redundant, but there’s like one single case when this wasn’t necessarily true. On the other hand, the core of all of this is wlr_xdg_surface_surface_at(), which looks like it does pretty delicate things in the first place, so maybe it’s just fundamentally complicated.
- Now that I think of it, tinywl’s calloc() calls aren’t checked, nor are the return values of a variety of other system functions. It’s example code, so I’m fine with it, but it’s still irksome. I didn’t even realize until now.
Also notable is what I didn’t find:
- Screwups with unions were absent – there aren’t many of them and their state is always very explicit
- Screwups with uninitialized values were absent – tinywl is very careful about these
- Screwups with implicit numerical conversions were absent as far as I could tell. However, said conversions are everywhere, which Feels Bad after basically only writing Rust for so long. Just taking a random f64, doing a bunch of math on it and then stuffing it into a u32 makes me uncomfortable.
- wl_list didn’t bite me, though possibly because I got rid of it everywhere possible.
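On the numeric conversions: in Rust the f64-to-u32 step has to be spelled out, and you get to pick how loud it is. The sketch below uses a made-up helper name, checked_px, and shows one way to refuse values that don’t fit instead of silently squashing them. For comparison, a plain `as` cast in Rust (since 1.45) truncates toward zero, saturates at the ends of the range, and turns NaN into 0.

```rust
/// Convert a floating-point coordinate to a pixel count, returning None for
/// anything a u32 cannot represent (negative, too large, NaN, infinite).
/// Fractions are truncated toward zero.
pub fn checked_px(value: f64) -> Option<u32> {
    if value.is_finite() && value >= 0.0 && value <= u32::MAX as f64 {
        Some(value as u32)
    } else {
        None
    }
}

/// What a bare `as` cast does with the same input: it never fails, it just
/// clamps out-of-range values and maps NaN to zero.
pub fn lossy_px(value: f64) -> u32 {
    value as u32
}
```

So lossy_px(-1.5) quietly gives 0 and lossy_px(1e20) gives u32::MAX, while checked_px returns None for both. Neither is what C does with an out-of-range value, where the conversion is undefined behaviour.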
So, tinywl is in general an example of very tight and specific code that Works. There are no checks for things that can’t happen, no initialization of things that don’t need to be, and so on. It’s pretty nice when a program is small and specific enough that that’s possible, and C is very nice for writing such programs. There’s just also no safety nets for things that are trivially wrong, because, C.
Oh yeah, Wayland
Almost an afterthought by this point. Frankly, the links in the
reading list section explain it better than I can, and I don’t
trust my knowledge to be terribly complete or accurate yet. Instead I’ll
just outline the general flow of how things in Swot fit together, which
is basically the story of following wl_signal_add() calls. These all
seem to come from three main sources: inputs, outputs, and xdg_shell.
You start everything with the wlr_backend protocol. The wlr_backend
has two events: new_output and new_input. new_output happens when a new
display device is attached/discovered (such as when the compositor first
starts or a new screen is plugged in), and in it you should register a
handler for a new event, frame, that occurs when it is time to draw on
that display. Handling the frame event generally consists of clearing
the screen, drawing all the windows in it however you feel like, drawing
the mouse cursor, and then signaling you are done. You can do more
sophisticated window damage/dirty-rect tracking, but Swot doesn’t.
new_input occurs when a new input device is attached/discovered, such as
plugging in a new keyboard, and its handler finds out what kind of input
device it is (keyboard, tablet, etc) and registers new event handlers
for that input device. For a keyboard, this appears to be key and
modifiers events, and the compositor either handles the key presses
itself (such as swapping window focus in response to alt-tab) or passes
them on to whatever program it decides should get them.
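As a toy model of that flow, with no wlroots involved, the logic looks roughly like the sketch below. Every name in it is invented for illustration. Real code does this with wl_listener callbacks and raw pointers rather than enums and method calls, but the decisions are the same: look at what kind of device showed up, and for each key press either eat it or pass it along.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DeviceKind {
    Keyboard,
    Pointer,
    Tablet,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Key {
    Tab,
    Char(char),
}

#[derive(Debug, PartialEq)]
pub enum KeyOutcome {
    HandledByCompositor,
    ForwardedTo(u32),
    Dropped,
}

#[derive(Default)]
pub struct Compositor {
    pub keyboards: usize,
    pub pointers: usize,
    /// Ids of client windows, in focus-cycling order.
    pub clients: Vec<u32>,
    /// Index into `clients` of the window that has keyboard focus.
    pub focused: usize,
}

impl Compositor {
    /// Stand-in for the new_input handler: work out what kind of device it
    /// is and set up whatever is needed for that kind.
    pub fn on_new_input(&mut self, kind: DeviceKind) {
        match kind {
            DeviceKind::Keyboard => self.keyboards += 1,
            DeviceKind::Pointer => self.pointers += 1,
            // This toy compositor ignores tablets.
            DeviceKind::Tablet => {}
        }
    }

    /// Stand-in for the key handler: alt-tab is a compositor binding,
    /// everything else goes to the focused client.
    pub fn on_key(&mut self, alt_held: bool, key: Key) -> KeyOutcome {
        if self.clients.is_empty() {
            return KeyOutcome::Dropped;
        }
        if alt_held && key == Key::Tab {
            self.focused = (self.focused + 1) % self.clients.len();
            return KeyOutcome::HandledByCompositor;
        }
        KeyOutcome::ForwardedTo(self.clients[self.focused])
    }
}
```

With clients 10 and 20, a plain key press goes to 10, alt-tab is swallowed by the compositor and moves focus, and the next plain key press goes to 20.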
wlroots’s seat abstraction has a lot of
useful functions and state for mongling input events from various
devices, keeping track of modifier keys, keeping track of what program
has focus, setting up keymaps properly, drawing cursors, and so on.
wlroots
also handles mouse events but in the process makes
things a little less orthogonal, since at its most basic level a
mouse/trackball/whatever gives you motion and button press events just
like a keyboard would, but in practice handling the mouse inputs and
drawing the cursor at the same time gets more convoluted.
wlroots
has functions and events for doing things like
changing the cursor image, notifying programs when the cursor
enters/leaves a window, tracking where the cursor is on the screen in
multi-screen setups, and so on. In general it does lots of the stuff
that these days we expect to Just Happen. Nothing says that moving a
mouse around has to result in corresponding movements of a
little pointer icon on the screen, but it’d be hard to find a system
where it doesn’t. You could definitely do things differently such as,
say, each window having its own cursor position, or making it possible
for a mouse cursor to hide under a window, but the utility of that seems
pretty dubious.
So finally we come to the xdg_shell
family of events.
The “XDG shell” is a Wayland protocol that defines everything about
windows and how they can behave. It’s technically optional, but if you
want to have a system that displays programs in windows that work more
or less how everyone has expected them to work since 1993, then it’s
probably useful. It’s also what lets programs communicate their state
back to the Wayland compositor by saying things like “set my title to
this string” or “I am done drawing, display these pixels at your
leisure”. Swot only handles the basic events that deal with manipulating
surfaces, where a “surface” is a buffer of pixels to be drawn on (or GPU
equivalent). There’s map
and unmap
events,
which occur when the client window is shown or hidden, there’s
request_move
and request_resize
which happen
when the client asks to be moved or resized (such as for client-side
window controls), and there’s destroy
which happens when
the surface is destroyed.
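Stripped of all the pointer plumbing, the bookkeeping those five events drive is pretty small. The sketch below is a toy model with made-up names (the cursor modes are loosely based on the ones tinywl uses for interactive move and resize), not the wlroots API: map and unmap flip a visibility flag, the two requests switch what cursor motion means, and destroy forgets the view.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CursorMode {
    /// Cursor events are passed on to the client under the cursor.
    Passthrough,
    /// Cursor motion drags the grabbed window around.
    Move,
    /// Cursor motion resizes the grabbed window.
    Resize,
}

#[derive(Debug, Clone, Copy)]
pub enum SurfaceEvent {
    Map,
    Unmap,
    RequestMove,
    RequestResize,
    Destroy,
}

pub struct View {
    pub id: u32,
    pub mapped: bool,
}

pub struct Desktop {
    pub views: Vec<View>,
    pub cursor_mode: CursorMode,
}

impl Desktop {
    /// React to one event for the view with the given id.
    pub fn handle(&mut self, id: u32, event: SurfaceEvent) {
        match event {
            SurfaceEvent::Map => self.set_mapped(id, true),
            SurfaceEvent::Unmap => self.set_mapped(id, false),
            SurfaceEvent::RequestMove => self.cursor_mode = CursorMode::Move,
            SurfaceEvent::RequestResize => self.cursor_mode = CursorMode::Resize,
            SurfaceEvent::Destroy => self.views.retain(|v| v.id != id),
        }
    }

    fn set_mapped(&mut self, id: u32, mapped: bool) {
        if let Some(view) = self.views.iter_mut().find(|v| v.id == id) {
            view.mapped = mapped;
        }
    }

    /// Ids of the views that should currently be drawn.
    pub fn visible(&self) -> Vec<u32> {
        self.views.iter().filter(|v| v.mapped).map(|v| v.id).collect()
    }
}
```

The frame handler for an output would then just walk visible() and draw each one.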
Hopefully with this you can start to get a feel for how everything fits together. When a compositor starts, it gets a pile of events telling it about new input and output devices, and does whatever it feels like to keep track of them and register event handlers for them. When a user pushes buttons on the keyboard, wiggles the mouse, or plugs a new monitor in, the compositor gets events notifying it of these things and it does stuff like pass events to the currently-focused window or rearrange the desktop to include the new monitor. When a program starts it generally asks the compositor for a new surface to draw stuff on (or two, or three), then draws whatever and however it feels like, but talks back and forth with the compositor to say “let me know when it’s time to draw” and “I’m done drawing”. Then when it shuts down it tells the compositor it’s done with its surfaces and the compositor stops listening for any events associated with that particular program. There’s a lot going on, and it’s very low-level, but it all seems to come together pretty nicely. The somewhat asynchronous event-and-callback-driven structure seems to be a very modern style, and it can get spaghettified very quickly if you let it, but if you keep a grip on what interacts with what then it can be very efficient.
To finish, let’s try to compare complexity at least a little bit between Wayland and X11. Lines of code is a crap measure of complexity, but until someone actually invents something better, we’ll use that:
- The xorg X11 server, version 1.20, weighs in at about 400,000 lines of code without comments. This is actually a lot less than I expected. This is just the bare server, with no utilities, drivers or libs or other such things. It’s low enough in contrast to X11’s fearsome reputation that I’m actually wondering what I’m missing here; I don’t know much about how the guts of xorg is structured. (Edit from the future: Apparently what I missed is that during the transition from XFree86 to Xorg something like half the code got cleaned up and removed.)
- wlroots version 0.10 is about 56,000 lines of code
- The sway Wayland compositor, version 1.3, is about 40,000 lines of code, including utility code like swaybar. Between them wlroots and sway do pretty much everything that xorg plus a window manager do.
- i3 version 4.18.1, the X11 window manager that directly inspired sway, is about 30,000 lines of code, including utilities.
- The awesome version 4.3, another quite lightweight X11 window manager similar in spirit to i3, is about 60,000 lines of code without utilities.
So by this (bad) measure, writing a Wayland compositor is more work
than an X11 window manager, but not by very much if you use wlroots.
Future work
To be done by any interested party. I’ve got other things to do this year.
- Bug in Swot: draw ordering for selected windows still doesn’t work ’cause I couldn’t be arsed to figure it out.
- Clean up the Swot code, restructure things in ways to make it more harmonious with Rust – more references, maybe a few wrappers to make wl_listener’s more convenient, etc.
- Try to use wayland-sys and maybe an updated/custom wlroots-sys instead of generating the bindings yourself.
- Make a version using smithay instead of wlroots, compare and contrast
Appendix: discussion
Good comment by levansfg
on Reddit:
Hey there! As one of smithay’s main dev, I cannot not react to this. :)
Overall I’d say I mostly agree with your description of the state or Rust + Wayland, but I’d like to add a few details, for whoever may be interested:
Making a basic Wayland compositor involves startlingly little actual drawing. Wayland is mostly a pile of protocols, with each protocol being an API defining functions, events and resources. That’s the compositor’s real job: it’s there to handle events such as key presses, windows resizing, new monitors being plugged in, and to manage resources such as key maps, cursors and memory buffers representing chunks of screen real-estate.
While this is kind of true, I think you get this vision because wlroots does a huge part of the heavy-lifting for you. A very significant part of the job of a Wayland compositor is to interface with the OS. This means DRM, GBM, udev, logind, libinput (on Linux at least), and this is no small feat.
If you want a general idea of how much work this represent, Smithay’s codebase is roughtly 1/3 code for managing the Wayland clients, 1/3 code for managing the graphics stack (DRM/GBM/OpenGL), and 1/3 code for managing the rest of the system interfaces (udev, logind, libinput).
The actual “using OpenGL to draw” code though is indeed a really small part of the whole thing.
So, that’s where the real impedance mismatch is between C and Rust lies, at least as far as this experience with Wayland.
I agree with you that this question about whether and how things move is a significant part of the friction, but with my years working on wayland-rs I’d also add an other aspect (which I suspect is mostly hidden deep in the guts of wlroots so you probably didn’t need to face it): pointer lifetimes.
libwayland’s API gives you access to lots of pointer with a dynamically defined lifetimes, that are sometimes not even controlled by the app itself but by event coming from the Wayland socket. So you get some situations like “once you have received that event (via a callback), this other pointer is no longer valid”. This kind of things require a lot of runtime book-keeping to fit into a Rust API.
These two friction points were mainly the reason I started Smithay in the first place: Trying to reduce the impedance mismatch to a minimum by relegating it to the lowest-level possible places: actual FFI bindings to libwayland (and other system libraries), and then write the whole “50.000 lines of code you’ll write anyway” directly in Rust. As a result, Smithay’s API ends up being (I believe) quite different from the one of wlroots. And hopefully much more Rust-friendly. :)
Still, making a good comparison is difficult, given wlroots is a much larger and mature project compared to Smithay, which as of today remains a few-persons show.
Finally, to add to your measures comparing lines of code, according to tokei:
- The whole set of wayland-rs crates is 10k lines of code
  - This includes both binding code to system libwayland as well as a pure Rust implementation of the Wayland protocol (you can choose which on you want using a cargo feature).
  - I’d say roughtly 1/4 is binding code, 1/2 is protocol implementation, and 1/4 is shared logic between them.
- Smithay is 15k lines of code
  - This does not count the various FFI crates it uses, which each expose a Rust API on top of a system library
  - Keep in mind that the main reason Smithay is much smaller than wlroots is likely because it is much less feature-complete.
- anvil is 2.6k lines of code
  - anvil is the standard compositor of Smithay, much like rootston for wlroots.
I found this recently too and it’s very nice: The Real Story Behind Wayland and X - Daniel Stone (linux.conf.au 2013)