NervesNotes
Nerves is a framework for making embedded systems using Elixir. Here are my notes on how to use Nerves and what it can do. This is not quite a tutorial and certainly not a reference, more a rambly brain-dump. It started out being just my personal notes on handy features and shortcuts, and got progressively more structured and complete the further I went. Don’t assume this is super-complete; my interests for Nerves are sometimes for IoT-y things (sensors, home automation etc) and mostly for robotics (autonomous drones, drone swarms, occasionally constrained-environment things like forklifts or farm robots), so those are the things I will pay the most attention to. For notes on how to set up Nerves to run on a VM and general impressions of it, check out NervesLocalSetup.
Up to date as of May 2024, for Nerves 1.10.
Resources
- https://tips.nerves-project.org/ – short handy tips; apparently unmaintained now, but still worth looking through.
- https://wiki.alopex.li/NervesLocalSetup – Building and running on VM
Capabilities/features
Overall, what Nerves does is build a barebones Linux system with an Elixir VM running on it. It boots directly into said VM and bypasses 99% of the stuff modern Linux distros do: there’s no systemd, no X11 or Wayland, no getty, no openrc, no package manager, or anything else. It’s designed to make an embedded system: you don’t build the system by starting with a base distro and adding programs on it and stuff, you basically just bundle the program you want with the OS and that’s it. It runs one program at boot and continues until the system turns off.
This is actually brilliant, ’cause that program is “the Erlang VM”, and IMO it is a rather better operating system abstraction than Unix is. Not gonna go into that too deeply here, but suffice to say that Rust’s stdlib has no safe way to kill a thread from the outside because no matter how you do it, there’s always going to be something horrible that can happen that will hose process state or other threads. Erlang processes generally don’t have this sort of problem even though they share a memory space. Ponder on that a bit.
It also has the result that Nerves really isn’t much of a library; it gives you a few things at runtime but mostly relatively small convenience and setup utilities. Nerves is mostly just a build system! Well, a build system, and a firmware packager, and an installer, and a bunch of other stuff that generally takes a lot of the pain out of building and deploying Linux systems on mostly-embedded hardware. It’s very nice! It just is something that does most of the work at compile-time rather than runtime.
IEx shell stuff
When you boot it up, the system will boot into an IEx shell. …On some terminal device or another; on embedded boards like a Beaglebone this will generally be on a serial port. The `nerves_system_foo` docs for your particular board `foo` will tell you which device the terminal is on, which is more than most actual board providers tend to do.
By default the system will reboot itself if the VM stops. You can change this behavior with a config option in `config/target.exs`.
If the IEx shell alone stops (such as by you running `exit()` or something) the Erlang VM will keep on chugging without it. You can type ctrl-G to get into the VM monitor, then `h` to list help. `s` starts an Erlang shell and `c` will connect to it; to start an IEx shell you have to do `s 'Elixir.IEx'`, and again connect to it with `c`. Or you can stop the VM from the Erlang monitor with `q` or with Erlang’s `q().` function which will, as said, reboot the machine by default.
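As a rough sketch of the reboot-on-exit override: erlinit has a `hang_on_exit` option that leaves the board halted when the VM stops, so whatever it printed on the way out stays on the console. This assumes the default generated project layout, where `config/target.exs` already exists.

```elixir
# config/target.exs
import Config

# Halt (instead of rebooting) when the Erlang VM exits, so crash output
# stays on the console long enough to actually read it.
config :nerves, :erlinit, hang_on_exit: true
```

You probably want this only while debugging; a deployed device that hangs instead of rebooting is rarely what you are after.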
`Toolshed` is the default Nerves package full of CLI-ish stuff, most of which is similarly named to normal Unix command line utils. It’s honestly reasonably useful as a shell without the rest of Nerves, though it is not really intended to be used as a non-interactive lib for shell scripting. The Erlang VM already includes most of the utilities you would want; instead of having shell scripts you just write code.
There IS a Unix environment under there but it’s all provided by Busybox, so it is functional but quite minimalistic.
`Toolshed.cmd/1` lets you run a string in a `sh` shell and print its result. `Nerves.Runtime.cmd/3` does the same but lets you specify more detail and captures its output to `Logger`, and `System.cmd` and `System.shell` are the lower-level primitives these things call. So you can just do `cmd "ls -alh"`, or do something like `Nerves.Runtime.cmd("ls", ["-alh"], :info)` followed by `RingLogger.next()`.
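For comparison, here is a small sketch of the two stdlib primitives on their own. It needs nothing from Nerves, so it runs in any `iex` session on a Unix-ish host; it assumes `echo` and `tr` are on the path, which I'd expect Busybox to provide on a target.

```elixir
# System.cmd/3 runs an executable directly with an argument list; no shell is involved,
# so there is no quoting or globbing to worry about.
{output, exit_status} = System.cmd("echo", ["hello", "nerves"])
IO.inspect({output, exit_status})

# System.shell/2 hands the whole string to `sh -c`, so pipes and redirects work.
{shell_output, shell_status} = System.shell("echo hello | tr a-z A-Z")
IO.inspect({shell_output, shell_status})
```

Both return the captured output and the exit status as a tuple rather than printing, which is what makes them nicer than the Toolshed helpers for non-interactive code.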
Other useful things: `save_term!` and `load_term!` are shortcuts for you to save/load Erlang values to files on the filesystem. `save_value!` does… something like that? There’s also a few small shortcuts like `weather` and `qr_encode` that reach out to specific network services that will probably not exist when you really need them to, which strikes me as somewhat bad form, though they are handy sometimes. Usually for reminding you to add `:inets` or `:ssl` to the list of applications started at boot.
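If you want the same save/load trick in non-interactive code, a stdlib-only sketch looks something like this. The `TermFile` module is a made-up name for illustration, and this is my own equivalent, not necessarily how Toolshed implements its helpers.

```elixir
defmodule TermFile do
  @moduledoc "Hypothetical helper: round-trip any Erlang term through a file."

  # Serialize the term with the external term format and write it out.
  def save!(term, path) do
    File.write!(path, :erlang.term_to_binary(term))
    term
  end

  # Read the file back and decode it. Only do this with files you wrote yourself:
  # decoding untrusted data with binary_to_term is unsafe.
  def load!(path) do
    path |> File.read!() |> :erlang.binary_to_term()
  end
end

# On a Nerves target you would pick a path under the writable data partition instead.
path = Path.join(System.tmp_dir!(), "term_file_example.etf")
TermFile.save!(%{serial: "abc123", readings: [1.5, 2.5]}, path)
loaded = TermFile.load!(path)
IO.inspect(loaded)
```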
Init and configuration
The init system used is `erlinit`, which apparently is loaded with an `erlinit.config` file that just specifies command line options. The default `erlinit.config` file is provided by the `nerves_system_*` package, and you can override bits of it in your `config/target.exs`; those settings then get merged with the platform’s default. If you put your own `erlinit.config` into the `rootfs_overlay/etc` dir it will overwrite the platform `erlinit.config`, which probably isn’t what you want.
There’s also some way to set kernel command line args, but I haven’t bothered digging into that yet. It’s probably either in a `fwup` config file or in an Elixir config script under `config/`.
Logging
The `Logger` package is included in your app by default and you can run `Toolshed.log_attach/0` and `Toolshed.log_detach/0` to make it output to the console. There’s several other log backends available:
- https://hex.pm/packages/ring_logger – The default project setup uses this to provide an in-memory ring log, so it’s lost on reboot but doesn’t wear out flash. Get new messages with `RingLogger.get/0`, view latest messages with `RingLogger.tail/0`, save them to a file with `RingLogger.save/1`. Claims to have a nice-looking TUI started with `RingLogger.viewer/0`, but that gets me an undefined function, might need to enable it in a build option somewhere.
- https://github.com/smartrent/ramoops_logger – RamOops logger, stores logs in DRAM designed to hopefully persist between reboots. Not enabled by default but sounds cool.
- Whatever other normal `Logger` backend you care to add.
`Toolshed.dmesg/0` will also show you kernel logs, not sure if there’s any integration between them and `Logger`.
Onboard storage
Nerves uses an MBR-style partition table to lay out its stuff for some reason, probably because they need to use something even on eMMC flash or such. (Or maybe it’s just what `fwup` does. Nope, `fwup` can do GPT partitions just fine; weird. The exact partition layout and format is defined by the `nerves_system_*` package, and certain platforms require certain partition table formats.) The layout for an `x86_64` system looks something like this, though it will vary for different boards and such:
- 2 MB of unpartitioned space containing stage0 bootloader in 1st megabyte and persistent per-device env variables in the 2nd megabyte – serial number, firmware version, UUID, metadata, whatever else you want to put there.
- Partition 1: 16 MB vfat partition with Grub bootloader and config
- Partition 2: 256 MB Linux squashfs filesystem with read-only root FS (“A” partition)
- Partition 3: 256 MB Linux squashfs filesystem with read-only root FS (“B” partition)
- Partition 4: 256 MB Linux ext4 filesystem mounted read-write on `/root`. `/data` is a symlink to `/root`. This is where your mutable data goes.
You can put files into `rootfs_overlay/` in your project root and they will be imported into the generated squashfs filesystem. Is there a way to put files into the `/data` dir by default? Probably.
Nerves supports A/B firmware upgrades! So you can make a firmware upgrade, flash it onto a device in an unused partition, and set the bootloader to boot to it… and then if it doesn’t boot correctly, then on the next reboot it will go back to the old firmware automatically. Very handy. `Toolshed.fw_validate/0` is apparently the userland hook for this that tells it “this firmware has booted correctly”, and there’s a couple persistent env vars that are used to keep track of it. It appears to Just Work without you having to do anything to it, but you probably can tinker with when or how the firmware is considered “valid” if you need to.
To add a mount point, such as for an external drive, edit `config/target.exs` and add something like: `config :nerves, :erlinit, mount: "/dev/sdb1:/mnt:ext4:nodev:"`. This mounts the ext4 filesystem on `/dev/sdb1` to `/mnt` with the option `nodev`. The syntax for the string can be found in https://github.com/nerves-project/erlinit/?tab=readme-ov-file#filesystem-mounting-notes
The `fwup` tool defines what the partition layout in the firmware image is actually like though, in `nerves_system_$MIX_TARGET/fwup.conf`. So if you need to modify the root partition layout instead of just adding external drives to it, you’ll have to update the `fwup.conf` as well, probably by making a custom system package. More info here: https://hexdocs.pm/nerves/advanced-configuration.html#partitions.
You can get the persistent env vars using the `Nerves.Runtime.KV` package. You get 1 megabyte of `string->string` typed storage, and you can see what’s in there with `Nerves.Runtime.KV.get_all()`. To save something there you can use `Nerves.Runtime.KV.put("foo", "bar")`. If you then open the disk image in a hex editor, you should see the string `foo=bar` somewhere after address 0x00100000.
Hardware interface
https://elixir-circuits.github.io/ provides your basic low-level interfaces – UART, GPIO, I2C and SPI. No options for CANbus? That’s ok, it’s more specialized anyway, but it might be nice someday.
Circuits isn’t included in Nerves by default, which is honestly fairly reasonable, just add it to your `mix.exs`. Its own docs are perfectly fine, just use those. Interestingly, it appears to define a GenServer for reading/writing to UART devices but not for the other device types? Maybe just ’cause polling/monitoring a serial port is such a headache otherwise? I dunno.
I gotta say, something that stands out is that Circuits has actual functions to list/enumerate the various devices that exist on the system, which is a nice fucking change from groveling through kernel logs. So you can just run `Circuits.UART.enumerate()` instead of glaring at `/dev/ttyS0-32` and trying to figure out which ones actually exist.
Network setup
Nerves currently configures its network stuff with a library called `VintageNet`, which apparently supersedes an older and more sophisticated but more error-prone lib. VintageNet has plugins for various networking interfaces (ethernet, wifi, cellular, etc) and will basically take down all networking and set it up from scratch each time it makes a change. Looks like Nerves makes some effort to give network interfaces consistent names, though I don’t see any udev settings in `/etc` so I don’t know how it does that. Oh, VintageNet just asks you to hardcode it and ignores things it doesn’t recognize. You can define a network config in `config/target.exs`; by default it gives you some sane defaults based on your platform package. It includes a basic DHCP client, and can use `udhcpc` for a more sophisticated one, as well as `udhcpd` to provide a DHCP server if you want it.
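To make the "hardcode it" part concrete, here is roughly what a network config in `config/target.exs` looks like, in the shape the VintageNet cookbook uses. The interface name and all the addresses are placeholders I made up; swap in whatever your board and network actually have.

```elixir
# config/target.exs
import Config

config :vintage_net,
  regulatory_domain: "US",
  config: [
    # Wired ethernet with a static address (placeholder values).
    {"eth0",
     %{
       type: VintageNetEthernet,
       ipv4: %{
         method: :static,
         address: "192.168.1.50",
         prefix_length: 24,
         gateway: "192.168.1.1",
         name_servers: ["192.168.1.1"]
       }
     }}
    # For plain DHCP the ipv4 map shrinks to: %{method: :dhcp}
  ]
```

Interfaces that don't appear in this list are simply left alone, which matches the "ignores things it doesn’t recognize" behavior.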
VintageNet’s cookbook is quite good, if you just want to set up something quick then that’s probably a good place to start. Basic functions are:
- `VintageNet.info` – Show detailed config info.
- `VintageNet.get` – Get configuration struct for a particular interface.
- `VintageNet.configure` – Set configuration for a particular interface.
- `VintageNet.reset_to_defaults` – Sets configuration for an interface to whatever is in `config.exs`, ignoring whatever is saved in local storage. I’ve definitely wanted this a time or two when figuring out a network config.
Nerves’s `Toolshed` also provides simple implementations of some basic net tools like `ping`, `ifconfig`, `httpget`, etc. `VintageNet.configure` didn’t let me give it egregiously incomplete or nonsensical input, which is an improvement over most Linux network config systems.
I can’t find something exactly like `netplan try`, where it will apply the network config and then wait for acknowledgement and revert to the old config if it doesn’t get it. So it IS very possible to brick a system to the point you can’t reach it over the network if you are not careful! However, there is a `persist: bool` flag for `VintageNet.configure/3` which is true by default. So you can do `VintageNet.configure("eth0", my_new_config, persist: false)` and it will apply the config without saving it to storage, so if you lock yourself out you can reboot the device and it will come back with the old config. IMO this is nowhere near as nice a method as `netplan try`, but it does work. The persisted settings appear to be saved to some kind of binary file in `/data/vintage_net/iface_name`, so if you delete that file and reboot it will also come back up with the network config specified in your `target.exs` or whatever.
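You can get most of the way to a homemade `netplan try` with very little code. This is a sketch, not something VintageNet ships: `NetTry` is a name I invented, and the `configure` argument is any one-argument function that applies a config, so the logic can be exercised on a dev machine with a stub.

```elixir
defmodule NetTry do
  @moduledoc "Hypothetical helper: apply a config, roll back unless confirmed in time."

  # Applies new_config, then waits for a :confirm message sent to this process.
  # If none arrives within timeout_ms, old_config is re-applied.
  def try_config(configure, old_config, new_config, timeout_ms) do
    configure.(new_config)

    receive do
      :confirm -> :kept
    after
      timeout_ms ->
        configure.(old_config)
        :reverted
    end
  end
end

# Stub demo: nobody confirms, so the old config comes back after 200 ms.
configure = fn config -> IO.puts("applying #{inspect(config)}") end
result = NetTry.try_config(configure, %{ipv4: %{method: :dhcp}}, %{ipv4: %{method: :static}}, 200)
IO.inspect(result)

# On a device, the wiring might look like this (untested assumption on my part):
#   old = VintageNet.get_configuration("eth0")
#   apply_cfg = fn cfg -> VintageNet.configure("eth0", cfg, persist: false) end
#   pid = spawn(fn -> NetTry.try_config(apply_cfg, old, my_new_config, 60_000) end)
#   # ...reconnect over the new config, then:
#   send(pid, :confirm)
```

Using `spawn` rather than a linked process matters here: if the SSH session that started the change dies along with the old network config, the rollback timer should keep running. Once you are happy, a final `VintageNet.configure` call without `persist: false` saves the config for real.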
Don’t see any way to set up firewall rules, interesting. I don’t even see `iptables` binaries installed. How much this matters will depend vastly upon your device and use case; all I really know about network security is that good security is like an ogre: it has layers. You can tell Nerves to install `iptables` into a custom system package, but I can’t find specific, easy docs how.
On the up side it appears there’s at least a little support for multicast addresses, which is pretty boss. Multicast is a really neat and really underused technology. I should come up with something cool to do with it. Oh, there’s also an mDNS package for letting devices advertise their presence and services via mDNS/Avahi.
So yeah. All in all, not flawless but reasonably solid. Kinda one of the places where the Elixir community is smol and it shows, but it seems like the basics are all well-thought-out.
SSH
By default, the Nerves system will have SSH running on port 22. The only user on a Nerves system is `root`, it’s not really designed for multi-user operation, which is fine. By default it will look for SSH keys in `~/.ssh/id_{rsa,ecdsa,ed25519}.pub` on the host system you build the Nerves firmware on, and put those into the target system so that you can SSH into it; you can turn this off or specify more/different SSH keys in `config/target.exs`.
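The key lookup is ordinary Elixir running at build time on your host, which makes it easy to change. A sketch along the lines of what the generated project contains (the exact contents of your `config/target.exs` may differ between Nerves versions):

```elixir
# config/target.exs
import Config

# Collect whichever of the usual public keys exist on the build host.
keys =
  [
    Path.join([System.user_home!(), ".ssh", "id_rsa.pub"]),
    Path.join([System.user_home!(), ".ssh", "id_ecdsa.pub"]),
    Path.join([System.user_home!(), ".ssh", "id_ed25519.pub"])
  ]
  |> Enum.filter(&File.exists?/1)

# Fail the build early rather than produce firmware nobody can log into.
if keys == [], do: Mix.raise("No SSH public keys found in ~/.ssh")

# The key file contents (not the paths) are baked into the firmware.
config :nerves_ssh,
  authorized_keys: Enum.map(keys, &File.read!/1)
```

To authorize a teammate or a CI machine, add the path to their public key to the list before the filter.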
sftp or sshfs works too. That’s probably the easiest way to get remote access to the target system’s filesystem besides rebuilding it from scratch or using an external drive, though I wouldn’t recommend sshfs for heavy-duty NAS use or anything. For light duty stuff it’s always treated me well though.
If you want to turn ssh off or do other config stuff to it, the entry point to do that lies in `config/target.exs` and includes links to deeper docs.
There’s also shortcuts for burning new firmware to a device over SSH. If you set the `MIX_TARGET` env var and run `mix deps.get`, you should then have a `mix firmware.gen.script` shortcut that will generate a script called `upload.sh`. Run that, modifying the host and SSH port and such if necessary, and it should take a `.fw` file from `fwup` and upload it to the target device, save it to the unused A/B partition, and reboot the device. `nerves_ssh` has more info. `fwup` appears to do most of the heavy lifting here.
`fwup` is pretty damn cool.
IPv6
Support for IPv6 appears to be sadly rather lacking. It will give you an address with stateless autoconfig, so that’s nice. But it doesn’t really seem to let you do anything else with it. I should help improve that; IPv6 support really isn’t that hard.
Vector crunching with Nx
Nx is a fairly powerful and complete numerical computing library for Erlang/Elixir, filling a similar niche to `glam` for Rust or Eigen for C++. Robots generally need to crunch a lot of numbers, so how to make Nx work nicely with Nerves is something I want to look at.
Processing big chonkin’ matrices of numbers is something of a weak point of the Erlang VM in general, at least compared to its ability to put together and take apart highly-nested, pointer-heavy data structures. This is because the concern with large chonks of numbers is usually “how do I pack them as densely as possible and use memory as efficiently as possible”, which tends to lead to dense, mutable, array-like data structures. A lot of Erlang’s excellence comes by restricting aliasing of data, which usually means restricting mutation, which makes dense arrays pretty inefficient. So I’m a little curious how Nx handles math well.
After spending a couple days gnawing on it though… by default, uh, well, it actually doesn’t. This was a bit of a rude surprise, I won’t lie. Basically, Nx has a variety of pluggable backends that allow you to specify different representations for its big chonkin’ matrices. The default backend stores them as an Elixir `binary` object, there’s one for Google’s GPU vector math lib `XLA`, another based on `PyTorch`, etc. This is really cool because it means you can switch out the backends for your Nx libs to support the hardware you have available… CUDA, ROCm, specific NPU things, plain ol’ CPU, etc.
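Switching backends is a config-level change rather than a code change. A sketch of the usual way to do it, assuming you have added the `exla` package to your deps (I have not tried building EXLA for a Nerves target, so treat that part as unverified):

```elixir
# config/config.exs
import Config

# Every tensor created without an explicit backend now lives in EXLA
# instead of the default pure-Elixir binary backend.
config :nx, default_backend: EXLA.Backend

# The same choice can be made at runtime instead:
#   Nx.default_backend(EXLA.Backend)         # current process only
#   Nx.global_default_backend(EXLA.Backend)  # whole VM
```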
The thing is that the default backend is pretty basic and not very smart, so it does lots and lots of cloning of binaries when doing math. For simple use cases, in my tests it was nice and dense in terms of memory usage, but significantly slower than operating on dumb linked lists of numbers. Most people seem to recommend using the `EXLA` backend when you want actual performance. But `EXLA` is a big heavy complicated dependency that is really overkill for what I want or need. I have exactly zero robotics use cases that actually benefit from 100 MB of C++ code behind the scenes… well, yes I do, but they’re better served by special-case libraries like PCL and OpenCV.
So really, if you want a machine learning math lib, Nx is probably exactly what you want. But if you want a vector math lib for doing physics and geometry and some medium-scale linear algebra, then not so much. After some digging, there’s simpler and better alternatives for vector stuff. Matrex needs a bit of love but is built on BLAS and looks very good all in all, as does graphmath, and there’s honestly nothing particularly wrong with numerl for small systems. For numpy/pandas/R style dataframes, maybe look at Explorer. If you want to try to read my whole rambly problem-solving process investigating Nx, check out NxNotes, though you may find more questions than answers.
Sorry. Not the answer I wanted to give! But use the right tool for the right jobs.
Networked BEAM clusters / distributed Erlang
https://learnyousomeerlang.com/distribunomicon is probably the best place to start. https://hexdocs.pm/nerves_pack/readme.html#erlang-distribution also has a bit of practical info. I’m not an expert on this stuff, so this is just my observations and gut feels.
My (limited) experience with networked BEAM clusters has been a bit fraught from time to time. BEAM’s inter-node messaging (usually called Distributed Erlang) uses TCP/IP, and the connection is generally assumed to be reliable. This is fine in a datacenter or a building with a dedicated wifi network, but with robots and other IoT/distributed devices, it can be pretty common that networking is done by fairly low-power wifi, other radio systems like Bluetooth or Zigbee, or even satellite. These can be subject to extreme lag, transient hiccups, vastly varying bandwidth, and other nasty behaviors very different than “on and operating at 100% effectiveness” or “off”. Distributed Erlang also assumes that the network is secure, so make sure this is true. You probably want to use WireGuard or another VPN if you are using Erlang messaging to talk over a network that might have untrusted devices on it.
The Erlang network is a mesh and managed by the `epmd` daemon, which talks on port 4369. Nerves appears to do no firewalling by default, but it also runs nothing besides SSH and epmd afaict, so you’re probably ok to start without a firewall as long as you put it on your to-do list for a real deployment. By default you build an Erlang mesh by just telling each node the addresses of some/all other nodes, which really doesn’t scale up great. There’s other ways of doing it though: if you’re running only one VM per machine you can tell them to not bother with `epmd` and talk to each other directly on fixed ports, which makes firewalling much easier. Or you can use another way to announce systems such as libcluster, which has a number of methods for systems to find each other. The nodes still use reliable TCP to talk to each other once found though.
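As a taste of the libcluster route, here is a minimal config sketch using its multicast gossip strategy. The topology name `robots` and the `MyApp` module names are made up, and I have not run this on Nerves hardware; it also assumes the nodes already have distribution started with matching cookies.

```elixir
# config/target.exs
import Config

config :libcluster,
  topologies: [
    # `robots` is an arbitrary name for this topology.
    robots: [
      # Gossip finds peers via UDP multicast on the local network,
      # so nobody has to hardcode a list of node addresses.
      strategy: Cluster.Strategy.Gossip
    ]
  ]

# In the (hypothetical) application supervisor's child list:
#   topologies = Application.get_env(:libcluster, :topologies, [])
#   {Cluster.Supervisor, [topologies, [name: MyApp.ClusterSupervisor]]}
```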
All in all, I’d probably only use Erlang messaging between separate machines in Nerves if a) all the systems were physically built into the same device and connected via copper, or b) you had a very clear understanding about its limitations and were operating on a controlled network anyway, such as a device-specific wifi network. For example I’d use it on a drone with a separate flight control computer and a safety monitor computer wired to each other over ethernet, or if I had a warehouse full of robot forklifts that all talk over a private wifi network to a central dispatching system. (Even there I’d worry about “what do the robots do if wifi goes down or gets spotty”, but that’s a relatively simple failure mode in a relatively controlled environment.)
But if you’re using genservers and such to abstract message sends behind function calls and services, hoooooopefully you can use an unreliable messaging layer and use these things to abstract it. So you don’t need to worry about sending messages directly between Erlang nodes anyway. Right? Right.
My takeaway from this is that if there’s two machines talking to each other over an unreliable transport, as far as I can tell there’s nothing wrong with having them each run a process that talks to the other via some protocol designed for unreliable transport, instead of Distributed Erlang. It can be some kind of HTTP RPC, UDP broadcasts, heckin’ DDS or whatever else you want. Just use the right tool for the job.
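To show how little code the "just use UDP" option takes, here is a sketch that sends an Erlang term as a single datagram and reads it back with a timeout. `LossyLink` is a name I made up; both ends run in one VM over loopback here purely so it can be tried without two machines.

```elixir
defmodule LossyLink do
  @moduledoc "Hypothetical helper: Erlang terms over UDP, with no delivery guarantee."

  # Encode the term and fire it off as one datagram.
  def send_term(socket, host, port, term) do
    :gen_udp.send(socket, host, port, :erlang.term_to_binary(term))
  end

  # Wait up to timeout_ms for one datagram. [:safe] refuses to create new atoms,
  # which matters when the peer is not fully trusted.
  def recv_term(socket, timeout_ms) do
    case :gen_udp.recv(socket, 0, timeout_ms) do
      {:ok, {_addr, _port, packet}} -> {:ok, :erlang.binary_to_term(packet, [:safe])}
      {:error, reason} -> {:error, reason}
    end
  end
end

# Port 0 asks the OS for any free port.
{:ok, rx} = :gen_udp.open(0, [:binary, active: false])
{:ok, rx_port} = :inet.port(rx)
{:ok, tx} = :gen_udp.open(0, [:binary, active: false])

:ok = LossyLink.send_term(tx, {127, 0, 0, 1}, rx_port, {:telemetry, %{battery: 87}})
first = LossyLink.recv_term(rx, 1_000)
# Nothing else was sent, so the second read times out instead of blocking forever.
second = LossyLink.recv_term(rx, 100)
IO.inspect({first, second})
```

The important design point is that the receiver treats "nothing arrived" as a normal result it has to handle, which is exactly the assumption Distributed Erlang does not make.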
Language integration
The docs are here: https://hexdocs.pm/nerves/compiling-non-beam-code.html. They provide some handy rules of thumb:
- Build large and complicated C and C++ projects using Buildroot by creating a Custom system
- Build small C and C++ projects using elixir_make
- Look for libraries like zigler for specific languages
- When hope is lost, compile the programs outside of Nerves and include the binaries in a priv directory. Static linking is recommended.
…Before you even start, experience has shown that searching the Erlang/OTP docs three times and skimming the Erlang source leads to all kinds of amazing discoveries that may not require you to port any code at all.
I can confirm that the Erlang/OTP system has many fascinating things in it that you might not expect, but sometimes you gotta roll your own. My Dream Robot Architecture is basically to use Rust for number-crunching and hardware interface, and Erlang/Elixir for the control plane, so I want to figure out how to do exactly that with Nerves.
The Erlang VM has several options for FFI, with a sliding scale between safety and performance:
- Safest and slowest – “C nodes”, where you write an entirely separate program that talks to the Erlang VM over the Distributed Erlang protocol.
- Medium-slow – “Ports”, where you write an entirely separate program that is started by the VM and talks to a process via a bidirectional byte stream, basically the same as a Unix pipe, Python’s `subprocess.Popen`, etc.
- Medium-fast – “Port drivers”, which work like Ports but are loaded as a dynamically linked library into the VM’s address space and run inside the VM. People on the Elixir discord tend to say “Port Drivers IMHO should not be used anymore as there is no point” – they have all the weaknesses of both Ports and NIF’s with only some of the strengths.
- Fastest and most hazardous – “NIF’s”, Native Implemented Functions. These are native-code programs compiled as DLL’s and loaded by the VM, which offer functions that can be directly called by Erlang/Elixir code. If you’ve used Python’s or LuaJIT’s FFI before this will probably be a familiar approach. This also means that if a function takes too long it can block the VM’s scheduler and Cause Issues, if a function crashes or corrupts memory it can crash the BEAM VM or cause it to silently give incorrect results, etc.
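Of the options in that list, Ports are the easiest to try from a bare shell. A minimal sketch using `cat` as the external program (it just echoes its stdin back), assuming a Unix-ish host where `cat` is on the path:

```elixir
# Start an external OS process; the VM owns its stdin/stdout.
cat_path = System.find_executable("cat")
port = Port.open({:spawn_executable, cat_path}, [:binary])

# Write bytes to the program's stdin.
Port.command(port, "hello from the BEAM\n")

# Whatever the program writes to stdout arrives as a message to the owning process.
reply =
  receive do
    {^port, {:data, data}} -> data
  after
    1_000 -> :timeout
  end

IO.inspect(reply)

# Closing the port closes the pipe, so `cat` sees EOF and exits.
Port.close(port)
```

If the external program crashes, the VM just gets a message about it, which is the whole safety argument for Ports over NIF’s.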
We’re gonna focus on two approaches: Making native code NIF’s, and making Nerves interoperate with non-Elixir BEAM languages. Nerves is mostly a build system, and the really hard nasty irritating part of FFI is always the build system, so this is really mostly about the build system anyway. If we can get this off the ground with the very basic stuff, you should be able to read other tutorials and get on to the more advanced stuff yourself.
We’re also only going to worry about building this stuff on Linux, for reasons that will become rapidly apparent. I’m using Debian Bookworm and/or Ubuntu 22.
Erlang
There’s some good example code in the `nerves_examples` repo. Nerves’s build stuff is all built using `mix`, but you can always just write your entire project as a library built with `rebar3` if you wish, and use it as a dependency in the `mix` project.
At its simplest, you can just put Erlang code in the `src/` dir of your root project and everything will pretty much just work. `mix` will find it and build it as normal modules. You can also swap out IEx to just use the Erlang shell if you’re a little masochistic by editing `rel/vm.args.eex`, the `nerves_examples` repo shows you how. (Is there a better command line REPL for Erlang? I should find out.) Or since Erlang and Elixir have almost the exact same data model, you can just call your Erlang functions from Elixir like normal.
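Concretely, "like normal" means an Erlang module is just an atom on the Elixir side. A few calls into modules that ship with OTP, plus a commented example for a hypothetical `src/my_math.erl` of your own:

```elixir
# Erlang's lists:reverse/1, called from Elixir.
reversed = :lists.reverse([1, 2, 3])
IO.inspect(reversed)

# Erlang "strings" are charlists, which Elixir writes with the ~c sigil.
shouted = :string.uppercase(~c"nerves")
IO.inspect(shouted)

# Maps, tuples, binaries, and atoms pass through unchanged.
size = :erlang.map_size(%{a: 1, b: 2})
IO.inspect(size)

# A function double/1 exported from a hypothetical src/my_math.erl would be:
#   :my_math.double(21)
```

The main thing to watch for is the string type: Elixir strings are binaries, so Erlang functions that expect charlists need a `to_charlist/1` first.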
Going a step beyond, you can use umbrella projects to just have an Erlang subproject in your Nerves project. However, `mix`’s umbrella projects are a little squirrelly and hard to use sometimes, so the Nerves people seem to have created the idea of a poncho project instead. I admire their ridiculous terminology! This is more or less exactly how you’d want to make a dependency as its own subproject; you literally move your “root” project into a subdirectory, make another subdirectory next to it, and add a dependency to it in your `mix.exs`, like so:
cd my_cool_nerves_project
mkdir my_cool_nerves_project
mv * my_cool_nerves_project
mix new my_subproject
vim my_cool_nerves_project/mix.exs
# Add {:my_subproject, path: "../my_subproject"} to your deps
cd my_cool_nerves_project
mix firmware
It’s literally just two entirely separate projects under the same directory. They can be Elixir, Erlang, or anything else that `mix` recognizes. There’s probably a reason `mix` makes umbrella projects complicated, but idk what it is. Probably because it tries to merge application configs automatically, and that just doesn’t seem like something you ever want to leave up to a machine. There’s always something that’s gonna step on something else by accident.
Lisp-Flavored Erlang
Does anyone actually use LFE? I should find an excuse to play with it someday. IMO Elixir is a perfectly fine Lisp and I don’t have much personal need for something different, but using LFE in Nerves looks pretty simple. That example works pretty much right out of the box, so I’ll leave it up to you to figure out the details. It’s nice.
Gleam
The newest member of the Erlang family! I know nothing about it besides it’s strongly typed and its creators are pretty cool.
We’re using `asdf` for this anyway, so installing Gleam is pretty simple:
asdf plugin add gleam
asdf install gleam 1.1.0
asdf global gleam 1.1.0
We will create a Gleam project as a poncho project for simplicity’s sake. It looks like Gleam has its own build system and build file format (sigh), and `mix` does not yet know how to use the `gleam` build system (sigh), so they basically can’t interoperate automatically yet. Someone wrote `mix_gleam`, a mix plugin that can be used to build Gleam code, but it has been described as working “in a narrow range of cases to make a best effort attempt to compile Gleam for Mix”. So I’m not gonna dig too deep into it.
Huh, Gleam’s compiler and build tool are actually written in Rust. I assumed it’d just be an Erlang program and libs, the same way that Elixir and LFE essentially are. Not a terrible choice, just unexpected.
Gleam does compile to Erlang and output the `.erl` files in its build directory, so there has to be a way of telling `mix` to just build and load them automatically, but I don’t know enough about it to do that. It loooooooks like the `gleam` build tool knows how to build `rebar3` projects, but not yet the other way around? Supposedly if you publish a Gleam project to hex.pm, it will just magically become an Erlang project in a format that `mix` and `rebar3` and stuff will be able to understand. But I don’t want to do that, I want to build locally. I’ll just add “bend rebar3 and mix to my evil whims” to my bucket list, I guess.
Siiiiiiiigh, using `rebar3` is not really my favorite pastime, but wouldn’t it be nice if the whole Erlang ecosystem used the same damn build system? Fucking hell this fragmentation is dumb.
Rust
`rustler` provides some nice tools for creating Rust NIF’s in Elixir projects. Looks pretty straightforward. After adding `rustler` to your `mix.exs` dependencies and doing `mix deps.get`, it gives you a very simple Rust crate under `native/` that has a little bit of template code, like compiling to a cdylib by default and calling `rustler::init!()` to export the functions you’ve defined. I have to add a stub function and a little config in my `lib/hello_nerves.ex` module per the `rustler` docs, and fix my names to be consistent between the two. Running `mix firmware.burn` now invokes Cargo and builds my Rust code. Nice. Run my VM aaaaaaaaaand… it crashes like a boss.
Wonderful. There’s some Erlang error in the VM output, but it goes by too fast for me to see. Well, now we have a reason to add `config :nerves, :erlinit, hang_on_exit: true` to our `config/target.exs`. And when the VM crashes the error message I get is:
Failed to load NIF library: Error loading shared library ld-linux-x86-64.so.2: no such file or directory
Well this makes sense, `rustler` builds stuff as a shared library and it appears Nerves uses only musl which doesn’t support dynamic linking, so… uh… how do NIF’s in Nerves ever work???? It has to use some C in there somewhere or another, I’ve seen it being built! I can try building my rust crate alone with `cargo build --target x86_64-unknown-linux-musl` in its directory, but cargo refuses to try to produce a dynamic lib when using the musl target, which makes sense.
Ok let’s try again. Let’s try building Rustler stuff in a normal Elixir project totally without Nerves involved, and make sure we know how to use it.
mix new heckin_rustler
cd heckin_rustler
vim mix.exs
# Add {:rustler, "~> 0.32.1"} to deps
mix deps.get
mix rustler.new
# It asks me for a couple names, then generates a new Rust crate in native/heckinrust.
# There's instructions for how to write the Elixir glue code in native/heckinrust/README.md
# Just copy-paste the example for now into lib/heckin_rustler.ex and build your stuff:
mix compile
# All your Rust code should now be built, so you should be able to do:
iex -S mix
...
iex(1)> HeckinRust.add(3,4)
# Returns 7
Great, it works! Rustler isn’t garbage that can’t even do the trivial things; always good to confirm. Our Rust code is now built into a shared object using glibc in `_build/dev/lib/heckin_rustler/native/heckinrust/release/libheckinrust.so`.
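For reference, the Elixir glue copied from the generated README is tiny. This sketch assumes the answers given to `mix rustler.new` were the module name `HeckinRust` and the crate name `heckinrust`, which is my guess from the paths and the `HeckinRust.add(3,4)` call above; it only compiles inside a project that has `rustler` in its deps.

```elixir
# lib/heckin_rustler.ex
defmodule HeckinRust do
  # Tells Rustler which OTP app owns the crate, and which crate under native/
  # to build with Cargo and load as a NIF when this module is loaded.
  use Rustler, otp_app: :heckin_rustler, crate: "heckinrust"

  # Stub that is replaced by the Rust implementation once the NIF library loads.
  # If loading failed, calling it raises :nif_not_loaded instead of misbehaving quietly.
  def add(_a, _b), do: :erlang.nif_error(:nif_not_loaded)
end
```

The "fix my names to be consistent" step is mostly about this file: the module name here has to match the one named in the `rustler::init!` call on the Rust side, and `crate:` has to match the directory under `native/`.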
Success! How do we make this cooperate with Nerves? Let’s start from an entirely fresh Nerves project.
You DID do `mix archive.install hex nerves_bootstrap`, right? You didn’t just nuke `~/.asdf/` to start from scratch and not faithfully follow the instructions in NervesLocalSetup to get a working Nerves system again? Good. Just asking. No reason why.
…Let’s just make our new Nerves project and add Rustler to it:
mix nerves.new hello_rustler
set -x MIX_TARGET x86_64
# or for bash rather than fish:
# export MIX_TARGET=x86_64
mix deps.get
mix firmware.burn --device hello_rustler.img
mix nerves.gen.qemu_script
./run_qemu.sh hello_rustler.img
Everything starts and runs, and we can SSH into our VM, run HelloRustler.hello, and get :world back.
Great, everything works. Now let’s add {:rustler, "~> 0.32.1"} to our mix.exs and do mix rustler.new. That creates our Rust crate boilerplate in native/, we add the Elixir side of the boilerplate to lib/hello_rustler.ex, and do our mix firmware.burn command. It builds our Rust code, builds our firmware image… and does the same thing where it can’t load our NIF module because it can’t load the .so file. Because it uses musl instead of glibc and musl doesn’t do .so files. …right? The .so file DOES exist in our built firmware image, I can confirm that at least!
Okay, it’s time to do some surgery. Question one, is this assumption correct? Can musl use shared objects somehow, or does the Nerves system not use musl for everything? I don’t know TOO much about musl in practice and glibc is pretty cursed, so I’m trying to make an educated guess here. Let’s unzip the .fw file this project generates, mount the squashfs that contains the root filesystem, and poke around a bit. I find a number of .so files that look legit, and– well, there’s one program I know HAS to be set up correctly and actually used, and that’s Erlang:
file srv/erlang/erts-14.2.5/bin/erlexec
# Prints:
# srv/erlang/erts-14.2.5/bin/erlexec: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-x86_64.so.1, stripped
mmmmHM, so my educated guess appears to be wrong. musl does have a dynamic linker of SOME kind or another, interpreter /lib/ld-musl-x86_64.so.1 is pretty unambiguous. Is that actually provided by musl or is it some weird hack? Am I out of date? I am out of date. It doesn’t enable it by default, but you can now totally use musl’s libc as a dynamic library, and it includes a full-ish dynamic linker. Neat! That explains how Nerves uses stuff with NIF’s, I’m not fundamentally insane. Whew!
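The poking-around above can be roughly scripted. This is only a sketch under assumptions: fw_find_interpreter is a made-up helper name, it unpacks the squashfs with unsquashfs instead of mounting it (so no root needed), and it assumes the root filesystem sits inside the .fw archive as a file called rootfs.img.

```shell
# Sketch only: fw_find_interpreter is a hypothetical helper, not a Nerves tool.
# Assumes `unzip`, `unsquashfs` (squashfs-tools) and `file` are installed, and
# that the root filesystem is stored in the archive as a file named rootfs.img.
fw_find_interpreter() {
    fw="$1"
    binary_name="${2:-erlexec}"
    [ -f "$fw" ] || { echo "no such firmware file: $fw" >&2; return 1; }
    for tool in unzip unsquashfs file; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "missing tool: $tool" >&2
            return 1
        }
    done
    work="$(mktemp -d)" || return 1
    # A .fw file is a zip archive, so plain unzip can open it.
    unzip -q "$fw" -d "$work/fw" || return 1
    rootfs="$(find "$work/fw" -name rootfs.img | head -n 1)"
    [ -n "$rootfs" ] || { echo "no rootfs.img inside $fw" >&2; return 1; }
    # Unpack rather than mount; ignore complaints about device nodes.
    unsquashfs -d "$work/rootfs" "$rootfs" >/dev/null 2>&1 || true
    [ -d "$work/rootfs" ] || { echo "could not unpack $rootfs" >&2; return 1; }
    # `file` prints the requested interpreter for dynamically linked programs.
    find "$work/rootfs" -type f -name "$binary_name" -exec file {} \;
    echo "unpacked tree left in $work/rootfs" >&2
}
```

Calling fw_find_interpreter with the project's .fw file should print the same kind of `file` line as above for erlexec; pass a second argument to look at some other binary.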
Well then cross-compiling our Rust should be pretty straightforward, if we just build for the musl target:
rustup target add x86_64-unknown-linux-musl
cd ~/tmp/hello_rustler/native/hirust
cargo build --target x86_64-unknown-linux-musl
# error: cannot produce cdylib for `hirust v0.1.0 (~/tmp/hello_rustler/native/hirust)` as the target `x86_64-unknown-linux-musl` does not support these crate types
So you CAN build code using musl libc as a dynamic library, but rustc still doesn’t want to by default. Searching for “rust musl cdylib” gets me this issue, which can be summed up as “we don’t want to have our musl builds use dylib’s by default because people tend to use musl to avoid dylibs”, which is fairly reasonable. This then asks for the ability to build dylib/cdylib’s using musl, which links to this unclosed issue but also says you can set some RUSTFLAGS env vars to make it build dynamic libs with musl. Trying that out in our native/hirust crate appears to work! …well, it builds without errors. “Work” is a higher bar than that.
Rustler already creates a native/hirust/.cargo/config.toml file for us with some macOS config stuff in it, so we should be able to just edit that and add the following so we don’t have to keep messing around with env vars:
[target.x86_64-unknown-linux-musl]
rustflags = [
"-C", "target-feature=-crt-static"
]
If you’re not building for an x86_64 target, change that part accordingly. Then export the env var CARGO_BUILD_TARGET=x86_64-unknown-linux-musl to tell cargo to build for the target you desire by default.
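For other targets, the config stanza and its env-var equivalent are mechanical to derive from the target triple. A small sketch of that, with musl_dynamic_stanza and musl_rustflags_env_name being hypothetical helper names rather than anything cargo or Rustler provides:

```shell
# Sketch only: both function names are made up for illustration.

# Print the .cargo/config.toml stanza that turns off static linking of the
# C runtime for the given musl target triple.
musl_dynamic_stanza() {
    triple="$1"
    case "$triple" in
        *-musl*) ;;
        *) echo "expected a musl target triple, got: $triple" >&2; return 1 ;;
    esac
    printf '[target.%s]\nrustflags = ["-C", "target-feature=-crt-static"]\n' "$triple"
}

# Print the name of cargo's per-target rustflags environment variable,
# CARGO_TARGET_<TRIPLE>_RUSTFLAGS: the triple is upper-cased and its dashes
# become underscores.
musl_rustflags_env_name() {
    printf 'CARGO_TARGET_%s_RUSTFLAGS\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}
```

So `musl_dynamic_stanza aarch64-unknown-linux-musl >> native/hirust/.cargo/config.toml` would append the stanza for a 64-bit ARM target, and exporting the variable named by musl_rustflags_env_name with the value `-C target-feature=-crt-static` should do the same job without touching the file.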
Cargo isn’t terribly good at detecting changes to command line opts in config.toml, so just nuke the _build/ dir to force everything to recompile from scratch. Mix compiles our Rust crate but complains that _build/x86_64_dev/lib/hello_rustler/native/hirust/release/libhirust.so doesn’t exist. That’s ’cause it’s in a different subdir, _build/x86_64_dev/lib/hello_rustler/native/hirust/x86_64-unknown-linux-musl/release/libhirust.so.
So how do we tell Rustler to search in the correct dir?
Cross-compilation in Rustler seems to be in a rather unfinished
state, alas.
I’m not going to fix that right this instant. So I’m just gonna heckin’ go into _build/x86_64_dev/lib/hello_rustler/native/hirust and do ln -s x86_64-unknown-linux-musl/release/ to make it look in the right directory. Aaaaaand that doesn’t work because cargo actually needs those directories to be different and so it blocks forever on a locked file.
FINE. I’ll FUCKING COPY THE .SO to the place rustler expects it to be. And it still doesn’t work! On boot our VM still gives us:
Failed to load NIF library: Error loading shared library ld-linux-x86-64.so.2: no such file or directory
Our generated .so file isn’t trying to use musl’s dynamic loader at all! What the fuck?! This says I should be able to fix that by setting -C linker=musl-gcc in a RUSTFLAGS env var (or in my .cargo/config.toml). But then when I do THAT I get a big chonkin compile error ending with /usr/bin/ld: cannot find libgcc_s.so.1: No such file or directory. Which leads to this issue about rustc improperly adding libgcc_s to the link args even when it shouldn’t, which might have been fixed in Alpine since 2022 but doesn’t seem to have made it upstream yet???
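The copy-and-check dance above can be sketched as a couple of shell functions, so you at least find out on the host whether the .so still names glibc’s loader instead of waiting for the VM to boot and fall over. The helper names are made up, and the grep is a crude string search; `readelf -d libhirust.so | grep NEEDED` is the more precise check if binutils is around.

```shell
# Sketch only: nif_wants_glibc_loader and stage_nif are hypothetical helpers.

# Succeeds (exit 0) if the file mentions glibc's x86_64 loader by name.
# This is a plain string search over the binary, not a real ELF parse.
nif_wants_glibc_loader() {
    grep -q -F 'ld-linux-x86-64.so.2' "$1"
}

# Copy <build_dir>/<triple>/release/<lib> to <build_dir>/release/<lib>,
# where <build_dir> is something like
# _build/x86_64_dev/lib/hello_rustler/native/hirust.
# Returns 1 if the copy fails, 2 if the copied library still wants glibc.
stage_nif() {
    build_dir="$1"
    triple="$2"
    lib="$3"
    src="$build_dir/$triple/release/$lib"
    dst="$build_dir/release/$lib"
    [ -f "$src" ] || { echo "missing $src" >&2; return 1; }
    mkdir -p "$build_dir/release" || return 1
    cp "$src" "$dst" || return 1
    if nif_wants_glibc_loader "$dst"; then
        echo "warning: $lib still references ld-linux-x86-64.so.2" >&2
        return 2
    fi
}
```

Usage would be something like `stage_nif _build/x86_64_dev/lib/hello_rustler/native/hirust x86_64-unknown-linux-musl libhirust.so`; an exit status of 2 means the copy happened but the library is still going to ask for a loader the Nerves image doesn’t have.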
Ok, I think going seven tickets deep into build issues is my limit.
So yeah. tldr, rustler needs to learn how to cross-compile, preferably by reading Nerves’s env vars. But I can’t really blame them for not wanting to put the work in because libc is a fuck, dynamic linking is a fuck and nothing knows how to dynamic link with musl yet, gcc is a fuck, cross-compiling is a fuck, and Nerves and Rustler are just doing the best they can. Maybe I’ll come back to it when creating my own nerves_system package someday.
Never, ever do this
………ahahahahah oh my fucking gods. So, if libgcc_s.so.1 isn’t actually being used but is just incorrectly put into the list of libs to link our Rust crate with, can we just… stub it out? Will the linker accept an empty file if it never needs to try to actually use it? I wanna see what happens, mostly out of morbid curiosity. Let’s look at the linker error message to see where it’s looking for it, so we can try putting something there. ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-musl/lib/ looks like a likely candidate, so let’s just do:
touch ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-musl/lib/libgcc_s.so.1
Build our stuff, put the .so in the right place, build it some more, ignore the part where the poor Erlang VM complains
[warning] The on_load function for module Elixir.HiRust returned: {:error, {:load_failed, ~c"Failed to load NIF library ~/tmp/hello_rustler/_build/x86_64_dev/lib/hello_rustler/priv/native/libhirust: '/lib/x86_64-linux-gnu/libc.so: invalid ELF header'"}}
during our build process for some reason, burn the image, boot the VM, and… it boots. Can I run my NIF? My poor, blighted, tragic NIF that desperately wants to just add a + b together and has no idea what all the fuss is about?
iex(1)> HiRust.add(3,4)
7
Holy sweet cats in the sky it worked. …well, “worked”. This is right up there with the time I edited an executable with vim to fix a dynamic lib path, on a production system we’d already shipped to a customer. That “worked” too.
So yeah. This is not a fix. This isn’t even a hack. I have no idea what this might blow up. I would expect it to harm nothing, but I sure as hell don’t want to find out I’m wrong about that.
(And then I paused qemu, forgetting that Nerves doesn’t like that because it makes the clock skip, and somehow unpausing it crashed my laptop’s GPU driver and I nearly lost this entire writeup. Probably the universe trying to punish me for my hubris. It’s all right, I deserved it, and I was lucky enough to save this file anyway. Suck it, universe!)
(Also, it just occurred to me that you can’t necessarily guarantee with this method that the version of musl that Rust links against is even the version of musl that Nerves actually uses. So the real answer is probably to do whatever is necessary to build stuff in Buildroot, but Buildroot apparently doesn’t yet support Rust.)
Zig
There’s a similar library to build Zig NIF’s from Elixir, called Zigler. Again the nerves_examples repo has some example code.
Its documentation is out of date with its contents, it doesn’t work, and it doesn’t work in a different way each time I try it. So I think I’m done screwing around with NIF’s for a while. My hopes for all of this were so much higher.
More qemu recipes
x86_64 VM with network
There’s actually a mix task that makes you an appropriate (if basic) qemu script, when you have the MIX_TARGET env var set to x86_64: mix nerves.gen.qemu_script. It produces a script containing this command (as well as a help command and some other conveniences):
IMAGE="$1"
qemu-system-x86_64 \
-drive file="$IMAGE",if=virtio,format=raw \
-net nic,model=virtio \
-net user,hostfwd=tcp::10022-:22 \
-serial stdio
You can then ssh to the VM with ssh -p 10022 root@localhost. You can specify SSH keys in config/target.exs; by default it will look for ~/.ssh/id_{rsa,ecdsa,ed25519}.pub on your host system and put those into the target system.
No mix tasks that will generate similar scripts for RPi though, alas.
x86_64 VM with network and external data drive
# Create 1 GB disk image
dd if=/dev/zero of=data.img bs=1M count=1024
# Partition it as you wish. Can we make fwup do this somehow?
# cfdisk data.img
# Or just create the filesystem directly upon the image, with no partition table:
mkfs -t ext4 data.img
qemu-system-x86_64 \
-drive file=hello.img,if=virtio,format=raw \
-drive file=data.img,if=virtio,format=raw \
-net nic,model=virtio \
-net user,hostfwd=tcp::10022-:22 \
-serial stdio
This will give you by default a device called /dev/vdb, with Nerves’s /dev/rootdisk0* being /dev/vda. The ordering of these is not always consistent, I suspect that qemu just puts ’em where you expect based on the ordering of the -drive args unless you specify otherwise. You can prooooobably make fwup create this image for you but idk how yet, and modifying fwup.conf is something I’d prefer to avoid unless I’m specifically making a whole new platform – which I probably would do for particular applications, but not until I knew pretty well what I was doing.
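Anyway! To mount this image on boot you edit config/target.exs and add config :nerves, :erlinit, mount: "/dev/vdb:/mnt:ext4:nodev:", and you should be good to go. That device name matches the unpartitioned virtio disk created above; if you partitioned the image with cfdisk instead, it would be /dev/vdb1.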
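The mount setting is a single string with five colon-separated fields, and the last one is empty in these notes. A tiny sketch of putting one together for the unpartitioned virtio data disk that shows up as /dev/vdb; the helper names are made up, and my reading of the fields as device, mount point, filesystem type, mount flags and extra data is an assumption worth checking against the erlinit docs:

```shell
# Sketch only: erlinit_mount_spec and erlinit_mount_config are hypothetical
# helpers. Field meanings (device, mount point, fs type, flags, data) are my
# reading of the format, not something confirmed in these notes.
erlinit_mount_spec() {
    device="$1"
    dir="$2"
    fstype="$3"
    flags="${4:-}"
    data="${5:-}"
    printf '%s:%s:%s:%s:%s\n' "$device" "$dir" "$fstype" "$flags" "$data"
}

# Print the line to paste into config/target.exs for that spec.
erlinit_mount_config() {
    printf 'config :nerves, :erlinit, mount: "%s"\n' "$(erlinit_mount_spec "$@")"
}
```

`erlinit_mount_config /dev/vdb /mnt ext4 nodev` prints the config line for the data.img setup here; swap in /dev/vdb1 if the image has a partition table.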
GUI stuff
Not many options right now. Either build a web GUI with Phoenix/LiveView/your poison of choice, or use Scenic, which as of April 2024 is apparently quite lightly maintained but still alive. There are a couple of libs out there for particular models of display as well; I don’t know enough about them to judge.
Creating new hardware system packages
The docs for this are https://hexdocs.pm/nerves/customizing-systems.html, which says that you should first read https://hexdocs.pm/nerves/systems.html.
It appears to be mainly compiling a kernel, building Erlang and Busybox, and setting up all the config and devicetrees and such correctly. A lot of the hard work is done through Buildroot, and a lot of the easy work seems to be usually done by cloning an existing nerves_system_* package and modifying it for your new target. This lets you set up a totally-customized “base” system for your Elixir environment, which is nice, and it appears pretty common to take the generic system packages and tweak them for specific devices. For example on hex.pm there’s a generic nerves_system_rpi3 and then a nerves_system_farmbot_rpi3 for Farmbot, a generic nerves_system_bbb for BeagleBone support and then a nerves_system_bbb_sgx version that includes the proprietary GPU drivers for it, stuff like that.
The docs seem pretty ok but also very target-specific, so I’m not going to go into it here. If you can install/configure u-boot and build a Linux kernel then you should prooooooobably be able to figure out how to make a nerves_system package. Not a low bar, but not a super high one either. I want to try running Nerves on a couple Rockchip devices, which don’t have existing system packages but do have good Linux support, so if I ever get around to it then I will write up something about how to do it.
Random bits and pieces
D-Bus is not normally enabled on Nerves. It may be enabled in a custom system.
I guess that makes sense, D-Bus is just Erlang messages for C programs. Oh, Nerves does have udev though. Most of the standard Linux tools for using it are absent but it does have busybox’s /sbin/uevent, which gives you a (very minimal) place to start. Ahhhhh, and it also has the nerves_uevent package, which is probably what you want anyway. Niiiiiice.
There’s also nerves_time to handle time via various methods (NTP, real-time clock, etc.), and nerves_pack, which appears to collect together various… well, random bits and pieces.