NervesNotes

Nerves is a framework for making embedded systems using Elixir. Here are my notes on how to use Nerves and what it can do. This is not quite a tutorial and certainly not a reference, more a rambly brain-dump. It started out being just my personal notes on handy features and shortcuts, and got progressively more structured and complete the further I went. Don’t assume this is super-complete; my interests for Nerves are sometimes for IoT-y things (sensors, home automation etc) and mostly for robotics (autonomous drones, drone swarms, occasionally constrained-environment things like forklifts or farm robots), so those are the things I will pay the most attention to. For notes on how to set up Nerves to run on a VM and general impressions of it, check out NervesLocalSetup.

Up to date as of May 2024, for Nerves 1.10.

Resources

Capabilities/features

Overall, what Nerves does is build a barebones Linux system with an Elixir VM running on it. It boots directly into said VM and bypasses 99% of the stuff modern Linux distros do: there’s no systemd, no X11 or Wayland, no getty, no openrc, no package manager, or anything else. It’s designed to make an embedded system: you don’t build the system by starting with a base distro and adding programs on it and stuff, you basically just bundle the program you want with the OS and that’s it. It runs one program at boot and continues until the system turns off.

This is actually brilliant, ’cause that program is “the Erlang VM”, and IMO it is a rather better operating system abstraction than Unix is. Not gonna go into that too deeply here, but suffice to say that Rust’s stdlib has no safe way to kill a thread from the outside because no matter how you do it, there’s always going to be something horrible that can happen that will hose process state or other threads. Erlang processes generally don’t have this sort of problem even though they share a memory space. Ponder on that a bit.

It also has the result that Nerves really isn’t much of a library; it gives you a few things at runtime but mostly relatively-small convenience and setup utilities. Nerves is mostly just a build system! Well, a build system, and a firmware packager, and an installer, and a bunch of other stuff that generally takes a lot of the pain out of building and deploying Linux systems on mostly-embedded hardware. It’s very nice! It just is something that does most of the work at compile-time rather than runtime.

IEx shell stuff

When you boot it up, the system will boot into an IEx shell. …On some terminal device or another; on embedded boards like a Beaglebone this will generally be on a serial port. The nerves_system_foo docs for your particular board foo will tell you which device the terminal is on, which is more than most actual board providers tend to do.

By default the system will reboot itself if the VM stops. You can change this behavior with a config option in config/target.exs.

If the IEx shell alone stops (such as by you running exit() or something) the Erlang VM will keep on chugging without it. You can type ctrl-G to get into the VM monitor, then h to list help. s starts an Erlang shell and c will connect to it; to start an IEx shell you have to do s 'Elixir.IEx', and again connect to it with c. Or you can stop the VM from the Erlang monitor with q or with Erlang’s q() function, which will, as mentioned above, reboot the machine by default.

Toolshed is the default Nerves package full of CLI-ish stuff, most of which are named similarly to normal Unix command line utils. It’s honestly reasonably useful as a shell without the rest of Nerves, though it is not really intended to be used as a non-interactive lib for shell scripting. The Erlang VM already includes most of the utilities you would want, so instead of having shell scripts you just write code.

There IS a Unix environment under there but it’s all provided by Busybox, so it is functional but quite minimalistic.

Toolshed.cmd/1 lets you run a string in a sh shell and print its result. Nerves.Runtime.cmd/3 does the same but lets you specify more detail and captures its output to Logger, and System.cmd and System.shell are the lower-level primitives these things call. So you can just do cmd "ls -alh", or do something like Nerves.Runtime.cmd("ls", ["-alh"], :info) followed by RingLogger.next().
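
For the lower-level primitives, here is a minimal sketch of the difference between the two. It uses only the Elixir standard library, so it runs the same on your dev machine as on the target; the specific commands are just illustrations.

```elixir
# System.cmd/2 runs a program directly with an argument list; no shell is involved,
# so nothing in the arguments gets interpreted as a glob or a pipe.
{listing, 0} = System.cmd("ls", ["-alh", "/"])
IO.puts(listing)

# System.shell/1 hands a whole string to sh, so pipes and redirection work.
{shouted, 0} = System.shell("echo hello | tr a-z A-Z")
IO.puts(shouted)
```

Both return a {output, exit_status} tuple, so matching on 0 like this crashes loudly if the command fails, which is usually what you want at a REPL.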

Other useful things: save_term! and load_term! are shortcuts for you to save/load Erlang values to files on the filesystem. save_value! does… something like that? There’s also a few small shortcuts like weather and qr_encode that reach out to specific network services that will probably not exist when you really need them to, which strikes me as somewhat bad form, though they are handy sometimes. Usually for reminding you to add :inets or :ssl to the list of applications started at boot.
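
I haven’t read how Toolshed implements these, but saving and loading a term boils down to something like the sketch below. TermFile is a made-up module name and this is an assumption about the general technique, not Toolshed’s actual code. On a device you would point the path somewhere under /data instead of the temp dir.

```elixir
defmodule TermFile do
  # Hypothetical helper; NOT Toolshed's implementation, just the same idea.

  # Serialize any Erlang/Elixir term to the external term format and write it out.
  def save!(term, path), do: File.write!(path, :erlang.term_to_binary(term))

  # Read the file back and decode it into the original term.
  def load!(path), do: path |> File.read!() |> :erlang.binary_to_term()
end

path = Path.join(System.tmp_dir!(), "calibration.term")
original = %{gyro_bias: [0.01, -0.02, 0.0], updated: ~D[2024-05-01]}
TermFile.save!(original, path)
restored = TermFile.load!(path)
```

Only load files you wrote yourself this way; decoding untrusted binaries with binary_to_term is a known footgun.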

Init and configuration

The init system used is erlinit which apparently is loaded with an erlinit.config file that just specifies command line options. The default erlinit.config file is provided by the nerves_system_* package, and you can override bits of it in your config/target.exs; those settings then get merged with the platform’s default. If you put your own erlinit.config into the rootfs_overlay/etc dir it will overwrite the platform erlinit.config, which probably isn’t what you want.
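
A sketch of what those overrides look like in practice. The three options shown are ones I believe erlinit supports, but check the erlinit README for your version; the values are only examples.

```elixir
# config/target.exs
import Config

config :nerves, :erlinit,
  # Stop and wait instead of rebooting when the VM exits, so you can read the error.
  hang_on_exit: true,
  # "%s" gets filled in with a device-unique ID.
  hostname_pattern: "nerves-%s",
  # Make sure the clock is not wildly in the past at boot.
  update_clock: true
```

These get merged over the defaults from the nerves_system_* package, so you only list what you want to change.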

There’s also some way to set kernel command line args, but I haven’t bothered digging into that yet. It’s probably either in a fwup config file or in an Elixir config script under config/.

Logging

The Logger package is included in your app by default and you can run Toolshed.log_attach/0 and Toolshed.log_detach/0 to make it output to the console. There’s several other log backends available:

  • https://hex.pm/packages/ring_logger – The default project setup uses this to provide an in-memory ring log, so it’s lost on reboot but doesn’t wear out flash. Get new messages with RingLogger.get/0, view latest messages with RingLogger.tail/0, save them to a file with RingLogger.save/1. Claims to have a nice-looking TUI started with RingLogger.viewer/0, but that gets me an undefined function, might need to enable it in a build option somewhere.
  • https://github.com/smartrent/ramoops_logger – RamOops logger, stores logs in DRAM designed to hopefully persist between reboots. Not enabled by default but sounds cool.
  • Whatever other normal Logger backend you care to add.
  • Toolshed.dmesg/0 will also show you kernel logs, not sure if there’s any integration between them and Logger.
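
For reference, this is roughly how the ring logger gets wired up, going by the generated project and the ring_logger README as I remember them. The max_size value is arbitrary, and newer Elixir versions may prefer a different way of registering logger backends.

```elixir
# config/target.exs
import Config

# Send Logger output to the in-memory ring buffer instead of the console.
config :logger, backends: [RingLogger]

# Keep the most recent 1024 messages; size this to the RAM you can spare.
config :logger, RingLogger, max_size: 1024

# Then, from IEx on the device:
#   RingLogger.attach()   # stream new messages to this shell
#   RingLogger.next()     # print messages logged since the last call
#   RingLogger.tail()     # print the most recent messages
```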

Onboard storage

Nerves uses an MBR-style partition table to lay out its stuff for some reason, probably because they need to use something even on eMMC flash or such. I wondered whether that’s just what fwup does, but nope, fwup can do GPT partitions just fine; weird. The exact partition layout and format is defined by the nerves_system_* package, and certain platforms require certain partition table formats. The layout for an x86_64 system looks something like this, though it will vary for different boards and such:

  • 2 MB of unpartitioned space containing stage0 bootloader in 1st megabyte and persistent per-device env variables in the 2nd megabyte – serial number, firmware version, UUID, metadata, whatever else you want to put there.
  • Partition 1: 16 MB vfat partition with Grub bootloader and config
  • Partition 2: 256 MB Linux squashfs filesystem with read-only root FS (“A” partition)
  • Partition 3: 256 MB Linux squashfs filesystem with read-only root FS (“B” partition)
  • Partition 4: 256 MB Linux ext4 filesystem mounted read-write on /root. /data is a symlink to /root. This is where your mutable data goes.

You can put files into rootfs_overlay/ in your project root and they will be imported into the generated squashfs filesystem. Is there a way to put files into the /data dir by default? Probably.

Nerves supports A/B firmware upgrades! So you can make a firmware upgrade, flash it onto a device in an unused partition, and set the bootloader to boot to it… and then if it doesn’t boot correctly, then on the next reboot it will go back to the old firmware automatically. Very handy. Toolshed.fw_validate/0 is apparently the userland hook for this that tells it “this firmware has booted correctly”, and there’s a couple persistent env vars that are used to keep track of it. It appears to Just Work without you having to do anything to it, but you probably can tinker with when or how the firmware is considered “valid” if you need to.

To add a mount point, such as for an external drive, edit config/target.exs and add something like: config :nerves, :erlinit, mount: "/dev/sdb1:/mnt:ext4:nodev:". This mounts the ext4 filesystem on /dev/sdb1 to /mnt with the option nodev. The syntax for the string can be found in https://github.com/nerves-project/erlinit/?tab=readme-ov-file#filesystem-mounting-notes

The fwup tool defines what the partition layout in the firmware image is actually like though, in nerves_system_$MIX_TARGET/fwup.conf. So if you need to modify the root partition layout instead of just adding external drives to it, you’ll have to update the fwup.conf as well, probably by making a custom system package.

More info here: https://hexdocs.pm/nerves/advanced-configuration.html#partitions.

You can get the persistent env vars using the Nerves.Runtime.KV module. You get 1 megabyte of string->string typed storage, and you can see what’s in there with Nerves.Runtime.KV.get_all(). To save something there you can use Nerves.Runtime.KV.put("foo", "bar"). If you then open the disk image in a hex editor, you should see the string foo=bar somewhere after address 0x00100000.

Hardware interface

https://elixir-circuits.github.io/ provides your basic low-level interfaces – UART, GPIO, I2C and SPI. No options for CANbus? That’s ok, it’s more specialized anyway, but it might be nice someday.

Circuits isn’t included in Nerves by default, which is honestly fairly reasonable, just add it to your mix.exs. Its own docs are perfectly fine, just use those. Interestingly, it appears to define a GenServer for reading/writing to UART devices but not for the other device types? Maybe just ’cause polling/monitoring a serial port is such a headache otherwise? I dunno.

I gotta say, something that stands out is that Circuits has actual functions to list/enumerate the various devices that exist on the system, which is a nice fucking change from groveling through kernel logs. So you can just run Circuits.UART.enumerate() instead of glaring at /dev/ttyS0-32 and trying to figure out which ones actually exist.

Network setup

Nerves currently configures its network stuff with a library called VintageNet, which apparently supersedes an older and more sophisticated but more error-prone lib. VintageNet has plugins for various networking interfaces (ethernet, wifi, cellular, etc) and will basically take down all networking and set it up from scratch each time it makes a change. Looks like Nerves makes some effort to give network interfaces consistent names, though I don’t see any udev settings in /etc so I don’t know how it does that. Oh, VintageNet just asks you to hardcode it and ignores things it doesn’t recognize. You can define a network config in config/target.exs, by default it gives you some sane defaults based on your platform package. It includes a basic DHCP client, and can use udhcpc for a more sophisticated one, as well as udhcpd to provide a DHCP server if you want it.
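
A network config in config/target.exs looks roughly like this. It is modeled on what the project generator gave me; the interface names and regulatory domain are whatever your board and country need, so treat them as placeholders.

```elixir
# config/target.exs
import Config

config :vintage_net,
  # Country code for the WiFi regulatory rules.
  regulatory_domain: "US",
  config: [
    # Wired ethernet, address from DHCP.
    {"eth0", %{type: VintageNetEthernet, ipv4: %{method: :dhcp}}},
    # WiFi interface, present but unconfigured until you give it a network.
    {"wlan0", %{type: VintageNetWiFi}}
  ]
```

Each entry is an {interface_name, config_map} tuple, and the type key picks which VintageNet plugin handles the interface.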

VintageNet’s cookbook is quite good, if you just want to set up something quick then that’s probably a good place to start. Basic functions are:

  • VintageNet.info – Show detailed config info.
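  • VintageNet.get_configuration – Get the configuration map for a particular interface. (VintageNet.get, by contrast, reads a single property such as ["interface", "eth0", "connection"].)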
  • VintageNet.configure – Set configuration for a particular interface.
  • VintageNet.reset_to_defaults – Sets configuration for an interface to whatever is in config.exs, ignoring whatever is saved in local storage. I’ve definitely wanted this a time or two when figuring out a network config.

Nerves’s Toolshed also provides simple implementations of some basic net tools like ping, ifconfig, httpget, etc. VintageNet.configure didn’t let me give it egregiously incomplete or nonsensical input, which is an improvement over most Linux network config systems.

I can’t find something exactly like netplan try, where it will apply the network config and then wait for acknowledgement and revert to the old config if it doesn’t get it. So it IS very possible to brick a system to the point you can’t reach it over the network if you are not careful! However, there is a persist: bool flag for VintageNet.configure/3 which is true by default. So you can do VintageNet.configure("eth0", my_new_config, persist: false) and it will apply the config without saving it to storage, so if you lock yourself out you can reboot the device and it will come back with the old config. IMO this is nowhere near as nice a method as netplan try, but it does work. The persisted settings appear to be saved to some kind of binary file in /data/vintage_net/iface_name, so if you delete that file and reboot it will also come back up with the network config specified in your target.exs or whatever.
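
You can fake most of netplan try yourself with a timer. Below is a minimal sketch, assuming VintageNet.configure/3 and VintageNet.get_configuration/1 behave as documented; NetTry is a made-up module, not part of VintageNet. The configure function is passed in as an argument so the logic can be exercised without a device.

```elixir
defmodule NetTry do
  # Hypothetical helper, not part of VintageNet.

  # Apply new_config without persisting it, and schedule an automatic rollback
  # to old_config after timeout_ms unless confirm/1 is called first.
  def try_config(ifname, old_config, new_config, timeout_ms \\ 60_000,
        configure \\ &VintageNet.configure/3) do
    :ok = configure.(ifname, new_config, persist: false)

    # The timer lives in the :timer server, not in your shell process, so it
    # should still fire if your SSH session dies along with the network.
    {:ok, timer} =
      :timer.apply_after(timeout_ms, :erlang, :apply, [
        configure,
        [ifname, old_config, [persist: false]]
      ])

    {ifname, new_config, timer, configure}
  end

  # Cancel the pending rollback and save the new config for real.
  def confirm({ifname, new_config, timer, configure}) do
    {:ok, :cancel} = :timer.cancel(timer)
    configure.(ifname, new_config, persist: true)
  end
end

# On a device (addresses are made up):
#   old = VintageNet.get_configuration("eth0")
#   new = %{type: VintageNetEthernet,
#           ipv4: %{method: :static, address: "192.168.1.50", prefix_length: 24}}
#   pending = NetTry.try_config("eth0", old, new)
#   # ...reconnect, check you can still reach the thing, then:
#   NetTry.confirm(pending)
```

If you never confirm, the old config comes back after a minute. And since nothing was persisted, a reboot gets you the old config too.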

Don’t see any way to set up firewall rules, interesting. I don’t even see iptables binaries installed. How much this matters will depend vastly upon your device and use case; all I really know about network security is that good security is like an ogre: it has layers. You can tell Nerves to install iptables into a custom system package, but I can’t find specific, easy docs on how.

On the up side it appears there’s at least a little support for multicast addresses, which is pretty boss. Multicast is a really neat and really underused technology. I should come up with something cool to do with it. Oh, there’s also an mDNS package for letting devices advertise their presence and services via mDNS/Avahi.

So yeah. All in all, not flawless but reasonably solid. Kinda one of the places where the Elixir community is smol and it shows, but it seems like the basics are all well-thought-out.

SSH

By default, the Nerves system will have SSH running on port 22. The only user on a Nerves system is root, it’s not really designed for multi-user operation, which is fine. By default it will look for SSH keys in ~/.ssh/id_{rsa,ecdsa,ed25519}.pub on the host system you build the Nerves firmware on, and put those into the target system so that you can SSH into it; you can turn this off or specify more/different SSH keys in config/target.exs.
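
The key setup in config/target.exs looks roughly like this. It is adapted from what the generator produces, as far as I recall it; the second key path is a made-up example of adding your own.

```elixir
# config/target.exs
import Config

# Public keys to authorize, read from the build host at compile time.
keys =
  [
    Path.join([System.user_home!(), ".ssh", "id_ed25519.pub"]),
    # Hypothetical extra key checked into the project.
    Path.expand("../deploy_keys/ci.pub", __DIR__)
  ]
  |> Enum.filter(&File.exists?/1)

# Fail the build rather than produce firmware nobody can log into.
if keys == [], do: Mix.raise("No SSH public keys found")

config :nerves_ssh,
  authorized_keys: Enum.map(keys, &File.read!/1)
```

Note that the keys are baked into the firmware image at build time, so changing them means building and uploading new firmware.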

sftp or sshfs works too. That’s probably the easiest way to get remote access to the target system’s filesystem besides rebuilding it from scratch or using an external drive, though I wouldn’t recommend sshfs for heavy-duty NAS use or anything. For light duty stuff it’s always treated me well though.

If you want to turn ssh off or do other config stuff to it, the entry point to do that lies in config/target.exs and includes links to deeper docs.

There’s also shortcuts for burning new firmware to a device over SSH. If you set the MIX_TARGET env var and run mix deps.get, you should then have a mix firmware.gen.script shortcut that will generate a script called upload.sh. Run that, modifying the host and SSH port and such if necessary, and it should take a .fw file from fwup and upload it to the target device, save it to the unused A/B partition, and reboot the device. nerves_ssh has more info. fwup appears to do most of the heavy lifting here.

fwup is pretty damn cool.

IPv6

Support for IPv6 appears to be sadly rather lacking. It will give you an address with stateless autoconfig, so that’s nice. But it doesn’t really seem to let you do anything else with it. I should help improve that; IPv6 support really isn’t that hard.

Vector crunching with Nx

Nx is a fairly powerful and complete numerical computing library for Erlang/Elixir, filling a similar niche to glam for Rust or Eigen for C++. Robots generally need to crunch a lot of numbers, so how to make Nx work nicely with Nerves is something I want to look at.

Processing big chonkin’ matrices of numbers is something of a weak point of the Erlang VM in general, at least compared to its ability to put together and take apart highly-nested, pointer-heavy data structures. This is because the concern with large chonks of numbers is usually “how do I pack them as densely as possible and use memory as efficiently as possible”, which tends to lead to dense, mutable, array-like data structures. A lot of Erlang’s excellence comes by restricting aliasing of data, which usually means restricting mutation, which makes dense arrays pretty inefficient. So I’m a little curious how Nx handles math well.

After spending a couple days gnawing on it though… by default, uh, well, it actually doesn’t. This was a bit of a rude surprise, I won’t lie. Basically, Nx has a variety of pluggable backends that allow you to specify different representations for its big chonkin’ matrices. The default backend stores them as an Elixir binary object, there’s one for Google’s GPU vector math lib XLA, another based on PyTorch, etc. This is really cool because it means you can switch out the backends for your Nx libs to support the hardware you have available… CUDA, ROCm, specific NPU things, plain ol’ CPU, etc.

The thing is that the default backend is pretty basic and not very smart, so it does lots and lots of cloning of binaries when doing math. For simple use cases, in my tests it was nice and dense in terms of memory usage, but significantly slower than operating on dumb linked lists of numbers. Most people seem to recommend using the EXLA backend when you want actual performance. But EXLA is a big heavy complicated dependency that is really overkill for what I want or need. I have exactly zero robotics use cases that actually benefit from 100 MB of C++ code behind the scenes… well, yes I do, but they’re better served by special-case libraries like PCL and OpenCV.
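
If you do want to try a different backend, switching is a one-line config change. This sketch assumes you have added exla to your deps; whether EXLA actually builds for your Nerves target is a separate question that I have not verified.

```elixir
# config/config.exs
import Config

# Use EXLA for all tensors instead of the default pure-Elixir binary backend.
config :nx, default_backend: EXLA.Backend

# Or switch at runtime, for the current process only:
#   Nx.default_backend(EXLA.Backend)
```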

So really, if you want a machine learning math lib, Nx is probably exactly what you want. But if you want a vector math lib for doing physics and geometry and some medium-scale linear algebra, then not so much. After some digging, there’s simpler and better alternatives for vector stuff. Matrex needs a bit of love but is built on BLAS and looks very good all in all, as does graphmath, and there’s honestly nothing particularly wrong with numerl for small systems. For numpy/pandas/R style dataframes, maybe look at Explorer. If you want to try to read my whole rambly problem-solving process investigating Nx, check out NxNotes, though you may find more questions than answers.

Sorry. Not the answer I wanted to give! But use the right tool for the job.

Networked BEAM clusters / distributed Erlang

https://learnyousomeerlang.com/distribunomicon is probably the best place to start. https://hexdocs.pm/nerves_pack/readme.html#erlang-distribution also has a bit of practical info. I’m not an expert on this stuff, so this is just my observations and gut feels.

My (limited) experience with networked BEAM clusters has been a bit fraught from time to time. BEAM’s inter-node messaging (usually called Distributed Erlang) uses TCP/IP and is generally assumed to be reliable. This is fine in a datacenter or a building with a dedicated wifi network, but with robots and other IoT/distributed devices, it can be pretty common that networking is done by fairly low-power wifi, other radio systems like Bluetooth or Zigbee, or even satellite. These can be subject to extreme lag, transient hiccups, vastly varying bandwidth, and other nasty behaviors very different than “on and operating at 100% effectiveness” or “off”. It also assumes that the network is secure, so make sure this is true. Probably want to use WireGuard or another VPN if you are using Erlang messaging to talk over a network that might have untrusted devices on it.

The Erlang network is a mesh and managed by the epmd daemon, which talks on port 4369. Nerves appears to do no firewalling by default, but it also runs nothing besides SSH and epmd afaict, so you’re probably ok to start without a firewall as long as you put it on your to-do list for a real deployment. By default you build an Erlang mesh by just telling each node the addresses of some/all other nodes, which really doesn’t scale up great. There’s other ways of doing it though: if you’re running only one VM per machine you can tell them to not bother with epmd and talk to each other directly on fixed ports, which makes firewalling much easier. Or you can use another way to announce systems such as libcluster, which has a number of methods for systems to find each other. The nodes still use reliable TCP to talk to each other once found though.
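
For the libcluster route, the setup is mostly config. This is a sketch based on my reading of the libcluster README, not something I have run on Nerves; the topology name and supervisor name are arbitrary.

```elixir
# config/target.exs
import Config

config :libcluster,
  topologies: [
    # :robots is an arbitrary topology name.
    robots: [
      # Nodes announce themselves via UDP multicast on the local network.
      strategy: Cluster.Strategy.Gossip
    ]
  ]

# Then add this to the children in your application supervisor:
#   topologies = Application.get_env(:libcluster, :topologies, [])
#   {Cluster.Supervisor, [topologies, [name: MyApp.ClusterSupervisor]]}
#
# The manual equivalent is Node.connect(:"nerves@192.168.1.20") on each node
# (made-up address), then Node.list() to see who you are connected to.
```

Either way every node still needs a node name and a shared cookie before Distributed Erlang will talk to anyone.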

All in all, I’d probably only use Erlang messaging between separate machines in Nerves if a) all the systems were physically built into the same device and connected via copper, or b) you had a very clear understanding about its limitations and were operating on a controlled network anyway, such as a device-specific wifi network. For example I’d use it on a drone with a separate flight control computer and a safety monitor computer wired to each other over ethernet, or if I had a warehouse full of robot forklifts that all talk over a private wifi network to a central dispatching system. (Even there I’d worry about “what do the robots do if wifi goes down or gets spotty”, but that’s a relatively simple failure mode in a relatively controlled environment.)

But if you’re using genservers and such to abstract message sends behind function calls and services, hoooooopefully you can use an unreliable messaging layer and use these things to abstract it. So you don’t need to worry about sending messages directly between Erlang nodes anyway. Right? Right.

My takeaway from this is that if there’s two machines talking to each other over an unreliable transport, as far as I can tell there’s nothing wrong with having them each run a process that talks to the other via some protocol designed for unreliable transport, instead of Distributed Erlang. It can be some kind of HTTP RPC, UDP broadcasts, heckin’ DDS or whatever else you want. Just use the right tool for the job.
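
To make that concrete, here is a minimal sketch of fire-and-forget messaging over UDP using only what ships with OTP. LossyLink is a made-up module name, and a real protocol would want sequence numbers, authentication, and a wire format that is not raw Erlang terms. The example talks to itself over loopback so it runs anywhere.

```elixir
defmodule LossyLink do
  # Hypothetical module; a sketch of messaging that tolerates loss, not a real protocol.

  # Open a UDP socket on an OS-assigned port and report which port we got.
  def open do
    {:ok, socket} = :gen_udp.open(0, [:binary, active: false])
    {:ok, port} = :inet.port(socket)
    {socket, port}
  end

  # Encode any term and send it. There is no delivery guarantee at all.
  def send_term(socket, host, port, term) do
    :gen_udp.send(socket, host, port, :erlang.term_to_binary(term))
  end

  # Wait up to timeout_ms for one datagram. A timeout is a normal outcome
  # that the caller has to handle, not an error.
  def recv_term(socket, timeout_ms) do
    case :gen_udp.recv(socket, 0, timeout_ms) do
      {:ok, {_addr, _port, data}} -> {:ok, :erlang.binary_to_term(data, [:safe])}
      {:error, :timeout} -> :timeout
    end
  end
end

{a, _a_port} = LossyLink.open()
{b, b_port} = LossyLink.open()
:ok = LossyLink.send_term(a, {127, 0, 0, 1}, b_port, {:battery, 0.87})
received = LossyLink.recv_term(b, 1_000)
nothing_more = LossyLink.recv_term(b, 50)
```

The point of the design is that the timeout case is right there in the return type, so whatever GenServer wraps this has to decide what a missing message means.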

Language integration

The docs are here: https://hexdocs.pm/nerves/compiling-non-beam-code.html. They provide some handy rules of thumb:

  • Build large and complicated C and C++ projects using Buildroot by creating a Custom system
  • Build small C and C++ projects using elixir_make
  • Look for libraries like zigler for specific languages
  • When hope is lost, compile the programs outside of Nerves and include the binaries in a priv directory. Static linking is recommended.

…Before you even start, experience has shown that searching the Erlang/OTP docs three times and skimming the Erlang source leads to all kinds of amazing discoveries that may not require you to port any code at all.

I can confirm that the Erlang/OTP system has many fascinating things in it that you might not expect, but sometimes you gotta roll your own. My Dream Robot Architecture is basically to use Rust for number-crunching and hardware interface, and Erlang/Elixir for the control plane, so I want to figure out how to do exactly that with Nerves.

The Erlang VM has several options for FFI, with a sliding scale between safety and performance:

  • Safest and slowest – “C nodes”, where you write an entirely separate program that talks to the Erlang VM over the Distributed Erlang protocol.
  • Medium-slow – “Ports”, where you write an entirely separate program that is started by the VM and talks to a process via a bidirectional byte stream, basically the same as a Unix pipe, Python’s subprocess.Popen, etc.
  • Medium-fast – “Port drivers”, which work like Ports but are loaded as a dynamically linked library into the VM’s address space and are run in the VM. People on the Elixir discord tend to say “Port Drivers IMHO should not be used anymore as there is no point” – they have all the weaknesses of both Ports and NIF’s with only some of the strengths.
  • Fastest and most hazardous – “NIF’s”, Native Implemented Functions. These are native-code programs compiled as DLL’s and loaded by the VM, which offer functions that can be directly called by Erlang/Elixir code. If you’ve used Python’s or LuaJIT’s FFI before this will probably be a familiar approach. This also means that if a function takes too long it can block the VM’s scheduler and Cause Issues, if a function crashes or corrupts memory it can crash the BEAM VM or cause it to silently give incorrect results, etc.
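
As a taste of the Port option, here is a minimal sketch that uses plain cat as the “external program”, since it echoes back whatever it is fed. A real port program would be your own binary speaking some framed protocol; cat is just a stand-in that exists on any Unix, Busybox included.

```elixir
# Start cat as an external OS process attached to this Erlang process.
port = Port.open({:spawn_executable, System.find_executable("cat")}, [:binary])

# Write bytes to the program's stdin.
true = Port.command(port, "ping\n")

# Its stdout arrives as ordinary messages in our mailbox.
reply =
  receive do
    {^port, {:data, data}} -> data
  after
    1_000 -> :timeout
  end

# Closing the port closes the program's stdin, which makes cat exit.
Port.close(port)
```

If the external program crashes, the VM just sees the port close, which is the whole appeal over NIFs.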

We’re gonna focus on two approaches: Making native code NIF’s, and making Nerves interoperate with non-Elixir BEAM languages. Nerves is mostly a build system, and the really hard nasty irritating part of FFI is always the build system, so this is really mostly about the build system anyway. If we can get this off the ground with the very basic stuff, you should be able to read other tutorials and get on to the more advanced stuff yourself.

We’re also only going to worry about building this stuff on Linux, for reasons that will become rapidly apparent. I’m using Debian Bookworm and/or Ubuntu 22.

Erlang

There’s some good example code in the nerves_examples repo. Nerves’s build stuff is all built using mix, but you can always just write your entire project as a library built with rebar3 if you wish, and use it as a dependency in the mix project.

At its simplest, you can just put Erlang code in the src/ dir of your root project and everything will pretty much just work. mix will find it and build it as normal modules. You can also swap out IEx to just use the Erlang shell if you’re a little masochistic by editing rel/vm.args.eex, the nerves_examples repo shows you how. (Is there a better command line REPL for Erlang? I should find out.) Or since Erlang and Elixir have almost the exact same data model, you can just call your Erlang functions from Elixir like normal.
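
Calling Erlang from Elixir is just a matter of writing the module name as an atom. The first two calls below use modules that ship with OTP; motor_ctl in the comment is a made-up example of a module you might put in src/.

```elixir
# Erlang modules are plain atoms on the Elixir side.
reversed = :lists.reverse([1, 2, 3])
root = :math.sqrt(16.0)

# A hypothetical src/motor_ctl.erl that exports set_speed/2 would be called as:
#   :motor_ctl.set_speed(:left, 0.5)
```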

Going a step beyond, you can use umbrella projects to just have an Erlang subproject in your Nerves project. However, mix’s umbrella projects are a little squirrelly and hard to use sometimes, so the Nerves people seem to have created the idea of a poncho project instead. I admire their ridiculous terminology! This is more or less exactly how you’d want to make a dependency as its own subproject; you literally move your “root” project into a subdirectory, make another subdirectory next to it, and add a dependency to it in your mix.exs, like so:

cd my_cool_nerves_project
mkdir my_cool_nerves_project
mv * my_cool_nerves_project
mix new my_subproject
vim my_cool_nerves_project/mix.exs
# Add {:my_subproject, path: "../my_subproject"} to your deps
cd my_cool_nerves_project
mix firmware

It’s literally just two entirely separate projects under the same directory. They can be Elixir, Erlang, or anything else that mix recognizes. There’s probably a reason mix makes umbrella projects complicated, but idk what it is. Probably because it tries to merge application configs automatically, and that just doesn’t seem like something you ever want to leave up to a machine. There’s always something that’s gonna step on something else by accident.

Lisp-Flavored Erlang

Does anyone actually use LFE? I should find an excuse to play with it someday. IMO Elixir is a perfectly fine Lisp and I don’t have much personal need for something different, but using LFE in Nerves looks pretty simple. That example works pretty much right out of the box, so I’ll leave it up to you to figure out the details. It’s nice.

Gleam

The newest member of the Erlang family! I know nothing about it besides it’s strongly typed and its creators are pretty cool.

We’re using asdf for this anyway, so installing Gleam is pretty simple:

asdf plugin add gleam
asdf install gleam 1.1.0
asdf global gleam 1.1.0

We will create a Gleam project as a poncho project for simplicity’s sake. It looks like Gleam has its own build system and build file format (sigh), and mix does not yet know how to use the gleam build system (sigh), so they basically can’t interoperate automatically yet. Someone wrote mix_gleam, a mix plugin that can be used to build Gleam code, but it has been described as working “in a narrow range of cases [to make a best effort attempt to compile Gleam for Mix”. So I’m not gonna dig too deep into it.

Huh, Gleam’s compiler and build tool are actually written in Rust. I assumed it’d just be an Erlang program and libs, the same way that Elixir and LFE essentially are. Not a terrible choice, just unexpected.

Gleam does compile to Erlang and output the .erl files in its build directory, so there has to be a way of telling mix to just build and load them automatically, but I don’t know enough about it to do that. It loooooooks like the gleam build tool knows how to build rebar3 projects, but not yet the other way around? Supposedly if you publish a Gleam project to hex.pm, it will just magically become an Erlang project in a format that mix and rebar3 and stuff will be able to understand. But I don’t want to do that, I want to build locally. I’ll just add “bend rebar3 and mix to my evil whims” to my bucket list, I guess.

Siiiiiiiigh, using rebar3 is not really my favorite pastime, but wouldn’t it be nice if the whole Erlang ecosystem used the same damn build system? Fucking hell this fragmentation is dumb.

Rust

rustler provides some nice tools for creating Rust NIF’s in Elixir projects. Looks pretty straightforward. After adding rustler to your mix.exs dependencies and doing mix deps.get, it gives you a very simple Rust crate under native/ that has a little bit of template code, like compiling to a cdylib by default and calling rustler::init!() to export the functions you’ve defined. I have to add a stub function and a little config in my lib/hello_nerves.ex module per the rustler docs, and fix my names to be consistent between the two. Running mix firmware.burn now invokes Cargo and builds my Rust code. Nice. Run my VM aaaaaaaaaand… it crashes like a boss.

Wonderful. There’s some Erlang error in the VM output, but it goes by too fast for me to see. Well, now we have a reason to add config :nerves, :erlinit, hang_on_exit: true to our config/target.exs. And when the VM crashes the error message I get is:

Failed to load NIF library: Error loading shared library ld-linux-x86-64.so.2: no such file or directory

Well this makes sense, rustler builds stuff as a shared library and it appears Nerves uses only musl which doesn’t support dynamic linking, so… uh… how do NIF’s in Nerves ever work???? It has to use some C in there somewhere or another, I’ve seen it being built! I can try building my rust crate alone with cargo build --target x86_64-unknown-linux-musl in its directory, but cargo refuses to try to produce a dynamic lib when using the musl target, which makes sense.

Ok let’s try again. Let’s try building Rustler stuff in a normal Elixir project totally without Nerves involved, and make sure we know how to use it.

mix new heckin_rustler
cd heckin_rustler
vim mix.exs
# Add {:rustler, "~> 0.32.1"} to deps
mix deps.get
mix rustler.new
# It asks me for a couple names, then generates a new Rust crate in native/heckinrust.
# There's instructions for how to write the Elixir glue code in native/heckinrust/README.md
# Just copy-paste the example for now into lib/heckin_rustler.ex and build your stuff:
mix compile 
# All your Rust code should now be built, so you should be able to do:
iex -S mix
...
iex(1)> HeckinRust.add(3,4)
# Returns 7
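
For reference, the Elixir glue that gets pasted into lib/heckin_rustler.ex looks roughly like this. It follows the pattern in the Rustler README; the names match the ones used in the walkthrough above, and yours will differ depending on what you typed into mix rustler.new.

```elixir
# lib/heckin_rustler.ex
defmodule HeckinRust do
  # otp_app is the mix project name; crate is the directory name under native/.
  use Rustler, otp_app: :heckin_rustler, crate: "heckinrust"

  # Stub that gets replaced when the NIF library loads. If loading fails,
  # calling it raises instead of silently doing nothing.
  def add(_a, _b), do: :erlang.nif_error(:nif_not_loaded)
end
```

The module name has to match what the Rust side passes to rustler::init!, which would be "Elixir.HeckinRust" here. This is the “fix my names to be consistent” step.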

Great, it works! Rustler isn’t garbage that can’t even do the trivial things; always good to confirm. Our Rust code is now built into a shared object using glibc in _build/dev/lib/heckin_rustler/native/heckinrust/release/libheckinrust.so. Success! How do we make this cooperate with Nerves? Let’s start from an entirely fresh Nerves project.

You DID do mix archive.install hex nerves_bootstrap, right? You didn’t just nuke ~/.asdf/ to start from scratch and not faithfully follow the instructions in NervesLocalSetup to get a working Nerves system again? Good. Just asking. No reason why.

…Let’s just make our new Nerves project and add Rustler to it:

mix nerves.new hello_rustler
set -x MIX_TARGET x86_64
# or for bash rather than fish:
# export MIX_TARGET=x86_64
mix deps.get
mix firmware.burn --device hello_rustler.img
mix nerves.gen.qemu_script
./run_qemu.sh hello_rustler.img

Everything starts and runs, and we can SSH into our VM, run HelloRustler.hello, and get :world back. Great, everything works. Now let’s add {:rustler, "~> 0.32.1"} to our mix.exs and do mix rustler.new. That creates our Rust crate boilerplate in native/, we add the Elixir side of the boilerplate to lib/hello_rustler.ex, and do our mix firmware.burn command. It builds our Rust code, builds our firmware image… and does the same thing where it can’t load our NIF module because it can’t load the .so file. Because it uses musl instead of glibc and musl doesn’t do .so files. …right? The .so file DOES exist in our built firmware image, I can confirm that at least!

Okay, it’s time to do some surgery. Question one, is this assumption correct? Can musl use shared objects somehow, or does the Nerves system not use musl for everything? I don’t know TOO much about musl in practice and glibc is pretty cursed, so I’m trying to make an educated guess here. Let’s unzip the .fw file this project generates, mount the squashfs that contains the root filesystem, and poke around a bit. I find a number of .so files that look legit, and– well, there’s one program I know HAS to be set up correctly and actually used, and that’s Erlang:

file srv/erlang/erts-14.2.5/bin/erlexec 
# Prints:
# srv/erlang/erts-14.2.5/bin/erlexec: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-x86_64.so.1, stripped

mmmmHM, so my educated guess appears to be wrong. musl does have a dynamic linker of SOME kind or another, interpreter /lib/ld-musl-x86_64.so.1 is pretty unambiguous. Is that actually provided by musl or is it some weird hack? Am I out of date? I am out of date. It doesn’t enable it by default, but you can now totally use musl’s libc as a dynamic library, and it includes a full-ish dynamic linker. Neat! That explains how Nerves uses stuff with NIF’s, I’m not fundamentally insane. Whew!
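The unzip-and-poke-around step above can be sketched roughly like this. Assumptions, loudly: the .fw file is an ordinary zip archive with the root filesystem at data/rootfs.img, the firmware lands in _build/x86_64_dev/nerves/images/, and unzip, unsquashfs (from squashfs-tools) and file are installed on the host. The function name and output directory are made up for illustration.

```shell
# unpack_fw FW_FILE OUT_DIR
# Unzips a .fw archive and extracts its squashfs root filesystem into
# OUT_DIR/rootfs, without needing root or a loop mount.
unpack_fw() {
    fw_file="$1"
    out_dir="$2"
    if [ ! -f "$fw_file" ]; then
        echo "unpack_fw: no such firmware file: $fw_file" >&2
        return 1
    fi
    mkdir -p "$out_dir" || return 1
    # Assumption: the rootfs image is stored at data/rootfs.img in the archive.
    unzip -o -q "$fw_file" -d "$out_dir/fw" || return 1
    # unsquashfs may complain about device nodes when run as a normal user,
    # so check for the extracted directory instead of trusting its exit code.
    unsquashfs -f -d "$out_dir/rootfs" "$out_dir/fw/data/rootfs.img" >/dev/null 2>&1
    [ -d "$out_dir/rootfs" ]
}

# Example: show which dynamic loader each Erlang runtime binary asks for.
# Only runs if the firmware file is actually there.
fw=_build/x86_64_dev/nerves/images/hello_rustler.fw
if [ -f "$fw" ] && unpack_fw "$fw" fw_unpacked; then
    file fw_unpacked/rootfs/srv/erlang/erts-*/bin/* | grep interpreter
fi
```

Extracting with unsquashfs instead of mounting is just a convenience choice here; a loop mount of the same image shows the same files.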

Well then cross-compiling our Rust should be pretty straightforward, if we just build for the musl target:

rustup target add x86_64-unknown-linux-musl
cd ~/tmp/hello_rustler/native/hirust
cargo build --target x86_64-unknown-linux-musl
# error: cannot produce cdylib for `hirust v0.1.0 (~/tmp/hello_rustler/native/hirust)` as the target `x86_64-unknown-linux-musl` does not support these crate types

So you CAN build code using musl libc as a dynamic library, but rustc still doesn’t want to by default. Searching for “rust musl cdylib” gets me this issue, which can be summed up as “we don’t want to have our musl builds use dylib’s by default because people tend to use musl to avoid dylibs”, which is fairly reasonable. This then asks for the ability to build dylib/cdylib’s using musl, which links to this unclosed issue but also says you can set some RUSTFLAGS env vars to make it build dynamic libs with musl. Trying that out in our native/hirust crate appears to work! …well, it builds without errors. “Work” is a higher bar than that.

Rustler already creates a native/hirust/.cargo/config.toml file for us with some MacOS config stuff with it, so we should be able to just edit that and add the following so we don’t have to keep messing around with env vars:

[target.x86_64-unknown-linux-musl]
rustflags = [
    "-C", "target-feature=-crt-static"
]

If you’re not building for an x86_64 target, change that part accordingly. Then export the env var CARGO_BUILD_TARGET=x86_64-unknown-linux-musl to tell cargo to build for the target you desire by default.
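For a one-off build without editing config.toml, the same settings can go in the environment instead. A minimal sketch, assuming the crate lives in native/hirust as above; RUSTFLAGS and CARGO_BUILD_TARGET are standard Cargo environment variables, and the guard simply skips the build when cargo or the crate isn't there.

```shell
# Same flag as the config.toml snippet, passed through the environment.
export CARGO_BUILD_TARGET=x86_64-unknown-linux-musl
export RUSTFLAGS="-C target-feature=-crt-static"

# Assumption: run from the project root, with the crate in native/hirust.
crate_dir=native/hirust
if command -v cargo >/dev/null 2>&1 && [ -f "$crate_dir/Cargo.toml" ]; then
    # Built directly like this, the library ends up under the crate's own
    # target/x86_64-unknown-linux-musl/release/ directory.
    (cd "$crate_dir" && cargo build --release)
fi
```

Note that RUSTFLAGS in the environment takes precedence over rustflags in config.toml, so pick one place to keep the flag.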

Cargo isn’t terribly good at detecting changes to command line opts in config.toml, so just nuke the _build/ dir to force everything to recompile from scratch. Mix compiles our Rust crate but complains that _build/x86_64_dev/lib/hello_rustler/native/hirust/release/libhirust.so doesn’t exist. That’s cause it’s in a different subdir, _build/x86_64_dev/lib/hello_rustler/native/hirust/x86_64-unknown-linux-musl/release/libhirust.so. So how do we tell Rustler to search in the correct dir? Cross-compilation in Rustler seems to be in a rather unfinished state, alas.

I’m not going to fix that right this instant. So I’m just gonna heckin’ go into _build/x86_64_dev/lib/hello_rustler/native/hirust and do ln -s x86_64-unknown-linux-musl/release/ to make it look in the right directory. Aaaaaand that doesn’t work ’cause cargo actually needs those directories to be different and so it blocks forever on a locked file.

FINE. I’ll FUCKING COPY THE .SO to the place rustler expects it to be. And it still doesn’t work, on boot our VM still gives us Failed to load NIF library: Error loading shared library ld-linux-x86-64.so.2: no such file or directory! Our generated .so file isn’t trying to use musl’s dynamic loader at all! What the fuck?! This says I should be able to fix that by setting -C linker=musl-gcc in a RUSTFLAGS env var (or in my .cargo/config.toml). But then when I do THAT I get a big chonkin compile error ending with /usr/bin/ld: cannot find libgcc_s.so.1: No such file or directory. Which leads to this issue about rustc improperly adding libgcc_s to the link args even when it shouldn’t, which might have been fixed in Alpine since 2022 but doesn’t seem to have made it upstream yet???
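The copy step at the start of that paragraph, as a small sketch. The paths are the ones from the Mix error message earlier; the function name is made up, and this only papers over the directory mismatch, it does nothing about which libc the library links against.

```shell
# install_nif BUILD_DIR TRIPLE LIB_NAME
# Copies BUILD_DIR/TRIPLE/release/LIB_NAME to BUILD_DIR/release/LIB_NAME,
# which is where Rustler went looking for it.
install_nif() {
    build_dir="$1"
    triple="$2"
    lib_name="$3"
    src="$build_dir/$triple/release/$lib_name"
    dest_dir="$build_dir/release"
    if [ ! -f "$src" ]; then
        echo "install_nif: $src not found, build the crate first" >&2
        return 1
    fi
    mkdir -p "$dest_dir" && cp -f "$src" "$dest_dir/$lib_name"
}

# Skipped when nothing has been built yet.
nif_dir=_build/x86_64_dev/lib/hello_rustler/native/hirust
if [ -d "$nif_dir" ]; then
    install_nif "$nif_dir" x86_64-unknown-linux-musl libhirust.so
fi
```

A copy rather than a symlink, because cargo wants release/ and x86_64-unknown-linux-musl/release/ to stay separate directories.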

Ok, I think going seven tickets deep into build issues is my limit. So yeah. tldr, rustler needs to learn how to cross-compile, preferably by reading Nerves’s env vars. But I can’t really blame them for not wanting to put the work in because libc is a fuck, dynamic linking is a fuck and nothing knows how to dynamic link with musl yet, gcc is a fuck, cross-compiling is a fuck, and Nerves and Rustler are just doing the best they can. Maybe I’ll come back to it when creating my own nerves_system package someday.
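One thing that would have saved some boot cycles: checking which libc the built .so asks for before burning an image. A rough sketch, assuming readelf (binutils) is on the host. The function name is hypothetical and the matching is a heuristic: as far as I can tell a glibc-linked library lists libc.so.6 (and often ld-linux-*) in its NEEDED entries, while a musl-linked one lists plain libc.so (or libc.musl-* on Alpine-style systems).

```shell
# nif_libc: reads `readelf -d` output on stdin and prints "glibc", "musl",
# or "unknown" depending on the NEEDED entries it sees. Heuristic only.
nif_libc() {
    result=unknown
    while IFS= read -r line; do
        case "$line" in
            *"(NEEDED)"*"[libc.so.6]"* | *"(NEEDED)"*"[ld-linux-"*)
                result=glibc ;;
            *"(NEEDED)"*"[libc.so]"* | *"(NEEDED)"*"[libc.musl-"*)
                # A glibc entry anywhere wins over a musl-looking one.
                [ "$result" = glibc ] || result=musl ;;
        esac
    done
    echo "$result"
}

# Example use on the library from above, skipped if it isn't built.
so=_build/x86_64_dev/lib/hello_rustler/native/hirust/release/libhirust.so
if command -v readelf >/dev/null 2>&1 && [ -f "$so" ]; then
    readelf -d "$so" | nif_libc
fi
```

If this prints glibc for something meant to run on the x86_64 Nerves image, it is going to fail at load time the same way as above.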

Never, ever do this

………ahahahahah oh my fucking gods. So, if libgcc_s.so.1 isn’t actually being used but is just incorrectly put into the list of libs to link our Rust crate with, can we just… stub it out? Will the linker accept an empty file if it never needs to try to actually use it? I wanna see what happens, mostly out of morbid curiosity. Let’s look at the linker error message to see where it’s looking for it, so we can try putting something there. ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-musl/lib/ looks like a likely candidate, so let’s just do touch ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-musl/lib/libgcc_s.so.1. Build our stuff, put the .so in the right place, build it some more, ignore the part where the poor Erlang VM complains [warning] The on_load function for module Elixir.HiRust returned: {:error, {:load_failed, ~c"Failed to load NIF library ~/tmp/hello_rustler/_build/x86_64_dev/lib/hello_rustler/priv/native/libhirust: '/lib/x86_64-linux-gnu/libc.so: invalid ELF header'"}} during our build process (presumably because the host’s glibc-based Erlang VM tries to load the musl-linked library while compiling), burn the image, boot the VM, and… it boots. Can I run my NIF? My poor, blighted, tragic NIF that desperately wants to just add a + b together and has no idea what all the fuss is about?

iex(1)> HiRust.add(3,4)
7

Holy sweet cats in the sky it worked. …well, “worked”. This is right up there with the time I edited an executable with vim to fix a dynamic lib path, on a production system we’d already shipped to a customer. That “worked” too.

So yeah. This is not a fix. This isn’t even a hack. I have no idea what this might blow up. I would expect it to harm nothing, but I sure as hell don’t want to find out I’m wrong about that.

(And then I paused qemu, forgetting that Nerves doesn’t like that because it makes the clock skip, and somehow unpausing it crashed my laptop’s GPU driver and I nearly lost this entire writeup. Probably the universe trying to punish me for my hubris. It’s all right, I deserved it, and I was lucky enough to save this file anyway. Suck it, universe!)

(Also, it just occurred to me that you can’t necessarily guarantee with this method that the version of musl that Rust links against is even the version of musl that Nerves actually uses. So the real answer is probably to do whatever is necessary to build stuff in Buildroot, but Buildroot apparently doesn’t yet support Rust.)

Zig

There’s a similar library to build Zig NIF’s from Elixir, called Zigler. Again the nerves_examples repo has some example code.

Its documentation is out of date with its contents, it doesn’t work, and it doesn’t work in a different way each time I try it. So I think I’m done screwing around with NIF’s for a while. My hopes for all of this were so much higher.

More qemu recipes

x86_64 VM with network

There’s actually a mix shortcut that makes you an appropriate (if basic) qemu script, when you have the MIX_TARGET env var set to x86_64: mix nerves.gen.qemu_script. It produces a script with these contents (as well as a help command and some other conveniences):

IMAGE="$1"
qemu-system-x86_64 \
    -drive file="$IMAGE",if=virtio,format=raw \
    -net nic,model=virtio \
    -net user,hostfwd=tcp::10022-:22 \
    -serial stdio

You can then ssh to the VM with ssh -p 10022 root@localhost. You can specify SSH keys in config/target.exs; by default it will look for ~/.ssh/id_{rsa,ecdsa,ed25519}.pub on your host system and put those into the target system.

No mix targets that will generate similar scripts for RPi though, alas.

x86_64 VM with network and external data drive

# Create 1 GB disk image
dd if=/dev/zero of=data.img bs=1M count=1024
# Partition it as you wish.  Can we make fwup do this somehow?
# cfdisk data.img
# Or just create the filesystem directly upon the image, with no partition table:
mkfs -t ext4 data.img


qemu-system-x86_64 \
    -drive file=hello.img,if=virtio,format=raw \
    -drive file=data.img,if=virtio,format=raw \
    -net nic,model=virtio \
    -net user,hostfwd=tcp::10022-:22 \
    -serial stdio

This will give you by default a device called /dev/vdb, with Nerves’s /dev/rootdisk0* being /dev/vda. The ordering of these is not always consistent, I suspect that qemu just puts ’em where you expect based on the ordering of the -drive args unless you specify otherwise. You can prooooobably make fwup create this image for you but idk how yet, and modifying fwup.conf is something I’d prefer to avoid unless I’m specifically making a whole new platform – which I probably would do for particular applications, but not until I knew pretty well what I was doing.

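Anyway! To mount this image on boot you edit config/target.exs and add config :nerves, :erlinit, mount: "/dev/vdb:/mnt:ext4:nodev:", and you should be good to go. That’s /dev/vdb because the drive is attached via virtio and the filesystem was created directly on the image; if you partitioned the image instead, point it at /dev/vdb1.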
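The mount string is easy to get subtly wrong, so here is a tiny helper that spells out its shape. It assumes erlinit's mount option takes five colon-separated fields (device, mount point, filesystem type, flags, options), which is my reading of the erlinit README; the function name is made up.

```shell
# erlinit_mount_spec DEVICE MOUNTPOINT FSTYPE [FLAGS] [OPTIONS]
# Prints the five colon-separated fields; empty fields are left empty,
# which is why the string ends with a trailing colon when OPTIONS is unset.
erlinit_mount_spec() {
    printf '%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "${4:-}" "${5:-}"
}

erlinit_mount_spec /dev/vdb /mnt ext4 nodev
# /dev/vdb:/mnt:ext4:nodev:
```

The printed string is what goes inside the quotes after mount: in config/target.exs.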

GUI stuff

Not many options right now. Either build a web gui with Phoenix/Liveview/your poison of choice, or use Scenic, which as of April 2024 is apparently quite lightly maintained but still alive. There’s a couple libs out there for particular models of display as well, I don’t know enough about it to judge them.

Creating new hardware system packages

The docs for this are https://hexdocs.pm/nerves/customizing-systems.html, which says that you should first read https://hexdocs.pm/nerves/systems.html.

It appears to be mainly compiling a kernel, building Erlang and Busybox, and setting up all the config and devicetrees and such correctly. A lot of the hard work is done through Buildroot, and a lot of the easy work seems to be usually done by cloning an existing nerves_system_* package and modifying it for your new target. This lets you set up a totally-customized “base” system for your Elixir environment, which is nice, and it appears pretty common to take the generic system packages and tweak them for specific devices. For example on hex.pm there’s a generic nerves_system_rpi3 and then a nerves_system_farmbot_rpi3 for Farmbot, a generic nerves_system_bbb for BeagleBone support and then a nerves_system_bbb_sgx version that includes the proprietary GPU drivers for it, stuff like that.

The docs seem pretty ok but also very target-specific, so I’m not going to go into it here. If you can install/configure u-boot and build a Linux kernel then you should prooooooobably be able to figure out how to make a nerves_system package. Not a low bar, but not a super high one either. I want to try running Nerves on a couple Rockchip devices, which don’t have existing system packages but do have good Linux support, so if I ever get around to it then I will write up something about how to do it.

Random bits and pieces

D-Bus is not normally enabled on Nerves. It may be enabled in a custom system.

I guess that makes sense, D-Bus is just Erlang messages for C programs. Oh, Nerves does have udev though. Most of the standard linux tools for using it are absent but it does have busybox’s /sbin/uevent, which gives you a (very minimal) place to start. Ahhhhh, and it also has the nerves_uevent package, which is probably what you want anyway. Niiiiiice.

There’s also nerves_time to handle time via various methods (NTP, real-time clock, etc.), and nerves_pack, which appears to collect together various… well, random bits and pieces.