RustStarterKit2020

People were arguing about Rust’s std lib recently, so I went through the Cargo.toml of all the Rust projects I’ve written since 2015 and picked out the choice tools that get used over and over again. Up to date as of October 2020.

Also see RustCrates, though that’s old. There’s also this, which is narrower but deeper, and awesome-rust, which is shallower and broader, plus assorted more specific websites for particular topics.

Dev tools

I need to set up a new Rust dev environment, what do I install?

Linting – clippy

The one, the only, the great Rust style and correctness linter. Want to learn how to write “idiomatic” Rust, or just learn more about handy little corners of the language and library? Run clippy regularly. It’s distributed with the compiler via rustup now, so you have no excuse not to.

Build cache – sccache

Or, “how to make a full rebuild 70% faster”. sccache is a build artifact cache similar to icecream or ccache, except it’s actually trivial to just use. cargo install sccache, point cargo at it with rustc-wrapper = "sccache" in the [build] section of ~/.cargo/config, and you’re ready to go. It handles most crate and compiler versioning issues for you, so it Just Works if you update crates or install a new version of rustc or something. I think I’ve had to force-clear the cache due to some build weirdness a grand total of once. It looks like it has enough features to use in a professional context as well, at least on a small-to-medium scale.

Dependency viewer – cargo-tree

Lets you easily view what dependencies you are using, what dependencies they are using, and so on. Best way to start cracking down on flabby dependencies. Since Rust 1.44 it’s built into cargo itself as cargo tree, so there’s nothing extra to install.

Benchmarking – criterion

Basically the best benchmark system out there. Incredibly simple to use, informative, and statistically sound. Doesn’t really do profiling, but it’s a good start for understanding your program’s performance, and better for proving that your implementation of X is faster than someone else’s.

Other things

Stuff that is less general purpose but occasionally very useful for the meta-programming process of choosing libraries, evaluating them, etc.

  • cargo-geiger – Measures how much unsafe code is in a codebase, and its dependencies
  • cargo-crev – A very neat tool for authoring and verifying distributed code reviews.
  • Various tools maintained by Embark Studios, useful for production/company purposes like checking licenses, pinning specific versions of crates, etc.

Fundamental algorithms

The cool stuff Real Computer Scientists write about.

Hashing

No specific crates here. There’s no single crate that provides All The Hash Algorithms, just lots of little ones that generally provide a single algorithm each. Just type the name of the algorithm you want into crates.io and you’ll get at least a couple options, choose the one with 8 million downloads or whatever. sha2, md5, crc, etc. Lots of them are written by the Rust core team.

Compression

Same as the hashing category. Type zip or bzip2 or whatever into crates.io and you’ll get what you need. flate2 might be the one crate that’s not quite trivial to find. Again, many of them are written by the Rust core team.

Encryption

I have little actual experience or authority on this topic, so I’m going to punt on this one.

Pseudorandom number generator

Use oorandom. (Disclaimer: I wrote oorandom, but people besides me seem to like it.) More often you’ll see the rand crate in use. If you’re doing Real Science and need to generate fancy probability distributions, then rand is the right tool, but most people aren’t doing that. Otherwise rand is complicated and has lots of features, while oorandom is very simple and has about two features, and I expect 80% of code to use at least one of them. rand has had several major breaking changes in its history that the rest of the ecosystem still hasn’t caught up with, while I intend oorandom’s API to change maybe twice in my lifetime. (Its version number, while obeying semver, is mostly a joke.)

There are other lightweight PRNG crates that are just fine; see oorandom’s README for a list of some others and choose one you like. Whatever you choose, use the getrandom crate to produce Real Random Seeds for it.

Utility

“I just need to solve this ooooooone common problem, but it needs to be solved WELL…”

Logging – log

Need to output log messages in your code? Why, use the log crate. Where do the log messages go? log provides only an interface, and if no logger is hooked up to it, each logging call boils down to a cheap level check. You can write your own backend for it to actually output the logs to, which is pretty easy, or use one of the small plethora of crates for it. My preferred one is pretty_env_logger, but fern, slog and others are all good too.

Parallel data crunching – rayon

Ever have some computation where you have a big list of STUFF and want to process it in parallel, farming out jobs to as many threads as you have CPUs? That’s what rayon does, and it does it really, really well. You still have to know what you’re doing, but changing a single .iter() into .par_iter() and watching your CPU-bound data-crunching run 8x faster is pretty magical. Now your CPU can help keep you warm this winter!

Please don’t use it in a library. It’s rude to spawn threads in library code, unless that’s specifically what the library is for.

Regexes – regex

To quote the inestimable jwz:

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

On the other hand, somebody’s gotta save the day. So, use the regex crate. Also use anything else written by BurntSushi. BurntSushi is a paragon of Rust program design, and also just a great human being (er, charred cuisine) in general.

Threadsafe globals – lazy_static

“I know globals are evil,” you say, “but I just need one. I’ll only use it for good, I promise.” lazy_static has your back.

May eventually be superseded by once_cell, which looks like it’s headed for inclusion into std.
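For what it’s worth, once_cell’s core idea did later land in std as std::sync::OnceLock (stable since Rust 1.70). Here’s a minimal sketch of a lazily-built global using it, with no macro or crate needed; the unit_table function and its contents are made up for illustration.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Returns a global lookup table that is built exactly once, on first use,
/// even if several threads call this at the same time.
/// (`unit_table` and its contents are hypothetical, for illustration only.)
pub fn unit_table() -> &'static HashMap<&'static str, u64> {
    static TABLE: OnceLock<HashMap<&'static str, u64>> = OnceLock::new();
    TABLE.get_or_init(|| {
        let mut m = HashMap::new();
        m.insert("kb", 1024);
        m.insert("mb", 1024 * 1024);
        m
    })
}
```

Every call after the first just hands back a reference to the same table.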

Serialization/deserialization – serde

Ever have a struct and just wanted to turn it into JSON, CBOR, XML, or some other engine of woe and devastation designed to be written to an I/O stream? Or had a blob of random JSON and wanted to just stuff it into a struct matching it? Sure you have. serde lets you do this with a single #[derive]. serde is without a doubt one of Rust’s killer libraries. It is better than any other serialization system I have ever used.

What data formats does it support? Anything; the actual reading and writing is done by plugin libraries. There’s a wide selection of them, of varying quality, and writing your own is a little tedious but not terribly difficult.

Error handling

This spot deliberately left blank.

Rust’s Result<T,E> type is one of the best setups for lightweight, transparent error handling I’ve seen, but it doesn’t do everything. How do you easily write your own error type without a bunch of boilerplate? What if you have multiple different error types from different libraries you want to coalesce together? How do you collect a backtrace of every function an Err is returned through, so you can find the root cause of where it came from? Can we do all this without allocating anything unnecessarily? And so on.

There have been various crates to try to solve these problems. First in 2015 there was error_chain, which was complicated and not very convenient. Then in 2017 there was failure, which was simpler but not very flexible, and which took an irritatingly long time to compile. Then in 2019 there was anyhow, which was about the time I stopped paying attention. Now apparently the new kid on the block is eyre, and I’m sure that in another year or two there will be something else.

So, I just write the boilerplate and make my errors descriptive enough that I don’t need a backtrace. When I want to get fancy I implement the built-in Error trait, which used to be kinda useless but is now more helpful, for example via its source() method for reporting an underlying cause. And in another five years it’ll still work just fine.
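Here’s roughly what that boilerplate looks like. This is a minimal, std-only sketch of a hand-written error enum implementing Display and Error, with source() exposing the wrapped cause. ConfigError and read_port are hypothetical names, not from any library.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type for a config loader: one variant per failure mode.
#[derive(Debug)]
pub enum ConfigError {
    /// A required key was missing; carries the key name for a descriptive message.
    MissingKey(String),
    /// A value failed to parse; wraps the underlying std error.
    BadNumber { key: String, source: ParseIntError },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::MissingKey(k) => write!(f, "missing config key `{}`", k),
            ConfigError::BadNumber { key, source } => {
                write!(f, "key `{}` is not a valid integer: {}", key, source)
            }
        }
    }
}

impl Error for ConfigError {
    // Expose the wrapped error so callers can walk the chain of causes.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            ConfigError::MissingKey(_) => None,
            ConfigError::BadNumber { source, .. } => Some(source),
        }
    }
}

/// Finds `key` in `key=value` lines and parses its value as a port number.
pub fn read_port(text: &str, key: &str) -> Result<u16, ConfigError> {
    let raw = text
        .lines()
        .filter_map(|line| line.split_once('='))
        .find(|(k, _)| k.trim() == key)
        .map(|(_, v)| v.trim())
        .ok_or_else(|| ConfigError::MissingKey(key.to_string()))?;
    raw.parse::<u16>().map_err(|e| ConfigError::BadNumber {
        key: key.to_string(),
        source: e,
    })
}
```

It’s a bit of dull typing, but it has no dependencies and uses nothing but stable std APIs.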

Byte mucking – bytemuck

For the rare occasions you need to turn a structure into an arbitrary &[u8] or back. Doing this using unsafe pointers is quite easy, and also makes it very easy to screw up horribly with Undefined Behavior galore. (Did you know that reading a struct’s padding bytes as u8 is UB? You do now.) bytemuck lets you muck around with bytes a little more responsibly.

Human dates and times – chrono

Rust’s std::time doesn’t really handle calendars or time zones: you get monotonic Instants, measurable Durations between them, and a bare SystemTime timestamp, and that’s about it. Nice, pure, computationally-robust time measurement. For all the nasty human calendar and timezone stuff, you use chrono. (And maybe humantime, but I personally reach for chrono first, just out of habit.)
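To make the split concrete, here’s a small std-only sketch: Instant for measuring how long something took, and SystemTime for a raw seconds-since-epoch timestamp, which is about as far toward human time as std goes. The helper names time_it and unix_seconds are hypothetical.

```rust
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

/// Runs `f` and reports how long it took, using the monotonic clock.
/// `Instant` never goes backwards, so it is the right tool for measuring intervals.
pub fn time_it<R>(f: impl FnOnce() -> R) -> (R, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}

/// Seconds since the Unix epoch from the wall clock. Turning this into a
/// calendar date or a local time zone is the job of a crate like chrono.
pub fn unix_seconds() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before 1970")
        .as_secs()
}
```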

Bit flags – bitflags

Defining type-safe bit masks in a reasonably convenient way. Not always worth the trouble, but sometimes quite handy.
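For a sense of what the macro saves you from, here’s a hand-rolled version of the pattern: a newtype over an integer with named constants, | for combining, and a contains check. This is a sketch of the idea, not the bitflags crate’s generated code; Perms and its flags are made up.

```rust
use std::ops::BitOr;

/// Hand-rolled type-safe bit flags over a u8 (a hypothetical example type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Perms(u8);

impl Perms {
    pub const READ: Perms = Perms(0b001);
    pub const WRITE: Perms = Perms(0b010);
    pub const EXEC: Perms = Perms(0b100);

    /// True if every bit set in `other` is also set in `self`.
    pub fn contains(self, other: Perms) -> bool {
        self.0 & other.0 == other.0
    }
}

impl BitOr for Perms {
    type Output = Perms;
    // Combining flags is just OR-ing the underlying bits.
    fn bitor(self, rhs: Perms) -> Perms {
        Perms(self.0 | rhs.0)
    }
}
```

Because Perms is its own type, you can’t accidentally OR it with some unrelated integer mask, which is the whole point.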

Data, data types and data formats

“I have to create or read a…”

PNG/JPEG/GIF/etc – image

General-purpose loading and saving of images, which can handle a lot of formats. Can do some amount of image manipulation as well, such as cropping, smoothing, etc., but that will hopefully be pulled out into its own library at some point soon.

Small data THINGS – uuid, base64, csv, semver

Exactly what it says on the tin.

Media codecs – Various

Not aware of any great encoders, but there’s plenty of decoders for common audio formats. lewton for Ogg Vorbis, hound for .wav, minimp3 for MP3, claxon for FLAC. Video, I haven’t used enough to have an opinion on.

Config files – toml

For all your config file format needs. Works with serde, naturally.

Markdown – pulldown-cmark

There are several good Markdown readers and writers, but pulldown-cmark is my favorite. It supports CommonMark, it’s simple to use, and it’s pure Rust.

Templating – askama

There are several quite good text templating engines, but askama IMO rises above them all by compiling your templates into Rust code and type-checking them at compile time. Sometimes this isn’t what you want, but it is a great feature surprisingly often. This also makes it super fast, for when you really need to generate 150,000 HTML pages in a couple minutes. For templates that can be altered at runtime, on the other hand, tera is my default pick.

Lexing/parsing

For lexing, I have never seen anything better than logos. Derive a trait on your token type, add a few annotations, and you have a lexer.

There are many good parser libraries out there in different styles: nom, pom, lalrpop, pest, combine… They’re mostly quite good, in my experience, but in practice I tend to either use Known Formats like CBOR or TOML, or I write a parser by hand. Play around and find what you like.
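As an example of the “write it by hand” route, here’s a minimal recursive-descent parser for integer arithmetic with +, * and parentheses, one function per grammar rule. It’s an illustrative sketch (Parser and eval are hypothetical names), and it doesn’t guard against integer overflow.

```rust
/// Hand-written recursive-descent parser/evaluator. Grammar:
///   expr   := term ('+' term)*
///   term   := factor ('*' factor)*
///   factor := number | '(' expr ')'
pub struct Parser<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    pub fn new(src: &'a str) -> Self {
        Parser { bytes: src.as_bytes(), pos: 0 }
    }

    /// Skips spaces and returns the next byte without consuming it.
    fn peek(&mut self) -> Option<u8> {
        while self.bytes.get(self.pos) == Some(&b' ') {
            self.pos += 1;
        }
        self.bytes.get(self.pos).copied()
    }

    fn expr(&mut self) -> Result<i64, String> {
        let mut value = self.term()?;
        while self.peek() == Some(b'+') {
            self.pos += 1;
            value += self.term()?;
        }
        Ok(value)
    }

    fn term(&mut self) -> Result<i64, String> {
        let mut value = self.factor()?;
        while self.peek() == Some(b'*') {
            self.pos += 1;
            value *= self.factor()?;
        }
        Ok(value)
    }

    fn factor(&mut self) -> Result<i64, String> {
        match self.peek() {
            Some(b'(') => {
                self.pos += 1;
                let value = self.expr()?;
                if self.peek() != Some(b')') {
                    return Err(format!("expected ')' at byte {}", self.pos));
                }
                self.pos += 1;
                Ok(value)
            }
            Some(c) if c.is_ascii_digit() => {
                let start = self.pos;
                while self.bytes.get(self.pos).map_or(false, |b| b.is_ascii_digit()) {
                    self.pos += 1;
                }
                // The slice holds only ASCII digits, so from_utf8 cannot fail.
                let digits = std::str::from_utf8(&self.bytes[start..self.pos]).unwrap();
                digits.parse().map_err(|e| format!("bad number {:?}: {}", digits, e))
            }
            other => Err(format!("unexpected {:?} at byte {}", other.map(char::from), self.pos)),
        }
    }
}

/// Parses and evaluates a whole string, rejecting trailing garbage.
pub fn eval(src: &str) -> Result<i64, String> {
    let mut p = Parser::new(src);
    let value = p.expr()?;
    match p.peek() {
        None => Ok(value),
        Some(c) => Err(format!("trailing input {:?} at byte {}", char::from(c), p.pos)),
    }
}
```

Operator precedence falls straight out of which function calls which: term binds tighter than expr because expr is built out of terms.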

Data structures

Rust’s standard library includes most data structures you need: hash map, ring buffer, all the good things. It can’t include everything though, so here are a few things to fill in the cracks.

Immutable data structures – im

Fast immutable data structures. Rust is a little weird, ’cause the borrowing and ownership actually makes immutable data structures both difficult and usually unnecessary – things can’t be mutated without you knowing it, because the borrow checker won’t let you. This makes trying to write code in a purely functional style feel weird. However, there’s some situations where immutable data structures are actually really nice – the one I ran into was passing shared data structures around in an Erlang-like multithreaded actor system. For those cases, im is the way to go.

Generational map – genmap

What I call a “generational map”; I’ve also seen it called a slot map, handle map, and some other things I can’t remember. This is very much not a new structure, but I haven’t seen it often outside of video game code. Like oorandom, this particular crate is my own implementation, but there are several others that are good for different use cases; see the crate’s README file for links to some others.

So what does a generational map do? It’s basically an array where you insert things and get an opaque handle back, which is mostly just the array index. This is a quite useful pattern in Rust code when you have a collection of things with complicated dynamically-checked lifetimes, but using an Rc for them feels icky. The improvement over plain array indices is that each handle also stores a “generation” count, and the map’s generation counter increments each time you remove an object from the array. So when you remove an item, you can put a new one in the same slot and reuse that storage, but the new object’s handle will have a different generation. If you then try to look up an object with an old handle that used to refer to something else, instead of silently getting the wrong item, you get a runtime error. It’s kind of the inverse of an Rc, which promises that an object will never be freed until there are no more pointers referring to it: a generational map lets you free an object, but trying to use a dangling pointer is a safe runtime error.
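A minimal std-only sketch of the idea, assuming a single map-wide generation counter as described above. This is not genmap’s actual code or API; GenMap, Handle and the method names are hypothetical. A stale handle shows up here as get or remove returning None.

```rust
use std::mem;

/// Opaque handle: an array index plus the generation it was issued in.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handle {
    index: usize,
    generation: u64,
}

enum Slot<T> {
    Occupied { generation: u64, value: T },
    Free,
}

pub struct GenMap<T> {
    slots: Vec<Slot<T>>,
    free: Vec<usize>, // indices of Free slots available for reuse
    generation: u64,  // bumped on every removal
}

impl<T> GenMap<T> {
    pub fn new() -> Self {
        GenMap { slots: Vec::new(), free: Vec::new(), generation: 0 }
    }

    /// Stores `value`, reusing a freed slot if there is one.
    pub fn insert(&mut self, value: T) -> Handle {
        let generation = self.generation;
        let slot = Slot::Occupied { generation, value };
        let index = match self.free.pop() {
            Some(i) => {
                self.slots[i] = slot;
                i
            }
            None => {
                self.slots.push(slot);
                self.slots.len() - 1
            }
        };
        Handle { index, generation }
    }

    /// Returns the value only if the handle's generation still matches its slot.
    pub fn get(&self, h: Handle) -> Option<&T> {
        match self.slots.get(h.index) {
            Some(Slot::Occupied { generation, value }) if *generation == h.generation => Some(value),
            _ => None,
        }
    }

    /// Removes the value; afterwards every copy of `h` is stale.
    pub fn remove(&mut self, h: Handle) -> Option<T> {
        self.get(h)?; // bail out early on a stale or bogus handle
        self.free.push(h.index);
        self.generation += 1;
        match mem::replace(&mut self.slots[h.index], Slot::Free) {
            Slot::Occupied { value, .. } => Some(value),
            Slot::Free => None, // unreachable: `get` just confirmed the slot was occupied
        }
    }
}
```

Since a slot is only reused after a removal, and every removal bumps the counter, a reused slot can never hand out the same (index, generation) pair twice.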

Stable hashtables – fnv

fnv is a fast, stable hash algorithm, and the crate includes simple wrappers for using it with Rust’s standard HashMap and HashSet. Rust’s default HashMap is not “stable”, which is to say, it contains an element of randomness: each map gets randomly seeded hash keys. If you make two different HashMaps and insert the exact same values into them in the same order, they will usually be stored in a different order internally, which prevents Hash DoS attacks. This is the sane default you want, but sometimes you want a stable hash map (so giving it the same contents results in the same ordering) or you need slightly higher performance. The fnv crate gives you both, at the cost of that DoS protection, so don’t feed it untrusted keys. The main time I wanted it was writing a small compiler where I wanted reproducible builds.
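To show what “stable” buys you, here’s a hand-written 64-bit FNV-1a hasher plugged into the standard HashMap through BuildHasherDefault, std only. The fnv crate packages a hasher along these lines plus ready-made FnvHashMap and FnvHashSet aliases; Fnv1a and StableHashMap below are hypothetical names for this sketch.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// 64-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime.
/// No random seed, so the same keys always hash the same way on every run.
pub struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A HashMap whose layout depends only on what was inserted, and in what order.
pub type StableHashMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;
```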

Multithreaded hashmap – dashmap

I’ve never actually used this but have always wanted to find an excuse to. Basically a replacement for Mutex<HashMap<K,V>> that breaks the hashmap into several shards, each with its own lock, reducing contention and increasing performance. There are a few similar crates out there, some of which are lockless, but like I said I’ve never had an excuse to use dashmap, so I haven’t explored alternatives either.
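The sharding idea itself is simple enough to sketch with std alone: several Mutex<HashMap> shards, with the key’s hash picking which lock to take. This is a toy to show the concept, not how dashmap is actually implemented; ShardedMap and its methods are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// Toy sharded map: independent Mutex<HashMap> shards chosen by key hash,
/// so threads touching different shards don't wait on each other.
pub struct ShardedMap<K, V> {
    shards: Vec<Mutex<HashMap<K, V>>>,
}

impl<K: Hash + Eq, V: Clone> ShardedMap<K, V> {
    pub fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        ShardedMap {
            shards: (0..shard_count).map(|_| Mutex::new(HashMap::new())).collect(),
        }
    }

    /// Picks the shard for a key. DefaultHasher::new() always starts from the
    /// same keys, so a given key lands in the same shard every time.
    fn shard(&self, key: &K) -> &Mutex<HashMap<K, V>> {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        &self.shards[(h.finish() as usize) % self.shards.len()]
    }

    /// Locks only the one shard that owns `key`.
    pub fn insert(&self, key: K, value: V) -> Option<V> {
        let mut shard = self.shard(&key).lock().unwrap();
        shard.insert(key, value)
    }

    /// Returns a clone so no lock is held after the call returns.
    pub fn get(&self, key: &K) -> Option<V> {
        let shard = self.shard(key).lock().unwrap();
        shard.get(key).cloned()
    }
}
```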

Thread pool – ???

If you’re just feeding data to a fixed number of worker threads, using a thread pool of some kind is generally simpler and more convenient than herding threads around yourself. There are several thread pool crates with good reviews, but I haven’t used any often enough to have specific recommendations. There are also various scoped thread pools, which make sure that the threads never outlive the current scope. This lets you feed references to local data into the threads, instead of forcing you to clone or Arc them.
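Not a pool, but the scoped-borrowing part is now in std itself: std::thread::scope (stable since Rust 1.63) joins every thread it spawns before returning, so those threads can borrow local data directly. A minimal sketch; parallel_sum is a hypothetical helper that splits a slice into one chunk per worker.

```rust
use std::thread;

/// Sums `data` on up to `workers` threads. Because thread::scope joins every
/// thread before returning, the workers can borrow `data`: no clone, no Arc.
pub fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Round up so every element lands in some chunk; chunks(0) would panic.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```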

Networking

“I need something fancier than simple network sockets but I’m not writing an HTTP service.”

HTTP client – reqwest

Need to fetch something via HTTP? reqwest is the way to do it. Not the most lightweight of things since it uses tokio to drive its I/O (see below), but it’s fast, robust, and can do about anything.

Lightweight alternative to try out: ureq. Haven’t used it personally yet, though.

Application dev

“Okay, I’m making an interactive program, so it needs to be able to…”

Command line parsing – structopt

Like serde, I consider this one of Rust’s killer libraries. No other language has anything nearly as good. You define a Rust struct and throw some annotations onto it, and you now have a command line parser. It’s amazing. Its downside is that it’s relatively slow to compile and doesn’t produce the tiniest code, though it’s not too bad. Lightweight alternative: argh. You can get more lightweight with things like clap or argparse, but those don’t have the macro-based goodness of structopt, and I can’t live without that now.

Env file loading – dotenv

Load options from environment variables or .env files in the local directory. This is generally what server systems do for configuration so they don’t have to store usernames and passwords in config files that can get accidentally checked into version control. Very handy pattern if you’re making something that needs to touch a remote database or web service. It’s trivial to implement on your own, but it’s already been implemented once, so why redo the work?
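To back up the “trivial to implement” claim, here’s a deliberately simplified sketch of the parsing half: KEY=VALUE lines, # comments, optional double quotes. The real dotenv crate handles more edge cases than this; parse_env and lookup are hypothetical names, and preferring the real environment over the file in lookup is just one reasonable convention.

```rust
use std::collections::HashMap;

/// Parses .env-style text: one KEY=VALUE per line, blank lines and `#`
/// comments skipped, surrounding double quotes on the value stripped.
pub fn parse_env(text: &str) -> HashMap<String, String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('=')) // lines without '=' are ignored
        .map(|(key, value)| {
            let value = value.trim();
            let value = value
                .strip_prefix('"')
                .and_then(|v| v.strip_suffix('"'))
                .unwrap_or(value);
            (key.trim().to_string(), value.to_string())
        })
        .collect()
}

/// Checks the real process environment first, then falls back to the file.
pub fn lookup(file_vars: &HashMap<String, String>, key: &str) -> Option<String> {
    std::env::var(key).ok().or_else(|| file_vars.get(key).cloned())
}
```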

Cross-platform config file locations – directories

The dual of dotenv, it gives you an easy, cross-platform way to access config files in the Right Place on whatever platform you’re using.

Interactive CLI – rustyline

Rust equivalent of the readline library. readline (or a lighter-weight workalike such as linenoise) is what bash and just about every other command line program in existence uses to give you an editable prompt that lets you backspace, jump to beginning/end of lines, search backwards through your history, and so on. rustyline does the same thing without needing to bind to a C library.

Progress bar – pbr

Displays a progress bar in the console, with simple configurable styles. It’s amazingly handy.

Web stuff

I am really not the person to ask here, and what knowledge I do have is pretty out of date.

Databases

Again, I am the wrong person to ask, but in my modest experience… diesel is a very good, database-independent query builder. Database-independent query builders/ORM’s/etc also might be a bit of a white elephant. Usually life gets simpler when I just find a crate that interfaces with Postgres or SQLite directly, and I end up making better systems when I do it in terms of SQL directly. diesel’s migrations are pretty nice, though. I dunno.

Things I avoid

Because every positive is incomplete without its negative. These are things that are popular, and are definitely not bad code or bad designs, but which I will personally choose not to use without a good reason. There are good reasons to use them, but most of the time if I see these in a project’s dependency tree I wince a little and expect excess complexity, two minute compiles, or inconvenient breaking changes within 6-12 months.

Again, these are from my personal experience and priorities. Your needs are not necessarily mine.

ring

A very correct, very complete, very rigorous encryption library. Due to the level of rigor involved, and the (ahem) personality of the people required to maintain that level of rigor, they have an Interesting policy regarding releases. Plus, for technical reasons, you can’t have multiple versions of ring linked into a single program like you can with most crates: there’s some asm embedded in there, and the symbols in the asm code don’t get mangled by rustc and so can clash. So if you write a program that uses ring, and it has a dependency that uses ring, they must always agree on a single version of it or else your program will not compile, and the ring developers will shrug and say “that’s your problem, we aren’t going to make life easier for you”. It used to be that all but the most current versions of ring were deliberately yanked by the developers: marked “do not use” on crates.io, which left them usable by projects whose Cargo.lock already pinned them, but cargo would refuse to pick them for any new dependency resolution. That just made the problem less tractable, and really irked me once upon a time, though fortunately they seem to have given up on this madness more recently.

What this adds up to is, if you use ring, then having your application compile and function from one day to the next is always less important than ring working correctly, for the value of “correctly” defined by the library devs, and you will get exactly zero warning or sympathy when someone else decides to break your code forever. This is a perfectly reasonable and robust approach to the problem of dealing with security in human-built tools, and I hate it. Best to just move on and deal with other things.

tokio

Tokio is a multi-threaded runtime for fast async I/O, particularly for network servers. It is also a ton of work to use, a ton of compile time, and a ton of learning to do. If you need it, it’s fantastic, but if you only kindasorta need it, I find it less work to pass it by. I’ll spawn a thread per connection, I ain’t scared! On the other hand, if you already use tokio all the time and you’re already on top of the learning curve, then it’s probably pretty easy to just whip it out whenever you need something I/O-ish. Lightweight alternatives to try out sometime: smol, popol.
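For the record, thread-per-connection with nothing but std looks something like this: a minimal echo server sketch where the OS scheduler is your async runtime. handle and serve are hypothetical names, and there’s no limit on the number of threads, which is fine for modest loads and not so fine for ten thousand simultaneous clients.

```rust
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Echoes bytes back to one client until it closes the connection.
fn handle(mut stream: TcpStream) -> io::Result<()> {
    let mut buf = [0u8; 1024];
    loop {
        let n = stream.read(&mut buf)?;
        if n == 0 {
            return Ok(()); // client hung up
        }
        stream.write_all(&buf[..n])?;
    }
}

/// Accepts connections forever, giving each one its own OS thread.
pub fn serve(listener: TcpListener) {
    for stream in listener.incoming() {
        match stream {
            Ok(stream) => {
                thread::spawn(move || {
                    if let Err(e) = handle(stream) {
                        eprintln!("connection error: {}", e);
                    }
                });
            }
            Err(e) => eprintln!("accept failed: {}", e),
        }
    }
}
```

Each handler is plain blocking code, which is most of the appeal: no executor, no Pin, no futures.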

proc-macro2/syn

These are the foundational crates of Rust’s procedural macro system. They are kindasorta part of the compiler, but not actually distributed as part of the compiler for Reasons I haven’t bothered finding out. Procedural macros are what make it possible to have awesome metaprogramming tools like serde, structopt and logos, and these crates are the glue that binds it all together. They are also slow to compile, no two ways about it, and expanding procedural macros is slow too. So if you need to make a tool that does awesome metaprogramming, or you need to use a tool that does awesome metaprogramming, then go ahead and use these. But IMO it’s not worth the extra twenty seconds of compile time for a small convenience crate like derive_more.

num

A crate containing lots of numerical traits for compile-time metaprogramming, like Integer, Unsigned, etc. That’s cool, but it’s a large, complicated, slow-compiling dependency that evolves breaking changes just fast enough to be irritating. I’ve written and used a fair amount of math code of various types, and I’ve never actually wanted or needed num’s metaprogramming traits, but those are the parts that infiltrate the dependency trees of everything ever. On the flip side though, num also provides quite nice implementations of fancier numbers like ratios, complex numbers, bigints, etc., so it’s totally worth using for those.

crossbeam

Additional concurrency primitives, some of them faster than what the standard library provides. Multiple-producer-multiple-consumer queues, exponential backoff helpers, fancy atomic things, huzzah! Great stuff, but also 95% of the time you aren’t gonna need it. If you profile your program and discover you really need to be spending 3% of your time on mpsc::send() instead of 5%, then you can always just drop crossbeam in. But until then, if you can do your stuff with just what’s already in std, please do.
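For example, multiple producers feeding one consumer needs nothing beyond std::sync::mpsc: clone the Sender once per producer thread and iterate the Receiver. A minimal sketch; sum_from_producers is a hypothetical helper.

```rust
use std::sync::mpsc;
use std::thread;

/// Several producer threads feeding one consumer through a plain std channel.
/// Each producer gets its own clone of the Sender.
pub fn sum_from_producers(producers: u64, per_producer: u64) -> u64 {
    let (tx, rx) = mpsc::channel();
    for p in 0..producers {
        let tx = tx.clone();
        thread::spawn(move || {
            for i in 0..per_producer {
                tx.send(p * per_producer + i).expect("receiver hung up");
            }
        });
    }
    // Drop the original Sender so the loop below ends once every producer is done.
    drop(tx);
    rx.iter().sum()
}
```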

parking_lot

More compact and efficient implementations of the standard synchronization primitives. Which, like crossbeam, you aren’t going to need. If benchmarks show that Rust’s standard synchronization primitives are too slow for your application, then go for it, but the time it takes to lock or unlock a Mutex has never been a bottleneck for me. Okay, it has once, but I switched to using rayon and it did everything I needed better.

Also, if parking_lot’s implementation of this stuff were unilaterally better, it’d be in std already, so I wonder what the catch is. – Update: Looks like the whole story is here, and a more actionable summary is here.