RustStarterKit2020
People were arguing about Rust’s std lib recently, so I went through
the Cargo.toml
of all the Rust projects I’ve written since
2015 and picked out the choice tools that get used over and over again.
Up to date as of October 2020.
Also see RustCrates, though that’s old. There’s also this, which is narrower but deeper, and awesome-rust, which is shallower and broader, and the various more specific websites for various topics.
Dev tools
I need to set up a new Rust dev environment, what do I install?
Linting – clippy
The one, the only, the great Rust style and correctness linter. Want
to learn how to write “idiomatic” Rust, or just learn more about handy
little corners of the language and library? Run clippy
regularly. It’s distributed with the compiler via rustup
now, so you have no excuse not to.
Build cache – sccache
Or, “how to make a full rebuild 70% faster”. sccache is a build artifact cache similar to icecream or ccache, except it’s actually trivial to just use. cargo install sccache, add a single line in a home dir config file, and you’re ready to go. Pretty much handles most crate and compiler versioning issues for you, so it Just Works if you update crates or install a new version of rustc or something. I think I’ve had to force-clear the cache due to some build weirdness a grand total of once. Looks like it has enough features to use in a professional context as well, at least on a small-to-medium scale.
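For reference, a sketch of the config entry in question. This assumes a Cargo recent enough to read ~/.cargo/config.toml (older versions read ~/.cargo/config instead) and that the sccache binary is on your PATH; otherwise use the full path to it.

```toml
# ~/.cargo/config.toml
[build]
rustc-wrapper = "sccache"
```

Setting the RUSTC_WRAPPER environment variable to sccache should have the same effect, if you would rather not touch a config file.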
Dependency viewer – cargo-tree
Lets you easily view what dependencies you are using, and what dependencies they are using, and so on. Best way to start cracking down on flabby dependencies.
Benchmarking – criterion
Basically the best benchmark system out there. Incredibly simple to use, informative, and statistically sound. Doesn’t really do profiling, but it’s a good start for understanding your program’s performance, and better for proving that your implementation of X is faster than someone else’s.
Other things
Stuff that is less general purpose but occasionally very useful for the meta-programming process of choosing libraries, evaluating them, etc.
- cargo-geiger – Measures how much unsafe code is in a codebase and its dependencies.
- cargo-crev – A very neat tool for authoring and verifying distributed code reviews.
- Various tools maintained by Embark Studios, useful for production/company purposes like checking licenses, pinning specific versions of crates, etc.
Fundamental algorithms
The cool stuff Real Computer Scientists write about.
Hashing
No specific crates here. There’s no single crate that provides All The Hash Algorithms, just lots of little ones that generally provide a single algorithm each. Just type the name of the algorithm you want into crates.io and you’ll get at least a couple options, choose the one with 8 million downloads or whatever. sha2, md5, crc, etc. Lots of them are written by the Rust core team.
Compression
Same as the hashing category. Type zip or bzip2 or whatever into crates.io and you’ll get what you need. flate2 might be the one crate that’s not quite trivial to find. Again, many of them are written by the Rust core team.
Encryption
I have little actual experience or authority on this topic, so I’m going to punt on this one.
Pseudorandom number generator
Use oorandom. (Disclaimer, I wrote oorandom, but people besides me seem to like it.) More usually you’ll see the rand crate in use. If you’re doing Real Science and need to generate fancy probabilities, then rand is the right tool, but most people aren’t doing that. Otherwise rand is complicated and has lots of features, while oorandom is very simple and has about two features, and I expect 80% of code to use at least one of them. rand has had several major breaking changes in its history that the rest of the ecosystem still hasn’t caught up with, while I intend oorandom’s API to change maybe twice in my lifetime. (Its version number, while obeying semver, is mostly a joke.)
There’s other lightweight PRNG crates that are just fine; see oorandom’s readme for a list of some others and choose one you like. Whatever you choose, use the getrandom crate to produce Real Random Seeds for it.
Utility
“I just need to solve this ooooooone common problem, but it needs to be solved WELL…”
Logging – log
Need to output log messages in your code? Why, use the log crate. Where do the log messages go? log provides only an interface, and that interface compiles to nothing if it isn’t used. You can write your own system for it to actually output the logs to, which is pretty easy, or use one of the small plethora of crates for it. My preferred one is pretty_env_logger, but fern, slog and others are all good too.
Parallel data crunching – rayon
Ever have some computation where you have a big list of STUFF and want to process it in parallel, farming out jobs to as many threads as you have CPUs? That’s what rayon does, and it does it really, really well. You still have to know what you’re doing, but changing a single .iter() into .par_iter() and watching your CPU-bound data-crunching run 8x faster is pretty magical. Now your CPU can help keep you warm this winter!
Please don’t use it in a library. It’s rude to spawn threads in library code, unless that’s specifically what the library is for.
Regexes – regex
To quote the inestimable jwz:
Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.
On the other hand, somebody’s gotta save the day. So, use the regex crate. Also use anything else written by BurntSushi. BurntSushi is a paragon of Rust program design, and also just a great human being (or rather, charred cuisine) in general.
Threadsafe globals – lazy_static
“I know globals are evil,” you say, “but I just need one. I’ll only use it for good, I promise.” lazy_static has your back. May eventually be superseded by once_cell, which looks like it’s headed for inclusion into std.
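For what it’s worth, the core of once_cell did end up in the standard library after this was written, as std::sync::OnceLock. A minimal sketch of the “just one global” pattern with it, assuming a toolchain new enough to have OnceLock (Rust 1.70 or later); the registry and bump names are made up for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Returns the one global registry, creating it on first access.
/// (`registry` and `bump` are hypothetical names for this sketch.)
fn registry() -> &'static Mutex<HashMap<String, u32>> {
    static REGISTRY: OnceLock<Mutex<HashMap<String, u32>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Bumps the counter stored under `name` and returns the new value.
fn bump(name: &str) -> u32 {
    let mut map = registry().lock().unwrap();
    let count = map.entry(name.to_string()).or_insert(0);
    *count += 1;
    *count
}

fn main() {
    bump("requests");
    let total = bump("requests");
    println!("requests = {}", total);
}
```

The initializer closure runs at most once even if several threads race to call registry(), which is the same guarantee lazy_static gives you.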
Serialization/deserialization – serde
Ever have a struct and just wanted to turn it into JSON, CBOR, XML, or some other engine of woe and devastation designed to be written to an I/O stream? Or had a blob of random JSON and wanted to just stuff it into a struct matching it? Sure you have. serde lets you do this with a single #[derive]. serde is without a doubt one of Rust’s killer libraries. It is better than any other serialization system I have ever used.
What data formats does it support? Anything; the actual reading and writing is done via plugin library. There’s a wide selection of them, of varying quality, and writing your own is a little tedious but not terribly difficult.
Error handling
This spot deliberately left blank.
Rust’s Result<T,E> type is one of the best setups for lightweight, transparent error handling I’ve seen, but it doesn’t do everything. How do you easily write your own error type without a bunch of boilerplate? What if you have multiple different error types from different libraries you want to coalesce together? How do you collect a backtrace of every function an Err is returned through, so you can find the root cause of where it came from? Can we do all this without allocating anything unnecessarily? And so on.
There have been various crates to try to solve these problems. First in 2015 there was error_chain, which was complicated and not very convenient. Then in 2017 there was failure, which was simpler but not very flexible, and which took an irritatingly long time to compile. Then in 2019 there was anyhow, which was about the time I stopped paying attention. Now apparently the new kid on the block is eyre, and I’m sure that in another year or two there will be something else.
So, I just write the boilerplate and make my errors descriptive enough I don’t need a backtrace. When I want to get fancy I implement the built-in Error trait, which used to be kinda useless but is now more helpful. And in another five years it’ll still work just fine.
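To give a sense of how much boilerplate that actually is, here is a minimal sketch using only std. ConfigError and parse_port are hypothetical names invented for the example, not anything from a real project.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type for a tiny config loader.
#[derive(Debug)]
enum ConfigError {
    /// A required key was not present.
    Missing(String),
    /// A value was present but was not a valid integer.
    BadNumber { key: String, source: ParseIntError },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::Missing(key) => write!(f, "missing config key `{}`", key),
            ConfigError::BadNumber { key, source } => {
                write!(f, "config key `{}` is not a number: {}", key, source)
            }
        }
    }
}

impl Error for ConfigError {
    /// Exposes the underlying parse error so callers can walk the cause chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            ConfigError::Missing(_) => None,
            ConfigError::BadNumber { source, .. } => Some(source),
        }
    }
}

/// Parses `value` (the raw text for `key`, if any) as a port number.
fn parse_port(key: &str, value: Option<&str>) -> Result<u16, ConfigError> {
    let raw = value.ok_or_else(|| ConfigError::Missing(key.to_string()))?;
    raw.trim().parse::<u16>().map_err(|source| ConfigError::BadNumber {
        key: key.to_string(),
        source,
    })
}

fn main() {
    match parse_port("port", Some("80a")) {
        Ok(port) => println!("port = {}", port),
        Err(err) => println!("error: {}", err),
    }
}
```

The error message names the offending key, which is the “descriptive enough I don’t need a backtrace” part.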
Byte mucking – bytemuck
For the rare occasions you need to turn a structure into arbitrary &[u8] or back. Doing this using unsafe pointers is quite easy, and also makes it very easy to screw up horribly with Undefined Behavior galore. (Did you know that reading a struct’s padding bytes through a &[u8] is UB, because padding is uninitialized memory? You do now.) bytemuck lets you muck around with bytes a little more responsibly.
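This is not how bytemuck works, but for contrast, here is a sketch of the boring std-only alternative: write each field out explicitly, so no padding byte is ever touched. The Header type and its 6-byte layout are assumptions made up for the example.

```rust
use std::convert::TryInto;

/// Encoded size: 2 bytes of version plus 4 bytes of length, no padding.
const HEADER_SIZE: usize = 6;

/// Hypothetical record header; the layout is invented for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Header {
    version: u16,
    length: u32,
}

impl Header {
    /// Writes each field as little-endian bytes, one after another.
    fn to_bytes(self) -> [u8; HEADER_SIZE] {
        let mut out = [0u8; HEADER_SIZE];
        out[..2].copy_from_slice(&self.version.to_le_bytes());
        out[2..].copy_from_slice(&self.length.to_le_bytes());
        out
    }

    /// Reads the fields back; returns `None` if `bytes` has the wrong length.
    fn from_bytes(bytes: &[u8]) -> Option<Header> {
        if bytes.len() != HEADER_SIZE {
            return None;
        }
        Some(Header {
            version: u16::from_le_bytes(bytes[..2].try_into().ok()?),
            length: u32::from_le_bytes(bytes[2..].try_into().ok()?),
        })
    }
}

fn main() {
    let header = Header { version: 3, length: 1024 };
    let bytes = header.to_bytes();
    println!("{:?} -> {:?} -> {:?}", header, bytes, Header::from_bytes(&bytes));
}
```

This costs a copy and some typing per field, which is exactly the tedium a crate like bytemuck is meant to save you from when the struct layout already matches the bytes you want.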
Human dates and times – chrono
Rust’s std::time doesn’t really handle calendar or wall-clock times, just arbitrary, monotonic Instants and measurable Durations between them. Nice, pure, computationally-robust time measurement. For all the nasty human calendar and timezone stuff, you use chrono. (And maybe humantime, but I personally reach for chrono first, just out of habit.)
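For comparison, this is roughly all that std::time is for: measuring how long something took. A small sketch; the timed helper is a made-up name.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Runs `work` and returns its result together with how long it took.
/// (`timed` is a hypothetical helper for this sketch.)
fn timed<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = work();
    (result, start.elapsed())
}

fn main() {
    let (sum, elapsed) = timed(|| {
        thread::sleep(Duration::from_millis(20));
        (1..=100u64).sum::<u64>()
    });
    println!("sum = {} computed in {:?}", sum, elapsed);
}
```

Note that an Instant has no notion of a date or time zone; it is only meaningful relative to another Instant, which is why the calendar stuff lives in a separate crate.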
Bit flags – bitflags
Defining type-safe bit-masks in a reasonably convenient way. Not always worth the trouble, but sometimes pretty convenient.
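To show what “type-safe bit-mask” means, here is a hand-rolled sketch of the kind of code the bitflags macro saves you from writing. It is not the crate’s generated output, and the Perms type is invented for the example.

```rust
use std::ops::BitOr;

/// Hypothetical permission set stored as bits in a `u8`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Perms(u8);

impl Perms {
    const READ: Perms = Perms(0b001);
    const WRITE: Perms = Perms(0b010);
    const EXEC: Perms = Perms(0b100);

    /// True if every bit set in `other` is also set in `self`.
    fn contains(self, other: Perms) -> bool {
        (self.0 & other.0) == other.0
    }
}

impl BitOr for Perms {
    type Output = Perms;

    /// Combines two permission sets into one.
    fn bitor(self, rhs: Perms) -> Perms {
        Perms(self.0 | rhs.0)
    }
}

fn main() {
    let rw = Perms::READ | Perms::WRITE;
    println!("{:?} can write: {}", rw, rw.contains(Perms::WRITE));
    println!("{:?} can exec: {}", rw, rw.contains(Perms::EXEC));
}
```

Because Perms is its own type, you can’t accidentally pass a raw integer or a flag from some other flag set where a Perms is expected.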
Data, data types and data formats
“I have to create or read a…”
PNG/JPEG/GIF/etc – image
General-purpose loading and saving of images, which can handle a lot of formats. Can do some amount of image manipulation as well, such as cropping, smoothing, etc. but that will hopefully be pulled out into its own library at some point soon.
Small data THINGS – uuid, base64, csv, semver…
Exactly what it says on the tin.
Media codecs – Various
Not aware of any great encoders, but there’s plenty of decoders for common audio formats. lewton for Ogg Vorbis, hound for .wav, minimp3 for MP3, claxon for FLAC. Video, I haven’t used enough to have an opinion on.
Config files – toml
For all your config file format needs. Works with serde, naturally.
Markdown – pulldown-cmark
There’s several good Markdown readers and writers, pulldown-cmark is my favorite. It supports CommonMark, it’s simple to use, and it’s pure Rust.
Templating – askama
There’s several quite good text templating engines, but askama IMO rises above them all by compiling your templates into Rust code and type-checking your templates at compile time. Sometimes this isn’t what you want, but it is a great feature surprisingly often. This also makes it super fast, for when you really need to generate 150,000 HTML pages in a couple minutes. For when I need templates that can be altered at runtime, on the other hand, tera is my default pick.
Lexing/parsing
For lexing, I have never seen anything better than logos. Derive a trait on your token type, add a few annotations, and you have a lexer.
There are many good parser libraries out there in different styles: nom, pom, lalrpop, pest, combine… They’re mostly quite good, in my experience, but in practice I tend to either use Known Formats like CBOR or TOML, or I write a parser by hand. Play around and find what you like.
Data structures
Rust’s standard library includes most data structures you need: hash map, ring buffer, all the good things. Can’t include everything though, so here’s a few things to fill in the cracks.
Immutable data structures – im
Fast immutable data structures. Rust is a little weird, ’cause the borrowing and ownership actually makes immutable data structures both difficult and usually unnecessary – things can’t be mutated without you knowing it, because the borrow checker won’t let you. This makes trying to write code in a purely functional style feel weird. However, there’s some situations where immutable data structures are actually really nice – the one I ran into was passing shared data structures around in an Erlang-like multithreaded actor system. For those cases, im is the way to go.
Generational map – genmap
What I call a “generational map”; I’ve also seen it called a slot map, handle map, and some other things I can’t remember. This is very much not a new structure, but I haven’t seen it often outside of video game code. Like oorandom, this particular crate is my own implementation, but there’s several others that are good for different use cases; see the crate’s README file for links to some others.
So what does a generational map do? It’s basically an array where you insert things and get an opaque handle back, which is mostly just the array index. This is a quite useful pattern in Rust code when you have a collection of things with complicated dynamically-checked lifetimes, but using an Rc for them feels icky. The improvement over plain array indices is that each handle also stores a “generation” count, and the map’s generation counter increments each time you remove an object from the array. So when you remove an item, you can put a new one in the same slot in the array and reuse that storage, but the new object’s handle will have a different generation counter. If you then try to look up an object with an old handle that used to refer to something else, you get a runtime error instead of silently getting the wrong item. It’s kind of the inverse of an Rc, which promises that an object will never be freed until there are no more pointers referring to it. A generational map lets you free an object, but trying to use a dangling handle will be a safe runtime error.
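A minimal sketch of the idea, to make the description above concrete. This is not genmap’s actual API; every name here is made up, and the “runtime error” for a stale handle is modelled as returning None.

```rust
/// Opaque handle: a slot index plus the generation it was issued in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Handle {
    index: usize,
    generation: u64,
}

/// One array slot, remembering the generation of its latest occupant.
struct Slot<T> {
    generation: u64,
    value: Option<T>,
}

/// Minimal generational map (hypothetical, for illustration only).
struct GenMap<T> {
    slots: Vec<Slot<T>>,
    free: Vec<usize>,
    generation: u64,
}

impl<T> GenMap<T> {
    fn new() -> Self {
        GenMap { slots: Vec::new(), free: Vec::new(), generation: 0 }
    }

    /// Stores `value`, reusing a freed slot if one exists.
    fn insert(&mut self, value: T) -> Handle {
        let generation = self.generation;
        let index = if let Some(index) = self.free.pop() {
            let slot = &mut self.slots[index];
            slot.generation = generation;
            slot.value = Some(value);
            index
        } else {
            self.slots.push(Slot { generation, value: Some(value) });
            self.slots.len() - 1
        };
        Handle { index, generation }
    }

    /// Returns the value only if the handle's generation still matches.
    fn get(&self, handle: Handle) -> Option<&T> {
        let slot = self.slots.get(handle.index)?;
        if slot.generation == handle.generation {
            slot.value.as_ref()
        } else {
            None
        }
    }

    /// Removes the value and bumps the map's generation counter, so any
    /// later occupant of this slot gets a handle the old one can't match.
    fn remove(&mut self, handle: Handle) -> Option<T> {
        let slot = self.slots.get_mut(handle.index)?;
        if slot.generation != handle.generation {
            return None;
        }
        let value = slot.value.take()?;
        self.generation += 1;
        self.free.push(handle.index);
        Some(value)
    }
}

fn main() {
    let mut map = GenMap::new();
    let old = map.insert("goblin");
    map.remove(old);
    let new = map.insert("dragon"); // reuses slot 0 with a newer generation
    println!("old handle: {:?}, new handle: {:?}", map.get(old), map.get(new));
}
```

Both handles point at the same array index, but only the new one still resolves to a value.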
Stable hashtables – fnv
fnv is a fast, stable hash algorithm, and the crate includes simple wrappers for using it with Rust’s standard HashMap and HashSet. Rust’s default HashMap is not “stable”, which is to say, it contains an element of randomness. If you make two different HashMaps and insert the exact same values into them in the same order, they will be stored in a different order internally, which prevents Hash DoS attacks. This is the sane default you want, but sometimes you want a stable hash map (so giving it the same contents results in the same ordering) or you need slightly higher performance. The fnv crate gives you both. The main time I wanted it was writing a small compiler where I wanted to have reproducible builds.
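To show where the stability comes from, here is a sketch that plugs a hand-written 64-bit FNV-1a hasher into the standard HashMap. The fnv crate ships its own hasher and map aliases; the Fnv1a and StableMap names below are stand-ins written for this example, not the crate’s API.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// 64-bit FNV-1a, written out by hand for illustration.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A `HashMap` whose hasher has no per-instance random seed.
type StableMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

/// Inserts the numbers 0..100 and returns the keys in iteration order.
fn iteration_order() -> Vec<u32> {
    let mut map: StableMap<u32, ()> = StableMap::default();
    for n in 0..100 {
        map.insert(n, ());
    }
    map.keys().copied().collect()
}

fn main() {
    // Two separately built maps iterate in the same order.
    println!("same order: {}", iteration_order() == iteration_order());
}
```

With the default hasher the same comparison would normally come out different, since each map is seeded randomly.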
Multithreaded hashmap – dashmap
I’ve never actually used this but always wanted to find an excuse to. Basically a replacement for Mutex<HashMap<K,V>> that breaks the hashmap into several portions, each with its own lock, increasing performance and reducing contention. There’s a few similar crates out there, some of which are lockless, but like I said I’ve never had an excuse to use dashmap, so I haven’t explored alternatives either.
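The “several portions, each with its own lock” idea can be sketched with std alone. This is only an illustration of the sharding technique, not dashmap’s implementation or API; ShardedMap and its methods are invented names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hash map split into independently locked shards (hypothetical sketch).
struct ShardedMap<K, V> {
    shards: Vec<Mutex<HashMap<K, V>>>,
}

impl<K: Hash + Eq, V: Clone> ShardedMap<K, V> {
    fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        let shards = (0..shard_count).map(|_| Mutex::new(HashMap::new())).collect();
        ShardedMap { shards }
    }

    /// Picks the shard responsible for `key` from the key's hash.
    fn shard_for(&self, key: &K) -> &Mutex<HashMap<K, V>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        &self.shards[(hasher.finish() as usize) % self.shards.len()]
    }

    /// Locks only the one shard that owns `key`, then inserts.
    fn insert(&self, key: K, value: V) -> Option<V> {
        let shard = self.shard_for(&key);
        let mut guard = shard.lock().unwrap();
        guard.insert(key, value)
    }

    /// Returns a clone of the value so the lock is released immediately.
    fn get(&self, key: &K) -> Option<V> {
        let guard = self.shard_for(key).lock().unwrap();
        guard.get(key).cloned()
    }
}

fn main() {
    let map: Arc<ShardedMap<u32, u32>> = Arc::new(ShardedMap::new(8));
    let workers: Vec<_> = (0..4u32)
        .map(|t| {
            let map = Arc::clone(&map);
            thread::spawn(move || {
                for i in 0..100u32 {
                    map.insert(t * 100 + i, i);
                }
            })
        })
        .collect();
    for worker in workers {
        worker.join().unwrap();
    }
    println!("key 305 -> {:?}", map.get(&305));
}
```

Threads writing keys that land in different shards never wait on each other, which is where the reduced contention comes from.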
Thread pool – ???
If you’re just feeding data to a fixed number of worker threads, using a thread pool of some kind is generally simpler and more convenient than herding threads around yourself. There’s several thread pool crates with good reviews, but I haven’t used any often enough to have specific recommendations. There’s also various scoped threadpools, which make sure that the threads never outlive the current program scope. This lets you feed references to local data into the threads, instead of forcing you to clone or Arc them.
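The scoped part is easy to show, since the standard library has since grown scoped threads of its own. A sketch assuming Rust 1.63 or later for std::thread::scope; note this spawns fresh threads rather than reusing a pool, and parallel_sum is a made-up example function.

```rust
use std::thread;

/// Sums `data` by giving each scoped thread a borrowed chunk of it.
/// (`parallel_sum` is a hypothetical function for this sketch.)
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Round up so every element lands in some chunk; never a chunk size of 0.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        // The scope guarantees all threads finish before `data` can go away.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1000).collect();
    println!("sum = {}", parallel_sum(&data, 4));
}
```

No clone and no Arc: each thread just borrows its slice of the local Vec.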
Networking
“I need something fancier than simple network sockets but I’m not writing an HTTP service.”
HTTP client – reqwest
Need to fetch something via HTTP? reqwest is the way to do it. Not the most lightweight of things since it uses tokio to drive its I/O (see below), but it’s fast, robust, and can do about anything.
Lightweight alternative to try out: ureq. Haven’t used it personally yet, though.
Application dev
“Okay, I’m making an interactive program, so it needs to be able to…”
Command line parsing – structopt
Like serde, I consider this one of Rust’s killer libraries. No other language has anything nearly as good. You define a Rust struct and throw some annotations onto it, and you now have a command line parser. It’s amazing. Its downside is it’s relatively slow to compile and doesn’t produce the tiniest code, though it’s not too bad. Lightweight alternatives: argh. You can get more lightweight with things like clap or argparse, but those don’t have the macro-based goodness of structopt, and I can’t live without that now.
Env file loading – dotenv
Load options from environment variables or .env files in the local directory. This is generally what server systems do for configuration so they don’t have to store usernames and passwords in config files that can get accidentally checked into version control. Very handy pattern if you’re making something that needs to touch a remote database or web service. It’s trivial to implement on your own, but it’s already been implemented once, so why redo the work?
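To back up the “trivial to implement” claim, here is a rough sketch of the core of it. It handles only the simplest KEY=VALUE syntax and is not a drop-in for the dotenv crate; parse_env and setting are hypothetical names.

```rust
use std::collections::HashMap;

/// Parses `KEY=VALUE` lines; blank lines and `#` comments are skipped.
fn parse_env(text: &str) -> HashMap<String, String> {
    let mut vars = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Lines without an `=` are silently ignored in this sketch.
        if let Some(eq) = line.find('=') {
            let key = line[..eq].trim();
            let value = line[eq + 1..].trim().trim_matches('"');
            if !key.is_empty() {
                vars.insert(key.to_string(), value.to_string());
            }
        }
    }
    vars
}

/// Looks a setting up in the real environment first, then in the parsed file.
fn setting(name: &str, file_vars: &HashMap<String, String>) -> Option<String> {
    std::env::var(name).ok().or_else(|| file_vars.get(name).cloned())
}

fn main() {
    // A missing `.env` file is treated as an empty one.
    let text = std::fs::read_to_string(".env").unwrap_or_default();
    let vars = parse_env(&text);
    println!("DB_USER = {:?}", setting("DB_USER", &vars));
}
```

Letting the real environment win over the file means a deployment can override whatever is sitting in a developer’s local .env.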
Cross-platform config file locations – directories
The dual of dotenv, it gives you an easy, cross-platform way to access config files in the Right Place on whatever platform you’re using.
Interactive CLI – rustyline
Rust equivalent of the readline library. readline (or its descendant linenoise) is what bash and just about every other command line program in existence uses to give you an editable terminal that lets you backspace, jump to beginning/end of lines, search backwards through a buffer, and so on. rustyline does the same thing without needing to bind to a C library.
Progress bar – pbr
Displays a progress bar in the console, with simple configurable styles. It’s amazingly handy.
Web stuff
I am really not the person to ask here, and what knowledge I do have is pretty out of date.
Databases
Again, I am the wrong person to ask, but in my modest
experience… diesel
is a very good, database-independent
query builder. Database-independent query builders/ORM’s/etc also might
be a bit of a white elephant. Usually life gets simpler when I just find
a crate that interfaces with Postgres or SQLite directly, and I end up
making better systems when I do it in terms of SQL directly.
diesel
’s migrations are pretty nice, though. I dunno.
Things I avoid
Because every positive is incomplete without its negative. These are things that are popular, and are definitely not bad code or bad designs, but which I will personally choose not to use without a good reason. There are good reasons to use them, but most of the time if I see these in a project’s dependency tree I wince a little and expect excess complexity, two minute compiles, or inconvenient breaking changes within 6-12 months.
Again, these are from my personal experience and priorities. Your needs are not necessarily mine.
ring
A very correct, very complete, very rigorous encryption library. Due to the level of rigor involved, and the personality of the people required to maintain that level of rigor, they have an Interesting policy regarding releases. Plus, for technical reasons, you can’t have multiple versions of ring linked into a single program like you can with most crates – there’s some asm embedded in there, and the symbols in the asm code don’t get mangled by rustc and so can clash. So if you write a program that uses ring, and it has a dependency that uses ring, they must always use the exact same version of it or else your program will not compile, and the ring developers will shrug and say “that’s your problem, we aren’t going to make life easier for you”. It used to be that all but the most current versions of ring were deliberately yanked by the developers – marked “do not use” on crates.io, which made it still available if you really wanted but cargo would refuse to compile it. That just made the problem less tractable, and really irked me once upon a time, though fortunately they seem to have given up on this madness more recently.
What this adds up to is, if you use ring, then having your application compile and function from one day to the next is always less important than ring working correctly, for the value of “correctly” defined by the library devs, and you will get exactly zero warning or sympathy when someone else decides to break your code forever. This is a perfectly reasonable and robust approach to the problem of dealing with security in human-built tools, and I hate it. Best to just move on and deal with other things.
tokio
Tokio is a multi-threaded runtime for fast async I/O, particularly for network servers. It is also a ton of work to use, a ton of compile time, and a ton of learning to do. If you need it, it’s fantastic, but if you only kindasorta need it, I find it less work to pass it by. I’ll spawn a thread per connection, I ain’t scared! On the other hand, if you already use tokio all the time and you’re already on top of the learning curve, then it’s probably pretty easy to just whip it out whenever you need something I/O-ish. Lightweight alternatives to try out sometime: smol, popol.
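For the record, thread-per-connection with nothing but std looks roughly like this. It is a sketch, not a production server: the upper-casing echo protocol and the echo_lines and serve names are invented for the example, and there is no limit on the number of threads.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Echoes every line read from `input` back to `output`, upper-cased.
fn echo_lines<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<()> {
    for line in input.lines() {
        writeln!(output, "{}", line?.to_uppercase())?;
    }
    output.flush()
}

/// Accepts connections forever, handing each one to its own OS thread.
fn serve(listener: TcpListener) -> io::Result<()> {
    for stream in listener.incoming() {
        let stream = stream?;
        thread::spawn(move || {
            // One handle for reading, the original for writing.
            let reader = BufReader::new(stream.try_clone().expect("clone socket"));
            if let Err(err) = echo_lines(reader, stream) {
                eprintln!("connection error: {}", err);
            }
        });
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Port 0 asks the OS for any free port; a real server would pick one.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || serve(listener));

    // Talk to our own server once, then exit.
    let mut client = TcpStream::connect(addr)?;
    client.write_all(b"hello\n")?;
    let mut reply = String::new();
    BufReader::new(client).read_line(&mut reply)?;
    println!("server replied: {}", reply.trim_end());
    Ok(())
}
```

Each connection’s logic is plain blocking code, which is most of the appeal when you only kindasorta need async.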
proc-macro2/syn
These are the foundational crates of Rust’s procedural macro system. They are kindasorta part of the compiler, but not actually distributed as part of the compiler for Reasons I haven’t bothered finding out. Procedural macros are what makes it possible to have awesome metaprogramming tools like serde, structopt and logos, and these crates are the glue that binds it together. They are also slow to compile, no two ways about it, and generating procedural macros is also slow. So if you need to make a tool that does awesome metaprogramming, or you need to use a tool that does awesome metaprogramming, then go ahead and use these. But IMO it’s not worth the extra twenty seconds of compile time for a small convenience crate like derive_more.
num
A crate containing lots of numerical traits for compile-time metaprogramming, like Integer, Unsigned, etc. That’s cool, but it’s a large, complicated, slow-compiling dependency that evolves breaking changes just fast enough to be irritating. I’ve written and used a fair amount of math code of various types, and I’ve never actually wanted or needed num’s metaprogramming traits, but those are the part that infiltrate the dependency trees of everything ever. On the flip side though, num also provides quite nice implementations for fancier numbers like ratios, complex numbers, bigints etc, so it’s totally worth using for those.
crossbeam
Additional, occasionally faster concurrency primitives than what the standard library provides. Multiple-producer-multiple-consumer queues, exponential backoff timers, fancy atomic things, huzzah! Great stuff, but also 95% of the time you aren’t gonna need it. If you profile your program and discover you really need to be spending 3% of your time on mpsc::send() instead of 5%, then you can always just drop crossbeam in. But until then, if you can do your stuff with just what’s already in std, please do.
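As a reminder of what’s already in std, here is a small sketch of several producer threads feeding one consumer through std::sync::mpsc. The fan_in_sum function is a made-up example.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawns `producers` threads that each send `per_producer` numbers,
/// then sums everything on the receiving side.
/// (`fan_in_sum` is a hypothetical function for this sketch.)
fn fan_in_sum(producers: u64, per_producer: u64) -> u64 {
    let (tx, rx) = mpsc::channel();
    for id in 0..producers {
        let tx = tx.clone();
        thread::spawn(move || {
            for n in 0..per_producer {
                tx.send(id * per_producer + n).expect("receiver hung up");
            }
        });
    }
    // Drop the original sender so `rx` ends once every producer finishes.
    drop(tx);
    rx.iter().sum::<u64>()
}

fn main() {
    println!("total = {}", fan_in_sum(4, 250));
}
```

The one thing std’s channel can’t do is have multiple consumers, which is the point at which reaching for crossbeam starts to make sense.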
parking_lot
More compact and efficient implementations of the standard synchronization primitives. Which, like crossbeam, you aren’t going to need. If benchmarks show that Rust’s standard synchronization primitives are too slow for your application, then go for it, but the time it takes to lock or unlock a Mutex has never been a bottleneck for me. Okay, it has once, but I switched to using rayon and it did everything I needed better. Also, if parking_lot’s implementation of this stuff was uniformly better it’d be in std already, so I wonder what the catch is. Update: Looks like the whole story is here, and a more actionable summary is here.