Actually Using Crev, Or, The Problem Of Trusting Software Dependencies
Up to date as of August 2019.
The Problem
As anyone who ever tried to build a program on Windows using SDL and gcc in 2004 can tell you… building software with dependencies sucks. C and C++ make this especially hard because between the compilation+linking model, header files, macros, and conditional compilation, it’s basically impossible to start with a pile of bare source code and know the correct magic compiler invocations to get a functional program out the other end. Sometimes you can figure it out, sometimes you can’t, and sometimes you’re trying to build X11 and you really can’t. So, we have build systems like `make` and `cmake` and `meson` and so on, and they’re all basically giant garbage fires, but they’re still absolutely necessary. And all they REALLY do is try to provide the information that the compiler and linker need to actually compile and link a program.
The traditional solution to this problem is to be using Unix, which has some assumptions and conventions for where to put C libraries and how things fit together. All libraries are installed system-wide, you put them in certain places that the compiler knows to look for, give them particular names, and so on. This also means that you have ONE copy of each library, EVERYTHING is built with the same compiler and the same versions of libraries, and if you want to build a program that departs from these assumptions you’re on your own to make it work. Putting all these pieces together and making them work is basically what Linux distros and BSD ports maintainers do, and it’s a big task. Each program and library gets treated as its own special case as necessary, and people write patches and build scripts to massage things into working. Thousands of them, all written and maintained by hand.
This usually works fine for end-users who just want to use existing software, but it sucks for developers, because the pipeline between “someone writes code” and “you use the code” is pretty long, and is designed for stability rather than speed. If the distro doesn’t provide a library, or doesn’t provide the version of it you want, then you package it yourself and wait for it to be accepted for all the platforms you care about – thank you, your groundbreaking new software system will be included in the next Debian Stable release in about 2.5 years, give or take. Or you “vendor” the dependency: include it in your source distribution and build process, put in the work to upgrade it as necessary, and just Deal With It. This is a horrible solution that defeats a lot of the purpose of using libraries to begin with, but dealing with libraries in C/C++ in general is so dysfunctional that there is an entire genre of libraries designed to be vendored into their users’ projects.
So, if you want to write C/C++, your choices are
- Write only for very specific platforms (this program runs best on ~~Internet Explorer 6.0~~ Ubuntu 16.04), or
- Deal with a VERY SLOW, labor-intensive, and pretty complicated software distribution process, or
- Put tons of work into supporting someone else’s code as part of your own code, or
- Use the most minimal dependencies possible, or
- Just avoid using other people’s code altogether.
This is one of the biggest and most pervasive hidden costs of writing C/C++. Forget memory safety, the compilation and distribution model is outright criminal.
The Solution
More or less starting with Java, as far as I’m aware, people got sick of this shit and started explicitly designing their (mainstream) programming languages to include all the information needed to build a source file in the source file itself. If your file says `import java.util.HashMap;` then the compiler knows where to look to find the code for `java.util.HashMap`, and knows that it needs to link it into your program. If that has any other dependencies it mentions them in the same way, and the compiler can search for them too, until it has the full dependency tree for your program. There’s some naming conventions and language-specific variations and such, and it’s still not a trivial problem to design and implement, but it works. Large or complex programs sometimes still need some sort of customized configuration, and there’s various systems like `ant` that provide that, but mostly you can type `javac Foo.java` and `Foo.java` has enough information in it for `javac` to do everything it needs to do. `make` can go die in a fire.
So what if some of those modules aren’t present? Well, go online and search for them! Java again was weirdly prescient in its URL-based naming convention for modules, but also as usual basically missed the target compared to the path technology actually took, because there was no built-in way to go from `import li.alopex.code.Foo;` to “download `http://code.alopex.li/Foo.java`”. Other languages invented this first: you had CTAN and CPAN for TeX and Perl, which were basically just FTP sites for source packages with a few added conventions and tools. Then you got things like pypi and rubygems for Python and Ruby, which were special-purpose versions of a Linux distribution’s repository that moved faster and had fewer controls, which mostly worked fine because the software didn’t need as much massaging to work together. These managed system-wide packages for you, which sometimes caused clashes with the versions of libraries already installed as part of Linux distros, and people eventually made tools like `virtualenv` to try to work around these problems, but generally it was still a big improvement for those people who wanted or needed the cutting edge. Then at some point someone said “why don’t we make it so there’s NO system-wide packages, you don’t need to make two different programs depend on the same version of a library, you make each program build with its own totally independent dependency tree and you just have to specify what dependencies you want”, and we made `npm` for JS and the `go` toolchain for Go… and the world heckin’ EXPLODED.
Suddenly using a dependency is basically as easy as writing it down, and each program can use whatever it wants with no fear at all of other programs stomping on it or making life difficult. Even better, all the infrastructure is run for free by third parties (and is honestly pretty cheap, I’m told), so publishing a library is as easy as throwing a license and version number on it. (Well, and solving the hardest problem in programming: coming up with a good name.) This is awesome. Sure, you spend a bit more time downloading and building dependencies than with the old Unix model of system-wide libraries, but our software and hardware can handle it, everything’s connected to the internet anyway, source code is small, and the extra five seconds it adds to a clean system build is WAY cheaper than the hours of programmer time wasted on dicking around with `cmake`. It can’t be beat, and we’re basically at the point where no new programming language can be taken seriously if it doesn’t have a system like this.
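For the record, in Rust’s case “writing it down” literally means adding one line to your project’s `Cargo.toml`; a sketch, with a made-up crate name:

[dependencies]
# Hypothetical crate; cargo fetches it (and its whole dependency
# tree) from crates.io at build time.
some-handy-crate = "0.3"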
The Problem, Again
So suddenly, distributing new code can move a lot faster, and reach a lot further with a lot less effort. This makes developing software way easier. Of course, this includes developing malicious software as well. These giant package repositories are a single point of failure that everyone uses, and in fact it’s very difficult to not use them. This is, naturally, an awesome attack vector, and such attacks are now not uncommon.
The reasons for this are many, and all go back to “it’s easy to use libraries, so people use them”:
- Because these libraries are so easy to use, they get used by lots of
people, and people build tools and workflows on the assumption that they
work. Because of this, all it takes is one compromise of a fairly
popular library to infect thousands of machines.
- Because libraries are easy to use, you also get deeper
dependency trees that have more hidden churn going on under the
hood.
- Because it’s so easy to update a dependency (especially if the dependency follows Semver) we have suddenly gone from “upgrade your OS every two years when a new LTS release is made” to “I guess it’s Tuesday, might as well do `cargo update` to see if anything’s changed”.
- Because libraries have become so easy to create, you get more of them out there that are maintained by one or two people part time, or not at all, instead of having something like The GNOME Foundation backing giant chunks of the ecosystem with dedicated and experienced (if still often volunteer) developers.
So there’s an increasing number of attacks, since it’s easy and profitable and pretty low-risk. Because there’s lots of eyes on the software, these attacks often get discovered pretty quickly in absolute terms – for instance, in the most recent one there was about a week between the exploit and its discovery. Try getting that turn-around time out of a commercial support contract; when a CVE is reported, the embargo time between “attack is discovered” and “attack is announced publicly” is usually months. But because these systems are widely used, largely automated (and thus obscured), and trusted by default, a week translates to over a thousand downloads of a compromised library, and if software built with the compromised library is distributed to other users it could add up to much more than that.
The Solution, Again?
Nobody actually knows how to fix this. This ecosystem is a new thing in the world, and is still evolving fast. But people are looking for ways to fix it. Personally I think that a lot of it can be tackled by better, more transparent analytics to make it easier to understand what is actually going on in your dependency tree. More information may lead to better judgements made by developers, and hopefully automatic detection of problems or potential problems such as unmaintained or poor-quality packages.
Another approach is to, instead of removing the “obscured” part of the equation, remove the “trusted by default” part. The main effort I’m aware of in this field is `crev`, which is appealingly simple in concept: people do code reviews of specific packages, summarize their results, and sign them with a crypto key that proves the review comes from a particular source. It’s simple, very broad, and so is hopefully easy to get going and actually use.
`crev` is a very human-centric system, and a very minimal one – there’s no accounts or real-world identity system or such attached to it; all you can prove is that all reviews made by the same ID were made by people with access to the private key behind it. It could be a person, an institution, one of a person’s dozen alts, or whatever. But the reviewer is staking a reputation of some kind or another on it, and a key without a good reputation isn’t going to matter much, so while someone could shotgun millions of false reviews out there, they’re about as likely to be taken seriously as spam emails or bot-generated Twitter posts. (These are not solved problems either, but can at least be kept down to a dull roar.) The actual value of this model comes from being an incarnation of the human social webs that already exist – I personally know svenstaro, I know how good their work is, and so if I know that an ID is theirs and they review some stuff then it’s a reasonably trustworthy opinion. I choose who I actually trust, or don’t trust, which isn’t a perfect model, but that’s how human trust always works. “Perfect” guarantees of trust such as those provided by cryptocurrency are inflexible; systems that people actually use need some wiggle room to operate well.
Will this web of trust model work? I don’t know. It didn’t really for PGP, but PGP has plenty of other problems as well. Could it work for `crev`? I don’t see why not. In human systems, all security comes down to trusting individual people. That’s why getting cheated by a friend hurts so much. We’re just building tools around the social dynamics that already exist.
So, let’s try it out and see how it goes.
Actually Using Crev
Okay, all this nonsense has basically been leading up to an actual tutorial for using `crev`. Currently the only implementation of `crev` is `cargo-crev`, which ties into the Rust language package manager, `cargo`. However, none of this is Rust-specific apart from the implementation; the basic concept and code review format should work for any language or package system. Code reviews (“proofs”) are just YAML files, and they can be shared around however you feel like – the method currently seems to be putting proofs in git repositories, and `cargo-crev` has support for this. `crev` already has a pretty good getting started guide that covers much of the same ground as this, but I wanted to write something similar that comes from a random user, not the system’s creator.
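Just to give a taste of the format before we dive in, a proof looks roughly like the following. This is a hand-written sketch with invented values, the exact field set depends on the `crev` version, and the real file gets wrapped in a signed text envelope:

# A sketch of a single package-review proof; all values invented.
date: "2019-08-26T12:00:00+02:00"
from:
  id-type: crev
  id: <reviewer's public ID>
  url: "https://example.com/someone/crev-proofs"
package:
  source: "https://crates.io"
  name: some-crate
  version: 1.0.0
  digest: <hash of the reviewed source tree>
review:
  thoroughness: low
  understanding: medium
  rating: positive
comment: "Looks fine to me."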
A small use case
First, let’s install the thing. We’re using Linux, obviously, but this shouldn’t be too different on any platform ’cause Rust has a lot of work put into its tooling to make it Just Work. If you haven’t installed Rust by now, do so. Yes, that page tells you to curl a shell script into `sh` – don’t you trust it? At least it doesn’t need sudo though! Then all you need to do is set up your `$PATH` to include `~/.cargo/bin/` so that you can find `cargo` and the programs it builds, and do `cargo install cargo-crev` to build the latest version of `cargo-crev` from source. On Debian, you will need to do `apt install clang llvm-dev libclang-dev` first to get it to build – the hashing library uses some inconvenient C dependencies and getting them to build is a PITA. How topical.
Alternatively, you can just download a pre-built binary from `crev`’s github release page.
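Condensed, and assuming the usual rustup setup, the whole install is something like:

# Make sure cargo and the programs it installs are on $PATH
export PATH="$HOME/.cargo/bin:$PATH"
# Debian only: C dependencies for the hashing library
sudo apt install clang llvm-dev libclang-dev
# Build and install the latest cargo-crev from source
cargo install cargo-crev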
Checking reviews
So there’s two main things that we want to do: check the code reviews for a crate, and make reviews for some dependencies. As a small test case I’m going to use `goatherd`, which is a toy pubsub messenger thing I was making. It had the dubious distinction of turning up a nasty race condition in one of its (quite young) dependencies, which got me thinking about this sort of thing in the first place. So once we install `cargo-crev` we can just cd to its dir and run `cargo crev verify`:
~/my.src/goatherd $ cargo crev verify
status reviews downloads own. issues lines geiger flgs crate version latest_t
none 0 0 685715 7926719 0/1 0/0 346 60 num_cpus 1.10.1
none 0 0 837285 15868655 3/3 0/0 875 0 CB bitflags 1.1.0
none 0 0 34142 12593931 1/1 0/0 34340 35 CB syn 1.0.3
none 0 0 3829463 7436762 1/2 0/0 1399 0 semver 0.9.0
none 0 0 48229 12761052 1/3 0/0 9775 0 CB serde 1.0.99
none 0 0 120906 4702189 0/1 0/0 2530 291 CB parking_lot_core 0.6.2
none 0 0 6030 156927 0/1 0/0 748 68 once_cell 0.2.6
none 0 0 298180 1712663 2/3 0/0 1786 2 CB bincode 1.1.4
none 0 0 2285432 3257460 1/2 0/0 306 0 CB rand_chacha 0.1.1
none 0 0 2738292 2819988 1/2 0/0 698 12 rand_isaac 0.1.1
none 0 0 142889 2292737 0/1 0/0 2159 561 redox_syscall 0.1.56
none 0 0 33854 11301406 1/1 0/0 1329 0 quote 1.0.2
none 0 0 2396 220849 0/1 0/0 4364 393 sized-chunks 0.3.1
none 0 0 963594 4731677 3/6 0/0 2478 16 uuid 0.7.4
none 0 0 742319 747528 0/1 0/0 358 2 rdrand 0.4.0
none 0 0 794 147029 0/1 0/0 11481 75 CB im 13.0.0
none 0 0 87187 17062523 3/4 0/0 58231 37 CB libc 0.2.62
none 0 0 144063 3388430 0/1 0/0 1615 216 lock_api 0.3.1
none 0 0 1192174 1192526 0/1 0/0 1196 49 cloudabi 0.0.3
none 0 0 1869540 1903212 0/1 0/0 13 0 CB winapi-x86_64-pc-windows-gnu 0.4.0
none 0 0 153833 8206135 2/3 0/0 594 31 rand_core 0.4.2
none 0 0 1116907 2245501 1/2 0/0 398 10 rand_jitter 0.1.4
none 0 0 43526 8282886 1/2 0/0 6664 0 serde_derive 1.0.99
none 0 0 773001 9236283 1/1 0/0 2436 226 CB byteorder 1.3.2
none 0 0 700118 739883 0/1 0/0 40 3 fuchsia-cprng 0.1.1
none 0 0 2767914 2903867 1/2 0/0 326 58 rand_hc 0.1.0
none 0 0 1261816 9417985 1/1 0/0 104 0 cfg-if 0.1.9
none 0 0 14491 3298524 0/1 0/0 398 0 autocfg 0.1.6
none 0 0 2284234 2912350 1/2 0/0 155 6 rand_xorshift 0.1.1
none 0 0 62430 10353037 2/6 0/0 538 0 unicode-xid 0.2.0
none 0 0 34 124 0/1 0/0 1129 26 secc 0.0.9
none 0 0 1822881 1855264 0/1 0/0 13 0 CB winapi-i686-pc-windows-gnu 0.4.0
none 0 0 2087595 18385573 3/4 0/0 6388 19 CB rand 0.6.5
none 0 0 1815094 2945762 1/2 0/0 258 12 CB rand_pcg 0.1.2
none 0 0 4475042 4774520 0/1 0/0 1040 0 semver-parser 0.7.0
none 0 0 739818 5968911 0/3 0/0 1926 342 smallvec 0.6.10
none 0 0 3153691 5602538 0/1 0/0 265 0 rustc_version 0.2.3
none 0 0 8 162 0/1 0/0 1289 4 axiom 0.0.7
none 0 0 276112 13978426 5/5 0/0 2414 32 CB log 0.4.8
none 0 0 599802 5409729 0/2 0/0 264 19 scopeguard 1.0.0
none 0 0 1806787 2534826 1/2 0/0 1033 43 rand_os 0.1.3
none 0 0 47917 7598948 2/2 0/0 3478 0 CB proc-macro2 1.0.1
none 0 0 2157068 2465807 0/1 0/0 3909 51 CB typenum 1.10.0
none 0 0 144187 4728114 0/1 0/0 2742 349 CB parking_lot 0.9.0
none 0 0 2266560 8206135 2/3 0/0 516 41 rand_core 0.3.1
none 0 0 1000256 10062103 0/1 0/0 160451 197 CB winapi 0.3.7
Okay, that’s a lot more junk than I expected. Some of it, like `crate` and `version`, is pretty obvious. `downloads`, I assume, is the recent and total download numbers from crates.io, and `geiger` I assume is the result of `cargo-geiger`, a tool which crawls through Rust code looking for `unsafe` blocks. The `status` column is the amount of trust we’ve decided the crate deserves, while `reviews`, `flgs` and `own.` are a bit obscure. Turns out `own.` is the number of owners of the crate on crates.io (known / total; many low-level Rust crates are made by well-known developers like bluss or alexcrichton), and `flgs` is various flags – `CB` means “custom build script”, so building the package may run random code. The two columns in `reviews` are the number of known proofs for the listed version of the crate, and for all versions, respectively. Despite `crev`’s generally-good docs, finding specifics on these columns is currently unfortunately tricky.
Right now we have absolutely zero proofs/code reviews, because we haven’t imported any. Well, dpc made `cargo-crev`, so I might as well start with their proofs ’cause I’m already trusting them by using their software. Proofs are just files kept in a git repo, which we can import via the following:
cargo crev fetch url https://github.com/dpc/crev-proofs
Run it again, and we get some changes to the `reviews` column, shown here:
status reviews downloads own. issues lines geiger flgs crate version latest_t
none 1 1 685715 7926719 0/1 0/0 346 60 num_cpus 1.10.1
none 1 6 739818 5968911 0/3 0/0 1926 342 smallvec 0.6.10
none 1 1 700118 739883 0/1 0/0 40 3 fuchsia-cprng 0.1.1
none 0 2 773001 9236283 1/1 0/0 2436 226 CB byteorder 1.3.2
none 0 1 276112 13978426 5/5 0/0 2414 32 CB log 0.4.8
none 0 1 2087595 18385573 3/4 0/0 6388 19 CB rand 0.6.5
Yay, now we have some non-zero numbers in the `reviews` column! The first number is the number of reviews for that particular version of the code, the second is for all versions of the code. Note that even with dpc’s reviews we still don’t trust those crates (their `status` is `none`), because we haven’t marked dpc’s ID as trustworthy. We need to create our own ID to mark dpc’s ID as trusted; we’ll get to that in a moment. For now, we can see there’s not a WHOLE lot of `crev` reviews out there, so let’s focus on changing that.
Sharing reviews
First, we need a git repo called `crev-proofs`. Well, that’s pretty easy: we’ll just make one in our preferred public location, make an empty initial commit in it, and push it. Now we can create a `crev` ID:
cargo crev new id --url https://git.sr.ht/~icefox/crev-proofs
This asks us for a password, which is the usual “this is forever, you can never recover this” type password you have to use with crypto systems, so make sure you save it in your password safe. You should probably make a copy of the key file it gives you too; maybe print it out and stick it under the cat’s bed or something. You can find a copy of it in `~/.config/crev/ids/<id public key>.yaml`; the private key in this file is encrypted with your passphrase, so this file doesn’t need to be kept under lock and key (though it probably wouldn’t hurt).
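Backing it up is just copying a file around; for example (the destination here is whatever offline spot you like):

# The ID file is passphrase-encrypted, so copying it around is low-risk;
# the destination path is just an example.
cp ~/.config/crev/ids/*.yaml /media/backup-usb/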
Now that we have an ID of our own, we can mark dpc’s ID as trusted:
cargo crev trust FYlr8YoYGVvDwHQxqEIs89reKKDy-oWisoO0qXXEfHE
It asks us to put in some notes about how trusted this person actually is, and we’re done. Now when we run `cargo crev verify`, the crates that dpc has reviewed positively are marked as “pass”.
How to find someone else’s ID? In this case I just took it from the docs in the getting started guide, but if you do `cargo crev query id all` it will show you the ID(s) connected to any git repository you’ve added with `cargo crev fetch url`. (Note that a repository can contain proofs from more than one ID!)
So by trusting someone’s proofs, by default we are also trusting the people they trust! This is huge. When you mark someone as trusted it is stored in your `crev-proofs` repo so other people can see it! You can see this by running `cargo crev query id trusted`, which lists a lot more people than just the one I added:
YWfa4SGgcW87fIT88uCkkrsRgIbWiGOOYmBbA1AtnKA low https://github.com/oherrala/crev-proofs
Qf4cHJBEoho61fd5zoeweyrFCIZ7Pb5X5ggc5iw4B50 medium https://github.com/kornelski/crev-proofs
aD4K0g6AcSKUDp3VPF7u4hM94zEkqjWeRQwmabLBcV0 medium https://github.com/Mark-Simulacrum/crev-proofs
FBkykBV6YaqAaGoUXyvd-XkEqDYxQNM7EUnZ2nuy-XQ low https://github.com/Canop/crev-proofs
lr2ldir9XdBsKQkW3YGpRIO2pxhtSucdzf3M5ivfv4A high https://git.sr.ht/~icefox/crev-proofs
FYlr8YoYGVvDwHQxqEIs89reKKDy-oWisoO0qXXEfHE medium https://github.com/dpc/crev-proofs
X98FCpyv5I7z-xv4u-xMWLsFgb_Y0cG7p5xNFHSjbLA low https://github.com/kornelski/crev-proofs
ZOm7om6WZyEf3SBmDC69BXs8sc1VPniYx7Nfz2Du6hM low https://gitlab.com/KonradBorowski/crev-proofs
However, note the only `high` trust level is the one for my own ID: if dpc trusts kornelski, then I transitively trust kornelski as well, but I never trust dpc’s judgements more than I trust dpc. There’s tons of options in `crev` for tweaking how exactly you measure “trust”, so if you don’t like the defaults you can tinker to your liking. It all feeds into whether that first column in `cargo crev verify` is “pass” or not; that is something you can set to your own standards. In the end, you are the one who makes the definition of “trust”!
I’m fine with the defaults for now though, so I can pull all the proofs for dpc’s little social circle by running `cargo crev fetch trusted`, and suddenly there’s more `pass`’s in my `cargo crev verify` output.
Creating reviews
Okay, let’s actually review some code! In the process of making `goatherd` I’m playing with a new library called `axiom`, and due to the aforementioned race condition issue I’ve already rummaged around inside it, added a couple minor PR’s, and talked with its author a bunch. It’s not a big piece of code, so this was what I wanted to review. We can do that with just `cargo crev goto axiom` – since cargo downloads a crate’s source code anyway, that command just takes us to a new shell in cargo’s source repo.
This is actually clever, not to say essential, as it makes sure that what we are reviewing is what is actually on crates.io, not what is on github or such.
We are just in a subshell, so we can do whatever we want there: browse the code, run tools like `cargo geiger`, and so on. When we’ve seen what we need to, we run `cargo crev review` in that shell to create a new review for the crate.
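In shell terms, the intended loop looks something like this (the `less` line is just an example of poking around):

cargo crev goto axiom   # subshell inside the crate's downloaded source
less src/lib.rs         # read whatever you feel you need to
cargo crev review       # opens your editor with a review template
exit                    # back to your own project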
However, when I did that it kept saying:
Error: The digest of the reviewed and freshly downloaded crate were different; dbhwvUPPXHFO7Nn2u29HaOuZtg9CKMExl-5ayu0-itg != RRB0JmuenFACKjFN0CTdO1_MOse-YDhCvjf4Vec1GLs; /home/icefox/.cargo/registry/src/github.com-1ecc6299db9ec823/axiom-0.0.7 != /home/icefox/.cargo/registry/src/github.com-1ecc6299db9ec823/axiom-0.0.7.crev.reviewed
Deleting the `axiom` source and rebuilding `goatherd` to fetch a fresh copy doesn’t make this message go away. More on that in a bit!
…for now let’s try a different crate. I went to another crate I make, `ggez`, and did `cargo crev verify`, but it failed with the following message:
Updating crates.io index
Error: the lock file /home/icefox/my.src/ggez/ggez/Cargo.lock needs to be updated but --locked was passed to prevent this
Okay, this is why this is called Actually Using Crev. Deleting my `Cargo.lock` and rebuilding it didn’t help, but running `cargo update` in `ggez`’s dir cleared this message. Thing is, this error message isn’t coming from `crev`, it’s coming from `cargo` – it wants to update its package list to the most recent before going through `ggez`’s dependency tree, but `crev` calls `cargo` with the `--locked` option, which tells it to not touch anything and operate in read-only mode. A sensible precaution in principle, since reviewing a crate shouldn’t change the dep list you’re trying to review, but here it’s blocking us with a weird error message. Doing the `cargo update` by hand solves the problem and we can proceed.
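So until that gets smoothed over, the workaround is simply:

cd path/to/ggez         # the project whose deps we're verifying
cargo update            # refresh Cargo.lock by hand, since crev won't
cargo crev verify       # now gets past the error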
Doing `cargo crev verify` again I got the expected display, and also the following warnings:
Unclean crate approx 0.3.2
Unclean crate crc32fast 1.2.0
Unclean crate either 1.5.2
Unclean crate lazy_static 1.3.0
Unclean crate nodrop 0.1.13
Unclean crate void 1.0.2
Error: Unclean packages detected. Use `cargo crev clean <crate>` to wipe the local source.
Unclean, UNCLEAN! Not sure what’s going on here; it SOUNDS like it is saying that the source code in the local downloaded version of the crates doesn’t match what `cargo` thinks it should be, which seems slightly impossible ’cause I certainly haven’t been faffing about in `~/.cargo/registry/src/`. Still, the suggested command makes this warning go away, so huzzah? Could still be better.
Anyway, this `void` crate looks weird, and `crev` says it’s only 70-odd lines of code, so that seems like a good target to review. I do `cargo crev goto void` and it opens a shell in its cargo source directory again. I poke around and decide it looks fine, then do `cargo crev review`, and this time it works. I put in my key’s passphrase and it opens my editor with the following template:
# Package Review of void 1.0.2
review:
  thoroughness: low
  understanding: medium
  rating: positive
comment: ""
Plus another 60 lines of comments giving absurdly thorough explanations of what is going on with each of these fields. It’s very helpful, actually. I fill it out, save, exit, and poof, it’s done. The proof is saved in my local proof repo and I can view it with `cargo crev query review void`, at which point I realize that dpc has already reviewed `void 1.0.2`. Oh well, adding another review to it can’t hurt!
This proof is only saved locally, but is already committed into my git repo of proofs, which is apparently in `~/.config/crev/proofs/<key>/`. I can manipulate the git repo with `cargo crev git`, which just goes to that repo and passes its args through to `git`, so `cargo crev git log` and such work as you’d expect. A shortcut for `cargo crev git push` that automatically goes to the repo associated with your ID is `cargo crev publish`. So I run that and, huzzah, it works! Check the actual review out here.
I made a few more reviews, ’cause it’s kinda fun. I was looking forward to giving a friend some shit about his code when I hit the error again:
The digest of the reviewed and freshly downloaded crate were different; i_bS1vb-271a-02WkBOf7D-yGi-t3fsYJ3kco8FKrNY != v_iKtjR6uB7QFaZ0_sOyYIAcDHidjucAjRGvjXOXqq0; /home/icefox/.cargo/registry/src/github.com-1ecc6299db9ec823/randomize-3.0.0-rc.3 != /home/icefox/.cargo/registry/src/github.com-1ecc6299db9ec823/randomize-3.0.0-rc.3.crev.reviewed
Okay, this is annoying now. It’s saying the hash of the checked-out cargo source directory is not the same as its copy of it, but I can’t figure out why. So, time to ask for help. Turns out that was a bug in the list of what files get excluded from the hash, which is now fixed on git master and should be in the next release.
So I think my main conclusion is that `crev` is great in concept, but the implementation is still young. It mostly works, and you should use it, but you should expect to hit some roadbumps. Response on the issue tracker is pretty quick though!
Discussion
So I see a few possible failure modes for `crev` as an ecosystem:
- Nobody uses it – and it dies in obscurity
- Everyone uses it, poorly – and its data is worthless because it’s so hard to find meaningful stuff
- Lots of people use it maliciously – ibid
Hopefully none of these things happen! I think that as long as we have people who care about writing good software, we can have people creating good proofs as well. And in the end, as in many things, it all comes down to who you know, which is a very powerful tool. A little bit of work can hopefully lead to the average person having access to a lot of high-quality code reviews, and the “6 Degrees Of ~~Kevin Bacon~~ Paul Erdős” phenomenon will hopefully result in a lot of these social circles being connected together.
I actually kinda want to stress-test this, just to see how resilient the system is in practice. Maybe make a bot that will automatically create reviews in a semi-human-ish method, and see if I can convince people to trust it. I think the underlying idea would be quite similar to the Twitter bots that just spew nonsense and like each other’s stuff to boost follower ratings, and would have a similar goal of making certain statements look far more authoritative and widely accepted than they actually are. You could even make some front websites for security consultants or such and associate the git repos with them for verisimilitude. The `crev` community is currently small, so even a few of these could poison a lot of the well, and even if the community grows bigger it’s easy to spin off more bots. If there’s money in it then people will do it, and I can easily see there being money in saying “Yes, of course, this exploit-ridden code is totally safe. Trust us.”
However, I also see a few possible defenses against this abuse built in to `crev`:
- The social network is transitive, but trust is graded. If A trusts B and B trusts C, then A’s trust for C is equal or less than A’s trust for B.
- Open source tends to be a fairly personal and reputation-driven endeavor. So, people who participate in `crev` are motivated to be shy about who they trust.
- Open source coders tend to, well, write a lot of code. There’s not a lot of point in trusting someone’s reviews if they don’t also have projects that use the code they review, or at least someone’s code! That’s going to be hard to fake. People try it, for example creating github accounts full of resume-padding, but it’s pretty damn easy to spot.
- Coming up with a convincing fake and getting someone important to trust it is a non-trivial amount of work, since you’re trying to fool a pretty savvy audience. I’m sure it’s possible, and will happen, but it’s probably a job that would require spear-phishing rather than brute-force spam. This is the nature of security: you can never remove the possibility of malicious action, but by making it harder and more expensive you reduce the chances of it.
I’m now getting more hypothetical, but afaik these sorts of bot networks to boost reputation are often structured to have bottlenecks in them, so that a few “leader” accounts end up having the authority of lots of other “follower” accounts backing them. So, blocking a few accounts may also render large chunks of a bot network useless at once. Human social networks also tend to follow certain patterns of connection density that result in similar bottlenecks, so if part of the web of trust departs from this structure that’s also possibly evidence that it is artificial, and you can have your own bots that look for such things. In `crev`, distrust is as transitive as trust is, so if someone you trust marks an ID as a bad actor, you inherit that opinion until you form your own. You can help ensure the integrity of your network by blacklisting people as well as whitelisting them. Security researchers who actually try to find botnets or vulnerable crates can publicize them as untrustworthy, for instance through the already-existing RustSec effort, and then you can follow those to get an up-to-date safety net.
Or, if you just take the whitelist of the most social person you know, and the blacklist of the most paranoid person you know, then you’re probably in good shape! Or maybe the other way around, depending on what you want. But what do you do if two of your trusted sources disagree about a third party? I dunno, it’s an interesting question! What do you do when that happens in real life? :-D
Conclusion
Anyway, back on topic! There’s a few GREAT things that `crev` has going for it over systems like PGP. First is that it is targeted at programmers, and it uses programmers’ tools. Putting your proofs as plain text in git is brilliant, because everyone already uses plain text in git. Sharing your `crev` proofs online adds exactly zero infrastructure over what you’re already using, and git does exactly what you need for this role because it’s resilient, easy to manipulate, easy to move between machines, system-independent, and so on. Also unlike PGP (or SSL), it’s trivial to distrust an ID, revise a review, or so on. To err is human, and any human social system that doesn’t allow people to screw up and then fix their mistakes is doomed. But since proofs are plain files in version control, you just update the file and commit it, and that change propagates to everyone that is interested in your opinion the next time they do `cargo crev fetch trusted`.
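In practice, the least error-prone way to revise an opinion is probably to just issue a fresh review of the same crate – as I understand it, the newest proof from a given ID is the one that counts:

# Changed your mind about a crate you reviewed earlier?
cargo crev goto void    # back into the crate's source
cargo crev review       # write down your updated opinion
exit
cargo crev publish      # propagate the revision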
Also, it seems like `crev` should be resilient to compromised keys, and doesn’t have a single point of failure. If I lose my `crev` key, I can make a new ID, trust the old one, and life is good; people trusting me need to be informed to update, but that’s always gonna happen, and I can put the new ID in the same git repo. If my `crev` key is compromised and someone uses it to create spam… I still control my git repo, so I can just make sure they don’t have access to it as well as creating a new key. If someone gets control of my git repo, they don’t have my `crev` key, so they can’t add or alter proofs or trust new people, only remove things. Bad, but not fatal, and I’m going to have a local backup somewhere since it’s part of `crev` anyway. They could delete ALL your proofs and replace them with their own, but since other people trust your key and not your repo, that gives them the same problem. So, it seems pretty hard for an attacker to fuck up your life by compromising your `crev` setup; the `crev` key and git repo combine to make two-factor auth an architectural built-in, so the attacker has to get control of both to really do damage.
I’ve noticed a few ergonomic things to improve though, odd edge-case bugs aside. Most of what I do with `crev` doesn’t actually read or affect Rust code at all; all the ID management and such could be a separate tool, not a `cargo` plugin. It would be nice to have a general-purpose tool that only manages ID’s and proofs, and a special-purpose tool that’s specifically for Rust-related stuff. That would make it easier to make `crev`-based tools for other languages. I also don’t like that you have to be in the directory of a project using a particular crate before you can do `cargo crev goto dependency_name`, though it does need the data in the project dir to find the right version of the dependency to go to. The `cargo-crev` tool itself is really trying to do three separate things: manage ID’s and proofs, create new proofs, and assess a Rust package based on the proofs I know. No wonder it feels a bit clutter-y.
Well, there’s space for the tool to evolve, and it looks like it’s going in that direction anyway. The tool can change however it wants as long as the proof file format is the same, and the proof file format has a version number so it can change too.
All in all though… this is a heckin great system. And reading code is usually a lot easier than writing it, especially Rust code, and I tend to end up reading a lot of bits and pieces of people’s code anyway. So, having a way to formalize the process and share my findings is real useful. (After all, who doesn’t like to brag about their opinions?) The tooling is definitely rough, but generally usable, and I am in love with the power of the concept.
So what would we have to do to actually make `crev` useful in the future? There’s a lot of small steps:
- First, we need an option for `cargo update` or such that will make it attempt to ONLY use packages that have passing code reviews, or at least prefer the ones that do and warn if it’s forced to use one that doesn’t.
- Second, obviously, we need more people to use `crev`. This is the hard bit, which is why I’m writing this. Frankly, if everyone just checks `cargo crev verify` semi-consistently, that would be a real good start.
- Next, if we get people into the habit of publishing code reviews for the crates that they create, that would be awesome. First, the presence of a review by an author can serve as an indicator that the crate is intended to be taken seriously instead of just being some random experiment, and second, it would mark a particular crypto key as “this is the author of this crate”, which is a useful connection to have anyway. This would make it easier to automatically catch cases like the rubygems `rest-client` attack that made me kick off this whole investigation.
- We need to treat a library being un-reviewed as a bug, and file issues for crates that aren’t reviewed at least by their author. This will help with the previous points, again just by making people aware of the problem; responsible authors will want their code to be reviewed, and you don’t want to use the code of irresponsible authors.
- We need `crev` tools for non-Rust codebases. The infrastructure in terms of proofs and such is all language-agnostic, we just need to start pushing it into other communities. Everyone will benefit.
- Frankly, if people want to make their own proofs, just doing one `crev` code review a week will add up fast. There’s a whole lot of small but key crates out there.
So yeah, you should use `crev`! My public ID is `lr2ldir9XdBsKQkW3YGpRIO2pxhtSucdzf3M5ivfv4A` and my proofs repo is `https://git.sr.ht/~icefox/crev-proofs`. ’Cause, you totally trust my opinion, right?
Cheatsheet
Setup:
# Create an ID
cargo crev new id --url https://git.sr.ht/<whoever>/crev-proofs
# Trust someone you trust
cargo crev trust <some key>
# Or start with someone's git repo...
cargo crev fetch url https://github.com/dpc/crev-proofs
# ...find their ID...
cargo crev query id all
# and then mark it trusted
cargo crev trust <whatever>
Basic workflow:
cd your_project
# Update your trusted repos
cargo crev fetch trusted
cargo crev verify
# Find a dependency that looks easy, important, or both
cargo crev goto some_dependency
# Look at stuff, decide what you think
cargo crev review
# write your review, save, quit
exit
# Push your changes
cargo crev publish
# And, you're done!