QbeNotes
So QBE hit 1.0 recently and hey, it looks like the sort of thing that I want for my compiler experiment. I’m a big believer in the value of the simple 90% solution over the complex 100% solution whenever possible, so a simple compiler backend that does most of the important optimizations seems like my kind of jam. So I decided to try to write a simple backend for my compiler (given that currently there’s no other sort) to output QBE code and see how it works out in practice. I’m writing down my thoughts on it, because someone is sure to ask.
What is QBE? It’s a compiler backend “that aims to provide 70% of the performance of industrial optimizing compilers in 10% of the code”. Basically, a high-level assembler with optimization passes. You can find it at https://c9x.me/compile/code.html
TODO: Incorporate feedback from https://lobste.rs/s/je3o8m/qbe_notes and stuff
Minor things
It takes text files as input instead of providing a library to link against, which is a bit of a mixed blessing. For my purposes it’s fine.
I low-key hate the C-style type names: i32 is w for “word”, i64 is l for “long”, f32 is s for “single”, etc. But fuck it, the computer is the one who has to deal with them; I just have to output them.
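Since QBE takes text, the type names really are just something the frontend prints. Here’s a minimal C sketch of that mapping, assuming a hypothetical frontend with its own type tags (frontend_type, qbe_base_type and emit_add are made-up names, not anything from QBE); d for “double” is QBE’s fourth base type.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical type tags from some frontend, not from QBE itself. */
typedef enum { TY_I32, TY_I64, TY_F32, TY_F64 } frontend_type;

/* Map a frontend type to the QBE base type letter used in the IL text. */
static char qbe_base_type(frontend_type t) {
    switch (t) {
    case TY_I32: return 'w'; /* "word", 32-bit integer */
    case TY_I64: return 'l'; /* "long", 64-bit integer */
    case TY_F32: return 's'; /* "single", 32-bit float */
    case TY_F64: return 'd'; /* "double", 64-bit float */
    }
    assert(!"unknown frontend type");
    return '?';
}

/* Write one QBE instruction of the form `%dst =T add %a, %b` into buf.
   Returns what snprintf returns (characters that would be written). */
static int emit_add(char *buf, size_t n, const char *dst, frontend_type t,
                    const char *a, const char *b) {
    return snprintf(buf, n, "%%%s =%c add %%%s, %%%s\n",
                    dst, qbe_base_type(t), a, b);
}
```

For example, emit_add(buf, sizeof buf, "x", TY_I64, "a", "b") writes %x =l add %a, %b followed by a newline into buf.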
There’s no explicit pointer/address type? Seems a little odd; the docs say to just use an integer. …It only supports 64-bit backends, so your pointers are always i64. That’s… actually not unreasonable, considering its problem scope is “hobbyist-level”. It might make adding a wasm32 backend Interesting some day, or compiling for embedded platforms such as ARM32 or RISCV32. 32-bit isn’t completely dead yet, it seems! However, the docs are written as if 32-bit support will be considered in the future, so we’ll see how it goes. 32-bit platforms generally have different ABIs than 64-bit ones, so there’s not necessarily much to be gained by pretending that they’re the same thing.
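To make “just use an integer” concrete: with no pointer type, field access in the IL is integer arithmetic on an l value followed by a load. A rough C equivalent, assuming a 64-bit target where a pointer fits in 64 bits (the boxed struct and load_value are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct whose field address we compute "by hand". */
typedef struct {
    int32_t tag;
    int64_t value;
} boxed;

/* Field access as plain 64-bit integer arithmetic, the way QBE IL does it.
   The offset is 8 on the usual 64-bit ABIs. */
static int64_t load_value(const boxed *b) {
    assert(sizeof(void *) <= sizeof(uint64_t)); /* the 64-bit assumption */
    uint64_t base = (uint64_t)(uintptr_t)b;        /* pointer -> plain integer */
    uint64_t addr = base + offsetof(boxed, value); /* like `%p =l add %b, 8` */
    return *(const int64_t *)(uintptr_t)addr;      /* like `%v =l loadl %p` */
}
```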
On my Pinebook Pro it output ARM64 assembly automatically, which is nice because I’d literally not considered that it could have been a problem. However, assembling and linking the ARM64 asm it outputs requires passing the -no-pie flag to your C compiler. If you don’t, you get a crazy linker error message about -fPIE that points you in entirely the wrong direction. I guess the asm it outputs is not position-independent? Why not? It appears that Linux on ARM64 expects position-independent executables by default, or at least GCC+LD does. It would be nice if the QBE docs said something about that somewhere. I tried several different flags but only figured out the right one because someone on the mailing list coincidentally asked about a similar problem a day later.
There are “half-word” (h) and “byte” (b) types, but they can’t be used in function args; they can only be used in struct and data definitions. Assuming that function args are always in registers, this sorta makes sense, I think? Feels a little weird though. Maybe it makes the compiler work a little harder than it needs to: you gotta explicitly widen everything to words to call functions and then use the correct-size loads/stores inside the functions anyway.
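Here’s roughly what that widening looks like from the frontend’s side, as a C sketch: a byte has to be sign- or zero-extended to a word before the call (QBE’s extsb/extub instructions), and narrowed back down when it’s stored (storeb writes only the low 8 bits). The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Caller side: a signed byte must be sign-extended to a word (like extsb). */
static int32_t widen_signed_byte(int8_t b) {
    return (int32_t)b;
}

/* Caller side: an unsigned byte must be zero-extended to a word (like extub). */
static uint32_t widen_unsigned_byte(uint8_t b) {
    return (uint32_t)b;
}

/* Callee side: narrow the word again for a byte-sized store (like storeb),
   keeping only the low 8 bits. */
static void store_byte(uint8_t *dst, uint32_t w) {
    assert(dst != NULL);
    *dst = (uint8_t)(w & 0xFFu);
}
```

Picking the wrong extension silently changes values: the byte 0xFF is -1 when sign-extended but 255 when zero-extended, so the frontend has to carry signedness through to the call site.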
Actual problems
Aha, I found an actual criticism: no add-with-carry instruction! Or subtract-with-borrow. Can’t implement i128 types nicely.
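Without a carry flag, the usual workaround is to recover the carry with an unsigned comparison. A C sketch of what a frontend would have to emit, assuming i128 is lowered to a pair of 64-bit halves (u128, u128_add and u128_sub are hypothetical names). In QBE IL each operation becomes a few adds or subs plus an unsigned compare such as cultl, instead of a single adc/sbb.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit integer split into two 64-bit (QBE `l`) halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128;

/* Add without a carry flag: the low half wrapped around exactly when the
   result is smaller than one of the inputs. */
static u128 u128_add(u128 a, u128 b) {
    u128 r;
    r.lo = a.lo + b.lo;
    uint64_t carry = r.lo < a.lo; /* 1 if the low half overflowed */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Subtract without a borrow flag: the low half borrows when a.lo < b.lo. */
static u128 u128_sub(u128 a, u128 b) {
    u128 r;
    r.lo = a.lo - b.lo;
    uint64_t borrow = a.lo < b.lo; /* 1 if the low half underflowed */
    r.hi = a.hi - b.hi - borrow;
    return r;
}
```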
Functions have an optional env parameter that is an i64? It’s hard to figure out what it actually does. The docs just say: “The intended use of this feature is to pass the environment pointer of closures while retaining a very good compatibility with C.” “An environment parameter can be passed as first argument using the env keyword. The passed value must be a 64-bit integer. If the called function does not expect an environment parameter, it will be safely discarded.” Oooookay, so it’s obviously intended for closure environments, but how does this make life easier than just turning your closures into functions that take the environment as an explicit arg? That’s what my compiler does already anyway.
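For comparison, here’s a minimal C sketch of the explicit-argument approach, assuming a closure is lowered to a code pointer plus an environment pointer (closure, add_n_env, add_n_code and closure_call are hypothetical names, not necessarily how my compiler spells them). Nothing here needs special calling-convention support; the environment is just the first ordinary argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A closure as a (code, environment) pair. */
typedef struct closure {
    int64_t (*code)(void *env, int64_t arg);
    void *env;
} closure;

/* Environment for a closure that captured one variable, `n`. */
typedef struct {
    int64_t n;
} add_n_env;

/* The closure body, lifted out into a plain function that takes its
   environment as an explicit first argument. */
static int64_t add_n_code(void *env, int64_t x) {
    const add_n_env *e = env;
    return x + e->n;
}

/* Calling a closure is an ordinary indirect call with env prepended. */
static int64_t closure_call(closure c, int64_t arg) {
    assert(c.code != NULL);
    return c.code(c.env, arg);
}
```

Because the environment is a normal parameter, a function that doesn’t need one simply declares and ignores it, and the caller never has to know which kind of function it’s calling.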
Okay, okay. Let’s start from the top and play with this env thing a little. Per the docs, you can define a function and call it with an env parameter, and if the callee doesn’t use the env parameter then it just ignores it. So you can do this:
data $str = { b "%ld", b 0 }
function $print_i64(l %i) {
@start
	call $printf(l $str, ..., l %i)
	ret
}
export function w $main() {
@start
	%e =l copy 99
	call $print_i64(env %e, l 12)
	ret 0
}
This generates the following asm code. Sorry for the ugly AT&T syntax.
.data
.balign 8
str:
	.ascii "%ld"
	.byte 0
/* end data */
.text
print_i64:
	pushq %rbp
	movq %rsp, %rbp
	movq %rdi, %rsi
	leaq str(%rip), %rdi
	movl $0, %eax
	callq printf
	leave
	ret
.type print_i64, @function
.size print_i64, .-print_i64
/* end function print_i64 */
.text
.globl main
main:
	pushq %rbp
	movq %rsp, %rbp
	movl $12, %edi
	movl $99, %eax
	callq print_i64
	movl $0, %eax
	leave
	ret
.type main, @function
.size main, .-main
/* end function main */
.section .note.GNU-stack,"",@progbits
OK, we see our 99 being put into eax/rax, which per the System V ABI is a temporary (caller-saved) register: that is, the callee can clobber it and it’s fine. So this technically violates the ABI by sneaking in an extra function argument, but does so in a way that can’t break anything that doesn’t know about it. Okay, that’s cute I guess. So what happens if you have a function with an “env” parameter and don’t pass it one?
data $str = { b "%ld", b 0 }
function $print_i64(env %e, l %i) {
@start
	%foo =l add %e, %i
	call $printf(l $str, ..., l %foo)
	ret
}
export function w $main() {
@start
	call $print_i64(l 12)
	ret 0
}
A quick qbe closure_fiddling.ssa and it… ah, silently generates invalid code.
.data
.balign 8
str:
	.ascii "%ld"
	.byte 0
/* end data */
.text
print_i64:
	pushq %rbp
	movq %rsp, %rbp
	movq %rax, %rsi # rax is read here...
	addq %rdi, %rsi
	leaq str(%rip), %rdi
	movl $0, %eax
	callq printf
	leave
	ret
.type print_i64, @function
.size print_i64, .-print_i64
/* end function print_i64 */
.text
.globl main
main:
	pushq %rbp
	movq %rsp, %rbp
	movl $12, %edi
	callq print_i64 # But nothing before this call sets rax or eax to anything!
	movl $0, %eax
	leave
	ret
.type main, @function
.size main, .-main
/* end function main */
.section .note.GNU-stack,"",@progbits
It does similar things on the ARM64 and RV64 backends. Okay, good to know: don’t use the env feature. Bug report filed here.
Not a deal breaker; I’ve committed worse sins, and as I said, my compiler already does the work that an env arg would get used for.
Conclusions
All in all so far: Much more than a toy, much less than what I would consider “industrial grade”. It’s a tough niche to target. I don’t hate it.
Okay, I take it all back, because I actually looked at the code, trying to find where the env check should have been happening and wasn’t.
It was tricky ’cause the code is dense and weird and full of single-letter variables, globals, and other nonsense. I was starting to get discouraged, and then halfway through the parser I stumbled across this:
switch (t) {
default:
	if (isstore(t)) {
	case Tcall:
	case Ovastart:
		/* operations without result */
		r = R;
		k = Kw;
		op = t;
		goto DoOp;
	}
	err("label, instruction or jump expected");
case ...:
	...
}
DoOp:
	...
So… we have a switch statement that has default as its first label, and the default branch is an if statement with case labels inside its body, so those cases jump directly into the middle of it. That block sets some magical single-letter variables and goto’s something most of the way to the end of the function. And that’s not just a goto Error; type thing either: it does some complicated logic and then maybe goto’s another thing further down.
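If that looks like it shouldn’t even compile: it does, since case labels can sit anywhere inside the switch body, including inside a nested if (the same trick Duff’s device relies on). A stripped-down, standalone C sketch of the same control-flow shape, with hypothetical token names standing in for Tcall, Ovastart and the store ops:

```c
#include <assert.h>

/* Hypothetical token kinds standing in for QBE's Tcall, Ovastart, etc. */
enum { TOK_CALL, TOK_VASTART, TOK_STORE, TOK_ADD, TOK_JUNK };
enum { KIND_ERROR, KIND_NO_RESULT, KIND_WITH_RESULT };

static int is_store(int t) { return t == TOK_STORE; }

/* Same shape as the parser excerpt: case labels inside the body of an if
   that itself sits under `default:`. */
static int classify(int t) {
    int kind = KIND_ERROR;
    switch (t) {
    default:
        if (is_store(t)) {
    case TOK_CALL:
    case TOK_VASTART:
            /* Reached two ways: from `default:` when the if test passes,
               or by jumping straight here from the case labels above,
               which skips the is_store() test entirely. */
            kind = KIND_NO_RESULT;
            goto done;
        }
        return KIND_ERROR;
    case TOK_ADD:
        kind = KIND_WITH_RESULT;
        break;
    }
done:
    assert(kind != KIND_ERROR); /* both paths into done set a real kind */
    return kind;
}
```

So classify(TOK_CALL) never evaluates is_store(): the switch jumps past the if test straight into its body, while TOK_STORE only gets there through default and the test.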
New plan: Let’s not use QBE. If I can’t find and fix a simple bug because the code is absolutely insane, then I’m not going to rely on it.