QbeNotes
So QBE hit 1.0 recently and hey, it looks like the sort of thing that I want for my compiler experiment. I’m a big believer in the value of the simple 90% solution over the complex 100% solution whenever possible, so a simple compiler backend that does most of the most important optimizations seems like my kind of jam. So I decided to try to write a simple backend for my compiler (given that currently there’s no other sort) to output QBE code and see how it works out in practice. I’m writing down my thoughts on it, because someone is sure to ask.
What is QBE? It’s a compiler backend “that aims to provide 70% of the performance of industrial optimizing compilers in 10% of the code”. Basically, high-level assembler with optimization passes. You can find it at https://c9x.me/compile/code.html
TODO: Incorporate feedback from https://lobste.rs/s/je3o8m/qbe_notes and stuff
Minor things
It takes text files as an input instead of providing a library to link to, which is a little bit of a mixed blessing. For my purposes it’s fine.
I low key hate the C style type names; i32 is w for “word”, i64 is l for “long”, f32 is s for “single”, etc. But fuck it, the computer is the one who has to deal with them, I just have to output them.
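The whole mapping fits in one switch on the frontend side. A minimal sketch in C, assuming a frontend that has its own type enum; `src_type` and `qbe_base_type` are made-up names for illustration, not anything QBE provides. The `d` for doubles is the fourth base type letter in the QBE docs.

```c
#include <assert.h>

/* Hypothetical source-level types of the frontend (not part of QBE). */
enum src_type { TY_I32, TY_I64, TY_F32, TY_F64 };

/* Return the one-letter QBE base type used for temporaries and arguments. */
char qbe_base_type(enum src_type t) {
    switch (t) {
    case TY_I32: return 'w'; /* word: 32-bit integer */
    case TY_I64: return 'l'; /* long: 64-bit integer, also used for pointers */
    case TY_F32: return 's'; /* single-precision float */
    case TY_F64: return 'd'; /* double-precision float */
    }
    assert(0 && "unknown source type");
    return '?';
}
```

With this in place, emitting a typed temporary is a matter of printing something like `%x =` followed by `qbe_base_type(ty)`.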
There’s no explicit pointer/address type? Seems a little odd, it says in the docs to just use an integer. …it only supports 64-bit backends, so your pointers are always i64. That’s… actually not unreasonable, considering its problem scope is “hobbyist-level”. Might make adding a wasm32 backend Interesting some day. Or compiling for embedded platforms (e.g. ARM32 or RISCV32). 32-bit isn’t completely dead yet, it seems! However, the docs are written as if 32-bit support will be considered in the future, so we’ll see how it goes. 32-bit platforms generally have different ABIs than 64-bit ones, so there’s not necessarily much to be gained by pretending that they’re the same thing.
On my Pinebook Pro it output ARM64 assembly code automatically, which is nice because I’d literally not considered that it could have been a problem. However, compiling/linking the ARM64 asm it outputs requires the -no-pie flag passed to your C compiler. If you don’t do that, you get a crazy linker error message about -fPIE that points you in entirely the wrong direction. I guess the asm it outputs is not position-independent? Why not? It appears that Linux on ARM64 expects position-independent executables by default, or at least GCC+LD does. Would be nice if the QBE docs said something about that somewhere. I tried several different flags but only figured out the right one because someone on the mailing list coincidentally asked about a similar problem a day later.
There are “half-word” (h) and “byte” (b) types, but they can’t be used in function args. They can only be used in struct definitions. Assuming that function args are always in registers, this sorta makes sense, I think? Feels a little weird though. Maybe makes the compiler work a little harder than it needs to: you gotta explicitly widen everything to words to call functions and then use the correct size loads/stores inside the functions anyway.
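In practice the extra work is mostly picking the right instruction name at each boundary. A rough sketch in C of how a frontend might do that, assuming the sub-word value sits in a word temporary whose upper bits can't be trusted; the helper names are hypothetical, while the instruction names (extsb, extub, extsh, extuh, storeb, storeh) are the ones listed in the QBE IL docs.

```c
#include <assert.h>
#include <stdbool.h>

/* Pick the QBE instruction that widens an 8- or 16-bit value to a word. */
const char *qbe_extend_op(int bits, bool is_signed) {
    assert(bits == 8 || bits == 16);
    if (bits == 8)
        return is_signed ? "extsb" : "extub";
    return is_signed ? "extsh" : "extuh";
}

/* Pick the QBE store that writes only the low 8 or 16 bits to memory. */
const char *qbe_narrow_store_op(int bits) {
    assert(bits == 8 || bits == 16);
    return bits == 8 ? "storeb" : "storeh";
}
```

A signed byte argument would then be passed as a w after emitting a line of the form `%t =w extsb %x`, and written back to a struct field with `storeb`.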
Actual problems
Aha, I found an actual criticism: No add-with-carry instruction! Or subtract-with-borrow. Can’t implement i128 types nicely.
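The not-nice version is still possible: without a carry flag, the carry has to be recovered with an unsigned comparison after the low-half add. A minimal sketch in C of what a frontend would have to lower a 128-bit add or subtract into; `struct u128` and the function names are made up for illustration. In QBE IL this would presumably come out as an add, an unsigned less-than compare, and two more adds, versus an add/adc pair on x86-64.

```c
#include <stdint.h>

/* Hypothetical 128-bit integer stored as two 64-bit halves. */
struct u128 { uint64_t lo, hi; };

struct u128 u128_add(struct u128 a, struct u128 b) {
    struct u128 r;
    r.lo = a.lo + b.lo;           /* wraps modulo 2^64 */
    uint64_t carry = r.lo < a.lo; /* 1 exactly when the low add wrapped */
    r.hi = a.hi + b.hi + carry;
    return r;
}

struct u128 u128_sub(struct u128 a, struct u128 b) {
    struct u128 r;
    uint64_t borrow = a.lo < b.lo; /* 1 exactly when the low subtract wraps */
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - borrow;
    return r;
}
```

Whether the backend can fuse the compare back into a carry-using instruction is up to its optimizer; nothing in the notes above suggests QBE does.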
Functions have an optional env parameter that is an i64? It’s hard to figure out what it actually does. The docs just say: “The intended use of this feature is to pass the environment pointer of closures while retaining a very good compatibility with C.” “An environment parameter can be passed as first argument using the env keyword. The passed value must be a 64-bit integer. If the called function does not expect an environment parameter, it will be safely discarded.” Oooookay so it’s obviously intended for closure environments but how does this make life easier than just turning your closures into functions that take the environment as an explicit arg? This is what my compiler does already anyway.
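For reference, the explicit-argument approach looks roughly like this in C. This is a sketch of ordinary closure conversion, not the author's actual compiler output, and every name in it is hypothetical. The one visible cost is the thunk at the bottom: a plain function needs a wrapper that accepts and ignores the environment, which is presumably the step the env keyword is meant to make unnecessary.

```c
#include <assert.h>
#include <stdint.h>

/* Captured variables of a closure that adds n to its argument. */
struct adder_env { int64_t n; };

/* Lifted closure body: the environment is an ordinary first argument. */
int64_t adder_call(void *env, int64_t x) {
    const struct adder_env *e = env;
    return x + e->n;
}

/* A closure value is a code pointer paired with its environment. */
struct closure {
    int64_t (*fn)(void *env, int64_t x);
    void *env;
};

int64_t closure_invoke(const struct closure *c, int64_t x) {
    assert(c && c->fn);
    return c->fn(c->env, x);
}

/* A plain function needs a thunk before it can be used as a closure. */
int64_t twice(int64_t x) { return 2 * x; }
int64_t twice_thunk(void *env, int64_t x) { (void)env; return twice(x); }
```

Calling `closure_invoke` on a closure built from `adder_call` with `n = 99` and argument 12 yields 111.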
Okay okay. Let’s start from the top and play with this env thing a little. Per the docs, you can define a function and call it with an env parameter, and if the callee doesn’t use the env parameter then it just ignores it. So you can do this:
data $str = { b "%ld", b 0 }
function $print_i64(l %i) {
@start
call $printf(l $str, ..., l %i)
ret
}
export function w $main() {
@start
%e =l copy 99
call $print_i64(env %e, l 12)
ret 0
}
This generates the following asm code. Sorry for the ugly AT&T syntax.
.data
.balign 8
str:
.ascii "%ld"
.byte 0
/* end data */
.text
print_i64:
pushq %rbp
movq %rsp, %rbp
movq %rdi, %rsi
leaq str(%rip), %rdi
movl $0, %eax
callq printf
leave
ret
.type print_i64, @function
.size print_i64, .-print_i64
/* end function print_i64 */
.text
.globl main
main:
pushq %rbp
movq %rsp, %rbp
movl $12, %edi
movl $99, %eax
callq print_i64
movl $0, %eax
leave
ret
.type main, @function
.size main, .-main
/* end function main */
.section .note.GNU-stack,"",@progbits
Ok, we see our 99 being put into eax/rax, which per the System-V ABI is a caller-saved scratch register; that is, the callee can clobber it and it’s fine. So this technically violates the ABI by sneaking in an extra function argument, but does so in a way that can’t break anything that doesn’t know about it. Okay, that’s cute I guess. So what happens if you have a function with an “env” parameter and don’t pass it one?
data $str = { b "%ld", b 0 }
function $print_i64(env %e, l %i) {
@start
%foo =l add %e, %i
call $printf(l $str, ..., l %foo)
ret
}
export function w $main() {
@start
call $print_i64(l 12)
ret 0
}
A quick qbe closure_fiddling.ssa and it… ah, silently generates invalid code.
.data
.balign 8
str:
.ascii "%ld"
.byte 0
/* end data */
.text
print_i64:
pushq %rbp
movq %rsp, %rbp
movq %rax, %rsi # rax is read here...
addq %rdi, %rsi
leaq str(%rip), %rdi
movl $0, %eax
callq printf
leave
ret
.type print_i64, @function
.size print_i64, .-print_i64
/* end function print_i64 */
.text
.globl main
main:
pushq %rbp
movq %rsp, %rbp
movl $12, %edi
callq print_i64 # But nothing before this call sets rax or eax to anything!
movl $0, %eax
leave
ret
.type main, @function
.size main, .-main
/* end function main */
.section .note.GNU-stack,"",@progbits
It does similar things on the ARM64 and RV64 backends. Okay, good to know: don’t use the env feature. Bug report filed here. Not a deal breaker; I’ve committed worse sins, and as I said, my compiler already does the work that an env arg would get used for.
Conclusions
All in all so far: Much more than a toy, much less than what I would consider “industrial grade”. It’s a tough niche to target. I don’t hate it.
Okay I take it all back, because I actually looked at the code trying to find where the env check should be happening and wasn’t. It was tricky ’cause the code is dense and weird and full of single-letter variables, globals, and other nonsense. I was starting to get discouraged, then halfway through the parser I stumbled across this:
switch (t) {
default:
    if (isstore(t)) {
    case Tcall:
    case Ovastart:
        /* operations without result */
        r = R;
        k = Kw;
        op = t;
        goto DoOp;
    }
    err("label, instruction or jump expected");
case ...:
    ...
}
DoOp:
    ...
So… we have a switch statement that has default as the first member, which has an if statement that contains cases to jump directly into it, sets some magical single-letter variables and goto’s something most of the way to the end of the function. And that’s not just a goto Error; type thing either, it does some complicated logic and then maybe goto’s another thing further down.
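For anyone who hasn’t run into this before: C allows case labels anywhere inside the switch body, including inside a nested if block (the same rule that makes Duff’s device legal). A cut-down sketch of the same shape, with made-up token names and return codes rather than QBE’s real ones, just to show which paths end up where:

```c
/* Hypothetical token kinds; these are not QBE's real ones. */
enum tok { T_STOREW, T_CALL, T_VASTART, T_ADD, T_JUNK };

static int is_store(enum tok t) { return t == T_STOREW; }

/* Returns 1 for "operation without result", 2 for add, -1 for an error. */
int classify(enum tok t) {
    int kind;
    switch (t) {
    default:
        if (is_store(t)) {
    case T_CALL:    /* jumps straight into the body of the if */
    case T_VASTART:
            kind = 1;
            goto done;
        }
        return -1;  /* default path when the token is not a store */
    case T_ADD:
        kind = 2;
        break;
    }
done:
    return kind;
}
```

Stores reach the shared body through default and the if test, calls and vastart jump into the middle of it without ever evaluating the test, and anything else unrecognized falls out of the if into the error path.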
New plan: Let’s not use QBE. If I can’t find and fix a simple bug because the code is absolutely insane, then I’m not going to rely on it.