QBE Notes

So QBE hit 1.0 recently and hey, it looks like the sort of thing I want for my compiler experiment. I'm a big believer in the value of the simple 90% solution over the complex 100% solution whenever possible, so a simple compiler backend that does most of the important optimizations seems like my kind of jam. So I decided to try writing a simple backend for my compiler (which currently has no backend of any other sort) that outputs QBE code, and see how it works out in practice. I'm writing down my thoughts on it, because someone is sure to ask.

What is QBE? It's a compiler backend "that aims to provide 70% of the performance of industrial optimizing compilers in 10% of the code". Basically, a high-level assembler with optimization passes. You can find it at https://c9x.me/compile/code.html

TODO: Incorporate feedback from https://lobste.rs/s/je3o8m/qbe_notes and stuff

Minor things

It takes text files as input rather than providing a library to link against, which is a bit of a mixed blessing. For my purposes it's fine.

I low-key hate the C-style type names: i32 is w for "word", i64 is l for "long", f32 is s for "single", f64 is d for "double". But fuck it, the computer is the one who has to deal with them; I just have to output them.

There's no explicit pointer/address type? That seems a little odd; the docs say to just use an integer. …it only supports 64-bit targets, so your pointers are always i64. That's… actually not unreasonable, considering its problem scope is "hobbyist-level". It might make adding a wasm32 backend Interesting some day, or compiling for embedded platforms (i.e. ARM32 or 32-bit RISC-V); 32-bit isn't completely dead yet! However, the docs are written as if 32-bit support might be considered in the future, so we'll see how it goes. 32-bit platforms generally have different ABIs than 64-bit ones, so there's not necessarily much to be gained by pretending that they're the same thing.
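Since those letters are all the backend ever sees, the frontend mostly needs one small mapping function. Here's a minimal C sketch, assuming a hypothetical FrontType enum standing in for the frontend's own type representation. Pointers map to l because QBE 1.0 only targets 64-bit platforms, and the sub-word types b and h get widened to w anywhere they aren't allowed.

```c
#include <assert.h>

/* Hypothetical frontend type tags; a real compiler's types will differ. */
typedef enum { TY_I8, TY_I16, TY_I32, TY_I64, TY_F32, TY_F64, TY_PTR } FrontType;

/* QBE type letter for a value of type t as stored in memory
 * (aggregate fields, data definitions, store instructions). */
static char qbe_mem_type(FrontType t)
{
    switch (t) {
    case TY_I8:  return 'b';
    case TY_I16: return 'h';
    case TY_I32: return 'w';
    case TY_I64: return 'l';
    case TY_F32: return 's';
    case TY_F64: return 'd';
    case TY_PTR: return 'l'; /* no pointer type: use a 64-bit integer */
    }
    assert(!"unknown frontend type");
    return '?';
}

/* QBE base type for temporaries and (in QBE 1.0) function arguments:
 * byte and half-word integers are widened to a word. */
static char qbe_base_type(FrontType t)
{
    char c = qbe_mem_type(t);
    return (c == 'b' || c == 'h') ? 'w' : c;
}
```

Keeping the memory type and the value type as two separate functions mirrors the split in QBE itself, where b and h only show up in memory-facing places.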

On my Pinebook Pro it emitted ARM64 assembly automatically, which is nice because I'd literally not considered that it could be a problem. However, assembling and linking the ARM64 asm it outputs requires passing the -no-pie flag to your C compiler. If you don't, you get a baffling linker error message about -fPIE that points you in entirely the wrong direction. I guess the asm it outputs is not position-independent? Why not? It appears that Linux on ARM64 expects position-independent executables by default, or at least GCC+LD does. It would be nice if the QBE docs mentioned that somewhere. I tried several different flags and only figured out the right one because someone on the mailing list coincidentally asked about a similar problem the next day.

There are "half-word" (h) and "byte" (b) types, but they can't be used in function args. They can only be used in aggregate (struct) types and data definitions. Assuming that function args are always passed in registers, this sorta makes sense, I think? Feels a little weird though. It may make the frontend work a little harder than it needs to: you have to explicitly widen everything to words to call functions, and then use the correct-size loads and stores inside the functions anyway.
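Concretely, passing a byte to a function means loading it with a widening load and passing a w. Here's a minimal C sketch of a frontend helper that emits that IL as text; the function name, the temporary names, and the callee name are all made up for illustration, and it assumes the byte should be zero-extended (loadub; a signed byte would use loadsb).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical emitter: writes QBE IL that loads an unsigned byte from the
 * address in ptr, zero-extending it into the word temporary tmp, and then
 * passes that word to fn. The argument is typed w because QBE 1.0 does not
 * accept b in argument lists. Returns snprintf's result. */
static int emit_byte_arg_call(char *buf, size_t len,
                              const char *tmp, const char *ptr, const char *fn)
{
    assert(tmp[0] == '%');           /* QBE temporaries use the % sigil */
    assert(strchr(fn, '$') == NULL); /* pass the bare name; $ is added here */
    return snprintf(buf, len,
                    "    %s =w loadub %s\n"
                    "    call $%s(w %s)\n",
                    tmp, ptr, fn, tmp);
}
```

Called as emit_byte_arg_call(buf, sizeof buf, "%t", "%p", "takes_u8"), it writes a loadub into %t followed by a call passing w %t. Inside the callee, writing the value back to memory would then use storeb to get the narrow size again.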

Actual problems

Aha, I found an actual criticism: there's no add-with-carry instruction! Or subtract-with-borrow. So you can't implement i128 types nicely.
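The workaround is the usual one: split the value into two 64-bit halves and recover the carry with an unsigned comparison, since a wrapped sum is smaller than its operand exactly when the add overflowed. Here's a minimal C sketch of the arithmetic a frontend would have to emit; in QBE IL each 128-bit add turns into several instructions (adds plus an unsigned compare such as cultl) instead of two. The U128 type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit unsigned integer as two 64-bit halves, i.e. two QBE l values. */
typedef struct { uint64_t lo, hi; } U128;

/* Add without a carry flag: the carry out of the low half is recovered by
 * checking whether the wrapped low sum is smaller than an operand. */
static U128 u128_add(U128 a, U128 b)
{
    U128 r;
    r.lo = a.lo + b.lo;
    uint64_t carry = r.lo < a.lo;
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Subtract without a borrow flag: the low half borrows exactly when
 * a.lo < b.lo. */
static U128 u128_sub(U128 a, U128 b)
{
    U128 r;
    r.lo = a.lo - b.lo;
    uint64_t borrow = a.lo < b.lo;
    r.hi = a.hi - b.hi - borrow;
    return r;
}
```

It works, but it's an extra compare and add per operation that an adc/sbb instruction would make unnecessary.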

Functions have an optional env parameter that is an i64? It's hard to figure out what it actually does. The docs just say: "The intended use of this feature is to pass the environment pointer of closures while retaining a very good compatibility with C." "An environment parameter can be passed as first argument using the env keyword. The passed value must be a 64-bit integer. If the called function does not expect an environment parameter, it will be safely discarded." Oooookay, so it's obviously intended for closure environments, but how does this make life easier than just turning your closures into functions that take the environment as an explicit argument? That's what my compiler already does anyway.
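For comparison, here's the explicit-argument approach (plain closure conversion) as a minimal C sketch. The names are made up for illustration and this isn't code from my compiler; the point is that the environment is just an ordinary first parameter, which in QBE terms is an ordinary l argument, with no special calling convention involved.

```c
#include <assert.h>
#include <stdint.h>

/* A closure as an explicit (code, environment) pair. */
typedef struct {
    int64_t (*code)(void *env, int64_t arg);
    void *env;
} Closure;

/* Calling a closure: pass the environment as a normal first argument. */
static int64_t closure_call(Closure c, int64_t arg)
{
    return c.code(c.env, arg);
}

/* Hypothetical closure body for a closure that captured one integer n. */
typedef struct { int64_t n; } AdderEnv;

static int64_t adder_code(void *env, int64_t arg)
{
    const AdderEnv *e = env;
    return e->n + arg;
}
```

Because every closure body takes the environment explicitly, a call site can never forget it: the argument count simply wouldn't match.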

Okay okay. Let's start from the top and play with this env thing a little. Per the docs, you can define a function and call it with an env parameter, and if the callee doesn't declare an env parameter then it's just discarded. So you can do this:

data $str = { b "%ld", b 0 }

function $print_i64(l %i) {
@start
    call $printf(l $str, ..., l %i)
    ret
}
export function w $main() {
@start
    %e =l copy 99
    call $print_i64(env %e, l 12)
    ret 0
}

On x86-64, this generates the following assembly. Sorry for the ugly AT&T syntax.

.data
.balign 8
str:
        .ascii "%ld"
        .byte 0
/* end data */

.text
print_i64:
        pushq %rbp
        movq %rsp, %rbp
        movq %rdi, %rsi
        leaq str(%rip), %rdi
        movl $0, %eax
        callq printf
        leave
        ret
.type print_i64, @function
.size print_i64, .-print_i64
/* end function print_i64 */

.text
.globl main
main:
        pushq %rbp
        movq %rsp, %rbp
        movl $12, %edi
        movl $99, %eax
        callq print_i64
        movl $0, %eax
        leave
        ret
.type main, @function
.size main, .-main
/* end function main */

.section .note.GNU-stack,"",@progbits

Ok, we see our 99 being put into eax/rax, which per the System V ABI is a temporary register: the callee can clobber it and that's fine. So this technically violates the ABI by sneaking in an extra function argument, but does so in a way that can't break anything that doesn't know about it. Okay, that's cute I guess. So what happens if you have a function with an env parameter and don't pass it one?

data $str = { b "%ld", b 0 }

function $print_i64(env %e, l %i) {
@start
    %foo =l add %e, %i
    call $printf(l $str, ..., l %foo)
    ret
}
export function w $main() {
@start
    call $print_i64(l 12)
    ret 0
}

A quick qbe closure_fiddling.ssa and it… ah, it silently generates broken code: the callee reads the env value from a register that the caller never sets.

.data
.balign 8
str:
        .ascii "%ld"
        .byte 0
/* end data */

.text
print_i64:
        pushq %rbp
        movq %rsp, %rbp
        movq %rax, %rsi       # rax is read here...
        addq %rdi, %rsi
        leaq str(%rip), %rdi
        movl $0, %eax
        callq printf
        leave
        ret
.type print_i64, @function
.size print_i64, .-print_i64
/* end function print_i64 */

.text
.globl main
main:
        pushq %rbp
        movq %rsp, %rbp
        movl $12, %edi
        callq print_i64       # But nothing before this call sets rax or eax to anything!
        movl $0, %eax
        leave
        ret
.type main, @function
.size main, .-main
/* end function main */

.section .note.GNU-stack,"",@progbits

It does similar things on the ARM64 and RV64 backends. Okay, good to know: don't use the env feature. I filed a bug report. Not a deal breaker; I've committed worse sins, and as I said, my compiler already does the work that an env arg would get used for.
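The check that's missing is small. Here's a minimal C sketch of what a frontend could verify before emitting a call, if it did use env; the CallSite struct and function name are hypothetical, not part of QBE. It only rejects the one mismatch that produced broken code in the experiment above, since passing an env to a callee that doesn't declare one is documented as safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of a call site, as a frontend might track it. */
typedef struct {
    bool callee_takes_env; /* callee was declared with an env parameter */
    bool call_passes_env;  /* call site supplies "env %x" */
} CallSite;

/* Unsafe only when the callee expects an env the caller didn't pass:
 * the callee would then read a register the caller never set. */
static bool env_call_is_safe(CallSite c)
{
    return !(c.callee_takes_env && !c.call_passes_env);
}
```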

Conclusions

All in all so far: Much more than a toy, much less than what I would consider “industrial grade”. It’s a tough niche to target. I don’t hate it.

Okay, I take it all back, because I actually looked at the code to find where the env check should have been happening and wasn't. That was tricky 'cause the code is dense and weird and full of single-letter variables, globals, and other nonsense. I was starting to get discouraged, then halfway through the parser I stumbled across this:

    switch (t) {
    default:
        if (isstore(t)) {
        case Tcall:
        case Ovastart:
            /* operations without result */
            r = R;
            k = Kw;
            op = t;
            goto DoOp;
        }
        err("label, instruction or jump expected");
    case ...:
    ...
    }
DoOp:
    ...

So… we have a switch statement whose first label is default, which contains an if statement with case labels inside its body so that other cases jump directly into it. That body sets some magical single-letter variables and goto's a label most of the way to the end of the function. And that's not just a goto Error; type thing either: it does some complicated logic and then maybe goto's another thing further down.
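For anyone wondering whether that's even legal C: it is. Case labels can appear anywhere inside a switch body, including inside a nested if block, which is the same trick Duff's device relies on. Here's a self-contained toy version of the pattern, with made-up token names standing in for QBE's, just to show which paths reach the shared block.

```c
#include <assert.h>

/* Hypothetical token codes standing in for QBE's Tcall, Ovastart, etc. */
enum { TOK_CALL, TOK_VASTART, TOK_STORE, TOK_JUMP, TOK_BOGUS };
enum { CLS_NO_RESULT, CLS_JUMP, CLS_ERROR };

static int is_store(int t) { return t == TOK_STORE; }

static int classify(int t)
{
    switch (t) {
    default:
        if (is_store(t)) {
    case TOK_CALL:
    case TOK_VASTART:
            /* Reached from default when is_store(t) holds, or directly
             * via the case labels, skipping the if test entirely. */
            return CLS_NO_RESULT;
        }
        return CLS_ERROR; /* default, but not a store */
    case TOK_JUMP:
        return CLS_JUMP;
    }
}
```

Legal, yes. Readable, no: you have to trace three different entry points into the same block to know what state the function is in.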

New plan: Let’s not use QBE. If I can’t find and fix a simple bug because the code is absolutely insane, then I’m not going to rely on it.