TernaryComputer
ah shit this needs a name. The Soviet ternary computer was named Setun, after a river in Russia. I could name this after a river near me, and force everyone to try to pronounce Monongahela. Sure, let’s do it.
Monongahela
So I sort of want to make a virtual machine for a computer using balanced ternary, just to see how it’d work out. Signed vs. unsigned arithmetic is generally a big pain in the ass, but with balanced ternary that’s not a problem, because sign is built into the number system implicitly. You could make a ternary computer in non-balanced ternary, but balanced ternary is rad af so there’s no reason not to use it. So let’s make a typical Von Neumann architecture machine with Kleene logic.
Of course it’s going to be slow as fuck since we have to essentially emulate each trit on top of binary operations, but hey. This is an experiment.
This sort of thing is explored in MUCH MORE DEPTH by http://www.scribd.com/doc/78370674/Ternary-Computing-TestBed-3-Trit-Computer-Architecture which, in case that link vanishes someday, is “Ternary Computing Testbed 3-Trit Computer Architecture”, by Jeff Connelly, Advised by Professor Phillip Nico of the California Polytechnic State University of San Luis Obispo, published August 2008. Oooor see http://jeff.tk//trinary/CPE%20Report%20-%20Ternary%20Computing%20Testbed%20-%20RC6.pdf . There’s also other explorations here: https://homepage.cs.uiowa.edu/~dwjones/ternary/ , but they don’t use balanced ternary and so are officially Le Lame.
…someone should make an emulator for one of the old 36-bit computers, except it’s actually all ternary.
Balanced ternary
We have three digits: +, - and 0. + is positive one, - is negative
one, and 0 is zero. This has the nice property of letting the highest
nonzero trit always determine whether we have a positive or negative
number, all arithmetic is the same on signed and unsigned numbers, and a
few other things. Negative and positive can be switched by inverting the
number, swapping every +
and -
. They are also
symmetrical in their numerical range, so a word of a specific size has
an equal range into the negative and the positive,unlike 2’s-complement
binary.
So let’s write some of these to try to get a feel for how this works:
Decimal | Balanced ternary |
---|---|
-13 | --- |
-12 | --0 |
-11 | --+ |
-10 | -0- |
-9 | -00 |
-8 | -0+ |
-7 | -+- |
-6 | -+0 |
-5 | -++ |
-4 | 0-- |
-3 | 0-0 |
-2 | 0-+ |
-1 | 00- |
0 | 000 |
1 | 00+ |
2 | 0+- |
3 | 0+0 |
4 | 0++ |
5 | +-- |
6 | +-0 |
7 | +-+ |
8 | +0- |
9 | +00 |
10 | +0+ |
11 | ++- |
12 | ++0 |
13 | +++ |
One trit has a range of 3, two trits 9, 3 trits 27, etc. This leads to some weird-feeling min and max ranges like +/- 13, 40 and such, but you get used to it. Basically, for word size n, the range is plus/minus ((3^n)-1) / 2.
Word size | Range |
---|---|
1 trit (3^1) | -1 to +1 |
2 trits (3^2) | -4 to +4 |
3 trits (3^3) | -13 to +13 |
4 trits (3^4) | -40 to +40 |
5 trits (3^5) | -121 to +121 |
6 trits (3^6) | -364 to +364 |
7 trits (3^7) | -1093 to +1093 |
8 trits (3^8) | -3280 to +3280 |
9 trits (3^9) | -9841 to +9841 |
10 trits | +- 29,524 |
18 trits | +- 193,710,244 |
20 trits | +- 1,743,392,200, (1.7 billion) |
27 trits | +- 3,812,798,742,493 (3.8 trillion) |
30 trits | +- 102,945,566,047,324 (102 trillion) |
Comparison with handy binary numbers
Word size | Binary range near it |
---|---|
5 trits | 0.945 * 2^8 |
6 trits | 2.843 * 2^8 |
9 trits | 19.22 * 2^10 |
10 trits | 0.901 * 2^16 |
18 trits | 369.5 * 2^20 |
20 trits | 0.811 * 2^32 |
27 trits | 1775 * 2^32 |
30 trits | 0.000111 * 2^64 |
36 trits | 0.008 * 2^64 |
Shifts left and right will multiply and divide a number by three. Per design, signedness doesn’t exist, so sign extension doesn’t exist; shifts just always fill the high bits with 0. Floating point numbers could be implemented without a sign bit, but I’m not going to bother with floating point right now. All comparisons return either higher, lower or equal, and we could conceivably have three-destination jumps based on that in addition to just here-or-there. And that really covers it. The basic operations really don’t get changed much, conceptually.
We could write 2 trits per digit in base-9, or 3 trits per digit in base-27. Both of these feel weird with negative numbers, so I’m generally not going to do either. Maybe base-9 with positive numbers being 1234 and negative ones being ABCD? We could do the same in base 18 to have 5 trits in 4 digits I think… Eh, not too useful.
Logic!
There’s several types of three-valued logic. We’ll go with Kleene logic because it seems the simplest and most sensible, at least for a machine rather than some kind of proof notation. If we run into a problem with that, we can always change it.
V | NOT V |
---|---|
+ |
- |
0 |
0 |
- |
+ |
AND | + |
0 |
- |
---|---|---|---|
+ |
+ |
0 |
- |
0 |
0 |
0 |
- |
- |
- |
- |
- |
OR | + |
0 |
- |
---|---|---|---|
+ |
+ |
+ |
+ |
0 |
+ |
0 |
0 |
- |
+ |
0 |
- |
Can’t find a reference for xor. Someone online said its undefined.
But you can evaluate XOR in binary just as
(a OR b) AND NOT (a AND B)
, so let’s try that and see how
it works. I thiiiiiink this is right:
XOR | + |
0 |
- |
---|---|---|---|
+ |
- |
0 |
+ |
0 |
0 |
0 |
0 |
- |
+ |
0 |
- |
Architecture
Word size
A 9 or 12 trit computer would be on the microcontroller scale, comparable to 16 or 20 bits of binary address space. 18 trits is an ok size, it’s a few times smaller than 32 bits but still quite large. 27 trits is a bit weird ’cause it’s way bigger than 32 bits but way smaller than 64 bits; it’s closest to 48 bits. 36 trits is a little under 1% of a 64-bit address space, but a 64-bit address space is absurdly huge so you probably won’t notice the difference. I want to build something bigger than a microcontroller (big enough for real desktop programs), but not necessarily full-scale-modern-supercomputer with all the bells and whistles. An 18 trit pointer gives us +- 193 million values of address space, so that doesn’t seem terrible for a proof of concept. 18 trits seems a nice size.
With 27 registers you need 3 trits of address per register, so you could make a quite reasonable 3-address instruction encoding with 18 trits per instruction.
Can we subdivide our 18-trit words into… uh I guess “trytes”? Seems useful for text characters and stuff. The obvious sizes are 6 or 9 trits-per-tryte. After some waffling I will go with 6 so that our “byte” equivalent is -364 to +364, which seems okay I guess, and is not too far off from an actual byte for the purposes of comparing text encodings, memory sizes, etc. It also means we can fit 3 trytes into a word, which is nice. So memory will be tryte-addressed with a 6-trit tryte. Everything will be little-endian, to keep life consistent.
Registers
We have 27 18-trit general purpose registers, numbered -r13 to r13. There’s an instruction pointer that is not a GPR. Nothing else too special. All instructions are aligned on an 18 trit boundary.
- r0 is zero register
- r1 is return address/link register
- r2 is stack pointer (grows downward)
- r3 is global pointer
- Thread pointer?
- Calling convention? args/saved/temporary.
We could just not have a flags register and use a GPR for all tests/comparisons, but then david_chisnall would laugh at me on lobste.rs. So we have a flags register that is not a GPR. …I kinda like the idea of POWER’s multiple flags registers. Ok, the flags register is 6 trits and subdivided into 3 2-trit “subregisters”. (I would love for it to be bigger like 9 subregisters, but we don’t really have the instruction encoding space for it without making life more complicated and I do want to have a “zero register” that ignores flag writes.) Comparisons and checks can read/write to these subregisters. They just contain one trit for carry/overflow and one trit for gt/lt/zero comparison.
So we have FL[x]
where x
is -1 to +1, and
it has two subfields: FL[x].CARRY
and
FL[x].CMP
CMP is +0- for GT, EQ, LT respectively, and CARRY
is + for add-with-carry, - for sub-with-borrow and 0 for no-overflow. We
can add more subfields to it easily if we discover we need them. Writes
to FL[0]
are ignored, so instructions can write to it to
say “we don’t care about our flags”. (Does reading FL[0] do anything
special? idk.)
We can also have operations for reading/writing control/status registers (CSR’s) a la RISC-V. For now though they are all undefined and open for implementations to use them however they want.
Not to do for now:
- Do we want a saturating-carry flag? Not yet.
- Non-GPR stack pointer? Eeeeh, we could do that, but I don’t think it’s worth making the instruction decoding more complicated just for a toy VM.
Instructions
Instruction formats:
R
egister: 5 trit opcode, 1 trit of flags plus 3 3-trit register operands (dest, src1, src2), plus a 3-trit immediate as well just for lulz (maybe we can use it for something like ARM’s barrel shifter).R4
instruction, kinda a subset ofR
we use for a few instructions where the immediate portion specifies a 4th register. This may be a mistake, idk.I
mmediate: 5 trit opcode, 1 trits of flags, 3 trit destination, 3 trit register operand, 6 trit immediate value. 6 trits is a bit small but oh well.M
emory: 3 trit opcode, 1 3-trit register operands, 12 trit immediate value.
This is satisfying ’cause it means the first trit of the opcode alone
can tell us what kind of instruction it is. The first trit of an R
instruction is +
, an I
instruction is
0
, and a M
instruction is -
.
Instruction layouts:
trit 000000 000011 111111
012345 678901 234567
| | | | | | |
R instr +PPPPF DDDAAA BBBIII
I instr 0PPPPF DDDAAA IIIIII
M instr -PPDDD IIIIII IIIIII
P = opcode
D = destination reg
A = src reg 1
B = src reg 2
I = immediate value
F = flags register indicator
TODO: Not sure whether the R4 instruction format is a good idea or not.
Instruction reference
Memory and registers
Instruction | Opcode | Format | code | Notes |
---|---|---|---|---|
LIL | M | rd[0:12] = imm12 | Load low immediate. Are rd[13:18] left as is, or zeroed? | |
LIH | I | rd[10:18] = imm6[0:6] | Load high immediate. Do we do anything with ra? Or use an M instruction format and do something with the extra 6 bits? idk. | |
LD | I | rd = *(ra + imm6) | Load absolute word. Can we do something useful with the flags trit? Maybe use it to select a byte to load in the word? | |
ST | I | *(rd + imm6) = ra | Store absolute. Can we do something useful with the flags trit? | |
STL | I | *(rd + imm6) = ra[0:6] | Store low tryte. | |
AUIPC | M | rd = pc + (imm12 << 6) | add upper immediate to program counter,
use this for constructing relative addresses like RISC-V’s
AUIPC |
|
RFL | R | rd[0:6] = FL | Read flags register, upper trits of destination are zeroed | |
WFL | R | FL = rd [0:6] | Write flags register | |
RCSR | I | rd = CSR[ra] | Read CSR | |
WCSR | I | CSR[rd] = ra | Write CSR |
TODO:
- Better tryte-wise load and store? Tryte-wise loads are technically just an optimization but stores do matter. Maybe have a “load/store tryte” and use the flags trit in the instruction as an index? Or just have a non-zero flags trit indicate “load low” or “load high”?
- Trit or tryte extraction/shuffle? For now let’s not bother I suppose.
- Compare and exchange instructions? Swap instructions? Atomics? Might be handy but not useful for a smol-ish proof of concept.
- Load and store multiple? Have them fixed size so they’re at least constant time.
Math and logic
Instruction | Opcode | Format | code | Notes |
---|---|---|---|---|
ADD | R | rd = ra + rb + imm3; FL[F].CARRY = carry | Would the imm3 be better spent on a rotate of rb? | |
ADC | R | rd = ra + rb + imm3 + FL[F].CARRY; FL[F].CARRY = carry | Would the imm3 be better spent on a rotate of rb? | |
SUB | R | rd = ra - rb - imm3; FL[F].CARRY = carry | Would the imm3 be better spent on a rotate of rb? Borrow is represented with the CARRY field being - | |
SBB | R | rd = ra - rb - imm3 + FL[imm3].CARRY; FL[F].CARRY = carry | Subtract with borrow. | |
MUL | R4 | rc:rd = ra * rb; FL[F].CARRY = overflow | The imm3 field holds rc. Not sure if this is a good idea but it works. | |
MULA | R4 | rd = ra * rb + rc; FL[F].CARRY = overflow | Multiply-and-add. The imm3 field holds rc. Not sure this is a good idea but whatever | |
DIV | R4 | rd = ra:rb / rc | The imm3 field holds rc. Not sure if this does anything to the carry flag? | |
REM | R4 | rd = ra:rb % rc | Remainder operation, (a / b) + (a % b) = d. double-check. Not sure if this does anything to the carry flag? | |
NEG | I | rd = -ra | Negate operation, same as logical NOT | |
CMP | R | rd = FL[F].CMP = signum(ra - rb) | Storing the result in rd might be useful for something I suppose? Probably more useful than doing nothing with it. | |
ADDI | I | rd = ra + imm6; FL[F].CARRY = carry | ||
ADCI | I | rd = ra + imm6 + FL[F].CARRY; FL[F].CARRY = carry | ||
SUBI | I | rd = ra - imm6; FL[F].CARRY = borrow | ||
SBBI | I | rd = ra - imm6 + FL[F].CARRY; FL[F].CARRY = borrow | ||
MULI | R | rb:rd = ra * imm3; FL[F].CARRY = overflow | Maybe not very useful tbh | |
DIVI | R | rd = ra:rb / imm3 | double-check | |
MODI | R | rd = ra:rb % imm3 | double-check | |
CMPI | I | rd = FL[F].cmp = signum(ra - imm6) | ||
AND | R | rd = ra AND (rb << imm3) | ||
OR | R | rd = ra OR (rb << imm3) | ||
XOR | R | rd = ra XOR (rb << imm3) | ||
NOT | I | rd = NOT ra | Alias for NEG | |
ANDI | I | rd = ra AND imm6 | Not sure whether top bits should be zeroed or left alone | |
ORI | I | rd = ra OR imm6 | Not sure whether top bits should be zeroed or left alone | |
XORI | I | rd = ra XOR imm6 | Not sure whether top bits should be zeroed or left alone | |
MIN | R | rd = min(ra, rb) | Can we do something useful with imm3? | |
MAX | R | rd = max(ra, rb) | Can we do something useful with imm3? |
You can implement SUB as negate and add, idk if I care.
You specify F flag trit by adding a .+ or .- suffix to the instruction. No suffix means F=0, so by default operations do not check flags.
TODO:
- We have a bunch of operations where the R-format
imm3
field is unused or uncertainly used and idk if there’s anything better to do with it since its range is only -13 to +13. Havingrd = ra + rb + some small integer constant
seems pretty useful for lots of things, doing the same operation with division or whatever seems less so. - In related news, we use 4-operand instructions for mul and div and rem and multiply+add, not sure if that’s reasonable but it’s something.
- I don’t really remember how multiply and divide should interact with flags, needs to double check.
- There’s annoying differences between remainder and modulo; do those still exist in balanced ternary? Check.
- Someday it would be nice to have a “sticky overflow” bit in flags, so you can clear it, do a bunch of operations, then check if any of them overflowed. For now it’s not worth worrying about though.
Not to do for now:
- saturating arithmetic
- Floating point
- Min and max instructions could be a compare followed by a conditional move, not sure if I care.
Bit mani… uh, trit manipulation
Instruction | Opcode | Format | code | Notes |
---|---|---|---|---|
SH | R | rd = ra << rb | Shift left or right. Negative values of rb shift right. The last shifted-off trit is stored in FL[F].CARRY. Can we do something with imm3? | |
SHI | I | rd = ra << imm6 | Shift by immediate. Negatives values of imm6 shift right. The last shifted trit is stored in FL[F].CARRY. | |
ROT | R | rd = ra ROTATE rb | Rotate left or right. Negative values of rb rotate right, values pushed off one end appear on the other end | |
ROTI | I | rd = ra ROTATE imm6 | Rotate by immediate. | |
CLZ | I | rd = count-leading-zeros(ra) | ||
SIG | I | rd = FLAGS[F].CMP = signum(ra) | Signum, alias for
CMPI rd, ra, 0 |
|
POPCNT | I | rd = popcount(ra) | popcount, returns number of non-zero trits |
TODO:
- Can we do something useful with the imm3 in SH and ROT?
- Can we do something useful with the imm6 in CLZ? Probably not.
- Shortcut operations to test/set/unset particular trits? We could do those with AND/OR/XOR operations that take an imm3 shift, as discussed with tryte-extraction ops above.
Jumps
Instruction | Opcode | Format | code | Notes |
---|---|---|---|---|
J | M | pc = rd + 3*imm12 | Bog-standard absolute jump. All instructions are on 18-trit boundaries so this cannot jump to an unaligned instruction. | |
JL | M | r1 = pc + 3; pc = rd + 3*imm12 | Jump-and-link. | |
JC | I | if FL[F].COMPARE = + then pc = rd + 3*imm6; if FL[F].COMPARE = - then pc = ra + 3*imm6; if FL[F].COMPARE = 0 then fallthrough | Jump on condition. Goofy 3-way jump ’cause sure why not. Basically jumps to rd+imm6 if the flag is positive, or ra+imm6 if the flag is negative. …now that I think of it, having the same immediate for both is not necessarily great, hmm. | |
JCARRY | I | if FL[F].CARRY = + then pc = rd + 3*imm6; if FL[F].CARRY = - then pc = ra + 3*imm6; if FL[F].CARRY = 0 then fallthrough | Jump on carry. Goofy 3-way jump ’cause sure why not. |
You construct PC-relative jumps like loads and stores, using AUIPC.
The 3-way jumps are actually a lot more reasonable than they first
look, you can synthesize 2-way jumps out of them by doing
JC r9, r9
or something which will then jump to
r9
if the carry flag is nonzero or fallthrough if it is
zero.
TODO:
- How do you jump only if the COMPARE flag is zero? Do
CMPI rSomething, 0
I guess. Not sure that’s ideal.
- Not sure if the immediate offset for the jump being the same for both branches is ideal either. Still, good enough for now.
AUIPC
loading a 12 trit offset and all our jumps having a 9 or 12 trit offset means we can address 21 or 24 trits of address space, which is frankly overkill. Honestly though, better than the other way around.
Other
Instruction | Opcode | Format | code | Notes |
---|---|---|---|---|
NOP | R | no operation | Canonical encoding is
ADD r0, r0, r0 |
|
UND | 0 | I | error | guaranteed undefined instruction, all zero’s |
HLT | halts CPU | |||
RESERVED | anything | Reserved instruction space for implementation-dependent stuff |
Meta-operations
Basically, shortcuts for achieving specific things that may not be obvious:
- You can construct PC-relative addresses for loads, stores and jumps
with
AUIPC rd, imm12
followed by any ol M or I-type instruction. Honestly with this instruction format we could probably have a 24-trit address space and be ok. - You can set CMP flags to specific values with
CMPI r0, r0, plus or minus constant
TODO:
- Can you set CARRY flags to specific values somehow?
Encoding space used
Counts are not necessarily exact, I’m not gonna recount everything every single time I change something.
R
instructions: 4 trits of opcode space, 81 values total. 16 used.I
instructions: 4 trits of opcode space, 81 values total. 27 used.M
instructions: 2 trits of opcode space, 9 values total. 4 used.
Total instructions: 47ish.
We cooooould get away with 2 trits of address space for a 9-register flags reg. We would have almost literally zero extra encoding space left over though.
Implementation
So one of the first things to ask oneself for writing a VM for this is, “how do I represent trits?”. We have have several possibilities:
- Pack a trit into 2 bits and implement operations one trit as a time. This gives us 4 possible values per trit, so we will just say one of them is invalid. Then we can have 00 be 0, 01 be + and 10 be - .This gives us 4 trits per byte, or 16 trits per 32-bit integer, which is pretty good, but means 18 trits would take 5 bytes. I don’t really mind the memory usage but having things be unaligned and byte-wise Irks me.
- Represent trytes as a larger machine word, either 5 trits into 8 bits, 9 trits into 16 bits, or 18 trits into 32 bits. You wouldn’t represent trits directly, just operate on numbers. Then you’d do all the math and stuff in binary so, a tryte holding the value -13 would be represented as a byte holding the value -13. This would work pretty fine for math but doing ternary logic would basically require decoding and then re-encoding the trytes, which sounds slow considering that instruction decode would be lots of ternary logic.
- Pack 4 trits into 8 bits but, instead of doing operations one trit at a time, implement them as a lookup table or maybe SIMD operations.
- More complicated encoding to try to stuff an 18-trit word into 32 bits. You can’t quite stuff 2 trits into 3 bits, annoyingly, so I’m probably not going to bother. This devolves into something close to the “5 trit per byte” encoding above.