TernaryComputer

Monongahela
Balanced ternary
- Logic!
Architecture
Instruction reference
Implementation

ah shit this needs a name. The Soviet ternary computer was named Setun, after a river in Russia. I could name this after a river near me, and force everyone to try to pronounce Monongahela. Sure, let’s do it.

Monongahela

So I sort of want to make a virtual machine for a computer using balanced ternary, just to see how it’d work out. Signed vs. unsigned arithmetic is generally a big pain in the ass, but with balanced ternary that’s not a problem, because sign is built into the number system implicitly. You could make a ternary computer in non-balanced ternary, but balanced ternary is rad af so there’s no reason not to use it. So let’s make a typical Von Neumann architecture machine with Kleene logic.

Of course it’s going to be slow as fuck since we have to essentially emulate each trit on top of binary operations, but hey. This is an experiment.

This sort of thing is explored in MUCH MORE DEPTH by http://www.scribd.com/doc/78370674/Ternary-Computing-TestBed-3-Trit-Computer-Architecture which, in case that link vanishes someday, is “Ternary Computing Testbed 3-Trit Computer Architecture”, by Jeff Connelly, Advised by Professor Phillip Nico of the California Polytechnic State University of San Luis Obispo, published August 2008. Oooor see http://jeff.tk//trinary/CPE%20Report%20-%20Ternary%20Computing%20Testbed%20-%20RC6.pdf . There’s also other explorations here: https://homepage.cs.uiowa.edu/~dwjones/ternary/ , but they don’t use balanced ternary and so are officially Le Lame.

…someone should make an emulator for one of the old 36-bit computers, except it’s actually all ternary.

Balanced ternary

We have three digits: +, - and 0. + is positive one, - is negative one, and 0 is zero. This has the nice property of letting the highest nonzero trit always determine whether we have a positive or negative number, all arithmetic is the same on signed and unsigned numbers, and a few other things. Negative and positive can be switched by inverting the number, swapping every + and -. They are also symmetrical in their numerical range, so a word of a specific size has an equal range into the negative and the positive,unlike 2’s-complement binary.

So let’s write some of these to try to get a feel for how this works:

Decimal	Balanced ternary
-13	`---`
-12	`--0`
-11	`--+`
-10	`-0-`
-9	`-00`
-8	`-0+`
-7	`-+-`
-6	`-+0`
-5	`-++`
-4	`0--`
-3	`0-0`
-2	`0-+`
-1	`00-`
0	`000`
1	`00+`
2	`0+-`
3	`0+0`
4	`0++`
5	`+--`
6	`+-0`
7	`+-+`
8	`+0-`
9	`+00`
10	`+0+`
11	`++-`
12	`++0`
13	`+++`

One trit has a range of 3, two trits 9, 3 trits 27, etc. This leads to some weird-feeling min and max ranges like +/- 13, 40 and such, but you get used to it. Basically, for word size n, the range is plus/minus ((3^n)-1) / 2.

Word size	Range
1 trit (3^1)	-1 to +1
2 trits (3^2)	-4 to +4
3 trits (3^3)	-13 to +13
4 trits (3^4)	-40 to +40
5 trits (3^5)	-121 to +121
6 trits (3^6)	-364 to +364
7 trits (3^7)	-1093 to +1093
8 trits (3^8)	-3280 to +3280
9 trits (3^9)	-9841 to +9841
10 trits	+- 29,524
18 trits	+- 193,710,244
20 trits	+- 1,743,392,200, (1.7 billion)
27 trits	+- 3,812,798,742,493 (3.8 trillion)
30 trits	+- 102,945,566,047,324 (102 trillion)

Comparison with handy binary numbers

Word size	Binary range near it
5 trits	0.945 * 2^8
6 trits	2.843 * 2^8
9 trits	19.22 * 2^10
10 trits	0.901 * 2^16
18 trits	369.5 * 2^20
20 trits	0.811 * 2^32
27 trits	1775 * 2^32
30 trits	0.000111 * 2^64
36 trits	0.008 * 2^64

Shifts left and right will multiply and divide a number by three. Per design, signedness doesn’t exist, so sign extension doesn’t exist; shifts just always fill the high bits with 0. Floating point numbers could be implemented without a sign bit, but I’m not going to bother with floating point right now. All comparisons return either higher, lower or equal, and we could conceivably have three-destination jumps based on that in addition to just here-or-there. And that really covers it. The basic operations really don’t get changed much, conceptually.

We could write 2 trits per digit in base-9, or 3 trits per digit in base-27. Both of these feel weird with negative numbers, so I’m generally not going to do either. Maybe base-9 with positive numbers being 1234 and negative ones being ABCD? We could do the same in base 18 to have 5 trits in 4 digits I think… Eh, not too useful.

Logic!

There’s several types of three-valued logic. We’ll go with Kleene logic because it seems the simplest and most sensible, at least for a machine rather than some kind of proof notation. If we run into a problem with that, we can always change it.

V	NOT V
`+`	`-`
`0`	`0`
`-`	`+`

AND	`+`	`0`	`-`
`+`	`+`	`0`	`-`
`0`	`0`	`0`	`-`
`-`	`-`	`-`	`-`

OR	`+`	`0`	`-`
`+`	`+`	`+`	`+`
`0`	`+`	`0`	`0`
`-`	`+`	`0`	`-`

Can’t find a reference for xor. Someone online said its undefined. But you can evaluate XOR in binary just as (a OR b) AND NOT (a AND B), so let’s try that and see how it works. I thiiiiiink this is right:

XOR	`+`	`0`	`-`
`+`	`-`	`0`	`+`
`0`	`0`	`0`	`0`
`-`	`+`	`0`	`-`

Architecture

Word size

A 9 or 12 trit computer would be on the microcontroller scale, comparable to 16 or 20 bits of binary address space. 18 trits is an ok size, it’s a few times smaller than 32 bits but still quite large. 27 trits is a bit weird ’cause it’s way bigger than 32 bits but way smaller than 64 bits; it’s closest to 48 bits. 36 trits is a little under 1% of a 64-bit address space, but a 64-bit address space is absurdly huge so you probably won’t notice the difference. I want to build something bigger than a microcontroller (big enough for real desktop programs), but not necessarily full-scale-modern-supercomputer with all the bells and whistles. An 18 trit pointer gives us +- 193 million values of address space, so that doesn’t seem terrible for a proof of concept. 18 trits seems a nice size.

With 27 registers you need 3 trits of address per register, so you could make a quite reasonable 3-address instruction encoding with 18 trits per instruction.

Can we subdivide our 18-trit words into… uh I guess “trytes”? Seems useful for text characters and stuff. The obvious sizes are 6 or 9 trits-per-tryte. After some waffling I will go with 6 so that our “byte” equivalent is -364 to +364, which seems okay I guess, and is not too far off from an actual byte for the purposes of comparing text encodings, memory sizes, etc. It also means we can fit 3 trytes into a word, which is nice. So memory will be tryte-addressed with a 6-trit tryte. Everything will be little-endian, to keep life consistent.

Registers

We have 27 18-trit general purpose registers, numbered -r13 to r13. There’s an instruction pointer that is not a GPR. Nothing else too special. All instructions are aligned on an 18 trit boundary.

r0 is zero register
r1 is return address/link register
r2 is stack pointer (grows downward)
r3 is global pointer
Thread pointer?
Calling convention? args/saved/temporary.

We could just not have a flags register and use a GPR for all tests/comparisons, but then david_chisnall would laugh at me on lobste.rs. So we have a flags register that is not a GPR. …I kinda like the idea of POWER’s multiple flags registers. Ok, the flags register is 6 trits and subdivided into 3 2-trit “subregisters”. (I would love for it to be bigger like 9 subregisters, but we don’t really have the instruction encoding space for it without making life more complicated and I do want to have a “zero register” that ignores flag writes.) Comparisons and checks can read/write to these subregisters. They just contain one trit for carry/overflow and one trit for gt/lt/zero comparison.

So we have FL[x] where x is -1 to +1, and it has two subfields: FL[x].CARRY and FL[x].CMP CMP is +0- for GT, EQ, LT respectively, and CARRY is + for add-with-carry, - for sub-with-borrow and 0 for no-overflow. We can add more subfields to it easily if we discover we need them. Writes to FL[0] are ignored, so instructions can write to it to say “we don’t care about our flags”. (Does reading FL[0] do anything special? idk.)

We can also have operations for reading/writing control/status registers (CSR’s) a la RISC-V. For now though they are all undefined and open for implementations to use them however they want.

Not to do for now:

Do we want a saturating-carry flag? Not yet.
Non-GPR stack pointer? Eeeeh, we could do that, but I don’t think it’s worth making the instruction decoding more complicated just for a toy VM.

Instructions

Instruction formats:

Register: 5 trit opcode, 1 trit of flags plus 3 3-trit register operands (dest, src1, src2), plus a 3-trit immediate as well just for lulz (maybe we can use it for something like ARM’s barrel shifter).
R4 instruction, kinda a subset of R we use for a few instructions where the immediate portion specifies a 4th register. This may be a mistake, idk.
Immediate: 5 trit opcode, 1 trits of flags, 3 trit destination, 3 trit register operand, 6 trit immediate value. 6 trits is a bit small but oh well.
Memory: 3 trit opcode, 1 3-trit register operands, 12 trit immediate value.

This is satisfying ’cause it means the first trit of the opcode alone can tell us what kind of instruction it is. The first trit of an R instruction is +, an I instruction is 0, and a M instruction is -.

Instruction layouts:

  trit   000000 000011 111111
         012345 678901 234567
         |  |   |  |   |  |  |
R instr  +PPPPF DDDAAA BBBIII
I instr  0PPPPF DDDAAA IIIIII
M instr  -PPDDD IIIIII IIIIII

P = opcode
D = destination reg
A = src reg 1
B = src reg 2
I = immediate value
F = flags register indicator

TODO: Not sure whether the R4 instruction format is a good idea or not.

Instruction reference

Memory and registers

Instruction	Format	code	Notes
LIL	M	rd[0:12] = imm12	Load low immediate. Are rd[13:18] left as is, or zeroed?
LIH	I	rd[10:18] = imm6[0:6]	Load high immediate. Do we do anything with ra? Or use an M instruction format and do something with the extra 6 bits? idk.
LD	I	rd = *(ra + imm6)	Load absolute word. Can we do something useful with the flags trit? Maybe use it to select a byte to load in the word?
ST	I	*(rd + imm6) = ra	Store absolute. Can we do something useful with the flags trit?
STL	I	*(rd + imm6) = ra[0:6]	Store low tryte.
AUIPC	M	rd = pc + (imm12 << 6)	add upper immediate to program counter, use this for constructing relative addresses like RISC-V’s `AUIPC`
RFL	R	rd[0:6] = FL	Read flags register, upper trits of destination are zeroed
WFL	R	FL = rd [0:6]	Write flags register
RCSR	I	rd = CSR[ra]	Read CSR
WCSR	I	CSR[rd] = ra	Write CSR

TODO:

Better tryte-wise load and store? Tryte-wise loads are technically just an optimization but stores do matter. Maybe have a “load/store tryte” and use the flags trit in the instruction as an index? Or just have a non-zero flags trit indicate “load low” or “load high”?
Trit or tryte extraction/shuffle? For now let’s not bother I suppose.
Compare and exchange instructions? Swap instructions? Atomics? Might be handy but not useful for a smol-ish proof of concept.
Load and store multiple? Have them fixed size so they’re at least constant time.

Math and logic

Instruction	Format	code	Notes
ADD	R	rd = ra + rb + imm3; FL[F].CARRY = carry	Would the imm3 be better spent on a rotate of rb?
ADC	R	rd = ra + rb + imm3 + FL[F].CARRY; FL[F].CARRY = carry	Would the imm3 be better spent on a rotate of rb?
SUB	R	rd = ra - rb - imm3; FL[F].CARRY = carry	Would the imm3 be better spent on a rotate of rb? Borrow is represented with the CARRY field being -
SBB	R	rd = ra - rb - imm3 + FL[imm3].CARRY; FL[F].CARRY = carry	Subtract with borrow.
MUL	R4	rc:rd = ra * rb; FL[F].CARRY = overflow	The imm3 field holds rc. Not sure if this is a good idea but it works.
MULA	R4	rd = ra * rb + rc; FL[F].CARRY = overflow	Multiply-and-add. The imm3 field holds rc. Not sure this is a good idea but whatever
DIV	R4	rd = ra:rb / rc	The imm3 field holds rc. Not sure if this does anything to the carry flag?
REM	R4	rd = ra:rb % rc	Remainder operation, (a / b) + (a % b) = d. double-check. Not sure if this does anything to the carry flag?
NEG	I	rd = -ra	Negate operation, same as logical NOT
CMP	R	rd = FL[F].CMP = signum(ra - rb)	Storing the result in rd might be useful for something I suppose? Probably more useful than doing nothing with it.
ADDI	I	rd = ra + imm6; FL[F].CARRY = carry
ADCI	I	rd = ra + imm6 + FL[F].CARRY; FL[F].CARRY = carry
SUBI	I	rd = ra - imm6; FL[F].CARRY = borrow
SBBI	I	rd = ra - imm6 + FL[F].CARRY; FL[F].CARRY = borrow
MULI	R	rb:rd = ra * imm3; FL[F].CARRY = overflow	Maybe not very useful tbh
DIVI	R	rd = ra:rb / imm3	double-check
MODI	R	rd = ra:rb % imm3	double-check
CMPI	I	rd = FL[F].cmp = signum(ra - imm6)
AND	R	rd = ra AND (rb << imm3)
OR	R	rd = ra OR (rb << imm3)
XOR	R	rd = ra XOR (rb << imm3)
NOT	I	rd = NOT ra	Alias for NEG
ANDI	I	rd = ra AND imm6	Not sure whether top bits should be zeroed or left alone
ORI	I	rd = ra OR imm6	Not sure whether top bits should be zeroed or left alone
XORI	I	rd = ra XOR imm6	Not sure whether top bits should be zeroed or left alone
MIN	R	rd = min(ra, rb)	Can we do something useful with imm3?
MAX	R	rd = max(ra, rb)	Can we do something useful with imm3?

You can implement SUB as negate and add, idk if I care.

You specify F flag trit by adding a .+ or .- suffix to the instruction. No suffix means F=0, so by default operations do not check flags.

TODO:

We have a bunch of operations where the R-format imm3 field is unused or uncertainly used and idk if there’s anything better to do with it since its range is only -13 to +13. Having rd = ra + rb + some small integer constant seems pretty useful for lots of things, doing the same operation with division or whatever seems less so.
In related news, we use 4-operand instructions for mul and div and rem and multiply+add, not sure if that’s reasonable but it’s something.
I don’t really remember how multiply and divide should interact with flags, needs to double check.
There’s annoying differences between remainder and modulo; do those still exist in balanced ternary? Check.
Someday it would be nice to have a “sticky overflow” bit in flags, so you can clear it, do a bunch of operations, then check if any of them overflowed. For now it’s not worth worrying about though.

Not to do for now:

saturating arithmetic
Floating point
Min and max instructions could be a compare followed by a conditional move, not sure if I care.

Bit mani… uh, trit manipulation

Instruction	Format	code	Notes
SH	R	rd = ra << rb	Shift left or right. Negative values of rb shift right. The last shifted-off trit is stored in FL[F].CARRY. Can we do something with imm3?
SHI	I	rd = ra << imm6	Shift by immediate. Negatives values of imm6 shift right. The last shifted trit is stored in FL[F].CARRY.
ROT	R	rd = ra ROTATE rb	Rotate left or right. Negative values of rb rotate right, values pushed off one end appear on the other end
ROTI	I	rd = ra ROTATE imm6	Rotate by immediate.
CLZ	I	rd = count-leading-zeros(ra)
SIG	I	rd = FLAGS[F].CMP = signum(ra)	Signum, alias for `CMPI rd, ra, 0`
POPCNT	I	rd = popcount(ra)	popcount, returns number of non-zero trits

TODO:

Can we do something useful with the imm3 in SH and ROT?
Can we do something useful with the imm6 in CLZ? Probably not.
Shortcut operations to test/set/unset particular trits? We could do those with AND/OR/XOR operations that take an imm3 shift, as discussed with tryte-extraction ops above.

Jumps

Instruction	Format	code	Notes
J	M	pc = rd + 3*imm12	Bog-standard absolute jump. All instructions are on 18-trit boundaries so this cannot jump to an unaligned instruction.
JL	M	r1 = pc + 3; pc = rd + 3*imm12	Jump-and-link.
JC	I	if FL[F].COMPARE = + then pc = rd + 3imm6; if FL[F].COMPARE = - then pc = ra + 3imm6; if FL[F].COMPARE = 0 then fallthrough	Jump on condition. Goofy 3-way jump ’cause sure why not. Basically jumps to rd+imm6 if the flag is positive, or ra+imm6 if the flag is negative. …now that I think of it, having the same immediate for both is not necessarily great, hmm.
JCARRY	I	if FL[F].CARRY = + then pc = rd + 3imm6; if FL[F].CARRY = - then pc = ra + 3imm6; if FL[F].CARRY = 0 then fallthrough	Jump on carry. Goofy 3-way jump ’cause sure why not.

You construct PC-relative jumps like loads and stores, using AUIPC.

The 3-way jumps are actually a lot more reasonable than they first look, you can synthesize 2-way jumps out of them by doing JC r9, r9 or something which will then jump to r9 if the carry flag is nonzero or fallthrough if it is zero.

TODO:

How do you jump only if the COMPARE flag is zero? Do CMPI rSomething, 0 I guess. Not sure that’s ideal.
Not sure if the immediate offset for the jump being the same for both branches is ideal either. Still, good enough for now.
AUIPC loading a 12 trit offset and all our jumps having a 9 or 12 trit offset means we can address 21 or 24 trits of address space, which is frankly overkill. Honestly though, better than the other way around.

Other

Instruction	Opcode	Format	code	Notes
NOP		R	no operation	Canonical encoding is `ADD r0, r0, r0`
UND	0	I	error	guaranteed undefined instruction, all zero’s
HLT			halts CPU
RESERVED			anything	Reserved instruction space for implementation-dependent stuff

Meta-operations

Basically, shortcuts for achieving specific things that may not be obvious:

You can construct PC-relative addresses for loads, stores and jumps with AUIPC rd, imm12 followed by any ol M or I-type instruction. Honestly with this instruction format we could probably have a 24-trit address space and be ok.
You can set CMP flags to specific values with CMPI r0, r0, plus or minus constant

TODO:

Can you set CARRY flags to specific values somehow?

Encoding space used

Counts are not necessarily exact, I’m not gonna recount everything every single time I change something.

R instructions: 4 trits of opcode space, 81 values total. 16 used.
I instructions: 4 trits of opcode space, 81 values total. 27 used.
M instructions: 2 trits of opcode space, 9 values total. 4 used.

Total instructions: 47ish.

We cooooould get away with 2 trits of address space for a 9-register flags reg. We would have almost literally zero extra encoding space left over though.

Implementation

So one of the first things to ask oneself for writing a VM for this is, “how do I represent trits?”. We have have several possibilities:

Pack a trit into 2 bits and implement operations one trit as a time. This gives us 4 possible values per trit, so we will just say one of them is invalid. Then we can have 00 be 0, 01 be + and 10 be - .This gives us 4 trits per byte, or 16 trits per 32-bit integer, which is pretty good, but means 18 trits would take 5 bytes. I don’t really mind the memory usage but having things be unaligned and byte-wise Irks me.
Represent trytes as a larger machine word, either 5 trits into 8 bits, 9 trits into 16 bits, or 18 trits into 32 bits. You wouldn’t represent trits directly, just operate on numbers. Then you’d do all the math and stuff in binary so, a tryte holding the value -13 would be represented as a byte holding the value -13. This would work pretty fine for math but doing ternary logic would basically require decoding and then re-encoding the trytes, which sounds slow considering that instruction decode would be lots of ternary logic.
Pack 4 trits into 8 bits but, instead of doing operations one trit at a time, implement them as a lookup table or maybe SIMD operations.
More complicated encoding to try to stuff an 18-trit word into 32 bits. You can’t quite stuff 2 trits into 3 bits, annoyingly, so I’m probably not going to bother. This devolves into something close to the “5 trit per byte” encoding above.