TernaryComputer
ah shit this needs a name. The Soviet ternary computer was named Setun, after a river in Russia. I could name this after a river near me, and force everyone to try to pronounce Monongahela. Sure, let’s do it.
Monongahela
So I sort of want to make a virtual machine for a computer using balanced ternary, just to see how it’d work out. Signed vs. unsigned arithmetic is generally a big pain in the ass, but with balanced ternary that’s not a problem, because sign is built into the number system implicitly. You could make a ternary computer in nonbalanced ternary, but balanced ternary is rad af so there’s no reason not to use it. So let’s make a typical Von Neumann architecture machine with Kleene logic.
Of course it’s going to be slow as fuck since we have to essentially emulate each trit on top of binary operations, but hey. This is an experiment.
This sort of thing is explored in MUCH MORE DEPTH by http://www.scribd.com/doc/78370674/TernaryComputingTestBed3TritComputerArchitecture which, in case that link vanishes someday, is “Ternary Computing Testbed 3Trit Computer Architecture”, by Jeff Connelly, Advised by Professor Phillip Nico of the California Polytechnic State University of San Luis Obispo, published August 2008. Oooor see http://jeff.tk//trinary/CPE%20Report%20%20Ternary%20Computing%20Testbed%20%20RC6.pdf . There’s also other explorations here: https://homepage.cs.uiowa.edu/~dwjones/ternary/ , but they don’t use balanced ternary and so are officially Le Lame.
…someone should make an emulator for one of the old 36bit computers, except it’s actually all ternary.
Balanced ternary
We have three digits: +,  and 0. + is positive one,  is negative
one, and 0 is zero. This has the nice property of letting the highest
nonzero trit always determine whether we have a positive or negative
number, all arithmetic is the same on signed and unsigned numbers, and a
few other things. Negative and positive can be switched by inverting the
number, swapping every +
and 
. They are also
symmetrical in their numerical range, so a word of a specific size has
an equal range into the negative and the positive,unlike 2’scomplement
binary.
So let’s write some of these to try to get a feel for how this works:
Decimal  Balanced ternary 

13   
12  0 
11  + 
10  0 
9  00 
8  0+ 
7  + 
6  +0 
5  ++ 
4  0 
3  00 
2  0+ 
1  00 
0  000 
1  00+ 
2  0+ 
3  0+0 
4  0++ 
5  + 
6  +0 
7  ++ 
8  +0 
9  +00 
10  +0+ 
11  ++ 
12  ++0 
13  +++ 
One trit has a range of 3, two trits 9, 3 trits 27, etc. This leads to some weirdfeeling min and max ranges like +/ 13, 40 and such, but you get used to it. Basically, for word size n, the range is plus/minus ((3^n)1) / 2.
Word size  Range 

1 trit (3^1)  1 to +1 
2 trits (3^2)  4 to +4 
3 trits (3^3)  13 to +13 
4 trits (3^4)  40 to +40 
5 trits (3^5)  121 to +121 
6 trits (3^6)  364 to +364 
7 trits (3^7)  1093 to +1093 
8 trits (3^8)  3280 to +3280 
9 trits (3^9)  9841 to +9841 
10 trits  + 29,524 
18 trits  + 193,710,244 
20 trits  + 1,743,392,200, (1.7 billion) 
27 trits  + 3,812,798,742,493 (3.8 trillion) 
30 trits  + 102,945,566,047,324 (102 trillion) 
Comparison with handy binary numbers
Word size  Binary range near it 

5 trits  0.945 * 2^8 
6 trits  2.843 * 2^8 
9 trits  19.22 * 2^10 
10 trits  0.901 * 2^16 
18 trits  369.5 * 2^20 
20 trits  0.811 * 2^32 
27 trits  1775 * 2^32 
30 trits  0.000111 * 2^64 
36 trits  0.008 * 2^64 
Shifts left and right will multiply and divide a number by three. Per design, signedness doesn’t exist, so sign extension doesn’t exist; shifts just always fill the high bits with 0. Floating point numbers could be implemented without a sign bit, but I’m not going to bother with floating point right now. All comparisons return either higher, lower or equal, and we could conceivably have threedestination jumps based on that in addition to just hereorthere. And that really covers it. The basic operations really don’t get changed much, conceptually.
We could write 2 trits per digit in base9, or 3 trits per digit in base27. Both of these feel weird with negative numbers, so I’m generally not going to do either. Maybe base9 with positive numbers being 1234 and negative ones being ABCD? We could do the same in base 18 to have 5 trits in 4 digits I think… Eh, not too useful.
Logic!
There’s several types of threevalued logic. We’ll go with Kleene logic because it seems the simplest and most sensible, at least for a machine rather than some kind of proof notation. If we run into a problem with that, we can always change it.
V  NOT V 

+ 
 
0 
0 
 
+ 
AND  + 
0 
 

+ 
+ 
0 
 
0 
0 
0 
 
 
 
 
 
OR  + 
0 
 

+ 
+ 
+ 
+ 
0 
+ 
0 
0 
 
+ 
0 
 
Can’t find a reference for xor. Someone online said its undefined.
But you can evaluate XOR in binary just as
(a OR b) AND NOT (a AND B)
, so let’s try that and see how
it works. I thiiiiiink this is right:
XOR  + 
0 
 

+ 
 
0 
+ 
0 
0 
0 
0 
 
+ 
0 
 
Architecture
Word size
A 9 or 12 trit computer would be on the microcontroller scale, comparable to 16 or 20 bits of binary address space. 18 trits is an ok size, it’s a few times smaller than 32 bits but still quite large. 27 trits is a bit weird ’cause it’s way bigger than 32 bits but way smaller than 64 bits; it’s closest to 48 bits. 36 trits is a little under 1% of a 64bit address space, but a 64bit address space is absurdly huge so you probably won’t notice the difference. I want to build something bigger than a microcontroller (big enough for real desktop programs), but not necessarily fullscalemodernsupercomputer with all the bells and whistles. An 18 trit pointer gives us + 193 million values of address space, so that doesn’t seem terrible for a proof of concept. 18 trits seems a nice size.
With 27 registers you need 3 trits of address per register, so you could make a quite reasonable 3address instruction encoding with 18 trits per instruction.
Can we subdivide our 18trit words into… uh I guess “trytes”? Seems useful for text characters and stuff. The obvious sizes are 6 or 9 tritspertryte. After some waffling I will go with 6 so that our “byte” equivalent is 364 to +364, which seems okay I guess, and is not too far off from an actual byte for the purposes of comparing text encodings, memory sizes, etc. It also means we can fit 3 trytes into a word, which is nice. So memory will be tryteaddressed with a 6trit tryte. Everything will be littleendian, to keep life consistent.
Registers
We have 27 18trit general purpose registers, numbered r13 to r13. There’s an instruction pointer that is not a GPR. Nothing else too special. All instructions are aligned on an 18 trit boundary.
 r0 is zero register
 r1 is return address/link register
 r2 is stack pointer (grows downward)
 r3 is global pointer
 Thread pointer?
 Calling convention? args/saved/temporary.
We could just not have a flags register and use a GPR for all tests/comparisons, but then david_chisnall would laugh at me on lobste.rs. So we have a flags register that is not a GPR. …I kinda like the idea of POWER’s multiple flags registers. Ok, the flags register is 6 trits and subdivided into 3 2trit “subregisters”. (I would love for it to be bigger like 9 subregisters, but we don’t really have the instruction encoding space for it without making life more complicated and I do want to have a “zero register” that ignores flag writes.) Comparisons and checks can read/write to these subregisters. They just contain one trit for carry/overflow and one trit for gt/lt/zero comparison.
So we have FL[x]
where x
is 1 to +1, and
it has two subfields: FL[x].CARRY
and
FL[x].CMP
CMP is +0 for GT, EQ, LT respectively, and CARRY
is + for addwithcarry,  for subwithborrow and 0 for nooverflow. We
can add more subfields to it easily if we discover we need them. Writes
to FL[0]
are ignored, so instructions can write to it to
say “we don’t care about our flags”. (Does reading FL[0] do anything
special? idk.)
We can also have operations for reading/writing control/status registers (CSR’s) a la RISCV. For now though they are all undefined and open for implementations to use them however they want.
Not to do for now:
 Do we want a saturatingcarry flag? Not yet.
 NonGPR stack pointer? Eeeeh, we could do that, but I don’t think it’s worth making the instruction decoding more complicated just for a toy VM.
Instructions
Instruction formats:
R
egister: 5 trit opcode, 1 trit of flags plus 3 3trit register operands (dest, src1, src2), plus a 3trit immediate as well just for lulz (maybe we can use it for something like ARM’s barrel shifter).R4
instruction, kinda a subset ofR
we use for a few instructions where the immediate portion specifies a 4th register. This may be a mistake, idk.I
mmediate: 5 trit opcode, 1 trits of flags, 3 trit destination, 3 trit register operand, 6 trit immediate value. 6 trits is a bit small but oh well.M
emory: 3 trit opcode, 1 3trit register operands, 12 trit immediate value.
This is satisfying ’cause it means the first trit of the opcode alone
can tell us what kind of instruction it is. The first trit of an R
instruction is +
, an I
instruction is
0
, and a M
instruction is 
.
Instruction layouts:
trit 000000 000011 111111
012345 678901 234567
      
R instr +PPPPF DDDAAA BBBIII
I instr 0PPPPF DDDAAA IIIIII
M instr PPDDD IIIIII IIIIII
P = opcode
D = destination reg
A = src reg 1
B = src reg 2
I = immediate value
F = flags register indicator
TODO: Not sure whether the R4 instruction format is a good idea or not.
Instruction reference
Memory and registers
Instruction  Opcode  Format  code  Notes 

LIL  M  rd[0:12] = imm12  Load low immediate. Are rd[13:18] left as is, or zeroed?  
LIH  I  rd[10:18] = imm6[0:6]  Load high immediate. Do we do anything with ra? Or use an M instruction format and do something with the extra 6 bits? idk.  
LD  I  rd = *(ra + imm6)  Load absolute word. Can we do something useful with the flags trit? Maybe use it to select a byte to load in the word?  
ST  I  *(rd + imm6) = ra  Store absolute. Can we do something useful with the flags trit?  
STL  I  *(rd + imm6) = ra[0:6]  Store low tryte.  
AUIPC  M  rd = pc + (imm12 << 6)  add upper immediate to program counter,
use this for constructing relative addresses like RISCV’s
AUIPC 

RFL  R  rd[0:6] = FL  Read flags register, upper trits of destination are zeroed  
WFL  R  FL = rd [0:6]  Write flags register  
RCSR  I  rd = CSR[ra]  Read CSR  
WCSR  I  CSR[rd] = ra  Write CSR 
TODO:
 Better trytewise load and store? Trytewise loads are technically just an optimization but stores do matter. Maybe have a “load/store tryte” and use the flags trit in the instruction as an index? Or just have a nonzero flags trit indicate “load low” or “load high”?
 Trit or tryte extraction/shuffle? For now let’s not bother I suppose.
 Compare and exchange instructions? Swap instructions? Atomics? Might be handy but not useful for a smolish proof of concept.
 Load and store multiple? Have them fixed size so they’re at least constant time.
Math and logic
Instruction  Opcode  Format  code  Notes 

ADD  R  rd = ra + rb + imm3; FL[F].CARRY = carry  Would the imm3 be better spent on a rotate of rb?  
ADC  R  rd = ra + rb + imm3 + FL[F].CARRY; FL[F].CARRY = carry  Would the imm3 be better spent on a rotate of rb?  
SUB  R  rd = ra  rb  imm3; FL[F].CARRY = carry  Would the imm3 be better spent on a rotate of rb? Borrow is represented with the CARRY field being   
SBB  R  rd = ra  rb  imm3 + FL[imm3].CARRY; FL[F].CARRY = carry  Subtract with borrow.  
MUL  R4  rc:rd = ra * rb; FL[F].CARRY = overflow  The imm3 field holds rc. Not sure if this is a good idea but it works.  
MULA  R4  rd = ra * rb + rc; FL[F].CARRY = overflow  Multiplyandadd. The imm3 field holds rc. Not sure this is a good idea but whatever  
DIV  R4  rd = ra:rb / rc  The imm3 field holds rc. Not sure if this does anything to the carry flag?  
REM  R4  rd = ra:rb % rc  Remainder operation, (a / b) + (a % b) = d. doublecheck. Not sure if this does anything to the carry flag?  
NEG  I  rd = ra  Negate operation, same as logical NOT  
CMP  R  rd = FL[F].CMP = signum(ra  rb)  Storing the result in rd might be useful for something I suppose? Probably more useful than doing nothing with it.  
ADDI  I  rd = ra + imm6; FL[F].CARRY = carry  
ADCI  I  rd = ra + imm6 + FL[F].CARRY; FL[F].CARRY = carry  
SUBI  I  rd = ra  imm6; FL[F].CARRY = borrow  
SBBI  I  rd = ra  imm6 + FL[F].CARRY; FL[F].CARRY = borrow  
MULI  R  rb:rd = ra * imm3; FL[F].CARRY = overflow  Maybe not very useful tbh  
DIVI  R  rd = ra:rb / imm3  doublecheck  
MODI  R  rd = ra:rb % imm3  doublecheck  
CMPI  I  rd = FL[F].cmp = signum(ra  imm6)  
AND  R  rd = ra AND (rb << imm3)  
OR  R  rd = ra OR (rb << imm3)  
XOR  R  rd = ra XOR (rb << imm3)  
NOT  I  rd = NOT ra  Alias for NEG  
ANDI  I  rd = ra AND imm6  Not sure whether top bits should be zeroed or left alone  
ORI  I  rd = ra OR imm6  Not sure whether top bits should be zeroed or left alone  
XORI  I  rd = ra XOR imm6  Not sure whether top bits should be zeroed or left alone  
MIN  R  rd = min(ra, rb)  Can we do something useful with imm3?  
MAX  R  rd = max(ra, rb)  Can we do something useful with imm3? 
You can implement SUB as negate and add, idk if I care.
You specify F flag trit by adding a .+ or . suffix to the instruction. No suffix means F=0, so by default operations do not check flags.
TODO:
 We have a bunch of operations where the Rformat
imm3
field is unused or uncertainly used and idk if there’s anything better to do with it since its range is only 13 to +13. Havingrd = ra + rb + some small integer constant
seems pretty useful for lots of things, doing the same operation with division or whatever seems less so.  In related news, we use 4operand instructions for mul and div and rem and multiply+add, not sure if that’s reasonable but it’s something.
 I don’t really remember how multiply and divide should interact with flags, needs to double check.
 There’s annoying differences between remainder and modulo; do those still exist in balanced ternary? Check.
 Someday it would be nice to have a “sticky overflow” bit in flags, so you can clear it, do a bunch of operations, then check if any of them overflowed. For now it’s not worth worrying about though.
Not to do for now:
 saturating arithmetic
 Floating point
 Min and max instructions could be a compare followed by a conditional move, not sure if I care.
Bit mani… uh, trit manipulation
Instruction  Opcode  Format  code  Notes 

SH  R  rd = ra << rb  Shift left or right. Negative values of rb shift right. The last shiftedoff trit is stored in FL[F].CARRY. Can we do something with imm3?  
SHI  I  rd = ra << imm6  Shift by immediate. Negatives values of imm6 shift right. The last shifted trit is stored in FL[F].CARRY.  
ROT  R  rd = ra ROTATE rb  Rotate left or right. Negative values of rb rotate right, values pushed off one end appear on the other end  
ROTI  I  rd = ra ROTATE imm6  Rotate by immediate.  
CLZ  I  rd = countleadingzeros(ra)  
SIG  I  rd = FLAGS[F].CMP = signum(ra)  Signum, alias for
CMPI rd, ra, 0 

POPCNT  I  rd = popcount(ra)  popcount, returns number of nonzero trits 
TODO:
 Can we do something useful with the imm3 in SH and ROT?
 Can we do something useful with the imm6 in CLZ? Probably not.
 Shortcut operations to test/set/unset particular trits? We could do those with AND/OR/XOR operations that take an imm3 shift, as discussed with tryteextraction ops above.
Jumps
Instruction  Opcode  Format  code  Notes 

J  M  pc = rd + 3*imm12  Bogstandard absolute jump. All instructions are on 18trit boundaries so this cannot jump to an unaligned instruction.  
JL  M  r1 = pc + 3; pc = rd + 3*imm12  Jumpandlink.  
JC  I  if FL[F].COMPARE = + then pc = rd + 3*imm6; if FL[F].COMPARE =  then pc = ra + 3*imm6; if FL[F].COMPARE = 0 then fallthrough  Jump on condition. Goofy 3way jump ’cause sure why not. Basically jumps to rd+imm6 if the flag is positive, or ra+imm6 if the flag is negative. …now that I think of it, having the same immediate for both is not necessarily great, hmm.  
JCARRY  I  if FL[F].CARRY = + then pc = rd + 3*imm6; if FL[F].CARRY =  then pc = ra + 3*imm6; if FL[F].CARRY = 0 then fallthrough  Jump on carry. Goofy 3way jump ’cause sure why not. 
You construct PCrelative jumps like loads and stores, using AUIPC.
The 3way jumps are actually a lot more reasonable than they first
look, you can synthesize 2way jumps out of them by doing
JC r9, r9
or something which will then jump to
r9
if the carry flag is nonzero or fallthrough if it is
zero.
TODO:
 How do you jump only if the COMPARE flag is zero? Do
CMPI rSomething, 0
I guess. Not sure that’s ideal.
 Not sure if the immediate offset for the jump being the same for both branches is ideal either. Still, good enough for now.
AUIPC
loading a 12 trit offset and all our jumps having a 9 or 12 trit offset means we can address 21 or 24 trits of address space, which is frankly overkill. Honestly though, better than the other way around.
Other
Instruction  Opcode  Format  code  Notes 

NOP  R  no operation  Canonical encoding is
ADD r0, r0, r0 

UND  0  I  error  guaranteed undefined instruction, all zero’s 
HLT  halts CPU  
RESERVED  anything  Reserved instruction space for implementationdependent stuff 
Metaoperations
Basically, shortcuts for achieving specific things that may not be obvious:
 You can construct PCrelative addresses for loads, stores and jumps
with
AUIPC rd, imm12
followed by any ol M or Itype instruction. Honestly with this instruction format we could probably have a 24trit address space and be ok.  You can set CMP flags to specific values with
CMPI r0, r0, plus or minus constant
TODO:
 Can you set CARRY flags to specific values somehow?
Encoding space used
Counts are not necessarily exact, I’m not gonna recount everything every single time I change something.
R
instructions: 4 trits of opcode space, 81 values total. 16 used.I
instructions: 4 trits of opcode space, 81 values total. 27 used.M
instructions: 2 trits of opcode space, 9 values total. 4 used.
Total instructions: 47ish.
We cooooould get away with 2 trits of address space for a 9register flags reg. We would have almost literally zero extra encoding space left over though.
Implementation
So one of the first things to ask oneself for writing a VM for this is, “how do I represent trits?”. We have have several possibilities:
 Pack a trit into 2 bits and implement operations one trit as a time. This gives us 4 possible values per trit, so we will just say one of them is invalid. Then we can have 00 be 0, 01 be + and 10 be  .This gives us 4 trits per byte, or 16 trits per 32bit integer, which is pretty good, but means 18 trits would take 5 bytes. I don’t really mind the memory usage but having things be unaligned and bytewise Irks me.
 Represent trytes as a larger machine word, either 5 trits into 8 bits, 9 trits into 16 bits, or 18 trits into 32 bits. You wouldn’t represent trits directly, just operate on numbers. Then you’d do all the math and stuff in binary so, a tryte holding the value 13 would be represented as a byte holding the value 13. This would work pretty fine for math but doing ternary logic would basically require decoding and then reencoding the trytes, which sounds slow considering that instruction decode would be lots of ternary logic.
 Pack 4 trits into 8 bits but, instead of doing operations one trit at a time, implement them as a lookup table or maybe SIMD operations.
 More complicated encoding to try to stuff an 18trit word into 32 bits. You can’t quite stuff 2 trits into 3 bits, annoyingly, so I’m probably not going to bother. This devolves into something close to the “5 trit per byte” encoding above.