GarnetLanguage

Design

It is essentially designed to replace C and be simpler to use and implement than baroque alternatives like C++ and D. We want to provide more safety and power than C, while still remaining suitable for embedded programming, device drivers, and other situations where you might need a tyrannical grip on the hardware.

It has utterly no redeeming features from a research standpoint, I’m sure.

However, when we have a design choice between “This is how it is going to work” and “We can make it work this way if we add a couple features”, we do the latter. We will not write a single multiprocessing system into the language, but we will add operations that help write such things. We will not write a single memory manager into the language, but we will have the language be powerful enough to write various allocation functions.

Notable features:

Strong static typing
Parametric polymorphism
Namespaces
Sum types and pattern matching
First-class functions
Bounds-checked arrays
Variously sized integers
Tail call optimization
Control over memory allocation
Tuples
Default values and initialization syntax for structures
Pointers allowing access to raw memory and such low-level stuff Things we want to add:
Type casts
Bitwise functions/operators (set, clear, toggle bit/range), possibly with bitfield or bitvector type
Multiprocessing memory fence instructions
Array slicing/range types
Functions taking variable number of arguments, default args, keywords, function overloads(???) etc.
Null pointers might be explicit options with some syntactic sugar.
Exceptions/nonlocal exits, conditions
Boo-style optionally-inferred static typing
More number types (int8, uint8, etc…). Define conversion between number types.
Pragmas to put variables/values at a certain location instead of just alignment of structures and such
Optimization pragmas for functions
Consider Haskell-y typeclasses vs. C++ classes for type extension. I do like the idea of generic functions too…
Define inline asm better
Define unicode better
Read-only and write-only values/locations???
Bitfields
Memory regions for harvard/NUMA systems?
If we can write a malloc/free manual allocator, can we write a similar manual refcounting allocator?
OMJ standard library Things to consider:
Immutable strings?
Remember, no nulls. Maybe nullable types, though, which are just a shortcut for a certain sum type construction. A reference type ? instead of ^ for nullable references would be ideal. However, let’s see how things work without it for now.
For the object system… Mixins? Multiple inheritance? Interfaces?
FFI with C (should be easy)
More reflection, dynamic compilation etc? Expose the compiler as a library, ideally…
Symbols/Lisp keywords???
Conditional compilation??? Code regions a la C#?
Refcounted regions, with full safety (cannot be freed twice, cannot be have dangling references…)
Vector types, if only in a library… can we overload operators then??? We deliberately do not have:
Partial evaluation (alas)
GC and such. Too much infrastructure.
Multiprocessing. Also too much infrastructure, and no clear right way to do it. Mjolnir’s suggestions for making a language better than C for embedded programming:
Reasonable inline assembly syntax
Read-only and write-only varibles (for instance, for providing read-only or write-only machine registers)
Stronger typing that you can still override
Define bitfields, define order in bitfields (at least in terms of machine order, so in the 1 is always the least significant bit, regardless of whether the machine is big or little endian)
Real bit-twiddling routines (set, clear, toggle, test…)
Language level control for packing and alignment
Acknowledgement of the existence of harvard architecture systems. This could be done by being able to define different memory regions, and having pointers only able to point within a given region. Then the compiler could define a “flash memory” region and you could update and fiddle with it just like normal memory, and the compiler would turn it all into the ‘alter flash firmware’ stuff
Per-function optimization hints; “opt for size”, “opt for speed”, “don’t touch”, etc.
C++-style OO, or something like it.
Parametric polymorphism should not make everything indirect through pointers! Or should it? Fuck, look at Ada and Eiffel and Java and C#, and ponder HARD.
Variable-length size prefixes for strings
Don’t assume that sizeof(pointer) = sizeof(int)
Better standard library
Language support for ring buffers! :D Including ability to stuff arbitrary bits in and pull arbitrary structures out. Things to fix:
References are a little wiggy still… I sort of suspect they may be hugely inconvenient. But I suppose in the end that requires playtesting. …actually, taking OCaml refs as a model, they may not be quite necessary at all. Blaaaaaaah.
Everything with an XXX near it
All the other things that are broken

Goals

The general goals are in the first paragraph: a language to replace C. This means similar problem domain to C, such that it is not less powerful in any domain, but more powerful in many. These are more detailed goals for particular subjects. They are in no particular order.

Goals of the syntax:

Simple and minimal/small
Little ambiguity
Easy to write
Few special cases
Vaguely similar to C Goals of the type system:
Strong static typing that can still be overridden
Detailed enough for system/embedded programming
Polymorphic enough that you can have generic collection libraries
Some kind of subtyping mechanism if possible (objects, typeclasses)
Full runtime reflection would be nice but probably not feasible
Ability to have higher-order functions, composibility, etc. would be nice but probably very difficult. Goals of the standard library and namespace mechanisms:

Goals of the extension mechanisms:

Give implementations a way to handle common platform- and implementation-specific issues
Give the implementation more power over exactly how the language works in certain special cases
Provide guidelines for the interface of more extensive extensions

Fiddly Details Of Document And Implementation

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

Source files MAY be ASCII encoded, but SHOULD be unicode, hopefully UTF-8.

Sometimes this document describes certain things as resulting in an error or a warning. “An error” means that the compiler MUST NOT produce output, and say why. “A warning” means that the compiler MAY NOT compile the input, and MUST inform the programmer of the unusual condition (I suppose unless you have a compile flag to turn the warning message off, ’cause you’re a loser).

Lexical Elements

Identifiers

Identifiers are case-sensitive. They may only start with alphabetic characters, but may include the characters a-z, A-Z, and 0-9, as well as +, -, **, /, !, >, <, _, =, and probably a few other things.

Comments

Line comments start with a double dash, –. Block comments start with /- and end with -/. Block comments may be nested.

Literals

Integer literals are strings of numbers 0-9, with an optional minus sign at the front. “0x” followed by a string of 0-9 and a-f or A-F (case insensitive) are hexadecimal integers. “0b” followed by a string of 0-1 are binary integers. “0o” followed by a string of 0-7 are octal numbers.

Floating point literals are just numbers 0-9, a decimal point, and some more numbers 0-9, with an optional minus sign at the front. You can also put an exponential term at the end, such as 3.5e3, or 94.364E-4. Case on the “e” doesn’t matter.

Array literals are a list of literals surrounded by [square brackets] and separated by commas. All elements in an array literal must have the same type. You cannot put expressions in array literals at this point, because then they’d have to be initialized somehow when you do it for globals and constants.

A single character enclosed in single quotes (‘) is a character literal. There are also several escape sequences for non-printable characters:’‘is a tab,’‘is a newline,’‘is a carriage return, and’‘is a backspace.’‘where X is any integer literal (as above) specifies a character literal with that value (ASCII for normal strings, Unicode for unicode strings). A character literal prefixed by a will be a unicode character of type ’achar’ instead of ‘char’. For instance, a’f’.

XXX: Character escapes are ambiguous! Just use longest valid match?

All literals are immutable. This applies to structures, arrays and strings.

let foo somestruct = somestruct {a = 10, b = 20}
foo.a <- 20 -- Impossible!
foo.b <- 30 -- Equally impossible!

let bar &somestruct = @somestruct {a = 10, b = 20}
bar.a <- 20 -- Possible!
bar.b <- 30 -- Equally possible!

Reserved symbols

Reserved names and symbols are:

XXX: Once you come up with a firm design you like, this needs updating

Grammar

Usual BNF syntax. [] is zero-or-one, {} is zero-or-many, | is alternation, etc. Terminals are ALL IN CAPS, nonterminals begin with a lowercase letter.

XXX: Once you come up with a firm design you like, this needs updating

XXX: Needs to be fixed to handle precedence correctly.

program ::= {decl}
decl ::= namespaceDecl | importDecl | typeDecl | structDecl | dataDecl | globalDecl | funcDecl

namespaceDecl ::= "namespace" namespaceID
importDecl ::= "import" namespaceID ["as" namespaceID]
namespaceID ::= IDENTIFIER {"." IDENTIFIER}
exportDecl ::= "export" IDENTIFIER {"," IDENTIFIER}
pragmaDecl ::= "pragma" attribute

typeDecl ::= "type" IDENTIFIER "=" typespec
structDecl ::= "struct" [attributes] typespec "=" {structItem} "end"
attributes ::= "(" attribute {"," attribute} ")
attribute ::= ["optional"] IDENTIFIER | ["optional"] IDENTIFIER "=" value
structItem ::= IDENTIFIER typespec ["=" value]
dataDecl ::= "data" typespec "=" {dataItem} "end"
dataItem ::= "$" IDENTIFIER [typespec]
typespec ::= ["^"] ["%"] IDENTIFIER ["(" typespec {"," typespec} ["->" typespec] ")"] | typespec "[" [NUMBER] "]" | "{" typespec {"," typespec} "}"

globalDecl ::= "let" IDENTIFIER typeSpec "=" value

funcDecl ::= "def" IDENTIFIER "(" argList "->" typespec ")" {expression} "end"
argList ::= [ IDENTIFIER typespec {"," IDENTIFIER typespec}]
expression ::= varExpr | funcallExpr | ifExpr | whileExpr | forExpr | assignExpr | returnExpr | funExpr | matchExpr | arrayElementExpr | structElementExpr | value | compilerExpr | structLitExpr | arrayLitExpr | tupleLitExpr | mathExpr | unaryExpr

varExpr ::= "let" IDENTIFIER typespec "=" expression
funcallExpr ::= expression "(" [expression {"," expression} ")"
ifExpr ::= "if" expression "then" {expression} {"elif" expression "then" {expression}} ["else" {expression}] "end"
whileExpr ::= "while" expression "do" {expression} "end"
forExpr ::= "for" IDENTIFIER typespec = expression ";" [expression] ";" [expression] "do" {expression} "end"
assignExpr ::= lvalue "<-" expression
breakExpr ::= "break"
returnExpr ::= "return" expression
funExpr ::= "fun" "(" arglist "->" typespec ")" {expression} "end"
matchExpr ::= "match" expression "with" pattern "->" expression {expression} ";" {pattern "->" expression {expression} ";"} "end"
arrayElementExpr ::= expression "[" expression "]"
structElementExpr ::= expression "." IDENTIFIER
arrayLitExpr ::= "[" [expression {"," expression}] "]" 
structLitExpr ::= IDENTIFIER "{" [IDENTIFIER "=" expression {"," IDENTIFIER "=" expression}] "}"
tupleLitExpr ::= "{" [expression {"," expression}] "}"
dataLitExpr ::= "$" IDENTIFIER [tupleLitExpr | structLitExpr]
mathExpr ::= expression ["+"|"-"|"/"|"**"|"="|"/="|...] expression
unaryPrefix ::= "^" | "@" | "-"
unaryExpr ::= unaryPrefix expr

pattern = value | arrayLitExpr | structLitExpr | tupleLitExpr | dataLitExpr | "_"
lvalue = IDENTIFIER | structExpr | arrayExpr | refExpr
value ::= NUMBER  | STRING | CHARACTER | IDENTIFIER | SYMBOL | arrayExpr | structExpr

Types and Type System

KISS. Basic atomic types: integers, floats, characters. Pointers and functions as well.

Then we have compound types: Sum types, arrays, tuples and structs

Atomic types cannot be decomposed into simpler elements (besides bits), while compound types can.

Numbers

We have signed and unsigned integers with explicit lengths: uint8, int8, uint16, int16, uint32, int32, uint64, int64. We also have types ‘byte’, ‘ubyte’, ‘short’, ‘ushort’, ‘int’, ‘uint’, ‘long’, and ‘ulong’ which are mapped by the compiler to integers of a size convenient for the platform.

We have two types of floating point numbers: float and double. Floats are 32 bits, doubles are 64 bits. They’re typical IEEE floating point numbers, probably.

Any integer of any size can be used as a strictly more general integer without a cast. For instance, if we have an int8 value, then it can be used as an int16 value without a cast. If we wanted to use the int16 as an int8 value, we must cast it explicitly. If we wanted to use an int16 for a uint16 or vice versa then it would need an explicit cast, but we could use a uint16 in place of an int32 with no cast.

Any integer can undergo the same automatic conversion to a float or double. Converting a float or double to an integer requires an explicit cast, and truncates the floating point number.

Values and locations

Values are immutable. Local and global ‘let’ declarations bind values to names. This binding is also immutable, though it can be overridden in a new block. Values do not necessarily have a location or a place in storage; constants, intermediate values and such can be optimized away by the compiler if it wants. The arguments to a function are values, and functions return values. They may return the value of a reference to a location though.

Numbers, tuples, functions, references, pointers, and sum types are all values. This means that you cannot assign ‘3’ to have a new value, or alter the a single value in the tuple {3,“foo”}.

Locations are addressable locations in memory, that contain a value. Array members, structure members, or allocated memory pointed to by a reference are all locations. A location may be optimized away by the compiler. What a location contains can be mutated with the following syntax:

Edge case examples: If you had a tuple with an array in it though, for instance {“something”, [1,2,3]}, then you COULD alter the contents of the array. But you could not make the tuple contain an entirely different array. The same with a sum type. If it contains a location, you can modify the contents of that location, but you cannot make the sum type contain a different location.

let foo ^int = 5
let bar int = 5
foo <- bar

You can get the value of a location thusly:

let bar2 int = ^foo      -- Reference
let bar2 int = foo.bar   -- Structure member
let bar2 int = foo[0]    -- Array

The location constructor @ allocates local memory that is no longer valid at the end of a scope, and returns a reference to it. (Constructs a cell.) It requires an initial value, and is used thusly:

let bar ^int = @10

You can store a reference in a location for MORE INDIRECTION (yaaay!):

let foo int = 5
let bar ^int = @foo
let bop ^^int = @bar
-- now ^bop returns bar, ^^bop returns 5

You can pass references to functions and return them from functions, thus facilitating pass-by-reference:

let a ^int = @5
def foo(x int, y ^int : unit)
   y <- ^y+x
end

def bar(x int : ^int)
   let y = @x
   y               -- Kaboom!  This returns a locally-allocated value!  
                   -- The compiler SHOULD make this an error.
end

def betterBar(x int : ^int)
   let y = new int in someregion with x
   y
end

Note that locations CONTAIN values, they do not POINT to values. Reference are values that POINT to locations. You cannot mutate a value via a location reference. Thus:

let a int = 5
let b ^int = @a   -- b = 5, a = 5
b <- 10           -- b = 10, a = 5

let c ^int = b    -- b and c are now separate references that point to the same location.

For the sake of comparison with C:

let a int = 5                  int a = 5;
let b ^int = @a                int **b = (int**) alloca(sizeof(int)); **b = a;
b <- a;                        **b = a;

let c ^int = b                 int **c = b;
c <- a                         **c = a;
c <- b -- Type error, b is a ^int but the memory cell c points to can only contain an int
let d ^int = b                 int **d = b;

b itself is an immutable value. It always refers to the same memory cell. Thus, there is no way to do the equivalent of ‘b = &a’ after b has already been declared. Declare a new value and trust the compiler to reuse memory that isn’t going to be referred to again.

Polymorphism

We also have parametric polymorphism. Polymorphic types are just given by a name prefixed with a `. You can do nothing with a polymorphic value except assign it, return it, or call another polymorphic function on it. For instance:

def foo(v1 `a, v2 `a : bool)
   v1 = v2
end

has type

fun(`a, `a : bool)

and works because the ‘=’ function has the same type.

Each `a must be the same. If they were different types you’d have to give them different names, such as:

def bar(v1 `a, v2 `b : someType)
   doSomethingWith(v1, v2)
end

This has type fun(a,b : someType)

All polymorphic types are more or less treated like references to locations. This means we know how big they are (as big as references are), so we can do things like create locals. It also means that we can’t do things like new or free

XXX: Parametric polymorphic things can be represented by pointers, or by immediate values if they fit. However, it is probably a much better idea in the long term to go through the trouble of doing individual recompilation for each type that gets used, like C++ and C# do it. Hmmmm. This is actually a difficult issue….

Types can contain other types… arrays are the most obvious example, for instance “int[]” or even “`t[]”. References are similar, ^%t and such.

Numbers

We’ve covered this pretty well. Integers and floats.

Arrays

Arrays are a contiguous chunk of memory. They have a fixed length, they know their length, and do automatic bounds-checking (there will probably be a compiler switch to tell them not to).

They do parametric polymorphism too, as demonstrated above.

XXX: An array of int is declared int[]. XXX: How do we declare/allocate an array? @int [10], malloc(sizeof(int) ** 10)?

XXX: Is ^int[] a reference to an array of ints? Or an array of int references? This is why it should be int^ instead, so we can have int[]^ and int¹.

Arrays elements are indexed by the syntax

arr[i]

and are assigned to by the syntax

arr[i] <- val

where both arr and i can be expressions.

XXX: If you attempt to reach out of the end of an array, is it an error? Probably. How is this handled? Unknown, yet.

XXX: What happens if you index an array with a negative number? I VERY much like Python’s usage of just indexing backwards from the end.

Characters and Strings

The type ‘char’ is a unicode character. Right now, what encoding they are is undefined. UTF-8 and UTF-16 are both good candidates. Note however that characters are CHARACTERS, not numbers. Their size is not guaranteed to be a byte, or anything else.

The type ‘achar’ are 8-ASCII characters.

Strings are arrays of characters. Strings are delimited by double quotes (“). A string is just an array of characters. Character literals can be embedded in strings, just do the backslash escape sequence. A string literal prefixed by an ‘a’ will be an ascii string of type ‘astring’ instead of ‘string’. For instance, a”I’m ASCII now!"

References

References are your bog-standard way to manipulate mutable memory cells. A reference to type foo can be written ^foo. The reference has one operator, a dereference operator “^” that returns a value of the location the reference refers to.

The location constructor @ lets you allocate local memory that is no longer valid at the end of a scope, and returns a reference to it. It requires an initial value, and is used thusly:

let foo = 10
let bar ^int = @foo
let bop ^int = @20

--- XXX: Danger!  Using new sane reference syntax!  Should always do this...
let baz int[]^ = [10, 20, 30]

The dereference operator ^ takes a reference and returns the value of the memory referred to:

-- Note that ^ essentially has type fun(^`a -> `a)
let a ^int = @10
let b ^int = @10
let c ^int = a

a = b         -- False, a and b refer to different memory locations
a = c         -- True, a and c refer to the same memory location
^a = ^b       -- True, the contents of a and b are identical

Structures

Structures are a bunch of values mooshed together. Structures can be polymorphic.

This declares a new structure of type “foo” with three members: an integer x, a float y, and a reference to another foo structure “z”.

struct foo =
   x int
   y float
   z ^foo
end

Structures can be polymorphic:

struct list(`t) =
   item `t
   next list(`t)
end

Structures, like arrays, are full of locations and are indexed in the familiar C-ish way.

list.item
list.next <- nextNode

Structures can be given default values:

struct bop =
   x int = 0
   y float = 1.0
end

def whatever(: float)
   let a ^bop = alloc(bop)  -- XXX: Yes, 'alloc()' is doing magic here, whatever it is.
   a.y  -- returns 1.0
end

Note that a.y works whether a is a bop or a ^bop. Thus a.y and a reference to a.y are in this case equivalent.

Default values of atomic types are 0, 0.0, null character, etc. Default values of other things are undefined; all references, unions, tuples etc must either have something specified as a default, or must always be provided with a value using an expression (see below).

XXX: This constraint breaks the above linked list example

Implementation note: The way this works is that an initializer function is created and automatically called on new structs.

Structures can also be written as expressions:

let foo bop = bop{x=10, y=20.0}

If bop has default values, you can omit the ones you don’t want and just write:

let foo bop = bop{y=20}

Big fat XXX: What the fuck is the difference between these two things?

let a somestruct = somestruct{a = 10, y = 20}
let b ^somestruct = @somestruct{a = 10, y = 20}

Well, a is a value and b is a reference to a cell containing a value. Does that mean the fields of a are immutable? No, but it means that a can never be a different struct. Say we had:

data foo = 
   $foo somestruct
   $bar
end

let a foo = $foo somestruct{}
let b ^foo = @$foo somestruct{}

a.a <- 10  -- Okay
b.a <- 20  -- Okay
a <- $bar  -- Doesn't work
b <- $bar  -- Works fine

I’m pretty sure this model is still consistent, but I’m not entirely sure it’s the Right Thing. Still, best thing I have.

XXX: The automatic dereferencing for assignment and such is a little wiggy. Might want to either get rid of that or make it universal. Not sure.

XXX: A related issue is, if we want to dereference a reference in a struct, do we do ^(struct.member) or (^struct).member by default? Precedence! Again we could do struct.member^ vs. struct^.member, but… ewwwww. (Actually I’ve been working on this for too long because that isn’t seeming like a horrid idea)

Structure attributes

Structures can have attributes assigned to them, that tell the compiler to behave in certain ways. The attributes are enclosed in parenthesis and separated by commas and are entirely optional for the programmer. Attributes may be mutually exclusive. Attributes are a list of keywords, or a list of “key=value” pairs. An attribute may be prefixed by the word ‘optional’, with effects described below.

Implementations are free to define their own structure attributes, but it is suggested they try to follow the general format listed above.

Right now we only define three, ‘pad’, ‘pack’, and ‘align=’. Pad and pack are mutually exclusive. Align takes a single number specifying which byte boundary to align on.

XXX: Might want to break this up into compiler pragmas in general… We might want to apply pragmas to structs, functions, variables…

struct (pad) foo =
   a char
   b char
end

struct (pack) foo =
   a char
   b char
end

struct (pad, optional align=8) foo =
   a int
   b float
end

XXX: Do we want to specify maximum padding? So pad=4 would be ‘pad values less than 4 bytes to a 4-byte boundary’

XXX: Whether a compiler pads or packs by default is undefined… the Right Answer is probably ‘pad’, right?

The compiler MUST follow these directives, or signal an error. If the attribute is prefixed by ‘optional’, the compiler SHOULD follow the directive, and MUST signal a warning if it cannot.

Pointers

Pointers a low-level programming that refers to the computer’s memory directly. You can point them at arbitrary addresses. You can do arithmetic and let them dangle and overrun arrays and fandango on core and all sorts of things. Try to never use them unless you’re writing something like an operating system or memory manager.

Pointers are type ‘ptr’. What they point to is NOT type-checked. This provides a big fat hole in the type system for those who really need it.

Pointer operations are ptrLocation(), ptrContents(), ptrSet() and ptrReference(). ptrLocation() takes a location and returns a pointer to that location; modifying the contents of the pointer modifies the contents of the location. ptrContents() returns the contents of the given pointer. In this way pointers refer to locations, like references. Unlike references, which may only refer to particular locations, pointers can also have math done to them like integers. A pointer can be an arbitrary integer rather than the result of ptrValue() or ptrLocation(), pointerContents() can take an arbitrary integer, and all operations that are valid on integers are valid on pointers. ptrSet(p,v) sets the location pointed to by p to the value v. ptrReference() takes a pointer and returns a ^%t reference referring to the same object.

XXX: Pointers are numbers? On some platforms they are not simply integers, such as segmented architectures, Harvard architectures…! Pointers use normal integer math?

XXX: Shorten names some…

XXX: Shorten syntax even more? Think of those poor embedded programmers…

let foo int = 5
let p1 pointer = ptrAddress(foo)
let foo2 int = ptrContents(p1)  -- foo2 = 5
let p2 int = p1 + -97
let i int = ptrContents(p2)     -- Kaboom!  Value of foo2 is undefined, possibly invalid.

--- Now for some more sophisticated pointer-mongling
struct something =
   a int
   b float
end

let a ^something = @something{a = 10, b = 20.5}
let pb pointer = ptrLocation(a.b)
let i int = ptrContents(pb)   -- Get untranslated chunk of a float as an integer.
let iNew int = someKindOfBitTwiddlingWith(i)
ptrSet(pb, iNew)              -- Forcefully set a new binary value to a.b, which will then be interpreted as a float

Tuples

A tuple is a collection of values or locations. They are immutable. They are written {1,2,“foo”}, which would have type {int, int, string} or such. You can deconstruct them to get at the values using pattern matching. I’m not really sure if you can get to the values any other way yet.

XXX: Ponder. Can we do tuple[0], tuple[1], etc?

Sum types

Also known as a tagged union. A type, or possibly several types, that can substitute for each other. They can be polymorphic. You can also leave off the type and just have the tag, essentially producing enums. You get to the data attached to the tag by using the ‘match’ expression, and ’_’ is a special match term that will match anything.

data foo =
   $foo int
   $bar {int,float}
   $bop somestruct
   $quux
end

let x foo = $foo 10
match x with
   $foo a -> doSomethingWithInt(a);
   $bar b -> doSomethingWithTuple(b);
   $bop c -> doSomethingWithStruct(c);
   $quux -> doSomethingRandom();
   _ -> doDefault() -- But this will never happen
end

XXX: Say we had $bop int[]. How the fuck would that work?

You can’t deconstruct a union to get at its member without pattern matching. This preserves safety, even if it’s a bit of a pain in the butt. We might eventually want automaticly defined functions that perform a bit of reflection, and generally treat them like enums.

XXX: We really want to be able to treat enums like ints. They become much less useful if you can’t.

Unit

There is a unit value and type, called ‘unit’. It means ‘this function returns no useful information’, more or less. It can be defined thusly:

data unit = $unit end
let unit unit = $unit

Bool

Bool is also pretty easy to define.

data bool =
   $false
   $true
end

let true bool = $true
let false bool = $false

Bitfields

Bitfields!

let bitfield(8) foo = <<0xFE>>
let bitfield(5) bar = <<0b10101>>
let bitfield(4) bop = <<4>>

XXX: Then we have expressions to test particular bits, set and clear and toggle bits…

Semantics

Program form

A program is a bunch of declerations.

An executable program starts at the function called “main”, of which there can only be one per program (cue Highlander). It takes an array of strings as its argument and returns an integer as the program’s return value. An implementation MAY change this as required by the host platform; for instance, for writing an operating system or programming an embedded device.

All expressions are evaluated in the order they are written, unless the compiler can prove that they don’t depend on each other. Function arguments are evaluated in an undefined order. Program execution probably starts at the function called “main” which receives as its argument an array of strings as its argument, and returns an integer as the program’s return value, yadda yadda.

Global variables

Not sure yet whether we can initalize them to expressions or just values. I rather want to KISS and say ‘just values’. We have some pretty powerful values, after all. Functions, for instance.

let somename type = someexpression

XXX: But then how do we do this?

let something ^int = @10

Type Definition

type foo = bar
struct foo = ... end
data foo = ... end

A type definition essentially an alias. You can do the following:

type foo = int
type bar = int
let x foo = 10
let y bar = 20
let z int = 20
y <- x
z <- y
x <- z

XXX: Is this right?

Function types

XXX: …

Builtin types

int
float
char
string
ref(a) (aka ^a)
array(a) (akaa[])
ptr
fun(argtype, argtype, … : rettype)
tuple(a,b…) (aka {a,b…}) XXX: Crap, having array be [foo] and ref be ^foo solves the ambiguity problem too. ² vs. [^foo]. I recall this being the original plan… foo^ and foo[] also solves the problem pretty nicely, and leaves the possibility for more affixes. ^foo and []foo would also work, but would probably make C programmers hurt even more.

XXX: Is this section entirely redundant?

We’ll have basically the entire Scheme numerical tower as well, it’ll just be a library.

Expressions

We start with only a few types of expressions apart from literals and variables: function definitions, if statements, while statements, function calls, value bindings, and pattern matching.

Functions

A function is defined thus:

def name(arg1 typ1, arg2 typ2, arg3 typ3 : returntype)
   body
end

A function always returns the value of the last expression in it, or the result of a return expression. Functions return one value. Any type of a size known at compile time (a tuple, struct, etc, but NOT an array) can be returned without worrying about where the memory for it is.

XXX: This is opposed to what the bit about references says if you try to return a local cell! Fook.

let f ^foo = new foo
f <- someFunctionReturningFoo()

The above code could result in someFunctionReturningFoo() getting passed a pointer to f and putting its ‘return value’ in it directly and invisibly, so copying would never need to happen.

We can also have anonymous functions, thus:

let x int[] = [1,2,3,4,5]
let f fun(int : int) = fun(x int : int) = x ** 2 end
Array.map!(f, x)
-- x now has the value: [2,4,6,8,10]

We don’t bother with closures right now. Functions cannot reach up outside their own scope, apart for referring to global values.

Defining a function is an expression, and returns the function defined. So you can do:

let f fun(int : int) = def g(x int : int) x+1 end

XXX: Inline and other compiler pragmas on functions!

Function Calls

f(x1, x2, …)

All math and logic and such are function calls under the hood. Trust the damn compiler to inline them.

XXX: A good idea would probably be to use something like Python’s metamethods! So ‘1 + 2’ is really ‘add(1,2)’.

If and While and For

These are expressions! They return one value, the last expression in them. They create a new block/scope.

if a then b [elif c]** [else z] end

while a do b end

for x int = 0; expr2; expr3 do stuff end

Only unexpected thing is ‘for’; this is almost but not quite like C. The first thing is basically a special value binding that gets re-bound every iteration of the loop. The second is a boolean expression for the test. The third is an expression that returns what the new value of the index should be. For example:

for x int = 0; x < length(someArray); x + 1 do
   someArray[x] <- x**2
end

-- For some abstract data type with an iterator function...
for i iterator = getIterator(s); iteratorDone(i); iteratorNext(i) do
   doSomethingWith(getValue(i))
end

XXX: Should it get turned into the ‘for x in y [by z] do a end’ construct instead? Hmmm. It seems to make a lot more sense in this context, but is less flexible.

Return

The return expression is just a shortcut that exits a function immediately, returning the given statement as the return value.

def example(x foo : rettype)
   if not (isValid x) then
      return ErrorValue
   end

   ...Do stuff...
end

A return must be the last expression in a block, to prevent weird shenanigans like:

def foo(: something)
   let a something = return $Something

Local values

Local value decleration is an expression. It returns the unit type. Binding the same value name more than once in the same scope block is an error. The LHS is a pattern, the RHS is an expression

let x int = a
let y float = b
let z somethingelse = c
let {a,b,c} {int,float,string} = someComplicatedFunction()

XXX: Can we define arbitrary blocks, then???

Assignment

These are expressions too, though they just return the value of the RHS

Assignments are pretty straightforward. They alter the value of a location. That is, a reference, structure element, or array element.

The LHS of an assignment is a location; that is, a reference, or a location in a structure or array. Thusly:

let a int = 5
a <- 10          -- Invalid
let b ^int = @a  -- ^b = 5, a = 5
b <- 10          -- ^b = 10, a = 5
let c somestruct = someStruct {foo=10}
c.foo <- 20      -- c.foo = 20
let d [int] = [1,2,3,4,5]
d[0] <- 10       -- d = [10,2,3,4,5]

Pattern Matching

Again, expressions. Breaks a value out of a tagged type or a tuple. Can also just match arbitrary values.

match x with
   foo -> doSomething(foo);
   SomeStruct{foo=10} -> doSomethingWithSomeStruct(x);
   SomeVal -> doSomethingElse();
   _ -> anythingElse();
end

match y with
   10 -> 1;
   11 -> 2;
   _ -> 0;
end

_ matches any pattern.

XXX: SOMEDAY we will have a match-one-item kind of thing, like OCaml’s “let {a,b} = sometuple in …”. But we do not have it yet, since it would throw an exception when the match failed, and we don’t have exceptions. For now all pattern matches must be complete.

Refinement: Be able to say “match x with foo where foo=10 -> …”, or “match x with foo@bar -> …” where foo@bar means “evaluate expression bar and match against its result, binding to foo if it matches”. This may happen later.

sizeof

sizeof() takes a type and returns the size of the type, thus:

let intsize int = sizeof(int)
let floatsize int = sizeof(float)

sizeof() can return any integer type that is big enough to represent any type.

XXX: This can be made entirely redundant if we store runtime type information. This is not bloat; most of this information would be things we need and use anyway.

Minimum type information needed would be:

Size
Initializer
Comparison function …it’s tempting to do more, like a toString function. But then we’re just making classes, and so we might as well just have classes! Either C#-like or generic functions…

Also, typeclasses still don’t seem like a horrid idea.

Math and Logic

We define 5 math operations: +, -, /, ** and % (modulus). They work transparently on all number types, have expected precedence, etc.

We have some logic operation: and, or, not, xor. These take boolean values and return boolean values, and perform short-circuit evaluation.

We have some bitwise logic operations, that treat integers as bitfields: band, bor, bnot, bxor, shl, shr, lshr. shl and shr perform arithmatic bit shifts, preserving the sign bits. lshr performs a logical right shift.

We have one equality operator: =. It takes two values or references of the same type and compares them. Values are compared as binary blobs, field by field. References are equal iff they both refer to the same location.

Namespaces

Namespaces are a way of controlling the visibility of things in a file, and controlling name clashes. When you declare that a file is part of a namespace, everything in it gets added to that namespace. You can also declare in a file that you want to import a certain namespace, or import it and change its name. If you do not, you can still access that namespace (linker permitting, and such), but have to refer to anything in it by the whole name path.

Namespaces are hierarchical. Namespaces have no associations with files or file names. At the moment a file is the smallest unit you can include in a namespace, but there’s no reason that has to be the case. A namespace can also be made of multiple files; it is an error to try to define the same value in the same namespace more than once. Namespaces are global among all compilation units a program uses, including dynamic libraries. If a file does not explicitly declare a namespace, it exists in some private, automatically-named one that you can’t actually learn the name of, and everything in it is private.

You can use an import directive to make the values in a different namespace available in the current namespace, without having to prefix each value with the whole path. If you import two different namespaces with the same value name in them (for instance, Foo.A and Bar.A), then the compiler SHOULD issue a warning and the programmer MUST say explicitly which is which every time you use them. So you CAN import two different namespaces which contain the same name, but you must disambiguate them by hand.

You may also use ‘import A as B’ to rename a namespace that you import, giving it a different prefix. So you could say ‘import Some.Very.Long.Namespace.Heirarchy as S’.

If there is no ‘export’ directive, all values are exported by default. Otherwise only the values listed after the ‘export’ directive are made public to other namespaces.

Example.

-- File 1
namespace Foo.Bar
def doStuff(: rettype)
   ...
end

-- File 2
namespace Foo.Bar
def doMoreStuff(: rettype)
   doStuff()   ; No namespace path needed.
end

-- File 3
namespace Foo.Bop
def doOtherStuff(: rettype)
   Foo.Bar.doStuff()   ; Full namespace path needed
end

-- File 4
namespace Foo.Bop
import Foo.Bar as F

def doYetMore(: rettype)
   F.doStuff()
end

Implementation notes

This needs a Sufficiently Clever Linker and a system to keep track of some metadata, but shouldn’t be too hard. The main thing is that there is no way to promise that a namespace only contains a set of functions; so really, you can stick stuff in whatever damn namespace you want as long as it doesn’t cause name clashes.

It is intended for a compiler to be able to resolve dependancies for a program starting from a single key file, ie the file containing a program’s ‘main’ function or a library’s public interface. However, I just realized this is impossible since a compiler can’t know what namespace a file is in without peeking into the file. If we force namespaces to be file names like Haskell we can maybe do this. I dunno if we want to do that though. We may have to suck it up and live with make for another 20 years.

XXX: We COULD… essentially, compile all libraries/namespaces/whatever down to .NET assemblies, with appropriate metadata and everything, and this would let us have an intermediate, typed wossname for parametric polymorphism. Then we just do the final compilation and linking in native code. This really isn’t sounding like a bad option. LLVM bitcode might also be another option??? The LLVM people sort of say “don’t do this, it’s not entirely stable”, which is worrisome.

Exceptions

They don’t exist yet. They will someday. We may make general non-local exits, with the exit-to point being an explicit continuation-like object that can get passed around, a little like Scheme. But first we need an unwind-protect mechanism.

XXX: Will they? Will they really?

Inline Assembly

The general syntax of it will be something like this:

asm "abi-decl"
   STUFF GOES HERE
end

The “abi-decl” string is a string of a type that should hopefully be standardized later, which defines what kind of architecture and ABI the assembly code is using. If the compiler doesn’t recognize it or can’t compile it, it then knows to throw a sane error. The “STUFF GOES HERE” part is the assembly code itself. How it actually works is pretty much undefined right now.

Pragmas

Pragmas are hints/instructions in the source file telling the implementation to do something special. They have the following syntax:

pragma keyword
pragma keyword=value
pragma optional keyword

There may be multiple pragma declarations in a source file, and they apply only to the source file they are contained in. Some pragmas can even appear multiple times, like line=. Like structures, a pragma MUST be followed by the implementation unless it is prefixed with ‘optional’, in which case it SHOULD follow it and MUST give a warning if it cannot.

The following standard pragmas MAY be implemented. An implementation SHOULD use one of the following rather than defining something subtly different to do the same job.

line= – Set line number in error reporting
file= – Set file name in error reporting
inline= – Always inline the named function
warn= – Cause the compiler to emit the value as a warning
message= – Output the value as a message Others that may be useful?
pad – Always pad structures by default in this file
pack – Always pack structures by default in this file
align= – Always align global values to the given boundary
deprecated= – Cause the compiler to emit a warning complaining of a deprecated feature or part of code, with the value Implementations are free to add more pragmas as is pragmatic.

Extensions

These are various things that extend the language in ways that are probably convenient and desirable, but not to everyone. Implementations MAY use any or all of these extensions, but MUST document somewhere which they use.

It is hoped that by documenting and defining some optional things people won’t always go haring off making incompatible features that do the same thing.

The assembly block, structure directives and compiler pragmas are also designed to be extended by implementations as needed. XXX: We should probably make directives for individual variables as well…

I had two or three things I wanted to put here, but forgot them. Segmented pointer syntax? Memory location stuff? General variable attributes?

Dynamic memory

XXX: Screw all below. We could do bog-standard malloc(), we could do manually refcounted memory, we could do regions… So far the language really doesn’t care, but the standard library sure as fuck will. Hmmm.

This is in the extensions section because there are really two different ways it could be done, which are sort of mutually exclusive… free’ing a region allocated thingy is invalid if not fatal, and the standard library needs to either use one or the other, or not allocate at all.

XXX: This whole section is pretty rough.

XXX: Can we do this? It seems like a bit of a strain on the type system, but I THINK it’s valid…

def new(r region, initvalue %t : ^%t)
   let v pointer = getMemoryFromPool(r)  -- XXX: Still need sizeof!  Dammit!
   let newVal ^%t = pointerReference(v)
   newVal <- initval
   newVal

new in with

Or OPTIONALLY

new with

free, rfree

BREAK

Everything above here in the document is pretty well defined, if only defined as ‘we will define this later’. Below this point things are pretty fuzzy and speculative, and probably won’t be implemented in the first few versions.

Typeclasses

Essentially, you say ‘here is a set of functions that implement this interface upon this type’. And then you can use that interface as a type, and declare other types to use the same interface and thus be members of that typeclass.

Revise: Make this exist. Also, ponder multivariate objects, and Excessively Dynamic objects like Objective-C.

Implementation notes

Easy peasy; instead of having a pointer attached to an object’s in-memory representation that points to the typeclass, then in instances where a typeclass is used the pointer is passed along with the value. So for instance, if we have typeclass Num and an implementation of it for Int:

def doSomething(x1 Num, x2 Num : Num)
   x1 + x2
end

It compiles to:

def doSomething(x1_tc NumImplementation, x1 Num, x2_tc NumImplementation, x2 Num : {NumImplementation, Num})
   x1_tc.AddMethod(x1, x2_tc, x2)
end

Or something reasonably similar, anyway.

Safety considerations

We cannot have buffer overruns unless you force it to, either by using pointers or telling the compiler to turn off bounds checks.

We have no null references.

We have no uninitialized variables.

We can have memory leaks, but they should be easier to both avoid and debug. Regions are deallocated all at once, and you should be able to tell at runtime how big a region is, and thus notice if it is getting huge. However regions themselves can also be leaked, which is vaguely worrisome.

We can have dangling pointers, but hopefully it should be a bit easier to avoid. This could be largely eliminated if we give pointers region specifiers; for instance, this particular pointer may not refer across region boundaries. This would remove the danger, and could also provide a way to track things that are allocated in a stack frame.

We can refer stack-allocated values outside their scope if we’re not careful. See above.

We can have double-frees. Hmmm. Even refcounting won’t fix this.

Library

We need ratios and bignums. Basically, the whole Scheme numerical tower.
Library functions should have mutating and nonmutating versions, a la Ruby, and also have allocating and non-allocating versions so you get to choose how much micro you want to do
Check out Cocoa and .NET and Ruby.
Check out D’s Phobos and Tango libraries We can create destructors within the language itself, using something along these lines:

class IDestroyable =
   destroy  fun(IDestroyable : null)
end

instance foo : IDestroyable =
   def destroy(f foo : null) =
      ...
   end
end

-- Region needs to know about this, which is a little bit of feeping creaturism...
-- Maybe we can have multiple types of regions, implementing the same typeclass interface???
-- Hmmm.
def rBindDestructor(d IDestroyable, reg Region : null)
   let dlist List(IDestroyable) = reg.Destroyable  
   List.rallocNode(dlist, reg, d)
end

def rfree(reg Region : null)
   List.Iter(destroy, reg.Destroyable)
   ...
end

var a : foo = ralloc(region)
rBindDestructor(a, region)
rfree(region)

I haven’t decided how we jibe the typing with memory allocation yet, so just bear with me. And of course the pool of things to free doesn’t have to be actually attached to the region, but it would be nice.

Inspirations

Haskell
OCaml
C
Cyclone
Common Lisp
Python
Boo
Probably others

↩
foo↩