BetterThanC

Thoughts on creating a minimalistic systems language that starts from an ML-y base. Rust is better than C++, can we make something more basic that is better than C?

Largely inspired by Rust; my thought is basically “what if we took a strong, ML-y type system, slapped pointers into it, and tried to see how far we could crank up the safety without a borrow checker”. Other influences: OCaml, F#, everything in https://wiki.alopex.li/_category/language

…I kinda want a borrow checker though. And ♥️move semantics♥️. We’ll think about that as we go.

Things to look at much more deeply: Nim, Idris, ATS, Myrrdin, Zig

Key ideas

Goals:

  • RAII
  • Move semantics
  • Generics, but without monomorphization at least at first
  • Immutable by default
  • ML-y type system
  • Compiles fast(er than Rust)
  • You can easily write an OS kernel in it

Later goals:

  • Full spread of number types – gonna need it sooner or later but think about how the math works first
  • Rust levels of safety – it’d be nice but may not be worth the trouble yet, explore.
  • Type inference – It will happen but not yet
  • Module system
  • Stable ABI, lib metainfo format (such as header files)
  • Borrow checking, at least a limited form of it

Maybe goals:

  • Zero-sized types?

Non goals:

  • Absolute memory safety
  • Whatever it is that makes traits in Rust so powerful (and complicated)

Syntax

We’ll just do it ad-hoc right now since we’re just brainstorming. It’s basically going to look like Rust and Lua had a lovechild, I expect.

Comments use // (line) and /* */ (block) Block comments can be nested.

Types

Are zero-sized types a thing? Unsized types? …yeah, unsized types definitely are. ZST’s I need to understand more about; there are some huge benefits, but also it adds a lot of complexity.

Rules: A pointer to a ZST is always invalid. A ZST by definition can only ever have one value.

Numbers

Integers of specific sizedness. u8, u16, etc, like Rust, up to 64 bits. Signed and unsigned variants. Like Rust, not convertible into each other implicitly. There is also usize, the size of an array index (also the size of a pointer?) and isize, the size of a pointer offset. Rename to idx and offset? uidx and ioffset?

bool is also a type. Booleans and integers are not the same.

char is a UTF-32 character.

Tuples

A collection of zero or more possibly-different types.

() (aka unit)

(u8, u16, i32)

Access members either through indexing via tuple.0 or tuple.1 or such, or via match or let.

Should it just be the same as array indexing?

Arrays

Fixed-sized collections of homogeneous values.

[u8;32]

Basically Rust arrays.

Making them generic on length would be nice but don’t bother right now. We don’t even have the easy generics. We might be able to special case it without too much heartbreak though?

Slices

A typed pointer and length. Refers to a segment of an array, generally.

[u8]

Use @u8 or something as a sigil instead? Eh. Maybe.

Strings, string slices

Going to be UTF-8 in the pattern of Rust, but don’t worry about it yet.

Enums

One or more variants, optionally with an enum attached.

Can we distinguish from c-like and ML-like enums somehow? The former are a very useful subset of the latter. C-like enums are basically a shortcut to define a mathematical set, where each member of the set is an integer. ML-like enums are a shortcut to define a mathematical set, where each member of the set is a tag+tuple.

The latter are also mostly useless without generics. The easy way is to make them entirely separate things, but ugh. The Better way to do it is to have certain Trait-like properties that can be implemented for, or are implemented automatically for, C-like enums. Or at least that’s what I would expect, and am confused that Rust doesn’t do.

enum TaggedNumber =
  | Nothing
  | Signed (i32)
  | Unsigned (u32)
end

Aha, one way to do it is to have two types: Sets, which are C-like enums, and Enums, which are Rust-like enums. Then you can make it so that for each Enum you can pull out the discriminant as its own Set type, define equivalences between Set’s and Enum’s, and so on.

Structs

Basically just tuples with syntactic sugar? Don’t worry about them now tbh.

Pointers

May or may not point to valid memory. No null pointer.

*const i32

*mut i32

Use ^ as the sigil instead? Eh, don’t bother

Functions

A function pointer type. No closures yet.

argtype1 -> argtype2 -> returntype

Generics

This is going to be kind of a big deal. I consider it necessary, but it’s also quite not-easy.

There’s two ways we can approach this. We can do the C++/Rust way, where generics are specialized at compile-time and a new instance of a generic function is built for each specialization. Or we can do it the OCaml way, where generics always live behind a pointer. The latter method is more limited but far easier to reason about (and compile).

Let’s start with “generics must live behind a pointer” and see where that takes us. …It basically takes us to a “sized vs. unsized type” concept, which is p. useful tbh.

Don’t bother yet

Floats, characters, strings

Actually characters are easy, it’s just a UTF-8 char in a u32.

Mutability

Everything is immutable by default.

Variables may be declared as mutable. It’s a little redundant since you can just re-bind variables, but it’s some sugar.

Const vs. mut pointers on the other hand are part of the actual type.

Keep in mind that in Rust, you can turn owned values from const to mut, but turning references or pointers from const to mut is undefined. This lets the compiler inline const’s more aggressively I expect. Investigate more.

Functions that are explicitly pure would also be nice. Rust does this with const fn, which only implement a limited subset of the language that (IIRC) is provably pure and actually terminates. Haskell, OTOH, basically says “everything is pure and any side-effects have to get wrapped in a monad”, which is nice to reason about but harder to program in.

Expressions

Values

Integers, enums, tuple literals, array literals, etc.

Variable declaration

let x: i32 = 10

we don’t need no semicolons, hopefully

no type inference yet

No uninitialized variables. Later, maybe they can be uninitialized until assigned to when the compiler proves it valid.

_ is an ignored variable name.

Tuple destructuring:

let x, y = (10, 20)

Enum destructuring:

let Foo(x) = y

It is a compile-time error to not handle every possible variant. if let comes later.

Is this an expression? For now we’ll say “yes, it returns ()

Assignment?

Figure that out if we want mutability, and when.

Is this an expression? For now we’ll say “yes, it returns ()”, which is the simplest way to close the “statement” vs “expression” divide.

Match

match foo with
  | X -> expr1
  | Y -> expr2
  | _ -> expr3
end

_ means “anything else”. It is a compile-time error to not handle every possible variant.

I kinda dislike -> and => being separate sigils. See if we can just use -> everywhere without it being ambiguous.

Operators

Don’t bother yet. Just assume the presence of + - * / % and or not xor.

Pointer stuff

Creating a pointer pointing to the result of an expression uses &. Assume the compiler is smart enough to handle that – put the result in a temporary place, return a pointer to it.

Dereferencing a pointer uses *. Dereferencing an invalid pointer is allowed to do anything, but I’m not sure that the compiler should be allowed to assume it never happens.

let x: i32 = 10
let y: *mut i32 = &20
let mut z: *const i32 = &30

// Change the contents of the memory y points at
*y = x

// Change the memory that z points at
z = &x

Function definitions

Should be an expression. Really just a variable declaration.

A function expression is just fn x: type, y: type -> return_type expr

Examples:

let add: i32 -> i32 -> i32 = fn x: i32, y: i32 -> i32 do
    x + y
end

Very easily sugar’ed to:

let fn add x: i32, y: i32 -> i32 do
   x + y
end

No type inference still. Yeah there’s redundancy, suck it up for now.

No closures, yet.

Grouping

heck, just use do ... end for now to squish multiple expressions together and return the value of the last one. { ... } (Rust style) or parenthesis (OCaml style) might be nicer. Ponder it. This also makes a scope happen.

Function calls

Just usual foo(v1, v2) etc whatever.

No currying, yet. No tuple unswizzling or such, yet.

Indexing

Slices and arrays can be indexed by the syntax expr[expr]. Indexing is bounds-checked and must be a positive integer. Attempting to index an out of bounds value is a runtime error.

heck, indexing in Rust panics, do we really want that to be the default?

Type conversions

Just looks like a function. (generic, with type arguments, so not actually a function yet). force(from_expr, to_type). Only accepts pointer types. Forcing a const to a mut pointer is not allowed. Forcing a mut to a const pointer is just fine. Horribly unsafe by definition, just like Rust’s transmute, but we may choose to enforce more restrictions about what does and does not go into a type’s layout than Rust does. A notable example: Rust is allowed to turn Option<NonZeroU32> into a single word. Thus, you can not assume that Option involves a discriminant in a separate word.

let foo: *const i32 = &i32
let bar: *const u32 = force(foo, *const u32)

Move semantics

We want them by default. We also want to be able to turn them off for certain types, as Rust does with the Copy trait. Perhaps we want a POD trait or marker of some kind, for Plain Ol Data, which can be defined as “anything that doesn’t contain a pointer”. These things are not going to result in aliasing pointers, and so are safe to copy wherever you want them to.