BetterThanC
Thoughts on creating a minimalistic systems language that starts from an ML-y base. Rust is better than C++, can we make something more basic that is better than C?
Largely inspired by Rust; my thought is basically “what if we took a strong, ML-y type system, slapped pointers into it, and tried to see how far we could crank up the safety without a borrow checker”. Other influences: OCaml, F#, everything in https://wiki.alopex.li/_category/language
…I kinda want a borrow checker though. And ♥️move semantics♥️. We’ll think about that as we go.
Things to look at much more deeply: Nim, Idris, ATS, Myrrdin, Zig
Key ideas
Goals:
- RAII
- Move semantics
- Generics, but without monomorphization at least at first
- Immutable by default
- ML-y type system
- Compiles fast(er than Rust)
- You can easily write an OS kernel in it
Later goals:
- Full spread of number types – gonna need it sooner or later but think about how the math works first
- Rust levels of safety – it’d be nice but may not be worth the trouble yet, explore.
- Type inference – It will happen but not yet
- Module system
- Stable ABI, lib metainfo format (such as header files)
- Borrow checking, at least a limited form of it
Maybe goals:
- Zero-sized types?
Non goals:
- Absolute memory safety
- Whatever it is that makes traits in Rust so powerful (and complicated)
Syntax
We’ll just do it ad-hoc right now since we’re just brainstorming. It’s basically going to look like Rust and Lua had a lovechild, I expect.
Comments use //
(line) and /* */
(block)
Block comments can be nested.
Types
Are zero-sized types a thing? Unsized types? …yeah, unsized types definitely are. ZST’s I need to understand more about; there are some huge benefits, but also it adds a lot of complexity.
Rules: A pointer to a ZST is always invalid. A ZST by definition can only ever have one value.
Numbers
Integers of specific sizedness. u8
, u16
,
etc, like Rust, up to 64 bits. Signed and unsigned variants. Like Rust,
not convertible into each other implicitly. There is also
usize
, the size of an array index (also the size of a
pointer?) and isize
, the size of a pointer offset. Rename
to idx
and offset
? uidx
and
ioffset
?
bool
is also a type. Booleans and integers are not the
same.
char
is a UTF-32 character.
Tuples
A collection of zero or more possibly-different types.
()
(aka unit)
(u8, u16, i32)
Access members either through indexing via tuple.0
or
tuple.1
or such, or via match
or
let
.
Should it just be the same as array indexing?
Arrays
Fixed-sized collections of homogeneous values.
[u8;32]
Basically Rust arrays.
Making them generic on length would be nice but don’t bother right now. We don’t even have the easy generics. We might be able to special case it without too much heartbreak though?
Slices
A typed pointer and length. Refers to a segment of an array, generally.
[u8]
Use @u8
or something as a sigil instead? Eh. Maybe.
Strings, string slices
Going to be UTF-8 in the pattern of Rust, but don’t worry about it yet.
Enums
One or more variants, optionally with an enum attached.
Can we distinguish from c-like and ML-like enums somehow? The former are a very useful subset of the latter. C-like enums are basically a shortcut to define a mathematical set, where each member of the set is an integer. ML-like enums are a shortcut to define a mathematical set, where each member of the set is a tag+tuple.
The latter are also mostly useless without generics. The easy way is to make them entirely separate things, but ugh. The Better way to do it is to have certain Trait-like properties that can be implemented for, or are implemented automatically for, C-like enums. Or at least that’s what I would expect, and am confused that Rust doesn’t do.
enum TaggedNumber =
| Nothing
| Signed (i32)
| Unsigned (u32)
end
Aha, one way to do it is to have two types: Sets, which are C-like enums, and Enums, which are Rust-like enums. Then you can make it so that for each Enum you can pull out the discriminant as its own Set type, define equivalences between Set’s and Enum’s, and so on.
Structs
Basically just tuples with syntactic sugar? Don’t worry about them now tbh.
Pointers
May or may not point to valid memory. No null pointer.
*const i32
*mut i32
Use ^
as the sigil instead? Eh, don’t bother
Functions
A function pointer type. No closures yet.
argtype1 -> argtype2 -> returntype
Generics
This is going to be kind of a big deal. I consider it necessary, but it’s also quite not-easy.
There’s two ways we can approach this. We can do the C++/Rust way, where generics are specialized at compile-time and a new instance of a generic function is built for each specialization. Or we can do it the OCaml way, where generics always live behind a pointer. The latter method is more limited but far easier to reason about (and compile).
Let’s start with “generics must live behind a pointer” and see where that takes us. …It basically takes us to a “sized vs. unsized type” concept, which is p. useful tbh.
Don’t bother yet
Floats, characters, strings
Actually characters are easy, it’s just a UTF-8 char in a u32.
Mutability
Everything is immutable by default.
Variables may be declared as mutable. It’s a little redundant since you can just re-bind variables, but it’s some sugar.
Const vs. mut pointers on the other hand are part of the actual type.
Keep in mind that in Rust, you can turn owned values from const to mut, but turning references or pointers from const to mut is undefined. This lets the compiler inline const’s more aggressively I expect. Investigate more.
Functions that are explicitly pure would also be nice. Rust does this
with const fn
, which only implement a limited subset of the
language that (IIRC) is provably pure and actually terminates. Haskell,
OTOH, basically says “everything is pure and any side-effects have to
get wrapped in a monad”, which is nice to reason about but harder to
program in.
Expressions
Values
Integers, enums, tuple literals, array literals, etc.
Variable declaration
let x: i32 = 10
we don’t need no semicolons, hopefully
no type inference yet
No uninitialized variables. Later, maybe they can be uninitialized until assigned to when the compiler proves it valid.
_
is an ignored variable name.
Tuple destructuring:
let x, y = (10, 20)
Enum destructuring:
let Foo(x) = y
It is a compile-time error to not handle every possible variant.
if let
comes later.
Is this an expression? For now we’ll say “yes, it returns
()
”
Assignment?
Figure that out if we want mutability, and when.
Is this an expression? For now we’ll say “yes, it returns
()
”, which is the simplest way to close the “statement” vs
“expression” divide.
Match
match foo with
| X -> expr1
| Y -> expr2
| _ -> expr3
end
_
means “anything else”. It is a compile-time error to
not handle every possible variant.
I kinda dislike ->
and =>
being
separate sigils. See if we can just use ->
everywhere
without it being ambiguous.
Operators
Don’t bother yet. Just assume the presence of
+ - * / % and or not xor
.
Pointer stuff
Creating a pointer pointing to the result of an expression uses
&
. Assume the compiler is smart enough to handle that –
put the result in a temporary place, return a pointer to it.
Dereferencing a pointer uses *
. Dereferencing an invalid
pointer is allowed to do anything, but I’m not sure that the compiler
should be allowed to assume it never happens.
let x: i32 = 10
let y: *mut i32 = &20
let mut z: *const i32 = &30
// Change the contents of the memory y points at
*y = x
// Change the memory that z points at
z = &x
Function definitions
Should be an expression. Really just a variable declaration.
A function expression is just
fn x: type, y: type -> return_type expr
Examples:
let add: i32 -> i32 -> i32 = fn x: i32, y: i32 -> i32 do
x + y
end
Very easily sugar’ed to:
let fn add x: i32, y: i32 -> i32 do
x + y
end
No type inference still. Yeah there’s redundancy, suck it up for now.
No closures, yet.
Grouping
heck, just use do ... end
for now to squish multiple
expressions together and return the value of the last one.
{ ... }
(Rust style) or parenthesis (OCaml style) might be
nicer. Ponder it. This also makes a scope happen.
Function calls
Just usual foo(v1, v2)
etc whatever.
No currying, yet. No tuple unswizzling or such, yet.
Indexing
Slices and arrays can be indexed by the syntax
expr[expr]
. Indexing is bounds-checked and must be a
positive integer. Attempting to index an out of bounds value is a
runtime error.
heck, indexing in Rust panics, do we really want that to be the default?
Type conversions
Just looks like a function. (generic, with type arguments, so not
actually a function yet). force(from_expr, to_type)
. Only
accepts pointer types. Forcing a const to a mut pointer is not allowed.
Forcing a mut to a const pointer is just fine. Horribly unsafe by
definition, just like Rust’s transmute, but we may choose to
enforce more restrictions about what does and does not go into a type’s
layout than Rust does. A notable example: Rust is allowed to turn
Option<NonZeroU32>
into a single word. Thus, you can
not assume that Option
involves a discriminant in
a separate word.
let foo: *const i32 = &i32
let bar: *const u32 = force(foo, *const u32)
Move semantics
We want them by default. We also want to be able to turn them off for
certain types, as Rust does with the Copy
trait. Perhaps we
want a POD
trait or marker of some kind, for Plain Ol Data,
which can be defined as “anything that doesn’t contain a pointer”. These
things are not going to result in aliasing pointers, and so are safe to
copy wherever you want them to.