BetterThanYaml

See BetterThanJson for earlier work on this.

Inspired by some conversation on lobste.rs and my own damn brain going after shiny objects instead of doing something useful.

What are the things about yaml we want to preserve? Why do people use it in the first place?

  • easy, non-delimited-ish lists and key-value structures
  • low noise – easy to read in places where the values matter more than the structure – that’s why people use it instead of JSON or XML
  • Good for deeply nested tree-structures, like JSON or XML, unlike TOML

So what are the problems with yaml?

  • Ambiguous and inconsistent
  • Lots of legacy bad decisions
  • Lack of composition – nesting things actually has lots of sharp edges and is easy to screw up
  • Inline lists and structs are actually pretty nice sometimes, and yaml doesn’t do them too well.
  • String delimiters, and lack of them, are always a pain
  • This results in parsing being messy and that makes bugs

I personally think we should just use EDN and have sexprs everywhere, but that’s apparently still marketing suicide, so let’s try something else.

Things we need:

  • Nice, easy nesting of lists and maps
  • Consistent, easy-to-parse and easy-to-validate syntax
  • Make the simple things simple and the complex things possible

Smaller goals:

  • “A YAML file is almost always still ‘valid’ even if it is trunca” – ooh, nice callout

So here’s a first pass.

Primitive types

These are things that can’t contain other types inside of them.

Numbers

Integers are integers. A 0x prefix signifies hex and 0b binary; we’ll do 0o for octal if anyone is ever silly enough to ask for it.

1
0x34F
0b101
-3
-0xFF

Floats are numbers that contain a period. Their allowed precision is implementation defined for now, even though that’s a bad idea.

1.
1.0
1.01

It’s probably also a good idea to allow [number]e[integer] syntax for powers of 10.

1000.0
1e3
1.0e3
1.159e-37
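
A rough sketch of how a parser might sort these literals into ints and floats, in Python. Nothing here is spec: `parse_number` is a made-up name, and it assumes that anything with a period or an exponent is a float and that the 0x/0b/0o prefixes only apply to integers.

```python
import re

# Hypothetical helper, not part of any spec: turn one number token into an int or float.
_PREFIXED = re.compile(r"-?0(x[0-9a-fA-F]+|b[01]+|o[0-7]+)")
_DECIMAL = re.compile(r"-?[0-9]+")
# A float needs a period, an exponent, or both (assumption: "1e3" is a float).
_FLOAT = re.compile(r"-?[0-9]+(\.[0-9]*([eE]-?[0-9]+)?|[eE]-?[0-9]+)")


def parse_number(token: str):
    if _PREFIXED.fullmatch(token):
        # Base 0 makes int() honour the 0x / 0b / 0o prefix and a leading minus.
        return int(token, 0)
    if _DECIMAL.fullmatch(token):
        return int(token, 10)
    if _FLOAT.fullmatch(token):
        return float(token)
    raise ValueError(f"not a number: {token!r}")
```

With this, `parse_number("-0xFF")` gives the int -255 and `parse_number("1.")` gives the float 1.0, while something like `"abc"` raises instead of silently becoming a string.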

Booleans

Booleans are booleans. True or false. No yes or no bullshit. We might want a nil someday but not right now.

true
false

Strings

Strings are strings. They are enclosed by double quotes. \x can be used to escape characters, where x is one of \, n, r or t. You can also do \u{1234} or something to make a unicode code point I suppose; research this more.

Strings can contain newlines I suppose, ’cause I’m not sure there’s a reason they shouldn’t.

We don’t have a separate type for characters, they’re just strings of length 1.

"foo"
"hello\tworld"
"hi there
world"
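
Here is a minimal sketch of decoding one of these string literals in Python. The name `parse_string` is hypothetical. It also accepts `\"` as an escape, which is an assumption on my part since the list above does not say how to put a quote inside a string, and it treats `\u{...}` as hex digits.

```python
# Hypothetical sketch: decode a double-quoted string literal.
# The \" escape is an assumption; the notes only list \\, \n, \r and \t.
_SIMPLE = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\", '"': '"'}


def parse_string(src: str, pos: int = 0):
    """Parse a string literal starting at src[pos]; return (value, next_pos)."""
    if pos >= len(src) or src[pos] != '"':
        raise ValueError("expected opening quote")
    out = []
    i = pos + 1
    while i < len(src):
        ch = src[i]
        if ch == '"':
            return "".join(out), i + 1
        if ch == "\\":
            i += 1
            if i >= len(src):
                break  # input ended in the middle of an escape
            esc = src[i]
            if esc in _SIMPLE:
                out.append(_SIMPLE[esc])
            elif esc == "u" and src[i + 1 : i + 2] == "{":
                end = src.find("}", i + 2)
                if end == -1:
                    raise ValueError("unterminated \\u{...} escape")
                out.append(chr(int(src[i + 2 : end], 16)))
                i = end
            else:
                raise ValueError(f"unknown escape: \\{esc}")
        else:
            out.append(ch)  # raw newlines are kept as-is
        i += 1
    raise ValueError("unterminated string")
```

Because the closing quote is mandatory, a string that got cut off halfway raises an error rather than parsing as a shorter string.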

Possible elaborations

There’s a number of ways to make strings nicer to write, at the cost of making them harder to parse. We aren’t gonna have bare strings no matter what, but maybe having > foo or | foo as a synonym for "foo" would be nice. Less for what it looks like to write, and more for how easy it is to read. Make ’em terminated at the end of the line though, so you don’t have to think about parsing multi-line strings with indentation or some such shit. So:

> foo
> bar

would be two strings, “foo” and “bar”, not one string spanning both lines.

Sigil types

I’m just gonna steal this whole hog from Elixir’s sigils ’cause that works pretty well, actually. So a custom data type can be written with a ~ followed by an identifier ([a-zA-Z0-9_]+) followed by a delimited string. This string is then parsed into some type based on the identifier, if the parser understands what type it is.

So you can write a regex like this:

~r/foo|bar/

Or a date like this:

~Date[2023-01-01]

Right now we don’t try to describe what the valid identifiers are. We can do that later, and implementations are free to add their own. But it’s useful to have some standard representations for common things, so for now we’ll just steal Elixir’s list and have:

  • ~U – UTC datetime
  • ~D – date
  • ~r – regex
  • ~s – string

For now we will follow Elixir’s example and allow the following delimiters:

~r/hello/
~r|hello|
~r"hello"
~r'hello'
~r(hello)
~r[hello]
~r{hello}
~r<hello>

All delimiter pairs are treated the same way. Per Elixir’s docs, “The reason behind supporting different delimiters is to provide a way to write literals without escaped delimiters.” So if you want your regex to include / you can write ~r"thing|/". No characters inside the delimiters are escaped.
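
A sketch of how sigil parsing could work, in Python. `parse_sigil`, `eval_sigil` and the `HANDLERS` table are hypothetical names. It assumes the body simply runs up to the first closing delimiter (no escapes, no nested brackets), and that an identifier the parser does not know is handed back raw instead of being an error.

```python
import re
from datetime import date

# Opening delimiter -> closing delimiter, per the list above.
_CLOSERS = {"/": "/", "|": "|", '"': '"', "'": "'",
            "(": ")", "[": "]", "{": "}", "<": ">"}
_SIGIL = re.compile(r"~([a-zA-Z0-9_]+)")


def parse_sigil(src: str, pos: int = 0):
    """Return (identifier, raw_body, next_pos) for a sigil starting at src[pos]."""
    m = _SIGIL.match(src, pos)
    if not m or m.end() >= len(src) or src[m.end()] not in _CLOSERS:
        raise ValueError("expected ~identifier followed by a delimiter")
    closer = _CLOSERS[src[m.end()]]
    # Nothing inside is escaped, so the first closing delimiter ends the body.
    end = src.find(closer, m.end() + 1)
    if end == -1:
        raise ValueError("unterminated sigil")
    return m.group(1), src[m.end() + 1 : end], end + 1


# Hypothetical registry; a real implementation would choose its own types.
HANDLERS = {"r": re.compile, "D": date.fromisoformat, "s": str}


def eval_sigil(src: str):
    name, body, _ = parse_sigil(src)
    handler = HANDLERS.get(name)
    # Unknown identifiers come back as a (name, body) pair.
    return handler(body) if handler else (name, body)
```

So `~D[2023-01-01]` turns into a `date`, while `~Date[2023-01-01]` comes back as the raw pair `("Date", "2023-01-01")` unless an implementation registers a handler for it.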

Keywords/symbols?

Not yet.

nil/none?

Not yet.

Compound types

Lists

This is where yaml is relatively nice, tbh. At least to read. If things aren’t nested too deeply.

What if instead of yaml we do something like this?

- 1
- 2
- 3
# -> [1, 2, 3]

- 1
-- 2
--- 3
# -> [1, [2, [3]]]

So then we could write nested lists pretty easily and unambiguously without having to fucking eyeball any indentation:

- 1
-- 1.1
--- 1.11
-- 1.2
-- 1.3
- 2
- 3
-- 3.1

# -> [
#   1, [
#     1.1, [
#       1.11
#     ],
#     1.2,
#     1.3,
#   ],
#   2,
#   3, [
#     3.1
#   ]
# ]

Considering how much fiddly annoyance it was to translate the first version into the second version, I think we might be on to something.

This is not indentation sensitive. The number of dashes signifies nesting of lists.

- 1
    - 2
- 3

# -> [1, 2, 3]

-- 3
 - 4
-- 5

# -> [[3], 4, [5]]
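
To check that the dash-count rule is as easy to implement as it looks, here is a small Python sketch. `parse_dash_list` is a made-up name, values are limited to plain decimal numbers, and `#` comments are stripped naively (so no `#` inside values). It keeps a stack of open lists and ignores indentation entirely.

```python
import re

_ITEM = re.compile(r"(-+)\s+(\S.*)")


def _scalar(token: str):
    # Only plain numbers in this sketch: int if possible, else float.
    try:
        return int(token)
    except ValueError:
        return float(token)


def parse_dash_list(text: str):
    root = []
    stack = [root]  # stack[i] is the currently open list at depth i + 1
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        m = _ITEM.fullmatch(line)
        if not m:
            raise ValueError(f"expected dashes, a space, then a value: {raw!r}")
        depth = len(m.group(1))
        del stack[depth:]          # close any lists deeper than this item
        while len(stack) < depth:  # open lists until we reach this depth
            child = []
            stack[-1].append(child)
            stack.append(child)
        stack[-1].append(_scalar(m.group(2)))
    return root
```

Feeding it the three lines `-- 3`, ` - 4`, `-- 5` gives `[[3], 4, [5]]`, matching the example above.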

Not sure this is a good idea but I guess we will see how cursed it gets when life becomes more complex.

Maybe we allow _ or such as well, so if shit gets really deeply nested you can use it as a separator, like commas or underscores in numbers.

-------- 1
-------- 2
-------- 3

# can become

----_---- 1
----_---- 2
----_---- 3

Do we treat it as nothing, or as another - character? Not sure. Other possible candidate characters for this are | or , I suppose.

Empty elements are invalid. So you cannot do:

- 1
- 
- 3

To be unambiguous with negative numbers, you may NEVER omit the space after the last dash in a line.

- 1
- 2
-3   # Always parses as the number negative three


- "a"
- "b"
-"c" # Not allowed
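
The two rules above (no empty elements, mandatory space after the last dash) boil down to a small line classifier. A Python sketch, where `classify_line` is a hypothetical name and the number pattern is only a rough approximation of the number syntax from earlier:

```python
import re

# One or more dashes, at least one space, then a non-empty value.
_ITEM = re.compile(r"(-+) +(\S.*)")
# Rough approximation of the number syntax, for illustration only.
_NUMBER = re.compile(r"-?(0x[0-9a-fA-F]+|0b[01]+|[0-9]+(\.[0-9]*)?([eE]-?[0-9]+)?)")


def classify_line(line: str):
    body = line.strip()
    m = _ITEM.fullmatch(body)
    if m:
        return ("item", len(m.group(1)), m.group(2))
    if _NUMBER.fullmatch(body):
        # No space after the dash, so the dash is a minus sign.
        return ("value", body)
    raise ValueError(f"neither a list item nor a number: {line!r}")
```

Here `-3` comes back as a value, `- -3` as a depth-1 item holding `-3`, and both `-"c"` and a bare `- ` raise.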

This syntax does fail the truncation thing, though. A truncated list of this style is often still valid. Though since we require closing delimiters for strings and don’t allow empty list elements, it’s a lot more fail-safe than YAML is.

Delimited lists

Because sometimes you just want to write

[1, 2, 3]

instead of

- 1
- 2
- 3

Let’s not make it possible to mix the two though. If you have a delimited list, all lists inside it must also be delimited.

Let’s just use the [1, 2, 3] syntax itself. It works. Trailing commas are allowed.
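
A minimal recursive-descent sketch of the delimited form in Python, assuming integer elements only. `parse_delimited_list` is a made-up name. It accepts trailing commas and nested delimited lists, and complains if the closing bracket never shows up.

```python
def _skip_ws(src: str, pos: int) -> int:
    while pos < len(src) and src[pos].isspace():
        pos += 1
    return pos


def parse_delimited_list(src: str, pos: int = 0):
    """Parse '[...]' starting at src[pos]; return (list, next_pos)."""
    if pos >= len(src) or src[pos] != "[":
        raise ValueError("expected '['")
    items = []
    pos = _skip_ws(src, pos + 1)
    while pos < len(src) and src[pos] != "]":
        if src[pos] == "[":
            value, pos = parse_delimited_list(src, pos)  # nested list
        else:
            end = pos
            while end < len(src) and src[end] not in ",]" and not src[end].isspace():
                end += 1
            value = int(src[pos:end], 0)  # integers only in this sketch
            pos = end
        items.append(value)
        pos = _skip_ws(src, pos)
        if pos < len(src) and src[pos] == ",":
            # A comma may be followed directly by ']' (trailing comma).
            pos = _skip_ws(src, pos + 1)
        elif pos < len(src) and src[pos] != "]":
            raise ValueError("expected ',' or ']'")
    if pos >= len(src):
        raise ValueError("unterminated list")
    return items, pos + 1
```

Unlike the dash syntax, this form does pass the truncation test: `[1, 2` is an error, not a shorter list.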

Maps

idfk.

Honestly this might be a case where just staying with the traditional {key: val} syntax is a good idea. Nesting lists and maps in yaml always ends up kinda horrible.

But it’s also pervasive as hell and not having such an option gets realllll noisy. What happens if we just have a line-based syntax that makes struct-nesting explicit with a prefix, like we do with lists?

{ 
  foo: 1, 
  bar: 2, 
  bop: 3 
}

. foo: 1
. bar: 2
. bop: 3


{ 
  foo: 1, 
  bar: {
    inner: "something",
    another: "something else",
  }, 
  bop: 3 
}


. foo: 1
. bar: 
.. inner: "something"
.. another: "something else"
. bop: 3

It’s… kinda cursed but also kinda works. It’s kinda still an indentation-based syntax, but we’re making the indentation visible and resetting it whenever heterogeneous structures are nested. It lets us get rid of the commas between members, which is also nice; we just use newlines instead. Not sure whether . is the right sigil for it or not but that’s easy to change; I tried @ but it was way worse. Maybe need something between the two in terms of visual impact.
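
The same stack trick that works for dash lists seems to work here. A Python sketch of the dot-prefixed map syntax, with several assumptions: `parse_dot_map` is a made-up name, keys are bare identifiers, values are only strings without escapes, booleans or integers, and a key with nothing after the colon opens a nested map one level deeper.

```python
import re

# Dots, a space, a bare key, a colon, then an optional value.
_MEMBER = re.compile(r"(\.+) +([A-Za-z0-9_]+):\s*(.*)")


def _scalar(token: str):
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]  # no escape handling in this sketch
    if token in ("true", "false"):
        return token == "true"
    return int(token, 0)


def parse_dot_map(text: str):
    root = {}
    stack = [root]  # stack[i] is the currently open map at depth i + 1
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = _MEMBER.fullmatch(line)
        if not m:
            raise ValueError(f"expected '. key: value': {raw!r}")
        depth, key, value = len(m.group(1)), m.group(2), m.group(3)
        del stack[depth:]  # close maps deeper than this member
        if len(stack) < depth:
            raise ValueError(f"member nested deeper than its parent: {raw!r}")
        if value:
            stack[-1][key] = _scalar(value)
        else:
            # Empty value: the following deeper lines fill in a nested map.
            child = {}
            stack[-1][key] = child
            stack.append(child)
    return root
```

Run on the five-line example above, this returns the same structure as the braces version: `foo` and `bop` at the top level and `bar` holding the two inner strings.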

So I think that again like lists, we want both a delimited and a line-based syntax.

Comments

# to the end of the line. Multiline comments can get bent for now.

Exercise: translate some Ansible nonsense to this

- hosts: all
  remote_user: ansible
  become: yes
  tasks:
    - name: test connection
      ping:

    - name: SSH started (a bit of a tautology, I know)
      service:
        name: sshd
        state: started
        enabled: true
    - name: Install rsyslog
      apt:
        name: rsyslog
        state: present
    - name: Start rsyslog
      service:
        name: rsyslog
        state: started
        enabled: true
    - name: Install logrotate
      apt: name=logrotate state=present
    - name: Setup cron job to clean apt cache
      copy:
        src: conf/etc/cron.monthly/apt-clean
        dest: /etc/cron.monthly/apt-clean
        owner: root
        group: root
        mode: 0700

If we only have the {} struct syntax, this becomes:

{
  hosts: "all",
  remote_user: "ansible",
  become: true,
  tasks: 
  - {
    name: "test connection",
    ping: "",
  }
  - {
    name: "SSH started (a bit of a tautology, I know)",
    service: {
      name: "sshd",
      state: "started",
      enabled: true
    }
  }
  - {
    name: "Install rsyslog",
    apt: {
      name: "rsyslog",
      state: "present",
    }
  }
  - {
    name: "Start rsyslog",
    service: {
      name: "rsyslog",
      state: "started",
      enabled: true,
    }
  }
  - {
    name: "Install logrotate",
    apt: { name: "logrotate", state: "present", }
  }
  - {
    name: "Setup cron job to clean apt cache",
    copy: {
      src: "conf/etc/cron.monthly/apt-clean",
      dest: "/etc/cron.monthly/apt-clean",
      owner: "root",
      group: "root",
      mode: 0o700,
    }
  }
}

hmm hmm hmm, interesting. Good example of the fact that we want something other than {} delimited maps. The commas also add a lot of noise. Very interesting. Not what I would call very desirable.

Let’s try out the prefixed struct syntax:

. hosts: "all"
. remote_user: "ansible"
. become: true
. tasks: 
  - . name: "test connection"
    . ping: ""

  - . name: "SSH started (a bit of a tautology, I know)"
    . service:
    .. name: "sshd"
    .. state: "started"
    .. enabled: true

  - . name: "Install rsyslog"
    . apt:
    .. name: "rsyslog"
    .. state: "present"

  - . name: "Start rsyslog"
    . service:
    .. name: "rsyslog"
    .. state: "started"
    .. enabled: true
  - . name: "Install logrotate"
    . apt: { name: "logrotate", state: "present", }
  - . name: "Setup cron job to clean apt cache"
    . copy:
    .. src: "conf/etc/cron.monthly/apt-clean"
    .. dest: "/etc/cron.monthly/apt-clean"
    .. owner: "root"
    .. group: "root"
    .. mode: 0o700

I… don’t think that I hate it???