YAML: It's Time to Move On (nestedtext.org)
265 points by firearm-halter on Nov 14, 2021 | 392 comments



I don't like YAML and would like to move on, but I hope we don't move onto this.

I think it's crazy that when I add a string to an inline list, I may need to convert that inline list to a list because this string needs different handling. I think it's crazy that "convert an inline list to a list" is a coherent statement, but that is the nomenclature that they chose.

I don't like that a truncated document is a complete and valid document.

But what is most unappealing is their whitespace handling. I couldn't even figure out how to encode a string with CR line endings. So, I downloaded their python client to see how it did it. Turns out, they couldn't figure it out either:

    >>> nt.loads(nt.dumps("\r"), top="str")
    '\n'


I wish people would stop trying to write programs for which there are no interpreters, compilers, or linters:

    name: Install dependencies
    run:
        > python -m pip install --upgrade pip
        > pip install pytest
        > if [ -f 'requirements.txt' ]; then pip install -r requirements.txt; fi
That is a program that is hiding in the bowels of a "nestedtext" document ... It is no better than a program that is hiding in the bowels of a JSON or YAML document.

We all have to deal with this, but it is beyond stupid.

    [Install Dependencies]
    run=/path/to/install-script
Then, write `install-script` in whatever language you want ... verify it works. It should have tests. etc etc etc.


I don't think it matters much whether this is inline or in a separate file. If you want to test your tests, "yq -r .run input.yaml | sh -e" works just as well.

In fact, if I really wanted to test my tests, I'd say that directly testing the corresponding clause is the more comprehensive approach. For example, what if someone accidentally changes the line to read:

    run=/path/to/install-scriptq
? Then your test of "install-script" will not catch anything. But if your test runs "yq -r .run | sh -e", then it will catch that error. And you can still forward to a script if you wanted to.

So let's keep inline scripts; they are a very reasonable approach for just a few commands.


Depending on the source control tool, you may lose syntax highlighting; you most likely lose linters; and even copying those multi-line commands to a shell becomes cumbersome. I consider the inlining example from the GP's comment awful.


Very minor point: new yq versions use 'e' instead of '-r'.


It would be nice if YAML wasn't horrendously abused the way it is. You have CI pipelines that let you construct DAGs to represent your builds, but you need several thousand lines of YAML and a load of custom parsing to get programming constructs in the string types, for example. And then each provider has its own way of providing those.

I don't have to re-read manuals describing how to do if/else in Ruby or Java or Lisp, but as soon as yaml and some 'devops' tooling is involved, I have to constantly jump back and forth between the reference and my config.

The main point being that the problem isn't the file format but the products that continue to push it, presumably because hacking stuff on top of `YAML.parse` is less effort than designing something that fits the purpose.


Yeah. A lot of times I find myself thinking YAML is like a really awful programming language. You can sort of do conditional logic and loops, but usually I find it hard to follow what's going on.

For build systems, I always liked the idea of Gradle where the core functionality was simple and declarative, but with the option to use a real programming language for things that weren't simple. For example, integrating installers or form builders (pre-processing) into a build are things I would consider non-trivial if there aren't official plugins, but it was still relatively easy to do with Gradle.

The biggest problem I always had with Gradle was that I didn't like Groovy, and I always thought there was a missed opportunity to have a statically typed build system with a solid API/contract and all the fancy tooling like auto-complete that you get with statically typed languages.

I see JSON5 mentioned a lot in the comments. In terms of CI / build systems, I feel like something built with JSON5/TypeScript could be really good. I'd be really happy using TypeScript for configuring things like build systems where there shouldn't really be an argument for needing it to be usable by non-programmers.

Personally I feel like I've spent way too much of my life debugging YAML syntax issues.


If you're happy to go lispy, there's Babashka [1], a Clojure without the JVM. It has built-in support for 'tasks' designed to make writing build scripts easy.

[1] https://babashka.org/


Does the Kotlin support in Gradle solve your problem? https://docs.gradle.org/current/userguide/kotlin_dsl.html


My experience with Kotlin gradle scripts is worse than Groovy. For example, given the following valid groovy/kotlin gradle program:

    dependencies {
    }
What would you expect to see between the curly braces? IntelliJ IDEA, which supposedly has full support for the Gradle DSL both for Groovy and Kotlin, offers only generic suggestions. Common function calls such as "implementation()" or "testImplementation()" are not suggested. If you do use those functions, no suggestion is made for their parameters. Because Gradle's DSL is built on top of a general purpose language, it loses the benefits of a DSL (constraining the set of possible configurations and guiding the user towards valid configurations).


The key benefit of the Kotlin DSL is that in this precise example, IDEA does suggest valid stuff: https://imgur.com/a/vFYNIU1

Kotlin DSL is miles ahead of Groovy in terms of discoverability and IDEA integration. With Groovy DSL, most of the build script is highlighted with various degrees of errors and warnings; with Kotlin DSL, if something is highlighted, it is a legitimate error, and vice versa - if no errors are detected by IDEA, then it is almost certain to work.

There were rough spots in IDEA integration a couple of years ago, but now it is close to perfect, within Gradle's limits of course (due to the sheer dynamic nature of it, some things are just not possible to express in a static fashion, unfortunately). The biggest obstacle to Kotlin DSL use might be that some of the plugins use various Groovy-specific features which are hard to use from Kotlin, but thankfully most of the plugins either fix those, or are rewritten in Java or Kotlin instead.


This was one thing I found difficult when learning Gradle: its seemingly complete lack of auto-discoverability.

I expected it to catch on, but I think a lot of people are sticking with Maven.


There's a huge gap in Java build tool space for a tool that is simple and easy to learn and can cover 90% of projects' requirements. I have this feeling that we're in the "subversion" days of java build tools and the day someone introduces "git" people will wonder why we suffered with Gradle and Maven for so long. If I had time I would be looking into building this.


Predating Gradle was a tool called Gant. It was simple, intuitive, and did 90% of what every project could want. Ironically, it was Groovy based as well. But instead of Gradle's arcane, magic-based configuration it was literal and direct, a simple extension of the Ant that came before it. I liked it much better, but someone decided they could make a business out of Gradle, and Gant got deprecated, and here we are.


I found it fairly simple to build Gradle plugins with Kotlin. If anything, the problem was just having the patience to actually find the right documentation in the first place, and understand what was being described. The main problem I faced there was that I wanted a plugin to configure dependencies for the project it would run against and the docs around dealing with dependencies and detached configurations were a bit confusing.

I do find it curious that a lot of these tools get seen as basic task runners despite offering much more potential.


It's always the same trajectory with declarative programming. It starts with "it's just configuration, we need something simple". Then users come with use cases which are more complex. Then you end up with a programming language on top of a configuration-language syntax.

* https://ilya-sher.org/2018/06/30/terraform-becomes-a-program...

* https://ilya-sher.org/2018/09/15/aws-cloudformation-became-a...


Very much so. A good few years ago I got annoyed that I couldn't change mutt's configuration the way that I wanted, because it has a built-in configuration language which doesn't allow complicated conditionals etc.

(There are workarounds, and off-hand I can't think of a great example, but bear with me.)

In the end I wrote a simple console-based mail-client, which used a Lua configuration file. That would generally default to using hashes, and key=value settings, but over the time I used it things got really quite configurable via user-defined callbacks, and functions to return various settings.

For example I wrote a hook called `on_reply_to`, and if you defined that function in your configuration file it would be invoked when you triggered the Reply function. This kind of flexibility was very self-consistent, and easy to add using an embedded real language.

Later I added some hacks to a local fork of GNU Screen, there I just said:

* If the ~/.screenrc file is executable, then execute it, and parse the output.

That let me say "If hostname == foo; do this ; otherwise do this .." and get conditionals and some other things easily. Another example was unbinding all keys, and then only allowing some actions to be bound. (I later submitted "unbindall" upstream, to remove the need for that.)


Some even start with a programming language and pretend it's declarative...

IMO, it's only declarative when it's a data model which is easily parsed by multiple languages/systems where it's needed.


What's really sad is that XML had a much better ecosystem around this for ages. I'd very much rather deal with XQuery or even XSLT to construct build trees, than the current crop of ad-hoc YAML preprocessors. At least the XML stuff had a consistent type system underneath!


XSLT is an absolute horror and not something I would want to deal with again. It feels like some weird academic experiment in an XML declarative programming language that should never have made it to print.

If something needs the flexibility of a programming language, why not use a real one that's been well tested for writing other programs? These various config file programming systems always end up creating something notorious that everyone tries to avoid having to work on.


XQuery is, in many ways, XSLT with better syntax. It doesn't have the pattern-matching transforms that are the T in XSLT - but for configs, I don't think it makes a big difference.

Also, I don't think many realize that the stack has evolved since early 00s. XSLT 1.0 was a very limiting language, requiring extensions for many advanced scenarios. But there's XSLT v3.0 these days, and XPath & XQuery v3.1, with some major new features - e.g. maps and lambdas. Granted, this doesn't fix the most basic complaint about XSLT - its insanely verbose syntax - but even then, I'd still take XSLT over ad-hoc YAML-based loops and conditionals.


I will take the verbosity of XML any day over YAML wrestling (complex YAML configs, of course). There are simply too many "implicit rules" in YAML. It's why I prefer Python over Ruby and Perl. Generally, though, TOML has been good enough for me to do lots of fairly large config files that are easy for humans and machines to parse.


XML died because too many configurations turned what should be a 'prop' into an inner tag -- and it doesn't help that XML doesn't really give guidance as to when to use which. And, of course, when you deserialize XML, the innerText always ends up in a very strange place, so it's never really clear what the right way to handle it is.


Honestly, I think using an embedded scripting language, like lua or even javascript, would be a much better fit for these use cases than trying to make yaml do something it wasn't designed for.


Ironically, having used cdk8s[1] for dealing with kubernetes infrastructure, that's the one thing where I've actually preferred yaml. That said, k8s resource definitions are pure config so there's no need to try and hack extra bits on top of a serialized data structure.

[1]https://cdk8s.io/


I really like the approach of Buildkite CI: they use YAML, but this YAML can be produced by an executable script.

So you write YAML by hand for trivial cases, but once it gets complex, you can just drop back to shell/python/ruby/node/whatever, implement any complex logic, and serialize the results to plain YAML.
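To make that concrete, here is a minimal sketch of the "generate the YAML from a real language" idea in Python. It assumes PyYAML is installed, and the step layout shown is illustrative rather than Buildkite's exact schema:

    import yaml

    # Build the pipeline as ordinary data structures, with real loops and
    # conditionals, then serialize to YAML for the CI system to consume.
    steps = []
    for version in ["3.9", "3.10", "3.11"]:
        steps.append({
            "label": f"test {version}",
            "command": f"tox -e py{version.replace('.', '')}",
        })

    print(yaml.safe_dump({"steps": steps}, sort_keys=False))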


This is the crux of the issue, right? YAML is fine for most projects, but THE project for using YAML is CI configuration, which it is a poor match for.


I'm pretty sure the format is the issue.

I still don't know how to do arrays in yaml.

Is it a new line, a tab to the same indent, or the same indent plus one space? Do I need a dash? Does a dash make it an array or an object?

It's just simply not that big of a deal to add a few quotes and braces to make everything make sense.

The only real issue with JSON is the lack of comments and the strictness about extra commas.
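For what it's worth, here is a quick sketch of how one common implementation (PyYAML, an assumption) reads the different forms: a leading dash makes a sequence entry, while "key: value" makes a mapping.

    import yaml

    # A dash introduces a list item; "key: value" pairs introduce a mapping.
    print(yaml.safe_load("items:\n  - a\n  - b"))    # {'items': ['a', 'b']}
    print(yaml.safe_load("items:\n  a: 1\n  b: 2"))  # {'items': {'a': 1, 'b': 2}}
    print(yaml.safe_load("- a\n- b"))                # ['a', 'b']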


> I don't like that a truncated document is a complete and valid document.

Me neither. If your documents have this property, you're likely to tempt people to start trying to process partial documents.

When they do that, they violate Full Recognition Before Processing and likely there's a latent security bug as a result.


> So, I downloaded their python client to see how it did it.

Who are you suggesting the python client belongs to, who is 'they' in 'their'?


... the people that made it and wrote the submission we are discussing? how is this even a question?


Author seems to use misfeatures of a particular implementation to tar all implementations with. The round-tripping issue is not a statement about YAML as a markup language, much in the way a rendering bug in Firefox is not a statement about the web.

Stepping back a bit, YAML is good enough, and this problem has been incrementally bikeshedded since at least the 1970s, it is time to move on. Human-convenient interfaces (like YAML, bash, perl) are fundamentally messy because we are messy. They're prone to opinion and style, as if replacing some part or other will make the high level problem (that's us) go away. Fretting over perfection in UI is an utterly pointless waste of time.

I don't know what NestedText is and find it very difficult to care; there are far more important problems in life to be concerned with than yet another incremental retake on serialization. I find it hard to consider contributions like this to be helpful or to represent progress in any way.


I actually disagree it's bike shedding.

If you can write a bad YAML document because of those mis-features/edge cases, I'd say you've already lost.

Humans are messy, but at the end of the day the data has to go to a program, so a concise and super simple interface has a lot of power to it for humans.

Working at a typical software company with average skill level engineers (including myself), no one likes writing YAML. But everyone is fine with JSON.

I think it's a case of conceptual purity vs what an average engineer would actually want to use. And JSON wins that. If YAML was really better than JSON, we'd all be using that right now.

So does it really matter if YAML is superior if >80% of engineers pick JSON instead?


I would argue that you can write something poor and/or confusing in any markup language that is sufficiently powerful.

Conversely, if a markup language is strict enough to prevent every inconsistency, then it's not powerful enough or too cumbersome to use to be generally useful.


I'd say that YAML is anything but conceptually pure, with all the arbitrariness, the multitude of formatting options, and the parsing magic happening without warning.

If you want conceptual purity (and far fewer footguns), take Dhall.


> Stepping back a bit, YAML is good enough, and this problem has been incrementally bikeshedded since at least the 1970s, it is time to move on

Nah, in the 1970s we had Lisp S-expressions that completely solved the problem, and everything since then has been regressions on S-expressions due to parenthesis phobia.

After hearing that thing about the country code for Norway, I became convinced that YAML has to just die. Become an ex-markup language. Pine for the fjords. Be a syntax that wouldn't VOOM if you put 4 million volts through it. Join the choir invisible, etc.

This is good: https://noyaml.com/

Erik Naggum had a notoriously NSFW rant about XML (over the top even for him) that I better not link to here, but lots of it applies to YAML as well.
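For anyone who hasn't run into it, the Norway thing is easy to reproduce; a quick sketch with PyYAML (which follows YAML 1.1 implicit typing):

    import yaml

    # Unquoted "no" is resolved as a boolean under YAML 1.1 rules.
    print(yaml.safe_load("country: no"))    # {'country': False}
    print(yaml.safe_load("country: 'no'"))  # {'country': 'no'}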


S-expressions don't solve the problem at all, you just get to fractally bikeshed all over again about what semantics they have and what transformations are or aren't equivalent. Does whitespace roundtrip through S-expressions? Who knows. Are numbers in S-expressions rounded to double precision on read/write? Umm, maybe. How do I escape a ) in one of my values? Hoo boy, pick any escape character you like and there's an implementation that does it.


EDN solves all these problems: https://github.com/edn-format/edn


I have to second that. Including the variant "canonical s-expressions" which is in fact a binary format.


Why of course, embedding a full-blown Lisp development environment for parsing a config file is totally sane and normal.

(Sarcasm, just in case.)


S-expressions don’t completely solve the problem: they don’t have a syntax for maps, and in practice there are at least two common incompatible conventions: alist or plist?


Obviously the application has to interpret the Lisp object resulting from reading the S-expression, just like it has to interpret any JSON, YAML, or anything else that it reads. So for maps you can, as you mention, use alists or plists. Regarding other stuff mentioned: none of the encodings are supposed to be bijective (the writer emits the exact input that the reader ingested). Otherwise, for example, they couldn't have comments, unless those ended up in the data somehow. There is ASN.1 DER if you want that, but ASN.1 is generally disastrous.

Stuff like escape chars were well specified in Lisps of the 1970s (at least the late 1970s), including in Scheme (1975). Floating point conversion is a different matter (it was even messier in the pre-IEEE 754 era than now) but I think the alternatives don't handle it well either. You probably have to use hexadecimal representation for binary floats. Maybe decimal floats will become more widely supported on future hardware.

A type-checked approach can be seen in XMonad, whose config files use Haskell's Read typeclass for the equivalent of typed S-expressions.


EDN has maps, sets, vectors and lists and is extendable.


Solutions for this problem that I've used in my own S-expression config files:

1. Use only alists for maps because they prevent off-by-one errors.

2. Allow plists because they're less verbose than alists and use reader macros to distinguish them, and allow the reader macro definitions to be in the same file.

Most of the time I use option 1 because it's simpler.


I would argue that, in a data markup language, there shouldn't be a syntax for maps. Whether a given sequence should be treated as key-value pairs, and whether keys in that sequence are ordered or unordered, is something that is best defined by the schema, just like all other value types.


It'd be bikeshedding if the status quo was good. But it isn't.


> Author seems to use misfeatures of a particular implementation to tar all implementations with.

There's no canonical YAML implementation, and the YAML spec is enormous (doubly so if you need to work with stuff like non-quoted strings, etc.).


> There's no canonical YAML implementation

The formal grammar counts as canonical and several implementations are derived from it: https://github.com/yaml/yaml-reference-parser


If you use YAML in situations where it may need hand editing, it means you actively hate your users.

YAML is patently unsuitable for any use case where the resulting output may require hand editing.


> YAML as a markup language

YAML ain't markup language.


>Human-convenient interfaces (like YAML, bash, perl) are fundamentally messy because we are messy

I don't know what to make of this statement, it has so much handwaving built-in. The most charitable interpretation I can find is that by 'Human-convenient' you simply meant the quick-and-dirty ideology expressed in Worse Is Better: does the job, makes users contemplate suicide only once per month, isn't too boat-rocking for current infrastructure and tooling.

Taken at face value (without special charitable parsing), this statement is trivially false. Python is often used as a paragon of 'human-convenience'; I sometimes find this trope tiring, but whatever Python's merits and vices, it's _definitely_ NOT messy in design.

Perl is the C++ of scripting languages, it's a very [badly|un] designed language widely mocked by both language designers and users. Lua and tcl instead are languages literally created for the sole exact purpose of (non-) programmers expressing configuration inside of a fixed kernel of code created by other programmers, and look at their design : the whole of tcl's syntax and semantics is a single human-readable sentence, while lua thought it would be funny if 70% of the language involved dictionaries for some reason. These are extremely elegant and minimal designs, and they are brutally efficient and successful at their niches : tcl is EDA's and Network Administration's darling, and lua is used by game artists utterly uninterested in programming to express level design.

'Humans are messy' isn't a satisfactory way to put it. 'Humans love simple rules that get the job done' is more like it. But because the world is very complex and exception-laden, though, simple rules don't hug its contours well. There are two responses to this:

- you can declare it a free-for-all and just have people make up simple rules on the fly as situations come up, that's the Worse Is Better approach. It doesn't work for long because very soon the sheer mountain of simple rules interact and create lovecraftian horrors more complex than anything the world would have thrown at you. Remember that the world itself is animated by extremely simple rules (Maxwell's equations, Evolution by Natural Selection, etc...), it's the multitude and interaction of those simple rules that give it its gargantuan complexity and variety.

- you stop and think about The One Simple Rule To Rule All Rules, a kernel of order that can be extended and added to gradually, consistently and beautifully.

The first approach can be called the 'raster ideology': it's a way of approximating reality by dividing it into a huge number of small, simple 'pixels' and describing each one separately by simple rules. I'm not sure it's 'easy' or 'convenient', maybe seductive. It promises you can always come up with more rules to describe new patterns and situations, and never ever throw away the old rules. This doesn't work if your problem is the sheer multitude and inconsistency of rules. The second approach is the 'vector ideology': it promises you that there is a small basis of simple rules that will describe your pattern in its entirety, and can always be tweaked or added to (consistently!) when new patterns arise; the only catch is that you have to think hard about it first.


>and lua is used by game artists utterly uninterested in programming to express level design

Rather short-sighted and dismissive of a successful programming language that has evolved over 20+ years. Lua is a great general-purpose programming language that specializes not in "game making for non-programmers" but in ease of embedding, extension/extensibility, and data description (like a config language). There's a whole section in Programming in Lua[1] to that effect. The fact that it's frequently used in games is a credit to its speed, size, and great C API for embedding, not because of any particular catering to game designers.

[1]: https://www.lua.org/pil/10.1.html


You misunderstood me. I love Lua and I wasn't being dismissive of it; I was using the first example that came to my mind to counter the claim that a convenient language has to be messy. Just because that was the example used doesn't mean there is an implicit "and that's the only thing it's good for" clause I'm implying there: if someone said "Python is used by scientists utterly uninterested in programming to express numerical algorithms", would you understand that to be a dismissive remark against Python?

Being used by non-programmers utterly uninterested in programming to solve problems is the highest honor any programming language can ever attain, because it means that the language is well-suited to the domain enough (or flexible enough to be made so) that describing problems in it is no different than writing thoughts or design documents in natural language. This is the single most flattering thing you can ever say about a language, not a dismissive remark.


It's really sad to see the pervasiveness of JSON. For one thing its usage as a config file is disturbing. Config files need to have comments. Second, even as a data transfer format the lack of schema is even more disturbing. I really wish JSON didn't happen and now these malpractices are so widespread that it's hurting everyone.


JSONC. JSON with comments. And even if your favorite parser does not support it natively it’s not so hard to add with a very simple pre-lexer step.

JSON schemas exist and they’re ok for relatively simple things. For more complex cases I find myself wishing I could just turn Typescript into some kind of schema validation for JSON.
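A minimal sketch of such a pre-lexer step in Python, using only the standard library. It strips // and /* */ comments outside string literals and leaves everything else (trailing commas included, which it deliberately does not handle) to the real parser:

    import json

    def strip_comments(text):
        """Remove // and /* */ comments that appear outside string literals."""
        out, i, n, in_string = [], 0, len(text), False
        while i < n:
            ch = text[i]
            if in_string:
                out.append(ch)
                if ch == "\\" and i + 1 < n:   # keep escaped characters verbatim
                    out.append(text[i + 1])
                    i += 1
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
                out.append(ch)
            elif text.startswith("//", i):      # line comment: skip to newline
                i = text.find("\n", i)
                if i == -1:
                    break
                continue
            elif text.startswith("/*", i):      # block comment: skip past */
                end = text.find("*/", i + 2)
                i = n if end == -1 else end + 2
                continue
            else:
                out.append(ch)
            i += 1
        return "".join(out)

    doc = '{\n  "name": "demo", // inline comment\n  /* block */ "version": 1\n}'
    print(json.loads(strip_comments(doc)))  # {'name': 'demo', 'version': 1}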


> For more complex cases I find myself wishing I could just turn Typescript into some kind of schema validation for JSON.

Not sure if this is what you're looking for, and whether it's powerful and expressive enough for your use case, but you can use typescript-json-schema¹ for this, and validate with eg ajv.

¹https://github.com/YousefED/typescript-json-schema


I like JSON5 for similar reasons. I specifically like the addition of comments, trailing commas, and keys without quotes.


I've struggled with this in Java recently and at first I used Jankson which supports the complete JSON5 spec, but later we figured out we could configure the standard Jackson JSON package to accept the things we actually need and actually use.


Also needed is string concatenation. One line strings are very limiting.


There's libraries that let you define a schema programmatically, and then infer the types.

https://github.com/sinclairzx81/typebox


Seems to me that YAML just needs type/schema support to be less of a hurdle.

As an alternative, the encoding/decoding round trip using protobuf seems reasonable to me: it catches the footgun of floating-point version numbers (they become a parse error), makes whitespace/multiline concatenation more obvious, and allows comments (compared to JSON):

  ( cat << EOF
  # yes, comments are allowed
  name: "Python package"
  on: "push"
  build {
    python_version: ["3.6", "3.7", "3.8", "3.9", "3.10"]
    steps: [
      {
        name: "Install dependencies"
          run:
            "python -m pip install --upgrade pip\n"
            "pip install pytest\n"
            "if [ -f 'requirements.txt' ]; then pip install -r requirements.txt; fi\n"
      },
      {
        name: "Test with pytest"
        run: "pytest\n"
      }
    ]
  }
  EOF
  ) | protoc --encode=Config config.proto  | protoc --decode=Config config.proto
  
  name: "Python package"
  on: "push"
  build {
    python_version: "3.6"
    python_version: "3.7"
    python_version: "3.8"
    python_version: "3.9"
    python_version: "3.10"
    steps {
      name: "Install dependencies"
      run: "python -m pip install --upgrade pip\npip install pytest\nif [ -f \'requirements.txt\' ]; then pip   install -r requirements.txt; fi\n"
    }
    steps {
      name: "Test with pytest"
      run: "pytest\n"
    }
  }


> Seems to me that YAML just needs type/schema support to be less of a hurdle.

JSON schemas exist and can be applied to yaml and this is supported by many editors. For example this vscode extension: https://marketplace.visualstudio.com/items?itemName=redhat.v...

It's strange to see so many complaints about "missing tooling" that actually exists and is well supported.
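They can be applied at runtime too, not just in the editor. A small sketch, assuming the third-party PyYAML and jsonschema packages and an illustrative schema:

    import yaml
    from jsonschema import validate, ValidationError

    schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "replicas": {"type": "integer", "minimum": 1},
        },
        "required": ["name", "replicas"],
    }

    # The YAML loads fine, but the schema catches the type error.
    doc = yaml.safe_load("name: web\nreplicas: three\n")
    try:
        validate(instance=doc, schema=schema)
    except ValidationError as err:
        print("config error:", err.message)  # 'three' is not of type 'integer'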


> Seems to me that YAML just needs type/schema support to be less of a hurdle.

Unfortunately, YAML already got type support, which made round-tripping easier but also made it insecure. Creating a type calls constructors with possibly insecure side effects, which was used, e.g., to hack Movable Type.
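A quick illustration with PyYAML (an assumption; other implementations differ): loaders that honour such tags will call the named constructor, here os.system, which is exactly why safe_load exists and refuses them.

    import yaml

    payload = "!!python/object/apply:os.system ['echo pwned']"

    # safe_load only builds plain data types and rejects constructor tags.
    try:
        yaml.safe_load(payload)
    except yaml.constructor.ConstructorError as err:
        print("rejected:", err.problem)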


JSON Schema is an official thing that exists and has implementations in all major languages. Personally I’m very glad that it’s an opt-in addition rather than a requirement.

(I agree with you about comments though)


For comments just use JSONC.


I agree, but I would recommend JSON5 as the solution. Not YAML or this abomination.

JSON5 has many advantages:

* Superset of JSON without being wildly different. I know YAML is a superset of JSON but it's completely different too. Insane.

* Unambiguous grammar. YAML has way too many big structure decisions that are made by unclear and minor formatting differences. My work's YAML data is full of single-element lists that shouldn't be lists for example.

* Comments, trailing commas

* It's a subset of Javascript so basically nothing new to learn.

* It has an unambiguous extension (.json5). I think JSONC would be a reasonable option but everyone uses the same extension as JSON (.json) so you can never be sure which you are using. E.g. `tsconfig.json` is JSONC but `package.json` is just JSON (to everyone's annoyance).

* Doesn't add too much of Javascript. I wouldn't recommend JSON6 because it's just making the format too complicated for little benefit.


I would rather recommend JSONC:

- it has good editor support (VSCode)

- it supports comments

- it supports JSON Schema

The only thing missing is trailing commas, but I would rather live without trailing commas than without tooling support.


JSONC supports trailing commas.

> - it has good editor support (VsCode)

Unfortunately it doesn't really because of the extension issue I mentioned. Certain file names (like `tsconfig.json`) are whitelisted to have JSONC support, but any random file `foo.json` will be treated as JSON and give you annoying lints if you put comments and trailing commas in.

That's a fairly recent change I think.


Tools that use JSON as configuration format could simply allow certain unused keys (e.g. all keys starting with #) and promise never to use them. Then author can write their comments with something like:

    {
      "name": "my-tool",
      "#comment-1": "Don’t change the version!",
      "version": "42.1337.0"
    }


There's a lot of JSON tooling, and it's liable to interact badly with this. For example, a formatter might re-order the fields of a dict, moving "#comment-1" away from "version". Or the software that this JSON is for might error upon receiving unexpected keys (which is actually useful behavior, as that would catch a typo in an optional field).

Also, this doesn't let you put comments at the top of the file, or before a list item, or at the end of a line.

If you're going to change your JSON tooling to handle comments of some kind, you might as well go all the way to JSONC.


I've heard and read this multiple times. Why are you trying so hard to fit into a format that doesn't just support comments out of the box? What advantage is JSON offering you that you feel compelled to bend over backwards to do this? It's exactly these kinds of workarounds that make it super difficult to stop such malpractices. It's just plain ugly. Please stop doing this.


In many cases, you're using a library or service that you don't maintain, so you don't have much of a choice.


You can't comment out a large section of config easily. For me, this is a relatively common use case for config files, so I take the position that JSON should be used for serialization only.


And I am just writing a JSON de/serializer to move my config from the current system to JSON. I worked on it today and yesterday and several days some time ago.

This situation makes me feel rather silly


So you prefer the "good old" XML days? I'll take comment-less JSON over XML any day

(and it doesn't have to be comment-less... JSON with comments is a thing and VSCode has syntax highlighting for it - just strip out the comments before parsing).


> So you prefer the "good old" XML days? I'll take comment-less JSON over XML any day

Aren't we past basic false dichotomies?


Nope: basic false dichotomies and JSON are both pervasive.


There is a corporate- and government-approved standard for false dichotomies, but it works as a de-facto standard, not published.


   ifnot:
    - foo
   then_clearly:
    -
       'some bar'


XML is perfect. + With all the fancy editors now its very easy to write. Easy schema to check, comments. Perfect.


Disclaimer: this is not a defense for YAML, I'm just trying to remove the rose tinted glasses some people view XML configs through.

As someone who has used XML configs they have a few problems:

- technical: missing comments are mentioned multiple times here so I will mention that while XML has comments they cannot be nested.

- socially: for some reason (maybe because XML is structured enough that this doesn't immediately collapse?) XML tends to just grow and grow. People start programming in XML too, and not only using XSLT or other standard approaches but also in completely proprietary ways.

At one project someone even wrote an authorization framework in Apache Tiles which allowed one to create roles using somewhere between 600 and 5000 lines of XML per role. The benefit was of course that you could update the roles without touching the Java code.

(In case it isn't immediately obvious: it would have been far simpler to edit it in Java, and people who know enough Java to fix it are available at the right price, whereas the XML system had to be learned at work.)

Personally I just want it to be kept simple:

- a settings.local.ini and default settings in settings.ini or something to that effect

- if necessary, just use a code file: config.ts works just as well, or config.js if it needs to be adjustable at runtime without transpilation.


Not easy to read; it's the Java of config: pages of code that express very little. By the time you find what you need, you've forgotten the context and what level of nesting you're at. It's also more wasteful as a transport.


> It's also more wasteful as a transport.

This is most certainly true, however with GZip thrown into the mix, it's not quite as bad as one might imagine: https://www.codeproject.com/articles/604720/json-vs-xml-some...

It compresses pretty decently and doesn't have too much of an overhead; in the example it's around 10% larger than JSON when compressed.

I'd argue that if one were to swap out JSON for XML within all the requests that an average webpage needs for some unholy reason, the overall increase in page size would be much less than that, because huge amounts of modern sites are images, as well as bits of JS that won't be executed but also won't be removed because our tree shaking isn't perfect.

Edit: as someone who writes a good deal of Java in their dayjob, i feel like commenting about the verbosity of XML might be unwelcome. I'll only say that in some cases it can be useful to have elements that have been structured and described in verbose ways, especially when you don't have the slightest idea about what API or data you're looking at when seeing it for the first time (the same way how WSDL files for SOAP could provide discoverability).

However, it all goes downhill due to everything looking like a nail once you have a hammer - most of the negative connotations with XML in my mind actually come from Java EE et al and how it tried doing dynamic code loading through XML configuration (e.g. web.xml, context.xml, server.xml and bean configuration), which was unpleasant.

On an unrelated note, XSD is the one truly redeeming factor of XML, the equivalent of which for JSON took a while to get there (JSON Schema). Similarly, WSDL was a good attempt, whereas for JSON there first was WADL which didn't gain popularity, though at least now OpenAPI seems to have a pretty stable place, even if the tooling will still take a while to get there (e.g. automatically generating method stubs for a web API with a language's HTTP client).


You mean something like https://pyotr.readthedocs.io


Thanks for the link, but not necessarily.

How WSDL and the code generation around it worked, was that you'd have a specification of the web API (much like OpenAPI attempts to do), which you could feed into any number of code generators, to get output code which has no coupling to the actual generator at runtime, whereas Pyotr is geared more towards validation and goes into the opposite direction: https://pyotr.readthedocs.io/en/latest/client/

The best analogy that i can think of is how you can also do schema first application development - you do your SQL migrations (ideally in an automated way as well) and then just run a command locally to generate all of the data access classes and/or models for your database tables within your application. That way, you save your time for 80% of the boring and repetitive stuff while minimizing the risks of human error and inconsistencies, with nothing preventing you from altering the generated code if you have specific needs (outside of needing to make it non overrideable, for example, a child class of a generated class). Of course, there's no reason why this can't be applied to server code either - write the spec first and generate stubs for endpoints that you'll just fill out.

Similarly there shouldn't be a need for a special client to generate stubs for OpenAPI, the closest that Python in particular has for now is this https://github.com/openapi-generators/openapi-python-client

However, for some reason, model driven development never really took off, outside of niche frameworks, like JHipster: https://www.jhipster.tech/

Furthermore, for whatever reason formal specs for REST APIs also never really got popular and aren't regarded as the standard, which to me seems silly: every bit of client code that you write will need a specific version to work against, which should be formalized.


> model driven development never really took off

Same as why REST is not a hot thing anymore: the idea that your API is just a dumb wrapper around a data model is poor API design.

API-driven development didn't really take off either, that is, write your spec in gRPC/OpenAPI and have the plumbing code generated on both ends. It's technically already there with various tools, but because of dogma like "code generation is bad", the quality of code generators, or whatever reason, we're still writing "API code".


Well, in Python, code generation is an anti-pattern.


> Well, in Python, code generation is an anti-pattern.

Hmm, i don't think that i've ever heard of this. Would you care to provide any sources, since that sounds like an interesting stance to take?

So far, it seems like frameworks like Django don't have an issue with CLI tools to generate bits of code, i.e. https://docs.djangoproject.com/en/3.2/intro/tutorial01/

  If this is your first time using Django, you’ll have to take care of some initial setup. Namely, you’ll need to auto-generate some code that establishes a Django project – a collection of settings for an instance of Django, including database configuration, Django-specific options and application-specific settings.
  
  $ django-admin startproject mysite
Similarly, PyCharm doesn't seem to have an issue with offering to generate methods for classes (ALT + INSERT), such as override methods (__class__, __init__, __new__, __setattr__, __eq__, __ne__, __str__, __repr__, __hash__, __format__, __getattribute__, __delattr__, __sizeof__, __reduce__, __reduce_ex__, __dir__, __init__), implementing methods, generating tests and copyright information.

I don't see why CLI tools would be treated any differently or why code generation should be considered an anti-pattern since it's additive in nature and is entirely optional, hence asking to learn more.


First of all, just because a tool or project uses a pattern, it doesn't mean that it's a good idea. Second, code generation as part of IDE or one-time setup is something else.

I need to clarify: when I say that "code generation" is an anti-pattern, I'm talking about the traditional, two-step process where you generate some code in one process, and then execute it in another. But Python works really well with a different type of "code generation".

Someone once said that the only thing missing from Python is a macro language; but that is not true - Python has its own macro language, and it's called Python.

Python is dynamically evaluated and executed, so there is no reason why we need two separate steps when generating code dynamically; in Python, the right way is not to dynamically construct the textual representation of code, but rather to dynamically construct runtime entities (classes, functions etc), and then use them straight away, in the same process.

Unless you're dynamically building hundreds of such constructs (and if you do you have a bigger problem), any performance impact is negligible.
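A tiny sketch of that "construct runtime entities instead of generating source text" idea; the table description here is purely illustrative:

    def make_model(table_name, columns):
        """Build a class for a table description at runtime, with no codegen step."""
        def __init__(self, **kwargs):
            for col in columns:
                setattr(self, col, kwargs.get(col))
        return type(table_name.title(), (object,), {
            "__init__": __init__,
            "columns": columns,
        })

    User = make_model("user", ["id", "name", "email"])
    u = User(id=1, name="Ada")
    print(u.name, User.columns)  # Ada ['id', 'name', 'email']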


> Someone once said that the only thing missing from Python is a macro language

Ahh, then it feels like we're talking about different things here! The type of code generation that i was talking about was more along the lines of tools that allow you to automatically write some of the repetitive boilerplate code that's needed for one reason or another, such as objects that map to your DB structure and so on. Essentially things that a person would have to do manually otherwise, as opposed to introducing preprocessors and macros.

For a really nice example of this, have a look at the Ruby on Rails generators here: https://medium.com/@simone.catley/ruby-on-rails-generators-a...


>you forget the context

Wait, it's the opposite. XML is designed to indicate context, and JSON is designed to hide context; you have a bunch of braces in place of context there. No matter where you are, it's braces all the way down, like Lisp.


Not really; what enables you to keep the context is shorter code. It's useless to have context reminders at the top and bottom of the thing but not in the middle, and it's too damn long.


For me XML and YAML are about the same. I think I'd also prefer comment-less JSON over both. However, XML wasn't that bad. With a decent editor and schema validation I would say there's a good chance I was more productive with XML than I am with YAML.


It's simple. For config files, choose the format that has the best tooling in your company and that supports comments. For data transfer, choose one that supports schemas, backwards compatibility, and good tooling (protobufs is just one example, the one I'm most familiar with).


Actually, yes, I do. XML syntax was far from stellar, and much of the ecosystem (e.g. XML Schema) was drastically overengineered... but even so, we had gems like RELAX NG to compensate. On the whole, it was better than the current mess.


So you prefer the "good old" XML days? I'll take comment-less JSON over XML any day

Sure, why not? XML rocks. I'll take it over JSON for many purposes.


My opinion only: I love JSON because it lacks so many foot guns of yaml. If you’re doing lots of clever stuff with yaml you probably want a scripting language instead. Django using Python for configs made me fall in love with this. Spending years with the unmitigated disaster that is ROS xml launchfiles and rosparams makes me love it even more.

Yaml and toml are fine if you keep it simple. JSON direly needs comments support (but of course wasn’t designed to be used as a human config file format so that’s kind of on us). And not just “Jsonc that sometimes might work in places.”

Beyond that, I think we generally have all the things we need and I don’t personally think we need yet another yaml. =)


These aren't foot-guns per se, but I can think of another handful of grievances I have with JSON:

* JSON streaming is a bit of a mess. You can either do JSONL, or keep the entire document in memory at once. I usually end up going with JSONL.

* JSON itself doesn't permit trailing commas. I can measure the amount of time that I've wasted re-opening JSON files after accidentally adding a comma in days, not hours.

* JSON has weakly specified numbers. The specification itself defines the number type symbolically, as (essentially) `[0-9]+`. It's consequently possible (and common) for different parsers to behave differently on large numbers. YAML also, unfortunately, has this problem.

* Similarly: JSON doesn't clearly specify how parsers should behave in the presence of duplicate keys. More opportunity for confusion and bugs.
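Two of those are easy to demonstrate with Python's standard json module (behaviour in other parsers may differ, which is exactly the point):

    import json

    # Duplicate keys: the spec doesn't say what should happen; Python's
    # parser silently keeps the last value.
    print(json.loads('{"a": 1, "a": 2}'))         # {'a': 2}

    # Large numbers: Python parses the integer exactly, while JavaScript's
    # JSON.parse would round it to 9007199254740992.
    print(json.loads('{"n": 9007199254740993}'))  # {'n': 9007199254740993}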


Running prettier (https://prettier.io) on each save will fix trailing commas for you. If you accidentally have one, it will just sneakily remove it and turn your document into one that is valid.


How someone could have decided on a subset of javascript and not include comments is beyond me.


It may have been a good or bad decision. But comments were intentionally left out of JSON to avoid obvious ways to sneak in parsing directives and thus incompatibilities between different JSON-parsers.


Yet incompatibilities persist from day 1: big integers, duplicate keys, keys order.

On the other hand, XML allows comments, yet I've never seen XML parser incompatibilities.


Not exactly an incompatibility, but my mind jumped to issues like this: https://github.com/swisskyrepo/PayloadsAllTheThings/blob/mas...

    <NameID>[email protected]<!--XMLCOMMENT-->.evil.com</NameID>
Some parsers will take just the first text element ("[email protected]"), and others will concatenate the text elements ("[email protected]").


> Some parsers will take just the first text element

Those are not in compliance with the relevant spec. We need to treat them as damage and confront them on the technical and social level.


If I had a penny for every time someone tried to parse XML using a regex (if that even classifies as a parser). Those are 100% incompatible with everything else.

Easiest way to demonstrate how wrong that is, is to throw in a comment in the example document ;)


The funny thing is that JSON doesn't even need commas; they essentially act as whitespace, and any amount or none at all would make no difference to the meaning of the document.

Arrays with holes are a JS-only feature.


> json doesn't even need commas

JSON is defined by the spec. The people who wrote the spec think otherwise[0].

[0]: https://www.json.org/json-en.html

> Arrays with hole are a JS-only feature.

There are other languages that allow arrays with missing elements.


JSON requires commas but does not need them; semantically they are treated like whitespace.

The document

    {1:2 3:[4 5]}

can only be "commafied" to

    {1:2, 3:[4, 5]}

> There are other langauges that allow arrays with missing elements.

But javascript is the only one that gave JSON the JS in its name


You can parse JSON in a streaming fashion with many libraries. You just don't know at the beginning if it is going to be valid or not.


And the flip side of that with YAML is you can stream it, but you don't know once you've gotten to the end if it was the whole document without some user defined checksum mechanism.


Ran into a great bug with the INI format which has the same issue. The application would read the config file on modification but if you just wrote over the file it would sometimes read the config before the file was fully written. Have to use a temp file and move it rather than just edit it.
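The usual fix, sketched in Python: write to a temporary file in the same directory and atomically replace the real config, so a reader never sees a half-written file (paths here are illustrative):

    import os
    import tempfile

    def write_config_atomically(path, text):
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "w") as tmp:
                tmp.write(text)
                tmp.flush()
                os.fsync(tmp.fileno())       # make sure the bytes hit disk
            os.replace(tmp_path, path)       # atomic rename over the old file
        except BaseException:
            os.unlink(tmp_path)
            raise

    write_config_atomically("app.ini", "[server]\nport = 8080\n")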


It's possible to have document start and end markers in yaml:

    ---
    foo: 1
    ...
Your application can mandate usage of these. But yeah, not ideal.


> Your application can mandate usage of these

I believe that's only true if one were to load YAML via the "SAX"-style per-event stream, and not the "object materialization" that normal apps use (aka `yaml.load_all` or JAX-B objects) since in those more data-object centric views, where would one put the processing events for those markers?

I also originally expected `yaml.parse(...)` to eat them as it does for comments and extraneous whitespace, but no, it does in fact return dedicated stream events for them, so TIL


2, 3 and 4 can be caught early with JSON schema.


Not really; JSON Schema validation is applied after JSON parsing, on the already-parsed document.


> Django using Python for configs made me fall in love with this.

I also started advocating in-language configuration files (Python for Python, but also Lua for Lua, etc) a number of years ago because it lets you do really useful things (like functionally generating values, importing shared subsets of data, storing executable references, and ensuring that two keys return the same values without manual copy/paste) all without needing to spec and use Yet Another Thing™ that does only a fraction of what the programming language you're already using already does.
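As a concrete (and entirely hypothetical) example of what that looks like, a settings.py that the application simply imports:

    import os

    ENV = os.environ.get("APP_ENV", "dev")

    DATABASE = {
        "host": "db.internal" if ENV == "prod" else "localhost",
        "port": 5432,
    }

    # Derived values are computed once, so they can never drift apart.
    DATABASE_URL = f"postgresql://{DATABASE['host']}:{DATABASE['port']}/app"

The application then just does `import settings` and reads `settings.DATABASE_URL`; comments, conditionals, and shared helpers all come for free.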


That also implies that you can't just test a foreign config file without first reading and understanding what it does, as just using one would imply arbitrary code execution.


This is a place where Tcl excels. You can easily create restricted sub-interpreters that can't do anything dangerous. If you need more power for trusted scripts you just reenable selected commands.


Same thing with with Lua!


Using the programming language to do the configuration works only when you're using some scripting language.

Things that get compiled can't really use it without recompilation.


But you can embed Lua or Python using its C interface.


That is how our Tcl based application server was, the configuration files were a Tcl DSL.


> My opinion only: I love JSON because it lacks so many foot guns of yaml.

While true, parsing it is still a minefield because it's very underspecified: http://seriot.ch/projects/parsing_json.html


JSON5 is the way to go. It supports comments and trailing commas. Unfortunately it's going to be difficult to supplant legacy JSON, which is so pervasive.


Except parsing JSON5 in the browser is super slow. Native JSON.parse doesn't support it, non-native parsers are slow, and the only fast way to parse it is `eval()`.


Does the browser need JSON objects with comments?

The desire to use a single interchange format for all data is the problem. There are plenty of reasons to support comments and minor syntax issues that JSON itself dislikes for human consumable and interactive JSON. I'd think software JSON could be just that.


This shouldn't really matter for the JSON5 use case - config files - which are usually small enough.

For machine-to-machine generated payloads JSON is good enough.


I work with ROS extensively and have not heard of using Django for this use case. Do you know of any open source projects that do this?


Sorry. Two separate contexts. I use both in the big picture but the Django world doesn’t directly interact with ROS. there’s an HTTP api for that.


I’ve never liked YAML. For whatever reason, it always feels like working in a mine field. It comes from the same cargo cult of people who think the problem with human machine formats is that it needs to be “clean”.

Clean, of course, to them means some bizarre aesthetic notion of removing as much as possible. Only it's taken to an extreme. I wonder if the same people also think books would be better with all punctuation removed to make them look "clean"?

It’s unhealthy minimalism, causes more problems than it solves. As soon as I see a project using YAML I cringe and try to find an alternative because god knows what other poor choices the developer has made. In that sense, YAML can be considered a red herring and I’m usually right. The last project I used that adopted an overly complex and build-breaking YAML configuration syntax had other problems hiding under the covers, and in some cases couldn’t parse it’s own syntax due to YAML’s overly broad but at the same time opinionated syntax.

Just say no to YAML.


By its very name (and the fact that the MEANING of the name flip-flopped in mid-flight after launch) you can tell that the designers of YAML had no clue what they were doing, because originally they named it "YAML" for "Yet Another Markup Language", when it clearly was NOT a markup language.

Only AFTER YAML had been around and in use for a few years did those geniuses actually realize that they had made a mistake in naming it something that it's not, and retroactively changed the name "YAML" to mean "YAML Ain't Markup Language", which was a too clever by half way of whitewashing the fact that they originally CLAIMED it was "Yet Another Markup Language", since they had no idea what a markup language actually was.

I prefer to use markup languages and data definition languages that were designed by people who are situationally aware enough to know what the difference between a markup language and a data definition language is, please.

Hard pass on YAML, whatever it stands for this week.


I've often heard this argument about YAML being "clean", but over time I have realized that people are conflating minimalism with cleanliness, when they are two different things. That realization is what it took for me to understand why I didn't like it. I did _not_ find it clean; I found it "messy" by virtue of the increased cognitive overhead. But it is minimal, at least compared to other formats. Other formats appear cleaner to me.


I'll give my opinion as someone who had to choose among JSON, XML, TOML, and YAML about two years ago for a new project. Whatever I chose had to be easy for end-users who don't know the specification to understand later.

Here were my thoughts on the options.

JSON - No comments -> impossible

XML - Unreadable

YAML - 2nd place. Meaningful indentation also made me worry that someone wasn't going to understand why their file didn't work. The lack of quotes around strings was frustrating.

TOML - 1st place. Simpler than YAML to read & parse. It truly seems 'obvious' like the name says.

I haven't encountered any situations where I wish I had more than TOML offers.


I disagree. TOML is terrible at handling nested data.

Check this thread:

https://news.ycombinator.com/item?id=17523194

I don't see Kubernetes switching to TOML anytime soon!


There may be no nested data in his use case. There’s no single correct answer here.


Too YAGNI for me.


I have nesting up to three levels deep. I use inline tables^ for the many innermost (or other few-element) tables. It's never seemed excessively verbose.

^https://toml.io/en/v1.0.0#inline-table
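A sketch of that style, parsed with the standard-library tomllib (Python 3.11+); the keys are illustrative:

    import tomllib

    doc = """
    [servers.web]
    host = "example.org"
    limits = { cpu = 2, memory_mb = 512 }   # inline table for the leaf level

    [servers.db]
    host = "db.internal"
    limits = { cpu = 4, memory_mb = 2048 }
    """

    config = tomllib.loads(doc)
    print(config["servers"]["web"]["limits"]["memory_mb"])  # 512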


A bit later you see [fruits.physical]

Even XML doesn't make you repeat the higher level keys!


Also agree, I find toml way less readable than yaml for lots of data structures


I feel like if you have data so nested that TOML is a problem then your schema is a problem/you should just be using a script


I think the "right" choice is HCL


It's plenty easy to convert YAML to JSON and use it in Terraform :)

https://www.terraform.io/docs/language/syntax/json.html


Why convert when HCL is superior to YAML and JSON?


It isn't. YAML and JSON are much more proven than HCL. HCL is used for some relatively small products. Just making something more complicated doesn't make it better.


Proven in what sense? Several implementations are broken or incorrect. HCL is used in very large products as well. Just because it isn't the majority currently doesn't mean that it isn't a worthy choice. HCL isn't more complicated if used as an alternative to YAML or JSON; in fact, I would argue that it is simpler. It bridges the pros of YAML and JSON combined, and addresses the nested complexity of TOML. It really is IMO the best, but you of course are free to share a different opinion. However, I would encourage you to actually try it out and re-evaluate.


The unreadability of XML is grossly exaggerated.


I agree. I have never really had a problem reading XML myself.


There’s properties files too, but TOML is my “format of choice” as well for a bunch of use-cases where human readability is important.

More people should give it a try. Very reminiscent of old Windows INI files and Java properties.


TOML is pretty good but it gets too verbose when you add a bunch of arrays.

All we need is to revise the official JSON standard (ECMA-404) to include comments.


And trailing commas. And unquoted keys.


Why are unquoted keys so critical? I feel like one of the strengths of a DDL like JSON or XML is that it's easy to tell what the data (key-value pair or otherwise) is, while with YAML and others, understanding data-vs-structure can be challenging.


Mostly so copy between JS and JSON isn’t such a PITA.

It’s not essential, but if we’re already changing the format we might as well?


> All we need is to revise the official JSON standard (ECMA-404) to include comments.

That would be a step back for GitLab CI, GitHub Actions, Kubernetes, Google App Engine, and a bunch of other projects which use YAML and seldom encounter the Norway problem. https://hitchdev.com/strictyaml/why/implicit-typing-removed/


That's fine. They should not have used a format made for data to describe what essentially is code.


Why shouldn't they have?


TOML can't decide if it's a super INI file or a JSON cousin. You can represent the same information using two completely different representations and you can mix both styles in the same document. Manually navigating and editing values is error prone and hard to automate.


JSON with comments would be ideal.


Which is why many parsers support that. I'm positive you'll find one that does so in pretty much every environment.


In that case, you might want to have a look at JSON5: https://json5.org/

It is pretty niche, but attempts to improve upon JSON in a multitude of ways, one of which is the support for comments: https://spec.json5.org/#comments


The libconfig format is fairly close to that, and it's great!

http://hyperrealm.github.io/libconfig/libconfig_manual.html#...


I guess it's just a matter of personal taste, but I don't see how XML is any more "unreadable" than any of the other options mentioned here.


TOML can handle nested data at the application level by using entity reference token semantics.

It does need an XPath traversal and search query format for application use and data references.


Shoehorning.


A lot of people have really strong opinions towards syntax things like YAML vs JSON vs XML, HTML, even programming languages. I think at some point we assign way too much importance to this kind of stuff.

I recently read a piece by Joel Spolsky that resonated with me (even though my career is not nearly as long as his).

> I took a few stupid years trying to be the CEO of a growing company during which I didn’t have time to code, and when I came back to web programming, after a break of about 10 years, I found Node, React, and other goodies, which are, don’t get me wrong, amazing? Really really great? But I also found that it took approximately the same amount of work to make a CRUD web app as it always has, and that there were some things (like handing a file upload, or centering) that were, shockingly, still just as randomly difficult as they were in VBScript twenty years ago. [0]

It makes me wonder if we're really focusing on the right stuff. Maybe there's lower hanging fruit somewhere that's more valuable than focusing on fundamentally subjective things like syntax.

[0]: https://www.joelonsoftware.com/2021/06/02/kinda-a-big-announ...


A radically different alternative with a lot going for it is Starlark: https://github.com/bazelbuild/starlark

It’s a deterministic subset of Python. This means that if you have complex or repetitive configurations, you can use loops and functions to structure them. But it’s impossible to write an infinite loop or recursion.
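
A rough sketch of what that buys you in practice (hypothetical names; the snippet is plain Python, and Starlark accepts the same constructs):

    # Build repetitive config with a function and a comprehension instead of
    # copy-pasting blocks by hand.
    def service(name, port):
        return {"name": name, "image": "registry.example/" + name, "port": port}

    services = [
        service(name, 8000 + i)
        for i, name in enumerate(["api", "worker", "scheduler"])
    ]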


EDN [1] and Transit [2]... Elegant weapons for a more civilized system.

[1] https://github.com/edn-format/edn

[2] https://github.com/cognitect/transit-format


Really came here to search why EDN wasn't mentioned. It is used in Clojure/ ClojureScript/ hylang ... projects a lot. It is a superset of JSON, is in my opinion a lot more readable than JSON but familiar enough too. It has native sets e.g. #{1 2 "three" '("four element list with a string inside")} and keywords. Tagged elements can be used for extending e.g. with a timestamp (such as the built-in #inst) or #uuid. And it also supports comments and discards for stuff, that should be omitted in evaluation.

As a sysadmin, YAML seems nice until you have actually done anything more advanced with it. See Julien Pivotto's presentation about some of its pitfalls: https://www.slideshare.net/roidelapluie/yaml-magic?next_slid... Btw. Jsonnet doesn't seem too bad either: https://www.youtube.com/watch?v=LiQnSZ4SOnw and here some examples: https://jsonnet.org/ but in my book, EDN still wins.


How about simply using pure full blown JavaScript or Python for config files, and not hiring people who you can't trust not to write infinite loops?

Or if you really must, then simply interrupt processes that loop infinitely, and fix the bugs that caused it.

You know, like you already do when you have an infinite loop.

Infinite loops are not the end of the world, you know. Processes can be interrupted, and computers have reset buttons.


IMO using code that generates (possibly binary/opaque) config data is the sweet spot. It's one more layer of indirection, but it means you're language-agnostic, you have a "safe" interface, and your "config-generating" process can be as expressive as you like -- comments, loops, whatever.

The underlying conundrum is:

- systems need to be configured,

- human-readability is obviously necessary at some level,

- configuration is often very "compressible" (needs loops, needs variables to be maintainable), but

- system-writers don't know the structure of your data, the axes on which you'd want to compress things, the best abstractions for you.

Templating languages are an obvious direction, but they're uniformly bad. If they have limited expressiveness you'll run into the limits. Maybe there are templating languages with good unit testing frameworks, but I haven't seen them. "Look at the expanded diff" doesn't scale. And generating gobs of human-readable "data" (in a format that supports comments!) is very wasteful.
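
As a small illustration of the generate-the-config idea (a hedged sketch; the file name and structure are made up):

    import json

    # Describe the repetitive parts once, then emit plain JSON that the
    # consuming system reads without knowing anything about this script.
    environments = ["dev", "staging", "prod"]
    config = {
        env: {
            "replicas": 1 if env == "dev" else 3,
            "log_level": "debug" if env == "dev" else "info",
        }
        for env in environments
    }

    with open("generated-config.json", "w") as f:
        json.dump(config, f, indent=2, sort_keys=True)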


It's not just a trust thing. Knowing that some snippet has bounded evaluation is super important, for mental models, processing, security, etc.

It still takes resources to detect loops, and it often involves introspection or privileged views; it's simply easier to prevent loops.


Determining whether or not arbitrary code will loop forever is actually impossible in general (halting problem).


> Starlark is a dialect of Python. Like Python, it is a dynamically typed language with high-level data types, first-class functions with lexical scope, and garbage collection.

If it has first-class functions, how can you avoid infinite recursion? Like, what stops me from running the omega combinator in it? This is why Meson (a similar language) does not allow those kinds of shenanigans, to keep the language non-Turing-complete.


No recursion and no lambda.


So it doesn't have first class functions then?


Not a bad idea but only implemented in Rust, Go, and Java so far. Meanwhile, all sorts of languages can interpret JSON and YAML.

It's a cool idea to do configuration in a subset of Python but now you have to go implement that subset in every language.


Have you had any experience building on top of it directly outside of blaze/bazel?



How about just nudge json a couple more notches towards js? https://github.com/leontrolski/dnjs


Interesting! I started using jsonnet this year, but found that the language was needlessly quirky (e.g. the `::`, the purely functional aspect, and no one wants to learn a new language to write configuration in the first place). More importantly, it is extremely slow (lazy evaluation without memoization...): rendering the Kubernetes YAML for my 5-container app took over 10 seconds.

I will look into this further.


> It’s a deterministic subset of Python. This means that if you have complex or repetitive configurations, you can use loops and functions to structure them. But it’s impossible to write an infinite loop or recursion.

Starlark is indeed deterministic and guaranteed to terminate (the Go implementation has a flag that allows recursion, but it's off by default), but these are two orthogonal properties.


Plenty of tooling is lacking in the Starlark ecosystem, e.g. generating Starlark files, or machine-editing Starlark maps.


So one thing I wasn't sure of is: if you have a Starlark program, how is its value decided? Is it simply the value of the last expression? And where does the print output end up? Is it just for diagnostics and has no influence on the value?


I like INI. It's simple, it's readable, and it leaves the data types up to the application to interpret. It's also really easy to parse; I can work out how to do it myself, whereas JSON is beyond me.
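
For what it's worth, the "easy to parse" part really is just a few lines with Python's standard library (the file and keys here are made up; note every value comes back as a string, so interpretation stays with the application):

    import configparser

    # Suppose app.ini contains:
    #   [server]
    #   host = 0.0.0.0
    #   port = 8080
    cfg = configparser.ConfigParser()
    cfg.read("app.ini")

    print(cfg["server"]["port"])         # '8080' -- a plain string
    print(cfg["server"].getint("port"))  # 8080   -- converted only on request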

I like CSV (and similar delimited files) it's less verbose than anything else for tabular data.

I like JSON for data transfer, you know the data types, it's succinct, and readable.

I personally don't need anything else.


This is the right answer in my view. If you need something structured use XML, otherwise INI.

I'm more likely to Yacc my own config format than use YAML or JSON personally.

JSON is great as an output format for data though.


INI is my favorite. I don't understand why it isn't the automatic default for everything.


As far as I know there is no standard for INI. There is TOML, which looks close enough, I guess?


TOML looks good. I'd rather it be called the INI standard, though.


IIRC, it's hard to do any nested structure in INI - you'd have to adopt a convention like putting prefixes and dots in the entry names to denote hierarchy.


Exactly what I think about the matter. Sometimes I use proprietary binary formats together with UDP where performance is critical (game servers for example).


I like NestedText, it's less verbose than anything else for nested data.


I have to say I hate the fact that I have low confidence when editing YAML that the result will be what I intend. It's kind of the number one job of such a format. And I routinely run into people using advanced features and then I have no idea at all how to safely edit it. It is interesting that it seems so difficult to pick a good tradeoff between flexibility and complexity with these kinds of languages.


I just stick to XML unless forced to use something else.

Schema validation, code completion on IDEs, endless amount of tooling including graphical visualisation, a language for data transformation and queries, and.... wait for it... comments!


If you're going to use XML, I would consider it mandatory to also use XSDs (W3C XML Schemas).

XSDs is something I think people need to pay more attention to when dealing with XML; the type system that the W3C XSD standard lays out (when used effectively) really does relieve much of the pain that people experience with XML.
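
A minimal sketch of that workflow, assuming the third-party lxml library (the schema and element names here are invented):

    from lxml import etree

    # A tiny schema: <timeout> must contain a positive integer.
    xsd_source = (
        b'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
        b'<xs:element name="timeout" type="xs:positiveInteger"/>'
        b'</xs:schema>'
    )
    schema = etree.XMLSchema(etree.fromstring(xsd_source))

    print(schema.validate(etree.fromstring(b"<timeout>30</timeout>")))    # True
    print(schema.validate(etree.fromstring(b"<timeout>soon</timeout>")))  # False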


What is the obsession with removing braces? I will never find the lack of clear demarcations (relying on indent) easier than braces.


Visual clutter, familiarity to non-coders. Curly braces are almost never used outside of programming and are ugly to boot.

My benchmark for yaml/JSON alternatives is "how would I feel explaining it to a busy, sceptical client?"

If the intended audience is purely developers, then sure. JSON (with the addition of comments and trailing commas) is just fine.

White space has the additional advantage of agreeing with itself. Other demarcations can have issues where the indentation and the structure contradict each other.


Again the dreaded Cobol argument. We had to struggle with a lot of this in the past: Cobol, SQL, YAML, BDD. All this would be much easier without this nonsensical idea that nontechnical people will read code. They won’t. Making code a bit more like prose doesn’t make it readable for nontechnical people. Yet we again and again make our life harder - ugly syntax rules, no code completion, no auto-formatters.

Please stop making code easy for non-coders. They don’t want to read it. They never did. They just want this damn box to work.


As a counter-argument: I work in robotics, where many operators will look at and change settings in a YAML file during testing. They do not have software skills outside of this.


My educated guess then is you could have gotten them to change "settings" in C, Java or basically anything.

Just put the file in the root folder, and keep it as simple as possible and you should be fine? I mean, if they manage to write yaml correctly and consistently C is no match?


maybe? the dynamic loading of the configs kind of restricts it to a markup language.


The reason the situation is the way it is now is precisely that making the code easy for non-coders increased the popularity and reach of the products. Probably because non-coders also found it easy to pick up and start working with it.


Wrong use case. I'm talking about asking them to write or edit these files.


I haven't found "this needs to be indented exactly the right amount or it won't work" to be much easier for non-programmers than "this needs to be enclosed in braces or it won't work." Most people have at least experienced parentheses in math (albeit maybe decades ago), so it's not an entirely foreign concept. Either one requires a bit of learning, but I think most people are capable of it, so any improvement in non-coder familiarity seems minor at best, vs. the very real costs.


Counter-argument - why do programmers insist on clear indentation if it doesn't aid readability? The indentation is there for humans and the braces are there for the compiler.


That's why you have both and lint against inconsistencies. This catches errors.


>My benchmark for yaml/JSON alternatives is "how would I feel explaining it to a busy, sceptical client?"

My benchmark is this: can an autoformatter do its job every time without breaking something that's technically working right now but possibly formatted wrong?

Every data format that cannot comply with this contains in it a huge waste of time. Even as a python programmer, I extend the same rule to programming languages.


Significant whitespace is evil. It's just begging for copy/paste bugs.


Funny that's never been an issue for me after 2 decades of writing Python.


My googling didn't turn up anything, but are there any case studies that show it's more readable? I'm happy to accept that it is, but I can't help wondering if research has been done or if it's mostly gut feeling / anecdotes / aesthetics.


> that show it's more readable?

I'm not clear exactly what the "it" here refers to but as I mentioned in other comments it's fairly self-evident that indentation is easier to visually parse than braces. A simple thought experiment - would you find it easier to skim read code where the indentation was consistent with the bracing or where it was inconsistent? Your brain registers the indentation first and you only resort to counting braces if there's a reason to doubt the former.


Reality oftentimes runs counter to our expectations though, which is why I wondered whether brace-free formats have actually been shown to be easier to understand.


My point was that the debate is between "indentation alone is sufficient" and "braces plus indentation is better than just indentation".

Nobody advocates for "braces are better without indentation".

This surely implies that (practically) everyone agrees indentation is carrying most of the weight of visual indication of structure.

Of course "everyone" might be wrong - but that's a fairly tricky corner to defend.


> Nobody advocates for "braces are better without indentation".

^^^ THIS ^^^


That's just display though, if you have to show it to a skeptical client, why not run it through a browser that shows it without braces? It's the same as showing a webpage instead of the html.


I was specifically thinking about asking clients to edit or write these files. Isn't that a fairly common use case for config languages?


If programmers find getting YAML indentation correct difficult, how are non-programmers going to fare?


If the HomeAssistant subreddit is anything to go by, it's their biggest complaint (HA configuration is in YAML).

With that said, if they weren't complaining about whitespace they'd be complaining about missing semicolons, missing/extra commas, missing equals signs, missing closing )]} or whatever.


Are you arguing "braces are easier to get right than indentation" or is this a point specific to YAML's rules? Because I'm not defending the latter but I find it hard to understand I would need to argue against the former.


That's a good point someone brought up in a different comment. I haven't really dealt with those scenarios, so honestly I'll accept that as a reason.

I wish it could be displayed with braces though, I wonder if someone already has built that as an extension for editing / viewing yaml files.


YAML is a superset of JSON. In other words: any syntactically valid JSON file is a valid YAML file. If you want braces like JSON, but not quoted strings, YAML supports it.


> YAML is a superset of JSON.

That's false. http://p3rl.org/JSON::XS#JSON-and-YAML


You could run a formatter which adds braces if you like?


This makes zero sense to me. Why do config files out of all things have to be accessible to non-coders?


Because we are asking them to write these things in many cases.


I'm not buying that you genuinely have a target audience of "I trust this person with config files but their eyes are too gentle to see a curly brace".


Well. I've genuinely had clients editing YAML so there's that.

I can definitely think of a broad range of people where I'd be happy to recommend they use text files for config and data but I wouldn't be happy if those text files needed to follow the rules of JSON syntax.

I mean - to some extent I would rather not edit JSON. It's not a terribly ergonomic experience. If I had to design a format for my own use, it would be indentation based and probably look a little bit like YAML, Markdown or similar.


Because they're not code?


Isn't an indent a clear demarcation?


Not at all.

If it was code, an indentation error would often not compile or show errors or fail tests.

Configuration in YAML is much worse: most of the time an indentation error goes undetected until an application starts misbehaving.

Significant whitespace is perfectly ok for code but a huge footgun for YAML


Literally, yes.

But I find it incredibly annoying to estimate indentation when lines are wrapped in an editor (or webpage). Or, to a lesser extent but still throws me off, when multiple blocks end at the same line. Or when pasting blocks into another block, and having to double check to make sure the indentation was carried over correctly. I like editors that visually show indentation characters.


Only if you forbid tabs ...


If it must be created by non-programmers too delicate to match braces, how about BEGIN .. END and an autoformatter.

Or, NODE <name> .. END .


tabs or spaces for that indent?

it makes a difference in how it is parsed, so it's more than a dev's preference


It really doesn't matter. Just force the leading indent to be exactly the same bytes. If indent moves between two values where one isn't a prefix of the other raise an error.
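
Something like this, say (a hypothetical helper, just to illustrate the prefix rule):

    def leading_ws(line):
        return line[:len(line) - len(line.lstrip(" \t"))]

    def check_indents(lines):
        prev = ""
        for lineno, line in enumerate(lines, 1):
            cur = leading_ws(line)
            # When indentation changes, one indent must be a prefix of the
            # other; "\t" vs "    " satisfies neither, so it's an error.
            if not (cur.startswith(prev) or prev.startswith(cur)):
                raise ValueError("ambiguous indentation on line %d" % lineno)
            prev = cur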


is one \t the same as \s{4,} in bytes? that doesn't make any sense


No it isn't. That is the point. If you have one line indented with 4 spaces and one line indented with a tab there is no correct answer what the difference of indent is. The only good option is to raise an error.


right, or encapsulate in some sort of braces ;-)


Sure, that is another option. But your code still looks confusing.

Personally I mostly use indent to read code, so requiring that the indent matches the semantic nesting makes it much easier for me to understand.


Looks confusing to who? Another coder? Then they have issues.

I get the original was talking about client facing config files. I'd rather see INI style config files personally.

If you're writing ugly code, braces or spaces won't save you. Just don't write ugly code.* Write it like the next person to view your code is an axe murderer who knows where you live, so don't make them mad. You can minify later.

*I'm ignoring Perl, as it's always ugly


An underrated property of braces in this case is that a truncated document is no longer valid (assuming your document only has one top-level item).

Truncate YAML and in most cases you still have valid YAML.


I was surprised the first time I saw Daniel J. Bernstein's qmail configuration. Qmail uses separate configuration files for each parameter being set. The directory /var/qmail/control contains most of these files.

For example, to set the maximum message size to by 10Mb and to set the timeout to be 30 seconds:

    echo 10000000 > /var/qmail/control/databytes
    echo 30 > /var/qmail/control/timeoutsmtpd
There are many more files like this that hold simple values. /var/qmail/control/locals is a file that is a list of domain names, one per line.

Dictionaries are just subdirectories with one file per entry, for example this is how aliases are defined to qmail:

    echo fred > /var/qmail/alias/.qmail-postmaster
    echo fred > /var/qmail/alias/.qmail-mailer-daemon
See [1] for more about qmail.

DJB also created a simple, portable encoding for serializing data called netstrings, see [2]. XML, YAML, JSON, TOML, and INI files all have some advantages over netstrings, but netstrings are simple to understand and simple to parse correctly.
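
A netstring is just "<length>:<bytes>,". A rough Python sketch (not DJB's code, purely illustrative):

    def encode(data: bytes) -> bytes:
        # e.g. encode(b"hello world!") == b"12:hello world!,"
        return b"%d:%s," % (len(data), data)

    def decode(buf: bytes) -> bytes:
        length, sep, rest = buf.partition(b":")
        if not sep:
            raise ValueError("missing ':' after length")
        n = int(length)
        if rest[n:n + 1] != b",":
            raise ValueError("missing trailing ','")
        return rest[:n]

    assert decode(encode(b"hello world!")) == b"hello world!"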

[1] https://www.oreilly.com/library/view/qmail/1565926285/ch04.h...

[2] https://en.wikipedia.org/wiki/Netstring


I used this system myself for a project. It has some downsides, but overall it worked pretty dang well.


My opinion: I can live with yaml and json. Toml, tjson if I have to. Xml with a gun to my head. But I don't want yet another markup language (ironically that's what YAML stands for)


What I want from YAML (or a competitor) is access to the concrete syntax tree.

For one of my art projects I make YAML files that describe the front side, back side, and web side of a "three sided card". I generate these out of several templates, currently using ordinary string templating.

I'd love to be able to load a YAML file and add something programatically to the list and have the list stay in the same format that it was in, so if it was a

   [1,2,3]
list I get

   [1,2,3,4]
if it was a

   - 1
   - 2
   - 3
list I want

   - 1
   - 2
   - 3
   - 4
sadly I'm the only one who thinks this way.


In JavaScript, use https://www.npmjs.com/package/yaml for this:

    import assert from 'assert'
    import { parseDocument } from 'yaml'
    
    const flowDoc = parseDocument(`[1,2,3]`)
    flowDoc.add(4)
    assert.strictEqual(flowDoc.toString(), '[ 1, 2, 3, 4 ]\n')
    
    const blockDoc = parseDocument(`\
    - 1
    - 2
    - 3`)
    blockDoc.add(4)
    assert.strictEqual(
      blockDoc.toString(),
      `\
    - 1
    - 2
    - 3
    - 4
    `
    )


You are not the only one. But even if you find a library for YAML AST transformations in your language, whatever other language uses your YAML probably doesn't have one.

E.g. I tried exactly the same thing, and it was quite difficult with Rust, because the usual way to parse YAML there is with serde, which of course discards the AST.

In the end I gave up, and just used JSON for my use case.



YAML stands for "YAML Ain't Markup Language"


Which is more than a tad bit ironic, in retrospect.


Not really since it's true. It isn't markup, it's a configuration file format.

"<em>This</em> is a markup language" since there is text which is marked up.

YAML/JSON is a way to serialise fairly common data structures (arrays/lists, hashes/dictionaries, numbers, strings, bools, etc.)

Incidentally, if you can seamlessly replace XML with something like JSON, then you probably aren't using the 'markup' bit of XML.


Ah, yes, that’s totally correct. I was mentally glossing over the difference between markup and configuration languages.


There was a previous discussion about YAML:

YAML: Probably not so great after all (arp242.net)

https://news.ycombinator.com/item?id=20731160

https://www.arp242.net/yaml-config.html

To which I posted:

https://news.ycombinator.com/item?id=20735231

I was suspicious of YAML from day one, when they announced "Yet Another Markup Language (YAML) 1.0", because it obviously WASN'T a markup language. Who did they think they were fooling?

https://yaml.org/spec/history/2001-08-01.html

XML and HTML are markup languages. JSON and YAML are not markup languages. So when they finally realized their mistake, they had to retroactively do an about-face and rename it "YAML Ain’t Markup Language". That didn't inspire my confidence or look to me like they did their research and learned the lessons (and definitions) of other previous markup and non-markup languages, to avoid repeating old mistakes.

If YAML is defined by what it Ain't, instead of what it Is, then why is it so specifically obsessed with not being a Markup Language, when there are so many other more terrible kinds of languages it could focus on not being, like YATL Ain't Templating Language or YAPL Ain't Programming Language?

https://en.wikipedia.org/wiki/YAML#History_and_name

>YAML (/ˈjæməl/, rhymes with camel) was first proposed by Clark Evans in 2001, who designed it together with Ingy döt Net and Oren Ben-Kiki. Originally YAML was said to mean Yet Another Markup Language, referencing its purpose as a markup language with the yet another construct, but it was then repurposed as YAML Ain't Markup Language, a recursive acronym, to distinguish its purpose as data-oriented, rather than document markup.

https://en.wikipedia.org/wiki/Markup_language

>In computer text processing, a markup language is a system for annotating a document in a way that is syntactically distinguishable from the text. The idea and terminology evolved from the "marking up" of paper manuscripts (i.e., the revision instructions by editors), which is traditionally written with a red or blue pencil on authors' manuscripts. In digital media, this "blue pencil instruction text" was replaced by tags, which indicate what the parts of the document are, rather than details of how they might be shown on some display. This lets authors avoid formatting every instance of the same kind of thing redundantly (and possibly inconsistently). It also avoids the specification of fonts and dimensions which may not apply to many users (such as those with varying-size displays, impaired vision and screen-reading software).


> YAML is considered by many to be a human friendly alternative to JSON

I'm not disagreeing with the author here, but as someone old enough to remember the rise of XML as a data transmission format (and Erik Naggum's masterful rant against it[0]), it's strange because historically speaking both XML and JSON were also popularized as more "human readable".

I would be curious how many HNers (and even more so newer developers outside the HN-o-sphere) have worked extensively with or even written parsers for binary (or otherwise non-human readable) file formats. Writing an MP3 metadata parser used to be a standard exercise for devs looking to level up their programming skills a bit.

It personally feels weird to me that we would keep pushing for more "human readable" data formats when the world is increasingly removed from one where non-programmer humans need to read data. Keep your data in whatever format makes sense and let software handle transforming it to a more readable or more efficient format depending on the needs, even if humans can't read it (they shouldn't need to!).

On top of all that my experience has been that JSON leads to more atrocities than XML (while fully agreeing with all of Erik Naggum's points about that) and YAML creates even worse horrors than JSON. It seems we'll soon be approaching eldritch horrors if we continue to pursue human readable data exchange formats.

0. https://www.schnada.de/grapt/eriknaggum-xmlrant.html


As an embedded sw dev working on things that interface with legacy devices, I have written lots and lots of binary parsers (as well as serial, net, and ipc protocols). I've also reversed some binary formats used by games, etc.

I like binary formats for things that are simple and don't change too often. However, I still love not having to waste days on studying yet another bespoke binary format & parser for things that are complex and don't work right for whatever reason. So when performance isn't a concern and you aren't working in a size-constrained environment, I do find that "human readable" formats are often worth it.

As a practical example, I recently hit a bug where KiCad moved some custom footprints' pad shapes around after saving & reloading. And I quickly discovered that the footprint files are just S-expressions and relatively self-descriptive so I fixed my issue in five minutes with vim without ever needing to look at docs or code. That kind of thing is super convenient. Later I discovered that other users are likewise working around the program's limitations using a text editor or custom scripts to manipulate things KiCad won't do for you; for example, to create a repetitive pattern of components in a layout more complicated than a grid.


I don't understand this. YAML has limitations. All formats have limitations. If a format is too limiting, don't use it. Pick one more suitable, or come up with another one, like NestedText (or whatever). What is this need to tell everyone else to "move on" from using some format because it doesn't suit your specific preferences or use case?


The person that creates the config file does not necessarily choose the config file format. In the example, GitHub chose YAML and everyone using GitHub Actions must use it. YAML is error prone, as everyone that tests with Python is finding out as they add Python 3.10 to their regression tests. This is a plea to organizations like GitHub to stop choosing YAML.


As not a Python programmer, I'm struggling to understand the issue.

Your (or blogger's) claim is that GitHub misparses the YAML actions config when it comes specifically to Python? Or that YAML is inherently inadequate to the task of representing Python's necessary actions?


The Python 3.10 issue is this, in yaml:

  Python_versions:
  - 3.8
  - 3.9
Then we add the new version in our yaml file:

  Python_versions:
  - 3.8
  - 3.9
  - 3.10
But now we discover that 3.10 is parsed as a float, just like the other ones were. But the problem is that 3.10 becomes 3.1, the equivalent float value! With 3.9 we didn't notice this problem.

I'm not sure what the name for this problem is.

It's something like.. an unfaithful but unintentionally working representation. Until it doesn't work anymore. The solution in YAML is to quote the values so that they become strings as intended.
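
It's easy to see with PyYAML (which follows the YAML 1.1 resolution rules):

    import yaml

    print(yaml.safe_load("versions: [3.8, 3.9, 3.10]"))
    # {'versions': [3.8, 3.9, 3.1]}  -- 3.10 is read as the float 3.1

    print(yaml.safe_load("versions: ['3.8', '3.9', '3.10']"))
    # {'versions': ['3.8', '3.9', '3.10']}  -- quoting keeps them strings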


How is that different from other formats? Take JSON:

    {"python_versions": [3.8, 3.9, 3.10]}
This is a problem in any config language that doesn't enforce types, and if it does enforce types, you should've used quotes already (like you really should've been before 3.10 was added to the list).
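
For what it's worth, Python's own json module shows the same collapse:

    import json

    print(json.loads('[3.8, 3.9, 3.10]'))
    # [3.8, 3.9, 3.1]  -- 3.10 parses to the float 3.1 here too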

Similar problems also exist in many config formats with scientific notation (2e345) or hexadecimal notation (0x12345) or octal notation (012345, no that's not a decimal number in many programming languages and config formats!).

What commonly supported alternative would you suggest for this use case?


The difference from JSON is that in JSON, you always have to write "" to get strings.

In YAML you can just write blah and it becomes a string. Except when it doesn't, when it matches some other type of value.


Sure, if you ignore what the YAML documentation actually says.


> I'm not sure what the name for this problem is.

Not reading the manual? Same thing would happen in JSON.

Sure it's annoying but all the YAML docs I've ever read state that unquoted numbers are ints/floats.


YAML is a superset of JSON.

Which means anything that accepts YAML accepts JSON.

So you can write your Github action code in JSON (with a yml file extension).


> YAML is a superset of JSON.

That's false. http://p3rl.org/JSON::XS#JSON-and-YAML


Interesting. Thanks for bringing that to my attention.

It sounds like the differences are unlikely to be encountered in conventional config files, especially those written by hand.

From that link I understand the source of problems to be:

Object keys longer than 1024 "stream characters". But these wouldn't be valid in any other yaml syntax either.

Some Unicode characters in strings (characters outside of BMP).

An escape sequence that is technically valid in JSON but never required.


I don't understand this. Opinions are not universal, every blogger has limitations. If you don't agree with an HN title, don't read it. Pick the next one or read another site. What is this need to tell everyone that you don't like them to tell everyone to "move on" just because it doesn't suit your specific preferences in formulation?


I see what you did there, but it comes off as mocking and not so clever.

The difference between my post and that blog, and your reply for that matter, is that I'm not telling anyone not to read it. I'm inviting you to comment on why people feel the need to tell others to move on.

Here's a good example of how to respectfully disagree: https://news.ycombinator.com/item?id=29220994


YAML's lack of limitations is the source of much of the difficulty with the format. The numerous ways to represent basic data (arrays, strings, etc.) are a common source of error. YAML doesn't have enough limitations!

And you can't pick what config format a tool you need uses.


But GitHub specifically has very specific guidelines on what to write and how to format it, and will give pretty detailed error messages if it's badly formatted. Not to mention tooling to format and highlight. I'm not seeing how the proposed NestedText is inherently free from those same issues: the need for tooling, guidance, error messages. Is the claim that it's easier?


I think the important difference between NestedText and YAML is that NestedText does not try to convert the text to numbers or booleans. YAML converts on to True and 3.10 to 3.1, which in this case is undesired. NestedText keeps text as strings. The idea is that the end application should be the one that determines if on is a string or a boolean and whether 3.10 is a string or a number.

It's all in the name. All leaf values are strings. It is literally nested text.
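
For example, with the reference Python package (a quick sketch, assuming I'm reading the nestedtext API right):

    import nestedtext as nt

    doc = "\n".join([
        "python_versions:",
        "    - 3.8",
        "    - 3.9",
        "    - 3.10",
    ])
    print(nt.loads(doc))
    # {'python_versions': ['3.8', '3.9', '3.10']}  -- every leaf stays a string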


Especially since the post title doesn't match the linked article's heading.


Yes! Thanks for pointing that out! You're right!


As a fun overview of the problem we're discussing, here's a rough list of the various languages mentioned in this comment section:

  - YAML
  - JSON
  - JSONC
  - XML
  - TOML
  - INI
  - CSV
  - NestedText
  - Starlark
  - Python
  - Dhall
  - Cue
  - Jsonnet
  - DADL
  - EDN
  - HCL


And Lua. I'm mentioning Lua, right now.


RON (Rusty Object Notation) is another good one. The name is slightly unfortunate, since it seems applicable outside Rust projects.


surprised JSON5 didn't show up


I post this pretty much every time this topic comes up:

JSON5 exists, and is quite nice. I've picked it up for configs on a work project and haven't once had an issue due to misconfiguration, unexpected parsing, or friction with leaving a trailing comma or a comment.

The nesting in JSON5 is simple and familiar to pretty much all programmers, unlike deep nesting in TOML which is a huge pain.


JSON5 keeps coming up in these discussions, and I've personally had a great time with it. Hopefully some larger projects pick it up and it eventually becomes a common occurrence, or something.


I like it so much I got motivated enough to start making a sublime text highlighter for it. I got a bit lost though, having never made one before.

And then I tried to use a tool called SBNF to write the grammar for the language at a high level and have it spit out Sublime Text syntax highlighting code. Didn't quite work yet unfortunately.

https://github.com/bschwind/sublime-json5

https://github.com/BenjaminSchaaf/sbnf


The introduction keeps citing "no need for escaping or quoting" as a major advantage, but provides no examples of what a key with a colon, or value beginning with "[", or any datum with leading or trailing whitespace would look like.

Also, the changelog is quite frightening!

> [In 3.0], `[ ]` now represents a list that contains an empty string, whereas previously it represented an empty list.


This made me curious to find out. The "Language introduction" docs [1] answer these points:

* keys containing leading spaces, newlines or colons can be represented with the multiline key form, where each line of the key starts with `: `.

* leading or trailing space is not complicated; the string values are just the rest of the line after the separator from the key, `: `. The values are not trimmed.

* a string value beginning with `[` just works in most places. This would not be confused with list values, as those would only start after a new line. Only in the compact inline list and inline dict forms are there restricted characters for syntax.

It seems that their claim, no escaping required, holds. The slightly more verbose form of the language constructs may be required to represent special values though.

[1]: https://nestedtext.org/en/stable/basic_syntax.html


Looking at the comparison examples between TOML and YAML/NestedText, I fail to see how anyone can look at the YAML/NestedText and think "yeah, this is way easier to read and reason about than TOML".

I'm not even a Rust person. I've never worked in Rust in my life, so there is no "preference bias" in my comparing the two. I just don't find YAML, or this "improvement" as "human-readable" as people make out to be.


I'm also confused. The TOML example is way easier for me to understand. It's clearly very subjective.


Two interesting configuration language alternatives:

https://dhall-lang.org/

https://cuelang.org/


I've tried dhall, cue and jsonnet, and cue is so far my fav. It's very well designed, expressive, but restrictive enough so that config files don't look like scripts.

The way it blends types and values makes learning it super easy, yet you can do complex things with few lines.

But the main implementation exports to yaml without quoting the strings, which kinda defeats the purpose :(


I’m kind of unsure about the way CUE achieves reuse: if I understand correctly, you have files in a directory tree and the (result of processing the) bottommost files are the things you’re supposed to point your consuming tools at. So there’s no way to share structure among a collection of items if that collection is nested inside your config, the only operation available is essentially the generation of a set of similar but separate configs. Or am I wrong here? I’d very much like to be.

(Also, the type system is absolutely delicious, but it badly needs a paper with a complete description. I’m extremely interested in how it works, but fragmentary “notes on the formalism underlying” CUE are not enough.)


CUE is based on Typed Feature Structures, which predate Deep NLP, and for which there is limited literature. We do need a good writeup on the theory. I've written a bit here: https://cuetorials.com/cueology/theory/

Think of a graph with lots of attributes within which paths are searched for.


The linked page seems to be empty.


I'm seeing a few headings and links


You do have imports and functions so you can reuse what you want.

The doc is also quite clear and rich, but the way it's organized means I have to read it entirely before writing my first CUE file. It also lacks IRL examples, so trial and error was my best friend.

It can be discouraging.


cuetorials.com really helped me get there


I actually use CUE for large configuration files; I used YAML before and had many issues once configurations became larger and more complex. It validates and exports JSON, which is easily readable in C++ and Python :-) I have been happy with the switch.


I'm working on my own (you can start the attack hahaha). I feel there aren't many simple, generic languages that allow you to write simple DSLs with embedded documentation. The self-documenting part is still missing, but you can take a look and say what you think. It's kind of like the YAML format (a bit TOML) with a schema and the possibility to merge multiple files with smaller chunks of the data. With export to JSON and YAML. https://github.com/dadlang/dadl


This looks pretty cool


I have a client that uses a CMS of unknown origins. I just get stuff placed in an S3 bucket, and then attempt to parse what was provided. 100% of their YAML files are invalid according to every single linter I have found/tried. Not one of them understands where the error is occurring to help debug. It literally just says invalid. I'm at a total loss. My head doesn't think YAML. Does a string need quoting or not? Do trailing spaces at the end cause problems? My personal experience with YAML is limited, but it hasn't been pleasant.


I'm glad to see people experimenting with alternative document/object representations, but this one might be a hard sell: based on the README[1], it only has Python, Zig and Janet implementations so far. One of the nice things about YAML (and JSON, TOML, etc.) is that they have decently mature C, C++, or Rust libraries that other languages bind to.

[1]: https://github.com/KenKundert/nestedtext


Sure, but why move to an alternative that's almost as bad?

YAML's problem is that whitespace is significant. TOML could be superior to it if it weren't for the fact that they forgot to forbid indentation. And now indented TOML is everywhere, including its wikipedia page.

If we have to make a change, why not finally bite the bullet and go to the form that has existed for decades and is obviously superior to all of these formats? S-expressions. There's even been a standard for data notation brewing for some time: https://github.com/edn-format/edn

Then we can actually forego http://xkcd.com/927 and do something useful with our significantly saved mental energy.

edit I see that I'm not at all alone in wanting edn to replace all this crap. So some action points on how to actually make that happen, in order of preference:

- write or improve robust edn parsers for your ecosystem

- write or improve robust x => edn converters for your ecosystems (x=yaml,json,toml,whateverpoisontheyuserightnow)

- use edn in your projects

- advocate the use of edn


One interesting demonstration of YAML's complexity relative to JSON is that YAML is almost a complete superset of JSON. This is acknowledged by the authors of the YAML spec.[0]

For example, the following code translates JSON to YAML using only the Python yaml library:

    import yaml
    
    data = yaml.safe_load("""
    {
      "firstName": "John",
      "lastName": "Smith",
      "isAlive": true,
      "age": 27,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021-3100"
      },
      "phoneNumbers": [
        {
          "type": "home",
          "number": "212 555-1234"
        },
        {
          "type": "office",
          "number": "646 555-4567"
        },
        {
          "type": "mobile",
          "number": "123 456-7890"
        }
      ],
      "children": [],
      "spouse": null
    }
    """)
    
    print(yaml.dump(data))
    
Prints:

    address:
      city: New York
      postalCode: 10021-3100
      state: NY
      streetAddress: 21 2nd Street
    age: 27
    children: []
    firstName: John
    isAlive: true
    lastName: Smith
    phoneNumbers:
    - number: 212 555-1234
      type: home
    - number: 646 555-4567
      type: office
    - number: 123 456-7890
      type: mobile
    spouse: null
0: https://yaml.org/spec/1.2.2/#12-yaml-history


Further to this you can even include the trailing commas and hack in comments in the yaml-like json value:

    import yaml

    data = yaml.load("""
    {
      "firstName": "John",
      # Woah comments
      "lastName": "Smith",
    }
    """, yaml.SafeLoader)
Granted, I still like YAML when it sticks to the basics and avoids some of the messier aspects, like anchors, when I can.


I wrote a program at a corporate job where all the configuration is in Excel files. Tables are just fed into a dictionary, and columns on each worksheet are predefined to hold the keys. People loved it because they know how to use Excel and "text" is scary. (This is all very strange because they are just entering text in Excel, but the familiarity goes a long way.)
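
The loading side is only a couple of lines, roughly like this (made-up file, sheet, and column names; assumes pandas with an Excel reader installed):

    import pandas as pd

    # One worksheet per config section; here a "settings" sheet with
    # "key" and "value" columns.
    df = pd.read_excel("config.xlsx", sheet_name="settings")
    config = dict(zip(df["key"], df["value"]))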


I _recently_ suffered through a meeting where we developers were told to use the _new_ testing framework some team at our corp created. It's written in Java (we use .NET exclusively in our branch), configured via Excel sheets and Java, and exports results also as Excel sheets.

Whoever thought this was a good idea in 2021 has to be braindead. But the CEO was pleased. Probably because they know Excel.


I give you Robot, originally created at Nokia.

https://robotframework.org

Back in 2006, the tests were written in HTML tables; no idea how it manages to still be around.


Same, but with MS Access.


Keeping leaf values as strings is quite elegant. I’ve found the auto conversion to be inconsistent across yaml implementations in different languages.


Have spent many years developing dev tools that use YAML and alternatives, and I still think YAML wins because of its ubiquity and relative interop with JSON. I'd pick HCL as an alternative if I was going to, as it's been widely battle-tested in Terraform.


Agree, HCL resembles JSON but handles comments/nested data/etc really well.


Yet another alternative format to JSON or YAML is Rust Object Notation (RON or RSON), which is a lot like JSON but more expressive:

https://github.com/ron-rs/ron


Modest proposal - The real problem I see with JSON is it's not rich enough. Not a problem if you're using EDN

https://github.com/edn-format/edn


Huh, no. YAML is a superset of JSON. So valid JSON is valid YAML. This is sometimes surprisingly useful. Also, YAML is used everywhere and, like the other user pointed out, has mature, well-tested libs for almost every language.


Another JSON superset is HOCON. The H is for human.


HOCON is pretty good. Probably the sanest format for config files.


Indeed: "YAML version 1.2 is a superset of JSON" https://en.m.wikipedia.org/wiki/JSON#YAML


> YAML is a superset of JSON.

That's false. http://p3rl.org/JSON::XS#JSON-and-YAML


How is YAML a superset of JSON? never heard that claim before.


You can turn any piece of a YAML into a JSON object and it will be read just as well.

This YAML:

    user:
      name: Ted
is equivalent to this YAML:

    user: {"name": "Ted"}
which is equivalent to this YAML/JSON:

    {"user": {"name": "Ted"}}


While true, this is what being a superset means. What it means is that any JSON document is, without modification, a valid YAML document and can be read with a YAML parser.


I think you mean "this isn't what".


That's exactly the case. Any YAML 1.2 parser can parse any valid json document as is.


YAML supports tagged nodes and multiple documents in a stream, which can’t be represented in standard JSON. (You could make up conventions, but only your parser would support them.)


It just is: Any valid JSON is valid YAML with the same semantics. It was intentionally designed that way (though not from the start iirc).

Basically take JSON, make quotes optional for strings and make curly brackets optional if the object is indented properly, and boom you've got (something like) YAML.


If all JSON is YAML but not all YAML is JSON, then YAML is a superset of JSON.


Any valid json can be parsed by any 1.2 yaml parser. That's how it gained popularity in the first place: you didn't have to migrate. Like ascii and utf8.


When YAML came out there was no large JSON usage to migrate from; the incumbent for configuration and "human readable serialization" was XML.


There was an interesting project showcased here a while ago, it was some kind of very minimal language, almost the most minimal theoretically possible, but with some interesting properties. Does anyone remember it?


I can't tell whether you're making a LISP joke.


Actually not.


I'm sorry, but the issues raised have more to do with a particular implementation -- one that is outdated -- than with YAML in general.

E.g. "on" should no longer be treated as true. That's a YAML 1.1 archaism. And 3.10 is going to have the same issue in JSON.

No doubt YAML could still be improved and maybe we'll get there eventually. 2.0 is a long-discussed goal, but the creators of YAML (who I have talked to extensively) are cautious, thoughtful and methodical and won't make that jump until they are sure of it.

Meanwhile 1.2 is a fairly good spec, and difficulties largely lie with implementors and users.


Most people that complain about YAML are like a person complaining that tennis shoes are terrible because when you try to rock climb in them, they don't have traction and you slip and fall. Tennis shoes are garbage! ..... Or maybe tennis shoes are good for what they were designed for, and you need a different kind of shoe for rock climbing.

YAML is a human-readable data-serialization language. Note the word readable, and that it's for data serialization. It's not intended to be human-writeable. It's not intended to configure an application (unless that configuration is the result of serializing the data object created after someone configures the app).

Since programmers don't really understand the different types/classes of file formats and what they're for, they choose the wrong formats for the wrong tasks. And then the people who are forced into using those formats for those programs find it's highly problematic - but they get pissed off at the formats and not the programmer!

I'm perfectly happy for people to create new data serialization formats, new configuration formats, new markup formats. But please avoid the trap of thinking "this format or that format sucks". None of them suck for what they were created for, when used correctly. YAML, for example, has huge advantages over most common data serialization formats. But those advantages fly out the window as soon as humans start writing YAML by hand, and then bolting on weird custom logic, as if this data serialization format were a higher-level language.


Surprised they didn't call it YAYAML


Not a fan of Yaml. An ini file (no space problems) for simple config, json for more complex structure, csv for multiple entries with the same fields


One step forward, one step back, one step sideways. It's a great idea to have line types, but the treatment of block text is terrible... at least with YAML I can embed Markdown and have the paragraphs come out correctly. Multiline keys... are they really so important that we have to jump through awkward-looking syntax hoops to get them? And please, why keep inline object definitions? Nobody needs them! It won't kill you to write a list over multiple lines. Some verbosity is good if it enforces structure.

I use YAML quite extensively for an internal tool, since it's pretty much the only human-usable format out there. But the weirdness of the language makes it error-prone, while the quality of available parsers is simply unacceptable. I ended up writing my own parser for a subset of unambiguous YAML (not unlike StrictYAML, but even simpler) that offers nice error messages etc., and it works very well for us.


I feel like there should be a data format that supports typing.

Basically just a Typescript object where types are defined on the key somehow.

Because JS/TS objects are basically JSON with a bunch of annoyances (too particular about commas, always-quoted keys, no comments) removed.



If your project is in python, the only correct file format to use for the config file is python. If your project is in ruby, the config file should be ruby. If you're in $SCRIPTING_LANGUAGE, your config file should be implemented in $SCRIPTING_LANGUAGE.

You can use `literal_eval()` (in Python; there are similar constructs in basically every scripting language) to prevent people from putting code in the config, but really, it's the computer of the person writing the config file, let them do what they want.

For compiled languages, it's a bit harder, but there are zero cases where a separate "configuration language" should exist.
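
For the Python case, a minimal sketch of the `literal_eval()` route (the file name is made up):

    import ast

    # The config file holds a single Python dict literal, e.g.
    #   {"host": "localhost", "port": 8080, "debug": True}
    with open("app.conf") as f:
        config = ast.literal_eval(f.read())

    # literal_eval only accepts literals (strings, numbers, lists, dicts,
    # sets, booleans, None); anything else raises instead of executing.
    print(config["port"])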


> NestedText only supports three data types (strings, lists and dictionaries)...

No numbers? Looks nice otherwise, but this seems like a very weird decision.

That said, JSON with its "I might overflow your number" attitude is not much better in this regard.


The issue with YAML is that it does not unambiguously distinguish between number/booleans and strings. JSON does, but only for numbers, booleans, and nulls. But there are many data types that need to be conveyed. For example, dates and quantities (numbers with units, such as $3.14 or 47kΩ). Such things are left to the application to interpret. Even JSON does not unambiguously distinguish between integers and reals. Even so, JSON pays for its lack of ambiguity by requiring all strings to be quoted, which adds clutter and the requirement for quoting and escaping. Thus, supporting those extra types comes at a cost.

I think NestedText is unique in leaving all leaf values as strings, so it does not need quoting or escaping.

Everything involves a compromise. YAML provides a lack of clutter at the cost of ambiguity. JSON is unambiguous, but comes with visual clutter. In both cases there are still lots of types they cannot handle and so must be passed on to the application.

The compromise with NestedText is that it provides simplicity and a lack of clutter by not supporting any data types for leaf values other than string. Thus, all interpretation of the text is deferred to the application. But fundamentally that is the best place for it, because only the application understands the context and knows what is expected.


Yes, but I can understand the rationale. There are many numeric types, and settling on some excludes use with others. If the application handles that, the configuration language can remain simple. His example where a version number 1.10 was round-trip converted to 1.1 was enlightening.


The worst config files I've ever encountered are dynamic YAML templates for Kubernetes. Mind boggling to figure out with crazy indentation rules and for loops. Kill me now.

Seriously, the more time I spend with other configuration formats the more I start appreciating JSON. It is a simple array and object format. Nothing can go wrong with it. No indentation rules. Easy to encode and decode. Easy to turn into actual arrays and objects in your programming language. Lack of types is not great, but use a type checker in your parser that throws exceptions and you are fine.

JSON is simple and that is why it is successful.
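
And the "type checker in your parser" bit can be as blunt as a few checks after parsing (a hedged sketch; the field names are invented):

    import json

    def load_config(text):
        cfg = json.loads(text)
        if not isinstance(cfg.get("port"), int):
            raise ValueError("port must be an integer")
        if not isinstance(cfg.get("hosts"), list):
            raise ValueError("hosts must be a list")
        return cfg

    print(load_config('{"port": 8080, "hosts": ["a", "b"]}'))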


I've always wondered why Ion is so unknown... https://amzn.github.io/ion-docs/


Here's my dream config language:

* start with JSON

* add comments (obviously)

* allow trailing commas (obviously)

* allow left-side and right-side to be independently and explicitly typed

* extensible type support (JSON's lack of u64 is absurd)

For example the following are all equivalent:

* "foo" : f32 = 13.37

* "bar" : f32 = "13.37"

* "baz" : f32 = 0x4155eb85 : hex

And extensibility to allow:

* "ham" : u64 = 123456789123

I would also be inclined to allow explicit config file hierarchy by allowing:

* configs can specify their base

* configs can specify their child

* must be acylic

I've thought about implementing this myself. But I've never written a real lexer or parser. It's on my side-project TODO list.


You might like RON[1]. It's far from perfect (and far from complete), but seems nice so far.

[1] https://github.com/ron-rs/ron


JSON has integer types/literals AFAIK, it's just that the Javascript implementation doesn't support them.


It has numeric literals. But all JSON number types are f64. Which is sufficient to represent every 32-bit integer, but it can not represent all 64-bit integers.

For code that needs a 64-bit integer, which is quite a lot, you have to encode the integer in a string.

Lack of 64-bit integer (either signed or unsigned) is a pretty common and well understood pain point when using JSON as an interchange format.


No, that's just how JavaScript reads them, because JavaScript only has float64. JSON itself allows arbitrary large decimal numbers, including uint64.


I suppose you're right that the formal JSON specification doesn't actually specify what a number is, only that it's a sequence of digits. TIL.

In practice, when you are using JSON as an interchange format between tools and systems it is exceptionally likely that you'll be constrained to 64-bit float as the underlying Number data type. Even if there is no JavaScript anywhere in your data pipeline. I have been burned by this repeatedly. For the record, I have never written a single line of JavaScript in my professional career.
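The failure mode is easy to demonstrate from Python, whose json module happens to keep integers exact, unlike consumers that store JSON numbers as doubles:

    import json

    big = 2**63 - 1                      # fits in i64/u64, but not exactly in f64
    doc = json.dumps({"id": big})

    assert json.loads(doc)["id"] == big  # Python round-trips it exactly...
    assert int(float(big)) != big        # ...but a double-based consumer rounds it
    print(int(float(big)) - big)         # off by 1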

In any case, my dream format would separate storage type from encoding type and formalize implementation extensions.



Here’s a popular opinion: YAML, used sparingly, without too much complexity, is fine, and provides a great balance of legibility and data density.

There are a few gotchas that are easy to catch with validations.

The end.


> YAML, used sparingly, without too much complexity, is fine

It's wishful thinking because the complexity is inherent, unfortunately. That's analogous to saying "programmers should not write bugs". Humans are fallible and error-prone; it's not going to happen unless a language is restricted in such a way that a whole category of bugs is impossible by design. YAML's design, however, is sprawling, so despite best intentions people will run into the problems its complexity causes. Possible ways out are restrictions of the design (e.g. StrictYAML) or whole replacements (e.g. NestedText).
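For what it's worth, the StrictYAML route looks like this (a tiny sketch assuming the strictyaml package):

    from strictyaml import load   # the strictyaml package, assumed installed

    # without a schema, StrictYAML refuses to guess types: every scalar stays a string
    print(load("version: 3.10\nenabled: on").data)
    # -> {'version': '3.10', 'enabled': 'on'}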

> There are a few gotchas that are easy to catch with validations.

Does this actually exist? If not, who's writing the code for these validations? How can we make sure everyone who needs to use them is using them?

The idea sounds good on paper, but it is not workable in practice, because patching over the spec's problems after the fact requires global coordination.


I use yaml as an alternative to .properties files. It's boring. Yes, quote your strings. Know the damn config language. It's not going to kill you like a table saw.


Naive question: is it viable if we started using Lua[JIT] for configuration, like NeoVim and likely others do?

Can Lua's interpreter be compiled without some "dangerous" APIs enabled (whichever those might be) and thus be made viable as an embeddable and isolated configuration engine?

I'm just getting sick and tired of all the half-baked configuration formats and want to look for something that's both more flexible but still strict and unambiguous. I wonder if it's possible?

As I said, naive question.
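To make the question concrete, here is a rough sketch of the idea via the lupa binding (assumed installed); nulling out a few globals is nowhere near a real sandbox, it just shows the shape of an embedded Lua config:

    from lupa import LuaRuntime   # lupa embeds Lua/LuaJIT in Python, assumed installed

    lua = LuaRuntime(register_eval=False)
    # crude restriction: drop the obviously dangerous standard libraries;
    # a serious setup would build a proper restricted environment instead
    lua.execute("os = nil; io = nil; package = nil; require = nil; dofile = nil; loadfile = nil")

    config = lua.eval("{ name = 'demo', replicas = 2 + 1, ports = {8080, 8443} }")
    print(config["name"], config["replicas"], list(config["ports"].values()))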


> Naive question: is it viable if we started using [a Turing-complete programming language]

This is actually a good question. The people who are not asking and going right ahead with that plan are doing a dangerous thing. <https://www.cs.dartmouth.edu/~sergey/langsec/occupy/> It's not viable because a subset of people would like to have the following properties upheld:

• Parsing configuration should be decidable and finish in finite time.

• Parsing configuration should not be a security exploit.

• I should not have to implement the Emacs runtime just to parse its configuration file.

> something that's both more flexible but still strict and unambiguous. I wonder if it's possible?

This really depends on what you mean by these words; I'm interested to hear your idea in detail. Meanwhile, have a look at Dhall <https://dhall-lang.org/> and the other languages mentioned in <https://news.ycombinator.com/item?id=29221643> and compare.


JSON is often so hard to read I have to open a file in a formatting tool. It's also hard to tell if there's a simple missing quote or brace. Those are extra steps I don't have with YAML. For many things, JSON is great. For simple loaders, YAML gets it done quick and easy. They are functionally interoperable for the most part so I use what's best.

I'd also add the author has a competing framework so maybe there's a bit of pre-existing bias.


When you miss a quote or a brace in JSON, the JSON fails to parse. When you make a similar minor mistake in YAML, you often end up with a valid but nonsensical document with completely incorrect structure.

I don't want the language to be so flexible that simple, common errors go unnoticed. I WANT the parser to tell me at parse time if I screwed something up. It's a dynamic similar to the static/dynamic typing debate.
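Both behaviours are easy to reproduce (PyYAML assumed for the YAML side):

    import json
    import yaml

    # JSON: the typo is caught at parse time
    try:
        json.loads('{"server": {"host": "example.com", "port": 80}')   # missing brace
    except json.JSONDecodeError as e:
        print("caught:", e)

    # YAML: forgetting to indent the nested keys still parses, just wrongly
    print(yaml.safe_load("server:\nhost: example.com\nport: 80"))
    # -> {'server': None, 'host': 'example.com', 'port': 80}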


I'm not sure I'd trust manually editing either in a plain text editor. YAML cares about whitespace yet silently treats most combinations as valid, and it has surprising pitfalls in type interpretation if you aren't consistently on top of it when entering data. JSON has a bunch of additional characters, but at least if you forget one there's a higher chance it simply tells you it's wrong when you try to use it.

Thankfully it's very rare I ever have to open any config file in a text editor that isn't aware of dozens of formats so things like a missing brace are always suggested anyways and it isn't much a problem regardless of format.


I think the advantages over the csv example are not quite advantages.

"tall and skinny". Well it looks skinny, but in bytes is actually fatter than the csv. Similarly, for comparing stuff, the columnar display works better.

My go-to is ini. Simple. Everything is a string. No assumed hierarchy in section naming. Just key/value pairs under sections. It is up to the using application to parse them how it sees fit.
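That is exactly how Python's configparser behaves out of the box; a small sketch with made-up keys:

    import configparser

    ini = "[server]\nhost = example.com\nport = 8080\ndebug = yes\n"
    cp = configparser.ConfigParser()
    cp.read_string(ini)

    # everything is a string until the application says otherwise
    assert cp["server"]["port"] == "8080"
    port = cp["server"].getint("port")         # explicit, app-driven conversion
    debug = cp["server"].getboolean("debug")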


I like the complexity of YAML. When people choose YAML as the driver for their shitty tools that I'm forced to use, at least I can use YAML features to cope with the brain damage. Of course, you can only go so far with e.g. anchors.

Stop using configuration languages as programming languages and leave YAML alone as configuration language.

If you know about the dangers of unsafe YAML, why don't you just replace load() with safe_load()?
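For the record, the difference in PyYAML (the payload below is the classic arbitrary-object tag):

    import yaml

    doc = "exploit: !!python/object/apply:os.system ['echo pwned']"

    # safe_load builds only plain scalars, lists, and dicts, so the tag is rejected
    try:
        yaml.safe_load(doc)
    except yaml.YAMLError as exc:
        print("rejected:", exc)

    # yaml.unsafe_load (or load() with an unsafe Loader) would actually run the command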


1) No, no, no, please avoid stuff that needs to be indented in 2021. YAML is more than enough to prove that is a silly thing to do.

2) If your configuration file is too big, maybe you're doing it wrong.

3) TOML is by far the least ugly format for configurations that I've ever used. Far from perfect, but it's predictable and doesn't require indentation.


YAML has problems (mainly that it requires an overly complex parser), but the alternatives presented here look like a step backward. Easier for the parser, but less flexible for the human (one nitpick with NestedText that stood out immediately: can I collapse arrays and dictionaries into a single line like in YAML with [] and {}?)


yep

    python-version:
        - 3.8
        - 3.9
        - 3.10
can be written:

    python-version:
        [3.8, 3.9, 3.10]

same is true for dictionaries.
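Both spellings load to the same structure, e.g. with PyYAML (though note the unquoted versions become floats either way, so 3.10 turns into 3.1 unless quoted):

    import yaml

    block = "python-version:\n    - 3.8\n    - 3.9\n    - 3.10\n"
    flow = "python-version: [3.8, 3.9, 3.10]"

    assert yaml.safe_load(block) == yaml.safe_load(flow)
    print(yaml.safe_load(flow))   # {'python-version': [3.8, 3.9, 3.1]}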


I really don't know where all this XML hate comes from; it has everything I need and I don't find it hard to read, though maybe I'm just used to it. How often do you change your configuration files? My projects' pom.xml files change only occasionally, for a version upgrade or new dependencies, and that's it.


I wish we could use [nix](https://nixery.dev/nix-1p.html) for config files.

It's nice, simple, and allows variables and functions.

Jsonnet has a similar feature set, but I find it nigh unreadable.


The problem with the version numbers here seems to be that they've inappropriately been written as numbers rather than as strings, so 3.1 and 3.10 are equal. It's not the language's fault if the author chooses the wrong datatype.


The trouble is that people insist on writing trees in plain text editors. Trees should be written in tree editors. Then you can't get the delimiters wrong.

"NestedText was inspired by YAML, but eschews its complexity. "

Mandatory XKCD: https://xkcd.com/927/

Fun fact: HTML was invented because SGML was "too complicated".


What is a tree editor?



EDN is the best I've used, I dream for it to become wildly adopted.

It has powerful types, can be extended, has a clean and explicit syntax, is whitespace independent, and is easy for both humans and machines to read. It's like JSON done right.


Just learn about EDN already.


We need a lot more tooling to support EDN everywhere well enough though. If you use Clojure(Script) it is absolutely natural to use EDN everywhere. (We do just that at OrgPad.com which you can see in the OrgPage about Clojure: https://www.orgpad.com/o/D6TrZny7tNhYqWygzax7Wx and for the EDN just add download: https://www.orgpad.com/o/D6TrZny7tNhYqWygzax7Wx/download)

EDN is less ideal for Python and other similarly high-level languages, and there just aren't any libraries for C/C++ from what I have seen. In general it's XML, JSON, YAML, then custom binary or text formats, in that order, for most software, from what I can guess. Remember, .docx, .xlsx and other document formats are also basically just XML-based configurations for programs/ interchange formats.


Isn't NestedText similar to using recfiles[0]?

[0]: https://www.gnu.org/software/recutils/


Is there a yaml cli equiv to jq for json?

https://stedolan.github.io/jq/

Is there a syntax for specifying yaml schema like coach?


> Is there a yaml cli equiv to jq for json?

You could have found this by searching the Web for "jq for yaml". <https://kislyuk.github.io/yq/>

> yaml schema

You could have found this by searching the Web for "yaml schema". <https://rx.codesimply.com/> <https://web.archive.org/web/2021/http://www.kuwata-lab.com/k...>

In practice, schemas designed for operating on the JSON infoset (not the serialisation) will also work.
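For example, validating YAML data against a JSON Schema works fine; a minimal sketch assuming PyYAML and the jsonschema package:

    import jsonschema
    import yaml

    schema = {
        "type": "object",
        "properties": {"port": {"type": "integer"}},
        "required": ["port"],
    }

    doc = yaml.safe_load("port: 8080")
    jsonschema.validate(instance=doc, schema=schema)   # raises ValidationError on mismatch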


I, for one, prefer to write python that generates JSON.

- All the flexibility of python, plus commenting!

- json module can test or dump as a flat blob o' text or pretty-printed

#ProblemSolved

#OKjustMoved
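Roughly like this (all names invented for the sketch):

    import json

    services = ["api", "worker", "scheduler"]   # made-up example data

    config = {
        "log_level": "info",
        # comments, loops, and arithmetic come for free in Python
        "services": {name: {"replicas": 2, "port": 8000 + i}
                     for i, name in enumerate(services)},
    }

    print(json.dumps(config, indent=2, sort_keys=True))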


In the first example:

Do people really represent software versions as floats?

3.10 being converted to 3.1 makes sense. Use a string instead.


HCL anyone? It does everything right in my experience (nomad, using it as a format for my own apps)


what's wrong with dhall?


Does Dhall have a .NET implementation yet?


I’m confused: why would Dhall need a specific .net implementation?


Because you don't want to bring in a separate language with a separate runtime with unknown interop story just to read config files?


You don’t do that pre-launch?

I use it to generate the config files, and pass the generated config to the application the same way you’d pass any other json/yaml/etc to your app. Neither ever talk to each other, and Dhall doesn’t exist within our runtime environment.


Dhall must be interpreted to find the finished config, correct? I would like to do that inside my application if possible.


You can always convert it to JSON before consumption.
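For example, by shelling out to the dhall-to-json tool at load time (one of several possible arrangements; the file name is made up and the tool is assumed to be on PATH):

    import json
    import pathlib
    import subprocess

    # dhall-to-json reads a Dhall expression on stdin and prints JSON
    expr = pathlib.Path("config.dhall").read_text()
    out = subprocess.run(["dhall-to-json"], input=expr,
                         capture_output=True, check=True, text=True)
    config = json.loads(out.stdout)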


I've been doing this when writing CloudFormation, and it helps a lot.


also don't like yaml. hjson is an obviously better choice https://hjson.github.io/


seriously, why don't people just use EDN and move on? Rich types, proper key/value pairs, keywords instead of strings everywhere, support for fractions and comments...


please dog, end this plague of whitespace sensitive parsers.


the boolean thing is dumb, otherwise yaml is a fine format


The boolean thing with "on" is also not a problem in the current version of YAML; the 1.2 spec no longer treats on/off/yes/no as booleans.


What does everyone think of the prototext format?


927


XML above all!


XML in theory is a great format for what it represents — a tree of heterogeneous typed simple key/value pairs.

The problem is that almost no data people actually want to represent has this form, and every way people have tried to beat XML into representing other things (e.g. lists and dicts) is kludgy and awkward.


eff it, let’s use XML…


Mandatory xkcd

https://xkcd.com/927/



