I don't like YAML and would like to move on, but I hope we don't move onto this.
I think it's crazy that when I add a string to an inline list, I may need to convert that inline list to a list because this string needs different handling. I think it's crazy that "convert an inline list to a list" is a coherent statement, but that is the nomenclature that they chose.
I don't like that a truncated document is a complete and valid document.
But what is most unappealing is their whitespace handling. I couldn't even figure out how to encode a string with CR line endings. So I downloaded their Python client to see how it did it. Turns out, they couldn't figure it out either:
I wish people would stop trying to write programs for which there are no interpreters, compilers, or linters:
name: Install dependencies
run:
    > python -m pip install --upgrade pip
    > pip install pytest
    > if [ -f 'requirements.txt' ]; then pip install -r requirements.txt; fi
That is a program that is hiding in the bowels of a "nestedtext" document ... It is no better than a program that is hiding in the bowels of a JSON or YAML document.
We all have to deal with this, but it is beyond stupid.
I don't think it matters much if this is inline or in a separate file. If you want to test your tests, "yq -r .run input.yaml | sh -e" works as well.
In fact, if I really wanted to test my tests, I'd say that directly testing the corresponding clause is the more comprehensive approach. For example, what if someone accidentally changes the line to read:
run=/path/to/install-scriptq
? Then your test of "install-script" will not catch anything. But if your test runs "yq -r .run | sh -e", then it will catch that error. And you can still forward to a script if you wanted to.
So let's keep inline scripts; they are a perfectly reasonable approach for just a few commands.
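That "test the exact clause from the config" idea can be sketched in stdlib Python too. This is a hypothetical illustration, not anyone's actual CI setup: JSON stands in for YAML so no third-party parser is needed, and the step name and command are made up.

```python
import json
import subprocess

# Hypothetical pipeline step; JSON stands in for YAML here so that only
# the standard library is needed.
step = json.loads('{"name": "Install dependencies", "run": "echo ok"}')

# Run the exact command string from the config with `sh -e`, so a typo in
# the config itself (not just in a script it points at) fails the test.
result = subprocess.run(["sh", "-e", "-c", step["run"]],
                        capture_output=True, text=True)
assert result.returncode == 0
assert result.stdout.strip() == "ok"
```

The point is the same as the yq pipeline above: the test exercises the string that will actually run, so a stray character in the config is caught.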
Depending on the source control tool you may lose syntax highlighting; you most likely lose linters, and even copying those multi-line commands into a shell becomes cumbersome. I consider the inlining example from the GP's comment awful.
It would be nice if YAML wasn't horrendously abused the way it is. You have CI pipelines that let you construct DAGs to represent your builds, but you need several thousand lines of YAML and a load of custom parsing to get programming constructs in the string types, for example. And then each provider has its own way of providing those.
I don't have to re-read manuals describing how to do if/else in Ruby or Java or Lisp, but as soon as yaml and some 'devops' tooling is involved, I have to constantly jump back and forth between the reference and my config.
The main point being that the problem isn't the file format but the products that continue to push it, presumably because hacking stuff on top of `YAML.parse` is less effort than designing something that fits the purpose.
Yeah. A lot of times I find myself thinking YAML is like a really awful programming language. You can sort of do conditional logic and loops, but usually I find it hard to follow what's going on.
For build systems, I always liked the idea of Gradle where the core functionality was simple and declarative, but with the option to use a real programming language for things that weren't simple. For example, integrating installers or form builders (pre-processing) into a build are things I would consider non-trivial if there aren't official plugins, but it was still relatively easy to do with Gradle.
The biggest problem I always had with Gradle was that I didn't like Groovy, and I always thought there was a missed opportunity to have a statically typed build system with a solid API/contract and all the fancy tooling like auto-complete that you get with statically typed languages.
I see JSON5 mentioned a lot in the comments. In terms of CI / build systems, I feel like something built with JSON5/TypeScript could be really good. I'd be really happy using TypeScript for configuring things like build systems where there shouldn't really be an argument for needing it to be usable by non-programmers.
Personally I feel like I've spent way too much of my life debugging YAML syntax issues.
If you're happy to go lispy, there's Babashka [1], a Clojure without the JVM. It has built-in support for 'tasks' designed to make writing build scripts easy.
My experience with Kotlin gradle scripts is worse than Groovy. For example, given the following valid groovy/kotlin gradle program:
dependencies {
}
What would you expect to see between the curly braces? IntelliJ IDEA, which supposedly has full support for the Gradle DSL both for Groovy and Kotlin, offers only generic suggestions. Common function calls such as "implementation()" or "testImplementation()" are not suggested. If you do use those functions, no suggestion is made for their parameters. Because Gradle's DSL is built on top of a general-purpose language, it loses the benefits of a DSL (constraining the set of possible configurations and guiding the user towards valid configurations).
The key benefit of the Kotlin DSL is that in this precise example, IDEA does suggest valid stuff:
https://imgur.com/a/vFYNIU1
Kotlin DSL is miles ahead of Groovy in terms of discoverability and IDEA integration. With Groovy DSL, most of the build script is highlighted with various degrees of errors and warnings; with Kotlin DSL, if something is highlighted, it is a legitimate error, and vice versa - if no errors are detected by IDEA, then it is almost certain to work.
There were rough spots of IDEA integration a couple years ago, but now it is close to perfect, within Gradle's limits of course (due to the sheer dynamic nature of it, some things are just not possible to express in a static fashion, unfortunately). The biggest obstacle to Kotlin DSL use might be that some of the plugins use various Groovy-specific features which are hard to use from Kotlin, but thankfully most of the plugins either fix those, or are rewritten in Java or Kotlin instead.
There's a huge gap in Java build tool space for a tool that is simple and easy to learn and can cover 90% of projects' requirements. I have this feeling that we're in the "subversion" days of java build tools and the day someone introduces "git" people will wonder why we suffered with Gradle and Maven for so long. If I had time I would be looking into building this.
Predating Gradle was a tool called Gant. It was simple, intuitive, and did 90% of what every project could want. Ironically, it was Groovy-based as well. But instead of Gradle's arcane magic-based configuration it was literal and direct, a simple extension of the Ant that came before it. I liked it much better, but someone decided they could make a business out of Gradle, Gant got deprecated, and here we are.
I found it fairly simple to build Gradle plugins with Kotlin. If anything, the problem was just having the patience to actually find the right documentation in the first place, and understand what was being described. The main problem I faced there was that I wanted a plugin to configure dependencies for the project it would run against and the docs around dealing with dependencies and detached configurations were a bit confusing.
I do find it curious that a lot of these tools get seen as basic task runners despite offering much more potential.
It's always the same trajectory with declarative programming. It starts with "it's just configuration, we need something simple". Then users come with use cases which are more complex. Then you have programming language on top of configuration language syntax.
Very much so. A good few years ago I got annoyed that I couldn't change mutt's configuration the way that I wanted, because it has a built-in configuration language which doesn't allow complicated conditionals etc.
(There are workarounds, and off-hand I can't think of a great example, but bear with me.)
In the end I wrote a simple console-based mail-client, which used a Lua configuration file. That would generally default to using hashes, and key=value settings, but over the time I used it things got really quite configurable via user-defined callbacks, and functions to return various settings.
For example I wrote a hook called `on_reply_to`, and if you defined that function in your configuration file it would be invoked when you triggered the Reply function. This kind of flexibility was very self-consistent, and easy to add using an embedded real language.
Later I added some hacks to a local fork of GNU Screen; there I just said:
* If the ~/.screenrc file is executable, then execute it, and parse the output.
That let me say "If hostname == foo; do this ; otherwise do this .." and get conditionals and some other things easily. Another example was unbinding all keys, and then only allowing some actions to be bound. (I later submitted "unbindall" upstream, to remove the need for that.)
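That "executable config" trick translates to any tool that reads a config file. A minimal sketch in Python, with the function name made up for illustration:

```python
import os
import subprocess

def load_config(path):
    """If `path` is executable, run it and treat its stdout as the config;
    otherwise read the file verbatim (the .screenrc hack described above)."""
    if os.access(path, os.X_OK):
        return subprocess.run([path], capture_output=True,
                              text=True, check=True).stdout
    with open(path) as fh:
        return fh.read()
```

The config file itself can then be a shell or Python script that branches on hostname, environment, or anything else, while the consuming tool still sees only its own plain syntax.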
What's really sad is that XML had a much better ecosystem around this for ages. I'd very much rather deal with XQuery or even XSLT to construct build trees, than the current crop of ad-hoc YAML preprocessors. At least the XML stuff had a consistent type system underneath!
XSLT is an absolute horror and not something I would want to deal with again. It feels like some weird academic experiment in an XML declarative programming language that should never have made it to print.
If something needs the flexibility of a programming language, why not use a real one that's been well tested for writing other programs? These various config file programming systems always end up creating something notorious that everyone tries to avoid having to work on.
XQuery is, in many ways, XSLT with better syntax. It doesn't have the pattern-matching transforms that are the T in XSLT - but for configs, I don't think it makes a big difference.
Also, I don't think many realize that the stack has evolved since early 00s. XSLT 1.0 was a very limiting language, requiring extensions for many advanced scenarios. But there's XSLT v3.0 these days, and XPath & XQuery v3.1, with some major new features - e.g. maps and lambdas. Granted, this doesn't fix the most basic complaint about XSLT - its insanely verbose syntax - but even then, I'd still take XSLT over ad-hoc YAML-based loops and conditionals.
I will take the verbosity of XML any day over YAML wrestling (complex YAML configs, of course). There are simply too many "implicit rules" in YAML. It's why I prefer Python over Ruby and Perl. Generally, though, TOML has been good enough for me to do lots of fairly large config files that are easy for humans and machines to parse.
XML died because too many configurations turned what should be a 'prop' into an inner tag -- and it doesn't help that XML doesn't really give guidance as to when to use which. And, of course, when you deserialize XML, the innerText always ends up in a strange place, so it's never really clear what the right way to handle it is.
Honestly, I think using an embedded scripting language, like lua or even javascript, would be a much better fit for these use cases than trying to make yaml do something it wasn't designed for.
Ironically, having used cdk8s[1] for dealing with kubernetes infrastructure, that's the one thing where I've actually preferred yaml. That said, k8s resource definitions are pure config so there's no need to try and hack extra bits on top of a serialized data structure.
I really like the approach of Buildkite CI -- they use YAML, but that YAML can be produced by an executable script.
So you write YAML by hand for trivial cases, but once it gets complex, you can just drop back to shell/python/ruby/node/whatever, implement any complex logic, and serialize the result to plain YAML.
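A minimal sketch of that idea in Python; since JSON is, for practical purposes, a subset of YAML, emitting JSON avoids any third-party dependency. The step labels and commands below are made up:

```python
import json

# Generate a pipeline with one test step per shard instead of hand-writing
# repetitive YAML with loops bolted on top.
steps = [{"label": f"tests, shard {i}", "command": f"pytest --shard-id {i}"}
         for i in range(3)]
pipeline = json.dumps({"steps": steps}, indent=2)

# In a real setup this output would typically be piped to Buildkite's
# pipeline upload command rather than printed.
print(pipeline)
```

All of the looping and branching happens in a real language with a real debugger, and the CI system only ever sees plain, static data.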
The author seems to use misfeatures of a particular implementation to tar all implementations with. The round-tripping issue is not a statement about YAML as a markup language, much in the way a rendering bug in Firefox is not a statement about the web.
Stepping back a bit, YAML is good enough, and this problem has been incrementally bikeshedded since at least the 1970s, it is time to move on. Human-convenient interfaces (like YAML, bash, perl) are fundamentally messy because we are messy. They're prone to opinion and style, as if replacing some part or other will make the high level problem (that's us) go away. Fretting over perfection in UI is an utterly pointless waste of time.
I don't know what NestedText is and find it very difficult to care; there are far more important problems in life to be concerned with than yet another incremental retake on serialization. I find it hard to consider contributions like this to be helpful or to represent progress in any way.
If you can write a bad YAML document because of those mis-features/edge cases, I'd say you've already lost.
Humans are messy, but at the end of the day the data has to go to a program, so a concise and super simple interface has a lot of power to it for humans.
Working at a typical software company with average skill level engineers (including myself), no one likes writing YAML. But everyone is fine with JSON.
I think it's a case of conceptual purity vs what an average engineer would actually want to use. And JSON wins that. If YAML was really better than JSON, we'd all be using that right now.
So does it really matter if YAML is superior if >80% of engineers pick JSON instead?
I would argue that you can write something poor and/or confusing in any markup language that is sufficiently powerful.
Conversely, if a markup language is strict enough to prevent every inconsistency, then it's not powerful enough or too cumbersome to use to be generally useful.
I'd say that YAML is anything but conceptually pure, with all the arbitrariness, multitude of formatting options, and parsing magic happening without warning.
If you want conceptual purity (and far fewer footguns), take Dhall.
> Stepping back a bit, YAML is good enough, and this problem has been incrementally bikeshedded since at least the 1970s, it is time to move on
Nah, in the 1970s we had Lisp S-expressions that completely solved the problem, and everything since then has been regressions on S-expressions due to parenthesis phobia.
After hearing that thing about the country code for Norway, I became convinced that YAML has to just die. Become an ex-markup language. Pine for the fjords. Be a syntax that wouldn't VOOM if you put 4 million volts through it. Join the choir invisible, etc.
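For anyone who hasn't hit it: under YAML 1.1's implicit typing rules, a handful of unquoted scalars resolve to booleans. A pure-Python sketch of that resolution rule, transcribed from the YAML 1.1 bool type definition:

```python
import re

# The YAML 1.1 bool type matches these plain (unquoted) scalars; an
# unquoted country code "NO" therefore loads as the boolean false.
YAML11_BOOL = re.compile(
    r"^(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)

assert YAML11_BOOL.match("NO")          # Norway, swallowed as a boolean
assert not YAML11_BOOL.match("Norway")  # longer strings are safe
```

YAML 1.2 dropped most of these (only true/false remain), but plenty of deployed parsers still implement the 1.1 behavior.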
S-expressions don't solve the problem at all, you just get to fractally bikeshed all over again about what semantics they have and what transformations are or aren't equivalent. Does whitespace roundtrip through S-expressions? Who knows. Are numbers in S-expressions rounded to double precision on read/write? Umm, maybe. How do I escape a ) in one of my values? Hoo boy, pick any escape character you like and there's an implementation that does it.
S-expressions don’t completely solve the problem: they don’t have a syntax for maps, and in practice there are at least two common incompatible conventions: alist or plist?
Obviously the application has to interpret the Lisp object resulting from reading the S-expression, just like it has to interpret any JSON, YAML, or anything else that it reads. So for maps you can, as you mention, use alists or plists. Regarding other stuff mentioned: none of the encodings are supposed to be bijective (the writer emits the exact input that the reader ingested). Otherwise, for example, they couldn't have comments, unless those ended up in the data somehow. There is ASN.1 DER if you want that, but ASN.1 is generally disastrous.
Stuff like escape chars were well specified in Lisps of the 1970s (at least the late 1970s), including in Scheme (1975). Floating point conversion is a different matter (it was even messier in the pre-IEEE 754 era than now) but I think the alternatives don't handle it well either. You probably have to use hexadecimal representation for binary floats. Maybe decimal floats will become more widely supported on future hardware.
A type-checked approach can be seen in XMonad, whose config files use Haskell's Read typeclass for the equivalent of typed S-expressions.
Solutions for this problem that I've used in my own S-expression config files:
1. Use only alists for maps because they prevent off-by-one errors.
2. Allow plists because they're less verbose than alists and use reader macros to distinguish them, and allow the reader macro definitions to be in the same file.
Most of the time I use option 1 because it's simpler.
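For readers without a Lisp background, the two conventions look like this when transliterated into Python (all names made up):

```python
# An alist keeps each key/value pair in its own cell; a plist flattens
# keys and values into one alternating sequence.
alist = [("host", "127.0.0.1"), ("port", 8080)]
plist = ["host", "127.0.0.1", "port", 8080]

def alist_to_dict(pairs):
    return dict(pairs)

def plist_to_dict(items):
    if len(items) % 2:                  # the off-by-one hazard of plists
        raise ValueError("plist has a key with no value")
    return dict(zip(items[0::2], items[1::2]))

assert alist_to_dict(alist) == plist_to_dict(plist)
```

The alist's pairing is structural, so a missing value is a shape error at read time; in a plist it silently shifts every subsequent key into value position, which is the off-by-one error mentioned above.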
I would argue that, in a data markup language, there shouldn't be a syntax for maps. Whether a given sequence should be treated as key-value pairs, and whether keys in that sequence are ordered or unordered, is something that is best defined by the schema, just like all other value types.
>Human-convenient interfaces (like YAML, bash, perl) are fundamentally messy because we are messy
I don't know what to make of this statement, it has so much handwaving built-in. The most charitable interpretation I can find is that by 'Human-convenient' you simply meant the quick-and-dirty ideology expressed in Worse Is Better: Does job, makes users contemplate suicide only once per month, isn't too boat-rocking for current infrastructure and tooling.
Taken at face value (without special charitable parsing), this statement is trivially false. Python is often used as a paragon of 'Human-convenience', and while I sometimes find this trope tiring, whatever Python's merits and vices, it's _definitely_ NOT messy in design.
Perl is the C++ of scripting languages, it's a very [badly|un] designed language widely mocked by both language designers and users. Lua and tcl instead are languages literally created for the sole exact purpose of (non-) programmers expressing configuration inside of a fixed kernel of code created by other programmers, and look at their design: the whole of tcl's syntax and semantics is a single human-readable sentence, while lua thought it would be funny if 70% of the language involved dictionaries for some reason. These are extremely elegant and minimal designs, and they are brutally efficient and successful at their niches: tcl is EDA's and Network Administration's darling, and lua is used by game artists utterly uninterested in programming to express level design.
'Humans are messy' isn't a satisfactory way to put it. 'Humans love simple rules that get the job done' is more like it. But because the world is very complex and exception-laden, though, simple rules don't hug its contours well. There are two responses to this:
- you can declare it a free-for-all and just have people make up simple rules on the fly as situations come up, that's the Worse Is Better approach. It doesn't work for long because very soon the sheer mountain of simple rules interact and create Lovecraftian horrors more complex than anything the world would have thrown at you. Remember that the world itself is animated by extremely simple rules (Maxwell's equations, Evolution by Natural Selection, etc.), it's the multitude and interaction of those simple rules that give it its gargantuan complexity and variety.
- you stop and think about The One Simple Rule To Rule All Rules, a kernel of order that can be extended and added to gradually, consistently and beautifully.
The first approach can be called the 'raster ideology', it's a way of approximating reality by dividing it into a huge number of small, simple 'pixels' and describing each one separately by simple rules. I'm not sure it's 'easy' or 'convenient', maybe seductive. It promises you can always come up with more rules to describe new patterns and situations, and never ever throw away the old rules. This doesn't work if your problem is the sheer multitude and inconsistency of rules. The second approach is the 'vector ideology', it promises you that there is a small basis of simple rules that will describe your pattern in entirety, and can always be tweaked or added to (consistently!) when new patterns arise, the only catch is that you have to think hard about it first.
>and lua is used by game artists utterly uninterested in programming to express level design
Rather short-sighted and dismissive of a successful programming language that's evolved over 20+ years. Lua is a great general-purpose programming language that specializes not in "game making for non-programmers" but in ease of embedding, extensibility, and data description (like a config language). There's a whole section in Programming in Lua[1] to that effect. The fact that it's frequently used in games is credit to its speed, size, and great C API for embedding, not to any particular catering to game designers.
You misunderstood me. I love lua and I wasn't being dismissive of it, I was using the first example that came to my mind to counter the claim that a convenient language has to be messy. Just because that was the example used doesn't mean there is an implicit "and that's the only thing it's good for" clause I'm implying there: if someone said "Python is used by scientists utterly uninterested in programming to express numerical algorithms" would you understand that to be a dismissive remark against Python?
Being used by non-programmers utterly uninterested in programming to solve problems is the highest honor any programming language can ever attain, because it means that the language is well-suited to the domain enough (or flexible enough to be made so) that describing problems in it is no different than writing thoughts or design documents in natural language. This is the single most flattering thing you can ever say about a language, not a dismissive remark.
It's really sad to see the pervasiveness of JSON. For one thing its usage as a config file is disturbing. Config files need to have comments. Second, even as a data transfer format the lack of schema is even more disturbing. I really wish JSON didn't happen and now these malpractices are so widespread that it's hurting everyone.
JSONC. JSON with comments. And even if your favorite parser does not support it natively it’s not so hard to add with a very simple pre-lexer step.
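A sketch of such a pre-lexer in Python: it strips `//` and `/* */` comments outside string literals. This is a minimal illustration, not a full JSONC implementation (it only handles backslash escapes well enough to skip escaped quotes):

```python
import json

def strip_jsonc(text: str) -> str:
    """Remove // and /* */ comments outside string literals."""
    out, i, n = [], 0, len(text)
    while i < n:
        c = text[i]
        if c == '"':                      # copy string literals verbatim
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            out.append(text[i:j + 1])
            i = j + 1
        elif text.startswith("//", i):    # line comment: skip to end of line
            nl = text.find("\n", i)
            i = n if nl == -1 else nl
        elif text.startswith("/*", i):    # block comment: skip past */
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(c)
            i += 1
    return "".join(out)

doc = json.loads(strip_jsonc('{"a": 1, /* note */ "b": 2} // done'))
assert doc == {"a": 1, "b": 2}
```

After stripping, any standard JSON parser accepts the result, which is exactly the "very simple pre-lexer step" being described.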
JSON schemas exist and they’re ok for relatively simple things. For more complex cases I find myself wishing I could just turn Typescript into some kind of schema validation for JSON.
> For more complex cases I find myself wishing I could just turn Typescript into some kind of schema validation for JSON.
Not sure if this is what you're looking for, and whether it's powerful and expressive enough for your use case, but you can use typescript-json-schema¹ for this, and validate with eg ajv.
I've struggled with this in Java recently. At first I used Jankson, which supports the complete JSON5 spec, but later we figured out we could configure the standard Jackson JSON package to accept the things we actually need and use.
Seems to me that YAML just needs type/schema support to be less of a hurdle.
As an alternative, the encoding/decoding roundtrip using protobuf seems reasonable to me, catches the footgun of using floating-point version numbers (it becomes a parse error), whitespace/multiline concatenation being more obvious, and allowing comments (compared to JSON):
> Seems to me that YAML just needs type/schema support to be less of a hurdle.
Unfortunately YAML already got type support, which made it easier to round-trip, but also insecure: creating a type calls constructors with possibly insecure side effects, which was e.g. used to hack Movable Type.
JSON Schema is an official thing that exists and has implementations in all major languages. Personally I’m very glad that it’s an opt-in addition rather than a requirement.
I agree, but I would recommend JSON5 as the solution. Not YAML or this abomination.
JSON5 has many advantages:
* Superset of JSON without being wildly different. I know YAML is a superset of JSON but it's completely different too. Insane.
* Unambiguous grammar. YAML has way too many big structure decisions that are made by unclear and minor formatting differences. My work's YAML data is full of single-element lists that shouldn't be lists for example.
* Comments, trailing commas
* It's a subset of Javascript so basically nothing new to learn.
* It has an unambiguous extension (.json5). I think JSONC would be a reasonable option but everyone uses the same extension as JSON (.json) so you can never be sure which you are using. E.g. `tsconfig.json` is JSONC but `package.json` is just JSON (to everyone's annoyance).
* Doesn't add too much of Javascript. I wouldn't recommend JSON6 because it's just making the format too complicated for little benefit.
Unfortunately it doesn't really because of the extension issue I mentioned. Certain file names (like `tsconfig.json`) are whitelisted to have JSONC support, but any random file `foo.json` will be treated as JSON and give you annoying lints if you put comments and trailing commas in.
Tools that use JSON as configuration format could simply allow certain unused keys (e.g. all keys starting with #) and promise never to use them. Then author can write their comments with something like:
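A hypothetical sketch of that convention in Python, including the one-line filter a consuming tool would need (the key and value below are made up for illustration):

```python
import json

# Hypothetical convention: keys starting with "#" carry comments, and the
# consuming tool promises to ignore them.
raw = json.loads('{"#comment-1": "pin until v2 ships", "version": "1.10"}')
config = {k: v for k, v in raw.items() if not k.startswith("#")}
assert config == {"version": "1.10"}
```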
There's a lot of JSON tooling, and it's liable to interact badly with this. For example, a formatter might re-order the fields of a dict, moving "#comment-1" away from "version". Or the software that this JSON is for might error upon receiving unexpected keys (which is actually useful behavior, as that would catch a typo in an optional field).
Also, this doesn't let you put comments at the top of the file, or before a list item, or at the end of a line.
If you're going to change your JSON tooling to handle comments of some kind, you might as well go all the way to JSONC.
I've heard and read this multiple times. Why are you trying so hard to fit into a format that doesn't support comments out of the box? What advantages is JSON offering you that you're compelled to bend over backwards like this? It's exactly these kinds of workarounds that make it super difficult to stop such malpractices. It's just plain ugly. Please stop doing this.
You can't comment out a large section of config easily. For me, this is a relatively common use case for config files, so I take the position that JSON should be used for serialization only.
And I am just writing a JSON de/serializer to move my config from the current system to JSON. I worked on it today and yesterday and several days some time ago.
So you prefer the "good old" XML days? I'll take comment-less JSON over XML any day
(and it doesn't have to be comment-less... JSON with comments is a thing and VSCode has syntax highlighting for it - just strip out the comments before parsing).
Disclaimer: this is not a defense for YAML, I'm just trying to remove the rose tinted glasses some people view XML configs through.
As someone who has used XML configs they have a few problems:
- technical: missing comments are mentioned multiple times here so I will mention that while XML has comments they cannot be nested.
- socially: for some reason (maybe because XML is structured enough that this doesn't immediately collapse?) XML tends to just grow and grow. People start programming in XML too, and not only using XSLT or other standard approaches but also in completely proprietary ways.
At one project someone even wrote an authorization framework in Apache Tiles which allowed one to create roles using somewhere between 600 and 5000 lines of XML per role. The benefit was of course that you could update the roles without touching the Java code.
(In case it isn't immediately obvious: it would have been much simpler to edit it in Java, and people who know enough Java to fix it are available at the right price, whereas the XML system had to be learned on the job.)
Personally I just want it to be kept simple:
- a settings.local.ini and default settings in settings.ini or something to that effect
- if necessary, just use a code file: config.ts works just as well, or config.js if it needs to be adjustable at runtime without transpilation.
Not easy to read; it's the Java of config: pages of code that express very little, and by the time you find what you need, you've forgotten the context and what level of nesting you're on. It's also more wasteful as a transport.
It compresses pretty decently and doesn't have too much overhead, in the example being around 10% larger than JSON when compressed.
I'd argue that if one were to swap out JSON for XML within all the requests that an average webpage needs for some unholy reason, the overall increase in page size would be much less than that, because huge amounts of modern sites are images, as well as bits of JS that won't be executed but also won't be removed because our tree shaking isn't perfect.
Edit: as someone who writes a good deal of Java in their day job, I feel like commenting on the verbosity of XML might be unwelcome. I'll only say that in some cases it can be useful to have elements that have been structured and described in verbose ways, especially when you don't have the slightest idea about what API or data you're looking at when seeing it for the first time (the same way WSDL files for SOAP could provide discoverability).
However, it all goes downhill due to everything looking like a nail once you have a hammer - most of the negative connotations with XML in my mind actually come from Java EE et al and how it tried doing dynamic code loading through XML configuration (e.g. web.xml, context.xml, server.xml and bean configuration), which was unpleasant.
On an unrelated note, XSD is the one truly redeeming factor of XML, the equivalent of which for JSON took a while to get there (JSON Schema). Similarly, WSDL was a good attempt, whereas for JSON there first was WADL which didn't gain popularity, though at least now OpenAPI seems to have a pretty stable place, even if the tooling will still take a while to get there (e.g. automatically generating method stubs for a web API with a language's HTTP client).
How WSDL and the code generation around it worked, was that you'd have a specification of the web API (much like OpenAPI attempts to do), which you could feed into any number of code generators, to get output code which has no coupling to the actual generator at runtime, whereas Pyotr is geared more towards validation and goes into the opposite direction: https://pyotr.readthedocs.io/en/latest/client/
The best analogy that i can think of is how you can also do schema first application development - you do your SQL migrations (ideally in an automated way as well) and then just run a command locally to generate all of the data access classes and/or models for your database tables within your application. That way, you save your time for 80% of the boring and repetitive stuff while minimizing the risks of human error and inconsistencies, with nothing preventing you from altering the generated code if you have specific needs (outside of needing to make it non overrideable, for example, a child class of a generated class). Of course, there's no reason why this can't be applied to server code either - write the spec first and generate stubs for endpoints that you'll just fill out.
However, for some reason, model driven development never really took off, outside of niche frameworks, like JHipster: https://www.jhipster.tech/
Furthermore, for whatever reason formal specs for REST APIs also never really got popular and aren't regarded as the standard, which to me seems silly: every bit of client code that you write will need a specific version to work against, which should be formalized.
Same as why REST is not a hot thing anymore: the idea that your API is just a dumb wrapper around a data model is poor API design.
API-driven development didn't really take off either; that is, write your spec in gRPC/OpenAPI and have the plumbing code generated on both ends. It's technically already there with various tools, but because of dogma like "code generation is bad", the quality of code generators, or whatever reason, we're still writing "API code".
If this is your first time using Django, you’ll have to take care of some initial setup. Namely, you’ll need to auto-generate some code that establishes a Django project – a collection of settings for an instance of Django, including database configuration, Django-specific options and application-specific settings.
$ django-admin startproject mysite
Similarly, PyCharm doesn't seem to have an issue with offering to generate methods for classes (ALT + INSERT), such as override methods (__class__, __init__, __new__, __setattr__, __eq__, __ne__, __str__, __repr__, __hash__, __format__, __getattribute__, __delattr__, __sizeof__, __reduce__, __reduce_ex__, __dir__, __init__), implementing methods, generating tests and copyright information.
I don't see why CLI tools would be treated any differently or why code generation should be considered an anti-pattern since it's additive in nature and is entirely optional, hence asking to learn more.
First of all, just because a tool or project uses a pattern, it doesn't mean that it's a good idea. Second, code generation as part of IDE or one-time setup is something else.
I need to clarify: when I say that "code generation" is an anti-pattern, I'm talking about the traditional, two-step process where you generate some code in one process, and then execute it in another. But Python works really well with a different type of "code generation".
Someone once said that the only thing missing from Python is a macro language; but that is not true - Python has its own macro language, and it's called Python.
Python is dynamically evaluated and executed, so there is no reason why we need two separate steps when generating code dynamically; in Python, the right way is not to dynamically construct the textual representation of code, but rather to dynamically construct runtime entities (classes, functions etc), and then use them straight away, in the same process.
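A minimal sketch of that idea (the `make_model` helper and its field names are made up for illustration): instead of emitting source text and executing it elsewhere, build the class with `type()` and use it immediately in the same process.

```python
# Sketch: build runtime entities directly instead of generating source text.
# The make_model helper and its field names are invented for this example.
def make_model(cls_name, fields):
    """Create a record class at runtime with a type-checked __init__."""
    def __init__(self, **kwargs):
        for key, typ in fields.items():
            value = kwargs[key]
            if not isinstance(value, typ):
                raise TypeError(f"{key} must be {typ.__name__}")
            setattr(self, key, value)
    # type() is Python's runtime class constructor: no second process,
    # no textual code; the class exists immediately.
    return type(cls_name, (object,), {"__init__": __init__})

User = make_model("User", {"name": str, "age": int})
u = User(name="Ada", age=36)
```

The generated class behaves like any hand-written one, and there is no intermediate source file to keep in sync.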
Unless you're dynamically building hundreds of such constructs (and if you do you have a bigger problem), any performance impact is negligible.
> Someone once said that the only thing missing from Python is a macro language
Ahh, then it feels like we're talking about different things here! The type of code generation that i was talking about was more along the lines of tools that allow you to automatically write some of the repetitive boilerplate code that's needed for one reason or another, such as objects that map to your DB structure and so on. Essentially things that a person would have to do manually otherwise, as opposed to introducing preprocessors and macros.
Wait, it's the opposite. XML is designed to indicate context, and JSON is designed to hide it: you have a bunch of braces in place of context there, and no matter where you are, it's braces all the way down, like Lisp.
Not really; what enables you to keep the context is shorter code. It's useless to have context reminders at the top and bottom of the thing but not in the middle, and it's too damn long.
For me XML and YAML are about the same. I think I'd also prefer comment-less JSON over both. However, XML wasn't that bad. With a decent editor and schema validation I would say there's a good chance I was more productive with XML than I am with YAML.
It's simple. For config files, choose the format that has the best tooling in your company and that supports comments. For data transfer, choose one that supports schemas, backwards compatibility, and good tooling (protobufs is just the example I'm most familiar with).
Actually, yes, I do. XML syntax was far from stellar, and much of the ecosystem (e.g. XML Schema) was drastically overengineered... but even so, we had gems like RELAX NG to compensate. On the whole, it was better than the current mess.
My opinion only: I love JSON because it lacks so many foot guns of yaml. If you’re doing lots of clever stuff with yaml you probably want a scripting language instead. Django using Python for configs made me fall in love with this. Spending years with the unmitigated disaster that is ROS xml launchfiles and rosparams makes me love it even more.
Yaml and toml are fine if you keep it simple. JSON direly needs comments support (but of course wasn’t designed to be used as a human config file format so that’s kind of on us). And not just “Jsonc that sometimes might work in places.”
Beyond that, I think we generally have all the things we need and I don’t personally think we need yet another yaml. =)
These aren't foot-guns per se, but I can think of another handful of grievances I have with JSON:
* JSON streaming is a bit of a mess. You can either do JSONL, or keep the entire document in memory at once. I usually end up going with JSONL.
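For what it's worth, JSONL is trivial to consume incrementally; a minimal Python sketch (the sample records are made up):

```python
import io
import json

# JSONL: one JSON document per line, so each record can be parsed
# as it arrives instead of holding the whole document in memory.
stream = io.StringIO('{"id": 1}\n{"id": 2}\n{"id": 3}\n')

ids = []
for line in stream:
    record = json.loads(line)  # each line is an independent document
    ids.append(record["id"])
```

The same loop works over a file or a socket, which is exactly why JSONL tends to win for streaming.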
* JSON itself doesn't permit trailing commas. I can measure the amount of time that I've wasted re-opening JSON files after accidentally adding a comma in days, not hours.
* JSON has weakly specified numbers. The specification itself defines the number type symbolically, as (essentially) `[0-9]+`. It's consequently possible (and common) for different parsers to behave differently on large numbers. YAML also, unfortunately, has this problem.
* Similarly: JSON doesn't clearly specify how parsers should behave in the presence of duplicate keys. More opportunity for confusion and bugs.
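Both of the last two points are easy to demonstrate with Python's stdlib parser; other parsers may legitimately behave differently, which is exactly the complaint:

```python
import json

# Duplicate keys: Python's parser silently keeps the last value;
# other parsers may keep the first, or reject the document outright.
dup = json.loads('{"a": 1, "a": 2}')

# Large integers: Python parses them exactly, while a JavaScript
# engine would round 2**53 + 1 to the nearest representable double.
big = json.loads('{"n": 9007199254740993}')
```

The document above is "valid JSON" everywhere, yet two conforming parsers can disagree about what it means.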
Running prettier (https://prettier.io) on each save will fix trailing commas for you. If you accidentally have one, it will just sneakily remove it and turn your document into one that is valid.
It may have been a good or a bad decision, but comments were intentionally left out of JSON to avoid obvious ways to sneak in parsing directives, and thus incompatibilities between different JSON parsers.
If I had a penny for every time someone tried to parse XML with a regex, if that even classifies as a parser. Those are 100% incompatible with everything else.
Easiest way to demonstrate how wrong that is, is to throw in a comment in the example document ;)
the funny thing is that json doesn't even need commas, they essentially act as whitespace, any amount or no amount would make no difference in the meaning of the document.
And the flip side of that with YAML is you can stream it, but you don't know once you've gotten to the end if it was the whole document without some user defined checksum mechanism.
Ran into a great bug with the INI format which has the same issue. The application would read the config file on modification but if you just wrote over the file it would sometimes read the config before the file was fully written. Have to use a temp file and move it rather than just edit it.
I believe that's only true if one were to load YAML via the "SAX"-style per-event stream, and not the "object materialization" that normal apps use (aka `yaml.load_all` or JAX-B objects) since in those more data-object centric views, where would one put the processing events for those markers?
I also originally expected `yaml.parse(...)` to eat them as it does for comments and extraneous whitespace, but no, it does in fact return dedicated stream events for them, so TIL
> Django using Python for configs made me fall in love with this.
I also started advocating in-language configuration files (Python for Python, but also Lua for Lua, etc) a number of years ago because it lets you do really useful things (like functionally generating values, importing shared subsets of data, storing executable references, and ensuring that two keys return the same values without manual copy/paste) all without needing to spec and use Yet Another Thing™ that does only a fraction of what the programming language you're already using already does.
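A sketch of what that buys you, as a hypothetical `settings.py` (all names and values here are illustrative, not from any real project):

```python
# Hypothetical settings.py: configuration as ordinary Python.
import os

BASE_DIR = os.path.join("/srv", "app")        # functionally generated value
DATA_DIR = os.path.join(BASE_DIR, "data")     # derived, no copy/paste
WORKERS = max(2, os.cpu_count() or 1)         # computed at load time

# A shared subset reused by two keys, guaranteed identical:
_default_timeouts = {"connect": 5, "read": 30}
HTTP_TIMEOUTS = _default_timeouts
GRPC_TIMEOUTS = _default_timeouts
```

None of this needs a templating layer or a second format; it's just the language you already use.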
That also implies that you can't just test a foreign config file without first reading and understanding what it does, as just using one would imply arbitrary code execution.
This is a place where Tcl excels. You can easily create restricted sub-interpreters that can't do anything dangerous. If you need more power for trusted scripts you just reenable selected commands.
JSON5 is the way to go. It supports comments and trailing commas. Unfortunately it's going to be difficult to supplant legacy JSON, which is so pervasive.
Except parsing JSON5 in the browser is super slow. Native JSON.parse doesn't support it, non-native parsers are slow, and the only fast way to parse it is `eval()`.
The desire to use a single interchange format for all data is the problem. There are plenty of reasons to support comments and minor syntax issues that JSON itself dislikes for human consumable and interactive JSON. I'd think software JSON could be just that.
I’ve never liked YAML. For whatever reason, it always feels like working in a minefield. It comes from the same cargo cult of people who think the problem with human-machine formats is that they need to be “clean”.
Clean, of course, to them means some bizarre aesthetic notion of removing as much as possible, taken to an extreme. I wonder if the same people also think books would be better with all punctuation removed to make them look “clean”?
It’s unhealthy minimalism, and it causes more problems than it solves. As soon as I see a project using YAML I cringe and try to find an alternative, because god knows what other poor choices the developer has made. In that sense, YAML can be considered a red flag, and I’m usually right. The last project I used that adopted an overly complex and build-breaking YAML configuration syntax had other problems hiding under the covers, and in some cases couldn’t parse its own syntax due to YAML’s overly broad but at the same time opinionated syntax.
By its very name (and the fact that the MEANING of the name flip-flopped in mid-flight after launch) you can tell that the designers of YAML had no clue what they were doing, because originally they named it "YAML" for "Yet Another Markup Language", when it clearly was NOT a markup language.
Only AFTER YAML had been around and in use for a few years did those geniuses actually realize that they had made a mistake in naming it something that it's not, and retroactively changed the name "YAML" to mean "YAML Ain't Markup Language", which was a too clever by half way of whitewashing the fact that they originally CLAIMED it was "Yet Another Markup Language", since they had no idea what a markup language actually was.
I prefer to use markup languages and data definition languages that were designed by people who are situationally aware enough to know what the difference between a markup language and a data definition language is, please.
Hard pass on YAML, whatever it stands for this week.
I've often heard this argument about YAML being "clean", but over time I have realized that people are conflating minimalism with cleanliness, when they are two different things. That realization is what it took for me to understand why I didn't like it. I did _not_ find it clean; I found it "messy" by virtue of the increased cognitive overhead. It is minimal, at least compared to other formats, but other formats appear cleaner to me.
I'll give my opinion as someone who had to choose among JSON, XML, TOML, and YAML about two years ago for a new project. Whatever I chose had to be easy for end-users who don't know the specification to understand later.
Here were my thoughts on the options.
JSON - No comments -> impossible
XML - Unreadable
YAML - 2nd place. Meaningful indentation also made me worried someone would not understand why their file didn't work. The lack of quotes around strings was frustrating.
TOML - 1st place. Simpler than YAML to read & parse. It truly seems 'obvious' like the name says.
I haven't encountered any situations where I wish I had more than TOML offers.
I have nesting up to three levels deep. I use inline tables for the many innermost (or other few-element) tables. It's never seemed excessively verbose.
It isn't. YAML and JSON are much more proven than HCL. HCL is used for some relatively small products. Just making something more complicated doesn't make it better.
Proven in what sense? Several implementations are broken or incorrect. HCL is used in very large products as well. Just because it isn't the majority choice currently doesn't mean it isn't a worthy one. HCL isn't more complicated if used as an alternative to YAML or JSON; in fact, I would argue that it is simpler. It combines the pros of YAML and JSON and addresses the nested complexity of TOML. It really is, IMO, the best, but you are of course free to hold a different opinion. However, I would encourage you to actually try it out and re-evaluate.
Why are unquoted keys so critical? I feel like one of the strengths of a DDL like JSON or XML is that it's easy to tell what the data (key-value pair or otherwise) is, while with YAML and others, understanding data-vs-structure can be challenging.
TOML can't decide if it's a super INI file or a JSON cousin. You can represent the same information using two completely different representations and you can mix both styles in the same document. Manually navigating and editing values is error prone and hard to automate.
In that case, you might want to have a look at JSON5: https://json5.org/
It is pretty niche, but attempts to improve upon JSON in a multitude of ways, one of which is the support for comments: https://spec.json5.org/#comments
A lot of people have really strong opinions towards syntax things like YAML vs JSON vs XML, HTML, even programming languages. I think at some point we assign way too much importance to this kind of stuff.
I recently read a piece by Joel Spolsky that resonated with me (even though my career is not nearly as long as his).
> I took a few stupid years trying to be the CEO of a growing company during which I didn’t have time to code, and when I came back to web programming, after a break of about 10 years, I found Node, React, and other goodies, which are, don’t get me wrong, amazing? Really really great? But I also found that it took approximately the same amount of work to make a CRUD web app as it always has, and that there were some things (like handing a file upload, or centering) that were, shockingly, still just as randomly difficult as they were in VBScript twenty years ago. [0]
It makes me wonder if we're really focusing on the right stuff. Maybe there's lower hanging fruit somewhere that's more valuable than focusing on fundamentally subjective things like syntax.
It’s a deterministic subset of Python. This means that if you have complex or repetitive configurations, you can use loops and functions to structure them. But it’s impossible to write an infinite loop or recursion.
I really came here to search for why EDN wasn't mentioned. It is used a lot in Clojure/ClojureScript/hylang projects. It is a superset of JSON and, in my opinion, a lot more readable than JSON, yet familiar enough. It has native sets, e.g. #{1 2 "three" '("four element list with a string inside")}, and keywords. Tagged elements can be used for extension, e.g. with a timestamp (such as the built-in #inst) or #uuid. And it also supports comments, and discards for stuff that should be omitted in evaluation.
IMO using code that generates (possibly binary/opaque) config data is the sweet spot. It's one more layer of indirection, but it means you're language-agnostic, you have a "safe" interface, and your "config-generating" process can be as expressive as you like -- comments, loops, whatever.
The underlying conundrum is:
- systems need to be configured,
- human-readability is obviously necessary at some level,
- configuration is often very "compressible" (needs loops, needs variables to be maintainable), but
- system-writers don't know the structure of your data, the axes on which you'd want to compress things, the best abstractions for you.
Templating languages are an obvious direction, but they're uniformly bad. If they have limited expressiveness you'll run into the limits. Maybe there are templating languages with good unit testing frameworks, but I haven't seen them. "Look at the expanded diff" doesn't scale. And generating gobs of human-readable "data" (in a format that supports comments!) is very wasteful.
> Starlark is a dialect of Python. Like Python, it is a dynamically typed language with high-level data types, first-class functions with lexical scope, and garbage collection.
If it has first-class functions, how can you avoid infinite recursion? Like, what stops me from running the omega combinator in it? This is why Meson (a similar language) does not allow those kinds of shenanigans, to keep the language non-Turing-complete.
Interesting! I started using jsonnet this year, but found that the language was needlessly quirky (e.g. the `::`, purely functional aspect, and no one wants to learn a new language to write configuration in the first place). More importantly, it is extremely slow (lazy evaluation without memoization...): rendering the Kubernetes YAML of my 5-container app taking over 10 seconds...
> It’s a deterministic subset of Python. This means that if you have complex or repetitive configurations, you can use loops and functions to structure them. But it’s impossible to write an infinite loop or recursion.
Starlark is indeed deterministic and guaranteed to terminate (the Go implementation has a flag that allows recursion, but it's off by default), but these are two orthogonal properties.
So one thing I wasn't sure of is: if you have a Starlark program, how is its value decided? Is it simply the value of the last expression? And where does the print output end up? Is it just for diagnostics, with no influence on the value?
I like INI. It's simple, it's readable, and it leaves the data types up to the application to interpret. It's also really easy to parse: I can work out how to do it myself, whereas JSON is beyond me.
I like CSV (and similar delimited files) it's less verbose than anything else for tabular data.
I like JSON for data transfer, you know the data types, it's succinct, and readable.
iirc, it's hard to do any nested structure in INI - you'd have to do a convention like putting prefixes and dots in the name of the entry to denote hierarchy.
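Right; a common workaround is dotted section names, which e.g. Python's `configparser` will happily read, though the "nesting" is purely a naming convention (the section names and keys below are made up for illustration):

```python
import configparser

# Hierarchy by convention only: dotted section names.
ini = """
[server]
host = localhost

[server.tls]
cert = /etc/ssl/cert.pem
"""

cfg = configparser.ConfigParser()
cfg.read_string(ini)
# "server.tls" is one flat section name; configparser knows nothing
# about the dot, so the application has to interpret the hierarchy.
```

Because the parser sees only flat sections, any tooling that wants true nesting has to split on the dots itself.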
Exactly what I think about the matter. Sometimes I use proprietary binary formats together with UDP where performance is critical (game servers for example).
I have to say I hate the fact that I have low confidence when editing YAML that the result will be what I intend. It's kind of the number one job of such a format. And I routinely run into people using advanced features and then I have no idea at all how to safely edit it. It is interesting that it seems so difficult to pick a good tradeoff between flexibility and complexity with these kinds of languages.
I just stick to XML unless forced to use something else.
Schema validation, code completion on IDEs, endless amount of tooling including graphical visualisation, a language for data transformation and queries, and.... wait for it...
comments!
If you're going to use XML, I would consider it mandatory to also use XSDs (W3C XML Schemas).
XSDs is something I think people need to pay more attention to when dealing with XML; the type system that the W3C XSD standard lays out (when used effectively) really does relieve much of the pain that people experience with XML.
Visual clutter, familiarity to non-coders. Curly braces are almost never used outside of programming and are ugly to boot.
My benchmark for yaml/JSON alternatives is "how would I feel explaining it to a busy, sceptical client?"
If the intended audience is purely developers, then sure. JSON (with the addition of comments and trailing commas) is just fine.
White space has the additional advantage of agreeing with itself. Other demarcations can have issues where the indentation and the structure contradict each other.
Again the dreaded Cobol argument. We had to struggle with a lot of this in the past: Cobol, SQL, YAML, BDD. All this would be much easier without this nonsensical idea that nontechnical people will read code. They won’t. Making code a bit more like prose doesn’t make it readable for nontechnical people. Yet we again and again make our life harder - ugly syntax rules, no code completion, no auto-formatters.
Please stop making code easy for non-coders. They don’t want to read it. They never did. They just want this damn box to work.
As a counter-argument: I work in robotics, where many operators will look at and change settings in a YAML file during testing. They have no software skills outside of this.
My educated guess then is you could have gotten them to change "settings" in C, Java or basically anything.
Just put the file in the root folder, and keep it as simple as possible and you should be fine? I mean, if they manage to write yaml correctly and consistently C is no match?
The reason the situation is the way it is now is precisely because the code being made easy for non-coders increased the popularity and reach of the products. Probably because non-coders also found it easy to pick up and start working with it.
I haven't found "this needs to be indented exactly the right amount or it won't work" to be much easier for non-programmers than "this needs to be enclosed in braces or it won't work." Most people have at least experienced parentheses in math (albeit maybe decades ago), so it's not an entirely foreign concept. Either one requires a bit of learning, but I think most people are capable of it, so any improvement in non-coder familiarity seems minor at best, vs. the very real costs.
Counter-argument - why do programmers insist on clear indentation if it doesn't aid readability? The indentation is there for humans and the braces are there for the compiler.
>My benchmark for yaml/JSON alternatives is "how would I feel explaining it to a busy, sceptical client?"
My benchmark is this: can an autoformatter do its job every time without breaking something that's technically working right now but possibly formatted wrong?
Every data format that cannot comply with this contains in it a huge waste of time. Even as a python programmer, I extend the same rule to programming languages.
My googling didn't turn up anything, but are there any case studies showing it's more readable? I'm happy to accept that it is, but I can't help wondering whether research has been done, or whether it's mostly gut feeling / anecdotes / aesthetics.
I'm not clear exactly what the "it" here refers to but as I mentioned in other comments it's fairly self-evident that indentation is easier to visually parse than braces. A simple thought experiment - would you find it easier to skim read code where the indentation was consistent with the bracing or where it was inconsistent? Your brain registers the indentation first and you only resort to counting braces if there's a reason to doubt the former.
Reality often runs counter to our expectations, though, which is why I wondered whether brace-free syntax has actually been shown to be simpler to use and understand.
That's just display though, if you have to show it to a skeptical client, why not run it through a browser that shows it without braces? It's the same as showing a webpage instead of the html.
If the HomeAssistant subreddit is anything to go by its their biggest complaint (HA configuration is in YAML).
With that said, if they weren't complaining about whitespace they'd be complaining about missing semicolons, missing/extra commas, missing equals signs, missing closing )]}, or whatever.
Are you arguing "braces are easier to get right than indentation" or is this a point specific to YAML's rules? Because I'm not defending the latter but I find it hard to understand I would need to argue against the former.
YAML is a superset of JSON. In other words, any syntactically valid JSON file is a valid YAML file. If you want braces like JSON, but not quoted strings, YAML supports that.
I'm not buying that you genuinely have a target audience of "I trust this person with config files but their eyes are too gentle to see a curly brace".
Well. I've genuinely had clients editing YAML so there's that.
I can definitely think of a broad range of people where I'd be happy to recommend they use text files for config and data but I wouldn't be happy if those text files needed to follow the rules of JSON syntax.
I mean - to some extent I would rather not edit JSON. It's not a terribly ergonomic experience. If I had to design a format for my own use, it would be indentation based and probably look a little bit like YAML, Markdown or similar.
But I find it incredibly annoying to estimate indentation when lines are wrapped in an editor (or webpage). Or, to a lesser extent but still throws me off, when multiple blocks end at the same line. Or when pasting blocks into another block, and having to double check to make sure the indentation was carried over correctly. I like editors that visually show indentation characters.
It really doesn't matter. Just force the leading indent to be exactly the same bytes. If the indent moves between two values where one isn't a prefix of the other, raise an error.
No it isn't. That is the point. If you have one line indented with 4 spaces and one line indented with a tab, there is no correct answer for what the difference in indent is. The only good option is to raise an error.
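The byte-prefix rule being discussed is only a few lines of code; a sketch (the helper name is made up):

```python
# When indentation changes between two lines, one indent string must
# be a byte prefix of the other; otherwise (e.g. spaces vs. a tab)
# there is no well-defined depth difference, so we raise.
def check_indent_step(prev: str, cur: str) -> None:
    if not (cur.startswith(prev) or prev.startswith(cur)):
        raise ValueError("inconsistent indentation (mixed tabs and spaces?)")

check_indent_step("    ", "        ")  # deeper by four spaces: fine
check_indent_step("    ", "")          # dedent to the left margin: fine
```

Feeding it four spaces followed by a tab raises, which is exactly the "only good option" described above.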
Looks confusing to who? Another coder? Then they have issues.
I get the original was talking about client facing config files. I'd rather see INI style config files personally.
If you're writing ugly code, braces or spaces won't save you. Just don't write ugly code. Write it like the next person to view your code is an axe murderer who knows where you live, so don't make them mad. You can minify later.
I was surprised the first time I saw Daniel J. Bernstein's qmail configuration. Qmail uses separate configuration files for each parameter being set. The directory /var/qmail/control contains most of these files.
For example, to set the maximum message size to be 10 MB and the timeout to be 30 seconds:
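If I remember the qmail docs correctly, the relevant control files are `databytes` (maximum message size in bytes) and `timeoutsmtpd` (SMTP timeout in seconds); a sketch, using a scratch directory in place of /var/qmail/control so it can run anywhere:

```shell
# Scratch directory standing in for /var/qmail/control in this demo.
ctl=$(mktemp -d)

# One file per parameter, one value per file:
echo 10000000 > "$ctl/databytes"    # max message size, in bytes
echo 30 > "$ctl/timeoutsmtpd"       # SMTP timeout, in seconds

cat "$ctl/databytes"
```

Reading a setting back is just `cat`, and changing one is just `echo`, which is the whole appeal of the scheme.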
There are many more files like this that hold simple values. /var/qmail/control/locals is a file that is a list of domain names, one per line.
Dictionaries are just subdirectories with one file per entry, for example this is how aliases are defined to qmail:
echo fred > /var/qmail/alias/.qmail-postmaster
echo fred > /var/qmail/alias/.qmail-mailer-daemon
See [1] for more about qmail.
DJB also created a simple, portable encoding for serializing data called netstrings, see [2]. XML, YAML, JSON, TOML, and INI files all have some advantages over netstrings, but netstrings are simple to understand and simple to parse correctly.
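A netstring is just `<decimal length>:<bytes>,`; a minimal encoder/decoder sketch in Python:

```python
# Minimal netstring codec. Wire format: b"<decimal length>:<payload>,"
def encode_netstring(data: bytes) -> bytes:
    return str(len(data)).encode("ascii") + b":" + data + b","

def decode_netstring(buf: bytes) -> bytes:
    length, sep, rest = buf.partition(b":")
    if not sep:
        raise ValueError("missing length separator")
    n = int(length)
    if rest[n:n + 1] != b",":
        raise ValueError("missing trailing comma")
    return rest[:n]

wire = encode_netstring(b"hello world!")  # b"12:hello world!,"
```

Because the length comes first, a reader knows exactly how many bytes to consume before even looking at the payload, which is why netstrings are so easy to parse correctly.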
My opinion: I can live with YAML and JSON. TOML or tjson if I have to. XML with a gun to my head. But I don't want yet another markup language (ironically, that's what YAML originally stood for).
What I want from YAML (or a competitor) is access to the concrete syntax tree.
For one of my art projects I make YAML files that describe the front side, back side, and web side of a "three sided card". I generate these out of several templates, currently using ordinary string templating.
I'd love to be able to load a YAML file and add something programmatically to the list and have the list stay in the same format that it was in, so if it was a
You are not the only one. But even if you find a library for YAML AST transformations in your language, whatever other language uses your YAML probably doesn't have one.
E.g. I tried exactly the same thing, and it was quite difficult in Rust, because the usual way to parse YAML is with serde, and that of course discards the AST.
In the end I gave up, and just used JSON for my use case.
I was suspicious of YAML from day one, when they announced "Yet Another Markup Language (YAML) 1.0", because it obviously WASN'T a markup language. Who did they think they were fooling?
XML and HTML are markup languages. JSON and YAML are not markup languages. So when they finally realized their mistake, they had to retroactively do an about-face and rename it "YAML Ain’t Markup Language". That didn't inspire my confidence or look to me like they did their research and learned the lessons (and definitions) of other previous markup and non-markup languages, to avoid repeating old mistakes.
If YAML is defined by what it Ain't, instead of what it Is, then why is it so specifically obsessed with not being a Markup Language, when there are so many other more terrible kinds of languages it could focus on not being, like YATL Ain't Templating Language or YAPL Ain't Programming Language?
>YAML (/ˈjæməl/, rhymes with camel) was first proposed by Clark Evans in 2001, who designed it together with Ingy döt Net and Oren Ben-Kiki. Originally YAML was said to mean Yet Another Markup Language, referencing its purpose as a markup language with the yet another construct, but it was then repurposed as YAML Ain't Markup Language, a recursive acronym, to distinguish its purpose as data-oriented, rather than document markup.
>In computer text processing, a markup language is a system for annotating a document in a way that is syntactically distinguishable from the text. The idea and terminology evolved from the "marking up" of paper manuscripts (i.e., the revision instructions by editors), which is traditionally written with a red or blue pencil on authors' manuscripts. In digital media, this "blue pencil instruction text" was replaced by tags, which indicate what the parts of the document are, rather than details of how they might be shown on some display. This lets authors avoid formatting every instance of the same kind of thing redundantly (and possibly inconsistently). It also avoids the specification of fonts and dimensions which may not apply to many users (such as those with varying-size displays, impaired vision and screen-reading software).
> YAML is considered by many to be a human friendly alternative to JSON
I'm not disagreeing with the author here, but as someone old enough to remember the rise of XML as a data transmission format (and Erik Naggum's masterful rant against it[0]), it's strange because historically speaking both XML and JSON were also popularized as more "human readable".
I would be curious how many HNers (and even more so newer developers outside the HN-o-sphere) have worked extensively with or even written parsers for binary (or otherwise non-human readable) file formats. Writing an MP3 metadata parser used to be a standard exercise for devs looking to level up their programming skills a bit.
It personally feels weird to me that we keep pushing for more "human readable" data formats when the world is increasingly removed from one where non-programmer humans need to read data. Keep your data in whatever format makes sense and let software handle transforming it to a more readable or more efficient format depending on the needs, even if humans can't read it (they shouldn't need to!).
On top of all that my experience has been that JSON leads to more atrocities than XML (while fully agreeing with all of Erik Naggum's points about that) and YAML creates even worse horrors than JSON. It seems we'll soon be approaching eldritch horrors if we continue to pursue human readable data exchange formats.
As an embedded sw dev working on things that interface with legacy devices, I have written lots and lots of binary parsers (as well as serial, net, and ipc protocols). I've also reversed some binary formats used by games, etc.
I like binary formats for things that are simple and don't change too often. However, I still love not having to waste days on studying yet another bespoke binary format & parser for things that are complex and don't work right for whatever reason. So when performance isn't a concern and you aren't working in a size-constrained environment, I do find that "human readable" formats are often worth it.
As a practical example, I recently hit a bug where KiCad moved some custom footprints' pad shapes around after saving & reloading. And I quickly discovered that the footprint files are just S-expressions and relatively self-descriptive so I fixed my issue in five minutes with vim without ever needing to look at docs or code. That kind of thing is super convenient. Later I discovered that other users are likewise working around the program's limitations using a text editor or custom scripts to manipulate things KiCad won't do for you; for example, to create a repetitive pattern of components in a layout more complicated than a grid.
I don't understand this. YAML has limitations. All formats have limitations. If a format is too limiting, don't use it. Pick one more suitable, or come up with another one, like NestedText (or whatever). What is this need to tell everyone else to "move on" from using some format because it doesn't suit your specific preferences or use case?
The person that creates the config file does not necessarily choose the config file format. In the example, github chose YAML and everyone using github actions must use it. YAML is error prone, as everyone who tests with Python is finding out as they add Python 3.10 to their regression tests. This is a plea to organizations like github to stop choosing YAML.
As not a Python programmer, I'm struggling to understand the issue.
Your (or blogger's) claim is that GitHub misparses the YAML actions config when it comes specifically to Python? Or that YAML is inherently inadequate to the task of representing Python's necessary actions?
But now we discover that 3.10 is parsed as a float, just like the other versions were. The problem is that 3.10 becomes 3.1, the equivalent float value! With 3.9 we never noticed this problem.
I'm not sure what the name for this problem is.
It's something like.. an unfaithful but unintentionally working representation. Until it doesn't work anymore. The solution in YAML is to quote the values so that they become strings as intended.
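The round trip is easy to see with PyYAML as a stand-in for whatever parser GitHub uses (an assumption; the specific parser isn't named in the thread):

```python
import yaml  # PyYAML; its default resolver follows YAML 1.1 rules

doc = """
python_versions:
  - 3.8
  - 3.9
  - 3.10
"""
parsed = yaml.safe_load(doc)
print(parsed["python_versions"])  # [3.8, 3.9, 3.1] -- 3.10 collapsed to 3.1

# Quoting forces the values to stay strings, which is the fix:
quoted = yaml.safe_load('python_versions: ["3.8", "3.9", "3.10"]')
print(quoted["python_versions"])  # ['3.8', '3.9', '3.10']
```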
How is that different from other formats? Take JSON:
{"python_versions": [3.8, 3.9, 3.10]}
This is a problem in any config language that doesn't enforce types, and if it does enforce types, you should've used quotes already (like you really should've been before 3.10 was added to the list).
Similar problems also exist in many config formats with scientific notation (2e345) or hexadecimal notation (0x12345) or octal notation (012345, no that's not a decimal number in many programming languages and config formats!).
What commonly supported alternative would you suggest for this use case?
I don't understand this. Opinions are not universal, every blogger has limitations. If you don't agree with a HN title, don't read it. Pick the next one or read another site. What is this need to tell everyone that you don't like them to tell everyone to "move on" just because it doesn't suit your specific preferences in formulation?
I see what you did there, but it comes off as mocking and not so clever.
The difference between my post and that blog, and your reply for that matter, is that I'm not telling anyone not to read it. I'm inviting you to comment on why people feel the need to tell others to move on.
YAML's lack of limitations is the source of much of the difficulty with the format. The numerous ways to represent basic data (arrays, strings etc) is a common source of error. YAML doesn't have enough limitations!
And you can't pick what config format a tool you need uses.
But GitHub specifically has very specific guidelines on what to write and how to format it, and will give pretty detailed error messages if it's badly formatted. Not to mention tooling to format and highlight. I'm not seeing how the proposed NestedText is inherently free from those same issues: the need for tooling, guidance, error messages. Is the claim that it's easier?
I think the important difference between NestedText and YAML is that NestedText does not try to convert the text to numbers or booleans. YAML converts `on` to True and `3.10` to 3.1, which in this case is undesired. NestedText keeps text as strings. The idea is that the end application should be the one that determines if `on` is a string or a boolean and whether `3.10` is a string or a number.
It's all in the name. All leaf values are strings. It is literally nested text.
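The `on` case is easy to reproduce with PyYAML, whose default resolver still applies the YAML 1.1 boolean rules (NestedText would hand the application the literal string "on" instead):

```python
import yaml  # PyYAML resolves YAML 1.1 booleans: yes/no, on/off, true/false

event = yaml.safe_load("push: on")
print(event)  # {'push': True} -- "on" silently became a boolean
```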
I post this pretty much every time this topic comes up:
JSON5 exists, and is quite nice. I've picked it up for configs on a work project and haven't once had an issue due to misconfiguration, unexpected parsing, or friction with leaving a trailing comma or a comment.
The nesting in JSON5 is simple and familiar to pretty much all programmers, unlike deep nesting in TOML which is a huge pain.
JSON5 keeps coming up in these discussions, and I've personally had a great time with it. Hopefully some larger projects pick it up and it eventually becomes a common occurrence, or something.
I like it so much I got motivated enough to start making a sublime text highlighter for it. I got a bit lost though, having never made one before.
And then I tried to use a tool called SBNF to write the grammar for the language at a high level and have it spit out Sublime Text syntax highlighting code. Didn't quite work yet unfortunately.
The introduction keeps citing "no need for escaping or quoting" as a major advantage, but provides no examples of what a key with a colon, or value beginning with "[", or any datum with leading or trailing whitespace would look like.
Also, the changelog is quite frightening!
> [In 3.0], `[ ]` now represents a list that contains an empty string, whereas previously it represented an empty list.
This made me curious to find out. The "Language introduction" docs [1] answer these points:
* keys containing leading spaces, newlines or colons can be represented with the multiline key form, where each line of the key starts with `: `.
* leading or trailing space is not complicated; the string values are just the rest of the line after the separator from the key, `: `. The values are not trimmed.
* a string value beginning with `[` just works in most places. This would not be confused with list values, as those only start after a new line. Only the compact inline list and inline dict forms restrict certain characters for syntax.
It seems that their claim, no escaping required, holds. The slightly more verbose form of the language constructs may be required to represent special values though.
Looking at the comparison examples between TOML and YAML/NestedText, I fail to see how anyone can look at the YAML/NestedText and think "yeah, this is way easier to read and reason about than TOML".
I'm not even a Rust person. I've never worked in Rust in my life, so there is no "preference bias" in my comparing the two. I just don't find YAML, or this "improvement" as "human-readable" as people make out to be.
I've tried dhall, cue and jsonnet, and cue is so far my fav. It's very well designed, expressive, but restrictive enough so that config files don't look like scripts.
The way it blends types and values makes learning it super easy, yet you can do complex things in a few lines.
But the main implementation exports to yaml without quoting the strings, which kinda defeats the purpose :(
I’m kind of unsure about the way CUE achieves reuse: if I understand correctly, you have files in a directory tree and the (result of processing the) bottommost files are the things you’re supposed to point your consuming tools at. So there’s no way to share structure among a collection of items if that collection is nested inside your config, the only operation available is essentially the generation of a set of similar but separate configs. Or am I wrong here? I’d very much like to be.
(Also, the type system is absolutely delicious, but it badly needs a paper with a complete description. I’m extremely interested in how it works, but fragmentary “notes on the formalism underlying” CUE are not enough.)
CUE is based on Typed Feature Structures, which predate Deep NLP, and for which there is limited literature. We do need a good writeup on the theory. I've written a bit here: https://cuetorials.com/cueology/theory/
Think of a graph with lots of attributes within which paths are searched for.
You do have imports and functions so you can reuse what you want.
The doc is also quite clear and rich, but the way it's organized means I have to read it entirely before writing my first CUE file. It also lacks IRL examples, so trial and error was my best friend.
I actually use CUE for large configuration files, used YAML before and had many issues once configurations became larger and more complex. It validates and exports JSON, which is easily readable in C++ and Python :-) have been happy doing the switch
I'm working on my own (you can start the attack hahaha). I feel there aren't many simple, generic languages that let you write simple DSLs with embedded documentation. The self-documenting part is still missing, but you can take a look and say what you think. It's kind of like the YAML format (a bit TOML) with a schema and the possibility to merge multiple files with smaller chunks of the data. With export to JSON and YAML. https://github.com/dadlang/dadl
I have a client that uses a CMS of unknown origins. I just get stuff placed in an s3 bucket, and then attempt to parse what was provided. 100% of their YAML files are invalid by every single linter I have found/tried. Not one of them understands where the error is occurring to help debug. It literally just says invalid. I'm at a total loss. My head doesn't think YAML. Does a string need quoting or not? Trailing spaces at the end cause problems? My personal experience with YAML is limited, but it hasn't been pleasant.
I'm glad to see people experimenting with alternative document/object representations, but this one might be a hard sell: based on the README[1], it only has Python, Zig and Janet implementations so far. One of the nice things about YAML (and JSON, TOML, etc.) is that they have decently mature C, C++, or Rust libraries that other languages bind to.
Sure, but why move to an alternative that's almost as bad?
YAML's problem is that whitespace is significant. TOML could be superior to it if it weren't for the fact that they forgot to forbid indentation. And now indented TOML is everywhere, including its wikipedia page.
If we have to make a change, why not finally bite the bullet and go to the form that has existed for decades and is obviously superior to all of these formats? S-expressions. There's even been a standard for data notation brewing for some time: https://github.com/edn-format/edn
Then we can actually forego http://xkcd.com/927 and do something useful with our significantly saved mental energy.
edit I see that I'm not at all alone in wanting edn to replace all this crap. So some action points on how to actually make that happen, in order of preference:
- write or improve robust edn parsers for your ecosystem
- write or improve robust x => edn converters for your ecosystems (x=yaml,json,toml,whateverpoisontheyuserightnow)
One interesting demonstration of YAML's complexity relative to JSON is that YAML is almost a complete superset of JSON. This is acknowledged by the authors of the YAML spec.[0]
For example, the following code translates JSON to YAML using only the Python yaml library:
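The snippet itself didn't survive in this thread; a minimal reconstruction of the idea, using PyYAML (an assumption about which "Python yaml library" was meant), would be:

```python
import json
import yaml  # PyYAML

data = {"name": "example", "versions": ["3.9", "3.10"], "enabled": True}
json_text = json.dumps(data)

# Because YAML is (almost) a superset of JSON, a YAML parser can
# read the JSON text directly...
parsed = yaml.safe_load(json_text)

# ...and re-emit the same data in block-style YAML.
print(yaml.safe_dump(parsed))
```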
I wrote a program at a corporate job where all the configuration is in Excel files. Tables are just fed into a dictionary and columns on each worksheet are predefined to hold the keys. People loved it because they know how to use excel and “text” is scary. (This is all very strange because they are just entering text in Excel, but the familiarity goes a long way)
I _recently_ suffered through a meeting where we developers were told to use the _new_ testing framework some team at our corp created. It's written in Java (we use .NET exclusively in our branch), configured via Excel sheets and Java, and exports results also as Excel sheets.
Whoever thought this was a good idea in 2021 has to be braindead. But the CEO was pleased. Probably because they know Excel.
Have spent many years developing dev tools that use YAML and alternatives, and I still think YAML wins because of its ubiquity and interop with JSON. I'd pick HCL as an alternative if I was going to, as it's been widely battle tested in Terraform.
Huh, no. YAML is a superset of JSON. So valid JSON is valid YAML. This is sometimes surprisingly useful. Also, YAML is used everywhere and, like the other user pointed out, has mature well-tested libs for almost every language.
While true, this is just what being a superset means: any JSON document is, without modification, a valid YAML document and can be read with a YAML parser.
YAML supports tagged nodes and multiple documents in a stream, which can’t be represented in standard JSON. (You could make up conventions, but only your parser would support them.)
It just is: Any valid JSON is valid YAML with the same semantics. It was intentionally designed that way (though not from the start iirc).
Basically take JSON, make quotes optional for strings and make curly brackets optional if the object is indented properly, and boom you've got (something like) YAML.
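That equivalence is easy to check with PyYAML: the JSON-flavored flow syntax and the indented block syntax parse to the same data.

```python
import yaml  # PyYAML

# The same mapping written as JSON-style flow syntax...
flow = yaml.safe_load('{"a": [1, 2]}')

# ...and as indented block-style YAML:
block = yaml.safe_load("a:\n  - 1\n  - 2")

print(flow == block)  # True -- both parse to {'a': [1, 2]}
```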
Any valid json can be parsed by any 1.2 yaml parser. That's how it gained popularity in the first place: you didn't have to migrate. Like ascii and utf8.
There was an interesting project showcased here a while ago, it was some kind of very minimal language, almost the most minimal theoretically possible, but with some interesting properties. Does anyone remember it?
I'm sorry, but the issues raised have more to do with a particular implementation -- one that is outdated -- than YAML in general.
E.g. "on" should no longer be treated as true. That's a YAML 1.1 archaism. And 3.10 is going to have the same issue in JSON.
No doubt YAML could still be improved and maybe we'll get there eventually. 2.0 is a long discussed goal, but the creators of YAML (who I have talked to extensively) are cautious, thoughtful and methodic and won't make that jump until they are sure of it.
Meanwhile 1.2 is a fairly good spec, and difficulties largely lie with implementors and users.
Most people that complain about YAML are like a person complaining that tennis shoes are terrible because when you try to rock climb in them, they don't have traction and you slip and fall. Tennis shoes are garbage! ..... Or maybe tennis shoes are good for what they were designed for, and you need a different kind of shoe for rock climbing.
YAML is a human-readable data-serialization language. Note the word readable, and that it's for data serialization. It's not intended to be human-writeable. It's not intended to configure an application (unless that configuration is the result of serializing the data object created after someone configures the app).
Since programmers don't really understand the different types/classes of file formats and what they're for, they choose the wrong formats for the wrong tasks. And then the people who are forced into using those formats for those programs find it's highly problematic - but they get pissed off at the formats and not the programmer!
I'm perfectly happy for people to create new data serialization formats, new configuration formats, new markup formats. But please avoid the trap of thinking "this format or that format sucks". None of them suck for what they were created for, when used correctly. YAML, for example, has huge advantages over most common data serialization formats. But those advantages fly out the window as soon as humans start writing YAML by hand, and then bolting on weird custom logic, as if this data serialization format were a higher-level language.
One step forward, one step back, one step sideways. It's a great idea to have line types, but the treatment of block text is terrible... at least with YAML I can embed markdown and have the paragraphs come out correctly. Multiline keys... are they really that important so that we have to jump through awkward looking syntax hoops to get them? And please, why keep inline object definitions? Nobody needs them! It won't kill you to write a list over multiple lines. Some verboseness is good if it enforces structure.
I use YAML quite extensively for an internal tool, since it's pretty much the only human-useable format out there. But the weirdness of the language makes it error-prone while the quality of available parsers is simply unacceptable. I ended up writing my own parser for a subset of unambiguous YAML (not unlike StrictYAML, but even simpler) that offers nice error messages etc. and it works very well for us.
If your project is in python, the only correct file format to use for the config file is python. If your project is in ruby, the config file should be ruby. If you're in $SCRIPTING_LANGUAGE, your config file should be implemented in $SCRIPTING_LANGUAGE.
You can use `literal_eval()` (in python, there are similar constructs in basically every scripting language) to prevent prevent people from putting code in the config, but really, it's the person writing the config file's computer, let them do what they want.
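For what it's worth, a minimal sketch of the `literal_eval()` approach (the config string here is hypothetical):

```python
import ast

# literal_eval accepts only Python literals -- no calls, no imports
config = ast.literal_eval("{'retries': 3, 'hosts': ['a.example']}")
print(config["retries"])  # 3

# Anything that is actual code is rejected at parse time:
try:
    ast.literal_eval("__import__('os').system('echo pwned')")
except ValueError:
    print("rejected")
```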
For compiled languages, it's a bit harder, but there are zero cases where a separate "configuration language" should exist.
The issue with YAML is that it does not unambiguously distinguish between number/booleans and strings. JSON does, but only for numbers, booleans, and nulls. But there are many data types that need to be conveyed. For example, dates and quantities (numbers with units, such as $3.14 or 47kΩ). Such things are left to the application to interpret. Even JSON does not unambiguously distinguish between integers and reals. Even so, JSON pays for its lack of ambiguity by requiring all strings to be quoted, which adds clutter and the requirement for quoting and escaping. Thus, supporting those extra types comes at a cost.
I think NestedText is unique in leaving all leaf values as strings, so it does not need quoting or escaping.
Everything involves a compromise. YAML provides a lack of clutter at the cost of ambiguity. JSON is unambiguous, but comes with visual clutter. In both cases there are still lots of types they cannot handle and so must be passed on to the application.
The compromise with NestedText is that it provides simplicity and a lack of clutter by not supporting any data types for leaf values other than string. Thus, all interpretation of the text is deferred to the application. But fundamentally that is the best place for it, because only the application understands the context and knows what is expected.
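Deferring interpretation to the application looks roughly like this (the field names are hypothetical, for illustration only):

```python
# With NestedText-style all-string leaves, type interpretation moves
# into the application, which knows what each field should be.
raw = {"port": "8080", "debug": "on", "version": "3.10"}

port = int(raw["port"])                        # the app decides: an int
debug = raw["debug"] in ("on", "yes", "true")  # app-defined booleans
version = raw["version"]                       # stays the string "3.10"
```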
Yes but I can understand the rationale. There are many numeric types and settling on some excludes use with others. If letting the application handle that, the configuration language can remain simple. His example where a version number 1.10 was round trip converted to 1.1 was enlightening.
The worst config files I've ever encountered are dynamic YAML templates for Kubernetes. Mind boggling to figure out with crazy indentation rules and for loops. Kill me now.
Seriously, the more time I spend with other configuration formats the more I start appreciating JSON. It is a simple array and object format. Nothing can go wrong with it. No indentation rules. Easy to encode and decode. Easy to turn into actual arrays and objects in your programming language. Lack of types is not great, but use a type checker in your parser that throws exceptions and you are fine.
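A minimal version of that "type checker in your parser that throws exceptions" idea, with the stdlib `json` module (the config fields are made up for illustration):

```python
import json

def load_config(text):
    # Decode first, then validate types explicitly and fail loudly
    cfg = json.loads(text)
    if not isinstance(cfg.get("port"), int):
        raise TypeError("port must be an integer")
    if not isinstance(cfg.get("host"), str):
        raise TypeError("host must be a string")
    return cfg

cfg = load_config('{"port": 8080, "host": "localhost"}')
```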
It has numeric literals. But all JSON number types are f64. Which is sufficient to represent every 32-bit integer, but it can not represent all 64-bit integers.
For code that needs a 64-bit integer, which is quite a lot, you have to encode the integer in a string.
Lack of 64-bit integer (either signed or unsigned) is a pretty common and well understood pain point when using JSON as an interchange format.
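The usual workaround is to ship large integers as strings and convert at the edges; the precision loss itself is easy to demonstrate:

```python
import json

big = 2**63 - 1  # largest signed 64-bit integer

# As a float64, the value can no longer be represented exactly:
print(float(big) == big)  # False -- it rounds to 2**63

# So encode it as a string and convert back on the consuming side:
payload = json.dumps({"id": str(big)})
decoded = int(json.loads(payload)["id"])
print(decoded == big)  # True
```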
I suppose you're right that the formal JSON specification doesn't actually specify what a number is. Only that it's a sequence of digits. TIL.
In practice, when you are using JSON as an interchange format between tools and systems it is exceptionally likely that you'll be constrained to 64-bit float as the underlying Number data type. Even if there is no JavaScript anywhere in your data pipeline. I have been burned by this repeatedly. For the record, I have never written a single line of JavaScript in my professional career.
In any case, my dream format would separate storage type from encoding type and formalize implementation extensions.
> YAML, used sparingly, without too much complexity, is fine
It's wishful thinking because the complexity is inherent, unfortunately. That's analogous to saying "programmers should not write bugs". Humans are fallible and error prone; it's not going to happen unless a language is restricted in such a way that a category of bugs is not possible by design. However, YAML's design is sprawling, so despite best intentions people will run into the problems caused by the complexity. Possible ways out are restrictions of the design (e.g. Strict YAML) or whole replacements (e.g. NestedText).
> There are a few gotchas that are easy to catch with validations.
Does this actually exist? If not, who's writing the code for these validations? How can we make sure everyone who needs to use them is using them?
The idea sounds good on paper, but it isn't workable in practice because "patching over the spec problems" requires global coordination.
I use yaml as an alternative to .properties files. It's boring. Yes, quote your strings. Know the damn config language. It's not going to kill you like a table saw.
Naive question: is it viable if we started using Lua[JIT] for configuration, like NeoVim and likely others do?
Can Lua's interpreter be compiled without some "dangerous" APIs enabled (whichever those might be) and thus be made viable as an embeddable and isolated configuration engine?
I'm just getting sick and tired of all the half-baked configuration formats and want to look for something that's both more flexible but still strict and unambiguous. I wonder if it's possible?
> Naive question: is it viable if we started using [a Turing-complete programming language]
This is actually a good question. The people who are not asking and going right ahead with that plan are doing a dangerous thing. <https://www.cs.dartmouth.edu/~sergey/langsec/occupy/> It's not viable because a subset of people would like to have the following properties upheld:
• Parsing configuration should be decidable and finish in finite time.
• Parsing configuration should not be a security exploit.
• I should not have to implement the Emacs runtime just to parse its configuration file.
> something that's both more flexible but still strict and unambiguous. I wonder if it's possible?
JSON is often so hard to read I have to open a file in a formatting tool. It's also hard to tell if there's a simple missing quote or brace. Those are extra steps I don't have with YAML. For many things, JSON is great. For simple loaders, YAML gets it done quick and easy. They are functionally interoperable for the most part so I use what's best.
I'd also add the author has a competing framework so maybe there's a bit of pre-existing bias.
When you miss a quote or a brace in JSON, the JSON fails to parse. When you make a similar minor mistake in YAML, you often end up with a valid but nonsensical document with completely incorrect structure.
I don't want the language to be flexible enough that simple common errors go unnoticed - I WANT the parser to tell me at parse time if I screwed something up. It's a similar dynamic to dynamic/static typing.
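The contrast is easy to reproduce with PyYAML and the stdlib `json` module; a small slip that JSON rejects outright becomes a silently valid YAML document:

```python
import json
import yaml  # PyYAML

# In YAML, forgetting the space after the colon silently turns the
# intended mapping into a single plain string:
print(yaml.safe_load("key: value"))  # {'key': 'value'}
print(yaml.safe_load("key:value"))   # 'key:value' -- no error!

# The equivalent slip in JSON fails at parse time:
try:
    json.loads('{"key" "value"}')
except json.JSONDecodeError:
    print("JSON refused to guess")
```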
I'm not sure I'd trust manually editing either in a plain text editor. YAML cares about whitespace yet will silently make most combinations valid and it also has surprising pitfalls on type interpretation if one isn't consistently on top of it when entering data. JSON has a bunch of additional characters but at least if you forget them you have a higher chance of it just telling you it's wrong when you try to use it.
Thankfully it's very rare I ever have to open any config file in a text editor that isn't aware of dozens of formats so things like a missing brace are always suggested anyways and it isn't much a problem regardless of format.
I think the advantages over the csv example are not quite advantages.
"tall and skinny". Well it looks skinny, but in bytes is actually fatter than the csv. Similarly, for comparing stuff, the columnar display works better.
My go-to is ini. Simple. Everything is a string. No assumed hierarchy in section naming. Just key/value pairs under sections. It is up to the using application to parse them how it sees fit.
I like complexity of YAML. When people choose YAML as driver for their shitty tools I'm forced to use, at least I can use YAML features to cope with this braindamage. Of course, you can go only so far with e.g. anchors.
Stop using configuration languages as programming languages and leave YAML alone as configuration language.
If you know about dangers of unsafe YAML, why don't you just replace load() with safe_load()??
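In PyYAML, `safe_load()` builds only plain data types and rejects documents carrying Python-object tags instead of executing them:

```python
import yaml  # PyYAML

# A document with a Python-object tag is rejected by safe_load
# rather than constructed (yaml.load with an unsafe Loader would
# actually run os.system here):
doc = "!!python/object/apply:os.system ['echo pwned']"
try:
    yaml.safe_load(doc)
except yaml.YAMLError:
    print("rejected")

print(yaml.safe_load("a: [1, 2]"))  # ordinary data still loads fine
```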
YAML has problems (mainly that it requires an overly complex parser), but the alternatives presented here look like a step backward. Easier for the parser, but less flexible for the human (one nitpick with NestedText that stood out immediately: can I collapse arrays and dictionaries into a single line like in YAML with [] and {}?)
I really don't know where all this XML hate comes from; it has everything I need and I don't find it hard to read. Maybe I'm just used to it.
How often do you change your configuration files? My projects pom.xml change only occasionally, version upgrade or new dependencies, that's it.
The problem with the version numbers here seems like it's because they've inappropriately written them as numbers rather than as strings, so 3.1 and 3.10 are equal. It's not the language's fault if the author chooses the wrong datatype.
The trouble is that people insist in writing trees using plain text editors. Trees should be written in tree editors. Then you can't get the delimiters wrong.
"NestedText was inspired by YAML, but eschews its complexity. "
EDN is the best I've used, I dream for it to become wildly adopted.
It has powerful types, can be extended, clean explicit syntax, whitespace independent, easy for humans and machines to read. It's like JSON done right.
EDN is less ideal for Python and other similarly somewhat high-level languages, and there just aren't any libraries for C/C++ from what I have seen. In general it's XML, JSON, YAML, then custom binary or text formats, in that order, for most software from what I can guess. Remember, .docx, .xlsx and other document formats are also basically just XML-based configurations for programs/interchange formats.
I use it to generate the config files, and pass the generated config to the application the same way you’d pass any other json/yaml/etc to your app. Neither ever talk to each other, and Dhall doesn’t exist within our runtime environment.
Seriously, why don't people just use EDN and move on? Rich types, proper key value pairs, keywords instead of strings everywhere, support for fractions and comments...
XML in theory is a great format for what it represents — a tree of heterogeneous typed simple key/value pairs.
The problem is almost no data that people want to actually represent has this form, and every way people have tried to beat XML into representing other things (i.e. lists and dicts) is kludgy and awkward.
I think it's crazy that when I add a string to an inline list, I may need to convert that inline list to a list because this string needs different handling. I think it's crazy that "convert an inline list to a list" is a coherent statement, but that is the nomenclature that they chose.
I don't like that a truncated document is a complete and valid document.
But what is most unappealing is their whitespace handling. I couldn't even figure out how to encode a string with CR line endings. So, I downloaded their python client to see how it did it. Turns out, they couldn't figure it out either:
    >>> nt.loads(nt.dumps("\r"), top="str")
    '\n'