I don't like YAML and would like to move on, but I hope we don't move onto this.
I think it's crazy that when I add a string to an inline list, I may need to convert that inline list to a list because this string needs different handling. I think it's crazy that "convert an inline list to a list" is a coherent statement, but that is the nomenclature that they chose.
I don't like that a truncated document is a complete and valid document.
But what is most unappealing is their whitespace handling. I couldn't even figure out how to encode a string with CR line endings. So I downloaded their Python client to see how it did it. Turns out, they couldn't figure it out either:
I wish people would stop trying to write programs for which there are no interpreters, compilers, or linters:
name: Install dependencies
run:
    > python -m pip install --upgrade pip
    > pip install pytest
    > if [ -f 'requirements.txt' ]; then pip install -r requirements.txt; fi
That is a program that is hiding in the bowels of a "nestedtext" document ... It is no better than a program that is hiding in the bowels of a JSON or YAML document.
We all have to deal with this, but it is beyond stupid.
I don't think it matters much if this is inline or in a separate file. If you want to test your tests, "yq -r .run input.yaml | sh -e" works as well.
In fact, if I really wanted to test my tests, I'd say that directly testing the corresponding clause is the more comprehensive approach. For example, what if someone accidentally changes the line to read:
run=/path/to/install-scriptq
? Then your test of "install-script" will not catch anything. But if your test runs "yq -r .run | sh -e", then it will catch that error. And you can still forward to a script if you wanted to.
So let's keep inline scripts; they are a perfectly reasonable approach for just a few commands.
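That "test the exact clause from the config" idea can be sketched in stdlib Python too. This is a hypothetical illustration, not anyone's actual CI setup: JSON stands in for YAML so no third-party parser is needed, and the step name and command are made up.

```python
import json
import subprocess

# Hypothetical pipeline step; JSON stands in for YAML here so that only
# the standard library is needed.
step = json.loads('{"name": "Install dependencies", "run": "echo ok"}')

# Run the exact command string from the config with `sh -e`, so a typo in
# the config itself (not just in a script it points at) fails the test.
result = subprocess.run(["sh", "-e", "-c", step["run"]],
                        capture_output=True, text=True)
assert result.returncode == 0
assert result.stdout.strip() == "ok"
```

The point is the same as the yq pipeline above: the test exercises the string that will actually run, so a stray character in the config is caught.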
Depending on the source control tool you may lose syntax highlighting; you most likely lose linters, and even copying those multi-line commands into a shell becomes cumbersome. I consider the inlining example from the GP's comment awful.
It would be nice if YAML wasn't horrendously abused the way it is. You have CI pipelines that let you construct DAGs to represent your builds, but you need several thousand lines of YAML and a load of custom parsing to get programming constructs in the string types, for example. And then each provider has its own way of providing those.
I don't have to re-read manuals describing how to do if/else in Ruby or Java or Lisp, but as soon as yaml and some 'devops' tooling is involved, I have to constantly jump back and forth between the reference and my config.
The main point being that the problem isn't the file format but the products that continue to push it, presumably because hacking stuff on top of `YAML.parse` is less effort than designing something that fits the purpose.
Yeah. A lot of times I find myself thinking YAML is like a really awful programming language. You can sort of do conditional logic and loops, but usually I find it hard to follow what's going on.
For build systems, I always liked the idea of Gradle where the core functionality was simple and declarative, but with the option to use a real programming language for things that weren't simple. For example, integrating installers or form builders (pre-processing) into a build are things I would consider non-trivial if there aren't official plugins, but it was still relatively easy to do with Gradle.
The biggest problem I always had with Gradle was that I didn't like Groovy, and I always thought there was a missed opportunity to have a statically typed build system with a solid API/contract and all the fancy tooling like auto-complete that you get with statically typed languages.
I see JSON5 mentioned a lot in the comments. In terms of CI / build systems, I feel like something built with JSON5/TypeScript could be really good. I'd be really happy using TypeScript for configuring things like build systems where there shouldn't really be an argument for needing it to be usable by non-programmers.
Personally I feel like I've spent way too much of my life debugging YAML syntax issues.
If you're happy to go lispy, there's Babashka [1], a Clojure without the JVM. It has built-in support for 'tasks' designed to make writing build scripts easy.
My experience with Kotlin gradle scripts is worse than Groovy. For example, given the following valid groovy/kotlin gradle program:
dependencies {
}
What would you expect to see between the curly braces? IntelliJ IDEA, which supposedly has full support for the Gradle DSL both for Groovy and Kotlin, offers only generic suggestions. Common function calls such as "implementation()" or "testImplementation()" are not suggested. If you do use those functions, no suggestion is made for their parameters. Because Gradle's DSL is built on top of a general-purpose language, it loses the benefits of a DSL (constraining the set of possible configurations and guiding the user towards valid configurations).
The key benefit of the Kotlin DSL is that in this precise example, IDEA does suggest valid stuff:
https://imgur.com/a/vFYNIU1
Kotlin DSL is miles ahead of Groovy in terms of discoverability and IDEA integration. With Groovy DSL, most of the build script is highlighted with various degrees of errors and warnings; with Kotlin DSL, if something is highlighted, it is a legitimate error, and vice versa - if no errors are detected by IDEA, then it is almost certain to work.
There were rough spots of IDEA integration a couple years ago, but now it is close to perfect, within Gradle's limits of course (due to the sheer dynamic nature of it, some things are just not possible to express in a static fashion, unfortunately). The biggest obstacle to Kotlin DSL use might be that some of the plugins use various Groovy-specific features which are hard to use from Kotlin, but thankfully most of the plugins either fix those, or are rewritten in Java or Kotlin instead.
There's a huge gap in Java build tool space for a tool that is simple and easy to learn and can cover 90% of projects' requirements. I have this feeling that we're in the "subversion" days of java build tools and the day someone introduces "git" people will wonder why we suffered with Gradle and Maven for so long. If I had time I would be looking into building this.
Predating Gradle was a tool called Gant. It was simple, intuitive, and did 90% of what every project could want. Ironically, it was Groovy-based as well. But instead of Gradle's arcane magic-based configuration it was literal and direct, a simple extension of the Ant that came before it. I liked it much better, but someone decided they could make a business out of Gradle, Gant got deprecated, and here we are.
I found it fairly simple to build Gradle plugins with Kotlin. If anything, the problem was just having the patience to actually find the right documentation in the first place, and understand what was being described. The main problem I faced there was that I wanted a plugin to configure dependencies for the project it would run against and the docs around dealing with dependencies and detached configurations were a bit confusing.
I do find it curious that a lot of these tools get seen as basic task runners despite offering much more potential.
It's always the same trajectory with declarative programming. It starts with "it's just configuration, we need something simple". Then users come with use cases which are more complex. Then you have programming language on top of configuration language syntax.
Very much so. A good few years ago I got annoyed that I couldn't change mutt's configuration the way that I wanted, because it has a built-in configuration language which doesn't allow complicated conditionals etc.
(There are workarounds, and off-hand I can't think of a great example, but bear with me.)
In the end I wrote a simple console-based mail-client, which used a Lua configuration file. That would generally default to using hashes, and key=value settings, but over the time I used it things got really quite configurable via user-defined callbacks, and functions to return various settings.
For example I wrote a hook called `on_reply_to`, and if you defined that function in your configuration file it would be invoked when you triggered the Reply function. This kind of flexibility was very self-consistent, and easy to add using an embedded real language.
Later I added some hacks to a local fork of GNU Screen; there I just said:
* If the ~/.screenrc file is executable, then execute it, and parse the output.
That let me say "If hostname == foo; do this ; otherwise do this .." and get conditionals and some other things easily. Another example was unbinding all keys, and then only allowing some actions to be bound. (I later submitted "unbindall" upstream, to remove the need for that.)
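That "executable config" trick translates to any tool that reads a config file. A minimal sketch in Python, with the function name made up for illustration:

```python
import os
import subprocess

def load_config(path):
    """If `path` is executable, run it and treat its stdout as the config;
    otherwise read the file verbatim (the .screenrc hack described above)."""
    if os.access(path, os.X_OK):
        return subprocess.run([path], capture_output=True,
                              text=True, check=True).stdout
    with open(path) as fh:
        return fh.read()
```

The config file itself can then be a shell or Python script that branches on hostname, environment, or anything else, while the consuming tool still sees only its own plain syntax.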
What's really sad is that XML had a much better ecosystem around this for ages. I'd very much rather deal with XQuery or even XSLT to construct build trees, than the current crop of ad-hoc YAML preprocessors. At least the XML stuff had a consistent type system underneath!
XSLT is an absolute horror and not something I would want to deal with again. It feels like some weird academic experiment in an XML declarative programming language that should never have made it to print.
If something needs the flexibility of a programming language, why not use a real one that's been well tested for writing other programs? These various config file programming systems always end up creating something notorious that everyone tries to avoid having to work on.
XQuery is, in many ways, XSLT with better syntax. It doesn't have the pattern-matching transforms that are the T in XSLT - but for configs, I don't think it makes a big difference.
Also, I don't think many realize that the stack has evolved since early 00s. XSLT 1.0 was a very limiting language, requiring extensions for many advanced scenarios. But there's XSLT v3.0 these days, and XPath & XQuery v3.1, with some major new features - e.g. maps and lambdas. Granted, this doesn't fix the most basic complaint about XSLT - its insanely verbose syntax - but even then, I'd still take XSLT over ad-hoc YAML-based loops and conditionals.
I will take the verbosity of XML any day over YAML wrestling (complex YAML configs, of course). There are simply too many "implicit rules" in YAML. It's why I prefer Python over Ruby and Perl. Generally, though, TOML has been good enough for me to do lots of fairly large config files that are easy for humans and machines to parse.
XML died because too many configurations turned what should be a 'prop' into an inner tag -- and it doesn't help that XML doesn't really give guidance as to when to use which. And, of course, when you deserialize XML, the innerText always ends up in a strange place, so it's never really clear what the right way to handle it is.
Honestly, I think using an embedded scripting language, like lua or even javascript, would be a much better fit for these use cases than trying to make yaml do something it wasn't designed for.
Ironically, having used cdk8s[1] for dealing with kubernetes infrastructure, that's the one thing where I've actually preferred yaml. That said, k8s resource definitions are pure config so there's no need to try and hack extra bits on top of a serialized data structure.
I really like the approach of Buildkite CI -- they use YAML, but that YAML can be produced by an executable script.
So you write YAML by hand for trivial cases, but once it gets complex, you can just drop back to shell/python/ruby/node/whatever, implement any complex logic, and serialize the result to plain YAML.
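A minimal sketch of that idea in Python; since JSON is, for practical purposes, a subset of YAML, emitting JSON avoids any third-party dependency. The step labels and commands below are made up:

```python
import json

# Generate a pipeline with one test step per shard instead of hand-writing
# repetitive YAML with loops bolted on top.
steps = [{"label": f"tests, shard {i}", "command": f"pytest --shard-id {i}"}
         for i in range(3)]
pipeline = json.dumps({"steps": steps}, indent=2)

# In a real setup this output would typically be piped to Buildkite's
# pipeline upload command rather than printed.
print(pipeline)
```

All of the looping and branching happens in a real language with a real debugger, and the CI system only ever sees plain, static data.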
The author seems to use misfeatures of a particular implementation to tar all implementations with. The round-tripping issue is not a statement about YAML as a markup language, much in the way a rendering bug in Firefox is not a statement about the web.
Stepping back a bit, YAML is good enough, and this problem has been incrementally bikeshedded since at least the 1970s, it is time to move on. Human-convenient interfaces (like YAML, bash, perl) are fundamentally messy because we are messy. They're prone to opinion and style, as if replacing some part or other will make the high level problem (that's us) go away. Fretting over perfection in UI is an utterly pointless waste of time.
I don't know what NestedText is and find it very difficult to care; there are far more important problems in life to be concerned with than yet another incremental retake on serialization. I find it hard to consider contributions like this to be helpful or to represent progress in any way.
If you can write a bad YAML document because of those mis-features/edge cases, I'd say you've already lost.
Humans are messy, but at the end of the day the data has to go to a program, so a concise and super simple interface has a lot of power to it for humans.
Working at a typical software company with average skill level engineers (including myself), no one likes writing YAML. But everyone is fine with JSON.
I think it's a case of conceptual purity vs what an average engineer would actually want to use. And JSON wins that. If YAML was really better than JSON, we'd all be using that right now.
So does it really matter if YAML is superior if >80% of engineers pick JSON instead?
I would argue that you can write something poor and/or confusing in any markup language that is sufficiently powerful.
Conversely, if a markup language is strict enough to prevent every inconsistency, then it's not powerful enough or too cumbersome to use to be generally useful.
I'd say that YAML is anything but conceptually pure, with all the arbitrariness, multitude of formatting options, and parsing magic happening without warning.
If you want conceptual purity (and far fewer footguns), take Dhall.
> Stepping back a bit, YAML is good enough, and this problem has been incrementally bikeshedded since at least the 1970s, it is time to move on
Nah, in the 1970s we had Lisp S-expressions that completely solved the problem, and everything since then has been regressions on S-expressions due to parenthesis phobia.
After hearing that thing about the country code for Norway, I became convinced that YAML has to just die. Become an ex-markup language. Pine for the fjords. Be a syntax that wouldn't VOOM if you put 4 million volts through it. Join the choir invisible, etc.
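For anyone who hasn't hit it: under YAML 1.1's implicit typing rules, a handful of unquoted scalars resolve to booleans. A pure-Python sketch of that resolution rule, transcribed from the YAML 1.1 bool type definition:

```python
import re

# The YAML 1.1 bool type matches these plain (unquoted) scalars; an
# unquoted country code "NO" therefore loads as the boolean false.
YAML11_BOOL = re.compile(
    r"^(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)

assert YAML11_BOOL.match("NO")          # Norway, swallowed as a boolean
assert not YAML11_BOOL.match("Norway")  # longer strings are safe
```

YAML 1.2 dropped most of these (only true/false remain), but plenty of deployed parsers still implement the 1.1 behavior.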
S-expressions don't solve the problem at all, you just get to fractally bikeshed all over again about what semantics they have and what transformations are or aren't equivalent. Does whitespace roundtrip through S-expressions? Who knows. Are numbers in S-expressions rounded to double precision on read/write? Umm, maybe. How do I escape a ) in one of my values? Hoo boy, pick any escape character you like and there's an implementation that does it.
S-expressions don’t completely solve the problem: they don’t have a syntax for maps, and in practice there are at least two common incompatible conventions: alist or plist?
Obviously the application has to interpret the Lisp object resulting from reading the S-expression, just like it has to interpret any JSON, YAML, or anything else that it reads. So for maps you can, as you mention, use alists or plists. Regarding other stuff mentioned: none of the encodings are supposed to be bijective (the writer emits the exact input that the reader ingested). Otherwise, for example, they couldn't have comments, unless those ended up in the data somehow. There is ASN.1 DER if you want that, but ASN.1 is generally disastrous.
Stuff like escape chars were well specified in Lisps of the 1970s (at least the late 1970s), including in Scheme (1975). Floating point conversion is a different matter (it was even messier in the pre-IEEE 754 era than now) but I think the alternatives don't handle it well either. You probably have to use hexadecimal representation for binary floats. Maybe decimal floats will become more widely supported on future hardware.
A type-checked approach can be seen in XMonad, whose config files use Haskell's Read typeclass for the equivalent of typed S-expressions.
Solutions for this problem that I've used in my own S-expression config files:
1. Use only alists for maps because they prevent off-by-one errors.
2. Allow plists because they're less verbose than alists and use reader macros to distinguish them, and allow the reader macro definitions to be in the same file.
Most of the time I use option 1 because it's simpler.
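For readers without a Lisp background, the two conventions look like this when transliterated into Python (all names made up):

```python
# An alist keeps each key/value pair in its own cell; a plist flattens
# keys and values into one alternating sequence.
alist = [("host", "127.0.0.1"), ("port", 8080)]
plist = ["host", "127.0.0.1", "port", 8080]

def alist_to_dict(pairs):
    return dict(pairs)

def plist_to_dict(items):
    if len(items) % 2:                  # the off-by-one hazard of plists
        raise ValueError("plist has a key with no value")
    return dict(zip(items[0::2], items[1::2]))

assert alist_to_dict(alist) == plist_to_dict(plist)
```

The alist's pairing is structural, so a missing value is a shape error at read time; in a plist it silently shifts every subsequent key into value position, which is the off-by-one error mentioned above.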
I would argue that, in a data markup language, there shouldn't be a syntax for maps. Whether a given sequence should be treated as key-value pairs, and whether keys in that sequence are ordered or unordered, is something that is best defined by the schema, just like all other value types.
>Human-convenient interfaces (like YAML, bash, perl) are fundamentally messy because we are messy
I don't know what to make of this statement, it has so much handwaving built-in. The most charitable interpretation I can find is that by 'Human-convenient' you simply meant the quick-and-dirty ideology expressed in Worse Is Better: Does job, makes users contemplate suicide only once per month, isn't too boat-rocking for current infrastructure and tooling.
Taken at face value (without special charitable parsing), this statement is trivially false. Python is often used as a paragon of 'Human-convenience', and while I sometimes find this trope tiring, whatever Python's merits and vices, it's _definitely_ NOT messy in design.
Perl is the C++ of scripting languages, it's a very [badly|un] designed language widely mocked by both language designers and users. Lua and tcl instead are languages literally created for the sole exact purpose of (non-) programmers expressing configuration inside of a fixed kernel of code created by other programmers, and look at their design: the whole of tcl's syntax and semantics is a single human-readable sentence, while lua thought it would be funny if 70% of the language involved dictionaries for some reason. These are extremely elegant and minimal designs, and they are brutally efficient and successful at their niches: tcl is EDA's and Network Administration's darling, and lua is used by game artists utterly uninterested in programming to express level design.
'Humans are messy' isn't a satisfactory way to put it. 'Humans love simple rules that get the job done' is more like it. But because the world is very complex and exception-laden, though, simple rules don't hug its contours well. There are two responses to this:
- you can declare it a free-for-all and just have people make up simple rules on the fly as situations come up, that's the Worse Is Better approach. It doesn't work for long because very soon the sheer mountain of simple rules interact and create Lovecraftian horrors more complex than anything the world would have thrown at you. Remember that the world itself is animated by extremely simple rules (Maxwell's equations, Evolution by Natural Selection, etc.), it's the multitude and interaction of those simple rules that give it its gargantuan complexity and variety.
- you stop and think about The One Simple Rule To Rule All Rules, a kernel of order that can be extended and added to gradually, consistently and beautifully.
The first approach can be called the 'raster ideology', it's a way of approximating reality by dividing it into a huge number of small, simple 'pixels' and describing each one separately by simple rules. I'm not sure it's 'easy' or 'convenient', maybe seductive. It promises you can always come up with more rules to describe new patterns and situations, and never ever throw away the old rules. This doesn't work if your problem is the sheer multitude and inconsistency of rules. The second approach is the 'vector ideology', it promises you that there is a small basis of simple rules that will describe your pattern in entirety, and can always be tweaked or added to (consistently!) when new patterns arise, the only catch is that you have to think hard about it first.
>and lua is used by game artists utterly uninterested in programming to express level design
Rather short-sighted and dismissive of a successful programming language that's evolved over 20+ years. Lua is a great general-purpose programming language that specializes not in "game making for non-programmers" but in ease of embedding, extensibility, and data description (like a config language). There's a whole section in Programming in Lua[1] to that effect. The fact that it's frequently used in games is credit to its speed, size, and great C API for embedding, not to any particular catering to game designers.
You misunderstood me. I love lua and I wasn't being dismissive of it, I was using the first example that came to my mind to counter the claim that a convenient language has to be messy. Just because that was the example used doesn't mean there is an implicit "and that's the only thing it's good for" clause I'm implying there: if someone said "Python is used by scientists utterly uninterested in programming to express numerical algorithms" would you understand that to be a dismissive remark against Python?
Being used by non-programmers utterly uninterested in programming to solve problems is the highest honor any programming language can ever attain, because it means that the language is well-suited to the domain enough (or flexible enough to be made so) that describing problems in it is no different than writing thoughts or design documents in natural language. This is the single most flattering thing you can ever say about a language, not a dismissive remark.
It's really sad to see the pervasiveness of JSON. For one thing its usage as a config file is disturbing. Config files need to have comments. Second, even as a data transfer format the lack of schema is even more disturbing. I really wish JSON didn't happen and now these malpractices are so widespread that it's hurting everyone.
JSONC. JSON with comments. And even if your favorite parser does not support it natively it’s not so hard to add with a very simple pre-lexer step.
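A sketch of such a pre-lexer in Python: it strips `//` and `/* */` comments outside string literals. This is a minimal illustration, not a full JSONC implementation (it only handles backslash escapes well enough to skip escaped quotes):

```python
import json

def strip_jsonc(text: str) -> str:
    """Remove // and /* */ comments outside string literals."""
    out, i, n = [], 0, len(text)
    while i < n:
        c = text[i]
        if c == '"':                      # copy string literals verbatim
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            out.append(text[i:j + 1])
            i = j + 1
        elif text.startswith("//", i):    # line comment: skip to end of line
            nl = text.find("\n", i)
            i = n if nl == -1 else nl
        elif text.startswith("/*", i):    # block comment: skip past */
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(c)
            i += 1
    return "".join(out)

doc = json.loads(strip_jsonc('{"a": 1, /* note */ "b": 2} // done'))
assert doc == {"a": 1, "b": 2}
```

After stripping, any standard JSON parser accepts the result, which is exactly the "very simple pre-lexer step" being described.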
JSON schemas exist and they’re ok for relatively simple things. For more complex cases I find myself wishing I could just turn Typescript into some kind of schema validation for JSON.
> For more complex cases I find myself wishing I could just turn Typescript into some kind of schema validation for JSON.
Not sure if this is what you're looking for, and whether it's powerful and expressive enough for your use case, but you can use typescript-json-schema¹ for this, and validate with eg ajv.
I've struggled with this in Java recently. At first I used Jankson, which supports the complete JSON5 spec, but later we figured out we could configure the standard Jackson JSON package to accept the things we actually need and use.
Seems to me that YAML just needs type/schema support to be less of a hurdle.
As an alternative, the encoding/decoding roundtrip using protobuf seems reasonable to me, catches the footgun of using floating-point version numbers (it becomes a parse error), whitespace/multiline concatenation being more obvious, and allowing comments (compared to JSON):
> Seems to me that YAML just needs type/schema support to be less of a hurdle.
Unfortunately YAML already got type support, which made it easier to round-trip, but also insecure: creating a type calls constructors with possibly insecure side effects, which was e.g. used to hack Movable Type.
JSON Schema is an official thing that exists and has implementations in all major languages. Personally I’m very glad that it’s an opt-in addition rather than a requirement.
I agree, but I would recommend JSON5 as the solution. Not YAML or this abomination.
JSON5 has many advantages:
* Superset of JSON without being wildly different. I know YAML is a superset of JSON but it's completely different too. Insane.
* Unambiguous grammar. YAML has way too many big structure decisions that are made by unclear and minor formatting differences. My work's YAML data is full of single-element lists that shouldn't be lists for example.
* Comments, trailing commas
* It's a subset of Javascript so basically nothing new to learn.
* It has an unambiguous extension (.json5). I think JSONC would be a reasonable option but everyone uses the same extension as JSON (.json) so you can never be sure which you are using. E.g. `tsconfig.json` is JSONC but `package.json` is just JSON (to everyone's annoyance).
* Doesn't add too much of Javascript. I wouldn't recommend JSON6 because it's just making the format too complicated for little benefit.
Unfortunately it doesn't really because of the extension issue I mentioned. Certain file names (like `tsconfig.json`) are whitelisted to have JSONC support, but any random file `foo.json` will be treated as JSON and give you annoying lints if you put comments and trailing commas in.
Tools that use JSON as configuration format could simply allow certain unused keys (e.g. all keys starting with #) and promise never to use them. Then author can write their comments with something like:
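A hypothetical sketch of that convention in Python, including the one-line filter a consuming tool would need (the key and value below are made up for illustration):

```python
import json

# Hypothetical convention: keys starting with "#" carry comments, and the
# consuming tool promises to ignore them.
raw = json.loads('{"#comment-1": "pin until v2 ships", "version": "1.10"}')
config = {k: v for k, v in raw.items() if not k.startswith("#")}
assert config == {"version": "1.10"}
```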
There's a lot of JSON tooling, and it's liable to interact badly with this. For example, a formatter might re-order the fields of a dict, moving "#comment-1" away from "version". Or the software that this JSON is for might error upon receiving unexpected keys (which is actually useful behavior, as that would catch a typo in an optional field).
Also, this doesn't let you put comments at the top of the file, or before a list item, or at the end of a line.
If you're going to change your JSON tooling to handle comments of some kind, you might as well go all the way to JSONC.
I've heard and read this multiple times. Why are you trying so hard to fit into a format that doesn't support comments out of the box? What advantages is JSON offering you that you're compelled to bend over backwards like this? It's exactly these kinds of workarounds that make it super difficult to stop such malpractices. It's just plain ugly. Please stop doing this.
You can't comment out a large section of config easily. For me, this is a relatively common use case for config files, so I take the position that JSON should be used for serialization only.
And I am just writing a JSON de/serializer to move my config from the current system to JSON. I worked on it today and yesterday and several days some time ago.
So you prefer the "good old" XML days? I'll take comment-less JSON over XML any day
(and it doesn't have to be comment-less... JSON with comments is a thing and VSCode has syntax highlighting for it - just strip out the comments before parsing).
Disclaimer: this is not a defense for YAML, I'm just trying to remove the rose tinted glasses some people view XML configs through.
As someone who has used XML configs they have a few problems:
- technical: missing comments are mentioned multiple times here so I will mention that while XML has comments they cannot be nested.
- socially: for some reason (maybe because XML is structured enough that this doesn't immediately collapse?) XML tends to just grow and grow. People start programming in XML too, and not only using XSLT or other standard approaches but also in completely proprietary ways.
At one project someone even wrote an authorization framework in Apache Tiles which allowed one to create roles using somewhere between 600 and 5000 lines of XML per role. The benefit was of course that you could update the roles without touching the Java code.
(In case it isn't immediately obvious: it would have been much simpler to edit it in Java, and people who know enough Java to fix it are available at the right price, whereas the XML system had to be learned on the job.)
Personally I just want it to be kept simple:
- a settings.local.ini and default settings in settings.ini or something to that effect
- if necessary, just use a code file: config.ts works just as well, or config.js if it needs to be adjustable at runtime without transpilation.
Not easy to read; it's the Java of config: pages of code that express very little, and by the time you find what you need, you've forgotten the context and what level of nesting you're on. It's also more wasteful as a transport.
It compresses pretty decently and doesn't have too much overhead, in the example being around 10% larger than JSON when compressed.
I'd argue that if one were to swap out JSON for XML within all the requests that an average webpage needs for some unholy reason, the overall increase in page size would be much less than that, because huge amounts of modern sites are images, as well as bits of JS that won't be executed but also won't be removed because our tree shaking isn't perfect.
Edit: as someone who writes a good deal of Java in their day job, I feel like commenting on the verbosity of XML might be unwelcome. I'll only say that in some cases it can be useful to have elements that have been structured and described in verbose ways, especially when you don't have the slightest idea about what API or data you're looking at when seeing it for the first time (the same way WSDL files for SOAP could provide discoverability).
However, it all goes downhill due to everything looking like a nail once you have a hammer - most of the negative connotations with XML in my mind actually come from Java EE et al and how it tried doing dynamic code loading through XML configuration (e.g. web.xml, context.xml, server.xml and bean configuration), which was unpleasant.
On an unrelated note, XSD is the one truly redeeming factor of XML, the equivalent of which for JSON took a while to get there (JSON Schema). Similarly, WSDL was a good attempt, whereas for JSON there first was WADL which didn't gain popularity, though at least now OpenAPI seems to have a pretty stable place, even if the tooling will still take a while to get there (e.g. automatically generating method stubs for a web API with a language's HTTP client).
How WSDL and the code generation around it worked, was that you'd have a specification of the web API (much like OpenAPI attempts to do), which you could feed into any number of code generators, to get output code which has no coupling to the actual generator at runtime, whereas Pyotr is geared more towards validation and goes into the opposite direction: https://pyotr.readthedocs.io/en/latest/client/
The best analogy that i can think of is how you can also do schema first application development - you do your SQL migrations (ideally in an automated way as well) and then just run a command locally to generate all of the data access classes and/or models for your database tables within your application. That way, you save your time for 80% of the boring and repetitive stuff while minimizing the risks of human error and inconsistencies, with nothing preventing you from altering the generated code if you have specific needs (outside of needing to make it non overrideable, for example, a child class of a generated class). Of course, there's no reason why this can't be applied to server code either - write the spec first and generate stubs for endpoints that you'll just fill out.
However, for some reason, model driven development never really took off, outside of niche frameworks, like JHipster: https://www.jhipster.tech/
Furthermore, for whatever reason formal specs for REST APIs also never really got popular and aren't regarded as the standard, which to me seems silly: every bit of client code that you write will need a specific version to work against, which should be formalized.
Same as why REST is not a hot thing anymore: the idea that your API is just a dumb wrapper around a data model is poor API design.
API-driven development didn't really take off either; that is, write your spec in gRPC/OpenAPI and have the plumbing code generated on both ends. It's technically already there with various tools, but because of dogma like "code generation is bad", the quality of code generators, or whatever reason, we're still writing "API code".
If this is your first time using Django, you’ll have to take care of some initial setup. Namely, you’ll need to auto-generate some code that establishes a Django project – a collection of settings for an instance of Django, including database configuration, Django-specific options and application-specific settings.
$ django-admin startproject mysite
Similarly, PyCharm doesn't seem to have an issue with offering to generate methods for classes (ALT + INSERT), such as override methods (__class__, __init__, __new__, __setattr__, __eq__, __ne__, __str__, __repr__, __hash__, __format__, __getattribute__, __delattr__, __sizeof__, __reduce__, __reduce_ex__, __dir__, __init__), implementing methods, generating tests and copyright information.
I don't see why CLI tools would be treated any differently or why code generation should be considered an anti-pattern since it's additive in nature and is entirely optional, hence asking to learn more.
First of all, just because a tool or project uses a pattern, it doesn't mean that it's a good idea. Second, code generation as part of IDE or one-time setup is something else.
I need to clarify: when I say that "code generation" is an anti-pattern, I'm talking about the traditional, two-step process where you generate some code in one process, and then execute it in another. But Python works really well with a different type of "code generation".
Someone once said that the only thing missing from Python is a macro language; but that is not true - Python has its own macro language, and it's called Python.
Python is dynamically evaluated and executed, so there is no reason why we need two separate steps when generating code dynamically; in Python, the right way is not to dynamically construct the textual representation of code, but rather to dynamically construct runtime entities (classes, functions etc), and then use them straight away, in the same process.
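A minimal sketch of that idea (the `make_model` helper and its field names are made up for illustration): instead of emitting source text and executing it elsewhere, build the class with `type()` and use it immediately in the same process.

```python
# Sketch: build runtime entities directly instead of generating source text.
# The make_model helper and its field names are invented for this example.
def make_model(cls_name, fields):
    """Create a record class at runtime with a type-checked __init__."""
    def __init__(self, **kwargs):
        for key, typ in fields.items():
            value = kwargs[key]
            if not isinstance(value, typ):
                raise TypeError(f"{key} must be {typ.__name__}")
            setattr(self, key, value)
    # type() is Python's runtime class constructor: no second process,
    # no textual code; the class exists immediately.
    return type(cls_name, (object,), {"__init__": __init__})

User = make_model("User", {"name": str, "age": int})
u = User(name="Ada", age=36)
```

The generated class behaves like any hand-written one, and there is no intermediate source file to keep in sync.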
Unless you're dynamically building hundreds of such constructs (and if you do you have a bigger problem), any performance impact is negligible.
> Someone once said that the only thing missing from Python is a macro language
Ahh, then it feels like we're talking about different things here! The type of code generation that i was talking about was more along the lines of tools that allow you to automatically write some of the repetitive boilerplate code that's needed for one reason or another, such as objects that map to your DB structure and so on. Essentially things that a person would have to do manually otherwise, as opposed to introducing preprocessors and macros.
Wait, it's the opposite. XML is designed to indicate context, and JSON is designed to hide it: you have a bunch of braces in place of context there, and no matter where you are, it's braces all the way down, like Lisp.
Not really; what enables you to keep the context is shorter code. It's useless to have context reminders at the top and bottom of the thing but not in the middle, and it's too damn long.
For me XML and YAML are about the same. I think I'd also prefer comment-less JSON over both. However, XML wasn't that bad. With a decent editor and schema validation I would say there's a good chance I was more productive with XML than I am with YAML.
It's simple. For config files, choose the format that has the best tooling in your company and that supports comments. For data transfer, choose one that supports schemas, backwards compatibility, and good tooling (protobufs is just the example I'm most familiar with).
Actually, yes, I do. XML syntax was far from stellar, and much of the ecosystem (e.g. XML Schema) was drastically overengineered... but even so, we had gems like RELAX NG to compensate. On the whole, it was better than the current mess.
My opinion only: I love JSON because it lacks so many foot guns of yaml. If you’re doing lots of clever stuff with yaml you probably want a scripting language instead. Django using Python for configs made me fall in love with this. Spending years with the unmitigated disaster that is ROS xml launchfiles and rosparams makes me love it even more.
Yaml and toml are fine if you keep it simple. JSON direly needs comments support (but of course wasn’t designed to be used as a human config file format so that’s kind of on us). And not just “Jsonc that sometimes might work in places.”
Beyond that, I think we generally have all the things we need and I don’t personally think we need yet another yaml. =)
These aren't foot-guns per se, but I can think of another handful of grievances I have with JSON:
* JSON streaming is a bit of a mess. You can either do JSONL, or keep the entire document in memory at once. I usually end up going with JSONL.
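For what it's worth, JSONL is trivial to consume incrementally; a minimal Python sketch (the sample records are made up):

```python
import io
import json

# JSONL: one JSON document per line, so each record can be parsed
# as it arrives instead of holding the whole document in memory.
stream = io.StringIO('{"id": 1}\n{"id": 2}\n{"id": 3}\n')

ids = []
for line in stream:
    record = json.loads(line)  # each line is an independent document
    ids.append(record["id"])
```

The same loop works over a file or a socket, which is exactly why JSONL tends to win for streaming.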
* JSON itself doesn't permit trailing commas. I can measure the amount of time that I've wasted re-opening JSON files after accidentally adding a comma in days, not hours.
* JSON has weakly specified numbers. The specification itself defines the number type symbolically, as (essentially) `[0-9]+`. It's consequently possible (and common) for different parsers to behave differently on large numbers. YAML also, unfortunately, has this problem.
* Similarly: JSON doesn't clearly specify how parsers should behave in the presence of duplicate keys. More opportunity for confusion and bugs.
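Both of the last two points are easy to demonstrate with Python's stdlib parser; other parsers may legitimately behave differently, which is exactly the complaint:

```python
import json

# Duplicate keys: Python's parser silently keeps the last value;
# other parsers may keep the first, or reject the document outright.
dup = json.loads('{"a": 1, "a": 2}')

# Large integers: Python parses them exactly, while a JavaScript
# engine would round 2**53 + 1 to the nearest representable double.
big = json.loads('{"n": 9007199254740993}')
```

The document above is "valid JSON" everywhere, yet two conforming parsers can disagree about what it means.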
Running prettier (https://prettier.io) on each save will fix trailing commas for you. If you accidentally have one, it will just sneakily remove it and turn your document into one that is valid.
It may have been a good or a bad decision, but comments were intentionally left out of JSON to avoid obvious ways to sneak in parsing directives, and thus incompatibilities between different JSON parsers.
If I had a penny for every time someone tried to parse XML with a regex, if that even classifies as a parser. Those are 100% incompatible with everything else.
Easiest way to demonstrate how wrong that is, is to throw in a comment in the example document ;)
the funny thing is that json doesn't even need commas, they essentially act as whitespace, any amount or no amount would make no difference in the meaning of the document.
And the flip side of that with YAML is you can stream it, but you don't know once you've gotten to the end if it was the whole document without some user defined checksum mechanism.
Ran into a great bug with the INI format which has the same issue. The application would read the config file on modification but if you just wrote over the file it would sometimes read the config before the file was fully written. Have to use a temp file and move it rather than just edit it.
I believe that's only true if one were to load YAML via the "SAX"-style per-event stream, and not the "object materialization" that normal apps use (aka `yaml.load_all` or JAX-B objects) since in those more data-object centric views, where would one put the processing events for those markers?
I also originally expected `yaml.parse(...)` to eat them as it does for comments and extraneous whitespace, but no, it does in fact return dedicated stream events for them, so TIL
> Django using Python for configs made me fall in love with this.
I also started advocating in-language configuration files (Python for Python, but also Lua for Lua, etc) a number of years ago because it lets you do really useful things (like functionally generating values, importing shared subsets of data, storing executable references, and ensuring that two keys return the same values without manual copy/paste) all without needing to spec and use Yet Another Thing™ that does only a fraction of what the programming language you're already using already does.
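A sketch of what that buys you, as a hypothetical `settings.py` (all names and values here are illustrative, not from any real project):

```python
# Hypothetical settings.py: configuration as ordinary Python.
import os

BASE_DIR = os.path.join("/srv", "app")        # functionally generated value
DATA_DIR = os.path.join(BASE_DIR, "data")     # derived, no copy/paste
WORKERS = max(2, os.cpu_count() or 1)         # computed at load time

# A shared subset reused by two keys, guaranteed identical:
_default_timeouts = {"connect": 5, "read": 30}
HTTP_TIMEOUTS = _default_timeouts
GRPC_TIMEOUTS = _default_timeouts
```

None of this needs a templating layer or a second format; it's just the language you already use.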
That also implies that you can't just test a foreign config file without first reading and understanding what it does, as just using one would imply arbitrary code execution.
This is a place where Tcl excels. You can easily create restricted sub-interpreters that can't do anything dangerous. If you need more power for trusted scripts you just reenable selected commands.
JSON5 is the way to go. It supports comments and trailing commas. Unfortunately it's going to be difficult to supplant legacy JSON, which is so pervasive.
Except parsing JSON5 in the browser is super slow. Native JSON.parse doesn't support it, non-native parsers are slow, and the only fast way to parse it is `eval()`.
The desire to use a single interchange format for all data is the problem. There are plenty of reasons to support comments and minor syntax issues that JSON itself dislikes for human consumable and interactive JSON. I'd think software JSON could be just that.
I’ve never liked YAML. For whatever reason, it always feels like working in a minefield. It comes from the same cargo cult of people who think the problem with human-machine formats is that they need to be “clean”.
Clean, of course, to them means some bizarre aesthetic notion of removing as much as possible, taken to an extreme. I wonder if the same people also think books would be better with all punctuation removed to make them look “clean”?
It’s unhealthy minimalism, and it causes more problems than it solves. As soon as I see a project using YAML I cringe and try to find an alternative, because god knows what other poor choices the developer has made. In that sense, YAML can be considered a red flag, and I’m usually right. The last project I used that adopted an overly complex and build-breaking YAML configuration syntax had other problems hiding under the covers, and in some cases couldn’t parse its own syntax due to YAML’s overly broad but at the same time opinionated syntax.
By its very name (and the fact that the MEANING of the name flip-flopped in mid-flight after launch) you can tell that the designers of YAML had no clue what they were doing, because originally they named it "YAML" for "Yet Another Markup Language", when it clearly was NOT a markup language.
Only AFTER YAML had been around and in use for a few years did those geniuses actually realize that they had made a mistake in naming it something that it's not, and retroactively changed the name "YAML" to mean "YAML Ain't Markup Language", which was a too clever by half way of whitewashing the fact that they originally CLAIMED it was "Yet Another Markup Language", since they had no idea what a markup language actually was.
I prefer to use markup languages and data definition languages that were designed by people who are situationally aware enough to know what the difference between a markup language and a data definition language is, please.
Hard pass on YAML, whatever it stands for this week.
I've often heard this argument about YAML being "clean", but over time I have realized that people are conflating minimalism with cleanliness, when they are two different things. That realization is what it took for me to understand why I didn't like it. I did _not_ find it clean; I found it "messy" by virtue of the increased cognitive overhead. It is minimal, at least compared to other formats, but other formats appear cleaner to me.
I'll give my opinion as someone who had to choose among JSON, XML, TOML, and YAML about two years ago for a new project. Whatever I chose had to be easy for end-users who don't know the specification to understand later.
Here were my thoughts on the options.
JSON - No comments -> impossible
XML - Unreadable
YAML - 2nd place. Meaningful indentation also made me worried someone would not understand why their file didn't work. The lack of quotes around strings was frustrating.
TOML - 1st place. Simpler than YAML to read & parse. It truly seems 'obvious' like the name says.
I haven't encountered any situations where I wish I had more than TOML offers.
I have nesting up to three levels deep. I use inline tables for the many innermost (or other few-element) tables. It's never seemed excessively verbose.
It isn't. YAML and JSON are much more proven than HCL. HCL is used for some relatively small products. Just making something more complicated doesn't make it better.
Proven in what sense? Several implementations are broken or incorrect. HCL is used in very large products as well. Just because it isn't the majority choice currently doesn't mean it isn't a worthy one. HCL isn't more complicated if used as an alternative to YAML or JSON; in fact, I would argue that it is simpler. It combines the pros of YAML and JSON and addresses the nested complexity of TOML. It really is, IMO, the best, but you are of course free to hold a different opinion. However, I would encourage you to actually try it out and re-evaluate.
Why are unquoted keys so critical? I feel like one of the strengths of a DDL like JSON or XML is that it's easy to tell what the data (key-value pair or otherwise) is, while with YAML and others, understanding data-vs-structure can be challenging.
TOML can't decide if it's a super INI file or a JSON cousin. You can represent the same information using two completely different representations and you can mix both styles in the same document. Manually navigating and editing values is error prone and hard to automate.
In that case, you might want to have a look at JSON5: https://json5.org/
It is pretty niche, but attempts to improve upon JSON in a multitude of ways, one of which is the support for comments: https://spec.json5.org/#comments
A lot of people have really strong opinions towards syntax things like YAML vs JSON vs XML, HTML, even programming languages. I think at some point we assign way too much importance to this kind of stuff.
I recently read a piece by Joel Spolsky that resonated with me (even though my career is not nearly as long as his).
> I took a few stupid years trying to be the CEO of a growing company during which I didn’t have time to code, and when I came back to web programming, after a break of about 10 years, I found Node, React, and other goodies, which are, don’t get me wrong, amazing? Really really great? But I also found that it took approximately the same amount of work to make a CRUD web app as it always has, and that there were some things (like handing a file upload, or centering) that were, shockingly, still just as randomly difficult as they were in VBScript twenty years ago. [0]
It makes me wonder if we're really focusing on the right stuff. Maybe there's lower hanging fruit somewhere that's more valuable than focusing on fundamentally subjective things like syntax.
It’s a deterministic subset of Python. This means that if you have complex or repetitive configurations, you can use loops and functions to structure them. But it’s impossible to write an infinite loop or recursion.
I really came here to search for why EDN wasn't mentioned. It is used a lot in Clojure/ClojureScript/hylang projects. It is a superset of JSON and, in my opinion, a lot more readable than JSON, yet familiar enough. It has native sets, e.g. #{1 2 "three" '("four element list with a string inside")}, and keywords. Tagged elements can be used for extension, e.g. with a timestamp (such as the built-in #inst) or #uuid. And it also supports comments, and discards for stuff that should be omitted in evaluation.
IMO using code that generates (possibly binary/opaque) config data is the sweet spot. It's one more layer of indirection, but it means you're language-agnostic, you have a "safe" interface, and your "config-generating" process can be as expressive as you like -- comments, loops, whatever.
The underlying conundrum is:
- systems need to be configured,
- human-readability is obviously necessary at some level,
- configuration is often very "compressible" (needs loops, needs variables to be maintainable), but
- system-writers don't know the structure of your data, the axes on which you'd want to compress things, the best abstractions for you.
Templating languages are an obvious direction, but they're uniformly bad. If they have limited expressiveness you'll run into the limits. Maybe there are templating languages with good unit testing frameworks, but I haven't seen them. "Look at the expanded diff" doesn't scale. And generating gobs of human-readable "data" (in a format that supports comments!) is very wasteful.
> Starlark is a dialect of Python. Like Python, it is a dynamically typed language with high-level data types, first-class functions with lexical scope, and garbage collection.
If it has first-class functions, how can you avoid infinite recursion? Like, what stops me from running the omega combinator in it? This is why Meson (a similar language) does not allow those kinds of shenanigans, to keep the language non-Turing-complete.
Interesting! I started using jsonnet this year, but found that the language was needlessly quirky (e.g. the `::`, purely functional aspect, and no one wants to learn a new language to write configuration in the first place). More importantly, it is extremely slow (lazy evaluation without memoization...): rendering the Kubernetes YAML of my 5-container app taking over 10 seconds...
> It’s a deterministic subset of Python. This means that if you have complex or repetitive configurations, you can use loops and functions to structure them. But it’s impossible to write an infinite loop or recursion.
Starlark is indeed deterministic and guaranteed to terminate (the Go implementation has a flag that allows recursion, but it's off by default), but these are two orthogonal properties.
So one thing I wasn't sure of is: if you have a Starlark program, how is its value decided? Is it simply the value of the last expression? And where does the print output end up? Is it just for diagnostics, with no influence on the value?
I like INI. It's simple, it's readable, and it leaves the data types up to the application to interpret. It's also really easy to parse: I can work out how to do it myself, whereas JSON is beyond me.
I like CSV (and similar delimited files) it's less verbose than anything else for tabular data.
I like JSON for data transfer, you know the data types, it's succinct, and readable.
iirc, it's hard to do any nested structure in INI - you'd have to do a convention like putting prefixes and dots in the name of the entry to denote hierarchy.
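Right; a common workaround is dotted section names, which e.g. Python's `configparser` will happily read, though the "nesting" is purely a naming convention (the section names and keys below are made up for illustration):

```python
import configparser

# Hierarchy by convention only: dotted section names.
ini = """
[server]
host = localhost

[server.tls]
cert = /etc/ssl/cert.pem
"""

cfg = configparser.ConfigParser()
cfg.read_string(ini)
# "server.tls" is one flat section name; configparser knows nothing
# about the dot, so the application has to interpret the hierarchy.
```

Because the parser sees only flat sections, any tooling that wants true nesting has to split on the dots itself.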
Exactly what I think about the matter. Sometimes I use proprietary binary formats together with UDP where performance is critical (game servers for example).
I have to say I hate the fact that I have low confidence when editing YAML that the result will be what I intend. It's kind of the number one job of such a format. And I routinely run into people using advanced features and then I have no idea at all how to safely edit it. It is interesting that it seems so difficult to pick a good tradeoff between flexibility and complexity with these kinds of languages.
I just stick to XML unless forced to use something else.
Schema validation, code completion on IDEs, endless amount of tooling including graphical visualisation, a language for data transformation and queries, and.... wait for it...
comments!
If you're going to use XML, I would consider it mandatory to also use XSDs (W3C XML Schemas).
XSDs is something I think people need to pay more attention to when dealing with XML; the type system that the W3C XSD standard lays out (when used effectively) really does relieve much of the pain that people experience with XML.
Visual clutter, familiarity to non-coders. Curly braces are almost never used outside of programming and are ugly to boot.
My benchmark for yaml/JSON alternatives is "how would I feel explaining it to a busy, sceptical client?"
If the intended audience is purely developers, then sure. JSON (with the addition of comments and trailing commas) is just fine.
White space has the additional advantage of agreeing with itself. Other demarcations can have issues where the indentation and the structure contradict each other.
Again the dreaded Cobol argument. We had to struggle with a lot of this in the past: Cobol, SQL, YAML, BDD. All this would be much easier without this nonsensical idea that nontechnical people will read code. They won’t. Making code a bit more like prose doesn’t make it readable for nontechnical people. Yet we again and again make our life harder - ugly syntax rules, no code completion, no auto-formatters.
Please stop making code easy for non-coders. They don’t want to read it. They never did. They just want this damn box to work.
As a counter-argument: I work in robotics, where many operators will look at and change settings in a YAML file during testing. They have no software skills outside of this.
My educated guess then is you could have gotten them to change "settings" in C, Java or basically anything.
Just put the file in the root folder, and keep it as simple as possible and you should be fine? I mean, if they manage to write yaml correctly and consistently C is no match?
The reason the situation is the way it is now is precisely because the code being made easy for non-coders increased the popularity and reach of the products. Probably because non-coders also found it easy to pick up and start working with it.
I haven't found "this needs to be indented exactly the right amount or it won't work" to be much easier for non-programmers than "this needs to be enclosed in braces or it won't work." Most people have at least experienced parentheses in math (albeit maybe decades ago), so it's not an entirely foreign concept. Either one requires a bit of learning, but I think most people are capable of it, so any improvement in non-coder familiarity seems minor at best, vs. the very real costs.
Counter-argument - why do programmers insist on clear indentation if it doesn't aid readability? The indentation is there for humans and the braces are there for the compiler.
>My benchmark for yaml/JSON alternatives is "how would I feel explaining it to a busy, sceptical client?"
My benchmark is this: can an autoformatter do its job every time without breaking something that's technically working right now but possibly formatted wrong?
Every data format that cannot comply with this contains in it a huge waste of time. Even as a python programmer, I extend the same rule to programming languages.
My googling didn't turn up anything, but are there any case studies showing it's more readable? I'm happy to accept that it is, but I can't help wondering whether research has been done, or whether it's mostly gut feeling / anecdotes / aesthetics.
I'm not clear exactly what the "it" here refers to but as I mentioned in other comments it's fairly self-evident that indentation is easier to visually parse than braces. A simple thought experiment - would you find it easier to skim read code where the indentation was consistent with the bracing or where it was inconsistent? Your brain registers the indentation first and you only resort to counting braces if there's a reason to doubt the former.
Reality often runs counter to our expectations, though, which is why I wondered whether brace-free syntax has actually been shown to be simpler to use and understand.
That's just display though, if you have to show it to a skeptical client, why not run it through a browser that shows it without braces? It's the same as showing a webpage instead of the html.
If the HomeAssistant subreddit is anything to go by its their biggest complaint (HA configuration is in YAML).
With that said, if they weren't complaining about whitespace they'd be complaining about missing semicolons, missing/extra commas, missing equals signs, missing closing )]}, or whatever.
Are you arguing "braces are easier to get right than indentation" or is this a point specific to YAML's rules? Because I'm not defending the latter but I find it hard to understand I would need to argue against the former.
YAML is a superset of JSON. In other words, any syntactically valid JSON file is a valid YAML file. If you want braces like JSON, but not quoted strings, YAML supports that.
I'm not buying that you genuinely have a target audience of "I trust this person with config files but their eyes are too gentle to see a curly brace".
Well. I've genuinely had clients editing YAML so there's that.
I can definitely think of a broad range of people where I'd be happy to recommend they use text files for config and data but I wouldn't be happy if those text files needed to follow the rules of JSON syntax.
I mean - to some extent I would rather not edit JSON. It's not a terribly ergonomic experience. If I had to design a format for my own use, it would be indentation based and probably look a little bit like YAML, Markdown or similar.
But I find it incredibly annoying to estimate indentation when lines are wrapped in an editor (or webpage). Or, to a lesser extent but still throws me off, when multiple blocks end at the same line. Or when pasting blocks into another block, and having to double check to make sure the indentation was carried over correctly. I like editors that visually show indentation characters.
It really doesn't matter. Just force the leading indent to be exactly the same bytes. If the indent moves between two values where one isn't a prefix of the other, raise an error.
No it isn't. That is the point. If you have one line indented with 4 spaces and one line indented with a tab, there is no correct answer for what the difference in indent is. The only good option is to raise an error.
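The byte-prefix rule being discussed is only a few lines of code; a sketch (the helper name is made up):

```python
# When indentation changes between two lines, one indent string must
# be a byte prefix of the other; otherwise (e.g. spaces vs. a tab)
# there is no well-defined depth difference, so we raise.
def check_indent_step(prev: str, cur: str) -> None:
    if not (cur.startswith(prev) or prev.startswith(cur)):
        raise ValueError("inconsistent indentation (mixed tabs and spaces?)")

check_indent_step("    ", "        ")  # deeper by four spaces: fine
check_indent_step("    ", "")          # dedent to the left margin: fine
```

Feeding it four spaces followed by a tab raises, which is exactly the "only good option" described above.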
Looks confusing to who? Another coder? Then they have issues.
I get the original was talking about client facing config files. I'd rather see INI style config files personally.
If you're writing ugly code, braces or spaces won't save you. Just don't write ugly code. Write it like the next person to view your code is an axe murderer who knows where you live, so don't make them mad. You can minify later.
I was surprised the first time I saw Daniel J. Bernstein's qmail configuration. Qmail uses separate configuration files for each parameter being set. The directory /var/qmail/control contains most of these files.
For example, to set the maximum message size to be 10 MB and the timeout to be 30 seconds:
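If I remember the qmail docs correctly, the relevant control files are `databytes` (maximum message size in bytes) and `timeoutsmtpd` (SMTP timeout in seconds); a sketch, using a scratch directory in place of /var/qmail/control so it can run anywhere:

```shell
# Scratch directory standing in for /var/qmail/control in this demo.
ctl=$(mktemp -d)

# One file per parameter, one value per file:
echo 10000000 > "$ctl/databytes"    # max message size, in bytes
echo 30 > "$ctl/timeoutsmtpd"       # SMTP timeout, in seconds

cat "$ctl/databytes"
```

Reading a setting back is just `cat`, and changing one is just `echo`, which is the whole appeal of the scheme.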
There are many more files like this that hold simple values. /var/qmail/control/locals is a file that is a list of domain names, one per line.
Dictionaries are just subdirectories with one file per entry, for example this is how aliases are defined to qmail:
echo fred > /var/qmail/alias/.qmail-postmaster
echo fred > /var/qmail/alias/.qmail-mailer-daemon
See [1] for more about qmail.
DJB also created a simple, portable encoding for serializing data called netstrings, see [2]. XML, YAML, JSON, TOML, and INI files all have some advantages over netstrings, but netstrings are simple to understand and simple to parse correctly.
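A netstring is just `<decimal length>:<bytes>,`; a minimal encoder/decoder sketch in Python:

```python
# Minimal netstring codec. Wire format: b"<decimal length>:<payload>,"
def encode_netstring(data: bytes) -> bytes:
    return str(len(data)).encode("ascii") + b":" + data + b","

def decode_netstring(buf: bytes) -> bytes:
    length, sep, rest = buf.partition(b":")
    if not sep:
        raise ValueError("missing length separator")
    n = int(length)
    if rest[n:n + 1] != b",":
        raise ValueError("missing trailing comma")
    return rest[:n]

wire = encode_netstring(b"hello world!")  # b"12:hello world!,"
```

Because the length comes first, a reader knows exactly how many bytes to consume before even looking at the payload, which is why netstrings are so easy to parse correctly.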
My opinion: I can live with YAML and JSON. TOML or tjson if I have to. XML with a gun to my head. But I don't want yet another markup language (ironically, that's what YAML originally stood for).
What I want from YAML (or a competitor) is access to the concrete syntax tree.
For one of my art projects I make YAML files that describe the front side, back side, and web side of a "three sided card". I generate these out of several templates, currently using ordinary string templating.
I'd love to be able to load a YAML file and add something programmatically to the list and have the list stay in the same format that it was in, so if it was a
You are not the only one. But even if you find a library for YAML AST transformations in your language, whatever other language uses your YAML probably doesn't have one.
E.g. I tried exactly the same thing, and it was quite difficult in Rust, because the usual way to parse YAML is with serde, and that of course discards the AST.
In the end I gave up, and just used JSON for my use case.
I was suspicious of YAML from day one, when they announced "Yet Another Markup Language (YAML) 1.0", because it obviously WASN'T a markup language. Who did they think they were fooling?
XML and HTML are markup languages. JSON and YAML are not markup languages. So when they finally realized their mistake, they had to retroactively do an about-face and rename it "YAML Ain’t Markup Language". That didn't inspire my confidence or look to me like they did their research and learned the lessons (and definitions) of other previous markup and non-markup languages, to avoid repeating old mistakes.
If YAML is defined by what it Ain't, instead of what it Is, then why is it so specifically obsessed with not being a Markup Language, when there are so many other more terrible kinds of languages it could focus on not being, like YATL Ain't Templating Language or YAPL Ain't Programming Language?
>YAML (/ˈjæməl/, rhymes with camel) was first proposed by Clark Evans in 2001, who designed it together with Ingy döt Net and Oren Ben-Kiki. Originally YAML was said to mean Yet Another Markup Language, referencing its purpose as a markup language with the yet another construct, but it was then repurposed as YAML Ain't Markup Language, a recursive acronym, to distinguish its purpose as data-oriented, rather than document markup.
>In computer text processing, a markup language is a system for annotating a document in a way that is syntactically distinguishable from the text. The idea and terminology evolved from the "marking up" of paper manuscripts (i.e., the revision instructions by editors), which is traditionally written with a red or blue pencil on authors' manuscripts. In digital media, this "blue pencil instruction text" was replaced by tags, which indicate what the parts of the document are, rather than details of how they might be shown on some display. This lets authors avoid formatting every instance of the same kind of thing redundantly (and possibly inconsistently). It also avoids the specification of fonts and dimensions which may not apply to many users (such as those with varying-size displays, impaired vision and screen-reading software).
> YAML is considered by many to be a human friendly alternative to JSON
I'm not disagreeing with the author here, but as someone old enough to remember the rise of XML as a data transmission format (and Erik Naggum's masterful rant against it[0]), it's strange because historically speaking both XML and JSON were also popularized as more "human readable".
I would be curious how many HNers (and even more so newer developers outside the HN-o-sphere) have worked extensively with or even written parsers for binary (or otherwise non-human readable) file formats. Writing an MP3 metadata parser used to be a standard exercise for devs looking to level up their programming skills a bit.
It personally feels weird to me that we keep pushing for more "human readable" data formats when the world is increasingly removed from one where non-programmer humans need to read data. Keep your data in whatever format makes sense and let software handle transforming it to a more readable or more efficient format depending on the needs, even if humans can't read it (they shouldn't need to!).
On top of all that my experience has been that JSON leads to more atrocities than XML (while fully agreeing with all of Erik Naggum's points about that) and YAML creates even worse horrors than JSON. It seems we'll soon be approaching eldritch horrors if we continue to pursue human readable data exchange formats.
As an embedded sw dev working on things that interface with legacy devices, I have written lots and lots of binary parsers (as well as serial, net, and ipc protocols). I've also reversed some binary formats used by games, etc.
I like binary formats for things that are simple and don't change too often. However, I still love not having to waste days on studying yet another bespoke binary format & parser for things that are complex and don't work right for whatever reason. So when performance isn't a concern and you aren't working in a size-constrained environment, I do find that "human readable" formats are often worth it.
As a practical example, I recently hit a bug where KiCad moved some custom footprints' pad shapes around after saving & reloading. And I quickly discovered that the footprint files are just S-expressions and relatively self-descriptive so I fixed my issue in five minutes with vim without ever needing to look at docs or code. That kind of thing is super convenient. Later I discovered that other users are likewise working around the program's limitations using a text editor or custom scripts to manipulate things KiCad won't do for you; for example, to create a repetitive pattern of components in a layout more complicated than a grid.
I don't understand this. YAML has limitations. All formats have limitations. If a format is too limiting, don't use it. Pick one more suitable, or come up with another one, like NestedText (or whatever). What is this need to tell everyone else to "move on" from using some format because it doesn't suit your specific preferences or use case?
The person that creates the config file does not necessarily choose the config file format. In the example, github chose YAML and everyone using github actions must use it. YAML is error prone, as everyone who tests with Python is finding out as they add Python 3.10 to their regression tests. This is a plea to organizations like github to stop choosing YAML.
As not a Python programmer, I'm struggling to understand the issue.
Your (or blogger's) claim is that GitHub misparses the YAML actions config when it comes specifically to Python? Or that YAML is inherently inadequate to the task of representing Python's necessary actions?
But now we discover that 3.10 is parsed as a float, just like the other versions were. The problem is that 3.10 becomes 3.1, the equivalent float value! With 3.9 we never noticed this problem.
I'm not sure what the name for this problem is.
It's something like.. an unfaithful but unintentionally working representation. Until it doesn't work anymore. The solution in YAML is to quote the values so that they become strings as intended.
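The round trip is easy to see with PyYAML as a stand-in for whatever parser GitHub uses (an assumption; the specific parser isn't named in the thread):

```python
import yaml  # PyYAML; its default resolver follows YAML 1.1 rules

doc = """
python_versions:
  - 3.8
  - 3.9
  - 3.10
"""
parsed = yaml.safe_load(doc)
print(parsed["python_versions"])  # [3.8, 3.9, 3.1] -- 3.10 collapsed to 3.1

# Quoting forces the values to stay strings, which is the fix:
quoted = yaml.safe_load('python_versions: ["3.8", "3.9", "3.10"]')
print(quoted["python_versions"])  # ['3.8', '3.9', '3.10']
```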
How is that different from other formats? Take JSON:
{"python_versions": [3.8, 3.9, 3.10]}
This is a problem in any config language that doesn't enforce types, and if it does enforce types, you should've used quotes already (like you really should've been before 3.10 was added to the list).
Similar problems also exist in many config formats with scientific notation (2e345) or hexadecimal notation (0x12345) or octal notation (012345, no that's not a decimal number in many programming languages and config formats!).
What commonly supported alternative would you suggest for this use case?
I don't understand this. Opinions are not universal, every blogger has limitations. If you don't agree with a HN title, don't read it. Pick the next one or read another site. What is this need to tell everyone that you don't like them to tell everyone to "move on" just because it doesn't suit your specific preferences in formulation?
I see what you did there, but it comes off as mocking and not so clever.
The difference between my post and that blog, and your reply for that matter, is that I'm not telling anyone not to read it. I'm inviting you to comment on why people feel the need to tell others to move on.
YAML's lack of limitations is the source of much of the difficulty with the format. The numerous ways to represent basic data (arrays, strings etc) is a common source of error. YAML doesn't have enough limitations!
And you can't pick what config format a tool you need uses.
But GitHub specifically has very specific guidelines on what to write and how to format it, and will give pretty detailed error messages if it's badly formatted. Not to mention tooling to format and highlight. I'm not seeing how the proposed NestedText is inherently free from those same issues: the need for tooling, guidance, error messages. Is the claim that it's easier?
I think the important difference between NestedText and YAML is that NestedText does not try to convert the text to numbers or booleans. YAML converts `on` to True and `3.10` to 3.1, which in this case is undesired. NestedText keeps text as strings. The idea is that the end application should be the one that determines if `on` is a string or a boolean and whether `3.10` is a string or a number.
It's all in the name. All leaf values are strings. It is literally nested text.
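The `on` case is easy to reproduce with PyYAML, whose default resolver still applies the YAML 1.1 boolean rules (NestedText would hand the application the literal string "on" instead):

```python
import yaml  # PyYAML resolves YAML 1.1 booleans: yes/no, on/off, true/false

event = yaml.safe_load("push: on")
print(event)  # {'push': True} -- "on" silently became a boolean
```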
I post this pretty much every time this topic comes up:
JSON5 exists, and is quite nice. I've picked it up for configs on a work project and haven't once had an issue due to misconfiguration, unexpected parsing, or friction with leaving a trailing comma or a comment.
The nesting in JSON5 is simple and familiar to pretty much all programmers, unlike deep nesting in TOML which is a huge pain.
JSON5 keeps coming up in these discussions, and I've personally had a great time with it. Hopefully some larger projects pick it up and it eventually becomes a common occurrence, or something.
I like it so much I got motivated enough to start making a sublime text highlighter for it. I got a bit lost though, having never made one before.
And then I tried to use a tool called SBNF to write the grammar for the language at a high level and have it spit out Sublime Text syntax highlighting code. Didn't quite work yet unfortunately.
The introduction keeps citing "no need for escaping or quoting" as a major advantage, but provides no examples of what a key with a colon, or value beginning with "[", or any datum with leading or trailing whitespace would look like.
Also, the changelog is quite frightening!
> [In 3.0], `[ ]` now represents a list that contains an empty string, whereas previously it represented an empty list.
This made me curious to find out. The "Language introduction" docs [1] answer these points:
* keys containing leading spaces, newlines or colons can be represented with the multiline key form, where each line of the key starts with `: `.
* leading or trailing space is not complicated; the string values are just the rest of the line after the separator from the key, `: `. The values are not trimmed.
* a string value beginning with `[` just works in most places. This would not be confused with list values, as those only start after a new line. Only the compact inline list and inline dict forms restrict certain characters for syntax.
It seems that their claim, no escaping required, holds. The slightly more verbose form of the language constructs may be required to represent special values though.
Looking at the comparison examples between TOML and YAML/NestedText, I fail to see how anyone can look at the YAML/NestedText and think "yeah, this is way easier to read and reason about than TOML".
I'm not even a Rust person. I've never worked in Rust in my life, so there is no "preference bias" in my comparing the two. I just don't find YAML, or this "improvement" as "human-readable" as people make out to be.
I've tried dhall, cue and jsonnet, and cue is so far my fav. It's very well designed, expressive, but restrictive enough so that config files don't look like scripts.
The way it blends types and values makes learning it super easy, yet you can do complex things in a few lines.
But the main implementation exports to yaml without quoting the strings, which kinda defeats the purpose :(
I’m kind of unsure about the way CUE achieves reuse: if I understand correctly, you have files in a directory tree and the (result of processing the) bottommost files are the things you’re supposed to point your consuming tools at. So there’s no way to share structure among a collection of items if that collection is nested inside your config, the only operation available is essentially the generation of a set of similar but separate configs. Or am I wrong here? I’d very much like to be.
(Also, the type system is absolutely delicious, but it badly needs a paper with a complete description. I’m extremely interested in how it works, but fragmentary “notes on the formalism underlying” CUE are not enough.)
CUE is based on Typed Feature Structures, which predate Deep NLP, and for which there is limited literature. We do need a good writeup on the theory. I've written a bit here: https://cuetorials.com/cueology/theory/
Think of a graph with lots of attributes within which paths are searched for.
You do have imports and functions so you can reuse what you want.
The doc is also quite clear and rich, but the way it's organized means I have to read it entirely before writing my first CUE file. It also lacks IRL examples, so trial and error was my best friend.
I actually use CUE for large configuration files, used YAML before and had many issues once configurations became larger and more complex. It validates and exports JSON, which is easily readable in C++ and Python :-) have been happy doing the switch
I'm working on my own (you can start the attack hahaha). I feel there aren't many simple, generic languages that let you write simple DSLs with embedded documentation. The self-documenting part is still missing, but you can take a look and say what you think. It's kind of like the YAML format (a bit TOML) with a schema and the possibility to merge multiple files with smaller chunks of the data. With export to JSON and YAML. https://github.com/dadlang/dadl
I have a client that uses a CMS of unknown origins. I just get stuff placed in an s3 bucket, and then attempt to parse what was provided. 100% of their YAML files are invalid by every single linter I have found/tried. Not one of them understands where the error is occurring to help debug. It literally just says invalid. I'm at a total loss. My head doesn't think YAML. Does a string need quoting or not? Trailing spaces at the end cause problems? My personal experience with YAML is limited, but it hasn't been pleasant.
I'm glad to see people experimenting with alternative document/object representations, but this one might be a hard sell: based on the README[1], it only has Python, Zig and Janet implementations so far. One of the nice things about YAML (and JSON, TOML, etc.) is that they have decently mature C, C++, or Rust libraries that other languages bind to.
Sure, but why move to an alternative that's almost as bad?
YAML's problem is that whitespace is significant. TOML could be superior to it if it weren't for the fact that they forgot to forbid indentation. And now indented TOML is everywhere, including its wikipedia page.
If we have to make a change, why not finally bite the bullet and go to the form that has existed for decades and is obviously superior to all of these formats? S-expressions. There's even been a standard for data notation brewing for some time: https://github.com/edn-format/edn
Then we can actually forego http://xkcd.com/927 and do something useful with our significantly saved mental energy.
edit I see that I'm not at all alone in wanting edn to replace all this crap. So some action points on how to actually make that happen, in order of preference:
- write or improve robust edn parsers for your ecosystem
- write or improve robust x => edn converters for your ecosystems (x=yaml,json,toml,whateverpoisontheyuserightnow)
One interesting demonstration of YAML's complexity relative to JSON is that YAML is almost a complete superset of JSON. This is acknowledged by the authors of the YAML spec.[0]
For example, the following code translates JSON to YAML using only the Python yaml library:
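The snippet itself didn't survive in this thread; a minimal reconstruction of the idea, using PyYAML (an assumption about which "Python yaml library" was meant), would be:

```python
import json
import yaml  # PyYAML

data = {"name": "example", "versions": ["3.9", "3.10"], "enabled": True}
json_text = json.dumps(data)

# Because YAML is (almost) a superset of JSON, a YAML parser can
# read the JSON text directly...
parsed = yaml.safe_load(json_text)

# ...and re-emit the same data in block-style YAML.
print(yaml.safe_dump(parsed))
```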
I wrote a program at a corporate job where all the configuration is in Excel files. Tables are just fed into a dictionary and columns on each worksheet are predefined to hold the keys. People loved it because they know how to use excel and “text” is scary. (This is all very strange because they are just entering text in Excel, but the familiarity goes a long way)
I _recently_ suffered through a meeting where we developers were told to use the _new_ testing framework some team at our corp created. It's written in Java (we use .NET exclusively in our branch), configured via Excel sheets and Java, and exports results also as Excel sheets.
Whoever thought this was a good idea in 2021 has to be braindead. But the CEO was pleased. Probably because they know Excel.
Have spent many years developing dev tools that use YAML and alternatives, and I still think YAML wins because of its ubiquity and interop with JSON. I'd pick HCL as an alternative if I was going to, as it's been widely battle tested in Terraform.
Huh, no. YAML is a superset of JSON. So valid JSON is valid YAML. This is sometimes surprisingly useful. Also, YAML is used everywhere and, like the other user pointed out, has mature well-tested libs for almost every language.
While true, this is just what being a superset means: any JSON document is, without modification, a valid YAML document and can be read with a YAML parser.
YAML supports tagged nodes and multiple documents in a stream, which can’t be represented in standard JSON. (You could make up conventions, but only your parser would support them.)
It just is: Any valid JSON is valid YAML with the same semantics. It was intentionally designed that way (though not from the start iirc).
Basically take JSON, make quotes optional for strings and make curly brackets optional if the object is indented properly, and boom you've got (something like) YAML.
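That equivalence is easy to check with PyYAML: the JSON-flavored flow syntax and the indented block syntax parse to the same data.

```python
import yaml  # PyYAML

# The same mapping written as JSON-style flow syntax...
flow = yaml.safe_load('{"a": [1, 2]}')

# ...and as indented block-style YAML:
block = yaml.safe_load("a:\n  - 1\n  - 2")

print(flow == block)  # True -- both parse to {'a': [1, 2]}
```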
Any valid json can be parsed by any 1.2 yaml parser. That's how it gained popularity in the first place: you didn't have to migrate. Like ascii and utf8.
There was an interesting project showcased here a while ago, it was some kind of very minimal language, almost the most minimal theoretically possible, but with some interesting properties. Does anyone remember it?
I'm sorry, but the issues raised have more to do with a particular implementation -- one that is outdated -- than YAML in general.
E.g. "on" should no longer be treated as true. That's a YAML 1.1 archaism. And 3.10 is going to have the same issue in JSON.
No doubt YAML could still be improved and maybe we'll get there eventually. 2.0 is a long discussed goal, but the creators of YAML (who I have talked to extensively) are cautious, thoughtful and methodic and won't make that jump until they are sure of it.
Meanwhile 1.2 is a fairly good spec, and difficulties largely lie with implementors and users.
Most people that complain about YAML are like a person complaining that tennis shoes are terrible because when you try to rock climb in them, they don't have traction and you slip and fall. Tennis shoes are garbage! ..... Or maybe tennis shoes are good for what they were designed for, and you need a different kind of shoe for rock climbing.
YAML is a human-readable data-serialization language. Note the word readable, and that it's for data serialization. It's not intended to be human-writeable. It's not intended to configure an application (unless that configuration is the result of serializing the data object created after someone configures the app).
Since programmers don't really understand the different types/classes of file formats and what they're for, they choose the wrong formats for the wrong tasks. And then the people who are forced into using those formats for those programs find it's highly problematic - but they get pissed off at the formats and not the programmer!
I'm perfectly happy for people to create new data serialization formats, new configuration formats, new markup formats. But please avoid the trap of thinking "this format or that format sucks". None of them suck for what they were created for, when used correctly. YAML, for example, has huge advantages over most common data serialization formats. But those advantages fly out the window as soon as humans start writing YAML by hand, and then bolting on weird custom logic, as if this data serialization format were a higher-level language.
One step forward, one step back, one step sideways. It's a great idea to have line types, but the treatment of block text is terrible... at least with YAML I can embed markdown and have the paragraphs come out correctly. Multiline keys... are they really that important so that we have to jump through awkward looking syntax hoops to get them? And please, why keep inline object definitions? Nobody needs them! It won't kill you to write a list over multiple lines. Some verboseness is good if it enforces structure.
I use YAML quite extensively for an internal tool, since it's pretty much the only human-useable format out there. But the weirdness of the language makes it error-prone while the quality of available parsers is simply unacceptable. I ended up writing my own parser for a subset of unambiguous YAML (not unlike StrictYAML, but even simpler) that offers nice error messages etc. and it works very well for us.
If your project is in python, the only correct file format to use for the config file is python. If your project is in ruby, the config file should be ruby. If you're in $SCRIPTING_LANGUAGE, your config file should be implemented in $SCRIPTING_LANGUAGE.
You can use `literal_eval()` (in python, there are similar constructs in basically every scripting language) to prevent prevent people from putting code in the config, but really, it's the person writing the config file's computer, let them do what they want.
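For what it's worth, a minimal sketch of the `literal_eval()` approach (the config string here is hypothetical):

```python
import ast

# literal_eval accepts only Python literals -- no calls, no imports
config = ast.literal_eval("{'retries': 3, 'hosts': ['a.example']}")
print(config["retries"])  # 3

# Anything that is actual code is rejected at parse time:
try:
    ast.literal_eval("__import__('os').system('echo pwned')")
except ValueError:
    print("rejected")
```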
For compiled languages, it's a bit harder, but there are zero cases where a separate "configuration language" should exist.
The issue with YAML is that it does not unambiguously distinguish between number/booleans and strings. JSON does, but only for numbers, booleans, and nulls. But there are many data types that need to be conveyed. For example, dates and quantities (numbers with units, such as $3.14 or 47kΩ). Such things are left to the application to interpret. Even JSON does not unambiguously distinguish between integers and reals. Even so, JSON pays for its lack of ambiguity by requiring all strings to be quoted, which adds clutter and the requirement for quoting and escaping. Thus, supporting those extra types comes at a cost.
I think NestedText is unique in leaving all leaf values as strings, so it does not need quoting or escaping.
Everything involves a compromise. YAML provides a lack of clutter at the cost of ambiguity. JSON is unambiguous, but comes with visual clutter. In both cases there are still lots of types they cannot handle and so must be passed on to the application.
The compromise with NestedText is that it provides simplicity and a lack of clutter by not supporting any data types for leaf values other than string. Thus, all interpretation of the text is deferred to the application. But fundamentally that is the best place for it, because only the application understands the context and knows what is expected.
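Deferring interpretation to the application looks roughly like this (the field names are hypothetical, for illustration only):

```python
# With NestedText-style all-string leaves, type interpretation moves
# into the application, which knows what each field should be.
raw = {"port": "8080", "debug": "on", "version": "3.10"}

port = int(raw["port"])                        # the app decides: an int
debug = raw["debug"] in ("on", "yes", "true")  # app-defined booleans
version = raw["version"]                       # stays the string "3.10"
```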
Yes but I can understand the rationale. There are many numeric types and settling on some excludes use with others. If letting the application handle that, the configuration language can remain simple. His example where a version number 1.10 was round trip converted to 1.1 was enlightening.
The worst config files I've ever encountered are dynamic YAML templates for Kubernetes. Mind boggling to figure out with crazy indentation rules and for loops. Kill me now.
Seriously, the more time I spend with other configuration formats the more I start appreciating JSON. It is a simple array and object format. Nothing can go wrong with it. No indentation rules. Easy to encode and decode. Easy to turn into actual arrays and objects in your programming language. Lack of types is not great, but use a type checker in your parser that throws exceptions and you are fine.
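A minimal version of that "type checker in your parser that throws exceptions" idea, with the stdlib `json` module (the config fields are made up for illustration):

```python
import json

def load_config(text):
    # Decode first, then validate types explicitly and fail loudly
    cfg = json.loads(text)
    if not isinstance(cfg.get("port"), int):
        raise TypeError("port must be an integer")
    if not isinstance(cfg.get("host"), str):
        raise TypeError("host must be a string")
    return cfg

cfg = load_config('{"port": 8080, "host": "localhost"}')
```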
It has numeric literals. But all JSON number types are f64. Which is sufficient to represent every 32-bit integer, but it can not represent all 64-bit integers.
For code that needs a 64-bit integer, which is quite a lot, you have to encode the integer in a string.
Lack of 64-bit integer (either signed or unsigned) is a pretty common and well understood pain point when using JSON as an interchange format.
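The usual workaround is to ship large integers as strings and convert at the edges; the precision loss itself is easy to demonstrate:

```python
import json

big = 2**63 - 1  # largest signed 64-bit integer

# As a float64, the value can no longer be represented exactly:
print(float(big) == big)  # False -- it rounds to 2**63

# So encode it as a string and convert back on the consuming side:
payload = json.dumps({"id": str(big)})
decoded = int(json.loads(payload)["id"])
print(decoded == big)  # True
```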
I suppose you're right that the formal JSON specification doesn't actually specify what a number is. Only that it's a sequence of digits. TIL.
In practice, when you are using JSON as an interchange format between tools and systems it is exceptionally likely that you'll be constrained to 64-bit float as the underlying Number data type. Even if there is no JavaScript anywhere in your data pipeline. I have been burned by this repeatedly. For the record, I have never written a single line of JavaScript in my professional career.
In any case, my dream format would separate storage type from encoding type and formalize implementation extensions.
> YAML, used sparingly, without too much complexity, is fine
It's wishful thinking because the complexity is inherent, unfortunately. That's analogous to saying "programmers should not write bugs". Humans are fallible and error prone; it's not going to happen unless a language is restricted in such a way that a category of bugs is not possible by design. However, YAML's design is sprawling, so despite best intentions people will run into the problems caused by the complexity. Possible ways out are restrictions of the design (e.g. Strict YAML) or whole replacements (e.g. NestedText).
> There are a few gotchas that are easy to catch with validations.
Does this actually exist? If not, who's writing the code for these validations? How can we make sure everyone who needs to use them is using them?
The idea sounds good on paper, but it isn't workable in practice because "patching over the spec problems" requires global coordination.
I use yaml as an alternative to .properties files. It's boring. Yes, quote your strings. Know the damn config language. It's not going to kill you like a table saw.
Naive question: is it viable if we started using Lua[JIT] for configuration, like NeoVim and likely others do?
Can Lua's interpreter be compiled without some "dangerous" APIs enabled (whichever those might be) and thus be made viable as an embeddable and isolated configuration engine?
I'm just getting sick and tired of all the half-baked configuration formats and want to look for something that's both more flexible but still strict and unambiguous. I wonder if it's possible?
> Naive question: is it viable if we started using [a Turing-complete programming language]
This is actually a good question. The people who are not asking and going right ahead with that plan are doing a dangerous thing. <https://www.cs.dartmouth.edu/~sergey/langsec/occupy/> It's not viable because a subset of people would like to have the following properties upheld:
• Parsing configuration should be decidable and finish in finite time.
• Parsing configuration should not be a security exploit.
• I should not have to implement the Emacs runtime just to parse its configuration file.
> something that's both more flexible but still strict and unambiguous. I wonder if it's possible?
JSON is often so hard to read I have to open a file in a formatting tool. It's also hard to tell if there's a simple missing quote or brace. Those are extra steps I don't have with YAML. For many things, JSON is great. For simple loaders, YAML gets it done quick and easy. They are functionally interoperable for the most part so I use what's best.
I'd also add the author has a competing framework so maybe there's a bit of pre-existing bias.
When you miss a quote or a brace in JSON, the JSON fails to parse. When you make a similar minor mistake in YAML, you often end up with a valid but nonsensical document with completely incorrect structure.
I don't want the language to be flexible enough that simple common errors go unnoticed - I WANT the parser to tell me at parse time if I screwed something up. It's a similar dynamic to dynamic/static typing.
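The contrast is easy to reproduce with PyYAML and the stdlib `json` module; a small slip that JSON rejects outright becomes a silently valid YAML document:

```python
import json
import yaml  # PyYAML

# In YAML, forgetting the space after the colon silently turns the
# intended mapping into a single plain string:
print(yaml.safe_load("key: value"))  # {'key': 'value'}
print(yaml.safe_load("key:value"))   # 'key:value' -- no error!

# The equivalent slip in JSON fails at parse time:
try:
    json.loads('{"key" "value"}')
except json.JSONDecodeError:
    print("JSON refused to guess")
```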
I'm not sure I'd trust manually editing either in a plain text editor. YAML cares about whitespace yet will silently make most combinations valid and it also has surprising pitfalls on type interpretation if one isn't consistently on top of it when entering data. JSON has a bunch of additional characters but at least if you forget them you have a higher chance of it just telling you it's wrong when you try to use it.
Thankfully it's very rare I ever have to open any config file in a text editor that isn't aware of dozens of formats so things like a missing brace are always suggested anyways and it isn't much a problem regardless of format.
I think the advantages over the csv example are not quite advantages.
"tall and skinny". Well it looks skinny, but in bytes is actually fatter than the csv. Similarly, for comparing stuff, the columnar display works better.
My go-to is ini. Simple. Everything is a string. No assumed hierarchy in section naming. Just key/value pairs under sections. It is up to the using application to parse them how it sees fit.
I like complexity of YAML. When people choose YAML as driver for their shitty tools I'm forced to use, at least I can use YAML features to cope with this braindamage. Of course, you can go only so far with e.g. anchors.
Stop using configuration languages as programming languages and leave YAML alone as configuration language.
If you know about dangers of unsafe YAML, why don't you just replace load() with safe_load()??
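In PyYAML, `safe_load()` builds only plain data types and rejects documents carrying Python-object tags instead of executing them:

```python
import yaml  # PyYAML

# A document with a Python-object tag is rejected by safe_load
# rather than constructed (yaml.load with an unsafe Loader would
# actually run os.system here):
doc = "!!python/object/apply:os.system ['echo pwned']"
try:
    yaml.safe_load(doc)
except yaml.YAMLError:
    print("rejected")

print(yaml.safe_load("a: [1, 2]"))  # ordinary data still loads fine
```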
YAML has problems (mainly that it requires an overly complex parser), but the alternatives presented here look like a step backward. Easier for the parser, but less flexible for the human (one nitpick with NestedText that stood out immediately: can I collapse arrays and dictionaries into a single line like in YAML with [] and {}?)
I really don't know where all this XML hate comes from; it has everything I need and I don't find it hard to read. Maybe I'm just used to it.
How often do you change your configuration files? My projects pom.xml change only occasionally, version upgrade or new dependencies, that's it.
The problem with the version numbers here seems like it's because they've inappropriately written them as numbers rather than as strings, so 3.1 and 3.10 are equal. It's not the language's fault if the author chooses the wrong datatype.
The trouble is that people insist in writing trees using plain text editors. Trees should be written in tree editors. Then you can't get the delimiters wrong.
"NestedText was inspired by YAML, but eschews its complexity. "
EDN is the best I've used, I dream for it to become wildly adopted.
It has powerful types, can be extended, clean explicit syntax, whitespace independent, easy for humans and machines to read. It's like JSON done right.
EDN is less ideal for Python and other similarly somewhat high-level languages, and there just aren't any libraries for C/C++ from what I have seen. In general it's XML, JSON, YAML, then custom binary or text formats, in that order, for most software from what I can guess. Remember, .docx, .xlsx and other document formats are also basically just XML-based configurations for programs/interchange formats.
I use it to generate the config files, and pass the generated config to the application the same way you’d pass any other json/yaml/etc to your app. Neither ever talk to each other, and Dhall doesn’t exist within our runtime environment.
Seriously, why don't people just use EDN and move on? Rich types, proper key value pairs, keywords instead of strings everywhere, support for fractions and comments...
XML in theory is a great format for what it represents — a tree of heterogeneous typed simple key/value pairs.
The problem is almost no data that people want to actually represent has this form, and every way people have tried to beat XML into representing other things (i.e. lists and dicts) is kludgy and awkward.
I think it's crazy that when I add a string to an inline list, I may need to convert that inline list to a list because this string needs different handling. I think it's crazy that "convert an inline list to a list" is a coherent statement, but that is the nomenclature that they chose.
I don't like that a truncated document is a complete and valid document.
But what is most unappealing is their whitespace handling. I couldn't even figure out how to encode a string with CR line endings. So, I downloaded their python client to see how it did it. Turns out, they couldn't figure it out either:
    >>> nt.loads(nt.dumps("\r"), top="str")
    '\n'