Configuration Language Design

In this section we will cover some conventions for HCL-based configuration languages that can help make them feel consistent with other HCL-based languages, and make the best use of HCL's building blocks.

HCL's native and JSON syntaxes both define a mapping from input bytes to a higher-level information model. In designing a configuration language based on HCL, your building blocks are the components in that information model: blocks, arguments, and expressions.

Each calling application of HCL, then, effectively defines its own language. Just as Atom and RSS are higher-level languages built on XML, HashiCorp Terraform has a higher-level language built on HCL, while HashiCorp Nomad has its own distinct language that is also built on HCL.

From an end-user perspective, these are distinct languages but have a common underlying texture. Users of both are therefore likely to bring some expectations from one to the other, and so this section is an attempt to codify some of these shared expectations to reduce user surprise.

These are subjective guidelines, however, and so applications may choose to ignore them entirely or ignore them in certain specialized cases. An application providing a configuration language for a pre-existing system, for example, may choose to eschew the identifier naming conventions in this section in order to exactly match the existing names in that underlying system.

Language Keywords and Identifiers

Much of the work in defining an HCL-based language is in selecting good names for arguments, block types, variables, and functions.

The standard for naming in HCL is to use all-lowercase identifiers with underscores separating words, like service or io_mode. HCL identifiers do allow uppercase letters and dashes, but this is primarily for natural interfacing with external systems that may have other identifier conventions, and so these should generally be avoided for the identifiers native to your own language.

The distinction between "keywords" and other identifiers is really just a convention. In your own language documentation, you may use the word "keyword" to refer to names that are presented as an intrinsic part of your language, such as important top-level block type names.

Block type names are usually singular, since each block defines a single object. Use a plural block type name only if the block serves purely as a namespacing container for a number of other objects. A block with a plural type name will generally contain only nested blocks, and no arguments of its own.

Argument names are also singular unless they expect a collection value, in which case they should be plural. For example, name = "foo" but subnet_ids = ["abc", "123"].
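
As a sketch of these naming conventions, consider the following fragment of a hypothetical HCL-based language. The block types listener, routes, and route, and all of the argument names, are invented for illustration and do not belong to any real application:

```hcl
# Singular block type: each block defines one object.
listener "public" {
  # Singular argument names for single values.
  bind_address = "0.0.0.0"
  port         = 8443

  # Plural argument name, because a collection value is expected.
  allowed_cidrs = ["10.0.0.0/8", "192.168.0.0/16"]
}

# Plural block type: a pure namespacing container with no
# arguments of its own, only nested blocks.
routes {
  route "api" {
    path_prefix = "/api"
  }

  route "static" {
    path_prefix = "/assets"
  }
}
```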

Function names will generally not use underscores and will instead just run words together, as is common in the C standard library. This is a result of the fact that several of the standard library functions offered in cty (covered in a later section) have names that follow C library function names like substr. This is not a strong rule, and applications that use longer names may choose to use underscores for them to improve readability.

Blocks vs. Object Values

HCL blocks and argument values of object type have quite a similar appearance in the native syntax, and are identical in JSON syntax:

block {
  foo = bar
}

# argument with object constructor expression
argument = {
  foo = bar
}

In spite of this superficial similarity, there are some important differences between these two forms.

The most significant difference is that a child block can contain nested blocks of its own, while an object constructor expression can define only attributes of the object it is creating.

The user-facing model for blocks is that they generally form the more "rigid" structure of the language itself, while argument values can be more free-form. An application will generally define in its schema and documentation all of the arguments that are valid for a particular block type, while arguments accepting object constructors are more appropriate for situations where the arguments themselves are freely selected by the user, such as when the expression will be converted by the application to a map type.

As a less contrived example, consider the resource block type in Terraform and its use with a particular resource type aws_instance:

resource "aws_instance" "example" {
  ami           = "ami-abc123"
  instance_type = "t2.micro"

  tags = {
    Name = "example instance"
  }

  ebs_block_device {
    device_name = "hda1"
    volume_size = 8
    volume_type = "standard"
  }
}

The top-level block type resource is fundamental to Terraform itself and so an obvious candidate for block syntax: it maps directly onto an object in Terraform's own domain model.

Within this block we see a mixture of arguments and nested blocks, all defined as part of the schema of the aws_instance resource type. The tags map here is specified as an argument because its keys are free-form, chosen by the user and mapped directly onto a map in the underlying system. ebs_block_device is specified as a nested block, because it is a separate domain object within the remote system and has a rigid schema of its own.

As a special case, block syntax may sometimes be used with free-form keys if those keys each serve as a separate declaration of some first-class object in the language. For example, Terraform has a top-level block type locals which behaves in this way:

locals {
  instance_type = "t2.micro"
  instance_id   = aws_instance.example.id
}

Although the argument names in this block are arbitrarily selected by the user, each one defines a distinct top-level object. In other words, this approach is used to create a more ergonomic syntax for defining these simple single-expression objects, as a pragmatic alternative to more verbose and redundant declarations using blocks:

local "instance_type" {
  value = "t2.micro"
}
local "instance_id" {
  value = aws_instance.example.id
}

The distinction between domain objects, language constructs and user data will always be subjective, so the final decision is up to you as the language designer.

Standard Functions

HCL itself does not define a common set of functions available in all HCL-based languages; the built-in language operators give a baseline of functionality that is always available, but applications are free to define functions as they see fit.

With that said, there are a number of generally-useful functions that don't belong to the domain of any one application: string manipulation, sequence manipulation, date formatting, JSON serialization and parsing, etc.

Given the general need such functions serve, it's helpful if a similar set of functions is available with compatible behavior across multiple HCL-based languages, assuming the language is for an application where function calls make sense at all.

The Go implementation of HCL is built on an underlying type and function system :go:pkg:`cty`, whose usage was introduced in :ref:`go-expression-funcs`. That library also has a package of "standard library" functions which we encourage applications to offer with consistent names and compatible behavior, either by using the standard implementations directly or offering compatible implementations under the same name.

The "standard" functions that new configuration formats should consider offering are:

  • abs(number) - returns the absolute (positive) value of the given number.
  • coalesce(vals...) - returns the value of the first argument that isn't null. Useful only in formats where null values may appear.
  • compact(vals...) - returns a new tuple with the non-null values given as arguments, preserving order.
  • concat(seqs...) - builds a tuple value by concatenating together all of the given sequence (list or tuple) arguments.
  • format(fmt, args...) - performs simple string formatting similar to the C library function printf.
  • hasindex(coll, idx) - returns true if the given collection has the given index. coll may be of list, tuple, map, or object type.
  • int(number) - returns the integer component of the given number, rounding towards zero.
  • jsondecode(str) - interprets the given string as JSON and returns the corresponding decoded value.
  • jsonencode(val) - encodes the given value as a JSON string.
  • length(coll) - returns the length of the given collection.
  • lower(str) - converts the letters in the given string to lowercase, using Unicode case folding rules.
  • max(numbers...) - returns the highest of the given number values.
  • min(numbers...) - returns the lowest of the given number values.
  • sethas(set, val) - returns true only if the given set has the given value as an element.
  • setintersection(sets...) - returns the intersection of the given sets.
  • setsubtract(set1, set2) - returns a set with the elements from set1 that are not also in set2.
  • setsymdiff(sets...) - returns the symmetric difference of the given sets.
  • setunion(sets...) - returns the union of the given sets.
  • strlen(str) - returns the length of the given string in Unicode grapheme clusters.
  • substr(str, offset, length) - returns a substring from the given string by splitting it between Unicode grapheme clusters.
  • timeadd(time, duration) - takes a timestamp in RFC3339 format and a possibly-negative duration given as a string like "1h" (for "one hour") and returns a new RFC3339 timestamp after adding the duration to the given timestamp.
  • upper(str) - converts the letters in the given string to uppercase, using Unicode case folding rules.

Not all of these functions will make sense in all applications. For example, an application that doesn't use set types at all would have no reason to provide the set-manipulation functions here.
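
To illustrate how a few of the functions listed above might appear in practice, the following sketch assumes an application that offers them under the standard names. The settings block type and its argument names are hypothetical. The results in the comments follow from the descriptions above, so an application offering compatible implementations should produce the same values:

```hcl
settings {
  # String functions
  env_label   = upper("staging")          # "STAGING"
  short_name  = substr("web_proxy", 0, 3) # "web"
  name_length = strlen("web_proxy")       # 9

  # Numeric functions
  worker_count = max(2, 4, 8) # 8
  offset       = abs(-5)      # 5

  # Null handling: the first non-null argument wins.
  region = coalesce(null, "us-east-1") # "us-east-1"

  # Sequence functions
  all_ports  = concat([80, 443], [8080]) # [80, 443, 8080]
  port_count = length([80, 443, 8080])   # 3

  # Timestamps: RFC3339 in, RFC3339 out.
  expires_at = timeadd("2024-01-01T00:00:00Z", "1h") # "2024-01-01T01:00:00Z"
}
```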

Some languages will not provide functions at all, since they are primarily for assigning values to arguments and thus neither need nor want any custom computations of those values.

Block Results as Expression Variables

In some applications, top-level blocks serve also as declarations of variables (or of attributes of object variables) available during expression evaluation, as discussed in :ref:`go-interdep-blocks`.

In this case, it's most intuitive for the variables map in the evaluation context to contain a value named after each valid top-level block type, and for these values to be object-typed or map-typed and reflect the structure implied by block type labels.

For example, an application may have a top-level service block type used like this:

service "http" "web_proxy" {
  listen_addr = "127.0.0.1:8080"

  process "main" {
    command = ["/usr/local/bin/awesome-app", "server"]
  }

  process "mgmt" {
    command = ["/usr/local/bin/awesome-app", "mgmt"]
  }
}

If the result of decoding this block were available for use in expressions elsewhere in configuration, the above convention would call for it to be available to expressions as an object at service.http.web_proxy.

If it is the contents of the block itself that are offered to evaluation -- or a superset object derived from the block contents -- then the block arguments can map directly to object attributes, but it is up to the application to decide which value type is most appropriate for each block type, since this depends on how multiple blocks of the same type relate to one another, or if multiple blocks of that type are even allowed.

In the above example, an application would probably expose the listen_addr argument value as service.http.web_proxy.listen_addr, and may choose to expose the process blocks as a map of objects using the labels as keys, which would allow an expression like service.http.web_proxy.process["main"].command.
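
Under that convention, a reference from elsewhere in the configuration might look like the following sketch. The health_check block type and its arguments are hypothetical and exist only to give the expressions somewhere to live. The references assume the application exposes the process blocks as a map of objects keyed by label:

```hcl
health_check "proxy" {
  # An argument of the service block, exposed as an object attribute.
  target_addr = service.http.web_proxy.listen_addr

  # A nested process block, looked up by its label in a map of objects.
  expected_command = service.http.web_proxy.process["main"].command
}
```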

If multiple blocks of a given type do not have a significant order relative to one another, as seems to be the case with these process blocks, representation as a map is often the most intuitive. If the ordering of the blocks is significant then a list may be more appropriate, allowing the use of HCL's "splat operators" for convenient access to child arguments. However, there is no one-size-fits-all solution here and language designers must instead consider the likely usage patterns of each value and select the value representation that best accommodates those patterns.
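
As a sketch of the list-based alternative, suppose the application instead exposed the process blocks of the earlier service example as a list in declaration order. The startup_report block type and its argument names are hypothetical:

```hcl
startup_report {
  # Index access selects one block by its zero-based position.
  first_command = service.http.web_proxy.process[0].command

  # The full splat operator collects the command argument from every
  # process block, producing a list with one element per block.
  all_commands = service.http.web_proxy.process[*].command
}
```

The map representation trades this positional access for lookup by label, which is why the likely usage pattern should drive the choice.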

Some applications may choose to offer variables with slightly different names than the top-level blocks in order to allow for more concise references, such as abbreviating service to svc in the above examples. This should be done with care since it may make the relationship between the two less obvious, but this may be a good tradeoff for names that are accessed frequently that might otherwise hurt the readability of expressions they are embedded in. Familiarity permits brevity.

Many applications will not make block results available for use in other expressions at all, in which case they are free to select whichever variable names make sense for what is being exposed. For example, a format may make environment variable values available for use in expressions, and may do so either as top-level variables (if no other variables are needed) or as an object named env, which can be used as in env.HOME.
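
For instance, a format that exposes environment variables as an object named env could allow arguments like the following. The cache block type and its argument names are hypothetical, and the second argument uses HCL's native-syntax string template interpolation:

```hcl
cache {
  # Direct reference to an attribute of the env object.
  home_dir = env.HOME

  # The same value interpolated into a longer string template.
  data_dir = "${env.HOME}/.cache/example-app"
}
```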

Text Editor and IDE Integrations

Since HCL defines only low-level syntax, a text editor or IDE integration for HCL itself can only really provide basic syntax highlighting.

For non-trivial HCL-based languages, a more specialized editor integration may be warranted. For example, users writing configuration for HashiCorp Terraform must recall the argument names for numerous different provider plugins, so auto-completion and documentation hovertips can be a great help. Terraform configurations are also commonly spread over multiple files, which makes "Go to Definition" functionality useful. None of this functionality can be implemented generically for all HCL-based languages, since it relies on knowledge of the structure of Terraform's own language.

Writing such text editor integrations is out of the scope of this guide. The Go implementation of HCL does have some building blocks to help with this, but it will always be an application-specific effort.

However, in order to enable such integrations, it is best to establish a conventional file extension other than .hcl for each non-trivial HCL-based language, thus allowing text editors to recognize it and enable the suitable integration. For example, Terraform requires .tf and .tf.json filenames for its main configuration, and the hcldec utility in the HCL repository accepts spec files that should conventionally be named with an .hcldec extension.

For simple languages that are unlikely to benefit from specific editor integrations, using the .hcl extension is fine and may cause an editor to enable basic syntax highlighting, absent any other deeper features. An editor extension for a specific HCL-based language should not generically match the .hcl extension, since this can cause confusing results for users attempting to write configuration files targeting other applications.