Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exploring a redesign and discussing how a modern Rust configuration library should behave #111

Open
mehcode opened this issue May 7, 2019 · 29 comments

Comments

@mehcode
Copy link
Owner

mehcode commented May 7, 2019

This library has gotten quite stale over the last year and I've been loathe to come back to it mostly because I don't like it. It's complex to use, complex to develop, etc. I apologize for letting it atrophy.

I'd like to thank you everyone who has contributed or submitted a PR ( especially those who've done so in the last several months and have just sat there ). It's not that I don't want to merge in your work, it's that if I come back to this I'll want to do too much and I haven't had the time available to do that.


With that out of the way I want to explore how a configuration library could behave in Rust, optimally, using some example use cases as a jumping off point:

// load configuration only from envrionment
let settings: Settings = config::env::vars()
    // filter to only keys that case-insentively start with s
    .prefix("app_")
    // explictly transform key names to allow traversing nested configuration
    // replaces .separator("__") from config v0.9
    .map(|key| key.replace("__", "."))
    .load()?;

// find a single configuration file from example.* in
let settings: Settings = config::file::find("example")
    // look in the following paths (by default cwd)
    .paths(&[xdg_config_home, home])
    // expect format in TOML only
    // default allows all formats compiled in
    .format(Format::Toml)
    // and load into Settings
    .load()?;

// find multiple configuration files and merge together
let settings: Settings = config::file::find_all(&[
    "default",
    mode // development, production, or test, etc.
]).load()?;

// load configuration from arguments
let settings: Settings = config::env::args(clap_matches).load();

// manually create a configuration stack
let settings: Settings = config::Stack::new()
    // start with an embedded JSON file 
    .add(config::file::from_bytes(Format::Json, include_bytes!("default.json")))
    // override from environment
    .add(config::env::vars())
    // load into settings
    .load()?;

// create a dynamic configuration (ala config v0.9)
// any _source_ could be built into a Config struct
let settings: config::Config = config::file::find("example").build();

// a dyn configuration can listen for changes on supported sources
let _handle = settings.watch(|| do_something_fun_when_config_changes);

// use path access
let host: &str = settings.get("db.host");

Thoughts? How would you want config to work?

Are there any pain points in the current design you'd like to discuss?

@mockersf
Copy link

Just a comment describing how I tend to setup configuration:

  • Start with default values for config
  • Then try to read a config file in XDG config folder and override
  • Then, for every folder in the current path, starting from /, try to read a config file and override
  • Allow user to specify / override some options using command line arguments, defined through structopt
  • Finally, allow user to specify / override some options using environment variables

My pain points:

  • Some configuration options are only possible through configuration files, other are only possible through command line arguments.
  • I want my CLI to report errors for missing parameters in a consistent ways for missing parameters that can only be specified as arguments or for parameters that could also be present in a configuration file (if value is found in a file, do not report an error)
  • Clap can show a default value for an argument in the --help, I would like it to show the value found from a configuration file if there is one rather than a default that would be override by said configuration file

@mwillsey
Copy link

mwillsey commented Jun 3, 2019

I love how config lets you write hierarchical configuration settings, but that does raise an interesting question as to how overrides/merging works. Example:

struct Settings {
    foo: String,
    bar: Option<BarSettings>,
    bar: Option<BarSettings>,
}

struct BarSettings {...}
struct BazSettings {...}

And then my config looks like this:

foo = "hello world"

[bar]
item1 = 1234
item2 = "asdf"

If I create Settings from the config file, I'll end up with Settings.baz being None, which makes sense. My question is how would I override that (or merge it in) from the environment variables?

If you try to merge in something like APP_BAR= or something like that, you end up merging in a (potentially empty) string in for the value of a table! Then, serde will fail to deserialize into Settings. This is surprising to me because it's merging in a string for the value of (what was) a table.

I see a couple potential ways to address this:

  1. Special case merging the empty string with a table, making the result and empty table
  2. Somehow generalize 1 into some kind of "typed" merging
  3. Manually walk the dynamic config, essentially replacing empty strings with empty tables by hand (where appropriate).

I suppose a similar situation could occur when getting config values from the command line. I think the larger questions is what does it mean to get config values from a non-hierarchical source, and then later merge them with a hierarchical source?

@mehcode
Copy link
Owner Author

mehcode commented Jun 3, 2019

@mwillsey

If I create Settings from the config file, I'll end up with Settings.bar being None, which makes sense. My question is how would I override that (or merge it in) from the environment variables?

The idea is that you would override using the key app.baz.one which would match a field one in BazSettings. As that's not possible to write in an env var there is a .separator("_") function you can call on config::Environment which does a string replace of _ for ..


@mockersf

  1. Do you merge all of those configuration files together or is it last one wins?

My pain points: [...]

It seems we need to think hard on how CLI arguments tie into this. I have a fairly good idea on how to tie clap in normally but I hadn't thought of how to tie config into clap as a "default values provider".

@mwillsey
Copy link

mwillsey commented Jun 3, 2019

@mehcode Sorry, edited parent comment. If I create a config from the above config file, I should get Settings.baz (not bar, as previously stated). The question is, how do I "cancel out" Settings.bar, which was defined as a table in the config file.

I understand the separator trick, but I AFAIK it's not possible to use that to un-define a table (in this case, Settings.bar.

@mehcode
Copy link
Owner Author

mehcode commented Jun 3, 2019

@mwillsey Hmm.. Un-define in general is a hard concept. APP_BAR= is understood as "undefined" not "null" (by default) as its a very common unix idiom to unset an environment variable like that instead of actually unsetting it. So it would do nothing to the previous properties.

We could probably allow a JSON string to be merged in for a table or array. Viper (go config lib) does that.. but that only can get you an empty map. Not a null map. I suppose that could still work if you go one property up.. APP='{"bar": null}'.

Un-setting from a string is probably a lot easier as you could just merge in a string and tell config to treat it as JSON or TOML or something.

@geniusisme
Copy link
Contributor

Considering merging, I would like to see "action" keys, which will make use of special characters. These keys should allow altering config when merged. For example, "remove$app.bar" to remove value completely. Or "add$app.bar"=[1, 2, 3] to add values into array.

@vorner
Copy link
Contributor

vorner commented Aug 6, 2019

I use config as kind of a backend for spirit. I read the path(s) to config files, any potential config overrides and such from command line and then feed config with it. So, I only use the ability to merge configs (of possibly different types) together and then parse it to something implementing Deserialize.

Therefore I guess the interface doesn't matter to me that much, though I'd really like having more informative error messages and being a bit more robust and well defined about how the merging and such works.

@KoltesDigital
Copy link

I also use this crate for its ability to merge from multiple sources, including defaults, and to deserialize to a struct. I guess if I were to do, I'd do the same, with some kind of format provider which would feed with Option.

My only problem is the case (#65). I heavily rely on sub-objects for namespacing, because IMHO some (theoretically all) modules should not know about others, and as such I'm not fan of 12 factor's reliance on a flat list of key-value pairs. Besides with docker configs and secrets, it's easy to provide different config files for different envs.

Anyway, I'd say either a project relies on env vars only, or on config files only, but not both (I gladly receives counter-examples). Serde allows for renaming struct fields, so for env vars #[serde(rename_all = "SCREAMING_SNAKE_CASE")] should be used, and for files #[serde(rename_all = "camelCase")] or whatever. My point is, this crate shouldn't take the responsibility of renaming fields too.

@vorner
Copy link
Contributor

vorner commented Oct 23, 2019

I could imagine a project where one would have most of the configuration inside a config file, but set passwords through environment variables. I think we have some such uses in the company, though not in Rust.

As for the case, I actually find the fact the config is case insensitive as an advantage. Not a huge one, but it feels more convenient from the user perspective.

@KoltesDigital
Copy link

Thanks for the use case. I think you imply that both the config from the file and the passwords end up in the same list. As I prefer to divide config by modules, I don't see how both needs could fit together.

If I had to accept env vars, I guess I'd have a dedicated struct for available env keys, and then I'd put these values into the config structs. Ultimately this dedicated struct would be similar to what structopt does, i.e. env var and cli arguments serve the same goal.

@vorner
Copy link
Contributor

vorner commented Oct 24, 2019

I think you imply that both the config from the file and the passwords end up in the same list. As I prefer to divide config by modules, I don't see how both needs could fit together.

Yes, I like to have one „big“ config structure and then I hand out parts of it to submodules of the application. I tend to consider configuration reload to be transactional, so I don't want to load part of new configuration and then fail on another sub-config.

Nevertheless, even if I didn't have that, let's say I have something like „update source“ ‒ a place where updates for whatever data the application holds.

#[derive(Deserialize)]
struct Updates {
    base_url: Url,
    username: String,
    password: String,
}

This is definitely supposed to come to the same place in the application, so it should be the same config. But the url will be in config file, probably distributed together in a docker image or deb package or whatever. The password will be overwritten by an environment variable.

@rahulwa
Copy link

rahulwa commented Oct 24, 2019

Sorry for jump in between, i never tried config-rs, just wanted to share quite famous way to handle config. https://github.com/spf13/viper Let’s you use config without worry of from where value is coming from, it can be from file/env/key-value-store etc. i love that application doesn’t need to care about how value being set.

@chrisbouchard
Copy link

One use case for me would be able to use config::file::find as per the second example in the OP, but with a different filename per-path. E.g.,

  • config.toml in $XDG_CONFIG_DIR
  • .app-name.toml in $HOME

I'd personally be happy passing a slice of Paths or PathBufs and having it pick the first one that exists. (I'm using directories anyway.) I'd be even happier if I could pass a slice of Paths or PathBufs without extensions, and have it pick up the basename and try different formats.

@chrisbouchard
Copy link

Regarding the first example, I'm not a fan of manually map-ing replace like that. It feels so, well, manual compared to how smart the rest of this API is. I personally like having a specified separator — and I think the prefix should be separated from the key by separator, rather than always by "_", so that we don't have to set the prefix as "app_" like the example does.

In fact, I'd like to see an option for the separator to get even smarter: I'd like to have an environment variable like PREFIX_FOO_BAR_BAZ_BLAH map to foo_bar.baz_blah because my structs are:

pub struct Settings {
    foo_bar: SubSettings
}

pub struct SubSettings {
    baz_blah: i32
}

I know this is a long-shot because we have to map to a key string, which has no knowledge of its eventual location in a struct.

@dawid-nowak
Copy link
Collaborator

The way I see it, I am looking for two things from the config file :

  1. I would like to be able to express my configuration in richer and hierarchical constructs and then translate those into appropriate objects. Ideally, I should be able to write my config file in yaml, json, toml and properties and that should translate to the same structure type. I would like it if it was possible to express all that complexity through env variables but I am not sure if that is easily doable or even doable for the complex configuration files.

  2. I would like an ability to override certain configuration values in my config file from a different source. An example of a real life use case would be an ability to inject a different database endpoint/password into a config file from the env variable at deploy time to stage or prod environments.

I think there are two ways of addressing point 2:

  1. Patching/Merging - I think this is what the existing code is doing and I do believe that will work fine in a situation where two sources are config files.
    The main problem as I see it is how to patch a complex config file from the env variable. If I wanted to add/replace client "client_topic:orders" from the env variable, I probably would need to create a very long env variable and even then I am not sure if that would work.
pub_sub:
  kafka_config:
    - brokers: [kafka-1.kafka-headless.kafka.svc.cluster.local:9092]
      consumer_topics:
        - consumer_topic: processor1_kafka_blue
          consumer_group: swir
          client_topic: orders
      producer_topics: []      
  1. Replacing - In this case, we pepper config files with placeholders so the env variables can be injected in the appropriate places. This is not pretty either. The config file can look pretty ugly. Possibly we could break the formatting of the file so using linting tools would be restricted. But this approach allows for a simpler injection of env variables.
pub_sub:
  kafka_config:
    - brokers: [kafka-1.kafka-headless.kafka.svc.cluster.local:9092]
      consumer_topics:
        - consumer_topic: processor1_kafka_blue
          consumer_group: swir
          client_topic: ${orders_topic}
      producer_topics: []      

I'd like to know what other think:

  • Is there one/preffered way of doing things ?
  • Could these two approaches live side by side ?
  • If so, should they be always available or available as a feature?

@szarykott
Copy link
Contributor

@dawid-nowak

Let me refer to your concern with point number 1. You do not need to patch values from environment variables directly. What you can do it to keep various versions of config files in VCS named for example appsettings.yaml, appsettings.dev.yaml or appsettings.prod.yaml. Those files could look like this for example :

appsettings.yaml

pub_sub:
  kafka_config:
    - brokers: [kafka-1.kafka-headless.kafka.svc.cluster.local:9092]
      consumer_topics:
        - consumer_topic: processor1_kafka_blue
          consumer_group: swir
          client_topic: orders
      producer_topics: []  

appsettings.dev.yaml

pub_sub:
  kafka_config:
    - brokers: [kafka-1.kafka-headless.kafka.svc.cluster.local:9092]
      consumer_topics:
        - consumer_topic: processor1_kafka_blue
          consumer_group: swir
          client_topic: orders_dev
      producer_topics: []  

What you'd keep in env variables is just the name of environment, here : dev and prod and you'd load appropriate file based on its name in runtime. appsettings.yaml would serve as default settings used regardless of environment.

Ideally you'd want to write in settings files "down the waterfall" only those lines you'd want to overwrite (here : only client_topic with path leading to it, dropping e.g consumer_group as it is the same in parent file). On the other hand though, if you kept all other keys that do not change, reading config files would be much easier as all keys would be visible. The cost would be however more typing when creating them.

What I propose is how we solve such issues at my company (and in .NET Core world from which I come from) so you could take a look at IConfiguration inteface (and its implementation) in .NET Core which is heavily battle-tested and proven to work in many circumstances.

@tmccombs
Copy link
Contributor

tmccombs commented Sep 6, 2020

The cost would be however more typing when creating them.

And needing to update all files whenever you need to change a config. Which can be easy to forget.

@Korvox
Copy link

Korvox commented Sep 11, 2020

Just to throw my bone in the ring I've been fighting against the state of config in Rust this last week and came up with this as my personal ideal workflow:

  1. Structure your config as a (possibly heirarchical composite) of deserializable & (optionally) serializable structs. If you want defaults you'd want to also derive / impl default.
  2. derive(config) the whole thing to generate a bunch of impls to use this struct as a config store.
  3. The config derive would generate a builder pattern to construct a config from various sources, ex:
    let conf = Settings::named("myapp").watched_file().env().args().build()?;

The methods defined by the derive would include the following:

  • ::new() for unnamespaced configs
  • ::named() for config where every source is namespaced by the appname given (so if your app is foobar, it would source env from FOOBAR_<struct_field> etc).
  • .file() to read dirs:;config_dir() + named()
  • .watched_file() to watch the same default config file instead
  • .named_file("name") to watch a specific file with the suffix inferred
  • .env() to read environment variables from the aformentioned named namespace
  • .named_env("name") for custom prefix env search
  • .args to parse the cli args

And more, obviously.

What I really like about this is the derive and builder pattern lets you incrementally add sources and eventually have this opaque configuration struct that can do a lot behind the scenes - I can see constructors that can take event loop handles to use async or threadpools for the IO and these builder methods can be generally complex because I can imagine a lot can be done with constexpr at compile time to figure out what the final struct is supposed to actually look like. Ultimately you would only be doing real work in the build() function using a bunch of constexpr prefill.

But for a user all you are doing is making a plain old data struct, using a derive macro, and building your config with the options you want. No other boilerplate. To write to it you just set properties on the struct and call .save() which it and the drop impl can writeback changes with if the struct derives serialize.

@vorner
Copy link
Contributor

vorner commented Sep 13, 2020

Hello

Your builder approach looks kind of nice on the high level. But there are some things that don't add up for me:

If you already want your struct to be Deserialize (possibly derived or manually implemented), what is the reason why you want the derive(Config)? Usually, derive macros generate some code based on the structure/shape, with possibly descending into the fields. I assume this is not needed here, as the structure is used from Deserialize. If it needed to read the structure/shape, then it would probably require that everything down (the field types, their field's types, etc) are also derive(Config), which would be kind of a downside (a lot of things provide Deserialize implementation, but wouldn't generally provide Config impl).

If you only care about these new methods, wouldn't an extension trait (something like itertools or similar does to add more methods to all the iterators) on top of Deserialize be enough? Or, if don't want everything to be Config automatically, having a trait with all the methods being provided would also work:

// In the lib:
pub struct Config: DeserializeOwned {
  fn new() -> ConfigBuilder<Self> { ... }
}

// The user:
#[derive(Deserialize)] 
struct MyConfig {
  params: Vec<String>,
}

impl Config for MyConfig {} // Yes, empty here

I can see constructors that can take event loop handles to use async or threadpools for the IO and these builder methods can be generally complex because I can imagine a lot can be done with constexpr at compile time to figure out what the final struct is supposed to actually look like.

I'm not sure if integration with threadpools should be in scope of this library. I mean, if there's a library that can read bunch of formats, combine different sources together and spits out parsed data structure, that's great. Should watching files and realoading be done in the same library, or in some that builds on top? I don't really want to pull in the whole tokio to read a 20 lines long config file 😈.

To write to it you just set properties on the struct and call .save() which it and the drop impl can writeback changes with if the struct derives serialize.

That sounds a bit scary. First, I believe most applications don't modify its own configuration. At least servers don't. And it seems to be not obvious where it would store the changes if, for example, the whole config was read from the environment variables. And if save takes a path to the file, does it provide anything on top eg. serde-json, if you structure already implements Serialize?

I don't see what exactly you want to do in drop, but saving configuration is a fallible operation and there's no way to signal these from within destructors, so such pattern is generally discouraged (you can either panic, which is probably the wrong thing to do, or ignore the error, which is definitely not a robust design).

So, in general, the composition by the builder pattern seems nice (nicer than the fn merge(&mut self, …) thing), but would need some details ironed out.

I also wonder what scope the library should have. For me it would make sense to split the parsing/merging/decoding of configuration from other features, like file change watching (both because one might not to want it, and because there's a million ways how you may want to trigger your config reload, or where it happens and how the changes are propagated back into the program; as the author of spirit, I know how much complexity there is :-| ).

@tmccombs
Copy link
Contributor

It would need to derive to support adding the fields to support watching, etc.

Why? The way I imagine watching working, is when a fule changes, it just reloads the full config.

Save on close is just something a lot of other config frameworks do because in practice users will generally forget to save (or in something like GUI configuration not have to save on every user change).

Save on close doesn't necessarily require save on drop. And having the save method separate givea the developer more control, and the ability to handle errors.

@tmccombs
Copy link
Contributor

I see, so the purpose of the derive is to store additional state in the config state? Couldn't the same thing be accomplished with a wrapper Config<T> struct that implements Deref<Target=T> and possibly DerefMut?

You don't even need this in the crate either since an end user is defining the struct and can thus impl drop themselves if they want it.

Exactly. If that is desired behavior, the user of the library can just implement Drop on their config struct themselves (or if a wrapper type is used, as I mentioned above, implement it on Config<MyConfig>).

@vorner
Copy link
Contributor

vorner commented Sep 16, 2020

First, AFAIK, derive macros don't allow you to inject more fields. They allow you to add code, but outside of the struct (eg. make impl blocks).

Second, why would I ever want, as the user of the library, to intermix the config structure (the data loaded from the config file) with the manager? Shouldn't the manager be some kind of object that does the tracking and provides me/contains/derefs to the actual config I've decided to use? Putting both into the same thing conflates two things into one. I can certainly want to pass a config structure to a function, but not pass the manager into the same function.

@rakshith-ravi
Copy link

A little off-topic, but how about a macro that embeds the configuration provided at compile-time into your code? Essentially, during compile time (perhaps in CI), the configuration can be provided through the same merge-semantics (and combining multiple configurations) and provided as a constant struct, making value accesses hard-coded? Idk just a random idea. Maybe that combined with env vars during build time would be useful in a few use-cases (especially in embedded scenarios), considering runtime performance would be improved significantly, since the compiler can now optimize for constant values.

@AlexMikhalev
Copy link

// load configuration from arguments
let settings: Settings = config::env::args(clap_matches).load();

+1 for this one. I found re-creating large parts of config.rs manually to achieve clear precedence rules flags > env > toml.

@djeedai
Copy link

djeedai commented Apr 22, 2022

Adding to this issue, but I might also open a separate one. I've just found out about this crate, and although it looks like it covers part of what I need, I'm actually put off by the sheer number of mandatory dependencies. The support for the various serialization formats are behind Cargo features as one would expect, but it seems (or did I misunderstood something?) that the various sources are all mandatory. In particular, web/online ones like HTTP via reqwest are almost a deal breaker; many applications either do not want any network connection, or do want absolute control over them due to privacy or security concerns (especially when dealing with user settings), or have special requirements in terms of how networking can be done. Same for disk access in some contexts. The result is that this crate depends on a long list of crates to provide various sources that cannot be disabled, for something that is not its main role (I'd assume it's a config management and serialization/deserialization library, not a networking one or a filesystem abstraction one). I've yet to decide if I'm going to use this, some part of it, or just rebuild the config management part (the core responsibility of this crate) just to be able to rip off all the clutter around the various sources I don't want/need.

@matthiasbeyer
Copy link
Collaborator

I moved this issue to the project so we can consider it as well (see here for more details).

If you, as a commenter in this issue, want to contribute to the effort of re-thinking config-rs, please open issues for your ideas (see the "Help wanted" section here).

@SamuelMarks
Copy link

Two years later! - How're the plans progressing?

@matthiasbeyer
Copy link
Collaborator

#549

@apps4uco
Copy link

Hi, just wanted to add that allowing Humanized numbers would be great, maybe leveraging the parse_size crates parse_size function to convert 1KB into 1024 etc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Development

No branches or pull requests