Application configs: files or environment variables? Actually both!

07-06-2019 20:30 | by Vitaly Samigullin | in Blog | 1288 words | 6 min to read
tags: architecture config k8s

App configuration is a highly opinionated topic. It’s technology stack- and infrastructure-dependent. When it comes to configs there are a few parties with possibly conflicting interests. Software developers want easy access to the app’s configuration, versioning, readability, and language support for the config format. Security guys want security! SREs/Administrators want easy deployment and scaling as well as infrastructure-native support for the configuration format of choice. Managers, well, they probably want to smile at conferences and don’t want you all to spend too much time on such a miserable question as configuration architecture!

In software development some things change rapidly. Ideas, manifestos and frameworks rise and fall. So let’s review what application configuration is today and what it could be.

Environment variables: good and bad parts

Environment variables (env vars or envs) are often recommended as the only way to store application configs (see The Twelve-Factor App, factor III). Envs are good because they are:

Secure

Environment variables are set by whoever is responsible for the deployment environment: by the developer on local machines, by the SRE/Administrator/DevOps in production. With envs you won’t compromise your production credentials even if your code is leaked to the public. It’s also harder for developers to break something in production, because they simply don’t have access to production credentials.

Tampering with envs at runtime is also tricky, as on *nix systems a process cannot change the environment variables of its parent process.
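To illustrate the point about parent and child processes, here is a minimal sketch. The variable name APP_MODE is hypothetical and exists only for this demonstration. The child process gets a copy of the environment, changes it, and the parent still sees the old value.

```python
import os
import subprocess
import sys

# Hypothetical variable name, set here only for the demonstration.
os.environ["APP_MODE"] = "production"

# The child receives a copy of the parent's environment,
# so whatever it changes stays inside the child.
child_code = (
    "import os; "
    "os.environ['APP_MODE'] = 'changed-by-child'; "
    "print(os.environ['APP_MODE'])"
)
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

child_value = result.stdout.strip()    # value seen inside the child
parent_value = os.environ["APP_MODE"]  # value still seen by the parent
print(child_value, parent_value)
```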

Universal

Environment variables are everywhere! On GNU/Linux, macOS, Windows. Every production-ready programming language supports envs out of the box.

Easy to use

Env is flat. Basically it’s a set of key-value pairs with string values. So loading envs is easy.
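A small sketch of what loading looks like in Python. The variable names (DB_HOST, DB_PORT, DEBUG) and the defaults are made up for the example, and the function takes a plain mapping so it can be tried without touching the real environment. Note that every value arrives as a string, so any typing is up to the application.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read a few settings from a flat mapping of string keys to string values."""
    return {
        # A missing key simply falls back to a default.
        "db_host": env.get("DB_HOST", "localhost"),
        # Values are always strings, so numbers must be converted explicitly.
        "db_port": int(env.get("DB_PORT", "5432")),
        # Booleans need a convention; this one is an arbitrary choice.
        "debug": env.get("DEBUG", "false").lower() in ("1", "true", "yes"),
    }


# Usage with an explicit mapping instead of the real environment.
settings = load_settings({"DB_PORT": "6432", "DEBUG": "True"})
print(settings)
```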

There are some bad things about envs too. Envs are:

Unexpressive

The flip side of the “easy” coin is unexpressiveness. Envs are flat key-value strings. You cannot store the whole config as an env. So you need to read envs in your code and then construct some sort of final configuration, possibly an overly complex one.

A flat env structure also gets you into a grey area of bad variable names. Consider the following YAML config:

cluster:
  host_weight:
    node00.example.com: 0.8
    node01.example.com: 0.8
    node02.example.com: 1.0
    node03.example.com: 0.5
  db: 1
  user: app
  password: my_password

It may translate to the following envs to substitute values in the config above:

CLUSTER_HOST_NODE_00=0.8
CLUSTER_HOST_NODE_01=0.8
CLUSTER_HOST_NODE_02=1.0
CLUSTER_HOST_NODE_03=0.5
CLUSTER_DB=1
CLUSTER_USER=app
CLUSTER_PASSWORD=my_password

So you still store some kind of configuration file in your app’s version control system (VCS). Usually it’s a module in your programming language (e.g. config.py in Python) with some language-specific data structures and even some logic. Values (at least some of them) are stored as envs with quite awkward names. These configs are much less readable than the YAML above.
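Here is roughly what such a config module might look like for the cluster example. This is only a sketch: the function name is invented, the env values are passed in as a plain mapping instead of os.environ, and placing db, user and password under cluster is an assumption based on the CLUSTER_ prefix of the variables.

```python
from typing import Mapping

# The flat variables from the listing above
# (in a real app this mapping would be os.environ).
env = {
    "CLUSTER_HOST_NODE_00": "0.8",
    "CLUSTER_HOST_NODE_01": "0.8",
    "CLUSTER_HOST_NODE_02": "1.0",
    "CLUSTER_HOST_NODE_03": "0.5",
    "CLUSTER_DB": "1",
    "CLUSTER_USER": "app",
    "CLUSTER_PASSWORD": "my_password",
}


def build_cluster_config(env: Mapping[str, str]) -> dict:
    """Rebuild the nested structure from flat env names (hypothetical helper)."""
    prefix = "CLUSTER_HOST_NODE_"
    host_weight = {}
    for name, value in env.items():
        if name.startswith(prefix):
            # "CLUSTER_HOST_NODE_00" -> "node00.example.com"; the domain has to be
            # hard-coded because the env name cannot carry it.
            node_id = name[len(prefix):]
            host_weight[f"node{node_id}.example.com"] = float(value)
    return {
        "cluster": {
            "host_weight": host_weight,
            "db": int(env["CLUSTER_DB"]),
            "user": env["CLUSTER_USER"],
            "password": env["CLUSTER_PASSWORD"],
        }
    }


config = build_cluster_config(env)
print(config["cluster"]["host_weight"])
```

The hostnames, the types and the nesting all live in code now, which is exactly the readability cost described above.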

Need to be stored somewhere eventually

Environment variables are stored somewhere eventually: in your admins’ repository, or in the orchestration system’s manifest files that are also checked into a VCS. So even when separated from the app’s code repo, envs are still stored in files that could potentially be accessed by more people than should be allowed.

May break release determinism

If you use config files checked into your app’s repository, your config is also a part of the app’s releases. You can effectively roll back to a stable release whenever needed.

In the case of envs your app’s releases are separated from the configs. Ideally environment variables should be versioned too, and the app’s and the config’s releases should be synchronized somehow. Otherwise your release determinism may break, and rolling back to previous versions with zero downtime may become difficult.

Configuration files: good and bad parts

Configuration files have their strengths and weaknesses too. First, the good things. Files are:

Readable

You don’t write config files as a program module (e.g. config.py in Python). Instead you use some language-agnostic, human-friendly format like YAML. Then you have both readability and (to some extent) separation of code and configs.

You write clean, self-sufficient configs. No more ugly things like:

HOSTNAME = os.environ.get('DB_HOSTNAME', 'db.example.com')

Developers often group config files by deployment environment: testing.yaml, development.yaml, production.yaml. Sometimes you cannot reach 100% parity between deployment environments, so having all configs at hand may help when fixing a bug.

Part of release

As I said earlier, if a config is checked into the app’s repository it is a part of the app’s release history. It makes your app releases more deterministic, allowing you to roll back to earlier releases if necessary with no extra effort.

There are bad things about files too:

Formats hell

XML, YAML, INI, TOML, JSON, you name it! You probably don’t have the luxury of having only one way to do it in your language. Even in JavaScript JSON is not the only configuration format in use. In popular general-purpose languages like Python it’s even worse.

In a complex project with legacy components you may find all sorts of configuration file formats. It’s annoying.

Insecure

Junior developers, people on probation, QAs, admins, maybe even developers from other teams and departments may have access to your app’s secrets when they are in config files checked into a VCS. That’s a huge security problem, even if your network operation center engineers, admins and DevOps do a great job isolating production services and networks.

Taking good parts from both envs and files

But what if we take the good parts from both environment variables and config files, combine them and throw away the bad parts? Sounds great, doesn’t it?

First, we need to take a closer look at what’s really important when it comes to security. Actually, it’s credentials (login/password, private API keys, cryptographic certificates, etc.): keeping them secret solves most of the problems that config files stored in a VCS have. If your production database hostname leaks but the username and password are kept secret, you are probably okay. Brute-force attacks and unpatched software vulnerabilities are still real. But your customer database isn’t leaked immediately even if bad guys know your database URL.

Second, we really want to have a nice language-agnostic non-flat file format for our configs. It’s opinionated, but I personally think that YAML is superior to all other formats. It’s beautiful!

So what if we use a YAML file and substitute our secrets with environment variable tokens?

db:
  login: user
  password: ${DB_PASSWORD}
mail:
  login: user
  password: ${MAIL_PASSWORD:-my_default_password}

Then our app loads the YAML and interpolates environment variables in it:

{'db': {'login': 'user', 'password': 'my_db_password'},
'mail': {'login': 'user', 'password': 'my_default_password'}}

In the above snippet the database password has been loaded from the $DB_PASSWORD environment variable. The email password was supposed to be loaded from $MAIL_PASSWORD, which is defined as a Bash-style env with a default value. At the time of config loading $MAIL_PASSWORD wasn’t set, so the default value 'my_default_password' was used.
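The interpolation step itself can be sketched in a few lines of standard-library Python. This is an illustration of the idea, not the code of any particular library. It assumes the YAML has already been parsed into a dict by a YAML parser of your choice, and the function name, the regular expression and the handling of unset variables are my own choices for the example.

```python
import re
from typing import Any, Mapping

# Matches ${NAME} and ${NAME:-default}
TOKEN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def interpolate(node: Any, env: Mapping[str, str]) -> Any:
    """Recursively replace env tokens in every string of a parsed config."""
    if isinstance(node, dict):
        return {key: interpolate(value, env) for key, value in node.items()}
    if isinstance(node, list):
        return [interpolate(item, env) for item in node]
    if isinstance(node, str):
        def replace(match):
            value = env.get(match.group("name"))
            default = match.group("default")
            if value:  # variable is set and non-empty
                return value
            if default is not None:  # Bash-style ":-" fallback
                return default
            # Unset without a default becomes an empty string in this sketch;
            # raising an error would be an equally reasonable choice.
            return ""
        return TOKEN.sub(replace, node)
    return node


# The YAML from above, as it would look after parsing.
parsed = {
    "db": {"login": "user", "password": "${DB_PASSWORD}"},
    "mail": {"login": "user", "password": "${MAIL_PASSWORD:-my_default_password}"},
}
# Only DB_PASSWORD is set; MAIL_PASSWORD is deliberately missing.
config = interpolate(parsed, {"DB_PASSWORD": "my_db_password"})
print(config)
```

Passing the environment as an argument (instead of reading os.environ inside the function) keeps the sketch easy to test.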

Proof of concept

As a proof of concept I’ve just released a Python library called Piny. It does exactly this:

Reads your YAML config file marked up with environment variables

Interpolates environment variables with their values

Creates a Python object with configuration ready for use in your app

When using Piny consider these best practices for your configuration files:

Maintain a healthy security/convenience balance for your config

Mark up an entity as an environment variable in your YAML if and only if it really is a secret (login/passwords, private API keys, crypto keys, certificates, or maybe DB hostname too? You decide)

Once the config is loaded by Piny, validate it using your favourite validation tool (some integrations are coming in future releases)

Store your config files in the version control system along with your app’s code.

Environment variables are set by whoever is responsible for the deployment. Modern orchestration systems like Kubernetes make it easy to keep envs secure (see Secrets and Vault).
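As for the validation step, here is a minimal sketch that uses only the standard library. The Credentials class and the validate function are hypothetical names made up for this example, and a dedicated validation library would do the same job with less code. The sketch assumes that every top-level section of the config holds a login and a password, as in the db/mail example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credentials:
    login: str
    password: str

    def __post_init__(self) -> None:
        # Reject empty or non-string values early, before the app starts.
        for field_name in ("login", "password"):
            value = getattr(self, field_name)
            if not isinstance(value, str) or not value:
                raise ValueError(f"{field_name} must be a non-empty string")


def validate(config: dict) -> dict:
    """Turn each top-level section into a Credentials object or raise.

    Missing or unexpected keys raise TypeError from the dataclass constructor.
    """
    return {section: Credentials(**values) for section, values in config.items()}


validated = validate(
    {
        "db": {"login": "user", "password": "my_db_password"},
        "mail": {"login": "user", "password": "my_default_password"},
    }
)
print(validated["db"].login)
```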

Piny stands for Piny is not YAML. It’s not only a library name, but also a name for YAML marked up with environment variables.

You can download Piny from PyPI.
