Don't automatically use environment variables for configuration of production services

I'm worried by a trend I see where software intended for use in production systems is defaulting to automatically loading configuration information from environment variables.

I think this is a bad idea.

Initial configuration to a production system should be explicit and visible, so configuration should come from one of two places:

Note: How those command line flags are generated and provided is a separate, orthogonal story. Maybe it's a systemd service file. Or a Kubernetes job definition. Or an Ansible role. Or something else.

Using environment variables by default is an attractive nuisance that complicates troubleshooting and reproducibility.

Troubleshooting

When troubleshooting a service I'm almost certainly going to need to try and understand what its configuration is.

The service might provide a dedicated interface to see this, but since there's no standard for this, each service's mechanism is likely to be subtly different. This is more information to have to recall when managing a system.

Everything accepts command line flags. So looking at the command line a program was started with (whether that's in a log line, from running ps(1), inspecting the service file, reading the Kubernetes configuration, on a dashboard somewhere, ...) is the universal way to do this.

Using environment variables by default breaks this. Now I have to have a mechanism for seeing the environment variables the service was started with. Either this is going to show me all the environment variables, and I have to remember which ones are relevant. Or the mechanism is going to have to know, on a service by service basis, which ones are important and only show me those.

This, and any other mitigation measures like it, are added complexity that could be avoided by not automatically using environment variables in the first place.
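As a rough sketch of what such a mechanism involves, assuming Linux (where /proc/<pid>/cmdline and /proc/<pid>/environ hold NUL-separated entries): the command line can be shown as-is, but the environment has to be filtered with per-service knowledge. The service name `mysvc` and the `MYSVC_` prefix are hypothetical, made up for this illustration.

```python
def split_nul(raw: bytes) -> list[str]:
    """Split the NUL-separated format used by /proc/<pid>/cmdline and environ."""
    return [part.decode("utf-8", "replace") for part in raw.split(b"\0") if part]


def parse_environ(raw: bytes) -> dict[str, str]:
    """Turn raw /proc/<pid>/environ contents into a name -> value mapping."""
    env = {}
    for entry in split_nul(raw):
        name, _, value = entry.partition("=")
        env[name] = value
    return env


def relevant_settings(env: dict[str, str], prefix: str) -> dict[str, str]:
    """Keep only the variables this particular service is assumed to read.

    The prefix is per-service knowledge the tooling has to carry around.
    """
    return {name: value for name, value in env.items() if name.startswith(prefix)}


def read_proc(pid: int, name: str) -> bytes:
    """Read /proc/<pid>/<name> (Linux only; needs permission to inspect the process)."""
    with open(f"/proc/{pid}/{name}", "rb") as f:
        return f.read()


# Sample data in the same format, so the sketch runs anywhere.
cmdline = split_nul(b"mysvc\0--port=8080\0")
environ = parse_environ(b"HOME=/root\0MYSVC_PORT=9090\0PATH=/usr/bin\0")

print(cmdline)                                # everything here is configuration
print(relevant_settings(environ, "MYSVC_"))   # only useful if you know the prefix
```

The command line needs no filtering step at all. The environment only becomes readable once something knows which names matter for this service.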

Reproducibility

This is closely linked to troubleshooting, because when troubleshooting a problem it helps to be able to reproduce it.

If your service is reading configuration from environment variables, it no longer suffices for someone to file an issue and provide a simple command line to reproduce the problem.

Now they need to know that the service is automatically loading environment variables, and they need to provide those as well. And if they don't, you have to (a) know that these variables are important, and (b) get back to them and ask for this additional information.
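To illustrate the extra information a bug report now has to carry, here is a minimal sketch that builds a copy-pasteable reproduction command. The names `mysvc`, `--verbose`, and `MYSVC_PORT` are hypothetical. It assumes the reporter already knows which variables the service reads, which is exactly the knowledge they often lack.

```python
import shlex


def repro_command(argv: list[str], env_settings: dict[str, str]) -> str:
    """Build a shell command line that reproduces a run of the service.

    With flag-only configuration, env_settings is empty and argv is the
    whole story. Otherwise every relevant variable has to be prepended
    through env(1).
    """
    assignments = [f"{name}={value}" for name, value in sorted(env_settings.items())]
    if assignments:
        return shlex.join(["env", *assignments, *argv])
    return shlex.join(argv)


# Flags only: the command line is the complete reproduction.
print(repro_command(["mysvc", "--verbose"], {}))

# Environment-driven: the report is incomplete without the variables.
print(repro_command(["mysvc", "--verbose"], {"MYSVC_PORT": "9090"}))
```

If the second form is what it takes to reproduce the problem but the report only contains the first, the back and forth described above is unavoidable.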

When is it OK?

There are some occasions where this is OK.

It's not a production service

For non-production services, sure. Anything goes. As it says at the top of this post, these are my rules for handling production services.

It's an extremely common environment variable

There are a number of environment variables that are so widely used that it's impossible to escape using them. Things like HOME, LC_*, PAGER, TZ, and so on.

https://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html has a good list.

Even if you do this you're still creating troubleshooting and reproducibility problems for yourself, especially with the LC_* and TZ variables.

In fact, I think there's an argument to be made that production-grade software that deals with locale-related information should refuse to read the LC_* and TZ variables and require that this information be provided as explicit configuration.

If that means that the command line is a bit longer because it has a --tz=${TZ} entry, I think that's a small price to pay for making critical configuration information like this completely explicit.
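A minimal sketch of what that could look like, using Python's argparse. The program name `mysvc` is hypothetical, and the flag value is kept as a plain string here; a real service would presumably validate it against its time zone database. The point is only that the flag is required and nothing falls back to the TZ variable.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for a hypothetical service that takes its time zone explicitly."""
    parser = argparse.ArgumentParser(prog="mysvc")
    # Required, with no default taken from os.environ: if the operator
    # wants the ambient TZ they must pass it on the command line.
    parser.add_argument(
        "--tz",
        required=True,
        help="time zone name, e.g. Europe/Zurich",
    )
    return parser


def parse_config(argv: list[str]) -> argparse.Namespace:
    """Parse configuration from the given arguments only."""
    return build_parser().parse_args(argv)


config = parse_config(["--tz=Europe/Zurich"])
print(config.tz)
```

Started as `mysvc --tz="${TZ}"`, the shell expands the variable once, and the value the service actually received is then visible anywhere the command line is. Started without the flag, argparse exits with a usage error instead of silently picking something up from the environment.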