12 Fractured Apps

andrewguenther · on Dec 23, 2015

The article starts off by talking about trying to run legacy applications and then just casually tells you to modify the source code to solve all your problems. Not everything you want to run in a container is something which you want to modify.

Also, I really don't see the problem with the volume mounting approach. It may look ugly on the command-line, but when you're using an orchestration tool, it is actually quite painless and solves a lot of the issues mentioned in the article.

I do completely agree that one should completely avoid running a custom entrypoint. They are often written and then forgotten and can lead to really nasty bugs.

Lastly (this is more a Docker criticism, not the article) writing to stdout is all well and good, but Docker does a terrible job handling it. There is no way to truncate logs to stdout coming from a Docker container so Docker just holds onto the entirety of the log contents until the container is removed. For long running applications, this makes logging to stdout a deal-breaker. In a move absolutely contrary to my last point, I commonly use a custom entrypoint purely to handle logging. It passes all arguments to the application and then redirects stdout and stderr to cronolog which writes to a volume in my log pulling container.

ploxiln · on Dec 24, 2015

I created this logrotate rule to rotate the docker json log files:

  /var/lib/docker/containers/*/*-json.log {
      copytruncate
      rotate 2
      size 40M
      missingok
      nocreate
      nocompress
  }

But for most services I still mount a log dir and redirect stdout and stderr to a file manually, since neither the json log format nor that directory structure are particularly convenient for my purposes.

More recent docker releases have added other logging drivers, and added options to the "json-file" logging driver for "max-size" and "max-file", which sounds appropriate, but that came after I worked around the problem and I haven't upgraded yet.

brazzledazzle · on Dec 24, 2015

>The article starts off by talking about trying to run legacy applications and then just casually tells you to modify the source code to solve all your problems. Not everything you want to run in a container is something which you want to modify.

I think it goes without saying that practicality/pragmatism trumps convention or philosophy every time. I suppose the question we have to ask ourselves every day is "are we being resistant to change or are we being pragmatic?" Anyway, he does cover this when he mentions applications you don't have control over. So I'm not sure it's a fair criticism. This post seems to be aimed directly at developers or people with the latitude to effect change with them.

Weizilla · on Dec 23, 2015

Two things which I never understood about using environment variables are how do you version control the changes and how do you manage these variables when you have more than just a handful of them?

In this example, it's only six simple values but what happens when you have 10 or 20? Or you have 10 applications with six values each that need to be deployed to four different environments?

Or what about if there are multiple teams making concurrent changes at the same time? What if some application starts failing due to to a recent variable change and you want to revert back or track down who made the change?

I feel like once your application grows to more than just a few simple values, you end up creating a big file to populate these values and you end up back to using configuration files.

sagichmal · on Dec 23, 2015

Environment variables are just a mechanism, or transport, for getting information from the environment to your process. You don't version or manage them, any more than you'd version or manage command line flags. Change control is the responsibility of whatever system sets the variables, or execs the template that produces the run script, or whatever. You don't actually set env vars on the host.

deedubaya · on Dec 23, 2015

Using environment variables for everything is wrong too. API keys and other sensitive information should be in environment vars. Non-private information should definitely be in config files.

If you need the flexibility of environment variables for a semi-configurable non-secure variable, use them to overwrite a sensible default.

andybak · on Dec 24, 2015

Can someone explain to me why env vars > filesystem for secrets? They seem equivalent in most ways that actually matter.

In general 12-factor gets my hackles up as it comes across as dictatorial with explaining why. Even when I'm wrong I like to be gently convinced rather than hit over the head with rule book. Can someone point me to an extensive source that clearly justifies each factor? Ideally with an actual debate about each point (as this often surfaces the strongest parts of the case for something)

nzoschke · on Dec 24, 2015

I have a tremendous amount of experience with the 12 Factor book having worked with at Heroku for 6 years. I am also working on an open source 12 Factor platform called Convox.

One reason the factors are presented as prescriptive because apps that don't do this won't work on Heroku.

Is there a specific factor you'd like to deep dive into?

I'll pick one to start: Environment.

There are many ways you can set and read configuration for an app: env, config files, config tools like chef or puppet, config database like zookeeper or etc. if we are talking about config like a database URL you could also use a service discovery system.

Env represents the simplest contract between your app and whatever platform is running it (the OS, Docker, Heroku, ECS).

If the platform can update env and restart the processes to get the new settings, no other config management is necessary.

It's UNIX, it's simple, and it helps you bootstrap any more specialized config management if you need it (set ZOOKEEPER_URL or CHEF_SERVER_URL).

So ENV feels like a factor to become very prescriptive about.

The biggest debate I can see is if ENV is sufficient to build our micro services on, or if service discovery "magic" is necessary too. I.e Zookeeper, Airbnb SmartStack or Docker Ambassador containers.

For the vast majority of apps, ENV is sufficient.

I personally still build my more complex apps around ENV and at all costs avoid needing to use a service discovery system. The added complexity and operations isn't worth it to me.

I have a strong hunch that service discovery won't become an app development pattern that everyone uses until a managed platform (like Heroku) offers it. Perhaps this is where Docker, Swarm and Tutum is headed.

mcpherrinm · on Dec 24, 2015

I don't like env vars for secrets: They tend to be easier to leak out of your process, especially via execing child processes. At least with files you can open them CLOEXEC.

Files on disk have the problem of being persistent, though, and being subject to Unix permissions, instead of the process you're explicitly giving the env variables to.

The solution I work on is to keep files in a non-persistent filesystem that audits access and ensures tight permissions (Keywhiz), though in many cases a tmpfs and auditd will do the same.

hobs · on Dec 24, 2015

Anybody care to comment why this was downvoted?

kevinastone · on Dec 24, 2015

Because your environment variables can (and should) be defined by a config somewhere, but passing the information via the process environment allows more flexibility than requiring access to a file. Someone said it will elsewhere that env variables are a transport, not the storage of the config information.

ben_straub · on Dec 23, 2015

You effectively build your configuration file into the thing that knows how to run your container. If you're running Kubernetes, this is either a secret or the replication controller definition file. For docker-compose, this is the `docker-compose.yml` file. Or it's the script that starts your container.

But it's pretty common to put service credentials into a config file, so it's an anti-pattern to version-control them. It's _way_ safer not to, which means you shouldn't be version-controlling the thing that runs your container? This is sort of tricky. We're doing it by volume-mapping a non-version-controlled file for database credentials, and storing the rest of the configuration in the database.

jacques_chester · on Dec 24, 2015

In CF-land the most common pattern I've seen for important keys is a "secrets" repository which is merged with the base config at push time.

andybak · on Dec 24, 2015

With Ansible you have an encrypted vault file that stores your secrets. Similar principle I guess.

mkulke · on Dec 24, 2015

> Two things which I never understood about using environment variables are how do you version control the changes and how do you manage these variables when you have more than just a handful of them?

We're doing this: the env vars are stored as a stage/container/key hierarchy in version-controlled eyaml files (yaml with encryption at the value level, nice for git diffs). At deployment the eyaml gets decrypted by ops or jenkins converted into a container env map (in our case a kubernetes resource controller).

Additionally we tag deployed containers with the config's git hash to have reproducible deployments, which is actually pretty useful. (again we leverage kubernetes labels, but this principle should could be applied to other orchestration tech i guess).

jacques_chester · on Dec 24, 2015

> Two things which I never understood about using environment variables are how do you version control the changes and how do you manage these variables when you have more than just a handful of them?

If you're using Cloud Foundry, you put them in your manifest.yml and check that into source control. When you do `cf push`, they'll be updated.

Disclaimer: I work for Pivotal, who donate the most engineering to CF.

kawsper · on Dec 24, 2015

So how is that different, or better, than putting in the source?

jacques_chester · on Dec 24, 2015

Different repos with different access credentials to create another layer of separation between secrets and source. Defence in depth.

jordic · on Dec 24, 2015

You shouldn't put in your repo sensitiva data.

nzoschke · on Dec 24, 2015

Thank you for this guide Kelsey.

I worked with countless apps and developers at Heroku on getting their apps running well on the platform. There was always one great mystery: why not build our apps a bit differently (dare I say better) to work in the cloud?

The database connection pattern is spot on. For any network resource, try to connect and if there is a problem retry with back off.

Also log the connection error events that a monitoring tool can notify off of.

I've seen apps that have the absolute worst behavior around this error that will happen. The worst is crashing the app in a way that triggers thrashing around restarts.

We had to build tons of complex restart back off logic into the Heroku platform to handle this.

I often wish app frameworks made this easier. I think most devs don't do these things because it is a chore for s problem that only happens occasionally.

But what if Rails baked this into ActiveRecord?

At one point Rails only logged to files on disk. We came together to add stdout logging to the framework.

kelseyhightower · on Dec 24, 2015

You're welcome.

You have managed to capture the spirit of the post. The goal was to highlight areas where developers can take action, and how improving even the little things can go a long way to improving the entire system -- even the one you can't see.

In the early parts of my career I would often take pride in building complex systems to accommodate for misbehaving applications. Throw in some fancy Nagios alerts and a sleep depriving on-call rotation; I looked like a hero.

Then I learned how to write code.

This was the turning point in my career (2006). I was now brave enough to modify "legacy" production applications to take advantage of "new" infrastructure features like service discovery (use DNS records instead of IPs), and logging directly to syslog (asynchronously with proper ring buffers).

I was willing to learn any language too: Python, PHP, or Java, it did not matter because it allowed me to take action and contribute at the heart of the application.

I'm not saying platforms that also handle "misbehaving" applications or complex failure scenarios are unnecessary. I just consider those platforms as extra layers of protection, not a free pass to ignore building applications that take responsibility for reliability from startup to shutdown.

merb · on Dec 23, 2015

Everything written in this article is easily done without docker. Just with groups / systemd. Docker makes these things way harder to do. Especially since packaging isn't so hard anymore. Mostly Dynamic Language's are harder to package, but when I think about Java, Go or other compiled language's you mostly could just create a single file which you could version.

jsnk · on Dec 23, 2015

This is an aside, but somewhat related.

Where are you suppose to store secretive environment variables like database password or apikey/secret pair etc (say in Ubuntu server)? Is storing in something like ~/.profile or ~/.bashrc and doing `export SECRET_KEY='plaintext_secret'` on the server enough, or should they be treated in an even more secretive manner?

Perceptes · on Dec 24, 2015

This is a problem that several new tools are being built to address, e.g. Vault from HashiCorp and Keywhiz from Square. Storing the secrets unencrypted on disk on the host system is not a huge improvement over having them in the application by default. Ideally you want a system to store them securely that allows them to be extracted and decrypted only using credentials and policies you control. They should only ever exist in memory (which is why Keywhiz uses FUSE, for example.) Some container orchestration tools like Kubernetes also include their own mechanism for securely storing and retrieving secrets and making them available to applications.

allan_s · on Dec 24, 2015

How do you manage things like database schema (except by switching to schema-less database :) ), is it your software which is suppose to create it if non existing ?

In case the database is pristine, ok I see, I do a "create table if not exist"

But if the database is at version N and I want to go version N+1 , what do I do? I mean I do know about database migration tools, but how does it integrates in your "pure 12factor" deployement, as it means when you deploy you need to have at least this order:

1. bring up the database 2. run the migration script 3. bring up the application

and the article was advocating to make things in a way that you don't need to have a "you MUST first run this, then that"

spotman · on Dec 24, 2015

One way to think about this that maybe overlaps with the theme of this article, but also stands on its own with specific regard to database modifications is that you often need to have the ability to have multiple versions of everything alive at one point in time.

So maybe that's a schema A and a schema B, or maybe you have applied schema B which only app version 1.1 is optimized for , but version 1.0 is what is in production immediately following your database migration. So you can't make changes in schema B that would immediately render app 1.0 broken, which means you need to not box yourself into a corner with future assumptions as much as possible.

Ultimately if downtime is not an option you end up writing these capabilities in at every layer. Whether it's an api endpoint or code talking to a database you often have to make carefully thought out changes incrementally to ensure that things can all operate simultaneously, and often this ends up having metadata about the versions of everything as an option for taking different code paths.

This article touches on this in the way that if suggests making your app deal with an available database and one that is not available. Same with a field in a schema or a payload. To make your code less brittle instruct it what to do in both cases.

zimbatm · on Dec 25, 2015

Threat your database like an API and only introduce backward-compatible changes. For example if you want to rename a column, make a new column with the target name and a trigger that keeps both columns in sync. Once the new code is deployed everywhere you can remove the trigger and the old column.

davidbanham · on Dec 24, 2015

At no point does the author elaborate on why failing to start if the environment isn't sane is a bad thing. All my software checks for the things it expects to be in place, then bails hard and fast if they aren't.

It's then up to the init daemon to attempt to restart that process, and up to the monitoring and orchestration tools to ensure that the environment returns to sanity over time.

fauigerzigerk · on Dec 24, 2015

I agree. Things become incredibly murky if preconditions are not clearly seperated from optional settings with sensible defaults. For a server-side multi-user application to just go off hunting for data stores or even create new ones whenever configuration settings are missing seems like a security and data integrity nightmare.

If something is a precondition then the app shouldn't act like it wasn't just to make Docker configuration easier. It needs to fail fast.

Retrying database and other connections is sometimes the right thing to do in long running applications. But I dont' think application launch is the right time for it. Application launch is an opportunity to make sure that all dependencies were in place at some point. If things break later on, the odds of it being a temporary issue are much better.

parasubvert · on Dec 24, 2015

"If something is a precondition then the app shouldn't act like it wasn't just to make Docker configuration easier. It needs to fail Fail fast doesn't mean "crash completely". It means "fallback to the next sensible approach".

"But I dont' think application launch is the right time for it. Application launch is an opportunity to make sure that all dependencies were in place at some point. If things break later on, the odds of it being a temporary issue are much better."

This is presuming you or some human has control over the lifecycle of an individual process.

The trend in both mobile and cloud native is the process model, which says the opposite: your app process can and will be killed or relaunched any time by the underlying OS. It may do so out of sequence with backing service availability. This, retries (with a time or count bound, perhaps) are a sensible default.

fauigerzigerk · on Dec 26, 2015

>This is presuming you or some human has control over the lifecycle of an individual process.

No, that doesn't matter at all. It's like with DBMS transactions. I want a defined point at which the system is in a known good state or fail in some detectable way.

For long running processes that get bounced automatically, there needs to be some sort of monitoring anyway. Monitoring is easier if the application does not linger endlessly in an inconsistent state.

parasubvert · on Dec 27, 2015

"I want a defined point at which the system is in a known good state or fail in some detectable way."

I think we're (as an industry) getting to scale and complexity of systems that warrants systems to heal themselves for a range of predictable and well-understood failure modes, in a way that doesn't require my manual interference.

davidbanham · on Dec 29, 2015

Absolutely! But that doesn't need to be the app's job. It's the role of an orthogonal process, the monitoring or init daemon, to say "Hey, this process bailed. I should restart it."

That way all your app has to worry about is "Every time I get started, I should try and connect to my dependent services. If it doesn't work, bail."

And the monitoring process gets to worry about things like "I should retry X times before giving up. There need to be at least Y instances of this process running."

fauigerzigerk · on Dec 27, 2015

Agreed, but I don't think we can do that without having transactional boundaries. When a transaction fails, it doesn't necessarily mean that a human has to intervene. It just means that we have a reduced set of possible states that are known to be consistent. I don't see how we could ever hope to define correct self healing algorithms without reducing the number of possible states a system can be in.

parasubvert · on Dec 29, 2015

Yes. Consistency guarantees in the face of distributed failure is a popular topic. (CAP theorem, etc). The whole "cloud native" (12 factor apps, microservices, immutable or disposable infrastructure) movement is also trying to describe ways to simplify most codebases so that you keep as much of the system as stateless/ephemeral as possible.

What is interesting is that , for the stateful/persistent data processing, most large scale systems are rejecting transaction boundaries as we know them (fully serializable isolation and consistently) for relaxed consistency. There are some good articles and papers on how programming needs to change to enable better self healing / "recoverable to a known state" behaviour, such as CRDTs.

ozim · on Dec 24, 2015

I think that author just does not have experience outside of what he does. Maybe his systems can fall back to sane defaults. But what is sane default if you have to communicate with 3rd party server and your system is worthless when connection is not there? You have to have ip/domain name configured.

For cars if something is wrong then in some cases you can start and even drive but users get warning. If there is something really wrong car will not start.

So I think what author suggests is at least asking for trouble.

Almost everywhere as quoted:

"Everything in this post is about improving the deployment process for your applications, specifically those running in a Docker container, but these ideas should apply almost anywhere."

parasubvert · on Dec 24, 2015

"I think that author just does not have experience outside of what he does."

Kelsey's recommendations aren't that different from general resilient systems guidelines in the Erlang community, or any of the many notes on how distributed system development is Different.

"But what is sane default if you have to communicate with 3rd party server and your system is worthless when connection is not there? "

The sane default is to wait and keep trying to connect for at least a bounded period of time (and a sensible approach to backoff).

The point is that, if you're building distributed systems, you need to account for partial failure. One of the 12 factors is the process model: your app should be considered a process that will be killed and/or restarted at will. It might do so when your backing services are currently unavailable.

The sensible thing is to retry a bounded number times (with backoff) before giving up. Sometimes the underlying application platform also does this for you (by killing your app process and rescheduling it, if it doesn't come up after a certain time period).

Baking in timing dependencies as a failure mode makes software less resilient.

lwf · on Dec 24, 2015

The author is a she.

parasubvert · on Dec 24, 2015

Kelsey isn't a she :)

lwf · on Dec 25, 2015

Oh, welp. That's what I get for gendering names. My apologies, to both you and Kelsey.

kazagistar · on Dec 23, 2015

There is one thing that really confuses me about 12 factor...

It suggests that you provide the locations of backing services in the config. This seems insane, since it means that you cannot move any backing services. Do they expect you to restart when you switch backing services? Do they expect you to run all your internal traffic through load balancers?

We provide each container with the address of the local service discovery server (Consul) and it finds what it needs itself, when it needs it. I assume everyone using this kind of setup in production is doing something similar?

sagichmal · on Dec 23, 2015

"Providing the location of the backing services" doesn't mean the physical address, but the logical address, managed by an e.g. load balancer, capable of abstracting over physical changes as necessary. Consul is one way to do it; DNS is another, and there are many more.

jacques_chester · on Dec 24, 2015

> Do they expect you to restart when you switch backing services? Do they expect you to run all your internal traffic through load balancers?

12 Factor was pioneered by Heroku, so yes.

Edit: I don't understand the downvote. This is the factual answer. 12 Factor apps are supposed to delegate a lot of cross-cutting concerns to the platform, including the problem of "where does that service live?" and "can you give me another chunk of service?"

andrewguenther · on Dec 23, 2015

CNAMEs are perfect for this. "customer-data-service.my-product.com" could point to absolutely anything. Give your CNAME a name which is meaningful no matter what service is behind it and you'll never need to change it (externally).

dominotw · on Dec 24, 2015

any problem can be solved with a layer of abstraction.

jacques_chester · on Dec 24, 2015

> As you can see there’s nothing special here, but if you look closely you can see this application will only startup under specific conditions, which we’ll call the happy path. If the configuration file or working directory is missing, or the database is not available during startup, the above application will fail to start.

Why not let a PaaS do this for you? Heroku, Cloud Foundry, OpenShift or the others I've yet to learn about?

Disclaimer: I work for Pivotal who donate the majority of engineering effort to Cloud Foundry.

cpitman · on Dec 24, 2015

I agree with the general idea, let an already existing product handle problems that everyone has. The last part is still important though, if we are going to build "distributed" apps, then we need to handle those dependencies failing or being briefly unavailable.

And the reality is that every app becomes distributed as soon as it has a database or client. A ton of legacy applications make the implicit assumption that the network is reliable, and fall over hard when it isn't true anymore.

jacques_chester · on Dec 24, 2015

I agree that legacy apps break 12 factor rules. That's why the 12 factor app is "a thing" in the first place. An app which follows them can survive being killed and restarted.

I've worked on some legacy migrations and it's usually a process of incrementally chiselling out services and cleaning up hard-coded assumptions. Tedious but usually doable.

kimi · on Dec 24, 2015

Whaleware was created to address some of these issues with Docker. Default configuration, a definite application init phase, plus internal monitoring and lifecycle reporting. https://github.com/l3nz/whaleware

akramhussein · on Dec 24, 2015

If you haven't seen Kelsey give a talk, well worth watching - really funny guy and intelligent. He gave one on Nomad and Kubernetes that's good.