How We Automate Our Infrastructure

avitzurel · on Jan 26, 2016

This seems to be the "new standard" when it comes to startup infrastructure beyond Heroku.

However, what frustrates me the most about it, is that every startup is left to figure out everything from scratch and it seems impossible.

There are many tools you need to familiarize yourself with, too many to be comfortable with.

Companies that already figured it out write blog post like this, which provide insights but it's super high level, as a startup engineer this gives you absolutely no value other than "yes, they are using it too".

I wonder if there's a solution for this generic enough to open source that will be a good start for startups.

You check out the project, read some docs and in 2-3 hours you have a cluster running. Kind of a "batteries included" devops solution.

hackercomplex · on Jan 26, 2016

In my view what we're really talking about here is PaaS. Every shop is left to implement a private PaaS on their own for the most part. There are software companies out there who specialize in helping teams deploy this kind of architecture for example:

https://pivotal.io/platform

http://deis.io/

both are open source technologies based on Docker that are gathering momentum and you can hire consultants to help you deploy either one.

I personally don't use Docker. The startup I'm building has chosen to standardize on the JVM for all application code so we leverage the JAR file as a kind of container. The Java ecosystem already solved the problem of zero-downtime deployments a long time ago so for us deploying can be as simple as shipping new jars file across the network.

Instead of using Docker to drive development we simply spin up development database/redis/etc instances in the cloud which automatically join a development VPN network. All of the non-VPN interfaces are automatically firewalled off. One nice advantage of this setup is that developers who have slow laptops are still able to work. I'm a big fan of this approach.

Check out Wildfly's "High Availablity" features if you're interested in one way that the Java Ecosystem can make headaches like zero-downtime deployment, HTTP health checks, monitoring, caching, and even load balancing disappear.. It'll deploy non-java code too as long as it's on the JVM. If you're a Scala only shop there are some great Scala-only alternatives available to boot.

dmichulke · on Jan 26, 2016

I use a similar setup from time to time and one of my main problems with this is XMHell - you set up everything in XML and whenever something doesn't work, it's hard to find help because you'll get a stack trace instead of "WARNING: <xmlpath> seems to be incorrect" or "Module X: When using feature Y with setting Z, you also need to define A, B and C"

A recent example is trying to make Hibernate work via postgis and postgresql as a datasource in Wildfly. We weren't able to solve it, we could only work around it.

Finally, if you need some behavior off the beaten path, you'll have to use lots of annotated Java which makes it easy if you know all this but it's hard to read a Java file with 10 annotations for classes and methods, simply because you don't know what happens when.

To summarize, it's an ok solution if you have a Java guy with lots of experience in all this (luckily we had one). Otherwise you gonna have to learn a lot (as in by heart) because you can't really reason about XML and annotations (as you could, e.g., when composing services in Clojure).

hackercomplex · on Jan 26, 2016

As you point out there's definitely a learning curve with this tech, however there's also a learning curve with rolling your own versions of many of it's capabilities which is the alternative. I also personally find XML configuration files distasteful, but luckily I found that the default configuration was good enough for me to stand up a clustered "HA" environment with all the bells and whistles.

There are projects out there such as Torquebox for JRuby and Immutant for Clojure which attempt to wrap some of this configuration in a DSL which I think is really convenient.

It is true though that if you want to extend Wildfly you need to create a Wildfly module which can mean writing Java code. I look at this as being similar to how if you want to extend NGINX you have to be prepared to write your configuration in LUA or C. Unfortunately the JBoss community isn't as well documented as NGINX is right now, so realistically there is some pain.

Since my application didn't need to have Wildfly manage database thread pools on it's behalf I didn't feel the specific pain point that you mention.

Over the long term, when thinking about scale I enjoy knowing that there are companies like Redhat out there who provide support for this technology, but I don't anticipate ever needing to engage them. With this tech configuration is always the hard part but once it's up and running it's performance characteristics are predictable and the Undertow web server is in the top 5 on the latest benchmarks: https://www.techempower.com/benchmarks/#section=data-r11&hw=...

brikis98 · on Jan 26, 2016

Trying to create battle-tested, pre-packaged, "batteries included" DevOps solutions is exactly what we're trying to do at Atomic Squirrel [1]. We think there needs to be a middle ground between Platform as a Service (PaaS), like Heroku, where everything is hidden and magical, and therefore, harder to debug, customize, scale, and Infrastructure as a Service (IaaS), like AWS, where you have full power and flexibility, but also way too many moving parts to learn and manage for a small company. If your company needs something like this, contact us at info@atomic-squirrel.net.

[1] http://www.atomic-squirrel.net/

avitzurel · on Jan 26, 2016

Actually working on an open source solution around this exact space.

I think the space between Heroku and AWS remains to be solved and lots of companies will jump on the train (if it's good and fast enough).

noir_lord · on Jan 26, 2016

In my case I found Ansible was pretty much all I needed and it only took me a day or so to get my head around the basics (though I'm still learning all the other interesting stuff you can do) - in truth though I've been running and deploying servers for years with bash and python stuff so it just felt like a more generic better put together version of stuff I was already doing.

dano · on Jan 27, 2016

Agreed. I'm quite pleased with Ansible having been around a while and seen the growth through cf_engine, custom scripts, chef, puppet, and salt. Ansible is certainly quite easy to get going, very flexible, and precise.

joshmanders · on Jan 26, 2016

I've been heavily researching and working with Docker. While I am building my new business, I decided to give back to the open source community and have been doing my best to open source every aspect of the business that I can without giving away our business. One of the things I am doing is abstracting a docker deployment workflow out into a service of it's own.

Basically what I have come up with is a push or merge on master in github, triggers a build in the service, which will push your new image up to docker hub, then ping an agent that runs on your docker host, notifying it of the new image, and any meta data needed to determine how it should proceed.

So for example, if git push to master on app, webhook fires on service, service pulls code, runs commands to run tests if you want, build docker image, etc. Push new image to docker hub, pings agent on docker host, agent gets data, pulls new image, deploys new container, does health checks, and then starts migrating new traffic to the new container before taking old container offline.

DanielDent · on Jan 26, 2016

We've actually been considering if we should turn our internal environment into a product and/or service-product mix.

We've got a mostly automated cloud-agnostic process for spinning up a multi-datacenter Mesos cluster which integrates nicely with a docker CI workflow.

I'm pretty sure it's quite valuable, though I'm also unclear what people would be willing to pay.

helloiamaperson · on Jan 26, 2016

> I'm pretty sure it's quite valuable, though I'm also unclear what people would be willing to pay.

Your solution probably works great for your needs, but this stuff is expensive to productize. See https://www.openshift.org/

DanielDent · on Jan 26, 2016

Who I note basically decided to start from scratch because this docker thing happened. Cloud Foundry has had to do a lot of re-thinking too.

But my impression of openshift is that it's really a work in progress and that they haven't actually gotten it adequately productized yet.

Docker has gotten enough developer buy-in into containerization that I think it's fundamentally changed what it means to do infrastructure, be it PaaS or IaaS or whatever.

brazzledazzle · on Jan 26, 2016

Probably an oversimplification on my part but it seems like OpenShift is nice enterprise friendly features sprinkled on top of Kubernetes. Which is no small thing, they've contributed quite a few patches to Kubernetes that are critical for a lot of enterprises. And a read/write GUI shouldn't be a hard requirement these days but a lot of big companies have this ingrained habit of treating IT like a commodity and subsequently hire people that are so uncomfortable with the CLI they're openly hostile to the idea of even touching it.

Then there's command and control. OpenShift seems to be more friendly to keeping things under someone's thumb. In an ideal world people would use Kubernetes the way Google uses Borg and devs would be trusted the way they are at Google. But between corporate fiefdoms and the aforementioned hiring practices many companies are still very far from that ideal.

avitzurel · on Jan 26, 2016

IMHO, the problem is not only whether people will pay.

The problem is that this includes too many new tools that startups need to learn about, implement and maintain.

Most people, just reading "Mesos" "Marathon" or other in the space just tune out.

DanielDent · on Jan 26, 2016

Agreed - and it's understandable why. The list of things that you need to understand before you get to work on your actual goal is way too long.

And there are a lot of nuances that make different tooling the right choice for different situations.

And the problem with trying to simply say "Do this" is what happens if your "do this" flow is tool X (similar to tool Y), but the CTO likes tool Y. Which isn't compatible with tool Z.

There are a nearly infinite number of ways to put together a decent development work flow.

curun1r · on Jan 26, 2016

Speaking as someone who rolled his own version of this, there were a lot of more complete solutions out there, but they all involved some technology that I felt would cause more pain down the road. Whether it's Chef/Ansible/Puppet which are popular, but seem targeted at mutable infrastructure (one of our explicit goals was immutable infrastructure) or Mesos/Kubernetes/ECS/CoreOS which seem targeted at a larger fleet of instances than we're running, there didn't seem to be any starting point beyond composing the right set of tools and writing the glue that made sense for us.

What we ended up with uses Terraform for provisioning instances, Docker (and a private registry) for distributing our application code, Consul for coordinating everything and HAProxy w/ consul-template for dynamic routing. There were only two pieces that we had to write. The first (which we may open source, if we're given the time to clean it up and generalize it) is a small Go agent that runs on provisioned hosts, figures out its role based on instance meta data, pulls its configuration from Consul and handles deployment, both initial and subsequent when a new version is registered with Consul. The second piece is ensuring that CI generates Docker images as artifacts, pushes them to our private registry and updates Consul to indicate that there's new code to deploy.

It took us about a week to get this working and it's been mostly rock solid for almost a year now. Part of why it's been solid is that we understand exactly how every component of it works. The one problem we've had came from not understanding how HAProxy worked (never point HAProxy and an ELB...it will cache the NS resolution and ELBs can change IPs over time). If we'd tried something off-the-shelf, we'd have a much shallower understanding and, since it's not optimized for our use case, we would have run into many more issues than we've had. On the whole, I highly recommend rolling your own. The code that you will have to write is glue code that's really just replacing what would be configuration in something pre-built. I get that it seems imposing to people without devops experience, but between the tools that are available these days and articles like the one we're commenting about, it doesn't take a guru to get everything working seamlessly. Also, the tools from Hashicorp are fabulous. Use them whenever possible. No disclaimer necessary since I have no affiliation with them beyond using their tools and watching their talks on the subject.

drakenot · on Jan 26, 2016

This past summer I spent some time learning Ansible. I've written scripts for the configuration and the deployment of my application's various services. The built-in idempotency of the commands was a big win for me and I feel fairly productive using the tool now.

My only complaint with Ansible really has been that it feels slow at times.

I'm interested in checking out Docker. What exactly does it buy me over my Ansible config/deployment scripts? Does it obsolete them?

dexterbt1 · on Jan 26, 2016

Ansible and Docker are orthogonal technologies. Docker buys you repeatable, application packaging to solve dev/prod parity. Ansible can then become your orchestration tool, doing the heavy lifting to manage not just containers, but hosts, dns, LBs, etc.

drakenot · on Jan 26, 2016

But by using Docker, it does change the way you use Ansible, right? I'm not going to be executing Playbooks against a set of hosts anymore to configure them.

Instead, I guess I'll be using a Ansible to configure a container locally (in place of using Dockerfiles)? Then perhaps a different Playbook to deploy this container to my hosts?

chetanahuja · on Jan 26, 2016

Note, container images are blobs of static content... programs, libraries etc. When they are running, they're akin to processes (or rather, actually are processes) running on a physical machine or a VM. Typically your build process will create docker container images (via some sort of CI tool perhaps).

Ansible is useful for automating tasks on an actual unix machine (VM or physical). Think of it basically as a parallel ssh to your remote machines.

So typically, you'd use docker containers to create reliable packages for your code and use ansible to do things like provision machines, change configs, run one-time commands on groups of machines etc. And yes, you can also use ansible to deploy your docker containers to your servers too. But that part is more manageable with tools like quay etc. which gives you nice things like package versioning etc.

DanielDent · on Jan 26, 2016

One way I like to think about it: docker pull is the new apt-get

jacoelho · on Jan 27, 2016

if you aren't setting capabilities, changing users and limits, and docker is a packing system

crdoconnor · on Jan 26, 2016

Docker has a build file. They're not entirely orthogonal.

It's also not strictly necessary to ensure dev/prod parity.

thraxil · on Jan 26, 2016

> My only complaint with Ansible really has been that it feels slow at times.

Highly recommend Salt then. A bit more of a learning curve, but so much faster than Ansible.

webo · on Jan 26, 2016

Really wish this article either included more details or segmentio open-sources a few of the tools.

calvinfo · on Jan 26, 2016

Totally hear you.

We're planning on open-sourcing some pieces of our Terraform config and service toolkit in the next few months. We're definitely excited to share our internal tooling with the rest of the community.

webo · on Jan 26, 2016

Awesome, looming forward to it!

kreutz · on Jan 26, 2016

Checkout Convox.com. Stellar team behind an awesome project.