Learning to operate Kubernetes reliably (stripe.com)
357 points by mglukhovsky on Dec 20, 2017 | 100 comments


Much as it burns me to admit this, for this use case, Jenkins is king. Under 60 nodes and it's perfect.

At a previous job, we had migrated from a nasty cron orchestration system to Jenkins. It did a number of things, including building software, batch-generating thumbnails, and moving data about, on around 30 nodes, of which about 25 were fungible.

Jenkins Job Builder meant that everything was defined in YAML, stored in git, and repeatable. A sane user environment meant that we could execute as a user and inherit their environment. It has sensible retry logic, and lots of hooks for all your hooking needs. Pipelines are useful for chaining jobs together.
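To make that concrete, a Jenkins Job Builder definition for a cron-style job looks roughly like this (a sketch; the job name, node label, and script are hypothetical):

  - job:
      name: nightly-thumbnails        # hypothetical job name
      node: render-workers            # only run on slaves carrying this label
      triggers:
        - timed: "H 2 * * *"          # cron syntax; H spreads the start time across the hour
      builders:
        - shell: |
            ./batch/generate_thumbnails.sh

Check that into git, run JJB against your master, and the job exists; change the YAML and re-run to update it.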

We _could_ have written them as normal jobs to be run somewhere in the 36k-node farm, but that was more hassle than it's worth. Sure, it's fun, but we'd have had to contend with sharing a box that's doing a fluid sim or similar, so we'd have had to carve off a section anyway.

However, Kubernetes to _just_ run cron is a massive waste. It smacks of shiny-new-tool syndrome. Seriously, Jenkins is a single-day deployment. Transplanting the cron jobs is again less than a day (assuming your slaves have a decent environment).

So, with the greatest of respect, talking about building a business case is pretty moot when you are effectively wasting what appears to be more than two man-months on what should be a week-long migration. Think gaffer tape, not carbon fibre bonded to aluminium.

If, however, the rest of the platform lives on Kubernetes, then I could see the logic. Having all your stuff running on one platform is very appealing, especially if you have invested time in translating comprehensive monitoring into business-relevant alerts.


Hi! Post author here! I agree that it's really important to be careful of "shiny new tool" syndrome -- one of my primary goals in writing this post was to show that operating Kubernetes in production is complicated and to encourage people to think carefully before introducing a Kubernetes cluster into their infrastructure.

As you say -- I think "we want to run some cron jobs" isn't by itself a good enough reason to use Kubernetes (though it might be a good enough reason if you’re using a managed Kubernetes cluster where someone else handles the cluster operations). A goal for this project was to prove to ourselves that we actually could run production code in Kubernetes, to learn about how much work operating Kubernetes actually is, and to lay the groundwork for moving more things to Kubernetes in the future.

In my mind, a huge advantage of Kubernetes is that Kubernetes' code is very readable and they're great at accepting contributions. In the past when we've run into performance problems with Jenkins (we also use jenkins-job-builder to manage our 1k node Jenkins cluster), they've been extremely difficult to debug and it's hard to get visibility into what's going on inside Jenkins. I find Kubernetes’ code a lot easier to read, it's fairly easy to monitor the internals, and the core components have pprof included by default if you want to get profiling information out. Being able to easily fix bugs in Kubernetes and get the patches merged upstream has been a big deal for us.


> A goal for this project was to prove to ourselves that we actually could run production code in Kubernetes, to learn about how much work operating Kubernetes actually is, and to lay the groundwork for moving more things to Kubernetes in the future.

Why wasn't the final sentence "and to re-evaluate if moving forward was even a good idea?"

Because I get nervous every time someone is relying on their patches to be included upstream. Or they need to dive in to the internals of something repeatedly. That screams "not production ready" to me.

After reading the post, Kubernetes did not sound at all like a slam dunk in terms of a solution, let alone a foundation for more mission critical infrastructure. The Jenkins solution offered by the parent sounds more reasonable, even with the objections you list.

Edit: Take my comments with a grain of salt, but from an internet armchair vantage point it does sound like Kubernetes was chosen first and rationalized second. (Though I very much appreciated the thoroughness with which you went about learning the technology.)


Hello! I work at Stripe and helped with some aspects of the Kubernetes cron stuff -- maybe these answers can be helpful.

  > Why wasn't the final sentence "and to re-evaluate if
  > moving forward was even a good idea?"
I think that's sort of implied -- complex technical projects have a risk of unexpected roadblocks, and it's important that "stop and roll back" always be on the list of options. Never burn your ships.

We invested a (proportionally) large amount of engineering effort to ensure we had the ability to move the whole shebang back to Chronos ~immediately. As noted in the article, we exercised this rollback feature several times when particular cronjobs deviated from expected behavior when run in Kubernetes.

  > Because I get nervous every time someone is relying on
  > their patches to be included upstream. Or they need to
  > dive in to the internals of something repeatedly. That
  > screams "not production ready" to me.
This is the same basic model as distro-specific patches to the Linux kernel.

Every engineering organization reaches the point where they want more features than are available in an existing platform. The most practical solutions for this are to launch a new platform ("Not Invented Here"), or contribute code upstream. The first option can provide better short-term outcomes, but is usually inferior on multi-year timescales.

Consider that with a mature build infrastructure, internal builds are actually the latest stable release plus cherry-picked patches. This provides the best of all worlds -- an upstream foundation, with bug fixes on our schedule, and an eventually-consistent contribution to the community.


Julia is a visibility pro. When things scale, you need to be able to look inside the thing. If that's tough, that's :grimacing: for probably hundreds of developers. What a waste! /irony


I disagree that Jenkins is king for this. Jenkins is a single point of failure; it isn't a highly available distributed scheduler. It is a single master with slaves. While it is easy to configure Jenkins jobs with code (Job Builder, Job DSL, Jenkinsfiles), it is a pain to manage Jenkins itself with code. Plugins, authentication, all the non-job configuration -- that is usually done via the GUI.

Saying Jenkins can be configured in a day, to the degree that Stripe configured Kubernetes (with Puppet), is disingenuous. It would take more than a day to do the configuration management of the slaves and get the right dependencies for all the jobs.

How do you isolate job executions in Jenkins? In Kubernetes each job is inherently isolated in containers. In Jenkins you have a bunch of choices. Do you only run one executor per slave? OK, but then you have a bunch of wasted capacity some of the time, and not enough capacity at other times. You could dynamically provision EC2 instances to scale capacity, but then you need a setup to bake your slave AMIs, and you have potentially added ~3 minutes to jobs for EC2 provisioning. You can run the jobs in Docker containers on the slaves, which will probably get you better bin packing, but it doesn't have resource management the way Kubernetes does, so you could easily overload a slave (leading to failure) while other slaves are underutilized.
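For comparison, the resource management in question is a few lines of YAML per job in Kubernetes. A rough sketch, with hypothetical names and numbers:

  apiVersion: v1
  kind: Pod
  metadata:
    name: thumbnail-batch-1234                          # hypothetical
  spec:
    restartPolicy: Never
    containers:
    - name: worker
      image: registry.example.com/thumbnailer:latest    # hypothetical image
      resources:
        requests:          # what the scheduler uses for bin packing
          cpu: 500m
          memory: 512Mi
        limits:            # hard caps enforced on the container
          cpu: "1"
          memory: 1Gi

The scheduler won't place a pod on a node whose outstanding requests would exceed its capacity, which is what avoids the "one slave overloaded while others sit idle" situation.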

Doing Jenkins right is not easy; there are solutions to all the problems, but it isn't just "fire it up and it works".

Stripe was running Chronos before, which is a Mesos scheduler. So they have experience with distributed cluster schedulers. They were probably comfortable with the idea of Kubernetes.

They mention this as a first step to using Kubernetes for other things. So they probably wanted to use Kubernetes for other things, and this seemed like a low-risk way to get experience with it. Just like GitHub started using Kubernetes for their internal review-lab to get comfortable with it before moving to riskier things (https://githubengineering.com/kubernetes-at-github/).


> it is a pain to manage Jenkins itself with code. Plugins, authentication, all the non-job configuration, that is usually done via the GUI.

This is not true; all the configuration is scriptable via Groovy scripts. We run a bunch of Groovy startup scripts that configure everything post-launch. There is an effort by the Jenkins team to support this better[1].

> How do you isolate job executions in Jenkins? In Kubernetes each job is inherently isolated in containers.

We run one Docker container per build on Docker Swarm. Each build gets its own isolated/clean environment. There is no EC2 provisioning etc. We already own and maintain a Docker Swarm setup; we just run Jenkins/Jenkins agents on it. I assume if you are using Kubernetes it would be a similar setup.

> Jenkins is a single point of failure, is isn't a highly available distributed scheduler.

I agree with this to an extent. If you are running Jenkins on a scheduler it can be rescheduled, but your in-flight jobs are dead.

1. https://github.com/jenkinsci/configuration-as-code-plugin
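For the curious, the plugin in [1] lets you describe that kind of global configuration in YAML instead of Groovy. A rough sketch (hypothetical values; the exact keys depend on which plugins you have installed):

  jenkins:
    numExecutors: 1
    mode: EXCLUSIVE                             # only run jobs that target this node's labels
    securityRealm:
      ldap:
        configurations:
          - server: ldaps://ldap.example.com    # hypothetical LDAP server
            rootDN: dc=example,dc=com
  unclassified:
    location:
      url: https://jenkins.example.com/         # hypothetical Jenkins URL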


> > it is a pain to manage Jenkins itself with code

> This is not true, all the configuration is scriptable via groovy scripts. [...] There is an effort to support this better[1] by jenkins team

The link you gave confirms it by saying that managing Jenkins as code "require you know Jenkins internals, and are confident in writing groovy scripts". Neither GUIs (like the one shown in your link) nor procedural languages (like Apache Groovy, still procedural even though its collection API is crippled for Jenkins pipelines) are very good for configuring software. Nor is an unreadable declarative language (like XML).

A readable declarative language (like YAML, as shown in your link) is the solution. Languages like Groovy were an over-reaction against unreadable XML in the Java ecosystem. The correct solution is to switch from an unreadable to a readable declarative language for configuring software.


> Languages like Groovy were an over-reaction against unreadable XML in the Java ecosystem. The correct solution is to switch from an unreadable to a readable declarative language for configuring software.

I somewhat agree with you. Unfortunately the Jenkins team seems to have bet in the opposite direction by going full Groovy: https://github.com/jenkinsci/pipeline-examples


Since when does docker swarm support build steps pre launch?

Are you sure you're not just using plain docker on docker swarm nodes?


> isn't a highly available distributed scheduler.

Bingo! That's the point; it's a cron replacement.

But to tackle your first point: K8s might be distributed, but it's not inherently reliable. Yeah, sure, people run it in production, but there are a myriad of bugs that you bump into. I've lost clusters due to tiny issues that ran rampant -- something that I've not had happen in other cluster or grid-engine systems.

If we are talking AWS, then having the Jenkins master in an auto-scaling group with decent monitoring sorts out most of your uptime issues.

The reason I say it'd take a day to configure Jenkins is because the jobs have already been set up in Chronos. It should literally be a copy-pasta job. All the hard work of figuring out which jobs are box killers, which can share, and which are a bit sticky has been done already; all that's changing is the execution system.

What level of isolation are you after, and for what purpose? If jobs can't live on the same box, then that's almost certainly bad job design. (Yes, there are exceptions, but unbounded memory or CPU usage is just nasty.) There may be a need for regulatory isolation, but containers are not currently recognised as isolated for that purpose.


It feels like you didn’t read the article.

The author made clear multiple times that they were using cron jobs as a test bed for Kubernetes, and they chose to “overengineer” because they’re looking to use Kubernetes for more and more of their needs over time. You’re kind of arguing against a straw man.

I think it’s actually a great example of how Stripe thinks about technology choices.

They’re interested in choosing fewer tools that are better built and can grow to solve more needs. And they’re evaluating tools not just by “time to complete X random project”, but by other longer-term heuristics like maintenance levels. And the best way to do that is to start using the tool for a single need, investing more time in learning/research than is required for the need itself—ensuring that it really is a solid, foundational solution—with the understanding that you’re choosing technology for the long run. Then continue to expand your use of the tool over time, reaping benefits on your initial time investment.


I read the article, I understand completely, and I've heard that argument before. That's why at my company we have three incompatible, half-arsed K8s clusters.

The point where you have to fix upstream bugs is the point where one says: fuck it, it's not stable enough, more trouble than it's worth. Let's use gaffer tape and move on. As for maintenance, without company buy-in for transplanting the _entire_ stack, it's questionable. And if there are only two people and you have to maintain an entire distributed stack, that smacks of pain.

One company, one platform.


I think you're sort of right but not really.

If the benefits of running k8s outweigh the effort of kicking a few patches upstream, then it's worth it. Further, if nobody is kicking patches upstream, where exactly are our open-source solutions coming from?

I would counter-argue with the times Jenkins has bitten me in the ass, but actually, most solutions will when you go deep enough.


Jenkins is an utter, utter arse, don't get me wrong. I would gladly pay for CircleCI for 90% of use cases. We have > 90 Jenkins masters here (don't ask), all in various states of rot. All of them are unceasingly tedious.

However, for getting a script to run on a certain bunch of nodes, at a certain time, under given conditions, it's pretty simple. (Unless you have a fetish for the myriad of unstable Jenkins plugins.)

K8s, however, isn't simple for that use case. If I had to read the code and then push changes _before_ it worked for my use case, I'd have dropped it like a bag of sick.

However, I do take your point that if no one pushes upstream then it's not very fun at all.


If you have a hammer ...


I've also previously used Jenkins for cron to pretty good effect (I like to call it "jcron"). The ability to define jobs in YAML and have it be driven from your SCM is really awesome.

However, k8s does more than just schedule where pods run. It also ensures that they run with the correct security and availability constraints, and it adds things like affinity (don't run this job on the same machine as that job, or only run jobs for this tenant on nodes assigned to that tenant), storage management (connect this job to this volume), networking (only let this pod talk to this service and the monitoring layer; don't let anyone connect to the pods running the job), and much, much more.
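The affinity part in particular is declarative and compact. A sketch of "never co-schedule me with that other group of jobs", assuming a hypothetical label, which lives under the job's pod spec:

  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            job-group: heavy-batch        # hypothetical label carried by the other jobs
        topologyKey: kubernetes.io/hostname

The Jenkins equivalent is usually a pile of node labels and tribal knowledge.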

Yeah, you can do that with jenkins, or like, just cron. I know, because I did it for 18 years before I had ever heard of Kubernetes.

But, just like I can reach for Django or Rails or whatever it is that Java programmers use these days to build my web application, I can lean on Kubernetes to build my infrastructure.

I estimate that leveraging GKE has saved me in the range of $400k in direct employee costs, not to mention time-to-market advantages. As we grow, I expect that number to go higher.


> I can lean on Kubernetes to build my infrastructure. ... I estimate that leveraging GKE has saved me [$BigMoney]

I'm very sympathetic to the view that jenkins, or something comparable, is viable and cost effective for a lot of shops if you're looking exclusively at direct project costs.

As you've pointed out, though, as a building block of Enterprise software the ability to scale out in, and across, multiple clouds consistently is an economic and development boon so powerful I don't think one should really be looking at k8s as just a microservice/deployment platform: it's a common environment-ignorant application standard. Picking and choosing per service whether you should be hosting in GKE, AWS, or on-premise, applying federated clusters, recreating whole production environments for dev... It's a gamechanger.

It's totally possible to fire up a new Jenkins solution in EC2, but as of a few weeks ago Kubernetes is click-and-go in all three major cloud providers. It totally reshapes how we're looking at development projects with suppliers, testing, etc, as we can create fictionalized shared versions of our production environment for development, integration, and testing. As an emerging industry wide standard we can demand and expect Kubernetes knowledge from third parties in a way a home-brewed Jenkins setup could never match.


Jenkins also has a notion of hosts and tags to decide where jobs are assigned.


Though its resource awareness is lacking, which is where k8s shines. Honestly, I find combining Jenkins and K8s a relatively pleasant experience. The Jenkins kubernetes-plugin has gaps and issues, but with time it will mature. There's no reason you can't combine them to get the best of both worlds.


My current company keeps trying to cook up elaborate systems to keep certain deployments from happening while others are going on, and I couldn’t recall ever having to solve this previously, which is odd, because of course this has been a problem before.

Yeah, I was using my CI system to handle the CD constraints, and it was so straightforward it hardly registered as work. I set up one build agent with a custom property, and all the builds that couldn’t run simultaneously required an agent with that property. So they just queued in chronological order of arrival. Done. Next problem.


Red Hat's OpenShift has a very nice Jenkins <-> Kubernetes integration, too.


Depends on the nature of the cronjobs you're scheduling. If your cronjobs cannot run in parallel on the same node (or, more likely, you cannot trust that they can safely run in parallel on the same node, because somebody else wrote the job and didn't need your review or approval before deploying to the scheduler), then you need to restrict each Jenkins node to a single executor, and you cannot run more cronjobs in parallel than you have Jenkins nodes, or else those cronjobs will be delayed. Because Kubernetes enforces the use of containers, multiple jobs can be run on each Kubernetes node with no issues (by design).

Remember - if there's a one in a million chance of a collision, it'll happen by next Tuesday.


Why not both? - https://github.com/jenkinsci/kubernetes-plugin

You provide a scalable infrastructure underneath your Jenkins install while not dealing with the issue of node/agent allocation. Plus, you get Kubernetes for your not-so-simple crons.


Been using Jenkins a bunch here and cronjobs are the only thing it does really nicely. We're thinking of switching to CircleCI for builds though (which has been a pain because no self-hosting), and I'm not sure Jenkins makes sense to keep as only a cronjobber.

Has anyone used Airflow for cronjobs? is it a good idea or a terrible one?


I would argue that, while Stripe is going with a scratch build, this could be motivated by AWS's lack of a good managed Kube offering, which is changing in the next few months.

With a managed Kube offering, setting up Kube is much, much easier than the Jenkins setup you are suggesting. And there's no overhead charge. Why would anyone go through the hassle of manually provisioning machines like you suggest when AWS/GCP will do it for you?

It's overkill in the same way using DynamoDB for something that only experiences a handful of writes every day is overkill; who cares? The scale is there if you need it, but it doesn't cost anything not to use it.


Setting up a K8S cluster isn't that hard, actually.

From my experience, the hard part kicks in when dealing with stateful services which need to be associated with volumes.

Even with a managed cluster, you still have to solve that problem. Either you pre-provision disks or use dynamic volumes.

Next is upgrading the K8S version. With a stateless service, it's a walk in the park to upgrade. With data volumes it's trickier, because you want to control the process of replacing nodes and ensure the data volumes get mounted and migrated to the new nodes properly.

Things get harder especially with stuff like Kafka/ZooKeeper, when pods get removed and re-balancing happens.

In other words, managed Kubernetes actually doesn't offer that much. You still have to plan carefully, and it doesn't magically solve every problem for you.


That's true, but I'm not sure if using Jenkins would avoid these problems you outline. And that's really the crux of what the OP is suggesting; that Jenkins or something smaller than Kube would have been a better choice.


That's a fair point. I agree that Jenkins will not solve these problems, and in fact it comes with its own problems anyway. I was arguing on the sole point of setting up K8S.

Other than that I agree with you.


(I work for GCP)

For 95% of people, I'd say going with the managed version is the right choice.

However there are some reasons why you wouldn't use a managed service. If you need a custom build, custom drivers, etc.


Do you think GKE will support multi-cloud setups or hybrid scenarios at some point? For cost reasons we have to put some big servers off the cloud ...


And most importantly, to use across multiple clouds!


I've often used Jenkins for this use case, and really appreciate how it scales to teams too. While it works well, there are lots of pitfalls in it too, logs filling up disks, lots of configs to tweak. I think you've just gotten past those issues so it's stable for your use case.


> If we could successfully operate Kubernetes, we could build on top of Kubernetes in the future (for example, we’re currently working on a Kubernetes-based system to train machine learning models.)


Here in Chicago, I've spoken with two different, profitable companies that were migrating critical jobs to jenkins and were so far very happy with it.


I wonder if it's feasible or worthwhile for someone to try to extract the task and batch processing code from Jenkins into a separate project. Perhaps the analytics too.

With a little work you could expand that out to make a travis equivalent using the same code base.


Agree with the shiny new tool syndrome.

Also remember this is Stripe, and they like to advertise through Engineering blogs (and they do that quite well to be honest).

I'm getting cynical here, but I sometimes wonder if they didn't specifically choose a cool shiny tool so that they could speak about it (and advertise through blogging).


I always search for mentions of Hashicorp Nomad in the comments section of front-page Kubernetes articles like this. There are often few or no mentions, so I’d like to add a plug for the Hashistack.

For some reason Nomad seems to get noticeably less publicity than some of the other Hashicorp offerings like Consul, Vault, and Terraform. In my opinion Nomad is right up there with them. The documentation is excellent. I haven’t had to fix any upstream issues in about a year of development on two separate Nomad clusters. Upgrading versions live is straightforward, and I rarely find myself in a situation where I can’t accomplish something I envisioned because Nomad is missing a feature. It schedules batch jobs, cron jobs, long running services, and system services that run on every node. It has a variety of job drivers outside of Docker.

Nomad, Consul, Vault, and the Consul-aware Fabio load balancer run together to form most of what one might need for a cluster scheduler based deployment, somewhat reminiscent of the “do one thing well” Unix philosophy of composability.

Certainly it isn’t perfect, but I’d recommend it to anyone who is considering using a cluster scheduler but is apprehensive about the operational complexity of the more widely discussed options such as Kubernetes.


Being a bit of a HashiCorp fan I tried Nomad for Transloadit but at the time it did not support persistent volumes. K8s had that already. The more I started looking into k8s as an alternative, the more compelling features I discovered that Nomad did not have yet.

With the velocity of k8s it's hard to imagine how Nomad could catch up or keep up. K8s has operators, Helm, etc. That just means you can add battle-tested components off the shelf with a single command. So, less wheel-inventing and boilerplate writing to do for us.

With the backing of a so much larger community and set of entities, it also feels like I’m less likely to be the first one to discover a new bug. RedHat or Google or one of their customers will have hit and fixed it already, and my production platform keeps humming along nicely. K8s has just had more flight time and exposure to crazy environments and workloads, so more kinks are going to be ironed out.

I always did like the “do one thing right” unixy approach of Hashicorp’s toolset, and that you can pick the pieces you like. But (sadly for them) that means I can now pick Vault or Consul and run it on top of Kubernetes (re-using k8s' internal etcd is not recommended) if I wanted. I'm actually not overly sorry for them, seeing as how they're locking up more & more features behind enterprise products. I haven't checked in a while but wouldn't be surprised if they also had a Nomad Enterprise already. Nothing wrong with HashiCorp wanting to make money, but if there also is k8s without those restrictions..


I have a few production Mesos clusters under my belt and one production Nomad cluster; I really like Nomad, and Mesos is not bad.

Kubernetes seems to be a lot of magic and NIH and tries to do everything itself, whereas Mesos and Nomad are nicely composable and easy to reason about.

Nomad's biggest benefit for me is a very nice integration with Vault (and Consul), I can have Nomad ask for a container instance specific secret which Vault then goes and generates and later immediately revokes once that container dies. Maybe this is possible with Kubernetes but I have not seen anything that tight yet.

IAM instance profiles are nice, but they are instance-wide; having each container get a unique, short-lived, and properly scoped set of secrets injected at the last possible moment and immediately revoked afterwards makes me feel all warm and fuzzy inside.


> Kubernetes seems to be a lot of magic and NIH

Not heard that criticism before, what are you referring to in particular? The NIH part seems incongruous to me, since Google were a major contributor in inventing warehouse scale computing and cluster schedulers (c.f. the Borg and Omega papers, etc.).


What's your take on Fabio vs. Traefik? I had not heard of Fabio before, but they seem to support a similar featureset.


Catch 22: the lack of traction/adoption is the main point that stops me from exploring it more.

I would have to put so much effort in convincing customers and management to not go the (now almost default?) Kubernetes-route, that it's risky trying something else. A small hiccup in Nomad, would be enough for the pitchforks to come out.


I never heard of Nomad, but I can't see why I would choose it over the much more popular and standardised k8s.

The biggest benefits seem to be

(1) simplicity, but GCE and minikube are easy enough to learn in a day and

(2) ability to run non-containers, but docker containers are generic - they can run java apps just fine.


I would argue the biggest strength is maintainability. Managing and keeping up a distributed cluster with k8s is WORK. If you are not at the scale where you can dedicate full-time staff to managing only k8s, you shouldn't even be touching k8s. You need full-time staff to keep it alive.

Nomad is operationally simple, you can run it out of your normal devops roles, you don't need dedicated staff. Mostly because you can pretty easily wrap your head around what it does and how it works.

This saves you bundles of cash and time.


I don't see why - I have my GCE cluster running fine with zero maintenance work.


Zero maintenance work implies you are not doing security patches or upgrades, so as soon as you have a problem, not only will you be left holding the now-broken pieces, nobody will have any reason to help or support you, unless you pay them $$$$$$'s (and even then... maybe not).

I hope whatever you are running under k8s isn't crucial or important, and I really hope I'm not a customer of whatever you "operate".

Maintenance is real; that applies to everything if you want it to work reliably for any length of time. There are various ways to handle maintenance: do a little consistently and constantly (what most of us professionals do), or do large bulk replacements every so often (like when stuff crashes and burns and nobody can remember how to fix it, so they just replace it with whatever is new and shiny).


I second this. Setting up K8S is a walk in the park.

Upgrading it is hard, especially with stuff like Kafka/ZooKeeper running on a K8S cluster.


GCE is a hosted k8s. Google does the maintenance for you, to my understanding.


AH! sorry. I didn't realize Google started offering hosted k8s.. That def. keeps maintenance down, since Google does it for you. It's been a while since I've dug into k8s in depth.


Google, Amazon, Microsoft and IBM all offer managed kubernetes.


Cool. This definitely makes it easier to use k8s, but that's very different from running k8s. My comment(s) are geared about running k8s yourself. My systems are all on physical hardware we own, hence I don't really pay a lot of attention to the latest and spiffiest in hosted platforms.


have you ever used google hosted k8s?

I am a 1 man shop. I manage my cluster in ~10 minutes per month.


AH! sorry. I didn't realize Google started offering hosted k8s.. That def. keeps maintenance down, since Google does it for you. It's been a while since I've dug into k8s in depth.


Has to be a good year or so? Been a while.

AWS is the new one, just started a few weeks ago.


My systems are all on physical hardware we own, hence I don't really pay a lot of attention to the latest and spiffiest in hosted platforms.


You probably should :)


One huge benefit of Nomad is that it can schedule non-containers too, enabling you to fix up legacy systems incrementally.


I'm not sure how Consul is doing nowadays, but when I used it about two years ago I had nothing but issues.


Having a correct mental model of the Consul architecture, and realizing that the raft cluster (consistency) and the Consul cluster (gossip) are two separate layers, does wonders.

Additionally, in the early days there were some tools missing (like modifying the raft peer members online) that are all there now.

Running in production and very happy with it!


Setting aside the k8s content itself, I love the way this article is written. It's not a typical tutorial or tips/tricks but takes you time-traveling through the experience of a big company adopting nascent tech. Lot of great things to take away even outside of the kubernetes tips.


Julia Evans is something of a celebrity. Her personal blog is an absolute gold mine: https://jvns.ca


> “Sometimes when we do an etcd failover, the API server starts timing out requests until we restart it.”

This is likely related to a set of Kubernetes bugs [1][2] (and grpc [3]) that CoreOS is working diligently to get fixed. The first set of these, the endpoint reconciler [4], has landed in 1.9.

More work is pending on the etcd client in Kubernetes. The good news is that the client is used everywhere, so one fix and all components will benefit.

[1]: https://github.com/kubernetes/community/pull/939
[2]: https://github.com/kubernetes/kubernetes/issues/22609
[3]: https://github.com/kubernetes/kubernetes/issues/47131
[4]: https://github.com/kubernetes/kubernetes/pull/51698


I don't get this. Didn't Kubernetes come out of Google's Borg, which had been in use forever? The second time around should be more elegant and impressive -- why so many basic bugs?


Kubernetes takes some concepts from Borg. A system like Borg would be so closely coupled to Google‘s infrastructure that there’s probably very little to open-source from there without open-sourcing the entire machinery.

Also, any large-scale system like Borg developed at a large company like Facebook or Google will have a completely opinionated one-way-of-doing-things for a lot of aspects. This doesn’t work for the world outside, where lots of developers from different backgrounds and lots of projects with different requirements exist.


I think this bit from "Borg, Omega, and Kubernetes"[1] (which is an excellent read) sheds light on this:

> The Borgmaster is a monolithic component that knows the semantics of every API operation. It contains the cluster management logic such as the state machines for jobs, tasks, and machines; and it runs the Paxos-based replicated storage system used to record the master’s state.

So it sounds as though Borg includes its own storage system. As I understand, Google has a set of (very complex) libraries written in C++ that implement Paxos/Multi-Paxos[2], which they have not open sourced.

[1] https://research.google.com/pubs/pub44843.html
[2] https://research.google.com/archive/paxos_made_live.html


The concepts are based on similar ideas in Borg, but what it shares with Borg ends there.

The implementation is effectively entirely from scratch, so bugs will exist.


IIRC from one of their talks... K8s was supposed to be Borg 2.0 in many respects. They decided early on in development that it was a good tool with lots of potential, but that "fixing" Borg would be less work than replacing it. So k8s takes the Borg 2.0 concepts without including any of the Borg code.


Kubernetes didn't come with all of the other Google infrastructure.


I'm curious about what people think about HashiCorp's Nomad vs Kubernetes.

I chose Nomad because I'm already using Consul and I wanted to run raw .Net executables. Would it have been worth it to use Docker with .Net Core?

Not trying to change my infrastructure now, but just curious about whether it is worth the time to play with it on the side.


Nomad appears to be better designed, more scalable, and easier to operate than k8s, but it will fall behind pretty rapidly since k8s has 100x more developers.


That isn't necessarily true (playing devil's advocate): OpenStack had a gajillion developers and still failed (mostly).

Although k8s does seem to be designed much better. I use it personally too and hope for its success.


Such good writing style AND useful technical content. Why can't all blog posts be this good?


The author writes regularly and her posts almost always reach the top of HN. Like most skills, improvement comes with practice. If a person is willing to put in the same time and effort as jvns has, I'm sure they would be rewarded with similar results.


I haven't been at a k8s shop yet, but at my last job we used Marathon (on DC/OS). I know you can run Kubernetes on DC/OS, but the default scheduler it comes with is Marathon.

Is there an advantage to one over the other? It looks like in both cases you need a platform team (at least 2, maybe 3 people; we had a large, complex setup and had like 10) to set up things like K8s, DC/OS, or Nomad, because they are complex systems with a lot of different components... components like Flannel vs. Weave Net vs. some other container network, handling storage volumes, labels and automatic configuration of HAProxy from them (marathon-lb on DC/OS).

All schedulers (k8s, Swarm, Marathon) seem to use a JSON format for job information that's pretty specific, not only to the scheduler, but to the way other tooling is set up at your specific shop.


Why do you need a 99.99% job completion rate? Why not just design for failure and inevitable retries? It almost seems like you grant platform users a false sense of security by making it very reliable but not perfect.


My guess: because financial systems.

A lot of traditional financial instruments 1) are not resilient to failure and 2) run at fixed times in batches. I’m confident it’s not their own systems that set the requirement of rigidity.


I’ll hazard a guess that this has to do with the fact that the workload is a set of scheduled tasks.

Their customers expect the cron jobs to run when and how they expect.

With that constraint restarts look a lot less acceptable.


How are those two things different?


How do you deal with sidecar containers in CronJobs (and regular batch Jobs) not terminating correctly?

https://github.com/kubernetes/kubernetes/issues/25908


We don't run sidecar containers in cron jobs yet. That said, here's a workaround (from that issue): https://github.com/kubernetes/kubernetes/issues/25908#issuec...
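For anyone who can't follow the link, the workarounds in that thread generally boil down to the same pattern (a rough sketch, not necessarily the exact comment linked above; image names and commands are hypothetical): the main container signals completion through a shared volume, and the sidecar polls for the signal and exits. The containers/volumes portion of the job's pod spec looks roughly like:

  containers:
  - name: main
    image: registry.example.com/batch-job:latest        # hypothetical
    command: ["sh", "-c", "./run_job.sh; touch /shared/done"]
    volumeMounts:
    - name: shared
      mountPath: /shared
  - name: proxy-sidecar
    image: registry.example.com/proxy:latest             # hypothetical
    command: ["sh", "-c", "./start_proxy.sh & while [ ! -f /shared/done ]; do sleep 1; done"]
    volumeMounts:
    - name: shared
      mountPath: /shared
  volumes:
  - name: shared
    emptyDir: {}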


I'm aware of the workarounds in that thread. Just wondering if Stripe had a different workaround but I guess not.


That GitHub comment is Stripe's workaround! I copied it nearly as-is from our internal job setup boilerplate.


What is the benefit of using Kubernetes over Mesos (or in conjunction with Mesos)?


FTFA: "We’d previously been using Chronos (with Mesos) as a cron job scheduling system, but it was no longer meeting our reliability requirements and it’s mostly unmaintained (1 commit in the last 9 months, and the last time a pull request was merged was March 2016) Because Chronos is unmaintained, we decided it wasn’t worth continuing to invest in improving our existing cluster."

Though Chronos had a release recently with a bunch of fixes, Mesos is inevitably fading as a legacy platform.


> Mesos is inevitably fading as a legacy platform.

Because of Chronos? This is a bizarre thing to say. Mesos actually works extremely well. Whenever I ask the "why Kube over Mesos" question, I never get a good answer. I think it's because people just don’t know Mesos. Also, it wasn’t made by Google.


Chronos is just an example. There are many bugs in Mesos that don't get fixed for months or years. Mesos core is legacy (pre-C++11) code nobody wants to maintain.


Please don’t spread FUD when you have no clue what you are talking about. Mesos core is certainly not legacy: it has been on C++11 for a long time, is very well maintained, and has a large, active development community. In case a bug has fallen through the cracks, kindly reach out on the user mailing list and I am reasonably confident you would get a response.

Disclaimer: Apache Mesos committer/PMC


This whole thread reads like it's from n-gate.


> Mesos core is legacy (pre-C++11) code nobody wants to maintain.

This is actually very, very, VERY important. Go is a lot more concise (IMO) than C++; generally, when I'm curious about how something works in a project written in Go, it's much easier to follow the logic.


Go is nice. I like it a lot. It's very readable, it reduces the number of good ways you can do something to usually just one, it's fast, fat binaries are awesome, great concurrency primitives, etc...


There's something to be said about how the code is written. I've seen easy to read C++ and hard to read go (I found go's built in http server source code to be a tough read) and vice versa. In Mesos we strive to write very readable code and so I would hope that despite being modern C++, it is approachable for an uninitiated reader.


Or perhaps you do get good answers but you choose to forget them, because they don't quite suit your situation.


I'd be interested to hear if you have one. I'm serious, I really just want to understand better, not trying to be controversial. Also, "it's not written in go" doesn't count, although I do like go a lot.


Take a look at the Mesos releases; a lot of progress is being made and the project is well maintained. Where are you getting this misinformation?


I cannot think of a single reason to choose Mesos over Kubernetes unless you are literally Twitter.


The answer to that depends on your particular usage case and requirements. There is no simple, always-true response.


What are some examples of each?


Kubernetes very recently added native Cronjob support: https://kubernetes.io/docs/concepts/workloads/controllers/cr...

How does Stripe's approach differ?


No difference — we are using Kubernetes's native cronjob support. This post is about how we migrated to that system.
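For reference, a minimal native CronJob manifest looks roughly like this (a sketch with hypothetical names; batch/v1beta1 was the CronJob API version around the 1.8/1.9 releases current at the time):

  apiVersion: batch/v1beta1
  kind: CronJob
  metadata:
    name: nightly-report               # hypothetical
  spec:
    schedule: "0 2 * * *"
    concurrencyPolicy: Forbid          # don't start a new run while one is still going
    jobTemplate:
      spec:
        template:
          spec:
            restartPolicy: Never
            containers:
            - name: report
              image: registry.example.com/report-runner:latest   # hypothetical
              command: ["./run_report.sh"]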



