The DevOps Phenomenon: An executive crash course (acm.org)
120 points by headalgorithm on June 5, 2019 | 58 comments



Hi. DevOps person here. Was doing DevOps for about 15 years before the word was even coined. (I did my first "devops" kind of work professionally in 1995.)

DevOps isn't a tool. It isn't a team, or a person. It isn't even a culture. It's a way, for lack of a better word. Like the way of the Tao. The DevOps that can be named is not the true DevOps. Everyone is doing DevOps, and no one is doing DevOps. That doesn't mean some aren't doing it better than others.

Some years ago, I attended the first DevOps Enterprise conference, and there was a phrase everyone was using... "We're not unicorns, we're horses". The point being, your company doesn't need to be a Netflix or a Google to do DevOps successfully. Elements of transformation can be adopted by anyone. Then again, as I said on Twitter at some point, "Some companies are unicorns, some are just horses, some are just donkeys, some are just stick horses, and some are just sticks". To go on another wild analogy, DevOps transformation is kind of like alcoholism. First, you have to admit you have a problem.

The next thing is, because DevOps is basically a Taoist problem... well, you can't buy the Tao. You can't hire the Tao. So companies that try to "buy the DevOps", thinking a tool will save them, or try to "hire the DevOps", thinking a consultant will save them, are doomed to failure. This isn't to say you shouldn't go out and try to hire experienced DevOps engineers, or that they can't help you, but the reality is much more broad than a single hire. Getting a good engineer is only as good as that engineer's freedom to act.

And tools do not DevOps. They only make the DevOps easier. You can DevOps even with antique tools. Back before some of you were born, I was doing distributed deployments with version control and A/B testing with shell scripts and a SQL database. It worked.

Wow, I can rant a lot about this. But what really matters? Commitment to becoming better. Actions, not words. Automating what can be automated. Recognizing that speed reduces rather than increases risk. Remembering that it's a process, not a goal. Being patient with slow beginnings. Measuring things. Remembering that not everything that is valuable can be easily measured.


I agree with everything, except maybe:

> It isn't even a culture.

I tend to think it is a culture that recognizes and seeks after the tao. I've been part of cultures that think IaaS, CI/CD etc are varying levels of important. Some are zealots about CI/CD but then don't care at all about IaaS (they provision everything manually). I've also been part of cultures that won't do anything manually. So maybe it isn't a culture in itself, but it's definitely related to and enabled by a culture at the least.

> Getting a good engineer is only as good as that engineer's freedom to act.

That made me cheer in my brain. Absolutely true :clap:


Yes, it's a culture. And it's not a culture. It's also a toolset, and not a toolset. So I see "culture" on the same spectrum as "tools". Which is more Taoism, I suppose.

If you're doing DevOps right, you'll have good culture and good tools, for sure.


I tend to favor automated, or at least scripted provisioning... if it can be exercised quickly and easily, it's more likely to be exercised more often, meaning one can be more trusting.


This sounds nice. The reality of "DevOps" in many companies is this, right or wrong: we'll make the development team do operations / infrastructure tasks, and save on hiring real, seasoned ops folks.

The problem is the team building your product often doesn't want to do that work, even if they can...


I don't think the reason companies are moving to devops is to save on ops costs. At most companies their ops guys are cheaper than their devs. They're doing it because there was an unexploited gap between development and ops that has a lot of potential.


I've interviewed many developers and operations folks. The cheap ops folks are cheap for a reason: they can't code and therefore aren't going to be automating much of anything without a lot of effort. The good ones are just as expensive as developers, because that's what they actually are...


Now we sort of have some words that separate the two...


Well, even worse, a lot of companies I've been to are hiring a "devops team" that does nothing but involuntarily create yet another bottleneck, the very thing it was supposed to eradicate in the first place.


That falls under my "hire the devops" criticism. A "devops team" is a bad idea. This is a blurry line, because it's also a good idea - that is, hiring someone (or a team of someones) whose job it is to think about, plan, and implement the devopsy things. But it too often comes with an attitude that development doesn't need to change, and ops doesn't need to change, and can't you just build something to make the stupid stop hurting? And then you spend your day either fighting fires in the build system or surfing Facebook and wondering where you went wrong with your career... but I digress.


Devops are more expensive than dev overall.


My biggest niggle, is it doesn't have to be "perfect"... just good enough.

    * Do you need it now?
    * Is there a simpler solution that gives most of what you want?
    * Is there an iterative approach you can start with?
As much as I've appreciated being on projects with CI/CD all the way to production, etc... there's something to be said for... does it run locally? can you containerize adjacent local services (db, api, etc)? can you containerize your build and local deploy? Do you have a CI/CD pipeline for pull requests? master? Integration tests?

It doesn't all have to be done at once, it can definitely be ad-hoc and can grow as your application(s) grow. Don't start with k8s when dokku will work. Don't force everything to be ready before any features can be built.


See also “The Tao of Hashicorp” (makers of Terraform, etc.) for examples of the types of meaningful ‘under the hood’ thinking changes needed to really bake ops thinking into dev.

https://www.hashicorp.com/tao-of-hashicorp

    - Workflows, not technologies
    - Simple, Modular, Composable
    - Communicating Sequential Processes
    - Immutability
    - Versioning through Codification
    - Automation through Codification
    - Resilient Systems
    - Pragmatism


Nah, that's just marketing bullshit targeted at people who are allured by mysticism. (If I were to reductively generalize: west coast liberals who are attracted to the idea of eastern spirituality, but never actually looked into it.)

For Hashicorp to claim the tao like that, is the admission that they do not know it.


For the record, I've seen a number of attempts to "devops" that were the functional equivalent of saying "We're doing agile now, it just has to fit in the Gantt chart". Heck, I've been Gantted on devops transformation work.


I've actually done the Gantt on such a project. As terrible as it is, it's sometimes better than not having the project.


While I tend to agree, I think on a practical level it IS a culture, if culture is defined as the way a company operates and a set of principles around that. In essence the culture becomes the way, and so it's a very useful abstraction for thinking about how to influence or lead a movement towards the ways of DevOps.

Would be very interested in your feedback on an article I wrote recently dealing with this issue: https://calebfornari.com/2019/05/31/leading-a-devops-movemen...


> Was doing DevOps for about 15 years before the word was even coined.

True story, my title was Dev Ops Manager long before the word was coined because I knew a lot about both and kept bringing both groups to the table to solve problems so they made a position for me.

It would be years before I heard the term DevOps again.


> Everyone is doing DevOps, and no one is doing DevOps.

I agree with this. Most times I've interviewed for "devops" roles, what the employer was actually looking for was a sysadmin to babysit the developers (in terms of being a "dedicated resource").


These roles have been historically disastrous for me with no exception because they shared common problems that kept me from doing my job:

1. All talk and no support for automation. I want to be rich, handsome, and well-liked too but none of that happens without a lot of serious work if I’m just a pretty face with a horrible personality and refuse to admit that’s a big problem

2. Reactionary (vs responsive) culture - all the bad, none of the good of what allows agility. If there was vision and a real need to deploy more often, that would have been solved as a priority to help speed. Instead, all had been around for 5+ years cobbling together bad processes, tribal knowledge first, and burn-out of engineers related to the failing processes was the rule not exception.

3. Lack of understanding of how to grow beyond a certain engineering maturity level - ceremonies over results and design is common as a reaction. A lot of companies do “Agilefall” for example and watch quality and throughput drop like a rock. I’m not sure if there’s an equivalent portmanteau for “Devops tools and processes with none of the benefits of automation and all of the drawbacks of code”


It's funny how a term that defines a culture is now used to describe a role. If you know how to manage cloud stacks, provisioning, CI/CD pipelines, etc then you are a "DevOps engineer". I personally don't like that. It's like calling modern developers "Agile Engineers" or something like that. And now that "DataOps" movement is gaining traction, I've already started to see "DataOps Engineer" posted on job boards.


"DevOps" == "sysadmin who can program a bit".

This in contrast to "sysadmin who can configure your Cisco and your Outlook".

The difference is crucial in data centers where you need automation for sysadmin tasks. (I.e., all of them.)


Related tangent: "Full-Stack" engineers -- which stack? If it doesn't include the network layer, it ain't "full" in my book.


lol. It's just MVC framework + Javascript framework. Basically what a software web developer should know.


It's really about day to day decisions being "pipeline driven" - allowing automated pipelines to make continuous tactical decisions that humans usually make in "traditional" processes. When to merge, is it secure?, when to deploy, is it up? when to spin up an environment? when to rollback, when to enable a flag, when to declare "all OK" etc.

This also means, not just dev and ops skills. It's also testing, also security, also compliance. "pipeline driven" is what we are after. And this of course enables true continuous delivery.
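A minimal sketch of what "pipeline driven" could look like, assuming a change is described by a plain dictionary of facts and that each concern (testing, security, review) is a small gate function. The gate names and dictionary keys are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A gate inspects the facts gathered about a change and returns True to let it pass.
Gate = Callable[[Dict[str, object]], bool]


@dataclass(frozen=True)
class Decision:
    action: str                    # "deploy" or "hold"
    failed_gates: Tuple[str, ...]  # names of the gates that blocked the change


def decide(change: Dict[str, object], gates: List[Tuple[str, Gate]]) -> Decision:
    """Run every gate and deploy only if none of them fails."""
    failed = tuple(name for name, gate in gates if not gate(change))
    return Decision("deploy" if not failed else "hold", failed)


# Hypothetical gates; missing facts default to "fail" so nothing passes by omission.
GATES: List[Tuple[str, Gate]] = [
    ("tests_pass", lambda c: c.get("tests_failed", 1) == 0),
    ("no_critical_vulns", lambda c: c.get("critical_vulns", 1) == 0),
    ("reviewed", lambda c: bool(c.get("approved_by"))),
]

good = {"tests_failed": 0, "critical_vulns": 0, "approved_by": "alice"}
bad = {"tests_failed": 0, "critical_vulns": 2, "approved_by": "alice"}

print(decide(good, GATES))
print(decide(bad, GATES))
```

The same shape extends to rollback or flag decisions: add a gate, and the pipeline makes the call a human used to make.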

"DevOps" is another Silo. Which is why we also have "DevSecOps" and "TestOps" etc...

I'm in the process of writing a new book about this new-old idea: https://leanpub.com/pipelinedriven/


The last five years or so I've been pigeonholed into this "devops" space. Everything about it is a train wreck. I am currently seriously considering going back to a plain old "feature development" role, at least while "DevOps" quiets down.

Calling it "devops" has turned out to be a disservice. Teams are either made up of devs who don't know the sysadmin space and thus think the solution to every problem is "Write a new program in $LANGUAGE_OF_THE_MONTH", and balk when you tell them "Ok, but... did you consider this thing that's been around 25 years?", or it's made up of what most companies had for "ops" people -- button-clickers who can follow a checklist like "Open Control Panel, double-click 'Add and Remove Programs...'" but are beyond hopeless in front of command line.

In the wild, it's exceedingly rare to find people who can bridge both worlds reasonably. I appreciate that Google was able to formulate a functional SRE team, but... most companies aren't Google.

Virtually everything that could've gone wrong as devops hit mainstream has gone wrong. Docker is all wrong. Kubernetes is a massive barf-fest. The obsession with "cloud everything" is not only a grotesque waste of money, it's insecure from first principles, as the emergence of practical speculative execution attacks has clearly demonstrated.

crazypills.gif


> In the wild, it's exceedingly rare to find people who can bridge both worlds reasonably. I appreciate that Google was able to formulate a functional SRE team, but... most companies aren't Google.

I agree with this comment. Personally, I got my start doing sys admin work, and after doing it for years I became pretty decent at writing scripts in Python and Ruby, and I picked up some Java and JavaScript along the way.

In my experience, a lot of people who don't have a lot of experience tend to fall in one camp or the other: either they're operations people who can't code, or coders who don't follow best practices when it comes to operational work.

But here's the problem that I keep seeing:

Management, the ones writing the checks and doing the hiring, all they know about DevOps is what they've read in books and seen in seminars. Due to this, they have a bad habit of dismissing people who don't speak the DevOps "language."

For instance, I did a job interview a few months back, and everything in the job description was in my wheelhouse. I was a great fit. But the hiring manager kept trying to coerce me into talking about my 'vision for DevOps.' Clearly, he had read a book or attended a seminar, and he wanted me to have some type of religious experience with him.

But that's not my thing. I am too busy actually doing the work to read a 200 page book about mission statements.


Anyone else run into SOX problems where audit won't let DevOps staff make changes to meaningful source code? How do you get around that? Seems like an insurmountable barrier to proper devops.


Sure. So for us IT folk, SOX boils down into 2 buckets (if you'll allow me to oversimplify a bit):

1. Separation of duties

2. Process (and well, documentation, but let's say that's a part of process.)

Separation of duties boils down into needing separate people to develop, review, and deploy a piece of code. Okay, let's look at this from compliance's PoV.

Compliance team just wants an easy way to show and enforce "yes, a separate individual performed each of these duties." Preferably in a way that non-technical people can understand. Easiest way to do this? RBAC. If you are a developer, you simply don't have permission to login to production. If you're ops, you don't have permission to contribute. Clean, easy enforcement, clear lines drawn in the sand, easy reporting to auditors. It also has a chilling effect on the devops mindset. Which seems to be the issue you are raising.

How else can you satisfy your compliance team, follow regulations, and still practice devops? Well, instead of enforcing at the role level, enforce at the individual level.

A lot of VCS software offer the ability to disallow branch merges without a secondary approver. This can be your separation between development and review. Then for implementation in production, you can automate this piece. Think of a git commit, enforced code review, and then k8s automatically deploying your change from the merge to master. This allows your dev and ops team to work on tasks together.

We use Jira extensively (other tools work too), and certain important systems are tagged as SOX for a specific year. When an auditor asks "show me all changes to financially important systems", it's easy to pull up that year's changes, and then show off the 2 people signing off. We also have auto deploy (not k8s... yet) that is easy to show "hey look, this code was definitely not tampered with on its way to production."
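To make the individual-level enforcement concrete, here is a rough sketch, assuming each change record carries its author, reviewer, and deployer. The `Change` fields, the robot account name, and the sample tickets are all hypothetical; a real setup would pull these from the ticketing and VCS systems rather than hard-code them.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Change:
    ticket: str
    year: int
    author: str
    reviewer: str
    deployed_by: str
    sox: bool  # True when the affected system is tagged as SOX-relevant


def is_compliant(change: Change, robots: FrozenSet[str]) -> bool:
    """Two different people wrote and reviewed it, and a robot deployed it."""
    return (
        bool(change.reviewer)
        and change.author != change.reviewer
        and change.deployed_by in robots
    )


def audit_report(
    changes: List[Change], year: int, robots: FrozenSet[str]
) -> List[Tuple[str, str, str, bool]]:
    """List every SOX-tagged change for a year with its sign-offs and verdict."""
    return [
        (c.ticket, c.author, c.reviewer, is_compliant(c, robots))
        for c in changes
        if c.sox and c.year == year
    ]


ROBOTS = frozenset({"auto-deploy"})  # hypothetical deploy account
CHANGES = [
    Change("FIN-1", 2019, "alice", "bob", "auto-deploy", sox=True),
    Change("FIN-2", 2019, "carol", "carol", "auto-deploy", sox=True),   # self-reviewed
    Change("FIN-3", 2019, "dave", "erin", "dave", sox=True),            # manual deploy
    Change("WEB-9", 2019, "frank", "frank", "frank", sox=False),        # out of scope
]

for row in audit_report(CHANGES, 2019, ROBOTS):
    print(row)
```

The point of the sketch is that the rule is about individuals per change, not about which team someone sits on.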

Getting your compliance team on board is difficult. It requires team work because if you are subject to SOX, you are possibly subject to other regulations that touch your processes. Like PCI-DSS. Or SOC. If your dev or ops teams can pitch a process change in a way that's easy to digest, document, and audit, then they will likely be on board.


My sibling chomps reply gets it.

You need to explain "peer review" or "merge requests" functionality from github, gitlab, what have you.

Basically, you have:

- An Author
- Tests
- Reviewer(s)
- Audit
- Robots
- Protective Monitoring

You should be able to explain that your average developer cannot bypass this, offering you greater assurance than the paper driven process before, and then poke holes in the current process, such as that people deploying manually can change the deployment after it's signed off.

You can also mention that working in this way allows you to alert and triage on all assumed roles that aren't a robot account, meaning nothing that hasn't been reviewed should happen on your production cloud account, for example.
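As a sketch of that alerting idea, assuming the cloud account's audit log can be read as a list of dictionaries with `action` and `principal` keys (both key names, the `assume_role` value, and the robot account name are assumptions for illustration, not any provider's actual schema):

```python
from typing import Dict, FrozenSet, Iterable, List

ROBOT_ACCOUNTS: FrozenSet[str] = frozenset({"ci-deployer"})  # hypothetical


def human_role_assumptions(
    events: Iterable[Dict[str, str]],
    robots: FrozenSet[str] = ROBOT_ACCOUNTS,
) -> List[Dict[str, str]]:
    """Return every role assumption that was not made by a known robot account."""
    return [
        e for e in events
        if e.get("action") == "assume_role" and e.get("principal") not in robots
    ]


events = [
    {"action": "assume_role", "principal": "ci-deployer", "role": "prod-deploy"},
    {"action": "assume_role", "principal": "alice", "role": "prod-admin"},
    {"action": "list_buckets", "principal": "bob", "role": "read-only"},
]

for event in human_role_assumptions(events):
    print("triage:", event["principal"], "assumed", event["role"])
```

Anything this returns is, by construction, something that did not go through the reviewed pipeline and deserves a look.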

Depending on the InfoSec team you might need to present your own risks, such as the CD environment, robot account setup, how they assume roles, the administrators, etc.

I've been explaining this stuff to old school InfoSec teams for around 5 years now and have got pretty good at it. As long as they're actually interested in the risks, rather than just exerting control, you should be able to explain it to them.


SOX section 404 does not mean ops and dev can’t write code or deploy to prod respectively. It means there should be clear roles and oversight and insight into authoritative sources. If there’s an incident in prod that requires code changes that ops knows how to fix, it doesn’t mean to wake up developers while waiting for a new build either. As long as it’s auditable, defensible to support the business, and there’s clear oversight it’s ok. For example, you may be able to temporarily assume a limited access role after recording that you have a business need to perform an action, and the process can notify another party that this is taking place and provide temporary, time-sensitive credentials. Most shops that are bad or less mature at security processes don’t get this far and have code czars and prod czars.
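A minimal sketch of that kind of temporary access flow, assuming an in-memory audit log and a `notify` callback standing in for whatever paging or chat system a shop actually uses. The function name, the 30-minute lifetime, and the sample user, role, and incident reference are all hypothetical.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Grant:
    user: str
    role: str
    token: str
    expires_at: datetime

    def is_valid(self, now: datetime) -> bool:
        """The credential stops working once its lifetime has elapsed."""
        return now < self.expires_at


def request_access(
    user: str,
    role: str,
    reason: str,
    notify: Callable[[str], None],
    audit_log: List[Tuple[str, str, str, datetime]],
    now: datetime,
    ttl: timedelta = timedelta(minutes=30),  # assumed lifetime
) -> Grant:
    """Record the business need, tell a second party, then issue a short-lived token."""
    reason = reason.strip()
    if not reason:
        raise ValueError("a business reason is required")
    grant = Grant(user, role, secrets.token_hex(16), now + ttl)
    audit_log.append((user, role, reason, grant.expires_at))  # auditable record
    notify(f"{user} assumed {role} until {grant.expires_at.isoformat()}: {reason}")
    return grant


log: List[Tuple[str, str, str, datetime]] = []
messages: List[str] = []
start = datetime(2019, 6, 5, 3, 0, tzinfo=timezone.utc)
grant = request_access("dana", "prod-hotfix", "INC-1: fix failing job",
                       messages.append, log, now=start)
print(messages[0])
```

Nobody is permanently a "prod czar" here: access exists only alongside a recorded reason, a notification, and an expiry.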


Yes - the compliance team needs to be on board with the goal. It's not an insurmountable barrier unless they choose to make it one. In my current workplace, the compliance team spells out what the rules are and provides suggestions on how to achieve those rules.


Compliance is mostly achieved by having ops people who have credentials to do stuff on prod server. They don't do dev, they do 2nd line support and they work with devs on new releases. Prod and acceptance changes are always 4 eyes. Though if no config changes then it is 0 eyes, because automated deploys, and ops guy just pushes button. So team is devops and those ops people are in the team having tasks on delivering releases.

So whole devops is just working together not just throwing stuff over the fence. It works great with QA, security, ops. It is all about people willing to work with each other. No amount of tools will substitute that.


"DevOps" is a stupid buzzword.

Let's use real words that tell the listener or reader what we're talking about.


“Make your devs be on call without paying them”


> “Make your devs be on call without paying them”

1) let's fire all the DBAs

2) let's fire half the operations team, and outsource the rest

3) let's have the devs do the operational work

What could go wrong?!


That might be true for some yes. But for others it might be "We are used to Ops patching machines, doing deployments after lengthy handovers etc and now Dev is saying that they can spin up everything they need in the Cloud and wants to deploy every week using a Gitlab. What now?".

Of course Ops is still needed but what they do has probably changed a lot in this case. And their responsibilities and how they work with Dev must change.


Depending on on-call crisis management is pretty much the opposite of DevOps. DevOps isn't about being on-call; it's about making on-call unnecessary.


This sounds like pure fantasy. How are you going to run a 24/7 operation if nobody is on call?


And what's wrong with fantasy?

More correctly, it's something of an unattainable goal. Can we get rid of on-call? Probably not. Can we drastically reduce the frequency and scope of on-call actions? Absolutely! Can we get to where every single time the on-call person has to take action, there's a post-mortem to understand what went wrong, and fix the automation and monitoring so it doesn't happen again?


Not 'nobody on call', just 'unnecessary on call'.

Have you ever gotten paged for a transient error that went away by the time you checked it? Or paged for the same error 5 weeks in a row? These, and really most pages, are fixed by DevOps.

In a DevOps system, a feedback loop is used to address all alerts as bugs to be immediately and permanently fixed. After a while, if something breaks, it's because someone just changed something, so it's happening during working hours. Alerts pop up in slack and are acknowledged before a page is sent out. So nobody is getting called.

If your infrastructure is ephemeral and managed as code, you use CI/CD to deploy all changes, and you aren't resource-constrained (at this point, cloud-native infra is only constrained by budget) you shouldn't have stuff crashing randomly at 3am, so there should be very few pages.
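The feedback loop described above could be sketched roughly like this, assuming three hypothetical rules: transient alerts never page, an alert acknowledged in chat within a grace period never pages, and any alert seen more than once gets a bug filed so it is fixed permanently. The class name and thresholds are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class AlertRouter:
    ack_window_s: int = 300    # assumed grace period for a chat acknowledgment
    repeat_threshold: int = 2  # assumed number of sightings before filing a bug
    seen: Counter = field(default_factory=Counter)
    bugs: Set[str] = field(default_factory=set)

    def handle(self, name: str, still_firing: bool, acked_after_s: Optional[int]) -> str:
        """Return what happened to the alert: 'resolved', 'acked', or 'paged'."""
        self.seen[name] += 1
        if self.seen[name] >= self.repeat_threshold:
            self.bugs.add(name)  # recurring alerts are treated as bugs to fix
        if not still_firing:
            return "resolved"    # transient error: nobody gets woken up
        if acked_after_s is not None and acked_after_s <= self.ack_window_s:
            return "acked"       # someone picked it up in chat before the page
        return "paged"


router = AlertRouter()
print(router.handle("disk_full", still_firing=False, acked_after_s=None))
print(router.handle("disk_full", still_firing=True, acked_after_s=60))
print(router.handle("api_5xx", still_firing=True, acked_after_s=None))
print(sorted(router.bugs))
```

Only the last branch wakes anyone up, and the bug list is what drives the page count down over time.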


> you aren't resource-constrained (at this point, cloud-native infra is only constrained by budget)

Isn't "budget" just another type of resource?


So, Workflowprocessintegrateteamsqualitygateintegrationdeploymentoperationalbestpracticecontinuousfeedbackloop ?

Anyone have the German translation?


WPQL.

That's the Americanized acronym of the original German word.


I've been active in the DevOps space since before it was called DevOps. I got to watch as things like the release of the Phoenix Project book sparked a frenzy amongst managers in Silicon Valley. What I think many people miss is that DevOps should actually apply to the entire company.

It's a culture and a way of doing things that goes far far beyond just automating routine tasks. The automation can save time and help with the transformation but it's not the main ingredient so to speak.

As a consultant in the space I have to explain this to clients sometimes and they don't always want to hear it. I believe this is because many people don't know how to influence the culture of their company or team and so it seems far easier to just implement some tool than it is to actually change the culture in a meaningful way.

I actually wrote an article on this subject recently that some may find useful: https://calebfornari.com/2019/05/31/leading-a-devops-movemen...


This really is an executive summary: glossing over the 3+ years it will take for digital transformation to bear fruit, along with the huge budget hits from unexpected expenditures. And it's not really hammered home that the executive level must be 100% invested in the transformation, or every other level will drag its feet, defeating most of the benefits.

There also seem to be some misconceptions here, like the lack of a definition for DevOps (not true), that continuous integration is "continuously testing software" (not really), confusing the definitions of continuous delivery and continuous deployment, assuming 'measurement' only refers to components of systems and not the actual development of the systems, and the idea that you need automation to do DevOps (you don't).


I think the best introduction for DevOps for people completely new is still the Phoenix Project. It is an easy read and I think accessible for people outside of Dev or Ops.


This thing shatters every time developers are told that they have to learn how to make operating system packages. As a developer myself, I cannot come to terms with that since OS packaging is a tool for developers and since it's easy to learn.


Huh?

I mean, what and the how and the why?

It's supposed to be a team decision. If the team decides to use packages, then at least someone has to be able to implement that into the workflow, release and deploy process. If the team decides to not do RPMs or DEBs, then they have to solve build/deploy/release some other way.

For example since Docker I haven't even touched spec files or fpm ( https://github.com/jordansissel/fpm ).


> In one case study, a company was able to provide a new software feature every 11 seconds.

The big company referred to here is certainly Amazon - this is public information that we shared a few years ago (when I worked there).


Ugh. Stop talking about automation. Hiring people to automate all the things without any idea about what needs automating is a great way to increase WIP in your system. All the original practitioners were talking about was taking Lean Manufacturing principles and applying them to software engineering. Not so difficult or crazy, just a desire to have a functioning organization that's always shipping in an aggressive and competitive market, instead of an organization undermined by tribalism and petty politics.

The way you do that isn't by "automating all the things" and "continuously all the things" and "measuring all the things". You do it by getting everybody to put their concerns out of their heads, off the whiteboards, and into code. Code is the only thing that can guarantee that everybody's concerns have been accounted for, including the concerns you didn't know about because nobody can keep 200 people and all of their concerns straight in their heads, while allowing for a release cycle short enough to stay competitive.

That's it! Everything else is execution. Scripts (automation) handle enforcement. Pipelines make sure enforcement is consistent and up-to-date. Metrics make sure your scripts are doing what you expect them to do. Dashboards help convince people that their concerns are actually being taken seriously and the engineering managers aren't just spouting voodoo. Speed is just a side-effect of people being focused on their work and not having to waste time arguing about petty details anymore.

Shell scripting has been around for more than forty years. Modern tools made it pretty, but ultimately you need leadership to make it relevant on an organizational scale.


I don't understand what you are saying more than "Just do the right thing, it should be obvious for everyone".

The target audience here is a large enterprise that has valued cost and stability for years and has separate Dev and Ops (often not in the same country) and not only follows all those standard processes but has also added a lot of their own written Word documents to the pile. There are check lists, approvals, delivery managers, product managers and all kinds of other managers involved in every release.

How do they even start with your advice?


You kind of answered your own question. Word documents aren't code. Take all these managers and get their concerns into code. That means dedicated engineers to turn non-code into code, lower management that can negotiate with stakeholders to figure out what to prioritize, and upper management to get stakeholders to play ball with lower management. Standard enterprise project management, really.


You need to have the courage as a leader of such an organization to put accountability in a place that results in delivery and provide services and culture to get it done.


There is tons of accountability. That is why you have the delivery managers, the lengthy handovers, checklists and slow processes. So nothing goes wrong and those that are accountable can be cleared from any blame.


But I don’t actually care about these things and I don’t care about succeeding or failing. I just care about having sexy projects associated with my name so I can get promoted.

(Not actually me, but that’s how people at big companies think)


This is actually an excellent strategy. Doing grunt work in the trenches is good for business but very bad for the worker.


It literally says that in the article:

"Most importantly, for DevOps to truly deliver value, it must include more than just tooling and automation—so simply purchasing and installing a solution isn't sufficient. As outlined here, DevOps includes culture, process, and technology."



