I accidentally published[1] my AWS secret key last year because I pushed an old project from college. At the time, I was very new to source control and had little idea how to distinguish what should and shouldn't be committed. I hope colleges and code boot camps cover that sort of thing nowadays. The ratio of usefulness to learning effort seems exceptionally high.
My sophomore year of high school, I was trying to write a Discord (chat platform) bot for a server I shared with friends and unknowingly included the private key in a public repo I hoped to show them. A crawler written specifically to find Discord keys picked it up and started spamming the server with images of very, very undesirable things from the far corners of the internet, at a rate of hundreds per second. Needless to say, I learned my lesson the hard way.
When I attended Hack Reactor, they did tell us not to push keys. However, since they didn't teach us git (they expected us to know it), many students still pushed them up. You would know because they'd get an email from some random company or person letting them know their secret keys had been found, and that they should enroll in or buy their services if they didn't know what they were doing. Luckily no one from my class got hosed, but others in past cohorts had.
Looks like a great methodology and good results. Looking forward to reading the paper because I've been working around the GitHub API restrictions for the same purpose.
Specifically, I'm building a SaaS (https://www.locktower.com/) for organizations (or security teams) looking to have a managed solution for detecting leaked secrets in GitHub/BitBucket/etc. I'm in the process of building an on-prem version as well. Overall, I really hope to help drive down the number of unresolved leaks that the authors found.
I wrote a tool that scans all the new commits to our Org for passwords/secrets.
Webhook > AWS API Gateway > Lambda
The Lambda uses the new(ish) Layers feature so it can use Git. I then use the truffleHog[0] library to scan for entropy/regexes inside the commit.
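For illustration, the entropy half of that scan boils down to Shannon entropy over base64-ish tokens. Here is a minimal sketch of the idea — not truffleHog's actual code; the 20-character minimum and the roughly 4.5-bit threshold are illustrative defaults:

```python
import math
import re

BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="

def shannon_entropy(data: str, charset: str) -> float:
    """Shannon entropy of the data in bits per character, over the given charset."""
    if not data:
        return 0.0
    entropy = 0.0
    for ch in set(charset):
        p = data.count(ch) / len(data)
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy

def find_suspect_strings(text: str, min_len: int = 20, threshold: float = 4.5):
    """Flag base64-looking tokens whose entropy exceeds the threshold."""
    pattern = rf"[A-Za-z0-9+/=]{{{min_len},}}"
    return [
        token
        for token in re.findall(pattern, text)
        if shannon_entropy(token, BASE64_CHARS) > threshold
    ]
```

A random 40-character secret key scores well above the threshold, while long-but-repetitive strings and ordinary prose don't get flagged.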
If something is detected, it posts to an SNS topic, which is currently subscribed to by another Lambda that posts an alert to my team and the Security team's Slack channel.
It then calls the GitHub API to make the repo private to limit the exposure.
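That last step is a single REST call — GitHub's "Update a repository" endpoint, PATCH /repos/{owner}/{repo} with `"private": true`. A stdlib sketch that builds the request (the owner/repo/token values are placeholders, and sending is left commented out since it needs a real token):

```python
import json
import urllib.request

def make_repo_private_request(owner: str, repo: str, token: str) -> urllib.request.Request:
    """Build (but don't send) the GitHub REST call that flips a repo to private."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    body = json.dumps({"private": True}).encode()
    req = urllib.request.Request(url, data=body, method="PATCH")
    req.add_header("Authorization", f"token {token}")
    req.add_header("Accept", "application/vnd.github+json")
    return req

# With a real token, sending is one more line:
# resp = urllib.request.urlopen(make_repo_private_request("my-org", "leaky-repo", token))
```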
Why not have a client-side pre-commit hook that runs truffleHog and, if successful, generates some form of file indicating it was run, then have a server-side hook checking for that file? This should be doable even with plain GitHub/etc., no?
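The client-side half of that idea could be sketched like this — the regexes and the marker filename are made up for illustration, and a real hook would invoke truffleHog itself rather than a toy scanner:

```python
#!/usr/bin/env python3
"""Minimal sketch of a client-side pre-commit scan (saved as .git/hooks/pre-commit)."""
import re
import subprocess
import sys
from pathlib import Path

# Patterns for the most obvious credential shapes; a real hook would run truffleHog.
PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key ID
    re.compile(r"-----BEGIN (?:RSA )?PRIVATE KEY-----"),  # PEM private key
]

def scan(text: str) -> list[str]:
    """Return every credential-shaped match found in the text."""
    return [m.group(0) for p in PATTERNS for m in p.finditer(text)]

def main() -> int:
    # Scan only what is about to be committed.
    staged = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = scan(staged)
    if hits:
        print("Refusing to commit; possible secrets:", hits, file=sys.stderr)
        return 1
    # Marker file a server-side check could look for, per the idea above.
    Path(".secret-scan-ok").write_text("scanned\n")
    return 0

# In an actual hook script, finish with:
# if __name__ == "__main__":
#     sys.exit(main())
```

The catch with the overall scheme is that client-side hooks are opt-in (anyone can skip them with `--no-verify` or simply not install them), so the marker file proves little on its own.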
I assume you saw the note on truffleHog in the article? The paper found it to be rather inaccurate outside of the basics (mainly AWS keys). Hopefully the authors open source their stuff.
https://github.com/zricethezav/gitleaks — plugging my own tool. You can enforce custom rules like entropy ranges plus custom regexes to get fewer false positives, similar to what is described under "Validity Filters" in this article.
Article quotes someone making this claim:
> we discovered that even if commit histories are rewritten, secrets can still be recovered…. we discovered we could recover the full contents of deleted commits from GitHub with only the commit’s SHA-1 ID.
I believe they have it - I've gotten notified in the past when I committed secrets on purpose, for test applications. I'm not 100% sure they were from GitHub, but I think they were.
There is no excuse to ever have AWS secret keys anywhere in your code or your settings.
If you are running locally, you should be using your own secret keys that are configured in your user directory with
aws configure
If you are running on anything within AWS, you should be using a role attached to your EC2 instance or Lambda, and the SDK can retrieve your keys automatically.
Unfortunately, every single third party code sample on the internet has you including the secret keys in your code.
An employee of mine once committed a keypair for our company GSuite, clearly labeled, in a Python script. I asked her to remove it from the repo, and she simply pushed a new version of the file with the keypair gone. Plus, she hadn’t configured .gitignore, so all the binaries were there too.
Exactly right.
The default behavior has to change, but that's probably going to be an uphill battle. It's easier to protect users from their own mistakes than to change years of habits, though.
Since there's no standard, presumably different people use different methods, or sometimes none at all. Beyond that, people could still make a mistake and put something in code that belongs in an env file.
You can keep a .env.default or .env.sample in your repo, but never use it directly. It should only document what the available parameters are.
Using a .env file is a bit of an anti-pattern, partly because many applications expect it and will thus be affected by it in ways you might not want, but also because passing configuration via the environment means the values are all static: the only way to change them is to restart the app. Better to have a function that can reload a real data format (JSON, YAML, INI) at run-time.
Each environment that runs your app will need its own 'env file', because every environment is slightly different and they're coupled to deployments. So I'd keep your environment stuff wherever your deployment stuff is: with your terraform/ansible/puppet/chef configs, or etcd/Consul, or an S3 bucket, or SSM, etc. Create it at deploy time, pull it into the app at run time.
[1]: https://www.dannyguo.com/blog/i-published-my-aws-secret-key-...