Many years ago I got a trial license key for something, Aspose components of some sort I think, and without thinking about it, checked it into a public GitHub repo. Well, a few days later Aspose's support sent me a nicely worded note saying that they noticed it was there and invalidated it for me. Their description and instructions were very clear about why they did it and why I shouldn't have checked it in. I thought that was very proactive and excellent customer service.
I actually had something similar happen to me last month. I accidentally published a discord API key to GitHub and within minutes I got a nice message from “Safety Jim” to my personal discord account letting me know they’ve found my key on a public repo and have gone ahead and revoked it.
I felt like a bit of a dope but it was neat to have it happen to me. Lesson learned for sure.
GitHub PM here. Glad that was a good experience! We work with ~50 partners (details in the link below) to notify them when tokens for their service are exposed in public repos, so that they can notify you.
Reminds me of how Airbnb redacts Hawaiian street addresses because they look too much like phone numbers, literally replacing them with a "phone number hidden" string in the host/guest chat.
Moral of the story: make your keys regexable without likelihood of false positives!
I spend a lot of time working with physician data. In the USA, physicians have a registration system called NPI. Apparently, NPI numbers are in the same format as some passport numbers. I know this because I started getting angry warnings about PII sharing until I got our tech team to turn them off.
And while we're at it, I think saving two chars isn't going to do much to prevent global warming, so let's just use the more readable SERVICE_{KEY} and SERVICE_PUB_{KEY} (as opposed to having to scratch your head thinking "did I call it SRV, SVC, SRVC, SRVCE, ...?")
I see this standard linked here a lot. Did anyone actually read it, though? It only helps with identifying whether a string is a secret, not the service or environment where the secret applies.
Awesome feature. Saved the day for us some months back when an AWS token was accidentally committed and pushed. (AWS itself also immediately notified us.)
Rant time: this isn’t directed at you. I am just replying to your comment because you said something that triggered me.
Also the “you” below is the generic you - not you personally.
Disclaimer: I work at AWS in Professional Services, all rants are my own.
Now with that out of the way, I hate the fact that there are way too many code samples floating around on the internet that have you explicitly put your access key and secret key in the initialization code for the AWS SDK.
Even if you put the access keys in a separate config file in your repo, this is wrong, unnecessary, and can easily lead to checking credentials in.
When all they have to do is
s3 = boto3.resource('s3')
All of the SDKs will automatically find your credentials locally in the ~/.aws/credentials file in your home directory, which is created when you run “aws configure”.
But really, you shouldn’t do that, you should use temporary access keys.
When you do get ready to run on AWS, the SDK will automatically get the credentials from the attached role.
Even when I’m integrating AWS with Azure DevOps, Microsoft provides a separate secure store that you can attach to your pipeline for your AWS credentials.
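To make that concrete, here's a minimal sketch (my own illustration, not an official AWS sample) of the no-hardcoded-keys version, assuming “aws configure” has been run locally or an IAM role is attached when running on AWS:

import boto3

# No keys in code: boto3 walks its default credential chain
# (env vars, ~/.aws/credentials, or the attached IAM role).
s3 = boto3.resource('s3')

# If you want short-lived credentials instead of long-lived keys,
# mint them from STS and build a session around them.
sts = boto3.client('sts')
creds = sts.get_session_token(DurationSeconds=3600)['Credentials']
session = boto3.Session(
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken'],
)
s3_temp = session.resource('s3')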
Hindsight is 20/20, but definitely one of those places where flat out giving the credentials should not even be an option (or it should be made artificially tedious and/or explicitly clear that it’s a bad idea by e.g. naming the param _this_is_a_bad_idea_use_credentials_file_instead_secret_key or so). Of course there are always edge cases in the vein of running notebooks in containers (probably not an optimal example, but some edge case like that) where you might need the escape hatch of embedding the credentials straight to the code.
But yeah, if the wrong thing is easier or more straightforward than the right way, people tend to follow it when they have a deadline to meet. To end on a positive note, at least cli v2 makes bootstrapping the credentials to a workstation a tad easier!
I remember a Rust AWS library worked like you describe (An old version of rusoto, I think, deprecated now).
I wasn't familiar with how AWS credentials are usually managed so I was very confused why I had to make my own struct and implement the `CredentialSource` trait on it. It felt like I was missing something... because I was. You're not supposed to enter the credentials directly, you're supposed to use the built-in EnvCredentialSource or whatever.
> at least cli v2 makes bootstrapping the credentials to a workstation a tad easier!
I know I should know this seeing that I work in ProServe at AWS, but what do you mean?
I’m going to say there is never a use case for embedding credentials, just so I can invoke Cunningham’s Law on purpose.
But when I need to test something in Docker locally I do
docker run -e AWS_ACCESS_KEY_ID=<your_access_key> -e AWS_SECRET_ACCESS_KEY=<your_secret_key> -e AWS_DEFAULT_REGION=<aws_region> <docker_image_name>
And since you should be using temporary access keys anyway that you can copy and paste from your standard Control Tower interface, it’s easy to pass those environment variables to your container.
I meant the aws configure import which they added — point it to the credentials csv and the cli handles adding the entry to the credentials file.
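For anyone curious, the command is along these lines (CLI v2 only; check the aws configure import help output for the exact flags in your version):

aws configure import --csv file://credentials.csv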
Sometimes you might need to use stuff that for some reason fails to use the env vars. I think I’ve bumped into things that read S3 via self-rolled HTTP calls. Dunno if it was to save on having boto as a dependency, but those things are usually so straightforwardly engineered that there's no logic for figuring out the other, smarter ways to handle the keys. Here are the parameter slots, enter keys to continue.
> I hate the fact that there are way too many code samples floating around on the internet that have you explicitly put your access key and secret key in the initialization code for the AWS SDK.
See, I thought that was a big strength of a lot of the AWS documentation over Google Cloud.
An AWS example for, say, S3 would show you where to insert the secrets, and it would work.
The Google Cloud Storage examples, though? It didn't seem to have occurred to them that someone reading "how to create bucket example" might not have their credentials set up.
And when the example didn't work - well, it was like the auth documentation was written by a completely different team, and they'd never considered a developer might simply want to access their own account. Instead the documentation was a mess of complicated-ass use cases like your users granting your application access to their google account; sign-in-with-google for your mobile app; and so on.
Google's documentation is better than it once was - but I've always wondered how much of the dominance of AWS arose from the fact their example code actually worked.
> See, I thought that was a big strength of a lot of the AWS documentation over Google Cloud.
Just to clarify, I’ve never seen a code sample published by AWS that has you explicitly specifying your credentials. (Now I await 15 replies showing me samples hosted on Amazon)
For Java they used to demonstrate putting a .properties file in among your source code [1] although admittedly not literally hardcoding a string. The PHP examples suggested putting your code into a config php include [2] (although they did also suggest putting them in your home directory).
But I can't overstate how important it was that the AWS getting started guides said "Go to this URL, copy these values into this file" while Google's examples and getting started guides... didn't.
I deal with this by having a directory in my development tree, named "doNotCheckThisIntoSourceControl", and I add a wildcard for it to my global .gitignore.
I'll put things like server secrets and whatnot in there.
Of course, I need to make sure the local directory is backed up, on this end, since it is not stored in git.
I am serious. If there is a better way, I'd use it.
Remember that I don't do online/server-based stuff. Most of my projects are for full compilation/linking, and rendering into host-executable, binary apps. There's a bunch of stuff in my development process that never needs to see a server.
A super simple way is to have a script in your home directory - far away from your repos - that sets environment variables that you read in your configuration.
[UPDATE] I ended up doing something even simpler. I have issues with running scripts during the build process, unless really necessary (I have done it, and will, again).
Since this is Xcode, I simply needed to store the file in a directory (still with the globally ignored name) far outside my dev tree, and drag the file into the IDE.
100% agree. We always keep all tokens (not just AWS secret keys) in a separate file that is never checked into the repo and are passed into the CloudFormation template at deployment. (The error in this case was a new repo hastily pushed and .gitignore wasn't properly updated to exclude the file with the keys.) But we've since switched to using AWS Secrets which is a much better solution.
Yeah that’s not good either. Your keys never need to be in a local file. Just put them in Parameter Store/Secrets Manager and you can reference those values in CF.
You could set up something like https://github.com/godaddy/tartufo in a pre-commit hook. Not sure if github has a way to hook into the push hooks on server side, they might though.
Yeah, the issue with pre-commit hooks is you have to remember to set them up client-side. I tend to push to GitHub through a gitolite mirror, though, so I could probably put this in the hooks in my gitolite middlebox.
Pre-commit hooks can't be automatically set up on the client side. If they could, this would mean that any repo you clone could run arbitrary code on your machine.
It can be as simple as a script you have to run once, but it can't be automatic. Which also means you can't really trust contributors to do it, even if they're well-meaning some will forget.
Hm - this would work better if keys were easy to scan with regular expressions.
Next time I implement API keys I wonder if it's worth going out of my way to make them easy to identify, e.g. by prefixing every key with a few well-known characters, like FMLA_xxxxx for a Fastmail app key.
If you go make an API key in Fastmail (Settings -> Password & Security -> API tokens), you'll see that it's prefixed very similarly to that (e.g. `fmo1-`) for this very reason! (There are some other neat things about our API key format I'd be happy to tell you about sometime if you're interested.)
Some services also use prefixes to provide additional context like account type and token validity length. I think Slack does this (service accounts have different prefixes than user accounts and I think temporary tokens have another prefix)
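As a toy illustration of why a prefix makes keys regexable with few false positives (the fmo1- prefix is real per the comment above, but the token length and character set here are my guesses, not Fastmail's documented format):

import re

# A prefixed token like "fmo1-<payload>" is trivially matchable,
# unlike a bare 32-character hex blob that collides with hashes, IDs, etc.
TOKEN_RE = re.compile(r'\bfmo1-[A-Za-z0-9_-]{16,}\b')

# "some_config.py" is just a placeholder file for the sake of the example.
with open('some_config.py') as f:
    for lineno, line in enumerate(f, start=1):
        if TOKEN_RE.search(line):
            print(f'possible leaked token on line {lineno}')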
I had a couple questions, as this feature is awesome!
How long does it take to get the response vs external bots pulling the data? What mechanisms does GitHub have in place to stop bots who monitor repo changes? I ask, as I have been there and it is super scary how fast someone/bot pulls repo data changes, as in minutes, and the repo we had back then was not popular.
As long as search results can be sorted by date, anyone can see updates pretty much instantly if they monitor the search results. The repos don't have to be popular for that. Bots can just check such a feed every few seconds for example.
What are the thoughts around capabilities like this for private/enterprise customers? Is the code available in an action that could be connected to private runners perhaps?
You can definitely use pre-commit hooks for this, like ggshield's https://github.com/GitGuardian/ggshield - remediation is far quicker when the secret doesn't make it into the codebase!
You cannot block what someone commits (they can block it themselves with tools like gitleaks invoked on a pre-commit hook) so the only thing you can do as a 3rd party is to scan and react when you do notice a secret published.
GitHub certainly could block push requests, at least git itself can via hooks, there are a number of hooks invoked by git-receive-pack that can influence what it does.
But the commit still exists locally (since git is decentralized) so you now end up with a weird state that you have code you cannot push to origin. Definitely not a desirable feature.
> at least git itself can via hooks
I already said that:
> they can block it themselves with tools like gitleaks invoked on a pre-commit hook
The problem with git hooks is that they're not cloned with the repo. So you're reliant on the user installing those git hooks locally (sure, some repos will have helper scripts to install the hooks for you. But you're still reliant on the user running that script).
> code you cannot push to origin. Definitely not a desirable feature.
If there is data that should never be pushed to origin, then it is a highly desirable feature that the server block pushes that include that private data.
> The problem with git hooks
I was talking about GitHub's own git hooks that run on their servers, not about any local ones.
> is that they're not cloned with the repo.
It would be a terrible security issue if they were automatically enabled after cloning.
> If there is data that should never be pushed to origin, then it is a highly desirable feature that the server block pushes that include that private data.
It’s already too late by that point because your secrets have already left the building. You’re now relying on upstream being honourable.
> I was talking about GitHub's own git hooks that run on their servers, not about any local ones.
There’s no such thing. You can have CI tooling like GitHub Actions, but they’re a different beast to git hooks
> It would be a terrible security issue if they were automatically enabled after cloning.
It doesn’t have to be either/or. There are ways of having a sensible compromise. Like a git config that enables hooks from known safe origins. Or having the user prompted whether they want to install git hooks upon cloning.
True, but it is better than the secrets becoming entirely public, automated bots could be harvesting them and exploiting the resources they protect.
> There’s no such thing.
I would be surprised to hear that GitHub doesn't actually run git on their servers. If they receive git pushes using git, then their own git hooks are involved, ones that GitHub has written for their own purposes. They could simply add one to block bad pushes.
> Like a git config that enables hooks from known safe origins.
That sounds a bit terrifying to me, but I'm not of the GitHub generation.
> Or having the user prompted whether they want to install git hooks upon cloning.
That sounds like it would enable phishing-like attacks and people just clicking "yeah sure" without verifying the safety of the hook.
> True, but it is better than the secrets becoming entirely public, automated bots could be harvesting them and exploiting the resources they protect.
True. And some popular repos do already run into this problem. So it’s not a theoretical problem.
> I would be surprised to here that GitHub doesn't actually run git on their servers.
They’ve documented about how their backend works so there’s no need to speculate. They run an implementation of git but not the standard git CLI.
> If they receive git pushes using git, then own git hooks are involved, ones that GitHub has written for their own purposes. They could simply add one to block bad pushes.
They have their automation, GitHub Actions.
Sure they “could” also implement what you’ve described but it’s not how it currently works. So a pointless argument since we could be here all year discussing the literal infinity of different things Github “could” do in theory but that their infrastructure doesn’t currently support.
> That sounds a bit terrifying to me, but I'm not of the GitHub generation.
What I posted has literally nothing to do with GitHub. In fact if your origin is private git server (as I started out using git, since GitHub didn’t exist back then) then it’s even easier to designate a trusted origin. This approach makes total sense for businesses. Doesn’t work so well for open source but it’s just one option of many.
> That sounds like it would enable phishing-like attacks and people just clicking "yeah sure" without verifying the safety of the hook.
Potentially yes. But if you’re cloning a git repo, making code changes and then committing it back, you’d hope that individual is competent enough to audit the git hook. At the very least, they’ll be running the build scripts locally to unit test their changes, so it’s not like that phishing attack isn’t already present. Feels very much like you’re looking for reasons to dismiss any suggestions here rather than have an intelligent discussion.
Great that they finally do that. I accidentally checked one into a public GitHub repo a long time ago and about 2 years later someone found it. The infinite spam wasn't even the worst part of it: the bajillion emojis in every message caused the Discord client to crash instantly upon opening, so I couldn't even figure out what was happening at first.
I’ve had this happen to me too! Not even a second after pushing to GitHub, I received a message about publicizing my auth key. It was amazing, and I’m sure this saves a lot of keys from being stolen from people just getting into programming.
In fact, you can apply as a GitHub "secret scanning partner" to have your own secrets' format (regexp) be a part of this secret scanning, with a webhook to your servers whenever they find one, so that you can do the credential invalidation on your own backend + send the kindly-worded email from your own domain.
Mind you, your secrets need to have a distinctive format in order for this to work. Probably a distinctive prefix is enough.
An Unethical Life Pro-Tip (that the word is already out on anyway, so I don't feel too bad):
• For about $500, you can use BigQuery to extract all matches of a particular regexp, from every file, in every commit, in every public Github repo.
Whether or not Github themselves use this to power their secret scanning, arbitrary third parties (benevolent or not) certainly can use it for such. And likely already do.
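For a sense of what that looks like in practice, the query is something along these lines (a sketch against the public bigquery-public-data.github_repos dataset; the acme_live_ key pattern is made up, the table and column names are from memory, and scanning the full contents table rather than the sample is what drives the bill into the hundreds of dollars):

from google.cloud import bigquery

client = bigquery.Client()

# Look for strings shaped like a hypothetical "acme_live_..." API key
# in the sampled file contents of public GitHub repos.
query = """
    SELECT sample_repo_name, sample_path
    FROM `bigquery-public-data.github_repos.sample_contents`
    WHERE REGEXP_CONTAINS(content, r'acme_live_[A-Za-z0-9]{24}')
    LIMIT 100
"""
for row in client.query(query):
    print(row.sample_repo_name, row.sample_path)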
Makes sense; but doesn't help the companies who aren't aware of the secret-scanning service / the ability to become a secret-scanning partner. If you have your own little API SaaS with its own API-key format, then you've probably got API keys exposed in the Github dataset; and someone's probably already found and extracted them. (It happened to us!)
Mind you, the Github dataset isn't the leak itself; the leak is the public repo that the user pushed their key to. The dataset just makes such searches scalable / cost-effective to third parties who aren't already indexing Github for some other reason.
I worked for a big startup last year and was on a contract deadline for integrating a vendor framework into a React Native app.
It was taking too long to get a new temp demo license key and GitHub search with clever filters helped me track down a demo key that was recently uploaded to a test repo.
Hah. Yeah. Found a bunch of ssh keys, passwords, etc for Comcast years back which turned into a shitshow when I tried to report it. Once I found the right people to talk to things got better, but the entire experience was really reflective of how bad large orgs are with security.
A friend once told me he was having a hard time getting a client to take his security concerns seriously. So I went on github and found a commit in their repo that included a production password and sent it to him. Maybe took 5-10 minutes to find? Apparently once they found out about the commit, they panicked a bit and started taking his concerns more seriously.
Old school one when I was a security consultant for a bit (pre-automated pentest scammers). Medium size regulated fintech. Domain admin passwords and admin accounts were stuck on post it notes on a board in the machine room. If you went over the road to the college, asked to use the toilet, which they seemed fine with, and poked your 200mm lens out of the bathroom window you could snap them all.
Don't assume that level of competence improved with addition of technology.
Heh, sometimes, sure. In a separate comment I mention a company with whiteboard passwords. What I didn't mention is that they had a glass wall that you could look into from a well-traffick'd hallway. One of the larger companies that worked at the office (not any longer) rhymes with loinbase.
Also, I no-joke heard of a company that absolutely, unironically, did the webcam thing with RSA tokens.
Did some consulting for an org that did managed IT and found that they wrote on a white board all of their passwords. Wrote them an email basically telling them "hey maybe you should erase that". May or may not have billed them for the time it took to write that email.
They put a piece of paper over the passwords in response.
Yikes. It is sad to hear stories like that, where security is not a concern until panic sets in. :(
Yet another reason we need to adopt standards like security.txt and make it easy to report these things as it is to tell robots to ignore us with robots.txt. See securitytxt.org for more on the project.
It's tough. I'm on our public security reporting email list.
We get a lot of things that boil down to "When I go to your website, I am able to see the content of your html files!" ... yes, reporter. That is what a web server does. It gives you HTML files. Congrats that you have figured out the dev console on your browser, but you're not a hacker. I'm trying to go with Hanlon's razor here and assume this is inexperienced people and not outright scams.
We don't get a lot of these, but they far outweigh actual credible reports. But we try our best and take everything seriously until it can get disproven. And it's exhausting. So I get it sometimes. Sometimes having a place for responsible disclosure just opens yourself up to doing more paperwork (verifying that the fake reports are fake). That said, we still do it.
> Sometimes having a place for responsible disclosure just opens yourself up to doing more paperwork
100% this. And it bites harder when you’re a scrappy time constrained startup, or just offering a public service.
I maintain a public API that returns public information- observable facts about the world. As such, the API doesn’t have any authn/z. Anyone can use it as little or as much as they want, free of charge.
Of course I get at least 1 email per year telling me my API is insecure and that I should really set up some OAuth JWT tokens and blah blah blah.
I used to reply telling them they are wrong but it gets hostile because they want money for finding the “vulnerability”.
On the flip side, at another company I once got a security@ email that sounded like a false alarm. I quickly wrote it off and sent a templated response. Then they came back with screenshots of things that shocked me. It was not a false alarm. That guy got paid a handsome sum and an apology from me for writing him off.
Or this! It's not just paperwork, but also mental capacity. Having a place for responsible disclosure yields enough "fake" disclosures that you become desensitized to it. Boy who cried wolf style.
It's possible "security isn't a concern" because they are dismissing the report, not the security.
I think the fundamental problem is, a lot of orgs just don't care about security, as it doesn't affect their bottom-line. Even breaches are only a temporary hit on the PR. Proper way to address that might just be legislation, with heavy fines based on total revenue.
That and also security is just hard to scale. That's why if it was mandated by legislation, companies would be forced to spend a comparable amount on scaling their security teams and efforts.
Most respectable services will have an abuse@ address you can contact. They should at least be able to get your issues where they need to go internally. I've had very good results for companies and networks in the US.
I've never had an outright bad experience reporting a security issue, but some companies definitely aren't geared up to handle reports. I found that an energy provider's API would give usage information for past addresses and eventually I think the right team got told, but it was a nightmare trying to find someone to actually report the issue to.
It's hit and miss. Sometimes they want to throw you under the bus. Sometimes they want you to sign affidavits. I've never been asked to sign an NDA or anything like that. Sometimes they threaten with criminal charges. DoJ recently released some guidance about good-faith security reporting, so it might be easier these days. Doubt that affects active litigation/prosecution or vindictive orgs, though.
Worked at a place where they liked to use encrypted Java prop files... with the passwords hard coded in the app (in the same repo). Those were internal repositories, though.
The access model on platforms like GitHub is flawed, a single account can be used for both professional and personal projects/repositories, leading to “fat finger” errors like this one here...
Oh yes, this. It's so easy to critically fuck up an invite into an organisation. If you typo the username you are potentially compromised. I've seen a couple of near misses on this already.
Note: the invite input box actually autocompletes ALL github usernames.
This can be vulnerable to "ticket trick" - often support/helpdesk sites are put on the main domain and have reply-to email addresses that will reflect the content back to the user requesting support. This can be used to sign up for slack, etc.
This is what I do, but I really wish there was better integration with auth providers so I could use it for the invite. It would be nice to search my directory to type the email and confirm the name matches the email.
This is what GitLab does with their hosted AD/LDAP connector.
I’m in fear of mistyping something and inviting the wrong person.
Sorry, but string prefix search over a few hundred million entries is something you can do with the same performance using just postgres on a single server with just a few hours of dev time.
I've done it before, it's not as impressive as it seems.
With trigrams you can even do precise substring search on this scale with good performance.
The fact that no one bats an eye that GitHub is used to store proprietary source code is so surprising to me.
Conversely, if that is what it is meant for, why does it default to autocompleting all users globally instead of my org (even on the enterprise version)? Why hasn't this been fixed after all these years?
Do you have a source that this is a "fat finger" error?
I've had contractors publish my code to public Github repos to showcase their work for their next job. Even after emailing them multiple times, I kept finding my code in github with companies emailing me asking for a referral to this person...
This. I used to use them, because I, too, have been burned by my own mistakes before. However, I had to stop using them as a plugin for GitHub because (at least when I was looking) there doesn't seem to be a way to exclude private repos from the reporting, and there were some false positives in private repos I don't want flagged. But I think the service is a good idea in general.
Ah. I can’t believe this still happens in this day and age. About a decade ago, I was working for a startup and we were getting dominated in our growing space by a much larger, well-funded rival. Our competitive intelligence team browsed through their git, and the rival had actually exposed access to their customer, pricing, and sales agent database by leaving their credentials in one of their branches. The team went to our legal department asking if they could be protected by the company, and if they could use this intel. The team then worked with the product team to integrate all their pricing engines into our POS to undercut their pricing and sent marketing blasts to their leads with targeted campaigns. Long story short, that company is now defunct, and it definitely undermined their growth.
If you're in the US, that's 100% a crime. If they were responding to an unauthenticated API or web request that's one thing, but using a leaked password on a database is not legal at all.
> Git is an awesome version control system, used by over 93% of developers, and is at the heart of modern CI/CD pipelines. One of the benefits of Git is that everyone has a complete copy of the project they are working on.
I feel like this is copy-pasted from a pitch deck on why GitGuardian should be funded. Does anyone reading the article care about this anecdote? Like do people stop reading at "well I'm one of the 7% that doesn't" or think "wow a lot of people are using Git, I should buy their thing"?
Sorry for the meta comment, but it just stuck out as odd to me.
There is still a lot of noise with basic tools like this (I've also used trufflehog at scale).
To properly handle secret scanning requires calling live APIs to test if keys are "real". And you need to have a way to file tickets when you do have findings... if you rotate a cred from production, that's now an outage, so you need to coordinate multiple teams.
It's a lot of work and free tools only solve one part of this. I can't speak to any of the vendors in this space but I can attest that it's a harder problem than it seems!
Those are good points. Still, it’s fairly manageable, after certain adjustments. Also, we’re using the new (Go-based) version of TH that’s both much more performant and validates secrets against endpoints. I suspect their SaaS offering is a bit more polished and turn-key, but even the open-source one is quite decent. It doesn’t swamp us with FPs, at least.
Well, GitGuardian is free for individual developers (20K of them use it - it's the #1 app on the GitHub Marketplace) and for teams below 25. So I guess the masses can enjoy secrets-free code! https://github.com/marketplace/gitguardian
I stand corrected on this, but what I’d argue is that it’s not an affordable solution for medium-sized companies and non-profits who don’t swim in cash. It could be that our example is unusual (big non-profit), but when we evaluated GG the pricing left a sour taste.
More specifically, none of the paid security products we use cost nearly as much, and those products do much more than just detecting secrets. So from that standpoint, the pricing just seems outrageous. It’s pretty clearly aimed at big enterprises that can afford it and are vulnerable to FUD (while the “hobbyist” pricing is just free advertising). I don’t blame them for finding a way to make big money, but this business model is not what we’d pick.
Most of these (even sometimes expensive) tools only look at repos and users who are associated with the company’s GitHub org, which barely solves the problem. The much harder problem is the number of corporate secrets that are on random repositories (personal dotfiles, automations, data science scripts, etc.) across GitHub with no strong relationship to the organization. Try using GitHub Code Search to find all the Fastly API tokens that have been leaked, for example, and I bet you’d find some wild stuff.
Make a private repo. I wouldn't blame a corp if they tried to scan every public github repo for their API keys, let alone an employee's public account.
“T-Connect enables features like remote starting, in-car Wi-Fi, digital key access, full control over dashboard-provided metrics, as well as a direct line to the My Toyota service app. The servers that control these options contain unique customer identification numbers and customer emails.”
I don't see why any of that should require the email address. They can communicate with the customer through the app or through the car UI.
In general, apps and sites these days hoover up more info than they need simply because they can, not because it adds to the customer experience (and often doesn't help the company either). There is no incentive to be in any way judicious about what to collect and the frequent breaches show that even the companies don't value PII as something worth protecting because it's not core to their business.
There is software such as Trufflehog ( https://github.com/trufflesecurity/trufflehog ) that finds secrets. We are using it at an organizational level, but there's always some delay between finding something and getting it reported. I've been meaning to add it both to our CI, so our team can notice right away, and even to Git push hooks, to catch these cases early.
I used to report things like this that I had found, including cases where I can see people used the default "sample" config for security purposes, but I found that either people would not care at all, or massively overreact and somehow blame me.
If an organisation is disorganised enough to leave critical details in public, they're probably too disorganised to handle someone reporting it.
Reminds me of a related HN discussion a few years ago: someone searched for "remove password" in GitHub and unearthed who knows how many valid passwords in the hundreds of thousands of commits that the search returned...
I wish hosted GitHub made pre-push hooks available to the public. Would make this a much easier problem with free scanning tools like Trufflehog.
Or alternatively, if GitHub Secret Scanning was available to all public repos, instead of requiring a (very) expensive GitHub Advanced Security subscription. But I understand, they need to make money somehow.
(GitHub PM here.) The Advanced Security secret scanning experience is coming to public repos (for free, obviously)! Give us a few more months - we have a little more work to do scaling it up
Github is amazing. Wanna get expensive licensed fonts for free? Just search for $FONTNAME.otf github and you will find at least a few projects using it.
People don't think a lot about what they put on there, it seems. Or maybe font foundries haven't sued enough at this point.
Is there any reason why keys don't constantly update? It seems like a service could exist where every five minutes a rotation occurs across services with decaying privileges. For example, the 5-minute-old key still works, but the 10-minute-old key has completely expired.
Either you'd use an encryption algorithm that depended on a "deeper" key... or you'd fetch the new key while authenticated with, you guessed it, another "deeper" key.
It's keys all the way down. Every key you use, it's your responsibility to keep it private.
(Unless you want to be dealing with physical hardware dongles that generate keys, but those aren't exactly easily portable.)
The refresh can be done with the old key, within the time-window. You only need the deeper key for the first time authentication, when starting up or provisioning the service.
What you have to consider is that starting a service is often not just a one-time fire-and-forget operation. Applications crash and need auto-restart, usually via systemd, Kubernetes or something. So the keys-all-the-way-down knowledge needs to be integrated throughout that whole stack, or on the side of it.
Kubernetes ServiceAccounts are based on a very similar flow: they are temporary tokens that are mounted into each Pod, used to connect to the API server and decide what that Pod is allowed to access.
Not exactly the same but I had to integrate with a payments API which required one call to an auth endpoint with user/password under HTTP basic auth to get an access token. The actual calls to calculate and execute payments used this access token instead as a bearer token. Those ones expired.
I'm not sure I really see the point, but I guess you could lock down the main call on another system, store the temporary token somewhere, and have the other systems use it.
As you say, you still need the top level credentials somewhere to get a new token.
Yah, that's the OAuth Client Credentials flow, but as noted, you still have a static set of creds that are required to generate the short-lived access token. Besides being useful for limiting scope in some circumstances, the main point of the client cred flow is to appease eager sec archs who insist on OAuth.
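For reference, the flow boils down to something like this (the endpoint URLs are placeholders; note the static client id/secret still has to live somewhere, which is the whole point being made):

import requests

# Exchange the static client credentials for a short-lived access token.
token_resp = requests.post(
    'https://auth.payments.example/oauth/token',   # placeholder endpoint
    data={'grant_type': 'client_credentials'},
    auth=('MY_CLIENT_ID', 'MY_CLIENT_SECRET'),     # the static creds
)
access_token = token_resp.json()['access_token']

# Use the expiring token as a bearer token on the actual API calls.
payments = requests.get(
    'https://api.payments.example/v1/payments',    # placeholder endpoint
    headers={'Authorization': f'Bearer {access_token}'},
)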
"Production keys in source control" is right up there with "mistaken routing table entry" and "fat-fingered DNS config" on the list of critical company-breaking mistakes that you'd think would be easy to avoid, but aren't.
Those three are not all equal. "Production keys in source control" is the equivalent of a surgeon not washing their hands between surgeries. It's a basic level of professional competency that should not be violated. The latter two are bad mistakes, which shouldn't happen but do.
Surgeons have a practiced ritual ("scrubbing") to prep for surgery. Do you practice a credential-scanning ritual before saving (committing) your code or pushing your code to a remote repo?
I have git hooks to lint code syntax, but nothing for scanning for leaked credentials. Looking @ TruffleHog now, mentioned by another poster.
That's certainly a good idea. But the secrets shouldn't be in the codebase to begin with, certainly not production secrets. Production secrets should stay in production and no one has access. Whatever intends to use the production secrets should have first been developed in a dev environment and released to prod.
A nice approach, if you have sufficient control over the form of your secrets, is to prefix each secret with "MY_COMPANY_SECRET_DO_NOT_COMMIT:". Then you can add a commit hook that refuses to commit if any committed file contains that substring, etc. etc.
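A minimal version of that hook might look like the following (my sketch, not a battle-tested tool; save it as .git/hooks/pre-commit and make it executable):

#!/usr/bin/env python3
# Refuse to commit if any staged file contains the do-not-commit marker.
import subprocess
import sys

MARKER = b'MY_COMPANY_SECRET_DO_NOT_COMMIT:'

staged = subprocess.run(
    ['git', 'diff', '--cached', '--name-only', '-z'],
    capture_output=True, check=True,
).stdout.split(b'\0')

for path in filter(None, staged):
    try:
        contents = open(path, 'rb').read()
    except FileNotFoundError:
        continue  # file was deleted or renamed away in this commit
    if MARKER in contents:
        sys.exit(f'refusing to commit: {path.decode()} contains the secret marker')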
Great idea, but hard to enforce. Just use a scanning CLI like TruffleHog, Gitleaks, or ggshield from GitGuardian to catch all sorts of hardcoded secrets.
Code reviews? Should be a ritual you do on your own code before committing+pushing, and should be a ritual that others do in the PR before merge (though arguably by that point a secret is already compromised).
"Should" not be violated is the point, though. I agree, it shouldn't. But it is, all the time.
I mean, I'll bet Toyota knew this organizationally. They had security people sign off on the design who all knew how secure key management is supposed to work. They probably reviewed this github release. And it happened anyway.
Maybe they weren't supposed to be production keys. Maybe it was a development key from someone's early cut of the tooling that got mistakenly reused for production. Maybe a script that updated all the keys mixed up which was which.
The point is that the existence of a Clear And Unambiguous Right Thing to Do is, in practice, not sufficient to ensure that that thing is done. The space of ways to mess up even obvious rules is too big.
And that's surprising, which is why (1) it keeps happening and (2) people like you don't take the possibility seriously in your own work.
You're jumping to conclusions in your final statement there. The existence of inexcusable bad practices does not mean we should not try to mitigate against them, and I didn't say we shouldn't.
And yet I see it get violated all the time. People should do a lot of things, but a lot of my coworkers are lazy and do not do quality work. Given that it happens, and that I can't prevent it, one must then ask how to guard against it.
At my org, we even try to generate all secrets with a standardized prefix/suffix so as to make them very greppable. That doesn't stop "Architects", "Customer Solutions", "Analytics" types from … just working around the standard tooling and manually generating one by hand because … IDK, they think they know better? I really don't get it.
Doctors used to not wash their hands too. I get it though, and I've seen the same thing. Really it comes down to education and not granting access to secrets to people who aren't capable of handling them.
I committed my google maps api key to a public github repository recently and github immediately sent me a warning about it. The thing is, I did it intentionally. The key is used on my website and the website is served by github pages.
Now, it's an embedded maps API key, there's no cost to use it, nobody can use it from a domain other than mine, and it's easily visible in the page source if someone views that, so there's really no reason not to commit it. Even if I didn't commit it somewhere publicly, it's still publicly available on my website and nobody else can use it anyway.
I could run your site locally with a customized host file so the referers all come from your domain. I don’t think it’s that much of a risk but I wouldn’t want to use a key associated with something that can bill me.
You could use GitHub Actions to build your Pages site, injecting the API key at build time. It's stored as a repo secret rather than in code. Of course, since you deploy the site publicly, the key will still be visible.
My company has monitoring for this, but it still seems to be a law of nature that,
1. someone adds new service/server/infra in a submarine manner
2. it goes to prod
3. the cert expires and outage begins
4. my team is asked what to do, because "we're the cert experts"
5. we add it to the monitoring
So it only happens once … per service. Which isn't great. But how do you get people to slow down and do simple shit, like add a monitor to your new service? (Or promulgate a design doc…)
I think the real answer is to only issue limited-duration certs and only via automated means (ACME or similar), thus requiring automation be in place from day 1.
This still doesn't protect against the vector where somebody else in the company has managed to prove themselves to be responsible parties to another CA/issuer.
Oh I agree! Everything would be ACME if I could. A ridiculous amount of stuff still doesn't support it, though.
And, like I said, usually it's someone who doesn't grok certs doing it without asking for help in the first place, so they're not going to get why ACME. (Because I am tired of doing cert renewals. I've had enough for a lifetime…)
Pit of success. Make it so that the Right Thing™ is super easy, whereas the Wrong Thing™ is frustrating and keeps pushing people towards the Right Thing. Humans are lazy, use that to your advantage.
For example, it's one line for me to configure a new machine in our infrastructure to have a certificate for myservice.myorg.example, and there's a Wiki page reminding me what the line is, or I can look at lots of services which already work. If I do that, the automation happens from the outset: my service never has a custom cert or lacks monitoring, it has monitoring from day zero and its certificates are automated. I happen to really care about ACME and the Web PKI and so on - and would have gone the extra mile to do this, but I was astonished on Week One at my current employer to realise oh, this is just how everything works here, the Right Thing™ is just easier.
Does your company have a Wiki page saying how to do it wrong? After writing the page about the right way, update the bad wiki page with a link to your new page, and cross through all the previous text, or even just delete it.
If you have firewall rules or a proxy, block random unauthorised stuff. This is probably a reasonable strategy anyway. Now they come to you to unblock their "submarine" service, and before you do that's the opportunity to insist on proper certificate behaviour.
People are really good at avoiding the pit of success! We run most of our infra on k8s, and if you want a cert, with ACME & auto-renew all managed automatically for you, you just create a Certificate object.
But then we get some vendored product that manages to be completely unable to run in k8s, devs avoid the automation for $reasons, etc.
> Does your company have a Wiki page saying how to do it wrong?
Sometimes we do! I've found a few of these after really pressing the point of "why are you doing it this way?" hard enough. But you have to a.) get an answer to that and b.) the answer has to reveal they followed some shadow docs.
> If you have firewall rules or a proxy, block random unauthorised stuff.
Your rogue service implementer just creates their own VPC; they are in control of the firewall.
Should my security team either set appropriate privileges or delegate that to my team? Perhaps. I have to get them to adopt things like RBAC and ABAC first; they fervently believe our industry's regulations forbid "job function begets access" (i.e., RBAC) type policies. (They desire that, even if a job function begets access, if you're not needing to exercise that access, it should be revoked until such a time that it is required to be exercised. But this means that you end up with an "all security reqs. must flow through the security team" style thing, and there are then a lot of them (because they are so ephemeral), so any process must inherently be ignorant of whether the request is right. So your rogue implementer's request for "I need to implement $high level service" is basically carte blanche.)
The thing about shadow-docs and shadow-services is that they're hard to find out about in a timely manner. A lot of these comments are fighting the very core of human nature.
(We used to be better about this as a company, back when we were very engineer heavy, and filled with good engineers — most better than me. The quality bar definitely fell at some point, and we've hired a lot of not engineers doing things that really would be better served by an engineer. Y'all are working at rainbow companies, and I don't know how to keep a company in that state or move it to that state as a bottom-of-rungs eng.)
IaC, right? If you don't put keys into the Code you can't have Infrastructure as Code. Without keys the code only partially defines your infrastructure.
If anyone out there is using environment variables currently, and is interested a quick path to plugging the leaks in their secrets management, check out EnvKey[1] (disclaimer: I'm the founder).
Because EnvKey integrates tightly with environment variables, no app code changes are needed to switch, so it only takes a minute or two to import/integrate a typical app.
EnvKey is designed to help avoid incidents exactly like the one that just hit Toyota, while staying out of your way and even making your life significantly easier as a developer by simplifying many aspects of config management.
Give it a look if you know you have some room for improvement in this area and are looking for an easy, secure, open source, end-to-end encrypted solution :)
Well, I guess in the purest possible sense you're correct.
However, I'm currently working with a group using Terraform on GCP (GKE), and it's popular with them to use Secret Manager to manually create a secret in there (when it cannot be auto-gen'd with the IaC, a fairly small subset of things) and then reference that secret from the infra-defining code.
I think of it as being akin to "this service requires a correctly configured FOO_BLAH variable in its environment". I don't really see it as any failure to achieve some IaC goal, but defining infrastructure code isn't my primary function, so take this with a grain of salt.
This is exactly how Ansible-vault works. It's many times better than committing them plain, but I'd still vote for some external service providing the secrets runtime only.
In case it's helpful, here's a bookmarklet that instantly searches any public repository for the most common secret patterns: https://about.sourcegraph.com/blog/no-more-secrets. Sourcegraph built a feature called "code monitors" that basically runs a recurring search in the background to guard against anti-patterns like API keys and secrets being committed into the codebase.
> A credential for a DB server holding customer data was hardcoded into the repo.
This is normally not a problem, since databases should not be exposed to everyone on the Internet.
> One of the benefits of Git is that everyone has a complete copy of the project they are working on.
This is one of the problems with Git and many programming languages. Because of this, full copies of entire code bases are everywhere. Git also only has ACLs at the repo level.
In many langs it is recommended to not store credentials in the code but at the same time there are no guidelines for how to store credentials. If you use cloud services there are recommendations.
Edit: Btw, which git service were they using? Because I believe GitHub and others monitor code bases for accidental pushes of secrets.
It can actually be comical just how _bad_ things can be at large orgs.
Anyone have details, theories, or a book on how such inefficiencies come about? I can't speak to tech-oriented large orgs, but I've worked with others and it's just... I'm not shocked at all. I've seen public-facing API keys in HTML, private SSH keys that do god knows what in plaintext on FTP servers... I just don't understand how they seem to care so much but in reality care so little. Just lazy?
In my experience, security is so far removed from the actual job description/day to day cares that it's perpetually "somebody else's problem", seen as an unnecessary time sink. Usually there's a couple of people that actually care, but they're ignored and lack the power to influence change.
Not lazy, just overburdened by more important things.
That seems to match what I've seen. Especially the last line. It's just so weird the dynamic between "pretending to care" and "actually caring." I've worked at or consulted on small teams that were very "lax" about security but everyone seemed to take it seriously so it "worked."
At larger orgs, I notice they take it "seriously" but that results in people finding creative loopholes... like pasting in all the important production keys/passwords into a google doc and sharing the google doc because they "can't send secrets over slack" ... lol
You're the receptionist at a high-security facility. Lots of people work there. Your job is to make sure the people get in and out quickly.
The building has multiple methods of access at different stages. During the course of doing your job, you don't follow the proper protocols a few times. You closed a door without re-entering a lock code. You didn't check someone's badge and ID at the third floor access gate. You forgot to make sure someone swiped out on exit.
Nobody actually trained you on any of those protocols; they just made you watch a "security is everyone's responsibility" video and then expected you to know all the protocols. Maybe once or twice you were lazy, but it's equally likely you just didn't know what to do. And once you make that mistake, somebody can exploit it as long as you don't follow the protocol.
In facilities where security is important, there are safeguards. Doors left open sound an alarm, automatically lock when closed, and sensors detect if more than one person walks through without badging in. These safeguards exist because people are fallible and need help to enforce security. Companies without security safeguards either don't care, or are ignorant.
Because the importance of security isn't respected until it's too late. At any medium to large company in your team's planning process you always need to justify the business or customer impact (even on infra teams). And security related efforts are much harder to sell because the impact is a "what if" and not $$ saved, products being easier/faster to develop, or customers benefiting in some way.
Also basic security knowledge isn't screened for in hiring nor is it really taught in most orgs (aside from trainings that people skip or optional stuff).
There’s a book by John Gall called the Systems Bible [0] that goes into how big systems form and fall apart. Mostly anecdotes and no real solution, but a decent read that isn’t full of the usual BS that’s required because usually only large systems can afford high speaker fees.
The reality is that most companies can't afford an IT budget it would require to implement, and then force adherence to, a standard set of best practices for much of anything.
As another commenter said, though, this is not the case in regulated industries, in the parts of those companies dealing with regulated processes, controls and data access.
Every organization is held back by the slowest adopter. If you advance too quickly compared to your colleagues, then you will likely leave because everyone else feels like they are trying to pull you back into their crap. Innovation is a depreciating asset. If you don't reward the people who make a quantum leap, they will leave, and all their progress will revert to the mean.
I bet someone made a security key because it was the right thing to do, but they didn't have the controls in place to manage it in their build system and give another key to developers/engineers/etc. So someone else copied it for convenience rather than have to explain to every moron in the whole company how to use it to access the database or run their monolith tests or get access from the dude who no longer works there who was the "giver" of access keys.
Toyota is claiming no, not with this leak. It was a partial repo that was exposed. The data they accessed with the key got customer ID numbers and emails only.
I worked for a couple of corps in my life and the security theatre there is ridiculous. At the same time, things like this happen all the freaking time. And none of the company spyware can catch that. Funny how this never happened in the small companies I worked for, although maybe that's just a function of the number of people? Or maybe because small companies rely on fewer platforms and frameworks, so there are fewer chances of such leaks?
In my experience, smaller companies tend to hire the best they can attract and afford. Larger companies tend to collect talent so no one else can get it, so it usually isn't "the best."
What if companies using secret keys (not SaaS providers who generate secret keys) voluntarily send their secret keys (MD5 hashed or somehow encrypted) to GitHub and GitHub can then monitor their leakage and notify the company?
The same system could be applied with the repo owner being the pattern provider and getting notified of matches, but if I had to guess, only enterprise customers would get a feature like this.
This is why you should never, never, never check production keys into git. There are tons of secret manager services out there. All the major cloud services have them and even companies like 1Password now have them.
Was this title re-written? I feel it was phrased more vaguely when I first read it, but I could be wrong... More curious about whether that's a thing Hacker News does.
Question: Let's say I want to open source my app but a long time ago I used to have credentials hard coded. What can I do to clean this up from history?
Git does have tools that allow you to rewrite history to fix situations like this, but by far the easiest solution is to invalidate those credentials so they become worthless.
Here's a checklist [1] (again, from gitguardian) of steps to follow before open-sourcing projects and [2] a guide on how to remediate hardcoded/exposed secrets.
To prevent accidental leaks, perhaps GitHub et al. could implement a mechanism analogous to the EURion constellation [0]. E.g. a commit containing a file with a specific text pattern should always be rejected. Most filetypes accept arbitrary comments (ahem, JSON), so it should work for most files.