GitHub supports this out of the box (https://docs.github.com/en/code-security/secret-scanning/abo...) and recognizes token formats from a lot of services.



In fact, you can apply to become a GitHub "secret scanning partner" to have your own secret format (a regexp) included in this scanning. GitHub then calls a webhook on your servers whenever it finds a match, so that you can do the credential invalidation on your own backend and send the kindly-worded email from your own domain.

Mind you, your secrets need to have a distinctive format in order for this to work. Probably a distinctive prefix is enough.
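
To make the "distinctive prefix" point concrete, here is a minimal Python sketch. The `acme_sk_` prefix, the 32-character body, and the function names are all made up for illustration; they are not any real service's format, and GitHub's partner program may ask for more than this.

```python
import re
import secrets
import string

# Hypothetical prefix and length, invented for this example.
TOKEN_PREFIX = "acme_sk_"
TOKEN_BODY_LEN = 32
_ALPHABET = string.ascii_letters + string.digits

# Regexp a scanner could use to find this format in arbitrary text.
# The trailing \b rejects longer alphanumeric runs that merely start
# with something token-shaped.
TOKEN_RE = re.compile(r"\bacme_sk_[A-Za-z0-9]{32}\b")


def new_token() -> str:
    """Return a random token carrying the distinctive prefix."""
    body = "".join(secrets.choice(_ALPHABET) for _ in range(TOKEN_BODY_LEN))
    return TOKEN_PREFIX + body


def find_tokens(text: str) -> list[str]:
    """Return every substring of text that matches the token format."""
    return TOKEN_RE.findall(text)
```

A bare 32-character random string would be indistinguishable from hashes and IDs; the fixed prefix is what keeps the false-positive rate low enough for automated invalidation.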

An Unethical Life Pro-Tip (that the word is already out on anyway, so I don't feel too bad):

• The content of Github public repos is all continuously loaded (by Github themselves) as a public dataset into BigQuery — https://console.cloud.google.com/marketplace/details/github/....

• For about $500, you can use BigQuery to extract all matches of a particular regexp, from every file, in every commit, in every public Github repo.

Whether or not Github themselves use this to power their secret scanning, arbitrary third parties (benevolent or not) certainly can use it for such. And likely already do.
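
At its core that search is just one regexp applied to every file's contents. A rough local sketch of the same scan in Python, handy for checking your own repos for your own key format; the `acme_sk_` pattern and the `scan_files` helper are hypothetical, and this does not touch BigQuery at all.

```python
import re
from typing import Dict, Iterator, Tuple

# Hypothetical key format; substitute your own service's regexp.
KEY_RE = re.compile(r"acme_sk_[A-Za-z0-9]{32}")


def scan_files(files: Dict[str, str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, match) for every key-shaped string.

    `files` maps a path to that file's text content.
    """
    for path, content in files.items():
        for lineno, line in enumerate(content.splitlines(), start=1):
            for match in KEY_RE.finditer(line):
                yield path, lineno, match.group(0)
```

Run over every file in every commit rather than one checkout, this is the scan the dataset makes cheap.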


The GitHub public events API is delayed by 5 minutes, presumably to give secret scanning partners time to react before pushes show up in the public event feed.

https://github.blog/changelog/2018-08-01-new-delay-public-ev...

Disclosure: I'm an ex-GitHub employee but was not involved in the secret scanning API.


Makes sense; but doesn't help the companies who aren't aware of the secret-scanning service / the ability to become a secret-scanning partner. If you have your own little API SaaS with its own API-key format, then you've probably got API keys exposed in the Github dataset; and someone's probably already found and extracted them. (It happened to us!)

Mind you, the Github dataset isn't the leak itself; the leak is the public repo that the user pushed their key to. The dataset just makes such searches scalable / cost-effective to third parties who aren't already indexing Github for some other reason.


Does GitHub not postpone publishing new versions until after the secret scanning is done?

Also I'd hope that Google is scanning BigQuery queries for that abuse signal.



