The hash functions are irreversible, so it is impossible to get back to the original identifiers.
We added these anonymous IDs in order to know which commands are the most used per user.
Anyway, if you have better ideas on how to manage this, feel free to make a pull request or create a GitHub issue. And if you prefer to disable it, you can also do it easily with the source code (you just need to comment out a few lines).
You don't need to break SHA256 to de-anonymize these values.
`awless` collects account number hashes. AWS account numbers are 12 decimal digits long, meaning there's a total of 10^12 unique values. Values are anonymized before submission using a single round of SHA256, so in ~2^40 hash operations, anyone with your database of hashes can invert every single account number.
For comparison, the bitcoin blockchain presently has a hash rate of ~2^61 SHA256 hashes per second. (Edit: I incorrectly stated 2^41 based on a hash rate of 3 TH/s, when it's actually 3 million TH/s.)
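For anyone who wants to check the arithmetic, here is a rough sketch in Python. The hash rate is just the figure quoted in this thread, not something I measured.

```python
import math

KEYSPACE = 10 ** 12  # every possible 12-digit AWS account number


def keyspace_bits(n: int) -> float:
    """Size of a search space expressed in bits (log2 of the number of values)."""
    return math.log2(n)


def seconds_to_exhaust(n: int, hashes_per_second: float) -> float:
    """Time needed to hash every candidate once at the given rate."""
    return n / hashes_per_second


print(f"keyspace: ~2^{keyspace_bits(KEYSPACE):.1f}")
# 3 million TH/s = 3e18 hashes per second (rate quoted above)
print(f"at 3 million TH/s: {seconds_to_exhaust(KEYSPACE, 3e18):.1e} s")
```

The keyspace comes out just under 2^40, so at that rate the whole space is covered in well under a millisecond.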
On my not-so-special spare server, I'm able to pregenerate the hashes with that fixed salt at 344,191 per second. So it would take only about a month to compute them for every 12-digit AWS account number. And, as mentioned, that's on my not-so-fast spare server, running in one process, one thread.
acct [000003441910] has hash [d2a52833a6e434d2a55be0ce852c2dd9c5260c49a7c28ea4fa3fe2ac6d054d7e] (the last one it finished in 10 seconds)
With a little effort, though, a decent GPU plus hashcat would take this exercise down to a few minutes.
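To make the search concrete, here is a minimal sketch. The salt value and the way it is combined with the account number are assumptions for illustration only; I have not reproduced the exact format awless submits.

```python
import hashlib
from typing import Optional

SALT = b"example-fixed-salt"  # hypothetical; stands in for whatever fixed salt is used


def hash_account(account_id: str, salt: bytes = SALT) -> str:
    """Single round of SHA-256 over salt + account number (assumed layout)."""
    return hashlib.sha256(salt + account_id.encode("ascii")).hexdigest()


def invert(target_hex: str, start: int = 0, stop: int = 10 ** 12,
           salt: bytes = SALT) -> Optional[str]:
    """Try every 12-digit account number in [start, stop) until one matches."""
    for n in range(start, stop):
        candidate = f"{n:012d}"
        if hash_account(candidate, salt) == target_hex:
            return candidate
    return None


# Demo on a tiny slice of the keyspace so it finishes instantly.
secret = "000000012345"
print(invert(hash_account(secret), stop=20_000))

# Full-keyspace estimate at the single-threaded rate quoted above.
print(f"{10 ** 12 / 344_191 / 86_400:.1f} days")
```

A real attacker would not loop in Python, of course. The point is only that the loop is bounded by 10^12 candidates no matter how strong the hash function is.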
Good point. Thanks for the advice, we will study quickly how we can improve this.
Our goal is above all to make the usage of AWS easier, and as a result, more secure. We do not want to expose the CLI users to any new threat. We made the source code available to anyone (even the anonymous data collection), to be transparent and get feedback on our work to correct it when needed.
PBKDF2, bcrypt, and scrypt are all used where a database needs to store something and check for equality, but where the values in the database need to not be reversible even if the database is breached. They might be suitable here.
None of those can deal with an input range this limited. Even if you use a million rounds, you've only multiplied the attacker's workload by about 2^20, i.e. added about 20 bits.
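Put as arithmetic: stretching multiplies the attacker's work by the iteration count, so it adds only log2(rounds) bits on top of the ~40-bit keyspace. A quick sketch, using the standard library's PBKDF2 just to show what one guess would cost; the iteration count is an illustrative value, and the real per-iteration cost differs by a small constant factor.

```python
import hashlib
import math

KEYSPACE = 10 ** 12   # 12-digit account numbers
ROUNDS = 1_000_000    # illustrative iteration count


def effective_bits(keyspace: int, rounds: int) -> float:
    """Attacker work in bits: log2(candidates * iterations per candidate)."""
    return math.log2(keyspace) + math.log2(rounds)


def stretched(account_id: str, salt: bytes, rounds: int) -> str:
    """One guess costs `rounds` HMAC-SHA256 iterations under PBKDF2."""
    return hashlib.pbkdf2_hmac(
        "sha256", account_id.encode("ascii"), salt, rounds
    ).hex()


print(f"plain SHA-256: ~2^{effective_bits(KEYSPACE, 1):.1f}")
print(f"with {ROUNDS:,} rounds: ~2^{effective_bits(KEYSPACE, ROUNDS):.1f}")
```

That lands just under 2^60 operations, which is large for a hobbyist but still a bounded, precomputable job.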
You can create a randomly generated cookie of sorts instead of doing anything with a user's credentials. It would accomplish the same task and end goal, and yet people would feel more comfortable.
Your claim that you are using an irreversible hash is not comforting.
Your forced data collection is also not comforting.
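For what it's worth, the random-cookie approach is only a few lines. A sketch, where the function name and the file location are hypothetical and not anything awless actually uses:

```python
import secrets
from pathlib import Path


def load_or_create_install_id(path: Path) -> str:
    """Return a persistent random ID, creating it on first use.

    The ID is 128 bits from the OS CSPRNG and is not derived from any
    credential, so there is nothing to brute-force back to an account.
    """
    if path.exists():
        return path.read_text().strip()
    install_id = secrets.token_hex(16)  # 32 hex characters
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(install_id)
    return install_id
```

Calling it with something like `Path.home() / ".example-tool" / "install_id"` (a made-up path) gives the same ID on every run of that install.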
> You can create a randomly generated cookie of sorts instead of doing anything with a user's credentials.
That throws off their statistical analysis. Random cookies generate a new cookie for each new install or re-install, inflating the "users" count. If someone installs this on five different servers, the stats under random cookies will show five separate streams of data, and they will draw the improper conclusion that a particular operation used on all of those servers is five times more popular than it really is. A configuration flag to disable the data collection is reasonable, but using a well-known hash like Whirlpool to anonymize the data stream is also reasonable.
If someone doesn't like data collection, then they shouldn't use cloud products, and they should just as vociferously decry cloud services. With cloud services, whether or not the usage data collection is anonymized is at the vendor's discretion, but here, you control the source. Using a utility for a cloud service and complaining about usage data collection is ironic, considering AWS surely collects the same data.
Well of course they do, since all of these commands send off calls to AWS servers. And if you're using AWS products, you already trust Amazon; that doesn't mean you trust a random person who put some code on GitHub.
This whole mess should be opt-in, but it's shocking that anyone thought uploading account IDs hashed with known salts was a good idea. How long did it take you to generate the rainbow table? What you did was more difficult than simply generating a random string as you should have done.
As the project is Apache licensed, you're free to modify it if you don't want this. Also, if you're conscious about privacy, you should use an application firewall on the client side, such as Little Snitch, since much of the software you install on your machine also does this.
I like the look of this, so on the software side it's a thumbs up.
However, the fact that the code is active at all will rule it out for some companies (firewall or not).
Perhaps make it something users can turn off in a config file? Not everyone can code in Go, especially sysadmins, who are a likely audience given that this is an infrastructure tool, so forking and editing the code might not be that simple for them.
Or make it possible to turn off with an environment variable. There are a couple of ways to let the tool report by default and still be allowed in environments where reporting is not permitted. The key thing is to make what is happening transparent.
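An environment-variable opt-out could be as small as this. The variable name `AWLESS_NO_STATS` is made up for the example; I'm not aware of the tool reading any such variable today.

```python
import os
from typing import Mapping

OPT_OUT_VAR = "AWLESS_NO_STATS"  # hypothetical name, for illustration only


def stats_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Reporting stays on by default and is skipped when the variable is set
    to anything other than an empty string, "0", or "false"."""
    value = env.get(OPT_OUT_VAR, "").strip().lower()
    return value in ("", "0", "false")


if stats_enabled():
    print("would send anonymous usage stats")
else:
    print("stats disabled by environment")
```

Printing which branch was taken at startup would also go some way toward the transparency point.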
I appreciate that you folks released this OSS tool.
However:
Where I work, as long as the data collection code is in there, whether I can modify it or not, they won't allow it on our computers. I know this is not uncommon.
Dismissing this concern by saying "other software does this" while awless falls into a different category (small CLI tool) is also problematic.
What does the data payload look like? I'd like to see the actual data you're sending, even if it's just a mock. From digging around in the code, it looks like you're sending infra data, including instance IDs. How do I know you aren't sending my AWS access tokens[0]?
Bitching? You kidding me? This is user feedback. Someone posted here to promote the tool, and when we ask them to remove the data collection, that becomes bitching? That's insulting on your end.
Maybe you're right and I'm being unfair. It just seems kind of dick-ish - what's wrong with even "Cool, but I don't like stats being collected, please make this opt in"?
This kind of functionality is generally frowned upon in the Free Software world. For example, in Debian, it'd be treated as a bug and patched out. So I disagree; calling it out to inform others is entirely appropriate.
The tool phones home. Their website doesn't have HTTPS. It's plausible that the tool phones home over an unencrypted channel (I didn't look, so I could be wrong).
My overall impression is that they don't do security very well.
@heartsucker If you want to judge on previous things, we are the team that created http://opalang.org, and we have no tie at all to the company's static, outsourced portal. Also, I will be in Berlin soon; contact me and I will gladly meet there.
Sure, and if we imagine a hypothetical entity that has 10 products with security holes and then releases an 11th, it might be worth looking at the 11th more suspiciously. Things don't happen in a vacuum.
Looks cool, but this is an instant no for me. Sorry guys.