
> We also collect a few anonymous data (CLI errors, most frequently used commands and count of resources).

Looks cool, but this is an instant no for me. Sorry guys.




They also upload a hash of the userid and accountid, hashed with a non-random salt, so it's not really anonymous as the function name claims.

userid and accountid stored in database here: https://github.com/wallix/awless/blob/e2bf4f2cad37b011c5b3b6...

retrieved by stats here: https://github.com/wallix/awless/blob/e2bf4f2cad37b011c5b3b6...

Added to stats payload here: https://github.com/wallix/awless/blob/e2bf4f2cad37b011c5b3b6...


(I'm one of the core developers of awless)

The hash functions are irreversible, so it is impossible to recover the original identifiers.

We added these anonymous IDs in order to know which commands are used most per user.

Anyway, if you have better ideas on how to manage this, feel free to make a pull request or create a GitHub issue. And if you prefer to disable it, you can also do so easily in the source code (you just need to comment out a few lines).

Edit: We opened an issue for this topic on our GitHub repo: https://github.com/wallix/awless/issues/38 . Feel free to continue the discussion there.


You don't need to break SHA256 to de-anonymize these values.

`awless` collects account number hashes. AWS account numbers are 12 decimal digits long, meaning there's a total of 10^12 unique values. Values are anonymized before submission using a single round of SHA256, so in ~2^40 hash operations, anyone with your database of hashes can invert every single account number.

For comparison, the bitcoin blockchain presently has a hash rate of ~2^61 SHA256 hashes per second. (Edit: I incorrectly stated 2^41 based on a hash rate of 3 TH/s, when it's actually 3 million TH/s.)


On my not-so-special spare server, I'm able to pregenerate the hashes with that fixed salt at 344,191 per second. So, it would take only about a month to compute them for every 12 digit AWS account number. And, as mentioned, that's on my not-so-fast spare server, running in one process, one thread.

acct [000003441910] has hash [d2a52833a6e434d2a55be0ce852c2dd9c5260c49a7c28ea4fa3fe2ac6d054d7e] (the last one it finished in 10 seconds)

A little effort with a decent GPU + hashcat though, would take this exercise down to a few minutes.


Good point. Thanks for the advice; we will quickly look into how we can improve this. Our goal is above all to make using AWS easier and, as a result, more secure. We do not want to expose CLI users to any new threat. We made the source code available to anyone (even the anonymous data collection) to be transparent and to get feedback on our work so we can correct it when needed.


I opened an issue:

https://github.com/wallix/awless/issues/39

PBKDF2, bcrypt, and scrypt are all used where a database needs to store something and check for equality, but where the values in the database need to not be reversible even if the database is breached. They might be suitable here.


None of those can deal with the case of too limited an input range. Even if you use a million rounds, you've only multiplied the workload by ~2^20.


Different algo, but my 970 can perform 3.4 billion SHA1 hashes per second on the low setting in hashcat.


You can create a randomly generated cookie of sorts instead of doing anything with a user's credentials. The supposed accomplished task and end goal would be the same, and yet people would feel more comfortable.

Your claim that you are using an irreversible hash is not comforting.

Your forced data collection is also not comforting.


> You can create a randomly generated cookie of sorts instead of doing anything with a users' credentials.

That throws off their statistical analysis. Random cookies generate a new cookie for each new install or re-install, inflating the "users" count. If someone installs this on five different servers, the stats under random cookies will show five separate streams of data, and they will draw the improper conclusion that a particular operation used on all of those servers is five times more popular than it really is. A configuration flag to disable the data collection is reasonable, but using a well-known hash like Whirlpool to anonymize the data stream is also reasonable.

If someone doesn't like data collection, then they shouldn't use cloud products, and they should just as vociferously declaim cloud services. With cloud services, whether or not the usage data collection is anonymized is at vendor discretion, but here, you control the source. Using a utility for a cloud service, and complaining about usage data collection, is ironic, considering AWS surely collects the same data.


> AWS surely collects the same data

Well of course they do, since all of these commands send off calls to AWS servers. And if you're using AWS products you already trust Amazon; that doesn't mean you trust a random person who put some code on GitHub.


This whole mess should be opt-in, but it's shocking that anyone thought uploading account IDs hashed with known salts was a good idea. How long did it take you to generate the rainbow table? What you did was more difficult than simply generating a random string as you should have done.


Project creator here (but obviously not the OP).

Yes, we do collect minimal anonymised statistics in the sole goal of improving awless. All the statistics code is here: https://github.com/wallix/awless/blob/master/stats/stats.go

As the project is Apache licensed, you're free to modify it if you don't want this. Also, if you're privacy-conscious you should use an application firewall on the client side, like Little Snitch, since much of the software you install on your machine also does this.


You should at least provide a prompt on first start that asks if participating in analytics collection is acceptable.


I like the look of this, so on the software side it's a thumbs up.

However, the fact that the code is active at all will rule it out for some companies (firewall or not).

Perhaps make it something users can turn off in a config file? Not everyone can code in Go, especially sysadmins, who aren't an unlikely audience given that this is an infrastructure tool, so forking and editing the code might not be simple for them.


Or make it disableable with an environment variable. There are a couple of ways to make the tool report by default while allowing it to be turned off in environments that can't report. The key thing is to make what is happening transparent.


Must be an explicit "turn-on" option.


I appreciate that your folks released this OSS tool.

However:

Where I work, as long as the data collection code is in there, whether I can modify it or not, they won't allow it on our computers. I know this is not uncommon.

Dismissing this concern by saying "other software does this" while awless falls into a different category (small CLI tool) is also problematic.


Thanks for the feedback. Until we provide a way to allow/disable data collection, we have completely disabled the data sending (see https://github.com/wallix/awless/commit/f6389e75787390bd7797...).


What does the data payload look like? I'd like to see the actual data you're sending, even if it's just a mock. From digging around in the code, it looks like you're sending infra data, including instance IDs. How do I know you aren't sending my AWS access tokens[0]?

[0]: https://github.com/wallix/awless/blob/e2bf4f2cad37b011c5b3b6...


A toggle at least would be nice to turn all data collection off.


And for some reason, this (in my eyes useless) data collection is bundled inside the version check: https://github.com/wallix/awless/blob/master/stats/stats.go#...


Not useless, gives them usage numbers.



Agreed! Can't run it in our environment either :(


Is it opt-out?


[flagged]


Bitching? You kidding me? This is user feedback. Someone posted here to promote the tool, and we are asking them to remove it; that becomes bitching? That's insulting from your end.


No need to fork. Just clone and comment the line.


That "bitching" is both constructive criticism and helpful to highlight here in the comments so others may take note.


Maybe you're right and I'm being unfair. It just seems kind of dick-ish - what's wrong with even "Cool, but I don't like stats being collected, please make this opt in"?


That's how I interpreted the OP.


Well, I think it's not constructive criticism. If this were a closed-source project, then yes, it would be a very helpful highlight.


This kind of functionality is generally frowned upon in the Free Software world. For example, in Debian, it'd be treated as a bug and patched out. So I disagree; calling it out to inform others is entirely appropriate.


Wow, and your website isn't even HTTPS for what appears to be a security company. Get it together.


Seems like a non sequitur when discussing an open source project they have released.


The tool phones home. Their website doesn't have HTTPS. It's plausible that the tools phones home over an unencrypted channel (I didn't look, so I could be wrong).

My overall impression is that they don't do security very well.


Anyone can release a project on GitHub; the project should be judged on its merits, not on how well an unrelated project is implemented (and vice versa).

A quick search of the repo for https:// and then http:// shows that the stats collection is apparently over https.


@heartsucker If you want to judge by previous things, we are the team that created http://opalang.org, and we have no tie at all to the company's static, outsourced portal. Also, I will be in Berlin soon; contact me and I will gladly meet there.


Sure, and if we imagine a hypothetical entity that has 10 products with security holes and then releases an 11th, it might be worth looking at the 11th more suspiciously. Things don't happen in a vacuum.



