
Does anyone know alternative places to download the data set? The original forum it was posted in is slammed.


Me too. 35 million Italian users is very close to 100% of Italian internet users, so I need to understand how much information about me and my family is on the web.


There's a magnet link somewhere in this thread

https://archived.moe/g/thread/80976828


Search for "fbleaks" on Telegram.


Thank you. I am now downloading all the available data and can’t wait to play around with it.

One annoying thing is that the 10th column is a timestamp containing ':', but the field delimiter is also ':'. That makes a clean import into a database a bit of a hassle, as the file may need some preprocessing. I'll probably just do a find and replace, since all the timestamps seem to be 12:00:00 AM. The column holding the current employment is also problematic.


I haven't seen the data yet and there's likely a better way of doing this, but worst case - couldn't you just script replacing :00: with -00- (and so on through :59:)? Then, if you wanted, replace ':' with a better delimiter and finally change the -00- back to :00:.

I'm assuming there's a better way with regex, but I'm not great with regex. You could probably find it with a few minutes of Googling though: just replace the pattern :##: with -##- using numeric wildcards.
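For what it's worth, here is a minimal sketch of that regex idea in Python. It assumes the timestamps look like `12:00:00 AM` (as described upthread); the function name `split_fields` and the sample line are made up for illustration and are not taken from the actual file.

```python
import re

# Assumption: timestamps look like "12:00:00 AM" / "1:05:30 PM".
# The lookahead only accepts times that are followed by AM or PM.
TIME_RE = re.compile(r"\b(\d{1,2}):(\d{2}):(\d{2})(?= ?[AP]M\b)")
# Inverse pattern, used to put the colons back after splitting.
PROTECTED_RE = re.compile(r"\b(\d{1,2})-(\d{2})-(\d{2})(?= ?[AP]M\b)")


def split_fields(line, delimiter=":"):
    """Split a delimited line without breaking embedded clock times."""
    # Step 1: temporarily turn 12:00:00 into 12-00-00.
    protected = TIME_RE.sub(r"\1-\2-\3", line)
    # Step 2: split on the real field delimiter.
    fields = protected.split(delimiter)
    # Step 3: restore the colons inside each timestamp.
    return [PROTECTED_RE.sub(r"\1:\2:\3", field) for field in fields]


# Hypothetical sample line, not real data.
sample = "a:b:1/1/2000 12:00:00 AM:c"
fields = split_fields(sample)
# Re-join with a delimiter that does not occur in the data, e.g. a tab.
tab_separated = "\t".join(fields)
```

This does the protect, split, restore steps in one pass per line, so there's no need for sixty separate find-and-replace runs. It won't help with other columns that contain stray ':' characters, such as the employment field mentioned above.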


Looking for this as well.



