A database with 3.8B phone numbers from Clubhouse is up for sale

deliberateJack · on July 24, 2021

I am selling a database with ten billion phone numbers: a 1.25 GB file with each number compressed to a single bit. You can compare the Clubhouse database against mine to determine which numbers are not in their set.
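Taking the joke literally: one bit per possible 10-digit number is a 10^10-bit bitmap, i.e. 1.25 GB, and "which numbers are not in their set" is a bitwise AND-NOT between two bitmaps. A minimal sketch, assuming numbers are already mapped to integers in [0, universe) and both bitmaps share the same universe; the `PhoneBitset` class is hypothetical and the demo uses a tiny universe:

```python
class PhoneBitset:
    """One bit per possible number in [0, universe); a set bit means 'present'."""

    def __init__(self, universe: int) -> None:
        self.universe = universe
        # For universe = 10**10 this is 1_250_000_000 bytes, i.e. the 1.25 GB file.
        self.bits = bytearray((universe + 7) // 8)

    def add(self, number: int) -> None:
        if not 0 <= number < self.universe:
            raise ValueError(f"{number} is outside the universe")
        self.bits[number >> 3] |= 1 << (number & 7)

    def __contains__(self, number: int) -> bool:
        if not 0 <= number < self.universe:
            return False
        return bool(self.bits[number >> 3] & (1 << (number & 7)))

    def missing_from(self, other: "PhoneBitset"):
        """Yield numbers present here but absent in `other` (self AND NOT other)."""
        for byte_index, (here, there) in enumerate(zip(self.bits, other.bits)):
            diff = here & ~there & 0xFF
            while diff:
                lowest = diff & -diff  # isolate the lowest set bit
                yield byte_index * 8 + lowest.bit_length() - 1
                diff ^= lowest

# Tiny demo universe instead of 10**10.
mine = PhoneBitset(100)
for n in range(100):
    mine.add(n)
theirs = PhoneBitset(100)
for n in (3, 14, 15, 92):
    theirs.add(n)
missing = list(mine.missing_from(theirs))
print(len(missing))
```

Since "mine" has every bit set, the difference is simply the complement of "theirs", which is the whole joke.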

Scoundreller · on July 24, 2021

Knowing which numbers are capable of receiving SMS and which aren't has some value.

Especially in a world of number portability where you can't just say "oh, that's an old number, it must be POTS".

But I guess, here, if a number is from your contact list, it may still be POTS.

But at least you have higher assurance that it's an active user. If you wardial one day, you quickly find out how many numbers never lead to a human for various reasons. In theory, some of these are trap numbers and quickly flag the caller as suspicious, but I doubt it.

rsync · on July 25, 2021

"Knowing which numbers are capable of receiving SMS and which aren't has some value."

This isn't difficult - I wrote a shell script named "lookup" that will give me background info for any phone number I feed it and tell me what kind of number it is, what carrier it is, who it belongs to, etc.:

  # lookup 415-333-2222

  {"caller_name": {"caller_name": "WIRELESS CALLER", "caller_type": null, "error_code": null}, "country_code": "US", "phone_number": "+14153332222", "national_format": "(415) 333-2222", "carrier": {"mobile_country_code": "311", "mobile_network_code": "489", "name": "Verizon Wireless", "type": "mobile", "error_code": null}, "add_ons": null, "url": "https://lookups.twilio.com/v1/PhoneNumbers/+14153332222?Type=carrier&Type=caller-name"}

... which is very useful since I often send (personal) SMS from the command line and sometimes I need to know if a number can receive it ...

I'm not going to paste the entire script here but the meat of it is:

  /usr/local/bin/curl -X GET "https://lookups.twilio.com/v1/PhoneNumbers/$number?Type=carrier&Type=caller-name" -u "$accountsid:$authtoken"

... and each lookup costs a penny or a half a penny or something ... I forget ...
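For the "can it receive SMS" question, a minimal sketch of reading such a lookup response in Python. The field names (`phone_number`, `carrier.name`, `carrier.type`) are copied from the sample output above; treating `type == "mobile"` as "can receive SMS" is only a heuristic assumption, since VoIP numbers and some landlines can take texts too. The helper name `carrier_summary` is hypothetical, and the sample is a trimmed copy of the response shown above.

```python
import json

def carrier_summary(raw: str) -> tuple[str, str, bool]:
    """Return (E.164 number, carrier name, looks_mobile) from a lookup response.

    'mobile' is only a heuristic for SMS capability (an assumption).
    """
    data = json.loads(raw)
    carrier = data.get("carrier") or {}  # carrier may be null/missing
    return (
        data["phone_number"],
        carrier.get("name", "unknown"),
        carrier.get("type") == "mobile",
    )

sample = (
    '{"phone_number": "+14153332222", '
    '"carrier": {"name": "Verizon Wireless", "type": "mobile", "error_code": null}}'
)
print(carrier_summary(sample))
```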

EGreg · on Aug 2, 2021

How would your script obtain this information though? Relying on twilio?

rospaya · on July 24, 2021

In some countries mobile phone numbers have a prefix so you know by that.

gsich · on July 24, 2021

Also, some POTS providers will accept SMS and either read it to you or let you read it in a web portal (or possibly on the router).

simfree · on July 24, 2021

The Local Routing Number provides this value in the USA, and multiple carriers (e.g., Twilio) offer daily deactivation reports from the cellular carriers so you can tell which numbers are unroutable.

Scoundreller · on July 25, 2021

Canada isn't as progressive. Only telecoms can see which telecom a number points to and for the purpose of call-routing only.

fisherjeff · on July 24, 2021

Great. It’s the weekend and I can theoretically now stop thinking about software, and yet here I am thinking of ways to efficiently compress lists of phone numbers

perihelions · on July 24, 2021

There was a thread about that last month,

https://news.ycombinator.com/item?id=27549075 ("Sorted Integer Compression")

fisherjeff · on July 24, 2021

The rabbit hole deepens…

quchen · on July 24, 2021

Just enumerate them all; if none is missing, it's fairly easy to compress. (And 1 bit per number is really inefficient.) ;-)

main = mapM_ print [1..99999999]

WJW · on July 24, 2021

The Kolmogorov complexity of the set of all phone numbers is pretty low. All phone numbers with a few missing is also pretty low.

In fact, I now wonder if you can even compress the 3.8B phone number set to less than 1 bit per phone number. It should be pretty doable, since a significant chunk of the number space is not valid.
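One way to put a number on this: a set of K numbers drawn from a universe of N possible numbers needs at least log2 C(N, K) bits if nothing else is known about it. A sketch, assuming the universe is all 10^10 ten-digit numbers and K = 3.8B:

```python
import math

def log2_binomial(n: int, k: int) -> float:
    """log2 C(n, k), computed with log-gamma so a huge n does not overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(2)

N = 10**10      # assumed universe: every 10-digit number
K = 38 * 10**8  # size of the claimed set (3.8B)

bits = log2_binomial(N, K)  # lower bound for an arbitrary K-subset of N
per_slot = bits / N         # bits per possible number (the plain bitmap spends 1.0)
per_member = bits / K       # bits per number actually in the set
print(f"{bits / 8e9:.2f} GB, {per_slot:.3f} bits/slot, {per_member:.2f} bits/member")
```

Under those assumptions this comes out to roughly 0.96 bits per possible number (about 1.2 GB, barely below the 1.25 GB bitmap) but about 2.5 bits per number actually in the set. Getting well under that requires exploiting structure, such as excluding unassigned or invalid ranges, which shrinks N.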

dillondoyle · on July 24, 2021

But not all numbers are valid? 911. Not all area codes exist.

luckman212 · on July 24, 2021

What language is that?

WJW · on July 24, 2021

Haskell

H8crilA · on July 24, 2021

Presumably all non-american ones are not on your list?

gihtas · on July 25, 2021

How much?

saiya-jin · on July 24, 2021

I have even better: for every country, just cover all of its operators' prefixes and then the 99999-9999999 numbers in each range. Definitely the biggest dataset around, and bigger is always better, right?

astatine · on July 24, 2021

The 3.8B numbers are really meaningless in isolation. This is the problem of plenty: 10K numbers with a very specific profile might be a lot more valuable. The real worry would be info on the relationships between the numbers (which number is connected to whom). This leak seems to have a count of relations rather than the actual connections.

axegon_ · on July 24, 2021

Well, the Facebook data that was published everywhere earlier this year could hold some value when combined with this one. While the Facebook data is somewhat outdated, I'm pretty sure you'd get millions of people with relevant and up-to-date information.

ttam · on July 24, 2021

https://twitter.com/UnderTheBreach/status/141888964970820813...

this tweet says it's BS (they validated the Japan sample)

PragmaticPulp · on July 24, 2021

According to the Tweet, the leaker provides a claimed data sample that is a list of phone numbers without any additional information.

A list of 3.8 billion phone numbers that simply exist is useless. The leak would only have value if the numbers were associated with some identifying information.

If it’s really only phone numbers, I wonder if it’s a leak or if someone brute-forced all possible phone numbers against a Clubhouse API that leaked information about whether or not the number existed in their database.

sebmellen · on July 24, 2021

If Clubhouse can’t detect >3.8B erroneous requests and shut down that API/microservice, that destroys my confidence more than a data breach.
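For scale, even a simple per-client token bucket makes that kind of enumeration slow. A hypothetical sketch (nothing here is based on Clubhouse's actual API; the class, rate, burst size, and client key are assumptions), with an injectable clock so it can be tested:

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-client token bucket: refills `rate` tokens/second, holds at most `burst`."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock                               # injectable for testing
        self.tokens = defaultdict(lambda: float(burst))  # new clients start full
        self.last_seen: dict[str, float] = {}

    def allow(self, client: str) -> bool:
        now = self.clock()
        elapsed = now - self.last_seen.get(client, now)
        self.last_seen[client] = now
        # Refill for the time since this client's previous request, capped at burst.
        self.tokens[client] = min(self.burst, self.tokens[client] + elapsed * self.rate)
        if self.tokens[client] >= 1:
            self.tokens[client] -= 1
            return True
        return False  # caller should reject (e.g. HTTP 429) and could log the client

fake_now = [0.0]
limiter = TokenBucket(rate=1.0, burst=5, clock=lambda: fake_now[0])
first = [limiter.allow("203.0.113.7") for _ in range(8)]
print(first)  # five allowed, then three rejected
```

Keying on a single client is only a first line of defence; an enumerator spreading requests across many IPs or accounts would also need aggregate monitoring.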

mohanmcgeek · on July 24, 2021

Clubhouse didn't have 3.8B users, so why would they have 3.8B phone numbers?

This whole thing seems made up.

mcintyre1994 · on July 24, 2021

Because they encourage users to upload their contacts so they can connect them on the platform. At one point when it was invite-only these uploaded contacts were the only way to invite friends.

makapuf · on July 24, 2021

A fair share of my phone numbers are bogus (old numbers, info I store as a phone number even if it's not one), so the DB extracted from here would be dubious.

jsjohnst · on July 24, 2021

Last I heard, they had around 10M users. Since they employ what I would consider a dark pattern, heavily encouraging folks to upload their contact list, that comes out to an average of 380 people per person. Given the Clubhouse user base demographics, I find this at least plausible.

jimkleiber · on July 24, 2021

I'd say it's even more of a dark pattern than that. They didn't encourage me to "upload my contact list" but rather to "give access to my contacts" (or something like that). Perhaps the difference is trivial in how it's coded, yet even though I've removed their access to my contacts, they still have my contacts. I think they should have to delete them whenever I remove their access, or not even upload them in the first place but just read them when necessary.

Also, some apps seem to do this with photos, asking for access, does anyone know if these apps also upload all of one's photos once the user grants permission on iOS?

d110af5ccf · on July 25, 2021

> does anyone know if these apps also upload all of one's photos once the user grants permission on iOS

That would eat up a lot of bandwidth. I suspect someone would notice it. An app could extract a lot of information from the metadata though, assuming it had access (I'm not sure how permissions on iOS work currently). It could also potentially run facial recognition algorithms locally (not sure how well that would work in practice though).

jimkleiber · on July 25, 2021

I really like that point about the bandwidth and also about the metadata and facial recognition.

I guess I just wish we had more insight into what info companies take and how. Permissions on iOS and Android seem to be getting more granular, yet they still seem quite broad to me.

jsjohnst · on July 25, 2021

I’m particularly fond of iOS’s new “selected photos only” setting, but apps really don’t support it well in general (so I chose not to use them anymore). Instagram used to support it decently well, but in a recent update they removed the “select more” button and my usage of Instagram has dropped dramatically since.

jimkleiber · on July 25, 2021

I mean, I like it in theory, however I find it can be really cumbersome. I don't see why they can't just have me open my "pick a photo" browser on iOS without needing access to the photos. Seems odd that choosing photos from the OS can't just be the default option.

jsjohnst · on July 26, 2021

When an app first requests access to photos, it’s one of the options listed in the system permissions dialog, so it’s virtually the default. The problem isn’t that, it’s that once you’ve picked the “selected photos only”, apps can choose to make it a pain to pick additional photos if they don’t add a UI element for it. Given that Instagram had it before and then removed it, I can only assume that the real reason is to try to coerce users into granting all access (nice try FB, but not going to happen for me!).

jimkleiber · on July 26, 2021

Oh wow I didn't know this. From what I see on iOS, IG still lets me Manage>Select more photos, whereas WhatsApp has a tiny "You've given WhatsApp access to only a select number of photos. Manage" at the top.

So now I've set all to Selected Photos and will just click manage and add extra photos when I need them. So much easier than I had thought, thank you!!

jsjohnst · on July 26, 2021

> From what I see on iOS, IG still lets me Manage>Select more photos

Weird! That option has been missing from mine for a few weeks now when doing a normal post. The Stories picker gives me the option to “Manage”, but nowhere can I find the option for normal posts as of the last app update. Would you mind sharing a screenshot? I’d love to see if our UIs are different in some way. My contact info is in my profile here if you prefer to share privately.

jimkleiber · on July 26, 2021

Ohhh, no, I hadn't looked there. I just checked my normal posts function and it also does not let me "manage photos".

Where I originally found it was in the messaging feature of IG.

jsjohnst · on July 27, 2021

Hadn’t noticed it was in messaging still. Guess that’s another avenue to add more selected photos. Really b/s on them imho.

acid__ · on July 24, 2021

That would only be true if it were 380 _unique_ contacts per person. Surely there is significant overlap from user to user.
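A toy illustration of the overlap point, using made-up placeholder numbers: a naive dump of uploaded address books has one row per (uploader, contact) entry, so the row count can far exceed the number of distinct phone numbers, and the per-number counts double as a crude popularity score.

```python
from collections import Counter

# Hypothetical uploaded address books: uploader -> numbers in their contacts.
uploads = {
    "alice": ["+12025550101", "+12025550102", "+12025550103"],
    "bob": ["+12025550101", "+12025550102", "+12025550104"],
    "carol": ["+12025550101", "+12025550105"],
}

rows = [n for book in uploads.values() for n in book]  # what a naive table stores
distinct = set(rows)                                     # SELECT DISTINCT number
times_seen = Counter(rows)                               # address books containing each number

print(len(rows), len(distinct))   # 8 rows, 5 distinct numbers
print(times_seen.most_common(1))  # [('+12025550101', 3)]
```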

jsjohnst · on July 25, 2021

See my reply to sibling comment here: https://news.ycombinator.com/item?id=27949879

whatch · on July 24, 2021

Shouldn't it be 380 distinct people?

jsjohnst · on July 25, 2021

Not necessarily. Do we know every single number in the 3.8B is unique? I’ve seen zero proof of that, but maybe I missed it.

sellyme · on July 26, 2021

I'm pretty sure that would qualify as the number being "made up".

If anyone disagrees, I'm happy to sell my database of 100B valid phone numbers.

jsjohnst · on July 26, 2021

> I'm pretty sure that would qualify as the number being "made up".

Not necessarily. Let me give you an example: if there’s other metadata included with a specific contact list entry, it would be valuable to have duplicate numbers, as that extra metadata could then potentially be leveraged.

mm983 · on July 24, 2021

They didn't "validate" anything; they just opened the CSV. Also, I'd be interested in their take on the second column, which looks like Clubhouse's scoring system (which they ran without telling anyone, likely for marketing purposes, according to this* article). If so, you can in fact tell which numbers are more significant than others.

*https://futurezone.at/apps/clubhouse-leakt-38-milliarden-tel...

zinekeller · on July 24, 2021

Hmm, so the "highest" numbers would be publicly-knowable numbers anyway (because they are the numbers to dial and contact the government/customer service of a private company).

If this is only a list of numbers and their relative popularity, the best you can do is accuse someone of adultery (and even then, you could say that you're "popular" because coworkers also store your number).

FabianBeiner · on July 24, 2021

https://zerforschung.org/posts/clubhouse-telefonnummern-en/

koolba · on July 24, 2021

They should combine it with that zero click remote iMessage bug. That’d be some serious black hat marketing synergy.

anigbrowl · on July 24, 2021

Enough phone numbers for half the population of the world? Cool story, bro.

I refer here to the aspiring salespeople, not the person reporting it. I suspect this list will be available for free on the dark web within a couple of months. Much as I like to collect interesting data this doesn't seem useful.

paxys · on July 24, 2021

I wonder how feasible a business model it is to collect all the data from all leaks which make their way to the internet, massage the data a little bit, and sell it as a brand new "hack" of some popular service. You can probably do this a few times a year without a problem.

d110af5ccf · on July 25, 2021

Why fake a new data leak at all? It's likely to be illegal either way. Depending on the quality of your work I suspect it would be easy to find buyers for aggregated and cross validated data sets on the black market.

For that matter, I have to assume that the shadier businesses silently make use of publicly available leaks. The data is just too valuable to ignore depending on your business model.

mam3 · on July 24, 2021

Billions?? On Clubhouse?

mcintyre1994 · on July 24, 2021

Clubhouse does the classic “share your contacts with us to find your friends here” thing, but it sounds like they just upload your entire list into their database instead of doing anything remotely privacy aware. I’m mostly curious how much else they uploaded with the numbers - is this name + number + email etc? And if this dump is just numbers, do Clubhouse have the rest somewhere else?

FabianBeiner · on July 24, 2021

According to the screenshot: All members plus every single number in each of their phone books.

oliv__ · on July 24, 2021

Even if they had 10M users (which I doubt), at 100 contacts per user that's 1B contacts.

BatteryMountain · on July 24, 2021

They forgot to "select distinct"?

chovybizzass · on July 24, 2021

It includes every user's contact list from their phone, so likely damn near everyone on the planet with a cell phone.

coldcode · on July 24, 2021

Are people really stupid enough to give some mobile app company access to their contact list? On iPhone you have to explicitly give permission, and I presume on Android as well. I find it hard to believe everyone is doing it.

hdjjhhvvhga · on July 24, 2021

Many apps will refuse to work if you don't allow access to your contacts, so people just give in and allow it.

Google is the biggest abuser in this area, just grabbing all your contacts and linking them to your Google account once you add any Google account (like Gmail or YouTube) to your Android device.

user-the-name · on July 24, 2021

I do not think you are allowed on the Apple App Store if you do that.

capableweb · on July 24, 2021

Maybe not for smaller apps but apps with large user bases are under different rules than the rest.

user-the-name · on July 24, 2021

I work for a fairly large app and that certainly is not the case for us.

alisonkisk · on July 24, 2021

What are you talking about?

flemhans · on July 24, 2021

It's extremely annoying to add a number to Telegram without adding it as a contact first, and allowing Telegram access to the contact list.

noxer · on July 24, 2021

What's the point of that? You don't need to exchange phone numbers for Telegram, just the @username, and only one side needs to know the other's username.

And once you have a chat with someone both can share their own contact directly in the chat with 2 clicks and add it with 2 clicks as well.

(which is still rather useless because there is no real benefit from adding someone as a contact. But I guess if you want to store the number, this is easy)

b3morales · on July 24, 2021

You're thinking like a technically enlightened person -- if not an engineer -- who prioritizes efficiency and control.

You're not thinking like a "normie" goal-oriented user, who doesn't care about understanding the system, and for whom the shortest path to achieving their goal generally passes through saying "sure, whatever" to any requests the app makes.

noxer · on July 25, 2021

Maybe read the message I replied to before making such a nonsensical reply that has absolutely nothing to do with what I asked the person above.

flemhans · on July 26, 2021

I can't figure out how to message someone with their @username.

I press "New Message" and get these options: "New Group", "New Contact", "New Channel".

If I press "New Contact" it asks for permission to access my contacts. If I refuse, it goes back to the three options mentioned above.

I've tried to find it for years, and I just have the other person message me first.

noxer · on July 26, 2021

You put the @username (the @ is optional) in the search field, then click on the user and send a message. Once a message is sent, the chat will stay in the chat list. Same for public groups/channels.

Alternatively you can share/click a link with the format t.me/username to skip the search part.

nemothekid · on July 24, 2021

>Are people really that stupid to give some mobile app company access to their contact list?

Almost every social media startup in the last 15 years was bootstrapped this way.

alpaca128 · on July 24, 2021

AFAIK WhatsApp (on Android at least) requires you to give access to your contacts. So roughly speaking a huge chunk, probably the majority, of smartphone users have shared their contact list with at least one company, which strictly speaking might not even be legal in many cases.

After all, that's how WhatsApp populates its contact list: it looks at which users have each other's phone numbers. That way it doesn't need a user login and friend/contact requests, but in return you give up your privacy.
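A sketch of the matching described above, assuming (as the comment says) that the server holds every user's address book keyed by the user's own number. This is a hypothetical reconstruction of the idea, not WhatsApp's actual implementation; `mutual_contacts` and the placeholder numbers are made up.

```python
def mutual_contacts(books: dict[str, set[str]]) -> set[frozenset[str]]:
    """Pairs of registered users who each have the other's number saved.

    `books` maps a registered user's own number to the numbers in their address book.
    """
    pairs = set()
    for owner, contacts in books.items():
        for number in contacts:
            # Match only if the contact is registered and has the owner saved too.
            if number != owner and number in books and owner in books[number]:
                pairs.add(frozenset((owner, number)))
    return pairs

books = {
    "+12025550101": {"+12025550102", "+12025550103"},
    "+12025550102": {"+12025550101"},
    "+12025550103": {"+12025550104"},  # +12025550104 never signed up
}
pairs = mutual_contacts(books)
print(pairs)
```

Note that this only works because the server sees every uploaded address book, including the numbers of people (like +12025550104 here) who never signed up.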

wngr · on July 24, 2021

Not true. It'll work without, it's just very inconvenient.

CapitalistCartr · on July 24, 2021

Everyone doesn't have to. If one person with your number gives up their contact list, they have yours. I'd guess about 10-12% of the populace would have to cooperate.
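The arithmetic behind this: if your number sits in c address books and each holder independently uploads theirs with probability f, the chance that at least one copy leaks is 1 - (1 - f)^c. A sketch, assuming independence and c = 100 (the per-user contact count assumed elsewhere in this thread); `exposure` is a hypothetical helper name.

```python
def exposure(upload_fraction: float, books_containing_you: int) -> float:
    """Chance that at least one address book holding your number was uploaded,
    assuming each holder uploads independently with probability `upload_fraction`."""
    return 1 - (1 - upload_fraction) ** books_containing_you

for f in (0.01, 0.05, 0.10):
    print(f, round(exposure(f, 100), 4))
```

With c = 100, even f = 1% gives about 63% exposure and f = 10% is effectively certain, so under these assumptions a much smaller share than 10-12% would already expose most people.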

FabianBeiner · on July 24, 2021

That was what made Clubhouse so famous:

"After registering, the clubhouse app asks for access to your address book. This must be granted if you want to invite friends."

jbverschoor · on July 24, 2021

I have no idea how that went through the Apple checks

codetrotter · on July 24, 2021

It must be granted to invite friends but you can deny it access and still use Clubhouse, just that until you grant access you can’t invite others.

codetrotter · on July 24, 2021

Actually now that I look into it again, it looks like since the middle of March of this year it's even possible to invite others without sharing your phonebook.

https://www.blogher.com/social-media/clubhouse-invite-withou...

https://www.gizchina.com/2021/03/16/clubhouse-new-update-use...

ipaddr · on July 24, 2021

I keep no contacts on my phone and gladly give that info away. I'm surprised people don't use multiple phones for privacy.

eclipxe · on July 24, 2021

Most people don’t care about privacy.

patja · on July 24, 2021

Based on the popularity of WhatsApp, yes most people don't give it a second thought.

Bjartr · on July 24, 2021

Yes, constantly.

sneak · on July 24, 2021

justinclift · on July 24, 2021

Yeah, not sure either. Suspecting it's some other Clubhouse, not the main (project planning) one (https://clubhouse.io).

SahAssar · on July 24, 2021

It's the audio chat one: https://www.joinclubhouse.com/

justinclift · on July 24, 2021

Thanks, that makes more sense. :)

afrcnc · on July 24, 2021

It's fake: https://twitter.com/troyhunt/status/1419013520763539457

Sebguer · on July 25, 2021

It's not fake, it's just not valuable.

Edit: Or rather, the tweet doesn't claim it's fake, just that it's not valuable.

afrcnc · on July 25, 2021

No. It's fake. Even Clubhouse said so. It's just randomly generated data.

robertwt7 · on July 24, 2021

How does it work for the seller when the FBI is the one who ends up buying that list and then busts him at the auction?

Genuinely asking; might be a dumb question.

vmception · on July 24, 2021

If the seller gets caught, that is how it works.

If the seller doesn’t get caught due to the purchasing methods and general routine OPSEC, then it's just another example of the Fed reliably monetizing everything, meaning there will always be a buyer and everyone should sell more.

dmitriid · on July 24, 2021

That's what law enforcement does all the time: when there are illegal goods for sale, and a chance to catch the seller, they will go in, make the purchase and arrest the seller.

finger · on July 24, 2021

Sorry for the stupid question, but isn’t it illegal to buy illegal stuff? How do the police get away with that?

For instance, in Denmark it is technically illegal to buy stolen goods, even if you genuinely aren’t aware of them being stolen. I'm sure this applies to most countries.

zenexer · on July 24, 2021

LEOs often seem to be exempt when acting in an official capacity. I’m not sure what the restrictions are—do they need a court order in a situation like this?—but LEOs are definitely allowed to break laws and buy illegal wares.

noxer · on July 24, 2021

Illegal is defined by law, and laws apply to a subset of people. What do you think the police do with illegal substances? Not confiscate them because "owning" them is illegal? No, the police do not take ownership; the state does, and the laws do not apply to the state. There is nothing out there in the world that is illegal for everyone to handle: not drugs, not nukes, not illegal media, etc. Someone has to have the right to handle it somehow.

dmitriid · on July 24, 2021

This differs from country to country. There's some info on Wikipedia: https://en.wikipedia.org/wiki/Sting_operation?wprov=sfti1

noxer · on July 24, 2021

This would not be a classic sting operation. The seller already committed the crime(s) by offering it. Sting operations usually give someone the chance to commit a crime by creating a bait crime opportunity.

rsa25519 · on July 25, 2021

You're describing entrapment

unnouinceput · on July 24, 2021

Let's play devil's advocate here and assume I am the dude selling the list.

I would ask for Monero and would not care if the FBI is the buyer. The most they can do is watch exchanges where Monero is traded for dollars or other cryptocurrencies. Then do this a few times over, start buying goods with those coins, and sell the goods on Amazon/eBay for hard $$$. Keep the amounts small; even at 50 cents on the dollar, it's still worth it for one person.

sennight · on July 24, 2021

I've wondered about the feasibility of using state run lotteries for laundering in a cash based criminal enterprise. The known odds of low cost/return scratch-offs and the need to only account for claimed winnings would make it tempting... if it wasn't so labor intensive.

clavigne · on July 24, 2021

I don't think it would be a good idea, given that you'd have to claim the winnings. It might work once or twice but not over and over again.

Additionally in most cases I'd think the lottery odds would be lower than the cost of traditional laundering (smurfing, through crooked banks, using cash based businesses like taxis etc.) Especially if you have to pay people to buy tickets.

sennight · on July 24, 2021

> It might work once or twice but not over and over again.

Except for when it does: there are a bunch of people who have repeatedly jackpotted state lotteries, they're usually described as 'reclusive mathematicians'. But that isn't what I'm talking about. I just checked the TX Lottery Commission's site and it looks like scratchoffs would run, worst case, a 30% return. I can't be bothered to calculate the upper bounds, but I'd expect it to be 40%-ish. That seems good to me, I especially like that you can skip the part where you have to drive out to some hotel to meet an undercover Secret Service agent pretending to be a Wells Fargo employee responding to your help wanted notice in Soldier of Fortune.

Aeolun · on July 24, 2021

Isn’t it great that a lot of high-tech crime is prevented by the people capable of it being too lazy to bother?

sennight · on July 24, 2021

I learned a long time ago that the most effective way to correct a vice is to play it against another vice, sloth being an easy go-to. But in this case... I'm not a drug dealer, so I don't need to launder large amounts of small bills. But... if I wanted to launder a bunch of public-ledger-based crypto, instead of using a loud and proud "bitcoin tumbler", I'd use something like satoshibet. Of course, that is likely why the original no longer exists, and I imagine anyone standing up a replacement (without a sufficiently invasive KYC implementation) would face similar hostility. Anyway, I expect that'll change when a state-run satoshibet eventually emerges.

edoceo · on July 24, 2021

Can't go wrong with Quick Pick.

ptr2voidStar · on July 24, 2021

Checkmate.

michelb · on July 24, 2021

How realistic would it be to send (anonymous) mass SMS messages with phishing or other malicious links to those numbers? I’m occasionally getting SMS messages with bogus sender info (I cannot reply, nor get contact info) and always wonder how spammers pull that off so easily.

Scoundreller · on July 24, 2021

As a challenge, I try to take down these things by reporting them to Google Safe Browsing, their SSL provider (if they have one), their host, their URL shortener, etc.

Though in Canada, I'm seeing them apply some cloaking measures so they don't get removed as quickly.

I think there's two streams of this:

1. a crooked telecom that has low-level access

2. buy a bunch of SIM cards and dump them into one of these aliexpress machines that has 16 wireless modems in them that let you do whatever you want:

https://www.aliexpress.com/item/4000462982086.html

Can even network them to a bank thingy that'll hold 128 cards:

https://www.aliexpress.com/item/4000462976225.html

agumonkey · on July 24, 2021

Ah I wonder if that's related to the bot flood I got recently.

TechBro8615 · on July 24, 2021

I’ve been getting this since the FB hack (by “hack” I mean the recent bulk enumeration of 500m phone numbers that Facebook facilitated for an unknown party).

fabiandesimone · on July 24, 2021

Hey @fabianbeiner how can I get in touch with you?

stackedinserter · on July 24, 2021

Is clubhouse still a thing in July 2021? How do you use it? (and who do you talk to?)

ALittleLight · on July 24, 2021

It's funny how the hacker who is selling stolen private data is also complaining about GDPR compliance and privacy. On the one hand, he's right that Clubhouse (if this is true) has done something bad, but the hacker is much worse.

mm983 · on July 24, 2021

They are done for this time. Leaking the numbers of people who haven't even signed up yet, because of their bare-minimum-effort approach to literally everything, oh boy...

qpiox · on July 24, 2021

If you have enough cash and time, you can legally create your own list of all possible numbers in the world. Pick a number, dial it, and see if it exists. Hang up to prevent further charges.

jsjohnst · on July 24, 2021

> create your own list of all possible numbers in the world. Pick a number, dial it, and see if it exists.

Let’s say you had the ability to do that 1,000x a minute using an automated dialer. Just in the US alone that would take you over a year to complete, and how many of the numbers you verified would have changed active/disconnected status during that time?

(PS, I didn’t downvote you, just pointing out a problem with your theory)
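Checking the arithmetic, assuming US numbers follow the NANP pattern NXX-NXX-XXXX (N = 2-9, X = 0-9). That pattern also covers Canada and parts of the Caribbean and ignores reserved and unassigned codes, so it is only an upper bound:

```python
# Upper bound on NANP numbers of the form NXX-NXX-XXXX (N = 2-9, X = 0-9).
# Ignores N11 service codes, reserved area codes and unassigned exchanges.
area_codes = 8 * 10 * 10
exchanges = 8 * 10 * 10
lines = 10**4
candidates = area_codes * exchanges * lines  # 6.4 billion

calls_per_minute = 1_000
minutes = candidates / calls_per_minute
years = minutes / (60 * 24 * 365)
print(f"{candidates:,} numbers -> {years:.1f} years at {calls_per_minute:,}/min")
```

That is about 12 years for the full pattern; restricting it to area codes actually in service shortens this but still leaves years of dialing, consistent with "over a year".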

riffic · on July 25, 2021

You've invented wardialing

https://en.wikipedia.org/wiki/Wardialing