Fake WhatsApp update from “WhatsApp Inc.” with Unicode whitespace: 1M downloads (twitter.com/virqdroid)
494 points by jakub_g on Nov 4, 2017 | 210 comments



Several years ago I read [1] a proposal for internationalized domain names from DJ Bernstein from before Punycode took hold. The key observation was that there's nothing stopping you from just using UTF-8 in the existing DNS protocols, but it also included a discussion of how to treat visually indistinguishable Unicode characters to prevent fraud, which is why I bring it up now:

The proposal was that the TLD administrators should whitelist in non-ASCII characters and generally require that domains are either entirely ASCII or entirely in a subset of Unicode that made sense for their native languages: .ru could allow all-ASCII or all-Cyrillic, .gr could require all-ASCII or all-Greek, .de could allow ASCII plus eszett and the umlauts, and could further require normalized encoding (ü must be the single code point U+00FC, not u followed by the combining diaeresis U+0308) and consider ö.de and oe.de to be collisions [2], and so on. Weird varieties of spaces, dashes, non-printing characters, accents that are only needed to type Klingon, and so on would never get whitelisted in.

I've always thought that was a great idea, and it's a general principle app stores could use too. (Although I realize that app stores don't have as strong a concept of a native language as most TLDs do, which makes it a bit harder.)

[1]: It's possible it was https://cr.yp.to/djbdns/idn.html, but I'm not convinced. Maybe it was an earlier revision.

[2]: In German, you spell "ö" as "oe" if you don't have an ö key. German speakers wouldn't necessarily need "ö" and "o" to be collisions.
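A minimal Python sketch of the whitelist idea described in this comment. The policy table is hypothetical and illustrative, not any registry's real rules: .de accepts ASCII letters, digits and hyphens plus ä, ö, ü and ß, .ru accepts either an all-ASCII or an all-Cyrillic label, and unknown TLDs fall back to plain ASCII. Labels are assumed to be lowercased already, and non-ASCII characters are written as escapes so look-alikes can't hide in the source.

```python
import unicodedata

# ASCII letters, digits and hyphen ("LDH"), the classic hostname alphabet.
LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")
# Lowercase Russian Cyrillic letters U+0430 to U+044F, plus ё (U+0451).
CYRILLIC = {chr(c) for c in range(0x0430, 0x0450)} | {"\u0451"}

# Hypothetical per-TLD policy: a label is accepted only if ALL of its
# characters fall within ONE of the listed sets (no mixing between sets).
POLICIES = {
    "de": [LDH | set("\u00e4\u00f6\u00fc\u00df")],  # ASCII plus ä ö ü ß
    "ru": [LDH, CYRILLIC | set("0123456789-")],      # all-ASCII or all-Cyrillic
}

def label_allowed(label: str, tld: str) -> bool:
    # Require NFC: "ü" must be the single code point U+00FC,
    # not "u" followed by the combining diaeresis U+0308.
    if unicodedata.normalize("NFC", label) != label:
        return False
    chars = set(label)
    return any(chars <= allowed for allowed in POLICIES.get(tld, [LDH]))

def de_collision_key(label: str) -> str:
    # Fold umlauts and eszett to their ASCII spellings so that
    # ö.de and oe.de map to the same key and count as a collision.
    for src, dst in (("\u00e4", "ae"), ("\u00f6", "oe"),
                     ("\u00fc", "ue"), ("\u00df", "ss")):
        label = label.replace(src, dst)
    return label
```

With these rules, label_allowed("m\u00fcnchen", "de") passes while the decomposed spelling "mu\u0308nchen" is rejected, and de_collision_key makes "ö" and "oe" compare equal. A registry would check that key against every existing registration, not just pairwise.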


This is an awful idea that conflates country and language and adds another place where governments can marginalise minorities. Should administrators of North African TLDs be empowered to forbid Tifinagh characters?

Which character sets should be permitted in the .US TLD?

The general principle of all-ASCII or all-something-else is not bad though. It would prevent certain homograph spoofs.
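A rough Python sketch of the "all one script" rule. As a simplifying assumption it guesses a character's script from the first word of its Unicode name ("LATIN SMALL LETTER A" gives "LATIN"); a real check would use the Unicode Script property from UAX #24, which the standard library doesn't expose. Digits and the hyphen are treated as neutral.

```python
import unicodedata

# Digits and the hyphen may appear alongside any script (an assumption).
NEUTRAL = set("0123456789-")

def rough_script(ch: str) -> str:
    # Approximate the script from the first word of the Unicode name,
    # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC".
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def is_single_script(label: str) -> bool:
    # True if every non-neutral character appears to share one script.
    scripts = {rough_script(ch) for ch in label if ch not in NEUTRAL}
    return len(scripts) <= 1
```

This rejects "p\u0430ypal" (Latin mixed with a Cyrillic а), but an all-Cyrillic look-alike, or a Latin label using the Turkish dotless ı (U+0131, whose name also starts with LATIN), still passes, which is why it only stops certain spoofs.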


I'm not sure it would help that much. For example pop.ru and рор.ru look identical to my eyes in my current font.


Yes, it's not bad, but not great either. It would only help in certain situations (hello vs Ꮒello). It also becomes tricky if you permit Turkish, as you would have to allow mixing of ASCII and the dotted vs dotless i.

There are also good reasons to mix language in a name. US state abbreviations might be used to distinguish (e.g.) a diaspora community in Texas from their equivalent in Alaska.


Dotted and dotless i is a different issue, related to capitalization. Those are still Latin letters. Of course to support most European languages you have to be able to use ASCII and non-ASCII Latin letters together. That's not unique to Turkish.

Think of the Spanish word "cañon", for example. It's not four English letters and one Spanish letter, it's just five Latin letters (four of which are in ASCII).


There is a difference between ı and i in both cases. I don't know what you mean by 'related to capitalisation'.

The reason I raise Turkish specifically is that the similarity between the characters presents a potential homograph for a phishing attack in non-Turkish domains. (e.g. mıcrosoft.com). Characters with apparent diacritics are less vulnerable (e.g. öracle.com).


There was an example where еріс.com (whose domain is in the Ukrainian Cyrillic alphabet, and I think would be pronounced like "eris dot com") once got a security certificate that would be visually indistinguishable from one for epic.com. This was, thankfully, intended by its creator to demonstrate the vulnerability.

The certificate has expired now, and additionally, browsers now show it as https://www.xn--e1awd7f.com/ . Even HN rewrites it if I type it as a full URL in Unicode, actually.
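Python's built-in "idna" codec (which implements the older IDNA 2003 rules) reproduces the ASCII form that browsers fall back to. The Cyrillic letters are written as escapes so they can't be mistaken for Latin ones in the source.

```python
# е (U+0435), р (U+0440), і (U+0456), с (U+0441): Cyrillic letters that
# render like the Latin "epic" in many fonts.
spoof = "\u0435\u0440\u0456\u0441.com"

ascii_form = spoof.encode("idna").decode("ascii")
print(ascii_form)  # xn--e1awd7f.com

# The genuine domain is plain ASCII, so the codec leaves it unchanged.
genuine = "epic.com".encode("idna").decode("ascii")
print(genuine)  # epic.com
```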


same thing with google.com and google.com


Uh, you can just do two things. Prohibit mixing of character blocks. That is sane and would prevent many incidents.

Edit: As for homoglyph bundles, you just have to do it like EURid and create tables.

But the main protective layer would be the user interface, similar to how HTTPS is handled, either descriptively, with warning colors or both.
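A rough sketch of the homoglyph-table approach: fold each name through a table of look-alike characters and treat two names as colliding when they fold to the same "skeleton". The eight-entry table below is hypothetical and tiny; EURid's bundles and Unicode's confusables data (UTS #39) are far larger.

```python
import unicodedata
from typing import Iterable

# Hypothetical look-alike table mapping characters to a Latin skeleton.
HOMOGLYPHS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u0131": "i",  # LATIN SMALL LETTER DOTLESS I
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}

def skeleton(label: str) -> str:
    # NFC first so precomposed and decomposed forms compare equal,
    # then casefold and map each character through the table.
    label = unicodedata.normalize("NFC", label).casefold()
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in label)

def collides(candidate: str, registered: Iterable[str]) -> bool:
    # True if candidate folds to the same skeleton as an existing name.
    sk = skeleton(candidate)
    return any(skeleton(name) == sk for name in registered)
```

A registry would store the skeleton of every registered name and reject (or flag for review) any new name whose skeleton is already taken.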


I would say yes, otherwise we're stuck with enumerating and testing innumerable edge cases.


This is what happens today with international domains.

Different TLDs whitelist different code pages/codepoints as allowed under their domains. See e.g. https://www.verisign.com/en_IN/channel-resources/domain-regi... or https://eurid.eu/en/register-a-eu-domain/domain-names-with-s... and the linked https://eurid.eu/media/filer_public/8d/18/8d18473b-ed9b-4fba...


Yes. I don't recall the details but I think it's required by ICANN. We (Google Registry) support extended Latin and Japanese codebases, and since the registry software that we've written is Free Software, you can see our implementation, e.g. here's extended Latin: https://github.com/google/nomulus/blob/master/java/google/re...


Would that really help when .com is the most popular gTLD, followed by .org? Being international I'm not sure how you could restrict the charset here.


This is a horrendous idea.

What's to stop someone deciding that 0 and o can be mixed up, and banning that too? How would you handle gray areas? Determining what's fishy and what's not is not a matter of black and white. What if you have a German name and want to set up a site in India with ö.in? This idea creates more problems than it solves.

Policing/banning is never a good idea. Internet freedom is way more important than phishing attempts.

Let's ban everything because anything can be phished if you're smart enough.


Respectfully, I disagree with the absolutist view that policing and banning is /never/ a good idea. Phishing is rightfully unlawful, and it's wholly appropriate for the mechanisms of society to fight against it.

There is a balance to be made for sure, but in the case of both domain and app names, I'd argue that the harms from phishing outweigh the freedom of expression conferred by being able to register "Αpple.com". While the names of things do carry some expression, things have names primarily so you can tell them apart. It's a well-settled moral principle that we should have rules to preserve the utility of names as monikers, and that it's entirely possible to construct rules for that purpose that have a negligible effect on freedom of expression. If you disagree, then show me the developed country that decided it didn't need a trademark law.

Your hypothetical German expat in India is welcome to try to convince the Indian authorities that the ability to register können.in outweighs the value in preventing amazön.in from being registered. Maybe there are more Germans in India than I know, or maybe phishing causes less economic harm in India than it does in the US.

App store names are harder, because you don't have TLDs giving you a hint to what language(s) most of your users speak. But you can still disallow mixing of Latin, Greek and Cyrillic alphabets; you can still say that if you are going to use the crazy accents used in Vietnamese, you can't also use umlauts; you can still whitelist Unicode characters as needed, so that you don't have a dozen different spaces and dashes for no reason; you can still use other signals to give language hints. And as it turns out, most people running app stores have a large pile of money they can use to curate and maintain good automated rules, and hire people to manually audit names when the automated heuristics think it's fishy but not fishy enough to automatically disallow, or in response to complaints.


The explosion of TLDs shows they don't care about us. I cannot in good faith put trust in them. I don't know what a solution is but this is not it.


The explosion of TLDs was such a lucrative move, they couldn’t resist!

See, this is another problem. Once you have a phishing blacklist, some entity with big enough pockets will just start influencing the DNS system. It would be a slippery slope going straight to dystopia.


Thanks for a detailed response but I still think it’s a bad idea on principle as you’re bestowing authority to some organization that needs to be funded, organized, maintained and skilled to keep track of these rules.

Soon, it will become a giant mess and regulation will impede regular companies/people that would question “Who put these rules together? Our domain doesn’t happen to be fishy”.

Furthermore, how important of a name does it have to be for adding it to these automated rules? How do you quantify that?

Even in an ideal perfect world, I wouldn’t want to sacrifice internet freedom for phishing attempts. DNS is a huge part of the internet whereas phishing costs are negligible.

Every time someone brings an idea about regulating the internet, I just get a repulsive feeling. Stop trying to fuck with the internet. The EFF is not enough.

Aren’t Google Chrome and other browsers already doing this? Phishy websites get tagged and there is a big red screen that shows up - most importantly - allowing users to bypass it if needed. There are times when legit but old websites get banned by these Phishing blockers. I think the solution needs to be more “local” than something on a grand global scale.


It is certainly not a horrendous idea; it is commonly used for serial numbers in engineering. For example a part number would have a serial number that is a subset of A-Z, skipping the easily ambiguous O,I and similar.


Related: https://pypi.python.org/pypi/idgen/0.0.1

That's one way to ensure safety of domain names, at the cost of mnemonics ;)


Say, for example, the base 58 character set used by bitcoin.
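A short Python sketch of Base58 encoding with Bitcoin's alphabet, which is the same "skip the ambiguous characters" trick used for part numbers: it leaves out 0, O, I and l. This is an illustrative implementation, not code from any particular library.

```python
# Bitcoin's Base58 alphabet: digits and letters minus 0 (zero), O (capital o),
# I (capital i) and l (lowercase L), which are easy to confuse.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(data: bytes) -> str:
    # Treat the bytes as one big-endian integer and emit base-58 digits.
    n = int.from_bytes(data, "big")
    out = []
    while n > 0:
        n, rem = divmod(n, 58)
        out.append(ALPHABET[rem])
    # Each leading zero byte is encoded as the first symbol, "1".
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + "".join(reversed(out))

def b58decode(text: str) -> bytes:
    n = 0
    for ch in text:
        # ALPHABET.index raises ValueError on excluded characters like "0".
        n = n * 58 + ALPHABET.index(ch)
    pad = len(text) - len(text.lstrip("1"))
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return b"\x00" * pad + body
```

Because the excluded characters never appear in valid output, a mistyped "O" for "0" fails to decode instead of silently pointing at a different ID.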


Here are three fake uBlock Origins that have over 4 million users between them

https://chrome.google.com/webstore/detail/ublock-plus/kjagjn...

https://chrome.google.com/webstore/detail/ublock-adblock-plu...

https://chrome.google.com/webstore/detail/ublock-adblocker-p...

The last two are exploiting the fact that uBlock Origin doesn't come up when you search "adblock".

There's tons more, just look through the search results for "adblock" https://chrome.google.com/webstore/search/adblock?hl=en-US&_... and results for "ublock" https://chrome.google.com/webstore/search/ublock?hl=en-US&_c...

Note that firefox doesn't have this problem (tons of adblockers, maybe some are fake, but none pretending to be uBlock Origin) https://addons.mozilla.org/en-US/firefox/search/?platform=ma... maybe has something to do with the fact that they show usage numbers on the results page.


> Note that firefox doesn't have this problem [...] maybe has something to do with the fact that they show usage numbers on the results page.

Mozilla does a manual code review of newly submitted or updated extensions. So, an actual human being sits down and looks at the code. They'll notice when a fake uBlock Origin is submitted.

With that, they also enforce a rule which Google does not have, that any connection to the internet which is not necessary for the add-on to function (ads, telemetry) have to be opt-in.

This isn't perfect protection, for example the extension Web Of Trust required sending browsing data back home in order to function, which they then sold in anonymized form, which was proven to be deanonymizable last year. But it does take out the incentive to spread fake versions in a lot of cases, as you just can't publish an ad-ridden or trojan uBlock Origin clone.


Mozilla recently changed their add-on review process. Humans are still in the loop, but part of the process is automated.

https://blog.mozilla.org/addons/2017/09/21/review-wait-times...


> With that, they also enforce a rule which Google does not have, that any connection to the internet which is not necessary for the add-on to function (ads, telemetry) have to be opt-in.

This sounds pretty cool and reasonable. But extensions still can modify the currently displayed website, right? Doesn't that make it trivial to submit data somewhere? E.g. <img> tag with GET params, as the most basic form of this.


It does, but it also makes it trivial for Mozilla to notice that this is happening and then they can weed that extension out before it gets published.


What are those fake uBlocks up to? Malware? Fake ad traffic? Does anyone know?


Probably replace some ads with theirs



I just use the hosts file to block stuff. There are enough crowdsourced GitHub lists out there that relying on a third-party browser extension that is doing god knows what, controlled by god knows who, just doesn't make sense.

The rest of the family has been trained to update their files.

Update it every two months


That's great until you need to selectively disable it for a single (poorly coded) site.


There are many other problems with the concept, though as for this one:

High-level blocks (domains, TLDs) can be bypassed using, say, dnsmasq, by providing specific pass-throughs.

As for sites that use widely-blocked services: the message is to feed back to them and tell them not to do that. The fact of countermeasures does not mean that there will be no collateral damage. In fact, that's kind of precisely the situation that got us into this mess in the first place: putatively legitimate advertising that isn't.


"Hey mum, blocking adverts is simple! All you need to do is install and configure dnsmasq, then configure some bypasses for sites that break!"

Yeah, sorry, no. As for reporting sites that break, that's a good thing to do in general, but not too good if you want to use the site now.


The "it's too complicated for the average user" is most of the "other reasons" I was referring to. Pi-Hole is far more streamlined (and largely reduces to "dnsmasq plus a lot of blocked domains").

My point was that if you're going the blockfile route, you can punch specific holes.
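A small Python sketch of the blockfile route with holes punched in it: merge several blocklists into /etc/hosts-style lines and skip anything on a personal allowlist. The example domains and the 0.0.0.0 sink address are assumptions, and the parser only handles the simple "address hostname" and bare-hostname lines common in such lists. Since /etc/hosts matches exact hostnames only, each hole has to name the exact host.

```python
from typing import Iterable, List

def build_hosts(blocklists: Iterable[Iterable[str]], allowlist: Iterable[str],
                sink: str = "0.0.0.0") -> List[str]:
    """Merge blocklist domains into hosts-file lines, skipping any
    domain the user has explicitly allowed (the "holes")."""
    allowed = {d.strip().lower() for d in allowlist}
    blocked = set()
    for lines in blocklists:
        for line in lines:
            line = line.split("#", 1)[0].strip()  # drop comments
            if not line:
                continue
            # Accept "0.0.0.0 ads.example" as well as bare "ads.example".
            domain = line.split()[-1].lower()
            # Never remap localhost, which many published lists include.
            if domain != "localhost" and domain not in allowed:
                blocked.add(domain)
    return [f"{sink} {d}" for d in sorted(blocked)]
```

The output would then be appended to the system hosts file, ideally by a scheduled job rather than by hand.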


/etc/hosts will never block all ads[0]. uBlock Origin can use host files. They aren't on by default because they aren't needed but Dan Pollock's and MVPS lists are already there, you just have to click a check box in Options→"3rd-party filters". The lists are auto updated[1]. uBlock Origin is open source[2] and made by a great[3], hardworking[4], ideological[5] guy. He writes uBlock Origin, which loads in lists of url + css-selector pairs compiled by other people[6][7].

I prefer the term "content blocker" because they're more general than just ads. You can block anything you can write a css-selector for (and because of the arms races[0][8][9], other things like websocket connections too). For example, I hide sticky header bars (thanks web designers), youtube comments, everything that isn't the article on news sites and the menu that comes up when you right click on Medium. There are also pre-compiled lists of annoyances and a list to block social share buttons.

uBlock Origin is the only ad blocker that should exist, arguably every single one of the others is fake. There's plain "uBlock" which is the original project that was effectively abandoned in 2015[10]. There's "Adblock Plus" which is a rent-seeking operation[11] that employs 100 people[12]. There's "Ghostery" which is closed source[13] and up until February 2017 was owned by an advertising company[14]. uBlock Origin is the one you want.

[0] https://twitter.com/gorhill/status/846781439890853893 I'm reminded of anti-bacterial soap

[1] https://github.com/gorhill/uBlock/wiki/Dashboard:-3rd-party-... periodically updating files is for cron jobs, not humans

[2] https://github.com/gorhill/uBlock

[3] https://news.ycombinator.com/user?id=gorhill

[4] https://github.com/gorhill/uBlock/commits?author=gorhill

[5] https://news.ycombinator.com/threads?id=gorhill seriously, thanks man

[6] https://github.com/easylist/easylist

[7] https://github.com/ryanbr/fanboy-adblock fanboy also deserves a lot of thanks

[8] https://issues.adblockplus.org/ticket/1727

[9] https://newsroom.fb.com/news/2016/08/a-new-way-to-control-th...

[10] https://github.com/chrisaljoudi/uBlock/commits/master

[11] https://adblockplus.org/acceptable-ads http://www.businessinsider.com/google-microsoft-amazon-taboo... https://en.wikipedia.org/wiki/Rent-seeking

[12] https://eyeo.com/en/team

[13] https://github.com/jonpierce/ghostery

[14] https://en.wikipedia.org/wiki/Ghostery https://en.wikipedia.org/wiki/Evidon,_Inc.

the problem in my original comment has been known since June, and some of these extensions phone home https://twitter.com/gorhill/status/898574880773484545 these are probably for "market research"
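To make the "url + css-selector pairs" above concrete, here is a simplified Python parser for Adblock Plus-style element-hiding rules (domains##selector, with a ~ prefix excluding a domain), a syntax uBlock Origin also accepts. It covers only that basic form; exception rules (#@#) are skipped, and uBlock Origin's extended syntax is out of scope.

```python
from typing import List, NamedTuple, Optional

class CosmeticFilter(NamedTuple):
    include: List[str]  # domains the rule applies to (empty = everywhere)
    exclude: List[str]  # domains listed with a "~" prefix
    selector: str       # CSS selector of the elements to hide

def parse_cosmetic(line: str) -> Optional[CosmeticFilter]:
    # e.g. "example.com,~shop.example.com##.sticky-header"
    if "##" not in line or "#@#" in line:
        return None
    domains, selector = line.split("##", 1)
    include: List[str] = []
    exclude: List[str] = []
    for d in filter(None, domains.split(",")):
        (exclude if d.startswith("~") else include).append(d.lstrip("~"))
    return CosmeticFilter(include, exclude, selector)

def applies_to(rule: CosmeticFilter, host: str) -> bool:
    def matches(d: str) -> bool:
        # A domain also covers its subdomains.
        return host == d or host.endswith("." + d)
    if any(matches(d) for d in rule.exclude):
        return False
    return not rule.include or any(matches(d) for d in rule.include)
```

A content blocker would then inject a stylesheet hiding rule.selector on every page where applies_to returns True.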


>uBlock Origin is the only ad blocker that should exist, arguably every single one of the others is fake

You are being grossly unfair to uBO's big brother uMatrix. Otherwise, yes.


Google. A search giant. A machine learning leader. They can save me from a typo in the web search, but can't in the Play store.

That's another reason to switch to F-Droid.


What keeps someone from doing the same thing to f-droid?


Fewer people run it so it's not as desirable a target.

"Security via unpopularity"


Aka, the Linux anti-virus.


Sometimes it certainly seems that way, but there was also a period of time when Apache dominated the web and yet Microsoft's IIS was having a lot more exploits despite Apache having more market share. Market share isn't the only factor, but it probably is a factor.


I wonder if that might have been due to Windows market share? Windows was everywhere on the desktop, and those Windows desktops provide a good intermediary vector for attacking instances of IIS on Windows Servers.

Also, thinking back to the bad old days and the script-kiddie-esque nature of many viruses of the early 2000s (iloveyou, et al), I suspect it may come down to attacking what you know: Windows was more prevalent and better understood, so that's what people tried to break.

Not my field though, so all just speculation.


Were the exploits helped out in any way by Windows itself, or were they solely exploiting IIS? I've never used it, but I assume that, at least back then, it may have been pretty integrated into Windows.


It's a factor, but is it really that big of a factor for Linux? I'd've thought that its usage on servers would make it a sizeable target (both as powerful machines to use in a botnet and as a way to compromise company information).


Attacks on servers are of a different nature. No one runs Linux desktop, so it is not a valuable target for trojans or other attacks like that.

If it were popular we'd see just as much malware for linux.


Desktop Linux just as well has software repositories, which are similar to app stores, but human beings look over each application that's included. And you find essentially all popular software in this trusted repository.

So, this strategy would barely work, as users would only look on the internet for a download, if it's not in this trusted repository and then it's gonna be a really unpopular application. (Theoretically, it's possible for your grandma to go on the internet before checking this trusted repository, but that is really just so much more effort.)


Linux user base also has an effect. It's all people who know what they're doing or someone having a newbie relative using it just for browsing the web.

If Linux was 90%+ of the market, getting people to download some stuff with a curl command promising some BS or having people download and run sudo would only need to touch a fraction of users to be highly valuable. That's just a random example off the top of my head. And also because I don't use Linux on desktop so I don't fully know how everything works there.


You're spot on.

And if someone thinks regular people would be "too scared" of CLI to pipe curl into sudo sh, remember that people are "too scared" of developer tools in browsers too, and yet Facebook and others have to implement self-XSS protection measures in there, because it turns out there's nothing too complicated in computing when it stands between a person and fulfilling their desire (as promised by a scammer).


Yeah for sure about the "too scared" to do any CLI or any other different sort of thing. I've had some Mac friends just copy paste homebrew stuff in. Without any knowledge of how popular it is or if they are for sure on the correct site. They just knew I said homebrew [cask] is good and installed it. They could've easily gone to a wrong site and gotten screwed over. Half these people are people into gadgets and electronics, but probably will only open terminal once a year on their Macs, if that much.


> similar to app stores, but human beings look over each application that's included

I'm sure that's not true for all Linux repositories. But humans certainly do look at all apps going to the App Store.


You can't put up a wordpress site with more than 4 plugins selected at random on a linux box that is secure. I even experienced a THEME that was hacked.


That’s more a PHP problem than a Linux problem; a lot of the Wordpress hacks don’t care about compromising your system at large as long as they can just take over your www-data user.


Can't that also be because that's easier to do and already lucrative enough for the hackers? Not the only reason, but a big reason? Add on that a ton of [hacked] wordpresses are in jailed settings.


> No one runs Linux desktop

Roughly 3% of the world's internet users. Sure, 3% is not much, but it's still multiple hundreds of millions of people.


You're off considerably. The world population is only 7.6 billion; if every single person on the planet used the internet and 3% were using Linux to do so, it would barely reach multiple hundreds of millions (228 million).

The actual number of users is closer to 3 billion, so even if your 3% is correct (it isn't) that's not even 100 million.

That's also assuming that every user of the internet uses a laptop or desktop to access the internet, but that isn't the case. More and more people are only using a smartphone or tablet, especially in emerging markets.


> so even if your 3% is correct (it isn't)

The NetMarketShare stats have been hovering around this for a few months, and all the "global internet usage" stats that I could find were closer to 3.75 billion.

Even then, assuming 3 billion, it's still 90 million users... that's more than the population of any country in the European Union.
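The figures being argued over, worked through with the numbers quoted in this exchange (the 3% share itself is disputed, so treat it as an assumption):

```latex
\begin{aligned}
0.03 \times 7.6\times10^{9}  &= 2.28\times10^{8}  &&\text{(228 million, if everyone on Earth were online)}\\
0.03 \times 3.0\times10^{9}  &= 9.0\times10^{7}   &&\text{(90 million)}\\
0.03 \times 3.75\times10^{9} &= 1.125\times10^{8} &&\text{(112.5 million)}
\end{aligned}
```

So a 3% share works out to roughly 90 to 113 million desktop Linux users depending on which internet-population estimate is used, which is neither "multiple hundreds of millions" nor negligible.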


Universities and teams within corporations run Linux desktop, though. Maybe they'd make for specialized targets?


There is ransomware in the wild for Linux desktop now, iirc propagating through Flash.


The existence of desktop Linux malware is not at issue here.


isn't it more like Apple anti-virus?


"Security via obscurity" is the popular phrase.


That’s something different, though.


Or (seriously) one of the reasons I use a windows phone!


Still? I love Windows Phone relative to other mobile OSes and I clung to webOS for as long as possible. But seems like clinging to Windows Phone still has to be tough at some point. The ecosystem is gone.


Agreed. At least with Google, we have someone big to blame.


Besides assuaging feelings of rage what does this accomplish exactly?


Blame, nothing. But a single app store means only one place to remove offending apps from.


Potential for class action lawsuits?


Maybe they have package vetting process with an actual human inside?


You're basically saying that f-droid is better because it's small. Nothing to do with its selling point of being FOSS.

(And for the record I'm an f-droid user)


1) What does it matter? If it's better by being small, it's still better.

2) The F-Droid maintainers manually build the apps on F-Droid from the respective code repositories. They will notice when something like that is off. This has to do with it being FOSS.

And if you're wanting to tell me that this doesn't scale, not really, no, but it's the same thing that Linux distros have been doing for a long time and Red Hat, SUSE, Canonical actually do have a crapton of users, especially on the server side.


Manual code inspection, and whether the code is FOSS or closed, are entirely orthogonal to typosquatting of package names.

Every one of those distros gets around volume of desired apps by allowing the inclusion of third-party repos (e.g. current Python or docker) which in turn introduces typo squatting as a vector.

You're asking for app store maintainers to slow everything to a crawl and never get popular. No entity which wants to be successful will do that, corporate or otherwise.


They don't. More exactly, they try to vet things after the fact, and don't cover everything. Source: their own FAQ.


I skimmed their docs and I don't see any sort of language that corroborates what you are claiming.

They DO seem to have some sort of review process in place:

https://f-droid.org/en/docs/Inclusion_How-To/

See: Application Review Process.

I don't know how exhaustive it is or how effective it is in practice though.


They don't do a thorough code review, but they are human beings and they manually build the software from its code repository, so they will notice these kind of discrepancies.


They have software dev process with actual human, inside.


>they can save me from a typo in the web search, but can't in the Play store.

Besides machine intervention, this seems like something a company the size of Google could easily help with (if not solve), while creating a lot of goodwill, by hiring a few at-home workers to better flag or check in on certain applications. Off the top of my head, this wouldn't require a ton of training, and you could even provide burner-like phones for folks to download the apps to and play with a bit.


It could easily be hooked into their existing Opinion Rewards program. Every time a new app is published on the Play Store or the name of an existing app is changed, just ask a half-dozen real verified humans if it matches the name of an app they've heard of before. I'd be happy to do it once a week or so for ten cents of Play Store credit.


> They can save me from a typo in the web search

Well, I'm not so sure. For example, if you search, in French, for the correctly spelled phrase "Avez-vous aidé quelqu'un aujourd'hui" they suggest the wrong spelling "Avez-vous aider quelqu'un aujourd'hui", which is a grammatical abomination.

https://i.imgur.com/em6t9qS.png


> That's another reason to switch to F-Droid.

But if you need to download WhatsApp to communicate with loved ones, you can't get it from F-Droid, right?


The best way to download whatsapp is to search for it on google and click the top result to go to the official website. Then click download from there. It turns out the age-old method of installing shit is the best.


Usually there is no download link (nor is there in the case of WhatsApp, as far as I can see), but a link to the Play store web interface.


Yeah a link to the play store is what I meant. That's guaranteed to point to the genuine app.


In a way, you can: https://f-droid.org/app/com.javiersantos.whatsappbetaupdater

The method the other commenter suggested also works.

But yes, these are anecdotal for WhatsApp, which I presume was just an example on your side.

You'd have to mainly use F-Droid and then make some exceptions for those apps. You could also use Yalp store (which interfaces with the Play Store) or Aptoide to get those apps, if you don't want to keep the Google Play Services around, though I cannot make any claims of these being more secure.


If your loved ones require you to install crap on your phone to communicate with you, do they really love you?


If you won't install what you call crap and they call good or great on your phone to communicate with them, do you really love them?


Great reply.

Your question demands an answer, for real. As does mine.


I used to begrudge my friends when I was depressed and isolated (self-sabotage). Why weren't they communicating with me in my ways? They were, but not frequently enough to satisfy me. But in the end, them reaching out multiple times when I was available in only one or two spots, was very commendable and I appreciate the friends that did.

There has to be a middle ground. In this case, unfortunately, besides close friends or family who know something about tech, you will have to be the one to compromise and use their apps. Plus, I've seen getting people to move to your communication methods which are less popular usually leads to less talking. Not always, just something I've noticed isn't uncommon. I end up talking to people more if we just use FBM or iMessage or something.


Don't be dense. Not everyone has the technical skills to decide which is the best app to message others. People just go with the popular ones or the ones their family/friends use.


Google has no incentive to police the store or protect its users’ devices. They only care about getting more people on the Android platform so they can increase their search and advertising revenue. They could give 2 shits about the Play Store.


The fact that android is such a mess is the reason I use an iphone. If and when they start caring about the quality of the android experience, including the play store, I may be tempted back.



That seems like a rare occurrence. My macOS install is screwed up. I'm not entirely sure if it's something on my end, the hard drive, macOS or something else. But I'm not going to straight up just blame macOS or Apple MacBook hardware for the issue when others do not seem to be having it. That seems far too assuming and rash.


So you won't blame Apple, but will blame Google? :)


But that's counter intuitive. I for one don't use android simply because of the play store. If you take care of the developers, they'll take care of the apps which will bring the people in.


One incentive to police the store is that quality is a draw for users.


What's the alternative, iOS? That's great for people with financial stability in developed nations, but that leaves out swathes of those in developing nations.


Old & new previous-gen iPhones are available at Android prices, including “free with contract” (in the US at least).

They aren’t cutting edge, but neither are the equivalently priced Androids.


I have an Android dev phone that cost £19, no contract, about half-to-a-third the price of a then-equivalent second-hand iPhone. In third world countries even a £19 device is worth stealing, as a friend of my partner experienced.


Wow, that is cheaper than I expected. What brand?


Vodafone sold it (pay as you go, no contract) with only their brand visible, but I think it’s an Alcatel device under the paint. The model is no longer being made, but their current cheapest model is almost as cheap.


Old iPhones are still prohibitively expensive in developing countries. Brand new low/midrange Android phones usually cost half or less.


Good catch. Didn't even think about that.


F-Droid


Take the top apps and automatically raise a list of apps with similar names each week. Pay someone to investigate and flag if necessary.

I feel like there's a general unwillingness in some realms to do anything at all if the solution is "manual labor until a better tool is available."


There might be legal concerns as well. I know under the DCMA safe harbor provisions that basically as soon as you start doing manual moderation you are now liable for anything that gets through.


I don't think that's true. A safe harbor can't make you liable for something, it can carve out a specific case where you are not liable. So the DCMA safe harbor could say that they are not liable if they never moderate. That would mean that by moderating they would give up their ability to claim safe harbor. It would not make them liable. Their liability would then be judged based on what they did (ie was their moderation negligent in some way).


Could you expand on that?


Not an expert, but the only part I see like this is section 512(a) here: https://en.wikipedia.org/wiki/Online_Copyright_Infringement_...

This seems to apply to network traffic, whereas I'd guess what to host on the Play Store would be covered by 512(c) instead. If so, the "red flags" test would seem to require at least some automated content checking.

But all the above refers to copyright, rather than the trademark violation and fraud in the WhatsApp clones.


Did you mean DMCA?


Or don't allow apps with a name similar to a popular one in the first place.


For anyone curious, this type of deception (using Unicode to pretend to be a known domain) is called a homoglyph attack: https://www.cisco.com/c/en/us/support/docs/security/email-se...


Why don't they have a normalized slug to ensure name uniqueness? Or, if they do, why would it treat whitespace differences as unique?


They could do a lot of things, if they cared.

For example: limit app and account renames; when an app or account is created or renamed, compute the Levenshtein distance to all the existing ones, and if the distance is below a threshold, make it subject to manual review and keep it unlisted until cleared.
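Here is a minimal Python sketch of that creation/rename gate. It compares names as raw strings. `levenshtein` is the textbook dynamic-programming edit distance. `listing_state`, its return values, and the threshold of 3 are hypothetical:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def listing_state(new_name: str, existing_names: list[str], threshold: int = 3) -> str:
    """Hypothetical gate run whenever an app or account is created or renamed."""
    for name in existing_names:
        if levenshtein(new_name, name) < threshold:
            return "unlisted_pending_review"
    return "listed"


existing = ["WhatsApp Inc.", "Google LLC"]
print(listing_state("WhatsApp\u00a0Inc.", existing))  # unlisted_pending_review (distance 1)
print(listing_state("Acme Games Ltd", existing))      # listed
```

An absolute threshold is crude. Short names are all within a few edits of each other, so a real system would probably scale the threshold by name length.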

The problem is that, from my observation, Google has a culture of hating any manual processes because they do not scale, so they avoid them unless compelled by law.

The second problem is that they have a big enough market share that they don't have to care about things that are not convenient to them. Slightly off-topic, but in a similar way, Apple can increase the iPhone price 10% per year and get away with it, because people still buy.


Manual processes scale just fine. They just don't want to pay for it.


Exactly. Hearing "support doesn't scale" is complete bullshit. They just would prefer to skirt the costs.


True. For example, AWS has been doing fine for years with manual customer service despite its scale.


Scaling in 2017 means "scales like f(x) = ax+b" with a=0, b=0


Maybe for Google.

Scaling well means "costs / effort scale like f(x) = ax^p + b" with p < 1. Not scaling well means having p > 1.
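One way to see why the exponent is what matters is to look at the cost per unit rather than the total. This is our own added step, assuming a > 0 and b ≥ 0:

$$
\frac{f(x)}{x} = a\,x^{p-1} + \frac{b}{x}
$$

For p < 1, both terms shrink as x grows, so each extra user or app gets cheaper. At p = 1, the per-unit cost levels off at a. For p > 1, the first term grows without bound.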


What is a "normalized slug"?


The kind of excerpt-like thing you see at the end of many, many modern URLs. "rare-white-moose-captured-on-film-in-sweden" in this one: www.bbc.com/news/av/world-europe-40918494/rare-white-moose-captured-on-film-in-sweden

It's machine-produced and surprisingly good at revealing accidental/unintentional/evil duplicates, considering how cheap it is.


Slug is a term left over from newspapers. You most commonly see slugs today where a title might be "Did you know this mushroom is off the hook?" and the URL is turned into a "slug" like

    example.com/mushroom-off-the-hook/

Here is an example of the Django framework's "slugify": https://github.com/django/django/blob/master/django/utils/te...


A simpler representation of some text, using only lowercase letters, numbers and dashes for spaces. Removes any unsafe or "invisible" characters, accents, and sometimes simple words. Often used in URLs.

TEXT: Fake WhatsApp update from “WhatsApp Inc.” with Unicode whitespace: 1M downloads

SLUG: fake-whatsapp-update-from-whatsapp-inc-with-unicode-whitespace-1m-downloads
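Here is a minimal slug function in Python. It is similar in spirit to Django's `slugify` but not copied from it, and its exact rules are assumptions: NFKD-normalize, drop non-ASCII, lowercase, and collapse everything else into dashes.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Hypothetical slug rules, loosely modeled on the idea behind Django's slugify."""
    # NFKD splits accented letters into base letter + combining mark and maps
    # compatibility characters (e.g. U+00A0 NO-BREAK SPACE) to plain ones.
    text = unicodedata.normalize("NFKD", text)
    # Drop everything outside ASCII: combining marks, curly quotes, Cyrillic, ...
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    # Collapse every run of non-alphanumerics into a single dash.
    text = re.sub(r"[^a-z0-9]+", "-", text)
    return text.strip("-")


print(slugify("Fake WhatsApp update from \u201cWhatsApp Inc.\u201d with Unicode whitespace: 1M downloads"))
```

NFKD turns U+00A0 into an ordinary space, so `slugify("WhatsApp\u00a0Inc.")` and `slugify("WhatsApp Inc.")` both give `whatsapp-inc`. That is exactly the collision a store would want to catch. The flip side is that this approach erases non-Latin scripts and Cyrillic lookalikes entirely, so on its own it is not a safe uniqueness key for an international store.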


All names get reduced using Unicode normalization.


Recently, thousands of users were duped into installing "АdВIосk РIuѕ" in Chrome. Every one of those letters is Unicode normalized.

(Hint: it's not just the I's masquerading as l's.)


Fair enough. I'm a little surprised that those hold up under NFKC/NFKD normalization. А sure as heck seems compatible with A.
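NFKC does fold true compatibility variants, such as the no-break space (U+00A0) and fullwidth Ａ (U+FF21). Cyrillic А (U+0410), however, is a separate letter with no compatibility mapping to Latin A, so normalization leaves it alone. Catching it takes a confusables "skeleton" in the style of Unicode UTS #39.

Here is a minimal Python sketch of that idea. It assumes the lookalikes in the extension name are the Cyrillic letters in the table, which is a guess at the exact code points. `CONFUSABLES` and `skeleton` are hypothetical names, and the table is a toy subset:

```python
import unicodedata

# Tiny, hand-picked sample of lookalikes for illustration only; a real check
# would load the full confusables data published with Unicode UTS #39.
CONFUSABLES = {
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u0412": "B",  # CYRILLIC CAPITAL LETTER VE
    "\u0420": "P",  # CYRILLIC CAPITAL LETTER ER
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "I": "l",       # Latin capital I vs. small l in many sans-serif fonts
}


def skeleton(name: str) -> str:
    """Crude 'what does it look like' key: NFKC, map lookalikes, casefold."""
    folded = unicodedata.normalize("NFKC", name)
    return "".join(CONFUSABLES.get(ch, ch) for ch in folded).casefold()


fake = "\u0410d\u0412I\u043e\u0441k \u0420Iu\u0455"  # Cyrillic/Latin mix
print(unicodedata.normalize("NFKC", fake) == "Adblock Plus")  # False: NFKC alone misses it
print(skeleton(fake) == skeleton("Adblock Plus"))             # True: skeleton catches it
```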


I suspect that if they haven't already, they will now


Or render them visually and use some kind of image diffing to decide whether they're visually similar?


That sounds a lot harder than just normalizing the string


You would think so, but that turns out not to be the case. Mostly because it doesn't require humans to think of edge cases. It also more closely addresses the actual problem of "do these things look similar when rendered on screen" as opposed to the abstract problem. For instance, render and score catches "rn" being visually similar to "m".


Letters that may look like decoration to English readers, e.g. the ä in Häagen-Dazs or the ö in Motörhead, are actually distinct letters in the languages that have them. Disallowing display names because another one exists without diacritics and the like is just asking for a ton of manual review. And how do you handle scripts that only use ASCII for English loan words, say Chinese, Japanese, Thai, Arabic, Russian and Indian scripts?


Hang on, doesn't Google supposedly review apps before accepting them into the store? I mean, they apparently have both an automated system checking for rule violations and actual human staff checking every now and then:

https://www.recode.net/2015/3/17/11560334/google-is-adding-m...

So how is this stuff just waltzing past their quality control setup? That one Unicode character can't really be messing up the whole system, right?

If this stuff is supposedly moderated, who's actually doing the moderation here?


I assume the downloads were fake, thus giving Google an easy excuse to get rid of it (it's gone now). Although probably all they needed was the obvious impersonation.

Unicode has a boatload of security issues. http://unicode.org/reports/tr36/


Recently, I went to install the Amazon Kindle app on my new phone from the Google Play store. It all looked good, except for the strangeness of an individual's name listed as the name for the street address and contact information for the app. That was something I did not recall from previous visits to the app in the Google app store.

So, the Kindle app's not on my new phone. Because the validation portion of curation is, ultimately, left up to the individual. And I didn't have time to go chasing around the Web making sure I was hitting the correct/official app store page. I probably was. But I've been well-trained to "pause and check" on such details.

P.S. I now recall, causing further hesitation, that the "other apps" sections of the search results and/or Kindle app page included an Amazon Video app. And that app had the same name listed in its details.

Now, the last I recall, Amazon Video was specifically NOT available in the Google app store. This forced people on non-Amazon devices who wanted to use it to add the Amazon app store and adjust permissions to allow installing apps from it. At least, temporarily; once you had that or whatever app you wanted from Amazon, you could then adjust your device's settings back to their defaults. Unless/until you wanted to pull an update to such an app -- then, rinse and repeat.

So... I see a weird bit of contact information. And I see it also for an app that prior experience taught me was not available in the Google app store...

And, with repeated stories like the OP, I can't trust the Google app store to be well-curated.

What else can I say? Meh...


That’s interesting.

All of them have "Suhail Mirza, 500 9th Avenue N, Seattle, WA 98109" listed as author.

But they are the real apps: https://play.google.com/store/apps/developer?id=Amazon+Mobil...

That person’s LinkedIn profile claims "I own Engineering for Amazon's Mobile Shopping iOS, Android and Windows Mobile Teams. I am looking for developers globally. Reach out if you are interested."


Amazon Prime Video is available on the Play Store.


It appears to be, now. It did not used to be.

The Kindle listing I was looking at shared details with the Video listing.

Despite a fair amount of news browsing, apparently I missed the information that the Video app had made its way into the Play store. Actually, I seem to recall some news of same but also follow-up news that it had been pulled, again, within a few days. (The eternal Google/Amazon competition/strife/"user, you are the product" situation.) This would have been months ago.

So, I'm left uncertain whether I'm looking at the real thing or an imposter. I'm fairly certain it's the real thing. But "fairly certain" is not "secure".

At the time, I didn't have a lot of time to delve into this. And I only had my phone in hand, making such an investigation more cumbersome.

I didn't install the Kindle app, then. The moment and immediate need passed, and following up on this dropped down my list of priorities.


When I see an app has over 100 million installs, half a million 5 star reviews, and a support email address ending in @amazon.com it seems pretty sure to be the real deal.


After some more checking, when I had a bit of time, I installed it.

Those are also things I look for.

Still seems to be in line with my basic point: On Google Play, it's up to the user to assess the item's legitimacy. At least, so far, Google continues to provide these data points to the user, as long as the Play Store itself isn't compromised.

Keep in mind, some of the items recently in question in the news are reported to have had a million plus installs. Separately, fairly recent news stories have described ways in which third parties have managed to glom onto prominent domains -- particularly those providing extensive user services -- to gain the addressing of that major domain for their own functionality.


Trust is hard. It cannot be automated, it's inherently social and demands vigilance. This is as true IRL as it is online and on "curated" federations (of software, news, contacts etc). The "killer app" for trust is one that extends our natural skepticism and social awareness, and this will _never_ be easier online than in meatspace. This is obvious when we raise our perspective from the purely technological (the "means") to the fundamentally social (the "ends").

All the automated or manual safeguards that Google could enact would never prevent people from pulling a fast one, the old switcheroo, a Kansas City shuffle on each other, because it's just something that we do. And we will use whichever means (technology) are available, in whatever way is feasible. This particular example looks egregious (or ingenious, depending) for cosmetic reasons, but it's fundamentally an interaction between people, however fraudulent. Google is in the business of interactions between people.


Solving trust 100% is hard. Having people review apps whose names are within a short Levenshtein distance (accounting for Unicode tricks etc.) of popular apps' names, and banning those apps, the accounts that created them and their suppliers of fake votes, is not that hard, especially for a company like Google. And look at those apps' descriptions: they are complete baloney, and any two-bit text classifier that a capable intern could put together in a weekend from off-the-shelf components can recognize that. These guys aren't even trying, and still aren't getting caught.

Yes, it may require some monetary investment, but we're talking about a $700bn company. They could afford it if they wanted to. If they are not doing it, that means they do not want to.


Of course, in hindsight you "only" have to calculate the Levenshtein distance between any product name and _all other_ product names on the store. That scales well. All to close one single avenue for fraudulent advertisement. Maybe it's a big one, and maybe the cost is recouped through improved customer relations. Maybe.

And maybe they implement this, and calculate hundreds of millions (billions?) of Levenshtein distances every day, but the next day someone publishes the same app with a Germanized name ("Was ist App Update") and fools a couple hundred thousand Germans. Now the solution is obvious: run the names through Google Translate for ALL languages and calculate the respective Levenshtein distances! It's foolproof! Shame on you, Google, for not doing it already! Simply irresponsible.


> and _all other_ product names on the store

Not true. Nobody fakes random products. It's the top-scoring ones that are getting faked, for the obvious reason that this is what people are looking for. If you're not in the top N (100, 200, whatever), faking you is useless; you're just replacing nobody with nobody (an exception may be bank apps, where even faking relatively obscure ones can be lucrative, but let's not get into niches for now). Just scanning against the top ones would kick the floor out from under most of the current fakers.

And of course you don't need to continuously re-scan the data; you need to scan only once, when the app is submitted or the name is changed. So, in summary, when adding an app or release to the store, you need to check its name and description against a list of (let's be generous) 1000 strings, and maybe run a basic text classifier if you're feeling in a very AI mood today. Is that impossible to scale? Nope, it's fairly easy.

> but the next day someone publishes the same app

So your argument is that because simple checks are not perfect and do not cover 100% of possible fakery, we shouldn't do anything and should let even the dumbest fakers run free and fill the store with trash. Does that make sense to you? Because it doesn't make sense to me. Or did you decide that, since your argument won't be perfect anyway, there's no point in even trying to make it minimally sensible?


> All the automated or manual safeguards that Google could enact would never prevent people from pulling a fast one

We don’t expect perfection, but they’ve at least gotta make it harder than the copy and paste bs that litters the Play Store. ‘It’s hard’ is the worst possible reason to do nothing.


> All the automated or manual safeguards that Google could enact would never prevent people from pulling a fast one,

This is a copout. Nobody is asking Google to review the source code of each uploaded app. There are plenty of basic things Google could put in place to make sure blatant fraud doesn't happen. But they don't, because they don't care or don't want to allocate resources to anything that doesn't have a high return on investment. And since the competition virtually does not exist...


"Unicode whitespace" apparently means non-breaking space (U+00A0).


To which the term "Unicode whitespace" applies as equally well as it does to plain old U+0020 :)


I seriously don't understand how people let their aging parents or young children or friends use Android phones.


Should they also be forbidden from using Linux, Windows, or macOS because those allow the same exploit? Should everyone in the world be limited to iOS and ONLY iOS, which is limited to ONLY the apps (and soon media content) Apple allows you to use?


You joke, but this is exactly why I recommend Macs or iPads to all of my elderly friends and family. Running Windows safely takes a level of knowledge that my 75-year-old mom doesn't have.

I don't yet know of a good alternative. I desperately want one because Apple kit is expensive. But for now "$500 for an iPad" is the advice that gets me the fewest calls for support.


My grandma does three things on the Internet: talks to her family on facebook messenger, receives photos in email, and checks her stocks on ameritrade. So yes, with those requirements, she should only use iOS.


Maybe because 99% of the world cannot afford even pre-owned Apple phones.


Good point. I would say cost is the only reason not to get an iOS device.


> I seriously don't understand how people let their aging parents or young children or friends use Android phones.

Your comment is a strawman; anybody can be fooled by these kinds of dirty tricks. This isn't about users, this is about what Google is not doing upstream to prevent basic fraud on their platform.


I agree that Google needs to be more responsible but I've never seen an iOS device with ads on the lock screen.


I gave my mum F-Droid, which is a small FOSS app store, and removed the Google Play Store.


Android enthusiasts who think they’re elite because "Apple sucks! Android’s had that feature for 5 years!" But not to be naïve: iOS has problems with fake apps too, just not nearly as bad as Android.


Yes, it might, but:

1. There are fewer of these apps.

2. There's less that they can do–on Android apps can do anything and everything once installed.


> "on Android apps can do anything and everything once installed."

Not necessarily. Modern Android devices have granular permissions, just like the iPhone.

https://developer.android.com/training/permissions/requestin...


Unless this has been changed, this only occurs if the app opts into it, making it entirely pointless. A malicious app needs only to target Android 5 and gets free rein over the phone.


I think that means not using a newer SDK than 5, in which case the feature doesn't exist yet.


This requires that the "aging parents or young children" have a modern Android device, which they often do not. Often, they either have a low-cost outdated phone, or an old hand-me-down; neither of which will be running a new version of Android. Plus, it's not uncommon for people to tap through permissions lists. Android fundamentally allows for a lot more to go wrong than iOS does.


I have personally and purposefully caused a lot of confusion on some sites by using Cyrillic letters that look exactly like English letters to impersonate other people. This was mainly for fun and for harmless trolling, but it's very easy to see that this could be used on any site that uses Unicode for usernames, etc. Phishing is extremely easy with this, and something needs to be done; otherwise no one will trust the Internet ever again, especially if someone can just "steal" WhatsApp so easily.


At least now it's easier to explain to customers why they have to get a DUNS number to have apps under their company name in the App Store, while the Play Store just allows it.


Apparently a-z0-9 usernames work better than these full business names.

It would be much harder to fake a github.com/whatsapp account than it is to fake "WhatsApp Inc.". Besides the invisible codepoints, one could easily do "WhatsApp Inc", "WhatsApp Messenger Inc.", "WhatsApp IM" and so on.


How is any end user to know what is the original whatsapp?


I hope there are no fake banking apps like that...


Unicode strikes again!


When people bitch about “walled gardens” I like to remind them just why people build walls. This... is why. Sure, a world without walls and locks would be ideal, but only if it’s also a world without thieves, saboteurs, and jerks.


The irony is strong here. You need walled gardens because walled gardens protect people from dangerous software. Posted as a comment on a news story about dangerous software found in a walled garden.


The argument would be that this suggests that Google's garden should have higher and better-guarded walls to prevent such things, while many seem to argue that our gardens' walls are too high. Apple gets criticized for having slower and more arbitrary manual reviews of all app updates, but they don't seem to get malicious apps like this in their app store nearly as much.


They also don't have decent free mail clients with OpenPGP support.


Not really. This is only news because it's surprising and uncommon. In contrast, consider the fake Microsoft tech support scams. Those are so common that Microsoft gets 10,000 complaints/month [1], so nobody would think to mention a single example here.

[1] https://blogs.microsoft.com/on-the-issues/2017/05/18/fight-t...


It’s not uncommon at all, and it’s been documented here and other places many many times. The difference between this and your poor analogy is that here Google could actually prevent this if they chose to.


It's hardly a walled garden. More like a picket fence.


While physical analogies have their limitations, a meatspace store where any supplier could drop their product without vetting would not be a safe place to shop.

There has to be some sort of curation. Algorithms and automation can help with the curation, but there has to be something.


Scams in the Google Play store are an old problem. It genuinely perplexes me that Google hasn't solved this problem. Surely they realize this drives people away from the platform. My parents don't follow @swiftonsecurity but they do read the news and they don't want their retirement accounts pilfered. Thus they overreact to any negative security news in the mainstream news. So they own iPhones. The volume of bad news about Android security outweighs iOS.


Google has an obsession with automating everything and, ideally, employing no humans at all.

Same thing with Chrome extensions. Mozilla has solved this same problem near-perfectly by just having a few actual human beings look over the code of newly submitted or updated extensions.

Google has orders of magnitude more money than Mozilla, so they could easily afford to just copy that, too.


Google Play is a walled garden. Maybe its wall is a little lower than some, but still.

Is F-Droid a walled garden?


Yeah, its walls are higher than most. What's the point?


They build walls and fences because of economic inequality. I have been in places with huge 3m walls and in places with no fences around the houses and owner name plates over the doors. In the former, there are jobless people asking for jobs in the streets; let's call them thieves.

Walled gardens just keep small developers out of the marketplace by raising the bar. Now you need to pay money or have a name, so WhatsApp, Viber and similar shit can retain monopolies and keep their users despite being filled with ads and offering less security than competitors. Jabber clients and independent media never get a "verified" badge.

If you want to download the "genuine" WhatsApp, go to their website, check their TLS certificate (you can never be sure; they don't even bother to get an EV cert, even for WhatsApp Web, whereas https://app.wire.com/ has an EV cert, for example) and follow the link to Google Play. Software repos are not here to do the job of CAs.


And of course you go to their website by googling them. Maybe clicking one of the 5 adword links at the top of the page that look like search results. Works out nicely for someone.


"Good fences make good neighbors."


Um.... it didn’t seem to do much good here. Not sure what your point is.


I think his point is that Apple's app store is commonly criticized for being a walled garden. While the Android app store may be as well, some complaints seem specific to the Apple side. For instance, releasing a new or updated app requires manual review from someone at Apple. I've only seen complaints about an app release taking an arbitrary amount of time with the Apple app store.

There's a list of guidelines your app must conform to, and Apple is generally more aggressive about catching violations before app release compared to Google. There are consequences to this, like Safari-WebKit being the only permitted browser engine on iOS. Any other browser must wrap this engine.

The grandparent post is likely pointing out the dissatisfaction that devs express regarding the Apple app store review process. It seems like it comes down to an engineering trade-off. At some point you have to choose between developer experience and end user security.


If you prefer being unfree in the name of security, that is your prerogative. But please don't drag the rest of us with you into your golden cage.

I prefer solutions that offer both freedom and security, such as proper application isolation, user review systems (a tough nut, yes) and generally having better reputation/quality signals than just a company name.


I wish, for security and simplicity, there was a way to disable everything but 7-bit ASCII. Like, who actually thought that having identical characters for different things made any sense?


This is why even some of ASCII is a mistake; maybe we shouldn't even bother with case.

Every time something like this comes about I just get more cynical about the complexity of multilingual systems, or systems with interesting typesetting routines.


ASCII is very well defined. Control characters were needed at the time (and still are today, if you're using a terminal emulator).

It's easy to handle: just disallow codes below 32 and from 127 (DEL) upward, which are non-printable or outside ASCII (but think about tab/CR/LF). Basically every font you can find can display all of the ASCII printable range without room for confusion.
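Here is the rule from this comment, written out as a minimal Python check. One detail matters: 127 is DEL, which is itself a control character, so the printable range runs from 32 through 126. The function name and the opt-in flag for tab/CR/LF are hypothetical choices:

```python
def is_plain_printable_ascii(name: str, allow_whitespace_controls: bool = False) -> bool:
    """True if every character is printable ASCII: 0x20 (space) through 0x7E ('~').

    0x00-0x1F are control codes and 0x7F is DEL, so both ends are excluded.
    Tab, CR and LF are control codes too; they pass only if explicitly allowed.
    """
    for ch in name:
        if 32 <= ord(ch) <= 126:
            continue
        if allow_whitespace_controls and ch in "\t\r\n":
            continue
        return False
    return True


print(is_plain_printable_ascii("WhatsApp Inc."))        # True
print(is_plain_printable_ascii("WhatsApp\u00a0Inc."))   # False: U+00A0 is outside ASCII
```

ASCII-only still leaves lookalikes such as `I`/`l` or `rn`/`m`, so this check narrows the problem rather than solving it.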


What ASCII characters are a mistake? If we required all apps to only use characters in the ASCII charset (the 128-code-point one, not the Windows-1252 "ASCII" garbage), these tricks wouldn’t work.


The poster said some ascii is a mistake. I’m assuming those chars that don’t translate to much/anything.


Like basically anything in the 0x1 to 0x1F range?


Bullshit. Just like punycode TLDs. And I say this as a guy from a Cyrillic country. We invented a footgun and then promptly shot ourselves with it.

Instead of making everyone use a safe ASCII charset for IDs (domains, names like the one in the article, etc.), we go for a stupid fuckton of language charsets that cause such problems. All in the name of accessibility or whatever. And all this does is let people continue living in their language-specific bubble instead of just learning the main international language, English, and living happily ever after.

And now people suggest some crutches like restricting the data to some subset of unicode. Never learn.


I don't think it's fair to force English upon people. It's not even the most spoken language on the planet. (There are more people speaking Chinese, and also more people speaking Spanish than English.)


Of course it's fair. Tech dominates the world, it's not about numbers, it's about being able to understand everyone using modern communication methods.

Regarding China, they chose isolationist politics (just like Russia lol); it's everyone else's duty to pull the blanket over to the 'everyone' side from the 'China' side.


I am happy about Unicode; the benefits are well worth the few problems. Scammers will always be scammers. We have to outsmart them, that's all.


But gmail doesn’t make any differences between user.name and username.
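For comparison, the kind of canonicalization Gmail applies to addresses can be sketched in a few lines of Python. This sketch assumes only the dot rule mentioned here, plus Gmail's case-insensitivity, and ignores everything else Gmail does. `gmail_mailbox_key` is a hypothetical name:

```python
def gmail_mailbox_key(address: str) -> str:
    """Hypothetical canonical key: for gmail.com, dots and case in the local part are ignored."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()  # domain names are case-insensitive
    if domain == "gmail.com":
        local = local.replace(".", "").lower()
    return f"{local}@{domain}"


print(gmail_mailbox_key("User.Name@gmail.com"))  # username@gmail.com
```

The rule is scoped to gmail.com because other mail providers may treat those dots as significant.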


Why would this be relevant? What I do with one field in one of my products for one set of reasons has very little to do with what I do with another field entirely, in another product entirely, for a whole other set of reasons...


By design. It isn’t accidental.


This is so typical of Google's policies. They will not fix something just because users report it.

https://bugs.chromium.org/p/chromium/issues/detail?id=147


I wouldn't fix that "bug" either. I don't want confirmation dialogs all over the place. They are annoying when I try to close or delete. Yes, I clicked close on purpose.

Google has done a good job with some of their "undo" notifications; these work much better imho.


Especially considering there's a chrome option to have the startup tabs be the tabs that were last open. No dialogs necessary, just take me back to where I was.


That's hardly going to help average users.


Which is your personal opinion. Your personal opinion is not more important than the needs of a significant portion of users.

The proper question here has to be what pisses off more users and to what degree does it piss them off: Having to click "Don't ask me again" once after a fresh installation or repeatedly losing some tabs they had opened in the background, because they forgot about them.


A "Don't ask me again" checkbox is a good compromise.


> They will not fix something just because users report it.

Nobody should "fix" something only because users report it.


Right, they should only implement that style of feature if it is bugging Chrome Devs http://googlemac.blogspot.com/2011/06/q-i-didnt-mean-to-do-t...


What's the Chromium bugtracker for, then? O.o


For reporting and tracking actual bugs.


That simply isn't a bug, it's a feature suggestion that was rejected.



