Google using captchas to get humans to read street addresses captured by street view cars to improve maps results remains one of the most Googly things they've ever done. Genius, lateral, and a little weird.
Having to do free work for Google is more than weird for me. For this reason unless the page is really important I end up closing the tab.
If it was for an open service (such as OpenStreetMap) it would be a different matter though.
How is that "free work"? Automated usage can threaten a site's bottom line, and even its existence. A site resorts to Google's CAPTCHA service because they don't have the money to build their own detector. It's not a service that was free for Google to create.
edit: Also worth noting that some of this "free work" that users do for Google is used to improve bot-detection overall. The comment to this blog post (the post itself, and its paper are great reads) is a nice example:
Essentially, the user is complaining (and justifiably so) of being served the pre-Street-View versions of CAPTCHA; I had forgotten how bad they could get: http://i.imgur.com/01F2eES.png
The original recaptcha was used to help clear up text from book digitization projects where OCR couldn't understand the data. The appeal of this was that these works were in the public domain, and thus proper digitization of these works is a society-wide benefit.
Since Google purchased it, they've been using it to:
1.) Clean street view data for a proprietary product (Google maps)
2.) Build training sets for unknown ML purposes
These are activities that Google could very much pay a group of people to do. Instead, through recaptcha, they are getting that work from the end user for no payment. A case could be made that it's not free for the owner of the site that deploys recaptcha (because they get value out of the service, and Google gets data/ML services). However, the actual end user who has to fill out the recaptcha does not benefit in any significant way. Since a recaptcha is an inconvenience to the end user, that user pays both with the time to fill it out, and the data gathered by Google.
TL;DR Some people do not like that Google benefits from a transaction where Google is not a party, and where otherwise, Google could generate the benefit using their own resources.
Recaptcha was not a free service. Or if it was, why did Google end up paying millions for it? And it's not free now, unless the engineers and scientists working on it are working pro bono.
According to Wikipedia and the New York Times, reCaptcha was not developed for public domain works. Its pilot project was to digitize the NYT archives, archives which were not released to the public domain nor are fully available without being a subscriber: http://www.nytimes.com/2011/03/29/science/29recaptcha.html
I'm not a machine learning expert but I'm going to laugh at your suggestion that Google could "pay a group of people to do". In the above referenced NYT article from 2011, recaptcha's creator says several million words were being processed by recaptcha per day.
And again, have to disagree that the end user "does not benefit in any significant way". We would not be discussing this if Google hadn't learned from massive user data to iterate their captcha from distorted word mush to what it is today. Captcha was a serious drain of user energy and patience, that's why recaptcha was invented in the first place. And the worst captchas were tolerated because automated usage was a financial threat to websites that end users use.
I had totally forgotten how bad the word mush was. In perspective of this, I agree that we do get something back from doing unpaid work for Google, and that is a HUGELY improved user experience with the current reCAPTCHA like "click all the pictures of sandwiches" or "click the portions of the picture that have a street sign in them" compared to the earlier reCAPTCHA and other CAPTCHA.
It was provided as a free service[0] (search free). I'm pretty sure that no one in this conversation is saying that the cost to run the service is free, and arguing such is a strawman.
> was not developed for public domain works.
> pilot project was to digitize the NYT archives
While the original "recaptcha.net" website is no longer available online (even via archive.org), there are plenty of sources still available that belie this claim: "...that will help convert printed text into computer-readable letters on behalf of the Internet Archive"[0]. The team is involved in digitising old books and manuscripts supplied by a non-profit organisation called the Internet Archive[1]. "There's still about 100 million books to be digitised, which at the current rate will take us about 400 years to complete - Luis von Ahn, Carnegie Mellon"[1]. The sources are from 2007, when recaptcha was first introduced. There are more, but I picked the first two by going back to the beginning of the wikipedia entry for recaptcha and looking at the supplied sources.
This is before Google bought it in 2009. While I can't speak for anyone else, this is what I mean when I say "original" recaptcha.
> laugh at your suggestion that Google could "pay a group of people to do".
They got a group of people to do it for free, which implies (barring salaries) that they could also get a group of people to do it for not-free. Thus the argument that people don't want to work for Google for free.
> Google hadn't learned from massive user data
And when Google has met its ML goals and decides it gets no further benefit from recaptcha? Remember, it's not a free-to-run service. They only continue it as long as they benefit. If recaptcha shuts down, it would have been nicer to have the fruits of that work available through something like the Internet Archive, than something like Google Books.
> However, the actual end user who has to fill out the recaptcha does not benefit in any significant way.
This isn't true. The end user benefits in a very obvious way by being able to use a site that hasn't been crippled or spammed by bots. It is very possible that many sites either would not exist or would be of much worse quality without a recaptcha-like service.
It is "free work" since I'm helping train Google's algorithms and I don't get money back. Note that the service is free for the site owner, but not for the end user, who has to invest his/her own time.
A site uses captcha because it is threatened by automated use. A site uses Google's CAPTCHA because the site cannot afford to build an (effective) CAPTCHA. If Google required the site to pay to use their CAPTCHA, those costs would be passed in some way to the user.
In other words, if Google's service didn't exist, you'd be experiencing less of the site, or no site at all if they chose to use a paywall that you refuse to pay for.
I am not saying that the service is not useful. It certainly is. What I am arguing is that it requires the users to do some amount of work and so it is not free for them, and I am of the opinion that Google benefits from it much more than I do.
Anyway, I am not completely against it, as I said above depending on the situation I make use of it.
It's the usual comparative-advantage thing, though. Five seconds of your brain reading something is essentially worthless to you (unless someone is paying you to read this comment), but of value to Google. The resource behind the CAPTCHA is of minimal value to Google (what are they going to do with an account at startup-of-the-month), but of more value to you. The transaction is mutually beneficial.
The value Google gets from the thing they want, and the value you get from the thing you want, aren't particularly comparable.
The alternative would be you're doing different work (i.e. more intellectually demanding) on each different site as they invent their own CAPTCHA, and rather than benefitting anybody, it would just be an annoyance between you and the content you want to reach.
Taxes work the same way. You pay money (or, if you wish, you work for money you don't get to keep) but don't get money in return. Rather, the money goes toward societal improvements. You can debate whether they're societal improvements you personally care about, but it's not a very interesting debate, because societal benefits aren't meant to be uniformly attractive to everyone.
> Except, everything funded by my taxes has to be open, freely accessible, etc.
You must not be living in the U.S. Even as strong as the FOIA and public records laws are, there's a huge amount of information and records that are exempted from free access:
I'll take the mangled words over the new version, where Google shows you a variety of photos and you have to pick which ones belong to a category.
Does a sweet pastry count as a cake? Is this stretch of water a river, a lake or the sea? Are those trees I see in the distance on this photo of a mountain? Does the front of a restaurant or a veterinary surgery count as a "storefront"? As a British English speaker I had to guess at that last one, and some of the others seem culturally dependent too.
Or, my personal nemesis, the ones which ask you to select the squares containing road signs, and there's always a couple of squares containing a 3-pixel strip of the very edge of the road sign, and you don't know whether you're supposed to count those or not.
GP is talking about visitors doing free work for Google. Google can use the results to train its algorithms - this is a big set of data and very valuable.
Of course the site (and with it visitors) benefit in this exchange, so I agree with you that it is not entirely "free".
I actually feed google false information (select rectangular objects when asked to select traffic signs, flat surfaces when asked for water surfaces, and so on). It'll probably be filtered out when aggregated with other users, but I like to think that I'm at least influencing it even if a little.
I wouldn't have a problem if they'd open the data set, but I'm not exactly keen on being forced to work for free.
So you'd do the work if it were not productive but not otherwise? Like, if they just threw away your input into the CAPTCHA after confirming you're human, you're okay with that? Or do you just hate CAPTCHAs on principle?
It's an ethical distinction for some. I am completely fine with a type of captcha that helps improve some public domain or free/libre licensed resource (e.g., proofreading words scanned from public domain works). I am not okay with helping some megacorporation sanitize their proprietary database or training their machine learning neural networks for free.
As mentioned elsewhere in this topic; I'd much rather spend some of my free time to help improve OpenStreetMap (and I do).
But to me this is not a Pareto improvement (i.e., this is subjective). I consider the bolstering of the hold Google (and Apple, and Amazon, and Microsoft) has over a number of common domains (and in a broader sense our lives) to be undesirable and unethical. So when I correctly teach Google's proprietary image recognition software what a traffic sign, mountain, store front (ugh, these are the worst), or tree is, I am (even in the strict economic sense of a Pareto improvement) harming myself.
Imagine you're at an airport and an inspector takes your batteries out of your luggage. All other things being equal, does it matter if he tosses them in the trash, or keeps them for his personal use?
From one perspective it doesn't matter - your batteries are gone either way.
But from another perspective, inspectors who keep what they find have completely different motivations - making the same outcome much more sinister and corrupt.
It makes sense there because the actual consequences are undesirable. What are the undesirable consequences we're incentivising here? That Google will cover the entire web in captchas?
That Google gets an unfair advantage compared to other companies, making it harder to start a new company able to compete with Google on Machine Learning? (Which is already impossible, because Google uses the knowledge from ReCaptcha, GMail, etc unfairly)
What's going to stop you from just stealing the content from behind the paywall because the profitability of their chosen business model is not your concern?
And you really believe that to be ethical and normal? Do you apply those same beliefs to the physical world? Is it okay to just take goods and services with no payment as long as you are physically able to?
No, I do not believe it is okay to take physical goods or receive services with no payment. I do however believe that people should be free to send and receive any information they wish, regardless of the wishes of the "owner" of said information. Sharing information without the right holder's involvement will not make them lose access to it.
If you, as a site owner, don't want people to share information your server sent to their computers, implement strong DRM or don't allow off-premise access altogether. You might lose many prospective users, but at least your information will remain "safe". Probably.
If you want to share information on a website but can't think of a reasonable way to make money off it, that's your problem, not mine.
Lastly, I fail to see how a discussion on copyright relates to recaptchas or even site-owners' failed business models. Unless you can get this thread back on topic, perhaps we should continue this (most likely fruitless) discussion somewhere else.
I feel this is on topic as it relates to you using Google services without payment.
This isn't information sharing, this is using a service which has costs (hardware, software, employee time, power, cooling, etc...). You are using the ReCAPTCHA service. If you don't pay them for that (either in time or in money) then in my opinion you are stealing the cost of that service.
Just like how if you got a taxi, then just walked away without paying. Yeah, you aren't stealing any physical goods, but you are costing the taxi company something and you aren't paying for it. And if you agreed to get a free taxi ride in exchange for washing their taxi car for them, and you just walked away at the end, that would be stealing in my eyes.
Using the reCAPTCHA service lets them use the data from it for machine learning. That's the reason they are providing it for free, and if you attempt to use the service without paying (or in your scenario maliciously provide false information, which not only doesn't contribute, but can mess up the results from those who do), I feel you are stealing it just the same.
I disagree that it is stealing because I never agreed (or wanted) to participate in this program. The site owner did.
I'm under no obligation (moral or otherwise) to make honest contributions. If we wanted to continue with your analogy, it'll be like being forced into a taxi, then driven one meter, then forced to pay a large fee because the landlord doesn't want me to walk on his lawn. I didn't choose to ride the taxi. The landlord did (ignoring saner options).
If google wants a revenue from their service, they should consider charging site owners.
If google wants to train their NNs, they'll have to pay people to classify data for them.
If google doesn't want to do either, and doesn't want people like me contributing to their system, don't let site owners allow us to use their site.
Otherwise, I think google should shutdown the service.
If you really want to keep using the analogy, it's like you going to a restaurant and they require you to take a taxi to the actual location (and tell you the cost upfront), and you just take it then refuse to pay.
You do have the option of not using a service that uses recaptcha. Nobody is forcing you to fill them out, you are making the choice to do it because you want the service the site owner is providing, and the site owner wants recaptcha.
If a taxi company keeps getting screwed over by patrons of a given restaurant, the company should stop providing service for said patrons, seek compensation directly from the people who require their service (the restaurant), or seek legal action against either the patrons or the restaurant.
4chan used to have a campaign where they'd collectively replace the obviously-unknown word in the ReCaptcha text captchas with the n-word to poison the results...
What's wrong with them building a profile of me as long as I consented to it? Are you opposed to consenting agreements between willing adults and companies?
Then in this case the problem seems to be ignorance. Ignorant people will always find themselves in trouble though. Anyway, you still haven't proved that they wouldn't be ok with this even if aware of it. I've talked to many people about this and most were ok with it. I think you're assuming a lot here about what others care about.
It seems like you have a problem with people consenting to things you don't like, just like homophobes have problems with gays because they personally don't like consenting to same sex intercourse and think gays are ignorant of some facts (they're going to hell etc), maybe people don't care or believe in these facts and are just enjoying themselves, ever thought of that?
What's the problems if other people enjoy things you don't? Live and let live.
Well, if we have people willing to kill us based on the type of socks we like or porn we enjoy, we have a bigger problem than the fact that this information is being collected. It's not like people willing to kill for such ridiculous reasons would be stopped by the lack of available data. Crazies don't really care about fact and data in general. I think you're assuming a bit too much about them.
And how's that different from advertisements in newspapers/magazines and on TV? People never seemed to have a problem with that, and no one ever said "you are the product". Somehow, that is associated mostly only with Google.
And if at some point, the industry shifts to direct brain-stimulation to influence behavior, someone will say "how is that different from targeted advertising? People seemed to have no problem with that..."
How exactly would you expect that "having a problem with it" manifests? By not watching TV and not reading any kind of newspaper?
First off: tu quoque. You're changing the argument, which I'll take to mean you've conceded the point.
But since you ask, it's very much the same, and people have very much had a problem with that, which raises a second fault with your argument: false premise.
Noam Chomsky. Pretty much his life's work.
Jerry Mander, Four Arguments for the Elimination of Television. (NB: Mander is a former ad executive himself).
A couple of Stanford researchers in the 1990s recognised the corrosive effects of advertising on incentives in the provision of online services. The prior awareness didn't save them from falling into the same pit: http://infolab.stanford.edu/~backrub/google.html
Neil Postman. Technopoly and Amusing Ourselves to Death.
Vance Packard, The Hidden Persuaders
Naomi Oreskes and Eric Conway, Merchants of Doubt looks at influencing on a much broader basis.
Banksy's art is, in many regards, a direct refutation of advertising. The letter attributed to him on the topic is originally by someone else however. It remains exceedingly good reading:
http://thefoxisblack.com/2012/02/29/banksy-on-advertising/
Because ads are evil. They are the bedrock of all worldwide bullshit and scams. They make sellers stop caring about reputation, competitors and eventually product quality. They turn product competition into ad competition. They make all products more expensive because of this “advertising tax”.
They turn off the customer’s head to make a zombie who buys the best-advertised product instead of the one with the best quality. They are good only for advertising companies and scammers.
Ads are a great way to make your product known. Have you ever run a business? At some point if you want to participate in the market of voluntary exchange of goods, services and ideas, you need to advertize. The only countries where ads are forbidden and nowhere to be found are North Korea and before that USSR. Very depressing places. I love seeing people expressing themselves and showing off what they've built and trying to sell it. What's wrong with that? Advertizing oneself is one of the most human traits we have and has always existed one way or the other.
Yes, I’m saying this from my business experience. I’m selling software. It was cracked and the thief is easily selling it with AdWords now. He has the top ad position in Google (despite several DMCA notices, and not only from me) and Google doesn’t care about it as long as they get good money from the thief.
Google must share responsibility with the scammers for false advertising and the customers and sellers should have a way to make Google pay for promoting scam. The ads cost should be proportional to the possible damage.
I do that not because I want to not be tracked, but because I am bad at computers and don't want to accidentally install a virus because blatting and starting again would be a bunch of effort and it's not like my stuff is backed up or anything.
Sometimes though I wonder if I shouldn't turn adblock off, for example, and then I'm at work and some website auto-plays video and I remember.
Imagine, if you will, a scene like this. It is dusk. You are out running in the park, trying to get exercise before the last light fades. Suddenly you realize: A group of people have been closing on you, boxing you in from all sides.
You try to run. Your muscles burn, you heave for breath. They match your pace exactly, coming closer... and closer... and closer. You cast your gaze desperately from side to side. Isn't anyone around? You're scared. You're not a strong man, and these people all look like body-builders and criminals. No, now that they've come closer, you're not sure they look like men at all. Is that a tattoo, or is it... is it a tusk?
Then you stumble over a rock, fall, and scrape your hands. While you're rolling around and trying to get back to your feet, the men stop in a tight circle around you. They look down at you.
"What," you begin. Your voice is quavering.
"Take backups," the men say, as one. "Take backups, for you never know what may happen."
Did I claim to be using their services? Stick to what you know if you don't like reading what I actually write.
I was questioning someone feeling "hypocritical" for using other google services but disliking their new recaptcha tool.
As for "the contract": I'd wager that for most people, most contracts they see/sign are easier to get a clear understanding of, than Google's business model of "free" services.
> As for "the contract": I'd wager that for most people, most contracts they see/sign are easier to get a clear understanding of, than Google's business model of "free" services
Oh bull shit. The contracts you sign (or click I agree to) are thousands of legal definitions and words that most people don't even read, much less understand.
Google's contract is very simple: We show you ads (the more we understand your interests, the more targeted our ads) and you get to use our stuff without monetary cost.
Have you tried asking your non-IT acquaintances whether they knew and understood the tracking part? Because my experience is that most people don't, even if you think it should be obvious.
The original inventor also went on to found Duolingo, which uses teaching someone a language as a way to translate web pages as a service. I like that model a lot more, as at least you get something in exchange (the language learning)
To be fair, if you use google maps, and don't have any ethical issues with helping Google, you do get a return on investment, in that you get improved ML capabilities and improved address recognition in google maps.
I see lots of those on Tor browser. There's nothing about entering street numbers, just identifying images that include street numbers. But perhaps that's also valuable. And what about the other major types, such as rivers, mountains, store fronts and street signs?
It depends on whether you're presented with the v1/noscript version or the v2 captcha. Desktop users may prefer the noscript version since it's faster to fill out with a keyboard if you're already typing into a form anyway instead of having to switch to the mouse.
I'm talking about image-based v2. I have better luck with them if I allow Javascript. I don't recall seeing the code pasting ones until recently. I often found v1 impossible. And buggy, in the sense that they kept repeating even when I clearly got them right.
Not sure if that's good for us. We're doing work for google for free. Google Translate will replace translators, Google car will replace drivers, Google VR + map will replace city guides. We're giving them knowledge and powers to replace us, for free. Professions will disappear, people will lose jobs, Google will get richer and richer, will not pay me or you for fixing their suggestions in Google Translate or fixing maps or improving computed route from A to B.
No one is being paid for that work, but someone is monetizing it.
I consider it more a donation towards posterity than getting stiffed on some short term compensation.
It's just an opinion, but if the things Google has created and shared into the whole of human knowledge towards the betterment of future societies are still being used in a century for my grandkids (and the rest of humankind) to benefit from, then I think it's better than not having it because a few million people who couldn't have done it themselves said 'no' over a pittance.
I can't speak for you, but my dying thoughts won't be a sour reflection on all of the opportunities to monetize my existence I might have missed.
There is some trade off when you think of the benefits we get for free, in return, from Google. I barely recall what it was like to buy a map, write down directions, and pray to God I didn't get thrown off on the way.
> the things Google has created and shared into the whole of human knowledge
Do they really share it, though? You can't download and reuse data from Google Maps or Street View the way you can from OpenStreetMap, say.
I'm not saying this is a fatal problem. Google Maps is a lot more popular than OSM so the free-with-ads closed source approach clearly has a lot going for it. But the data isn't public, they own it.
My brother teaches math to at risk youth in the Denver area. He has students that don't speak a lot of English (their primary language is Spanish) and he's trying to explain math concepts to them. He realized he can use google translate and get 95% of the information across that he needs to.
I've never personally hired a professional translator and I don't think I would if I was going to travel to a country where the majority of people don't speak English because I assume a professional translator is expensive. However if I know I have free access to google translate which will be useful enough for me to navigate by myself I would be much more likely to go on such a trip.
I'm sure some jobs will be lost but at least for middle class people who aren't able to afford translators the technology will be used to communicate better without affecting professional translators.
There are plenty of similar professions that exist but are not utilized by the middle/lower class due to cost. I would love to hire an interior decorator, as I'm sure plenty of home owners would, but I haven't due to the cost and likely never will. If Google (or any other company) offered a service where I put photos of my home online and it gave me a free layout with online links to purchase the furniture I would be thrilled and no interior decorator would be out of a job because I wasn't going to pay for one anyway. I think it's all about the level of quality you want.
Google translate is decent for some languages and really poor for others - so I'd run it by some native speakers before you rely on it for anything important.
Work in the Language Service Industry. Google translate for a few languages is decent at best for social conversations. It is never used in serious business translations. The technology still has a long way to go.
and some don't, so this point of view is arguable, to say the least.
Regardless, translation is an intellectual work, driving (depending on the point of view), isn't. Same goes for city guides; people don't necessarily prefer an electronic device to a human.
Goog-411 predates that quite a bit and is arguably more Googley. Launching a free 411 service (at a time when you usually paid per minute for that) and using it to improve your voice recognition algorithms.
It was essentially "OK Google" for dumb phones.
For anyone not familiar, 411 (in the US at least) was the directory service number. You could call it to get things like what time a restaurant was open until. Directions to the airport. Etc.
I wonder about the error-check thing. If at least one paid employee had to enter a first-set of data that was 'absolutely correct' before being seen by multiple people online which then 'solidifies' the correct responses. I briefly read about this somewhere but still not sure. I mean it does obviously work in the sense of entering something blatantly not correct is flagged.
Probably no need to have someone seed it. You just show the same image to a thousand people, and if 900 of them answer it one way that's almost certainly the correct answer.
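For what it's worth, the majority-vote idea can be sketched in a few lines of Python. This is only an illustration, not how reCAPTCHA actually works: the function name `consensus_label`, the 100-vote minimum, and the 90% agreement threshold are all assumptions made up for the example.

```python
from collections import Counter


def consensus_label(answers, min_votes=100, min_agreement=0.9):
    """Return the majority answer, or None if there is no clear consensus.

    Hypothetical helper: the thresholds are illustrative assumptions,
    not values taken from any real CAPTCHA service.
    """
    if len(answers) < min_votes:
        return None  # too few responses to trust any majority
    label, count = Counter(answers).most_common(1)[0]
    # Accept the top answer only if enough respondents agree on it.
    if count / len(answers) >= min_agreement:
        return label
    return None


# 900 of 1000 people read the house number as "221"; the rest disagree.
votes = ["221"] * 900 + ["227"] * 100
print(consensus_label(votes))
```

In a scheme like this, an image that fails the threshold would simply stay unlabeled and get shown to more people, and a lone user entering deliberately wrong answers would be outvoted.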
Interesting. I'm inclined to think that this is not good. Consensus makes sense, but it seems to me it breaks down if everyone is wrong. But the quality-check (seed guy) could be wrong as well.
edit: still, it's far better than the previous state of captchas. I'm glad they did this. But it's like for anything to be considered "advanced" or "good" in tech lately, it has to have been powered by "machine learning".
Well, technically, it is machine learning. Only that the machine learning was likely part of the usual data mining on google accounts and not much specific to the captcha problem...
(That said, whenever I used that checkbox widget they had before this announcement, there was a noticeable framerate drop in the browser while the thing was doing its magic. So I suspect, they are at least doing some browser fingerprinting/benchmarking to see if the widget runs inside selenium or a stock browser.
I also remember rumors that they analyze keyboard/mouse input on the page and check if it looks "human", but I'm not sure if that's true.)
Yeah it's basically browser fingerprinting (incl. GPU fingerprinting, hence the slowdown) plus google cookie.
If your browser is standard (AKA no anti-fingerprinting plugins) and your advertising cookies are not blocked (privacy or adblocker plugins) you'll probably pass with no issues.
If either of those is not true, you have to solve a bunch of image captchas.
Mouse/keyboard input analysis was just marketing talk; at least when they first released the nocaptcha it wasn't even captured.
It doesn't seem to be fundamentally different than reCAPTCHA. They will probably replace the click-this-checkbox box with a set of elements already on the hosted page. Only indexing a page's HTML isn't good enough -- it's better to also know how people interact with the HTML. Unfortunately, as soon as they click a link on Google, Google is no longer invited to the party, and is blind to how the user interacts with the page.
The next best thing a search company can do is have every website willingly track their users' mouse and key movements, and then willingly send all of that data to the company's inbox. In return, Google provides them with a binary classifier trained on all of the user click-stream/click-move data which determines whether or not the user is a bot!
It's an OK deal for the website owner; it's a great deal for Google. Not to mention, the user is now sending anonymous data to Google, at the expense of the website's Privacy Policy.
Google gets website owners to willingly install live-cameras on every corner of their website, and then willingly send over all of the footage, in exchange for "protection" from bots. Cough The Government gets citizens to willingly fill out lengthy tax forms, and then willingly send over a bunch of money, in exchange for "protection" from criminals. Cough
why are you implying that taxation is somehow a protection racket?
That's a pretty radical notion wrapped in a matter-of-fact language. Strikes me as a bit dishonest. Taxation pays for plenty of things besides police and military.
Instead of the user having to click the "I am not a robot" checkbox, then the submit button (or w/e), this basically binds clicking the submit button to reCAPTCHA. So whatever checks clicking "I am not a robot" would run are instead done automatically when submitting.
No idea how the additional prompts (e.g. "select the parts of the image with a street sign") are shown nicely in this "invisible" UX though.
While I like the idea of not having to deal with these annoying reCAPTCHA prompts, something about this feels intrusive. Does this mean Google is going to keep track of what I am doing when I visit a site?
Say for instance I am signing up for a website, does the password I enter get sent to google servers to be analyzed now?
My standard internet security suite is ublock origin, noscript, self-destructing cookies, privacy badger and httpseverywhere.
Privacybadger does an ok job against tracking pixels. It uses machine learning to try to detect when a specific url or domain seems to be "following" you around the internet, which could indicate a tracking pixel among other things. It then blocks these entities and gives you the option to override.
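To make the "following you around" idea concrete, here is a minimal sketch of a counting heuristic: remember which first-party sites each third-party domain has been seen on, and block it once it shows up on enough different sites. This is not Privacy Badger's actual code; the class name, the threshold of 3, and the example domains are all assumptions for illustration.

```python
from collections import defaultdict

# Assumed threshold: block a third party once it is seen on this many distinct sites.
BLOCK_THRESHOLD = 3


class TrackerHeuristic:
    """Toy cross-site tracker detector (illustrative only)."""

    def __init__(self, threshold=BLOCK_THRESHOLD):
        self.threshold = threshold
        # third-party domain -> set of first-party sites where it was observed
        self.seen_on = defaultdict(set)
        # domains the user has explicitly allowed despite the heuristic
        self.overrides = set()

    def observe(self, first_party, third_party):
        """Record that `third_party` was loaded while visiting `first_party`."""
        if third_party != first_party:
            self.seen_on[third_party].add(first_party)

    def is_blocked(self, domain):
        """Blocked once the domain has appeared on enough distinct sites."""
        if domain in self.overrides:
            return False
        return len(self.seen_on.get(domain, ())) >= self.threshold
```

Adding a domain to `overrides` corresponds to the "option to override" mentioned above.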
And yes, I do think tracking pixels are super evil. It could be standard for websites to have a bar at the bottom with little logos[1] showing the companies that are tracking you, with the logos being the remote-loaded content. The fact that these companies feel compelled to make it 1x1 transparent pixels tells me very clearly that they know they are doing something people don't want them to do, because they've gone out of their way to hide it. It's a clear misuse of browser capabilities, yet they do it anyway. What a pack of cunts they are.
[1] or textual short names company ticker style if you want to minimize bandwidth
They're an even bigger thing in emails. Usually, companies include a pixel from an external resource and then figure out whether you've read the email by checking whether that image was loaded.
IIRC, GitHub does that. If you read the email of some notification, it won't show you that notification in your notifications. The solution is to block external image loading in your email client (I know that Thunderbird has that, and I know that Zoho's email client on Android has that).
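For anyone wondering how the email variant works mechanically, a rough sketch: the sender embeds an image URL carrying a token unique to each recipient, and any later request for that URL reveals who opened the message. The endpoint URL, the `t` parameter, and both function names are made up for illustration; real senders will differ in the details.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint; any URL that serves a 1x1 image would do.
PIXEL_ENDPOINT = "https://mail.example.com/open.gif"


def make_pixel_tag(recipient, tokens):
    """Return an <img> tag whose URL carries a token unique to this recipient."""
    token = secrets.token_urlsafe(16)
    tokens[token] = recipient  # sender remembers which token belongs to whom
    url = f"{PIXEL_ENDPOINT}?{urlencode({'t': token})}"
    return f'<img src="{url}" width="1" height="1" alt="">'


def recipient_for_request(requested_url, tokens):
    """Map an incoming image request back to the recipient, or None if unknown."""
    query = parse_qs(urlparse(requested_url).query)
    token = query.get("t", [None])[0]
    return tokens.get(token)
```

This is also why blocking remote images works: if the client never requests the URL, the sender never sees the token.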
Yep, tracking pixels are among the oldest forms of internet surveillance, predating the far more aggressive and intrusive JavaScript companies use today. They're one of the reasons why most mail clients don't load images by default. They only give information on page loads, not obsessive behavior tracking, but they're harder to block.
They're why I block doubleclick.net, google-analytics.com, etc. in my hosts file rather than just blocking their JavaScript.
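One practical wrinkle with the hosts-file approach: hosts files have no wildcards, so each hostname needs its own line. A small sketch that generates the entries (the `0.0.0.0` sink address and the extra `www` label are my assumptions; add whatever subdomains you actually see in your traffic):

```python
BLOCKED = ["doubleclick.net", "google-analytics.com"]


def hosts_entries(domains, sink="0.0.0.0", extra_labels=("www",)):
    """Build hosts-file lines; every hostname gets its own entry (no wildcards)."""
    lines = []
    for domain in domains:
        lines.append(f"{sink} {domain}")
        for label in extra_labels:
            lines.append(f"{sink} {label}.{domain}")
    return lines


if __name__ == "__main__":
    # Print lines suitable for appending to a hosts file.
    print("\n".join(hosts_entries(BLOCKED)))
```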
Gmail does something to divert it - it downloads all the resources once and CDNs them for you, so the email author can't see who and where opened the email.
Adblockers such as uBlock already block many of these since they're hosted by ad agencies. If someone is determined they can bypass the filter easily though; you'd need a static filter for each of these then.
The only global solution would be to block all remote images which would likely break many websites. Also, nowadays so many sophisticated tracking methods exist I'm not sure it's worth the hassle.
It's just one more of the many elements of Google puzzle. They control most web searches, GA is a de facto standard of web analytics, Gmail is the most popular mail service, not to mention other free services, Android etc. etc., Google can easily track you all the way from the moment you open your browser (likely, Chrome...), through your searches, visits and actions.
It's high time people realized that giving so much power to one company, just for short-term convenience, is extremely dangerous. Google might have benevolent management now, but this can easily change, and it is scary to think what could happen then.
The saving grace here is that there are alternatives for all those services and I can switch pretty easily at any time so honestly it's less urgent to me
don't they already? Almost everyone uses Google analytics and most websites that have ad placements use the Google network. Where do you think the data for those comes from?
That probably refers to checking whether the mouse moves like a human or more bot-like.
Recaptcha does more than that though. It checks if you are logged into any other Google services, whether your browser user agent matches your actual browser, and I think one or two more things.
It's more urban legend than fact, recaptcha checks your google cookies and little else. They initially marketed it as looking for "human signs" such as mouse movements but they didn't, and AFAIK they still don't.
"NoBot is a control that attempts to provide CAPTCHA-like bot/spam prevention without requiring any user interaction. This approach is easier to bypass than an implementation that requires actual human intervention, but NoBot has the benefit of being completely invisible."
Seems like a bait-and-switch. Do free labor for a good cause (PD books), and it turns out you're just growing Google's library, which can be taken down on a whim.
I was browsing Upwork the other week and I got trapped.
I couldn't load any more jobs, or view any more workers. I had to inspect the Network tab in Chrome, open the API request and then click a "I am not a bot" on their API page.
I also expect I will start getting silent registration failures with this. I'm using uMatrix to block most third-party scripts, and it's usually quite clear when I have to allow extra things with traditional captchas.
Then again, it's allow-more-and-retry on many pages already so nothing new in that regard.
Doesn't anyone else see it as a problem that Google's bots (and possibly the CIA/NSA) can access any system using Google reCAPTCHA? They can create an unlimited number of accounts on social networks, blogs, forums etc. and thus have unlimited social voting power on the Internet.
There is a need for a decentralized Captcha that can't be circumvented by anyone.
Google owns google. If they really wanted to manipulate people they would manipulate their own search result and not go through the trouble of spamming other sites.
Using some other bot-filtering system would prevent it. Of course the makers of that filter would be able to bypass it, but I guess you could use multiple filters...!
Guessing it's your browsing history (courtesy of including js from a Google domain) plus mouse tracking.
It doesn't seem much different than the current "click here" one to me. They are just letting the page owner substitute their own button in lieu of the check box.
I assume it pops up a challenge, again, just like the current one.
Edit: Yep. "Human users will be let through without seeing the "I'm not a robot" checkbox, while suspicious ones and bots still have to solve the challenges."
It really is just the current one, with the site owner's button instead of Google's checkbox.
I remember when they advertised it with "you just need to click a checkbox". In reality, I always have to solve a streetview captcha, possibly several times in a row.
There is usually an audio version available, presumably screen readers are pre-configured to present the audio option on popular captchas by default.
I'm curious now, will have to give it a whirl when I get into the office :D
Suspect Google will rely on their vast knowledge of people's browsing habits based off IP/account/ad-tracking/browser-fingerprinting to skip the user input aspects. Although that said, a screen reader won't have the standard physical interaction clues client-side that a user is a real person, mouse tracking for ex. is probably a moot point. Not really sure how Google will handle those, or if blind users will get a degraded always-on captcha experience.
Either that or a headless screen reader becomes the scraping/botting tool of choice.
Speaking of. Audio captchas are one of the most spooky things I've heard. Noise in the background and muddled voice listing numbers. Think numbers stations, except they try their best to make the numbers unintelligible (just like regular captchas tend to do with written words).
I'm less aware of how the more modern captchas work under the hood, but I do know that the audio interface (provided for accessibility) of an older version of captcha was used as an attack vector for defeating it. My understanding is that the optional audio file was run through an STT program to generate the answer text so that requests could still be automated. Translating speech to text is less computationally expensive than OCR, especially when captchas were typically using hard-to-read or intentionally distorted images of text.
Someone who knows more about this than me can certainly school me on that last bit, however.
I think this sounds like further and deeper (maybe multi page) tracking to build up a profile that Google deems acceptable. I think I'm off to train a bot to behave more like a human when it interacts with a webpage...
Concerning the general problem of captchas:
What about paying a small fee instead of filling out increasingly complicated forms?
Say 10 cents.
You could set up a service where the website's owner could decide whether the money goes to them or to, say, a foundation of their choice.
I'd love to hear some commentary from someone involved in the business of bypassing captchas. I have to assume "google invisible captcha hack" is getting bid up right now.
Me too. If you want to spam or hack surely you'd just outsource the captcha solving to a cheaper country. Or even sites like 2captcha that give you 1000 solves for around 50c. You can still make a robot but just send the captcha to a service like that, and that's not even one of the cheapest ones. Especially when you consider what the real value of the data (or even spamming) is.
Every now and then I have to train models (mostly neural networks) that bypass captchas.
My company provides web automation in Brazil, not for spam, not for marketing purposes. Sometimes the client has a website (like an intranet) that she wants to automate through screen scraping. I know it sounds stupid, but that is pretty frequent.
The models are fast (most captchas solved under 100ms in AWS Lambda) and accurate (95 to 100%). I have about 40 different captchas being solved this way.
My team is still working on a solution to nocaptcha recaptcha. The problem for me is not solving the challenges, but submitting them like a human would do.
I'm making progress with headless Chromium. As more and more sites add this captcha, I need to find a solution to automate it. And I'm confident I will find one.
They should add a contest to break it. Cause that sounds like the only real thing they can do is track the movement of the mouse / keyboard, and that sounds quite simple to fake.
Google Analytics and Ads are present in a large portion of the Web. So it is much more difficult than that. It requires you to program the bot to follow a sequence of links unrelated to the bot's goal, idle on some pages, click links that make sense to the session, enter some text sometimes in comment fields — in essence, react to content the bot has never seen before.
Google is betting that they can extract meaning from web pages better than bots, and they have had a lot of experience with that. On Web pages, each link does not have the same probability of being clicked by a human given the list of pages seen before. Knowing that probability requires the bot to understand what a human would see, and to perform actions that match a given goal which corresponds to the sequence of pages browsed.
And that goal mustn't always be the bot's goal. Bots have business incentives: they want to get people to do something by writing text that will be seen by humans. Humans, on the other hand, only do so once in a while.
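The "each link does not have the same probability of being clicked" idea can be illustrated with a toy first-order Markov model: count page-to-page transitions in sessions assumed to be human, then score a new session by its average log-probability. To be clear, this is only a sketch of the general concept, not a claim about what Google does; the page names, smoothing constant, and `vocab_size` are arbitrary assumptions.

```python
import math
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count page-to-page transitions seen in sessions assumed to be human."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for src, dst in zip(pages, pages[1:]):
            counts[src][dst] += 1
    return counts


def avg_log_likelihood(pages, counts, smoothing=1.0, vocab_size=100):
    """Average log-probability per click; lower means less typical of the training data."""
    steps = list(zip(pages, pages[1:]))
    if not steps:
        return 0.0
    total = 0.0
    for src, dst in steps:
        row = counts.get(src, Counter())
        # Additive smoothing so unseen transitions keep a small nonzero probability.
        p = (row[dst] + smoothing) / (sum(row.values()) + smoothing * vocab_size)
        total += math.log(p)
    return total / len(steps)
```

A bot that jumps straight to the signup form from an unrelated page would score lower than a session that follows commonly seen paths, which is the asymmetry the comment above is describing.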
I started making my own as Google reCAPTCHA was getting annoying. At its most basic, hide a field. On your PHP page that does the checking, simply see if that field has anything in it. If it does, it's a bot. If it doesn't, it's not a bot.
I've seen this called "Honeypot CAPTCHA"[0]. I'd say it's good for 90% of sites, up until the point where spammers or fraudsters start targeting you specifically.
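The server-side check for the hidden-field trick is tiny. A minimal sketch in Python rather than PHP, where the field name `website` is a hypothetical choice and the form is assumed to arrive as a plain dict of strings:

```python
# Hypothetical name of the hidden form field; humans never see it, so they leave it empty.
HONEYPOT_FIELD = "website"


def looks_like_bot(form):
    """Return True if the hidden honeypot field was filled in."""
    value = form.get(HONEYPOT_FIELD, "")
    return bool(value.strip())


if __name__ == "__main__":
    print(looks_like_bot({"name": "Ann", "website": ""}))
    print(looks_like_bot({"name": "x", "website": "http://spam.example"}))
```

The field itself would be hidden with CSS in the page markup, so naive bots that fill every input give themselves away.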
It just doesn't show up at all. Placed right before the form ends. I also use a combination of PHP and Javascript to create my own captcha. PHP generates a random string which I send to Javascript to hold. Upon clicking a button, if what is in the captcha does not match what is in the Javascript, thou shalt not pass. Haven't had any bot break through it yet, but I did have a Russian hacker email me, pissed off (entire email was in Russian, but I'm pretty sure the translator did well to know he wasn't happy), because he had created a bot specifically for my website that managed to create over 2000 posts in less than an hour.
You can view the captcha at https://mypost.io/ .. you cannot create a post without entering the captcha. I ended up creating my own because despite entering the correct answer (selecting the right images), Google Recaptcha would not recognize it.
Here's the problem Google needs to fix. If I'm logged into my account, years old, obviously not a complaint against it, I still end up getting this captcha nonsense. Doesn't matter the site.
It's quite simple. The boot kicks the ball, the balloon carries it up to the Ferris wheel, the ball drops onto some vertical pulley system and raises the red thingy, which then tells the browser that you are not a robot.
So I didn't see how it was done, but in a way that makes sense. I guess Google doesn't want to divulge what they're doing as to not give the bot makers any sort of insight on how it works.
I'm wondering if this might not be illegal under several countries' data privacy laws.
Users need to be directly and visibly informed if and when their data will be transmitted to third parties.
As Google's ReCaptcha is based on tracking what websites you visit, what search terms you enter, correlating this data, and comparing this tracking profile of yours, it's quite problematic that the user doesn't even see any captcha anymore. With the previous captchas, site owners could keep pushing the legal problems to Google, partially.
But this new solution doesn't fit with the EU Data Privacy Directive in either the intention or the letter of the law.
IANAL, this is not legal advice. You can not use this in court.
To use the reCAPTCHA service you must agree to its Terms of Service agreement that clearly specifies that you must inform users on the collection and use of the data collected by reCAPTCHA + get consent from EU users [1]:
> You acknowledge and understand that the reCAPTCHA API works by collecting hardware and software information, such as device and application data and the results of integrity checks, and sending that data to Google for analysis. Pursuant to Section 3(d) of the Google APIs Terms of Service, you agree that if you use the APIs that it is your responsibility to provide any necessary notices or consents for the collection and sharing of this data with Google. For users in the European Union, you and your API Client(s) must comply with the EU User Consent Policy [...]
I think you are raising a valid point but this is outside my expertise. Could a site owner satisfy the EU Data Privacy Directive by simply notifying users that their activity is being provided to Google to prevent abuse on the site? I imagine something similar to the cookie warning.
The spirit of the EU directive is that you must get consent from your users before you attach third party cookies to their browsers. The setting in the browser is not enough because most people don't even know about it (not that they care about the cookie law banner). So I guess it will be the same with the captcha, if the information sent to Google can be used to track people. But I'm not a lawyer.