Surely you must be joking. Alphabet is the largest web scraper in the world. They would soon go out of business if robots.txt were the only data they scraped.
It’s not a web crawler. They are all web scrapers. And Alphabet/Google sells this data and makes profits from it.
It’s not as if it is trying to hide the fact that it is the king of web scrapers.
Google has gotten into trouble with various publishers over this before. It is no secret that there is a double standard in big tech.
Again, if you are going to arrest a web scraper, then arrest the king of all web scrapers first to make it fair.
Data wants to be free. If it is publicly accessible then it is fair game.
So, no source? Your response is unrelated to the statement at hand.
Think about it: Google has every advantage in respecting robots.txt and nothing to gain by ignoring it.
E.g.:
1) If a media company doesn't want to be crawled, it can block Google in its robots.txt.
Then it notices its visitor numbers drop, and it removes the block again.
Ergo: publishers sue, because they want the advantages without the scraping. That doesn't seem logical to me, since they currently give Google explicit permission to scrape their content.
2) If Google sometimes leaked personal documents protected by robots.txt, it could have a lot of lawsuits on its hands.
Robots.txt is a simple method to not get blamed.
Ignoring robots.txt could literally be a core business liability from my POV.
---
So please, a source beyond gut feeling, as requested before, would be greatly appreciated.
My point is that Google scrapes web data. It is the king of web scrapers.
Robots.txt does not fit into this argument. I’m not sure why it was brought up. Google doesn’t scrape URLs listed there? OK. So what? Am I supposed to believe that just because Google says so?
Google scrapes what it wants. It does so for its shareholders. It couldn’t care less about web standards.