
Think how ridiculous it sounds that Google would only have the URLs listed in robots.txt. They would've gone out of business long ago.



Do you know how robots.txt works?

It's an exclusion standard, not an inclusion one.

https://en.m.wikipedia.org/wiki/Robots_exclusion_standard

To help crawlers discover individual URLs, you can use sitemap.xml.
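To make the exclusion vs. discovery distinction concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the `ExampleBot` user agent are all made up for illustration; this is not how any particular crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com (illustrative only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A URL that robots.txt never mentions is still allowed: the file only excludes.
allowed_unlisted = parser.can_fetch("ExampleBot", "https://example.com/blog/post-1")

# A URL under a Disallow rule is what a well-behaved crawler skips.
allowed_private = parser.can_fetch("ExampleBot", "https://example.com/private/data")

# Discovery hints come from the Sitemap line (site_maps() needs Python 3.8+).
sitemaps = parser.site_maps()

print(allowed_unlisted, allowed_private, sitemaps)
```

Note that nothing here enforces anything: honoring the `Disallow` rule is entirely up to the crawler that reads the file.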

In case you know how it works (and I suppose so, considering your account age), your comment is just weird tbh.


Google scrapes web data, that is my point. It is the king of web scrapers.

Robots.txt does not fit into this argument. I'm not sure why it was brought up. Google doesn't scrape URLs listed there? OK. And so? Am I to believe that just because Google says so?

Google scrapes what it wants. It does so for its shareholders. It couldn't care less about web standards.

Source: AMP



