Considering that it's Google, I'm honestly surprised they haven't created a craw...

patio11 · on Oct 1, 2010

Google both heuristically parses JavaScript and executes it, for at least some fraction of its crawl. This is obviously much more expensive than parsing HTML, particularly as the Internet moves from largely static HTML to AJAXy magic.

pseudonym · on Oct 1, 2010

Granted, but if Google can't parse it, I highly doubt anyone else is going to. And as nice as it would be if every web developer read the "Google Guide to Being Nice to Our Web Crawlers", as an internet we still can't get away from IE6. The term "pipe dream" comes to mind.

eli · on Oct 1, 2010

Uh, I thought they did. http://blogs.forbes.com/velocity/2010/06/25/google-isnt-just...

paradoja · on Oct 1, 2010

That would be a fairly bad idea. JavaScript execution, and AJAX calls in particular, frequently change data. I wouldn't want my crawler to somehow delete or edit lots of data if I were Google.

pseudonym · on Oct 1, 2010

No more or less than following a link with a GET variable. I'm trying to find the article posted on TDWTF regarding a link that went to "&delete=true" and dropped the entire database... but Ajax calls are no more or less frequently data-changing than any one of a hundred links on survey sites that send their data with GET variables.

Edit: Found it! http://thedailywtf.com/Articles/WellIntentioned-Destruction....

RyanMcGreal · on Oct 1, 2010

As long as the crawler can't execute POST requests, no one who has built their web application properly will have any problems.
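
To make "built properly" concrete, here is a minimal sketch of the idea: state-changing actions only run on POST, so a crawler that only issues GET requests can't trigger them. The action table and the `is_allowed` helper are made up for illustration and don't come from any particular framework.

```python
# Methods a well-behaved crawler is expected to use; these should never change data.
SAFE_METHODS = {"GET", "HEAD"}

# Hypothetical action table: action name -> whether it mutates server state.
ACTIONS = {
    "view": False,
    "search": False,
    "delete": True,
    "edit": True,
}


def is_allowed(method: str, action: str) -> bool:
    """Return True if `action` may run under the given HTTP method.

    Mutating actions are accepted only via POST; read-only actions are
    accepted via GET, HEAD, or POST.
    """
    method = method.upper()
    if ACTIONS[action]:
        return method == "POST"
    return method in SAFE_METHODS or method == "POST"


if __name__ == "__main__":
    # A crawler following a link like "?action=delete" is a GET, so it is refused.
    print(is_allowed("GET", "delete"))
    print(is_allowed("POST", "delete"))
```

With a check like this in front of the handlers, a "&delete=true" style link simply does nothing when followed with GET.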

njharman · on Oct 1, 2010

> no one who has built their web application properly will have any problems

In other words, everyone will have problems (to an accuracy of 3-4 decimals).

RyanMcGreal · on Oct 1, 2010

Related: http://xkcd.com/327/

IgorPartola · on Oct 1, 2010

And how would the crawler know when the website is done loading?