Hacker News

Considering that it's Google, I'm honestly surprised they haven't created a crawler that can execute the JavaScript on a given page and re-parse the page based on the resulting DOM.

You wouldn't think it would be that hard, either: take Chrome, remove the UI, add a crawler. Chrome already has the functionality in "Inspect Element" to show dynamically created content.




Google both heuristically parses JavaScript and executes it, for at least some fraction of its crawl. This is obviously much more expensive than parsing HTML, particularly as the Internet largely moves from static HTML to AJAXy magic.


Granted, but if Google can't parse it, I highly doubt anyone else is going to. And as nice as it would be if every web developer read the "Google Guide to Being Nice to Our Web Crawlers", as an internet we still can't get away from IE6. The term "pipe dream" comes to mind.



That would be a fairly bad idea. JavaScript execution, and AJAX calls in particular, frequently change data. If I were Google, I wouldn't want my crawler to somehow delete or edit lots of data.


No more or less than following a link with a GET variable. I'm trying to find the article posted on TDWTF about a link that went to "&delete=true" and dropped the entire database. But Ajax calls are no more or less likely to change data than any one of a hundred links on survey sites that send their data with GET variables.

Edit: Found it! http://thedailywtf.com/Articles/WellIntentioned-Destruction....


As long as the crawler can't execute POST requests, no one who has built their web application properly will have any problems.
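
A minimal sketch of that rule, assuming a hypothetical crawler that inspects every request a page's scripts try to make before sending it. `PendingRequest` and `crawler_may_send` are made-up names for illustration, not anything Google has described; the set of safe methods is the one defined in RFC 9110.

```python
from dataclasses import dataclass

# Methods that RFC 9110 (section 9.2.1) defines as "safe", i.e. read-only
# by contract. A well-built application never changes state on these.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})


@dataclass(frozen=True)
class PendingRequest:
    """Hypothetical record of a request that page script wants to send."""
    method: str
    url: str


def crawler_may_send(request: PendingRequest) -> bool:
    """Allow only safe methods; POST, PUT, PATCH and DELETE are dropped."""
    return request.method.upper() in SAFE_METHODS


if __name__ == "__main__":
    for req in (
        PendingRequest("GET", "http://example.com/list"),
        PendingRequest("POST", "http://example.com/delete"),
    ):
        print(req.method, req.url, "allowed" if crawler_may_send(req) else "blocked")
```

This only protects sites that keep side effects out of GET, which is exactly the caveat raised in the replies.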


> no one who has built their web application properly will have any problems

In other words, everyone will have problems (to an accuracy of 3-4 decimals).



And how would the crawler know when the website is done loading?
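
One possible answer, offered as an assumption rather than anything the thread or Google confirms: treat the page as settled once no network requests have been in flight for a short quiet window, and fall back to a hard timeout for pages that poll forever. The sketch below is a toy model of that heuristic; `IdleDetector`, its method names, and the default thresholds are all hypothetical, and timestamps are passed in explicitly so the logic stays deterministic.

```python
from typing import Optional


class IdleDetector:
    """Toy 'network idle' heuristic for deciding a page has finished loading."""

    def __init__(self, quiet_window: float = 0.5, hard_timeout: float = 10.0):
        self.quiet_window = quiet_window    # seconds with no network activity
        self.hard_timeout = hard_timeout    # give up waiting after this long
        self.in_flight = 0                  # requests started but not finished
        self.start: Optional[float] = None
        self.last_activity: Optional[float] = None

    def page_started(self, now: float) -> None:
        self.start = now
        self.last_activity = now

    def request_started(self, now: float) -> None:
        self.in_flight += 1
        self.last_activity = now

    def request_finished(self, now: float) -> None:
        self.in_flight = max(0, self.in_flight - 1)
        self.last_activity = now

    def is_done(self, now: float) -> bool:
        if self.start is None or self.last_activity is None:
            return False  # the page has not started loading yet
        if now - self.start >= self.hard_timeout:
            return True   # stop waiting on pages that never go quiet
        quiet_for = now - self.last_activity
        return self.in_flight == 0 and quiet_for >= self.quiet_window
```

A crawler would feed this from its browser's network events and snapshot the DOM the first time `is_done` returns true. It is still a guess: a page that loads content on scroll or on a long timer would be cut off early.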



