Considering that it's Google, I'm honestly surprised they haven't created a crawler application that can execute the javascript on a given page and re-parse it based on the new layout.
You wouldn't think it would be that hard, either-- take Chrome, remove UI, add crawler. Chrome's already got the functionality in "Inspect Element" to show dynamically-created content.
Google both heuristically parses JavaScript and executes it, for at least some fraction of their crawl. This is obviously much more expensive than plain HTML parsing, particularly once the Internet largely goes from static HTML to AJAXy magic.
Granted, but if Google can't parse it, I highly doubt anyone else is going to. And as nice as it would be if every web developer read the "Google Guide to Being Nice to Our Web Crawlers", as an internet we still can't get away from IE6. The term "pipe dream" comes to mind.
That would be a fairly bad idea. JavaScript execution, and in particular AJAX calls, frequently changes data. If I were Google, I wouldn't want my crawler to delete or edit lots of data.
No more or less than following a link with a GET variable. I'm trying to find the article posted on TDWTF about a link that went to "&delete=true" and dropped the entire database... but AJAX calls are no more or less likely to change data than any one of a hundred links on survey sites that send their data with GET variables.
I wish they would make it possible to do the whole bit with "pretty" urls. Example:
Users go to: /#/abc
Google goes to /abc
and they both see the same content.
Obviously you'd need some way to tell Google that your site follows this scheme, but then you wouldn't have to change all of your URLs, and existing external links to pages on your site would accumulate PageRank appropriately.
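Something like the following is one way the two forms could be wired up. This is just my own sketch, assuming an Express-style server in TypeScript; the `pages` store, the `/fragment/` route, and the hash-router shell are placeholders, not anything the article or Google specifies:

```typescript
import express from "express";

const app = express();

// Hypothetical content store keyed by slug.
const pages: Record<string, string> = {
  abc: "<h1>abc</h1><p>The same content, whichever URL you arrive by.</p>",
};

// Users at /#/abc first receive the shell; client code reads location.hash
// and fetches /fragment/abc to fill the page in.
app.get("/", (_req, res) => {
  res.send(`<!doctype html><html><body><div id="app"></div>
    <script src="/hash-router.js"></script></body></html>`);
});

// Googlebot (and anyone without JavaScript) requests /abc and gets a full page.
app.get("/:slug", (req, res) => {
  const body = pages[req.params.slug];
  if (!body) {
    res.status(404).send("Not found");
    return;
  }
  res.send(`<!doctype html><html><body>${body}</body></html>`);
});

// The AJAX front end asks for just the fragment and injects it into #app.
app.get("/fragment/:slug", (req, res) => {
  const body = pages[req.params.slug];
  if (!body) {
    res.status(404).send("Not found");
    return;
  }
  res.send(body);
});

app.listen(3000);
```

The part a sketch like this can't solve is the one the comment is really about: Google would still need to be told, somehow, that /#/abc and /abc are the same resource, since nothing in the URLs themselves declares the mapping.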
I've been thinking about writing a blog article on this subject. Basically, AJAX is like chocolate: it's really tempting, but eating too much, or for the wrong reasons, is not a good idea.
I've just rewritten one of my sites, which used AJAX to load content onto a single index page. I've now made sure every unique resource has a unique URL. This is good for browser navigation, good for Google, and good for promoting a single resource on a service like Twitter. Now my site, which looked like one page to Google, is almost 500 pages.
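There's also a middle ground that keeps the AJAX loading but still gives every resource its own real URL: the HTML5 history API. A minimal client-side sketch, under my own assumptions (the server can render each path as a full page, and will return just the body to an XHR-style request; `#content` and the `data-ajax` attribute are placeholders):

```typescript
// Intercept clicks on links marked data-ajax and load them without a full reload.
document.addEventListener("click", (ev) => {
  const link = (ev.target as Element | null)?.closest("a[data-ajax]");
  if (!(link instanceof HTMLAnchorElement)) return;
  ev.preventDefault();
  loadArticle(link.pathname);
  history.pushState(null, "", link.pathname); // a real URL, no #fragment
});

// Back/forward buttons re-load whatever URL is already in the address bar.
window.addEventListener("popstate", () => loadArticle(location.pathname));

async function loadArticle(path: string): Promise<void> {
  // Assumed convention: the server returns only the page body when this
  // header is present, and the full document otherwise.
  const res = await fetch(path, { headers: { "X-Requested-With": "fetch" } });
  const content = document.querySelector("#content");
  if (content) content.innerHTML = await res.text();
}
```

The crawler never runs the script, so it just sees ordinary links to ordinary URLs, while users still get the AJAX speed-up.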
I just recently implemented this. The thing I'm having trouble with is that they don't say how long the crawler is willing to wait (or rather, they don't specify a limit). For instance, what if it takes 20 seconds to load an acceptable amount of JavaScript-created HTML?
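One workaround I've been considering is to hand the crawler a pre-rendered snapshot so the wait time never comes into play. A sketch, assuming the hashbang / _escaped_fragment_ convention and a snapshot cache filled ahead of time; the names below are placeholders, not anything the spec mandates:

```typescript
import express from "express";

const app = express();

// Hypothetical cache of pre-rendered HTML, generated offline (e.g. by running
// the page in a browser once and saving the resulting DOM).
const snapshots: Record<string, string> = {
  "/abc": "<!doctype html><html><body><h1>abc</h1></body></html>",
};

app.get("/", (req, res) => {
  const fragment = req.query["_escaped_fragment_"];
  if (typeof fragment === "string") {
    // A crawler request for /#!/abc arrives as /?_escaped_fragment_=/abc:
    // return finished HTML immediately, so its timeout never matters.
    res.send(snapshots[fragment] ?? "<!doctype html><html><body></body></html>");
    return;
  }
  // Normal users get the JavaScript shell and wait however long it takes.
  res.send(`<!doctype html><html><body><div id="app"></div>
    <script src="/app.js"></script></body></html>`);
});

app.listen(3000);
```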