Polluting markup is not the solution. It looks really ugly and makes maintenance a nightmare. Separate markup from data (and from style, scripts, etc.) for a cleaner app.
I've always been a proponent of placing all data in a script using JSON, which you can consume easily without screen scraping.
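To make the "consume without screen scraping" point concrete, here is a rough sketch of what a consumer might do. It assumes the page embeds its data in a `<script type="application/json">` element; the `PAGE` sample, the field names, and the `extract_islands` helper are made up for illustration and are not taken from any real site.

```python
import json
from html.parser import HTMLParser


class DataIslandParser(HTMLParser):
    """Collects the contents of <script type="application/json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_island = False
        self._buffer = []
        self.islands = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_island = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_island:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_island:
            self.islands.append(json.loads("".join(self._buffer)))
            self._in_island = False


def extract_islands(html):
    """Return every JSON data island found in the page (hypothetical helper)."""
    parser = DataIslandParser()
    parser.feed(html)
    parser.close()
    return parser.islands


# Hypothetical page: the visible text and the island carry the same facts.
PAGE = """
<html><body>
<p>Hendershot's Coffee Bar, 1560 Oglethorpe Ave, Athens, GA</p>
<script type="application/json" id="place-data">
{"name": "Hendershot's Coffee Bar", "street": "1560 Oglethorpe Ave",
 "city": "Athens", "region": "GA"}
</script>
</body></html>
"""

islands = extract_islands(PAGE)
```

The consumer never has to guess which paragraph holds the address; it just reads `islands[0]["street"]`.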
I don't see microdata as markup pollution. It is semantic information that is closely tied to the data it is representing. Rather than just telling the browser you have a paragraph, you can tell it the paragraph contains an address.
And it seems to me that in real-world web apps, all the markup with microdata could be added programmatically. Create a new object with a microdata schema and get an ORM-type object with an HTML write method. This way you never manually type microdata markup anyway.
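A minimal sketch of that idea, assuming the schema.org `PostalAddress` type and its `streetAddress`, `addressLocality`, and `addressRegion` properties. The class layout and the `to_html` method name are just one hypothetical way to do it, not an existing library.

```python
from dataclasses import dataclass, fields
from html import escape


@dataclass
class PostalAddress:
    # Field names mirror the schema.org PostalAddress property names,
    # so each field can be emitted directly as an itemprop.
    streetAddress: str
    addressLocality: str
    addressRegion: str

    # Not annotated, so the dataclass does not treat it as a field.
    ITEMTYPE = "https://schema.org/PostalAddress"

    def to_html(self):
        """Write the object as a paragraph annotated with microdata."""
        spans = ", ".join(
            f'<span itemprop="{f.name}">{escape(getattr(self, f.name))}</span>'
            for f in fields(self)
        )
        return f'<p itemscope itemtype="{self.ITEMTYPE}">{spans}</p>'


address = PostalAddress("1560 Oglethorpe Ave", "Athens", "GA")
markup = address.to_html()
```

The template then only calls `address.to_html()`, and the microdata attributes live in one place instead of being typed by hand in every template.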
Which also won't work for people with JavaScript disabled. Plus, does Google go through your JavaScript and figure out what eventually makes it onto the page?
Well, templates are another issue; mixing them with a data island may make things more verbose.
The idea of the data island is to keep the data separate from the content and let consumers use the data whenever they need it. I am now looking into using a link tag to make the data island external, like RSS/Atom, to save bandwidth when 90% of consumers won't care about the data. Those who do care just load the external link and there you have it: all the data, without scraping.
It’s mostly intended for places where data is going to be slurped out of a database and dumped into a template or renderer. Adding a bit of extra gunk to the template isn’t the end of the world (especially if you have a sane template system).
If I’m writing some text by hand, I’m going to use markdown and just write:
# Hendershot’s Coffee Bar
1560 Oglethorpe Ave, Athens, GA
As for the benefits: having some widely-used machine-readable metadata linked directly to the data could let browsers do some pretty neat stuff in the future, like letting a user click to add an event to his calendar, look up directions to an address, or add a person to his contact book.
Kilimanjaro's JSON data-island example is certainly more readable. OTOH, with that approach there's the risk that, during maintenance, the data island will be overlooked and will become inconsistent with the human-readable information.
Edit: Perhaps that risk could be mitigated by having the human-readable markup issue a JS call to the data island, which would have the benefit of being DRY-compliant.
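One way to picture the DRY version is sketched below. Note that it swaps the client-side JS call for server-side rendering: both the visible paragraph and the data island are generated from a single record, so they cannot drift apart. The `render_page` function and the field names are assumptions for illustration only.

```python
import json
from html import escape


def render_page(place):
    """Render visible markup and a JSON data island from the same record."""
    safe = {key: escape(str(value)) for key, value in place.items()}
    visible = "<p>{name}, {street}, {city}, {region}</p>".format(**safe)
    # Escape "</" so the JSON cannot close the script element early.
    island = json.dumps(place).replace("</", "<\\/")
    return f'{visible}\n<script type="application/json">{island}</script>'


# Hypothetical record; in a real app this would come from the database.
place = {
    "name": "Hendershot's Coffee Bar",
    "street": "1560 Oglethorpe Ave",
    "city": "Athens",
    "region": "GA",
}
page = render_page(place)
```

Editing the record is then the only maintenance step; there is no second copy of the address to forget about.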
Another option would be to put all data in a separate file and reference it with a link tag, just like RSS/Atom. Then from the markup we could use a 'ref' attribute to reference it.
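For the external-file variant, a consumer would first have to discover the data URL. Here is a rough sketch that assumes the page advertises it with `<link rel="alternate" type="application/json" href="...">`, by analogy with feed autodiscovery. That `rel`/`type` combination, the `find_data_urls` helper, and the example URLs are assumptions, and the 'ref' attribute idea is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DataLinkFinder(HTMLParser):
    """Collects href values of <link> tags that point at JSON data."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if (
            tag == "link"
            and attr.get("rel") == "alternate"
            and attr.get("type") == "application/json"
            and attr.get("href")
        ):
            self.hrefs.append(attr["href"])


def find_data_urls(html, page_url):
    """Return absolute URLs of the external data files a page advertises."""
    finder = DataLinkFinder()
    finder.feed(html)
    finder.close()
    # Resolve relative hrefs against the URL the page was loaded from.
    return [urljoin(page_url, href) for href in finder.hrefs]


# Hypothetical page head and page URL.
HEAD = '<head><link rel="alternate" type="application/json" href="/data/place.json"></head>'
urls = find_data_urls(HEAD, "http://example.com/places/1")
```

Only consumers that care about the data make the second request, which is where the bandwidth saving would come from.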
Like I've mentioned before, Google and browser support are hardly the limit of how annotations would be useful. Having a scalable standard for metadata would create a platform for all types of services. To me, what's exciting isn't what it will immediately get you, but what will be possible with that platform of data when it's a well-accepted standard.
I've always been a proponent of placing all data in a script using JSON, which you can consume easily without screen scraping.
http://mylittlehacks.appspot.com/dataislands