Microdata: HTML5’s Best-Kept Secret

Kilimanjaro · on Sept 12, 2010

Polluting markup is not the solution. It looks really ugly and makes maintenance a nightmare. Separate markup from data (and style, scripts, etc.) for a cleaner app.

I've always been a proponent of placing all data in a script as JSON, which you can consume easily without screen scraping.

http://mylittlehacks.appspot.com/dataislands

TrevorFancher · on Sept 12, 2010

I don't see microdata as markup pollution. It is semantic information that is closely tied to the data it is representing. Rather than just telling the browser you have a paragraph, you can tell it the paragraph contains an address.

And it seems to me that in real-world web apps all the markup with microdata could be programmatically added. Create a new object with a microdata schema and get an ORM-type object with an HTML write method. This way you never manually type microdata markup anyways.
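A minimal sketch of that idea, under stated assumptions: `toMicrodata` is a hypothetical helper, and the `type`/`props` object shape is invented here for illustration. The vocabulary URLs are the data-vocabulary.org ones used elsewhere in this thread.

```javascript
// Escape text for safe inclusion in HTML content and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: render a plain object as nested microdata markup.
// `item.type` is the vocabulary URL; `item.props` maps property names to
// either strings or nested items of the same shape.
function toMicrodata(item, itemprop) {
  const propAttr = itemprop ? ` itemprop="${escapeHtml(itemprop)}"` : "";
  const children = Object.entries(item.props).map(([name, value]) =>
    typeof value === "object"
      ? toMicrodata(value, name)
      : `<span itemprop="${escapeHtml(name)}">${escapeHtml(value)}</span>`
  );
  return `<div${propAttr} itemscope itemtype="${escapeHtml(item.type)}">` +
    children.join("") + "</div>";
}

const coffeeBar = {
  type: "http://data-vocabulary.org/Organization",
  props: {
    name: "Hendershot's Coffee Bar",
    address: {
      type: "http://data-vocabulary.org/Address",
      props: {
        "street-address": "1560 Oglethorpe Ave",
        locality: "Athens",
        region: "GA",
      },
    },
  },
};

const organizationHtml = toMicrodata(coffeeBar);
console.log(organizationHtml);
```

With something like this in the template layer, the attributes are generated from the schema rather than typed by hand.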

henrikschroder · on Sept 12, 2010

Turning

  <div>
      <h1>Hendershot's Coffee Bar</h1>
      <p>1560 Oglethorpe Ave, Athens, GA</p>
  </div>

Into

  <div itemscope itemtype="http://data-vocabulary.org/Organization">
      <h1 itemprop="name">Hendershot's Coffee Bar</h1>
      <p itemprop="address" itemscope itemtype="http://data-vocabulary.org/Address">
        <span itemprop="street-address">1560 Oglethorpe Ave</span>,
        <span itemprop="locality">Athens</span>,
        <span itemprop="region">GA</span>.
      </p>
  </div>

...is not markup pollution? It's three times as big as before, and the value of this is still questionable.

what · on Sept 12, 2010

But it's smaller than the proposed data island.

   <script data>
   bizcard={
     name: "Hendershot's Coffee Bar",
     address  :{
       line   :'1560 Oglethorpe Ave',
       city   :'Athens',
       state  :'GA',
       zipcode:'30606',
       country:'US'
     }
   };
   </script>

   <div>
      <h1><span data="bizcard.name"></span></h1>
      <span data="bizcard.address.line"></span>
      <span data="bizcard.address.city"></span>
      <span data="bizcard.address.state"></span>
   </div>

Which also won't work for people with JavaScript disabled. Plus, does Google go through your JavaScript and figure out what eventually makes it onto the page?
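For reference, the binding step that the `data="..."` spans rely on could look roughly like the sketch below. `resolvePath` and `fillIsland` are hypothetical names, and the regex-based substitution is only for brevity; a real implementation would walk the DOM or use an HTML parser. Because it works on strings, the same function could run on the server, which would sidestep the JavaScript-disabled problem.

```javascript
// Look up a dotted path such as "bizcard.address.city" in a root object.
// Returns undefined if any segment is missing.
function resolvePath(root, path) {
  return path.split(".").reduce(
    (node, key) => (node == null ? undefined : node[key]),
    root
  );
}

// Escape a value for use as HTML text content.
function escapeText(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Fill every empty <span data="..."></span> with the value found at that
// path. Spans whose path cannot be resolved are left untouched.
function fillIsland(template, root) {
  return template.replace(
    /<span data="([^"]+)"><\/span>/g,
    (match, path) => {
      const value = resolvePath(root, path);
      return value === undefined ? match : `<span>${escapeText(value)}</span>`;
    }
  );
}

const islands = {
  bizcard: {
    name: "Hendershot's Coffee Bar",
    address: { line: "1560 Oglethorpe Ave", city: "Athens", state: "GA" },
  },
};

const filledPage = fillIsland(
  '<h1><span data="bizcard.name"></span></h1>' +
    '<span data="bizcard.address.city"></span>',
  islands
);
console.log(filledPage);
```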

Kilimanjaro · on Sept 12, 2010

Well, templates are another issue; mixing them with a data island may make it more verbose.

The idea of the data island is to keep the data separate from the content and let consumers use the data whenever they need it. I am now looking into using a link tag to make the data island external, like RSS/Atom, to save bandwidth when 90% of consumers won't care about the data. Those who do care just load the external link and there you have it, all the data without scraping.
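On the consumer side, discovering such an external island might look like the sketch below. The `rel="alternate"` and `type="application/json"` values are assumptions chosen to mirror RSS/Atom autodiscovery, and `findDataIsland` is a hypothetical name; nothing here is an established convention.

```javascript
// Find the first <link> tag that advertises an external data island and
// return its href resolved against the page URL, or null if none exists.
// The rel/type values checked here are assumptions, not a standard.
function findDataIsland(html, pageUrl) {
  const linkTags = html.match(/<link\b[^>]*>/gi) || [];
  for (const tag of linkTags) {
    // Collect the tag's double-quoted attributes into a plain object.
    const attrs = {};
    for (const [, name, value] of tag.matchAll(/([\w-]+)="([^"]*)"/g)) {
      attrs[name.toLowerCase()] = value;
    }
    if (
      attrs.rel === "alternate" &&
      attrs.type === "application/json" &&
      attrs.href
    ) {
      return new URL(attrs.href, pageUrl).href;
    }
  }
  return null;
}

const samplePage =
  '<head><link rel="stylesheet" href="/site.css">' +
  '<link rel="alternate" type="application/json" href="/bizcard.json"></head>';

const islandUrl = findDataIsland(samplePage, "http://example.com/shop/index.html");
console.log(islandUrl);
```

Only consumers that want the data make the second request, so ordinary page loads carry just the one extra tag.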

jarin · on Sept 13, 2010

Google doesn't load JavaScript, that's why they have the AJAX crawling URL scheme: http://code.google.com/web/ajaxcrawling/docs/getting-started...

They're the URLs you see on Facebook that have #! in them.
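As I understand the scheme, the crawler never runs the page's script; it rewrites the part after `#!` into an `_escaped_fragment_` query parameter and requests that URL from the server instead. A rough sketch of the rewrite follows. `toCrawlerUrl` is a hypothetical name, and the escaping is deliberately simplified to the four characters that would otherwise change the meaning of the query string.

```javascript
// Rewrite a "#!" URL into the form a crawler would request under the
// AJAX crawling scheme: the part after "#!" moves into an
// _escaped_fragment_ query parameter. URLs without "#!" are returned as-is.
function toCrawlerUrl(prettyUrl) {
  const marker = prettyUrl.indexOf("#!");
  if (marker === -1) return prettyUrl;

  const base = prettyUrl.slice(0, marker);
  const fragment = prettyUrl.slice(marker + 2);

  // Simplified escaping (assumption): percent-encode only %, #, & and +.
  const escaped = fragment.replace(/[%#&+]/g, (ch) =>
    "%" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0")
  );

  // Append to an existing query string if there is one.
  const separator = base.includes("?") ? "&" : "?";
  return `${base}${separator}_escaped_fragment_=${escaped}`;
}

console.log(toCrawlerUrl("http://example.com/page#!key=value"));
```

The server is then expected to answer that URL with a static HTML snapshot of what the script would have rendered.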

jacobolus · on Sept 12, 2010

It’s mostly intended for places where data is going to be slurped out of a database and dumped into a template or renderer. Adding a bit of extra gunk to the template isn’t the end of the world (especially if you have a sane template system).

If I’m writing some text by hand, I’m going to use markdown and just write:

    # Hendershot’s Coffee Bar
    1560 Oglethorpe Ave, Athens, GA

As for the benefits: having some widely-used machine-readable metadata linked directly to the data could let browsers do some pretty neat stuff in the future, like letting a user click to add an event to his calendar, look up directions to an address, or add a person to his contact book.

dctoedt · on Sept 12, 2010

Kilimanjaro's JSON data-island example is certainly more readable. OTOH, with that approach there's the risk that, during maintenance, the data island will be overlooked and will become inconsistent with the human-readable information.

Edit: Perhaps that risk could be mitigated by having the human-readable markup issue a JS call to the data island, which would have the benefit of being DRY-compliant.
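Short of rendering the page from the island, a crude maintenance check could at least flag drift. The sketch below lists island fields whose values do not appear anywhere in the page's visible text. `leafEntries` and `findMissingFields` are hypothetical names, and the plain substring match is an assumption that will miss reformatted values.

```javascript
// Collect every leaf value of a nested object together with its dotted path.
function leafEntries(node, prefix = "") {
  return Object.entries(node).flatMap(([key, value]) => {
    const path = prefix ? `${prefix}.${key}` : key;
    return value !== null && typeof value === "object"
      ? leafEntries(value, path)
      : [[path, String(value)]];
  });
}

// Return the paths whose values do not appear in the page's visible text.
function findMissingFields(island, visibleText) {
  return leafEntries(island)
    .filter(([, value]) => !visibleText.includes(value))
    .map(([path]) => path);
}

const bizcard = {
  name: "Hendershot's Coffee Bar",
  address: {
    line: "1560 Oglethorpe Ave",
    city: "Athens",
    state: "GA",
    zipcode: "30606",
    country: "US",
  },
};

const missingFields = findMissingFields(
  bizcard,
  "Hendershot's Coffee Bar 1560 Oglethorpe Ave, Athens, GA"
);
console.log(missingFields);
```

A field reported here is either intentionally hidden or out of sync with the text, so the output is a prompt for review rather than a verdict.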

Kilimanjaro · on Sept 12, 2010

Another option would be to put all data in a separate file and reference it with a link tag, just like RSS/Atom. Then from the markup we could use a 'ref' attribute to reference it.

Zero pollution

Zero overhead

Markup/style/data separation

Qz · on Sept 12, 2010

Part of the dilemma, I think, is that in the age of Google, everything except the markup IS data (and in some cases the markup is data too).

dododo · on Sept 12, 2010

why bother? because google, for example, will use some of the annotations:

http://www.google.com/support/webmasters/bin/answer.py?answe...

akozak · on Sept 12, 2010

Like I've mentioned before, Google and browser support are hardly the limit of how annotations would be useful. Having a scalable standard for metadata would create a platform for all types of services. To me, what's exciting isn't what it will immediately get you, but what will be possible with that platform of data when it's a well-accepted standard.

arfrank · on Sept 12, 2010

Recent discussion here about microdata: http://news.ycombinator.com/item?id=1673623

mark_l_watson · on Sept 13, 2010

I have always liked (and often written about) an alternative: for HTML: http://example.com/test.html, for semantic data: http://example.com/test.n3 (N3 RDF format).

If a web app creates dynamic data at a URL like http://example.com/stockfeeds?symbol=APPL then perhaps use something like http://example.com/stockfeeds.n3?symbol=APPL to get RDF. Also, have standard (or commonly used) metadata in the generated HTML that points to the equivalent RDF URI.
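The naming convention above is mechanical enough to sketch. `toN3Url` is a hypothetical helper: it swaps a trailing `.html` for `.n3`, otherwise appends `.n3` to the path, and leaves the query string alone. Paths ending in `/` are not handled.

```javascript
// Derive the RDF (N3) URL for an HTML URL: swap a trailing ".html" for
// ".n3", or append ".n3" to the path, keeping the query string unchanged.
function toN3Url(htmlUrl) {
  const url = new URL(htmlUrl);
  url.pathname = url.pathname.endsWith(".html")
    ? url.pathname.slice(0, -".html".length) + ".n3"
    : url.pathname + ".n3";
  return url.href;
}

console.log(toN3Url("http://example.com/test.html"));
console.log(toN3Url("http://example.com/stockfeeds?symbol=APPL"));
```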

alanh · on Sept 12, 2010

Never mistake microdata or microformats for API substitutes.