There was a project out of MIT CSAIL back in 2006 that did automated extraction of tabular data from web pages, e.g. product lists on a store site. It recognized pagination and looked for a sequence of repeated DOM structures (and what varied in them) to identify the items. You might find it interesting:
"We propose that web sites can be similarly augmented with other sophisticated data-centric functionality, giving users new benefits over the existing Web." - gonna check this paper out!
https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.90....
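For anyone curious what "look for repeated DOM structures and what varies in them" could look like in practice, here is a rough sketch. To be clear, this is not the paper's algorithm, just my guess at the general idea using only Python's standard `html.parser`. The names (`Node`, `TreeBuilder`, `find_repeated_items`, `varying_fields`, `extract_rows`) are all made up for illustration, and it ignores pagination entirely.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so we must not descend into them.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []

    def signature(self):
        # Structure only: tag name plus the signatures of child elements.
        return (self.tag, tuple(c.signature() for c in self.children))

    def texts(self):
        # This node's own text fragments first, then those of its descendants.
        out = list(self.text)
        for child in self.children:
            out.extend(child.texts())
        return out


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag and close it.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.current.text.append(stripped)


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def find_repeated_items(root):
    # Pick the largest group of siblings that share one structural signature.
    best = []
    for node in walk(root):
        counts = Counter(c.signature() for c in node.children)
        if not counts:
            continue
        sig, count = counts.most_common(1)[0]
        if count >= 2 and count > len(best):
            best = [c for c in node.children if c.signature() == sig]
    return best


def varying_fields(items):
    # Keep only the text positions whose value differs between items.
    rows = [item.texts() for item in items]
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if len(set(col)) > 1]
    return [[row[i] for i in keep] for row in rows]


def extract_rows(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return varying_fields(find_repeated_items(builder.root))


if __name__ == "__main__":
    page = """
    <ul>
      <li><span>Widget</span><b>Price:</b><i>$5</i></li>
      <li><span>Gadget</span><b>Price:</b><i>$7</i></li>
    </ul>
    """
    print(extract_rows(page))
```

The constant "Price:" label gets dropped because it is identical in every item, which is the "what varied in them" part. Real pages are much messier (optional fields, ads between items, slightly different markup per row), so an exact signature match like this would need to become a fuzzy one.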