

dunham on Jan 26, 2022 | on: Show HN: Electric Tables – an experiment in person...

There was a project out of MIT CSAIL back in 2006 that did automated extraction of tabular data from web pages, e.g. product lists on a store site. It recognized pagination and looked for a sequence of repeated DOM structures (and what varied in them) to identify the items. You might find it interesting: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.90....
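
To make the repeated-DOM-structure idea concrete, here is a rough sketch in Python. It is not the method from the linked paper, just one simple way to approximate it using only the standard library. The names `Node`, `TreeBuilder`, and `find_repeated_items` are made up for illustration. It assumes reasonably well-formed HTML, and it ignores pagination entirely. The idea: give every element a signature built from tag names only, find the parent whose children share one signature most often, and treat the text inside those children as the part that varies.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """Hypothetical minimal DOM node: tag, children, and directly contained text."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.texts = []

    def signature(self):
        # Structure only: nested tag names in document order; text is ignored.
        return "%s(%s)" % (self.tag, ",".join(c.signature() for c in self.children))

    def all_texts(self):
        # This node's own text first, then the text of its descendants.
        out = list(self.texts)
        for child in self.children:
            out.extend(child.all_texts())
        return out


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML (assumes tags are properly nested)."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.texts.append(data.strip())


def find_repeated_items(html, min_repeats=3):
    """Return the text fields of the largest run of same-shaped sibling elements."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best = []
    stack = [builder.root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        # Only children with inner structure count as candidate items.
        counts = Counter(c.signature() for c in node.children if c.children)
        if not counts:
            continue
        sig, n = counts.most_common(1)[0]
        if n >= min_repeats and n > len(best):
            best = [c for c in node.children if c.children and c.signature() == sig]
    return [item.all_texts() for item in best]


html = """
<div id="nav"><a href="/">Home</a><a href="/about">About</a></div>
<ul>
  <li><span>Widget</span><b>$5</b></li>
  <li><span>Gadget</span><b>$7</b></li>
  <li><span>Doohickey</span><b>$9</b></li>
</ul>
"""
rows = find_repeated_items(html)
for row in rows:
    print(row)
```

On the sample page this picks the three `<li>` elements over the nav links, since the links have no inner structure and only appear twice. A real extractor would need to tolerate items whose structure differs slightly (an optional sale badge, say), which exact signature matching does not handle.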

topcat31 on Jan 26, 2022

"We propose that web sites can be similarly augmented with other sophisticated data-centric functionality, giving users new benefits over the existing Web." - gonna check this paper out!

Also reminds me of this amazing project that deals in structured data and tables: https://www.geoffreylitt.com/wildcard/
