Come to think of it, we moved offices 3 times since then, must've been 8-10 year...

MichaelGG · on May 18, 2015

Though nowadays you could use Common Crawl to get the dataset and use existing tools to extract such files, right? (I've no idea if that's a practical thing to do or not.)

roel_v · on May 18, 2015

I guess so, if they "look" at the web the same way Google does (respecting robots.txt, nofollow etc - which Wikipedia says they do). But the interesting things are found in nooks and crannies where nobody else has thought of looking before - so relying on someone else to do the heavy lifting is probably the wrong way to go about it...

jacquesm · on May 18, 2015

Common Crawl gives you the data, not the results for the keywords that you're interested in.