
Come to think of it, we've moved offices 3 times since then, so it must've been 8-10 years ago. I don't think I had to do any special trickery; I spent only an afternoon or so writing and testing the code. I didn't realize such a thing would be impossible now - what a shame. I downloaded several gigabytes iirc - a large amount at the time.



Though nowadays you could use Common Crawl to get the dataset and use existing tools to extract such files, right? (I've no idea if that's a practical thing to do or not.)


I guess so, if they "look" at the web the same way Google does (respecting robots.txt, nofollow, etc. - which Wikipedia says they do). But the interesting things are found in nooks and crannies where nobody else has thought of looking before - so relying on someone else to do the heavy lifting is probably the wrong way to go about it...


Common Crawl gives you the data, not the results for the keywords that you're interested in.



