Hacker News

Would you mind sharing your spider technology stack? How do you find new URLs efficiently? Which spider and which storage do you use? Do you append everything to one WARC file and split it when it gets too big?
