
Even if it's just HTTP(S) requests, that's a lot of data to find and crawl. The bandwidth costs are probably insane.



I have a background in scraping from prior projects over the last decade.

Bandwidth isn't a concern for projects like this at a lot of hosting/VPS providers.


Data ingress is usually free, which really cuts down on costs when scraping. If you can do everything in-memory, it's surprisingly cheap. The important bit is being respectful of robots.txt files and not overloading small sites with too many requests.
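
For anyone wondering what "being respectful" could look like in practice, here is a rough sketch in Python rather than anything from a real crawler. It uses the standard library's `urllib.robotparser` to check robots.txt rules and keeps a per-host timer so no single site gets hit too often. The `PoliteGate` class, the `example-crawler` user agent, and the one-second minimum delay are all made up for illustration. Fetching the robots.txt file itself is left out, so the rules are passed in as lines.

```python
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-crawler"  # hypothetical user agent string
MIN_DELAY = 1.0                 # assumed floor, in seconds, between hits to one host


class PoliteGate:
    """Tracks robots.txt rules and a per-host request schedule (illustrative only)."""

    def __init__(self, user_agent=USER_AGENT, min_delay=MIN_DELAY, clock=time.monotonic):
        self.user_agent = user_agent
        self.min_delay = min_delay
        self.clock = clock
        self.rules = {}    # host -> RobotFileParser
        self.next_ok = {}  # host -> earliest time the next request may start

    def load_rules(self, host, robots_lines):
        # robots_lines is the already-downloaded robots.txt, split into lines.
        parser = RobotFileParser()
        parser.parse(robots_lines)
        self.rules[host] = parser

    def allowed(self, url):
        # Conservative choice: a host with no loaded rules is treated as off limits.
        parser = self.rules.get(urlparse(url).netloc)
        return parser is not None and parser.can_fetch(self.user_agent, url)

    def wait_time(self, url):
        # Returns how long to sleep before requesting url, and reserves that slot.
        host = urlparse(url).netloc
        now = self.clock()
        parser = self.rules.get(host)
        crawl_delay = parser.crawl_delay(self.user_agent) if parser else None
        delay = max(self.min_delay, crawl_delay or 0)
        start = max(now, self.next_ok.get(host, now))
        self.next_ok[host] = start + delay
        return start - now
```

A crawler loop would call `allowed(url)` first, skip the URL if it returns false, and otherwise `time.sleep(gate.wait_time(url))` before making the request. Honoring `Crawl-delay` when a site sets one, and falling back to a fixed minimum otherwise, is just one reasonable policy; small sites may deserve a longer floor.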



