
How about caching the default entry (a static URL instead) plus its attributes, to make demoing easier? At the moment it has been analyzing for more than 5 minutes.
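To make the suggestion concrete, here is a rough sketch of what caching only the default demo entry might look like. Everything in it is an assumption for illustration: `DEMO_URL`, `DemoCache`, the one-hour TTL, and the `analyze` callable are hypothetical names, not anything from the actual app.

```python
import time
from typing import Any, Callable, Optional, Tuple

# Hypothetical placeholder for the default demo entry.
DEMO_URL = "https://example.com/demo-page"


class DemoCache:
    """Cache the analysis of the demo URL only; other URLs always run live."""

    def __init__(
        self,
        analyze: Callable[[str], Any],
        ttl_seconds: float = 3600.0,  # assumed TTL, tune as needed
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._analyze = analyze
        self._ttl = ttl_seconds
        self._clock = clock
        self._entry: Optional[Tuple[float, Any]] = None  # (stored_at, result)

    def get(self, url: str) -> Any:
        if url != DEMO_URL:
            # Non-demo URLs are never cached.
            return self._analyze(url)
        now = self._clock()
        if self._entry is not None and now - self._entry[0] < self._ttl:
            # Fresh cached result: skip the live request.
            return self._entry[1]
        result = self._analyze(url)
        self._entry = (now, result)
        return result
```

With something like this, the first visitor pays for the live analysis and later visitors within the TTL get the stored result, while user-submitted URLs still go through real requests.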



Good idea, and I would implement that if I used an API from the server to get the response. But currently I'm also testing the stability of the Apify "Actor" solution and the proxies, so in my case it's good that there are real requests with real responses, even if they only come from the demo.

Btw, the fact that it's been running for 5 minutes is a bug that I will look into, since there is a 2-minute timeout and there are no hanging runs or runs that ended with a timeout.


You also don’t want to get your server blocked by Yelp if they do rate limiting.


That's why I'm using proxies: every request is routed through a different proxy address, and the application as a whole is rate limited. So hopefully I'm not generating too much traffic on Yelp. They are just a perfect example because they use all the types of data I'm looking for. When I find more good examples, I will add them and rotate them on every page load.
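For anyone curious what "different proxy per request plus an app-wide rate limit" could look like, here is a minimal sketch. It is not the actual implementation: the round-robin rotation, the minimum-interval limiter, and the names `ProxyRotator` and `RateLimiter` are all assumptions made for illustration.

```python
import itertools
import time
from typing import Callable, Optional, Sequence


class ProxyRotator:
    """Hand out proxy addresses in round-robin order (assumed strategy)."""

    def __init__(self, proxies: Sequence[str]) -> None:
        if not proxies:
            raise ValueError("at least one proxy address is required")
        self._cycle = itertools.cycle(proxies)

    def next_proxy(self) -> str:
        return next(self._cycle)


class RateLimiter:
    """Enforce a minimum interval between requests for the whole app."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request is allowed; return the delay used."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self._min_interval
        return delay
```

Before each outgoing request you would call `limiter.wait()` and then pass `rotator.next_proxy()` to whatever HTTP client or scraping platform you use. This single-threaded version is only meant to show the idea; a concurrent app would need a lock around `wait()`.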

Btw, when it comes to ToS and scraping, this is not much different from accessing their website through a normal browser, only instead of rendered content we show you analyzed data. The page is only loaded once, the same as in a browser.


They have fairly aggressive scraper detection (and this is also against their ToS).



