Depends on who you are. Hotel and airline websites, for example, are pummeled with scrapers wanting pricing and availability info. Letting them scrape with no limits is costly.
And why is that a good thing for them? I mean, if they provided an API to get their pricing, wouldn't it be in their interest? Does anyone buy airline tickets from the airline website ever? I don't. I use sites like Expedia and Priceline. And if an airline is not listed or has shitty prices, I don't buy it. Wouldn't it be in their interest to be listed on there? Moreover, wouldn't they want the other companies to have similar APIs to have dynamic pricing?
I get that at one point not having this info out there was a good thing for them. But now the cat is out of the bag. You can't keep pretending that booking sites don't exist.
> Does anyone buy airline tickets from the airline website ever?
All other things being equal, I will strongly prefer buying directly from the airline, because I've personally experienced the shifting of responsibility if things go wrong during a flight.
I've also been denied boarding (not in my home country) because the airline claimed that the OTA (Netflights) didn't pay for my ticket, leaving me to spend the night in the airport and having to book a one-way flight with a different airline the next morning.
That was a horrible experience I'm determined to not repeat.
> Hotel and airline websites, for example, are pummeled with scrapers wanting pricing and availability info.
Yes, and they make scraping very cumbersome in order to discourage it. I've successfully written scrapers for airlines. It's significantly more difficult than crawling other websites for a few reasons:
1. Session management is wacky - they really like to manage state entirely through cookies, and you typically need to visit a specific set of pages in a specific sequence before you can access the resources you want, like the number of seats available or their prices.
2. Sessions have time limits because anyone who looks at the seats initiates a "soft" reservation on them (this works in a similar way for theatre, concert and movie seating).
3. You don't usually have nice JSON endpoints, so you'll be doing a lot of HTML parsing (which, given the type of HTML you encounter, can be hell).
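To make points 1 and 2 concrete, here is a rough sketch of how I'd structure the session handling. Everything in it is made up for illustration: the paths in `STEPS`, the `SequencedSession` class, and the `fetch` callable (a stand-in for whatever HTTP client you use, taking a path and the current cookies and returning the body plus any cookies the server set). The TTL is also an assumption; real sites don't tell you how long a session lives, so you'd have to measure it.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical prerequisite pages; a real site dictates its own order.
STEPS = ["/search", "/select-flight", "/seat-map"]

# fetch(path, cookies) -> (response body, cookies set by the server)
Fetch = Callable[[str, Dict[str, str]], Tuple[str, Dict[str, str]]]


class SequencedSession:
    """Replays a fixed page sequence and restarts it when the session ages out."""

    def __init__(self, fetch: Fetch, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch
        self.ttl = ttl_seconds  # assumed session lifetime, found by experiment
        self.clock = clock
        self.cookies: Dict[str, str] = {}
        self.started_at: Optional[float] = None

    def _expired(self) -> bool:
        if self.started_at is None:
            return True
        return self.clock() - self.started_at >= self.ttl

    def _warm_up(self) -> None:
        # Start from an empty cookie jar and visit every prerequisite in order,
        # carrying forward whatever cookies each page sets.
        self.cookies = {}
        self.started_at = self.clock()
        for path in STEPS:
            _, new_cookies = self.fetch(path, dict(self.cookies))
            self.cookies.update(new_cookies)

    def get(self, path: str) -> str:
        # Rebuild the session first if it was never started or has timed out.
        if self._expired():
            self._warm_up()
        body, new_cookies = self.fetch(path, dict(self.cookies))
        self.cookies.update(new_cookies)
        return body
```

Passing `fetch` and `clock` in from outside is just so the sequencing and expiry logic can be exercised without touching a live site. The HTML parsing from point 3 would happen on the body that `get` returns.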
> You don't usually have nice JSON endpoints, so you'll be doing a lot of HTML parsing
That is changing. Most scrapers haven't caught on, but the more modern things airlines are pushing out (their mobile sites and native mobile apps) often have really nice REST/JSON APIs. The scrapers are often still scraping the old desktop site, which will be the last to get that underpinning.
Yes, whenever I'm looking for a source to crawl I prefer mobile applications for precisely this reason. Request signing and certificate pinning are an upfront annoyance, but the maintainability is far higher.
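For anyone who hasn't run into request signing: the general shape is usually an HMAC over parts of the request plus a timestamp, sent along as headers. The sketch below is a generic version of that idea, not any particular airline's scheme. The field order, the `X-Timestamp` and `X-Signature` header names, and the `sign_request` and `signed_headers` helpers are all assumptions; each app does this differently.

```python
import hashlib
import hmac
from typing import Dict


def sign_request(secret: bytes, method: str, path: str,
                 timestamp: int, body: bytes = b"") -> str:
    """Return a hex HMAC-SHA256 over method, path, timestamp and body hash."""
    # Hypothetical canonical form: four newline-separated fields.
    message = b"\n".join([
        method.upper().encode(),
        path.encode(),
        str(timestamp).encode(),
        hashlib.sha256(body).hexdigest().encode(),
    ])
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def signed_headers(secret: bytes, method: str, path: str,
                   timestamp: int, body: bytes = b"") -> Dict[str, str]:
    """Build the headers a client would attach (header names are made up)."""
    return {
        "X-Timestamp": str(timestamp),
        "X-Signature": sign_request(secret, method, path, timestamp, body),
    }


def verify(secret: bytes, method: str, path: str, timestamp: int,
           body: bytes, signature: str) -> bool:
    """Server-side check, using a constant-time comparison."""
    expected = sign_request(secret, method, path, timestamp, body)
    return hmac.compare_digest(expected, signature)
```

Once you know the scheme, it rarely changes between app releases, which is a big part of why the upfront cost pays off in maintainability.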