Interesting. So, it seems like you aren't respecting robots.txt. I picked Old Navy, as it was on your supported stores page [0], and went to their robots.txt [1].
So, do you have permission to violate robots.txt, as I'm sure there is some automated interaction with checkout/purchasing pages? Or am I missing something about how Two Tap works? Scraping is one thing, but accessing a site when its management prohibits it seems like a big no-no.
Old Navy and a few other retailers are not active yet. We've pre-built these integrations despite not having requests to sell their inventory just yet.
I'd mention more on the BD side but can't at this point for competitive reasons. The fact that we currently support sending orders through to 450 retailers does not mean we have deals in place with all of them, but that the infrastructure is built to allow this to happen -- if affiliates or publishers get an approval from retailers or the affiliate networks that govern this. Perhaps we should make this clearer on the supported stores page.
All in due time. The industry as a whole is being pushed to decide which models it will embrace -- and as always some will be slower to adapt than others. The pressure comes from lost revenue on mobile, which makes retailers a LOT more flexible now compared to even 6 months ago when talking about this.
Considering the multiple screen formats and devices that will fragment retailers' distribution channels over the coming years, this is set to become an even bigger chapter down the line.
Looks like I'm at one of the retailers you crawl. Recently, our site was getting hit by a web crawler that was following links incorrectly. I blacklisted several IP addresses from accessing the site, and now I wonder if it was this!
Probably they don't, because so much of the web has robots.txt files like:
User-Agent: established_company
Allow: /some-stuff
User-Agent: *
Disallow: /
# keeps out filthy peasants
And you're either stuck following them, and not having data that would be offered up for free if you were someone else, or being a bad person and ignoring it. You don't really see the services that follow the rules.
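For what it's worth, you can check exactly what rules like those permit with Python's built-in parser. A self-contained sketch using the example rules above (not any real retailer's file):

from urllib.robotparser import RobotFileParser

# The example rules from above, inlined so the check runs offline.
rules = [
    "User-Agent: established_company",
    "Allow: /some-stuff",
    "",
    "User-Agent: *",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

# The named crawler may fetch /some-stuff; everyone else is shut out.
print(rp.can_fetch("established_company", "/some-stuff"))  # True
print(rp.can_fetch("random_bot", "/some-stuff"))           # False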
Also, there's a good paper on how much being preferred in robots.txt helps, which makes you a better product, which makes you more preferred...
We don't spider retailer websites. That means we don't follow links or go hardcore on building a database of products.
We hit your website:
* if someone has asked us information about a product url
* when we place an order
* weekly for regression tests
Ping us at contact@ and we're more than happy to jump on a call and describe exactly what we're doing. Most of the time we're completely unnoticeable, except for the fact that you're getting more orders.
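For concreteness, a request-per-product-URL model like the one described above might look something like this from a publisher's side. The endpoint, auth field, and response keys here are invented for illustration; they are not Two Tap's actual API:

import requests

# Hypothetical sketch: ask for live info about one product URL.
resp = requests.post(
    "https://api.example.com/v1/product_status",  # invented endpoint
    json={
        "public_token": "YOUR_TOKEN",             # invented auth field
        "url": "http://www.example-retailer.com/product/12345",
    },
    timeout=30,
)
info = resp.json()
print(info.get("in_stock"), info.get("price"))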
AFAIK, the main legal issue is a trespass to chattels tort. The data collected is generally uncopyrightable if not reproduced in its entirety without modifications. The relevant case is Feist v. Rural [0].
IANAL, but I think the best bet for staying technically legal is to use jurisdictional arbitrage and tit-for-tat to liberate the data. If someone in the US scrapes a US server and generates enough load to deprive the owner of its use, then they are technically liable for damages under trespass to chattels. If they instead trade scraping labor with people in other jurisdictions, then that other entity would be liable. There might be some other legal defense/attack usable by the entity whose data is being liberated, but I reckon it would be tenuous at best.
IANAL, but I doubt that that is an issue in most jurisdictions. A website does not get to make its own law simply by putting up a note. Neither are the terms of use a binding contract between two parties, because the scrapers are not the website's customers and thus did not sign or agree to anything.
But wouldn't it be more beneficial to get websites to open up an API to you -- to encourage them to do so, or even offer consulting services to build one?
I know that there are a few cart/store offerings out there. It seems to me that they would have an API.
Good question. That's because this method doesn't scale and fails as a solution to the industry's challenges.
There are companies trying to get retailers to implement APIs, but this leads to a fragmented ecosystem. In years past, payment processors that sold "pay/checkout with ..." buttons and wallets failed to achieve significant merchant adoption despite being fuelled with billions in marketing spend.
The solution everyone embraces seems to lie in building an independent and neutral piece of infrastructure (an API) that any publisher can integrate and that plugs into every checkout out there. It's the missing pipes in ecommerce: anyone can use it and nothing really changes (we don't process payments, it's all automated, etc) -- and conversions go UP.
I'm repeating some ideas from the post, but on the publisher side it's worth noting that NONE would entertain the idea of integrating multiple APIs -- one for each merchant. Did I also bring up the required combined efforts of all merchants to keep those APIs up & running? :)
So we're pro-scraping, because it's the only way to build adoption in ecommerce.
The hard part is not scraping, it's returns. For many kinds of online products, the return rate is over 40%. The shopper must be completely aware of how to contact the merchant of record and how to return the product.
Also, if you are scraping a large retailer you are effectively required to be PCI DSS level 1 compliant, which takes a bit of extra effort.
Completely agree! Returns, and not breaking the retailer's CRM, are key in this space. Retailers are happy especially because we're not breaking their relationship with consumers or obscuring payment/shipping data.
It's very difficult to find and build a model that's accepted and actively supported by all stakeholders in the ecommerce space, and we're very excited by current efforts in the industry. Long story short, we've had no complaints or confusion from consumers so far on returns, customer support, etc. By now most consumers are aware of in-stream or remote buying, and following basic guidelines (clearly displaying the retailer logo, user messaging, etc) helps a lot too.
And yes, PCI DSS compliance is also crucial to storing and handling credentials. We're going through the process again this year at Two Tap, but the effort is worth it.
Are you sure that is true for all retailers of those products?
I deal with a high-return-rate industry [specialty products many customers can't size correctly] and I only see return rates of 3-7%, depending on the product. 40% seems very high.
It's very true for higher end goods. These goods typically are sold with free/cheap shipping and so customers will order 3 sizes of an item and return all but the one that fits.
This is strongly correlated with the brand values they push in certain marketing campaigns, where both returns and excellent service are promoted.
Flipping it around, you could say they attract people who make returns more often than average.
It's explicitly designed into Zappos' business model; it's intentional, not a side effect. They have a 1-year return policy plus free returns specifically to encourage people to send more returns, so customers are more likely to risk buying things that might not work out.
For the same products, I have a ~9% return rate. :/
I interviewed for a data position at a major internet clothing store last year and they said their return rate was in that ballpark. (I don't fully remember the number)
Clothing has higher than average return rates because people can't anticipate sizing or look. I worked for a food ecommerce site and our return rate was low single digits.
I prefer to stay publicly anonymous, and if I answered your question there is literally only one person I could be. I suppose that makes me paranoid, but I'd rather not create drama.
For the sake of example, Amazon.com would be sufficiently equivalent as we do sell products via Amazon. And the fact I complain about Amazon in places gives away that we do do that. :P
I've worked with two shopping search engines, and interestingly, scraping sites was one of the things they did to build up their inventory as well. The big difference being, they simply organized the products into a searchable format, then sent traffic to the ecommerce site and let it handle the checkout. What you're doing is arguably more complex.
(They also prioritized the feeds that were sent to them directly by retailers above the scraped items feeds - thus prioritizing paid listings, similar to the Google SERPs - so a different business model entirely.)
That being said, a very cool concept - and agreed that, given the relatively small number of ecommerce platforms out there, scraping then serving them up seems pretty scalable. Interested to see how it goes.
At least one of them might have been using our technology in the backend, especially if they're one of the top 5 shopping search engines.
The downside to feeds is that they become obsolete very quickly, especially if the product is popular. Products sell out very quickly, retailers lose money on traffic they can't onboard and shoppers get frustrated.
> At least one of them might have been using our technology in the backend, especially if they're one of the top 5 shopping search engines.
Which of these "top 5 shopping search engines" have you worked with? You don't seem to mention any on your website.
> The downside to feeds is that they become obsolete very quickly, especially if the product is popular. Products sell out very quickly, retailers lose money on traffic they can't onboard and shoppers get frustrated.
Feeds are the only way to keep up with frequently changing listings from large retailers (apart from doing live API requests), since scraping is several orders of magnitude slower. Amazon gives selected partners incremental feeds; scraping their millions of products takes days.
I built a CJ scraper for a deals website that is now defunct. What a pain it was to maintain. All the different retailers dump their data into CJ in different ways. I might just put it on github if anyone's interested. Python + chromedriver + beautifulsoup + mechanize
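For anyone curious what "different ways" means in practice, the maintenance burden boils down to per-retailer parsing along these lines. The selectors below are invented for illustration; every retailer needs its own set, which is exactly the pain point:

import requests
from bs4 import BeautifulSoup

def scrape_deal(url: str) -> dict:
    """Fetch one deal page and pull out the basics."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # Hypothetical selectors: every retailer structures its pages
    # differently, so each one needs its own pair of these.
    title = soup.select_one("h1.product-title")
    price = soup.select_one("span.price")
    return {
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None,
        "url": url,
    }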
I tried the demo with a Lego castle priced 99€ and got a grand total of more than $10k...
FYI, Lego showed me the French version of their website, since that's where I live. You seem to only offer shipping in the US, though that's not clear from reading your website. Still very interesting.
Can anyone go into a bit more detail about how the affiliate commissions work here? From what I have read, I would feed my affiliate link through TwoTap and you would then handle the cookie and conversion and everything?
If I was using URLs gathered from a Commission Junction datafeed, is this basically a plug and play solution? Or do I need to process those URLs?
Do you have a backend stats dashboard? Or would I still rely on CJ for that data?
We simulate what a shopper would do. We first go through your affiliate link (which drops a cookie) and then go on the retailer website to place the order.
All the commissioning, connecting/talking to retailers, receiving the money, is directly between you and the affiliate network. We're plug and play :)
We do have a stats backend where you can see all the purchases that went through Two Tap. And you can also use CJ's dashboard, just like you are probably doing right now.
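Mechanically, the cookie flow described above is roughly what a requests.Session sketch would show: follow the affiliate link so the tracking cookie lands in the jar, then visit the retailer with that same jar. The URLs here are made up:

import requests

session = requests.Session()

# Step 1: follow the affiliate link; the redirect chain drops the
# network's tracking cookie into this session's cookie jar.
session.get("https://www.example-network.com/click?pid=123", timeout=30)

# Step 2: later requests to the retailer ride on the same jar, so the
# order that eventually goes through is attributed to the affiliate.
session.get("https://www.example-retailer.com/product/456", timeout=30)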
Yes. It used to be more controversial 2 years ago.
They're only reticent about losing consumer data, breaking the relationship with shoppers, or not processing the payments themselves. And control over who sells their inventory, obviously -- which is already in place through their relationships with affiliate networks.
Tick those boxes and they're cool with it and supportive. That being said, we're still expanding tech support for retailers faster than BD can keep up -- 75 new retailers monthly at this point. That's why we're pushing all our affiliates to get approval before using Two Tap, in order to keep getting affiliate revenues.
Experiments from Twitter and Facebook also do a good job at educating the market which works in our favor -- at least merchants learn what they don't want :)
So you guys are scraping all the product information for a retailer and keeping it up to date? Or is it all live -- you fetch it when that particular URL is requested? Where do you get the list of retailers to scrape?
We're fetching the live data only for the products requested via a URL. Two Tap mimics a consumer visiting the retailer and getting that info for themselves, which also allows retailers to retain their analytics layer with no negative impact.
Our current supported stores span the top 500 as well as a number of specific integration requests.
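As a rough illustration of "mimics a consumer": the fetch is a single browser-like GET per requested URL rather than a crawl. The header values below are illustrative, not Two Tap's actual ones:

import requests

# Browser-like headers so the visit registers in the retailer's
# analytics like any other shopper. Values are illustrative.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

def fetch_product(url: str) -> str:
    # One GET for the exact URL someone asked about -- no link-following,
    # no crawl, so it looks like a single shopper's page view.
    return requests.get(url, headers=HEADERS, timeout=30).text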
Correct. We don't get any input from the retailers. Two Tap can get product availability info and place an order just by having the product URL, nothing else.
Also, the full retailer inventory is available, unlike FB or other models that require the shop to upload a certain number of products.
hm, I didn't mean infrastructure, I mean, did you buy proxy nodes from someone like sslprivateproxy and slap HAProxy in front...
Most ecommerce sites wise up to bots crawling them and have a robots.txt that suggests you might not want to...
I don't get it. Is this just a middleman between all the retail websites and the publishers? Sort of like what Google is doing with product search, also giving commissions on the items sold?
Yes, you could say that. We're laying down pipes in ecommerce so you can send an order to a merchant from anywhere on the web through a standardised API.
Retailers can extend their reach and make their inventory shoppable from anywhere with an internet connection and publishers can build ecommerce in their apps.
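A hypothetical sketch of what "one API, any retailer" implies for a publisher: the same request shape regardless of whose checkout sits behind the product URL. The endpoint and field names are invented for this example:

import requests

# Invented request shape -- the point is that the SAME shape works
# no matter which retailer's checkout the product URL points at.
order = {
    "products": [{"url": "https://www.example-retailer.com/item/1", "quantity": 1}],
    "shipping": {
        "name": "Jane Doe",
        "address": "123 Main St",
        "city": "San Francisco",
        "zip": "94103",
    },
}
resp = requests.post("https://api.example.com/v1/purchase", json=order, timeout=60)
print(resp.status_code)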
Interesting, but why the need for web scraping? If the retailers saw the monetary benefit of this, wouldn't they go out of their way to provide you the data directly, and an API as well?
Two Tap certainly would solve the feed problem - but that's a problem for affiliates, not retailers :-)
It's likely that I don't have much clue what I'm talking about, but I can't see great benefits for many retailers from this as it stands. They like more control over leads, and to onboard customers into their brand. Typically, this is managed through affiliate programs and partnerships.
Two Tap looks like it allows prospects/customers to be kept under the publisher's wing. The retailer gets the transaction, but the publisher keeps the relationship - where the real value (capital) is created and can be realised.
As someone with an interest in a small-time publisher - I'm very interested in this.
However, I have a concern with how well it sits with affiliate T&Cs. These vary by program of course - but those I know prohibit scraping by any means. That's stopped me from doing something similar with 80legs or others in the past.
If the above is right, Two Tap may well have to develop the kind of relationships with retailers that better serve their needs at the mouth of the funnel. Then Two Tap risks becoming like any other program (CJ/AW..) - and the scraping will likely have to stop. I'm sure they know the space better than anyone here, though - and certainly better than me - so I'd be interested to hear if they have these relationships in place alongside the scraping MO, or are confident of a way forward.
Given that - the CTA only leaves me wondering what's on the other side of the wall:
"Sounds interesting? Let’s talk! - Sign up below"
Do you have traffic or other requirements? What's the pricing? It'd be nice if you could tell me more/make me work less if I want to use your service :-)
You're right on a lot of counts. Let's talk -- can you get in touch at hello@ please? Happy to talk about all of this.
The main difference now is that both retailers and affiliate networks are actively pushing this model, now that they've seen they can make 5x more money from the same amount of impressions.
Also, the publisher gets to keep PART of the relationship -- the part dealing with the user's interaction with their product. The relationship with the retailer that sells the product is also kept intact -- the consumer loses just one touchpoint (a visit to the product page) but gets the same confirmation email, returns, customer support, etc. The retailer has the same load on its servers and gets exactly the same user data it normally would (shipping, email, address, billing, payment, etc).
[0]: https://twotap.com/supported-stores/
[1]: http://oldnavy.gap.com/robots.txt