Interesting. So, it seems like you aren't respecting robots.txt. I picked Old Navy, as it was on your supported stores page [0], and went to their robots.txt [1].
So, do you have permission to violate robots.txt, as I'm sure there is some automated interaction with checkout/purchasing pages? Or am I missing something about how Two Tap works? Scraping is one thing, but accessing a site when its management prohibits it seems like a big no-no.
Old Navy and a few other retailers are not active yet. We've pre-built these integrations despite not having requests to sell their inventory just yet.
I'd mention more on the BD side but can't at this point for competitive reasons. The fact that we currently support sending orders through to 450 retailers does not mean we have deals in place with all of them, but that the infrastructure is built to allow this to happen -- if affiliates or publishers get an approval from retailers or the affiliate networks that govern this. Perhaps we should make this clearer on the supported stores page.
All in due time. The industry as a whole is being pushed to decide which models it will embrace -- and as always some will be slower to adapt than others. The pressure comes from lost revenue on mobile, which makes retailers a LOT more flexible now compared to even 6 months ago when talking about this.
Considering the multiple screen formats and devices that will fragment retailers' distribution channels over the coming years, this is set to become an even bigger chapter down the line.
Looks like I'm at one of the retailers you crawl. Recently, our site was getting hit by a web crawler that was following links incorrectly. I blacklisted several IP addresses from accessing the site, and now I wonder if it was this!
Probably they don't, because so much of the web has robots.txt files like:
User-Agent: established_company
Allow: /some-stuff
User-Agent: *
Disallow: /
# keeps out filthy peasants
And you're either stuck following them, and not having data that would be offered up for free if you were someone else, or being a bad person and ignoring it. You don't really see the services that follow the rules.
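For what it's worth, you can check exactly what rules like those permit with Python's built-in parser. A self-contained sketch using the example rules above (not any real retailer's file):

from urllib.robotparser import RobotFileParser

# The example rules from above, inlined so the check runs offline.
rules = [
    "User-Agent: established_company",
    "Allow: /some-stuff",
    "",
    "User-Agent: *",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

# The named crawler may fetch /some-stuff; everyone else is shut out.
print(rp.can_fetch("established_company", "/some-stuff"))  # True
print(rp.can_fetch("random_bot", "/some-stuff"))           # False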
Also, there's a good paper on how much being preferred in robots.txt helps, which makes you a better product, which makes you more preferred...
We don't spider retailer websites. That means we don't follow links or go hardcore on building a database of products.
We hit your website:
* if someone has asked us information about a product url
* when we place an order
* weekly for regression tests
Ping us at contact@ and we're more than happy to jump on a call and describe exactly what we're doing. Most of the time we're completely unnoticeable, except for the fact that you're getting more orders.
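For concreteness, a request-per-product-URL model like the one described above might look something like this from a publisher's side. The endpoint, auth field, and response keys here are invented for illustration; they are not Two Tap's actual API:

import requests

# Hypothetical sketch: ask for live info about one product URL.
resp = requests.post(
    "https://api.example.com/v1/product_status",  # invented endpoint
    json={
        "public_token": "YOUR_TOKEN",             # invented auth field
        "url": "http://www.example-retailer.com/product/12345",
    },
    timeout=30,
)
info = resp.json()
print(info.get("in_stock"), info.get("price"))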
AFAIK, the main legal issue is a trespass to chattels tort. The data collected is generally uncopyrightable if not reproduced in its entirety without modifications. The relevant case is Feist v. Rural [0].
IANAL, but I think the best bet for staying technically legal is to use jurisdictional arbitrage and tit-for-tat to liberate the data. If someone in the US scrapes a US server and generates enough load to deprive the owner of its use, then they are technically liable for damages under trespass to chattels. If they instead trade scraping labor with people in other jurisdictions, then that other entity would be liable. There might be some other legal defense/attack usable by the entity whose data is being liberated, but I reckon it would be tenuous at best.
IANAL, but I doubt that that is an issue in most jurisdictions. A website does not get to make its own law simply by putting up a note. Neither are the terms of use a binding contract between two parties, because the scrapers are not the website's customers and thus did not sign or agree to anything.
But wouldn't it be more beneficial to get websites to open up an API to you -- to encourage them to do so, or even offer consulting services to build one?
I know that there are a few cart/store offerings out there. It seems to me that they would have an API.
Good question. That's because this method doesn't scale and fails as a solution to the industry's challenges.
There are companies trying to get retailers to implement APIs, but this leads to a fragmented ecosystem. In years past, payment processors that sold "pay/checkout with ..." buttons and wallets failed to achieve significant merchant adoption despite being fuelled with billions in marketing spend.
The solution everyone embraces seems to lie in building an independent and neutral piece of infrastructure (an API) that any publisher can integrate and that plugs into every checkout out there. It's the missing pipes in ecommerce: anyone can use it and nothing really changes (we don't process payments, it's all automated, etc) -- and conversions go UP.
I'm repeating some ideas from the post, but on the publisher side it's worth noting that NONE would entertain the idea of integrating multiple APIs -- one for each merchant. Did I also bring up the required combined efforts of all merchants to keep those APIs up & running? :)
So we're pro-scraping, because it's the only way to build adoption in ecommerce.
The hard part is not scraping, it's returns. For many kinds of online products, the return rate is over 40%. The shopper must be completely aware of how to contact the merchant of record and how to return the product.
Also, if you are scraping a large retailer you are effectively required to be PCI DSS level 1 compliant, which takes a bit of extra effort.
Completely agree! Returns, and not breaking the retailer's CRM, are key in this space. Retailers are happy especially because we're not breaking their relationship with consumers or obscuring payment/shipping data.
It's very difficult to find and build a model that's accepted and actively supported by all stakeholders in the ecommerce space, and we're very excited by current efforts in the industry. Long story short, we've had no complaints or confusion from consumers so far on returns, customer support, etc. By now most consumers are aware of in-stream or remote buying, and following basic guidelines (clearly displaying the retailer logo, user messaging, etc) helps a lot too.
And yes, PCI DSS compliance is also crucial to storing and handling credentials. We're going through the process again this year at Two Tap, but the effort is worth it.
Are you sure that is true for all retailers of those products?
I deal with a high-return-rate industry [specialty products many customers can't size correctly] and I only see return rates of 3-7%, depending on the product. 40% seems very high.
It's very true for higher end goods. These goods typically are sold with free/cheap shipping and so customers will order 3 sizes of an item and return all but the one that fits.
This is strongly correlated with the brand values they push in certain marketing campaigns, where both returns and excellent service are promoted.
Flipping it around, you could say they attract people who make returns more often than average.
It's explicitly designed into Zappos' business model; it's intentional, not a side effect. They have a 1-year return policy plus free returns specifically to encourage people to send more returns, so customers are more likely to risk buying things that might not work out.
For the same products, I have a ~9% return rate. :/
I interviewed for a data position at a major internet clothing store last year and they said their return rate was in that ballpark. (I don't fully remember the number)
Clothing has higher than average return rates because people can't anticipate sizing or look. I worked for a food ecommerce site and our return rate was low single digits.
I prefer to stay publicly anonymous, and if I answered your question there is literally only one person I could be. I suppose that makes me paranoid, but I'd rather not create drama.
For the sake of example, Amazon.com would be sufficiently equivalent as we do sell products via Amazon. And the fact I complain about Amazon in places gives away that we do do that. :P
I've worked with two shopping search engines, and interestingly, scraping sites was one of the things they did to build up their inventory as well. The big difference being, they simply organized the products into a searchable format, then sent traffic to the ecommerce site and let it handle the checkout. What you're doing is arguably more complex.
(They also prioritized the feeds that were sent to them directly by retailers above the scraped items feeds - thus prioritizing paid listings, similar to the Google SERPs - so a different business model entirely.)
That being said, a very cool concept - and agreed that, given the relatively small number of ecommerce platforms out there, scraping then serving them up seems pretty scalable. Interested to see how it goes.
At least one of them might have been using our technology in the backend, especially if they're one of the top 5 shopping search engines.
The downside to feeds is that they become obsolete very quickly, especially if the product is popular. Products sell out very quickly, retailers lose money on traffic they can't onboard and shoppers get frustrated.
> At least one of them might have been using our technology in the backend, especially if they're one of the top 5 shopping search engines.
Which of these "top 5 shopping search engines" have you worked with? You don't seem to mention any on your website.
> The downside to feeds is that they become obsolete very quickly, especially if the product is popular. Products sell out very quickly, retailers lose money on traffic they can't onboard and shoppers get frustrated.
Feeds are the only way to keep up with frequently changing listings from large retailers (apart from doing live API requests), since scraping is several orders of magnitude slower. Amazon gives selected partners incremental feeds; scraping their millions of products takes days.
I built a CJ scraper for a deals website that is now defunct. What a pain it was to maintain. All the different retailers dump their data into CJ in different ways. I might just put it on github if anyone's interested. Python + chromedriver + beautifulsoup + mechanize
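For anyone curious what "different ways" means in practice, the maintenance burden boils down to per-retailer parsing along these lines. The selectors below are invented for illustration; every retailer needs its own set, which is exactly the pain point:

import requests
from bs4 import BeautifulSoup

def scrape_deal(url: str) -> dict:
    """Fetch one deal page and pull out the basics."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # Hypothetical selectors: every retailer structures its pages
    # differently, so each one needs its own pair of these.
    title = soup.select_one("h1.product-title")
    price = soup.select_one("span.price")
    return {
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None,
        "url": url,
    }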
I tried the demo with a Lego castle priced 99€ and got a grand total of more than $10k...
FYI, Lego showed me the French version of their website, since that's where I live. You seem to only offer shipping in the US, though that's not clear from reading your website. Still very interesting.
Can anyone go into a bit more detail about how the affiliate commissions work here? From what I have read, I would feed my affiliate link through TwoTap and you would then handle the cookie and conversion and everything?
If I was using URLs gathered from a Commission Junction datafeed, is this basically a plug and play solution? Or do I need to process those URLs?
Do you have a backend stats dashboard? Or would I still rely on CJ for that data?
We simulate what a shopper would do. We first go through your affiliate link (which drops a cookie) and then go on the retailer website to place the order.
All the commissioning, connecting/talking to retailers, receiving the money, is directly between you and the affiliate network. We're plug and play :)
We do have a stats backend where you can see all the purchases that went through Two Tap. And you can also use CJ's dashboard, just like you are probably doing right now.
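Mechanically, the cookie flow described above is roughly what a requests.Session sketch would show: follow the affiliate link so the tracking cookie lands in the jar, then visit the retailer with that same jar. The URLs here are made up:

import requests

session = requests.Session()

# Step 1: follow the affiliate link; the redirect chain drops the
# network's tracking cookie into this session's cookie jar.
session.get("https://www.example-network.com/click?pid=123", timeout=30)

# Step 2: later requests to the retailer ride on the same jar, so the
# order that eventually goes through is attributed to the affiliate.
session.get("https://www.example-retailer.com/product/456", timeout=30)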
Yes. It used to be more controversial 2 years ago.
They're only reticent about losing consumer data, breaking the relationship with shoppers, or not processing the payments themselves. And control over who sells their inventory, obviously -- which is already in place through their relationships with affiliate networks.
Tick those boxes and they're cool with it and supportive. That being said, we're still expanding tech support for retailers faster than BD can keep up -- 75 new retailers monthly at this point. That's why we're pushing all our affiliates to get approval before using Two Tap, in order to keep getting affiliate revenues.
Experiments from Twitter and Facebook also do a good job at educating the market which works in our favor -- at least merchants learn what they don't want :)
So you guys are scraping all the product information for a retailer and keeping it up to date? Or is it all live -- you fetch it when that particular URL is requested? Where do you get the list of retailers to scrape?
We're fetching the live data only for the products requested via a URL. Two Tap mimics a consumer visiting the retailer and getting that info for themselves, which also allows retailers to retain their analytics layer with no negative impact.
Our current supported stores span the top 500 as well as a number of specific integration requests.
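As a rough illustration of "mimics a consumer": the fetch is a single browser-like GET per requested URL rather than a crawl. The header values below are illustrative, not Two Tap's actual ones:

import requests

# Browser-like headers so the visit registers in the retailer's
# analytics like any other shopper. Values are illustrative.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

def fetch_product(url: str) -> str:
    # One GET for the exact URL someone asked about -- no link-following,
    # no crawl, so it looks like a single shopper's page view.
    return requests.get(url, headers=HEADERS, timeout=30).text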
Correct. We don't get any input from the retailers. Two Tap can get product availability info and place an order just by having the product URL, nothing else.
Also, the full retailer inventory is available, unlike FB or other models that require the shop to upload a certain number of products.
hm, I didn't mean infrastructure, I mean, did you buy proxy nodes from someone like sslprivateproxy and slap HAProxy in front...
Most ecommerce sites wise up to bots crawling them and have a robots.txt that suggests you might not want to...
I don't get it. Is this just a middleman between all the retail websites and the publishers? Sort of like what Google is doing with product search, also giving commissions on the items sold?
Yes, you could say that. We're laying down pipes in ecommerce so you can send an order to a merchant from anywhere on the web through a standardised API.
Retailers can extend their reach and make their inventory shoppable from anywhere with an internet connection and publishers can build ecommerce in their apps.
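A hypothetical sketch of what "one API, any retailer" implies for a publisher: the same request shape regardless of whose checkout sits behind the product URL. The endpoint and field names are invented for this example:

import requests

# Invented request shape -- the point is that the SAME shape works
# no matter which retailer's checkout the product URL points at.
order = {
    "products": [{"url": "https://www.example-retailer.com/item/1", "quantity": 1}],
    "shipping": {
        "name": "Jane Doe",
        "address": "123 Main St",
        "city": "San Francisco",
        "zip": "94103",
    },
}
resp = requests.post("https://api.example.com/v1/purchase", json=order, timeout=60)
print(resp.status_code)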
Interesting, but why the need for web scraping? If the retailers saw the monetary benefit of this, wouldn't they go out of their way to provide you the data directly, and an API as well?
Two Tap certainly would solve the feed problem - but that's a problem for affiliates, not retailers :-)
It's likely that I don't have much clue what I'm talking about, but I can't see great benefits for many retailers from this as it stands. They like more control over leads, and to onboard customers into their brand. Typically, this is managed through affiliate programs and partnerships.
Two Tap looks like it allows prospects/customers to be kept under the publisher's wing. The retailer gets the transaction, but the publisher keeps the relationship - where the real value (capital) is created and can be realised.
As someone with an interest in a small-time publisher - I'm very interested in this.
However, I have a concern with how well it sits with affiliate T&Cs. These vary by program of course - but those I know prohibit scraping by any means. That's stopped me from doing something similar with 80legs or others in the past.
If the above is right, Two Tap may well have to develop the kind of relationships with retailers that better serve their needs at the mouth of the funnel. Then Two Tap risks becoming like any other program (CJ/AW..) - and the scraping will likely have to stop. I'm sure they know the space better than anyone here, though - and certainly better than me - so I'd be interested to hear if they have these relationships in place alongside the scraping MO, or are confident of a way forward.
Given that - the CTA only leaves me wondering what's on the other side of the wall:
"Sounds interesting? Let’s talk! - Sign up below"
Do you have traffic or other requirements? What's the pricing? It'd be nice if you could tell me more/make me work less if I want to use your service :-)
You're right on a lot of counts. Let's talk -- can you get in touch at hello@ please? Happy to talk about all of this.
The main difference now is that both retailers and affiliate networks are actively pushing this model, now that they've seen they can make 5x more money from the same amount of impressions.
Also, the publisher gets to keep PART of the relationship -- the part dealing with the user's interaction with their product. The relationship with the retailer that sells the product is also kept intact -- the consumer loses just one touchpoint (a visit to the product page) but gets the same confirmation email, returns, customer support, etc. The retailer has the same load on its servers and gets exactly the same user data it normally would (shipping, email, address, billing, payment, etc).
[0]: https://twotap.com/supported-stores/
[1]: http://oldnavy.gap.com/robots.txt