MixRank (YC S11) Raises $1.5M from Mark Cuban, 500 Startups (techcrunch.com)
101 points by il on Dec 13, 2011 | 14 comments



Quite an interesting product. I typed a competitor's name in and immediately saw a huge list of ads they were running. Very helpful in just ten seconds!


As an aside, we're hiring engineers: http://news.ycombinator.com/item?id=3160100

We're working on interesting problems at scale, like crawling the web and big data analytics. If this sounds interesting, it's worth reaching out :).


We're psyched to be in on this one. http://blog.raaventures.com/post/14220926611/raaventuresinve... Looking forward to working with Scott and Ilya.


I get how they can crawl ads. But how do they figure out the most effective ad? The only thing I can think of is that the most effective ad is the most frequent one. But that isn't necessarily true.
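A slightly better proxy than raw frequency might be longevity: ads an advertiser keeps paying to re-run are presumably converting. A toy sketch of that kind of scoring over repeated crawl snapshots (all data and names below are hypothetical):

    from collections import Counter
    from datetime import date

    # Each crawl snapshot: the set of ad creatives observed on that day.
    crawls = {
        date(2011, 12, 1): {"ad_a", "ad_b"},
        date(2011, 12, 5): {"ad_a", "ad_c"},
        date(2011, 12, 9): {"ad_a", "ad_c"},
    }

    seen = Counter()
    first_seen, last_seen = {}, {}
    for day, ads in sorted(crawls.items()):
        for ad in ads:
            seen[ad] += 1
            first_seen.setdefault(ad, day)
            last_seen[ad] = day

    # Score = times seen * days in rotation; long-running ads bubble up.
    def score(ad):
        lifespan = (last_seen[ad] - first_seen[ad]).days + 1
        return seen[ad] * lifespan

    for ad in sorted(seen, key=score, reverse=True):
        print(ad, score(ad))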


> MixRank's search engine for AdSense crawls pages running Google ads, indexing effectiveness data and estimating ad performance.

That's strange. Crawling Google's ads is prohibited by robots.txt and their terms of service. And Google tends to enforce this kind of rule.


There are a half dozen other mass ad crawling companies selling the same service (the ability to search and analyze your competitors' ads), dating back at least several years.

On one hand, you'd think that if this weren't permissible, they'd be big targets and Google would go after them legally. On the other hand, what would the legal basis for blocking this be?

Someone browsing my website and seeing a Google ad has not agreed to any Google terms of service. It's a tough argument that a browser is allowed to do this but a crawler isn't, especially when Google's own crawlers now include JavaScript-executing WebKit browsers.


AdWords ads are served from google.com. You'd think this service isn't limited to AdSense...

They could enforce it by technical means, e.g. by blocking your IP.


Not really. Takes about 5 seconds to get a new IP for an Amazon EC2 instance.
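For illustration: when the instance has no Elastic IP attached, a stop/start cycle is enough to get a new public address. A sketch using boto3 (the modern API; instance ID and region are made up):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    instance_id = "i-0123456789abcdef0"  # hypothetical

    # Stopping and starting an instance without an Elastic IP
    # reassigns its public address.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

    desc = ec2.describe_instances(InstanceIds=[instance_id])
    print(desc["Reservations"][0]["Instances"][0].get("PublicIpAddress"))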


The ads we crawl aren't served from the google.com domain, and we haven't agreed to any TOS.

If this were the case, then you wouldn't be able to run a headless browser without accidentally violating TOS and robots.txt.
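For what it's worth, checking which domains actually serve the ads on a page is straightforward. A rough sketch (the host list and the parsing are simplified assumptions, not MixRank's method):

    import re
    import urllib.request

    AD_HOSTS = ("googlesyndication.com", "doubleclick.net")  # assumed list

    def google_ad_sources(url):
        html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
        # Pull src attributes from script/iframe tags; crude but illustrative.
        srcs = re.findall(r'<(?:script|iframe)[^>]+src=["\']([^"\']+)', html, re.I)
        return [s for s in srcs if any(h in s for h in AD_HOSTS)]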


Out of curiosity, how can they enforce that sort of thing? Blocking IPs is the only thing that comes to mind, but some tricky JavaScript might be able to keep Google from even knowing that they're being crawled at all. How could Google even find the people doing it? (Besides, you know, reading this article.)


If you start making automated requests to their search engine, they very quickly start serving captchas. They detect it using machine learning on features like the number of requests coming from an IP, etc.
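One of the simplest such features is a per-IP sliding-window request counter. A minimal sketch (the window and threshold are made up; a real system would combine many features in a model):

    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 60
    MAX_REQUESTS = 30  # made-up threshold

    _requests = defaultdict(deque)  # ip -> timestamps of recent requests

    def should_serve_captcha(ip, now=None):
        now = time.time() if now is None else now
        window = _requests[ip]
        window.append(now)
        # Evict timestamps older than the window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        return len(window) > MAX_REQUESTS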


We've used SpyFu before. How is this different/better?


This addresses Google's content network. You still need SpyFu to target the search network.


Congrats, Scott & Ilya!



