MixRank (YC S11) Raises $1.5M from Mark Cuban, 500 Startups (techcrunch.com)
101 points by il on Dec 13, 2011 | 14 comments



Quite an interesting product. I typed a competitor's name in and immediately saw a huge list of ads they were running. Very helpful in just ten seconds!


As an aside, we're hiring engineers: http://news.ycombinator.com/item?id=3160100

We're working on interesting problems at scale, like crawling the web and big data analytics. If this sounds interesting, it's worth reaching out :).


We're psyched to be in on this one. http://blog.raaventures.com/post/14220926611/raaventuresinve... Looking forward to working with Scott and Ilya.


I get how they can crawl ads. But how do they figure out the most effective ad? The only thing I can think of is that the most effective ad is the most frequent one. But that isn't necessarily true.
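A slightly better proxy than raw frequency might be longevity: ads an advertiser keeps paying to re-run are presumably converting. A toy sketch of that kind of scoring over repeated crawl snapshots (all data and names below are hypothetical):

    from collections import Counter
    from datetime import date

    # Each crawl snapshot: the set of ad creatives observed on that day.
    crawls = {
        date(2011, 12, 1): {"ad_a", "ad_b"},
        date(2011, 12, 5): {"ad_a", "ad_c"},
        date(2011, 12, 9): {"ad_a", "ad_c"},
    }

    seen = Counter()
    first_seen, last_seen = {}, {}
    for day, ads in sorted(crawls.items()):
        for ad in ads:
            seen[ad] += 1
            first_seen.setdefault(ad, day)
            last_seen[ad] = day

    # Score = times seen * days in rotation; long-running ads bubble up.
    def score(ad):
        lifespan = (last_seen[ad] - first_seen[ad]).days + 1
        return seen[ad] * lifespan

    for ad in sorted(seen, key=score, reverse=True):
        print(ad, score(ad))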


> MixRank's search engine for AdSense crawls pages running Google ads, indexing effectiveness data and estimating ad performance.

That's strange. Crawling Google's ads is prohibited by robots.txt and their terms of service. And Google tends to enforce this kind of rule.


There are a half dozen other mass ad crawling companies selling the same service (the ability to search and analyze your competitors' ads), dating back at least several years.

On one hand, you'd think that if this weren't permissible, they'd be big targets and Google would go after them legally. On the other hand, what would the legal basis for blocking this be?

Someone browsing my website and seeing a Google ad has not agreed to any Google terms of service. It's a tough argument that a browser is allowed to do this but a crawler isn't, especially when Google's own crawlers now include JavaScript-executing WebKit browsers.


AdWords ads are served from google.com. You'd think this service isn't limited to AdSense...

They could enforce it by technical means, e.g. by blocking your IP.


Not really. Takes about 5 seconds to get a new IP for an Amazon EC2 instance.
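For illustration: when the instance has no Elastic IP attached, a stop/start cycle is enough to get a new public address. A sketch using boto3 (the modern API; instance ID and region are made up):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    instance_id = "i-0123456789abcdef0"  # hypothetical

    # Stopping and starting an instance without an Elastic IP
    # reassigns its public address.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

    desc = ec2.describe_instances(InstanceIds=[instance_id])
    print(desc["Reservations"][0]["Instances"][0].get("PublicIpAddress"))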


The ads we crawl aren't served from the google.com domain, and we haven't agreed to any TOS.

If this were the case, then you wouldn't be able to run a headless browser without accidentally violating TOS and robots.txt.
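For what it's worth, checking which domains actually serve the ads on a page is straightforward. A rough sketch (the host list and the parsing are simplified assumptions, not MixRank's method):

    import re
    import urllib.request

    AD_HOSTS = ("googlesyndication.com", "doubleclick.net")  # assumed list

    def google_ad_sources(url):
        html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
        # Pull src attributes from script/iframe tags; crude but illustrative.
        srcs = re.findall(r'<(?:script|iframe)[^>]+src=["\']([^"\']+)', html, re.I)
        return [s for s in srcs if any(h in s for h in AD_HOSTS)]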


Out of curiosity, how can they enforce that sort of thing? Blocking IPs is the only thing that comes to mind, but some tricky JavaScript might be able to keep Google from even knowing that they're being crawled at all. How could Google even find the people doing it? (Besides, you know, reading this article.)


If you start making automated requests to their search engine, they very quickly start serving captchas. They detect it using machine learning on features like the number of requests coming from an IP, etc.
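One of the simplest such features is a per-IP sliding-window request counter. A minimal sketch (the window and threshold are made up; a real system would combine many features in a model):

    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 60
    MAX_REQUESTS = 30  # made-up threshold

    _requests = defaultdict(deque)  # ip -> timestamps of recent requests

    def should_serve_captcha(ip, now=None):
        now = time.time() if now is None else now
        window = _requests[ip]
        window.append(now)
        # Evict timestamps older than the window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        return len(window) > MAX_REQUESTS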


We've used SpyFu before. How is this different/better?


This addresses Google's content network. You still need SpyFu to target the search network.


Congrats, Scott & Ilya!



