How Yahoo's Latest Acquisition Stole & Broke My Heart (readwriteweb.com)
41 points by aditya on Oct 16, 2010 | 15 comments


>"It turns out the client at the end of the long pipeline of invoices sold a diet pill, and young women were complaining on MySpace and forums that the pill sometimes caused leakage from their..."

So you thought that because you were a consultant, it didn't matter that you were aggregating people's personal posts from MySpace for the purposes of Big Pharma? Surely, we must draw the ethical line somewhere.


Well, it was aggregating complaints of side-effects, which is a good thing for the pharma company to keep track of if they're going to sell such a drug. But yeah, that's why I felt uncomfortable about working with them, didn't continue and now joke about it publicly. Plus it's odd.


> It was beautiful, but people didn't want it, they didn't understand it. Because people are stupid.

I watched his video explaining Dapper and could only loosely follow what he was doing. It was clear that he was making an RSS feed of changes to a web page, but the process of selecting the dynamic elements of the target page was unclear to me. And several times he had to say things like, "Oh, it's confused now. We'll just fix that..."

So maybe you came to the right conclusion that the rest of the world is stupid. Or maybe Dapper was a little difficult to use, and the value proposition was a little vague, and it never really took off. Thousands of cool projects have met the same fate. That's just how it goes.

Keep your chin up. At least you got a cool sweatshirt.


A company called "Fetch" is very similar to Dapper. Their tech is used to aggregate things like Dow Jones news stories.

They monetized by selling licenses to use their scraper. I wonder why Dapper couldn't do the same?

http://www.fetch.com/


Does anyone know of something equivalent to Dapper out there? I really wish there were, since I need this for a project. Thanks!


Pythoneers have BeautifulSoup. It is fast, simple, and can deal with real-world HTML. I have used it for site scraping with great success.
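To give a feel for what that looks like, here is a minimal sketch, assuming the current `bs4` package (which may not be the version the commenter used) and a made-up page fragment. The markup, class names, and the `extract_tracks` helper are all hypothetical; a real scraper would fetch the page over HTTP first.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment, invented for illustration only.
HTML = """
<html><body>
  <ul id="playlist">
    <li class="track">
      <span class="artist"> Artist A </span> - <span class="title">Song One</span>
    </li>
    <li class="track">
      <span class="artist">Artist B</span> - <span class="title"> Song Two </span>
    </li>
  </ul>
</body></html>
"""


def extract_tracks(html):
    """Return a list of (artist, title) tuples found in the markup."""
    # "html.parser" is the stdlib-backed parser, so no extra dependency is needed.
    soup = BeautifulSoup(html, "html.parser")
    tracks = []
    for item in soup.find_all("li", class_="track"):
        artist = item.find("span", class_="artist")
        title = item.find("span", class_="title")
        # Skip list items that are missing either field.
        if artist is not None and title is not None:
            tracks.append(
                (artist.get_text(strip=True), title.get_text(strip=True))
            )
    return tracks


if __name__ == "__main__":
    for artist, title in extract_tracks(HTML):
        print(artist, "-", title)
```

Unlike Dapper, nothing here is point-and-click: you still have to inspect the page and pick the tags yourself.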


lxml.html does a pretty good job too, and offers ElementTree and XPath querying. http://codespeak.net/lxml/lxmlhtml.html

Recently I used Beautiful Soup in a very simple program to scrape playlists from Soma.fm: http://www.michielovertoom.com/hobby/somafm-playlists/
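For the lxml.html route, a rough sketch of the XPath style of querying might look like the following. It assumes the `lxml` package is installed; the markup, the `stations` id, and the `extract_links` helper are invented for illustration and are not taken from any of the sites mentioned here.

```python
import lxml.html

# Hypothetical page fragment, invented for illustration only.
HTML = """
<html><body>
  <a name="top"></a>
  <div id="stations">
    <p><a href="/channel-one">Channel One</a></p>
    <p><a href="/channel-two"> Channel Two </a></p>
  </div>
</body></html>
"""


def extract_links(html):
    """Return (link text, href) pairs for links inside the stations block."""
    doc = lxml.html.fromstring(html)
    # XPath: every <a> with an href attribute anywhere under div#stations.
    anchors = doc.xpath('//div[@id="stations"]//a[@href]')
    return [(a.text_content().strip(), a.get("href")) for a in anchors]


if __name__ == "__main__":
    for text, href in extract_links(HTML):
        print(text, href)
```

The XPath expression does the selecting in one line, which is the main draw over walking the tree by hand.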


http://webnumbr.com is like Dapper for numbers


If you're pretty technically inclined and know your way around Firebug/WebKit Inspector, YQL (Yahoo Query Language) is very convenient. It lets you use CSS selectors to grab data and returns it in JSON or XML.

It's great for quick and dirty hacks, but the big question is how long Yahoo will allow it to stick around.



Thanks for all those great replies. I will now spend some time reviewing!!


Fetch, but it's super expensive: http://www.fetch.com/


http://needlebase.com/ belongs to ITA Software, which is being acquired by Google... You can find some public datasets at https://pub.needlebase.com/


Enjoy it while you can, until an uncaring market starves it to death and it turns into an ad network, for lack of viable alternatives.

Depressing as that is, perhaps the future holds promise. As the cost of building startups keeps trending down, maybe timing will stop mattering as much, and business model innovation (such as freemium) will save good technology from turning into ad networks?

Isn't this part of the skill required to build a successful business anyway? Your technology is only as good as the people selling it. Where would Google be if they hadn't hit on AdSense?


Fun fact: Jon is the person who told me about HN in the first place.



