
Is there much point in pointing out a dupe if there is no discussion on it?



This problem has to be solved by pg, or whoever maintains the code here. It's really too much to expect users to review all posts to see if what they want to submit has been submitted before. It's also too much to expect that there is only one url to the same content.


This is my submission and I looked as much as I could. Not only does YC have no dupe check, there's no search either.


http://searchyc.com/

It wouldn't necessarily have helped in this case, but scanning through the previous 12 entries in the "new" listing would've found the duplication.


I looked at the new ones and just didn't pick it up. (I did see it later.) I think I was expecting the original title.


URL detection isn't as trivial as it sounds - lots of edge cases http://visualwebsiteoptimizer.com/split-testing-blog/challen...
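To make a few of those edge cases concrete, here is a minimal sketch of URL canonicalization using only the Python standard library. The function name `canonical_url` and the `TRACKING_PARAMS` list are assumptions made up for illustration; this is not how HN compares URLs, and a real normalizer would need to handle many more cases (redirects, mobile subdomains, session IDs, and so on).

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of query parameters that do not change the content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign",
                   "utm_term", "utm_content", "ref"}

def canonical_url(url: str) -> str:
    """Reduce a URL to a rough canonical form for duplicate comparison."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()

    # Host names are case-insensitive; treat a leading "www." as optional.
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    # Keep the port only when it is not the default for the scheme.
    port = parts.port
    default = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port and not default:
        host = f"{host}:{port}"

    # Ignore a trailing slash, but keep "/" for the site root.
    path = parts.path.rstrip("/") or "/"

    # Drop tracking parameters and sort the rest so ordering does not matter.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in TRACKING_PARAMS]
    query = urlencode(sorted(kept))

    # The fragment is discarded: it normally points inside the same page.
    return urlunsplit((scheme, host, path, query, ""))
```

Two submissions would then be treated as the same link when their canonical forms are equal. This still misses the case where the same article lives at two genuinely different URLs.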


I wonder if you could do a quick content check -- something along the lines of:

    Run the content through the "Readability" filter (http://lab.arc90.com/experiments/readability/)
    Hash the first N paragraphs
    Compare the hash on a submitted article to all hashes from that domain
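The three steps above can be sketched roughly as follows. This assumes the text extraction (the "Readability" step) has already happened elsewhere and hands over a list of paragraph strings; `fingerprint`, `DupeIndex`, and the default of three paragraphs are hypothetical names and values chosen for illustration.

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(paragraphs, n=3):
    """Hash the first n non-empty paragraphs after light normalization."""
    # Collapse whitespace and lowercase so trivial formatting differences
    # do not change the hash.
    normalized = [re.sub(r"\s+", " ", p).strip().lower() for p in paragraphs]
    normalized = [p for p in normalized if p][:n]
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()

class DupeIndex:
    """In-memory map of domain -> {content hash: first item id seen}."""

    def __init__(self):
        self._by_domain = defaultdict(dict)

    def check_and_add(self, domain, paragraphs, item_id):
        """Return the earlier item id on a match, else record and return None."""
        digest = fingerprint(paragraphs)
        seen = self._by_domain[domain.lower()]
        if digest in seen:
            return seen[digest]
        seen[digest] = item_id
        return None
```

An exact hash only catches copies whose opening paragraphs are identical after normalization; a single edited word produces a different hash, so this would miss lightly rewritten versions.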


It doesn't have to be a perfect solution. Making an HTTP request to a submitted URL and grabbing the contents of the title tag is a trivial development effort that adds a lot of value to a community that doesn't want to see the same stuff on the front page over and over...
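A rough sketch of that title grab, using only the Python standard library. The names `extract_title` and `fetch_title`, the 64 KB read limit, and the UTF-8 decoding are assumptions for illustration; a real version would want proper charset detection and error handling.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)

def extract_title(html: str) -> str:
    """Return the page title with whitespace collapsed, or '' if none."""
    parser = _TitleParser()
    parser.feed(html)
    return " ".join("".join(parser.parts).split())

def fetch_title(url: str, timeout: float = 5.0) -> str:
    """Fetch the start of a page and return its title (assumes UTF-8)."""
    with urlopen(url, timeout=timeout) as resp:
        raw = resp.read(65536)  # the title is normally near the top
    return extract_title(raw.decode("utf-8", errors="replace"))
```

The fetched title could then be compared against the titles of recent submissions, which would help even when the submitter has rewritten the headline.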


And as a bonus it's a problem waiting for someone to solve


No, there isn't. I've already suggested a solution that will work in almost every case of duplication I've detected, and it appears to be completely ignored.

To be fair the general decline in civility and quality, and the general increase in spam might be regarded as more pressing issues, but the solution I suggested seems trivial to implement and moderately effective.

http://news.ycombinator.com/item?id=1012215


Isn't this what plagiarism algorithms are doing? Scrape the content, run through the plagiarism filter (although not a very strict one to prevent false positives on quotations and the like) and you are done. If there's a match, show it so the submitter can see for himself.
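One simple technique in that family is comparing overlapping word sequences ("shingles") with Jaccard similarity. The sketch below is only that technique, not any particular plagiarism product; the shingle size of 5 and the 0.5 threshold are arbitrary assumptions that would need tuning to avoid false positives on shared quotations.

```python
import re

def shingles(text, k=5):
    """Return the set of k-word sequences in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b, k=5):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def looks_like_dupe(a, b, threshold=0.5):
    """Flag two texts as probable duplicates above a similarity threshold."""
    return jaccard(a, b) >= threshold
```

On a match, the site could show the earlier submission to the submitter and let them decide, as suggested above, rather than rejecting the post outright.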



