Hacker News

Great. I searched GitHub for a Ruby port of Readability, but the closest I could find is Pismo: https://github.com/peterc/pismo



Pismo used to use ruby-readability (linked above), but I ended up writing my own system. It works similarly to Readability and is better on certain types of poorly formatted content (but worse on others, so YMMV). Pismo is more of a general-purpose content extraction library than Readability, and better suited to machine processing and summarization (which is what I use it for).

Pismo also comes with a command line client built-in, so you can do stuff like this:

  $ pismo http://preona.net/2010/11/ever-wanted-arc90s-readability-as-an-api/ title sentences

  :title: "Ever wanted arc90's Readability as an API?"
  :sentences: Over at Preona we have been wanting something just like that for a while now. So we built it! Some time ago, while developing LazyReadr, we were faced with the fact that RSS feeds simply aren't all that lovely anymore.
Note: "sentences" picks the first few sentences by default, which is ideal for a summary generated by an automated system or for a news page :-)
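For anyone curious what a "first few sentences" summary boils down to, here is a minimal Ruby sketch. It is not Pismo's code: `first_sentences` is a hypothetical helper, the default of three sentences is an assumption, and the regex splitter is naive (it will split wrongly on abbreviations like "e.g." or "Dr.").

```ruby
# Hypothetical helper, not Pismo's actual implementation.
# Naive sentence splitter: breaks on whitespace that follows ., ! or ?
# and returns the first `count` sentences joined back together.
def first_sentences(text, count = 3)
  sentences = text.strip.split(/(?<=[.!?])\s+/)
  sentences.first(count).join(" ")
end

article = "Over at Preona we have been wanting something just like that for a while now. " \
          "So we built it! Some time ago, while developing LazyReadr, we were faced with " \
          "the fact that RSS feeds simply aren't all that lovely anymore. More text follows."

# Prints the first three sentences; "More text follows." is dropped.
puts first_sentences(article)
```

A real extractor would run this on the cleaned article body rather than raw HTML.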


You should check out topicmarks. It does summaries in a very smart way from what I've seen. We'll likely start using it to do automated summaries for RSS content in our iPad app.


Sadly, though, it "takes minutes" (and they even seem to make a big point of that...). It might be useful for slightly better summaries, though so far I've had great luck just going with the first paragraph of an article (or certain other metadata, like the <meta> description, if it scores better).
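A rough Ruby sketch of the first-paragraph-or-meta-description heuristic. The comment doesn't say how candidates are scored, so the `score` function, the 80..300 character "ideal" range and the 40-character cutoff for skipping bylines are all hypothetical. It also assumes the paragraphs and the <meta> description have already been extracted (e.g. by Pismo or an HTML parser).

```ruby
# Hypothetical scoring: favour candidates of "summary-like" length.
# The ideal range below is an assumption, not taken from any library.
IDEAL_RANGE = (80..300)

def score(text)
  len = text.strip.length
  return 1.0 if IDEAL_RANGE.cover?(len)
  # Outside the range, the score decays with distance from it.
  distance = len < IDEAL_RANGE.min ? IDEAL_RANGE.min - len : len - IDEAL_RANGE.max
  1.0 / (1 + distance / 50.0)
end

# Prefer the first substantial paragraph; use the meta description
# only when it scores strictly better (ties go to the paragraph).
def pick_summary(paragraphs, meta_description = nil)
  lead = paragraphs.map(&:strip).find { |para| para.length >= 40 } # skip bylines, captions
  return meta_description if lead.nil?
  return lead if meta_description.nil?
  score(meta_description) > score(lead) ? meta_description : lead
end

paragraphs = ["By Jane Doe",
              "Over at Preona we have been wanting something just like that for a while now. So we built it!"]
# Prints the second paragraph: the byline is skipped and the short meta text scores lower.
puts pick_summary(paragraphs, "An API for Readability")
```

Ties going to the paragraph mirrors the preference above for the first paragraph unless the metadata does better.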