Show HN: TLDR This – Auto summarize any article or webpage in a click (tldrthis.com)
118 points by radhakrsna on March 15, 2020 | 59 comments


Nice landing page. If you Google "summarizer", you will find dozens of similar services for free. The mechanism behind it is very simple. A couple of years ago I built one from scratch in about 2 hours, then I accidentally deleted it and rewrote it in 15 minutes. Here's how most of them work:

1. Split the text into words

2. Rank each word based on how many times it appears in the text. For example, a word that appears 10 times gets 10 points, and so on.

3. Rank sentences based on the sum of the scores of each word inside them.

4. Return the top N sentences by score (N is up to the user), in the order in which they appear in the text.

For extra fanciness, exclude the most common articles and prepositions and give 2 points to proper nouns.

Works surprisingly well.
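
A minimal sketch of steps 1-4, for illustration only. The stop-word list and the naive sentence splitter are assumptions, and the proper-noun bonus is left out; this is not the code of any particular service.

```python
import re
from collections import Counter

# Assumed stop-word list; the comment only says "most common articles and prepositions".
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "for", "by", "with", "from"}

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Step 1: lowercase words, punctuation dropped.
    return re.findall(r"[A-Za-z']+", sentence.lower())

def summarize(text, n=3):
    sentences = split_sentences(text)
    # Step 2: each word scores its number of occurrences in the whole text.
    counts = Counter(w for s in sentences for w in tokenize(s) if w not in STOP_WORDS)
    # Step 3: a sentence scores the sum of its words' scores (stop words count 0).
    scores = [sum(counts[w] for w in tokenize(s)) for s in sentences]
    # Step 4: take the top n by score, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]
```

Calling summarize(article_text, n=4) returns up to four sentences in their original order.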


You can use tf-idf [1] to achieve step 2 and that extra fancy part of excluding common articles and prepositions: count the frequency of words in the article, but divide it by the sum of frequencies from past articles.
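
A rough sketch of that weighting. Note that it uses the common log-scaled inverse document frequency rather than the plain division described above, and the smoothing terms are an assumption so that words never seen in past articles do not divide by zero.

```python
import math
from collections import Counter

def tf_idf_scores(article_words, past_articles):
    """article_words: list of tokens; past_articles: list of token lists."""
    tf = Counter(article_words)
    past_sets = [set(doc) for doc in past_articles]
    n_docs = len(past_sets)
    scores = {}
    for word, count in tf.items():
        # Document frequency: how many past articles contain the word.
        df = sum(1 for doc in past_sets if word in doc)
        # Smoothed idf (assumed form): rare words get a larger multiplier.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = count * idf
    return scores
```

Words like "the" appear in nearly every past article, so their multiplier stays near 1 and rarer words outrank them without a hand-written stop list.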

Text summarization works as a good toy problem, because it leads to two harder problems: 1. text extraction (how to distinguish content from non-content like ads) 2. q&a (given text and a question about the text, how can you produce an answer).

[1] https://en.wikipedia.org/wiki/Tf%E2%80%93idf


I first heard about this from the autotldr bot on Reddit [1] which uses a similar service SMMRY [2].

The SMMRY page points out some extra NLP-related grunt work in addition to the high-level steps you list, like:

>Associate words with their grammatical counterparts. (e.g. "city" and "cities")

>Detect which periods represent the end of a sentence. (e.g "Mr." does not).
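
Both chores can be sketched crudely as follows. The abbreviation list and the plural rules are assumptions made for illustration; they are not SMMRY's actual method, and a real stemmer or tokenizer would handle far more cases.

```python
# Assumed abbreviation list for illustration; a real one would be much longer.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "prof.", "e.g.", "i.e."}

def split_sentences(text):
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        ends = token.endswith((".", "!", "?"))
        # A period ends the sentence unless the token is a known abbreviation.
        if ends and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

def normalize(word):
    # Crude plural folding, not a real stemmer: "cities" -> "city", "dogs" -> "dog".
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word
```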

[1] https://www.reddit.com/r/autotldr/comments/31b9fm/faq_autotl...

[2] https://smmry.com/about


True, there are quite a few similar services but not many seem to work well. Our service provides better summarization (at least for the articles I tested), has additional features like extracting author name, publish date, important keywords etc. and also comes with browser extensions so you can summarize pages at the click of a button.

The method you described is a part of our algorithms but more steps are needed to make it give meaningful results and make sure it works on different kinds of articles.


I used a similar algorithm when developing a Chrome extension a few years ago: https://chrome.google.com/webstore/detail/auto-highlight/dnk...

Longer sentences had an inherent advantage, so I controlled for that by reducing sentence scores as a function of the sentence length.
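
The exact penalty is not given above, so the two options in this sketch (mean score per word, or division by the square root of the length) are assumptions about what such a correction could look like.

```python
import math

def length_normalized_score(word_scores, sentence_words, mode="mean"):
    """word_scores: dict word -> score; sentence_words: tokens of one sentence."""
    if not sentence_words:
        return 0.0
    total = sum(word_scores.get(w, 0) for w in sentence_words)
    if mode == "mean":
        # Average score per word: length alone no longer helps.
        return total / len(sentence_words)
    # Softer penalty: divide by sqrt(length), so length still helps a little.
    return total / math.sqrt(len(sentence_words))
```

With the raw sum, a long sentence padded with low-value words can outrank a short, dense one; either normalization reverses that.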


I built a similar service a while back, with a small modification to the common algorithm.

You can improve contextual summarization by splitting the x sentences into x/n buckets. Then, based on the percentage of the article to be kept (e.g. return 60% of the article), pick the sentences ranked in the top 60% of each bucket. Then do the same for all x sentences, i.e. the top 60% across buckets, and combine the results.

This prevents the bias arising from picking a sentence with a lot of critical words.
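
A sketch of the per-bucket part only; the final combination with the global ranking is not shown. The function name, the fixed bucket size, and rounding up to at least one sentence per bucket are all assumptions.

```python
import math

def bucketed_top(scores, bucket_size, fraction):
    """scores: per-sentence scores in document order.
    Returns indices of the kept sentences, in document order."""
    keep = []
    for start in range(0, len(scores), bucket_size):
        bucket = range(start, min(start + bucket_size, len(scores)))
        # Keep the same fraction of every bucket, at least one sentence each.
        k = max(1, math.ceil(fraction * len(bucket)))
        best = sorted(bucket, key=lambda i: scores[i], reverse=True)[:k]
        keep.extend(best)
    return sorted(keep)
```

For scores [9, 8, 7, 1, 2, 3] with buckets of 3 and a fraction of 0.5, a global top-4 would take three sentences from the first half, while the bucketed version takes two from each half.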


I agree that the effectiveness is quite surprising, given the simplicity of the analysis. It can go very wrong, however, if a significant negation is overlooked, as in cautionary tales:

... So, don't do what the late Thag Simmons did...

Maybe final paragraphs should be more highly weighted? That's often where the conclusion is.
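
One way to try that idea, as a sketch: multiply the scores of sentences in the last paragraph by a constant. The function name and the 1.5 factor are guesses, not tested values, and this does nothing about the negation problem itself.

```python
def boost_final_paragraph(paragraph_scores, boost=1.5):
    """paragraph_scores: list of paragraphs, each a list of sentence scores."""
    if not paragraph_scores:
        return []
    # Copy so the caller's lists are left untouched.
    boosted = [list(p) for p in paragraph_scores]
    # Only the last paragraph is weighted up.
    boosted[-1] = [s * boost for s in boosted[-1]]
    return boosted
```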


I really want to like this. My points -

1. From the articles I tried, the summaries seem to be very basic. They don't seem to capture the essential points of the articles they are trying to summarize.

2. I tried the 'advanced summarizer' too. Here again, it seems to have the same problem. Worse, it seems to skip parts of the article, especially if they are beyond a certain length.

3. The landing page is nice. But the product seems to be targeted towards people who want to share a summary of a random blog post rather than try and save time reading the article.

In my opinion, SMMRY, as mentioned in another comment and which I've used since whenever, seems like a much better product. Additionally, SMMRY also gives you the capability of expanding the number of lines of the summary in case you've found the article interesting and want to read a few more details of it, rather than the full thing.

SMMRY: https://smmry.com/


Is there any thought put into considering if this type of service is actually beneficial?

Of course on its face it seems nice that it saves us time. But it's no secret that the reduction of complicated topics into simplified one-liners leads to less understanding and more misinformation spread.

In my opinion, this just makes that problem worse. There is often a reason that texts aren't already shorter. If the author didn't intend for you to read the details of something and instead wanted you to just read bullet points, they would have just made the bullet points themselves.


Didn't do a great job on this AP News article on coronavirus - only four bullet points, two of them repeated: https://apnews.com/545af824f44a22f7559c74679a4f1f53.


I tried the advanced summarizer and got the lines below. Seems to me it skips summarizing beyond a certain length of the article.

Most people have had mild to moderate illness and recovered, but the virus is more serious for those who are older or have other health problems.

The risk of virus transmission from food servers is the same risk as transmission from other infected people, but “one of the concerns in that food servers, like others facing stark choices about insurance and paychecks, may be pressured to work even if they are sick,” she said.

Tests have found high amounts of virus in the throats and noses of people a couple days before they show symptoms. Flu kills about 0.1% of those it infects, so the new virus seems about 10 times more lethal, the National Institutes of Health’s Dr. Anthony Fauci told Congress last week.

The death rate has been higher among people with other health problems -- more than 10% for those with heart disease, for example.


The Basic Summarizer has its restrictions. Try Advanced summarizer. It will give better results.


If the Basic Summarizer is meant to convince me to sign up for the Advanced Summarizer, that's not gonna happen with the former providing unconvincing results.


While I didn't have high hopes to get a summary of a technical paper, since I spend a good chunk of time every week reading some related to exploits and mitigations for a podcast I host, I hoped this might help reduce time spent trying to get an overall understanding before diving into the details.

It actually did better than I expected with the paper "Bypassing memory safety mechanisms through speculative control flow hijacks" [0]

I copied and pasted the text from sections 3-7 (Case Studies - Conclusion) and Section 2 on its own (describes the attack)

It did pull out some important statements, better than I expected. Probably won't save me much time, but I was quite disappointed by the fact that the Advanced and Basic versions were the same for both which kinda felt a bit cheap to get the same results especially since it still cost to get that advanced result. Maybe including information about how the basic version is restricted and what the advanced does better would make it easier to know when the advanced version won't be useful.

I also tested with a random write-up I'll be covering tomorrow, "Breaking the Competition" [1]. I had higher hopes for this since it was more of a blog-ish post. I did get different results for basic and advanced with this one, but the result was basically nonsense, worse than expected, and worse than the paper summary.

Overall, probably not something that I'll end up using, but technical content also isn't the intended use-case which is totally fair. I'll also add that one feature that I looked for immediately was API access as I'd have wanted to integrate this into an app I use to plan episodes.

- [0] https://arxiv.org/pdf/2003.05503v1.pdf

- [1] https://medium.com/ctf-writeups/breaking-the-competition-bug...


I did try to run it four times, and in each case the result was semi-random: it looks like picking 4 random sentences that open paragraphs. There was not a single case when I would consider the output useful-ish.


Can you please let me know the article that you tested it on? Maybe you could try the advanced summarizer and see if it gives useful results.



Thank you! Yes, it doesn't work that well on technical articles. We will try to keep improving it.


You could try selling it to Yahoo for a few billion dollars like some other kid did with exactly the same thing.



I love the idea. The implementation did not produce the expected results ( article shown on HN - https://amp-economist-com.cdn.ampproject.org/c/s/amp.economi... ).

That said. Keep at it. It seems like a viable and valuable service.


Using the advanced summarizer, I got this -

On March 9th America’s government awarded a trio of firms $39.7m to design “microreactors” that can supply a few megawatts of power to remote military bases, and be moved quickly by road, rail, sea and air. The idea of small reactors is as old as nuclear power itself.

In July 1951, five months before a reactor in Idaho became the first in the world to produce usable electricity through fission, America began building USS Nautilus, a nuclear-powered submarine.

A report by the army in 2018 said that Holos, a prototype mobile nuclear reactor, would be 62% cheaper than using liquid fuel.

NASA is developing smaller “Kilopower” reactors for space missions, designed to power small lunar outposts.


Glad that you liked it. Have you tested it out with the advanced summarizer? Thank you very much for your encouragement. I will try to keep improving it.


It sounds like a joke, I tried to reduce a technical manual but it didn't... well... I don't really know how to expect anything. This however: https://www.latimes.com/world-nation/story/2020-03-13/china-... worked really well.


It isn't really built for manuals etc. It is mainly for blog posts or news articles.


I didn't expect it to work for either.


Interesting stuff. Have you succeeded in getting paying customers for such a service? I've seen some similar free alternatives online, e.g. Resoomer, SMMRY.


v1 of our service was free as well. v2 includes a basic summarizer which is free and an advanced summarizer which requires payment. Just launched the premium version, so waiting to see.


I’ve noticed an increase in services like this lately. What gives? Is there some sort of ML serverless offering made available on GCP?


Not exactly “serverless” but I built something similar with AWS SageMaker, which has elastic inference abilities. It’s rather fast to spin up and down.

Also, when it comes to summarization, you don’t really need to run inference on every request; you can throw up a pretty simple caching system. That means repeat requests are far cheaper and faster.

I used Cloudflare Workers as a proxy / caching layer with KV in front of an AWS Lambda to do article extraction and SageMaker spin-up (with a small cache on the AWS side too, to catch in-progress jobs).
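
The caching idea in isolation, as a sketch: an in-memory dict stands in for the KV store, and summarize_fn is a hypothetical placeholder for the expensive model call. No Cloudflare or AWS APIs are used here, and in-progress job tracking is not shown.

```python
import hashlib

class SummaryCache:
    def __init__(self, summarize_fn):
        self._summarize = summarize_fn  # hypothetical expensive call
        self._store = {}                # stands in for a KV store
        self.misses = 0

    @staticmethod
    def key_for(url):
        # Hash the URL so keys have a fixed size regardless of URL length.
        return hashlib.sha256(url.strip().encode("utf-8")).hexdigest()

    def get(self, url):
        key = self.key_for(url)
        if key not in self._store:
            # Cache miss: run the expensive call once and remember the result.
            self.misses += 1
            self._store[key] = self._summarize(url)
        return self._store[key]
```

A second request for the same URL is then served from the store without touching the model.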


That’s really interesting. Cool idea


This is v2 of the service that I launched last year. I am not aware of any ML serverless offering that does text summarization.


Oh interesting! So you made the summarization algorithm?


Awesome product! I was quite surprised at how well it summarized Dutch articles as well.

Are you planning to launch an API anytime soon?


Glad you liked it. Let me know if you have any feedback/suggestions. Yes, we do plan to launch an API soon. Please message us here - https://tldrthis.com/contact and we will let you know when we launch it.


Did a fairly poor job on this Ars article https://arstechnica.com/science/2020/03/what-monty-pythons-m...


You could try the advanced summarizer. It gives better results.


Basic

* It's intended in part as a commemoration on the 50-year anniversary of the sketch, but also to draw attention to the need for a more streamlined peer review process for grants in the health sciences.

* "So, put together a Monty Python fan with a creative scientific mind and an expert in gait analysis, and this paper is what you get," Butler told Ars. Or, as they wrote in their paper, "It really is the silliness of the sketch that resonates with us, and extreme silliness seems more relevant now than ever before in this increasingly Pythonesque world."

* First aired on September 15, 1970, on BBC One, the sketch opens with Cleese's character buying a newspaper on his way to work—which takes him a bit longer than usual since his walk "has become rather sillier recently." Waiting for him in his office is a gentleman named Mr. Putey (Michael Palin) seeking a grant from the Ministry to develop his own silly walk.

* (Note: the name is spelled "Pudey" in the paper but we're going with the Wiki spelling.) Mr. Putey demonstrates his silly walk-in-progress, but the Minister isn't immediately impressed.

Advanced

* One of the best-known sketches from Monty Python's Flying Circus features John Cleese as a bowler-hatted bureaucrat with the fictional Ministry of Silly Walks.

* Waiting for him in his office is a gentleman named Mr. Putey (Michael Palin) seeking a grant from the Ministry to develop his own silly walk.

* For their own gait analysis, Butler and Dominy studied both Mr. Putey's and the Minister's gait cycles in the video of the original 1970 televised sketch, as well as the Minister's gaits from a 1980 live stage performance in Los Angeles.

* Butler and Dominy found that the Minister's silly walk is much more variable than a normal human walk—6.7 times as much—while Mr. Putey's walk-in-progress is only 3.3 times more variable.

* The sketch might be satirizing bureaucratic inefficiency, but Cleese's Minister is essentially engaging in a hyper-streamlined version of the peer review process in his meeting with Mr. Putey that (the authors concluded) resulted in a fair assessment.


I tried this on three of my own articles. It worked very poorly. There’s a lot of room for improvement before asking people to pay for it IMO.

I may try my hand at writing one with the reqs outlined in the top comment just for a fun coding project.


Yes, you are right. There is still room for improvement and we will keep trying to make it better. The reason I added paid plans was to test whether people would be willing to pay for such a service so that I can spend more time adding features to it and making it better.


Fair enough, decent business strategy!


I get an error. I’m on Safari mobile on iOS 13.4. I use content blockers, BlockBear and Firefox Focus - not sure if that’s relevant or not.

Method Not Allowed

The method is not allowed for the requested URL.


Thank you for letting me know. Are you using the extension or the web app?


Web app from the Show HN link


Firefox Mobile was functional.


Hugged to death? Entering a url to summarize just results in an error for me.


If you Google "summarizer", you will find dozens of similar services for free. A couple a years ago I built one from scratch in about 2 hours, then I accidentally deleted it and rewrote it in 15 minutes. Rank each word based on how many times it appears in the text.


Some of the results had me laughing out loud. It would be super useful if it worked though. It's a hard problem.


How did you make the landing page?


This thing understands Finnish? Very strange. TLDR from this was quite excellent: https://sarastuslehti.com/2020/03/12/koronaviruksen-kayra-on...


Yes, it works on quite a few languages. Glad you liked it. Let me know if you have any feedback/suggestions.


The article was a translation from English. The English summary was less good, as it repeats the same sentence twice. https://www.takimag.com/article/crushing-the-coronavirus-cur...


Yes, that's because that sentence appears twice in the article. Advanced summarizer gives a better summary for that URL.


It would be more useful to me if the authors of articles would provide the tl;dr themselves.


How does this work?


not bad, not bad


Glad you liked it. Let me know if you have any feedback/suggestions.


Works great!


Thank you!



