Nice landing page. If you Google "summarizer", you will find dozens of similar services for free. The mechanism behind it is very simple. A couple of years ago I built one from scratch in about 2 hours, then I accidentally deleted it and rewrote it in 15 minutes. Here's how most of them work:
1. Split the text into words
2. Rank each word based on how many times it appears in the text. For example, a word that appears 10 times gets 10 points, and so on.
3. Rank sentences based on the sum of the scores of each word inside them.
4. Return the top N sentences by score (N is up to the user), in the order in which they appear in the text.
For extra fanciness, exclude the most common articles and prepositions and give 2 points to proper nouns.
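A rough sketch of the four steps above, in Python. The regex tokenizer, the naive sentence splitter, the `STOP_WORDS` list and the function names are all assumptions made for illustration, and the proper-noun bonus is left out:

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (articles and prepositions).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "for", "with", "by", "from"}

def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def summarize(text, n=3):
    sentences = split_sentences(text)
    # Steps 1-2: split into words; each word scores one point per occurrence.
    scores = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
    # Step 3: a sentence scores the sum of its words' scores (stop words count 0).
    def sentence_score(sentence):
        return sum(scores[w] for w in tokenize(sentence))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sentence_score(sentences[i]),
                    reverse=True)
    # Step 4: keep the top n, then restore the original document order.
    return [sentences[i] for i in sorted(ranked[:n])]
```

Note that summing scores favours long sentences; dividing by sentence length is a common tweak, but it is not part of the recipe described above.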
You can use tf-idf [1] to achieve step 2 and that extra fancy part of excluding common articles and prepositions: count the frequency of words in the article, but divide it by the sum of frequencies from past articles.
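A minimal sketch of that weighting as described here, in Python. It divides by raw counts from past articles, which is a simplification: textbook tf-idf uses the log of the inverse document frequency instead. The function names and the `+ 1` in the divisor (to avoid dividing by zero for unseen words) are assumptions:

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def weighted_word_scores(article, past_articles):
    article_counts = Counter(tokenize(article))
    # Total occurrences of each word across the past articles.
    background = Counter()
    for past in past_articles:
        background.update(tokenize(past))
    # Words that are common everywhere (like "the") get pushed toward zero.
    return {word: count / (1 + background[word])
            for word, count in article_counts.items()}
```

With this, "the" ends up scoring lower than a rare word even when it appears more often in the article, so no explicit stop-word list is needed.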
Text summarization works as a good toy problem, because it leads to two harder problems: 1. text extraction (how to distinguish content from non-content like ads) 2. q&a (given text and a question about the text, how can you produce an answer).
True, there are quite a few similar services but not many seem to work well. Our service provides better summarization (at least for the articles I tested), has additional features like extracting the author name, publish date, important keywords etc., and also comes with browser extensions so you can summarize pages at the click of a button.
The method you described is a part of our algorithms but more steps are needed to make it give meaningful results and make sure it works on different kinds of articles.
I built a similar service a while back, with a small modification to the common algorithm.
You can improve contextual summarization by splitting the x sentences into x/n buckets. Then, based on the percentage of the article to be kept (e.g. return 60% of the article), pick the sentences ranked in the top 60% of each bucket. Then do this for all the x sentences, i.e. take the top 60% across buckets, and combine the results.
This prevents the bias arising from picking a sentence with a lot of critical words.
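One possible reading of the per-bucket step, sketched in Python. It assumes sentence scores are already computed, treats `bucket_size` as the n above, and covers only the within-bucket selection (the across-bucket pass and the merge are left out). All names are hypothetical:

```python
import math

def pick_per_bucket(scores, bucket_size, fraction):
    """Return the indices of the kept sentences, in document order.

    scores: one numeric score per sentence, in document order.
    """
    kept = []
    for start in range(0, len(scores), bucket_size):
        bucket = range(start, min(start + bucket_size, len(scores)))
        # Keep the same share of every bucket, but at least one sentence.
        quota = max(1, math.ceil(len(bucket) * fraction))
        best = sorted(bucket, key=lambda i: scores[i], reverse=True)[:quota]
        kept.extend(sorted(best))
    return kept
```

For scores `[9, 8, 7, 1, 2, 3]` with buckets of 3 and a fraction of 0.5, this keeps sentences 0, 1, 4 and 5, whereas a plain global top-4 would keep 0, 1, 2 and 5 and drop most of the second half of the article.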
I agree that the effectiveness is quite surprising, given the simplicity of the analysis. It can go very wrong, however, if a significant negation is overlooked, as in cautionary tales:
... So, don't do what the late Thag Simmons did...
Maybe final paragraphs should be more highly weighted? That's often where the conclusion is.
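As a sketch of what that could look like in Python, assuming sentence scores are already grouped by paragraph. The function name and the `boost` factor of 1.5 are arbitrary, untested choices:

```python
def boost_final_paragraph(paragraph_scores, boost=1.5):
    """paragraph_scores: one list of sentence scores per paragraph, in order."""
    if not paragraph_scores:
        return []
    boosted = [list(scores) for scores in paragraph_scores]  # copy, don't mutate input
    boosted[-1] = [score * boost for score in boosted[-1]]
    return boosted
```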
1. From the articles I tried, the summaries seem to be very basic. They don't seem to capture the essential points of the articles they are trying to summarize.
2. I tried the 'advanced summarizer' too. Here again, it seems to have the same problem. Worse, it seems to skip parts of the article, especially if they are beyond a certain length.
3. The landing page is nice. But the product seems to be targeted towards people who want to share a summary of a random blog post rather than try and save time reading the article.
In my opinion, SMMRY, as mentioned in another comment and which I've used since whenever, seems like a much better product. Additionally, SMMRY also gives you the capability of expanding the number of lines of the summary in case you've found the article interesting and want to read a few more details of it, rather than the full thing.
Is there any thought put into considering if this type of service is actually beneficial?
Of course on its face it seems nice that it saves us time. But it's no secret that the reduction of complicated topics into simplified one-liners leads to less understanding and more misinformation spread.
In my opinion, this just makes that problem worse. There is often a reason that texts aren't already shorter. If the author didn't intend for you to read the details of something and instead wanted you to just read bullet points, they would have just made the bullet points themselves.
I tried the advanced summarizer and got the lines below. Seems to me it skips summarizing beyond a certain length of the article.
Most people have had mild to moderate illness and recovered, but the virus is more serious for those who are older or have other health problems.
The risk of virus transmission from food servers is the same risk as transmission from other infected people, but “one of the concerns in that food servers, like others facing stark choices about insurance and paychecks, may be pressured to work even if they are sick,” she said.
Tests have found high amounts of virus in the throats and noses of people a couple days before they show symptoms.
Flu kills about 0.1% of those it infects, so the new virus seems about 10 times more lethal, the National Institutes of Health’s Dr. Anthony Fauci told Congress last week.
The death rate has been higher among people with other health problems -- more than 10% for those with heart disease, for example.
If the Basic Summarizer is meant to convince me to sign up for the Advanced Summarizer, that's not gonna happen with the former providing unconvincing results.
While I didn't have high hopes to get a summary of a technical paper, since I spend a good chunk of time every week reading some related to exploits and mitigations for a podcast I host, I hoped this might help reduce time spent trying to get an overall understanding before diving into the details.
It actually did better than I expected with the paper "Bypassing memory safety mechanisms through speculative control flow hijacks" [0]
I copied and pasted the text from sections 3-7 (Case Studies - Conclusion) and Section 2 on its own (describes the attack)
It did pull out some important statements, better than I expected. It probably won't save me much time, though. I was quite disappointed that the Advanced and Basic versions gave the same output for both inputs, which felt a bit cheap, especially since the advanced result still costs money. Maybe including information about how the basic version is restricted and what the advanced does better would make it easier to know when the advanced version won't be useful.
I also tested with a random write-up I'll be covering tomorrow, "Breaking the Competition" [1]. I had higher hopes for this since it was more of a blog-ish post. I did get different results for basic and advanced with this one, but the result was basically nonsense, worse than expected, and worse than the paper summary.
Overall, probably not something that I'll end up using, but technical content also isn't the intended use-case which is totally fair. I'll also add that one feature that I looked for immediately was API access as I'd have wanted to integrate this into an app I use to plan episodes.
I did try to run it four times, and in each case the result was semi-random: it looks like picking 4 random sentences that open paragraphs. There was not a single case when I would consider the output useful-ish.
On March 9th America’s government awarded a trio of firms $39.7m to design “microreactors” that can supply a few megawatts of power to remote military bases, and be moved quickly by road, rail, sea and air.
The idea of small reactors is as old as nuclear power itself.
In July 1951, five months before a reactor in Idaho became the first in the world to produce usable electricity through fission, America began building USS Nautilus, a nuclear-powered submarine.
A report by the army in 2018 said that Holos, a prototype mobile nuclear reactor, would be 62% cheaper than using liquid fuel.
NASA is developing smaller “Kilopower” reactors for space missions, designed to power small lunar outposts.
Glad that you liked it.
Have you tested it out with the advanced summarizer?
Thank you very much for your encouragement. I will try to keep improving it.
Interesting stuff. Have you succeeded in getting paying customers for such a service? I've seen some similar free alternatives online, e.g. Resoomer and SMMRY.
v1 of our service was free as well.
v2 includes a basic summarizer which is free and an advanced summarizer which requires payment.
Just launched the premium version, so waiting to see.
Not exactly “serverless” but I built something similar with AWS SageMaker, which has elastic inference abilities. It’s rather fast to spin up and down.
Also, when it comes to summarization, you don’t really need to run inference on every request. You can throw up a pretty simple caching system, which means repeat requests are far cheaper and faster.
I used Cloudflare Workers as a proxy / caching layer with KV in front of an AWS Lambda to do article extraction and SageMaker spin-up (with a small cache on the AWS side too, to catch in-progress jobs).
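The caching idea can be sketched in a few lines of Python. This is not the Cloudflare KV / Lambda / SageMaker setup described above: the plain dict stands in for a real key-value store, and `CachedSummarizer` and `summarize_fn` are hypothetical names:

```python
import hashlib

class CachedSummarizer:
    def __init__(self, summarize_fn):
        self._summarize = summarize_fn  # the expensive call, e.g. model inference
        self._cache = {}                # stand-in for a real key-value store
        self.misses = 0

    def summarize(self, text):
        # Key on a hash of the content so identical text always hits the cache.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._summarize(text)
        return self._cache[key]
```

Keying on the extracted article text rather than the URL means the same article reached through different links is still only summarized once, at the cost of having to run extraction before the cache lookup.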
Glad you liked it. Let me know if you have any feedback/suggestions.
Yes, we do plan to launch an API soon.
Please message us here - https://tldrthis.com/contact and we will let you know when we launch it.
* It's intended in part as a commemoration on the 50-year anniversary of the sketch, but also to draw attention to the need for a more streamlined peer review process for grants in the health sciences.
* "So, put together a Monty Python fan with a creative scientific mind and an expert in gait analysis, and this paper is what you get," Butler told Ars. Or, as they wrote in their paper, "It really is the silliness of the sketch that resonates with us, and extreme silliness seems more relevant now than ever before in this increasingly Pythonesque world."
* First aired on September 15, 1970, on BBC One, the sketch opens with Cleese's character buying a newspaper on his way to work—which takes him a bit longer than usual since his walk "has become rather sillier recently." Waiting for him in his office is a gentleman named Mr. Putey (Michael Palin) seeking a grant from the Ministry to develop his own silly walk.
* (Note: the name is spelled "Pudey" in the paper but we're going with the Wiki spelling.) Mr. Putey demonstrates his silly walk-in-progress, but the Minister isn't immediately impressed.
Advanced
* One of the best-known sketches from Monty Python's Flying Circus features John Cleese as a bowler-hatted bureaucrat with the fictional Ministry of Silly Walks.
* Waiting for him in his office is a gentleman named Mr. Putey (Michael Palin) seeking a grant from the Ministry to develop his own silly walk.
* For their own gait analysis, Butler and Dominy studied both Mr. Putey's and the Minister's gait cycles in the video of the original 1970 televised sketch, as well as the Minister's gaits from a 1980 live stage performance in Los Angeles.
* Butler and Dominy found that the Minister's silly walk is much more variable than a normal human walk—6.7 times as much—while Mr. Putey's walk-in-progress is only 3.3 times more variable.
* The sketch might be satirizing bureaucratic inefficiency, but Cleese's Minister is essentially engaging in a hyper-streamlined version of the peer review process in his meeting with Mr. Putey that (the authors concluded) resulted in a fair assessment.
Yes, you are right. There is still room for improvement and we will keep trying to make it better.
The reason I added paid plans was to test whether people would be willing to pay for such a service so that I can spend more time adding more features to it and making it better.
If you Google "summarizer", you will find dozens of similar services for free. A couple of years ago I built one from scratch in about 2 hours, then I accidentally deleted it and rewrote it in 15 minutes. Rank each word based on how many times it appears in the text.
1. Split the text into words
2. Rank each word based on how many times it appears in the text. For example, a word that appears 10 times gets 10 points, and so on.
3. Rank sentences based on the sum of the scores of each word inside them.
4. Return the top N sentences by score (N is up to the user), in the order in which they appear in the text.
For extra fanciness, exclude the most common articles and prepositions and give 2 points to proper nouns.
Works surprisingly well.