Hacker News | Moosdijk's comments

Not even my bookmarks get trashed. It’s like a treasure trove.


What's up with the number of new accounts praising this project?


Seems like someone (OP or not) is testing how well they can use HN for free advertising.


I only see two green usernames. Have others been deleted already?


odd indeed


[flagged]


I've always wondered how the heck people get away with that. HN mods are lacking, allowing those sorts of projects to reach the top along with legit bot likes and comments. Craziness. It puts all the projects and posts worthy of eyes at the dead bottom.


Some people don't, but there's survivorship bias at play here. Whenever you suspect foul play, email the mods at hn@ycombinator.com; they're quite responsive.


Thank you for the info! Much appreciated fragmede :)


You're probably getting downvoted because there are local conventions against astroturfing/shillage/botting accusations described in https://news.ycombinator.com/newsguidelines.html

If you think there's something wrong with the post, email the mods at hn@ycombinator.com


and the account being 1 hour old (at this time)


If you’re interested in a more detailed explanation, give the podcast “Better Offline” a listen.


I wonder if AI will be a benefit or a detriment to this project.

On the one hand, there are going to be a lot more, potentially high-quality, audiobooks in its repository; on the other hand, it goes against the spirit of the project itself.


The speech data collected by this project has been used for more than a decade to build automatic speech recognition and text-to-speech synthesis systems (see LibriSpeech, LibriTTS, LJSpeech). It definitely has been a benefit to AI.


I think they are talking more about the impact of AI on Librivox, as in people running an ebook through an AI TTS tool and uploading it.

On one hand, a well-curated/edited AI recording might be great, but a lot of people will upload (or try to; I don't know their policies) AI slop (no proof-listening, no checking, just laziness).


I think that, for the purposes of creating high-quality Free audiobooks, the issues are essentially the same with human-generated recordings as with AI-generated ones. The recording quality and faithfulness to the original text (both in terms of “content” and the appropriate reading in terms of tone, expression of emotion, etc.) have to be verified. The problem is scale. There will be many more TTS-generated recordings uploaded than human-generated ones. Some automated filters (e.g., ASR WER, audio quality metrics) would be a great first step to discard bad-quality slop right away (though they might unfairly penalize real human accented speech).

Importantly, the recording should indicate whether it was human or AI generated.
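
To make the ASR WER idea above concrete, here is a minimal sketch of such a filter. It assumes you already have an ASR transcript of the uploaded recording as a string; the function names and the 0.15 threshold are hypothetical choices for illustration, not anything LibriVox actually uses.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the source text and an ASR
    transcript of the recording, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def passes_wer_filter(reference: str, hypothesis: str, max_wer: float = 0.15) -> bool:
    """Hypothetical gate: accept a recording only if its WER is at or
    below max_wer. The default threshold is an assumption, not a standard."""
    return word_error_rate(reference, hypothesis) <= max_wer
```

A filter like this only measures how well the ASR transcript matches the text, so a strongly accented human reader could score worse than clean TTS simply because the recognizer struggles with the accent. That is the unfair-penalty risk mentioned above, and it is a reason to route borderline scores to a human proof-listener rather than reject them outright.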


> Importantly, the recording should indicate whether it was human or AI generated.

This is all that's necessary. Sometimes I'm fine with mediocre TTS; sometimes I want an actual professional; librivox is somewhere in between, but should clearly specify whether I will be getting an amateur human or a robot.


I disagree, for the reasons stated by the person you replied to.

Historically, being told that a voice recording is AI-generated would be enough to tell you to expect a basic robotic TTS voice, but with advances in AI voice generation we're approaching the point where AI can sound as good as real humans - it's not yet at the point where it's easy to generate an audiobook as good as a professional reader, but that point will come in the not too distant future.

And equally on the other side, something being recorded by a human doesn't automatically mean it has the quality of a professionally-read audiobook. This is something LibriVox has always had to deal with, by gatekeeping which volunteer recordings to either give feedback requesting improvements to or to not use at all.

In some but not all cases, an amateur human reader can already be as good as a professional; that will soon be true for AI too. For both AI and humans it will remain the case that some efforts are not as good, but the line between them (for quality) isn't going to be whether or not they are AI - though I do agree that AI or not should also be labelled.


Certainly TTS has improved a lot thanks to modern AI, but it simply doesn't have the information to improve beyond sounding like a human reading words fluently. A professional audiobook reader modulates his tone to reflect narrative mood, chooses voices for the characters consistent with their natures, etc., and transformer models can't do those things.

For an example of a professional audiobook, check out Rob Inglis' version of The Lord of the Rings.


I agree with you about the current stage of things, which is why I said that we're approaching the point but not yet there for AI to be able to match professional readers.

But I disagree with you when you write "it simply doesn't have the information to improve beyond sounding like a human reading words fluently" - it has the same information when reading it as a human does, meaning that the best implementation would have to not only adapt tone to explicit instructions like "... she shouted", but also read between the lines / make subjective choices to suit the different characters.

AI is already capable of doing sentiment analysis on text, and text-to-speech models are getting better at simulating moods/emotions rather than just speaking flatly. I don't think we're many years away, if that, from those two sides being paired together in a way that produces the sort of quality output we're talking about for the first time without human involvement. Add to that the fact that AI can train on the many good examples of humans reading things: models may get to the point of emulating not just the core accent but also how each accent should adapt to the meaning of the text, and arrive at a great solution without even needing to go through the steps of analysing what the text means in order to know how to modify the voice being generated.


You're more optimistic about this stuff than I am, but I think I get your perspective. We have decent sentiment analysis, fluent text generation, and real-sounding TTS, so combining them will yield a pretty good reading. I agree that you're probably right when it comes to newspaper columns and magazine articles, but that's not on the level of a good audiobook.

To take an example, here's an iconic line from the Fellowship of the Ring:

> The wizard swayed on the bridge, stepped back a pace, and then again stood still. ‘You cannot pass!’ he said.

If you think that is a command, you should shout it like Ian McKellen in the movie. If you think it's a statement based on superior knowledge (see https://acoup.blog/2025/04/25/collections-how-gandalf-proved...), you should probably state it with certainty and fatigue. And if you're making a movie with a ton of crazy special effects and swelling music, you should probably make whatever choice goes best in that context.

Even if a model could make some consistent choice there, I wouldn't be all that interested, because the reader conveying their interpretation of the character to the listener is what matters. Sure, it might get enough Spotify plays to make some money, but it's not art.


I get very annoyed at the AI voice-overs in YouTube shorts and videos (which are showing up more and more lately).

I just close the tab when I realize it is AI. Not sure how long I can do this.


Hopefully long enough till the fraction of AI voices you recognize as AI drops down to less than 10% so you don't get frustrated that often.


I imagine there are various disabilities where audio readings greatly simplify people's lives. Those people are probably appreciative of anything accurate, regardless of whether it's humans talking or not.


I'd like to think that in the future this will be possible, but at present I believe there are still too many uncanny valley problems for me to regard any TTS generated audio books as high quality. I can sometimes tolerate listening to articles or technical essays done this way, but quality audio book narrators often do consistent and distinct character voices, understand complex emotional states, and are capable of reacting to contextual clues and subtext. This seems like a pretty high bar for current TTS models.

Something like NotebookLM seems shockingly good at first, and gives me hope that eventually we'll have machines that are nearly as good as humans at this; but after listening to it for an hour or so the novelty wore off and the artifice of it now seems galling and distracting.


>quality audio book narrators often do consistent and distinct character voices, understand complex emotional states, and are capable of reacting to contextual clues and subtext.

The best experience I've ever had with audio books is John le Carré reading his early novels (not in public domain). He uses a different voice for each character and they are SO pulsing with life it's breathtaking.


Thanks for the tip! I love well narrated audiobooks, and I've been meaning to get into one of le Carré's books for a while. I see he narrates the "John le Carré Value Collection" on Audible which has Tailor of Panama, Our Game, and Night Manager for a single credit. Is that what you're referring to?

Along those lines, there is a great 2007 unabridged audiobook[0] of Frank Herbert's Dune that is read by Simon Vance for narration, but other characters are dramatically performed by other voice actors. It's excellent, but sadly a tad bit uneven and inconsistent in production. It's like they got 3/4ths of the way through the project and some of the original voice actors couldn't complete the project and Vance had to pick up the slack. Regardless, it's still one of my favorite audiobooks.

[0]: https://libro.fm/audiobooks/9781427201447-dune


My favorites are his first books read by him:

"Call For The Dead" and "A Small Town in Germany"

Listen here:

https://youtu.be/e1lmpG3kCDg

https://youtu.be/30QOqAcY4bY

https://youtu.be/0Ik9Gv9s0TQ

https://youtu.be/q79SspzdpLA

https://youtu.be/i3UnPBMouwU

FWIW The three novels in the "Value Collection" are abridged:

https://www.amazon.com/John-le-Carr-Value-Collection-audiobo...


Well, you can safely assume that everything in Librivox was used to train the AI. So, "benefit" or "detriment"... you make the call.


Read the article and find out


It loads in about 5 seconds on an iPhone 12 using Safari.

It also pans and zooms swiftly.


Same, right up until I zoomed in and waited for Safari to produce a higher resolution render.

Partially zoomed in was fine, but zooming to maximum fidelity resulted in the tab crashing (it was completely responsive until the crash). Looks like Safari does some pretty smart progressive rendering, but forcing it to render the image at full resolution (by zooming in) causes the render to get OOMed or similar.


I remember that years ago (mobile) Safari would aggressively use GPU layers and crash if you ran out of GPU memory. Maybe that's still happening?

Preview on a Mac handles the file fine.


How strange, took at least 30s to load on my iPhone 12 Pro Max with Safari but it was smooth to pan and zoom after. Which is way better than my 16 core 64GB RAM Windows machine where both Chrome and Edge gave up very quickly, with a "broken thumbnail" icon.


Probably because they're based on the same engine.


The strangeness was that 2 iPhones from the same generation would exhibit such different performance behaviors, and in parallel the irony that a desktop browser (engine irrelevant) on a device with cutting edge performance can't do what a phone does.


With which money, Sam?


Just shave off a few billion from that 7 trillion dollar plan. What's the big deal?


Someone else's.


Many are much older than that.


It’s the Trader Joe's marketing team.

