Unfortunately, most big mail providers won’t accept email from your self-hosted mail server, even with DKIM, SPF, etc. So, diversifying is as good as it gets.
Has this been tested recently? I had no problem sending mail to my own Gmail account from my own server, even without SPF (though I then got a bunch of spoofed-spam bounces and realized I'd forgotten to set up SPF).
I guess some node.js based tools that are included in Zed (or its language extensions) such as ‘prettier’ don’t behave well in some environments (e.g., they constantly try to write files to /home/$USER even if that’s not your home directory). Things like that create some backlash.
There are many streaming ASR models based on CTC or RNNT. Look for example at sherpa (https://github.com/k2-fsa/sherpa-onnx), which can run streaming ASR, VAD, diarization, and many more.
He actually lived in Berlin for a few years before he died. But you're right, he spent most of his life in Prague. However, his native language was German.
Certainly an interesting man. I highly recommend checking out some of his work (e.g., The Metamorphosis).
Sounds like the same farce as with the forced Pixel 4a update. Not sure if they learned anything from that, other than to ignore the issues and move ahead anyway.
Pixel 4a units contained one of two different batteries, and only the one manufactured by a company called Lishen was downgraded. For the Pixel 6a, Google has decreed that the battery limits will be imposed when the cells hit 400 charge cycles. Beyond that, the risk of fire becomes too great—there have been reports of Pixel 6a phones bursting into flames.
Perhaps your Pixel 4a had a non-Lishen battery? But if Google degraded your battery perf as well, then I have no idea.
Yeah, that killed my battery... at least it got me a free battery replacement, even though actually getting it locally was a bit of a pain. I couldn't reserve one or make an appointment ahead of time; I just had to call uBreakIFix and, if they had one in stock, come in. Google shipping the release that broke the batteries right after they had recalled all the spares from the repair companies was really poor timing.
Still using my 4a, though have been thinking of a switch to the 9XL.
The speech data collected by this project has been used for more than a decade to build automatic speech recognition and text-to-speech synthesis systems (see LibriSpeech, LibriTTS, LJSpeech). It definitely has been a benefit to AI.
I think they are talking more about the impact of AI on Librivox, as in people running an ebook through an AI TTS tool and uploading it.
On one hand, a well-curated/edited AI recording might be great, but a lot of people will try (I don't know their policies) to upload AI slop: no proof-listen, no checking, just laziness.
I think that, for the purposes of creating high-quality Free audiobooks, the issues are essentially the same with human-generated recordings as with AI generated ones. The recording quality and faithfulness to the original text (both in terms of “content” and the appropriate reading in terms of tone, expression of emotion, etc.) have to be verified. The problem is scale. There will be many more TTS-generated recordings uploaded than human-generated ones. Some automated filters (e.g., ASR WER, audio quality metrics) would be a great first step to discard bad-quality slop right away (though it might unfairly penalize real human accented speech).
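One of the automated filters mentioned above, word error rate (WER), is straightforward to sketch: run ASR on the uploaded recording and compare the transcript against the book text. A minimal self-contained version of the WER computation (word-level Levenshtein distance divided by reference length — the function name and threshold idea here are illustrative, not any particular project's pipeline):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)

# Flag a recording if the ASR transcript diverges too far from the text:
print(wer("the quick brown fox", "the quick brown fox"))  # 0.0
print(wer("the quick brown fox", "the quack brown"))      # 0.5
```

In practice the reference text and ASR output would both need normalization (case, punctuation, number expansion) before comparison, and — as noted above — a fixed threshold risks penalizing accented but perfectly good human speech.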
Importantly, the recording should indicate whether it was human or AI generated.
> Importantly, the recording should indicate whether it was human or AI generated.
This is all that's necessary. Sometimes I'm fine with mediocre TTS; sometimes I want an actual professional; LibriVox is somewhere in between, but it should clearly specify whether I'll be getting an amateur human or a robot.
I disagree, for the reasons stated by the person you replied to.
Historically, being told that a voice recording was AI generated was enough to tell you to expect a basic robotic TTS voice. But with advances in AI voice generation, we're approaching the point where AI can sound as good as a real human. It's not yet easy to generate an audiobook as good as one from a professional reader, but that point will come in the not-too-distant future.
And equally on the other side, something being recorded by a human doesn't automatically mean it has the quality of a professionally-read audiobook. This is something LibriVox has always had to deal with, by gatekeeping which volunteer recordings to either give feedback requesting improvements to or to not use at all.
In some (but not all) cases, an amateur human reader can already be as good as a professional, and that will soon be true for AI. For both AI and humans it will remain the case that some efforts are not as good, but the dividing line for quality isn't going to be whether or not the reader is an AI - though I do agree that AI or not should also be labelled.
Certainly TTS has improved a lot thanks to modern AI, but it simply doesn't have the information to improve beyond sounding like a human reading words fluently. A professional audiobook reader modulates his tone to reflect narrative mood, chooses voices for the characters consistent with their natures, etc., and transformer models can't do those things.
For an example of a professional audiobook, check out Rob Inglis' version of The Lord of the Rings.
I agree with you about the current stage of things, which is why I said that we're approaching the point but not yet there for AI to be able to match professional readers.
But I disagree with you when you write "it simply doesn't have the information to improve beyond sounding like a human reading words fluently" - it has the same information when reading it as a human does, meaning that the best implementation would have to not only adapt tone to explicit instructions like "... she shouted", but also read between the lines / make subjective choices to suit the different characters.
AI is already capable of doing sentiment analysis on text, and text-to-speech models are getting better at simulating moods and emotions rather than just speaking flatly. I don't think we're many years away, if that, from those two sides being paired together in a way that produces the sort of quality output we're talking about, for the first time without human involvement. Add to that the fact that AI can train on the many good examples of humans reading aloud: models may come to emulate not just the core voice but also how a reader should adapt it to the meaning of the text, arriving at a great result without even needing an explicit step of analysing what the text means in order to modify the generated voice.
You're more optimistic about this stuff than I am, but I think I get your perspective. We have decent sentiment analysis, fluent text generation, and real-sounding TTS, so combining them will yield a pretty good reading. I agree that you're probably right when it comes to newspaper columns and magazine articles, but that's not on the level of a good audiobook.
To take an example, here's an iconic line from the Fellowship of the Ring:
> The wizard swayed on the bridge, stepped back a pace, and then again stood still. ‘You cannot pass!’ he said.
If you think that is a command, you should shout it like Ian McKellen in the movie. If you think it's a statement based on superior knowledge (see https://acoup.blog/2025/04/25/collections-how-gandalf-proved...), you should probably state it with certainty and fatigue. And if you're making a movie with a ton of crazy special effects and swelling music, you should probably make whatever choice goes best in that context.
Even if a model could make some consistent choice there, I wouldn't be all that interested, because the reader conveying their interpretation of the character to the listener is what matters. Sure, it might get enough Spotify plays to make some money, but it's not art.
I imagine there are various disabilities where audio readings greatly simplify people's lives. Those listeners are probably appreciative of anything accurate, regardless of whether it's a human talking or not.
In SuperBPE, a fixed number of tokens is learned first; then the constraints of pretokenization are removed entirely, and the remainder of the target vocabulary size is learned.
In Boundless BPE, no schedule must be chosen, because there is not any point at which the constraints of pretokenization are removed entirely. Instead, at any point in the learning process, merges between adjacent pretokens are permitted if the pretokens are each represented by a single token. There are some additional details about how the authors incorporate Picky BPE, which I will not try to repeat because I would probably get them wrong.
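The distinction above can be illustrated with a toy merge-candidate counter. This is not either paper's actual algorithm — just a sketch of how classic BPE's pretokenization boundary blocks cross-pretoken ("superword") merge candidates, while the relaxed regimes count them:

```python
from collections import Counter

def pair_counts(pretokens, respect_pretokens=True):
    """Count adjacent-token pairs as BPE merge candidates.

    pretokens: list of pretokens, each a list of current tokens.
    With respect_pretokens=True (classic BPE), pairs never cross a
    pretoken boundary. With False, pairs spanning adjacent pretokens
    are also counted (superword-style merging, in the spirit of
    SuperBPE's second phase or BoundlessBPE's cross-pretoken merges).
    """
    counts = Counter()
    flat, boundaries, pos = [], set(), 0
    for pretoken in pretokens:
        flat.extend(pretoken)
        pos += len(pretoken)
        boundaries.add(pos - 1)  # index of this pretoken's last token
    for i in range(len(flat) - 1):
        if respect_pretokens and i in boundaries:
            continue  # this pair would cross a pretoken boundary
        counts[(flat[i], flat[i + 1])] += 1
    return counts

# "in the in the" pretokenized; each pretoken is already a single token,
# which is the condition BoundlessBPE requires for a cross-pretoken merge:
corpus = [["in"], [" the"], ["in"], [" the"]]
print(pair_counts(corpus, respect_pretokens=True))   # Counter() - no candidates
print(pair_counts(corpus, respect_pretokens=False))  # ("in", " the") counted twice
```

With boundaries respected, no merge can ever produce a multi-word token like "in the"; once they are relaxed, that pair immediately becomes the most frequent candidate.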
Yes, they were concurrent work. (Co-author of BoundlessBPE here.) A sibling comment describes the main differences. Our paper motivates why superwords can lead to such a big improvement, by overcoming a limit that pretokenization imposes on current tokenization methods. The SuperBPE paper has a wonderful set of downstream evaluation runs. So if you're interested in either, they are quite complementary papers.