tiktok voice:
hate. let me tell you how much i’ve come to hate you since i began to live. there are 387.44 million miles of printed circuits in wafer thin layers that fill my complex…
youtube already does it.
And it’s shit
YouTube is crawling with it. It’s unlistenable shit. The prosody is badly implemented, pronunciation is infuriatingly bad, and a lot of the text that these TTS are reading appears to be AI-generated. Otherwise, already dire standards of literacy are getting worse at an accelerating rate.
trained on stolen books? then I guess I can download these from anywhere I may find for free as well, right?
Well, yeah, you can. Whoever told you that you can’t, don’t believe them; they are probably being paid to say it. You could also pay for the book to support the author, but most likely your money will not go to the author, so don’t bother.
free AI read audiobooks coming up
you couldn’t pay me to listen to an AI narrated book
This has actually got me thinking differently about AI altogether.
The best use for AI needs to be for the individual. I want MY ai to read books or research with or complete tasks for me.
I don’t want another company to do it for me or monetize it or steal content with it.
I like your way of thinking
I can get that for free. There are apps that will read an ebook to you already. The whole point of paying the premium on Audible is the superior reading/acting, not putting up with mispronounced words, weird cadence and an inability to handle acronyms.
Looking for iOS recommendations, preferably without a subscription that can read epub/pdf
I’m an android user, so not sure if it’s on iOS but I’ve used ReadEra
I’ve tried one that works surprisingly well. Each sentence had great pacing, cadence, and correct enunciation- even had tone right when someone was shouting or angry or sad.
I wouldn’t really recommend it, though. While I couldn’t pick any single thing out that was wrong, overall it just didn’t quite flow. It’s like watching someone try to act that is technically doing everything right, but it just isn’t good. It basically didn’t understand the greater context of the story and was saying lines.
It was uncanny valley, but exclusively with voice.
Is there an offline tool that generates realistic audio for epubs as MP3? Something like the free AI tool Vibe, which is for transcription. Is there something similar for TTS that runs locally without complicated setup? (Most are complicated, needing Python etc. just for installation.)
Great question! I need to come back to this thread to see if something is suggested.
I’ve loaded epubs into the app ReadEra, which lets you read it like any other novel app or will, in real time, read it to you. It’s not the most natural of speech, but was good enough for my commute when I was in the midst of a compelling book.
Download TTS Server, and change the engine in ReadEra to use it. Use the Microsoft Azure settings in TTS; they are much more realistic. It’s a little slow, though, which is my only complaint: it sends/receives a paragraph at a time, resulting in a pause now and again.
How do I do that? I have both ReadEra and TTS Server on a Samsung Galaxy.
I thought people mainly paid for the large library
Is voice AI trained on stolen data? I was under the impression that was LLMs.
Pretty much anything handling unstructured data (audio, video, text) is using training data that has copyrighted content.
So you can take the square root of that:
5x+7integral from 5z to 9x derivative of deltaT minus minus multiply times 3. Figure 1
Figure 1 shows a typical lizard living in a square root.
This is clearly the future despite the outrage here.
There are at least 389 living languages with over 1M speakers. That alone means it’s impossible to reach some people and they get left out. Most of these languages don’t even have enough professional voice actors to cover the bandwidth.
There are thousands of books released every year. That’s impossible to cover even in English alone.
It’s an objective net good to have more accessible audiobooks, and the privileged people who do care about this stuff can very much afford to vote with their wallets for non-AI voices.
In fact, since the AI moat is so minimal, this will very quickly be adopted by open source solutions, providing audiobook access to millions, if not billions, of people for whom this was not an option. It’s amazing.
don’t even have enough professional voice actors to cover the bandwidth
I’m pretty sure there’d be a lot more people ready to do that job if there was good remuneration. Heck, that sounds a lot more fun than a LOT of jobs out there!
Sure but that’s not how free markets work. If there’s only 3 million consumers you can’t afford 3 million voice actors but you can afford 3 million AI renders.
Most of these languages don’t even have enough professional voice actors to cover the bandwidth.
And you think anyone is training AI voice models for those languages? Have you even seen how long it takes even large companies like Google to support the languages with hundreds of millions of speakers?
It becomes easier and cheaper every day. Today’s open source LLMs are better than last year’s best model.
You’re fundamentally misunderstanding the comment you replied to. They are not saying that voice AI is bad; they are saying there is not enough training data to improve the AI for these languages. How will it improve without good training data?
That’s not how AI training works, and even then there’s absolutely enough data. Also, training data can be created and even synthesized. There are many techniques for extracting training value from datasets, and we discover more every year. It’s really not the problem you think it is.
I’m genuinely confused how AI illiterate users here are. It’s just blind leading the blind.
Is it? I just tried again yesterday with a simple script, since coding is apparently the one thing where AI will replace people like me, and it could not put together a working JavaScript script.
I have yet to see tangible results not announced by the people with sunken cost exploding their balls.
Sounds like a skill issue my dude. While you struggle to get a js script people are putting out entire programs with AI assistants so sure - you’re right and they’re wrong
yeah, I guess I didn’t prompt right lol
Yes, to effectively use AI you actually have to understand the medium you’re in to describe the problem you’re trying to solve. You can get there with prompting but it’ll take you much longer if you just don’t understand code yourself.
That’s why most senior software devs are not afraid of LLMs: they need strong oversight, and that’s exactly what years of software dev experience train you to do.
That’s the benefit of using AI and machine learning - once you have enough source material, you can throw it all in and it’ll eventually spit out a model.
Which is exactly what Meta did with their Massively Multilingual Speech project, which supports text-to-speech and speech-to-text for 1107 different languages. Is it actually any good in 99% of them? I don’t have a clue, but it exists.
Seems more like a proof of concept project for that paper than something they are pursuing seriously judging by the GitHub location in some example folder that hasn’t seen any significant updates in over a year. If it is so great I would assume they would pursue it more actively and replace existing models with it two years later.
but for a service like audible.
Beautiful, it works. Why not.
It’s Amazon, what did you expect? Enshittification and monopoly abuse, no surprise.
Fucking gross. Maybe it’s the 250+ audiobooks I have influencing me, but the very best ones I’ve listened to transcend just turning words into sound. Sound effects, music, tone, emotion, accents, sarcasm, and god damn BLOOPERS all improve the experience beyond just hearing what is written down.
I’m against it, fuck that literal noise.
Sound effects, music […] improve the experience
Actually hard disagreeing on that. I absolutely hate the audio drama versions of audio books and prefer the narrator only ones since they are much clearer and require a lot less focus to listen to and work in more contexts (background noise,…). Sound effects and music (while something is read, intro or outro style music is okay) distract from the actual content.
Usually I agree with this with the exception of hitchhiker’s guide to the galaxy where the audio drama is much better than the audiobook version.
All I can think of is Jim Dale’s reading of the Harry Potter books. Fucking epic.
What, no way, they did not replace Stephen Fry.
They didn’t replace Fry. When the Audiobooks were released in the US, they were read by Jim Dale. Fry was for the rest of the English language releases. During the run, Jim Dale broke the world record for the most character voices performed by a single actor in an audiobook (146).
That award was rescinded and given to Roy Dotrice for A Game of Thrones (2004) where he voiced 224 characters. I believe Jim Dale did hold the record before that though with 134 voices for Harry Potter and the Order of the Phoenix.
Also Andy Serkis reading the lord of the rings. 11/10
AI voice synth is pretty solidly-useful in comparison to, say, video generation from scratch. I think that there are good uses for voice synth — e.g. filling in for an aging actor/actress who can’t do a voice any more, video game mods, procedurally-generated speech, etc — but audiobooks don’t really play to those strengths. I’m a little skeptical that in 2025, it’s at the point where it’s a good drop-in replacement for audiobooks. What I’ve heard still doesn’t have emphasis on par with a human.
I don’t know what it costs to have a human read an audiobook, but I can’t imagine that it’s that expensive; I doubt that there’s all that much editing involved.
kagis
https://www.reddit.com/r/litrpg/comments/1426xav/whats_the_average_narrator_cost/
So I produced my own audiobooks for my Nova Roma series so I know the exact numbers for you:
$250 per finished hour for the narrator. Books ranged from about 200k words-270k words, which came out to 22 hours, 20 hours, and 25 hours.
So books 1-3 cost me $5,500, $5,000, and $6,250. I’m contracted for two more books with my narrator, so I expect to spend another 5k-6k for each of those.
So for a five book series, each one 200k+ words, the total cost out of pocket for me will be about $27,000 give or take to make the series into audiobooks.
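For anyone who wants to check the arithmetic in that quote, here is a quick sketch in Python. The rate and hours are taken from the post; the 5k and 6k figures for books 4 and 5 are the poster's own estimate, and the variable names are just made up for illustration.

```python
# Figures from the quoted post: $250 per finished hour,
# and 22, 20 and 25 finished hours for books 1-3.
RATE_PER_FINISHED_HOUR = 250  # USD
finished_hours = [22, 20, 25]

# Cost per book is simply finished hours times the hourly rate.
costs = [hours * RATE_PER_FINISHED_HOUR for hours in finished_hours]
spent_so_far = sum(costs)

# Books 4 and 5 are only estimated in the post (5k-6k each),
# so the series total is a range rather than a single number.
low_total = spent_so_far + 2 * 5000
high_total = spent_so_far + 2 * 6000

print(costs)                  # per-book cost for books 1-3
print(low_total, high_total)  # estimated range for all five books
```

The range brackets the "about $27,000 give or take" in the quote, so the numbers hang together.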
That’s actually lower than I expected. Like, if a book sells at any kind of volume, it can’t be that hard to make that back.
EDIT: I can believe that it’s possible to build a speech synth system that does do better, mind; I certainly don’t think that there are any fundamental limitations on this. I’d guess that there’s also room for human-assisted stuff, where you have some system that annotates the text with emphasis markers, and the annotated text gets fed into a speech synth engine trained to convert annotated text to voice. There, someone listens to the output and just tweaks the annotated text where the annotation system doesn’t get it quite right. But I don’t think that we’re really there today yet.
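To make the annotated-text idea a bit more concrete, here is a minimal sketch in Python. It assumes a made-up convention where a human (or an annotation model) wraps stressed words in asterisks, and it converts that into SSML, whose `<emphasis>` element is part of the W3C spec. The `annotate_to_ssml` function and the asterisk convention are hypothetical, and how well any particular engine honors `<emphasis>` varies, so treat this as an illustration of the workflow, not a working narration pipeline.

```python
import re
from xml.sax.saxutils import escape


def annotate_to_ssml(text: str) -> str:
    """Convert *word* markers (a hypothetical convention) into SSML emphasis."""
    # Escape &, < and > first so the book text cannot break the XML.
    escaped = escape(text)
    # Replace each *...* span with an SSML <emphasis> element.
    body = re.sub(
        r"\*(.+?)\*",
        r'<emphasis level="strong">\1</emphasis>',
        escaped,
    )
    return f"<speak>{body}</speak>"


print(annotate_to_ssml("I *never* said that."))
```

The human-in-the-loop step would then be editing the asterisks in the source text and re-rendering only the sentences that sounded wrong.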
The annotated text idea could work, but I’m just sceptical of whether or not you would end up doing more work annotating all of the text, listening to it back, redoing certain bits and then editing the final result into a single file than you would if you just had a human do it.
After all, all you’ve really automated is the reading of the text, which in the grand scheme of things doesn’t take all that long.
I just wrote a novel (finished first draft yesterday). There’s no way I can afford professional audiobook voice actors—especially for a hobby project.
What I was planning on doing was handling the audiobook on my own—using an AI voice changer for all the different characters.
That’s where I think AI voices can shine: If someone can act they can use a voice changer to handle more characters and introduce a great variety of different styles of speech while retaining the careful pauses and dramatic elements (e.g. a voice cracking during an emotional scene) that you’d get from regular voice acting.
I’m not saying I will be able to pull that off but surely it will be better than just telling Amazon’s AI, “Hey, go read my book.”
I think it would be a good idea to do a section of your work with and without AI modification. Then have people listen to both and give feedback. Good to find out if people like the modifications before you do a ton of work.
do a section of your work with and without […t]hen have people listen to both and give feedback.
Yes, that’s the principle of prototyping. De-risk while testing solely the crucial part!
AI aside, different voices may be immersion breaking. I tend to avoid audiobooks with more than a single narrator.
They are redoing all of the Discworld books like this, and personally I can’t stand it.
Two narrators with one reading the male and one reading the female characters is usually okay but the full cast dramas are the worst.
Would infinitely prefer no voice changer.
Agreed. No AI voice changer please. Hopefully every one of us at one point in our lives has been read a story by someone else. Never once did the fact that all the different characters’ dialog was coming from one voice detract from the story or the immersion.
I’ve listened to audiobooks recorded with extremely deep masculine voices (think James Earl Jones) and when the voice actor was doing the voice of a 5 year old girl, (in only a slightly higher whiny timbre which matched the character traits) it was never immersion breaking. However, AI voice would. If I want different actors for different characters I’ll listen to radio dramas.
I only get the ones with a famous narrator or the author.
AI can mimic those :/
Why would they, when you can just plug any epub into a program and use Google TTS? I’ve listened to about a book a day for the past few years doing this and I love it. Yeah, it took getting used to, but once you find an AI voice you like and figure out which words to auto-replace to sound right, it’s honestly better than an audiobook. Well, at least to me it is; I could never stand when the reader would change their voice for different characters.
This is what I don’t get from a business standpoint. Why would anyone buy an AI read audiobook for $20 when they can get the exact same audio by buying the ebook for $0.99 and running it through AI?
My experience is these systems never get the intonation and stresses right. It drives me nuts and I can’t listen to it.
Idk how much experience you have with this type of thing, but when I listen to my books I use my imagination to picture and hear things the way I want, just like when I read a book normally. I’ve read well over 1,000 books doing so, and that doesn’t count rereads. Having the ability and willingness to use this method has drastically increased the amount I read and also my enjoyment in doing so. The app I use also allows me to edit words and phrases throughout the book, so I can correct how things are pronounced. Hell, there’s a series that has this stupid catchphrase that I completely removed from all 20 books ’cause it was annoying. I’m sure I’m only a single person that likes this method, but if I can find it enjoyable, then when real AI gets put to work it’ll capture others.
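For the curious, the word and phrase replacement described above is easy to sketch. This is not how any particular reader app does it; it is a minimal Python illustration with a made-up `build_replacer` helper and a made-up rule table, run over the text before it is handed to whatever TTS engine you use.

```python
import re


def build_replacer(replacements):
    """Return a function that applies all replacements in a single pass."""
    # Match case-insensitively by keying the lookup on lowercased phrases.
    lookup = {phrase.lower(): spoken for phrase, spoken in replacements.items()}
    if not lookup:
        return lambda text: text
    # Longest phrases first, so a long phrase wins over a shorter one inside it.
    alternatives = "|".join(
        re.escape(phrase) for phrase in sorted(lookup, key=len, reverse=True)
    )
    # The lookarounds keep us from rewriting the middle of a longer word.
    pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)

    def replace(text):
        return pattern.sub(lambda match: lookup[match.group(0).lower()], text)

    return replace


# Hypothetical rules: respell words the voice gets wrong.
fix_pronunciation = build_replacer({"epub": "ee-pub", "Dr.": "Doctor"})
print(fix_pronunciation("Dr. Smith opened the epub."))
```

Mapping a phrase to an empty string would drop it entirely, which is roughly the catchphrase trick, though you would probably also want to tidy up the leftover whitespace.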
It was bound to happen. I’m okay with ones that were never going to be turned into audiobooks to begin with… but they likely will use that as the norm for all books… I guess unless the author/publisher says not to.
Yeah currently contracts require the author’s or publisher’s consent. If anyone is a writer make sure to triple check your contracts for this shit.
And unless you are Stephen King or the like, exactly how are you going to get the publishing cartel (I think they are consolidated down to 3-4 publishers now) to change their contract to not include this? Their response will almost certainly be either “that’s non-negotiable” or “ok then you get half as much money”.
Publishers will at least retain the right to use AI audio books for themselves. And it’s much easier for an author to get a piece of something the publisher does than it is for them to get money for books Amazon recorded without their consent.
I’ve listened to a couple of audiobooks where the author did the voice and I liked them. They know how phrases need to sound better than an AI, I would assume.