• @[email protected]
    62 days ago

    tiktok voice:

    hate. let me tell you how much i’ve come to hate you since i began to live. there are 387.44 million miles of printed circuits in wafer thin layers that fill my complex…

    • @[email protected]
      152 days ago

      YouTube is crawling with it. It’s unlistenable shit. The prosody is badly implemented, pronunciation is infuriatingly bad, and a lot of the text that these TTS are reading appears to be AI-generated. Otherwise, already dire standards of literacy are getting worse at an accelerating rate.

  • @[email protected]
    322 days ago

    trained on stolen books? then I guess I can download these from anywhere I may find for free as well, right?

  • @[email protected]
    692 days ago

    I can get that for free. There are apps that will read an ebook to you already. The whole point of paying the premium on Audible is the superior reading/acting, not putting up with mispronounced words, weird cadence and an inability to handle acronyms.

    • TryingSomethingNew
      12 days ago

      Looking for iOS recommendations, preferably without a subscription that can read epub/pdf

    • @[email protected]
      32 days ago

      I’ve tried one that works surprisingly well. Each sentence had great pacing, cadence, and correct enunciation. It even got the tone right when someone was shouting or angry or sad.

      I wouldn’t really recommend it, though. While I couldn’t pick out any single thing that was wrong, overall it just didn’t quite flow. It’s like watching an actor who is technically doing everything right but just isn’t good. It basically didn’t understand the greater context of the story and was just saying lines.

      It was uncanny valley, but exclusively with voice.

    • Lit
      1
      edit-2
      2 days ago

      Is there an offline tool that generates realistic audio for epubs as MP3? Something like Vibe, the free AI tool for transcription. Is there something similar for TTS that runs locally without a complicated setup (most are complicated and need Python etc. just for installation)?

      • @[email protected]
        1
        edit-2
        2 days ago

        I’ve loaded epubs into the app ReadEra, which lets you read it like any other novel app or will, in real time, read it to you. It’s not the most natural of speech, but was good enough for my commute when I was in the midst of a compelling book.

        • @[email protected]
          12 days ago

          Download TTS Server, and change the engine in ReadEra to use it. Use the Microsoft Azure settings in TTS Server; they are much more realistic. My only complaint is that it’s a little slow, since it sends/receives a paragraph at a time, resulting in a pause now and again.

  • ssillyssadass
    102 days ago

    Is voice AI trained on stolen data? I was under the impression that was LLMs.

    • @[email protected]
      32 days ago

      Pretty much anything handling unstructured data (audio, video, text) is using training data that includes copyrighted content.

  • @[email protected]
    12 days ago

    So you can take the square root of that:

    5x+7integral from 5z to 9x derivative of deltaT minus minus multiply times 3. Figure 1

    Figure 1 shows a typical lizard living in a square root.

  • Dr. Moose
    20
    edit-2
    2 days ago

    This is clearly the future despite the outrage here.

    There are at least 389 living languages with over 1M speakers. That alone means it’s impossible to reach some people and they get left out. Most of these languages don’t even have enough professional voice actors to cover the bandwidth.

    There are thousands of books released every year. That’s impossible to cover even in English alone.

    It’s an objective net good to have more accessible audiobooks, and the privileged people who do care about this stuff can very much afford to vote with their wallets for non-AI voices.

    In fact, since the AI moat is so minimal, this will very quickly be adopted by open source solutions, providing audiobook access to millions if not billions of people for whom this was not an option. It’s amazing.

    • @[email protected]
      22 days ago

      don’t even have enough professional voice actors to cover the bandwidth

      I’m pretty sure there’d be a lot more people ready to do that job if there was good remuneration. Heck, that sounds a lot more fun than a LOT of jobs out there!

      • Dr. Moose
        12 days ago

        Sure but that’s not how free markets work. If there’s only 3 million consumers you can’t afford 3 million voice actors but you can afford 3 million AI renders.

    • @[email protected]
      132 days ago

      Most of these languages don’t even have enough professional voice actors to cover the bandwidth.

      And you think anyone is training AI voice models for those languages? Have you even seen how long it takes even large companies like Google to support the languages with hundreds of millions of speakers?

      • Dr. Moose
        32 days ago

        It becomes easier and cheaper every day. Today’s open source LLMs are better than last year’s best model.

        • @[email protected]
          62 days ago

          You’re fundamentally misunderstanding the comment you replied to, they are not saying that voice AI are bad, they are saying there is not enough training data to improve the AI for these languages. How will it improve without good training data?

          • Dr. Moose
            12 days ago

            That’s not how AI training works, and even then there’s absolutely enough data. Also, training data can be created and even synthesized. Every year we discover new techniques to extract more training value from datasets. It’s really not the problem you think it is.

            I’m genuinely confused by how AI-illiterate users here are. It’s just the blind leading the blind.

        • @[email protected]
          42 days ago

          Is it? I just tried again yesterday with a simple script, since coding is apparently the one thing where AI will replace people like me, and it could not put together a working JavaScript script.

          I have yet to see tangible results not announced by the people with sunken cost exploding their balls.

          • Dr. Moose
            1
            edit-2
            2 days ago

            Sounds like a skill issue my dude. While you struggle to get a js script people are putting out entire programs with AI assistants so sure - you’re right and they’re wrong

              • Dr. Moose
                12 days ago

                Yes, to effectively use AI you actually have to understand the medium you’re in to describe the problem you’re trying to solve. You can get there with prompting but it’ll take you much longer if you just don’t understand code yourself.

                That’s why most senior software devs are not afraid of LLMs: they need strong oversight, and that’s exactly what years of software dev experience train you to do.

      • JohnEdwa
        2
        edit-2
        2 days ago

        That’s the benefit of using AI and machine learning - once you have enough source material, you can throw it all in and it’ll eventually spit out a model.
        Which is exactly what Meta did with their Massively Multilingual Speech project which supports text-to-speech and speech-to-text for 1107 different languages.

        Is it actually any good in 99% of them, I don’t have a clue, but it exists.

        • @[email protected]
          12 days ago

          Seems more like a proof of concept project for that paper than something they are pursuing seriously judging by the GitHub location in some example folder that hasn’t seen any significant updates in over a year. If it is so great I would assume they would pursue it more actively and replace existing models with it two years later.

  • @[email protected]
    403 days ago

    Fucking gross. Maybe it’s the 250+ audiobooks I have influencing me, but the very best ones I’ve listened to transcend just turning words into sound. Sound effects, music, tone, emotion, accents, sarcasm, and god damn BLOOPERS all improve the experience beyond just hearing what is written down.

    I’m against it, fuck that literal noise.

    • @[email protected]
      83 days ago

      Sound effects, music […] improve the experience

      Actually hard disagreeing on that. I absolutely hate the audio drama versions of audio books and prefer the narrator only ones since they are much clearer and require a lot less focus to listen to and work in more contexts (background noise,…). Sound effects and music (while something is read, intro or outro style music is okay) distract from the actual content.

      • Echo Dot
        13 days ago

        Usually I agree with this, with the exception of The Hitchhiker’s Guide to the Galaxy, where the audio drama is much better than the audiobook version.

    • Jo Miran
      43 days ago

      All I can think of is Jim Dale’s reading of the Harry Potter books. Fucking epic.

      • Echo Dot
        23 days ago

        What, no way, they did not replace Stephen Fry.

        • Jo Miran
          12 days ago

          They didn’t replace Fry. When the audiobooks were released in the US, they were read by Jim Dale. Fry read the rest of the English-language releases. During the run, Jim Dale broke the world record for the most character voices performed by a single actor in an audiobook (146).

          • LordWarfire
            12 days ago

            That award was rescinded and given to Roy Dotrice for A Game of Thrones (2004) where he voiced 224 characters. I believe Jim Dale did hold the record before that though with 134 voices for Harry Potter and the Order of the Phoenix.

  • @[email protected]
    5
    edit-2
    3 days ago

    AI voice synth is pretty solidly-useful in comparison to, say, video generation from scratch. I think that there are good uses for voice synth — e.g. filling in for an aging actor/actress who can’t do a voice any more, video game mods, procedurally-generated speech, etc — but audiobooks don’t really play to those strengths. I’m a little skeptical that in 2025, it’s at the point where it’s a good drop-in replacement for audiobooks. What I’ve heard still doesn’t have emphasis on par with a human.

    I don’t know what it costs to have a human read an audiobook, but I can’t imagine that it’s that expensive; I doubt that there’s all that much editing involved.

    kagis

    https://www.reddit.com/r/litrpg/comments/1426xav/whats_the_average_narrator_cost/

    So I produced my own audiobooks for my Nova Roma series so I know the exact numbers for you:

    $250 per finished hour for the narrator. Books ranged from about 200k words-270k words, which came out to 22 hours, 20 hours, and 25 hours.

    So books 1-3 cost me $5,500, $5,000, and $6,250. I’m contracted for two more books with my narrator, so I expect to spend another 5k-6k for each of those.

    So for a five book series, each one 200k+ words, the total cost out of pocket for me will be about $27,000 give or take to make the series into audiobooks.
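
    For anyone who wants to check the arithmetic in that quote, here is a rough sketch. The rate and finished hours come straight from the quoted post; the variable names and the reading of "5k-6k each" as a $5,000 to $6,000 range for the two remaining books are my own assumptions.

```python
# Figures taken from the quoted post; names are illustrative only.
RATE_PER_FINISHED_HOUR = 250          # dollars per finished hour of narration
finished_hours = [22, 20, 25]         # books 1-3

# Cost of each finished book: hours * hourly rate.
costs = [hours * RATE_PER_FINISHED_HOUR for hours in finished_hours]
spent = sum(costs)

# Two more contracted books, assumed to cost between $5,000 and $6,000 each.
total_low = spent + 2 * 5000
total_high = spent + 2 * 6000

print(costs)                  # per-book cost for books 1-3
print(spent)                  # total already spent
print(total_low, total_high)  # projected range for the five-book series
```

    The projected range brackets the "about $27,000 give or take" figure in the quote.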

    That’s actually lower than I expected. Like, if a book sells at any kind of volume, it can’t be that hard to make that back.

    EDIT: I can believe that it’s possible to build a speech synth system that does do better, mind; I certainly don’t think that there are any fundamental limitations on this. I’d guess that there’s also room for human-assisted stuff, where you have some system that annotates the text with emphasis markers, and the annotated text gets fed into a speech synth engine trained to convert annotated text to voice. There, someone listens to the output and just tweaks the annotated text where the annotation system doesn’t get it quite right. But I don’t think that we’re really there today yet.
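
    To make the annotated-text idea a bit more concrete, here is a minimal sketch. It assumes a made-up convention where a human (or an annotation system) wraps stressed words in asterisks, and it converts those markers to SSML `<emphasis>` elements. The asterisk convention and the function name are hypothetical, and whether a particular speech engine actually honors `<emphasis>` is something you would have to check.

```python
import re
from xml.sax.saxutils import escape


def annotate_to_ssml(text: str) -> str:
    """Turn *word* style emphasis markers (a hypothetical convention)
    into SSML <emphasis> elements wrapped in a <speak> root."""
    # Escape &, < and > first so the source text cannot break the XML.
    escaped = escape(text)
    # Non-greedy match so each *...* pair becomes its own emphasis span.
    body = re.sub(
        r"\*(.+?)\*",
        r'<emphasis level="strong">\1</emphasis>',
        escaped,
    )
    return f"<speak>{body}</speak>"


print(annotate_to_ssml("I *never* said that."))
```

    The human pass described above would then just mean moving or adding asterisks in the source text and regenerating the audio.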

    • Echo Dot
      13 days ago

      The annotated text idea could work, but I’m sceptical about whether you would end up doing more work annotating all of the text, listening back to it, redoing certain bits and then editing the final result into a single file than you would if you just had a human do it.

      After all, all you’ve really automated is the reading of the text, which in the grand scheme of things doesn’t take all that long.

  • Riskable
    173 days ago

    I just wrote a novel (finished first draft yesterday). There’s no way I can afford professional audiobook voice actors—especially for a hobby project.

    What I was planning on doing was handling the audiobook on my own—using an AI voice changer for all the different characters.

    That’s where I think AI voices can shine: If someone can act they can use a voice changer to handle more characters and introduce a great variety of different styles of speech while retaining the careful pauses and dramatic elements (e.g. a voice cracking during an emotional scene) that you’d get from regular voice acting.

    I’m not saying I will be able to pull that off but surely it will be better than just telling Amazon’s AI, “Hey, go read my book.”

    • @[email protected]
      123 days ago

      I think it would be a good idea to do a section of your work with and without AI modification. Then have people listen to both and give feedback. It’s good to find out whether people like the modifications before you do a ton of work.

      • @[email protected]
        22 days ago

        do a section of your work with and without […t]hen have people listen to both and give feedback.

        Yes, that’s the principle of prototyping. De-risk while testing solely the crucial part!

    • @[email protected]
      63 days ago

      AI aside, different voices may be immersion breaking. I tend to avoid audiobooks with more than a single narrator.

      • Echo Dot
        43 days ago

        They are redoing all of the Discworld books like this, and personally I can’t stand it.

      • @[email protected]
        33 days ago

        Two narrators with one reading the male and one reading the female characters is usually okay but the full cast dramas are the worst.

      • partial_accumen
        93 days ago

        Agreed. No AI voice changer please. Hopefully every one of us at one point in our lives has been read a story by someone else. Never once did the fact that all the different characters’ dialog was coming from one voice detract from the story or the immersion.

        I’ve listened to audiobooks recorded with extremely deep masculine voices (think James Earl Jones), and when the voice actor was doing the voice of a 5-year-old girl (in only a slightly higher, whiny timbre that matched the character traits) it was never immersion breaking. An AI voice would be, however. If I want different actors for different characters I’ll listen to radio dramas.

  • @[email protected]
    43 days ago

    Why would they, when you can just plug any epub into a program and use Google TTS? I’ve listened to about a book a day for the past few years doing this and I love it. Yeah, it took some getting used to, but once you find an AI voice you like and figure out which words to auto-replace so they sound right, it’s honestly better than an audiobook. Well, at least to me it is; I could never stand it when the reader would change their voice for different characters.

    • @[email protected]
      22 days ago

      This is what I don’t get from a business standpoint. Why would anyone buy an AI read audiobook for $20 when they can get the exact same audio by buying the ebook for $0.99 and running it through AI?

    • Echo Dot
      63 days ago

      My experience is these systems never get the intonation and stresses right. It drives me nuts and I can’t listen to it.

      • @[email protected]
        13 days ago

        Idk how much experience you have with this type of thing, but when I listen to my books I use my imagination to picture and hear things the way I want, just like when I read a book normally. I’ve read well over 1,000 books doing so, not counting rereads, and having the ability and willingness to use this method has drastically increased both the amount I read and my enjoyment doing so. The app I use also allows me to edit words and phrases throughout the book so I can correct how things are pronounced. Hell, there’s a series that has this stupid catchphrase that I completely removed from all 20 books because it was annoying. I’m sure I’m only one person who likes this method, but if I can find it enjoyable, then when real AI gets put to work it’ll capture others.
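
        The word and phrase editing described above can be approximated with a small text pre-pass before the book goes to the TTS engine. This is only a sketch of the idea, not how any particular app does it; the function name, the example replacement table and the catchphrase are all made up for illustration.

```python
import re


def apply_replacements(text: str, replacements: dict[str, str]) -> str:
    """Replace whole words/phrases with a spoken form; an empty string
    removes the phrase entirely. Matching is case-insensitive."""
    for phrase, spoken in replacements.items():
        # Lookarounds instead of \b so phrases ending in punctuation still match.
        pattern = rf"(?<!\w){re.escape(phrase)}(?!\w)"
        # A function replacement avoids backslash surprises in `spoken`.
        text = re.sub(pattern, lambda _m, s=spoken: s, text, flags=re.IGNORECASE)
    # Tidy up doubled spaces left behind by removed phrases.
    return re.sub(r" {2,}", " ", text).strip()


# Hypothetical table: fix pronunciation, spell out an acronym, drop a catchphrase.
table = {"CEO": "C E O", "epub": "e-pub", "Well met!": ""}
print(apply_replacements("Well met! The CEO read the epub.", table))
```

        Keeping the table per book or per series makes it easy to reuse the same fixes across every volume.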

  • @[email protected]
    403 days ago

    It was bound to happen. I’m okay with ones that were never going to be turned into audiobooks to begin with… but they likely will use that as the norm for all books… I guess unless the author/publisher says not to.

    • @[email protected]
      193 days ago

      Yeah currently contracts require the author’s or publisher’s consent. If anyone is a writer make sure to triple check your contracts for this shit.

      • @[email protected]
        2
        edit-2
        2 days ago

        And unless you are Stephen King or the like, exactly how are you going to get the publishing cartel (I think they’ve consolidated down to 3-4 publishers now) to change their contract to not include this? Their response will almost certainly be either “that’s non-negotiable” or “ok, then you get half as much money”.

        • @[email protected]
          12 days ago

          Publishers will at least retain the right to use AI audio books for themselves. And it’s much easier for an author to get a piece of something the publisher does than it is for them to get money for books Amazon recorded without their consent.

    • dindonmasker
      103 days ago

      I’ve listened to a couple of audiobooks where the author did the voice and I liked them. They know how phrases need to sound better than an AI, I would assume.