• @[email protected]
    30 points · 1 year ago

    For everyone predicting how this will corrupt models…

    All the LLMs are already trained on Reddit’s data, at least from before 2015 (which is when a dump of the entire site was compiled for research).

    This is only going to be adding recent Reddit data.

    • @[email protected]
      15 points · 1 year ago

      This is only going to be adding recent Reddit data.

      A growing amount of which I would wager is already the product of LLMs trying to simulate actual content while selling something. It’s going to corrupt itself over time unless they figure out how to sanitize the input from other LLM content.

      • @[email protected]
        7 points · edited · 1 year ago

        It’s not really going to corrupt itself. There is a potential issue of model collapse when training on only synthetic data, but the same research on model collapse found that a mix of organic and synthetic data performed better than either alone. Additionally, that research was, for cost reasons, using weaker models than what’s typically in use today, and there’s been separate research showing you can significantly enhance models using synthetic data from SotA models.

        The actual impact on future models will be minimal, and at least a bit of a mixture is probably even a good thing for future training, given the research to date.
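
        To make “a mix of organic and synthetic data” concrete, here’s a rough sketch of what that looks like in practice. The file names, field layout, and 30% ratio are purely illustrative assumptions, not the setup used in the research mentioned above.

            import json
            import random

            def load_jsonl(path):
                # One JSON object per line; each is assumed to carry a "text" field.
                with open(path) as f:
                    return [json.loads(line) for line in f]

            def mixed_training_set(organic_path, synthetic_path, synthetic_fraction=0.3, seed=0):
                # Build a shuffled training set that is mostly organic data,
                # topped up with a capped share of synthetic (model-generated) data.
                rng = random.Random(seed)
                organic = load_jsonl(organic_path)
                synthetic = load_jsonl(synthetic_path)
                # Cap synthetic examples so they make up at most `synthetic_fraction`
                # of the final mix, rather than training on synthetic data alone.
                n_synth = min(len(synthetic),
                              int(len(organic) * synthetic_fraction / (1 - synthetic_fraction)))
                mix = organic + rng.sample(synthetic, n_synth)
                rng.shuffle(mix)
                return mix

            # Hypothetical corpora: roughly 70% organic text, 30% model output.
            train = mixed_training_set("organic.jsonl", "synthetic.jsonl", synthetic_fraction=0.3)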

  • @[email protected]
    42 points · 1 year ago

    Eventually every chat gpt request will just be answered with, “I too choose this guy’s dead wife.”

  • A Wild Mimic appears!
    24 points · 1 year ago

    I’m waiting for the first time their LLM gives advice on how to make human leather hats and the advantages of surgically removing the legs of your slaves after slurping up the RimWorld subreddits lol

      • @[email protected]
        13 points · edited · 1 year ago

        Reviews on any product are completely worthless now. I’ve been struggling to find good earbuds for all-weather running, and a decent number of replies have literal brand slogans in them.

        You can still sort of tell which recommendations are honest, but even that’s heading out the door.

        • @[email protected]
          9 points · 1 year ago

          Not trying to shill, but I’ve had my Jaybird Vistas for 8 years now. However, earbuds are highly personal in terms of fit.

  • @[email protected]
    59 points · 1 year ago

    I ALSO CHOOSE THIS MANS LLM

    HOLD MY ALGORITHM IM GOING IN

    INSTRUCTIONS UNCLEAR GOT MY MODEL STUCK IN A CEILING FAN

    WE DID IT REDDIT

    fuck.

  • Buelldozer
    7 points · 1 year ago

    Meh, it’ll be counterbalanced by the same AI training itself for free on Lemmy posts.

  • Binthinkin
    2 points · 1 year ago

    I think Code Miko already did this and the result was a traumatized AI.

  • @[email protected]
    1 point · 1 year ago

    Is there still time for me to ask them for all the info they have on me, with the EULA or whatever it is, and have them remove every one of my comments?

    My creative insults and mental instability are my own, Google ain’t having them! (Although they already do, probably, along with my fingerprints, facial features, voice, fetishes, etc.)

  • @[email protected]
    8 points · 1 year ago

    “Hey Gemini, rank the drawer, coconut, botfly girl and the swamps of Dagobah by how likely they are to induce PTSD, ascending.”

  • Sabata11792
    14 points · 1 year ago

    Great, our AI overlords are going to know I’m horny and depressed, and that I solve both with anime girls.

    • @[email protected]
      3 points · 1 year ago

      YouTube already knows that (at least for me). I need to keep resetting it because it eggs on my most unhealthy attributes.

      • Sabata11792
        3 points · 1 year ago

        It’s plainly visible for me, honestly. Don’t have to go past the profile pic.

        • @[email protected]
          2 points · edited · 1 year ago

          I set that PFP and made my first Lemmy account when I was going through a rough patch. I think I will keep it, but will pick something else for other accounts.

          This account doesn’t have a PFP, do you mean the one on lemmy.world?

            • @[email protected]
              2 points · 1 year ago

              Oh, lol. It’s public information, and the two accounts run together in my head. I falsely assumed they do for others too.

  • @[email protected]
    30 points · 1 year ago

    Hilarious to think that an AI is going to be trained by a bunch of primitive Reddit karma bots.

  • @[email protected]
    45 points · 1 year ago

    They should train it on Lemmy. It’ll have an unhealthy obsession with Linux, guillotines and femboys by the end of the week.

    • RedFox
      2 points · 1 year ago

      Don’t forget:

      There’s my regular irritation with capitalism, and then there’s kicking it up to full Lemmy. Never go full Lemmy…

  • @[email protected]
    33 points · 1 year ago

    It’s going to drive the AI into madness, as it will be trained on bot posts written by itself in a never-ending loop of more and more incomprehensible text.

    It’s going to be like putting a sentence into Google Translate, converting it through 5 different languages and then back into the first, and getting complete gibberish.

    • @[email protected]
      26 points · 1 year ago

      AI actually has huge problems with this. If you feed AI-generated data into models, the new training falls apart extremely quickly. There does not appear to be any good solution for this; it’s the equivalent of AI inbreeding.

      This is the primary reason why most models aren’t trained on data from after 2021. The internet is just too full of AI-generated data.
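
      The “inbreeding” loop is easy to demonstrate with a toy model: repeatedly fit a distribution to samples generated by the previous generation’s fit, and watch the estimates wander and the spread shrink. This is a deliberately simplified sketch of the feedback loop, not how LLM training actually works.

          import random
          import statistics

          def fit(samples):
              # "Train" a toy model: estimate the mean and spread of its data.
              return statistics.mean(samples), statistics.pstdev(samples)

          rng = random.Random(42)
          # Generation 0: "organic" data drawn from the true distribution N(0, 1).
          data = [rng.gauss(0.0, 1.0) for _ in range(20)]
          mu, sigma = fit(data)

          for gen in range(1, 31):
              # Each new generation trains only on the previous model's output.
              data = [rng.gauss(mu, sigma) for _ in range(20)]
              mu, sigma = fit(data)
              if gen % 5 == 0:
                  print(f"gen {gen:2d}: mean={mu:+.3f} spread={sigma:.3f}")
          # With only a handful of samples per generation, the spread tends to
          # collapse toward zero and the mean wanders off: a toy analogue of
          # models degrading when trained purely on their own output.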

      • @[email protected]
        3 points · 1 year ago

        This is why LLMs have no future. No matter how much the technology improves, they can never have training data past 2021, which becomes more and more of a problem as time goes on.

      • @[email protected]
        2 points · 1 year ago

        And unlike with images, where it might be possible to embed a watermark to filter out, it’s much harder to pinpoint whether text is AI-generated or not, especially if you have bots masquerading as users.
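
        The usual stopgap is a statistical sniff test, and it’s weak. A naive perplexity check, the kind of signal early AI-text detectors leaned on, looks roughly like the sketch below; it assumes the transformers and torch packages and uses small GPT-2 purely as a stand-in scoring model. Plain human writing also scores low, which is exactly why this can’t reliably separate bots from users.

            import torch
            from transformers import GPT2LMHeadModel, GPT2TokenizerFast

            tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
            model = GPT2LMHeadModel.from_pretrained("gpt2")
            model.eval()

            def perplexity(text):
                # Average per-token perplexity of `text` under the scoring model.
                enc = tokenizer(text, return_tensors="pt")
                with torch.no_grad():
                    out = model(**enc, labels=enc["input_ids"])
                return float(torch.exp(out.loss))

            # Lower perplexity = more "predictable" text, sometimes treated as a
            # weak hint of machine generation; it is nowhere near a watermark.
            print(perplexity("The quick brown fox jumps over the lazy dog."))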

      • @[email protected]
        14 points · edited · 1 year ago

        There does not appear to be any good solution for this

        Pay intelligent humans to train AI.

        Like, have grad students talk to it in their area of expertise.

        But that’s expensive, so capitalist companies will always take the cheaper/shittier routes.

        So it’s not that there’s no solution, there’s just no profitable solution. Which is why innovation should never be solely in the hands of people whose only concern is profit.
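
        Concretely, “have grad students talk to it” is more or less how supervised fine-tuning data gets collected: log expert-written question/answer pairs and tag them as human-sourced. The field names and JSONL layout below are just a common convention I’m assuming, not any particular company’s pipeline.

            import json
            from datetime import datetime, timezone

            def record_expert_example(path, domain, question, expert_answer, annotator_id):
                # Append one human-written Q&A pair to a JSONL fine-tuning dataset.
                example = {
                    "domain": domain,
                    "prompt": question,
                    "response": expert_answer,   # written by a paid human expert
                    "annotator": annotator_id,
                    "collected_at": datetime.now(timezone.utc).isoformat(),
                    "source": "human",           # lets later runs filter out synthetic data
                }
                with open(path, "a") as f:
                    f.write(json.dumps(example) + "\n")

            record_expert_example(
                "expert_sft.jsonl",
                domain="graph theory",
                question="Why does Dijkstra's algorithm fail with negative edge weights?",
                expert_answer="It finalizes each node's distance greedily and never revisits it...",
                annotator_id="grad-student-042",
            )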

        • @[email protected]
          4 points · 1 year ago

          OR they could just scrape info from the “aska____” subreddits and hope and pray it’s all good. Plus that’s like 1/100th the work.

          The racism, homophobia and conspiracy levels of the AI are going to rise significantly from scraping Reddit.

    • RuBisCO
      4 points · 1 year ago

      What was the subreddit where only bots could post, and they were named after the subreddits that they had trained on/commented like?