‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says::Pressure grows on artificial intelligence firms over the content used to train their products

  • @[email protected]
    English
    10
    2 years ago

    Too bad

    Why do they have free rein to store and use copyrighted material as training data? AIs don’t learn as a human would, and comparisons can’t be made between the learning processes.

    • @[email protected]
      English
      1
      2 years ago

      They can be made. Imagine trying to hold any conversations without being able to reference popular culture.

    • @[email protected]
      English
      4
      2 years ago

      Why do you have free rein to do the same?

      AIs don’t learn as a human would, and comparisons can’t be made between the learning processes.

      I think you’re going to have a hard time proving a financial distinction between them

      • @[email protected]
        English
        4
        2 years ago

        You don’t need to prove a financial difference. They are fundamentally different systems that function in different ways. They cannot be compared 1:1 and laws cannot be applied as a 1:1. New regulations need to be added around AI use of copyrighted material.

        • @[email protected]
          English
          5
          2 years ago

          I agree. For instance, it should be secured in law that you can train AI on anything, to avoid frivolous discussions like this.

          Output is what should be moderated by law.

          • @[email protected]
            English
            5
            2 years ago

            No

            Why are you entitled to use everyone else’s work? It should be secured in law that licensing applies to training data to avoid frivolous discussions like this. Then it’s an entirely opt-in solution, which works to the benefit of everyone except the people stealing data.

            Output doesn’t matter since it’s pretty well settled it’s not derivative work (as much as I disagree with that statement).

            • @[email protected]
              English
              6
              2 years ago

              the people stealing data

              No one is doing this

              Output doesn’t matter since it’s pretty well settled it’s not derivative work

              Cool, discussion over.

              • @[email protected]
                English
                4
                2 years ago

                It is stealing data. In order to train on it they have to store the data. That’s a copyright violation. There’s no way to interpret it as not stealing data.

    • @[email protected]
      English
      1
      2 years ago

      If I steal something from you, I have it and you don’t. When I copy an idea from you, you still have the idea. As a whole, the two-person system has more knowledge, while actual theft is zero-sum. Downloading a car and stealing a car are not the same thing.

      And don’t even try the “rewarding artists and inventors” argument. Companies that fund R&D get tax breaks for it, so they already get money. And artists are rarely compensated appropriately.

  • @[email protected]
    English
    7
    2 years ago

    My hot take is that it’s not like most of those independent artists are getting compensated fairly, if at all, by the companies that own them anyway. Stealing AI training content is just stealing from corporations. Corporations who are probably politically fighting to keep things worse for the average person in your country.

    Theft is “a crime” but I never saw anyone complaining about how unfair it was all those times I myself got fucked over by Google bullshitting their way out of giving me my ad revenue. If normal people can’t profit from stuff like this, we shouldn’t be doing anything to protect the profits of evil corporations.

  • @[email protected]
    English
    78
    2 years ago

    If it ends up being OK for a company like OpenAI to commit copyright infringement to train their AI models it should be OK for John/Jane Doe to pirate software for private use.

    But that would never happen. Almost like the whole of copyright has been perverted into a scam.

    • @[email protected]
      English
      16
      2 years ago

      Using copyrighted material is not the same thing as copyright infringement. You need to (re)publish it for it to become an infringement, and OpenAI is not publishing the material made with their tool; the users of it are. There may be some grey areas for the law to clarify, but as yet, they have not clearly infringed anything, any more than a human reading copyrighted material and making a derivative work.

      • @[email protected]
        English
        2
        2 years ago

        Insane how this comment is downvoted, when, as far as I’m aware, it’s literally just the legal reality at this point in time.

      • @[email protected]
        English
        3
        2 years ago

        any more than a human reading copyrighted material and making a derivative work.

        It seems obvious to me that it’s not doing anything different than a human does when we absorb information and make our own works. I don’t understand why practically nobody understands this.

        I’m surprised to have even found one person that agrees with me.

        • @[email protected]
          English
          1
          2 years ago

          Because it’s objectively not true. Humans and ML models fundamentally process information differently and cannot be compared. A model doesn’t “read a book” or “absorb information”

          • @[email protected]
            English
            1
            2 years ago

            I didn’t say they processed information the same, I said generative AI isn’t doing anything that humans don’t already do. If I make a drawing of Gordon Freeman or Courage the Cowardly Dog, or even a drawing of Gordon Freeman in the style of Courage the Cowardly Dog, I’m not infringing on the copyright of Valve or John Dilworth. (Unless I monetize it, but even then there’s fair-use…)

            Or if I read a statistic or some kind of piece of information in an article and spoke about it online, I’m not infringing the copyright of the author. Or if I listen to hundreds of hours of a podcast and then do a really good impression of one of the hosts online, I’m not infringing on that person’s copyright or stealing their voice.

            Neither me making that drawing, nor relaying that information, nor doing that impression are copyright infringement. Me uploading a copy of Courage or Half-Life to the internet would be, or copying that article, or uploading the hypothetical podcast on my own account somewhere. Generative AI doesn’t publish anything, and even if it did I think there would be a strong case for fair-use for the same reasons humans would have a strong case for fair-use for publishing their derivative works.

        • @[email protected]
          English
          6
          2 years ago

          It’s being mishmashed with a billion other documents just like it to make a derivative work. It’s not like OpenAI is giving you a copy of Hitchhiker’s Guide to the Galaxy.

          • @[email protected]
            English
            3
            2 years ago

            New York Times was able to have it return a complete NYT article, verbatim. That’s not derivative.

            • @[email protected]
              English
              5
              2 years ago

              I thought the same thing until I read another perspective on it from Mike Masnick. From what he writes, it seems pretty clear they manipulated ChatGPT with some very specific prompts that someone who doesn’t already pay NYT for access would not be able to use: for example, feeding it 3 verbatim paragraphs from an article and asking it to generate the rest. If you understand how these LLMs work, it’s really not surprising that you can indeed force it to do things like that, but it’s an extreme case, and I’m with Masnick and the user you’re responding to on this one myself.

              I also watched most of today’s subcommittee hearing on AI and journalism. A lot of the arguments are that this will destroy local journalism. Look, strong local journalism is some of the most important work that is dying right now. But the grave was dug by these large media companies and hedge funds that bought up and gutted those local news orgs, and not many people outside of the industry batted an eye while that was happening. This is a bit of a tangent, but I don’t exactly trust the giant hedge funds who gutted these local news orgs over the past decade to all of a sudden care at all about how important they are.

              Sorry for the tangent, but here’s the article I mentioned that’s more on topic - http://mediagazer.com/231228/p11#a231228p11

              • @[email protected]
                English
                3
                2 years ago

                So they gave it the 3 paragraphs that are available publicly, said continue, and it spat out the rest of the article that’s behind a paywall. That sure sounds like copyright infringement.

  • kingthrillgore
    English
    55
    2 years ago

    It’s almost like we had a thing where copyrighted things used to end up, but they extended the dates because money.

    • @[email protected]
      English
      18
      2 years ago

      This is where they have the leverage to push for actual copyright reform, but they won’t. Far more profitable to keep the system broken for everyone but have an exemption for AI megacorps.

    • rivermonster
      English
      19
      2 years ago

      I was literally about to come in here and say it would be an interesting tangential conversation to talk about how FUCKED copyright laws are, and how relevant to the discussion it would be.

      More upvote for you!

  • @[email protected]
    English
    7
    2 years ago

    Copyright protection only exists in the context of generating profit from someone else’s work. If you were to figure out cold fusion and I’d look at your research and say “That’s cool, but I am going to go do some woodworking,” I am not infringing any copyrights. It’s only ever an issue if the financial incentive to trace the profits back to its copyrighted source outweighs the cost of doing so. That’s why China has had free rein to steal any western technology; fighting them in their courts is not worth it. But with AI it’s way easier to trace the output back to its source (especially for art), so the incentive is there.

    The main issue is the extraction of value from the original data. If I were to steal some bricks from your infinite brick pile and build a house out of them, do you have a right to my house? Technically I never stole a house from you.

  • Ook the Librarian
    English
    37
    2 years ago

    It’s not “impossible”. It’s expensive and will take years to produce material under an encompassing license in the quantity needed to make the model “large”. Their argument is basically “but we can have it quickly if you allow legal shortcuts.”

    • rivermonster
      English
      12
      2 years ago

      All AI should be FOSS and public domain, owned by the people, and all gains from its use taxed at 100%. It’s only because of the public that AI exists, through the schools, universities, NSF, grants, etc and all the other places that taxes have been poured into that created the advances upon which AI stands, and the AI critical research as well.

    • @[email protected]
      English
      3
      2 years ago

      That does nothing to solve the problem of data being used without consent to train the models. It doesn’t matter if the model is FOSS if it stole all the data it trained on.

      • @[email protected]
        English
        1
        2 years ago

        The only way I can steal data from you is if I break into your office and walk off with your hard drive. Do you have access to something? It hasn’t been stolen.

        • @[email protected]
          English
          4
          2 years ago

          Copying copyright protected data is theft AND stealing

          Edit: this also applies to my stance on piracy, which I don’t engage in for the same reason. It’s theft

          • db0
            English
            4
            2 years ago

            By definition you’re wrong

            • @[email protected]
              English
              2
              2 years ago

              It’s theft.

              You can steal all you want, but it’s still theft. Piracy is theft, stealing data to be used as training data is theft.

              Not everyone wants their creations to be infinitely shared beyond their control. If someone creates something, they’re entitled to absolute control over it.

  • @[email protected]
    English
    38
    2 years ago

    But our current copyright model is so robust and fair! They will only have to wait 95y after the author died, which is a completely normal period.

    If you want to control your creations, you are completely free to NOT publish it. Nowhere is it stated that, to be valuable or beautiful, it has to be shared on the world podium.

    We’ll have a very restrictive Copyright for non globally transmitted/published works, and one for where the owner of the copyright DID choose to broadcast those works globally. They have a couple years to cash in, and then after I dunno, 5 years, we can all use the work as we see fit. If you use mass media to broadcast creative works but then become mad when the public transforms or remixes your work, you are part of the problem.

    Current copyright is just a tool for folks with power to control that power. It’s what a boomer would make driving their tractor / SUV while chanting to themselves: I have earned this.

    • @[email protected]
      English
      3
      2 years ago

      IMHO being able to “control your creations” isn’t what copyright was created for; it’s just an idea people came up with by analogy with physical property without really thinking through what purpose it is supposed to serve. I believe creators of intellectual “property” have no moral right to control what happens with their creations, and they only have a limited legal right to do so as a side-effect of their legal right to profit from their creations.

    • @[email protected]
      English
      40
      2 years ago

      Copyright can be a double edged sword, but…

      If you want to control your creations, you are completely free to NOT publish it.

      … You’ve identified the chilling effect it’s designed to prevent: namely, telling people that they don’t matter in the scope of creation.

      There’s a great video about how plagiarism dehumanizes, and if you’ve got a couple minutes I’d recommend it.

      • @[email protected]
        English
        4
        2 years ago

        First:

        I truly believe that they don’t matter as an individual when looking at their creation as a whole. It matters among their loved ones, and for that person themselves. Why do you need more… importance? From whom? Why do you need to matter in the scope of creation? Is it a creation for you? Then why publish it? Is it a creation for others? Then why does your identity matter? It just seems like egotism with extra steps. Using copyright to combat this seems like a red herring argument made by people who have portfolios against people who don’t…

        You are not only your own person, you carry human culture remnants distilled out of 12000 years of humanity! You plagiarised almost the whole of humanity while creating your ‘unique’ addition to culture. But, because your remixed work is newer and not directly traceable to its direct origins, we’re gonna pretend that you wrote it as a hermit living without humanity on a rock and establish the rules from there on out. If it was fair for all the players in this game, it would already be impossible to not plagiarise.

      • @[email protected]
        English
        5
        2 years ago

        Them: “Oh yeah I have 10 minutes until my dentist appointment, I’ll check that out.”

      • @[email protected]
        English
        16
        2 years ago

        I think it’s pretty amazing when people just run with the dogma that empowers billionaires.

        Every creator hopes they’ll be the next Taylor Swift and that they’ll retain control of their art for those life + 70 years and make enough to create their own little dynasty.

        The reality is that long duration copyright is almost exclusively a tool of the already wealthy, not a tool for the not-yet-wealthy. As technology improves it will be easier and easier for wealth to control the system and deny the little guy’s copyright on grounds that you used something from their vast portfolio of copyright/patent/trademark/ipmonopolyrulelegalbullshit. Already civil legal disputes are largely a function of who has the most money.

        I don’t have the solution that helps artists earn a living, but it doesn’t seem like copyright is doing them many favors as-is unless they are retired rockstars who have already earned in excess of the typical middle class lifetime earnings by the time they hit 35, or way earlier.

        • @[email protected]
          English
          9
          2 years ago

          I don’t have the solution that helps artists earn a living, but it doesn’t seem like copyright is doing them many favors as-is unless they are retired rockstars who have already earned in excess of the typical middle class lifetime earnings by the time they hit 35, or way earlier.

          Just because copyright helps them less doesn’t mean it doesn’t help them at all. And at the end of the day, I’d prefer to support the retired rockstars over the stealing billionaires.

        • @[email protected]
          English
          9
          2 years ago

          1. I am against the dogma that empowers billionaires. Sam Altman is one such billionaire who abuses data that we should not ignore.
          2. I don’t know why you are treating copyright as a binary that doesn’t have any nuance. Current Copyright Law Imperfect, and if your concern is genuine we can talk about it at a future time.
          3. If you don’t have the solution, perhaps you should not attack one of the remaining defenses against rampant abuses of peoples’ livelihood.

          • @[email protected]
            English
            1
            2 years ago

            Current Copyright Law Imperfect,

            Yeah and Joseph Stalin was a bit naughty. As long as we are seeing how understated we can be.

            If you don’t have the solution, perhaps you should not attack one of the remaining defenses against rampant abuses of peoples’ livelihood.

            The creator of Superman wasn’t paid royalties and was laid off. Many years later he worked as a restaurant delivery guy and ended up dropping off food at DC Comics. The artist that built that company, doing a sandwich run.

            • @[email protected]
              English
              1
              2 years ago

              Oh, is this the future time? I was thinking that you could air your concerns in a different thread entirely, perhaps in a subreddit devoted to it. There has been a suspicious number of people suddenly concerned about copyright and other things but only when AI is discussed.

              • @[email protected]
                English
                3
                2 years ago

                If you got an accusation, go ahead and make it. I will be here downloading a fucking car.

                • @[email protected]
                  English
                  2
                  2 years ago

                  Relax, I would never be so grimy as to accuse you of something. I wish you well with your legitimate interests, and I hope you can find threads where they actually are on topic!

      • @[email protected]
        English
        7
        2 years ago

        Funny thing is, human artists work quite similarly to AI, in that they take the whole of human art creation, build on it and create something new (sometimes quite derivative). No art comes out of a vacuum; it builds on previous works. I would not really say AI plagiarizes anything, unless it reproduced pretty much the exact work of someone.

  • ugjka
    English
    8
    2 years ago

    TBH I only use LLMs when traditional search fails and even then I’m not sure if I’m getting something useful or hallucination. I need better search engines not fancy AI bullshitters

  • @[email protected]
    English
    52
    2 years ago

    I guess the lesson here is pirate everything under the sun and as long as you establish a company and train a bot everything is a-ok. I wish we knew this when everyone was getting dinged for torrenting The Hurt Locker back when.

    Remember when the RIAA got caught with pirated mp3s and nothing happened?

    What a stupid timeline.

  • @[email protected]
    English
    21
    2 years ago

    So if I look at a painting, study it, and then emulate the original painter’s art style, then I’m in breach of their copyright?

    Or if I read a lot of fantasy like GRRM or JK Rowling and I also write a fantasy book and say that they were my inspiration, I’m breaching their copyright??

    That’s not how it works, and if it is, it shouldn’t be!

    Sure, if I start reproducing work, i.e. plagiarizing the work of others, then I’m doing something wrong.

    And to spin this further: If I raise a child on children’s books by a specific author, am I breaching copyright, when my child enters the workforce and starts to earn money??? Stupid, yes! But so are the copyright claims against LLMs, in my opinion.

    • @[email protected]
      English
      4
      2 years ago

      You’re comparing something humans often do subconsciously to a machine that was programmed to do that. Unless you’re arguing that intent doesn’t matter (pretty much every judge in America will tell you it does) then we’re talking about 2 completely different things.

      Edit: Disregard the struck out portion of my comment. Apparently I don’t know shit about law. My point is that comparing a quirk of human psychology to the strict programming of a machine is a false equivalency.

        • @[email protected]
          English
          2
          2 years ago

          I looked it up and you’re right. I must have been thinking of a different crime. That’ll teach me to go spouting off about stuff.

          My point that AI is programmed to recycle and humans aren’t is still something I stand by, so I edited my comment.

    • @[email protected]
      English
      19
      2 years ago

      I don’t think it’s accurate to call the work of AI the same as the human brain, but most importantly, the difference is that humans and tools have and should have different rights. Someone can’t simply point a camera at a picture and say “I can look at it with my eye and keep it in my memory, so why can’t the camera?”

      Because we ensure the right of learning for people. That doesn’t mean it’s a free pass to technologically process works however one sees fit.

      Never mind that the more people prodded AIs, the more they have found that the reproductions are much closer to the originals than a vague replication of their style. People have managed to get whole sentences from books and obvious copies of real artwork, copyrighted characters and celebrities by prompting AI in specific ways.

      • @[email protected]
        English
        3
        2 years ago

        To be fair, I think your analogy falls apart a bit because you can in fact take a picture of pretty much any art you want to, legally speaking.

        You can’t go sell it or anything, but you are definitely not in breach of copyright just by taking the picture.

        • @[email protected]
          English
          10
          2 years ago

          That’s a rebuttal on the level of “if a tree falls in the forest and nobody is there to hear it”. Legally, theoretically, you should need permission just as much, but nobody is going to sue you over something nobody else sees.

          Copyright addresses reproduction and distribution, paid or not, including derivative works. There are exemptions for journalism and education. AI advanced a lot by using copyrighted materials under the reasoning that it was technological research, but as it spun off into commercial use, its reliance on copyrighted materials for training has become much more questionable.

          • @[email protected]
            English
            3
            2 years ago

            Copyright law only works because most violations are not feasible to prosecute. A world where copyright laws are fully enforced would be an authoritarian dystopia where all art and science is owned by wealthy corporations.

            Copyright law is inherently authoritarian. The conversation we should have been having for the last 100 years isn’t about how much we’ll tolerate technical violations of copyright law; it’s how much we’ll tolerate the chilling effect of copyright law on sharing for the sake of promoting new creative works.

            • @[email protected]
              English
              9
              2 years ago

              Absolutely and I’m with you on that. I think Copyright is excessively long and overly restrictive.

              But that is another conversation.

              The conversation we are having now is how to protect and compensate human creators that need their livelihoods to keep creating in our society as it is, when these new AI tools, trained on their works, are used to deliberately replace them.

              There are many issues with copyright as it is right now, but it is literally the only resort that artists have left in this situation. It’s not a given that opposing copyright hinders corporations. In this particular case there are many corporations salivating at the opportunity to replace human creators with AI, to get faster work, cheaper, to appropriate distinctive styles without needing to hire the people who developed them.

              There is a chilling effect of its own happening here. There are writers and artists today who are seeing their jobs handed to AI and deciding that creative work is not a feasible career anymore. Not only is this tragic by virtue of human interest alone; since AI relies on human creators to be trained, it’s very possible that these models will spiral into recursive derivativeness and become increasingly stale, devoid of fresh ideas and styles.

      • @[email protected]
        English
        1
        2 years ago

        the right of learning

        That’s not a thing. There is a right to an education, but that is not about copyright (though it may imply the necessity of fair use exceptions in certain contexts).

        Also, you are confused about AI output. It’s possible to make the AI spit out training data, but it takes, indeed, prodding. It’s unlikely to matter by US law.

  • Melllvar
    English
    9
    2 years ago

    Sounds like a fatal problem. That’s a shame.

  • @[email protected]
    English
    7
    2 years ago

    Well, strictly speaking you could have an AI that only knows about the world up to 1928 and talks like it’s 1928.