OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling’s Harry Potter series

A new research paper laid out ways in which AI developers should try and avoid showing LLMs have been trained on copyrighted material.

  • @[email protected]
    22 points · 2 years ago

    One of the first things I ever did with ChatGPT was ask it to write some Harry Potter fan fiction. It wrote a short story about Ron and Harry getting into trouble. I never said the word McGonagall and yet she appeared in the story.

    So yeah, case closed. They are full of shit.

    • @[email protected]
      36 points · 2 years ago

      There is enough non-copyrighted Harry Potter fan fiction out there that it would not need to be trained on the actual books to know all the characters. While I agree they are full of shit, your anecdote proves nothing.

      • Cosmic Cleric
        5 points · 2 years ago

        While I agree they are full of shit, your anecdote proves nothing.

        Why? Because you say so?

        He brings up a valid point, it seems transformative.

        • @[email protected]
          2 points · 2 years ago

          The sentence they wrote right before your quoted sentence answers your braindead question.

          • Cosmic Cleric
            1 point · 2 years ago

            I was questioning how much non-copyrightable material was available to train an AI on.

            It’s not a brain dead question just because you may disagree with it.

            • @[email protected]
              1 point · 2 years ago

              Which he literally answers in the comment you questioned him on. You asked him something after he explained what you then asked.

              That’s braindead, and not because I “disagree” with your question, whatever that means.

              • Cosmic Cleric
                1 point · 2 years ago

                I wasn’t agreeing with him and I was asking him to back up what he said. But you carry on, Internet Warrior.

        • @[email protected]
          13 points · 2 years ago

          The anecdote proves nothing because the model could potentially have known of the McGonagall character without ever being trained on the books, since that character appears in a lot of fan fiction. So their point is invalid and their anecdote proves nothing.

    • @[email protected]
      3 points · 2 years ago (edited)

      Kopimi

      (edit 4 minutes in - hey I have this guy’s album already (“Red Extensions of Me”))

      I’m basically on the same page as this guy except I don’t think the government has to manage a royalties system. People can handle that freely, no? Plus you can pretty immediately envision they’re gonna have some kind of asinine censorship policy for what content is acceptable and what content isn’t.

      • @[email protected]
        1 point · 2 years ago

        the government in its current form would have that flaw in the content distribution system, yes, but his main idea is that it would be like open-source ran in the sense of “government of the people”

  • Cosmic Cleric
    1 point · 2 years ago (edited)

    It feels like we’ve just taken our first steps down the path of the Robin Williams acted movie ‘Bicentennial Man’ timeline.

    • @[email protected]
      11 points · 2 years ago

      Stupid. No it isn’t. Establishing legal precedent or, in countries that don’t work on precedent, a preponderance of legal cases, prohibiting this practice is what is needed.

  • @[email protected]
    19 points · 2 years ago (edited)

    If I’m not mistaken AI work was just recently considered as NOT copyrightable.

    So I find it interesting that an AI learning from copyrighted work is an issue even though what will be generated will NOT be copyrightable.

    So even if you generated some copy of Harry Potter you would not be able to copyright it. So in no way could you really compete with the original art.

    I’m not saying that it makes it ok to train AIs on copyrighted art but I think it’s still an interesting aspect of this topic.

    As others probably have stated, the AI may be creating content that is transformative and therefore under fair use. But even if that work is transformative it cannot be copyrighted because it wasn’t created by a human.

    • @[email protected]
      3 points · 2 years ago

      How do you tell if a piece of work contains AI generated content or not?

      It’s not hard to generate a piece of AI content, put in some hours to round out AI’s signatures / common mistakes, and pass it off as your own. So in practise it’s still easy to benefit from AI systems by masking generated content as largely your own.

        • @[email protected]
          3 points · 2 years ago

          Sure, but even under this guidance copyright owners of the training data are still shafted, based on how the data is scraped pretty much freely. Can an opportunist generate an unofficial sequel to Harry Potter, do the minimum to ensure they get copyright and reap the reward from it?

          • @[email protected]
            3 points · 2 years ago

            That’s how copyright has always worked. Fair use is complex, but as long as you’re not straight up copying someone’s work you’re fine. 50 Shades of Grey started out as Twilight fanfiction. So yeah, you could.

            • @[email protected]
              3 points · 2 years ago (edited)

              Yes, fair use has its stipulations, but AI is breaking new ground here. We are no longer dealing with reaction videos but with an era where literally dozens of pages of content can be generated in a matter of minutes, with relatively little human involvement. Perhaps it’s time to revisit whether the law still holds in light of these new technologies and tools.

            • @[email protected]
              1 point · 2 years ago

              Fair use has never been seriously challenged. I’m betting it might happen soon though. We have to remember Fair Use isn’t a law, it’s a set of guidelines under the law that has never been clearly defined.

              • @[email protected]
                1 point · 2 years ago (edited)

                First of all, fair use is not a set of guidelines, it’s a legal doctrine that allows us limited use of copyrighted material without permission from the owner. It is a part of the U.S. Copyright Act, which is a law enacted by Congress.

                Second, fair use has been seriously challenged plenty of times, just to name a few:

                • Campbell v. Acuff-Rose Music, Inc.

                • Authors Guild v. Google, Inc.

                • Lenz v. Universal Music Corp.

                I recommend reading this article by Kit Walsh, who’s a senior staff attorney at the EFF, a digital rights group who recently won a historic case: border guards now need a warrant to search your phone.

                Fair use protects creativity, innovation, and our freedom of expression, but you almost sound like you want it weakened.

    • @[email protected]
      10 points · 2 years ago (edited)

      If you’re talking about the ruling that came out this week, that whole thing was about trying to give an AI authorship of a work generated solely by a machine and having the copyright go to the owner of the machine through the work-for-hire doctrine. So an AI itself can’t be an author or hold a copyright, but humans using one can still be copyright holders of any qualifying works.

    • @[email protected]
      1 point · 2 years ago

      How are they going to prove if something was written by an AI? Also, you can take the AI’s output and then modify it.

      • @[email protected]
        2 points · 2 years ago

        That’s definitely an issue. At what point does copyright apply if you are just helped by an AI?

        I guess the courts will have to decide that…

    • @[email protected]
      2 points · 2 years ago

      That’s not how copyright works. I’m embarrassed for you, and all the people who blindly upvoted you. Like… Yikes. What’s happening to this world?

      You can’t publish copyrighted work as your own just because you’re legally not able to publish copyrighted work. That’s an open-and-shut case of copyright infringement. Why do I have to say this? Am I on candid camera?

  • paraphrand
    39 points · 2 years ago

    Why are people defending a massive corporation that admits it is attempting to create something that will give them unparalleled power if they are successful?

    • @[email protected]
      28 points · 2 years ago

      Mostly because fuck corporations trying to milk their copyright. I have no particular love for OpenAI (though I do like their product), but I do have great disdain for already-successful corporations that would hold back the progress of humanity because they didn’t get paid (again).

        • @[email protected]
          1 point · 2 years ago

          It’s like the argument “but new politicians will steal more” that I hear in Russia from people who protect Putin.

          • @[email protected]
            1 point · 2 years ago

            It’s literally not, wtf.

            Do not let any private entity get an overwhelming majority on anything, period.

            But do not kid yourself that Microsoft will let OpenAI do anything for the public once it gets big enough.

            OpenAI is open only in name after they rolled back all the promises of being for everyone.

            • @[email protected]
              2 points · 2 years ago (edited)

              That’s my entire point. It’s not who, but how long.

              Also, Microsoft plays both sides here. OpenAI vs copyright is the wrong question. There’s more: both are status quo. Both are for keeping corporate ownership of ideas.

        • @[email protected]
          3 points · 2 years ago (edited)

          In the United States there was a judgement made the other day saying that works created solely by AI are not copyrightable. So that would put a speed bump there.
          I may have misunderstood you, though.

          • @[email protected]
            1 point · 2 years ago

            Yeah, they might not copyright it, but after it becomes the ‘one true AI’, it will be at the hands of Microsoft, so please do not act friendly towards them.

            It will turn on you just like every private company has.

            (don’t mean specifically you, but everyone generally)

            • @[email protected]
              2 points · 2 years ago

              Nah, it would mean that you cannot copyright a work created by an AI, such as a piece of art.

              E.g. if you tell it to draw you a donkey carting avocados, the picture can be used by anyone from what I understand.

              • @[email protected]
                1 point · 2 years ago

                you cannot copyright a work created by an AI, such as a piece of art.

                That’s what I said. Copyright infringement is when there is another copyrightable object that is a copy of the first object. AI is not within the area of copyright. You can’t copyright it, but you also can’t be sued for copyright infringement.

                if you tell it to draw you a donkey carting avocados, the picture can be used by anyone from what I understand.

                Yes. Same for Public Domain, but PD is another status. PD applies only to copyrightable work.

      • @[email protected]
        4 points · 2 years ago

        There’s a massive difference though between corporations milking copyright and authors/musicians/artists wanting their copyright respected. All I see here is a corporation milking copyrighted works by creative individuals.

    • Cosmic Cleric
      10 points · 2 years ago

      Because ultimately, it’s about the truth of things, and not what team is winning or losing.

    • @[email protected]
      5 points · 2 years ago

      The dream would be that they manage to make their own glorious free & open source version, so that after a brief spike in corporate profit as they fire all their writers and artists, suddenly nobody needs those corps anymore because EVERYONE gets access to the same tools - if everyone has the ability to churn out massive content without hiring anyone, that theoretically favors those who never had the capital to hire people to begin with, far more than those who did the hiring.

      Of course, this stance doesn’t really have an answer for any of the other problems involved in the tech, not the least of which is that there’s bigger issues at play than just “content”.

    • @[email protected]
      8 points · 2 years ago

      AI is the new fan boy following since it became official that nfts are all fucking scams. They need a new technological God to push to feel superior to everyone else…

    • @[email protected]
      10 points · 2 years ago

      i think trying to keep this cat in the bag is just a waste of time. plus i don’t respect copyright sooo…

      • @[email protected]
        20 points · 2 years ago

        An LLM is not a person, it is a product. It doesn’t matter that it “learns” like a human - at the end of the day, it is a product created by a corporation that used other people’s work, with the capacity to disrupt the market that those folks’ work competes in.

        • @[email protected]
          12 points · 2 years ago (edited)

          And it should be able to freely use anything that’s available to it. These massive corporations and entities have exploited all the free spaces to advertise and sell us their own products and are now sour.

          If they had their way they are going to lock up much more of the net behind paywalls. Everybody should be with the LLMs on this.

          • @[email protected]
            5 points · 2 years ago

            You are somehow conflating “massive corporation” with “independent creator,” while also not recognizing that successful LLM implementations are and will be run by massive corporations, and eventually plagued with ads and paywalls.

            People that make things should be allowed payment for their time and the value they provide their customer.

            • @[email protected]
              5 points · 2 years ago (edited)

              People are paid. But they’re greedy and expect far more compensation than they deserve. In this case they should not be compensated for having an LLM ingest their work if that work was legally owned or obtained.

          • Cosmic Cleric
            5 points · 2 years ago

            If they had their way they are going to lock up much more of the net behind paywalls.

            This!

            When the Internet was first a thing corpos tried to put everything behind paywalls, and we pushed back and won.

            Now, the next generation is advocating to put everything behind a paywall again?

          • @[email protected]
            4 points · 2 years ago

            Except the massive corporations and entities are the ones getting rich on this. They’re seeking to exploit the work of authors and musicians and artists.

            Respecting the intellectual property of creative workers is the anti corporate position here.

            • @[email protected]
              4 points · 2 years ago

              A large number of these artists, musicians and authors are corporate America today. And those authors, artists and musicians have exploited all our spaces for far too long. Most of the internet has been turned toxic due to their greed. I wish they’d take their content and go find their own spaces instead of mooching off everybody else’s. These LLMs are only doing what they’ve done.

            • @[email protected]
              2 points · 2 years ago

              Except corporations have infinitely more resources (money, lawyers) compared to people who create. Take Jarek Duda (a mathematician from Poland) and Microsoft as an example. He created a new compression algorithm, and Microsoft came a few years later and patented it in Britain AFAIK. To contest the patent and file prior art he needs £100k.

              • @[email protected]
                1 point · 2 years ago

                I think there’s an important distinction to make here between patents and copyright. Patents are the issue with corporations, and I couldn’t care less if AI consumed all that.

                • @[email protected]
                  2 points · 2 years ago

                  And for copyright there is no possible way to contest it. Also, when copyright expires there is no guarantee it will be accessible by humanity. Patents are bad, copyright even worse.

          • @[email protected]
            12 points · 2 years ago

            First, we don’t have to make AI.

            Second, it’s not about it being unable to learn, it’s about the fact that they aren’t paying the people who are teaching it.

              • @[email protected]
                3 points · 2 years ago

                Humans can judge information, make decisions on it, and adapt it. AI mostly just looks at what is statistically most likely based on training data. If only 1 piece of data exists, it will copy, not paraphrase. One example was from, I think, Copilot, where it just printed out the code and comments from an old game verbatim. I think Quake2. It isn’t intelligence, it is statistical copying.

              • @[email protected]
                7 points · 2 years ago

                The reasoning that claims training a generative model is infringing IP would still mean a robot going into a library with a card it has to optically read all the books there to create the same generative model would still be infringing IP.

            • @[email protected]
              10 points · 2 years ago

              yeah let’s not explore this technology because it might hurt some copyright holders

              LOOOOL fuck em

              • @[email protected]
                4 points · 2 years ago

                because it might hurt authors and musicians and artists and other creative workers

                FTFY. Corporations shouldn’t be making a fucking dime from any of these works without fairly paying the creators.

  • RadialMonster
    24 points · 2 years ago

    what if they scraped a whole lot of the internet, and those excerpts were in random blogs and posts and quotes and memes etc etc all over the place? They didn’t ingest the material directly, or knowingly.

    • @[email protected]
      6 points · 2 years ago

      Not knowing something is a crime doesn’t stop you from being prosecuted for committing it.

      It doesn’t matter if someone else is sharing copyrighted works and you don’t know it and use them in ways that infringe on that copyright.

      “I didn’t know that was copyrighted” is not a valid defence.

      • @[email protected]
        3 points · 2 years ago

        Is reading a passage from a book actually a crime though?

        Sure, you could try to regenerate the full text from quotes you read online, much like you could open a lot of video reviews and recreate larger portions of the original text, but you would not blame the video editing program for that, you would blame the one who did it and decided to post it online.

    • @[email protected]
      7 points · 2 years ago

      That’s why this whole argument is worthless, and why I think that, at its core, it is disingenuous. I would be willing to bet a steak dinner that a lot of these lawsuits are just fishing for money, and the rest are set up by competition trying to slow the market down because they are lagging behind. AI is an arms race, and it’s growing so fast that if you got in too late, you are just out of luck. So, companies that want in are trying to slow down the leaders, at best, and at worst they are trying to make them publish their training material so they can just copy it. AI training models should be considered IP, and should be protected as such. It’s like trying to get the Colonel’s secret recipe by saying that all the spices that were used have been used in other recipes before, so it should be fair game.

      • @[email protected]
        7 points · 2 years ago

        If training models are considered IP then shouldn’t we allow other training models to view and learn from the competition? If learning from other IPs that are copyrighted is okay, why should the training models be treated differently?

        • @[email protected]
          1 point · 2 years ago

          They are allegedly learning from copyrighted material, there is no actual proof that they have been trained on the actual material, or just snippets that have been published online. And it would be illegal for them to be trained on full copyrighted materials, because it is protected by laws that prevent that.

  • @[email protected]
    25 points · 2 years ago

    I don’t get why this is an issue. Assuming they purchased a legal copy that it was trained on then what’s the problem? Like really. What does it matter that it knows a certain book from cover to cover or is able to imitate art styles etc. That’s exactly what people do too. We’re just not quite as good at it.

    • Hildegarde
      19 points · 2 years ago

      A copyright holder has the right to control who has the right to create derivative works based on their copyright. If you want to take someone’s copyright and use it to create something else, you need permission from the copyright holder.

      The one major exception is Fair Use. It is unlikely that AI training is a fair use. However this point has not been adjudicated in a court as far as I am aware.

      • @[email protected]
        26 points · 2 years ago

        It is not a derivative, it is a transformative work. Just like human artists “synthesise” art they see around them and make new art, so do LLMs.

        • @[email protected]
          3 points · 2 years ago

          LLMs don’t create anything new. They have limited access to what they can be based on, and all assumptions made by it are based on that data. They do not learn new things or present new ideas. Only ideas that have been already done and are present in their training.

        • Hildegarde
          7 points · 2 years ago

          Transformative works are not a thing.

          If you copy the copyrightable elements of another work, you have created a derivative work. That work needs to be transformative in order to be eligible for its own copyright, but being transformative alone is not enough to make it non-infringing.

          There are four fair use factors. Transformativeness is only considered by one of them. That is not enough to make a fair use.

          • Cosmic Cleric
            2 points · 2 years ago

            Transformativeness is only considered by one of them. That is not enough to make a fair use.

            Somebody better let YouTube content creators know that. /s

      • @[email protected]
        14 points · 2 years ago

        this is so fucking stupid though. almost everyone reads books and/or watches movies, and their speech is developed from that. the way we speak is modeled after characters and dialogue in books. the way we think is often from books. do we track down what percentage of each sentence comes from what book every time we think or talk?

        • @[email protected]
          5 points · 2 years ago

          Aye, but I’m thinking the whole notion of copyright is banking on the fact that human beings are inherently lazy and not everyone will start churning out books in the same universe or style. And if they do, it takes quite some time to get the finished product and they just get sued for it. It’s easy, because there’s a single target.

          So there’s an extra deterrent to people writing and publishing a new harry potter novel, unaffiliated with the current owner of the copyright. Invest all that time and resources just to be sued? Nah…

          Issue with generating stuff with 'puters is that you invest way less time, so the same issue pops up for the copyright owner, they’re just DDoS-ed on their possible attack routes. Will they really sue thousands or hundreds of thousands of internet randos generating harry potter erotica using an LLM? Would you even know who they are? People can hide money away in Switzerland from entire governments, I’m sure there are ways to hide your identity from a book publisher.

          It was never about the content, it’s about the opportunities the technology provides to halt the gears of the system that works to enforce questionable laws. So they’re nipping it in the bud.

          • @[email protected]
            2 points · 2 years ago

            this brings up the question: what is a book? what is art? if an “AI” can now churn out the next harry potter sequel and people literally can’t tell that it’s not written by JK Rowling, then what does that mean for what people value in stories? what is a story? is this a sign that we humans should figure something new out, instead of reacting according to an outdated protocol?

            yes, authors made money in the past before AI. now that we have AI and most people can get satisfied by a book written by AI, what will differentiate human authors from AI? will it become a niche thing, where some people can tell the difference and they prefer human authors? or will there be some small number of exceptional authors who can produce something that is obviously different from AI?

            i see this as an opportunity for artists to compete with AI, rather than say “hey! no fair! he can think and write faster than me!”

            • @[email protected]
              3 points · 2 years ago

              Well, poor literature has always existed, which some might not even dignify to call literature. Are writers of such things threatened by LLMs? Of course they are. Every new technology has brought with it the fear of upending somebody’s world. And to some extent, every new technology has indeed done just that.

              Personally, and… this will probably be highly unpopular, I honestly don’t care who or what created a piece of art. Is it pretty? Does it satisfy my need for just the right amount of weird, funny and disturbing to stir emotions or make me go ‘heh, interesting!’? Then it really doesn’t matter where it comes from. We put way too much emphasis on the pedigree of art and not on the content. Hell, one very nice short story I read was the greentext one about humans being AI and escaping from the simulation. Wonder how many would scoff at calling art something that came out of 4chan?

              Maybe this is the issue? Art is thought of as a purely human endeavour (also birds do it, and that one pufferfish that draws on the seabed, but they’re “dumb” animals so they don’t count, right? hell, there’s even a jumping spider that does some pretty rad dances). And if code in a machine can do it just as well (can it? let it - we’ll be all the better for it. can’t it? let it be then - no issue) then what would be the significance of being human?

    • @[email protected]
      2 points · 2 years ago

      Assuming they purchased a legal copy that it was trained on then what’s the problem?

      i never purchased a copy of harry potter i got a loaner. now what?

  • @[email protected]
    32 points · 2 years ago

    People are acting like ChatGPT is storing the entire Harry Potter series in its neural net somewhere. It’s not storing or reproducing text in a 1:1 manner from the original material. Certain material, like very popular books, has likely been interpreted tens of thousands of times due to how many times it was reposted online (and therefore how many times it appeared in the training data).

    Just because it can recite certain passages almost perfectly doesn’t mean it’s redistributing copyrighted books. How many quotes do you know perfectly from books you’ve read before? I would guess quite a few. LLMs are doing the same thing, but on mega steroids with a nearly limitless capacity for information retention.

    • Teritz
      9 points · 2 years ago

      Using copyrighted work, art for example, still influences the AI that they make a profit from.

      If they use my works then they need to pay, that’s it.

        • @[email protected]
          3 points · 2 years ago

          What do you do for your work, and will you send it to me for free then? Can I sell it and keep all the money I get?

        • Teritz
          3 points · 2 years ago

          For a civilian, pirating is no problem, but it is for a company that behaves like it owns its neural network 100%.

          Piracy is gonna live as long as services are bad for the average Joe, but these US corps can afford to pay for this.

      • @[email protected]
        link
        fedilink
        English
        382 years ago

        Still kinda blows my mind how like the most socialist people I know (fellow artists) turned super capitalist the second a tool showed like an inkling of potential to impact their bottom line.

        Personally, I’m happy to have my work scraped and permutated by systems that are open to the public. My biggest enemy isn’t the existence of software scraping an open internet, it’s the huge companies who see it as a way to cut us out of the picture.

        If we go all copyright crazy on the models for looking at stuff we’ve already posted openly on the internet, the only companies with access to the tools will be those who already control huge amounts of data.

        I mean, for real, it’s just mind-blowing seeing the entire artistic community pretty much go full-blown “Metallica with the RIAA” after decades of making the “you wouldn’t download a car” joke.

        • @[email protected]
          link
          fedilink
          English
          162 years ago

          Fuckin preach! I feel like I’m surrounded by children that didn’t live through the many other technologies that have come along and changed things. People lost their shit when photoshop became mainstream, when music started using samples, etc. AI is here to stay. These same people are probably listening to autotuned music all day while they complain on the internet about AI looking at their art.

        • @[email protected]
          link
          fedilink
          English
          11
          edit-2
          2 years ago

          I feel like a lot of internet people (not even just socialists) go from seeing copyright as at best a compromise that allows the arts to have value under capitalism to treating it like a holy doctrine when the subject of LLMs comes up.

          Like, people who will say “piracy is always okay” will also say “ban AI, period” (and misrepresent organizations that want regulations on its use as wanting a full ban.)

          Like, growing up with an internet full of technically illegal content (or grey area at best) like fangames and YouTube Poops made me a lifelong copyright skeptic. It’s outright confusing to me when people take copyright as seriously as this.

        • @[email protected]
          link
          fedilink
          English
          8
          edit-2
          2 years ago

          Nobody would defend copyright if it wasn’t already in place, it’s a sick idea. They ask us to cut the field of human knowledge for private benefit. Now they want to destroy a new technology in its name. Greed knows no bounds.

          • @[email protected]
            link
            fedilink
            English
            3
            edit-2
            2 years ago

            So the people who generate and curate that knowledge don’t deserve to be compensated? Are you going to be a full time wikipedia editor then? Or does your “greed know no bounds”?

          • @[email protected]
            link
            fedilink
            English
            32 years ago

            I defend copyright. The original intent was to protect creators in order to foster more creativity. Most artists will have no incentive to create if their work can be reappropriated by a larger group to leverage it for monetary gain, which is directly being taken from the original creator.

            I’m a photographer. I’ve removed all my pictures from the internet and plan to never post more. I don’t want my work being used to train AI. Right now we have no choice in that matter, so the only option is to no longer share our work.

    • Hup!
      link
      fedilink
      English
      14
      edit-2
      2 years ago

      Nope, people are just acting like ChatGPT is making commercial use of the content. Knowing a quote from a book isn’t copyright infringement. Selling that quote is. Also it doesn’t need to be content stored 1:1 somewhere to be infringement. That misses the point. If you’re making money off a synopsis you wrote based on imperfect memory and in your own words, it’s still copyright infringement until you sign a licensing agreement with JK. Even transforming what you read into a different medium like a painting or poetry can infringe the original author’s copyrights.

      Now mull that over and tell us what you think about modern copyright laws.

      • @[email protected]
        link
        fedilink
        English
        42 years ago

        Just adding, that, outside of Rowling, who I believe has a different contract than most authors due to the expanded Wizarding World and Pottermore, most authors themselves cannot quote their own novels online because that would be publishing part of the novel digitally and that’s a right they’ve sold to their publisher. The publisher usually ignores this as it creates hype for the work, but authors are careful not to abuse it.

      • @[email protected]
        link
        fedilink
        English
        62 years ago

        it’s still copyright infringement until you sign a licensing agreement with JK.

        no it’s not.

        • @[email protected]
          link
          fedilink
          English
          32 years ago

          Yeah I don’t see how that’s true. If that were true wouldn’t every boardwalk tee shirt shop be sued into oblivion by Nickelodeon over SpongeBob?

    • @[email protected]
      link
      fedilink
      English
      162 years ago

      but on mega steroids with a nearly limitless capacity for information retention.

      That sounds like redistributing copyrighted books

  • dantheclamman
    link
    fedilink
    English
    72 years ago

    Google AI search preview seems to brazenly steal text from search results. Frequently its answers are the same word for word as one of the snippets lower on the page.

  • @[email protected]
    link
    fedilink
    English
    110
    edit-2
    2 years ago

    If I memorize the text of Harry Potter, my brain does not thereby become a copyright infringement.

    A copyright infringement only occurs if I then reproduce that text, e.g. by writing it down or reciting it in a public performance.

    Training an LLM from a corpus that includes a piece of copyrighted material does not necessarily produce a work that is legally a derivative work of that copyrighted material. The copyright status of that LLM’s “brain” has not yet been adjudicated by any court anywhere.

    If the developers have taken steps to ensure that the LLM cannot recite copyrighted material, that should count in their favor, not against them. Calling it “hiding” is backwards.

    • @[email protected]
      link
      fedilink
      English
      28
      edit-2
      2 years ago

      You are a human, you are allowed to create derivative works under the law. Copyright law as it relates to machines regurgitating what humans have created is fundamentally different. Future legislation will have to address a lot of the nuance of this issue.

    • Gyoza Power
      link
      fedilink
      English
      182 years ago

      Let’s not pretend that LLMs are like people where you’d read a bunch of books and draw inspiration from them. An LLM does not think nor does it have an actual creative process like we do. It should still be a breach of copyright.

      • @[email protected]
        link
        fedilink
        English
        192 years ago

        … you’re getting into philosophical territory here. The plain fact is that LLMs generate cohesive text that is original and doesn’t occur in their training sets, and it’s very hard if not impossible to get them to quote back copyrighted source material to you verbatim. Whether you want to call that “creativity” or not is up to you, but it certainly seems to disqualify the notion that LLMs commit copyright infringement.

        • @[email protected]
          link
          fedilink
          English
          5
          edit-2
          2 years ago

          This topic is fascinating.

          I really do think I understand both sides here and want to find the hard line that separates man from machine.

          But it feels, to me, that some philosophical discussion may be required. Art is not something that is just manufactured. “Created” is the word to use without quotation marks. Or maybe not, i don’t know…

        • Gyoza Power
          link
          fedilink
          English
          62 years ago

          I wasn’t referring to whether the LLM commits copyright infringement when creating a text (though that’s an interesting topic as well), but rather the act of feeding it the texts. My point was that it is not like us in a sense that we read and draw inspiration from it. It’s just taking texts and digesting them. And also, from a privacy standpoint, I feel kind of disgusted at the thought of LLMs having used comments such as these ones (not exactly these, but you get it), for this purpose as well, without any sort of permission on our part.

          That’s mainly my issue, the fact that they have done so the usual capitalistic way: it’s easier to ask for forgiveness than to ask for permission.

          • @[email protected]
            link
            fedilink
            English
            22 years ago

            I think you’re putting too much faith in humans here. As best we can tell the only difference between how we compute and what these models do is scale and complexity. Your brain often lies to you and makes up reasoning behind your actions after the fact. We’re just complex networks doing math.

          • Schadrach
            link
            fedilink
            English
            12 years ago

            but rather the act of feeding it the texts.

            Unless you are going to argue the act of feeding it the texts is distributing the original text or doing some kind of public performance of the text, I don’t see how.

    • @[email protected]
      link
      fedilink
      English
      102 years ago

      If Google took samples from millions of different songs that were under copyright and created a website that allowed users to mix them together into new songs, they would be sued into oblivion before you could say “unauthorized reproduction.”

      You simply cannot compare one single person memorizing a book to corporations feeding literally millions of pieces of copyrighted material into a blender and acting like the resulting sausage is fine because “only a few rats fell into the vat, what’s the big deal”

          • @[email protected]
            link
            fedilink
            English
            3
            edit-2
            2 years ago

            The analogy talks about mixing samples of music together to make new music, but that’s not what is happening in real life.

            The computers learn human language from the source material, but they are not referencing the source material when creating responses. They create new, original responses which do not appear in any of the source material.

            • Cethin
              link
              fedilink
              English
              52 years ago

              “Learn” is debatable in this usage. It is trained on data and the model creates a set of values that you can apply that produce an output similar to human speech. It’s just doing math though. It’s not like a human learns. It doesn’t care about context or meaning or anything else.

              • @[email protected]
                link
                fedilink
                English
                12 years ago

                Okay, but in the context of this conversation about copyright I don’t think the learning part is as important as the reproduction part.

      • @[email protected]
        link
        fedilink
        English
        3
        edit-2
        2 years ago

        Google crawls every link available on all websites to index and give to people. That’s a better example. Which is legal and up to the websites to protect their stuff

        • Cethin
          link
          fedilink
          English
          32 years ago

          It’s not a problem that it reads something. The problem is the thing that it produces should break copyright. Google search is not producing something, it reads everything to link you to that original copyrighted work. If it read it and then just spit out what’s read on its own, instead of sending you to the original creators, that wouldn’t be OK.

          • Schadrach
            link
            fedilink
            English
            22 years ago

            The blurb it puts out in the search results is much more directly “spitting out what’s read” than anything an LLM does. As are most other sorts of results that appear on the front page of a google search.

    • @[email protected]
      link
      fedilink
      English
      82 years ago

      Another sensationalist title. The article makes it clear that the problem is users reconstructing large portions of a copyrighted work word for word. OpenAI is trying to implement a solution that prevents ChatGPT from regurgitating entire copyrighted works using “maliciously designed” prompts. OpenAI doesn’t hide the fact that these tools were trained using copyrighted works and legally it probably isn’t an issue.

  • @[email protected]
    link
    fedilink
    English
    1762 years ago

    It’s a bit pedantic, but I’m not really sure I support this kind of extremist view of copyright and the scale of what’s being interpreted as ‘possessed’ under the idea of copyright. Once an idea is communicated, it becomes a part of the collective consciousness. Different people interpret and build upon that idea in various ways, making it a dynamic entity that evolves beyond the original creator’s intention. It’s like issues with sampling beats or records in the early days of hip-hop. It’s like the very principle of an idea goes against this vision; more than that, once you put something out into the commons, it’s irretrievable. It’s not really yours any more once it’s been communicated. I think if you want to keep an idea truly yours, then you should keep it to yourself. Otherwise you are participating in a shared vision of the idea. You don’t control how the idea is interpreted so it’s not really yours any more.

    Whether that’s ChatGPT or Public Enemy is neither here nor there to me. The idea that a work like Peter Pan is still possessed is such a very real but very silly obvious malady of this weirdly accepted but very extreme view of the ability to possess an idea.

    • @[email protected]
      link
      fedilink
      English
      82 years ago

      Copyright definitely needs to be stripped back severely. Artists need time to use their own work, but after a certain time everything needs to enter the public space for the sake of creativity.

    • @[email protected]
      link
      fedilink
      English
      172 years ago

      Well, I’d consider agreeing if the LLMs were considered as a generic knowledge database. However I had the impression that the whole response from OpenAI & cie. to this copyright issue is “they build original content”, both for LLMs and stable diffusion models. Now that they started this line of defence I think that they are stuck with proving that their “original content” is not derived from copyrighted content 🤷

      • @[email protected]
        link
        fedilink
        English
        22 years ago

        Well, I’d consider agreeing if the LLMs were considered as a generic knowledge database. However I had the impression that the whole response from OpenAI & cie. to this copyright issue is “they build original content”, both for LLMs and stable diffusion models. Now that they started this line of defence I think that they are stuck with proving that their “original content” is not derived from copyrighted content 🤷

        Yeah I suppose that’s on them.

    • @[email protected]
      link
      fedilink
      English
      54
      edit-2
      2 years ago

      AI isn’t interpreting anything. This isn’t the sci-fi style of AI that people think of; that’s general AI. This is narrow AI, which is really just an advanced algorithm. It can’t create new things with intent and design, it can only regurgitate a mix of pre-existing stuff based on narrow guidelines programmed into it to try and keep it coherent, with no actual thought or interpretation involved in the result. The issue isn’t that it’s derivative, the issue is that it can only ever be inherently derivative without any intentional interpretation or creativity, and nothing else.

      Even collage art has to qualify as fair use to avoid copyright infringement if it’s being done for profit, and fair use requires it to provide commentary, criticism, or parody of the original work used (which requires intent). Even if it’s transformative enough to make the original unrecognizable, if the majority of the work is not your own art, then you need to get permission to use it otherwise you aren’t automatically safe from getting in trouble over copyright. Even using images for photoshop involves creative commons and commercial use licenses. Fanart and fanfic are also considered a grey area and the only reason more of a stink isn’t kicked up over it regarding copyright is because it’s generally beneficial to the original creators, and credit is naturally provided by the nature of fan works so long as someone doesn’t try to claim the characters or IP as their own. So most creators turn a blind eye to the copyright aspect of the genre, but if any ever did want to kick up a stink, they could, and have in the past like with Anne Rice. And as a result most fanfiction sites do not allow writers to profit off of fanfics, or advertise fanfic commissions. And those are cases with actual humans being the ones to produce the works based on something that inspired them or that they are interpreting. So even human made derivative works have rules and laws applied to them as well. AI isn’t a creative force with thoughts and ideas and intent, it’s just a pattern recognition and replication tool, and it doesn’t benefit creators when it’s used to replace them entirely, like Hollywood is attempting to do (among other corporate entities). Viewing AI at least as critically as actual human beings is the very least we can do, as well as establishing protection for human creators so that they can’t be taken advantage of because of AI.

      I’m not inherently against AI as a concept and as a tool for creators to use, but I am against AI works with no human input being used to replace creators entirely, and I am against using works to train it without the permission of the original creators. Even in the artist/writer/etc communities it’s considered to be a common courtesy to credit other people/works that you based a work on or took inspiration from, even if what you made would be safe under copyright law regardless. Sure, humans get some leeway in this because we are imperfect meat creatures with imperfect memories and may not be aware of all our influences, but a coded algorithm doesn’t have that excuse. If the current AIs in circulation can’t function without being fed stolen works without credit or permission, then they’re simply not ready for commercial use yet as far as I’m concerned. If it’s never going to be possible, which I just simply don’t believe, then it should never be used commercially period. And it should be used by creators to assist in their work, not used to replace them entirely. If it takes longer to develop, fine. If it takes more effort and manpower, fine. That’s the price I’m willing to pay for it to be ethical. If it can’t be done ethically, then imo it shouldn’t be done at all.

      • Echoes in May
        link
        fedilink
        English
        22 years ago

        Neural networks are based on the same principles as the human brain, they are literally learning in the exact same way humans are. Copyrighting the training of neural nets is essentially the same thing as copyrighting interpretation and learning by humans.

        • @[email protected]
          link
          fedilink
          English
          42 years ago

          These AIs are not neural networks based on the human brain. They’re literally just algorithms designed to perform a single task.

      • @[email protected]
        link
        fedilink
        English
        82 years ago

        I disagree with your interpretation of how an AI works, but I think the way that AI works is pretty much irrelevant to the discussion in the first place. I think your argument stands completely the same regardless. Even if AI worked much like a human mind and was very intelligent and creative, I would still say that usage of an idea by AI without the consent of the original artist is fundamentally exploitative.

        You can easily train an AI (with next to no human labor) to launder an artist’s works, by using the artist’s own works as reference. There’s no human input or hard work involved, which is a factor in what dictates whether a work is transformative. I’d argue that if you can put a work into a machine, type in a prompt, and get a new work out, then you still haven’t really transformed it. No matter how creative or novel the work is, the reality is that no human really put any effort into it, and it was built off the backs of unpaid and uncredited artists.

        You could probably make an argument for being able to sell works made by an AI trained only on the public domain, but it still should not be copyrightable IMO, cause it’s not a human creation.

        TL;DR - No matter how creative an AI is, its works should not be considered transformative in a copyright sense, as no human did the transformation.

      • @[email protected]
        link
        fedilink
        English
        62 years ago

        I thought this way too, but after playing with ChatGPT and Midjourney near daily, I have seen many moments of creativity way beyond the source it was trained on. I think a good example that I saw was on a YouTube video (sorry I cannot recall which to link) where the prompt was animals made of sushi and wow, was it ever good and creative in how it made them, and it was photorealistic. This is just not something you can find anywhere on the Internet. I just did a search and found some hand drawn Japanese style sushi with eyes and such, but nothing like what I saw in that video.

        I have also experienced it suggesting ways to handle coding on my VR Theme Park app that are very unconventional and not something anyone has posted about as near as I can tell. It seems to be able to put 2 and 2 together and get 8. Likely as it sees so much of everything at once that it can connect the dots in ways we would struggle to. It is more than regurgitated data and it surprises me near daily.

        • @[email protected]
          link
          fedilink
          English
          42 years ago

          Just because you think it seems creative due to your lack of experience with human creativity, that doesn’t mean it is uniquely creative. It’s not, it can’t be by its very nature, it can only regurgitate an amalgamation of stuff fed into it. What you think you see is the equivalent of pareidolia.

      • @[email protected]
        link
        fedilink
        English
        62 years ago

        if it’s being done for profit, and fair use requires it to provide commentary, criticism, or parody of the original work used. Even if it’s transformative enough to make the original unrecognizable

        I’m going to need a source for that. Fair use is flexible and context-specific. It depends on the situation and four things: why, what, how much, and how it affects the work. No one thing is more important than the others, and it is possible to have a fair use defense even if you do not meet all the criteria of fair use.

        • @[email protected]
          link
          fedilink
          English
          162 years ago

          I’m a bit confused about what point you’re trying to make. There is not a single paragraph or example in the link you provided that doesn’t support what I’ve said, and none of the examples provided in that link are something that qualified as fair use despite not meeting any criteria. In fact one was the opposite, as something that met all the criteria but still didn’t qualify as fair use.

          The key aspect of how they define transformative is here:

          Has the material you have taken from the original work been transformed by adding new expression or meaning?

          These (narrow) AIs cannot add new expression or meaning, because they do not have intent. They are just replicating and rearranging learned patterns mindlessly.

          Was value added to the original by creating new information, new aesthetics, new insights, and understandings?

          These AIs can’t provide new information because they can’t create something new, they can only reconfigure previously provided info. They can’t provide new aesthetics for the same reason, they can only recreate pre-existing aesthetics from the works fed to them, and they definitely can’t provide new insights or understandings because again, there is no intent or interpretation going on, just regurgitation.

          The fact that it’s so strict that even stuff that meets all the criteria might still not qualify as fair use only supports what I said about how even derivative works made by humans are subject to a lot of laws and regulations, and if human works are under that much scrutiny then there’s no reason why AI works shouldn’t also be under at least as much scrutiny or more. The fact that so much of fair use defense is dependent on having intent, and providing new meaning, insights, and information, is just another reason why AI can’t hide behind fair use or be given a pass automatically because “humans make derivative works too”. Even derivative human works are subject to scrutiny, criticism, and regulation, and so should AI works.

          • @[email protected]
            link
            fedilink
            English
            3
            edit-2
            2 years ago

            I’m a bit confused about what point you’re trying to make. There is not a single paragraph or example in the link you provided that doesn’t support what I’ve said, and none of the examples provided in that link are something that qualified as fair use despite not meeting any criteria. In fact one was the opposite, as something that met all the criteria but still didn’t qualify as fair use.

            You said “…fair use requires it to provide commentary, criticism, or parody of the original work used.” This isn’t true; if you look at the summaries of fair use cases I provided you can see there are plenty of cases where there was no purpose stated.

            These (narrow) AIs cannot add new expression or meaning, because they do not have intent. They are just replicating and rearranging learned patterns mindlessly.

            You’re anthropomorphizing a machine here, the intent is that of the person using the tool, not the tool itself. These are tools made by humans for humans to use. It’s up to the artist to make all the content choices when it comes to the input and output and everything in between.

            These AIs can’t provide new information because they can’t create something new, they can only reconfigure previously provided info. They can’t provide new aesthetics for the same reason, they can only recreate pre-existing aesthetics from the works fed to them, and they definitely can’t provide new insights or understandings because again, there is no intent or interpretation going on, just regurgitation.

            I’m going to need a source on this too. This statement isn’t backed up with anything.

            The fact that it’s so strict that even stuff that meets all the criteria might still not qualify as fair use only supports what I said about how even derivative works made by humans are subject to a lot of laws and regulations, and if human works are under that much scrutiny then there’s no reason why AI works shouldn’t also be under at least as much scrutiny or more. The fact that so much of fair use defense is dependent on having intent, and providing new meaning, insights, and information, is just another reason why AI can’t hide behind fair use or be given a pass automatically because “humans make derivative works too”. Even derivative human works are subject to scrutiny, criticism, and regulation, and so should AI works.

            AI works are human works. AI can’t be authors or hold copyright.

      • Kogasa
        link
        fedilink
        English
        122 years ago

        Your broader point would be stronger if it weren’t framed around what seems like a misunderstanding of modern AI. To be clear, you don’t need to believe that AI is “just” a “coded algorithm” to believe it’s wrong for humans to exploit other humans with it. But to say that modern AI is “just an advanced algorithm” is technically correct in exactly the same way that a blender is “just a deterministic shuffling algorithm.” We understand that the blender chops up food by spinning a blade, and we understand that it turns solid food into liquid. The precise way in which it rearranges the matter of the food is both incomprehensible and irrelevant. In the same way, we understand the basic algorithms of model training and evaluation, and we understand the basic domain task that a model performs. The “rules” governing this behavior at a fine level are incomprehensible and irrelevant-- and certainly not dictated by humans. They are an emergent property of a simple algorithm applied to billions-to-trillions of numerical parameters, in which all the interesting behavior is encoded in some incomprehensible way.

        • @[email protected]
          link
          fedilink
          English
          4
          edit-2
          2 years ago

          Bro I don’t think you have any idea what you’re talking about. These AIs aren’t blenders, they are designed to recognize and replicate specific aspects of art and writing and whatever else, in a way that is coherent and recognizable. Unless there’s a blender that can sculpt Michelangelo’s David out of apple peels, AI isn’t like a blender in any way.

          But even if they were comparable, a blender is meant to produce chaos. It is meant to, you know, blend the food we put into it. So yes, the outcome is dictated by humans. We want the individual pieces to be indistinguishable, and deliberate design decisions get made by the humans making them to try and produce a blender that blends things sufficiently, and makes the right amount of chaos with as many ingredients as possible.

          And here’s the thing, if we wanted to determine what foods were put into a blender, even assuming we had blindfolds on while tossing random shit in, we could test the resulting mixture to determine what the ingredients were before they got mashed together. We also use blenders for our own personal use the majority of the time, not for profit, and we use our own fruits and vegetables rather than stuff we stole from a neighbor’s yard, which would be, you know, trespassing and theft. And even people who use blenders to make something that they sell or offer publicly almost always list the ingredients, like restaurants.

          So even if AI was like a blender, that wouldn’t be an excuse, nor would it contradict anything I’ve said.

          • Kogasa
            link
            fedilink
            English
            162 years ago

            Super interesting response, you managed to miss every possible point.

    • @[email protected]
      link
      fedilink
      English
      31
      2 years ago

      If you sample someone else’s music and turn around and try to sell it, without first asking permission from the original artist, that’s copyright infringement.

      So, if the same rules apply, as your post suggests, OpenAI is also infringing on copyright.

      • @[email protected]
        link
        fedilink
        English
        47
        2 years ago

        If you sample someone else’s music and turn around and try to sell it, without first asking permission from the original artist, that’s copyright infringement.

        I think you completely and thoroughly do not understand what I’m saying or why I’m saying it. Nowhere did I suggest that I do not understand modern copyright. I’m saying I’m questioning my belief in this extreme interpretation of copyright, which is represented by exactly what you just parroted. This interpretation is not only functionally and materially unworkable but also antithetical to a reasonable understanding of how ideas and communication work.

        • @[email protected]
          link
          fedilink
          English
          12
          2 years ago

          That’s life under capitalism.

          I agree with you in essence (I’ve put a lot of time into a free software game).

          However, people are entitled to the fruits of their labor, and until we learn to leave capitalism behind, artists have to protect their work to survive. To eat. To feed their kids. And pay their rent.

          Unless OpenAI is planning to pay out royalties to everyone they stole from, what they’re doing is illegal and immoral under our current, capitalist paradigm.

          • @[email protected]
            link
            fedilink
            English
            6
            2 years ago

            Yeah, this is definitely leaning a little too “People shouldn’t pump their own gas because gas attendants need to eat, feed their kids, pay rent” for me.

      • @[email protected]
        link
        fedilink
        English
        6
        2 years ago

        A sample is a fundamental part of a song’s output, not just its input. If LLMs change the input work to a high enough degree, is the result not protected as a transformative work?

        • @[email protected]
          link
          fedilink
          English
          1
          2 years ago

          it’s more like a collage of everyone’s words. it doesn’t make anything creative because it doesn’t have a body or life or real social inputs, you could say. basically it’s just rearranging other people’s words.

          A song that’s nothing but samples. but so many samples it hides that fact. this is my view anyway.

          and only a handful of people are getting rich off the outputs.

          if we were in some kinda post-capitalism economy or if we had UBI it wouldn’t bother me really. it’s not the artist’s ego I’m sticking up for, but their livelihood

    • @[email protected]
      link
      fedilink
      English
      6
      2 years ago

      To add to that, Harry Potter is the worst example to use here. There is no extra billion that JK Rowling needs to allow her to spend time writing more books.

      Copyright was meant to encourage authors to invest in their work in the same way that patents do. If you were going to argue about the issue of lifting content from books, you should be using books that need the protection of copyright, not ones that don’t.

      • @[email protected]
        link
        fedilink
        English
        7
        2 years ago

        Copyright was meant

        I just don’t know that I agree that this line of reasoning is useful. Who cares what it was meant for? What is it now, currently and functionally, doing?