Meta beats Kadrey, AI training was fair use — what this means

David Gerard · 10 days ago

@[email protected] · 10 days ago

@[email protected] · 9 days ago

I think that in most EU countries, after lobbying from US copyright corporations, it is explicitly banned to make copies from an illegal original. This was done to criminalise downloads from torrents whether or not you seed. The potential punishment typically involves jail sentences, in order to give the police access to the surveillance necessary to prove the crime. Plus, copyright violation is the only crime that, in all EU countries, also yields punishing damages.

Now, I know this because I opposed every single one of these disproportionate laws, but some copyright organisations over here should know it too. Just saying it would be fun if Meta got to pay out punishing damages. And even funnier if Zuckerberg got some jail time.

sunzu2 · 8 days ago

> Just saying it would be fun if Meta got to pay out punishing damages.

It would be pretty great, but we both know that’s not how this cookie crumbles.

The oligarchs never get the dick of the law; their property is protected.

Plebs pay the taxes, and their property and rights are to be looted to enable oligarchs to live their best lives.

Limp-dick regimes enable it because corruption fucks hard.

And just like that, we are living in the future, and it is a dystopia without much of the cool tech. Just endless extraction and deteriorating socioeconomic conditions.

@[email protected] · 8 days ago

Read carefully. On p1-2, the judge makes it clear that “the incentive for human beings to create artistic and scientific works” is “the ability of copyright holders to make money from their works”; to the law, there isn’t any other reason to publish art. This is why I’m so dour on copyright, folks; it’s not for you who love to make art and prize it for its cultural impact and expressive power, but for folks who want to trade art for money.

On p3, a contrast appears between Chhabria and Alsup (yes, that Alsup); the latter knows what a computer is and how to program it, and this makes him less respectful of copyright overall. Chhabria doesn’t really hide that they think Meta didn’t earn their summary judgement, presumably because they disagree with Alsup about whether this is a “competitive or creative displacement.” That’s fair given the central pillar of the decision on p4:

> Llama is not capable of generating enough text from the plaintiffs’ books to matter, and the plaintiffs are not entitled to the market for licensing their works as AI training data.

An analogy might make this clearer. Suppose a transient person on a street corner is babbling. Occasionally they spout what sounds like a quote from a Star Wars film. Intrigued, we prompt the transient to recite the entirety of Star Wars, and they proceed to mostly recreate the original film, complete with sound effects and voice acting, only getting a few details wrong. Does it matter whether the transient paid to watch the original film (as opposed to somebody else paying the fee)? No, their recreation might be candid and yet not faithful enough to infringe. Is Lucas entitled to a licensing fee for every time the transient happens to learn something about Star Wars? Eh, not yet, but Disney’s working on it. This is why everybody is so concerned about whether the material was pirated, regardless of how it was paid for; they want to say that what’s disallowed is not the babbling on the street but the access to the copyrighted material itself.

Almost every technical claim on p8-9 is simplified to the point of incorrectness. They are talking points about Transformers turned into aphorisms and then axioms. The wrongest claim is on p9, that “to be able to generate a wide range of text … an LLM’s training data set must be large and diverse” (it need only be diverse, not large), followed by the claim that an LLM’s “memory” must be trained on books or equivalent “especially valuable training data” in order to “work with larger amounts of text at once” (conflating hyperparameters with learned parameters). These claims show how the judge fails to actually engage with the technical details and thus paints with a broad brush dipped in the wrong color.

On p12, the technical wrongness overflows. Any language model can be forced to replicate a copyrighted work, or to avoid replication, by sampling techniques; this is why perplexity is so important as a metric. What would have genuinely been interesting is whether Llama is low-perplexity on the copyrighted works, not the rate of exact replications, since that’s the key to getting Llama to produce unlimited Harry Potter slash or whatever.
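
To make the perplexity point concrete: perplexity is the exponential of the mean negative log-probability a model assigns to each actual next token of a passage. Here's a minimal Python sketch, assuming we already have those per-token probabilities (the numbers below are made up for illustration, not measured from Llama or any real model). It also includes a sufficient, but not necessary, check: if every true token gets probability above 0.5, it is the model's top choice at each step, so greedy decoding from the passage's opening would reproduce the passage exactly.

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-likelihood of the observed tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)

def greedy_reproduces(token_probs):
    """Sufficient (not necessary) condition for greedy decoding to reproduce
    the passage: each true token has probability > 0.5, so it is the argmax."""
    return all(p > 0.5 for p in token_probs)

# Hypothetical per-token probabilities, NOT measurements from any real model.
memorized = [0.95, 0.90, 0.97, 0.92]  # model confidently expects each token
novel = [0.05, 0.10, 0.02, 0.08]      # model is surprised at most tokens

for name, probs in [("memorized", memorized), ("novel", novel)]:
    print(name, round(perplexity(probs), 2), greedy_reproduces(probs))
```

A perplexity of 1 means the model was certain of every token, and uniform guessing over k options gives perplexity k. Exact-match replication rates, by contrast, depend on which sampling settings you happen to use.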

On p17 the judge ought to read up on how Shannon and Markov initially figured out information theory. LLMs read like Shannon’s model, and in that sense they’re just like humans: left to right, top to bottom, chunking characters into words, predicting shapes and punctuation. Pretending otherwise is powdered-wig sophistry or perhaps robophobia.
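
Shannon's 1948 paper illustrated English with letter-level approximations in which each next letter depends on the few letters before it, which is a Markov chain over characters. A minimal Python sketch of that idea follows; the tiny corpus, the `order` parameter, and the function names are all hypothetical choices for illustration. It is not how Llama works internally (an LLM replaces the count table with a learned neural network over subword tokens), but it likewise reads left to right and predicts what comes next.

```python
from collections import Counter, defaultdict

def train_char_ngrams(text, order=2):
    """Count which character follows each `order`-length context,
    reading the text strictly left to right."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        counts[context][text[i + order]] += 1
    return counts

def predict_next(counts, context):
    """Most frequent character seen after `context`, or None if never seen."""
    followers = counts.get(context)  # .get avoids creating empty entries
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def generate(counts, seed, length, order=2):
    """Greedily extend `seed` one character at a time."""
    out = seed
    for _ in range(length):
        nxt = predict_next(counts, out[-order:])
        if nxt is None:
            break
        out += nxt
    return out

corpus = "the cat sat on the mat. the cat ate the rat."
model = train_char_ngrams(corpus, order=2)
print(generate(model, "th", 20))
```

Raising `order` tends to make the output look more like the corpus, and with a small enough corpus it starts regurgitating it verbatim, which is the memorization question in miniature.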

On p23 Meta cites fuckin’ Sega v. Accolade! This is how I know y’all don’t read the opinions; you’d be hyped too. I want to see them cite Galoob next. For those of you who don’t remember the 90s, the NES and Genesis were video game consoles, and these cases established our right to emulate them and write our own games for them.

p28-36 is the judge giving free legal advice. I find their line of argumentation tenuous. Consider Minions; Minions are bad, Minions are generic, and Minions can be used to crank out infinite amounts of slop. But, as established at the top, whoever owns Minions has the right to profit from Minions, and that is the lone incentive by which they go to market. However, Minions are arbitrary; there’s no reason why they should do well in the market, given how generic and bad they are. So if we accept their argument then copyright becomes an excuse for arbitrary winners to extract rent from cultural artifacts. For a serious example, look up the ironic commercialization of the Monopoly brand.

Roamin' Chemicals · 10 days ago

@dgerard I was pretty bummed out about this, but the judge in the Meta case seems confident that future lawsuits will find that training on copyrighted works *isn’t* fair use; it’s just that these particular plaintiffs made bad arguments.

https://arstechnica.com/tech-policy/2025/06/book-authors-made-the-wrong-arguments-in-meta-ai-training-case-judge-says/

@[email protected] · 8 days ago

As somewhat of an author, I fucking can’t understand how.

To win, they’d need to demonstrate specific harms (from a specific infringer to a specific book); “Amazon is full of slop” won’t do.

It’s like someone makes a movie without licensing from the book author, and then the judge says that authors must argue that movies harm book sales.

edit: except much, much worse, because good luck pointing at specific instances of slop and connecting them to a specific AI and its training on a specific work. At least with a movie you can point at the movie and at the book it’s made from.

edit: frankly the whole thing just sounds like both of the judges had to sound neutral, and because Meta’s conduct was more egregious the judge had to write weirder stuff to sound neutral.

Anthropic’s judge can simply slap them with (likely insignificant) fines in light of their displaying “good faith” by buying a bunch of books legally. Meta’s judge had to invent a whole new theory of unfair use that plaintiffs’ lawyers can’t possibly support with evidence.

Roamin' Chemicals · 8 days ago

@diz I’m not a legal expert, but I think it’s more straightforward than your analogy. It’s literally books used to make books to be sold in the same book market. The derivative work is clearly supplanting the original, which fair use law is supposed to prevent. See point 4 in this link. I think what the judge is arguing is that the courts will find that AI’s disruption of the markets its training data comes from makes it not a fair use of the training material.

https://www.law.cornell.edu/uscode/text/17/107

@[email protected] · 8 days ago

But he’s saying that plaintiffs need to demonstrate said disruption, to even get to the jury.

He said:

> In cases involving uses like Meta’s, it seems like the plaintiffs will often win, at least where those cases have better-developed records on the market effects of the defendant’s use

And what are those records supposed to look like? Harm has to be specific; it always has been. How do you ever demonstrate that a specific AI harmed the market for a specific book?

I honestly think both judges had to try to appear neutral and Meta’s had to work harder to appear neutral because Meta’s conduct was worse, hence a more bizarre argument. Misanthropic can be slapped with a small fine.

Roamin' Chemicals · 8 days ago

@diz That’s fair, and I don’t know; I’ve pretty much reached the limit of my knowledge here. I guess I just can’t bring myself to cede the victory to the AI companies and say “training IS fair use” if the judge in one of the cases in question thinks there’s a good chance it generally isn’t. Maybe in the future it will be settled, but I don’t think we’re there yet.

@[email protected] · 8 days ago

Thought about it some more, the most charitable I can get is that Meta’s judge thinks someone else could win the case if they have a specific book that was torrented and then they point at the general situation with AI slop in bookstores and argue that AI harms book sales.

I cannot imagine that working. At all. So the AI is producing slop of infinitesimally higher quality because it was trained on a pirated copy of your book in particular. Clearly the extra harm to specifically your business due to piracy of specifically your books would be rather small, as this very judge would immediately point out. In fact the AI slop is so shit that people only buy it by mistake, so its quality doesn’t really matter.

Maybe news companies could sometimes win lawsuits like this, but book authors, no way.

I think it is just pure copium to see this ruling in any kind of positive light. Alsup (Misanthropic’s judge) at least was willing to ding an AI company for pirating books (although he was probably only willing to do so because it wouldn’t be fatal to them the way it would be to Meta). This guy wouldn’t even do that bare minimum.

And the whole approach is insane. You can’t make a movie without getting a movie rights contract with the author. A movie adaptation of a book is far more transformative than anything AI does, especially the “training”, which is just fucking gradient descent: you nudge a bunch of numbers towards replicating the works, over and over again, in a purely mechanical process.
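
For anyone who hasn't seen it, here is roughly what that mechanical nudging looks like, as a minimal Python sketch under heavy simplifying assumptions: one fixed context, a hypothetical four-word vocabulary, and a softmax over raw logits trained with plain gradient descent on cross-entropy loss. Real LLM training backpropagates through a deep network with billions of shared parameters and usually uses an optimizer like Adam, but each step is broadly the same kind of update toward making the training text more probable.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities (shifted by max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_step(logits, target, lr=0.5):
    """One gradient-descent step on cross-entropy loss -log p[target].
    The gradient w.r.t. each logit is p[i] - (1 if i == target else 0)."""
    probs = softmax(logits)
    return [x - lr * (p - (1.0 if i == target else 0.0))
            for i, (x, p) in enumerate(zip(logits, probs))]

# Hypothetical toy setup: the "work" always continues one fixed context
# with vocab[2]; the model starts out with no preference at all.
vocab = ["the", "cat", "sat", "mat"]
logits = [0.0, 0.0, 0.0, 0.0]
target = 2
for _ in range(50):
    logits = train_step(logits, target)

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(vocab[best], round(probs[target], 3))
```

Run it and the probability of “sat” after that context climbs toward 1: the numbers have simply been pushed, step by step, toward replicating the one continuation they were shown.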

Nobody ever had to successfully argue that movie sales harm book sales just to treat movie adaptations as derivative work.

Roamin' Chemicals · 8 days ago

@diz I admit, it would be challenging. I don’t think it’s cope, though, because I think there are practical reasons not to cede victory. I think once “AI is fair use” becomes a meme, many will assume it to mean “AI is ethical”, and belief that there are no open legal questions will increase adoption.

Like, the literal fact of the matter is that the courts haven’t decided this categorically. Why get ahead of ourselves and pretend they have, just because it seems inevitable? What’s the benefit?

@[email protected] · 8 days ago

It’s not about ceding victory; it’s about whether or not we accept shit-talking the plaintiffs’ lawyers as an adequate substitute for a slap on the wrist. Clearly the judge wants to appear impartial.

The plaintiffs made a perfectly good argument that Meta downloaded the books illegally, and that this downloading wasn’t necessary to enable a (fair or not) use. A human critic does not get a blanket license to pirate any work he might want to criticize, even though critique is fair use.

@[email protected] · 9 days ago

At the very least, there is currently some legal uncertainty for all the AI companies that behave in this way, which is a good thing, I guess.

@[email protected] · 8 days ago

So, the judge says:

> In cases involving uses like Meta’s, it seems like the plaintiffs will often win, at least where those cases have better-developed records on the market effects of the defendant’s use.

And what is that supposed to ever look like? Do authors need a better developed record of effects of movies on book sales, to get paid for movie adaptations, too?