How a $1.5 Billion Settlement Could Alter the Course of AI

Get more information about our first-ever live show here! Limited tickets left.

Artificial intelligence company Anthropic agreed to pay at least $1.5 billion to settle a copyright infringement lawsuit over the company's use of pirated books to train large language models. WSJ’s Melissa Korn unpacks the proposed settlement and explores what the precedent could mean for the AI industry. Ryan Knutson hosts.

Further Listening:

Why Elon Musk's AI Chatbot Went Rogue

The Company Behind ChatGPT

Sign up for WSJ’s free What’s News newsletter.

Learn more about your ad choices. Visit megaphone.fm/adchoices


Transcript

The boom in AI over the last few years has also led to a boom in lawsuits.

As artificially intelligent chatbots are growing more and more popular, a number of authors are now suing the companies behind them.

Two major studios have sued an AI startup, claiming it has, quote, blatantly copied famous movie characters.

Sarah Silverman is going into battle against the chatbots.

George R.R. Martin and more than a dozen other authors now suing.

The New York Times becoming the first major media company to sue over AI.

So we've got some movie studios suing.

We've got book publishers and authors.

We've got newspaper publishers, including two subsidiaries of News Corp, our parent company.

There's a lot of litigation right now over how AI companies are kind of coming into the material that they're using to train their large language models, and then also what they're doing with that material once it is in their system.

Companies like OpenAI have argued that training AI models amounts to fair use.

Last year, a group of authors filed one of those lawsuits against the AI company Anthropic.

They alleged that Anthropic, the AI startup, infringed their copyrights for a number of books because of the way Anthropic uploaded and then used the material for training and other purposes.

Last week, there was a big development in the case.

Artificial intelligence company Anthropic has agreed to pay $1.5 billion to settle a class action lawsuit brought by...

It's big.

If it's approved, it will still help influence how AI companies think about taking content, how publishers or other media companies might want to try to strike deals with AI companies, and it starts to put a bit of a price tag on what that material is worth to the AI company.

Welcome to The Journal, our show about money, business, and power.

I'm Ryan Knutson.

It's Wednesday, September 10th.

Coming up on the show, Anthropic's proposed settlement and what it means for the future of AI.

This episode is presented by SAP.

A bad storm hitting your warehouse.

Incomplete customs forms.

A short supply of those little plastic twist ties.

These could all deal a crushing setback to your business, but they don't have to.

The AI-powered capabilities of SAP will help you navigate uncertainty.

You can pivot to new suppliers, automate paperwork, and source the twist ties you need so your business can stay unstoppable.

Learn more at sap.com/uncertainty.

This episode is brought to you by AARP.

They have reskilling courses and career tools to help your income live as long as you do.

The younger you are, the more you need AARP.

Learn more at aarp.org/skills.

All right.

Well, so first of all, can you introduce yourself?

My name is Melissa Korn, and I'm an editor on the tech and media team.

And are you a human or a robot?

I am a human.

Are you sure?

Fairly certain.

I can tell you, you look like a human.

It's natural intelligence, not artificial.

Okay, so now introduce us to Anthropic, which makes the Claude chatbot, whose intelligence is artificial.

Anthropic is a very fast-growing AI company.

I mean, its valuation has tripled in just the last six or seven months.

It is now valued at $183 billion.

It is one of the kind of white-hot AI companies.

There's quite a few of them these days.

But it is really one of the giants in this space.

And how does Anthropic train Claude?

So Claude, we're acting as if Claude is a guy sitting in the room with us, right?

But Anthropic and all the other AI companies feed tons and tons of content into these computers to train them, to make them learn, to make Claude learn how to put a sentence together, what's happened in history, how to do math, all of that.

It's sort of like the underlying principle, right, is just sort of the same way a human learns, where you just expose it to enough material and then it'll identify the patterns and then it can start kind of, I guess, arguably, thinking for itself.

Right.

And it's not quite thinking for itself yet, but it is learning how to assess information that it's given.

So these computer models, unlike you and me, ingest everything really, really fast, right?

It takes me a few seconds to read a sentence.

Claude can read an entire book extremely quickly.

So it can ingest millions of books very quickly and then get to training and get to learning.

Are you saying you haven't read millions of books?

I mean, I have a pretty good library at home, but I don't think it's quite millions, no.

And I certainly haven't done it in a matter of minutes or hours.

Ah, God, that would be so cool, wouldn't it?

Just to be able to read every book that came out.

I don't know.

I kind of enjoy the process.

I relish the experience.

I don't want it to go too fast.

Okay, human.

Last year, three humans, the authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, filed a lawsuit against Anthropic, arguing the company improperly used their copyrighted works to train its AI models.

They said that they think this should be a class action lawsuit because they are representing a number of other authors.

So they filed the suit last summer.

Things started to move through the court system.

Obviously, that's a slow, laborious process.

It's not millions of books a second speed.

It's judicial process speed.

Correct, correct.

So according to the lawsuit, how did Anthropic allegedly acquire the books that it used to train its models?

So there were a few different ways that Anthropic got the books.

The lawsuit says that Anthropic stole a lot.

So they used these kind of shadow libraries, places like LibGen, and there's a few others.

Almost like, if you remember, Napster back in our teen days.

LimeWire.

LimeWire, exactly, right?

Like it just, people had uploaded these things and they downloaded them.

That simple.

They also bought books and had some other access to books that are, you know, out of copyright.

So they got the books through a few different means and in some cases got the same book a few different ways, but they did not go out and purchase a copy of every single book that they used to train Claude.

And that's according to the lawsuit.

So in the court case, what did Anthropic argue?

And what did the authors argue?

So Anthropic asked for this summary judgment on the fair use argument.

Meaning they wanted the case dismissed.

Exactly.

And just help me understand, what is the fair use argument they're making here?

If you wrote a book and I copied your book word for word and published it as my own, that would be problematic.

If you wrote a book and I wrote a book report about it, an analysis of it, or I used it to inspire another book that maybe has some similar characters, but is otherwise quite different.

Or I wrote a song about your book.

That's fair use because I'm not just copying what you already did and have protected under the law.

You're reading my book, and then you are creating some new work out of it that's transforming it into something else.

Exactly.

So that word transformative is important here.

That's generally the bar, that it needs to be transformative, or it needs to have been accessed in a fair way, such that there is compensation for the artist.

And so Anthropic is arguing: yeah, we're reading the book, but we're also reading millions of books, and it's just making our AI really smart.

We're not just like stealing the book and then distributing it to other people to read.

Exactly.

The authors, meanwhile, argue that Anthropic stole their books and that they deserve to be compensated.

And earlier this summer, the judge reached an important decision.

So in June, the judge issued a summary judgment for Anthropic on the fair use argument, saying it's okay for AI companies, or for Anthropic specifically here, to use this content in this way.

It's fair use.

That part of the argument, the authors are not going to win.

And that seems like kind of a big deal here, because if a judge does issue a summary judgment, that is a court precedent where a judge is saying, yes, reading all these books, using it to train your model, that's an okay thing to do.

That is fair use.

Right, exactly.

So, this was seen as a big win for Anthropic, for AI companies in general, but came with a bit of an asterisk with the judge saying, yes, the way you've used this stuff is fair use, but the way you got some of it is not okay.

And that asterisk is a very expensive asterisk.

That's next.

This episode is brought to you by Progressive Insurance.

Do you ever find yourself playing the budgeting game?

Well, with the name Your Price tool from Progressive, you can find options that fit your budget and potentially lower your bills.

Try it at Progressive.com.

Progressive Casualty Insurance Company and Affiliates.

Price and Coverage Match Limited by State Law.

Not available in all states.

At blinds.com, it's not just about window treatments, it's about you, your style, your space, your way.

Whether you DIY or want the pros to handle it all, you'll have the confidence of knowing it's done right.

From free expert design help to our 100% satisfaction guarantee, everything we do is made to fit your life and your windows.

Because at blinds.com, the only thing we treat better than windows is you.

Visit blinds.com now for up to 50% off with minimum purchase plus a professional measure at no cost.

Rules and restrictions apply.

The judge's ruling came with some good news for Anthropic.

He said the company's fair use claim was valid.

It could use copyrighted books to train its AI models.

But the judge also gave Anthropic some bad news.

What he did say was Anthropic had no entitlement to use pirated copies for its library.

So how they got the books became the thing at issue.

In other words, it's okay to use the books.

It's not okay to steal the books.

Exactly.

Violating copyright law can carry a penalty of anywhere between $750 and $150,000 per book.

And how many books are we talking here?

The initial number was that there were 7 million works that were used by Anthropic.

The numbers, the dollar signs could start to add up very, very quickly if this class action lawsuit went forward and a jury found in favor of this class of authors against Anthropic.

It could be existential.

This brings us to the proposed settlement that was announced last week.

So the headline number of the proposed settlement is $1.5 billion, which is how much Anthropic would pay to the class to compensate them for the stolen materials, for the pirated materials.

As part of the settlement, the number of impacted books got whittled down from over 7 million to around 500,000.

And you get to that 500,000 from that initial 7 million because some works were ultimately purchased.

There's some duplication, a lot of duplication in the initial list of what was downloaded.

Some books were not actually registered with the copyright office the way one would assume they are right away.

So we get to about roughly 500,000.

We do not know what books, what works are included in that list yet.

We do not know exactly how the payouts will come, right?

Authors, publishers, who gets what portion of what money.

How much money do we think will go to each author?

So it's roughly $3,000 per work.

But again, we don't know if that's to the author, a portion to the author, a portion to a publisher, how that gets worked out.

And $1.5 billion.

Is that a lot of money to Anthropic?

Anthropic's valuation is $183 billion.

So it's, you know, not going to bankrupt the company, which is, you know, part of why they wanted to settle.

It's not nothing, but it is not going to debilitate them.

While the authors and Anthropic have agreed to the terms of the settlement, there's still one more hurdle.

The judge has to approve it.

And he's expressed some skepticism.

So over the weekend, the judge issued an order saying, we're going to have a hearing to discuss the proposed settlement, but it was very like, I'm not angry, I'm disappointed vibes from the letter.

I'm not mad at you.

I'm mad at your behavior.

Pretty much.

And it used the word disappointed, right?

The judge said, like, there are still a lot of unanswered questions here.

And I am not totally convinced that you have enough of it worked out yet for me to approve the settlement.

I see.

So it's not necessarily that it should be more money or less money, just more like there's not really enough detail here yet?

Right. So he says: we don't know what the list of works is yet. We don't know what the list of authors is yet. How do you make sure everyone knows that they're eligible for a potential payout? How are you going to do the payouts?

So they held a hearing earlier this week in San Francisco in court, and the judge was pretty harsh in laying out, especially to the plaintiff's lawyers, I don't think you have it all there yet.

It's not fully baked.

And he also said that he was concerned that there would be some more lawsuits, like people coming out of the woodwork after this, right?

Saying, wait, I wasn't included.

I should have been included.

I wasn't notified fairly.

I should have been notified.

The goal of a settlement is to end litigation, not launch a whole bunch of new stuff.

He set a schedule of deadlines over the next few weeks.

You know, you have to have a list of the works that are involved by X date.

You have to have the class list by X date.

They're going to convene again in a few weeks in early October, and he will presumably then approve or reject the proposed settlement.

So what might this mean for all those other cases that are playing out in court right now between creators and AI companies?

It will have an impact.

Again, it's not legal precedent, so it won't immediately dismiss everyone else's cases or anything like that.

But the other cases are still moving forward in a lot of instances, including there was a new case filed last week, right?

The day that Anthropic and the authors announced their proposed settlement, there was another lawsuit filed by some authors against Apple about how it ingested books to train its AI models.

So there's still a lot of litigation action coming down the pike.

And not just in books, right?

We're talking movies, we're talking songs, we're talking all sorts of creative content.

It feels like as a society and as a legal system, we're feeling our way through the dark to try to figure out and find an answer to this really big question.

Yes, and I think we're starting to get a few answers, right?

That the AI companies can use this content.

It is fair use.

But there's still a lot of unknowns about what a compensation model should look like or could look like and how that's going to differ by medium, how that's going to differ by AI company.

And so you're seeing this in litigation.

You're also seeing this in discussions between AI companies and media companies about what can they do on the front end so that they don't end up in litigation on the back end.

Has anybody asked Claude, Anthropic's AI machine, what it thinks the answer should be for how authors and creators should be compensated?

I have not, but I have a feeling I know what you're going to be doing later this morning.

Certainly.

Hey, Claude, mind if I ask you a question?

Of course, go ahead and ask me anything.

How much do you think authors should be compensated for having their work used to train AI models?

That's a really complex question with valid arguments on multiple sides.

Authors deserve fair compensation when their creative work contributes value to AI systems.

But the mechanics of how that would work are challenging to figure out.

What's your take on it?

I'm asking you.

I think authors should receive some form of compensation, especially when their work directly contributes to commercial AI systems.

The challenge is creating a fair system that recognizes their contribution without making AI development prohibitively expensive.

Okay.

Thanks, Claude.

You're welcome.

Before we go, I want to make sure you heard that the journal is hosting our first ever live show in New York City next month.

We'll be at the Gramercy Theatre on Tuesday, October 7th.

There are literally only a handful of tickets left.

We are almost sold out.

So if you want to be there, head to bit.ly/thejournallive25 for tickets and more information.

You can find the link in our show notes.

We'd love to see you there.

That's all for today.

Wednesday, September 10th.

A quick note: The Wall Street Journal's parent company, News Corp, has a content deal with OpenAI.

Two of News Corp's subsidiaries have also sued Perplexity.

The Journal is a co-production of Spotify and The Wall Street Journal.

Additional reporting in this episode by Jeffrey Trachtenberg and Angel Au-Yeung.

Thanks for listening.

See you tomorrow.