This Site Unmasks Cops With Facial Recognition
YouTube version: https://youtu.be/1eieXQIaALA
‘FuckLAPD.com’ Lets Anyone Use Facial Recognition to Instantly Identify Cops
Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not
The AI Slop Fight Between Iran and Israel
Thumbnail credit: Photo by Sean Lee/Unsplash
Subscribe at 404media.co for bonus content
Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
Hello, welcome to the 404 Media Podcast, where we bring you unparalleled access to hidden worlds, both online and IRL.
404 Media is a journalist-founded company and needs your support.
To subscribe, go to 404media.co.
As well as bonus content every single week, subscribers also get access to additional episodes where we respond to their best comments.
Gain access to that content at 404media.co.
I'm your host, Joseph, and with me are 404 Media co-founders Sam Cole.
Hello. Emmanuel Maiberg,
and Jason Koebler.
Hi.
So we're doing something unprecedented for 404 Media next week.
We are taking the week off.
We have not done that since we launched in around August 2023.
Obviously, we've each taken days off here and there, but we've never taken extended time off as a company.
So we'll be off for the week.
Um, I notice other podcasts do this as well, and they'll do a rerun or something like that.
There should be a podcast up next week anyway, it will be probably an interview that Jason recorded.
We're still working out the specifics, and we actually want to do more of those interviews as well.
So, um, keep an eye out for that.
But you won't be getting the normal show next week, you'll be getting uh something a little bit different.
Let's go straight into this week's stories, though.
This is one, um, Emmanuel wrote: FuckLAPD.com lets anyone use facial recognition to instantly identify cops.
I guess, just first of all, Emmanuel, how did you first come across this?
Because I came across it as well, and I'll tell you how, but how did you first see this?
I think I was going to say, oh, I didn't know
that you actually saw it also, so I'm interested
in knowing.
I found out about it via a subreddit that is dedicated to
um
monitoring, like, very aggressive ICE crackdowns and ICE raids, and they were just sharing it as a possibly useful tool. Where did you see it? So I saw it on TikTok, where a person was using this tool, and we'll explain a little bit more about what it is and sort of, uh, everything about it in a second, but basically it's, as the headline says, a facial recognition tool to identify cops.
I saw somebody on TikTok using it: goes up to a police officer, points their iPhone or whatever camera at the face, brings up the guy, the cop's name, reads out his salary (oh wow) to the cop, and then the cop starts laughing a little bit. I was gonna try and find it again, but I couldn't find the video. How much did he make, do you remember? It was about 100k.
And yeah, you can see him sniggering. It's very, very funny. But can I tell you when I saw it? Because I either hallucinated this, but I thought I saw something like this, like, a year ago, to be totally honest with you. We talked about this. Yeah, that's why when I saw it at the weekend, I was like, ha ha, that's funny, they're still using that tool. And I didn't actually realize it was new, so sorry.
But I don't know what it was. I don't think we wrote about it, and I don't remember where it was. This is not, like, terribly interesting. I wasn't here yesterday, though. But it's like I either dreamed about this or I saw a site that looked really similar to this. Like, there is an atmosphere.
So let me explain what this is, and then we can talk about, like, why we all thought it already existed.
And which is the first thing I asked when I pitched it in Slack.
I was like, Isn't this old?
Anyway,
fucklapd.com is a site where it has a very simple interface.
All you can do
is upload a single image of an LAPD officer's face, and it will use facial recognition tech
and
a database of,
from what I can tell, most of the LAPD
workforce to automatically identify which police officer it is, pull up their name, badge number, and then also, yes, their salary, all of which are public records, which is why this data is available.
And the reason this is something that has been launched now,
according to the guy who made it, Kyle McDonald, he launched a site on Saturday, and this was in response to the anti-ICE protests in LA,
which got
very heated.
And there were many instances of
police violence against protesters.
As everybody, I think, knows by now, the National Guard was called in.
There was a huge confrontation between the governor, Gavin Newsom, and Trump.
And yeah, in response to
LAPD playing fast and loose with protesters, he just created a tool that would allow people, hopefully, to identify cops who sometimes cover their badges.
I haven't seen this happen
in the LA protests recently.
Maybe Jason knows more, but definitely
I remember during the George Floyd protest in New York, there were instances of cops who just like covered their badge number with a piece of tape.
Very big during BLM.
Right, which they're supposed to keep visible for the purpose of allowing the population to like identify who is enforcing the law and who is maybe using too much force.
And this would allow you to get around that trick that some cops use.
Yeah.
I haven't seen them doing it much in the anti-ICE ones, but
there have been clips of horseback LAPD officers beating up people when they don't think the cameras are looking and all of that sort of thing.
Plenty of examples.
So you spoke to Kyle McDonald, who is an artist who made this site.
And they've done other work, which I think we'll touch on in a minute.
You spoke to him
just a little bit more explicitly.
Why did he make this?
Is it like for transparency?
Is it for accountability?
Like, what does he tell you when you asked him, hey, why did you actually make this?
Yeah, so first of all, it's Kyle McDonald.
He has a website at kylemcdonald.net.
I really recommend people check it out.
He's been working for more than a decade now doing this kind of work, doing exactly the kind of work that we used to cover all the time at Motherboard and that I have covered a few times here at 404,
just subverting technology, using it in unexpected ways to
make art and often make pretty explicit
political statements.
In terms of this project, let me try and stumble through his quote here, because I think it's pretty good and direct.
He told me, We deserve to know who is shooting us in the face, even when they have their badge covered up.
Fucklapd.com is a response to the violence of the LAPD during the recent protest against the horrific ICE raids, and more broadly, the failure of the LAPD to accomplish anything useful with over $2 billion in funding each year.
So I think that that is
pretty good summation of why he's doing this.
Yeah, that makes makes sense.
So,
Jason, will you?
I can, okay, so now I remember where it came from or like why I thought that this existed previously is back in 2023,
there was a website called Watch the Watchers, and it was a database of officers' headshots, names, hire dates, ranks, and ethnicities
that was compiled using public records requests.
And the website
looked sort of similar, and you could search by badge number.
And it was billed as counter-surveillance.
Emmanuel, do you know if this is related?
It is, yeah.
So
Kyle really wanted to stress.
So the organization you're talking about, Watch the Watchers, is part of something called the Stop LAPD Spying Coalition, which is based in Skid Row in LA, which is a neighborhood with a lot of homeless people and a lot of
people really having drug problems, mental health problems, kind of like a notorious part of the city.
So they created this database.
They're not affiliated with fucklapd.com, but that is where he pulled the data.
He is just
using facial recognition to match the headshots that are in that database that then pull up the information about the individual officers.
Yeah, rather than searching a badge number, you search a face,
which practically speaking is probably going to be a lot more useful to people in the protest.
Well, first of all, because if a badge number is covered, obviously, then they're still going to be able to get it.
But also, just in the sort of hecticness of the situation of a protest, it might be easier just to point a camera at somebody and then get their photo and then do it that way.
So Watch the Watchers gets this data through public records requests, which, I haven't seen the actual language of that, but presumably it's like, give us all the information on your staff. And I think they had to sue for some of that. It wasn't like an easy, it's like, I think, um,
I mean, not to disrespect Kyle's work here, I think it's very good and interesting, but
404 Media is huge fans of FOIA. Like, really, you have to admire the work of just, like, getting this data out. Oh yeah, yeah, yeah, for sure. Um,
And it looks like it works because you tested it a few times.
Walk us through that.
What did you do and what were the results?
Yeah, so I wanted to kind of give it an easy test case.
So I just took a still from an LAPD press conference responding to accusations of police violence during the protest, actually.
And I just...
screenshotted one of the faces of the cops that was, you know, a front shot of his face, like very clear.
And I uploaded that to the site.
Within a few seconds, it pulled up nine results.
And each result is, like, headshot, name, and a link to the Watch the Watchers database page for that person.
The officer was, like, a bald white man, and it showed me nine
officers that vaguely matched that description.
But the first one was the correct one.
So it worked perfectly in this case.
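The matching flow Emmanuel describes, embedding the uploaded face, comparing it against the database headshots, and returning the closest candidates ranked, can be sketched roughly like this. Everything here is a toy illustration: the officer names, badge numbers, and tiny four-number vectors are invented stand-ins for the 128-dimensional embeddings a real face-recognition model would produce.

```python
import math

# Hypothetical "face embeddings" -- a real system would store vectors
# produced by a face-recognition model, one per database headshot.
# Names and badge numbers below are made up for illustration.
HEADSHOT_DB = {
    "Officer A (badge 1001)": [0.9, 0.1, 0.3, 0.5],
    "Officer B (badge 1002)": [0.2, 0.8, 0.4, 0.1],
    "Officer C (badge 1003)": [0.85, 0.15, 0.35, 0.45],
}

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_face(query_embedding, db, top_k=3):
    """Rank known headshots by distance to the query face embedding."""
    ranked = sorted(db.items(), key=lambda kv: euclidean(query_embedding, kv[1]))
    return [name for name, _ in ranked[:top_k]]

# A query face close to Officer A's embedding should rank A first, with
# the visually similar Officer C next -- mirroring the "several vaguely
# matching results, correct one first" behavior described above.
print(match_face([0.88, 0.12, 0.32, 0.48], HEADSHOT_DB))
```

A real pipeline would first run a face detector and an embedding model over the uploaded photo; the ranking step itself is just nearest-neighbor search like this, which is cheap enough to run in the browser.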
The site, the FuckLAPD site, is clear that
it has limitations.
One of which is, you know, if you're pulling a TikTok video of some incident involving an LAPD officer and it's far away and it's blurry, it's not going to work.
And then obviously, when it comes to ICE, that's a different thing; the ICE officers would not show up in this database.
But if the officer has their face covered, obviously that wouldn't work either.
Yeah.
Yeah.
And I mean,
just because I didn't really spell it out, but the reason that I thought, oh, this has already been done is because it had been done in other cities.
We saw an example,
I think Sam alluded to it, with the New York one.
And then there was Portland as well.
And again, this was mostly during BLM, that sort of thing.
So I see facial recognition tool used on police.
I'm like, oh, okay.
They're still doing that.
And no, this is an interesting marriage between facial recognition tech and public records.
And I think it's particularly interesting in that it's all done locally on the device, which you couldn't have necessarily done even a couple of years ago.
Would that be fair, Emmanuel?
I think so.
Kyle, who made this website, himself has made a similar tool for ICE called ICE Spy back in 2018.
And that tool, I believe, doesn't work so well anymore, both because the data isn't as good, right?
It doesn't have this Watch the Watchers database to work with, but I don't believe that one runs locally.
Well,
when he launched the ICE Spy one, which is to basically do the same thing, but to identify ICE employees, and that's not just
immigration enforcement.
It could be a lawyer in Texas or something like that, but ICE broadly.
There's two things.
It used Microsoft's facial recognition system,
an API,
and
it was just using that to process the actual
identification of faces.
Microsoft sunset that after the project was released, again, in 2018.
But now Kyle McDonald has brought it back because the processing can now be done locally on the device.
So you're like, oh, okay, they're going to relaunch a tool or whatever.
Obviously, that's especially notable.
A tool that would identify ICE when there's mass deportations happening.
And of course, ICE officials are repeatedly covering their faces.
And maybe I can talk about that in a second.
But there is a second component, which is that the ICE Spy database is built on a scraped
set of profile images and data from LinkedIn.
So what somebody had done back in the day was go in and scrape LinkedIn for anybody you could find who worked at ICE, and grab their name, their title, their city, and their profile photo.
Now,
in sort of a scary, depressing way, 2018 was actually a long, long time ago.
Like,
you know, several, several years, stuff has changed.
Administrations have passed.
People have joined ICE, people have left ICE.
And I tried out ICE Spy, again, just for verification purposes.
And frankly, it was bad. Like, it didn't get any of the results I uploaded, and I was actually giving it lowballs just to start, you know. I was actually downloading images from LinkedIn of ICE officials and then running them through, and it didn't recognize those. I would upload a photo of a Black man, and it would just present a photo of another Black man. It was not accurate at all.
That being said,
that's probably not the facial recognition algorithm and system itself.
It's because the data is out of date and it's sort of doing the best it possibly can
with the data, right?
And I guess just the last thing I'll say on that is that
the reason that people would probably want this for ICE specifically is because, as many of us have seen, ICE continues to carry out deportation efforts while not just covering their badges like police may have done in earlier protests, but full-on sunglasses, a baseball cap, covering their neck, masks as well, refusing to say what agency they're from, refusing to provide their name.
So facial recognition could actually provide a moment
and a tool that people could actually use to identify who these armed men are,
who are saying, you need to get into my truck now, and we're leaving, and I'm from ICE.
And of course, that refusal to identify themselves, that refusal to provide an avenue for accountability
has created chaos where a group of four or five alleged ICE agents or whatever will go to a car wash or a Home Depot or whatever it is, and the civilians there really cannot tell whether this is actually a group of ICE agents or not.
To the point where we have criminals actually impersonating ICE officials, including in Brooklyn in an attempted rape; elsewhere we have robberies as well.
So they've created this environment where no one can be sure what is going on, which is of course beneficial to them in some respects.
But now there's this narrative on Blue Sky and other places that, oh, none of these people are ICE officials.
They are just bounty hunters or they're January 6ers or whatever.
And don't get me wrong, I think eventually it will probably turn out that some of them were
opportunistic random members of the public.
But I think a lot of them are still ICE, you know, and a tool like this could help people figure out, okay,
well, this ICE agent beat up my family member, punching him in the face before throwing him into a van.
And now I can identify him and at least, you know, file a lawsuit or contact the agency, find out where he's been.
But without this, there's very little recourse for identifying an agent or identifying the agency as well.
So I guess we'll see if they improve that tool, you know.
I was just going to say, I think tools like this are super interesting and definitely, you know, we should be writing about them.
People should know that they exist.
I think the
use cases of them are tricky,
which is just to say, again,
to like generalize an argument I've seen on Blue Sky is like
people
are saying, if you get grabbed by a masked man, like, demand their badge, demand that they are, you know, a federal officer, et cetera, et cetera. And it's like, that sounds great in theory. I think in practice, when this is happening very quickly,
it's like a very scary situation. I've seen actual lawyers saying, like, yes, this sounds great in theory, but
fighting back against it, like,
if you assume that someone who is kidnapping you is not a federal agent and you then assault a federal agent, fighting that in court is potentially, like, a life-ruining, very difficult situation.
And it's just like that, that all just sort of underlines the fact that
when these people are wearing masks,
it is
like, how are you supposed to respond as someone who is being, you know, grabbed by someone with a mask, I guess, especially in the context of what happened in Minnesota, where the guy who allegedly assassinated
a state lawmaker there was dressed up as a police officer, or dressed up as law enforcement.
And so you have like
alleged assassins dressing up as law enforcement.
You have actual law enforcement masking themselves, not identifying themselves.
And it's like the, it's just a very, very like scary and difficult time at this moment.
And I think that tools like this are really useful when there's already footage and you can sort of like figure this stuff out after the fact.
I think
during the fact, it's like if you're like, hey, wait a second,
let me take a picture of you and look you up while you're trying to arrest me.
It's like, it's just a very, very volatile situation.
And then I guess the last thing is like, I do kind of wonder,
like, I do think that this tool is very useful.
Um,
I wonder if ICE and the LAPD
use the existence of tools like this to say, look, we need to mask, like, leftists are doxing us-type vibes.
And surely they will.
I think that is an argument that will be made.
That doesn't mean
that tools like this aren't useful.
Um,
but there's this narrative that the administration is putting out right now that like ICE agents are being harassed and attacked, and that's why they're masking.
And I think that that is like a really
tricky situation.
Yeah.
ICE has said the assaults on its officers have increased something like 400%, but as an opinion piece in the Washington Post broke it down, those numbers are a little bit iffy in that, well, we don't know the actual numbers behind the percentage.
Like, you know, do they go from five to twenty or something
like that?
But also, ICE is putting themselves in a situation where people are more likely to fight back when they roll up with masks on
people working at Home Depot or hot dog stand vendors or things like that.
Or if you increase enforcement by 400%, would it not be logical that incidents would increase by 400% or something?
Yeah.
I'll just say also being in Los Angeles and like going around the city right now,
there are places where there used to be vibrant communities of people that are empty right now.
And it is really dystopian and really scary.
It's like I went to Home Depot the other day, and there usually are people there who are like, I will help you install this drywall that you're purchasing, or like, I will help you lay down this sod that you're buying or whatever.
Like, that is a common thing that I've experienced every single time I've been to Home Depot.
I went to Home Depot this weekend, no one there, like, no one.
I've seen some posts on Reddit that, like, after concerts, there are no hot dog stand vendors there.
There's like a lot of farmers markets that are totally empty.
There's, like, secondhand,
like, thrift markets that are empty.
And it's like,
regardless of whether these people are undocumented or not, you have like
anyone who has brown skin who is like scared to go out in public at this point.
It's like a lot of my friends are scared to go out in public at this point.
And it's, it's really, really messed up.
And it's, it's like, again,
Los Angeles doesn't want this to be happening.
The community doesn't want this to be happening.
Yeah, I think that's a good place to leave it.
And we'll definitely keep an eye on this tool.
And if any others are made or improved or whatever,
I'm sure we will do a follow-up.
After the break,
Sam is going to ask Jason all about his latest piece, which is about a, frankly, pretty complicated court ruling that we spent a long time trying to get a headline for.
We'll be right back after this.
I'm here to tell you about the Limited Edition Inbound Conference, which brings you to San Francisco for a one-time only West Coast event this September 3rd through 5th, where you'll get insights you won't find anywhere else.
Inbound 2025's agenda is now live, from the agent.ai workshop, From Idea to Agent, and Dwarkesh on AI's Future, research-backed bold predictions with Dwarkesh Patel.
Explore more than 200 sessions you won't find anywhere else.
Get fresh perspectives on innovation from a dynamic lineup, including Sean Evans, the host of Hot Ones, the one and only Amy Poehler, tech reviewer Marques Brownlee, and AI pioneer Dario Amodei, only at Inbound 2025.
Cut through the noise with focused, actionable takeaways on the latest marketing, sales, and AI trends that give businesses a competitive edge in today's rapidly changing landscape.
Network with decision makers in San Francisco's AI-powered ecosystem, where innovative technologies are creating entirely new approaches to business.
Experience firsthand how San Francisco's technology ecosystem is revolutionizing content creation, distribution, and monetization through AI and innovative tech solutions.
Secure your spot at inbound.com slash register.
That's inbound.com slash register.
Hey, it's Joseph again.
If you're a new listener to the 404 Media Podcast or even a long time one, you might not be aware of all of the impact our journalism has had recently or how we even got here in the first place.
In 2023, the four of us quit corporate media to go independent.
We were sick of working for a VC-backed company that put profits before journalism.
That gave birth to 404 Media.
Since then, we've stopped the spread of AI-generated books in public libraries, triggered class action lawsuits against AI companies, got Congress to pressure big tech in various ways, and we've even shut down surveillance companies.
This real-world impact is only possible because of our paying subscribers.
As a journalist-owned business, they are the engine that powers our journalism and where the vast, vast majority of our revenue comes from.
So please consider signing up today for $10 a month or $100 a year at 404media.co/membership and get bonus content every week and access to all of our articles.
Thank you and enjoy the rest of the podcast.
And we're back.
We're going to talk about a story that Jason published
today.
The headline is: Judge rules training AI on authors' books is legal, but pirating them is not.
This story went through so many rounds of headline workshopping because it is kind of like a weird,
not really complicated, but like it doesn't, it's a little bit hard to get your mind around what exactly the judge is talking about here.
I think one of the headlines was like
judge being annoying, training cool, piracy bad, judge says.
It just is hard to encapsulate a story like this into one sentence.
So,
yeah, Jason, do you want to just kind of give us like a really quick,
just like top line of what the ruling said?
It's based on the Books3
case, right?
Yeah.
So three authors sued Anthropic, which makes Claude, the AI tool,
basically saying that Anthropic trained on their books without consent and without compensation.
And so it's a copyright lawsuit about whether that is legal or not.
There's, like, I've seen a tracker, I think there's, like, over 100 cases
like this, where it's, you know, authors, artists, journalists, newspapers, the New York Times, et cetera, suing an AI company about this question of whether you can
train on someone's work without paying them, without permission, et cetera, et cetera.
train on someone's work without paying them, without permission, et cetera, et cetera.
And this is the first major decision among these lawsuits.
So it's pretty important for that reason because it's like one of the first ones.
And basically, the judge decided
that
it was fair use for Anthropic to train on these authors' books, but that it was not legal for them to pirate the books to do so.
And so, the actual like pirating of the books, not fair use,
training of the AI,
transformative under fair use, which is like if you're transforming the work in some way, then it's protected by fair use, broadly speaking.
And basically, the judge found like
training the AI was transformative, but downloading these books was not.
And then there's caveats upon caveats from there, but that's like top line, sort of like what was decided.
Yeah, I feel like if you went back in time five years and said exactly that to me, I would be like, yeah,
obviously.
Like, that's not a difficult concept to understand.
Like, piracy has been illegal for a very long time.
So acquiring things via piracy to further your company's
business is illegal, but because this decision is coming after all of these companies have already sucked up all of this copyrighted work via piracy in a lot of cases, uh, without permission, without consent from the authors in this case, and then gone ahead and used it to train their AI and their LLMs.
It's like we're kind of working, we're putting this together backwards and all of this copyright law is being like shook up and re-litigated, and
maybe even litigated for the first time in a lot of cases.
So
it's like, that's why I said it's like, it's a complex situation, but it's not.
It's like,
yeah, these are very basic principles that we're now, like, arguing about again, which is that theft is bad.
Yeah, yeah, I agree.
And then also, I think the initial reaction from people was like, oh, this is very bad for authors.
This is very bad for artists.
And that was my initial reaction.
And then, like, as I was writing the story, as we were talking about it, and then especially after I published it, I was like, wait, maybe this is actually, like, quite a good decision, perhaps.
Yeah, can you walk us through that?
Like, why
is this?
What does this, what does this mean?
I guess, if
what could it mean, if anything?
Yeah.
So, again, there's like hundreds of cases.
And so, the judge in this case had to look at the facts of this case.
And in this case, Anthropic used Books3, which is this massive torrent of books, LibGen, and one other piracy site called Pirate Library Mirror to download hundreds of thousands of books.
And then, crucially,
Anthropic made, like, a library/database of all of these.
So not only did it like download all this information for training purposes, it kept a copy of all of them.
And keeping the copy of them is sort of like what the judge said was bad.
But then, also, alongside of this, Anthropic started buying physical books, like physical used books from like mass resellers.
And then it started scanning those books in the same way that like Google Books was scanning books and the same way that like the Internet Archive has done before as well.
And basically, the judge said, for the books that it bought legally,
like all good, no problem here.
For the ones that it pirated,
not good, gonna go to trial, gonna like look into damages and stuff like that.
So basically, like there's these instances of piracy that Anthropic might now have to pay for.
And sort of like at first blush, it's like, oh, that's
good for these three authors.
Like, what does it mean for the rest of us?
Like, everyone else that has had their stuff,
you know, kind of taken?
And
it seems like on first blush, it's like, oh, okay, like they can train their AI on this stuff as long as they acquire it legally.
But then you think about it, and it's like almost very little of what AI companies have acquired has been acquired legally.
Like we've reported various times. Sam, you had, like, an amazing story about NVIDIA,
and we did one about Runway as well.
And it's like they're scraping a lot of piracy sites.
These companies are scraping piracy sites.
They're scraping like Netflix.
They're scraping, like,
stuff that is paywalled, which they are not allowed to do.
And crucially, the judge was like,
that's illegal.
That's very bad in this case.
And so when you think about it in terms of, like, the practicalities, it's like a lot of these AI companies have
committed
like millions of instances of piracy.
And the potential punishment and liability for that is like billions and billions of dollars.
And so
that's why I think it's maybe not as bad of a decision as I first thought, because like these AI companies very well could be on the hook for like millions of instances of piracy.
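For a rough sense of the scale being gestured at here: US statutory damages for copyright infringement run from $750 up to $150,000 per work for willful infringement, so even the per-work minimum multiplies out to enormous sums. The 500,000-book count below is a made-up illustration, not a figure from the ruling.

```python
# Back-of-envelope sketch of potential piracy liability.
# US statutory damages: $750 minimum per infringed work,
# up to $150,000 per work for willful infringement.
STATUTORY_MIN = 750
STATUTORY_MAX = 150_000

def damages_range(num_pirated_works):
    """Return (minimum, maximum) statutory damages for a count of works."""
    return num_pirated_works * STATUTORY_MIN, num_pirated_works * STATUTORY_MAX

# Hypothetical: half a million pirated books.
low, high = damages_range(500_000)
print(f"${low:,} to ${high:,}")
```

Even at the statutory floor, half a million works puts the minimum in the hundreds of millions of dollars, which is why the trial over damages matters so much.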
And then the other thing that is really notable about this is the authors didn't allege that Claude
like reproduced their work in any way.
Meaning, they didn't talk to Claude and get Claude to spit out portions of their book verbatim.
And a lot of the other lawsuits hinge on outputs like that, where, like, New York Times versus OpenAI hinges on ChatGPT verbatim reproducing parts of New York Times articles, for example.
And that didn't happen in this case.
And the people suing didn't even allege that it happened.
They were just like,
our books were in Books3.
We know you trained on Books3.
Therefore,
you know, we're suing you.
And the judge specifically sort of said, well, they didn't allege that,
you know, the AI was reproducing their books.
Therefore, I'm not even going to consider that.
And I think that that's like a pretty crucial fact to keep in mind.
And that in these other cases,
you know, the AI tools are regurgitating pretty much verbatim, or very similar, like, copyrighted characters, for example, in terms of, like, the Disney versus Midjourney lawsuit, things like that.
And so
I don't think that it's going to be like blanket.
Yes, you can train AI on anything you want.
I think that the outputs are actually going to matter for some of these other cases.
Yeah, you kind of have to get,
you kind of have to read this and get into like copyright and IP lawyer and IP judge brain because
he's looking very specifically at
what the law is, right?
So like,
what they're, like you said, what they're alleging is not that there are these like perfect reproductions, it's
that their copyright was violated because it was ingested illegally.
It was acquired illegally.
It was acquired through piracy.
Yeah, and the Disney versus Midjourney one is like, oh, you reproduced Mickey Mouse perfectly.
This is a threat to our copyright.
So obviously you need to pay for that or pay in court or some other way.
And we did a story yesterday.
I guess it'll be two days ago, it'll be Monday, whenever this comes out for you,
about how researchers found, and think, that one of Meta's LLMs was apparently memorizing huge pieces of books and would actually output big chunks of, like, the first Harry Potter book or 1984,
which is a very clear sign that those pieces are in the LLM as that contiguous chunk of the book.
But that's not, it's a different case than what's happening in here with Claude and Anthropic, which I think is really interesting.
So, yeah, I mean, you also made a good point on Bluesky, you were writing, and you had mentioned this also in Slack,
that this is interesting for, like, paywalled sites.
Do you want to get into that a little bit?
Like if Anthropic got a subscription to 404 Media, for example?
Yeah, I mean, I guess it's just like
a lot of the deals that
media outlets are signing with OpenAI have to do with
OpenAI giving a company millions of dollars a year to train on their, to get permission to train on their
content in an ongoing fashion.
And sort of like the way that I read this, and you know, I am not a copyright lawyer.
I'm not a lawyer of any sort, but the
logic and reasoning sort of suggests to me where it's like, if you buy a book, you can then train on that book.
I don't know why you wouldn't be able to buy a subscription to a website and be able to train on everything.
And so instead of paying
a news outlet millions of dollars a year for like a really big news outlet, a really big deal, you know, can you get one subscription and train on that?
And is that going to
sort of like
open this up even further?
I think the other thing I want to at least give voice to is that there's a lot of copyright experts out there who say that copyright is not the correct way to sort of go about policing this.
I think that it's the most obvious one at the moment.
But that like a very restrictive
fair use decision on like what is fair use and what is not would have
probably like pretty scary knock-on effects for people like us who use lots of things under fair use, like sometimes images, sometimes snippets of video, things like that.
I don't want to get too into it because I think I'd get out of my depth, but like we're going to see tons and tons of copyright decisions, but copyright law wasn't really like written anticipating this sort of thing happening.
Yeah, for sure.
It's
a new frontier to be cliche about it.
So, what happens now in this big
ongoing case?
What are we going to see occur next or what should we watch for?
So, basically, this was a summary judgment, which means the judge threw out the parts of the case that he decided were fair use and the parts where he thinks that it was not fair use, there will be a trial.
And so the trial will, I guess, give Anthropic the opportunity to argue: actually, this creation of this massive library using piracy was fair use for some reason.
Although the judge in this case seems like really reticent to accept any sort of argument there, he's like, it pretty plainly fails fair use.
It's just like straight up piracy.
So that'll happen.
And then they'll decide damages.
And then, like, of course, there's opportunity to appeal.
I think more importantly, there are just like a hundred other cases
that have different facts that are going to go through different courts.
And eventually, some of these are going to get appealed up probably to the Supreme Court at some point, just because there's like so much money at stake, and there's so many cases going on right now.
So, I don't think that this is like precedent-setting in any way at the moment, but this is like the first really big decision.
So we can like see what types of things judges are considering here.
I have a question.
And I don't necessarily
think any of us can have the answer for it, but I don't know.
It's something I want to talk about, which is, I think, especially back when we first started to report about what is included.
in the training data for these very popular models for these giant AI companies, specifically when Sam published those stories about Runway and Nvidia.
My feeling then was like, oh no, the other shoe has to drop for the entire generative AI industry
because
their story changed from, oh, this is trained on the open internet.
And then people started to ask, is there copyrighted material in there?
And literally they shrugged.
Like famously, the,
I think, CEO of OpenAI at the time was like, maybe YouTube is in there, maybe it's not.
I don't know.
Maybe there's copyrighted material in there.
Maybe there isn't.
I don't know.
And now somehow we've shifted to, like, well, obviously there's copyrighted material in there.
Obviously, there's pirated material in there.
But now we're making some fair use argument.
And I guess I'm at a pretty nihilistic point where
it's like a too big to fail situation.
Like, it's hard for me to imagine
any legal action on doing the training or forcing these AI companies to redo their training or to pay every single person whose work has been used to train these giant, extremely profitable AI models.
Jason, do you?
I'm at a pretty, I'm not saying it's right.
I'm saying it's horrible, but
I'm at a pretty nihilistic place with it where I just don't think there's, I don't think the legal system is equipped to punish or correct such brazen and huge
IP theft, right?
Like on this scale.
And I'm just not sure there's anything to do to correct it.
What do you think?
Do you think they actually end up paying billions of dollars or?
I mean, I am sort of with you.
I'm sort of with you where it's just like
every major tech company is doing this in such a brazen way.
I want to read you an excerpt from this decision.
The judge wrote, quote: From the start, Anthropic had many places from which it could have purchased books, but it preferred to steal them to avoid, quote, legal slash practice slash business slog, as Anthropic's co-founder and chief executive officer put it. So in January or February 2021, they downloaded Books3, an online library of 196,640 books that it knew had been assembled from unauthorized copies of copyrighted books.
That is, pirated,
the judge wrote.
Anthropic's next pirated acquisition involved downloading distributed reshared copies of other pirate libraries.
In June 2021, Mann downloaded in this way at least 5 million copies of books from Library Genesis, which he knew had been pirated.
And in July 2022, Anthropic likewise downloaded at least 2 million copies of books from the Pirate Library Mirror, which Anthropic knew had been pirated.
So the judge in this case was like, you downloaded 7.5 million books illegally, and still it's a mixed decision.
Like, he's kind of just like,
yeah, they stole a shitload of things.
And I know that we're like many years out from
like teenagers downloading 17 songs on Napster and having to pay Metallica hundreds of thousands of dollars.
But it's like
Anthropic is one of the most valuable AI companies in the world.
Here's a judge saying, yes, they stole millions of books.
And it seems like maybe they'll get a slap on the wrist here.
Like, I don't know.
And I think that
Congress is certainly not going to do anything.
This current administration is certainly not going to do anything.
I think that individual courts
might
have some like civil penalties.
I think some of the bigger lawsuits are going to get settled.
Like, we've already seen,
I forget which one, but I think it was the Suno case, the AI Music case.
Like, Universal is
essentially settling with them and signing a licensing agreement with Suno.
And so,
the like super big entities, your Universal Musics, your New York Timeses, your Disneys, et cetera.
I get the sense that they are going to eventually sign some sort of deal with these companies.
These smaller authors who are like banding together as classes to sue against them are going to like
lose or like people are going to get, you know, $12
in some class action lawsuit settlement.
And like, the industry is going to keep going.
Like, I don't see a world in which this becomes an existential issue for these AI giants because there's too much riding on it.
Like the entire tech industry, which is a huge part of the American economy, is riding on
this succeeding in some way.
And
I guess when this all first started, I was like, oh, they're going to get sued and this is all going to go away in some way.
And I guess I don't feel that way anymore.
But
it sounds like that's kind of like where you are as well.
I think it's that.
And I think it's the scale.
The thing that everyone always,
you know, tweets at me or posts on Blue Sky whenever we post one of these stories is like, oh, this is allowed.
But Aaron Swartz, who was like an early,
it's fair to call him a founder of Reddit.
And he was about to go to jail
for
making a bunch of copyrighted papers, I think, available to the public and took his own life.
And it's such an obvious point to make, but I think it's true.
It's like, if
Comcast snitches on you for downloading a pirated copy of the Avengers, they're like, this guy has committed a crime.
Aaron Swartz has like broken copyright law and he's going to jail.
But if you steal from more people than you can count, more times than you can count, and kind of mix it all into like this slurry where it's hard to tell who you stole from, and the
responsibility is on the people you stole from to come up to you and say, hey, that's actually my content.
It's like, it's just it's a crime on such a scale that you can't really do anything about it, and the system is not built to deal with it.
You also go into like Meta AI app right now and scroll through the Discover page, and there's like Mickey Mouse, SpongeBob, Mickey Mouse, SpongeBob, Spider-Man.
Like
it's so obvious, it's so brazen. And,
you know, I guess they're just arguing transformative.
But like.
Many entities have argued
transformative
better and lost.
And, you know,
that's not why they're going to get away with it.
And I mean, it is interesting that, again, like a year ago, these companies were saying, oh,
we don't know if we trained on pirated material.
Like, very unclear.
Could be anything.
Could be just open web.
Could be these AI tools are magic.
And like when you ask for Mario, Mario pops out because they're so smart that they're figuring out how to do Mario.
And it turns out like, no, they're just downloading Netflix.
They're just downloading YouTube.
They're just downloading every book that has ever existed.
Like that's one of the quotes in here is that Anthropic has tried to train on every book that has ever existed.
Like that was their goal.
And
yeah, it's just like
their argument now is not we didn't do it or you can't prove we did it.
Their argument now is we did it and it was legal and there's nothing you can do about it.
And we'll see how that plays out.
But,
you know, this decision is
mixed.
It's mixed.
I don't think it's as dire as a lot of people were saying when it first came out, but it's certainly not like cut and dry, you can't do this.
It leans more on the like, yeah, they can get away with this.
And I think that
except for in a few really egregious cases, that's probably what we're going to see moving forward.
Yeah.
I was kind of surprised by how complicated it was.
Like, I thought it was going to fall on one side or the other, but it lands in this weird middle spot.
But I don't know.
I guess we'll.
We'll see how it continues and what happens if it gets to trial.
If you're listening to the free version of the podcast, I'll now play us out.
But if you are a paying 404 Media subscriber, we're going to talk about all the AI slop during the Iran-Israel war.
You can subscribe and gain access to that content at 404media.co.
As a reminder, 404 Media is journalist-founded and supported by subscribers.
If you do wish to subscribe to 404 Media and directly support our work, please go to 404media.co.
You'll get unlimited access to our articles and an ad-free version of this podcast.
You'll also get to listen to the subscribers only section where we talk about a bonus story each week.
This podcast is made in partnership with Kaleidoscope.
Another way to support us is by leaving a five-star rating and review for the podcast.
That really helps us out.
Here is one of those from Anne Nettie.
I hope I got that right.
It's quite a long one.
Team 404 Media's reporting is incredibly thoughtful, thorough, and important.
Some days it's AI slops.
Some days it's about the real-time, real-life Black Mirror Skynet world that we all live in now.
At the end of every episode, they do a bonus story from the week. I became a subscriber after being unable to shake the strong sense of FOMO.
Thank you so much.
This has been 404 Media.
We'll see you again next week.