AI is killing the internet
This episode was made in partnership with Vox’s Future Perfect team. It was produced by Gabrielle Berbey, edited by Amina Al-Sadi, fact-checked by Rebeca Ibarra, engineered by Patrick Boyd and Andrea Kristinsdottir, and hosted by Sean Rameswaram.
Listen to Today, Explained ad-free by becoming a Vox Member: vox.com/members. Transcript at vox.com/today-explained-podcast.
Noted fan of the internet Al Gore with his boss at the time, President Bill Clinton. (Photo by Sharon Farmer/White House/Consolidated News Pictures/Getty Images)
Transcript
Speaker 1 Artificial intelligence is scraping the internet.
Speaker 1 It's gorging all the websites to give you what you want. It's actually kind of gorging everything to give you what you want, and the makers of everything are not very happy about it.
Speaker 1 Sarah Silverman is suing, Sony is suing, Dow Jones is suing, the New York Times is suing, authors are suing.
Speaker 1 But in one author lawsuit, AI kind of won. Specifically, Anthropic's AI, which goes by Claude.
Speaker 2 Well, Claude's not cool, but Claude's uncool the same way I'm uncool. See, so
Speaker 1 Claude's win in court is scaring the makers of everything, and we're going to talk about why on Today Explained.
Speaker 3 Support for Today Explained comes from BetterHelp.
Speaker 7 BetterHelp says it's winter and winter is often depressing.
Speaker 3 Instead of getting depressed, BetterHelp wants to encourage you to reach out to someone, perhaps get coffee with an old friend, perhaps write a letter to a family member, perhaps connect with a licensed therapist using BetterHelp.
Speaker 10 This month, don't wait to reach out, says BetterHelp.
Speaker 7 Whether you're checking in on a friend or reaching out to a therapist yourself, BetterHelp makes it easier to take that first step, says BetterHelp.
Speaker 3 You can get 10% off your first month at betterhelp.com/explained.
Speaker 5 That's betterhelp.com/explained.
Speaker 11 With a Spark Cash Plus card from Capital One, you earn unlimited 2% cash back on every purchase. And you get big purchasing power. So your business can spend more and earn more.
Speaker 11 Capital One, what's in your wallet? Find out more at capitalone.com/sparkcashplus. Terms apply.
Speaker 1 Today, Explained from Vox. I'm Sean Rameswaram, here with Jason Koebler, tech reporter and co-founder of 404 Media.
Speaker 14 I am a journalist who covers AI, but I'm also a business owner because we have our own small publication. And so I'm very interested in what is going to happen with all of these AI companies getting sued on copyright grounds.
Speaker 14 There's dozens of lawsuits at this point, and I'm concerned about it both as a journalist who has had my work scraped, but also as someone who has like a direct financial interest in it.
Speaker 14 And so about a month ago, there was this decision in a case against Anthropic, which makes the AI tool called Claude.
Speaker 14 And it's not necessarily that this is the biggest AI copyright case, but it is the first real major decision where we get a judge sort of pointing at how he is thinking about these issues of massive AI companies scraping authors' work, scraping artists' work, scraping musicians' work.
Speaker 1 And who sued Anthropic?
Speaker 14 Yeah, so it's three authors. Their names are Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson.
Speaker 16 Three authors claim Anthropic built a multi-billion dollar business by misusing copyrighted works and pirated writings without permission and without paying the authors for their work.
Speaker 17 This lawsuit is really just the latest as many other authors, journalists, record labels, artists, creators, they try to wrestle back control of their work.
Speaker 14 To be totally honest, I didn't know them before this lawsuit.
Speaker 1 To be totally honest, I still don't.
Speaker 14 They sued them because they learned that their books were included in this data set called Books 3, which is this really controversial at this point data set that contains a few hundred thousand books.
Speaker 14 And The Atlantic at one point got a copy of Books 3 and then published like this search tool that allowed authors to see, is your book in this data set?
Speaker 9 Author Drew Hayden Taylor had no idea that nine of his works were part of Books 3, a massive data set used by tech companies to train artificial intelligence.
Speaker 18 Well, it's a combination of being flattered and being concerned.
Speaker 19 We're all just like little ants who don't mean anything to the big billionaires. They don't want to pay us for our words. They'd rather just take it. I'm so mad. If your book is on here, I'm so sorry. I'm just like so sad for so many authors today.
Speaker 14 These authors learned that their books were in Books 3, Anthropic trained on Books 3, and therefore Anthropic trained on their copyrighted works. And so that formed like the basis of this lawsuit.
Speaker 14 So the really interesting thing is that in the early days of this debate, and it's like one of the hottest debates at the moment between artists, journalists, authors, and like the AI boosters and companies and maximalists, is: is it fair use to scrape this stuff en masse, run it through a large language model, like turn it into a huge data set, and then use large language model technology to create these tools?
Speaker 14 And at first, the AI companies were very skittish about saying that they had trained on copyrighted work at all.
Speaker 1 AI should be allowed to read the internet and learn.
Speaker 14 Shouldn't be regurgitating. Shouldn't be violating any copyright laws. But on individuals' private work, yeah, we try not to train on that stuff. We really don't want to be here upsetting people.
Speaker 14 But as these cases started going to court and as they entered discovery and as it became clear that every major AI company was training on copyrighted work, their argument went from being, well, we can't say what we trained on because this is proprietary, to, of course we trained on copyrighted work. We had to, and it's legal.
Speaker 14 And it's legal because our use of it is transformative and therefore it's protected by the fair use tenet of copyright law.
Speaker 20 Section 107 of the Copyright Act reads, transformative uses are more likely to be considered fair.
Speaker 20 Transformative uses are those that add something new with a further purpose or different character and do not substitute for the original use of the work.
Speaker 14 That's what they argued and that's what the judge ultimately decided. What he decided in this case was the scraping of these three authors' books was considered fair use under copyright law.
Speaker 14 But there is a huge caveat here where he decided that the way that Anthropic went about acquiring the books in the first place was piracy.
Speaker 1 Okay, so the judge essentially hands down a split decision saying that, yes, this is fair use to use these authors' work this way, but also it wasn't totally fair how you got this stuff because it was pirated.
Speaker 1 So I don't know, what does that mean? Does everyone go home unhappy, or was this like a huge win for Anthropic? Doesn't feel like a huge win for the authors.
Speaker 14 Yeah, I mean, I don't think it's a huge win for anyone yet.
Speaker 14 And I think that the people who are saying this is a slam dunk for Anthropic, which many people in the AI world are saying it's a huge win for Anthropic, I think they're wrong. And the reason that I think they're wrong is because the judge determined essentially that it was not copyright infringement to train Claude on copyrighted material that was legally obtained.
Speaker 14 But then they also downloaded books from this website called LibGen, which is a piracy website that has millions of books on it.
Speaker 14 And then also from a website called Pirate Library Mirror, which is another piracy site that has millions of books on it.
Speaker 14 And the judge said that obtaining the books in this way was pretty much like cut and dry copyright infringement.
Speaker 14 And I think the really important thing to note is that every major AI company has trained on copyrighted works that they obtained in a similar fashion. We have done reporting at 404 Media where entire YouTube channels were scraped, Netflix, like the entirety of Netflix was scraped.
Speaker 14 And so the specifics about how these companies obtained these works is potentially going to be really important. And a lot of that scraping has already been done. A lot of that piracy has already been done.
Speaker 1 These companies are literally some of the richest companies on earth, are affiliated with some of the richest people on earth. Did they really just steal all these books?
Speaker 1 Could they not have just gone to Amazon and bought like some books? Or is that just too much work for them?
Speaker 14 Well, so the super interesting thing about this lawsuit, and something that like really, like, I was like, holy shit, like, how did they do this? Why did this happen? Is that in the beginning, Anthropic pirated all these books.
Speaker 14 They downloaded huge amounts of torrents, they scraped these piracy websites, and they did that specifically because they didn't want to slow down.
Speaker 14 Like, there's an email that is part of this lawsuit where the CEO, Dario Amodei, says, you know, we don't want to get into, he calls them, "legal/practice/business slog."
Speaker 14 And so they were basically like, let's do all of this, let's pirate all the books, let's put it into our model, and then let's go buy copies of a lot of other books. And so what Anthropic did was they had a whole team of people who was dedicated to buying used books from used bookstores that were going out of business, from eBay, from these online marketplaces, and they bought a huge, huge number of books, like physical books.
Speaker 14 They tore the covers off of them and they had this like giant scanning operation where they would scan the books and then create a digital copy of the books and then fed that into their model. And the judge said that all of those books that were bought from used bookstores, no problem.
Speaker 14 And I think that goes to show that these AI companies are grabbing data from wherever they can find it. It's a huge arms race to see who can get the most data from the most number of places.
Speaker 14 And so they're doing like the low-hanging fruit, which is downloading everything. Yeah. But then they're like scouring the planet looking for like bookstores that are going out of business.
Speaker 14 Like I've, I've heard of AI companies looking for like huge physical archives of like VHS movies and things like that, and then digitizing those.
Speaker 14 And so really they're just trying to find data wherever they can.
Speaker 14 And it seems like when they're able to get it legally by purchasing a copy, they're willing to do so, but they're also willing to take it for free when they can.
Speaker 1 Did we learn anything from this lawsuit that might implicate those other ones?
Speaker 14 Yeah, I mean, I think that the piracy aspect of this is really important.
Speaker 14 And we've seen in the past, like if you are a 13-year-old kid who's pirating Metallica songs on Napster, like you can be liable for hundreds of thousands of dollars worth of damage.
Speaker 1 Lars will find you.
Speaker 14 For just like a few songs. And like in this case, you have 7 million books. And so like it will be very interesting to see whether a judge levies like a huge financial penalty here or whether it's more of a slap on the wrist.
Speaker 14 And I tend to think it will probably be more of a slap on the wrist because all of Silicon Valley, all of America's largest companies sort of have a huge amount of investment riding on the widespread adoption of AI. And AI is now a huge part of the American economy.
Speaker 14 It's become part of like geopolitics as well, where you have the Trump administration and really the Biden administration was saying the same thing. Come on, man.
Speaker 14 Saying that the United States can't fall behind China in the quest to innovate in AI and to have like widespread AI adoption.
Speaker 14 I'll be very curious to see whether there are like actual, like, serious punishments for these companies that have scraped all of this data or whether they, you know, wiggle out of it with a slap on the wrist or get out of it with a series of settlements or what have you.
Speaker 14 But I tend to think that there's probably no stopping this industry from a legal perspective. I think that it feels too big to fail to me at this point.
Speaker 1 404media.co is where you can find and support Jason Koebler's work instead of, you know, just stealing it. AI companies aren't just stealing everyone's intellectual property.
Speaker 1 They're also kind of killing the internet as we know it right before our eyes.
Speaker 18 We're going to talk about that when we're back on Today, Explained.
Speaker 21 Support for Today, Explained comes from AT&T.
Speaker 5 There's nothing worse than needing to make a call and realizing you can't connect, says AT&T.
Speaker 6 And of course, every wireless provider will claim that they're the best, but AT&T says AT&T has the goods to back it up.
Speaker 21 According to RootMetrics, AT&T earned the best overall network performance.
Speaker 23 While the other guys are busy making claims they can't keep, AT&T says they're making connections on America's fastest and most reliable wireless network.
Speaker 9 No matter if you're at a concert, a huge sporting event, or just out enjoying nature, you can post when you want to post. Don't post when you're enjoying nature, guys.
Speaker 5 Keep it in control.
Speaker 10 Call when you want to call and rest easy knowing that no matter where you go, AT&T has got you covered.
Speaker 6 When you compare, there's no comparison.
Speaker 5 AT&T.
Speaker 2 Based on RootMetrics United States Root Score Report 1H 2025, tested with best commercially available smartphones on three national mobile networks across all available network types. Your experiences may vary.
Speaker 21 RootMetrics rankings are not an endorsement of AT&T.
Speaker 6 Support for Today, Explained comes from Udacity.
Speaker 5 Udacity says they can help you prepare yourself to be able to use IT.
Speaker 6 Udacity is an online learning platform.
Speaker 8 It has courses in AI and tech, generative AI, agentic AI, Python, data science, so much more.
Speaker 8 When you learn with Udacity, according to Udacity, you're not just passively watching videos or reading articles, maybe even books, you're doing practical exercises and projects that prepare you for the job you want.
Speaker 8 That's why they claim 90% of Udacity graduates surveyed say they achieved their enrollment goal. Udacity just launched a master's degree in AI.
Speaker 6 When you have a certification from Udacity, they say recruiters and employers take notice.
Speaker 5 You can check out Udacity today. The tech field is always evolving. You should be too.
Speaker 8 You can try Udacity risk-free for seven days. You can head to udacity.com/explained. Use code EXPLAINED for 40% off your order.
Speaker 8 Once again, udacity.com/explained for 40% off. Make sure you use the promo code EXPLAINED.
Speaker 1 Today, Explained is back with John Herrman now. He's a tech columnist at New York Magazine.
Speaker 1 John, in the first half of the show, we were talking about how this Anthropic case and judgment, you know, may or may not change the extent to which these big AI models can scrape the internet.
Speaker 1 But I want to talk to you about how all this scraping has already, in some ways, broken the internet as we know it and how we use it.
Speaker 1 You wrote about how AI has broken maybe like, you know, the front page of the internet for a lot of people. Google.com. Tell us how.
Speaker 1 Google could not be closer to the center of like this recent AI boom. On one hand, they are a company that has really deep roots in that space.
Speaker 1 They published like the foundational research for what then became generative AI as we know it. They've put it in all their products.
Speaker 1 If you use any Google thing, you are seeing like chatbots everywhere.
Speaker 27 Take notes with Gemini.
Speaker 27 Summarize this file, summarize a folder, refine this document, find inspiration, easy, fresh ideas, elevate your writing, get clear, constructive, improve sentence flow, word choice.
Speaker 1 They are all in on AI. Google search in particular has AI overviews at the top. There's a new AI search mode that works like a chatbot instead of a search engine.
Speaker 1 Google making a rare change to its homepage, the most visited website in the world, pushing its AI mode tool directly into the hands of its billions of users.
Speaker 28 With this latest move, it is changing what billions of people see when they open their browsers, still the on-ramp for the entire internet.
Speaker 1 Meet AI mode.
Speaker 15 Ask detailed questions for better responses.
Speaker 1 AI on Google search can provide information.
Speaker 1 While that was all happening, AI was also sort of accelerating this feeling of decline in the Google product, which over the years, through this back-and-forth battle between the company and search engine optimizers and companies trying to get an edge on Google and this sort of long-running dynamic had become a little spammy, a little overloaded with ads.
Speaker 1 Have you noticed that Google sucks lately?
Speaker 14 I'm talking about their search.
Speaker 1 It sucks.
Speaker 15 Why is it so hard to find anything on Google search?
Speaker 29 Google search is terrible. It's bought and it's sold, five or six links up top, all paid for. It's just garbage, pure, unadulterated garbage.
Speaker 1 But I think a lot of people would agree that using Google in, say, 2023 was kind of a degraded experience compared to 10 years prior. It was kind of cluttered. There was more just junk in it.
Speaker 1 There were more ads all over the interface, but also the stuff you were getting in search was a lot of low-quality, cheaply made, aggregated content, stuff that was taken from somewhere else in an effort to sell a product or just serve up some ads.
Speaker 1 The arrival of generative AI tools, which enable like the creation of basically infinite passable content almost for free, really accelerated that issue.
Speaker 1 So, on one side, you have the big ecosystem that Google guides people to that is in a sort of collapse because of this massive shock of new AI-generated content.
Speaker 1 On the other side, you have Google, the product, becoming more and more AI-centric. And in the middle, you have kind of a complicated story.
Speaker 1 And honestly, for search users and regular people, kind of a strange experience. Do they have a plan to make money off of this? Obviously, they want to make money.
Speaker 1 Has anyone asked what their long-term plan is? So there are obvious risks to throwing away this like cluttered but lucrative product and replacing it with a totally clean chatbot or whatever.
Speaker 1 That's not what they're doing. They are incorporating AI answers into the main search page, which they say people like quite a bit. So this last quarter has been really good for them.
Speaker 1 It also arrived in the context of lots of like really strong data suggesting that the way people use Google Search now with these AI tools means that they don't really leave it anymore. They don't really click out and go to anything.
Speaker 1 An AI overview might summarize three articles, archival resource, some expert opinions, but the number of people that actually then click through to those opinions or to those articles is minuscule.
Speaker 1 So Google's relationship to the web around it is pretty, pretty dramatically different.
Speaker 1 If Google's like eating up the rest of the internet, if Gemini is eating up the rest of the internet right now, and companies like ours, let's say, are no longer, you know, meeting their traffic goals, are no longer getting any traffic from Google at all. Like, does Gemini have like nothing to eat? You know what I mean? Because everything dies? Who's going to be feeding Gemini all the right answers in like 10 years?
Speaker 1 We're sort of like glorifying the web a bit in this conversation. No matter how great and incredible it is as this big resource, it really doesn't go that deep.
Speaker 1 And the idea that it is now being sort of like trawled and overfished and just sort of consumed like a resource by these AI companies really does, I think, raise the specter of like, of collapse. I do think that they could find that their products are being made worse by this dynamic and by their relationship with the web.
Speaker 1 I do think that's a real problem. And you can see this in some of the deals that these companies make with publishers, including our parent company, which has a deal with OpenAI, for example.
Speaker 1 Remind people out there or me why companies like ours make deals with companies like ChatGPT. The context is every media company is struggling for visitors.
Speaker 1 Even before the Google traffic really started to collapse, it was sort of unstable.
Speaker 1 And so in addition to like a weak advertising market, every media company is looking for any sort of additional source of revenue.
Speaker 1 And if you're a media executive, OpenAI showing up and saying, here is this many millions of dollars for this many years, it looks like free money.
Speaker 1 Of course, if you're like producing the content, or if you're even just thinking longer term about how a media company or website fits into this AI picture, you recognize that you're sort of, you know, giving access away to something that these companies are explicitly trying to automate. You know, you're sort of like, in an institutional sense, training a replacement.
Speaker 27 You're listening to AI Explains today.
Speaker 1 But it is a deal made not quite under duress, but something close to that.
Speaker 1 For people who miss that old version of the internet, who miss going to Google, typing in a query, getting a bunch of results, clicking on a few of them, getting answers that felt credible, where do they go for that experience now?
Speaker 1 I think there's like a funny polarized answer to this. I just did a story on Reddit, which is having a huge moment right now. It's been around for 20 years.
Speaker 1 It's growing hugely. And part of it is just a response to, you know, social media fatigue, the sense that other communities on the web don't really exist anymore, that everything else on the web is too commercial and whatever. Also, a huge part of that growth is just traffic from Google.
Speaker 1 They're having the fastest growth they've had in almost their entire existence because Google is just shoveling so many people into Reddit because everything else is not really like... working.
Speaker 1 So you have that. You have a community of communities. You have something that feels kind of like it's of the old web.
Speaker 1 It seems like eventually we're going to get to the point where it's like you either want to talk to one of these large language models or you just go back to like calling up your friend.
Speaker 1 I don't even know where it gets. You just walk into the street and yell, does anyone know of a good barber?
Speaker 1 Yeah, I mean, the mutual suspicion about who's using AI is really pervasive, and especially online, but also in person.
Speaker 1 But yeah, I do think that the way that the AI like training paradigm and some of the stuff that you were talking about with Anthropic, but also just the way that Google incorporates all this stuff, it really does kind of break the deal with the whole idea of the public web. Like, all right, we'll all just do this stuff in public. We'll talk to each other.
Speaker 1 People will build all these businesses around this to sort of connect everything and it'll all sort of work together and whatever.
Speaker 1 When you have like these massive sort of predatory companies just consuming all of that, harvesting all of that and saying, all right, we are no longer part of this arrangement. We are doing something else. More people are on Discord. More people are in group chats.
Speaker 1 More people are either just purely consuming on social networks and not posting or just talking privately with their friends.
Speaker 1 And I do think that this fits quite well with that trend and probably accelerates it.
Speaker 1 John Herrman, you can read and subscribe to New York Magazine at nymag.com. Gabrielle Berbey produced, Amina Al-Sadi edited, Rebeca Ibarra fact-checked, Patrick Boyd and Andrea Kristinsdottir mixed.
Speaker 1 And by the way, Vox's Future Perfect is funded in part by the BEMC Foundation, whose major funder was also an early investor in Anthropic, and none of them have any editorial input into the stuff we make here at Vox.
Speaker 1 Speaking of stuff, we hope you enjoyed the 1700th episode. If you did, you can say something nice about us most anywhere you listen. And if you didn't, well, there's always episode 1701 tomorrow.
Speaker 15 This message comes from AT&T. America's first network is also its fastest and most reliable.
Speaker 15 Based on RootMetrics United States Root Score Report, first half 2025, tested with best commercially available smartphones on three national mobile networks across all available network types. Your experiences may vary. RootMetrics rankings are not an endorsement of AT&T. When you compare, there's no comparison. AT&T.
Speaker 12 Support for this show comes from Aura Frames. It's not a competition to see who can give the best present during the holidays, but it does feel nice to know you really nailed it.
Speaker 12 Aura Frames is that ideal gift. It makes it effortless to share personal, easy, and unforgettable frame digital photos to the people you love.
Speaker 12 Just upload your photos, share videos, and even preload memories before it ships, so your gift feels thoughtful from the moment it's unwrapped.
Speaker 12 For a limited time, visit auraframes.com and get $45 off Aura's best-selling Carver Matte frames using promo code Vox at checkout. That's A-U-R-A Frames.com. Promo code Vox. Terms and conditions apply.