Hackers Dox ICE
13:53 - Hackers Say They Have Personal Data of Thousands of NSA and Other Government Officials
31:58 - Wikipedia Says AI Is Causing a Dangerous Decline in Human Visitors
SUBSCRIBER'S SEGMENT: OpenAI Catches Up to AI Market Reality: People Are Horny
YouTube version: https://youtu.be/7P2a4Y7P5UE
Subscribe at 404media.co for bonus content.
Transcript
Hello, and welcome to the 404 Media Podcast, where we bring you unparalleled access to hidden worlds, both online and IRL.
404 Media is a journalist-founded company and needs your support.
To subscribe, go to 404media.co.
As well as bonus content every single week, subscribers also get access to additional episodes where we respond to the best comments.
Gain access to that content at 404media.co.
I'm your host, Joseph, and with me are two of the 404 Media co-founders.
The first being Sam Cole.
Hi.
And Emanuel Maiberg.
Hey, I'm back.
I feel like I haven't been here in many years.
You come back.
Jason's now gone.
Who knows next week.
Speaking of Jason, though, he published what is, you know, our first short documentary.
It's called How Artists Are Keeping the Lost Art of Neon Signs Alive.
I've put a link to the video on YouTube in the show notes.
There is an accompanying article as well.
But this is exciting stuff, right?
We obviously, before we launched 404 Media, we all worked at Vice, which was, you know, infamous, I would say, for its documentaries.
We had the pleasure of working on some of them.
I definitely didn't do a lot on them.
I think all of you did more than me.
But now that we run the company, we can start to experiment and branch out and dabble in doing documentaries ourselves.
And I don't know, I think that's a very, very exciting time to be in.
Emanuel, I know obviously you haven't done a documentary with 404 yet, but it's going to be pretty different moving from a big company like Vice all those years ago, where there were teams upon teams of people to do this, to doing it ourselves.
And I feel like you touched it maybe more than me, or maybe I'm misremembering.
What do you make of us doing it now?
Uh, I think Sam actually has done the most both in helping the video teams make their own videos, but also she hosted a video, which I hope she can talk about.
I'm really excited.
You know, Vice did really good videos.
We supported them.
Motherboard made really amazing series.
One that I really liked that Jason worked on very closely was about right to repair, a super viral video about how farmers fix their tractors and such.
I hope that we're able to match that level of quality.
It'll be hard.
It's going to take a while to hit that, but we have enough support now from our audience to at least try to do that.
And we obviously have plenty of interesting, worthwhile subjects to make videos about.
So I'm stoked.
Sam, what did you work on video-wise back at Vice/Motherboard?
I did an episode of Cryptoland, and my episode was about crypto being used for sex work.
So I went out to Vegas and went to, like, a content house, basically, for crypto, and it was really fun.
It was cool.
Yeah, obviously video documentaries are really hard.
It's a complicated process, so I'll be interested to see how it goes now that we're dipping our toe into doing our own.
But yeah, Jason's, I mean, the neon documentary is really good.
You should go watch it.
It's really interesting.
The main character in it, Geo, is such a character.
So I want to go hang out in his shop in LA.
Yeah, I mean, it's great.
And when we say we're doing it ourselves, it is very much that.
Jason took the podcast, sorry, the camera he usually uses for this podcast and went and filmed this, you know, short film.
So that's literally what we're doing.
And of course, as we get further into it, I'm sure maybe we'll bring help on and outside contributors and that sort of thing.
And that's all very much TBD down the line.
But hey, it's just an exciting time.
A couple of years after launch, as you say, we have the support where we can diversify and start to explore different stuff.
And I definitely have ideas of what I want to try to do.
All right.
Well, for this week, Emanuel, do you want to lead us through this first story?
Yeah, so our first story is from Joe.
The headline is: Hackers Dox Hundreds of DHS, ICE, FBI, and DOJ Officials.
So, Joe, what is this data exactly?
And what is the scale of the hack or dox or whatever you want to call it?
Like, how big of a problem is it, would you say?
Yeah, so the data itself, in this first story we're talking about, it's mostly DHS.
And of course, that includes ICE as well.
And then there's some FBI and DOJ as well.
But it's something like nearly 700 DHS officials in total.
And that includes their official email address, which, you know, helps verify that they're a government employee, their name, a phone number.
Sometimes that looks like a personal number, sometimes it looks like an office number, but there's definitely some personal ones in there.
And then addresses.
And again, this is also sometimes some sort of government facility.
But the ones I looked at, they sure look like residential addresses when you put them into Google and a Zillow listing comes up and it's like, oh, I don't think that's an official DHS facility or something like that.
So it's a straight-up dox.
And obviously, that is a lot of personal data that people could do various things with.
And in this case, it landed in the hands of a group of hackers who are financially motivated, I would say.
And I'm sure we'll get into that as well.
So the first thing we do when we find out about a leak or a dox like this is we try to verify that it's accurate.
Otherwise we wouldn't report on it.
How did you do that for this story?
Yeah, when a hacker reaches out and they say, I have this data, it could be absolute BS.
You have no idea if they've fabricated it and they're trying to do it for clout or prestige or maybe just to troll the journalist or whatever.
We've had it before, where hackers targeted TeleMessage, a Signal clone that the Trump administration uses, and we broke news of that hack.
Way back then, there was a bunch of data about Customs and Border Protection officials in there, like phone numbers, that sort of thing.
In that case, I just phoned them up.
I asked them, hey, is that blah, blah, blah from CBP?
And they say yes.
And then I say, my name is Joseph Cox.
I'm a journalist, blah, blah, blah.
And then usually they hang up immediately when I start, you know, actually speaking.
So I did that way back when.
And I could have done that for this, but I actually just felt like taking a different approach, which is that I took this dox provided by the hackers and then I turned to data provided by a company called District 4 Labs.
They have a search tool called Darkside, and I log in, and basically it brings together all of these different data breaches from all over the internet.
So back in 2016, I remember that there was a wave of breaches, LinkedIn and all of these other ones.
And if you were a journalist or a cybersecurity researcher, you had to go and download that data, then search for it yourself.
And maybe that was useful.
I seem to remember some stories from the time where, oh, I need to verify who this person is for this story a couple of years later.
Oh, let me go through the data breach of XYZ company.
You don't need to do that anymore because companies like District 4 Labs exist and they just bring it all together.
And then they allow researchers, or journalists, or threat intelligence analysts, or, I presume, government as well.
It allows them to log in, and you can just search all of these databases at once.
So I started taking phone numbers from the DHS and ICE dox, put them into this tool, and very quickly it was apparent that, oh, yeah, these are real doxes of ICE and DHS officials.
There were some interesting cases where I would put in the phone number and it would bring up the matching DHS email from a breach of a parking app.
And that shows, oh, that DHS official must have signed up to that parking app with their official email address.
The name matches, the phone number matches, all of that sort of thing.
Other ones where you would put in the address and get the phone number, or you put in the phone number and you get the address.
So that did indicate this data is legitimate.
Now, that said, there's the chance that these hackers have just repurposed all of that data that's already out there.
You know what I mean?
Like, oh, they also have the parking app data and now they're repackaging it and going, oh, look at us, we doxed ICE, we doxed DHS or whatever.
But when I was verifying, it was spread across so many different data sets that just didn't seem likely.
And sometimes the data didn't appear in previously breached stuff at all as well.
So it was basically building a mosaic and an understanding of, well, this data does appear to be legitimate.
It does appear to be new in a lot of cases as well.
And it really does appear to relate to specific government officials, including DHS, ICE, DOJ, and FBI.
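The cross-referencing idea described above can be sketched roughly in Python. This is only an illustration of the general technique, not the actual tool or workflow: the `Record` type, the `corroborating_sources` function, and every name, email, and phone number below are hypothetical placeholders, and the matching rule (same phone number plus a matching email or name) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One person's entry in a dataset (hypothetical shape)."""
    name: str
    email: str
    phone: str


def normalize_phone(phone: str) -> str:
    """Keep digits only so differently formatted numbers compare equal."""
    return "".join(ch for ch in phone if ch.isdigit())


def corroborating_sources(claimed: Record, datasets: dict[str, list[Record]]) -> list[str]:
    """Return the names of datasets that independently back up the claimed record.

    A dataset counts as corroborating when it holds a record with the same
    phone number AND either the same email or the same name.
    """
    target = normalize_phone(claimed.phone)
    hits = []
    for source, records in datasets.items():
        for rec in records:
            if normalize_phone(rec.phone) != target:
                continue
            same_email = rec.email.lower() == claimed.email.lower()
            same_name = rec.name.lower() == claimed.name.lower()
            if same_email or same_name:
                hits.append(source)
                break  # one match per dataset is enough
    return hits


# Entirely made-up example data.
claimed = Record("Alex Example", "alex@agency.example", "(555) 010-0000")
datasets = {
    "parking_app": [Record("Alex Example", "alex@agency.example", "555-010-0000")],
    "old_forum": [Record("Someone Else", "other@mail.example", "5550100000")],
}
matches = corroborating_sources(claimed, datasets)
print(matches)
```

The more independent datasets that agree on the same details, the less likely it is that the record was fabricated or simply copied from a single earlier breach.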
So, this story was a huge traffic bomb for us.
And I think the reason for that is that, in a different time, the fact that there was a hack or a dox of a bunch of people who work for DHS and ICE would not be that interesting to so many people.
But it's happening at a time, as we reported over and over again this year, where there's increased attention on DHS and on ICE because of the way that it's operating: masked agents in the streets, picking people up, deporting them, et cetera.
And with the timing of this dox, I think, if you don't read anything about the story, if you don't click through and read a single line of reporting, you might assume that there's some sort of hacktivist activity here or motivation to the dox.
That is not what is happening if you are familiar with the group behind the dox, which I think is the most interesting part of this story.
And I guess to start on that, Joe, it's so complicated.
Like, what this group is, the names it has, the motivation, the history of other breaches that they've been involved with.
All that stuff.
So to start this, I would challenge you, and you can use as many words as you want.
You don't have to be succinct because it is complicated, but what do you think is the most accurate, fair description of this hacking group?
I think this group, which is called Scattered LAPSUS$ Hunters, has what is obviously a very complicated name.
And I'll explain what that means in a minute.
But that group is a financially driven extortion gang.
That is what they do.
They breach companies.
They steal data.
They then try to extort maybe the individual customers where that data has been taken from, or the overarching technology and service provider, which in this case is Salesforce.
They're a financially driven extortion gang.
That's how we would categorize them.
I think that's fair, but I feel like if you were to just say that, you're missing some other important context, which is, I would argue, the fact that they are primarily English speaking, it seems, that they appear to be young, or that many of the members are young, and that they've been involved in real-world violence as well.
Yeah, go ahead.
Yeah, so that was definitely the bullet point.
But now to step back, and you're absolutely right, to give that context, I'm going to step several steps back.
So there's this thing called the Com, right?
Right.
And it's short for community.
And there's a lot of different ways to describe it.
I think recently I've called it like an online phenomenon.
It's almost like a cultural thing, almost an anthropological thing.
And what this is, is thousands and thousands of people on Telegram and Discord, usually English speaking, as you say: Canada, the United States, and the United Kingdom.
And it's gamers, it's people hanging out on these chat apps, people playing Minecraft, people playing Roblox is often how they get into this world.
And it might start with cheating in those games or stealing valuable Roblox items or something like that, until eventually people in the Com are hacking, they are doing fraud, they're scamming, they're doing SIM swapping.
At least they did a lot of that.
I'm sure they still do, but things have escalated somewhat now.
And it's this massive community, as I said, of thousands of people doing all sorts of things from trolling to beefing to crime, right up to, as you said, the commission of serious physical violence.
I've seen, well, we've all seen videos of people having their ears chopped off, attacks with hammers, all of this stuff, back when I covered the Com much more, like a few years ago.
I've just been focused on other things recently.
So they come from that ecosystem.
Now, out of the Com, again, because it's not like a group really, it's, again, like an anthropological thing.
Out of the Com, you get these various groups or nexuses of activity.
So a really famous one is going to be Scattered Spider, which was, you know, sort of the most well-known one, I would say.
They were amazing at SIM swapping, really effective at that.
They started joining forces with Eastern European ransomware gangs, and have been linked to the ransomware attacks on MGM Resorts, that sort of thing.
So, really going from stealing Twitter handles, because it's a proper noun and it looks cool to have @dark or whatever on Twitter, right up to hacking MGM and other companies.
So, you have Scattered Spider, you have LAPSUS$, which actually came before, and I covered them when they broke into Electronic Arts and stole a bunch of data from there.
That involved buying login tokens for Slack, which was also pretty novel at the time and lowered the barrier to entry for hackers again.
So you have LAPSUS$.
Then you have ShinyHunters, which is almost like a forum administrator, more of a data broker/seller, although it has connections to hacks as well.
And then that brings us to this latest evolution of this sort of ecosystem, which is this kind of specific group called Scattered LAPSUS$ Hunters.
And they are the ones that have posted these doxes of DHS and ICE officials.
And that sounds like a bunch of names, and it is just a bunch of names, but it shows that, you know, we're not just talking about Chinese state-sponsored hackers breaking into OPM anymore and stealing the data of U.S. officials.
We're talking about probably pretty young people who have escalated from stealing stuff in Roblox to becoming a top-tier, like, national security threat.
I mean, it's insane when you put it like that.
Yeah.
And notably, again, not hacktivists, like not looking to do good, not Anonymous, right?
It's like if you just read the headline and you assume that's what this is, it's not that.
To follow up on something, you know, in your brief definition of them, you said they're financially motivated, which is what made them dox these people.
Can you explain a little bit more how that would work?
And also, I believe there was another stated mechanism through which they thought they would gain profit.
Yeah.
So I don't think the doxing of DHS and ICE is specifically linked to the extortion.
But to give the context on that, this group, or people affiliated with it, whatever, broke into a bunch of databases which were using tech from Salesforce, which we all know.
A ton of companies were using this, Disney, Hulu, Toyota, UPS, all of these companies.
This group, again, people working with them, broke into those, stole all the data, and now they've been trying to extort Salesforce.
Maybe they've gone to individual companies as well, but they've gone more to Salesforce and said, we're going to leak all of your customers' data if you don't give us, you know, presumably millions of dollars or whatever.
Salesforce has said, we're not going to do that.
That's what Bloomberg reported.
And then when I've spoken to a member, they said, oh, yeah, we're sort of done with Salesforce and now we're just posting these doxes.
And the reason being, and I didn't include this in the article because, you know, I don't take it as a verified fact, but I feel like I can give people just a behind the scenes thing on the podcast, which is that one of the members told me they started doxing DHS and ICE because one of their friends got deported.
I don't know, man.
Like I asked, well, can you give me the name of the person who was deported?
Then I can verify it.
And they didn't want to provide that because presumably that could, you know, reveal their identity or something like that, which I get.
They're not going to hand that over, but that's why I didn't put it in the article.
So they are extorting Salesforce, or they were extorting Salesforce; now they're almost just doing stuff to cause damage.
And that's a key feature of these groups as well: they just do stuff to start fires sometimes.
I was going to give my completely subjective reading of it, but it almost feels as if they haven't been in the news for a minute.
And they were like, oh, you know what we could do to really get some clout right now?
Dox a bunch of DHS and ICE officials, because that's what everybody's talking about.
And they were right.
You know, it's like it did work.
And they have a history of trolling and making a mess just to make a mess.
So I can't prove it, but I definitely believe that that is a motivation here.
So you published the story we just talked about.
Then, a few days later, you published another story with the headline: Hackers Say They Have Personal Data of Thousands of NSA and Other Government Officials.
So, what did you learn to justify publishing this new story?
Yeah.
So, a little while before that, almost around the time of the original DHS and ICE doxing, these hackers also posted the apparent dox of a specific NSA official in the Telegram chat.
I mentioned that offhand in a Behind the Blog that we published on a recent Friday.
I then do this one.
Then again, this member reaches out and says, oh, we actually have much more data.
They say it's of 2,000 NSA officials, something like 22,000 in total across the US government.
And they start sending me the personal data of a lot of government officials.
And again, it's the same thing.
It's names, email addresses, phone numbers, and addresses.
And honestly, the number of agencies was dizzying.
There's the Defense Intelligence Agency, Federal Trade Commission, Federal Aviation Administration, the CDC, ATF, Air Force, State Department.
And what I did was I did much the same thing.
I took those pieces of personal data, turned to that Darkside tool again, and verified what I could.
Again, it looked legitimate.
So then we published that as well.
And initially, this story, at least in my head, was going to be more about the Salesforce connection because that was a question when they initially published the DHS and ICE ones.
Like, well, where are they getting this data?
It's probably the Salesforce stuff they stole, but I don't really know.
The member reached out.
That's what they said.
That was going to be the headline.
But then when they started sending all of this other personal data, I was like, oh, this is like a really broad, I think attack is too strong, like a broad incident with the US government.
It's not just a DHS and ICE thing anymore.
Any other agencies that you want to talk about that were impacted or anything else you want to say about where they got this data?
Yeah.
Well, I mean, there's FDA in there as well, Health and Human Services, and they mentioned State Department.
I think the last thing I would say is that I find it very interesting that, as I mentioned, several years ago, Chinese state hackers targeted the Office of Personnel Management, OPM, and they took all of this very sensitive data that, you know, could be used to identify, I think, undercover officials and that sort of thing.
When you attack the HR department of the US government, you're going to get a lot of very sensitive, interesting data.
That's why the Chinese hackers did that.
Here, I'm definitely not saying they're equivalent.
The OPM stuff would have been way more sensitive because there's like background checks and stuff in there as well.
But what's happened here is that a bunch of English-speaking, probably young hackers have managed to build dossiers on US government officials themselves, not by attacking the US government directly, but by going to all of these other companies that happen to use Salesforce.
And of course, it also reminds me of the Snowflake AT&T breach, where young hackers broke into a Snowflake database run by AT&T, which included the call and text metadata records of nearly all of AT&T's customers.
Again, that is something that you would usually attribute to top-level state-sponsored hackers from Chinese, Iranian, Russian intelligence, something like that.
And we reported, they did a lot of interesting stuff with that.
They looked up members of the Trump family in that data and various politicians, all of that sort of thing.
Almost the same thing is happening here.
It's not Snowflake this time.
It's Salesforce, but they've gone and got this very sensitive data, including, again, personal data of NSA officials and other intelligence community officials as well.
The Telegram channel is gone.
Now, people are saying it was shut down by Telegram.
That seems like the most likely explanation.
There are rumors that members have been arrested, but I haven't seen strong enough confirmation to, you know, put that in an article or anything.
But I don't know.
If you're starting to dox ICE, DHS, and NSA officials, I think that the US government is probably going to come knocking pretty quickly.
All right.
Shall we leave that there?
After the break, we're going to talk about one of Emanuel's stories.
He's going to bring us up to date on what's happening with Wikipedia and AI.
We'll be right back after this.
All right, we are back.
Emanuel, this is one you wrote.
The headline is: Wikipedia Says AI Is Causing a Dangerous Decline in Human Visitors.
So, Wikipedia is seeing this.
How big are we talking?
What's the scale before we get to the specifics?
Is it like, I don't know, half of all traffic to Wikipedia is bots or something?
Like, is this a big deal?
So, the hard number they gave me is 8%, 8% down compared to the same time last year, which I don't know what that sounds like to you.
I would say if we were to look at 404 Media's traffic and see that we were down for October, 8% year over year, we would definitely not lose our minds over that.
But you have to keep in mind that Wikipedia is one of the most popular sites on the internet.
I believe that they get something like 300 billion page views a year.
So, when you're talking about an 8% decline, you're talking about billions of page views, which is just like a massive shift in how traffic is routed around the internet.
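To put a rough number on that, here is a back-of-the-envelope Python sketch. It assumes, purely for illustration, that the 8% year-over-year decline applies evenly to the roughly 300 billion annual page views mentioned above; both figures are approximate, and the function name `views_lost` is made up for this example.

```python
def views_lost(annual_views: int, decline_percent: int) -> int:
    """Estimate page views lost, using integer math to avoid float rounding."""
    return annual_views * decline_percent // 100


# Approximate figures quoted in the conversation.
ANNUAL_PAGE_VIEWS = 300_000_000_000  # "something like 300 billion page views a year"
DECLINE_PERCENT = 8                  # "8% down compared to the same time last year"

estimated_loss = views_lost(ANNUAL_PAGE_VIEWS, DECLINE_PERCENT)
print(f"{estimated_loss:,}")  # prints 24,000,000,000
```

Under those assumptions, the decline works out to about 24 billion fewer human page views over a year.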
Yeah.
So, what does Wikipedia attribute this 8% decline to?
And to be clear, it's an 8% decline of human visitors.
Correct.
Yeah.
So, you know, they kind of do an audit of their traffic, and they have always done this, but now they do a lot more auditing of bot traffic in particular.
And they have ever-evolving methods for how to designate traffic as bot traffic, as many other sites do now because of the huge problem of AI scrapers that we've talked about on the podcast many times.
And what happened basically is that they noticed a lot of traffic, like an abnormal spike of traffic, coming from Brazil specifically.
And it was such a significant increase that it made them go back and investigate the traffic, which they previously assumed was human.
And upon further examination, they decided that that traffic was not human, that it was bot traffic.
And then they both reconfigured their methods for detecting bot traffic because of this investigation and then revised their traffic data.
And that's when they noticed that there was an 8% decline in human traffic.
Yeah.
So
I'm speculating and kind of putting words in the mouth a little bit, but it's almost like, wow, there's a big spike of readers from Brazil.
Maybe people in Brazil just got there was some Asian news event, and wow, everybody's checking out Wikipedia and they go and look at it.
it's like oh no it's actually a bunch of robots uh essentially
they they they didn't speculate or name a specific company then it was just hundred there's a lot of buck traffic coming from brazil for some reason and left it at that they did not name a specific company i think
as you probably know from reporting on uh hacking or even like video game hacking cheating they're kind of in a bind here uh obviously as a reporter i want to get as as much specific information as possible, and I pushed for that.
However, they are reluctant to share technical details about how this bot traffic works and how they detected it, because once they make that information public, it makes it easier for people who build scrapers to work around their protection and detection methods.
So they didn't explain exactly how it worked, only that they were confident that it was bot traffic.
Who that could be, your guess is as good as mine.
All I can say about that is that every single major AI company scrapes the internet for traffic.
There's a varying degree of openness about that practice, but everybody does it.
And that is before we get into AI companies from China and other countries that are less open about how they collect their data, and a bunch of really small projects that like nobody knows exist, but they are also scraping the internet to build AI products.
So that can be anyone.
However, kind of related and in addition to this bot scraping problem, when they talk about a decline in traffic, they also say that search engines presenting their own summaries of Wikipedia articles is diverting a lot of traffic away from Wikipedia.
And they don't say Google, but it's Google.
Obviously, the AI summaries are at the top of Google, right?
Not even AI summaries, yes.
And I recently wrote about a study from Pew that shows that like only 1% of people who are presented with an AI summary click through to the article, which makes that problem even worse.
But that is building upon Google's years-long practice of presenting like the Google snippet or knowledge panel, which basically does the same thing.
It just shows you a summary of something instead of throwing you a link to that website.
And Wikipedia is saying that obviously that is diverting traffic away from them.
Yeah.
So there's two things.
There's a spike in bot traffic.
In just that case, it was from Brazil, but of course it happens all over as well.
And then people just not going to Wikipedia because they're seeing some sort of summary from a search engine, that sort of thing.
And the bot traffic is building chatbots that are also diverting traffic, right?
So it's like, there's also an increase of people who are not going to Google, they're going to ChatGPT, asking a question.
The answer that is provided many, many times is just a summary of a Wikipedia article, but when you do that, you don't go to Wikipedia.
You get Wikipedia information, but you're not going to Wikipedia.
Right.
So the Wikimedia Foundation, which is a nonprofit that hosts Wikipedia, their senior director of product, Marshall Miller, wrote this blog post announcing it.
I think you actually got it under embargo, if I remember correctly.
And then you also spoke to Miller via email.
What did they tell you?
Like, did they elaborate on this at all?
Again, this is them being nice and like entertaining my questions, but the gist was like, we can't tell you exactly how we did this, only that we're very confident that this was bot traffic and that our bot detection methods evolve all the time.
And that this most recent spike was like very severe and something that we're confident is bot traffic.
But we can't tell you the gory details for the reasons I mentioned earlier.
Yeah.
So you mentioned the Pew research about only 1% of people clicking through when they're presented with a Google summary.
Does this line up with any other research you've seen recently?
Because it almost seems like we've covered this a bunch, or you specifically have covered this a bunch, and now Wikipedia is like coming out with a blog post basically saying, yep, here's another example of it.
Yeah, so there's the Pew research where they tracked, I forget what the sample size was, but they just like tracked the user behavior of people who were presented AI summaries on Google and saw that only 1% of those users clicked through.
That is building on top of a bunch of anecdotal reporting from and about media that they're seeing a decline in traffic because of Google summaries, because of chatbots.
And that is also building on top of previous self-reporting from Wikipedia saying, like, hey, we are dealing with like a massive spike in bot scrapers that is making it more difficult for our site to operate.
And we've heard the same thing.
I reported about this quite a bit, where it's not just Wikipedia.
It's like any store of information online, online libraries, museums, online archives, they're all getting slammed by these bot scrapers that are making it more difficult for their sites to operate.
Yeah.
So what is the actual impact on Wikipedia with all this?
Is it what you just alluded to, which is like, it's kind of getting overloaded and there's all of these bots?
I feel like it's probably not that because I think Wikipedia can handle that.
Or is it more, you know, fewer people are going through and they're not going to donate as much money?
Like, do you, do you know what the actual tangible impact is?
I think that's a great question.
And to be honest, when they first told me about this, that was the question that came to mind as well, because Wikipedia is free, right?
It's free.
It doesn't have ads.
What does it matter if you get the information from wikipedia.org or if you get it from a summary from ChatGPT?
And there are two reasons that Miller gave me.
And one is pretty obvious when you think about it, which is Wikimedia Foundation relies on donations.
The way they collect donations is you go to Wikipedia and there's a big banner, which people love to complain about on Wikipedia, which is like, hey, we're supported by donations.
Please give us money.
And it's like, that is an important way for them to raise money, just as it's a way for us to raise money, right?
It's like you go to our website, you read our article, and there are various what we call calls to action to support us financially.
And that's how we run our business.
So that makes perfect sense.
The other reason I thought was way more interesting, which is, there's a Wikimedia Foundation which kind of operates at a very high level and does this traffic analysis, writes these blog posts.
But like the governance and production of Wikipedia articles is not them.
Like the reason Wikipedia is such an amazing and still functional store of information is that it's community and volunteer run.
And it's just a bunch of people writing articles, debating them, editing them, coming up with the governance for how to moderate articles and all of that.
And if you don't have some sort of mechanism to feed more users into that ecosystem and have them graduate into being Wikipedia editors and volunteers, then the system dies, right?
It's just like Wikipedia has been around for a very long time now.
And at this point, like I talk to Wikipedia editors sometimes, they are younger than me.
And the reason they get into Wikipedia is like they're students at school, they research something, they start using Wikipedia, and they're like, oh, this is a really cool project.
How do I get involved?
And that's how Wikipedia works.
Like, that's what makes it valuable.
And if that process stops at a Google AI-generated answer or a ChatGPT-generated answer, the whole system breaks down.
And that seems to me the more serious concern for the Wikimedia Foundation: you have to let people into Wikipedia's actual site in order to keep that ecosystem of volunteering alive.
Yeah.
And just lastly, well, what is the solution here if one exists?
Is it just Wikipedia continues this cat and mouse game with bots?
Or, like, there's no, oh, if only we did this, the problem would be solved.
I mean, that's just not realistic.
It's an ongoing dynamic, right?
Yeah, I thought this was also a really interesting part of their announcement.
There's like a two-pronged approach to the problem, it seems.
And one is Wikimedia has official relationships with Google, with YouTube, with these generative AI companies.
And they don't give specifics here, but it sounds like, you know, at this point, sometimes when you go to YouTube and you watch a video, YouTube itself will like fact check a video, and what it will do is, you know, it gives you a little banner at the bottom that says something like, oh, well, people dispute the argument in this video, or this video was produced by so-and-so government, and it sends you to a Wikipedia page for that topic.
And that obviously is good for Wikipedia because it funnels people that way.
So it sounds like they want to build on those relationships, and they want to build those relationships with AI companies in particular, because, as we can see from the data, that funnel is not working to their advantage, or not working fairly given what they contribute and the traffic they get back.
So that's a piece of it.
That makes a lot of sense.
The other one I think was kind of more interesting because it was an appeal to personal responsibility.
It was sort of telling people, it's like, hey, it's like, do you like getting information online?
You probably get that information from Wikipedia.
You should consider visiting our website and you should consider becoming a volunteer.
And you should consider how this whole system makes the internet a much healthier, more productive, positive place.
And like, please get involved, which I don't know.
I'm not a huge fan of like personal responsibility arguments when it comes to like climate change or something like that.
But given that this is a community-run thing, I find that to be a compelling argument.
How many people hear it and how many people actually respond to it, I think remains to be seen.
But that was kind of his plea.
Yeah, bring in more humans to counter the increase in bots, essentially, in various forms.
All right, we'll leave that there.
If you're listening to the free version of the podcast, I'll now play us out.
But if you are a paying 404 Media subscriber, we're going to talk about OpenAI's pivot to a sex chatbot service.
I mean, it was always going to happen.
The writing was on the wall.
It took them long enough.
You can subscribe and gain access to that content at 404media.co.
As a reminder, 404 Media is journalist-founded and supported by subscribers.
If you do wish to subscribe to 404 Media and directly support our work, please go to 404media.co.
You'll get unlimited access to our articles and an ad-free version of this podcast.
You'll also get to listen to the subscribers-only section where we talk about the bonus story each week.
This podcast is made in partnership with Kaleidoscope.
Another way to support us is by leaving a five-star rating and review for the podcast.
That stuff really helps us out.
I'll read some more out soon.
This has been 404 Media.
We'll see you again next week.