The Case of the Missing US Data
In early February 2025, something strange started happening across US government websites.
Decades of data began disappearing from webpages for agencies such as the Centers for Disease Control and Prevention, the National Institutes of Health and the Census Bureau. In many cases the entire website went dark.
Within a few days some 8,000 government pages and 3,000 datasets had been taken down. Since then, many have been reinstated - but some have not.
We speak to Professors Maggie Levinstein and John Kubale to find out why this data was taken away, and why any of it matters.
If you spot any numbers or statistical claims that you think we should check out, contact: moreorless@bbc.co.uk
Presenter: Tim Harford
Producer: Lizzy McNeill
Series Producer: Tom Colls
Editor: Richard Vadon
Production Co-Ordinator: Rosie Strawbridge
Audio Mix: Neil Churchill
Transcript
Hello and thanks for downloading the More or Less podcast.
We're the program that delves into data, steeps itself in statistics and mulls over mathematical mysteries.
And I'm Tim Harford.
In early February 2025, something strange started happening across US government websites.
Error.
Decades of data began disappearing from web pages for agencies such as the Centers for Disease Control and Prevention, the National Institutes of Health, and the Census Bureau.
In many cases the entire website went dark.
Within a few days some 8,000 government pages and 3,000 data sets had been taken down.
Since then many have been reinstated but some have not.
A curious case indeed.
As data sleuths we have questions.
Why was this data taken away?
And why does any of it matter?
Let us delve into the case of the missing U.S. data.
Lots and lots of data sets disappeared from access because entire websites for agencies of the government came down because of executive orders from the president questioning what could be made available.
I'm Maggie Levinstein.
I'm a professor at the University of Michigan and director of ICPSR, the data consortium that's based here.
At the start of President Trump's second term, he signed a flurry of executive orders.
For example, eradicating measures promoting diversity, equality and inclusion in federal government.
The order targeted websites over language about minorities, indigenous peoples, race, gender, sexuality and disability.
Climate change was another banned topic.
Because agencies weren't sure that their websites would be consistent with the policies, they just brought the entire websites down.
And that meant that data that had been collected and was accessed through those websites became inaccessible.
While this was happening, the Department of Government Efficiency was plowing on with their goal of cutting $2 trillion from government spending.
They haven't got anywhere close to those savings, by the way.
Lots and lots of people were fired, and those people are often people who produce data or people who provide access to data.
And without those people in place, access was impossible.
And there also were budget cuts that were affecting data access and data collection going forward.
Now most of the government websites did eventually reappear.
Some data sets were fully restored, but some data sets were discontinued and some reappeared, although researchers soon noticed they'd been changed.
There were actual decisions for particular data sets to alter them to obscure information about gender identity or about vaccines or about sexual behavior or about climate. Those are policy areas where there's a lot of data that the administration didn't want people to have access to.
And so those data were removed or were altered.
When data goes missing or is altered, it can have very real life consequences.
For example, some might see the eradication of the non-binary gender category as purely a symbolic move.
But we know that teenagers who identify as non-binary are more likely to be more vulnerable to certain risks than their peers.
That's something we couldn't have known without collecting the data in the past and without the data now.
Are we providing the appropriate social supports to them?
It's very hard to know without the data that was being collected about the mental health of young people and their gender identity and their risk and their sexual behaviors.
When we don't have that information, it's much harder to provide the resources that people need to live full and healthy lives.
The data at risk spans all areas of life and society.
Recently, the Bureau of Labor Statistics announced that when calculating CPI, the Consumer Price Index, they would stop collecting the data from three cities due to budgetary cuts.
Businesses rely on these and they rely on them to be high quality measures.
When we reduce spending and the resources that the BLS has, then we get lower quality measures.
And as a result, businesses are kind of much more likely to be flying blind.
Many websites also now have government-mandated disclaimers on their homepage throwing doubt on the data.
For example, if you go on the CDC, the Centers for Disease Control and Prevention's homepage for national HIV surveillance systems, you're met with this.
Per a court order, HHS is required to restore this website as of 11:59 p.m. Eastern Time, February 11th, 2025.
Any information on this page promoting gender ideology is extremely inaccurate and disconnected from the immutable biological reality that there are two sexes, male and female.
The fate of other datasets is uncertain.
For example, a decades-old dataset that monitors birth outcomes in the US, PRAMS.
PRAMS, or the Pregnancy Risk Assessment Monitoring System, is kind of the gold standard for assessing and monitoring maternal and child health in the US.
That's an area where the US has consistently lagged behind other high-income countries.
That's John Kubale, a research assistant professor at the University of Michigan who specializes in epidemiology.
The Department of Health and Human Services states that the purpose of PRAMS is to reduce infant morbidity and mortality by influencing programs and policies aimed at reducing health problems among mothers and infants.
As an infant is around three times more likely to die in the US than in Finland, Japan or Sweden, this seems like a pretty important data set.
It's something that is used both by policymakers and by researchers really to understand how are different aspects of maternal and child health changing over time and what populations are most impacted.
That's really important information that you need to have if you're going to be considering how to design interventions to address some of those challenges.
This data is collected by local states and governments and then sent to the CDC to be merged and for population weighting to be added.
To access the data, researchers needed to apply for access and sign a declaration stating they wouldn't share the data.
I started getting contacted by researchers and essentially these requests were just going unanswered.
I submitted multiple requests myself.
I reached out to that office directly.
You know, I never heard any response.
And so essentially any of the data from the last 10 years, so 2016 or later, that's not publicly available.
And so if you did not have access to that already, you're kind of out of luck at this point.
Do we know whether they're still collecting the data?
Yeah, there is some uncertainty around that.
There was an announcement, I believe, in early to mid-February saying, oh, we're going to discontinue this data collection.
And there was a massive backlash and outcry.
And there was kind of a clarification, oh, no, we're not going to discontinue this, but we're going to kind of push it back until April 2025, and then we'll start it up again.
But April 1st, the entire PRAMS office was essentially fired.
And so now it's really uncertain what this means for this data collection moving forward.
Since February, there's been a scramble to try and safeguard as much data as possible.
Both Maggie and John are involved in this process.
But snapshots of old web pages do not a full data set make.
So there's some data that you can't get to at all.
There's some data that we don't know is gone yet.
There's also data that's been preserved, but we don't know for sure that it hasn't been changed.
And you also don't have the context, what we call metadata, the description of the data, all the things that allow you to make inference from the data.
The case remains open.
Thanks to Maggie Levinstein and John Kubale.
That is all we have time for this week, but if you have any questions or comments, please email in to moreorless@bbc.co.uk.
Until next week, goodbye.