The pioneers of proof

8m

Here at More or Less, we’re all about the facts. Every day we use a toolkit of known proofs to try and answer our listeners’ questions. But who do we have to thank for this toolkit, and how did they set about proving the unknown?

Luckily for us, mathematician Adam Kucharski has just written a book about this very topic called ‘Proof: The Uncertain Science of Certainty’.

Join us to hear more about some of the proof pioneers included in his book, from estimating the number of German tanks during WW2 to an unsung heroine of statistics.

Presenter: Tim Harford
Producer: Lizzy McNeill
Series Producer: Tom Colls
Editor: Richard Vadon
Production Co-ordinator: Brenda Brown
Sound Mix: Annie Gardiner


Transcript


Hello and thank you for downloading the More or Less podcast.

We're the program that delights in data, marvels at maths and swoons over statistics.

And as ever, I'm Tim Harford.

Here at More or Less, we are constantly fact-checking wild claims, but proof can be a strangely difficult beast to capture.

Sometimes you don't have enough data, sometimes it seems to be a matter of opinion, and sometimes, just sometimes, the proof of the pudding is in the eating.

Every day we use a toolkit of known proofs to try and answer our listeners' questions.

But who do we have to thank for this toolkit, and how did they set about proving the unknown?

Luckily for me, Adam Kucharski has just written a book about this very topic called Proof: The Uncertain Science of Certainty.

Adam is a mathematician and professor of epidemiology at the London School of Hygiene and Tropical Medicine.

I sat down with him to hear more about some of the proof pioneers included in his book.

We start in the lead-up to D-Day in 1944.

Here, on what was to be the eve of the greatest military operation of all time, the Allies were trying to predict what the Germans might have waiting for them in occupied France, and in particular a fearsome new tank called the Panther Mark V.

Just how many of these tanks had the German economy been able to produce?

And the Allies had only managed to get hold of two captured tanks.

So the British and the Americans have got one and the Russians have got another.

Intelligence reports estimated some 1500 of these tanks were being built every month.

The statisticians weren't so sure.

They started to take them to pieces, and they noticed that the little wheels that keep the tracks in place have these rubber tyres on them.

There are 24 on each side.

And then, looking at these rubber tyres, they realise that each tyre has a serial number.

So they start to get a sense of, well, can this tell us how many tank moulds the manufacturers have?

And that tells us something about their manufacturing capacity.

And we can use that to estimate.

What they did is they had, for one manufacturer, about 20 serial numbers and the highest was 77.

So I think everyone intuitively could feel there's probably not thousands of these serial numbers.

And using this method, essentially estimating from the fact that we've observed 20 and the biggest is 77, they worked out there were probably about 80 moulds at that manufacturer.

If you've only seen one, then maybe there are 150, but we're really not sure.

Exactly.

And as you see more and more, and they're all less than 77, it becomes less plausible that suddenly there are loads of higher numbers that you just by chance haven't observed.
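The transcript doesn't spell out the wartime formula, so this is a hedged sketch rather than the method the statisticians actually used. One standard textbook estimator for the German tank problem, assuming serial numbers run from 1 up to an unknown total N and the observed ones are a uniform random sample without replacement, is N ≈ m + m/k − 1, where m is the largest serial seen and k is how many were seen. The function names below are hypothetical.

```python
from math import comb


def tank_estimate(max_serial: int, sample_size: int) -> float:
    """Estimate the total N from the largest observed serial number.

    Assumes serials run 1..N and the sample_size observed serials were drawn
    uniformly at random without replacement (the textbook German tank setup).
    """
    if sample_size < 1 or max_serial < sample_size:
        raise ValueError("need sample_size >= 1 and max_serial >= sample_size")
    return max_serial + max_serial / sample_size - 1


def prob_all_at_most(max_serial: int, sample_size: int, total: int) -> float:
    """Probability that every one of sample_size distinct serials drawn
    at random from 1..total is no larger than max_serial."""
    if total < max_serial:
        raise ValueError("total must be at least max_serial")
    return comb(max_serial, sample_size) / comb(total, sample_size)


if __name__ == "__main__":
    print(tank_estimate(77, 20))  # roughly 80 moulds
    print(tank_estimate(77, 1))   # a single serial of 77 gives 153.0
    for total in (80, 100, 150):
        print(total, prob_all_at_most(77, 20, total))
```

With 20 serials topping out at 77, the estimate is 77 + 77/20 − 1 ≈ 79.85, in line with "about 80". The second function shows why more observations tighten things: if the true total were 80, the chance that 20 random serials all come in at 77 or below is (60·59·58)/(80·79·78) ≈ 0.42, and that chance shrinks quickly as the assumed total grows.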

And what they did, using this method and their understanding of the manufacturing process, was estimate that the Germans were probably making about 270 a month in early 1944.

The statisticians assumed that the tank serial numbers started at one and counted up from there.

With 20 serial numbers, none of which are higher than 77, they combined this with information about production rates from British tank manufacturers, used a bit of statistical intuition, and came out with the estimate of 270 tanks a month.

That was way below the intelligence estimate of 1500 a month.

Had they predicted too low?

And as it happened, on D-Day they ended up facing a large chunk: about 40% of the tanks they faced were Panthers.

And later, when they found out the actual manufacturing numbers, the real number was 276.

Wow.

So they estimated 270, and the real number was 276.

So they basically nailed it.

Just from looking at tyres.

As we all know, D-Day was a success and a pivotal moment during the war.

Some 76 years later, Adam found himself predicting numbers in a high-pressure situation using very similar methods.

There were a few situations when we were working on COVID, as epidemiologists trying to understand data, where we might have had fragments of data and only some observations on cases or tests, for example.

And looking at that, it struck me this is just a version of the German tank problem.

We have this unknown total and we have some fragmented observations drawn from it.

You've got a few cases, you've got a few tests, you've got some positive tests, some negative tests.

You're trying to work out how many people are infected or how accurate the test is. And actually, for a very quick, rough ballpark, it's just a really simple calculation that can get you there, and then at least it informs the scale of what you might be dealing with and what sort of follow-up you want to do.

News you can use.

Our next proof pioneer is a scientist responsible for two of my absolutely favourite things: cohort studies and case control studies.

Born in rural Lincolnshire on the 3rd of February 1877, Janet Lane-Claypon grew up in a wealthy family.

Originally homeschooled, she was always noted as being very bright.

In 1899, she moved to London and gained a first-class degree from University College London.

She then got her PhD in physiology and a medical doctorate in 1910.

Indeed, very, very bright.

She became particularly interested in child health, and one of the questions was the nutritional benefits of breast milk versus, at the time, boiled cow's milk.

But she couldn't find the right kind of data set in England.

So she went to Berlin and, from data that had already been collected across a series of clinics in working-class areas, pieced together different groups of children that had been fed breast milk and cow's milk, followed them over time in the data sets, and looked at what happened to them.

And this is a method we still use today in a lot of health studies.

It's known as a retrospective cohort because you're looking backwards: you identify these groups and then reconstruct what happened to them subsequently.

So a really kind of powerful idea for getting what she needed to explore this problem.

Obviously the ideal situation would be a large-scale randomised controlled trial.

However, the first properly controlled medical trial wasn't conducted until the late 1940s.

In any case, one benefit of a retrospective cohort study was that she was able to get answers very quickly, as instead of observing what happens in the present, you're looking back at what has already happened.

So one of the things she was particularly interested in was just how much the children grew over time, as a measure of how much nutrition they were getting.

And from an initial look at the data, it seemed that children who were fed breast milk were growing more than those who were fed boiled cow's milk.

But she realised there were some limitations here.

So one of them was that maybe there's another factor that's influencing both what they're given to eat and their health and well-being.

So she thought maybe family income: even within a similar area, there might be different aspects of wealth that influence both the probability you give them breast milk versus cow's milk and their overall health over time.

And so she adjusted for this.

She said, well, okay, let's account for that difference between the groups to make a fair comparison.

And there's still that difference in growth, even when you account for those differences in income.

And nowadays, we call this a confounder in statistics.

So a confounder is some factor that influences both the thing you're exposed to, in this case, diet, and your outcome, in this case, growth.
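As a hedged illustration of adjusting for a confounder, here is a minimal Python sketch of one simple approach, stratification: compare the growth of breast-fed and cow's-milk-fed children within each income group, then combine the within-group differences. The records are invented toy numbers, not Lane-Claypon's Berlin data, and weighting each stratum by its size is an assumption; it is not necessarily how she made her adjustment.

```python
from collections import defaultdict

# Hypothetical records: (income_group, feeding, weight_gain_grams).
# These numbers are made up purely for illustration.
RECORDS = [
    ("low", "breast", 600), ("low", "breast", 640),
    ("low", "cow", 560), ("low", "cow", 540), ("low", "cow", 580),
    ("higher", "breast", 700), ("higher", "breast", 720), ("higher", "breast", 680),
    ("higher", "cow", 660),
]


def mean(xs):
    return sum(xs) / len(xs)


def crude_difference(records):
    """Breast minus cow mean growth, ignoring income entirely."""
    breast = [gain for _, feeding, gain in records if feeding == "breast"]
    cow = [gain for _, feeding, gain in records if feeding == "cow"]
    return mean(breast) - mean(cow)


def stratified_difference(records):
    """Breast minus cow mean growth within each income stratum,
    averaged with weights equal to each stratum's size."""
    strata = defaultdict(lambda: {"breast": [], "cow": []})
    for income, feeding, gain in records:
        strata[income][feeding].append(gain)
    weighted, total = 0.0, 0
    for groups in strata.values():
        if not groups["breast"] or not groups["cow"]:
            continue  # a stratum with only one feeding group offers no comparison
        n = len(groups["breast"]) + len(groups["cow"])
        weighted += n * (mean(groups["breast"]) - mean(groups["cow"]))
        total += n
    return weighted / total


if __name__ == "__main__":
    print(crude_difference(RECORDS))       # 83.0
    print(stratified_difference(RECORDS))  # about 51.1
```

In this made-up data the crude difference is 83 grams, but the income-adjusted difference is about 51 grams: smaller, because the wealthier families both breastfed more and had faster-growing children, yet still present, which is the pattern described above.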

After her research into mother's milk versus moo cow milk, she was commissioned by the early medical council to look at the risk factors associated with breast cancer.

Again, because breast cancer is one of those things that develops over a very long period of time, and these can be quite rare events in a population, she wanted an answer faster. What she did was, in London and Glasgow, look at about 500 people who had developed cancer, and then at 500 so-called controls who had attended hospital, but for other reasons.

So they were very similar individuals by age and other characteristics, but they didn't have cancer.

And then she looked at what in the history of these individuals might tell you something different.

And one of the things that jumped out was differences in the number of children that they'd had.

In particular, the women who'd had cancer generally had fewer children.

And again, confounders are a potential issue here.

So she accounted for things like their age and how long they'd been married, but again found this signal between the data sets.

And we now call this a case control study.
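The transcript doesn't say how the case and control histories were compared numerically. A common modern summary for a case control study is the odds ratio, and one standard way to adjust it for a confounder such as age is the Mantel–Haenszel estimator. Below is a minimal Python sketch; the age groups, the "few or no children" exposure category and all counts are hypothetical, not figures from Lane-Claypon's study.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """2x2 counts for one age group. 'Exposed' means a hypothetical
    'few or no children' category; all numbers used below are invented."""
    exposed_cases: int       # a: cases with few children
    unexposed_cases: int     # b: cases with more children
    exposed_controls: int    # c: controls with few children
    unexposed_controls: int  # d: controls with more children

    @property
    def total(self) -> int:
        return (self.exposed_cases + self.unexposed_cases
                + self.exposed_controls + self.unexposed_controls)


def odds_ratio(s: Stratum) -> float:
    """Odds ratio a*d / (b*c) for a single 2x2 table."""
    return (s.exposed_cases * s.unexposed_controls) / (
        s.unexposed_cases * s.exposed_controls)


def mantel_haenszel_or(strata: list[Stratum]) -> float:
    """Age-adjusted odds ratio pooled across strata (Mantel-Haenszel)."""
    num = sum(s.exposed_cases * s.unexposed_controls / s.total for s in strata)
    den = sum(s.unexposed_cases * s.exposed_controls / s.total for s in strata)
    return num / den


def pooled(strata: list[Stratum]) -> Stratum:
    """Collapse the strata into one table, ignoring age entirely."""
    return Stratum(
        sum(s.exposed_cases for s in strata),
        sum(s.unexposed_cases for s in strata),
        sum(s.exposed_controls for s in strata),
        sum(s.unexposed_controls for s in strata),
    )


# Hypothetical counts: cases skew older, and older women are more often "exposed".
STRATA = [
    Stratum(exposed_cases=20, unexposed_cases=30, exposed_controls=50, unexposed_controls=150),  # younger
    Stratum(exposed_cases=90, unexposed_cases=60, exposed_controls=30, unexposed_controls=40),   # older
]

if __name__ == "__main__":
    print(odds_ratio(pooled(STRATA)))   # crude, age ignored
    print(mantel_haenszel_or(STRATA))   # adjusted for age
```

With these invented counts the crude odds ratio is about 2.9, while within each age group it is 2 and the Mantel–Haenszel estimate is 2.0: accounting for age shrinks the association without erasing it, the same kind of persistent signal described above.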

Some of the things she discovered about risk factors for breast cancer are still cited today and epidemiologists still use case control studies to understand differences in risk.

Thanks to Adam Kucharski, author of Proof, and to all the proof pioneers of the past who've made our lives that little bit more predictable, in the best of ways.

That is all we have time for this week, but please do keep your questions and comments coming in to moreorless@bbc.co.uk.

We will be back next week.

And until then, goodbye.
