Tech and AI: 5. Big Data

14m

Big data is information so vast and unwieldy that the software traditionally used to analyse it just can't cope.

It's not new. Ever since humans organised ourselves, we've been keeping records, and gathering information, from scrolls, to bank ledgers and now your internet browsing history.

But this information is now flowing into organisations at a faster rate and in greater volume than ever before. Taking a good look at it could provide meaningful insights. But how can you spot patterns in the chaos, without the job taking forever? And what insights could this mega data analysis provide?

Technology has already completely altered our lives, and Artificial Intelligence may transform our world to an even greater degree. This series is your chance to get back to basics and really understand key technology terms. What's an algorithm? Where is "the Cloud" and what exactly is Blockchain? What's the difference between machine and deep learning in artificial intelligence, and is it just our jobs under threat, or is it much worse than that? And before we get to the destruction of humanity, should we all be using Bitcoin? Our experts will explain in the very simplest terms everything you need to know about the tech that underpins your day. We'll explore the rich history of how all these systems developed, and where they may be going next.

Presenter: Spencer Kelly
Producers: Ravi Naik and Nick Holland
Editor: Clare Fordham
Production Coordinator: Janet Staples

Transcript

BBC Sounds, music, radio, podcasts.

Welcome to Understand Tech and AI, the podcast that takes you back to basics to explain, explore, unpick, and demystify the technology that's becoming part of our everyday lives.

I'm Spencer Kelly from BBC Click and you can find all of these episodes on BBC Sounds.

Every time you use your payment card, the transaction is recorded.

Every time you use your phone, what you do with it is stored somewhere.

In fact, it's probably recording your location right now.

If you have a smart TV, if you watch YouTube, if you stream music, if you read news or post on social media, you are leaving a trail.

A trail of data that over time has become huge.

And because we're all doing it, it's actually massive.

But that's not necessarily a bad thing because when you can see everything, it's possible to spot patterns and solve problems that you couldn't notice at close range.

Welcome to the world of big data.

And joining us in this vast place is Julia Kasmire, a research fellow for the UK Data Service and an expert on big data.

Welcome.

Thank you very much.

You are a computational social scientist for the UK Data Service.

First question, what on earth is that?

Yeah, I get this question a lot.

I would say that a computational social scientist is someone who uses computationally intensive methods to address social science questions.

So things that we might traditionally address through surveys or studying diaries or literature analysis.

Instead, I will suggest that people also use things like tracking movement through phone GPS or heart rate sensors, things throughout the environment like temperature sensors or cameras or air pollution sensors.

And we can use those accurate readings along with the responses from surveys to give much more detailed, insightful pictures into what's actually happening and how people feel about it.

Where is all the data that you look at coming from?

Data comes from a lot of different places.

It comes from sources that people generate, so actions that people take, maybe when they're surfing online or answering the phone or moving through a space, but it also comes from static sensors that measure the world around us.

A really good example is household energy usage and smart meters.

So we can use these sensors that are really just registering how much energy is flowing in and out of a house.

But because people behave in very patterned ways, we can tell maybe how many people are home at a time or whether they're using one source of energy or another or even whether they're boiling the kettle.
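As a rough illustration of how a behaviour such as boiling the kettle might show up in smart meter readings, here is a minimal Python sketch. It is not how any real meter or the UK Data Service works: the function name, the sample readings and the 2,000 watt threshold are all assumptions made up for the example.

```python
from typing import List


def find_kettle_like_spikes(watts: List[float], jump: float = 2000.0) -> List[int]:
    """Return the positions where power use rises sharply from one reading to the next.

    `jump` is an assumed threshold in watts, chosen only for illustration.
    """
    return [
        i
        for i in range(1, len(watts))
        if watts[i] - watts[i - 1] >= jump
    ]


# Hypothetical household readings in watts, one per minute.
readings = [300, 320, 310, 2900, 2950, 330, 315, 3100, 340]

spikes = find_kettle_like_spikes(readings)
print(spikes)  # positions where a large appliance appears to switch on
```

A sudden jump only suggests that something power-hungry was switched on. Telling a kettle from an oven or a shower would need more context, such as how long the spike lasts.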

That's an awful lot of data.

I mean, there's no mystery where the word big data comes from.

It's a term for whenever data is too much for people to handle on their own.

So, you know, I might get home at the end of the day and empty out my pockets and look at the three receipts and tot up how much I've spent in the day.

But I wouldn't want to do that in my head or even with a calculator if I had a warehouse full of receipts.

And that's when we get into big data.

It's about volume or complexity or even tediousness.

If you don't want to do it by hand, it's probably big.

The phrase big data does imply you've just got lots and lots of data that you hang on to.

But this is not just about storing it.

It's about kind of looking at it in many different ways, is that right?

It is, yeah.

It's about not only storing, but using.

So, if you think of data as accumulating or flowing, it's a little bit like a river, maybe a small stream, you know, that's maybe the kind of data we're used to.

Well, we still need water, so we might stand at the edge with a bucket and just sort of dip in and out as we need.

You know, we want to get useful amounts of data out of the stream and not get everything because we quickly become flooded.

Yeah.

And if we were doing a survey as humans, we could probably cope with a few hundred responses and we might have a specific question in mind.

I'm imagining that if you put computers on this, which run very, very fast, never sleep, you don't have to feed them, etc., they can just interrogate that data in many different ways that we wouldn't be able to.

I think of it like looking through the data in different directions to see whether there's a connection to do with location or whether there's a connection to do with type of diet.

Is that the sort of thing that you can do?

You can just ask lots of random questions of the same data?

Yeah, you can.

And with really good pattern recognition, you can get some very interesting things.

Some that might be surprising or some that might reinforce your own expectations.

An example that I think of is my fitness software on my phone.

So I've got a smartwatch and it's constantly reading a lot of my data about how I'm moving and even how warm my arm is and what my heart rate is doing, all kinds of things.

And I don't want the raw numbers from that.

I don't want to just know what my heart rate is at any given point.

I want to know whether it's accelerated at certain times of the day or whether certain days of the week it's showing more variability than others.

I want it to tell me, oh, every Monday morning you're really anxious.

Maybe you should, you know, just take a walk before work.
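The kind of summary described here, turning raw heart rate numbers into a pattern by day of the week, could be sketched in Python roughly as follows. This is only an assumed, simplified version: the function name and the sample readings are invented, and real fitness software will do something far more sophisticated.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def mean_heart_rate_by_weekday(samples):
    """Group (ISO timestamp, beats per minute) pairs by weekday and average them."""
    by_day = defaultdict(list)
    for timestamp, bpm in samples:
        day = DAY_NAMES[datetime.fromisoformat(timestamp).weekday()]
        by_day[day].append(bpm)
    return {day: mean(values) for day, values in by_day.items()}


# Hypothetical smartwatch readings.
samples = [
    ("2024-01-01T09:00:00", 92),  # a Monday
    ("2024-01-01T09:05:00", 96),
    ("2024-01-07T09:00:00", 64),  # a Sunday
    ("2024-01-07T09:05:00", 68),
    ("2024-01-08T09:00:00", 94),  # the following Monday
]

averages = mean_heart_rate_by_weekday(samples)
highest_day = max(averages, key=averages.get)
print(highest_day, averages[highest_day])
```

With these made-up readings, Monday comes out with the highest average, which is the sort of finding the software could turn into a suggestion.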

Well, hmm, a thought.

Yeah.

And my imagining of big data is it's not just about looking at your data, but it's about collecting millions of people's heart rates and seeing if there are patterns connected to, I don't know, where they live or the type of job.

Is that something that you've been able to do?

Yeah, absolutely.

That's the kind of thing that's really useful.

You could, for example, compare someone's mobile screen time to their heart rate and see whether using their phone makes them more or less anxious.

That would be an interesting question to ask, especially if you could link it to particular apps.

Am I more anxious when I'm looking at social media versus when I'm playing a game?

How do we see the patterns change over time?

That might also be interesting.

Are people always more anxious on a Monday than a Sunday?

Always.

Now, in each episode, we like to take a break and hear from a tech historian.

He is Dr. James Sumner from the University of Manchester.

And here he is with a look at how we used to deal with all that data before we had data scientists.

All data is big data if you don't have a system to deal with it.

A famous example is the 1890 United States Census.

A growing population meant that census results were taking longer and longer to process.

What got the project back on track was the punched card tabulating machine invented by Hermann Hollerith.

Instead of being taken down in writing, information was now encoded as a pattern of holes punched into stiff cards.

The cards could then be fed into a sorting machine with mechanical probes that felt for the holes and automatically counted up the totals for various categories.

Hollerith's company evolved into the computing giant IBM.

Even before punched cards, there were highly organized data systems made possible not by machines, but by cheap labor.

The Post Office Savings Bank, founded in the 1860s, had no machinery at all till the 1920s, but successfully maintained a nationwide network using two written copies of every account holder's balance records, one at their local branch and one at the central office in London.

There was even an equivalent of password protection, as anyone making a withdrawal had to sign a form that would be sent down to London on a mail train to be compared with a sample signature.

In many ways, computerization didn't greatly change the methods of data management.

It just speeded them up.

Historically, it often felt like data systems were fighting just to stay on top of the rising tide of information.

But today's cheap, fast processors have given data scientists a lot more breathing space to explore whole new approaches to what the data means.

That was Dr. James Sumner.

Now, Julia, can you give us a sense of how much data each of us gives off every day, week, year?

Too much to count.

Way, way too much to count.

I think it's probably changing too fast to be able to say with any kind of certainty.

And I guess we're producing more of it every year, right?

Yeah, not only are we producing more than we can really use, but we're increasing the amount that we produce all the time.

Would I be right in thinking that best practice for big data scientists is not to only take the data you know you want, but to take it all because you might think of a way of using it in a year or two's time?

This is an interesting question and it kind of depends on your resources.

So if you have the space to hold a lot of data, you might want to get more than you think you need.

But yeah, there's also reasons not to keep data that you cannot possibly imagine a use for, because there's lots of interesting data that we can imagine uses for that also needs a place to be stored.

I have this romantic notion that in the future, if we collect enough health data, enough lifestyle data about everyone, a computer that's just constantly interrogating it and looking at it from different angles might realize that, hang on, all people born in the winter who mainly eat raw vegetables, and I'll get silly here, and only wear natural fibers, don't get Alzheimer's.

Or something crazy like that, something that we would never think to ask, but the data's in there if only we could look at it from a certain angle.

Is that sort of thing ever going to be possible, do you think?

Do you think the patterns for preventing serious disease are in there?

That's tricky.

I think it's possible that we might do something like that.

I think we couldn't rely on systems to take everything and predict useful things in real time, at least not right now, because it takes so much work to ask good questions of big data.

It takes a really long time to find those patterns and then to isolate the patterns and understand what's going on with them because patterns just indicate a relationship.

It doesn't say what causes what or how you might address it.
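To make the point about patterns concrete, here is a small Python sketch that measures how strongly two sets of numbers move together, using the standard Pearson correlation coefficient. The screen time and heart rate figures are invented for illustration, echoing the earlier example of comparing phone use with heart rate.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: +1 means the values rise together, -1 means one falls as the other rises."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical figures for five people.
screen_minutes = [30, 60, 90, 120, 150]
heart_rate_bpm = [62, 66, 69, 75, 78]

r = pearson(screen_minutes, heart_rate_bpm)
print(round(r, 2))
# A value close to 1 says the two measures rise together.
# It does not say whether screen time raises heart rate, whether anxious
# people reach for their phones, or whether something else drives both.
```

The number flags a relationship worth investigating. Working out the cause still takes the careful questioning described above.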

But it's a starting point, I suppose.

If it is noticed that a certain type of person with a really weird combination of factors doesn't get a certain disease, that's a starting point to go asking why.

Yeah, absolutely.

And we can probably do some of that now.

And I think we will definitely be doing more of it in the future.

I'm confident that certainly the big issues can be addressed through this kind of use of big data.

You know, really common problems, things that affect very large sections of the population.

There's a concern that we might not pay enough attention to serious issues that affect limited numbers of the population.

Then it won't pop up in the big data and we'll miss it entirely.

And that may be a health inequality thing that we ought to be concerned about if we rely too much on big data.

I would imagine that things like health data are particularly sensitive.

Most people don't like the idea of their health data being shared, being interrogated.

Is that a reasonable fear?

It is if they're identifiable through their health data.

And you have to contrast that with the potential benefits that might arise by using lots and lots of people's data without any effort to identify individuals to recognize patterns.

So, for example, if lots of people's heart rates suddenly go up, that might be an indicator that something is happening where maybe people are getting sick or there's some kind of incident in the city that has created a lot of noise and people are responding to that.

You might be able to detect almost in real time important things that are happening that need addressed.

Julia, thank you so much for filling us in on the joys of data.

Thank you very much.

I'm happy to share my data joy.

So you've collected all this data, masses of it, and your computers are able to spot all of those patterns and make all of those lovely connections.

But there is more that you can do.

What if, instead of asking the computer to look for stuff that has something in common, you told it that certain things all had something in common?

What if you told it that all those pictures over there are cats and all those pictures over there are human faces?

Then you would be teaching it and it would be learning.

Congratulations.

You have just entered the world of artificial intelligence and that's where we're going in the next five episodes.

We'll see you there.
