Big Data

Transcript

Speaker 7 This is the BBC.

Speaker 1 Hello, I'm Robin Ince.

Speaker 8 And I'm Brian Cox.

Speaker 8 Today's show is all about big data and how the information you unwittingly offer up every minute of your life is being used to shape the environment around you and even to decide what you need.

Speaker 1 You'll know this if you've ever done internet shopping, that suddenly that algorithm, I looked up Brian Cox once, once, and ever since then I'm inundated with adverts for how to be a one-hit wonder, 1990s club classics volume 17, and astronomer's finger wax, which I don't even really know what that is.

Speaker 1 It just says with astronomer's finger wax, you can point to anywhere in the universe with confidence. May not work at zenith.

Speaker 1 You're right, you said that zenith bit would get quite a low laugh. It'd be niche.

Speaker 8 A niche joke for astronomers. How many astronomers in the audience?

Speaker 4 None.

Speaker 1 Well, to be honest, we got more laughs than we deserved, didn't we?

Speaker 8 Well, I put your name into your search engine anyway, and all I get now is adverts for cardigans.

Speaker 1 You would.

Speaker 1 So 40 years ago, you would know if someone was collecting data from you because you would be stopped in the street by a man in a mac who would be saying something like, Excuse me, have you got five minutes for me to ask you about your thoughts on Homepride cooking sauce and the education policy of Barbara Castle?

Speaker 1 To which I would normally reply, you'd have to be quick, Rumbelows closes at five and I've just blown a valve.

Speaker 1 We really debated about the Rumbelows reference, but we felt that to say Rumbelows would somehow find some of our core demographic and take them to a place of delight.

Speaker 8 Every time you use your loyalty card or your Oyster card, search online or even switch on your phone, information about your behaviour is collected. But where does it go and what is it used for?

Speaker 1 To answer those questions, we are joined by at least 64 petabytes of pure information. And they are.

Speaker 7 My name is Dr. Hannah Fry. I'm a mathematician from University College London and the author of a new book on algorithms and data called Hello World.

Speaker 7 And the most interesting thing that I have discovered on the internet is that France was still using the guillotine when the first Star Wars film came out.

Speaker 6 Hi, I'm Timandra Harkness.

Speaker 6 I'm a lapsed comedian. I'm a mediocre mathematician, but I am the author of Big Data: Does Size Matter?

Speaker 6 That's the correct response.

Speaker 6 And the most interesting thing I found on the internet is that the very first ever Registrar General of Births, Marriages and Deaths for England and Wales, who was a man called Thomas Henry Lister, also wrote a romantic novel called Granby, in which the eponymous hero's love for Miss Jermyn overcomes her parents' opposition when he is revealed to be the secret heir to Lord Malton.

Speaker 4 Hello, my name's Danny Wallace. I'm a writer and a presenter.

Speaker 4 And what I find interesting about the internet is when big companies get their sort of web addresses wrong, their websites, you know, they've got to come up with something good and they've got to come up with their URL and they just think that'll do.

Speaker 4 Which is why the custom pen manufacturer, Pen Island, when it's all typed out, you are way ahead of me, becomes penisland.net, or the company Internet Protocol Anywhere, which until they noticed you could find at ipanywhere.com.

Speaker 4 So it's just a way of bringing in new audiences, I think, for their products.

Speaker 8 And this is our panel.

Speaker 1 Timandra, we'll start off with a definition because, I mean, big data is quite new. I mean, for most of everyone on this panel, when we were born this was nothing that would be referenced. It is a new idea. So what is big data?

Speaker 6 OK, well, obviously it's big. It's actually quite hard to specify how big, because the amount of data in the world proliferates so quickly that if you give it a figure one year, it's out of date. To give you an idea, I interviewed a brain scientist called Professor John D. Van Horn, and he said that when he got his first post-doctoral research job, the lab sent him out to buy the biggest hard drive they could afford, because they were doing brain scans, which is a lot of data.

Speaker 6 And he brought it back, and people from the other labs in California came to look at this hard drive because they'd never seen one so big. It was four gigabytes.

Speaker 6 And I'm thinking, I'm talking to him on the phone, thinking, your photo on the internet makes you look quite young, but how long ago was this? Because my phone has eight gigabytes. So it is big. There's lots and lots of data. But is there more to it than that? Yes.

Speaker 6 And I actually came up with a backronym for big data, which, as you all know, is an acronym where you've reverse-engineered it to get the word you wanted.

Speaker 6 That's the only thing anyone's going to remember about what I say tonight.

Speaker 6 A backronym. So, DATA. Big data is big, obviously. D: it's got lots of dimensions. So you've got lots of different data sets.

Speaker 6 So, perhaps, you know, you don't just, well, another brain scientist said, yeah, okay, brain scans are great, but I'm much more excited if you get the brain scans and the medical records and the postcodes where the patients have lived and the weather reports for those postcodes when they lived there.

Speaker 6 This is Professor Paul Matthews. And I put them all together, and then I can study the effects of sunshine on the progression of multiple sclerosis. And that, he said, that's big data.

Speaker 6 If I just have brain scans, that's just large data. So it's got different, yeah, that was what we actually said. So it's got different dimensions. It's collected automatically. That's the first A.

Speaker 6 It's collected pretty much in real time, hence the T, which means you can also then project it forwards in time and use it to make predictions.

Speaker 6 And the second A is for AI, because you basically use artificial intelligence-type programs to process it. So that's big DATA.

Speaker 8 So, Hannah, we've spoken there about, I suppose, collecting data sets in a way that is, well, understandable in a way, such as weather data or brain data or brain scans. But can you give us an overview of the totality of the amount of data that's being collected?

Speaker 7 Yeah, well, in a way, actually, I don't know if I totally agree with everything that Timandra said, because I think that actually we have had big data in the past.

Speaker 7 I think, you know, the census, for example, is this connected data set, tells us all about, you know, different things about one person and lots of things about everyone.

Speaker 7 And, you know, each one of us has contributed to it across the entire country. I mean, that really is sort of big data. But I think what has changed is the types of data that we're now collecting. I mean, you know, you don't need me to tell you how much data your phone can collect just by you walking around.

Speaker 7 You know, you have a heart rate monitor on your wrist. You've got your lights coming on and off. That's all being recorded. Everything, I mean, basically every single thought I ever have, I practically type into Google.

Speaker 7 You know, there's just a catalogue, the range of different stuff that we know about people now, that is different.

Speaker 8 And you said it's being recorded almost in passing, but are we now to take for granted the fact that all those things we do with the phone, everywhere that we move, every internet search that we perform, is recorded or archived somewhere?

Speaker 8 Do we have control over that?

Speaker 7 No.

Speaker 7 No, not at the moment.

Speaker 7 But there are, I think people have in the last couple of years started to wake up to the fact that actually being able to infer this much about individual people isn't necessarily the kind of society they all want to live in.

Speaker 7 Because, of course, there are things like what gender you are, you know, your sexual orientation, but other things as well, very, very personal things, you know, whether you've had an abortion perhaps, you know, whether you're having problems conceiving.

Speaker 7 All of these things can be inferred from your searches that you're doing online.

Speaker 7 And I think that people are starting to wake up to the fact that actually it doesn't feel particularly comfortable to have someone be able to know that about you.

Speaker 7 There was a story, actually, a very big story in America a few years ago.

Speaker 7 A company called Target, it's kind of like Woolworths, really. So it sells everything that you can imagine. You can get some grocery stuff in there, but also things for your house.

Speaker 7 And they have a club card type system where they can track what an individual is buying.

Speaker 7 And they were doing something called basket analysis, right, where you look at one individual and the things that they're buying over time.

Speaker 7 And they brought on a new statistician to look at all of this data.

Speaker 7 And this statistician realized that there were some clever tricks that you could use to work out whether or not someone was pregnant based on what they were buying.

Speaker 7 So, not the obvious stuff, not when they're buying nappies and cotton wool, but when they're buying unscented body lotion when they're in the second trimester.

Speaker 7 And often, that would be preceded by someone buying vitamins and stopping buying alcohol.

Speaker 7 And you could even kind of predict the exact moment that they were going to give birth by the things they would go on to buy.

Speaker 7 So, what the company decided to do is they set up this pregnancy predictor, right?

Speaker 7 So, if you went past a certain threshold, they would assume you were pregnant and they would send out a series of coupons to you in the mail.

Speaker 7 Just to, you know, capture your custom early and lock you in, so you're a target customer.

Speaker 7 And that was all fine, you know, I don't necessarily have a problem with that. Except that in 2012, a father of a teenage daughter walked into a store in Minneapolis and was outraged that his daughter had been sent this pack of coupons.

Speaker 7 And he was like, You're, you know, normalizing teenage pregnancy, this is outrageous. So the store apologised and then called his home later to follow up on that apology.

Speaker 7 And by the time they managed to call him, he said, You know what? Actually, I've had a chance to have a chat with my daughter. And it turns out that she was pregnant.

Speaker 7 Found out through coupons that are mailed through the post.

Speaker 7 So, I think this is something that people are really kind of waking up to that actually does make us feel quite uncomfortable to have people know that much about us.
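
Editor's sketch: the threshold rule in the Target story above can be illustrated in a few lines of Python. This is a hypothetical illustration only. The item names come from the conversation, but the weights, the cut-off value, and the function names `pregnancy_score` and `should_send_coupons` are assumptions made up for this sketch, not Target's actual model.

```python
# Hypothetical signal weights, for illustration only (not Target's real model).
SIGNAL_WEIGHTS = {
    "unscented body lotion": 4,
    "vitamins": 3,
    "cotton wool": 2,
}

# Assumed cut-off: at or above this score, the shopper is flagged.
THRESHOLD = 6


def pregnancy_score(purchases):
    """Add up the weights of the distinct signal items in a purchase history."""
    return sum(SIGNAL_WEIGHTS.get(item, 0) for item in set(purchases))


def should_send_coupons(purchases, threshold=THRESHOLD):
    """Return True when the score reaches the threshold."""
    return pregnancy_score(purchases) >= threshold
```

A history containing both vitamins and unscented body lotion crosses the assumed threshold, while vitamins alone do not. The flag is a guess about a person built from loosely associated purchases, which is exactly why it can be both right and unwelcome.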

Speaker 4 Can I ask you a question on behalf of my mum?

Speaker 7 Yes.

Speaker 4 My mum, she doesn't know how this is happening or why, but the universe seems to be telling her that she needs a new mattress. And, like, wherever she goes, whatever she does, the same mattress is targeting my mother, and she feels victimized by this mattress company. And she says she hasn't been googling mattresses, and I believe her.

Speaker 4 What's going on, and should she buy one?

Speaker 7 I don't know if I can help you on the latter.

Speaker 7 But you know, there are things like maybe she's googling insomnia if she can't, if she's having trouble sleeping, or back pain.

Speaker 7 I mean, there are all of these things that are just loosely associated that don't directly say. I mean, not everyone who Googles back pain needs a new mattress.

Speaker 7 You know, maybe your mum doesn't either. But there's just enough, if you do it to enough people, the chances are that you're going to increase your sales.

Speaker 6 Twitter thinks I'm a man.

Speaker 6 And I know this not only from all the beard care products that it advertises to me

Speaker 6 and the videos of buff guys working out, which is actually okay.

Speaker 6 But you can go on Twitter, you can see what it thinks about you. It thinks I'm aged between 13 and 55.

Speaker 4 It's got that right anyway.

Speaker 6 And it thinks I'm a man. And I don't know why, because I've never told it. So, you know, it's not infallible, which I think means there's some hope.

Speaker 4 As a man, I'm not getting those things.

Speaker 6 What do you get on Twitter?

Speaker 4 Well, I never get buff men working out. I don't know what you've been googling.

Speaker 6 But no, but this is, well, I don't know.

Speaker 4 Have you been Googling?

Speaker 4 Have you been googling buff men working out?

Speaker 4 Hell near.

Speaker 6 I do remember actually my mother saying that she'd been looking for, there was a particular scene from a movie where there was a very moving scene of, I think it was Edward Woodward acting, and it was a movie about a prison.

Speaker 6 And there was this very moving, emotional scene in the shower where he was acting really well. And so she googled prison shower scene.

Speaker 6 And she said she didn't find Edward Woodward, and a lot of the acting wasn't that good.

Speaker 6 It was quite blurry, and you couldn't really tell what was going on.

Speaker 4 I mean, you raised a good, you got me worried now, because this afternoon I was literally googling penis land and IP Anywhere.

Speaker 8 We're talking about human behaviour here. And so you gave an example actually of a single individual in a case where you can predict something about them. How accurately can we predict individuals' behaviours? And then, I suppose, groups of individuals. Does that become easier to predict?

Speaker 7 Yeah, well, so with groups, you can certainly look at what a lot of people are doing. And actually, in getting my notes ready for this, I was researching something on my work computer that I slightly regretted, because it's quite an interesting story about Pornhub.

Speaker 7 So, during the Hawaiian...

Speaker 4 We don't know what that is. What is...

Speaker 7 I mean, it's got a backronym somewhere, I'm not sure.

Speaker 7 So during the Hawaii missile alert, Pornhub released the data of how much they were being used in Hawaii at the time.

Speaker 7 And as the alert came out, the first text message saying, you know, there's going to be a missile coming, there was an 80% drop in usage. Still not down to zero. Still 20% of people there.

Speaker 8 They were too busy to read their text messages, that's why. The other 20%.

Speaker 7 But as soon as the follow-up message came through to say that everything was all okay, Pornhub then spiked to 50% higher than normal usage for that time.

Speaker 4 What a way to celebrate.

Speaker 7 Exactly, exactly.

Speaker 7 But I think you can certainly observe how people are behaving, especially at the level of a population like that.

Speaker 7 In terms of predicting what an individual will do, you can do a better job with algorithms and with data than you can just by guessing, just by another person sort of trying to make a prediction.

Speaker 7 But you can't get absolute perfection. And that I think is one of the slight concerns, really, about all of these algorithms being used. So, to give you an example, algorithms based on data about people's past are being used to try and predict whether or not they'll go on to commit a crime. Now, this is in a particular scenario.

Speaker 7 So, when a judge is trying to decide whether to give someone bail or not, for instance, but also now increasingly if they're sentencing an individual and sending them to jail.

Speaker 7 And this is something that I think the whole sort of big data community has really been tussling with. Because if you're a judge, you sort of do need to make a prediction. You need to make a prediction of whether letting someone out and giving them their freedom before they face trial is a good thing.

Speaker 7 You've got to make a prediction of whether they're going to betray your trust and break the conditions of their bail.

Speaker 7 So, in some sense, actually, an algorithm that makes that prediction better is better.

Speaker 4 But,

Speaker 7 well,

Speaker 6 But are they better? And also, how would you feel about being sentenced because people like you in the past went on to re-offend or didn't go on to re-offend? Because what categories are they using?

Speaker 6 Durham Police are working on their own version of this, which to do them credit, they've looked at the American ones which have been subject to some controversy and gone, we want something to help us make this decision whether to kind of keep people out of jail and put them onto a rehabilitation scheme instead.

Speaker 6 But we want it to be transparent, we want everybody to know what's going on, how they're being judged.

Speaker 6 But the biggest predictor of whether they're going to re-offend turns out to be the postcode.

Speaker 6 So, I mean, I don't know what kind of area you guys live in, but do you want to go to jail or not on the basis of your postcode? Is that fair?

Speaker 1 Danny, how do you feel about this collection of information? I mean, are you careful? Are you canny when you are on the internet, for instance, and you know, certain things or you're ticking a box, whatever it might be? Are you methodical, or do you kind of worry about it?

Speaker 4 No, I sort of, in a weird way, I find it kind of comforting in a strange way. It's that kind of, you know, it's the electronic version of the nanny state, kind of looking after me, making sure my mother has a mattress, things like that, reminding her.

Speaker 4 I worry, though, about the kind of the next stage.

Speaker 4 You know, we've all got all these devices that have microphones in them, and we've got one of those robot ladies in our house with a little box, and you can talk to her. And you can just go, you know, hello, what's the weather like? And I don't know why I ask her that, because I've got windows.

Speaker 4 I'm able to, you know, do that for myself. But you start to wonder, you know, are they listening all the time? You know, is that how more information will be got?

Speaker 1 I'm saying, what is this? Because I'm not.

Speaker 4 I know.

Speaker 1 When you say unconnected as well, I was just seeing one of those old-fashioned barometers where either someone comes out with an umbrella or a lady comes out because it's sunny.

Speaker 1 So this doesn't sound very modern at all.

Speaker 4 That's exactly what it is. No, you know, like Siri or Alexa. I want to, you know, now I've come up with two names, I can say them because it's sort of branded, isn't it?

Speaker 4 But, you know, you can go, Alexa, do this for me. And she will. She's very obedient. But I mean, although,

Speaker 4 although there was a story not long ago about these Alexas in particular, or Alexi, seeing as I'm on Radio 4,

Speaker 4 where every now and again, you know, they'll only really respond if you say their name, but just recently they've started to, just every now and again, when you're having a conversation with someone else or, terrifyingly, in the middle of the night, they begin to laugh maniacally,

Speaker 4 which is not what you want to, you know, we could do with a few in here, but

Speaker 4 but you don't want to hear that at sort of three in the morning, just hearing a disembodied voice, a woman just laughing downstairs or just mocking you. So I start to think, you know, there are bugs in this, but could this be the sort of next step, where we just accept that these things are always on, and then your smart TV is communicating with your phone and your phone is talking to Alexa and they're all going, his mum wants a mattress?

Speaker 1 See, this sounds like it has, in terms of AI, I don't know how you feel about it, Timandra, if we now have these machines which have reached the stage of, you know, mocking us and probably also understanding irony, have we now got, you know, have these machines reached that point of passing the Turing test?

Speaker 1 It's like that moment where it's not the chess machine winning, it's when the chess machine gloats as well.

Speaker 6 Well, that I think, apparently, there is somebody in London actually working on trying to get AI to understand irony. And I'm like, don't do it, don't do it.

Speaker 6 Because when they do become more intelligent than us, our only hope will be in irony. We'll live underground and have a secret language based on sarcasm.

Speaker 6 Because it is the only thing they don't understand. I mean, now they can do stairs.

Speaker 8 Really?

Speaker 4 I once talked to a roboticist and I asked him that kind of, you know, that question that everyone sort of ends up asking a roboticist, which is, you know, will they rise up against us?

Speaker 4 And he sort of laughed and said, probably.

Speaker 4 And I said, well,

Speaker 4 what are we doing about this? And he didn't really have many ideas. And he just sort of went, well, at the moment, we're making sure that the off-switch is quite readily accessible.

Speaker 4 It's all safe.

Speaker 6 But in a sense, I mean, that's the wrong question, because I actually had really funny conversations when I was out in California at Google HQ for an event. And really spookily, I basically was going out there for this Google event. At the last minute, I stuck my radio recording kit in the bag just on a whim.

Speaker 6 And I landed at the airport and I got this message from the BBC saying, this is a few years ago, saying, yeah, no, get in touch with us urgently because we want you to co-present this programme about the singularity, like the super intelligent AI that's going to be more intelligent than us.

Speaker 6 And I went, That's quite spooky because I'm basically on my way to where it probably already is.

Speaker 6 And so I was talking to people in Google and saying, you know, how long do you think we've got before the super intelligent AI? And several of them said, well, we're in it. You know, it's already here. Look, think about it. If you were a super intelligent AI, would you burst on the scene going, Now you are mine, humans?

Speaker 6 Or would you quietly sit in Silicon Valley attracting really clever people, giving them nice food and drink, getting them to service you and bring you all the data that you need, and then maybe some robotic cars and maybe some drones and who knows what else, you know, that would be what you would do.

Speaker 6 And I'm sitting there in Google, in the restaurant, talking to this guy, going, but that means it can hear what we're saying.

Speaker 6 I actually don't think it's already here, but you know, if it was, if it's going to come, it'll come like that. It'll come with clever people in Silicon Valley going, here it is, we've built it. Isn't it great?

Speaker 7 I think we're quite, I think, calling AI, well, I just think actually, I have a slight problem with the label AI, full stop.

Speaker 7 I think what we've seen recently is a revolution in computational statistics, not a revolution in intelligence. And I admit that that is nowhere near as sexy.

Speaker 6 Statisticians of the world unite.

Speaker 7 You really like statistics.

Speaker 6 Statisticians of the world unite, you have nothing to lose but your Markov chains.

Speaker 4 All right?

Speaker 1 It is, that's one of the things actually that for both you and Hannah, I imagine, I found a quote about this which said about where you are with mathematics and statisticians.

Speaker 1 It's like an arms race to hire statisticians nowadays. Mathematicians are suddenly sexy.

Speaker 1 I mean, in terms of big data, at that moment of going, it has changed people's conception of both mathematics and mathematicians, hasn't it?

Speaker 7 Yeah, I think so. I mean, I think suddenly we can use all of these techniques that we were able to use in science for so many years, and now suddenly we can apply them to ourselves. And

Speaker 4 to manage is maybe.

Speaker 6 I do agree, that's the question. Because it's one thing to like, you know, like Brian does underground in Switzerland, applying big data and a Large Hadron Collider to protons, isn't it?

Speaker 6 Anyway, those subatomic things.

Speaker 1 To be honest, he's not as involved as he used to be.

Speaker 4 Well,

Speaker 1 Telly or great discoveries.

Speaker 4 I'm resisting.

Speaker 1 We get on very well, by the way. That's why I said that.
Because one little loo is if to go.

Speaker 4 I'm resisting the urge to deflect the programme from the subject into the... I'll talk about particle physics. But that's the natural world, right?

Speaker 6 The natural world, absolutely, maths is brilliant at studying the natural world, and statistics.

Speaker 6 But human beings, we are part of the natural world, but we're also not part of it. Like, we don't just behave like particles.

Speaker 6 And, you know, we have free will and we're awkward and we do things for reasons that we then have to explain and people don't understand.

Speaker 6 And I get squeamish about saying, oh, well, you know, but you can model how people behave, and we basically behave the same as, I don't know, ball bearings.

Speaker 8 There's a difference here, though, isn't there? Because we've talked about it, it sounded quite sinister, actually, when you're talking about predicting whether someone is likely to re-offend.

Speaker 8 But in terms of groups of people, in terms of movement of people through shopping centers or cities, then I think it sounds less sinister and more sensible, doesn't it? And is it easier, I suppose, to predict how crowds will behave in certain environments rather than individuals?

Speaker 7 Yeah, it's a perfect example. You know, you take something like a transport system, like the tube network, say.

Speaker 7 You know, it's actually really important to have a really clear idea through data of how people are using your system, where they're going, when there's a problem, where they redirect to. You know, it's absolutely integral to getting something that works efficiently. And I think that, you know, to a degree, and this is where Timandra and me disagree, but I think to a degree you can say the same thing about making your policing as efficient as possible and working out where the best places to place your forces are in a city.

Speaker 7 And I think actually across the board, really, in healthcare, I think in, yeah, everything, every system that humans are part of, I think that we can learn about ourselves by thinking of ourselves through the eyes of data and make it more efficient.

Speaker 8 Timandra, do you feel there's a difference between using big data to predict the way that crowds will behave or large groups of people and individuals?

Speaker 8 Because you said you were concerned, but is it really about the individual rather than group behaviour?

Speaker 6 It really is. I mean, that's the root of it. I mean, Hannah's right, it can be really, really useful to look at how people behave en masse in order to find solutions en masse.

Speaker 6 And, you know, in crime, for example, if you did find a postcode which happened to have a lot of criminals, then it would be useful to go, that's weird, what's happening there, is it particularly deprived? You know, is this something we can do en masse on a population level?

Speaker 6 Where I would get really worried is when you then jump and go, okay, well, if we know that 30% of the people in this postcode will end up unemployed, then you look at an individual and you go, you're 30% likely to be unemployed, so we are going to treat you as if you are basically a potential unemployed person without any regard to you as an individual and what you might think and want and do.

Speaker 6 So that is part of it. But I'm also a bit squeamish about the idea that you kind of guide us without consulting us.

Speaker 6 If you look at, I don't know, wanting us all to live healthier lives and walk more and get the bus less, and you go, okay, well, what we'll do is we'll redesign the system so that you have to actually really put yourself out to get a bus.

Speaker 6 And we're just going to nudge you into walking without ever saying to you, Do you want to walk more or not? And that's another thing.

Speaker 6 I just get a bit, you know, I mean, you, Danny, you seem quite happy with the idea of the nanny state doing things for your good.

Speaker 6 I'm a bit more like, well, ask me. You know, I might want to walk more, but I do think it should be my decision, not just kind of nudged into it.

Speaker 6 I mean, smart cities, it's all smart cities, and everything works really efficiently and it's a great system.

Speaker 6 But the problem with a smart city is it seems to kind of assume that we're dumb and the city is smarter than we are. And I don't like that.

Speaker 8 I was actually at an event in Hong Kong, Robin was there as well, and it was a trade panel, and we were talking about things.

Speaker 8 And then there was a Hong Kong entrepreneur, a property entrepreneur, there. And he sort of woke up. I'd said something. I made some joke about Brexit or something. I can't remember what it was.

Speaker 8 And he looked up and he said, This is, we're going to beat you. In China, we're going to beat you. And we're going to. And the reason is that data in China is owned by the government.

Speaker 8 So you do not have the right to restrict data, for example, about your movement through a city. And he was using that as the example.

Speaker 8 So a city like Shanghai, for example, his assertion was it will be a better city, it will be more efficient because

Speaker 8 the data is freely available to the planners of the city and therefore you can build a better city. Which is, you can see the point.

Speaker 8 Um and I suppose really what we're talking about here is we should separate the two in the discussion really from what we can do with big data and then the oversight that government has.

Speaker 7 Yeah, what we should do with big data. And actually, China is a very interesting example because China has had ID cards for a really long time, which obviously we rejected in the UK several times.

Speaker 7 But essentially, that means that the database of ID cards means that the government knows what everyone's face looks like, right? It owns that data on everyone's face.

Speaker 7 So facial recognition software is now

Speaker 7 widespread across China. There was even an example actually in

Speaker 7 some toilets in Beijing where the facial recognition system would notice you as you went in,

Speaker 7 and then if you came back,

Speaker 7 oh, it would only release, I think, 60 centimeters of toilet paper, right, every time. And if you came back within nine minutes, it would lock off all of the toilet paper system.

Speaker 7 Because clearly, toilet paper theft within this particular toilet in China was so extreme that they needed to register your face.
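
As an editorial aside, the rule described here is simple enough to sketch in Python. This is only an illustration of the logic as recounted in the anecdote, not the real system: the class name `Dispenser`, the `face_id` strings, and the choice not to restart the timer on a refused request are all assumptions, and the 60 centimeters and nine minutes are the speaker's recollection.

```python
from dataclasses import dataclass, field

PAPER_CM = 60              # length released per visit, as recalled in the anecdote
LOCKOUT_SECONDS = 9 * 60   # returning sooner than this gets nothing


@dataclass
class Dispenser:
    """Hypothetical dispenser keyed on a recognised face."""
    # face_id -> time (in seconds) of the last successful dispense
    last_dispense: dict = field(default_factory=dict)

    def request(self, face_id: str, now: float) -> int:
        """Return the centimetres of paper released at time `now`."""
        previous = self.last_dispense.get(face_id)
        if previous is not None and now - previous < LOCKOUT_SECONDS:
            return 0  # same face came back too soon: locked out
        self.last_dispense[face_id] = now
        return PAPER_CM


dispenser = Dispenser()
first = dispenser.request("dave", now=0)         # first visit
too_soon = dispenser.request("dave", now=300)    # back after five minutes
later = dispenser.request("dave", now=600)       # back after ten minutes
```

The point of the sketch is that the rationing only works because each request is tied to an identity, which is exactly why the face has to be registered.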

Speaker 1 That's a terrifying moment in 2001. That's what Kubrick did.
I'm not going to give you any more toilet paper, Dave.

Speaker 1 I'm not going to open the toilet door, Dave.

Speaker 8 You could have just had some dodgy sushi or something.

Speaker 8 Well, I wanted to ask Timandra if you have examples of extremely positive outcomes from looking at big data sets and analysing data.

Speaker 6 Oh, definitely. And, you know, it is true that obviously it can be used to make things more efficient.
I would never say we shouldn't use it. I think it really is all about oversight.

Speaker 6 And we could, for example, contrast,

Speaker 6 well, China, where they're using facial recognition everywhere and have no compunction.

Speaker 6 But there are even cities in Europe which are introducing all sorts of different smart systems, where different private companies are just gathering up a lot of data about individuals and nobody even knows what they are, with the city of Oakland in California, where

Speaker 6 the citizens basically heard their council had got a federal grant to put in a very integrated surveillance system with facial recognition and number plate recognition and all sorts of things, and went,

Speaker 6 can we just ask, where is your privacy policy on this? And when the council went,

Speaker 6 they had a big campaign, and they now have not only

Speaker 6 is that system kind of quite controlled and scaled back, but they have a standing privacy commission, which includes citizens and civil liberties organizations, and every time they want to bring in new technology they sit there and go, okay, well, what do you want the data for? What are you going to do with it? How long are you going to keep it? And it has democratic oversight, and I think that's perfect, because they still get the benefits of the technology but they know what it is.

Speaker 6 But if you want to talk about how great big data is and what it could do, my favorite example, my kind of poster boy of big data, is a professor in Southern California called Eamonn Keogh.

Speaker 6 And he's a professor of electrical engineering, but what he works with is insects. And I said, well, what's the connection there?

Speaker 6 And he said, well, you know how when your emails come in, you've got an algorithm that sorts them and gets rid of the spam and can forward emails automatically. I want to do that with insects.

Speaker 6 I want to be able to delete them and forward them.

Speaker 6 And I'm afraid my first thought was, Great, so you could forward wasps to somebody else's office.

Speaker 6 That wasn't what he meant at all. And he's basically, he's using big data to classify insects.

Speaker 6 He's got like a global database of insects based essentially on the sound that the wings make, but to not have background noise, he's got this mad little device using lasers and photodiodes.

Speaker 6 So you've got this light, red light, falling on a light gate, and it produces an electrical signal which you can turn into sound.

Speaker 6 And when an insect flies through that, or anything interrupts it, it interrupts the electrical signal, it makes a sound.

Speaker 6 So essentially, if an insect flies through this light gate, then the electrical signal is the sound of its wings, but without any background noise.

Speaker 6 And he's used big data techniques with millions and millions of these recordings of insects all around the world to classify this sound is this species of mosquito.

Speaker 6 And there are something like 5,236 species of mosquitoes.

Speaker 4 That's right.

Speaker 6 And it can not only tell the species, but tell whether it's male or female and whether it has already sucked blood from some creature.
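
As an editorial aside, the idea of recognising an insect from the signal its wings produce can be sketched very roughly in Python. This is not Professor Keogh's method: it assumes the only feature is the dominant wingbeat frequency, estimates that by counting zero crossings, and picks the nearest entry in a tiny reference table. The species labels, the reference frequencies, and the sample rate are all made up for illustration, and the "recordings" are synthetic sine waves.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)

# Made-up reference wingbeat frequencies in Hz, purely illustrative.
REFERENCE_HZ = {"species A": 400.0, "species B": 600.0, "species C": 250.0}


def synthetic_recording(freq_hz, seconds=0.5):
    """Stand-in for a light-gate signal: a clean sine wave, no background noise."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def estimate_frequency(signal):
    """Estimate the dominant frequency by counting upward zero crossings."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if a < 0 <= b)
    duration = len(signal) / SAMPLE_RATE
    return crossings / duration


def classify(signal):
    """Return the reference label whose frequency is closest to the estimate."""
    freq = estimate_frequency(signal)
    return min(REFERENCE_HZ, key=lambda name: abs(REFERENCE_HZ[name] - freq))


label = classify(synthetic_recording(590.0))
```

A real system working from millions of recordings would presumably use much richer features and a learned classifier; the sketch only shows why a clean signal without background noise makes the problem tractable.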

Speaker 6 And so you could you could track, like, is it are the Zika carrying insects moving across Africa?

Speaker 6 We can trap all the ones. There's all sorts of things you could do with that to control insects,

Speaker 6 to know where they are, to know what diseases they're carrying. And that just made me go,

Speaker 6 this is it. This is a great use of this technology.
We can understand insects. We can do something about the diseases they carry or the crops they eat.

Speaker 6 So he's my kind of big data hero.

Speaker 7 What about the individual insect, though? You shouldn't label them.

Speaker 6 That's just it, because they're insects.

Speaker 4 I don't care about them.

Speaker 4 I didn't mean that.

Speaker 8 I suppose there's a real challenge here because

Speaker 8 obviously collecting large amounts of data for no apparent reason at first sight can be problematic. You might think someone should define what they want to do with it before it's collected.

Speaker 8 But of course, much of the opportunity is in finding patterns in the data, isn't it?

Speaker 8 So it's really not only the data you collect, but what you do with it and finding new ways of interrogating the data.

Speaker 7 Yes, no, I totally agree with that. I think people who

Speaker 7 work with data are

Speaker 7 notoriously greedy.

Speaker 7 Give us everything and we'll find something.

Speaker 7 Things are changing, though. I think

Speaker 7 there's a new bit of European legislation that is now out

Speaker 7 that changes the sort of ownership of data from the company and then shifts it slightly more in the hands of the individual. It's called GDPR,

Speaker 7 that should just give us a little bit more control over what companies know about us. And in particular on that point that you just made there of

Speaker 7 we need to know what our data will be used for,

Speaker 7 a complete list of what it will be used for before they're allowed to own it.

Speaker 8 Is that a good thing, though? Because

Speaker 8 part of the opportunity, I suppose, is to find patterns. So, just to invent something.

Speaker 8 It may be that people who engage in certain activity are more likely to develop heart disease or something, but it might be something very unexpected, like eating too many apples.

Speaker 8 I don't know what it is, but something that no one ever suspected. So do we close off the possibility of making really important public health discoveries if we restrict the usage of data?

Speaker 7 It's a very, very difficult question without an easy answer. But I think one thing that I do know is there was an example of where

Speaker 7 all health records of... actually, Timandra, maybe you know this one slightly better than me, the Royal Free example.

Speaker 6 Oh, yeah, the Royal Free Hospital set up a partnership with Google DeepMind, which is the

Speaker 6 source of artificial intelligence, although not to Hannah's standards, but the AI programme that beat the world champion at Go.

Speaker 6 And they set up a partnership so that the patients' health records could be analyzed using a new system Google was developing to basically just move data around within the hospital more efficiently.

Speaker 6 It was fairly innocuous what it was doing. I mean, other projects they're doing is about finding patterns in the data.

Speaker 6 This one really was just about moving it around more efficiently and tracking it, but they didn't ask the permission of the patients to do this with the data.

Speaker 6 The hospital went, well, they've signed up to be our patients, they must be fine with it.

Speaker 6 And afterwards, there were a lot of wrists slapped because the information commissioner's office said, No, you really should have said to the patient, Are you okay with us giving your data to Google?

Speaker 6 Because Google DeepMind is an outside organization.

Speaker 6 And I think there's a really important trust question here. I'm actually really in favor of us

Speaker 6 sharing our health data for everyone's benefit. I mean, it is a real example of

Speaker 6 we can all benefit enormously from sharing health data. And like you said, Brian, suddenly finding that, oh, actually, there's a relationship between this and this, and this could be really important.

Speaker 6 But we have to, in order to do that, we have to feel that we trust the organizations that are using it and that they have our best interests at heart.

Speaker 6 So, you know, we all love the NHS and it saves everyone's lives and so on. But

Speaker 6 there are also cases where the NHS says, well, I'm sorry, we're short of money. So, fatties and smokers, you're to the back of the queue when you need an operation.

Speaker 6 So, you can imagine if there's this great health data sharing research going on, and they suddenly go, oh, well, sorry, Brian, your storecard says that you bought pizza every week for the last 15 years and so you're not getting that operation.

Speaker 4 My name is not Brian.

Speaker 8 I don't know where you got that information from.

Speaker 4 And that's why this is terrifying now.

Speaker 8 You've conflated the two things.

Speaker 8 I was going to ask, Danny, I suppose there are two things here. One is that huge amounts of data about you could be collected.
But if it's anonymized, would you care?

Speaker 8 Is it really the the personalization, the identification of you with that data that matters?

Speaker 4 Oh, yeah, no, I think we're all, you know, we're all very protective of who we are and what we get up to, even if what we're getting up to is fairly innocuous.

Speaker 4 But if there's a greater good, I think the health thing is, you know, the best example possible.

Speaker 4 We've all, you know, you have apps on your phone that tell you kind of how many steps you've taken and where you've been, and you can add all this other stuff in.

Speaker 4 And if you're adding all this information in there, and it can be a central database that can look at these patterns and

Speaker 4 track your health and see what troubles you've come up against, then, like you say, if they can find new treatments or patterns that have never been spotted before that help the greater good, then absolutely.

Speaker 4 But yes,

Speaker 4 we all want to keep that to ourselves instinctively.

Speaker 1 But the thing that I find interesting is: how easy is it to actually

Speaker 1 remove the anonymity?

Speaker 1 Because I was reading an article the other day, which was about apparently the system that was used in New York taxis, and it was just to see about the routes of taxis and the pay that was given on different routes.

Speaker 1 And some journalists managed to work out how much different celebrities tipped. Now, that seems incredible because it was very clever, actually.

Speaker 7 It was well, the way that they undid all of the data, because there was a very weak encryption.

Speaker 7 The data was released for all of the yellow cabs in New York across a year, so that people could do these beautiful visualizations, work out efficiency in the city, so on and so on and so on.

Speaker 7 But they put a very weak encryption on it so you could work out what cab was what at what time.

Speaker 7 And then someone else realized that if you took paparazzi photographs of celebrities getting into cabs, if you could see the registration number of the taxi and know what day it was on, you could work backwards, you put those two data sets together and work out often where celebrities lived, but also exactly how much they tipped.
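
As an editorial aside, the two steps described here (undoing the weak protection, then joining two data sets) can be sketched in Python. This is a minimal illustration under stated assumptions, not a reconstruction of what was actually done: it assumes the "weak encryption" amounts to an unsalted hash over a small space of cab identifiers, and the identifier format, the names, the street names, and every record below are invented.

```python
import hashlib


def pseudonymise(cab_id: str) -> str:
    """Unsalted hash: looks scrambled, but the same input always gives the same output."""
    return hashlib.sha256(cab_id.encode()).hexdigest()


def all_possible_ids():
    """Hypothetical ID format for this sketch: digit, letter, two digits (e.g. '5X41')."""
    for digit in "0123456789":
        for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
            for number in range(100):
                yield f"{digit}{letter}{number:02d}"


def build_lookup():
    """Hash every possible ID once; the 'anonymised' values can then be reversed."""
    return {pseudonymise(cab_id): cab_id for cab_id in all_possible_ids()}


# Invented "released" trip records, with the cab ID replaced by its hash.
released_trips = [
    {"cab": pseudonymise("5X41"), "date": "2013-07-08", "dropoff": "Made-up Street", "tip": 0.0},
    {"cab": pseudonymise("7B22"), "date": "2013-07-08", "dropoff": "Invented Avenue", "tip": 5.0},
]

# Invented second data set: a photo showing who got into which cab on which day.
sightings = [{"who": "Celebrity A", "cab_id": "5X41", "date": "2013-07-08"}]


def link(sightings, trips):
    """Join the two data sets on (cab ID, date) after reversing the hashes."""
    lookup = build_lookup()
    matches = []
    for sighting in sightings:
        for trip in trips:
            same_cab = lookup.get(trip["cab"]) == sighting["cab_id"]
            if same_cab and trip["date"] == sighting["date"]:
                matches.append((sighting["who"], trip["dropoff"], trip["tip"]))
    return matches


linked = link(sightings, released_trips)
```

With only 26,000 possible identifiers in this made-up format, hashing every one takes a fraction of a second, so the strength of the hash function is irrelevant; what matters is how small the input space is and how easily a second data set can be lined up against the first.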

Speaker 4 It's also called stalking. Yeah, well, yeah, it totally is.

Speaker 7 But I also think that there's this idea about choice, and I think that we're slightly kidding ourselves if we think that we have much choice in this matter.

Speaker 7 Timandra and I,

Speaker 7 about a couple of months ago, went to a crypto party.

Speaker 4 Don't know if you've ever been.

Speaker 4 Yeah, we do go to the best parties.

Speaker 1 Now, the audience at home don't know this, but if you could sense the envy going on in the studio at this moment,

Speaker 1 it is palpable.

Speaker 7 Keep it, keep it below the surface. Crypto parties, I didn't know what it was until I went to one.
Crypto parties are where you go and people teach you how to hide from

Speaker 4 everything.

Speaker 7 So it was, you know, people showing you how to have an operating system that only exists on a USB key so that you can take your whole computer with you when you leave and no trace of you will remain.

Speaker 7 You know, how to use the dark web, how to change all of the settings on your phone so that no one could track you. And it was very interesting.

Speaker 7 I went, I was

Speaker 7 researching my book. Timandra, I think,

Speaker 7 a similar story. But I couldn't help looking around the room and thinking, what have these people got to hide?

Speaker 4 Paranoid parties are the best parties.

Speaker 6 Come on, wouldn't you want to go to a party with people that have something to hide?

Speaker 4 Surely.

Speaker 4 Turn up, it's just an empty room.

Speaker 6 No,

Speaker 6 I think that's unfair. When people say, you know, nothing to hide, nothing to fear, I say, nothing to hide, you haven't really lived.

Speaker 6 Everyone should have something to hide by the time we're adults. But I also think that getting together in a room and going, We're going to be really

Speaker 6 technical and spend ages changing our settings is not really the answer because you're not actually helping everybody do that.

Speaker 6 I think the answer is just to make everything more transparent so we can genuinely choose: do we want our data to be part of this? And also, just to generally say, What do we want it to be used for?

Speaker 6 Do we want to be like China, where every individual can be tracked and everyone can be given a credit rating, well, a social credit rating, based on how well behaved they are, how polite they are, which can affect their chances of getting loans and things?

Speaker 6 Or do we want to kind of draw some lines and say, okay, GCHQ, you can hack into my phone in order to save me from being blown up by terrorists, but you can't hack into my phone to check that I'm not letting my dog poop where it shouldn't.

Speaker 7 That's the line, is it? Dog pooping.

Speaker 4 Dog pooping.

Speaker 1 Two very different Liam Neeson films there.

Speaker 4 Are you?

Speaker 1 So we asked the audience a question, as usual, and today the question was: what is the strangest question you have ever asked the internet?

Speaker 1 And I can tell you now, this is the largest number of answers we've had to throw away due to the fact that it's just not suitable for 4:30.

Speaker 1 What is the strangest question you've ever asked the internet? Why am I so inexplicably attracted to Brian Cox?

Speaker 1 Oh, Dominic, it's very explicable.

Speaker 8 Where is the hippo in the hippocampus?

Speaker 4 I really did.

Speaker 8 Danny, you got some as well.

Speaker 4 Yeah, Katie Adams, this is a great question. What is the capital of space?

Speaker 4 Brian?

Speaker 4 Well,

Speaker 8 there isn't a center to the universe.

Speaker 8 It's the ultimate Copernican principle at all points. It's called

Speaker 8 invariant, essentially, in every direction. It's all the same.

Speaker 4 Thank you.

Speaker 4 It was homogeneous. It was a lot less exciting than I was hoping for.

Speaker 8 It's homogeneous and isotropic.

Speaker 1 And almost exactly the same question here: is a Fraggle a Muppet?

Speaker 8 This is very.

Speaker 8 If there are an infinity of quantum worlds, why am I stuck in this one?

Speaker 1 I'm telling you, John, you should see the others. Honestly, it's not as bad as a thing.

Speaker 1 If things could only get better, when

Speaker 4 you got one? Yeah,

Speaker 7 I don't know the answer to this one: how do whales breastfeed? Why did you Google that?

Speaker 1 What, not the others, just that?

Speaker 4 I think I actually know that one.

Speaker 4 Yeah, this is a brilliant, what a moment for me.

Speaker 4 Well, the milk is a lot thicker, or, you'll like this, more viscous,

Speaker 4 meaning it doesn't just dissolve and go away in the water, and that's little bull whales that do that. And also, they sleep vertically.
Thanks, guys.

Speaker 1 So, during the show, we've been collating data on those who have been listening via the BBC's patented soul thieving conundrum machine, or Pesitikan.

Speaker 4 Got to work on the acronym, don't you?

Speaker 8 So, while this show's been on, we've been using Pesitikan to find out what you have been searching for on the internet during this broadcast, and here are the top three searches.

Speaker 1 Top three searches were: Why won't my lightsaber cut ham?

Speaker 1 Jacob Rees-Mogg, Mars base, question mark. What time does Rumbelows close today?

Speaker 1 Thank you very much to our panel, Hannah Fry, Timandra Harkness, and Danny Wallace. Goodbye.

Speaker 4 cage.

Speaker 4 Till now, nice again.

