Rutherford and Fry on Living with AI: A Future for Humans

27m

As huge tech companies race to develop ever more powerful AI systems, the creation of super-intelligent machines seems almost inevitable. But what happens when, one day, we set these advanced AIs loose? How can we be sure they’ll have humanity’s best interests in their cold silicon hearts?

Inspired by Stuart Russell’s fourth and final Reith lecture, AI-expert Hannah Fry and AI-curious Adam Rutherford imagine how we might build an artificial mind that knows what’s good for us and always does the right thing.

Can we ‘programme’ machine intelligence to always be aligned with the values of its human creators? Will it be suitably governed by a really, really long list of rules - or will it need a set of broad moral principles to guide its behaviour? If so, whose morals should we pick?

On hand to help Fry and Rutherford unpick the ethical quandaries of our fast-approaching future are Adrian Weller, Programme Director for AI at The Alan Turing Institute, and Brian Christian, author of The Alignment Problem.

Producer - Melanie Brown
Assistant Producer - Ilan Goodman

Transcript

This BBC podcast is supported by ads outside the UK.

Hello, I'm Adam Rutherford.

And I'm Hannah Fry.

And this is the final part of our mini-series, Investigating AI and How We Will Live in a World of Intelligent Machines.

Yes, and we've been tracking the themes of the Reith Lectures given this year by AI supremo Stuart Russell.

So far, we've covered war, work, and what he described as possibly the most important and maybe the last event in human history, the emergence of machines that have general intelligence and will be able to accomplish all manner of tasks, basically better than us.

Well, that's all very jolly.

But this last episode is really about how we manage the development of intelligence in machines.

How we build systems that not only share our objectives, but understand them too.

Yeah, and as ever, we've got two mega-mind experts with us to shepherd me and Hannah through.

We've got Brian Christian from the University of California at Berkeley.

He is the author of The Alignment Problem, and we're going to come to that title in just a minute.

And we also have Adrian Weller, Research Fellow in Machine Learning at the University of Cambridge and Programme Director for Artificial Intelligence at the Alan Turing Institute.

Welcome to both of you.

Now, Brian,

the title of your book is The Alignment Problem.

So I think it makes probably the most sense to come to you to help us with a definition.

What does it actually mean?

The alignment problem is the question of whether the objective that you've put into a machine learning system actually

incentivizes or instills the behavior that you have in mind for that system.

Where might this appear where you don't have what the machine is doing quite aligning with what you want as a human?

Well, unfortunately, this, I think, is actually quite common.

Most poignantly, we saw it with the example of the self-driving Uber vehicle,

which was trained to avoid pedestrians and it was trained to avoid cyclists.

But when it encountered a woman who was walking a bicycle across the street, the car didn't know how to categorize her and she ended up being struck and killed.

Now, Adrian, I want to bring you in here.

It strikes me that both the control problem and the alignment problem as defined when we're looking at AI, well,

you know, we're talking about when the objectives of any two parties clash or aren't properly aligned.

It seems that that is a bigger idea than just artificial intelligence.

That's right, Adam.

I think it's really helpful to think about the alignment problem as kind of rising out of a whole family of problems which we've been thinking about for a long time, about alignment or misalignment of incentives.

So if you are in your house and you don't like your neighbor's tree that's going over into your garden, you have an incentive to try and prune that tree and potentially even remove the tree altogether.

But the incentives of your neighbor are quite different.

And you can get into an altercation about that.

What's interesting to note there is that both of you have roughly equivalent power unless you live next to Bill Gates.

But

what we tend to do is we have laws and regulations that prevent people from just acting in a way which is going to impinge on your situation.

And I think that's a helpful paradigm to keep in mind when we're thinking about AI systems.

Okay, so let's go back to what Stuart Russell said in part of his lecture that we're sort of bouncing off today.

So Stuart gave an example of an alignment problem.

And he was thinking specifically about dinner robots and cats and a potentially unfortunate combination of the three.

Let's have a listen to what he said.

For example, suppose you have a domestic robot built according to the classical model with fixed but imperfect objectives.

And you're stuck at work late.

Your partner is away.

Now the kids are hungry and very grumpy and there's nothing in the fridge and there's not time to go shopping and then

the robot sees the cat.

Unfortunately the robot lacks the understanding that the cat's sentimental value is far more important than its nutritional value.

Right, now I'm going to play devil's advocate in this conversation as the person who is by far the least knowledgeable about artificial intelligence in this room.

Now, you know, I get that issue.

We don't want to kill the cat.

We don't want to eat the cat.

But isn't that just solved by introducing a variable which says feed the children, rule one.

Rule two, don't kill cats.

Doesn't that just make the alignment problem go away?

Adrian.

One of the difficulties with what you're suggesting, which initially can sound very appealing, that is the idea that, well, all we have to do is keep adding some additional constraint rules, which the system will understand,

is that it's extremely difficult, and probably even impossible, to get all the rules that you need.

So you might say, well, don't kill the cat, but then what if you have a dog?

It'll go for the dog.

And so you'll say, don't kill a cat or a dog.

Well, then what if you have a hamster?

Or what if you have a small child who looks a bit like a dog because they're crouched down?

So there's almost no end to how many rules you would need to add.

And that's one of the big reasons why this is such an issue, particularly because all the AI systems which we have today are narrow and they don't have this sense of common sense understanding.
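
As a toy illustration of the point Adrian is making, here is a short Python sketch (the names and rules are invented, not anything from a real robot) showing how a fixed list of constraints always leaves gaps for the cases nobody thought to write down:

```python
# Toy illustration only: a fixed list of forbidden "food sources" for a
# domestic robot, in the spirit of "rule one: feed the children, rule two:
# don't kill cats".
FORBIDDEN = {"cat", "dog", "hamster"}

def may_cook(candidate: str) -> bool:
    """Naive rule check: anything not explicitly listed is allowed."""
    return candidate not in FORBIDDEN

# The gap Adrian describes: every case we forgot to enumerate slips through.
print(may_cook("cat"))      # False, covered by a rule
print(may_cook("parrot"))   # True, because nobody wrote a rule for parrots
```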

Could you not just have a more general set of rules, like thou shall not kill, thou shall not steal?

I mean, can't you just have a sort of list of Ten Commandments for AI?

I think one of the problems here is that

even when these things work in the short term, we're talking about scaling up these systems to become ever more

intelligent, ever more powerful.

And so there is a worry that

the Ten Commandments or the three laws or whatever it is that we put into the system may hold when the system is not particularly bright or resourceful, but it may get to a point where it learns to kind of out-fox or out-smart whatever rule structure we've put into place.

Does that mean that the way we think and the way we generate rules is just fundamentally different from the way that we can program

specific task-focused AIs?

That we can transfer our knowledge and say, don't kill cats.

That also means don't kill dogs.

Whereas an AI is always just going to go, it's just cats until you tell me not dogs as well.

One of the things that we've started to see with the shift towards machine learning systems, so systems that learn by examples, rather than being explicitly programmed, is that we, in many cases, don't have to enumerate explicitly all of the things that we want to do.

If you think about self-driving cars, we may or may not need to explicitly say stop at stop signs.

We just give hundreds of thousands of examples of humans behaving in that way, and we hope the system will kind of get the gist.

And this has both advantages and disadvantages.

The advantage is that we can convey subtle norms that we may not even be aware of in terms of, you know, how do you negotiate when both people are stopped at a stop sign?

What are the rules?

And so the system can sort of pick up all of these different norms in their full subtlety.

But on the other hand, without recourse to those explicit

norms, you may be missing something as well.
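
A minimal sketch of the shift Brian describes, learning behaviour from logged examples rather than writing the rule down explicitly, might look like this in Python (the features, labels and data are invented purely for illustration):

```python
# Instead of hand-writing a stop-sign rule, fit a classifier to logged human
# behaviour and hope it "gets the gist". Illustrative data only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [distance_to_stop_sign_m, speed_mps]; label: 1 = human braked, 0 = did not.
X = np.array([[5.0, 8.0], [30.0, 10.0], [3.0, 6.0], [50.0, 12.0], [8.0, 9.0], [40.0, 11.0]])
y = np.array([1, 0, 1, 0, 1, 0])

policy = LogisticRegression().fit(X, y)

# The rule to brake near stop signs is never written down explicitly; it is
# implicit in the fitted model, along with whatever else the data carries.
print(policy.predict([[4.0, 7.0]]))    # expect [1]: brake
print(policy.predict([[45.0, 10.0]]))  # expect [0]: keep going
```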

Just to add to the great points that Brian's made, I think it's interesting to look at the rules which we set for ourselves.

So if you look at law as it's written,

sometimes people think that it should be written in an extremely precise, clear way.

Arguably that's true in some cases, but in many settings, it's deliberately written in a vague way because we know that it's going to be necessary for humans to take context into account when we interpret those rules.

I should also note that because they're vague, it makes it easier to come to agreement between different parties to set up those rules in the first place, knowing that we'll figure out the details down the road.

And we don't have those abilities with an AI system.

One of the things that Stuart does throughout the Reith Lectures, and we do in our programme too, well, I do to a certain extent, is that we make specific allusions to science fiction.

And in this case, Stuart goes on to talk about Asimov's laws of robotics.

And I know, Brian, you sort of alluded to it a minute ago as well.

So I'm going to test you.

Can you do them? All three?

Or four?

Well, I know they were amended over time, right?

And didn't he add a zero-eth law at some point?

He's bluffing for time.

He's doing well there.

No, it's incorrect.

Okay, why?

I think the zeroth one was not to allow humanity as a whole to come to harm.

And then it was like, don't allow

a specific person to come to harm.

And then after that, don't let itself get destroyed.

And then, if you've met all those other criteria, do whatever the person said.

Do you know what?

That's four out of four.

That's pretty, pretty good.

Very good.

That's pretty good.

Okay, well, listen, what's wrong with them?

I don't know that there's anything glaringly wrong with them, but in some sense, that is exactly why we should be worried, right?

It's the things that we don't foresee.

And, you know, in Asimov's fiction, that was the contrivance of the laws, was that they were a demonstration of how the system would nonetheless, you know, end up finding these creative ways to follow the letter, but not the spirit.

I also think one of the issues with the Asimov laws is that there are so few of them, right?

When human society is full of norms,

anyone who's been a parent, for example, or been around small children, you're constantly having to teach them things that seem obvious.

Like, okay, you don't put your finger in an electric socket, or peanut butter doesn't go on that.

And so there's kind of an inexhaustible list of these norms, right?

So yeah, I think that's part of the problem.

I guess despite the fact that these are not forming an exhaustive list, nonetheless, Stuart Russell has come up with his own set of principles rather than the Asimov ones.

Now, they don't map exactly, but the first rule is to do what humans want.

The first principle is that the machine's only objective is to maximise the realization of human preferences.

So, the machine will be purely altruistic towards humans with no objectives of its own,

including self-preservation as commanded by Asimov's third law.

Imagine watching two films, each describing in sufficient detail and breadth a future life you might lead,

including everything that you might care about, and deciding which of these two futures you prefer.

So that's one of Stuart Russell's principles. They don't exactly map onto Asimov's laws, but I think they're an interesting way to sort of frame this conversation.

But again, it strikes me that the sort of dichotomy that we've got, the disconnect we've got here, is how you get from simple principles that we want to abide by, do no evil, you know, don't kill cats or dogs and eat them, to what actually happens as societies evolve, which is you end up with, you know, 47 volumes of tax code to try and specify every single possible outcome of every possible variable in a real-world scenario.

So, you know, how do we square those things when we're thinking about AI and rules that they need to abide by?

Adrian?

Well, first, I think it's very hard to argue with Stuart's principle.

Why would you not want to maximize the realization of human preferences?

Of course, we do want to do that, but there's tremendous devil in the detail there.

First, how do we know what are human preferences?

How are we going to figure that out?

And even if we could figure it out, do we mean

Hannah, your preferences?

Do we mean Adam's preferences?

Do we mean Hannah, your preferences right now for what you want?

You know, would you like to eat a cake right now?

Maybe yes, but, you know, maybe it's not for your long-term good.

How do we deal with the fact that we know your preferences might evolve over time?

How are we going to deal with that?

So it sounds simple.

We should just, of course, work with human preferences, but there's a lot of complicated detail there to figure out.

Well, okay, let's go into some of those questions that you raised there.

How on earth, Brian, can you build an AI that understands how to interpret that?

I would say we're really just at the heady, exhilarating beginning of the process of actually building these systems that can learn general things from human preferences.

So one of my favorite examples, it was a collaboration between DeepMind in London and OpenAI in San Francisco, and they set out to try to determine if they could get a robot to do a backflip.

I don't know that any of us could confidently create a mathematical definition of a backflip.

We have systems that can learn from demonstrations, but most of us, I know I certainly can't, couldn't demonstrate a backflip in, you know, a motion capture system.

I couldn't do a backflip on a robot using a game controller, but I'd know it when I saw it.

And there's this question of, is that enough?

And so what the team did was they had this robot just wriggling around at random, and then it would show you two video clips of different random writhing, and it would say, which of these is slightly more like whatever that thing is in your head that you want me to do?

And you'd say, okay, well, in this one, you're kind of sort of tipping over to the right.

So sure, that's slightly closer.

And it would go and start to build this idea of what it thought you wanted.

And it would come back to you with another pair of video clips.

You'd say, Okay, the one on the left is a little bit closer to what I want.

And this remarkable thing happens, which is that over the course of about one hour, the robot starts doing these kind of gymnastically perfect backflips.

And I think that is just an incredible proof of concept

that we are starting to build these methodologies for somehow conveying these things that exist in human brains but that we can't articulate into words, let alone code or numbers.

You know, I'm not naive about what it would take to scale up from backflips to

human flourishing more generally, but I think it shows us that there might be a path forward here.
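
The core of the backflip experiment Brian describes is learning a reward function from pairwise human comparisons. A stripped-down Python sketch of that idea follows; the real DeepMind and OpenAI work used deep networks and reinforcement learning, and the hidden preference and features here are invented, so this only shows the shape of the method:

```python
# Learn a reward from "which of these two clips is closer to what I want"
# judgements, using a simple Bradley-Terry style comparison loss.
import numpy as np

rng = np.random.default_rng(0)

def true_preference(clip_a, clip_b):
    # Stand-in for the human judge: secretly prefers clips with a larger
    # first feature. 0 means "a preferred", 1 means "b preferred".
    return 0 if clip_a[0] > clip_b[0] else 1

# Reward model: r(clip) = w . features(clip).
w = np.zeros(3)
lr = 0.1

for _ in range(2000):
    a, b = rng.normal(size=3), rng.normal(size=3)
    label = true_preference(a, b)
    p_a = 1.0 / (1.0 + np.exp(w @ b - w @ a))   # model's P(a is preferred)
    # Gradient step on the cross-entropy between model and human judgement.
    grad = ((1 - label) - p_a) * (a - b)
    w += lr * grad

# After training, the learned reward should rank clips the way the human does.
test_a, test_b = np.array([2.0, 0.0, 0.0]), np.array([0.5, 0.0, 0.0])
print(w @ test_a > w @ test_b)  # expect True: matches the hidden preference
```

The point is only that repeated this-one-rather-than-that-one judgements can recover a usable reward signal without the human ever writing the objective down.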

As Brian was saying there, maybe you don't even need to be able to specifically define what your preferences are for an AI to still sort of understand them.

But how do we know what's best for us?

That seems like a very difficult question.

It is a very difficult question.

There are great challenges to really understanding even an individual's preferences.

Although we like to think, we humans like to think of ourselves as being autonomous agents that decide stuff for ourselves, we're heavily influenced by people around us and by stories around us.

And increasingly, we're actually

influenced by AI algorithms.

So social media platforms and search engines are really applying a sort of digital filter to the information that we see about the world.

And for better or worse, that influences our beliefs about things.

So that further complicates all these issues.

Well, hang on a second.

This seems like an unsolvable problem in many ways, Brian.

There are 8 billion of us on Earth.

How can we possibly decide what kind of future we want where there's no conflict between us?

As Adrian pointed out,

this goes far beyond AI.

Alignment was a human problem first, and I think it will be a human problem last.

Even if we get to a point where we can declare victory over the technical alignment problem, you know, aligning the behavior of an AI system with the people that build it, you're still left with, I think, this really fundamental question, which is

who gets a seat at the table?

Whose idea of a backflip is getting implemented by the robot?

And that involves thinking not just about the behavior of the AI system relative to the designer's intent, but how does the business model of the company that built that system

fit into the people who use that system or the third parties that are affected?

How does it fit into governance more broadly?

So

I think waiting for us on the other side of all of these very concrete technical challenges are these millennia-old questions of political philosophy,

how does a giant heterogeneous group of people sort of express the quote-unquote will of that group of people?

And we are just starting to see the beachhead of these questions making their way into computer science.

So for example, as soon as you start to move to a heterogeneous pool of users, you start to get these weird problems.

So we saw, for example, Tesla was recently talking about some of their drivers are aggressive and some of their drivers are passive.

And if you're in a situation where you're making a left-hand turn and there's a car coming, the aggressive drivers hit the gas in order to turn before the oncoming car.

The passive drivers hit the brake in order to let the other car go first.

And the last thing in the world you want to do is just split the difference, right?

Because that's how you end up in a crash.

And so this is very much an active research problem.

There's a very specific technical problem there, but it is also the problem of how do a group of voters agree on who should lead them

if they all vote for different people.

And there are paradoxes that have existed in political philosophy going many decades back.
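
To make the Tesla example concrete, here is a toy Python calculation, with invented numbers, showing why averaging an aggressive policy and a passive policy can be worse than either:

```python
# "Splitting the difference" between two reasonable preferences can produce
# the one behaviour nobody wants. Numbers are purely illustrative.
ACTIONS = {"accelerate": 1.0, "coast": 0.0, "brake": -1.0}

def safety(action_value: float) -> float:
    # Committing either way is fine; hesitating mid-junction is not.
    return 1.0 - (1.0 - abs(action_value)) * 2.0  # accelerate/brake -> 1.0, coast -> -1.0

aggressive = ACTIONS["accelerate"]
passive = ACTIONS["brake"]
averaged = (aggressive + passive) / 2             # = 0.0, i.e. coast into the junction

for name, a in [("aggressive", aggressive), ("passive", passive), ("averaged", averaged)]:
    print(name, safety(a))
# aggressive 1.0, passive 1.0, averaged -1.0: the "compromise" is the crash.
```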

Okay, well, one thing that Stuart thinks may help us to follow human preferences is to instill an element of uncertainty in the AI.

The second principle is that the machine is initially uncertain about what those preferences are.

This is the core of the new approach.

We remove the false assumption that the machine is pursuing a fixed objective that is perfectly known.

This principle is what gives us control.

Let me unpack this a little bit with you, Adrian, if I may.

How does this idea work?

How does including an element of uncertainty in a system

help us to get closer towards an answer?

At its heart, there's the scary recognition that if you have a fixed rule that a machine follows, there'll be some cases where we won't have thought of everything and it will do something that we don't like, like kill and eat a cat as we talked about before.

So by making the machine uncertain about what we want,

this naturally makes the machine interested in what is it that we would like it to do.

And because of that, it won't jump to do things which could really cause harm and it'll look to us to guide it.

Presumably we want the machine to be uncertain about some things but confident about other things and exactly how we divvy that up and figure it out is a very complicated problem.
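
A back-of-the-envelope version of Adrian's point, with invented numbers, shows why a machine that is uncertain about our preferences will choose to check with us before acting:

```python
# The robot thinks "cook the cat" is probably fine but might be catastrophic.
# A machine with a fixed, certain objective would just act; one that keeps
# uncertainty about our preferences prefers to ask. Values are illustrative.
p_owner_minds = 0.3          # robot's uncertainty about the human's preference
value_if_fine = 10.0         # fed the hungry children
value_if_not = -1000.0       # cooked the beloved pet
cost_of_asking = -1.0        # small delay while it checks with the human

act_now = (1 - p_owner_minds) * value_if_fine + p_owner_minds * value_if_not
ask_first = cost_of_asking + (1 - p_owner_minds) * value_if_fine  # human vetoes the bad case

print(act_now)    # 7 - 300 = -293: confidently wrong
print(ask_first)  # -1 + 7 = 6: deferring to the human wins
```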

I suppose in some ways actually people may have some experience of this idea of uncertainty helping to get towards the right solution.

I was thinking about how sat nav used to just you put it in your system and it just started off with an arrow going like off you go in that direction.

And then we had lots of stories about people driving into the ocean and like driving off cliffs because their sat nerv told them to.

Whereas now, broadly speaking, you're often presented with a map with three different options, and that is the machine in some way saying, not sure which one of these you would like more, and it's your choice in the end to decide which way you go.

There is actually a wonderful bit of uncertainty in what people are actually asking for when they're asking for directions from point A to point B.

It was initially assumed that people just wanted the absolute fastest way to get to the destination, even if that meant burning twice as much gas and taking a million side turns and so forth.

Increasingly, we're starting to see a little bit of this uncertainty creep in, where, yeah, maybe people just want to be as fast as possible.

Maybe they want to take as few turns as possible, or maybe they want to take the scenic route.

And gradually we are starting to see the space of what we mean by the best path from A to B start to enlarge.

And I think that's a great example of how much ambiguity there is, how much subtlety and human nuance there is, even in something seemingly as simple as asking for directions.
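
One way to picture the enlarging space Brian mentions is as a weighted scoring of candidate routes, where different users put different weights on time, turns and scenery. A small Python sketch with invented route data:

```python
# Different weightings of the same objectives pick different "best" routes.
routes = {
    "motorway":   {"minutes": 30, "turns": 4,  "scenery": 0.2},
    "back_roads": {"minutes": 42, "turns": 15, "scenery": 0.8},
}

def score(route, w_time, w_turns, w_scenery):
    # Lower is better; scenery is a bonus, so it subtracts from the cost.
    return w_time * route["minutes"] + w_turns * route["turns"] - w_scenery * 10 * route["scenery"]

for prefs in [{"w_time": 1.0, "w_turns": 0.0, "w_scenery": 0.0},   # "just get me there"
              {"w_time": 0.5, "w_turns": 0.0, "w_scenery": 3.0}]:  # "I'd like the scenic route"
    best = min(routes, key=lambda r: score(routes[r], **prefs))
    print(prefs, "->", best)
```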

I'm always baffled when it says, this is the alternative route, and it'll take you 45 minutes longer.

Presumably, no one has ever clicked on, oh, that's the one I want to do.

But I suppose, you know, another aspect of this, which is related, which is that we change our minds, right?

That we're faced with options.

And one day, you might want to take the scenic route, or one day, you know, you might want to stop off at the shops on the way, and you don't want to have to program that in, but you know that there's a shop on the way.

So how do we factor the fact that we do like to change our minds into the options in front of us, that level of uncertainty?

There are exciting opportunities there, but also some concerns.

I think we all enjoy it if we're with our partner and our partner intuitively can appreciate that we might want to take the scenic route on that particular day because they can tell the way we feel and they can see that it would be nice to calm us down.

We would like that.

It's really wonderful to be understood like that and to have someone act in an appropriate way.

But on the other hand, if we can really build machines which are capable of doing that, it could do really good things, but it also creates worrying scenarios.

If they really understand us that well, it presumably makes it much easier for them to manipulate us and deceive us and do all sorts of things, which some people might want to do.

That, in some ways, actually brings us to Stuart Russell's third principle, which is that the best illustration of how we want machines to behave is our own behaviour, I guess.

So, you know, do as I do, not as I say.

Here's Stuart's third principle.

The third principle is that the ultimate source of information about human preferences is human behaviour.

Here, behaviour means everything we do, which includes everything we say, as well as everything we don't do, such as not reading your email during lectures.

It also includes the entire written record because most of what we write about is humans doing things and other humans being upset about it.

Okay, so what do you guys think about this?

Might this be problematic?

Well, Stuart pointed out that the entire written record of human civilization is in many ways a story about our norms, our values, our goals.

But it also includes our fascination with grisly crimes.

It also includes the internet, which is full of hate speech and trolling and all sorts of things.

So how do you make sure that this thing which is reading the quote-unquote entire written record of human history is not learning the ethnic slurs and

the sort of sexist comments, et cetera, et cetera?

I think this ends up being

a pretty big part of the challenge that's ahead of us in the next few years.

Just to add to Brian's good example, and also to make us ponder the extent to which humans

also suffer from dangerous biases.

There was a nice example from a few years ago where what Amazon did was they started to use a hiring algorithm where they were going to filter candidates based on

analyzing what their human hirers had done in the past to try to train their algorithm to replicate what those

human selectors had done.

And what they discovered was that their algorithm actually learned to be biased against women in a way that was really troubling.

Now, what then happened was that there was some publicity around Amazon using this algorithm, it caused them some trouble, and they stopped using the algorithm.

But I think that's unfortunate.

I think Amazon were trying to do something good.

And what they had learned by training their algorithm on human decisions was actually their humans were biased.

And so I don't think the right answer is to throw the algorithm away and go back to those biased humans.

Particularly because actually it's almost always much easier to tweak an algorithm to try to mitigate those biases than to try to tweak humans so that they're going to be less biased.

So although sometimes people say bias in, bias out, that is, if you have biased data going in, you will get a biased algorithm, that isn't always the case.

We increasingly have a toolkit of methods that can measure and mitigate bias.
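
As a flavour of the toolkit Adrian mentions, here is a minimal Python sketch of one simple bias measurement, the gap in selection rates between two groups. The data and the choice of metric are illustrative only; real audits use richer measures and take context into account.

```python
# Measure one simple fairness metric, selection-rate parity, on a model's
# shortlisting decisions. Invented data for illustration.
import numpy as np

preds = np.array([1, 0, 1, 1, 0, 0, 1, 0])              # 1 = shortlist the candidate
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

rates = {str(g): float(preds[groups == g].mean()) for g in np.unique(groups)}
print(rates)                                             # {'A': 0.75, 'B': 0.25}
print("parity gap:", abs(rates["A"] - rates["B"]))       # 0.5, large enough to flag for review
```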

I wonder whether we need some kind of

code of ethics, because we do have them in medicine, we do have them in science, and it's taken terrible global events for these to emerge, but they do exist.

I think we very much are in need of something like that, and I think we're moving towards something like that.

I mean, if you look at fields like the pharmaceutical industry, a drug has to pass not just some kind of peer-reviewed trial, but then it has to get authorized by the government before it can go out and affect millions or billions of people.

We don't really have something like that in machine learning in terms of the licensing or the approval or regulation process.

Facebook's motto famously for many years was move fast and break things.

And we are living in a world in which these algorithms that are deployed at a scale that affects billions of people are tweaked dozens of times every single day with the idea that if we screw something up, we'll just roll it back.

Maybe we're now at a point where that can't be the ethos.

I mean, if we approached pharmaceuticals that way, we'd just give people a bunch of random molecules in pill form.

And if some of them start dying, we'll change it and we'll switch the formula.

You know, no big deal.

The fact that that sounds crazy in that context might give us an indication of just the degree to which we're kind of living in this wild west digitally.

And maybe it's time to sort of bring that era to an end.

These challenges are hard.

And in addition, they're pressing because they're influencing us already today.

So we really need to get going and working on these problems.

And I want to emphasize that we need to reach out across a really broad set of expertise.

We need technical experts, we need social scientists, economists, lawyers, policymakers.

And of course, we also need to draw on the perspectives and values of a completely diverse set of users and stakeholders.

We need to listen very carefully to everybody.

So one message I'd really like to get across here is that please everyone, get involved because we need to work on this together now.

Well, on that note, I'm afraid to say that we have reached the end of our time today and we have only just dipped our toes into the future.

But thank you very much to our guests, Brian Christian and Adrian Weller.

Yes, thank you to all of our guests over the series and that is a wrap on the Reith Lectures for 2021.

But don't forget that all of the Reith Lectures, from this year's by Stuart Russell all the way back to Bertrand Russell's in 1948, are available free online on the BBC website.

It is a ridiculous treasure trove of some of the best ideas of the modern age.

So do tuck in.

It's informative, educational and entertaining.

And it doesn't just include people with the surname Russell.

I'm Adam Rutherford.

And I'm Hannah Fry, and we will be back in the new year with more of us being just a little bit sci-curious.

Thank you very much for listening.
