Shane Legg (DeepMind Founder) - 2028 AGI, New Architectures, Aligning Superhuman Models
I had a lot of fun chatting with Shane Legg - Founder and Chief AGI Scientist, Google DeepMind!
We discuss:
* Why he expects AGI around 2028
* How to align superhuman models
* What new architectures are needed for AGI
* Has DeepMind sped up capabilities or safety more?
* Why multimodality will be the next big landmark
* and much more
Watch full episode on YouTube, Apple Podcasts, Spotify, or any other podcast platform. Read full transcript here.
Timestamps
(0:00:00) - Measuring AGI
(0:11:41) - Do we need new architectures?
(0:16:26) - Is search needed for creativity?
(0:19:19) - Superhuman alignment
(0:29:58) - Impact of DeepMind on safety vs capabilities
(0:34:03) - Timelines
(0:41:24) - Multimodality
Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
Press play and read along
Transcript
Speaker 1 Okay,
Speaker 1 today I have the pleasure of interviewing Shane Legg, who is a founder and the Chief AGI Scientist of Google DeepMind. Shane, welcome to the podcast.
Speaker 2 Thank you, it's a pleasure to be here.
Speaker 1 So, first question: How do we measure progress towards AGI concretely? So, we have these loss numbers, and we can see how the loss improves from one model to another, but it's just a number.
Speaker 1 How do we interpret this? How do we see how much progress we're actually making?
Speaker 2 That's a hard question, actually.
Speaker 2 AGI, by its definition, is about generality. So it's not about doing a specific thing.
Speaker 2 It's much easier to measure performance when you have a very specific thing in mind because you can construct a test around that. Well, maybe I should first explain what I mean by AGI.
Speaker 2 Because there are a few different notions around it. When I say AGI, I mean a machine that can do the sorts of cognitive things that people can typically do, possibly more.
Speaker 2 But to be an AGI, that's kind of the bar you need to meet. So if we want to test whether we're meeting this threshold or getting close to it, what we actually need then is
Speaker 2 a lot of different kinds of measurements and tests
Speaker 2 that span the breadth of all the sorts of cognitive tasks that people can do. And then to have a sense of what human performance is
Speaker 2 on these sorts of tasks. And that then allows us to sort of judge whether or not we're there.
Speaker 2 It's difficult because you'll never have a complete set of everything that people do because it's such a large set. But I think that if you ever get to the point where you
Speaker 2 have a pretty good range of tests of all sorts of different things that people do, cognitive things people do, and you have an AI system which can meet human performance on all those things,
Speaker 2 and with some effort, you can't actually come up with new examples of cognitive tasks where the machine is below human performance, then at that point, it's conceptually possible that there is something that
Speaker 2 the machine can't do that people can do. But if you can't find it with some effort, I think for all practical purposes, you now have an AGI.
Speaker 1 So let's get more concrete.
Speaker 1 You know, we measure the performance of these large language models on MMLU or something, and maybe you can explain what all these different benchmarks are.
Speaker 1 But the ones we use right now that you might see in a paper, what are they missing? What aspect of human cognition do they not measure adequately?
Speaker 2 Oh, yeah, another hard question.
Speaker 2 These are quite big areas. So they don't measure things like understanding streaming video, for example, because these are language models and people can do things like understanding streaming video.
Speaker 2 They don't do things
Speaker 2 like humans have what we call episodic memory.
Speaker 2 So we have a working memory, which are things that have happened quite recently, and then we have the sort of cortical memory.
Speaker 2 So these are things that have, you know, been baked into our cortex. But there's also a system in between, which is episodic memory, in the hippocampus. This is about learning specific things very, very rapidly. So if you remember some of the things I say to you today tomorrow, that will be your episodic memory, your hippocampus. Our models don't really have that kind of thing, and we don't really test for that kind of thing. We just sort of try to make the context windows, which I think are more like a working memory, longer and longer to compensate for this.
Speaker 2 But yeah, we don't really test for that kind of a thing.
Speaker 2 So
Speaker 2 there are all sorts of bits and pieces, but it is a difficult question because, as I said, the generality of human intelligence is very, very broad.
Speaker 2 So you really have to start going into the weeds of trying to find if there are specific types of things that are missing from existing benchmarks or different categories of benchmarks that
Speaker 2 don't currently exist or something.
Speaker 1 The thing you're referring to with episodic memory, would it be fair to call that sample efficiency, or is that a different thing?
Speaker 2 It's very much related to sample efficiency. It's one of the things that enables humans to be very sample efficient.
Speaker 2 Large language models have a certain kind of sample efficiency because when something's in their context window, that sort of biases the distribution to behave in a different way.
Speaker 2
And so that's a very rapid kind of learning. So there are multiple kinds of learning, and the existing systems have some of them, but not others.
So it's a little bit complicated.
Speaker 1 So
Speaker 1 this kind of memory, or we call it sample efficiency, whatever,
Speaker 1 is it a fatal flaw of these deep learning models that they just take trillions of tokens, many orders of magnitude more than a human will see throughout their lifetime?
Speaker 1 Or is this something that just gets solved over time?
Speaker 2 So the models can learn things immediately when they're in the context window. And then they have this sort of longer process when you actually train the base model and so on.
Speaker 2 And there they're learning over trillions of tokens, but they sort of miss something in the middle.
Speaker 2 That's sort of what I'm getting at here.
Speaker 2 I don't think it's a fundamental limitation.
Speaker 2 I think what's happened with large language models is something fundamental has changed. We know how to build models now that have some degree of, I would say, understanding of what's going on.
Speaker 2 And that did not exist in the past. And because we've got a scalable way to do this now, that unlocks lots and lots of new things.
Speaker 2 Now, we can then look at things which are missing, such as this sort of episodic memory type thing, and we can then start to imagine ways to address that. So
Speaker 2 my feeling is that
Speaker 2 there are kind of relatively clear paths forwards now to address most of the shortcomings we see in existing models, whether it's about delusions, factuality,
Speaker 2 the type of memory and learning that they have, or understanding video, or all sorts of things like this. So, actually, I don't see big blockers here.
Speaker 2 I don't see big walls in front of us. I just see there's more research and work and these things will improve and probably be adequately solved.
Speaker 1 But going back to the original question of how you measure when human-level AI has arrived, or when we're beyond it,
Speaker 1 as you mentioned, there's these other sorts of benchmarks you can use and other sorts of traits.
Speaker 1 But concretely, if there is, what would it have to do for you to be like, okay, we've reached human level? Would it have to beat Minecraft from start to finish? Would it have to get 100% on MMLU?
Speaker 1 What would it have to do?
Speaker 2 There is no one thing that would do it because I think that's the nature of it.
Speaker 2 It's about general intelligence, so I would have to make sure it could do lots and lots of different things, and it didn't have a gap.
Speaker 2 We already have systems that can do very impressive categories of things to human level or even beyond.
Speaker 2 So I would want a whole suite of tests that I felt was very comprehensive. And then, furthermore, when people come and say, okay, so it's passing our big suite of tests, let's try to find examples.
Speaker 2 Let's take an adversarial approach to this. Let's deliberately try to find examples where people can clearly typically do this, but the machine fails.
Speaker 2 And when those people cannot succeed, I'll go, okay, we're probably there.
Speaker 1 A lot of your early research, at least,
Speaker 1 I find, emphasized that AI should be able to manipulate and succeed in a variety of open-ended environments. It kind of sounds like a video game almost.
Speaker 1 Is that where your head is still at now, or do you think about it differently?
Speaker 2 Yeah, it's evolved a bit.
Speaker 2 When I did my thesis work around universal intelligence and so on, I was trying to come up with a sort of extremely universal, general, mathematically clean framework for defining and measuring intelligence.
Speaker 2 And
Speaker 2 I think there were aspects of that that were successful. I think in my own mind, it clarified
Speaker 2 the nature of intelligence
Speaker 2 as being able to perform well in lots of different domains and different tasks and so on. It's about that sort of capability of performance and the breadth of performance.
Speaker 2
So I found that was quite helpful, enlightening. There was always the issue of the reference machine.
because
Speaker 2 in the framework you have a weighting of things according to their complexity. It's like an Occam's razor type of thing, where you weight
Speaker 2 tasks and environments which are simpler more highly, because you've got an infinite,
Speaker 2 countable space of different computable environments, of semi-computable environments.
Speaker 2 And that
Speaker 2
complexity measure has something built into it, which is called a reference machine. And that's a free parameter.
So that means that the intelligence measure has a free parameter in it.
Speaker 2 And as you change that free parameter, it changes the weighting and the distribution over the space of all the different tasks and environments.
Speaker 2 So this is sort of an unresolved part of the whole problem.
Speaker 2 So
Speaker 2 what
Speaker 2 reference machine should we ideally use? There isn't really a, there's no universal, like one specific reference machine.
Speaker 2 People will usually put a universal Turing machine in there, but there are many different kinds of universal Turing machines. So, given that it's a free parameter, I think the most natural thing to do is say, okay,
Speaker 2 let's think about what's meaningful to us in terms of intelligence.
Speaker 2 I think human intelligence is meaningful to us and the environment that we live in.
Speaker 2
We know what human intelligence is. We are human too.
We interact with other people who have human intelligence. We know that human intelligence is possible, obviously, because it exists in the world.
Speaker 2 We know that human intelligence is very, very powerful because it's affected the world profoundly in countless ways.
Speaker 2 And we know if human-level intelligence was achieved, that would be economically transformative, because the types of cognitive tasks people do in the economy could be done by machines then.
Speaker 2 And it would be philosophically important, because this is sort of how we often think about intelligence. And I think historically, it would be a key point.
Speaker 2 So I think that human intelligence, in a human-like environment, is actually quite a natural sort of reference point. So you could imagine sort of setting
Speaker 2 your reference machine to be such that it emphasizes the kinds of environments that we live in as opposed to some abstract mathematical environment or something like that.
Speaker 2 And so that's how I've kind of gone on this journey of let's try to define a completely universal, clean, mathematical notion of intelligence to, well,
Speaker 2
it's got a free parameter. One way of thinking about it is say, okay, let's think more concretely now about human intelligence.
And can we build machines that can match human intelligence?
Speaker 2
Because we understand what that is, and we know that that is a very powerful thing. And it has economic, philosophical, historical kind of importance.
So that's kind of the journey.
Speaker 2 And the other aspect, of course, is that, you know, in this pure formulation of
Speaker 2 complexity, it's actually not computable.
Speaker 2 And I obviously knew that there was a limitation at the time, but it was an effort to say, okay, can we just even very theoretically come up with a clean definition?
Speaker 2 I think we can sort of get there, but we have this issue of
Speaker 2 a reference machine which is unspecified.
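For readers who want the formal object behind this exchange, the universal intelligence measure from Legg and Hutter's work weights an agent's expected performance in every computable environment by that environment's simplicity. A rough rendering (exact notation varies across papers) is:

$$\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^{\pi}$$

Here $E$ is the space of computable, reward-bounded environments, $V_\mu^{\pi}$ is the expected total reward agent $\pi$ earns in environment $\mu$, and $K(\mu)$ is the Kolmogorov complexity of $\mu$. The reference machine discussed above is the universal Turing machine with respect to which $K$ is defined: changing it changes the weights $2^{-K(\mu)}$, which is exactly the free parameter Shane describes.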
Speaker 1 So before we move on, I do want to ask, on the original point you made about these machines, these LLMs, needing
Speaker 1 episodic memory,
Speaker 1 you said that these are
Speaker 1 problems that we can solve. These are not fundamental impediments.
Speaker 1 But when you say that, do you think they will just be solved by scale, or do each of these need a fine-grained, specific solution that is architectural in nature?
Speaker 2 I think it'll be architectural in nature because the, well, the current architectures,
Speaker 2 they don't really have what you need to do this. They basically have a context window, which is very, very fluid, of course, and they have the weights, which things get baked into very slowly.
Speaker 2 So, to my mind, that feels like working memory, which is like the activations in your brain, and then the weights, the synapses and so on in your cortex. Now, the brain separates these things out.
Speaker 2 It has a separate mechanism for rapidly learning
Speaker 2
specific information because that's a different type of optimization problem compared to slowly learning deep generalities. All right.
That's sort of,
Speaker 2 there's a tension between the two, but you want to be able to do both. You want to be able to, I don't know, hear someone's name and remember it the next day.
Speaker 2 And you also want to be able to integrate information over a lifetime so you start to see deeper patterns in the world.
Speaker 2 These are quite
Speaker 2 different
Speaker 2
optimization targets, different processes. But a comprehensive system should be able to do both.
And so, I think
Speaker 2 it's conceivable you could build one system that does both, but you can see, because they're quite different things, that it makes sense to do them differently.
Speaker 2 I think that's why the brain does it separately.
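As a concrete illustration of the separation Shane is describing, here is a minimal, hypothetical sketch of an external episodic memory attached to a model: episodes are written once, stored, and retrieved by similarity so they can be placed back into the context window. This is not a description of DeepMind's architecture; the embedding and store below are toy placeholders for what would be learned components in a real system.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a learned encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class EpisodicMemory:
    """Write-once store of specific episodes, retrieved by similarity.
    It sits between slow weight updates and the fast-but-small context window."""

    def __init__(self) -> None:
        self.episodes: list[tuple[Counter, str]] = []

    def write(self, text: str) -> None:
        self.episodes.append((embed(text), text))

    def recall(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.episodes, key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]

# Usage: store something heard once today, recall it tomorrow and prepend to the prompt.
memory = EpisodicMemory()
memory.write("Shane said episodic memory lives in the hippocampus.")
memory.write("The context window behaves more like working memory.")
print(memory.recall("what did Shane say about the hippocampus?"))
```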
Speaker 1 I'm curious about how concretely you think that would be achieved. And I'm specifically curious,
Speaker 1 maybe you can address this as part of the answer:
Speaker 1 You know, DeepMind has been working on these domain-specific reinforcement learning type setups, AlphaFold, AlphaCode, and so on. How does that fit into what you see as a path to AGI?
Speaker 1 Have these just been orthogonal domain-specific models or do they feed into the eventual AGI?
Speaker 2 Things like AlphaFold
Speaker 2 are not really feeding into AGI. You know, we may learn things in the process that may end up being relevant, but I don't see them as being
Speaker 2 likely being on the path to AGI.
Speaker 2 But yeah,
Speaker 2
we're a big group. We've got hundreds and hundreds and hundreds of PhDs working on lots of different projects.
So when we find,
Speaker 2
you know, what we see like opportunities to do something significant like AlphaFold, we'll go and do it. It's not like we only do.
AGI type work.
Speaker 2 We work on fusion reactors and
Speaker 2 various things in sustainability, energy. We've got people looking at
Speaker 2 satellite images of
Speaker 2
deforestation. We have people looking at weather forecasting.
We've got tons of people looking at lots of things.
Speaker 1 On the point you made earlier about the reference class or the reference machine as human intelligence, it's interesting because in your 2008 thesis, one of the things you mentioned almost as a side note is, well, how would you measure intelligence?
Speaker 1 And you said, well, you could do a compression test and you could see if it fills in words and a sample of text, and that could measure intelligence.
Speaker 1 And funnily enough, that's basically how the LLMs are trained. At the time, did it stick out to you as an especially fruitful thing to train for?
Speaker 2 Well,
Speaker 2 yeah, I mean, in a sense, what's happened is actually very aligned with what I wrote about in my thesis, which draws on the ideas from Marcus Hutter with AIXI,
Speaker 2 where
Speaker 2 you take Solomonoff induction, which is this incomputable but sort of theoretically very elegant and extremely sample-efficient prediction system.
Speaker 2 And then once you have that, you can build a general agent on top of it by basically adding search and a reinforcement signal. That's what you do with AIXI.
Speaker 2 But what that sort of tells you is that if you have a fantastically good sequence predictor, some approximation of Solomonoff induction, then going from that to a very powerful, very general AI system, an AGI system, is just sort of another step.
Speaker 2 You know, you've actually solved a lot of the problem already.
Speaker 2 And I think that's what we're seeing today, actually, that these incredibly powerful foundation models are incredibly good sequence predictors. They're compressing the world based on all this data.
Speaker 2 And then you will be able to extend these in different ways and build very, very powerful agents out of them.
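For context, the AIXI agent referenced here layers expected-reward maximization and search on top of a Solomonoff-style mixture over environments. One standard way of writing it (following Hutter's formulation; notation varies by source) is:

$$a_t = \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m} \big(r_t + \cdots + r_m\big) \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

Actions $a$, observations $o$, and rewards $r$ alternate up to horizon $m$; $U$ is a universal Turing machine, and the innermost sum is the Solomonoff-style prior over programs $q$ consistent with the history. The structure mirrors Shane's point: the $2^{-\ell(q)}$ mixture is the (incomputable) ideal sequence predictor, and the surrounding expectimax is the search plus reinforcement signal added on top.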
Speaker 1 Okay, let me ask you more about that. So, Richard Sutton's bitter lesson essay says that there's two things you can scale: search and learning.
Speaker 1 And I guess you could say that LLMs are about the learning aspect.
Speaker 1 The search stuff, which you've worked on throughout your career, where you have an agent that is
Speaker 1 interacting with this environment, and
Speaker 1 is that a direction that needs to be explored again? Or is that something that needs to be added to LLMs where they can actually interact with their data or the world in some way?
Speaker 2 Yeah.
Speaker 2 I think that's on the right track. I think there is
Speaker 2 these foundational models are world models of a kind, and to do really creative
Speaker 2 problem solving, you need to start searching. So if I think about something like AlphaGo and the move 37, the famous move 37, where did that come from?
Speaker 2 Did that come from all its data that it's seen of human games or something like that? No, it didn't.
Speaker 2 It came from it identifying a move as being quite unlikely, but plausible, and then via a process of search, coming to understand that that was actually a very, very good move.
Speaker 2 So to get real creativity, you need to search through spaces of possibilities and find these sort of hidden gems. That's what creativity is.
Speaker 2 I think current language models, they don't really do that kind of a thing. They really are mimicking the data.
Speaker 2 They are mimicking all the human ingenuity and everything which they have seen from all this data that's coming from the internet, that's originally derived from humans.
Speaker 2 If you want a system that can go truly beyond that, it's not enough to just generalize in novel ways. I mean, these models can blend things.
Speaker 2 They can do, you know, Harry Potter in the style of a Kanye West rap or something, even though that's never happened. They can blend things together.
Speaker 2 But to do something that's truly creative, that there's not just a blending of existing things, that requires searching through a space of possibilities and finding these hidden gems that
Speaker 2
are sort of hidden away in there somewhere. And that requires search.
So I don't think we'll see systems that truly step beyond their training data until we have powerful search in the process.
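As a toy illustration of that point (this is not AlphaGo's algorithm, and the prior, values, and function names below are made up for the example): a pure sampler tends to return the high-prior move, while even a crude search that evaluates each candidate by simulated lookahead can surface a low-prior "hidden gem" like move 37.

```python
import random

# Hypothetical prior over candidate moves, as a policy or language model might assign.
PRIOR = {"obvious_move": 0.60, "solid_move": 0.35, "surprising_move": 0.05}

# Hypothetical ground-truth win rates; the low-prior move is secretly the best one.
TRUE_VALUE = {"obvious_move": 0.48, "solid_move": 0.52, "surprising_move": 0.70}

def sample_from_prior() -> str:
    """What a pure sequence predictor does: mimic the data distribution."""
    r, cum = random.random(), 0.0
    for move, p in PRIOR.items():
        cum += p
        if r < cum:
            return move
    return move

def rollout_value(move: str, n: int = 500) -> float:
    """Stand-in for search: estimate a move's value by simulated playouts."""
    return sum(random.random() < TRUE_VALUE[move] for _ in range(n)) / n

def search_best_move() -> str:
    """What a search-augmented agent does: evaluate every plausible option."""
    return max(PRIOR, key=rollout_value)

random.seed(0)
print("sampled from prior:", sample_from_prior())   # usually 'obvious_move'
print("chosen after search:", search_best_move())   # 'surprising_move'
```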
Speaker 1 So there are rumors that Google DeepMind is training newer models, and you don't have to comment on those specifically. But
Speaker 1 when you do that,
Speaker 1 if it's the case that search or something like that is required to go to the next level, are you training in a completely different way than, say, GPT-4 or other Transformers are trained?
Speaker 2 I can't say much about how we're training. I think it's fair to say we're doing the sorts of scaling and training roughly that you see many people in the field doing.
Speaker 2 But we have our own take on it and our own different tricks and techniques.
Speaker 1
Okay, maybe we'll come back to it and get another answer on that. But let's talk about alignment briefly.
So, what will it take to align human-level and superhuman AIs? And
Speaker 1 it's interesting because the sorts of reinforcement learning and self-play kinds of setups that are popular now, like Constitutional AI or RLHF, DeepMind obviously has had expertise in for much longer.
Speaker 1 So I'm curious what you think of the current landscape and how DeepMind
Speaker 1 pursues that problem of safety towards human-level models.
Speaker 2 So do you want to know about what we're currently doing, or would you want me to have a stab at what I think needs to be done?
Speaker 1 What needs to be done.
Speaker 2
So, I mean, in terms of what we're currently doing, we're doing lots of things. We're doing interpretability.
We're doing process supervision. We're doing red teaming.
Speaker 2 We're doing evaluation for dangerous capabilities. We're doing work on institutions and governance and, you know, tons of stuff, right? There's lots of different things.
Speaker 2 Anyway, what do I think needs to be done?
Speaker 2 So,
Speaker 2 I think that powerful machine learning, powerful AGI, is coming at some point.
Speaker 2 And
Speaker 2 if the system is really capable, really intelligent, really powerful, trying to somehow contain it or limit it is probably not a winning strategy because these systems ultimately will be very, very capable.
Speaker 2 So what you have to do is you have to align it. You have to get it so it's fundamentally a highly ethical, value-aligned
Speaker 2 system from the get-go, right?
Speaker 2 How do you do that? Well,
Speaker 2 I have a, maybe this is slightly naive, but this is my take on it. How do people do it, right? If you have a really difficult ethical decision in front of you, what do you do? Right? Well,
Speaker 2 you don't just do the first thing that comes to mind, right? Because, you know, there could be a lot of emotions involved and other things, right? It's a difficult problem.
Speaker 2
So what you have to do is you have to calm yourself down, you've got to sit down and you've got to think about it. You've got to think, well, okay, what could I do? I could do this.
I could do this.
Speaker 2 I could do this.
Speaker 2 If I do each of these things, what will happen, right?
Speaker 2 And then you have to think about, so that requires a model of the world. And then you have to think about ethically,
Speaker 2 how do I view each of these different actions and the possibilities and what may happen from it, right?
Speaker 2 What is the right thing to do?
Speaker 2 And as you think about all the different possibilities and your actions and what can follow from them and how it aligns with your values and your ethics, you can then come to some conclusion of what is really the best choice that you should be making if you want to be really ethical about this.
Speaker 2 I think AI systems need to essentially do the same thing. So, when you sample from a foundational model at the moment, it's like it's blurting out the first thing.
Speaker 2 It's like System 1, if you like, from psychology, from Kahneman, right?
Speaker 2 That's not good enough. And if we do RLHF or, what's it called, the AI version without the human feedback, RLAIF, is that what it is? Oh gosh, I'm confusing myself.
Anyway, Constitutional AI tries to do that sort of thing. You're trying to fix the underlying System 1, in a sense, right?
Speaker 2 And that can shift the distribution and that can be very helpful, but it's a very high-dimensional distribution and you're sort of poking it in a whole lot of points.
Speaker 2 And so it's not likely to be a very robust solution.
Speaker 2 Right? It's like trying to train yourself out of a bad habit. You know, you can sort of do it eventually. But what you need to do is you need to have a System 2. You need the system to not just sample from the model. You need the system to go, okay, I'm going to reason this through. I'm going to do step-by-step reasoning. What are the options in front of me? I'm going to use my world model now, and I'm going to use a good world model, to understand what's likely to happen from each of these options
Speaker 2 and then reason about each of these from an ethical perspective. So you need a system which has
Speaker 2 a deep understanding of the world, has a good world model, it has a good understanding of people, it has a good understanding of ethics, and it has robust and very reliable reasoning.
Speaker 2 And then you set it up in such a way that it applies this reasoning and this understanding of ethics to analyze the different options which are in front of it, and then execute on what is an ethical way forwards.
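A minimal sketch of the deliberative loop described above, under heavy assumptions: the option generator, world model, and ethics scorer below are hypothetical stand-ins (each would be a learned model in practice), and the only point is the structure, i.e. generate options, predict consequences, evaluate them against stated principles, and keep a trace that humans can review.

```python
from dataclasses import dataclass, field

@dataclass
class Deliberation:
    """One audited step of System 2 style reasoning, kept for human review."""
    option: str
    predicted_outcome: str
    ethics_score: float

@dataclass
class DeliberativeAgent:
    principles: list[str]
    trace: list[Deliberation] = field(default_factory=list)

    def generate_options(self, situation: str) -> list[str]:
        # Stand-in for sampling candidate actions from a policy or language model.
        return [f"do nothing about {situation}",
                f"act cautiously on {situation}",
                f"act immediately on {situation}"]

    def predict_outcome(self, option: str) -> str:
        # Stand-in for a world model predicting the consequences of an action.
        return f"predicted consequences of: {option}"

    def score_ethics(self, outcome: str) -> float:
        # Stand-in for evaluating an outcome against the specified principles;
        # here a crude keyword heuristic purely for illustration.
        return 1.0 if "cautiously" in outcome else 0.5

    def decide(self, situation: str) -> str:
        candidates = []
        for option in self.generate_options(situation):
            outcome = self.predict_outcome(option)
            step = Deliberation(option, outcome, self.score_ethics(outcome))
            candidates.append(step)
            self.trace.append(step)          # auditable reasoning record
        return max(candidates, key=lambda d: d.ethics_score).option

agent = DeliberativeAgent(principles=["avoid harm", "be honest"])
print(agent.decide("a risky deployment"))
for step in agent.trace:                     # the part human reviewers would check
    print(step)
```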
Speaker 1 But I think when a lot of people think about the fundamental alignment problem, the worry is not that it's not going to have a world model necessary to understand its actions, or sorry, to understand the effects of its actions.
Speaker 1 I guess it's one worry, but not the main worry. The main worry is that the effects it cares about are not the ones we will care about.
Speaker 1 And so even if you improve its System 2 thinking and it does better planning, the fundamental problem remains: we have these really nuanced values about what we want.
Speaker 1 How do we communicate those values and make sure they're reinforced in the AI?
Speaker 2 It needs not just a good model of the world, but it needs a really good understanding of ethics. And we need to communicate to the system what ethics and values it should be following.
Speaker 1 And how do we do that in a way that's
Speaker 1 we can be confident that a human level or eventually a superhuman level model will preserve those values or learn them in the first place?
Speaker 2 Well, it should preserve them because if it's making all its decisions based on a good understanding of ethics and values and it's consistent in doing this, it shouldn't take actions which undermine that.
Speaker 2 That would be inconsistent. Right.
Speaker 1 So then how do we get to the point where it's learned them in the first place?
Speaker 2
Yeah, that's the challenge. Yeah.
We need to have systems. The way I think about it is this: to have a profoundly ethical AI system, it also has to be very, very capable.
Speaker 2 It needs a really good world model, a really good understanding of ethics, and it needs really good reasoning.
Speaker 2 Because if you don't have any of those things, how can you possibly be consistently, profoundly ethical? You can't.
Speaker 2 So, we actually need better reasoning, better understanding of the world, and better understanding of ethics in our systems.
Speaker 1
Right. So it seems to me the former two would just come along for the ride as these models get more powerful.
Yeah.
Speaker 2 So that's a nice property because it's actually a capabilities thing to some extent.
Speaker 1 But then if the third one is a bottleneck, or if the third one is a thing that doesn't come along with the AI itself,
Speaker 1 what is the actual technique to make sure that that happens?
Speaker 2 The third one, sorry, the third one.
Speaker 1 The ethical model. What do humans value?
Speaker 2 Well,
Speaker 2 we've got a couple of problems. First of all, we need to decide,
Speaker 2
we should train the system on ethics generally. I mean, there's a lot of lectures and papers and books and all sorts of things.
So it understands human ethics well, right?
Speaker 2 And we need to make sure it understands humans' ethics well, right? Because that's important, at least as well as a very good ethicist.
Speaker 2 And
Speaker 2 we then need to decide, okay:
Speaker 2 of this sort of general understanding of ethics, what do we want the system to actually value, and what sort of ethics do we want it to apply?
Speaker 2
Now that's not a technical problem. That's a problem for society and ethicists and so on to come up with.
Now
Speaker 2 you know I'm not sure there's such a thing as true or correct optimal ethics or something like that, but I'm pretty sure that it's possible to come up with a set of ethics which is
Speaker 2 much better than, you know, what the so-called doomers worry about in terms of the behavior of these AGI systems. And then what you do is you engineer the system to actually follow
Speaker 2 these things. So every time it makes a decision, it does an analysis using a
Speaker 2 deep understanding of the world and of ethics and very robust and precise reasoning to do an ethical analysis of what it's doing. And of course, we'd want lots of other things.
Speaker 2 We'd want people checking these processes of reasoning. We'd want people, you know, verifying that it's behaving itself in terms of
Speaker 2 how it reaches these conclusions.
Speaker 1 But I still feel like I don't understand how that fundamental problem of making sure it follows that ethic, because presumably
Speaker 1 it has Mao's Little Red Book, so it understands Maoist ethics, and it understands all these other ethics.
Speaker 1 How do we make sure the ethic that we say,
Speaker 1 this is the one that ethicists and society and so on have decided on today,
Speaker 1 is the one it ends up following and not the other ones it understands?
Speaker 2 Right. So you have to specify to the system, these are the ethical principles that you should follow.
Speaker 1 And how do we make sure it does that?
Speaker 2 We have to check it as it's doing it. We have to assure ourselves that it is consistently following these ethical principles.
Speaker 2 At least, I mean, I'm not sure there's such a thing as optimally, but at least as well as a group of human experts.
Speaker 1 Are you worried that if you do it the default way, which is just reinforcing it whenever it seems to be following them, you could be training deception as well?
Speaker 2 That is the straightforward way.
Speaker 2 Reinforcement has
Speaker 2 some dangerous aspects to it.
Speaker 2 I think it's actually more robust to
Speaker 2 check the process of reasoning and check its understanding of ethics. So
Speaker 2 to reassure ourselves that the system has a really good understanding of ethics, it should be grilled
Speaker 2 for
Speaker 2 some time to try to really pull apart its understanding and make sure it's very robust.
Speaker 2 And then also, if it's deployed, we should have people constantly looking for how, you know, the decisions it's making and the reasoning process that goes into those decisions
Speaker 2 to try to understand how that is correctly reasoning about these types of things.
Speaker 1 Speaking of which, do you at Google DeepMind have some sort of framework for this?
Speaker 2 This is not so much a Google DeepMind perspective on this. This is my take on how I think we need to do this kind of thing.
Speaker 2 There are many different views within, and there are different variants on these sorts of ideas as well.
Speaker 1 So then, do you personally think there needs to be some sort of framework for as you arrive at certain capabilities, these are the concrete safety benchmarks that you must have instated at this point or you should pause or slow down or something?
Speaker 2 I think that's a sensible thing to do. It's actually quite hard to do.
Speaker 2 And there are some people thinking about it. I know Anthropic has put out some things like that.
Speaker 2 We're thinking about similar things.
Speaker 2
Actually, you know, putting concrete things down is actually quite a hard thing to do. So I think it's an important problem and I certainly encourage people to work on it.
Yeah.
Speaker 1 Yeah.
Speaker 1 So, you know,
Speaker 1 it's interesting because you have these blog posts that you wrote when you started DeepMind,
Speaker 1 you know, back in 2008, where you talk about how
Speaker 1 the motivation was to accelerate safety. On net, what do you think the impact of DeepMind has been on safety versus capabilities?
Speaker 2 Oh,
Speaker 2 interesting.
Speaker 2 I don't know. It's hard to judge, actually.
Speaker 2 You know, back in the,
Speaker 2 I've been worried about AGI safety for a long time,
Speaker 2 well before DeepMind.
Speaker 2 But it was always really hard to hire people, actually, particularly in the early days, to work on AGI safety.
Speaker 2 I'm thinking back in 2013 or so, I think we had the first hire, and he only agreed to do it part-time because he didn't want to, you know, drop all the capabilities work, because of the impact it could have on his career and stuff.
Speaker 2 And this was someone who'd already previously been publishing in AGI safety. So,
Speaker 2 yeah, I don't know. It's hard to
Speaker 2 know what is the counterfactual if we weren't there doing it.
Speaker 2 I think
Speaker 2 we've been a group that's
Speaker 2 talked about this openly.
Speaker 2 I've talked about this on many occasions, the importance of it.
Speaker 2 We've been hiring people to work on these topics.
Speaker 2 I know a lot of other people in the area, and I've talked to them over many, many years. I've known Dario since 2005 or something or other.
Speaker 2 We've talked on and off about AGI safety and so on. So I don't know, the impact that DeepMind has had.
Speaker 2 I guess we were the first,
Speaker 2
I'd say the first AGI company. And as the first AGI company, we...
We, you know, we always had an AGI safety group.
Speaker 2 We've been publishing papers in this area for many years. I think that's lent some credibility to the area when people see, oh, here's an AGI...
Speaker 2
I mean, AGI was, you know, a fringe term not that long ago. And this person's doing AGI safety.
And what, they're at DeepMind?
Speaker 2 Oh, okay.
Speaker 2 I hope that sort of, you know, creates some space for people.
Speaker 1 And where do you think AI progress itself would have been without DeepMind? And this is not just a point that people make about DeepMind.
Speaker 1 I think this is a general point people make about OpenAI and Anthropic as well, that these people went into the business to accelerate safety and sort of the net effect might have been to accelerate capabilities far more.
Speaker 2 Right, right, right. I think we have accelerated capabilities, but again, the counterfactuals are quite, quite difficult.
Speaker 2 I mean, we didn't do ImageNet, for example, and ImageNet, I think, was very influential in
Speaker 2 attracting investment to the field.
Speaker 2 We did do AlphaGo, and that changed some people's minds.
Speaker 2 But, you know, the community is a lot bigger than just DeepMind. I mean,
Speaker 2 we have,
Speaker 2 well, not so much now, but
Speaker 2 because there are a number of other players with significant resources. But if you go back more than five years, we were able to do
Speaker 2 bigger projects with bigger teams and take on more ambitious things than a lot of the smaller academic groups, right?
Speaker 2 And so the sort of nature of the type of work we could do was a bit different.
Speaker 2 And that, I think,
Speaker 2
that affected the dynamics in some ways. But, you know, the community is much, much bigger than, say, deep minds.
So
Speaker 2 maybe we've sped things up a bit, but I think a lot of these things would have happened
Speaker 2 before too long anyway.
Speaker 2 I think often these good ideas are kind of in the air. And
Speaker 2 as a researcher, sometimes when you publish something or you're about to publish something, you see somebody else who's got a very similar idea coming out with some good results.
Speaker 2 I think often it's that the time is right for things. So, you know, I find it very hard to reason about the counterfactuals there.
Speaker 1 Speaking of the early years, it's really interesting that in 2009, you had a blog post where you say, my modal expectation of when we get human-level AI is 2025, expected value is 2028.
Speaker 1
And this is before deep learning. This is when nobody's talking about AI.
And it turns out, if the trends continue, this is not an unreasonable prediction.
Speaker 1 How did you, I mean, before all these trends came into effect, how did you have that accurate an estimate?
Speaker 2 Well, first I'd say it's not before deep learning. Deep learning was getting started around 2008.
Speaker 1 Oh, sorry, I meant to say before ImageNet.
Speaker 2 Before ImageNet, that was 2012, yeah.
Speaker 2 So, well, I first formed those beliefs in about 2001, after reading Ray Kurzweil's The Age of Spiritual Machines.
Speaker 2 And I came to the conclusion that there were two really important points in his book that I came to believe are true. One is that
Speaker 2 computational power would grow exponentially for at least a few decades, and that the quantity of data in the world would grow exponentially for a few decades.
Speaker 2 And when you have exponentially increasing quantities of computation and data, then the value of highly scalable algorithms gets higher and higher.
Speaker 2 So then there's a lot of incentive to make a more scalable algorithm to harness all this compute and data.
Speaker 2 And so I thought it would be very likely that we'll start to discover scalable algorithms to do this.
Speaker 2 And then there's a positive feedback between all these things, because if your algorithm gets better at harnessing compute and data, then the value of the data and the compute goes up, because they can be more effectively used.
Speaker 2 And so that drives more investment into these areas. If your compute performance goes up, then the value of the data goes up because you can utilize more data.
Speaker 2 So there are positive feedback loops between all these things. So that was the first thing.
Speaker 2 And then the second thing was just looking at the trends, if these scalable algorithms were to be discovered, then during the 2020s, it should be possible to start training models on significantly more data than a human would experience in a lifetime.
Speaker 2
And I figured that that would be a time where big things would start to happen. and that would eventually unlock AGI.
So that was my reasoning process. And I think we're now at that first part.
Speaker 2 I think we can start training models now where the scale of the data is beyond what a human can experience in a lifetime. So I think this is the first unlocking step.
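A rough back-of-the-envelope version of that comparison, with deliberately approximate numbers that are assumptions for illustration rather than figures from the conversation: a person exposed to tens of thousands of words a day for a few decades encounters a few hundred million words, while recent large models are reported to train on trillions of tokens, several orders of magnitude more.

```python
# Back-of-the-envelope only; every figure here is a rough assumption.
words_per_day = 30_000                 # assumed language heard and read per day
years_of_exposure = 30                 # assumed span of heavy language exposure
human_lifetime_words = words_per_day * 365 * years_of_exposure   # ~3.3e8 words

llm_training_tokens = 5e12             # assumed order of magnitude for a frontier model

print(f"human lifetime exposure ~ {human_lifetime_words:.1e} words")
print(f"model training data     ~ {llm_training_tokens:.0e} tokens")
print(f"ratio                   ~ {llm_training_tokens / human_lifetime_words:,.0f}x")
```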
Speaker 2 And so, yeah, I think there's a 50% chance we get there by 2028. Now, it's just a 50% chance.
Speaker 2 I mean, I'm sure what's going to happen is we're going to get to 2029, and someone's going to say, oh, Shane, you were wrong. It's like, come on, it's a 50% chance.
Speaker 2 So, yeah,
Speaker 2 I think
Speaker 2 it's entirely plausible. Yeah, it's a 50% chance it could happen by 2028.
Speaker 2 But I'm not going to be surprised if it doesn't happen by then. Maybe, you know,
Speaker 2 you often hit unexpected problems in research and sciences, and sometimes things take longer than you expect.
Speaker 1 If there was a problem that caused it, if we're in 2029 and it hasn't happened yet, looking back, what would be the most likely reason that would be the case?
Speaker 1 I don't know.
Speaker 2
I don't know. At the moment, it looks to me like all the problems are likely solvable with a number of years of research.
That's my current sense.
Speaker 1 And what does a time from here to 2028 look like if the 2028 ends up being the year?
Speaker 1 Is it just we have trillions of dollars of economic impact in the meantime? And
Speaker 1 the world gets crazy, or what happens?
Speaker 2 I think what you'll see is the existing models maturing.
Speaker 2 They'll be less delusional, much more factual. They'll be more up-to-date on what's currently going on when they answer questions.
Speaker 2 They'll become multimodal much more than they currently are.
Speaker 2 And this will just make them much more useful. So I think probably what we'll see more than anything is just loads of great applications
Speaker 2 for the coming years.
Speaker 2 I think that'll be the main thing.
Speaker 2 There can be some misuse cases as well. I'm sure somebody will come up with, you know,
Speaker 2 something to do with these models that is quite unhelpful.
Speaker 2 But my expectation for the coming years is mostly a positive one. We'll see all kinds of really impressive, really amazing applications
Speaker 2 for the coming years.
Speaker 1 And on the safety point, you mentioned these different research directions that are out there and that you are doing internally at DeepMind as well: interpretability, RLAIF, and so on.
Speaker 1 Which are you most optimistic about?
Speaker 2 I don't know. I don't want to pick favorites.
Speaker 2 It's hard picking favorites. I know the people working on all these areas.
Speaker 2 I think things of the sort of System 2 flavor.
Speaker 2 There's work we have going on that Geoffrey Irving leads
Speaker 2 called Deliberative Dialogue, which kind of has the System 2 flavor, where you have
Speaker 2 this sort of debate takes place about
Speaker 2 the actions that an agent could take or what's the correct answer to something or something like this. And people then can sort of review these debates and so on.
Speaker 2 And they use these sort of AI algorithms to help them judge the correct outcomes and so on. And so this is sort of meant to be a way in which to try to scale
Speaker 2 the alignment to sort of increasingly powerful systems.
Speaker 2 So things of that kind of flavor, I think, have quite a lot of promise in my opinion, but that's quite a broad category of research, and there are many different topics within that.
Speaker 1 That's interesting. So you've mentioned
Speaker 1
two areas in which LLMs need to improve. One is the episodic memory and the other is the System 2 thinking.
Are those two related or
Speaker 1 are they two separate drawbacks?
Speaker 2 I think they're fairly separate, but they can be somewhat related. So you can learn different ways of thinking through problems and actually learn about this rapidly using your episodic memory.
Speaker 2
So all these different systems and subsystems interact. So they're never completely separate.
But I think conceptually, you can probably think of them as quite separate things.
Speaker 2 I think delusions and factuality is another area
Speaker 2 that's going to be quite important
Speaker 2 and particularly important in lots of applications.
Speaker 2 If you want a model that writes creative poetry, then that's fine, because you want it to be very free to suggest all kinds of possibilities and so on.
Speaker 2 You're not really constrained by a specific reality. Whereas, if you want something that's in a
Speaker 2 particular application, normally you have to be quite concrete about what's currently going on and what is true and what is not true, and so on.
Speaker 2 And models are a little bit sort of freewheeling when it comes to truth and creativity at the moment. And that, I think, limits their applications in many ways.
Speaker 1 So, the final question is this: You've been in this field for over a decade, much longer than many others. And you've seen these different landmarks, ImageNet, transformers.
Speaker 1 What do you think the next landmark will look like?
Speaker 2 I think the next landmark that people
Speaker 2 will think back to and remember is
Speaker 2 going much more fully multimodal, I think.
Speaker 2 Because I think that'll open out the sort of understanding that you see in language models into a much larger space of possibilities.
Speaker 2 And when people think back, they'll think about, oh, those old-fashioned models, they just did like chat, they just did text. You know, it just felt like a very narrow thing.
Speaker 2 Whereas now they, you know, they understand
Speaker 2 when you talk to them, and they understand images, pictures, and video, and you can show them things or things like that, and they will have much more understanding of what's going on.
Speaker 2 And it'll feel like the system's kind of opened up into the world
Speaker 2 in a much more powerful way.
Speaker 1 I do want to ask a follow-up on that. So, ChatGPT just released their multimodal feature, and then you, at DeepMind, you had the Gato paper where
Speaker 1 you have this one model, you can feed in images, even actions, video games, whatever you can throw in there.
Speaker 1 And so far, it doesn't seem to have been, it hasn't percolated as much as even like ChatGPT initially from GPT-3 or something. What explains that?
Speaker 1 Is it just that people haven't learned to use multimodality? They're not powerful enough yet?
Speaker 2 I think it's early days.
Speaker 2 I think there's, you can see promise there, understanding images and things more and more. But I think it's, yeah, it's early days in this transition.
Speaker 2 It's when you start really digesting a lot of video and other things like that, that the systems will start having a much more grounded understanding of the world and all kinds of other aspects.
Speaker 2 And then when that works well, that will open up naturally lots and lots of new applications and all sorts of new possibilities because you're not confined to text chat anymore.
Speaker 1 The new avenues of training data as well, right?
Speaker 2 Yeah, new training data, and all kinds of new applications that aren't just purely textual anymore.
Speaker 2 And, you know, what are those applications?
Speaker 2 Well, probably a lot of them we can't even imagine at the moment because there are just so many, so many possibilities once you can start dealing with all sorts of different modalities in a consistent way.
Speaker 1
Awesome. Shane, I think that's an excellent place to leave it off.
Thank you so much for coming on the podcast.
Speaker 2 Thank you.
Speaker 1
Hey, everybody. I hope you enjoyed that episode.
As always, the most helpful thing you can do is to share the podcast.
Speaker 1 Send it to people you think might enjoy it, put it on Twitter, your group chats, et cetera.
Speaker 1 It just helps spread the word.
Speaker 1
I appreciate you listening. I'll see you next time.
Cheers.