Demis Hassabis on AI, Game Theory, Multimodality, and the Nature of Creativity | Possible

58m
How can AI help us understand and master deeply complex systems, from the game Go, which has 10 to the power of 170 possible positions, to proteins, which, on average, can fold in 10 to the power of 300 possible ways? This week, Reid and Aria are joined by Demis Hassabis. Demis is a British artificial intelligence researcher and the co-founder and CEO of the AI company DeepMind. Under his leadership, DeepMind developed AlphaGo, the first AI to defeat a human world champion in Go, and later created AlphaFold, which solved the 50-year-old protein folding problem. He's considered one of the most influential figures in AI. Demis, Reid, and Aria discuss game theory, medicine, multimodality, and the nature of innovation and creativity.

For more info on the podcast and transcripts of all the episodes, visit https://www.possible.fm/podcast/

Transcript


Hi, everyone.

This is Pivot from New York Magazine and the Vox Media Podcast Network.

I'm Kara Swisher, and today we're sharing an episode of Possible, hosted by one of our recent guests, Reid Hoffman.

Join Reid and his co-host, Aria Finger, as they sit down with the co-founder and CEO of Google DeepMind, Demis Hassabis, one of the most influential figures in AI.

They'll dive into game theory, medicine, multimodality, the nature of innovation, and how board games and video games shape our understanding of the future of AI.

Enjoy the episode, and remember, you can find it and subscribe to Possible wherever you listen to podcasts.

AI is going to affect the whole world.

It's going to affect every industry.

It's going to affect every country.

It's going to be the most transformative technology ever, in my opinion.

So if that's true and it's going to be like electricity or fire, then I think it's important that the whole world participates in its design.

I think it's important that it's not just a 100-square-mile patch of California.

I do actually think it's important that we get these other inputs, the broader inputs, not just geographically, but also from different subjects, philosophy, the social sciences, economics, not just the tech companies, not just the scientists, involved in deciding how this gets built and what it gets used for.

Hi, I'm Reed Hoffman.

And I'm Aria Finger.

We want to know how, together, we can use technology like AI to help us shape the best possible future.

With support from Stripe, we ask technologists, ambitious builders, and deep thinkers to help us sketch out the brightest version of the future.

And we learn what it'll take to get there.

This is Possible.

In the 13th century, Sir Galahad embarked on a treacherous journey in pursuit of the elusive Holy Grail.

The grail, known in Christian lore as the cup Christ used at the Last Supper, had disappeared from King Arthur's table.

The knights of the round table swore to find it.

After many trials, Galahad's pure heart allowed him the unique ability to look into the grail and observe divine mysteries that could not be described by the human tongue.

In 2020, a team of researchers at DeepMind successfully created a model called AlphaFold that could predict how proteins will fold.

This model helped answer one of the holy grail questions of biology.

How does a long line of amino acids configure itself into a 3D structure that becomes the building block of life itself?

In October 2024, three scientists involved with AlphaFold won a Nobel Prize for these efforts.

This is just one of the striking achievements spearheaded by our guest today.

Demis Hassabis is a British artificial intelligence researcher and the co-founder and CEO of the AI company DeepMind.

Under his leadership, DeepMind developed AlphaGo, the first AI to defeat a human world champion in Go, and later created AlphaFold, which solved the 50-year-old protein folding problem.

He is considered one of the most influential figures in AI.

Reed and I sat down for an interview with Demis, in which we talked about everything from game theory to medicine to multimodality and the nature of innovation and creativity.

Here's our conversation with Demis Hassabis.

Demis, welcome to Possible.

It was awesome dining with you at Queens'.

It was kind of a special moment in all kinds of ways.

And, you know, I think I'm going to start with a question that came from your lecture at the Babbage Lecture Theatre and also from the fireside chat that you did with Mohamed El-Erian, which is: share with us the moment where you went from thinking chess is the thing that I have, you know, spent my childhood doing, to what I want to do is start thinking about thinking.

I want to accelerate the process of thinking and that computers are a way to do that.

And how did you arrive at that?

What age were you?

What was that turn into metacognition?

Well, yeah, well, first of all, thanks for having me on the podcast. Chess for me is where it all started, actually, in gaming. I started playing chess when I was four, very seriously, all through my childhood, playing for most of the England junior teams, captaining a lot of the teams. And for a long while, my main aim was to become a professional chess player, you know, a grandmaster, maybe one day possibly a world champion. And that was my whole childhood, really. Every spare moment not at school, I was playing, going around the world playing chess against adults in international tournaments.

And then around 11 years old, I sort of had an epiphany, really, that although I love chess, and I still love chess today, is it really something that one should spend your entire life on?

Is it the best use of my mind?

So that was one thing that was troubling me a little bit.

But then the other thing was, as we were going to training camps with the England chess team, you know, we started to use early chess computers to try and improve our chess.

And I remember thinking that, you know, of course we were supposed to be focusing on improving the chess openings and chess theory and tactics.

But actually I was more fascinated by the fact that someone had programmed this inanimate lump of plastic to play very good chess against me.

And I was fascinated by how that was done.

And I really wanted to understand that and then eventually try and make my own chess programs.

I mean, it's so funny.

I was saying to Reid before this, my seven-year-old's school just won the New York State chess championship.

So they have a long way to go before they get to you.

But he takes it on faith, like, oh, yeah, Mom, I'm just going to go play ChessKid on the computer.

Like, I'll go play against the computer a few games, which of course was sort of a revelation decades ago.

And I remember, you know, when I was in middle school, it was obviously the Deep Blue versus Garry Kasparov match.

And this was like a man versus machine moment.

And one thing that you've gestured at about this moment is that it illustrated, like in this case, brute force based on grandmaster data versus, like, a self-learning system.

Can you say more about that dichotomy?

Yeah, well, look, first of all, I mean, it's great.

Your son's playing chess.

And I think it's fantastic.

I'm a big advocate for teaching chess in schools as a part of the curriculum.

I think it's fantastic training for the mind, just like doing maths or programming would be.

And it's certainly affected the way I approach problems and problem-solve and visualize solutions and plan; you know, it teaches you all these amazing meta-skills, dealing with pressure.

So you sort of learn all of that as a young kid, which is fantastic for anything else you're going to do.

And, you know, as far as Deep Blue goes, you're right.

Most of these early chess programs, and Deep Blue became the pinnacle of that, were these types of expert systems, which at the time were the favored way of approaching AI, where actually it's the programmers that solve the problem, in this case playing chess, and then they encapsulate that solution in a set of heuristics and rules, which guides a kind of brute-force search towards, you know, in this case, making a good chess move.

And I always had this feeling: although I was fascinated that these early chess programs could do that, I was also slightly disappointed by them.

And actually, by the time it got to Deep Blue, I was already studying at Cambridge in my undergrad.

I was actually more impressed with Kasparov's mind, because I'd already started studying neuroscience, than I was with the machine, because it was this brute of a machine.

All it could do was play chess.

And then Kasparov could play chess at roughly the same level, but also could do all the other amazing things that humans can do.

And so I thought, isn't that, you know, doesn't that speak to the wonderfulness of the human mind?

And it also, more importantly, means something very fundamental was missing from Deep Blue and these expert-system approaches to AI, right?

Very clearly, because Deep Blue, even though it was a pinnacle of AI at the time, did not seem intelligent.

And what was missing was its ability to learn, to learn new things.

So, for example, it was crazy that Deep Blue could play chess to world champion level, but it couldn't even play tic-tac-toe, right?

You'd have to reprogram it.

Nothing in the system would allow it to play tic-tac-toe.

So that's odd, right?

That's very different to a human grandmaster, because they could obviously play a simpler game trivially.

And then also it was not general, right, in the way that the human mind is.

And I think those are the hallmarks.

That's what I took away from that match: those are the hallmarks of intelligence.

And they were needed if we wanted to crack AI.

And go a little bit into the deep learning, which is obviously part of the reason why DeepMind was named for it. Part of what was seen to be the completely contrarian hypothesis that you guys played out, with self-play and a kind of learning system, was that this learning approach was the right way to generate these significant systems.

So say a little bit about having the hypothesis, what the trek through the desert looked like.

And then what finding the Nile ended up with.

Yes.

Well, look, of course, we started DeepMind in 2010 before anyone was working on this in industry and there was barely any work on it in academia.

And we partially named the company DeepMind, the deep part because of deep learning.

It was also a nod to Deep Thought in, you know, Hitchhiker's Guide to the Galaxy, and Deep Blue, and other AI things.

But it was mostly around the idea that we were betting on these learning techniques.

Deep learning and hierarchical neural networks had, you know, just sort of been invented, right, in seminal work by Geoff Hinton and colleagues in 2006.

So it's very, very new.

And reinforcement learning, which has always been a speciality of DeepMind, and the idea of learning from trial and error, learning from your experience, right? And then making plans and acting in the world.

And we combined those two things, really.

We sort of pioneered doing that.

And we called it deep reinforcement learning, combining these two approaches.

And deep learning to kind of build a model of the environment or what you were doing, in this case, a game.

And then the reinforcement learning to do the planning and the acting and actually accomplish it, be able to build agent systems that could accomplish goals.

In the case of games, that's maximizing the score, winning the game.

And we felt that that was actually the entirety of what's needed for intelligence.
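To make that combination concrete, here is a minimal, hypothetical sketch of the deep reinforcement learning pattern being described: a neural network models the value of actions, and trial-and-error experience updates it. This is an illustrative toy in PyTorch, not DeepMind's actual code; the environment and every name in it are invented.

```python
# A toy sketch of the "deep RL" pattern: network as value model, epsilon-greedy
# acting (trial and error), and a TD-style update toward observed rewards.
import random
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 4, 2, 0.99

# Deep learning part: a small network estimates the value of each action.
q_net = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(), nn.Linear(32, N_ACTIONS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def toy_env_step(state, action):
    """Hypothetical environment: random next state, reward for action 1."""
    return torch.randn(STATE_DIM), (1.0 if action == 1 else 0.0)

state = torch.randn(STATE_DIM)
for step in range(1000):
    # Act: mostly greedy on current value estimates, sometimes explore.
    if random.random() < 0.1:
        action = random.randrange(N_ACTIONS)
    else:
        action = q_net(state).argmax().item()
    next_state, reward = toy_env_step(state, action)

    # Learn: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
    target = reward + GAMMA * q_net(next_state).max().detach()
    loss = (q_net(state)[action] - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    state = next_state
```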

And the reason that we sort of were pretty confident about that is actually from using the brain as an example, right?

Basically, those are the two major components of how the brain works.

You know, the brain is a neural network; it's a pattern-matching and structure-finding system.

But then it also has reinforcement learning and this idea of planning and learning from trial and error and trying to maximize reward, which, in the human brain and in animal brains, mammal brains, the dopamine system implements, via a form of reinforcement learning called TD learning.
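For readers who want TD learning pinned down, here is a minimal sketch of TD(0) value learning on a hypothetical toy chain of states. The TD error `delta` below is the quantity often compared to the dopamine signal; everything else here is an invented illustration, not a claim about DeepMind's systems.

```python
# TD(0): nudge each state's value estimate toward reward + discounted
# value of the next state; delta measures how surprising the outcome was.
n_states = 5
values = [0.0] * n_states      # V(s), the learned estimates
alpha, gamma = 0.1, 0.9        # learning rate, discount factor

for episode in range(1000):
    s = 0
    while s < n_states - 1:
        s_next = s + 1                                  # toy dynamics: move right
        reward = 1.0 if s_next == n_states - 1 else 0.0  # reward at the end
        delta = reward + gamma * values[s_next] - values[s]  # TD error
        values[s] += alpha * delta                      # update toward reality
        s = s_next

print(values)  # estimates rise toward the rewarding terminal state
```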

So that gave us confidence that if we pushed hard enough in this direction, even though no one was really doing that, that eventually this should work, right?

Because we have the existence proof of the human mind.

And of course, that's why I also studied neuroscience.

Because when you're in the desert, like you say, you need any source of water or any evidence that you might get out of the desert.

You know, even a mirage in the distance is a useful thing to understand, in terms of giving you some direction when you're in the midst of that desert.

And of course, AI was itself in the midst of that, because, you know, several times the field had failed.

The expert system approach basically had reached the ceiling.

I could easily hog the entire interview, so I'm trying not to.

So one of the things that the learning system obviously ended up creating was solving what was previously considered an insoluble problem.

There were even people who thought that computers, that classical computational techniques, couldn't solve Go, and it did.

But not only did it solve Go, but in the classic Move 37, it demonstrated originality, creativity that was beyond, you know, the thousands of years of Go play and books and the hundreds of years of very serious play.

What was that moment of move 37 like for understanding where AI is?

And what do you think the next move 37 is?

Well, look, the reason Go was considered to be, and ended up being, so much harder than chess, so that it took another 20 years, even for us with AlphaGo, is that all the approaches that had been taken with chess, these expert-system approaches, had failed with Go, right?

Basically, they couldn't even beat a professional, let alone a world champion.

And there were two main reasons. One is the complexity of Go is so enormous.

You know, one way to measure that is there are 10 to the power of 170 possible positions, right?

Far more than atoms in the universe.

There's no way you can brute force a solution to Go, right?

It's impossible.
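That scale claim is easy to sanity-check against the commonly cited ballpark of roughly 10 to the power of 80 atoms in the observable universe (the atom count is a standard estimate, not a figure from this conversation):

```python
# Legal Go positions versus a common estimate of atoms in the universe.
go_positions = 10 ** 170
atoms_estimate = 10 ** 80                 # widely cited ballpark figure
print(go_positions // atoms_estimate)     # 10**90: positions outnumber atoms
                                          # by a factor of a universe of universes
```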

But even harder than that is that it's such a beautiful, esoteric, elegant game.

You know, it's sort of considered art, an art form in Asia, really, right?

And it's because it's both beautiful aesthetically, but also it's all about patterns rather than sort of brute calculation, which chess is more about.

And so even the best players in the world can't really describe to you very clearly what heuristics they're using.

They just kind of intuitively feel the right moves, right?

Sometimes you'll ask them: this move, why did you play this move?

Well, it felt right, right?

And then it turns out their intuition, if they're a brilliant player, their intuition is fantastic.

And it's an amazingly beautiful and effective move.

But that's very difficult then to encapsulate in a set of heuristics and rules to direct how a machine should play Go.

And so that's why all of these kind of Deep Blue methods didn't work.

Now, we got around that by having the system learn for itself what good patterns are, what good moves are, what good motifs and approaches are, and what kind of valuable, high-probability-of-winning positions are.

So it kind of learned that for itself through experience, through seeing millions of games and playing millions of games against itself.

So that's how we got AlphaGo to be, you know, better than world champion level.

But the additional exciting thing about that is that it means those kinds of systems can actually go beyond what we as the programmers or the system designers know how to do, right?

No expert system can do that, because of course it's strictly limited by what we already know and can describe to the machine.

But these systems can learn for themselves.

And that's what resulted in Move 37, in game two of the famous world championship match, the challenge match we had against Lee Sedol in Seoul in 2016.

And that was a truly creative move.

You know, Go has been played for thousands of years.

It's the oldest game humans have invented.

And it's the most complex game, and it's been played professionally for hundreds of years in places like Japan. And even still, despite all of that exploration by brilliant human players, this Move 37 was something never seen before. And actually, worse than that, it was thought to be a terrible strategy. In fact, if you go and watch the documentary about AlphaGo, which I recommend, it's on YouTube now, you'll see the professional commentators nearly fell off their chairs when they saw Move 37, because they thought it was a mistake.

They thought the computer operator, Aja Huang, had misclicked on the computer, because it was so unthinkable that someone would play there.

And then, of course, in the end, it turned out 100 moves later, that move 37, the stone, the piece that was put down on the board, was in exactly the right place to be decisive for the whole game.

So now it's studied as a great classic of the history of Go, that game and that move.

And then even more exciting than that is that's exactly what we hoped these systems would do, because the whole point, my whole motivation, my whole life of working on AI, was to use AI to accelerate scientific discovery.

And it's those kinds of new innovations, albeit in a game, is what we were looking for from our systems.

And, you know, that I think is an awesome rendition of kind of why it is these learning systems are even now doing original discovery.

What do you think the next, you know, Move 37 might be, for kind of opening our minds to the way that AI can add a whole lot to the quality of human thought, human existence, human science?

Yeah.

Well, look, I think there'll be a lot of Move 37s in almost every area of human endeavor.

Of course, the thing I've been focusing on since then has mostly been how we can apply those types of AI techniques, those learning techniques, those general learning techniques, to science.

Big areas of science, I call them root node problems.

So problems where if you think of the tree of all knowledge that's out there in the universe, you know, can you unlock some root nodes that unlock entire branches or new avenues of discovery that people can build on afterwards?

Right.

And for us, protein folding and AlphaFold was one of those.

It was always, you know, top of my list.

I have a kind of mental list of all these types of problems that I've come across throughout my life, just being generally interested in all areas of science, and I sort of think through which ones would both be hugely impactful and also suitable for these types of techniques. And I think we're going to see a kind of new golden era of these types of new strategies, new ideas, in very important areas of human endeavor.

Now, one thing I would say, though, is that we haven't fully cracked creativity yet, right?

So I don't want to claim that.

I think there are, you know, I often describe it as, three levels of creativity. And I think AI is capable of the first two.

So the first one would be interpolation. You give an AI system, you know, a million pictures of cats, and you say, create me the prototypical cat.

And it will just, like, average all the million cat pictures that it's seen.

And that prototypical one won't be in the training set.

So it will be a unique cat, but, you know, that's not very interesting from a creative point of view, right?

It's just an averaging.

But the second thing would be what I call extrapolation.

So that's more like AlphaGo, where you've played 10 million games of Go, you've looked at, you know, a few million human games of Go, but then you extrapolate from what's known to a new strategy never seen before, like Move 37.

Okay, so that's very valuable already.

That is, you know, I think that is true creativity.

But then there's a third level, which I call kind of invention, or out-of-the-box thinking, which is: not only can you come up with a Move 37, but could you have invented Go, right?

Or another measure I like to use is, if we went back to the time of Einstein in the early 1900s, could an AI system actually come up with general relativity with the same information that Einstein had at the time?

And clearly today, the answer to those things is no, right?

It can't invent a game as great as Go, and it wouldn't be able to invent general relativity just from the information that Einstein had at the time.

And so there's still something missing from our systems to get true out-of-the-box thinking.

But I think it will come, but we just don't have it yet.

I think so many people outside of the AI realm would be surprised that it all starts with gaming, but that's sort of gospel for what we're doing.

It's like, that's how we created these systems.

And so switching gears from board games to video games, can you give us just, like, the elevator-pitch explanation for what exactly makes an AI that can play StarCraft II, like AlphaStar, so much more advanced and fascinating than one that can play chess or Go?

Yeah, with AlphaGo, we sort of cracked the pinnacle of board games, right?

So Go was always considered the Mount Everest, if you like, of games AI for board games.

But there are even more complex games by some measures, if you take on board the most complex strategy games that you can play on computers. And StarCraft II is acknowledged to be the sort of classic of the genre of real-time strategy games. And it's a very complex game: you've got to build up your base and your units and other things, so every game is different, right? And the board, the game, is very fluid, and you've got to move many units around in real time. And the way we cracked that was to add in this additional level of a league of agents competing against each other, all seeded with slightly different initial strategies.

And then you kind of get a sort of survival of the fittest.

You have a tournament between them all.

So it's a kind of multi-agent setup now.

And the strategies that win out in that tournament go to the next, you know, the next epoch.

And then you generate some other new strategies around that.

And you keep doing that for many generations.

You're kind of keeping this idea of self-play that we had in AlphaGo, but you're adding in this multi-agent competitive, almost evolutionary dynamic in there.

And then eventually you get an agent, or a series or set of agents, that are kind of the Nash distribution of agents.

So no other strategy dominates them, but they dominate the most number of other strategies.

And then you have this kind of Nash equilibrium.

And then you pick out the top agents from that.

And that succeeded very well with this type of very open-ended kind of gameplay.
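As an illustration of that league idea, here is a toy sketch, not AlphaStar's actual training code: the agents, the match logic, and the mutation step are all hypothetical stand-ins, but the loop shows the tournament-then-next-epoch dynamic being described.

```python
# Toy "league": round-robin tournament, keep the dominating strategies,
# seed the next generation with perturbed variants of the survivors.
import random

class Agent:
    def __init__(self, strategy):
        self.strategy = strategy  # toy stand-in for network weights

def play_match(a, b):
    """Hypothetical match: higher strategy value wins, with some noise."""
    return a if a.strategy + random.gauss(0, 0.5) > b.strategy else b

def mutate(agent):
    return Agent(agent.strategy + random.gauss(0, 0.3))

league = [Agent(random.random()) for _ in range(8)]

for generation in range(20):
    wins = {id(a): 0 for a in league}
    for i, a in enumerate(league):          # every agent plays every other
        for b in league[i + 1:]:
            wins[id(play_match(a, b))] += 1
    league.sort(key=lambda a: wins[id(a)], reverse=True)
    survivors = league[:4]                  # strategies that dominate survive
    league = survivors + [mutate(a) for a in survivors]  # next epoch

print(max(a.strategy for a in league))      # strategies drift upward over epochs
```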

So it's quite different from what you get with chess or Go, where the rules are very prescribed and the pieces that you get are always the same.

And it's sort of a very ordered game.

Something like StarCraft is much more chaotic.

So it's sort of interesting to have to deal with that.

It has hidden information too.

You can't see the whole map at once.

You have to explore it.

So it's not a perfect information game, which is another thing we wanted our systems to be able to cope with, is partial information situations, which is actually more like the real world, right?

Very rarely in the real world do you actually have full information about everything.

Usually you only have partial information and then you have to infer everything else in order to come up with the right strategies.

And part of the game side of this is, I presume you've heard that there's this kind of theory of Homo Ludens.

Yes.

That we're game players.

Is that informing the kind of thinking about how games is both strategic, but also kind of framing for, like, science acceleration, framing for kind of the serendipity of innovation? In addition to the kind of fitness function, the kind of evolution of self-play, the ability to play, scale, compute, are there other deeper elements to the game-playing nature that allow this thinking about thinking?

Well, look, I'm glad you brought up Homo Ludens, and it's a wonderful book. It basically argues that game playing is actually a fundamental part of being human, right, in many ways; that's the, you know, the act of play.

What could be more human than that, right?

And then, of course, it leads into creativity, fun, you know, all of these things kind of get built on top of that.

And so I've always loved games as a way to practice and train your own mind in situations that you might only ever get a handful of times in real life.

But they're usually very critical: you know, what company to start, what deal to make, things like that.

So I think games is a way to practice those scenarios.

And if you take games seriously, then you can actually simulate a lot of the pressures one would have in decision-making situations.

And going back to earlier, that's why I think chess is such a great training ground for kids, because it does teach them about all of these situations.

And so, of course, it's the same for AI systems too.

It was the perfect proving ground for our early AI system ideas, partly because games were invented to be challenging and fun for humans to play.

And of course, there's different levels of gameplay.

So we could start with very simple games, like Atari games, and then go all the way up to the most complex computer games, like StarCraft, right?

And continue to sort of challenge our system.

So we were in the sweet spot of the S curve.

So it's not too easy, such that it's trivial, or too hard, such that you can't even see if you're making any progress.

You want to be in that maximum sort of part of the S curve, where you're making almost exponential progress.

And we could keep picking harder and harder games as our systems improved.

And then the other nice feature about games is because they're some kind of microcosm of the real world, they've usually been boiled down to very clear objective functions, right?

So winning the game or maximizing the score is usually the objective of a game.

And that's very easy to specify to a reinforcement learning system or an agent-based system.

So it's perfect for hill climbing against, right, and measuring Elo scores, ratings, and exactly where you are.
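For reference, the standard Elo expected-score and update formulas he alludes to look like this; this is the well-known rating formula, sketched as a small Python helper rather than anything DeepMind-specific.

```python
# Elo: each rating moves toward the game result in proportion to how
# surprising that result was under the current ratings.
def elo_update(r_a, r_b, score_a, k=32):
    """score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))  # predicted score for A
    return r_a + k * (score_a - expected_a)

# Example: a 1600-rated agent beats a 2000-rated one and gains ~29 points.
print(elo_update(1600, 2000, 1.0))  # ~1629.1
```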

And then finally, of course, you can calibrate yourselves against the best human players.

So you can sort of calibrate what your agents are doing in their own tournaments.

In the end, even with the StarCraft agent, we had to eventually challenge a professional grandmaster at StarCraft to make sure that our systems hadn't overfitted somehow to their own tournament strategies, right?

It actually needed to be grounded: oh, it can actually beat a genuine human grandmaster StarCraft player.

The final thing is, of course, you can generate as much synthetic data as you want with games too, which is, you know, coming into vogue right now, with the debates about data limitations with large language models: how many tokens are left in the world, and has it read everything in the world?

Obviously, for things like games, you can actually just play the system against itself and generate lots more data from the right distribution.

Can you double-click on that for a moment?

Like you said, it is in vogue to talk about, are we running out of data?

Do we need synthetic data?

Like, where do you stand on that issue?

Well, I've always been a huge proponent of simulations, and of simulations in AI.

And, you know, it's also interesting to think about what the real world is, right, in terms of a computational system.

And so I've always been involved with trying to build very realistic simulations of things.

And now, of course, that interacts with AI, because you can have an AI that learns a simulator of some real-world system just by observing that system, or all the data from that system.

So I think the current debate is to do with the fact that these large foundation models now pretty much use the whole internet, right?

And so then once you've tried to learn from those, what's left, right?

That's all the language that's out there.

Of course, there's other modalities like video and audio.

I don't think we've exhausted all of that kind of multimodal tokens, but even that will reach some limit.

So then the question comes of like, can you generate synthetic data?

And I think that's why you're seeing quite a lot of progress with maths and coding, because in those domains, it's quite easy to generate synthetic data.

The problem with synthetic data is, are you creating data that is from the right distribution, the actual distribution, right?

Does it mimic the kind of real distribution?

And also, are you generating data that's correct?

Right.

And of course, for things like maths, for coding, and for things like gaming, you can actually test the final data and verify if it's correct, right? Before you feed it in as input into the training data for a new system.

So it's very amenable in certain areas.

In fact, it turns out it's the more abstract areas of human thinking where you can verify and prove that the data is correct. And so therefore, that unlocks the ability to create a lot of synthetic data.
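Here is a minimal sketch of that generate-and-verify loop, using arithmetic as a stand-in domain where correctness can be checked mechanically. The generator below is a hypothetical placeholder for a model proposing answers; the point is that only verified pairs enter the synthetic training set.

```python
# Generate-and-verify: propose candidate data, keep only what checks out.
import random

def generate_candidate():
    """Hypothetical generator: a problem plus a (sometimes wrong) answer."""
    a, b = random.randint(1, 99), random.randint(1, 99)
    proposed = a + b + random.choice([0, 0, 0, 1])  # occasionally off by one
    return f"{a}+{b}", proposed

def verify(problem, proposed):
    """Verifier: in maths or coding, the answer can be checked exactly."""
    a, b = map(int, problem.split("+"))
    return a + b == proposed

training_data = []
while len(training_data) < 100:
    problem, answer = generate_candidate()
    if verify(problem, answer):               # discard unverified candidates
        training_data.append((problem, answer))
```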


So one of the things, you know, in addition to the frequent discussion around data and how we get more of it: one of the questions is, in order to do AI, is it important to actually have it embedded in the world?

Yeah.

Well, interestingly, if we had talked about this five years ago, or certainly 10 years ago, I would have said that some real-world experience (usually when we talk about embodied intelligence, we mean robotics, but it could also be a very accurate simulator, like some kind of ultra-realistic game environment) would be needed to fully understand, say, the physics of the world around you, right?

And the physical context around you.

And there's actually a whole branch of neuroscience that is predicated on this.

It's called action in perception.

So this is the idea that one can't actually fully perceive the world unless you can also act in it.

And the kind of argument goes like: how can you really understand the concept of the weight of something, for example, unless you can pick things up and sort of compare them with each other?

And then you get this sort of idea of weight.

Like, can you actually, you know, can you really get that notion just by looking at things?

It seems hard, right?

Certainly for humans.

Like, I think you need to act in the world.

So this is the idea that acting in the world is part of your learning.

You're kind of like an active learner.

And in fact, reinforcement learning is like that, because the decisions you make give you new experiences, but those experiences depend on the actions you took.

But also, those are the experiences that you'll then subsequently learn from.

So, in a sense, reinforcement learning systems are involved in their own learning process, right?

Because they're active learners.

And I think you can make a good argument that that's also required in the physical world.

Now, it turns out I'm not sure I believe that anymore, because of our systems, especially our video models. If you've seen Veo 2, you know, our latest video model: completely state of the art, which we released late last year.

And it kind of shocked even me, even though we're building this thing, that basically by watching a lot of YouTube videos, it can figure out, you know, the physics of the world.

There's a sort of funny Turing test, you know, a Turing test in inverted commas, of video models, which is: can you chop a tomato?

Can you show a video of a knife chopping a tomato with the fingers and everything in the right place?

And the tomato doesn't, you know, magically spring back together, or the knife go through the tomato without cutting it, et cetera.

And Veo can do it.

And if you think through the complexity of the physics you've got to understand, you know, what you've got to keep consistent and so on, it's pretty amazing.

Like, it's hard to argue that it doesn't understand something about physics and the physics of the world.

And it's done it without acting in the world, and certainly not acting as a robot in the world.

So now, it's not clear to me that there is a limit with just sort of passive perception.

Now, the interesting thing is that I think this has huge consequences for robotics, as an embodied intelligence application, because of the types of models we've built, Gemini and also now Veo, and we'll be combining those things together at some point in the future. We've always built Gemini, our foundation model, to be multimodal from the beginning.

And the reason we did that, and we still lead on all the multimodal benchmarks, is twofold.

One is we have a vision for this idea of a universal digital assistant, an assistant that goes around with you on your digital devices, but also in the real world, maybe on your phone or a glasses device, and actually helps you in the real world: like recommends things to you, helps you navigate around, helps with physical things in the world like cooking, stuff like that.

And for that to work, you obviously need to understand the context that you're in.

It's not just the language I'm typing into a chat bot.

You actually have to understand the 3D world I'm living in, right?

I think to be a really good assistant, you need to do that.

But the second thing is, of course, is exactly what you need for robotics as well.

And we released our first big Gemini Robotics work, which has caused a bit of a stir.

And that's the beginning of showcasing what we can do with these multimodal models that do understand the physics of the world, with a little bit of robotics fine-tuning on top, to do with the actions, the motor actions, and the planning a robot needs to do.

And it looks like it's going to work.

So actually, now I think these general models are actually going to transfer to the embodied robotic setting without too much extra special-casing or extra data or extra effort, which is probably not what most people, even the top roboticists, would have predicted five years ago.

I mean, that's wild.

And, you know, thinking about benchmarks and what we're going to need these digital assistants to do: when we look under the hood of these big AI models, there's, well, some people would say it's attention.

So the trade-off is thinking time versus output quality.

We need them to be fast, but of course we need them to be accurate.

And so talk about like, what is that trade-off and how is that going in the world right now?

Well, look, of course, we sort of pioneered all that area of thinking systems, because that's what our original gaming systems all did, right?

Go, AlphaGo, but actually most famously AlphaZero, which was our follow-up system that could play any two-player game.

And there you always have to think about your time budget, your compute budget, within which you've got to actually do the planning part, right?

So the model you can pre-train, just like we do with our foundation models today.

So you can play millions of games offline and then you have your model of chess or your model of Go or whatever it is.

But at test time, at runtime, you've only got one minute, you know, to think about your move, right?

One minute times however many computers you've got running.

So that's still a limited compute budget.

So what's very interesting today is there's this trade-off between: do you use a more expensive, larger base model, a foundation model? Right, so in our case, you know, we have different sizes, with names like Gemini Flash, or Pro, or even bigger, which is Ultra. But those models are more costly to run.

So they take longer to run.

But they're more accurate and they're more capable.

So you can run a bigger model with a shorter number of planning steps, or you can run a very efficient smaller model that's slightly less powerful, but you can run it for many more steps, right?

And actually, currently, what we're finding is it's sort of roughly about equal.

But of course, what we want to find is the Pareto frontier of that, right?

Like, actually, the exact right trade-off of the size of the model, and the expense of running that model, versus the amount of thinking time and thinking steps that you're able to do per unit of compute time.

And I think that's actually fairly cutting-edge research right now, that I think all the leading labs are probably experimenting on.

And I think there's not a clear answer to that yet.
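A toy way to picture that trade-off: under a fixed compute budget, a bigger model affords fewer thinking steps and a smaller model affords more. All numbers and model names below are invented for illustration; they are not Gemini's actual costs or quality figures.

```python
# Fixed budget: bigger model = fewer planning steps, smaller model = more.
BUDGET = 1_000_000  # abstract compute units per move

models = {
    "flash-like (small)": {"cost_per_step": 1_000,   "quality_per_step": 1.0},
    "pro-like (medium)":  {"cost_per_step": 10_000,  "quality_per_step": 3.3},
    "ultra-like (large)": {"cost_per_step": 100_000, "quality_per_step": 10.0},
}

for name, m in models.items():
    steps = BUDGET // m["cost_per_step"]      # thinking steps the budget allows
    # Crude score model: quality grows with steps, with diminishing returns.
    quality = m["quality_per_step"] * steps ** 0.5
    print(f"{name}: {steps} steps, quality ~{quality:.0f}")
# Under these made-up numbers the three come out roughly equal (~32 each),
# echoing the "roughly about equal" finding described above.
```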

You know, all the major labs, DeepMind, others, are all working intensely on coding assistants.

And there's, you know, a number of reasons.

Everything from, you know, it's one of the things that accelerates productivity across the whole front.

It has a kind of good fitness function.

It's also, of course, one of the ways that, you know, everyone is going to be enhancing productivity is having a software kind of co-pilot agent for helping.

There's just a ton of reasons.

Now, one of the things that gets interesting here is, as you're building these, you know, obviously there's a tendency to start with these computer languages that have been designed for humans.

What would be computer languages that would be designed for AIs or an agentic world or designed for this hybrid process of a human plus an AI?

Is that a good world to start looking at those kind of computer languages?

How would it change our theory of computation, linguistics, et cetera?

I think we are entering a new era in coding, which is going to be very interesting.

And, you know, as you say, all the leading labs are pushing on this frontier for many reasons.

It's easy to create synthetic data.

So that's another reason that everyone's pushing on this vector.

And I think we're going to move into a world where, you know, sometimes it's called vibe coding, you're basically coding with natural language, really.

Right.

And we've seen this before with computers, right?

I remember when I first started programming, you know, in the 80s, we were doing assembler.

And then, of course, you know, that seems crazy now.

Like, why would you do machine code?

You just, you know, you start with C and then you get Python and so on.

And really, one could see it as the natural evolution of going higher and higher up the abstraction stack of programming languages, and leaving more and more of the lower-level implementation details to the compiler, in a sense.

And now, you know, one could just view this as the natural sort of final step: well, we just use natural language.

And then the whole thing, you know, everything, is a super-high-level programming language.

And I think eventually that's maybe what we'll get to.

And the exciting thing there is that, of course, it will make coding accessible to a whole new range of people, creatives, right?

You know, designers, game designers, app writers, who normally would not have been able to implement their ideas without the help of, you know, teams of programmers.

So that's going to be pretty exciting, I think, from a creativity point of view.

But it may also be very good, certainly in the next few years, for coders as well. Because I think, and I think this in general with these AI tools, that the people who are going to get the most benefit out of them initially will be the experts in that area who also know how to use these tools in precisely the right way, you know, whether that's prompting or interfacing with your existing code base.

You know, there's going to be this sort of interim period where I think the current experts who embrace these new tools, whether that's filmmakers, game designers, or coders, are going to be, like, superhuman in terms of what they're able to do.

And I see that with some, you know, film director and film designer friends of mine, who are able to, you know, create pitch decks, for example, for new film ideas in a day, on their own.

You know, and it's a very high-quality pitch deck that they can use to pitch for a $10 million budget.

And normally they would have had to spend a few tens of thousands of dollars just to get to that pitch deck, which is a huge risk for them.

So, you know, I think there's going to be a whole new, incredible set of opportunities.

And then there's the question, if you think about the creative arts, of whether there'll be new ways of working, much more fluid, where instead of using, you know, Adobe Photoshop or something, you're actually co-creating this thing with this fluid, responsive tool.

And that could kind of feel more like Minority Report or something, you know, I imagine, with the kind of interface where there's this thing swirling around you. But it'll require people to get used to a very new workflow to take, like, maximum advantage of that.

But I think when they do, it'll be probably incredible for those people.

They'll be like 10x more productive.

So I want to go back to the world of multimodal that we were talking about before with sort of robots in the real world.

And so right now, most AI doesn't need to be multimodal in real time because the internet is not multimodal.

And for our listeners, that means absorbing many types of input, voice, text, vision at once.

And so can you go deeper in what you think the benefits of truly real-time multimodal AI will be?

And like, what are the challenges to get to that point?

I think, first of all, we live in a multimodal world, right?

We have our five senses, and that's what makes us human.

So if we want our systems to be brilliant tools or fantastic assistants, I think in the end, they're going to have to understand the world, the spatial temporal world that we live in, not just our linguistic maths world, right?

Abstract thinking world.

I think that they'll need to be able to act in and plan in and process things in the real world and understand the real world.

I think that, you know, the potential for robotics is huge.

I don't think it's had its ChatGPT or its AlphaFold moment yet, as it has, say, in science and language, right? Or its AlphaGo moment.

I think that's to come.

But I think, I think we're close.

And as we talked about before, I think in order for that to happen, the shortest path I see that happening on now is these general multimodal models being eventually good enough, and maybe we're not very far away from that, to sort of install on a robot, perhaps a humanoid robot with the cameras. Now, there's additional challenges: you've got to fit it locally, maybe on the local chips, to have the latency fast enough, and so on. But, you know, as we all know, just wait a couple of years, and those systems that are state of the art today will fit on a little mobile chip tomorrow.

So I think it's very exciting, multimodal, from that point of view: robotics, assistants.

And then finally, I think also for creativity: I think we have the first model in the world, Gemini 2.0, which you can try now in AI Studio, that allows native image generation.

So not calling a separate program, a separate model, in our case Imagen 3, which you can try separately, but actually Gemini itself natively coming up with images in the chat flow.

And I think people seem to be really enjoying using that.

So it's sort of like you're now talking to a multimodal chat bot, right?

And so you can get it to express emotions in pictures, or you can give it a picture and then tell it to modify it and then continue to work on it with word descriptions.

You know, can you remove that background?

Can you do this?

So this is, this goes back to the other earlier thing we said about, you know, programming or any of these creative things in a new workflow.

I think we're just seeing the glimpse of that, if you try out this new Gemini 2 experimental model, of how that might look in image creation.

And that's just the beginning.

Of course, it will work with video and coding and all sorts of things.

So in the land of the real world and multimodal: one of the things that, you know, people frequently speculate about is the geolocation of AI work.

And obviously in the US, we intensely track everything that's happening on the West Coast.

We also intensely track DeepMind, and then somewhat less Mistral, you know, and others.

What's some of the stuff that's really key for the world to understand about what's coming out of Europe?

What's the benefit of having there be multiple major centers of innovation and invention, you know, not just within the West Coast, but also obviously DeepMind in London, and Mistral in Paris, and others?

And what are some of the things for people to pay attention to, why it's important, and what's happening, especially within the UK and European AI ecosystem?

We started DeepMind in London, and we're still headquartered here, for several reasons.

I mean, this is where I grew up; it's what I know.

It's where I had all the contacts that I had.

But the competitive reason was that we felt that the talent coming out of universities in the UK and in Europe was the equivalent of the top US ones.

You know, Cambridge, my alma mater, and Oxford, they're up there with the MITs and Harvards and the Ivy Leagues, right?

I think they're, you know, always in the top 10 there together on the world university tables.

But, and this was certainly true in 2010, say you had a PhD in physics out of Cambridge, and you didn't want to work in finance at a hedge fund in the City, but you wanted to stay in the UK and be intellectually challenged: there were not that many options for you, right?

There were not that many deep tech startups.

So we were the first really to prove that could be done.

And actually, we were a big draw for the whole of Europe.

So we got the best people from the technical universities in, you know, Munich and in Switzerland and so on.

And for a long while, that was a huge competitive advantage.

And also, salaries were cheaper here than on the West Coast.

And you weren't competing against the big incumbents, right?

And also it was conducive.

The other reason I chose to do that was I knew that AGI was our plan from the beginning: you know, solve intelligence, and then use it to solve everything else. That was how we articulated our mission statement.

And I still like that framing of it.

It was a 20-year mission.

And we're now 15 years in, and I think we're sort of on track, unbelievably, right? Which is strange for any 20-year mission. But if you're on a 20-year mission, the point is you don't want to be too distracted on the way in a deep science, deep technology, deep scientific mission.

So, one of the issues I find with Silicon Valley: there are lots of benefits, obviously, contacts and support systems and funding and amazing things, and the amount of talent there, the density of talent.

But it is quite distracting, I feel.

Like everyone and their dog is trying to do a startup, you know, that they think is going to change the world, but it's just a photo app or something.

And then, you know, the cafes are filled with this.

Of course, it leads to some great things, but it's also a lot of noise if one actually wants to commit to a long-term mission that you think is the most important thing ever.

And you don't want, you know, you and your staff, to be too distracted, like: oh, maybe I could make 100 million, though, if I jumped and, you know, quickly did this gaming app or something, right?

And I think that's sort of the milieu that you're in in the valley, at least, at least back then.

Maybe this is less true now.

There's probably more mission focused startups now.

But I also, I kind of also wanted to prove it could be done elsewhere.

And then the final reason I think it's important is that AI is going to affect the whole world, right?

It's going to affect every industry.

It's going to affect every country.

It's going to be the most transformative technology ever, in my opinion.

So if that's true, and it's going to be like electricity or fire, you know, more impactful than even the internet or mobile, then I think it's important that the whole world participates in its design, with the different value systems that are out there that are, you know, good philosophies, from democratic values, you know, Western Europe, the US. I think it's important that it's not just a 100-square-mile patch of California, right?

I do actually think it's important that we get these other inputs, the broader inputs, not just geographically, but also, and I know you agree with this, Reid, from different subjects: philosophy, the social sciences, economics, academia, civil society. Not just the tech companies, not just the scientists, involved in deciding how this gets built and what it gets used for.

And I've always felt that very strongly, from the beginning.

And I think having some European involvement and some UK involvement at the top table of the innovation is a good thing.

So Demis, one of the areas of AI: when anyone asks me, like, hey, Aria, I know you're interested in AI, but, like, well, it can write my emails.

Like, why is it so special?

I just say, no, think about what it can do in medicine.

I always talk about AlphaFold.

I tell them about what Reed is doing.

Like, I'm just so excited for those breakthroughs.

Can you give us just a little bit?

You had this seminal breakthrough in AlphaFold and what is it going to do for the future of medicine?

I've always felt that like, what are the most important things AI can be used for?

Right.

And I think there are two.

One is human health.

That's number one, trying to solve and cure terrible diseases.

And then number two is to help with, you know, energy, sustainability, and climate: the planet's health, let's call it, right?

So there's human health, and then there's the planet's health.

And those are the two areas that we have focused on in our science group, which I think is, you know, fairly unique amongst the AI labs, actually, in terms of how much we pushed that from the beginning.

And protein folding specifically was this canonical problem for me.

I sort of came across it when I was an undergrad in Cambridge, you know, 30 years ago.

And it's always stuck with me as this fantastic puzzle.

One that would unlock so many possibilities, you know: the structure of proteins.

Everything in life depends on proteins.

And we need to understand their structure so we know their function.

And if we know the function, then we can understand what goes wrong in disease, and we can design drugs and molecules that will bind to the right part of the surface of the protein, if you know the 3D structure.

So it's a fascinating problem.

It goes to all of the computational things we were discussing earlier as well.

Can you enumerate, can you see through this forest of possibilities, you know, all these different ways a protein could fold?

You know, some people estimate, Levinthal very famously in the 1960s estimated, that an average protein can fold in 10 to the power of 300 possible ways, right? So how do you enumerate those astronomical possibilities? And yet it is possible with these learning systems, and that's what we did with AlphaFold. And then we spun out a company, Isomorphic, and I know Reid's very interested in this area too with his new company. Think about if we can reduce the time it takes to discover a protein structure: as a rule of thumb, it used to take a PhD student their entire PhD to discover one protein structure.

So four or five years.

And there's 200 million proteins known to science.

And we folded them all in one year.

So another way you can think of it: we did a billion years of PhD time in one year.
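That back-of-envelope arithmetic checks out:

```python
# 200 million known proteins, at roughly one five-year PhD per structure:
print(200_000_000 * 5)  # 1,000,000,000 PhD-years, i.e. a billion years
```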

And then gave it to the world, you know, freely to use.

And, you know, 2 million researchers around the world have used it.

And we spun out a new company, Isomorphic, to try and go further downstream now and develop the drugs needed and try and reduce that time.

I mean, it's just amazing.

I mean, Demis, there's a reason they gave you the Nobel Prize.

Thank you so much for all of your work in this area.

It's truly amazing.

And now to Rapid Fire.

Is there a movie, song, or book that fills you with optimism for the future?

There's lots of movies that I've watched that have been super inspiring for me.

Things like even like Blade Runner is probably my favorite sci-fi movie.

But maybe it's not that optimistic.

So if you want an optimistic thing, I would say the Culture series by Iain Banks.

I think that's the best depiction of a post-AGI universe, where you've basically got societies of AIs and humans, and kind of alien species, actually, and sort of maximum human flourishing across the galaxy.

That's a kind of amazing, compelling future that I would hope for humanity.

What is a question that you wish people asked you more often?

The questions I sort of often wonder why people don't discuss a lot more, including with me, are about some of the really fundamental properties of reality, which actually drove me in the beginning, when I was a kid, to think about building AI as this sort of ultimate tool to help us with science.

So, for example, you know, I don't understand why people don't worry more about what time is, or what gravity is: basically the fundamental fabric of reality, which is sort of staring us in the face all the time, all these very obvious things that impact us all the time.

And we don't really have any idea how it works.

And I don't know why that doesn't trouble people more.

It troubles me.

And, you know, I'd love to have more debates with people about those things.

But actually, most people don't seem to, you know, they seem to sort of shy away from those topics.

Where do you see progress or momentum outside of your industry that inspires you?

That's a tough one because AI is so general.

It's almost touching everything; you know, what industry is outside of the AI industry?

I'm not sure there's many.

Maybe the progress going on in quantum is kind of interesting.

I still believe AI is going to get built first and then will maybe help us perfect our quantum systems.

But I have ongoing bets with some of my quantum friends, like Hartmut Neven, that they're going to build quantum systems first, and then that will help us accelerate AI.

So I always keep a close eye on the advances going on with quantum computing systems.

Final question.

Can you leave us with a final thought on what is possible over the next 15 years if everything breaks humanity's way?

And what's the first step to get there?

Well, what I hope for in the next 10, 15 years is for what we're doing in medicine to really have new breakthroughs.

And I think maybe in the next 10, 15 years, we can actually have a real crack at solving all disease, right?

That's the mission of Isomorphic.

And I think with AlphaFold, we showed what the potential was to sort of do what I like to call science at digital speed.

And why couldn't that also be applied to finding medicines?

And so my hope is that in 10, 15 years' time, we'll look back on the medicine we have today a bit like how we look back on medieval times and how we used to do medicine then. And that would be, I think, the most incredible benefit we could imagine from AI.

Possible is produced by Wonder Media Network.

It's hosted by Aria Finger and me, Reid Hoffman.

Our showrunner is Sean Young.

Possible is produced by Katie Sanders, Edie Allard, Sarah Schleid, Vanessa Handy, Aaliyah Yates, Palomo Moreno-Jimenes, and Malia Agudelo.

Jenny Kaplan is our executive producer and editor.

Special thanks to Suria Yalamanchili, Sayeda Sepieva, Vanasi Dilos, Ian Alice, Greg Biato, Parth Patel, and Ben Ralis.

And a big thanks to Leila Hajaj, Alice Talbert, and Denise Owusu-Afriye.