Asimov: Building An Omniscient RL Oracle with ReflectionAI’s Misha Laskin

1h 2m
Superintelligence, at least in an academic sense, has already been achieved. But Misha Laskin thinks that the next step towards artificial superintelligence, or ASI, should look both more user- and problem-focused. ReflectionAI co-founder and CEO Misha Laskin joins Sarah Guo to introduce Asimov, their new code comprehension agent built on reinforcement learning (RL). Misha talks about creating tools and designing AI agents based on customer needs, and how that influences eval development and the scope of the agent's memory. The two also discuss the challenges of scaling RL, the future of ASI, and the implications of Google's "non-acquisition" of Windsurf.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @MishaLaskin | @reflection_ai

Chapters:

00:00 – Misha Laskin Introduction

00:44 – Superintelligence vs. Super Intelligent Autonomous Systems

03:26 – Misha’s Journey from Physics to AI

07:48 – Asimov Product Release

11:52 – What Differentiates Asimov from Other Agents

16:15 – Asimov’s Eval Philosophy

21:52 – The Types of Queries Where Asimov Shines

24:35 – Designing a Team-Wide Memory for Asimov

28:38 – Leveraging Pre-Trained Models

32:47 – The Challenges of Solving Scaling in RL

37:21 – Training Agents in Copycat Software Environments

38:25 – When Will We See ASI?

44:27 – Thoughts on Windsurf’s Non-Acquisition

48:10 – Exploring Non-RL Datasets

55:12 – Tackling Problems Beyond Engineering and Coding

57:54 – Where We’re At in Deploying ASI in Different Fields

01:02:30 – Conclusion

Press play and read along

Runtime: 1h 2m

Transcript

Speaker 2 Hi, listeners. Welcome back to No Priors.
RL is back with a vengeance, and one of the most talent-dense new research labs has a product release, a new code comprehension agent.

Speaker 2 Reflection AI's co-founders, Misha Laskin and Ioannis Antonoglou, worked together as leaders at Google DeepMind on groundbreaking projects like AlphaGo, AlphaZero, and Gemini.

Speaker 2 I talk to Misha about building universal superhuman agents, the trickiness of reward modeling, bringing all knowledge work tasks into the training distribution, how RL differs for language and robotics, the Windsurf non-acquisition, and the landscape from here.

Speaker 2 Misha, welcome. Thank you for doing this.

Speaker 1 Yeah, thanks, Sarah, for having me.

Speaker 2 So it's been about a year and a half since you guys started the company. Is that about right?

Speaker 1 Roughly a year and a half, maybe a bit less, but I'd say it's ballpark correct.

Speaker 2 Well, can you just start by describing, you said that the company's mission is to build super intelligent autonomous systems.

Speaker 2 And we've talked before about why, like, this is the moment in time that's possible.

Speaker 2 What is different about that from building just super intelligence, which is now a sort of more popular, ambitious goal?

Speaker 1 At a high level, it's fairly synonymous, but maybe there are different ways of thinking about how to build super intelligence and what that might look like.

Speaker 1 I think on one spectrum, there's an academic way to look at it, which is,

Speaker 1 in some sense, to some extent, super intelligence in that sense has already been achieved. So

Speaker 1 AlphaGo was a super intelligent system, and there were other systems during that time that were built that were super intelligent in narrow domains.

Speaker 1 And I think you can go for the goal of building a very broad super intelligence by, you know, kind of locking yourself up in an academic, or it's not really an academic, but kind of an industrial lab with

Speaker 1 that is sort of kind of decoupled from product or customers and kind of max out all the benchmarks that are out there and build super intelligence that way. I think that is that is one approach.

Speaker 1 I think the other approach is to kind of think about

Speaker 1 what is super intelligence more concretely? How is it going to be deployed? What is it actually going to look like in people's hands and build backwards from there?

Speaker 1 So I would kind of say that that approach is more kind of co-designing product and research together. Now, the kind of benefits of that approach is that you're kind of

Speaker 1 you're optimizing for real problems. The cons to it is that you have to be a lot more focused, right?

Speaker 1 Because your product kind of defines the sort of capabilities that you want to draw out of the system.

Speaker 1 And you have to start out a lot more focused before expanding across other product categories and other capabilities.

Speaker 1 So I would say that on the spectrum of companies that are kind of super intelligence in just a research lab and then figure out what the product is, you know, once it's built, as opposed to co-designing product and research together to build very powerful systems in what I would call kind of

Speaker 1 ASI complete categories. You can pick something that is maybe too small of a category to draw out a super intelligence.

Speaker 1 As long as you pick a category that I would say is kind of big enough to be ASI complete.

Speaker 1 I think, and this is kind of our approach at reflection, is it makes a lot more sense to be focused and co-design those two things together, the product and the research.

Speaker 2 I want to come back to choice of initial problem

Speaker 2 in a minute. In terms of just having the intuition and the confidence to say, like, we can go do this as a team.
We're going to recruit great people and go build reflection.

Speaker 2 You and your co-founder, Yannis, were working at Gemini together in key roles before. And previously, you had been part of Pieter Abbeel's lab, and he's an amazing researcher as well.

Speaker 2 You had described yourself to me as having, I believe the term you used was, somewhat muscled your way into AI and deep learning from originally a physics background.

Speaker 2 Like, how did you decide to go work on this and end up in Peter's lab?

Speaker 1 Yeah, as a kid,

Speaker 1 I became really interested in physics, theoretical physics. It was, I mean, probably a byproduct of being Russian, kind of Israeli-American, and moving around.

Speaker 1 And then when I landed in the States, it was kind of in a desert in Washington State, learning a new language. And so I had a lot of time on my hands and, you know, my parents had

Speaker 1 the Feynman Lectures in their library. And so I spent a lot of time just reading what was on the shelf and bumped into that and got really interested in physics.

Speaker 2 How old were you?

Speaker 1 I was, so when my interest in physics started, that was probably around middle school. And it really, I think, became the thing I wanted to do in high school.

Speaker 1 And the reason physics was so interesting was because it kind of seemed like the science that was at the root of many of the things that became impactful.

Speaker 1 So I was reading about the history of the transistor, and it was invented by a group of theoretical physicists. I was reading about how GPS works.

Speaker 1 So it turns out you need special relativity in order to accurately account for spatial coordinates using GPS.

Speaker 1 And so I felt that physics was kind of the

Speaker 1 root science to pursue. I went in and studied it, got my PhD in it.
At the same time, I started seeing kind of deep learning take off and really saw kind of alpha go happen.

Speaker 1 And my sense was that I want to to pursue the kind of the root science,

Speaker 1 but there is such a thing as kind of the root science of our time.

Speaker 1 I think a lot of physics has as a field, it's very interesting, but it's crystallized a lot more than, you know, than a new dynamic field that was being born out of nothing.

Speaker 1 And AI to me felt like it was going through the moment

Speaker 1 that physics went through maybe 100 years ago, back when I was doing problem sets in physics.

Speaker 1 And the most exciting stuff that I was working on there was basically the things that people were discovering 100 years ago. So I saw it kind of happening in front of my eyes.

Speaker 1 And I just decided that that was the science to bet on.

Speaker 1 And in particular, because it was AlphaGo that was, that inspired me, because it was just unbelievable to me that you could train a neural network

Speaker 1 to have such immense kind of basically reasoning capabilities, right? This thing was able, it was super intelligent within the realm of Go.

Speaker 1 Yeah, I decided that I needed to kind of get myself into the best reinforcement learning lab I could.

Speaker 1 And

Speaker 1 Peter's lab was that lab for me.

Speaker 2 And then you and Yannis were working specifically on RL at Gemini.

Speaker 1 That's right. So Yannis, my co-founder, was the overall RL lead for Gemini at the time, for Gemini 1 and 1.5.

Speaker 1 I was working very closely with him on his team. Yeah, it was a really exciting time because we both went from being reinforcement learning researchers

Speaker 1 to training large language models at scale. And at the end of that project, which was, you know, Gemini 1 and 1.5 landing, we kind of saw what's to come.

Speaker 1 And it became pretty clear to us that the next paradigm and effectively the final paradigm

Speaker 1 that we need to have in place before

Speaker 1 a, you know, what people used to call AGI, or now I think the goalposts have shifted to ASI, is reached, is just figuring out how to scale reinforcement learning on top of large language models.

Speaker 1 And the first instances of that have been happening over the last year. I think we're still actually a lot earlier than people think,

Speaker 1 but there is a wedge in and things have started to work.

Speaker 2 Yeah,

Speaker 2 I definitely want to talk about what you think is solved and unsolved here.

Speaker 2 The entire field has clearly gotten more focused on deep reinforcement learning over the last 18 months. You have this

Speaker 2 huge product launch this week with Asimov. Can you just sort of describe what it is?

Speaker 1 So Asimov is the best code research agent in the world. It's a comprehension agent, meaning that it's really designed to kind of feel almost like a deep research for large code bases.

Speaker 1 The way a developer is supposed to feel interacting with it is effectively like they have a principal level engineer who deeply understands their organization at their fingertips.

Speaker 1 So it's very different from the existing set of tools that focus primarily on code generation. Like, every single coding tool has some code generation and some comprehension aspect.

Speaker 1 But we spent a lot of time kind of with our customers, trying to understand the impact coding tools are actually having, and this is enterprise specific.

Speaker 1 So I think the world is different with startups, but within enterprises, when they're adopting coding tools and you see the impact this is having on their actual productivity, I think it's much lower than people expect.

Speaker 1 So it's...

Speaker 1 In fact, it's sometimes negative, sometimes negligible.

Speaker 2 Did you see the recent METR report on that?

Speaker 1 Yeah, the METR report was very close to what I've been hearing when talking to engineering leaders within larger organizations. And it's not just enterprises.
It's, I'd say, growth stage startups.

Speaker 1 It's any kind of engineering organization that has a sufficiently complex code base and sufficiently large team that no one engineer can have the entire code base kind of in their heads.

Speaker 1 And so reflection is one of those places as well.

Speaker 1 We use our product actively because, you know, training large language models is complex. And there are the large language model code bases, there's the product code base.

Speaker 1 Knowledge is kind of scattered across engineers. It's not just in the code base.
It exists in your chats and project management tools and other places where knowledge lives.

Speaker 1 And so what we're effectively building towards is this kind of omniscient oracle for organizations that you can go in,

Speaker 1 ask

Speaker 1 any question at any level of complexity, and it'll provide you an answer at the level of what that principal level engineer would have given you or you know in the future as the product expands to other categories

Speaker 1 what the person who's most embedded in the organization understands

Speaker 1 and of course, once you have that solved, it begets much more reliable agents that act for you as well. But I think the world today is focused on, I would say, 80% action, 20% understanding, so 80% code generation, 20% comprehension.

Speaker 1 The actual problem is exactly the opposite, that when you look at what an engineer does in an organization, 80% of their time they're spending trying to comprehend complex systems and collaborating with teammates.

Speaker 1 And what is collaboration? It's usually someone asking someone else a question about a system that they don't know.

Speaker 1 And so that, I think, is kind of the problem at the heart of what would prevent a super intelligence from actually working within an organization.

Speaker 1 It's really this kind of understanding and being able to ingest from a lot of sources of information and from the team. And once you have that, then the action part, I think, becomes,

Speaker 1 I don't want to say trivial, but a lot easier. Like to me, it seems like really 20% of the problem is teaching these agents how to act.
And it's more or less solved.

Speaker 2 That definitely squares with both my understanding of engineering and then my experience with coding agents personally, right?

Speaker 2 If you think about the, I don't know, the like context load time of just like trying to understand a new system or code anyone else has written or code your agent has written.

Speaker 2 In the end, it's like, you know, a very stupid implementation mistake that, if you had reasoned through it with context of the system, you never would have made, or like a, you know, works-in-my-environment type problem.

Speaker 2 And so I think that very much mirrors my, you know, intuitive understanding of engineering here. That's great as problem formation.

Speaker 2 What makes Asimov different in terms of ability to understand better versus just generate code?

Speaker 1 There are a few things. So I think this is kind of where, you know, why it is so important to co-design research and product.

Speaker 1 Because as a researcher, you'd go in and say the answer is entirely in the agent design or the model or something like this.

Speaker 1 And as a product person, you would say, well, it's in these product differentiators, like being able to draw not just from your code base, but knowledge that lives, you know, in other sources of information or being able to learn from the engineering team to offload their tribal knowledge.

Speaker 1 So an engineer can go in and teach Asimov like, hey,

Speaker 1 we deploy our, you know, when we say environment jobs on our team, we mean this specific thing, which we mean kind of Google Batch jobs.

Speaker 1 So now when another engineer asks a question about environment jobs in the future, the system just knows what they're talking about. A lot of knowledge is stored in engineers' heads.

Speaker 1 And I think you need both of these things. You need to understand your customer really closely and develop differentiated product almost independently, right, of the models that are powering it.

Speaker 1 But then you also need to innovate on the research in terms of agent design and model training to actually drive the capabilities that you want to see out of the system.

Speaker 1 And this becomes an evaluation problem, which is basically at the heart of any frontier lab as well.

Speaker 1 This is, I think, the least spoken about part of what Frontier labs do, but possibly the most important, which is figuring out how they evaluate.

Speaker 1 Like, what makes Claude magically feel better at code than

Speaker 1 another model out there?

Speaker 1 They did something right in their evaluations. So

Speaker 1 when you look at this problem specifically, there are different capabilities that you need to

Speaker 1 train. And what we do is we really post-train models where we really focus on post-training today.
Some of these things are long context reasoning. Now,

Speaker 1 when I say long context reasoning, I actually mean kind of small models with very long contexts that are able to go into giant code bases, sort of suck up as much information as they can, and reason over it and output relevant stuff, basically.

Speaker 1 So it's almost like neural retrieval. There are capabilities like tool use and multi-hop reasoning.
So this is more for a, you have your agent and it's designed with some tools.

Speaker 1 And there are two ways of training agentic models. One is in this very general way where you just train it on thousands of environments and make it like the most general agent possible.

Speaker 1 And that is kind of almost like the pre-training of agents. And that's sort of what, you know, that's what a Frontier Lab does.
There's a new release, Kimi K2.

Speaker 1 That's kind of what that model does. And that's definitely part of it.
That kind of gives you a nice general base to start from.

Speaker 1 But then to drive a capability kind of depth-wise: if you really want this reasoner that has, you know, search tools, the ability to call these long-context reasoning models, and other tools it might want to interact with, like, oh, when do I read from Jira, when do I read from another tool, this is kind of a reasoning problem. If you train with those specific tools in mind, that's typically what people refer to when they say tool use: they actually train for a specific set of tools and really drive the capabilities for those tools. So these are the kinds of research problems that you need to solve in order to build the overall system that's the best in the world. It's not any one thing, it's all these things combined.

Speaker 1 And some examples of systems that are being trained for a specific set of tools.

Speaker 1 The thing that comes to mind is the Grok 4 release, and they kind of showed a plot of their general model, and then the model that was trained with tools to basically climb Humanity's Last Exam.

Speaker 1 And there was some big noticeable difference between the two. Now, that's great, but I think the

Speaker 1 downside of that is, does Humanity's Last Exam actually matter in any meaningful way for an end user? And I would argue that there's maybe some weak correlation, but the answer is most likely no.

Speaker 1 And so you have to build the tools and train for the things that users actually want. I think that there's sort of no way around that.
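
As a rough illustration of what "training for a specific set of tools" can look like at the interface level, here is a minimal sketch in Python. The tool names (search_code, read_issue), the model.generate call, and the rollout loop are all assumptions for illustration, not Reflection's actual agent design; the point is simply that rollouts are generated against a fixed tool schema, and those transcripts are what you then post-train on.

```python
# Hedged sketch: an agent post-trained against a fixed, known tool set.
# Tool names and the model interface below are illustrative assumptions.
import json

TOOLS = [
    {"name": "search_code",
     "description": "Search the codebase and return matching snippets.",
     "parameters": {"query": "string", "max_results": "integer"}},
    {"name": "read_issue",
     "description": "Fetch a ticket from the project tracker by id.",
     "parameters": {"issue_id": "string"}},
]

def rollout(model, question, tool_impls, max_hops=8):
    """Multi-hop rollout against the fixed tool set; transcripts like this
    become the training data when you train with specific tools in mind."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_hops):
        step = model.generate(messages, tools=TOOLS)  # assumed model interface
        if step["type"] == "final_answer":
            return messages, step["content"]
        result = tool_impls[step["tool_name"]](**step["arguments"])
        messages.append({"role": "tool",
                         "name": step["tool_name"],
                         "content": json.dumps(result)})
    return messages, None  # ran out of hops without a final answer
```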

Speaker 2 What can you share about how you evaluate, either like technically or philosophically, that makes Asimov's performance great?

Speaker 1 This is sort of why it makes sense to do something like this as a startup.

Speaker 1 So the only advantage that you'll ever have as a startup over a big incumbent, especially when there are such talented teams out there,

Speaker 1 is kind of focus and velocity against the thing that you're focused on. Now, I think you need, if you want to be playing in what is

Speaker 1 arguably, I think, the biggest category in AI, which is coding, then

Speaker 1 you need to have the talent as well to do it.

Speaker 1 But, you know, what do you do if you don't have the billions of dollars to pre-train models? The only way we can win, I think, is by being very focused.

Speaker 1 So the way I would describe what it looks like to work on a big model

Speaker 1 within an incumbent lab is that you are one of hundreds of evals. There are teams,

Speaker 1 when you look at the model card for, let's say, the o1 paper that came out last year, if you look at the distribution of what most people worked on in that paper, it was evals.

Speaker 1 So you're one of

Speaker 1 many people doing all sorts of evals.

Speaker 1 And

Speaker 1 spreading yourself in that sense, you get something that's general, but it's spread fairly thin.

Speaker 1 As a startup, and a startup that has a very focused product, that's not being too diffuse and that's pretty opinionated about what it is it's building,

Speaker 1 your evals are basically what, you know, in the startup lore, Paul Graham would tell you: kind of go talk to customers, like half the time build product, half the time talk to customers.

Speaker 1 I think in the AI age, it's

Speaker 1 develop your evals based on what customers are saying and what they're doing. So you have to work with your customers to look at what prompts they're trying to solve, what general questions they're trying to unlock. There are very specific pain points that, you know, we've identified, like onboarding being one of them. In a big company, it takes months to onboard an engineer. So how do you develop evals that accelerate the onboarding of an engineer from, you know, months to hopefully just a couple of weeks, now that all the questions they had, they can just ask Asimov and be able to onboard much faster?

Speaker 1 So I think

Speaker 1 there's no silver bullet other than coupling to the information coming from customers, but then being very scientific in the evals that you develop across them.

Speaker 1 So you have these, let's say, customer needs, let's say onboarding and a bunch of others.

Speaker 1 And then you have your system capabilities, which is, well, what do you need in order to provide a good experience there?

Speaker 1 Well, this customer is being onboarded onto a giant code base. Like it has,

Speaker 1 you know, it might be a code base that on its own is like 100 million tokens or something. Well, then you need to figure out some way to reason over that giant code base.

Speaker 1 So you have kind of a long context reasoning capability, or you kind of look at your agent and see what's preventing it from satisfying this query from a user.

Speaker 1 And so you kind of work backwards and reverse engineer from what a user is asking for to what capabilities you want to drive in your system.
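
As a concrete, hypothetical sketch of that reverse-engineering step: each eval case pairs a customer-style query with a rubric for what a principal-level answer must cover, and per-category scores point at which system capability (long-context reasoning, tool use, and so on) is the bottleneck. The class names and the agent/judge interfaces below are assumptions, not the actual eval harness.

```python
# Hedged sketch of a customer-derived eval suite; names and APIs are assumed.
from dataclasses import dataclass

@dataclass
class EvalCase:
    query: str      # drawn from real user prompts, e.g. onboarding questions
    rubric: str     # what a principal-level answer should cover
    category: str   # e.g. "onboarding", "flaky-test", "infra-slowdown"

CASES = [
    EvalCase(
        query="Where is authentication handled, and how do I add a new provider?",
        rubric="Names the auth module, the provider interface, and the config path.",
        category="onboarding",
    ),
]

def run_suite(agent, judge, cases):
    scores = {}
    for case in cases:
        answer = agent.answer(case.query)                      # assumed agent API
        score = judge.score_with_rubric(answer, case.rubric)   # assumed judge API, 0..1
        scores.setdefault(case.category, []).append(score)
    # Per-category averages show which capability to drive next.
    return {cat: sum(vals) / len(vals) for cat, vals in scores.items()}
```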

Speaker 1 But the important part, I think, is to be able to tweak every part of the system from the product features to the agent design to the model training in order to build the best overall system.

Speaker 1 And if you are capped in which parts you can change, like if you can only change the product and agent design, then you're actually pretty limited in what you can do because you're kind of at the mercy of

Speaker 1 what

Speaker 1 kind of these general third-party models can do.

Speaker 2 What I'm hearing from you is also that there is some trade-off between

Speaker 2 having, you know, to serve all different kinds of users and optimizing across those different evals, because each one of the teams that is thinking about a particular use case or audience at a more general organization, for example, is less likely to have the ability to work through the entire pipeline from training to product

Speaker 2 to win their use case.

Speaker 1 So the thing that was extremely satisfying about working on Gemini is that you're driving research in the frontier. And there's something very gratifying about that.

Speaker 1 The downside was that you were so far removed from product that it was kind of a broken telephone game of talking to the

Speaker 1 four different people that information flowed through before the model got into a customer's hands. That coupling was very loose.
And I think it's very true that

Speaker 1 Just because a company might have the best model in some general set of academic benchmarks doesn't actually mean they have the best product.

Speaker 1 And I think what we're seeing is when things really fit together, it's usually that there's a

Speaker 1 tight coupling between a product and a model that's a whole system. It's not just the model alone.

Speaker 1 Obviously, the first big example of that was ChatGPT. ChatGPT is kind of an incredible product that was coupled with the model.

Speaker 1 And the model was post-trained for the prompts that are coming in from users from ChatGPT.

Speaker 1 Like there was a reason why, you know, when I saw the first coding blog post that ChatGPT produced for me, that was just insane. It was like an insane magical moment.

Speaker 1 And they post-trained specifically for that. And I think there's another example of that happening right now with Claude Code.
That's kind of tight model to product coupling.

Speaker 1 And so I really think it's important to be able to do both at a great degree of excellence.

Speaker 2 What is an example as you guys open up the wait list that you want users to try where it should just be like obvious that the answers are better than other coding agents?

Speaker 1 I think the kinds of queries that it tends to be better at are, I guess, what we would call semantic queries. So, let's say, like, an example of a query where this is not the best system to use.

Speaker 1 It's like file level.

Speaker 1 If you're looking at a file and there's like a specific thing in that file and you're just trying to get a quick answer to it, you don't really need the hammer of like a deep research experience.

Speaker 1 You don't need to wait, you know, like tens of seconds or a minute or two to get that answer because that should just be delivered snappily.

Speaker 1 But if you

Speaker 1 don't exactly know what you're looking for, and you don't know the function name, or, you know, something. And these are kind of the hard problems that engineers are usually in, like there's a flaky test.

Speaker 1 I mean, you know that this test is flaky, but that's where your knowledge stops, right? And that's when you usually go to Slack and ask some engineers, like, this test is flaky. What's going on?

Speaker 1 Does anyone know?

Speaker 1 You know,

Speaker 1 the way we've used it is, when you're training these models, there's a lot of infrastructure work that goes into it, and it fails in interesting ways all the time.

Speaker 1 And

Speaker 1 asking things like, you know, my jobs are running slowly, five times more slowly than usual. Why is that?

Speaker 1 That's kind of a vague query that would be very hard to answer with existing systems, especially since

Speaker 1 the knowledge around that query might live not just in the code base. So in the example that I just brought up,

Speaker 1 when this was happening, that our environment jobs were slowing down,

Speaker 1 it turned out that two different teams, kind of the infrastructure and research teams, submitted pull requests that passed tests.

Speaker 1 It wasn't that they were wrong, but they kind of conflicted together in a way that caused effectively a race condition and slowed everyone's jobs down.

Speaker 1 And these are the kinds of bugs that actually engineers spend, you know, that's where you have like two or three engineers who spend a few days trying to solve one of these.

Speaker 1 So I think these kind of semantic queries

Speaker 1 tend to be the place where a product like this shines in the same way that when you think of what kind of query would you ask ChatGPT to, you know, when it just needs to use kind of the browser tool.

Speaker 1 So it's like a quick factual thing, like you wouldn't invoke the deep research experience.

Speaker 1 But when you wanted to compile kind of a lot of information around some more nebulous query, I think that's where people seem to find a lot of value with deep research.

Speaker 1 So I think a similar kind of mindset holds here.

Speaker 2 One thing I would do, you know, working on a new system with a principal engineer next to me, is just have them explain the entire system, right? Yeah.

Speaker 2 Because I want to have that context where I can't, I can't even tell the agent what to do.

Speaker 2 And so I'm curious from a product perspective, like

Speaker 2 the way you have, you know, memory for agents or even for teams is an increasingly popular idea. There's lots of ideas about how

Speaker 2 to do it. I think there are not many examples of like collaborative memory in production in a useful way yet, but I'm sure it is coming.

Speaker 2 Have you guys designed it in a form like I can understand too?

Speaker 1 Yes. So this is actually one of the more fun things to work on in product today. I think it's one of the more fun kind of features to work on at the company:

Speaker 1 how do you design a team-wide memory?

Speaker 1 Because there are all sorts of details around, well, who can edit the memory? Who can view different parts of the memory?

Speaker 1 How do you maintain a kind of repository of this memory for people to edit and view?

Speaker 2 You have to have a concept of authority, right? People are going to say things that are wrong.

Speaker 1 The way it's worked with customers we've started working with is

Speaker 1 they typically want to start off with kind of a group of trusted senior,

Speaker 1 staff-level-plus engineers who are kind of the gatekeepers, which is a very common notion, I think. You have permissions, right, and an

Speaker 1 ownership structure in code bases. And they basically are the ones who kind of populate the memory first and then sort of expand the scope.
But I think it works.

Speaker 1 It's actually a much more complex feature to build because it touches on

Speaker 1 org-wide permissions. There's some parts of the code where a certain engineer should be able to edit the memory, but other engineers shouldn't.

Speaker 1 And so it actually starts looking like the new way of versioning code effectively.

Speaker 1 It's kind of a GitHub plus plus, because you're not versioning the code, you're kind of versioning the meta-knowledge around it that helps language models understand it better.

Speaker 1 But definitely, that is something that we've built, and I think it's a thing to iterate on a lot until you kind of get the right design here, because you're effectively building kind of a new Git from scratch.

Speaker 2 Yes, interesting. And you're trying to design some sort of permissions into it versus, like, you know, the dominant system today in actual version control, which is,

Speaker 2 you know, at best, pull request review, right?

Speaker 1 Right, you have somebody in the organization with the ability to review who makes a determination as to whether or not Misha should be able to make this change, based on the content. And I think actually it's going to look not too dissimilar from that, where if you want to change the team-wide memory, then it probably is going to look something like a pull request where the person who really understands that system

Speaker 1 approves it or, you know, edits it or something like this. I don't think it's going to look too dissimilar.

Speaker 2 That's quite different from like traditional role-based

Speaker 2 group hierarchical access control that is quite static, right? And it makes sense to me that it would look perhaps a little bit more Git-like in that the

Speaker 2 person who knows what part of the code base you are editing or

Speaker 2 creating knowledge about is going to evolve over time as the code base evolves and the team does as well.

Speaker 1 Yeah, exactly. But I think this is also how it was very common at, you know, at Google and I think other places as well for different parts of the code base to have owners.

Speaker 1 And so there are like these ownership files that we have as well.

Speaker 1 And basically, if you're on the ownership file, then the review has to go through you, or it has to be approved by at least one of the members of the ownership file.

Speaker 1 And as people move around teams and so forth, the ownership files themselves get updated.

Speaker 1 So I think a pretty similar structure is probably going to hold here, but it's a lot more nuanced than building kind of an individual memory, which is just kind of personal to you and lives on your computer in your agents.md file or something.
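
As a rough illustration of the OWNERS-file pattern Misha describes, applied to team-wide memory edits (the file format and function names here are hypothetical, not the product's actual schema): an edit scoped to a path only lands once someone on that path's ownership list approves it, much like a pull request.

```python
# Hedged sketch: OWNERS-style gating for edits to a team-wide memory.
# The ownership map and helper names are illustrative, not a real schema.

OWNERS = {
    "infra/training/": ["alice", "bob"],
    "services/api/":   ["carol"],
}

def owners_for(path):
    """Longest-prefix match, mirroring per-directory ownership files."""
    matches = [prefix for prefix in OWNERS if path.startswith(prefix)]
    return OWNERS[max(matches, key=len)] if matches else []

def can_approve_memory_edit(path, approver):
    return approver in owners_for(path)

# A memory edit behaves like a pull request: anyone can propose it,
# but it only merges once an owner of that part of the code approves.
edit = {"path": "infra/training/",
        "note": "'environment jobs' on this team means our batch training jobs"}
print(can_approve_memory_edit(edit["path"], "alice"))  # True
print(can_approve_memory_edit(edit["path"], "carol"))  # False
```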

Speaker 2 Okay, if we zoom out and place, like, Reflection overall in context a little bit and talk about the larger environment.

Speaker 1 Sounds good. Yeah.

Speaker 2 You know, coding as a

Speaker 2 root problem in this era of AI research is a somewhat commonly held belief, right?

Speaker 2 I think a criticism of companies that went after pre-training focused on coding was that, in reality, you actually needed language, you needed a lot of the capabilities, who can say exactly which, but the reasoning capabilities that could be elicited from large pre-trained models, to do code anyway.

Speaker 2 And so you had to do all of the work without the general use. Is it specifically the availability of pre-trained models

Speaker 2 that are more capable and open source that made you feel like we can go after

Speaker 2 super intelligent autonomous systems in coding without spending the pre-training dollars up front as a new lab? Or help me think about that logic a little bit more.

Speaker 1 I think that that's roughly correct for kind of, you know, the sort of

Speaker 1 why you can get into the game sort of short term.

Speaker 1 A bet that we made,

Speaker 1 you know, when we were starting the company a year and a half ago, was that there'd be pretty decent open weight models out there. You know, we kind of saw pre-training as starting to more or less converge on kind of a known paradigm.

Speaker 1 There's sort of a

Speaker 1 known big data set on the internet. Yes, there are going to be some algorithmic innovations, but you're basically extracting signal from an extremely noisy data set.

Speaker 1 And we felt like there's only so much signal that

Speaker 1 one would be able to extract without getting into just absurd dollars for scaling this in terms of what you're trying to get out of it.

Speaker 1 So, what we thought would happen is that there'd be decent open weight models.

Speaker 1 I think the quality of the open weight frontier has

Speaker 1 surprised me. They're actually, the models are better than I thought they would be.
And we thought that you can just focus on, you know, we're in this brief period in history right now where

Speaker 1 the RL flops are still manageable. Like

Speaker 1 you can really have a best in class product if you're focused.

Speaker 1 And yes, you'll need to put, you know, you still need a decent amount of GPUs, but from a, but from a FLOPS perspective, it's nowhere near where pre-training is.

Speaker 2 Like two orders of magnitude off.

Speaker 1 Exactly. Right.
So you can get into it and kind of build out both the product and a research arm. Our thought was that this was the time where you can actually start a,

Speaker 1 you know, a generational

Speaker 1 frontier lab that does not need to be coupled to a, you know, to a big cloud provider,

Speaker 1 because if you do it right, you'll actually be able to generate

Speaker 1 sufficient revenues to not have to be acquired or find some strange deal where the cloud provider kind of owns you.

Speaker 1 And that was kind of the model, I think, of a lot of what Frontier Labs look like pre-LLMs.

Speaker 1 I think we're already starting to see that this is kind of more of a field-wide thing, independently of Reflection, right? When you look at how fast Anthropic's revenue is growing, I think

Speaker 1 they're kind of in this spot where it's like a massive revenue-generating business that's growing at an unprecedented rate.

Speaker 1 But that was very much the ethos: that we can come in, we don't need to pre-train,

Speaker 1 you can get by with

Speaker 1 two orders of magnitude less compute and really get

Speaker 1 something out there that's really good.

Speaker 1 I think that, roughly speaking, you won't need the amount of compute that a frontier lab needs today

Speaker 1 as long as you're focused, but you'll still need a lot, kind of an order of magnitude less. So I think that the capitalization requirements are still high.
There's no way of avoiding that.

Speaker 1 But I'd say, asymptotically, they're probably the same. Asymptotically,

Speaker 1 the idea is that at that point, you just have a generational business that

Speaker 1 can raise capital off of that.

Speaker 2 I guess part of my read at this point in time is, and maybe it was always true, but especially now is your actual capabilities in terms of understanding what evals to go after, how to design reward models.

Speaker 2 There's perhaps like less understanding and more dispersion in the field in post-training strategies versus, as you said, more maturity in pre-training right now.

Speaker 2 Because, if it was a simple question of scaling RL on language models, people would be doing it more aggressively right now.

Speaker 1 Right.

Speaker 2 And so, actually, maybe that's a good question for you. Like, how would you describe the challenges in solving scaling here?

Speaker 2 Like, why, why are we only able as a field to put like a much smaller amount of compute to work here and still get like

Speaker 2 best in class results versus pre-training scale GPUs right now?

Speaker 1 I'd say that there are two categories that one would think things fall into.

Speaker 1 One is more around the limitations of the problem structure.

Speaker 1 And the other one is, well, maybe the structure is fine, but you need algorithmic advances to really drive the next frontier forward.

Speaker 1 There's, you know, I'd say it's some mixture of both, but the biggest weight I put is on the problem structure. So, the thing that I led for Gemini was reward models.

Speaker 1 I built out the reward models that were used to post-train Gemini 1 and 1.5.

Speaker 1 And my thought was that if you have a reward that accurately, basically, describes the outcome of any arbitrary task that you throw at it, then that's it.

Speaker 1 You know, at that point, it's just algorithmic advances, but even the very simple RL methods we have today will be able to get a lot out of this.

Speaker 1 They'll only be bound by their exploration abilities. That's the only thing.

Speaker 1 But today,

Speaker 1 we certainly are not in this world where we have clean rewards for every task we could imagine. And so we're kind of

Speaker 1 as a field, having to make sort of various shortcuts and compromises to that.

Speaker 1 So you'll have things like LLM-as-judge with different rubrics, and that works to some extent, but inevitably a noisy or stochastic reward gets hacked.

Speaker 1 So you kind of need a lot of these. And, you know, and there's only so much you can extract out of them.

Speaker 1 Then you have sources that do have ground truth rewards, but there are not many of them. And so you have to hope that by optimizing against those, you'll get some generalization effects.
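
A minimal sketch of the LLM-as-judge-with-rubric reward Misha mentions, and why it is hackable: the reward is itself a stochastic model, so a policy optimized hard against it will eventually find outputs the judge overrates. The prompt format and judge interface are assumptions for illustration.

```python
# Hedged sketch: a rubric-based LLM-as-judge reward. The judge API is assumed.

JUDGE_PROMPT = """You are grading an answer against a rubric.
Rubric: {rubric}
Answer: {answer}
Reply with a single number from 0 to 10."""

def rubric_reward(judge_model, rubric, answer):
    reply = judge_model.complete(JUDGE_PROMPT.format(rubric=rubric, answer=answer))
    try:
        score = float(reply.strip()) / 10.0
    except ValueError:
        return 0.0          # unparseable judgment counts as no reward
    return min(max(score, 0.0), 1.0)

# Because the judge is noisy, policies eventually learn to exploit it.
# Common partial mitigations: ensemble several judges, mix in ground-truth
# checks (unit tests, exact answers) where they exist, and refresh rubrics.
```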

Speaker 1 And so I think the fundamental problem is the reward problem. You can either go in and say, all I'm going to focus on is rewards,

Speaker 1 or you can say, I'm going to

Speaker 1 take things as they are and just be more

Speaker 1 creative in the methods that leverage the rewards that exist today. And examples of that: basically every synthetic data generation pipeline is some example of this.

Speaker 1 So,

Speaker 1 it's a messy problem, but I think it's fundamentally like we're in a reward-bound world.

Speaker 1 I don't think there's going to be any breakthrough where all of a sudden, you know, we go from not having rewards for everything to having them, because the reward problem in itself is, at the time I thought of it as AGI-complete, and now I'd say it's ASI-complete.

Speaker 1 Because by the time you have a neural network that can accurately verify any outcome, that is probably a super intelligence. And so then it goes back, again, to evaluations.

Speaker 1 If you're training your reward models on something, what are you evaluating against? What are the tasks that you want it to be good at? So

Speaker 1 that's kind of

Speaker 1 how I think about it. I think it's fundamentally a reward-model or rewards-bound field.

Speaker 1 And then there's also kind of algorithmic progress in terms of

Speaker 1 the RL methods we have today are quite bad, I would say, at exploration and credit assignment.

Speaker 1 Like, the fundamental algorithms are: take the things that work and make them happen more frequently, and take the things that don't work and make them happen less frequently.

Speaker 1 But they don't discern at all along your say reasoning chain which part of the reasoning was correct and which part was incorrect.

Speaker 1 And so that's why you get these reasoning chains that are kind of garden path meandering.

Speaker 1 Like they'll explore all sorts of things that are, you know, completely unnecessary and don't look at all like the kind of structured thinking that a person would have. That's how the algorithm works.

Speaker 1 It doesn't actually do that; there's no credit assignment step at any atomic level.

Speaker 1 And so that I would say falls into more algorithmic progress bottlenecks.
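
To make the credit-assignment point concrete, here is a generic REINFORCE-style sketch (not any particular lab's algorithm): the whole sampled reasoning chain receives one scalar reward, and every token in it is pushed up or down by that same amount, so nothing distinguishes the useful steps from the meandering ones.

```python
# Hedged sketch: sequence-level policy gradient, illustrating the lack of
# step-level credit assignment described above.
import torch

def policy_gradient_loss(token_logprobs: torch.Tensor, reward: float, baseline: float):
    # token_logprobs: shape (T,), log-probs of the tokens in one sampled chain
    advantage = reward - baseline               # one scalar for the whole chain
    return -(advantage * token_logprobs.sum())  # every token is reinforced equally

# "Make what worked more frequent, what failed less frequent"; but nothing
# here says which step in the chain actually earned the reward.
```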

Speaker 2 Can I ask you for a few like hot takes quickly?

Speaker 1 Yeah, let's go for it.

Speaker 2 What do you think of all of these efforts, either in-house with labs and vendors or young companies just creating software environments that look like popular software to train agents in? Right.

Speaker 2 Copies of Airbnb or Amazon or Salesforce or Excel.

Speaker 1 Personally, maybe the take is not very hot. I'm very bullish on it, because how else are you going to do it? Maybe the hot take is this:
There's no such thing as generalization.

Speaker 1 There's just bringing the test distribution into train.

Speaker 2 Okay. That is an aggressive take.
Wow. Yeah.

Speaker 1 So as long as like your

Speaker 1 train distribution looks something like what you would actually want to evaluate for,

Speaker 1 then

Speaker 1 users will

Speaker 1 experience it as generalization.

Speaker 1 I think there is some generalization that happens in these models, but

Speaker 1 we probably as users overestimate it because we don't actually see how they were made. But then,

Speaker 1 yeah, if you saw, oh, the synthetic environment was actually very similar to the thing I was asking about, so it would make sense why the model would be good at that.
Maybe

Speaker 2 six months ago, I think

Speaker 2 you said, like, I think it's possible we have my definition of ASI in a couple of years. Do you still believe that's true?

Speaker 1 I still do believe that's true.

Speaker 1 I think that where I think we'll be in a couple of years from now is that there will be kind of definitive super intelligence in

Speaker 1 some meaningful categories of work.

Speaker 1 And so, for example, when I say coding, I don't mean all of coding, but there will be a super intelligence within some kind of slivers, some meaningful slivers of coding that are driving,

Speaker 1 I would say, immense progress in the companies that can benefit from that.

Speaker 1 And the reason why I would say that the problem of ASI will have been solved by then is because, at that point, it's just a matter of operationalizing what you know.

Speaker 1 You know, it just so happened that these particular categories, like you might have a super intelligent front end developer because there's so much data.

Speaker 1 distribution for that on the internet and it's easier to make synthetic data for that.

Speaker 1 But at that point, you have the recipe and it's just a matter of making kind of economic decisions of is it worth sinking in X amount of dollars to get the data in this category to get kind of something close to super intelligence there.

Speaker 1 An example of that is what happened with reinforcement learning before language models. Effectively, the blueprint for building super intelligent systems was developed.

Speaker 1 It happened with the Atari games, AlphaGo.

Speaker 1 You know, then OpenAI Five and AlphaStar were near super intelligent systems. And if OpenAI and DeepMind had sunk more compute into them, they would have definitely become super intelligent.

Speaker 1 It's just that at that point, it didn't really make sense economically. Like, why would you do that?

Speaker 2 Then this is a definitional issue, because I was going to ask: help me understand your view. Like, one of the big criticisms of RL overall has been lack of generalization.

Speaker 2 That's been just kind of a general question for this direction.

Speaker 2 I do have friends at every large research lab that somewhat, you know, some, I mean, tell me if you hear something of a different tenor or just believe differently.

Speaker 2 They believe we're going to have systems that are much more capable than humans in many types of knowledge work, but they believe less in generalization.

Speaker 2 And so, in a resigned way, they're also, as you're saying, like, I guess we're just going to bring all of it under distribution one way or another.

Speaker 2 But that means, like, you know, it's a little bit different than my view of like, it's

Speaker 2 at some point, you're just, you know, you have enough capability that the rest you get for free, right? The rest of the useful capability you get for free.

Speaker 1 I think I kind of

Speaker 1 have a similar viewpoint to the people you describe.

Speaker 1 I think the generalization capabilities of these things have been weaker. First of all, it's mind-blowing that any of this exists at all.
So we went from fundamental existential crises around generalization.

Speaker 1 Like, this was the feel of reinforcement learning before language models:

Speaker 1 we have these systems that we can make amazing, you know, at like very narrow tasks. We have absolutely no answer for generalization, like zero.

Speaker 1 And we went from that to things that, you know, feel like they're generalizing. They're certainly generalizing much better than anything we had before.

Speaker 1 But it's likely because the training distributions are so broad.

Speaker 1 So at least the way I think about it is more kind of output as a user. Is the system super intelligent in some meaningful categories of work?

Speaker 1 And then from a research perspective, is it obvious how to make it general for anything that you might care about? And at that point, again, it's just a matter of economics.

Speaker 1 Maybe there are some categories where collecting the data is so expensive and the return on investment is low, where

Speaker 1 it's effectively just better to have craftspeople than super intelligent AIs. So I think we're moving into this kind of world of jagged super intelligence where you have

Speaker 1 a handful of these super intelligences for categories that matter, maybe subsumed into one model at some point. But at first, there'll probably be,

Speaker 1 again, I think there will be a few companies that have kind of product model coupling that, you know, that is super intelligent in different categories.

Speaker 1 I think an example of, again, starting to see the first glimpses of super intelligence, but in a way that hasn't really transferred to anything meaningful yet is, well,

Speaker 1 we have these super intelligent test takers now. Like, you know, the AIME benchmark is completely saturated, Codeforces and other competitive coding environments.

Speaker 1 The models are almost the best in the world, and within the year, they will probably be the best in the world. And yet,

Speaker 1 so we have the best competitive coding agents. Then you go into a company and you ask them, have these things been helpful? And they say, It's uneven.

Speaker 2 Yeah.

Speaker 1 Yeah. Right.

Speaker 1 So in the parts of work that are really meaningful, where you would want to see these things driving a meaningful increase in GDP, I think the only way you'll see that is if you go into a company and there's kind of, you know, a universal understanding that, yeah, my engineers, as a whole, every single one of them, are double digit percentage points more productive. That's the kind of thing that, if it starts happening across every field, then you'll see double digit increases in GDP.

Speaker 1 So I think that the kind of benchmark maxing that's happening now

Speaker 1 is a bit different than benchmark maxing used to be, because you have benchmark maxing that is weakly correlated to customer outcomes. But it still looks very similar to taking a board game, training an RL agent on it, getting kind of a landmark result in super intelligence, and then making a claim that, you know, super intelligence is solved.

Speaker 1 I think the reality is that

Speaker 1 deployment of it is half the problem,

Speaker 1 which goes back to kind of evaluating on customer problems and building product together with the models.

Speaker 2 So you must have seen the news of the Windsurf non-acquisition, not into OpenAI, but the non-acquisition into Google DeepMind. What do you make of it?

Speaker 1 We're seeing this verticalization basically happen across categories that

Speaker 1 are material to frontier intelligence. And one could argue that the first verticalized category was actually search, like through ChatGPT.
That's sort of a place where OpenAI verticalized first.

Speaker 1 And coding has obviously emerged as another kind of frontier-level category. Like, all these companies have aspirations of ASI. Yeah, ASI.

Speaker 1 And I think, you know, being basically trillion-dollar companies or more, I don't think that it's really the economics that are the driving factor, but it's more that if you want to sustain frontier research, that's kind of what you have to become.

Speaker 1 And so coding has clearly become one of these categories where verticalization is extremely important. And I think that there's

Speaker 1 there are kind of two sides of the story, one on the frontier lab side and the other on more of the, you know, product side, like a startup that builds product but does not have its intelligence in-house.

Speaker 1 So, I think on the frontier lab side, this is exactly kind of what Yannis and I noticed when we were working on Gemini:

Speaker 1 your model is so far away from the product that oftentimes, even though you have the best model, it does not at all mean that you have the best product. So, there's a reason why

Speaker 1 basically startups are the places where kind of adoption of coding tools took off rather than the frontier labs. And so, there's a verticalization happening there.

Speaker 1 And some are going to do it successfully, and some are not.

Speaker 1 I think that's kind of, we're already starting to see that with Claude Code really being an example of a successful verticalization.

Speaker 1 I don't think it's guaranteed that a big lab can, you know, buy their way to

Speaker 1 the end user, because the fundamental problems of your

Speaker 1 research team being far away from your product team will still be true. And the company having, you know, 100 different focus areas will still be true.

Speaker 1 So I don't think that acquiring an asset will change that fundamentally, but it does underscore the importance of verticalization.

Speaker 1 And then from the startup side, I think it actually puts companies that

Speaker 1 are in these kind of critical path categories like search and coding

Speaker 1 in a pretty existential place if they can't build their own frontier models. Not all frontier labs will be able to verticalize correctly, but some will, maybe one will.

Speaker 1 And that's going to be enough, I think, to kind of

Speaker 1 take the thunder out from,

Speaker 1 you know, from a company that's built a great user experience on top of someone else's model.

Speaker 1 And I think some of those dynamics are probably starting to play out as well.

Speaker 1 Like, I think that there are some question marks around: if you're on this critical path category and you don't have your own intelligence,

Speaker 1 you know, how do you compete when your competitor can, you know, just basically subsidize their product a lot more than you can, right?

Speaker 1 Because you're effectively as a startup that's building on top of these things to grow quickly, you're subsidizing, you know, the margin that an Anthropic or Gemini or whatever is making.

Speaker 1 And

Speaker 1 Google and Anthropic and OpenAI can subsidize their products a lot more than you can.

Speaker 1 So I think that companies that are

Speaker 1 don't own their intelligence, or are not kind of deeply integrated into a customer in some way that makes them hard to remove, find themselves in this pretty existential place as it becomes clear to the frontier labs that this is a category they need to verticalize around.

Speaker 2 I work with a few robotics companies. And so much of my lens on RL comes from that.
And I think it is, like,

Speaker 2 far less clear in robotics that, you know, RL will be a dominant part of the training versus imitation learning. You'll actually appreciate this on imitation from humans using tools, right?

Speaker 2 Because we run this,

Speaker 2 I'm going to like describe this idea that is nuts, but I think it's just funny. We run this grant program twice a year for amazing people using ML in different fields.

Speaker 2 And it's called Embed.

Speaker 2 And

Speaker 2 one of the ideas I had as a joke recently was, well, like you just record everything, right?

Speaker 2 Like not obviously just the code base, but like your Slack and all your documentation and all your conversations, because you are a software engineering team. And I'm 100%

Speaker 2 sure that I can take that data set if you ship something into production to an end customer that has real issues at any scale and sell it to a friend who's a researcher at a lab working on this stuff.

Speaker 2 And so you have some floor of value that is millions of dollars for your, you know, couple person company and like bonuses, like maybe the software company works, right?

Speaker 2 Obviously, this is like very noisy and I'm mostly joking, but I'm curious how you think about

Speaker 2 like exploring non-RL data sets that could be useful to you here.

Speaker 1 If that company existed, right, we would definitely pay for their data.

Speaker 2 There we go. See, it's not an idiot idea.

Speaker 1 Yeah, it's yeah, especially if there's diversity.

Speaker 1 I think that'd be. I can sell the whole set.
Yeah.

Speaker 1 So is the question around

Speaker 1 how do you leverage alternative sources of data?

Speaker 2 Yeah, the question is,

Speaker 2 I think there is like,

Speaker 2 I don't want to like over-analogize to robotics, right? But within robotics, you have learning from world models, you have learning from sim,

Speaker 2 you have learning from embodied data of different types, right?

Speaker 2 Imitation, then you have RL. I think it's like much less clear that

Speaker 2 you can use RL for a lot of robotics today, especially some of the harder like manipulation problems.

Speaker 2 And I'm curious, just given, you know, your team has this enormous strength in RL's like a starting premise, how you look at other types of data to create the, you know,

Speaker 2 coding agent experiences you want.

Speaker 1 So I was actually

Speaker 1 a robotics researcher in reinforcement learning. Like, Pieter Abbeel's lab is a robotics lab. And it was, you know, it was a mixture.

Speaker 1 Like, Pieter's lab was always oriented around the intelligence problem, with robotics being a domain where you study it.

Speaker 1 And, you know, the reason I came to lead reward models for Gemini was because that's the question I was studying with robotics.

Speaker 1 You know, we had these RL algorithms for getting robots to do some very narrow tasks, like moving blocks and, you know, various kinds of narrow tasks in simulation.

Speaker 1 And the question was, well, how do we get generalized manipulators? And,

Speaker 1 you know, just how do we build this all into one system? And it seemed like the rewards were the bottleneck. So a lot of what I was studying before

Speaker 1 starting, you know, getting into language models was how do we design reward functions or models for

Speaker 1 robotics or, you know, for 3D video games like Minecraft or something like this that have, I think, similar challenges scientifically.

Speaker 1 The challenge is that, if you think language model rewards are hackable, vision-language model rewards, or other sensory-signal rewards, are infinitely more hackable. They're much more short-lived than language rewards. You can think of language as a compressed representation of the world that we kind of magically get to start with, whereas if you're processing pixels or a sensorimotor signal, that's raw signal with a lot more noise in it. And so if you train a neural network that is trying to detect whether this thing was manipulated correctly or this thing was, you know, moved correctly, then

Speaker 1 that thing is just infinitely more hackable than anything you have in language models. So the same problems blow up and become much larger.
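To make that hackability failure mode concrete, here is a minimal toy sketch in Python. It is purely illustrative (nothing here is taken from Gemini's or anyone's actual reward models): a policy that greedily optimizes a learned, noisy proxy reward can land on a state the proxy loves while the true task check still fails.

```python
import numpy as np

def true_success(state: float) -> float:
    """Ground-truth check: the object really ended up at the goal (state = 1.0)."""
    return float(abs(state - 1.0) < 0.05)

def learned_reward(state):
    """A 'learned' reward model fit on noisy pixels: it correlates with success
    but has a spurious, exploitable bump far from the goal."""
    return np.exp(-(state - 1.0) ** 2) + 1.5 * np.exp(-10.0 * (state - 3.0) ** 2)

# Naive "policy optimization": greedily pick the state that maximizes the proxy.
candidates = np.linspace(0.0, 4.0, 2001)
chosen = candidates[np.argmax(learned_reward(candidates))]
print(f"proxy-optimal state: {chosen:.2f}")
print(f"proxy reward: {learned_reward(chosen):.2f}, true success: {true_success(chosen)}")
```

Running this, the optimizer picks the spurious bump (state near 3.0) over the real goal, which is exactly the kind of hack a noisier sensory reward invites.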

Speaker 1 And so that's actually why I changed to language models, because I felt that this was the fundamental problem, and in robotics you have these confounding factors of noisy signals coming in on top of it.

Speaker 1 I think that, at least in a generalizable way, that's why it's really hard to get reinforcement learning to work

Speaker 1 with robotics.

Speaker 1 The one place where it really does work well is when you have a clean reward signal, which happens to be in these locomotion-style scenarios. So there's a lot of work on

Speaker 1 building very robust sim-to-real locomotion pipelines. And it's because locomotion is kind of just your body.
Like, you don't have to manipulate the world around you.

Speaker 1 And so you can actually build reward signals that are like, oh, you know, your

Speaker 1 quadruped is moving at this velocity without damaging its body kind of thing.
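As a rough illustration of the kind of clean reward signal available in locomotion, here is a minimal sketch of a shaped sim-to-real-style reward; the terms and weights are invented for illustration, not taken from any particular pipeline.

```python
import numpy as np

def locomotion_reward(base_velocity, commanded_velocity, joint_torques, bad_contact):
    """Illustrative shaped reward for quadruped locomotion.

    Reward tracking the commanded base velocity, penalize large joint torques
    (a proxy for energy use and wear), and penalize any contact other than the
    feet (falling or scraping the body). All terms and weights are made up.
    """
    tracking = np.exp(-np.sum((np.asarray(base_velocity) - np.asarray(commanded_velocity)) ** 2))
    torque_penalty = 1e-4 * np.sum(np.square(joint_torques))
    contact_penalty = 1.0 if bad_contact else 0.0
    return tracking - torque_penalty - contact_penalty

# One simulated timestep: walking roughly at the commanded speed, feet-only contact.
print(locomotion_reward([0.9, 0.0], [1.0, 0.0], [3.0, -2.5, 1.0, 0.5], bad_contact=False))
```

Everything in this reward comes from the robot's own state, which is why it stays much harder to hack than a reward model that has to judge the outside world from pixels.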

Speaker 1 Maybe it's a bit of a roundabout answer to the question, but I think these two fields are very different in the data distributions that they support.

Speaker 1 And the kind of imitation learning data for language models is, of course, the internet. Of course,

Speaker 1 people have gathered all this data on how we write and so forth. And so

Speaker 1 aside from that, when we're generating synthetic data,

Speaker 1 the only scalable path is really reinforcement learning. The other thing that I'll say here is that when you're collecting data for robotics,

Speaker 1 you can do it in this kind of teleop way. Like, the things that we're trying to train robots to do are very intuitive for humans as well.

Speaker 1 I mean, actually more intuitive for humans, right? People are master manipulators. So you can have a lot of teleop-like data collection.
The things that we want language models to do

Speaker 1 are at a level where

Speaker 1 it's really hard to collect data of, you know, the chain-of-thought process that goes on in a human's head when they're trying to solve some task. And that's kind of the data that you need.

Speaker 1 And so for that reason, I think language models favor this more synthetic-data, RL-like approach, where it's easier for us to verify whether the thing was done or not than it is to actually generate all that data from a person specifically.
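A minimal sketch of that "verify rather than demonstrate" idea for language-model tasks: a unit-test-based reward for generated code. The candidate completion and tests below are hypothetical; this only shows the shape of a verifiable reward.

```python
import os
import subprocess
import sys
import tempfile

def unit_test_reward(candidate_code: str, test_code: str, timeout_s: int = 10) -> float:
    """Binary reward: 1.0 if the model-written code passes the tests, else 0.0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code + "\n")
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout_s)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
    finally:
        os.unlink(path)

# A sampled completion and an assertion-based check for it -- no human-written
# solution or narrated chain of thought required, only a verifier.
candidate = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(unit_test_reward(candidate, tests))  # 1.0
```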

Speaker 2 Maybe we just need, like, a neural interface.

Speaker 1 Yeah.

Speaker 2 To get the chain of thought.

Speaker 1 Yeah, maybe.

Speaker 1 I mean, actually, when Ioannis and I were starting the company, we were thinking about, well, you know, maybe we just somehow have people speak into a microphone as they're doing tasks in order to capture that.

Speaker 1 Just stream it. Yeah.
And it seemed,

Speaker 1 you know, logistically very hard to pull off.

Speaker 2 Okay, one

Speaker 2 final

Speaker 2 question about

Speaker 2 sort of Reflection's path from here. At what point, and this is a decision you get to make in the future, do you try to look at other problems beyond engineering and coding?

Speaker 2 Like, do you feel like there's a level of sufficient depth where you should just go attack different domains?

Speaker 1 The thing that makes coding as a category special is that

Speaker 1 it's not synonymous with software engineering. That's just kind of how we think about the market today.

Speaker 1 The reason code is special is that, if you believe the way a language model will interact with almost any piece of software is through function calls, and therefore code, then if you build very capable coding reasoners that are purpose-built for organizations,

Speaker 1 so you've solved the long-context problem of how do I reason over a bunch of disparate sources of information,

Speaker 1 and they can act on pieces of software through code, then you've kind of built the technology that will generalize, at least operationally, across other categories of work.
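As a rough sketch of that function-calls-and-therefore-code interface, here is a toy dispatcher in Python; the tool names and payloads are hypothetical and not any product's real API.

```python
import json

# Hypothetical tools; names and return values are invented for illustration.
def query_metrics(service: str, window_minutes: int) -> dict:
    return {"service": service, "p99_latency_ms": 113, "window_minutes": window_minutes}

def create_ticket(title: str, body: str) -> dict:
    return {"id": 42, "title": title, "body": body}

TOOLS = {"query_metrics": query_metrics, "create_ticket": create_ticket}

def execute_function_call(call_json: str) -> str:
    """Dispatch a model-emitted function call against real software.

    If the model can emit well-formed calls (i.e., code), the same reasoner can
    operate any piece of software that sits behind a layer like this one.
    """
    call = json.loads(call_json)
    result = TOOLS[call["name"]](**call["arguments"])
    return json.dumps(result)

# What a model completion might look like once it decides to act:
print(execute_function_call(
    '{"name": "query_metrics", "arguments": {"service": "checkout", "window_minutes": 60}}'
))
```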

Speaker 1 And so the way I think about it is more:

Speaker 1 first, not trying to get too ahead of yourself, just build the most depth-wise comprehension system for software engineers.

Speaker 1 This will naturally induce more reliable coding agents.

Speaker 1 You can plug that in as an MCP to your favorite IDE or coding agent

Speaker 1 or use one of our own.

Speaker 1 You can kind of plug that into whatever surface area makes sense for the customer and then sort of naturally start seeing where you're getting pulled from there.
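And as a sketch of what "plug that in as an MCP" might look like, here is a minimal server, assuming the official Model Context Protocol Python SDK (`pip install mcp`); the `ask_codebase` tool and its stub body are invented for illustration and are not Asimov's actual interface.

```python
from mcp.server.fastmcp import FastMCP

# A tiny MCP server exposing one hypothetical code-comprehension tool.
mcp = FastMCP("codebase-oracle")

@mcp.tool()
def ask_codebase(question: str) -> str:
    """Answer a question about the team's codebase and its surrounding context."""
    # Placeholder: a real server would route this to the comprehension agent.
    return f"(stub) answer to: {question}"

if __name__ == "__main__":
    mcp.run()  # stdio transport; MCP-aware IDEs and coding agents can now call ask_codebase
```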

Speaker 1 And the reason I think this will work is because this is kind of what we're already seeing, right? In the sense of,

Speaker 1 you know, how do you make the system useful for product managers or technical support people?

Speaker 1 And then, you know, I think moving on to things like sales or something like this. But there are already places where, you know, customers are pulling us

Speaker 1 in different directions. It's just kind of a matter of whether you engage on that today or not.
And I think that the risk that a startup has is that

Speaker 1 you see a lot of shiny areas where you can go and you start kind of going diffuse before you've really nailed a category.

Speaker 1 So I think it's really important to be focused and not diffuse in the short term.

Speaker 1 And if you build what we think of as the right contextual core for an organization, in this case an engineering organization, then you can naturally start expanding that into adjacent areas of work in that enterprise.

Speaker 2 Okay, last question, Misha. Where would you characterize us as like being on the path toward deployment of these capabilities in different fields?

Speaker 1 I think we're a lot earlier than most people think.

Speaker 1 I think this is going to be one of those areas where the technological building blocks outpace their deployment.

Speaker 1 And so, yeah, within the next couple of years, the blueprint for how to build ASIs will more or less have been set.

Speaker 1 Like, maybe there are still some efficiency breakthroughs that need to happen, but more or less, there'll be a blueprint for how you build a superintelligence in a particular category.

Speaker 1 But actually going in and deploying it and building it for specific categories of work, there's going to be a lot of product and research innovation specific to those categories

Speaker 1 that will probably make this a multi-decade thing.

Speaker 1 So I don't think that it's a couple of years from now and GDP starts growing 10%

Speaker 1 year over year globally. I think we're actually going to get there, but it's going to be a kind of multi-decade endeavor.

Speaker 1 I tend to see a lot of parallels between real-world deployment now and how reinforcement learning research worked before large language models.

Speaker 1 And before large language models, it used to be kind of you pick an environment, like you pick Go, you pick StarCraft, you pick something else, and you go and try to solve it with some combination of imitation learning and reinforcement learning.

Speaker 1 And when you look at all those projects, these were basically things that were called strikes within DeepMind.

Speaker 1 And each strike

Speaker 1 within and outside of DeepMind was a bit of a snowflake.

Speaker 1 Like, the reinforcement learning methods and environment setup for Go were, at a high level, conceptually similar, but at the detailed implementation level very different from StarCraft, very different from Dota.

Speaker 1 And so I think that's sort of what we're going into:

Speaker 1 every big category having a different environment, right, and different kinds of agents with different tools.

Speaker 1 And that means you'll have general base models that you can start with, but you'll need to post-train things in specific ways for those categories.

Speaker 1 And we're starting to see that already in the sense that the model that powers OpenAI's Codex is not the O series of models, it's a model called Codex, which was post-trained for that environment.

Speaker 1 The deep research models, like, that's a specific environment. They're also post-trained for that environment.

Speaker 1 And I think we'll basically see more and more of that: for any category that has a sufficiently large business around it and requires an intelligence core to power it, there will be all sorts of interesting design decisions at the research and product level of how you actually get the most performance out of that particular category.
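A schematic of that category-specific post-training pattern, with everything stubbed out (toy policy, toy verifier): run the base model inside the category's own environment, keep the trajectories the category's verifier accepts, and use those to update the model. This is only the shape of the loop, not how Codex or any particular lab's pipeline actually works.

```python
import random

random.seed(0)

def base_model(task: str) -> str:
    """Stand-in for sampling a completion from a general pretrained model."""
    return random.choice(["patch_a", "patch_b", "patch_c"])

def verifier(task: str, attempt: str) -> float:
    """Category-specific check, e.g. the repo's tests pass or the report compiles."""
    return 1.0 if attempt == "patch_a" else 0.0

def post_train_dataset(tasks: list[str], samples_per_task: int = 4) -> list[tuple[str, str]]:
    """Collect verified trajectories from the category's environment.

    In a real system these would drive an RL or fine-tuning update of the
    base model, specializing it for this one category of work.
    """
    kept = []
    for task in tasks:
        for _ in range(samples_per_task):
            attempt = base_model(task)
            if verifier(task, attempt) > 0:
                kept.append((task, attempt))
    return kept

print(post_train_dataset(["fix the failing CI job", "answer an on-call question"]))
```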

Speaker 1 So

Speaker 1 I think we'll see a lot more kind of depth-first players emerge over the coming decade or so.

Speaker 2 I'm making a bet on it. And I also think that, to your point about choosing the problem for the era, we don't get to choose, at Conviction, a problem for 100 years.

Speaker 2 We do get to choose for like this decade or so. Right.
And

Speaker 2 if you actually believe it's going to be a very long-term endeavor to get to the sort of productivity and abundance you described, but we are going to get there, then

Speaker 2 the other thing you think about is like

Speaker 2 the path to supporting the cost of bringing anything under data distribution during a particular period. Right.

Speaker 2 And so I'd say, you know, we've already backed companies in some of these areas, but let's say in life sciences or materials science, it is more expensive to collect the types of data you might need.

Speaker 2 And that might be a longer endeavor or one that you have to figure out how to fund, right? Or in robotics.

Speaker 2 And so I think it's a really interesting timing question of like any of these really big categories. But I believe coding is this era.

Speaker 1 I think coding is this era as well.

Speaker 1 This one, I think, will take longer than people thought as well, because again, enterprise,

Speaker 1 organizational problems are much different

Speaker 3 than

Speaker 1 the benchmarks that we have today. But I think it will be one of the faster ones.
So I don't think that that's kind of a decade out.

Speaker 1 That's within the next,

Speaker 1 you know,

Speaker 1 say

Speaker 1 dozens of months kind of thing. So I think the next sort of generational companies in coding

Speaker 1 are definitely being built today.

Speaker 2 Well, congratulations on the release, Misha. Thanks.

Speaker 1 Yeah, thank you, Sarah.

Speaker 2 Find us on Twitter at no priors pod. Subscribe to our YouTube channel if you want to see our faces.
Follow the show on Apple Podcasts, Spotify, or wherever you listen.

Speaker 2 That way, you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.