John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI

1h 36m

Chatted with John Schulman (co-founded OpenAI and led ChatGPT creation) on how post-training tames the shoggoth, and the nature of the progress to come...

Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here. Follow me on Twitter for updates on future episodes.

Timestamps

(00:00:00) - Pre-training, post-training, and future capabilities

(00:16:57) - Plan for AGI 2025

(00:29:19) - Teaching models to reason

(00:40:50) - The Road to ChatGPT

(00:52:13) - What makes for a good RL researcher?

(01:00:58) - Keeping humans in the loop

(01:15:15) - State of research, plateaus, and moats

Sponsors

If you’re interested in advertising on the podcast, fill out this form.

* Your DNA shapes everything about you. Want to know how? Take 10% off our Premium DNA kit with code DWARKESH at mynucleus.com.

* CommandBar is an AI user assistant that any software product can embed to non-annoyingly assist, support, and unleash their users. Used by forward-thinking CX, product, growth, and marketing teams. Learn more at commandbar.com.



Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe


Runtime: 1h 36m

Transcript

Speaker 1 Today I have the pleasure of speaking with John Schulman, who is one of the co-founders of OpenAI and leads the post-training team here.

Speaker 1 And he also led the creation of ChatGPT and is the author of many of the most important and widely cited papers in AI and RL, including PPO and many others. So John, really excited to chat with you.

Speaker 1 Thanks for coming on the podcast.

Speaker 2 Thanks for having me on the podcast. I'm a big fan.

Speaker 1 Thank you for saying that.

Speaker 1 So the first question I had is: we have these distinctions between pre-training and post-training, beyond what is actually happening in terms of loss functions and training regimes.

Speaker 1 I'm just curious, taking a step back conceptually, like what kind of thing is pre-training creating? What does post-training do on top of that?

Speaker 2 In pre-training, you're basically training to imitate all of the content on the internet or on the web, including websites and code and so forth. So you get a model that can basically generate

Speaker 2 content that looks like random web pages from the internet. And

Speaker 2 the model is also trained to maximize likelihood, where it has to put a probability on everything. So

Speaker 2 the objective is basically predicting the next token given the previous tokens. Tokens are like words or parts of words.
And since the model has to put a probability on it, and

Speaker 2 we're training it to maximize log probability, it ends up being very calibrated. So it can not only generate all

Speaker 2 the content of the web, it can also assign probabilities to everything.
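
To make the objective John describes concrete, here is a minimal sketch of the next-token, maximum-log-likelihood loss (illustrative only, not OpenAI's training code; `model` is a hypothetical autoregressive LM that maps token ids to next-token logits):

```python
import torch.nn.functional as F

def pretraining_loss(model, tokens):
    """Next-token prediction: maximize log P(token_t | tokens_<t).

    tokens: LongTensor of shape (batch, seq_len) holding token ids.
    model:  hypothetical autoregressive LM returning logits of shape
            (batch, seq_len - 1, vocab_size) for the shifted inputs.
    """
    logits = model(tokens[:, :-1])   # predictions for positions 1..T-1
    targets = tokens[:, 1:]          # the tokens that actually came next
    # Cross-entropy is the negative log-likelihood of the observed next
    # token, so minimizing it maximizes log probability and is what makes
    # the model assign calibrated probabilities over the whole vocabulary.
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
```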

Speaker 2 So the base model can effectively take on all these different personas or generate all these different kinds of content. And then when we do post-training,

Speaker 2 we're usually targeting a narrower range of behavior where we basically want the model to behave like this.

Speaker 2 kind of chat assistant and it's a it's a more specific persona where it's trying to be helpful. It's not trying to imitate a person.
It's answering your questions or doing your tasks.

Speaker 2 And

Speaker 2 we're optimizing on a different objective, which is more about producing outputs that humans will like and find useful, as opposed to just trying to imitate this raw content from the web. Yeah.
Okay.

Speaker 1 I think maybe I should take a step back and ask.

Speaker 1 Right now we have these models that are pretty good at acting as chatbots.

Speaker 1 Just taking a step back from how these processes work currently, what kinds of things will the models released by the end of the year be capable of doing?

Speaker 1 What do you see the progress looking like? You know, carry this forward for the next five years.

Speaker 2 Oh, yeah, five years. Yeah, I think the models will get quite a bit better.

Speaker 2 But in what way? In the person? Five years.

Speaker 2 So,

Speaker 2 I mean, I think even

Speaker 2 in one or two years, we'll find that you can use them for a lot of

Speaker 2 more involved tasks than they can do now.

Speaker 2 So for example, right now,

Speaker 2 you could imagine having the models carry out a whole coding project instead of maybe giving you one suggestion on how to write a function. So

Speaker 2 you could imagine the model

Speaker 2 giving it sort of high-level instructions on what to code up, and

Speaker 2 it'll go and write many files and test it, look at the output, iterate on that a bit. So just much more complex tasks.

Speaker 1 And fundamentally, the unlock is that it can act coherently for long enough to write multiple files of code? Or what has changed between now and then?

Speaker 2 Yeah, I would say this will come from some combination of things. One is just training the models to do harder tasks like this.

Speaker 2 Right now, most of the training data is more like doing single steps at a time, and I would expect us to do more training of the models to carry out these longer projects.

Speaker 2 So any kind of training, like doing RL to learn how to do these tasks, however you do it, whether you're supervising the final output or supervising each step, I think is going to make them a lot better at carrying out these long projects.
And since

Speaker 2 the whole area is pretty new, I'd say there's just a lot of low-hanging fruit in doing this kind of training. So I'd say that's one thing.
Also, I would expect that as the models get better, they're just better at recovering from errors, or

Speaker 2 they're better at dealing with edge cases, or when things go wrong, they know how to recover from it. So

Speaker 2 the models will be more sample efficient. So you don't have to collect a ton of data to teach them how to get back on track.
Just a little bit of data or just their generalization from

Speaker 2 other

Speaker 2 abilities will allow them to get back

Speaker 2 on track, whereas current models might just get stuck and get lost.

Speaker 1 I'm not sure I understood, actually.

Speaker 1 I want to understand more explicitly how the generalization helps you get back on track. Can you say more about that? I'm not sure I got why those two concepts are connected.

Speaker 2 Right. They're not directly connected.
So I would say you usually have a little bit of data that does everything. So, I mean, if you have,

Speaker 2 yeah, if you collect a diverse data set, you're going to get a little bit of everything in it. And

Speaker 2 if you have models that generalize really well,

Speaker 2 even if there's just a couple examples of getting back on track. I see.
Or even like maybe in the pre-training, there's examples of getting back on track.

Speaker 2 Then the model will be able to generalize from those other things it's seen to the current situation. So I think

Speaker 2 if you have models that are weaker, you might be able to get them to do almost anything with enough data, but you might have to put a lot of effort into a particular domain or skill.

Speaker 2 Whereas for a stronger model, it might just do the right thing without any training data or any effort.

Speaker 1 Do you have some intuition about this? Right now these models can maybe act coherently for five minutes.

Speaker 1 We want them to be able to do tasks that for a human would take an hour, then a week, then a month, and so forth.

Speaker 1 To get to each of these benchmarks, is it going to be that each one takes 10x more compute, analogous to the current scaling laws for pre-training? Or is it going to be a much more

Speaker 1 streamlined process because

Speaker 1 just getting to that point where you're already more sample efficient, and then you can just go to years of carrying out tasks or something?

Speaker 2 Yeah, I would say at a high level, I would agree that longer horizon tasks are going to

Speaker 2 require more model intelligence to do well and are going to be more expensive to train for.

Speaker 2 I'm not sure I would expect there to be a really clean scaling law unless you

Speaker 2 set it up in a very careful way or design your

Speaker 2 yeah, design the experiment in a certain way. Because

Speaker 2 I would say there might end up being some phase transitions where, once you get to a certain level, you can deal with

Speaker 2 much longer tasks. So for example,

Speaker 2 I think when people do planning at different time scales, I'm not sure they use completely different mechanisms.

Speaker 2 So

Speaker 2 we probably use the same mental machinery whether we're thinking about one month from now, one year from now,

Speaker 2 or like 100 years from now.

Speaker 2 So we're not actually doing some kind of reinforcement learning

Speaker 2 where we need to worry about a discount factor that covers that time scale and so forth. So

Speaker 2 I think using language, you can describe all of these different time scales.

Speaker 2 And then you can do things like plan: in the moment, you can try to make progress towards your goal, whether it's a month away or 10 years away.

Speaker 2 So I might expect the same out of models:

Speaker 2 I don't know if it's a phase transition, but there are some capabilities that work at multiple scales. Yeah.

Speaker 1 Well, okay, so correct me if this is wrong, but it seems like that implies something.

Speaker 1 Right now, we have models that are, on a per-token basis, pretty smart. They might be as smart on a per-token basis as the smartest humans.

Speaker 1 And the thing that prevents them from being as useful as they could be is that five minutes from now, they're not going to be still writing your code in a way that's coherent and aligns with the broader goals you have for your project or something.

Speaker 1 If it's the case that once you start this long-horizon RL training regime, it immediately unlocks the ability to be coherent for longer periods of time, should we be predicting

Speaker 1 something that is human-level as soon as that regime is unlocked? And if not, then what is remaining after we can plan for a year and execute projects that take that long?

Speaker 2 Yeah, it's not totally clear what we're going to see once we get into that regime and

Speaker 2 how fast progress will be. So

Speaker 2 that's still uncertain.

Speaker 2 I wouldn't expect everything to be immediately solved by doing training like this.

Speaker 2 I would think there will be other like miscellaneous deficits that the models have that cause them to get stuck or not make progress or make worse decisions than humans. So

Speaker 2 I wouldn't say I expect that this one little thing will unlock all capabilities, but

Speaker 2 yeah, it's not clear.

Speaker 2 But it might, like some improvement in the ability to do long horizon tasks might go quite far.

Speaker 1 Would you say it's just plausible, or does it seem quite likely, that there will be other reasons why there might be bottlenecks?

Speaker 1 And I'm also kind of curious, like, what would be the nature of the bottlenecks? So it has all these representations for pre-training.

Speaker 1 Now it can act coherently for a long period of time because of long-horizon RL. What's remaining?

Speaker 2 Yeah.

Speaker 2 Maybe

Speaker 2 there's some other experience that human experts bring to different tasks, like having some taste or dealing with ambiguity better.

Speaker 2 So I could imagine that if we want to do something like research,

Speaker 2 like those

Speaker 2 kind of considerations come into play.

Speaker 2 Yeah, obviously,

Speaker 2 there are going to be some mundane limitations around the affordances of the model, like whether it can

Speaker 2 use UIs, interact with the physical world,

Speaker 2 or have access to things. So I think there might be a lot of

Speaker 2 mundane barriers that are probably not going to last that long, but would initially

Speaker 2 slow down progress.

Speaker 1 The websites that are designed for these AIs, once they're much more multimodal, or at least trained on more multimodal data, will they be in any way different from the ones we have for humans?

Speaker 1 Like the UIs that will be needed?

Speaker 1 Compensating for their strengths and weaknesses, how would that look different from the current

Speaker 1 UIs we have for humans?

Speaker 2 Yeah, that's... That's an interesting question.
I mean,

Speaker 2 I would expect that models will be able to use websites that are designed for humans just by using vision, like when the vision capabilities get a bit better.

Speaker 2 So there wouldn't be an immediate need to change them. On the other hand, some websites that are going to benefit a lot from

Speaker 2 AIs being able to use them will probably want to be designed with better UXs for AIs. So

Speaker 2 I'm not sure exactly what that would mean, but probably

Speaker 2 like assuming that our models are still better in text mode than like reading text out of images,

Speaker 2 you'd probably want to have a good text-based representation for the models. So,

Speaker 2 and also

Speaker 2 just a good

Speaker 2 indication of what are all the things that can be interacted with.

Speaker 2 But I guess I wouldn't expect the web to get totally redesigned to have APIs everywhere, because I would expect that we can get models to use the same kind of UIs that humans use. Right.

Speaker 1 I mean, I guess that's been the big lesson of language models, right? That

Speaker 1 they can act with the same affordances that humans have.

Speaker 1 So the point you made earlier about this process could be more sample efficient because it could generalize from its experiences in pre-training of how to get unstuck in different scenarios.

Speaker 1 I'm curious what the strongest evidence of this kind of generalization and transfer you've seen is.

Speaker 1 Yeah,

Speaker 1 like,

Speaker 1 because the big question, it seems, about the future abilities of models is how much generalization is happening. Is there something that feels really compelling to you?

Speaker 1 Like you really learned something that you wouldn't expect it to learn from the generalization here?

Speaker 2 There's definitely been some interesting

Speaker 2 instances of generalization in post-training. Like

Speaker 2 one well-known phenomenon is if you do all your fine-tuning with English data, you'll automatically

Speaker 2 have the model also

Speaker 2 behaving well in other languages. So if you train the assistant on English data, it'll also do something reasonable in Spanish, say.
And sometimes

Speaker 2 you might get the wrong behavior in terms of whether it replies in English or replies in Spanish, but

Speaker 2 usually

Speaker 2 you get the right behavior there as well. Like you get it to respond in Spanish to Spanish queries.
So that's

Speaker 2 one kind of interesting instance of generalization: you just sort of latch onto the right helpful persona, and then you automatically do the right thing in different languages.

Speaker 2 We've seen some version of this with multimodal data where if you do text only fine tuning, you also get reasonable behavior with images.

Speaker 2 Early on in ChatGPT,

Speaker 2 we were trying to fix some issues in terms of the model understanding its own limitations.

Speaker 2 like early versions of the model would think they could like send you an email or call an Uber or something.

Speaker 2 Like the model would try to play the assistant and it would say, oh, yeah, of course, I sent that email. And obviously it didn't.
So

Speaker 2 we started collecting some data to fix those problems. And we found that a tiny amount of data did the trick, even when you mixed it together with everything else.

Speaker 2 So I don't remember exactly how many examples, but something like 30 examples.

Speaker 2 We had a pretty small number of examples showing this general behavior of explaining that the model doesn't have a given capability, and that generalized pretty well to all sorts of capabilities we didn't train for.

Speaker 1 Okay, so I still want to go back to this because I'm not sure I understood.

Speaker 1 If you have

Speaker 1 this model that is trained to be coherent for longer periods of time, does that imply that, unless there are these other bottlenecks, which there may or may not be, by next year you could have models that are potentially human-level in terms of acting like

Speaker 1 a colleague? It's almost as good as interacting with a human colleague: you can tell them to go do stuff and they get it done.

Speaker 1 What seems wrong with that picture of the capabilities you think might be possible?

Speaker 2 Yeah, it's hard to say exactly what the deficit will be. I would say that

Speaker 2 when you talk to the models today, they have various

Speaker 2 weaknesses besides long-term coherence, in terms of really thinking hard about things or paying attention to what you ask them.

Speaker 2 So I wouldn't expect

Speaker 2 just improving the coherence a little bit to

Speaker 2 be all it takes to get to AGI. But I guess I wouldn't be able to articulate exactly what the main weakness is
that'll stop them from being a fully functional colleague.

Speaker 1 It seems like then you should be planning for the possibility you would have AGI very soon.

Speaker 2 Yeah,

Speaker 2 I think that would be reasonable.

Speaker 1 So what's the plan?

Speaker 1 If there's no other bottlenecks, next year or something, you got AGI. What's the plan?

Speaker 2 Well, I would say that if AGI came way sooner than expected,

Speaker 2 we would definitely

Speaker 2 want to be careful about it. And we might want to

Speaker 2 slow down a little bit on training and deployment until we're pretty sure we know

Speaker 2 we can deal with it safely. And we have

Speaker 2 a pretty good handle on what it's going to do, what it can do. So I think,

Speaker 2 yeah,

Speaker 2 we would have to be very careful if it happened way sooner than expected, because I think our understanding is rudimentary in a lot of ways still.

Speaker 1 And what would being careful mean?

Speaker 1 Because presumably you are already careful, right? You do these evaluations before you're

Speaker 2 doing it. Just,

Speaker 2 maybe, not

Speaker 2 training the even smarter version,

Speaker 2 or being really careful when you do train it that it's

Speaker 2 properly sandboxed and everything.

Speaker 2 Maybe not deploying it at scale, or

Speaker 2 being careful about what

Speaker 2 scale you deploy it at.

Speaker 1 Okay, so let's just play with the scenario. It happens next year, and then

Speaker 1 you're not training a smarter system, and you're deploying somewhat in a measured way.

Speaker 1 I'm wondering:

Speaker 1 presumably this isn't particular to OpenAI; it's just that intelligence was much easier than we expected, and this is why it happened.

Speaker 1 And so you wait to deploy a little bit. Now other companies have a similar level of capabilities.

Speaker 1 What happens next? So you've waited to deploy. What are you waiting for? What are you talking about with these other companies?

Speaker 1 What is every company doing in this scenario?

Speaker 2 Yeah. Yeah.
The game theory is a little tough to think through.
So, first of all, I don't think this is going to happen next year, but it's still useful to have the conversation.

Speaker 2 And maybe it's like two or three years instead. But yeah,
two or three years is still pretty soon. It's still pretty soon.
I do think you probably need some coordination.

Speaker 2 Like everyone needs to agree on

Speaker 2 some reasonable

Speaker 2 limits to deployment or to further training for this to work.

Speaker 2 Otherwise, you have the race dynamics where

Speaker 2 everyone's trying to stay ahead and

Speaker 2 that might require compromising on safety. So I think you would probably need some coordination among the larger entities that are doing this kind of training.

Speaker 1 And so you're coordinating to,

Speaker 1 I guess, pause deployment

Speaker 1 until what exactly? Like until you figure out what's happening in the model?

Speaker 2 Like pause either further training, pause deployment, like

Speaker 2 avoid certain types of training that we think might be riskier.

Speaker 2 So just setting up some reasonable rules for

Speaker 2 what everyone should do, having everyone somewhat

Speaker 2 limit these things.

Speaker 1 But limit to what end? Because at some point, you're going to have to deal with the potential energy that's built up within this intelligence.

Speaker 1 What

Speaker 1 is the plan? Suppose in two years we get AGI and now everybody's freaking out. And so now the AI companies have paused.

Speaker 1 And now what?

Speaker 1 What would be the plan to wait for?

Speaker 2 Yeah, that's,

Speaker 2 I don't have a good answer to that. I mean, I would say

Speaker 2 if everyone is going to coordinate like that,

Speaker 2 I think

Speaker 2 that would be an okay scenario. That would be a pretty good scenario, because I do think

Speaker 2 like

Speaker 2 building these models is very capital intensive and there are a lot of complex pieces. So it's not like everyone's going to go and recreate the stuff at home.

Speaker 2 So I think it is possible to do, given the relatively small number of entities who could train the largest models, it does seem possible to coordinate. So I'm not sure how

Speaker 2 you would maintain

Speaker 2 this equilibrium for a long period of time. But I think if we got to that point, we would be in an okay position.

Speaker 1 Or would we? I guess I'm curious.

Speaker 1 I'm not sure what happens next. Because fundamentally, the problem

Speaker 1 or the benefit is that

Speaker 1 you push it to the server, and now we've got a bunch of intelligences, or they could push themselves to the server.

Speaker 1 And

Speaker 1 now we've got everybody coordinated, but I'm not sure what we do next in this world, or why that sets us up for a good outcome.

Speaker 2 Yeah, I would say if we had everyone reasonably coordinated,

Speaker 2 and we felt like we had solved the technical problems around alignment well enough to be able to deploy really smart AIs that can

Speaker 2 act as an extension of people's will, but also prevent

Speaker 2 them from being misused in some way that would cause a

Speaker 2 catastrophe, I think then

Speaker 2 that would be great. Like we could go ahead and safely deploy these systems and

Speaker 2 it would usher in a lot of prosperity and a new

Speaker 2 much

Speaker 2 more rapid phase of scientific advancement and so forth. So I think that would be what the good scenario would look like.

Speaker 1 Okay, that makes sense, but I'm curious: how would you know? In a couple of years,

Speaker 1 all these actors, even in the best-case scenario, have agreed to pause until we've figured out that we're building aligned systems that

Speaker 1 are not themselves going to attempt to take over or coup or are not going to enable somebody else to do that.

Speaker 1 What would proof of that look like or what would evidence of that look like?

Speaker 2 Well, I would say

Speaker 2 if we can deploy systems incrementally that are successively smarter than the ones before, then I think that's safer.

Speaker 2 So I hope the way things play out is not this scenario where everyone has to coordinate and lock things down and then safely release things,

Speaker 2 because it would lead to this big buildup in potential energy, potentially.

Speaker 2 I would rather have a scenario where we're just continually releasing things that are a little better than what came before, while making sure we're confident that each diff is right,

Speaker 2 improving the safety and alignment in

Speaker 2 correspondence to the improvement in capability. And if things started to look a little bit scary, then we would be able to slow things down.
So that's what I would hope for.

Speaker 2 If there's more of a discontinuous jump, the question is, how do you know if the thing you've got is safe to release?

Speaker 2 I would say

Speaker 2 I can't give a generic answer,

Speaker 2 but

Speaker 2 the type of thing you might want to do to make that more

Speaker 2 acceptable would be a lot of testing, like simulated deployment,

Speaker 2 and

Speaker 2 red teaming of sorts. You'd want to do that in a way that you feel is much less favorable, or much

Speaker 2 more likely to fail, than the thing you're planning to do in the real world. You'd want to have a really good monitoring system so that,

Speaker 2 if something does start to go wrong with the deployed system,

Speaker 2 you feel like it's going to be detectable immediately. Maybe you've got something watching over the deployed AIs and what they're doing and looking for signs of trouble.
So

Speaker 2 Yeah, I would say

Speaker 2 you'd want some defense in depth. You'd want to have some combination of

Speaker 2 the model itself seeming to be

Speaker 2 really well behaved, with an impeccable moral compass and everything,

Speaker 2 and you're pretty confident that it's extremely resistant to any kind of takeover attempt or severe misuse.

Speaker 2 And then you would also want to have really good monitoring on top of it, so you could detect any kind of trouble.

Speaker 1 What are you keeping track of while you're doing long-horizon RL, or when you eventually start doing it, so that

Speaker 1 you could notice this sort of discontinuous jump before you deployed these systems broadly?

Speaker 2 I would say you would want to have a lot of evals that you're running during the training process.

Speaker 1 And what specifically would that be? How would you notice something like that?

Speaker 1 Yeah. And

Speaker 1 does it make sense to do long-horizon RL training knowing that this is something that could happen? Or is it just a very low possibility? How do you think about this?

Speaker 2 You'd want to be pretty careful when you do this kind of training if you see

Speaker 2 a lot of

Speaker 2 potentially scary capabilities

Speaker 2 if those seem close. I mean, like,

Speaker 2 I would say it's not something

Speaker 2 we have to be scared of right now, because right now it's hard to get the models to do anything like coherent. But if they started to get really good, I think,

Speaker 2 yeah, I think we would want to

Speaker 2 we would have to take some of these questions seriously and we would want to have a lot of evals that

Speaker 2 test them for misbehavior. I guess that's for the alignment of the models:

Speaker 2 we would want to check that they're not going to

Speaker 2 sort of turn against us or something. But you might also want to look for discontinuous jumps in capabilities, so

Speaker 2 you'd want to have lots of evals for the capabilities of the models. I mean, also, I guess

Speaker 2 you'd also want to make sure that whatever you're training on doesn't give the model any reason to turn against you, which itself, I would say,

Speaker 2 doesn't seem like the hardest thing to do. I mean,

Speaker 2 the way we train them with RLHF,

Speaker 2 even though the models are very smart, does feel very safe, because the model is just trying to produce a message that is pleasing to a human, and it has no concern about anything else in the world other than whether this text it produces is

Speaker 2 approved.

Speaker 2 So obviously, if you were doing something

Speaker 2 where the model has,

Speaker 2 yeah, it's carrying out a long sequence of actions, which involve tools and everything, then it might have some incentive to do a lot of wacky things that wouldn't make sense to a human in the process of producing its final result.

Speaker 2 But I guess it wouldn't necessarily have an incentive to do anything other than produce a very high-quality output at the end. So,

Speaker 2 like, it's not,

Speaker 2 yeah. So, I guess you have these old points about like instrumental convergence.
Like, the model is going to want to take over the world so it can produce this awesome piece of code at the end.

Speaker 2 Like, if you ask it to write you the Flask app, it'll be like, oh, yeah, first I need to take over the world and then I need to, I don't know.

Speaker 2 But at a certain point, it's a little hard to imagine why, for some fairly well-specified task like that, you would want to first take over the world.

Speaker 2 But of course, yeah, if you had a task like make money,

Speaker 2 then maybe

Speaker 2 that would lead to some nefarious behavior as an instrumental goal.

Speaker 1 Yeah. Okay.
So before we get back to that, I think let's step back and talk about like

Speaker 1 today's RLHF systems and everything.

Speaker 1 But I do want to follow that thread at some point. It's kind of interesting.

Speaker 1 Okay, so today's RLHF, the way in which it influences these models: in terms of human psychology, would you characterize it as a drive?

Speaker 1 Is it a goal?

Speaker 1 Is it an impulse?

Speaker 1 Psychologically, what kind of thing is it? In what way is the model being changed?

Speaker 1 And not just the persona of a chatbot, but more like, don't talk that way, talk this other way, or don't produce those kinds of outputs.

Speaker 2 Yeah, I would say there are probably some analogies with a drive or a goal in humans. So in that

Speaker 2 you're trying to steer towards a certain set of states rather than some other states.

Speaker 2 And so

Speaker 2 I would think that our concept of a drive or a goal has

Speaker 2 other

Speaker 2 elements like the feeling of satisfaction you get for achieving it. And

Speaker 2 those things might have more to do with the learning algorithm than with what the model does at runtime, when you just have a fixed model. So

Speaker 2 I would say there are probably some analogies, though it's,

Speaker 2 I don't know exactly

Speaker 2 how close it is, but I would say that to some extent

Speaker 2 the models

Speaker 2 do have drives and goals in some meaningful way.

Speaker 2 And in the case of RLHF, where you're trying to maximize human approval as measured by a reward model, the model is just trying to produce something that people are going to like and they're going to judge as correct.
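
As background on "human approval as measured by a reward model": a standard RLHF recipe (for example, the InstructGPT-style setup) first fits the reward model to pairwise human preferences, then optimizes the policy against its scores, usually with a KL penalty toward the pre-trained model. A minimal sketch, where `reward_model` is a hypothetical scorer returning a scalar per (prompt, response):

```python
import torch.nn.functional as F

def reward_model_loss(reward_model, prompts, chosen, rejected):
    """Bradley-Terry-style pairwise loss: the reward model should score the
    response the human preferred above the one the human rejected."""
    r_chosen = reward_model(prompts, chosen)      # (batch,) scalar scores
    r_rejected = reward_model(prompts, rejected)
    # -log sigmoid(r_chosen - r_rejected) is minimized when the preferred
    # response consistently outscores the rejected one.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# The policy is then trained (e.g. with PPO) to maximize reward_model's score,
# typically plus a KL penalty toward the pre-trained model so outputs stay
# close to natural language.
```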

Speaker 1 I've heard two ideas in terms of using that inner monologue type of thing to get better at reasoning, at least publicly, the kinds of things I've seen.

Speaker 1 And I'm curious which you think is more promising. One is that

Speaker 1 the model outputs a bunch of potential trains of thought, it learns to follow the one that leads to the correct answer, and it is trained on that before deployment.

Speaker 1 And the other one is you use a bunch of compute to do inference in deployment, which involves the model talking to itself while it's deployed.

Speaker 1 Which one do you expect it to be closer to when it's like really good at reasoning?

Speaker 1 Is it because it's doing just a bunch of inference calls, or is it just because you've trained it to do well at that?

Speaker 2 Well, I would say you could define reasoning as tasks that require some kind of

Speaker 2 like computation at test time or maybe some kind of deduction.

Speaker 2 So by definition, reasoning would be tasks that require some test-time computation, some step-by-step computation.

Speaker 2 On the other hand, I would also expect to gain a lot out of like doing some kind of training time computation or practice at training time.

Speaker 2 So I would think that you get the best results by

Speaker 2 combining these two things.
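
One simple way to combine the two ideas in the question, sketched under stated assumptions rather than as a description of OpenAI's systems (`model.generate` and `verifier.score` are hypothetical placeholders): spend test-time compute by sampling several chains of thought and keeping the best-scoring one, and spend training-time compute by collecting the verified chains as fine-tuning data.

```python
def best_of_n(model, verifier, question, n_samples=16):
    """Test-time compute: sample several step-by-step solutions and keep the
    one a verifier rates highest."""
    candidates = [model.generate(question, hint="Let's think step by step.")
                  for _ in range(n_samples)]
    return max(candidates, key=lambda sol: verifier.score(question, sol))

def collect_practice_data(model, verifier, questions, threshold=0.9):
    """Training-time compute: keep chains of thought that reach a verified
    answer, to fine-tune (or do RL) on later."""
    kept = []
    for q in questions:
        sol = best_of_n(model, verifier, q)
        if verifier.score(q, sol) > threshold:
            kept.append((q, sol))
    return kept
```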

Speaker 1 So my uncle had prostate cancer, and I wanted to know my own risk. And I got a 23andMe test, but it was mostly useless.

Speaker 1 I mean, the whole, what are your odds of liking chocolate, is not what I was looking for.

Speaker 1 So, I exported my data onto Nucleus Genomics, and immediately I got my risk profile for almost two dozen diseases, including prostate cancer.

Speaker 1 And it turns out my risk is higher than 97% of people with my ancestry, which is a very useful thing to know because now I know to get screened early.

Speaker 1 Ironically, tests like 23andMe don't even look at the variants which have the largest impact, including for prostate cancer.

Speaker 1 Many people don't know this, but 23andMe looks at less than 0.1% of your DNA. And that's why I pre-ordered Nucleus Premium Whole Genome Sequencing.

Speaker 1 It's the only clinical grade test that reads 100% of your DNA. I've spent a lot of time digging into this company and I think it will be a big change in what we can get out of genetic tests.

Speaker 1 So if you want to live a long and healthy life, you can pre-order Nucleus Premium at mynucleus.com. All right, back to John.

Speaker 1 Right now, you know, you have these two ways in which the model learns.

Speaker 1 It's either in training, whether that's pre-training or post-training, but most of the compute in training is spent on pre-training, where it's just glossing over trillions of tokens, almost like skimming trillions of tokens' worth of information, which, if a human were subjected to that, would just be totally confusing, right?

Speaker 1 It's not a very efficient way to learn. And the other way is in-context learning, which of course is more sample efficient, but it's destroyed with each instance.

Speaker 1 I'm curious if you think that

Speaker 1 there's a path for something in between those, where it's not destroyed with each instance, but it's also not as

Speaker 1 frivolous as just seeing trillions of tokens; it's more deliberate and active.

Speaker 2 Yeah. So do you mean models having some kind of medium-term memory? So too much to fit in context, but like much smaller scale than pre-training?

Speaker 1 I'm not sure if memory, it might be memory.

Speaker 1 I don't have context, but certainly like when I'm trying to prepare for this conversation,

Speaker 1 it feels like I think of like, I should understand this. So I look it up and I like read it carefully and I maybe think about it as I'm reading it.

Speaker 1 And I'm not sure what it naturally corresponds to in terms of models, but what would that look like? I'm curious.

Speaker 2 I see. So it's not just memory, but it's also somewhat like

Speaker 2 specializing to a certain task or putting a lot of effort into some particular project.

Speaker 1 And I'm not even sure it's specialization so much.

Speaker 1 I'm thinking more about: I don't understand this part, so let me look into this part deeper.
I already understand this. It's like specializing to your existing knowledge base.

Speaker 1 Yeah.

Speaker 2 I see. So it's not just about finding,

Speaker 2 I don't know, a bunch of sources that are relevant and fine-tuning on some special domain. It's also about developing some knowledge through your own reasoning, and using some sort of introspection and self-knowledge to figure out what you need to learn.

Speaker 2 Yeah, I would say that does feel like something that's missing from today's systems.

Speaker 2 I mean I would say

Speaker 2 People haven't really pushed too hard on this middle ground between

Speaker 2 large-scale training, where you produce the snapshot model that's supposed to do everything, like a deployed model, and then, on the other hand, in-context learning.

Speaker 2 And I think part of that is that we've just been increasing context length so much that there hasn't been an incentive for it.

Speaker 2 So if you can go to a hundred thousand or a million tokens of context, then that's actually quite a lot, and it's

Speaker 2 not actually the bottleneck in a lot of cases. But I agree that

Speaker 2 you'd probably also want to supplement that by some kind of fine-tuning. Like the

Speaker 2 capabilities you get from fine-tuning and in-context learning are probably somewhat complementary.

Speaker 2 So I would expect us to want to build systems that do some kind of online learning and also have some of these cognitive skills of like introspecting on their own knowledge and seeking out new knowledge that fills in the holes.

Speaker 1 Is this all happening at the same time?

Speaker 1 Is it just like a new training regime where all these things can happen at once?

Speaker 1 Or whether it's the long horizon training or whether it's this kind of training, are they separate or are they just because the model is smart enough so they can both introspect and it can act on longer horizons and you can get adequate reward on long horizon tasks.

Speaker 2 Yeah, I would say if you're doing some kind of long-horizon task,

Speaker 2 you're learning while you do the task, right? The only way to do something that involves a lot of steps is to have learning and memory that gets updated during the task.

Speaker 2 So there's a continuum

Speaker 2 between short-term and long-term memory. So

Speaker 2 I would say

Speaker 2 Yeah,

Speaker 2 I would expect the need for this capability to start to become clear when we start to look at long-horizon tasks more. And

Speaker 2 to some extent, just putting a lot of stuff into context will probably take you pretty far, because we have really long contexts now. But you probably also want things like fine-tuning.

Speaker 2 And as for like introspection and the ability to do active learning,

Speaker 2 that might

Speaker 2 automatically fall out of the models' ability to know what they know, because

Speaker 2 models have some calibration

Speaker 2 regarding what they know. And that's why

Speaker 2 models don't hallucinate that badly

Speaker 2 because, yeah, they have some understanding of

Speaker 2 their own limitations. So I think that same kind of ability could be used for something like active learning.

Speaker 1 And how?

Speaker 1 So

Speaker 1 there are all these complicated RL procedures,

Speaker 1 many of which you've pioneered. How many of them will be relevant when you get to the point where the model itself is smart enough that it can act in its environment and

Speaker 1 interact in a more online and stable way?

Speaker 1 Is the path for progress going to be more straightforward than the kinds of solutions that were required for RL in the past?

Speaker 2 Well, I think policy gradient algorithms are not the most sample efficient algorithms. So that's probably not what you want to do at test time if you want to learn really fast.
But though who knows?

Speaker 2 I mean, maybe it's not that bad.

Speaker 2 So

Speaker 2 I think

Speaker 2 something like motor learning in animals is probably something like a policy gradient algorithm. And

Speaker 2 so, for example, you're like learning how to shoot baskets.

Speaker 2 I think that probably takes maybe thousands of tries to get more accurate, and there's probably something like a policy gradient algorithm underneath.

Speaker 2 But that's not going to be the fastest way to learn if you have a model trying to do a project or some kind of task. So I would think we would want to rely more on in-context learning

Speaker 2 where

Speaker 2 you effectively have a learned algorithm: you've learned how to explore, you've learned how to try all the possibilities exhaustively

Speaker 2 instead of doing the same thing over and over again and making the same mistake. So yeah, I would say we'll be able to do things that look more like learned search algorithms.

Speaker 2 And that'll be the kind of thing that gets used in a particular task.
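
For reference, the "thousands of tries" flavor of learning John contrasts with learned, in-context search is roughly vanilla REINFORCE: one noisy scalar of feedback per whole episode. A minimal sketch (a generic policy-gradient update, not any particular OpenAI system; `rollout` is a hypothetical container of per-step log-probs and a total return):

```python
def reinforce_update(policy_optimizer, rollout):
    """Vanilla policy gradient: nudge up the log-probability of every action
    in proportion to the whole episode's return. Because each episode yields
    only one scalar signal, this is sample-hungry -- the 'learning to shoot
    baskets over thousands of tries' regime."""
    loss = -(rollout.logprobs * rollout.total_return).sum()
    policy_optimizer.zero_grad()
    loss.backward()
    policy_optimizer.step()
```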

Speaker 2 Interesting.

Speaker 1 All right.

Speaker 1 I want to step back and ask about your own history. So

Speaker 1 at least at OpenAI. So

Speaker 1 you led the creation of ChatGPT. At what point did you realize, first of all, that these LLMs were the path to take, and then that a chatbot, or some way to instruct them, would be a useful thing to build?

Speaker 1 Just walk me through the whole lineage from like when this became your main focus and yeah,

Speaker 1 yeah, what the process was like.

Speaker 2 Yeah, so early.

Speaker 2 So we had

Speaker 2 before ChatGPT, we had

Speaker 2 OpenAI had these instruction-following models. And the idea there was

Speaker 2 we had base models and people can prompt them in elaborate ways.

Speaker 2 But they're also kind of hard to prompt, because

Speaker 2 they basically do auto-complete. So you have to set up a very good prompt with some examples.
So

Speaker 2 people at OpenAI

Speaker 2 were working on just taking the base models and making them easier to prompt so that if you just wrote a question, it would answer the question instead of giving you more questions or something.

Speaker 2 So, we had these instruction-following models, which were kind of like base models, but a little easier to use.

Speaker 2 And those were the original ones deployed in the API; after GPT-3, those were the next generation of models.

Speaker 2 Then, at the same time, there were definitely a lot of people thinking about chat. So,

Speaker 2 Google had some papers, like they had LaMDA and,

Speaker 2 earlier, Meena. So they had these chatbots, and it was more like

Speaker 2 a base model that was really specialized to the task of chat, really good at chat.

Speaker 2 And like, I think at least looking at the examples from the paper, it was more used for sort of fun applications like

Speaker 2 where the model would like take on some persona and pretend to be that persona. It was not so functional, like help me refactor my code.

Speaker 2 So yeah, there are definitely people thinking about chat. I had worked on a project before

Speaker 2 looking at chat called WebGPT, which is more about doing question answering with the help of web browsing and retrieval. And

Speaker 2 well, when you do question answering, it really wants to be in a chat because

Speaker 2 you always want to ask follow-up questions, or sometimes the model should ask a clarifying question because the question is ambiguous.

Speaker 2 So after we did the first version of that, it was kind of clear that the next version should be conversational.

Speaker 2 So anyway, we started working on like a conversational chat assistant.

Speaker 2 And

Speaker 2 we,

Speaker 2 this was built on top of GPT-3.5, which was done training at the beginning of 2022. And that model was quite good at language and code.

Speaker 2 So we quickly realized that it was actually quite good at coding help. And that was one of the things we were excited about.
So yeah, we worked on that.

Speaker 2 We worked on that for most of the year. And we had browsing as another feature in it, though we ended up

Speaker 2 de-emphasizing that later on because

Speaker 2 the model's internal knowledge was so good that the browsing wasn't the most interesting thing about it.

Speaker 2 And then

Speaker 2 we were thinking about, we had it out for beta testing or to friends and family for a while. And we were thinking about doing a public release.

Speaker 2 But at that time,

Speaker 2 actually, GPT-4 finished training in August, or

Speaker 2 yeah, in August that year. And

Speaker 2 actually, the flagship RL effort at OpenAI was the instruction-following effort, because those were the models that were being deployed into production.

Speaker 2 So the first fine-tunes of GPT-4 used that whole stack, and

Speaker 2 yeah, those models were really good. Everyone got really excited after seeing the instruct fine-tuned GPT-4s.

Speaker 2 They would occasionally give you amazing outputs, but the model was also clearly pretty unreliable. It would sometimes hallucinate a lot.

Speaker 2 And it would sometimes give you pretty unhinged outputs. So it was clearly not quite ready for prime time, but it was obviously very good.

Speaker 2 And yeah, so I guess people forgot about chat for a little while after that because of this alternative branch. But then we pushed it further, and we ended up mixing together all the datasets, the instruct and the chat data, to try to get something that was the best of both worlds. And I think the chat models

Speaker 2 were clearly easier to use; they sort of automatically had much more sensible behavior in terms of the model knowing its own limitations.

Speaker 2 That was actually one of the things that I got excited about as we were developing it. That

Speaker 2 like I realized a lot of the things that people thought were flaws in language models, like just like blatantly hallucinating,

Speaker 2 couldn't be completely fixed, but you could make a lot of progress on them with pretty straightforward methods.

Speaker 2 Oh, yeah, and also the

Speaker 2 other thing about chat was that,

Speaker 2 like

Speaker 2 when we had these instruct models, the task of "complete this text, but in a nice way or in a helpful way" is a pretty poorly defined task. So

Speaker 2 I think that task is confusing both for the model and for the human who's supposed to do the data labeling. Whereas for chat,

Speaker 2 I think people had an intuitive sense of what a helpful robot should be like. So it was just much easier

Speaker 2 for people to get the idea of what the model was supposed to do.

Speaker 2 And so, as a result, I think

Speaker 2 the model had a much more coherent personality, and it was much easier to get

Speaker 2 pretty sensible behavior robustly.

Speaker 1 Interesting.

Speaker 1 Is it the case that anybody could have made ChatGPT using your publicly available fine-tuning API?

Speaker 2 Not exactly. I mean,

Speaker 2 they could have,

Speaker 2 I don't remember the status of which models were

Speaker 2 available for fine-tuning.

Speaker 2 Assuming we had 3.5 available for fine-tuning at the time, you could have made something pretty decently close, but I'm not sure you would have

Speaker 2 I don't think you would have been able to do just one iteration of fine-tuning, where you have

Speaker 2 purely human-written data and you fine-tune on that. I think you would want to do several iterations.
Like if you're not going to do RL,

Speaker 2 which we did,

Speaker 2 you would want to do some kind of iterative supervised fine-tuning where you have humans edit the model-generated outputs, because

Speaker 2 if you train on human-generated data, even if it's really high quality, it's just hard for a model to fit that data perfectly, because it might not be something a model is capable of outputting.

Speaker 2 So you need to do something iterative that looks a little bit more like RL.

Speaker 2 So I think if you had done that, you could have gotten something pretty close, but that would have been kind of non-trivial.

Speaker 2 But we also had another instruction-following model trained with RL that was released a little before ChatGPT.

Speaker 2 So I think if you put a chat wrapper on that,

Speaker 2 if you just prompted it with chat, you would get something decently close.

Speaker 2 But that model had some differences in

Speaker 2 strengths. That model was pretty good at writing and poetry and so forth, but it wasn't

Speaker 2 as good at knowing its limitations and at factuality and so forth.

Speaker 1 So stepping back from 3.5, I think I heard you say somewhere that with GPT-2 you were super impressed. Compared to your expectations in 2019, has AI progressed faster or slower than you would have expected?

Speaker 2 I would say faster than I would have expected since GPT-2.

Speaker 2 I was pretty like bought into scaling and

Speaker 2 pre-training and so forth being a good idea.

Speaker 2 But

Speaker 2 when GPT-2 was done,

Speaker 2 I would say I wasn't completely

Speaker 2 sold on it revolutionizing everything.

Speaker 2 I only really pivoted what I was working on, and what my team was working on, after GPT-3. So after that, we kind of got together and said, oh yeah,

Speaker 2 this language model stuff works really well. Let's see what we can do here.
But yeah, after GPT-2, I wasn't quite sure yet.

Speaker 1 Especially if the stuff we were talking about earlier with RL starts working better with the smarter models, will the fraction of compute that is spent on pre-training versus post-training change significantly in favor of post-training in the future?

Speaker 2 Yeah, there are some arguments for that. I mean, right now it's a pretty lopsided ratio, but you could argue that the

Speaker 2 output generated by the model is like high quality compared to, or higher quality than

Speaker 2 most of what's on the web. So it sort of makes more sense for the model to think by itself

Speaker 2 instead of just like training to imitate what's on the web. So I think there's a first principles argument for that.
And

Speaker 2 I would say we found a lot of gains through post-training. So

Speaker 2 I'm not sure. So I would expect us to keep

Speaker 2 like pushing this methodology and probably increasing the amount of compute we put into it.

Speaker 1 The current GPT-4 has an Elo score that is like 100 points higher than the original one that was released.

Speaker 1 And is that all because of what you're talking about with these improvements that are brought on by post-training? Or?
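
For scale (standard Elo arithmetic, not a figure from the conversation), a 100-point Elo gap corresponds to roughly a 64% head-to-head preference rate:

$$P(\text{win}) \;=\; \frac{1}{1 + 10^{-\Delta/400}} \;=\; \frac{1}{1 + 10^{-100/400}} \;\approx\; 0.64.$$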

Speaker 2 Yeah, I would say that we've,

Speaker 2 I would say that most of that is post-training. Interesting.

Speaker 2 So

Speaker 2 there are a lot of different

Speaker 2 separate axes for improvement. Like you can,

Speaker 2 yeah, so we think about data quality, data quantity, just doing more iterations of the whole process of deploying and collecting new data, and changing what kind of annotations you're collecting.

Speaker 2 So there are a lot of things that stack up, but together they give you a pretty good

Speaker 2 like effective compute increase.

Speaker 1 Yeah, I mean, that's a huge increase. That's like really interesting that there's this much

Speaker 1 room for improvement from post-training.

Speaker 1 What makes for somebody who's really good at doing this sort of RL research?

Speaker 1 I hear it's super finicky, but what are the sorts of intuitions you have that enable you to find these ways to mess with the data and set up these environments?

Speaker 2 I'd say I just

Speaker 2 have a decent amount of experience at this point from

Speaker 2 like

Speaker 2 the different parts of the stack from like

Speaker 2 RL algorithms, obviously,

Speaker 2 since I've worked on those since grad school to

Speaker 2 like

Speaker 2 the data collection, like the annotation process

Speaker 2 to

Speaker 2 like language playing with language models.

Speaker 2 I mean, I'd say I've just dabbled with these things, and I'd say the people who do well at this kind of research have some view of the whole stack and have a lot of curiosity about the different parts of it.

Speaker 2 And

Speaker 2 also sort of think about,

Speaker 2 well, you want to be both empirical

Speaker 2 and let experiments update your views, but you also want to think from first principles somewhat:

Speaker 2 assuming that learning works, what would be the ideal type of data to collect, and that sort of thing.

Speaker 1 So because there doesn't seem to be a model released since GPT-4 that seems to be significantly better, there seems to be the hypothesis that potentially we're hitting some sort of plateau and that these models aren't actually generalizing that well.

Speaker 1 And you're going to hit some sort of data wall beyond which point the

Speaker 1 abilities that are unlocked by memorizing a vast corpus of pre-training data won't actually help you get something much smarter than GPT-4.

Speaker 1 What do you make of that hypothesis? Is it wrong? And I think we've talked about some generic examples of generalization, the Spanish to English one and so forth. But, yeah,

Speaker 1 I mean, okay, so

Speaker 1 maybe this is a run-on question, but

Speaker 1 one example I was thinking of was the idea that

Speaker 1 there is transfer between language, reasoning, and code. If you train on a bunch of code, it gets better at reasoning and language.
And

Speaker 1 is that actually the case? Do you see things like that, which suggests that there's all this kind of positive transfer between different modalities?

Speaker 1 So once you try training on a bunch of videos and images, it'll get smarter, and it'll get smarter from synthetic data.

Speaker 1 Or does it seem like the abilities that are unlocked are extremely local to the exact kind of labels and data you put into the training corpus?

Speaker 2 Yeah. Okay.
Yeah, I'll try to respond to all of that. So

Speaker 2 first, are we about to hit the data wall? I mean, I wouldn't draw too much from the

Speaker 2 time since GPT-4 was released, because

Speaker 2 it takes a while to train these models and to

Speaker 2 do all the prep to

Speaker 2 train a new generation of models. So

Speaker 2 yeah, I wouldn't draw too much from that fact.

Speaker 2 I would say there are definitely some challenges from the limited amount of data,

Speaker 2 but I wouldn't expect us to immediately hit the data wall, but I would expect

Speaker 2 the nature of pre-training to somewhat change over time as we get closer to it.

Speaker 2 In terms of like

Speaker 2 generalization from different types of pre-training data,

Speaker 2 I would say it's pretty hard to do science on this type of question because you can't create that many pre-trained models. So maybe you can't train

Speaker 2 a GPT-4 size model; you can't do ablation studies at GPT-4

Speaker 2 scale. Maybe you can train a ton of GPT-2 size models, or maybe even a GPT-3 size model, with different data blends and see what you get.
So I'm not like

Speaker 2 aware of any results or

Speaker 2 public results on ablations involving code data and reasoning performance and so forth.

Speaker 2 So that would be, I would be very interested to know about those results.
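The kind of small-scale ablation described here could look roughly like the sketch below: train several GPT-2-sized models on different web/code mixes and compare them on a reasoning eval. This is a hypothetical setup, not anything from OpenAI; the datasets are arbitrary public stand-ins, and `train_small_model` / `eval_reasoning` are placeholders for whatever training and evaluation harness you already have.

```python
# Sketch of a data-blend ablation at small scale (hypothetical setup).
from datasets import load_dataset, interleave_datasets

def train_small_model(dataset, n_params="124M"):
    """Placeholder: plug in your own small-model pretraining loop here."""
    raise NotImplementedError

def eval_reasoning(model):
    """Placeholder: plug in a reasoning benchmark (e.g. grade-school math accuracy)."""
    raise NotImplementedError

BLENDS = {
    "no_code":    {"web": 1.0, "code": 0.0},
    "10pct_code": {"web": 0.9, "code": 0.1},
    "30pct_code": {"web": 0.7, "code": 0.3},
}

# Streaming keeps this lightweight; both corpora are arbitrary stand-ins.
web = load_dataset("wikitext", "wikitext-103-raw-v1", split="train", streaming=True)
code = load_dataset("codeparrot/github-code", split="train", streaming=True)

results = {}
for name, w in BLENDS.items():
    if w["code"] == 0.0:
        mix = web
    else:
        # Sample from the two corpora in the given proportions.
        mix = interleave_datasets([web, code],
                                  probabilities=[w["web"], w["code"]],
                                  seed=0)
    model = train_small_model(mix, n_params="124M")
    results[name] = eval_reasoning(model)

print(results)  # does more code in the blend help reasoning at this small scale?
```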

Speaker 1 I'm actually curious about that. If one of the things is that the model gets smarter as it gets bigger, and an ablation on a GPT-2-level model suggests there isn't that much transfer, how much evidence does that provide about the level of transfer on the same set of domains in a GPT-4-level model?

Speaker 2 Right. You might not be able to conclude that if transfer fails at GPT-2 scale, then it's also going to fail at a higher scale. It might be that the larger models learn better shared representations, or that the smaller models have to lean too much on memorization whereas the larger models can learn how to do the right computation. So I would expect this to be true to some extent.

Speaker 1 This might have a very simple answer, but bigger models trained on the same amount of data become smarter. Or conversely, to get to the same level of smarts, you can train them on less data. Why is that the case? The model has more parameters, it saw fewer things, and now it's equally smart. Why?

Speaker 2 I don't think anyone has a good explanation for the scaling law with parameter count. I don't even know what the best mental model for this is. Clearly you have more capacity with a bigger model, so you should eventually be able to reach a lower loss. But why are bigger models more sample efficient? I could give you some very sketchy explanations.

Speaker 1 Yes, please.

Speaker 2 You could say the model is sort of an ensemble of a bunch of different circuits doing computations in parallel, where the output is a weighted combination of them. If you have more width in the model, and width is somewhat similar to depth here, because with residual networks depth can do something similar to width in terms of updating what's in the residual stream, then you're learning more of these computations in parallel. So you have more chances that one of them is lucky, ends up guessing correctly a lot, and gets upweighted. There are algorithms that work this way, like multiplicative weight update algorithms. I don't want to say mixture of experts, because that means something different, but it's basically a weighted combination of experts with some learned gating. So you can imagine something like that, where just having a bigger model gives you more chances to get the right function.

Speaker 2 And of course it's not just a linear combination of totally disjoint functions. It's more like a library, where you might chain the functions together in some way, so there's some composability. The bigger model has a bigger library of different computations, including lots of stuff that's dormant and only being used some of the time, but it has more space to look for circuits that do something useful.
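To make the "more circuits, more chances one of them wins" intuition concrete, here is a toy multiplicative-weights experiment. It is purely illustrative and not a claim about what transformers actually do: a pool of random linear predictors is combined by weighted majority, experts that guess wrong get downweighted, and larger pools tend to reach lower error within the same number of samples, a loose analogue of larger models being more sample efficient.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 11  # odd, so a sign-vector dot product is never exactly zero
target_w = rng.choice([-1, 1], DIM)  # the "true circuit" we hope some expert matches

def mwu_error(n_experts: int, n_samples: int = 300, eta: float = 0.5) -> float:
    """Online multiplicative weights over a fixed pool of random linear 'circuits'."""
    experts = rng.choice([-1, 1], size=(n_experts, DIM))  # each row: one candidate circuit
    weights = np.ones(n_experts)
    mistakes = 0
    for _ in range(n_samples):
        x = rng.choice([-1, 1], DIM)
        y = np.sign(x @ target_w)                           # label from the true circuit
        preds = np.sign(experts @ x)                        # each expert's vote
        combined = 1.0 if weights @ preds >= 0 else -1.0    # weighted-majority prediction
        mistakes += combined != y
        weights *= np.where(preds == y, 1.0, 1.0 - eta)     # downweight experts that erred
    return mistakes / n_samples

for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} experts -> error rate {mwu_error(n):.2f}")
```

With more experts in the pool, it is more likely that some expert already agrees closely with the target function, so the weighted majority locks onto it after fewer mistakes.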

Speaker 1 Stepping back from the current research questions, I want to understand your modal scenario of what happens over the next few years. Towards the beginning of the conversation, we were talking about the case where things progress really fast, but let's take the modal scenario. You're unlocking long-horizon RL at some point, but then, as you said, there are potentially other bottlenecks. So what's happening? How good are these models? How are they being deployed? What other modalities are part of them? At what stage are these things being unlocked? I just want to understand your broader picture of what the next few years look like.

Speaker 2 Yeah, I would expect new modalities to be added over time, or pretty soon. I would expect the capabilities to generally keep getting better through a combination of pre-training and post-training, and that'll open up new use cases. Right now, AI is still not a huge part of the economy; there's a pretty small fraction of jobs that it can help with at all. I'd expect that fraction to be higher over time, and not just from the models improving, but also from people figuring out how to integrate them into different processes. Even if we froze the models at their current state, I think you would still see a lot of growth in how they're being used.

Speaker 2 So I would expect AI to be used much more widely, and for more technically sophisticated tasks. I gave the programming example earlier of doing longer projects, but it could also help with various kinds of research. I would hope that we can use AI to accelerate science in various ways, because the models can potentially understand all of the literature in a given field and sift through more data than a person would have the patience to. I hope the form factor is that people are still driving all of this: you have helpful assistants that you can direct and point at lots of different problems that are useful to you, and everyone has all these AIs helping them get more done.

Speaker 1 Hey, everybody.

Speaker 1 Real quick, I want to tell you about a tool that I wish more applications used. So obviously you've noticed every single company is trying to add an AI chatbot to their website.

Speaker 1 But as a user, I usually find them really annoying because they give these long, generic, often useless answers.

Speaker 1 CommandBar is a user assistant that you can just embed into your website or application, and it feels like you're talking to a friendly human support agent who is browsing with you and for you.

Speaker 1 And it's much more personalized than a regular chatbot. It can actually look up users' history and respond differently based on that.
It can use APIs to perform actions.

Speaker 1 It can even proactively nudge users to explore new features.

Speaker 1 One thing that I think is really cool is that instead of just outputting text, CommandBar can kind of just say, "here, let me show you," and start browsing alongside the user.

Speaker 1 Anyways, they're in a bunch of great products already. You can learn more about them at commandbar.com.
Thanks to them for sponsoring this episode.

Speaker 1 But obviously at some point they're going to be better than everyone at whatever they want to do. What will that process look like? Right now they're clearly only helping you; at some point they'll be able to just do things for you, and maybe even run entire firms for you. At that point, is it just going to be a smooth process? And is the hope that we have systems aligned with the user enough that they can count on the firm being run the way they expect, and so forth?

Speaker 2 Yeah, I think we might not want to jump to having AIs run whole firms immediately. We might want to have people overseeing the important decisions and calling the shots, even if the models are good enough to actually run a successful business themselves. So to some extent there might be choices there. And I think people will still have different interests and different ideas for what kinds of pursuits they want to direct their AIs at. AI doesn't necessarily have any kind of intrinsic desire of its own unless we put it in the system. So even if AIs become extremely capable, I would hope that people are still the drivers of what the AIs end up doing.

Speaker 1 Yeah, but I wonder if the economic equilibrium is far from that, where you have the equivalent of Amdahl's law in a firm: the slowest part of the process is the one that bottlenecks you. The AI makes all the non-human parts of the firm 10x more efficient, but the firm is still bottlenecked by the human step. So if one company decides to proceed by keeping humans in the loop on all the things you really want human oversight on, it'll just be out-competed by other companies, and if one country decides to go this route, other countries will beat it. I wonder whether this is a sustainable plan for keeping humans in the loop.

Speaker 2 Right. So if we wanted to keep humans in the loop, which seems reasonable, and it turned out that firms with any humans in the loop were out-competed by firms that didn't have any, then I think you would obviously need some kind of regulation that disallowed running a whole company with no humans in the loop.

Speaker 1 But there are so many companies in the world, or even in any one country. I wonder whether it's better to put the regulation on companies and say you've got to keep humans in the loop on important processes. Then you've got to define what the important processes are, you've got to monitor every single company, and you've got to get cooperation from every single country that has firms in it. Versus: if this is a problem, should it be solved before the model is even deployed, so that if you did decide to build a firm end-to-end on these models, it basically does what you want and you don't need a human in the loop? Does that question make sense? I'm just wondering, in this situation, how do we actually monitor whether every single firm has a human in the loop? And what happens if China decides not to do this, and so forth?

Speaker 2 Right. You would either have to have every country agree to this regulatory regime, or you would need all of the model providers to agree to this kind of requirement. So it's definitely going to be non-trivial. This is looking a ways ahead, so it's a little hard to imagine this world before seeing anything like it.

Speaker 2 But there are some practical questions. For example, are we actually confident that AI-run companies are better in every way? Or do we think they're better most of the time, but occasionally they malfunction, because AIs are still less sample efficient in certain ways, like dealing with very wacky situations, so AI-run firms have higher tail risk because they're more likely to malfunction in a big way? There might be practical questions like that which also determine how things play out. If you just require people to be accountable for various kinds of liability, that would also change the incentives a bit. If it turned out that AIs are better at running everything, and they're also completely benevolent, and we've totally solved alignment, and they're better at being accountable to people than people are, then maybe it's okay having the AIs run the firms. But that might be pretty far out. I think we're more likely to be in a situation where they look better in the short term but the AI-run entities still have some serious problems, and it's actually practical considerations that push you towards having humans in the loop, at least for the near future.

Speaker 1 Okay, so this is a problem you have to deal with today with RLHF, where you have to aggregate preferences across a lot of different humans, and it'll maybe be more marked with future, more powerful systems. But when you say we want these eventual AI systems, the ones that are going to fully replace humans in these firms, to be aligned, what does that mean? Does it mean they basically do what the user wants them to do? Does it mean they have to result in some sort of global outcome that we, the stakeholders in OpenAI, are happy with? What concretely would that mean?

Speaker 2 If the models are being used for these higher-stakes use cases, then we would have to think about RLHF in a much different way than we do right now. I would say we're not quite ready for that, or the current methods might not be completely sufficient. We would need to make compromises between the needs of the different stakeholders involved.

Speaker 2 So we have this document that we're releasing called the Model Spec, and it's about how we want our models to behave in the API and in ChatGPT. We try to talk about this issue where there are different stakeholders involved and sometimes there are conflicts between what they might want. In our case, we were thinking of the stakeholders as: the end user, meaning someone sitting in front of ChatGPT or some other app; the developer, someone using the API who might be serving other end users with their app; the platform, which is OpenAI, since we don't want the models to expose us to legal risk and so forth; and then the rest of humanity, including people who might not be users or customers at all. Obviously, the user might ask the model to do something that we think is actively harmful to other people, and we might have to refuse that. By the way, this isn't necessarily the order of priority; we just have these four or so classes of stakeholder. You could also say that maybe in the future we'll add the model itself, but I would say we're not going there yet.

Speaker 2 Anyway, we have these different stakeholders, sometimes they have conflicting demands, and we have to make some call on how to resolve those conflicts. It's not always obvious how to do that, so we had to think through the trade-offs. The rough heuristic is that we mostly want the models to follow your instructions and be helpful to the user and the developer. But when this impinges on other people's happiness or way of life, that becomes a problem and we have to block certain kinds of usage. We don't want to be too paternalistic; we mostly want the models to be an extension of people's will and do what they say. We want to be kind of neutral and not impose our opinions on people, and mostly let people do what they want with the models.

Speaker 1 I got a chance to read the spec beforehand. I guess it's a question of how well it transfers over to how the model itself behaves, but I was impressed with how sensible the trade-offs were. It made sense that it explicitly stated the actual edge cases rather than the obvious things everybody would agree on. In this case, you really are going after the edge cases.

Speaker 2 Yeah, we wanted it to be very actionable, so that it wasn't just a bunch of nice-sounding principles. Each example tells you something about some non-obvious situation and reasons through that situation.

Speaker 1 Okay. Now I have a couple of questions about the state of the research itself. Famously, in the social sciences, things are really hard to replicate, and there's a question of how much of the science there is real versus these manufactured, bespoke sorts of experiments. When you look at the average ML paper, does it feel like a really solid piece of literature, or does it often feel like the equivalent of p-hacking in the social sciences?

Speaker 2 Everyone has their complaints about the ML literature, but overall I would say it's a relatively healthy field compared to some others, like the social sciences, because it's largely grounded in practicality and getting things to work. If you publish something that can't be replicated easily, people will just forget about it. And it's accepted that you often don't just report someone's number from their paper; you also try to re-implement their method and compare it to your method on the same training data set. So methods that are really hard to implement or really finicky tend to get forgotten, and as a result people actually try to open source their work a lot.

Speaker 2 There are also various unfavorable incentives. People are incentivized to make the baseline methods they're comparing against look worse, and there are other mild pathologies, like trying to make your methods seem mathematically sophisticated. But overall I feel like the field makes progress. I would like to see a bit more science and trying to understand things, rather than hill-climbing on benchmarks and proposing new methods. There's been a decent amount of that recently, but we could use more, and I think that's a good thing for academics to work on.

Speaker 2 Oh, on the social sciences, on a slightly different note, I would be really excited to see more research using base models to do simulated social science, because these models have a probabilistic model of the whole world. You can set up a simulated questionnaire or a conversation, and for any traits you might imagine, you can look at how they're correlated with other traits. It would be pretty cool to see if people could replicate some of the more notable results in the social sciences, like moral foundations and that sort of thing, just by prompting base models in different ways and seeing what's correlated.
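A rough sketch of what that could look like in practice, under the assumption that you just read off next-token probabilities from an open base model: the questionnaire items, persona template, and use of GPT-2 are made up for illustration, and a serious study would need far better prompts and a much stronger base model.

```python
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Two made-up questionnaire items and a handful of simulated respondents.
ITEMS = [
    "It is important to respect the decisions made by authority figures.",
    "People should be free to live however they choose.",
]
PERSONAS = [
    "Respondent #1, a 19-year-old student living in a large city.",
    "Respondent #2, a 45-year-old farmer living in a rural area.",
    "Respondent #3, a 67-year-old retiree living in a suburb.",
    "Respondent #4, a 33-year-old nurse living in a small town.",
]

def p_agree(persona: str, item: str) -> float:
    """Probability mass on ' agree' vs ' disagree' as the continuation of a survey-style prompt."""
    prompt = f"{persona}\nStatement: {item}\nThe respondent says they"
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]          # next-token logits
    agree_id = tok(" agree").input_ids[0]          # first token of " agree"
    disagree_id = tok(" disagree").input_ids[0]    # first token of " disagree"
    probs = torch.softmax(logits[[agree_id, disagree_id]], dim=0)
    return probs[0].item()

# One row per simulated respondent, one column per questionnaire item.
scores = np.array([[p_agree(p, item) for item in ITEMS] for p in PERSONAS])
print(np.corrcoef(scores.T))  # correlation between the two items across simulated respondents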

Speaker 1 What is that Stanford experiment? The one with... the Asch conformity test, right? It'd be fun if that replicated with language models as well. That'd be interesting.

Speaker 1 With regards to the research that happens at big labs, how much of it is decreasing the amount of compute you need to get a certain result, an actual compute multiplier, versus things that just make the learning more stable and build out the infrastructure? I guess the broader question I'm trying to ask is: since GPT-4, does it feel like with the same amount of compute we can train a much better model? Or does it feel like we've made sure that learning can happen better and in a more scalable way for GPT-5, but it's not like we could train GPT-4 on a GPT-3.5 budget now?

Speaker 2 Yeah, there's definitely always progress in improving efficiency. Whenever you have a one-dimensional performance metric, you're going to find that different improvements can substitute for each other. Post-training and pre-training both improve the metrics, with slightly different profiles of which metrics they improve, but if at the end of the day you have a single number, they're going to substitute for each other somewhat. So for something like a human evaluation of what humans prefer, we've definitely made a lot of progress on both sides, pre-training and post-training.

Speaker 1 A couple of rapid-fire questions about RLHF. Obviously RLHF is important to make these models useful, so maybe the "lobotomized" description is inaccurate, but there is a sense in which all of these models, once they're put in chatbot form, have a very similar way of speaking. They really want to delve into things, they want to turn everything into bullet points, and they often have this formal, dull way of speaking. And there are complaints that they're not as creative, like what we were talking about before, where until recently it could only do rhyming poetry and not non-rhyming. Is that a result of the particular way in which RLHF happens now? And if so, is it because of who the raters are? Is it because of what the loss function is? Why do all chatbots look this way?

Speaker 2 Yeah, I would say there's a decent amount of room for variation in exactly how you do the training process, and we're actively trying to improve this and make the writing more lively and more fun. I think we've made some progress improving the personality of ChatGPT, so it's more fun, it's better when you're trying to chit-chat with it, and it's less robotic.

Speaker 2 It's an interesting question how some of the tics came about, like the word "delve." I've actually caught myself using the word a bit recently, so I don't know if it rubbed off on me from the model or what. I also think there might be some funny effects going on where there's unintentional distillation happening between the language model providers: if you hire someone to do a labeling task, they might just be pulling up their favorite chatbot, feeding the task in, having the model do it, and copying and pasting the result back. That might account for some of the convergence. But also, I think some of the things we're seeing are just what people like. People do like bullet points, they like structured responses, and they often like the big info dumps they get from the models. So it's not completely clear how much is a quirk of the particular design choices in the post-training process and how much is actually intrinsic to what people want.

Speaker 1 It does seem persistently more verbose than some people want, maybe just because during the labeling stage the raters prefer the more verbose answer. But I wonder if it's inherent to how it's trained, where the stop sequence doesn't come up that often and it really wants to just keep going.

Speaker 2 There might be some biases in the labeling that lead to verbosity, like the fact that we tend to train on one message at a time rather than the full interaction. If you only see one message, then something that just asks a clarifying question, or gives a short response with an invitation to follow up, is going to look less complete than something that covers all the possibilities. There's also a question of whether people's preferences would change depending on how fast the model streams its output. Clearly, if you're sitting there waiting for the tokens to come out, you're going to prefer that it gets to the point. But if it gives you a dump of text instantly, maybe you don't actually care if there's a bunch of boilerplate or a bunch of stuff you're going to skim; you'd rather just have it all there.

Speaker 1 The reward model is, I think, such an interesting artifact, because it's the closest thing we have to an aggregation of what people want, what preferences they have. When you think about models that are much smarter, one hope would be that you could just give them a list of the things we want that isn't the trivial and obvious UN-Declaration-of-Rights kind of thing. On the other hand, I think I've heard you make the point that a lot of our preferences and values are very subtle, and so they might be best represented through these pairwise preferences. When you think of a GPT-6 or GPT-7 level model, are we giving it more of a written set of instructions, or are we still doing these sorts of subliminal preferences?

Speaker 2 Yeah, that's a good question. I think these preference models do learn a lot of subtleties about what people prefer that would be hard to articulate in an instruction manual. Obviously, you can write an instruction manual that has lots of examples of comparisons; that's what the Model Spec has, a lot of examples with some explanation. So it's not clear what the optimal format is for describing preferences. I would guess that whatever you can get out of a big data set that captures fuzzy preferences, you can distill down into a shorter document that mostly captures the ideas. And I would think the bigger models learn a lot of these concepts automatically: they'll learn from all the pre-training data what people would find useful and helpful, and they'll have some complex moral theories. But of course there's still a lot of room to latch onto a different style or a different morality. So when we write a doc like that, or when we align these models, what we're doing is latching onto a specific style, a specific morality. And you still need a decently long document to capture exactly what you want.
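For readers unfamiliar with how pairwise preferences become a model you can optimize against, here is a minimal sketch of the standard Bradley-Terry-style reward-model loss commonly used in RLHF-type setups. It is not a description of OpenAI's internal code; the tiny bag-of-embeddings `RewardModel` is a stand-in for a pretrained transformer with a scalar head.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, vocab_size: int = 50_000, hidden_dim: int = 768):
        super().__init__()
        # In practice this would be a pretrained transformer backbone; a
        # bag-of-embeddings encoder keeps the sketch self-contained.
        self.embed = nn.EmbeddingBag(vocab_size, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)  # scalar "how much would a rater like this"

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.score(self.embed(token_ids)).squeeze(-1)

def preference_loss(model: RewardModel,
                    chosen_ids: torch.Tensor,
                    rejected_ids: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: push the preferred response's score above the other's."""
    r_chosen = model(chosen_ids)
    r_rejected = model(rejected_ids)
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Toy batch: each row is a tokenized response; in reality both share the same prompt.
model = RewardModel()
chosen = torch.randint(0, 50_000, (4, 32))    # responses raters preferred
rejected = torch.randint(0, 50_000, (4, 32))  # responses raters did not prefer
loss = preference_loss(model, chosen, rejected)
loss.backward()
print(loss.item())
```

The trained scalar score is then what a policy gets optimized against (or re-ranked with), which is why subtle rater preferences end up encoded in it rather than written down anywhere explicitly.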

Speaker 1 How much of a moat is better post-training? Currently, companies distinguish themselves by how big their models are and so forth. Will it be a big moat to have figured out all the finickiness you were talking about earlier with regard to all this data?

Speaker 2 I think there's something of a moat, because it's just a very complex operation. You have to have a lot of skilled people doing it, so there's a lot of tacit knowledge and a lot of organizational knowledge required. Post-training a model that actually has all the functionality people care about is pretty complicated and requires a pretty complicated effort; it's basically an accumulation of a lot of R&D. So I would say that makes it somewhat of a moat, in that it's not trivial to spin this up immediately. It does seem like the same companies that are putting together the most serious pre-training efforts are also putting together the serious post-training efforts, so it seems somewhat possible to copy or spin up more of these efforts. One force that makes it less of a moat is that you can distill the models: you can take someone else's model and clone its outputs, or you can use someone else's model as a judge to do comparisons. The bigger players probably aren't doing that, because it goes against terms-of-service policies and it would also be a hit to their pride, but I would expect some of the smaller players to do that to get off the ground. And that catches you up to a large extent.

Speaker 1 I guess it helps you clear the moat. What is the median rater like? Where are they based? What are their politics? What is their knowledge level?

Speaker 2 It varies a lot. We've definitely hired raters with different skills for different kinds of tasks or projects. A decent mental model is to look at the people doing odd jobs with remote work on Upwork and other platforms like that. It's a pretty international group, though there's a decent number of people in the U.S. We hire different groups of people for different types of labeling, depending on whether we're more focused on writing or on STEM tasks. People doing STEM tasks are more likely to be in India or other middle- or lower-middle-income countries, whereas people doing English writing and composition tend more to be in the U.S. And there have been times when we needed to hire different experts for some of our campaigns. Some of these people are very talented; we even find that they're at least as good as us researchers at doing these tasks, and they're much more careful than we are. So I would say the people we have now are quite skilled and conscientious.

Speaker 1 With regards to the plateau narrative, one of the things I've heard is that a lot of the ability these models have to help you with specific things comes from having very closely matched labels in the supervised fine-tuning data set. Is that true? If it can teach me how to use FFmpeg correctly, is that because somebody looked at inputs like that, figured out what flags you need to add, and wrote that up as a demonstration? Do you need to hire all these labelers with domain expertise in all these different domains? Because if that's the case, it seems like it would be a much bigger slog to get these models smarter and smarter over time.

Speaker 2 Right. You don't exactly need that, because you can get quite a bit out of generalization. The base model has already been trained on tons of documentation and tons of code with shell scripts and so forth, so it's already seen all of the FFmpeg man pages and lots of bash scripts and everything. Even just giving the base model a good few-shot prompt, you can get it to answer queries like this. And just training a preference model for helpfulness, probably even if you don't train it on any STEM data, will somewhat generalize to STEM.
Speaker 1 So not only do you not need examples of how to use FFmpeg, you might not even need anything with programming to get some reasonable behavior in the programming domain. Maybe a final question. We've touched on this in different ways, but to put it together: you say you're training on much more multimodal data, so presumably these things will understand what screens look like and will be able to interact with them in a much more coherent way. And you're also going to do this long-horizon RL, so they'll be able to act as agents and assistants that can be part of your workflow in a much more integrated way. What do you expect that to look like, and what will be the next steps from there? Suppose by the end of the year, or next year, you have something like an assistant that can work with you on your screen. Does that seem, first of all, like a sensible thing to expect? And then where does it go from there?

Speaker 2 I would definitely expect things to move in that direction. It's unclear what the best form factor is going to be, whether it's something like a Clippy that's on your computer helping you with things, or more like a helpful colleague in the cloud. We'll see which kinds of form factors work best, and I would expect people to try all of them.

Speaker 2 I would expect the mental model of a helpful assistant or helpful colleague to become more real, where you can share more of your everyday work with it. Instead of just giving it one-off queries, you'd have a whole project that you're doing, and it knows about everything you've done on that project so far. It can even proactively make suggestions; maybe you can tell it, "remember to ask me about this and whether I've made any progress on it." I think proactivity is one thing that's been missing. I'd really love to see us move away from one-off queries, away from using the model like a smarter search engine, and more towards having a whole project that I'm doing in collaboration with the model, where it knows everything I've done, it's proactively suggesting things for me to try, or it's going and doing work in the background.

Speaker 1 Yeah, that's really interesting. By the way, final question: what is your median timeline for when it replaces your job?

Speaker 2 Oh, replaces my job? Maybe five years.

Speaker 1 Pretty soon, yeah. Interesting. Okay. Well, John, this was super interesting. Thanks so much for making the time. This seems like one of the parts of the AI process that's super important and that people don't understand much about, so it was super interesting to delve into it and hear your thoughts.

Speaker 2 But yeah, thanks for having me on the podcast. It was fun to talk about all this stuff.

Speaker 1 Hey, everybody. I hope you enjoyed that episode. John is just a very thoughtful guy, and it was super interesting to learn about the way in which these models become the kind of shoggoths that they are.

Speaker 1 Anyway, as you can see, I'm now doing ads on the podcast. So if you'd like to advertise, you can reach out at the link in the description. And of course, if you enjoyed the episode, it's really helpful if you share it with other people who you think might enjoy it: your friends, group chats, Twitter, whatever else.

Speaker 1 See you on the next one.

Speaker 2 Cheers.