Humans&: Bridging IQ and EQ in Machine Learning with Eric Zelikman
Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @ericzelikman
Chapters:
00:00 – Eric Zelikman Introduction
00:29 – Eric’s Early Interest in AI
01:29 – Challenges in AI and Automation
02:25 – Research Contributions
06:14 – Quiet-STaR and Scaling Up AI
08:14 – Current State of AI Models
15:23 – Human-Centric AI and Future Directions
22:08 – Eric’s New Venture: humans&
35:33 – Recruitment Goals for humans&
36:57 – Conclusion
Press play and read along
Transcript
Speaker 1
Hi, listeners. Welcome back to No Priors.
Today we're here with Eric Zelikman, previously of Stanford and xAI.
Speaker 1
We're going to talk about the contributions he's made to research, reasoning, and scaling up RL, as well as his new company, humans&. Eric, thank you so much for doing this.
Thank you.
Speaker 1 You have had an amazing impact as a researcher, starting from just your time at Stanford.
Speaker 1 I want to hear about that, but first, some background on how you got interested in machine learning at all.
Speaker 2 I guess going back really far, I've been motivated by this question: you have all of these people out there who have all of these things that they're really talented in, all of these things that they're really passionate about.
Speaker 2 There's just so much talent out there.
Speaker 2 And I've always been a little bit disappointed that so much of that talent doesn't get used, just because everyone has circumstances and situations where they can't actually pursue those things.
Speaker 2 And so for me, AI.
Speaker 1 All of humanity is not living up to their full potential.
Speaker 2 I mean, the thing I've always been excited about is: how do you actually build this technology that frees people up to do the things that they are passionate about?
Speaker 2 How do you basically allow people to actually focus on those things? You know, originally I thought of automation as kind of the most natural way of doing that.
Speaker 2 Like you automate away the parts that people kind of don't want to do. And that
Speaker 2 frees up people to do the things that they do want to do. But I guess I realized increasingly that it's actually pretty complex.
Speaker 2 You actually have to understand if you want to empower people to do what they want to do, you have to really understand what people actually want to do.
Speaker 2 And building systems that understand kind of people's goals and outcomes is actually really hard.
Speaker 1 Did you have like
Speaker 1 this human-centric perspective when you were choosing research problems to work on originally?
Speaker 2 I guess like at the very beginning, I was just like, when I was choosing research problems, I was just interested in how do you actually make these things half decent?
Speaker 1 So it's more increased capability.
Speaker 2 Yeah.
Speaker 2 First, yeah. I think for me, when I looked at AI, or language models, back in 2021 or whatever, I was like, these things aren't very smart.
Speaker 2 They can't do that much.
Speaker 2 And there was some early work around then that showed that, for example, you could use chain of thought to get models to answer more smartly.
Speaker 2 But it was still only a small step improvement at that time. The benefit of that was about as much as you can really get with just prompting.
Speaker 2 And so back then, I was like thinking about, okay, how do you actually make them like half decent at actually solving these harder problems?
Speaker 1 We have everything from a researcher audience to a business-person audience here. Can you give a broad intuition for STaR?
Speaker 2 I guess the intuition is, if you have a model and it's able to solve these slightly harder questions by thinking about them, then what if you actually teach it?
Like, hey, this solution that you came up with, that got you to the right answer. Good job.
Or, if the model didn't, then you basically don't reward it.
Speaker 2 I guess the original version of STaR actually, yeah, there wasn't a baseline at the time.
Speaker 2 We compared it to REINFORCE, which is this popular algorithm in reinforcement learning, a very simple policy-gradient thing.
Speaker 2 But yeah, at the time it was a very simple algorithm: you iteratively generate solutions, and if the solutions get you to the right answer, you learn from them.
Speaker 2 If they don't, you don't. And then you just keep doing this as the model solves harder and harder problems, and then learns from harder and harder problems.
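To make that loop concrete, here is a minimal Python sketch of the STaR-style outer loop as Eric describes it, not the published implementation; `generate`, `is_correct`, and `finetune` are hypothetical stand-ins for a sampling call, an answer checker, and a fine-tuning step on whatever model you use.

```python
# A minimal sketch of one STaR-style iteration, under the assumptions above.
from typing import Callable, Iterable

def star_iteration(
    problems: Iterable[tuple[str, str]],                 # (question, gold answer) pairs
    generate: Callable[[str], str],                      # samples a rationale + answer for a question
    is_correct: Callable[[str, str], bool],              # checks the sampled answer against the gold answer
    finetune: Callable[[list[tuple[str, str]]], None],   # fine-tunes on (question, rationale) pairs
) -> int:
    """Keep only the rationales that reached the right answer, then train on them."""
    kept: list[tuple[str, str]] = []
    for question, gold in problems:
        rationale = generate(question)
        if is_correct(rationale, gold):         # the "reward" is simply whether the final answer matches
            kept.append((question, rationale))  # otherwise the sample is discarded
    finetune(kept)                              # learn from the successful traces, then repeat the loop
    return len(kept)
```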
Speaker 1 Did you,
Speaker 1 at what point in the research, if at all, were you surprised by how well it worked? Or did you have some intuition for this being like something scalable?
Speaker 2 There was one experiment that I remember doing, though this was quite a while ago at this point.
Speaker 2 But we looked at, I think it was n-digit addition or multiplication. Sorry, it's been a second.
Speaker 2 And one thing that was really interesting was that this back then, this was like a task that was considered like hard for language models.
Speaker 1 Of course, it was considered like one of the examples of why they were still so stupid. Yeah.
Speaker 2 Exactly. And I was like, okay.
Speaker 2 And one of the really interesting things for me was that as you actually trained for more and more iterations, the number of digits that it was actually able to do kept increasing.
Speaker 2 And I think that this was one of those big surprises for me. Like, oh, wow, there's no obvious plateau here.
Speaker 1 And did you go directly from that to generally this should scale?
Speaker 2 I think generally, yeah. There were a few things, though.
Speaker 2 Like there was one part of it that we introduced to kind of, we observed that there was a bunch of the data that the model wasn't learning from.
Speaker 2 And so we proposed another variant of this where we actually were like, oh, what if you actually take the ones where it fails and you basically like ask it to reason about like why it should have gotten it right.
Speaker 2 And then you train as if it got it right.
Speaker 2 And this version was kind of a way of extending beyond the parts of the data that it couldn't see. So if you only train on the positive examples, then you end up in this kind of minimum where there's just no more data that it can actually solve.
Speaker 2 And so back then we were like, what if we just
Speaker 2 show it the problems that it didn't solve and try to teach it from those? But I guess another thing that other work has done since then is, oh, what if you just sample a lot?
Speaker 2 And that also seems to work.
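Here is a sketch of the rationalization variant under the same assumptions as the earlier snippet; `generate_with_hint` is a hypothetical call that lets the model see the correct answer while writing its reasoning, and the resulting trace is added to the training data as if the model had solved the problem unaided.

```python
# A sketch of the rationalization idea: recover training signal from failed problems.
from typing import Callable

def rationalize_failures(
    failed: list[tuple[str, str]],                   # (question, gold answer) pairs the model got wrong
    generate_with_hint: Callable[[str, str], str],   # samples a rationale given the question plus the answer as a hint
    is_correct: Callable[[str, str], bool],          # re-checks that the hinted rationale reaches the gold answer
) -> list[tuple[str, str]]:
    """Turn failures into extra (question, rationale) pairs to train on as if solved."""
    recovered: list[tuple[str, str]] = []
    for question, gold in failed:
        rationale = generate_with_hint(question, gold)  # reason "backwards" from the known answer
        if is_correct(rationale, gold):                 # keep it only if the hinted reasoning still checks out
            recovered.append((question, rationale))     # then train on it as if the model had gotten it right
    return recovered
```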
Speaker 1 STaR has become a broadly used part of the reasoning paradigm since you published. Can you also describe, I think this was sort of your last published work, Quiet-STaR?
Speaker 2 Oh, yeah, so Quiet-STaR was kind of the last thing that I did back at Stanford. And it was really fun.
I guess we showed a few things that were kind of cool.
Speaker 2 One of the main goals of that paper was to show that you could actually scale this up to like pre-training scale by using like basically pre-training style data.
Speaker 2 I guess now there are a bunch of these works that have come out recently around, you know, RL pre-training and stuff like that. And that's, I guess,
Speaker 2 in some ways similar to some of what we showed in the Quiet-STaR work. Instead of having question and answer, if you actually just have these arbitrary chunks of text, for example, and it tries to predict what's going to come next, which is the standard language-modeling objective,
Speaker 2 can you actually get models that more generally learn to reason? One of the cooler things that I think is kind of overlooked about the original Quiet-STaR paper is that we showed a bunch of
Speaker 2 key improvements over the original STaR paper that were necessary to actually do this kind of thing. That was, for example, showing that it's really valuable for this algorithm to be online,
Speaker 2 and showing that it's really valuable to have a baseline, where for harder problems you learn more, and for easier problems you don't learn quite as much.
Speaker 2 And I think that there were a bunch of nuggets in there that even at the time I don't think I fully thought of as, oh, wow, that's actually a cool improvement over the original thing.
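As a toy illustration of that baseline point: with a per-problem mean-reward baseline, a correct sample on a hard problem (where most samples fail) carries a larger learning signal than one on an easy problem. This is a generic REINFORCE-with-baseline sketch, not the exact Quiet-STaR formulation.

```python
# Toy per-problem baseline: center each sampled reward by the mean for that problem.
def advantages(rewards: list[float]) -> list[float]:
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]

# Easy problem: 3 of 4 samples already correct, so each success gets a small signal.
print(advantages([1.0, 1.0, 1.0, 0.0]))   # [0.25, 0.25, 0.25, -0.75]
# Hard problem: only 1 of 4 samples correct, so that success gets a larger signal.
print(advantages([1.0, 0.0, 0.0, 0.0]))   # [0.75, -0.25, -0.25, -0.25]
```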
Speaker 1 So you ended up going to xAI for several years, and you worked on a bunch of different paradigms. So pre-training data for Grok 2, and then overall the reasoning recipe for Grok 3.
Speaker 1 I'm sure I'm missing things, but tool use and agentic infrastructure for Grok 4.
Speaker 1 I guess when you, if you level set us today, like how smart are models? They can obviously do n-digit
Speaker 1 arithmetic at this point.
Speaker 2 I guess in terms of IQ stuff, if you're able to pose the problem very well,
Speaker 2 like some very advanced physics problem or math problem, I would say they're reasonably smart. I think
Speaker 2 a lot of the failures that people see.
Speaker 1 Give me a human comparison. What is reasonably smart?
Speaker 2 I think, I think it's hard to compare directly because it's very jagged. Yeah.
Speaker 2 Like, it's true that some of these, for example, some of the HLE questions that these models are able to solve are genuinely things that are non-trivial for actual PhD researchers.
Speaker 2 I'm not saying they're like
Speaker 2 open problems or anything, but they are pretty non-trivial. Also, one interesting category of these, and I spend a lot of time looking at the HLE questions,
Speaker 1 Sorry, Humanity's Last Exam, for anybody who isn't looking at PhD evals.
Speaker 2 No, great.
Speaker 2 Yeah, so looking at these Humanity's Last Exam questions,
Speaker 2 one category that is actually quite big is these trick questions. If you're familiar with the area, you'll be like, oh, they're trying to get you to assume something.
Speaker 2 But actually, if you think more carefully about this problem, that assumption doesn't hold.
And a bunch of the questions turn out to be those kinds of problems.
So I think they're pretty smart, but also they're more, I think, tripped up by some of these tricky things.
Speaker 2 But I think one of the core things is that they're not smart emotionally. They're not smart on the level of actually understanding what people care about, or how to actually help people accomplish the things that they care about.
Speaker 1 I want to talk about this and your next mission, but just on this topic of jagged intelligence even within the IQ domain, which I think almost everybody in the industry has focused on until now.
Speaker 1 What would you recommend for people who are not researchers to develop some sort of intuition for that surface? Because that seems very important to making them useful.
Speaker 2 Yeah. I guess one thing that I think is really important to keep in mind is that the more context you can give the current generation of models,
Speaker 2 the better off you are.
Speaker 2 Their answers are
Speaker 2 super sensitive to whatever additional information you can give them. Yeah, I think this is a really important thing.
Speaker 2 I would generally say like existing models are particularly good at handling questions that are like easy to answer in kind of like a closed form.
Speaker 2 Like if there's a simple numerical answer to what you're asking, or a simple way of choosing from a set of things, then obviously it all depends, but that is something that makes it easier for the model.
Speaker 2 If you can imagine it being easy to check your answer, that actually, I think, makes it easier for the models.
Speaker 1 What do you think is the most dominant explanation for attempts to use models in more verifiable domains, like code, still failing at sophisticated tasks?
Speaker 1 Is it just the wrong context that's been fed to them? Is it context windows simply not being large enough to support the scratch pad and continual testing?
Speaker 1 In those domains, what is the biggest challenge?
Speaker 2 Part of it is there's, I think, a balance. When people kind of want to give users these models, it's actually important that they're not annoyingly slow.
Speaker 2 And so I think there's actually like a number of problems where like if you gave the models more time, you know, they would actually be able to answer better.
Speaker 2 But for example, in the kind of coding context, you kind of have to be reasonably responsive. At least it depends on the kind of setup, right? Like if you look at products like OpenAI's Codex,
Speaker 2 which is kind of this longer-running background thing, versus like Cursor, which is
Speaker 2 more interactive, you have a bit more luxury with those more background approaches to tackle harder problems, I'd say. Yeah, I think it's a tricky question.
A lot of things depend on how far the distribution of
Speaker 2 what you're asking is from the distribution that the models were actually trained with.
Speaker 2 So, you know, if you happen to be asking a problem that's very similar to the kind of problems that it's seen before, then you know it'll do great.
Speaker 2 And if you're asking a problem that's very out of domain, not so much.
Speaker 2 So to some extent, this question is kind of hard to answer concretely unless you know what the RL data for a lot of these specific tasks is.
Speaker 2 Right.
Speaker 1 And today, obviously, none of the model or code-agent interface companies are going to release a capability map for you of what their RL data looks like, which would be very useful. Because, intuitively, if you just look outside of the pre-training internet datasets, there are types of problems and types of code bases that are much further out of distribution. And so when engineers try in those scenarios, obviously they get a dumb agent back, right?
Speaker 2 Another thing that matters a lot is just like how verifiable are the things that you're trying to get the model to do. I mean, obviously there's been
Speaker 2 a ton of work out there on making models less dependent on verifiable rewards.
Speaker 2 Lots of cool published papers.
Speaker 2 I believe most people would say that there's still a gap between how well these models perform on verifiable tasks versus not verifiable tasks.
Speaker 1 Yeah, absolutely.
Speaker 1 This is the last real question on IQ, but because it is where 90-plus percent of industry energy, literally energy and compute, is focused: how would you characterize where we are in scaling, and the obvious opportunity to improve from here?
Speaker 2 There are still meaningful dimensions of scaling that haven't been, I think, fully explored in terms of IQ. There are a lot of cool efforts out there, a lot of cool stuff that can still be done on the capabilities axis.
Speaker 2 I do think that as you start thinking about some of these new axes of scaling, it's actually very natural to realize that there are ways to do them that incorporate people, and there are ways to do them that kind of leave people out more and more.
Speaker 2 And being very mindful of, oh, hey, I'm designing this new algorithm, and it's going to scale the IQ of this model by X amount. Whether you effectively keep people in the loop is actually a very active decision.
Speaker 2 And so, you know, I think in general, if you're thinking about these things, that's important.
Speaker 1 Wouldn't it be fair to claim that the instinct of many labs is to like try to get people out of the loop as much as possible from a scaling perspective? Because that's very messy, right?
Speaker 1 If I want to recruit people to, for example, get complex reasoning traces from them on tasks that are not in distribution for me yet,
Speaker 1 that is not as simple to execute on for an organization as, like, more rollouts, right?
Speaker 1 And so, why is that important at all from a capabilities perspective?
Speaker 1 I mean, that's a good transition to like, what are you doing?
Speaker 2 Yeah, I'd say the main thing is just that you have these models that keep expanding in terms of the horizon that they're automating. The recent-ish IMO results are kind of a good example of this.
Speaker 2 We have these models that go on for like, you know, hours of, you know, reasoning without any kind of human intervention.
Speaker 2 And this has kind of been an increasing measure of success, I would say, for these labs.
Speaker 2 So for example, there's this METR benchmark that everyone likes to share whenever there's a new model.
Speaker 2 And it's like, oh, we went from having these models complete two-hour tasks autonomously without human intervention to 2.5-hour tasks without human intervention.
Speaker 2 And obviously there are questions of what those numbers actually mean, and whether we should take them at face value. But regardless, this has kind of been the metric that people are looking at more and more to measure progress. But
Speaker 2 as we kind of
Speaker 2 get these models that increasingly, you know,
Speaker 2 remove people from the interaction, you end up with basically people having less say in kind of the things that get built.
Speaker 2 You end up with like, you know, I think if you have a model that goes off and does its own thing for like eight hours and comes back to you with like something that like is somewhat there.
Speaker 2 I think this is a weird regime where people probably feel less real agency over the things that they're building.
Speaker 2 And I also kind of anticipate that people will feel like they don't really understand the things that are being built. You know, I think that's already true.
Speaker 1 20,000 lines of generated code looks good to me. Yeah.
Speaker 2 It's just like you make these PRs and they're like 100,000 lines of like, you know, like.
Speaker 2 And I think in general, this is kind of going to be
Speaker 2 part of the trend.
Speaker 1 So do you think that it's important to have humans in the loop of, you know, producing the output or the reasoning because the ceiling is higher with humans who are in the loop, because it is more efficient, because we can error correct when models are off path, or philosophically, because people want that, or like some combination of all three?
Speaker 2 Yeah, I think it's probably some combination.
Speaker 2 I think another thing that I kind of think about is like, you know, the most natural thing to do as you kind of automate away the existing set of tasks is, you know, you kind of look at the world, GDP, you like carve out the parts that are like, you know, most easy to replace with these models.
Speaker 2
And, you know, those are kind of the things that you target. Like, oh, wow, coding is an X-billion-dollar market.
Let's automate all of that. Or
Speaker 2
this other segment is like an X billion dollar market. Let's automate all of that.
But I actually think like if you kind of empower people,
Speaker 2 if you have models that really understand what people are trying to accomplish and really support them in accomplishing those things, you have the potential to actually grow that pie instead of basically replacing all of those segments.
Speaker 2 And I think in general, if the purpose of these models is to replace the person for this chunk of work, you end up with a lot less, I think, real innovation on what's possible.
Speaker 2 Yeah, I think if you actually have models that really understand what people's goals are and really empower them more, you end up in a very different situation.
Speaker 1 Because we're going to push those capabilities into areas that are out of distribution for them. Okay, cool. Is that accurate? I'm just...
Speaker 2 Yeah, no, I'd say so. I think when I say that I'd like to work on models that empower people instead of replacing them, people are like, oh yeah, sure, but I'd rather work on curing cancer or something. Obviously, that's a really important goal, right? Building models that are able to solve, you know, humanity's most difficult and most fundamental problems is incredibly important.
Speaker 2 But I also think that like,
Speaker 2 and, you know, I'm sure that many of the researchers in the field disagree. I guess in the long run, we'll see kind of what plays out.
Speaker 2 But I personally strongly believe that we're much more likely to solve a lot of these fundamental human problems by working together, by building models that are really good at collaborating with large groups of people, that are really good at understanding different people's goals, different people's ambitions, different people's values, understanding different people's weaknesses and how to kind of coordinate with these large groups of people to make everyone more effective.
Speaker 2 And I think the vision of this AI that goes off on its own for 20 hours, does its own thing, and comes back with the answers to life, the universe, and everything, I think that this is less likely. I guess we'll have to see, but I think it's less likely.
Speaker 1 So that brings us to the fact that you are starting a new company, humans&.
Speaker 1 I remember being like actually quite fundamentally surprised, given all of your work on IQ and reasoning and coding and scale, that you were interested in essentially EQ.
Speaker 1 And tell me if this is a wrong characterization: the emotional or interactive capabilities of models today have really shown up only in things like Character.AI or companionship tools.
Speaker 1 And you thought of EQ as also enablement from a productivity perspective, right? So tell me about where this thread came from.
Speaker 2 Yeah. I guess I've been thinking about this kind of stuff for some time now.
Speaker 2 Like even back in my PhD, I think one of my, I guess, less well-known works was actually about, we showed that you can train language models to simulate different kinds of students. Right.
Speaker 2
Yeah, yeah. And by simulating students, you can actually design better tests for those students.
And that was like a really cool finding.
Speaker 2 Like, hey, if you have models that are really good at modeling people, you can actually design systems that are better for people. And like, this was something that
Speaker 2 I found really cool.
Speaker 2 And
Speaker 2 kind of as we move towards the current kind of capabilities frontier, it became more and more obvious that
Speaker 2 we have these incredibly smart models that are like capable of so much, but they're not used for anywhere near what they're capable of. Like the role that they play in people's lives
Speaker 2 is
Speaker 2 a lot less deep, a lot less positive than it could be. And I spent a lot of time thinking about like, okay, why is that?
Speaker 2 Like, why are these models not like more, like I said, deeply, positively integrated into people's lives? And it seemed like a really big part of it is
Speaker 2 that fundamentally, these models don't really understand people. They don't understand people's goals.
Speaker 2 They're trained, I would say, part of it is like the general kind of training paradigm that the field is in. It's very, I would say, single task focused or task-centric.
Speaker 1 It's ludicrous that all the benchmarks are still oriented this way.
Speaker 2
Yeah. Yeah.
I mean, or most of them. You know, even the ones that are,
Speaker 2 like, there's very few benchmarks out there that actually try to consider like, oh, what if you actually have like a person that's interacting with this model?
Speaker 2 Like, you know, at best, you have like some, you know, multi-turn benchmarks that like try to simulate what an environment would respond in different, you know, to different inputs.
Speaker 2 But even that is like still far from, you know, considering, hey, if you actually have this model that interacts with the person for like, you know, some amount of time, like, how does it actually affect that person's life?
Speaker 2 It's really remarkable that the field is so stuck in this kind of task-centric regime.
Speaker 2 But it makes a lot of sense. One thing that I was told by some folks at Google is that
Speaker 2 one of the reasons is that it's actually very useful for credit assignment.
Speaker 2 So, like, being able to have these benchmarks that are very easy to quantify and very easy to relate to some immediate thing means that you can kind of say, oh, yeah, this like, you know, this, this team did like 2% better than this team.
Speaker 2
So they deserve like all of the resources. Or, you know, this team like improved the benchmark by like 10% while this team improved it by 5%.
So, you know, let's, let's allocate accordingly.
Speaker 2 And I think in general, like, that's, that's part of it. I think another part of it is like kind of more aligned with the easiest ways to train these models.
Speaker 2
It's not easy to, you know, have these RL environments and stuff. You have lots of these companies popping up, obviously, that are trying to sell environments to different people.
But
Speaker 1 the most popular are, of course, in coding and computer use. Yeah.
Speaker 1 Rather than anything that requires simulating people.
Speaker 2 Yeah, it's not that surprising that we're kind of in this current regime.
Speaker 1 So what do models need to
Speaker 1 know about people? Or like what capabilities are they either missing or have not been elicited from them?
Speaker 2 The most fundamental thing is that the models kind of don't understand the long-term implications of the things that they do and say.
Speaker 2 When you treat every turn of a conversation as kind of its own game, and you, you know, you basically think of it as like, okay, you had this interaction, you're done.
Speaker 2 You need to make sure that this one response has all of the possible answers, has all of the possible content. You don't ever like ask questions, you don't ever like try to clarify things.
Speaker 2 You don't really tend to express uncertainty.
Speaker 2 You don't tend to be proactive. You don't tend to think about the long-term.
Speaker 2 You see a lot of even single-turn side effects of this kind of regime, and most of them are treated as kind of their own problems to solve.
You see issues that people highlight around sycophancy.
Speaker 2 You see issues like the recent news around the psychosis stuff. There are a lot of these harmful effects that you get if you think about things in this very single-task or task-centric way.
Speaker 2 But if you have models that actually consider, you know, the long-term implications of, oh, hey, if I tell this person to start
Speaker 2 a company that sells gloves for catching ice cream, if I tell them that that sounds like a good business idea, they might actually go and they might actually build that business and they might realize that it was not actually a good business idea.
Speaker 2 Having a model that can kind of roll out the long-term implications of the things.
Speaker 1 And then they won't trust me anymore, and then they won't pay for my compute.
Speaker 2 Exactly, exactly.
Speaker 1 Oh no, I'm kidding. I think that's really interesting. One of the very core principles we have at Conviction for how we make decisions is:
Speaker 1 well, what is the very long-term thing we want? And if that is the customer, the founder in this case, or an LP, or even for us, it actually simplifies things quite a bit if you say we're optimizing for a decade-plus versus this interaction. And so being single-turn versus multi-turn seems like a very different way to make decisions. It seems very hard to collect data about multi-turn human interactions, especially when you get to that time scale. It's actually analogous to a problem in biology: how do you study diseases that just take time to progress?
Speaker 2 I think it's a really fundamental question. I think there is actually some good academic work that has started to explore some of this.
Speaker 2 Yeah, there's some work recently around, you know, RL from human interaction. There's a cool paper called CollabLLM, you know, that trains against simulation.
Speaker 2 There's a lot of very cool work kind of starting to explore this in academia.
Speaker 2 But in general, I would say there's a lot less attention being paid to this kind of stuff in industry because
Speaker 2 I would say for most labs,
Speaker 2 and maybe this is a strong statement, but I would say for most labs, like the human is kind of, you know, the intermediate until you have like this fully automated like, you know, system.
Speaker 2 And so spending a lot of time optimizing for being really good at understanding, really good at interacting, and really good at collaborating is almost an intermediate thing you have to do until you get to this, you know, fully automated point.
Speaker 1 Can you paint a picture: if we have models that better understand human objectives over different time scales and are good at interacting with humans,
Speaker 1 how is that more integrated into your life five years from now?
Speaker 2 Yeah, I think
Speaker 2 you don't need to go that far out. Two years.
Speaker 2 But yeah,
Speaker 2 I think you get a lot of behaviors that you currently don't really see in these models.
Speaker 2 I think you have models that are much better at understanding how the things that you say and ask fit into the overall context of the stuff that you're doing. Like, for example,
Speaker 2 if the model knows that you're going to
Speaker 2 some wedding, for example, and then you ask it about booking hotels in Paris, it might
Speaker 2 consider, oh, hey, like around the time of this event,
Speaker 2 I know that this user has all of these things that are true about them.
Speaker 2 A model that's generally able to think about how everything you say fits into its understanding of you would just be, I think, a very fundamentally different interaction.
Speaker 2 Because right now, if you want to ask a question like that, you kind of have to dump all of this context in. You have to tell, like, oh, you know, I, can you help me find a hotel in Paris?
Speaker 2
This is because, you know, I'm going to like a wedding. I have like, you know, these constraints.
I, you know, I have like these people who need to be with me.
Speaker 2 I have like, you know, it needs to do this.
Speaker 2 It needs to be, you know... you need to basically dump all of the context that's relevant to yourself into the model.
Speaker 1 Which is also an expensive interaction. Yeah. And something that most people won't do. Imagine if you had a friend where you had to re-explain everything about yourself to them every time you spoke.
Speaker 2 Exactly. Yeah. Like, can you imagine if every time you interacted with someone, they basically remembered your name, maybe what you do, and just the really high-level sketch of your life? That friendship probably would not last very long. Yeah, I think that's kind of what the current models are.
Speaker 1 So you'd argue that any investment in memory that today's models have is not
Speaker 1 that interesting or that core to their capabilities today.
Speaker 2 I would say that memory is definitely a feature that has been underinvested in by the field.
Speaker 2 But I would say that it is kind of difficult to invest in memory in this very like task-centric regime. Because if you have a bunch of these independent tasks,
Speaker 2 the amount of information that each of those needs from other things that you've discussed is not all that high.
Speaker 2 Like, because of the current paradigm, memory doesn't end up being super useful in the training. And so, these models are not particularly good at doing it.
Speaker 1 So, one other thing I said to you, I think more out of fear instinct than anything else, but I feel like other people will have this reaction as well, is:
Speaker 1 I'm a unique snowflake. You can't possibly simulate me and all of my self-consistency issues, between I want to learn this today, but I don't actually want to do the work,
Speaker 1 or I want to eat cake, but I want to be in shape as well. And we have different time scales and change our minds.
Speaker 1 I'm just constant distribution shift. You can't possibly bring all of us into distribution.
Speaker 1 How do you react to that?
Speaker 2 I think to a certain extent, it's probably a little bit true. It's not easy to
Speaker 2 build these really good models of people. But I do think that the task for the model needs to be that it should be trying to do that.
Speaker 2 Like the model needs to actually be like trying to learn all of these, like trying to learn about you, trying to learn about the things that you care about.
Speaker 2 Like the actual objective of the model needs to be to kind of understand you.
Speaker 2 And it probably won't be perfect, but, boy, you can be a lot better than the current models.
Speaker 1 That seems totally reasonable, actually.
Speaker 2 Yeah, and it's something that I think as a field we will probably get better at. I'm not going to pretend that, you know, I'm going to one-shot this problem, but I think even any serious effort gets you quite a long way.
Speaker 1 So there is this cult sci-fi series about the Culture, where you have, you know, these superintelligent Minds, and essentially all of the human and human-like races live in a society where the Minds make most of the decisions.
Speaker 1 And there's, I forget the total humanoid population, but let's say there are 30 or 40 humans who are still relevant as people, in terms of perhaps being out of distribution or providing reasoning that the Minds cannot.
Speaker 1 And everybody else just lives in the world of abundance where they're like rock climbing and hanging out or whatever, and they do not produce. How is your view of abundance different?
Speaker 2 Everyone kind of has things that they're passionate about, and given the opportunity, I think people can do really cool things. I think the role of the model should be to allow people to do those really cool things that everyone wants to do, and accomplish those things that everyone wants to accomplish.
Speaker 2 And I think, you know, we shouldn't outsource all of the thinking and everything to these AI overlords or whatever. I think what we really want are models that are able to empower us.
Speaker 1 Amazing.
Speaker 1
Okay, super unique mission, amazing research work. You're hiring an early team, getting a lot of compute.
Who are you looking for on the recruiting side?
Speaker 2 One thing that I think is actually probably a good thing that my previous company did is, you know, thinking of everyone kind of to some extent as like engineers.
Speaker 2
I think I'm looking for really strong infra folks who can build stuff. I'm looking for really strong researchers who can build stuff.
I'm looking for really strong product folks who can build stuff.
Speaker 2 I'm looking for people who like have thought a lot about like users, who've thought a lot about memory, you know, on the research side.
Speaker 2 I'm looking for, you know, on the infra side for people who've thought a lot about building distributed systems, really fast inference, people who've
Speaker 2 been there to scale really big projects up.
Speaker 2 On the product side, I think people who are like, you know, really creative about like new modes of interaction, people who have, who really deeply care about building beautiful, tasteful products.
Speaker 1 Awesome. Thanks so much, Eric.
Speaker 2 Thank you so much.
Speaker 1 Congrats on the new company.
Speaker 2 Thank you so much.
Speaker 1
Find us on Twitter at NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces.
Follow the show on Apple Podcasts, Spotify, or wherever you listen.
Speaker 1 That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.