The Best of 2024 with Sarah Guo and Elad Gil

2024 has been a year of transformative technological progress, marked by conversations that have reshaped our understanding of AI's evolution and what lies ahead. Throughout the year, Sarah and Elad have had the privilege of speaking with some of the brightest minds in the field. As we look back on the past months, we’re excited to share highlights from some of our favorite No Priors podcast episodes. Featured guests include Jensen Huang (Nvidia), Andrej Karpathy (OpenAI, Tesla), Bret Taylor (Sierra), Aditya Ramesh, Tim Brooks, and Bill Peebles (OpenAI’s Sora Team), Dmitri Dolgov (Waymo), Dylan Field (Figma), and Alexandr Wang (Scale). Want to dive deeper? Listen to the full episodes here:

NVIDIA's Jensen Huang on AI Chip Design, Scaling Data Centers, and his 10-Year Bet No Priors Ep. 89 | With NVIDIA CEO Jensen Huang

The Road to Autonomous Intelligence, With Andrej Karpathy from OpenAI and Tesla No Priors Ep. 80 | With Andrej Karpathy from OpenAI and Tesla

Transforming Customer Service through Company Agents, with Sierra’s Bret Taylor No Priors Ep. 82 | With CEO of Sierra Bret Taylor

OpenAI’s Sora team thinks we’ve only seen the "GPT-1 of video models" No Priors Ep. 61 | OpenAI's Sora Leaders Aditya Ramesh, Tim Brooks and Bill Peebles

Waymo’s Journey to Full Autonomy: AI Breakthroughs, Safety, and Scaling No Priors Ep. 87 | With Co-CEO of Waymo Dmitri Dolgov

Designing the Future: Dylan Field on AI, Collaboration, and Independence No Priors Ep. 55 | With Figma CEO Dylan Field

The Data Foundry for AI with Alexandr Wang from Scale No Priors Ep. 65 | With Scale AI CEO Alexandr Wang

Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil

Show Notes:
0:00 Introduction
0:15 Jensen Huang on building at data-center scale
4:00 Andrej Karpathy on the AI exo-cortex, model control, and a shift to smaller models
7:14 Bret Taylor on the agentic future of business interactions
11:17 OpenAI’s Sora team on visual models and their role in AGI
15:53 Waymo’s Dmitri Dolgov on bridging the gap to full autonomy and the challenge of 100% accuracy
19:00 Figma’s Dylan Field on the future of interfaces and new modalities
23:29 Scale AI’s Alexandr Wang on the journey to AGI
26:29 Outro


Runtime: 27m

Transcript

Speaker 1 Hi, No Priors listeners. I hope it's been an amazing 2024 for you all.
Looking back on this year, we wanted to bring you highlights from some of our favorite conversations.

Speaker 1 First up, we have a clip with the one and only Jensen Huang, CEO of NVIDIA, the company powering the AI revolution.

Speaker 1 Since our 2023 No Priors chat with Jensen, NVIDIA has tripled in stock price, adding almost $100 billion of value each month of 2024 and entering the $3 trillion club.

Speaker 1 More recently, Jensen shared his perspective again with us, this time on why NVIDIA is no longer a chip company, but a data center ecosystem. Here's our conversation with Jensen.

Speaker 1 NVIDIA has moved into larger and larger, let's say, units of support for customers. I think about it going from single chip to, you know, server, to rack and NVL72.

Speaker 1 How do you think about that progression? Like, what's next? Like, should NVIDIA do a full data center?

Speaker 4 In fact, we've built full data centers. It's the way that we build everything: if you're developing software, you need the computer in its full manifestation.

Speaker 4 We don't build PowerPoint slides and ship the chips.

Speaker 4 And we build a whole data center.

Speaker 4 And until we get the whole data center built up, how do you know the software works? Until you get the whole data center built up, how do you know your fabric works, and that all the efficiencies you expected are there? How do you know it's going to really work at that scale? And that's the reason why it's not unusual to see somebody's actual performance be dramatically lower than their peak performance as shown in PowerPoint slides.

Speaker 4 And computing is just not what it used to be. You know, I say that the new unit of computing is the data center.

Speaker 4 That, to us, is what you have to deliver.
That's what we build.

Speaker 4 Now, we build a whole thing like that.

Speaker 4 And then we, for every single thing, every combination, air-cooled, x86, liquid-cooled, Grace, Ethernet, InfiniBand, NVLink, no NVLink, you know what I'm saying? We build every single configuration.

Speaker 4 We have five supercomputers in our company today. Next year, we're going to build easily five more.
So if you're serious about software, you build your own computers.

Speaker 4 If you're serious about software, then you're going to build your whole computer. And we build it all at scale.
This is the part that is really interesting.

Speaker 4 We build it at scale and we build it vertically integrated. We optimize it full stack, and then we disaggregate everything and we sell it in parts. That's the part that is completely, utterly remarkable about what we do.
The complexity of that is just insane.

Speaker 4 And the reason for that is we want to be able to graft our infrastructure into GCP, AWS, Azure, OCI.

Speaker 4 All of their control planes, security planes are all different. And all of the way they think about their cluster sizing, all different.

Speaker 4 But yet we make it possible for them to all accommodate NVIDIA's architecture so that CUDA could be everywhere.

Speaker 4 That's really, in the end, the singular thought, you know: that we would like to have a computing platform that developers could use that's largely consistent.

Speaker 4 Modulo, you know, 10% here and there, because people's infrastructures are slightly optimized differently.

Speaker 4 But everything they build will run everywhere.

Speaker 4 This is kind of one of the principles of software that should never be given up. And

Speaker 4 we protect it quite dearly.

Speaker 4 It makes it possible for our software engineers to build once, run everywhere. And that's because we recognize that the investment of software is the most expensive investment.

Speaker 4 It's easy to test: look at the size of the whole hardware industry, and then look at the size of the world's industries.
It's $100 trillion on top of this $1 trillion industry. And that tells you something.

Speaker 4 The software that you build, you basically have to maintain for as long as you shall live.

Speaker 1 We, of course, have to mention our conversation with the lovely Andrej Karpathy, where we dig into the future of AI as an exocortex, an extension of human cognition.

Speaker 1 Andrej, who's been a key figure in AI development from OpenAI to Tesla to the education of us all, shares a provocative perspective on ownership and access to AI models, and also makes a case for why future models might be much smaller than we think.

Speaker 1 If we're talking about an exocortex, that feels like a pretty fundamentally

Speaker 1 important thing to democratize access to. How do you think, like, the current market structure of what's happening in LLM research?

Speaker 1 You know, there's a small number of large labs that actually have a shot at the next generation progressing training. Like, how does that translate to what people have access to in the future?

Speaker 9 So, what you were kind of alluding to maybe is the state of the ecosystem, right?

Speaker 9 So, we have kind of like an oligopoly of a few closed platforms, and then we have an open platform that is kind of like behind, so like Meta's Llama, et cetera.

Speaker 9 And this is kind of like mirroring the open source kind of ecosystem.

Speaker 9 I do think that when this stuff starts to, when we start to think of it as like an exocortex, there's a saying in crypto, which is like: not your keys, not your coins.

Speaker 5 Not yours. Yeah.

Speaker 9 Like, is it the case that it's: not your weights, not your brain?

Speaker 6 That's interesting, because a company is effectively controlling your exocortex, and therefore...

Speaker 9 It starts to feel kind of invasive.

Speaker 1 If this is my exocortex, I think people will care much more about ownership. Yeah.

Speaker 7 Like, yeah, you realize you're renting your brain.

Speaker 9 Yeah, it seems like renting your brain.

Speaker 1 The thought experiment is: are you willing to give up ownership and control to rent a better brain? Because I am. Yeah.

Speaker 2 So I think that's the trade-off.

Speaker 5 I think we'll see how that works.

Speaker 9 But maybe it's possible to like, by default, use the closed versions because they're amazing. But you have a fallback in various scenarios.

Speaker 9 And I think that's kind of like the way things are shaping up today, even.

Speaker 9 When APIs go down on some of the closed source providers, people start to implement fallbacks to like the open ecosystems, for example, that they fully control. And

Speaker 5 they feel empowered by that.

Speaker 9 So maybe that's just the extension of what it looks like for the brain: you fall back on the open source stuff should anything happen. But most of the time, you actually...

Speaker 1 So it's quite important that the open source stuff continues to progress.

Speaker 3 I think so.

Speaker 9 100%. And this is not like an obvious point or something that people maybe agree on right now, but I think 100%.
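
A minimal sketch of the fallback pattern Andrej describes, defaulting to a closed provider and falling back to open weights you control. All function names below are hypothetical stand-ins, not real provider APIs:

```python
import random

def closed_provider_complete(prompt: str) -> str:
    """Hypothetical stand-in for a hosted frontier-model API call."""
    if random.random() < 0.1:  # simulate an occasional outage
        raise ConnectionError("closed provider API is down")
    return f"[closed model] response to: {prompt}"

def open_model_complete(prompt: str) -> str:
    """Hypothetical stand-in for an open-weights model you run yourself."""
    return f"[open fallback] response to: {prompt}"

def complete(prompt: str) -> str:
    # Use the closed model by default because it's better; fall back to
    # the open ecosystem you fully control should anything happen.
    try:
        return closed_provider_complete(prompt)
    except (ConnectionError, TimeoutError):
        return open_model_complete(prompt)

print(complete("Summarize my meeting notes."))
```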

Speaker 6 I guess one thing I've been wondering about a little bit is

Speaker 6 what is the smallest performant model that you can get to in some sense, either in parameter size or however you want to think about it. And so I'm a little bit curious about your view.

Speaker 6 Cause you've thought a lot about both. distillation, small models, you know.

Speaker 9 I think it can be surprisingly small. And I do think that the current models are wasting a ton of capacity remembering stuff that doesn't matter.

Speaker 9 Like, they remember SHA hashes, they remember, like, the ancient...

Speaker 1 Because the data set is not curated the best.

Speaker 3 Yeah, exactly.

Speaker 9 And I think this will go away. And I think we just need to get to the cognitive core.
And I think the cognitive core can be extremely small. And it's just this thing that thinks.

Speaker 9 And if it needs to look up information, it knows how to use different tools.

Speaker 1 Is that like 3 billion parameters? Is that 20 billion?

Speaker 9 I think even a billion suffices. We'll probably get to that point.
And the models can be very, very small.

Speaker 9 And I think the reason they can be very small is fundamentally, I think, just that distillation works. That's maybe the only thing I would say.

Speaker 9 Distillation works surprisingly well.

Speaker 9 Distillation is where you get a really big model or a huge amount of compute or something like that

Speaker 9 supervising a very small model.

Speaker 1 Our conversation with Bret Taylor, OpenAI board member and founder of Sierra, painted a really different picture of how we'll interact with businesses in the future.

Speaker 1 Here's a clip of Bret explaining company agents and why the website is going to take a backseat.

Speaker 7 The other category, which is the area that my company, Sierra, works in, is what I call company agents.

Speaker 7 And it's really less about automation or autonomy, and more about: in this world of conversational AI, how does your company exist digitally? I'll use a metaphor.

Speaker 7 If it were 1995, you know, if you existed digitally, it meant having a website and being in Yahoo Directory, right?

Speaker 7 In 2025, existing digitally will probably mean having a branded AI agent that your customers can interact with to do everything that they can do on your website, whether it's asking about your products and services, doing commerce, doing customer service.

Speaker 7 That domain, I think, is shovel-ready right now with current technology because, again, like the persona-based agents, it's not boiling the proverbial ocean technically.

Speaker 7 You have well-defined processes for your customer experience, well-defined systems that are your systems of record.

Speaker 7 And it's really about saying in this world where we've gone from websites to apps to now conversational experiences, what is the conversational experience you want around your brand?

Speaker 7 And it doesn't mean it's perfect or it's easy. Otherwise we wouldn't have started a company around it, but it's at least well-defined.

Speaker 7 And I think that right now in AI, if you're working on artificial general intelligence, your version of agent probably means something different. And that's okay.

Speaker 7 That's just a different problem to be solved.

Speaker 7 But I think, you know, particularly in the areas that Sierra works and a lot of the companies that you all have invested in, is it saying, you know, are there some shovel-ready opportunities right now with existing technology?

Speaker 7 And I absolutely think there are.

Speaker 1 Can you describe the shoveling cycle of building a company agent? Like, what is the gap between research and reality? What do you invest in as an engineering team?

Speaker 1 Like, how do you understand the scope of different customer environments? Just like, what are the sort of vectors of investment here?

Speaker 6 And maybe, Sarah, as a starting point, it may even be worth also defining like what are the products that Sierra provides today for its customers? And then where do you want that to go?

Speaker 6 And then maybe we can feed that back into like, what are the components of that?

Speaker 6 Because I think obviously you folks are really emerging as a leader in your vertical, but it'd be great just for a broader audience to to understand what you focus on.

Speaker 7 Yeah, sure. I'll just give a couple examples to make it concrete.
So, if you buy a new Sonos speaker or you're having technical issues with your speaker, you get the dreaded flashing orange light.

Speaker 7 You'll now chat with the Sonos AI, which is powered by Sierra, to help you onboard, help you debug whether it's a hardware issue, a Wi-Fi issue, things like that.

Speaker 7 If you're a SiriusXM subscriber, their AI agent is named Harmony, which I think is a delightful name. Good to hear.

Speaker 7 And it's everything from upgrading and downgrading your subscription level to if you get a trial when you purchase a new vehicle, speaking to you about that.

Speaker 7 Broadly speaking, I would say we help companies build branded customer-facing agents.

Speaker 7 And branded is an important part of it. It's part of your brand.
It's part of your brand experience.

Speaker 7 And I think that's really interesting and compelling because I think just like, you know, when I go back to the proverbial 1995, you know, your website was on your business card.

Speaker 7 It was the first time you had sort of this digital presence. And I think there's the same novelty now, and probably we'll look back at the agents of today with the same sense of: oh, that was quaint.

Speaker 7 You know, I remember if you go back to the Wayback Machine, you look at early websites, it was either someone's phone number, and that's it, or it looked like a DVD intro screen with like lots of graphics.

Speaker 7 You know, a lot of the agents that customers start with are often around areas of customer service, which is a really great use case.

Speaker 7 But I do truly believe, if you fast forward three or four years, your agent will encompass all that your company does. I've used this example before, but I like it.

Speaker 7 But just imagine an insurance company, all that you can do when you engage with them. Maybe you're filing a claim.
Maybe you're comparing plans.

Speaker 7 We were talking about our kids earlier. Maybe you're adding your child to your insurance premium when they get old enough to have a driver's license.

Speaker 7 All of the above, you know, all of the above will be done by your agent. So that's what we're helping companies build.

Speaker 1 Next, we talk to the Sora team at OpenAI, which is building an incredibly realistic AI video generation model.

Speaker 1 In this clip, we talk about their research and how models that understand the world fit into the road to AGI. Is there anything you can say about how the work you've done with Sora

Speaker 1 sort of affects the broader research roadmap?

Speaker 10 Yeah, so I think something here is about

Speaker 10 the knowledge that Sora ends up learning about the world just from seeing all this visual data. It understands 3D, which is one cool thing because we haven't trained it to.

Speaker 10 We didn't explicitly bake 3D information into it whatsoever. We just trained it on video data and it learned about 3D because 3D exists in those videos.

Speaker 10 And it learned that when you take a bite out of a hamburger that you leave a bite mark. So it's learning so much about our world.

Speaker 10 And when we interact with the world, so much of it is visual. So much of what we see and learn throughout our lives is visual information.

Speaker 10 So we really think that just in terms of intelligence, in terms of

Speaker 10 leading toward AI models that are more intelligent, that better understand the world like we do, this will actually be really important: for them to have this grounding of, hey, this is the world that we live in. There's so much complexity in it. There's so much about how people interact, how things happen, how events in the past end up impacting events in the future. This will actually lead to just much more intelligent AI models, more broadly than even generating videos.

Speaker 1 It's almost like you invented the future visual cortex, plus some part of the reasoning parts of the brain, sort of simultaneously.

Speaker 10 Yeah. And that's a cool comparison, because a lot of the intelligence that humans have is actually about world modeling, right?

Speaker 10 All the time when we're thinking about how we're going to do things, we're playing out scenarios in our head. We have dreams where we're playing out scenarios in our head.

Speaker 10 We're thinking in advance of doing things. If I did this, this thing would happen.
If I did this other thing, what would happen, right?

Speaker 10 So we have a world model, and building Sora as a world model is very similar to a big part of the intelligence that humans have.

Speaker 1 How do you guys think about the sort of analogy to humans as having a very approximate world model versus something that is as accurate as like, let's say,

Speaker 1 a physics engine in the traditional sense, right? Because if I hold an apple and I drop it, I expect it to fall at a certain rate.

Speaker 1 But most humans do not think of that as articulating a path with a speed as a calculation.

Speaker 1 Do you think that sort of learning is like parallel in large models?

Speaker 13 I think it's a really interesting observation. I think how we think about things is that it's almost like a deficiency, you know, in humans, that it's not so high fidelity.

Speaker 13 So, you know, the fact that we actually can't do very accurate long-term prediction when you get down to a really narrow set of physics is something that we can improve upon with some of these systems.

Speaker 13 And so we're optimistic that Sora will, you know, supersede that kind of capability and will, in the long run, enable it to be more intelligent one day than humans as a world model.

Speaker 13 But it is, you know, certainly an existence proof that it's not necessary for other types of intelligence.

Speaker 13 Regardless of that, it's still something that Sora and models in the future will be able to improve upon.

Speaker 1 Okay, so it's very clear that the trajectory prediction for, like, throwing a football is going to be better in the next versions of these models than mine is, let's say.

Speaker 10 If I could add something to that, this relates to the paradigm of scale and

Speaker 10 the bitter lesson a bit about how we want methods that as you increase compute get better and better.

Speaker 10 And something that works really well in this paradigm is doing the simple but challenging task of just predicting data. And you can try coming up with more complicated tasks.

Speaker 10 For example, something that doesn't use video explicitly, but is maybe in some like space that simulates approximate things or something.

Speaker 10 But all this complexity actually isn't beneficial when it comes to the scaling laws of how methods improve as you increase scale. And what works really well as you increase scale is just predict data.

Speaker 10 And that's what we do with text. We just predict text.

Speaker 10 And that's exactly what we're doing with visual data with Sora: we're not making something complicated, trying to figure out some new thing to optimize.

Speaker 10 We're saying, hey, the best way to learn intelligence in a scalable matter is to just predict data.

Speaker 1 That makes sense. And relating to what you said, Bill, the predictions will just get much better, with no necessary limit that approximates humans.

Speaker 1 We also sat down with Dmitri Dolgov, co-CEO of Waymo.

Speaker 1 Today, the company is scaling its self-driving fleet, completing over 100,000 fully autonomous rides per week in cities like San Francisco and Phoenix. It's my favorite way to travel.

Speaker 1 In this clip, Dmitri explains why achieving full autonomy, removing the driver entirely, and achieving 100% accuracy rather than 99.99% accuracy in self-driving, is much harder than it might appear.

Speaker 1 Why is it such a break from, let's say, advanced driver assistance, which seems to work in more and more scenarios, versus full autonomy? What's the delta?

Speaker 11 It's the number of nines, right? And it's the nature of this problem, right? If you think about where we started in 2009, one of our first

Speaker 11 milestones, one of the goals that we set for ourselves was to drive

Speaker 11 10 routes. Each one was 100 miles long all over the Bay Area.

Speaker 11 Freeways, downtown San Francisco, around Lake Tahoe, everything. And you had to do 100 miles with no intervention. So the car had to drive autonomously from beginning to end.

Speaker 11 That's the goal that we created for ourselves. You know,

Speaker 11 about a dozen of us, took us maybe 18 months. We achieved that.

Speaker 11 2009: no ImageNet, no ConvNets, no transformers, no big models, tiny computers, right? Very easy to get started. That's always been the property. And with every wave of technology, it's been very easy to get started.
And with every wave of technology, it's been very easy to get started.

Speaker 11 But that's not the hard problem. And it's kind of like that with every wave: the early part of the curve has been getting even steeper and steeper. But that's not where the complexity is.

Speaker 11 The complexity is in the long tail of the many, many, many nines. And you don't see that if you go for a prototype, if you go for a driver-assist system. This is where we've been spending all of our effort; that's the only hard part of the problem, right? And I guess it's been getting easier with every technology cycle. So nowadays, with all of the advances in AI, and especially in the generative AI world with the LLMs and VLMs, you can take an almost off-the-shelf model. Transformers are amazing.

Speaker 11 VLMs are amazing. You can take kind of a VLM that can accept images or video and has a decoder where you can give it your text prompt and it'll output text. And you can fine-tune it with just a little bit of data to go from, let's say, camera data on a car to, instead of words, trajectories, or whatever decisions you want to make. You just take the thing as a black box, you take whatever has been pre-trained, you fine-tune it a little bit. And I think if you asked any good grad student in computer science to build an AV today, this is what they would do.

Speaker 11 Yeah.

Speaker 11 And out of the box, that's amazing, right? Yeah.
Like, the power of transformers, the power of VLMs, is mind-blowing, right?

Speaker 11 So with just a little bit of effort, you get something on the road and it works.

Speaker 11 You can drive, I don't know, test hundreds of miles and just, you know, it will blow your mind.
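
A rough sketch of the off-the-shelf recipe Dmitri outlines: take a pretrained vision(-language) backbone and fine-tune a small head to decode trajectories instead of words. The loader, dimensions, and waypoint format below are illustrative assumptions, not Waymo's actual stack:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrajectoryHead(nn.Module):
    """Maps pooled backbone features to a sequence of (x, y) waypoints."""
    def __init__(self, feature_dim=768, horizon=10):
        super().__init__()
        self.horizon = horizon
        self.proj = nn.Linear(feature_dim, horizon * 2)

    def forward(self, features):          # features: [batch, feature_dim]
        return self.proj(features).view(-1, self.horizon, 2)

# Fine-tuning loop sketch: `backbone` is a hypothetical pretrained
# vision(-language) encoder, frozen; only the small head is trained.
# head = TrajectoryHead()
# opt = torch.optim.AdamW(head.parameters(), lr=1e-4)
# for frames, expert_waypoints in driving_batches:
#     with torch.no_grad():
#         features = backbone(frames)
#     loss = F.mse_loss(head(features), expert_waypoints)
#     opt.zero_grad(); loss.backward(); opt.step()
```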

Speaker 11 But then is that enough? Is that enough to remove the driver and drive millions of miles and have a safety record that is demonstrated really better than humans?

Speaker 5 No.

Speaker 11 Right. And I guess we've seen this with every technology evolution and breakthrough in AI.

Speaker 1 Up next, we have my dear friend Dylan Field, CEO of Figma. Dylan shares his prediction for how user interfaces will evolve in an AI-driven world.

Speaker 1 While many predict a shift toward conversational or agent-based interfaces, Dylan suggests that new interface paradigms will complement existing ones.

Speaker 1 He also highlights the exciting potential of visual AI and intelligent cameras as the next frontier in input methods.

Speaker 6 How do you think about the shift in UI in general that's going to come with AI? A lot of things are kind of collapsing in the short run into chat interfaces.

Speaker 6 There's a lot of people talking about a future agentic world, which does away with most UI altogether. And there's just all programmatic stuff happening in the background.

Speaker 6 How do you think about where UI is going in general right now?

Speaker 12 I mean, I think this kind of comes back to the Rabbit point I was making earlier. Yes, there's a lot of

Speaker 12 innovation happening in terms of agents, but I think like in terms of the way that we use UI to interact with agents, we're just the beginning.

Speaker 12 And I think that the interfaces will get more sophisticated.

Speaker 12 But also, even if they don't, I suspect that it's just like any new media type, when it's introduced, it's not like the old media types go away, right?

Speaker 12 Just because you have TikTok doesn't mean that you no longer watch YouTube.

Speaker 12 Even if it's true that a new form of interaction is via chat interfaces, which I'm not even sure I believe, but even if we take that as a prior on the No Priors podcast, then I think that you still have UI.

Speaker 12 And actually, I think you have more UI and more software than before.

Speaker 6 Do you have any predictions in terms of multimodality? Like, do you think there's more need for voice?

Speaker 6 Like, so, you know, a lot of the debates people have is like, when are you going to use voice versus text versus other types of interfaces? And,

Speaker 6 you know, you could imagine arguments in all sorts of directions in terms of, you know, when do you use what and things like that.

Speaker 6 And a lot of people are, not a lot, some people are suggesting because of the rise of multimodal models, you'll have like more voice input or more things like that, because you'll be able to do real-time sort of smart contextual semantic understanding of like a conversation.

Speaker 6 And so you have more of a verbal conversational UI versus a text-based UI. And so it kind of changes how you think about design.

Speaker 6 So I was just curious if you have any thoughts on that, that sort of future-looking stuff.

Speaker 12 There's all sorts of contexts where a voice UI is really important.

Speaker 12 And I think that it might be that we find that voice UIs start to map to more traditional UIs, because that's something that you could obviously do in a more generalized way.

Speaker 12 But yeah, I mean, personally, I don't want to navigate the information spaces that I interact with every day, all day via voice.

Speaker 12 I also don't want to do it Minority Report-style on the Vision Pro exactly, either.

Speaker 12 Maybe with a keyboard and mouse and, like, an amazing Vision Pro monitor setup or Oculus, that could be cool, but I don't want to do the Minority Report thing.

Speaker 12 And so it's interesting because I think that we get these new glimpses at interaction patterns that are really cool.

Speaker 12 And the natural inclination is to extrapolate and say they're going to be useful for everything. And I think that they have like sort of their role.
And

Speaker 12 it doesn't mean that they're going to be ubiquitous across every interaction we have.

Speaker 12 but that's a natural cycle to be in. And I think it's good.
It's healthy to have sort of that almost mania around what can it do, because if you don't have that, then you don't get to find out.

Speaker 12 And so I'm supportive of people exploring as much as possible, because that's how you make progress on HCI and figure out how to use computers to the fullest potential possible.

Speaker 1 One of the things I am really bullish on is, I mean, maybe you just think of it as an input mode or a peripheral, but it's really hard for people to describe things visually.

Speaker 1 And so the idea of intelligent cameras, even in the like most basic sense.

Speaker 5 Oh, it works.

Speaker 1 It works. I think that's actually a really fun space to be, as you said, like exploring, because I actually think that will be useful.
And it's something that every user is capable of, right?

Speaker 1 Taking pictures, capturing video. And so I'm pretty bullish on that.
To wrap up our favorite moments from 2024, we have Scale CEO Alexandr Wang.

Speaker 1 In this clip, he shares his bold take on the road to AGI.

Speaker 1 Alex also dives into why generalization in AI is harder than many think, and why solving niche problems, with more data and evals, is key to advancing the technology.

Speaker 1 What's something you believe about AI that other people don't?

Speaker 14 My biggest belief here is that the path to AGI is

Speaker 14 one that looks a lot more like curing cancer than developing a vaccine. And what I mean by that is I think that the path to build HEI is going to be in,

Speaker 14 you know, you're going to have to solve a bunch of small problems that where you don't get that much positive leverage between solving one problem to solving the next problem.

Speaker 14 It's like curing cancer, in that you have to zoom into each individual cancer and solve them independently.

Speaker 14 And eventually, over a multi-decade timeframe, we're going to look back and realize that we've, you know, built AGI, we've cured cancer.

Speaker 14 But the path to get there will be this like, quite plodding road of solving individual capabilities and building individual sort of data flywheels to support this end mission.

Speaker 14 Whereas I think a lot of people in the industry paint the path to AGI as: eventually we'll just, boop, we'll get there. We'll solve it in one fell swoop. And I think there's a lot of implications for how you actually think about the technology arc and how society is going to have to deal with it.

Speaker 14 I think it's actually a pretty bullish case for society adapting to the technology, because I think it's going to be consistent, slow progress for quite some time.

Speaker 14 And society will have time to fully, sort of, acclimate to the technology as it develops.

Speaker 1 When you say solve like a problem at a time, right, if we just pull away from the analogy a little bit,

Speaker 1 should I think of that as

Speaker 1 generality of multi-step reasoning is really hard, as Monte Carlo's research is not the answer that people think it might be?

Speaker 1 We're just going to run into scaling walls. Like what sort of what are the dimensions of like solving multiple problems?

Speaker 14 I think the main thing fundamentally is I think there's very limited generality that we get from these models.

Speaker 14 And even for multimodality, for example, my understanding is there's no positive transfer from learning in one modality to other modalities.

Speaker 14 So, like, training off of a bunch of video doesn't really help you that much with your text problems, and vice versa. And so I think what this means is that each niche of capabilities, or each area of capability, is going to require separate data flywheels to be able to push through and drive performance.

Speaker 1 You don't yet believe in video as a basis for a world model that helps?

Speaker 14 I think it's a great narrative. I don't think there's strong scientific evidence of that yet. Maybe there will be eventually. But I think the base case, let's say, is one where there's not that much generalization coming out of the models.

Speaker 14 And so we actually just need to slowly solve lots and lots of little problems to ultimately result in AGI.

Speaker 1 Thank you so much for listening in 2024. We've really enjoyed talking to the people reshaping the world for AI.

Speaker 1 If you want to dive more deeply into any of the conversations you've heard today, we've linked the full episodes in our description.

Speaker 1 Please let us know who you want to hear from and what your questions are for next year. Happy holidays.

Speaker 1 Find us on Twitter at no priors pod. Subscribe to our YouTube channel if you want to see our faces.
Follow the show on Apple Podcasts, Spotify, or wherever you listen.

Speaker 1 That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.