The Robotics Revolution, with Physical Intelligence’s Cofounder Chelsea Finn
Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @ChelseaFinn
Show Notes:
0:00 Introduction
0:31 Chelsea’s background in robotics
3:10 Physical Intelligence
5:13 Defining their approach and model architecture
7:39 Reaching generalizability and diversifying robot data
9:46 Open source vs. closed source
12:32 Where will PI’s models integrate first?
14:34 Humanoid as a form factor
16:28 Embodied intelligence
17:36 Key turning points in robotics progress
20:05 Hierarchical interactive robot and decision-making
22:21 Choosing data inputs
26:25 Self-driving vs. robotics market
28:37 Advice to robotics founders
29:24 Observational data and data generation
31:57 Future robotic forms
Press play and read along
Transcript
Speaker 1 Hi listeners, welcome to No Priors. This week we're speaking to Chelsea Finn, co-founder of Physical Intelligence, a company bringing general-purpose AI into the physical world.
Speaker 1 Chelsea co-founded Physical Intelligence alongside a team of leading researchers and minds in the field.
Speaker 1 She's an associate professor of computer science and electrical engineering at Stanford University, and prior to that, she worked at Google Brain and was at Berkeley.
Speaker 1 Chelsea's research focuses on how AI systems can acquire general-purpose skills through interactions with the world. So, Chelsea, thank you so much for joining us today on No Priors.
Speaker 2 Yeah, thanks for having me.
Speaker 1 You've done a lot of really important storied work in robotics between your work at Google, at Stanford, etc.
Speaker 1 So, I would just love to hear a little bit firsthand your background in terms of your path in the world of robotics, what drew you to it initially, and some of the work that you've done.
Speaker 2 Yeah, it's been a long road.
Speaker 2 At the beginning, I was really excited about the impact that robotics could have in the world, but at the same time, I was also really fascinated by this problem of developing perception and intelligence in machines, and robots embody all of that. And sometimes there's some cool math that you can do as well that keeps your brain active and makes you think. So I think all of that is really fun about working in the field. I started working more seriously in robotics more than 10 years ago at this point, at the start of my PhD at Berkeley, and we were working on neural network control,
Speaker 2 trying to train neural networks that map directly from image pixels to motor torques on a robot arm. At the time, this was not very popular.
Speaker 2 And we've come a long way and it's a lot more accepted in robotics and also just generally something that a lot of people are excited about.
Speaker 2 Since that beginning point, it was very clear to me that we could train robots to do pretty cool things, but that getting the robot to do one of those things in many scenarios with many objects was a major, major challenge.
Speaker 2 So, 10 years ago, we were training robots to screw a cap onto a bottle, use a spatula to lift an object into a bowl, do a tight insertion, or hang a hanger on a clothes rack.
Speaker 2 And so, pretty cool stuff.
Speaker 2 But actually getting the robot to do that in many environments with many objects, that's where a big part of the challenge comes in. And
Speaker 2 I've been thinking about ways to make broader data sets, train on those broader data sets, and also different approaches for learning, whether it be reinforcement learning, video prediction, imitation learning,
Speaker 2 all those things.
Speaker 2 And so, yeah, I spent a year at Google Brain between my PhD and joining Stanford, became a professor at Stanford, started a lab there, and did a lot of work along all these lines.
Speaker 2 And then recently I started Physical Intelligence, almost a year ago at this point, so I've been on leave from Stanford for that.
And it's been really exciting to be able to try to execute on the vision that we co-founders collectively have, and
Speaker 2 do it with a lot of resources and so forth. And I'm also still advising students at Stanford as well.
Speaker 1 That's really cool. And I guess you started Physical Intelligence with four other co-founders and an incredibly impressive team.
Speaker 1 Could you tell us a little bit more about what Physical Intelligence is working on and the approach that you're taking? Because I think it's a pretty unique slant on the whole field and approach.
Speaker 2 Yeah, so we're trying to build a big neural network model that could ultimately control any robot to do anything in any scenario. And
Speaker 2 And a big part of our vision is that, in the past, robotics has focused on trying to go deep on one application, developing a robot to do one thing, and then ultimately gotten kind of stuck in that one application.
Speaker 2 It's really hard to like solve one thing and then try to get out of that and broaden. And instead we're really
Speaker 2 in it for the long term to try to address this broader problem of physical intelligence in the real world. We're thinking a lot about generalization, generalists,
Speaker 2 and
Speaker 2 unlike other robotics companies, we think that
Speaker 2 being able to leverage all of the possible data is very important.
Speaker 2 And this comes down to actually not just leveraging data from one robot, but from any robot platform that might have six joints or seven joints or two arms or one arm.
Speaker 2 We've seen a lot of evidence that you can actually transfer a lot of rich information across these different embodiments, which allows you to use more data.
Speaker 2 And also, if you iterate on your robot platform, you don't have to throw all your data away.
Speaker 2 I have faced a lot of pain in the past where we got a new version of the robot, and then our policy doesn't work.
Speaker 2 And
Speaker 2 it's a really painful process to try to get back to where you were on the previous robot iteration. So, yeah, trying to build generalist robots
Speaker 2 and essentially kind of develop foundation models that will power the next generation of robots
Speaker 2 in the real world.
Speaker 1 That's really cool. Because, I mean, I guess there are a lot of parallels to the large language model world, where really a mixture of deep learning, the transformer architecture,
Speaker 1 and scale has really proven out that you can get real generalizability and different forms of transfer between different areas.
Speaker 1 Could you tell us a little bit more about the architecture you're taking or the approach or, you know, how you're thinking about the basis for the foundation model that you're developing?
Speaker 2 At the beginning, we were just getting off the ground. We were trying to scale data collection.
Speaker 2 And a big part of that is, unlike in language, we don't have Wikipedia or an internet of robot motions. And we're really excited about scaling data on real robots in the real world.
Speaker 2 This kind of real data is what has fueled machine learning advances in the past.
Speaker 2 And a big part of that is we actually need to collect that data and that looks like teleoperating robots in the physical world.
Speaker 2 We're also exploring other ways of scaling data as well, but the kind of bread and butter is scaling real robot data.
Speaker 2 We released something in late October where we showed some of our initial efforts around scaling data and how we can learn very complex tasks like folding laundry, cleaning tables, and constructing a cardboard box.
Speaker 2 Now, where we are in that journey is really thinking a lot about language interaction and generalization to different environments.
Speaker 2 So what we showed in October was the robot in one environment, and it was trained on data in that environment.
Speaker 2 We were able to see some amount of generalization. So it was able to fold shirts it had never seen before, fold shorts it had never seen before, but
Speaker 2 the degree of generalization was very limited. And you also couldn't interact with it in any way.
Speaker 2 You couldn't prompt it and tell it what you want it to do beyond fairly basic things that it saw in the training data.
Speaker 2 And so being able to handle lots of different prompts in lots of different environments is a big focus right now. And in terms of the architecture,
Speaker 2 we're using transformers
Speaker 2 and we are using pre-trained models, pre-trained vision language models. And that allows you to leverage all of the rich information in the internet.
Speaker 2 We had a research result a couple of years ago where we showed that if you leverage vision language models, then you could actually get the robot to do tasks that require concepts that were never in the robot's training data, but were in the internet.
Speaker 2 Like one famous example is that you can ask it to pass the Coke can to Taylor Swift, or a picture of Taylor Swift, and the robot has never seen Taylor Swift in person, but the internet has lots of images of Taylor Swift in it.
Speaker 2 And you can leverage all of the information in that data and the weights of the pre-trained model to kind of transfer that to the robot.
Speaker 2
So we're not starting from scratch and that helps a lot as well. So that's a little bit about the approach.
Happy to dive deeper as well.
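To make the backbone idea above concrete, here is a minimal, hypothetical sketch in PyTorch of a policy built on a pretrained vision-language backbone with a small action head trained on robot demonstrations. The class names, dimensions, and the stub backbone are illustrative assumptions, not Physical Intelligence's actual architecture.

```python
import torch
import torch.nn as nn

class StubVLMBackbone(nn.Module):
    """Placeholder for a real pretrained vision-language model; in practice the
    backbone's weights would come from internet-scale image-text training."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(3 * 224 * 224, embed_dim)

    def forward(self, image: torch.Tensor, instruction_tokens: torch.Tensor) -> torch.Tensor:
        # A real VLM would fuse the image with the tokenized instruction;
        # here we just flatten the image to keep the sketch runnable.
        return self.proj(image.flatten(1))

class VisionLanguageActionPolicy(nn.Module):
    """The pretrained backbone embeds (image, instruction); a small action head,
    trained on robot data, maps that embedding to continuous motor commands."""
    def __init__(self, backbone: nn.Module, embed_dim: int = 512, action_dim: int = 14):
        super().__init__()
        self.backbone = backbone
        self.action_head = nn.Sequential(
            nn.Linear(embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, image: torch.Tensor, instruction_tokens: torch.Tensor) -> torch.Tensor:
        features = self.backbone(image, instruction_tokens)  # web knowledge lives in these weights
        return self.action_head(features)                    # e.g. target joint commands

# Usage with random inputs, just to show the shapes involved:
policy = VisionLanguageActionPolicy(StubVLMBackbone())
actions = policy(torch.randn(1, 3, 224, 224), torch.zeros(1, 16, dtype=torch.long))
print(actions.shape)  # torch.Size([1, 14])
```

The point of the sketch is only the division of labor: concepts like "Taylor Swift" come in through the pretrained backbone, while the comparatively small amount of robot data trains the mapping to actions.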
Speaker 1 That's really amazing. And then
Speaker 1 what do you think is the main basis then for really getting to generalizability? Is it scaling data further? Is it scaling compute? Is it a combination of the two? Is it other forms of post-training?
Speaker 1 Like, I'm just sort of curious, like, as you think through the common pieces that people look at now.
Speaker 1 I'm sort of curious what you think needs to get filled in.
Speaker 1 Obviously, on the, again, the more language model world, people are spending a lot of time on reasoning modules and other things like that as well.
Speaker 1 So I'm curious, like, what are the components that you feel are missing right now?
Speaker 2 Yeah, so I think the number one thing, and this is kind of the boring thing, is just getting more diverse robot data.
Speaker 2 So for that release that we had in late October last year, we collected data in three buildings, technically.
Speaker 2 The internet, for example, and everything that has fueled language models and vision models, is way, way more diverse than that, because the internet is pictures taken by lots of people and text written by lots of different people.
Speaker 2 And so just trying to collect data in many more diverse places and with many more objects, many more tasks. So scaling the diversity of the data, not just the quantity of the data is very important.
Speaker 2 And that's a big thing that we're focusing on right now: actually bringing our robots into lots of different places and collecting data in them.
Speaker 2 As a side product of that, we also learn what it takes to actually get your robot to be operational and functional in lots of different places.
Speaker 2 And that is a really nice byproduct, because if you actually want to get robots to work in the real world, you need to be able to do that. So that's the number one thing, but then we're also exploring other things: leveraging videos of people, again leveraging data from the web, leveraging pre-trained models,
Speaker 2 thinking about reasoning, although more basic forms of reasoning. In order to, for example, put a dirty shirt into a hamper:
Speaker 2 If you can recognize where the shirt is and where the hamper is and what you need to do to accomplish that task,
Speaker 2 that's useful. Or if you want to make a sandwich and the user has a particular request in mind, you should reason through that request.
Speaker 2 If they're allergic to pickles, you probably shouldn't put pickles on the sandwich.
Speaker 2 Things like that. So there's some basic things around there, although the number one thing is just more diverse robot data.
Speaker 1 And then I think a lot of the approach you've taken to date has really been an emphasis on releasing open source models and packages for robotics. Do you think that's the long-term path?
Speaker 1 Do you think it's open core? Do you think it's eventually proprietary models? Or how do you think about that in the context of the industry?
Speaker 1 Because it feels like there are a few different robotics companies now, each taking different approaches: some are doing hardware plus software and are focused on a specific hardware footprint.
Speaker 1 And there's software only, and there's closed source versus open source if you're just doing the software. So I'm sort of curious where in that spectrum Physical Intelligence lies.
Speaker 2 Definitely. So we've actually been quite open.
Speaker 2 Not only have we open-sourced some of the weights and released details in technical papers, we've actually also been working with hardware companies and giving robot designs to hardware companies.
Speaker 2 When I tell people this, sometimes they're actually really shocked: what about the IP? What about, I don't know, confidentiality and stuff like that?
Speaker 2 And we've actually made a very intentional choice around this. There are a couple reasons for it.
Speaker 2 One is that we think the field is really just at the beginning, and these models will be so, so much better, and the robots should be so, so much better, in a year, in three years.
Speaker 2 And we want to support the development of the research. And we want to support the community, support the robots, so that
Speaker 2 when we hopefully develop the technology of these generalist models, the world will be more ready for it.
Speaker 2 We'll have better, more robust robots that are able to leverage those models, and people who have the expertise and understand what it requires to use those models.
Speaker 2 And then the other thing is also like, we have a really fantastic team of researchers and engineers and
Speaker 2 really, really fantastic researchers and engineers want to work at companies that
Speaker 2 are open, especially researchers, where they can get kind of credit for their work and share their ideas, talk about their ideas.
Speaker 2 And we think that having the best researchers and engineers will be necessary for solving this problem.
Speaker 2 The last thing that I'll mention is that I think the biggest risk with this bet is that it won't work. Like I'm not really worried about competitors.
Speaker 2 I'm more worried that no one will solve the problem.
Speaker 1 Oh, interesting. And why do you worry about that?
Speaker 2 I think robotics is, it's very hard.
Speaker 2 And there have been many, many failures in the past. And unlike when you're like recognizing an object in an image, there's very little tolerance for error.
Speaker 2 You can miss a grasp on an object or not. Like, the difference between making contact and not making contact with an object is so small.
Speaker 2 And it has a massive impact on
Speaker 2
the outcome of whether the robot can actually successfully manipulate the object. And I mean, that's just one example.
There's challenges on the data side of collecting data.
Speaker 2 Well, just anything involving hardware is hard as well.
Speaker 1 I guess we have a number of examples now of robots in the physical world.
Speaker 1 You know, everything from autopilot on a jet through to some forms of pick-and-pack or other types of robots in distribution centers.
Speaker 1 And there's obviously the different robots involved with manufacturing, particularly in automotive, right? So there's been a handful of more constrained environments where
Speaker 1 people have been using them in different ways.
Speaker 1 Where do you think the impact of these models will first show up? Because to your point, there are certain things where you have very low tolerance for error.
Speaker 1 And then there's a lot of fields where actually it's okay, or maybe you can constrain the problem sufficiently relative to the capabilities of the model that it works fine.
Speaker 1 Where do you think Physical Intelligence will have the nearest-term impact? Or where, in general, will the field of robotics and these new approaches substantiate themselves?
Speaker 2 Yeah, as a company, we're really focused on the long-term problem and not on any one particular application, because of the failure modes that can come up when you focus on one application.
Speaker 2 I don't know
Speaker 2 where the first applications will be.
Speaker 2 I think one thing that's actually challenging is that typically in machine learning, a lot of the successful applications of like recommender systems, language models,
Speaker 2 like image detection, a lot of the consumers of the model outputs are actually humans who can actually check it. And the humans are good at the task.
Speaker 2 A lot of the very natural applications of robots are actually the robot doing something autonomously on its own, where it's not a human consuming the commanded arm position, for example,
Speaker 2 and then checking it and then validating it and so forth.
Speaker 2 And so I think we need to think about new ways of having some kind of tolerance for mistakes or scenarios where that's fine or scenarios where humans and robots can work together.
Speaker 2 That's, I think, one big challenge that will come up when trying to actually deploy these.
Speaker 2 And some of the language interaction work that we've been doing is actually motivated by this challenge where we think it's really important for humans to be able to kind of provide input for how they want the robot to behave and what they want the robot to do, how they want the robot to help in a particular scenario.
Speaker 2 That makes sense.
Speaker 1 I guess the other form of generalizability to some extent, at least in our current world, is the human form, right?
Speaker 1 And so, some people are specifically focused on humanoid robots like Tesla and others under the assumption that the world is designed for people and therefore is the perfect form factor to coexist with people.
Speaker 1 And then, other people have taken very different approaches in terms of saying, Well, I need something that's more specialized for the home in certain ways or for factories or manufacturing, or you name it.
Speaker 1 What is your view on kind of humanoid versus not?
Speaker 2 On one hand, I think humanoids are really cool. And I have one in my lab at Stanford.
Speaker 2 On the other hand, I think that they're a little overrated.
Speaker 2 And one way to look at it practically is that I think we're generally fairly bottlenecked on data right now.
Speaker 2 And some people argue that with humanoids, you can maybe collect data more easily because it matches the human form factor. And so maybe it'd be easier to mimic humans.
Speaker 2 And I've actually heard people make those arguments, but if you've ever actually tried to teleoperate a humanoid, it's actually a lot harder to teleoperate than a static manipulator or a mobile manipulator with wheels.
Speaker 2 Optimizing for being able to collect data, I think is very important
Speaker 2 because if we can get to the point where we have more data than we could ever want, then it just comes down to research and compute and evaluations.
Speaker 2 That's one of the things we're optimizing for. And so we're using cheap robots.
Speaker 2 We're using robots that we can very easily develop teleoperation interfaces for, in which you can do teleoperation very quickly and collect diverse data, collect lots of data.
Speaker 1 Yeah, it's funny. There was that viral fake Kim Kardashian video of her going shopping with the robot following her around, carrying all of her shopping bags.
Speaker 1 When I saw that, I really wanted a humanoid robot to follow me around everywhere. I thought it'd be really funny to do that.
Speaker 1 So I'm hopeful that someday I can use your software to cause a robot to follow me around to do things. So exciting future.
Speaker 1 How do you think about the embodied model of development versus not on some of these things? That's another set of trade-offs that I think some people are making or deciding between.
Speaker 2 A lot of the AI community is very focused on just language models, vision-language models, and so forth. And there's a ton of hype around reasoning and stuff like that.
Oh, let's create like the most intelligent thing.
Speaker 2 I feel like people actually underestimate how much intelligence goes into motor control.
Many, many years of evolution is what led to us being able to use our hands the way that we do.
Speaker 2 And there are many animals that can't do it, even though they also had so many years of evolution.
Speaker 2 And so I think that there's actually so much complexity and intelligence that goes into being able to do something as basic as like make a bowl of cereal or pour a glass of water. And
Speaker 2 Yeah, so in some ways, I think that embodied intelligence or physical intelligence is actually very core to
Speaker 2 intelligence and maybe kind of underrated compared to some of the less embodied models.
Speaker 1 One of the papers that I really loved over the last couple of years in robotics was your ALOHA paper. And I thought it was a very clever approach.
Speaker 1 What is some of the research over the last two or three years that you think has really caused this flurry of activity?
Speaker 1 Because I feel like there's been a number of people now starting companies in this area because a lot of people feel like now's the time to do it.
Speaker 1 And I'm a little bit curious what research you feel was the basis for that shift and people thinking this was a good place to work.
Speaker 2 At least for us, there were a few things that felt like turning points, where it felt like the field was moving a lot faster compared to where it was before.
Speaker 2 One was
Speaker 2 the SayCan work, where we found that you can plan with language models as kind of the high-level part, and then plug that in with a low-level model to get a robot to do long-horizon tasks.
Speaker 2 One was the RT-2 work, which showed that you could do the Taylor Swift example that I mentioned earlier and be able to plug in a lot of the web data and get better generalization on robots.
Speaker 2 A third was our RT-X work, where
Speaker 2 we actually were able to train models across robot embodiments. We basically took all the robot data that
Speaker 2 different research labs had. It was a huge effort to aggregate that into a common format and train on it.
Speaker 2 And when we trained on that, we actually found that we could take a checkpoint and send that model checkpoint to another lab
Speaker 2 halfway across the country, and the grad student at that lab could run the checkpoint on their robot, and it would actually, more often than not, do better than the model that they had specifically iterated on themselves in their own lab. That was another big sign that this stuff is actually starting to work, and that you can get benefit by pooling data across different robots. And then also, like you mentioned, I think the ALOHA work, and later the Mobile ALOHA work, showed that you can teleoperate and train models to do pretty complicated dexterous manipulation tasks.
Speaker 2
We also had a follow-up paper with the shoelace tying. That was a fun project, because someone said that they would retire if they saw a robot tie shoelaces.
Did they retire? They did not retire.
Speaker 2 They're still in the field.
Speaker 1 We need to force them into retirement, whoever that person is. We need to follow up on that.
Speaker 2 Yeah, so I think that those are a few examples.
Speaker 2 And so, yeah, I think we've seen a ton of progress in the field. I also,
Speaker 2 it seems like
Speaker 2 after we started Pi, that was also kind of a sign to others that if the experts are really willing to bet on this, then
Speaker 2 maybe something will happen.
Speaker 1 So, one thing that you all came out with today from Pi was what you call a hierarchical interactive robot, or Hi Robot. Can you tell us a little bit more about that?
Speaker 2
So, this was a really fun project. There's two things that we're trying to look at here.
One is that
Speaker 2 if you need to do a longer-horizon task, meaning a task that might take minutes to do, then
Speaker 2 if you just train a single policy to output actions based on images, like, if you're trying to make a sandwich and you train a policy that's just outputting the next motor command,
Speaker 2
that might not do as well as something that's actually kind of thinking through the steps to accomplish that task. That was kind of the first component.
That's where the hierarchy comes in.
Speaker 2 And the second component is a lot of the times when we train robot policies, we're just saying, like, we'll take our data, we'll annotate it, and say, like, this is picking up the sponge, this is putting the bowl in the bin, this segment is, I don't know, folding the shirt.
Speaker 2 And then you get a policy that can like follow those basic commands of like fold the shirt or pick up the cup, those sorts of things.
Speaker 2
But at the end of the day, we don't want robots just to be able to do that. We want them to be able to interact with us where we can say like, oh, I'm a vegetarian.
Can you make me a sandwich?
Speaker 2
Oh, and I'm allergic to pickles, so maybe don't include those.
Speaker 2 And maybe also be able to interject in the middle and say like, oh, hold off on the tomatoes or something.
Speaker 2 There's actually kind of a big gap between something that can just follow an instruction like pick up the cup and something that can handle those kinds of prompts and those situated corrections and so forth.
Speaker 2 And so we developed a system that basically has one model that takes as input the prompt, kind of reasons through it, and is able to output the next step that the robot should follow.
Speaker 2 And that might be, for example, pick up the tomato. And then a lower-level model takes as input pick up the tomato and outputs the sequence of motor commands for the next half second or so. That's the gist of it. It was a lot of fun, because we actually got the robot to make a vegetarian sandwich or a ham and cheese sandwich or whatever.
Speaker 2 We also did a grocery shopping example and a table cleaning example.
Speaker 2 And I was excited about it first because it was just like cool to see the robot be able to respond to different prompts and do these challenging tasks.
Speaker 2 And second, because it actually seems like the right approach for solving the problem.
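As a rough illustration of the structure described here, the sketch below shows a two-level control loop: a high-level model turns the user's open-ended prompt into the next short instruction, and a low-level policy turns that instruction into a brief chunk of motor commands. Every name, number, and stub in it is an assumption for illustration, not Physical Intelligence's actual Hi Robot implementation.

```python
import time

class HighLevelPlanner:
    """Stands in for the high-level model: reads the user's prompt plus the
    current camera image and emits the next short instruction for the robot."""
    def next_step(self, prompt: str, image) -> str:
        # A real system would query a vision-language model here.
        return "pick up the tomato"

class LowLevelPolicy:
    """Stands in for the low-level policy: maps (instruction, image) to a short
    chunk of motor commands covering roughly the next half second."""
    def act(self, instruction: str, image) -> list:
        return [[0.0] * 14 for _ in range(25)]  # e.g. 25 control steps x 14 joint targets

def run(prompt: str, get_image, send_command, seconds: float = 10.0) -> None:
    """Outer loop: re-plan at the instruction level, execute at the motor level."""
    planner, policy = HighLevelPlanner(), LowLevelPolicy()
    deadline = time.time() + seconds
    while time.time() < deadline:
        image = get_image()                      # current observation
        step = planner.next_step(prompt, image)  # e.g. "pick up the tomato"
        for command in policy.act(step, image):  # ~0.5 s of motor commands
            send_command(command)

# Usage with trivial stand-ins for the robot's camera and motor interface:
run("make me a sandwich, but I'm allergic to pickles",
    get_image=lambda: None,
    send_command=lambda cmd: None,
    seconds=0.2)
```

The split lets corrections like "hold off on the tomatoes" be handled at the instruction level, while the low-level policy only ever sees simple commands of the kind that appear in its training data.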
Speaker 1 On the technical capability side, one thing I was wondering about a little bit was
Speaker 1 If I look at the world of self-driving, there's a few different approaches that are being taken.
Speaker 1 And one of the approaches that is the more kind of Waymo-centric one is really incorporating a variety of other types of sensors besides just vision.
Speaker 1 So you have LiDAR and a few other things as ways to augment the self-driving capabilities of a vehicle. Where do you think we are in terms of the sensors that we use in the context of robots?
Speaker 1 Is there anything missing? Is there anything we should add? Are there types of inputs or feedback that we need to incorporate that haven't been incorporated yet?
Speaker 2 So we've gotten very far just with vision, with RGB images, even.
Speaker 2 And we typically will have one or multiple external kind of what we call base cameras that are looking at the scene, and also cameras mounted to each of the wrists of the robot.
Speaker 2 We can get very, very far with that. I would love it if we could give our robot skin.
Speaker 2 Unfortunately, a lot of the tactile sensors that are out there are either far less robust than skin, far more expensive,
Speaker 2 or
Speaker 2 very, very low resolution. And so there's a lot of kind of challenges on the hardware side there.
Speaker 2 And we found that mounting RGB cameras to the wrists ends up being very, very helpful and probably gives you a lot of the same information that tactile sensors can give you.
Speaker 1 Because when I think about the set of sensors that are incorporated into a person, obviously to your point, there's the tactile sensors effectively, right? And then there's heat sensors.
Speaker 1 There's actually a variety of things that are incorporated that people usually don't really think about much. Absolutely.
Speaker 1 And I'm just sort of curious, like how many of those are actually necessary in the context of robotics versus not, what are some of the things we should think about?
Speaker 1 Like, just if we extrapolate off of humans or animals or others, you know?
Speaker 2 It's a great question. I mean, for the sandwich making, you could argue that you'd want the robot to be able to taste the sandwich to know if it's good or not.
Speaker 1 Or smell it, at least, you know?
Speaker 2 Yeah, I've made a lot of arguments for smell to Sergey in the past, because there's a lot of nice things about smell, although we've never actually attempted it before. Yeah.
Speaker 2 In some ways, the redundancy is nice.
Speaker 2 Audio, for example: as a human, if you hear something that's unexpected, it can actually kind of alert you to something.
Speaker 2 In many cases, it might actually be very, very redundant with your other sensors because you might be able to actually see something fall, for example. And that redundancy can lead to robustness.
Speaker 2 For us, it's currently not a priority to look into these sensors, because we think that the bottleneck right now is elsewhere: it's on the data front, on the architectures and so forth.
Speaker 2 The other thing that I'll mention is that right now our policies do not have any memory. They only look at the current image frame.
Speaker 2 They can't remember even half a second prior.
Speaker 2 And so I would much rather add memory to our models before we add other sensors. We can have
Speaker 2 commercially viable robots for a number of applications without other sensors.
Speaker 1 What do you think is a timeframe on that?
Speaker 2
I have no idea. Yeah.
There are some parts of robotics that make it easier than self-driving and some parts that make it harder. On one hand,
Speaker 2 it's harder because it's just a much higher-dimensional space. Like, even our static robots have 14 dimensions, seven for each arm.
Speaker 2
You need to be more precise in many scenarios than driving. We also don't have as much data right off the bat.
On the other hand, with driving, I feel like you kind of need to solve the
Speaker 2 entire distribution to have anything that's viable.
Speaker 2 Like you have to be able to handle an intersection at any time of day or with any kind of possible pedestrian scenario or other cars and all that.
Speaker 2 Whereas in robotics, I think that there's lots of commercial use cases where you don't have to handle this whole huge distribution.
Speaker 2 And you also don't have as much of a safety risk as well. That makes me optimistic.
Speaker 2 And I think that also like all the results in self-driving have been very encouraging, especially like the number of Waymos that I see in San Francisco.
Speaker 1 Yeah, it's been very impressive to watch them scale up usage.
Speaker 1 The thing I found striking about the self-driving world is, you know, there were roughly two dozen startups started, I don't know, 10 to 15 years ago, around self-driving.
Speaker 1 And the industry is largely consolidated, at least in the US, and obviously the China market's a bit different, but it's consolidated into Waymo and Tesla, which effectively were two incumbents, right?
Speaker 1 Waymo is Google, and Tesla was an automaker. And then there's maybe one or two startups that either SPAC'd and went public or are still kind of working in the area.
Speaker 1 And then most of it's kind of fallen off, right? And the set of players that existed at that starting moment 10, 15 years ago is kind of the same players that ended up actually winning, right?
Speaker 1 There hasn't been a lot of dynamism in the industry other than just consolidation.
Speaker 1 Do you think that the main robotics players are the companies that exist today? And do you think there's any sort of incumbency bias that's likely?
Speaker 2 A year ago, it would have been completely different.
Speaker 2 And I think that we've had so many new players recently. I think that the fact that self-driving was like that
Speaker 2 suggested that it might have been a bit too early 10 years ago.
Speaker 2 And I think that arguably it was. I think deep learning has come a long, long way since then.
Speaker 2 And so I think that that's also part of it.
Speaker 2 And I think the same with robotics. Like, if you were to ask me 10 years ago, or even
Speaker 2 five years ago, honestly, I think it would have been too early.
Speaker 2
I think the technology wasn't there yet. We might still be too early for all we know.
I mean, it's a very hard problem.
Speaker 2 And like how hard self-driving has been is, I think, is a testament to how hard it is to build intelligence in the physical world.
Speaker 2 In terms of like major players, there's a lot of things that I've really really liked about the startup environment and a lot of things that were very hard to do when I was at Google.
Speaker 2 And Google is an amazing place in many, many ways. But like as one example, taking a robot off campus was like almost a non-starter just for code security reasons.
Speaker 2 And if you want to collect diverse data, taking robots off campus is valuable. You can move a lot faster when you're a smaller company and you don't have
Speaker 2
kind of restrictions, red tape, that sort of thing. The really big companies have a ton of capital, and so they can last longer.
But I also think that
Speaker 2 they're going to move slower too.
Speaker 1 If you were to give advice to somebody thinking about starting a robotics company today, what would you suggest they do or where would you point them in terms of what to focus on?
Speaker 2 I think the main advice that I would give someone trying to start a company would be to
Speaker 2 try to learn as much as possible, quickly. And I think that actually means trying to deploy quickly and learn and iterate quickly.
Speaker 2 That's probably the main advice: try to actually get the robots out there and learn from that.
Speaker 2 I'm also not sure if I'm the best person to be giving startup advice because I've only been an entrepreneur myself for 11 months, but
Speaker 2 yeah, that's probably the advice I'd give.
Speaker 1
That's cool. I mean, you're running an incredibly exciting startup.
So I think you have full ability to
Speaker 1 suggest stuff to people in that area for sure. One thing I've heard a number of different groups are doing is using observational data of people as part of the training set.
Speaker 1 So that could be YouTube videos, it could be things that they're recording specifically for the purpose. How do you think about that in the context of training robotic models?
Speaker 2 I think that data can have a lot of value, but I think that by itself, it won't get you very far. And I think that there's actually some really nice analogies you can make where,
Speaker 2 for example, if you watch an Olympic swimmer race,
Speaker 2 even if you had their strength,
Speaker 2 just their practice at moving their own muscles to accomplish what they're accomplishing is essential for being able to do it. Or if you're trying to learn how to hit a tennis ball well, you won't be able to learn it by just watching the pros. Now, maybe these examples seem a little bit contrived because they're talking about experts. The reason I make those analogies is that we humans are already experts at low-level motor control for a variety of things that our robots are not, and I think the robots actually need experience from their own body in order to learn. And so I think it's really promising to be able to leverage that form of data, especially to expand on the robot's own experience.
Speaker 2 But it's really going to be essential to actually have the data from the robot itself, too.
Speaker 1 In some of those cases, is that just general data that you're generating around that robot, or would you actually have it mimic certain activities? Or how do you think about the data generation?
Speaker 1 Because you mentioned a little bit about the transfer and generalizability. It's interesting to ask, well, what is generalizable or not, and what types of data are and aren't, and things like that.
Speaker 2 I mean, when we collect data, it's kind of like puppeteering, like the original ALOHA work.
Speaker 2 And then you can record both the actual motor commands and the
Speaker 2 sensor readings, like the camera images. And so that is the experience for the robot.
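For concreteness, here is a minimal sketch of what that kind of teleoperation logging could look like: at each control step, store the camera frames alongside the motor command the human operator issued, yielding the (observation, action) pairs used for imitation learning. The function and field names, the 50 Hz rate, and the camera layout are illustrative assumptions, not Physical Intelligence's actual tooling.

```python
import time

def record_episode(read_cameras, read_operator_command, hz: float = 50.0, seconds: float = 5.0):
    """Collect one demonstration as a list of {images, action, timestamp} frames."""
    episode, period = [], 1.0 / hz
    end = time.time() + seconds
    while time.time() < end:
        episode.append({
            "images": read_cameras(),            # e.g. base camera plus both wrist cameras
            "action": read_operator_command(),   # motor command from the puppeteering interface
            "timestamp": time.time(),
        })
        time.sleep(period)
    return episode

# Usage with dummy stand-ins for the hardware:
demo = record_episode(
    read_cameras=lambda: {"base": None, "left_wrist": None, "right_wrist": None},
    read_operator_command=lambda: [0.0] * 14,
    seconds=0.1,
)
print(len(demo), "frames recorded")
```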
Speaker 2 And then I also think that autonomous experience will play a huge role, just like we've seen in language models after you get an initial language model.
Speaker 2 If you can use reinforcement learning to have the robot, like the language model, bootstrap on its own experience, that's extremely valuable. Yeah.
Speaker 2 And then in terms of what's generalizable versus not, I think it all comes down to the breadth of the distribution. It's really hard to quantify or measure how broad the robot's own experience is.
Speaker 2 And there's no way to categorize the breadth of the tasks, like how different one task is from another, how different one kitchen is from another, that sort of thing.
Speaker 2 But we can at least get a rough idea for that breadth by like looking at things like the number of buildings or the number of scenes,
Speaker 2 those sorts of things.
Speaker 1 And then I guess we talked a little bit about humanoid robots and other sort of formats.
Speaker 1 If you think ahead in terms of the form factors that are likely to exist in N years as this sort of robotic future comes into play, do you think there's one singular form?
Speaker 1 Are there a handful? Is it a rich ecosystem, just like in biology? Like, how do you think about what's going to come out of all this?
Speaker 2 I don't know exactly, but I think that my bet would be on something where there's actually a
Speaker 2 really wide range of different robot platforms.
Speaker 2 I think Sergey, my co-founder, likes to call it a Cambrian explosion of different robot hardware types and so forth, once we actually have the technology, the intelligence, that can power all of those different robots.
Speaker 2 And I think it's kind of similar to like, we have all these different devices in our kitchen, for example, that can do all these different things for us.
Speaker 2 rather than just one device that cooks the whole meal for us.
Speaker 2 And so I think we can envision a world where there's one kind of robot arm that does things in the kitchen, that has some hardware that's optimized for that, and maybe also optimized to be cheap for that particular use case,
Speaker 2 and another piece of hardware that's designed for folding clothes or dishwashing, those sorts of things. This is all speculation, of course, but I think a world like that is, I think, different from what a lot of people think about. In the book The Diamond Age, there's sort of this view of matter pipes going into homes, and you have these 3D printers that make everything for you.
Speaker 1 And in one case, you're like downloading schematics and then you 3D print the thing.
Speaker 1 And then people who are kind of bootlegging some of this stuff end up with almost evolutionarily based processes to build hardware and then select against certain functionality as the mechanism by which to optimize things.
Speaker 1 Do you think a feature like that is at all likely?
Speaker 1 Or do you think it's more just, hey, you make the foundation model really good, you have a couple of form factors and, you know, you don't need that much specialization if you have enough generalizability in the actual underlying intelligence?
Speaker 2 I think a world like that is very possible.
Speaker 2 I think that you can make a cheaper piece of hardware if you are optimizing for a particular use case, and maybe it would also be a lot faster and so forth.
Speaker 2 Yeah, obviously very hard to predict.
Speaker 1 Yeah, it's super hard to predict because one of the arguments for a smaller number of hardware platforms is just supply chain, right?
Speaker 1 It's just going to be cheaper at scale to manufacture all the subcomponents and therefore you're going to collapse down to fewer things because unless there's a dramatic cost advantage, those fewer things will be more easily scalable, reproducible, cheap to make, et cetera, right?
Speaker 1 If you look at sort of general hardware approaches. So it's an interesting question in terms of that trade-off between those two tensions.
Speaker 2 Yeah, although maybe we'll have robots in the supply chain that can manufacture any customizable device that you want.
Speaker 1
It's robots all the way down. So that's our future.
Yeah. Well, thanks so much for joining me today.
It was a super interesting conversation. We covered a wide variety of things.
Speaker 1 So I really appreciate your time.
Speaker 2 Yeah, this is fun.
Speaker 3
Find us on Twitter at NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces.
Follow the show on Apple Podcasts, Spotify, or wherever you listen.
Speaker 3 That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.