Sunday Robotics: Scaling the Home Robot Revolution with Co-Founders Tony Zhao and Cheng Chi
Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @tonyzzhao | @chichengcc | @sundayrobotics
Chapters:
00:00 – Tony Zhao and Cheng Chi Introduction
00:56 – State of AI Robotics
02:11 – Deploying a Robot Pre-AI
03:13 – Impact of Diffusion Policy
04:29 – Role of ACT and ALOHA
07:02 – Imitation Learning - Enter UMI
10:38 – Introducing Sunday
11:57 – Sunday’s Robot Design Philosophy
15:05 – Sunday’s Shipping Timeline
19:02 – Scale of Sunday’s Training Data
23:58 – Importance of Data Quality at Scale
24:56 – Technical Challenges
27:59 – When Will People Have Home Robots?
30:48 – Failures of Past Demos
32:34 – Sunday’s Demos
36:53 – What Sunday’s Hiring For
39:10 – Conclusion
Press play and read along
Transcript
Speaker 1
Nobody wants to do their dishes. Nobody wants to do their laundry.
People will love to spend more time with their family, with their loved ones.
Speaker 1 So what we believe in is that if the robot is cheap, safe, and capable, everyone will want our robot.
Speaker 1 And we see a future where we have more than 1 billion of these robots in people's homes within a decade.
Speaker 1 Thanks, Memo.
Speaker 2
Hi, listeners. Welcome back to No Priors.
Today we're here with Tony Zhao and Cheng Chi, co-founders of Sunday, makers of Memo, the first general home robot.
Speaker 2 We'll talk about AI and robotics, data collection, building a full-stack robotics company, and a world beyond toil.
Speaker 1 Welcome.
Speaker 2 Cheng, Tony, thanks for being here.
Speaker 1 Thanks for having me. Thanks.
Speaker 2 Okay, first I want to ask, like, where are we here? Because classical robotics has not been an area of great optimism over time or like massive velocity of work.
Speaker 2 And now people are talking about a foundation model for robotics or a ChatGPT moment. Can you just contextualize the state of AI robotics and why we should be excited?
Speaker 1 I would say I think we're kind of in between the GPT moment and the ChatGPT moment.
Speaker 1 Like in the context of LLMs, what it means is that it seems like we have a recipe that can be scaled, but we haven't scaled it up yet.
Speaker 1 And we haven't scaled it up so much that we can have a great consumer product out of it. So this is what I mean by GPT, which is a technology, versus ChatGPT, which is a product.
Speaker 1 Yeah, so across academia, we're seeing consensus around what the method for manipulation is, and everybody's talking about scaling up. We know there are signs of life for the algorithms
Speaker 1 people are picking, but people don't know what will happen if we have more data, like what happened from GPT-2 to GPT-3.
Speaker 1 But we see a clear trend, and
Speaker 1 there's no reason to believe that robotics won't follow the trajectory of other AI fields, where scaling up improves performance.
Speaker 2 Maybe even take a step back: what was the process for deploying a robot into the world ten years ago, before this set of generalizable AI algorithms? Why was it so slow as a field?
Speaker 1 Yeah, so previously,
Speaker 1 you know, classical robotics have this sense plan act modular approach, where there's a human designing interface between each of the modules.
Speaker 1 And those are need to be designed for each specific task and each specific environment. In academia, that means for every task, that means a paper.
Speaker 1 So a paper is you design a task, design an environment, and you design interfaces, and then you produce engineering work for that specific task.
Speaker 1 But once you move on to the next task, you throw away all your code, all your work, and you start over again. And that's also kind of what happened to industry.
Speaker 1 So for each application, people build a very specific software software and hardware system around it, but it's not really generalizable.
Speaker 1 And therefore, it's just feel like we're just running in loops. We build one system and then we build the next one, but there's like no synergy between them.
Speaker 1 And as a result, the progress has been somewhat slow.
Speaker 2 I feel like that's a good segue into some of the amazing research work that you guys have contributed over the last five years to the field. Should we start with diffusion policy?
Speaker 2 What was the impact of that?
Speaker 1 Yeah, so diffusion policy is a specific algorithm for a paradigm called imitation learning, which is really the most intuitive way to use machine learning for robotics.
Speaker 1
So you collect paired observation-action data of what the robot should do. You use that to train a model with supervised learning.
And then the robot does the same thing.
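In code, the imitation-learning recipe described here is just supervised regression from observations to demonstrated actions. A minimal behavior-cloning sketch in PyTorch; the network, shapes, and fake batch below are illustrative assumptions, not the papers' or Sunday's actual models:

```python
import torch
import torch.nn as nn

# Toy behavior cloning: paired (observation, action) data, supervised loss.
# OBS_DIM / ACT_DIM are made-up stand-ins (e.g. image features in,
# arm joint targets out).
OBS_DIM, ACT_DIM = 64, 7

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

def train_step(obs: torch.Tensor, act: torch.Tensor) -> float:
    """One supervised step: make the policy imitate the demonstrator."""
    loss = nn.functional.mse_loss(policy(obs), act)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Fake demonstration batch standing in for collected demos.
print(train_step(torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM)))
```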
Speaker 1 The problem is that in the field, it's known to be very finicky.
Speaker 1 When I started in the field and talked to researchers, people said the specific researcher themselves needed to collect the data,
Speaker 1 so that there's exactly one way to do everything. Otherwise, either the model training will diverge or the robot will behave in some weird way.
Speaker 1 And the diffusion model really allows us to capture multiple modes of behavior for the same observation in a way that still preserves training stability.
Speaker 1 And that really kind of unlocked more scalable training and more scalable data collection.
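To make the multimodality point concrete: plain regression averages conflicting demonstrations (two valid ways to grab a cup blend into one invalid in-between action), while a diffusion policy learns to denoise actions conditioned on the observation, so distinct modes survive. A schematic DDPM-style training loss in the spirit of the Diffusion Policy paper, not its actual implementation; all shapes and the noise schedule are assumptions:

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, N_STEPS = 64, 7, 100

# Network predicts the noise that was added to a demonstrated action,
# given the observation and the diffusion timestep.
eps_net = nn.Sequential(
    nn.Linear(ACT_DIM + OBS_DIM + 1, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)

betas = torch.linspace(1e-4, 0.02, N_STEPS)      # simple linear schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
    """Noise-prediction loss: the model learns the full conditional action
    distribution, so two different valid actions for the same observation
    are not averaged into a meaningless midpoint."""
    t = torch.randint(0, N_STEPS, (obs.shape[0],))
    eps = torch.randn_like(act)
    ab = alpha_bars[t].unsqueeze(-1)
    noisy_act = ab.sqrt() * act + (1 - ab).sqrt() * eps
    t_feat = (t.float() / N_STEPS).unsqueeze(-1)
    pred = eps_net(torch.cat([noisy_act, obs, t_feat], dim=-1))
    return nn.functional.mse_loss(pred, eps)

print(diffusion_loss(torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM)))
```

At inference you would start from Gaussian noise and iteratively denoise to sample one concrete action; that sampling loop is omitted here.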
Speaker 2 So it doesn't have to be you personally wearing, you know, a teleop rig in order to make a robot learn.
Speaker 1
Yeah. Yep.
So we can have multiple people, sometimes even untrained people, collecting data, and the result will still be great.
Speaker 2 Where do ALOHA and ACT play into this?
Speaker 1
Yeah, so these two papers are actually super close to each other. They're like one or two months apart.
That's actually how Cheng and I know each other.
Speaker 1 It was from looking at each other's papers, and we met on Twitter, I think, when Cheng was back at Columbia. Before ALOHA, I think the typical way people collected data was with a
Speaker 1
teleoperation setup with a VR headset. And it turns out to be very unintuitive to do.
And it's hard to collect data that is actually dexterous.
Speaker 1 What ALOHA brings is a very simple and reproducible setup. So it's very intuitive.
Speaker 2 Sorry, for most people who haven't worn a teleop setup, is it the lag? How should I compare it to, like, playing a video game or something?
Speaker 1
Yeah, I think ALOHA makes it feel more like playing a video game. Normally it feels kind of disconnected:
you're just moving in free air and the robot is moving with some delay.
Speaker 1 But ALOHA reduces that delay by a lot. And that contributes to the smoothness and how fast humans can react.
Speaker 1 Once we get that really dexterous data, it allows us to investigate algorithms that actually solve difficult things.
Speaker 1 In this case, it was through introducing transformers to robotics. There was a long period of time when I think robotics was stuck with three-layer MLPs and ConvNets.
Speaker 1 As you made them deeper, they worked worse. But it turns out that once you have very strong and dexterous datasets, you can just throw a transformer at it and it works quite well.
Speaker 2 Actually, like just in terms of progress of the industry over time, transformers didn't make sense without a certain level of data collection capability.
Speaker 1 Right. And also the whole system around it, for example action chunking, which is to predict a trajectory as opposed to predicting single steps of actions.
Speaker 1 All these things kind of combined to make dexterous tasks, bimanual tasks, more scalable.
Speaker 2 Why is chunking important here? If I think about like just the analogy to LLMs and text sequence prediction?
Speaker 1 I think it just kind of throws the ML off if you're trying to force it to react every millisecond. That's
Speaker 1 not how humans act. We perceive, and then we can actually move quite a bit without looking at things again.
Speaker 1 And that turns out to make the motion a lot more consistent and the overall performance a lot better.
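As a rough sketch of what action chunking changes at execution time: instead of querying the policy at every control tick, the robot perceives once and commits to a short trajectory, ACT-style. `policy`, `get_observation`, and `send_to_robot` below are hypothetical stand-ins:

```python
import numpy as np

CHUNK, ACT_DIM = 50, 7  # e.g. one second of actions at 50 Hz

def policy(obs: np.ndarray) -> np.ndarray:
    """Stand-in for a trained chunking policy: one observation in,
    a (CHUNK, ACT_DIM) trajectory out."""
    return np.zeros((CHUNK, ACT_DIM))

def control_loop(get_observation, send_to_robot, total_steps: int) -> None:
    """Perceive once per chunk, then execute the whole chunk, rather
    than forcing a new decision at every control tick."""
    step = 0
    while step < total_steps:
        for action in policy(get_observation()):
            send_to_robot(action)
            step += 1
            if step >= total_steps:
                break

control_loop(lambda: np.zeros(64), lambda a: None, total_steps=200)
```

The real ACT system also blends overlapping chunks with temporal ensembling for smoothness; that detail is omitted here.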
Speaker 2
And you discovered that actually transformers architecturally did apply to robotics. Cheng, you felt then that data collection was still a problem.
So enter UMI.
Speaker 1 Yeah, so after ALOHA and diffusion policy, I was super excited about imitation learning. But at the time, both of us were still doing teleoperation, and that just felt super limiting.
Speaker 1 I think the problem is that with the setup at the time, a teleop setup,
Speaker 1 it takes a PhD student a couple of hours to set up in a lab. That pretty much restricts data collection to the lab.
Speaker 1 But in order for a robot to actually work as a product, it needs to work in the wild, in unseen environments. That requires data to also be collected in the wild.
Speaker 1 And at the time, I was thinking, okay, is there a way we can collect robotic data without actually using a robot?
Speaker 1 That forced me to think, okay, what's the most essential part of robotics data? And after diffusion policy and ACT, the paradigm is actually kind of simple.
Speaker 1 You just need paired observation and action data. In our case, the observation is the video clip.
Speaker 1 The action is the movement of your hand plus how the fingers move. I realized you can get all this information from a GoPro.
Speaker 1 You can track the movement of the GoPro in space, and you can track the motion of the gripper and the fingers through images as well. And that's why I built this UMI gripper.
It's 3D printed. At the time, the project had three PhD students.
We just took the grippers everywhere.
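Schematically, the UMI idea is that the GoPro video supplies the observations while tracking the camera (and thus the gripper) through space supplies the actions, so no robot is needed at collection time. A hedged sketch; `track_gripper_pose` and `detect_finger_width` are hypothetical stand-ins for the SLAM-style tracking and image-based finger reading the real system uses:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Step:
    image: np.ndarray   # observation: one video frame from the GoPro
    pose: np.ndarray    # action: 6-DoF gripper pose recovered by tracking
    width: float        # action: finger opening read from the image

def track_gripper_pose(frame: np.ndarray) -> np.ndarray:
    """Stand-in for tracking the camera/gripper in space."""
    return np.zeros(6)

def detect_finger_width(frame: np.ndarray) -> float:
    """Stand-in for reading the finger opening from the frame."""
    return 0.05

def video_to_demo(frames: list) -> list:
    """Turn one handheld-gripper video into paired (obs, action) data."""
    return [Step(f, track_gripper_pose(f), detect_finger_width(f))
            for f in frames]

demo = video_to_demo([np.zeros((224, 224, 3))] * 10)
print(len(demo), "training steps from one clip")
```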
Speaker 1 I think it was two weeks before the paper deadline. Every time we went to a restaurant, before the waiter came, we just collected some data.
Speaker 1 And very quickly, we got, I think, 1,500 video clips of this espresso-cup-serving task. And that turned out to be one of the biggest datasets in robotics.
Speaker 1 And it was simply by three people. That's where the power really shines.
Speaker 1 And then that amount of data allowed us to train the first end-to-end model that could actually generalize to unseen environments. So we could push the robot around Stanford.
Speaker 1 Actually, Tony was there as well. We pushed a robot arm around the Stanford campus, and anywhere you went, the robot could serve you a drink. Yeah, I think that was the moment I was like, hey, maybe we should start a company. This was actually working so well. I remember just following Cheng around.
Speaker 1 Sometimes it didn't work well.
Speaker 1 Yes, I think the only exception I saw was when it was under direct sunlight. And I think the reason was that over those two or three weeks of data collection, it was raining the whole time, so there was no sunlight data. So it failed there.
Speaker 1 That also demonstrates the importance of distribution matching. For a robot to work in a sunny environment, it must have seen sunny environments in its training data.
Speaker 2 Yeah, this is really interesting because I remember when I first met you guys, it was like you spent like, I don't know, $200,000 across all of your academic research.
Speaker 2 And yet the scale of data collection as translated to model capability is leading, right?
Speaker 2 So it's very interesting that, you know, we look at where we are, maybe going back to to Tony's point of scaling and massive capital deployment.
Speaker 2 But that entire paradigm actually wasn't relevant before people realized like you should train on all of the internet data. And we just don't have that in robotics.
Speaker 2 So the entire field is just blocked on having any scale of data that's relevant.
Speaker 1 I think these days there are so many debates about what is even the right way to scale.
Speaker 1 There are like world models, there are simulations, there is teleoperation, there are like all these new ideas.
Speaker 1 And I think this is the sort of area where we really want to innovate, where we want to differentiate. We want to find something that is both high quality and scalable.
Speaker 2 And then you guys, you decide to start a company pushing this cart around Stanford. Tell me about that decision and congratulations on the launch and sort of the direction and team you've built.
Speaker 1 Yeah,
Speaker 1 it's a very interesting journey. I remember in the beginning, it was just the two of us in Cheng's apartment. We clamped a robot to his desk and tried to do some tasks.
Speaker 1 And it soon became, I think, an eight-person team towards the end of 2024, and now we're at around 30 to 40 people. We're not the best at everything, right? But starting a company allows us to find people who we really love working with and bring all the expertise together, from mechanical engineering, supply chain, software engineering, and controls, to build a system together that is not a demo but a real product.
Speaker 2 You built this amazing team. What are people actually signing up for? What's the mission of Sunday?
Speaker 1 Yes. It is to put a home robot in everyone's home.
Speaker 1 I think there are a lot of AIs trying to make you more efficient at work, but there is not enough AI that actually helps you with all these mundane things that are not creative.
Speaker 1 That really has nothing to do with what's making us more intrinsically human.
Speaker 1 What's ideal is for people to spend more time on their hobbies and their passions, as opposed to spending more time doing chores.
Speaker 2 So if you guys are going from, you know, these amazing research breakthroughs to we're actually going to ship a home robot, you know, and that's a product, you have to talk about cost and capability and robustness.
Speaker 2 Like what's the design philosophy?
Speaker 1 As these AI models become more capable and as hardware costs continue to go down,
Speaker 1 home robots, or all kinds of robots, will be everywhere. So if we start from the most surface level, which is the design of the robot,
Speaker 1 when we design it, we think about
Speaker 1
what should a robot look like if it is ubiquitous? You'd be seeing it every single day.
What should it look like?
Speaker 1 And what we end up with is that we really think the robot should have a face. It should have a cute face and it should be very friendly.
Speaker 1 So instead of like a Terminator doing your dishes, we want the robot to feel like it's out of a cartoon movie. And then a huge decision is like, how many arms should the robot have?
Speaker 1 Should it have four arms? Should it have one arm? Should it have legs? Should it have five fingers, two fingers, three fingers? It's a huge space.
Speaker 2 Why isn't the obvious answer that it should just be like a full human arm?
Speaker 1 I think the core motivation for us is how can we build a useful robot as soon as possible. So whenever we see something that we can accelerate it with simplification, we'll go simplify that.
Speaker 1
So one example of that is the hand that we designed, which has three fingers.
We kind of combined three of the fingers that we have together.
Speaker 1 And the reasoning there is just that most of the time when we use those fingers, we use them together, be it grasping a handle or opening the dishwasher.
Speaker 1 So it really doesn't make sense to multiply the cost by 3x
Speaker 1 to separate it into three when we can do one with most of the benefits. And this is how we think about the whole robot as well.
Speaker 1 Within the constraint that we're building a general-purpose robot that can eventually do all your chores, we'll simplify everything we possibly can, so that the robot can be as low-cost and as easy to repair as possible.
Speaker 1 Yeah, I just want to add a little bit more about the actuators and mechanical design. Traditionally, most robots are designed for industrial use cases.
Speaker 1 And those robots are very fast, very stiff, and very precise. The reason is that all the industrial robots are blind.
Speaker 1 So they're blindly following a trajectory that's programmed by someone.
Speaker 2 It's not reacting to perception.
Speaker 1
Correct, right.
But because of the breakthroughs we had in AI, now the robot has eyes, so it can actually correct its own mechanical and hardware inaccuracies.
Speaker 1 So that kind of opened up a whole different design space.
Speaker 2 And intuitively, it's like: I can't tell you exactly what the distance is on a millimeter scale, but I'm going to get to the cup because I can stop when I get there.
Speaker 1 Yeah, exactly. So that allows us to use these low-cost actuators that are compliant but imprecise.
Speaker 1 But because of the AI algorithms and systems we build, we can build robots that are mechanically, inherently safe and compliant while simultaneously achieving the accuracy we need for home tasks.
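A toy illustration of that design trade: a blind arm must be precise because it replays a fixed trajectory, while a camera-in-the-loop policy can keep correcting, so cheap, compliant, imprecise actuators still converge on the target. The 1-D setting and the noisy-actuator model are invented for illustration:

```python
import random

def reach_with_feedback(target: float, tol: float = 0.001) -> int:
    """An imprecise actuator still gets there when perception closes
    the loop: observe the remaining error, command a correction, repeat."""
    position, steps = 0.0, 0
    while abs(target - position) > tol:
        error = target - position                 # "the robot has eyes"
        command = 0.5 * error                     # proportional correction
        slop = random.gauss(0.0, 0.1) * command   # cheap-actuator imprecision
        position += command + slop
        steps += 1
    return steps

# Converges in a handful of steps despite ~10% error on every move.
print(reach_with_feedback(1.0))
```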
Speaker 2 Where are we in that timeline? You said we're between GPT and ChatGPT. So when do consumers get ChatGPT, and when will you guys ship something?
Speaker 1 Yeah, it's actually a really exciting time because like we have so many prototypes internally. What we will do next year, 2026, is actually start doing beta programs.
Speaker 1 We'll put these robots, all kinds of different ones, into people's homes and see how they react. That will be when we learn the most, like whether people want to talk to a robot.
Speaker 1 Do people want to have the robots maybe teach their kids some new knowledge about the world? And this will inform us what the eventual product should look like.
Speaker 1
Internally, we just have an extremely high standard of what is the minimal consumer product we want to ship. It needs to be extremely safe.
It needs to be extremely capable and low cost.
Speaker 2 Do you feel like you know something now that you didn't when you started the company?
Speaker 1 Absolutely. So I think at the beginning, I would describe it as seeing light at the end of the tunnel. There are two axes: there's dexterity, and there's generalization.
Speaker 1
When we add more data, things work better. And what this company is about is the cross product of these two:
how can we scale and have both dexterity and generalization?
Speaker 1 And this is something we were able to show in our generalization demo, where we can pick up these very precise objects, actual metallic forks on ceramic plates, with very high success rates.
Speaker 1 And honestly, this is not something we thought would work so easily just by having so much more data. Yeah, I actually just want to expand a little bit.
Speaker 1 You know, the process was actually long and painful.
Speaker 1 So, yeah, there are so many issues. Just scaling up a robotic system is very, very hard. There are mechanical issues, reliability issues.
Speaker 1 There are data quality issues that come out of it. In the beginning, I actually thought it was going to be much easier than this.
Speaker 1 But really, it just takes time and effort to grind out all the little details for this to work. Also, compared to teleop, it's actually much harder to get this system scaled up.
Speaker 1 But once it's scaled up, it's very powerful and very repeatable.
Speaker 2 So it is both harder than you thought it would be to get to here, and you are further than you thought you would be.
Speaker 1 Yes. And I remember in the beginning, we had this funny conversation: if we build this, someone can just take our glove and build the same thing.
Speaker 1 Like, what moat do we have? Are we worried about that? And in the beginning, actually, we were a little bit worried, because we thought, oh, you know, they can probably just replicate it.
Speaker 1 But as we went along the path, it turned out things were so much harder than we thought. There are so many small details.
Speaker 1 Yeah. Yes.
Speaker 2 And when you say scaling up the robotic system, you mean the data collection to training pipeline and the hardware itself.
Speaker 1
Yeah. So for this to work at all, you need the data collection system.
You need the robotics and control system to be able to deliver the hand to where you want it to go.
Speaker 1
You also need the data filtering pipeline, the data cleaning pipeline, and the training pipeline. And all these things need to be iterated together.
So we've actually gone through several loops of these.
Speaker 1
It's kind of hard to imagine how this could even be done without having a full-stack team in-house.
The glove we're using right now, we call it v5.
Speaker 1 And from v0 to v5, each version had around 20 iterations. So about a hundred in total. And also, when you make these at scale, right now we have more than 500 people using these gloves in the wild, all the things that could go wrong will go wrong. And they did.
Speaker 1 For example, how things are assembled: if you don't specify exactly how it should be done, people will assemble it in creative ways.
Speaker 1 And the creativity doesn't help us here, because we really want the data collection device to be extremely precise.
Speaker 2 So you guys obviously can't know everything that's happening in every company across academia and industry, but from what you know, how would you compare the scale of training data you have today relative to the industry?
Speaker 1 At this point, we have almost
Speaker 1 10 million trajectories collected in the wild. And those trajectories are not just, oh, pick up a cup.
Speaker 1 They're long trajectories with walking, with navigation, and then doing these long-horizon tasks.
Speaker 2 Tony, as you mentioned, it's actually an open question what the right way to scale data up is. There are strong theories around teleop, around pure RL,
Speaker 2 around video and world models.
Speaker 2 How did you think about all of these?
Speaker 1 Yeah, so from our perspective, actually, it's kind of somewhat surprising.
Speaker 1 So in the beginning, we worried that the data from the glove, or UMI-like data, has higher quantity but lower quality compared to teleop.
Speaker 1 Because for teleop, you're using exactly the same hardware and software stack between training and testing. It's perfectly distribution-matched.
Speaker 1 But what we realized is that this glove form factor actually encourages people to make more dexterous and more natural movements. And those result in more intelligent behavior on the modeling side.
Speaker 1 And in terms of data quality, we don't really see a difference in terms of how much
Speaker 1 of a gap there is between teleop and glove data, after we did all that engineering.
Speaker 1 Yeah, because apparently there is a mismatch, right? In the camera frame there's a human instead of the robot, and there are a lot of things we need to do to convert the human data one-to-one, as if it were robot data, and have the model not be able to tell the difference. And that, again, relies on the whole full-cycle iteration between hardware and software.
Speaker 2 What about RL?
Speaker 1 We see a lot of great promise for RL in locomotion.
Speaker 1 And
Speaker 1 we think that will continue to be true for locomotion. What we see is really that RL as a method is very powerful,
Speaker 1
but it is much less sample-efficient compared to imitation learning. And we see it work great in environments that are easy to simulate.
In the case of locomotion,
Speaker 1 you only need to worry about rigid body dynamics and rigid body contact between the robot and the ground.
Speaker 1 And, you know, because you engineered the robot, you know everything about it.
Speaker 1 But for manipulation, it's kind of hard for us to imagine having the same amount of diversity and the same distribution of real objects,
Speaker 1 in terms of matching both
Speaker 1 appearance and physical properties. And we think that's going to be challenging compared to glove data collection and teleop.
Speaker 1
Yeah, I think it's really about which method can get us there faster. There might be different methods that will eventually get there.
For example, you know, simulation and world models, right?
Speaker 1 And it's almost a tautology to say that if I have a perfect world simulator, anything can be done there.
Speaker 1 As long as you can do it in the real world, you can do it in a simulation; you could, you know, cure cancer in a simulator, right?
Speaker 1 But it turns out that for robotics, some things are harder than others, and it really depends on the problem itself.
Speaker 1 So in the case of locomotion, as I mentioned, all we need to model in a simulator are point contacts with a somewhat flat ground,
Speaker 1 like the feet. Yes,
Speaker 1 but the behavior we want out of it is actually very difficult to model. It's all these reactive behaviors: when you feel your
Speaker 1 leg hitting something, you should retract and step again.
Speaker 1 These are very, very hard to describe or to learn from demonstrations directly.
Speaker 1 But in the case of manipulation, I think the difficulty is flipped, that it's a lot easier to capture the behavior itself, and it's a lot harder to simulate the world.
Speaker 1 For example, if you were to grasp a transparent cup with some orange juice in it, it's ridiculously hard to simulate how
Speaker 1 your hand deforms around the cup, how the liquid ripples, and how
Speaker 1 the color of the juice affects the rendering and what the policy ends up seeing. Simulating that is very expensive and difficult.
Speaker 1 But all we need to learn is to get your hand in front of the cup and then close with the appropriate amount of force. And that's actually very easy to learn.
Speaker 1 That's why we see so much success with imitation learning in robotic manipulation: the behavior itself is actually not
Speaker 1 as hard as simulating the world. And that's why we see faster progress there.
Speaker 2 Is there anything that you have changed your point of view on in data over the last year?
Speaker 1 One thing, I wouldn't say it changed, but data quality really matters.
Speaker 1 I always knew data quality matters, but once you scale it up, it really matters. Because
Speaker 1
the diversity of behavior you encounter in the wild is very hard to control, and the hardware failures are hard to control.
You need to constantly monitor them.
Speaker 1
You need to spend a huge amount of engineering effort just to make sure that the data is clean.
And also build all those automatic processes, right?
Speaker 1 We have our own way of calibrating the glove before we ship it out. And we have this whole software system to catch if something is broken on a glove, and we can detect it automatically.
Speaker 1 The importance of data quality translates into all these repeatable processes, so we don't need a human staring at the data to know that something is wrong.
Speaker 2 When you described the beta for next year,
Speaker 2 a lot of it sounded like, you know, we just want to understand behavior, like how people actually want to use it. We can make some design decisions for the actual product.
Speaker 2 What technical challenges do you still see?
Speaker 1
So to me, I think there's like two kinds. The number one is really figuring out the training recipe at scale.
We as a field just entered
Speaker 1 the realm of scaling and we just got the amount of data that we need.
Speaker 1 I think now is a perfect time to start doing research and actually figure out what exact training recipe we need to get robust behaviors. And I think we're in a unique position because of the amount of data and the entire pipeline we built around data. The second point: hardware is just hard. We're still pushing the performance envelope of the hardware. It's not really clear what is necessary for the hardware to be reliable, because whenever the mechanical team builds hardware, the learning team will try harder and harder to push it against the boundary, and it'll break at some point.
Speaker 1 But I think what's interesting in this company is that everybody's under the same roof. So immediately after something breaks, it goes straight back into mechanical design.
Speaker 1
And we have another iteration, say for the hand parts, very quickly. Hardware is hard, but it is important.
And I think, you know, it's a hard but right thing to do.
Speaker 1 And I think we as a field shouldn't avoid doing the hard things just because they're hard. Yeah, I want to echo Cheng's point, first about the research.
Speaker 1 I think when there is data scarcity, it is really easy to come up with cute, fancy research ideas that don't end up scaling very well.
Speaker 1 And this is why, when we built the company, we actually focused on infrastructure, a scalable data pipeline, and operations before we really started diving into research, which we only started doing about three months ago.
Speaker 1
I think we really want to avoid doing research that doesn't scale. We want to focus on things that contribute to the final product.
The second point is:
Speaker 1 I think robotics is so intrinsically
Speaker 1 a systems problem, and right now there's no existing general-purpose home robot out there. We don't really know what the interfaces between the different systems should be, or what good even is.
Speaker 1 And in that case, if you're working with a partner, it's actually really hard for them to understand your standard of good, because your standard of good is changing all the time.
Speaker 1 This is why we are building everything in-house in a more full-stack approach. We build our own data collection device that is co-designed with the robot.
Speaker 1 We built our own operations team to figure out how we can most efficiently get the most high-quality data out. And of course, our own AI training team that makes the best use of this data.
Speaker 1
I think these are the things that are really not easy. It makes our company a lot harder to build right now.
You suddenly need so many teams and they need to orchestrate together.
Speaker 1 But we believe it is the right thing to do.
Speaker 2 Okay, I'm going to ask you a few questions that require uncomfortable guesses now. When will people be able to buy robots commercially for the home?
Speaker 1 This is something we're really excited about, because we have so many prototype robots in our office and really want to get them out there.
Speaker 1 So the next step of our plan is to have a beta program in 2026.
Speaker 1 And what it means is that the people we select from those who sign up will have a real robot in their home, and it will start doing chores for them. And
Speaker 1 it's going to be a really interesting learning lesson for us because we will see how humans interact with the robot. We'll see what kind of things people just really want the robot to do.
Speaker 1 I think this will be before we actually ship it to the masses, because we just have an incredibly high standard for what we're willing to ship from a consumer experience standpoint.
Speaker 1 We want the robot to be highly reliable, we want it to be capable, we want it to be cheap.
Speaker 1 The results of the beta program will really decide when it's a good time to ship. Is it 2027? Is it '28?
Speaker 1
But all of those are possible. But it's not a decade away.
No, it's definitely not a decade away.
Speaker 2 How much do you think it could cost?
Speaker 1 Right now, the prototype robots we have in-house, I think the cost ranges from like $6,000 to something like $20,000. And this is actually pretty interesting that
Speaker 1
the big difference here is not, oh, we found a better actuator. They're using the same actuators.
They're very low cost. It's actually the cladding of the robot.
Speaker 1 When you're trying to make them at low scale, it's just really expensive. The claddings are a few thousand dollars to make.
Speaker 1 But this is the type of thing that becomes cheap as we scale up, because instead of doing CNC and hand-painting them, it'll become injection molding.
Speaker 1 What we see is that as we get the scale to a few thousand units, we can drastically reduce the material cost, likely under 10K.
Speaker 1 and what it implies is that
Speaker 1 when we sell the robots the price will be somewhere around it
Speaker 2 Okay, so fast forward two or three years out. If you look five years and beyond, home robots are ubiquitous. What does life look like? How does it change for your average person?
Speaker 1 This is a different answer for everyone. For me, I just really hate dishes.
Speaker 1 In my sink, there are always four or five somewhat dirty dishes sitting there that stink a little bit.
Speaker 1 And after a long day of work, it really doesn't feel good to come home and see the house like that.
Speaker 1 So I think the world we'll live in is
Speaker 1
going to be cleaner. It's going to be cleaner.
And I was just thinking about it as like the marginal cost of labor in homes goes to zero.
Speaker 2 The last thing I want to make sure we do is talk about demos, right? There are a lot of robotics launch videos today. It's been years since we saw an Optimus serving drinks at a bar.
Speaker 2 Why are those not available? And what is actually hard?
Speaker 1 I think the way I would put it is make zero assumptions, no priors.
Speaker 1 As in, if you see a robot handing one drink to one person, first ask the question of, is that autonomous or is that teleoperated? So this is the first thing.
Speaker 1 We should look at the tweet and see what they say about it. And then: does it show the robot giving another, slightly different colored cup to the same person or not?
Speaker 1 If they didn't show it, it means the robot can literally only pick up that single cup and give it to that same person. When we look at demos, we tend to project our human instincts onto them.
Speaker 1 Like, oh, if it can hand a cup to that person, it must be able to hand a different cup to another person. Maybe it can also do my dishes, maybe it can do my laundry.
Speaker 1 There's a lot of wishful thinking we can have about it, which is what's great about robotics, that there's a lot of imagination.
Speaker 1 But I think when we look at demos, we should only index on the things that are actually shown.
Speaker 1 And that's likely the full scope of that task.
Speaker 1 I think another aspect is, at least for me as a researcher,
Speaker 1 I appreciate the number of interactions that happen in a demo. With every interaction, there's a chance of failure.
Speaker 1 So the longer the sequence is, the harder it actually is. That's something we really emphasize here.
Speaker 1 And that's actually somewhat uniquely easy for us, because the glove way of collecting data is so intuitive to people. Yeah, it's really about generalization and reliability.
Speaker 2 So, can you explain the demos that you guys are showing?
Speaker 1
Yeah, of course. So, we're showing basically three categories of demos.
The first one, as you saw, is this whole messy table.
Speaker 1 And what the robot does is clean up the whole table, dump the food into the food waste bin, load the dishes into the dishwasher, and then operate the dishwasher.
Speaker 1 What makes this demo really hard is that it mixes really fine-grained manipulation with these super long-horizon, full-range tasks, as in the robot needs to reach up high and also bend down low.
Speaker 1
It's mobile manipulation, right? Yes.
Exactly. The reason we can show this is just how nimble and easy it is for us to collect these datasets, which makes this long-horizon, dexterous demo possible.
Speaker 1 And it's also about the forces as well. So you might have seen like, we're trying to pick up two wine glasses with one hand.
Speaker 1
And I struggle with this, but yes. It's actually really hard.
And because they're transparent objects, we also need to load them very precisely into the dishwasher.
Speaker 1 A lot of it is about how much force you apply.
Speaker 1 Because if you're trying to grasp two in one hand and you squeeze a little bit harder, you're going to break one of the glasses.
Speaker 1 And when you load it into a dishwasher, if you're pushing it in the wrong direction and it hits something, it's going to shatter.
Speaker 1 We did shatter a ton of glasses when we were experimenting with it. So these are really high-stakes tasks, where
Speaker 1 it's not just about recovering from mistakes, but about not making those mistakes in the first place.
Speaker 1 And this is what's generally the case in a lot of the home tasks, that you're just not allowed to make any mistakes.
Speaker 1 And then we get into the generalization demos, where we basically booked six Airbnbs, took the robot there zero-shot, and saw if it could do part of the task.
Speaker 1
So we use two tasks. One is to go around the table and collect all the utensils into the caddy.
The other is to grasp a plate and then load it into the dishwasher.
Speaker 1
What makes these demos very interesting is that we don't need any data when we enter that home. It's pure generalization.
And this is as close to a real product as you can get.
Speaker 1 Because when someone buys our home robot, we really don't want them to have to collect a huge dataset themselves just to unbox it.
Speaker 1 Also, in addition to the generalization, those two tasks are really precise. We're using the exact silverware
Speaker 1 in the home.
Speaker 1 And you basically need a few millimeters of precision to grasp it properly. Those forks are also hard to perceive because they're reflective; the light looks weird on them.
Speaker 1 One home has a transparent table. To the cameras the table looks like nothing, and the robot still reacts very well to it.
Speaker 1 And again, the reason we can do it is because we have all these, more than 500, people collecting, and we've seen so many glass tables in that dataset. So the robot is able to do it.
Speaker 1 I think the last set of tasks we did is about pushing what's possible in terms of dexterity. Of the two tasks we chose, one is operating an espresso machine.
Speaker 1 The other is folding socks. What makes these hard is that they require very fine-grained force, which is hard to get if you're using teleoperation.
Speaker 1 Because these days, there's not a good teleoperation system that can let you feel how much force a robot is feeling. So basically, when you're teleoperating, your hand is numb.
Speaker 1 And sometimes you're applying a huge amount of force on the robot, but you don't know it. And that can result in very low data quality.
Speaker 1 The robot also ends up acting in that aggressive way, which we really want to avoid in our system.
Speaker 1
The sock is a very good example: when you're trying to fold it, your two fingers can touch. And that forms what we call a force closure.
You have a closed loop for the force.
Speaker 1 And if your robot is stiff, you can apply an almost infinite amount of force, and it doesn't look like anything from the outside.
Speaker 1
But for us, because we're using the glove to collect the data, the human who is collecting it can just naturally feel it. It's very intuitive.
I think we're the first
Speaker 1 in the whole industry to do the sock folding
Speaker 1 and to operate an espresso machine end-to-end.
Speaker 2 One of the things that you will also need to scale as you guys
Speaker 2 scale up the company is the team. um uh what are you hiring for what do you what are you looking for one thing uh i'm really looking forward is
Speaker 1 Yeah, so it's full-stack roboticists, and people who aspire to become full-stack roboticists. What you really learn in this company is that robotics is such a multidisciplinary field. You need to know a little mechanical, a little electrical, a little code, a little bit of data, to actually fully optimize the system. And we have a couple of examples of training pure software engineers to become roboticists, and training mechanical engineers to become roboticists.
Speaker 1 And so if you want to learn about robotics,
Speaker 1 if you want to learn the whole thing, not just to be boxing into your small, you know, little cubicle, let us know.
Speaker 2 And you told me that you didn't write code until you got to college or something. Yeah.
Speaker 1 I was super enthusiastic about robotics, but I was mostly doing like a mechanical and electrical design before that.
Speaker 1 And then I realized, okay, the bottleneck is actually how the robot will move. And there's,
Speaker 1 there's this thing called programming. And the more I got into it, the deeper it went.
Speaker 1
And then toward the end of college, I realized, okay, there's a thing called machine learning. I need to figure out how to train models.
I think this thing just goes on and on.
Speaker 1 I think it's very natural for me to gradually expand my skill set, because I'm always looking forward to building the robot.
Speaker 2 Well, I hope you discover the next field because you're no longer doing dishes.
Speaker 1 It's a very fun place to work. Whatever you can imagine about robotics and consumer product and machine learning, you can find it here because we're just fundamentally such a full-stack company.
Speaker 1 We're not just about the software. We're not just about the hardware, but we're about the whole experience, the whole product, and making sure that product is general and like scalable in the future.
Speaker 2 Awesome. Congratulations.
Speaker 1 It's really exciting.
Speaker 2
Find us on Twitter at NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces.
Follow the show on Apple Podcasts, Spotify, or wherever you listen.
Speaker 2 That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.