Chai-2: The AI Model Accelerating Drug Discovery with Chai Discovery Co-Founders Jack Dent and Joshua Meier
Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @_jackdent | @joshim5
Chapters:
00:00 – Joshua Meier and Jack Dent Introduction
01:09 – Genesis of Chai Discovery
06:12 – Chai-2 Model
10:13 – Criteria for Specifying Targets for Chai-2
13:12 – How the Chai-2 Model Works
16:12 – Emergent Vocabulary from Chai-2
18:15 – Hopes for Chai-2’s Impact
20:33 – Reception of the Chai-2 Model
22:16 – Future of Wet Lab Screening and Biotech
27:08 – Optimizing Other Molecule Properties
31:37 – Where Chai Invests From Here
36:20 – What Bioscientists Should Learn for Chai-2
40:23 – How Jack and Josh Oriented to the Biotech Space
43:38 – Platform Investment and Chai-2
46:53 – Scaling Chai Discovery
48:21 – Hiring at Chai Discovery
49:09 – Conclusion
Transcript
Speaker 1 Hi, listeners. Welcome back to No Priors.
Speaker 1 Today I'm excited to speak with Joshua Meier and Jack Dent, two of the co-founders of Chai Discovery and former bio AI and engineering leaders at Meta, OpenAI, Absci, and Stripe.
Speaker 1 This week, Chai released their industry-leading Chai-2 zero-shot antibody discovery platform, which at its core is a generative model that can design antibodies that bind to specified targets with a hundred times the hit rate of prior computational approaches.
Speaker 1 We'll talk about their product, the next frontier for Chai, why they're bullish on biotech, and why the most effective antibody engineers will soon be working as expert prompt engineers.
Speaker 1 Jack, Josh, congrats on the Chai 2 launch. Thanks for doing this.
Speaker 2 Welcome. Thanks for having us, Sarah. We're excited to be here. Great to be here.
Speaker 1 Josh, I'll start by just asking, you know, you and several scientists on the team have been working on AI drug discovery for about a decade now in different settings.
Speaker 1 I have also been looking at this area for over a decade. We haven't yet seen drugs reach the market that were designed with these AI computational techniques. What made you believe? Why start the company when you guys did?
Speaker 2 That's a great question. So many of us have been working in this space for a while and didn't start a company because it was really a research idea, I think, until very recently.
Speaker 2 There were signs of life that someday this was going to work, but it wasn't really on the timeline of a company, right?
Speaker 2 You can't really start a company thinking that 10 years from now things are going to work. You also don't want to start a company after it's already working and kind of miss the boat.
Speaker 2 So the sweet spot is like, okay, we have maybe one, two years that we have to really get this off the ground. And we made a bet when we started the company that it was going to work.
Speaker 2 There were really a couple of things that fueled that decision. The first one was we made a bet that structure prediction, protein folding was going to get a lot better.
Speaker 2 So obviously protein folding was considered solved a couple of years ago. Around 2020, you had the breakthroughs of AlphaFold 2 and being able to predict protein structures with experimental accuracy, but it was just a single protein structure at a time.
Speaker 2 So we can take a single protein sequence and we can see what that protein looks like. That's very useful for basic biology. So we can understand what the proteins we're looking at look like.
Speaker 2 But if you think about drug discovery, which is what we're really focused on at Chai Discovery, you need to understand how multiple molecules interact with one another.
Speaker 2 So you need to understand how a small molecule drug is going to modulate a protein or how an antibody protein is going to modulate an antigen protein.
Speaker 2 So, we started to see early signs of life that that was going to be possible.
Speaker 2 And again, we made a bet that we would be able to take this to the next level with the kinds of breakthroughs that we were seeing around diffusion models and around language models.
Speaker 2 The previous generation of structure prediction models would really just predict, you know, one conformation of a protein at a time, kind of like one view on a protein.
Speaker 2 It's like the early image models, when they didn't have diffusion models. You weren't really able to look at the diversity of generations that could come out.
Speaker 2 And we thought the same thing would impact drug discovery and protein folding as well. So that's a bit of color on how we decided to start the company. And we did.
Speaker 2 And maybe lastly, I should say, almost every AI bio company before us has had some kind of very tight lab integration with what they are doing. And almost too tight.
Speaker 2 I think the lab integration is great.
Speaker 2 We do a lot of lab experiments at Chai, but the thing that was missing was: could you actually have some kind of portable AI platform, something that would actually be generalizable and could be applied to lots of different areas?
Speaker 2 If you could do that, it means that your impact could really be taken to the next level.
Speaker 2 We can take Chai-2, the model that we've just released, and we can deploy it to hundreds of different projects, thousands of different projects.
Speaker 2 Chai-1, which we open sourced, is already being applied throughout the industry to tons of different projects. We don't even know everything it's being applied to because it's open sourced.
Speaker 2 But that was something that was also really important to us if we were going to kind of see this transformation of biology from a science into more of an engineering discipline, which is ultimately the goal of the company.
Speaker 1 Yeah, I want to come back to what you said about lab integration as we talk more about the technical approach here. But Jack, you and I met in the context of you being a beloved engineering and product leader at Stripe, coming from the engineering side and looking for the most interesting problems to work on in AI.
Speaker 1 Why did you decide to work on this versus some of the other things we were talking about, like codegen and such?
Speaker 2 Yeah, so as you know, Sarah, I spent quite some time thinking about my next steps and what I wanted to do with my life after the period I was at Stripe.
Speaker 2 And I give a lot of credit to Josh, actually, for this. We were good friends going back even to college. We were p-set buddies at Harvard and in many of the same classes together.
Speaker 2 While I was maxing out the CS curriculum, Josh was also doing that somehow for the chemistry and physics and all the other scientific curricula as well. But we had landed in a lot of the same classes.
Speaker 2 And as we went our separate ways after college, we really just made a point of keeping in touch every three, six months. And Josh would always talk to me about his research.
Speaker 2 Once it became clear that the research that Josh and others were doing in this space was really no longer just a toy, but was really going to impact and change the entire industry, that idea became infectious, right?
Speaker 2 It sort of became impossible to unsee the future once you have that glimpse. And although you didn't know until very recently that any of this was going to work, and of course, there's still a lot left to prove, once you start to grasp the implications of the fact that over the next few years, we are going to have the ability as a human race to engineer molecules with atomic precision, it's almost hard to work on anything else with your life.
Speaker 2 The impacts for society just broadly and human health, and not just health. There are a ton of other areas which this will touch, which we can get into, but that is just a platform shift in an entire industry.
Speaker 2 And so put that together with the belief or conviction that you might just be able to get it working, and I think it was impossible to say no to working on this in many ways.
Speaker 1 So this is a breakthrough result in Chai 2. Can you give us a sort of layperson's explanation of what the result was and the model itself and what you think is the most valuable part?
Speaker 2 Sure. Chai 2 is our latest series of models, which are state of the art across a number of different tasks, but specifically the one we're most excited about is design.
Speaker 2 And what we've shown is that we can design a class of molecules known as antibodies, which are some of the most therapeutically interesting molecules as well.
Speaker 2 These account for close to 50% of all recent drug approvals, and seven of the top 10 best-selling drugs out there are actually antibodies.
Speaker 2 And so, what we've shown with Chai-2 is really the ability to design antibodies against targets that one wants to go after in just a small, what we call a 24-well plate, in just 20 attempts.
Speaker 2 What this means is that we take a target, run our models, and ask the models to design an antibody. We then ship that antibody to the lab. We have about a two-week validation cycle in the lab.
Speaker 2 And two weeks later, we see that close to 20% of these antibodies actually bind their targets in the intended way. So, Chai-2 is a major breakthrough for the field.
Speaker 2 When we set out on this project, we were actually only targeting a success rate of 1%. That was the company-wide goal for the entire year.
Speaker 2 And the reason we set that goal of 1% is that previous attempts at this problem are successful maybe around 0.1% of the time, or even lower. And those are the computational techniques.
Speaker 2 If you look at the traditional lab-based high-throughput screening techniques, people are really screening millions or billions of compounds just to find one molecule that sticks.
Speaker 2 You know, there's a reason we call it drug discovery: it's a discovery problem, it's a search problem.
Speaker 2 And so people are really just sort of panning for gold in these massive yeast or phage libraries. Or alternatively, you might inject a mouse or a llama.
Speaker 2 You might wait a couple of weeks for them to get really sick. You might then bleed them, take their plasma, take the antibodies out and isolate them. And this is actually what we did for COVID.
Speaker 2 We actually took some humans who had already had COVID, took their antibodies out of them, and tried to find one which actually then neutralized the virus.
Speaker 2 So, as you can imagine, it's not an ideal process, or the most efficient or the most principled one.
Speaker 2 And so what we've shown with Chai-2 is that we've been able to increase these success rates in discovering antibodies computationally by multiple orders of magnitude compared to the prior state of the art computationally, and by many, many, many orders of magnitude compared to the traditional lab-based alternatives.
Speaker 2 And what this means is pretty profound for the industry in our view.
Speaker 2 There are two ways to look at this. There's, of course, the faster, better, cheaper. You know, this is going to allow us to make drugs against targets and get them turned around faster.
Speaker 2 But I think the thing that we're really excited about, and what's, I think, more important, is the entire class of targets that this will unlock in the future, which have just been inaccessible to previous methods.
Speaker 2 And I think, in general, everybody in the biotech industry is a little glum right now. XBI hasn't done that well over the last five years.
Speaker 2 I think we're in one of the worst markets in biotech over the last few decades.
Speaker 2 But I think with Chai-2, we're starting to see, I think, those first early signs of a real platform shift in biotech, the sort that comes around only so often.
Speaker 2 You know, we've had one in the 70s with all sorts of new techniques then.
Speaker 2 But the idea that in the next five, 10 years, there are going to be entire new classes of molecules that we're going to be able to discover and entire new targets that we're going to unlock and entire markets that we can open up and therapeutics we can get to patients to really cure diseases that have had no cure before.
Speaker 2 That's just an incredibly exciting prospect for us.
Speaker 1 I want to come back to impact because I think the ramifications here are really huge.
Speaker 1 But if we just go and think about problem design first, you, I think, looked at 52 problems. Why that many? And how do you specify a target?
Speaker 1 I'm picturing like bind to epitope X, but I'm sure there are other requirements you'd want to have as drug designers.
Speaker 2 It's a great question, Sarah. So in the Chai-2 paper, we look at over 50 targets.
Speaker 2 Most of the existing papers in this area of doing AI for drug discovery are usually looking at like one, two, or three targets.
Speaker 2 But again, it was important for us, if we were seeing this as an engineering problem, to make sure that this is going to be generalizable.
Speaker 2 It's like imagine you had a new LLM paper and you said, oh, I solved one problem in the USAMO contest. Like, really, really cool.
Speaker 2 It's like, no, you need a real benchmark, and you need to actually have that benchmark at scale. You need to have enough problems to convince yourself that the system is working.
Speaker 2 So, that's why, whenever we do these experiments, you know, sometimes we'll try one or two targets just to make sure there's not a huge bug and, you know, make sure not everything fails.
Speaker 2 But, you know, even if everything fails in one or two and the true hit rate is 50%, you could have just gotten unlucky.
Speaker 2 So, that's one of the reasons why we decided to do a big benchmark here, really convince ourselves things are working.
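The point about small benchmarks can be made concrete with a little arithmetic. The sketch below is illustrative only and is not from the Chai-2 paper: it assumes each target succeeds independently with the same probability, and the counts used (1 of 2, 25 of 50) are hypothetical stand-ins for "about half the targets worked". It computes the chance that every target fails by bad luck, and a Wilson score interval showing how much tighter the estimate becomes with around 50 targets.

```python
import math


def prob_all_fail(per_target_success: float, n_targets: int) -> float:
    """Chance that every target fails, assuming independent, identical targets."""
    return (1.0 - per_target_success) ** n_targets


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4 * trials * trials)
    ) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # With a true 50% success rate, two targets both fail a quarter of the time.
    print("P(both of 2 targets fail):", prob_all_fail(0.5, 2))
    # Hypothetical counts: same observed rate, very different certainty.
    print("1 of 2 targets:  ", wilson_interval(1, 2))
    print("25 of 50 targets:", wilson_interval(25, 50))
```

Under these assumptions, a two-target pilot says very little about the true success rate, which matches the argument for benchmarking at scale.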
Speaker 2 The way we selected the 50 problems, the biology people would laugh at this. The engineering people would love it.
Speaker 2 We actually just went to the vendor catalogs to see what was in stock because we wanted to turn around this experiment quickly. We ordered all of these designs at the same time.
Speaker 2 So we actually wrote a scraper that would go and see what was in stock. We would go and pick out the protein. We would go look up what that protein sequence was.
Speaker 2 Now we need to make sure this is held out from training as well, right? So we would take that protein sequence. We would go compare it to all of the database called SAbDab. It's a group of antibody structures in the Protein Data Bank.
Speaker 2 And we'd make sure that like none of these sequences were in there and that none of these sequences were actually even close to anything in there.
Speaker 2 We removed things that had, you know, more than 70% sequence identity.
Speaker 2 So really things that are like a bit different than what we could have trained on, then selected those, made our designs, and then we shipped everything off to the lab.
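The hold-out check described here can be sketched in a few lines. This is a toy illustration, not the team's pipeline: the transcript does not say which alignment tool was used, so the identity measure below (longest common subsequence divided by the shorter length) is a crude stand-in for a real sequence alignment, and the function names and example sequences are hypothetical. Only the 70% threshold comes from the conversation.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def approx_identity(a: str, b: str) -> float:
    """Crude identity proxy: shared residues over the shorter sequence length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / min(len(a), len(b))


def is_held_out(candidate: str, training_seqs, max_identity: float = 0.70) -> bool:
    """True if the candidate is at most `max_identity` similar to every training sequence."""
    return all(approx_identity(candidate, t) <= max_identity for t in training_seqs)


if __name__ == "__main__":
    training = ["MKTAYIAKQR", "GSHMLEDPVA"]  # made-up sequences
    for target in ["MKTAYIAKQR", "WWWWWWWWWW"]:
        print(target, "held out:", is_held_out(target, training))
```

A production filter would use a proper alignment or clustering tool; the sketch only shows the shape of the check, which is to compare each candidate against everything in the training set and drop anything above the threshold.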
Speaker 2 So we actually think it's possible that the 50% is actually a lower bound because we might have just messed things up because of how we set up this experiment. We did not think about the biology.
Speaker 2 These are not necessarily things that are even that useful for therapeutics. Some of these already even have drug programs against them.
Speaker 2 We were just doing this really from a model assessment perspective. Let's understand how well the model is working. Let's convince ourselves. Let's convince the community that Chai-2 is working.
Speaker 2 And then in terms of applying this to problems, I think, you know, now we've got like hundreds of people that want to go and like try the model tomorrow and apply it to the various drug programs that they're working on.
Speaker 2 So that was really how we came up with those 50 tasks. Let's benchmark this and treat it as an engineering problem.
Speaker 1 We have a broad audience for No Priors that ranges from business people to engineers, machine learning researchers, some scientists in other fields.
Speaker 1 Like, what intuition can you give listeners for how the model works under the hood? Especially for anybody who might start with some familiarity with structure prediction models.
Speaker 2 Yeah, well, structure prediction is really a key part in making these models work.
Speaker 2 And it's actually the first thing we did when we started the company: we sprinted to build a state-of-the-art structure prediction engine. We actually open sourced the first version of that.
Speaker 2 It's called Chai-1. And again, scientists around the world are using that now.
Speaker 2 But structure prediction basically gives you an atomic-level microscope and it allows you to see where atoms are placed in 3D space.
Speaker 2 So once you can do that and you have this microscope, then the next question is, well, can we start moving those atoms around? Right.
Speaker 2 We can now start to make changes in a sequence and then we can see the ramifications of those changes in 3D space.
Speaker 2 So the actual design model, you can think of it as you prompt it with some information, like here's a target that we want to go and design an antibody against.
Speaker 2 And then the model will try to place, again, these atoms in 3D space in order to satisfy that constraint.
Speaker 2 Like we tell the model, here's the target, and I want you to make a molecule that binds to that location.
Speaker 2 And then the model will go in and generate both a sequence and a structure that kind of fits into that. So that's like the high-level intuition for this.
Speaker 2 Yeah, one piece of intuition around that is that you can almost think about structure prediction as the ImageNet moment for the field, where with structure prediction, we are asking a model to go from sequence to a predicted structure.
Speaker 2 And it's sort of like a classification task. And then design, where you're trying to design binders, that is much more like a generative task. That's sort of like Midjourney for molecules.
Speaker 2 Whereas structure prediction, you are looking to predict the placement of atoms in 3D space.
Speaker 2 With design, you're taking an existing placement of atoms and you're trying to craft a new set of atoms that is complementary to that original set.
Speaker 2 So one analogy that people like to use is that of a lock and a key.
Speaker 2 And that when designing a protein or a drug, you have some target, which is your lock, and you're trying to design a key using a generative model that fits that lock.
Speaker 2 And the way that the models work is actually pretty interesting.
Speaker 2 They reason quite literally by placing individual atoms in 3D space.
Speaker 2 And often they're getting the resolution of these structures, the error, down to less than the width of one atom when we look at the error across the entire structure.
Speaker 2 So when we talk about atomic level microscope, you can see now why that might be important for design, because how can you hope to be able to design the key if you can't see the lock?
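One common way to express "error across the entire structure" is root-mean-square deviation (RMSD) between predicted and reference atom positions. The transcript does not name the metric Chai uses, so treating it as RMSD is an assumption, and the coordinates below are made up. The sketch also assumes the two structures are already superimposed and list the same atoms in the same order. Atomic diameters are on the order of an angstrom, so "less than the width of one atom" would correspond to a small value on that scale.

```python
import math


def rmsd(predicted, reference) -> float:
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes both structures are already aligned in the same coordinate frame.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_error = sum(
        (p - r) ** 2
        for atom_pred, atom_ref in zip(predicted, reference)
        for p, r in zip(atom_pred, atom_ref)
    )
    return math.sqrt(squared_error / len(predicted))


if __name__ == "__main__":
    # Hypothetical coordinates in angstroms for three atoms.
    reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    predicted = [(0.1, 0.0, 0.0), (1.4, 0.1, 0.0), (1.5, 1.6, 0.1)]
    print("RMSD in angstroms:", rmsd(predicted, reference))
```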
Speaker 1 Yeah, that's completely wild from a precision of prediction perspective.
Speaker 1 You know, if we analogize to LLMs, you know, you have learned grammar, syntax, semantics capabilities that emerge in the model that you can measure.
Speaker 1 Is there anything that would be analogous in terms of emergent vocabulary or concepts that you think Chai 2 has?
Speaker 2 Yeah, I think this whole point about the atomic-level microscope is actually that point, right? There is something really, I don't know, I think deep.
Speaker 2 We still don't fully understand why these models work. Again, we didn't even know this was possible. Obviously, we tried it. So we thought that there was a chance.
Speaker 2 And I think it just tells you something about, you know, maybe the signature of how proteins interact with one another is really embedded in the data, right? And we're generalizing to a new setting.
Speaker 2 So it's not like the model has seen, you know, specific binders against a target, and then we're just trying to do some in-domain generalization and walk through that space.
Speaker 2 That's actually quite an impactful application as well. And that's already being done throughout the biotech industry. Our team published work on that years ago already.
Speaker 2 But I think this really new frontier about generalizing to a new space, it tells us that, again, like the model is learning something really fundamental about how the molecules interact with one another.
Speaker 2 Again, it's able to generalize to problems that look very different in terms of how we would actually organize it in the biology.
Speaker 2 I think the whole rules about, you know, what we think of as a protein family being different: these targets that we tested on are, again, to a biologist, very quote-unquote dissimilar from what we saw during training, but it doesn't seem like the model thinks that way.
Speaker 2 We actually even have a plot in our paper, in the supplement, where we look at an even harder subset.
Speaker 2 So, not looking at things that are, you know, up to 70% sequence similarity with what the model was trained on, but actually pushing all the way down to 25%.
Speaker 2 So really looking at tasks that are very different from what we saw in training. Success rate was basically the same. Like the model didn't care.
Speaker 2 And again, I think that indicates something very profound about what the models are learning here.
Speaker 1 I mean, my assumption is the same here, where obviously the fastest path to immediate impact is going to be, you know, antibodies in clinic or whatever other therapeutics Chai and its partners work on.
Speaker 1 But it does raise a question of like, if the model has learned something that fundamentally like the biology research community doesn't yet know from a principles perspective, like we will also learn those rules from these models or whatever the principles are of structure and interaction.
Speaker 1 So I think that's super exciting.
Speaker 2 Yeah, totally agree.
Speaker 1 Well, how would you characterize like overall hoped for impact of Chai 2 in terms of like bringing it to industry or your own programs?
Speaker 2 It's a great question, Sarah. So there's maybe two main areas that we can break it down into.
Speaker 2 The first one is, again, we've turned this into an engineering problem. Instead of spending months or sometimes even years trying to discover some molecule, now we can actually do it way faster because the screening, if you will, is happening on the computer instead of in the lab.
Speaker 2 But the second area that we're actually even more excited about is how do we actually solve problems which just weren't even reachable with traditional methods. The model is not perfect.
Speaker 2 You know, it worked in 50% of the targets that we tried. Maybe it would have been more, right, for the caveats we talked about before, but you know, it worked in 50% of cases.
Speaker 2 The failure mode of the models is going to be different than the failure mode in the lab today. And I think that's really going to be the sweet spot to focus in on.
Speaker 2 What are the areas that were not possible, you know, a few months ago, where now we'll be able to actually generate potential molecules really quickly against? So, those are the two areas.
Speaker 2 You know, things that you can do today, let's do them a lot faster and a lot cheaper. But I think really the breakthrough opportunities are things that just weren't possible before.
Speaker 2 Yeah, and one other thing that we've announced is that we will be opening up access to both academic groups and industry partners.
Speaker 2 I think when you think about how this space is just going to evolve in the next few years and the amount of opportunity that's out there, given this platform shift, there is way too much opportunity for any one company to capture alone.
Speaker 2 And drug discovery itself is just an incredibly resource-intensive process. And I think it would probably be a conceit to assume that we could go after and pursue every target and run every program ourselves, even if we wanted to.
Speaker 2 And so, when we think about impact and think about what is going to move the needle for the company, of course, but also for the world, we think that the way to do that is to go out and bring this to life with a really exciting set of partners.
Speaker 2 And so we've opened up access. There's an access page on our website, which people can go to and fill out. Currently, we have been inundated with requests.
Speaker 2 But my hope is that we can really enable quite a few use cases with this and do that quite quickly.
Speaker 1 What has the reception been like so far? What is the biggest objection? Because this is a significant challenge to the ideas of high-throughput screening or even the workflow that even innovative pharma and biotechs have today.
Speaker 2 Yeah, it's a great question. Usually when these kinds of papers come out, again, people have tried to do this many times. The critique is often, does this really work?
Speaker 2 You showed this on maybe COVID, for example. Is this going to work for a case where we have less training data? Are the molecules going to be high quality?
Speaker 2 Do we really, you know, kind of believe the data? So I think the approach we did, like benchmarking this at scale, has really helped a lot with that reception.
Speaker 2 Like I think people really appreciated that approach, which has been great. Some of the questions people have is, okay, like I can already discover drugs.
Speaker 2 So, you know, so now I have AI that can do it a lot faster, but does that actually change the kinds of molecules I can work on? And it goes back to what we just discussed before.
Speaker 2 I think there are other folks that are responding to that saying, no, the transformation here is: what about those projects that didn't work for you, or where you're really struggling today?
Speaker 2 Now, you've got another tool in the toolkit, and you kind of have to use this tool now, or you might be left behind.
Speaker 2 So, I think that it's been really interesting to see the community kind of digesting this. Of course, a lot of the AI folks are really excited, right?
Speaker 2 Like, we're getting artificial antibodies before we're getting, you know, maybe other breakthroughs we would have expected earlier. But it's overall been really exciting to see that reception. I mean, our inboxes are just flooding; the early access requests have come in.
Speaker 2 Hundreds of people reached out to us within hours of launching. We just announced, so I think we're still kind of digesting all that. We're a small team, so we're prioritizing early access to the right people, but we're really excited to get the models out there and for them to start solving some really hard problems in the drug discovery space.
Speaker 1 Is there an important future for large-scale wet lab screening, or does it just become a data collection exercise to fill out the distribution for Chai models? Are there areas where you think we'll need that in 10 years, 20?
Speaker 2 Yeah, I think if you just take the models and then you sample more, you probably will get a better result. We tested only 20 molecules per target in the paper, up to 20 molecules. If you were to do 10 times that, 100 times that, orders of magnitude more, you'd probably just get into spaces with better molecules. The machine learning model is probabilistic. It's like using ChatGPT: if you're trying to solve a math problem, you're going to get a better result if you look at the top 10,000 responses than if you look at the top one. You can't really do that with a product experience on ChatGPT.
Speaker 2 I'm not going to look through 10,000 math responses. I won't even know which one is correct. The cool thing with the lab is we actually could just test all 10,000 of those in the lab.
Speaker 2 So I don't know if you have to, but that's definitely something that is, I think, going to be tested out with these models.
Speaker 2 And I think the future of high-throughput screening and how it interacts with the models, I think the question is still open. But I expect that people will be creative and will find ways to actually take the best of AI and marry that with the best of biology to kind of push the bounds forward.
Speaker 2 Yeah, and just to add one thing to that, there's a whole host of really amazing CROs and other players with incredible expertise running those traditional methods.
Speaker 2 And to Josh's point, we have many, many companies asking us: can you run this not just 20 times, but can you run this 100,000 times, even if it's going to work in 20?
Speaker 2 Because I just might find something better, right? And that something better can result in a better drug.
Speaker 2 That could be the difference between getting a patient an antibody which requires an injection or something which requires sub-Q dosing, for example.
Speaker 2 And so I think with these tools, you can sample the search space sort of ad infinitum.
Speaker 2 And that marrying of traditional technique and models will actually hopefully get us into areas of this space where we can just find better products for patients.
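The "sample more and you probably get a better result" intuition has a simple idealized form. The sketch below assumes every design is an independent draw with the same per-design hit rate, which real targets will not obey exactly, so treat it as a back-of-the-envelope model and not a description of Chai-2's behavior. The rates used (0.1%, 1%, 20%) are the figures mentioned in the conversation; the function names are hypothetical.

```python
import math


def p_at_least_one_hit(per_design_hit_rate: float, n_designs: int) -> float:
    """Probability that at least one of n independent designs is a hit."""
    return 1.0 - (1.0 - per_design_hit_rate) ** n_designs


def designs_needed(per_design_hit_rate: float, target_probability: float) -> int:
    """Smallest number of designs giving at least `target_probability` of one hit."""
    if not 0.0 < per_design_hit_rate < 1.0:
        raise ValueError("hit rate must be strictly between 0 and 1")
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target probability must be strictly between 0 and 1")
    return math.ceil(
        math.log(1.0 - target_probability) / math.log(1.0 - per_design_hit_rate)
    )


if __name__ == "__main__":
    for rate in (0.001, 0.01, 0.20):
        print(
            f"hit rate {rate:.3f}: "
            f"P(hit in 20 designs) = {p_at_least_one_hit(rate, 20):.3f}, "
            f"designs for 95% = {designs_needed(rate, 0.95)}"
        )
```

The same formula shows why running far more than 20 designs can still be attractive: finding a binder is one goal, and a larger sample leaves more candidates to choose among for other properties.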
Speaker 1 I want to ask one more question generally about predictions for biotech. And then I want to talk about the future of Chai as well. What do you think biotech looks like 25 years from now?
Speaker 1 I realize that's a ludicrous question to anybody working in AI where you're like, hey, I didn't know this was going to work at all last year.
Speaker 2 As I mentioned before, there is a lot of doom and gloom in the biotech industry right now due to macro factors with rates where they are and the long-term investment cycles that are required to make biotech viable.
Speaker 2 There is just a real pessimism in the industry right now. It's sort of the worst market in a couple of decades.
Speaker 2 And I think that it's moments like this, breakthroughs like this, which give us these flashes of light and these reasons for just immense optimism about the future of this industry, not just in terms of improving timelines and reducing costs, but also in terms of fundamentally enabling those new products.
Speaker 2 And so, if we think ahead over the next 25 years, you know, we've gone from a less than 0.1% success rate to a close to 20% success rate in a year.
Speaker 2 Well, who's to say that in another year, that can't be a 50-plus or even a close to 100% success rate? I think if you see our mini protein results, we are, I think, close to 70% on those with picomolar affinities, like really, really tight binders for every single target that we tested.
Speaker 2 So all five targets we tested worked, and 70% of the designs that we ordered worked. I think that there's no reason that success rates for other classes of molecules can't be that high as well.
Speaker 2 And I think once you have that, you really enter this era where you sort of have a computer-aided design suite for molecules in a way that, you know, we have maybe SOLIDWORKS for mechanical engineering, or we have Photoshop for creatives. And that entire software suite will exist for biology.
Speaker 2 I think the implications of that, the ability to design, program, understand the interactions between atoms and molecules at the most fundamental level are pretty vast and should just give us a lot of hope and excitement about what's about to happen.
Speaker 2 We were just talking last night about how maybe we should be getting baseball caps that say "bullish on biotech" on them.
Speaker 2 Because I think this is one of those special moments. We've heard from many others writing into the company that this has really shifted their opinion.
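[Editorial sketch, not part of the conversation: the jump in per-design success rate described above can be turned into a rough "chance of at least one hit per batch" figure. The function name and the batch size of 20 are illustrative choices, and the calculation assumes every design succeeds or fails independently at the same rate, which is a simplification rather than anything claimed in the episode.]

```python
def prob_at_least_one_hit(hit_rate: float, n_designs: int) -> float:
    """Chance that at least one of n_designs binds, assuming each design
    is an independent trial with the same per-design hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    if n_designs < 0:
        raise ValueError("n_designs must be non-negative")
    # P(at least one hit) = 1 - P(every design misses)
    return 1.0 - (1.0 - hit_rate) ** n_designs


if __name__ == "__main__":
    batch_size = 20  # illustrative number of designs sent to the lab
    # 0.1% and 20% are the rates quoted in the conversation; 50% is the
    # speculative "another year" figure.
    for rate in (0.001, 0.20, 0.50):
        p = prob_at_least_one_hit(rate, batch_size)
        print(f"per-design rate {rate:.1%}: P(>=1 hit in {batch_size}) = {p:.3f}")
```

[Under these assumptions, a 0.1% rate leaves a 20-design batch with roughly a 2% chance of containing any hit, while a 20% rate puts it near 99%. That is one way to read why a small order can be enough to start a program.]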
Speaker 1 If you think about going from antibodies to, you know, obviously better success rates and then also other therapeutics, is there a difficulty hierarchy we should have in our minds, or is it just unexplored space in terms of enzymes and peptides, small molecules, other domains?
Speaker 2 Yeah, it's actually a lot more than just success rates. There's lots of properties that need to be optimized for a molecule. You know, finding a drug is like looking for a needle in a haystack.
Speaker 2 And I think we've really passed through massive swaths of that sequence space with Chai 2, right? By really focusing in on the things that bind.
Speaker 2 And that's where a lot of the search space has to be searched in the lab today. Going deeper into other properties as well: let's make sure that these antibodies can be manufactured well.
Speaker 2 Let's make sure that they can be really stable. So there's lots of other properties that we're excited about. So stay tuned for that.
Speaker 2 And another thing is actually there are next generation antibody formats, even.
Speaker 2 So, what we predict will happen is people probably won't be as interested in the clinic for things like monoclonal antibodies.
Speaker 2 These are antibodies that are hitting, for example, like a specific epitope on a protein.
Speaker 2 But now, if we can make antibodies much faster and more easily, you can imagine a future where if I want to hit a target, let me choose two different parts of that target, make two different proteins that are hitting that, like basically two different primitive antibodies, and let me bring them together.
Speaker 2 This is called a biparatopic: two paratopes, so basically two different antibody interactions. And that kind of stuff is going to become a lot easier to do.
Speaker 2 I think these days there's a lot of trade-offs that get made in biotech about like, you know, risk on your target, risk on your discovery process, how hard is it going to be to make a molecule?
Speaker 2 And I think AI is going to raise the bar across the board. I think the bullish on biotech movement that Jack is announcing here as well, if we think about what that could even represent, there's right now a lot of risk in biotech. There's a lot of crowding on the same kinds of targets.
Speaker 2 The risk actually starts to go down in terms of discovering some of this stuff. Maybe there's still clinical risk if you try something that's like totally new that people haven't done before.
Speaker 2 But we've just opened up, I think, the aperture of opportunities that can be pursued here. And that's something that I think is really exciting.
Speaker 2 So still a lot more work to do for us to validate that like all this is going to be possible.
Speaker 2 But I think just the pace at which the field is moving just gives us a lot of optimism for what's going to be possible next. And maybe I can just share one anecdote about why we are so optimistic.
Speaker 2 We had a partner come to us as we were in the process of building these models. We didn't even really know.
Speaker 2 We hadn't had back our first few batches of data, so we didn't know if it was going to really work yet. But this partner had been working on this problem for a few years.
Speaker 2 They had a team of, I think, five to 10 people working on it.
Speaker 2 They estimated that fully loaded, all of those people, with the experiments that they had done as well, might have set the company back maybe $5 to $10 million.
Speaker 2 And it was a problem where they wanted to build a molecule that cross-reacts against two different species.
Speaker 2 So both a human form and a cyno, or monkey, form of this protein, such that when they put this molecule into animal testing, you know, it wouldn't fail because the monkey has a slightly different version of that protein than the human does. They were really struggling to get this to work, for whatever reason.
Speaker 2 We put this into the model and just prompted the model to design for these two targets at the same time, not just one target. So you can imagine that this is a slightly more sophisticated challenge than just designing against one. We ordered actually only 14 sequences to the lab, and I think four of those were hits to human.
Speaker 2 One of those was a hit to the cyno. One of them actually overlapped and hit both.
Speaker 2 That one now allows us to move forward with that program and gives us a whole host of diversity around that molecule that one can explore as well.
Speaker 1 First of all, that's very cool. And second, I think it's interesting that a lot of industry observers would say, like, the bottleneck in pharma and the expense in pharma is clinical, not discovery.
Speaker 1 And like, I think you're pointing to the fact, well, like, we can design for clinic. Right.
Speaker 1 And actually, it's intuitive, but it's just, because it is an argument from people bearish on biotech or concerned about the ability to make progress in programs and reduce cost for any given successful drug: well, you know, if discovery had less risk, as Josh was pointing out, which is a huge claim, then the entire industry is more efficient, right, and more effective.
Speaker 2 That's the hope. Yeah, and I think we've got a lot of reason to be optimistic. I also don't want to oversimplify things. You know, there's lots of other things that go into making a drug. There's capital markets that go into this. You know, there's tons of clinical risks.
Speaker 2 This is really just the tip of the iceberg, but we're really excited about the progress that this could represent.
Speaker 1 I want to ask strategically where Chai invests from here. So you talked about other attributes that you want to be able to design in Chai models.
Speaker 1 But if we just look at this generically as an AI model company, where do you think the defensibility is?
Speaker 2 There are two key areas of investment for the company.
Speaker 2 I think, firstly, what comes out of these models, these just aren't drugs yet.
Speaker 2 They're hits, they're antibody hits, but there's a lot more work to be done to actually turn these into viable molecules that we can put into humans.
Speaker 2 We have early data, which we put in our preprint, to suggest that these molecules actually have a lot of the properties that one might want from a drug.
Speaker 2 But we need to do a lot more further characterization and assays to convince ourselves that we can do that.
Speaker 2 And then I think there's also the next step, stage beyond that, is actually designing entire drug candidates in zero shot, right out of the models.
Speaker 2 And I think a few months ago, we might have said this was a pretty futuristic idea and nobody in the company was really, really talking much about this.
Speaker 2 But I think once you see these results and grapple with the implications, the fact that we can get antibody hits in just 20 attempts, there's no reason that we couldn't generate entire drug candidates in that same number of attempts. So I think there's going to be some key investments there.
Speaker 2 And really, the model right now is a model. It's not really a product. It is a product.
Speaker 2 And it's certainly useful today, but there's a lot better that product can get with more investment into just making sure that we can optimize all the therapeutic properties that people care about.
Speaker 2 And then, of course, there's the entire interface and software layer around that to make this really easy to use, and the real platform that goes around supporting that. So, you know, if you want to hit two targets, design a molecule that hits both, how do you specify that in the software?
Speaker 2 This is going to be a sufficiently advanced piece of software. It's going to become, you know, as advanced as a Photoshop over time.
Speaker 2 And as we build that out, I think we're going to need to make some really core investments into just the engineering and the product to ensure that we are building software that we ourselves and others will really love to use.
Speaker 2 Yeah, one thing to add on to that as well. We released Chai 1 open source. We thought of it as a model.
Speaker 2 And I think Chai 2 is a lot more than a model, right? It's become a product. It's actually more of a bigger pipeline that comes together to even make this happen.
Speaker 2 And it also becomes trickier to use these models. Protein folding, you put in your sequences, you get out a structure. Design is a different story, right?
Speaker 2 Actually, specifying the prompt on its own, we did that programmatically in the paper to go and assess this thing at scale.
Speaker 2 But a scientist who wants to use this to initiate a drug discovery program, probably not using a script to come up with that prompt, is probably going to be really thoughtful about it.
Speaker 2 And I think that's why investing in the product layer here is really important. And not to mention, it's only going to get more complicated from here, right?
Speaker 2 As we start to support more advanced drug modalities, as there's various properties that come online. We actually show some early evidence of this in the white paper.
Speaker 2 You know, you might want to actually optimize for multiple proteins at the same time. Sometimes, you know, actually, it's a good time to be a sick mouse.
Speaker 2 In order to have a human drug, it usually needs to work in animals as well. And sometimes drug programs actually get stuck there.
Speaker 2 It's like, okay, guys, we either have a mouse drug or a human drug. It's really hard to get both. And there are actually some cases where people have to discover two different drugs.
Speaker 2 They have what they call a surrogate antibody. I'm going to make the mouse version. I'm going to study that, convince the FDA that this mechanism works. But you're even taking risk.
Speaker 2 You're like, maybe this molecule works slightly differently. We literally show that example in the paper of optimizing. We don't do mouse, we actually do monkey.
Speaker 2 So monkey and human together. But you can throw other species into that as well. Sometimes we've got the opposite problem: I want to hit this protein. I don't want to hit this other protein.
Speaker 2 We've got some early evidence as of late that that's possible as well. And these sorts of things, you know, the prompts are just a lot more complicated.
Speaker 2 And it means that you need to have the right product.
Speaker 2 What happens when you start doing those experiments in the lab?
Speaker 2 We want the models to learn from that and then help us really be like a co-pilot and driving like the next stage of those designs as well.
Speaker 2 You know, all of this is, again, it's more than just the models. It's really thinking about those workflows as well.
Speaker 2 And it's even about just getting that word out to people and having them think about this as a new tool in their stack.
Speaker 2 What happens if you're an antibody engineer and you've been doing things in a certain way for the past 30 years? And now there's a new paradigm in discovering drugs.
Speaker 2 Like that itself is actually a problem that a company needs to solve. So these are all different areas that we're investing in right now.
Speaker 1 That actually begs a question I was going to ask you: if you are an antibody engineer or a biologist today, what advice would you give? Let's say they believe you about how much is going to change and this CAD for biology, a software suite that is coming into existence. What should they learn, be good at, go study?
Speaker 2 Well, number one, get access to Chai 2. Number two, you know, figure out how to get your prompts right and actually take full advantage of it.
Speaker 2 And then I think number three, you know, start dreaming about the new possibilities. You know, it's interesting. We've talked to a lot of engineers since starting the company, and we've been alluding sometimes to, you know, what we're doing here.
Speaker 2 You know, sometimes you do the market research question. You ask, you know, suppose you had a 1% success rate on antibodies, what would you use that for?
Speaker 2 The conversations are changing now that, first of all, it's not 1%, it's 10. And people see that it's working.
Speaker 2 I think that creativity is really being unlocked, even ourselves, right? I think when people are thinking about the answer to that question, there's always some big doubt in your mind.
Speaker 2 It's like, ah, it's a hypothetical question, you know, your neurons are not activating in the same way as when you're actually doing something with it. It was the same thing with LLMs.
Speaker 2 It's like, imagine asking someone five, 10 years ago, oh, you know, if we could predict the next word in a sentence perfectly, like, what would you do with that?
Speaker 2 It's actually very hard to imagine until you start playing with the models, even our team internally, you know, now, even without sending it to the lab, you know, we can, again, choose some targets, choose some prompt, generate stuff against it.
Speaker 2 You start to look at the generations that are coming out of the model and you're like, oh, wait, I can actually solve this problem by like choosing the right epitope on a target, choosing the right part of the target.
Speaker 2 Like these two targets are different.
Speaker 2 Like, sure, we have an engine where the model can optimize for one and/or both, you know, or for selectivity, optimizing for one and not the other, but you can actually get a lot of that by choosing your prompt in a smart way.
Speaker 2 So let me hit part of that protein that is actually quite different between the two things, or quite similar between the two things.
Speaker 2 These are the sorts of realizations that, in retrospect, are quite obvious, but they don't really hit you until you actually start to use a product like this yourself.
Speaker 2 So I think once people get their hands on this, they will start to dream of the new possibilities. I think it just really raises the bar.
Speaker 2 You know, the people who are most excited about that are often these antibody engineers and these biologists.
Speaker 2 A lot of the work that they're doing today is painstaking and they're not the biggest fan of these slow feedback loops and these intractable problems because many of them that we speak to are just really motivated to solve a particular task.
Speaker 2 And so you give them, you know, I'm an engineer. You give me a tool which says I have to write less code. I love that. I can now think more about system design and architecture and more complex products and all these other things.
Speaker 2 But it's really going to raise the bar for a lot of these people. And I think people are only really now starting, as Josh said, to think through all the possibilities.
Speaker 2 I was on calls a matter of a few weeks ago where, asked when they think this is going to happen, people said, oh, not for three to five years.
Speaker 2 This is a really futuristic idea. And then a couple of weeks later, you show them what we have and they sort of fall off their chair.
Speaker 2 And so there's going to be a sort of joint effort with us alongside these real domain experts to actually figure out these key application areas, because biology is so vast and so complicated that there is so much knowledge that so many of the practitioners, the specialists, have that no one company will ever possess, which is why we're so excited to go out and be partnering with people to really bring this to life.
Speaker 1 I want to ask a couple of questions more, just specifically about company building before we run out of time. And maybe, Jack, I will start with you're an amazing engineer.
Speaker 1 And then you guys also have like a very software-oriented team working on biological problems. Some of those people come from, you know, long-term research in that space in particular.
Speaker 1 But for yourself, Jack, like, as you said, you're a software person. How do you get up to speed on the bio area to go do leading work?
Speaker 2 Well, I think it's two things. First of all, ramping up on any new field is always just a total fight.
Speaker 2 You have to get to the frontier and to have read the right papers and to be knowledgeable about the areas that you need to learn. You just have to sort of put your head down and push through.
Speaker 2 And there are waves of excitement and misery in that experience, but you can get there fast if you really set your mind to it.
Speaker 2 And I'd say the second part is that surrounding yourself with just the most incredible team is the best thing that you can do. Far beyond anything that you can learn by yourself.
Speaker 2 And we have certainly the most special group of people that I've ever worked with. Within the company are our co-founders, Matt McPartlon and Jacques Boitreaud, who are just rare talents.
Speaker 2 And then the entire team beyond that, some of the former heads of AI at other drug discovery companies, some of the top open source contributors, the team is so multi-talented. It's small. It's around a dozen people, but mighty. And I think as we've seen in other areas of AI, small but mighty teams can go a really, really long way these days.
Speaker 2 And so, you know, I think there are actually surprisingly few people on our team, even, with a computer science degree. Josh himself got a chemistry degree.
Speaker 2 Alex got his PhD in physics, and a whole host of others. But this work is so interdisciplinary, really having that breadth of knowledge across biology, chemistry, physics, artificial intelligence, computer science, engineering.
Speaker 2 It really takes a village, and everybody is learning from each other every day because of just how vast that subject matter is that one has to have a command of.
Speaker 2 I think we've also benefited from such immense focus as well. Everyone has been so passionate about trying to solve this problem.
Speaker 2 And I really credit that as being a huge reason why we were able to achieve it. And we've also got a team that, because of that focus, is very engineering-centric as well.
Speaker 2 So if you look at the whole team, you know, we have a very research-oriented team right now, but everyone is a stellar engineer as well and takes that very seriously.
Speaker 2 So it's not everyone solving, you know, their favorite pet problem. We are all going after the same problem and solving that together.
Speaker 2 And, you know, even just 10 people solving a problem together, there's a lot of code being written every day. You have to be very thoughtful about how that all comes together and interacts.
Speaker 2 And I think especially in our next phase of growth for the company, as we start to invest more and more in product and the velocity around that, and getting this into folks' hands, that's just going to become even more important.
Speaker 2 How do we make sure that the latest research breakthroughs that we're shipping internally are actually making their way into partners' hands?
Speaker 2 That's something that, again, we are very thoughtful about at Chai and take very seriously.
Speaker 1 Yeah, I also remember Jack in our office at Conviction, like debating the merits of dev containers with some of your scientist teammates at the very beginning of the company.
Speaker 1 And both of you from the beginning, you know, talked a lot about platform investment.
Speaker 1 And so, I actually think that's like a little bit sort of unconventional in terms of such a research-oriented team to say, like, we need to make this platform investment.
Speaker 1 And can you talk a little bit about that?
Speaker 2 Yeah. So I've gone through the experience of going from, you know, zero to 100 on large engineering projects before. I worked on Stripe Link, which was kind of a multi-year project.
Speaker 2 And again, Stripe Capital, where the engineering team was scaling from zero to 25, 50 people by the time we were done there. Same for Link, maybe more.
Speaker 2 And I think you just learn that unless somebody is really taking care to keep the entire system in their head and is an effective technical steward of the architecture, things just evolve, and the entropy of the software takes over and slows down your rate of progress to zero because nobody can get work done anymore.
Speaker 2 And so somebody needs to keep the entire system in their head, and the interaction between all those components, and make sure that people who are working on individual subcomponents of your code base can minimize the amount of context that they need to load into their heads to understand how to accomplish their task.
Speaker 2 So these are just the principles of, really, you know, it's pretty basic, it's just simplicity and modularity, but making sure that's a cultural practice, that everybody's on the same page about investing in that, and that, you know, people aren't cutting corners. They see it as their responsibility to lay the groundwork for the next person.
Speaker 2 And this is doubly hard to do in deep learning code bases, because often if you introduce a bug or a regression, you won't know for weeks that it has shown up. It's sort of terrifying, because you could spend a million dollars on a training run with a bug that crept in four weeks ago.
Speaker 2 We've literally had to do this in Chai's history: we've had to go and bisect Git history, launching training runs in a sort of binary search to identify a small enough range of pull requests to identify a bug, then go to that pull request and identify the bug.
Speaker 2 And I think it's those sorts of experiences, and the cost of finding that bug, I'm not sure if it was millions of dollars, but it was certainly tens of thousands of dollars of compute time to go back and find that thing.
Speaker 2 It's experiences like that which I think make engineering rigor such an important practice in the company.
Speaker 2 And so being rigorous about it, I think, you know, some people are surprised to learn that even though we do deep learning, that we are pretty rigorous about writing unit tests for everything.
Speaker 2 But I think these basic software engineering practices are actually sorely lacking from most research code bases. And so bringing in some of those basic principles has allowed us to move very fast, and not just fast in the short term, but it should give us a mechanism to compound on that investment over time.
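[Editorial sketch, not part of the conversation: the bisection Jack describes is a binary search over an ordered history for the first change that breaks training. Below is a minimal illustration under stated assumptions. `first_bad_commit` and `is_bad` are hypothetical names; in practice `is_bad` would stand in for launching a short training run at that commit and checking a metric, which is the expensive part. It also assumes every commit before the bug is good and every commit from the bug onward is bad. This is not Chai's actual tooling.]

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a history ordered oldest to newest.

    Assumes commits[0] is known good and commits[-1] is known bad, and that
    once the bug lands every later commit is also bad.
    """
    if len(commits) < 2:
        raise ValueError("need at least one known-good and one known-bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # in practice: one (costly) training run
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    # Toy history of 64 pull requests; the bug is pretended to land in pr-41.
    history = [f"pr-{i}" for i in range(1, 65)]
    evaluations = []

    def toy_is_bad(commit: str) -> bool:
        evaluations.append(commit)
        return int(commit.split("-")[1]) >= 41

    culprit = first_bad_commit(history, toy_is_bad)
    print(f"first bad change: {culprit}, checks needed: {len(evaluations)}")
```

[The point of the binary search is that the number of expensive checks grows with the logarithm of the history length rather than linearly. Git ships a built-in `git bisect` command that automates the same search over real commits.]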
Speaker 1 Well, it's overall very aligned with your mission of turning biology from science to engineering, right? It makes sense that it would go through the core of the company's practice.
Speaker 1 I have two more questions before we run out of time here. The first is, you know, you talk about the expense of like training experiments.
Speaker 1 Like, what's your decision framework for like how quickly to scale compute or, you know, parallelize experiments here?
Speaker 2 Yeah, we've tried to set up the company in a pretty scrappy way. Actually, when we were getting started, we should have even talked about this as well.
Speaker 2 You know, we're based in San Francisco, but it wasn't even clear the company would be in San Francisco when we started.
Speaker 2 And, you know, back then, we hadn't really raised capital for the company yet. We were kind of using free compute credits from the cloud providers.
Speaker 2 I think for us, it's just about being, again, laser focused on solving the problem and really making the case: why are we doing something?
Speaker 2 And I think that, you know, if that's reasonable, we'll go and invest in it, again, as an engineering problem.
Speaker 2 If it's, you know, kind of clear, you're seeing signs of life, you're seeing some scaling law, whatever it is, let's go as fast as possible and make that work. Let's also not get distracted scaling something out if we are not convinced that it's going to work yet. So I think that kind of scrappy culture on the, you know, where-are-we-spending side goes hand in hand with making really fast progress, because it means we have a high bar for where we're spending our time. Everyone on the team works extremely hard.
Speaker 2 You know, there's people in the office at all times of day, all times of night, and it's pretty beautiful to see that.
Speaker 2 So we work hard, but I think we also work really smart. And I think you have to do that to make progress with how fast the field is moving right now.
Speaker 1 You now see signs of life. You know, you're very, you're very bullish on biotech.
Speaker 1 That also means like, given that you are going to try to scale to support, you know, demand from the industry and your own, your own efforts, who are you looking to hire now?
Speaker 2 We're really hiring across all functions right now. So we've made some really big breakthroughs here on the AI research side.
Speaker 2 And as we take that to the next level and try to get Chai 2 in front of the right partners, we're hiring for product engineering, for antibody engineering, for business development, account executives. There's a whole host of roles that are open on our site right now. And again, this work is extremely interdisciplinary.
Speaker 2 And we really want to build this in a thoughtful way so that we can make Chai 2 as useful as possible for the industry.
Speaker 1 Well, thanks for doing this, guys, and congratulations on progressing the frontier of AI discovery.
Speaker 2 Thanks so much for having us on, Sarah.
Speaker 2 It's been really fun. Thank you, Sarah.
Speaker 1 Find us on Twitter at @NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen.
Speaker 1 That way, you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.