
E146: The 92% AI Failure: Unmasking Enterprise's Trillion-Dollar Mistake
Full Transcript
An AI-native solution: the framework is not just that AI is replacing what a human is doing, but how would you design the model with AI in mind? I think most of the material benefit you're going to see is when you clean-sheet any process and ask: how would I design this process from scratch, knowing all the AI tools I have? And how do I use both technology and humans? And by the way, I think the answer is going to involve both for a long, long time. In fact, I think humans are a core part of this solution.
At Invisible, we believe it's the human-machine interface where all the value sits. But it's not necessarily just giving the people on an existing process a tool.
It's redesigning the process to use all the tools at your disposal. Let's talk about Invisible.
Give me some specifics on how the company is doing today. I joined in mid-January.
We ended 2024 at $134 million in revenue. Profitable.
We were the third fastest growing AI business in America over the last three years. So how will DeepSeek affect Invisible? The viral story was that it was $5 million to build the models they did.
The latest estimates that have come out since in the FT and elsewhere would say it's closer to $1.6 billion. I think the number that's been cited from a compute standpoint is like 50,000 GPUs.
So if you had just told that narrative as the exact same story, but with 1.6 billion of compute, I don't even think it would have been a media story. The fact that it costs over a billion dollars to build that model means it is a continuation of the current paradigm.
Look, there are some interesting innovations they've had, mixture of experts.
They did some interesting stuff around data storage that does have some benefits on reducing compute costs.
But I think those are things we've seen other model builders experiment with already.
If I think about types of data, they basically went after things that are base-truth logic, like math, where there's a fair amount of synthetic data available. That's a fairly small percentage of the overall training tasks that I'd say most model builders are focused on.
Tell me more about that.
Think about training as kind of three main vectors.
So you have base truth information where a lot of synthetic or kind of internet broad-based data exists. So math is a really good example of that.
Then you have tasks like creative writing, where there is no real base truth and there's no existing synthetic data. There's no way to train those models without human feedback.
But the most interesting one is you have a whole set of base truth information where you also don't have enough synthetic data. So an example of that I would give would be computational biology in Hindi.
The corpus of that is just not broad enough. Each branch of that tree and each topic you train off of will have a different approach.
And tell me about what Invisible Technologies does exactly. We have two big components of our business. The first is what I'd call reinforcement learning and feedback, which is the process on any topic where a model is being trained.
We can spin up a mix of expert agents on that particular topic. So that could be everything from, I mean, I use the example of computational biology in Hindi.
Our pool has a 1% acceptance rate, and about 30% of the pool holds PhDs or master's degrees. So these are very high-end, specific experts.
The funniest one I've talked about with somebody is falconry in the 1800s, things where there's just not a lot of good existing data. And look, I think models are going to be built on the full corpus of information that has mattered to humanity.
So there's a lot of branches of that tree, and we bring all of the different experts to help train those models. But that's only half the business. Where we're seeing increased focus and demand is on the enterprise side.
The big challenge today, and the kind of chasm that exists between, let's call it, Silicon Valley and the enterprise, is that there's a demand for broad-based model development, which is really important. But I think what a lot of the enterprise is looking for is: how do I get those models to then work at 99% accuracy in my specific context? Tell me about some examples of enterprise models that have worked.
Therein lies a great question. The stat that I've seen most frequently cited is that about 8% of models today make it to production.
The two largest high-profile public enterprise cases I've seen are Moody's had a chain of thought reasoning example. And then probably the most often cited one is Klarna had a contact center where they basically built up an entirely Gen AI centric contact center to replace the old contact center they had.
The realized impact in the enterprise has not materialized the way people expected it would. I am very bullish on where it will go, but to date, those are the only two examples I can cite.
I can cite some pretty public struggles, but there have not been many other realized examples that I've seen. So there's hundreds of billions of dollars being put into this problem set, only two successful examples.
Where are the main frictions and how do you see that evolving over the next five, 10 years? Most of that money to date has gone into building the models that are extensible, generalizable, and moving towards greater levels of intelligence and chain-of-thought reasoning. We've seen unbelievable progress in that phase of the model-building process.
The challenge is, let's say you're an insurer and you need to build a claims model. What you need to know is that your model works with near-perfect accuracy, and out of the box you're not at 99% accuracy. The investments have led to material improvements; the issue is that the motion of then taking those models and fine-tuning them in an enterprise context has not been standardized yet. The motion of how do I deploy a machine learning model with accuracy, you've seen a bunch of really good examples of that, like straight-through processing of mortgage loans, where those are being productionalized.
They're working. There's a ton of examples of impact coming from machine learning deployments.
AI has not really figured out what I'd call a production paradigm yet.
The OpenAIs, the Anthropics, the xAIs of the world are developing these incredible generalized models, and then you only have really two use cases for enterprises. You mentioned fine-tuning. What are the other steps that a company needs to go through in order to make their AI work? Let's take an asset manager that is going to build a system to do ongoing reviews of its assets based on its internal investments, right?
The first step you need is you need all your internal data organized and structured and in
a place where you can use it and access it. That's probably the biggest challenge most people face is
there's a joke I like to say that when good AI meets bad data, the data usually wins. And I think
the challenge is, with your internal data environments, if you don't have a clear definition of your assets, your products, if you don't have kind of what I'd call organized core data domains, it's very hard to even use AI until you've got that organized. That's probably the biggest challenge I think the enterprise faces right now: most of the data is on systems that are a decade or more old.
It's not organized or mastered across those systems in a way that they can use it in Gen AI models. So that's one big problem.
I think the bigger issue is, let's say you get that all organized. David, I'll put you on the spot with an example.
So let's say you built a Gen AI model to produce summaries of investments in the financial services space and just kind of look at new investment ideas. You build a chatbot to do that; you spend money to do it.
And at the end of that, you have a model that will start generating these kind of investment memos. How would you define a good memo or a bad memo at scale? So let's say it generates 10,000 memos.
How do you know it works? If you're on a fund, then you know, having instant access to fund insights, modeling and forecasting are critical for producing returns, which is why I recommend Tactic by Carta. With Tactic by Carta, fund managers can now run complex fund construction, round modeling, and scenario analysis, meaning for the first time ever, you could use a single centralized platform to do everything from analyzing performance to modeling returns with real industry benchmarks.
Most importantly, communicate it all with precision to your LPs. Discover the new standard in private fund management with access to key insights and data-driven analysis that could change the trajectory of your fund.
Learn more at carta.com slash how I invest. That's C-A-R-T-A dot com slash H-O-W-I-I-N-V-E-S-T.
It's difficult to do at scale at 10,000, but I think on an individual basis, a good model is a model that hits all the points and then has more clarity and more details on the sub-points. So I would evaluate it based on: did it get all the main key points of the investment thesis at a high level? And then were the sub-level points sufficient and did they cover the main topics? What you're saying makes complete sense, which is you have a set of parameters or a set of outcomes you're looking for in a memo.
Even if you have that, though, the question then becomes, how do you evaluate that consistently across 10,000 memos? And I think this is the difference between backtesting an ML dataset versus Gen AI: you need a way to actually go back and validate that what is produced works. And I think that has been the real challenge that the enterprise has struggled with: you may have a sense for what good looks like.
You might say, for example, the definition of a good investment memo would be at least a paragraph summary of the competitive set and some context on the market, including growth rates. You could set a set of parameters that you're looking for it to answer, but then you have to wade through and kind of assess all of that.
And so what we've spent the last eight years doing for the model builders and others is building what's called semi-private custom evals, where we effectively set parameters like that would say, these are the definitions of good. This is the outcomes we're looking for.
And then we use human feedback to score those parameters. So we could go at big scale and say, does this outcome cover what you're looking for? And we bring subject matter experts to bear to actually do that scoring.
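As a minimal sketch of what that rubric-and-scoring loop could look like in code, assuming Python and an entirely made-up rubric (the criteria, pass threshold, and class names below are illustrative, not Invisible's actual eval definitions):

```python
# Hypothetical "custom eval" sketch: define what a good investment memo contains,
# then have subject-matter experts score each generated memo against it.
from dataclasses import dataclass

RUBRIC = {
    "competitive_set": "At least a paragraph summarizing the competitive set",
    "market_context":  "Context on the market, including growth rates",
    "thesis_coverage": "All main points of the investment thesis at a high level",
    "sub_points":      "Sufficient detail on the sub-level points",
}

@dataclass
class Score:
    memo_id: str
    reviewer: str   # a subject-matter expert, not a random rater
    ratings: dict   # criterion -> 0 or 1 (could also be a 1-5 scale)

def memo_passes(score: Score, threshold: float = 1.0) -> bool:
    """A memo 'works' only if enough rubric criteria are satisfied."""
    return sum(score.ratings.get(c, 0) for c in RUBRIC) / len(RUBRIC) >= threshold

def production_rate(scores: list[Score]) -> float:
    """Share of the (say, 10,000) generated memos that clear the bar."""
    return sum(memo_passes(s) for s in scores) / max(len(scores), 1)
```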
I think that's actually been the big gap is these are often things you can't score with a random person in the street. You can't just put it into market and hope it works.
You need a subject matter expert to say, this looks generally good, before any organization gets comfortable launching it. One way I've seen enterprises do that, in a couple of customer experiments already, is they'll actually have their own employees evaluating this at huge scale.
But if you think about the time suck of having large numbers of people just reviewing Gen AI outputs, that's very hard to do. So I think a lot of what we've now evolved to on the enterprise side is a mix of these kinds of evals and assessments of the models, which we then use to help customers fine-tune and improve their models.
Is there a gap between what a generalist searcher might want and somebody domain specific? In other words, if I'm making a $100 million investment decision based on a memo, that has to be much better than if I want to find out if dogs could eat a certain type of food and what's the best practice for raising a healthy dog. Post-earnings reports are more than just a data dump.
They're a goldmine of opportunities waiting to be unlocked.
With Daloopa, you could turn those opportunities into actionable insights.
Daloopa's dynamic scenario building tools integrate updated earnings data, letting you model multiple strategic outcomes such as what happens if a company revises guidance. With automated sensitivity analysis, you could quickly understand the impact of key variables like cost pressures, currency fluctuations, or interest rate changes.
This means you'll deliver more actionable insights for your clients, helping them navigate risks and seize opportunities faster. Ready to enrich your post-earnings narratives? Visit Daloopa.com slash how.
That's D-A-L-O-O-P-A dot com slash how today to get started. One of the questions you're asking here is what is the bar or the risk bar for production depending on the use case? And I do think it's different.
As an example, if the goal of a chatbot is just to, say, review restaurants and it's consumer-facing, the risk bar on that is: does it say anything toxic? Is there any bias? You can put some risk parameters around it, but you don't really need kind of subject matter expert feedback on it. I'll give an interesting one: legal, like law.
The bar for accuracy on that is materially different than a consumer example. And many law firms are experimenting with this, but it's hard to assess something like a debt covenant agreement without subject matter experts weighing in at scale to confirm the outputs are consistently good.
How do AI models evolve when the parameters change? Is this something that will always need to be refreshed? There are two ways that models are generally consumed.
Many models will be consumed by consumers and are effectively just going to be what the model builders produced. Usually the way the enterprise is using these models is they're tailoring those models to their corpus of information.
I'll give an example. Let's say that you have two wealth managers.
Let's say, I'm going to make up two, Fidelity and T. Rowe Price, and they want to, you know, experiment with things like robo-advisory or question answering.
They're not going to just use an off-the-shelf framework for that. They're going to tailor it off of all of the information that exists in their communications and their training documentation.
Any model that's being trained at the enterprise is usually being trained off of the internal knowledge management corpus that that institution has. And so you're using the large language model from the model builder and you're tailoring it to
your specific context. That process is called fine-tuning.
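As a rough illustration of that fine-tuning motion (a hedged sketch, not the process any particular wealth manager actually uses): take an open base model and continue training it on the firm's internal knowledge-management corpus. The model name, file path, and hyperparameters below are assumptions chosen just for the example.

```python
# Minimal sketch: continued pretraining of a small open causal LM on an
# institution's internal documents, using Hugging Face tooling.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "gpt2"  # stand-in for whichever foundation model the enterprise licenses
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Internal knowledge-management corpus, one document per line (hypothetical path).
corpus = load_dataset("text", data_files={"train": "internal_corpus.txt"})["train"]
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the resulting checkpoint now reflects the firm's own language and context
```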
Prior to becoming CEO of Invisible,
you headed McKinsey's Quantum Black Labs, which is their AI labs. What did you do at McKinsey?
I focused on three main things. One, all of our large-scale data transformation, data lakehouse, and data warehouse builds. So the first thing I mentioned, which is if your data is messy, it's very hard to use AI; I spent a lot of time focusing on that. Two, I spent a lot of time doing custom application development, building all sorts of different applications, whether that be for retention, pricing, or contact centers, custom software that people could use to deploy models.
And I do think that's an understated part of a lot of this is there is what a model does, but then there is the way that somebody can understand it and interpret it. And a lot of that is the user interface by which they consume it.
And so I think that's something the enterprise is spending a lot of time on is what is the user interface by which people consume and think about and make decisions around these models. And in the third area, I oversaw the Gen AI lab, which is McKinsey's kind of global Gen AI tool build.
When I was there, we were doing anywhere from 220 to 240 Gen AI builds at the time. I want to double click on the enterprise side of what you did at McKinsey.
You mentioned those two high-profile use cases that successful enterprises built. Did you build any successful enterprise use cases while you were at McKinsey? We definitely did.
And there's a public case you can reference that a couple of folks in Quantum Black built for ING, where it's effectively a chatbot. And one of the things they mentioned in it, very similar to what I'm saying now, is a lot of what was required to put that in production was getting it to 99% accuracy.
So they had a lot of parameters and fine-tuning around testing it, quality-controlling it, and building audit LLMs to make sure that the outcome was good. We definitely did a lot of that.
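One way to picture that "audit LLM" step, as a sketch under assumptions (the `call_llm` placeholder, the prompt, and the fail-closed behavior are illustrative, not the ING implementation): a second model checks each draft answer against its source passages before anything is shown to a customer.

```python
# Hypothetical audit-LLM gate: a second model verifies a chatbot answer against sources.
AUDIT_PROMPT = """You are an auditor. Given the source passages and a draft answer,
reply PASS if every claim in the answer is supported by the passages, otherwise FAIL
with a one-line reason.

Sources:
{sources}

Draft answer:
{answer}
"""

def call_llm(prompt: str) -> str:
    # Placeholder: plug in whatever model endpoint your stack actually uses.
    raise NotImplementedError

def audited_answer(answer: str, sources: list[str]) -> str:
    verdict = call_llm(AUDIT_PROMPT.format(sources="\n".join(sources), answer=answer))
    if verdict.strip().upper().startswith("PASS"):
        return answer
    # Fail closed: route to a human reviewer instead of shipping an unverified answer.
    return "This request needs review by a specialist."
```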
But the rough math you see across the industry is about 8% of Gen AI models make it to production. And that's broad-based.
The amount that kind of stall around the proof-of-concept pilot phase is pretty material. I think it will get better over time.
But to date, the challenge has been a lot of things I mentioned, challenging data, an unclear definition of good, and an unwillingness from the folks in the field to actually use and believe the outcomes of the models. And I think that's going to take time.
There's this concept in the productivity space, which is 80% done is 100% good. Is there like an 80-20 rule here where you could use AI to solve many things and dramatically decrease your need for sales representatives, for customer support? And does it have to be 100% good? That is a really complicated question.
So there's another analogy I'll use, which is manufacturing lines in a factory. And so if I ask you the question, if I have 10 people on a manufacturing line and every one of them saves 5% of their time, what is the line savings? I'm guessing half a person.
Zero, because you can't take half a person out of a line, and no single person can be taken off that line. You effectively just move to a world where everyone has a little bit more free time.
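The arithmetic behind that answer, as a tiny illustrative example: the per-person savings add up to half a person-equivalent, but headcount on the line can only fall in whole people, so the realized saving rounds down to zero.

```python
# Illustrative arithmetic for the factory-line example above.
import math

stations = 10
time_saved_per_person = 0.05                                  # each operator saves 5% of their time
person_equivalents_saved = stations * time_saved_per_person   # 0.5 of a person
removable_headcount = math.floor(person_equivalents_saved)    # 0 whole people can leave the line
print(person_equivalents_saved, removable_headcount)          # 0.5 0
```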
And I think that's the challenge. The 80-20 here is that things like copilots have had a very interesting last two years in that they are helpful: coding copilots, legal copilots, all these things. But it's unclear the degree to which they actually save any work. They kind of tweak a lot of things on the margin.
And I think the difference, I'd say, in the 80-20 is that to do that well, you actually have to re-engineer processes. You have to say, what does my end-to-end workflow look like for claims processing or whatever that might be? And how do I take out two full steps to actually get to a better level of efficiency? That's hard to use a software tool for.
You need kind of people on your team to think about the workflow design. You need to redesign the actual process flow.
That's been a bit of the challenge of the last two years is a lot of people have just focused on all different types of co-pilots across all different industries. And I think that's helpful.
But I think the next phase of this is actually process redesign and moving to ways where you can actually totally restructure the way a line works, as an example. An AI-native solution.
The framework is not just that AI is replacing what a human is doing, but how would you design the model with AI in mind? Most of the material benefit you're going to see is when you clean-sheet any process and ask: how would I design this process from scratch, knowing all the AI tools I have? And how do I use both technology and humans? And by the way, I think the answer is going to involve both for a long, long time. Humans are a core part of this solution.
At Invisible, we believe it's the human-machine interface where all the value sits. But it's not necessarily just giving the people on an existing process a tool.
It's redesigning the process to use all the tools at your disposal. Thank you for listening.
To join our community and to make sure you do not miss any future episodes, please click the follow button above to subscribe.
So let's talk about Invisible. Give me some specifics on how the company is doing today.
We ended 2024 at $134 million in revenue. Profitable.
We were the third fastest growing AI business in America over the last three years.
You just joined as CEO. What is your strategy for the next five to 10 years? And how do you even conceptualize a strategy given how fast the industry is changing?
We've had explosive growth in the current kind of core of the business, which is AI training. And we plan to continue to focus on that.
Our goal is to work with all the model builders to get these models as accurate as possible and support them any way we can with lots of human feedback.
So if you think about what Invisible has there, we have this kind of AI process platform where we break out any individual task into a set of stages and then insert kind of feedback analytics at all of those different steps. We then have the AI training and evals motion I described, which is a set of modules.
On the back of that, we have a labor marketplace where we can source all of those 5,000 different expert agents on any given topic. The core of that will remain our focus.
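To make the "stages plus feedback at every step" idea concrete, here is a minimal sketch; the class names, the trivial reviewer, and the two-stage example are assumptions for illustration, not Invisible's actual platform code.

```python
# Hypothetical process-platform shape: a task split into stages, with a reviewer
# callback capturing feedback analytics after each stage runs.
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Stage:
    name: str
    run: Callable[[Any], Any]                       # the work itself (model call, transform, ...)
    review: Callable[[Any], float] = lambda _: 1.0  # expert feedback score in [0, 1]

@dataclass
class Process:
    stages: list
    scores: dict = field(default_factory=dict)

    def execute(self, payload: Any) -> Any:
        for stage in self.stages:
            payload = stage.run(payload)
            self.scores[stage.name] = stage.review(payload)  # analytics captured per step
        return payload

# Illustrative usage: a two-stage flow with a trivial reviewer on the second stage.
proc = Process(stages=[
    Stage("draft", run=lambda text: text.upper()),
    Stage("summarize", run=lambda text: text[:40], review=lambda out: 0.9),
])
print(proc.execute("claims intake note ..."), proc.scores)
```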
The shifts I envision are kind of twofold. One, deepening our focus on using that for fine-tuning in the enterprise.
This is something I think all the model builders are hopeful for as well: the more that we can help all of the enterprise clients figure out how to get the most out of the model builds they're focused on, how to get those working, that's better for everyone. Everyone is hoping to see many more examples.
And I, by the way, am very, very optimistic that over the next five, six years, we're going to see many, many more examples of great Gen AI use cases in production. It's just been, I think, a period of learning.
The last few years have been kind of a proof of concept phase for the enterprise. Really helping the enterprise get many of those into production is a core focus for us.
The other big area that I'm going to evolve Invisible into, the analogy I would use is we're going to build a modern ServiceNow anchored in Gen AI. So Invisible's process platform will include much more data infrastructure.
It'll include an application development environment and process builder tools. And it'll include our really, really good services delivery team around that.
So one belief I have is that it's very hard to do any of this with a push of a button. I think the age of software has kind of relied on the idea that you build something and people take that as is.
And I think AI is much more around configuring and customizing different workflows exactly to what any given customer wants. You can envision what Invisible will evolve into as kind of our AI process platform with lots of process builder tools where people can build very sector-specific applications like claims for insurance or onboarding for food and beverage or fund admin for private equity.
So you'll have a bunch of different verticalized use cases we'll go after and a lot of really interesting core data infrastructure tools like data ontology, master data management, things like that to help people get their data working. How do you avoid being the victim of your own success? So you come into enterprises, you streamline their AI models using the services model.
How do you avoid making yourself obsolete? A funny piece of context is that 70% of the software in America is over 20 years old. The rate of modernization of that has been glacially slow.
I know there's been a lot of hype that says suddenly the whole world is going to be hyper-modern, everything's going to work in two years. I think this is a long journey over the next two decades where we get to a world where every enterprise runs off of modern infrastructure, modern tech stacks, and functions much like the digital companies, digitally native companies that have been built up over the last five years.
That will take time to get to, but I'm very excited about what our platform can do to enable that. Said another way, your total addressable market is every enterprise, minus the two that have built models.
Even in those two companies, I'm sure they're looking to streamline other parts of the business. I think that's right.
The interesting thing, if you look at what I would call the application modernization market, so all of the modernization of legacy systems that happens annually, no player right now is more than 3% of that.
So it's actually a very fragmented market that is painfully slow in how it moves. And it's one of the main frustration points for most enterprises.
Like if you ask the average CEO of any company that's over 10 years old how happy they are with their core data and the kind of tools they use on a daily basis, most are pretty frustrated. So I don't think this is something where what exists is really good and everyone's really happy.
There's a lot of frustration that we are hoping to help fix. And I think Gen AI will be the root of doing a lot of that.
I think there's a lot of tooling you can build to generate insights faster, to pull up reporting faster. And so we will be a Gen AI-native kind of application development platform.
You have a very unique vantage point in that you're the CEO of one of the fastest growing AI companies. You ran McKinsey's lab.
Walk me through the AI ecosystem today in terms of how you look at it. I've talked to a bunch of VCs about this in the past couple of days. There's the infrastructure layer, which is where most of the capital is going today.
And that's a mix of kind of things like data centers, as well as the model builders. And you asked about the gap from the investment today to enterprise modernization.
The challenge is that above the infrastructure layer, you have what I call the application layer, which is individual tools for individual use cases, right? And that could be, I mentioned claims, it could be legal services. It's all the verticalized applications that exist anchored in Gen AI to solve problems.
All of those applications today, for the most part, are SaaS or traditional software based. So they are designed, like all software over the last 20 years, to be a kind of push-button deployment of a specific use case that functions like traditional SaaS software.
I am skeptical that that is actually going to be the way that impact is realized with Gen AI, for a couple of different reasons. Software as a paradigm has existed that way because the idea was it took so long to get data schemas organized and structured, and it took so long to build any custom tool, that you had to invest all the money up front in building a perfect piece of software.
Once you got data locked in on that software, it was very hard for anyone to ever migrate off of that. The term of this is your systems of record.
Once you're locked in on any sort of a system of record, whether that be an ERP system, whether it be an HR database, you basically never leave as an enterprise because the data is really painful to move. And so that's been the conventional wisdom on how to build software for a long time.
You've had some really public examples. Satya Nadella mentioned it.
What Gen AI may enable is a movement where the value moves from the system-of-record layer to the agentic layer. So you actually move to a world where people don't stay on software that's sticky just because of the data.
They actually want the best possible software for their specific institution. So you might have a world where people are building tooling that is much more custom to their enterprise.
You might have a world where I have a React workflow that uses analytics that are customized to my enterprise in a cloud environment, and I can stand that up in a couple months. And I think that paradigm is a very different way forward for technology.
Now, I'm sure there are some that would dispute that. I'm sure there are some that will say software will exist as it always has.
But I would say that the main feedback I heard from a lot of VCs is that most of the application layer today focuses on the standard software paradigm. And I think we're looking at something very different, which is we want to have kind of an application development environment with a lot of configurability and customizability, the ability to build verticalized applications for specific sectors.
That will allow us to say, not this is our tool, take it, but much more, what is the workflow you need? Let's bring that to life. Let's say a Fortune 500 company is looking to create a CRM.
What would an AI-native CRM look like for a Fortune 500 company versus just using a Salesforce? What you would usually end up doing in that world is you'll look at Salesforce or Dynamics or ServiceNow, which has one of these now, and you will buy out-of-the-box functionality. Like you'll buy, let's say, their contact center tooling, but then you will end up customizing a fair amount of that to your enterprise.
So you'll say, my contact center is going to have this flow for service requests, this one for calls. And so even though you're buying the tool, you're going to spend a year customizing and configuring it for your workflow.
CRM is a little bit different in that you do have several large players, Salesforce, Dynamics, and now ServiceNow, that have built fairly good builder applications for that use case. If you're successful as CEO of Invisible, what will the company look like in 2030? I use the ServiceNow analogy because I have a ton of respect for what they've done.
My main Northstar metric is that every Gen AI model we work on will reach production. And so I'm really excited about working with all the model builders over the next couple of years to continue to fine tune and train their models and get that working at huge scale in the enterprise.
And I think that's something that we will be a huge driver of. What would you like our audience to know about you, about Invisible Technologies, or anything else you'd like to share? I don't want any of what I've said to come across as pessimistic.
There's nobody that believes more than I do that AI will be positive for the enterprise over the next five to 10 years. I think the last two years did not live up to the hype cycle, partly because there was a belief that you could just buy a product out of the box, push a button, and suddenly all your Gen AI will work.
My kind of advice or view on the path forward is I don't think that will be the paradigm. I think every enterprise will have to build some capabilities around what do I want to get out of these models? How do I train and validate these models? How do I make sure my data is adequately reflected in these models? And that's a very doable thing.
When we sit here five, 10 years from now, there'll be some really exciting deployments in this, like the ability to stand up new software, new digital workflows, and companies based on Gen AI is going to expand significantly.
But I do think it's been a bit of a reality check over the last two years that, you know, this is not like I just stand up a piece of software, push a button, and everything works. How should people follow you and Invisible? You can add me on LinkedIn.
I'll be posting about some of the updates we'll be adding there. We're building kind of, I'd call it, a data insights function at Invisible as well.
We're going to start to bring as much of the truth that we're seeing, what's exciting, and what we recommend to our enterprise clients, so we can help them navigate what is a very complex and difficult world.
Thank you for listening to my conversation with Matt. If you enjoyed this episode, please share with a friend. This helps us grow and also provides the best feedback when we review the episode's analytics. Thank you for your support.