Exclusive: How GPT-5 Actually Works

28m

In a bonus third episode, Ed Zitron reports exclusively on how OpenAI's new "router-based" ChatGPT-5 makes it impossible for the company to cache the static prompt for any model or tool it uses on every single prompt, doubling token burn for mediocre gains.

Better Offline listener deal: Get $15 Off Where's Your Ed At Premium! Deal goes until the end of August.
https://edzitronswheresyouredatghostio.outpost.pub/public/promo-subscription/better-offline-discount

YOU CAN NOW BUY BETTER OFFLINE MERCH! Go to https://cottonbureau.com/people/better-offline and use code FREE99 for free shipping on orders of $99 or more.

BUY A LIMITED EDITION BETTER OFFLINE CHALLENGE COIN! https://cottonbureau.com/p/XSH74N/challenge-coin/better-offline-challenge-coin#/29269226/gold-metal-1.75in

---

LINKS: https://www.tinyurl.com/betterofflinelinks

Newsletter: https://www.wheresyoured.at/

Reddit: https://www.reddit.com/r/BetterOffline/ 

Discord: chat.wheresyoured.at

Ed's Socials:

https://twitter.com/edzitron

https://www.instagram.com/edzitron

See omnystudio.com/listener for privacy information.


Transcript

This is an iHeart podcast.


Cool Zone Media.

Hi, my name's Ed Zitron, and welcome to Better Offline.

This is also Jackass.

So you've just had a cheery two-part chucklefest about how generative AI may tank our markets and our economy.

So I'm going to give you a lighter one, an episode about GPT-5, which is a model from OpenAI, and why just under three years of hype have led to the software equivalent of the launch of St. Anger, except every time Lars Ulrich hit the snare drum, it cost them $55,000.

Now, if we look at the positive reviews, we see takes ranging from Simon Willison's tepid remark that GPT-5 is just good at stuff to SemiAnalysis's completely insane statement that GPT-5 is setting the stage for ad monetization and the ChatGPT super app, in a piece that makes several assertions about how the router that underpins GPT-5 is somehow the secret way that OpenAI will inject ads, which is just distinctly silly.

I'll get into this in the episode a little bit, but just with everything you're gonna hear, you're gonna realize that this is just someone saying stuff.

Took four bylines to do that shit, too.

I'm also British, I'm gonna say router.

I might say router as well because I've been here a while.

Make fun of my voice if you really must.

But with that out of the way, here's a quote from SemiAnalysis's coverage: Before the router, there was no way for a query to be distinguished, and after the router, the first low-value query could be routed to a GPT-5 mini model that can answer with zero tool calls and no reasoning.

This likely means serving this user is approaching the cost of a search query.

This does not make any sense.

This, none of this makes sense. It's just a bunch of assumptions.

Why would this be the case?

The article also makes a lot of claims about the value of a question and how ChatGPT could, I am serious, agentically reach out to lawyers.

I'm not going to edit that out because agentically is not a fun word to say.

It's just complete nonsense.

And in fact, I'm not sure this piece reflects how GPT-5 even works at all.

Again, quoting it, the router serves multiple purposes on both the cost and performance side.

On the cost side, routing users to mini versions of each model allows OpenAI to service users at a lower cost.

Or with lower costs, even.

To be fair to SemiAnalysis, it's not as if OpenAI gave them much help.

OpenAI's official writings about the router aren't exactly filled with details, talking in glowing terms about what it does, but not how.

Here's what they say.

ChatGPT's real-time router quickly decides which model to use based on the conversation type, complexity, tool needs, and your explicit intent, for example, if you say "think hard about this" in the prompt.

The router is continuously trained on real signals, including when users switch models, preference rates for responses, and measured correctness, improving over time.

Once usage limits are reached, a mini version of each model handles remaining queries.

In the near future, we plan to integrate these capabilities into a single model.

And that last bit really doesn't make sense, but in any case, the launch of GPT-5 has been very, very weird.

At first, some people seemed really happy about it.

Chief among them was software YouTuber Theo Brown, also known as Theo GG, who has over 468,000 subscribers, and who said, I didn't know it could get this good.

This was kind of the like, oh fuck moment for me in a lot of ways.

And I've had to fight like a slow spiral into insanity.

It's a really, really good model.

He finished by saying, and keep an eye on your job, because I don't know what this means for us long term.

Pretty crazy, right?

Comments on the video included people saying things like: if OpenAI is holding you hostage, blink twice, and yes, that is a verbatim quote.

Another saying, this dude is everything wrong in IT today, another saying this video is sponsored by OpenAI, another saying, GPT-5 failed every test project I gave it today.

It's a lie in my experience.

Maybe they haven't ramped up the GPUs.

Now, from what I can tell, Theo Brown played with GPT-5 in OpenAI's offices and did all the benchmarking there.

OpenAI, by the way, fucking hell.

Come on, you can't benchmark in their offices.

Anyway, OpenAI's API-based access to GPT-5 models, you know, the thing that you use if you want to integrate GPT into your app, does not route them, by the way, nor does OpenAI offer access to its router or any associated models.

Important detail.

Just want you to know that because we need to make sure we're very clear.

Now, a week later, Theo Brown would put out another video called I Was Wrong About GPT-5, which he would open by saying, So first and foremost, I want to make sure it is very, very clear that the experience that you probably are having with ChatGPT and GPT-5 right now is not the experience that I had when I was first testing it.

Brown goes on to explain that he was not paid by OpenAI at all, that he was sincerely impressed by the company and GPT-5, and that he'd actually spent over $25,000 in inference testing it on his own company's software.

And indeed, also that he turned down a grand appearance fee.

Sorry, I mean, that's a very British thing, a thousand dollar appearance fee, not just like a really nice one.

Brown claims he asked OpenAI to try it out, and after they declined to let him test it early on his own, he was invited to try it on camera with a small group of other people at OpenAI's offices, where they'd film his reactions.

He said that the API was incredible, but that it's become apparent that the models he was using in the video were not the same as those released to the public, making a post on August 13th on X, the everything app, that GPT-5 was nowhere near as good in Cursor as it was when he was using it a few weeks ago, complaining that things that worked while demoing it at OpenAI no longer did, adding that somebody else on Twitter said they'd had a similarly great experience with GPT-5 on launch that has since decayed.

It isn't completely clear what happened here, but I'm going to guess that OpenAI showed Theo Brown and others in their offices some sort of heavily modified version of the model that burns significantly more compute to provide its outputs.

Though I'm also very suspicious of how significant the difference is here.

Brown's videos attempt to show the difference between the generations that he received from the model when it was good and when it was bad in this video, which I'll include a link to in the episode notes.

But if I'm honest,

they look pretty similar in that they're kind of mediocre.

I'm not saying that as a hater, by the way.

They just kind of look like shit.

It's just kind of, okay, like shit.

They look like regular fucking generated websites.

They don't look special.

The good one is fine, and the bad one has weird gradients on it.

This whole thing sucks, though.

And was a clear setup by OpenAI to overstate the abilities of GPT-5, one that fell apart with the lightest brush with reality.

I imagine their assumption was that Brown would post a glossy video and then walk away, and I'll give Theo some credit for straight up stating he was misled.

This was a desperate move, and one that blew up in the face of OpenAI, along with the rest of the GPT-5 launch.

People hate the model, customers are mad at OpenAI for taking away models like GPT-4o, and have remained mad even with their return.

And the ChatGPT subreddit is almost entirely people complaining about how ineffective the new version is and how even GPT-4o is not the same.

They've got gamer brain, baby.

As I said in last week's monologue, I believe OpenAI has grown a fandom rather than any kind of sustainable product-market fit, and they're now suffering fandom-like hate with every minor change they make in an attempt to push GPT-5 further, further aggravating people who barely understand why they use the product to begin with.

Yet at the center of the anger lay the reason for GPT-5's launch: the belief that this was somehow a cost-cutting measure, where OpenAI had added a router to ChatGPT as a means of sending certain requests to cheaper models to save money.

But when I hear router, I hear latency, and I never, for even a second, believed that this would somehow be cheaper to run.

It didn't make sense.

I'm a curious little critter, so I went and found out how ChatGPT 5 actually works.

And unlike the following incredible products that you should buy, it's actually kind of a big piece of shit.


And we're back.

And from here on out, I will define two things.

GPT-5, referring to the model and its associated mini and nano models, and ChatGPT-5, referring to the current state of ChatGPT, which features auto, fast, thinking, and thinking mini model selections. You can also see legacy models, but that's not what we're talking about today, and that's also only for a little bit. It's a distinction I have to make, by the way, and make early, because the two things are different. They work in different ways, and ChatGPT-5's structure introduces a bunch of trade-offs and downsides that, as I'll discuss later, make this whole thing even more wasteful. In discussions with a source at an infrastructure provider familiar with the architecture, it appears that ChatGPT-5 is in fact potentially more expensive to run than previous models, and due to the complex and chaotic nature of said architecture can at times burn upwards of double the tokens per query.

Tokens, for those who don't know, are basically chunks of text that the AI models do stuff with.

I'm simplifying this.

Do not email me and correct some minor thing.

Nobody cares.

A sentence like "the quick brown fox jumps over the lazy dog" will be broken into lots of smaller, roughly four-character chunks.

There are different kinds of tokens and they're all priced differently.

An input token refers to the data you send to the model when you ask it a question.

Output tokens are used to measure the size of its response, with bigger responses requiring more tokens.

The more tokens you burn per query, the more expensive it is to run that query.

The fact that ChatGPT-5 can, in certain circumstances, burn twice the number of tokens per query means that every question costs more.
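To put that in back-of-the-envelope terms, here's a tiny Python sketch. The per-token prices and token counts are invented for illustration, not OpenAI's actual rates; the only point is that doubling the tokens burned per query roughly doubles what it costs to serve.

```python
# Back-of-the-envelope cost sketch. Prices are hypothetical placeholders, not OpenAI's real rates.
INPUT_PRICE_PER_MILLION = 1.25    # dollars per million input tokens (assumed)
OUTPUT_PRICE_PER_MILLION = 10.00  # dollars per million output tokens (assumed)

def query_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of serving one query at the hypothetical per-token prices above."""
    return (input_tokens * INPUT_PRICE_PER_MILLION
            + output_tokens * OUTPUT_PRICE_PER_MILLION) / 1_000_000

baseline = query_cost(input_tokens=2_000, output_tokens=500)   # made-up token counts
doubled = query_cost(input_tokens=4_000, output_tokens=1_000)  # same question, double the token burn

print(f"baseline query: ${baseline:.5f}")
print(f"doubled tokens: ${doubled:.5f} ({doubled / baseline:.1f}x the cost)")
```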

ChatGPT-5 is also significantly more convoluted, plagued by latency issues, and more compute-intensive, thanks to OpenAI's new, smarter, more efficient model routing system.

In simpler terms, every user prompt on ChatGPT, whether it's in auto, fast, thinking, or thinking mini, starts by putting the user's prompt before the static prompt.

I don't want to lose you here.

This is important.

A static prompt is the set of invisible instructions given by OpenAI to ChatGPT, the models themselves, and the tools associated with them, to tell them how to operate.

Instructions like: you are ChatGPT, you are a large language model, you are a helpful chatbot, do not threaten them with a knife, and so on and so forth.

These static prompts are different with each model you use.

A reasoning model will have a different instruction set to a more chat-focused one, such as: think harder about a particular problem before giving an answer, break down problems into component parts, or, when you get a certain kind of request, like a coding question, query a coding tool, that kind of thing.

A user prompt is exactly what it sounds like, the thing that a user wants the AI model to do.

The new order in ChatGPT-5 becomes an issue when you use multiple different models in the same conversation, because the router, the thing that selects the right model for the request, has to look at the user prompt.

It can't consider static instructions first because they may be different based on what the user asked.

In fact, the order has to be flipped for the whole thing to work.

Put more simply, previous versions of ChatGPT would take the static prompt and then invisibly append the user prompt onto it.

This static prompt would typically be cached, massively reducing the amount of compute the model needs to perform a task.
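Here's a toy sketch of why that ordering matters. Prompt caches generally work by matching on a shared leading prefix of the request, so a static prompt that always comes first can be reused across requests, while putting the ever-changing user prompt first leaves nothing stable to match on. This is a simplified model of prefix caching in general, assuming nothing about OpenAI's internal systems.

```python
def cached_prefix_tokens(prompt: list[str], cache: set[tuple[str, ...]]) -> int:
    """Toy prefix cache: count how many leading tokens match a previously cached prefix."""
    for n in range(len(prompt), 0, -1):
        if tuple(prompt[:n]) in cache:
            return n
    return 0

static = "you are a helpful assistant follow these rules".split()  # stands in for a long static prompt
user_a = "write me a limerick about routers".split()
user_b = "summarise this contract for me".split()

cache: set[tuple[str, ...]] = {tuple(static)}  # the static prompt has been cached once

# Old ordering: static prompt first, user prompt appended. Every request shares the same
# leading prefix, so the bulk of the prompt is served from cache.
for user in (user_a, user_b):
    prompt = static + user
    print(f"static-first: {cached_prefix_tokens(prompt, cache)}/{len(prompt)} tokens from cache")

# New ordering: user prompt first, so the router can read it before instructions are chosen.
# No two requests share a leading prefix, so the cache never helps.
for user in (user_a, user_b):
    prompt = user + static
    print(f"user-first:   {cached_prefix_tokens(prompt, cache)}/{len(prompt)} tokens from cache")
```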

ChatGPT-5 cannot do this.

Every time you use ChatGPT 5, every single thing you say or do can cause it to do something different.

Attach a file? It might need a different model.

Ask it to look into something and be detailed? That might trigger a reasoning model or a different depth of reasoning.

Ask a question in a weird way? Sorry, the router is going to need to send you to a different model entirely.

Each time, it comes up with new instructions based on a subtle interpretation of what you asked it.

Every single thing that can happen when you ask ChatGPT to do something may trigger the router to change model or request a new tool.

And each time it does so requires a completely fresh static prompt, regardless of whether you select auto, thinking, fast, or any other option on ChatGPT.

This in turn requires it to expend more compute, with queries consuming more tokens compared to previous versions.

It's like you started a job, and every time you do a task, write an email, make a cup of coffee, attend a meeting, email someone with a threat, your workplace requires you to complete the entire mandatory onboarding training first.

Want to edit a spreadsheet?

Not before you brush up on your anti-bribery legislation first, you prick.

As a result, ChatGPT may be smart, but it doesn't really seem efficient in the GPT-5 version.

Now, to play devil's advocate, OpenAI likely added the routing model as a means of creating a more sophisticated output for a user, and I imagine with the intention of cost saving.

Then again, this might just be the thing it had to ship.

After all, GPT-5 was meant to be the next great leap in AI and the pressure was on to get it out the door.

By creating a system that depends on an external routing model, likely another LLM in this case, OpenAI has removed the ability to cache the hidden instructions that dictate how the models generate answers in ChatGPT, creating massive infrastructural overhead.

Worse still, this happens with every single turn, as in message on ChatGPT-5, regardless of the model you choose, creating endless infrastructural baggage with no real way out that only compounds based on how complex a user's queries get or how much they change.

They could be simple, but just going in different directions every time.

Could OpenAI make a better router?

Sure.

Does it have a good one today?

No.

Every time you message ChatGPT, it has the potential to change model or tooling based on its own whims, each time requiring a fresh static prompt, and short of totally reworking the architecture of ChatGPT-5, there's no way to change this.

And if it's an LLM choosing which model, I don't know, maybe it hallucinates.

Just a guess.

It doesn't even need to be the case where a user asks ChatGPT-5 to think.

And based on my tests with GPT-5, sometimes you can just ask it a four-word question, and it'll think about it for no apparent reason.

OpenAI has created a product with latency issues and an overwhelmingly convoluted routing system that's already straining capacity, to the point that this announcement feels like OpenAI is walking away from its API entirely.

This, as a reminder, is the thing that people use to incorporate OpenAI's models into their apps, while also running said models on the infrastructure OpenAI rents from Microsoft and CoreWeave, and at some point Oracle as well.

And this API thing is really weird, by the way, because these are new models, but OpenAI is really not talking about the models themselves that much.

Unlike the GPT-4o announcement, which mentions the API in the first paragraph, the GPT-5 announcement has no reference to it, and only has a single reference to developers at all, when talking about coding.

Sam Altman has already hinted that he intends to deprioritize new API demand, though I imagine he'll make an exception for anyone who will pay for priority processing, which is essentially OpenAI's way to require minimum commitments and extra payments from API customers, just so they never feel the bite of any compute shortages and throttling, which they absolutely will do to people that don't pay.

ChatGPT-5 feels like the ultimate comeuppance for a company that has never been forced to build a product, choosing instead to bolt increasingly complex tools onto the side of models in the hopes that one will magically appear.

Now each and every feature of ChatGPT burns more money than it ever did before.

ChatGPT 5 feels like a product that was rushed to market by a desperate company that had to get something out the door.

In simpler terms, here, it's actually really funny. When I worked this out, I chuckled vigorously.

This is just a case where OpenAI has given ChatGPT a middle manager.

But now I'm giving you the chance to open up your hearts and do something better.

Open up your wallets too, and send money to the companies that follow here.

Behold my advertisements.


And we're back.

Like every great middle manager, ChatGPT-5's router creates more work based on its own interpretation of what's going on.

And as a separate large language model, I can't imagine it has a ton of training data available.

If I had to guess, and this is a guess by the way, OpenAI has done and will do a lot of fine-tuning and reinforcement learning to make it work.

Though, to give it a little grace, this is a new thing that it's doing, and it's doing so at a huge scale.

The problems start, by the way, with the fact that ChatGPT-5 is taking the user's initial prompt and then deciding which model to use.

Unlike previous models, which sent your prompt directly to the model along with the static prompt, which was cached and came first, an important feature in how these models limit token burn, OpenAI now starts with a router model that takes what you ask ChatGPT and tags it based on what kind of thing your question might need.

That thing might be a tool, such as whether it has to do a web search to spit out the answer at the end, or a reasoning model, or whether it needs to use a coding language, and so on and so forth.

Once ChatGPT has bounced your query across various models, burning compute along the way, it then pushes it towards the chat portion of the generation.

And each time you ask ChatGPT a question or to do something, a new specialized static prompt is generated, sometimes several, making it impossible to cache them in advance.

In simpler terms, each time you message it, ChatGPT has to dump all cached information and instructions for what it needs to do and reload them with each prompt.

Now here are some examples of what ChatGPT-5 has to reload every single time you prompt it.

Whether or not to use a browser or search the internet and under what conditions to do so because they will change with each prompt.

How to approach a particular problem based on what the user asked, including any specific ways it's meant to answer, tone, brevity, and so on, based on their request.

Specifics around how it might use, say, OpenAI's code interpreter, such as the usage rules for running a Python script or how you want the code's output, which, again, will be different based on each prompt. And you can even say do it in exactly the same way, and because it's a large language model, it may hallucinate something different. Every single goddamn time you prompt ChatGPT-5, it has to do this. Worse still, a particular conversation can involve you using multiple different models and tools, requiring, with each and every prompt, a different static prompt to be injected for each component that ChatGPT-5 uses. And you can't cache the static prompt before the user's intent, because if you did that, it might send an instruction to a model that doesn't make sense, such as telling a reasoning model to give a quick and simple answer, or a mini or nano model to do some sort of deep reasoning, which would create a crappy answer and burn tokens in the process.
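To make those examples concrete, here's a toy sketch of what rebuilding the hidden instructions every turn looks like. The tags and instruction snippets are invented for illustration; the point is that the assembled static prompt depends on the router's per-turn decisions, so there's no single pre-built version to cache and reuse.

```python
# Hypothetical router tags for three consecutive turns of one conversation.
turns = [
    {"tools": ["browser"], "reasoning": "none", "tone": "brief"},
    {"tools": ["python"], "reasoning": "deep", "tone": "detailed"},
    {"tools": [], "reasoning": "light", "tone": "brief"},
]

def build_static_prompt(tags: dict) -> str:
    """Assemble this turn's hidden instructions from the router's tags (toy version)."""
    parts = ["You are ChatGPT, a large language model."]
    if "browser" in tags["tools"]:
        parts.append("You may search the web; cite what you find.")
    if "python" in tags["tools"]:
        parts.append("You may run Python; show the script's output.")
    if tags["reasoning"] != "none":
        parts.append(f"Reason {tags['reasoning']}ly before answering.")
    parts.append("Keep answers brief." if tags["tone"] == "brief" else "Answer in detail.")
    return " ".join(parts)

prompts = [build_static_prompt(t) for t in turns]
print("distinct static prompts across 3 turns:", len(set(prompts)))
for p in prompts:
    print("-", p)
```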

And this is all thanks to the complicated way that OpenAI insisted on building GPT-5.

Every single time you send something to ChatGPT, it can trigger the use of a different series of models, audio, vision, reasoning, each with their own instructions and static prompts, all while pulling in different tools, each requiring their own instructions based on what you asked, and reasoning models even have different depths of reasoning.

Unlike GPT-4o, which is a multimodal model combining text, vision, and voice, GPT-5 is a rat king of OpenAI's models and tools that gets reborn every single time you ask it to do anything.

It can prompt cache some things, but the core instructions, not so much.

But let's get a little more granular, because I know I've been quite repetitive, but this is detailed.

So from what I've been told, there are either one or two models at work for the routing.

I'm going to go with what I think is most likely based on the discussions I've had with people familiar with the architecture.

I've heard the term orchestrator thrown around, potentially suggesting the router may be more omnipresent throughout the process, but I was unable to confirm its existence.

Reach out if you hear differently.

I'll explain things as they were explained to me, though.

When a user sends a prompt, it goes through the splitter leg, which decides to send the query on one of two paths.

One is called the fast path, for when a query is straightforward, such as a text-only conversation that doesn't require any analysis or extra tools; the other is thinking, a path for when the query may require reasoning or more complex tools, like code generation or access to a web browser for research.

To be clear, there are prompts where it may be split into multiple parts that trigger multiple models or tools, each requiring their own static instructions.

From what I understand, the splitter model is a completely separate large language model, though we don't have a ton of details about it.

I also, based on conversations I've had, think there's a chance there could be a separate model that sits above the splitter that does much lighter classification of how a query might be routed.

So you ask it to do something, it might just go, okay, this looks like it needs a tool.

But I'm going off vibes now.
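As a rough illustration of the splitter idea as it was described to me, here's a toy classifier. The real thing is reportedly an LLM rather than a pile of keyword rules, and the path names, signals, and tool tags here are assumptions made purely for the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Route:
    path: str                       # "fast" or "thinking" (names assumed for the sketch)
    tools: list[str] = field(default_factory=list)

def toy_splitter(prompt: str, has_attachment: bool = False) -> Route:
    """Crude stand-in for the splitter; the real one is reportedly an LLM, not keyword rules."""
    p = prompt.lower()
    tools = []
    if has_attachment or "chart" in p or "image" in p:
        tools.append("vision")
    if "code" in p or "python" in p:
        tools.append("code_interpreter")
    if "latest" in p or "news" in p:
        tools.append("browser")
    needs_thinking = bool(tools) or "think hard" in p or len(p.split()) > 60
    return Route(path="thinking" if needs_thinking else "fast", tools=tools)

print(toy_splitter("hey, what's a good name for a cat?"))
print(toy_splitter("think hard about this contract clause"))
print(toy_splitter("write python code to parse this chart", has_attachment=True))
```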

In any case, none of this can be cached, because all of this exists before inference, which, by the way, is a term I've misstated in the past as being like inferring; inference actually means everything that happens to get an output to you.

So all of the stuff that's happening.

And by the way, this is all a completely new cost that OpenAI has created.

No one does this like this.

It's so fucking stupid.

But now we get to the chat leg.

Now that OpenAI has added layers of abstraction, it can begin cooking up the output, by which I mean do inference.

The chat leg is where the pieces that the splitter model created are pulled together, each loaded in with their respective static prompts based on what the user asked ChatGPT-5 to do.

Each piece of the model, a tool to generate Python, an image generation tool, a reasoning model to generate an output, has to process an entirely new static prompt.

And again, that's every interaction.

Remember, static prompts are effectively instructions, so the splitter model has told each piece of the pie how to act to create a particular output.

As a result, much of this can't be cached, creating more and more repetitious token burn per response, and it means I have to repeat this stuff so that you really get it.

The upshot of the chat leg's static prompt baggage is that you can do a little more here, at least in theory.

Because each component can be instructed separately, they can, again in theory, be made to give more individualized specialized outputs, like creating an image with text that is, as I'll give you an example of very shortly, generated using a specific reasoning model.

I'm clutching at straws here.

I don't really know if this is better, but I'm trying to be reasonable.

I'm trying to be normal.

Every day I try and be normal.

Previously, OpenAI's advantage was that a model like GPT-4o was kind of a jack of all trades, but to get the benefits, and that's in air quotes, of ChatGPT-5, it's engaged a conductor model that can just make things more convoluted, even in the case of simple requests.

Let me give you an example.

You upload a chart of NFL players' stats and ask ChatGPT to decide which is the best of the group and create an image to show the results.

With GPT-4o, ChatGPT would use one model, and thus one static prompt, to look at the image, decide which tools to use, and then how to format the response.

You only needed one prompt, which was cached, because one model can look at the stats, pull the data, and make the decisions, and then use the image generation tool to make the final image.

In GPT-5, the ChatGPT conductor model would see the stats and route it to a vision model, requiring its own static prompt, then to a separate text-only reasoning model, one that has no ability to use tools but might be cheaper to get an answer from, and also requires a static prompt, which would then decide which players were best and spit out an output, and then route it to a completely separate model that can generate text to query the image tool, needing another static prompt for this, to then generate the image.
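Here's that example as a back-of-the-envelope count. The component names and the instruction size are assumptions, not measurements; the point is just that one cached instruction block turns into several freshly processed ones.

```python
STATIC_PROMPT_TOKENS = 2_000  # assumed size of one hidden instruction block, purely illustrative

# GPT-4o-style path: one multimodal model handles vision, reasoning, and the image tool call,
# so there is one static prompt, and it's a stable prefix that can be cached after first use.
gpt4o_components = ["multimodal model"]
gpt4o_fresh_tokens = 0  # cache hit on the single shared static prompt

# GPT-5-style path from the example: each hop gets its own freshly generated instructions.
gpt5_components = ["router/splitter", "vision model", "text-only reasoning model", "image-tool caller"]
gpt5_fresh_tokens = len(gpt5_components) * STATIC_PROMPT_TOKENS

print(f"GPT-4o path: {len(gpt4o_components)} component, ~{gpt4o_fresh_tokens} uncached instruction tokens")
print(f"GPT-5 path:  {len(gpt5_components)} components, ~{gpt5_fresh_tokens} uncached instruction tokens")
```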

On top of all this onerous baggage lies another problem.

GPT-5's various models are just more complex.

By splitting out the component elements of what a model can do and allowing each model to have different levels of reasoning, even the cheaper ones like mini and nano, OpenAI has created an endless combination of different reasons to have to make a brand new static prompt instruction, all automated by a router, a large language model that chooses which large language model to choose for a query.
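A quick way to see why you can't just pre-cache a static prompt for every variant: the combinations multiply. The lists below are illustrative guesses, not OpenAI's actual lineup, but even small numbers blow up fast.

```python
from itertools import product

# Illustrative counts only; the real sets of models, reasoning levels, and tools are assumptions here.
models = ["gpt-5", "gpt-5-mini", "gpt-5-nano"]
reasoning_levels = ["none", "minimal", "medium", "high"]
tool_bundles = ["no tools", "browser", "code", "browser+code", "vision", "vision+code"]

variants = list(product(models, reasoning_levels, tool_bundles))
print(f"{len(models)} models x {len(reasoning_levels)} reasoning levels x "
      f"{len(tool_bundles)} tool bundles = {len(variants)} distinct instruction sets")
# ...and that's before per-request tweaks (tone, brevity, user intent) that the router folds
# into the instructions, which is what makes pre-caching them all impractical.
```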

It is, if I'm honest, kind of funny.

Reasoning models work, when simply described, by breaking up a prompt into component pieces, looking over them and deciding what the best course of action might be.

ChatGPT's router is effectively an abstraction hire, breaking up the prompt into component pieces, then choosing different models for each of those pieces, which may in turn be broken up by a reasoning model.

While I wouldn't say this is a hat-on-a-hat situation, it is at this point unclear what exactly the benefits of ChatGPT-5's new architecture are.

Less hallucinations?

Better answers?

Based on what I've been told, this was a decision made to increase the model's performance.

What I can say is that this very likely increased OpenAI's overhead at a time when it needs to do the exact opposite.

Even if ChatGPT-5 pushes people towards cheaper models, it does so while guaranteeing extra costs and latency, and whatever signals it may learn as people use this will have to create significant benefits, massive 100% plus gains for it to be anything close to worthwhile.

While OpenAI's router may be smart in terms of the nuance of how it might answer a query, and even that I question, it most decidedly is not more efficient, and may have actually increased the burn rate for a company that will lose as much as $8 billion this year.

And I think that number might be low too.

Yet what I'm left with in writing this script is how wasteful all of this is.

OpenAI, a company that has already incinerated upwards of $15 billion in the last two years, has chosen to create a less efficient way of doing business as a means of eking out a modest-at-best performance improvement.

It just sucks.

In our own lives, we're continually pushed, pressured, and punished if we get into debt, judged by our peers and our parents if we spend our money recklessly.

And if we're too reckless, we find ourselves less likely to receive anything from credit to housing.

Companies like OpenAI live by a different set of standards.

Sam Altman intends to lose more than $44 billion by the end of 2028 on OpenAI, and graciously told CNBC, like Lord Farquaad, that he was willing to run at a loss for a long time, where he was treated like he was this smart, reasonable decision-maker rather than someone that needed to rein in their horrendous spending habits and be more mindful.

The ultra-rich are rewarded far more for their errant spending habits than we ever are for any thriftiness or austerity measures we make, and none of us are afforded the level of grace that clammy Sam Altman has been, and "has been" feels appropriate.

ChatGPT-5 is an engineering nightmare, a phenomenally silly and desperate attempt to juice what remains of the dying innovation and excitement within the walls of OpenAI.

It's not November 2022 anymore, and let's be honest, there really hasn't been anything exciting or interesting out of this company since GPT-4.

There's nothing exciting happening at this company.

As many as 700 million people a week allegedly use ChatGPT, but nobody can really say why, and OpenAI, despite its massive popularity, cannot seem to stop losing billions of dollars, and it can't seem to explain why that's necessary other than this shit's really expensive, dude. Can anyone actually articulate a reason why we need to burn billions of dollars to do this? What are we doing? Why are we doing it? Has everybody just agreed to do this until it becomes completely untenable? Do we all yearn for the abyss so much that we can't find camaraderie in admitting we were wrong? Look at GPT-5.

This is, if you believe the hype, the best funded, best-resourced company in the world with the greatest mind at its helm and the greatest minds within its walls.

And this is the best they've got.

A large language model that chooses which large language model will answer your question.

Gee, fucking whiz, Sam Altman.

Sounds dandy.

And how much better is this, you say?

Oh, you can't really say?

Fucking brilliant.

Hey, does it do anything new?

No?

Oh, what's that?

It's actually our job to work that out for ourselves?

Thanks, man.

I love it.

I love this shit.

And if you're someone that is a hype merchant listening to this, and you've done really well getting to the end of the third part, by the way, I respect you, I want you to email me and explain why they should be justified in burning billions of dollars. If you tell me Uber, if you tell me AWS, I will eat you alive. I mean that as a, I mean that completely literally. I will unhinge my jaw. I'll eat you like Kirby and shit out of dance. I've said that one before, but I'm going with it.

In any case, this three-parter has also really reminded me how ridiculous this is, how nonsensical things have become, and how much waste has been kind of justified.

Justified on this idea that this will become something by people that don't really know what it does today or might do in the future.

None of this is going to end well, and not even the boosters seem to be having fun anymore.

Everybody's just flailing around, waiting for it to end.

Even Sam Altman seems tired of it all.

I know I bloody well am.

Thank you for listening to Better Offline.

The editor and composer of the Better Offline theme song is Matt Osowski.

You can check out more of his music and audio projects at mattosowski.com.

That's M-A-T-T-O-S-O-W-S-K-I dot com.

You can email me at ez@betteroffline.com or visit betteroffline.com to find more podcast links and, of course, my newsletter.

I also really recommend you go to chat.wheresyoured.at to visit the Discord and go to r/BetterOffline to check out our Reddit.

Thank you so much for listening.

Better Offline is a production of CoolZone Media.

For more from CoolZone Media, visit our website, coolzonemedia.com or check us out on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts.


This is an iHeart podcast.