Monologue: Don't Be Scared Of Sora
In this week's Better Offline monologue, Ed Zitron walks you through why nobody should be afraid of generative video, and how it’s impossible, impractical and expensive to make movies with Sora.
https://www.npr.org/2025/10/20/nx-s1-5567119/sora-2-openai-hollywood
https://platform.openai.com/docs/guides/video-generation
https://mediashower.com/blog/first-ai-viral-ad/
https://x.com/PJaccetturo/status/1932893269228466270
Want to support me? Get $10 off a year’s subscription to my premium newsletter: https://edzitronswheresyouredatghostio.outpost.pub/public/promo-subscription/w08jbm4jwg
YOU CAN NOW BUY BETTER OFFLINE MERCH! Go to https://cottonbureau.com/people/better-offline and use code FREE99 for free shipping on orders of $99 or more.
---
LINKS: https://www.tinyurl.com/betterofflinelinks
Newsletter: https://www.wheresyoured.at/
Reddit: https://www.reddit.com/r/BetterOffline/
Discord: chat.wheresyoured.at
Ed's Socials:
https://www.instagram.com/edzitron
See omnystudio.com/listener for privacy information.
Transcript
This is an iHeart podcast.
Cool Zone Media.
All right, Matt, I've read the YouTube comments, and this time I want it so you do not cut me off with the music too fast, okay?
Good?
Right, alright, let's go.
This is this week's Better Offline monologue and I'm Ed Zitron.
A lot of you have been saying you want me to do something about Sora and if I'm honest, I haven't wanted to because I find the whole thing so utterly pathetic.
A few weeks ago, OpenAI launched a half-baked social networking app attached to a compute intensive video and audio generator.
And people immediately began to do two things: freak out and generate as many copyright violations as humanly possible.
All because OpenAI's original plan was to ask copyright holders to opt out of having their content presented in these videos.
Sora spent several days covered in Nazi SpongeBobs and Pikachus with guns before multiple Hollywood talent agencies, along with the estate of Martin Luther King Jr., intervened and complained, leading to OpenAI creating, to quote NPR, an opt-in policy allowing all artists, performers, and individuals the right to determine how and whether they can be simulated, with OpenAI blocking the generation of well-known characters on its public feed and offering to take down material not in compliance.
It's unclear what happened with Nintendo, but I imagine one of their 70 million lawyers attacked.
And now we've got that out of the way, let's talk about Sora itself.
I understand a lot of the people who listen work in film and TV, and they're kind of scared. I understand that you've seen a few clips that look kind of, sort of realistic, and that this, especially if you're in the creative arts, is quite terrifying because your mind naturally assumes that these clips can be strung together into some sort of coherent whole.
This isn't the case.
Every single good, and I use the term loosely, Sora video is cherry-picked from many, many, many terrible generations.
Every time you use Sora, the result is random.
It doesn't matter how specific your prompt is or however many times you've used it, Sora is effectively a giant video and audio slot machine.
You can never, ever guarantee that Sora will generate something useful, and as a result can never really budget for using it.
The human eye is remarkably demanding, and little visual inconsistencies between scenes will make people feel weird and uncomfortable.
Imagine that extrapolated to 10 or 15 seconds at a time and how difficult it will be to get something that makes visual sense before you have to think about things like, does this connect to the rest of the footage I'm using?
Okay, so the majority of actual professionals who would use Sora would not be using the app.
They'll be connecting directly to the model on OpenAI's API.
It's just...
It's not done via a classical app interface.
Now, then there's the problem of cost.
This is where you really need to start worrying if you're building things with Sora.
So let's start off with the first problem.
Cost.
So OpenAI offers two different Sora models.
Sora 2, which they say is designed for speed and flexibility and is ideal for the exploration phase.
And that costs 10 cents per second.
And then there's Sora 2 Pro, which is either 30 cents or 50 cents a second, depending on resolution.
And I quote, it's the thing you go to for production quality outputs.
So you're either spending one, three or five dollars for every 10 seconds of footage and like every generative model, the longer you generate, the higher the likelihood of hallucinations, which in the case of Sora means bizarre animations, inconsistent details, or just flat out useless crap.
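To make that arithmetic concrete, here is a minimal sketch of the per-clip cost at the per-second rates quoted above. The function name and the tier labels are made up for illustration (they are not official model identifiers), and prices are kept in whole cents to avoid floating-point rounding.

```python
# Per-second prices as quoted in the monologue, in US cents.
# The tier labels are illustrative names, not official product identifiers.
RATE_CENTS_PER_SECOND = {
    "sora-2": 10,
    "sora-2-pro-lower-res": 30,
    "sora-2-pro-higher-res": 50,
}


def clip_cost_cents(tier: str, seconds: int) -> int:
    """Cost of a single generation of the given length, in cents."""
    return RATE_CENTS_PER_SECOND[tier] * seconds


for tier in RATE_CENTS_PER_SECOND:
    dollars = clip_cost_cents(tier, 10) / 100
    print(f"{tier}: ${dollars:.2f} per 10-second clip")
```

This only prices one attempt. It says nothing about how many attempts it takes to get a usable clip, which is where the real cost comes from.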
Then there's the problem of time.
OpenAI's own documentation says that a single render may take several minutes.
At the end of those several minutes, out pops a video that may or may not be of any use.
OpenAI allows you to remix using more prompts, which allows some iterative development, but these remixes also cost money and also take several minutes.
So let me walk you through a scenario.
You're making a short film.
Let's just say it's 15 minutes long, which is 900 seconds.
You ask Sora to generate a man putting on a hat.
Your first eight generations take four minutes and $5 apiece, which comes to about 32 minutes and $40.
They don't really do the job, so you do two more, taking another four minutes apiece and $10 more.
You finally, on the next try, get something kind of useful, which costs you another $5, and then you realize you wanted him to wear a specific kind of hat.
This happens all the time when directing stuff.
There are minor changes you make that you realize when you're finally in the moment would look or sound or be better.
So yeah, that doesn't go so well with probabilistic models.
So shit, fuck, you gotta do something.
So you remix him.
Another four minutes, another $5.
Fuck, wrong hat.
Four minutes, $5.
Right hat, but his hand blends through it for some reason.
Okay, four minutes, $5.
The hat's right, but when he puts it on, his eye blinks.
One of his eyes just blinks three times for some reason.
So you can't really use that.
Okay, four minutes, five dollars, looks kind of good.
Different hat again.
Four minutes, five dollars.
Hmm.
You've now spent $80 and over an hour generating a man trying to put on a hat.
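The running total in that scenario can be tallied with a short sketch. It assumes, as the scenario does, a flat $5 and four minutes per attempt; the attempt labels are paraphrases added for readability.

```python
# Flat per-attempt figures taken from the hat scenario.
COST_PER_ATTEMPT_USD = 5
MINUTES_PER_ATTEMPT = 4

# (what happened, number of attempts) in the order described in the scenario.
attempts = [
    ("first eight generations, none usable", 8),
    ("two more, still not usable", 2),
    ("kind of useful, but the wrong kind of hat", 1),
    ("remix: wrong hat", 1),
    ("remix: right hat, hand blends through it", 1),
    ("remix: one eye blinks three times", 1),
    ("remix: looks kind of good, different hat again", 1),
    ("remix: one more try", 1),
]

total_attempts = sum(count for _, count in attempts)
total_cost_usd = total_attempts * COST_PER_ATTEMPT_USD
total_minutes = total_attempts * MINUTES_PER_ATTEMPT

print(f"{total_attempts} attempts, ${total_cost_usd}, {total_minutes} minutes")
```

Sixteen attempts at those rates is where the $80 and the "over an hour" come from.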
You're not really much closer to having useful footage, and as you remix it again and again, Sora keeps making these little errors, because that's how these models go.
It's impossible to tell whether the next generation will be the one that works or whether Sora will spit out some new little fuck-up.
So, the more intricate something is, the more expensive it gets.
But you know what?
You can find money in places.
You can't find more goddamn time.
I guess you could have a separate computer running more, but that's still going to cost a bunch of money.
How many of these slot machines are you going to run at once?
How many times are you going to allow them to edit?
How can you have a coherent vision when you've got multiple people generating things?
You can't.
But you know what?
Perhaps the next generation will be great.
Or perhaps it will be dogshit.
You have no way to know because that's the magic of generative AI.
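One way to picture the slot-machine problem is to pretend, purely for illustration, that every attempt independently has the same fixed chance of being usable. That number is an assumption: nothing in the monologue or in OpenAI's documentation states one, and real attempts are unlikely to behave this neatly. Even under that generous assumption, the sketch below shows a wide gap between the average number of attempts and the number needed to be reasonably sure of getting one usable clip.

```python
import math

COST_PER_ATTEMPT_USD = 5   # flat figure from the hat scenario
MINUTES_PER_ATTEMPT = 4    # flat figure from the hat scenario
P_USABLE = 0.05            # ASSUMPTION: hypothetical chance that one attempt is usable


def expected_attempts(p: float) -> float:
    """Mean of a geometric distribution: average attempts until the first success."""
    return 1 / p


def attempts_for_confidence(p: float, confidence: float) -> int:
    """Smallest n such that P(at least one success in n attempts) >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


average = expected_attempts(P_USABLE)
safe = attempts_for_confidence(P_USABLE, 0.95)

print(f"average: {average:.0f} attempts, "
      f"${average * COST_PER_ATTEMPT_USD:.0f}, {average * MINUTES_PER_ATTEMPT:.0f} minutes")
print(f"95% sure: {safe} attempts, "
      f"${safe * COST_PER_ATTEMPT_USD}, {safe * MINUTES_PER_ATTEMPT} minutes")
```

Changing `P_USABLE` moves both figures a lot, and in practice nobody knows what that value is for a given shot, which is the budgeting problem in a nutshell.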
Yet these problems compound aggressively once you need any kind of visual consistency.
The man now has to put the hat on and leave the house.
How does the house look?
Is the hat the same?
Does he have wallpaper on his walls?
Is there anyone else in the house?
What kind of table?
Two chairs?
One chair?
Five chairs?
How do you possibly keep all of these things consistent?
You don't.
You can't.
That's part of what makes Sora so goddamn awful.
It's built specifically to make you scared of it.
To create superficially impressive clips so that brain-dead Hollywood executives can claim it's the future, yet in a practical sense it's impossible to budget or plan or guarantee anything about what Sora might do.
And this is pretty much across the board for these generative models for making video and audio.
Now I've heard from a few people that Sora is cheaper because it doesn't involve labor, which is something you could say only if you believed Sora would give consistent outputs.
And really, the only thing that a probabilistic model like Sora can do is guarantee inconsistency.
Even by Hollywood accounting standards, a generative tool that will cost hundreds or thousands of dollars to generate 10 seconds of shitty footage that is impossible to coherently connect to more footage is a really terrible idea and also very inconsistent in its costs too.
And like I said earlier, there's the issue of time.
Every single entertainment product requires some sort of time budgeting and it's impossible to say how long it will take Sora to generate something.
OpenAI doesn't even specify what several minutes means, meaning you can't really plan a production using it.
Sora isn't cheaper, Sora isn't easier, and Sora certainly isn't more efficient.
But you need to remember also that generative video models have been around for over a year, and they're not really seeing mass use.
Now, if this thing were capable of making anything truly useful, you'd see it everywhere right now, but you are seeing a little bit of it, and I do want to address that.
You probably saw Kalshi's ad and heard that it cost $2,000 to make and took only a few days, but I really encourage you to look at the actual commercial itself.
It's completely incoherent nonsense, each shot completely disconnected with weird glitches and animations in the crowds.
At one point towards the end, a woman is meant to say OKC, but the C part does not map to her mouth.
It looks really bad, and the only way you could get away with something like this is having these quick hit shots.
And also, please go and view the comments about this. People just rip the fuck out of this thing.
But nevertheless, it was made using Veo 3, Google's generative video model, and it apparently took 300 to 400 clips to get 15 usable shots stitched together using traditional editing tools.
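Taking those reported figures at face value, a quick sketch of the hit rate they imply looks like this. The function names are illustrative, and the inputs are only the numbers quoted above.

```python
USABLE_SHOTS = 15
CLIPS_GENERATED = (300, 400)  # the reported range: "300 to 400 clips"


def hit_rate(usable: int, generated: int) -> float:
    """Fraction of generated clips that ended up in the final cut."""
    return usable / generated


def clips_per_usable_shot(usable: int, generated: int) -> float:
    """Average number of clips generated for each shot that was kept."""
    return generated / usable


for clips in CLIPS_GENERATED:
    rate = hit_rate(USABLE_SHOTS, clips)
    per_shot = clips_per_usable_shot(USABLE_SHOTS, clips)
    print(f"{clips} clips: {rate:.2%} usable, about {per_shot:.1f} clips per kept shot")
```

In other words, roughly 19 out of every 20 clips were thrown away, and that is for an ad built from quick, disconnected shots.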
Now, the reason this cost two grand is that it sucked, and the reason you're not seeing more advertisers do this is because it's impossible to make a coherent video out of this footage.
I realize most commercials you see on TV may feel chaotic or kind of bland, but they're remarkably precise.
And the generative shots used for the Kalshi commercial are chaotic and fail to convey any real meaning beyond a person yelling Indiana or OKC.
The only reason it cost so little was that one guy put several days of prompting into it, the end result was shitty, and Kalshi didn't mind because this was a publicity move.
Kalshi put out the commercial specifically so the media would write it up, and they succeeded because the media loves to feed on scary stories like AI is going to replace human actors.
Since the Kalshi ad, PJ Ace, who made it, has made a few others.
A Popeyes rap one where, again, go and look at the comments.
I'm not linking to it by the way.
I don't want to send them any fucking traffic.
But the Popeyes one, people are just responding saying, this looks like shit.
What is this?
It's incoherent.
It's inconsistent.
But the funniest one I found was David Beckham's IM8 health supplement ad, which ends with a shot of the bottle of the product with a bunch of garbled generative text.
It does not appear that PJ Ace has got a ton more work than this, probably because the outputs kind of suck and brands really do not like inconsistent things.
And also, a fucking health supplement from David Beckham.
Jesus Christ.
Just say it's a private equity firm.
Anyway, to conclude, I also want to be clear that the rates for these videos are heavily subsidized by big tech, just like every other generative AI product.
While Sora might cost 30 or 50 cents a second right now, once the AI bubble bursts, these prices will either skyrocket or these models will cease to exist for public consumption.
The biggest clue I can give you is that Google only allows you to generate four or five Veo 3 videos a day on their $250 a month Gemini Ultra plan.
That suggests that Google's video costs are brutal and that OpenAI is burning money by the bucketful to let you fuck around on the Sora app.
I don't recommend you do that, but if you have, just know you're burning a hole in Clammy Sammy's pocket.
I will add that you may worry about these models getting better.
While they might get more nuanced in their ability to generate video in 5 or 10 second bursts, generating longer or consistent videos is inherently impossible due to the probabilistic nature of transformer-based models.
In simple terms, these things are rolling the dice every time.
The way you prompt them is what makes them generate, and they don't have minds or thoughts.
They're just rolling the dice every time on whatever you say and trying to interpret what you mean.
Human beings, by the way, are extremely magical.
I think you really underestimate how amazing people are.
When we direct someone on a film set, even like an assistant director, that person keeps the production moving and makes sure everyone gets what they need and pushes back on a director when something might be impractical.
A director is a visionary, but also an actor is someone that takes interpretation and then is directed to do different things.
But that direction is not a fucking prompt.
Move your elbow, look this way, look that way.
The things that operate on a film or TV set are inherently different to just plugging words into a fucking model.
And I get it.
I get everyone in Hollywood who's scared right now.
I get everyone in creatives, in creative arts even, who is scared right now.
I feel for you.
These people are losing.
These people are losing.
This stuff does not work.
It's inconsistent.
It's incredibly expensive on subsidized rates.
And in the end, I really, really believe that once the bubble pops, these things are going away.
Thank you so much for listening.
Reach out if you have any thoughts.
I always love to hear from people.
ez@betteroffline.com.
I love getting your emails.
I love getting your weird little missives on Reddit.
I really am.
I'm truly blessed.
And I love you all.
I love how many of you listen.
I love how communicative you are.
It's been a big week with the Anthropic exclusive and yeah, I'm gonna have a Radio Better Offline next week as well.
Crap, I gotta do an episode.
Shit.
Damn.
Oh well, I have the best job in the world anyway.
Thank you for listening.