90. Closed Captions
Transcript
Katie Ryan's home office in Pittsburgh, Pennsylvania, is pretty run-of-the-mill.
I just have a regular IKEA desk.
I have a big TV up on the wall.
I have a laptop stand with my laptop on it.
And then I have a monitor stand that has two monitors on it.
There's a blanket on the floor for my dog, you know.
But the work she does at this desk is seen by millions of people every week.
I've done the Super Bowl a handful of times.
I've done the Olympics many times.
I just did the Oscars a couple weeks ago.
Any sporting event that you can think of, I've probably done it.
Any major news event that has happened, I have probably been involved in that somehow.
Presidential funerals, presidential debates.
I remember when the Boston Marathon bombing happened, the breaking news was just constant.
I think I was on the air writing without a commercial break for something like three and a half hours.
Ryan is a captioner.
She writes the text transcripts that appear on your TV screen when you turn on closed captioning.
She does this in real time.
Most people think their TV just does it.
They don't realize that there's a person like me sitting in a room with headphones on.
And people don't realize that it's happening live.
Like if I'm writing a news broadcast or a sporting event, maybe I have like five seconds extra than you do when you're hearing it.
And I have to write it at the same time and try and keep up with all the speedy talkers that are out there.
In some ways, it's a good business to be in.
One survey found that 50% of Americans and 70% of Gen Z viewers say they watch content with captions on most of the time.
But the industry is also rapidly changing.
The nimble fingers of human captioners like Katie Ryan are up against the neural networks of artificial intelligence services.
Technology is the key to the future of captioning.
But, you know, you need people that are looking at the content.
For the Freakonomics Radio Network, this is the Economics of Everyday Things.
I'm Zachary Crockett.
Today, closed captions.
The term captions is often used interchangeably with subtitles, but the two are different.
Subtitles are used for translation.
Captions are designed for people with hearing impairments, and they describe every auditory element: dialogue, sound effects, music, and sometimes even background noises.
The goal of captioning is to give the user the content of exactly what's being heard.
That's Doug Karlovitz.
He's a general manager at Verbit, the largest provider of captions in America.
He says that if you're watching something on TV, either live or pre-recorded, you can almost always turn on the captions in the device's settings.
But that wasn't always an option.
Really, captions were born for television in 1970.
The first pre-recorded show ever captioned was The French Chef with Julia Child.
The earliest efforts were called open captions, and they were limited to pre-recorded shows.
The text was a permanent part of the video.
Eventually, a new method called closed captions made it possible for viewers to turn the text on and off.
And by the 1980s, thanks to the efforts of the nonprofit National Captioning Institute, captions could also be used for live television.
Around this time, Karlovitz's father Joe saw an opportunity to expand the captioning industry.
My father was a court reporter, a stenographer, and he became very interested in computers and how to take his stenotype and get it translated through a computer into English.
Stenographers are extremely fast typists.
On stenotype machines, they can transcribe up to 300 words per minute.
Joe began training fellow stenographers to do TV captioning.
And in 1986, he founded a company called Vitac, which was later acquired by another company called Verbit.
We started out with a local television station in Pittsburgh and eventually grew into the largest provider in North America of captioning.
Today, broadcasters, cable companies, and satellite services are required by federal laws to have captions available for nearly every televised program.
This also carries over to much of the media on streaming services online and most video content in public settings like courtrooms, hospitals, schools, and sports bars.
Captions have to be readable, accurate, and inclusive of all audio context.
They have to clearly identify each speaker, and for live broadcasts like news programs, they appear almost in real time.
In the United States, everything that airs on television should have captions today.
Almost every show has captions on it.
Vitac is one of three companies, alongside IBM and Zoo Digital Group, that control around 60% of the captioning market.
Karlovitz says they caption around 500,000 hours of content a year.
We work with all the major broadcasters, all the various producers of television programs.
Work with all the different universities around the world, providing captions for the classroom.
On the legal side, we're working with law firms and court reporting agencies.
And on the government side, we'll do anything from town halls to training on all the different things.
We also work with sports venues, theaters, so everywhere where words are spoken, there's the opportunity to add captions.
Much of today's captioning has shifted from human stenographers to automated tools.
In some cases, the captioning service uses a technique called re-speaking.
A human employee watches a show in a recording booth and carefully recites every word into a special microphone.
Voice-to-text software turns the narration into a written transcript.
In other cases, particularly with pre-recorded TV shows, technology can be used to generate text from a script.
But for live TV, like news broadcasts, Super Bowls, and presidential debates, a human captioner clacking away at a machine is still the most reliable option.
A stenographer gets a live feed of a network's audio a few seconds before it goes to the general public.
They listen through a pair of headphones while typing out the words in shorthand on their stenotype machine.
This shorthand goes through processing software on a computer that turns it into text.
The text is embedded in a video signal that's transmitted to the television network through modems and IP connections.
And when you press the closed captions button on your remote, a microchip inside your TV retrieves and displays the captions on screen.
It's a complex process, and networks might pay Verbit anywhere from $130 to $175 per hour for live human captioning services.
So if you have a broadcast show that's in a 30-minute block, but it may be really only on the air for 24 minutes, they would pay for that on a per-minute basis.
If you're doing a live show, you're paying basically for the times that are booked because you don't know how long those live shows can go.
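The arithmetic behind the two billing models is simple. This Python sketch, built around a hypothetical `caption_cost` helper, applies the quoted $130 to $175 hourly range to both cases; using that live-captioning range for the per-minute example, and the three-hour booking, are assumptions for illustration only, not Verbit's actual billing logic.

```python
# Illustrative billing arithmetic (hypothetical helper, not a real rate card).
# The $130-$175/hour range is the figure quoted in the episode.

def caption_cost(rate_per_hour: float, minutes: float) -> float:
    """Cost of captioning `minutes` of programming at an hourly rate."""
    return rate_per_hour * minutes / 60

RATE_LOW, RATE_HIGH = 130.0, 175.0

# Per-minute billing: a 30-minute slot with only 24 minutes of actual program.
aired_minutes = 24
print(f"24 aired minutes: ${caption_cost(RATE_LOW, aired_minutes):.2f}"
      f" to ${caption_cost(RATE_HIGH, aired_minutes):.2f}")
# prints: 24 aired minutes: $52.00 to $70.00

# Booked-time billing for live shows: the whole booked window is paid for,
# however long the broadcast actually runs (hypothetical 3-hour booking).
booked_minutes = 180
print(f"3 booked hours: ${caption_cost(RATE_LOW, booked_minutes):.2f}"
      f" to ${caption_cost(RATE_HIGH, booked_minutes):.2f}")
# prints: 3 booked hours: $390.00 to $525.00
```

Booking by time shifts the risk of an overrun, like a game going to overtime, from the captioning company to the network.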
So who are these humans who create the captions on TV?
And what's it like to be on the clock during a live broadcast?
Sometimes you can't even get a drink of water.
That's coming up.
The Economics of Everyday Things is sponsored by SurveyMonkey.
Look, we get it.
You can hardly go anywhere or do anything these days without hearing about AI this or AI that.
And if you're like most people, when it comes to AI, you're impressed, but have a few concerns.
But what if AI was used not as a tool to replace people, but as a way to help understand people better?
AI from SurveyMonkey is designed to do just that.
From crafting the perfect survey, which is harder than you might think, to analysis that digs deep, finds patterns and surfaces trends quickly, SurveyMonkey's powerful suite of AI capabilities makes it faster and easier than ever before to get insights from real people, helping you make confident decisions for your business.
Try it today at surveymonkey.com slash economics.
The Economics of Everyday Things is sponsored by Rula.
Finding a therapist online is hard enough, but finding one who actually takes your insurance, that's even harder.
Rula is a healthcare company that makes accessing affordable, high-quality mental health care easier.
They partner with over 100 insurance plans and match users with licensed in-network therapists and psychiatrists nationwide based on their goals, preferences, and background.
No long wait lists, no frustrating back and forth, just personalized care that fits.
Plus, Rula sticks with you throughout your journey, checking in to make sure your care is helping you move forward.
Appointments are often available as soon as tomorrow.
And with most patients paying just $15 per session, and sometimes even less, it's care that actually fits your budget too.
Thousands are already using Rula for therapy that's high quality, accessible, and covered by insurance.
Visit rula.com slash everyday to get started.
After signing up, you'll be asked how you heard about them.
Let them know this show sent you.
That's rula.com slash everyday.
Because mental health care should work with you, not against your budget.
The Economics of Everyday Things is sponsored by Acorns.
Did you know that your money could grow on its own?
No, it's not magic.
It's compounding.
That's when your money makes more money, and then that money makes even more money.
Acorns makes it easy to give your money a chance to grow.
Acorns is the financial wellness app that helps you invest for your future, save for tomorrow, and spend smarter today.
Acorns makes it easy to start doing more with your money.
You don't need to be a finance whiz.
Acorns puts your money into an expert-built portfolio to make sure you're investing wisely, not wildly.
And it's an all-in-one, easy-to-use app.
Sign up now and Acorns will boost your new account with a $5 bonus investment.
Join the over 14 million all-time customers who have already saved and invested over $25 billion with Acorns.
Head to acorns.com/economics or download the Acorns app to get started.
Paid non-client endorsement.
Compensation provides incentive to positively promote Acorns.
Tier 2 compensation provided.
Investing involves risk.
Acorns Advisors LLC, an SEC registered investment advisor.
View important disclosures at acorns.com/economics.
Katie Ryan didn't start out hoping to be a professional captioner.
When I was graduating high school, I really didn't know what I wanted to do with my life.
And my great aunt Sandy, her sister at the time, was an official court reporter in Philadelphia.
And Sandy said, well, you can type fast on a keyboard.
Why don't you look into stenography?
Ryan completed a court reporting program at a community college in Pittsburgh and joined Vitac, now Verbit, after graduating.
She's been at the company as a captioner for more than two decades.
In her work, Ryan uses a machine called a stenotype.
It has a small screen and around 20 unmarked keys that look kind of like popsicle sticks.
She's able to type at speeds of up to 300 strokes per minute using a technique called chording.
She presses down on multiple keys simultaneously to phonetically spell out whole syllables, words, and phrases with one motion.
Stenography is essentially learning another language.
It's combinations of keys to make words.
And so on the machine, each key has a letter, and then there are combinations of keys that make more letters.
P-B would be N.
The letter I would be E-U.
The letter D would be T-K.
And then there are combinations of keys that make words.
So "and" would be A-P-B-D.
Your hands are on different sides of the keyboard on the machine.
Your left hand is prefixes, your right hand is suffixes.
And then you have your endings, I-N-G, S, and E-D, on your right side.
Ryan can spell out entire phrases with just a few keystrokes.
A good example would be like, ladies and gentlemen.
That would be good for TV or court.
On my machine, it would be L-A-I-R-J.
So you hit all of those keys at once and ladies and gentlemen will come out in your computer software.
In one fell swoop.
In one stroke, you get all of those words.
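To make the chording idea concrete, here is a toy sketch in Python. It treats a stroke as an unordered set of keys held down together and looks it up in a dictionary built from the examples Ryan gives above. The names `chord`, `translate`, and `STENO_DICTIONARY` are hypothetical, and the model simplifies real stenotype layouts, where the same letter can appear on both the left and right sides of the keyboard.

```python
# A toy model of chording: a stroke is the *set* of keys pressed at once,
# so the order in which fingers land doesn't matter.
# Simplification (assumption): real stenotype keys are positional, e.g. a
# left-hand T differs from a right-hand T; this sketch ignores that.

def chord(keys: str) -> frozenset[str]:
    """Turn a stroke written like 'A-P-B-D' into an order-free key set."""
    return frozenset(keys.split("-"))

# Entries taken from the examples in the transcript.
STENO_DICTIONARY = {
    chord("P-B"): "n",
    chord("E-U"): "i",
    chord("T-K"): "d",
    chord("A-P-B-D"): "and",
    chord("L-A-I-R-J"): "ladies and gentlemen",
}

def translate(stroke: str) -> str:
    """Look up one stroke; show unknown strokes raw, in brackets."""
    return STENO_DICTIONARY.get(chord(stroke), f"[{stroke}]")

# Keys listed in a different order still form the same chord.
print(translate("D-B-P-A"))    # prints: and
print(translate("J-R-I-A-L"))  # prints: ladies and gentlemen
```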
Before she goes live, Ryan creates a dictionary full of customized briefs: abbreviations of specific words that she knows will recur throughout the broadcast.
For the Academy Awards, she'll program combinations of keystrokes for the title of each nominated movie.
For a hockey game, she'll program every player's name.
Instead of having to write out their name every single time that it's said, you hit that one combination of keys one time or twice, and then that whole name will come out.
Obviously, we have to search ahead of time to find out who like your play-by-play announcer is and who your color analyst is.
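One way to picture the brief dictionary is as a layer of event-specific entries sitting on top of a captioner's everyday dictionary. The Python sketch below assumes that layering model and uses the standard library's `collections.ChainMap` to express it. The stroke spellings, the names, and the `write` helper are invented placeholders, not real steno briefs or the way any particular captioning software stores them.

```python
from collections import ChainMap

# Everyday entries (stroke spellings written as plain strings for brevity).
BASE_DICTIONARY = {
    "A-P-B-D": "and",
    "L-A-I-R-J": "ladies and gentlemen",
}

# Briefs prepared before a hypothetical hockey broadcast.
# Strokes and names are placeholders, not real players or announcers.
GAME_BRIEFS = {
    "P-L-1": "Alex Example",        # hypothetical player
    "P-B-P": "Jordan Placeholder",  # hypothetical play-by-play announcer
}

# Lookups check the event briefs first, then fall back to the base dictionary.
active = ChainMap(GAME_BRIEFS, BASE_DICTIONARY)

def write(strokes: list[str]) -> str:
    """Translate a sequence of strokes; unknown strokes appear in brackets."""
    return " ".join(active.get(s, f"[{s}]") for s in strokes)

print(write(["P-L-1", "A-P-B-D", "P-B-P"]))
# prints: Alex Example and Jordan Placeholder
```

Swapping the first map for a different set of briefs (movie titles for the Oscars, say) changes what the same strokes produce without touching the base dictionary.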
But the process doesn't usually go without a hitch or two.
Captioners are human, after all, and they make the occasional mistake.
While there's no federally mandated benchmark, the standard for accuracy in the industry is 99%, meaning one out of every 100 words might be misspelled or altogether butchered.
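The episode doesn't say how the 99% figure is calculated. One common way to score a transcript is word error rate: count the word substitutions, deletions, and insertions needed to turn the caption into what was actually said, then divide by the number of spoken words. The Python sketch below assumes that metric; `word_errors` and `accuracy` are hypothetical names, not an industry tool.

```python
# Word-level edit distance (Levenshtein over words). Assumption: accuracy is
# taken as 1 - (errors / spoken words); the source doesn't specify a metric.

def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions between word lists."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the ref prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: spoken word missing
                curr[j - 1] + 1,         # insertion: extra word captioned
                prev[j - 1] + (r != h),  # substitution, or free if words match
            )
        prev = curr
    return prev[-1]

def accuracy(reference: str, hypothesis: str) -> float:
    """Share of spoken words captioned correctly under the metric above."""
    return 1 - word_errors(reference, hypothesis) / len(reference.split())

# 100 spoken words, one of them misspelled in the captions.
spoken = " ".join(["word"] * 99 + ["gentlemen"])
captioned = " ".join(["word"] * 99 + ["gentleman"])
print(round(accuracy(spoken, captioned), 2))  # prints: 0.99
```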
Oftentimes, a captioner is aware of a typo.
They just don't have the time to fix it during a high-speed live broadcast.
We have the asterisk on my machine, which is the key in the very middle.
That can erase a mistake, but nine times out of 10, you are not going to catch it fast enough before it already goes out on the air.
And then if you try and take it back, it's just going to garble the captions up.
So it's better to just, if you make a mistake, just ignore it and keep writing and move past it.
And then the faster it moves off the screen, the faster people will forget about it.
Even after 21 years on the job, Ryan has a few recurring issues.
I tend to drag my fingers, so sometimes I will catch extra letters when I'm trying to write certain words, or I'll miss keys too.
Like if my fingernails are too long, sometimes I can't quite hit the keys right.
Sometimes you might notice the captions pause for a few moments or go blank.
This is likely because the captioner fell off pace and is trying to catch up.
This happens most often with news shows where the banter can be lightning fast.
Rachel Maddow, who hosts her own live show on MSNBC, has been clocked talking at up to 270 words per minute, a challenge for even the most seasoned captioner.
If you need to just let a sentence go and then catch up again, that's okay.
When you start paraphrasing though, then you take the risk of presenting the wrong information or turning it into something that they didn't actually say.
And that's the last thing you want to do.
You don't want to put words in anybody's mouth.
The goal is to provide a text equivalent of as much of the audio as possible.
This can be particularly challenging when multiple people are speaking at once.
A lot of times it'll just be, you know, a couple of words and a dash, and then the next person will be a couple of words and a dash.
Sometimes there's nothing you can do.
If they're just screaming at each other, there is nothing you can do.
You know, once they figure it out, then you can keep going again.
Doug Karlovitz, the general manager at Verbit, says certain TV shows pose more problems than others.
One example is The Osbournes, a reality show from the early 2000s that followed the aging and often incomprehensible rock star Ozzy Osbourne and his family.
The debates around the office on what we thought he was saying on that show was, you know, good water cooler conversations.
Well, first was, is he just putting this on?
Eventually, as that show got renewed, you realize, no, that's how Ozzy talks.
It was really like, I think he said this.
And then, you know, people would go and come over, listen to this.
What do you think he said?
And, you know, you would just sit there and I don't know.
I don't know what he said.
I don't think he knows what he was saying.
There are also elements that require interpretation, like how to caption a noise or a nonverbal vocalization.
Some networks and studios are particular.
Disney reportedly has specific rules about how R2-D2's mechanical noises should be captioned.
Netflix is fond of using the phrase "wet squelching" to describe the sound of monsters in the show Stranger Things.
For background noises in live captioning, Ryan uses a list of templatized descriptions.
We call them parentheticals, so like bells tolling or applause, singing, chanting, things like that.
You want to try and be descriptive, but also you don't want to go overboard.
All of this effort is to ensure that people who are deaf or hard of hearing have equal access to media.
But captions have found a much broader audience.
A 2022 survey by the language learning platform Preply found that half of all viewers now watch media with captions on most of the time.
Some have speculated that's at least partly to do with modern sound mixing, which alternates between loud sound effects and quiet dialogue.
Game of Thrones, there was so much background noise occurring on that show that a lot of the people started using captions.
But the most frequent users of captions are now younger people, particularly Gen Z.
And that has more to do with changes in the media landscape.
The younger viewers, they're watching it on their phones.
They're watching it on their iPads.
They're not necessarily listening, but they're reading it as they're in class or they're at work and don't want to call attention to themselves.
Some publishers have estimated that up to 85% of the videos they post on Facebook are watched on mute.
Many short form videos on social media sites now have captions coded directly into the media file that can't be turned on or off.
That's because it's keeping that person who's looking, it's keeping their attention longer.
Some platforms, like YouTube, offer their own tools to creators that use speech recognition to generate captions automatically.
Karlovitz says artificial intelligence has already fundamentally changed the captioning business.
Verbit offers automatic speech recognition and generative AI tools that are trained with diverse language models to pick up on speech patterns.
Karlovitz says these options cost much less than traditional transcription, but they still aren't as accurate or precise as a human captioner.
And at least for now, many clients still prefer their captions to be generated by a human being like Katie Ryan.
Maybe a deaf person is in an area that there's tornadoes and they turn on their local news.
We want those people to be able to have captioning that is as accurate and as clean as possible so they know what to do and they can be safe.
I will always advocate for a human captioner to be there to give the best service possible.
When you watch TV, do you always use the captions?
No, never have captions on in my house.
Really?
Never, no.
I sit in front of a computer and deal with that all day.
I don't need to worry about it.
I'm off the clock.
For The Economics of Everyday Things, I'm Zachary Crockett.
This episode was produced by me and Sarah Lilley and mixed by Jeremy Johnston.
We had help from Daniel Moritz-Rabson.
And thanks to our listeners, Owen Roberts and David Kennett for suggesting this topic.
If you have an idea for an episode, feel free to email us at everydaythings@freakonomics.com.
Our inbox is always open.
All right, until next week.
What if you're in the middle of like a live broadcast and you just really have to pee?
Now, from my office to my bathroom, it's like 10 steps, so I can make it.
The Freakonomics Radio Network, the hidden side of everything.
Stitcher.