The Chaos of AI Voice Cloning
Staff writer Charlie Warzel has followed the explosion of AI technology with a mix of fascination and fear. DALL-E, Midjourney, ChatGPT. New leaps in AI tech seem to happen every month now. Recently, he narrowed in on AI voice cloning for a feature for The Atlantic.
He and host Hanna Rosin cloned their voices and tested it out before a live audience at the Cascade PBS Ideas Festival. What are the promises of the technology? And what are the perils?
Related Atlantic Podcast: How to Know What's Real
Subscribe here: Apple Podcasts | Spotify | YouTube | Pocket Casts
Get more from your favorite Atlantic voices when you subscribe. You’ll enjoy unlimited access to Pulitzer-winning journalism, from clear-eyed analysis and insight on breaking news to fascinating explorations of our world. Subscribe today at TheAtlantic.com/podsub.
Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
So, a few weeks ago, my colleague, staff writer Charlie Warzel,
introduced me to something that's either amazing or sinister.
Probably both.
Charlie's been on this show before.
He writes about technology.
And most recently, he wrote about AI voice software.
And I have to say, it is uncannily good.
I signed up for it, uploaded my voice, and man, does it sound like me.
So, of course, what immediately occurred to me was all the different flavors of chaos this could cause in our future.
I'm Hanna Rosin.
This is Radio Atlantic.
And this past weekend, I was in Seattle, Washington for the Cascade PBS Ideas Festival.
It's a gathering of journalists and creators.
And we discussed all kinds of topics from homelessness to the Supreme Court to the nation's obsession with true crime.
Charlie and I talked about this new voice software, and we tried to see if the AI voices would fool the audience.
So for this week's episode, we bring you a live taping with me and Charlie.
Here's our conversation.
So today we're going to talk about AI.
We're all aware that there's this thing barreling towards us called AI that's going to lead to huge changes in our world.
You've probably heard something, seen something about deep fakes.
And then the next big word I want to put in the room is election interference.
Today we're going to connect the dots between those three big ideas and bring them a little closer to us.
Because
there are two important truths that you need to know about this coming year.
One is that it is extremely easy, by which I mean
$10 a month easy, to clone your own voice and possibly anybody's voice well enough to fool your mother.
Now, why do I know this?
Because I cloned my voice and I fooled my mother.
And I also fooled my partner and I fooled my son.
You can clone your voice so well now that it really, really, really sounds a lot like you or the other person.
And the second fact that it's important to know about this year is that about half the world's population is about to undergo an election.
So those two facts together can lead to some chaos.
And that's something Charlie's been following for a while.
Now we've already had our first taste of AI voice election chaos.
That came in the Democratic primary.
Charlie, tell us what happened there.
A bunch of New Hampshire voters, I think it was about 5,000 people, got a phone call and it would say robocall when you pick it up, which is standard if you live in a state doing a primary.
And the voice on the other end of the line was this kind of grainy but real sounding voice of Joe Biden urging people not to go out and vote in the primary that was coming up on Tuesday.
Let's, before we keep talking about it, listen to the Robocall, okay?
We're going to play it.
Republicans have been trying to push nonpartisan and Democratic voters to participate in their primary.
What a bunch of malarkey.
We know the value of voting Democratic when our votes count.
It's important that you save your vote for the November election.
We'll need your help in electing Democrats up and down the ticket.
Voting this Tuesday only enables the Republicans in their quest to elect Donald Trump again.
Your vote makes a difference in November, not this Tuesday.
I'm feeling that some of you are dubious, like that doesn't sound like Joe Biden.
Clap if you think it does not sound like Joe Biden.
Huh, well, okay, somewhere in there.
So when you heard that call, did you think, uh-oh, here it comes?
Like, what was the lesson you took from that call?
Or did you think, oh, this got solved in a second, and so we don't have to worry about it?
When I saw this, I was actually reporting out a feature for The Atlantic about the company ElevenLabs, whose technology was used to make that phone call.
So it was very resonant for me.
You know, I've been writing about deep fakes and things like that for quite a while, I mean, in internet time, like since 2017.
But there's always been this feeling of,
what is the actual level of concern that I should have here?
Like, what is theoretical?
With technology, and especially with misinformation stuff, we tend to talk and freak out about the theoretical so much that sometimes we're not really grounding it in plausibility.
So with this, I was actually trying to get a sense of, is this something that would actually have any real sway in the primary?
Like, did people believe it, right?
Sort of what you just asked the audience, which is, is this plausible?
And I think when you're sitting here listening to this, you know, with hindsight, trying to evaluate it, that's one thing. But are you really going to question it, like, at that moment in time, if you're getting that call, especially if you aren't paying close attention to technology?
Are you really going to be thinking about that?
This software is,
it's still working out some of the kinks, but I think the believability has crossed this threshold that is
alarming.
So just to give these guys a sense, what can it do now?
Like we heard a robocall.
Could it give a state of the union speech?
Could it talk to your wife?
Could it scam?
Like what are the things that it can do now that it's made this leap that it couldn't do a few months ago?
Convincingly.
Yeah, well, the convincing part is the biggest part of it, but the other part of these models is the ability to basically
ingest more characters and throw it out there, right?
So this company, ElevenLabs, has a tier you can pay for where, if you're an author, you can throw your whole novel in there and it can do it in a matter of minutes, essentially.
And then you can go through and you can tweak it.
It could definitely do a whole state of the union.
I mean, really, essentially,
it's given anyone who's got 20 bucks a month the ability to take anything
that they
want to do content-wise and have it come out in their voice.
So a lot of people that I know who are independent journalists or authors or people like that are doing all of their blog posts, their email newsletters, as podcasts, but also as YouTube videos, because they hook this technology, the voice AI, into one of the video or one of the image generators.
So it generates an image on YouTube every
few paragraphs or keeps people hooked in.
So it's this idea of
I'm no longer a writer, right?
I am a content
human.
I'm a multi-platform human.
Wow.
Okay, that sounds...
You fill in the adjective.
Yeah, it's intense.
It's intense.
It's intense.
Okay, so Charlie went to visit the company that has brought us here.
And it's really interesting to look at them because they did not set out to clone Joe Biden's voice.
They did not set out, obviously, nobody sets out to run fake robocalls.
So getting behind that fortress and learning, like, who are these people?
What do they want, was an interesting adventure.
So it's called ElevenLabs.
And by the way, The Atlantic, I will say, uses ElevenLabs to read out some of our magazine articles.
So just so you know that, a disclaimer.
I was really surprised to learn that it was a small company.
Like, I would expect that it was Google who crossed this threshold, but not this small company in London.
How did that happen?
So,
one of the most interesting things I learned when I was there, I was interested in them
because they were small and because they had produced this tech that is, I think, better than everyone else.
There are a few companies.
Meta has one that they have not released to the public, and OpenAI also has one that they have released only to certain select users, partly because they aren't quite sure how to keep it from being abused, necessarily.
But that aside, ElevenLabs is quite good.
They are quite small.
What I learned when I was there talking to them is they talked about their engineering team.
Their engineering team is seven people.
Seven?
Yeah.
So, this is the engineering research team, I guess I should say.
It's this small little team, and
they describe them almost as like these brains in a tank that would just like,
they would say, hey, you know, what we really want to do is we want to create a dubbing part of our technology, right?
Where you can feed it video of a movie in, you know, Chinese, right? And it will just, sort of almost in real time, running it through the technology, dub it out in English, or you name the language.
Is that because dubbing is historically tragic, and so they want to make dubbing better?
It's quite bad, it's quite flat in a lot of places.
Obviously, if you live in a couple of the big markets, you can get some good voice acting in the dubbing.
But like in Poland, where these guys are from, it is all dubbed in a completely flat voice.
They're called lectors.
That's the name for it.
But like when The Real Housewives was dubbed into Polish, it was one male voice that just spoke like this for all the Real Housewives.
So that's like a good example of like,
this isn't good.
And so people like, you know, watching U.S.
cinema or TV in Poland is like kind of like a grinding, terrible experience.
So they wanted to change things like that.
For some reason, I'm stuck on this, and I'm imagining RuPaul being dubbed in a completely flat, accentless voice, like, "Sashay away," you know,
anyway.
Totally.
So this was actually like one of the problems that they initially were setting out to solve, this company.
And they kind of not lucked into but found all the rest of the voice cloning stuff in that space.
But anyway, they talk about this research team as these brains in the tank.
And they'll just be like, well, now the model does this.
Now the model laughs like a human being.
Like last week, it didn't.
And again, when you try to talk to them about what they did, it's not like pushing a button, right?
Then they're like, it's too complicated to really describe.
But they'll just say that it's this small group of people. Essentially, the reason why the technology is good, or does things that other people can't do, is because they had an idea, an academic idea, that they put into the model, had the numbers crunched, and this came out.
And that to me was kind of staggering because what it showed me was that
with artificial intelligence, it's unlike something like social networking, right? Where you've just got to get a giant mass of people connected, right? It's network effects.
But with this stuff, it really is like quantum leap style computer science.
And obviously money is good.
Obviously compute is good.
But a very small group of people can throw something out into the world that is incredibly powerful.
And I think that that is a real revelation that I had from that.
We're going to take a short break.
and when we come back, Charlie explains what the founders of ElevenLabs hope their technology will accomplish.
So these guys, like a lot of founders, they did not set out to disrupt the election.
They probably have a dream besides just better dubbing.
What is their dream?
Like when they're sitting around and you got to enter their kind of brain space, what is the future, the magical future of many languages that they envision?
The full dream is
basically breaking down the walls of translation completely, right?
So
there's this
famous science fiction book, The Hitchhiker's Guide to the Galaxy, where there's this thing called the Babel fish that can translate any language seamlessly in real time.
So anyone can understand everyone.
That's what they ultimately want to make.
They want to have this
the dubbing has a little bit of latency now, but it's getting faster.
That plus all the different voices.
And what they essentially want to do is create a tool, down the line, where, you know, you can put an AirPod in your ear and you can go anywhere, and everyone else has an AirPod in their ear, and you're talking, and you can hear everything immediately in whatever language.
That's the end goal.
So the beautiful dream, if you just take the purest version of it, is all peoples of the world will be able to communicate with each other.
Yeah. When I started talking to them, because, you know, living in America, I have a different experience. Most of them are European, or the two founders are European. And they said, you know, you grow up and you have to learn English, like, in school, right? And so there's only a few places where you don't grow up being told, you've also got to learn English, because, you know, if you want to go to university, or do whatever and participate in the world, you need it. And they said, if we do this, then you don't have to do that anymore. Imagine the time you would save, you know, not having to learn this other language.
So they're thinking about Babel and this beautiful dream, and we're thinking, like, oh my God, like, who's going to scam my grandmother?
And, like, who's going to mess up my election?
Do they
think about that?
Did you talk to them about that?
Like, how aware are they of the potential chaos coming down?
They're very aware.
I mean, in my career I've dealt with a lot of tech executives who are sort of, you know, not willing to really entertain the question, right? Or if they do, it's kind of glib, or, you know, there's a little bit of resentment, you can tell.
They were very, and I think because of their age,
the CEO is 29,
very like earnest about it.
Like they care a lot.
They obviously look at all this and see, they're not blinded by the opportunity, but the opportunity looms so large that
these negative externalities are just problems they will solve right or that they can solve.
And so we had this conversation where I would, you know, I called it like the bad things, right?
And I just kept like, what are you going to do about jobs this takes away?
What are you going to do about
all this misinformation stuff?
What are you going to do about scams?
And they have these ideas like digitally watermarking all voices, right?
And working with all sorts of different companies to build a watermarking coalition.
So when you voice-record something on your phone, that has its own sort of metadata, right, that says, like, this came from Charlie's phone at this time, or, you know, this is real. And when you post an ElevenLabs thing, it says so, and people can quickly decode it, right?
So there's all these ideas.
But, like, I can't tell you, it was like smashing my head against a brick wall for an hour and a half with this really earnest, nice person who's like, yeah, no, no, it's going to take a while before we, you know, societally all get used to all these different tools, not just ElevenLabs.
And I was like, and in the meantime, you know? And there's sort of, like, they would never say it this way, but the vibe is sort of like, well, you've got to, you know, break a lot of eggs to get the universal-translation omelet situation.
That's the same story.
But, like, you know, some of those eggs might be, like, the 2024 election, maybe, right?
And that's a big egg.
Right, right, right.
So it's the familiar story, but more earnest and more self-aware.
Do you guys want to do another test?
Okay, you've been listening to me talk for a while.
Charlie and I both fed our voices into the system.
We're going to play you me saying the same thing twice. One of them is me recorded. I just recorded it. Me, the human being, in the flesh, right here.
And one of them is my AI avatar saying this thing.
There's only two.
I'm saying the same thing. So we're going to vote at the end for which one is fake AI Hanna. Okay, let's play the two Hannas.
Charlie, how far do you think artificial intelligence is from being able to spit out a million warrior robots programmed to destroy humanity?
Charlie, how far do you think artificial intelligence is from being able to spit out a million warrior robots programmed to destroy humanity?
Okay,
who thinks that number one is fake Hanna? Who thinks number two is fake Hanna?
It's pretty even.
It's pretty even.
I would say two is more robust and two is correct.
That's the fake one.
But it is
man, it's close.
Like Charlie spent time at this place and he's gotten both of them wrong so far.
We work together. This is really, really close.
You know, the only like bulwark right now against this stuff is that I do think people are generally
pretty dubious now of most things.
Interesting.
I do think there is just a general suspicion of stuff that happens online.
And I also think that the one thing we have seen from some of these,
there's been a couple of ransom calls.
It's a scam and it's your mom's voice or something like that.
Those things sort of come down the line pretty quickly.
Like you can pretty quickly realize
your mom isn't being kidnapped.
You can pretty quickly, as administrators, like you can get to the bottom of that.
Like,
basically, like, I don't know how effective these things are yet because of the human element.
It seems like we
have a little bit more of a defense now than we did, you know, let's say in 2016.
And I do think that time is our greatest asset here with all of this.
The problem is, you know, it only takes one, right?
It only takes one person, you know, in late October or early November, who puts out something just good enough that it's the last thing someone sees before they go to the polls, right? And it's too hard to debunk, or that person doesn't see the debunking, right? And so those are the things that make you nervous. But also, I don't think yet that we're dealing with, like, godlike ability to just, you know, totally destroy reality.
It's sort of somewhere in the middle, which is still, you know.
I see.
So the danger scenario is a thin margin, very strategic use of this technology.
Like less informed voters, suppress the vote, someplace where you could use it in small strategic ways.
That's a realistic fear.
Yeah, I think like
hyper-targeted in some way.
I mean, it's funny.
I've talked to a couple of, you know, like AI experts and people in the field of this, and they're so worried about it, it's really hard to coax out nightmare scenarios from them.
They're like, no,
I've got mine, and I'm absolutely not telling a journalist.
Like, no way.
I do not want this printed.
I do not want anyone to know about it.
But I do think, and this could be the fact that they're too close to something, or it could be that they're right and they are really close to it.
But there's so much fear from people who work.
with these tools.
I'm not talking about the ElevenLabs people necessarily.
But AI people.
But AI people.
I mean, true believers in the sense of, you know, if it doesn't happen this time around, wait till you see what it's going to be in four years.
I know.
That really worries me.
That the people inside are, some of the people inside are so worried about it.
It's like they've birthed a monster kind of vibe.
But it's also good marketing.
So you can go back and forth on this, right?
Like the whole idea of, you know, we're building the Terminator, we're building Skynet, it could end humanity.
Like there's no better marketing than like we are creating the potential apocalypse, pay attention.
Right, right, right, right.
All right, I'm going to tell you my two fears and you tell me how realistic they are.
One is the absolute perfection of scams designed to target older people who are slightly losing their memories, scams that are already pretty good. Like, they're already pretty good, and you already hear so many stories of people losing a lot of money.
That is one I'm worried about.
Like how easy it is to consistently call someone in the voice of a grandson or in the voice of whatever.
That one seems like a problem.
Yeah,
I think it will be.
And I don't think it has to be relegated to people who are
so old they're losing their memories.
It's difficult to discern this stuff.
And I think
what I have learned from a lot of time reporting on the internet is that nobody is immune to a scam.
Yes.
Like there is a scam out there waiting.
It's like, you know, like,
yes, exactly.
There's a scam waiting to match with you.
And, you know, when you find your counterpoint, it's
like true love.
Like, out there is a perfect scam for you.
Okay, one more worry, and then we're going to do our last test.
My real worry is that people will know that things are fake, but it won't matter because people are so attached to whatever narrative they have that it won't matter to them if you prove something is real or fake.
Like, you can imagine that Trump would put out a thing that was fake, and everybody would kind of know it's fake, but everyone would collude and decide that it was real and proceed based on that.
Like real and fake, just it's not a line people worry about anymore, so it doesn't matter.
I fully think we live in that world right now.
Yeah.
I mean honestly.
And I think a good example
is a lot of the stuff,
not only the stuff that you see coming out of the Middle East in the way that, I mean, obviously there's so much like
literal digital propaganda and misinformation coming from different places, but also just from the normal stuff that we see.
And this is a little less AI involved, but I think there's just a lot of people, especially younger people, who
just don't trust the establishment media to do the thing.
And they're like, oh, I'm going to watch this, and I don't really care.
And so I think the level of distrust is so high at the moment that
we're already in that situation.
Yeah, like we're of a generation and we're journalists, and so we sit and worry about what's real and what's fake, but that's not actually the line that people are paying attention to out there.
Yeah, I think the real thing is
getting to a point where you have built enough of like a parasocial trust relationship with someone that they're just going to believe what you say and then try to be responsible about it.
Right.
You know, about delivering them information, which is crazy.
Okay, one final fake voice trick.
This one's on me.
Since Charlie, you were wrong both times.
Now it's my turn.
My producers wanted to give me the experience of knowing what it's like to have your voice saying something that you didn't say.
So they took my account, they had my voice say things, and I haven't heard it and I don't know what it is.
So we are going to listen to that now.
It will be a surprise for all of us, including me.
So let's listen to these fake voicemails created by my wonderful producers.
Hi, I'm calling to leave a message about after-school pickup for my kids.
Just wanted to let their homeroom teacher know that Zeke in the white van is a dear family friend and he'll be picking them up today.
Okay.
Hi, mom.
I'm calling from jail and I can't talk long.
I've only got one phone call.
I really need you to send bail money as soon as you can.
I need about $10,000.
Cash App, Venmo, or Bitcoin.
All work.
My mom does not have $10,000.
Hey, I hope I have the right number.
This is a voicemail for the folks running the Cascade PBS Ideas Festival.
I'm running late at the moment and wondering if I'm going to make it.
Honestly, I feel like I should just skip it.
I can't stand talking to that Charlie whatever character.
Why am I even here?
Washington DC is clearly the superior Washington anyway.
Wow.
Yeah, okay, okay.
Now, I would say I was talking too fast.
So, yeah, but I mean, you can, so one thing I did with my voice is I had it say a whole bunch of like worse things, like COVID came from a whatever, you know, like, just to see what those things would be like.
And they were, you know, sort of believable, whatever.
But also, what if then you took audio, so the one from jail, right?
What if you took audio, your producers, our producers are great, and inserted a lot of noise that sounded like it was coming from a crowd or like a slamming of cell door or something like that in the background, faded it in nicely.
That would be enough to ratchet it up, right?
Right.
And just be real.
And so I think all those things can become extremely believable if you layer the right context on them.
Right.
You know what, Charlie?
Here's the last thing, because we're in our last few minutes.
You, as someone who's been really close to this, fluctuate between, okay, we don't need to be that alarmed.
It's only got these small uses,
but also it's got these uses and they're really scary.
Having been close to this and gone through this experience, is there a word you would use to sum up how you feel now?
Because clearly it's uncertain.
We don't actually know.
We don't know how quickly this technology is going to move.
Yeah.
How should we feel about it?
I think it's disorientation is the word.
Because, so a big reason why I wanted to go talk to this company was not just because of what they were doing, but to be kind of closer, like to get some proximity to like the generative AI revolution, whatever we're going to call it, right?
To see these people doing it, to feel like I could like moor my boat to something, right?
And just feel like I like to build a new system.
Yeah, to understand what we're, yeah, and I understand what we're building towards, or that they understand what they're building towards and like the answer is that you can like walk up to these people and stare them in the face and have them answer questions and just sort of feel
really at sea about a lot of this stuff. Because there are excellent, transformative applications for this. But also, I see, you know, this voice technology with the other generative AI technologies; a good way to think of them is like plug-ins to each other, right?
Like people are going to use
voice technology with ChatGPT, with some of the
video stuff, and it's going to just
make the internet make media weirder, right?
Everything you see is going to be weirder.
The provenance of it is going to be weirder.
It's not necessarily always going to be worse, right?
But it could be.
And it could maybe be better, but everyone seems like they're speeding towards this destination and it's unknown
where we're going.
And I just feel that disorientation is the sort of the most honest and truthful way to look at this.
And I think when you're disoriented, it's best to be really wary of your surroundings, right?
To pay very close attention.
And
that's what it feels like right now.
We can handle the truth.
Thank you for giving us the truth.
And thank you all for coming today and for listening to this talk.
And be prepared to be disoriented.
Thanks for listening.
And thank you to the production staff at the Cascade PBS Ideas Festival.
This is the AI version of Hanna Rosin speaking, as made by ElevenLabs.
This episode of Radio Atlantic was produced by Kevin Townsend.
He's typing these words into ElevenLabs right now and can make me say anything.
You may hate me, but it ain't no lie.
Baby, bye, bye, bye, bye-bye.
This episode was edited by Claudine Ebeid and engineered by Rob Smierciak. Claudine Ebeid is the executive producer of Atlantic Audio, and Andrea Valdez is our managing editor.
I'm not Hanna Rosin.
Thank you for listening.
Hey, real Hanna Rosin here.
I wanted to pop back in to recommend another Atlantic podcast.
It's the next season of the Atlantic's How-To series.
This season also happens to focus on something AI-related.
It's about how our culture deals with digital fakery.
Co-hosted by Megan Garber and Andrea Valdez, it's called How to Know What's Real.
What are your findings when it comes to how our brains respond?
If something makes you feel a really strong emotion, that's typically a time to pause and kind of double-check: is this true or not?
If you were an urban planner, basically, for the internet, what would you advise us to do?
I think you're going to see different kinds of resistance
because that's the thing about a city is that, you know, what does it mean to maintain morality in a way of recognizing the dignity and humanity of the collective?
That's how to know what's real.
You can find subscription links in the notes for this episode.
Subscribe now to listen to the six-episode season starting this May.