GPT-5 Arrives, and We Try the New Alexa+

Transcript

Speaker 1 In business, they say you can have better, cheaper, or faster, but you only get to pick two. What if you could have all three? You can with Oracle Cloud Infrastructure.

Speaker 1 OCI is the blazing fast hyperscaler for your infrastructure, database, application development, and AI needs where you can run any workload for less.

Speaker 1 Compared with other clouds, OCI costs up to 50% less for compute, 70% less for storage, and 80% less for networking. Try OCI for free at oracle.com/slash NYT.
Oracle.com/slash NYT.

Speaker 2 I had a first this week. What's that? I had my first experience with smelling salts.

Speaker 3 Wait, did you faint?

Speaker 2 Yes.

Speaker 2 So I had to get a blood draw at the doctor, and I am a big baby when it comes to getting blood taken.

Speaker 2 And so, like half the time when I get blood taken, I pass out.

Speaker 2 And this time, I not only passed out, but I vomited and had to be brought back with smelling salts, which Casey, if you have never experienced smelling salts,

Speaker 2 they're not messing around. They are not.

Speaker 3 And I, okay, so I cannot believe that I'm just learning this information about you because I am also a fainter. You're a fainter.
I am a fainter. We are Legion.

Speaker 3 In 12th grade, we went to go see a cadaver for my AP biology class. And intellectually, I was like so fascinated, you know, by all the systems of the body.

Speaker 3 And so me and all the other kids are standing around the cadaver and the person is sort of explaining, well, you know, this is the liver and this is the spleen.

Speaker 3 And then I just got a whiff of something. I don't know if it was embalming fluid or formaldehyde or something, but it was like something triggered in my brain and was like, this is against nature.

Speaker 3 Like you should not be this close to an opened up dead body.

Speaker 3 And I, and I truly, I spun around, I took a header off a whiteboard that was like, you know, against the wall and I wake up and I'm staring. at the ceiling.

Speaker 3 And the first thing I hear is my AP bio teacher, Miss Oliver, saying, do we have an emergency contact for this kid?

Speaker 3 Obviously, like, I don't want to tell people that they should like faint, but it is one of the most amazing, crazy experiences you can have in the human. Like, do you know what I mean?

Speaker 3 The moment when you're like, your consciousness just sort of like leaves you, yes, crazy.

Speaker 2 And when I was brought back with the smelling salts, it felt very like Victorian, you know?

Speaker 2 Like, I was like, No, the vapors, the vapors.

Speaker 3 Call Mr. Darcy.

Speaker 2 I'm Kevin Roos, a tech colonist at the New York Times.

Speaker 3 I'm Casey Noon from Platformer.

Speaker 2 And this is Hard Fork.

Speaker 3 This week, give me five. GPT 5.
We'll tell you all about OpenAI's latest Frontier model. Then, Kevin and I get access to the new Alexa Plus.

Speaker 3 We found a few minuses, and we're bringing in Amazon's VP of Alexa to talk about it. Alexa, prepare my interview questions.

Speaker 2 Well, Casey, it has been another busy week in the world of AI. Boy, has it.

Speaker 2 And because we are going to talk about open AI, I should add my disclosure that the New York Times company is suing them and Microsoft for copyright violations.

Speaker 3 And my boyfriend works at Anthropic.

Speaker 2 So we've gotten a bunch of new AI releases and announcements this week.

Speaker 2 We're not going to go through all of them, but some of the highlights, we got something called a world model from Google DeepMind.

Speaker 2 Genie 3 has this kind of interactive game engine where you can just sort of like describe a game that you want to play and it can sort of build it in real time. Pretty cool.
We can't use that yet.

Speaker 2 So that was just a demo or research preview, but that was early in the week. And then we got a new cloud version, Opus 4.1 is out.
So I've been playing around with that.

Speaker 2 Not too different, but sort of a newer update from them. We also got open source models from OpenAI, putting the open back in their name.
They released two open source models this week.

Speaker 2 And Casey, have you played around with either of those?

Speaker 3 I have not yet downloaded them. Have you?

Speaker 2 I have not. One of them is apparently small enough that you can run it on a MacBook.
Another one, you kind of need a dedicated GPU for it.

Speaker 2 But these are basically OpenAI's first open source models since GPT-2 many years ago. People have been hounding them.

Speaker 2 You guys are sort of betraying the founding spirit of open AI by not making these things open and accessible through open source. And they said, well, here you go.
Here are some models.

Speaker 2 They're not there sort of top of the line models, but people are sort of finding various uses for them.

Speaker 2 And this is sort of designed to compete with the open source models coming out of China with companies like DeepSeek.

Speaker 3 Yeah. And the early word on these is that they're pretty good and that they're competitive with the O3 Mini, 04 Mini, which are more proprietary models.

Speaker 3 But yeah, the early reviews I was reading of the open source models were that they were powerful and good. Yeah.

Speaker 2 So those are some of the announcements that we got earlier in the week. But the big one is that this week, OpenAI released GPT-5, their long-awaited flagship model.

Speaker 2 People have been asking Sam Altman about this, including us, for many months now.

Speaker 2 This was long-awaited. There was lots of hype and rumors flying around about this.
And we just got off a press briefing, a sort of Zoom call with Sam Altman and some of the other leaders of OpenAI.

Speaker 2 And Casey, what did we learn? Well,

Speaker 3 it probably won't surprise people to learn that what they told us during this briefing was that GPT-5 is their best model ever.

Speaker 3 You know, Sam Altman in his remarks said that this is a major upgrade. He called it a significant step along the path to AGI.

Speaker 3 But he also said that we're not at AGI yet. Among other things, for example, he said, look, this model does not continuously learn.
And in his view, AGI will continuously learn.

Speaker 3 So So I actually thought it was cool that he said that because now we have sort of one thing to hang on to where, well, maybe when a model can do that, we'll feel like we really are getting close to AGI.

Speaker 3 But, you know, the one other thing that he said that struck me that I thought was kind of funny was that he said that after they had sort of put GPT-5 together, he went back to using GPT-4 and he said, quote, it was quite miserable.

Speaker 3 He said he never wants to go back to using GPT-4 ever again. That's how good he says GPT-5 is, Kevin.

Speaker 2 Yeah. So he sort of compared it to the previous models.

Speaker 2 He said GPT-3 felt like talking to a high school student, GPT-4 felt like talking to a college student, and GPT-5, he said, is the first time it feels like talking to an expert, someone who has a PhD in a subject.

Speaker 2 So I think we should caveat this all by saying that as of our taping this week, GPT-5 had not yet been rolled out.

Speaker 2 And we hadn't been able to sort of put it through its paces, but it will be rolled out this week,

Speaker 2 including to free users of ChatGPT who have not previously had access to their top-of-the-line models.

Speaker 3 Yeah, and I think that that's important because OpenAI's best models at the moment have been reserved for paying users.

Speaker 3 So the chatbot that I use the most is O3, which is a reasoning model that OpenAI makes. That's not accessible to people who are on the free plan.

Speaker 3 So I do think it's really notable now that even free users, which I think is going to include a lot of high school school and college students out there, are now going to have access to what at least they are saying is PhD level intelligence and reasoning.

Speaker 2 Yeah.

Speaker 2 So one of the most annoying features of ChatGPT for years now has been this model selector where you go in and it sort of gives you this little dropdown menu and it defaults to GPT-4.0 now, but you can kind of pick your own model if you want something more powerful than that and you're a paying user.

Speaker 2 But for GPT-5, OpenAI is doing away with the model picker or at least making it sort of less necessary because they have built this what they called a router that'll essentially analyze your request and how much

Speaker 2 sort of computation it needs to be able to answer that request and whether it's a simple query or something more

Speaker 2 involved. And it will kind of direct it to the correct model.

Speaker 2 And I think for a lot of people, this is going to be their first experience with a reasoning model because OpenAI does not make that the default right now in ChatGPT.

Speaker 2 And so I think that will be a big update for people, regardless of whether GPT-5 is actually better than previous models.

Speaker 2 Just the ability to kind of use these reasoning models for free seems like a pretty big deal.

Speaker 3 Yeah, getting rid of the model picker, I think, could cut both ways. And we should say that all of the big labs have a model picker.
So Gemini has one,

Speaker 3 Anthropic has one, and Claude. And sometimes I will, you know, maybe ask an easy question of one of these models that is maybe set to reasoning mode.

Speaker 3 And then I'll think, oh, gosh, you know, I probably didn't need that much computational power.

Speaker 3 On the other hand, I do sort of feel like it sets up an incentive for OpenAI, which wants to sort of save as much money as it can, to just always try to route you to sort of the absolute least compute that you need.

Speaker 3 And so I will just be curious to see if I feel like that is affecting the quality of my experience now that I, you know, maybe can't go in and actually say, hey, let me use the good stuff. Yeah.

Speaker 2 So on this briefing, OpenAI said all the sort of expected things about how GPT-5 is better at everything than previous models.

Speaker 2 But they also spent a lot of time talking about what they called the vibes of the model, which they believe are quite good.

Speaker 3 They also gave a series of demos and one that I thought was interesting introduced this concept of what I believe Sam Altman called software on demand.

Speaker 3 So that GPT-5 can instantaneously create a piece of software for you. In the demo that we saw,

Speaker 3 one of the employees there built a

Speaker 3 tool to let his girlfriend learn French. And it did this in some sort of fun ways.
In one case, it sort of created a series of flashcards for her.

Speaker 3 In another, it created like a little snake game with like a mouse and a piece of cheese. So every time like the mouse caught a piece of cheese, it would like show her a new word to learn.

Speaker 3 And he was able to do all of that just via a text-based prompt. And it actually looked like pretty good.

Speaker 3 I mean, I think, you know, five years ago, if you turned that in in an intro to computer programming class, you probably would have gotten an A.

Speaker 2 Yeah, it's pretty impressive, but those are also things that other models can do today. So I'm going to need to like really drive this thing myself to figure out what it can do.

Speaker 2 And I'm going to put it through my usual bevy of tests known as Roostbench and see how it does. Yeah.
I confess during this briefing, I zoned out a little bit. I've been to a bunch of these.

Speaker 2 Everyone says their model is the latest and greatest, and it's so good at coding, and it's all got all these agentic capabilities. And it all starts to sound a little bit like marketing hype to me.

Speaker 2 For me, the interesting question to ask about new models these days is not how much better is it, or how does it score on these benchmarks? It's like, what is possible for me now that wasn't before?

Speaker 2 And I still don't have a really good answer to that from GPT-5, although I'm going to investigate.

Speaker 3 Yeah, and you know, that might be a good point to bring up two of the questions that got asked of the GPT-5 team during the briefing that we're on that I think would be of interest to our listeners.

Speaker 3 One was somebody asked, hey, like, are you starting to run into the limits here, right? Are the scaling laws holding?

Speaker 3 And Sam Altman said, quote, they absolutely still hold and we keep finding new dimensions to scale on. He said that they're still finding new paradigms that will let them scale in new ways.

Speaker 3 So he very much tried to give the impression that like, no, we are not struggling at all to figure out how to build better models.

Speaker 3 I suspect though that that might get some pushback as people start to use this thing and just sort of observe that like, yes, it is like clearly better in a handful of ways, but to your point, Kevin, can you really do anything that you couldn't do before?

Speaker 3 That doesn't like seem like it's the case. It's just sort of that it can do what it used to do a little bit better.

Speaker 3 I'm curious what you made of that.

Speaker 2 Yeah, I think that's reasonable. I mean, I wonder if they are starting to finesse their definitions of the scaling laws to sort of account for the new reasoning models.

Speaker 2 Because people, you know, for months now have been saying, well, the models in the pre-training phase may have gotten as good as they're going to get or about as good as they're going to get.

Speaker 2 But the way to get them to be more intelligent is through this post-training phase, is through these reinforcement learning cycles, these sort of reasoning environments that they're trained on and put into.

Speaker 2 And so I suspect that when they say that the scaling laws have not broken, they're also sort of referring to this kind of reinforcement learning reasoning approach as well. And I believe them.

Speaker 2 I've talked to people who say, you know, they think there's still a long way to go on that. But this was a really big model.
We don't know exactly how big.

Speaker 2 We don't know exactly how many GPUs it was trained on or how much data it was fed, but it's safe to assume that they did everything everything they could to sort of max out the scale of the model, at least in pre-training.

Speaker 2 And, you know, from what we saw in the demos, it doesn't look like it's like that much smarter.

Speaker 2 I mean, maybe it's a little better at some things, but it did not, you know, come out of the box super intelligent or anything like that.

Speaker 3 Yeah. The other big question that I'm always interested in when these big new models come out is what was the safety testing experience like? And is this model going to be sycophantic?

Speaker 3 And what sorts of very intense relationships are people going to to form with it? And Nick Turley addressed that one.

Speaker 3 He noted that earlier this week, OpenAI put out a blog post, which I actually wrote about in Platformer this week, that is all about their approach here.

Speaker 3 They say they're working with physicians, really trying to bring in a lot of outside expertise to help them understand how people are interacting with these models and make them safer.

Speaker 3 And he said that they are absolutely not optimizing for engagement here.

Speaker 3 They just want to sort of make a useful tool that sends you on your way and that essentially they're going to have more to communicate about this this soon.

Speaker 3 So we didn't get a ton of detail there, but they have said that at least in some ways they think that they have improved these models to make them less sycophantic.

Speaker 3 In addition to that, they said they did 5,000 hours of red teaming. They shared, I believe, these models with some external experts for advice on that.

Speaker 3 So they did say that they rate this model as high on the scale of could it be used to create novel bio-risks. So they're building in a bunch of protections around that,

Speaker 3 you know, but that doesn't doesn't seem great.

Speaker 3 But anyway, that was the sort of safety report that we got in advance of the launch.

Speaker 2 So they also said that in addition to all these new capabilities, GPT-5 is much more reliable than previous models.

Speaker 2 They claim it hallucinates less, and it does this interesting thing called safe completions, where basically if a model doesn't want to,

Speaker 2 you know, accommodate some request or carry out some task because it's sort sort of against the guidelines.

Speaker 2 Instead of just refusing it, it will sort of make up a safer version of the request and complete that instead. So it'll be interesting to see how people use that.
But yes, this is the claim they make.

Speaker 2 It's more reliable, less deceptive, and gives these kind of safe completions.

Speaker 3 Well, and that actually gets into something interesting, though, Kevin, which is like, what is it that makes OpenAI say this is GPT 5?

Speaker 3 The sort of like big number releases have this amazing marketing power now. I think because the leap from GPT-2 to GPT-3 was so big and the leap from three to four was also pretty big.

Speaker 3 And so that creates a lot of expectations for five. But in the background, OpenAI is just...

Speaker 3 trying a bunch of things, building a bunch of new models, and then sort of stapling various things together.

Speaker 3 And then eventually they get to something and they say, we're going to call this one five, but like it's not quite as linear as it looks from the outside, right?

Speaker 2 Yes. And all of the labs have had experiences where they thought they were training like a new model and then it didn't quite work out for as well as they wanted it to.

Speaker 2 And so they sort of assign it some lower number. You know, that's happened at at least a number of big labs that I know about.
It's happened at OpenAI.

Speaker 2 You know, they had a previous big model that they were building, which ended up becoming 4.5, I believe was supposed to be GPT-5 at one point, and it just didn't turn out as well as they'd wanted it to.

Speaker 2 So yes, they are playing games with the numbering of the models and sort of the marketing around that.

Speaker 2 But I think calling this GPT-5 just signals that they want this to be viewed as a similar step in capability that people saw from GPT-3 to GPT-4.

Speaker 3 Yeah, to me, that is one of the most interesting things about this release is like whatever GPT-5 turns out to be,

Speaker 3 this is the thing that they thought was kind of the next big step forward. And I think we should like evaluate it on those grounds.

Speaker 2 I think it's also like the big picture here is that OpenAI is trying really, really hard to stay at the head of the pack.

Speaker 2 This is a company that has been sort of racing very hard toward AGI or something that they can claim is AGI. And they are.
they are still going. I find actually their execution to be quite impressive.

Speaker 2 Like this is now a very large company. They've got a lot of different competing teams and priorities.

Speaker 2 And they had all this board drama. And I think, you know, it was reasonable to expect.

Speaker 2 And I certainly expected that in the wake of all that, they would sort of slow down and maybe allow some competitors to catch up.

Speaker 2 But they showed this week that they are not slowing down, that they are, in fact, accelerating and they want to get here before anyone else.

Speaker 3 True. Although they have also experienced a lot of poaching in recent weeks and months.
And I think one thing I'll have my eye on over the next several months is are they able to

Speaker 3 continue iterating very quickly or have some of the losses that they've experienced over the past few weeks really hurt them.

Speaker 3 You know, incidentally, Kevin, I'm told that in response to the GPT-5 launch inside Meta headquarters, the superintelligence researchers have moved their desks even closer to Mark Zuckerberg.

Speaker 3 So that's how seriously they're starting to take this over there.

Speaker 2 They are now sitting on top of Mark Zuckerberg.

Speaker 3 There are now two researchers who are sitting at Mark Zuckerberg's desk with him.

Speaker 3 And we'll have to see how that plays out.

Speaker 2 Yeah. Yeah.

Speaker 2 Those are some of our initial impressions, but we are going to actually come back tomorrow after we have had had a little time to play with the model and give some first impressions there too let's travel to the future now kevin

Speaker 2 All right, Casey, it is now Thursday. GPT-5 has been officially released for a few hours.
I still do not have access to it for some reason, but I gather that you do.

Speaker 2 So give me your day one vibe check. What are you seeing? What is GPT-5 like? And what do you make of the reaction to it?

Speaker 3 Well, this is a very significant moment in the history of Hard Fork, Kevin, because for the first time, I'm having a conversation with you while vibe coding something.

Speaker 2 What are you vibe coding?

Speaker 3 Well, ChatGPT is currently hard at work building a to-do list app for me. I said I wanted it to have the aesthetic of the fantastic four first steps movie that just came out.

Speaker 3 I didn't love the movie, but I did love the production design. So I was like, make me a to-do list app that looks like that.
Let's see how it goes.

Speaker 2 I can't believe we're building giant gigawatt data centers for your stupid to-do apps. This is so wasteful.
God. Listen,

Speaker 3 you need to send over Roostbench your proprietary suite of evals so I could really put this thing through its pages.

Speaker 3 But look, let me give you some high-level notes, Kevin, on what I'm seeing and on what others are seeing. The headline here is that this does seem like a really meaningful improvement to ChatGPT.

Speaker 3 And I think, in particular, if you are a free user of ChatGPT, you're going to have a great day, right?

Speaker 3 Because for the first time now, in addition to the kind of, you know, standard ChatGPT model, you're going to have some reasoning capabilities.

Speaker 3 So essentially, if you're cheating your way through high school, you're just going to have a lot easier time of it now because this thing can do some really extended work on long problems.

Speaker 2 Yes, you can now cheat your way through an entire semester with just one press of a button.

Speaker 2 Exactly.

Speaker 3 This is good.

Speaker 2 Yeah, I mean, the way I saw people talking about it online was that they thought that OpenAI had not raised the ceiling of the AI frontier by a lot with GPT-5, but they had raised the floor.

Speaker 2 Essentially, now all the free users who previously got defaulted into the less powerful models are now going to be using the more powerful models, which could be a big...

Speaker 2 perceptual shift, if not a sort of one about the frontier capabilities.

Speaker 3 For sure. And I do think it does have some things that are not quite capabilities, but still will meaningfully affect how people use these AI systems.

Speaker 3 For example, this thing really is just a lot faster than its predecessor. So just over the past couple of hours, I took some editing work that I sometimes ask ChatGPT to do.

Speaker 3 I know about how long it takes using the O3 model. I put it through GPT-5 and sure enough, yeah, it blazed through it.
It did just as good of a job as it had done before.

Speaker 3 And so if you're the sort of person who's using ChatGPT a lot, I think that's really going to stand out to you.

Speaker 2 Yeah. What about the pricing? I saw some people saying that GPT-5 was much cheaper than they expected it to be.
That would not for cheaper for the sort of chat GPT subscriber.

Speaker 2 Those subscription prices are staying the same, but for developers who are building on top of it, my understanding is it's a lot cheaper than other models from other AI labs.

Speaker 3 That's right. It came in at $1.25 per 1 million input tokens, which is the same as Google's Gemini 2.5 Pro.

Speaker 3 Google has, of course, also been pricing really aggressively to try to box out the competition.

Speaker 3 What makes that figure interesting, I think, Kevin, is that that number is a lot smaller than Anthropic's Claude for Opus API, which comes in at $15 per million input token.

Speaker 3 So I think some of these really well-capitalized AI labs are taking this moment to say, hey, we're going to put a lot of pricing pressure on some of our competitors.

Speaker 2 Yeah, it very much reminds me of the moment like 10 years ago when like Ubers were $4 because the venture capitalists were just subsidizing the artificially cheap prices.

Speaker 2 We're in sort of that moment for AI tokens now.

Speaker 2 What else can we say about GPT-5 in the couple of hours now that it has been out?

Speaker 3 Yeah, well, you know, I sometimes like to joke that the worst insult you can make to anyone who has just released a new AI model is my timelines are now longer.

Speaker 3 And it does seem like that is something that people are saying about the new GPT-5.

Speaker 3 And what I slash they would mean by that is I now think it's going to take a little bit longer until we reach AGI, some sort of very, very powerful AI system.

Speaker 3 In fact, some people were posting online screenshots of some prediction markets that until today, when asked, who do you think will have the most powerful AI model at the end of August, were showing open AI in the lead.

Speaker 3 And almost instantaneously after the live stream on Thursday, Open AI collapsed and Google is now ascended and is assumed to have the best model by the end of this month.

Speaker 3 So, you know, I don't want to overstate what that means necessarily, but it does seem like there was a huge contingent of people who thought that GPT-5 was going to be this revolutionary new model.

Speaker 3 And it seems instead like a more evolutionary one.

Speaker 2 Yeah, that makes a lot of sense to me. One other thing that stuck out to me, and I wonder if it stuck out to you too, was OpenAI released some benchmarks and some data about GPT-5.

Speaker 2 And one of the things they showed was that hallucinations, the rate of GPT-5 just making stuff up while answering questions has gone way down.

Speaker 2 It's now, you know, for some types of questions, it's sort of around 1%

Speaker 2 hallucination rate.

Speaker 2 And I think that was interesting to me because this is clearly something that was a problem with earlier versions of this. In fact, there was some speculation and some indication that actually

Speaker 2 these newer reasoning models were hallucinating at higher rates than the sort of previous generation of models. And there's a lot of concern about that.

Speaker 2 And it seems like they have figured out a way to get the hallucinations under control with GPT-5. Although with everything, I don't totally trust these benchmarks.

Speaker 2 I'm going to have to see this for myself.

Speaker 3 Yeah, everyone's mileage is going to vary on this one. I will say I have already caught it hallucinating a couple of times, like somewhat disappointingly.

Speaker 3 So as always, don't trust these things for anything mission critical. You're always going to want to double check your facts.

Speaker 2 Yeah, only use it to build stupid to-do apps with the Fantastic Four aesthetic on them.

Speaker 3 The Fantastic Four have a very cool aesthetic, and I think you need to open up your mind a little bit.

Speaker 2 Okay, that is our day one vibe check of GPT-5,

Speaker 2 and we will continue to play around with this and tell you anything cool, or interesting, or strange, or upsetting that we find.

Speaker 3 Sounds good.

Speaker 2 All right, that's enough about GPT-5. When we come back, we'll talk about another AI system we got our hands on this week, Alexa Plus.

Speaker 1 This podcast is supported by ATT. America's first network is also its fastest and most reliable.

Speaker 1 Based on Rootmetrics United States Root Score Report 1H2025, tested with best commercially available smartphones on three national mobile networks across all available network types, your experiences may vary.

Speaker 1 Root metrics rankings are not an endorsement of ATT. When you compare, there's no comparison.
ATT.

Speaker 4 Over the last few decades, the world has witnessed incredible progress.

Speaker 4 From dial-up modems to 5G connectivity, from massive PC towers to AI-enabled microchips, innovators are rethinking possibilities every day.

Speaker 4 Through it all, Invesco QQQ ETF has provided investors access to the world of innovation with a single investment. Invesco QQQ, let's rethink possibility.

Speaker 4 There are risks when investing in ETFs, including possible loss of money. ETFs' risks are similar to those of stocks.

Speaker 4 Investments in the tech sector are subject to greater risk and more volatility than more diversified investments.

Speaker 4 Before investing carefully, we can consider fund investment objectives, risks, charges, expenses, and more in perspectives at Invesco.com. Invesco Distributors Incorporated.

Speaker 5 Hear that?

Speaker 7 That's what it sounds like when you plant more trees than you harvest.

Speaker 9 Work done by thousands of working forest professionals like Adam, a district forest manager who works to protect our forests from fires.

Speaker 10 Keeping a forest fire resistant is synonymous with keeping a forest healthy.

Speaker 11 And we do that through planting more than we harvest and mitigate those risks through active management.

Speaker 12 It's a long-term commitment.

Speaker 5 Visit workingforestsinitiative.com to learn more.

Speaker 2 Now, Casey, are you an Alexa user?

Speaker 3 I have been an Alexa user for a long time.

Speaker 3 I still have one of the original Amazon Echoes in my house. And to Amazon's credit, it still works.

Speaker 2 The Pringles can.

Speaker 3 Yeah, I have the big old sort of Pringles can Echo.

Speaker 2 Yeah, me too. So I am a heavy user of this product.

Speaker 2 I have probably five of them in my house in various rooms. And so I'm very excited for our conversation today, which is going to be about the new AIified Alexa Plus.

Speaker 2 And before we get into our experiences using this thing and our interview with the guy who runs it, we should make a couple disclosures.

Speaker 2 One of them is that the New York Times company has recently agreed to a licensing deal with Amazon that will allow Amazon access to Times content for its AI platforms, including including Alexa.

Speaker 2 So we just thought you should know that. We have nothing to do with that, obviously, but that is going on in the background in another part of the company.

Speaker 2 The second thing we should say is that if you have an Alexa device, it is going to be going off constantly during this segment, unless you go over right now and hit the little button that mutes it.

Speaker 2 So sorry in advance to Alexa owners, but we'll give you a little time right now to pause this, go over, hit the mute button on your Alexa and come back.

Speaker 3 Or alternatively, just find the circuit breaker at whatever house you're in right now. You just shut them all off.
Run on battery power for the rest of this episode. Alexa, order 14 bags of dog food.

Speaker 2 I wonder if that actually works.

Speaker 3 Wait, and I should also probably disclose that my boyfriend works for Anthropic because I'm pretty sure that Anthropic is providing APIs that are being used in Alexa Plus.

Speaker 2 Wow, we've got so many disclosures today. Yeah.
Okay. All right.
Let's get started. So

Speaker 2 Alexa. Alexa is one of the most puzzling technology products that I have ever encountered.
And like you, I have been an Alexa user since the very early days.

Speaker 2 People don't realize this is, this product was released in 2014. Alexa is 11 years old.

Speaker 2 And when it came out, I was very excited. I thought, I'm going to put this smart speaker in my house and I'm going to, you know, ask it to do things for me.

Speaker 2 And it's going to be like having a little assistant right there on my kitchen counter. And Alexa has added dozens of features, maybe hundreds of features since 2014.

Speaker 2 And I use zero of them because the three things that I use Alexa for are setting timers, choosing music to play in my house, and telling me the weather before I leave for the day. Absolutely.

Speaker 2 Are those like similar to what you use those?

Speaker 3 Those are the exact three things that I use Alexa for. Have I tried to use it for other things? Yes, but the experience, frankly, has just never been that great.
So I always come back to those three.

Speaker 2 Yes, those are the big three in my house. Same use cases, same limitations.

Speaker 2 But when generative AI started to get good a couple of years ago, I think people naturally started to ask, well, when is Alexa going to start using this new generative AI technology?

Speaker 2 It's sort of built on this older, more deterministic kind of system, but it seemed like a natural thing to expect that Alexa would start to incorporate some of this.

Speaker 2 technology to be able to answer maybe more open-ended questions, to give longer, more detailed responses, to do more than just set timers and tell you the weather.

Speaker 3 Yeah, I mean, once OpenAI released voice mode for Chat GPT, it just immediately seemed so much more interesting and powerful than Alexa and Siri, which is Apple's very similar system that it makes for its devices.

Speaker 3 And so, yeah, I think both of us were like, okay, well, when are we going to get that open AI style voice mode in these smart devices that we have in our homes?

Speaker 2 Yeah. And so it's taken a while.
We should say that. Like it has not been a smooth or simple process.

Speaker 2 And part of what I'm so excited to talk with Daniel Rausch, the VP of Alexa, about later in this show, is just why it's been so hard to sort of shove an LLM-based generative AI technology into this pre-existing assistant product.

Speaker 2 But we should just talk briefly about what Alexa Plus is and then our experiences with it, because both you and I have gotten to try this over the past few days.

Speaker 3 Yeah, so Kevin, tell us a little bit about Alexa Plus.

Speaker 2 So Alexa Plus is the name for the most recent overhaul of the Alexa virtual assistant. It's powered by generative AI.

Speaker 2 We don't know exactly which model or models, but it seems to be sort of a mix of Amazon's proprietary AI models. models and then maybe some of Claude, which it has a deal with Anthropic.

Speaker 2 And so it's been using Claude inside of its AI products for a number of months now. And Amazon claims that the new Alexa Plus is able to do much more.

Speaker 2 It's able to be much more conversational, more personalized. It can do things like book reservations at a restaurant or order you an Uber.

Speaker 2 It can answer questions that sort of aren't just pure lookups where you're looking for, you know, what time is the baseball game tonight? It can actually do more complex things for you.

Speaker 2 It can control the smart devices and appliances in your house, and it can purchase things for you online. So, this new Alexa Plus is not out to everyone yet.
They've been rolling it out slowly.

Speaker 2 They are now in what they call the early access period, but we were able to get this on some new devices that we ordered. It also doesn't work on every kind of Echo device.

Speaker 2 You have to have sort of one of the newer ones to be able to run it.

Speaker 3 And, Kevin, when you say that this has been rolling out slowly, like it has been rolling out extremely slowly.

Speaker 3 Like, it was only on June 23rd that Amazon said that 1 million people had Alexa Plus across presumably hundreds of millions of echo devices out there.

Speaker 2 Yep. So you and I both got the new echoes that can run the Alexa Plus early access program and turned it on and set it up.
And a few things stick out to me right away.

Speaker 2 One is the voice on this new Alexa is just way better than the old.

Speaker 3 I would agree with that.

Speaker 2 It is way more fluid. It sort of sounds like more like something you'd hear out of ChatGPT voice mode.
They have managed to sort of overhaul the actual voice part of the voice assistant.

Speaker 2 So it sounds much more like a human.

Speaker 3 And there are a bunch of different voices. I think I saw eight of them inside the app.
Half of them are masculine. Half of them are feminine.
So yeah, you can change that to your liking.

Speaker 2 Yeah. The other big difference I noticed right away is that the new Alexa Plus does not require you to say the wake word like Alexa between every question and answer pair, right?

Speaker 2 With the old Alexa, if you wanted to ask a follow-up question, you had to say Alexa again.

Speaker 2 With the new one, you can just kind of leave it and it will sort of intuit or pick up that you have a follow-up question and it will sort of listen for a while longer.

Speaker 2 So you can actually have these more extended multi-turn conversations.

Speaker 3 Yeah. And that lets it do different kinds of things.
Like one of the first things that I did with Alexa Plus was it said, hey, would you like to try to solve a riddle?

Speaker 3 And I thought, what are you, the Sphinx? But I said, well, sure. What the heck? And, you know, it gave me a series of clues.

Speaker 3 And within, you know, three clues, Kevin, I was actually able to solve the riddle. Wow.

Speaker 2 Good for you.

Speaker 2 Totally smart. I'm so proud.
Thank you.

Speaker 3 Thank you so much.

Speaker 3 So, yeah, what else were you doing with this thing?

Speaker 2 Another thing it can do is just give you longer answers. Like the original Alexa was sort of limited to a sentence or two.

Speaker 2 Maybe you could ask it to look something up on Wikipedia and it would sort of spit out a few sentences, but it was really limited beyond that.

Speaker 2 But I can now ask it to make up a story and read it to my kids. So we had some fun doing that the other night as a family.

Speaker 2 You can ask it to suggest a recipe for dinner based on, you know, what's in your fridge, and it will sort of help you with that. I used that last night.

Speaker 2 So, these are some of the new features that I was excited to try. I also tried some of their integrations, like they have an integration with Open Table and with Uber and a bunch of other companies.

Speaker 3 Tell me about this because I set this up, but I did not actually use it. So, how did that work?

Speaker 2 So, basically, you scan a little QR code on your phone and link your Uber account or your OpenTable account to your Alexa account. It takes, a minute or so.

Speaker 2 And then you can just kind of say, like, order me an Uber from this place to this place, or I want a table at a restaurant in downtown San Francisco near the ferry building for two people at 6.30 tomorrow.

Speaker 2 And it will sort of pull up a couple options and you choose what you want. And then it can go book the table for you.
So that I thought was cool.

Speaker 3 And that actually worked when you tried it.

Speaker 2 So I did not actually follow through with the booking, but I did order an Uber for myself and it did work. Okay, cool.
Yeah.

Speaker 3 I mean, that actually seems like truly useful. Just be like, say to your thing on your desk, hey, I need an Uber to the airport, and it pulls up one.
That's great. Yeah.

Speaker 2 And it can do other cool sort of multi-step things too.

Speaker 2 Like, I was able to say I needed a new thing for my kitchen, like a box grater, and I was able to kind of go to Alexa and say, hey, look up on Wirecutter what the best-rated box grater is and add it to my Amazon.

Speaker 3 Now, can I guess why you need a new box grater? Why is that? You used it to grate ginger and it dulled the edges. No.
Okay. Was it what was the reason?

Speaker 2 I left it in an Airbnb. Okay.

Speaker 3 I should have seen that coming. Anyways, go ahead.

Speaker 2 So anyway, those were some of the good things about this product, but we have to talk about some of the limitations as well. Casey, what was your experience with Alexa Plus?

Speaker 3 Okay, so I have to say, I did not have a good experience with this thing. First of all, I bought an Echo Show 5.
There's a big banner on the page that says it works with Alexa Plus.

Speaker 3 And so the thing shows up at my house. And basically what I've come to understand is that an Echo Show is a device that just constantly invites you to spend money with Amazon.

Speaker 3 And I found it honestly infuriating because I plugged this thing in. And when you set it up, it's like, what, you know, what kind of background do you want? I was like, show me some art.

Speaker 3 You know, that's one of the options. And I would say for about four seconds per minute, it would show me, you know, some Renaissance

Speaker 3 masterpiece or something. And then it would be like, hey, do you want aspirin?

Speaker 3 What? You want paper towels? You want to buy paper towels? You can actually buy paper towels right now. Just say, hey, Alexa, buy paper towels.
And it, it was just sort of this like forever.

Speaker 3 And so I eventually just unplugged the thing because I was like, why did I just spend $90 to have a permanent rotating advertisement for household products like on my desk? That is so weird.

Speaker 3 So I was just like, it just put such a bad taste in my mouth about the whole thing. And then a day later, I get the Echo Show 15.
Okay. And for some reason, Amazon sent me two of them.

Speaker 2 I truly don't know why.

Speaker 3 I did not need two of them. And so I unbox the thing, and the thing is meant to be mounted on a wall.
Now, there are a lot of things I'm willing to do for a podcast, but mount an echo show on my wall.

Speaker 2 You're not willing to do a construction project, it's not one of them.

Speaker 3 No, I was not going to do that, and so I just had, and also the thing like can't stand up on its own, so I just had like a 15-inch screen sitting on my desk for a day while I'm talking to it.

Speaker 3 And it's like, this whole thing is like very silly. So, that's like the hardware side of it, okay?

Speaker 3 You may may have a better experience because, you know, I don't know, you like mounting things to your wall. And so you did that.
And so, you know, you're having a good time.

Speaker 3 But that was just kind of all of the, um, you know, the, the, the, the precursor steps I needed to take to even be able to like engage with this thing.

Speaker 3 And so then I, you know, I, I finally have it set up and I start to try to put it through its paces.

Speaker 3 So I, you know, go through the little riddle game and I, uh, you know, it's, it's like, hey, like, I could help you like do a personalized meal plan. It's like, all right, great.

Speaker 3 Yeah, you set me up up with a personalized meal plan. And, you know, it's like, well, you know, we could do this or that.
And it showed me like a row of

Speaker 3 like recipes that it could cook for me. And so I swipe through with my finger and I see a lemon pasta.
And I say, okay, show me the lemon pasta. And it says, sorry, I didn't get that.

Speaker 3 And I said, Alexa, the lemon pasta, right? Could you make me this lemon pasta from this website that you're showing me right now? Dead silence. And I was like,

Speaker 3 oh my God. And it's like this, like, and right here, we have just landed in the exact spot that has been bedeviling Apple for the last year, that is bedeviling Alexa right now.

Speaker 3 These systems are just very hard to make reliable. Now, I will say, I, the, the device was sort of having trouble connecting to my internet.
Everything else in my house was connected to the internet.

Speaker 3 It was working fine. But like, I, this was just sort of every once in a while being like, you're not connected to the internet.
So was that an issue with me? Was that an issue with the hardware?

Speaker 3 I'm not totally sure. Maybe that was why it wasn't able to perfectly answer my question.
I, you know, I do want to say that in case this was not actually an AI issue, but oh man, I.

Speaker 3 Within five minutes, I was like, get this thing out of my house. Like, and like, and I, and again, I wanted to like, I was excited about it.

Speaker 3 And after five, and after like two days of ads for paper towels and one day of, I'm not going to show you the lemon pasta, I thought, what am I doing with my life?

Speaker 2 Yeah, I, I should say, like, I have also had a bunch of like very bizarre and frustrating experiences with this thing.

Speaker 2 Okay, so we've said what we like about this thing.

Speaker 3 Yeah, I think many remind me what that is again.

Speaker 2 Many of the new capabilities are quite cool. Yeah.
Unfortunately, many of the old capabilities I relied on as the reason I used Alexa at all have become broken as a result of this update.

Speaker 3 Okay, so tell me about this.

Speaker 2 So one of the things you also notice very quickly when you're using this thing is that the latency is just like a problem. It's a little slow to respond to questions.

Speaker 2 It's not as zippy as the old sort of pre-LLM Alexa.

Speaker 2 I understand that these things have to go to the cloud. They're processing more complex instructions.
It's all going to take a little time. I assume that will get better.

Speaker 2 The basic things that it gets wrong now include alarms, which is actually a thing that I use Alexa for every day.

Speaker 3 Wait, so tell me how it got it wrong.

Speaker 2 So the new Alexa Plus update seems to have broken Alexa's ability to reliably

Speaker 2 set and cancel alarms, which is a core thing that I use this product for.

Speaker 2 And so, for example, this morning, I woke up on my own a little bit earlier than my alarm, like 10 minutes before my alarm was supposed to go off. And so I said to Alexa, Alexa, cancel the alarm.

Speaker 2 Silence. Nothing.
This is a command that I have issued probably a thousand times.

Speaker 3 And Alexa Plus is a little smarter now and she's giving you the cold shoulder.

Speaker 2 Yes. And she's saying, actually,

Speaker 2 I'm going to wake you up anyway in 10 minutes. So that was not good.

Speaker 2 I also experienced some like hallucinations when I would ask it questions about like things happening in the world, things happening in the news.

Speaker 2 I asked it about a tennis tournament that's going on right now. I said, who's the top seed in this tennis tournament? It gave me the name of a player who's like not even playing in this tournament.

Speaker 2 And

Speaker 2 it also like has trouble. orchestrating the different tasks.
So one of the things that would happen is I would like tell it to, I gave it a research project for a dinner playlist.

Speaker 2 I was looking for some new

Speaker 2 music to put on a dinner playlist. And instead of doing that research project, it just started searching on Spotify.

Speaker 2 Like it routed the query to Spotify within the Alexa thing and started playing the music when what I had asked was like, do some research for me.

Speaker 2 So it seems to have a little trouble figuring out like what exactly the user wants and like orchestrating the commands.

Speaker 3 That case seems like a little borderline to me. I can imagine some people like asking for that, you know, maybe being happy if it like played some music.

Speaker 3 But I had this almost opposite issue where, again, as I'm sort of going through, okay, what can this thing actually do? And it's, you know, and it, it's like, you know, ask me what I can do.

Speaker 3 So I asked it. And one of the things it said was,

Speaker 3 I can help you explore Gen Z music trends, which there was just like something funny about the way it said it to me. So I was like, yeah, sure.
Why don't you help me explore Gen Z music trends?

Speaker 3 And, you know, it thinks for a second. And then it goes, well, I found some podcasts about it on Amazon Music.

Speaker 3 And I I was like, I sort of assumed you were either going to tell me something about Gen Z music or you were going to play Gen Z music, but now you're trying to sell me Amazon music, which I feel like is very consistent with how Alexa Plus handles everything, which is, could we sell you a service right now?

Speaker 3 Could we sell you a product? So, you know, Kevin, I want to say two things. One is I have not used this product all that long.

Speaker 3 And so I don't want people to think about anything I'm saying as anything other than first impressions.

Speaker 3 Like I have not truly had a chance to do the amount of reviewing that I would like to do to, I'm very confident that lots of other people are probably having much better experiences with this thing because I think if most people were having experiences bad as me, I just would have heard about this before now.

Speaker 3 But all of that said, Alexa Plus did not make a great first impression on me. And the echo family of devices that are just little windows that let you send money to amazon.com, they're not for me.

Speaker 2 Yeah, I had, I think, a slightly more positive experience than you. I did actually enjoy some of my interactions with Alexa Plus, but it just seems like it is not quite there yet.

Speaker 2 And I think Amazon knows this, which is why it's in this early access program. If you open it up, it says, you know, Alexa may make mistakes.

Speaker 2 So they're sort of like doing all of the careful rollout that you would expect for a product that is not fully baked. But like some of the features just don't seem to work.

Speaker 2 There's another feature that I tried where you can like email a document to this email address and it will sort of ingest it into your Alexa and then you can have it summarize it.

Speaker 2 So I was very excited. I was like, I can like, you know, I can learn about new papers in AI while I'm like doing the dishes.

Speaker 2 And so I email the paper to the Alexa email address and I say, summarize the paper I just sent you. And it says, I did not receive a document.
So

Speaker 2 I'm just like, I think they need to spend a little more time in the kitchen cooking this one.

Speaker 2 But I think the overall, my overall impression is that like the Alexa Plus that you have now in this early access program, it is a little like having a kind of GPT 3.5 class model inside of a smart speaker, which I think is a valuable thing and one that I would like them to continue to build on.

Speaker 2 But it is not the state of the art in either the language model or the kind of basic tasks. And actually, it seems to be regressing on some of the basic tasks.

Speaker 2 So I would say this is like two steps forward, one step back.

Speaker 3 I think the most powerful thing that the new Alexa Plus has done for me is it has made me forgive Apple for not shipping anything with the new Siri. I get it now, Apple.

Speaker 3 I talked a lot of mess about you on this podcast about not shipping this thing, but now having used one of your close rivals attempt to do the same thing that you're doing, like I get it now.

Speaker 3 I think the finest minds in the world who are working on this stuff actually don't know how to do this yet. That's my big takeaway.

Speaker 2 Yeah, I think what's happening with Alexa and Siri right now is sort of a

Speaker 2 symbol of what's happening in the American economy writ large, which is like we are trying to jam these like new AI technologies into these legacy systems and processes.

Speaker 2 And it's just kind of a messy fit. Like these things are weird.
They are not deterministic. They are not reliable in the ways that like a sort of older, more rule-based thing could be.

Speaker 2 And they have these amazing capabilities. But when you try to like make these hybrid Frankenstein things with with like the old system with the new brain, it just doesn't really work.

Speaker 2 And I think that's like happening not just in these virtual assistants, but like in a lot of places throughout the economy. Absolutely.

Speaker 3 I also just think that when I'm using a chat bot on my laptop and it gives me something that's like 80 or 85% right, that's much more useful to me than like an Alexa response that's 85% right.

Speaker 3 Because on a chat, like in a chatbot setting, I can just sort of take what I need, I can edit or modify it.

Speaker 3 I can maybe ask the same question of another chat bot and see if I get a slightly different or better result. Like I feel much more in control of my own destiny.

Speaker 3 I can take the stuff that works and leave behind the stuff that doesn't.

Speaker 3 When you're doing this thing with a smart speaker, if it doesn't work, you say, yeah, why'd I spend 90 bucks on this piece of junk? You know?

Speaker 3 And it's like, and I think what I learned about myself was I have so much less patience for this sort of thing when it is a piece of hardware in my home that has made some really big promises about how it's going to help me with all my routines and everything.

Speaker 3 Well, if it's kind of hard to set up and it doesn't work like the vast majority of the time, it all just kind of feels like a waste.

Speaker 2 Right. So I'm, I'm very glad we got to exchange first impressions of Alexa Plus.

Speaker 2 We should also have a conversation with someone at Amazon who has been involved in this overhaul of their flagship voice assistant.

Speaker 2 So when we come back, we're going to be joined in the studio by Daniel Rausch. He's the vice president of Alexa and Echo at Amazon, and we're going to get into all of this with him.

Speaker 2 And you can ask about those ads. Oh, I'm going to.

Speaker 3 But first, some ads.

Speaker 1 As a small business owner, you don't have the luxury of clocking out early. Your business is on your mind 24-7.
So when you're hiring, you need a partner that works just as hard as you do.

Speaker 1 That hiring partner is LinkedIn Jobs. When you clock out, LinkedIn clocks in.

Speaker 1 LinkedIn makes it easy to post your job for free, share it with your network, and get qualified candidates that you can manage, all in one place. Post your job.

Speaker 1 LinkedIn's new feature can help you write job descriptions and then quickly get your job in front of the right people with deep candidate insights. Either post your job for free or pay to promote.

Speaker 1 Promoted jobs get three times more qualified applicants. At the end of the day, the most important thing to your small business is the quality of candidates.

Speaker 1 And with LinkedIn, you can feel confident that you're getting the best. Find out why more than 2.5 million small businesses use LinkedIn for hiring today.
Find your next great hire on LinkedIn.

Speaker 1 Post your job for free at linkedin.com/slash hard fork. That's linkedin.com slash hard fork to post your job for free.
Terms and conditions apply.

Speaker 9 Work done by thousands of working forest professionals, like Adam, a district forest manager who works to protect our forests from fires.

Speaker 11 Keeping a forest fire-resistant synonymous with keeping a forest healthy, and we do that through planting more than we harvest and mitigate those risks through active management.

Speaker 5 Visit WorkingForestsInitiative.com to learn more.

Speaker 2 Daniel Rausch, welcome to Hard Fork.

Speaker 13 Thanks so much for having me.

Speaker 2 So, Casey and I have both spent the past few days playing around with the new Alexa Plus.

Speaker 2 And I'd like to just start by asking about the technology that powers this thing.

Speaker 2 How much of it is a new LLM-based system versus the old, more deterministic model that powered the old Alexa?

Speaker 13 Yeah, well, from an AI and model perspective, everything is entirely new.

Speaker 13 There are some legacy deterministic systems downstream, but really it's a complete re-architecture of everything that you would say Alexa is from the way you have a conversation and engage with the experience at a very basic level, you know, all the way through Alexa acknowledging you or just maintaining a chat.

Speaker 13 So there's a lot of new under the hood. Yeah.

Speaker 3 Talk about the challenge of moving from this deterministic system to something that is very powerful, but also much less reliable.

Speaker 13 Yeah, I would say, well, hopefully you're not seeing much less reliable. I would say, you know, we've got some edges to sand and we're in early access.

Speaker 13 I'm sure we'll get to talk about the nature of the rollout, but I just mean like in general, LOS

Speaker 3 are not as reliable as a deterministic system.

Speaker 13 I get it. So, you know, we want to capture all the benefits of that non-deterministic, we'd call it stochastic system in the space.

Speaker 13 It has the elegance of really engaging in human conversation, but we want the predictable outcomes. Now, large language models don't support interfaces out of the box to classic systems.

Speaker 13 So getting those capabilities to interface, we would talk about it as APIs across these interfaces for other systems, it's quite hard. They speak natural language.
APIs don't speak natural language.

Speaker 13 They speak clunky computer science language, but it's very predictable and it gets a lot of things done.

Speaker 13 So I would say if you had to list the technical challenges, you would say, well, the many millions of things, we stopped counting at some point, but the many millions of things that, you know, original Alexa could do, marrying that with the power of LLMs is definitely the first and most prominent on the list.

Speaker 3 So take us back to when LLMs are first coming out. You guys are starting to play around with them.

Speaker 3 It's sparking ideas for you of, gosh, if we could like marry this to Alexa, we could have something really cool. What are some of the uses that you're thinking about?

Speaker 3 Like, what are the kind of dreams that you have for this model that you're hoping you can bring into reality?

Speaker 13 I mean, I think it's, we think of the capabilities in two buckets, I would say. Take everything that Alexa, the original Alexa can do, and just make it way better.

Speaker 13 You know, just picking up from what customers are already doing with Alexa, then you start brainstorming. And where I think you were really headed was, what are all the new things that we can do?

Speaker 13 And the depth of conversation that you can have with the new Alexa experience just opens whole vistas of new kinds of things we can get done. We can help you plan a trip and then follow through on it.

Speaker 13 We can watch for concert tickets for you.

Speaker 13 We can, you know, not just help you brainstorm about cuisine, but either pick a recipe and get some groceries and invite the neighbors or, you know, let your partner know it's date night and we're going out and book a table.

Speaker 13 So I think the kinds of journeys and the kinds of tasks we can get done for customers are just so much more expansive.

Speaker 2 So Casey and I have spent the past couple of days trying out Alexa Plus, and we have some feedback which we can share with you now or later. We've talked about it on the show just before this.

Speaker 2 I think it's fair to say we both had some things that impressed us about the new Alexa Plus and some things that were challenging, including some of the basic stuff that Alexa seemed to be very good at before, or at least that I know how to get reliable performance out of Alexa 4 that no longer seem to work as well.

Speaker 2 But what I actually want to know is like, why has it been so hard to do this?

Speaker 2 Because back in 2023, when Amazon announced that it was going to revamp Alexa, sort of give it this brain upgrade with these new AI capabilities.

Speaker 2 They said this was going to be ready in 2024, and then that got pushed back a couple different times.

Speaker 2 So walk us through kind of the journey that you all have been on over there, trying to sort of shoehorn this new technology into this existing products and maybe some of the challenges that you encountered along the way.

Speaker 13 Well, I'll tell you, you know, we should definitely get some of the feedback. We can cover as much as you like here on the show.

Speaker 13 So if you rewind the tape, actually, you were asking about this too, as we're starting to to experiment, what can we imagine doing?

Speaker 13 If you go back to 2023 and the models that were available then, the state of the art, very little instruction following, reasoning,

Speaker 13 low ability to execute on these interfaces with other systems, we announced something called Let's Chat, which was sort of a mode of Alexa.

Speaker 13 So think about, you know, flipping a switch on Alexa and turning on a chat interface so that you can do some basic question and answer and have a discussion about a topic, mostly about knowledge native to the model's training data versus bringing something in at runtime the way that modern chatbots answer questions by going out on the internet.

Speaker 13 I think what we mostly learned from that announcement and the customers that we rolled out to is that we just had to increase our vision and do something more audacious, basically.

Speaker 13 Customers really wanted, and we all really wanted, to pick up from where Alexa is and was and extend all of those capabilities. That is many millions of things that Alexa can do.

Speaker 13 And when you count the tens of thousands of services and devices that are integrated with Alexa and the space of the interfaces and the systems that you need to integrate with, it's incredibly large.

Speaker 13 So that's the first sort of technical challenge I mentioned before, sort of the first and probably most important bucket. Second is really grounding it in authoritative sources.

Speaker 13 I think, as all of us know, you can sit there and fiddle with a chatbot long enough to precede to being smarmy or responding in ways that we don't believe are the way Alexa might act, for example, or press it to give you wrong information that's from some unauthoritative source or a mistake in its training data online shifts back to its native training.

Speaker 13 So getting Alexa to speak confidently in her personality with authority and answer questions right. It's another key challenge.

Speaker 13 Personalizing an experience of this depth so that Alexa is always learning from her interactions with you and extending your interactions so they get more delightful over time.

Speaker 13 I think this is something you probably wouldn't have seen in a weekend's worth of fiddling with the experience. You'll see it gets more personalized.

Speaker 13 That's another big technical challenge because the surface area is so much bigger. So those are a few of the reasons why it sort of took so long.

Speaker 13 And I think if you rewind the tape to 2023, it's really about learning

Speaker 13 how big a project Alexa Plus would be and then starting to put one foot in front of the other, really inventing the space of creating those integrations because it just hasn't been done.

Speaker 2 What's an example of some early failure mode that you all had to overcome? I mean, I've heard some stories from folks who have worked on Alexa or worked with suppliers who provide models to Alexa.

Speaker 2 They would tell me stories about, you know,

Speaker 2 you'd ask Alexa to set a timer for you and it would like write you an essay about the history of timers. It was just sort of misunderstanding the request in the way that a large language model might.

Speaker 2 So tell us some of those stories.

Speaker 13 Well, that's a good one. I mean, verbosity was definitely an early issue.

Speaker 3 It continues to be an issue on our podcast, by the way.

Speaker 2 We still haven't solved it.

Speaker 13 I've got some training ideas.

Speaker 2 Okay, good.

Speaker 13 You know, verbosity, these models want to give you an extensive answer.

Speaker 13 Customers don't want an extensive answer read out, and they certainly don't want a disquisition on, you know, the nature of timers, right? What they want is an interface that sets a spaghetti timer.

Speaker 2 And how do you get them to do that? Is it just as simple as putting in the system prompt, like, if a customer asks for a timer, like, don't give them an essay on the history of timers or be concise?

Speaker 2 Or how do you actually solve that problem?

Speaker 13 I would love it if it were that easy.

Speaker 13 You need a set of models. There are over 70 models in Alexa Plus.

Speaker 13 It's a vast space. There are different models specialized in different tasks.

Speaker 13 There are different corpuses of training data we use on different models to get them to complete instruction sets for us and really follow the rules of the road in interfacing with something.

Speaker 13 You always need to loop back to central central systems that are maintaining context and the conversation and picking up from references and pronouns that you've used to refer back in time, right, and sort of cascade those forward.

Speaker 13 But the amount of work that went into just the interface between a large language model and the downstream systems that complete tasks is, I mean, it's the biggest body of work that we've put in.

Speaker 13 And there's It would be too much to even try, without a whiteboard here, it would be too much to even try to explain to you and your listeners, I think, the technical depth that went into it.

Speaker 13 We've got a great team working on it, and it's hard.

Speaker 2 Of those 70 models in Alexa Plus, how many are Amazon's own in-house models versus models like Claude that you all get from external companies? Aaron Ross Powell, there's a mix.

Speaker 13 So, you know, the best way to know what models in our Alexa Plus is just go to the Bedrock webpage and look at an update on the latest in there.

Speaker 13 We use the best tools that we have available to us for the job. We've got great partners over in AWS helping make sure we've got the right best tools for the job.

Speaker 13 Most of our traffic does flow through Amazon Nova models. We have the most control over how those get trained and tuned and post-trained.

Speaker 13 I think it's over 80% of traffic on sort of the main big inferences within the system flow through Nova models. But there are many different reasons to use many different models.

Speaker 13 I think you guys know better than most that models are special, you know, they specialize in different things. So we use the best tool for the job.

Speaker 3 Can you give us a sense of like how big the team is that's working on Alexa? Like, how big of a priority is this within Amazon?

Speaker 13 It's thousands of people. Okay.

Speaker 2 Yeah.

Speaker 13 Now, that's building hardware.

Speaker 13 It's building Alexa Plus. It's integrating with all those systems.
It's adding new integrations and new things that Alexa can do.

Speaker 13 It's a pretty vast scope. So it takes a big team.

Speaker 3 Yeah.

Speaker 2 There was a former

Speaker 2 machine learning scientist at Alexa AI, Mihail Eric, who did a long post on X last year, sort of his version of a post-mortem or a retrospective on what was happening with Alexa.

Speaker 2 And he wrote that Amazon had, quote, all the resources, talent, and momentum to become the unequivocal market leader in conversational AI.

Speaker 2 But then he said that Amazon and Alexa had fumbled the ball because Alexa was, quote, riddled with technical and bureaucratic problems, sort of made it seem like the problem was not just that the technology was an uneasy fit, but that there were also some organizational and bureaucracy problems that had to be solved.

Speaker 2 Can you talk a little bit about that?

Speaker 13 I won't comment on that post in particular. Honestly, I don't remember it, but there is definitely,

Speaker 13 I would say, sort of a startup culture transformation happening within the Alexa team. I think, you know, the life cycle of any product that's been around for 10 years, right, it has ups and downs.

Speaker 13 But I think our rate of innovation had slowed down. And I think coming through for customers on integrating these new powerful tools is something that's really quickened and inspired the team.

Speaker 13 I don't identify with the bureaucratic comment.

Speaker 13 Maybe it's a comment about me, so maybe I won't identify with it. I don't know.
But I do think the team is inspired.

Speaker 13 It's inspired by the vision, executing at an unbelievable pace and really

Speaker 13 creating a lot of invention because there are a lot of really hard problems.

Speaker 2 I'm curious where the new Alexa sits in relation to Amazon's overall AI ambitions. You know, this is a company that has offered a lot of AI models through AWS,

Speaker 2 big market share in cloud-based AI, also recently started an AGI lab at Amazon that is going to be pushing toward something like an artificial general intelligence.

Speaker 2 Is Alexa part of that overall effort to create and serve more capable AI systems? Or is this sort of a consumer targeted sort of spin-off of those efforts?

Speaker 13 I would say we do believe, and I share this belief,

Speaker 13 the leadership team at Amazon has,

Speaker 13 that

Speaker 13 this generation of generative AI is going to transform every customer experience we have. And that means, I mean, we have a lot of different types of customers.
You mentioned AWS.

Speaker 13 We have enterprise business customers. We have consumer customers.
We offer a very big landscape of services.

Speaker 13 I know that at some point within the last year, we counted and there's over a thousand different

Speaker 13 AI efforts going on with consumer applications alone.

Speaker 13 So if you sort of look at the scale and scope of what Amazon does and, you know, assume our belief that every experience will be transformed with generative AI, it's as big as Amazon is at that point.

Speaker 13 I would also say just internally, it's part of how we work now.

Speaker 13 To be as productive as you can be in this day and age and get as much done for customers as we aspire to, you have to build AI into how you're working. You both do this.
I know that.

Speaker 13 I'm sure many of your listeners do too, but it's certainly part of what's going on at Amazon as well. Yeah.

Speaker 2 Okay. Well, Daniel, we have some product feedback for you.
Let's do it. As they say, feedback is a gift.
Always. So we'd like to give you a chance.
It's Christmas.

Speaker 2 Casey, why don't you start? All right. Well, so

Speaker 2 let's see.

Speaker 3 I feel like most of my feedback is less about like Alexa Plus as an AI than it is about like Alexa Plus in the actual like hardware that I got.

Speaker 3 I first started with the Echo Show 5, which does say on the website that it is Alexa Plus enabled, but then some of your folks were like, no, like to get the full experience, you should get the 15.

Speaker 3 So I sort of had the two experiences.

Speaker 3 On the five,

Speaker 3 my first observation was that after I told it I would like to see art, I feel like every time I looked over at it, it was asking me if I wanted to buy paper towels or Advil or something.

Speaker 3 That was a little bit less the case once I got the 15.

Speaker 3 I don't know why that might have been, but I felt like the Alexa Plus AI thinks of me primarily as a person who might send more money to Amazon if you just sort of gave me a few more ideas for how I might do that.

Speaker 3 And what I would love is if it evolved to treat me like a person who isn't like constantly looking to buy paper towels. You know what I mean?

Speaker 3 So that I think is actually my biggest piece of feedback was I wanted fewer ads, fewer reminders that Amazon music exists, fewer reminders that Amazon Prime Video exists.

Speaker 3 Just like get to know me as a person a little bit. That's my big feedback.

Speaker 13 Subject line. Yeah.

Speaker 13 Enough with the paper towels.

Speaker 2 Enough with the paper towels.

Speaker 3 If I say I want to see art, I really mean it. So I get it because you want to show everything that your hardware could do.
You worked very hard on it. It can do many things.

Speaker 3 You want to like showcase all of those things. but I do think it comes across as a kind of insecurity in the device.

Speaker 3 Like if we're not constantly showing you everything that we've built into this thing, you'll never discover it and you'll put this thing in a drawer.

Speaker 3 I understand the pressures that you're under and I understand why it has evolved this way.

Speaker 3 But I, I have to say, when I unplugged it, I felt more relaxed because it wasn't giving me a list of things to do.

Speaker 3 And I didn't feel that way about my original Alexa, which is like great at the things that it does. So that's, I know, I know that's a lot, but those were my emotions.

Speaker 13 The first one to me, the five, the Echo Show five feedback sounds like a bug.

Speaker 13 I don't know what state it got into, but just if you asked for artwork and that's not what it was showing to you, that one sounds like a bug.

Speaker 13 The latter part, it might just be you have a different reaction than most of our customers do to the, to the onboarding experience is maybe what it sounds like, or you're just looking for more diverse things.

Speaker 13 I will be curious to follow up with. you in a week and find out if your use has helped shape the nature of what we're showing you.

Speaker 13 That is certainly our intention is that when you're onboarding to the new experience, the types of things you're asking for are the types of things we're showing you. And that could be anything.

Speaker 13 Like one of my most delightful,

Speaker 13 we have a new element called For You, which is a place where we post little notifications about things we think you might be interested in.

Speaker 13 And I had been helping my daughter study the periodic table for part of her chemistry final.

Speaker 13 And I was never great at remembering, in particular, the elements that like you need a mnemonic for, like lead or, you know, they don't match

Speaker 13 right pb very good so you were good at chemistry obviously yeah nailed it very low latency on this you know made them mnemonics uh but i had done that the night before and when i came in in the morning my for you said you know uh

Speaker 13 should we should we make a chemistry quiz for ellie or something like that i was like write a chemistry quiz for ellie and with the generative content capabilities of Alexa Plus, I just said, yeah, let's try that.

Speaker 13 Like, let's make, can we make a sheet of all of the elements that aren't intuitive?

Speaker 3 Now, did it also ask you if you wanted to buy lead?

Speaker 2 It didn't ask me that,

Speaker 13 which, you know, it's a product safety thing, so I'm glad we ticked that box.

Speaker 13 We will have to just look and see what the like the extent of the Amazon services being

Speaker 13 shown to you. But, you know, I will tell you that the body of feedback that we get from customers doesn't accord with that specific version of it.
Definitely customers want to learn what they can do.

Speaker 13 That is one of the biggest things that we hear from customers.

Speaker 13 I want to come back to what you said about unplugging unplugging the device and plugging it back in.

Speaker 13 We made the Alexa Plus experience incredibly easy to get out of and get back into and get out of, which is not true for sort of like an OS update, right? It's very hard to go backwards.

Speaker 13 And we worked very hard to try to make it possible because we knew there would be so much change. The very high 90%.
tile of customers stick to the new experience. And I love that.

Speaker 3 That makes sense to me. I mean, it's clearly much more capable.
It can do more stuff. And I know it's going to evolve and presumably improve over time.

Speaker 3 So yeah, no part of me was like, I want to go back to the old experience. I was just like, wow, this is very intense.

Speaker 3 Honestly, I think the bigger shift that I experienced was going from like just a pure speaker to something with a screen.

Speaker 3 Like that actually feels bigger than the challenge. I understand that.

Speaker 2 Piggyback on Casey's question, I think this is one of the big questions about.

Speaker 2 the Alexa business model is whether you see this as something that is going to make money on its own or whether this is primarily sort of a way of increasing the amount of money that people spend on Amazon.

Speaker 2 You know, I spend an ungodly amount of money on Amazon.

Speaker 13 Thank you for your business.

Speaker 2 Thank you for your business.

Speaker 2 A large fraction of my income is spent on various things on Amazon. And so I'm well aware of the many products that exist on Amazon.com, the website.

Speaker 2 I do not need like ads to be cascading on my screen telling me to buy more stuff on Amazon. But it does seem like this is primarily going to be an ad-supported product.

Speaker 2 Andy Jassy recently said on the earnings call for their most recent quarter that you all were trying to bring more advertising experiences to Alexa Plus. So talk to us about that.

Speaker 2 And like, are we just going to inevitably be more annoyed at the number of ads that are showing up on these devices?

Speaker 13 I definitely don't think you'll inevitably be more annoyed.

Speaker 13 I would say advertising is definitely part of the business plan, but it's not the biggest part. It's actually...

Speaker 13 probably the smallest part. The most important decision we made on the business side with Alexa Plus was bringing it into prime.

Speaker 13 And putting it into prime sort of brings together both all of a customer's prime benefits, because you might watch a video or listen to a song from Amazon Music or use your Amazon Photos benefit, which is awesome with an Echo Show and review your family photos.

Speaker 13 I use that all the time to look back at the kids in particular. But,

Speaker 13 you know, you have this long list of prime benefits.

Speaker 13 Alexa is a great place where they come together and putting in the value of having this world's best personal assistant into Prime just turns the Prime flywheel.

Speaker 13 And we know every time we've added a benefit to Prime, customers use their Prime benefits more. It's stickier to them.
It provides them more value and it turns into a great business.

Speaker 2 That's the goal. Okay, so Casey's feedback was about advertising.
Mine is about some of these new features that don't work and some of the old features that actually don't work either.

Speaker 2 So some of the more complicated stuff that I tried with Alexa Plus, such as setting up routines that involve involve multiple steps, such as emailing documents or research papers to the Alexa email address and trying to have it summarized, these things.

Speaker 2 These things just didn't work for me. The routines didn't run.
The papers didn't show up to be summarized. I assume this is just sort of growing pains and beta testing bugs and things like that.

Speaker 2 What I found more sort of frustrating that I wanted to ask you about because I'm actually not sure why this happens was that some of the basic features that Alexa had previously been good at and reliable at for me were less reliable with Alexa Plus.

Speaker 2 So this morning, for example, I tried to cancel an alarm that was about 10 minutes from going off and Alexa just didn't listen, didn't hear me. The alarm went off anyway.

Speaker 2 So help me understand why that is. Is that like a hallucination of the model? Is that a problem that is sort of related to the orchestration of the various tasks and sending it to the right place?

Speaker 2 What is going on there?

Speaker 13 Honestly, we'd have to dive deep into each of those to figure it out.

Speaker 13 part you know early access is here as a program to cover off on these kinds of issues and to make sure customers know that they can opt into alexa plus they can opt out if they want again the very vast majority of customers stick to it the

Speaker 13 The key challenges probably in everything you said are that interface between large language models and these more predictable rule-based systems that communicate through APIs.

Speaker 13 Something like canceling an alarm, making sure we find out the exact intent of what you were looking for, translating that into a set of commands, and then issuing those commands to an API, sometimes it does fail.

Speaker 13 It's rarely at this point because of hallucination. We've got so much going on to monitor for model hallucinations.

Speaker 13 It is sometimes because of just incorrect use of an API or just misunderstanding exactly where to send those commands.

Speaker 13 So that's more likely the case in each of these cases. Got it.

Speaker 2 I will give you one more piece of feedback, which is actually not for me. This is for my three-year-old son,

Speaker 2 who is our house's most active Alexa user. He talks to Alexa all the time, probably more than he talks to us.
Should I be concerned about that? Maybe, but we'll save that for a later episode.

Speaker 2 But he was doing story time with this because he wants to, you know, constantly wants more stories about various vehicles, you know, various dinosaurs.

Speaker 2 And so we were doing a story time about a super Tro truck that rescues cars from the water. And he asked for another one and it gave him a totally different set of characters.

Speaker 2 So, if there's some way for kids to have like a kind of, you know, their own private cinematic universe, persistent cinematic universes for super tow trucks, I know at least one three-year-old would really appreciate it.

Speaker 13 I got it. Excellent product description, by the way.
I like that for sure.

Speaker 13 I agree that children, as they explore, it doesn't even have to be an imaginary friend, but they do love themes and they love to continue them. So, it's great.
That's great feedback.

Speaker 13 We'll take that to the team.

Speaker 2 Yeah. For all of our feedback, I actually am very glad I've got to try this.

Speaker 2 I'm going to keep testing it. We are very active Alexa users in my household.
So we'll keep sending you our feedback.

Speaker 13 That's awesome. Yeah.

Speaker 3 We like trying new things around here.

Speaker 2 Yeah. Daniel, thanks so much for coming.
Thanks, Daniel.

Speaker 13 Really appreciate your time, guys. Thanks a lot.

Speaker 13 Oh,

Speaker 2 wait. What was that? Did you just set off your Alexa?

Speaker 3 Oh, Siri, stay out of this.

Speaker 3 Gosh, she's got a lot of nerve coming into this podcast recording.

Speaker 2 Wow.

Speaker 1 As a small business owner, you don't have the luxury of clocking out early. Your business is on your mind 24-7.
So when you're hiring, you need a partner that works just as hard as you do.

Speaker 1 That hiring partner is LinkedIn Jobs. When you clock out, LinkedIn clocks in.

Speaker 1 LinkedIn makes it easy to post your job for free, share it with your network, and get qualified candidates that you can manage all in one place. Post your job.

Speaker 1 LinkedIn's new feature can help you write job descriptions and then quickly get your job in front of the right people with deep candidate insights. Either post your job for free or pay to promote.

Speaker 1 Promoted jobs get three times more qualified applicants. At the end of the day, the most important thing to your small business is the quality of candidates.

Speaker 1 And with LinkedIn, you can feel confident that you're getting the best. Find out why more than 2.5 million small businesses use LinkedIn for hiring today.
Find your next great hire on LinkedIn.

Speaker 1 Post your job for free at linkedin.com slash hard fork. That's linkedin.com slash hard fork to post your job for free.
Terms and conditions apply.

Speaker 10 Keeping a forest fire resistant synonymous with keeping the forest healthy.

Speaker 5 Visit workingforestsinitiative.com to learn more.

Speaker 2 Well, Casey, we've got some good news and bad news. The bad news is that our hat promo is coming to an end.

Speaker 2 So as of this week, the limited time offer to get a free hard fork hat along with a new annual New York Times audio subscription is running out.

Speaker 2 This is your last chance to get this very cool limited edition hat as a thank you for subscribing to New York Times Audio.

Speaker 2 And these hard fork hats are only available to subscribers in the United States. The good news is that we are going to have more hats available for sale.

Speaker 2 A different hat with a slightly different design is going to be available in the New York Times store, we're told pretty soon.

Speaker 2 So look out for that if you already subscribe to New York Times Audio or you just want the hat without the subscription.

Speaker 2 We are told that the next wave of hard fork hats will be available internationally as well.

Speaker 3 Hard Fork is produced by Whitney Jones and Rachel Cohn. We're edited by Jen Poyant.
We're fact-checked by Caitlin Love. Today's show was engineered by Chris Wood.

Speaker 3 Original music by Marian Lozano, Diane Wong, Rowan Nemisto, and Dan Powell. Video production by Sawyer Roquet, Pat Gunther, Jake Nicol, and Chris Schott.

Speaker 3 You can watch this whole episode on YouTube at youtube.com/slash hard fork. Special thanks to Paula Schuman, Hui Wing Tam, Dahlia Haddad, and Jeffrey Miranda.

Speaker 3 You can email us at hardfork at nytimes.com with a story about when you fainted.

Speaker 14 Keeping a forest fire-resistant synonymous with keeping a forest healthy.

Speaker 5 Visit WorkingForestsInitiative.com to learn more.

GPT-5 Arrives, and We Try the New Alexa+

Press play and read along

Transcript

More episodes from Hard Fork

Hard Fork’s 50 Most Iconic Technologies of 2025

The Interview: How Wikipedia Is Responding to the Culture Wars

We Asked Roblox’s C.E.O About Child Safety. It Got Tense.

Google's Gemini 3 Is Here: A Special Early Look

Data Centers in Space + A.I. Policy on the Right + A Gemini History Mystery