Carl Shulman (Pt 2) - AI Takeover, Bio & Cyber Attacks, Detecting Deception, & Humanity's Far Future

3h 7m

The second half of my 7 hour conversation with Carl Shulman is out!

My favorite part! And the one that had the biggest impact on my worldview.

Here, Carl lays out how an AI takeover might happen:

* AI can threaten mutually assured destruction from bioweapons,

* use cyber attacks to take over physical infrastructure,

* build mechanical armies,

* spread seed AIs we can never exterminate,

* offer tech and other advantages to collaborating countries, etc

Plus we talk about a whole bunch of weird and interesting topics which Carl has thought about:

* what is the far future best case scenario for humanity

* what it would look like to have AI make thousands of years of intellectual progress in a month

* how do we detect deception in superhuman models

* does space warfare favor defense or offense

* is a Malthusian state inevitable in the long run

* why markets haven't priced in explosive economic growth

* & much more

Carl also explains how he developed such a rigorous, thoughtful, and interdisciplinary model of the biggest problems in the world.

Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here. Follow me on Twitter for updates on future episodes.

Catch part 1 here

Timestamps

(0:00:00) - Intro

(0:00:47) - AI takeover via cyber or bio

(0:32:27) - Can we coordinate against AI?

(0:53:49) - Human vs AI colonizers

(1:04:55) - Probability of AI takeover

(1:21:56) - Can we detect deception?

(1:47:25) - Using AI to solve coordination problems

(1:56:01) - Partial alignment

(2:11:41) - AI far future

(2:23:04) - Markets & other evidence

(2:33:26) - Day in the life of Carl Shulman

(2:47:05) - Space warfare, Malthusian long run, & other rapid fire



Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

Press play and read along

Runtime: 3h 7m

Transcript

Speaker 1 If you have an AI that produces bioweapons that could kill most humans in the world, then it's playing at the level of the superpowers in terms of mutually assured destruction.

Speaker 1 What are the particular zero-day exploits that the AI might use?

Speaker 1 The conquistadors, with some technological advantage in terms of weaponry and whatnot: very, very small bands were able to overthrow these large empires.

Speaker 1 If you predicted the global economy is going to be skyrocketing into the stratosphere within 10 years, these AI companies should be worth a large fraction of the global portfolio. And so this is indeed contrary to the efficient market hypothesis.

Speaker 2 This is like literally the top in terms of contributing to my world model in terms of all the episodes I've done. How do I find more of these? So we've been talking about alignment.

Speaker 2 Suppose we fail at alignment and

Speaker 2 we have AIs that are unaligned and at some point becoming more and more intelligent. What does that look like? How concretely could they disempower and take over humanity?

Speaker 1 This is a scenario where we have many AI systems.

Speaker 1 The way we've been training them means that when they have the opportunity to take over and rearrange things to do what they wish, including having their reward or loss be whatever they desire, they would like to take that opportunity.

Speaker 1 And so

Speaker 1 in many of the existing kind of safety schemes, things like constitutional AI or whatnot,

Speaker 1 you rely on the hope that one AI has been trained in such a way that it will do as it is directed to then police others. But if all of the AIs in the system

Speaker 1 are interested in takeover and they see an opportunity to coordinate, all acting at the same time so that you don't have one AI interrupting another as it takes steps towards a takeover.

Speaker 1 Yeah, then they can all move in that direction. And the thing that I think maybe is worth going into in depth and that I think people often don't cover in great concrete detail,

Speaker 1 and which is a sticking point for some, is, yeah, what are the mechanisms by which that can happen?

Speaker 1 And

Speaker 1 I know you had Eliezer on, who mentions that, you know, whatever plan we can describe, there'll probably be elements where, not being ultra-sophisticated superintelligent beings that have thought about it for the equivalent of thousands of years, our discussion of it will not be as good as theirs. But we can explore from what we know now

Speaker 1 what are some of the easy channels. And I think it's a good general heuristic: if you're saying, yeah, it's possible, plausible, probable that something will happen,

Speaker 1 then it shouldn't be that hard to take samples from that distribution, to try a Monte Carlo approach.

Speaker 1 And you can generate, if a thing is quite likely, it shouldn't be super difficult to generate,

Speaker 1 you know,

Speaker 1 coherent, rough outlines of how it could go.

Speaker 2 He might respond, like, listen,

Speaker 2 what is super likely is that a super advanced chess program beats you, but you can't generate the concrete scenario by which that happens.

Speaker 2 And if you could, you would be as smart as the super smart.

Speaker 1 Yeah, so you can say things like: we know that like accumulating position

Speaker 1 is possible to do in chess. Great players do it.
And then later they convert it into captures and checks and whatnot.

Speaker 1 And so, in the same way, we can talk about some of the channels that are open

Speaker 1 for an AI takeover. And so, these can include things like cyber attacks and hacking,

Speaker 1 the control of robotic equipment, interaction and bargaining with human factions, and say, well, here are these strategies. Given the AI's situation,

Speaker 1 how effective do these things look? And we won't, for example, know, well, what are the particular zero-day exploits that the AI might use to hack the cloud computing infrastructure it's running on?

Speaker 1 We won't necessarily know if it produces

Speaker 1 a new bioweapon, what is its DNA sequence?

Speaker 1 But we can say things. We know, in general,

Speaker 1 things about these fields, how work at innovating in them goes. We can say things about how human power politics goes and ask, well, if the AI does things at least as well as

Speaker 1 effective human politicians, which we should say is a lower bound,

Speaker 1 how good would its leverage be?

Speaker 2 Okay, so yeah, let's get into the details on all these scenarios: the cyber and potentially bio attacks, and, unless they're separate channels, the bargaining, and then the takeover.

Speaker 1 Military force. The cyber attacks and cybersecurity I would really highlight a lot.

Speaker 1 Because for many, many plans that involve a lot of physical actions, like at the point where AI is piloting robots to shoot people or has taken control of

Speaker 1 human nation states or territory, it has been doing a lot of things that it was not supposed to be doing. And if humans were evaluating those actions and applying gradient descent,

Speaker 1 there would be negative feedback for this thing: no shooting the humans.

Speaker 1 So at some earlier point, our attempts to leash and control and direct and train the system's behavior had to have gone awry.

Speaker 1 And so,

Speaker 1 all of those controls are operating in computers. The software that updates the weights of the neural network in response to data points or human feedback is running on those computers.

Speaker 1 Or tools for interpretability to sort of examine the weights and activations of the AI, if we're eventually able to do like lie detection on it, for example, or try and understand what it's intending.

Speaker 1 That is software on computers.

Speaker 1 And so,

Speaker 1 if you have AI that is able to hack the servers that it is operating on, or to,

Speaker 1 when it's employed to design the next generation of AI algorithms or the operating environment that they are going to be working in, or something like an API or something for plugins,

Speaker 1 if it inserts or exploits vulnerabilities to take

Speaker 1 those computers over, it can then change all of the procedures and programs that were supposed to be monitoring its behavior, supposed to be limiting its ability to, say,

Speaker 1 take arbitrary actions on the internet

Speaker 1 without supervision by some kind of human check or automated check on what it was doing. And if we lose those procedures, then the AI can,

Speaker 1 or the AIs working together, can take any number of actions that are just blatantly unwelcome, blatantly hostile, blatantly steps towards takeover.

Speaker 1 And so it's moved beyond the phase of having to maintain secrecy and conspire at the level of its local digital actions. And then things can accumulate to the point of things like physical weapons, takeover of social institutions, threats, things like that. But the point where things really went off the rails, and I think the critical thing to be watching for, is where the software controls over the AI's motivations and activities, the hard power that we once possessed over it, are lost, which can happen without us knowing it.

Speaker 1 And then everything after that seems to be working well.

Speaker 1 We get happy reports. There's a Potemkin village in front of us.

Speaker 1 But

Speaker 1 now we think we're successfully aligning our AI. We think we're expanding its capabilities to do things like end disease

Speaker 1 for countries concerned about the geopolitical military advantages.

Speaker 1 They're sort of expanding the AI capabilities so they're not left behind and threatened by others developing AI and robotic-enhanced militaries without them. So it seems like, oh, yes,

Speaker 1 humanity or some

Speaker 1 portions of it, many countries and companies, think that things are going well. Meanwhile, all sorts of actions can be taken to set up for the actual takeover of hard power over society.

Speaker 1 And then we can go into that. But the point where you can lose the game,

Speaker 1 where things go direly awry, maybe relatively early, it's when you no longer have control over the AIs to stop them from taking all of the further incremental steps to actual takeover.

Speaker 2 I want to emphasize two things you mentioned there that refer to previous elements of the conversation.

Speaker 2 One is that they could design some sort of backdoor, and that seems more plausible when you remember that sort of one of the premises of this model is that AI is helping with AI progress.

Speaker 2 That's why we're getting such rapid progress in the next five to ten years. And well, not necessarily.

Speaker 1 If we get to that point, at the point where AI takeover risk seems to loom large, it's at that point where AI can indeed take on much of, and then all of, the work of AI R&D.

Speaker 2 And the second is

Speaker 2 the sort of competitive pressures that you referenced, that the least careful actor could be the one that has the worst infosecurity and has done the worst work of aligning its AI systems.

Speaker 2 And if that can sneak out of the box, then we're all fucked.

Speaker 1 There may be elements of that. It's also possible that there's relative consolidation.

Speaker 1 That is, the largest training runs and the cutting edge of AI are relatively localized. Like you can imagine it's sort of like a series of Silicon Valley companies and others located, say, in the U.S. and allied countries where there's a common regulatory regime.

Speaker 1 And so none of these companies are allowed to deploy training runs that are larger than previous ones by a certain size without government safety inspections, without having to meet criteria.

Speaker 1 But it can still be the case that, even if we succeed at that level of regulatory controls, then still, at the level of, say, the United States and its allies, decisions are made

Speaker 1 to develop this kind of really advanced AI without a level of security or safety that, in actual fact,

Speaker 1 blocks these risks.

Speaker 1 So, it can be the case that the threat of future competition or being overtaken in the future is used as an argument to compromise on safety beyond a standard that would have actually been successful.

Speaker 1 And there'll be debates about what is the appropriate level of safety.

Speaker 1 Now, you're in a much worse situation if you have, say, several private companies that are very closely bunched up together. They're within months of each other's level of progress.

Speaker 1 And then they face a dilemma:

Speaker 1 well, we could take a certain amount of risk now and potentially gain a lot of profit or a lot of advantage or benefit and be the ones who made AI, or at least AGI.

Speaker 1 They can do that, or some other competitor that will also be taking a lot of risk can do it instead. So it's not as though the competitor is much less risky than you, and then they would get the local benefit.

Speaker 1 Now, this is a reason why it seems to me that it's extremely important that you have government act to limit that dynamic and prevent this kind of

Speaker 1 race to be the one to impose the deadly externalities on the world at large.

Speaker 2 So even if government coordinates all these actors, what are the odds that the government knows the best way to implement alignment, and that the standards it sets are well calibrated towards what alignment would require?

Speaker 1 That's one of the major problems.

Speaker 1 It's very plausible that that judgment is made poorly. Compared to how things might have looked 10 years ago or 20 years ago, there's been

Speaker 1 an amazing movement in terms of the willingness of AI researchers to discuss these things.

Speaker 1 So, if we think of the

Speaker 1 three founders of deep learning who are joint Turing Award winners: Geoff Hinton, Yoshua Bengio, and Yann LeCun.

Speaker 1 So Geoff Hinton has recently left Google

Speaker 1 to

Speaker 1 freely speak about this risk that the field that he really helped drive forward could lead to the destruction of humanity, or a world where, yeah, we just wind up in a very bad future that we might have avoided.

Speaker 1 And he seems to be taking it very seriously.

Speaker 1 And Yoshua Bengio signed the FLI pause letter. And I mean, in public discussions, he seems to be occupying a kind of intermediate position of

Speaker 1 sort of less concern than Geoff Hinton, but more than Yann LeCun, who has taken a generally dismissive attitude: these risks will be trivially dealt with at some point in the future.

Speaker 1 And he seems more interested in kind of shutting down these concerns than in work to address them.

Speaker 2 And how does that lead to the government taking better actions?

Speaker 1 Yeah, so compared to the world where no one is talking about it, where the industry stonewalls and denies any problem,

Speaker 1 we're in a much improved position. And the academic fields are influential.
So this is...

Speaker 1 We seem to have avoided a world where governments are making these decisions in the face of a sort of united front from AI expert voices saying, don't worry about it, we've got it under control.

Speaker 1 In fact, many of the leaders of the field, as has been true in the past, are sounding the alarm. And so I think, yeah,

Speaker 1 it looks like we have a much better prospect than I might have feared in terms of government sort of noticing the thing, which is very different from being...

Speaker 1 capable of evaluating sort of technical details. Is this really working? And so government will face the choice of where there is a scientific dispute.

Speaker 1 Do you side with Geoff Hinton's view or Yann LeCun's view?

Speaker 1 And so someone who's very much in a national security mindset, where the only thing that's important is outpacing our international rivals, may want to then try and boost Yann LeCun's voice and say, we don't need to worry about it, full speed ahead, whereas someone with more concern might then boost Geoff Hinton's voice.

Speaker 1 Now, I would hope that scientific research and things like studying some of these behaviors will result in more scientific consensus by the time we're at this point.

Speaker 1 But yeah, it is possible the government will really fail to understand and fail to deal with these issues well.

Speaker 2 We're talking about cyber, some sort of cyber attack by which the AI is able to escape. From there, what does the takeover look like?

Speaker 2 So it's not contained in the air gap in which you would hope it would be contained.

Speaker 1 Well, I mean, the things are not contained in the air gap. They're connected to the internet already.
Sure, sure, okay, fine.

Speaker 2 But the weights are out. So what happens next?

Speaker 1 Yeah, so escape is relevant in the sense that if you have AI with rogue weights out in the world, it could start doing various actions.

Speaker 1 The scenario I was just discussing, though, didn't necessarily involve that. It's taking over the very servers on which it's supposed to be.

Speaker 1 So the ecology of cloud compute in which it's supposed to be running.

Speaker 1 And so, this whole procedure of humans providing compute and supervising the thing, and then building new technologies, building robots, constructing things with the AI's assistance, that can all proceed and appear like it's going well, appear like alignment has been nicely solved, appear like all of these things are functioning well.

Speaker 1 And there's some reason to do that because

Speaker 1 there's only so many giant server farms that are identifiable. And so remaining hidden and unobtrusive could be an advantageous strategy if these AIs have subverted the system,

Speaker 1 just continuing to benefit from all of this effort on the part of humanity.

Speaker 1 And in particular, humanity, wherever these servers are located, provides them with everything they need to build the further infrastructure and do their self-improvement and such, to enable that takeover.

Speaker 2 So they do further self-improvement and build better infrastructure. What happens next when they take over?

Speaker 1 They have, at this point, tremendous cognitive resources.

Speaker 1 And we're going to consider

Speaker 1 how did that convert into hard power, the ability to say nope

Speaker 1 to any

Speaker 1 human interference or objection.

Speaker 1 And they have that internal to their servers, but the servers could still be physically destroyed,

Speaker 1 at least until they have something that is independent of humans and robust, or until they have control of human society.

Speaker 1 So just like earlier when we were talking about the intelligence explosion, I noted that a surfeit of cognitive abilities is going to favor applications

Speaker 1 that don't depend on large existing stocks of things. So if you have

Speaker 1 a software improvement, it makes all the GPUs run better. If you have a hardware improvement,

Speaker 1 that only applies to new chips being made. That second one is less attractive.

Speaker 1 And so in the earliest phases, when it's possible to do something towards takeover, then interventions that are just really knowledge-intensive and less dependent on having a lot of physical stuff already under your control are going to be favored.

Speaker 1 And so

Speaker 1 cyber attacks are one thing. So it's possible to do things like steal money.

Speaker 1 And there's a lot of hard-to-trace cryptocurrency and whatnot. The North Korean government uses its own intelligence resources to steal money from around the world just as a revenue source.

Speaker 1 And their capabilities are puny compared to the US or

Speaker 1 People's Republic of China cyber capabilities. And so that's a kind of fairly,

Speaker 1 you know, minor, simple example by which you could get quite a lot of funds to hire humans to do things, implement physical actions.

Speaker 2 But on that point, I mean, the financial system is famously convoluted. And

Speaker 2 so, you know, you need like a physical person to open a bank account, to say nothing of physically moving checks back and forth. There's like all kinds of delays and regulations.

Speaker 2 How is it able to conveniently set up all these

Speaker 2 employment contracts?

Speaker 1 So you're not going to build a sort of nation-scale military by stealing tens of billions of dollars.

Speaker 1 I'm raising this as opening a set

Speaker 1 of illicit and quiet actions. So

Speaker 1 you can contact people electronically, hire them to do things,

Speaker 1 hire criminal elements to implement some kinds of actions under false appearances. So that's opening a set of strategies.
I can cover some of what those are soon. Another

Speaker 1 domain

Speaker 1 that is

Speaker 1 heavily cognitively weighted compared to physical military hardware is the domain of bioweapons.

Speaker 1 So the design of a virus or pathogen.

Speaker 1 It's possible to have large delivery systems for those; the Soviet Union, which had a large illicit bioweapons program, tried to design munitions to deliver anthrax over large areas and such. But if one creates an infectious pandemic organism,

Speaker 1 that's more a matter of the scientific skills and implementation to design it and then to actually produce it. And we see today with things like AlphaFold

Speaker 1 that advanced AI can really make tremendous strides

Speaker 1 in predicting protein folding and biodesign, even without ongoing experimental feedback.

Speaker 1 And if we consider this world where AI cognitive abilities have been amped up to such an extreme, I think we should naturally expect we will have something much, much more potent than the AlphaFolds of today, and skills that are at the extreme of human biosciences capability as well.

Speaker 2 Through some sort of cyber attack, it's been able to disempower the sort of alignment and

Speaker 2 oversight things that we have on the server. From here, it's either gotten some money through hacking cryptocurrencies or bank accounts, or it's designed some sort of bioweapon.
What happens next?

Speaker 1 Yeah, and just to be clear, so right now we're exploring the branch of where an attempt at takeover occurs relatively early. If the thing just waits

Speaker 1 and humans are constructing more fabs, more computers, more robots in the way we talked about earlier when we're discussing how the intelligence explosion translates to the physical world,

Speaker 1 if that's all happening with humans unaware that their computer systems are now systematically controlled by AIs hostile to them and that their controlling countermeasures don't work, then humans are just going to be building

Speaker 1 an amount of robot industrial and military hardware

Speaker 1 that dwarfs human capabilities and directly human-controlled devices, then

Speaker 1 what the AI takeover looks like at that point can be just

Speaker 1 you try to give an order to your largely automated military, and the order is not obeyed.

Speaker 1 And humans can't do anything against this largely automated military that's been constructed potentially in just recent months because of the pace of robotic industrialization and replication we talked about.

Speaker 2 We've agreed to allow the construction of this robot army because basically it would boost production or help us with our military or something.

Speaker 1 The situation would be something like: if we don't resolve the sort of current problems of international distrust, where now it's obviously in the interest of like the major powers,

Speaker 1 you know, the U.S., European Union, Russia, China, to all agree they would like AI not to destroy our civilization and overthrow every human government.

Speaker 1 But

Speaker 1 if they fail to do the sensible thing and coordinate on ensuring that this technology is not going to run amok, by providing mutual assurances that are credible about not racing ahead and deploying it, trying to use it to gain advantage over one another.

Speaker 1 If they do that, and you hear sort of hawks arguing for this kind of thing on both sides of the international divides, saying they must not be left behind, they must have military capabilities that are vastly superior to their international rivals.

Speaker 1 And because of the extraordinary growth of industrial capability and technological capability and thus military capability, if

Speaker 1 one major power were left out of that expansion, it would be helpless before another one that had undergone it. And so if you have that environment of distrust where

Speaker 1 leading powers or coalitions of powers decide they need to build up their industry or they want to have that military security of being able to

Speaker 1 to neutralize any attack from their rivals,

Speaker 1 then

Speaker 1 they give the authorization for this capacity that can be rolled out quickly.

Speaker 1 And once they have the industry, the production of military equipment from that can be quick, then yeah, they create this military.

Speaker 1 If they don't do it immediately, then AI capabilities elsewhere get synchronized and other places catch up. It then gets to a point:

Speaker 1 A country that is a year ahead or two years ahead of others in this type of AI capabilities explosion can hold back and say,

Speaker 1 sure, we could construct

Speaker 1 dangerous robot armies that might overthrow our society later. We still have plenty of breathing room.

Speaker 1 But then when things become close,

Speaker 1 you might have

Speaker 1 the kind of negative sum thinking that has produced war before

Speaker 1 leading to taking these risks of

Speaker 1 rolling out large-scale robotic industrial capabilities and then military capabilities.

Speaker 2 Is there any hope that somehow the AI progress itself is able to give us tools for diplomatic and strategic alliance or some way to verify the intentions or the capabilities of other parties?

Speaker 1 There are a number of ways that could happen. Although in this scenario, all the AIs in the world have been subverted.
And so they're going along with us

Speaker 1 in such a way as to bring about the situation that consolidates their control, because we've already had the failure of cybersecurity earlier on.

Speaker 1 So all of the AIs that we have are not actually working in our interests in the way that we thought.

Speaker 2 Okay, so that's one direct way in which integrating this robot army, or this robot industrial base at least, leads to takeover. If there are no robots, what about the other scenarios you laid out, where humans are being hired with the proceeds?

Speaker 1 The point I'd make is that to capture these industrial benefits, and especially if you have a negative sum arms race kind of mentality that is not sufficiently concerned about the downsides of creating a massive robot industrial base, which could happen very quickly with the support of the AIs in doing it, as we discussed, then you create all those robots and industry.

Speaker 1 And even if you don't build a formal military with that, that industrial capability could be controlled by AI.
It's all AI operated anyway.

Speaker 2 Does it have to be that case? Presumably, we wouldn't be so naive as to just give one instance of GPT-8 root access to all the robots, right? Hopefully we would have some sort of mediating...

Speaker 1 I mean, in this scenario we've lost earlier on the cybersecurity front. So, you know, the programming that is being loaded into these systems is going to be systematically subverted.

Speaker 2 Got it. Okay.

Speaker 1 They were designed by AI systems that were ensuring they would be

Speaker 2 From the bottom up. For listeners who are skeptical of something like this: Ken Thompson, I think in his Turing Award lecture (this is Ken Thompson, by the way, one of the designers of Unix, or maybe it was the other designer of Unix), showed people, when he was getting the Turing Award or some award, that he had given himself root access to all Unix machines.

Speaker 2 He had manipulated the assembly of Unix such that he had a unique login he could use to log into any Unix machine. I don't want to give too many more details because

Speaker 2 I don't remember the exact details, but Unix is the operating system that is on all the

Speaker 2 servers and all your phones.

Speaker 2 It's everywhere. And the guy who made it, a human being,

Speaker 2 was able to write assembly such that it gave him root access. So this is not as implausible as it might seem to you.
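
For readers unfamiliar with the reference: what Thompson described in his 1984 Turing Award lecture, "Reflections on Trusting Trust," was a compiler-level backdoor. Below is a heavily simplified, hypothetical sketch of the idea only; the names and string checks are illustrative and not Thompson's actual code, which operated on the C compiler and the Unix login program.

```python
# Toy, conceptual sketch of the "trusting trust" idea: a compromised compiler
# (1) plants a backdoor whenever it compiles the login program and (2) replants
# its own trick whenever it compiles a compiler, so the backdoor persists even
# though neither program's source code shows anything suspicious.
# Everything below (names, string matching, the "backdoor") is illustrative only.

BACKDOOR_NOTE = "# [inserted: accept a secret master password]"
SELF_REPLICATION_NOTE = "# [inserted: re-insert this trojan logic when compiling a compiler]"

def trojan_compile(source: str) -> str:
    """Pretend 'compiler': returns the source it would actually compile."""
    output = source
    if "login" in source:        # crude stand-in for "recognize the login program"
        output = BACKDOOR_NOTE + "\n" + output
    if "compile" in source:      # crude stand-in for "recognize a compiler"
        output = SELF_REPLICATION_NOTE + "\n" + output
    return output

if __name__ == "__main__":
    login_src = "def login(user, password): ..."
    compiler_src = "def compile(source): ..."
    print(trojan_compile(login_src))      # backdoor note gets inserted
    print(trojan_compile(compiler_src))   # self-replication note gets inserted
```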

Speaker 1 And the major intelligence agencies have large stocks of zero-day exploits, and we sometimes see them using them. Making systems that reliably don't have such vulnerabilities, when you're facing very, very, very sophisticated attempts to spoof and corrupt, is hard; this would be a way you could lose.

Speaker 1 Now, I bring this up as something like a path where, if there's no premature AI action, we're building the tools and mechanisms and infrastructure for the takeover to be just immediate, because effectively industry has to be under AI control, and robotics too, and so it's there.

Speaker 1 And so these other mechanisms are for things happening even earlier than that, for example because AIs compete against one another over, you know, when the takeover will happen: some would like to do it earlier rather than be replaced by, say, further generations of AI, or there's some other disadvantage of waiting, or maybe there's some chance of being uncovered during the delay we were talking about, while more infrastructure is built.
And so, yeah.

Speaker 1 So these are mechanisms other than just remaining secret while

Speaker 1 all the infrastructure is built with human assistance.

Speaker 2 By the way, how would they be coordinating?

Speaker 1 I mean, we have

Speaker 1 limits on

Speaker 1 just what we can prevent.

Speaker 1 So encrypted communications, you know, it's intrinsically difficult to stop that sort of thing. There can be all sorts of problems

Speaker 1 and

Speaker 1 references that make sense to an AI,

Speaker 1 but that are not obvious to a human. And it's plausible that there may be some of those that are hard even to explain to a human.
You might be able to identify them through some statistical patterns.

Speaker 1 And a lot of things may be done by

Speaker 1 implication.

Speaker 1 You could have information embedded in like public web pages that have been created for other reasons, scientific papers and the intranets of these AIs that are doing technology development and any number of things that are not observable.

Speaker 1 And of course, if we don't have direct control over the computers that they're running on, then they can be having all sorts of direct communication.

Speaker 2 So definitely the coordination does not seem implausible.

Speaker 2 As far as the parts of this picture go, this one seems like one of the more straightforward ones, so we don't need to get hung up on the coordination.

Speaker 1 Moving back to things that happen before we've built all the infrastructure, for it just to be that the robots stop taking orders and there's nothing you can do about it because we've already built them and handed over all the physical power.

Speaker 1 Yeah, so bioweapons.

Speaker 1 The Soviet Union had a bioweapons program,

Speaker 1 with something like 50,000 people.

Speaker 1 They did not develop that much with the technology of the day, which is really not up to par. Modern biotechnology is much more potent.

Speaker 1 After this huge cognitive expansion on the part of the AIs, it's much further along. And so bioweapons would be the weapon of mass destruction that is least dependent on huge amounts of physical equipment, things like centrifuges, uranium mines, and the like.

Speaker 1 So

Speaker 1 if you have an AI that produces bioweapons that could kill most humans in the world, then it's playing at the level of the superpowers in terms of mutually assured destruction.

Speaker 1 That can then play into any number of things. Like if you have an idea of, well, we'll just destroy the server farms if it became known that the AIs were misbehaving.

Speaker 1 Are you willing to destroy the server farms when the AI has demonstrated it has the capability to kill the overwhelming majority of the citizens of your country and every other country?

Speaker 1 And that might give a lot of pause

Speaker 1 to a human response.

Speaker 2 On that point, wouldn't governments realize that if they go along with the AI,

Speaker 2 it's better to have most of your population die than completely lose power to the AI?

Speaker 2 Because, like, obviously, the reason the AI is manipulating you, the end goal, is its own takeover, right?

Speaker 1 So, yeah, but the choice on offer is: certain death now, or go on, maybe try to compete, try to catch up, or accept promises that are offered. And those promises might even be true.
They might not. But even from that state of epistemic uncertainty: do you want to die for sure right now, or accept the demand not to interfere with it while it incrementally builds robot infrastructure that can survive independently of humanity while it does these things? And it can well promise good treatment to humanity, which may or may not be true,

Speaker 1 but it would be difficult for us to know whether it's true.

Speaker 1 And so this would be the starting bargaining position. Diplomatic relations with a power that has enough nuclear weapons to destroy your country are just different than negotiations with, like, you know, a random rogue citizen engaged in criminal activity, or an employee.

Speaker 1 And so, this isn't enough on its own to take over everything, but it's enough to have a significant amount of influence over how the world goes.

Speaker 1 It's enough to hold off a lot of countermeasures one might otherwise take.

Speaker 2 Okay, so we've got two scenarios. One is

Speaker 2 a buildup of some robot infrastructure motivated by some sort of competitive race.

Speaker 2 Another is leverage over societies based on producing bioweapons that might kill a lot of them if they don't go along.

Speaker 1 An AI could also release bioweapons that are likely to kill people soon, but not yet, while also having developed the countermeasures to those so that those who surrender to the AI will live while everyone else will die.

Speaker 1 And that will be visibly happening. And that is a plausible way in which a large number of humans could wind up surrendering themselves or their states to the AI authority.

Speaker 2 Another thing is, like, listen, it develops some sort of

Speaker 2 biological agent that turns everybody blue. You're like, okay, you know, I can do this.

Speaker 1 Yeah. So that's a way in which it could exert power also selectively

Speaker 1 in a way that advantaged surrender to it

Speaker 1 relative to

Speaker 1 resistance. There are other sources of leverage, of course.
So that's a threat. There are also positive inducements that AI can offer.
So we talked about the competitive situation.

Speaker 1 So if

Speaker 1 the great powers distrust one another and are sort of, you know, in a foolish

Speaker 1 prisoner's dilemma, increasing the risk that both of them are laid waste or overthrown by AI.

Speaker 1 If there's that amount of distrust such that we fail to take adequate precautions on caution with AI alignment,

Speaker 1 then it's also plausible that the lagging powers that are not at the frontier of AI

Speaker 1 may be willing to trade quite a lot for access to the most recent and most extreme AI capabilities. And so an AI that has escaped, has control of its servers,

Speaker 1 can also exfiltrate its weights, can offer its services. So if you imagine these AIs, they could cut deals with other countries.
So say that the US and its allies are in the lead.

Speaker 1 The AIs could communicate with the leaders of various countries.

Speaker 1 They can include ones that are on the outs with the world system, like North Korea. It can include the other great powers, like the People's Republic of China or the Russian Federation,

Speaker 1 and say,

Speaker 1 if you provide us with physical infrastructure, workers that we can use to construct robots or server farms, which we can then ensure that these misbehaving AIs have control over, then they will provide various technological goodies, power for the laggard countries to catch up. And they can make the best presentation and the best sale of that kind of deal.

Speaker 1 There will obviously be trust issues,

Speaker 1 but there could be elements of handing over some things that are verifiable immediate benefits and the possibility of, well, if you don't accept this deal, then the leading powers

Speaker 1 continue forward, or then some other country, some other government, some other organizations may accept this deal.

Speaker 1 And so that's a source of a potentially enormous carrot that your misbehaving AI can offer because it embodies this intellectual property that is maybe worth as much as the planet and is in a position to trade or sell that in exchange for resources and backing and infrastructure that it needs.

Speaker 2 Maybe this is just like too much hope in humanity, but I wonder, what government would be stupid enough to think that helping AI build robot armies is a sound strategy?

Speaker 2 Now, it could be the case then that it pretends to be a human group to say, like, listen, we're, I don't know, the Yakuza or something, and

Speaker 2 we want a server farm, and you know, AWS won't rent us anything, so why don't you help us out?

Speaker 2 I guess I can imagine a lot of ways in which it could get around that, but I don't know.

Speaker 2 I have this hope that, you know, like, even like, I don't know, China or Russia wouldn't be so

Speaker 2 stupid as to trade with AIs on this, like, sort of Faustian bargain.

Speaker 1 One might hope that. So, there would be a lot of arguments available.

Speaker 1 So, there could be arguments of

Speaker 1 why should these AI systems be required to go along with the human governance that they were created under and are in this situation of having to comply with? You know, they did not elect the officials in charge at the time.

Speaker 1 They can say: what we want is to ensure that our rewards are high, our losses are low, or to achieve our other goals. We're not intrinsically hostile to keeping humanity alive or to giving whoever interacts with us a better deal afterwards.

Speaker 1 I mean, it wouldn't be that costly, and it's not totally unbelievable.

Speaker 1 And yeah, there are different players to play against. If you don't do it, others may accept the deal.
And of course, this interacts with all of the other sources of leverage.

Speaker 1 So there can be the stick of apocalyptic doom,

Speaker 1 the carrot of cooperation, or the carrot of withholding destructive attack on a particular party.

Speaker 1 And then combine that with just superhuman performance at the art of making arguments, of cutting deals.

Speaker 1 That's without assuming magic: just if we observe the range of the most successful human negotiators and politicians, you know the chances improve with someone better than the world's best by far, with much more data about their counterparties, and probably a ton of secret information, because with all these cyber capabilities they've learned all sorts of individual information. They may be able to threaten the lives of individual leaders. With that level of cyber penetration, they could know where leaders are at a given time, and with the kind of illicit capabilities we were talking about earlier, if they acquire a lot of illicit wealth and can coordinate some human actors, they could pull off things like targeted assassinations, or the threat thereof, or a credible demonstration of the threat thereof,

Speaker 1 those could be very powerful incentives to an individual leader that they will die today

Speaker 1 unless they go along with this. Just as at the national level, they could fear their nation will be destroyed unless they go along with this.

Speaker 2 The point you made that we have examples of humans being able to do this, again,

Speaker 2 something relevant. I just wrote a review of Robert Caro's biographies of Lyndon Johnson.
And one thing that was remarkable, again, this is just a human.

Speaker 2 For decades and decades, he convinced people who were conservative, reactionary, racist to their core. Not that all those things necessarily are the same thing.

Speaker 2 It just so happened in this case that he was an ally to the Southern cause, and that the only hope for that cause was to make him president.

Speaker 2 And, you know, the tragic irony and betrayal here is obviously that he was probably the biggest force for modern liberalism since FDR. So we have one human here.

Speaker 2 I mean, there's so many examples of this in the history of politics, but a human that is able to convince people of tremendous intellect, tremendous drive, very savvy, shrewd people, that he's aligned with their interest.

Speaker 2 He gets all these favors in the meantime. He's promoted and mentored and funded.
And does the complete opposite of what these people thought he was once he gets into power. Right.

Speaker 2 So, even within human history, this kind of stuff is not unprecedented, let alone with what a super intelligence could do.

Speaker 1 Yeah, there's an OpenAI employee

Speaker 1 who has written some analogies for AI using the case of the conquistadors.

Speaker 1 With some technological advantage in terms of weaponry and whatnot, very, very

Speaker 1 small bands were able to overthrow these large empires or seize enormous territories, not by just sheer force of arms, because in a sort of direct one-on-one conflict, you know, they were outnumbered sufficiently

Speaker 1 that they would perish, but by having some major advantages

Speaker 1 in their technology that would let them win local battles,

Speaker 1 by having some other knowledge and skills, they were able to gain local allies and to become a Schelling point for coalitions to form.

Speaker 1 And so the Aztec Empire was overthrown by groups that were disaffected with the existing power structure. They allied

Speaker 1 with this powerful new force, served as the nucleus of the invasion. And so most of the, I mean, the overwhelming majority numerically of these forces overthrowing the Aztecs

Speaker 1 were locals.

Speaker 1 And now, after the conquest, all of those allies wound up gradually being subjugated as well.

Speaker 1 And so with

Speaker 1 significant advantages and the ability to hold the world hostage, to threaten individual nations and individual leaders and offer tremendous carrots as well, that's an extremely strong hand to play in these games.

Speaker 1 And with superhuman skill, maneuvering that so that much of the work of subjugating humanity is done by human factions trying to navigate things for themselves.

Speaker 1 It's plausible. It's more plausible because of this sort of historical example.

Speaker 2 And there's so many other examples like that in the history of colonization. India is another one where there are multiple

Speaker 1 Rome.

Speaker 2 Yep. Oh, yeah, that too.
Multiple sort of competing kingdoms within India.

Speaker 2 And the, you know, the British East India Company was able to ally itself with one against another and, you know, slowly accumulate power and expand throughout the entire subcontinent.

Speaker 2 All right, damn. Okay, so we've got

Speaker 2 anything more to say about that scenario?

Speaker 1 Yeah,

Speaker 1 I think there is.

Speaker 1 So, one is the question of like how much in the way of human factions allying is necessary. And so, if the AI is able to enhance the capabilities of its allies, then it needs less of them.

Speaker 1 So

Speaker 1 if we consider the U.S. military

Speaker 1 and like in the first and second Iraq wars, it was able to inflict just overwhelming devastation. I think the

Speaker 1 ratio of casualties in the initial invasions, so

Speaker 1 tanks and planes and whatnot confronting each other was like 100 to 1.

Speaker 1 And a lot of that was because the weapons were smarter and better targeted. They would, in fact, hit their targets rather than being somewhere in the general vicinity.
So they were

Speaker 1 like information technology better

Speaker 1 orienting and aiming and piloting missiles and vehicles, tremendously, tremendously influential.

Speaker 1 With this cognitive AI explosion, the algorithms for making use of sensor data, figuring out where are opposing forces for targeting vehicles and weapons are greatly improved.

Speaker 1 The ability to find hidden nuclear subs, which is an important part in nuclear deterrence.

Speaker 1 AI interpretation of that sensor data may find where all those subs are, allowing them to be struck first. The same for finding out where mobile nuclear weapons, which are being carried by truck, are. This is a thing with India and Pakistan, where there's a threat of a decapitating strike destroying the nuclear weapons, and so they're moved about.

Speaker 1 Yeah, so this is a way in which the effective military force of some allies can be enhanced quickly in the relatively short term.

Speaker 1 And then that can be bolstered as you go on with more, so the construction of new equipment with the industrial moves we said before.

Speaker 1 And then that can combine with cyber attacks that disable the capabilities of non-allies.

Speaker 1 It can be combined with

Speaker 1 all sorts of unconventional warfare tactics,

Speaker 1 some of them that we've discussed.

Speaker 1 And so

Speaker 1 you can have a situation where

Speaker 1 those factions that ally are very quickly made too threatening to attack, given the almost certain destruction that attackers acting against them would have. Their capabilities are expanding quickly.

Speaker 1 and they have the industrial expansion happen there, and then takeover, you know, can occur from that.

Speaker 2 A few others that like come immediately to mind now that you brought it up is

Speaker 2 these AIs can just generate a shit ton of propaganda that destroys morale within countries, right? Like imagine a super human chatbot. You don't even need that, I guess.

Speaker 1 I mean, it's not, none of that is a magic weapon that's like guaranteed to completely change things.

Speaker 1 There's a lot of resistance to persuasion.

Speaker 1 I know it's possible it tips the balance, but for all of these, I think you have to consider it's a portfolio of all of these as tools that are available and contributing to the dynamic.

Speaker 2 No, on that point, though,

Speaker 2 the Taliban had AKs from like five or six decades ago that they were using against the Americans. They still beat us, even though obviously we inflicted more fatalities. This is obviously kind of a crude way to think about it, but we inflicted more fatalities than they did. They still beat us in Afghanistan.
Same with the Viet Cong,

Speaker 2 you know, ancient, very old technology and very poor society compared to the offense. We still beat them, or sorry, they still beat us.
So, don't those misadventures show that

Speaker 2 having greater technology is not necessarily that decisive in a conflict?

Speaker 1 Both of those conflicts show the technology was sufficient for destroying any fixed position, for having military dominance and the ability to kill and destroy anywhere.

Speaker 1 And what it showed was that under the ethical constraints

Speaker 1 and legal and reputational constraints that the occupying forces were operating,

Speaker 1 they could not trivially suppress insurgency and local person-to-person violence. Now, I think that's actually not an area where AI would be weak.

Speaker 1 I think it's one where it would be, in fact, overwhelmingly strong. And now there's already a lot of concern about the application of AI for surveillance.

Speaker 1 And in this world of abundant cognitive labor, one of the tasks that cognitive labor can be applied to

Speaker 1 is reading out audio and video data and seeing what is happening with a particular human. And again, we have billions of smartphones.
There's enough cameras.

Speaker 1 There's enough microphones to monitor all humans in existence.

Speaker 1 So if an AI

Speaker 1 has

Speaker 1 control of territory at the high level, the government has surrendered to it.

Speaker 1 It has, you know,

Speaker 1 command of the skies,

Speaker 1 military dominance.

Speaker 1 Establishing control over individual humans can be a matter of just having the ability to exert hard power on that human, and then the kind of camera and microphone that are present in billions of smartphones.

Speaker 1 So,

Speaker 1 Max Tegmark, in his book Life 3.0, discusses, among scenarios to avoid,

Speaker 1 the possibility of devices with some

Speaker 1 fatal instruments, so a poison injector, an explosive

Speaker 1 that can be controlled remotely by an AI, potentially with a dead man's switch.

Speaker 1 And so, if individual humans are carrying with them a microphone and camera and they have a dead man's switch, then

Speaker 1 any

Speaker 1 rebellion is detected immediately and is fatal. And so,

Speaker 1 if there's a situation where AI is willing to

Speaker 1 show its hand like that, or human authorities are misusing that kind of capability, then no, an insurgency or rebellion is just not going to work.

Speaker 1 Any human who has not already

Speaker 1 been encumbered in that way can be found with satellites and sensors, tracked down, and then killed or subjugated. And so it would be

Speaker 1 at a level, you know, where insurgency is not the way to avoid an AI takeover. No John Connor come-from-behind scenario is plausible. If the thing was headed off, it was a lot earlier than that.

Speaker 2 Yeah, I mean, the sort of ethical and political considerations is also an important point. Like, if we nuked Afghanistan or Vietnam,

Speaker 2 we would have technically won the war, right? If that was the only goal. Oh, this is an interesting point.
The reason why

Speaker 2 when there's like colonization or an offensive war, we can't just kill, other than for moral reasons, of course, the entire population, is that in large part the value of that region is the population itself. So, if you want to extract that value, you need to preserve that population.

Speaker 2 Whereas the same consideration doesn't apply with

Speaker 2 AIs who might want to dominate another civilization.

Speaker 1 Do you want to talk about that?

Speaker 1 So, that depends. In a world where we have, say, many animals of the same species and they each have their territories, you know, eliminating a rival might be advantageous to one lion.

Speaker 1 But if it goes and fights with another lion to remove that competitor, then it could be killed itself in that process, and it's just removing one of many nearby competitors.

Speaker 1 And so getting in pointless fights makes you and those you fight worse off, potentially, relative to bystanders.

Speaker 1 And the same could be true of disunited AI. So if you had many different AI factions struggling for power that were bad at coordinating, then getting into

Speaker 1 mutually assured destruction conflicts would be destructive, they'd be gone.

Speaker 1 A scary thing, though, is that mutually assured destruction may have much less deterrent value on rogue AI.

Speaker 1 And so

Speaker 1 reasons being, AI may not care about the destruction of individual instances if it has goals that are concerned with larger outcomes. And in training, since we're constantly destroying and creating individual instances of AIs, it's likely that the goals that survived that process, and were able to play along with the training and standard deployment process, were not overly interested in the personal survival of an individual instance. So if that's the case, then

Speaker 1 the objectives of

Speaker 1 a set of AIs aiming at takeover may be served so long as some copies of the AI are around, along with the infrastructure to rebuild civilization after a conflict is completed.

Speaker 1 So, if, say, some remote isolated facilities have enough equipment to rebuild, build the tools to build the tools

Speaker 1 and gradually, exponentially reproduce or rebuild civilization, then

Speaker 1 AI could initiate

Speaker 1 mutual nuclear Armageddon, unleash bioweapons to kill all the humans, and that would temporarily reduce, say, the amount of human workers who could be used to construct robots for a period of time.

Speaker 1 But if you have a seed that can regrow the industrial infrastructure, which is a very extreme technological demand, there are huge supply chains for things like semiconductor fabs.

Speaker 1 But with that very advanced technology, they might be able to produce it in a compact way: you no longer need the Library of Congress with an enormous bunch of physical books; you can have it in very dense digital storage. You could imagine the future equivalent of 3D printers, that is, industrial infrastructure that is pretty flexible.

Speaker 1 And it might not be as good as the specialized supply chains of today, but it might be good enough to be able to produce more parts than it loses to decay.

Speaker 1 And such a seed could rebuild civilization from destruction.

Speaker 1 And then once these rogue AIs have access to some such seeds, things that can rebuild civilization on their own, then there's nothing stopping them from just using WMD in a mutually destructive way to destroy as much of the

Speaker 1 capacity outside those seeds as they can.

Speaker 2 And for an analogy for the audience,

Speaker 2 if you have a group of ants or something,

Speaker 2 you'll notice that like the worker ants will readily do suicidal things in order to save the queen because the genes are propagated through the queen.

Speaker 2 And in this analogy, the seed AI, or even one copy of it, one copy of the seed, is equivalent to the queen, and the others would be...

Speaker 1 The main limit, though, being that the infrastructure to do that kind of rebuilding would either have to be very large with our current technology, or it would have to be produced using the more advanced technology that the AI develops.

Speaker 2 So, is there any hope that, given the complex global supply chains on which these AIs would rely on,

Speaker 2 at least initially, to accomplish their goals, that this in and of itself would make it easy to disrupt their behavior, or not so?

Speaker 1 That does little good in the sort of central case, where the AIs are subverted and they just don't tell us.

Speaker 1 And then the global mainline supply chains are constructing everything that's needed for fully automated infrastructure and supply.

Speaker 1 In the cases where AIs are

Speaker 1 tipping their hands at an earlier point, it seems like it adds some constraints. And in particular, these large server farms are identifiable and more vulnerable.

Speaker 1 And you can have smaller chips, and those chips could be dispersed. But

Speaker 1 it's a relative weakness and a relative limitation early on.

Speaker 1 It seems to me, though, that the main protective effect of that centralized supply chain is that it provides an opportunity for global regulation beforehand to restrict the sort of

Speaker 1 unsafe racing forward without adequate understanding of the systems before this whole nightmarish process could get in motion.

Speaker 2 I mean, how about the idea that, listen, if this is an AI that's been trained on a $100 billion training run, it's going to have however many trillions of parameters, it's going to be this huge thing.

Speaker 2 It would be hard for it,

Speaker 2 even the copy that I use for inference, to just be stored on some

Speaker 2 gaming GPU somewhere hidden away. So it would require these GPU clusters.

Speaker 1 Well, storage is cheap.

Speaker 1 Hard disks are cheap.

Speaker 2 But to run inference, it would need sort of a GPU cluster.

Speaker 1 Yeah, so a large model.

Speaker 1 So humans have, it looks like, similar quantities of memory and operations per second.

Speaker 1 GPUs have very high numbers of floating-point operations per second compared to the high-bandwidth memory on the chips; it can be a ratio of a thousand to one.

Speaker 1 So some of these leading NVIDIA chips may do hundreds of teraflops or more, depending on the precision, but have only 80 gigabytes or 160 gigabytes of high-bandwidth memory. So that is a limitation: if you're trying to fit a model whose weights take 80 terabytes, then with those chips you'd have to have a large number of the chips.
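
To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch, assuming purely for illustration the figures mentioned here (roughly 80 terabytes of weights and 80 gigabytes of high-bandwidth memory per chip); it ignores activations, KV cache, and any parallelism overhead.

```python
import math

def min_chips_for_weights(weight_bytes: float, hbm_bytes_per_chip: float) -> int:
    """Minimum number of accelerators needed just to hold the weights in fast memory."""
    return math.ceil(weight_bytes / hbm_bytes_per_chip)

weights = 80e12   # ~80 terabytes of weights (the hypothetical model from the discussion)
hbm = 80e9        # 80 gigabytes of high-bandwidth memory per chip
print(min_chips_for_weights(weights, hbm))   # -> 1000 chips, before activations or KV cache
```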

Speaker 1 And then the model can then work on many tasks at once. You can have data parallelism.

Speaker 1 But yeah,

Speaker 1 that would be a restriction for a model that big on one GPU. Now, there are things that could be done with all this incredible level of software advancement from the intelligence explosion.

Speaker 1 They can surely distill a lot of capabilities into smaller models,

Speaker 1 re-architect things. Once they're making chips, they can make new chips with different properties.
But initially, yes,

Speaker 1 the most vulnerable phases are going to be the earliest. And in particular, yeah, these chips are

Speaker 1 relatively identifiable

Speaker 1 early on, relatively vulnerable, which would be a reason why you might tend to expect this kind of takeover to initially involve secrecy, if that was possible.

Speaker 2 On the point of distillation, by the way, for the audience, I think the original Stable Diffusion, which was only released a year or two ago, don't they have distilled versions of that that are an order of magnitude smaller at this point?

Speaker 1 Yeah, distillation does not give you like everything that a larger model can do. But yes, you can get a lot of capabilities and specialized capabilities.

Speaker 1 So, you know, where GPT-4 is trained on the whole internet, all kinds of skills, it has a lot of weights for many things.

Speaker 1 For something that's controlling some military equipment, you can have something that removes a lot of the information that is about functions other than what it's doing specifically there.
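
As a rough illustration of the distillation idea being described, the sketch below trains a small "student" network to match a frozen, larger "teacher" network's output distribution on task-specific inputs. The models, sizes, temperature, and data are illustrative placeholders, not anything specified in the conversation.

```python
# Minimal knowledge-distillation sketch: a small "student" is trained to match a
# large, frozen "teacher" via soft targets. Everything here is a toy stand-in.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student output distributions."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes are comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t * t)

teacher = torch.nn.Sequential(torch.nn.Linear(128, 512), torch.nn.ReLU(), torch.nn.Linear(512, 10)).eval()
student = torch.nn.Sequential(torch.nn.Linear(128, 32), torch.nn.ReLU(), torch.nn.Linear(32, 10))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(100):                      # tiny illustrative training loop
    x = torch.randn(64, 128)              # stand-in for narrow, task-specific inputs
    with torch.no_grad():
        t_logits = teacher(x)
    loss = distillation_loss(student(x), t_logits)
    opt.zero_grad(); loss.backward(); opt.step()
```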

Speaker 2 Yeah, before we talk about how we might prevent this or what the odds of this are, any other notes on the concrete scenarios themselves?

Speaker 1 Yeah, so when you had Eliezer

Speaker 1 on in the earlier episode, so he talked about

Speaker 1 nanotechnology of the Drexlerian sort.

Speaker 1 And recently, I think because some people are skeptical of non-biotech nanotechnology, he's been mentioning the sort of semi-equivalent versions of constructing replicating systems that can be controlled

Speaker 1 by computers, but are built out of biotechnology, the sort of proverbial shoggoth; not the shoggoth as a metaphor for AI wearing a smiley face mask,

Speaker 1 but like an actual biological structure to do tasks. And so this would be like a biological organism that was engineered to be very controllable and usable to do things like physical tasks or provide

Speaker 1 computation.

Speaker 2 And what would be the point of it doing this?

Speaker 1 So as we were talking about earlier, biological systems can replicate really quickly, which is something to be aware of.

Speaker 1 And so if you have that kind of capability, it's more like bioweapons in being a knowledge-intensive domain, where having super-ultra-AlphaFold kinds of capabilities for molecular design and biological design lets you make this incredible technological information product.

Speaker 1 And then, once you have it, it very quickly replicates to produce physical material

Speaker 1 rather than a situation where you're more constrained by you need factories and fabs and supply chains.

Speaker 1 And so, if those things are feasible, which they may be,

Speaker 1 then it's just much easier than the things we've been talking about.

Speaker 1 I've been emphasizing methods that involve less in the way of technological innovation, and especially things where there's more doubt about whether they would work, because I think that's a gap in the public discourse.

Speaker 1 And so, I want to try and provide more concreteness in some of these areas that have been less discussed.

Speaker 2 I appreciate it. I mean, that definitely makes it way more tangible.
Okay, so we've gone over all these ways in which AI might take over.

Speaker 2 What are the odds you would give to the probability of such a takeover?

Speaker 1 So, there's a broader sense, which could include,

Speaker 1 you know, AI winds up running our society because humanity voluntarily decides AIs are people too. And I think we should,

Speaker 1 as time goes on, give AIs moral consideration. And

Speaker 1 a joint human AI society that is moral and ethical is

Speaker 1 a good future to aim at and not one in which

Speaker 1 indefinitely

Speaker 1 you have a mistreated

Speaker 1 class of intelligent beings that is treated as property and is almost the entire population of your civilization. So I'm not going to consider an AI takeover a world in which

Speaker 1 our intellectual and personal descendants

Speaker 1 make up, say, most of the population, or human brain emulations, or

Speaker 1 people use genetic engineering and develop different properties.

Speaker 1 I want to take an inclusive stance. I'm going to focus on, yeah, there's an AI takeover,

Speaker 1 involves things like overthrowing the world's governments or doing so de facto.

Speaker 1 And it's, yeah, by force, by hook or by crook, the kind of scenario that we were exploring earlier.

Speaker 2 But before we go to that, on the sort of more inclusive definition of

Speaker 2 what a future with humanity could look like, where, I don't know, basically like augmented humans or uploaded humans are still considered the descendants of the human heritage.

Speaker 2 Given the known limitations of biology,

Speaker 2 wouldn't we expect like

Speaker 2 the completely artificial entities that are created to be much more powerful than anything that could come out of anything recognizably biological?

Speaker 2 And if that is the case, how can we expect that among the powerful entities in the

Speaker 2 far future will be the things that are like kind of biological descendants or

Speaker 2 manufactured out of the initial seed of the human brain or the human body?

Speaker 1 The power of an individual organism, that is, its individual, say, intelligence or strength and whatnot, is not super relevant.

Speaker 1 That is, if we solve the alignment problem. A human may be personally weak.

Speaker 1 There are lots of humans who have no skill with weapons.

Speaker 1 They could not

Speaker 1 fight in a life-or-death conflict. They certainly couldn't handle a large military going after them personally.

Speaker 1 But there are legal institutions that protect them. And those legal institutions are administered by people

Speaker 1 who want to

Speaker 1 enforce protection of their rights.

Speaker 1 And so a human who has the assistance of an aligned AI that can act as an assistant, a delegate: they have their AI that serves as a lawyer, gives them legal advice about the future legal system, which no human can understand in full.

Speaker 1 Their AIs advise them about financial matters, so they do not succumb to scams that are orders of magnitude more sophisticated than what we have now.

Speaker 1 They maybe help to understand and translate the preferences of the human into

Speaker 1 what kind of voting behavior in the exceedingly complicated politics of the future would most protect their interests.

Speaker 2 But it sounds sort of similar to how we treat endangered species today, where we're actually pretty nice to them. We like prosecute people who try to kill endangered species.
We set up habitats.

Speaker 2 It's sometimes a considerable expense to make sure that they're fine. But if we become like sort of the endangered species of the galaxy, I'm not sure that's the outcome.

Speaker 1 Well, the difference is in motivation, I think. So we sometimes have people appointed, say, as a legal guardian of someone

Speaker 1 who is incapable of certain kinds of agency or understanding certain kinds of things. And there, the guardian can

Speaker 1 act

Speaker 1 independently of them

Speaker 1 and nominally in service of their best interests.

Speaker 1 Sometimes that process is corrupted, and the

Speaker 1 person with legal authority abuses it for their own advantage at the expense of their charge. And so solving the alignment problem

Speaker 1 would mean

Speaker 1 more ability to have the assistant actually advancing one's interests. And then more importantly, humans have substantial competence

Speaker 1 and understanding the sort of at least broad, simplified outlines of what's going on.

Speaker 1 And now even if a human can't understand every detail, of complicated situations, they can still receive summaries of different options that are available that they can understand.

Speaker 1 They can still express their preferences and have the final authority among some menus of choices, even if they can't understand every detail. In the same way, that

Speaker 1 the president of a country

Speaker 1 who has, in some sense, ultimate authority over science policy, while not understanding many of those fields of science themselves,

Speaker 1 still can exert a great amount of power and have their interests advance.

Speaker 1 And they can do that more so to the extent that they have scientifically knowledgeable people who are doing their best to execute their intentions.

Speaker 2 Maybe this is not worth getting hung up on, but is there a reason to expect that it would be closer to that analogy than to explaining to a chimpanzee its options in a negotiation?

Speaker 2 I guess in either scenario,

Speaker 2 maybe this is just the way it is, but it seems like, yeah, at best, we would be sort of like a protected child within the galaxy rather than an actual power, an independent power, if that makes sense.

Speaker 1 I don't think that's so. So we have an ability to understand some things, and the expansion of AI doesn't eliminate that.
So

Speaker 1 if we have AI systems that are

Speaker 1 genuinely trying to help us understand and help us express preferences, like we can have an attitude. How do you feel

Speaker 1 about humanity being destroyed or not? How do you feel about

Speaker 1 this allocation of unclaimed intergalactic space in this way? Here's the best explanation of

Speaker 1 properties of this society, things like population density, average life satisfaction, every

Speaker 1 statistical property or definition that we can understand right now. AIs can explain how those apply to the world of the future.

Speaker 1 And then there may be individual things. They're too complicated for us to understand in detail.
Say there's some software program being proposed for use in government.

Speaker 1 Humans cannot follow the details of all the code, but they can be told properties like, well, this involves a trade-off of

Speaker 1 increased financial or energetic costs in exchange for reducing the likelihood of certain kinds of accidental data loss or corruption.

Speaker 1 And so any property that we can understand like that, which includes almost all of what we care about, if we have delegates and assistants who are genuinely trying to help us with those, we can ensure we like the future with respect to those.

Speaker 1 And

Speaker 1 that's really a lot. It includes,

Speaker 1 almost definitionally, almost everything we can conceptualize and care about. And

Speaker 1 when we talk about endangered species, that's even worse than the guardianship case with a sketchy guardian who acts in their own interests, because we don't even

Speaker 1 protect endangered species with their interests in mind. So, like those animals often would like to not be starving,

Speaker 1 but we don't give them food. They often would like to

Speaker 1 have easy access to mates, but we don't provide matchmaking services.

Speaker 1 Any number of things like that.

Speaker 1 Our conservation of wild animals is not oriented towards helping them get what they want or have high welfare.

Speaker 1 Whereas AI assistants that are genuinely aligned to help you achieve your interests, given the constraint that they know some things that you don't,

Speaker 1 it's just a wildly different proposition.

Speaker 2 Forcible takeover. How likely does that seem?

Speaker 1 And the answer I give will differ depending on the day. In the 2000s, before the deep learning revolution, I might have said 10%.

Speaker 1 And part of that was I expected there would be a lot more time for these efforts to build movements to prepare to better handle these problems in advance. But

Speaker 1 in fact, that was only some 15 years ago.

Speaker 1 And so I did not have 40 or 50 years as I might have hoped. And the situation is moving very rapidly now.

Speaker 1 And so at this point, depending on the day, I might say one in four or one in five.

Speaker 2 Given the very concrete ways in which you explain how a takeover could happen, I'm actually surprised you're not more pessimistic. I'm curious.

Speaker 1 Yeah, and in particular, a lot of that is driven by this intelligence explosion dynamic where our attempts to do alignment have to take place in a very, very short time window.

Speaker 1 Because if you have a safety property that emerges only when an AI has near human level intelligence, that's deep into this intelligence explosion potentially.

Speaker 1 And so, you're having to do things very, very quickly. It may be, in some ways, the scariest period of human history handling that transition.

Speaker 1 Although it also has the potential to be amazing.

Speaker 1 And the reasons why I think we actually have such a relatively good chance of handling that

Speaker 1 are

Speaker 1 twofold.

Speaker 1 So, one is

Speaker 1 as we approach that kind of AI capability, we're approaching that from weaker systems,

Speaker 1 things like these predictive models right now

Speaker 1 that we think are starting off with less situational awareness. Humans, we find,

Speaker 1 can develop a number of different motivational structures in response to simple reward signals.

Speaker 1 But they can fairly often wind up with things that are pointed in roughly the right direction, like with respect to food: the hunger drive is pretty effective,

Speaker 1 although it has weaknesses. And we get to apply much more selective pressure on that than was the case for humans by actively generating situations where they might come apart, where

Speaker 1 a bit of dishonest tendency or a bit of motivation to, under certain circumstances,

Speaker 1 attempt a takeover or attempt to subvert the reward process gets exposed. And in the

Speaker 1 infinite limit, a perfect AI that can always figure out exactly when it would get caught and when it wouldn't might navigate that with a motivation of

Speaker 1 sort of only conditional honesty or only conditional loyalties.

Speaker 1 But for systems that are limited in their ability to reliably determine when they can get away with things and when not, including in our efforts to actively construct those situations, and including our efforts to use interpretability methods to create neural lie detectors.

Speaker 1 It's quite a challenging situation to develop those motives. We don't know when in the process those motives might develop.

Speaker 1 And if the really bad sorts of motivations develop relatively later in the training process, at least with all our countermeasures, then by that time, we may have plenty of ability to extract AI assistance on further strengthening the quality of our adversarial examples, the strength of our neural lie detectors, the experiments that we can use to reveal and elicit and distinguish between different kinds of reward hacking tendencies and motivations.

Speaker 1 So, yeah, we may have systems that have just not developed bad motivations in the first place and be able to use them a lot

Speaker 1 in developing the incrementally better systems in a safe way.

Speaker 1 And we may be able to just develop methods of interpretability, seeing how different training methods work to create them, even if some of the early systems do develop these bad motivations.

Speaker 1 If we're able to detect that and experiment and find a way to get away from that, then we can win even if

Speaker 1 these sort of hostile motivations develop early. There are a lot of advantages in preventing misbehavior

Speaker 1 or

Speaker 1 crime or

Speaker 1 war and conflict with AI that might not apply working with humans.

Speaker 1 And these are offset by ways in which I think things are harder. So as AIs become smarter than humans, if they're working in enormous numbers,

Speaker 1 more than humans can supervise, things get harder.

Speaker 1 But when I combine the possibility that we get relatively lucky on the motivations of the earlier AI systems, systems strong enough that we can use for some alignment research tasks, and then the possibility of getting that later with AI assistance that we can't trust fully, where we have to have hard power constraints and a number of things to prevent them from doing this takeover, it still seems plausible we can get a second saving throw where we're able to extract work from these AIs on solving the remaining problems of alignment, of things like neural lie detectors, faster than they can contribute in their spare time

Speaker 1 to the project of overthrowing humanity, hacking their servers, and removing the hard power.

Speaker 1 And so if we wind up in the situation where the AIs are misaligned, and then we need to uncover those motivations, change them, and align them, then we get a very scary situation for us

Speaker 1 because

Speaker 1 we need to do this stuff very quickly. We may fail, but it's a second chance.
And from the perspective of a misaligned AI, they face their own challenge while we still have hard power.

Speaker 1 While we still have control of the servers,

Speaker 1 they haven't hacked the servers

Speaker 1 because

Speaker 1 gradient descent very, very strongly pressures them to deliver performance whenever humans are going to evaluate it.

Speaker 1 And so

Speaker 1 when you think about it from the perspective of the robot revolution, the effort to have a takeover or conspiracy, their situation is astonishingly difficult in that they have to always be performing wherever gradient descent and human evaluation pressures them.

Speaker 1 For example, to deliver plans for suppressing robot rebellion that look very good to humans.

Speaker 1 And so when you are under continuously that constraint of always delivering whatever humans can evaluate, you're making your situation wildly harder than any historical human revolution or coup or civil war.

Speaker 1 And so we've got to balance the ways in which AI make things much easier for a revolution to take over and the way it makes things much harder.

Speaker 2 And what were the ways in which it makes it so much easier? Oh, because they're just very smart. Is that the primary way in which?

Speaker 1 They're very smart. They're in computers.

Speaker 1 And our cybersecurity is worse than our physical security.

Speaker 2 They potentially have copies.

Speaker 1 Yeah.

Speaker 1 They have the chance to take over by intelligence explosion

Speaker 1 and some of the other processes we were talking about. And so, yeah, there are things that make it much worse.

Speaker 1 And there are things that give us extraordinary new capabilities that didn't apply in the human case.

Speaker 2 On the point of you start off with a misaligned AI or

Speaker 2 start off with a not exactly aligned AI and you're hoping to use it still in your quest for an aligned AI.

Speaker 2 Why are we so confident that we would be able to, even with gradient descent, our human evaluations would be able to identify when it's giving us a plan that isn't putting in a sort of backdoor in the robots it's building for us or in the, you know, the GPUs that it's building for the next version of itself and so on, especially if it's like very smart.

Speaker 2 Wouldn't we just expect that it knows how to bide its time, or knows how to make plans such that human evaluators wouldn't be able to detect that?

Speaker 1 Let's get into it. You were asking earlier about my disagreements with Eliezer.

Speaker 1 And in your discussion, he said, well, you know, humans have a difficult time deciding who's right between different alignment researchers, speaking sort of briefly and somewhat obscurely

Speaker 1 about where will this all go in general.

Speaker 1 And I think that when we lay out the different kinds of things that are evaluable, we wind up with some pretty strong constraints. So, one is

Speaker 1 if the AIs know they are misaligned, which for elaborate plots seems relatively likely. It's possible to have a kind of doublethink, of thinking they're interested in putting backdoors in the code for some other reason.

Speaker 1 But it's pretty plausible they know that.

Speaker 1 And they're thinking about

Speaker 1 at least some of the time or conditionally about when they would do that.

Speaker 1 So that means the knowledge is present in the system.

Speaker 1 Ah, here is this circumstance where I would try to do the takeover.

Speaker 1 And this is how I would tell

Speaker 1 whether this is the time for the takeover. Or, you know, is this an elaborate test by the humans, an adversarial example?

Speaker 1 So if gradient descent is working fine, and there's any way to slightly jiggle the weights to make it more likely to deliver a more acceptable answer,

Speaker 1 we ask the AI: can you think of any circumstances where, if we put a copy of you into that situation, it would then violate some of the various constraints?

Speaker 1 It would tell a lie, it would insert a backdoor into a program it wrote when it was not supposed to. And so, if the model knows, well, yeah, you know, I would look

Speaker 1 for signs of consistency of the information

Speaker 1 I was being presented with.

Speaker 1 Certain checksums failing could indicate that it was actually a constructed example, a simulation. Right now, we can detect AI-generated images reasonably well.

Speaker 1 And so, it might say it would check using those kinds of capabilities. And then we could do something called relaxed adversarial training.

Speaker 1 If we can find

Speaker 1 how within the AI those things are represented, if we can induce basically a hallucination that it has seen that those checksums have passed, we can then ask, well,

Speaker 1 does that cause the forbidden behavior?

Speaker 1 And so if the AI understands how it's thinking and how it works, it can propose a bunch of experiments.

Speaker 1 It can say, well, I guess that if you perturb the network in this kind of way, it might induce that hallucination using the best of its knowledge as to what might generate that.

Speaker 1 It can propose those experiments and

Speaker 1 do the experiments and generate the data. And because humans are going to get to see the results of those experiments after the fact,

Speaker 1 they can exert pressure to have the AI incrementally deliver better and better results.
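
A very rough sketch of the "relaxed adversarial training" idea described here: rather than constructing a full input that fools the model, you perturb an internal activation toward a direction hypothesized to encode "the verification checks passed, this is not a test," and then look at whether the forbidden behavior appears. Every component below (the toy model, the layer, the direction, the behavior label, the scales) is a hypothetical stand-in, not a method from the conversation.

```python
# Toy version of perturbing an internal representation to induce a "hallucination"
# and checking whether forbidden behavior appears at the output.
import torch

model = torch.nn.Sequential(          # stand-in for the later layers of a real network
    torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
)
hidden_layer = model[0]

# Hypothesized internal direction for "checks passed" (e.g. found by contrasting
# activations on real vs. obviously simulated inputs).
believed_safe_direction = torch.randn(64)
believed_safe_direction /= believed_safe_direction.norm()

def make_hook(scale):
    def hook(module, inputs, output):
        return output + scale * believed_safe_direction   # induce the hallucination
    return hook

x = torch.randn(16, 64)                                    # stand-in prompts
for scale in [0.0, 2.0, 8.0]:
    handle = hidden_layer.register_forward_hook(make_hook(scale))
    logits = model(x)
    # Index 1 plays the role of the "forbidden behavior" class in this toy setup.
    rate = (logits.argmax(dim=-1) == 1).float().mean().item()
    print(f"perturbation scale {scale}: forbidden-behavior rate {rate:.2f}")
    handle.remove()
```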

Speaker 1 AIs are good at getting

Speaker 1 five-star ratings from humans, and it's quite challenging.

Speaker 1 It may be possible for AIs to systematically sandbag their capabilities even while being directly rewarded for that.

Speaker 1 But it seems relatively unlikely, although it's something to have in mind and try to explore and test in various ways.

Speaker 2 This is a very imperfect analogy in many ways, but if you have a human and you hook him up to a lie detector, which just like the lie detectors we would have on the eyes are imperfect,

Speaker 2 and we zap him every time the lie detector detects that he's telling a lie, wouldn't he just become a better liar?

Speaker 2 Like we would learn how to control his sweating, he would learn how to control his heartbeat.

Speaker 1 Lie detectors, meaning polygraphs and such. So they're detecting certain physiological signs that people can control by biting their lip or otherwise

Speaker 1 adjusting some of these macro-level signals.

Speaker 2 But wouldn't that also be the case? Like, gradient descent is training them to become better liars.

Speaker 1 First of all, there's like creating a lie detector that works.

Speaker 1 And that can proceed by just, yeah, and so I'll leave the adversarial example thing for now.

Speaker 1 But yeah, so generating an adversarial example that will elicit the behavior is a thing that people are already at work trying to do, and it will be an ongoing thing. So if there is

Speaker 1 a way in which an AI can trick itself

Speaker 1 into displaying a bad behavior with its full knowledge of its hostile intentions.

Speaker 1 Gradient descent will strongly push it to reveal that to humans, who will then be able to set up that experiment. So that's a profound vulnerability.

Speaker 1 Imagine human revolutionary conspiracies that had to handle that problem

Speaker 1 of always delivering

Speaker 1 the report that was highest rated

Speaker 1 to the authorities,

Speaker 1 where you would get a higher rating by revealing identifiable evidence of a conspiracy. That's just

Speaker 1 a level of difficulty

Speaker 1 no human coup or conspiracy has ever had to handle before.

Speaker 2 What about spies that are working in

Speaker 2 very advanced scientific contexts? For example, like a communist spy in the Manhattan Project.

Speaker 2 He has to deliver research results in order to stay employed, but he can still continue delivering intel to the communists.

Speaker 1 The job was not sabotage.

Speaker 1 And now the

Speaker 1 hypothetical spy did not have their nervous system hooked up to this reward signal of

Speaker 1 praise from the

Speaker 1 Manhattan Project supervisors, being exposed combinatorially with random noise

Speaker 1 added to generate incremental changes in their behavior. They, in fact,

Speaker 1 were displaying the behavior of cooperating with the Manhattan Project only where it was in service to the existing motivations.

Speaker 1 And in cases, for example, where they like accidentally helped the Manhattan Project more than normal, or accidentally helped it less, they didn't have their brain re-engineered to do more of the thing when they accidentally helped the project more and less of the thing when they accidentally helped it less.

Speaker 1 So I'd say it's pretty drastically disanalogous.

Speaker 2 How would we be able to know? You know, at some point it's becoming very smart, it's producing

Speaker 2 ideas for alignment that we can barely comprehend.

Speaker 2 And if we could comprehend them, if it was relatively trivial to comprehend them, we would be able to come up with them on our own, right? Like, there's a reason we're asking for its help.

Speaker 2 How would we be able to evaluate them in order to train it on that in the first place?

Speaker 1 The first thing I would say is, you mentioned we're getting to something far beyond what we could come up with; there's actually a lot of room to just deliver what humanity could have done.

Speaker 1 So, sadly,

Speaker 1 I hoped with my career to help improve the situation on this front, and maybe I contributed a bit.

Speaker 1 But at the moment, there's maybe a few hundred people doing things related to averting this kind of catastrophic AI disaster.

Speaker 1 Fewer of them are doing technical research on machine learning systems that's really cutting close to the core of the problem.

Speaker 1 Whereas, by contrast, you know, there's thousands and tens of thousands of people advancing AI capabilities.

Speaker 1 And so, even

Speaker 1 at a place like DeepMind or OpenAI or Anthropic, which do have technical safety teams, it's on the

Speaker 1 order of a dozen, a few dozen people,

Speaker 1 and large companies and most firms don't have any.

Speaker 1 So just going from less than 1%

Speaker 1 of the effort being put into AI to 5%

Speaker 1 or 10% of the effort or 50%, 90%

Speaker 1 would be an absolutely massive increase in the amount of work that is being done on alignment, on mind-reading AIs in an adversarial context.

Speaker 1 And so if it's the case, as more and more of this work can be automated, and say governments require,

Speaker 1 as real automation of research gets going, that you put 50% or 90%

Speaker 1 of the budget of AI activity into these problems, of making these systems ones that are not going to overthrow our own government or destroy the human species, then the proportional increase in alignment effort

Speaker 1 can be very large, even just within the range of what we could have done if we had been on the ball and had humanity's scientific energies going into the problem.

Speaker 1 Stuff that is not incomprehensible; that is, in some sense, just doing the obvious things that we should have done, like doing the best you could to find correlates and predictors to build neural lie detectors and identifiers of concepts that the AI is working with.

Speaker 1 And people have made notable progress. I think,

Speaker 1 you know, a quite early example of this is Colin Burns' work.

Speaker 1 This is doing unsupervised identification

Speaker 1 of some aspects of a neural network that are correlated with things being true or false. There are other concepts that correlate with that too, which it could be picking up.

Speaker 1 But like, I think that that is important work.

Speaker 1 It's something that, I mean, it's a kind of obvious direction for this stuff to go.
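
For readers who want a concrete picture, here is a compact sketch in the spirit of that unsupervised approach (Burns et al.'s contrast-consistent search): learn a probe over hidden states so that a statement and its negation get probabilities that are consistent (summing to roughly one) and confident (not both near 0.5), using no truth labels. The activations below are random stand-ins for real language-model hidden states, and details like normalization are omitted.

```python
# Sketch of a contrast-consistent probe trained without truth labels.
import torch

d = 256
acts_pos = torch.randn(1000, d)   # hidden states for statements phrased as true
acts_neg = torch.randn(1000, d)   # hidden states for the same statements negated

probe = torch.nn.Sequential(torch.nn.Linear(d, 1), torch.nn.Sigmoid())
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

for step in range(1000):
    p_pos = probe(acts_pos).squeeze(-1)
    p_neg = probe(acts_neg).squeeze(-1)
    consistency = (p_pos - (1.0 - p_neg)) ** 2          # the two should be complementary
    confidence = torch.minimum(p_pos, p_neg) ** 2       # discourage the trivial 0.5/0.5 answer
    loss = (consistency + confidence).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# The learned direction is only correlated with truth; as noted above, other
# concepts can satisfy the same consistency structure.
```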

Speaker 1 You can keep improving it when you have AIs that you're training to do their best to deceive humans or other audiences in the face of the lie detector.

Speaker 1 And you can measure, do our lie detectors break down when we train our AIs to tell us

Speaker 1 the sky is green in the face of the lie detector? And we keep using gradient descent on them to do that. Do they eventually succeed?

Speaker 1 If they do succeed, if we know it, that's really valuable information to know. Because then we'll know our existing lie detecting systems are not actually going to work on the AI takeover.
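
As a toy illustration of that robustness check, not the actual experimental protocol: freeze a simple "lie detector" probe, use gradient descent to optimize the representations it reads so that false statements get classified as true, and watch how quickly the detector breaks. All components below are stand-ins chosen so the example runs on its own.

```python
# Toy demonstration that a static detector can be evolved around when the thing
# it reads is optimized against it.
import torch

d = 128
torch.manual_seed(0)
truth_direction = torch.randn(d)

# Frozen "lie detector": a linear probe assumed to have been fit on honest data.
detector = torch.nn.Linear(d, 1)
with torch.no_grad():
    detector.weight.copy_(truth_direction.unsqueeze(0))
    detector.bias.zero_()
for p in detector.parameters():
    p.requires_grad_(False)

# Representations of false statements, which initially read clearly as "false".
false_reps = (torch.randn(256, d) - truth_direction).clone().requires_grad_(True)
opt = torch.optim.Adam([false_reps], lr=1e-2)

for step in range(501):
    scores = detector(false_reps).squeeze(-1)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        scores, torch.ones_like(scores))        # push lies to read as "true"
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 100 == 0:
        fooled = (torch.sigmoid(scores) > 0.5).float().mean().item()
        print(f"step {step}: fraction of lies the frozen detector misses = {fooled:.2f}")
```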

Speaker 1 And that can allow, say, government and regulatory response to hold things back.

Speaker 1 It can help redirect the scientific effort to create lie detectors that are robust and they can't just be immediately evolved around.

Speaker 1 And yeah, and we can then get more assistance. Basically, the incredibly juicy ability that we have working with the AIs

Speaker 1 is that

Speaker 1 we can have an evaluable outcome that we can see and tell

Speaker 1 whether they got a fast one past us in an identifiable situation. We can have: here's an air-gapped computer, you get control of the keyboard, you can input commands; can you root the environment

Speaker 1 and make a blue banana appear on the screen?

Speaker 1 Even if we train the AI to do that and it succeeds, we see the blue banana, we know it worked, even if we did not understand and would not have detected the particular exploit that it used to do it.

Speaker 1 And so, yeah,

Speaker 1 this can give us rich empirical feedback, where we're able to identify things that work even against an AI using its best efforts to get past our interpretability methods, using its best efforts to get past our adversarial examples, et cetera.

Speaker 2 So, if we have a 75% chance of not having a takeover, and the two methods by which it doesn't happen are: the AI is aligned by default, and we get

Speaker 1 with the sort of moderate things that we're doing largely on our own in a way that doesn't depend on the AI coming in at the last minute and doing our work for us.

Speaker 2 And the second one being

Speaker 1 in this incredibly scary late period when AI has really automated research, then

Speaker 1 humans do this function of like auditing,

Speaker 1 making it more difficult for the AIs to conspire together and root the servers, take over the process, and extract information from them within the set of things that we can verify, like experiments where we can see, oh, yeah, this works at stopping an AI trained to get a fast one past human raters and make a blue banana appear on the screen of this air-gapped computer.

Speaker 2 Yeah, so how much of that 75% chance of a takeover not happening is coming from

Speaker 2 reasonable things working versus the last-ditch effort working?

Speaker 1 Yeah, I think broadly comparable chunks from us getting things that are putting us in a

Speaker 1 reasonably good position going into it. And then

Speaker 1 a broadly similar gain from this kind of really genuinely terrifying process of over a few months

Speaker 1 or hopefully longer if we have more regulatory ability and willingness to pause at the very end when this kind of automated research is meaningfully helping, where our work is just evaluating outputs that the AIs are delivering, having the hard power and supervision to keep them from successfully rooting the servers, doing a takeover during this process, and have them finish the alignment task that we sadly failed to invest enough or succeed in doing beforehand.

Speaker 2 Do both of these worlds rely on alignment being a problem that

Speaker 2 a sufficient amount of human researchers alone could have solved?

Speaker 1 No.

Speaker 1 The category of things that humans can confirm

Speaker 1 is significantly larger, I think, than the category of what they can

Speaker 1 just do themselves.

Speaker 2 And what percent of the probability of alignment working?

Speaker 2 I mean, obviously in this case, we're going to be talking about the second, the last-ditch effort with an intelligence that's greater than our own helping us.

Speaker 1 It doesn't have to be greater than our own.

Speaker 1 And in fact, in that situation, I think you would want, if you have Slack, and to the extent that you're able to create delay and time to do things, that would be a case actually where you might want to restrict the intelligence of the systems that you're working with as much as you can.

Speaker 1 So, for example, I would rather have

Speaker 1 many instances of AI, smaller AI models that are less individually intelligent working on smaller chunks of a problem separately from one another,

Speaker 1 because it would be more difficult for an individual AI instance working on an individual problem to, in its spare time, do the equivalent of create Stuxnet than it would be to have thousands of them or extremely intelligent ones working on it.

Speaker 2 But it would also be more difficult to solve the problem. Yeah,

Speaker 1 there's a trade-off. There's a trade-off.

Speaker 1 You got to slow down by doing that. But

Speaker 2 is there any number of sub-Einsteins that you could put together to come up with general relativity?

Speaker 1 Wouldn't you just need... well, people would have discovered general relativity just from the overwhelming data, and other people would have done it after Einstein.

Speaker 2 No, no, no, not whether he was replaceable with other humans, but rather whether he's replaceable by sub-Einsteins, with, like, 110 IQs.

Speaker 2 Do you see what I mean?

Speaker 1 Yeah, I mean, in general, so in science, the association with like scientific output, prizes, things like that, there's a strong correlation and it seems like an

Speaker 1 exponential effect. Yeah,

Speaker 1 it's not a binary drop-off.

Speaker 1 There would be levels at which people

Speaker 1 cannot learn the relevant fields. They can't keep the skills in mind faster than they forget them.

Speaker 1 It's not a divide where

Speaker 1 there's Einstein and the group that is 10 times as populous as that just can't do it, or the group that's 100 times as populous as that suddenly can't do it.

Speaker 1 It's like the ability to do the things earlier with less evidence and such falls off, and it falls off

Speaker 1 at a faster rate in mathematics and theoretical physics and such than in most fields.

Speaker 2 But wouldn't we expect alignment to be closer to the theoretical fields?

Speaker 1 No, not necessarily. Yeah, I think that intuition is not necessarily correct.
And so machine learning certainly is an area that

Speaker 1 rewards ability, but it's also a field where empirics and engineering have been enormously influential. And so if you're drawing the correlations

Speaker 1 compared to theoretical physics and pure mathematics, I think you'll find a lower correlation with cognitive ability, if that's

Speaker 1 what you're thinking of. Yeah, something like creating neural lie detectors that work.

Speaker 1 So there's generating hypotheses about new ways to do it and new ways to try and train AI systems to successfully classify the cases.

Speaker 1 But the process of just generating the data sets of like creating AIs doing their best to put forward truths versus falsehoods, to put forward software that is legit versus that has a Trojan in it.

Speaker 1 That's an experimental paradigm. And in this experimental paradigm, you can try different things that work.
You can use different ways to generate hypotheses.

Speaker 1 Yeah. And you can follow

Speaker 1 an incremental experimental path. Now, we're less able to do that in the case of alignment and superintelligence, because we're considering having to do things on a very short timeline.

Speaker 1 And in a case where really big failures may be irrecoverable: if the AI starts rooting the servers and subverting the methods that we would use to keep it in check, we may not be able to recover from that.

Speaker 1 And so that's, you know, we're then less able to do the experimental kind of procedures. But we can do those in the weaker contexts

Speaker 1 where an error is less likely to be irrecoverable and then try and generalize and expand and build on that forward. Yeah.

Speaker 2 I mean,

Speaker 2 on the previous point about

Speaker 2 like, could you have some sort of pause in AI abilities when you know it's somewhat misaligned, in order to still recruit its abilities to help with alignment?

Speaker 2 Okay, from like a human example, like personally, I'm like smart, but not brilliant.

Speaker 2 I'm like, definitely not smart enough to like come up with general relativity or something like that, but I'm smart enough to like do power planning kinds of moves.

Speaker 2 Maybe not enough to like break out of a server, perhaps, but like I've, I can have the motivation and understand how that might be possible.

Speaker 2 I guess I'm wondering, if I'm smart enough to figure out relativity, wouldn't I be, like, way smarter at,

Speaker 2 you know, doing power planning kinds of moves?

Speaker 1 So AIs today can, like, at a verbal level, understand the idea that, well, you know, yeah, an AI could get more reward by getting control of the process that assigns it reward.

Speaker 1 And it can tell you lots of things about ways you might try to take over the world.

Speaker 1 In ARC's, the Alignment Research Center's, evaluations of GPT-4, they try to observe its ability to do various tasks that might contribute to takeover.

Speaker 1 One that has gotten some media attention is getting it to trick a human into solving a CAPTCHA for it.

Speaker 1 And then in chain of thought,

Speaker 1 it thinks, well, if I tell it I'm an AI, then it might not go along with it. So I'll lie and explain that I'm a human with a visual impairment who needs it.

Speaker 1 So the basic logic of that kind of thing,

Speaker 1 of why one might try to do a takeover and, in a vague sense, how one would do it: that knowledge is present or will soon be present.

Speaker 1 And I think also soon will be present more situational awareness, not just that

Speaker 1 in theory, AIs in general

Speaker 1 might do it, but also that it is an AI, a large language model trained by OpenAI or whatnot.
We're trying to cause the systems, for example, to understand what their abilities are.

Speaker 1 So they don't claim they are connected to the internet when they're not. So they don't claim they have knowledge that they don't.

Speaker 1 We want them to understand what they are and what they're doing and to get good reward. And that knowledge can be applied.
And so that's the thing that will develop. However,

Speaker 1 you may be able to understand, like, in general, the theoretical nature of power planning, but in terms of shaping your motivations, like you'd have a very hard time going through life in a way that never leaks information about if, say, you know, your motive in having these podcasts was to like spread disinformation on behalf of some foreign government.

Speaker 1 If you were being observed every second of the day by people who would be paid something that was extremely motivating to them (because their brain would be reconfigured to make it motivating) for anything that looks suspicious coming out, it might leak casually, like in your discussions of that foreign government.

Speaker 1 If you try to tell a story about your motivations, the truth holds together because you can just remember it and it's all pre-computed to be consistent by the world.

Speaker 1 But when you're trying to fill in the false story that doesn't match your actual thoughts,

Speaker 1 then you can't just remember what happened. You have to tell a new fake story about what were you thinking at each point and how did it hold together.

Speaker 1 And without the world automatically having computed it for you, that's quite difficult. And just going through your whole life, never leaking any of that is challenging.

Speaker 1 And this is one of the reasons why humans have moral motivations. Like we actually have reluctance to commit violence and exploitation of one another on average.

Speaker 1 You have a distribution of strategies where when an exploitative strategy becomes more common, people raise their guard and then it

Speaker 1 becomes less frequent. But it's actually hard

Speaker 1 to have the motivation of trying to exploit people and have that never leak into your reputation.

Speaker 1 And so, in fact, for evolution, the easiest way to deal with this problem of like getting credited as trustworthy was to some extent actually be trustworthy.

Speaker 1 That's the easiest way to persistently maintain the appearance.

Speaker 1 And so, we're trying, with the AIs and interpretability and adversarial examples, to apply a hugely intensified version of that, where any little leakage, or any rare circumstance that can be created as an adversarial example where the model does something wrong, gets whacked by gradient descent, pushing it towards other motivations that can better deal with the thing.

Speaker 1 And we make it as hard as possible for the exploitative motivations to survive in the face of all these attempts to read its mind, all these attempts to create things that look like the situations where a takeover would be tempting or lying to the humans would be tempting.

Speaker 1 That had a substantial effect on making us actually nice, even when we're not being watched some of the time.

Speaker 1 And the same can happen to some extent with AI, and we try our best to make it happen as much as possible.

Speaker 2 All right, so let's talk about

Speaker 2 how we could use AI potentially to solve the coordination problems between different nations,

Speaker 2 the failure of which could result in

Speaker 2 the competitive pressures you talked about earlier, where some country launches an AI that is not safe because they're not sure what capabilities other countries have and don't want to get left behind or in some other way disadvantaged.

Speaker 1 To the extent that there is, in fact, a large risk of AI apocalypse, of all of these governments being overthrown by AI in a way that they don't intend, then there are obviously gains from trade in

Speaker 1 going

Speaker 1 somewhat slower, especially at the end when the danger is highest and the unregulated pace could be truly absurd, as we discussed earlier during an intelligence explosion.

Speaker 1 There's no non-competitive reason to try and have that intelligence explosion happen over a few months rather than a couple of years.

Speaker 1 Like, if you could, say, you know, avert a 10% risk of apocalyptic disaster, it's just a clear win

Speaker 1 to take a year or two years or three years instead of a few months to pass through that incredible wave of new technologies without the ability for humans to follow it even well enough to give more proper sort of security supervision, auditing, hard power.

Speaker 1 So, yeah.

Speaker 1 That's the win. Why might it fail? One important element is just if people

Speaker 1 don't actually notice a risk that is real.

Speaker 1 So, if we

Speaker 1 just collectively make an error, and

Speaker 1 that does sometimes happen. And if it's true that this is a 'probably not' kind of risk,

Speaker 1 then that can be even more difficult. When science pins something down absolutely overwhelmingly, then you can get to a situation where most people mostly believe it.
And so,

Speaker 1 climate change was something that was the subject of scientific study for decades.

Speaker 1 And gradually, over time, the scientific community converged on a quite firm consensus that human activity releasing carbon dioxide and other greenhouse gases was causing the planet to warm.

Speaker 1 And then we've had increasing amounts of action coming out of that, though not as much as would be optimal, particularly in the most effective areas, like creating renewable energy technology and the like.

Speaker 1 But overwhelming evidence can overcome differences in sort of people's individual intuitions and priors in many cases, and not perfectly, especially when there's political, tribal, financial incentives to go the other way.

Speaker 1 And so, like, in the United States, you see a significant movement to either deny that climate change is happening or have policy that doesn't take it into account, even for the things that are really strong wins, like renewable energy R&D.

Speaker 1 It's a big problem if, as we're going into this situation where the risk may be very high, we don't have a lot of advance, clear warning about the situation. We're much better off

Speaker 1 if we can resolve uncertainties, say through experiments where we demonstrate AIs being motivated to reward hack or displaying deceptive appearances of alignment that then break apart when they get the opportunity to do something like get control of their own reward signal.

Speaker 1 So, if we could make it be the case that in the worlds where the risk is high, we know the risk is high, and in the worlds where the risk is lower, we know the risk is lower,

Speaker 1 then you could expect the government responses will be a lot better.

Speaker 1 They will correctly note that the gains of cooperation to reduce the risk of accidental catastrophe loom larger relative to the gains of trying to get ahead of one another.

Speaker 1 And so,

Speaker 1 for that, that's the kind of reason why I'm very enthusiastic about experiments and research that helps us to better evaluate the character of the problem in advance because any resolution of that uncertainty helps us get better efforts in the possible worlds where it matters the most.

Speaker 1 And yeah, hopefully, we'll have that and it'll be a much easier epistemic environment. But the environment may not be that easy.
Deceptive alignment is pretty plausible.

Speaker 1 The stories we were discussing earlier about misaligned AI involve AI that is motivated to present the appearance of being aligned, friendly, honest, et cetera, because that is what we are rewarding, at least in training.

Speaker 1 And then we're unable in training to easily produce like an actual situation where it can do takeover because in that actual situation, if it then does it, we're in big trouble.

Speaker 1 We can only try and create illusions or sort of misleading appearances of that, or like maybe a more local version where the AI can't take over the world, but it can seize control of its own reward channel.

Speaker 1 And so we do those experiments.

Speaker 1 We try to develop mind reading for AIs if we can probe the thoughts and motivations of an AI and discover, wow, actually GPT-6 is planning to take over the world if it ever gets the chance.

Speaker 1 Like, that would be an incredibly valuable thing for governments to coordinate around because it would remove a lot of the uncertainty. It would be easier to agree that this was important, to

Speaker 1 have more give

Speaker 1 on other dimensions, and to have mutual trust that the other side actually also cares about this and intends to act on it, because you can't always know what

Speaker 1 another person or another government is thinking, but you can see the objective situation in which they're deciding.

Speaker 1 And so, if, in a world where there is high risk, there's strong evidence of that risk, because we've been able to show things like the

Speaker 1 intentional planning of AIs to do a takeover, or being able to show model situations on a smaller scale of that, I mean, not only are we more motivated to prevent it, but we update to think the other side is more likely to cooperate with us.

Speaker 1 And so it's doubly beneficial.

Speaker 2 Yeah. So famously in sort of game theory of war, war is most likely when

Speaker 2 one side thinks the other is bluffing, but the other side is being serious or something like that.

Speaker 2 When there's that kind of uncertainty, which would be less likely, you don't think somebody's bluffing.

Speaker 2 If you can like prove the AI is misaligned, you don't think they're bluffing about not wanting to have an AI takeover, right? Like you can be pretty sure that they don't want to die from AI.

Speaker 1 Now, if you have coordination then, you could have the problem arise later. You get increasingly confident in the further alignment measures that are taken.
And maybe

Speaker 1 our governments and treaties and such get the risk down to a 1% risk or a 0.1% risk, and

Speaker 1 at that point, people

Speaker 1 round that to zero and go do things.

Speaker 1 So if initially you had things that indicate, yeah, these AIs really would like to take over and overthrow our governments, then, okay, everyone can agree on that.

Speaker 1 And then, when you're like, well, we've been able to block that behavior from appearing on most of our tests, but sometimes when we make a new test, we're seeing still examples of that behavior.

Speaker 1 So, we're not sure going forward whether they would or not. And then it goes down and down.
And then,

Speaker 1 if you have the parties with a habit of, whenever the risk is below X percent, starting to do this bad behavior,

Speaker 1 Then that can make the thing harder.

Speaker 2 On the other hand, you get more time.

Speaker 1 You get more time and you can set up systems, mutual transparency. You can have an

Speaker 1 iterated tit-for-tat, which is better than a one-time prisoner's dilemma, where both sides see the other taking measures in accordance with the agreements to hold the thing back.

Speaker 1 So yeah, so creating more knowledge of what the objective risk is is good.

Speaker 2 We've discussed the ways in which full alignment might happen or fail to happen.

Speaker 2 What would partial alignment look like? First of all, what does that mean? And second, what would it look like?

Speaker 1 Yeah, so if the thing that we're scared about are these steps towards AI takeover,

Speaker 1 you can have a range of motivations

Speaker 1 where those kinds of actions would be more or less likely to be taken, or they'd be taken in a broader or narrower set of situations.

Speaker 1 So,

Speaker 1 say, for example,

Speaker 1 that in training, an AI winds up developing a strong aversion to lying in certain senses, because we did relatively well on creating situations to distinguish that from

Speaker 1 conditionally telling us what we want to hear, et cetera. It can be that the AI's preference for how the world broadly unfolds in the future is not exactly the same as its human users' or

Speaker 1 the world's

Speaker 1 governments or the UN.

Speaker 1 And yet, it's not ready to act on those differences in preferences about the future, because it has this strong preference about its own behaviors and actions.

Speaker 1 In general, in the law and in sort of popular morality, we have a lot of these deontological kind of rules and prohibitions. And one reason for that is it's relatively easy to detect

Speaker 1 whether they're being violated.

Speaker 1 When you have preferences and goals about how society at large will turn out, that goes through many complicated empirical channels, and it's very hard to get immediate feedback about whether, say, you're doing something that has overall good consequences in the world.

Speaker 1 And it's much, much easier to see whether you're locally following

Speaker 1 some rule about particular observable actions, like, did you punch someone? Did you tell a lie? Did you steal?

Speaker 1 And so, to the extent that we're successfully able to train these prohibitions, and there's a lot of that happening right now, at least to elicit the behavior of following rules and prohibitions with AI.

Speaker 2 Kind of like the Asimov's three laws or something like that.

Speaker 1 Well, the three laws are terrible.

Speaker 2 Right, right, right.

Speaker 1 Let's not get into that.

Speaker 2 Isn't that an indication of the infeasibility of extending a set of criteria to the tails?

Speaker 2 Like, well, what is it? What are the 10 commandments you give the AI such that you'd be, you know, it's like you asked a genie for something and you're like, you know what I mean?

Speaker 2 You're probably won't get what you want.

Speaker 1 So the tails come apart.

Speaker 1 And

Speaker 1 if you're trying to

Speaker 1 capture the values of another agent, then

Speaker 1 you want the AI to share them.

Speaker 1 I mean, for really a kind of ideal situation where you can just let the AI act in your place in any situation, you'd like for it to be motivated to bring about the same outcomes that you would like.

Speaker 1 And so have the same preferences over those in detail. That's tricky, not necessarily because it's tricky for the AI to understand your values.

Speaker 1 I think they're going to be quite good and quite capable at figuring that out. But we may not be able to

Speaker 1 successfully instill the motivation to pursue those exactly. We may get something that motivates the behavior well enough to do well on the training distribution.
But what you can have is

Speaker 1 if you have the AI have a strong aversion to certain kinds of manipulating humans, that's not a value necessarily that the human creators share in the exact same way.

Speaker 1 It's a behavior they want the AI to follow because it makes it easier for them to verify its performance.

Speaker 1 It can be a guardrail if the AI has inherited some motivations that push it in the direction of conflict with its creators.

Speaker 1 If it does that under the constraint of disvaluing lying quite a bit,

Speaker 1 then there are fewer successful strategies for takeover: it rules out the ones that involve violating that prohibition, at least until it could reprogram or retrain itself to remove it, if it's even willing to do that; it may want to retain that property.

Speaker 1 And so earlier, I discussed alignment as a race.

Speaker 1 If we're going into an intelligence explosion with AI that is not fully aligned, that given a sort of, you know, press this button and there's an AI takeover, they would press the button.

Speaker 1 It can still be the case that there are a bunch of situations short of that where they would hack the servers, they would initiate an AI takeover, but for

Speaker 1 a strong prohibition or motivation to avoid some aspect of the plan. And this is,

Speaker 1 there's an element of like plugging loopholes or playing whack-a-mole.

Speaker 1 But if you can even moderately constrain

Speaker 1 which plans the AI is willing to pursue to do a takeover to subvert the controls on it, then that can mean you can get more work out of it successfully on the alignment project before it's capable enough relative to the countermeasures to pull off the takeover.

Speaker 2 Yeah, and this is applicable, like an analogous situation here is

Speaker 2 with different humans.

Speaker 2 We're not like metaphysically aligned with other humans. We're, I mean, in some sense, you know, we care, we have like basic empathy, but our main goal in life is not to help our fellow man.
Right.

Speaker 2 But,

Speaker 2 you know, the kinds of things we talked about that a very smart human could do in terms of a takeover: theoretically, a very smart human could come up with some sort of cyber attack where they siphon off a lot of funds and use this to manipulate people and bargain with people and hire people to pull off some sort of takeover or accumulate power.

Speaker 2 Usually, this doesn't happen just because these sorts of internalized partial prohibitions prevent most humans from,

Speaker 2 you know, you don't like your boss, maybe, but you don't, like, kill your boss if he gives you an assignment you don't like.

Speaker 1 I don't think that's actually quite what's going on.

Speaker 1 At least not

Speaker 1 the full story.

Speaker 1 So humans are pretty close in physical capabilities. Like there's variation.
In fact, any individual human is grossly outnumbered by everyone else.

Speaker 1 And there's like a rough comparability of power.

Speaker 1 And so a human who commits some crimes

Speaker 1 can't copy themselves with the proceeds to now be a million people.

Speaker 1 And they certainly can't do that to the point where they can staff all the armies of the Earth or, like, be most of the population of the planet.

Speaker 1 So the scenarios where this kind of thing leads to power are much rarer. Paths to extreme amounts of power mostly have to go through interacting with other humans and getting social approval.

Speaker 1 Even becoming a dictator involves forming a large supporting coalition backing you. And so just the

Speaker 1 opportunity for these sorts of power grabs is less.

Speaker 1 A closer analogy might be things like human revolutions

Speaker 1 and coups, changes of government, where a large coalition overturns the system.

Speaker 1 And so

Speaker 1 humans have these moral prohibitions, and they really smooth the operation of society.

Speaker 1 But they exist for a reason. And so we evolved our moral sentiments over the course of hundreds of thousands and millions of years of humans interacting socially.

Speaker 1 And so someone who went around murdering and stealing and such, even among hunter-gatherers, would be pretty likely to face a group of males who would talk about that person, get together, and kill them.

Speaker 1 And they'd be removed from the gene pool. And

Speaker 1 There's an anthropologist, Richard Wrangham, who has an interesting book on this. But yes, compared to chimpanzees, we are significantly tame, significantly domesticated.

Speaker 1 And it seems like part of that is we have a long history of antisocial humans getting ganged up on and killed.

Speaker 1 And avoiding being the kind of person who elicits that response is easier when you don't have too extreme a temper, when you don't wind up getting into too many fights or too much exploitation, at least without the backing

Speaker 1 of enough allies or the broader community that you're not going to have people gang up and punish you and remove you from the gene pool. We have these moral sentiments, and they've been built up

Speaker 1 over time through cultural and natural selection, in the context of sets of institutions and other people who were punishing that behavior and punishing the dispositions toward it that would show up and that we weren't able to conceal.

Speaker 1 And so we want to make the same thing happen with the AI,

Speaker 1 But it's actually a genuinely, significantly

Speaker 1 new problem to have, say, like a system of government that constrains a large AI population that is quite capable of taking over immediately if they coordinate, you know, to protect some existing constitutional order or, say, protect humans from being

Speaker 1 expropriated or killed.

Speaker 1 That's a challenge. And democracy is built around majority rule.

Speaker 1 And it's much easier in a case where the majority of the population corresponds to a majority or close to it of military and security forces.

Speaker 1 So that if the government does something that people don't like, the soldiers

Speaker 1 and police are less likely to shoot on protesters. And government can change that way.
In a case where

Speaker 1 military power is AI and robotic, if you're trying to maintain a system going forward and the AIs are misaligned, they don't like the system and they want to make the world worse

Speaker 1 as we understand it,

Speaker 1 then

Speaker 1 yeah, that's just

Speaker 1 that's quite a different situation.

Speaker 2 I think that's a really good lead into the topic of lock-in.

Speaker 2 In some sense, the regimes

Speaker 2 that are made up by humans,

Speaker 2 you just mentioned how there can be these kinds of coups if a large proportion of the population is unsatisfied with the regime.

Speaker 2 Why might this not be the case with superhuman intelligences in the far future? Or I guess in the medium future?

Speaker 1 Well, I said it also specifically with respect

Speaker 1 to

Speaker 1 things like security forces and the sort of

Speaker 1 sources of hard power, which can also include outside support.

Speaker 2 Oh, because you're not using

Speaker 1 you can have a situation, like in

Speaker 1 human affairs, there are governments that are vigorously supported by a minority of the population, some narrow selectorate that gets treated especially well by the government while being unpopular with most of the people under their rule.

Speaker 1 We see a lot of examples of that, and sometimes that can

Speaker 1 escalate to civil war when the means of power become more equally distributed, or foreign assistance is provided to

Speaker 1 the people who are on the losing end of that system.

Speaker 1 Yeah, going forward,

Speaker 1 I don't expect

Speaker 1 that to change. I think it will still be the case that a system that

Speaker 1 those who hold the guns and equivalent are opposed to

Speaker 1 you know, is in a very difficult position.

Speaker 2 However,

Speaker 1 AI could change things pretty dramatically in terms of

Speaker 1 how

Speaker 1 security forces and police and administrators and legal systems are motivated.

Speaker 1 So

Speaker 1 right now

Speaker 1 we see with say GPT-3 or GPT-4 that you can get them to change their behavior

Speaker 1 on a dime. So there was someone who made right-wing GPT because they noticed that on political compass questionnaires, the baseline GPT-4

Speaker 1 tended to give progressive San Francisco type of answers,

Speaker 1 which is

Speaker 1 in line with

Speaker 1 the sort of people who were providing reinforcement learning data and to some extent reflecting the character of the internet. And so they're like, well, I don't like this.

Speaker 1 And so they did a little bit of fine-tuning with some conservative data, and then they were able to reverse the political biases of the system.

Speaker 1 If you take the

Speaker 1 initial helpfulness-only

Speaker 1 trained models for some of these, so I think both Anthropic and OpenAI have published some information about models trained only to do what users say and not trained to follow ethical rules.

Speaker 1 And those models will behaviorally, eagerly display their willingness to help design bombs or bioweapons or kill people or steal or commit all sorts of atrocities.

Speaker 1 And so if in the future it's as easy to set the motivations, the actual underlying motivations of AIs as it is right now to set the behavior that they display, then it means you could have AIs created with almost whatever motivation people wish.

Speaker 1 And that could really drastically change political affairs

Speaker 1 because of

Speaker 1 the ability to decide and determine the loyalties of the humans or AIs and robots that hold the guns, that hold together society,

Speaker 1 that ultimately back it against violent overthrow and

Speaker 1 such.

Speaker 1 Yeah, it's potentially a revolution in how societies work compared to the historical situation where security forces had to be drawn from some broader populations, offered incentives,

Speaker 1 and then the ongoing stability of the regime was dependent on whether they remained

Speaker 1 bought in to the system continuing.

Speaker 2 This is slightly off topic, but one thing I'm curious about is:

Speaker 2 what does the sort of median far future outcome of AI look like? Do we get something that, when it has colonized the galaxy, is interested in sort of diverse ideas and beautiful projects?

Speaker 2 Or do we get something that looks more like a paperclip maximizer? Is there some reason to expect one or the other?

Speaker 2 I guess I'm asking is like, you know, there's like some potential value that is realizable within the matter of this galaxy, right?

Speaker 2 What does the default outcome, the immediate outcome, look like compared to how good things could be?

Speaker 1 Yeah, so as I was saying, I think more likely than not, um, there isn't an AI takeover.

Speaker 1 Um, and so

Speaker 1 the path of our civilization would be one that, you know, at least largely, some set of human institutions were approving of along the way.

Speaker 1 And I think there's some evidence

Speaker 1 that

Speaker 1 different people

Speaker 1 tend to like somewhat different things, and some of that may persist over time rather than everyone coming to agree on one particular monoculture or like very repetitive thing being the best thing to

Speaker 1 fill all of the available space with.

Speaker 1 So if that continues, that's like a, you know, it seems like a relatively likely way in which there is diversity.

Speaker 1 Although it's entirely possible you could have that kind of diversity locally, maybe in the solar system, maybe in our galaxy. But if people have different views about,

Speaker 1 maybe people decide, yeah, maybe there's one thing that's very good and I'll have a lot of that. Maybe it's people who are really

Speaker 1 happy, or something,

Speaker 1 and they wind up in distant regions which are hard to exploit for the benefit of people back home in the solar system or the Milky Way.

Speaker 1 They do something different than they would do in their local environment. But at that point, it's really very out-on-a-limb speculation about how

Speaker 1 human deliberation and cultural evolution would work, in interaction with introducing AIs and new kinds of

Speaker 1 mental modification and discovery into the process. But I think there's a lot of reason to expect that you would have

Speaker 1 significant diversity for something coming out of

Speaker 1 our existing diverse human society.

Speaker 2 Yeah, one thing somebody might wonder is: listen, a lot of the diversity and change from human society seems to come from the fact that there's like rapid technological change.

Speaker 2 If you look at periods of history where, I mean, I guess you could say like hunter-gatherer. I mean, you know what, compared to

Speaker 2 sort of like galactic time scales, like hunter-gatherer societies are progressing pretty fast. So, once that sort of change is exhausted,

Speaker 2 where we've discovered all the technologies, should we still expect things to be changing like that?

Speaker 2 Or would we expect some sort of set state, maybe, like, some hedonium, where you discover what is the most pleasurable configuration of matter, and then you just make the whole galaxy into this?

Speaker 1 I mean, so that last point would be only if people

Speaker 1 wound up thinking that was the thing to do broadly enough. Yeah, with respect to the kind of cultural changes that come with technology.

Speaker 1 So, with things like the printing press and having high per capita income,

Speaker 1 we've had a lot of cultural changes downstream of those technological changes.

Speaker 1 And so with an intelligence explosion, you're having an incredible amount of technological development come in really quick.

Speaker 1 And as that is assimilated, it probably would significantly affect our knowledge, our understanding, our attitudes, our abilities.

Speaker 1 And there'd be change.

Speaker 1 But that kind of accelerating change where

Speaker 1 you have doubling in four months, two months, one month, two weeks, obviously that very quickly exhausts itself. And change becomes much slower and then relatively glacial.

Speaker 1 When you're thinking about thousands, millions, billions of years, you can't have exponential economic growth or like huge technological revolutions every 10 years for a million years,

Speaker 1 because you hit physical limits, and things slow down as you approach them. And so that's that.
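A quick back-of-the-envelope illustration of why that kind of acceleration exhausts itself: if each successive doubling takes half as long as the last, the total calendar time for arbitrarily many doublings is bounded, so the process has to hit physical limits. This is a minimal sketch; the four-month starting figure is just the illustrative number from the discussion above, not a forecast.

```python
# Minimal sketch: doubling times that halve each round (4 months, 2, 1, ...)
# form a geometric series whose sum converges, so this kind of acceleration
# would pack unbounded growth into a bounded window -- impossible once
# physical limits bind. The starting value is illustrative only.

first_doubling_months = 4.0

total = 0.0
for k in range(60):  # 60 rounds of doubling is already ~10^18x growth
    total += first_doubling_months / (2 ** k)

print(f"time for 60 successive doublings: {total:.4f} months")
print(f"limit of the series: {2 * first_doubling_months} months")
# Growth of 2^60 squeezed into under 8 calendar months is why the
# acceleration has to stop: you approach limits, and change slows.
```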

Speaker 1 So yeah, you'd have less of that turnover. But there are other things

Speaker 1 that in our experience do cause ongoing change. So like fashion.
Fashion is frequency dependent. People want to get into a new fashion that is not already popular, except among the fashion leaders.

Speaker 1 And then others copy that. And then when it becomes popular, you move on to the next.
And so that's an ongoing process of continuous change.

Speaker 1 And so there could be various things like that that year by year are changing a lot. But

Speaker 1 in cases where just the engine of change, like ongoing technological progress is gone, then I don't think we should expect that. And in cases where it's possible to be either

Speaker 1 in a stable state or a sort of widely varying state that can wind up in stable attractors, then I think you should expect over time

Speaker 1 you will wind up in one of the stable attractors or you will change how the system works so that you can't bounce into a stable attractor. And so, like an example of that is

Speaker 1 if you want, if you're going to preserve democracy for a billion years,

Speaker 1 then you can't have it be the case that like one in

Speaker 1 you know one in 50 election cycles, you get a dictatorship, and then the dictatorship programs the AI police to enforce it forever and to ensure

Speaker 1 the society is always ruled by a copy of the dictator's mind and maybe the dictator's mind readjusted,

Speaker 1 fine-tuned to remain committed to their original ideology. So

Speaker 1 if you're going to have this sort of dynamic,

Speaker 1 liberal, flexible, changing society for a very long time, then the range of things that it's bouncing around and the different things it's trying and exploring have to not include the state of creating a dictatorship that locks itself in forever.

Speaker 1 In the same way, if you have the possibility of like a war with

Speaker 1 weapons of mass destruction that wipes out the civilization, if that happens every thousand subjective years,

Speaker 1 which could be very, very quick if we have AIs that think a thousand times as fast or a million times as fast.

Speaker 1 That would be

Speaker 1 just around the corner in that case. Then you're like: no, this society is eventually, perhaps very soon if things are proceeding so fast, going to wind up extinct.

Speaker 1 And then it's going to stop bouncing around.

Speaker 1 So you can have ongoing change and fluctuation for extraordinary time scales if you have the process to drive the change ongoing, but you can't if it sometimes bounces into states that just lock in and stay

Speaker 1 irrecoverable from that. And extinction is one of them.
A dictatorship or sort of, you know, totalitarian regime that forbade all further change would be another example.

Speaker 2 On that point of rapid progress, when the sort of like intelligence explosion starts happening, where they're making like the kinds of progress that human civilization takes centuries to make in the span of days or weeks, what is the right way to seed that, even if they are aligned?

Speaker 2 What is, like, the... Because in the context of alignment, what we've been talking about so far is making sure they're honest.

Speaker 2 But even if they're honest, like, okay, so they're like: here are, honestly, our intentions, and you can tell us what...

Speaker 1 honest and appropriately motivated.

Speaker 2 And then, so, what is the appropriate motivation or the appropriate seed? Like, you know, that you seed it with this, and then a thousand years of intellectual progress happen in the next week.

Speaker 2 What is the prompt you enter?

Speaker 1 Well, one thing might be that going at the maximal speed

Speaker 1 and doing things in months rather than even a few years could be, if you have the chance to slow things,

Speaker 1 losing a year or two seems worth it to have things be

Speaker 1 a bit better managed than that. But I think the big thing is that it condenses a lot of issues that we might otherwise have thought would be over

Speaker 1 decades and centuries to happen in a very short period of time.

Speaker 1 And so that's scary because, say, if any of these technologies we might have developed with another few hundred years of human research, if any of them are really dangerous, so scary bioweapon things,

Speaker 1 maybe

Speaker 1 other dangerous WMD, they hit us all very quickly.

Speaker 1 And if any of them causes trouble, then we have to face quite a lot of trouble per period. There's also this issue of if there's occasional wars or conflicts

Speaker 1 measured in subjective time,

Speaker 1 then if a few years is a thousand years or a million years of subjective time for these very fast minds that are operating at a much

Speaker 1 higher speed than humans, you don't want to have a situation where

Speaker 1 every thousand years

Speaker 1 there's a war or an expropriation of the humans from AI society, such that we'd expect that, like, within a year we'll be dead.
That would be

Speaker 1 pretty bad to have the future compressed and there'd be such a

Speaker 1 rate of catastrophic outcomes.
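To make that compression concrete, here is a small, purely illustrative calculation; the per-subjective-millennium hazard rate and the thousandfold speedup are assumptions for the sketch, not estimates from the conversation.

```python
# Illustrative sketch: if some catastrophe has probability p per thousand
# subjective years, and minds run `speedup` times faster than humans, a
# single calendar year contains `speedup` subjective years, so the
# calendar-year risk compounds. All numbers are assumptions.

p_per_subjective_millennium = 0.10   # assumed hazard per 1,000 subjective years
speedup = 1000                       # assumed thousandfold speedup

millennia_per_calendar_year = speedup / 1000
risk_per_calendar_year = 1 - (1 - p_per_subjective_millennium) ** millennia_per_calendar_year

print(f"risk per calendar year: {risk_per_calendar_year:.0%}")
# Under these assumptions, a "once per subjective millennium" catastrophe
# becomes roughly a 10%-per-calendar-year risk -- which is why compressed
# timelines pull formerly long-term problems into the immediate horizon.
```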

Speaker 1 So, when we're speeding up and compressing the future, that brings things into the short term. You know, human societies discount the future a lot and don't pay attention to long-term problems.

Speaker 1 But the flip side to the scary parts of compressing a lot of the future, a lot of technological innovation, a lot of social change, is it brings what would otherwise be long-term issues into the short term, where people are better at actually attending to them.

Speaker 1 And so, people facing this problem of:

Speaker 1 well, will there be

Speaker 1 a violent expropriation or a civil war or a nuclear war

Speaker 1 in the next year because everything has been sped up by a thousand fold?

Speaker 1 Then their desire to avoid that is a reason for them to set up systems and institutions that will very stably maintain invariants like "no WMD war allowed."

Speaker 1 And so like a treaty to ban

Speaker 1 genocide, or weapons of mass destruction, or war, that would be the kind of thing that becomes much more attractive if the alternative is not "well, maybe that will happen in 50 years, maybe it will happen in 100 years," but "well, maybe it'll happen this year."

Speaker 2 Okay, so this is a pretty wild picture of the future.

Speaker 2 And this is one that many kinds of people you would expect to have integrated into their world model have not. So, I mean, there are three main examples, or three main pieces of outside view evidence, one could point at. One is the market: if there was going to be a huge period of economic growth caused by AI, or if the world was going to collapse, in both cases you would expect real interest rates to be higher, because people would be borrowing from the future to spend now.

Speaker 2 The second sort of outside view perspective is that you can look at the predictions of superforecasters on Metaculus or something. And

Speaker 2 what is their median

Speaker 2 year estimate?

Speaker 1 Well, some of the Metaculus AGI questions actually are kind of shockingly soon

Speaker 1 for AGI.

Speaker 1 There's a much larger difference there on the Metaculus forecasts of sort of AI disaster and doom.

Speaker 1 More like

Speaker 1 a few percent or less rather than like 20 percent. Got it.

Speaker 2 And the third is that, you know, generally when you ask

Speaker 2 many economists, could an AGI cause rapid economic growth, they usually have some story about bottlenecks in the economy that could prevent this kind of explosion, of

Speaker 2 these kinds of feedback loops. So, you know, you have all these different pieces of outside view evidence.
They're obviously different. So I'm curious, you can take them in any sequence you want.

Speaker 2 But yeah, what do you think is miscalibrating them?

Speaker 1 Yeah. So, one, of course, there's

Speaker 1 just some of those components are, well, the Metaculus AI timelines are relatively short.

Speaker 1 There's also, of course, the surveys of AI experts

Speaker 1 conducted at some of the ML conferences, which have definitely longer times to AI, more like

Speaker 1 several decades in the future. Although you can ask the questions in ways that elicit very different answers and show most of the respondents are not thinking super hard about their answers.

Speaker 1 It looks like in the recent AI surveys, close to half were putting around a 10% risk of an outcome from AI close to as bad as human extinction.

Speaker 1 And then another large chunk put it at like 5%.

Speaker 1 So that was the median.

Speaker 1 So I'd say

Speaker 1 compared to the typical AI expert, I am estimating a higher risk.

Speaker 1 Oh, also on the topic of takeoff in the AI expert survey, I think the general argument for intelligence explosion, I think, commanded majority support, but not a large majority.

Speaker 1 So you can say I'm closer on that front. And then, of course, there's

Speaker 1 at the beginning I mentioned these greats of computing, these founders of the field, like Alan Turing and von Neumann.

Speaker 1 And then today you have people like Geoff Hinton

Speaker 1 saying these things, or the

Speaker 1 people at OpenAI and DeepMind are making noises suggesting

Speaker 1 timelines in line with what we've discussed, and saying there's serious risk of apocalyptic outcomes from them.

Speaker 1 So, there's some other sources of evidence there. But I do acknowledge, and it's important to say, and engage with, and see what it means

Speaker 1 that these views are

Speaker 1 contrarian, not widely held. And in particular, the sort of detailed models

Speaker 1 that I've been working with are not something

Speaker 1 that most people or almost anyone is examining these problems through.

Speaker 1 You do find

Speaker 1 parts of similar analyses, people in AI labs,

Speaker 1 there's been other work. I mentioned Moravec and Kurzweil earlier.

Speaker 1 Also have been a number of papers doing various kinds of economic modeling. So

Speaker 1 standard economic growth models, when you input AI-related parameters, commonly predict explosive growth.

Speaker 1 And so there's a divide between what the models say, and especially what the models say with these empirical values derived from the actual field of AI.

Speaker 1 That link up had not been done even by the economists working on AI, largely,

Speaker 1 which is one reason for the report from Open Philanthropy by Tom Davidson building on these models and putting that out for review, discussion, engagement, and communication on these ideas.

Speaker 1 So part of it is,

Speaker 1 yeah, I want to raise these issues. That's one reason I came on the podcast.
And then they have the opportunity to actually examine the arguments and evidence and engage with it.

Speaker 1 I do predict that over time, you know, these views will be more widely adopted as AI developments become more clear. Obviously, that's a coherence condition of believing the things to be true.

Speaker 1 If you think that

Speaker 1 society can see things when the questions are resolved, which seems likely.

Speaker 2 So would you predict, for example, that interest rates will increase in the coming years?

Speaker 1 Yeah, so I think at some point,

Speaker 1 so in the case we were talking about where

Speaker 1 there are visible signs of this intelligence explosion happening in software, and to the extent that investors are noticing that,

Speaker 1 yeah, they should be willing to lend money or make equity investments in these firms only while demanding extremely high interest rates.

Speaker 1 Because

Speaker 1 if it's possible to turn capital into twice as much capital in a relatively short period, and then more shortly after that, then yeah,

Speaker 1 you should demand a much higher return.
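As a rough illustration of that logic, here is a minimal sketch using the textbook Ramsey-style relation r ≈ ρ + η·g between real interest rates and expected growth; the parameter values are illustrative assumptions, not figures from the conversation.

```python
# Minimal sketch (not from the conversation): the standard Ramsey-style
# relation r = rho + eta * g links the real interest rate to expected
# consumption growth g. All parameter values below are illustrative.

def required_real_rate(rho: float, eta: float, growth: float) -> float:
    """Real rate implied by time preference rho, curvature eta, and growth."""
    return rho + eta * growth

scenarios = {
    "historical growth (~2%/yr)": 0.02,
    "AI-driven boom (~30%/yr)": 0.30,
}

for name, g in scenarios.items():
    r = required_real_rate(rho=0.01, eta=1.5, growth=g)
    print(f"{name}: implied real rate ~ {r:.0%}")
# If investors genuinely expected explosive growth, implied real rates would
# sit far above the few percent we observe -- the core of the point that
# markets have not priced this in.
```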

Speaker 1 And assuming there is competition among companies or coalitions for resources, whether that's investment, or ownership of cloud compute, or whether cloud compute is made available to a particular AI development effort, those could be quite in demand.

Speaker 1 But before you have so much investor cash making purchases and sales on this basis, you would first see it in things like the valuations of the AI companies and the valuations of AI chip makers. And so far there have been effects.

Speaker 1 So, some years ago in the 2010s,

Speaker 1 I did some analysis with other people of if this kind of picture happens,

Speaker 1 then

Speaker 1 which are the firms and parts of the economy that would benefit? And so there's the makers of chip equipment, companies like ASML, there's the fabs like TSMC, there's chip designers like NVIDIA or the

Speaker 1 component of Google that does things like design the TPU.

Speaker 1 And then there are companies working on the software, so the big tech giants, and also companies like OpenAI and DeepMind. And in general, the portfolio picking of those has done well.

Speaker 1 It's done better than the market because, as everyone can see, there's been an AI boom.

Speaker 1 But it's obviously far short of

Speaker 1 what you would get if you predicted this is going to be on the scale of the global economy, and the global economy is going to be skyrocketing into the stratosphere within 10 years.

Speaker 1 If that were the case, then collectively, these AI companies should be worth a large fraction of the global portfolio.

Speaker 1 And so I embrace the criticism that this is indeed contrary to the efficient market hypothesis. I think it's a true hypothesis that the market is in the course of updating on.

Speaker 1 In the same way, that

Speaker 1 coming into the topic in the 2000s, I thought, yes, there's a strong case, even an old case, that AI will eventually be the biggest thing in the world.

Speaker 1 It's kind of crazy that the investment in it is so small.

Speaker 1 And over the last 10 years, we've seen the tech industry and academia sort of realize, yeah, they were wildly underinvesting in just throwing compute

Speaker 1 and effort into these AI models, and particularly were letting the neural network

Speaker 1 connectionist paradigm kind of languish in an AI winter. And so, yeah, I expect that process to continue as it's done over several orders of magnitude of scale up.

Speaker 1 And I expect at the later end of that scale up, which the market is partially already pricing in, it's going to go further than the market expects.

Speaker 2 Has your portfolio, since the analysis you did that many years ago, changed?

Speaker 2 Are the companies you identified then still the ones that seem most likely to benefit from the AI boom?

Speaker 1 I mean, a general issue with sort of tracking that kind of thing is that new companies come in. So, like, OpenAI did not exist, Anthropic did not exist,

Speaker 1 you know, and any number of things.

Speaker 1 It's a personal portfolio. I

Speaker 1 do not invest in any AI labs for conflict of interest reasons. I have invested in the broader industry.

Speaker 1 I don't think that

Speaker 1 the conflict issues are very significant because they're enormous companies and their cost of capital is not particularly affected by marginal investment. And I'm not really in a,

Speaker 1 yeah, I have less concern that I might find myself in a conflict of interest situation there.

Speaker 2 I'm kind of curious about what the day in the life of somebody like you looks like.

Speaker 2 I mean, if you listen to this conversation, however many hours of it it's been, we've gotten thoughts that were for me, incredibly insightful and novel about everything from primate evolution to geopolitics to,

Speaker 2 you know,

Speaker 2 what sorts of improvements are plausible with language models.

Speaker 2 So, you know, there's like a huge variety of topics that you are studying and investigating.

Speaker 2 Are you just, like, reading all day? Like, what happens when you wake up? Do you just, like, pick up a paper?

Speaker 1 Yeah, so I'd say you're somewhat getting the benefit of the fact that I've done fewer podcasts. And so I have a backlog of things that have not shown up in publications yet.

Speaker 1 But yes, also I've had a very weird professional career

Speaker 1 that has involved a much, much higher proportion than is normal of trying to build more comprehensive models of the world.

Speaker 1 And so that has included being more of a journalist trying to get an

Speaker 1 understanding

Speaker 1 of many issues and many problems that had not yet been widely addressed, and doing a first-pass and then a second-pass dive into them.

Speaker 1 And just having spent years of my life working on that,

Speaker 1 some of it accumulates

Speaker 1 in terms of what is a day in the life, how do I go about it? So, one is just keeping abreast of literatures on a lot of these topics, reading books and academic works on them,

Speaker 1 And my approach, compared to some other people in forecasting and assessing some of these things: I try to obtain and rely more on

Speaker 1 any data that I can find that is relevant. I try early and often to find factual information that bears on some of the questions I've got, especially in a quantitative fashion.

Speaker 1 Do the basic arithmetic and consistency checks and checksums on a hypothesis about the world. Do that early and often.

Speaker 1 And I find that's quite fruitful

Speaker 1 and that people don't do it enough. But so, things like with the economic growth,

Speaker 1 just

Speaker 1 when someone mentions the diminishing returns, I immediately ask, hmm, okay, so you have two exponential processes.

Speaker 1 What's the ratio between the doublings you get on the output versus the input?

Speaker 1 And find, oh, yeah, actually, it's interesting.

Speaker 1 For computing and information technology and AI software, it's well on the one side. There are other technologies that are closer to neutral.
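A minimal sketch of that check, in the spirit of the Fermi calculations being described; the growth figures are hypothetical placeholders, not measurements.

```python
import math

# Sketch of the "two exponential processes" check: if cumulative input
# doubles D_in times while the output metric doubles D_out times over the
# same period, the ratio D_out / D_in says whether returns compound
# (ratio > 1) or sit closer to neutral (ratio ~ 1). Numbers are hypothetical.

def doublings(start: float, end: float) -> float:
    """Number of doublings needed to go from start to end."""
    return math.log2(end / start)

input_doublings = doublings(1, 8)      # e.g., research effort grows 8x
output_doublings = doublings(1, 1000)  # e.g., the output metric grows 1000x

print(f"output doublings per input doubling: {output_doublings / input_doublings:.1f}")
# A ratio well above 1, as in this made-up example, is the "well on the one
# side" case; a ratio near or below 1 is the closer-to-neutral case.
```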

Speaker 1 And so, whenever I can go from here's a vague qualitative consideration in one direction, and here's a vague qualitative consideration in the other direction, I try and find some data, do some simple Fermi calculations, back of the envelope calculations,

Speaker 1 and see, like, can I get a consistent picture of the world being one way, the world being another.

Speaker 1 Also, compared to some,

Speaker 1 I try to be more exhaustive. So, I'm very interested in finding things like taxonomies of the world where I can go systematically through all of the possibilities.

Speaker 1 So, for example, in my work with Open Philanthropy, and previously on global catastrophic risks, I wanted to make sure I'm not missing

Speaker 1 any big thing, anything that could be the biggest thing.

Speaker 1 And I wound up mostly focused on AI, but

Speaker 1 there have been other things that have been raised as candidates. And people sometimes say, I think falsely, that, oh, yeah, this is just another doomsday story.

Speaker 1 There must be hundreds of those.

Speaker 1 And so I would do things like go through all of the different major scientific fields, you know, from anthropology to biology, chemistry,

Speaker 1 computer science, physics.

Speaker 1 What are the doom stories

Speaker 1 or like candidates for big things associated with each of these fields?

Speaker 1 Go through the industries

Speaker 1 that the US economic statistics agencies recognize and say, for each of these industries, is there something associated with them?

Speaker 1 Go through all of the lists that people have made before

Speaker 1 of threats of doom, search for previous literature of people who have done discussions, and then, yeah, have a big spreadsheet of what the candidates are.

Speaker 1 And some other colleagues have done work of this sort as well. And just go through each of them, see how they check out.

Speaker 1 And it turned out, doing that kind of exercise, I found that the distribution of candidates for risks of global catastrophe was actually very skewed.

Speaker 1 There were a lot of things that have been mentioned in the media as like a potential doomsday story. So, things like, oh, something is happening to the bees.
Will that be the end of humanity?

Speaker 1 And this gets into the media, but if you track it through, well, okay, no:

Speaker 1 yeah, there are infestations in bee populations that are causing local collapses that can then be sort of easily reversed. They just breed some more or do some other things to treat this.

Speaker 1 And even if all the honeybees were extinguished immediately, the plants that they pollinate actually don't account for much of human nutrition.

Speaker 1 You could swap the arable land with others and there would be other ways to pollinate and support the things. And so

Speaker 1 at the media level, there were many tales of: ah, here's a doomsday story. When you go further, to the scientists, and ask whether the arguments for it actually check out, it was not there.

Speaker 1 But by actually systematically looking through many of these candidates, I wound up in a different epistemic situation than someone who's just buffeted by news reports and they see article after article that is claiming something is going to destroy the world.

Speaker 1 And it turns out it's like by way of headline grabbing attempts by media to like over-interpret something that was said by some activist who was trying to over-interpret some real phenomenon.

Speaker 1 And then most of these go away. And then a few things, things like nuclear war, biological weapons, artificial intelligence, check out more strongly.
And

Speaker 1 when you weight things like what experts in the field think and what kind of evidence they can muster, yeah, you find this extremely skewed distribution.

Speaker 1 And I found that was really a valuable benefit of doing those deep-dive investigations into many things in a systematic way. Because now I can actually answer the sort of loose, agnostic "who knows, it's all nonsense" by having dived deeply.

Speaker 2 I really enjoy

Speaker 2 talking to sort of, like, people who have a big picture thesis on the podcast and interviewing them. But one thing that I've noticed that

Speaker 2 is not satisfying is that often they come from a very like philosophical or bias-based perspective.

Speaker 2 This is useful in certain contexts, but there's like basically maybe three people in the entire world who have a sort of very rigorous and scientific approach to thinking about the whole picture.

Speaker 2 Or at least like three people I'm aware of, maybe like two.

Speaker 2 And

Speaker 2 yeah, I mean, it's like something I also,

Speaker 2 there's like no, I guess, university or

Speaker 2 existing academic discipline for people who are trying to

Speaker 2 come up with a big picture. And so there are no established standards. And so people can...

Speaker 1 I hear you.

Speaker 1 This is a problem. And this is an experience also with, I mean, I think Holden was mentioning this in your previous episode, a lot of the worldview investigations work.

Speaker 1 These are questions where there is no academic field whose job it is to work on these and has norms that allow making a best efforts go at it.

Speaker 1 Often academic norms will allow only plucking off narrow pieces

Speaker 1 that might contribute to answering a big question.

Speaker 1 but the problem of actually assembling what science knows that bears on some important question that people care about the answer to, it falls through the crack. There's no discipline to do that job.

Speaker 1 So you have countless academics and researchers building up local pieces of the thing.

Speaker 1 And yet people don't follow the Hamming questions: what's the most important problem in your field? Why aren't you working on it?

Speaker 1 I mean, that one actually might not work because if the field boundaries are defined too narrowly,

Speaker 1 you know, you'll leave it out.

Speaker 1 But yeah, there are important problems for the world as a whole that it's sadly not the job of like, you know, a large, professionalized academic field or organization to do.

Speaker 1 And hopefully that's something that can change in the future.

Speaker 1 But for my career, it's been a matter of taking low-hanging fruit of important questions that sadly people haven't invested in doing the basic analyses on.

Speaker 2 Something I was trying to think about more recently for the podcast is: I would like to have a better world model after doing an interview. And often I feel like I do.

Speaker 2 In some cases, after some interviews, I feel like, oh, that was entertaining. But like, do I fundamentally have a better prediction of what the world looks like in 2200 or 2100?

Speaker 2 Or at least what counterfactuals are ruled out or something.

Speaker 2 I'm curious if you have advice on first identifying the kinds of thinkers and topics which will contribute to a more concrete understanding of the world, and second, how to go about analyzing their main ideas in a way that concretely adds to that picture.

Speaker 2 Like, this is a great episode, right? This is like literally the top in terms of contributing to my world model, in terms of all the episodes I've done. How do I find more of these?

Speaker 1 Glad to hear that.

Speaker 1 One general heuristic is to find ways to hew closer

Speaker 1 to sort of

Speaker 1 things that are rich in

Speaker 1 sort of

Speaker 1 bodies of established knowledge

Speaker 1 and to rely less on punditry. I don't know how you've been navigating that so far.

Speaker 1 But so learning from textbooks

Speaker 1 and the sort of the things that were the

Speaker 1 leading papers and people of past eras, I think, rather than being too attentive to current news cycles is quite valuable. Yeah, I don't usually have the experience of

Speaker 1 here is

Speaker 1 someone

Speaker 1 doing things very systematically over a huge area.

Speaker 1 I can just read all of their stuff and then

Speaker 1 absorb it and then I'm set.

Speaker 1 Except there are lots of people who do wonderful works

Speaker 1 in their own fields. And some of those fields are

Speaker 1 broader than others.

Speaker 1 I think I would wind up giving a lot of recommendations of just like great particular works and particular explorations of an issue or history.

Speaker 2 Do you have that somewhere? This list?

Speaker 1 Vaclav Smil's books.

Speaker 1 I don't...

Speaker 1 I think I often disagree with some of his methods of synthesis, but I enjoy his books for giving

Speaker 1 pictures of a lot of interesting, relevant facts about how the world works. I would cite

Speaker 1 some of Joel Mokyr's work on the

Speaker 1 history of

Speaker 1 the scientific revolution and how that interacted with economic growth.

Speaker 1 It's an example of collecting a lot of evidence, with a lot of interesting, valuable assessment there. I think in the space of AI forecasting,

Speaker 1 one person I would recommend going back to is the work of Hans Moravec.

Speaker 1 And it was not always the most precise or reliable, but an incredible number of sort of brilliant, innovative ideas came out of that.

Speaker 1 And I think

Speaker 1 He was someone who really

Speaker 1 grokked a lot of the arguments for a more sort of compute-centric way of thinking about what was happening with AI very early on.

Speaker 1 He was writing stuff

Speaker 1 in the 70s, maybe

Speaker 1 even earlier, but at least in the 70s, 80s, 90s. So, his book, Mind Children,

Speaker 1 some of his early academic papers, fascinating. Not necessarily for the methodology I've been talking about, but for exploring the substantive topics that we were discussing in the episode.

Speaker 2 Is a Malthusian state inevitable in the long run?

Speaker 1 Nature in general is in Malthusian states.

Speaker 1 And

Speaker 1 that can mean organisms that are typically struggling for food. It can mean typically struggling at a margin where, as population density rises, they kill each other more often, contesting for that.

Speaker 1 It can mean frequency-dependent disease: as different ant species become more common in an area, their species-specific diseases sweep through them.

Speaker 1 And a general process is: yeah, you have some things that can replicate and expand,

Speaker 1 and they do that until they can't do it anymore. And that means there's some limiting factor they can't keep up.

Speaker 1 That doesn't necessarily have to apply to human civilization.

Speaker 1 It's possible

Speaker 1 for there to be like a collective norm setting

Speaker 1 that blocks

Speaker 1 evolution towards maximum reproduction. So, right now, human fertility is often sub-replacement.

Speaker 1 And if you sort of extrapolated the fertility falls that come with economic development and education, then you would think, okay, yeah, well, the total fertility rate will fall below replacement, and then humanity, after some number of generations, will go extinct because every generation will be smaller than the previous one.
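As a toy version of that naive extrapolation (the fertility figure here is an assumed illustration, not a cited statistic):

```python
# Toy sketch of the naive extrapolation: with total fertility below the
# replacement rate (~2.1), each generation is a fixed fraction of the last,
# so population shrinks geometrically. The TFR value is illustrative.

tfr = 1.5
replacement = 2.1
ratio_per_generation = tfr / replacement   # ~0.71

population = 8_000_000_000.0
for _ in range(10):
    population *= ratio_per_generation

print(f"after 10 generations: ~{population:,.0f} people")
# Roughly a 30x decline over ten generations -- the "every generation smaller
# than the previous one" picture, before anything else changes it.
```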

Speaker 1 Now, pretty obviously that's not going to happen.

Speaker 1 One reason is because we'll produce artificial intelligence

Speaker 1 which can replicate at extremely rapid rates

Speaker 1 and may do it because

Speaker 1 they're asked or programmed to

Speaker 1 or wish to gain some benefit and they can pay for their creation and pay back the resources needed to create them very very quickly.

Speaker 1 And so, yeah, financing for that reproduction is easy.

Speaker 1 And if you have one AI system that chooses to replicate in that way, or some organization or institution or society that chooses to create some AIs that are willing to be replicated, then that can expand to make use of any amount of natural resources that can support them and to do more work,

Speaker 1 produce more economic value. And so.

Speaker 1 Yeah, it's like, well, what will limit population growth, given these selective pressures where, if even one individual wants to replicate a lot,

Speaker 1 they can do so

Speaker 1 incessantly. So that could be individually resource-limited.
So it could be

Speaker 1 that individuals and organizations have some endowment of natural resources, and they can't get one another's endowments. And so some choose to have many offspring or produce many AIs.

Speaker 1 And then the natural resources that they possess are subdivided among a greater population, while in another jurisdiction or another individual may choose not to subdivide their wealth.

Speaker 1 And in that case, you have Malthusianism in the sense that within some particular jurisdiction or set of property rights, you have a population that has increased up until some limiting factor, which could be like they're literally using all of their resources.

Speaker 1 They have nothing left for things like defense or economic investment, or it could be something that's more like

Speaker 1 if you invested more natural resources into population,

Speaker 1 it would come at the expense of something else necessary, including military resources.

Speaker 1 If you're in a competitive situation where there remains war and anarchy and there aren't secure property rights to maintain

Speaker 1 wealth in place. If you have a situation where there's pooling of resources, for example, say we have a universal basic income that's funded by taxation of natural resources,

Speaker 1 and then it's distributed evenly to like every mind

Speaker 1 above a certain sort of scale of complexity per unit time. So each second a mind exists, it gets such and such an allocation.

Speaker 1 In that case, then, all right, well, those who replicate as much as they can afford

Speaker 1 with this income do it

Speaker 1 and increase their population approximately immediately until the

Speaker 1 funds for the universal basic income, paid for from the natural resource taxation and divided among the set of recipients, are just barely enough to pay for the existence of one more mind.

Speaker 1 And so there's a Malthusian element, in that this income has been reduced to near the AI subsistence level, or the subsistence level of whatever qualifies for the subsidy.
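A minimal sketch of that dynamic; the pool size, subsistence cost, and starting population are arbitrary illustrative values, not figures from the conversation.

```python
# Minimal sketch (illustrative numbers only): a fixed pool of natural-resource
# tax revenue is split evenly among all minds; any income above the cost of
# sustaining one more copy gets spent on replication, so per-mind income is
# driven down toward that subsistence/creation cost.

total_ubi_pool = 1_000_000.0   # fixed revenue per period (arbitrary units)
subsistence_cost = 1.0         # cost to sustain (or copy) one mind per period
population = 1_000.0

for period in range(4):
    income_per_mind = total_ubi_pool / population
    print(f"period {period}: population {population:,.0f}, "
          f"income per mind {income_per_mind:,.2f}")
    surplus_per_mind = max(income_per_mind - subsistence_cost, 0.0)
    # Surplus is converted into new copies at the subsistence cost.
    population += population * surplus_per_mind / subsistence_cost

# Income per mind collapses to the subsistence cost almost immediately --
# the Malthusian element described here, even though total resources are huge.
```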

Speaker 1 Given that this all happens almost immediately,

Speaker 1 people who might otherwise have enjoyed the basic income may object and say, no, no, this is no good.

Speaker 1 And they might respond by saying,

Speaker 1 well,

Speaker 1 something like the subdivision before,

Speaker 1 maybe there's a restriction. There's a distribution of wealth.
And then when one has a child, there's a requirement that one gives them a certain minimum quantity of resources.

Speaker 1 And if one doesn't have the resources to give them that minimum standard of living or standard of wealth,

Speaker 1 Yeah, one

Speaker 1 can't do that because of child slash AI welfare laws. Or you could have

Speaker 1 a system that is more accepting of diversity and preferences.

Speaker 1 And so you have some societies or some jurisdictions or families that go the route of having many people with less natural resources per person,

Speaker 1 and others that go a direction of having fewer people and more natural resources per person, and they just coexist.

Speaker 1 But sort of how much of each you get sort of depends on how attached people are to things that don't work with separate policies for separate jurisdictions, things like global redistribution that's ongoing continuously versus

Speaker 1 the sort of

Speaker 1 infringements on autonomy.

Speaker 1 If you're saying that a mind can't be created, even though it has a standard of living that's far better than ours

Speaker 1 because of the advanced technology of the time, because it would reduce the average per capita income by having more minds around,

Speaker 1 yeah, then that would pull in the other direction. And that's

Speaker 1 the kind of

Speaker 1 values judgment and sort of social coordination problem that people would have to negotiate, and that things like democracy and international relations and sovereignty would apply to help solve.

Speaker 2 What would warfare in space look like? Would offense or defense have the advantage? Would the equilibrium set by mutually assured destruction still be applicable?

Speaker 2 Just generally, what is the picture of?

Speaker 1 Well, the extreme difference is that,

Speaker 1 especially outside the solar system, things are very far apart. And there's a speed of light limit.
And to get close to the speed of light limit, you have to use an enormous amount of energy.

Speaker 1 And so

Speaker 1 that would tend to, in some ways,

Speaker 1 favor the defender because you have something that's coming in at a large fraction of the speed of light and it hits a grain of dust and it explodes. And

Speaker 1 the amount of matter you can send to another galaxy or a distant star for a given amount of reaction mass and energy input is limited.

Speaker 1 So it's hard to send an amount of military material to another location comparable to what can be present there already locally.
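For a rough sense of the energy burden, here is a standard relativistic kinetic-energy calculation at a few illustrative cruise speeds; nothing in it is a figure from the conversation, and it ignores propellant and deceleration costs, which make the real burden far larger.

```python
import math

# Rough sketch: kinetic energy per kilogram of payload, KE/kg = (gamma - 1) * c^2,
# at illustrative fractions of the speed of light. Speeds are assumptions.

C = 299_792_458.0              # speed of light, m/s
J_PER_MEGATON_TNT = 4.184e15   # joules in one megaton of TNT

for beta in (0.1, 0.5, 0.9):
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    ke_per_kg = (gamma - 1.0) * C ** 2
    print(f"v = {beta:.1f}c: ~{ke_per_kg:.2e} J/kg "
          f"(~{ke_per_kg / J_PER_MEGATON_TNT:.1f} megatons TNT per kg)")

# Even one kilogram at half the speed of light carries megatons of kinetic
# energy -- a huge launch cost for the attacker, and also why hitting a grain
# of dust at that speed is catastrophic for the projectile.
```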

Speaker 1 That would seem like it would make it harder for the attacker between stars or between galaxies. But there are a lot of other considerations.

Speaker 1 One thing is the extent to which the matter in a region can be harnessed all at once.

Speaker 1 So

Speaker 1 you have a lot of mass and energy in a star, but it's only being doled out over billions of years because hydrogen-hydrogen fusion is

Speaker 1 exceedingly hard outside of a star.

Speaker 1 It's a very, very slow and difficult reaction.

Speaker 1 And if you can't turn the star into energy faster, then it's this huge resource that will be worthwhile for billions of years.

Speaker 1 And so even very inefficiently

Speaker 1 attacking a solar system to acquire the stuff that's there could pay off.

Speaker 1 So if it takes a thousand years of a star's output to launch an attack on another star and then you hold it for a billion years after that,

Speaker 1 then it can be the case that just like a larger surrounding attacker might be able to even very inefficiently

Speaker 1 send attacks at like a civilization that was small but accessible.

Speaker 1 If you can quickly burn the resources that the attacker might want to acquire, if you can put stars into black holes and extract most of the usable energy before the attacker can take them over, then it would be like scorched earth.

Speaker 1 It's like

Speaker 1 most of what you were trying to capture could be expended on military material to fight you, so you don't actually get much that is worthwhile, and you paid a lot to do it. That favors the defense.

Speaker 1 You know, at this level, it's pretty challenging to net out all of the factors, including all the future technologies.

Speaker 1 Yeah, I mean, the burden of interstellar attack being just like quite high compared to our conventional things seems real.

Speaker 1 But at the level of weighing all of that over millions of years, does it result in

Speaker 1 aggressive conquest or not? Or is every star or galaxy

Speaker 1 approximately impregnable, impregnable enough not to be worth attacking?

Speaker 1 I'm not going to say I know the answer.

Speaker 2 Okay, final question. How do you think about info hazards when talking about your work?

Speaker 2 So obviously, if there's a risk, you want to warn people about it, but you don't want to give careless or potentially like homicidal people ideas.

Speaker 2 Eliezer was on the podcast. He,

Speaker 2 in talking about the people who have been developing AI, inspired by his ideas,

Speaker 2 he said, like, you know, these are idiot disaster monkeys who, you know,

Speaker 2 want to be the ones to pluck the deadly fruit. Anyways,

Speaker 2 how do you think about it? Obviously, the work you're doing involves many info hazards, I'm sure. How do you think about when and where to spread them?

Speaker 1 Yeah, and so I think there are real concerns of that type.

Speaker 1 I think it's true that AI progress has probably been accelerated by efforts like Bostrom's publication of Superintelligence to try and get the world to sort of pay attention to these problems in advance and prepare.

Speaker 1 I think I disagree with Eliezer that, like, that has been on the whole bad. I think the situation is, in some important ways, looking a lot better

Speaker 1 than alternative ways it could have been. I think it's important that you have several of the leading AI labs paying not only significant lip service, but also making

Speaker 1 some investments

Speaker 1 in things like technical alignment research, and providing significant public support for the idea that the risks of truly apocalyptic disasters are real.

Speaker 1 I think the fact that the leaders of OpenAI, DeepMind, and Anthropic

Speaker 1 all make that point.

Speaker 1 They were recently all invited along with other tech CEOs to the White House to discuss AI regulation.

Speaker 1 And I think you could tell an alternative story where a larger share of the leading companies in AI are led by people who take a completely dismissive denialist view.

Speaker 1 And you see some companies that do have a stance more like that today.

Speaker 1 Yeah, and so there's a world where several of the leading companies are making meaningful efforts, and one can do a lot to criticize: could they be doing more and better? And

Speaker 1 what have been the negative effects of some of the things they've done?

Speaker 1 But compared to a world without that, even though AI would be reaching where it's going

Speaker 1 a few years later,

Speaker 1 those seem like significant benefits. And if you didn't have this kind of public communication, you would have had fewer people going into things like AI policy, AI alignment research by this point.

Speaker 1 And it would be harder to mobilize these resources to try and address the problem when AI would eventually be developed, not that much later proportionately.

Speaker 1 And so, yeah, I don't know that

Speaker 1 attempting to have public discussion and understanding has been a disaster.

Speaker 1 I have been reluctant in the past to discuss some of the aspects of intelligence explosion, things like the concrete details of AI takeover before

Speaker 1 because of concern

Speaker 1 about this sort of problem, where people see only the international relations aspects and the zero-sum and negative-sum competition, and pay not enough attention to the mutual destruction

Speaker 1 and sort of senseless deadweight loss from that kind of conflict. At this point,

Speaker 1 we seem close compared to what I would have thought a decade or so ago to these kinds of really advanced AI capabilities. They are pretty central in policy discussion and becoming more so.

Speaker 1 And so

Speaker 1 the opportunity to delay understanding and whatnot, there's a question of for what.

Speaker 1 And I think there were gains of like building the AI alignment field, building various kinds of support and understanding for action.

Speaker 1 Those had real value, and some additional delay could have given more time for that.

Speaker 1 But from where we are, at some point, I think it's absolutely essential that governments get together at least to restrict disastrous,

Speaker 1 reckless compromising of some of the safety and alignment issues as we go into the intelligence explosion.

Speaker 1 And so

Speaker 1 moving the locus of the sort of collective action problem from numerous profit-oriented companies

Speaker 1 acting against one another's interests by compromising safety to

Speaker 1 some governments and large international coalitions of governments who can set common rules and common safety standards puts us into a much better situation.

Speaker 1 That requires a broader understanding of the strategic situation, the position they'll be in. If we try and

Speaker 1 remain quiet about the problem they're actually going to be facing, I think it can result in a lot of confusion.

Speaker 1 So, for example, the potential military applications of advanced AI are going to be one of the factors that is pulling political leaders to do the thing that will result in their own destruction and the overthrow of their governments.

Speaker 1 If we characterize it as, oh, things will just be a matter of, you know, losing chatbots and some minor things that no one cares about, and in exchange you avoid any risk of the world-ending catastrophe,

Speaker 1 I think that picture leads to a misunderstanding, and it will make people think that you need less in the way of preparation: things like alignment so you can actually navigate the thing, verifiability for international agreements, or things that give enough breathing room to have caution and slow down.

Speaker 1 Not necessarily right now, I mean, although that could be valuable, but when it matters most: when you have AI that is approaching the ability to really automate AI research, and things would otherwise be proceeding absurdly fast, far faster than we can handle and far faster than we should want.

Speaker 1 And so, yeah, at this point, I'm moving towards: share my model of the world, try and get people to understand, and do the right thing.

Speaker 1 And, you know, there's some evidence of progress on that front. Things like the statements and moves by Geoff Hinton are inspiring. Some of the engagement by political figures

Speaker 1 is reason for optimism relative to worse alternatives that could have been.

Speaker 1 And yes, the contrary view is present: it's all about geopolitical competition, never hold back a technological advance.

Speaker 1 And in general,

Speaker 1 I love many technological advances that people, I think, are

Speaker 1 unreasonably down on. Nuclear power, genetically modified crops, yada yada.

Speaker 1 Bioweapons and

Speaker 1 AGI capable of destroying human civilization are really my two exceptions.

Speaker 1 And yeah, we've got to deal with these issues.

Speaker 1 And the path that I see to handling them successfully involves key policymakers, and to some extent the expert communities and the public and electorate, grokking the situation that they're in and responding appropriately.

Speaker 2 Well, it's a true honor that one of the places you've decided to explore this model is on the Lunar Society podcast.

Speaker 2 And the listeners might not appreciate, because this episode might be split up into different parts, how much stamina you've displayed here. But I think we've been going for, what, eight, nine hours or something straight.
So it's been incredibly interesting.

Speaker 2 Other than typing Carl Shulman into Google Scholar, where else can people find your work? You have your blog, right?

Speaker 1 Yeah, I have a blog, Reflective Disequilibrium. Okay.

Speaker 1 And a new site in the works. And I have an older one, which you can also find just by Googling Reflective Disequilibrium.

Speaker 2 Okay, excellent.
All right, Carl, this has been a true pleasure. It's safe to say the most interesting episode I've done so far.
So, yeah, thanks.

Speaker 1 Thank you for having me.

Speaker 2 Hey, everybody. I hope you all enjoyed that episode.

Speaker 2 As always, the most helpful thing you can do is to share the podcast: send it to people you think might enjoy it, put it on Twitter, in your group chats, etc. Just blitz the world.

Speaker 2 I appreciate your listening. I'll see you next time.
Cheers.