Is It Cringe To Believe AI Will Kill Us All?
Yudkowsky and Soares are trying to change that.
In their new book, If Anyone Builds It, Everyone Dies, Yudkowsky and Soares make the case that, at some threshold of AI advancement, superintelligence will certainly wipe out humanity. The book is divided into three parts: the logical case, a story of how a superintelligent AI takeover might happen, and a “so what should we do?” section.
Some parts of the book are strong. Others are less strong. If you haven't been exposed to the case for superintelligent AI takeover, or you have heard about it only through people scoffing at the idea, this book is worth a read. Yudkowsky and Soares have been thinking and writing about this stuff for decades now and the book is mostly a better-written, more compact summary of their ideas; despite the frequent digressions and goofy (sometimes cringey) parables, I think it makes a solid case.
The Best Parts
One strong passage is in the introduction, where they introduce “hard calls” and “easy calls”, laying a useful epistemic foundation:
But some facts about the future are predictable. If you, personally, buy a lottery ticket tomorrow, we don’t know what complicated theories or whims you’ll use to pick your numbers, and we don’t know what numbers will come up, but all that uncertainty adds up to a very strong prediction that you will not win the lottery. Similarly, if you drop an ice cube into a glass of hot water, it’s impossibly complicated to predict where each molecule will end up ten minutes later—but all that uncertainty adds up to a near-certain prediction that the ice cube will melt. Half of physics is like that: We can’t calculate which exact path gets taken, but we know where almost all paths lead.... Successful forecasting...is about finding aspects of the future that become easy calls when viewed from the right angle.
I think this is a great starting point for a difficult topic, because it both acknowledges the difficulty of what they're trying to do and gives a preview of why the reader may become convinced.
I do have a quibble though, which is that they wink-nudge imply, via the physics analogy, a very high, “natural law” level degree of confidence in the prediction. And I do think that Yudkowsky and Soares have high certainty, although (I would guess) not quite that high; their words drip with confidence that most would struggle to summon for more pedestrian predictions. It seems smug. Which definitely is cringe if you don’t already believe it.
Grown, Not Crafted
Fortunately the smugness lets up immediately. They do a great job in the following chapter, “Grown, Not Crafted,” with a technical explanation of how AI works. Here's a snippet:
So now they take every single weight—hundreds of billions of weights—and ask for each one: “If I’d made this number a tiny bit larger or smaller, how much more or less probability would’ve been given to [the correct output], at the end of all that arithmetic?”
This is called the gradient for that parameter. The gradient says how—and how much—to change the weight in that parameter in order to make the final answer a little more correct.
This chapter is a marvel of technical journalism. It's the best explanation of AI to a nontechnical audience that I've seen. Usually, when I read a nontechnical explanation of a concept that I understand, I either facepalm from oversimplifications missing crucial details, or facepalm because the explanation is too complicated, losing the audience. But today my forehead is unscathed! I expect a lot of less-technical readers will find this illuminating.
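For readers who want to see the quoted gradient step in code, here is a minimal sketch in Python. It is a toy illustration, not code from the book: the six-weight model, the `nudge` size, the learning rate, and all function names are assumptions chosen for readability. It also estimates each gradient by literally nudging the weight up and down, as the quote describes, whereas real training systems compute the same quantity far more efficiently with backpropagation.

```python
import math

def prob_of_correct(weights, inputs, correct_index):
    """Softmax probability the toy model assigns to the correct output."""
    scores = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    peak = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    return exps[correct_index] / sum(exps)

def estimate_gradient(weights, inputs, correct_index, nudge=1e-5):
    """For each weight, ask: if this number were a tiny bit larger or
    smaller, how much would the probability of the correct output change?"""
    gradient = []
    for i, row in enumerate(weights):
        grad_row = []
        for j in range(len(row)):
            original = weights[i][j]
            weights[i][j] = original + nudge
            up = prob_of_correct(weights, inputs, correct_index)
            weights[i][j] = original - nudge
            down = prob_of_correct(weights, inputs, correct_index)
            weights[i][j] = original  # restore the weight before moving on
            grad_row.append((up - down) / (2 * nudge))
        gradient.append(grad_row)
    return gradient

def training_step(weights, inputs, correct_index, learning_rate=0.5):
    """Move every weight a little in the direction its gradient suggests."""
    gradient = estimate_gradient(weights, inputs, correct_index)
    return [[w + learning_rate * g for w, g in zip(row, grad_row)]
            for row, grad_row in zip(weights, gradient)]

# Hypothetical toy setup: 3 possible outputs, 2 input features.
weights = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.05]]
inputs = [1.0, 2.0]
correct_index = 0

before = prob_of_correct(weights, inputs, correct_index)
for _ in range(50):
    weights = training_step(weights, inputs, correct_index)
after = prob_of_correct(weights, inputs, correct_index)
print(f"probability of correct output: before {before:.3f}, after {after:.3f}")
```

Nothing in the loop says what the weights should mean; it only pushes numbers around until the correct output becomes more likely. Scale six weights up to hundreds of billions and you have the sense in which the result is grown rather than crafted.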
The explanation ends with:
The way humanity finally got to the level of ChatGPT was not by finally comprehending intelligence well enough to craft an intelligent mind. Instead, computers became powerful enough that AIs can be churned out by gradient descent, without any human needing to understand the cognitions that grow inside.
Which is to say: Engineers failed at crafting AI, but eventually succeeded in growing it.
The authors clearly put a ton of effort into this chapter, and I can see why: the “grown, not crafted” thesis carries a lot of weight in the argument that superintelligence is dangerous. If not even the engineers know what's going on inside, then the machine is fundamentally out of control. To put a fine point on it:
When humans demand that their AIs become capable of doing something new, the entity they get is not something an engineer carefully designed to work in a comfortable and familiar way. It is a mostly-working answer stumbled upon by gradient descent tweaking hundreds of billions of numbers until the entity performs well enough at the task.
AIs grown in this way do things that their growers did not intend.
You Don't Get What You Train For
The “You Don’t Get What You Train For” chapter has some good nuggets. The authors spend some time building an analogy between AI’s future goals and the human preference for ice cream. They argue that ice cream is an arbitrary-seeming point in the chaotic solution space of calorie-rich foods: it would have been extremely hard for aliens to predict, millions of years ago, that modern humans would prefer ice cream to all foods actually available in the ancestral environment, despite having been “trained” in that environment. By the same logic, they argue, we will struggle to predict the preferences our AI will have once its capability grows and it has more options to choose from.
I mostly buy this, but I think the weakness in the argument is where the AI “grows smarter and invents new options for itself.” It is not so clear to me that this definitely happens. It requires a leap of creativity from the AI to see those new options, and I don't feel the authors have shown that AI will reach that level of creativity.
We'd Lose
The chapter “We'd Lose” is a strong one. It explains a lot of different ways that a superintelligence could take over, if it wanted to. The chapter's overall claim:
That a superintelligence could defeat humanity looks to us like a very easy call.
Our best guess is that a superintelligence will come at us with weird technology that we didn’t even think was possible, that we didn’t understand was allowed by the rules. That is what has usually happened when groups with different levels of technological capabilities meet. It’d be like the Aztecs facing down guns.
The chapter gives a bunch of examples (real and fictional) of modern tech meeting the past, like the real Aztec example above, or the hypothetical of an ancient blacksmith being taught to build a refrigerator. The authors then point at a few ways they imagine superintelligence might defeat us today, with the appropriate hedge that “We don’t know exactly what angle AI would use, in a conflict with humanity. That’s a hard call.” The technological ideas they offer are all fantastical in some way. In this chapter Yudkowsky and Soares successfully illustrate Arthur C. Clarke's aphorism that “sufficiently advanced technology is indistinguishable from magic.”[1]
One likely hard part for some readers in this chapter: there are many separate routes to victory, only one of which needs to succeed. A reader may easily get caught up in the implausibility of one particular scenario, forgetting the broad-stroke idea of what intelligence is actually good for—finding routes through reality that lead where you want them to lead. It is a tough balance to give examples of difficult-to-predict-in-advance technology, to move readers into the headspace of Aztecs facing down guns, and to then finish with a vague gesture because we don’t really know what the AI will do. Still, I can't think of a better way to make this case; the attempt seems strong.
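The “only one route needs to succeed” point is easy to check with arithmetic. Here is a minimal sketch in Python; the numbers are hypothetical (the book gives no such probabilities), and treating the routes as independent is a simplifying assumption of mine, not a claim from the authors.

```python
import math

def prob_at_least_one_succeeds(route_probs):
    """P(at least one route works) = 1 - P(every route fails),
    assuming the routes succeed or fail independently."""
    prob_all_fail = math.prod(1.0 - p for p in route_probs)
    return 1.0 - prob_all_fail

# Hypothetical: ten separate routes, each judged only 10% likely to work.
routes = [0.10] * 10
overall = prob_at_least_one_succeeds(routes)
print(f"{overall:.3f}")  # prints 0.651
```

So a reader could find every individual scenario implausible, at 10% each, and still be looking at roughly a two-in-three chance that something works. Rejecting one scenario barely moves the total.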
A Cursed Problem
The “Cursed Problem” chapter brings in the idea of this problem having a challenging “before and after” gap:
Before, the AI is not powerful enough to kill us all, nor capable enough to resist our attempts to change its goals. After, the artificial superintelligence must never try to kill us, because it would succeed....Ideas and theories can only be tested before the gap. They need to work after the gap, on the first try.
The book goes through a few technological analogies (space probes, nuclear reactors, and computer security) to explain some of the ways an engineering problem can be difficult. Through these explanations it points out several “curses”: the curse of speed, the curse of narrow margins, the curse of self-amplification, the curse of complications, and the curse of edge cases.
It culminates with one of the most powerful passages of the book, which I'll quote in full:
Space probes. Nuclear reactors. Computer security. What do all these lessons add up to, and what can we learn from them about the difficulty of aligning an artificial superintelligence?
An artificial superintelligence is like a space probe, in that we cannot test it in quite the same environment where it needs to work, and by default it is not retrievable or correctable once it rises high above us. Even if we try a clever contrivance to let us modify it further at that point, the superintelligence would remain high and irretrievable if that contrivance fails. And ASI alignment has it even worse than space probes: Failure will destroy not just billions of dollars of investment, but *everything*.
An artificial superintelligence is like a nuclear reactor, in that its underlying reality involves immense, potentially self-amplifying forces, whose inner processes run faster than humans can react.
An artificial superintelligence is like a computer security problem, in that every constraint an engineer tries to place upon the system might be bypassed by the intelligent forces that those constraints hinder.
This collection of challenges would look terrifying even if we understood the laws of intelligence; even if we understood how the heck these AIs worked; even if we knew exactly where the gap between before and after lay; even if we knew exactly how much margin we had for error.
We don’t know. AI is grown, not crafted. Whatever vast complications lay inside AIs and lend them their powers of intelligence, nobody knows them.
The Weaker Parts
The book’s message is diluted by a bunch of confusing chapters that I'm less sold on:
Its Favorite Things
This chapter is where the book starts to get a bit hard to follow. It begins with a parable about an alien species that has developed a particular preference. Its members theorize about meeting other alien species someday, and whether those aliens might share their preference. (This parable is incredibly Yudkowskian; I don't quote it because you need to read the full thing in order for it to have a chance of making sense, and even then I think a lot of readers will still be confused about the purpose.)
This chapter seems to exist to proactively shoot down some counterarguments (“hopes and copes”), things like “Wouldn't AI care to keep us around, even if it became superintelligent?”—but I suspect this could have been relegated to the external website, endnotes and so on.
We have heard, literally, more than a hundred different hopes and copes like those. Won’t it choose to install love into itself, because of how wonderful it is?... Won’t it get more moral as it gets smarter?
I'd be curious to hear from readers who found this chapter useful or compelling. To me it seemed like a waste of a chapter.
The Entire Part Two
Part Two is a story about an AI that becomes superintelligent and kills everyone. I didn't find this story all that interesting or compelling, but maybe some people will. I worry that it plays into a concrete-prediction scenario, a “hard call” if you will, which is easy to critique and not particularly load-bearing. People who quibble with details of this scenario might get confused about whether or not they are counter-arguing the whole thesis of the book.
If I were the authors, I'm not sure what I would have done instead. Maybe this was still the best from a small set of options. I don't like this whole part very much though.
Facing the Challenge
I think the book could end after “A Cursed Problem” and still make its case strongly.
But perhaps the following two chapters will convince some of the heretofore unconvinced. “An Alchemy, Not A Science” goes into the social dynamics of difficult scientific problems, and the following chapter addresses those who downplay the risk.
The alchemical analogy lampshades the unwarranted confidence some scientist-entrepreneurs show in the face of uncertainty:
Go back a few centuries, and most of the world was like this. Doctors would try to bleed you to rebalance your “four humors,” four bodily fluids believed to regulate health. Alchemists would mix substances that promised eternal life, but would do nothing at best, and would sometimes kill you. People didn’t know how a part of the world worked, and then, instead of recognizing their uncertainty, they made stuff up.
The chapter succeeds in illustrating that people sometimes base projects on bullshit science, or sell you quack remedies. But this on its own adds nothing to the discourse: we already knew that. The book then claims this is happening in the current AI regime, mostly via quoting Elon Musk and Yann LeCun, e.g. LeCun: “We can design AI systems to be both superintelligent and submissive to humans.”
But the quotes do not strongly demonstrate the point, and we are not trusted to interpret them on our own (the authors follow with: “someone familiar with the history of science and engineering immediately recognizes this general level of cheerful optimism”). I wish the point were more evident from the quotes themselves; as it stands, I don't find it super resonant.
That said, I agree that AI leaders are dangerously overconfident and reckless! I would have found this section more compelling if the authors had argued based on actions those leaders have taken, rather than their words.
And Thus We Should
The last couple of chapters make concrete requests of policymakers and the public. I don't think it's worth spending too much time on this since the next-actions follow straightforwardly from the beliefs. They want an internationally coordinated treaty-type agreement to stop advancing AI worldwide. To put the size of the requested intervention in context, they argue that it is much cheaper than the decision to fight World War II, and that the case for it is about as urgent, and about as speculative, as the case in the 1940s about how bad things would get if totalitarianism spread unchecked.
Unfortunately, these chapters have a kind of shrill tone. It seems the authors expect to be ignored and so they are doing the cringe move of spending many words explaining why their warnings shouldn't be ignored.
Maybe this is the best strategy in their view, but it does not make for compelling reading: if you already buy their thesis then it's a waste of space, and if you're not already sold then I think the many words reiterating the point make it less convincing. (I'm open to the idea that I am coming from some sort of strange point of view which dilutes the power of this section; maybe it resonates with some readers, but I don't really get it myself. I'd be curious to hear if these chapters were useful to you!)
Conclusion
It would be great if we could know whether the world will end if we build superintelligence. Sadly, we cannot know for sure; we can only build world-models and take our best guesses. In this fraught epistemic environment, the book takes a crack at defining success: good futurism is making a very high percentage of your easy calls.
I searched If Anyone Builds It for the phrase “easy call” to extract the following examples, both historic and future:
realizing that something looks theoretically possible according to the laws of physics, and predicting that eventually someone will go do it
humanity’s extinction by superhuman AI
artificial minds will exceed human minds
[1975] future chess AIs will not throw away their queens
your next lottery ticket will not be a winning one
[2006] superintelligence will be able to predict protein folds in carefully chosen cases
superintelligence could defeat humanity
Stockfish will beat you in chess
at some point, if we keep climbing the AI ladder, humanity will not survive
[1972] Vesna Vulović will die in the plane crash (the authors cite this in the final chapter as a ray of hope, as an example that even easy calls can fail)
artificial superintelligence will not dutifully serve the people who created it
artificial superintelligence will repurpose the Earth in a fashion that leaves no survivors
Here’s what I think about these: I agree artificial minds will eventually greatly exceed human ones. I agree superintelligence could defeat humanity. Those two things make it seem extraordinarily dangerous. We should stop.
I am less persuaded that AI will certainly decide to use that power to defeat us, and less persuaded that it will not dutifully serve its creators. But who cares whether those probabilities are 10%, 50% or 90%? We should not create superintelligent artificial minds anyway!
And finally, I find this point quite persuasive:
Even if you can’t tell whether or not our argument is defeated by the first counterargument you hear, hopefully you can tell at this point that it’s not an easy call that everything is going to be fine.
Will the book make it less cringey to talk seriously about AI danger? I hope so. I hope that people will actually read it and discuss the topic on the merits, rather than just the vibes.
One way or another, the people who thought that nuclear war would destroy everything in the coming decades ended up wrong. They were not wrong about the dangers…they were wrong about humanity’s ability to decide not to die.
Bravo to Yudkowsky and Soares for being willing to put themselves out there saying things that many see as cringe and for advocating for plans that could actually work.
[1] Clarke’s Three Laws: https://en.wikipedia.org/wiki/Clarke%27s_three_laws

