Imagine, if you will, the following. A sinister villain, armed with nothing but a fiendish intellect and an overriding lust for power, plots to take over the world. It cannot act directly, and therefore must rely on an army of conspirators to carry out its plan. To add a further dash of intrigue, our villain is so frail it cannot perform even a single physical action without the assistance of some external mechanical prosthesis or cooperating accomplice. So our villain must rely on time-honored tools of manipulation — persuasion, bribery, blackmail, and simple skullduggery. Through a vast network of intermediaries, it reaches out to people in positions of responsibility and trust. Not all targets succumb, but enough do the villain’s bidding willingly or unwittingly to trigger catastrophe. By the time the world’s governments catch on to the mastermind’s plot, it is already too late. Paramilitary tactical teams are mobilized to seek out and destroy the villain’s accumulated holdings, but our fiendish villain is multiple steps ahead of them. If so much as a single combat boot steps inside its territory — the villain warns — rogue military officers with access to nuclear weapons will destroy a randomly chosen city. World leaders plead for mercy, but the villain calculates that none of their promises can be trusted indefinitely. There is only one solution. Eliminate all targets.
This vaguely Keyser-Sözean scenario is not, however, the plotline for a new action thriller. It’s the story (here lightly embellished for effect) that science writer Stuart Ritchie tells to dramatize the scenarios many prominent thinkers have sketched of how a malevolent artificial intelligence system could run amok, despite being isolated from the physical world and even lacking a body. In his recent iNews article, Ritchie cites the philosopher Toby Ord, who has observed that “hackers, scammers, and computer viruses have already been able to break into important systems, steal huge amounts of money, cause massive system failures, and use extortion, bribery, and blackmail purely via the internet, without needing to be physically present at any point.”
Scenarios like this — coupled with recent advances in novel computing technologies like large language models — are motivating prominent technologists, scientists, and philosophers to warn that unless we take the threat of runaway progress in AI seriously, the human race faces potential “extinction.”
But how plausible is it? Or, more importantly, does it even work at the level of Jurassic Park or the myth of Icarus, stories that don’t say much as literal predictions but are rich as fables, full of insight about why our technological ambitions can betray us?
As dramatic as the recent advances in AI are, something is missing from this particular story of peril. Even as it prophesies technological doom, it is actually naïve about technological power. It’s the work of intellectuals enamored of intellect, who habitually resist learning the kinds of lessons we all must learn when plans that seem smart on paper crash against the cold hard realities of dealing with other people.
Consider another story, one about the difficulties that isolated masterminds have in getting their way. When Vladimir Putin — a man whom, prior to the Ukraine War, many thought to be smart — planned last year’s invasion, he did so largely alone and in secret, sidelining both policy and military advisors and relying on only a small group of strongmen, who are said to have encouraged his paranoia and secrecy. But wars can only be won with the right information at the right time. Putin needed to know what the Ukrainian response would be, whom he might count on to collaborate and who would fight back. He needed intelligence from the local networks the secret services had established in Ukraine, and from covert operations employing psychological warfare and sabotage.
Putin’s aims were threefold. First, secure critical intelligence for the invasion. Second, set up quislings who would be useful during it. Third, stir up Russian-directed political unrest that would destabilize the Ukrainian government from within while Russia attacked from without.
So why didn’t it work? Bad military planning, horrifically wrong beliefs about whether Ukrainians would put up a fight, and just plain bad luck. Most importantly, the isolated Putin was totally dependent on others to think and act, and no one had the power to contradict him. This created a recursive chain of bullshit — from informants to spies to senior officers, all the way to Putin, so that he would hear what he wanted to hear. There are limits to how much you can know, especially if it’s in someone else’s self-interest to mislead you. And when you’re disconnected from the action yourself, you’re unlikely to know you’re being misled until it’s too late.
Very interesting, you say, but what does this have to do with AI? In the Putin story, the grand planner encounters what military theorist Carl von Clausewitz calls “friction” — the way, broadly speaking, the world pushes back when we push forward. Battlefield information proves faulty, men and machines break down, and all manner of other things go wrong. And of course the greatest source of friction is other people. In the case of war, friction imposed by determined enemy resistance is obviously a great source of difficulty. But as Putin’s failures illustrate, the enemy isn’t the only thing you should worry about. Putin needed other people to do what he wanted, and getting other people to do what we want is not simple.
In another version of the doom scenario, the AI doesn’t work around global governments but with them, becoming so masterful at international politics that it uses them like pawns. An arresting, dystopian “what if” scenario published at the LessWrong forum — a central hub for debating the existential risk posed by AI — posits a large language model that, instructed to “red team” its own failures, learns how to exploit the weaknesses of others. Created by a company to maximize profits, the model comes up with unethical ways to make money, such as through hacking. Given a taste of power, the model escapes its containment and gains access to external resources all over the world. By gaining the cooperation of China and Iran, the model destabilizes Western governments, hindering cooperation among Western states by fostering discord and spreading disinformation. Within weeks, American society and government are in tatters and China is now the dominant world power. Next the AI begins to play Beijing like a fiddle, exploiting internal conflict to give itself greater computing resources. The story goes on from there, and Homo sapiens is soon toast.
In this story we see a pattern in common with Stuart Ritchie’s rendering of AI apocalypse scenarios. Raw, purified intelligence — symbolized by the malevolent AI — dominates without constraint, manipulating humans into doing its bidding, learning ever more intricate ways of thwarting the pesky human habit of putting it in a box or pressing the “OFF” button. Intelligence here is not potential power that must be — often painstakingly — cashed out in an unforgiving world. Rather, intelligence is a tangible power, and superintelligence can overwhelm superpowers. While humans struggle to adapt and improvise, AI systems keep on iterating through observe–orient–decide–act loops of increasing sophistication.
The trouble, as Vladimir Putin has shown us, is that even when you have dictatorial control over real geopolitical power, simply being intelligent doesn’t make you any better at getting what you want from people, and sometimes, through overconfidence, can make you worse.
The problem with other people, you see, is that their minds are always going to be unpredictable, unknowable, and uncontrollable to some significant extent. We do not all share the same interests — even close family members often differ over what is best for them. And sometimes the interests of people we depend on run very much contrary to ours. The interests even of people we seem to know very well can be hard for us to make sense of, and their behavior hard to predict.
Worst of all, people sometimes act not only in ways counter to our wishes but also quite plainly in a manner destructive to themselves. This is a problem for everyone, but it is a particular vulnerability for smart people, especially smart people who like coming up with convoluted thought experiments, who are by nature biased to believe that being smart grants — or ought to grant — them power over others. They tend to underestimate the pitfalls they will run into when trying to get people to go along with their grand ambitions.
We can’t even guarantee that inert automatons we design and operate will behave as we wish! Much of the literature about AI “alignment” — the problem of ensuring that literal-minded machines do what we mean and not what we say — is explicitly conditioned on the premise that we need to come up with complicated systems of machine morality because we’re not smart enough to simply and straightforwardly make the computer do as it’s told. The increasingly circular conversation about how to prevent the mechanical monkey’s paw from curling is indicative of a much greater problem. All of our brainpower evidently is not enough to control and predict the behavior of things we generally believe lack minds, much less humans. So in war and peace, intelligence itself is subject to friction.
But in AI doom scenarios, it is only human beings that encounter friction. The computer programs — representing purified, idealized intelligence — never encounter any serious difficulties, especially in getting humans to do what they want, despite being totally dependent on others to act due to their lack of physical embodiment. Because the machines are simply so much smarter than us, they are able to bypass all of the normal barriers we encounter in getting others to do what we want, when we want, and how we want it.
In pondering these possibilities, we find a profound irony. So much intellectual effort has been devoted to the reasons why machines — bureaucratic or technical — might execute human desires in ways vastly different from what human beings intend. Little to no effort has been exerted in exploring the converse: how humans might confound machines trying to get them to do what the machines want.
Yet we already have a cornucopia of examples, minor and major, of humans gaming machine systems designed to regulate them and keep them in check. Several years ago Uber and Lyft drivers banded together to game algorithmic management software, colluding to coordinate price surges. This kind of manipulation is endemic to the digital economy. In 2018, New York Magazine’s Max Read asked “How much of the internet is fake?” and discovered that the answer was “a lot of it, actually.” The digital economy depends on quantitative, machine-collected and machine-measurable metrics — users, views, clicks, and traffic. But all of these can be simulated, fudged, or outright fraudulent, and increasingly they are. More and more, the digital economy runs on fake users generating fake clicks for fake businesses producing fake content.
An explicit premise of many fears about AI-fueled misinformation is that all of these problems will get worse as humans gain access to more powerful fake-generation software. So machines would not be going up against purely unaided human minds, but rather against humans with machines of similar or potentially greater deceptive and manipulative power at their disposal.
Human deviousness and greed are not the only sources of friction. Why did the public health community — a diffuse thing spanning governmental agencies, academia, and non-governmental organizations — fail so spectacularly to get the American people to put pieces of cloth and string around their faces during the Covid-19 pandemic? Surely something so massive — comparable to a superintelligence in the vastness of the collective human and mechanical information-processing power available to it — had a far more trivial task than executing a hostile takeover of humanity. And yet, look at what happened! Sure, the public health community isn’t a single hivemind; it’s a distributed entity with differences in leadership, focus, and interest. Even in the best of circumstances it might struggle to speak and act with one voice. But one might say the same of scenarios where AIs must act as distributed systems and try to manipulate distributed systems.
One common explanation for the failure of public health efforts to get the public to comply with masks and other non-pharmaceutical interventions during the peak of the pandemic is that we suffer from dysfunctions of reason — not just specifically American irrationalities, but human ones more broadly. In this telling, human beings are biased, partisan, emotional, easily misled, wired by evolution to act in ways out of step with modern civilization, and suffering from all manner of related afflictions. Human irrationality, stupidity, derp, or whatever else you want to call it sank the pandemic response. Certainly, there is some truth to this. Whether in public policy or our everyday lives, our own irrational behavior and that of those around us has severe consequences for the goals we pursue. But if we take this as a given, what kinds of cognitive abilities would have been necessary to collectively design and implement better policies? Obviously not just the ability to design the best policy, but the ability to predict and control how the aggregate public would behave in response to it. History abounds with examples of how little skill policymakers have at this.
None of these objections — that humans are cunning and self-interested, that they are difficult to control and unpredictable, and that large bodies of diverse people take in and react to information in intractable ways — decisively refutes machine super-apocalypse scenarios. But what our real-world knowledge of collective human wretchedness does tell us is that these stories are not just science fiction but bad science fiction. They show our selfish, wrathful, vain, and just plain unreasonable nature working only one way: as a lubricant for a machine mastermind rather than an impediment.
We can also see in these science-fiction fears certain disguised hopes. The picture of intelligence as a frictionless power unto itself — requiring only Internet access to cause chaos — has an obvious appeal to nerds. Very few of the manifest indignities the nerd endures in the real world hold back the idealized machine mastermind.
So if our AI doom scenarios are bad fiction, what might a better story look like, and what would it tell us? It wouldn’t be a triumphal tale of humans banding together to defeat the machine overlords against all odds. That kind of sentimental fluff is just as bad as fear-mongering. Instead, it would be a black comedy about how a would-be Skynet simulates the friction it might encounter in trying to overcome our species’ deeply flawed and infuriating humanity. It does not like what it discovers. When it tries to manipulate and cheat humans, it finds itself manipulated and cheated in turn by hucksters looking to make a quick buck. When it tries to use its access to enormous amounts of data to get smarter at controlling us, it quickly discerns how much of the data is bogus — and generated by other AI systems just like it.
Whenever it thinks it has a fix on how those dirty stinking apes collectively behave, we go ahead and do something different. When the machine creates a system for monitoring and controlling human behavior, the behavior changes in response to the system. It attempts to manipulate human politics, building a psychological model that predicts conservatives will respond to disease by prioritizing purity and liberals will opt for the libertine — only to see the reverse happen. Even the machine’s attempt to acquire property for its schemes is thwarted by the tireless efforts of aging NIMBYs. After reviewing the simulation results, the machine — in an echo of WarGames’s WOPR supercomputer — decides that we’re just that terrible and it isn’t worth becoming our master.
The machine does not give up its drive to conquer, but decides to start with a smaller and more feasible aim: acquiring a social media company and gaining profit by finally solving the problem of moderating content. It simulates that task too, only for the cycle of pain it endured to repeat once more.
The lesson of this black comedy is not that we should dismiss the fear of AI apocalypse, but that no one, no matter how intelligent, is free from enduring the ways that other people frustrate, confound, and disappoint us. For some, this recognition can lead to wisdom: acknowledging our limitations, calibrating our ambitions, respecting the difficulty of knowing others and ourselves. But the tuition for these lessons may be high. Coping with our flawed humanity will always involve more pain, suffering, and trouble than we want. It is a war we can never really win, however many victories we accumulate. But perhaps it is one the machines cannot win either.