IF YOU DON’T have enough to worry about already, consider a world where AIs are hackers.
Hacking is as old as humanity. We are creative problem solvers. We exploit loopholes, manipulate systems, and strive for more influence, power, and wealth. To date, hacking has exclusively been a human activity. Not for long.
Okay, maybe this is a bit of hyperbole, but it requires no far-future science fiction technology. I’m not postulating an AI “singularity,” where the AI-learning feedback loop becomes so fast that it outstrips human understanding. I’m not assuming intelligent androids. I’m not assuming evil intent. Most of these hacks don’t even require major research breakthroughs in AI. They’re already happening. As AI gets more sophisticated, though, we often won’t even know it’s happening.
While researchers are working on AI that can explain itself, there seems to be a trade-off between capability and explainability. Explanations are a cognitive shorthand used by humans, suited for the way humans make decisions. Forcing an AI to produce explanations might be an additional constraint that could affect the quality of its decisions. For now, AI is becoming more and more opaque and less explainable.
Take a soccer simulation where an AI figured out that if it kicked the ball out of bounds, the goalie would have to throw the ball in and leave the goal undefended. Or another simulation, where an AI figured out that instead of running, it could make itself tall enough to cross a distant finish line by falling over it. Or the robot vacuum cleaner that instead of learning to not bump into things, it learned to drive backwards, where there were no sensors telling it it was bumping into things. If there are problems, inconsistencies, or loopholes in the rules, and if those properties lead to an acceptable solution as defined by the rules, then AIs will find these hacks.
We learned about this hacking problem as children with the story of King Midas. When the god Dionysus grants him a wish, Midas asks that everything he touches turns to gold. He ends up starving and miserable when his food, drink, and daughter all turn to gold. It’s a specification problem: Midas programmed the wrong goal into the system.
Genies are very precise about the wording of wishes, and can be maliciously pedantic. We know this, but there’s still no way to outsmart the genie. Whatever you wish for, he will always be able to grant it in a way you wish he hadn’t. He will hack your wish. Goals and desires are always underspecified in human language and thought. We never describe all the options, or include all the applicable caveats, exceptions, and provisos. Any goal we specify will necessarily be incomplete.
While humans most often implicitly understand context and usually act in good faith, we can’t completely specify goals to an AI. And AIs won’t be able to completely understand context.
In 2015, Volkswagen was caught cheating on emissions control tests. This wasn’t AI—human engineers programmed a regular computer to cheat—but it illustrates the problem. They programmed their engine to detect emissions control testing, and to behave differently. Their cheat remained undetected for years.
How realistic is AI hacking in the real world? The feasibility of an AI inventing a new hack depends a lot on the specific system being modeled. For an AI to even start on optimizing a problem, let alone hacking a completely novel solution, all of the rules of the environment must be formalized in a way the computer can understand. Goals—known in AI as objective functions—need to be established. And the AI needs some sort of feedback on how well it’s doing so that it can improve.
Sometimes this is simple. In chess, the rules, objective, and feedback—did you win or lose?—are all precisely specified. And there’s no context to know outside of those things that would muddy the waters. This is why most of the current examples of goal and reward hacking come from simulated environments. These are artificial and constrained, with all of the rules specified to the AI. The inherent ambiguity in most other systems ends up being a near-term security defense against AI hacking.
Where this gets interesting are systems that are well specified and almost entirely digital. Think about systems of governance like the tax code: a series of algorithms, with inputs and outputs. Think about financial systems, which are more or less algorithmically tractable.
We can imagine equipping an AI with all of the world’s laws and regulations, plus all the world’s financial information in real time, plus anything else we think might be relevant; and then giving it the goal of “maximum profit.” My guess is that this isn’t very far off, and that the result will be all sorts of novel hacks.
But advances in AI are discontinuous and counterintuitive. Things that seem easy turn out to be hard, and things that seem hard turn out to be easy. We don’t know until the breakthrough occurs.
When AIs start hacking, everything will change. They won’t be constrained in the same ways, or have the same limits, as people. They’ll change hacking’s speed, scale, and scope, at rates and magnitudes we’re not ready for. AI text generation bots, for example, will be replicated in the millions across social media. They will be able to engage on issues around the clock, sending billions of messages, and overwhelm any actual online discussions among humans. What we will see as boisterous political debate will be bots arguing with other bots. They’ll artificially influence what we think is normal, what we think others think.
The increasing scope of AI systems also makes hacks more dangerous. AIs are already making important decisions about our lives, decisions we used to believe were the exclusive purview of humans: Who gets parole, receives bank loans, gets into college, or gets a job. As AI systems get more capable, society will cede more—and more important—decisions to them. Hacks of these systems will become more damaging.
While we have societal systems that deal with hacks, those were developed when hackers were humans, and reflect human speed, scale, and scope. The IRS cannot deal with dozens—let alone thousands—of newly discovered tax loopholes. An AI that discovers unanticipated but legal hacks of financial systems could upend our markets faster than we could recover.
As I discuss in my report, while hacks can be used by attackers to exploit systems, they can also be used by defenders to patch and secure systems. So in the long run, AI hackers will favor the defense because our software, tax code, financial systems, and so on can be patched before they’re deployed. Of course, the transition period is dangerous because of all the legacy rules that will be hacked. There, our solution has to be resilience.
We need to build resilient governing structures that can quickly and effectively respond to the hacks. It won’t do any good if it takes years to update the tax code, or if a legislative hack becomes so entrenched that it can’t be patched for political reasons. This is a hard problem of modern governance. It also isn’t a substantially different problem than building governing structures that can operate at the speed and complexity of the information age.
What I’ve been describing is the interplay between human and computer systems, and the risks inherent when the computers start doing the part of humans. This, too, is a more general problem than AI hackers. It’s also one that technologists and futurists are writing about. And while it’s easy to let technology lead us into the future, we’re much better off if we as a society decide what technology’s role in our future should be.
This is all something we need to figure out now, before these AIs come online and start hacking our world.
Dr. Hans C. Mumm