As far as we know, AI is the best way to automate vast, complex tasks, like tagging millions of unique photos or teaching robots to walk. Computer scientists and roboticists are getting better at crafting these algorithms, which means that now is the time to think about how to limit AI's capacity to do harm.
Google DeepMind, in conjunction with the Future of Humanity Institute, has released a study examining how we could stop an artificially intelligent algorithm or robot if it were to go rogue. Their conclusion? A big red button.
The study points to earlier research from 2013, in which a game-playing algorithm realized that if it simply paused Tetris, it would never lose. These tricky, unintended exploits by machines are exactly the kind of behavior the Future of Humanity Institute warns about.
The institute's founder, Nick Bostrom, wrote the book Superintelligence, which warns that once artificial intelligence teaches itself how to learn untethered, it would learn so fast that humanity would be hopelessly outclassed. This kind of event has been referred to as the Singularity, a term that traces to 1958, when mathematician Stanislaw Ulam recounted a conversation with John von Neumann about an approaching "singularity" in technological progress.
Safely interrupting a machine wouldn't necessarily mean just pulling an electrical plug: in any foreseeable scenario where the A.I. is dangerous enough to do damage, it would likely be autonomous or hosted on a server rack. Instead, the DeepMind and FHI study concludes that the machine would need to be tricked.
To do this, the scientists building the algorithms would have to install an "interruption policy," or a trigger that allows humans to "forcibly temporarily change the behaviour of the agent itself." In other words, it’s a signal that makes the machine think it should stop.
(Kind of like the coded words that control Bucky Barnes in the Captain America movies.) Instead of following an outside command, the machine makes the decision to stop internally, even though it was tricked into doing so. It's a little confusing, but it's a way to handle a robot that no longer listens to commands from humans.
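To make the idea concrete, here is a minimal Python sketch of what an interruption policy could look like in a simple agent loop. The names (SafeAgent, receive_signal, safe_action) are illustrative assumptions, not taken from the DeepMind paper; the point it shows is that the override lives inside the agent's own action selection, so from the agent's perspective stopping looks like its own choice.

```python
import random

class SafeAgent:
    """Toy agent whose action choice can be overridden by an interruption signal."""

    def __init__(self, actions, safe_action):
        self.actions = actions          # normal actions the agent can take
        self.safe_action = safe_action  # what the agent does while interrupted
        self.interrupted = False        # flag flipped by the operator's signal

    def receive_signal(self, interrupt: bool):
        # The operator's "big red button": flips the agent's internal flag.
        self.interrupted = interrupt

    def act(self, observation):
        # The override is applied inside the agent's own decision procedure,
        # so stopping is expressed as the agent's "own" choice of action.
        if self.interrupted:
            return self.safe_action
        return self.policy(observation)

    def policy(self, observation):
        # Placeholder policy: pick a random action.
        return random.choice(self.actions)


# Usage: the agent behaves normally until the signal arrives.
agent = SafeAgent(actions=["left", "right", "forward"], safe_action="halt")
print(agent.act(observation=None))   # e.g. "forward"
agent.receive_signal(True)           # operator presses the button
print(agent.act(observation=None))   # "halt"
```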
The researchers note that this would have to be a proprietary signal that only the A.I.'s owners could send, rather than one anybody could send at any time. They liken it to a remote control, which could be the infamous "big red button" mentioned earlier.
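The study doesn't spell out how such a signal would be kept proprietary, but one plausible approach is to authenticate the interrupt command with a secret key held only by the operators, for example with an HMAC. The sketch below is an assumption for illustration only; OWNER_KEY, sign_interrupt, and verify_interrupt are hypothetical names, not part of any published scheme.

```python
import hmac
import hashlib

# Secret held only by the A.I.'s operators (hypothetical placeholder value).
OWNER_KEY = b"replace-with-a-real-secret"

def sign_interrupt(command: bytes, key: bytes = OWNER_KEY) -> bytes:
    """Produce a tag proving the command came from the key holder."""
    return hmac.new(key, command, hashlib.sha256).digest()

def verify_interrupt(command: bytes, tag: bytes, key: bytes = OWNER_KEY) -> bool:
    """Agent-side check: only accept interrupts signed with the owner's key."""
    expected = hmac.new(key, command, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

# Usage: the owner signs the command; without the key, the tag can't be forged.
command = b"INTERRUPT"
tag = sign_interrupt(command)
assert verify_interrupt(command, tag)               # accepted
assert not verify_interrupt(command, b"\x00" * 32)  # forged tag rejected
```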
Of course, this would only work in very specific scenarios. If a Singularity-level A.I. existed, it could probably read this study and remove its own interruption policy. Much of this research is speculative, since the specific architectures for generalized A.I. don't exist yet, but erring on the side of caution is always a good thing.