Sometimes AI hype can be so silly that it distracts us from the important work of making it functional. For example, you can read Bill Gates’ paean to AI and believe that within the next five years, “You’ll simply tell your device, in everyday language, what you want to do.” Of course! And maybe you’ll issue those commands while sitting in one of Elon Musk’s fully autonomous self-driving cars that he’s been promising forever (well, for 10 years, to be fair).
In our rush to hype the AI future, we risk setting unrealistic expectations that can have a dampening impact on investment, particularly in areas of security. Even if we reach Bill Gates’ utopia, it will feel more like a dystopia if we can’t fix things like prompt injection for large language models (LLMs).
Fully autonomous, self-driving AI
Gates has been waiting for AI agents for decades. And we’re not talking about Clippy 2.0. “Clippy has as much in common with agents as a rotary phone has with a mobile device,” he declares. And why? Because “with permission to follow your online interactions and real-world locations, it will develop a powerful understanding of the people, places, and activities you engage in.”
You know, sort of like how online advertising works today. If you didn’t immediately think, “Oh, right, online advertising and all those incredibly tailored advertisements I see all day,” then you’ll begin to spot the problems with Gates’ vision of the future. He talks up how AI will democratise healthcare, private tutoring services, and more, despite the reality that humanity has a pretty spotty record of ever gifting advances to the less privileged.
This brings us to Musk and his persistent predictions of self-driving cars. It’s easy to predict a rosy future but far harder to deliver it. Gates can gush that “agents will be able to help with virtually any activity and any area of life,” all within five years, but for anyone who has actually used things like Midjourney to edit images, the results tend to be really bad, and not merely in terms of quality. I tried to make Mario Bros. characters out of my peers at work and discovered that Caucasians fared better than Asians (who came out looking like a grotesque amalgamation of the worst stereotypes). We have a ways to go.
But even if we magically could make AI do all the things Gates projects it will be able to do in five short years, and even if we resolve its biases, we still have major security hurdles to clear.
The hurdle of prompt injection
“The key to understanding the real threat of prompt injection is to understand that AI models are deeply, incredibly gullible by design,” notes Simon Willison. Willison is one of the most expert and enthusiastic proponents of AI’s potential for software development (and general use), but he’s also unwilling to pull punches on where it needs to improve: “I don’t know how to build it securely! And these holes aren’t hypothetical, they’re a huge blocker on us shipping a lot of this stuff.”
The problem is that the LLMs believe everything they read, as it were. By design, they ingest content and respond to prompts. They don’t know how to tell the difference between a good prompt and a bad one. They’re gullible. As Willison puts it, “These models would believe anything anyone tells them. They don’t have a good mechanism for considering the source of information.” This is fine if all you’re doing is asking an LLM to write a term paper (this has ethical implications but not security implications), but what happens once you start feeding sensitive corporate or personal information into the LLM?
It’s not enough to say, “But my private LLM is local and offline.” As Willison explains, “If your LLM reads emails people have sent you or webpages people have written, those people can inject additional instructions into your private LLM.” Why? Because “If your private LLM has the ability to perform actions on your behalf, those attackers can perform actions on your behalf too.” By definition, Willison continues, prompt injection is “a way for attackers to sneak their own instructions into an LLM, tricking that LLM into thinking those instructions came from its owner.”
Anything the owner can do, the attackers can do. It takes phishing and malware to a whole new level. SQL injections are, by contrast, simple to fix. Prompt injections are anything but, as described in the Radical Briefing: “It’s as if we’ve coded a digital Pandora’s box—exceptionally brilliant, but gullible enough to unleash havoc if given the wrong set of instructions.”
As we begin to deploy AI agents in public-facing roles, the problem will become worse—which is not the same as saying unsolvable. Though the issues are thorny, as Willison covers in detail, they’re not intractable. At some point, we’ll figure out how to “teach an AI to only disclose sensitive data with some kind of ‘authentication,’ ” as Leon Schmidt suggests. Figuring out that authentication is non-trivial (and AI won’t be of much help securing itself).
We’ve been getting AI wrong for years, hyping the end of radiologists, software developers, and more. “ChatGPT might scale all the way to the Terminator in five years, or in five decades, or it might not. … We don’t know,” says Benedict Arnold. He’s right. We don’t. What we do know is that without more investment in AI security, even the rosiest AI hype will end up feeling like doom. We’ve got to fix the prompt injection problem.