AI Agent Traps: When Information Becomes the Attack Surface

AI agents do more than answer questions. They browse websites, read emails, query tools, and make decisions autonomously. That creates a new problem: what happens when the information they trust has been manipulated?

Researchers from Google DeepMind have categorized these threats into six types. Content injection hides malicious instructions in plain sight — a webpage might look harmless but contain commands in its underlying code or metadata. NIST evaluations found malicious instructions succeeded 57% of the time across tested injection tasks.

Semantic manipulation is subtler. Instead of telling an agent what to do, attackers feed it repeated emotional language, selective context, and false authority to skew its conclusions. Think of a search results page engineered to make one supplier look like the obvious choice.

Cognitive state traps poison an agent’s memory. USENIX research showed that inserting just five crafted texts into a RAG system’s knowledge base could manipulate its answers 90% of the time, even among millions of legitimate documents.

The defensive playbook: source verification, content screening, memory governance, restricted permissions, isolated execution, and human approval for high-impact actions. As agent populations grow, the question isn’t what they can do — it’s what they should trust.