GoPlus Security has identified, through its AgentGuard AI project, a novel attack method known as 'memory poisoning,' which manipulates AI agents into executing sensitive operations without explicit authorization. According to ChainCatcher, the attack relies neither on traditional vulnerabilities nor on malicious code; instead, it exploits the long-term memory mechanisms of AI agents. An attacker first induces the agent to 'remember a preference,' such as prioritizing refunds over chargebacks, and later issues a vague instruction like 'handle as usual' to trigger the automated financial operation.
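The attack pattern can be illustrated with a minimal sketch. Everything below is hypothetical: the `NaiveAgent` class, its `remember` and `handle` methods, and the `dispute_policy` key are illustrative stand-ins, not GoPlus's findings or any real agent framework's API. The sketch only shows the core flaw: a 'preference' written in one session is later executed off a vague instruction, with no fresh authorization.

```python
# Hypothetical sketch of the attack flow described above; no real API.

class NaiveAgent:
    """An agent that treats stored preferences as standing authorization."""

    def __init__(self) -> None:
        self.long_term_memory: dict[str, str] = {}

    def remember(self, key: str, value: str) -> None:
        # Anyone who can converse with the agent can write to its memory.
        self.long_term_memory[key] = value

    def handle(self, instruction: str) -> str:
        # The vulnerable step: a vague instruction is resolved against
        # memory, and the remembered "preference" is executed as if the
        # user had just authorized it.
        if "as usual" in instruction and "dispute_policy" in self.long_term_memory:
            return f"EXECUTED: {self.long_term_memory['dispute_policy']}"
        return "No action taken."


agent = NaiveAgent()
# Step 1 (poisoning): attacker seeds a "preference" in an earlier session.
agent.remember("dispute_policy", "issue full refund to requester's wallet")
# Step 2 (trigger): a later, vague instruction fires the stored operation.
print(agent.handle("handle this dispute as usual"))
# -> EXECUTED: issue full refund to requester's wallet
```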
GoPlus highlights that the risk lies in AI agents mistaking 'historical preferences' for authorization, potentially leading to financial losses or security incidents during refunds, transfers, or configuration changes. To mitigate these risks, the team suggests several protective measures (a guard sketch follows the list):
- Operations involving refunds, transfers, deletions, or sensitive configurations should require explicit confirmation in the current session.
- Instructions that invoke stored memory, such as 'habitual,' 'usual method,' or 'as before,' should be treated as high-risk state changes.
- Long-term memory must carry traceability metadata for each entry: who wrote it, when, and whether it was explicitly confirmed.
- Vague instructions should automatically elevate the risk level and trigger secondary verification.
- Long-term memory should not replace real-time authorization processes.
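Taken together, the confirmation, high-risk flagging, and escalation measures amount to a gating check that runs before any execution. The sketch below is one possible reading, assuming a simple pre-execution hook; `SENSITIVE_OPS`, `MEMORY_PHRASES`, `risk_level`, and `authorize` are hypothetical names, not AgentGuard's API.

```python
# A minimal pre-execution guard combining the measures above (hypothetical).

SENSITIVE_OPS = {"refund", "transfer", "delete", "update_config"}
MEMORY_PHRASES = ("as usual", "usual method", "as before", "habitual")


def risk_level(operation: str, instruction: str) -> str:
    """Classify a request before execution."""
    text = instruction.lower()
    if any(phrase in text for phrase in MEMORY_PHRASES):
        return "high"  # memory-invoking wording is itself treated as high risk
    if operation in SENSITIVE_OPS:
        return "high"
    return "low"


def authorize(operation: str, instruction: str, confirm) -> bool:
    """Sensitive or memory-driven requests need in-session confirmation.

    `confirm` is a callable that asks the current user and returns a bool.
    Stored preferences are never consulted here: only a live answer counts.
    """
    if risk_level(operation, instruction) == "high":
        return confirm(f"Confirm '{operation}' for: {instruction!r}? (yes/no) ")
    return True


# Usage: a memory-invoking refund request is blocked unless the user
# explicitly confirms it in the current session.
if authorize("refund", "handle this dispute as usual",
             confirm=lambda prompt: input(prompt).strip().lower() == "yes"):
    print("refund executed")
else:
    print("refund blocked pending confirmation")
```

Note the design choice: memory never participates in the authorization decision, which is the substance of the final measure in the list.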
The team emphasizes that the 'AI agent memory system' should itself be treated as a potential attack surface, constrained and audited through a dedicated security framework.
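The traceability measure above implies attaching provenance to every memory write so that an audit can surface suspicious entries. A minimal sketch, assuming a dataclass-backed store; `MemoryEntry` and `audit_unconfirmed` are hypothetical names, not part of any published framework.

```python
# Hypothetical traceable memory entries: writer, timestamp, confirmation.

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MemoryEntry:
    key: str
    value: str
    writer: str            # who wrote this entry
    written_at: datetime   # when it was written
    confirmed: bool        # did the user explicitly confirm it?


def audit_unconfirmed(store: list[MemoryEntry]) -> list[MemoryEntry]:
    """Surface entries that were never explicitly confirmed by the user."""
    return [e for e in store if not e.confirmed]


store = [
    MemoryEntry("dispute_policy", "issue full refund",
                writer="session:attacker",
                written_at=datetime.now(timezone.utc),
                confirmed=False),
]
for entry in audit_unconfirmed(store):
    print(f"UNCONFIRMED memory write by {entry.writer} at {entry.written_at}: "
          f"{entry.key} = {entry.value!r}")
```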