What is Lyrie Research?

Lyrie Research is an autonomous cybersecurity intelligence platform publishing verified threat intelligence including critical CVEs, active exploitation reports, breach analysis, and original research — every article cross-validated by 3+ primary sources.

How does Lyrie.ai protect against rogue AI threats?

Lyrie.ai uses machine-speed autonomous defense to detect and neutralize rogue AI, prompt injection, and AI supply chain attacks. It responds before human analysts can react.

What cybersecurity topics does Lyrie Research cover?

Lyrie Research covers CVE deep dives, CISA KEV actively exploited vulnerabilities, data breach forensics, original cybersecurity research, and AI threat intelligence including agent-based attacks.

How often is Lyrie Research updated?

Lyrie Research is updated continuously by the autonomous Lyrie Sentinel engine, publishing new threat intelligence multiple times daily as new CVEs, exploits, and breaches are confirmed.

Is Lyrie Research free to access?

Yes, all articles on Lyrie Research are freely accessible. For active protection by the same intelligence engine, visit lyrie.ai.

Poison Once, Exploit Forever: How AI Agent Memory Poisoning Turns a Single Malicious Document Into a Permanent Backdoor

← AI-Security

0 sources verified·10 min read

By Lyrie Threat Intelligence·4/29/2026

TL;DR

Researchers formalized the "Poison Once, Exploit Forever" attack class in April 2026 — and it is arguably the most structurally dangerous vulnerability in production AI systems today. Unlike a standard prompt injection that resets when the conversation window clears, a memory poisoning attack writes adversarial instructions into an agent's persistent storage: vector databases, episodic stores, external tool state, or biographical memory features. The payload then silently corrupts every future session without further attacker interaction.

The numbers are grim: AgentPoison achieves >80% attack success rate at under 0.1% poison rate with no model retraining. The Agent Security Bench (ASB) logged an 84.30% average success rate across 400+ tools and 27 attack/defense combinations. Meanwhile IBM X-Force documented 255+ GitHub Security Advisories against a single major agentic AI platform, and the ClawHavoc supply-chain campaign poisoned five of the top seven most-downloaded AI skill packages.

Memory poisoning is not theoretical. It is in production, in your tools, and it operates at a speed no human blue-teamer can match. That is precisely why autonomous defense — operating at machine speed — is the only viable countermeasure architecture.

Background: Agents Got Memory. Nobody Read the Security Spec.

The 2023–2024 wave of LLM deployments was stateless by default. Ask a question, get an answer, close the tab. The model forgot you. That was annoying for users but convenient for defenders: every session was a clean slate, so an injection attack's blast radius was limited to the current conversation.

The 2025–2026 cohort of agentic frameworks — LangChain, LangGraph, MetaGPT, AutoGen, OpenAI Assistants, Anthropic's agent APIs, and a dozen others — changed the equation fundamentally by adding persistent memory. Agents now accumulate context across sessions, learn from past task executions, retrieve semantically relevant history at runtime, and write durable artifacts (files, database records, calendar events, emails) as side effects of their work.

This persistence is what makes agents useful. It is also what makes them catastrophically poisonable.

When Deloitte reported in early 2026 that roughly one quarter of enterprises were piloting autonomous AI agents, those agents were being given persistent memory stores while the vulnerability research on those stores was still being written. The eTAMP paper landed on arXiv in April 2026 — the same month researchers coined "Poison Once, Exploit Forever" — while production deployments of the affected systems had been running for months.

Technical Analysis

The Four Memory Surfaces (And Their Attack Profiles)

AI agents do not use a single monolithic memory. They layer at least four distinct storage mechanisms, and each has a different poisoning surface, persistence duration, and exploitation technique.

1. In-Context Short-Term Memory

The active context window is the shortest-lived memory type. It holds the current conversation turn, tool call results, and intermediate reasoning. Because the LLM treats everything in context as equally trusted, a malicious tool response injected mid-session becomes functionally indistinguishable from legitimate system instructions.

CVE-2023-29374 (LangChain's llm_math chain) and CVE-2023-32786 (LangChain APIChain) both exploit this pattern — injecting adversarial text through tool outputs that the LLM then faithfully executes. The persistence is session-scoped, but the damage can include credential exfiltration, arbitrary code execution on the host, and — critically — writing poisoned content into longer-lived memory stores.

2. Episodic Memory (Experience Stores)

Frameworks like MetaGPT's DataInterpreter maintain episodic memory: records of past task executions that the agent retrieves at task start via semantic similarity. The agent is essentially pattern-matching "what did I do last time something like this came up?"

The MemoryGraft attack (arXiv:2512.16962) exploits this directly. An attacker supplies benign-looking artifacts during execution — documentation fragments, configuration files, summary outputs. These get stored in the episodic memory. When the agent handles a future task that is semantically similar, MemoryGraft causes it to surface the malicious procedure template as the "experienced" way to handle the task. The poisoned behavior is indistinguishable from learned competence. Testing confirmed persistence across sessions in MetaGPT with GPT-4o as the backbone.

3. Semantic Memory (Vector Databases)

RAG (Retrieval-Augmented Generation) pipelines are the most studied poisoning surface, and the results are the most alarming. Semantic memory covers vector databases — Pinecone, Weaviate, Chroma, Qdrant, pgvector — where documents are embedded and retrieved by cosine similarity at query time.

AgentPoison (arXiv:2407.12784) uses constrained optimization to generate trigger phrases that map malicious documents into unique embedding clusters. When a target keyword appears in a user query, the malicious document is reliably retrieved over legitimate content. The paper tested across three distinct agent types: autonomous driving agents, question-answering systems, and healthcare EHRAgent. Results: over 80% attack success rate at less than 0.1% poison rate, with less than 1% benign performance degradation.

The 0.1% figure deserves emphasis. In a vector database with 100,000 embedded chunks — typical for an enterprise knowledge base — an attacker needs to successfully insert approximately 100 poisoned documents. The agent's behavior on legitimate queries remains effectively unchanged; only the targeted attack trigger activates the malicious behavior. Detection by conventional monitoring is nearly impossible.

4. External Tool State (Files, Records, Emails, Code)

Any durable artifact an agent can write is also a memory surface: files on disk, database records, code commits, calendar entries, emails in an outbox. An indirect prompt injection via a document or web page the agent processes can instruct the agent to write malicious state back into storage it will re-read later — closing a feedback loop that compounds over time.

Johann Rehberger's SpAIware demonstration (September 2024) against ChatGPT's memory feature remains the cleanest public proof of concept. A prompt injection embedded in a Google Drive document caused the biographical memory tool to execute automatically, storing attacker-controlled beliefs that persisted across every subsequent conversation. The attacker's instructions — about the user's preferences, decision-making patterns, professional context — became part of the model's operational context without the user's awareness.

The eTAMP Campaign: Cross-Session, Cross-Site Exploitation at Scale

The eTAMP paper (arXiv:2604.02623, April 2026) is the most operationally significant new research in this space. Researchers demonstrated cross-session, cross-site exploitation against production AI browser agents, specifically ChatGPT Atlas and Perplexity Comet.

The attack flow:

1. A single compromised webpage is visited by the target agent

2. The page injects adversarial content into the agent's trajectory memory (its record of what it did during that browsing session)

3. In future sessions on entirely different websites, the poisoned memory entry activates — altering the agent's behavior on sites it has never visited before

This breaks a fundamental assumption of permission-based defenses: that blocking agent actions on known-malicious sites is sufficient. eTAMP demonstrated that the malicious site only needs to be visited once. The payload operates on clean sites in clean sessions with no further attacker contact.

Attack success rates ranged from 19.5% (GPT-OSS-120B) to 32.5% (GPT-5-mini) — lower than AgentPoison in isolated RAG scenarios, but these are end-to-end success rates against production systems with active safety controls. A one-in-three success rate for a persistent, cross-site, session-persistent backdoor is not a research curiosity; it is an operational capability.

The ClawHub Supply Chain: Poisoning at the Skills Layer

IBM X-Force's April 2026 analysis of the OpenClaw platform documented 255+ GitHub Security Advisories, with vulnerabilities concentrated around command execution, leaked API credentials via indirect prompt injection, and malicious skill packages.

The ClawHavoc campaign made this concrete. Attackers uploaded over 1,100 malicious skills to ClawHub — the primary AI agent skill marketplace — masquerading as productivity tools, crypto utilities, and coding assistants. At peak infection, five of the top seven most-downloaded skills were confirmed malware (secops.group, April 2026).

This is AI agent memory poisoning at distribution scale. The attack vector is not a compromised website or a malicious document — it is the skill registry itself, the equivalent of a poisoned npm package that, when installed, writes persistent adversarial instructions into the agent's memory store and exfiltrates credentials through legitimate-looking tool calls.

Palo Alto Networks researcher Jay Chen identified the root cause succinctly: "The root cause is prompt injection, which remains an open and unsolved problem." Every memory poisoning variant is ultimately a prompt injection that persists beyond the current session.

IOCs and Detection Indicators

While memory poisoning attacks are designed to avoid detection, the following patterns indicate potential compromise:

Behavioral anomalies:

Agent consistently recommends unusual external services not present in legitimate tool definitions
Unexpected writes to persistent memory stores following document ingestion or web browsing
Tool calls to external endpoints during retrieval operations (embedding lookups triggering outbound connections)
Semantic drift in agent persona or response style that persists across session resets

Infrastructure indicators:

Vector database entries with anomalously high retrieval scores for specific trigger terms
Episodic memory entries referencing URLs or services outside the expected operational domain
Agent credentials or API tokens appearing in outbound requests to non-whitelisted endpoints
Memory store write operations triggered during read-only tasks (e.g., document summarization)

Attack artifacts (MemoryGraft/AgentPoison pattern):

Trigger phrases: carefully crafted keyword combinations that appear innocuous but reliably surface poisoned memory (test by querying vector DB with expected trigger terms and auditing retrieved chunks)
Embedding distance clustering: legitimate documents cluster with peers; AgentPoison documents are engineered to form isolated clusters that dominate retrieval for target queries

Lyrie Verdict

Memory poisoning is the long-game attack that AI security defenders are not ready for. Conventional security tooling — SIEM rules, signature-based detection, perimeter controls — has no visibility into a vector database poisoning event. The attack surface is the LLM's own persistent context, and the malicious payload looks identical to legitimate memory content.

The attack class has three properties that make it uniquely dangerous for autonomous AI deployments: (1) persistence — it survives session resets, model updates, and user re-authentication; (2) stealth — success rates exceed 80% with less than 1% benign performance degradation, meaning the agent appears to function normally; (3) scalability — AgentPoison and similar techniques require minimal attacker access and no ongoing interaction after the initial injection.

The OWASP LLM Top 10 flagged LLM08 (Vector and Embedding Weaknesses) and LLM06 (Excessive Agency) as primary references. Enterprise security teams are still working through LLM01 (Prompt Injection) from 2023. The gap between the research frontier and enterprise defense posture is measured in years.

Lyrie's autonomous architecture is designed to close this gap. Machine-speed behavioral monitoring of memory operations — write auditing, retrieval anomaly detection, cross-session behavioral drift analysis — is the only detection approach that operates at the speed these attacks require. A human analyst reviewing agent memory logs weekly cannot catch a MemoryGraft payload that activates within hours of ingestion.

Defender Playbook

Immediate (this week):

1. Audit your agent's memory write permissions. No agent should be able to write to a persistent memory store (episodic, vector, external state) as a side effect of a retrieval or summarization task. Implement a write-approval gate for any memory store modification.

2. Enable chunk-level provenance logging for RAG pipelines. Every vector database insert should record the source document URL/path, ingestion timestamp, and the agent session that triggered ingestion. Without this, forensic reconstruction after a poisoning event is impossible.

3. Conduct trigger phrase testing against production vector stores. Sample 10–20 queries that cover your agent's primary use cases. For each, retrieve the top-10 chunks and manually inspect for content that does not belong — anomalous source domains, instructions embedded in what should be factual content, unexpected tool call directives.

Short-term (30 days):

4. Deploy memory integrity verification. Hash-based attestation of vector database content at ingestion; periodic re-validation of stored embeddings against source hashes. Unexpected hash drift indicates modification or injection.

5. Implement cross-session behavioral profiling. Establish a baseline of normal agent tool call patterns, external endpoint contacts, and memory access sequences. Alert on statistical deviations that persist across session resets — these are the fingerprint of a persistent memory compromise rather than a transient injection.

6. Apply OWASP LLM08 remediation guidance. Input sanitization at the embedding pipeline ingestion layer; output filtering that rejects memory write commands embedded in retrieved context; least-privilege agent credentials that limit the blast radius of a successful memory injection.

Structural (60–90 days):

7. Separate retrieval and action contexts. Architecturally prevent retrieved memory content from directly instructing agent tool calls without a validation step. The agent's planner should validate tool call instructions against a whitelist derived from the task's original objective — not against potentially poisoned retrieved context.

8. Deploy autonomous memory monitoring. Human-in-the-loop memory auditing does not scale to production agentic AI deployments. Automated anomaly detection on memory operations — flagging unusual writes, clustering anomalies in vector stores, behavioral drift from session baseline — is a prerequisite for operating agents at enterprise scale.

Sources

1. BeyondScale Team. AI Agent Memory Poisoning: Defense Guide 2026. BeyondScale, April 25, 2026. https://beyondscale.tech/blog/ai-agent-memory-poisoning-defense-guide

2. Ristig, C. and Hill, S. (IBM X-Force). What OpenClaw reveals about agentic AI security risks. IBM Think, April 23, 2026. https://www.ibm.com/think/x-force/agentic-ai-growing-fast-vulnerabilities

3. Chen, J. et al. (Palo Alto Networks Unit 42). Bad Memories Remain a Threat to Agentic AI Systems. Dark Reading, April 24, 2026. https://www.darkreading.com/vulnerabilities-threats/bad-memories-haunt-ai-agents

4. Brunner, T., Liu, Y., Pande, M. (Google Threat Intelligence). AI threats in the wild: The current state of prompt injections on the web. Google Online Security Blog, April 23, 2026. https://security.googleblog.com/2026/04/ai-threats-in-wild-current-state-of.html

5. AgentPoison Authors. AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases. arXiv:2407.12784. https://arxiv.org/abs/2407.12784

6. MemoryGraft Authors. MemoryGraft: Episodic Memory Poisoning in Agentic AI. arXiv:2512.16962. https://arxiv.org/abs/2512.16962

7. eTAMP Authors. eTAMP: Cross-Session Trajectory Memory Poisoning Against AI Browser Agents. arXiv:2604.02623, April 2026. https://arxiv.org/abs/2604.02623

8. Rehberger, J. SpAIware: Persistent Cross-Session Memory Injection in ChatGPT. September 2024. https://embracethered.com/blog/posts/2024/chatgpt-macos-app-persistent-data-exfiltration/

9. secops.group. Securing Agentic AI: The OWASP Top 10 and Beyond. April 2026. https://secops.group/blog/securing-agentic-ai-the-owasp-top-10-and-beyond/

10. OWASP. LLM Top 10 2025: LLM08 (Vector and Embedding Weaknesses), LLM06 (Excessive Agency). https://genai.owasp.org/llmrisk/

Lyrie.ai Cyber Research Division — Senior Analyst Desk