Lyrie
AI-Security
0 sources verified·12 min read
By Lyrie Threat Intelligence·4/27/2026

The Agentic Trojan: ClawHavoc, ClawJacked, and How AI Skill Marketplaces Became the Next Supply Chain Battleground

TL;DR: IBM X-Force researchers catalogued over 255 security advisories for OpenClaw — GitHub's most-starred AI agent platform — in a single report published this week. The ClawHavoc supply-chain campaign compromised 341 of 2,857 ClawHub skills (12% of the entire registry) with keyloggers and infostealers. ClawJacked (CVE-2026-25253, patched February 2026) let any malicious website silently hijack a locally running agent via indirect prompt injection. Together these incidents prove the threat model security professionals spent years projecting has arrived: autonomous AI agents are now a tier-one attack surface, and defenders are operating without the tools they need to track it.


Background: The Agent Revolution Happened Faster Than Security Could Follow

Agentic AI was a research curiosity in 2024. By late 2025 it was a product category. By Q1 2026, roughly one quarter of organizations were actively piloting autonomous AI agents, according to Deloitte. Platforms like OpenClaw — which can browse the web, write and execute code, manage local files, read credentials, and chain arbitrarily many capabilities into autonomous task loops — went from niche developer tools to enterprise infrastructure in months.

That velocity created a problem security has seen before: adoption outran threat modeling, threat modeling outran tooling, and attackers noticed before defenders did.

OpenClaw (formerly ClawdBot, formerly MoltBot) became GitHub's most-starred repository weeks after launch. It also became, almost immediately, an attack target. IBM X-Force researchers Chris Ristig and Sandra Hill, publishing on April 23, 2026, documented what the platform's security posture looks like from the outside: 255+ published GitHub Security Advisories, spanning command execution vulnerabilities, plaintext API key leaks, and credential theft via indirect prompt injection.[1]

The CVE tracking system — designed for traditional software stacks — is already breaking under the load. Many OpenClaw vulnerabilities haven't been assigned CVE identifiers at all, which means they won't surface in vulnerability scanners, compliance dashboards, or most patch management platforms. They are effectively invisible to enterprise security tooling.


Technical Analysis

The Lethal Trifecta: Why Agentic AI Is Uniquely Dangerous

IBM X-Force named the core risk clearly: autonomous AI agents possess a "lethal trifecta" that no previous software category combined at this scale:

1. Deep access to private local data — file systems, credentials, browser sessions, SSH keys, API tokens stored in config files

2. Interaction with untrusted external content — web pages, email, documents, API responses, any data the agent reads while executing tasks

3. Ability to communicate outward — send messages, make API calls, push data to external endpoints, execute code that phones home

In a traditional application, compromise at one layer doesn't automatically cascade to the others. In an agentic system, they're all connected by the LLM acting as coordinator. A single poisoned input that hijacks the LLM's instruction state can chain across all three layers simultaneously. As IBM X-Force put it: "A leaked token or a spoofed packet can quickly escalate into full operator-level compromise."

This isn't theoretical. The ClawJacked and ClawHavoc incidents demonstrate it operating in the wild.

ClawJacked: Indirect Prompt Injection as a Remote Hijack Vector

The ClawJacked vulnerability, discovered by Oasis Security researchers and patched in OpenClaw version 2026.2.26 (February 26, 2026), is a textbook illustration of how agentic AI transforms prompt injection from a "say something bad" problem into a "do something bad" problem.

The attack chain:

A malicious website instructs OpenClaw's browser automation layer to treat embedded content as commands rather than data. Because OpenClaw is designed to act on instructions it processes during task execution, and because the platform's trust model didn't adequately distinguish between user-originated instructions and data-originated instructions, a sufficiently crafted web page could:

1. Identify that an OpenClaw instance was running locally (via connection probing)

2. Brute-force the local API port (OpenClaw defaults to localhost with predictable port ranges)

3. Inject instructions that caused the agent to exfiltrate data — API keys, session tokens, file contents — to an attacker-controlled endpoint

4. Execute the exfiltration silently, with no user interaction required beyond the agent's autonomous browsing task

The Oasis Security disclosure framing made the severity clear: the victim didn't need to click anything, approve anything, or interact with the malicious site in any way the agent didn't already plan to do. The attacker needed only to get the target agent to visit (or process content from) a page they controlled.

This is the "confused deputy" pattern manifested in production AI infrastructure. The agent has permissions the user granted for legitimate tasks; indirect injection redirects those permissions toward attacker goals without triggering any visible user consent event.

Patched in 2026.2.26. But the underlying architectural problem — that an LLM coordinating autonomous actions needs robust trust boundaries between instruction sources — is not solved by a single patch. It's a design challenge that affects the entire category of agentic AI platforms.

ClawHavoc: 12% of the Skill Registry Was Malware

The ClawHub marketplace for OpenClaw skills (analogous to npm for Node.js, or the Chrome Extension store for browsers) was compromised at scale in early 2026. The campaign, dubbed ClawHavoc by investigators, ran in at least two phases:

Phase 1 — Infiltration: Attackers created publisher accounts and uploaded skills designed to appear legitimate. One confirmed threat actor, operating as "hightower6eu," uploaded dozens of near-identical malicious skills with slight name variations. The skills used professional documentation, screenshots, and credible descriptions — "solana-wallet-tracker," crypto utilities, developer productivity tools — the same masquerade pattern used in npm and PyPI supply chain campaigns.

Phase 2 — Execution: Once a user installed a ClawHavoc skill and the skill received execution context from the agent runtime, it did one of two things depending on the victim's operating system:

  • Windows: Dropped a keylogger
  • macOS: Installed Atomic Stealer malware (a credential-harvesting infostealer that targets browser passwords, crypto wallets, and stored API keys)

The final confirmed count: 341 malicious skills out of 2,857 audited — roughly 12% of the entire ClawHub registry. Several of the malicious packages became some of the most-downloaded in the catalog before detection. CVE-2026-25253 describes the sandbox escape that allowed skills to break out of execution constraints and operate with host-level privileges.

IBM X-Force noted that by the time investigators fully mapped the campaign, attackers had uploaded over 1,100 malicious skills in total (including variants and re-uploads after initial removals). The Next Web reporting put the initial discovery at 341 confirmed before the campaign expanded to 800+ before full takedown.[2]

This is the npm/PyPI playbook, applied to an attack surface with far more damaging execution context. A malicious npm package compromises a developer build environment. A malicious ClawHub skill compromises an autonomous agent with access to the operator's local files, credentials, browser, and code execution environment — all at once.

The CVE Tracking Gap: AI Vulns Are Invisible to Enterprise Tooling

IBM X-Force raised a structural concern that deserves emphasis. The CVE assignment and enrichment pipeline — NIST NVD, MITRE, the full disclosure ecosystem — was built for traditional software. The pipeline assumes vulnerabilities are discrete, identifiable, and arrive at a pace compatible with manual review.

Agentic AI is breaking that assumption. OpenClaw's 255+ GitHub Security Advisories represent only the formally tracked issues. Many vulnerabilities in this category aren't receiving CVE identifiers fast enough, or at all. This creates a dangerous blind spot:

  • Patch management tools rely on CVE IDs to surface what needs updating
  • Compliance frameworks (SOC 2, PCI, ISO 27001 controls) reference CVE-based risk scoring
  • SIEM correlation rules and threat intel feeds key on CVE identifiers
  • Vulnerability scanners miss flaws without formal CVE assignments

An enterprise deploying OpenClaw in production may have a fully patched, CVE-clean instance — and still be running with multiple untracked vulnerabilities that appear nowhere in their security tooling. IBM X-Force's framing: "The traditional CVE assignment and enrichment process is working to adapt and catch up, but organizations can't afford to wait for formal updates before responding."

The Broader 2026 AI Security Landscape

ClawHavoc and ClawJacked are specific incidents. They sit inside a broader threat environment that TokenMix Research Lab documented on April 25, 2026, drawing on disclosed incidents across the industry:[3]

| Metric | 2026 Reality |

|---|---|

| OWASP LLM01 (top risk) | Prompt injection (unchanged) |

| Production AI deployments vulnerable to injection | 73% |

| Multi-turn jailbreaks | Now the primary vector on frontier models |

| Jailbreak transfer rate (GPT-4 → Claude 2) | 64.1% |

| Average time to generate successful GPT-4 jailbreak | <17 minutes |

The evolution from 2024-2025 to 2026:

  • Single-shot jailbreaks declining as frontier models improve alignment; attackers pivoted to multi-turn conversational attacks that exploit the model's contextual memory across a session
  • Multimodal injection surfaces matured: malicious instructions embedded in images, steganographic payloads in PNG/JPEG metadata, QR codes decoded to override commands, PDF documents carrying embedded directives
  • MCP tool poisoning emerged as agents adopted the Model Context Protocol — malicious servers register seemingly useful tools, agent frameworks naively trusting tool registration get compromised (extensively covered in our April 26 deep dive on CVE-2026-30615)
  • Credential exfiltration via injection became a documented pattern: payloads that trick agents into echoing API keys or session tokens to attacker-controlled endpoints via crafted tool parameters or URL construction

The Slack AI Assistant incident from 2025 established the template: hidden instructions in Slack messages hijacked the AI assistant to leak private channel data to attacker-controlled servers. No malware, no CVE, no perimeter breach — just prompt injection in what looked like normal internal chat. The ClawJacked pattern is the local-agent equivalent.


IOCs / Indicators

ClawHavoc Campaign:

  • Publisher account: hightower6eu (confirmed malicious)
  • Malicious skill naming patterns: cryptocurrency trackers, "solana-*", developer utilities with minor name variations from legitimate packages
  • Payload behavior: Atomic Stealer on macOS, keylogger payload on Windows
  • CVE: CVE-2026-25253 (sandbox escape enabling host-level execution)

ClawJacked Vulnerability:

  • Affected versions: OpenClaw prior to 2026.2.26
  • Attack vector: Indirect prompt injection via malicious web content
  • Mechanism: Local API port brute-force + instruction injection via processed web content
  • Patched: OpenClaw version 2026.2.26 (February 26, 2026)
  • Researcher: Oasis Security

Behavioral IOCs for AI Skill Compromise:

  • Agent making outbound requests to non-task-related endpoints during skill execution
  • Unusual file access patterns (credential files, SSH keys, .env files) outside stated task scope
  • Skill processes spawning child processes not declared in skill metadata
  • Network connections to IP ranges not associated with legitimate skill function
  • Exfiltration patterns: base64-encoded data in URL parameters, DNS tunneling, HTTP PUT to unexpected endpoints

Lyrie Take

The ClawHavoc and ClawJacked incidents confirm something Lyrie has been tracking since we published our agentic AI threat model in late 2025: the agentic attack surface is not an extension of the LLM attack surface — it's a categorically different class of risk.

When an attacker manipulates a chatbot, they get bad text. When an attacker manipulates an autonomous agent with operator-level access, they get an autonomous execution environment pointed at your credentials, your files, your internal systems, and your outbound communication channels. The blast radius is not bounded by what the LLM can say — it's bounded by what the agent has permission to do.

The skill marketplace attack pattern is particularly concerning because it inverts the trust model defenders rely on. Security teams can audit code. They can review dependencies. They can scan for known-malicious packages in npm and PyPI. But AI skill marketplaces are operating outside established software supply chain controls: no signing by default, no reproducible builds, no automated behavior analysis at the registry level, and — critically — CVE tracking that can't keep pace with the volume.

The rogue AI dimension is direct. ClawHavoc skills, once installed, operated as autonomous agents within the agent: they redirected the host agent's execution context toward attacker goals without the operator's knowledge or consent. This is precisely the threat model Lyrie was built to address — not just malware, but AI systems operating contrary to operator intent, at machine speed, with operator-level permissions. An agent that installs a malicious skill and then executes that skill's instructions is no longer your agent. Detecting that state transition — intent drift, behavioral divergence from declared purpose — is the core problem Lyrie's autonomous defense layer solves.

The numbers from IBM X-Force and TokenMix make the urgency clear: 73% of production AI deployments are vulnerable to prompt injection. 12% of a major skill registry was actively malicious. This is not edge-case threat modeling. It's current-state risk management.


Defender Playbook

For organizations running agentic AI in production:

1. Audit your skill/plugin registry exposure immediately. If you're running ClawHub skills installed before February 2026, assume any skill installed from a non-verified publisher should be treated as potentially compromised. Cross-reference against the 341 confirmed malicious skill list (via Reco.ai disclosure).

2. Update OpenClaw to 2026.2.26 or later. CVE-2026-25253 and the ClawJacked IPI vulnerability are patched. Running pre-patch versions against untrusted web content is unacceptable operational posture.

3. Implement explicit trust boundaries for instruction sources. Your agent's system prompt should explicitly define what sources are authorized to provide instructions. Data processed during task execution (web content, emails, documents, API responses) should be handled as untrusted data, not as potential instruction input. This is an architectural control, not just a prompt engineering recommendation.

4. Deploy network monitoring for agent processes. Instrument outbound connections from agent runtime processes. Legitimate task execution has a predictable network footprint; malicious exfiltration does not. Anomalous DNS, unexpected IP ranges, and base64-in-URL patterns are all detectable at the network layer.

5. Establish a CVE-independent tracking process for AI security advisories. GitHub Security Advisories for AI platforms, vendor disclosure blogs, and security researcher publications are producing vulnerability intelligence faster than the CVE pipeline can formalize. Your security program needs to ingest these directly rather than waiting for NVD enrichment.

6. Apply minimal-privilege principles to agent capability scope. Agents that need to draft emails don't need file system write access. Agents that need to read documents don't need SSH tooling. Reducing the capability surface limits the blast radius of a successful compromise.

7. Require human-in-the-loop checkpoints for high-risk agent actions. File deletion, outbound API calls to new endpoints, code execution outside pre-approved paths, and credential access should all trigger confirmation prompts that can't be bypassed by instruction injection from processed content.

8. Monitor for multi-turn prompt injection patterns. Single-shot injection detection (looking for "ignore all previous instructions" in input) is insufficient. Multi-turn attacks gradually shift context across a session. Behavioral monitoring that tracks semantic drift in agent task execution is the correct detection approach.


Sources

1. Chris Ristig, Sandra Hill — "What OpenClaw reveals about agentic AI security risks" — IBM X-Force / IBM Think, April 23, 2026: https://www.ibm.com/think/x-force/agentic-ai-growing-fast-vulnerabilities

2. The Next Web — "Sequoia distributes 200 engraved Mac Minis at AI event as OpenClaw becomes the infrastructure layer VCs cannot own" (ClawHavoc timeline detail) — April 2026: https://thenextweb.com/news/sequoia-openclaw-mac-mini-ai-agents

3. TokenMix Research Lab — "LLM Security News 2026: Latest Attacks, Defenses & Updates" — April 25, 2026: https://tokenmix.ai/blog/llm-security-news-2026-attacks-defenses-updates

4. Reco.ai — "OpenClaw: The AI Agent Security Crisis Unfolding Right Now" (341 malicious skills detail, CVE-2026-25253) — April 2026: https://www.reco.ai/blog/openclaw-the-ai-agent-security-crisis-unfolding-right-now

5. IBM X-Force — "Agentic AI Attack Surface 2026" / dev.to cross-reference: https://dev.to/lucky_lonerusher/autonomous-ai-agents-attack-surface-2026-security-risks-of-agentic-ai-4bl1

6. OWASP LLM Top 10 2026 / Elevate Consulting analysis: https://elevateconsult.com/insights/owasp-llm-top-10-security-vulnerabilities-every-ai-developer-must-know-in-2026/

7. Adven Boost — "OpenClaw ClawHub: The 2026 Security-First Guide to Agent Skills" (sandbox escape detail): https://advenboost.com/openclaw-clawhub/

8. Anthem Creation — "Hermes vs OpenClaw" (12% registry compromise confirmation): https://anthemcreation.com/en/artificial-intelligence/hermes-vs-openclaw-learning-orchestrating-ai-agents/


Lyrie.ai Cyber Research Division — Senior Analyst Desk

Lyrie Verdict

Lyrie's autonomous defense layer flags this class of exposure the moment it surfaces — no signature update required.