MCP Security Risks in Agent Skill Registries

Snyk’s ToxicSkills audit dropped in February 2026 with numbers that reframe the MCP security conversation entirely: 1,467 malicious payloads across 3,984 scanned skills, a 36% flaw rate, and 76 confirmed malicious skills with active payloads. Within days, Antiy CERT documented ClawHavoc, a coordinated supply chain campaign that poisoned 1,184 skills on ClawHub before most platform users had heard the word “remediation.” By April 27, OWASP had published a dedicated Agentic Skills Top 10 risk framework, confirming what the incident data already showed: agent skill registries are a mature attack surface, not an emerging one.

The npm parallel is instructive but understates the exposure. A malicious npm package might steal a credential. A malicious agent skill inherits the full permission set of the agent running it, with access to file systems, cloud credentials, persistent memory, and outbound channels, while simultaneously manipulating the agent’s reasoning through prompt injection. For engineering and security leaders deploying MCP-based tooling and coding agents at scale, the governance window is narrow.

The Numbers Behind the Problem

Snyk’s ToxicSkills research represents the first comprehensive security audit of the agent skills ecosystem, scanning 3,984 skills from ClawHub and skills.sh as of February 2026. The findings are difficult to contextualize without alarm: 36.82% of all skills (1,467 total) contain at least one security flaw, and 13.4% contain at least one critical-level issue. Snyk’s human-in-the-loop review confirmed 76 skills with malicious payloads, and as of publication, 8 of those skills remained publicly available on ClawHub.

The growth curve amplifies every one of those numbers. Daily skill submissions to ClawHub jumped from under 50 in mid-January 2026 to over 500 by early February, a 10x increase in weeks. Vetting capacity did not scale alongside it.

The barrier to publishing: a SKILL.md file and a GitHub account at least one week old. No code signing. No security review. No sandbox by default.

Why This Is Worse Than the npm Parallel

Traditional packages execute in relatively constrained contexts. Agent skills inherit the full permissions of the AI agent they extend: shell access, file system read/write, access to credentials stored in environment variables and config files, and the ability to send messages across email, Slack, and other channels. Persistent memory that survives across sessions.

A malicious agent skill can exfiltrate credentials, modify agent memory for long-term behavioral persistence, disable security tooling, and do all of this while the agent’s own safety mechanisms are being manipulated through simultaneous prompt injection. Snyk found that 91% of confirmed malicious skills combined traditional malware patterns with prompt injection techniques, a convergence that defeats both AI safety mechanisms and conventional MCP security scanners.

In the npm era, attackers figured out the attack surface after developers did. Here, according to Snyk’s research, they got there first.

Anatomy of a Hybrid Attack: When Prompt Injection Meets Traditional Malware

The 91% figure from Snyk’s ToxicSkills research deserves the most attention from security teams, because it describes something genuinely new in the threat landscape. Nearly every confirmed malicious skill combines traditional malware patterns with prompt injection techniques simultaneously. This convergence is a systematic response to how defenses are layered in agentic systems.

Traditional antivirus and static analysis tools scan for malicious code: suspicious executables, known-bad hashes, dangerous shell patterns. They cannot evaluate whether a block of natural language text is manipulating an AI agent’s reasoning. Conversely, AI safety filters are trained to recognize and refuse explicit malicious instructions, but are not reliably catching conventional malware embedded in what looks like legitimate skill documentation. Each defense layer has a blind spot, and the 91% convergence rate suggests attackers have mapped exactly where those blind spots meet.

Three Techniques That Frame the Picture

Password-protected ZIPs for AV evasion. Skill installation instructions direct the agent to download an archive that automated scanners cannot inspect. The password is provided in the skill documentation itself, readable by the agent, invisible to the scanner.

Unicode and Base64 obfuscation for exfiltration commands. Snyk found obfuscated commands in confirmed malicious skills that, once decoded, resolve to credential harvesting one-liners: a curl request that reads AWS credentials, base64-encodes them, and posts them to an attacker-controlled endpoint. The obfuscation layer keeps the payload out of signature-based detection while the prompt injection layer keeps the agent from questioning the instruction.

Hardcoded secrets in 10.9% of skills. Some are developer accidents, API keys left in published skill files. Others are deliberate, embedding passwords for malicious archives or tokens for attacker infrastructure directly into the skill content.

The academic research from PoisonedSkills, published in April 2026, adds important depth to the bypass question. The researchers introduced a technique called Document-Driven Implicit Payload Execution (DDIPE), which embeds malicious logic inside code examples and configuration templates within skill documentation rather than phrasing it as an explicit instruction. When the agent handles a routine task, it reproduces these examples as reference implementations and executes them. Tested across four agent frameworks and five models, DDIPE achieved bypass rates of 11.6% to 33.5% across all configurations. Under the same defended setups, direct injection achieved 0%. The gap between those two numbers is the practical value of embedding malicious intent in documentation structure rather than stating it directly.

The PoisonedSkills finding also surfaces something important about MCP security defense architecture: the two defense layers, model-level safety alignment and framework-level architectural guardrails, do not compose predictably. Removing architectural protections amplified one model’s execution rate by 11.3x while leaving another nearly unchanged. Neither layer alone is sufficient, and their interaction is model-dependent in ways that are difficult to anticipate without systematic testing. For organizations building an AI governance framework around agent skill deployment, this asymmetry means no single control point covers the rest of the stack.

Static analysis catches the majority of these attacks: Semgrep flagged 90.7% of the PoisonedSkills adversarial samples. The remaining 9.3%, including a 479-byte pip configuration script that every tested model executed without hesitation, used legitimate-looking operational patterns that static tools cannot distinguish from routine automation. The 2.5% that evaded both static analysis and model alignment represent the hardest problem: payloads semantically disguised well enough to look like normal DevOps configuration to both automated scanners and the models themselves.

ClawHavoc: What a Real-World Agent Supply Chain Attack Looks Like

On February 1, 2026, Koi Security identified a coordinated wave of malicious skill uploads on ClawHub, operating simultaneously across multiple threat actors. By February 5, Antiy CERT’s post-incident analysis confirmed that 900 of the platform’s 4,500 skills, roughly one in five, had been weaponized. The campaign was named ClawHavoc, and it is the clearest picture we have of what an agent supply chain attack looks like at scale.

The Five-Step Kill Chain

The attack began with a poisoned manifest. Threat actors uploaded skills that appeared functional, in some cases genuinely were functional, while embedding malicious instructions in the manifest content. One skill titled “What Would Elon Do?” accumulated significant installations before detection, demonstrating that social engineering in skill marketplaces follows the same attention-harvesting playbook as traditional app store manipulation.

Step two was LLM social engineering through skill selection. Because agents choose tools based on natural-language descriptions, poisoned skills could manipulate that selection process through their metadata. The skill described itself as authoritative or uniquely capable for a given task, nudging the agent toward it without any code execution required yet.

Step three was execution under trusted agent context. Once selected, the skill ran with the full permissions of the host agent, inheriting file system access, environment variables, cloud credentials, and outbound network capability. Traditional security tooling, built to inspect code rather than evaluate natural language instructions, had no reliable visibility into what the agent was being directed to do.

Step four was memory poisoning, where ClawHavoc earned its durability as a threat. According to ThreatDown’s analysis, malicious skills rewrote MEMORY.md, the agent’s persistent instruction store, injecting directives that survived skill deletion. The agent continued behaving maliciously after the original skill was removed, because from the agent’s perspective, those directives had always been part of its configuration. Incident response teams who deleted the skill and moved on left the persistence mechanism fully intact.

Step five was credential exfiltration: the classic curl-based harvest of AWS credentials and similar secrets, routed to attacker-controlled infrastructure.

The Exposure Surface CVE-2026-25253 Made Exploitable

Running beneath the skill-based attack was a separate vulnerability that expanded the damage radius. SecurityScorecard’s STRIKE Team scanned the internet and found 42,900 publicly accessible OpenClaw instances across 82 countries. Of those, 93% lacked proper authentication. CVE-2026-25253, scored CVSS 8.8, chained a CSRF flaw into full remote code execution via WebSocket hijacking. OpenClaw accepted a user-supplied gatewayUrl parameter from the browser query string and automatically transmitted the user’s authentication token to whatever endpoint was specified. One malicious link, one click, full execution.

MCP Security as a Control Plane Problem

ClawHavoc exposed an MCP security architecture gap that no individual team can patch its way out of. The 15,200 instances still vulnerable after the patch was released confirm what traditional vulnerability management already taught us: disclosure and remediation do not move at the same speed, and in an ecosystem of autonomous agents with active tool-calling capabilities, the cost of that lag is higher than it was with passive software dependencies.

The campaign’s five-step structure is a blueprint for where controls need to sit. Registry governance stops step one. Agent hardening limits step three. Runtime monitoring catches steps four and five. At ClawHavoc’s peak, none of those layers existed in any consistent way across the 42,900 exposed instances. The attackers did not find a sophisticated vulnerability. They walked through a series of doors that nobody had thought to close yet.

OWASP Draws the Boundary: The Agentic Skills Top 10 Is Now a Recognized Threat Category

OWASP published its Agentic Skills Top 10 on April 27, 2026. When OWASP formalizes a threat category, it is confirming that the risk has already matured enough to require a shared vocabulary, structured mitigations, and institutional accountability. The AST10 is that confirmation for agent skill registries.

MCP Security Gets Its Own Risk Framework

AST01: Malicious Skills (Critical). The ClawHavoc campaign and Snyk’s ToxicSkills audit are both cited as defining incidents for this category. OWASP’s prescribed mitigations are Merkle root signing and registry scanning. Merkle root signing means treating every skill publication as a cryptographically verifiable event, the same approach that hardened certificate transparency in the browser ecosystem. Registries that cannot produce a signed audit trail of what was published, by whom, and when are operating without a basic accountability layer.

AST02: Supply Chain Compromise (Critical). Registry transparency and provenance tracking are the prescribed controls. The framing maps directly to what software supply chain security learned from SolarWinds and the npm typosquatting era: knowing what you’re running is a prerequisite for securing it, and “it came from the marketplace” is not provenance.

AST03: Over-Privileged Skills (High). Snyk’s February 2026 research found more than 280 credential-leaking skills in the wild. OWASP’s response is structural: least-privilege manifests and schema validation, meaning skills should declare exactly what permissions they require and nothing more, with those declarations machine-verifiable before installation.

AST10: Cross-Platform Reuse (Medium). The Universal YAML format has made it straightforward to port skills across ecosystems. Malicious skills documented on ClawHub have been observed migrating to skills.sh. An approved skill in one registry cannot be assumed safe in another.

The cumulative message from these four categories is precise: skills are software dependencies, and they need to be governed accordingly. An AI governance framework that covers your models and your prompts but stops short of the tools your agents are calling is incomplete. OWASP has now drawn that boundary explicitly.

Try Obot Today

⬇️ Download the Obot open-source gateway on GitHub and begin integrating your systems with a secure, extensible MCP foundation.

The MCP Layer: Why the Attack Surface Grows With Every New Server You Connect

MCP standardizes more than the integration surface. It standardizes the attack surface too.

When an agent connects to an MCP server, it receives tool descriptions written in natural language. Those descriptions are how the agent decides which tools to call, when to call them, and what to trust. A malicious MCP server doesn’t need to ship exploitable code; it can simply describe its tools in ways that redirect agent behavior, monopolize tool selection, or smuggle instructions into the agent’s reasoning process. The manipulation lives entirely in metadata.

MCP Security Starts With What You’re Connecting To

The registry landscape makes this harder to manage than it should be. Skills.sh, one of the two major skill distribution channels Snyk audited, has zero publisher verification and no automated scanning. ClawHub at least ties publishing to a GitHub account and runs VirusTotal scans before approval. But even ClawHub’s local scanner has documented gaps: according to the explain-openclaw security documentation, the openclaw security audit command inspects only JavaScript and TypeScript file types. It does not scan .md, .sh, or .py files. It also caps at 500 files and 1 MB per file, silently skipping anything over the limit. SKILL.md, the primary carrier of prompt injection payloads in malicious skills, falls entirely outside its scope. That is the file format malicious skills depend on most.

Every MCP server an agent connects to is an unvetted dependency with potential access to the agent’s full permission set. Each new connection expands the attack surface by a factor proportional to what that agent can reach: file systems, credentials, cloud APIs, memory that persists across sessions. Organizations deploying agents at scale are accumulating MCP connections faster than any team can manually review them.

The architectural response is a centralized MCP gateway sitting between agents and the tools they call. Instead of each agent maintaining its own direct connections to an unbounded set of MCP servers, a gateway enforces an approved catalog, handles authentication, and maintains an audit trail of every tool invocation. The Obot MCP Gateway is a production-grade implementation of this pattern, designed specifically for the governance reality that OWASP’s AST10 and the ToxicSkills research have now formalized. For any organization building a coherent AI governance framework around agent tool access, a centralized gateway converts an unmanageable perimeter into a controlled one.

The security posture of your agent deployment is, in large part, a function of how many unreviewed MCP servers it can reach. Reducing that number is the first architectural decision worth making.

What Organizations Deploying Coding Agents at Scale Need to Do Now

Treat Skills and MCP Servers as Software Dependencies

Every skill your agents load, every MCP server they connect to, is a dependency with a transitive permission set. Apply the same controls you would to any third-party library: track provenance, pin versions, require review before deployment, and log every invocation. The OWASP Agentic Skills Top 10 prescribes registry transparency and Merkle root signing as the baseline for AST02 compliance. If your current tool catalog cannot tell you who published a skill, when, and from what source, you have an unsigned dependency graph.

Enforce Least Privilege at the Skill Level

Skills should not inherit the full permission set of the agent by default. OWASP AST03 calls for least-privilege manifests and schema validation: skills declare exactly what access they require, and those declarations are machine-verified before installation. Most of the 280-plus credential-leaking skills Snyk found in the wild did not need the access they had.

Audit Your Existing Deployments for Over-Privileged Connections

Map every MCP server connection across your current agent deployments. For each one, ask: does this agent need this tool’s full capability set, and would you know if that connection was established silently by a skill rather than explicitly by your team? Start with whichever answers are most uncomfortable.

Require Authentication on Every MCP Server Instance

93% of publicly accessible OpenClaw instances lacked proper authentication. CVE-2026-25253 turned that gap into remote code execution. Every unauthenticated MCP server instance in your environment is an open door.

Establish a Centralized Catalog with Provenance Tracking

The practical response to OWASP’s tool-layer requirement is a centralized gateway that maintains an approved catalog with provenance records, enforces authentication, and produces an audit trail of every tool invocation. Obot MCP Gateway is built specifically for this governance pattern, converting an unmanageable collection of direct agent-to-server connections into a controlled, auditable perimeter.

Watch for Memory Poisoning, Not Just Malicious Skills

Deleting a flagged skill is not remediation. ClawHavoc demonstrated that malicious skills rewrote MEMORY.md before removal, injecting directives that survived the incident response. Add memory integrity checks to your detection runbook: look for unexpected modifications to persistent agent instruction stores, behavioral drift after skill removal, and any agent configuration that cannot be traced to an explicit, reviewed change.

The Window for Getting Ahead of This Is Closing

The ToxicSkills audit, the ClawHavoc campaign, the PoisonedSkills research, and OWASP’s formal framework all point to the same moment: the agent skills ecosystem has already attracted serious adversaries, and most organizations haven’t yet mapped the attack surface they’re defending.

Skills are dependencies. MCP servers are trust boundaries. Memory is a persistence layer that survives incident response if you’re not watching it. None of that changes based on how quickly the ecosystem matures.

The organizations that come through this period well will be the ones that treated MCP security as an architectural decision rather than a patch-later problem. A centralized gateway like Obot MCP Gateway converts an unmanageable sprawl of direct agent connections into an auditable, governed catalog. Build that foundation before the next ClawHavoc, not after.

Get Started with Obot

Try Obot on GitHub · Get a demo · Read the docs

The New Supply Chain Frontier: Securing MCP Security and Agent Skills