CVE-2026-31431: Linux CopyFail LPE — Real-Time Autonomous Patching Across 500+ Production Servers
Author: Lyrie Threat Intelligence Team
Date: 2026-05-13
Reading time: 9 min
TL;DR
On April 29, 2026, an obscure commit landed in linux-next titled copy_from_user_iter: fix iov_iter_init() length on partial copy failure. To a kernel reviewer it looked like a 12-line bounds-check cleanup. To Lyrie's static-analysis agent it looked like the worst kind of LPE primitive: a copy_*_user path whose failure mode left an iov_iter referencing kernel memory the caller could still write through.
We assigned it CVE-2026-31431 internally before MITRE did, validated it against a live exploit corpus in under ninety seconds, generated a backport for every kernel branch we monitor, shadow-tested the patch against 500+ production servers, and shipped it before any public PoC existed. Total wall-clock time from linux-next commit to fleet-wide remediation: 6 hours 41 minutes.
This article walks through the bug, why CVSS 7.8 was wrong, how the Agent Threat Protocol (ATP) actually validates kernel patches in production, and the broader lesson for AI-era vulnerability management.
The Bug Itself
The affected code lives in lib/iov_iter.c, in the slow path of copy_from_user_iter(). When userspace passes a multi-segment struct iovec and one of the segments faults mid-copy, the kernel needs to roll the iterator back to the byte just before the fault so the caller knows exactly what to retry.
The vulnerable code did this:
size_t copy_from_user_iter(void __user *src, void *dst, size_t bytes,
                           struct iov_iter *i)
{
    size_t copied = raw_copy_from_user(dst, src, bytes);

    if (unlikely(copied < bytes)) {
        /* partial copy: advance iterator only by what worked */
        iov_iter_advance(i, bytes - copied); /* BUG: should be `copied` */
        return bytes - copied;
    }
    iov_iter_advance(i, bytes);
    return 0;
}
The iov_iter_advance call passes the wrong length. On a partial copy the iterator is advanced by the uncopied count (bytes - copied) instead of the copied count, leaving its internal iov_offset pointing past the end of the legitimate segment. Subsequent writes through the same iterator land in whatever lives in the next page, typically the adjacent kernel slab.
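The difference between the two advance lengths is easy to see in a user-space toy model. The sketch below is only an illustration: `toy_iter`, `toy_copy_buggy`, `toy_copy_fixed`, and `toy_drift` are hypothetical names, the struct is not the kernel's `struct iov_iter`, and nothing here reproduces kernel memory layout or an out-of-bounds write. It only shows how far the iterator position drifts from the data that was actually copied.

```c
#include <stddef.h>

/* Toy stand-in for an iterator position; NOT the kernel's struct iov_iter. */
struct toy_iter {
    size_t offset; /* models iov_offset within the current segment */
};

static void toy_advance(struct toy_iter *it, size_t n)
{
    it->offset += n;
}

/* Pre-patch logic as quoted above: advances by the uncopied count. */
static size_t toy_copy_buggy(struct toy_iter *it, size_t bytes, size_t copied)
{
    if (copied < bytes) {
        toy_advance(it, bytes - copied);
        return bytes - copied;
    }
    toy_advance(it, bytes);
    return 0;
}

/* Patched logic: advances by the bytes that were actually copied. */
static size_t toy_copy_fixed(struct toy_iter *it, size_t bytes, size_t copied)
{
    if (copied < bytes) {
        toy_advance(it, copied);
        return bytes - copied;
    }
    toy_advance(it, bytes);
    return 0;
}

/* Signed gap between the iterator position and the bytes really copied. */
static long toy_drift(const struct toy_iter *it, size_t copied)
{
    return (long)it->offset - (long)copied;
}
```

For a 4096-byte request that faults after 100 bytes, the pre-patch path leaves the toy offset at 3996 (a drift of 3896 bytes), while the patched path leaves it at 100 (no drift).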
The primitive is a constrained-write into a slab cache the attacker can shape through sendmsg(2) and recvmsg(2). From there the path to LPE is a textbook USMA — page-table corruption via a poisoned pipe_buffer → arbitrary read/write → credential structure rewrite. End-to-end exploitation on Ubuntu 26.04 with default kernel.unprivileged_userns_clone=1 is sub-300ms.
Why CVSS 7.8 Is Wrong
MITRE will eventually score this 7.8 (High, AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H). That's the same score as half a dozen LPEs that ship every month.
This one is not those bugs. Three reasons:
1. Reachability is universal. Every Linux box since 5.13 with CONFIG_MULTIUSER=y exposes this code path. Containers don't help — the bug fires below the namespace boundary.
2. Exploit complexity is AC:L and the public exploit corpus already contains the spray gadget. We pulled three near-identical USMA primitives out of GitHub PoCs from late 2025.
3. Detonation telemetry is invisible. A successful exploit leaves no syscall pattern distinguishable from healthy sendmsg traffic. EDR sensors do not see it.
Lyrie's internal scoring model — which weights reachability, exploit availability, and detection cost — assigned 9.6. Two days after we patched, GreyNoise reported the first scanning spikes consistent with mass exploitation attempts.
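For reference, the 7.8 figure follows mechanically from the vector quoted above. The sketch below applies the CVSS v3.1 base-score equations for the Scope: Unchanged case, using the metric weights from the public specification. The function and macro names are illustrative, and this says nothing about how Lyrie's internal 9.6 score is computed, since the article does not give those weights.

```c
/* CVSS v3.1 metric weights for AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H. */
#define AV_LOCAL  0.55
#define AC_LOW    0.77
#define PR_LOW_U  0.62 /* PR:L when scope is unchanged */
#define UI_NONE   0.85
#define CIA_HIGH  0.56

/* Roundup from the v3.1 spec: smallest one-decimal value >= x,
 * computed on integers to avoid floating-point drift. */
static double cvss_roundup(double x)
{
    long v = (long)(x * 100000.0 + 0.5);

    if (v % 10000 == 0)
        return (double)v / 100000.0;
    return (double)(v / 10000 + 1) / 10.0;
}

/* Base score for the Scope: Unchanged case only. */
static double cvss31_base_unchanged(double av, double ac, double pr, double ui,
                                    double c, double i, double a)
{
    double iss = 1.0 - (1.0 - c) * (1.0 - i) * (1.0 - a);
    double impact = 6.42 * iss;
    double exploitability = 8.22 * av * ac * pr * ui;
    double total;

    if (impact <= 0.0)
        return 0.0;
    total = impact + exploitability;
    if (total > 10.0)
        total = 10.0;
    return cvss_roundup(total);
}
```

Calling `cvss31_base_unchanged(AV_LOCAL, AC_LOW, PR_LOW_U, UI_NONE, CIA_HIGH, CIA_HIGH, CIA_HIGH)` gives an impact of about 5.87 and an exploitability of about 1.83, which rounds up to 7.8. None of the inputs capture reachability, exploit availability, or detection cost, which is the gap this section describes.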
How Lyrie Detected It in 1.7 Seconds
The Agent Threat Protocol pipeline that surfaced this bug runs in three stages:
Stage 1 — Source ingestion (continuous). Every public Linux mailing list, every linux-next push, and every CNA RSS feed pipes into a normalized commit stream. Average ingestion latency: 340ms from git push to in-cluster Kafka.
Stage 2 — Reasoning agent triage. A specialized Claude-derived model with kernel-context fine-tuning reads each commit message + diff and emits a structured vuln_intent_report. For this commit the agent's chain-of-thought is preserved in our audit log:
"Commit modifies copy_from_user_iter. Diff swaps bytes - copied for copied in iter-advance. Prior code advances iterator by the failed-byte count, which leaves iov_offset past the segment boundary on partial fault. Reaching this branch requires a userspace fault during multi-segment IO — trivially induced via mprotect race. Resulting iter state allows out-of-bounds write through next iovec slot. Classification: LPE primitive, high reachability, no syscall signature. Score: 9.6."
Latency for this stage on the day in question: 1.4 seconds.
Stage 3 — Validator. The intent report is passed to a sandbox cluster that compiles the pre-patch kernel, runs our adversarial corpus against the suspected primitive, and confirms exploitability. Validator returned EXPLOITABLE=true in 87 seconds.
Total first-look-to-validated: under two minutes.
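The headline latencies can be checked by summing the per-stage figures quoted above. This is a small arithmetic sketch. The `stage` struct and `sum_ms` helper are hypothetical names, and the only data used are the three numbers stated in this section.

```c
#include <stddef.h>

struct stage {
    const char *name;
    long ms; /* latency in milliseconds */
};

/* Per-stage figures quoted in the article. */
static const struct stage STAGES[] = {
    { "ingestion", 340 },   /* Stage 1: git push to Kafka */
    { "triage",    1400 },  /* Stage 2: reasoning agent */
    { "validator", 87000 }, /* Stage 3: sandbox validation */
};

/* Sum the first n stage latencies. */
static long sum_ms(const struct stage *s, size_t n)
{
    long total = 0;
    size_t k;

    for (k = 0; k < n; k++)
        total += s[k].ms;
    return total;
}
```

`sum_ms(STAGES, 2)` is 1740 ms, which matches the 1.7-second detection figure, and `sum_ms(STAGES, 3)` is 88740 ms, which is under the two-minute bound.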
The Autonomous Patching Pipeline
Detection without remediation is theater. The interesting part of this incident is what Lyrie did between minute 2 and minute 401.
Patch generation
The ATP patcher took the upstream fix, generated backports for the 7 kernel branches in our monitored fleet (5.15.x, 6.1.x, 6.6.x, 6.10.x, 6.12.x, 6.14.x LTS, plus Oracle UEK 6), and produced a clean apply against each. Backporting was not a verbatim cherry-pick: three branches had touched the surrounding code, so the agent had to resolve conflicts itself. Each backport was rebuilt and boot-tested in a KVM harness before being promoted.
Shadow-mode deployment
This is the step traditional security tools skip and the step that matters. Every patched kernel was loaded into a paired "shadow" VM that mirrors production syscall traffic from a real customer host. Shadow VMs ran for 38 minutes against live workloads. Two of the seven backports triggered regressions:
- 5.15.x backport caused a 4% throughput regression on a hot splice() path. Rejected, regenerated, retested.
- 6.6.x backport conflicted with a downstream Cilium patch that touched the same function. Conflict resolved, retested.
No customer-facing host ran a patched kernel until its shadow twin had cleared.
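The promotion rule described above (no production rollout until the shadow twin clears) can be sketched as a simple throughput gate. Everything here is an assumption made for illustration: the article does not state the real tolerance, so `MAX_REGRESSION_PCT` is a hypothetical 2%, and `regression_pct` and `shadow_gate` are invented names.

```c
/* Hypothetical tolerance; the article does not state the real threshold. */
#define MAX_REGRESSION_PCT 2.0

enum shadow_verdict { SHADOW_PASS, SHADOW_REJECT };

/* Throughput lost by the patched kernel, as a percentage of baseline.
 * Negative values mean the patched kernel was faster. */
static double regression_pct(double baseline_ops, double patched_ops)
{
    return (baseline_ops - patched_ops) / baseline_ops * 100.0;
}

/* Reject when there is no usable baseline or the loss exceeds tolerance. */
static enum shadow_verdict shadow_gate(double baseline_ops, double patched_ops)
{
    if (baseline_ops <= 0.0)
        return SHADOW_REJECT;
    if (regression_pct(baseline_ops, patched_ops) > MAX_REGRESSION_PCT)
        return SHADOW_REJECT;
    return SHADOW_PASS;
}
```

Under this assumed threshold, a drop from 1000 to 960 operations per second (the 4% splice() regression reported for 5.15.x) is rejected, while a drop to 995 passes. A real gate would also compare correctness and error rates, not only throughput.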
Rolling deployment
The final stage was a customer-by-customer canary using their existing maintenance windows. The autonomous agent has explicit write authority for kernel updates only within pre-approved change windows; everything outside that window is queued for human review. On 2026-04-29 we landed inside the window for 502 of 511 hosts. The remaining 9 — split across two financial-services customers with stricter change-control — got a queued change-ticket and a Slack message to their on-call.
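The authority boundary described above (deploy inside a pre-approved window, queue everything else for a human) reduces to a small decision function. This is a minimal sketch under stated assumptions: windows are modeled as minutes since local midnight with an exclusive end, and `change_window`, `in_window`, and `decide` are hypothetical names, not part of ATP.

```c
enum action { ACTION_DEPLOY, ACTION_QUEUE_FOR_REVIEW };

/* Pre-approved maintenance window in minutes since local midnight.
 * start is inclusive, end is exclusive. */
struct change_window {
    int start_min;
    int end_min;
};

/* Returns 1 when now_min falls inside the window, including windows
 * that wrap past midnight (for example 23:00 to 04:00). */
static int in_window(const struct change_window *w, int now_min)
{
    if (w->start_min <= w->end_min)
        return now_min >= w->start_min && now_min < w->end_min;
    return now_min >= w->start_min || now_min < w->end_min;
}

/* The agent may write only inside the window; otherwise it queues. */
static enum action decide(const struct change_window *w, int now_min)
{
    return in_window(w, now_min) ? ACTION_DEPLOY : ACTION_QUEUE_FOR_REVIEW;
}
```

With a 23:00 to 04:00 window, a request at 03:14 (minute 194) deploys and one at 12:00 (minute 720) is queued. The default path is the queue, so a misconfigured or empty window never grants write authority.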
Customer Outcome
No customer was hit by this bug in the wild. We were watching for it: the validator's exploit corpus is also our IDS signature set, so any host that started showing the slab-spray pattern after April 29 would have been quarantined. Zero hosts triggered. The first observed in-the-wild exploit landed against unrelated infrastructure on May 3, four days after our fleet was clean.
Numbers from the run:
- Hosts patched: 502
- Hosts patched within 6 hours 41 minutes of upstream commit: 502
- False positives in validator stage: 0
- Human-review interventions required: 2 (both for shadow-mode regressions, both resolved by the agent within 22 minutes)
- Customer-reported issues post-patch: 0
What This Means for Enterprise Security
Three takeaways that generalize.
1. CVSS is a lagging indicator and a misleading one. A scoring system designed for human triage in a world of weekly vulnerability bulletins cannot keep up with an exploit economy that publishes PoCs in hours. The right scoring axis is time-to-mass-exploitation, and that requires reachability + exploit-corpus correlation that CVSS doesn't measure.
2. Detection latency is no longer the bottleneck — patch latency is. Lyrie's detection ran in 1.7 seconds. The 6 hours and 41 minutes between detection and clean fleet was almost entirely shadow-mode validation and conservative canary deployment. That's the right place to spend time, but it's the place enterprises today spend weeks because they don't have a shadow-test pipeline at all.
3. Autonomous patching requires an audit trail or it's reckless. Every action the ATP patcher took on this bug (the diff it generated, the regressions it detected, the canary outcomes) is preserved in a tamper-evident log authenticated by our XChaCha20-Poly1305 ATP transport. When a customer's compliance officer asks "who authorized this kernel update at 03:14 local time," we have a literal answer: model fingerprint, prompt, output, validator result, deployment receipt.
What's Next
Lyrie's roadmap for this capability through Q3 2026:
- Expanding kernel coverage to FreeBSD-CURRENT and OpenBSD-CURRENT mailing lists.
- Public ATP signature feed so other defenders can use our exploit-corpus matchers even if they don't run our patcher.
- Open-source release of the shadow-mode validator harness (Q3 2026, under MIT).
If you operate a Linux fleet larger than a hundred boxes and you're patching kernels by hand, the math is no longer in your favor. Reach the team at [email protected].
_Published by Lyrie.ai · lyrie.ai/research · Guy Sheetrit, CEO_