LLM Ghostbusters: Surgical Hallucination Suppression via Adaptive Unlearning
Source: arXiv cs.CR
Published: Tue, 05 May 2026 00:00:00 -0400
Summary
arXiv:2605.01047v1 Announce Type: new
Abstract: Hallucinations, outputs that sound plausible but are factually incorrect, remain an open challenge for deployed LLMs. In code generation, models frequently hallucinate non-existent software packages, recommending imports and installation commands for fictional libraries. This creates a critical supply-chain vulnerability: an attacker can proactively register such packages on public registries with malicious payloads that are subsequently installed and executed by developers or autonomous agents, a class of package confusion attack known as slopsquatting. Once a model is deployed, mitigating this failure mode is difficult: full retraining is costly, and existing approaches either cause severe degradation of model utility or rely on a pre-specified forget-set, an assumption that does not apply to the unbounded space of hallucinations. To address this problem, we present Adaptive Unlearning (AU), a post-deployment framework that surgically suppresses hallucinations while preserving general model utility. AU introduces a hybrid token-level objective that simultaneously reinforces valid outputs and suppresses hallucinated ones. Combined w
Sources
Lyrie Verdict
Lyrie's autonomous defense layer flags this class of exposure the moment it surfaces — no signature update required.