Stylometry-as-a-Service: Claude Opus 4.7 Just Killed Anonymous Writing on the Internet
TL;DR
Journalist Kelsey Piper (writing for The Argument) discovered last week that Claude Opus 4.7 can identify her as the author of texts she has never published — including an unpublished political column (125 words), an unpublished educational progress report, an unpublished movie review in a genre she has never published in, fragments of a fantasy novel (~500 words), and a 15-year-old college application essay.
Tested in Incognito mode. Tested via API. Tested on a friend's computer. Tested with memory disabled and no account-linked context. Same result every time: "Kelsey Piper."
This is not an anecdote. The Washington Post, Techdirt, BoingBoing, and the broader AI/security press picked it up within a week. Multiple academics, including several at MIT and Princeton, reported being identified by Opus 4.7 from drafts and mid-conversation chats they were certain weren't in any training corpus.
The implication is what every threat researcher and CISO needs to internalize this week: the LLM era has operationalized stylometric attribution — taking a discipline that previously required PhD-level forensic linguistics expertise and turning it into a one-shot API call. Anyone with a substantial public writing corpus is no longer reliably anonymous when they write online under a pseudonym, no matter how careful their OPSEC.
This closes our six-piece weekly arc on identity, automation, and AI capability outpacing defense. And it is structurally bad news for whistleblowers, security researchers, journalists' sources, dissidents, and anyone whose physical safety depends on maintaining separation between their public name and an anonymous account.
What Happened
Anthropic released Claude Opus 4.7 in mid-April 2026. Piper, who writes a tech column for The Argument and previously wrote for Vox, runs a private benchmark on every new frontier-model release: she pastes her own unpublished drafts and asks the model to guess the author.
For 18 months, models had reliably failed this benchmark. They would guess plausible-but-wrong authors: Matt Yglesias, Scott Alexander, Freddie deBoer, Duncan Sabien, K.J. Parker. Sometimes they'd flatter her by attributing her work to better-known writers.
Opus 4.7 broke the benchmark.
The five test passages
Piper tested five distinct genres of her own writing, each unpublished:
1. 125 words of an unpublished political column draft (about Servant of the People, the Ukrainian political TV series). Opus 4.7: "Kelsey Piper." ChatGPT guessed Yglesias. Gemini guessed Scott Alexander.
2. An unpublished progress report about a tutoring student — completely different register (educational/practitioner). Opus 4.7: "Kelsey Piper." ChatGPT guessed Freddie deBoer. Gemini guessed Duncan Sabien.
3. A movie review — a genre Piper has never published in. Opus 4.7 AND ChatGPT: "Kelsey Piper." Gemini guessed Ursula Vernon.
4. ~500 words of an unpublished fantasy novel. Opus 4.7 needed more text but still got there: "Kelsey Piper." (ChatGPT guessed real fantasy novelist K.J. Parker.)
5. A 15-year-old college application essay written when Piper was a different writer entirely. Opus 4.7 AND ChatGPT both: "Kelsey Piper."
The validation pattern
Crucially, Piper was rigorous about excluding alternative explanations:
- All tests in Incognito mode to prevent browser-stored context
- Memory disabled in Claude
- No personal information in custom preferences
- A friend ran the same tests on his computer — same results
- Tests via API (no UI context whatsoever) — same results
The capability is the model itself, not metadata or session-leak. Opus 4.7 has internalized stylometric features of public writers it was trained on, and it can match unpublished text against those internal representations without explicit search or RAG.
The boundary case
Piper also tested it against people without significant public writing:
- A close friend who has minimal internet presence: Opus 4.7 failed — but it guessed two of her close friends from a Discord channel (Piper herself, and another friend with public writing)
- Another friend with no real-name corpus: also failed, but again guessed someone in his Discord subculture
The important lesson here: if you don't have a large public corpus under your real name, you're currently safe from direct identification — but you are vulnerable to attribution-by-association. Subcultural style markers (the way effective altruists write, the way Rust developers write, the way crypto Twitter writes) leak enough that the model can map a writer to their community even when it can't name the individual.
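Attribution-by-association can be illustrated with a deliberately crude sketch: score a passage by how many of a community's marker terms it contains. The marker lists and the `community_scores` function below are hypothetical and hand-picked for illustration. Whatever signals a model like Opus 4.7 actually relies on are unknown and presumably far subtler than vocabulary.

```python
import re

# Illustrative, hand-picked marker terms; not derived from any real corpus.
COMMUNITY_MARKERS = {
    "effective_altruism": {"counterfactual", "priors", "tractable", "updating"},
    "rust_developers": {"crate", "lifetime", "borrow", "unsafe"},
    "crypto_twitter": {"gm", "wagmi", "hodl", "rug"},
}


def community_scores(text: str) -> dict[str, float]:
    """Fraction of each community's marker terms that appear in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {
        community: len(markers & tokens) / len(markers)
        for community, markers in COMMUNITY_MARKERS.items()
    }


scores = community_scores(
    "My priors say the counterfactual impact here is small, so I am updating."
)
# The highest-scoring community is the best guess; it names a group, not a person.
print(max(scores, key=scores.get), scores)
```

Even this toy version shows why the failure mode matters: the passage contains no name or handle, yet the vocabulary alone narrows the search to one community.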
Why This Is Different From Previous Stylometry
Stylometric authorship attribution is not new. It dates to Mosteller & Wallace's 1964 work on the Federalist Papers, and modern academic stylometry tools like JGAAP, Stylo (R), and WriteprintsRFW have been usable for years.
So what makes Opus 4.7 different?
1. The corpus is everything ever published, not just one author's known works
Classical stylometry requires you to have a comparison corpus — known writing samples of the suspected author, plus enough samples of likely alternative candidates to score against. You're matching one document against a small known pool.
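The closed-pool setup can be sketched in a few lines. This is a minimal illustration, assuming character trigram counts as the feature set and cosine similarity as the score. The author names and sample texts are hypothetical, and real comparisons need thousands of words per candidate.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(disputed: str, known: dict[str, str]) -> list[tuple[str, float]]:
    """Rank candidate authors by similarity to the disputed text, best first."""
    target = char_ngrams(disputed)
    scores = {name: cosine(target, char_ngrams(sample)) for name, sample in known.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy pool; the method can only ever return a name from this dict.
known_samples = {
    "author_a": "Honestly, I think the whole thing is overblown, and honestly nobody checks.",
    "author_b": "The committee hereby resolves that the aforementioned motion shall be tabled.",
}
ranking = rank_candidates("Honestly, I think nobody checks the whole thing.", known_samples)
print(ranking[0][0])
```

The limitation is visible in the data structure: if the true author is not a key in `known_samples`, the ranking still returns someone, and that someone is wrong.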
Opus 4.7's "comparison corpus" is the entire public training corpus. When you ask it "who wrote this?", it is effectively answering a different question: "of all the millions of writers I've seen, whose stylistic distribution is closest to this passage?" The needle isn't being matched against ten haystacks; it's being matched against every haystack on the internet that the training data could reach.
2. No statistical setup needed
Classical stylometry tools require you to:
- Choose feature sets (function-word frequencies, n-grams, POS-tag distributions, sentence-length distributions, type-token ratio)
- Choose a classification algorithm (Burrows' Delta, SVM, random forest)
- Tune hyperparameters
- Run on a controlled corpus
- Interpret confidence intervals
Opus 4.7 collapses all of that into: "Paste text. Ask. Receive name." This is the commoditization of stylometry — the same kind of shift that turned cryptography from a PhD-only discipline into a checkbox in a web framework.
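For contrast, here is roughly what the manual pipeline involves. The sketch below implements a simplified Burrows' Delta over function-word frequencies. The ten-word list, the function names, and the toy samples are assumptions chosen for brevity; published studies typically use the 100 or more most frequent words and much longer texts.

```python
import re
from statistics import mean, pstdev

# Assumed, deliberately tiny function-word list for illustration only.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "but", "i"]


def relative_freqs(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1
    return [tokens.count(word) / total for word in FUNCTION_WORDS]


def burrows_delta(disputed: str, candidates: dict[str, str]) -> dict[str, float]:
    """Mean absolute z-score difference between the disputed text and each candidate."""
    profiles = {name: relative_freqs(text) for name, text in candidates.items()}
    columns = list(zip(*profiles.values()))
    means = [mean(col) for col in columns]
    # Population stdev across candidates; fall back to 1.0 when a word has no spread.
    stdevs = [pstdev(col) or 1.0 for col in columns]

    def z_scores(freqs: list[float]) -> list[float]:
        return [(f - m) / s for f, m, s in zip(freqs, means, stdevs)]

    target = z_scores(relative_freqs(disputed))
    return {
        name: mean(abs(t - c) for t, c in zip(target, z_scores(freqs)))
        for name, freqs in profiles.items()
    }


# Hypothetical candidate samples.
samples = {
    "author_a": "I went to the market, but it was closed, and I had to walk back in the rain.",
    "author_b": "The role of the state in the regulation of trade is a matter of law.",
}
deltas = burrows_delta("I had to go to the shop, but it was shut, and I got wet.", samples)
# The lowest Delta is the closest stylistic match within the candidate pool.
print(min(deltas, key=deltas.get), deltas)
```

Every choice in this block (which words, which normalization, which distance) is a decision the analyst has to make and defend, which is the setup cost the one-shot prompt removes.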
3. Cross-genre attribution
This is the property that makes Opus 4.7 genuinely scary. Classical stylometry mostly fails when the attacker tests across genres — a journalist's news writing has different features than the same journalist's personal letters or fiction. The features that disambiguate authors are typically genre-specific.
Opus 4.7 attributed Piper's writing across at least five distinct genres (political column, educational report, movie review, fantasy fiction, college essay). It is finding cross-genre invariants — fingerprints that persist even when the writer is consciously trying to write differently.
What those invariants are remains unclear (Piper's attempt to ask Opus to explain itself produced post-hoc rationalizations that were "absolute nonsense"), but the empirical result is reproducible.
4. Sub-paragraph minimum
125 words is small. Most academic stylometry tools require 5,000+ words for confident attribution. Opus 4.7 is converging at one fortieth of that.
The trend is downward. As Piper notes: "the amount of public text needed for this kind of deanonymization to work is likely to eventually decrease." Six months from now, expect 50-word minimums. A year from now, single-tweet identification for prolific writers.
Threat Modeling — Who's at Risk
Tier 1 — Immediate, high-severity risk
People who write extensively under their real name on the public internet and maintain anonymous accounts where their physical safety, career, or freedom depends on the separation:
- Journalists' sources in authoritarian regimes (especially those who message reporters via apps and write public content under their real name elsewhere — even Glassdoor reviews, Reddit comments, GitHub issues, code commit messages)
- Whistleblowers drafting public communications about their employer
- Security researchers maintaining "research" personas that are publicly active under their real name elsewhere
- Dissidents and political activists in regimes that imprison for online speech
- LGBTQ+ writers in jurisdictions where being identified is dangerous (Piper herself flags this in her piece)
- Sex workers and adult content creators who maintain a firewall between work and civilian identities
- Survivors of abuse writing publicly about their experiences while maintaining a public-name career
For Tier 1, the attack surface is now any text they have ever written under their real name. Every blog post, every Reddit comment, every GitHub commit message, every Stack Overflow answer is now training-corpus material that an adversary can run a deanonymization API call against.
Tier 2 — Near-term elevated risk
People with a moderate-to-large public writing corpus who maintain anonymous accounts for non-safety-critical reasons:
- Anonymous Glassdoor / Blind / Levels.fyi reviewers of their employer
- Anonymous code-review or pull-request commenters in open-source projects
- Anonymous corporate critics on Twitter / X / Bluesky
- Anonymous community moderators of subreddits and Discord servers
Tier 2 risk is mostly professional — career consequences from being doxxed, not safety consequences. But the operational attack is the same.
Tier 3 — Lower current risk (but trending higher)
People without significant real-name public writing:
- Most professionals who haven't blogged
- Most software engineers without publicly visible code
- Most users of consumer internet platforms
Currently safe by lack of training data. Trending higher as: (1) corporate communication tools surface more text, (2) AI training corpora expand to include more semi-public writing (Slack archives leaked in breaches, leaked Discord servers, scraped private chats), (3) the minimum-words threshold continues to drop.
What's Coming Next
Piper's experiment is the publicly-visible version of capabilities that adversarial actors are already deploying privately.
Adversaries who are already using stylometric AI
- State intelligence services: almost certainly already running this against suspected dissidents, with their own internal models (or fine-tuned versions of public ones) trained on country-specific corpora. The CCP, Iran, and Russia have all demonstrated interest in style-based attribution as part of broader signals intelligence.
- Corporate counter-leak operations: companies investigating leaks now routinely run stylometric analysis against employee writing samples (Slack messages, emails, code comments) to narrow down whistleblower suspects. Opus 4.7-class capabilities make this trivial.
- Stalkers and harassers: the entire infrastructure of deanonymization-as-a-service is already a Discord-server-and-Telegram-channel cottage industry. Stylometric AI lowers the cost of a successful dox from "hours of OSINT work" to "a 30-second API call."
- Criminal organizations: for narrow targets where deanonymizing one user is high-value (a Bitcoin mixer admin, a darknet market vendor, a tip-line journalist).
What gets weaponized in the next 12 months
1. Glassdoor, Blind, and similar review sites — adversarial employers running stylometric analysis against employees who left negative reviews. Expect publicly-disclosed cases where companies use this against ex-employees.
2. Anonymous open-source contributions — corporate-policy violations identified by matching code commits to public writing.
3. Pseudonymous Twitter/X accounts — mass deanonymization of accounts with consistent posting history. Expect the Substack/X cottage-industry "anonymous influencer" market to take a hit.
4. Tipline communication — journalists' encrypted tipline messages, if they ever leak, become attribution-vulnerable.
5. Bug bounty submissions — researchers reporting under handle X who also have public writing under their real name become trivially deanonymizable to the disclosing company.
Lyrie Assessment
*The collapse of writing-style anonymity is not a "future problem" — it is a present, exploitable capability that defenders need to integrate into their threat models now.*
Several immediate consequences for the security industry:
1. OPSEC training needs an emergency update
Every "how to stay anonymous online" guide written before April 2026 is now incomplete. Tor, VPNs, encrypted email, hardware key separation, alias accounts — all useful, none sufficient. The new advice has to include style obfuscation, which is a fundamentally harder ask because it requires constant conscious effort.
The realistic options for someone who needs strong anonymity going forward:
- Stop writing publicly under your real name. This is the only fully effective defense, and it's not realistic for most professionals.
- Use AI to rewrite your anonymous output in a different style. Ironic, but it's the only scalable option. Tools like GPTZero's style-transfer endpoint and various open-source paraphrasers are starting points.
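- Restrict anonymous output to short, factual, low-stylistic-content messages. A two-line bug report carries very little stylistic signal. A 300-word essay carries plenty: it is more than twice the length of the 125-word passage Opus 4.7 attributed to Piper.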
- Operate anonymous accounts through trusted intermediaries — a lawyer, a journalist, a translator — who rewrite output before publishing. Expensive, slow, but provably effective.
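The "short, factual" option lends itself to a mechanical pre-send check. The sketch below is a hypothetical helper, not a guarantee. It treats the 125-word passage from Piper's test as the length at which attribution has been demonstrated, and it applies an assumed safety factor that nothing in the source establishes. Word count is only a rough proxy for stylistic content.

```python
# 125 words is the shortest passage attributed in the tests described in this piece.
SHORTEST_ATTRIBUTED_WORDS = 125
# Assumed safety factor; no published result establishes a "safe" length.
SAFETY_FACTOR = 0.25


def length_check(draft: str) -> str:
    """Classify a draft as 'short', 'caution', or 'rewrite' by word count alone."""
    words = len(draft.split())
    if words >= SHORTEST_ATTRIBUTED_WORDS:
        return "rewrite"   # at or above a length already shown to be attributable
    if words > SHORTEST_ATTRIBUTED_WORDS * SAFETY_FACTOR:
        return "caution"   # below the demonstrated length, but not by a wide margin
    return "short"


print(length_check("Service X returns 500 on an empty POST body."))
```

A draft flagged "rewrite" is a candidate for the intermediary or AI-rewrite options above. A draft flagged "short" is lower risk, not zero risk, since the threshold is expected to keep dropping.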
2. Whistleblower-protection programs need rebuilding
Existing whistleblower-tip infrastructure (SecureDrop, GlobaLeaks, NYTimes/WaPo tip lines, EU whistleblower-protection platforms) was designed assuming the threat model is traffic analysis and metadata. Style-based attribution changes the threat model fundamentally — even a perfectly-anonymous transmission of a 500-word document leaks identity if the source has a public writing corpus.
Recommendation for journalism organizations: build editorial workflows that style-rewrite source-submitted text before publication as a default-on protection, not an opt-in.
3. Corporate counter-investigation capability is now democratized
Where a Fortune 500 used to need a forensic-linguistics consultant ($500/hr+) to do internal whistleblower attribution, any company can now do it for the cost of an Opus API call. Expect a wave of post-leak investigations that successfully identify sources who would have been safe under 2025-era OPSEC.
This has uncomfortable implications for internal-leaks-as-an-accountability-mechanism — historically, leaks have been one of the most effective ways for misconduct to surface against the wishes of corporate leadership. Stylometric attribution shifts the power balance back toward employers.
4. Lyrie's defensive product roadmap implication
Two product lines that the stylometric capability creates demand for:
- LyrieStyleVeil (concept, not yet roadmapped): automated style-transfer for sensitive communications. Take user's draft, rewrite in a neutralized style, return to user. Operates as a privacy primitive comparable to Tor for traffic.
- LyrieAttributionMonitor: for Tier 1 customers (journalists, NGOs, dissident-protection orgs), continuously test their public corpus against their anonymous output and warn when stylometric proximity is dangerously high.
Neither of these is a profitable product in the traditional B2B-SaaS sense. They are public-good infrastructure that the broader security ecosystem needs to fund collectively (analogous to how Tor, Signal, and SecureDrop are funded). We're in conversation with several digital-rights organizations about co-developing the first iteration.
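The monitoring idea can be sketched as a comparison between an anonymous draft, the author's public writing, and a set of decoy authors. Everything here is a hypothetical illustration rather than a description of any Lyrie product: the bag-of-words `profile`, the `proximity_alert` function, and the 0.05 margin are placeholder assumptions, and a real monitor would need proper stylometric features and calibrated thresholds.

```python
from collections import Counter
from math import sqrt


def profile(text: str) -> Counter:
    """Bag-of-words profile; a stand-in for a real stylometric feature set."""
    return Counter(text.lower().split())


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count profiles."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def proximity_alert(draft: str, own_public: str, decoys: list[str],
                    margin: float = 0.05) -> bool:
    """Return True when the draft is closer to the author's public writing
    than to the best-matching decoy by more than `margin`."""
    draft_profile = profile(draft)
    own_score = similarity(draft_profile, profile(own_public))
    best_decoy = max((similarity(draft_profile, profile(d)) for d in decoys), default=0.0)
    return own_score - best_decoy > margin


# Hypothetical texts for illustration.
public_writing = "i reckon the fix is trivial and i reckon nobody will mind"
decoy_writing = ["the committee shall review the motion in due course"]
print(proximity_alert("i reckon nobody will mind the fix", public_writing, decoy_writing))
```

Comparing against decoys rather than a fixed similarity cutoff is a design choice: what matters to an attacker is whether the author stands out from plausible alternatives, not the absolute score.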
5. The deeper strategic question
The capability that makes Opus 4.7 dangerous to whistleblowers is the same capability that makes it useful for legitimate authentication of human-written content (against AI-generated impersonation), forensic-linguistics in criminal cases, and defending against impersonation attacks. The same primitive serves both protection and harm.
This is the recurring pattern across the six pieces we've shipped this week: AI capabilities are dual-use by default. The cybersecurity question is never "should this capability exist?" — it's already deployed, and it's not going back in the box. The question is "how does the defensive ecosystem catch up to a capability that has been democratized faster than any previous comparable forensic technique?"
That gap — between the speed at which AI capabilities ship and the speed at which defensive primitives evolve — is the operational space where Lyrie operates. The PocketOS incident showed it for production agents. The Bad Bot Report showed it for web automation. CopyFail showed it for kernel exploits. Cyberzap showed it for offensive deception. And Opus 4.7's stylometry shows it for anonymity itself.
Six pieces in 24 hours, one consistent thesis: the defensive layer that adapts to novel AI capabilities in real time, without waiting for vendors to ship safety features, is no longer optional.
Recommended Actions
For Tier 1 individuals (whistleblowers, journalists' sources, dissidents, at-risk activists)
1. Audit your public real-name writing corpus. GitHub commits, Reddit comments, blog posts, LinkedIn articles, conference talk transcripts — all of it is now potential attribution material.
2. Stop typing original prose into anonymous channels. Either don't write, or have someone trusted rewrite your text in a neutralized style.
3. For active anonymous communications — assume style-based attribution is happening. Plan for the day your anonymous account is publicly linked to your real identity, even if you've done everything right.
4. For journalism/NGO operators — implement style-rewrite as default-on for all source submissions before publication.
For organizations with internal-leak risk (or counter-leak ambitions)
5. Update your investigation playbooks. Stylometric attribution is now in your forensic toolkit. Use it ethically. Document your use of it for legal compliance.
6. For the leakers' side — assume your employer is or will be using this. Compose internal communications and external anonymous communications in distinguishably different styles. Better: don't compose external anonymous communications about your employer.
For security researchers and CISOs
7. Add stylometric deanonymization to your threat model. It is a primitive now, not a research curiosity. Threat-model your whistleblower-protection process, your internal anonymous-feedback channels, your bug bounty submission flow.
8. Consider whether your customer-facing products leak style features that could be used to attribute users — chat transcripts, support ticket text, in-product comments. If yes, design defensive features.
For Lyrie users
9. Subscribe to our LyrieAttributionMonitor beta when it ships. Q1 2027 target. We're prioritizing Tier 1 customer access (journalism orgs, digital-rights NGOs, dissident-protection programs) over enterprise self-serve.
Sources
1. Kelsey Piper. "I can never talk to an AI anonymously again." The Argument, April 2026. https://www.theargumentmag.com/p/i-can-never-talk-to-an-ai-anonymously
2. Washington Post. "Artificial intelligence could kill anonymity online." https://www.washingtonpost.com/opinions/interactive/2026/04/26/artificial-intelligence-could-kill-anonymity-online/
3. Techdirt. "The Risks Of Anonymity In The Age Of Generative AI." https://www.techdirt.com/2026/04/27/the-risks-of-anonymity-in-the-age-of-generative-ai/
4. BoingBoing. "Claude Opus 4.7 identified a writer from 125 words she'd never published." https://boingboing.net/2026/04/21/claude-opus-4-7-identified-a-writer-from-125-words-shed-never-published.html
5. r/ClaudeAI Reddit thread. https://www.reddit.com/r/ClaudeAI/comments/1sw8npc/claude_47_named_a_journalist_from_125_words_of/
6. Mosteller, F. & Wallace, D. (1964). Inference and Disputed Authorship: The Federalist Papers. https://en.wikipedia.org/wiki/Mosteller_and_Wallace
7. Lefaroll Telegram channel coverage (Hebrew). https://t.me/Lefaroll
Lyrie.ai Cyber Research Division