Anthropic Distillation Attacks: DeepSeek, Moonshot, MiniMax
Anthropic accuses DeepSeek, Moonshot AI, and MiniMax of industrial-scale distillation via 24,000 fake accounts and 16M+ Claude exchanges. Full analysis inside.
- 24,000 fraudulent accounts created
- 16M+ Claude exchanges recorded
- 20,000+ accounts per proxy cluster
- 3 Chinese AI labs implicated
Key Takeaways
On February 23, 2026, Anthropic published what may be the most detailed account of industrial-scale model theft in AI history. The company accused three Chinese AI labs — DeepSeek, Moonshot AI, and MiniMax — of conducting systematic distillation attacks against Claude using 24,000 fraudulent accounts and over 16 million exchanges. The operation extracted reasoning capabilities, tool-use patterns, and chain-of-thought processes that took Anthropic years and billions of dollars to develop.
The allegations land at a moment of maximum geopolitical tension over AI capabilities. The Trump administration is debating H200 chip exports to China, Anthropic CEO Dario Amodei testified before the House Homeland Security Committee on AI risks, and DeepSeek V4 just demonstrated capabilities that rival frontier Western models at a fraction of the cost. Whether you see this as legitimate intellectual property protection or competitive gatekeeping depends on where you stand — and both positions have merit.
What Anthropic Claims Happened
Anthropic's report details a sustained extraction campaign spanning months. The three accused companies allegedly created thousands of accounts through automated registration systems, then used those accounts to systematically query Claude with prompts designed to elicit its internal reasoning, tool-use capabilities, and safety-boundary behaviors.
Per-Company Breakdown
- ~150,000 exchanges recorded. Targeted advanced reasoning and chain-of-thought capabilities; the smallest volume, but the most precisely targeted queries.
- ~3.4 million exchanges recorded. A broader extraction campaign spanning tool use, coding, and multi-turn conversation patterns.
- ~13 million exchanges recorded. The largest volume by far, with wide-spectrum extraction covering general reasoning, safety boundaries, and conversational patterns.
The Hydra Cluster Architecture
The report describes what Anthropic calls “hydra clusters” — proxy architectures that managed 20,000+ accounts simultaneously. Each cluster rotated through accounts to distribute API traffic, mixed distillation queries with legitimate-seeming requests, and used geographic IP distribution to avoid triggering rate-limiting or abuse detection. The name reflects the multi-headed nature of the operation: cut off one account, and thousands more continue the extraction.
What Is Model Distillation and Why It Matters
Model distillation is a well-established technique in machine learning where a smaller “student” model learns to replicate the behavior of a larger “teacher” model. In its legitimate form, it is how companies deploy efficient models to edge devices, reduce inference costs, and create specialized models for narrow tasks. Google, Meta, and Anthropic itself use distillation internally.
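Mechanically, classic distillation trains the student to match the teacher's softened output distribution rather than hard labels. A minimal sketch of the standard temperature-scaled KL objective (names and numbers here are illustrative, not drawn from Anthropic's report):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softened probability distribution over output classes."""
    z = logits / temperature
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    A higher temperature exposes the teacher's relative probabilities
    on non-top answers -- the signal the student learns to imitate.
    """
    p = softmax(teacher_logits, temperature)  # teacher target
    q = softmax(student_logits, temperature)  # student prediction
    return float(np.sum(p * np.log(p / q)))

# A student that matches the teacher exactly incurs ~zero loss.
logits = np.array([2.0, 1.0, 0.1])
assert distillation_loss(logits, logits) < 1e-9

# A mismatched student incurs positive loss, driving it toward the teacher.
assert distillation_loss(logits, np.array([0.1, 1.0, 2.0])) > 0.1
```

The same principle is what makes API outputs valuable at scale: every response is a sample from the teacher's distribution, and enough samples let a student approximate it.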
Legitimate Use vs. Competitive Extraction
The line between legitimate distillation and what Anthropic describes comes down to consent and scale. A developer using Claude's API to build an application generates model outputs as a byproduct of legitimate use. A company creating 24,000 fake accounts to systematically generate training data is doing something categorically different: the intent is not to use the model but to replicate it. The report describes four extraction techniques:
- Prompt variation: Thousands of paraphrased versions of the same question to capture the model's reasoning distribution, not just a single answer
- Chain-of-thought elicitation: Prompts specifically crafted to make the model “show its work,” revealing internal reasoning steps
- Safety boundary probing: Queries designed to map exactly where the model refuses requests, enabling the student model to learn the boundaries without the corresponding safety training
- Tool-use extraction: Complex multi-step tasks that force the model to demonstrate agentic capabilities — function calling, planning, and error recovery
The Safety Guardrail Problem
This is where distillation becomes a security concern rather than just an intellectual property issue. When you distill a model through its outputs, you capture its capabilities but not its safety training. The RLHF (Reinforcement Learning from Human Feedback), Constitutional AI constraints, and red-team-tested refusal mechanisms that Anthropic spent years developing do not transfer through API outputs. The resulting student model can replicate Claude's reasoning power without Claude's safety filters — creating what researchers call an “unconstrained” model.
How Anthropic Detected the Attacks
Anthropic's report provides unusual technical detail about its detection methodology. Ironically, the scale of the operation — which gave the accused companies more training data — also made detection more feasible. Patterns that would be invisible across dozens of accounts become statistically significant across thousands.
IP Address Correlation
Despite geographic distribution, clusters of accounts shared infrastructure patterns — similar IP ranges, identical timing distributions, and coordinated registration timestamps that revealed centralized orchestration.
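A toy version of that correlation logic, assuming per-account metadata such as source IP and registration timestamp (field names and thresholds are hypothetical, not from the report):

```python
from collections import defaultdict
from datetime import datetime, timedelta

def cluster_by_infrastructure(accounts, window=timedelta(minutes=10)):
    """Group accounts that share an IP /24 prefix AND registered in a
    tight time window -- a crude signal of centralized orchestration."""
    by_prefix = defaultdict(list)
    for acct in accounts:
        prefix = ".".join(acct["ip"].split(".")[:3])  # /24 prefix
        by_prefix[prefix].append(acct)

    clusters = []
    for prefix, group in by_prefix.items():
        group.sort(key=lambda a: a["registered"])
        run = [group[0]]
        for a in group[1:]:
            if a["registered"] - run[-1]["registered"] <= window:
                run.append(a)  # same burst of registrations
            else:
                if len(run) > 1:
                    clusters.append((prefix, run))
                run = [a]
        if len(run) > 1:
            clusters.append((prefix, run))
    return clusters

t0 = datetime(2026, 1, 1, 12, 0)
accounts = [
    {"ip": "203.0.113.5",  "registered": t0},
    {"ip": "203.0.113.9",  "registered": t0 + timedelta(minutes=2)},
    {"ip": "203.0.113.17", "registered": t0 + timedelta(minutes=4)},
    {"ip": "198.51.100.1", "registered": t0 + timedelta(hours=6)},  # unrelated user
]
flagged = cluster_by_infrastructure(accounts)
# One flagged cluster: three accounts registered within minutes in 203.0.113.0/24
```

Real detection pipelines operate on far richer signals, but the core idea is the same: individually plausible accounts become suspicious in aggregate.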
Behavioral Fingerprinting
Normal users show diverse query patterns reflecting different tasks and skill levels. The flagged accounts shared distinctive prompt structures — systematic prompt variation, consistent formatting, and query sequences that followed extraction methodologies rather than genuine usage.
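One way to operationalize behavioral fingerprinting is to reduce each account's prompts to a structural feature vector and compare accounts by cosine similarity; near-identical fingerprints across supposedly unrelated accounts suggest a shared playbook. A minimal sketch with deliberately crude, hypothetical features:

```python
import math
from collections import Counter

def fingerprint(prompts):
    """Crude structural fingerprint: normalized counts of a few prompt
    features. Real systems would use far richer signals (templates,
    timing, syntax trees)."""
    counts = Counter()
    for p in prompts:
        counts["len_bucket_%d" % (len(p) // 50)] += 1
        counts["starts_imperative"] += p.split()[0].lower() in {"explain", "list", "walk"}
        counts["mentions_steps"] += "step" in p.lower()
    total = sum(counts.values()) or 1
    return {k: v / total for k, v in counts.items()}

def cosine(a, b):
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

acct_a = ["Explain step by step how X works.", "Explain step by step how Y works."]
acct_b = ["Explain step by step how Z works.", "Explain step by step how W works."]
acct_c = ["hi can u help me fix this bug", "what's a good pizza dough recipe?"]

# Template-driven accounts look alike; an organic account does not.
assert cosine(fingerprint(acct_a), fingerprint(acct_b)) > 0.95
assert cosine(fingerprint(acct_a), fingerprint(acct_c)) < 0.9
```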
Chain-of-Thought Detection
A disproportionate number of queries from flagged accounts included prompts designed to elicit step-by-step reasoning — “think through this carefully,” “explain your reasoning,” “walk me through your approach” — at rates far exceeding normal API usage.
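The detection primitive here is simple: measure each account's rate of reasoning-elicitation phrases against an organic baseline. A sketch, with marker phrases taken from the report's examples but baseline and threshold values purely illustrative:

```python
COT_MARKERS = (
    "think through this carefully",
    "explain your reasoning",
    "walk me through your approach",
)

def cot_rate(queries):
    """Fraction of an account's queries containing a reasoning-elicitation phrase."""
    if not queries:
        return 0.0
    hits = sum(any(m in q.lower() for m in COT_MARKERS) for q in queries)
    return hits / len(queries)

def flag_accounts(accounts, baseline=0.02, multiplier=10):
    """Flag accounts whose elicitation rate exceeds many multiples of the
    organic baseline (numbers are illustrative, not Anthropic's)."""
    return [acct for acct, queries in accounts.items()
            if cot_rate(queries) > baseline * multiplier]

accounts = {
    "organic":   ["fix this SQL query", "summarize this email"],
    "extractor": ["Explain your reasoning: why does X hold?",
                  "Walk me through your approach to Y.",
                  "Think through this carefully: is Z true?"],
}
assert flag_accounts(accounts) == ["extractor"]
```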
Statistical Pattern Recognition
The query distribution across flagged accounts was statistically inconsistent with organic usage. Normal traffic follows power-law distributions with heavy tails. The distillation traffic showed uniform coverage patterns — systematically exploring capability space rather than solving specific problems.
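One way to quantify "uniform coverage versus heavy-tailed organic usage" is the normalized Shannon entropy of an account's topic distribution: organic usage, dominated by a few tasks, scores low, while a systematic sweep of the capability space scores near 1.0. A sketch with made-up counts and an illustrative threshold:

```python
import math

def normalized_entropy(topic_counts):
    """Shannon entropy of the topic distribution, normalized to [0, 1]."""
    total = sum(topic_counts)
    probs = [c / total for c in topic_counts if c > 0]
    if len(probs) <= 1:
        return 0.0
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(probs))  # 1.0 == perfectly uniform

# Organic account: a few dominant tasks with a long tail (power-law-ish).
organic = [120, 40, 15, 6, 3, 1, 1]
# Extraction account: near-uniform sweep across capability areas.
sweep = [24, 25, 23, 26, 24, 25, 24]

assert normalized_entropy(sweep) > 0.99
assert normalized_entropy(organic) < 0.8
```

In practice a detector would combine a statistic like this with the infrastructure and fingerprinting signals above rather than rely on any single test.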
The Geopolitical Context
Anthropic's report did not arrive in a vacuum. The timing intersects with several major developments in the U.S.–China AI competition, and understanding that context is essential to evaluating the allegations fairly.
DeepSeek's latest model demonstrated reasoning capabilities approaching frontier Western models at dramatically lower cost. The distillation allegations raise questions about how much of that capability was independently developed versus extracted from competitors. Read our DeepSeek V4 guide for the full technical breakdown.
The Trump administration is actively debating whether to allow H200 GPU exports to China. Anthropic CEO Dario Amodei has publicly called for tighter export controls, arguing that API-level distillation makes hardware restrictions insufficient if software capabilities can be extracted directly.
Amodei testified before the House Homeland Security Committee on AI safety risks just weeks before the report. His testimony emphasized the danger of AI capability transfer to adversarial nations — a narrative that the distillation report directly supports.
OpenAI has made parallel accusations about Chinese labs distilling from GPT-4 and its successors. Anthropic's report provides more technical detail, but the pattern of Western frontier labs accusing Chinese competitors of model extraction is now well-established across the industry.
Critics point out that the timing serves Anthropic's policy advocacy. Supporters argue that the technical evidence stands regardless of timing. Both observations can be true simultaneously — the evidence should be evaluated on its own merits, while acknowledging the strategic context.
The Community Backlash
The AI community's response has been sharply divided, and both sides make substantive arguments worth considering. Critics of Anthropic's framing point to the industry's own record on training data:
- Frontier labs trained on vast quantities of copyrighted web data, books, and code without consent or compensation
- The New York Times, Getty Images, and thousands of authors have active lawsuits against OpenAI, Google, and Meta for training data usage
- Anthropic itself faces a lawsuit from music publishers over copyrighted lyrics in training data
- “You cannot build your model on everyone else's work, then cry foul when someone builds on yours” — a common refrain in developer communities

Anthropic's defenders counter that targeted distillation differs in kind, not just degree:
- API distillation specifically replicates tool-use and agentic capabilities — not just text generation
- Distilled models lack safety training, creating systems that can bypass refusals the source model was designed to enforce
- Training on web data (even copyrighted) is qualitatively different from targeted extraction of a specific system's capabilities
- The safety implications of unconstrained distilled models represent a genuine risk regardless of the intellectual property debate
The most honest assessment acknowledges both positions. The frontier labs do have a credibility problem on intellectual property — they built their businesses on contested training data practices. But the safety argument about distillation is substantively different from the copyright debate. Extracting a model's safety boundaries to create an unconstrained clone raises risks that web-scraping for training data does not. These are two separate conversations that keep getting conflated.
National Security and Safety Implications
Beyond the intellectual property and competitive dimensions, the distillation allegations raise concrete safety questions that warrant serious attention regardless of one's position on the hypocrisy debate.
The Unconstrained Model Problem
When you distill a model through its outputs, the student model learns what the teacher can do — but not what it was trained not to do. Anthropic's Constitutional AI framework, its RLHF-trained refusal mechanisms, and its red-team-tested safety boundaries do not transfer through API outputs. A distilled model can potentially generate content that Claude would refuse: detailed instructions for harmful activities, persuasive disinformation at scale, or offensive cyber tools without the safety guardrails.
Frontier models have demonstrated capability in vulnerability discovery, exploit generation, and offensive security research. These capabilities are deliberately constrained through safety training. Distilled versions without those constraints could lower the barrier for sophisticated cyber operations.
Claude's reasoning and persuasion capabilities — designed for helpful dialogue — could be repurposed in an unconstrained model for generating targeted disinformation, social engineering campaigns, or influence operations at a scale that manual creation cannot match.
Who Was Not Accused
Notably, Anthropic's report does not implicate all Chinese AI companies. Alibaba's Qwen team and Z.ai are explicitly not named. This selectivity lends some credibility to the allegations — a purely political report might cast a wider net. It also suggests that the three accused companies' behavior was distinguishable from normal API usage patterns of other Chinese AI organizations.
As of publication, none of the three accused companies had issued public responses to the allegations. For a deeper look at how Claude's capabilities compare across models, see our Claude Sonnet 4.6 benchmarks and pricing guide.
What This Means for AI Development
Regardless of how the specific allegations play out, the Anthropic distillation report signals several durable shifts in the AI industry that businesses and developers should plan for.
API Protections Will Tighten
Expect stricter rate-limiting, more sophisticated abuse detection, and potentially identity verification requirements for API access across all frontier model providers. This will increase friction for legitimate developers and may drive up API costs.
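"Stricter rate-limiting" in practice often means per-account token buckets, which permit short bursts but cap sustained request rates. A minimal sketch, with parameters and the caller-supplied clock purely illustrative:

```python
class TokenBucket:
    """Per-account token bucket: allows brief bursts but caps the
    sustained request rate. Timestamps are supplied by the caller
    (e.g. from time.monotonic()) to keep the sketch deterministic."""

    def __init__(self, rate_per_sec, burst, now=0.0):
        self.rate = rate_per_sec       # tokens refilled per second
        self.capacity = burst          # maximum burst size
        self.tokens = float(burst)
        self.last = now

    def allow(self, now):
        # Refill tokens for the elapsed interval, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate_per_sec=1, burst=3)
results = [bucket.allow(now=0.0) for _ in range(5)]
# A burst of 3 is allowed, then requests are rejected...
assert results == [True, True, True, False, False]
# ...until tokens refill with time.
assert bucket.allow(now=2.0)
```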
Open-Source Dynamics Shift
The distillation debate strengthens the argument for open-weight models (like Meta's Llama) while simultaneously raising questions about what those open models were trained on. The distinction between “open-weight” and “open-source” will matter more as these debates intensify.
Safety Becomes a Competitive Differentiator
Companies that can demonstrate their models were not built on extracted capabilities — and that include robust safety training — will have a competitive advantage in regulated industries and government contracts.
Business Planning Must Account for Supply Chain Risk
Organizations building on AI models need to understand the provenance of those models. If a model's capabilities were derived from unauthorized distillation, downstream users could face legal, reputational, or operational risks as enforcement evolves.