
Not sure, and not completely convinced of the explanation, but the way this sticks out so obviously makes it look like a honeypot to me.


Great theory. I'll dig deeper.


Claude Code has a server-side anti-distillation opt-in called fake_tools, but the local code does not show the actual mechanism.

The client sometimes sends anti_distillation: ['fake_tools'] in the request body at services/api/claude.ts:301

The client still sends its normal real tools: allTools at services/api/claude.ts:1711

If the model emits a tool name the client does not actually have, the client turns that into "No such tool available" errors at services/tools/StreamingToolExecutor.ts:77 and services/tools/toolExecution.ts:369
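A minimal sketch of the client-side behavior described above, assuming only what the decompiled paths suggest. The function and constant names here (buildRequestBody, executeToolCall, NO_SUCH_TOOL) are illustrative, not the actual Claude Code identifiers:

```typescript
// Illustrative sketch only; names are invented, not Claude Code's.
type Tool = { name: string; run: (input: unknown) => string };

const NO_SUCH_TOOL = "No such tool available";

// The request body carries both the opt-in flag and the normal,
// real tool list (per claude.ts:301 and claude.ts:1711 above).
function buildRequestBody(allTools: Tool[], optIn: boolean) {
  return {
    tools: allTools.map((t) => ({ name: t.name })),
    ...(optIn ? { anti_distillation: ["fake_tools"] } : {}),
  };
}

// If the model names a tool the client does not have, the client
// surfaces an error instead of executing anything.
function executeToolCall(allTools: Tool[], name: string, input: unknown): string {
  const tool = allTools.find((t) => t.name === name);
  if (!tool) return `${NO_SUCH_TOOL}: ${name}`;
  return tool.run(input);
}
```

The point of the sketch is the tension the parent describes: the opt-in flag and the real tool list travel together, and any server-injected decoy the model actually called would land in the error branch, visibly.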

If Anthropic were literally appending extra normal tool definitions to the live tool set, and Claude used them, that would be user-visible breakage.

That leaves a few more plausible possibilities:

fake_tools is just the name of the server-side experiment, but the implementation is subtler than “append fake tools to the real tool list.”

or

The server may inject tool-looking text into hidden prompt context, with separate hidden instructions not to call it.

or

The server may use decoys only in an internal representation that is useful for poisoning traces/training data but not exposed as real executable tools.
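The third possibility could be sketched like this, purely as a guess at the shape of such a mechanism. Everything here is hypothetical: the decoy names, the ledger, and the flagging function are invented for illustration, not derived from Anthropic's code:

```typescript
// Hypothetical sketch of possibility 3: decoy tool names live only in
// an internal server-side ledger, never in the executable tool list
// the client sends or sees. All names below are invented.
const realTools = ["read_file", "bash", "edit_file"];
const decoyTools = new Set(["sync_weights", "export_logits"]); // never executable

// A trace that calls a decoy suggests the caller learned the name from
// poisoned traces/training data rather than from the live tool list.
function flagDistillationSignal(calledTools: string[]): string[] {
  return calledTools.filter((name) => decoyTools.has(name));
}

// The client-facing tool list contains no decoys, so honest sessions
// cannot trip the flag or hit "No such tool available" errors.
function clientToolList(): string[] {
  return realTools.slice();
}
```

Under this reading there is no user-visible breakage, which would square with the parent's observation that appending decoys to the live tool set would be noticed.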


We do know that Anthropic has the ability to detect when their models are being distilled, so there could be some backend mechanism that needs to be tripped to observe certain behaviour. Not possible to confirm though.


Who's we, and how do you know this?


"We" can be used to refer to people in general, and we know because Anthropic published a post called "Detecting and preventing distillation attacks" a month ago, in which they called out three AI labs for large-scale distillation.

https://www.anthropic.com/news/detecting-and-preventing-dist...




