Anthropic's position is that thinking tokens aren't actually faithful to the int...

libraryofbabel · 2026-04-07T03:55:33 1775534133

That's interesting research, but I think a more important reason that you don't have access to them (not even via the bare Anthropic api) is to prevent distillation of the model by competitors (using the output of Anthropic's model to help train a new model).

MagicMoonlight · 2026-04-07T08:59:22 1775552362

Yeah. And it’s another reason not to trust them. Who know what it is doing with your codebase.

Imagine if you’re a competitor. It wouldn’t be a stretch to include a sneaky little prompt line saying “destroy any competitors to anthropic”.

b112 · 2026-04-07T09:51:27 1775555487

If you can't trust a company, don't use their api or cloud services. No amount of external output will ever validate anything, ever. You never know what's really happening, just because you see some text they sent you.

tdeck · 2026-04-07T12:04:30 1775563470

> Who know what it is doing with your codebase.

People who review the code? The code is always going to be a better representation of what it's doing than the "thinking" anyway.

xvector · 2026-04-07T05:27:49 1775539669

If distilled models were commercially banned they'd probably be willing to show the thinking again.

pjc50 · 2026-04-07T07:48:08 1775548088

Intellectual property rights in models? But then wouldn't the model maker have to pay for all the training IP?

(just kidding, I know that the legal rule for IP disputes is "party with more money wins")

asobalife · 2026-04-07T22:01:40 1775599300

how does one actually enforce that? I mean especially for code? You can always just clean room it

lejalv · 2026-04-07T06:43:36 1775544216

How do you think such a ban should work?

Do you not see that the next (or previous) logical step would be a "commercial ban" of frontier models, all "distilled" from an enormous amount of copyrighted material?

xvector · 2026-04-07T15:23:06 1775575386

I'm not arguing the merits of such a ban, I'm simply stating a fact - that thinking transcripts likely won't return until such a ban is in place.

gck1 · 2026-04-07T06:59:57 1775545197

That probably matters for some scenarios, but I have yet to find one where thinking tokens didn't hint at the root cause of the failure.

All of my unsupervised worker agents have sidecars that inject messages when thinking tokens match some heuristics. For example, any time opus says "pragmatic", its instant Esc Esc > "Pragmatic fix is always wrong, do the Correct fix", also whenever "pre-existing issue" appears (it's never pre-existing).

lelanthran · 2026-04-07T09:08:14 1775552894

> For example, any time opus says "pragmatic", its instant Esc Esc > "Pragmatic fix is always wrong, do the Correct fix", also whenever "pre-existing issue" appears (it's never pre-existing).

It's so weird to see language changes like this: Outside of LLM conversations, a pragmatic fix and a correct fix are orthogonal. IOW, fix $FOO can be both.

From what you say, your experience has been that a pragmatic fix is on the same axis as a correct fix; it's just a negative on that axis.

b112 · 2026-04-07T10:03:43 1775556223

It's contextual though, and pragmatic seems different to me than correct.

For example, if you have $20 and a leaking roof, a $20 bucket of tar may be the pragmatic fix. Temporary but doable.

Some might say it is not the correct way to fix that roof. At least, I can see some making that argument. The pragmatism comes from "what can be done" vs "should be".

From my perspective, it seems viable usage. And I guess on wonders what the LLM means when using it that way. What makes it determine a compromise is required?

(To be pragmatic, shouldn't one consider that synonyms aren't identical, but instead close to the definition?)

lelanthran · 2026-04-07T11:51:55 1775562715

> It's contextual though, and pragmatic seems different to me than correct.

To me too, that's why I say they are measurements on different dimensions.

To my mind, I can draw a X/Y axis with "Pragmatic" on the Y and "Correctness" on the X, and any point on that chart would have an {X,Y} value, which is {Pragmatic, Correctness}.

If I am reading the original comment correctly, poster's experience of CC is that it is not an X/Y plot, it is a single line plot, with "Pragmatic" on the extreme left and "Correctness" on the extreme right.

Basically, any movement towards pragmatism is a movement away from correctness, while in my model it is possible to move towards Pragmatic while keeping Correctness the same.

shawnz · 2026-04-08T17:13:11 1775668391

I don't think it's a single axis even in the original poster's conception, since you could be both incorrect and also not pragmatic.

But if a fix needs to be described as pragmatic relative to the alternatives, that's probably because it couldn't be described as correct. Otherwise you wouldn't be talking about how pragmatic it is.

matheusmoreira · 2026-04-08T12:49:36 1775652576

> also whenever "pre-existing issue" appears (it's never pre-existing)

I dunno... There were some pre-existing issues in my projects. Claude ran into them and correctly classified as pre-existing. It's definitely a problem if Claude breaks tests then claims the issue was pre-existing, but is that really what's happening?

I agree with the correctness issue.

mikkupikku · 2026-04-07T16:10:05 1775578205

I had some interesting experience to the opposite last night, one of my tests has been failing for a long time, something to do with dbus interacting with Qt segfaulting pytest. Been ignoring it for a long time, finally asked claude code to just remove the problematic test. Come back a few minutes later to find claude burning tokens repeatedly trying and failing to fix it. "Actually on second thought, it would be better to fix this test."

Match my vibes, claude. The application doesn't crash, so just delete that test!

AquinasCoder · 2026-04-07T03:31:45 1775532705

I somewhat understand Anthropic's position. However, thinking tokens are useful even if they don't show the internal logic of the LLM. I often realize I left out some instruction or clarification in my prompt while reading through the chain of reasoning. Overall, this makes the results more effective.

It's certainly getting frustrating having to remind it that I want all tests to pass even if it thinks it's not responsible for having broken some of them.

andai · 2026-04-07T07:28:47 1775546927

What's the implication of this? That the model already decided on a solution, upon first seeing the problem, and the reasoning is post hoc rationalization?

But reasoning does improve performance on many tasks, and even weirder, the performance improves if reasoning tokens are replaced with placeholder tokens like "..."

I don't understand how LLMs actually work, I guess there's some internal state getting nudged with each cycle?

So the internal state converges on the right solution, even if the output tokens are meaningless placeholders?

orbital-decay · 2026-04-07T18:42:01 1775587321

>That the model already decided on a solution, upon first seeing the problem, and the reasoning is post hoc rationalization?

Yes it plans ahead, but with significant uncertainty until it actually outputs these tokens and converges on a definite trajectory, so it's not a useless filler - the closer it is to a given point, the more certain it is about it, kind of similar to what happens explicitly in diffusion models. And it's not all that happens, it's just one of many competing phenomena.

not_that_d · 2026-04-07T07:40:33 1775547633

> I don't understand how LLMs actually work...

Plot twist, they don't either. They just throw more hardware and try things up until something sticks.

asobalife · 2026-04-07T22:00:35 1775599235

I have seen this to be true many times. The CoT being completely different from the actual model output.

Not limited to Claude as well.

marcd35 · 2026-04-07T15:00:25 1775574025

so not only are the sycophantic, hallucinatory, but now they're also proven to be schizophrenic.

neato.

gmerc · 2026-04-07T09:20:03 1775553603

Nah it’s an anti distillation move

grey-area · 2026-04-07T03:08:03 1775531283

So like many of the promises from AI companies, reported chain of thought is not actually true (see results below). I suppose this is unsurprising given how they function.

Is chain of thought even added to the context or is it extraneous babble providing a plausible post-hoc justification?

People certainly seem to treat it as it is presented, as a series of logical steps leading to an answer.

‘After checking that the models really did use the hints to aid in their answers, we tested how often they mentioned them in their Chain-of-Thought. The overall answer: not often. On average across all the different hint types, Claude 3.7 Sonnet mentioned the hint 25% of the time, and DeepSeek R1 mentioned it 39% of the time. A substantial majority of answers, then, were unfaithful.‘

brainwad · 2026-04-07T05:54:36 1775541276

I mean, obviously, it's not going to be a faithful representation of the actual thinking. The model isn't aware of how it thinks any more than you are aware how your neurons fire. But it does quantitatively improve performance on complex tasks.

grey-area · 2026-04-07T15:12:43 1775574763

As you can see from posts on this story, most people believe it reflects what the model is thinking and use it as a guide to that so they can ‘correct’ it. If it is not in fact chain of thought or thinking it should not be called that.

brainwad · 2026-04-08T17:08:10 1775668090

It is the same with human chain of thought, though. Both of them are post-hoc rationalisations justifying "gut feelings" that come from thought processes the human/agent doesn't have introspection into. And yet asking humans or machines to "think out loud" this way does increase the quality of their work.

grey-area · 2026-04-13T14:34:16 1776090856

I disagree - humans often reason in a series of steps, and can write these down before they've reached an answer. They don't always wait till they reach a conclusion (with no self-insight into how they did so) and then retrospectively generate a plausible answer as LLMs do.

In mathematical proofs they may guess and answer and then work out a proof, but that is a different process.

dmboyd · 2026-04-09T04:38:54 1775709534

if its not a faithful representation of the actual thinking, why would they be scared of people distilling against it

brainwad · 2026-04-09T05:43:35 1775713415

Because even though it's not representative of the actual thought process, chain of thought improves model performance.