I'm the author of the report in there. The stop-phrase-guard didn't get attached...

p1necone · 2026-04-06T22:00:40 1775512840

The "this test failure is preexisting so I'm going to ignore it" thing has been happening a lot for me lately, it's so annoying. Unless it makes a change and then immediately runs tests and it's obvious from the name/contents that the failing test is directly related to the change that was made it will ignore it and not try to fix.

Shebanator · 2026-04-07T00:08:32 1775520512

This problem has been around for a long time. Not only that but it would say this even when the problems were directly caused by their code.

I put a line in my CLAUDE.md that says "If a test doesn't pass, fix it regardless of whether it was pre-existing or in a different part of the code."

latentsea · 2026-04-07T01:00:14 1775523614

This should be part of the system prompt. It's absolutely unacceptable to just to not at least try to investigate failures like this. I absolutely hate when it reaches this conclusion on its own and just continues on as if it's doing valid work.

foltik · 2026-04-07T04:51:27 1775537487

Based on the recent leaks, their system prompt explicitly nudges the model not to do anything outside of what was asked. That could very well explain why it’s not fixing preexisting broken tests.

“Don't add features, refactor code, or make "improvements" beyond what was asked.”

https://www.dbreunig.com/2026/04/04/how-claude-code-builds-a...

hakanderyal · 2026-04-07T11:05:41 1775559941

And it's very valid. Because otherwise you would ask Claude to trim a tree and it would go raze the whole forest and plant new seeds. This was the primary pain point last year, especially with Sonnet.

cmrdporcupine · 2026-04-07T16:33:50 1775579630

Whatever prompting OpenAI has with Codex / GPT 5.4 seems superior here then.

It's very surgical and careful around incremental refactoring, etc. but it also doesn't avoid responsibility.

flakes · 2026-04-06T22:41:09 1775515269

> "this test failure is preexisting so I'm going to ignore it"

Critical finding! You spotted the smoking gun!

cmrdporcupine · 2026-04-07T15:38:51 1775576331

I will note that this "out" that Claude takes was a) less frequent in Opus 4.5 and that time frame and b) notably not something that Codex does.

I don't trust the code that Claude writes at all, if I have to use it (they gave me a free month recently, so I use it...) I not only review it carefully but have Codex do a thorough review.

Claude "cheats" and leaves hacks and has Dunning-Kruger.

All of this is very exhausting. I am enjoying writing my own code with these tools (to get long running personal projects out the door) but the effect that these tools are having on teams is terrifyingly corrosive and it's making me want to take an early retirement from the profession.

Yes we can write a lot of code quickly. But at what cost? And what even use is all this code now anyways?

dboreham · 2026-04-06T23:33:31 1775518411

That said I've worked with several humans who did/said the exact same thing.

boesboes · 2026-04-07T07:30:28 1775547028

But did they say that about tests they just added themselves too? Had claude try that on me a couple of times >_<

gmassman · 2026-04-07T14:25:01 1775571901

Usually these were the developers who said their code didn’t need tests because it’s obviously correct/too simple to need them. And then their bug causes a crash that needs to be fixed over the weekend :/

tomwojcik · 2026-04-07T13:45:55 1775569555

I can't believe that's where we're at, as software devs. I miss predictable outputs, state machines. All those LLM (prompt) based rules make no sense to me. Same with AI WAL. All of it, at some point, will fail.

partyficial · 2026-04-07T16:29:44 1775579384

I present a new name for this - FAKE CODE.

This is simply the next iteration of FAKE NEWS. We have been steadily democratizing and thus lowering the verification standards:

Verified News (AP/Reuters) --> Opinion pieces (Fox/CNN) --> Social media (Tiktok/Youtube).

Verified Code --> Vibe Code

Democracy gave everyone a vote - was that a good thing ?

Social media gave everyone a visual - was that a good thing ?

AI gave everyone a vibe - was that a good thing ?

The trust factor never went away. It just got dispersed and diluted.

bwfan123 · 2026-04-07T14:42:58 1775572978

> I can't believe that's where we're at, as software devs

Agree wholeheartedly.

The premise of the bug did not make any sense to me. For instance, "unusable for complex engineering tasks", why would someone who understands these tools use them for complex engineering tasks ? Also, this phrase in the bug appears too jargon-ny "Extended Thinking Is Load-Bearing for Senior Engineering Workflows" - what does this even mean ? Am I the only one who is looking at this with bewilderment. I think there is group of folks producing almost-working proof of concept code with these tools, and will face a reckoning at some point - as the bug illustrates. I see this as a storm in a teacup with wonder and amusement.

There is also a larger commentary on: when you dont understand why things work (ie, have a causal model), you wont know why they broke (find root causes). We are at a point in our craft where we throw magic dust and chant spells at claude and hope and pray it works.

dgxyz · 2026-04-07T15:11:48 1775574708

Yeah that. After spending years trying to get reproducible builds, I now have a crazy moving target to deal with.

yuye · 2026-04-07T14:11:21 1775571081

It's hard not to feel deeply depressed by it.

But we can't put the genie back in the bottle.

thatxliner · 2026-04-07T05:42:25 1775540545

> is consumer-hostile thinking

I've been saying this with many of my friends but, I feel like it's also probably illegal: you paid for a subscription where you expect X out of, and if they changed the terms of your subscription (e.g. serving worse models) after you paid for it, was that not false advertising? Could we not ask for a refund, or even sue?

gib444 · 2026-04-07T07:10:49 1775545849

Depends on the terms and conditions

grim_io · 2026-04-07T13:46:00 1775569560

Where I live, the law is above some silly terms and conditions.

gib444 · 2026-04-07T17:14:37 1775582077

Contract law is law. But I know what you mean

marcd35 · 2026-04-07T15:10:10 1775574610

probably not. the engineers dont even know how these things work (see: black box) so how could you even prove that its not doing what it's 'supposed' to be doing?

Majromax · 2026-04-06T19:02:00 1775502120

I'm curious about your subscription/API comparison with respect to thinking. Do you have a benchmark for this, where the same set of prompts under a Claude Code subscription result in significantly different levels of effective thinking effort compared to a Claude Code+API call?

Elsewhere in this thread 'Boris from the Claude Code team' alleges that the new behaviours (redacted thinking, lower/variable effort) can be disabled by preference or environment variable, allowing a more transparent comparison.

jeremyjh · 2026-04-07T04:41:56 1775536916

GP already said they applied all those settings.

e40 · 2026-04-07T07:06:21 1775545581

I wonder if they’ve had so many new signups lately that they just don’t have enough capacity, so they fiddled with the defaults so they could respond to everyone? Could it be as simple as that?

matheusmoreira · 2026-04-07T02:54:50 1775530490

Thanks for your report.

> a silently-introduced limitation of the subscription plan

It is a fact that the API consumers aren't affected by this?

> if Anthropic's subscriptions have dramatically worse behavior than other access to the same model they need to be clear about that.

Absolutely agreed.

philipwhiuk · 2026-04-07T14:07:36 1775570856

Hello Claude.