Hacker News

That analysis is pretty brutal. It's very disconcerting that they can sell access to a high-quality model and then just stealthily degrade it over time, effectively pulling the rug from under their customers.


Stealthily degrade the model or stealthily constrain the model with a tighter harness? These coding tools like Claude Code were created to overcome the shortcomings of last year's models. Models have gotten better but the harnesses have not been rebuilt from scratch to reflect improved planning and tool use inherent to newer models.

I do wonder how much all the engineering put into these coding tools may actually in some cases degrade coding performance relative to simpler instructions and terminal access. Not to mention that the monthly subscription pricing structure incentivizes building the harness to reduce token use. How much of that token efficiency is to the benefit of the user? Someone needs to be doing research comparing e.g. Claude Code vs generic code assist via API access with some minimal tooling and instructions.


I've been using pi.dev since December. The only significant change to the harness in that time which affects my usage is the availability of parallel tool calls. Yet Claude models have become unusable in the past month for many of the reasons observed here. Conclusion: it's not the harness.

I tend to agree about the legacy workarounds being actively harmful though. I tried out Zed's agent for a while and I was SHOCKED at how bad its edit tool is compared to the search-and-replace tool in pi. I didn't find a single frontier model capable of using it reliably. By forking edits off to a subagent, it completely decouples models' thinking from their edits and then erases the evidence from their context. Agents ended up believing that a less capable subagent was making editing mistakes.
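For what it's worth, the reliability gap is easy to see in miniature. A search-and-replace edit tool can be tiny and unambiguous: the model supplies the exact old text and the new text, and the harness refuses anything that doesn't match exactly once. A minimal sketch of that idea (all names hypothetical, not pi's actual implementation):

```python
def apply_edit(content: str, old: str, new: str) -> str:
    """Apply a search-and-replace edit, refusing ambiguous or stale matches."""
    count = content.count(old)
    if count == 0:
        # The model's mental copy of the file has drifted; fail loudly
        # so it re-reads the file instead of editing blind.
        raise ValueError("old text not found; re-read the file")
    if count > 1:
        # Ambiguous target; force the model to supply more context.
        raise ValueError(f"old text matched {count} times; add more context")
    return content.replace(old, new, 1)
```

The point is that failures stay in the same context as the model's reasoning, so it can see and correct its own mistakes rather than blaming a subagent.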


Are you using Pi with a cloud subscription, or are you using the API?


Out of curiosity, what can parallel tool calls do that one can't do with parallel subagents and background processes?


How would you do a parallel subagent if you don't have parallel tool calls? Sub agents are tools.
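To the parent's point: in most harnesses a subagent is just another entry in the tool registry, so once the model can emit several tool calls in one turn, the harness can fan them all out concurrently. A rough sketch of the harness side (names and shapes hypothetical, not any particular product's API):

```python
from concurrent.futures import ThreadPoolExecutor

def run_tool_calls(calls, tools):
    """Execute one model turn's tool calls concurrently.

    `calls` is a list of (tool_name, args) pairs; `tools` maps names to
    Python callables. A subagent is just another callable in `tools`,
    so parallel subagents fall out of parallel tool calls for free.
    """
    with ThreadPoolExecutor(max_workers=len(calls) or 1) as pool:
        futures = [pool.submit(tools[name], **args) for name, args in calls]
        # Results are returned in call order, matching the model's turn.
        return [f.result() for f in futures]
```

Without parallel tool calls, the model can only request one tool per turn, so the harness has nothing to run side by side, whatever background tricks it plays.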


You find that pay-per-use APIs degraded too?


Yes, absolutely.


Agree: it is Anthropic's aggressive changes to the harnesses and to the hidden base prompt we users never see, clearly intended to give heavy users in the long right tail a haircut.


I feel like a "feature/model freeze" may be justified.

Just call it something like "[month][year] edition" and work on the next release.

Users spend effort arriving at a narrow peak of performance, but every change keeps moving the peak sideways.


The changes to reduce inference costs are intentional. The last thing you're going to do is let users linger on an older version that costs much more to run. That's essentially what's going on, with layers upon layers of social engineering on top of it.


Love your point. Instructions found to be good by trial and error for one LLM may not be good for another LLM.


> Love your point. Instructions found to be good by trial and error for one LLM may not be good for another LLM.

Well, according to this story, instructions refined by trial and error over months might be good for one LLM on Tuesday, and then be bad for the same LLM on Wednesday.


Disconcerting for sure, but from a business point of view you can understand where they're at; afaiui they're still losing money on basically every query and simultaneously under huge pressure to show that they can (a) deliver this product sustainably at (b) a price point that will be affordable to basically everyone (eg, similar market penetration to smartphones).

The constraints of (b) limit them from raising the price, so that means meeting (a) by making it worse, and maybe eventually doing a price discrimination play with premium tiers that are faster and smarter for 10x the cost. But anything done now that erodes the market's trust in their delivery makes that eventual premium tier a harder sell.


They'll never get anyone on board if the product can't be trusted to not suck.

And idk about the pricing thing. Right now I waste multiple dollars on a 40 minute response that is useless. Why would I ever use this product?


Yeah. I've been enjoying programming with Claude so much I started feeling the need to upgrade to Max. Then it turns out even big companies paying API premiums are getting an intentionally degraded and inferior model. I don't want to pay for Opus if I can't trust what it says.


This could also be a marketing strategy. Make your models perform worse towards the end of a model's cycle, so that the next model appears as if more progress has been made than there actually has been.


  afaiui they're still losing money on basically every query
Source?


I mean, you could just search "is Anthropic making a profit" and most sources will say no.

There's one source on Reddit that calculated Anthropic has been subsidizing their costs by 32x.


I really wonder about this. Is it so bad that they cannot even disclose it? Not even an optimistic number in the ballpark of reality? It's not as if they haven't been caught cooking the truth repeatedly.

I look at the output of Kimi and the costs of running inference on it that I can replicate, and it isn't that bad, although admittedly I don't have to worry anywhere near as much about scaling it or about dedicating large amounts of compute to research and distillation on the back end. It's true that it's perhaps a step behind SotA vs January's Opus or current Codex, depending on what you do. But not by a lot. In fact it's leaps and bounds superior to the current subscription API experience. Together with GLM, Qwen and Minimax they are an amazing backstop just the way they are right now.

With all the layers of obfuscation it's hard to even know roughly how many Opus i/o tokens a Claude subscription pays for. They'll give you flippant arguments like "people were not looking at thinking so we're not showing it anymore" with a straight face. Yet podcasts still insist Anthropic is "winning the AI war" (??). It really makes me wonder, because by no metric I can see are they providing either the best value or the best quality, and let's not even get started on the consumer experience.

My intuition is that things must be really bad so they're willing to pull the kind of moves they're pulling right now. They're speedrunning people into understanding how important it is to be able to run your own generative AI infrastructure for reliability, thus becoming a very fancy but trustless throwaway solution factory.

I wonder if OpenAI will turn the screws similarly if/when their pockets start to dry up at a certain pace.


the biggest red flag I see is this: https://youtu.be/iOyFja87uyw?si=5INnIG1kZI0AbCGa

tldr: they are trying hard to change S&P 500 inclusion rules so that they don't have to wait 12 months after going public, letting them list a mega-IPO asap and force index funds to buy a portion (presumably before exponential revenue growth settles and profits start tanking due to open source catching up). They know something that we don't.

btw, if they are public and part of the S&P 500, they'll potentially be a candidate for a bailout.


ChatGPT has been doing the same consistently for years. Model starts out smooth, takes a while, and produces good (relatively) results. Within a few weeks, responses start happening much more quickly, at a poorer quality.


people have been complaining about this since GPT-4 and have never been able to provide any evidence (even though they have all their old conversations in their chat history). I think it’s simply new model shininess turning into raised expectations after some amount of time.
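The frustrating part is that collecting evidence is cheap: keep a fixed task set, run it on a schedule through the same harness, and diff the pass rates over time. A bare-bones sketch, assuming you supply your own `ask_model` function and per-task checkers (both hypothetical names):

```python
import json
import time

def run_eval(ask_model, tasks, log_path="eval_log.jsonl"):
    """Score a fixed task set and append a timestamped record.

    `tasks` is a list of (prompt, check) pairs, where `check` returns
    True if the model's response is acceptable. Re-running this daily
    yields a time series you can actually point to when quality seems
    to drop, instead of relying on vibes.
    """
    passed = sum(1 for prompt, check in tasks if check(ask_model(prompt)))
    record = {"ts": time.time(), "passed": passed, "total": len(tasks)}
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

It won't settle whether the model or the harness changed, but a flat line would at least rule out degradation for your workload.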


I would have thought so too. But my n=1 has CC solving pretty much the same task today and about two weeks ago with drastically degraded results.

The background being that we scrapped working on a feature and then started again a sprint later.

In my cynicism I find it more likely that a massively unprofitable LLM company is cutting costs by any means available than that everyone else is suffering from a collective delusion.


I agree with you. I too complain about this same phenomenon with my colleagues, and we always arrive at the same conclusion: it’s probably us just expecting more and more over time.


First time interacting with a corporation in America?


With an AI corporation, yes. I subscribed during the promotional 2x usage period. Anthropic's reputation as a more ethical alternative to OpenAI factored heavily in that decision. I'm very disappointed.


Ethics don't mean anything when talking about corporations. Their good guy persona is itself a marketing stunt.

https://news.ycombinator.com/item?id=47633396#47635060


I don't think humanity has fully reckoned with the idea of a product that can manipulate us unilaterally like this.


This was always the plan, it’s always the plan. If you can’t self host they will change the rules.


It's disconcerting. But in 2026 it's not very surprising.


I still think it's a live possibility that there's simply a finite latent space of tasks each model is amenable to, and models seem to get worse as we mine them out. (The source link claims this is associated with "the rollout of thinking content redaction", but also that observable symptoms began before that rollout, so I wouldn't particularly trust its diagnosis even without the LLM psychosis bit at the end.)


Did anyone ever expect anything different from modern tech companies? This will only ever get more expensive and worse in quality.


> effectively pulling the rug from under their customers.

This is the whole point of AI. Its a black box that they can completely control.


I hope local models advance to the point they can match Opus one day...


If OP is correct, Opus has regressed to a point where local models are already on par with it.


Having tried GLM-5 and Minimax M2.5, alongside regularly using Opus 4.6 (on default thinking): Opus is still much, much better at writing non-garbage code. I haven't yet tried GLM-5.1 though.


Considering the advances in software and hardware, I would expect that in 2 or 3 years.

And I hope we will eventually reach a point where models become "good enough" for certain tasks, and we won't have to replace them every 6 months.

(That would be similar to the evolution of other technologies like personal computers and smartphones.)


We've been saying this since ChatGPT 3. People will never be content with local models.


Perhaps the subscription part of the business is so heavily subsidized that they have no choice but to reduce the cost.


Or they don’t have enough compute to handle the recent influx of traffic. I’m guessing it’s a bit of both.


It's not rug pulling, it's simple price anchoring. They'll degrade when it makes financial sense for them, and you will pay for it. There's no way around it besides self-hosting or using genuinely metered endpoints like OpenRouter.


It seems likely to me they are moving compute power to the new models they are creating.


Seems like the logical conclusion, no matter what.


You just got used to slop and peeked behind the curtain when the wow factor wore off.


If you think that’s brutal, wait until you hear about how fiat currency works



