It's really interesting how much the AI harness seems to matter. Going from 48% ...

manx · 2026-04-27T15:42:59 1777304579

We probably want to compare the cartesian product of model+harness.
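A minimal sketch of what that grid could look like, assuming placeholder model and harness names (all hypothetical, not real leaderboard entries) and a scores dict you would fill in from your own runs:

```python
from itertools import product

# Hypothetical placeholder names; swap in whatever you actually want to test.
MODELS = ["model-a", "model-b"]
HARNESSES = ["harness-x", "harness-y", "harness-z"]


def evaluation_grid(models, harnesses):
    """Return every (model, harness) pair, i.e. the cartesian product."""
    return list(product(models, harnesses))


def summarize(scores):
    """Average a {(model, harness): score} dict per model and per harness."""
    by_model, by_harness = {}, {}
    for (model, harness), score in scores.items():
        by_model.setdefault(model, []).append(score)
        by_harness.setdefault(harness, []).append(score)

    def mean(values):
        return sum(values) / len(values)

    return (
        {m: mean(v) for m, v in by_model.items()},
        {h: mean(v) for h, v in by_harness.items()},
    )


pairs = evaluation_grid(MODELS, HARNESSES)
```

Comparing the per-model and per-harness averages from `summarize` is one rough way to see how much of the spread comes from the harness rather than the model.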

culi · 2026-04-27T17:37:24 1777311444

Maybe the future isn't a human-like centralized intelligence but an octopus-like decentralized intelligence, where more focus is placed on making the harness itself "smart".

dominotw · 2026-04-27T17:47:05 1777312025

That would be counter to AI company goals. They want the harness to be dumb and the models to be smart so they can sell models.

satvikpendem · 2026-04-27T20:53:26 1777323206

Not really. Anthropic, for example, sells both the harness and the models as a unified kit via Claude Code. It is in their best interest to make sure both parts work as well as possible, including by using reinforcement learning on previous usage to improve the performance of new models.

dominotw · 2026-04-28T12:47:36 1777380456

But harnesses are not a moat. They wouldn't have to subsidize their own harness massively if that were the case. Anyone can write a good harness.

satvikpendem · 2026-04-28T12:52:27 1777380747

It's not true that anyone can write a good harness, because the LLM providers have information like prompts that they can RL train on, which someone writing their own harness would not have. Therefore a good, proprietary harness is a moat.

dominotw · 2026-04-28T17:21:19 1777396879

That doesn't answer why Claude subsidizes their own harness and bans people from using subsidized inference on openclaw etc.

satvikpendem · 2026-04-28T17:24:39 1777397079

Yes it does? They want people to be locked into the Claude Code product.

dominotw · 2026-04-28T17:52:28 1777398748

Why do they have to "lock" them in if it's clearly superior to alternatives that merely use their API?

satvikpendem · 2026-04-28T17:57:17 1777399037

Because it's a way to make more money in the future. I feel like you're not really getting the difference between what a business does for profit and its technical decisions.

dominotw · 2026-04-28T19:22:02 1777404122

Well, the internet is rife with theories about why Anthropic does it. I don't buy that you have it all figured out.

SwellJoe · 2026-04-27T19:16:18 1777317378

https://en.wikipedia.org/wiki/Bitter_lesson

History indicates you can't tool and harness your way to effectively competing against a smarter model with more compute.

nikcub · 2026-04-27T22:01:07 1777327267

The most cited is Terminal-Bench 2.0 [0], but it's also plagued by cheating accusations and benchmaxxing.

Somewhat remarkably, Claude Code ranks last for Opus 4.6, which may say something about CC, or something about the benchmark.

[0] https://www.tbench.ai/leaderboard/terminal-bench/2.0

isege · 2026-04-27T20:44:52 1777322692

Isn't that what terminal-bench does?

GodelNumbering · 2026-04-27T15:13:32 1777302812

I really wish there was! I even thought of creating one, but it would be a conflict of interest.

alfiedotwtf · 2026-04-28T06:52:26 1777359146

For my local tests the past few months on the same local model, I’ve found Claude Code to be way better than OpenCode, and OpenCode to be better than Codex.