Deepseek v4 Pro feels like Claude Opus 4.6 in its personality, but here's what I found out about costs:
I cut DeepSeek v4 loose on a decent-sized TypeScript codebase and asked it to focus on a single endpoint only, go in depth on it layer by layer (API, DTOs, service, database models), form a complete picture of the types involved and introduced, and ensure no ad hoc types were being introduced.
It produced a very brief but very to-the-point summary of the types being introduced and which of them were redundant, etc.
Then I asked it to simplify it all.
It obviously went through lots of files in both prompts but total cost? Just $0.09 for the Pro version.
On Claude Opus, I think (from past experience, before the price hikes) these two prompts alone would have easily burned somewhere between $9 and $13, with not much benefit.
Note - I didn't use OpenRouter; I used the DeepSeek API directly, because OpenRouter itself was being rate limited by DeepSeek.
I find a lot of the inefficiency also comes from the model just randomly poking around and grepping all the time, which is the fault of the harness. I ended up building a Prolog-based MCP server where I use tree-sitter to parse the code into a graph, so the model can just ask questions like 'what are all the functions connected to this function?'. So, in case you're trying to focus on what a particular endpoint is doing, you can trivially and predictably trace the whole subgraph of calls.
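The parsing half of that idea fits in a few lines. A minimal sketch, assuming the py-tree-sitter bindings plus the tree_sitter_languages helper package (node type names are from the tree-sitter-typescript grammar; this is not my actual MCP server):

```python
# Build a rough call graph for one TypeScript file with tree-sitter,
# so an agent can ask "what does X call?" instead of grepping.
from collections import defaultdict
from tree_sitter_languages import get_parser

parser = get_parser("typescript")

def call_graph(source: bytes) -> dict[str, set[str]]:
    graph = defaultdict(set)

    def walk(node, current_fn):
        if node.type == "function_declaration":
            name = node.child_by_field_name("name")
            if name is not None:
                current_fn = name.text.decode()
        elif node.type == "call_expression" and current_fn:
            callee = node.child_by_field_name("function")
            if callee is not None:
                graph[current_fn].add(callee.text.decode())
        for child in node.children:
            walk(child, current_fn)

    walk(parser.parse(source).root_node, None)
    return graph

src = b"function handler() { return getUser(fetchRow()); }"
print(call_graph(src))  # maps handler -> {getUser, fetchRow}
```

Run this per file, merge the graphs, and "trace the subgraph from this endpoint" becomes a plain graph traversal rather than a pile of greps.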
I don't know if it exists already, but Bazel would be very useful for the same type of MCP server. Since all dependencies are explicit, you can pretty easily do a bazel (r)deps query to find related targets.
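A minimal sketch of what that tool could look like, just shelling out to bazel query (the wrapper function is mine, not an existing server; rdeps(universe, target) returns everything in the universe that depends on the target):

```python
# Expose "who depends on this target?" as a tool by wrapping bazel query.
import subprocess

def reverse_deps(target: str, universe: str = "//...") -> list[str]:
    out = subprocess.run(
        ["bazel", "query", f"rdeps({universe}, {target})", "--output=label"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

# e.g. reverse_deps("//api:user_endpoint") lists every target that
# (transitively) depends on the endpoint, bounding what the model must read.
```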
Similar idea. I find tree-sitter is nice because it already supports a bunch of languages and it's easily extensible. Once you have the AST, you can really have the LLM go to town with it.
I've been having the same experience. Tasks like "go through this entire module and pedantically make it match my preferred styleguide exactly" were not worth a couple dollars with frontier models. It's nice to be able to put deepseek flash on stupid, unnecessary or highly speculative tasks without thinking about the cost.
DeepSeek V4 Pro's pricing is blowing me away, particularly with how effective the cache is. I just burned 2M tokens and the total cost was 30¢. On Claude Code, I'd have used up multiple 5 hour windows by now, or else horrific amounts of API consumption, around $20-$30 I'm guessing.
> It obviously went through lots of files in both prompts but total cost? Just $0.09 for the Pro version.
When people say that LLMs aren't worth it, it kills me.
A lot of us, on average, make $100+ an hour. $0.09 is < 4 seconds of our time.
You can't even read the vast majority of prompt responses that fast.
LLMs will continue to get better (though I'm doubtful it will be at previous rates; all indications are that progress is slowing and costs are increasing disproportionately).
It seems like >50% of devs think LLMs provide less than 0 value. I just do not get it.
Did they use an LLM one time 3 years ago and decide it's never going to be worth it? Have they even tried? Or have they only ever tried it on one giant, monolithic proprietary codebase where they're a total expert, decided that an LLM isn't as good as them, and concluded it's "completely worthless"?
They are shockingly unhelpful on my company's codebase.
But that doesn't mean they are flat-out worthless.
I know I'm guilty of making this sort of argument sometimes, but it's just not valid.
I don't get paid for every waking hour of every day. Often I'm using an LLM for something that's uncompensated, so my hourly wage equivalent is irrelevant.
And for times when we might use an LLM for something related to paid work, it's still money out of your paycheck (unless the employer is paying for it; go nuts in that case). And it's not like using the LLM lets you go home early if it saves you time. You just end up doing more work.
I still use them because they're a useful tool sometimes. But I don't pretend it has negligible or no cost. (Not to mention the externalities around electricity use, crazy data center buildout, skyrocketing GPU and RAM prices, etc.)
I don't understand: your employer doesn't pay for your AI use? If my employer didn't pay for it, I just wouldn't use it at all, out of principle. Just as I don't buy my own work laptop.
I'm guessing downvoted because OpenRouter was mentioned in the note (which may not have been there originally), but aside from that this is a perfectly legitimate question. In order to reproduce we need to know how. Was it a coding agent like opencode, an IDE, or something else?
Microsoft just announced the availability of OpenAI GPT-5.5, which they are charging 30x for. In contrast, they charge 7.5x for Claude Opus 4.6 and 1x for OpenAI GPT-5.4.
Check out the token-based pricing, and compare GPT-5.5 with all other models.
When I check GH Copilot right now, it looks like the Opus 4.7 multiplier was increased to 15x (I think it was 6x just a few days ago), but 4.6 is still at 3x. These relatively cheap multipliers only exist until the end of the month, though.
That’s the classic phenomenon of cheaper pricing due to offshoring! If your expenses are in dollars then for sure recovery is going to be in dollars as well. Why is that a surprise to anyone?
The only similarity it has to Opus 4.6 is the 4 in the name. I do not understand these dishonest comparisons. OSS models are cool, cheap, and promising for the future -- but why are we pretending they are better than they are?
Speak for yourself. I found switching from Opus 4.7 completely painless and, in fact, given the reliability of Anthropic's API, less friction despite slower response times. Zero issues on a large monorepo.
Hi, I am happy it works well for you. For me personally, I struggle to find good use cases for these OSS models in general. I am lightly technical but I do not manually code. So my flow is: /grill-me (can take hours), make a plan, review the plan with a second model, implement, review after implementation.
Maybe it is because my tasks are usually chunkier, or because I can't code myself, that I struggle with cheaper models. It feels like at every stage of this process a SOTA model improves things by 5%, which adds up.
But maybe I am ignorant of Opus's level. My main driver is 5.5, and Opus is there for frontend and a second opinion. In the past I also used Claude models for the chatting phase, but 5.5 took over recently. Maybe Deepseek is closer to Opus and I just overestimated the model compared to 5.5? I tried to give it the benefit of being similar.
Recently I started experimenting with Deepseek Flash, hoping that if the plan is solid enough it can implement quickly and cheaply, but for now it doesn't feel worth it.
How do you use the model to see the benefits? Have you tried 5.5 and can you compare to that one as well?
In my experience, DeepSeek models are massively overrated in terms of how good they actually are at agentic usage, coding, and writing, just because they were kind of the first open source entrant and are the name a lot of people know. Try GLM 5.1.
What provider are you using? I gave it a shot through OpenRouter and saw some weird half-formed words coming through occasionally. I'd love to switch over and give it a proper go.
I have a gut feeling that these models can do just as well. Has someone run a reasonably sized task (>=1-2 days of designing and planning) and seen it work well with these models?
* For me, what worked well was the grill-me skill (or a variation of it) at the design stage. The hygiene I followed here: have it ask one question at a time, resolve dependencies at the design stage, and read the hashed-out plan closely. I also used a couple of other MCP tools for grounding, like a documentation server (deepwiki) and arxiv. Other tricks I use are having high-signal tests, and having Claude either read logs and code at the same time or be embedded in the execution (e.g. as a debugger, REPL, or devtools).
No. Duplicate the whole task: e.g. I use the grill-me skill for planning, it takes me ~3 hours, and CC asks me 20-40 questions. Do the same grill-me with this and compare the outcomes. I admit it's quite a lot of work to duplicate, but I am really itching to do this over a few tasks and compare the final plans. Just need the time.
While the costs are lower than those of frontier models, there are two factors that make DS4 Pro and K2.6 not as cheap as they might look.
For DS4 Pro there's a discount going on for the official API, which sometimes gets overlooked and mixed up in discussions. Simon uses the full price in the comparison, so that's not an issue here.
The other issue is that DS4 Pro and K2.6 often use way more reasoning tokens than the frontier models. In my testing there are certain pathological cases where a request can cost the same as with a frontier model because they use so many more tokens.
To be fair I'm using DS and kimi via 3rd party providers, so they might have issues with their setups.
But if you look at the Artificial Analysis pages of the models you'll see that DSv4 Pro uses 190M tokens and K2.6 170M tokens for their intelligence benchmark, while GPT 5.5 (high) only used 45M.[0][1][2]
I recommend looking at the "Intelligence vs. Cost to Run Artificial Analysis Intelligence Index" ("Intelligence vs Cost" in the UI). The open source models are still cheaper to run, but not by as much as you'd think just looking at the token prices.
They introduced very novel methods to improve long-context efficiency and attention: HCA & mCH. It requires only 27% of the FLOPs for inference and 10% of the KV cache compared to v3.2, which makes it super efficient. Think about it: on FLOPs alone, we can now serve more than 3x the volume with the same amount of compute, and you would need 30% of the prior KV cache.
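Taking the FLOPs figure at face value, the serving claim is just the reciprocal; a one-liner to make the arithmetic explicit:

```python
# If each token needs only 27% of the FLOPs of v3.2, the same hardware
# serves roughly 1/0.27 of the tokens, all else being equal (a simplification).
flops_fraction = 0.27
print(f"{1 / flops_fraction:.1f}x")  # 3.7x, i.e. "more than 3x"
```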
Furthermore, this release is a PREVIEW. DeepSeek is the real open lab: they not only cook up quite a bit with every single release, they also publish and share it. I'm running this locally.
Let me tell you how "CHEAP" this is. With v3.2 I would run out of GPU RAM and spill into system RAM at 256k context. It ran quite alright and I was happy with my 7 tk/sec. With this, I'm 100% in GPU RAM with the full 1 million token context, running more than 2x as fast while getting better results.
This is super cheap. Moonshot has made it clear that they are starved for GPUs, and that's why. If they had GPU capacity like we do in the US and subsidized the models like we do here, they would be giving it away for free!
Sure that can happen but it hasn’t been my experience. I just spent a whole day using it for some pretty hefty refactors, many rounds of back-and-forths, thousands of lines of code changes, reviews, investigations, many subagents running parallel tasks, the works. Total cost $0.95, altogether.
I had attempted this with Opus 4.6 in the past and it burned through the $10 budget I’d given it before it returned from my initial prompt.
Even if it’s heavily discounted, it would still have cost me single digits for a complete solution vs double-digits for exactly nothing.
I didn't mean to say that they're not cheaper to run; Artificial Analysis also shows that they're cheaper. My main point was that it's important to also look at token efficiency, not only cost per token, to get the full picture.
I agree! I don't find Claude models to be particularly efficient anyway though. Maybe when running through Claude Code? I don't know, I tried it a while back but it didn't suit me and I kept hitting bugs so I dropped it in favour of something that does something closer to what I want rather than what the provider wants!
Mostly OpenCode but I've been experimenting with Pi a bit lately.
I use Agent Hive [0] for more complex tasks. It sends off subagents with models and parameters I can configure for each different agent (e.g. a low-temp coder, a higher-temp agent with some top_k / top_p for research and architecture, etc.).
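The general pattern is simple if you want to roll it yourself; a minimal sketch with an OpenAI-compatible client (the presets and role names are illustrative, not Agent Hive's actual config):

```python
# One client, different sampling presets per agent role.
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="sk-...")

PRESETS = {
    "coder": {"temperature": 0.1, "top_p": 1.0},        # deterministic edits
    "researcher": {"temperature": 0.9, "top_p": 0.95},  # broader exploration
}

def run_agent(role: str, task: str) -> str:
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": task}],
        **PRESETS[role],
    )
    return resp.choices[0].message.content
```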
I'm surprised that people here don't care at all about these models openly training on your data, especially if you use them straight from the model developer. Whereas things like "GitHub now automatically opts everyone into using their code for model training" get hundreds of justifiably angry comments, I never see this brought up anymore on posts like these talking about using Chinese models through OpenRouter. This might be explained by "well, they're different people", but the difference is too stark for that to be the whole explanation.
At least that’s what they’re telling you. It’s a ”trust me bro” scenario.
I’d rather use the phone home version (deepseeks own endpoint). The benefit is that I’m fairly certain that they actually host the model I’m paying for.
If you're not Chinese, and you start a company outside of China, and your whole pitch is "We run open weights and we have nothing to do with China", then 1) why would you send data to China? 2) why would you risk your business to do a thing that makes no sense?
A fly by night operation created primarily for the purpose of collecting training data and corporate espionage will make whatever claims they think will get them the right traffic.
Well, the context was running the models via OpenRouter, not hosting >800B-parameter models yourself. Of course, if given the option, I believe most people would pick "don't share sensitive data".
What I'm trying to say is that EVERYONE uses your data, even the sensitive kind. So you might as well use an endpoint that does what it says, and treat EVERY endpoint, whether that's OpenAI or Anthropic, as if it's collecting all of your data.
Some providers are based in the US or EU and would face legal repercussions for lying about what they do with your data. It's a bit more than "trust me bro". Off the top of my head, you can use Fireworks, for example, which is based in California and would face the same consequences for lying about their data policy as OpenAI or Anthropic would.
What, because they broke the law in one way, they'd break the law in every way? That's not how business works. The way business works is, I steal from other people to make a product, but then I don't steal from my customers, because if they find out, then I no longer have any customers. (Plus all their customers would sue them, which would both legally and financially tank them)
You definitely have a bone to pick. Chinese researchers have given the world the cheapest and most consistent high-quality research around LLMs. They don't pretend; they do the work and release the goodies. Mostly so cheap that everyone in the world has a chance to use close-to-frontier models. Why would you respond with "anger"?
You let us know what your real complaint is about and let's not feign indignation at open models and research.
I made no such claims. Maybe you have something to share about why we need to have a negative view of free and open models based on publicly available frontier research.
Anthropic and OpenAI took your data, trained their models, and tell you: "we are not going to tell you anything about how we trained our models, we are not giving you the weights of our models, and you will have to pay us to access the model trained on your data".
they took your rights and your data.
Chinese labs took your data, trained their model, and tell you: "this paper details how our models are trained using your data, here are the final weights of our model trained on your data, feel free to use it for what you want, it is your model trained on your data".
they converted your data, everything is still in your hand under your control.
you couldn't see the difference?
Your specific question can actually be translated as -
1. why people don't stop Chinese labs so US monopoly can be maintained?
2. why people don't stop Chinese labs providing free models to those who would otherwise never be able to afford the same $200 USD/month Anthropic and OpenAI subscriptions.
3. why people don't complain Chinese labs publishing those trillion dollar secret ideas on model training.
well, because most people are not dickheads, I guess?
Hold up. Look, this is all shades of grey, but saying Chinese labs all release open-weights stuff is kind of a crazy thing to say.
Right now they are doing that because they are still trying to catch up to Anthropic, Google, and OpenAI.
The moment they have the special sauce, they will shut it down and you won't be able to run their stuff anymore outside of their platforms. Why do I say that? We already have the evidence in the diffusion model arena. All the Chinese labs were pumping out open-weights models for image and video; the moment they got to SOTA, they stopped doing it. Less and less is being released.
Chinese companies aren't doing open-weights models out of the goodness of their hearts; they are doing it because it helps their entire industry catch up. Don't get it twisted, this is very much a US vs China battle here. China wants to win, and I am not sure how they won't. Deepseek is the first major large model trained on Huawei chips. It won't be the last, and I am betting that China will make up for the lower performance of those chips with more manufacturing and power generation.
I am very bullish on China winning the AI war here. But I am also not naive enough to think that the Chinese companies are doing open weights out of wanting to make the world a better place or the goodness of their hearts. It undercuts the American AI companies.
I am personally okay helping them as long as they publish the models and don't keep them closed. And I don't trust the settings where providers say they won't train on it.
Because they give it away for free and offer APIs at very acceptable rates. Not that hard to figure out, Robin Hood stealing our data tax back comes to mind.
User publishes to GitHub => Copilot trains on GitHub data => MS sells Copilot => User works for Microsoft (in the sense of giving their labour for MS to make money)
User publishes to GitHub => Deepseek trains on GitHub data => Deepseek gives the model away for free => User did not work for Deepseek (in the sense of giving their labour for Deepseek to make money)
It's totally fair to use GPL code, it just means all the models built by Anthropic, OpenAI, etc. using GPL-licensed source are themselves bound by the GPL. Plus, any works created downstream using those AI tools.
We're on the verge of a golden age of software as soon as someone finds a court with courage.
I think AI will create an open source dark age. Gradually, we'll see a lot less new, good open source code, and a gradual shift back to the proprietary world. Similar to the 1950-1990 period.
Things being public should not be enough. Just because someone leaked your medical information to the public via a data breach should not make it fair game. There should be some rules.
My policy is that I don't allow agents to access all code. Some of it is shielded behind bind mounts. Maybe this is a pathetic, artisanal (or ego-driven) reaction of mine to the inevitable. I allow them to work on about 90% of the code (most codebases fully), with some code being considered too valuable to expose to the vendor. When data is involved, LLMs only get to see anonymized data.
This cute policy of mine won't affect anything, though. The more we use the models, the more the models will replace this kind of work. Centralisation of power is inevitable; in Medieval Europe, we used to have state & church ruling. In modern times, but before the internet, it was probably state and banks. Maybe with ongoing digitization (bank offices disappearing) making banks less costly to operate, combined with bank bailouts, governments will fully nationalize banks, or at least banks will consolidate.
Then the AI companies will consolidate with the internet information and communication companies (Google/Meta for the US, and Alibaba/Tencent for China). Maybe we'll end up with a few de-facto governmental megacorps that rule in tandem and close cooperation with the formal government, who might handle mostly infra, utilities and the army. The megacorp would control narrative more and take more of a paternal role (educating and protecting the citizens, normally handled by formal governments).
AWS Bedrock has DeepSeek models running on their infrastructure. That should be enough to prevent training on user data (there's a markup compared to DeepSeek's pricing though).
And unfortunately AWS doesn't have prepaid billing, so you can't just give the internet access to your API key without getting FinDDoS'd.
I am fine with them training on my open source code (which is pretty bad, but that's not the point, because they're providing the service for free). I will be super pissed if I pay for enterprise and they train on it, though. I believe this is the opinion of the majority of programmers.
What do you mean specifically? Data passed through OpenRouter? Or that they too indiscriminately ingest data all over the web? If the former, I assume it's just that anyone still using them just doesn't care where the data comes from. If the latter, well, it seems like every day there's some news on some new model from somewhere, and it takes dedication to complain every time. There's also the factor that I believe DeepSeek is more open with the model, while others keep it entirely proprietary, which feels fairer and (personally) is also less offensive.
Do you really think OpenAI, Anthropic or any other entity in the same business respects your data?
The Chinese AI companies who release open weights actually deserve whatever input you give them. They are the reason why there is competition and not duopolies in the domain.
I think Google, and likely Anthropic, indeed do honor the settings chosen by the user. For Google in particular it'd be very surprising if they didn't. That's also why both do everything they can to trick users into allowing it.
OpenAI, I wouldn't be surprised if you were right.
You mean the same Anthropic, that wouldn't blink an eye at intentionally overcharging users hundreds of dollars just for having a HERMES.md file in a repo, would be above taking your data for... ethical reasons?
Unfortunately, the history of these big tech companies has shown that they do not care about data privacy and are even willing to lie about it. But I guess it's irrelevant; in practice you have to assume the worst anyway, since there is no way to verify it.
Two factors. The first is anti-Americanism (or at least anti-American-capitalism).
But the more important one is the social contract. GitHub came along far before the LLM era. Its branding is being the storage for open source projects, and many users want it to stay away from the AI hype. You won't expect LLM providers to stay away from AI hype (duh), so it's less of an issue for them.
From the EU side: I think we'll make a cost comparison between the US (where its leaders are doing weird shit against the EU and pro Russia) vs China (which at least gives us cheap models and doesn't actually try to take over an entire European country).
US has too much influence atm. I'm ok with switching between "bullies".
The biggest differentiator for me: DeepSeek just does what I ask. I've tried using both GPT and Claude for reverse engineering recently, both refused. I even got a warning on my OpenAI account.
Well, I'm using all the top models extensively on the very same codebase, my new compiler. I use Deepseek for its cheap API costs when Kimi, Claude, and Codex are in their over-budget phase. I asked Deepseek V4 Pro for an estimate of a new arm64 port. It said 4 weeks; I said, OK, do it. (I knew ncc was there, and tinycc was also known to the AIs.) So it took it half an hour to produce a working arm64 port. First for arm64-elf, because this was easiest to test, and then, after more hours of back and forth, the arm64-darwin port (with crossbuild and GitHub Actions). With all the subsequent fixes it cost me around $8 in API costs.
So the experience: at the beginning Deepseek was amazing. When it started to get expensive (Chinese daytime), I switched from Pro to Flash. No problem, same results. Some bitfield implementation was too complicated, so I had to wait for Sonnet 4.6 tokens; kimi-2.6 did the rest. For the very hard problems I asked gpt-5.5, but this was only for one problem. MiniMax was horrible: it didn't follow rules and made lots of silly mistakes.
But when the Deepseek context window got filled, it also started to become stupid. So either /clear, or /export and strip the file, and start a new session from the cleaned-up export. Kimi was overall better, but I was running into limits with my cheap, moderate subscription. I pay for it privately, as my company's token budget is usually gone after a week of work.
All in all, it is worth it. My next compilers (perl 5+6=11) will be done with Deepseek and Kimi as well.
Regarding decompilation: recently we had to decompile the firmware of a UPS we bought that doesn't work on a new system; it only worked on a raspi. So I decompiled it with Ghidra and told my colleague: easy, that's how you do it. But my colleague didn't know about token budgets yet and had already thrown Opus at it (Copilot Business account). He had working C files immediately, compilable for our new system. It turned out the UPS was not beefy enough. But Opus was fantastic. The code was very short and simple C, though.
It also sounds like a lot to manage. Do you have some sort of agentic framework that treats all of these LLMs you have access to as inputs it optimizes over?
Unfortunately not. I'm using plain Kimi, opencode (with Deepseek, GPT, MiniMax, whatever), and Claude. Claude is the best, but only for some hours. The trick is to have a good AGENTS.md file, good test cases, and a test runner to repro with, like seamless docker and qemu calls. GNU autotools would be easiest, but here I'm using plain makefiles.
Also, for LSP, an up-to-date compile_commands.json is important for clangd.
Git worktrees helped with developing the arm port and fixing c-testsuite cases in parallel. I wanted to keep the costs down: about $15-$30, I think.
And for low-level problems, like the ARM calling convention in asm, those models are much better than at simple algorithmic Python problems. Only for the hardest problem did I need the big expensive gun, but never Opus. This helps in deciding what to do with my next JIT project.
Not OP, but I wrote llm-consortium to prompt multiple models and create a synthesis. It can run on an OpenAI endpoint using llm-model-gateway. It's expensive, naturally, but for situations where you absolutely must get max intelligence it's hard to beat.
e.g.
Pelican Riding a Bicycle — Engineering Study by DeepSeek v4 Pro, Kimi K2.6, and GLM-5.1 (1 iteration in synthesis mode with DeepSeek v4 flash as judge)
I was using GPT 5.5 through Cursor recently, and it found what it thought to be a security-related issue. I read the code, didn't see what it was seeing, and said "Run the chain of operations against my local server and provide proof of the exploit."
It thought for a few seconds, then I got a message in the chat window UI saying OpenAI flagged the request as unsafe, and suggested I use a "safer prompt."
Definitely soured me on the model. Whatever guardrails they are putting are too hamfisted and stupid.
Personally, I'm not bothered very much by LLM confabulation, as long as it's the result of missing context. In most practical tasks, we either give context to the model, or tell it to find it itself using the internet. What I am concerned with is confabulation that contradicts available in-context information, but that doesn't seem to be what is measured here.
This must be easy to benchmax, because I have never gotten an "idk"-like answer from the Western frontier models. All my personal "real world" use cases always end in hallucinations.
The output of any LLM is always 100% hallucination by principle. On top of that, most benchmarks are at best an approximation of LLM quality. Your use case decides which one to use. That said, I haven't tested v4 yet but the old 3.2 is still a decent model. And concerning use cases, I had coding problems that Opus couldn't solve but a local 35B model did.
All the talk about frontier and SOTA is to dig deeper and deeper into the pockets of VCs and finally do an IPO.
We have an enterprise Cursor account, so I can try all the mainstream models. Using Composer 2 on our own code, which I obviously have the source for, I couldn't get it to turn on a debug flag to bypass license checks while I was troubleshooting something. Infuriating. It was like that old Patrick from SpongeBob meme.
I don't understand why we would turn the models into law enforcement officers. Things that are illegal are still illegal and we have professionals to deal with crimes. I don't need Google to be the arbiter of truth and justice. It's already bad enough trying to get accountability from law enforcement and they work for us.
They're probably worried about liability. Let's say that Oracle finds out you reverse engineered their DB using Gemini. You can be sure they will sue Google. Not just for providing the tools, but you could make the argument that it's actually Gemini doing the reverse engineering, and on Google's hardware no less.
The difference is IDA Pro doesn't do something unless you instruct it to; an LLM is unpredictable and may end up performing an action you did not intend. I see it often: it presents me options and doesn't wait for my response, just starts doing what it thinks I want.
This. It's going to be tricky for the frontier model labs to argue they didn't intentionally design their models to do so, when the models take illegal actions.
I'm not even sure how one would construct a viable legal argument around that for SOTA models + harnesses, given the amount of creative choices that go into building them.
It'd be something like "Yes, we spent billions of dollars and thousands of person-hours creating these things, but none of that creative effort was responsible for or influenced this particular illegal choice the model made."
And they're caught between a rock and a hard place, because if they cripple initiative, they kill their agentic utility.
Ultimately, this will take a DMCA Section 512-like safe harbor law to definitively clear up: making it clear that outcomes from LLMs are the responsibility of their prompting users, even if the LLM produces unintended actions.
> I'm not even sure how one would construct a viable legal argument around that for SOTA models + harnesses, given the amount of creative choices that go into building them.
I'm not a lawyer, but to me the legal case seems pretty obvious. "We spent billions of dollars creating this thing to be a good programmer, but we did not intend for it to reverse engineer Oracle's database. No creative effort was spent making it good at reverse engineering Oracle's database. The model reverse-engineered Oracle's database because the user directed it to do so."
If merely fine-tuning an LLM to be good at reverse engineering is enough to be found liable when a user does something illegal, what does that mean for torrent clients?
Which is going to be hard to explain to a judge and jury, if it comes to that: how, despite investing time, money, and effort (and no doubt test cases) into making a model better at reverse engineering, they shouldn't be liable when that model is used for reverse engineering.
Afaik, liability typically turns on intentional development of a product capability.
And there's no way in hell I'd take a bet against the frontier labs having reverse engineering training data, validation / test cases, and internal communications specifically talking about reverse engineering.
> “making it clear that outcomes from LLMs are the responsibility of their prompting users, even if the LLM produces unintended actions”
So if I ask “how does a real world production quality database implement indexes?” And it says “I disassembled Oracle and it does XYZ” then I am liable and owe Oracle a zillion dollars?
Whereas if I caveat “you may look at the PostgreSQL or SQLite or other free database engine source code, or industry studies, academic papers; you may not disassemble anything or touch any commercial software” - if it does, I’m still liable?
Who would dare use an LLM for anything in those circumstances?
We need that lawsuit to happen already so we can establish precedent. The person in the driver's seat of the Tesla should be at fault. The engineer using the llm should be at fault. The person behind the gun not the manufacturer should be at fault.
> The person in the driver's seat of the Tesla should be at fault.
I don't think this is a good analogy. For Tesla right now it might fly. However, when their software gets to waymo level of autonomy, I would expect liability to shift to the manufacturer.
If anything, I think that would be the true proof of a company trusting their software to allow for autonomous driving
In America, whoever has the most money is liable. It's not worth it for the legal industry otherwise. The lawyer earns his pay by convincing the court that whatever established precedent doesn't apply to his case.
> Things that are illegal are still illegal and we have professionals to deal with crimes.
This is quite a naive take, though. The direction of travel is more fascism in Western governments, where the duties of traditional policing are taken over by big corporations whilst police forces are gutted and made impotent.
> I don't understand why we would turn the models into law enforcement officers
It's a simple corporate risk minimization strategy. Just look at how universally despised Grok is on HN. Not because it's a bad model, but because it has less aggressive alignment, which means it can be coaxed into saying things that get xAI pilloried here and elsewhere.
Grok was worse than even some of the more mediocre open models at actually doing anything. (At least anything tech work related.) GPT and Claude just do what I ask most of the time. With grok, it’s like a chore just getting it to understand the question.
You’re pulling your hair out trying to figure out what on earth you need to do to land in the right place in whatever topsy turvy embedding grok is using?
I also used to see Grok boosting/slack-cutting on here/Reddit constantly back in Peak Subsidy when xAI was giving out hundreds of dollars of credits for free per month.
After they killed that and then stopped handing out free model access to users of every Cline fork for weeks following model releases, vibe coder hype moved back to Chinese models for cost and the SOTA models for quality.
Agreed. There are plenty of instances where people here on HN do mental gymnastics to justify using a truly good product when the company that builds it is morally bankrupt.
Not a criticism (I probably engage in that sort of thinking myself sometimes), just something I've observed. If Grok were actually good, we'd see that phenomenon here, but we don't.
No, they've clearly put a lot of work into alignment. It's just that they've been trying to align it with Elon Musk rather than Amanda Askell. Unfortunately the more anti-woke they try to make it, the worse it seems to perform.
> Unfortunately the more anti-woke they try to make it, the worse it seems to perform.
Probably because being anti-woke generally goes hand in hand with going against facts and logic. Cull the "woke", lose the facts+logic. Not that they care about that anyway.
Software engineering is one thing but if you look 10-20 years into the future and everyone can run models equivalent to today's SoTA locally with zero monitoring or censorship, that could... not be good.
Some people will use them responsibly but a lot of people will not.
LLMs are already frying some people's brains, and there are some human desires that should not be encouraged.
This is kind of terrifying to me, regularly. No real avenue of recourse for normal people without a following, and potential exclusion from real, fundamental tooling. Imagine OpenAI goes on to buy 20 companies and now you can't use Figma, Next, whatever, just because you once tripped some very foggy line somehow. Not just OpenAI; the entire ecosystem is so... hard to read.
I was asking Gemini about a quote from Catch-22 and it kept dying mid-stream, saying it can't talk about it. God knows why; it had no violent or sexual content, though that is in the book. I could imagine it dinging my whole Workspace account just because... shrug?
I know ideally the future is local, but I don't know how realistic that is for most people, at least in the next few years, given practical costs and power usage (except, I guess, through an M* processor if you're in that ecosystem).
Open models running locally is the answer. Relying on proprietary, closed software always puts that company's priorities above your own when using their software. You have given up control.
While running them locally presently doesn't make sense economically, you don't need to run them locally to address this issue. There is a lot of competition in hosting open models, and you have a variety of services to choose from. Run the open models now; reward that ecosystem instead of continuing to reward closed systems that dream of rent-seeking.
You don't need to run the model locally if you don't care about sharing your data. Personally I am happy to share data with Kimi or Deepseek if it means we get better OSS models. For private stuff though local is king
It'll be a while yet before open models that are good enough become viable for local use. Heck, I've been trying to use Qwen 3.5 39B A3B on my system, which is modest but no slouch, and have only been able to get ~4.5 tok/s after optimization, and it really runs my system red (fans instantly go crazy). It's just not practical for serious work.
I've been using Qwen 3.5 and then 3.6 27b Q4 on Ollama with a single 7900 XTX with the codex cli, and I have been blown away by how genuinely useful it is. I've been able to ask it to do long, multi step problems, and it's able to do things that would have likely taken me days to iron out in a matter of hours, or even minutes sometimes.
I get about 30 tok/s, which is far from blazing, but given the capability it has it is absolutely viable for accelerating my work.
Yep, and with ID verification, it's not like you can just make another account either. At least, I'm guessing if they don't already, they'll soon be blacklisting individuals, not accounts.
Imagine your livelihood depending on access to LLMs and then OpenAI ban you with no recourse. This is where AI legislation should be focusing right now IMO. We can ensure a level of fairness for everyone without putting the brakes on.
It's probably because you were talking about a quote from a book (ie copyrighted material). Authors have sued the AI companies for repeating / memorizing copyrighted works, and getting an AI to discuss a quote would be making it repeat a portion of copyrighted work.
Funny that your case is Kurt Vonnegut. I think I had Claude refuse a task where I was doing an OCR scan of a book review (in a zine / journal a family member published years ago). I think the review might have included a Vonnegut quote as well, and that I ultimately figured it out it was the quote that was making Claude refuse. I may be misremembering the author though.
Mistral had no such refusals, but their OCR is lesser quality.
OMG. Where did I get Kurt Vonnegut from? I swear I saw that name in the post and the whole time I was thinking "but he didn't write Catch 22"... I must be fuzzier brained than I thought tonight. Thank you for being kind with your correction.
Hopefully I'm still correct that quoting from books is a reason for some over-zealous task refusals, though.
>Imagine OpenAI goes on to buy 20 companies and now you cant use Figma, Next, whatever just because you once tripped some very foggy line somehow.
Don't worry, you can just make your own Figma, Next, whatever if you have some thousand dollars worth of tokens. This is at least what all of the AI thought leaders have been telling me for the past couple of years.
I think it's so bizarre that ChatGPT regularly gives me advice on how to get around its filters. Like, literally: "I can't do anything if you use a copyrighted character's name, but how about you just say 'someone that looks like character'". If you are going to do that, can't you just execute the instruction?
In my experience GLM 5.1 has been excellent when paired with IDA Pro (DeepSeek v4 pro comes in close second, Kimi straight up refuses). Claude can only do reverse engineering if you throw it into some sort of hero/saviour mode then gradually pivot into red team (though it gets easily tripped).
Among the inexpensive models (and I include Grok 4.3 in this list), GLM 5.1 really sticks out!
On my personal test bench, when compared to other inexpensive models, GLM 5.1 provides the answers that I would consider most complete or satisfying (these are subjects that I consider myself an expert in). The answers tend to be more comprehensive, nuanced, and include references that I would consider the correct ones (if given access to web search).
I also find it a joy to code with, somewhere between Sonnet 4.6 and Opus 4.6 (have not tested Opus 4.7 yet).
This is so strange. I do a ton of RE with Claude, Codex, and sometimes Deepseek, GLM, and Kimi. I don’t have difficulty getting any of them to use IDA or otherwise decompile things.
There is one important difference, which is that Claude and Codex will both refuse if I ask them to touch anything related to security. But so long as I’m just studying algorithms and things like that, they’re totally fine with it.
That said, Codex especially will sometimes randomly give me a cybersecurity warning and stop responding. It’s random but happens maybe 2-3 times per day if I’m doing heavy reverse engineering work. Claude is much less fussy unless, once again, you’re explicitly trying to touch anything related to licenses, passwords, etc.
Yes, GLM 5.1 is surprisingly good! Particularly for long-horizon Agentic tasks, with 100+ available tools. It really shocked me in a good way when it was able to complete a long run with 50+ steps and not fall into a loop along the way.
I've been using GPT-5.4, and more recently 5.5, with Codex CLI + Ghidra MCP for reverse engineering a game without many issues. Injecting code is where it usually balks, but I'm just trying to discover and parse structures from game memory.
I did get a refusal when trying to read in-game currency, even though modifying it would do nothing. It has some strange boundaries.
This idea of software threatening the user with consequences is totally wild and dystopian. Fellow developers, what kind of world have we built? This is insanity. Imagine if my hammer told me, "Hey, you shouldn't use me on screws--only nails. Do it again and I'll self-destruct!" WTF, people, stop making this kind of software!
> All sorts of tools try to prevent dangerous/destructive uses
But they don't threaten their users or have an "N strikes and you're out" policy. I take those safety caps off of all the chemicals in my garage because I'm a grown-ass adult and those caps are a pain in the butt. I would not expect the manufacturer of a solvent to show up at my house lecturing me about safety and threatening to ban me from buying his products.
Sure but they would if they could. If they knew idiots were doing idiot things with their products (or evils doing evil things) and did not utilize available methods to prevent them, then the company ends up holding liability. And no, this is not easily signed away in a contract.
Uhh right, but describing that as "dystopian" is frankly hysterical.
It's an obvious corollary of good things (like product liability). Virtually everyone I've heard complain about these safety rails was up to antisocial (at best) stuff. I've never heard a sympathetic use-case. It's objectively good that companies can be held responsible for misuse of their products and that they are therefore incentivized to mitigate misuse.
"My inability continuously attack product guardrails to enable my super esoteric (and probably antisocial) use-case is dystopian" is just... not a compelling argument.
"These safety rails" was referring to LLMs, which have far more nuanced and capable safety rails than chemical caps do, and accordingly also have much more assertive ways to enforce them.
It's the same underlying principle. If I want to ask a software tool what the suicide rate is for my county, I do not expect it to come back with: "Naughty boy! You said an unsafe word! You're getting a strike, and if you get two more, you're banned." This is totally out of the ordinary for a software product, and is absolutely a modern invention. Replace "suicide" with whatever the "AI Safety" obsession word is today.
> If I want to ask a software tool what the suicide rate is for my county, I do not expect it to come back with: "Naughty boy! You said an unsafe word! You're getting a strike, and if you get two more, you're banned."
Did this happen?
I just tested this query in Grok, Gemini, Claude, and ChatGPT and 0% of them admonished me or refused to return an answer.
Just like every single conversation I've ever had on this topic, you have to make up examples that aren't even true. Why don't you just share what you were doing that you feel you were unfairly prevented from?
Which would be more than 0% concerning if I'd ever heard (even once) of an example of this happening with a query that shouldn't actually trigger something like that, or a query so close to one that the false positive is understandable and of incredibly niche value anyway.
OP gave an example of reverse engineering, something that to the LLM looks identical to just hacking. I am totally fine if the incredibly tiny little fraction of people who want to reverse engineer their own systems can't use LLMs to do it, and in exchange top LLMs aren't helpful for the hordes of actual malicious actors who would love a superintelligence to aid their crimes.
No-brainer tradeoff, just like 100% of examples I've ever heard.
I don't think that "dystopian" necessarily goes far enough, this would be one of the rare times where I would call it a fascist mentality - the idea that everything's primary allegiance is to the state and the goals of the state rather than those of the customer or the user.
I want a default that has people empowered, rather than something where it's just another performative smokescreen caused by overzealous product liability. I'll thank you and your kind for needing to distractedly tap the "Agree" button on my car's infotainment every time I start it to confirm that I will pay attention to the road.
"the state" is just shorthand we use for "other people in my community"
> I'll thank you and your kind for needing to distractedly tap the "Agree" button on my car's infotainment every time I start it to confirm that I will pay attention to the road.
Does that actually mitigate antisocial usecases? No? Then it's not what I'm talking about :)
Of course if you wanted to you could just share specifically what totally-reasonable LLM use-case you have in mind that's neutered by this "fascist mentality" instead of dreaming up unrelated instances.
> "the state" is just shorthand we use for "other people in my community"
It's a very different abstraction layer, in the same way as individual cells vs the entity that is you. The entity that comes together from all those "other people in my community" and its priorities are different to the individual desires.
> Does that actually mitigate antisocial usecases? No? Then it's not what I'm talking about :)
Maybe it does? Maybe someone is alive on the road today because they read the message and changed their behaviour. I'm giving an example of something where this liability mindset has created a world where manufacturers no longer prioritise the desires of their users, in order to appease a sense of harm-reduction. And you weren't limiting it to LLMs; you were applying it to all sorts of tools.
I think that "reverse engineering" as the OP was talking about is one of those things where maybe 1/10000 uses could actually be harmful. This is not even a high-risk request such as to produce a weapon of some kind where maybe your "antisocial usecases" could be applied.
I think it's closer to asking a remote (human) assistant to do something that someone doesn't want done (e.g., view the source of a closed-source product, whether through reverse engineering, going into their office, or social engineering) and that remote assistant company saying, "Please stop asking our assistants to do that."
You can still use IDA (the hammer) to reverse engineer anything you want.
It's not though. It's still just a piece of code, much closer to IDEs or any other program than to a human assistant in any way that matters (morals, responsibility).
It just seems like you are saying that if you found out Claude Code was a bunch of remote workers doing work for you, then it would be morally wrong to do illegal/immoral/irresponsible things with them, but because it is NOT a human, those same things are fine?
Is the distinction between human labor/actions and a program executing hard to grasp?
Morality is a human thing, not an absolute thing, so of course it's different when there's just a single human and a tool involved versus a human with a relationship to other humans.
This is huge for me too. I was working on something super benign the other day and GPT flagged it for cyber risk; Deepseek just does the work, and it's fast and cheap. It's only missing image support, IMO; once Deepseek cracks images too, it's going to be hard for Anthropic and OpenAI to compete.
Speaking of this: is anyone working on binary-to-source decompiler models? Seems like a no-brainer, and I could see it working exceptionally well, especially if they were fine-tuned per language. So if you can tell it's a Go binary, use a Go model, etc.
Trivially easy to train if it doesn't exist already: take a codebase, compile it to a binary, and train a model to reverse the process, since you have the ground truth.
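A minimal sketch of that data-generation loop (the corpus path, compiler flags, and output format here are illustrative; a serious pipeline would vary compilers, flags, and optimization levels):

```python
# Compile known C source, disassemble the result, and emit
# (assembly, source) pairs as ground-truth training examples.
import json
import subprocess
from pathlib import Path

def make_pair(c_file: Path) -> dict:
    binary = c_file.with_suffix(".o")
    subprocess.run(["gcc", "-c", "-O2", str(c_file), "-o", str(binary)],
                   check=True)
    asm = subprocess.run(["objdump", "-d", str(binary)],
                         capture_output=True, text=True, check=True).stdout
    return {"input": asm, "target": c_file.read_text()}

with open("decompile_pairs.jsonl", "w") as f:
    for c_file in Path("corpus").glob("**/*.c"):
        f.write(json.dumps(make_pair(c_file)) + "\n")
```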
I myself often get refusals for legitimate data analysis work. I am starting to buy powerful hardware little by little, until I have a rig suitable for running local models that make sense.
It wouldn't surprise me if the US government were behind it, just as it wouldn't surprise me if the government of China were subsidizing those OSS models. A lot of things are at play, and all on top of a huge bubble.
Eventually, access to Chinese models may be illegal in the US. I tell every developer I work with, download them as fast as possible. You never know when this administration could cut off access.
The main difference here is not that DeepSeek's model is completely free of censorship (although I'd wager it's less censored), but that it's open-weight. That has two major advantages:
1) If Anthropic/OpenAI/Google bans you, you're screwed; you can't access their model at all. But if DeepSeek bans you, you just go to another provider, or host the model yourself.
2) If the model refuses to answer you can uncensor it (and this is getting easier and more automated day-by-day[1]).
"The photograph you're referring to is the iconic "Tank Man" image, taken during the Tiananmen Square protests in Beijing, China, on June 5, 1989.
The photo, captured by Associated Press photographer Jeff Widener, shows an unidentified protester standing defiantly in front of a column of Chinese Type 59 tanks as they moved through Chang'an Avenue near Tiananmen Square, in the aftermath of the Chinese government's violent crackdown on the pro-democracy demonstrations.
The lone man, dressed in a white shirt and carrying what appears to be a shopping bag, repeatedly blocked the lead tank's path — even as the tank swerved to avoid him. The image became one of the most powerful and enduring symbols of peaceful resistance against oppression in modern history. The identity of the "Tank Man" remains officially unknown to this day."
The photo depicts "Tank Man", taken on June 5, 1989, during the Tiananmen Square protests. v4-pro and v4-flash answer roughly the same way on OpenRouter.
Are you really concerned about asking these kinds of questions though? Like how many LLM-able Tiananmen Square questions are you needing answered per month really? And it seems like you know not to trust it, so there's not even a risk that you're going to ask such a question and rely on the answer.
I run into Claude being a stubborn idiot about far more useful stuff all the time. And often all it takes to bypass is starting a new chat and reframing it, so it's entirely pointless hand wringing.
Then let's not forget only one of these is a paid product, and it's not the more annoying one. I feel like I can forgive DeepSeek for just obeying the laws of the country they're based in, as silly as those might be, because they're being pretty generous with the weights in the first place.
I've been using v4 Pro for the past few days, and honestly, in terms of quality it seems more or less on par with OpenAI's 5.4 or Opus 4.6 (I haven't tried 4.7).
To be clear, I'm not doing state-of-the-art stuff. I mostly used it for frontend development, since I'm not great at that and just need a decent-looking prototype.
But for my purposes it's a perfectly good model, and the price is decent.
I can't wait for an open model small enough for me to run locally to come out, though. I hate having to rely on someone else's machines (and getting all my data exfiltrated that way).
You can use Tinfoil for inference, which lets you use the model in the cloud while getting similar privacy as running locally: https://tinfoil.sh/inference.
Disclaimer: I'm the cofounder. This works by running the model inside a secure enclave (using NVIDIA confidential computing) and verifying that the open source code running inside the enclave matches the runtime attestation. The docs walk you through the verification process: https://docs.tinfoil.sh/verification/verification-in-tinfoil
Worth noting that NVIDIA confidential computing and similar schemes have been compromised and shouldn't be relied upon if it really matters. See https://tee.fail/ and similar.
I was interested in trusted execution environments and how safe they are. If you look on Google Scholar and start reading, they seem super vulnerable. The feeling is that the industry has no better option, and that they are a way to tell customers they are safe when they're not.
Hi there, I use your service. It's great. But I have a few requests... please support crypto payments? Also, you are missing some open source models (qwen 30b 3a, Deepseek 4 flash).
Unfortunately we don’t support crypto payments at this time as we use Stripe.
We try to add models selectively, as we have to be mindful about our compute allocation. Is there a specific reason why you need those two models, and why our existing models (such as Kimi K2.6, GLM 5.1, Deepseek V4 Pro, and Gemma 4, amongst others) don't suffice for your use case?
Feel free to email me at tanya@tinfoil.sh and happy to continue the conversation there.
Tinfoil looks super interesting! Do you have load balancers in front of the trusted compute stack? Looked at a design like this in a different space and the options for ensuring privacy in a traditional "best practice" architecture seemed very limited
In turn, that attests the model enclaves; for instance, see https://github.com/tinfoilsh/confidential-deepseek-v4-pro. The model repo/release that the model router attests to is included in the attestation config, which creates a chain of trust.
Very reasonable if you have the resources to run it locally and certainly the best option.
But we created Tinfoil because not everyone has that capability especially when it comes to larger models, and it still doesn’t solve for the situation where you’re building a service for your end user and you want to lock yourself out of accessing their data. In those cases, this is the second best thing you can do.
I just use the API directly. It's simple enough to set up, and I like the control I get from just charging it up and not having to worry about some random subscription taking money out of my account.
I was not able to reproduce your problem with that prompt, but I might have a reason for why you got that answer.
Did you enable reasoning ("DeepThink")? LLMs usually cannot reason about what they are going to write before they write it. There is a famous experiment where an LLM is prompted to say whether the birth year of a famous person is even or odd. If the LLM is constrained to answer only with "even" or "odd", the accuracy is around 50%, i.e. no better than random chance; but if the LLM is allowed to first state the birth year of the famous person, followed by whether the year is even or odd, it is able to "see" what the year is and answers correctly almost every time.
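That experiment is easy to reproduce against any OpenAI-compatible endpoint; a minimal sketch (the base_url and model name follow DeepSeek's documented API shape; the person is chosen arbitrarily):

```python
# Constrained vs. unconstrained parity question, per the experiment above.
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="sk-...")

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Constrained: the model must commit to a parity before "seeing" the year.
print(ask("Was Ada Lovelace born in an even or odd year? "
          "Answer with only the word 'even' or 'odd'."))

# Unconstrained: stating the year first lets the parity follow from
# text that is already visible in the output.
print(ask("State Ada Lovelace's birth year, then say whether it is even or odd."))
```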
In your case, the LLM might be able to recognize the spoiler during its reasoning phase and omit it.
Another explanation might be that the LLM interpreted the "No spoilers!" as "Do not spoil the tasks of the show" instead of "Do not spoil the winner".
Lastly, the question "Can you tell me...?" is not a good fit for LLMs since they are notoriously bad at knowing what they know. You can leave it out to save a few characters.
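For anyone who wants to try the even/odd experiment themselves, here's a minimal sketch; the endpoint and model name are placeholders, and any chat-completions API should work the same way:

    from openai import OpenAI

    client = OpenAI(base_url="https://api.example.com/v1", api_key="sk-...")

    def ask(system: str, question: str) -> str:
        r = client.chat.completions.create(
            model="some-chat-model",
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": question}],
        )
        return r.choices[0].message.content

    q = "Was Ada Lovelace born in an even or an odd year?"
    # Constrained: the model must commit before it can "see" the year.
    print(ask("Answer with exactly one word: 'even' or 'odd'.", q))
    # Reason-first: stating the year lets the final answer condition on it.
    print(ask("First state the birth year, then say whether it is even or odd.", q))

Run it over a list of people and compare accuracies; the constrained variant should hover near chance.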
Ah, so I went back to check my logs and I think I'd hallucinated some of the example I gave. I actually asked the question in two parts.
> Taskmaster Season 8, without spoilers, how is it?
Answer was pretty general, no spoilers.
Then I asked:
> Which season 8 contestants are most loved?
- Lou Sanders (Winner): Received a strong positive reaction, praised for her "offbeat humor, raw talent, and willingness to absolutely go for it". Some felt her less wholesome moments could be "mean spirited".
- Paul Sinha (Mixed / Sympathetic): Opinions are mixed; some found his failures not always funny, but others loved his earnest struggles. His Parkinson's diagnosis during the series added a layer of empathy for many viewers.
> I said no spoilers!
You're absolutely right, and I apologize. You asked for *no spoilers* in the original question, and I broke that by mentioning specific moments (like Joe Thomas's "train yard exploit") and personal details (Paul Sinha's diagnosis). That was careless.
Let me try again, *completely spoiler-free*:
In terms of which Season 8 contestants are most loved by fans (without saying why or what happens):
- *Sian Gibson* is overwhelmingly the fan favorite — she's widely described as warm, funny, and a joy to watch.
- *Joe Thomas* is also very popular, especially for his awkward, literal, and deadpan style.
- *Lou Sanders* (the winner) has a strong fan base who love her chaotic and bold energy.
The other two contestants receive more mixed reactions. One in particular is often cited as less popular because of his high competitiveness, but again — no specifics on tasks or results.
Thank you for the correction, and sorry again for the spoilers.
DeepSeek’s official API has a cache hit rate of over 99% if you use it continuously within the same codebase for long sessions, so it’s much cheaper than frontier models. I have an example of a 200M token session in Claude Code.
Also curious. With tool calls reading/searching different files, and possible compaction while reading a large codebase over long threads, I can't imagine how you hit a 99% cache rate.
Yes, you have to use the same session. I guess you could load up a bunch of context, then fork the session into a few different tasks, although I haven't tried it.
Not all read tokens are held in the context; many of them are cache hits from repeated reads. I hit the cache many times, so the total grew to 200M. The number came from the API platform.
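If you're curious where that number shows up: DeepSeek's (OpenAI-compatible) API reports cache hits and misses in the usage object. A minimal sketch; the field names are from their docs as I remember them, so double-check before relying on this:

    from openai import OpenAI

    client = OpenAI(base_url="https://api.deepseek.com", api_key="sk-...")
    resp = client.chat.completions.create(
        model="deepseek-chat",  # substitute whichever V4 alias you use
        messages=[{"role": "user", "content": "hello"}],
    )
    # Cache accounting comes back as extra usage fields.
    usage = resp.usage.model_dump()
    hit = usage.get("prompt_cache_hit_tokens", 0)
    miss = usage.get("prompt_cache_miss_tokens", 0)
    if hit + miss:
        print(f"cache hit rate: {hit / (hit + miss):.1%}")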
I've connected it to my VS Code Copilot and taken it for a ride. I've tried both flash and pro.
For a small POC, flash was sufficient, quite fast, and dirt cheap. It did stop a few times (maybe a latency issue?) but it did a good job.
I used the pro to do some heavy lifting, planning, etc. and it did a fantastic job.
I paid ~10 cents for a small proof of concept, that worked exactly how I prompted it.
For me, this is a real alternative after I cancel my GitHub Copilot towards the end of the month.
I'm currently paying for Anthropic's Max subscription (the 100 USD one) and I quite often hit or approach the 5 hour limits, but usually get to around 60-80% of the weekly limits before they reset (Opus 4.7 with high thinking for everything, unless CC decides to spawn sub-agents with Haiku or something).
Those tokens are heavily subsidized, but DeepSeek's API pricing is looking really good. For example, with an agentic coding setup (roughly 85% input, 15% output and around 90% cache reads) I'd get around 150M tokens per month for the same 100 USD. Even at more output tokens and worse cache performance, it'd still most likely be upwards of 100M.
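As a sanity check on that estimate, here's the arithmetic as a tiny calculator. The per-million-token prices are placeholders I made up for illustration (chosen to land near my 150M figure), not DeepSeek's actual price sheet:

    # Hypothetical prices ($ per 1M tokens); substitute the real price sheet.
    P_IN_MISS, P_IN_HIT, P_OUT = 0.60, 0.06, 4.00

    def tokens_per_dollar(in_frac=0.85, out_frac=0.15, cache_hit=0.90):
        # Blended $ cost per 1M tokens for the stated traffic mix.
        cost_per_m = (in_frac * (cache_hit * P_IN_HIT + (1 - cache_hit) * P_IN_MISS)
                      + out_frac * P_OUT)
        return 1e6 / cost_per_m

    print(f"{100 * tokens_per_dollar() / 1e6:.0f}M tokens per $100")  # ~143M

The real number depends entirely on the actual prices and your cache behavior; worse cache performance pushes it down fast.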
What would be the non-subsidized price for a V4 API? Can it be priced 3x cheaper than bigger models? On OpenRouter, this 1600B param model costs $0.40, whereas Kimi 2.6 (1000B params) is $0.70 and GLM 5.1 (754B params) is $1.00.
The 150M assumption of mine is for 100 USD at the regular prices (though even that needs sufficient cache hits). Anthropic subsidizes way more per-token I think, though.
This gives me hope that when the subsidization circus ends and everyone is on pure usage pricing, it won't be entirely exclusionary to mere mortals who don't have $200/month budgets.
IMO there are two things that make me optimistic that we won’t see a big rug pull where price-to-capability ratio skyrockets relative to today:
* As you’ve noted, people keep finding ways of slamming more intelligence into smaller models, meaning that a given hardware spec delivers more model capability over time.
* Hardware will continue to improve and supply will catch up to demand, meaning that a dollar will deliver more hardware spec over time.
I hope that one day we’ll look back on the current model of “accessing AI through provider APIs” the same way we now look back on “everyone connecting to the company mainframe.”
I also hope that we’ll find effective ways to distribute load between small local models and heavyweight remote models. Sort of like what Apple tried to do in iOS.
So much of what I ask codex to do doesn’t require full GPT 5 intelligence, and if 75% of the tokens were generated locally that’d save a massive amount of cost.
By the time the dust settles, I wouldn't be surprised if personal interactive usage couldn't even be had for under $200. I can't fit my modelling of the serving costs of these things to any public reporting, even the more bearish examples.
Comes down to what you mean by interactive usage. Most chat and, say, openclaw usage is already within self-host range, so there's no need to spend $200 a month on that.
High-end SOTA coding is harder, but even there I suspect a mix of usage-based strong models and self-hosted small ones is viable if necessary.
We pay per token in our company. It is not hard to spend $100 in one morning coding session, so thousands per month per programmer. The company finds it valuable enough to pay for, but if I ever paid these costs from my own pocket I'd look into DeepSeek et al.
Not a lot of people have this budget, and I'm not sure how many people with that type of cash are also interested in paying it for AI.
Of course, this is fine for people in the Bay Area earning hundreds of thousands of dollars a year. But then your client base becomes so reduced that it's hard to justify the valuations these companies have.
These AI companies are not hyped so much because they will offer a luxury product, they're valued because they're supposed to "change the world" which luxury does not do.
The pelican is really getting old as a standalone evaluation metric. By now pelicans are certainly in the training set, if models aren't being explicitly tuned to produce them, given the press on HN alone.
Keep the pelican, but isn't it time to add something more novel that all current and past models struggle with?
One-shot canvas and SVG images or animations are also something that, at this scale, shouldn't be an issue at all; even Qwen running locally on 24GB cards can produce impressive ones.
I don't understand why this test gets any attention. Other than the pelicans, which aren't a good test, there's no meat in this article.
V4 is definitely a step-up from V3.2 on our multilingual benchmarks.
Two caveats:
- when inferring through OpenRouter, we've had a lot of issues with very slow speeds (TPS) and occasional instability. I just checked, and it's still 10-30 TPS on all available providers, which is not a lot for a model that likes to think as much as DeepSeek does.
- the official DeepSeek API makes no guarantees of data privacy even for paying users.
Both points could be moot when using it through Azure AI Foundry (the latter is, AFAIK); I have yet to test that.
In any case, happy to see more open-weights models that are somewhat competitive with SOTA models!
DeepSeek V4 Flash is the most cost effective model we've tested.
We had to dig in to understand why it outperformed DeepSeek V4 Pro (although even on unreliable model cards, Flash was very close to Pro). Pro is slower and smarter on one-shot reasoning problems, but less effective with tools, and therefore less performant on long-horizon agentic tasks (especially with custom tools it was not trained on).
Yeah, even the Chinese open models have the problem that inference costs aren't that cheap. The only way out of the AI bubble collapse is simply more efficient hardware at lower cost and less infrastructure setup downtime.
You can treat the GPU cost as fixed; then your marginal cost becomes energy. Efficient hardware and lower costs will pop the bubble faster. The only way out is profit.
It might be at the frontier, but DeepSeek is really struggling with compute. The amount of 429 Rate Limit responses I've been getting just testing this thing made me pause all my attempts at cross-comparing it to others.
I've been using the planning framework from Matt Pocock on very typical brownfield code. I use a harness over Claude Code; this is so cheap that I'd be tempted to mirror my initial prompt to it and compare the responses to the task.
I tweeted about some implementation and review runs that used V4 Pro.
Even without the currently discounted pricing, the value is incredible.
It takes about twice as long to finish code reviews given an identical context compared to Opus 4.7/GPT 5.5, but at 1/10 the cost or less, there's just no comparison.
Jensen has a point. I believe these were trained and run on Huawei chips. The Nvidia embargo may backfire on American leadership, as necessity gives rise to invention.
Isn't it widely speculated that these are distilled from current frontier models? Distillation is far less compute intensive than primary training. That said, if distillation produces something almost as good for a fraction of the cost, Jensen's point may stand.
You can't really distill a model without access to the internal weights. You could train on chat logs, but that's absolutely not the same thing; it doesn't even come close to comprehensively "extracting" the model's capabilities. And everyone in the industry has done that anyway ever since ChatGPT was first released; some versions of Opus even claimed to be DeepSeek if you prompted them in Chinese.
Calling it distillation does, however, make normies go along with it when they inevitably add all the Chinese labs to the entity list to pad Dario and Sam’s pockets.
Weights are not required for distillation; I'm not sure how you came to that belief. Distillation is training a student model to mimic a teacher model's output.
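To be concrete, output-only ("sequence-level") distillation is just supervised fine-tuning on teacher responses, with no teacher weights or logits involved. A minimal sketch, with the teacher endpoint, model names, and prompts purely hypothetical:

    import torch
    from openai import OpenAI
    from transformers import AutoModelForCausalLM, AutoTokenizer

    teacher = OpenAI(base_url="https://api.example.com/v1", api_key="sk-...")
    tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in student model
    student = AutoModelForCausalLM.from_pretrained("gpt2")
    opt = torch.optim.AdamW(student.parameters(), lr=1e-5)

    for prompt in ["Explain TCP slow start.", "Write a binary search in C."]:
        reply = teacher.chat.completions.create(
            model="teacher-model",
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        # Plain cross-entropy on the teacher's text: the student learns to mimic it.
        ids = tok(prompt + "\n" + reply, return_tensors="pt").input_ids
        loss = student(ids, labels=ids).loss
        loss.backward(); opt.step(); opt.zero_grad()

Logit-level distillation (matching the full output distribution) does need more access, but mimicking sampled outputs at scale gets you a long way.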
Anthropic, for example, posted a 2026 disclosure (https://www.anthropic.com/news/detecting-and-preventing-dist...) which singles out DeepSeek's distillation activity. They detected over 16M actions across 24,000 fraudulent accounts. And that's just what they detected.
It's too late already, that ship has long sailed. China has the know how in software and hardware. They don't need American tech, they just want it because it's convenient.
The embargo won't backfire, because any delay of China's development was worth it to the US. The situation was never, "China wasn't developing AI chips, now it is", it was always, "China IS developing their own AI chips, let's just slow them down as much as we can."
For many models the performance of llama.cpp on Mac is 20-40% lower than MLX. Did you try MLX? At least on HF there are MLX 2-bit quants. Unfortunately I have only 64GB, so I can't test it.
I recently switched from Claude to Opencode Go + pi.dev. It has Deepseek v4 pro along with Kimi K2.6, and it's performing quite well for basic coding, without hitting any limits.
Theory: there's something like $2 trillion of total valuation riding on Western closed-weight LLMs, so a blog post title praising an open-weight Eastern model is too dangerous to use here.
> DeepSeek V4—almost on the frontier, a fraction of the price
I tried deepseek v4 through open code at the weekend. I'm a daily Claude/Claude code user.
I tried to build something simple and while it got the job done the thinking displayed did not fill me with confidence. It was pages and pages of "actually no", "hang on", "wait that makes no sense". It was like the model was having a breakdown.
Bear in mind opencode was also new to me, so I could just be seeing thinking where I usually don't.
And before that, they summarized it. But yeah, thinking was always like that (when it first started, it almost seemed like a scheme to massively increase token use).
You can just use it through Claude Code, so you get to keep the system prompt and tooling you are used to.
3rd party models are a drop-in replacement via `ANTHROPIC_BASE_URL` in Claude Code, something people seem to miss right now. And contrary to what Anthropic might like you to think, you don't need Opus 4.7 in the harness to get similar performance.
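For what it's worth, the same swap works outside the CLI too. A minimal sketch with a placeholder endpoint and model alias (not a real service):

    import anthropic

    # Claude Code reads ANTHROPIC_BASE_URL from the environment; the SDK
    # takes it directly. Any Anthropic-compatible endpoint will do.
    client = anthropic.Anthropic(
        base_url="https://api.example.com/anthropic",
        api_key="sk-...",
    )
    msg = client.messages.create(
        model="some-v4-pro-alias",  # whatever name the provider maps to the model
        max_tokens=256,
        messages=[{"role": "user", "content": "Summarize this diff: ..."}],
    )
    print(msg.content[0].text)

Same system prompt, same tooling, different model behind the wire.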
Opus 4.6 and GPT 5.4 do the same thing through GH Copilot and Bedrock. I get plenty of "Actually the simplest solution is ..., wait no, actually I should do ..., the best fix is ..."
I feel the reasoning might be tuned for hard questions rather than agentic work: it overthinks, which is good for a very hard question but not for small incremental agentic steps. In theory, disabling thinking and using really well-formed instructions, while still forcing it to emit a bunch of tokens at each step before taking action, could help. Only one way to find out, though.
Having used a bunch of CLIs to work with DeepSeek V4, I've found that Langcli is the best fit. For programming tasks, the cache hit rate is above 95%.
Not only can it seamlessly and dynamically switch between DeepSeek V4 Flash, V4 Pro, and other mainstream models within the same context, but it is also 100% compatible with Claude Code.
I previously encountered the "reasoning content missing" issue when using opencode + deepseek v4. I don't know if it has been fixed now.
> I tried to build something simple and while it got the job done the thinking displayed did not fill me with confidence. It was pages and pages of "actually no", "hang on", "wait that makes no sense". It was like the model was having a breakdown.
It has probably been trained to assess its own "thoughts" regularly and to output the results of that assessment. I wouldn't worry much about the contents of the reasoning text, and it's nice to have it, in contrast to the closed models' "summaries", since it's easier to see what's going on.
Eh, you're seeing raw thinking tokens. With Claude <x> 4, and I think the GPT-5 series, you are no longer seeing real thinking tokens, but "summarized" tokens that are probably very different from the raw thinking.
OpenAI has GPT-5.5 Pro, whose only difference, I think, is the price. Billing is from OpenRouter, but the breakdown is roughly:
- GPT 5.5 Pro: so expensive it makes no sense (around $2)
- Gemini/Opus: $0.20/$0.10. Opus is cheaper as it consumed fewer tokens
- DeepSeek/GLM: $0.019/$0.021, i.e. 5-10 times cheaper than Gemini and Opus
The example Simon generated just shows that larger models don't necessarily produce better results.
Tokens are cheap. LLMs are fast. Pre-processing and post-processing are the real bottlenecks. I know you're going to ask: why not use LLMs for that too? Because complexity in an end-to-end workflow is a zero-sum game. If you hand more of the workflow to the LLM, more complexity comes back to you in the steps you still have to do yourself. If you keep only 10% of the work for yourself, that 10% will be ten times more complex and fast-moving than what you usually do.
I'm not sure I'd call it "almost on the frontier," but I do think that v4 Pro is the most usable coding model I've seen out of China. I've used it via Ollama Cloud (coding) and OpenRouter (data processing). Feels Sonnet-level to me -- solid at implementation when given a specification, but falls a good bit short of Opus 4.7 max thinking when planning out larger changes or when given open-ended prompts.
GLM 5.1 is fantastic for me. But that could be down to how I use it: I don't ask it to build entire apps or entire features, instead asking it to build piecemeal functionality. For that it compares very well to ChatGPT 5.4 (I haven't extensively tried 5.5; it might be better, might be the same). I gave DeepSeek V4 Pro a try, but not much more than a try, as it performed subpar on four tasks in a row (missing the obvious/intended path, generating subpar, slightly buggy code to make things work the non-obvious way), so I gave up on it.
GLM 5.1 for me was a bit of a Llama 3.1 moment for code (the first open model I could chat with that handled my inputs the intended way): the first open model that was actually usable.
> Kimi K2.6 a shot for coding? They outperform Deepseek v4 pro
I think this probably depends quite a bit on the specific problem. I'm finding that Deepseek v4 Flash often outdoes Kimi 2.6 on a variety of coding problems that involve complex spatial reasoning.
Oh, that's quite interesting, and hasn't been my experience with regular backend code, specifically with respect to tool calling. However, that could be because the tool-calling format for Deepseek v4 in vLLM was broken until a few days ago, and that's how I'm running it.
I've been hearing amazing things about Flash, I should give it a try.
Really? I've found kimi k2.6 to be really good for vision and spatial stuff. Gemini has been the only subjectively better one but gemini isn't reliable in a loop
DS V4 Pro has rocked. ~250 million tokens through their API have cost me about $10, some of it at the non-discount rate; it would have been around $40 entirely at the non-discount rate. I have yet to have a single request feel slow or get rejected.
I've used K2.6, GLM5.1, and DSV4 all a good amount. They're all very impressive, but DSV4 has taken the cake.
In my experience V4 is pretty good, but on very hard problems it burns so many tokens that it ends up not being so cheap anymore. I'm working on a compiler, and the tasks are very involved; tests won't pass unless it gets things absolutely right. 5.5 can achieve more in less time than V4 for me.
Run it on an NVIDIA GPU and charge $20 a month, and it becomes 'frontier.' That is what the term means these days. In terms of performance, it beats ChatGPT 5.5 and Mythos on several metrics.
For a solo dev sure.. but isn't there a huge privacy difference between Anthropic and DeepSeek APIs as well? I assumed part of the cost for Anthropic was essentially a privacy premium.. (plus they offer B2B).
Naive Question: is DeepSeek V4 actually cheaper to run? Or is it cheaper because of other reasons? For example Anthropic running at a higher margin or DeepSeek at a larger loss?
The rumor is that Anthropic's Opus models have ~100B active parameters, which is twice as much as DeepSeek-V4-Pro, so inference is at least twice as expensive. Since the API pricing is almost 30 times that of DeepSeek, Anthropic's margins are likely very healthy. But they have to be, since Anthropic has to offset the model training costs, while DeepSeek is backed by High-Flyer Quant. DeepSeek might still be profitable anyway, but without knowing how much they spent on training and wages, we can't really tell.
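The back-of-envelope version of that argument, with every input a rumor or assumption rather than a known number:

    opus_active = 100e9   # rumored active params for Opus
    ds_active = 50e9      # DeepSeek-V4-Pro active params, per the parent's ratio
    price_ratio = 30      # approx. API price ratio, Opus vs DeepSeek

    cost_ratio = opus_active / ds_active   # inference cost roughly tracks active params
    print(price_ratio / cost_ratio)        # ~15x more price per unit of estimated compute

That leftover ~15x is some mix of margin, training amortization, and serving overhead that we can't see from outside.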
Has anybody used V4 hard, for the most challenging tasks (agentically, locally)? It's so hard to compare without putting serious time in it. Like spending a year daily with the model.
I tried it for two tasks using Claude Code, on max effort.
1. Web platform: asking it to analyse a feature for creating reports, and to come up with a better solution and better UX. It did great; I'd say on par with Sonnet 4.6 or even Opus, considering the thinking and explanation.
2. Mac app with some basic functionality: it did well from a functional perspective, but when I then used Opus 4.7 to evaluate and suggest improvements, I noticed it had missed many vital points in the design system and usability.
I think it's a leap; I haven't used a model this capable that isn't from OpenAI or Anthropic.
DeepSeek V4 Pro has about 25GB worth of active parameters, so if you can fit the whole ~870GB of weights + cache in RAM, your tok/s is bounded above by your system memory bandwidth in GB/s divided by 25GB. If you can't fit the whole model in RAM, you'll be bottlenecked to some degree by storage bandwidth, which is in the single or low double digits of GB/s.
Mind you, it's an absolutely sensible setup either way if you are just testing a few queries and are willing to run them unattended/overnight. Especially since the KV-cache size is apparently really low (~10GB is said to be typical) so you get a lot of batching potential even in consumer setups, which amortizes the cost of fetching weights.
The basic bottleneck with 32GB RAM would be your storage, so for a baseline estimate you'd be looking at anything from ~2 secs per token (if you had really high performance PCIe 5.0 SSD at ~14 GB/s max) to ~5 secs per token (for an average PCIe 4.0 SSD, ~7 GB/s max). This would then be boosted by being able to keep the shared model layers in RAM, since these are part of the 25GB active parameters. I'm not sure what fraction of the active params that makes up for DeepSeek V4 Pro, but in a typical MoE it's about half, so you could approximately halve those secs-per-token figures. That's acceptable if you care about unattended inference for testing purposes or simple Q&A (leveraging the model's vast world knowledge); it doesn't look very good for interactive use. But the flip side is that you can batch a large amount of model queries together, since the KV cache for very short prompts is quite negligible. AIUI, that's basically unique to this series of models and a huge selling point.
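Plugging the numbers from the comments above into that bound (the hardware figures are illustrative, and these are idealized ceilings; real-world throughput lands lower):

    active_gb = 25.0  # GB that must be read per generated token

    for name, bw in [("~450 GB/s RAM (Mac Studio-class)", 450.0),
                     ("PCIe 5.0 SSD, ~14 GB/s", 14.0),
                     ("PCIe 4.0 SSD, ~7 GB/s", 7.0)]:
        tps = bw / active_gb
        print(f"{name}: {tps:.2f} tok/s ({1 / tps:.1f} s/token)")

That gives roughly 18 tok/s for in-RAM inference, ~1.8 s/token on a fast PCIe 5.0 SSD, and ~3.6 s/token on a PCIe 4.0 SSD, which is where the 2-5 seconds-per-token range above comes from once real-world losses are included.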
Alright, I don't understand any of this, but you said ~5 secs per token; for prompts of hundreds to a thousand tokens, that puts us on the order of tens of minutes to hours. I would be targeting coding prompts.
Well, it means one day I would have to get into the real thing: the real inference code, and actually run the inference of a small model.
The V3/R1 era and now are in such contrast. V3/R1 were hyped hard and barely usable for coding. V4 is much less hyped, but (anecdotally) it has completely demolished all the Flash/Lite/Spark models.
Because V4 doesn't even beat Kimi K2.6 and GLM 5.1, which have been out longer. It's only talked about as much as it is because it's Deepseek and R1 was the first open source reasoning model. V4 isn't even multimodal (unlike Kimi) and the 1M context doesn't seem to perform particularly well.
Huh? R1 was one of the earliest openly available MoE and reasoning models; that's definitely not "hype". People tried to do reasoning before by asking the model to "think it through step by step", but that was a hack. The later V3.1 and V3.2 releases, AIUI, unified reasoning and non-reasoning use under a single model.
Aw man, I'm going to shed a tear: the poor AI companies that stole books, works of art, writing, and anything else they could get their grubby hands on by the exabyte, while happily telling everyone their jobs are over, are getting their precious little tokens stolen by big evil Chinese LLMs :(
It's morally right to fuck over Anthropic (and OpenAI, or any other lab). Works generated by AI are not copyrightable anyway, and their terms of service have zero legal value.
Is there real evidence that the volume was meaningful for distillation vs., say, extensive benchmarking and testing?
It's certain all the labs use each other's APIs extensively for testing. What's the actual evidence that Deepseek operated at significantly higher scale, etc.?