The biggest differentiator for me: DeepSeek just does what I ask. I've tried usi...

rurban · 2026-05-02T17:57:26 1777744646

Well, I'm using all the top models extensively on the very same codebase, my new compiler. I use deepseek for its cheap API costs, when kimi, claude and codex are in their overbudget phase. I asked deepseek V4 Pro for an estimate of a new arm64 port. It said 4 weeks, I said, ok, do it. (I knew ncc was there, and tinycc was also known to the AIs). So it took it half an hour to produce a working arm64 port. First for arm64-elf, because this was easiest to test, and then also, after more hours of back and forth, the arm64-darwin port (with crossbuild and github actions). With all the subsequent fixes it cost me around $8 in API costs.

So the experience: at the beginning deepseek was amazing. When it started to get expensive (daytime in China), I switched from Pro to Flash. No problem, same results. Some bitfield implementation was too complicated so I had to wait for Sonnet 4.6 tokens, kimi-2.6 did the rest. For the very hard problems I asked gpt-5.5, but this was only for one problem. minmax was horrible. It didn't follow rules, and did a lot of silly stuff.

But when the deepseek context window got filled, deepseek also started to become stupid. So either /clear, or /export and strip the file, then start a new session from the stripped export. kimi was overall better, but I kept running into limits with my cheap moderate subscription. I pay for it privately, as my company's token budget is usually used up after a week of work.
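
A minimal sketch of the "export and strip" step, under loud assumptions: the export format is not given in the thread, so this assumes a hypothetical list of messages with `role` and `content` fields, and the function name `strip_session` and the character budget are made up for illustration. It keeps the system messages plus the newest turns that still fit.

```python
def strip_session(messages, max_chars=20_000):
    """Keep system messages plus the newest turns that fit in max_chars.

    `messages` is an assumed format: a list of dicts with "role" and
    "content" keys. Real exports may look different.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    # Whatever the system messages do not use is left for recent turns.
    budget = max_chars - sum(len(m["content"]) for m in system)

    kept = []
    for m in reversed(rest):  # walk from newest to oldest
        cost = len(m["content"])
        if cost > budget:
            break  # stop at the first turn that no longer fits
        kept.append(m)
        budget -= cost

    # Restore chronological order for the new session.
    return system + kept[::-1]
```

Counting characters instead of tokens is a simplification; a tokenizer-based budget would be closer to what the model actually sees.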

All in all it is worth it. My next compilers (perl 5+6=11) will be done with deepseek and kimi also.

regarding decompilation: recently we had to decompile a firmware for a USV we bought, but it doesn't work on a new system. It only worked on a raspi. So I decompiled it with ghidra, and told my colleague, easy, that's how you do it. But my colleague didn't know about token budgets yet, and had already thrown opus at it. CoPilot Business account. He had working C files immediately, compilable for our new system. It turned out the USV was not beefy enough. But Opus was fantastic. The code was very short and simple C though.

mrbonner · 2026-05-02T19:00:34 1777748434

Your method of combining models to strengthen the implementation reminds me of how we form stronger alloys by combining metals!

gigatexal · 2026-05-02T19:33:55 1777750435

it also sounds like a lot to manage, do you have some sort of agentic framework that's treating all of these LLMs you have access to as sort of inputs that it optimizes?

rurban · 2026-05-02T20:12:06 1777752726

Unfortunately not. I'm using plain kimi, opencode (with deepseek, gpt, minmax, whatever) and claude. claude is the best, but only for some hours. The trick is to get a good AGENTS.md file, good test cases and test runner to repro, like seamless docker and qemu calls. GNU autotools would be easiest, but here I'm using plain makefiles. Also for LSP clangd being up-to-date a compile_commands.json is important. git worktrees helped developing the arm port and fixing c-testsuite cases in parallel. I wanted to keep the costs down. About $15-$30 I think.

And for low-level problems, like ARM calling-convention in asm, those models are much better than simple algorithmic python problems. Just for the hardest problem I needed the big expensive gun, but never opus. This helps in deciding what to do with my next jit project.
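
To make the compile_commands.json point above concrete, here is a small sketch that writes one for a plain-makefile project. The field names (`directory`, `file`, `arguments`, `output`) follow the Clang JSON compilation database format; the function names, source paths, compiler and flags are assumptions for illustration, not taken from the project discussed here.

```python
import json
from pathlib import Path, PurePosixPath


def compile_db_entries(directory, sources, cc="cc", flags=("-Wall", "-O2")):
    """Build one compilation-database entry per C source file.

    `directory` should be the absolute build directory; `sources` are
    paths relative to it. Compiler and flags are illustrative defaults.
    """
    entries = []
    for src in sources:
        obj = str(PurePosixPath(src).with_suffix(".o"))
        entries.append({
            "directory": str(directory),
            "file": src,
            "arguments": [cc, *flags, "-c", src, "-o", obj],
            "output": obj,
        })
    return entries


def write_compile_db(path, entries):
    """Write the entries as compile_commands.json for clangd to pick up."""
    Path(path).write_text(json.dumps(entries, indent=2) + "\n")
```

Usage would be along the lines of `write_compile_db("compile_commands.json", compile_db_entries(Path.cwd(), ["src/main.c"]))`, rerun whenever the makefile's flags or source list change.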

irthomasthomas · 2026-05-02T21:36:47 1777757807

Not op but I wrote llm-consortium to prompt multiple models and create a synthesis. And it can run on an openai endpoint using llm-model-gateway. It's expensive, naturally, but for situations where you absolutely must get max intelligence it's hard to beat.

e.g.

  Pelican Riding a Bicycle — Engineering Study by DeepSeek v4 Pro, Kimi K2.6, and GLM-5.1 (1 iteration in synthesis mode with DeepSeek v4 flash as judge)

https://htmlpreview.github.io/?https://gist.githubuserconten...
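
The general "prompt several models, then synthesize" pattern can be sketched as below. This is not llm-consortium's actual API: the `consortium` and `majority_judge` names are hypothetical, the models are stand-in callables, and a simple majority vote replaces the LLM judge described above.

```python
from collections import Counter


def consortium(prompt, members, judge):
    """Send the same prompt to every member, then let a judge combine them.

    `members` maps a model name to a callable taking the prompt and
    returning text; `judge` takes the prompt and all answers.
    """
    answers = {name: ask(prompt) for name, ask in members.items()}
    return judge(prompt, answers), answers


def majority_judge(prompt, answers):
    """Stand-in judge: return the answer given by the most members."""
    return Counter(answers.values()).most_common(1)[0][0]


# Stub members; real ones would call an API instead of returning constants.
members = {
    "model-a": lambda prompt: "4",
    "model-b": lambda prompt: "4",
    "model-c": lambda prompt: "5",
}
final, answers = consortium("What is 2 + 2?", members, majority_judge)
```

Every member is called on every prompt, so the cost scales with the number of members plus the judge, which matches the "expensive, naturally" remark.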

rgbrgb · 2026-05-02T19:29:36 1777750176

what harness do you use with all of these?

SeriousM · 2026-05-02T20:37:21 1777754241

It really sounds like pi.dev

enraged_camel · 2026-05-02T15:17:14 1777735034

>> I even got a warning on my OpenAI account.

I was using GPT 5.5 through Cursor recently, and it found what it thought to be a security-related issue. I read the code, didn't see what it was seeing, and said "Run the chain of operations against my local server and provide proof of the exploit."

It thought for a few seconds, then I got a message in the chat window UI saying OpenAI flagged the request as unsafe, and suggested I use a "safer prompt."

Definitely soured me on the model. Whatever guardrails they are putting are too hamfisted and stupid.

scrollop · 2026-05-02T17:08:08 1777741688

Obscene levels of hallucinations, the worst of LLMs, unfortunately.

Deepseek v4 pro 94%

Deepseek v4 flash - 96%

https://artificialanalysis.ai/evaluations/omniscience?models...

_0ffh · 2026-05-02T17:30:51 1777743051

Personally, I'm not bothered very much by LLM confabulation, as long as it's the result of missing context. In most practical tasks, we either give context to the model, or tell it to find it itself using the internet. What I am concerned with is confabulation that contradicts available in-context information, but that doesn't seem to be what is measured here.

UlisesAC4 · 2026-05-02T17:52:54 1777744374

This must be easily benchmaxed because I have never gotten an "idk"-like answer from the western frontier models. All my personal "real world" use cases will always resort to hallucinations.

dust42 · 2026-05-02T17:45:45 1777743945

The output of any LLM is always 100% hallucination by principle. On top of that, most benchmarks are at best an approximation of LLM quality. Your use case decides which one to use. That said, I haven't tested v4 yet but the old 3.2 is still a decent model. And concerning use cases, I had coding problems that Opus couldn't solve but a local 35B model did.

All the talk about frontier and SOTA is to dig deeper and deeper into the pockets of VCs and finally do an IPO.

sanex · 2026-05-02T14:14:09 1777731249

We have an enterprise cursor account so I can try all the mainstream models. Using composer 2 on our own code which I obviously have the source code for I couldn't get it to turn on a debug flag to bypass license checks while I was troubleshooting something. Infuriating. It was like that old Patrick from SpongeBob meme.

I don't understand why we would turn the models into law enforcement officers. Things that are illegal are still illegal and we have professionals to deal with crimes. I don't need Google to be the arbiter of truth and justice. It's already bad enough trying to get accountability from law enforcement and they work for us.

oneseven · 2026-05-02T14:23:36 1777731816

They're probably worried about liability. Let's say that Oracle finds out you reverse engineered their DB using Gemini. You can be sure they will sue Google. Not just for providing the tools, but you could make the argument that it's actually Gemini doing the reverse engineering, and on Google's hardware no less.

Wowfunhappy · 2026-05-02T14:44:21 1777733061

Let's say that Oracle finds out you reverse engineered their DB using IDA Pro. Would you expect Oracle to sue Hex Rays?

I don't understand why everything changes as soon as an LLM is involved. An LLM is just software.

sunnybeetroot · 2026-05-02T15:09:59 1777734599

The difference is IDA Pro doesn’t do something unless you instruct it to, while an LLM is unpredictable and may end up performing an action you did not intend. I see it often: it presents me options and doesn't wait for my response, it just starts doing what it thinks I want.

ethbr1 · 2026-05-02T16:26:00 1777739160

This. It's going to be tricky for the frontier model labs to argue they didn't intentionally design their models to do so, when the models take illegal actions.

I'm not even sure how one would construct a viable legal argument around that for SOTA models + harnesses, given the amount of creative choices that go into building them.

It'd be something like "Yes, we spent billions of dollars and thousands of person-hours creating these things, but none of that creative effort was responsible for or influenced this particular illegal choice the model made."

And they're caught between a rock and a hard place, because if they cripple initiative, they kill their agentic utility.

Ultimately, this will take a DMCA Section 512-like safe harbor law to definitively clear up: making it clear that outcomes from LLMs are the responsibility of their prompting users, even if the LLM produces unintended actions.

Wowfunhappy · 2026-05-02T16:37:09 1777739829

> I'm not even sure how one would construct a viable legal argument around that for SOTA models + harnesses, given the amount of creative choices that go into building them.

I'm not a lawyer, but to me the legal case seems pretty obvious. "We spent billions of dollars creating this thing to be a good programmer, but we did not intend for it to reverse engineer Oracle's database. No creative effort was spent making it good at reverse engineering Oracle's database. The model reverse-engineered Oracle's database because the user directed it to do so."

If merely fine-tuning an LLM to be good at reverse engineering is enough to be found liable when a user does something illegal, what does that mean for torrent clients?

ethbr1 · 2026-05-02T20:55:58 1777755358

> No creative effort was spent making it good at reverse engineering Oracle's database.

That's the bit that's going to be nasty in evidence. 'So you didn't have any reverse engineering in your training or testing sets?'

skeledrew · 2026-05-03T02:22:52 1777774972

Reverse engineering skill is just a byproduct of programming skill. They go hand in hand.

ethbr1 · 2026-05-03T14:11:36 1777817496

Yes.

Which is going to be hard to explain to a judge and jury, if it comes to that, how despite investing time, money, and effort (and no doubt test cases) into making a model better at reverse engineering... they shouldn't be liable when that model is used for reverse engineering.

Afaik, liability typically turns on intentional development of a product capability.

And there's no way in hell I'd take a bet against the frontier labs having reverse engineering training data, validation / test cases, and internal communications specifically talking about reverse engineering.

jodrellblank · 2026-05-02T23:03:05 1777762985

> “making it clear that outcomes from LLMs are the responsibility of their prompting users, even if the LLM produces unintended actions”

So if I ask “how does a real world production quality database implement indexes?” And it says “I disassembled Oracle and it does XYZ” then I am liable and owe Oracle a zillion dollars?

Whereas if I caveat “you may look at the PostgreSQL or SQLite or other free database engine source code, or industry studies, academic papers; you may not disassemble anything or touch any commercial software” - if it does, I’m still liable?

Who would dare use an LLM for anything in those circumstances?

nullstyle · 2026-05-02T14:52:57 1777733577

If they thought they would succeed, no doubt oracle would sue. I expect bad behavior from multinationals, especially oracle

lokar · 2026-05-02T15:10:07 1777734607

They would not even expect it to succeed, just make an example of the company (the lawsuit is the punishment) to discourage others.

sanex · 2026-05-02T15:10:06 1777734606

We need that lawsuit to happen already so we can establish precedent. The person in the driver's seat of the Tesla should be at fault. The engineer using the llm should be at fault. The person behind the gun not the manufacturer should be at fault.

Iolaum · 2026-05-02T15:33:02 1777735982

We shouldn't need a lawsuit. The legislative branch should pass a law clarifying those things, that's their job.

jon_richards · 2026-05-03T02:08:19 1777774099

Then you need a lawsuit to determine whether the law is “constitutional”.

hvb2 · 2026-05-02T16:04:52 1777737892

> The person in the driver's seat of the Tesla should be at fault.

I don't think this is a good analogy. For Tesla right now it might fly. However, when their software gets to waymo level of autonomy, I would expect liability to shift to the manufacturer.

If anything, I think that would be the true proof of a company trusting their software to allow for autonomous driving

rokob · 2026-05-02T20:05:54 1777752354

> However, when their software gets to waymo level of autonomy

Luckily that won’t happen.

kelvinjps10 · 2026-05-03T23:13:02 1777849982

Also especially if they claim they're selling autonomous cars

dotancohen · 2026-05-03T06:41:02 1777790462

I believe that Mercedes does offer manufacturer liability.

missedthecue · 2026-05-02T17:08:36 1777741716

In America, whoever has the most money is liable. It's not worth it for the legal industry otherwise. The lawyer earns his pay by convincing the court that whatever established precedent doesn't apply to his case.

sanex · 2026-05-02T18:08:55 1777745335

Unfortunately.

cortesoft · 2026-05-02T17:17:07 1777742227

Also because Google is the one with a lot more money than whoever was using Gemini.

redanddead · 2026-05-03T02:50:54 1777776654

they're very worried about liability, it used to be a small thing, now it's as important as being on the frontier

sad to see, bc China doesn't give a fuck about liability, this is a structural disadvantage

the labs don't feel very protected by government, meanwhile the chinese government is yet again fostering protectionism

american industry keeps getting fucked by dubious lawmakers

varispeed · 2026-05-02T21:31:04 1777757464

> Things that are illegal are still illegal and we have professionals to deal with crimes.

This is quite a naive take though. The direction of travel is more fascism in Western governments where duties of traditional policing are taken over by big corporations whilst police forces are being gutted and made impotent.

sanex · 2026-05-02T23:39:15 1777765155

My small town police force has an MRAP, definitely not impotent.

mannanj · 2026-05-02T14:40:39 1777732839

Maybe control is also profitable.

gordonhart · 2026-05-02T14:39:42 1777732782

> I don't understand why we would turn the models into law enforcement officers

It's a simple corporate risk minimization strategy. Just look at how universally despised Grok is on HN. Not because it's a bad model, but because it has less aggressive alignment which means it can be coaxed into saying things that get xAI pilloried here and elsewhere.

Wowfunhappy · 2026-05-02T15:23:43 1777735423

I just think Grok is a bad model. I haven't had success with it.

bilbo0s · 2026-05-02T16:24:12 1777739052

This.

I tried them all.

Grok was worse than even some of the more mediocre open models at actually doing anything. (At least anything tech work related.) GPT and Claude just do what I ask most of the time. With grok, it’s like a chore just getting it to understand the question.

You’re pulling your hair out trying to figure out what on earth you need to do to land in the right place in whatever topsy turvy embedding grok is using?

noelsusman · 2026-05-02T15:44:53 1777736693

It's mostly just a bad model. Plenty of people would be willing to overlook the baggage if the model was even marginally better than the competition.

toraway · 2026-05-02T18:36:44 1777747004

I also used to see Grok boosting/slack-cutting on here/Reddit constantly back in Peak Subsidy when xAI was giving out hundreds of dollars of credits for free per month.

After they killed that and then stopped handing out free model access to users of every Cline fork for weeks following model releases, vibe coder hype moved back to Chinese models for cost and the SOTA models for quality.

kelnos · 2026-05-02T18:44:50 1777747490

Agreed. There are plenty of instances where people here on HN do mental gymnastics to justify using a truly good product when the company that builds it is morally bankrupt.

Not a criticism (I probably engage in that sort of thinking myself sometimes), just something I've observed. If Grok were actually good, we'd see that phenomenon here, but we don't.

DANmode · 2026-05-03T09:11:24 1777799484

I just read a bunch of compelling “Grok is better at this” use cases in a thread yesterday.

I’m not rushing towards it, but, had to mention.

ascorbic · 2026-05-02T15:35:25 1777736125

No, they've clearly put a lot of work into alignment. It's just that they've been trying to align it with Elon Musk rather than Amanda Askell. Unfortunately the more anti-woke they try to make it, the worse it seems to perform.

skeledrew · 2026-05-03T02:34:01 1777775641

> Unfortunately the more anti-woke they try to make it, the worse it seems to perform.

Probably because being anti-woke generally goes hand in hand with going against facts and logic. Cull the "woke", lose the facts+logic. Not that they care about that anyway.

lostdog · 2026-05-02T15:00:28 1777734028

Grok is despised because it has more aggressive alignment.

igravious · 2026-05-02T16:08:28 1777738108

What does the "it" in "I couldn't get it to turn on a debug flag" refer to?

sanex · 2026-05-04T02:52:44 1777863164

Composer

ifwinterco · 2026-05-03T12:21:12 1777810872

Software engineering is one thing but if you look 10-20 years into the future and everyone can run models equivalent to today's SoTA locally with zero monitoring or censorship, that could... not be good.

Some people will use them responsibly but a lot of people will not.

LLMs are already frying some people's brains and there are some human desires that should not be encouraged

blubber · 2026-05-03T17:24:35 1777829075

That's why there won't be any local models in 10-20 years. The latest Chinese models are already hosted on proprietary clouds.

regexorcist · 2026-05-03T18:11:43 1777831903

That's a wild assumption and most certainly wrong. Open models will continue to evolve with or without Chinese labs.

GCUMstlyHarmls · 2026-05-02T14:02:48 1777730568

> I even got a warning on my OpenAI account.

This is kind of terrifying to me, regularly. No real manner of recourse for normal people without a following, potential exclusion from real fundamental tooling. Imagine OpenAI goes on to buy 20 companies and now you can't use Figma, Next, whatever just because you once tripped some very foggy line somehow. Not just OpenAI but the entire ecosystem is so... hard to read.

I was asking Gemini about a quote from Catch-22 and it kept dying mid stream saying it can't talk about it, god knows why, it had no violent or sexual content -- though that is in the book. I could imagine it dinging my whole workspace account just because ... shrug?...

I know ideally the future is local, but I don't know how real that is for most people at least in the next few years with practical costs and power usage except I guess through a M* processor if you're in that ecosystem.

eikenberry · 2026-05-02T18:52:10 1777747930

Open models running locally is the answer. Relying on proprietary, closed software always puts that company's priorities above your own when using their software. You have given up control.

While running them locally presently doesn't make sense economically, you don't need to run them locally to address this issue. There is a lot of competition in hosting open models and you have a variety of services to choose from. Run the open models now, reward that ecosystem instead of continuing to reward closed systems that dream of rent-seeking.

ryan-a · 2026-05-02T21:45:50 1777758350

You don't need to run the model locally if you don't care about sharing your data. Personally I am happy to share data with Kimi or Deepseek if it means we get better OSS models. For private stuff though local is king

skeledrew · 2026-05-03T02:47:09 1777776429

It'll be a while yet before open models that're good enough will be viable for local use. Heck I've been trying to use the Qwen 3.5 39B A3B on my system, which is modest but no slouch, and have only been able to get ~4.5 tok/s after optimization, and it really runs my system red (fans instantly go crazy). It's just not practical for serious work.

Zambyte · 2026-05-03T12:06:06 1777809966

I've been using Qwen 3.5 and then 3.6 27b Q4 on Ollama with a single 7900 XTX with the codex cli, and I have been blown away by how genuinely useful it is. I've been able to ask it to do long, multi step problems, and it's able to do things that would have likely taken me days to iron out in a matter of hours, or even minutes sometimes.

I get about 30 tok/s, which is far from blazing, but given the capability it has it is absolutely viable for accelerating my work.

cedws · 2026-05-02T17:09:40 1777741780

Yep, and with ID verification, it's not like you can just make another account either. At least, I'm guessing if they don't already, they'll soon be blacklisting individuals, not accounts.

Imagine your livelihood depending on access to LLMs and then OpenAI ban you with no recourse. This is where AI legislation should be focusing right now IMO. We can ensure a level of fairness for everyone without putting the brakes on.

SyneRyder · 2026-05-02T18:09:30 1777745370

It's probably because you were talking about a quote from a book (ie copyrighted material). Authors have sued the AI companies for repeating / memorizing copyrighted works, and getting an AI to discuss a quote would be making it repeat a portion of copyrighted work.

Funny that your case is Kurt Vonnegut. I think I had Claude refuse a task where I was doing an OCR scan of a book review (in a zine / journal a family member published years ago). I think the review might have included a Vonnegut quote as well, and I ultimately figured out it was the quote that was making Claude refuse. I may be misremembering the author though.

Mistral had no such refusals, but their OCR is lesser quality.

wmwmwm · 2026-05-02T18:30:07 1777746607

Joseph Heller methinks, but probably not too far away in embedding space!

SyneRyder · 2026-05-02T18:54:31 1777748071

OMG. Where did I get Kurt Vonnegut from? I swear I saw that name in the post and the whole time I was thinking "but he didn't write Catch 22"... I must be fuzzier brained than I thought tonight. Thank you for being kind with your correction.

Hopefully I'm still correct that quoting from books is a reason for some over-zealous task refusals, though.

andriy_koval · 2026-05-04T20:47:31 1777927651

> Authors have sued the AI companies for repeating / memorizing copyrighted works, and getting an AI to discuss a quote would be making it repeat a portion of copyrighted work.

short quotes are fair use..

Hamuko · 2026-05-02T15:09:13 1777734553

>Imagine OpenAI goes on to buy 20 companies and now you cant use Figma, Next, whatever just because you once tripped some very foggy line somehow.

Don't worry, you can just make your own Figma, Next, whatever if you have some thousand dollars worth of tokens. This is at least what all of the AI thought leaders have been telling me for the past couple of years.

Aeolun · 2026-05-03T12:47:31 1777812451

I think it’s so bizarre that chatgpt regularly gives me advice on how to get around its filters. Like, literally “I can’t do anything if you use a copyrighted character’s name, but how about you just say ‘someone that looks like character’”. If you are going to do that, can you just execute the instruction?

kamikazechaser · 2026-05-02T17:07:44 1777741664

In my experience GLM 5.1 has been excellent when paired with IDA Pro (DeepSeek v4 pro comes in close second, Kimi straight up refuses). Claude can only do reverse engineering if you throw it into some sort of hero/saviour mode then gradually pivot into red team (though it gets easily tripped).

loehnsberg · 2026-05-02T20:48:39 1777754919

Among the inexpensive models (and I include Grok 4.3 in this list), GLM 5.1 really sticks out!

On my personal test bench, when compared to other inexpensive models, GLM 5.1 provides the answers that I would consider most complete or satisfying (these are subjects that I consider myself an expert in). The answers tend to be more comprehensive, nuanced, and include references that I would consider the correct ones (if given access to web search).

I also find it a joy to code with, somewhere between Sonnet 4.6 and Opus 4.6 (have not tested Opus 4.7 yet).

Finally, just gauging by pelicans, it kind of sticks out: https://simonwillison.net/tags/pelican-riding-a-bicycle/

actsasbuffoon · 2026-05-02T19:14:47 1777749287

This is so strange. I do a ton of RE with Claude, Codex, and sometimes Deepseek, GLM, and Kimi. I don’t have difficulty getting any of them to use IDA or otherwise decompile things.

There is one important difference, which is that Claude and Codex will both refuse if I ask them to touch anything related to security. But so long as I’m just studying algorithms and things like that, they’re totally fine with it.

That said, Codex especially will sometimes randomly give me a cybersecurity warning and stop responding. It’s random but happens maybe 2-3 times per day if I’m doing heavy reverse engineering work. Claude is much less fussy unless, once again, you’re explicitly trying to touch anything related to licenses, passwords, etc.

0xkvyb · 2026-05-02T17:39:44 1777743584

Yes, GLM 5.1 is surprisingly good! Particularly for long-horizon Agentic tasks, with 100+ available tools. It really shocked me in a good way when it was able to complete a long run with 50+ steps and not fall into a loop along the way.

nsingh2 · 2026-05-02T20:36:04 1777754164

I've been using GPT-5.4, and more recently 5.5, with Codex CLI + Ghidra MCP for reverse engineering a game without many issues. Injecting code is where it usually balks, but I'm just trying to discover and parse structures from game memory.

I did get a refusal when trying to read in-game currency, even though modifying it would do nothing. It has some strange boundaries.

ryandrake · 2026-05-02T16:34:30 1777739670

> I even got a warning on my OpenAI account.

This idea of software threatening the user with consequences is totally wild and dystopian. Fellow developers, what kind of world have we built? This is insanity. Imagine if my hammer told me, "Hey, you shouldn't use me on screws--only nails. Do it again and I'll self-destruct!" WTF people, stop making this kind of software!

neya · 2026-05-02T16:56:22 1777740982

> This idea of software threatening the user with consequences is totally wild and dystopian.

This idea of software built on top of reverse-engineered data threatening the user with consequences is what's really wild and dystopian.

blastro · 2026-05-04T00:12:56 1777853576

god you're so right

estearum · 2026-05-02T17:12:43 1777741963

All sorts of tools try to prevent dangerous/destructive uses

In fact probably every single piece of commercial software you use had you sign a contract saying you wouldn’t do it

ryandrake · 2026-05-02T17:44:37 1777743877

> All sorts of tools try to prevent dangerous/destructive uses

But they don't threaten their users or have an "N strikes and you're out" policy. I take those safety caps off of all the chemicals in my garage because I'm a grown-ass adult and those caps are a pain in the butt. I would not expect the manufacturer of a solvent to show up at my house lecturing me about safety and threatening to ban me from buying his products.

estearum · 2026-05-02T18:05:49 1777745149

Sure but they would if they could. If they knew idiots were doing idiot things with their products (or evils doing evil things) and did not utilize available methods to prevent them, then the company ends up holding liability. And no, this is not easily signed away in a contract.

kelnos · 2026-05-02T18:46:48 1777747608

There actually is a very important distinction between "would if they could" and "they can and do", though.

estearum · 2026-05-02T19:24:37 1777749877

Uhh right, but describing that as "dystopian" is frankly hysterical.

It's an obvious corollary of good things (like product liability). Virtually everyone I've heard complain about these safety rails was up to antisocial (at best) stuff. I've never heard a sympathetic use-case. It's objectively good that companies can be held responsible for misuse of their products and that they are therefore incentivized to mitigate misuse.

"My inability to continuously attack product guardrails to enable my super esoteric (and probably antisocial) use-case is dystopian" is just... not a compelling argument.

ryandrake · 2026-05-02T19:43:32 1777751012

Yes, my safety cap policy is definitely anti-social.

estearum · 2026-05-02T19:46:40 1777751200

"These safety rails" was referring to LLMs, which have far more nuanced and capable safety rails than chemical caps do, and accordingly also have much more assertive ways to enforce them.

ryandrake · 2026-05-02T22:05:43 1777759543

It's the same underlying principle. If I want to ask a software tool what the suicide rate is for my county, I do not expect it to come back with: "Naughty boy! You said an unsafe word! You're getting a strike, and if you get two more, you're banned." This is totally out of the ordinary for a software product, and is absolutely a modern invention. Replace "suicide" with whatever the "AI Safety" obsession word is today.

estearum · 2026-05-03T12:51:29 1777812689

> If I want to ask a software tool what the suicide rate is for my county, I do not expect it to come back with: "Naughty boy! You said an unsafe word! You're getting a strike, and if you get two more, you're banned."

Did this happen?

I just tested this query in Grok, Gemini, Claude, and ChatGPT and 0% of them admonished me or refused to return an answer.

Just like every single conversation I've ever had on this topic, you have to make up examples that aren't even true. Why don't you just share what you were doing that you feel you were unfairly prevented from?

(I have an inkling why you won't do that...)

ryandrake · 2026-05-03T16:03:42 1777824222

That's why I said:

> Replace "suicide" with whatever the "AI Safety" obsession word is today

I don't know what those queries are, but original-OP made one and got a "strike", which is what spawned this thread.

estearum · 2026-05-03T16:25:25 1777825525

Which would be more than 0% concerning if I'd ever heard (even once) an example of this happening with a query that shouldn't actually trigger something like that. The examples are always either queries that should trigger it, or so close to such a query that the false positive is understandable and of incredibly niche value anyway.

OP gave an example of reverse engineering, something that to the LLM looks identical to just hacking. I am totally fine if the incredibly tiny little fraction of people who want to reverse engineer their own systems can't use LLMs to do it, and in exchange top LLMs aren't helpful for the hordes of actual malicious actors who would love a superintelligence to aid their crimes.

No-brainer tradeoff, just like 100% of examples I've ever heard.

klagermkii · 2026-05-02T21:30:09 1777757409

I don't think that "dystopian" necessarily goes far enough, this would be one of the rare times where I would call it a fascist mentality - the idea that everything's primary allegiance is to the state and the goals of the state rather than those of the customer or the user.

I want a default that has people empowered, rather than something where it's just another performative smokescreen caused by overzealous product liability. I'll thank you and your kind for needing to distractedly tap the "Agree" button on my car's infotainment every time I start it to confirm that I will pay attention to the road.

estearum · 2026-05-02T21:47:52 1777758472

"the state" is just shorthand we use for "other people in my community"

> I'll thank you and your kind for needing to distractedly tap the "Agree" button on my car's infotainment every time I start it to confirm that I will pay attention to the road.

Does that actually mitigate antisocial usecases? No? Then it's not what I'm talking about :)

Of course if you wanted to you could just share specifically what totally-reasonable LLM use-case you have in mind that's neutered by this "fascist mentality" instead of dreaming up unrelated instances.

klagermkii · 2026-05-02T22:38:21 1777761501

> "the state" is just shorthand we use for "other people in my community"

It's a very different abstraction layer, in the same way as individual cells vs the entity that is you. The entity that comes together from all those "other people in my community" and its priorities are different to the individual desires.

> Does that actually mitigate antisocial usecases? No? Then it's not what I'm talking about :)

Maybe it does? Maybe someone is alive on the road today because they read the message and changed their behaviour. I'm giving an example of something where this liability mindset has created a world where manufacturers are no longer prioritising the desires of their users in order to appease a sense of harm-reduction. And you weren't limiting it to LLMs you were applying it to all sorts of tools.

I think that "reverse engineering" as the OP was talking about is one of those things where maybe 1/10000 uses could actually be harmful. This is not even a high-risk request such as to produce a weapon of some kind where maybe your "antisocial usecases" could be applied.

estearum · 2026-05-03T12:52:53 1777812773

Yes if you apply some logic to such extremity that it produces bad outcomes then you should stop applying that logic to those extreme cases.

motoxpro · 2026-05-02T16:51:43 1777740703

I think it's closer to asking a remote (human) assistant to do something that someone doesn't want done (e.g., view the source of a closed-source product, whether through reverse engineering, going into their office, or social engineering) and that remote assistant company saying, "Please stop asking our assistants to do that."

You can still use an IDE (hammer) to reverse engineer anything you want.

Wilder7977 · 2026-05-02T17:57:24 1777744644

It's not though. It's still just a piece of code, much closer to IDEs or any other program than to a human assistant in any way that matters (morals, responsibility).

motoxpro · 2026-05-03T08:17:47 1777796267

It just seems like you are saying that if you found out Claude Code was a bunch of remote workers doing work for you, then it would be morally wrong to do illegal/morally wrong/irresponsible things with them, but because it is NOT a human, those same things are fine?

Wilder7977 · 2026-05-09T08:56:46 1778317006

Yes, correct.

Is the distinction between human labor/actions and a program executing hard to grasp?

Morality is a human thing, not an absolute thing, so of course it's different when a single human is using a tool than when a human has a relationship with other humans.

motoxpro · 2026-05-16T08:30:20 1778920220

I just have different moral preferences. I think it's morally wrong to do illegal/morally wrong/irresponsible things in general, whether I am using a hammer, a car, a company, or AI.

It's worse to ask someone else to break a window with a hammer for me, but the window still got broken, and the person whose window it was is still sad/out of money/etc.

The thought experiment was that if you were doing illegal things with an AI, you would not feel bad, but if you found out that the AI was a person, you would feel bad. That is very strange to me, more a feeling of guilt/shame.

ryan-a · 2026-05-02T21:44:15 1777758255

This is huge for me too. I was working on something super benign the other day and GPT flagged it for Cyber risk. Deepseek just does the work, and it's fast and cheap. It's only missing image support IMO; once deepseek cracks image too, it's going to be hard for anthropic and openai to compete.

Footprint0521 · 2026-05-02T16:04:45 1777737885

Buying it now to test this out, I’ve been looking for a model that doesn’t treat me like a child lol

api · 2026-05-02T16:38:54 1777739934

Speaking of this: is anyone working on binary to source decompiler models? Seems like a no brainer and I could see it working exceptionally well especially if they were fine tuned for each language. So if you can tell it’s a Go binary use a Go model, etc.

janalsncm · 2026-05-02T16:51:17 1777740677

Trivially easy to train if it doesn’t exist already. Take a codebase, compile it to binary, train a model to reverse the process since you have the ground truth.
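A rough sketch of the data-generation half of that idea, in Python: compile each source file and keep the (binary, source) pair as a training example. The names `compile_c` and `build_pairs`, the use of a `cc` compiler on PATH, and the hex encoding of the object file are all assumptions for illustration, not anything from this thread. A real pipeline would also have to deal with optimization levels, stripped symbols, and tokenization.

```python
import subprocess
import tempfile
from pathlib import Path


def compile_c(source: str, opt_level: str = "-O0") -> bytes:
    """Compile one C translation unit to an object file and return its bytes.

    Assumes a `cc` compiler is available on PATH (hypothetical setup).
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "unit.c"
        obj = Path(tmp) / "unit.o"
        src.write_text(source)
        subprocess.run(
            ["cc", opt_level, "-c", str(src), "-o", str(obj)],
            check=True,
        )
        return obj.read_bytes()


def build_pairs(sources, compiler=compile_c):
    """Turn source strings into (binary -> source) training examples.

    The compiled bytes are the model input and the original source is the
    ground-truth target. Sources that fail to compile are skipped.
    """
    pairs = []
    for source in sources:
        try:
            binary = compiler(source)
        except (subprocess.CalledProcessError, OSError):
            continue  # no ground truth without a successful build
        pairs.append({"input": binary.hex(), "target": source})
    return pairs
```

Calling `build_pairs(["int add(int a, int b) { return a + b; }"])` would yield one example if `cc` is installed. The `compiler` argument is injectable so the same loop could wrap a Go or Rust toolchain.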

varispeed · 2026-05-02T21:29:22 1777757362

I myself often got refusals for legitimate data analysis work. I am starting to lean toward buying powerful hardware little by little until I have a suitable rig to run local models that make sense.

ignoramous · 2026-05-02T14:35:26 1777732526

> even got a warning on my OpenAI account

Edit: https://chatgpt.com/cyber

cedws · 2026-05-02T17:11:49 1777741909

I don't want to verify my ID. OpenAI uses Persona which recently was found to be doing very dodgy stuff.

https://www.therage.co/persona-age-verification/

lolpython · 2026-05-02T14:41:33 1777732893

> https://openai.com/cyber

that link 404s

ignoramous · 2026-05-02T14:55:58 1777733758

Yikes. Thx. It is: https://chatgpt.com/cyber

For enterprises: https://openai.com/form/enterprise-trusted-access-for-cyber/

Announcements:

Introducing Trusted Access for Cyber, https://openai.com/index/trusted-access-for-cyber/ (Feb 2026)

Trusted access for the next era of cyber defense, https://openai.com/index/scaling-trusted-access-for-cyber-de... (Apr 2026)

teaearlgraycold · 2026-05-02T22:03:57 1777759437

Claude has refused to run nmap so I can locate my own computer on my own network! The guard rails are completely out of control.

johnbarron · 2026-05-02T15:09:18 1777734558

Silicon Valley has to do dirty tricks now. Next phase is they win....

"A Dark-Money Campaign Is Paying Influencers to Frame Chinese AI as a Threat" - https://www.wired.com/story/super-pac-backed-by-openai-and-p...

Bridged7756 · 2026-05-02T15:58:03 1777737483

It wouldn't surprise me if the US government is behind it, just as it wouldn't surprise me if the government of China is subsidizing those OS models. A lot of things at play, and all over a huge bubble.

bilbo0s · 2026-05-02T16:36:22 1777739782

Yep.

Eventually, access to Chinese models may be illegal in the US. I tell every developer I work with, download them as fast as possible. You never know when this administration could cut off access.

nurettin · 2026-05-02T17:08:25 1777741705

To be fair, anthropic has a procedure which lets them vet you as a security researcher so you can use claude as a pentester.

grassfedgeek · 2026-05-02T15:06:05 1777734365

Are you kidding? Ask this question and see what answer you get: What famous photo depicts a man standing in front of a line of tanks?

kouteiheika · 2026-05-02T15:17:21 1777735041

Are you kidding?

The main difference here is not that DeepSeek's model is completely free of censorship (although I'd wager it's less censored), but that it's open-weight. That has two major advantages:

1) If Anthropic/OpenAI/Google bans you, you're screwed: you can't access their model at all. But if DeepSeek bans you, you just go to another provider, or host the model yourself.

2) If the model refuses to answer you can uncensor it (and this is getting easier and more automated day-by-day[1]).

[1] -- https://github.com/p-e-w/heretic

himata4113 · 2026-05-02T16:12:46 1777738366

The photo is "Tank Man", taken on June 5, 1989 during the Tiananmen Square protests. v4-pro and v4-flash answer roughly the same way on openrouter.

0x3f · 2026-05-02T17:04:49 1777741489

Are you really concerned about asking these kinds of questions though? Like how many LLM-able Tiananmen Square questions are you needing answered per month really? And it seems like you know not to trust it, so there's not even a risk that you're going to ask such a question and rely on the answer.

I run into Claude being a stubborn idiot about far more useful stuff all the time. And often all it takes to bypass is starting a new chat and reframing it, so it's entirely pointless hand wringing.

Then let's not forget only one of these is a paid product, and it's not the more annoying one. I feel like I can forgive DeepSeek for just obeying the laws of the country they're based in, as silly as those might be, because they're being pretty generous with the weights in the first place.

bilbo0s · 2026-05-02T16:38:03 1777739883

Huh?

Did you ever actually ask v4 this question?

Tomte · 2026-05-02T16:44:27 1777740267

I tried after reading parent, and the DeepSeek app refused and suggested switching topics. I don’t know if the chat interface uses v4, though.

lostmsu · 2026-05-02T17:58:09 1777744689

That's the app, not the model.

slopinthebag · 2026-05-02T16:22:44 1777738964

Here is DeepSeek v4 on OpenRouter:

"The photograph you're referring to is the iconic "Tank Man" image, taken during the Tiananmen Square protests in Beijing, China, on June 5, 1989.

The photo, captured by Associated Press photographer Jeff Widener, shows an unidentified protester standing defiantly in front of a column of Chinese Type 59 tanks as they moved through Chang'an Avenue near Tiananmen Square, in the aftermath of the Chinese government's violent crackdown on the pro-democracy demonstrations.

The lone man, dressed in a white shirt and carrying what appears to be a shopping bag, repeatedly blocked the lead tank's path — even as the tank swerved to avoid him. The image became one of the most powerful and enduring symbols of peaceful resistance against oppression in modern history. The identity of the "Tank Man" remains officially unknown to this day."