I suspect the “don’t do that” prompting is more to prevent the model from halluc...

orbital-decay · 2025-06-05T15:20:03 1749136803

Ironically, the negative prompt has some chance of doing the opposite, as it shifts the model's Overton window. That said, I don't think there's a reliable way to prompt LLMs to avoid doing things they've been trained to do (the opposite is easy).

They probably don't give Claude.ai's prompt too much attention anyway; it's always been weird. They've had many glaring bugs over time ("Don't start your response with Of course!" followed by clearly generated examples doing exactly that), they refer to Claude in third person despite first-person measurably performing better, they try to shove everything into a single prompt, etc.

> I assume this capability is used internally (or a better one has been found)

By doing so they would force users to rewrite and re-eval their prompts (costly and unexpected, to put it mildly). Besides, they admitted it was way too crude (and indeed found a slightly better way), and replications of their work show it's expensive and generally not feasible for this purpose.

addaon · 2025-06-06T14:16:17 1749219377

> first-person

Second person?

orbital-decay · 2025-06-07T16:13:33 1749312813

Right.

moritonal · 2025-06-05T07:47:03 1749109623

This would be the actual issue, right? Any AI smart enough to write the good things can also write the bad things, because ethics are something humans made. How long until we have internal court systems for fleets of AI?