Torturing a model with human stupidity probably doesn't align with their position on model welfare. I wonder if they tried bullying it into hacking its way out of the slop gulag.
So if all the AI code is being reviewed by humans (not sure this is true, but let's assume it is), then why are there 5000+ bugs? Are you blaming the Anthropic developers rather than the AI?
https://github.com/anthropics/claude-code/issues?q=is%3Aissu...
Apparently whatever SWE-bench is measuring isn't very relevant.