Hacker News

Long-term effectiveness? LLMs are such a fast-moving target. Suppose Anthropic reached out to you and gave you a model ID you could pin for the next year, freezing any A/B tests. Would you really want that? Next month a new model could be released to everyone else, or by a competitor, that's a big step change in performance on the tasks you care about. Would you rather stay on your own path, learning about a state of the world that no longer exists? November-ish 2025 and after, for example, seemed like software engineering changed forever because of the improvements in Opus.


> Suppose Anthropic reached out to you and gave you a model ID you could pin for the next year, freezing any A/B tests. Would you really want that?

Where can I sign up?


If you really want to keep non-determinism down, you could try: (1) see if you can freeze the installed version of the Claude Code client app (I haven't looked into the details of preventing auto-updates, because I'm a bleeding-edge person), and (2) pin to a specific model version, which I'd think would reduce A/B test exposure to some extent: https://support.claude.com/en/articles/11940350-claude-code-...

Edit: how to disable auto-updates of the client app: https://code.claude.com/docs/en/setup#disable-auto-updates
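For what it's worth, both knobs mentioned above can be set via environment variables. A minimal sketch, assuming the `DISABLE_AUTOUPDATER` and `ANTHROPIC_MODEL` variables from the Claude Code docs; the specific model ID below is only illustrative, not a recommendation:

```shell
# Freeze the client: stop the Claude Code app from auto-updating itself.
export DISABLE_AUTOUPDATER=1

# Pin a dated model snapshot instead of a floating alias like "opus".
# (Illustrative ID; substitute whatever snapshot you want to stay on.)
export ANTHROPIC_MODEL="claude-opus-4-20250514"

echo "pinned model: $ANTHROPIC_MODEL"
```

Note that pinning a dated snapshot protects you from silent model upgrades, but it can't rule out server-side experiments run against the same snapshot.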


> Suppose Anthropic reached out to you and gave you a model ID you could pin for the next year, freezing any A/B tests. Would you really want that?

Yes. I'd like some guarantee that my results are reproducible for a reasonable amount of time. New versions can also introduce regressions: a prompt that works well with today's model might not work with tomorrow's, even if the latter is "better".



