
I tried something similar when Llama 2 came out, pitting two assistants against each other, each believing the other was the user. Ultimately, it was the same model talking to itself. Both system prompts carried various instructions to disagree with and criticise the user's opinion. I provided the first message to get things started, usually something along the lines of "nuclear proliferation is harmful to humanity".

After 15 or so iterations, both assistants would keep repeating the same points and find agreement anyway. Sometimes the chat became unhinged and useless, but 95 times out of 100 it ended in agreement.

Happy someone else made it work.
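For anyone curious, the setup is roughly the loop below. This is just my sketch of it: `generate`, `debate`, and the prompt text are illustrative names and placeholders, not any real API, and the stub stands in for an actual model call.

```python
# Sketch of the two-assistant self-play described above. `generate` is a
# stand-in for a real LLM call (e.g. a local Llama 2 chat endpoint); here it
# returns canned text so the loop structure itself is runnable.

def generate(system_prompt, history):
    # Placeholder: a real implementation would send `system_prompt` plus
    # `history` (alternating user/assistant turns) to the model.
    return f"Turn {len(history)}: I disagree with '{history[-1][:30]}...'"

def debate(opening, turns=15):
    # Both sides get the same adversarial instruction; each one sees the
    # other's messages as if they came from a human user.
    system = "Disagree with and criticise the opinion of the user."
    transcript = [("seed", opening)]
    messages = [opening]
    for i in range(turns):
        speaker = "A" if i % 2 == 0 else "B"
        reply = generate(system, messages)
        transcript.append((speaker, reply))
        messages.append(reply)
    return transcript

log = debate("Nuclear proliferation is harmful to humanity.", turns=4)
for speaker, text in log:
    print(f"{speaker}: {text}")
```

In practice the convergence problem shows up inside this loop: by the later turns, both sides' replies are conditioned on an increasingly agreeable shared history.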



I've found the same in my own experiments. This behavior is very persistent with LLMs on default hyperparameters and system prompts. Right now I'm exploring how to get these models to produce more human-like interactions, and a very specific, detailed system prompt seems essential. These systems are VERY sensitive to the system prompt and user input: the quality of the output varies drastically depending not just on the language you use, but on how it's structured, the order of that structure, and many other nuances like system-prompt plus user-input preconditioning.

So far it seems possible to get where we need to for this task, but a lot of exploration is needed to find the right way to structure the whole system. This realization is kind of nuts when you think about it. It basically means that once you find the right words and the order in which to structure them, you can get a 2x+ improvement in every variable you care about. That's why I'm spending some time building an automated solution to find these things for a given model. It's tedious to do manually, but we have the tools to automate the optimization and calibration.
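One cheap shape such an automated search could take is hill-climbing over prompt fragments. Everything here is hypothetical: `FRAGMENTS`, `score`, and `hill_climb` are my own names, and the scoring function is a placeholder for a real evaluation (e.g. rating the model's outputs against a rubric).

```python
import random

# Candidate instructions to splice into a system prompt (illustrative only).
FRAGMENTS = [
    "Respond in a casual, conversational register.",
    "Keep replies under three sentences.",
    "Never open with a restatement of the question.",
    "Vary sentence length and rhythm.",
]

def score(prompt):
    # Placeholder objective: a real version would run the model with this
    # prompt and rate the outputs; here we just count distinct fragments.
    return sum(f in prompt for f in FRAGMENTS)

def hill_climb(base, steps=50, seed=0):
    # Greedy search: try appending a random fragment, keep it if the
    # (placeholder) score improves.
    rng = random.Random(seed)
    best, best_score = base, score(base)
    for _ in range(steps):
        candidate = best + " " + rng.choice(FRAGMENTS)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score
```

A real objective is the hard part, of course; with a stub like this the search trivially saturates, whereas with a model-in-the-loop judge each step costs inference.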


I always assumed you'd have to use different models. Even if only one of them is large, the others would inject enough difference of opinion to keep it useful.


This might be a situation that warrants a higher temperature. Actually, it could be worth starting with a very high temperature and gradually decreasing it.


Even after turning the temperature way up, the outcome was the same, just with less coherent text. Not dismissing the idea, just sharing my experience.
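A minimal version of the annealing idea suggested above: start hot so the two sides diverge early, then cool so later turns stay coherent. The schedule shape and the numbers are just illustrative, not tuned values.

```python
import itertools

def temperature_schedule(t0=1.8, t_min=0.7, decay=0.9):
    """Yield one sampling temperature per turn: start hot, cool
    geometrically, and never drop below t_min."""
    t = t0
    while True:
        yield max(t, t_min)
        t *= decay

# First few turns get high temperatures for divergence, then it settles.
temps = list(itertools.islice(temperature_schedule(), 10))
print(temps)
```

Each debate turn would then sample with `next(schedule)` instead of a fixed temperature; whether that actually prevents convergence is an open question, per the exchange above.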



