These models making bad / tasteless decisions about what dependencies to pull in...

KellyCriterion · 2026-03-07T13:30:27

You need to tell the model what to do and what not to do. The dependency thing is an issue, yes, but you can tell the model not to do that, and you should always know what result the prompt is supposed to produce. You certainly must be able to read, understand, and judge the code; completely fire-and-forget is not possible, in my experience. I see many people saying "I had one mega prompt and after 2 days the app was ready," but I always take those claims with a grain of salt.

mikkupikku · 2026-03-07T14:34:05

Absolutely. These models still need a lot of this sort of hand-holding, so they work best in experienced hands. I'm also skeptical of those very long runs. Letting it go that long without active oversight must surely produce at least some objectionable design or implementation details, right? So I guess the people claiming those sorts of results care less about these kinds of qualities.

KellyCriterion · 2026-03-07T16:58:36

Yes, even Claude Opus 4.6 still runs into mishaps in longer chats that last 3 to 4 days. But it's getting better and better.