Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

In my experience (with ChatGPT 5.1 as of late) is that the AI follows a problem->solution internal logic and doesn't think and try to structure its code.

If you ask for an endpoint to a CRUD API, it'll make one. If you ask for 5, it'll repeat the same code 5 times and modify it for the use case.

A dev wouldn't do this, they would try to figure out the common parts of code, pull them out into helpers, and try to make as little duplicated code as possible.

I feel like the AI has a strong bias towards adding things, and not removing them. The most obviously wrong thing is with CSS - when I try to do some styling, it gets 90% of the way there, but there's almost always something that's not quite right.

Then I tell the AI to fix a style, since that div is getting clipped or not correctly centered etc.

It almost always keeps adding properties, and after 2-3 tries and an incredibly bloated style, I delete the thing and take a step back and think logically about how to properly lay this out with flexbox.



> If you ask for an endpoint to a CRUD API, it'll make one. If you ask for 5, it'll repeat the same code 5 times and modify it for the use case. > >A dev wouldn't do this, they would try to figure out the common parts of code, pull them out into helpers, and try to make as little duplicated code as possible. > >I feel like the AI has a strong bias towards adding things, and not removing them.

I suspect this is because an LLM doesn't build a mental model of the code base like a dev does. It can decide to look at certain files, and maybe you can improve this by putting a broad architecture overview of a system in an agents.md file, I don't have much experience with that.

But for now, I'm finding it most useful still think in terms of code architecture, and give it small steps that are part of that architecture, and then iterate based on your own review of AI generated code. I don't have the confidence in it to just let some agent plan, and then run for tens of minutes or even hours building out a feature. I want to be in the loop earlier to set the direction.


I've noticed this as well, though I've also noticed that you can sometimes avoid it if you're more explicit and actually say things like "can you write these endpoints in a way that ___, ___, and ____." Or I'll mention some context that I'm worried the LLM will miss (for example pointing out when there are already existing functions for doing certain things).

The broader a request is, the more likely I am to get a bunch of bloat. I think this is partly because the LLM will always try to fully solve the problem entirely from your initial prompt. Instead of stopping to clarify something, it'll just move forward with doing something that technically works. I find it's better to break things into smaller steps so that you can "intervene" if it starts to do things wrong.


A good system prompt goes a long way with the latest models. Even just something as simple as "use DRY principles whenever possible." or prompting a plan-implement-evaluate cycle gets pretty good results, at least for tasks that are doing things that AI is well trained on like CRUD APIs.


> If you ask for an endpoint to a CRUD API, it'll make one. If you ask for 5, it'll repeat the same code 5 times and modify it for the use case.

I don’t think this is an inherent issue to the technology. Duplicate code detectors have been around for ages. Given an AI agent a tool which calls one, and ask it to reduce duplication, it will start refactoring.

Of course, there is a risk of going too far in the other direction-refactorings which technically reduce duplication but which have unacceptable costs (you can be too DRY). But some possible solutions: (a) ask it to judge if the refactoring is worth it or not - if it judges no, just ignore the duplication and move on; (b) get a human to review the decision in (a); (c) if AI repeatedly makes wrong decision (according to human), prompt engineering, or maybe even just some hardcoded heuristics


It actually is somewhat a limit of the technology. LLMs can't go back and modify their own output, later tokens are always dependent on earlier tokens and they can't do anything out of order. "Thinking" helps somewhat by allowing some iteration before they give the user actual output, but that requires them to write it the long way and THEN refactor it without being asked, which is both very expensive and something they have to recognize the user wants.


Coding agents can edit their own output - because their output is tool calls to read and write files, and so it can write a file, run some check on it, modify the file to try to make it pass, run the check again, etc


Sorry but from where I sit, this is only marginally closes gap from AI to truly senior engineers.

Basically human junior engineers start by writing code in a very procedural and literal style with duplicate logic all over the place because that's the first step in adapting human intelligence to learning how to program. Then the programmer realizes this leads to things becoming unmaintainable and so they start to learn the abstraction techniques of functions, etc. An LLM doesn't have to learn any of that, because they already know all languages and mechanical technique in their corpus, so this beginning journey never applies.

But what the junior programmer has that the LLM doesn't, is an innate common sense understanding of human goals that are driving the creation of the code to begin with, and that serves them through their entire progression from junior to senior. As you point out, code can be "too DRY", but why? Senior engineers understand that DRYing up code is not a style issue, its more about maintainability and understanding what is likely to change, and what will be the apparent effects to human stakeholders who depend on the software. Basically do these things map to things that are conceptually the same for human users and are unlikely to diverge in the future. This is also a surprisingly deep question as perhaps every human stakeholder will swear up and down they are the same, but nevertheless 6 months from now a problem arises that requires them to diverge. At this point there is now a cognitive overhead and dissonance of explaining that divergence of the users who were heretofore perfectly satisfied with one domain concept.

Ultimately the value function for success of a specific code factoring style depends on a lot of implicit context and assumptions that are baked into the heads of various stakeholders for the specific use case and can change based on myriad outside factors that are not visible to an LLM. Senior engineers understand the map is not the territory, for LLMs there is no territory.


I’m not suggesting AIs can replace senior engineers (I don’t want to be replaced!)

But, senior engineers can supervise the AI, notice when it makes suboptimal decisions, intervene to address that somehow (by editing prompts or providing new tools)… and the idea is gradually the AI will do better.

Rather than replacing engineers with AIs, engineers can use AIs to deliver more in the same amount of time


Which I think points out the biggest issue with current AI - knowledge workers in any profession at any skill level tend to get the impression that AI is very impressive, but is prone to fail at real world tasks unpredictably, thus the mental model of 'junior engineer' or any human that does its simple tasks by itself reliably, is wrong.

AI operating at all levels needs to be constantly supervised.

Which would still make AI a worthwhile technology, as a tool, as many have remarked before me.

The problem is, companies are pushing for agentic AI instead of one that can do repetitve, short horizon tasks in a fast and reliable manner.


Sure. My point was AI was already 25% of the way there even with their verbose messy style. I think with your suggestions (style guidance, human in the loop, etc) we get at most 30% of the way there.


Bad code is only really bad if it needs to be maintained.

If your AI reliably generates working code from a detailed prompt, the prompt is now the source that needs to be maintained. There is no important reason to even look at the generated code


> the prompt is now the source that needs to be maintained

The inference response to the prompt is not deterministic. In fact, it’s probably chaotic since small changes to the prompt can produce large changes to the inference.


The inference response to the prompt is not deterministic.

So? Nobody cares.

Is the output of your C compiler the same every time you run it? How about your FPGA synthesis tool? Is that deterministic? Are you sure?

What difference does it make, as long as the code works?


> Is the output of your C compiler the same every time you run it?

Yes? Because of actual engineering mind you and not rolling the dice until the lucky number comes up.

https://reproducibility.nixos.social/evaluations/2/2d293cbfa...


It's not true for a place-and-route engine, so why does it have to be true for a C compiler?

Nobody else cares. If you do, that's great, I guess... but you'll be outcompeted by people who don't.



That's an advertisement, not an answer.


Did you really read and understand this page in the 1 minute between my post and your reply or did you write a dismissive answer immediately?


Eh, I'll get an LLM to give me a summary later.

In the meantime: no, deterministic code generation isn't necessary, and anyone who says it is is wrong.


The C compiler will still make working programs every time, so long as your code isn’t broken. But sometimes the code chatgpt produces won’t work. Or it'll kinda work but you’ll get weird, different bugs each time you generate it. No thanks.


Nothing matters but d/dt. It's so much better than it was a year ago, it's not even funny.

How weird would it be if something like this worked perfectly out of the box, with no need for further improvement and refinement?


> So? Nobody cares

Yeah the business surely won't care when we rerun the prompt and the server works completely differently.

> Is the output of your C compiler the same every time you run it

I've never, in my life, had a compiler generate instructions that do something completely different from what my code specifies.

That you would suggest we will reach a level where an English language prompt will give us deterministic output is just evidence you've drank the kool-aid. It's just not possible. We have code because we need to be that specific, so the business can actually be reliable. If we could be less specific, we would have done that before AI. We have tried this with no code tools. Adding randomness is not going to help.


I've never, in my life, had a compiler generate instructions that do something completely different from what my code specifies.

Nobody is saying it should. Determinism is not a requirement for this. There are an infinite number of ways to write a program that behaves according to a given spec. This is equally true whether you are writing the source code, an LLM is writing the source code, or a compiler is generating the object code.

All that matters is that the program's requirements are met without undesired side effects. Again, this condition does not require deterministic behavior on the author's part or the compiler's.

To the extent it does require determinism, the program was poorly- or incompletely-specified.

That you would suggest we will reach a level where an English language prompt will give us deterministic output is just evidence you've drank the kool-aid.

No, it's evidence that you're arguing with a point that wasn't made at all, or that was made by somebody else.


You're on the wrong axis. You have to be deterministic about following the spec, or it's a BUG in the compiler. Whether or not you actually have the exact same instructions, a compiler will always do what the code says or it's bugged.

LLMs do not and cannot follow the spec of English reliably, because English is open to interpretation, and that's a feature. It makes LLMs good at some tasks, but terrible for what you're suggesting. And it's weird because you have to ignore the good things about LLMs to believe what you wrote.

> There are an infinite number of ways to write a program that behaves according to a given spec

You're arguing for more abstractions on top of an already leaky abstraction. English is not an appropriate spec. You can write 50 pages of what an app should do and somebody will get it wrong. It's good for ballparking what an app should do, and LLMs can make that part faster, but it's not good for reliably plugging into your business. We don't write vars, loops, and ifs for no reason. We do it because, at the end of the day, an English spec is meaningless until someone actually encodes it into rules.

The idea that this will be AI, and we will enjoy the same reliability we get with compilers, is absurd. It's also not even a conversation worth having when LLMs hallucinate basic linux commands.


People are betting trillions that you're the one who's "on the wrong axis." Seems that if you're that confident, there's money to be made on the other side of the market, right? Got any tips?

Essentially all of the drawbacks to LLMs you're mentioning are either already obsolete or almost so, or are solvable by the usual philosopher's stone in engineering: negative feedback. In this case, feedback from carefully-structured tests. Safe to say that we'll spend more time writing tests and less time writing original code going forward.


> People are betting trillions that you're the one who's "on the wrong axis."

People are betting trillions of dollars that AI agents will do a lot of useful economic work in 10 years. But if you take the best LLMs in the world, and ask them to make a working operating system, C compiler or web browser, they fail spectacularly.

The insane investment in AI isn't because today's agents can reliably write software better than senior developers. The investment is a bet that they'll be able to reliably solve some set of useful problems tomorrow. We don't know which problems they'll be able to reliably solve, or when. They're already doing some useful economic work. And AI agents will probably keep getting smarter over time. Thats all we know.

Maybe in a few years LLMs will be reliable enough to do what you're proposing. But neither I - nor most people in this thread - think they're there yet. If you think we're wrong, prove us wrong with code. Get ChatGPT - or whichever model you like - to actually do what you're suggesting. Nobody is stopping you.


Get ChatGPT - or whichever model you like - to actually do what you're suggesting. Nobody is stopping you.

I do, all the time.

But if you take the best LLMs in the world, and ask them to make a working operating system, C compiler or web browser, they fail spectacularly.

Like almost any powerful tool, there are a few good ways to use LLM technology and countless bad ways. What kind of moron would expect "Write an operating system" or "Write a compiler" or "Write a web browser" to yield anything but plagiarized garbage? A high-quality program starts with a high-quality specification, same as always. Or at least with carefully-considered intent.

The difference is, given a sufficiently high-quality specification, an LLM can handle the specification->source step, just as a compiler or assembler relieves you of having to micromanage the source->object code step.

IMHO, the way it will shake out is that LLMs as we know them today will be only components, perhaps relatively small ones, of larger systems that translate human intent to machine function. What we call "programming" today is only one implementation of a larger abstraction.


I think this might be plausible in the future, but it needs a lot more tooling. For starters you need to be able to run the prompt through the exact same model so you can reproduce a "build".


Even the exact same model isn't enough. There are several sources of nondeterminism in LLMs. These would all need to be squashed or seeded - which as far as I know isn't a feature that openai / anthropic / etc provide.


OK, then the current models aren't as good as I thought/hoped.

I guess one thing it means is that we still need extensive test suites. I suppose an LLM can write those too.


Well.. except the AI models are nondeterministic. If you ask an AI the same prompt 20 times, you'll get 20 different answers. Some of them might work, some probably won't. It usually takes a human to tell which are which and fix problems & refactor. If you keep the prompt, you can't manually modify the generated code afterwards (since it'll be regenerated). Even if you get the AI to write all the code correctly, there's no guarantee it'll do the same thing next time.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: