
In my experience with AI coding, very large context windows aren't useful in practice. Every model seems to get confused when you feed them more than ~25-30k tokens. The models stop obeying their system prompts, can't correctly find/transcribe pieces of code in the context, etc.

Developing aider, I've seen this problem with gpt-4o, Sonnet, DeepSeek, etc. Many aider users report this too. It's perhaps the #1 problem users have, so I created a dedicated help page [0].

Very large context may be useful for certain tasks with lots of "low value" context. But for coding, it seems to lure users into a problematic regime.

[0] https://aider.chat/docs/troubleshooting/edit-errors.html#don...



Aider is great, but you need specific formats from the LLM. That might be where the challenge is.

I've used the giant context in Gemini to dump a code base and say: describe the major data structures and data flows.

Things like that, overview documents, work great. It's amazing for orienting in an unfamiliar codebase.


Yes, that is true. Aider expects to work with the LLM to automatically apply edits to the source files. This requires precision from the LLM, which is what breaks down when you overload them with context.


Not true. In Aider the patch produced by the LLM is sent to a second model that is just tasked with fixing the patch — it works wonders.


Based on an earlier comment, I think the person you're replying to is the author of aider.


Yes, aider can also work in architect/editor mode [0] which tends to produce the best results [1]. An architect model solves the coding problem and describes the needed changes however comes naturally to it. The editor model then takes that solution and turns it into correctly formatted instructions to edit the files.

Too much context can still confuse the LLMs in this situation, but they may be somewhat more resilient.

[0] https://aider.chat/2024/09/26/architect.html

[1] https://aider.chat/2025/01/24/r1-sonnet.html
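
In pseudocode the flow is just two model calls. Here is a minimal sketch using the OpenAI Python client (the model names and prompts are placeholders, not aider's actual implementation):

    from openai import OpenAI

    client = OpenAI()

    def architect_then_edit(task: str, files: str) -> str:
        # Stage 1: the architect reasons about the change in whatever
        # format comes naturally to it.
        plan = client.chat.completions.create(
            model="architect-model",  # placeholder
            messages=[
                {"role": "system",
                 "content": "Describe the code changes needed. Any format is fine."},
                {"role": "user", "content": task + "\n\n" + files},
            ],
        ).choices[0].message.content

        # Stage 2: the editor only has to translate that plan into
        # correctly formatted file edits, a much narrower job.
        return client.chat.completions.create(
            model="editor-model",  # placeholder
            messages=[
                {"role": "system",
                 "content": "Turn this plan into search/replace edit blocks."},
                {"role": "user", "content": plan + "\n\n" + files},
            ],
        ).choices[0].message.content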


My hypothesis is code completion is not a text completion problem. More of a graph completion one.

So we may have got to a local maximum regarding code helpers with LLMs and we'll have to wait for some breakthrough in the AI field before we get something better.


But these models don't work that well even for text when you give them a huge context. They're reasonably good at summarization, but if you ask them to "continue the story" they will write very inconsistent things (eerily similar to what a sloppy human writer does, though).


We should be able to provide two fields, context and prompt, so the prompt gets higher priority and doesn't get mixed in with the whole context.
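
You can approximate that today with message roles, though it's only a soft hint to the model, not a hard priority. A rough sketch with the OpenAI client:

    from openai import OpenAI

    client = OpenAI()
    big_context = open("repo_dump.txt").read()  # the low-priority bulk context

    # Approximating a context/prompt split with message roles. Models tend
    # to weight the system message and the final user message more heavily,
    # but nothing here is guaranteed.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Answer the final question using the reference material."},
            {"role": "user", "content": "Reference material:\n" + big_context},
            {"role": "user", "content": "Question: summarize the main data flows."},
        ],
    )
    print(response.choices[0].message.content)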


For this breakthrough to happen, big tech will need to hire software engineers again :)

But the good thing is that DeepSeek proved those breakthroughs are going to happen one way or another, fast.


I concur. In my work (analysing news show transcripts and descriptions), I work with about 250k input tokens max. Tasks include:

- Summarize topics (with references to shows)

- Find quotes specific to a topic (again with references)

Anything above 32k tokens fails to have acceptable recall, across GPT-4o, Sonnet, and Google's Gemini Flash 1.5 and 2.0.

I suppose it kind of makes sense, given how large context windows are implemented via things like sparse attention etc.


What could be the reason? Do they selectively skip tokens to make it appear they support the full context?


Thanks for aider! It has become an integral part of my workflow. Looking forward to trying DeepSeek in architect mode with Sonnet as the driver. Curious whether it will be a noticeable improvement compared to using Sonnet by itself.


I'm guessing you're interested in R1+Sonnet because of the recent SOTA benchmark result? It does seem to be a powerful architect/editor combo.

https://aider.chat/2025/01/24/r1-sonnet.html


Claude works incredibly well for me when asking for code changes to projects filling up 80% of the context (160K tokens). It's quite expensive via the API, though, but reasonable through the web interface with Pro.


It’s not just the quantity of tokens in context that matters, but the coherence of the concepts in the context.

Many conflicting ideas are harder for models to follow than one large unified idea.


In my own experience with OpenAI's gpt-4o, when fed long input the model simply 'strips' the starting content and answers anyway, when it should instead output an error as more 'accurate' feedback. I switched to my own deployed model for that reason.

That's a product choice by OpenAI, not a limitation of the LLM's long-context input capability.


The behaviour you described is what happens when you have small context windows. Perhaps you're feeding the models more tokens than you think you are. I have enjoyed loading large codebases into AI Studio and getting very satisfying and accurate answers, because the models have 1M to 2M token context windows.


How do you get those large codebases into AI Studio? Concat everything into one big file?



Concat to a file, but it helps to make an ASCII tree at the top and then, for each merged file, output its path and orientation details. I've also started playing with adding line ranges to the ASCII tree, hoping that the LLMs (more specifically the agentic ones) start getting smart enough to jump to the relevant section.
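
Roughly like this, as a minimal sketch (a flat path list rather than a true tree, and Python files only, to keep it short):

    from pathlib import Path

    def dump_repo(root: str, out: str = "repo_dump.txt") -> None:
        files = sorted(p for p in Path(root).rglob("*.py") if p.is_file())
        parts = [(p, p.read_text(errors="replace").splitlines()) for p in files]

        # The path list goes first, so body line numbers start after it.
        cursor = len(parts) + 2   # line number of the first "=== path ===" header
        tree, bodies = [], []
        for path, lines in parts:
            tree.append(f"{path}  (lines {cursor + 1}-{cursor + len(lines)})")
            bodies.append("\n".join([f"=== {path} ==="] + lines))
            cursor += len(lines) + 1   # file body plus its header line
        Path(out).write_text("\n".join(tree) + "\n\n" + "\n".join(bodies) + "\n")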


Basically yes, I have a helper program, but that's mainly what it does.


I learned this very explicitly recently. I've had some success with project and branch prompts - feeding a bunch of context into the beginning of each dialog.

In one dialog, some 30k tokens later, Claude requested the contents of package.json... which was in the context window already - the whole file!

The strange thing was that after I said so, without re-inserting, Claude successfully read it from context to fill the gap in what it was trying to do.

It's as if a synopsis of what exists in-context delivered with each message would help. But that feels weird!


That's what these chat models are already doing.

Most chat is just a long running prompt. LLMs have zero actual memory. You just keep feeding it history.

Maybe I misunderstood what you're saying, but what you're describing is some kind of second model that condenses the history and feeds that back in; this has been done.

Really, what you probably need is another model managing the heap and the stack of the history and bringing forward the current context.

But that's easy to say because we are humans.
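
The condensing idea, as a sketch (the model name, threshold, and prompt are all made up):

    from openai import OpenAI

    client = OpenAI()

    def compact_history(messages: list[dict], keep_recent: int = 6) -> list[dict]:
        # Leave short conversations alone.
        if len(messages) <= keep_recent:
            return messages
        old, recent = messages[:-keep_recent], messages[-keep_recent:]
        # A second model condenses the older turns into one note.
        summary = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "Condense this conversation. Keep facts, file "
                            "contents that were shared, and open tasks."},
                {"role": "user", "content": str(old)},
            ],
        ).choices[0].message.content
        note = {"role": "system",
                "content": "Earlier conversation, condensed: " + summary}
        return [note] + recent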


To clarify:

I didn't 'fix' the problem by re-inserting the package.json. I just gave a reminder that it was already in context.

"I could confirm this root cause if I could see the contents of package.json".

"You can see it."

"Whoops, yes. Package x needs to bump from n to n+1."

The point being, even info inside the current context can actively be overlooked unless it is specifically referenced.


Maybe the problem is that the "UI" we're providing to the LLMs is not very useful.

Imagine dumping the entire text of a large code repository in front of a human programmer, and asking them to fix a bug. Human programmers use IDEs, search through the code, flip back and forth between different functions, etc. Maybe with a better interface that the LLM could interact with, it would perform better.
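
For example, instead of one giant dump, you could expose IDE-like tools via function calling and let the model search and open files on demand. A sketch of two tool schemas in the OpenAI function-calling format (the tool names and any backing implementation are hypothetical):

    # Hypothetical IDE-like tools the model can call, so it navigates
    # the repo instead of receiving the whole thing at once.
    tools = [
        {
            "type": "function",
            "function": {
                "name": "search_code",
                "description": "Grep the repository for a string or regex.",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "open_file",
                "description": "Return one file's contents, optionally a line range.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "path": {"type": "string"},
                        "start_line": {"type": "integer"},
                        "end_line": {"type": "integer"},
                    },
                    "required": ["path"],
                },
            },
        },
    ]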


I wonder if you could figure out a pseudo-code format, something Python-like. I'd think YAML might work also:

    Something:
      Filename: index.js
      Content: |
        Class Example...

Another item would be some kind of hyperlinking. Maybe you could load in hrefs, but there might be a more semantically popular way; the data feeding these AIs just isn't constructed like that.


Overall accuracy degradation on longer contexts is just one major issue. Another is that the lost-in-the-middle problem gets much worse as contexts grow, so when the context significantly exceeds the length of the model's training examples, the tokens in the middle might as well not exist.


Any idea why this happens?


Yeah, and thanks to the features of the programming language, it's very easy to automatically assemble a highly relevant but short context, just by following symbol references recursively.
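
A toy version of that for a single Python file, using the ast module (real cross-module resolution needs much more machinery):

    import ast

    def relevant_context(source: str, seed: str, depth: int = 3) -> str:
        # Collect the seed definition plus definitions it references, recursively.
        tree = ast.parse(source)
        defs = {node.name: node for node in ast.walk(tree)
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                                     ast.ClassDef))}
        wanted, frontier = set(), {seed}
        for _ in range(depth):
            nxt = set()
            for name in frontier:
                node = defs.get(name)
                if node is None or name in wanted:
                    continue
                wanted.add(name)
                # Every bare name used inside this definition is a candidate
                # reference to another top-level definition.
                nxt |= {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
            frontier = nxt & defs.keys()
        return "\n\n".join(ast.unparse(defs[name]) for name in sorted(wanted))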



