
In my experience with AI coding, very large context windows aren't useful in practice. Every model seems to get confused when you feed them more than ~25-30k tokens. The models stop obeying their system prompts, can't correctly find/transcribe pieces of code in the context, etc.

Developing aider, I've seen this problem with gpt-4o, Sonnet, DeepSeek, etc. Many aider users report this too. It's perhaps the #1 problem users have, so I created a dedicated help page [0].

Very large context may be useful for certain tasks with lots of "low value" context. But for coding, it seems to lure users into a problematic regime.

[0] https://aider.chat/docs/troubleshooting/edit-errors.html#don...



Aider is great, but you need specific formats from the LLM. That might be where the challenge is.

I've used the giant context in Gemini to dump a code base and say: describe the major data structures and data flows.

Things like that, overview documents, work great. It's amazing for orienting in an unfamiliar codebase.


Yes, that is true. Aider expects to work with the LLM to automatically apply edits to the source files. This requires precision from the LLM, which is what breaks down when you overload them with context.


Not true. In Aider the patch produced by the LLM is sent to a second model that is just tasked with fixing the patch — it works wonders.


Based on an earlier comment, I think the person you're replying to is the author of aider.


Yes, aider can also work in architect/editor mode [0] which tends to produce the best results [1]. An architect model solves the coding problem and describes the needed changes however comes naturally to it. The editor model then takes that solution and turns it into correctly formatted instructions to edit the files.

Too much context can still confuse the LLMs in this situation, but they may be somewhat more resilient.

[0] https://aider.chat/2024/09/26/architect.html

[1] https://aider.chat/2025/01/24/r1-sonnet.html
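
In pseudocode the flow is just two model calls. Here is a minimal sketch using the OpenAI Python client (the model names and prompts are placeholders, not aider's actual implementation):

    from openai import OpenAI

    client = OpenAI()

    def architect_then_edit(task: str, files: str) -> str:
        # Stage 1: the architect reasons about the change in whatever
        # format comes naturally to it.
        plan = client.chat.completions.create(
            model="architect-model",  # placeholder
            messages=[
                {"role": "system",
                 "content": "Describe the code changes needed. Any format is fine."},
                {"role": "user", "content": task + "\n\n" + files},
            ],
        ).choices[0].message.content

        # Stage 2: the editor only has to translate that plan into
        # correctly formatted file edits, a much narrower job.
        return client.chat.completions.create(
            model="editor-model",  # placeholder
            messages=[
                {"role": "system",
                 "content": "Turn this plan into search/replace edit blocks."},
                {"role": "user", "content": plan + "\n\n" + files},
            ],
        ).choices[0].message.content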


My hypothesis is code completion is not a text completion problem. More of a graph completion one.

So we may have got to a local maximum regarding code helpers with LLMs and we'll have to wait for some breakthrough in the AI field before we get something better.


But these models don't work that well even for text when you give them a huge context. They're reasonably good at summarization, but if you ask them to "continue the story" they will write very inconsistent things (eerily similar to what a sloppy human writer does, though).


We should be able to provide two fields, context and prompt, so the prompt gets higher priority and doesn't get mixed in with the whole context.
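
You can approximate that today with message roles, though it's only a soft hint to the model, not a hard priority. A rough sketch with the OpenAI client:

    from openai import OpenAI

    client = OpenAI()
    big_context = open("repo_dump.txt").read()  # the low-priority bulk context

    # Approximating a context/prompt split with message roles. Models tend
    # to weight the system message and the final user message more heavily,
    # but nothing here is guaranteed.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Answer the final question using the reference material."},
            {"role": "user", "content": "Reference material:\n" + big_context},
            {"role": "user", "content": "Question: summarize the main data flows."},
        ],
    )
    print(response.choices[0].message.content)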


For this breakthrough to happen, big tech will need to hire software engineers again :)

But the good thing is that DeepSeek proved those breakthroughs are going to happen one way or another, fast.


I concur. In my work (analysing news show transcripts and descriptions), I work with about 250k input tokens max. Tasks include:

- Summarize topics (with references to shows)

- Find quotes specific to a topic (again with references)

Anything above 32k tokens fails to have acceptable recall, across GPT-4o, Sonnet, and Google's Gemini Flash 1.5 and 2.0.

I suppose it kind of makes sense, given how large context windows are implemented via things like sparse attention etc.


What could be the reason? Do they selectively skip tokens to make it appear they support the full context?


Thanks for aider! It has become an integral part of my workflow. Looking forward to trying DeepSeek in architect mode with Sonnet as the driver. Curious whether it will be a noticeable improvement compared to using Sonnet by itself.


I'm guessing you're interested in R1+Sonnet because of the recent SOTA benchmark result? It does seem to be a powerful architect/editor combo.

https://aider.chat/2025/01/24/r1-sonnet.html


Claude works incredibly well for me when asking for code changes to projects filling up 80% of the context (160K tokens). It's quite expensive via the API, though, but reasonable through the web interface with Pro.


It’s not just the quantity of tokens in context that matters, but the coherence of the concepts in the context.

Many conflicting ideas are harder for models to follow than one large unified idea.


In my own experience with OpenAI's gpt-4o, when fed long input the model simply 'strips' the starting content and answers anyway, when it should instead output an error as more 'accurate' feedback. I switched to my own deployed model for that reason.

That's a product choice by OpenAI, not a limitation of the LLM's long-context input capability.


The behaviour you described is what happens when you have small context windows. Perhaps you're feeding the models more tokens than you think you are. I have enjoyed loading large codebases into AI Studio and getting very satisfying and accurate answers, because the models have 1M to 2M token context windows.


How do you get those large codebases into AI Studio? Concat everything into one big file?



Concat to a file, but it helps to make an ASCII tree at the top and then, for each merged file, output its path and orientation details. I've also started playing with adding line ranges to the ASCII tree, hoping that the LLMs (more specifically the agentic ones) start getting smart enough to jump to the relevant section.
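
Roughly like this, as a minimal sketch (a flat path list rather than a true tree, and Python files only, to keep it short):

    from pathlib import Path

    def dump_repo(root: str, out: str = "repo_dump.txt") -> None:
        files = sorted(p for p in Path(root).rglob("*.py") if p.is_file())
        parts = [(p, p.read_text(errors="replace").splitlines()) for p in files]

        # The path list goes first, so body line numbers start after it.
        cursor = len(parts) + 2   # line number of the first "=== path ===" header
        tree, bodies = [], []
        for path, lines in parts:
            tree.append(f"{path}  (lines {cursor + 1}-{cursor + len(lines)})")
            bodies.append("\n".join([f"=== {path} ==="] + lines))
            cursor += len(lines) + 1   # file body plus its header line
        Path(out).write_text("\n".join(tree) + "\n\n" + "\n".join(bodies) + "\n")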


Basically yes, I have a helper program, but that's mainly what it does.


I learned this very explicitly recently. I've had some success with project and branch prompts - feeding a bunch of context into the beginning of each dialog.

In one dialog, some 30k tokens later, Claude requested the contents of package.json... which was in the context window already - the whole file!

The strange thing was that after I said so, without re-inserting, Claude successfully read it from context to fill the gap in what it was trying to do.

It's as if a synopsis of what exists in-context delivered with each message would help. But that feels weird!


That's what these chat models are already doing.

Most chat is just a long running prompt. LLMs have zero actual memory. You just keep feeding it history.

Maybe I misunderstood what you're saying, but what you're describing is some kind of second model that condenses the history and feeds that back in; this has been done.

Really, what you probably need is another model managing the heap and the stack of the history and bringing forward the current context.

But that's easy to say because we are humans.
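
The condensing idea, as a sketch (the model name, threshold, and prompt are all made up):

    from openai import OpenAI

    client = OpenAI()

    def compact_history(messages: list[dict], keep_recent: int = 6) -> list[dict]:
        # Leave short conversations alone.
        if len(messages) <= keep_recent:
            return messages
        old, recent = messages[:-keep_recent], messages[-keep_recent:]
        # A second model condenses the older turns into one note.
        summary = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "Condense this conversation. Keep facts, file "
                            "contents that were shared, and open tasks."},
                {"role": "user", "content": str(old)},
            ],
        ).choices[0].message.content
        note = {"role": "system",
                "content": "Earlier conversation, condensed: " + summary}
        return [note] + recent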


To clarify:

I didn't 'fix' the problem by re-inserting the package.json. I just gave a reminder that it was already in context.

"I could confirm this root cause if I could see the contents of package.json".

"You can see it."

"Whoops, yes. Package x needs to bump from n to n+1."

The point being, even info inside the current context can actively be overlooked unless it is specifically referenced.


Maybe the problem is that the "UI" we're providing to the LLMs is not very useful.

Imagine dumping the entire text of a large code repository in front of a human programmer, and asking them to fix a bug. Human programmers use IDEs, search through the code, flip back and forth between different functions, etc. Maybe with a better interface that the LLM could interact with, it would perform better.
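
For example, instead of one giant dump, you could expose IDE-like tools via function calling and let the model search and open files on demand. A sketch of two tool schemas in the OpenAI function-calling format (the tool names and any backing implementation are hypothetical):

    # Hypothetical IDE-like tools the model can call, so it navigates
    # the repo instead of receiving the whole thing at once.
    tools = [
        {
            "type": "function",
            "function": {
                "name": "search_code",
                "description": "Grep the repository for a string or regex.",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "open_file",
                "description": "Return one file's contents, optionally a line range.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "path": {"type": "string"},
                        "start_line": {"type": "integer"},
                        "end_line": {"type": "integer"},
                    },
                    "required": ["path"],
                },
            },
        },
    ]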


I wonder if you could figure out a pseudo-code format, something Python-like. I'd think YAML might work also:

    Something:
      Filename: index.js
      Content: |
        Class Example...

Another item would be some kind of hyperlinking. Maybe you could load in hrefs, but there might be a more semantically popular way; the data feeding these AIs just isn't constructed like that.


Overall accuracy degradation on longer contexts is just one major issue. Another is that the lost-in-the-middle problem gets much worse as contexts grow, so when the context significantly exceeds the length of the model's training examples, the tokens in the middle might as well not exist.


Any idea why this happens?


Yeah, and thanks to the features of the programming language, it's very easy to automatically assemble a highly relevant but short context, just by following symbol references recursively.
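
A toy version of that for a single Python file, using the ast module (real cross-module resolution needs much more machinery):

    import ast

    def relevant_context(source: str, seed: str, depth: int = 3) -> str:
        # Collect the seed definition plus definitions it references, recursively.
        tree = ast.parse(source)
        defs = {node.name: node for node in ast.walk(tree)
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                                     ast.ClassDef))}
        wanted, frontier = set(), {seed}
        for _ in range(depth):
            nxt = set()
            for name in frontier:
                node = defs.get(name)
                if node is None or name in wanted:
                    continue
                wanted.add(name)
                # Every bare name used inside this definition is a candidate
                # reference to another top-level definition.
                nxt |= {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
            frontier = nxt & defs.keys()
        return "\n\n".join(ast.unparse(defs[name]) for name in sorted(wanted))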



