I’m building Sift, a drop-in gateway that makes LLM tool use far more reliable when tools return large JSON payloads. The usual pattern is that agents paste raw tool outputs directly into the prompt, which quickly blows up context, causes truncation/compaction, and leads to incorrect answers once earlier results disappear. Sift sits between the model and its tools (MCP, APIs, CLIs), stores the full payload locally as an artifact (indexed in SQLite), and returns only a compact schema plus an artifact_id. When the model needs something from the data, it runs a tiny Python query against the stored artifact instead of reasoning over thousands of tokens of JSON. In benchmarks across 103 questions on real datasets, this approach cut input tokens by ~95% and improved answer accuracy from ~33% to ~99%. Repo: https://github.com/lourencomaciel/sift-gateway.
I was talking about part of this problem just this morning; I keep running into it with our large OpenAPI spec and its JSON parsing. But it occurred to me that recently Claude will sometimes work around the context-blowup problem by using a sub-agent to do the parsing. Have you seen that?
It's still a big time saver to have something like this, and it stops the model from even risking the raw-paste approach in the first place.