Why your AI agent re-reads the same file 30 times a session — and what it costs you

Every modern AI coding agent burns 34–88 % of its input budget re-sending files it already knows. Here's why the loop happens, what it adds up to in dollars, and the cheapest fix.

DRIP contributors · 6 min read

Open your Claude Code logs from a typical refactor session. Filter on read app.py. Count the hits.

On a 400-line file, you'll find that exact line 22 to 34 times in a single 90-minute session. The file didn't change much between reads — maybe an import got added, a function got renamed, three lines moved. But the agent re-sent every byte, every time.

That's the loop. And it's where 34 to 88 % of your input-token budget goes.

The math

A 400-line Python file is roughly 6,000 tokens when serialised the way coding agents do it. Claude Sonnet 4.6 charges $3.00 per million input tokens.

Reads / session    Tokens shipped    Cost (per session)
 5                  30,000           $0.09
15                  90,000           $0.27
30                 180,000           $0.54
50                 300,000           $0.90
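
The table can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the two figures quoted above (about 6,000 tokens per read and $3.00 per million input tokens); the function name `naive_session_cost` is illustrative only.

```python
# Assumptions taken from the article: ~6,000 tokens per 400-line file,
# $3.00 per million input tokens.
TOKENS_PER_READ = 6_000
USD_PER_MILLION_INPUT_TOKENS = 3.00


def naive_session_cost(reads: int) -> tuple[int, float]:
    """Return (tokens shipped, cost in USD) when every read re-sends the whole file."""
    tokens = reads * TOKENS_PER_READ
    cost = tokens * USD_PER_MILLION_INPUT_TOKENS / 1_000_000
    return tokens, cost


for reads in (5, 15, 30, 50):
    tokens, cost = naive_session_cost(reads)
    print(f"{reads:>2} reads  {tokens:>7,} tokens  ${cost:.2f}")
```

Swapping in your own file size or model price shows how quickly the per-session figure moves.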

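Multiply by a working week of refactor sessions, and by the handful of files each session keeps re-reading, and you're looking at $20–$40 per developer-week spent on re-sent content. Across a team of 10, that's a quiet $10,000+ a year (10 developers × $20 × 52 weeks ≈ $10,400 at the low end) for content the agent already has.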

Why doesn't the agent just remember?

Three reasons, in order of severity.

1. The agent has no working memory between tool calls

Each read call in Claude Code is treated as an independent operation. The agent's "memory" is whatever's in its context window — and context windows get compacted (truncated, summarised, dropped) the moment the conversation gets long. Once compaction happens, the agent has to re-read the file because its in-context copy is gone.

2. The hook ecosystem assumes statelessness

Claude Code's PreToolUse / PostToolUse hooks fire on every tool call. There's no built-in primitive for "I already sent this file 4 minutes ago and it hasn't changed." Every hook starts fresh.

3. Pricing pressure is on the model side, not the tooling side

Anthropic ships prompt caching at the API level, but caching matches an identical prefix of the request payload, not the agent's mental model of your codebase. The cache is keyed on what you already sent; a file that comes back as a fresh tool result later in the conversation counts as new input, even if its bytes haven't changed.

So the model has no incentive to flag re-reads. The tooling has no infrastructure to track them. You pay every time.

What "delta read interception" actually means

DRIP sits between your agent and the filesystem with three rules:

  1. First read of a file → full content (semantically compressed when possible).
  2. Re-read of an unchanged file → an [unchanged] sentinel of ~12 tokens.
  3. Re-read of a changed file → a unified diff (--- old / +++ new / @@ hunks) — typically 200–400 tokens regardless of file size.

The agent's mental model doesn't change. The protocol doesn't change. The agent's read tool returns exactly the information needed to update its understanding — no more, no less.
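
The three rules above can be sketched in a few lines. This is a toy in-memory model, not DRIP's actual implementation: the class name `DeltaReader` is hypothetical, the caller passes in the file content instead of reading from disk, and the semantic compression mentioned in rule 1 is left out.

```python
import difflib
import hashlib

UNCHANGED = "[unchanged]"  # sentinel text quoted in the article


class DeltaReader:
    """Toy model of the three read rules (hypothetical, in-memory only)."""

    def __init__(self) -> None:
        # path -> (sha256 hex digest, content last returned to the agent)
        self._seen: dict[str, tuple[str, str]] = {}

    def read(self, path: str, content: str) -> str:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        previous = self._seen.get(path)
        self._seen[path] = (digest, content)

        if previous is None:
            return content  # rule 1: first read, full content
        old_digest, old_content = previous
        if old_digest == digest:
            return UNCHANGED  # rule 2: unchanged file, tiny sentinel
        diff = difflib.unified_diff(
            old_content.splitlines(keepends=True),
            content.splitlines(keepends=True),
            fromfile=f"{path} (old)",
            tofile=f"{path} (new)",
        )
        return "".join(diff)  # rule 3: changed file, only the hunks


reader = DeltaReader()
v1 = "import os\n\ndef main():\n    pass\n"
v2 = "import os\nimport sys\n\ndef main():\n    pass\n"
first = reader.read("app.py", v1)   # full content
second = reader.read("app.py", v1)  # "[unchanged]"
third = reader.read("app.py", v2)   # unified diff containing "+import sys"
```

Hashing the content rather than trusting modification times keeps the sketch simple; a real tool would presumably also need to persist this state across tool calls.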

The same 400-line file, read 30 times in a session (one first read, 24 unchanged re-reads, 5 re-reads after edits), now costs:

1 × 6,000 tok  (first read)
+ 24 × 12 tok  (unchanged sentinels)
+ 5 × 350 tok  (deltas after edits)
= 8,038 tokens

Versus the naïve 180,000. That's roughly a 96 % reduction on the same workflow.
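
To try the same arithmetic with a different mix of reads, here is a small sketch. The constants are the article's own estimates (350 tokens is the midpoint of the 200–400 range quoted for a diff), and `drip_session_tokens` is an illustrative name, not part of any tool.

```python
FULL_READ = 6_000  # tokens: first read of the 400-line file
SENTINEL = 12      # tokens: approximate size of an [unchanged] reply
DELTA = 350        # tokens: assumed size of a unified diff after an edit


def drip_session_tokens(unchanged_reads: int, changed_reads: int) -> int:
    """Tokens for one full read plus the given number of cheap re-reads."""
    return FULL_READ + unchanged_reads * SENTINEL + changed_reads * DELTA


with_drip = drip_session_tokens(unchanged_reads=24, changed_reads=5)
naive = 30 * FULL_READ
reduction = 1 - with_drip / naive
print(with_drip, naive, f"{reduction:.1%}")  # 8038 180000 95.5%
```

The result is dominated by the single full read, so sessions with more re-reads of unchanged files save proportionally more.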

What about edit cycles?

Edits make the math more interesting. When the agent writes to a file and then re-reads it (very common, since agents like to "verify" their own writes), DRIP returns an edit certificate: a hash, the touched line ranges, and the affected symbol names, about 390 bytes in total. The agent gets confirmation that the edit landed without having to re-read the entire file it just wrote.

That single optimisation accounts for ~15 % of total savings in typical sessions — agents verify their own writes far more often than developers realise.
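
The article does not specify the certificate format, so the following is only a sketch of what "hash + touched line ranges + symbol names" could look like. The `EditCertificate` type and `certify_edit` function are hypothetical, and symbol lookup uses Python's `ast` module, so it assumes the edited file is valid Python source.

```python
import ast
import difflib
import hashlib
from dataclasses import dataclass


@dataclass
class EditCertificate:
    sha256: str                     # hash of the file after the edit
    touched: list[tuple[int, int]]  # 1-based inclusive line ranges in the new file
    symbols: list[str]              # functions/classes overlapping those ranges


def certify_edit(old: str, new: str) -> EditCertificate:
    old_lines, new_lines = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    touched: list[tuple[int, int]] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # (j1, j2) is 0-based and half-open; a pure deletion has j1 == j2,
            # so report the single line position where the deletion happened.
            touched.append((j1 + 1, max(j2, j1 + 1)))

    # Collect every function and class definition in the new source.
    defs = [
        node for node in ast.walk(ast.parse(new))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    # Keep the ones whose line span overlaps at least one touched range.
    symbols = [
        node.name
        for node in sorted(defs, key=lambda n: n.lineno)
        if any(node.lineno <= end and node.end_lineno >= start for start, end in touched)
    ]
    digest = hashlib.sha256(new.encode("utf-8")).hexdigest()
    return EditCertificate(digest, touched, symbols)


old = "import os\n\ndef load():\n    return 1\n\ndef save():\n    return 2\n"
new = old.replace("return 2", "return 3")
cert = certify_edit(old, new)
print(cert.touched, cert.symbols)  # [(7, 7)] ['save']
```

A record like this stays small no matter how long the file is, which is the property the edit certificate relies on.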

Bash commands too?

Yes — cat, grep, sed, awk pipelines on a single source file get the same treatment. DRIP recognises common shapes (cat foo | head -50 | grep def, grep PAT file, sed -n '20,40p' file) and caches their outputs the same way. On a re-run with an unchanged source, you get a [DRIP: pipeline unchanged] sentinel.
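
One way to picture the pipeline case: remember, per command string, the hash of the source file it last ran against. The sketch below is an assumption about the general shape, not DRIP's code. `PipelineCache` is a hypothetical name, the command is never parsed or executed by a shell, and the caller supplies the source bytes and a function that produces the output.

```python
import hashlib
from typing import Callable

PIPELINE_UNCHANGED = "[DRIP: pipeline unchanged]"  # sentinel quoted in the article


class PipelineCache:
    """Toy cache: command string -> digest of the source it last ran against."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}

    def run(self, command: str, source: bytes, execute: Callable[[], str]) -> str:
        digest = hashlib.sha256(source).hexdigest()
        if self._seen.get(command) == digest:
            # Same command, same source bytes: the output cannot have changed.
            return PIPELINE_UNCHANGED
        self._seen[command] = digest
        return execute()


def grep_def(source: bytes) -> str:
    """Pure-Python stand-in for `grep def app.py`."""
    lines = source.decode("utf-8").splitlines()
    return "\n".join(line for line in lines if "def" in line)


cache = PipelineCache()
src_v1 = b"import os\ndef main():\n    pass\n"
src_v2 = src_v1 + b"def helper():\n    pass\n"

out1 = cache.run("grep def app.py", src_v1, lambda: grep_def(src_v1))
out2 = cache.run("grep def app.py", src_v1, lambda: grep_def(src_v1))
out3 = cache.run("grep def app.py", src_v2, lambda: grep_def(src_v2))
```

Here `out1` is the real grep output, `out2` is the sentinel, and `out3` is fresh output because the source changed. Recognising which file a pipeline reads from is the hard part and is deliberately left out.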

This is de-emphasised in current marketing because file reads carry the meaningful win, but the bash pipeline support is in the code if you want it.

The cheapest fix

There are exactly two ways to stop paying for re-reads:

  1. Stop using AI coding agents for refactors. (Nobody is going to do this.)
  2. Intercept the reads at the protocol layer before they hit the model.

DRIP is option 2. It's a single 4 MB Rust binary, runs entirely on your machine, has zero network calls, zero telemetry. brew install drip-cli/drip/drip && drip init -g wires it into Claude Code in 8 seconds.

You can verify the savings live in your menu bar with DripMeter, the macOS companion app — or just drip meter from the terminal.
