Cost & Token Analysis

Real benchmark: five developer questions about the antilist repo (Convex + React, 66 files, ~50 k tokens of source), run twice — with raw file context and with Codegraph MCP tools. Measured on gpt-4o-mini and local llama3.1:8b.

85% fewer input tokens used
84% cost reduction per session
3.4 avg tool calls per question

Benchmark · 5 questions on gpt-4o-mini

Same repo, same questions, two runs. Direct loads source files into the prompt; MCP queries the graph through Codegraph tools instead.

  1. Where is the Convex schema defined and what tables or fields does it declare?
  2. What is the save function in convex/userData.ts used for, and what calls it?
  3. What does the App component in App.tsx do and which child screens does it render?
  4. Which source files import from the Convex generated API (convex/_generated/api)?
  5. How does HomeScreen in components/HomeScreen.tsx connect to Convex mutations or queries?
Q      Direct (in)  MCP (in)  Tools  Direct (ms)  MCP (ms)  Saved
Q1          50,322     8,978      6        8,000     7,034    82%
Q2          50,327     5,533      3       13,230     4,644    89%
Q3          50,324     9,766      2        8,803     4,476    81%
Q4          50,323     2,229      1        3,181     2,045    96%
Q5          50,324    11,912      5        9,186     8,085    76%
Total      251,620    38,418     17       42,400    26,284    85%

Cost on gpt-4o-mini ($0.15 / M in, $0.60 / M out): $0.0387 → $0.0062 across all five questions. Same model, same answers, 84% cheaper and 38% faster end-to-end. On Claude Sonnet or GPT-4 the dollar savings scale proportionally.
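The Saved column and the input side of the cost figure can be re-derived from the table. The sketch below is only a cross-check: the helper names are illustrative, and it prices input tokens alone, since the table does not itemize output tokens. That is presumably why it lands slightly under the reported $0.0387 and $0.0062.

```python
# Input-token counts copied from the gpt-4o-mini benchmark table.
DIRECT_IN = [50_322, 50_327, 50_324, 50_323, 50_324]
MCP_IN = [8_978, 5_533, 9_766, 2_229, 11_912]
PRICE_IN_PER_M = 0.15  # USD per million input tokens (gpt-4o-mini)


def saved_pct(direct: int, mcp: int) -> int:
    """Share of input tokens avoided, rounded to a whole percent."""
    return round(100 * (1 - mcp / direct))


def input_cost(tokens: int, price_per_m: float = PRICE_IN_PER_M) -> float:
    """Input-side cost in USD; output tokens are not included."""
    return tokens * price_per_m / 1_000_000


per_question = [saved_pct(d, m) for d, m in zip(DIRECT_IN, MCP_IN)]
total_saved = saved_pct(sum(DIRECT_IN), sum(MCP_IN))
direct_cost = input_cost(sum(DIRECT_IN))
mcp_cost = input_cost(sum(MCP_IN))

print(per_question, total_saved)                  # [82, 89, 81, 96, 76] 85
print(f"${direct_cost:.4f} -> ${mcp_cost:.4f}")   # $0.0377 -> $0.0058
```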

Same benchmark · local Ollama (llama3.1:8b)

Codegraph is provider-agnostic. The same five questions were answered by a local 8B model via Ollama: no API key, no network, no marginal cost.

Q      Direct (in)  MCP (in)  Tools  Direct (ms)  MCP (ms)  Saved
Q1           4,096     2,238      3       31,808    16,252    45%
Q2           4,096     1,280      0       30,974    18,033    69%
Q3           4,096     1,277      0       36,180    11,593    69%
Q4           4,096     1,506      1       28,825     9,843    63%
Q5           4,096     1,277      0       45,800    13,242    69%
Total       20,480     7,578      4      173,587    68,963    63%

Direct context is capped at the 8B model's 4 096-token window, so the direct run sees only a truncated slice of the ~50 k-token repo. Codegraph still trims another 63% off that and runs 2.5× faster (173 s → 69 s), because the model works over smaller, structured input. Cost on both runs: $0.

What an MCP answer looks like

One semantic-search call against a different repo (Camwatcher) for "Where is object detection written?". Ten ranked symbol pointers; zero file contents transferred.

Symbol                 Kind   File                                          Relevance
motion_detector        Var    backend/app/ai/motion_detector.py:78          50.6%
yolo_detector          Var    backend/app/ai/yolo_detector.py:11            48.5%
YOLODetector           Class  backend/app/ai/yolo_detector.py:5             48.4%
get_motion_detection   Fn     backend/app/integrations/tapo_client.py:180   47.8%
detect                 Fn     backend/app/ai/yolo_detector.py:6             46.2%
MotionDetector         Class  backend/app/ai/motion_detector.py:15          45.8%
set_motion_detection   Fn     backend/app/integrations/tapo_client.py:184   44.4%
step_motion_detection  Fn     backend/test_tapo.py:96                       42.9%
handle_event           Fn     backend/app/events/event_pipeline.py:37       35.3%
_get_subtractor        Fn     backend/app/ai/motion_detector.py:30          33.7%

The MCP returned ~600 tokens of symbol metadata. Without it, Claude would have loaded ~6 detection-related files (4 000–6 000 tokens) just to find the same starting points.
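To make "symbol pointers, not source" concrete, here is a minimal sketch of how a client might hold such a result. The `SymbolHit` class and its field names are hypothetical, not Codegraph's actual response schema; the rows are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolHit:
    """One ranked pointer from a semantic-search result (hypothetical shape)."""
    symbol: str
    kind: str         # "Var", "Class", or "Fn" in the table above
    location: str     # "path/to/file.py:line"
    relevance: float  # percent

    @property
    def path(self) -> str:
        return self.location.rsplit(":", 1)[0]

    @property
    def line(self) -> int:
        return int(self.location.rsplit(":", 1)[1])


hits = [
    SymbolHit("motion_detector", "Var", "backend/app/ai/motion_detector.py:78", 50.6),
    SymbolHit("yolo_detector", "Var", "backend/app/ai/yolo_detector.py:11", 48.5),
    SymbolHit("YOLODetector", "Class", "backend/app/ai/yolo_detector.py:5", 48.4),
]

# Distinct files worth opening next, kept in rank order.
files_to_open = list(dict.fromkeys(h.path for h in hits))
print(files_to_open)
```

Each hit carries only a name, a kind, a location, and a score, so the model can decide which one or two files to read instead of receiving all of them up front.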

Input tokens per query — measured + projected

Direct context scales with repo size; MCP returns symbol pointers and stays nearly flat. The 66-file row is from the benchmark above; the larger rows are projected from the same access pattern.

Repo                                Direct (in)   With MCP (in)
antilist (66 files · measured)      50 322        7 684
medium (~200 files · projected)     150 000       ~12 000
large (1 000+ files · projected)    won't fit     ~15 000

MCP input stays flat because the graph index returns symbol pointers, not source. Direct context grows linearly with codebase size until it exceeds the model's context window — gpt-4o-mini caps at 128 k, Claude Sonnet at 200 k.
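The ceiling can be checked mechanically. The sketch below uses the window sizes and direct-input figures quoted in this section; the `fits` helper and its `reserve_for_output` parameter are illustrative assumptions.

```python
# Context windows and direct-input sizes as quoted in this section.
CONTEXT_WINDOWS = {"gpt-4o-mini": 128_000, "Claude Sonnet": 200_000}
DIRECT_INPUT = {
    "antilist (66 files)": 50_322,    # measured
    "medium (~200 files)": 150_000,   # projected
}


def fits(input_tokens: int, window: int, reserve_for_output: int = 0) -> bool:
    """True if the prompt, plus any room reserved for the reply, fits the window."""
    return input_tokens + reserve_for_output <= window


report = {
    (repo, model): fits(tokens, window)
    for repo, tokens in DIRECT_INPUT.items()
    for model, window in CONTEXT_WINDOWS.items()
}

for (repo, model), ok in report.items():
    print(f"{repo:22} {model:14} {'fits' if ok else 'does not fit'}")
```

On these numbers the projected medium repo already exceeds gpt-4o-mini's window in direct mode while still fitting Claude Sonnet's.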

How cost scales — Claude Sonnet 4.6 projection

The 5-question benchmark above ran on gpt-4o-mini. Below: the same input pattern applied to Claude Sonnet 4.6 at $3 / M in · $15 / M out across codebases of different sizes.

Project                   Without MCP   With MCP   Reduction
Camwatcher (~50 files)    $0.1548       $0.0252    84%
Medium (~200 files)       $0.4538       $0.0390    91%
Large (1 000+ files)      won't fit     $0.0480    100%*

MCP cost is nearly fixed: ~8 000 input + ~150 output tokens per question. Direct cost grows with repo size; somewhere past ~200 files the full codebase no longer fits in the 128 k to 200 k context windows cited above, so retrieval is the only viable mode. *Large-repo direct mode is undefined; MCP is the path that actually works.
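The per-question figures follow from a simple two-rate formula. This is a sketch of that arithmetic using the Sonnet 4.6 prices quoted above; the function names are illustrative. With the rounded ~8 000 / ~150 token counts it gives about $0.026, close to the table's MCP figure, which was presumably computed from unrounded counts.

```python
SONNET_IN_PER_M = 3.00    # USD per million input tokens, as quoted above
SONNET_OUT_PER_M = 15.00  # USD per million output tokens, as quoted above


def question_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one question: input and output are billed at separate rates."""
    return (input_tokens * SONNET_IN_PER_M
            + output_tokens * SONNET_OUT_PER_M) / 1_000_000


def reduction_pct(direct_usd: float, mcp_usd: float) -> float:
    """Percentage of the direct cost avoided by the MCP run."""
    return 100 * (1 - mcp_usd / direct_usd)


mcp_cost = question_cost(8_000, 150)            # rounded counts from the text
medium_reduction = reduction_pct(0.4538, 0.0390)

print(f"MCP cost per question: about ${mcp_cost:.3f}")
print(f"Medium-repo reduction: {medium_reduction:.0f}%")
```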

Monthly savings — Sonnet 4.6 on a ~66-file repo

Projection from the antilist benchmark — $0.1548 direct vs $0.0252 with MCP per question.

Queries / month   Without MCP   With MCP   Monthly saving   Annual saving
50                $7.74         $1.26      $6.48            $77.76
200               $30.96        $5.04      $25.92           $311.04
500               $77.40        $12.60     $64.80           $777.60
1 000             $154.80       $25.20     $129.60          $1 555.20
2 500             $387.00       $63.00     $324.00          $3 888.00

Savings compound on larger codebases. A team of 5 developers each running 500 queries / month on this baseline saves roughly $3 888 / year; on a 200-file project, at the projected $0.4538 vs $0.0390 per question, that grows to roughly $12 400 / year.
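Every row of the monthly table is the same multiplication, so it is easy to extend to other volumes. A minimal sketch, assuming the two per-question costs stated above; `monthly_row` is an illustrative name.

```python
DIRECT_PER_Q = 0.1548  # USD per question, direct context (from the text)
MCP_PER_Q = 0.0252     # USD per question, with MCP (from the text)


def monthly_row(queries: int) -> tuple[float, float, float, float]:
    """Return (without MCP, with MCP, monthly saving, annual saving) in USD."""
    without = queries * DIRECT_PER_Q
    with_mcp = queries * MCP_PER_Q
    saving = without - with_mcp
    return (round(without, 2), round(with_mcp, 2),
            round(saving, 2), round(saving * 12, 2))


for q in (50, 200, 500, 1_000, 2_500):
    without, with_mcp, monthly, annual = monthly_row(q)
    print(f"{q:>5}  ${without:>7.2f}  ${with_mcp:>6.2f}  ${monthly:>7.2f}  ${annual:>8.2f}")
```

A team of 5 developers at 500 queries each corresponds to `monthly_row(2_500)`.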

Why the difference

Without Codegraph MCP

The model receives raw file contents as context. On the 66-file antilist benchmark that meant 50 322 input tokens per question — 100% of the repo, every turn.

Context size grows linearly with codebase size, then hits the model's context-window ceiling and stops working entirely.

With Codegraph MCP

The model calls typed graph tools (3.4 per question on average) and receives symbol names, paths, signatures, and relevance scores: about 7 700 input tokens per question on this benchmark, a figure that grows only slowly with repo size.

The model reads compact pointers, not source. Same answers, fraction of the cost, and 1.6× faster end-to-end.

How accuracy is maintained

  1. Parsed, not guessed: tree-sitter extracts every symbol, call edge, and import from your source. No LLM hallucination in the index.
  2. Vector KNN: each symbol is embedded at index time. Queries find semantically similar symbols even with different naming.
  3. Graph traversal: tools like find_callers and blast_radius walk typed edges in Kuzu; Claude receives exact structural answers.
  4. Incremental re-index: only changed files are re-parsed on each run, so the graph stays current without full re-embedding.
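The two retrieval ideas in this list, nearest-neighbour search over symbol embeddings and walking call edges backwards, can be sketched in plain Python. This is an illustration under stated assumptions, not Codegraph's implementation: the real index lives in Kuzu with real embeddings, while the vectors, symbol names, and call edges below are toy data, and this `find_callers` only shares its name with the tool.

```python
import math
from collections import deque


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(query: list[float], vectors: dict[str, list[float]], k: int = 2) -> list[str]:
    """Names of the k symbols whose embeddings are most similar to the query."""
    ranked = sorted(vectors, key=lambda name: cosine(query, vectors[name]), reverse=True)
    return ranked[:k]


def find_callers(target: str, calls: dict[str, set[str]]) -> list[str]:
    """Direct and transitive callers of target, breadth-first.

    `calls` maps each caller to the set of symbols it calls.
    """
    reverse: dict[str, set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            reverse.setdefault(callee, set()).add(caller)

    found: list[str] = []
    visited = {target}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for caller in sorted(reverse.get(node, ())):
            if caller not in visited:
                visited.add(caller)
                found.append(caller)
                queue.append(caller)
    return found


# Toy data: 3-dimensional "embeddings" and a tiny call graph.
toy_vectors = {"parse": [0.9, 0.1, 0.0], "load": [0.5, 0.5, 0.1], "render": [0.0, 0.2, 0.9]}
toy_calls = {"main": {"load"}, "load": {"parse"}, "parse": {"tokenize"}}

print(knn([1.0, 0.0, 0.0], toy_vectors))   # ['parse', 'load']
print(find_callers("parse", toy_calls))    # ['load', 'main']
```

The similarity search tolerates naming differences because it compares vectors, while the traversal is exact because it only follows edges that the parser recorded.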