> investigation demo

Find the step that spiked your AI bill.

A walkthrough of a cost regression investigation. No installation required.

STEP 1

Monday morning. We compared the last two weeks. Total spend was up.

The support_reply workflow powers a production chatbot and runs hundreds of times a day. Costs rose week over week, but the existing dashboards showed only aggregate spend, with no per-step breakdown.

Terminal - tokengoblin drift
# Compare Apr 1-7 vs Mar 25-31 for the support_reply workflow
$ curl -s https://api.tokengoblin.dev/v1/drift \
  -H "Authorization: Bearer tgproj_a1b2c3d4e5f6..." \
  -G --data-urlencode "period_a_start=2026-03-25T00:00:00Z" \
  --data-urlencode "period_a_end=2026-03-31T23:59:59Z" \
  --data-urlencode "period_b_start=2026-04-01T00:00:00Z" \
  --data-urlencode "period_b_end=2026-04-07T23:59:59Z"
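
The same request can be built from Python. This sketch mirrors only the endpoint, header, and query parameters shown in the curl command; the helper names `build_drift_request` and `fetch_drift` are hypothetical, the token is a placeholder, and the response is returned as raw text because its format is not shown here.

```python
import urllib.parse
import urllib.request

DRIFT_URL = "https://api.tokengoblin.dev/v1/drift"  # endpoint from the curl command


def build_drift_request(token, period_a, period_b):
    """Build a GET request comparing two (start, end) ISO-8601 periods."""
    params = {
        "period_a_start": period_a[0],
        "period_a_end": period_a[1],
        "period_b_start": period_b[0],
        "period_b_end": period_b[1],
    }
    url = DRIFT_URL + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_drift(request, timeout=10.0):
    """Send the request and return the body as text (response format not assumed)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Build (but do not send) the request for the two weeks being compared.
request = build_drift_request(
    "YOUR_PROJECT_TOKEN",  # placeholder; substitute a real project token
    ("2026-03-25T00:00:00Z", "2026-03-31T23:59:59Z"),
    ("2026-04-01T00:00:00Z", "2026-04-07T23:59:59Z"),
)
```

Calling `fetch_drift(request)` would send it. Keeping construction separate from sending makes the query easy to inspect before it goes out.
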
STEP 2

The drift report flagged summarize_context at +129%.

TokenGoblin returned a per-step breakdown. One row stood out: the average cost of summarize_context went from $0.0021 to $0.0048. The other steps showed negligible change. Because these are averages rather than totals, a traffic increase could not explain the jump. This was a regression in one specific step.

Step                 Avg Cost (Mar 25-31)   Avg Cost (Apr 1-7)   Change
summarize_context    $0.0021                $0.0048              +129%
generate_reply       $0.0035                $0.0036              +3%
fact_check           $0.0012                $0.0012              0%
rerank_results       $0.0008                $0.0007              -12%
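
The Change column is the relative difference between the two averages. A minimal sketch of that arithmetic, plus a simple filter for steps that rose sharply, follows. The function names and the 25% threshold are illustrative assumptions, not part of TokenGoblin.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    if before == 0:
        raise ValueError("baseline cost must be non-zero")
    return (after - before) / before * 100.0


def flag_regressions(costs, threshold_pct=25.0):
    """Return step names whose average cost rose by more than threshold_pct."""
    return [
        step
        for step, (before, after) in costs.items()
        if percent_change(before, after) > threshold_pct
    ]


# Average costs per step for the two periods compared: (Mar 25-31, Apr 1-7)
avg_costs = {
    "summarize_context": (0.0021, 0.0048),
    "generate_reply": (0.0035, 0.0036),
    "fact_check": (0.0012, 0.0012),
    "rerank_results": (0.0008, 0.0007),
}

flagged = flag_regressions(avg_costs)  # only summarize_context exceeds the threshold
```

For summarize_context, (0.0048 - 0.0021) / 0.0021 is about 1.286, which rounds to the +129% in the report.
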
STEP 3

We checked the summarize_context step. Found the root cause.

Pulled up the workflow definition. A commit from last Thursday had bumped max_tokens from 2000 to 4000. max_tokens caps the length of each completion, so doubling it let the summaries run much longer, and the cost per call more than doubled. The drift report isolated the exact step. git blame found the exact line.

support_reply.py - commit 8f3a2b1 (last Thursday)
# Step: summarize_context
with goblin.step("summarize_context") as step:
    response = openai.chat.completions.create(
        model="gpt-4.1-mini",
        messages=context_messages,
        # max_tokens changed from 2000 to 4000 last Thursday
        max_tokens=4000,  # <-- root cause
    )
    step.record_usage(
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens,
        cost_usd=compute_cost(response.usage, "gpt-4.1-mini"),
    )
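
The snippet calls `compute_cost` without showing it. Here is a minimal sketch of what such a helper might look like, to illustrate why longer completions raise the per-call cost. Everything in it is an assumption: the price table is an illustrative placeholder rather than a quoted rate, the `Usage` dataclass stands in for `response.usage`, and the token counts are made up.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Stand-in for response.usage, with only the two fields used here."""
    prompt_tokens: int
    completion_tokens: int


# Placeholder USD prices per million tokens; check the provider's current pricing.
PRICES_PER_MILLION = {
    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
}


def compute_cost(usage, model):
    """Cost in USD: tokens times the per-million price, input plus output."""
    price = PRICES_PER_MILLION[model]
    total = (
        usage.prompt_tokens * price["input"]
        + usage.completion_tokens * price["output"]
    )
    return total / 1_000_000


# Same prompt, but the second call produces a much longer completion.
short_call = compute_cost(Usage(prompt_tokens=1500, completion_tokens=900), "gpt-4.1-mini")
long_call = compute_cost(Usage(prompt_tokens=1500, completion_tokens=2600), "gpt-4.1-mini")
```

Raising max_tokens does not change the price of the prompt. It only allows more completion tokens, and those are typically billed at the higher output rate, so a step that writes long summaries is sensitive to that cap.
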
STEP 4

Reverted the config. Verified the fix with a fresh drift report.

Changed max_tokens back to 2000 and deployed the fix. Once a full week of post-fix data was in, ran a new report comparing Apr 1-7 vs Apr 8-14. The average cost of summarize_context had dropped from $0.0048 to $0.0022, close to the original $0.0021 baseline. Regression resolved.

Step                 Avg Cost (Apr 1-7)   Avg Cost (Apr 8-14)   Change
summarize_context    $0.0048              $0.0022               -54%
generate_reply       $0.0036              $0.0035               -3%
fact_check           $0.0012              $0.0012               0%
rerank_results       $0.0007              $0.0008               +14%
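
A drop of 54% confirms only that the step got cheaper. The stronger check is whether the step is back near its pre-regression baseline of $0.0021. This is a sketch of that comparison; the function name and the 10% tolerance are illustrative assumptions.

```python
def is_resolved(baseline, current, tolerance_pct=10.0):
    """True when current cost is at most tolerance_pct above the baseline."""
    return current <= baseline * (1 + tolerance_pct / 100.0)


BASELINE = 0.0021  # summarize_context average cost, Mar 25-31

during_regression = is_resolved(BASELINE, 0.0048)  # Apr 1-7 average
after_fix = is_resolved(BASELINE, 0.0022)          # Apr 8-14 average
```

Here $0.0022 is about 5% above the baseline, inside the assumed tolerance, while $0.0048 is far outside it.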

✓ Root cause found. Fix deployed. Incident closed.

Investigate your own cost regressions.
