A walkthrough of a cost regression investigation. No installation required.
The support_reply workflow runs hundreds of times a day, powering a production chatbot. Costs rose week over week, but the existing dashboards showed only aggregate spend going up, with no per-step breakdown.
```shell
# Compare Apr 1-7 vs Mar 25-31 for the support_reply workflow
$ curl -s https://api.tokengoblin.dev/v1/drift \
    -H "Authorization: Bearer tgproj_a1b2c3d4e5f6..." \
    -G --data-urlencode "period_a_start=2026-03-25T00:00:00Z" \
    --data-urlencode "period_a_end=2026-03-31T23:59:59Z" \
    --data-urlencode "period_b_start=2026-04-01T00:00:00Z" \
    --data-urlencode "period_b_end=2026-04-07T23:59:59Z"
```
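The drift request is a plain GET with four URL-encoded query parameters. For recurring week-over-week checks, the two adjacent 7-day windows can be computed rather than typed by hand. The following is a minimal sketch under a few assumptions: window boundaries are UTC midnight and 23:59:59, matching the example, and the helper names are hypothetical. It only builds the URL. The Authorization header still has to be supplied when the request is sent.

```python
from datetime import date, datetime, time, timedelta
from urllib.parse import urlencode

# Endpoint and parameter names come from the curl example; the helpers are hypothetical.
DRIFT_URL = "https://api.tokengoblin.dev/v1/drift"


def day_start(d: date) -> str:
    """First second of day d, formatted as a UTC timestamp."""
    return datetime.combine(d, time(0, 0, 0)).strftime("%Y-%m-%dT%H:%M:%SZ")


def day_end(d: date) -> str:
    """Last second of day d, formatted as a UTC timestamp."""
    return datetime.combine(d, time(23, 59, 59)).strftime("%Y-%m-%dT%H:%M:%SZ")


def week_over_week_params(last_day: date) -> dict:
    """Period B: the 7 days ending on last_day. Period A: the 7 days before that."""
    b_start = last_day - timedelta(days=6)
    a_end = b_start - timedelta(days=1)
    a_start = a_end - timedelta(days=6)
    return {
        "period_a_start": day_start(a_start),
        "period_a_end": day_end(a_end),
        "period_b_start": day_start(b_start),
        "period_b_end": day_end(last_day),
    }


def drift_url(last_day: date) -> str:
    # urlencode percent-encodes ':' as %3A, as curl --data-urlencode does.
    return f"{DRIFT_URL}?{urlencode(week_over_week_params(last_day))}"
```

Calling `drift_url(date(2026, 4, 7))` reproduces the Mar 25-31 vs Apr 1-7 comparison. `date(2026, 4, 14)` gives the follow-up Apr 1-7 vs Apr 8-14 window.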
TokenGoblin returned a per-step breakdown. One row stood out: summarize_context went from $0.0021 to $0.0048 per call, an increase of 129%. Every other step moved by $0.0001 or less. Because these are per-call averages, more traffic could not explain the jump. This was a regression in one specific step.
| Step | Avg Cost per Call (Mar 25-31) | Avg Cost per Call (Apr 1-7) | Change |
|---|---|---|---|
| summarize_context | $0.0021 | $0.0048 | +129% |
| generate_reply | $0.0035 | $0.0036 | +3% |
| fact_check | $0.0012 | $0.0012 | 0% |
| rerank_results | $0.0008 | $0.0007 | -12% |
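The Change column is the relative difference between the two per-call averages, (after - before) / before. Call volume never enters that calculation, so a traffic increase alone would not move these numbers. The sketch below recomputes the column from the table's values and flags steps that crossed a threshold. The 50% and $0.0005 cutoffs are arbitrary assumptions for illustration, not TokenGoblin's alerting rules.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change in percent between two per-call averages."""
    return (after - before) / before * 100.0


# Per-call averages in USD from the Mar 25-31 vs Apr 1-7 report.
REPORT = {
    "summarize_context": (0.0021, 0.0048),
    "generate_reply": (0.0035, 0.0036),
    "fact_check": (0.0012, 0.0012),
    "rerank_results": (0.0008, 0.0007),
}


def flag_regressions(report: dict, min_pct: float = 50.0, min_abs_usd: float = 0.0005) -> list:
    """Return steps whose per-call cost rose by at least min_pct percent AND min_abs_usd dollars.

    Both thresholds are illustrative assumptions.
    """
    return [
        step
        for step, (before, after) in report.items()
        if pct_change(before, after) >= min_pct and after - before >= min_abs_usd
    ]


print("flagged:", flag_regressions(REPORT))
```

Requiring both a relative and an absolute increase keeps cheap steps from tripping the alert. A step costing $0.0008 per call can swing by double-digit percentages while moving only a hundredth of a cent.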
Drilled into the summarize_context step and found the root cause.
Pulled up the workflow definition. A commit from last Thursday had bumped max_tokens from 2000 to 4000. Because max_tokens caps completion length, the higher limit let summaries run up to twice as long, and the step's per-call cost more than doubled.
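Since max_tokens limits completion tokens only, it bounds the output side of the bill. The sketch below shows that bound and assumes simple per-token pricing. The per-million-token prices and the 1,500-token prompt are hypothetical placeholders, not gpt-4.1-mini's actual rates or this workflow's real inputs.

```python
# Hypothetical prices in USD per million tokens; substitute your provider's current rates.
INPUT_USD_PER_M = 0.40
OUTPUT_USD_PER_M = 1.60


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one call under simple per-token pricing."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


def max_call_cost(input_tokens: int, max_tokens: int) -> float:
    """Upper bound on one call's cost: completion tokens cannot exceed max_tokens."""
    return call_cost(input_tokens, max_tokens)


prompt_tokens = 1500  # hypothetical prompt size
for cap in (2000, 4000):
    print(f"max_tokens={cap}: at most ${max_call_cost(prompt_tokens, cap):.4f} per call")
```

Doubling the cap doubles the output term of the bound while the input term stays fixed. The actual per-call cost depends on how many completion tokens the model really produces.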
The drift report isolated the exact step; git blame found the exact line.
```python
import openai

# `goblin` (the TokenGoblin client), `context_messages`, and `compute_cost`
# are defined elsewhere in the workflow.

# Step: summarize_context
with goblin.step("summarize_context") as step:
    response = openai.chat.completions.create(
        model="gpt-4.1-mini",
        messages=context_messages,
        # max_tokens changed from 2000 to 4000 last Thursday
        max_tokens=4000,  # <-- root cause
    )
    step.record_usage(
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens,
        cost_usd=compute_cost(response.usage, "gpt-4.1-mini"),
    )
```
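The goblin.step(...) context manager attributes each recorded call to a named step, which is what makes a per-step breakdown possible. TokenGoblin's SDK internals are not shown here. The following is a hypothetical in-memory stand-in, using the invented names StepTracker and StepRecord, that mirrors the call shape in the snippet. It computes the same per-call average the drift tables report.

```python
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class StepRecord:
    """Collects usage for one execution of a named step (hypothetical)."""
    name: str
    costs: list = field(default_factory=list)

    def record_usage(self, input_tokens: int, output_tokens: int, cost_usd: float) -> None:
        # Token counts are accepted to mirror the snippet; only cost is aggregated here.
        self.costs.append(cost_usd)


class StepTracker:
    """Hypothetical in-memory stand-in for a per-step cost tracker; not the TokenGoblin SDK."""

    def __init__(self) -> None:
        self._costs = defaultdict(list)

    @contextmanager
    def step(self, name: str):
        record = StepRecord(name)
        try:
            yield record
        finally:
            # Attribute every recorded call to the step, even if the body raised.
            self._costs[name].extend(record.costs)

    def avg_cost(self, name: str) -> float:
        """Average cost per recorded call for one step (0.0 if none)."""
        costs = self._costs.get(name, [])
        return sum(costs) / len(costs) if costs else 0.0


# Illustrative calls with made-up token counts and costs.
tracker = StepTracker()
for cost in (0.002, 0.004):
    with tracker.step("summarize_context") as step:
        step.record_usage(input_tokens=1500, output_tokens=3000, cost_usd=cost)
```

Keying costs by step name rather than by workflow is the design choice that lets a report single out summarize_context instead of showing one aggregate number.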
Changed max_tokens back to 2000 and deployed the fix. Ran a new report the next day comparing Apr 1-7 vs Apr 8-14: the average cost of summarize_context had dropped from $0.0048 to $0.0022, essentially back to its $0.0021 March baseline. Regression resolved.
| Step | Avg Cost per Call (Apr 1-7) | Avg Cost per Call (Apr 8-14) | Change |
|---|---|---|---|
| summarize_context | $0.0048 | $0.0022 | -54% |
| generate_reply | $0.0036 | $0.0035 | -3% |
| fact_check | $0.0012 | $0.0012 | 0% |
| rerank_results | $0.0007 | $0.0008 | +14% |
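A quick way to confirm a fix is to compare the post-fix average against the pre-regression baseline, not only against the regressed week. The sketch below uses the summarize_context figures from the two reports. The 10% tolerance and both helper names are assumptions, not part of TokenGoblin.

```python
def share_recovered(baseline: float, regressed: float, after_fix: float) -> float:
    """Fraction of the cost increase that the fix removed (1.0 = fully back to baseline)."""
    return (regressed - after_fix) / (regressed - baseline)


def back_to_baseline(baseline: float, after_fix: float, tolerance_pct: float = 10.0) -> bool:
    """True if the post-fix average is within tolerance_pct percent of the baseline."""
    return abs(after_fix - baseline) / baseline * 100.0 <= tolerance_pct


# summarize_context per-call averages in USD: Mar 25-31, Apr 1-7, Apr 8-14.
baseline, regressed, after_fix = 0.0021, 0.0048, 0.0022
print(f"recovered {share_recovered(baseline, regressed, after_fix):.0%} of the increase")
print("within tolerance:", back_to_baseline(baseline, after_fix))
```

Pick the tolerance to match how much a step's per-call average normally fluctuates from week to week, so routine noise does not reopen a closed incident.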
✓ Root cause found. Fix deployed. Incident closed.