Compare two time windows and isolate exactly which workflow step's cost drifted, with a clear percent change. No more guessing in dashboards.
Last Thursday, a routine config change doubled the cost of our RAG pipeline. Our retrieval step went from $0.0021/call to $0.0065/call. Traditional dashboards showed total spend increasing, but TokenGoblin isolated retrieval as the source of the regression within minutes. We reverted the chunk size config and saved $1,200/month. Root cause found, fix deployed, incident closed.
Output from GET /v1/drift comparing two weeks of a support_reply workflow.
| Step | Avg Cost (Prev) | Avg Cost (Curr) | Change |
|---|---|---|---|
| summarize_context | $0.0021 | $0.0048 | +129% |
| generate_reply | $0.0035 | $0.0036 | +3% |
| fact_check | $0.0012 | $0.0012 | 0% |
| rerank_results | $0.0008 | $0.0007 | -12% |
One table. One culprit. Zero fumbling through generic dashboards.
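To make the Change column concrete, here is a minimal sketch of the arithmetic, using the four rows from the table above. The helper names `percent_change` and `rank_by_drift` are made up for illustration and are not part of TokenGoblin; the service may compute or round drift differently.

```python
from decimal import Decimal
from typing import Dict, List, Optional, Tuple


def percent_change(prev: Decimal, curr: Decimal) -> Optional[Decimal]:
    """Relative change from prev to curr, in percent. None if there is no baseline."""
    if prev == 0:
        return None
    return (curr - prev) / prev * 100


def rank_by_drift(
    costs: Dict[str, Tuple[Decimal, Decimal]]
) -> List[Tuple[str, Decimal]]:
    """Sort steps by percent change, largest increase first."""
    ranked = [(step, percent_change(prev, curr)) for step, (prev, curr) in costs.items()]
    # Steps without a baseline cost cannot be ranked, so they are dropped here.
    ranked = [(step, change) for step, change in ranked if change is not None]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Average cost per call (previous window, current window), copied from the table.
costs = {
    "summarize_context": (Decimal("0.0021"), Decimal("0.0048")),
    "generate_reply": (Decimal("0.0035"), Decimal("0.0036")),
    "fact_check": (Decimal("0.0012"), Decimal("0.0012")),
    "rerank_results": (Decimal("0.0008"), Decimal("0.0007")),
}

for step, change in rank_by_drift(costs):
    print(f"{step}: {round(change):+d}%")
```

Sorting by percent change puts the regressed step first, which is the same reading the table gives at a glance.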
Wrap each step in goblin.step(). Record input tokens, output tokens, and cost for every call.
Call GET /v1/drift or GET /v1/report/markdown with two date ranges: the current week vs. last week, or any two periods you need to investigate.
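As a rough sketch of the "two date ranges" idea, the snippet below computes a current and a previous seven-day window and builds a request URL for GET /v1/drift. The query parameter names (`workflow_name`, `prev_start`, `prev_end`, `curr_start`, `curr_end`) are assumptions made for illustration, since this page does not document them; check the API reference for the real ones. No request is sent.

```python
from datetime import date, timedelta
from typing import Tuple
from urllib.parse import urlencode

Window = Tuple[date, date]  # (start, end), treated here as start-inclusive, end-exclusive


def last_two_weeks(today: date) -> Tuple[Window, Window]:
    """Return (previous_window, current_window), each seven days long."""
    curr_start = today - timedelta(days=7)
    prev_start = curr_start - timedelta(days=7)
    return (prev_start, curr_start), (curr_start, today)


def build_drift_url(base_url: str, workflow_name: str, prev: Window, curr: Window) -> str:
    """Build a GET /v1/drift URL. Parameter names are hypothetical."""
    params = {
        "workflow_name": workflow_name,
        "prev_start": prev[0].isoformat(),
        "prev_end": prev[1].isoformat(),
        "curr_start": curr[0].isoformat(),
        "curr_end": curr[1].isoformat(),
    }
    return f"{base_url.rstrip('/')}/v1/drift?{urlencode(params)}"


prev_window, curr_window = last_two_weeks(date.today())
print(build_drift_url("https://api.tokengoblin.dev", "support_reply", prev_window, curr_window))
```

Any two windows work the same way: pass a different pair of (start, end) dates instead of the weekly default.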
Python SDK. Wrap each logical step. That's the whole integration.
from tokengoblin import TokenGoblin

# One client per workflow; provider and model are attached to every recorded step.
goblin = TokenGoblin(
    api_url="http://127.0.0.1:8000",
    api_key="tgproj_...",
    workflow_name="support_reply",
    provider="openai",
    model="gpt-4.1-mini",
)

# Wrap the step and record its token usage and cost.
with goblin.step("summarize_context") as step:
    step.record_usage(
        input_tokens=7600,
        output_tokens=420,
        cost_usd="0.00443",
    )
Prefer plain HTTP? REST works too.
curl -X POST https://api.tokengoblin.dev/v1/events \
  -H "Authorization: Bearer tgproj_..." \
  -H "Content-Type: application/json" \
  -d '{
    "workflow_name": "support_reply",
    "step_name": "summarize_context",
    "provider": "openai",
    "model": "gpt-4.1-mini",
    "input_tokens": 7600,
    "output_tokens": 420,
    "cost_usd": "0.00443"
  }'
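Both examples pass cost_usd as a string. One way to produce such a value without float rounding is Python's decimal module, sketched below. The per-token prices are placeholders chosen for illustration, not real provider pricing, and the helper names `cost_usd` and `build_event` are hypothetical; only the field names come from the request body above.

```python
import json
from decimal import Decimal
from typing import Dict

# Placeholder prices in USD per one million tokens. NOT real provider pricing.
PRICES_PER_MILLION = {"input": Decimal("0.50"), "output": Decimal("1.50")}


def cost_usd(input_tokens: int, output_tokens: int,
             prices: Dict[str, Decimal] = PRICES_PER_MILLION) -> str:
    """Compute the call cost exactly and return it as a decimal string."""
    total = (
        Decimal(input_tokens) * prices["input"]
        + Decimal(output_tokens) * prices["output"]
    ) / Decimal(1_000_000)
    # Keep five decimal places, the precision used in the example payload.
    return str(total.quantize(Decimal("0.00001")))


def build_event(step_name: str, input_tokens: int, output_tokens: int) -> dict:
    """Assemble a body with the same fields as the POST /v1/events example."""
    return {
        "workflow_name": "support_reply",
        "step_name": step_name,
        "provider": "openai",
        "model": "gpt-4.1-mini",
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": cost_usd(input_tokens, output_tokens),
    }


print(json.dumps(build_event("summarize_context", 7600, 420), indent=2))
```

The resulting JSON can be sent with curl or any HTTP client, using the same headers as the request above.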
TokenGoblin tracks cost per step regardless of which LLM provider you use. Any model, any API. One consistent drift report.
Future: OpenTelemetry exporter for zero-code instrumentation
TokenGoblin is in active development. Early users get free access and direct input on the roadmap. If you're debugging LLM costs in production, we want to talk to you.