TokenGoblin | Forensic AI Cost Drift Debugging

🔍 Incident Report: RAG Pipeline Cost Spike

Last Thursday, a routine config change doubled the cost of our RAG pipeline. The retrieval step alone went from $0.0021/call to $0.0065/call, roughly a 3x increase. Traditional dashboards showed only that total spend was rising; TokenGoblin isolated retrieval as the source of the regression within minutes. We reverted the chunk size config and saved $1,200/month. Root cause found, fix deployed, incident closed.

Sample Drift Report

Output from GET /v1/drift comparing two weeks of a support_reply workflow.

Step	Avg Cost (Prev)	Avg Cost (Curr)	Change
`summarize_context`	$0.0021	$0.0048	+129%
`generate_reply`	$0.0035	$0.0036	+3%
`fact_check`	$0.0012	$0.0012	0%
`rerank_results`	$0.0008	$0.0007	-12%

One table. One culprit. Zero fumbling through generic dashboards.
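
The Change column is plain percent change between the two averages. A minimal sketch that reproduces it from the numbers in the sample report follows; the helper names (percent_change, worst_regression) are illustrative and not part of TokenGoblin.

```python
from decimal import Decimal

# Average cost per call (USD) for each step, copied from the sample report.
PREV = {
    "summarize_context": Decimal("0.0021"),
    "generate_reply": Decimal("0.0035"),
    "fact_check": Decimal("0.0012"),
    "rerank_results": Decimal("0.0008"),
}
CURR = {
    "summarize_context": Decimal("0.0048"),
    "generate_reply": Decimal("0.0036"),
    "fact_check": Decimal("0.0012"),
    "rerank_results": Decimal("0.0007"),
}


def percent_change(prev: Decimal, curr: Decimal) -> Decimal:
    """Relative change from prev to curr, in percent."""
    if prev == 0:
        raise ValueError("previous average cost is zero; change is undefined")
    return (curr - prev) / prev * 100


def worst_regression(prev: dict, curr: dict) -> str:
    """Name of the step whose average cost grew the most."""
    shared = prev.keys() & curr.keys()
    return max(shared, key=lambda name: percent_change(prev[name], curr[name]))


for name in PREV:
    print(f"{name:<20} {percent_change(PREV[name], CURR[name]):+.1f}%")
print("largest increase:", worst_regression(PREV, CURR))
```

With one decimal place, rerank_results comes out at -12.5%, which the report rounds to -12%.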

How It Works

Instrument your code. Wrap each step of your LLM workflow with goblin.step(). Record input tokens, output tokens, and cost for every call.
Let it run. TokenGoblin accumulates events as your production traffic flows. No dashboards to configure, no metrics to define.
Compare two time windows. Call GET /v1/drift or GET /v1/report/markdown with two date ranges: current week vs last week, or any two periods you need to investigate.
Get a drift report. A per-step breakdown with average cost before and after, plus the percent change. Know exactly where to start investigating.
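
As a rough sketch of the comparison step, here is how a request URL for GET /v1/drift might be assembled for two date windows. The query parameter names (workflow_name, prev_start, prev_end, curr_start, curr_end) and the dates are assumptions made for illustration; this page does not document the endpoint's actual parameters.

```python
from datetime import date
from urllib.parse import urlencode


def build_drift_url(base_url, workflow_name, prev_window, curr_window):
    """Build a GET /v1/drift URL for two (start, end) date windows.

    NOTE: every query parameter name below is a guess, not a documented
    TokenGoblin parameter.
    """
    for start, end in (prev_window, curr_window):
        if start > end:
            raise ValueError(f"window starts after it ends: {start} > {end}")
    params = {
        "workflow_name": workflow_name,
        "prev_start": prev_window[0].isoformat(),
        "prev_end": prev_window[1].isoformat(),
        "curr_start": curr_window[0].isoformat(),
        "curr_end": curr_window[1].isoformat(),
    }
    return f"{base_url.rstrip('/')}/v1/drift?{urlencode(params)}"


# Illustrative dates: last week vs. this week.
url = build_drift_url(
    "https://api.tokengoblin.dev",
    "support_reply",
    prev_window=(date(2025, 1, 6), date(2025, 1, 12)),
    curr_window=(date(2025, 1, 13), date(2025, 1, 19)),
)
print(url)
```

The same two windows would presumably work against GET /v1/report/markdown if you want the report as Markdown instead of JSON.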

One Context Manager, One Step

With the Python SDK, wrap each logical step in a context manager. That's the whole integration.

Python SDK

from tokengoblin import TokenGoblin

goblin = TokenGoblin(
    api_url="http://127.0.0.1:8000",
    api_key="tgproj_...",
    workflow_name="support_reply",
    provider="openai",
    model="gpt-4.1-mini",
)

with goblin.step("summarize_context") as step:
    step.record_usage(
        input_tokens=7600,
        output_tokens=420,
        cost_usd="0.00443",
    )
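
The example above passes cost_usd as a string. One plausible reason (an assumption, not something stated here) is to keep exact decimal values and avoid float rounding. A minimal sketch of deriving that string from token counts follows; the per-million-token prices are hypothetical placeholders, so substitute your provider's actual rates.

```python
from decimal import Decimal

# HYPOTHETICAL prices in USD per one million tokens. These are placeholders,
# not real rates for any provider or model.
PRICE_PER_MILLION = {"input": Decimal("1.00"), "output": Decimal("2.00")}


def cost_usd(input_tokens: int, output_tokens: int, prices=PRICE_PER_MILLION) -> str:
    """Return the call cost as a plain decimal string, e.g. for record_usage()."""
    million = Decimal(1_000_000)
    total = (
        input_tokens * prices["input"] + output_tokens * prices["output"]
    ) / million
    # The "f" format avoids scientific notation for very small amounts.
    return format(total, "f")


print(cost_usd(input_tokens=7600, output_tokens=420))
```

The returned string can then be handed to step.record_usage(cost_usd=...) as in the SDK example.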

Prefer plain HTTP? REST works too.

cURL

curl -X POST https://api.tokengoblin.dev/v1/events \
  -H "Authorization: Bearer tgproj_..." \
  -H "Content-Type: application/json" \
  -d '{
    "workflow_name": "support_reply",
    "step_name": "summarize_context",
    "provider": "openai",
    "model": "gpt-4.1-mini",
    "input_tokens": 7600,
    "output_tokens": 420,
    "cost_usd": "0.00443"
  }'
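
If you would rather stay in Python without the SDK, the same event can be posted with the standard library. This sketch mirrors the cURL call above (same URL, headers, and fields); the helper name build_event_request is illustrative, and the request is built but not sent.

```python
import json
import urllib.request


def build_event_request(api_key, event, base_url="https://api.tokengoblin.dev"):
    """Build (but do not send) a POST /v1/events request."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/events",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


event = {
    "workflow_name": "support_reply",
    "step_name": "summarize_context",
    "provider": "openai",
    "model": "gpt-4.1-mini",
    "input_tokens": 7600,
    "output_tokens": 420,
    "cost_usd": "0.00443",
}
request = build_event_request("tgproj_...", event)

# To actually send it:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```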

Provider-Agnostic

TokenGoblin tracks cost per step regardless of which LLM provider you use. Any model, any API. One consistent drift report.

◇ OpenAI ◇ Anthropic ◇ Azure OpenAI ◇ Google Gemini ◇ AWS Bedrock ◇ Together AI ◇ Groq
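
In practice, being provider-agnostic means mapping each provider's usage fields onto the same input_tokens and output_tokens before recording. A small sketch, assuming OpenAI Chat Completions style usage (prompt_tokens, completion_tokens) and Anthropic Messages style usage (input_tokens, output_tokens) as plain dicts; normalize_usage is a hypothetical helper, not part of the TokenGoblin SDK, and the token counts for the Anthropic example are made up.

```python
def normalize_usage(provider: str, usage: dict) -> dict:
    """Map a provider's usage payload onto input_tokens / output_tokens."""
    if provider == "openai":
        # Chat Completions reports prompt_tokens and completion_tokens.
        return {
            "input_tokens": usage["prompt_tokens"],
            "output_tokens": usage["completion_tokens"],
        }
    if provider == "anthropic":
        # The Messages API already uses input_tokens and output_tokens.
        return {
            "input_tokens": usage["input_tokens"],
            "output_tokens": usage["output_tokens"],
        }
    raise ValueError(f"no usage mapping defined for provider: {provider!r}")


print(normalize_usage("openai", {"prompt_tokens": 7600, "completion_tokens": 420}))
print(normalize_usage("anthropic", {"input_tokens": 5100, "output_tokens": 300}))
```

Other providers would need their own branch; raising on unknown names keeps a missing mapping from silently recording zeros.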

Ingestion Architecture

Your Code → SDK / REST → TokenGoblin API → Drift Report

Future: OpenTelemetry exporter for zero-code instrumentation

Where We're Headed

$ what-is-real

Production-grade FastAPI backend with PostgreSQL
Python SDK with context-manager API
Per-step cost drift detection between two time windows
Markdown and JSON report formats
Organization and project scoping with scoped API keys

$ what-is-next

Automated deploy correlation: "this PR increased your AI bill by 14%"
OpenTelemetry exporter for zero-code instrumentation
Alerting thresholds on per-step cost drift
CI/CD checks that flag cost regressions before they merge

Join the Beta

TokenGoblin is in active development. Early users get free access and direct input on the roadmap. If you're debugging LLM costs in production, we want to talk to you.