> API beta · free during early access

Find the step that spiked your AI bill.

Compare two time windows and isolate exactly which workflow step's cost drifted, with a clear percent change. No more guessing in dashboards.

⚠️ Step summarize_context cost +218% compared to previous period.
Investigate →
[Chart: Cost per call (USD), support_reply workflow, Mon 3 to Sun 9. Regression starts at PR #142 (+218%); PR #156 is marked on Thursday.]

🔍 Incident Report: RAG Pipeline Cost Spike

Last Thursday, a routine config change more than tripled the cost of our RAG pipeline's retrieval step, from $0.0021/call to $0.0065/call. Traditional dashboards showed only that total spend was rising; TokenGoblin isolated retrieval as the source of the regression within minutes. We reverted the chunk size config and saved $1,200/month. Root cause found, fix deployed, incident closed.

Sample Drift Report

Output from GET /v1/report/markdown comparing two weeks of a support_reply workflow.

| Step | Avg Cost (Prev) | Avg Cost (Curr) | Change |
| --- | --- | --- | --- |
| summarize_context | $0.0021 | $0.0048 | +129% |
| generate_reply | $0.0035 | $0.0036 | +3% |
| fact_check | $0.0012 | $0.0012 | 0% |
| rerank_results | $0.0008 | $0.0007 | -12% |

One table. One culprit. Zero fumbling through generic dashboards.
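
To show roughly how such a report might be requested, here is a minimal sketch that builds the URL for GET /v1/report/markdown. The host is the one used in the cURL example on this page. Every query parameter name (workflow_name, previous_start, previous_end, current_start, current_end) and the dates are hypothetical placeholders, not the documented API; check the actual reference for the real names.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

BASE_URL = "https://api.tokengoblin.dev"  # host from the cURL example on this page


def report_url(workflow_name: str, current_start: date, days: int = 7) -> str:
    """Build a report URL comparing the `days` before current_start with the `days` after it."""
    previous_start = current_start - timedelta(days=days)
    current_end = current_start + timedelta(days=days)
    # All query parameter names below are hypothetical placeholders.
    params = {
        "workflow_name": workflow_name,
        "previous_start": previous_start.isoformat(),
        "previous_end": current_start.isoformat(),
        "current_start": current_start.isoformat(),
        "current_end": current_end.isoformat(),
    }
    return f"{BASE_URL}/v1/report/markdown?{urlencode(params)}"


# Illustrative dates only.
url = report_url("support_reply", date(2025, 3, 10))
print(url)
```

The request itself would also need the Authorization header shown in the cURL example.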

How It Works

  1. Instrument your code. Wrap each step of your LLM workflow with goblin.step(). Record input tokens, output tokens, and cost for every call.
  2. Let it run. TokenGoblin accumulates events as your production traffic flows. No dashboards to configure, no metrics to define.
  3. Compare two time windows. Call the report endpoint with two date ranges: current week vs last week, or any two periods you need to investigate.
  4. Get a drift report. A per-step breakdown with average cost before and after, plus the percent change. Know exactly where to start investigating.
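
The comparison in steps 3 and 4 can be sketched in a few lines. This is an illustration of the idea only, not TokenGoblin's actual implementation: it assumes each event is a dict with the step_name and cost_usd fields used in the REST payload, averages cost per step in each window, and reports the percent change. The sample events are made up so that the averages match the first two rows of the sample report.

```python
from collections import defaultdict
from decimal import Decimal
from statistics import mean


def average_cost_by_step(events):
    """Return {step_name: average cost} for a list of event dicts."""
    costs = defaultdict(list)
    for event in events:
        costs[event["step_name"]].append(Decimal(event["cost_usd"]))
    return {step: mean(values) for step, values in costs.items()}


def drift_report(previous_events, current_events):
    """Return (step, prev_avg, curr_avg, percent_change) rows, largest increase first."""
    prev = average_cost_by_step(previous_events)
    curr = average_cost_by_step(current_events)
    rows = []
    for step in prev.keys() & curr.keys():
        if prev[step] == 0:
            continue  # percent change is undefined against a zero baseline
        change = (curr[step] - prev[step]) / prev[step] * 100
        rows.append((step, prev[step], curr[step], change))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Made-up sample events for two windows.
previous = [
    {"step_name": "summarize_context", "cost_usd": "0.0020"},
    {"step_name": "summarize_context", "cost_usd": "0.0022"},
    {"step_name": "generate_reply", "cost_usd": "0.0035"},
]
current = [
    {"step_name": "summarize_context", "cost_usd": "0.0047"},
    {"step_name": "summarize_context", "cost_usd": "0.0049"},
    {"step_name": "generate_reply", "cost_usd": "0.0036"},
]

rows = drift_report(previous, current)
for step, prev_avg, curr_avg, change in rows:
    print(f"{step}: {prev_avg} -> {curr_avg} ({change:+.0f}%)")
```

Sorting by percent change puts the step that drifted most at the top, which is the row to investigate first.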

One Context Manager, One Step

Python SDK. Wrap each logical step. That's the whole integration.

Python SDK
from tokengoblin import TokenGoblin

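goblin = TokenGoblin(
    api_url="http://127.0.0.1:8000",  # local instance; point this at your API host
    api_key="tgproj_...",
    workflow_name="support_reply",
    provider="openai",
    model="gpt-4.1-mini",
)

# One context manager per logical step of the workflow.
with goblin.step("summarize_context") as step:
    # Record the token counts and cost of the LLM call made in this step.
    step.record_usage(
        input_tokens=7600,
        output_tokens=420,
        cost_usd="0.00443",  # passed as a string, as in the REST payload
    )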
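
The example passes cost_usd as a string. One way to produce such a value without floating-point rounding noise is Python's decimal module. The sketch below is an assumption about how you might compute the figure yourself, not part of the SDK, and the per-million-token prices are hypothetical placeholders chosen so the result matches the example; substitute your provider's actual rates.

```python
from decimal import Decimal

# Hypothetical prices in USD per million tokens; replace with your provider's real rates.
INPUT_PRICE_PER_MILLION = Decimal("0.50")
OUTPUT_PRICE_PER_MILLION = Decimal("1.50")


def cost_usd(input_tokens: int, output_tokens: int) -> str:
    """Return the call cost as a decimal string rounded to five places."""
    total = (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / Decimal(1_000_000)
    return str(total.quantize(Decimal("0.00001")))


print(cost_usd(7600, 420))
```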

Prefer plain HTTP? REST works too.

cURL
curl -X POST https://api.tokengoblin.dev/v1/events \
  -H "Authorization: Bearer tgproj_..." \
  -H "Content-Type: application/json" \
  -d '{
    "workflow_name": "support_reply",
    "step_name": "summarize_context",
    "provider": "openai",
    "model": "gpt-4.1-mini",
    "input_tokens": 7600,
    "output_tokens": 420,
    "cost_usd": "0.00443"
  }'
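
For reference, the same request can be built with Python's standard library alone. This sketch mirrors the cURL call above (same URL, headers and payload) and constructs the request without sending it, so it runs offline; build_event_request is an illustrative helper name, not part of any SDK.

```python
import json
import urllib.request

API_URL = "https://api.tokengoblin.dev/v1/events"


def build_event_request(api_key: str, event: dict) -> urllib.request.Request:
    """Build the POST request for one usage event, mirroring the cURL example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


event = {
    "workflow_name": "support_reply",
    "step_name": "summarize_context",
    "provider": "openai",
    "model": "gpt-4.1-mini",
    "input_tokens": 7600,
    "output_tokens": 420,
    "cost_usd": "0.00443",
}
request = build_event_request("tgproj_...", event)
# To send it: urllib.request.urlopen(request). Omitted here so the sketch needs no network.
```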

Provider-Agnostic

TokenGoblin tracks cost per step regardless of which LLM provider you use. Any model, any API. One consistent drift report.

◇ OpenAI ◇ Anthropic ◇ Azure OpenAI ◇ Google Gemini ◇ AWS Bedrock ◇ Together AI ◇ Groq
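
Providers report token usage under different field names, so a thin normalization layer in your own code keeps the recorded events consistent. The sketch below is an assumption about how you might do that, not something TokenGoblin ships. It covers two providers, using the usage field names from OpenAI's Chat Completions responses (prompt_tokens, completion_tokens) and Anthropic's Messages responses (input_tokens, output_tokens); verify these against the provider documentation for the API version you use.

```python
def normalize_usage(provider: str, usage: dict) -> dict:
    """Map a provider's usage payload onto input_tokens / output_tokens."""
    if provider == "openai":
        # Chat Completions style usage block.
        return {
            "input_tokens": usage["prompt_tokens"],
            "output_tokens": usage["completion_tokens"],
        }
    if provider == "anthropic":
        # Messages API style usage block.
        return {
            "input_tokens": usage["input_tokens"],
            "output_tokens": usage["output_tokens"],
        }
    raise ValueError(f"no usage mapping defined for provider: {provider}")


openai_usage = normalize_usage("openai", {"prompt_tokens": 7600, "completion_tokens": 420})
anthropic_usage = normalize_usage("anthropic", {"input_tokens": 7600, "output_tokens": 420})
print(openai_usage == anthropic_usage)
```

The normalized dict can then be passed straight to step.record_usage alongside the cost.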

Ingestion Architecture

Your Code → SDK / REST → TokenGoblin API → Drift Report

Future: OpenTelemetry exporter for zero-code instrumentation

Where We're Headed

$ what-is-real
  • Production-grade FastAPI backend with PostgreSQL
  • Python SDK with context-manager API
  • Per-step cost drift detection between two time windows
  • Markdown and JSON report formats
  • Organization and project scoping with scoped API keys
$ what-is-next
  • Automated deploy correlation: "this PR increased your AI bill by 14%"
  • OpenTelemetry exporter for zero-code instrumentation
  • Alerting thresholds on per-step cost drift
  • CI/CD checks that flag cost regressions before they merge

Join the Beta

TokenGoblin is in active development. Early users get free access and direct input on the roadmap. If you're debugging LLM costs in production, we want to talk to you.