Find the Root Cause of LLM Cost Spikes

Everything you need to debug cost regressions

🔍

Forensic Report

Classifies incidents as context_bloat, behavioral_drift, traffic_spike, and more. Ranks hypotheses by confidence (%) and severity.

🔗

Automatic Attribution

Detects the deploy SHA from GitHub Actions, GitLab CI, and more. Correlates anomalies with specific deploys and reports how soon after each deploy they began.

⚡

One-Line SDK

Wrap each step with goblin.step() and record its usage. Trace IDs, prompt hashes, and the deploy SHA are captured automatically.

📋

JSON + Markdown

Human-readable reports for engineers or structured JSON for dashboards and automation. Your choice of output.

🔒

Privacy-First

No prompts leave your infrastructure. Only token counts, hashes, and metadata ever reach the API.

🌐

Provider-Agnostic

Works with OpenAI, Anthropic, Gemini, Bedrock, Azure, Together AI, Groq, and any OpenAI-compatible endpoint.
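The CI-based deploy SHA detection described above can be approximated by reading the commit variables that CI systems already export. This is a sketch of the general approach, not TokenGoblin's implementation: `GITHUB_SHA` (GitHub Actions), `CI_COMMIT_SHA` (GitLab CI), and `CIRCLE_SHA1` (CircleCI) are real CI variables, while the `DEPLOY_SHA` fallback and the `detect_deploy_sha` name are hypothetical.

```python
import os
from typing import Mapping, Optional

# Commit-SHA variables that common CI systems export, checked in order.
CI_SHA_VARS = (
    "GITHUB_SHA",     # GitHub Actions
    "CI_COMMIT_SHA",  # GitLab CI
    "CIRCLE_SHA1",    # CircleCI
    "DEPLOY_SHA",     # hypothetical manual override, not a TokenGoblin setting
)


def detect_deploy_sha(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty commit SHA found in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    for name in CI_SHA_VARS:
        value = (env.get(name) or "").strip()
        if value:
            return value
    return None  # not running in a recognised CI environment
```

For example, `detect_deploy_sha({"CI_COMMIT_SHA": "a1b2c3d"})` returns `"a1b2c3d"`. Passing the environment in as a mapping keeps the function easy to test without touching real environment variables.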

One context manager. Zero manual tracking.

Wrap each LLM call with goblin.step(). Deploy SHA, trace IDs, and prompt hashes are captured automatically.

from tokengoblin import TokenGoblin

goblin = TokenGoblin(
    api_key="tgproj_...",
    workflow_name="support_reply",
    # Providers are zero-argument callables; `request` stands for your
    # web framework's current request object.
    user_id_provider=lambda: request.user.id,
    tenant_id_provider=lambda: request.tenant.id,
)

with goblin.step("summarize_context") as step:
    # Make your LLM call here, then record the usage it reports.
    step.record_usage(
        input_tokens=7600,
        output_tokens=420,
        cost_usd="0.00443",
    )
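The `user_id_provider` and `tenant_id_provider` arguments in the snippet take zero-argument callables. If there is no global `request` object (a background worker, for instance), one option is to hold the current user and tenant in `contextvars`, which keep separate values per thread and per asyncio task. A minimal sketch; `current_user_id`, `current_tenant_id`, and `acting_as` are hypothetical names, and it assumes the SDK calls the providers while a step is active.

```python
import contextvars
from contextlib import contextmanager
from typing import Iterator, Optional

# Hypothetical per-request context; each thread / asyncio task sees its own value.
current_user_id = contextvars.ContextVar("current_user_id", default=None)
current_tenant_id = contextvars.ContextVar("current_tenant_id", default=None)


@contextmanager
def acting_as(user_id: str, tenant_id: str) -> Iterator[None]:
    """Set the user and tenant for the duration of a block, then restore them."""
    user_token = current_user_id.set(user_id)
    tenant_token = current_tenant_id.set(tenant_id)
    try:
        yield
    finally:
        current_tenant_id.reset(tenant_token)
        current_user_id.reset(user_token)


# Zero-argument callables to pass as user_id_provider / tenant_id_provider.
def user_id_provider() -> Optional[str]:
    return current_user_id.get()


def tenant_id_provider() -> Optional[str]:
    return current_tenant_id.get()
```

Usage: pass `user_id_provider=user_id_provider` to the constructor, then wrap work in `with acting_as("user_789", "tenant_42"):` so any `goblin.step(...)` inside it sees those IDs.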

curl -H "Authorization: Bearer tgproj_..." \
  "https://api.tokengoblin.gobblsoftware.com/v1/forensic-report?period=last-7d&format=json"

{
  "classification": "context_bloat",
  "confidence": 0.95,
  "hypotheses": [
    {
      "rank": 1,
      "description": "Input tokens exploded 60,000×",
      "severity": "critical",
      "recommendation": "..."
    }
  ],
  "deploy_sha": "a1b2c3d4e5f..."
}
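The same request can be made from Python with only the standard library. This sketch mirrors the curl call (same URL, same Bearer header) and picks the top-ranked hypothesis out of a response shaped like the sample JSON. `build_report_request`, `fetch_report`, and `top_hypothesis` are hypothetical helper names, and the parser relies only on the `hypotheses` and `rank` fields shown in the sample.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://api.tokengoblin.gobblsoftware.com/v1/forensic-report"


def build_report_request(api_key: str, period: str = "last-7d",
                         fmt: str = "json") -> urllib.request.Request:
    """Build the same GET request as the curl example (not sent yet)."""
    query = urllib.parse.urlencode({"period": period, "format": fmt})
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_report(api_key: str, period: str = "last-7d") -> Dict[str, Any]:
    """Send the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(build_report_request(api_key, period)) as resp:
        return json.load(resp)


def top_hypothesis(report: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the hypothesis with the lowest rank number, or None if there are none."""
    return min(report.get("hypotheses", []), key=lambda h: h["rank"], default=None)
```

Selecting by `rank` rather than list position avoids depending on the order in which hypotheses are serialized.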

How It Works

1

Instrument

Add the tokengoblin SDK and wrap each step of your LLM workflow with goblin.step(). Deploy SHA is captured automatically from CI.

2

Run

TokenGoblin collects cost, token, and trace data. Every step gets a trace ID, prompt hash, and deploy SHA — nothing more, nothing less.

3

Investigate

Call GET /v1/forensic-report with a relative timeframe (last-24h, last-7d, last-30d) or a custom one to get a structured report with classification, ranked hypotheses, and evidence.

4

Act

Follow concrete recommendations — add a circuit breaker, revert a config change, or investigate a specific deploy. No guessing.
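A circuit breaker for oversized calls can be small. The sketch below is a classic circuit breaker keyed on request size rather than errors: any call above the token limit is rejected, and after several oversized attempts in a row the breaker opens and blocks every call until a cooldown passes. It assumes you know (or can estimate) input tokens before making the call. The 500k default mirrors the sample report's recommendation; `TokenCircuitBreaker` and `TokenBudgetExceeded` are hypothetical, not part of the tokengoblin SDK.

```python
import time
from typing import Callable, Optional


class TokenBudgetExceeded(RuntimeError):
    """Raised instead of making an LLM call that is over budget."""


class TokenCircuitBreaker:
    """Rejects oversized calls; opens fully after repeated oversized attempts."""

    def __init__(self, max_input_tokens: int = 500_000, trip_after: int = 3,
                 cooldown_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_input_tokens = max_input_tokens
        self.trip_after = trip_after
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.oversized_streak = 0
        self.open_until: Optional[float] = None

    def check(self, input_tokens: int) -> None:
        """Raise TokenBudgetExceeded if this call should not be made."""
        now = self.clock()
        if self.open_until is not None:
            if now < self.open_until:
                raise TokenBudgetExceeded("circuit open: recent calls exceeded the token budget")
            # Cooldown elapsed: close the breaker and start counting again.
            self.open_until = None
            self.oversized_streak = 0
        if input_tokens > self.max_input_tokens:
            self.oversized_streak += 1
            if self.oversized_streak >= self.trip_after:
                self.open_until = now + self.cooldown_s
            raise TokenBudgetExceeded(
                f"{input_tokens:,} input tokens exceeds limit of {self.max_input_tokens:,}"
            )
        self.oversized_streak = 0  # a normal-sized call resets the streak
```

Call `breaker.check(input_tokens)` right before the LLM request. The injectable `clock` keeps the cooldown logic testable without sleeping.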

🔍 Incident Report: RAG Pipeline Cost Spike

Last Thursday, a routine config change doubled the cost of our RAG pipeline. Our retrieval step went from $0.0021/call to $0.0065/call. TokenGoblin's forensic report classified the incident as context_bloat with 95% confidence and correlated the anomaly to Thursday's deploy. We reverted the chunk size config and saved $1,200/month.
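A quick check of the figures in this incident: the retrieval step's per-call cost ratio, and the monthly call volume that the $1,200/month saving would imply if the entire saving came from undoing the retrieval step's per-call increase. That second number is an inference for illustration, not a figure reported in the incident.

```python
# Retrieval-step cost per call before and after the config change (from the incident).
before_usd = 0.0021
after_usd = 0.0065
monthly_saving_usd = 1200

ratio = after_usd / before_usd           # ≈ 3.1× for the retrieval step alone
extra_per_call = after_usd - before_usd  # ≈ $0.0044 per call
# Assumption: the whole saving comes from removing the per-call increase.
implied_calls_per_month = monthly_saving_usd / extra_per_call  # ≈ 273,000 calls

print(f"ratio: {ratio:.2f}x")
print(f"extra per call: ${extra_per_call:.4f}")
print(f"implied calls/month: {implied_calls_per_month:,.0f}")
```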

Sample Forensic Report

One API call comparing two weeks of a support_reply workflow. Ranked hypotheses. Actionable recommendations. No guessing.
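For `period=last-7d` the sample compares two consecutive weeks. One way to read a relative period is as the most recent window (Period B) plus an equally long baseline immediately before it (Period A). The sketch below assumes exactly that split; the parsing rules and the `comparison_windows` name are illustrative, not the API's documented behavior.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Tuple

Window = Tuple[datetime, datetime]
_UNITS = {"h": "hours", "d": "days"}


def comparison_windows(period: str, now: datetime) -> Tuple[Window, Window]:
    """Split e.g. 'last-7d' into (period_a, period_b) as (start, end) pairs.

    Assumption: period B is the most recent window ending at `now`, and
    period A is the same-length window immediately before it.
    """
    match = re.fullmatch(r"last-(\d+)([hd])", period)
    if match is None:
        raise ValueError(f"unsupported period: {period!r}")
    length = timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})
    period_b = (now - length, now)
    period_a = (now - 2 * length, now - length)
    return period_a, period_b


if __name__ == "__main__":
    a, b = comparison_windows("last-7d", datetime.now(timezone.utc))
    print("Period A:", a)
    print("Period B:", b)
```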

GET /v1/forensic-report?period=last-7d

# 🔍 Forensic Cost Report

**Classification:** context_bloat — Context Bloat

## Summary
|                | Period A   | Period B   | Delta                    |
|----------------|------------|------------|--------------------------|
| **Cost**       | $1.2000    | $3.8400    | +$2.6400  (+220%)        |
| **Error Rate** | 0.0%       | 0.0%       | +0.0 pp                  |

## Deploy Correlation
- **All anomalous events share** deploy_sha = a1b2c3d (deployed Apr 3)
- **First anomaly detected** 10 hours after deploy

## Top Hypothesis
**[CRITICAL, 95% confidence]** Input tokens per call exploded 60,000×
- Input tokens per call: 100 → 6,000,000
- **Recommendation:** Add a circuit breaker for calls exceeding 500k tokens

## Timeline
- **First anomaly detected:** 2026-04-03 14:22 UTC
- **Detection window:** Mar 25 00:00 UTC – Apr 01 00:00 UTC
- **Related deploy:** sha:a1b2c3d4e5f...

## Attribution Summary

**Deploy correlation:**
- All anomalous events share deploy_sha = a1b2c3d (deployed Apr 3).
  First anomaly detected 10 hours later.

**Top users by cost increase:**
| user_id | Period A | Period B | Delta |
|---------|----------|----------|-------|
| user_789 | $0.00 | $145.20 | +$145.20 |

## Step: summarize_context
**Root cause:** Input tokens per call exploded 60,000×…

### Hypotheses
1. **[CRITICAL, 95% confidence]** Input tokens per call exploded 60,000×
   - Evidence: Input tokens per call: 100 → 6,000,000
   - See Evidence Log: prompt_hashes, trace_ids.
   - **Recommendation:** Add a circuit breaker for calls exceeding 500k tokens.

2. **[MEDIUM, 60% confidence]** A high number of calls per run (13.5) amplifies the impact
   - Evidence: Calls per run: 0 → 13.5
   - **Recommendation:** Investigate retry loops or missing caches.

---
## Remediation Priority
**HIGH**: immediate action recommended.
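The headline numbers in this sample can be reproduced from its own figures: the cost delta and percentage in the Summary table, the percentage-point change in error rate, and the 60,000× growth in input tokens per call. A small sketch with illustrative function names:

```python
def pct_change(before: float, after: float) -> float:
    """Relative change in percent, e.g. 1.20 -> 3.84 is +220%."""
    if before == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return (after - before) / before * 100


def pp_change(before_pct: float, after_pct: float) -> float:
    """Change between two rates, in percentage points."""
    return after_pct - before_pct


cost_a, cost_b = 1.20, 3.84
print(f"Cost delta: +${cost_b - cost_a:.4f} ({pct_change(cost_a, cost_b):+.0f}%)")
print(f"Error rate delta: {pp_change(0.0, 0.0):+.1f} pp")
print(f"Input tokens per call: {6_000_000 / 100:,.0f}x")
```

A zero baseline, such as the 0 → 13.5 calls per run in the second hypothesis, has no defined percentage change, which is why absolute before/after values are useful alongside percentages.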

Get Your API Key

Enter your email to receive a verification link. Verify and we'll send you a project API key instantly. Early users get free access during the beta.

Where We're Headed

✓ Already Live

Forensic root-cause reports with confidence-ranked hypotheses
Incident classification (context_bloat, behavioral_drift, etc.)
Automatic deploy SHA detection from CI/CD
Deploy correlation with anomaly timing
User / tenant attribution providers
Evidence logs with trace IDs and prompt hashes
Relative timeframes (last-24h, last-7d, last-30d, custom)
JSON & Markdown output formats
Organization and project scoping with scoped API keys
Python SDK with context-manager API
Production-grade FastAPI backend with PostgreSQL
JSONL sink for local development
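A JSONL sink stores one JSON object per line. The sketch below shows what a local record might look like under the privacy model described earlier: the prompt is reduced to a SHA-256 hash before anything is written, so only token counts, hashes, and metadata are stored. The field names, the choice of SHA-256, and the `step_event_line` / `append_jsonl` helpers are assumptions for illustration, not the SDK's actual schema.

```python
import hashlib
import json
import uuid
from typing import Optional


def prompt_hash(prompt: str) -> str:
    """Stable fingerprint of a prompt; the prompt text itself is never stored."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def step_event_line(*, workflow: str, step: str, prompt: str, input_tokens: int,
                    output_tokens: int, cost_usd: str,
                    deploy_sha: Optional[str] = None) -> str:
    """Format one step record as a single newline-terminated JSON line."""
    event = {
        "trace_id": uuid.uuid4().hex,
        "workflow_name": workflow,
        "step": step,
        "prompt_hash": prompt_hash(prompt),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": cost_usd,  # a string, as in the SDK example
        "deploy_sha": deploy_sha,
    }
    return json.dumps(event, separators=(",", ":")) + "\n"


def append_jsonl(path: str, line: str) -> None:
    """Append one pre-formatted record to a local .jsonl file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
```

Because identical prompts produce identical hashes, repeated prompts can still be grouped and compared across deploys without the text leaving your machine.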

⏳ Coming Soon

OpenTelemetry exporter for zero-code instrumentation
Alerting webhooks on per-step cost drift thresholds
Budget caps with automatic notifications
CI/CD gates that flag cost regressions before merge
Custom dashboards with historical trends

Works With Your Stack

◇ OpenAI ◇ Anthropic ◇ Azure OpenAI ◇ Google Gemini ◇ AWS Bedrock ◇ Together AI ◇ Groq

Any model, any API. One consistent forensic report.

Find the root cause of LLM cost spikes in minutes.