API Docs | TokenGoblin

Base URL

https://api.tokengoblin.gobblsoftware.com

For self-hosted deployments, use your own domain or http://127.0.0.1:8000 when running locally via Docker Compose.

Authentication

All project‑scoped endpoints require a project API key in the Authorization header:

Authorization: Bearer tgproj_a1b2c3d4e5f6_...

API keys are scoped to a single project with events:write and/or events:read permissions. The plaintext key is shown only once at creation time; only its hash is stored.

API keys follow the format tgproj_<12-hex-prefix>_. To get one, join the beta.
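As a rough illustration of the header and key format described above, the following sketch builds the Authorization header and rejects strings that do not look like a project key. The helper name `auth_header` and the regular expression are assumptions made for this example (in particular, that the prefix is lowercase hex); they are not part of the TokenGoblin SDK.

```python
import re

# Assumption: the 12-character prefix is lowercase hex, as in the example key above.
KEY_PATTERN = re.compile(r"^tgproj_[0-9a-f]{12}_.+$")


def auth_header(api_key: str) -> dict[str, str]:
    """Return the Authorization header for a project API key (hypothetical helper)."""
    if not KEY_PATTERN.match(api_key):
        raise ValueError("key does not look like tgproj_<12-hex-prefix>_...")
    return {"Authorization": f"Bearer {api_key}"}
```

A client-side check like this only catches obvious mistakes, such as pasting a key from another service; the server remains the authority on whether a key is valid.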

Error Codes

Status	Description
`200 OK`	Request succeeded, or a duplicate event was recognized by its `event_id` and not ingested again.
`201 Created`	Event ingested successfully.
`400 Bad Request`	Invalid parameters (malformed date, mutually exclusive options, period ordering).
`401 Unauthorized`	Missing or invalid API key.
`403 Forbidden`	Valid key but insufficient scope (e.g. write with a read‑only key).
`404 Not Found`	Endpoint or project not found.
`413 Payload Too Large`	Request body or raw_payload exceeds size limits.
`422 Unprocessable Entity`	Validation error (missing required field, invalid value).
`429 Too Many Requests`	Rate limit exceeded. Response includes `Retry-After` header.
`500 Internal Server Error`	Server-side error — report via support.
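One way a client might act on the status codes in the table is sketched below. The grouping is an assumption made for illustration: in particular, treating `500` as retryable is a common client convention, not something these docs prescribe.

```python
# Assumption: 429 and 500 are worth retrying; the 4xx codes need a changed request.
RETRYABLE = {429, 500}
FIX_REQUEST = {400, 401, 403, 404, 413, 422}


def classify_status(status: int) -> str:
    """Map an HTTP status code from the table to a coarse client action."""
    if status in (200, 201):
        return "ok"
    if status in RETRYABLE:
        return "retry"
    if status in FIX_REQUEST:
        return "fix_request"
    return "unexpected"
```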

Concepts

workflow_name: A logical grouping name for a pipeline (e.g. support_reply, rag_pipeline). Defaults to "default".
step_name: A specific step within the workflow (e.g. summarize_context, generate_reply).
trace_id: Automatically generated UUID that groups all events from the same workflow invocation. SDK‑managed; invisible to users.
prompt_hash: Deterministic SHA‑256 hash of the prompt (truncated to 16 hex chars). Used to detect when the same prompt's cost changes. Auto‑computed by the SDK.
deploy_sha: Git commit SHA automatically captured by the SDK from CI/CD environment variables. Powers deploy correlation in forensic reports.
user_id / tenant_id: Optional identifiers populated by user_id_provider and tenant_id_provider callables. Enable "which user/tenant drove the cost increase?" in forensic reports.
hypothesis: A possible explanation for a cost change, ranked by estimated dollar impact, with confidence (0–100%), severity, evidence, and a concrete recommendation.
evidence log: Raw forensic data — trace IDs, prompt hashes, model/provider changes — for deep-dive investigation.
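To make the prompt_hash definition concrete, here is a minimal sketch of a SHA-256 hash truncated to 16 hex characters. It assumes the prompt is hashed as UTF-8 bytes with no normalization; the SDK may preprocess the prompt differently, so values produced here are not guaranteed to match the SDK's.

```python
import hashlib


def prompt_hash(prompt: str) -> str:
    """SHA-256 of the prompt, truncated to the first 16 hex characters.

    Assumption: UTF-8 encoding, no whitespace or template normalization.
    """
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:16]
```

Because the hash is deterministic, two events with the same prompt text share a hash, which is what lets a report notice that the same prompt became more expensive.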

POST /v1/events

Ingest one LLM workflow step event.

Request Body

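JSON (the // comments below are annotations showing defaults; standard JSON does not allow comments, so omit them from real requests)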

{
  // Required
  "step_name": "summarize_context",
  "provider": "openai",
  "model": "gpt-4.1-mini",

  // Optional — with defaults
  "workflow_name": "support_reply",        // "default"
  "input_tokens": 7600,                   // 0
  "output_tokens": 420,                   // 0
  "cost_usd": "0.00443",                  // "0" — string avoids float issues
  "timestamp": "2026-04-07T14:32:00Z",    // server current UTC
  "latency_ms": 342.5,                    // optional
  "error": false,                         // false
  "error_message": null,                  // optional
  "event_id": "evt_01HQ...",              // auto UUID — provides idempotency

  // SDK-managed — attached automatically
  "trace_id": "550e8400-...",             // auto UUID per workflow run
  "prompt_hash": "a1b2c3d4e5f6a7b8",       // SHA‑256 truncated to 16 hex

  // Attribution — SDK auto-detect or manual
  "deploy_sha": "a1b2c3d4e5f...",          // Git SHA from CI/CD or git (max 64 chars)
  "user_id": "user_789",                   // from user_id_provider (max 255)
  "tenant_id": "acme_corp"                 // from tenant_id_provider (max 255)
}

Idempotency

If an event_id is provided, duplicate POSTs return 200 OK with "idempotent": true and the existing event — no duplicate is created.
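The idempotency rule can be modeled with a small in-memory store. This is only a toy model of the behavior described above, not the server's implementation; the class name `InMemoryEventStore` is hypothetical.

```python
import uuid


class InMemoryEventStore:
    """Toy model of event_id idempotency (not the real server)."""

    def __init__(self) -> None:
        self._events: dict[str, dict] = {}

    def ingest(self, payload: dict) -> tuple[int, dict]:
        # Without a caller-supplied event_id, a fresh UUID is assigned,
        # so the event can never collide with an earlier one.
        event_id = payload.get("event_id") or str(uuid.uuid4())
        if event_id in self._events:
            # Duplicate POST: return the existing event, create nothing.
            return 200, {"event": self._events[event_id], "idempotent": True}
        event = {**payload, "event_id": event_id}
        self._events[event_id] = event
        return 201, {"event": event, "idempotent": False}
```

In practice this means a client that retries after a timeout should reuse the same event_id, so a request that actually succeeded the first time is not counted twice.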

Response `201 Created`

{
  "event": {
    "id": "uuid",
    "event_id": "evt_01HQ...",
    "timestamp": "2026-04-07T14:32:00Z",
    "workflow_name": "support_reply",
    "step_name": "summarize_context",
    "provider": "openai",
    "model": "gpt-4.1-mini",
    "input_tokens": 7600,
    "output_tokens": 420,
    "cost_usd": "0.00443",
    "latency_ms": 342.5,
    "deploy_sha": "a1b2c3d4e5f...",
    "user_id": "user_789",
    "tenant_id": "acme_corp",
    "error": false,
    "error_message": null,
    "created_at": "2026-04-07T14:32:01Z"
  },
  "idempotent": false
}

cURL

curl -X POST https://api.tokengoblin.gobblsoftware.com/v1/events \
  -H "Authorization: Bearer tgproj_..." \
  -H "Content-Type: application/json" \
  -d '{
    "step_name": "summarize_context",
    "provider": "openai",
    "model": "gpt-4.1-mini",
    "input_tokens": 7600,
    "output_tokens": 420,
    "cost_usd": "0.00443"
  }'
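For callers who are not using the SDK, the same request can be assembled with Python's standard library. The sketch below only builds the request object; the helper name `build_event_request` is an assumption for this example, and sending is left as a commented-out line.

```python
import json
import urllib.request

BASE_URL = "https://api.tokengoblin.gobblsoftware.com"


def build_event_request(api_key: str, event: dict, base_url: str = BASE_URL):
    """Build (but do not send) a POST /v1/events request (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{base_url}/v1/events",
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_event_request(
    "tgproj_...",
    {
        "step_name": "summarize_context",
        "provider": "openai",
        "model": "gpt-4.1-mini",
        "input_tokens": 7600,
        "output_tokens": 420,
        "cost_usd": "0.00443",  # string, as in the request body above
    },
)
# To send: urllib.request.urlopen(req)
```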

Python SDK

SDK with deploy SHA (auto), prompt hash (auto), and attribution providers

from tokengoblin import TokenGoblin

goblin = TokenGoblin(
    api_key="tgproj_...",
    workflow_name="support_reply",
    # deploy_sha is auto-detected from CI/CD env vars or git; no code needed

    # Optional providers; `request` is your web framework's request object
    # (e.g. in a FastAPI/Flask request context)
    user_id_provider=lambda: request.user.id,
    tenant_id_provider=lambda: request.tenant.id,

    # Explicit SHA override (for manual deploys or testing)
    commit_sha="a1b2c3d4e5f...",
)

with goblin.step(
    "summarize_context",
    prompt="You are a helpful assistant...",
) as step:
    # Your LLM call here
    step.record_usage(
        input_tokens=7600,
        output_tokens=420,
        cost_usd="0.00443",
    )

SDK Constructor Reference

Parameter	Type	Description
`api_key`	`str`	Project API key. Required for `sink="api"`.
`api_url`	`str`	Base URL. Defaults to `TOKEN_GOBLIN_API_URL` env var or `http://127.0.0.1:8000`.
`sink`	`str`	`"api"` (default) or `"jsonl"` for local file output.
`workflow_name`	`str`	Logical workflow name. Defaults to `"default"`.
`provider`	`str`	Default LLM provider name. Defaults to `"unknown"`.
`model`	`str`	Default model identifier. Defaults to `"unknown"`.
`trace_id`	`str`	Workflow‑run identifier. Auto‑generated UUID if omitted.
`user_id_provider`	`Callable[[], str] \| None`	Optional callable that returns the current user ID. Called once per `step()`.
`tenant_id_provider`	`Callable[[], str] \| None`	Optional callable that returns the current tenant ID. Called once per `step()`.
`commit_sha`	`str \| None`	Optional override for deploy SHA. Auto‑detected from CI/CD or git if omitted.

GET /v1/forensic-report

The primary report endpoint. Returns a forensic root‑cause analysis with incident classification, confidence‑ranked hypotheses, evidence log, deploy correlation, and user/tenant attribution summary.

Query Parameters

Parameter	Type	Description
`period`	`string`	Relative timeframe: `last-24h`, `last-7d`, or `last-30d`. Compares the most recent period to the preceding one of equal length. Default: `last-24h`.
`since`	`string`	Fixed range start date (`YYYY-MM-DD`). Must be paired with `until`. Mutually exclusive with `period`.
`until`	`string`	Fixed range end date (`YYYY-MM-DD`).
`baseline_offset`	`string`	Offset for the comparison baseline (e.g. `-7d`). Defaults to an equal‑length shift.
`format`	`string`	Output format: `markdown` (default) or `json`.
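The parameter rules in the table can be checked on the client before a request is made. The following is a sketch under stated assumptions: the function name `forensic_report_query` is hypothetical, and the `since <= until` check is an inference from the "period ordering" wording of the 400 error, not a documented rule.

```python
from datetime import date
from urllib.parse import urlencode

ALLOWED_PERIODS = {"last-24h", "last-7d", "last-30d"}
ALLOWED_FORMATS = {"markdown", "json"}


def forensic_report_query(period=None, since=None, until=None,
                          baseline_offset=None, fmt="markdown") -> str:
    """Validate parameters and return a URL query string (hypothetical helper)."""
    if period is not None and (since is not None or until is not None):
        raise ValueError("period is mutually exclusive with since/until")
    if (since is None) != (until is None):
        raise ValueError("since and until must be provided together")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")

    params = {}
    if since is not None:
        # Assumption: the range start must not be after the range end.
        if date.fromisoformat(since) > date.fromisoformat(until):
            raise ValueError("since must not be after until")
        params["since"] = since
        params["until"] = until
    else:
        period = period or "last-24h"  # documented default
        if period not in ALLOWED_PERIODS:
            raise ValueError(f"period must be one of {sorted(ALLOWED_PERIODS)}")
        params["period"] = period
    if baseline_offset is not None:
        params["baseline_offset"] = baseline_offset
    params["format"] = fmt
    return urlencode(params)
```

The returned string is appended to `/v1/forensic-report?`; catching a bad combination locally saves a round trip that would otherwise end in `400 Bad Request`.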

How Attribution Works

Deploy SHA is captured automatically by the SDK on every event. The SDK reads CI/CD environment variables (GITHUB_SHA, CI_COMMIT_SHA, BITBUCKET_COMMIT, CIRCLE_SHA1, GIT_COMMIT, BUILD_SOURCEVERSION, TRAVIS_COMMIT, CF_REVISION) and falls back to git rev-parse HEAD. Each stored event carries a deploy_sha field, and the deploy_correlation section of the report is built from those stored fields — detecting new deploys and checking whether anomalies started after them.

User / tenant attribution is driven by user_id_provider and tenant_id_provider callables passed to the SDK constructor. Their return values are stored in user_id and tenant_id event fields. The attribution_summary section of the report aggregates those stored fields to show which users and tenants drove cost changes.
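The deploy SHA detection described above can be approximated as follows. This is a sketch, not the SDK's code: the function name `detect_deploy_sha` is hypothetical, and checking the variables in the order they are listed is an assumption, since the docs do not state a precedence.

```python
import os
import subprocess

# Variable names taken from the paragraph above; the order is an assumption.
CI_SHA_VARS = (
    "GITHUB_SHA", "CI_COMMIT_SHA", "BITBUCKET_COMMIT", "CIRCLE_SHA1",
    "GIT_COMMIT", "BUILD_SOURCEVERSION", "TRAVIS_COMMIT", "CF_REVISION",
)


def detect_deploy_sha(env=None):
    """Return the first non-empty CI SHA variable, else `git rev-parse HEAD`."""
    env = os.environ if env is None else env
    for name in CI_SHA_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True, timeout=5,
        )
    except (OSError, subprocess.SubprocessError):
        # git is missing, timed out, or we are not inside a repository.
        return None
    return result.stdout.strip() or None
```

Passing an explicit `env` mapping makes the lookup easy to exercise in tests without touching the real environment.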

Optional Headers

X-TokenGoblin-Deploy-SHA — appends a sha:… entry to the timeline's related_deploys list. Does not affect the primary deploy_correlation analysis, which is built entirely from stored event deploy_sha fields.
X-TokenGoblin-Business-Context — comma‑separated raw_payload metadata keys to surface in the business_impact section (e.g. feature,customer_tier). When omitted, the system auto‑discovers up to 5 non‑reserved metadata keys from events.

cURL — Markdown (no headers)

curl -s "https://api.tokengoblin.gobblsoftware.com/v1/forensic-report?period=last-7d" \
  -H "Authorization: Bearer tgproj_..."

cURL — JSON with optional headers

curl -s "https://api.tokengoblin.gobblsoftware.com/v1/forensic-report?period=last-7d&format=json" \
  -H "Authorization: Bearer tgproj_..." \
  -H "X-TokenGoblin-Deploy-SHA: a1b2c3d4e5f" \
  -H "X-TokenGoblin-Business-Context: feature,customer_tier"

JSON Response Schema

Response (format=json)

{
  "report_id": "uuid",
  "generated_at": "2026-06-14T12:00:00Z",
  "classification": "context_bloat",
  "classification_label": "Context Bloat",
  "periods": {
    "a": { "start": "...", "end": "..." },
    "b": { "start": "...", "end": "..." }
  },
  "cost_summary": {
    "total_a_usd": 1.2,
    "total_b_usd": 3.84,
    "delta_usd": 2.64,
    "delta_percent": 220.0
  },
  "timeline": {
    "first_anomaly": "2026-06-14T08:25:00+00:00",
    "detection_window": "Jun 07 00:00 UTC – Jun 14 00:00 UTC",
    "related_deploys": ["sha:a1b2c3d..."]
  },
  "steps": [
    {
      "step_name": "summarize_context",
      "workflow_name": "support_reply",
      "provider": "openai",
      "model": "gpt-4.1-mini",
      "events_a": 340,
      "events_b": 340,
      "cost_delta_usd": 2.64,
      "cost_delta_percent": 220.0,
      "is_new_step": false,
      "hypotheses": [
        {
          "rank": 1,
          "factor": "input_tokens_per_call",
          "confidence": 95,
          "severity": "critical",
          "description": "Input tokens per call exploded 6E4×…",
          "evidence": ["Input tokens per call: 100 → 6,000,000"],
          "recommendation": "Add a circuit breaker for calls > 500k tokens."
        },
        {
          "rank": 2,
          "factor": "calls_per_run",
          "confidence": 60,
          "severity": "medium",
          "description": "High calls per run (13.5) amplifies impact.",
          "evidence": ["Calls per run: 0 → 13.5"],
          "recommendation": "Investigate retry loops or missing caches."
        }
      ],
      "metrics": {
        "workflow_runs": { "a": 5, "b": 6, "abs_change": 1, "pct_change": 20 },
        "calls_per_run": { "a": 0, "b": 13.5, "abs_change": 13.5, "pct_change": null },
        "input_tokens_per_call": { "a": 100, "b": 6000000, "abs_change": 5999900, "pct_change": 5999900 },
        "output_tokens_per_call": { "a": 50, "b": 420, "abs_change": 370, "pct_change": 740 }
      },
      "evidence_log": {
        "trace_ids": ["uuid1", "uuid2"],
        "prompt_hashes": ["a1b2c3d4e5f6a7b8"],
        "model_changes": ["openai/gpt-4.1-mini → openai/gpt-4.1-mini (no change)"],
        "provider_changes": []
      }
    }
  ],
  "attribution_summary": {
    "top_users": [
      { "id": "user_789", "period_a_cost_usd": 0, "period_b_cost_usd": 145.2, "delta_usd": 145.2 }
    ],
    "top_tenants": []
  },
  "deploy_correlation": {
    "deploy_changes": [ { "deploy_sha": "a1b2c3d", "is_new_deploy": true, ... } ],
    "primary_deploy": { "deploy_sha": "a1b2c3d", ... },
    "has_new_deploy": true
  },
  "business_impact": null,
  "remediation_priority": "HIGH",
  "error_rate_a": 0.0,
  "error_rate_b": 0.0,
  "primary_driver_step": "summarize_context"
}
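The `metrics` entries in the sample response appear to follow a simple pattern: `abs_change` is `b - a`, and `pct_change` is the relative change in percent, or `null` when the baseline is zero. The sketch below reproduces that arithmetic and picks the top-ranked hypothesis for a step. Both helper names are hypothetical, and the zero-baseline rule is inferred from the sample rather than documented.

```python
def metric_change(a, b):
    """Recompute a metrics entry from its period A and period B values."""
    abs_change = b - a
    # Inferred from the sample: pct_change is null when the baseline is 0.
    pct_change = None if a == 0 else abs_change * 100 / a
    return {"a": a, "b": b, "abs_change": abs_change, "pct_change": pct_change}


def top_hypothesis(step: dict):
    """Return the rank-1 hypothesis of a step entry, or None if there are none."""
    hypotheses = step.get("hypotheses", [])
    return min(hypotheses, key=lambda h: h["rank"]) if hypotheses else None
```

For the sample values, `metric_change(100, 6000000)` gives an absolute change of 5,999,900 and a percentage change of 5,999,900, matching the `input_tokens_per_call` entry.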

Classification Labels

Value	Label
`model_or_pricing_drift`	Model or Pricing Drift
`behavioral_drift`	Behavioral Drift
`context_bloat`	Context Bloat
`traffic_spike`	Traffic Spike
`new_feature`	New Feature
`multiple_factors`	Multiple Factors
`unknown`	Unknown

GET /v1/drift

Compare cost and token usage between two explicit time periods. Returns a JSON response with total deltas, per‑model breakdowns, per‑day aggregations, and input‑vs‑output token splits. Prefer GET /v1/forensic-report for root‑cause analysis.

Query Parameters

period_a_start (required) — ISO‑8601
period_a_end (required) — ISO‑8601
period_b_start (required) — ISO‑8601
period_b_end (required) — ISO‑8601
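A query string for this endpoint might be built as shown below. The helper name `drift_query` is hypothetical, and the check that each period starts before it ends is an assumption based on the "period ordering" note under `400 Bad Request`. Timezone-aware datetimes are used so the ISO-8601 output carries an explicit offset.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode


def drift_query(a_start: datetime, a_end: datetime,
                b_start: datetime, b_end: datetime) -> str:
    """Return the URL-encoded query string for GET /v1/drift (hypothetical helper)."""
    for label, start, end in (("a", a_start, a_end), ("b", b_start, b_end)):
        # Assumption: each period must start strictly before it ends.
        if start >= end:
            raise ValueError(f"period_{label}_start must be before period_{label}_end")
    return urlencode({
        "period_a_start": a_start.isoformat(),
        "period_a_end": a_end.isoformat(),
        "period_b_start": b_start.isoformat(),
        "period_b_end": b_end.isoformat(),
    })


utc = timezone.utc
query = drift_query(
    datetime(2026, 4, 1, tzinfo=utc), datetime(2026, 4, 8, tzinfo=utc),
    datetime(2026, 4, 8, tzinfo=utc), datetime(2026, 4, 15, tzinfo=utc),
)
decoded = parse_qs(query)  # round-trips the percent-encoded timestamps
```

Using `urlencode` matters here because the `+` in an offset such as `+00:00` would otherwise be read as a space by the server.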

GET /v1/events

List ingested events with cursor‑based pagination.

cursor — opaque pagination cursor.
limit — 1–1000, default 100.
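Cursor-based pagination is usually consumed with a loop like the one sketched below. The response shape is not documented in this section, so the `events` and `next_cursor` field names are assumptions, as is the `fetch_page` callable, which stands in for whatever HTTP client performs the actual GET /v1/events call.

```python
from typing import Callable, Iterator, Optional


def iterate_events(
    fetch_page: Callable[[Optional[str], int], dict],
    limit: int = 100,
) -> Iterator[dict]:
    """Yield events page by page until no further cursor is returned.

    Assumption: each page is a dict with "events" (a list) and
    "next_cursor" (a string, or missing/None on the last page).
    """
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    cursor = None
    while True:
        page = fetch_page(cursor, limit)
        yield from page.get("events", [])
        cursor = page.get("next_cursor")
        if not cursor:
            return
```

Treating the cursor as opaque, that is, passing it back unchanged and never parsing it, keeps the client working if the server changes how cursors are encoded.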

Rate Limits & Limits

Rate Limits (Private Beta)

POST /v1/events — 120 req/min per project key, burst 240
GET /v1/events, GET /v1/drift, GET /v1/forensic-report, GET /v1/report/markdown — 30 req/min per project key, burst 60
/health, /ready — exempt
Unauthenticated requests — additional IP‑based rate limit

Rate‑limited responses include a Retry-After header. format=json does not change the limit.
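A client can use the Retry-After header to decide how long to wait before retrying. The sketch below assumes the header carries a number of seconds; HTTP also permits a date in this header, and these docs do not say which form is sent, so anything non-numeric falls back to capped exponential backoff. The function name and default values are hypothetical.

```python
def retry_delay_seconds(retry_after, attempt: int,
                        base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Prefers a numeric Retry-After value; otherwise uses base * 2**attempt,
    capped at `cap` seconds.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a plain number of seconds; fall through to backoff
    return min(cap, base * (2 ** attempt))
```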

Request Limits

Request body: 1 MiB (configurable via REQUEST_MAX_BYTES)
Stored raw_payload: 64 KiB compact JSON
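To stay under these limits, a client can measure a payload before sending it. The sketch below estimates the compact JSON size in UTF-8 bytes; how the server actually measures size (for example, whether non-ASCII characters are escaped) is not documented here, so treat the result as an approximation. The helper names are hypothetical.

```python
import json

REQUEST_MAX_BYTES = 1024 * 1024      # 1 MiB default request body limit
RAW_PAYLOAD_MAX_BYTES = 64 * 1024    # 64 KiB stored raw_payload limit


def compact_json_size(obj) -> int:
    """Size in bytes of obj serialized as compact (no-whitespace) UTF-8 JSON."""
    text = json.dumps(obj, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def fits_raw_payload(raw_payload) -> bool:
    """True if the payload is within the 64 KiB raw_payload limit (approximate)."""
    return compact_json_size(raw_payload) <= RAW_PAYLOAD_MAX_BYTES
```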

More Information

OpenAPI spec — /openapi.json on self‑hosted instances (local/dev mode)
Interactive docs — /docs and /redoc on self‑hosted instances
Deploy SHA detection — The SDK auto‑detects GITHUB_SHA, CI_COMMIT_SHA, BITBUCKET_COMMIT, CIRCLE_SHA1, GIT_COMMIT, BUILD_SOURCEVERSION, TRAVIS_COMMIT, CF_REVISION. Falls back to git rev-parse HEAD. No configuration needed.
User / tenant attribution — Pass callables to TokenGoblin() to attach user and tenant context to every event. Reports then show which users/tenants drove cost increases.
Contact us — beta signup form
