
TokenGoblin API

Ingest LLM cost events per workflow step. Generate drift reports that compare two time windows with per-step percent change.

Base URL

All API endpoints are served from:

https://api.tokengoblin.dev

For self-hosted deployments, use your own domain or http://127.0.0.1:8000 when running locally via Docker Compose.

Authentication

All project-scoped endpoints require a project API key. Pass it in the Authorization header:

Authorization: Bearer tgproj_01H7...

API keys are created through the admin API and are scoped to a single project. Each key can have events:write and/or events:read scopes. The plaintext key is shown only once at creation time; only its hash is stored.

API keys follow the format tgproj_<12-hex-prefix>_.

To get an API key, join the beta.

POST /v1/events

Ingest one LLM workflow step event.

Request Body

{
  // Required
  "step_name": "summarize_context",
  "provider": "openai",
  "model": "gpt-4.1-mini",

  // Optional (defaults shown)
  "workflow_name": "support_reply",           // default: "default"
  "input_tokens": 7600,                       // default: 0
  "output_tokens": 420,                       // default: 0
  "cost_usd": "0.00443",                      // default: "0", as string (avoids float issues)
  "timestamp": "2026-04-07T14:32:00Z",        // default: server current time
  "latency_ms": 342.5,                        // optional, milliseconds
  "metadata_json": {"prompt_version":"v3"},   // max 16 KiB
  "error": false,                             // default: false
  "error_message": null,                      // optional
  "event_id": "evt_01HQ..."                   // auto-generated UUID if omitted
}

Fields

  • step_name (required) - The specific step within the workflow (e.g. "summarize_context").
  • provider (required) - LLM provider name. Free-form, use consistent values.
  • model (required) - Model identifier (e.g. "gpt-4.1-mini", "claude-3-opus").
  • workflow_name - Logical grouping (e.g. "support_reply", "rag_pipeline"). Defaults to "default".
  • input_tokens, output_tokens - Token counts as reported by the provider. Default to 0.
  • cost_usd - Cost in USD as a string (avoids floating-point issues). Defaults to "0".
  • timestamp - ISO-8601 UTC. If omitted, server uses current time.
  • latency_ms - Request duration in milliseconds.
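  • metadata_json - Arbitrary JSON object for tagging (max 16 KiB compact JSON).
  • error - Boolean error flag for the step. Defaults to false.
  • error_message - Optional error description. Null when omitted.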
  • event_id - Optional client-generated UUID. Duplicate events return the existing event with idempotent: true. Auto-generated if omitted.
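
Because cost_usd travels as a string, it helps to keep the value in a Decimal on the client and only convert at the last moment. The following is a minimal sketch of a request-body builder; the helper name make_event and the per-token prices are hypothetical and exist only for illustration.

```python
from decimal import Decimal


def make_event(step_name, provider, model, cost_usd=Decimal("0"), **optional):
    """Return a POST /v1/events body with cost_usd serialized as a string."""
    if not isinstance(cost_usd, Decimal):
        # Going through str() avoids carrying binary float noise into the Decimal.
        cost_usd = Decimal(str(cost_usd))
    event = {
        "step_name": step_name,
        "provider": provider,
        "model": model,
        "cost_usd": format(cost_usd, "f"),  # plain decimal notation, no exponent
    }
    event.update(optional)  # workflow_name, input_tokens, latency_ms, ...
    return event


# Illustrative per-token prices, not real provider pricing.
cost = Decimal("7600") * Decimal("0.0000004") + Decimal("420") * Decimal("0.0000016")
event = make_event(
    "summarize_context", "openai", "gpt-4.1-mini",
    cost_usd=cost, input_tokens=7600, output_tokens=420,
)
```

Fields left out of the call (workflow_name, timestamp, event_id, and so on) are simply omitted from the body, so the server defaults listed above apply.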

Response

201 Created for a new event, or 200 OK when the event_id already exists and the existing event is returned with idempotent: true.
{
  "event": {
    "event_id": "evt_01HQ...",
    "project_id": "prj_01HQ...",
    "timestamp": "2026-04-07T14:32:00Z",
    "workflow_name": "support_reply",
    "step_name": "summarize_context",
    "provider": "openai",
    "model": "gpt-4.1-mini",
    "input_tokens": 7600,
    "output_tokens": 420,
    "cost_usd": "0.00443",
    "latency_ms": 342.5,
    "metadata_json": {"prompt_version": "v3"},
    "error": false,
    "error_message": null,
    "created_at": "2026-04-07T14:32:01Z"
  },
  "idempotent": false
}

cURL Example

curl -X POST https://api.tokengoblin.dev/v1/events \
  -H "Authorization: Bearer tgproj_..." \
  -H "Content-Type: application/json" \
  -d '{
    "step_name": "summarize_context",
    "provider": "openai",
    "model": "gpt-4.1-mini",
    "input_tokens": 7600,
    "output_tokens": 420,
    "cost_usd": "0.00443"
  }'
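
For callers that do not use the SDK, the same request can be assembled with the Python standard library. This sketch only builds the request object and does not send it; the helper name build_event_request is hypothetical, and the key is the same placeholder used in the cURL call.

```python
import json
import urllib.request

API_URL = "https://api.tokengoblin.dev"  # base URL from this reference


def build_event_request(api_key: str, event: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/events request."""
    return urllib.request.Request(
        url=f"{API_URL}/v1/events",
        data=json.dumps(event).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_event_request(
    "tgproj_...",  # placeholder, substitute a real project key
    {
        "step_name": "summarize_context",
        "provider": "openai",
        "model": "gpt-4.1-mini",
        "input_tokens": 7600,
        "output_tokens": 420,
        "cost_usd": "0.00443",
    },
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```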

Python SDK

from tokengoblin import TokenGoblin

goblin = TokenGoblin(
    api_url="http://127.0.0.1:8000",
    api_key="tgproj_...",
    workflow_name="support_reply",
    provider="openai",
    model="gpt-4.1-mini",
)

with goblin.step("summarize_context") as step:
    step.record_usage(
        input_tokens=7600,
        output_tokens=420,
        cost_usd="0.00443",
    )

GET /v1/events

List ingested events for the authenticated project using cursor-based pagination.

Query Parameters

  • cursor - Opaque pagination cursor from a previous response.
  • limit - Number of events to return (1-1000, default 100).

Response

{
  "items": [ /* Event objects */ ],
  "next_cursor": "...",
  "limit": 100
}
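
A cursor loop over this endpoint might look like the sketch below. It takes the page fetcher as a parameter so the loop can be exercised without a network. Two things are assumptions rather than documented behavior: that next_cursor is null or absent on the last page, and the names iter_events and fake_fetch.

```python
from typing import Callable, Iterator, Optional


def iter_events(fetch_page: Callable[[Optional[str], int], dict],
                limit: int = 100) -> Iterator[dict]:
    """Yield every event by following next_cursor until it is missing or null."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    cursor = None
    while True:
        page = fetch_page(cursor, limit)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:  # assumption: a null/absent cursor marks the last page
            return


# Stand-in for an HTTP call to GET /v1/events?cursor=...&limit=...
def fake_fetch(cursor, limit):
    pages = {
        None: {"items": [{"event_id": "a"}, {"event_id": "b"}],
               "next_cursor": "c1", "limit": limit},
        "c1": {"items": [{"event_id": "c"}], "next_cursor": None, "limit": limit},
    }
    return pages[cursor]


event_ids = [event["event_id"] for event in iter_events(fake_fetch, limit=2)]
```

In real use, fetch_page would issue the GET request with the Authorization header and return the decoded JSON body.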

GET /v1/drift

Compare cost and token usage between two explicit time periods. Returns a JSON response with total deltas, per-model breakdowns, per-day aggregations, and input-vs-output token splits.

Query Parameters

  • period_a_start (required) - ISO-8601 start of the first comparison period.
  • period_a_end (required) - ISO-8601 end of the first comparison period.
  • period_b_start (required) - ISO-8601 start of the second comparison period.
  • period_b_end (required) - ISO-8601 end of the second comparison period.

cURL Example

curl -s https://api.tokengoblin.dev/v1/drift \
  -H "Authorization: Bearer tgproj_..." \
  -G --data-urlencode "period_a_start=2026-03-25T00:00:00Z" \
  --data-urlencode "period_a_end=2026-03-31T23:59:59Z" \
  --data-urlencode "period_b_start=2026-04-01T00:00:00Z" \
  --data-urlencode "period_b_end=2026-04-07T23:59:59Z"
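
The four parameters for a week-over-week comparison can be derived from a single cut-off time. This sketch follows the convention of the cURL example, where each period ends one second before the next boundary; whether the API treats end bounds as inclusive is not stated here, so that is an assumption, as is the helper name adjacent_periods.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def adjacent_periods(boundary: datetime, days: int = 7) -> dict:
    """Return drift parameters for two back-to-back windows ending at `boundary`.

    `boundary` must be timezone-aware; it is converted to UTC before formatting.
    """
    boundary = boundary.astimezone(timezone.utc)
    span = timedelta(days=days)
    second = timedelta(seconds=1)
    b_start = boundary - span
    a_start = b_start - span
    return {
        "period_a_start": a_start.strftime(ISO_FORMAT),
        "period_a_end": (b_start - second).strftime(ISO_FORMAT),
        "period_b_start": b_start.strftime(ISO_FORMAT),
        "period_b_end": (boundary - second).strftime(ISO_FORMAT),
    }


params = adjacent_periods(datetime(2026, 4, 8, tzinfo=timezone.utc))
query = urlencode(params)  # append to /v1/drift? when building the URL
```

With a boundary of 2026-04-08T00:00:00Z this reproduces the four timestamps used in the cURL example.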

Response

{
  "project_id": "...",
  "period_a_start": "2026-03-25T00:00:00Z",
  "period_a_end": "2026-03-31T23:59:59Z",
  "period_b_start": "2026-04-01T00:00:00Z",
  "period_b_end": "2026-04-07T23:59:59Z",
  "period_a_total_cost_usd": "0.006000000",
  "period_a_total_tokens": 1650,
  "period_a_event_count": 2,
  "period_b_total_cost_usd": "0.030000000",
  "period_b_total_tokens": 2200,
  "period_b_event_count": 1,
  "cost_delta_usd": "0.024000000",
  "cost_delta_percent": 400.0,
  "by_model": [
    {
      "provider": "openai",
      "model": "gpt-4",
      "period_a_cost_usd": "0.006000000",
      "period_b_cost_usd": "0.030000000",
      "cost_delta_usd": "0.024000000",
      "cost_delta_percent": 400.0
    }
  ],
  "token_breakdown": [ /* period_a, period_b input/output split */ ],
  "by_day": [ /* per-day aggregations with period labels */ ]
}
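
The delta fields in the sample are consistent with (period B - period A) / period A × 100. That formula is inferred from the numbers above rather than stated by the API, and the handling of a zero period A cost is an assumption. A small check using Decimal, since the costs arrive as strings:

```python
from decimal import Decimal


def cost_delta(period_a_cost: str, period_b_cost: str):
    """Return (delta_usd, delta_percent) for two cost strings."""
    a, b = Decimal(period_a_cost), Decimal(period_b_cost)
    delta = b - a
    # Assumption: percent is left undefined (None) when period A cost is zero.
    percent = float(delta / a * 100) if a != 0 else None
    return delta, percent


delta_usd, delta_percent = cost_delta("0.006000000", "0.030000000")
# delta_usd is Decimal("0.024000000") and delta_percent is 400.0,
# matching cost_delta_usd and cost_delta_percent in the sample response.
```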

GET /v1/report/markdown

Generate a Markdown cost drift report for all events in the authenticated project. The report groups events by workflow, step, provider, and model, splits each group's events into an earlier and a later half, and compares the average cost of the two halves. A group is flagged as a clear drift when its average cost changes by at least 25% and by at least $0.001 in absolute terms.

This is a zero-config quick-check endpoint. For explicit period comparisons, use GET /v1/drift instead.
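
The half-split comparison described for this endpoint can be sketched as follows. This is an illustration of the idea, not the service's code: the function name, the handling of odd-sized groups, the inclusive thresholds, and measuring percent change relative to the earlier half are all assumptions.

```python
from decimal import Decimal

PERCENT_THRESHOLD = Decimal("25")      # "25%+ change"
ABSOLUTE_THRESHOLD = Decimal("0.001")  # "$0.001+ absolute"


def half_split_drift(costs):
    """Compare average cost of the earlier half against the later half.

    `costs` holds one group's cost_usd values as Decimals, in timestamp order.
    """
    if len(costs) < 2:
        return None  # nothing to compare
    mid = len(costs) // 2  # assumption: an odd middle event joins the later half
    prior, current = costs[:mid], costs[mid:]
    prior_avg = sum(prior, Decimal(0)) / len(prior)
    current_avg = sum(current, Decimal(0)) / len(current)
    delta = current_avg - prior_avg
    percent = delta / prior_avg * 100 if prior_avg else None
    drift = (
        percent is not None
        and abs(percent) >= PERCENT_THRESHOLD
        and abs(delta) >= ABSOLUTE_THRESHOLD
    )
    return {"prior_avg": prior_avg, "current_avg": current_avg,
            "change_percent": percent, "drift": drift}


costs = [Decimal(c) for c in ["0.0021", "0.0021", "0.0048", "0.0048"]]
result = half_split_drift(costs)
```

Requiring both thresholds keeps very cheap steps from being flagged when a large percentage swing amounts to a negligible dollar amount.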

Example Output (rendered)

Workflow      | Step              | Provider | Model        | Events | Total Cost | Avg Cost | Prior Avg | Current Avg | Change | Drift
support_reply | summarize_context | openai   | gpt-4.1-mini | 340    | $1.6320    | $0.0048  | $0.0021   | $0.0048     | +129%  | yes
support_reply | generate_reply    | openai   | gpt-4.1-mini | 340    | $1.2240    | $0.0036  | $0.0035   | $0.0036     | +3%    | no

Rate Limits & Idempotency

Rate Limits (Private Beta)

  • POST /v1/events - 120 requests/minute per project API key, burst 240
  • GET /v1/events, GET /v1/drift, and GET /v1/report/markdown - 30 requests/minute per project API key, burst 60
  • Health endpoints (/health, /ready) are exempt
  • All unauthenticated requests are subject to an additional IP-based rate limit for abuse prevention

Rate-limited responses include a Retry-After header. The built-in limiter is in-memory and intended for single-instance deployment during private beta.
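
A client can honor Retry-After with a small wrapper such as the sketch below. It assumes rate-limited responses use HTTP status 429 and that Retry-After carries a number of seconds rather than an HTTP date; neither is confirmed above. The send callable and the name send_with_retry are hypothetical, and header lookup is case-sensitive because a plain dict stands in for real headers.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def send_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it is not rate limited or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        status, headers, _body = response
        if status != 429:  # assumption: 429 signals a rate-limited response
            return response
        if attempt < max_attempts - 1:
            # Assumption: Retry-After is in seconds; fall back to 1 second.
            sleep(float(headers.get("Retry-After", "1")))
    return response  # still rate limited after the final attempt


# Simulated server: two rate-limited replies, then success.
responses = iter([
    (429, {"Retry-After": "2"}, b""),
    (429, {"Retry-After": "2"}, b""),
    (201, {}, b'{"idempotent": false}'),
])
waits = []
status, headers, body = send_with_retry(lambda: next(responses), sleep=waits.append)
```

When retrying POST /v1/events, sending a client-generated event_id keeps a retried request from being stored twice.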

Idempotency

If you include an event_id in the request body, the API will not create a duplicate. Subsequent POSTs with the same (project_id, event_id) pair return the existing event with "idempotent": true.

Request Limits

  • Request body: 1 MiB (configurable)
  • metadata_json: 16 KiB compact JSON
  • Stored raw payload: 64 KiB compact JSON
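
These limits can be checked on the client before sending. The sketch below assumes that "compact JSON" means serialization without insignificant whitespace and that size is counted in UTF-8 bytes; the server's exact measurement (for example, how non-ASCII characters are escaped) may differ. Function names are hypothetical.

```python
import json

METADATA_LIMIT = 16 * 1024  # metadata_json: 16 KiB
BODY_LIMIT = 1024 * 1024    # request body: 1 MiB (server default, configurable)


def compact_size(value) -> int:
    """Size in bytes of `value` serialized as JSON without extra whitespace."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))


def check_event_size(event: dict) -> None:
    """Raise ValueError if the event would exceed a documented limit."""
    if compact_size(event.get("metadata_json", {})) > METADATA_LIMIT:
        raise ValueError("metadata_json exceeds 16 KiB")
    if compact_size(event) > BODY_LIMIT:
        raise ValueError("request body exceeds 1 MiB")


metadata_bytes = compact_size({"prompt_version": "v3"})
check_event_size({"step_name": "summarize_context",
                  "metadata_json": {"prompt_version": "v3"}})
```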

More Information

  • OpenAPI spec - available at /openapi.json on self-hosted instances (local/dev mode)
  • Interactive API docs - available at /docs and /redoc on self-hosted instances (local/dev mode)
  • Local development - run docker compose up and follow the included walkthrough guide
  • Contact us via the beta signup form for API keys, support, or questions