Ingest LLM cost events per workflow step. Generate drift reports that compare two time windows with per-step percent change.
All API endpoints are served from:
https://api.tokengoblin.dev
For self-hosted deployments, use your own domain or
http://127.0.0.1:8000 when running locally via Docker Compose.
All project-scoped endpoints require a project API key. Pass it in the
Authorization header:
Authorization: Bearer tgproj_01H7...
API keys are created through the admin API and are scoped to a single
project. Each key can have events:write and/or
events:read scopes. The plaintext key is shown only once
at creation time; only its hash is stored.
API keys follow the format tgproj_<12-hex-prefix>_.
To get an API key, join the beta.
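As a minimal sketch using only Python's standard library, the Authorization header described above could be attached as follows. The `build_request` helper is a hypothetical name, not part of any SDK, and nothing is sent until the request object is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE_URL = "https://api.tokengoblin.dev"  # or http://127.0.0.1:8000 when self-hosted locally


def build_request(path, api_key, body=None):
    """Build (but do not send) an authenticated request. Hypothetical helper."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(BASE_URL + path, data=data)
    # Every project-scoped endpoint expects the project API key as a bearer token.
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


req = build_request("/v1/events", "tgproj_...")  # placeholder key, as in the header example
```

A request built without a body is sent as GET; passing a body makes it a POST, which matches the two ways /v1/events is used.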
Ingest one LLM workflow step event with POST /v1/events.
{
// Required
"step_name": "summarize_context",
"provider": "openai",
"model": "gpt-4.1-mini",
// Optional (defaults shown)
"workflow_name": "support_reply", // default: "default"
"input_tokens": 7600, // default: 0
"output_tokens": 420, // default: 0
"cost_usd": "0.00443", // default: "0", as string (avoids float issues)
"timestamp": "2026-04-07T14:32:00Z", // default: server current time
"latency_ms": 342.5, // optional, milliseconds
"metadata_json": {"prompt_version":"v3"}, // max 16 KiB
"error": false, // default: false
"error_message": null, // optional
"event_id": "evt_01HQ..." // auto-generated UUID if omitted
}
step_name (required) - The specific step within the workflow (e.g. "summarize_context").
provider (required) - LLM provider name. Free-form, use consistent values.
model (required) - Model identifier (e.g. "gpt-4.1-mini", "claude-3-opus").
workflow_name - Logical grouping (e.g. "support_reply", "rag_pipeline"). Defaults to "default".
input_tokens, output_tokens - Token counts as reported by the provider. Default to 0.
cost_usd - Cost in USD as a string (avoids floating-point issues). Defaults to "0".
timestamp - ISO-8601 UTC. If omitted, server uses current time.
latency_ms - Request duration in milliseconds.
metadata_json - Arbitrary JSON object for tagging (max 16 KiB compact JSON).
event_id - Optional client-generated UUID. Duplicate events return the existing event with idempotent: true. Auto-generated if omitted.
{
"event": {
"event_id": "evt_01HQ...",
"project_id": "prj_01HQ...",
"timestamp": "2026-04-07T14:32:00Z",
"workflow_name": "support_reply",
"step_name": "summarize_context",
"provider": "openai",
"model": "gpt-4.1-mini",
"input_tokens": 7600,
"output_tokens": 420,
"cost_usd": "0.00443",
"latency_ms": 342.5,
"metadata_json": {"prompt_version": "v3"},
"error": false,
"error_message": null,
"created_at": "2026-04-07T14:32:01Z"
},
"idempotent": false
}
curl -X POST https://api.tokengoblin.dev/v1/events \
-H "Authorization: Bearer tgproj_..." \
-H "Content-Type: application/json" \
-d '{
"step_name": "summarize_context",
"provider": "openai",
"model": "gpt-4.1-mini",
"input_tokens": 7600,
"output_tokens": 420,
"cost_usd": "0.00443"
}'
from tokengoblin import TokenGoblin

goblin = TokenGoblin(
    api_url="http://127.0.0.1:8000",
    api_key="tgproj_...",
    workflow_name="support_reply",
    provider="openai",
    model="gpt-4.1-mini",
)

with goblin.step("summarize_context") as step:
    step.record_usage(
        input_tokens=7600,
        output_tokens=420,
        cost_usd="0.00443",
    )
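For clients that do not use the SDK, the request body can be assembled by hand. The sketch below is one possible way to do it with the standard library: `build_event` is a hypothetical helper, and it assumes that the 16 KiB metadata limit is measured on whitespace-free JSON encoded as UTF-8 and that any UUID string is accepted as an event_id.

```python
import json
import uuid
from decimal import Decimal

MAX_METADATA_BYTES = 16 * 1024  # 16 KiB limit on metadata_json


def build_event(step_name, provider, model, cost_usd="0", metadata=None, **optional):
    """Assemble a POST /v1/events body. Hypothetical helper, not part of the SDK."""
    event = {
        "step_name": step_name,
        "provider": provider,
        "model": model,
        # Cost travels as a string; pass a str or Decimal here, never a float.
        "cost_usd": format(Decimal(cost_usd), "f"),
    }
    if metadata is not None:
        # Assumption: "compact JSON" means no whitespace between tokens.
        compact = json.dumps(metadata, separators=(",", ":"))
        if len(compact.encode("utf-8")) > MAX_METADATA_BYTES:
            raise ValueError("metadata_json exceeds 16 KiB")
        event["metadata_json"] = metadata
    event.update(optional)  # e.g. workflow_name, input_tokens, output_tokens
    # A client-generated id makes a retry of the same body safe to resend.
    event.setdefault("event_id", str(uuid.uuid4()))
    return event


event = build_event(
    "summarize_context", "openai", "gpt-4.1-mini",
    cost_usd=Decimal("0.00443"),
    metadata={"prompt_version": "v3"},
    input_tokens=7600,
    output_tokens=420,
)
```

Checking the metadata size on the client avoids a round trip for a request that would be rejected anyway, and generating event_id up front means the same dictionary can be posted again after a timeout without creating a second event.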
List ingested events for the authenticated project with GET /v1/events, using cursor-based pagination.
cursor - Opaque pagination cursor from a previous response.
limit - Number of events to return (1-1000, default 100).
{
"items": [ /* Event objects */ ],
"next_cursor": "...",
"limit": 100
}
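Walking every page means feeding each next_cursor back into the next request. The loop below is a sketch of that pattern. `fetch_page` is a caller-supplied function (hypothetical) that performs GET /v1/events and returns the decoded JSON, and the code assumes that a missing or null next_cursor marks the last page, which the response format above does not state explicitly.

```python
def iter_events(fetch_page, limit=100):
    """Yield every event by following next_cursor until it runs out."""
    cursor = None
    while True:
        page = fetch_page(cursor, limit)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:  # assumption: no cursor means no further pages
            break


# Stand-in for the real HTTP call so the loop can be exercised offline.
_PAGES = {
    None: {"items": [{"event_id": "a"}, {"event_id": "b"}], "next_cursor": "c1", "limit": 2},
    "c1": {"items": [{"event_id": "c"}], "next_cursor": None, "limit": 2},
}


def fake_fetch(cursor, limit):
    return _PAGES[cursor]


all_ids = [e["event_id"] for e in iter_events(fake_fetch, limit=2)]
```

Because the cursor is opaque, the loop never inspects or modifies it; it only passes it back unchanged.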
Compare cost and token usage between two explicit time periods with GET /v1/drift. The JSON response contains total deltas, per-model breakdowns, per-day aggregations, and input-vs-output token splits.
period_a_start (required) - ISO-8601 start of the first comparison period.
period_a_end (required) - ISO-8601 end of the first comparison period.
period_b_start (required) - ISO-8601 start of the second comparison period.
period_b_end (required) - ISO-8601 end of the second comparison period.
curl -s https://api.tokengoblin.dev/v1/drift \
-H "Authorization: Bearer tgproj_..." \
-G --data-urlencode "period_a_start=2026-03-25T00:00:00Z" \
--data-urlencode "period_a_end=2026-03-31T23:59:59Z" \
--data-urlencode "period_b_start=2026-04-01T00:00:00Z" \
--data-urlencode "period_b_end=2026-04-07T23:59:59Z"
{
"project_id": "...",
"period_a_start": "2026-03-25T00:00:00Z",
"period_a_end": "2026-03-31T23:59:59Z",
"period_b_start": "2026-04-01T00:00:00Z",
"period_b_end": "2026-04-07T23:59:59Z",
"period_a_total_cost_usd": "0.006000000",
"period_a_total_tokens": 1650,
"period_a_event_count": 2,
"period_b_total_cost_usd": "0.030000000",
"period_b_total_tokens": 2200,
"period_b_event_count": 1,
"cost_delta_usd": "0.024000000",
"cost_delta_percent": 400.0,
"by_model": [
{
"provider": "openai",
"model": "gpt-4",
"period_a_cost_usd": "0.006000000",
"period_b_cost_usd": "0.030000000",
"cost_delta_usd": "0.024000000",
"cost_delta_percent": 400.0
}
],
"token_breakdown": [ /* period_a, period_b input/output split */ ],
"by_day": [ /* per-day aggregations with period labels */ ]
}
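The delta fields in this response are consistent with a plain relative change measured against period A. The sketch below reproduces the example numbers with `Decimal`. The formula is inferred from those values rather than taken from the service's source, and returning `None` when period A has zero cost is an assumption.

```python
from decimal import Decimal


def cost_delta(period_a_cost_usd, period_b_cost_usd):
    """Return (delta_usd, delta_percent) for two cost strings."""
    a = Decimal(period_a_cost_usd)
    b = Decimal(period_b_cost_usd)
    delta = b - a
    # Assumption: percent change is undefined when period A cost is zero.
    percent = None if a == 0 else float(delta / a * 100)
    return delta, percent


delta, percent = cost_delta("0.006000000", "0.030000000")
print(delta, percent)  # 0.024000000 400.0
```

Parsing the cost strings as `Decimal` keeps the nine decimal places of the API values intact, which a round trip through `float` would not guarantee.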
Generate a Markdown cost drift report for all events in the authenticated project with GET /v1/report/markdown. The report groups events by workflow, step, provider, and model, then splits each group's events into an earlier half and a later half and compares the average cost of the two. Clear drifts (a change of at least 25% and at least $0.001 in absolute terms) are flagged.
This is a zero-config quick-check endpoint. For explicit period comparisons,
use GET /v1/drift instead.
| Workflow | Step | Provider | Model | Events | Total Cost | Avg Cost | Prior Avg | Current Avg | Change | Drift |
|---|---|---|---|---|---|---|---|---|---|---|
| support_reply | summarize_context | openai | gpt-4.1-mini | 340 | $1.6320 | $0.0048 | $0.0021 | $0.0048 | +129% | yes |
| support_reply | generate_reply | openai | gpt-4.1-mini | 340 | $1.2240 | $0.0036 | $0.0035 | $0.0036 | +3% | no |
POST /v1/events - 120 requests/minute per project API key, burst 240
GET /v1/events, GET /v1/drift, and GET /v1/report/markdown - 30 requests/minute per project API key, burst 60
Health endpoints (/health, /ready) are exempt
Rate-limited responses include a Retry-After header.
The built-in limiter is in-memory and intended for single-instance deployment
during private beta.
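A client can respect these limits by waiting for the interval given in Retry-After before trying again. The helper below is a sketch under two assumptions that the text above does not confirm: rate-limited responses use HTTP status 429, and Retry-After carries a number of seconds. `send` is a caller-supplied function (hypothetical) that performs one request and returns `(status_code, headers, body)`.

```python
import time


def send_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until the response is no longer rate limited."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:  # assumption: 429 signals a rate-limited response
            return status, body
        if attempt < max_attempts - 1:
            # Assumption: Retry-After is expressed in seconds; fall back to 1s.
            sleep(float(headers.get("Retry-After", "1")))
    return status, body
```

Passing `sleep` as a parameter keeps the helper testable without real delays. When retrying POST /v1/events, sending the same event_id on each attempt prevents a retried request from creating a duplicate event.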
If you include an event_id in the request body, the API
will not create a duplicate. Subsequent POSTs with the same
(project_id, event_id) pair return the existing event with
"idempotent": true.
metadata_json: 16 KiB compact JSON
/openapi.json on self-hosted instances (local/dev mode)
/docs and /redoc on self-hosted instances (local/dev mode)
docker compose up and follow the included walkthrough guide