TokenCap enforces hard spend caps on every LLM call. Set a budget per agent. Block calls when the limit is hit. Get alerted in real time.
Every AI team eventually gets hit by one of these
The runaway loop
An agent calls GPT-4o 10,000 times because of a bug. $800 gone in 2 minutes.
The silent spike
A new feature ships and usage spikes overnight. You find out when the invoice arrives.
No visibility
Multiple agents running, no idea which one is eating the budget.
Monitoring mode — any language, no lock-in. Proxy mode — one line of config.
Option 1 — Monitoring: check, call, report
Check before calling
Ask TokenCap if the agent is allowed to spend.
const status = await fetch(
  `${TOKENCAP_API}/v1/status?agent_id=${AGENT_ID}`,
  { headers: { Authorization: `Bearer ${API_KEY}` } }
).then(r => r.json());
if (!status.allowed) throw new Error('Cap exceeded');
Make the LLM call
Nothing changes. Call OpenAI, Anthropic, or any provider as normal.
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages,
});
Report what was spent
Pass the raw response — TokenCap extracts tokens and calculates cost.
await fetch(`${TOKENCAP_API}/v1/events`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ agent_id: AGENT_ID, model: 'gpt-4o', response }),
});
Option 2 — Proxy: one config change
Store your provider key in TokenCap (encrypted). Change your SDK's base URL. Caps enforced automatically — no check or report calls needed.
# OpenAI
client = openai.OpenAI(
    base_url="https://api-production-0ba1.up.railway.app/proxy/openai/v1",
    api_key=TOKENCAP_KEY,
    default_headers={"X-TokenCap-Agent-Id": AGENT_ID},
)
# Call client.chat.completions.create() as normal — done
# Anthropic
client = anthropic.Anthropic(
    base_url="https://api-production-0ba1.up.railway.app/proxy/anthropic/v1",
    api_key="not-used",
    default_headers={
        "Authorization": "Bearer " + TOKENCAP_KEY,
        "X-TokenCap-Agent-Id": AGENT_ID,
    },
)
Hard enforcement
Calls are blocked before they happen — not just flagged after the invoice arrives.
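Conceptually, the pre-call check reduces to a guard that raises before the provider is ever contacted. A minimal sketch: `allowed` mirrors the `/v1/status` response shown above, while `spend_usd` and `cap_usd` are assumed field names used here for illustration only.

```python
class CapExceededError(Exception):
    """Raised when an agent's spend cap has been reached."""

def guard(status: dict) -> None:
    # `allowed` mirrors the /v1/status check shown above;
    # `spend_usd` and `cap_usd` are assumed names, for illustration only.
    if not status.get("allowed", False):
        spent, cap = status.get("spend_usd", "?"), status.get("cap_usd", "?")
        raise CapExceededError(f"Cap exceeded: ${spent} of ${cap} spent")

guard({"allowed": True})  # within budget: no-op, the call proceeds
try:
    guard({"allowed": False, "spend_usd": 10.01, "cap_usd": 10.0})
except CapExceededError:
    pass  # blocked before the provider is ever reached
```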
Transparent proxy
Point any OpenAI or Anthropic SDK at our proxy URL. Caps enforced with one line of config. Prompts never stored.
Per-agent budgets
Set different limits for different agents. Your chatbot gets $10/day. Your pipeline gets $50/month.
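The budget semantics described here can be sketched as a small in-memory ledger. This is an illustration of the behavior, not TokenCap's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class BudgetLedger:
    """Illustrative model of per-agent caps, not TokenCap internals."""
    caps_usd: dict                                  # agent_id -> budget for its window
    spent_usd: dict = field(default_factory=dict)   # agent_id -> running spend

    def allow(self, agent_id: str, cost_usd: float) -> bool:
        spent = self.spent_usd.get(agent_id, 0.0)
        if spent + cost_usd > self.caps_usd.get(agent_id, 0.0):
            return False  # blocked: would exceed this agent's cap
        self.spent_usd[agent_id] = spent + cost_usd
        return True

ledger = BudgetLedger(caps_usd={"chatbot": 10.0, "pipeline": 50.0})
assert ledger.allow("chatbot", 9.99)      # within the $10/day budget
assert not ledger.allow("chatbot", 0.02)  # would tip past $10: blocked
assert ledger.allow("pipeline", 40.0)     # independent $50/month budget
```

Each agent's spend accrues against its own cap, so a runaway chatbot cannot drain the pipeline's budget.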
Loop protection
Velocity caps catch infinite loops by rate-limiting calls per minute before they cost you.
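A calls-per-minute cap can be sketched as a sliding window over recent call timestamps. The threshold and window size below are illustrative, not TokenCap's actual parameters:

```python
from collections import deque

class VelocityCap:
    """Sketch of a calls-per-minute cap that trips on runaway loops."""
    def __init__(self, max_calls: int, window_s: float = 60.0):
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls = deque()  # timestamps of allowed calls

    def allow(self, now: float) -> bool:
        # Drop timestamps that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False  # loop suspected: too many calls this minute
        self.calls.append(now)
        return True

cap = VelocityCap(max_calls=3)
assert all(cap.allow(t) for t in (0.0, 1.0, 2.0))
assert not cap.allow(3.0)  # 4th call inside one minute: blocked
assert cap.allow(61.0)     # window has rolled: allowed again
```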
220+ models
Built-in pricing for OpenAI, Anthropic, Gemini, Mistral, Cohere, and DeepSeek. Updated weekly.
Instant alerts
Slack, webhook, or email when a cap is hit or usage passes 80% of it. Fires in real time.
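The two trigger points (the 80% warning and the cap itself) can be sketched as a pure function over spend and cap. The return labels are illustrative, not the actual alert payload:

```python
from typing import Optional

def alert_level(spent_usd: float, cap_usd: float) -> Optional[str]:
    """Illustrative thresholds: warn at 80% of cap, fire when the cap is hit."""
    if cap_usd <= 0:
        return None
    ratio = spent_usd / cap_usd
    if ratio >= 1.0:
        return "cap_hit"      # delivered via Slack, webhook, or email
    if ratio >= 0.8:
        return "approaching"  # the 80% warning
    return None

assert alert_level(7.99, 10.0) is None
assert alert_level(8.00, 10.0) == "approaching"
assert alert_level(10.0, 10.0) == "cap_hit"
```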
Full audit log
Every allowed and blocked call recorded. Filter by agent, model, or date.
Start free. Upgrade when you need more.
Growth
Enterprise
Free plan. No credit card. Works with OpenAI, Anthropic, Gemini, and any other LLM provider.
Get started for free