TokenCap enforces hard spend caps on every LLM call. Set a budget per agent. Block calls when the limit is hit. Get alerted in real time.
Every AI team eventually gets hit by one of these
The runaway loop
An agent calls GPT-4o 10,000 times because of a bug. $800 gone in 2 minutes.
The silent spike
A new feature ships and usage spikes overnight. You find out when the invoice arrives.
No visibility
Multiple agents running, no idea which one is eating the budget.
Monitoring mode — any language, no lock-in. Proxy mode — one line of config.
Option 1 — Monitoring: check, call, report
Check before calling
Ask TokenCap if the agent is allowed to spend.
const status = await fetch(
  `${TOKENCAP_API}/v1/status?agent_id=${AGENT_ID}`,
  { headers: { Authorization: `Bearer ${API_KEY}` } }
).then(r => r.json());
if (!status.allowed) throw new Error('Cap exceeded');
Make the LLM call
Nothing changes. Call OpenAI, Anthropic, or any provider as normal.
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages,
});
Report what was spent
Pass the raw response — TokenCap extracts tokens and calculates cost.
await fetch(`${TOKENCAP_API}/v1/events`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ agent_id: AGENT_ID, model: 'gpt-4o', response }),
});
Option 2 — Proxy: one config change
Store your provider key in TokenCap (encrypted). Change your SDK's base URL. Caps enforced automatically — no check or report calls needed.
# OpenAI
client = openai.OpenAI(
    base_url="https://api-production-0ba1.up.railway.app/proxy/openai/v1",
    api_key=TOKENCAP_KEY,
    default_headers={"X-TokenCap-Agent-Id": AGENT_ID},
)
# Call client.chat.completions.create() as normal — done
# Anthropic
client = anthropic.Anthropic(
    base_url="https://api-production-0ba1.up.railway.app/proxy/anthropic/v1",
    api_key="not-used",
    default_headers={
        "Authorization": "Bearer " + TOKENCAP_KEY,
        "X-TokenCap-Agent-Id": AGENT_ID,
    },
)
Hard enforcement
Calls are blocked before they happen — not just flagged after the invoice arrives.
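Conceptually, the pre-call check reduces to a guard that raises before the provider is ever contacted. A minimal sketch: `allowed` mirrors the `/v1/status` response shown above, while `spend_usd` and `cap_usd` are assumed field names used here for illustration only.

```python
class CapExceededError(Exception):
    """Raised when an agent's spend cap has been reached."""

def guard(status: dict) -> None:
    # `allowed` mirrors the /v1/status check shown above;
    # `spend_usd` and `cap_usd` are assumed names, for illustration only.
    if not status.get("allowed", False):
        spent, cap = status.get("spend_usd", "?"), status.get("cap_usd", "?")
        raise CapExceededError(f"Cap exceeded: ${spent} of ${cap} spent")

guard({"allowed": True})  # within budget: no-op, the call proceeds
try:
    guard({"allowed": False, "spend_usd": 10.01, "cap_usd": 10.0})
except CapExceededError:
    pass  # blocked before the provider is ever reached
```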
Transparent proxy
Point any OpenAI or Anthropic SDK at our proxy URL. Caps enforced with one line of config. Prompts never stored.
Per-agent budgets
Set different limits for different agents. Your chatbot gets $10/day. Your pipeline gets $50/month.
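The budget semantics described here can be sketched as a small in-memory ledger. This is an illustration of the behavior, not TokenCap's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class BudgetLedger:
    """Illustrative model of per-agent caps, not TokenCap internals."""
    caps_usd: dict                                  # agent_id -> budget for its window
    spent_usd: dict = field(default_factory=dict)   # agent_id -> running spend

    def allow(self, agent_id: str, cost_usd: float) -> bool:
        spent = self.spent_usd.get(agent_id, 0.0)
        if spent + cost_usd > self.caps_usd.get(agent_id, 0.0):
            return False  # blocked: would exceed this agent's cap
        self.spent_usd[agent_id] = spent + cost_usd
        return True

ledger = BudgetLedger(caps_usd={"chatbot": 10.0, "pipeline": 50.0})
assert ledger.allow("chatbot", 9.99)      # within the $10/day budget
assert not ledger.allow("chatbot", 0.02)  # would tip past $10: blocked
assert ledger.allow("pipeline", 40.0)     # independent $50/month budget
```

Each agent's spend accrues against its own cap, so a runaway chatbot cannot drain the pipeline's budget.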
Loop protection
Velocity caps catch infinite loops by rate-limiting calls per minute before they cost you.
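A calls-per-minute cap can be sketched as a sliding window over recent call timestamps. The threshold and window size below are illustrative, not TokenCap's actual parameters:

```python
from collections import deque

class VelocityCap:
    """Sketch of a calls-per-minute cap that trips on runaway loops."""
    def __init__(self, max_calls: int, window_s: float = 60.0):
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls = deque()  # timestamps of allowed calls

    def allow(self, now: float) -> bool:
        # Drop timestamps that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False  # loop suspected: too many calls this minute
        self.calls.append(now)
        return True

cap = VelocityCap(max_calls=3)
assert all(cap.allow(t) for t in (0.0, 1.0, 2.0))
assert not cap.allow(3.0)  # 4th call inside one minute: blocked
assert cap.allow(61.0)     # window has rolled: allowed again
```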
220+ models
Built-in pricing for OpenAI, Anthropic, Gemini, Mistral, Cohere, and DeepSeek. Updated weekly.
Instant alerts
Slack, webhook, or email when a cap is hit or usage passes 80% of it. Fires in real time.
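The two trigger points (the 80% warning and the cap itself) can be sketched as a pure function over spend and cap. The return labels are illustrative, not the actual alert payload:

```python
from typing import Optional

def alert_level(spent_usd: float, cap_usd: float) -> Optional[str]:
    """Illustrative thresholds: warn at 80% of cap, fire when the cap is hit."""
    if cap_usd <= 0:
        return None
    ratio = spent_usd / cap_usd
    if ratio >= 1.0:
        return "cap_hit"      # delivered via Slack, webhook, or email
    if ratio >= 0.8:
        return "approaching"  # the 80% warning
    return None

assert alert_level(7.99, 10.0) is None
assert alert_level(8.00, 10.0) == "approaching"
assert alert_level(10.0, 10.0) == "cap_hit"
```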
Full audit log
Every allowed and blocked call recorded. Filter by agent, model, or date.
Start free. Upgrade when you need more.
Growth
Enterprise
Free plan. No credit card. Works with OpenAI, Anthropic, Gemini, and any other LLM provider.
Get started for free