LLM Cost Control

Never get a surprise API bill again

A local proxy that enforces token budgets on every OpenAI-compatible API call. No SDK changes. No code changes. One command.

$ pip install token-budget-proxy

Architecture

One proxy, zero code changes

The proxy sits between your application and the real API. Your code doesn't need to change at all.

Your code
    ↓
http://localhost:8080 (token-budget-proxy)
    counts tokens, checks budget
    ↓ allowed          or          ↓ blocked (HTTP 429)
api.openai.com (or any compatible API)
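In practice, redirecting means pointing your client's base URL at the proxy, either in code or via the OPENAI_BASE_URL environment variable (which leaves existing code untouched). A minimal sketch with the openai Python SDK; the model name is illustrative and any model your upstream serves will do:

```python
# Minimal sketch: only base_url differs from normal usage; it targets the
# local proxy instead of api.openai.com. (Setting the OPENAI_BASE_URL
# environment variable instead achieves the same redirect with no code edits.)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1")  # the proxy, not the real API

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; use whatever your upstream supports
    messages=[{"role": "user", "content": "Summarize RFC 2324 in one line."}],
    max_tokens=512,  # counted against the proxy's budgets
)
print(resp.choices[0].message.content)
```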

Live output

What you see in the terminal

Every request is logged with token counts. Blocked requests show exactly why they were stopped.

$ tokenproxy start --max-tokens-per-request 4096 --max-tokens-per-minute 20000
[INFO] Token-budget proxy listening on http://127.0.0.1:8080
[INFO] Upstream: https://api.openai.com/v1
[INFO] Per-request limit: 4096 tokens
[INFO] Per-minute limit: 20000 tokens
[INFO] POST /v1/chat/completions — prompt=312t max_out=512t
[INFO] POST /v1/chat/completions — prompt=891t max_out=1024t
[BLOCK] /v1/chat/completions — Request would use 5200 tokens, exceeding per-request limit of 4096.
[WARN] Rate limit: 18400 tokens used in the last 60s, adding 2100 would exceed 20000/min.
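On the client side, a blocked request arrives as an ordinary HTTP 429, which the openai SDK raises as RateLimitError. A sketch, assuming the proxy includes the block reason in the error body:

```python
# Sketch: catching a request the proxy blocked with HTTP 429.
from openai import OpenAI, RateLimitError

client = OpenAI(base_url="http://localhost:8080/v1")
oversized_prompt = "lorem ipsum dolor sit amet " * 2000  # deliberately too large

try:
    client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative
        messages=[{"role": "user", "content": oversized_prompt}],
        max_tokens=4096,
    )
except RateLimitError as err:
    # e.g. "Request would use 5200 tokens, exceeding per-request limit of 4096."
    print(f"Blocked by token-budget-proxy: {err}")
```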

Budget modes

Three ways to control costs

Use one or combine all three. The strictest matching limit wins.
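All three limits can be stacked in a single command (values illustrative; each flag is described below):

$ tokenproxy start --max-tokens-per-request 4096 --max-tokens-per-minute 20000 --max-tokens-total 500000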

--max-tokens-per-request
Per-request limit
Blocks any single call that would exceed N tokens. Catches runaway prompts immediately.
--max-tokens-per-minute
Rate limiting
Rolling 60-second window. Prevents burst usage from loops or retries. (A code sketch of the idea follows this list.)
--max-tokens-total
Session budget
Hard cap for the entire proxy session. Set it to your daily budget and forget about it.
--warn-only
Warn-only mode
Logs violations but forwards requests anyway. Useful for auditing without blocking.
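The rolling window behind --max-tokens-per-minute can be pictured as a queue of (timestamp, tokens) events that expire after 60 seconds. A simplified sketch of the idea, not the proxy's actual implementation:

```python
import time
from collections import deque


class RollingTokenWindow:
    """Illustrative sliding-window token budget (not the proxy's real code)."""

    def __init__(self, max_tokens_per_minute: int, window_s: float = 60.0):
        self.limit = max_tokens_per_minute
        self.window_s = window_s
        self.events = deque()  # (monotonic timestamp, tokens) pairs

    def try_spend(self, tokens: int) -> bool:
        """Record the spend and return True, or False if it would push
        usage within the last window over the limit."""
        now = time.monotonic()
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()  # expire events outside the window
        used = sum(t for _, t in self.events)
        if used + tokens > self.limit:
            return False  # the proxy would answer HTTP 429 here
        self.events.append((now, tokens))
        return True


window = RollingTokenWindow(max_tokens_per_minute=20000)
print(window.try_spend(18400))  # True: fits in the window
print(window.try_spend(2100))   # False: 18400 + 2100 > 20000, as in the WARN line above
```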

Compatible APIs

Works with any OpenAI-compatible API

Just change the --upstream flag.

OpenAI
api.openai.com/v1
Groq
api.groq.com/openai/v1
Together AI
api.together.xyz/v1
Ollama
localhost:11434/v1
LM Studio
localhost:1234/v1
Any compatible
your URL here
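For example, to budget a local Ollama instance instead of OpenAI (limit value illustrative; --upstream takes the full base URL, as shown in the startup log above):

$ tokenproxy start --upstream http://localhost:11434/v1 --max-tokens-per-request 4096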