LLM Cost Control

Never get a surprise API bill again

A local proxy that enforces token budgets on every OpenAI-compatible API call. No SDK changes. No code changes. One command.

$ pip install token-budget-proxy

Architecture

One proxy, zero code changes

The proxy sits between your application and the real API. Your code doesn't need to change at all.

Your code
    ↓
http://localhost:8080 (token-budget-proxy)
    counts tokens, checks budget
    ↓ allowed          or          ↓ blocked (HTTP 429)
api.openai.com (or any compatible API)
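In practice, redirecting means pointing your client's base URL at the proxy, either in code or via the OPENAI_BASE_URL environment variable (which leaves existing code untouched). A minimal sketch with the openai Python SDK; the model name is illustrative and any model your upstream serves will do:

```python
# Minimal sketch: only base_url differs from normal usage; it targets the
# local proxy instead of api.openai.com. (Setting the OPENAI_BASE_URL
# environment variable instead achieves the same redirect with no code edits.)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1")  # the proxy, not the real API

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; use whatever your upstream supports
    messages=[{"role": "user", "content": "Summarize RFC 2324 in one line."}],
    max_tokens=512,  # counted against the proxy's budgets
)
print(resp.choices[0].message.content)
```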

Live output

What you see in the terminal

Every request is logged with token counts. Blocked requests show exactly why they were stopped.

$ tokenproxy start --max-tokens-per-request 4096 --max-tokens-per-minute 20000
[INFO] Token-budget proxy listening on http://127.0.0.1:8080
[INFO] Upstream: https://api.openai.com/v1
[INFO] Per-request limit: 4096 tokens
[INFO] Per-minute limit: 20000 tokens
[INFO] POST /v1/chat/completions — prompt=312t max_out=512t
[INFO] POST /v1/chat/completions — prompt=891t max_out=1024t
[BLOCK] /v1/chat/completions — Request would use 5200 tokens, exceeding per-request limit of 4096.
[WARN] Rate limit: 18400 tokens used in the last 60s, adding 2100 would exceed 20000/min.
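On the client side, a blocked request arrives as an ordinary HTTP 429, which the openai SDK raises as RateLimitError. A sketch, assuming the proxy includes the block reason in the error body:

```python
# Sketch: catching a request the proxy blocked with HTTP 429.
from openai import OpenAI, RateLimitError

client = OpenAI(base_url="http://localhost:8080/v1")
oversized_prompt = "lorem ipsum dolor sit amet " * 2000  # deliberately too large

try:
    client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative
        messages=[{"role": "user", "content": oversized_prompt}],
        max_tokens=4096,
    )
except RateLimitError as err:
    # e.g. "Request would use 5200 tokens, exceeding per-request limit of 4096."
    print(f"Blocked by token-budget-proxy: {err}")
```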

Budget modes

Three ways to control costs

Use one or combine all three. The strictest matching limit wins.
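All three limits can be stacked in a single command (values illustrative; each flag is described below):

$ tokenproxy start --max-tokens-per-request 4096 --max-tokens-per-minute 20000 --max-tokens-total 500000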

--max-tokens-per-request
Per-request limit
Blocks any single call that would exceed N tokens. Catches runaway prompts immediately.
--max-tokens-per-minute
Rate limiting
Rolling 60-second window. Prevents burst usage from loops or retries. (A code sketch of the idea follows this list.)
--max-tokens-total
Session budget
Hard cap for the entire proxy session. Set it to your daily budget and forget about it.
--warn-only
Warn-only mode
Logs violations but forwards requests anyway. Useful for auditing without blocking.
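The rolling window behind --max-tokens-per-minute can be pictured as a queue of (timestamp, tokens) events that expire after 60 seconds. A simplified sketch of the idea, not the proxy's actual implementation:

```python
import time
from collections import deque


class RollingTokenWindow:
    """Illustrative sliding-window token budget (not the proxy's real code)."""

    def __init__(self, max_tokens_per_minute: int, window_s: float = 60.0):
        self.limit = max_tokens_per_minute
        self.window_s = window_s
        self.events = deque()  # (monotonic timestamp, tokens) pairs

    def try_spend(self, tokens: int) -> bool:
        """Record the spend and return True, or False if it would push
        usage within the last window over the limit."""
        now = time.monotonic()
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()  # expire events outside the window
        used = sum(t for _, t in self.events)
        if used + tokens > self.limit:
            return False  # the proxy would answer HTTP 429 here
        self.events.append((now, tokens))
        return True


window = RollingTokenWindow(max_tokens_per_minute=20000)
print(window.try_spend(18400))  # True: fits in the window
print(window.try_spend(2100))   # False: 18400 + 2100 > 20000, as in the WARN line above
```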

Compatible APIs

Works with any OpenAI-compatible API

Just change the --upstream flag.

OpenAI
api.openai.com/v1
Groq
api.groq.com/openai/v1
Together AI
api.together.xyz/v1
Ollama
localhost:11434/v1
LM Studio
localhost:1234/v1
Any compatible
your URL here
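For example, to budget a local Ollama instance instead of OpenAI (limit value illustrative; --upstream takes the full base URL, as shown in the startup log above):

$ tokenproxy start --upstream http://localhost:11434/v1 --max-tokens-per-request 4096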