QuotaKit gives you real attribution, hard limits, and enforcement across every API call in your stack — without proxies, without infrastructure changes, without the mystery bills.
No credit card. No proxy. No infrastructure changes.
import quotakit
import openai

quotakit.init(api_key="aisc_...")
client = openai.OpenAI()

with quotakit.track("app/prod", service="openai", model="gpt-4o") as t:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize this"}],
    )
    t.result(
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens,
        success=True,
    )

Sound familiar?
$4,200 from OpenAI this month. Is it the summarizer? The search feature? That internal tool someone shipped last quarter? Good luck tracing it back through the provider dashboard. They don't break it down by feature.
One feedback loop in your pipeline. One feature that went viral overnight. One leaked API key left in a public repo. By the time the alert fires, the damage is already done and you're writing an email to your finance team.
Your engineers are digging through logs, cross-referencing timestamps, writing ad-hoc scripts to figure out why costs spiked last Tuesday. That's not what you hired them for. Every hour spent there is an hour not spent shipping.
How QuotaKit fixes it
Wrap any API call with quotakit.track(), define your quotas, and QuotaKit handles the rest — AI models, third-party APIs, anything.
Attribution down to project, feature, service, and model. See that app/search costs $800/month on gpt-4o, and app/summarizer costs $90 on claude-3-5-sonnet. Not just a single number from your provider.
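To make that kind of breakdown concrete, here is a rough sketch in plain Python. It is not QuotaKit's API: the `UsageRecord` type, its field names, and the sample costs are all invented for illustration. It only shows the idea of summing spend per path, service, and model instead of reading one provider-wide total.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical record of one tracked call (not QuotaKit's schema)."""
    path: str
    service: str
    model: str
    cost_usd: float


def spend_by_key(records):
    """Sum cost per (path, service, model) rather than one grand total."""
    totals = defaultdict(float)
    for r in records:
        totals[(r.path, r.service, r.model)] += r.cost_usd
    return dict(totals)


# Invented sample data, chosen to mirror the figures in the text above.
records = [
    UsageRecord("app/search", "openai", "gpt-4o", 500.0),
    UsageRecord("app/search", "openai", "gpt-4o", 300.0),
    UsageRecord("app/summarizer", "anthropic", "claude-3-5-sonnet", 90.0),
]
breakdown = spend_by_key(records)
for key, total in sorted(breakdown.items()):
    print(key, total)
```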
Set hard limits by path, service, and model. Block mode raises before the call is ever made — the charge never happens. Open mode lets it through and flags it. Your call.
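The difference between the two modes can be sketched in a few lines of plain Python. This is an illustration of the idea, not QuotaKit's implementation: `guarded_call`, `QuotaExceeded`, and the pre-call `estimated_cost` check are all assumptions made for the example.

```python
class QuotaExceeded(Exception):
    """Raised in block mode before the paid call is made (illustrative name)."""


def guarded_call(spent, limit, estimated_cost, call, mode="block"):
    """Run `call` if the budget allows it; otherwise block or flag.

    Returns (result, flagged). Assumes the cost can be estimated up front.
    """
    over = spent + estimated_cost > limit
    if over and mode == "block":
        # Raise first: `call` never runs, so nothing is charged.
        raise QuotaExceeded(f"call would exceed the limit of {limit}")
    # Under budget, or open mode: let the call through and report the flag.
    return call(), over


calls_made = []


def paid_call():
    """Stand-in for a billable API request."""
    calls_made.append(1)
    return "ok"


# Open mode: the call goes through but comes back flagged.
result, flagged = guarded_call(9.0, 10.0, 2.0, paid_call, mode="open")

# Block mode: the same call is refused before it runs.
try:
    guarded_call(9.0, 10.0, 2.0, paid_call, mode="block")
    blocked = False
except QuotaExceeded:
    blocked = True
```

In this sketch the stand-in request runs exactly once: the open-mode call executes and is flagged, while the block-mode call raises before `paid_call` is ever invoked.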
Not all failures cost the same. QuotaKit distinguishes successful calls, failed-but-charged calls, and free failures — so you know whether a spike is real spend or just noise.
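As a rough illustration of that three-way split, the sketch below buckets calls by outcome and sums their cost. The `Outcome` enum, the `classify` rule (a failure with non-zero cost counts as charged), and the sample data are assumptions for the example, not QuotaKit's actual logic.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    FAILED_CHARGED = "failed_charged"  # the call failed but still cost money
    FAILED_FREE = "failed_free"        # the call failed and cost nothing


def classify(success, cost_usd):
    """Map one call to an outcome bucket."""
    if success:
        return Outcome.SUCCESS
    return Outcome.FAILED_CHARGED if cost_usd > 0 else Outcome.FAILED_FREE


def summarize(calls):
    """Count calls and sum cost per outcome."""
    summary = {o: {"calls": 0, "cost_usd": 0.0} for o in Outcome}
    for success, cost in calls:
        bucket = summary[classify(success, cost)]
        bucket["calls"] += 1
        bucket["cost_usd"] += cost
    return summary


# Invented sample: (success, cost_usd) pairs.
calls = [(True, 0.5), (False, 0.25), (False, 0.0), (False, 0.0)]
summary = summarize(calls)
```

Here three of the four calls failed, but only one of those failures cost anything, which is the distinction between real spend and noise.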
Paths. Tags. Full control.
Any slash-separated string becomes a tracked path — app/search, app/v2/onboarding, infra/ml-pipeline. Add tags to organize further. Set quotas at any level — they stack down the tree automatically.
Quotas on app cascade down — search and summarizer each have their own limits too.
Cap the whole app, then cap individual paths lower. Children always answer to their parents — a child path can never exceed what its parent allows. Scope a quota to a specific service or model if you want. Every layer is enforced independently.
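One way to picture the cascade: before a call on a path is allowed, every ancestor of that path gets checked against its own quota. The sketch below is a simplified model under that assumption; the helper names, the dollar figures, and the rule that a parent's spend is the sum of everything beneath it are illustrative, not a description of QuotaKit's internals.

```python
def ancestors(path):
    """Return the path and all its parents, root first."""
    parts = path.split("/")
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


def spent_under(node, spend):
    """Total spend recorded at `node` and at every path beneath it."""
    prefix = node + "/"
    return sum(v for p, v in spend.items() if p == node or p.startswith(prefix))


def first_violation(path, cost, quotas, spend):
    """Return the first ancestor whose quota `cost` would break, else None."""
    for node in ancestors(path):
        limit = quotas.get(node)
        if limit is not None and spent_under(node, spend) + cost > limit:
            return node
    return None


# Invented quotas and spend so far, in dollars.
quotas = {"app": 1000.0, "app/search": 800.0, "app/summarizer": 100.0}
spend = {"app/search": 750.0, "app/summarizer": 90.0}

# The child's own cap trips even though the app-wide cap still has room.
child_hit = first_violation("app/search", 100.0, quotas, spend)
# A path with no quota of its own still answers to its parent.
parent_hit = first_violation("app/v2/onboarding", 200.0, quotas, spend)
# A small call that fits at every level is allowed.
allowed = first_violation("app/summarizer", 5.0, quotas, spend)
```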
Get notified before a path blows its quota, not after the bill arrives. Set thresholds at whatever granularity makes sense for your team.
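A minimal sketch of threshold alerting, assuming thresholds are expressed as fractions of a quota and an alert fires once, at the moment spend crosses each one. The function name and the default thresholds are made up for the example.

```python
def crossed_thresholds(before, after, limit, thresholds=(0.5, 0.8, 0.95)):
    """Return the thresholds crossed as spend moves from `before` to `after`.

    Each threshold is a fraction of `limit`. A threshold counts as crossed
    only if spend was below it before and is at or above it after, so it
    does not fire again on later calls.
    """
    return [t for t in thresholds if before < t * limit <= after]


# Spend on a hypothetical path moves from 70 to 85 against a quota of 100.
fired = crossed_thresholds(70.0, 85.0, 100.0)
# A later move from 85 to 90 crosses nothing new.
fired_again = crossed_thresholds(85.0, 90.0, 100.0)
```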
Built for teams shipping features on pay-as-you-go APIs — AI and otherwise.
The next surprise bill is preventable. Set it up in an afternoon — attribution and hard limits across every pay-as-you-go API you use.