Your AI layer.
Under control.

CapHound sits in front of your LLM calls and makes real-time decisions — block, route, enforce — based on rules you define. When a bug triggers thousands of requests, CapHound stops it. When free users hit expensive models, CapHound reroutes them. No code changes. Point your SDK at CapHound.

We never store your prompt content. Ever.

Works with
OpenAI · Anthropic · Google Gemini

AI costs don't drift. They spike.

By the time the invoice arrives, the damage is done.

A bug triggers a request loop. 4,000 calls fire before anyone notices.
A model change doubles cost per request — silently.
One feature drives 70% of your AI bill. No one knows which one.
Finance sees the number. Engineering can't explain it.

Enforce. Attribute. Control.

CapHound doesn't just report on your AI usage — it governs it.

Enforcement

Set hard limits per feature, team, or customer. When the limit is reached, CapHound blocks the request. Not an alert. A block.
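The enforcement idea can be sketched in a few lines of Python (hypothetical names and budget values, not the CapHound API): track spend per key and hard-block once the budget is exhausted.

```python
# Minimal sketch of budget enforcement (illustrative only).
budgets = {"feature:chat": 500.00}   # monthly limit in USD (example values)
spend = {"feature:chat": 499.00}     # spend so far this period

def check(key: str, est_cost: float) -> str:
    """Return 'allow' or 'block' before the request reaches the provider."""
    if spend.get(key, 0.0) + est_cost > budgets.get(key, float("inf")):
        return "block"                        # request never hits the provider
    spend[key] = spend.get(key, 0.0) + est_cost
    return "allow"

print(check("feature:chat", 0.60))  # → allow (spend reaches $499.60)
print(check("feature:chat", 0.60))  # → block (would exceed $500.00)
```

The key property is that the check runs before the provider call, so an exhausted budget results in a refused request rather than a notification after the fact.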

Attribution

Every request tagged by feature, team, and customer. You know exactly what caused the spike.
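Once every request carries tags, finding the spike reduces to a group-by over the request log. A minimal sketch with made-up records (the real field names are whatever CapHound emits):

```python
# Hypothetical request log: each entry tagged by feature, customer, and cost.
from collections import defaultdict

requests = [
    {"feature": "chat", "customer": "acme", "cost": 0.012},
    {"feature": "summarization", "customer": "acme", "cost": 0.031},
    {"feature": "chat", "customer": "globex", "cost": 0.009},
]

by_feature = defaultdict(float)
for r in requests:
    by_feature[r["feature"]] += r["cost"]

print(max(by_feature, key=by_feature.get))  # → summarization
```

The same fold works per team or per customer, which is what lets engineering answer finance with a specific feature name instead of a shrug.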

Routing

Define rules. CapHound routes automatically. Dev uses GPT-4o-mini. Free users get the cheaper model. No app changes.
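A routing policy like this can be pictured as an ordered rule list where the first match wins. A hypothetical sketch (illustrative shape, not CapHound's actual rule syntax):

```python
# Hypothetical routing rules: first matching rule wins;
# requests matching no rule keep their original model.
RULES = [
    {"when": {"env": "dev"},   "route_to": "gpt-4o-mini"},
    {"when": {"plan": "free"}, "route_to": "gpt-4o-mini"},
]

def resolve_model(requested: str, tags: dict) -> str:
    for rule in RULES:
        if all(tags.get(k) == v for k, v in rule["when"].items()):
            return rule["route_to"]
    return requested

print(resolve_model("gpt-4o", {"env": "dev"}))                  # → gpt-4o-mini
print(resolve_model("gpt-4o", {"env": "prod", "plan": "paid"})) # → gpt-4o
```

Because the rules live in the proxy rather than the application, changing who gets which model is a config edit, not a deploy.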

See every decision CapHound made — and why

Every request shows: original model, final model, decision taken, reason. Full audit trail. No black boxes.

Control Center
Last 30 days · Live

Total Spend: $12,847 (+8.3% vs prev period)
Requests: 1.2M (+12.1% vs prev period)
Avg Cost/Req: $0.0107 (−2.4% vs prev period)
Active Models: 6

Spend by Model: GPT-4o $5,142 · Claude 3.5 Sonnet $3,854 · GPT-4o-mini $2,056 · Gemini 1.5 Pro $1,795

Top Features by Cost: Chat $4,210 · Summarization $3,180 · Search $2,940 · Classification $2,517

How it works

01

Point your SDK at CapHound

CapHound mirrors your LLM provider's API surface exactly. No code changes. No refactoring. One config update.

02

CapHound makes decisions on every request

Budget check. Policy check. Routing rules. All evaluated inline, before the request hits your provider.

03

You see everything

Every decision logged. Every cost attributed. Every block explained. You're in control — and you can prove it.

Drop-in. No refactor.

Already calling OpenAI? Change two lines. CapHound mirrors the API surface — your existing code stays exactly the same.

main.py (Python)
from openai import OpenAI

client = OpenAI(
    api_key="caphound_live_...",
    base_url="https://api.caphound.dev/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"x-caphound-feature": "chat"},
)

Available for Python and Node.js · OpenAI-compatible · Same SDK calls, same response shape

Rules you define. Decisions CapHound makes.

Not alerts you act on later — enforcement that happens now.

Block

Hard stop when a budget limit is reached. The request never hits the provider. No alert to act on — it's already done.

Route

Automatically switch models based on feature, environment, or customer. Cheaper model in dev. Best model for paid users.

Enforce

Restrict which models are allowed across your system. Prevent teams from reaching for GPT-4 when GPT-4o-mini will do.

We never see your prompts

CapHound operates on metadata — model, cost, tags. We never store prompts, responses, or request bodies. Your data travels directly between your application and the LLM provider.

This isn't a policy — it's how the system is built.

Put your AI layer under control

Start free. Add enforcement as you grow.

Get Started Free