Your AI layer.
Under control.
CapHound sits in front of your LLM calls and makes real-time decisions — block, route, enforce — based on rules you define. When a bug triggers thousands of requests, CapHound stops it. When free users hit expensive models, CapHound reroutes them. No code changes. Point your SDK at CapHound.
We never store your prompt content. Ever.
AI costs don't drift. They spike.
By the time the invoice arrives, the damage is done.
Enforce. Attribute. Control.
CapHound doesn't just report on your AI usage — it governs it.
Enforcement
Set hard limits per feature, team, or customer. When the limit is hit, CapHound blocks the request. Not an alert. A block.
Attribution
Every request tagged by feature, team, and customer. You know exactly what caused the spike.
Routing
Define rules. CapHound routes automatically. Dev uses GPT-4o-mini. Free users get the cheaper model. No app changes.
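To make the idea concrete, here is a minimal sketch of what a routing table could look like. The rule shape and field names here are illustrative assumptions, not CapHound's actual rule syntax:

```python
# Hypothetical routing table -- shape and names are illustrative,
# not CapHound's actual rule syntax.
ROUTING_RULES = [
    {"match": {"env": "dev"}, "route_to": "gpt-4o-mini"},
    {"match": {"plan": "free"}, "route_to": "gpt-4o-mini"},
]

def resolve_model(requested_model: str, tags: dict) -> str:
    """Return the model a request should actually use, given its tags."""
    for rule in ROUTING_RULES:
        # A rule matches when every tag it names has the required value.
        if all(tags.get(k) == v for k, v in rule["match"].items()):
            return rule["route_to"]
    return requested_model  # no rule matched: keep the requested model
```

With rules like these, a dev-environment request for gpt-4o resolves to gpt-4o-mini, while a paid user's request passes through unchanged.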
See every decision CapHound made — and why
Every request shows: original model, final model, decision taken, reason. Full audit trail. No black boxes.
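An audit record along those lines might look like the following. The field names are assumptions for illustration, not CapHound's actual log schema:

```python
# Illustrative decision record -- field names are assumptions,
# not CapHound's actual log schema.
decision_record = {
    "original_model": "gpt-4o",
    "final_model": "gpt-4o-mini",
    "decision": "route",
    "reason": "rule matched: free-tier users use gpt-4o-mini",
    "feature": "chat",
    "estimated_cost_usd": 0.0004,
}
```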
How it works
Point your SDK at CapHound
CapHound mirrors your LLM provider's API surface exactly. No code changes. No refactoring. One config update.
CapHound makes decisions on every request
Budget check. Policy check. Routing rules. All evaluated inline, before the request hits your provider.
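The inline evaluation order can be sketched roughly as below. Everything here, the names, the return shape, the in-memory budget table, is an illustrative assumption, not CapHound's implementation (routing is omitted for brevity):

```python
# Minimal sketch of the inline decision order: budget check, then
# policy check, before anything reaches the provider. All names and
# data here are illustrative assumptions, not CapHound internals.
BUDGET_REMAINING = {"chat": 0.0, "search": 12.50}  # dollars left per feature
ALLOWED_MODELS = {"gpt-4o", "gpt-4o-mini"}

def decide(model: str, feature: str) -> dict:
    # 1. Budget check: a spent budget means a hard block.
    if BUDGET_REMAINING.get(feature, 0.0) <= 0:
        return {"decision": "block", "reason": f"budget exhausted for {feature!r}"}
    # 2. Policy check: only explicitly allowed models go through.
    if model not in ALLOWED_MODELS:
        return {"decision": "block", "reason": f"model {model!r} not allowed"}
    # 3. Otherwise the request proceeds to the provider.
    return {"decision": "allow", "final_model": model}
```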
You see everything
Every decision logged. Every cost attributed. Every block explained. You're in control — and you can prove it.
Drop-in. No refactor.
Already calling OpenAI? Change two lines. CapHound mirrors the API surface — your existing code stays exactly the same.
from openai import OpenAI

client = OpenAI(
    api_key="warden_live_...",
    base_url="https://api.wardenai.dev/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"x-warden-feature": "chat"},
)
Available for Python and Node.js · OpenAI-compatible · Same SDK calls, same response shape
Rules you define. Decisions CapHound makes.
Not alerts you act on later — enforcement that happens now.
Block
Hard stop when a budget limit is reached. The request never hits the provider. No alert to act on — it's already done.
Route
Automatically switch models based on feature, environment, or customer. Cheaper model in dev. Best model for paid users.
Enforce
Restrict which models are allowed across your system. Prevent teams from reaching for GPT-4 when GPT-4o-mini will do.
We never see your prompts
CapHound operates on metadata — model, cost, tags. We never store prompts, responses, or request bodies. Your data travels directly between your application and the LLM provider.
This isn't a policy — it's how the system is built.