Gateway check
Block bad calls before they execute.
The Audit Vault is the after-the-fact record. The gateway-check endpoint is the before-the-fact decision. Same engine, same policies, same detectors -- flipped from observe to enforce.
How it works
Call POST /api/v1/gateway/check with the same event shape you'd send to /api/v1/audit/ingest, but BEFORE you make the model call. The response includes an explicit action field:
- allow: no enabled policy matched. Proceed.
- alert: a policy matched but its action is alert, not block. Proceed and let the receipt and alert flow handle it.
- block: at least one enabled policy returned action=block. Do not proceed. Return the findings to the upstream caller.
Allowed and blocked decisions are both persisted to your Audit Vault, so you have a complete trail of what was attempted, not just what executed.
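A minimal sketch of how a caller might branch on the action field. The helper name should_proceed is hypothetical and not part of any Prova SDK; only the three action values come from the description above.

```python
def should_proceed(check: dict) -> bool:
    """Return True when the model call may go ahead.

    `check` is the parsed JSON body of a gateway-check response.
    """
    action = check.get("action")
    if action in ("allow", "alert"):
        # allow: nothing matched. alert: a policy matched, but it only alerts.
        return True
    if action == "block":
        # At least one enabled policy returned block: do not call the model.
        return False
    # Missing or unknown action: surface it instead of guessing.
    raise ValueError(f"unexpected gateway action: {action!r}")
```

Raising on an unrecognized action is a design choice in this sketch; a caller could equally map it to its own fail-open or fail-closed default.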
Drop-in pattern
import os

import openai
import requests

class PolicyBlocked(Exception):
    """Raised when the gateway returns action == "block" (defined here so the example runs)."""

def safe_chat_completion(messages, model="gpt-4o"):
    # Ask the gateway for a decision BEFORE making the model call.
    check = requests.post(
        "https://prova.cobound.dev/api/v1/gateway/check",
        headers={"Authorization": f"Bearer {os.environ['PROVA_API_KEY']}"},
        json={
            "kind": "model_call",
            "model": {"provider": "openai", "name": model},
            "source": {"app_id": "support-bot", "environment": "production"},
            "payload": {"messages": messages},
        },
        timeout=2.0,
    ).json()
    if check["action"] == "block":
        # Hand the matched findings to the upstream caller.
        raise PolicyBlocked(check["findings"])
    return openai.chat.completions.create(model=model, messages=messages)

Total added latency for the allow path is ~80ms (one round trip to Prova plus the synchronous policy and detector pass). Blocking responses have the same shape with action: "block" plus the matched findings. You decide how to surface them to your upstream caller.
Or enforce inline with no per-call code (the proxy)
The drop-in above wires the check by hand. The proxy does it for you: point your OpenAI or Anthropic client base URL at Prova and every call is screened before it reaches the vendor, with a signed receipt recorded automatically. No per-call code. The SDK makes it one line:
import OpenAI from 'openai'
import { routeThroughGateway } from '@cobound/prova-sdk'
// Route every call through the gateway in guarantee (fail-closed) mode.
const openai = routeThroughGateway(new OpenAI(), { policy: 'guarantee' })
// Your vendor key still rides in Authorization and is forwarded to the vendor;
// the Prova key rides X-Prova-Auth. A blocking verdict returns HTTP 422 before
// the call is made (works for streaming too: the decision is pre-forward).
Three enforcement tiers, set per request via X-Prova-Policy:
- observe (default): forward every call and record a signed receipt. Never blocks.
- enforce: a block verdict on the request returns 422 before the vendor is called. Out of the box a block verdict means a secret in the prompt, a crossed budget cap, a boundary violation, or an unauthorized agent capability. PII and prompt injection are detected and recorded by default (action alert): raise pii_in_prompt or prompt_injection_pattern to block on the policy dashboard to enforce them here too. Fail-open: if Prova is unreachable the call still proceeds, so enforcement never takes your inference down.
- guarantee: enforce, plus fail-closed. If the gate cannot be reached, the call is blocked (HTTP 503) rather than allowed through unscreened. This is the tier that makes "a disallowed call is blocked before it runs" hold even during a Prova outage, at the cost of availability. Use it on the regulated paths where a missed screen is worse than a failed call.
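The tier semantics can be read as a small decision table. The sketch below is only a model of the behavior described above, not the proxy's implementation: the function name proxy_outcome and its return labels are hypothetical, and treating observe as "forward" when Prova is unreachable is an assumption drawn from "never blocks".

```python
from typing import Optional

TIERS = ("observe", "enforce", "guarantee")


def proxy_outcome(tier: str, verdict: Optional[str]) -> str:
    """Model the proxy's response for a tier and a request verdict.

    verdict is "allow", "alert", or "block"; None means Prova was unreachable.
    Returns "forward" (call reaches the vendor), "http_422", or "http_503".
    """
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    if verdict is None:
        # Only guarantee is fail-closed; enforce is fail-open.
        # Assumption: observe also forwards, since it never blocks.
        return "http_503" if tier == "guarantee" else "forward"
    if verdict == "block" and tier in ("enforce", "guarantee"):
        # Blocked before the vendor is called.
        return "http_422"
    # allow and alert always proceed; observe proceeds even on block.
    return "forward"
```

For example, proxy_outcome("enforce", None) yields "forward" while proxy_outcome("guarantee", None) yields "http_503", which is the availability trade-off the guarantee tier makes.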
Every checked call resolves to one verdict, returned in the X-Prova-Action response header: allow, alert, or block. Each gateway-routed executed call carries a signed _prova_gated marker recording the tier it passed, so your gateway coverage (the share of traffic carrying the guarantee) is a measured number an auditor can verify.
Enforcement screens the request. Output-side checks (PII in the response, groundedness) and per-call cost land on the post-execution receipt; a single call's marginal output cost cannot be enforced before it runs.
What gets evaluated
The gateway-check endpoint runs the same engine as ingest:
- All enabled policies for your org (built-in + customer-authored).
- All enabled inline detectors (prompt injection, PII leak).
- The highest action across matches wins: block > alert > allow.
External-API detectors (coordination loops) do not run on the gateway path because they require multi-step history. Use them on the post-execution ingest surface instead.
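The precedence rule (block > alert > allow) can be sketched as a ranked maximum over the actions of every matched policy and detector. The names ACTION_RANK and resolve_action are illustrative, not taken from the Prova engine.

```python
# Higher rank wins: block > alert > allow.
ACTION_RANK = {"allow": 0, "alert": 1, "block": 2}


def resolve_action(matched_actions):
    """Return the highest-ranked action among the matches.

    An empty list means no enabled policy or detector matched, which is "allow".
    """
    return max(matched_actions, key=ACTION_RANK.__getitem__, default="allow")
```

With this rule a single block match outweighs any number of alert matches, so resolve_action(["alert", "block", "alert"]) evaluates to "block".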
Response shape
{
  "mode": "authenticated",
  "action": "block",
  "blocked": true,
  "event_id": "9f4a2e71d03bcafe...",
  "findings": [
    {
      "detector": "policy:secret_in_prompt",
      "verdict": "policy_violation",
      "severity": "critical",
      "summary": "Secret credential detected in event payload.",
      "details": { "pattern": "\\bsk-[A-Za-z0-9]{20,}\\b" },
      "remediation": "Rotate the exposed credential immediately..."
    }
  ],
  "receipt": { /* full signed AIDecisionEvent */ }
}
Tuning what blocks vs alerts
Each policy ships with a default action. Override it per-org from the policy dashboard. Common patterns:
- Run in alert-only mode for the first week to baseline. Find out what your system would actually have blocked, without breaking traffic.
- Promote to block per policy once you trust it (data-protection ones tend to graduate fastest; safety ones can stay alert-mode in some workloads).
- Run a staging environment with all policies in block mode and production in alert mode. Diff the receipts to find regressions before they ship.
Latency budget + failure mode
The gateway-check endpoint runs entirely synchronously inside your request path. Typical latency:
- Network: ~30-60ms round-trip to Vercel edge.
- Policy + detector evaluation: < 5ms for the full library against a typical payload.
- Receipt signing: < 1ms (Ed25519).
- Audit persistence: best-effort, async-tolerant.
Decide upfront how to handle Prova being unavailable: fail-open (proceed with the model call) or fail-closed (refuse). Most teams start fail-open in dev and fail-closed in regulated production paths. Either way, set a tight timeout (~2 seconds) on your client.
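One way to make the fail-open versus fail-closed choice explicit is to wrap the gateway call in a helper that turns a transport failure into a synthetic decision. This is a sketch under stated assumptions: check_with_fallback, the send callable, and the gateway_unavailable key are all hypothetical. In practice send would wrap the POST to /api/v1/gateway/check with the ~2 second timeout and return the parsed JSON.

```python
from typing import Callable


def check_with_fallback(
    send: Callable[[dict], dict], event: dict, fail_closed: bool
) -> dict:
    """Call the gateway via `send`; fall back to a local decision on failure.

    OSError covers timeouts and connection errors (requests' exceptions
    subclass it); ValueError covers a response body that is not valid JSON.
    """
    try:
        return send(event)
    except (OSError, ValueError):
        # fail-closed: refuse the call. fail-open: let it proceed unscreened.
        action = "block" if fail_closed else "allow"
        # gateway_unavailable is a hypothetical marker for the caller's logs.
        return {"action": action, "findings": [], "gateway_unavailable": True}
```

Keeping the policy in one flag means a regulated path can pass fail_closed=True while a dev path passes False, without duplicating the error handling.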