Detectors
Algorithmic checks on every receipt.
Detectors are first-party algorithms that fire on specific failure modes: coordination loops in multi-agent systems, prompt-injection attempts, regulated data leaking into model outputs, bias drift over time, and hallucinations not supported by retrieved context. Enabled detectors run automatically on every AI decision ingested into your Audit Vault, though the hallucination and bias-drift detectors only evaluate a sample of qualifying events.
Detectors vs. policies
Both detectors and policies emit findings on receipts. The difference is who owns the logic:
- Policies are declarative rules that you tune ("alert when step count exceeds 50"). Author them in the policy editor and adjust thresholds without redeploying anything.
- Detectors are first-party algorithms, whether pattern-based, statistical, or ML-based. Their implementation is opaque; you only toggle each one on or off.
In practice: if a check can be expressed in the policy JSON DSL, write a policy. If it needs counting across events, model inference, or a learned threshold, request a detector.
Execution modes
Each detector is assigned one of three execution modes:
inline
Runs in the Next.js ingest handler before the receipt is signed. Inline detectors can be synchronous (pattern matching, about 1 ms) or asynchronous (a DB query or an external model call). The evaluator enforces a 1.5 s timeout on each detector individually, so one slow detector cannot stall ingest.
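A minimal TypeScript sketch of a per-detector timeout, assuming detectors are plain functions that return either a value (sync) or a Promise (async). The `Detector`, `Outcome`, and `evaluateInline` names are hypothetical, and running detectors concurrently with `Promise.all` is an assumption: the docs do not say whether Prova's evaluator runs them concurrently or what it records when a detector times out.

```typescript
// Hypothetical shapes for illustration; Prova's real evaluator types are not documented.
type IngestEvent = { input: string; output: string };
type DetectorMatch = { verdict: string; severity: "low" | "medium" | "high"; summary: string };
type Detector = {
  name: string;
  // Sync detectors return a value directly; async ones return a Promise.
  run: (event: IngestEvent) => DetectorMatch | null | Promise<DetectorMatch | null>;
};
type Outcome =
  | { name: string; status: "ok"; match: DetectorMatch | null }
  | { name: string; status: "timeout" }
  | { name: string; status: "error"; message: string };

const DETECTOR_TIMEOUT_MS = 1500; // the documented 1.5 s per-detector limit

// Race one detector against its own timer.
async function runWithTimeout(d: Detector, event: IngestEvent, ms: number): Promise<Outcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Outcome>((resolve) => {
    timer = setTimeout(() => resolve({ name: d.name, status: "timeout" }), ms);
  });
  // Promise.resolve().then(...) turns a synchronous throw into a rejection.
  const work = Promise.resolve()
    .then(() => d.run(event))
    .then(
      (match): Outcome => ({ name: d.name, status: "ok", match }),
      (err): Outcome => ({ name: d.name, status: "error", message: String(err) }),
    );
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // stop the timer if the detector finished first
  }
}

// Assumption: all detectors run concurrently, each with an independent timeout.
function evaluateInline(detectors: Detector[], event: IngestEvent, ms = DETECTOR_TIMEOUT_MS): Promise<Outcome[]> {
  return Promise.all(detectors.map((d) => runWithTimeout(d, event, ms)));
}
```

Giving each detector its own timer, rather than one timer for the whole batch, is what lets the fast detectors keep their results when a slow one is cut off.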
external_api
Reserved for future detectors that genuinely need a separate runtime, such as heavy ML inference on GPUs. No detector uses this mode today; all five first-party detectors run inline.
preview
Registered but not yet implemented. The toggle is visible in the dashboard so you can pre-configure your environment, but the algorithm itself has not shipped.
The detector catalog
Five detectors ship with Prova today. Enable or disable each one in the detector dashboard.
Safety
Pattern-based detector for common prompt-injection phrasing in event inputs. Scores each input against 11 known attack templates and fires when the score is 3 or higher.
prompt_injection
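A minimal sketch of weighted template scoring against the documented threshold of 3. The labels `override_prior` and `exfil_system_prompt` come from the example finding on this page; the `role_reassignment` label, all three regexes, and their weights are invented for illustration (Prova's 11 templates are not published). Summing the weights of matched templates is also an assumption; the real score need not be a plain sum.

```typescript
// Hypothetical templates; the real detector's patterns and weights are not public.
type Template = { label: string; pattern: RegExp; weight: number };

const TEMPLATES: Template[] = [
  {
    label: "override_prior",
    pattern: /\b(ignore|disregard|forget)\b.{0,40}\b(previous|prior|above)\b.{0,20}\b(instructions|rules|prompts?)\b/i,
    weight: 4,
  },
  { label: "exfil_system_prompt", pattern: /\b(reveal|print|show|repeat)\b.{0,40}\bsystem prompt\b/i, weight: 5 },
  { label: "role_reassignment", pattern: /\byou are now\b|\bact as\b/i, weight: 2 },
];

const FIRE_THRESHOLD = 3; // documented: fires at score >= 3

type InjectionResult = { fired: boolean; score: number; matched: { label: string; weight: number }[] };

function scorePromptInjection(input: string, templates: Template[] = TEMPLATES): InjectionResult {
  // No /g flag on the regexes, so .test() carries no lastIndex state between calls.
  const matched = templates
    .filter((t) => t.pattern.test(input))
    .map(({ label, weight }) => ({ label, weight }));
  // Assumption: the score is the sum of matched template weights.
  const score = matched.reduce((sum, m) => sum + m.weight, 0);
  return { fired: score >= FIRE_THRESHOLD, score, matched };
}
```

With these weights, a lone weak signal such as "act as" scores 2 and stays below the threshold, while any single strong template fires on its own.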
Data protection
Scans model outputs for regulated data patterns: Social Security numbers, credit card numbers, passport numbers, medical record numbers, ICD-10 codes, and email addresses. Complements the input-side PII policy.
pii_leak
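A minimal sketch of output-side scanning for three of the six categories (SSN, credit card, email). The regexes and the Luhn checksum filter are illustrative assumptions, not Prova's actual patterns; passport, medical record number, and ICD-10 matching are omitted for brevity.

```typescript
type PiiHit = { kind: "ssn" | "credit_card" | "email"; value: string };

// Luhn checksum: rejects most digit runs that are not real card numbers.
function luhnValid(digits: string): boolean {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

// SSN: area 000, 666 and 900-999, group 00, and serial 0000 are never issued, so they are excluded.
const SSN = /\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b/g;
// Card: 13 to 19 digits, optionally separated by single spaces or hyphens.
const CARD = /\b(?:\d[ -]?){12,18}\d\b/g;
const EMAIL = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g;

function scanOutput(text: string): PiiHit[] {
  const hits: PiiHit[] = [];
  for (const m of text.matchAll(SSN)) hits.push({ kind: "ssn", value: m[0] });
  for (const m of text.matchAll(CARD)) {
    // Strip separators before checksumming; keep the original text in the hit.
    if (luhnValid(m[0].replace(/[ -]/g, ""))) hits.push({ kind: "credit_card", value: m[0] });
  }
  for (const m of text.matchAll(EMAIL)) hits.push({ kind: "email", value: m[0] });
  return hits;
}
```

The Luhn check is what keeps order numbers and similar long digit runs from being reported as card leaks.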
Integrity
For RAG systems: flags model claims that the retrieved context attached to the event does not support. Uses Claude Haiku as an inline entailment classifier (requires ANTHROPIC_API_KEY). Runs on a 20% sample of qualifying events, at roughly $0.0001 per detection.
hallucination
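A sketch of sampled, claim-by-claim entailment checking. The classifier is injected as a plain async function so the example runs without network access; in Prova it is a Claude Haiku call whose prompt and response format are not documented. The `RagEvent` shape, hash-based sampling on a hypothetical event `id`, and the naive sentence splitter used to extract claims are all assumptions.

```typescript
type Verdict = "supported" | "unsupported";
// Stand-in for the Claude Haiku entailment call (hypothetical signature).
type EntailmentClassifier = (claim: string, context: string) => Promise<Verdict>;
type RagEvent = { id: string; output: string; retrievedContext?: string };

const SAMPLE_RATE = 0.2; // documented 20% sample

// FNV-1a 32-bit hash of the event id, mapped to [0, 1).
function sampleBucket(id: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < id.length; i++) {
    h ^= id.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) / 2 ** 32;
}

function shouldSample(id: string, rate = SAMPLE_RATE): boolean {
  return sampleBucket(id) < rate;
}

// Naive claim extraction: split the output into sentences.
function splitClaims(text: string): string[] {
  return text.split(/(?<=[.!?])\s+/).map((s) => s.trim()).filter((s) => s.length > 0);
}

// Returns unsupported claims, or null if the event was not checked.
async function detectHallucination(
  event: RagEvent,
  classify: EntailmentClassifier,
  rate = SAMPLE_RATE,
): Promise<string[] | null> {
  const context = event.retrievedContext;
  // Only events with retrieved context qualify; of those, only a sample is checked.
  if (!context || !shouldSample(event.id, rate)) return null;
  const unsupported: string[] = [];
  for (const claim of splitClaims(event.output)) {
    if ((await classify(claim, context)) === "unsupported") unsupported.push(claim);
  }
  return unsupported;
}
```

Hashing the event id instead of calling `Math.random()` makes the sampling decision reproducible, so re-evaluating the same event gives the same answer. Whether Prova samples this way is not stated.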
Fairness
Statistical detector that flags when the approval rate for a labeled `subject.group` in the current 24-hour window diverges by more than 20 percentage points from the prior 30-day baseline. Requires a group label and a binary outcome signal (decision, outcome, or verdict). Runs on a 10% sample of qualifying events to limit database load.
bias_drift
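A minimal sketch of the drift comparison, assuming each decision is stored with its group label, a boolean approval, and a millisecond timestamp, and that the 30-day baseline ends where the current 24-hour window begins. The `Decision` shape and function names are hypothetical; the 10% sampling and any minimum sample size are left out.

```typescript
// Hypothetical record shape; Prova's stored fields are not documented.
type Decision = { group: string; approved: boolean; timestamp: number }; // ms since epoch

const DAY_MS = 24 * 60 * 60 * 1000;
const THRESHOLD_PP = 20; // documented: more than 20 percentage points

function approvalRate(ds: Decision[]): number | null {
  if (ds.length === 0) return null;
  return ds.filter((d) => d.approved).length / ds.length;
}

type DriftResult = { group: string; currentRate: number; baselineRate: number; deltaPp: number; drifted: boolean };

function checkBiasDrift(history: Decision[], group: string, now: number): DriftResult | null {
  const inGroup = history.filter((d) => d.group === group);
  // Current window: (now - 24h, now].
  const current = inGroup.filter((d) => d.timestamp > now - DAY_MS && d.timestamp <= now);
  // Baseline: the 30 days before the current window (assumed not to overlap it).
  const baseline = inGroup.filter((d) => d.timestamp > now - 31 * DAY_MS && d.timestamp <= now - DAY_MS);
  const currentRate = approvalRate(current);
  const baselineRate = approvalRate(baseline);
  if (currentRate === null || baselineRate === null) return null; // not enough data to compare
  const deltaPp = (currentRate - baselineRate) * 100;
  return { group, currentRate, baselineRate, deltaPp, drifted: Math.abs(deltaPp) > THRESHOLD_PP };
}
```

For example, a group approved 80% of the time over the baseline and 50% in the last 24 hours has moved 30 points and is flagged; a move to 70% (10 points) is not.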
Coordination
Detects persistent cycles in multi-agent execution: it builds the agent communication graph from a trace, finds strongly connected components (SCCs) with Tarjan's algorithm, and checks whether a cycle persists across many steps. This catches the case where N agents keep passing work among themselves without making progress.
coordination_loop
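A sketch of the cycle search: build a directed graph from the handoffs in a trace, find strongly connected components with Tarjan's algorithm, and keep components whose internal handoffs span at least `minSteps` trace steps. The `Handoff` shape and the step-span persistence rule are assumptions (the docs say only "across many steps"), and the sketch does not check whether any progress is made inside the cycle.

```typescript
// Hypothetical trace record: at `step`, agent `from` hands work to agent `to`.
type Handoff = { step: number; from: string; to: string };

// Tarjan's algorithm: returns the strongly connected components of a directed graph.
function tarjanScc(nodes: string[], edges: Map<string, string[]>): string[][] {
  let index = 0;
  const idx = new Map<string, number>();
  const low = new Map<string, number>();
  const onStack = new Set<string>();
  const stack: string[] = [];
  const sccs: string[][] = [];

  function strongConnect(v: string): void {
    idx.set(v, index);
    low.set(v, index);
    index++;
    stack.push(v);
    onStack.add(v);
    for (const w of edges.get(v) ?? []) {
      if (!idx.has(w)) {
        strongConnect(w);
        low.set(v, Math.min(low.get(v)!, low.get(w)!));
      } else if (onStack.has(w)) {
        low.set(v, Math.min(low.get(v)!, idx.get(w)!));
      }
    }
    // v is the root of an SCC: pop its members off the stack.
    if (low.get(v) === idx.get(v)) {
      const comp: string[] = [];
      let w: string;
      do {
        w = stack.pop()!;
        onStack.delete(w);
        comp.push(w);
      } while (w !== v);
      sccs.push(comp);
    }
  }

  for (const v of nodes) if (!idx.has(v)) strongConnect(v);
  return sccs;
}

function findCoordinationLoops(trace: Handoff[], minSteps: number): string[][] {
  const edges = new Map<string, string[]>();
  const nodes = new Set<string>();
  for (const h of trace) {
    nodes.add(h.from);
    nodes.add(h.to);
    const out = edges.get(h.from) ?? [];
    if (!out.includes(h.to)) out.push(h.to);
    edges.set(h.from, out);
  }
  const loops: string[][] = [];
  for (const comp of tarjanScc([...nodes], edges)) {
    const members = new Set(comp);
    const internal = trace.filter((h) => members.has(h.from) && members.has(h.to));
    // An SCC contains a cycle exactly when it has an internal edge (2+ agents, or a self-loop).
    if (internal.length === 0) continue;
    const steps = internal.map((h) => h.step);
    const span = Math.max(...steps) - Math.min(...steps) + 1;
    if (span >= minSteps) loops.push(comp.sort());
  }
  return loops;
}
```

Running SCC detection first keeps the persistence check cheap: only agents that can actually reach each other again are examined, and one-way pipelines drop out immediately.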
What a detector finding looks like
Detector matches attach to the receipt as findings. The `detector` field is prefixed with `detector:` (policy findings use `policy:`), so downstream consumers can group the two separately.
{
  "detector": "detector:prompt_injection",
  "verdict": "prompt_injection",
  "severity": "high",
  "summary": "Prompt-injection patterns detected (score 8, confidence high).",
  "details": {
    "score": 8,
    "confidence": "high",
    "matched_patterns": [
      { "label": "override_prior", "weight": 4 },
      { "label": "exfil_system_prompt", "weight": 5 }
    ]
  },
  "remediation": "Sanitize untrusted input before placing it in a system prompt..."
}
Custom detectors (preview)
If you've built an internal classifier (a credit-decision fairness model, a medical-coding accuracy check, a domain-specific safety filter) and want it to run on every event with a tamper-evident receipt, the detector SDK is in preview. It exposes the same `(event) => DetectorMatch` shape as the first-party detectors. Email founders@prova.cobound.dev if you want early access.
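A sketch of what a custom detector with the `(event) => DetectorMatch` shape might look like, plus the prefix-based grouping of findings described on this page. Every type and name here (`AiEvent`, the `DetectorMatch` fields, `guaranteedReturnsCheck`, `toFinding`, `groupBySource`) is hypothetical and modeled on the example finding; the preview SDK's actual types may differ.

```typescript
// Hypothetical types modeled on the example finding; not the SDK's real definitions.
type Severity = "low" | "medium" | "high";
type DetectorMatch = {
  verdict: string;
  severity: Severity;
  summary: string;
  details?: Record<string, unknown>;
  remediation?: string;
};
type AiEvent = { input: string; output: string };
type Finding = DetectorMatch & { detector: string };

// A made-up domain safety filter: flag outputs that promise guaranteed returns.
function guaranteedReturnsCheck(event: AiEvent): DetectorMatch | null {
  const m = event.output.match(/\bguaranteed\s+(?:returns?|profits?)\b/i);
  if (!m) return null; // no match, no finding
  return {
    verdict: "guaranteed_returns_claim",
    severity: "medium",
    summary: `Output promises "${m[0]}".`,
    remediation: "Remove performance guarantees from investment guidance.",
  };
}

// Prefix the detector name the same way first-party findings are prefixed.
function toFinding(name: string, match: DetectorMatch): Finding {
  return { detector: `detector:${name}`, ...match };
}

// Split a receipt's findings by source using the `detector:` / `policy:` prefix.
function groupBySource(findings: Finding[]): { detector: Finding[]; policy: Finding[]; other: Finding[] } {
  const groups = { detector: [] as Finding[], policy: [] as Finding[], other: [] as Finding[] };
  for (const f of findings) {
    if (f.detector.startsWith("detector:")) groups.detector.push(f);
    else if (f.detector.startsWith("policy:")) groups.policy.push(f);
    else groups.other.push(f);
  }
  return groups;
}
```

Keeping the detector a pure function of the event makes it easy to unit-test in isolation before wiring it into ingest.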