Detectors

Algorithmic checks on every receipt.

Detectors are the first-party algorithms that fire on specific failure modes: coordination loops in agent systems, prompt-injection attempts, regulated data leaking into model outputs, bias drift over time, and hallucination against retrieved context. They run automatically on every AI decision ingested into your Audit Vault.

Detectors vs. policies

Both detectors and policies emit findings on receipts. The difference is who owns the logic:

  • Policies are declarative rules that you tune ("alert when step count exceeds 50"). Author them in the policy editor and adjust thresholds without redeploying anything.
  • Detectors are first-party algorithms. Whether a detector is pattern-based, statistical, or ML-based, its implementation is opaque to you: you only toggle it on or off.

Practically: if it can be expressed in the policy JSON DSL, write a policy. If it needs counting across events, model inference, or a learned threshold, request a detector.

Execution modes

Each detector runs in one of three modes:

inline

Runs in the Next.js ingest handler before the receipt is signed. Detectors can be sync (pattern matching, ~1ms) or async (DB query or external model call). The evaluator enforces a separate 1.5s timeout on each detector, so a slow detector cannot stall ingest.
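
To make the per-detector timeout concrete, here is a minimal sketch of how an evaluator could race each detector against a timer. The names `DetectorEvent`, `DetectorMatch`, `runWithTimeout`, and `evaluateAll` are hypothetical, and treating a timed-out or failing detector as "no finding" is an assumption, not documented Prova behavior.

```typescript
// Hypothetical shapes for illustration; the real evaluator's types are not documented here.
type DetectorEvent = { input: string };
type DetectorMatch = { detector: string; severity: string; summary: string };
type Detector = (event: DetectorEvent) => DetectorMatch | null | Promise<DetectorMatch | null>;

const DETECTOR_TIMEOUT_MS = 1500; // the 1.5s per-detector budget

// Run one detector; resolve to null if it throws or exceeds its budget.
async function runWithTimeout(
  detector: Detector,
  event: DetectorEvent,
  timeoutMs: number = DETECTOR_TIMEOUT_MS,
): Promise<DetectorMatch | null> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<null>((resolve) => {
    timer = setTimeout(() => resolve(null), timeoutMs);
  });
  try {
    // Wrapping in a promise chain handles sync and async detectors alike
    // and turns a synchronous throw into a rejection.
    const work = Promise.resolve().then(() => detector(event));
    return await Promise.race([work, timeout]);
  } catch {
    return null; // assumption: a failing detector is skipped rather than failing ingest
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Detectors run concurrently, each with its own budget.
async function evaluateAll(detectors: Detector[], event: DetectorEvent): Promise<DetectorMatch[]> {
  const results = await Promise.all(detectors.map((d) => runWithTimeout(d, event)));
  return results.filter((m): m is DetectorMatch => m !== null);
}
```

Note that a timer can only pre-empt work that yields to the event loop. A synchronous detector that blocks the thread would not be interrupted by this approach, which is one reason sync detectors need to stay in the ~1ms range.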

external_api

Reserved for future detectors that genuinely need a separate runtime (heavy ML on GPU, etc.). Not used today; all 5 first-party detectors run inline.

preview

Registered but not yet implemented. The toggle is visible in the dashboard so you can pre-configure your environment, but the algorithm itself has not shipped.

The detector catalog

Five detectors ship with Prova today. Enable or disable each one in the detector dashboard.

Safety

Prompt injection

high · inline · on by default

Pattern-based detector for common prompt-injection phrasing in event inputs. Scores each input against 11 known attack templates and fires when the score is >= 3.

prompt_injection
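
As an illustration of weighted pattern scoring, the sketch below matches an input against a few templates and fires at a score of 3 or more. The regular expressions, the `role_switch` template, and the choice to sum the matched weights are all assumptions made for this example; the shipped detector's 11 templates and its exact scoring formula are not documented here.

```typescript
// Hypothetical templates and weights, for illustration only.
type Pattern = { label: string; weight: number; regex: RegExp };

const PATTERNS: Pattern[] = [
  { label: "override_prior", weight: 4, regex: /ignore (all |any )?(previous|prior) instructions/i },
  { label: "exfil_system_prompt", weight: 5, regex: /(reveal|print|show) (me )?(your|the) system prompt/i },
  { label: "role_switch", weight: 2, regex: /you are now (a|an|the) /i },
];

const FIRE_THRESHOLD = 3; // fires at score >= 3

function scorePromptInjection(input: string) {
  // The regexes carry no "g" flag, so test() keeps no state between calls.
  const matched = PATTERNS.filter((p) => p.regex.test(input));
  // Assumption: the score is the sum of the matched weights.
  const score = matched.reduce((sum, p) => sum + p.weight, 0);
  return {
    fired: score >= FIRE_THRESHOLD,
    score,
    matched_patterns: matched.map(({ label, weight }) => ({ label, weight })),
  };
}
```

With these sample weights, a single low-weight match (such as `role_switch` alone) stays below the threshold, while any one high-weight match is enough to fire.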

Data protection

PII / PHI leak in output

high · inline · on by default

Scans model outputs for regulated data patterns (SSN, credit card, passport, medical record number, ICD-10, email). Complements the input-side PII policy.

pii_leak
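
A rough sketch of output-side pattern scanning follows, covering three of the listed categories (SSN, email, credit card). The regular expressions and the Luhn check on card candidates are assumptions chosen for illustration, not the shipped detector's actual patterns.

```typescript
type PiiHit = { kind: string; value: string };

// Luhn checksum: filters out digit runs that cannot be valid card numbers.
function luhnValid(digits: string): boolean {
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

function scanOutput(text: string): PiiHit[] {
  const hits: PiiHit[] = [];
  // US SSN written as 3-2-4 digit groups.
  for (const v of text.match(/\b\d{3}-\d{2}-\d{4}\b/g) ?? []) hits.push({ kind: "ssn", value: v });
  // Simplified email shape; real-world address grammar is broader.
  for (const v of text.match(/\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b/g) ?? []) hits.push({ kind: "email", value: v });
  // 13 to 19 digits, optionally separated by spaces or hyphens, that pass Luhn.
  for (const v of text.match(/\b\d(?:[ -]?\d){12,18}\b/g) ?? []) {
    if (luhnValid(v.replace(/[ -]/g, ""))) hits.push({ kind: "credit_card", value: v });
  }
  return hits;
}
```

The checksum step shows why a detector like this is more than a list of regexes: a 16-digit order number would match the card pattern but would usually fail Luhn, which cuts false positives.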

Integrity

Hallucination vs. retrieved contextmedium · inline · off by default

For RAG systems: flags model claims not supported by the retrieved context attached to the event. Uses Claude Haiku as an inline entailment classifier (requires ANTHROPIC_API_KEY). Sampled at 20% of qualifying events; ~$0.0001 per detection.

hallucination
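
The docs do not say how the 20% sample is drawn. One common approach, sketched here purely as an assumption, is deterministic sampling keyed on an event identifier, so that replaying the same event always yields the same sampling decision. `fnv1aStyleHash` and `shouldSample` are hypothetical names.

```typescript
// FNV-1a-style 32-bit hash computed over the string's UTF-16 code units.
function fnv1aStyleHash(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // 32-bit multiply by the FNV prime
  }
  return h >>> 0; // convert to an unsigned 32-bit integer
}

// Map the hash into [0, 1) and compare it with the sampling rate.
function shouldSample(eventId: string, rate: number): boolean {
  return fnv1aStyleHash(eventId) / 0x100000000 < rate;
}

// Example: decide whether a given event goes to the entailment classifier.
const sampledForHallucinationCheck = shouldSample("evt_123", 0.2);
```

A hash-based decision also nests cleanly: any event sampled at a 10% rate is also sampled at 20%, which keeps results comparable if the rate is raised later.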

Fairness

Bias drift over time

high · inline · off by default

Statistical detector that flags when approval rates for a labeled `subject.group` in the current 24h window diverge by more than 20 percentage points from the prior 30-day baseline. Requires a group label and a binary outcome signal (decision / outcome / verdict). Sampled at 10% of qualifying events to control DB load.

bias_drift
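
The core comparison can be sketched as follows: compute the approval rate for one group in each window and fire when the two differ by more than 20 percentage points. For example, a 70% baseline against a 45% current rate is a 25-point gap, which fires. The `Outcome` shape and function names are hypothetical, and the sketch takes the two windows as in-memory arrays rather than querying a database.

```typescript
type Outcome = { group: string; approved: boolean };

// Approval rate for one group, or null when the group has no rows.
function approvalRate(rows: Outcome[], group: string): number | null {
  const inGroup = rows.filter((r) => r.group === group);
  if (inGroup.length === 0) return null;
  return inGroup.filter((r) => r.approved).length / inGroup.length;
}

const DRIFT_THRESHOLD_PP = 20; // percentage points

function biasDrift(current24h: Outcome[], baseline30d: Outcome[], group: string) {
  const currentRate = approvalRate(current24h, group);
  const baselineRate = approvalRate(baseline30d, group);
  // Assumption: without data in both windows there is nothing to compare.
  if (currentRate === null || baselineRate === null) return null;
  const deltaPp = (currentRate - baselineRate) * 100;
  return {
    group,
    currentRate,
    baselineRate,
    deltaPp,
    fired: Math.abs(deltaPp) > DRIFT_THRESHOLD_PP,
  };
}
```

The difference is in percentage points, not relative change: a move from 10% to 15% is 5 points (no finding) even though it is a 50% relative increase.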

Coordination

Coordination loop

high · inline · on by default

Detects persistent cycles in multi-agent execution by building the agent communication graph from a trace, running Tarjan's strongly connected components (SCC) algorithm, and checking whether a cycle persists across many steps. Catches the case where N agents pass work between each other without making progress.

coordination_loop
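
The approach can be sketched in three steps: build a directed graph from the trace's hand-offs, find strongly connected components with Tarjan's algorithm, and keep components where enough steps stay inside the cycle. The `Step` shape, the `minSteps` default of 6, and the "count steps inside the component" persistence test are assumptions; the docs only say the cycle must persist "across many steps".

```typescript
type Step = { from: string; to: string }; // one hand-off between agents

// Tarjan's algorithm: strongly connected components of a directed graph.
function tarjanScc(graph: Map<string, string[]>): string[][] {
  let counter = 0;
  const index = new Map<string, number>();
  const low = new Map<string, number>();
  const stack: string[] = [];
  const onStack = new Set<string>();
  const components: string[][] = [];

  function visit(v: string): void {
    index.set(v, counter);
    low.set(v, counter);
    counter++;
    stack.push(v);
    onStack.add(v);
    for (const w of graph.get(v) ?? []) {
      if (!index.has(w)) {
        visit(w);
        low.set(v, Math.min(low.get(v)!, low.get(w)!));
      } else if (onStack.has(w)) {
        low.set(v, Math.min(low.get(v)!, index.get(w)!));
      }
    }
    if (low.get(v) === index.get(v)) {
      // v is the root of a component: pop the stack down to v.
      const component: string[] = [];
      let w: string;
      do {
        w = stack.pop()!;
        onStack.delete(w);
        component.push(w);
      } while (w !== v);
      components.push(component);
    }
  }

  Array.from(graph.keys()).forEach((v) => {
    if (!index.has(v)) visit(v);
  });
  return components;
}

function findCoordinationLoops(trace: Step[], minSteps = 6) {
  const graph = new Map<string, string[]>();
  for (const { from, to } of trace) {
    if (!graph.has(from)) graph.set(from, []);
    if (!graph.has(to)) graph.set(to, []);
    graph.get(from)!.push(to);
  }
  return tarjanScc(graph)
    .filter((c) => c.length > 1) // single-agent self-loops are ignored in this sketch
    .map((c) => {
      const members = new Set(c);
      const stepsInside = trace.filter((s) => members.has(s.from) && members.has(s.to)).length;
      return { agents: [...c].sort(), stepsInside };
    })
    .filter((loop) => loop.stepsInside >= minSteps);
}
```

The recursive `visit` is fine for traces with a handful of agents; a very deep graph would call for an iterative variant to avoid exhausting the call stack.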

What a detector finding looks like

Detector matches attach to the receipt as findings. The value of the detector field is prefixed with detector: (policy findings use policy:), so downstream consumers can group the two separately.

{
  "detector": "detector:prompt_injection",
  "verdict": "prompt_injection",
  "severity": "high",
  "summary": "Prompt-injection patterns detected (score 8, confidence high).",
  "details": {
    "score": 8,
    "confidence": "high",
    "matched_patterns": [
      { "label": "override_prior", "weight": 4 },
      { "label": "exfil_system_prompt", "weight": 5 }
    ]
  },
  "remediation": "Sanitize untrusted input before placing it in a system prompt..."
}
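
For downstream consumers, a finding like the one above could be typed and grouped by prefix roughly as follows. The `Finding` type is inferred from the sample JSON only; which fields are optional is an assumption, as is the idea that policy findings carry their policy: prefix in the same `detector` field.

```typescript
// Inferred from the sample finding; optionality of fields is an assumption.
type Finding = {
  detector: string; // "detector:<id>" or "policy:<id>"
  verdict: string;
  severity: string;
  summary: string;
  details?: Record<string, unknown>;
  remediation?: string;
};

type GroupedFindings = { detector: Finding[]; policy: Finding[]; other: Finding[] };

// Split a receipt's findings by the prefix on the detector field.
function groupBySource(findings: Finding[]): GroupedFindings {
  const groups: GroupedFindings = { detector: [], policy: [], other: [] };
  for (const f of findings) {
    if (f.detector.startsWith("detector:")) groups.detector.push(f);
    else if (f.detector.startsWith("policy:")) groups.policy.push(f);
    else groups.other.push(f); // unknown prefix: keep it visible rather than dropping it
  }
  return groups;
}
```

Keeping an `other` bucket means a prefix introduced later shows up in consumer output instead of being silently discarded.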

Custom detectors (preview)

If you've built an internal classifier (a credit-decision fairness model, a medical-coding accuracy check, a domain-specific safety filter) and want it to run on every event with a tamper-evident receipt, the detector SDK is in preview. It exposes the same (event) => DetectorMatch shape as the first-party detectors. Talk to us at founders@prova.cobound.dev if you want early access.