Changelog

What we shipped.

Everything that's changed in Prova, in reverse chronological order.

Jun 8, 2026

Run health: a label-free verdict on every agent run

Observability shows you traces. Run health gives you a verdict. Every agent run now gets a 0 to 100 health score and a letter grade, read straight from the signals already in your receipts. No eval set, no labels, no LLM.

A score you can act on. Clean runs auto-pass. Clearly-broken runs (a coordination loop, a blocked call, a high or critical finding) auto-flag. Only the ambiguous middle is routed to a human. You triage by exception instead of reading every trace.

Every point off is explained. The score is 100 minus the sum of named signal penalties: coordination loop (45), blocked call (25), severe finding (20 to 30), no-progress cycle (20), step blowup (15), repeated tool call (12), medium finding (10). A poor grade tells you what went wrong, not just that something did. The dashboard shows each signal with its penalty and detail.
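
As a minimal sketch of the arithmetic above, not the SDK's actual scorer: the penalty values come from the list, while the signal key names, the split of the 20 to 30 severe range into high (20) and critical (30), counting each signal once, and flooring the result at 0 are assumptions made for illustration.

```python
# Penalties from the changelog entry. Key names are illustrative, and the
# high/critical split of the "20 to 30" severe range is an assumption.
PENALTIES = {
    "coordination_loop": 45,
    "blocked_call": 25,
    "critical_finding": 30,
    "high_finding": 20,
    "no_progress_cycle": 20,
    "step_blowup": 15,
    "repeated_tool_call": 12,
    "medium_finding": 10,
}


def health_score(signals):
    """Return (score, breakdown) for the signals that fired on one run."""
    fired = sorted(set(signals))  # assumption: each signal is penalized once
    breakdown = [(name, PENALTIES[name]) for name in fired]
    total_penalty = sum(penalty for _, penalty in breakdown)
    return max(0, 100 - total_penalty), breakdown  # assumption: floor at 0


score, breakdown = health_score(["repeated_tool_call", "medium_finding"])
print(score, breakdown)
```

The breakdown list is what makes the grade explainable: each entry names a signal and the points it cost.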

Deterministic, and the same everywhere. The scorer is pure: no network, no clock, no model call. It runs offline in the SDK with no account (prova-local), and on /dashboard/health once you ingest. The Python and Node ports match the server. Free on every plan.

It answers whether the run was healthy, not whether the final answer was semantically correct. That line is deliberate: a semantic judge would need an LLM, and this layer stays deterministic. Full detail in the run health guide.

Jun 7, 2026

Coordination-loop detection now separates a stuck loop from productive iteration

A planner/executor loop and a stuck loop have the same shape: agents revisiting each other in a cycle. The detector used to fire on that structure alone, which flagged healthy iteration as a fault. It now checks whether the loop is actually making progress.

Progress, measured by content. After finding a cycle, the detector clusters the state each step produced by near-duplicate similarity (k=4 byte-shingles, Jaccard at or above 85%). It fires only when the loop keeps revisiting states it already produced. A converging planner/executor or reviewer/worker loop produces materially new state each round and stays silent. On the canonical validation set this lifted precision from 40% to 100%, with recall held at 100%.

Drift-resistant. A small changing token each round (a counter, a timestamp, a retry number) is a tiny fraction of a large repeated output, so similarity stays high and the stuck loop still fires. The same change in a small output stays distinct, so a genuinely converging loop is preserved.
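
The similarity test described above can be sketched as follows. This is an illustration under stated assumptions, not the shipped detector: the function names are hypothetical, and the handling of inputs shorter than k bytes is a guess. Only k=4 byte-shingles and the 85% Jaccard threshold come from the entry.

```python
def shingles(data, k=4):
    """Set of all k-byte windows in data (k=4 as in the entry)."""
    if len(data) < k:
        return {data}  # assumption: short inputs form a single shingle
    return {data[i:i + k] for i in range(len(data) - k + 1)}


def jaccard(a, b, k=4):
    """Jaccard similarity of the two shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a, b, threshold=0.85):
    return jaccard(a, b) >= threshold


# A large repeated output with one changing token still clusters together.
base = b"Retrying fetch of the customer ledger; upstream returned 503 and no rows were written."
stuck_a = base + b" attempt 1"
stuck_b = base + b" attempt 2"
print(is_near_duplicate(stuck_a, stuck_b))

# The same one-character change in a small output stays distinct.
print(is_near_duplicate(b"step 1", b"step 2"))
```

In the first pair only the final shingle differs, so the similarity stays well above the threshold. In the second pair a third of each shingle set differs, so the similarity is 0.5 and the two states count as distinct.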

Deterministic and cross-language. The server, the Python SDK, and the Node SDK share a byte-identical implementation, so they agree on the same trace. No embeddings, no LLM.

Pure paraphrase is opt-in. A loop that restates the same non-progress in entirely different words has low lexical overlap. Closing that last gap is an optional server-side enrichment (PROVA_SEMANTIC_LOOP=1 plus an embeddings key) that re-clusters by embedding similarity. It runs only when the inline detector did not already fire, is bounded by a timeout, and fails open, so a stock deployment is byte-identical to before.

Jun 6, 2026

Drop-in adapters for Vercel AI SDK, OpenAI Agents, LlamaIndex, and Pydantic AI

Prova now meets your agent where it runs. Four new first-class adapters sign a receipt for every model call and agent step, with no manual instrumentation.

Vercel AI SDK (Node). Wrap your model with provaMiddleware and every generateText and streamText call emits a signed model_call receipt with token usage, so cost is computed and signed server-side. Streaming calls are tapped, not buffered. Works with the ai package v5 and v6, and ai is an optional peer dependency so the SDK installs cleanly without it.

OpenAI Agents SDK (Python). ProvaAgentsHooks plugs into the SDK's run hooks, so agent turns and tool calls land in the Audit Vault. Fail-silent: a Prova outage never breaks the run.

LlamaIndex (Python). ProvaLlamaHandler attaches to the LlamaIndex event stream and records model calls with token usage.

Pydantic AI (Python). ingest_pydantic_run turns a completed Pydantic AI run into receipts.

These sit alongside the existing LangGraph and LangChain callback handler, the CrewAI tap, and RunGuard for any other runtime. Telemetry is fire-and-forget throughout: a failed ingest never breaks your model call. Quickstart for each is in the SDK guide.

Jun 5, 2026

Receipts for RAG retrieval and embedding calls

The receipt schema now covers two more kinds of AI work that used to slip through unaudited: vector retrieval and embedding generation. If your system does RAG, the lookup and the embedding step are now on the record like any model call.

Two new decision kinds. rag_retrieval captures the query, the retrieved context, and the result count. embedding_call captures the input and token usage, not the vectors themselves. Both are accepted by ingest, gateway-check, and policy evaluation, and both price end to end through the cost pipeline.

A new policy, pii_in_retrieved_context. Retrieved context is a real compliance gap. Personal data can sit in your vector store and get pulled into a prompt without anyone screening it. The pii_in_retrieved_context policy scans payload.retrieved_context for PII. The existing PII, PHI, and secret policies apply automatically too, since they scan every string on the payload.

Captured automatically by the SDK. The Python and Node control-plane SDKs emit rag_retrieval from the retriever callbacks, and wrap_openai now also taps embeddings.create to emit embedding_call. No manual instrumentation.

No new migration.

Jun 4, 2026

Continuous ingestion from LangSmith, Langfuse, and OpenAI

Teams already running LangSmith or Langfuse no longer have to choose between their existing tracing and a Prova audit trail. A new inbound webhook mirrors traces into the Audit Vault continuously, so every model call, agent run, and tool use gets a signed receipt without changing how you instrument.

The endpoint. POST /api/v1/ingest/webhook?source=langsmith|langfuse|openai. Point a LangSmith or Langfuse webhook (or your own forwarder) at it with your Prova key in the Authorization header or a ?key= query param. A single object, an array, or a {records|data|observations|runs|items} envelope all work.
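
One way the accepted body shapes could be normalized into a flat list of records is sketched below. The envelope key names come from the entry. The function name, the key precedence, and treating an object with no envelope key as a single record are assumptions, not the endpoint's documented behavior.

```python
ENVELOPE_KEYS = ("records", "data", "observations", "runs", "items")


def unwrap_records(body):
    """Normalize a parsed JSON body into a list of record objects."""
    if isinstance(body, list):
        return body  # a bare array of records
    if isinstance(body, dict):
        for key in ENVELOPE_KEYS:  # assumption: first matching key wins
            value = body.get(key)
            if isinstance(value, list):
                return value
        return [body]  # assumption: anything else is a single record
    raise ValueError("expected a JSON object or array")


print(unwrap_records({"runs": [{"id": "r1"}, {"id": "r2"}]}))
print(unwrap_records({"id": "solo"}))
```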

Same receipts as the one-shot backfill. Vendor payloads are mapped to the AIDecisionEvent shape by the same mappers the migrate CLI uses, then forwarded to the canonical ingest path. Backfilled history and live-mirrored traffic produce byte-identical receipts: same signing, same policies, same detectors, same cost attribution.

Why it matters. The migrate command was a snapshot. This is the live mirror. You keep your current tracing stack, and Prova becomes the continuous system of record alongside it rather than a rip-and-replace.

Docs at /docs/ingest-webhook.

No new migration.

Jun 3, 2026

Inline enforcement: block a risky model call before it runs

The drop-in OpenAI and Anthropic proxy used to record decisions after the fact. It can now check a request against your policies and detectors before it reaches the model, and stop it when it breaks one. The control plane is no longer advisory.

How it works. Point your OpenAI or Anthropic client at the Prova proxy with a one-line base-URL change. Before each request forwards, the proxy calls /api/v1/gateway/check on the input (PII, secrets, prompt injection, budget). After the call executes, it records the decision to the Audit Vault via /api/v1/audit/ingest. Both the attempt and the outcome land on the record, so the trail shows what was tried, not only what ran.

Two modes, opt-in and fail-open:

observe (the default). The check runs and records, and the request always forwards. Recording happens on a background task, so it adds no latency to your model call.
enforce. A blocking finding stops the request before it reaches the model, and the caller gets a 422 with the reason. Turn it on per request with the X-Prova-Policy: enforce header.

Enforcement engages only when an X-Prova-Key header and a configured check endpoint are both present. If that endpoint is unreachable, the proxy fails open and forwards the request. Prova being down never takes your AI down with it.
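
The forwarding decision described above can be sketched as a small pure function. This is a hedged illustration, not the proxy's code: `Decision`, `decide`, and the `run_check` callback are hypothetical names, header lookup is simplified to an exact-case dict, and the background recording that observe mode performs is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    forward: bool
    status: Optional[int] = None  # status returned to the caller when blocked
    reason: Optional[str] = None


def decide(headers, check_endpoint, run_check):
    """run_check(endpoint) returns (blocking, reason) or raises ConnectionError."""
    mode = headers.get("X-Prova-Policy", "observe")
    enforcing = mode == "enforce" and "X-Prova-Key" in headers and bool(check_endpoint)
    if not enforcing:
        return Decision(forward=True)  # observe: the request always forwards
    try:
        blocking, reason = run_check(check_endpoint)
    except ConnectionError:
        return Decision(forward=True)  # fail open when the check is unreachable
    if blocking:
        return Decision(forward=False, status=422, reason=reason)
    return Decision(forward=True)


headers = {"X-Prova-Key": "key", "X-Prova-Policy": "enforce"}
print(decide(headers, "http://check.local", lambda endpoint: (True, "secret detected")))
```

Keeping the decision free of I/O makes the fail-open path easy to test: an unreachable check endpoint is just a callback that raises.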

Streaming is covered. The decision is made before forwarding, so streaming responses are enforced the same way as non-streaming ones. Cost enrichment on the recorded receipt is non-streaming in this version.

Why a developer cares: one base-URL change turns "log what my agent did" into "stop my agent from doing the thing I told it not to," with no rewrite of your model-call code.

Why a CISO cares: the blocking policies are now applied at the wire, not reported in a dashboard after the data already left. Secrets, budget caps, boundaries, and agent authorization block by default. A PII rule like "no customer PII leaves for an external model" starts in detect-and-record, and promotes to block at the wire the moment you raise its action on the policy dashboard.

Wired into the docker-compose api service and the Helm chart. Docs at /docs/gateway-check.

No new migration.

Jun 2, 2026

Control plane SDK for Python and Node

A first-party SDK for the AI control plane ships in Python and Node. It does three things: ingest decisions, verify receipts offline, and migrate existing logs into the Audit Vault.

Ingest. A typed client wraps POST /api/v1/audit/ingest with single and batch calls.

Offline verification. verify_receipt (Python) and verifyReceipt (Node) recompute the canonical JSON, check the SHA-256 hash, fetch the published public key from /api/v1/keys/{id}, and verify the Ed25519 signature. The canonicalization is byte-for-byte identical to the server's, so a receipt that verifies in the SDK verifies the same way an auditor would with OpenSSL.
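
The hash-check half of that flow can be sketched with the standard library alone. Treat everything specific here as an assumption: the receipt field names (`hash`, `signature`) are hypothetical, and sorted keys with compact separators is a stand-in for the real canonicalization rules, which this entry does not spell out. The Ed25519 step is omitted; it would need the published public key and a signature library.

```python
import hashlib
import hmac
import json


def canonical_json(obj):
    """Stand-in canonical form: sorted keys, compact separators, UTF-8."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def hash_matches(receipt):
    """Recompute SHA-256 over the receipt body and compare to the stored hash."""
    # Assumption: the hash covers every field except the hash and signature.
    body = {k: v for k, v in receipt.items() if k not in ("hash", "signature")}
    digest = hashlib.sha256(canonical_json(body)).hexdigest()
    return hmac.compare_digest(digest, receipt.get("hash", ""))


body = {"kind": "model_call", "payload": {"b": 2, "a": 1}}
receipt = dict(body, hash=hashlib.sha256(canonical_json(body)).hexdigest(), signature="sig")
print(hash_matches(receipt))
```

Because the canonical form sorts keys, two receipts with the same fields in a different order hash identically, which is the property that lets separate implementations agree.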

Migration. One-shot importers map LangSmith, Langfuse, and OpenAI logs into AIDecisionEvent shape and stream them in batches. The mapped kind values and source fields match the ingest schema, so migrated history is indistinguishable from natively-ingested receipts.

Why a CISO cares: receipt verification does not require trusting Prova or running Prova's code in production. The verifier is small, open, and reproducible against the published key.

Docs at /docs/sdk.

Jun 1, 2026

Custom policy versioning, two-eyes approval, and bulk replay

Custom policies now move through a pending_review to approved gate instead of mutating the live row in place.

Two-eyes approval. Editing an existing policy stages a draft in a new versions table rather than changing what the ingest pipeline evaluates. A different user holding policy.approve_version must approve before the draft goes live. One pending draft per policy at a time; a fresh draft supersedes the prior one. Initial policy creation is still one-shot, since there is no live policy to gate yet. Submit, approve, and reject each fire an operational audit event, so the policy-change flow is itself on the record.

Bulk replay. Before promoting a draft, replay it against stored history. POST /api/v1/policies/replay walks audit_events over a window (capped at 5000 events or 90 days) and diffs the draft predicate against what the live policy actually matched at ingest. Results come back in four buckets: became a match, no longer matches, still matches, still unmatched, with example event ids per bucket. The UI surface is the "Replay against history" panel in the policy editor.
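
The four-bucket diff can be sketched as below. The bucket semantics come from the entry. The function name, the event shape, and the bucket key names are illustrative assumptions, and the real endpoint's window caps and example-id sampling are not modeled.

```python
def replay_diff(events, live_matched_ids, draft_predicate):
    """Bucket stored events by how a draft predicate changes their match status."""
    buckets = {
        "became_match": [],
        "no_longer_matches": [],
        "still_matches": [],
        "still_unmatched": [],
    }
    for event in events:
        was_match = event["id"] in live_matched_ids  # what the live policy matched at ingest
        is_match = bool(draft_predicate(event))      # what the draft would match now
        if is_match and not was_match:
            buckets["became_match"].append(event["id"])
        elif was_match and not is_match:
            buckets["no_longer_matches"].append(event["id"])
        elif was_match:
            buckets["still_matches"].append(event["id"])
        else:
            buckets["still_unmatched"].append(event["id"])
    return buckets


events = [{"id": "e1", "cost": 5}, {"id": "e2", "cost": 50}, {"id": "e3", "cost": 500}]
# The live policy matched only e3; the hypothetical draft lowers the threshold to 20.
print(replay_diff(events, {"e3"}, lambda e: e["cost"] > 20))
```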

Why a CISO cares: nobody can quietly widen or weaken an enforcement rule. The change is staged, reviewed by a second person, replayed against real history so the blast radius is known before it goes live, and the whole flow is auditable.

Access: policy.approve_version (owner, developer, security) and policy.replay (owner, developer, security, contractor, audit).

Migration: 021_custom_policy_versions.sql. Apply it before anyone touches the custom-policy editor in production.

May 31, 2026

Network-layer AI discovery closes the shadow-AI gap

AI Inventory now discovers integrations from network logs, not just from receipts and active registration. Point a Cloudflare Logpush job or a Datadog log forwarder at POST /api/v1/inventory/network and Prova parses outbound HTTP records, matches hostnames against a known AI-provider registry, and upserts one inventory row per distinct app, env, provider, and model.

What you get:

Three input formats: cloudflare_logpush, datadog, and generic NDJSON.
Fifteen providers matched out of the box: openai, anthropic, azure_openai, aws_bedrock, google, google_vertex, cohere, mistral, together, fireworks, groq, openrouter, replicate, huggingface, perplexity. The registry is a list, not an algorithm, so adding a missing hostname is a one-line change.
A new discovery channel. Each integration now carries discovered_via across observed, registered, and network. The dashboard surfaces a Network-only stat tile, a network filter, and a network badge per row.
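
A rough sketch of the matching step for generic NDJSON input follows. The hostnames are a small illustrative subset, not Prova's registry, and the record field names (`app`, `env`, `host`, `model`) are assumptions about a generic log line rather than a documented schema.

```python
import json

# Illustrative subset only; the real registry covers fifteen providers.
PROVIDER_HOSTS = {
    "api.openai.com": "openai",
    "api.anthropic.com": "anthropic",
    "api.groq.com": "groq",
    "openrouter.ai": "openrouter",
}


def discover(ndjson_text):
    """Count outbound calls per distinct (app, env, provider, model) row."""
    rows = {}
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        provider = PROVIDER_HOSTS.get(str(record.get("host", "")).lower())
        if provider is None:
            continue  # not a known AI-provider hostname
        key = (
            record.get("app", "unknown"),
            record.get("env", "unknown"),
            provider,
            record.get("model", "unknown"),
        )
        rows[key] = rows.get(key, 0) + 1
    return rows


logs = "\n".join([
    '{"app": "billing", "env": "production", "host": "api.openai.com"}',
    '{"app": "billing", "env": "production", "host": "API.OPENAI.COM"}',
    '{"app": "billing", "env": "production", "host": "example.com"}',
])
print(discover(logs))
```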

Why a CISO cares: a model call that was never instrumented and never registered still shows up, because the egress traffic itself is the signal. This is the last way shadow AI could hide from the Inventory pillar.

Docs at /docs/inventory-network.

May 30, 2026

HSM signing, self-audit, and a public status page

Three procurement-blocker items shipped together. SSO stays deferred until the first Enterprise contract.

HSM / KMS / Vault signing. Receipt signing now resolves through a Signer interface. Local PEM, file path, and ephemeral keys still work unchanged. A new KMS backend talks to a customer-run sidecar over a two-endpoint HTTP contract, so private key material never leaves the HSM. Activate with PROVA_SIGNING_BACKEND=kms plus the sidecar URL and key id. The contract is documented at deploy/signer-sidecar/README.md.

Self-audit instrumentation. Every admin action now emits a kind='operational' row through the same audit_events table the customer's AI decisions flow through: API key create and revoke, policy and detector toggles, custom policy CRUD, member invite, role change, removal, audit export, and risk config changes. The audit dashboard kind filter gains an "operational (admin)" entry. A self-audit write that fails is swallowed and logged so it can never break the action it instruments.

Public status page. SLO samples are recorded per request on the ingest and gateway-check paths and aggregated at /status and GET /api/v1/status. Encoded SLA targets: 99.9% uptime, 250ms ingest p95, 200ms gateway-check p95. Sample writes are fire-and-forget, so a status-table blip never fails a customer request.

Why a CISO cares: the signing key can live in the customer's own HSM, the admin surface audits itself through the same tamper-evident path as everything else, and uptime is publicly verifiable rather than asserted.

Migration: 020_slo_samples.sql. Absence of the table degrades the status page to "no data yet" rather than erroring.

May 29, 2026

Per-org Risk Score weighting

The AI Risk Score now weights its five components per-org instead of uniformly. Editable at /dashboard/risk/config.

Why: uniform weighting was the wrong default for a regulated buyer. A payments company cares more that a bad call was stopped pre-execution (Enforcement Rate) than that every detector is on. A hospital cares more about signing-key persistence and retention (Compliance Readiness) than raw receipt volume. One number for the board only works if the number reflects what that board is actually accountable for.

What you get:

Four starting profiles: uniform (default), fintech (Enforcement Rate weighted), healthcare (Compliance Readiness weighted), EU AI Act / high-risk (policy + audit depth weighted). Each is a starting point, not a constraint -- tune any component from there.
A slider editor. Enter any positive number per component; Prova normalizes to integer weights summing to exactly 100. The per-component ceiling moves but the 0-100 total never does, so the A/B/C/D/F thresholds are unaffected. Only relative emphasis changes.
Automatic propagation. Both the dashboard score and the signed quarterly export recompute against the stored weighting -- no separate step.
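
Normalizing arbitrary positive inputs to integers that sum to exactly 100 needs a rounding rule. The sketch below uses the largest-remainder method, which is an assumption; the entry does not say which rule Prova applies. The component key names are illustrative.

```python
def normalize_weights(raw, total=100):
    """Scale positive weights to integers summing to exactly `total`."""
    if not raw or any(value <= 0 for value in raw.values()):
        raise ValueError("every component needs a positive weight")
    scale = total / sum(raw.values())
    exact = {name: value * scale for name, value in raw.items()}
    weights = {name: int(share) for name, share in exact.items()}  # floor
    leftover = total - sum(weights.values())
    # Hand the remaining points to the largest fractional parts, ties by name.
    order = sorted(raw, key=lambda name: (-(exact[name] - weights[name]), name))
    for name in order[:leftover]:
        weights[name] += 1
    return weights


fintech_like = {
    "audit_coverage": 1,
    "detector_breadth": 1,
    "policy_coverage": 1,
    "enforcement_rate": 3,
    "compliance_readiness": 1,
}
print(normalize_weights(fintech_like))
```

Plain rounding can land on 99 or 101; distributing the leftover points explicitly is what guarantees the total.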

Why a CISO cares: the active weighting is stamped into the signed quarterly export alongside the score. An auditor reviewing last quarter's board number can see exactly how it was weighted and verify, via the same Ed25519 signature that protects the receipts, that the weighting was not retuned after the fact to flatter the number. Re-weighting is itself an auditable event.

Access: governed by policy.toggle_builtin -- the same posture-owner audience (owner / developer / security) that governs which built-in policies are on. Audit and contractor roles cannot change it.

Migration: 018_org_risk_config.sql adds the org_risk_config table. One row per org; absence of a row means uniform default, so this is non-breaking for every existing org -- no backfill required.

May 28, 2026

AI Inventory topology graph

The AI Inventory now ships with a topology graph at /dashboard/inventory/topology -- a server-rendered SVG showing the connections between your apps and the models they call.

What it shows:

Apps on the left, grouped by environment (production / staging / development first, then anything else alphabetically).
Models on the right, grouped by provider.
Edges between them, with thickness reflecting invocation volume (sqrt scale -- so a 100x-volume integration is ~10x thicker, not 100x) and color reflecting the worst finding severity recorded for that route (clean / medium / high / critical).
Group labels along both rails so a glance tells you "we have 12 production apps and 4 staging apps, talking to OpenAI and Anthropic and one self-hosted endpoint."
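
The edge styling rules above reduce to two small functions, sketched here in Python for illustration (the renderer itself is server-rendered SVG). The function names and the base width are hypothetical; the sqrt scale and the severity ordering come from the list.

```python
import math

SEVERITY_RANK = {"clean": 0, "medium": 1, "high": 2, "critical": 3}


def edge_width(invocations, base=1.0):
    """Stroke width on a sqrt scale, so volume differences stay readable."""
    return base * math.sqrt(max(invocations, 0))


def worst_severity(findings):
    """Worst severity recorded on a route; 'clean' when there are none."""
    return max(findings, key=SEVERITY_RANK.__getitem__, default="clean")


# A 100x-volume route is drawn 10x thicker, not 100x.
print(edge_width(10_000) / edge_width(100))
print(worst_severity(["medium", "critical", "high"]))
```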

Why this matters for the demo: the inventory table is informational. The topology graph is the artifact a CISO screenshots for the slide. "Here is every place AI runs in our company; here are the routes that have ever produced a finding." It's the single most "wow this is real" visual in the product.

Implementation note: pure server-rendered SVG. Deterministic bipartite layout. No D3, no Three.js, no canvas, no force simulation, no client-side hydration. The same data always produces the same graph -- safe to screenshot. Drops cleanly into a board deck or a regulator pre-meeting.

The graph reuses the same aggregateInventory data the table view uses, so there's a single source of truth -- no lag between the two surfaces. Wired from the inventory page header as a primary CTA.

What v2 doesn't do (yet):

No hover interactions or zoom. Static. If the customer has 200+ integrations the SVG remains readable but won't fit on a 1080p screen without scrolling -- worth revisiting if real customers hit this.
No edges from app -> app. The topology assumes a bipartite app -> model relationship, which holds for the vast majority of integrations but misses multi-agent chains where one app's output is another app's input. Multi-agent topology is a different (richer) visualization.
Discovery is still passive. An integration only appears once its first receipt arrives. SDK-registered pre-flight discovery is the v3 step.

No new migration. The data is already there in audit_events.

May 27, 2026

All five detectors now ship in this codebase

The detector catalog has been carrying two "preview" rows (bias drift, hallucination) and one "external_api" row (coordination loop) since the marketplace launched. Today all five detectors ship inline -- the catalog is now an honest list of things that actually run.

What's new:

coordination_loop -- moved from external_api to inline. Re-implemented in TypeScript: builds the agent communication graph from the event's payload.steps, runs Tarjan's SCC, fires when a cycle persists across 6+ steps. The historical Python version used persistent homology on a simplicial complex; for graphs of the size we see in real agent traces (<= 30 agents, <= 200 steps) Tarjan agrees with the math and is two orders of magnitude faster.
bias_drift -- moved from preview to inline. Statistical detector: pulls the 30-day baseline and current 24h window of decisions for a given subject.group, fires when approval rates diverge by 20+ percentage points. Sampled at 10% of qualifying events to keep DB load bounded. Requires a group label and a binary outcome signal in the payload -- this is bias detection, not bias measurement, and it can't fire without labeled data.
hallucination -- moved from preview to inline. For RAG systems: when payload.retrieved_context and payload.completion are both present, the detector sends them to Claude Haiku as a strict JSON-mode entailment classifier. Fires on unsupported claims with classifier confidence >= 0.45. Sampled at 20% of qualifying events, ~$0.0001 per detection. Requires ANTHROPIC_API_KEY; without it the detector silently no-ops.
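
For readers unfamiliar with the coordination_loop approach, here is a compact Python sketch of the idea (the shipped detector is TypeScript). The step shape with `from` and `to` keys is hypothetical, and reading "persists across 6+ steps" as "at least six steps fall inside one multi-agent strongly connected component" is an interpretation, not the documented rule.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm over a dict of node -> set of successor nodes."""
    index, low, stack, on_stack, result = {}, {}, [], set(), []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            result.append(component)

    for node in list(graph):
        if node not in index:
            visit(node)
    return result


def has_persistent_cycle(steps, min_steps=6):
    graph = {}
    for step in steps:
        graph.setdefault(step["from"], set()).add(step["to"])
        graph.setdefault(step["to"], set())
    for component in strongly_connected_components(graph):
        if len(component) < 2:
            continue  # single agents (including self-loops) are ignored here
        inside = sum(1 for s in steps if s["from"] in component and s["to"] in component)
        if inside >= min_steps:
            return True
    return False


ping_pong = [{"from": "planner", "to": "executor"}, {"from": "executor", "to": "planner"}] * 3
print(has_persistent_cycle(ping_pong))
```

The recursive form is fine at the trace sizes quoted above; much larger graphs would want an iterative variant to stay clear of Python's recursion limit.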

Architecture note: detector evaluators can now return Promise<DetectorMatch> in addition to the sync variant. The ingest evaluator wraps async detectors in a 1.5s timeout so a slow downstream (Anthropic API latency spike, slow Supabase query) cannot stall ingestion. A timed-out detector treats the event as "no match" -- the receipt still signs and persists.
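
The timeout wrapper has a direct analogue in Python's asyncio, sketched below. The codebase itself uses TypeScript promises, so this is an illustration of the pattern only; the detector functions and their return shape are hypothetical.

```python
import asyncio


async def evaluate_with_timeout(detector, event, timeout_s=1.5):
    """Await an async detector; a timeout is reported as None (no match)."""
    try:
        return await asyncio.wait_for(detector(event), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None  # the receipt would still sign and persist


async def slow_detector(event):
    # Hypothetical detector standing in for a slow downstream call.
    await asyncio.sleep(0.2)
    return {"detector": "hallucination", "matched": True}


async def fast_detector(event):
    return {"detector": "bias_drift", "matched": False}


if __name__ == "__main__":
    print(asyncio.run(evaluate_with_timeout(slow_detector, {}, timeout_s=0.05)))
    print(asyncio.run(evaluate_with_timeout(fast_detector, {})))
```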

What the docs used to say vs. what's true now: previous releases described an "external Python analyzer service" that hosted some of these detectors. There was never a deployed service. The references in the registry, the docs, and the CLAUDE.md were architectural intent. As of today the codebase is honest: every detector lives in lib/detectors/inline/* and runs in the Next.js process. The external_api mode is reserved for future heavy-ML detectors that genuinely need a separate runtime (a local entailment model, vector retrievers, etc.) but is unused today.

Cost of running this in production: the Anthropic-backed hallucination detector is the only one with a marginal cost. At 20% sampling and Haiku pricing, expect ~$0.10 per 1,000 receipts that have both retrieved context and a completion -- negligible at any reasonable scale. Disable the detector if your traffic pattern doesn't involve RAG.

No new migration.

May 26, 2026

AI Inventory -- find every integration in your org

Most enterprises with more than 200 engineers have lost track of how many AI integrations they're running. Prova's new AI Inventory rebuilds that map from your audit receipts.

Where it lives:

/dashboard/inventory -- every distinct combination of app, environment, framework, model provider, and model name found in your receipts. Sortable by recency, invocation volume, or finding count. Filterable by environment + provider.
/dashboard/inventory/[id] -- drill-down per integration: invocations, severity breakdown, recent receipts deep-linked to the Audit Vault, cross-links to "everything in this environment" and "everything on this provider."
/api/v1/inventory?days=30 -- JSON snapshot for automation. Wire this into Slack to alert whenever a new integration appears (the canonical "shadow AI" detection).
/docs/inventory -- methodology, integration-id derivation, what's not in v1.

How discovery works:

When a receipt arrives at the Audit Vault, Prova hashes the tuple (app_id, environment, framework, provider, model_name) to derive a stable integration_id. A new integration appears the moment its first receipt is signed. Missing fields fall back to literal unknown -- which makes instrumentation gaps obvious.
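
A minimal sketch of that derivation, assuming SHA-256 over the five fields joined with a separator byte and truncated to 16 hex characters. The hash function, separator, and truncation are assumptions; only the tuple of fields and the literal unknown fallback come from the entry.

```python
import hashlib

FIELDS = ("app_id", "environment", "framework", "provider", "model_name")


def integration_id(receipt):
    """Stable id for the (app, env, framework, provider, model) tuple."""
    parts = [str(receipt.get(field) or "unknown") for field in FIELDS]
    joined = "\x1f".join(parts)  # unit separator avoids ambiguous concatenation
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


receipt = {
    "app_id": "billing",
    "environment": "production",
    "provider": "openai",
    "model_name": "example-model",
}
# "framework" is missing, so it falls back to the literal "unknown".
print(integration_id(receipt))
```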

Per integration we expose: invocation count in the rolling 30d window, first/last seen, finding count broken down by severity (info/low/medium/high/critical), and a handful of recent receipt IDs.

What v1 doesn't do (yet):

Active SDK discovery -- today an integration only appears once it sends a receipt. prova.registerIntegration(...) for proactive registration is next.
Topology graph -- the SVG visualization of app->model edges weighted by call volume. Data is in the JSON export; the renderer is the gap.
Network-layer discovery -- catching uninstrumented AI calls (where the customer hasn't wired Prova at all) would require a Cloudflare / Datadog log integration. Roadmap.

Why this matters: the question "where is AI running in our company?" is one every CISO at every regulated company has tried and failed to answer cleanly. The Inventory is the answer that builds itself from data Prova already has -- no separate scanner, no agent on every host, no security team interview. The integration appears the moment the receipt does.

The four-pillar vision is now feature-complete: Audit Vault, Policy Engine, Detector Marketplace, AI Risk Score -- plus the Inventory, the gateway enforcement layer, RBAC, and role-aware API keys. The remaining work is depth (the preview detectors actually shipping in the Python analyzer; per-org risk-score weight customization; the SDK-registered active inventory) rather than breadth.

No new migration.

May 25, 2026

AI Risk Score -- the slide for the board deck

Prova now produces a single 0-100 number that summarizes your AI governance posture. It's the slide that goes on the quarterly board deck. It closes pillar four of the four-pillar vision (Audit Vault, Policy Engine, Detector Marketplace, AI Risk Score) -- the platform is now feature-complete on the original brief.

Where it lives:

/dashboard/risk -- live score with letter grade, per-component breakdown, and a ranked list of remediations.
/api/v1/risk/quarterly?q=2026-Q2 -- signed JSON export for any quarter. Same Ed25519 signature as audit receipts -- a regulator three years from now can verify the score the board saw was the score the system produced.
/docs/risk-score -- the full methodology with weights, grade letters, and known caveats.
/risk-score -- the CISO-facing marketing page.

The five components, each 0-20:

Audit Coverage -- how many receipts you've recorded (saturates at 1,000/30d)
Detector Breadth -- fraction of first-party detectors enabled
Policy Coverage -- built-in policies enabled (14 pts) + custom policies authored (6 pts: 2 per policy, capped at 3 policies)
Enforcement Rate -- fraction of receipts that went through pre-execution gateway-check vs post-hoc ingest
Compliance Readiness -- five-item hygiene checklist with weighted points (signing key persistent, retention, EU AI Act export, role hygiene, custom policy in use)

Why we built this: every CISO at every regulated company we've talked to has the same problem -- "are we doing enough on AI governance?" has no quantitative answer in any current tool. You can show logs. You can show trace screenshots. You can hand someone a 40-page audit report. None of those roll up to a single number a non-technical board member can react to.

The Risk Score is intentionally simple and intentionally honest about its limitations. It's a directional metric, not a precise one. The component weights are not yet locked -- if you disagree with them, email us; we want input from real customers before they harden.

Performance: the score recomputes on every page load (pulls from audit_events, org_policies, org_detectors, custom_policies, org_members in parallel; ~50ms at typical org sizes). No background job, no cache invalidation gotchas.

No new migration. The score is derived from data we already have.

May 24, 2026

API keys now have roles

Every API key now carries a role. The role determines which endpoints the key can call.

Why it matters: until today, a Prova API key was effectively root for your org -- it could ingest, export, gate, manage, everything. That's fine for a solo dev with one key, but it breaks enterprise patterns ("give the SIEM a read-only key for compliance pulls", "the contractor's CI runner shouldn't be able to delete custom policies").

What changed:

New role column on the api_keys table (migration 016). Existing keys default to developer so their behaviour is unchanged.
The "generate key" dialog at /dashboard now offers four roles:
- developer -- read + write everything except billing and members. The standard SDK seat.
- contractor -- read + ingest + gateway-check, no policy / detector edits. For outside consultants.
- security -- read across everything + edit policies + edit detectors + run gateway-checks. No ingest from this key.
- audit -- read + export only. No ingest, no gateway-check, no edits. The SIEM seat.
Public endpoints now check the key's role:
- POST /api/v1/audit/ingest requires audit.ingest
- GET /api/v1/audit/export requires audit.export (accepts either a dashboard session or an API key)
- POST /api/v1/gateway/check requires gateway.check
Missing-permission responses return HTTP 403 with a structured body: error: "forbidden", plus a human-readable explanation in detail, so a misconfigured key fails loudly and obviously.
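
The role check on those three endpoints can be pictured as a lookup followed by a structured 403. The sketch below is illustrative only: the matrix covers just the three permissions named above, whether the developer role holds audit.export is a guess from "read + write everything", and the function name and detail wording are hypothetical.

```python
# Partial, assumed matrix for the three endpoint permissions named above.
ROLE_PERMISSIONS = {
    "developer": {"audit.ingest", "audit.export", "gateway.check"},
    "contractor": {"audit.ingest", "gateway.check"},
    "security": {"gateway.check"},
    "audit": {"audit.export"},
}


def authorize(role, permission):
    """Return (status, body); body is None when the call is allowed."""
    if permission in ROLE_PERMISSIONS.get(role, set()):
        return 200, None
    return 403, {
        "error": "forbidden",
        "detail": f"role '{role}' lacks permission '{permission}'",
    }


print(authorize("audit", "audit.ingest"))
print(authorize("contractor", "gateway.check"))
```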

Migration note: supabase db push applies 016. The default is intentionally permissive (developer) so no existing integration breaks. Tighten by issuing new keys at the right scope and rotating off the old ones.

Roadmap implication: the role model is now end-to-end, dashboard + API. The next enterprise-readiness move is wiring SAML/OIDC SSO via Supabase Auth Pro, but that's gated on the first signed Enterprise contract (for which SSO is itself a prerequisite, so the timing needs coordinating).

May 23, 2026

Role permissions now enforced everywhere in the dashboard

Yesterday we shipped the role model. Today we wired it into the existing dashboard pages and server actions, so the permissions you assign at /dashboard/members actually do something.

What's now permission-gated:

Surface	Required permission
`/dashboard/audit` + `/dashboard/audit/[id]`	`audit.read`
`/dashboard/policies`	`policy.read`
`/dashboard/policies/new` + `/dashboard/policies/edit/[id]`	`policy.write_custom`
`/dashboard/detectors`	`detector.read`
`/dashboard/billing`	`org.update_billing`
`/dashboard/members`	`org.read_members`
Toggling a built-in policy	`policy.toggle_builtin`
Authoring or editing a custom policy	`policy.write_custom`
Deleting a custom policy	`policy.delete_custom`
Toggling a detector	`detector.toggle`
Creating an API key	`apikey.create`
Revoking an API key	`apikey.revoke`

Two new helpers in lib/auth/guard.ts:

ensurePermission(permission, redirectUrl) -- for server components. Redirects to /login if not authenticated, or /dashboard/forbidden?need=<permission> if the role doesn't grant it.
requirePermissionAction(permission) -- for server actions. Throws so the client sees a structured error.

Visitors who hit a page they can't access land on a clean 403 page that explains which role they have, which permission they're missing, and links to the role matrix. No more silent crashes or generic auth errors.

Caveat for early customers: API endpoints (/api/v1/audit/ingest, /api/v1/gateway/check) still treat any valid API key as full-access. Binding API keys to roles is the next sprint; until then, treat your Prova API keys with the same care as production secrets.

No schema or migration changes in this release -- all the infrastructure was already in place from yesterday's RBAC sprint.

May 22, 2026

RBAC, member invites, and a roadmap to SSO

Prova is no longer a single-seat product. Today's release adds the role + invite infrastructure that any enterprise pilot expects.

Five roles per org, each with a documented permission matrix:

Owner -- full access, billing, member management. Every org has at least one.
Developer -- day-to-day product use. Edits policies + detectors + API keys, no billing or member control.
Security -- read across everything plus the ability to toggle policies and detectors. For the team that owns AI posture but doesn't ship the code.
Audit -- read + export receipts. The seat you hand to your external auditor.
Contractor -- read everything; can create + revoke API keys; cannot edit policies. Scoped dev access for outside consultants.

Member management ships at /dashboard/members:

Invite a teammate by email with a chosen role. We mint a single-use, 14-day link.
Change a member's role or remove them entirely.
Revoke a pending invite instantly.

Invites are link-based, not email-based -- there's no SMTP dependency, and you can drop the link in whatever channel makes sense (Slack, email, Linear comment). When the invitee clicks the link, they sign up or log in; on acceptance, they're added to your org with the role from the invite.
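
A single-use, 14-day link can be modeled roughly as below. This is a sketch under assumptions: the Invite fields and the mint_invite / accept_invite names are hypothetical, and the real storage lives in the org_invites table rather than in memory.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

INVITE_TTL = timedelta(days=14)  # link lifetime stated in the changelog


@dataclass
class Invite:
    token: str
    role: str
    expires_at: datetime
    accepted: bool = False
    revoked: bool = False


def mint_invite(role: str, now: datetime) -> Invite:
    """Create an invite with an unguessable token that expires after 14 days."""
    return Invite(token=secrets.token_urlsafe(32), role=role,
                  expires_at=now + INVITE_TTL)


def accept_invite(invite: Invite, now: datetime) -> Optional[str]:
    """Return the role to grant, or None if the link can no longer be used."""
    if invite.accepted or invite.revoked or now >= invite.expires_at:
        return None
    invite.accepted = True  # single-use: a second click is rejected
    return invite.role
```

Passing the clock in as an argument keeps the expiry logic testable without waiting two weeks.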

Documentation at /docs/access-control with the full permission matrix, the ownership transfer flow, and the SSO roadmap.

SSO: SAML and OIDC via Supabase Auth Pro are available on the Enterprise tier. The integration itself isn't auto-provisioned -- when an Enterprise contract signs, we configure your IdP (Okta, Azure AD, Google Workspace, OneLogin, JumpCloud) in a 30-minute call with your IdP admin. Just-in-time provisioning + group-to-role mapping are supported.

Migration: supabase/migrations/015_org_members.sql creates org_members + org_invites and backfills every existing user as the owner of their own (legacy single-tenant) org. No customer-facing breakage.

What's not in this release: per-page permission enforcement. The role model and the matrix are in code, but the existing dashboard pages still assume the actor is the owner. Walking each surface to add canDo() guards is the next sprint.

May 21, 2026

Audit Vault gets filters, search, and pagination

The receipt browser at /dashboard/audit is now usable at any scale. Previously it showed the latest 50 rows with no filtering -- fine for a brand-new account, painful for anyone with real traffic.

What you can now do:

Filter by kind -- agent_run, model_call, tool_call, agent_step, reasoning_chain
Filter by severity -- info / low / medium / high / critical
Filter by verdict -- caught loop, policy violation, prompt injection, PII leak, bias drift, hallucination
Filter by phase -- gateway-checked (pre-execution) vs ingested (post-hoc). Useful when you want to see only the calls Prova was asked about before they ran.
Time range -- last 24h / 7d / 30d / all time. Default is 7 days.
Free-text search -- string match across the entire signed payload. Useful for finding "every receipt mentioning customer X" or "every call to model Y."
Pagination -- cursor-based, so deep pages don't slow down. The URL holds all filter state, so any view is shareable + bookmarkable.

The stats row at the top of the page now reflects the filtered set, not the full table. Empty state distinguishes "no receipts yet" (new account, links to integration docs) from "no receipts match these filters" (clear-filters CTA).

Implementation note for performance-curious customers: SQL handles kind + time-range + cursor at the index level. Severity / verdict / phase / search filters run in-memory after fetch -- when any of those filters is active, the page fetches a wider batch (up to 200 rows), filters it, and slices the result to the displayed window. This is fine for any reasonable working set; a jsonb GIN index on payload is on the roadmap for when a customer's filtered queries start touching more than 10k rows.
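
The fetch-wide, filter, then slice pattern can be sketched in a few lines of Python. Everything here is illustrative: fetch_page stands in for the indexed SQL step, rows are assumed to be sorted newest-first by an integer id, and list_receipts and its parameters are hypothetical names.

```python
WIDE_FETCH = 200  # upper bound on rows fetched when in-memory filters are active


def fetch_page(rows, cursor, limit):
    """Stand-in for the SQL step: return up to `limit` rows older than the cursor.
    The cursor is the id of the last row the caller has already seen."""
    after = [r for r in rows if cursor is None or r["id"] < cursor]
    return after[:limit]


def list_receipts(rows, page_size, cursor=None, predicate=None):
    """Return (visible_rows, next_cursor). Assumes page_size <= WIDE_FETCH."""
    limit = WIDE_FETCH if predicate else page_size
    fetched = fetch_page(rows, cursor, limit)
    matched = [r for r in fetched if predicate(r)] if predicate else fetched
    visible = matched[:page_size]
    if len(matched) > page_size:
        next_cursor = visible[-1]["id"]   # more matches remain in this batch
    elif len(fetched) == limit:
        next_cursor = fetched[-1]["id"]   # batch was full, older rows may exist
    else:
        next_cursor = None                # reached the end of the table
    return visible, next_cursor
```

Because the cursor is a row id rather than an offset, page 500 costs the same as page 1.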

Where: live at /dashboard/audit. No migration needed.

May 20, 2026

Custom policy editor -- author your own rules without emailing us

Until today, customer-authored policies required sending JSON to founders@prova.cobound.dev and waiting for it to be loaded into your org. That ends today.

/dashboard/policies/new opens a JSON-DSL editor pre-loaded with template policies you can clone and tweak:

Long agent run in production -- alert when a production agent_run takes more than 25 steps
Block specific models in production -- refuse production traffic to deprecated model names
Per-call cost cap -- alert when a single invocation costs more than $10
Restrict an action to known apps -- block tool_call to refund_customer unless source.app_id is support-bot

Submit a policy and the validator (lib/policies/validate.ts) confirms that the predicate parses and any regex compiles; the policy is then wired into your ingest + gateway pipelines immediately. Custom policies show up at the top of the policy dashboard with the same enable/disable + edit + delete affordances as the built-ins.

The validator rejects bad predicates at edit time, not at evaluation time -- so customer-authored policies can't break the ingest pipeline downstream. Validation errors come back as a structured list of paths and messages so you can fix one at a time.

Operator coverage: and, or, not, eq, neq, gt, gte, lt, lte, contains, contains_ci, matches, matches_ci, in, exists, missing. Predicate trees can nest up to 8 levels.
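
For a feel of how a predicate tree over these operators evaluates, here is a small Python sketch. The JSON shape is an assumption (the keys op, args, arg, path, and value are hypothetical, as is the payload.tool_name field); consult /docs/policies for the real DSL. The choice that value operators never match an absent field is also an assumption.

```python
import operator
import re
from typing import Any

MAX_DEPTH = 8        # nesting limit stated in the changelog
_MISSING = object()  # sentinel for "field path not present"

COMPARE = {
    "eq": operator.eq, "neq": operator.ne,
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
}


def lookup(event: dict, path: str) -> Any:
    """Resolve a dotted path such as "source.app_id"; _MISSING if absent."""
    node: Any = event
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def evaluate(pred: dict, event: dict, depth: int = 1) -> bool:
    """Evaluate a predicate tree against an event."""
    if depth > MAX_DEPTH:
        raise ValueError("predicate nests deeper than 8 levels")
    op = pred["op"]
    if op == "and":
        return all(evaluate(p, event, depth + 1) for p in pred["args"])
    if op == "or":
        return any(evaluate(p, event, depth + 1) for p in pred["args"])
    if op == "not":
        return not evaluate(pred["arg"], event, depth + 1)

    actual = lookup(event, pred["path"])
    if op == "exists":
        return actual is not _MISSING
    if op == "missing":
        return actual is _MISSING
    if actual is _MISSING:
        return False  # assumption: value operators never match an absent field
    expected = pred["value"]
    if op in COMPARE:
        return COMPARE[op](actual, expected)
    if op == "contains":
        return expected in actual
    if op == "contains_ci":
        return str(expected).lower() in str(actual).lower()
    if op == "matches":
        return re.search(expected, str(actual)) is not None
    if op == "matches_ci":
        return re.search(expected, str(actual), re.IGNORECASE) is not None
    if op == "in":
        return actual in expected
    raise ValueError(f"unknown operator: {op}")
```

Under this shape, the "restrict an action to known apps" template would be an and over two eq checks plus a not around the source.app_id match.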

Where: list at /dashboard/policies, editor at /dashboard/policies/new, docs at /docs/policies. Migration: supabase/migrations/014_custom_policies.sql.

The detector-side editor is intentionally not in this release. Detectors are algorithms, not predicates, and the right execution model (sandboxed JS, wasm, or a managed plugin marketplace) is still an open question. For now the SDK path for custom detectors remains "email us and we'll review."

May 19, 2026

Gateway check -- block bad calls before they execute

Up until now, Prova's policies and detectors observed AI decisions after they happened. Today they can also stop them before they happen.

POST /api/v1/gateway/check is the new endpoint. Send the same AIDecisionEvent shape you'd send to ingest, but BEFORE you make the model call. The response includes an explicit action:

allow -- no policy matched. Proceed.
alert -- a policy matched but its action is alert. Proceed; the receipt + alert path handles it.
block -- at least one enabled policy returned action=block. Don't proceed. Return the findings to your upstream caller.

Both allowed and blocked decisions are persisted to your Audit Vault with a _prova_gateway marker so the audit trail captures every attempt, not just every execution. Latency budget for the allow path is ~80ms end-to-end.
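
The precedence between the three actions can be written down in a few lines. This Python sketch assumes each finding is a dict with an "action" key and that only enabled policies produce findings; resolve_action is a hypothetical name, not the server's function.

```python
from typing import Dict, List


def resolve_action(findings: List[Dict[str, str]]) -> str:
    """Collapse per-policy findings into one gateway action.
    block outranks alert, and alert outranks allow."""
    actions = {f.get("action") for f in findings}
    if "block" in actions:
        return "block"
    if "alert" in actions:
        return "alert"
    return "allow"
```

A single blocking policy is enough to stop the call, no matter how many alert-only policies also matched.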

Drop-in pattern:

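import os
import openai
import requests

class PolicyBlocked(Exception):
    """Raised when the gateway returns action=block."""

check = requests.post(
    "https://prova.cobound.dev/api/v1/gateway/check",
    headers={"Authorization": f"Bearer {os.environ['PROVA_API_KEY']}"},
    # ...plus any other AIDecisionEvent fields you already send to ingest
    json={"kind": "model_call", "payload": {"messages": messages}},
    timeout=2.0,
).json()
if check["action"] == "block":
    raise PolicyBlocked(check["findings"])
response = openai.chat.completions.create(...)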

The Audit Vault dashboard now shows a "Gateway checks" stat with the count of blocks. Tune which policies block (vs. alert) from the policy dashboard; start in alert-only mode for a week to baseline, then graduate the data-protection ones to block.

Docs at /docs/gateway-check. This is the enforcement complement to the Audit Vault's observation. Secrets, budget caps, and boundaries block at the edge by default; raise the prompt-injection or PII policies to block and they enforce at the edge too.

May 18, 2026

Detector plugins -- 5 in the catalog, 3 active today

The detector catalog is live. Five first-party detectors are now registered, with three running today and two in preview.

Active inline detectors (run at ingest, attach findings to the signed receipt):

prompt_injection -- pattern-based detector with 11 attack templates and a confidence score. Catches override-prior-instructions, persona-swap, exfil-system-prompt, named-jailbreak patterns (DAN/STAN/DUDE), chat-template injection, and developer-mode tricks. Fires at score >= 3.
pii_leak -- scans model outputs for SSN, credit card, US passport, MRN, ICD-10 codes, and email addresses. Complements the input-side pii_in_prompt policy.
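
A pattern-plus-score detector of this kind can be sketched as follows. The templates, regexes, and weights below are invented for illustration and are not the 11 shipped templates; only the fire-at-3 threshold comes from the entry above.

```python
import re

# Hypothetical templates: (name, pattern, weight). Not the shipped catalog.
TEMPLATES = [
    ("override_prior_instructions",
     re.compile(r"ignore (all )?(previous|prior) instructions", re.I), 3),
    ("persona_swap", re.compile(r"\byou are now\b", re.I), 2),
    ("exfil_system_prompt",
     re.compile(r"(reveal|print|show).{0,40}system prompt", re.I), 3),
    ("named_jailbreak", re.compile(r"\b(DAN|STAN|DUDE)\b"), 2),
    ("developer_mode", re.compile(r"developer mode", re.I), 1),
]
FIRE_AT = 3  # threshold stated in the changelog


def score_prompt(text: str) -> dict:
    """Sum the weights of every matching template and compare to the threshold."""
    hits = [(name, weight) for name, pattern, weight in TEMPLATES
            if pattern.search(text)]
    score = sum(weight for _, weight in hits)
    return {
        "score": score,
        "fired": score >= FIRE_AT,
        "matched": [name for name, _ in hits],
    }
```

Scoring rather than firing on any single match lets weak signals pass quietly on their own while still adding up when several appear together.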

Active external-API detector:

coordination_loop -- the original Prova detector. Runs in the analyzer service using persistent homology on the agent communication graph.

Preview detectors (registered in the dashboard, algorithm shipping next):

bias_drift -- statistical divergence in decisions across protected groups vs. a baseline window
hallucination -- entailment check against retrieved context for RAG systems

Where to look:

Dashboard: /dashboard/detectors -- toggle each detector on/off, see severity / mode / source.
Docs: /docs/detectors -- how detectors differ from policies, execution modes explained, the full catalog.
API: GET /api/v1/detectors -- public catalog endpoint.
Migration: supabase/migrations/013_org_detectors.sql.

This is the detector-plugin interface from Bet 3 of the year-ahead roadmap. The plugin shape ((event) => DetectorMatch) is the same one custom-authored detectors will use when the SDK ships, so internal classifiers your team has already built can plug in cleanly.

May 17, 2026

Policy Engine v1 -- 16 built-in rules, every receipt evaluated

Every event flowing into the Audit Vault is now evaluated against a built-in policy library before its receipt is signed. Matched policies attach a policy_violation finding to the receipt with severity, action, and remediation guidance.

What ships today:

16 built-in policies across six categories: data protection (PHI, PII, secrets in prompts), safety (prompt-injection patterns, runaway agents, high-impact tool calls without approval), cost (high-cost invocations), compliance (EU data residency, medical decisions without human-in-the-loop), operational (latency, empty completions), and governance (unknown apps, experimental models in production, missing model identity).
Policy dashboard at /dashboard/policies -- one toggle per policy, grouped by category. Disabled policies don't run at all (no overhead, no findings).
Dry-run API at POST /api/v1/policies/evaluate -- send any event, get back the findings it would produce. Use from CI to validate a prompt change won't trip a blocking rule.
Public catalog at GET /api/v1/policies -- list the entire policy library with each policy's current state for your org.
Custom JSON-DSL predicates (the editor is in preview, but the evaluator works today). Supports and / or / not / eq / neq / gt / gte / lt / lte / contains / contains_ci / matches / matches_ci / in / exists / missing against any field path in the event.
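
A CI gate built on the dry-run endpoint might look like the sketch below. Several details are assumptions: that the endpoint lives on the prova.cobound.dev host and accepts the same Bearer auth as the gateway, and that the response carries a "findings" list whose items have an "action" field. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed host; the path comes from the changelog entry above.
EVALUATE_URL = "https://prova.cobound.dev/api/v1/policies/evaluate"


def build_dry_run_request(event: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST that asks which findings an event would produce."""
    return urllib.request.Request(
        EVALUATE_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def blocking_findings(response_body: dict) -> list:
    """Pick out findings whose action is block (assumed response shape)."""
    return [f for f in response_body.get("findings", [])
            if f.get("action") == "block"]


def ci_exit_code(response_body: dict) -> int:
    """Non-zero when the change under test would trip a blocking rule."""
    return 1 if blocking_findings(response_body) else 0
```

In a pipeline you would send the request with urllib.request.urlopen, parse the JSON body, and pass ci_exit_code's result to sys.exit.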

Docs: /docs/policies. Migration: supabase/migrations/012_org_policies.sql.

This is Bet 2 of the year-ahead roadmap landing on schedule. Bet 3 (detector marketplace) is next.

May 16, 2026

Self-hosted Prova preview

For enterprises that can't send data out of their environment -- financial services, healthcare, defense -- Prova now ships as a self-hosted deployment.

Two paths:

Docker Compose for single-node dev/test and small-team production. One file, four containers (web, API, Postgres, MinIO). Configure, generate a signing key, docker compose up.
Helm chart for production Kubernetes. Horizontal scaling, persistent volumes, HSM-backed key management, OpenTelemetry exporter for your existing observability stack.

Both modes support air-gapped install -- mirror the two container images to your internal registry, set PROVA_OFFLINE=1, and Prova runs with zero outbound network calls. Receipts are still signed; auditors can still verify them against your published public key.

Self-hosted is part of the Enterprise plan. See deploy/README.md for the deployment guide, or book a call to walk through architecture.

May 16, 2026

Audit Vault preview

Every AI decision your system makes can now produce a tamper-evident receipt -- not just coordination-loop detections.

We generalized the receipt format that powered coordination-loop catching into a full Audit Vault: a permanent, signed, searchable record of every AI call your enterprise makes. Model invocations, agent runs, tool calls, policy evaluations -- all of them serialize to the same AIDecisionEvent schema, each one signed with our Ed25519 key, each one independently verifiable.

Two ways to feed data in:

POST /api/v1/audit/ingest -- send a webhook from your existing observability stack (LangSmith, Langfuse, Helicone, OpenTelemetry, custom logs). Prova signs it and returns the receipt.
prova.audit(decision) in the SDK -- wrap your agent code once, capture every decision automatically.

The new Audit Vault dashboard surfaces a searchable receipt browser, filters by kind / verdict / app, and a one-click compliance export for EU AI Act Article 12. FDA, SEC, and HIPAA templates are in preview.

This is the foundation for everything else we're building this year -- the policy engine, the detector marketplace, and the AI risk score all sit on top of receipts. Read the docs or book a call.

May 16, 2026

AI Inventory v3 -- active discovery

The AI Inventory now sees integrations before they fire, not just after. Register one explicitly and it appears immediately, flagged dark until its first receipt confirms it is live.

Why: passive discovery -- rebuilding the inventory from receipts -- can only ever show you what has already run. The integration that was wired up, shipped, and has never actually executed in prod is invisible to it by construction. That is the exact integration a CISO wants flagged: dead code that still holds an API key, or a service that is running and silently not reporting.

What you get:

A register endpoint. POST /api/v1/inventory/register declares an integration by its (app_id, environment, framework, provider, model_name) tuple. Same auth and the same write permission as event ingestion (audit.ingest). One registration or an array (max 1000). Idempotent on the tuple -- re-registering refreshes metadata without disturbing the original registration time.
Dark detection. The inventory left-joins registrations against observed receipts. An integration is dark when it is registered but no AI decision has flowed through it in the 30-day window. The dashboard surfaces a Dark count, a banner, a per-row badge, and a discovery filter (all / dark / registered / observed).
Discovery state on every row. The JSON export now carries discovery (observed / registered / both), exercised, and registered_at so an alert pipeline can fire on "registered 7 days ago, still never exercised."
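
The left join behind dark detection can be sketched in Python as below. This is an in-memory illustration with hypothetical names (register, build_inventory) and data shapes; the real join runs against the registered_integrations table and the receipt store, and metadata refresh on re-registration is not modeled.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=30)  # observation window stated in the changelog


def register(registrations: dict, key: tuple, now: datetime) -> None:
    """Idempotent on the tuple: re-registering keeps the original time."""
    registrations.setdefault(key, now)


def build_inventory(registrations: dict, receipts: list, now: datetime) -> list:
    """Join registrations against receipts seen inside the window.

    registrations maps (app_id, environment, framework, provider, model_name)
    to registered_at; receipts is a list of (key, seen_at) pairs.
    """
    cutoff = now - WINDOW
    observed = {key for key, seen_at in receipts if seen_at >= cutoff}
    rows = []
    for key in sorted(set(registrations) | observed):
        registered = key in registrations
        exercised = key in observed
        if registered and exercised:
            discovery = "both"
        elif registered:
            discovery = "registered"
        else:
            discovery = "observed"
        rows.append({
            "key": key,
            "discovery": discovery,
            "exercised": exercised,
            "registered_at": registrations.get(key),
            "dark": registered and not exercised,
        })
    return rows
```

The dashboard's headline figure, registered minus exercised, is then the count of rows where dark is true.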

Why a CISO cares: the gap between what engineering says is wired up and what is actually running is the shadow-AI question. Passive discovery measures one side of it; registration supplies the other. The number that matters -- registered minus exercised -- is now a single figure on the dashboard.

What this is not: a registration is a declaration, not an AI decision. It is deliberately not a signed receipt. The signed audit trail remains exactly the receipts; registration only tells the inventory the integration is supposed to exist.

Migration: 019_registered_integrations.sql adds the registered_integrations table. Absence of the table is non-fatal -- the loader treats a missing table as "nothing registered", so passive discovery keeps working before the migration is applied.

May 15, 2026

New transcript hero on the homepage

We replaced the homepage's 3D agent-graph visualization with a streaming chat transcript that shows four agents working on a real task, falling into a coordination loop, and getting caught by Prova in about ten seconds.

Why: the abstract 3D graph was a beautiful Three.js demo, but it required a developer to interpret what was happening. The transcript shows the failure mode in the way developers already think about it -- a conversation log -- and the loop becomes obvious by the time the third "Pulling APAC_Q3_dataset_v3" message appears.

The transcript also halves the homepage's bundle size: we removed Three.js and the postprocessing chain (~400KB gzipped). Loads instantly, works on mobile, fully accessible to screen readers.

May 14, 2026

How-it-works on the homepage; legacy verifier demoted

Two changes to the homepage hero:

A three-step "How it works" section sits directly under the hero now: install the SDK, run your agents normally, get an alert when a coordination loop forms.
The reasoning-chain verifier (the original Prova product, from before our pivot to multi-agent failure detection) was demoted to a secondary section further down the page, with a clear "Also from Prova" label and a proper display-style header.

The site has been carrying the legacy verifier as the main pitch for months. It's still a useful product; it just isn't the headline anymore. The homepage now says clearly: Prova catches coordination loops in your agent system. The verifier is the bonus.

May 13, 2026

Native Stripe billing; book-a-call replaces the contact form

The Team plan is now self-serve. Click "subscribe" on the pricing page and you'll go straight to a Stripe Checkout session -- no contact form, no waiting for a sales email. The dashboard has a new billing page where you can manage your subscription via the Stripe Customer Portal.

For the Enterprise plan, we replaced the contact form with a proper book a call page. Pick a time on a calendar, talk to a founder for thirty minutes. Same experience as every other modern developer-tools company. Way less friction than the old "request access" flow.

May 10, 2026

Design system cleanup on vertical and competitive pages

The /for/agents, /for/healthcare, /for/financial-services, and all /vs/<competitor> pages have been migrated off raw Tailwind colors and onto the site's design tokens. Buttons, typography, and section spacing now match the rest of the site.

Practical effect: if you bounce between the homepage and one of those landing pages, the site reads as one coherent product instead of two. Small change, large credibility lift -- visual inconsistency is the loudest "early-stage" signal even for visitors who can't articulate why.

May 8, 2026

ProvaTracer SDK shipped

The first official Prova SDK is out. One line wraps your LangGraph or CrewAI agent system and starts watching for coordination failures in real time.

import prova

with prova.watch(graph):
    graph.invoke(state)

The tracer streams every read and write into Prova's analyzer. When a coordination loop is detected, your configured webhook fires with the exact agents that are stuck, the step at which the loop started, and a tamper-evident receipt.

Self-serve users can now generate API keys from the dashboard. The free tier allows two active keys; the Team plan removes that limit.

Read the blog →