AI Risk Score

The board-deck number, explained.

The AI Risk Score is a single 0-100 number that summarizes your AI governance posture. It's designed for the slide that goes to the board every quarter, and for the conversation that follows when the number moves.

Why a score

"Are we doing enough on AI governance?" is a question no current tool answers numerically. You get logs. You get traces. You get individual policy hits. None of those roll up to a single number a non-technical board member can react to.

The Risk Score collapses five largely independent signals into one number. It's not precise. It's directionally honest and improvable. If you move it from 47 to 78 between Q1 and Q2, that fact, together with the underlying component deltas, is the story.

The five components

Each component is 0-20 under the default uniform weighting. Total is 0-100. Higher is better posture. The per-component ceilings change when you re-weight the score; the 0-100 total never does.

Audit Coverage (0-20)

How much of your AI decision flow is recorded as tamper-evident receipts. The score scales linearly with the number of receipts in the 30-day window and reaches the maximum at 1,000 receipts. A pre-revenue project with 5 receipts will score 0; a production system with 5,000 will hit the cap with room to spare.
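A minimal sketch of that linear ramp, for intuition only. The function and constant names are hypothetical, and the rounding rule (half up to a whole point) is an assumption inferred from the "5 receipts will score 0" example, not a documented behavior.

```python
import math

SATURATION_RECEIPTS = 1_000  # from the text: maximum reached at 1,000 receipts
MAX_POINTS = 20              # component ceiling under the default uniform weighting


def audit_coverage_score(receipts_30d: int) -> int:
    """Linear ramp up to MAX_POINTS at SATURATION_RECEIPTS, flat afterwards."""
    if receipts_30d < 0:
        raise ValueError("receipt count cannot be negative")
    fraction = min(receipts_30d / SATURATION_RECEIPTS, 1.0)
    # Assumption: the displayed score is rounded half up to a whole point.
    return math.floor(fraction * MAX_POINTS + 0.5)


print(audit_coverage_score(5))      # 0.1 raw points, displayed as 0
print(audit_coverage_score(500))    # halfway up the ramp
print(audit_coverage_score(5_000))  # past saturation, capped at MAX_POINTS
```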

Improve by

  • Wire POST /api/v1/audit/ingest into your AI traffic.
  • Use the SDK's prova.audit() on every model call.
  • Mirror existing observability streams (LangSmith / Langfuse / Helicone) into Prova.

Detector Breadth (0-20)

What fraction of Prova's five first-party detectors are enabled. Detectors run inline as receipts arrive. Coordination loops, prompt injection, and PII leak are on by default; bias drift and hallucination are opt-in (off by default) and sampled rather than run on every receipt, so enabling them is a deliberate posture choice the score rewards.
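As a worked example, the fraction can be sketched like this. The detector identifiers are hypothetical labels for the five detectors named above, and treating every detector as worth an equal share of the 20 points is an assumption.

```python
from collections.abc import Iterable

# Hypothetical identifiers for the five first-party detectors named in the text.
DETECTORS = (
    "coordination_loops",
    "prompt_injection",
    "pii_leak",
    "bias_drift",
    "hallucination",
)
DEFAULT_ON = ("coordination_loops", "prompt_injection", "pii_leak")


def detector_breadth_score(enabled: Iterable[str], max_points: int = 20) -> float:
    """Share of the five detectors that are enabled, scaled to max_points."""
    enabled_set = set(enabled)
    unknown = enabled_set - set(DETECTORS)
    if unknown:
        raise ValueError(f"unknown detectors: {sorted(unknown)}")
    return max_points * len(enabled_set) / len(DETECTORS)


print(detector_breadth_score(DEFAULT_ON))  # three of five enabled
print(detector_breadth_score(DETECTORS))   # all five enabled
```

Under these assumptions an org that never touches the defaults sits at 12 of 20, and each opt-in detector is worth 4 points.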

Improve by

  • Enable all five detectors at /dashboard/detectors, including the opt-in bias drift and hallucination detectors.

Policy Coverage (0-20)

Breadth of governance rules. Up to 14 of the 20 points come from enabled built-in policies (16 in the library); the remaining 6 points come from customer-authored custom policies (2 points per policy, counted for at most 3 policies). Custom policies earn more per policy because they encode org-specific knowledge.
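The arithmetic can be sketched as follows. The names are hypothetical, and the sketch assumes the 14 built-in points scale linearly with the fraction of the 16 library policies that are enabled, which the text implies but does not state.

```python
BUILTIN_LIBRARY_SIZE = 16  # built-in policies in the library
BUILTIN_POINTS = 14        # points available from built-in policies
POINTS_PER_CUSTOM = 2      # points per custom policy
CUSTOM_POLICY_CAP = 3      # only the first three custom policies count


def policy_coverage_score(builtin_enabled: int, custom_count: int) -> float:
    """Built-in share (assumed linear) plus capped custom-policy points."""
    if not 0 <= builtin_enabled <= BUILTIN_LIBRARY_SIZE:
        raise ValueError("builtin_enabled must be between 0 and 16")
    if custom_count < 0:
        raise ValueError("custom_count cannot be negative")
    builtin = BUILTIN_POINTS * builtin_enabled / BUILTIN_LIBRARY_SIZE
    custom = POINTS_PER_CUSTOM * min(custom_count, CUSTOM_POLICY_CAP)
    return builtin + custom


print(policy_coverage_score(8, 1))   # half the library plus one custom policy
print(policy_coverage_score(16, 5))  # saturated: extra custom policies add nothing
```

On these numbers a built-in policy is worth 14 / 16 = 0.875 points and a custom policy is worth 2, which is the per-policy premium described above.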

Improve by

  • Toggle on the built-in policies appropriate to your workload.
  • Author 3 custom policies that encode your specific risks (the highest-leverage move once built-ins are saturated).

Enforcement Rate (0-20)

Fraction of receipts that came through the pre-execution gateway check (/api/v1/gateway/check) rather than post-hoc ingestion. Post-hoc ingestion leaves a paper trail but lets bad calls through; the gateway check actually blocks them. Under the default weighting, enforcement counts 1:1 with audit coverage because stopping a bad call on the way in is the higher-trust mode.
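A small sketch of the ratio, under stated assumptions: the function name is hypothetical, the score is taken to be the gateway-checked share of all receipts scaled to 20, and an org with no receipts at all is assumed to score 0.

```python
def enforcement_rate_score(
    gateway_checked: int, post_hoc: int, max_points: int = 20
) -> float:
    """Share of receipts that passed through gateway-check, scaled to max_points."""
    if gateway_checked < 0 or post_hoc < 0:
        raise ValueError("receipt counts cannot be negative")
    total = gateway_checked + post_hoc
    if total == 0:
        return 0.0  # assumption: no receipts means nothing was enforced
    return max_points * gateway_checked / total


print(enforcement_rate_score(750, 250))  # three quarters of traffic is gated
print(enforcement_rate_score(0, 1_000))  # ingest-only: audited but not enforced
```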

Improve by

  • Replace ingest-only callers with gateway-check before the model call.
  • See /docs/gateway-check for the drop-in code pattern.

Compliance Readiness (0-20)

A five-item checklist, weighted 6 / 4 / 4 / 3 / 3 in the order listed:

  • Receipt signing key is persistent, not ephemeral (6 points).
  • 30 days of audit history retained (4 points).
  • EU AI Act export available (4 points).
  • Role hygiene: more than one role in use, or the org is single-seat (3 points).
  • At least one custom policy in use (3 points).
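The checklist reduces to a weighted sum. In this sketch the item keys are hypothetical, and the weights are assumed to apply to the items in the order they are listed.

```python
# Hypothetical keys; weights assumed to follow the order of the checklist.
CHECKLIST_POINTS = {
    "persistent_signing_key": 6,
    "audit_history_30d": 4,
    "eu_ai_act_export": 4,
    "role_hygiene": 3,
    "custom_policy_in_use": 3,
}


def compliance_readiness_score(passed: set[str]) -> int:
    """Sum the points of every checklist item that is satisfied."""
    unknown = passed - CHECKLIST_POINTS.keys()
    if unknown:
        raise ValueError(f"unknown checklist items: {sorted(unknown)}")
    return sum(points for item, points in CHECKLIST_POINTS.items() if item in passed)


# An org that does everything except persist its signing key.
without_key = set(CHECKLIST_POINTS) - {"persistent_signing_key"}
print(compliance_readiness_score(without_key))
```

The signing key is the single largest item, so an ephemeral key costs more than any other miss.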

Improve by

  • Set PROVA_SIGNING_KEY_PATH (or _PEM) in production so receipts verify across deploys.
  • Use the role system in /dashboard/members.
  • Author at least one custom policy.

Grade letters

A (90-100): Production-ready for regulated workloads. The auditor will have a short call, not a long one.

B (75-89): Solid foundation. Address the top 1-2 remediations to reach A.

C (60-74): Acceptable for internal use, but lacks the depth an external auditor expects. Focus on enforcement and policy coverage.

D (40-59): Significant gaps. Not appropriate for regulated workloads until the bottom three components are addressed.

F (0-39): Critical gaps. AI traffic is largely unmonitored.
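The bands translate directly into a lookup. This is a sketch with hypothetical names; it assumes the score is a whole number, since the bands leave no room for values such as 89.5.

```python
# Lower bound of each band, highest first.
GRADE_FLOORS = ((90, "A"), (75, "B"), (60, "C"), (40, "D"), (0, "F"))


def grade(score: int) -> str:
    """Map a 0-100 Risk Score to its letter grade."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, letter in GRADE_FLOORS:
        if score >= floor:
            return letter
    raise AssertionError("unreachable: the last floor is 0")


# The 47 -> 78 move used as an example earlier is a D-to-B jump.
print(grade(47), grade(78))
```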

Custom weighting

Uniform weighting is the wrong default for a regulated industry. A payments company cares more that a bad call was stopped on the way in (Enforcement Rate) than that every detector is on. A hospital cares more about signing-key persistence and retention (Compliance Readiness) than raw receipt volume. So the weighting is per-org, editable at /dashboard/risk/config.

You enter any positive number per component. Prova normalizes them to integer weights summing to exactly 100, so the per-component ceiling moves but the 0-100 total (and therefore the A/B/C/D/F thresholds) never does. Only the relative emphasis changes. Three starting profiles ship (fintech, healthcare, EU AI Act); each is a starting point you can then tune.
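One way to get integer weights that sum to exactly 100 is the largest-remainder method, sketched below. This is an assumption about how the normalization could work, not a description of Prova's implementation; the function names, component keys, and tie-breaking order are all hypothetical.

```python
import math


def normalize_weights(raw: dict[str, float]) -> dict[str, int]:
    """Scale positive numbers to integer weights summing to exactly 100."""
    if not raw or any(value <= 0 for value in raw.values()):
        raise ValueError("every component needs a positive number")
    total = sum(raw.values())
    exact = {name: 100 * value / total for name, value in raw.items()}
    weights = {name: math.floor(share) for name, share in exact.items()}
    leftover = 100 - sum(weights.values())
    # Hand the leftover points to the largest fractional parts.
    # Ties keep input order because Python's sort is stable.
    by_remainder = sorted(raw, key=lambda n: exact[n] - weights[n], reverse=True)
    for name in by_remainder[:leftover]:
        weights[name] += 1
    return weights


def total_score(weights: dict[str, int], earned: dict[str, float]) -> float:
    """Each weight is that component's ceiling; earned is its 0.0-1.0 share."""
    return sum(weights[name] * earned[name] for name in weights)


raw = {"audit": 1, "detectors": 1, "policy": 1, "enforcement": 3, "compliance": 2}
weights = normalize_weights(raw)
print(weights, sum(weights.values()))
print(total_score(weights, {name: 1.0 for name in weights}))
```

Because the weights always sum to 100, a perfect posture still totals 100 under any weighting, which is why the grade thresholds do not need to move.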

The active weighting is stamped into the signed quarterly export alongside the score. An auditor reviewing last quarter's board number can see exactly how it was weighted and verify (via the same Ed25519 signature that protects the receipts) that the weighting was not retuned after the fact to flatter the number.

The quarterly export

The score for any quarter is available at GET /api/v1/risk/quarterly?q=2026-Q2. The response is signed with the same Ed25519 key as audit receipts, so a regulator can verify that the score the board saw last May was the score the system actually produced.

Use it from a quarterly PDF generator, a Slack workflow that posts the score every Monday morning, or as a cell in a board spreadsheet that auto-refreshes. The endpoint accepts either a dashboard session or an API key with the audit.export permission.
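A sketch of building that request from a script, using only the Python standard library. The base URL is a placeholder, and sending the API key as a bearer token in an Authorization header is an assumption; check how your deployment expects keys to be presented.

```python
import datetime as dt
import urllib.parse
import urllib.request


def quarter_label(day: dt.date) -> str:
    """Format a date as the quarter string used in the q parameter."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def build_quarterly_request(
    base_url: str, api_key: str, quarter: str
) -> urllib.request.Request:
    """Prepare (but do not send) the GET for one quarter's score."""
    query = urllib.parse.urlencode({"q": quarter})
    url = f"{base_url.rstrip('/')}/api/v1/risk/quarterly?{query}"
    # Assumption: the API key is accepted as a bearer token.
    headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(url, headers=headers, method="GET")


request = build_quarterly_request(
    "https://prova.example",  # placeholder host
    "example-api-key",        # placeholder key with the audit.export permission
    quarter_label(dt.date(2026, 5, 15)),
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen sends it. The response fields and the signature encoding are not described here, so the sketch stops before parsing or verification.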

Caveats we'll fix

  • Audit Coverage uses a fixed saturation point of 1,000 receipts/month. Different orgs have wildly different traffic levels, so this is the weakest component. The right denominator is "your historical baseline", which we'll add once we have enough orgs to learn it from.