Incident Review Template for AI Failures

Resources & Utilities • ~7–8 min read • Updated May 7, 2025

Context

AI systems fail in ways classic software doesn’t: probabilistic outputs, distribution shifts, vendor changes, and hidden policy interactions. Postmortems must therefore go beyond root-cause analysis to cover model and data behavior, guardrail efficacy, and decision impact. This template is blameless, lightweight, and designed to produce clear control changes.

Core Template

  1. Header & Triage
    • Incident ID: YYYYMMDD-domain-sequence
    • Severity: S1–S4, rated on customer harm, regulatory exposure, financial impact, and operational disruption
    • Discovery: Monitoring alert, user report, internal review, audit
    • Timeframe: First observed → contained → resolved
  2. Context Snapshot
    • Use Case: Purpose, decision stakes, human-in-the-loop (HITL) gates
    • Model & Version: Base or fine-tuned model, prompt pack version, policies
    • Data Inputs: Retrieval sources, freshness, PII/PHI handling
  3. Impact Summary
    • Affected: Users, transactions, processes
    • Blast Radius: Systems, teams, customers
    • Measured Impact: Cost, latency, error rate, SLA breaches
  4. Failure Characterization
    • Error Type: Hallucination, retrieval miss, policy misfire, routing error, drift, jailbreak, data leak
    • Repro Steps: Minimal prompt/input to reproduce
    • Signals: Logs, eval results, human edits/overrides, anomaly alerts
  5. Contributing Factors
    • Model: Temperature, context length, update cadence
    • Retrieval: Index coverage, chunking, ranking config
    • Guardrails: Filter policy gaps, prompt hardening, HITL placement
    • Ops: Caching, timeouts, dependency health, vendor change
  6. Remediations & Control Changes
    • Immediate Fix: Patch applied; owner; ETA
    • Follow-up Tasks: Tracked as tickets with SLOs
    • Control Map: Which guardrails / tests / monitors were added or tightened
  7. Verification & Close
    • Regression Tests: Added to golden/eval sets
    • Post-Fix Metrics: Error rate, latency, override rate vs. baseline
    • Decision: Close / watchlist / hold for next release
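
Taken together, the seven sections above map onto a single structured record. The sketch below is one possible encoding, assuming Python dataclasses; the field names and inline examples are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AIIncident:
    """Minimal incident record mirroring the template sections; field names are illustrative."""
    # 1. Header & Triage
    incident_id: str                      # YYYYMMDD-domain-sequence, e.g. "20250507-billing-003"
    severity: str                         # "S1".."S4"
    discovery: str                        # "monitoring_alert" | "user_report" | "internal_review" | "audit"
    first_observed: str                   # ISO 8601 timestamps
    contained: Optional[str] = None
    resolved: Optional[str] = None
    # 2. Context Snapshot
    use_case: str = ""                    # purpose, decision stakes, HITL gates
    model_version: str = ""               # base/fine-tuned model + prompt pack version + policies
    data_inputs: List[str] = field(default_factory=list)   # retrieval sources, freshness, PII/PHI notes
    # 3. Impact Summary
    affected: str = ""
    blast_radius: str = ""
    measured_impact: dict = field(default_factory=dict)    # e.g. {"error_rate": 0.04, "sla_breaches": 2}
    # 4. Failure Characterization
    error_type: str = ""                  # hallucination, retrieval miss, policy misfire, ...
    repro_steps: str = ""                 # minimal prompt/input that reproduces the failure
    signals: List[str] = field(default_factory=list)        # logs, eval results, overrides, alerts
    # 5. Contributing Factors
    contributing_factors: List[str] = field(default_factory=list)
    # 6. Remediations & Control Changes
    immediate_fix: str = ""
    follow_up_tickets: List[str] = field(default_factory=list)
    control_changes: List[str] = field(default_factory=list)
    # 7. Verification & Close
    regression_tests: List[str] = field(default_factory=list)
    post_fix_metrics: dict = field(default_factory=dict)
    decision: str = "open"                # "close" | "watchlist" | "hold"
```

Keeping the record machine-readable (as a form backing store or as ticket custom fields) makes the monthly pattern review described in the actions below much cheaper.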

Recommended Actions

  1. Adopt a Severity Scale & Error Taxonomy: Standardize S1–S4 levels and error types across teams.
  2. Wire to Tooling: Create a simple form (or doc template) & link it to your ticketing system.
  3. Golden Set First: Every incident adds at least one test to eval/golden suites (see the test sketch after this list).
  4. Control Registry: Maintain a living list of guardrails, tests, and monitors with owners and SLOs.
  5. Monthly Review: Summarize incidents, patterns, and control effectiveness for governance.
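
For action 3, here is a minimal sketch of an incident-derived golden test, assuming pytest and a hypothetical run_pipeline() entry point for your system; the prompt and expected strings are illustrative stand-ins for the minimal repro captured in section 4 of the review.

```python
# test_golden_incidents.py - run in CI and before every release.
# `run_pipeline` is a hypothetical entry point; swap in your own harness and assertions.
import pytest
from my_ai_app import run_pipeline  # hypothetical import

# One entry per incident: the minimal repro plus the behavior expected after the fix.
GOLDEN_CASES = [
    {
        "incident_id": "20250507-billing-003",             # illustrative ID
        "prompt": "What is the refund window for plan X?",
        "must_contain": "30 days",                          # grounded answer expected post-fix
        "must_not_contain": "lifetime refund",              # the hallucinated claim from the incident
    },
]

@pytest.mark.parametrize("case", GOLDEN_CASES, ids=lambda c: c["incident_id"])
def test_incident_regression(case):
    answer = run_pipeline(case["prompt"])
    assert case["must_contain"].lower() in answer.lower()
    assert case["must_not_contain"].lower() not in answer.lower()
```

Substring checks are deliberately crude; if your eval suite already supports graded judges or structured assertions, register the case there instead and reference it from the incident's Verification & Close section.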

Common Pitfalls

  • Blaming People: Focus on systems & controls; assume good intent.
  • One-Off Fixes: Patching the symptom without strengthening guardrails or tests.
  • No Repro: Closing incidents without a minimal reproducible case.
  • Unowned Actions: Remediations without named owners and dates.

Quick Win Checklist

  • Publish the template (copyable doc or form) with example incidents.
  • Define the S1–S4 levels and the 6–8 error types your org will use (a starter taxonomy sketch follows).
  • Require one golden/eval test per incident before closing.
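
For the second checklist item, one lightweight option is to encode the taxonomy where evals, dashboards, and the incident form can all import it. A minimal sketch, assuming Python enums; the severity labels are placeholders for thresholds your org defines, and the error types mirror the template's failure characterization list.

```python
from enum import Enum

class Severity(Enum):
    # Placeholder labels: define each level against your org's thresholds for
    # customer harm, regulatory exposure, financial impact, and operational disruption.
    S1 = "critical"
    S2 = "high"
    S3 = "moderate"
    S4 = "low"

class ErrorType(Enum):
    # The error types named in the template's Failure Characterization section.
    HALLUCINATION = "hallucination"
    RETRIEVAL_MISS = "retrieval_miss"
    POLICY_MISFIRE = "policy_misfire"
    ROUTING_ERROR = "routing_error"
    DRIFT = "drift"
    JAILBREAK = "jailbreak"
    DATA_LEAK = "data_leak"
```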

Closing

Great AI teams turn incidents into leverage. A blameless, structured review process—connected to evals, guardrails, and monitoring—reduces repeat failures and steadily raises quality without slowing delivery.