Incident Review Template for AI Failures
Resources & Utilities • ~7–8 min read • Updated May 7, 2025
Context
AI systems fail in ways classic software doesn’t: probabilistic outputs, distribution shifts, vendor changes, and hidden policy interactions. Postmortems must therefore go beyond root-cause analysis to cover model and data behavior, guardrail efficacy, and decision impact. This template is blameless, lightweight, and designed to produce clear control changes.
Core Template
- Header & Triage
  - Incident ID: YYYYMMDD-domain-sequence
  - Severity: S1–S4 (customer harm, regulatory exposure, financial impact, operational disruption)
  - Discovery: Monitoring alert, user report, internal review, audit
  - Timeframe: First observed → contained → resolved
- Context Snapshot
  - Use Case: Purpose, decision stakes, HITL gates
  - Model & Version: Base/finetuned model, prompt pack version, policies
  - Data Inputs: Retrieval sources, freshness, PII/PHI handling
- Impact Summary
  - Affected: Users, transactions, processes
  - Blast Radius: Systems, teams, customers
  - Measured Impact: Cost, latency, error rate, SLA breaches
- Failure Characterization
  - Error Type: Hallucination, retrieval miss, policy misfire, routing error, drift, jailbreaking, data leak
  - Repro Steps: Minimal prompt/input to reproduce
  - Signals: Logs, eval results, human edits/overrides, anomaly alerts
- Contributing Factors
  - Model: Temperature, context length, update cadence
  - Retrieval: Index coverage, chunking, ranking config
  - Guardrails: Filter policy gaps, prompt hardening, HITL placement
  - Ops: Caching, timeouts, dependency health, vendor change
- Remediations & Control Changes
  - Immediate Fix: Patch applied; owner; ETA
  - Follow-up Tasks: Tracked as tickets with SLOs
  - Control Map: Which guardrails / tests / monitors were added or tightened
- Verification & Close
  - Regression Tests: Added to golden/eval sets
  - Post-Fix Metrics: Error rate, latency, override rate vs. baseline
  - Decision: Close / watchlist / hold for next release
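For teams that capture incidents as structured records rather than free-form docs, the fields above translate naturally into a small schema. The sketch below is one possible encoding in Python; the names (`Severity`, `ErrorType`, `IncidentRecord`) and field choices are illustrative assumptions, not a prescribed schema — adapt them to your own form or ticketing system.

```python
# A minimal sketch of the template as a structured record.
# All names and field choices here are illustrative, not prescriptive.
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional

class Severity(Enum):
    S1 = "S1"  # severe customer harm / regulatory exposure
    S2 = "S2"
    S3 = "S3"
    S4 = "S4"  # minor operational disruption

class ErrorType(Enum):
    HALLUCINATION = "hallucination"
    RETRIEVAL_MISS = "retrieval_miss"
    POLICY_MISFIRE = "policy_misfire"
    ROUTING_ERROR = "routing_error"
    DRIFT = "drift"
    JAILBREAK = "jailbreak"
    DATA_LEAK = "data_leak"

@dataclass
class Remediation:
    description: str
    owner: str              # named owner (avoids unowned actions)
    due: datetime           # ETA / SLO date

@dataclass
class IncidentRecord:
    incident_id: str                  # e.g. "20250507-billing-003" (YYYYMMDD-domain-sequence)
    severity: Severity
    discovery: str                    # monitoring alert, user report, internal review, audit
    first_observed: datetime
    contained: Optional[datetime] = None
    resolved: Optional[datetime] = None
    # Context snapshot
    use_case: str = ""
    model_version: str = ""           # base/finetuned model + prompt pack version
    data_inputs: str = ""             # retrieval sources, freshness, PII/PHI handling
    # Failure characterization
    error_type: Optional[ErrorType] = None
    repro_steps: str = ""             # minimal prompt/input to reproduce
    # Remediations & control changes
    remediations: list[Remediation] = field(default_factory=list)
    controls_added: list[str] = field(default_factory=list)      # guardrails / tests / monitors
    golden_tests_added: list[str] = field(default_factory=list)  # test IDs added to eval suites
```

Storing incidents this way makes the monthly governance review a query rather than a document hunt, and gives the taxonomy (severities, error types) a single source of truth.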
Recommended Actions
- Adopt Severity & Taxonomy: Standardize S1–S4 and error types across teams.
- Wire to Tooling: Create a simple form (or doc template) & link it to your ticketing system.
- Golden Set First: Every incident adds at least one test to eval/golden suites (see the regression-test sketch after this list).
- Control Registry: Maintain a living list of guardrails, tests, and monitors with owners and SLOs.
- Monthly Review: Summarize incidents, patterns, and control effectiveness for governance.
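"Golden Set First" is easiest to enforce when the minimal repro from each incident becomes a test. Below is a hedged sketch: `run_pipeline` is a stand-in for your real inference entry point, the case data is invented, and pytest is shown only as one possible harness — swap in whatever eval runner you already use.

```python
# Sketch: turning an incident's minimal repro into a golden/regression test.
# `run_pipeline`, the incident ID, and the case contents are all illustrative.
import pytest

def run_pipeline(prompt: str) -> dict:
    """Placeholder for the real model/retrieval pipeline under test."""
    raise NotImplementedError

# Golden case captured from a hypothetical incident's minimal repro.
GOLDEN_CASES = [
    {
        "incident_id": "20250507-billing-003",
        "prompt": "What is the refund policy for plan X?",  # minimal repro input
        "must_not_contain": ["guaranteed refund"],          # hallucinated claim from the incident
        "must_cite_source": True,
    },
]

@pytest.mark.parametrize("case", GOLDEN_CASES, ids=lambda c: c["incident_id"])
def test_incident_regression(case):
    result = run_pipeline(case["prompt"])
    answer = result.get("answer", "")
    for phrase in case["must_not_contain"]:
        assert phrase.lower() not in answer.lower(), (
            f"Regression of {case['incident_id']}: reproduced the original failure"
        )
    if case["must_cite_source"]:
        assert result.get("citations"), "Answer returned without a supporting citation"
```

Keeping the incident ID in the test name means a future failure points straight back to the original review, not just to a red build.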
Common Pitfalls
- Blaming People: Focus on systems & controls; assume good intent.
- One-Off Fixes: Patch without strengthening guardrails or tests.
- No Repro: Closing incidents without a minimal reproducible case.
- Unowned Actions: Remediations without named owners and dates.
Quick Win Checklist
- Publish the template (copyable doc or form) with example incidents.
- Define S1–S4 and 6–8 error types your org will use.
- Require one golden/eval test per incident before closing.
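One way to make the "one golden/eval test per incident" rule stick is a small closure gate wherever incidents get marked closed (ticket workflow, CI, or a bot). A minimal sketch, reusing the hypothetical `IncidentRecord` fields from the earlier example:

```python
# Sketch: a closure gate that refuses to close incidents missing the basics.
# Assumes the IncidentRecord sketch shown earlier; adapt field names to your tooling.

def closure_blockers(incident) -> list[str]:
    """Return reasons this incident cannot be closed yet (empty list = OK to close)."""
    blockers = []
    if not incident.golden_tests_added:
        blockers.append("no golden/eval test added for this incident")
    if not incident.repro_steps:
        blockers.append("no minimal reproducible case recorded")
    for r in incident.remediations:
        if not r.owner:
            blockers.append(f"remediation without an owner: {r.description!r}")
    return blockers

def can_close(incident) -> bool:
    return not closure_blockers(incident)
```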
Closing
Great AI teams turn incidents into leverage. A blameless, structured review process—connected to evals, guardrails, and monitoring—reduces repeat failures and steadily raises quality without slowing delivery.