Versioning Prompts, Policies, and Models Together

Cross-Industry • ~7–8 min read • Updated Sep 27, 2024

Context

Most AI regressions aren’t single-file mistakes—they’re set mismatches: a prompt tweak that clashes with a new policy filter, or a routing change that invalidates cached outputs. Treating each artifact separately makes rollbacks messy and audits painful. Release sets, not parts.

Core Framework: Release Sets

  1. Define the set: {model_ref, prompt_pack, policy_pack, routing_table, tool_specs, retrieval_cfg, eval_suite}.
  2. Single tag: Use one immutable tag (e.g., ai-suite@2024.09.27) applied to every artifact and runtime switch.
  3. SemVer the set: MAJOR (policy or model change), MINOR (prompt/routing), PATCH (typo, threshold nudge).
  4. Environment gates: dev → staging → canary → prod with the same set tag; no “rebuilds” between envs.
  5. Diffs & notes: Store human-readable release notes, and machine diffs for prompts/policies & routing deltas.

Recommended Actions

  1. Create a Set Manifest: Versioned JSON (or lockfile) listing all artifact URIs and checksums.
  2. Tie CI to Evals: Staging promotion requires passing the tagged eval_suite (answerability, safety, latency, cost).
  3. Wire Feature Flags: Roll out by segment or tenant; keep previous set hot for instant rollback.
  4. Cache Discipline: Namespaced caches by set tag; invalidate on promotion to avoid mixed outputs.
  5. Policy Precedence: Registry that enforces system > policy > tool > app prompts per set.

Common Pitfalls

  • Artifact drift: Prompt edited in-prod without bumping the set tag → unreproducible incidents.
  • Env snowflakes: Rebuilding prompts or dependencies per env creates “works in staging” bugs.
  • Unscoped caches: Old answers leaking into new releases; no TTL by risk tier.
  • Shadow changes: Vendor model updates without model pinning & fitness checks.

Quick Win Checklist

  • Introduce a set manifest and tag today’s prod as ai-suite@YYYY.MM.DD.
  • Namespace caches by set tag; add TTLs by use-case risk.
  • Require eval pass for promotion; store scorecard alongside release notes.
  • Pin vendor models (ID + date); alert on upstream weight changes.

Closing

Versioning AI as a set makes behavior reproducible, audits simple, and rollbacks instant. That’s how you ship faster without surprises.