AI PM Playbook

Use this launch gate checklist to decide whether the product can enter, or advance beyond, each release stage. There are three gates: pilot entry, limited production entry, and scale-up entry. Do not skip gates.

Inputs: scores and evidence come from the AI PRD, eval plan, cost model, observability plan, and human review workflow. Output: a go/no-go decision with rationale, conditions, owner, and reversal trigger.

Gate 1: pilot entry

Pass/fail criteria

Criterion	Target	Actual	Pass?
Eval accuracy on golden set	e.g., >= 90%
Failure behavior tested	All failure modes documented and handled
Human review workflow functional	Reviewers can approve/reject/edit
Latency	e.g., p95 < 10s
Cost per task	e.g., < $0.10
Safety boundaries hold	Adversarial eval passes
Observability in place	Logs, metrics, alerts configured
Trace review completed	Prototype or pilot traces reviewed and failures labeled
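The quantitative rows of the table above can be checked mechanically once actuals are recorded. The following is a minimal sketch, not a prescribed implementation: the `Criterion` class, the metric names, and the helper functions are hypothetical, and the thresholds are the "e.g." values from the table, to be replaced with your own targets. Qualitative rows (failure behavior, human review workflow, trace review) still need a human judgment.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    name: str
    target: float
    # compare(actual, target) returns True when the criterion passes.
    compare: Callable[[float, float], bool]


# Illustrative targets taken from the table's "e.g." values.
GATE_1_CRITERIA = [
    Criterion("eval_accuracy", 0.90, operator.ge),      # >= 90%
    Criterion("latency_p95_s", 10.0, operator.lt),      # p95 < 10s
    Criterion("cost_per_task_usd", 0.10, operator.lt),  # < $0.10
]


def evaluate_gate(criteria: list[Criterion], actuals: dict[str, float]) -> dict[str, bool]:
    """Return {criterion name: passed}. A missing measurement counts as a fail."""
    results = {}
    for c in criteria:
        actual = actuals.get(c.name)
        results[c.name] = actual is not None and c.compare(actual, c.target)
    return results


def gate_passes(results: dict[str, bool]) -> bool:
    """The gate passes only if every criterion passes."""
    return all(results.values())


# Hypothetical measurements: cost per task is over target, so the gate fails.
actuals = {"eval_accuracy": 0.93, "latency_p95_s": 8.4, "cost_per_task_usd": 0.12}
results = evaluate_gate(GATE_1_CRITERIA, actuals)
print(results, gate_passes(results))
```

Treating a missing measurement as a fail keeps an unfilled "Actual" cell from silently passing the gate.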

Staged rollout plan

Shadow mode: production requests duplicated to AI path, outputs logged but not shown to users
Canary: 1% of traffic, gated ramp (1% -> 5% -> 20% -> full), rollback criteria defined
Cohort-based: specific user segment or geography first

Rollback trigger: e.g., quality score drops more than 5% from its pre-rollout baseline, cost per task exceeds 2x budget, or any safety incident occurs
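The gated ramp and rollback trigger can be expressed as a small check that runs before each ramp step. This is a sketch under stated assumptions: the function names are hypothetical, the thresholds are the example values from the rollout plan, and the quality drop is interpreted as a relative drop against a baseline score, which is one possible reading of "drops > 5%".

```python
# Canary ramp from the rollout plan: 1% -> 5% -> 20% -> full.
RAMP_STAGES = [0.01, 0.05, 0.20, 1.00]


def rollback_reasons(
    baseline_quality: float,
    current_quality: float,
    cost_per_task: float,
    cost_budget: float,
    safety_incidents: int,
) -> list[str]:
    """Return the list of tripped rollback triggers (empty means keep going)."""
    reasons = []
    # Assumption: "drops > 5%" means a relative drop versus the baseline.
    if baseline_quality > 0:
        relative_drop = (baseline_quality - current_quality) / baseline_quality
        if relative_drop > 0.05:
            reasons.append("quality drop > 5%")
    if cost_per_task > 2 * cost_budget:
        reasons.append("cost per task > 2x budget")
    if safety_incidents > 0:
        reasons.append("safety incident")
    return reasons


def next_traffic_share(current_share: float, reasons: list[str]) -> float:
    """Roll back to 0% if any trigger tripped; otherwise move to the next stage."""
    if reasons:
        return 0.0
    later_stages = [s for s in RAMP_STAGES if s > current_share]
    return later_stages[0] if later_stages else current_share


# Hypothetical readings at the 5% stage: quality fell from 0.90 to 0.85.
reasons = rollback_reasons(0.90, 0.85, 0.08, 0.10, 0)
print(reasons, next_traffic_share(0.05, reasons))
```

Returning the reasons rather than a bare boolean gives the rollback decision a rationale that can go straight into the risk and decision record.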

Regulatory compliance

Risk classification and compliance path determined
Data provenance documented (training data sources, retrieval sources, retention policy)
Transparency requirements met (users informed they are interacting with AI)
System card or model card drafted (which models, prompts, tools, retrieval sources, human review points)
Vendor due diligence complete for third-party model providers

Required pass conditions

No unmitigated high-severity risks in risk register
No data leakage between users/tenants
Failure behavior does not expose raw model output to users
Trace review has happened for prototype or pilot behavior
Any self-improvement change to an agent, eval, prompt, tool, or workflow gets human review before rollout

Risk and decision record

Risk or blocker	Severity	Owner	Required mitigation	Due

Options considered

Option	Pros	Cons
Start pilot
Hold
Do not launch

What would reverse this decision: Name a specific metric, date, dependency, or evidence threshold that would reopen the decision.

Decision

Start pilot
Advance with conditions: list conditions
Hold: what needs to change
Do not launch: reason

Decided by: name
Date: YYYY-MM-DD
Review date or trigger: YYYY-MM-DD or metric threshold
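If decisions are tracked in a tool rather than a document, the decision block above maps onto a small record type. The sketch below is illustrative only: `GateDecision`, `Decision`, and `problems` are hypothetical names, and the example values are placeholders. It checks the fields this template asks for: rationale, conditions when advancing conditionally, and a reversal trigger.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Decision(Enum):
    ADVANCE = "advance"  # "Start pilot" at Gate 1
    ADVANCE_WITH_CONDITIONS = "advance with conditions"
    HOLD = "hold"
    DO_NOT_LAUNCH = "do not launch"


@dataclass
class GateDecision:
    gate: str
    decision: Decision
    rationale: str
    decided_by: str
    decided_on: date
    reversal_trigger: str  # metric, date, dependency, or evidence threshold
    conditions: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return what is missing from the record (empty list means complete)."""
        issues = []
        if not self.rationale.strip():
            issues.append("rationale is missing")
        if not self.reversal_trigger.strip():
            issues.append("reversal trigger is missing")
        if self.decision is Decision.ADVANCE_WITH_CONDITIONS and not self.conditions:
            issues.append("conditions are required for 'advance with conditions'")
        return issues


# Placeholder record: conditional advance with no conditions or reversal trigger.
record = GateDecision(
    gate="Gate 1: pilot entry",
    decision=Decision.ADVANCE_WITH_CONDITIONS,
    rationale="All criteria pass except cost per task.",
    decided_by="name",
    decided_on=date(2025, 1, 15),
    reversal_trigger="",
)
print(record.problems())
```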

Gate 2: limited production entry

Pass/fail criteria

Criterion	Target	Actual	Pass?
Eval accuracy on production sample	e.g., >= 92%
User task completion rate	e.g., >= 80%
Accept rate	e.g., >= 60%
Reject/escalation rate	e.g., < 15%
User-reported issues	e.g., < 5 per week
Cost per task (production)	e.g., < $0.08
Latency (production)	e.g., p95 < 8s
No regression vs. pilot	Quality metrics stable or improving
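The accept rate and reject/escalation rate in the table above can be derived from the human review log. The sketch below assumes each reviewed output is labeled with exactly one of four hypothetical outcome strings ("accept", "edit", "reject", "escalate") and that accept rate means accepted without edits; adjust both assumptions to match your review workflow.

```python
from collections import Counter


def review_rates(outcomes: list[str]) -> dict[str, float]:
    """Summarize review outcomes into the two rates used at Gate 2."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no review outcomes to summarize")
    return {
        # Assumption: edited outputs do not count as accepted.
        "accept_rate": counts["accept"] / total,
        "reject_escalation_rate": (counts["reject"] + counts["escalate"]) / total,
    }


# Hypothetical sample of 20 reviewed outputs.
sample = ["accept"] * 13 + ["edit"] * 5 + ["reject"] + ["escalate"]
rates = review_rates(sample)

# Compare against the example targets: accept >= 60%, reject/escalation < 15%.
meets_targets = rates["accept_rate"] >= 0.60 and rates["reject_escalation_rate"] < 0.15
print(rates, meets_targets)
```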

Required pass conditions

No unresolved incidents from pilot
No systematic bias detected in output quality across user segments
Cost trajectory within budget at projected scale
Regulatory requirements from Gate 1 still met (no scope changes that alter risk classification)

Risk and decision record

Risk or blocker	Severity	Owner	Required mitigation	Due

Options considered

Option	Pros	Cons
Advance
Advance with conditions
Hold

What would reverse this decision: Name a specific metric, date, dependency, or evidence threshold that would reopen the decision.

Decision

Advance to limited production
Advance with conditions: list conditions
Hold: what needs to change
Do not launch: reason

Decided by: name
Date: YYYY-MM-DD
Review date or trigger: YYYY-MM-DD or metric threshold

Gate 3: scale-up entry

Pass/fail criteria

Criterion	Target	Actual	Pass?
Quality metrics stable for >= 2 weeks	specific metrics
Cost per customer within margin target	e.g., < $X/customer/month
Support ticket volume	e.g., < baseline + 10%
Rollback plan tested	Can disable AI path in < 15 min
Monitoring and alerting validated	Alerts fire correctly on synthetic failures
Trace-to-eval loop running	Production failures feed back into eval set
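Two rows of the table above leave room for interpretation: what "stable" means and how "baseline + 10%" is computed. The sketch below picks one reading of each and labels it as an assumption: stability means every daily value in the last 14 days stays within an absolute tolerance of the window mean, and the ticket limit is the baseline multiplied by 1.10. The function names and the default tolerance are hypothetical.

```python
def stable_for(daily_values: list[float], min_days: int = 14, tolerance: float = 0.02) -> bool:
    """True if the last `min_days` values all lie within `tolerance` of their mean.

    Assumption: one value per day, most recent last; fewer than `min_days`
    values means stability has not been demonstrated yet.
    """
    if len(daily_values) < min_days:
        return False
    window = daily_values[-min_days:]
    mean = sum(window) / len(window)
    return all(abs(value - mean) <= tolerance for value in window)


def tickets_within_limit(weekly_tickets: int, baseline: float, allowed_increase: float = 0.10) -> bool:
    """True if ticket volume is below baseline plus the allowed relative increase."""
    return weekly_tickets < baseline * (1 + allowed_increase)


# Hypothetical daily quality scores: 13 steady days, then a sharp drop.
scores = [0.92] * 13 + [0.80]
print(stable_for(scores), tickets_within_limit(105, baseline=100))
```

Writing the definition down before the gate review avoids arguing about what "stable" means after the data is in.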

Required pass conditions

No open high-severity incidents
Rollback plan tested and documented
On-call runbook reviewed by ops team
Incident response process tested (at least one simulated incident)
Near-miss capture process in place (not just incidents, but close calls)

Risk and decision record

Risk or blocker	Severity	Owner	Required mitigation	Due

Options considered

Option	Pros	Cons
Scale
Hold expansion
Roll back

What would reverse this decision: Name a specific metric, date, dependency, or evidence threshold that would reopen the decision.

Decision

Scale: ship to all users
Scale with conditions: list conditions
Hold expansion: what needs to change
Roll back: reason

Decided by: name
Date: YYYY-MM-DD
Review date or trigger: YYYY-MM-DD or metric threshold

Read alongside this template

Launch gates

Healthcare Intake Assistant: Launch gate