Use this to define what you're building, how the AI behaves, and what "good" looks like. The AI job statement is the most important line in this document.
Upstream: this should follow an approved opportunity brief . Downstream: use this PRD to create the eval plan , human review workflow , cost model , observability plan , and launch gate . If engineers or coding agents need a tighter handoff, use the optional build brief .
What user problem does this solve? Include evidence: support tickets, user interviews, usage data.
What outcomes will you measure? Be specific: "reduce manual review time from 20 min to under 5 min" not "improve efficiency."
What are you explicitly not doing? This prevents scope creep mid-build.
Who uses this first? Be narrow. "Enterprise compliance analysts reviewing vendor contracts" not "business users."
Step-by-step: what does the user do today without this product?
Step-by-step: what changes with AI in the loop? Where does the human still act?
The AI [does what] using [inputs] to produce [outputs] for [user] inside [workflow], subject to [constraints].
Fill this in. Example: "The AI drafts a contract summary using the uploaded PDF to produce a structured risk report for compliance analysts inside their review queue, subject to a 2% max hallucination rate on key terms and no fabricated clause references."
What model or provider are you using? What constraints apply?
Parameter Value Notes Model / provider e.g., Claude Sonnet via Anthropic API Token budget per task e.g., 2k input, 500 output Multi-model routing e.g., cheap model for classification, expensive model for generation Context window needs e.g., must handle 50-page documents
How should the AI communicate? This is a product requirement, not an engineering detail. If you don't spec it, the model default ships.
Tone: e.g., professional, direct, no hedging
Constraints: e.g., never speculate beyond source material, always cite the document section
Persona boundaries: e.g., does not give opinions, does not role-play
What data does the AI use? Where does it come from? What are the privacy and compliance constraints?
Retrieval sources: e.g., customer knowledge base, internal docs
Data permissions: e.g., customer data processed under DPA, no cross-tenant access
Retention policy: e.g., prompts and outputs logged for 30 days, then deleted
What does the AI receive? Format, size limits, required fields, what happens if input is malformed.
Input Format Required Max size Fallback if missing
What does the AI produce? Structure, format, required fields.
Output field Type Always present Example
Justify your choice. Higher autonomy requires higher quality bars and better failure handling.
Different actions in the same product can have different autonomy levels. Map each AI action below:
AI action Autonomy level Justification e.g., draft support response e.g., draft e.g., customer-facing, must be reviewed before sending e.g., categorize incoming ticket e.g., act e.g., low risk, reversible
If this product uses agents (tool-calling, multi-step workflows), define what tools the agent can access and what it cannot do. Skip this section for single-turn LLM features.
Tool / capability Allowed Constraints e.g., read customer records yes/no e.g., read-only, current tenant only e.g., send email yes/no e.g., draft only, requires human approval e.g., modify database yes/no e.g., never
Escalation: When the agent encounters something outside its scope, what happens? e.g., hand off to human, surface uncertainty, stop and ask.
Include 2-3 examples to anchor alignment across PM, engineering, and design. At least one rejection case.
Case Input Expected output Happy path typical input what good looks like Rejection input the AI should refuse e.g., refusal with escalation to human Edge case unusual but valid input acceptable behavior
When must a human review AI output before it reaches the user or takes effect?
Name the risks that could block pilot, production, or scale. Any high-severity risk needs an owner and mitigation before launch.
Risk Scenario User impact Business impact Likelihood Severity Mitigation Detection signal Owner Incorrect output plausible but wrong output Over-trust user treats AI as authoritative Data leakage wrong user/tenant sees data Unsafe autonomy AI takes action beyond scope Cost spike usage or retries exceed budget Silent degradation quality drops without alert
Agentic risks, if relevant:
Risk Scenario Mitigation Owner Goal hijacking Tool misuse Error cascading
Specific, measurable thresholds. Example: "Precision >= 95% on golden eval set (n=200). No fabricated citations. Structured fields match source document in >= 98% of cases."
p50 and p95 targets. Example: "p50 < 3s, p95 < 8s for single-document summary."
Per-task and per-user-month budget. Example: "< $0.05 per summary, < $2/user/month at expected usage."
What does the product do when the AI fails, times out, or returns low-confidence output?
On timeout:
On low confidence:
On malformed output:
On safety trigger:
What must be logged and monitored from day one?
What must be true before this ships to any user? Reference the launch gate checklist.
What do you still need to figure out? Be specific about what would change your plan.