The AI Growth OS Framework
Contents
- The AI Growth OS Framework (definition)
- Visual representation: the AI Growth OS matrix
- When to use this AI growth framework (and when NOT to)
- Step-by-step application guide (ship the OS in 10 working days)
- Decision matrix: where AI belongs in your growth system
- Real examples (Uber/Postmates patterns + common scenarios)
- Copy-pasteable prompts (use with your internal docs + data)
- Common mistakes I see (and how to avoid them)
- Related frameworks and how they connect
- Frequently Asked Questions
The AI Growth OS Framework is an AI growth framework that turns “AI ideas” into a repeatable operating system: a spec-driven experiment pipeline, agent orchestration, quality gates, and measurement that compounds learning. Use it when you need consistent velocity and reliability across many growth bets, and avoid it when you only need one-off automation.
Key takeaways:
- Treat growth as a production system: specs → agents → quality gates → measurement → learning cards.
- Standardize inputs (YAML specs) so AI work is auditable, comparable, and easy to scale across teams.
- Win by enforcing quality gates and instrumentation before you chase velocity.
Most teams adopting AI for growth start with tools, then wonder why results feel noisy: inconsistent outputs, unclear ownership, “promising” prototypes that never ship, and experiments that can’t be compared because the inputs keep changing.
I’ve seen the opposite work at scale. At Uber and Postmates, growth only stayed sane when we ran it like an operating system: clear intake, standardized experiment definitions, strict measurement, and a disciplined learning loop. AI doesn’t change that. AI amplifies it. If your system is sloppy, AI makes it sloppier faster.
The AI Growth OS Framework is the structure I wish every CEO and VP Growth had on day one of an AI push. It’s built to answer the questions that actually matter: What are we running? Why are we running it? How do we know it worked? What did we learn, and how does that change the next 10 bets?
This page gives you the exact mental model, a scoring rubric to prioritize where AI belongs, an executable step-by-step rollout, and copy-pasteable prompts plus a YAML spec template you can put into production this week.
The AI Growth OS Framework (definition)
AI Growth OS = a repeatable operating system that converts growth hypotheses into shipped, measured, AI-assisted experiments with enforced quality and a compounding learning library.
It has four production primitives:
- Experiment Spec (YAML): single source of truth for hypothesis, audience, channel, metric, guardrails, rollout, and instrumentation.
- Agent Skills: modular AI capabilities (research, segmentation, copy, creative QA, SQL drafting, etc.) that can be composed per experiment.
- Quality Gates: deterministic checks that block bad experiments from shipping (policy, brand, math, instrumentation, sample ratio mismatch checks, etc.).
- Learning Cards: standardized post-mortems that make insights reusable and searchable.
If your growth team can’t answer “what’s running and what we learned last week” in two minutes, you don’t have an OS. You have activity.
Visual representation: the AI Growth OS matrix
Use this matrix to keep the system concrete. Each row is an artifact you can literally implement.
| OS Layer | Primary Output | Owner (default) | “Done” Definition | Failure mode it prevents |
|---|---|---|---|---|
| Intake & Prioritization | Ranked backlog | VP Growth + Data | Every item has a score + spec stub | Random acts of AI |
| Experiment Spec | experiment.yaml | Growth PM | Hypothesis + metrics + rollout + instrumentation filled | Vague tests, unmeasurable wins |
| Agent Orchestration | Agent runbook + prompts | Growth Eng / Ops | Agents produce assets + checks in <24h | Bottlenecks, inconsistent execution |
| Quality Gates | Pass/fail checklist + automated checks | Data + Legal/Brand (as needed) | Cannot ship unless gates pass | Brand risk, broken tracking, junk outputs |
| Launch & Measurement | Dashboard + query | Data | Primary + guardrail metrics live day 0 | “We’ll measure later” |
| Learning Cards | learning.md + tags | Experiment owner | Decision + why + next action captured | Repeating mistakes, no compounding |
Print that table. If any row is missing, you’ll feel it within two sprints.
When to use this AI growth framework (and when NOT to)
Use it when:
- You’re running multiple concurrent growth bets across lifecycle, acquisition, and monetization.
- You have cross-functional dependencies (data, eng, design, legal) and need cleaner handoffs.
- Your AI efforts keep producing artifacts without outcomes (copy, dashboards, “insights”) but no shipped wins.
- You want repeatable velocity: experiments per week increase without quality collapsing.
Don’t use it when:
- You have a single narrow AI use case (ex: “summarize Gong calls into 5 bullets”). A lightweight SOP is enough.
- Your tracking is broken and you can’t fix it soon. The OS will surface that gap immediately, but it can’t replace instrumentation.
- You’re pre-PMF and still changing the core customer weekly. You’ll over-invest in process before your inputs stabilize.
A clean heuristic: if you can’t name your top 3 growth constraints and their owners, pause before building the OS. The OS enforces clarity; it doesn’t create it from nothing.
Step-by-step application guide (ship the OS in 10 working days)
Step 1 (Day 1): Define the growth surface area and constraints
You need a shared map: acquisition loops, activation moments, retention drivers, monetization levers.
Deliverable:
- One page: North Star, 3-5 supporting metrics, and 3 constraints (ex: supply, latency, trust & safety, margin).
From my Uber experience: growth strategies that ignore constraints (like driver supply or ETAs) create short-lived wins and long-lived pain. Your OS should force every experiment to declare its constraint impact.
Step 2 (Day 2): Implement the Experiment Spec YAML (non-negotiable)
This is the “contract” between growth, AI agents, data, and engineering.
Experiment Spec YAML template (copy-paste):
id: "retention_push_v3"
name: "Winback: personalized push copy via AI segments"
owner: "growth_pm@company.com"
status: "proposed" # proposed|running|stopped|shipped|archived
team: "Lifecycle"
hypothesis: >
Personalized winback messaging based on churn reason will increase 28-day
reactivation without increasing opt-out rate.
audience:
eligibility_sql_ref: "warehouse/eligibility/winback_churned_28d.sql"
segment_keys: ["churn_reason", "last_order_category", "geo_tier"]
exclusions: ["push_opted_out", "fraud_flagged"]
channels:
- type: "push"
platform: "braze"
cadence: "1x"
send_window_local: "17:00-20:00"
variants:
control:
description: "Current generic winback copy"
treatment:
description: "AI-personalized copy per segment"
primary_metric:
name: "28d_reactivation_rate"
definition: "reactivated_users / eligible_users"
event_sources: ["order_completed"]
guardrails:
- name: "push_opt_out_rate"
threshold: "no_increase_gt_0.10pp"
- name: "cs_tickets_rate"
threshold: "no_increase_gt_5_percent"
power_and_duration:
min_runtime_days: 14
stop_rules:
- "SRM_detected"
- "guardrail_breach"
instrumentation:
exposure_event: "push_sent"
assignment_key: "user_id"
logging_required: ["variant_id", "segment_keys", "copy_id"]
quality_gates_required:
- "tracking_validated"
- "brand_policy_pass"
- "copy_factuality_pass"
- "dashboard_live"
notes:
risks: ["spam perception", "policy violations"]
dependencies: ["Braze liquid template support", "segment table refresh daily"]
If you do nothing else from this page, do this. Specs make AI work testable.
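The contract only holds if something enforces it. A minimal validation sketch, assuming the spec lives in a file like the template above and that you use PyYAML; the required-field list and script name are illustrative, not a prescribed schema:

```python
# validate_spec.py - minimal spec check before anything downstream runs (illustrative, not a full schema)
import sys
import yaml  # assumes PyYAML is installed: pip install pyyaml

REQUIRED_TOP_LEVEL = [
    "id", "owner", "hypothesis", "audience", "variants",
    "primary_metric", "guardrails", "instrumentation", "quality_gates_required",
]

def validate_spec(path: str) -> list[str]:
    """Return human-readable problems; an empty list means the spec can move forward."""
    with open(path) as f:
        spec = yaml.safe_load(f) or {}

    problems = [f"missing field: {key}" for key in REQUIRED_TOP_LEVEL if key not in spec]

    # An experiment without an exposure event and assignment key is not measurable.
    instrumentation = spec.get("instrumentation", {})
    for key in ("exposure_event", "assignment_key"):
        if not instrumentation.get(key):
            problems.append(f"instrumentation.{key} is empty")

    # Guardrails without thresholds are wishes, not gates.
    for g in spec.get("guardrails", []):
        if "threshold" not in g:
            problems.append(f"guardrail '{g.get('name', '?')}' has no threshold")

    return problems

if __name__ == "__main__":
    issues = validate_spec(sys.argv[1])
    if issues:
        print("SPEC BLOCKED:\n- " + "\n- ".join(issues))
        sys.exit(1)
    print("Spec OK")
```

Run it in CI or as a pre-launch step so a vague spec fails loudly instead of quietly shipping.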
Step 3 (Days 3-4): Define agent skills as reusable modules
Common “skills” I’ve used in real growth orgs:
- Segment Synthesizer: proposes segmentations, then outputs eligibility SQL stubs.
- Creative Generator: produces copy/creative variants per segment with constraints.
- QA / Policy Checker: flags policy/brand violations and factuality risks.
- Instrumentation Assistant: drafts event taxonomy and analytics queries.
- Experiment Analyst: generates decision-ready readouts and learning cards.
Store skills as prompt templates + tool access (warehouse read, docs read, policy docs, etc.). Keep them composable so one experiment can call 2 skills and another can call 5.
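One lightweight way to keep skills composable is a plain registry that maps each skill to its versioned prompt template and allowed tools, so an experiment can declare which skills it needs. A sketch under those assumptions; names like SKILLS, compose, and the tool strings are illustrative, not a specific agent framework:

```python
# skills.py - illustrative skill registry; swap in your own model client and tool layer
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    prompt_template: str                              # lives in version control with the experiment specs
    tools: list[str] = field(default_factory=list)    # e.g. warehouse_read, policy_docs_read

SKILLS = {
    "segment_synthesizer": Skill(
        name="segment_synthesizer",
        prompt_template="Propose segmentations for: {hypothesis}. Output eligibility SQL stubs.",
        tools=["warehouse_read"],
    ),
    "creative_generator": Skill(
        name="creative_generator",
        prompt_template="Write {channel} copy variants per segment. Constraints: {constraints}",
        tools=["brand_guidelines_read"],
    ),
    "qa_policy_checker": Skill(
        name="qa_policy_checker",
        prompt_template="Flag policy/brand violations and unverifiable claims in: {assets}",
        tools=["policy_docs_read"],
    ),
}

def compose(skill_names: list[str], context: dict) -> list[str]:
    """Render the prompts for the skills an experiment declares; the caller sends them to the model."""
    return [SKILLS[name].prompt_template.format(**context) for name in skill_names]

# Example: a lifecycle experiment calls one skill; a bigger one can chain several.
print(compose(["segment_synthesizer"],
              {"hypothesis": "personalized winback copy lifts 28d reactivation"}))
```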
Step 4 (Days 5-6): Install quality gates (block shipping until they pass)
Quality gates are where most AI growth initiatives either become safe or become chaos.
Minimum viable gates:
- Tracking validated: exposure + conversion events exist, fire correctly, and are joinable by assignment key.
- SRM check configured: sample ratio mismatch detection ready day 1.
- Brand/policy check: AI outputs pass your rules (promises, prohibited claims, pricing language).
- Factuality check: no invented features, no false urgency.
- Dashboard live: primary metric + guardrails visible before launch.
Operational note: at Postmates, the fastest teams weren’t the ones who moved recklessly. They were fast because they removed rework. Gates remove rework.
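Sample ratio mismatch is the gate most worth automating first: if assignment counts drift from the planned split, stop and debug before reading any results. A minimal sketch using a chi-square test; SciPy is assumed, and the counts and threshold are illustrative:

```python
# srm_check.py - sample ratio mismatch gate (illustrative)
from scipy.stats import chisquare

def srm_check(observed_counts: list[int], expected_ratios: list[float], alpha: float = 0.001) -> bool:
    """Return True if the observed split matches the planned split; False means halt and audit."""
    total = sum(observed_counts)
    expected_counts = [total * r for r in expected_ratios]
    _, p_value = chisquare(f_obs=observed_counts, f_exp=expected_counts)
    return p_value >= alpha  # a tiny p-value means assignment is broken, not unlucky

# Example: planned 50/50 split, observed exposure counts of 50,412 vs 48,901
if not srm_check([50_412, 48_901], [0.5, 0.5]):
    print("SRM detected: halt the experiment and audit assignment + exposure logging")
```

Wire the check to your exposure events on day 1 and make a failure block readouts automatically.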
Step 5 (Days 7-8): Measurement framework and decision rules
Define decision rules per experiment class:
- Ship: primary metric improves and guardrails are clean.
- Iterate: directional lift but high uncertainty; narrow scope, improve targeting, rerun.
- Kill: no lift or guardrail breach; document why and tag for later.
Be explicit about what “good enough” means for rollout. Many teams stall because every readout turns into a debate about thresholds.
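The decision rules are easy enough to encode that readouts stop turning into threshold debates. A sketch, assuming you compute the primary-metric lift, its confidence interval, and guardrail status elsewhere; the default thresholds here are placeholders to set per experiment class:

```python
# decide.py - ship / iterate / kill rules (illustrative thresholds)
def decide(lift: float, ci_lower: float, guardrails_clean: bool, min_ship_lift: float = 0.0) -> str:
    if not guardrails_clean:
        return "kill"      # a guardrail breach always wins; document why and tag it
    if ci_lower > min_ship_lift:
        return "ship"      # lift is positive and the interval excludes "no effect"
    if lift > 0:
        return "iterate"   # directional but uncertain: narrow scope, improve targeting, rerun
    return "kill"

print(decide(lift=0.012, ci_lower=-0.004, guardrails_clean=True))  # -> "iterate"
```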
Step 6 (Days 9-10): Learning cards library (compounding mechanism)
A learning card is a standardized record you can search later.
Template:
- Context (problem, audience, channel)
- Hypothesis
- What we shipped
- Results (primary + guardrails)
- Decision (ship/iterate/kill)
- Why it worked/failed (mechanism)
- Reusable assets (prompts, segments, templates)
- Next bets
This is how you avoid repeating the same experiment every six months with different names.
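The compounding only happens if the cards are machine-searchable. One low-effort pattern, purely as an assumption rather than a prescribed tool, is a small metadata header in each learning.md plus a script that indexes it:

```python
# learning_index.py - search learning cards by tag (illustrative; assumes YAML front matter in each learning.md)
from pathlib import Path
import yaml

def load_cards(root: str = "learnings") -> list[dict]:
    cards = []
    for path in Path(root).glob("**/*.md"):
        text = path.read_text()
        if text.startswith("---"):
            front_matter = text.split("---")[1]   # naive front-matter split, fine for a sketch
            meta = yaml.safe_load(front_matter) or {}
            meta["path"] = str(path)
            cards.append(meta)
    return cards

def search(cards: list[dict], tag: str) -> list[dict]:
    return [c for c in cards if tag in c.get("tags", [])]

# Example: pull every prior winback learning before proposing the next winback experiment
for card in search(load_cards(), tag="winback"):
    print(card["path"], "-", card.get("decision", "?"))
```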
Decision matrix: where AI belongs in your growth system
Use this rubric to score candidate initiatives. It forces you to pick work that’s both high-impact and operationally feasible.
Score each 1–5 (5 is best). Multiply by weight. Highest totals go first.
| Criterion | Weight | 1 | 3 | 5 |
|---|---|---|---|---|
| Impact potential | 0.30 | Small local metric | Noticeable metric | Major lever tied to NSM |
| Repeatability | 0.15 | One-off task | Occasional | Recurring weekly/daily |
| Data readiness | 0.15 | Missing events | Partial | Clean exposure + outcome |
| Time-to-first-ship | 0.10 | >4 weeks | 2–4 weeks | <2 weeks |
| Risk profile (brand/legal) | 0.10 | High | Medium | Low |
| AI advantage | 0.20 | AI adds little | AI speeds work | AI enables new approach |
Scoring formula (copy-paste):
- Total = Σ(score × weight)
- Prioritize: Total ≥ 4.0 first, 3.2–3.9 next, <3.2 backlog unless strategic
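The same formula as a few lines of code, if you would rather keep scoring in a script than in people's heads; the criterion keys mirror the table above and the sample candidate is made up:

```python
# prioritize.py - weighted scoring for candidate AI growth initiatives
WEIGHTS = {
    "impact": 0.30, "repeatability": 0.15, "data_readiness": 0.15,
    "time_to_first_ship": 0.10, "risk_profile": 0.10, "ai_advantage": 0.20,
}

def total(scores: dict[str, int]) -> float:
    """scores: 1-5 per criterion. Returns the weighted total on the same 1-5 scale."""
    return sum(scores[k] * w for k, w in WEIGHTS.items())

candidate = {"impact": 5, "repeatability": 4, "data_readiness": 3,
             "time_to_first_ship": 4, "risk_profile": 5, "ai_advantage": 4}
print(round(total(candidate), 2))  # 4.25 -> prioritize first (>= 4.0)
```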
Common high scorers:
- Lifecycle personalization (email/push/in-app)
- Paid creative iteration with guardrails
- SEO content ops with strict quality checks
- In-product onboarding personalization
Common low scorers:
- High-stakes pricing changes with weak instrumentation
- Trust & safety copy without policy maturity
- Any experiment where “success” can’t be measured within one cycle
Real examples (Uber/Postmates patterns + common scenarios)
Example 1: Uber rider reactivation (pattern)
At Uber scale, lifecycle work fails when targeting is blunt. The OS approach forces: (1) eligibility definition, (2) segment keys, (3) guardrails like unsubscribe/complaints, (4) learning cards tagged by geography and cohort.
How AI fits inside the OS:
- Agent proposes 10 segment-message pairings from churn reasons.
- QA gate blocks anything that implies guarantees (“Always cheaper”) or invents ETAs.
- Measurement gate ensures holdouts and SRM checks are live.
The win isn’t “AI wrote copy.” The win is that you can run reactivation as a production line without brand or measurement drift.
Example 2: Postmates supply-constrained growth (pattern)
In delivery, growth is constrained by supply and ETA. If you push demand without considering supply, you get cancellations and angry users.
OS enforcement:
- Spec requires declaring constraint impact (supply/ETA) and guardrail metrics (cancellation rate, late delivery).
- Quality gate blocks shipping if supply health is below threshold in target zones (simple rule-based gate is fine).
AI contribution:
- Agent generates zone-tiered messaging and throttling rules, but the OS blocks “spray and pray” sends when ops health is red.
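The supply-health gate really can be a few lines of rule-based code; it does not need a model. A sketch with hypothetical thresholds and field names; zone metrics would come from your ops dashboard or warehouse:

```python
# supply_gate.py - rule-based ops-health gate for demand-side sends (illustrative thresholds)
def zone_is_healthy(zone: dict, max_eta_min: float = 35.0, max_cancel_rate: float = 0.08) -> bool:
    return zone["p90_eta_min"] <= max_eta_min and zone["cancellation_rate"] <= max_cancel_rate

def allowed_zones(zones: list[dict]) -> list[str]:
    """Only send demand-generation messages into zones whose supply health is green."""
    return [z["zone_id"] for z in zones if zone_is_healthy(z)]

zones = [
    {"zone_id": "sf_mission", "p90_eta_min": 28, "cancellation_rate": 0.04},
    {"zone_id": "sf_soma", "p90_eta_min": 47, "cancellation_rate": 0.11},  # red: blocked
]
print(allowed_zones(zones))  # ['sf_mission']
```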
Example 3: Common scenario — AI SEO program without content spam
If you run SEO growth, AI increases output fast. Without gates, you publish thin pages and damage brand.
OS approach:
- Spec includes target query, intent, internal links requirements, and factuality constraints.
- Agent skill: SERP summarizer + outline generator + schema draft.
- Gates: plagiarism check, claims verification, editorial style, and conversion instrumentation (newsletter signup, demo request).
Copy-pasteable prompts (use with your internal docs + data)
Prompt 1: Turn a growth idea into an Experiment Spec YAML
You are my Growth Ops Lead. Convert the idea below into an experiment spec YAML.
Inputs you must produce:
- hypothesis (mechanism-based, falsifiable)
- audience (eligibility rules + segment_keys + exclusions)
- channels, variants (control/treatment)
- primary_metric + 2-4 guardrails with thresholds
- power_and_duration (min runtime + stop rules)
- instrumentation (exposure event, assignment key, required logging)
- quality_gates_required
- dependencies + risks
Constraints:
- Do not invent numbers about expected lift.
- Ask 5 clarifying questions first if any critical info is missing.
- Assume we measure in a warehouse and activate in Braze unless I specify otherwise.
Idea:
[PASTE YOUR IDEA HERE]
Context:
- North Star metric: [NSM]
- Constraints: [SUPPLY/MARGIN/LEGAL/etc]
- Available events: [LIST EVENTS]
Prompt 2: Create quality gates + automated checks for a specific experiment
You are my AI QA engineer for growth experiments. Design quality gates for the experiment spec below.
Output:
1) A checklist of gates with pass/fail criteria
2) For each gate, suggest one automated check (SQL, regex rules, or unit-test style)
3) Identify the top 5 failure modes and how we detect them in <24 hours
Rules:
- Prefer deterministic checks over subjective review.
- Include tracking validation, SRM detection, brand/policy constraints, and factuality constraints.
- Keep it practical: what would block launch vs what is a warning.
Experiment spec:
[PASTE experiment.yaml HERE]
Common mistakes I see (and how to avoid them)
- Shipping AI outputs without an exposure event. If you can’t measure who saw what, you can’t trust the result. Block launch until exposure logging is real.
- Treating agents as teammates instead of tools. Agents produce drafts. Your OS defines checks. The human owner stays accountable.
- No guardrails, then surprise blowback. Every growth win has a shadow metric. Put it in the spec up front.
- Prompt sprawl. If prompts live in Slack threads, you can’t iterate. Store prompts in version control with the experiment ID.
- Learning cards that read like diaries. Capture the mechanism and the decision. Skip the narrative.
Related frameworks and how they connect
- ICE / RICE prioritization: AI Growth OS adds “AI advantage” and “data readiness” so your scoring reflects operational reality.
- OODA loop (Observe–Orient–Decide–Act): Learning cards are your “Orient” layer; specs and gates accelerate “Act” without losing control.
- Growth Accounting: OS measurement plugs into acquisition/activation/retention/revenue/referral so experiments roll up cleanly.
- Opportunity Solution Tree (Teresa Torres): Use OST to generate high-quality hypotheses; the OS is how you execute them repeatedly.
- CRISP-DM / ML Ops: If you’re training models, the OS becomes the product-facing counterpart to ML ops (data, validation, monitoring), but for growth work.
Frequently Asked Questions
How is the AI Growth OS different from a normal experiment program?
Normal programs standardize experiment cadence. The AI Growth OS standardizes the inputs and outputs of AI work: specs, agent skills, quality gates, and learning cards. That’s what prevents “AI prototypes” from turning into unmeasurable one-offs.
Do I need to build AI agents to use this framework?
No. Start with prompt templates and a disciplined spec + measurement system. Agents become useful once you have repeatable skills and clear gates.
Who should own the AI Growth OS?
A VP Growth or Head of Growth should own the OS design and prioritization. Day-to-day, a Growth Ops lead (or Growth Eng) keeps the YAML specs, gates, and learning library clean.
What’s the minimum viable version I can ship in one sprint?
YAML specs + one dashboard template + three gates (tracking, SRM check, brand/policy) + learning cards. Add agent orchestration after you’ve run 5–10 experiments through the system.
How do I prevent the OS from slowing teams down?
Keep the spec short, make gates partly automated, and reuse components. Velocity comes from reduced rework, not from skipping structure.
Ready to build your AI growth engine?
I help CEOs use AI to build the growth engine their board is asking for.
Talk to Isaac