# Year 1 Success Criteria
"Success in Year 1 is not 'we have 500 agents live' — it's 'we have one agent delivering real value, built on a mesh-ready foundation, with trust mechanisms in place, ready to scale.'"
This document defines what "done" looks like for Year 1 of Seer's AI agent infrastructure. It provides measurable criteria, explicit boundaries, and decision rules for scaling.
## Definition of Done
Year 1 is complete when the Gateway Agent (Content Audits & Outlines) meets all success criteria AND the foundation supports safe scaling.
### Gateway Agent Success Metrics
| Criterion | Target | Measurement Method |
|---|---|---|
| Efficiency Gain | 15h → <3h (5x improvement) | Time tracking comparison |
| Quality Alignment | 85%+ match with senior strategist | Side-by-side audit comparison |
| Adoption | 5+ SEO team members actively using | Usage logs within first 2 weeks |
| Error Rate | Zero critical errors in Auto-Ship tier | Incident tracking |
| User Satisfaction | 4.5/5 rating | Post-workflow survey |
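As a minimal sketch of how these targets could be checked automatically at review time (the thresholds mirror the table above, but the field names and data model are assumptions, not an existing system):

```python
from dataclasses import dataclass

# Hypothetical snapshot of Gateway Agent metrics pulled from time tracking,
# usage logs, incident tracking, and the post-workflow survey.
@dataclass
class GatewayMetrics:
    avg_hours_per_audit: float  # time tracking (baseline: 15h manual)
    quality_alignment: float    # share of audits matching senior strategist, 0-1
    active_users: int           # distinct SEO team members in usage logs
    critical_errors: int        # Auto-Ship tier incidents
    satisfaction: float         # post-workflow survey, 1-5 scale

def meets_year1_targets(m: GatewayMetrics) -> bool:
    """Return True only if every Gateway Agent criterion in the table is met."""
    return (
        m.avg_hours_per_audit < 3.0      # 15h -> <3h (5x improvement)
        and m.quality_alignment >= 0.85  # 85%+ match with senior strategist
        and m.active_users >= 5          # 5+ active SEO team members
        and m.critical_errors == 0       # zero critical Auto-Ship errors
        and m.satisfaction >= 4.5        # 4.5/5 rating
    )
```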
### Foundation Success Metrics
| Criterion | Target | Evidence |
|---|---|---|
| Reuse Rate | New agents use ≥3 shared modules | Code review checklist |
| Trust Mechanisms | Escalation + audit logging operational | System logs |
| Cross-Platform | Works on OpenCode + Claude Code | Automated validation |
| MCP Reliability | 99.5% uptime for critical servers | Monitoring dashboard |
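For context on the 99.5% target, the implied downtime budget is easy to compute (a simple illustration, not part of the monitoring setup):

```python
# Downtime allowed by a 99.5% uptime target over common windows.
def downtime_budget_hours(uptime_target: float, window_hours: float) -> float:
    return (1.0 - uptime_target) * window_hours

print(downtime_budget_hours(0.995, 24 * 30))   # ~3.6 hours per 30-day month
print(downtime_budget_hours(0.995, 24 * 365))  # ~43.8 hours per year
```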
## What Success Is NOT
Explicitly out of scope for Year 1:
| Anti-Goal | Why It's Wrong |
|---|---|
| Agent count ("120 agents live") | Vanity metric; encourages one-off builds |
| Feature velocity | Speed without quality degrades trust |
| Platform-specific capabilities | Lock-in prevents scaling |
| Autonomous write-backs to systems of record | Trust not yet established |
| Bespoke UI for each agent | If vendor UI suffices, don't build |
Decision rule: If proposed work optimizes for an anti-goal, reject it.
## Minimum Viable Mesh Backbone
The infrastructure that makes the second and third agents cheaper and safer to build:
### 1. Registry
- Agent ownership documented
- Allowed data sources and tools per agent
- Version tracking with semantic versioning
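One way to represent a registry entry is a small structured record per agent. The sketch below is illustrative: only the three requirements above (ownership, allowed sources and tools, version) come from this document; the field names and values are assumptions.

```python
# Illustrative registry entry for the Gateway Agent (schema is a sketch).
gateway_agent_entry = {
    "name": "content-audit-outline",
    "owner": "seo-division",                            # ownership documented
    "allowed_data_sources": ["BigQuery", "DataForSEO"], # per-agent allow list
    "allowed_tools": ["exec-summary", "findings-rubric", "tl-checklist"],
    "version": "1.2.0",                                 # semantic versioning
}
```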
### 2. Observability
- Execution logs with timestamps
- Error traces with actionable messages
- Evaluation results (quality scores, timing)
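A minimal example of the kind of structured log record this implies (the format and field names are assumed, not prescribed):

```python
import json
from datetime import datetime, timezone

# One structured record per agent execution: timestamp, outcome, an actionable
# error message when something fails, and evaluation results.
record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "agent": "content-audit-outline",
    "status": "error",
    "error": "BigQuery MCP call timed out after 30s; retry or check credentials",
    "evaluation": {"quality_score": None, "duration_s": 31.2},
}
print(json.dumps(record))
```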
### 3. Permissioning
- OAuth 2.0 for all external services
- Role-based access control (RBAC)
- Least-privilege defaults
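A least-privilege default can be as simple as denying anything not explicitly granted. This is a sketch only; the role and permission names are hypothetical:

```python
# Deny-by-default permission check: an agent may use only the data sources and
# tools explicitly granted to it in the registry.
ROLE_GRANTS = {
    "content-audit-outline": {"bigquery.read", "dataforseo.read"},
}

def is_allowed(role: str, permission: str) -> bool:
    """Least-privilege default: anything not granted is denied."""
    return permission in ROLE_GRANTS.get(role, set())

assert is_allowed("content-audit-outline", "bigquery.read")
assert not is_allowed("content-audit-outline", "wrike.write")  # never granted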
### 4. Reusable Skills/Tools Layer
- Shared modules (exec summary, findings rubric, TL checklist)
- Standards skills (writing, quality, prompt-engineering)
- MCP integrations (BigQuery, DataForSEO, Wrike)
Scaling gate: Net-new agents that don't reuse shared skills increase maintenance load and audit surface area. Reuse rate becomes a Year 1 approval criterion.
## Quality Gate Tiers
Every agent output passes through the appropriate tier:
| Tier | Risk Level | Examples | Validation |
|---|---|---|---|
| Auto-Ship | Low | Page scoring, data extraction, report compilation | Automated tests pass → deploy |
| Peer Review | Medium | Content outlines, strategic recommendations | Division lead approval (24h) |
| Shadow Mode | High | Budget recommendations, QBR deliverables | AI + human parallel; 90%+ alignment for 4 weeks |
Tier assignment is mandatory. Unassigned operations default to Peer Review.
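A sketch of how tier assignment and the Peer Review default could be enforced in code (the tier values mirror the table; the operation names and routing function are assumptions):

```python
from enum import Enum

class Tier(Enum):
    AUTO_SHIP = "auto-ship"      # low risk: automated tests pass -> deploy
    PEER_REVIEW = "peer-review"  # medium risk: division lead approval (24h)
    SHADOW_MODE = "shadow-mode"  # high risk: AI + human parallel for 4 weeks

# Explicit tier assignments; operation names here are illustrative only.
TIER_ASSIGNMENTS = {
    "page-scoring": Tier.AUTO_SHIP,
    "content-outline": Tier.PEER_REVIEW,
    "budget-recommendation": Tier.SHADOW_MODE,
}

def tier_for(operation: str) -> Tier:
    """Unassigned operations default to Peer Review, per the rule above."""
    return TIER_ASSIGNMENTS.get(operation, Tier.PEER_REVIEW)
```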
## Trust Guardrails
Non-negotiable rules for all agent outputs:
### No Ungrounded Quantified Claims
- Never: "This will result in 20-30% improvement"
- Instead: Use qualitative severity (critical/high/medium/low)
- If numeric: Must cite source ("Based on BigQuery data showing...")
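A simple lint for this guardrail might flag any numeric claim that lacks a named source. The patterns and phrasing below are assumptions, and a production check would be more nuanced:

```python
import re

# Flag quantified claims ("20-30% improvement", "save 10 hours") that do not
# cite a data source such as "Based on BigQuery data showing...".
QUANTIFIED = re.compile(r"\d+(\s*-\s*\d+)?\s*(%|percent\b|hours?\b|x\b)", re.IGNORECASE)
CITED = re.compile(r"\bbased on\s+\w+", re.IGNORECASE)

def ungrounded_claims(text: str) -> list[str]:
    """Return sentences that make a quantified claim without naming a source."""
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if QUANTIFIED.search(sentence) and not CITED.search(sentence):
            flagged.append(sentence.strip())
    return flagged

print(ungrounded_claims("This will result in 20-30% improvement."))
# -> ['This will result in 20-30% improvement.']
```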
### Required Attribution
- Data sources must be named in outputs
- "Based on DataForSEO competitive analysis..." not just "Analysis shows..."
### Human Review Gates
- Client-facing content requires human approval
- Strategic recommendations require peer review
- Budget/spend recommendations require expert approval
## Division-Specific Requirements
Each division has documented "Questions That Must Be Answered" for its agents. See the Executive Tracker Requirements for:
| Division | Key Initiatives | Data Requirements |
|---|---|---|
| SEO | Search Landscape, Quick Wins, Content Gap | SERP Snapshots, SeerSignals |
| PDM | Competitive Analysis, Keyword Analysis | AdClarity, SEMRush, Google Ads |
| Client Services | QBR Strategic Meetings | SeerSignals, Guru, previous deliverables |
| Creative | Brand Content Strategy | Competitor data, brand guidelines |
Pattern: Every agent must document:

1. Data Requirements (which MCP connections it needs)
2. Questions That Must Be Answered (success criteria)
3. Actions That Must Happen (output format)
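For example, the three required elements could be captured in a short spec checked into the registry. The structure below is illustrative, and the SEO example questions and actions are placeholders, not the documented requirements:

```python
# Illustrative agent spec covering the three required elements. Data sources
# come from the table above; questions and actions are placeholder examples.
seo_content_gap_spec = {
    "data_requirements": ["SERP Snapshots", "SeerSignals"],
    "questions_that_must_be_answered": [
        "Which competitor topics have no matching client content?",
        "Which gaps are quick wins versus long-term plays?",
    ],
    "actions_that_must_happen": [
        "Produce a prioritized content gap report with named data sources",
    ],
}
```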
## Decision Rule for Next Builds
Use this framework when evaluating new agent proposals:
```
IF task is repeatable AND data-dependent:
    → Build as shared skill/MCP capability first
    → Surface via NinjaCat (or other consumption layer)

IF task is mostly reasoning/drafting:
    → Agent-only build is acceptable

IF task doesn't reuse ≥3 shared modules:
    → Reject or redesign before approval
```
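Expressed as a function, the framework looks like the sketch below. It is a direct transcription of the rules above; the enum of build paths and the ordering of checks are assumptions:

```python
from enum import Enum

class BuildPath(Enum):
    SHARED_SKILL_FIRST = "build as shared skill/MCP capability, surface via NinjaCat"
    AGENT_ONLY = "agent-only build"
    REJECT_OR_REDESIGN = "reject or redesign before approval"

def evaluate_proposal(repeatable: bool, data_dependent: bool,
                      mostly_reasoning: bool, shared_modules_reused: int) -> BuildPath:
    # The reuse gate is applied first here; the framework does not state an
    # ordering, so treating it as an overriding check is an interpretation.
    if shared_modules_reused < 3:
        return BuildPath.REJECT_OR_REDESIGN
    if repeatable and data_dependent:
        return BuildPath.SHARED_SKILL_FIRST
    if mostly_reasoning:
        return BuildPath.AGENT_ONLY
    # Fall-through is not covered by the framework; default conservatively.
    return BuildPath.REJECT_OR_REDESIGN
```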
## Prioritization Criteria
Strongest candidates for next-wave agents:

1. Tasks that repeat across teams and deliverables
2. Tasks that require the same core steps every time (ingest → analyze → revise → QA)
3. Tasks that currently cause rework or cycle-time drag
## Build vs Consume Split
| Role | Tooling | Purpose |
|---|---|---|
| Builders | Claude, IDE, modular files | Create and QA agents |
| Practitioners | NinjaCat | Run agents with client data connections |
This split is intentional:

- Builders get power and flexibility
- Practitioners get simplicity and data access
- Agents are deployed once and consumed everywhere
## Year 1 Timeline Checkpoints
| Milestone | Target | Success Indicator |
|---|---|---|
| Q1 | Gateway Agent in production | 5+ users, 5x efficiency validated |
| Q2 | Second division online | Reuse rate ≥60% |
| Q3 | Mesh backbone operational | Registry + observability live |
| Q4 | Scale decision | Meet all metrics → expand; miss → diagnose |
## Measure, Learn, Iterate
Track not only efficiency metrics but also trust, reuse, and satisfaction metrics:
| Category | Metrics |
|---|---|
| Efficiency | Time saved, throughput, adoption rate |
| Trust | Escalation rate, error rate, correction rate |
| Reuse | Shared module adoption, duplicate code detection |
| Satisfaction | User ratings, qualitative feedback |
Monthly review: Dashboard review with division leads. Adjust priorities based on what's working.
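One way to structure the dashboard row feeding that review is a single record per agent per month that groups the four categories above (field names are assumed, not a defined schema; values shown are placeholders):

```python
# One dashboard row per agent per month; placeholder values, illustrative fields.
monthly_review_row = {
    "agent": "content-audit-outline",
    "month": "2026-01",
    "efficiency": {"hours_saved": 0.0, "throughput": 0, "adoption_rate": 0.0},
    "trust": {"escalation_rate": 0.0, "error_rate": 0.0, "correction_rate": 0.0},
    "reuse": {"shared_modules_used": 0, "duplicate_code_flags": 0},
    "satisfaction": {"avg_rating": 0.0, "qualitative_notes": []},
}
```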
## Related Documents
- Executive Overview — Business value and ROI
- Strategic Roadmap — What's live and coming soon
- ROI Calculator — Efficiency metrics
- Plugin Architecture — Technical specification
- Constitution — Governance principles
Last updated: January 2026