
Year 1 Success Criteria

"Success in Year 1 is not 'we have 500 agents live' — it's 'we have one agent delivering real value, built on a mesh-ready foundation, with trust mechanisms in place, ready to scale.'"

This document defines what "done" looks like for Year 1 of Seer's AI agent infrastructure. It provides measurable criteria, explicit boundaries, and decision rules for scaling.


Definition of Done

Year 1 is complete when the Gateway Agent (Content Audits & Outlines) meets all success criteria AND the foundation supports safe scaling.

Gateway Agent Success Metrics

| Criterion | Target | Measurement Method |
|---|---|---|
| Efficiency Gain | 15h → <3h (5x improvement) | Time tracking comparison |
| Quality Alignment | 85%+ match with senior strategist | Side-by-side audit comparison |
| Adoption | 5+ SEO team members actively using | Usage logs within first 2 weeks |
| Error Rate | Zero critical errors in Auto-Ship tier | Incident tracking |
| User Satisfaction | 4.5/5 rating | Post-workflow survey |
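
To make these targets checkable, time-tracking and survey data can be folded into the same four numbers. The sketch below is illustrative only: the `AuditRun` fields and the baseline-hours parameter are assumptions standing in for whatever the actual tracking captures.

```python
# Illustrative only: field names and data shapes are assumptions, not an
# existing tracking schema.
from dataclasses import dataclass

@dataclass
class AuditRun:
    hours_spent: float        # time-tracking entry for one audit/outline
    matched_strategist: bool  # side-by-side comparison verdict
    user: str                 # team member who ran the workflow
    satisfaction: float       # post-workflow survey rating, 1-5

def gateway_metrics(baseline_hours: float, runs: list[AuditRun]) -> dict:
    """Summarize Gateway Agent runs against the Year 1 targets."""
    n = len(runs)
    return {
        "efficiency_multiplier": baseline_hours * n / sum(r.hours_spent for r in runs),  # target: >= 5
        "quality_alignment": sum(r.matched_strategist for r in runs) / n,                # target: >= 0.85
        "active_users": len({r.user for r in runs}),                                     # target: >= 5
        "avg_satisfaction": sum(r.satisfaction for r in runs) / n,                       # target: >= 4.5
    }
```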

Foundation Success Metrics

| Criterion | Target | Evidence |
|---|---|---|
| Reuse Rate | New agents use ≥3 shared modules | Code review checklist |
| Trust Mechanisms | Escalation + audit logging operational | System logs |
| Cross-Platform | Works on OpenCode + Claude Code | Automated validation |
| MCP Reliability | 99.5% uptime for critical servers | Monitoring dashboard |

What Success Is NOT

Explicitly out of scope for Year 1:

| Anti-Goal | Why It's Wrong |
|---|---|
| Agent count ("120 agents live") | Vanity metric; encourages one-off builds |
| Feature velocity | Speed without quality degrades trust |
| Platform-specific capabilities | Lock-in prevents scaling |
| Autonomous write-backs to systems of record | Trust not yet established |
| Bespoke UI for each agent | If vendor UI suffices, don't build |

Decision rule: If proposed work optimizes for an anti-goal, reject it.


Minimum Viable Mesh Backbone

The infrastructure that makes the second and third agents cheaper and safer to build:

1. Registry

  • Agent ownership documented
  • Allowed data sources and tools per agent
  • Version tracking with semantic versioning
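
As a sketch of what a single registry record might capture (the field names and example values below are illustrative assumptions, not a prescribed schema):

```python
# Illustrative registry entry; fields and values are assumptions.
from dataclasses import dataclass

@dataclass
class AgentRegistryEntry:
    name: str
    owner: str                       # accountable team or person
    allowed_data_sources: list[str]  # MCP connections the agent may call
    allowed_tools: list[str]         # shared skills/modules it may invoke
    version: str                     # semantic version: MAJOR.MINOR.PATCH

gateway_agent = AgentRegistryEntry(
    name="content-audit-gateway",
    owner="SEO division",
    allowed_data_sources=["BigQuery", "DataForSEO"],
    allowed_tools=["exec-summary", "findings-rubric", "tl-checklist"],
    version="1.2.0",
)
```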

2. Observability

  • Execution logs with timestamps
  • Error traces with actionable messages
  • Evaluation results (quality scores, timing)
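
For example (schema assumed for illustration), each run could emit one structured record that covers all three bullets:

```python
# Illustrative structured execution log; the record schema is an assumption.
import json
import time

def log_execution(agent: str, status: str, error: str | None, evaluation: dict) -> str:
    """Serialize one agent run as a JSON log line."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "agent": agent,
        "status": status,          # "ok" or "error"
        "error": error,            # actionable message, or None on success
        "evaluation": evaluation,  # e.g. {"quality_score": 0.91, "runtime_s": 42}
    }
    return json.dumps(record)      # ship to whichever log sink is in use

print(log_execution("content-audit-gateway", "ok", None,
                    {"quality_score": 0.91, "runtime_s": 42}))
```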

3. Permissioning

  • OAuth 2.0 for all external services
  • Role-based access control (RBAC)
  • Least-privilege defaults
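
A least-privilege default reduces to a deny-by-default check; in this sketch the allow-lists are hard-coded stand-ins for whatever the registry actually stores.

```python
# Illustrative least-privilege check; the allow-lists are stand-ins for
# registry data, not a real permission store.
PERMISSIONS = {
    "content-audit-gateway": {
        "tools": {"exec-summary", "findings-rubric", "tl-checklist"},
        "data_sources": {"BigQuery", "DataForSEO"},
    },
}

def authorize(agent: str, *, tool: str | None = None, data_source: str | None = None) -> bool:
    """Deny by default: anything not explicitly allow-listed is refused."""
    entry = PERMISSIONS.get(agent)
    if entry is None:
        return False
    if tool is not None and tool not in entry["tools"]:
        return False
    if data_source is not None and data_source not in entry["data_sources"]:
        return False
    return True

assert authorize("content-audit-gateway", data_source="BigQuery")
assert not authorize("content-audit-gateway", data_source="Google Ads")
```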

4. Reusable Skills/Tools Layer

  • Shared modules (exec summary, findings rubric, TL checklist)
  • Standards skills (writing, quality, prompt-engineering)
  • MCP integrations (BigQuery, DataForSEO, Wrike)

Scaling gate: Net-new agents that don't reuse shared skills increase maintenance load and audit surface area. Reuse rate becomes a Year 1 approval criterion.


Quality Gate Tiers

Every agent output passes through the appropriate tier:

| Tier | Risk Level | Examples | Validation |
|---|---|---|---|
| Auto-Ship | Low | Page scoring, data extraction, report compilation | Automated tests pass → deploy |
| Peer Review | Medium | Content outlines, strategic recommendations | Division lead approval (24h) |
| Shadow Mode | High | Budget recommendations, QBR deliverables | AI and human run in parallel; 90%+ alignment for 4 weeks |

Tier assignment is mandatory. Unassigned operations default to Peer Review.
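
One way to make that default mechanical is a small routing helper; the mapping mirrors the table above, and the function name is an illustrative assumption.

```python
# Illustrative tier routing; the Peer Review default is the important part.
TIER_BY_RISK = {
    "low": "Auto-Ship",
    "medium": "Peer Review",
    "high": "Shadow Mode",
}

def quality_gate_tier(risk_level: str | None) -> str:
    """Return the validation tier for an operation; unassigned risk → Peer Review."""
    if risk_level is None:
        return "Peer Review"  # mandatory safe default for unassigned operations
    return TIER_BY_RISK.get(risk_level.lower(), "Peer Review")

assert quality_gate_tier("low") == "Auto-Ship"
assert quality_gate_tier(None) == "Peer Review"
assert quality_gate_tier("unknown") == "Peer Review"
```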


Trust Guardrails

Non-negotiable rules for all agent outputs:

No Ungrounded Quantified Claims

  • Never: "This will result in 20-30% improvement"
  • Instead: Use qualitative severity (critical/high/medium/low)
  • If numeric: Must cite source ("Based on BigQuery data showing...")

Required Attribution

  • Data sources must be named in outputs
  • "Based on DataForSEO competitive analysis..." not just "Analysis shows..."

Human Review Gates

  • Client-facing content requires human approval
  • Strategic recommendations require peer review
  • Budget/spend recommendations require expert approval

Division-Specific Requirements

Each division has documented "Questions That Must Be Answered" for their agents. See the Executive Tracker Requirements for:

| Division | Key Initiatives | Data Requirements |
|---|---|---|
| SEO | Search Landscape, Quick Wins, Content Gap | SERP Snapshots, SeerSignals |
| PDM | Competitive Analysis, Keyword Analysis | AdClarity, SEMRush, Google Ads |
| Client Services | QBR Strategic Meetings | SeerSignals, Guru, previous deliverables |
| Creative | Brand Content Strategy | Competitor data, brand guidelines |

Pattern: Every agent must document:

  1. Data Requirements (which MCP connections it needs)
  2. Questions That Must Be Answered (success criteria)
  3. Actions That Must Happen (output format)
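
Assuming a lightweight spec object (an illustration, not an established template), the pattern could be captured like this:

```python
# Illustrative agent spec for the three-part pattern; the example values are
# drawn from this document, the structure itself is an assumption.
from dataclasses import dataclass

@dataclass
class AgentSpec:
    division: str
    data_requirements: list[str]                # which MCP connections
    questions_that_must_be_answered: list[str]  # success criteria
    actions_that_must_happen: list[str]         # output format / deliverables

seo_content_gap = AgentSpec(
    division="SEO",
    data_requirements=["SERP Snapshots", "SeerSignals"],
    questions_that_must_be_answered=["Where are the highest-impact content gaps?"],
    actions_that_must_happen=["Produce a prioritized content-gap outline for peer review"],
)
```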


Decision Rule for Next Builds

Use this framework when evaluating new agent proposals:

IF task is repeatable AND data-dependent:
    → Build as shared skill/MCP capability first
    → Surface via NinjaCat (or other consumption layer)

IF task is mostly reasoning/drafting:
    → Agent-only build is acceptable

IF task doesn't reuse ≥3 shared modules:
    → Reject or redesign before approval
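
If the framework is applied as an intake check, it could be expressed as a gating function like the one below; the proposal fields and verdict strings are assumptions layered on the rules above.

```python
# Illustrative intake gate for the decision rules above; the inputs are
# assumptions about what a proposal form might capture.
def evaluate_proposal(*, repeatable: bool, data_dependent: bool,
                      reasoning_or_drafting: bool, shared_modules_reused: int) -> str:
    if shared_modules_reused < 3:
        return "Reject or redesign before approval"
    if repeatable and data_dependent:
        return "Build as shared skill/MCP capability first; surface via consumption layer"
    if reasoning_or_drafting:
        return "Agent-only build is acceptable"
    return "Needs further review"

print(evaluate_proposal(repeatable=True, data_dependent=True,
                        reasoning_or_drafting=False, shared_modules_reused=4))
```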

Prioritization Criteria

Strongest candidates for next-wave agents:

  1. Tasks that repeat across teams and deliverables
  2. Tasks that require the same core steps every time (ingest → analyze → revise → QA)
  3. Tasks that currently cause rework or cycle-time drag


Build vs Consume Split

| Role | Tooling | Purpose |
|---|---|---|
| Builders | Claude, IDE, modular files | Create and QA agents |
| Practitioners | NinjaCat | Run agents with client data connections |

This split is intentional:

  • Builders get power and flexibility
  • Practitioners get simplicity and data access
  • Agents deploy once, consume everywhere


Year 1 Timeline Checkpoints

| Milestone | Target | Success Indicator |
|---|---|---|
| Q1 | Gateway Agent in production | 5+ users, 5x efficiency validated |
| Q2 | Second division online | Reuse rate ≥60% |
| Q3 | Mesh backbone operational | Registry + observability live |
| Q4 | Scale decision | Meet all metrics → expand; miss → diagnose |

Measure, Learn, Iterate

Track not just efficiency metrics, but trust metrics:

| Category | Metrics |
|---|---|
| Efficiency | Time saved, throughput, adoption rate |
| Trust | Escalation rate, error rate, correction rate |
| Reuse | Shared module adoption, duplicate code detection |
| Satisfaction | User ratings, qualitative feedback |

Monthly review: dashboard walkthrough with division leads; adjust priorities based on what's working.
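
A monthly dashboard snapshot could carry all four categories in one record; the field names below are illustrative assumptions.

```python
# Illustrative monthly dashboard record; field names are assumptions.
from dataclasses import dataclass

@dataclass
class MonthlySnapshot:
    # Efficiency
    hours_saved: float
    adoption_rate: float
    # Trust
    escalation_rate: float
    error_rate: float
    correction_rate: float
    # Reuse
    shared_module_adoption: float
    duplicate_code_findings: int
    # Satisfaction
    avg_user_rating: float
```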



Last updated: January 2026