# Year 1 Success Criteria
"Success in Year 1 is not 'we have 500 agents live' — it's 'we have one agent delivering real value, built on a mesh-ready foundation, with trust mechanisms in place, ready to scale.'"
This document defines what "done" looks like for Year 1 of Seer's AI agent infrastructure. It provides measurable criteria, explicit boundaries, and decision rules for scaling.
## Definition of Done
Year 1 is complete when the Gateway Agent (Content Audits & Outlines) meets all success criteria AND the foundation supports safe scaling.
### Gateway Agent Success Metrics
| Criterion | Target | Measurement Method |
|---|---|---|
| Efficiency Gain | 15h → <3h (5x improvement) | Time tracking comparison |
| Quality Alignment | 85%+ match with senior strategist | Side-by-side audit comparison |
| Adoption | 5+ SEO team members actively using | Usage logs within first 2 weeks |
| Error Rate | Zero critical errors in Auto-Ship tier | Incident tracking |
| User Satisfaction | 4.5/5 rating | Post-workflow survey |
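As a minimal sketch of how these targets could be checked automatically at review time (the thresholds mirror the table above, but the field names and data model are assumptions, not an existing system):

```python
from dataclasses import dataclass

# Hypothetical snapshot of Gateway Agent metrics pulled from time tracking,
# usage logs, incident tracking, and the post-workflow survey.
@dataclass
class GatewayMetrics:
    avg_hours_per_audit: float  # time tracking (baseline: 15h manual)
    quality_alignment: float    # share of audits matching senior strategist, 0-1
    active_users: int           # distinct SEO team members in usage logs
    critical_errors: int        # Auto-Ship tier incidents
    satisfaction: float         # post-workflow survey, 1-5 scale

def meets_year1_targets(m: GatewayMetrics) -> bool:
    """Return True only if every Gateway Agent criterion in the table is met."""
    return (
        m.avg_hours_per_audit < 3.0      # 15h -> <3h (5x improvement)
        and m.quality_alignment >= 0.85  # 85%+ match with senior strategist
        and m.active_users >= 5          # 5+ active SEO team members
        and m.critical_errors == 0       # zero critical Auto-Ship errors
        and m.satisfaction >= 4.5        # 4.5/5 rating
    )
```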
### Foundation Success Metrics
| Criterion | Target | Evidence |
|---|---|---|
| Reuse Rate | New agents use ≥3 shared modules | Code review checklist |
| Trust Mechanisms | Escalation + audit logging operational | System logs |
| Cross-Platform | Works on OpenCode + Claude Code | Automated validation |
| MCP Reliability | 99.5% uptime for critical servers | Monitoring dashboard |
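For context on the 99.5% target, the implied downtime budget is easy to compute (a simple illustration, not part of the monitoring setup):

```python
# Downtime allowed by a 99.5% uptime target over common windows.
def downtime_budget_hours(uptime_target: float, window_hours: float) -> float:
    return (1.0 - uptime_target) * window_hours

print(downtime_budget_hours(0.995, 24 * 30))   # ~3.6 hours per 30-day month
print(downtime_budget_hours(0.995, 24 * 365))  # ~43.8 hours per year
```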
## What Success Is NOT
Explicitly out of scope for Year 1:
| Anti-Goal | Why It's Wrong |
|---|---|
| Agent count ("120 agents live") | Vanity metric; encourages one-off builds |
| Feature velocity | Speed without quality degrades trust |
| Platform-specific capabilities | Lock-in prevents scaling |
| Autonomous write-backs to systems of record | Trust not yet established |
| Bespoke UI for each agent | If vendor UI suffices, don't build |
Decision rule: If proposed work optimizes for an anti-goal, reject it.
## Minimum Viable Mesh Backbone
The infrastructure that makes the second and third agents cheaper and safer to build:
### 1. Registry
- Agent ownership documented
- Allowed data sources and tools per agent
- Version tracking with semantic versioning
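One way to represent a registry entry is a small structured record per agent. The sketch below is illustrative: only the three requirements above (ownership, allowed sources and tools, version) come from this document; the field names and values are assumptions.

```python
# Illustrative registry entry for the Gateway Agent (schema is a sketch).
gateway_agent_entry = {
    "name": "content-audit-outline",
    "owner": "seo-division",                            # ownership documented
    "allowed_data_sources": ["BigQuery", "DataForSEO"], # per-agent allow list
    "allowed_tools": ["exec-summary", "findings-rubric", "tl-checklist"],
    "version": "1.2.0",                                 # semantic versioning
}
```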
### 2. Observability
- Execution logs with timestamps
- Error traces with actionable messages
- Evaluation results (quality scores, timing)
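A minimal example of the kind of structured log record this implies (the format and field names are assumed, not prescribed):

```python
import json
from datetime import datetime, timezone

# One structured record per agent execution: timestamp, outcome, an actionable
# error message when something fails, and evaluation results.
record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "agent": "content-audit-outline",
    "status": "error",
    "error": "BigQuery MCP call timed out after 30s; retry or check credentials",
    "evaluation": {"quality_score": None, "duration_s": 31.2},
}
print(json.dumps(record))
```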
### 3. Permissioning
- OAuth 2.0 for all external services
- Role-based access control (RBAC)
- Least-privilege defaults
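A least-privilege default can be as simple as denying anything not explicitly granted. This is a sketch only; the role and permission names are hypothetical:

```python
# Deny-by-default permission check: an agent may use only the data sources and
# tools explicitly granted to it in the registry.
ROLE_GRANTS = {
    "content-audit-outline": {"bigquery.read", "dataforseo.read"},
}

def is_allowed(role: str, permission: str) -> bool:
    """Least-privilege default: anything not granted is denied."""
    return permission in ROLE_GRANTS.get(role, set())

assert is_allowed("content-audit-outline", "bigquery.read")
assert not is_allowed("content-audit-outline", "wrike.write")  # never granted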
### 4. Reusable Skills/Tools Layer
- Shared modules (exec summary, findings rubric, TL checklist)
- Standards skills (writing, quality, prompt-engineering)
- MCP integrations (BigQuery, DataForSEO, Wrike)
Scaling gate: Net-new agents that don't reuse shared skills increase maintenance load and audit surface area. Reuse rate becomes a Year 1 approval criterion.
## Quality Gate Tiers
Every agent output passes through the appropriate tier:
| Tier | Risk Level | Examples | Validation |
|---|---|---|---|
| Auto-Ship | Low | Page scoring, data extraction, report compilation | Automated tests pass → deploy |
| Peer Review | Medium | Content outlines, strategic recommendations | Division lead approval (24h) |
| Shadow Mode | High | Budget recommendations, QBR deliverables | AI + human parallel; 90%+ alignment for 4 weeks |
Tier assignment is mandatory. Unassigned operations default to Peer Review.
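A sketch of how tier assignment and the Peer Review default could be enforced in code (the tier values mirror the table; the operation names and routing function are assumptions):

```python
from enum import Enum

class Tier(Enum):
    AUTO_SHIP = "auto-ship"      # low risk: automated tests pass -> deploy
    PEER_REVIEW = "peer-review"  # medium risk: division lead approval (24h)
    SHADOW_MODE = "shadow-mode"  # high risk: AI + human parallel for 4 weeks

# Explicit tier assignments; operation names here are illustrative only.
TIER_ASSIGNMENTS = {
    "page-scoring": Tier.AUTO_SHIP,
    "content-outline": Tier.PEER_REVIEW,
    "budget-recommendation": Tier.SHADOW_MODE,
}

def tier_for(operation: str) -> Tier:
    """Unassigned operations default to Peer Review, per the rule above."""
    return TIER_ASSIGNMENTS.get(operation, Tier.PEER_REVIEW)
```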
## Trust Guardrails
Non-negotiable rules for all agent outputs:
### No Ungrounded Quantified Claims
- Never: "This will result in 20-30% improvement"
- Instead: Use qualitative severity (critical/high/medium/low)
- If numeric: Must cite source ("Based on BigQuery data showing...")
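A simple lint for this guardrail might flag any numeric claim that lacks a named source. The patterns and phrasing below are assumptions, and a production check would be more nuanced:

```python
import re

# Flag quantified claims ("20-30% improvement", "save 10 hours") that do not
# cite a data source such as "Based on BigQuery data showing...".
QUANTIFIED = re.compile(r"\d+(\s*-\s*\d+)?\s*(%|percent\b|hours?\b|x\b)", re.IGNORECASE)
CITED = re.compile(r"\bbased on\s+\w+", re.IGNORECASE)

def ungrounded_claims(text: str) -> list[str]:
    """Return sentences that make a quantified claim without naming a source."""
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if QUANTIFIED.search(sentence) and not CITED.search(sentence):
            flagged.append(sentence.strip())
    return flagged

print(ungrounded_claims("This will result in 20-30% improvement."))
# -> ['This will result in 20-30% improvement.']
```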
### Required Attribution
- Data sources must be named in outputs
- "Based on DataForSEO competitive analysis..." not just "Analysis shows..."
### Human Review Gates
- Client-facing content requires human approval
- Strategic recommendations require peer review
- Budget/spend recommendations require expert approval
## Division-Specific Requirements
Each division has documented "Questions That Must Be Answered" for its agents. See the Executive Tracker Requirements for:
| Division | Key Initiatives | Data Requirements |
|---|---|---|
| SEO | Search Landscape, Quick Wins, Content Gap | SERP Snapshots, SeerSignals |
| PDM | Competitive Analysis, Keyword Analysis | AdClarity, SEMRush, Google Ads |
| Client Services | QBR Strategic Meetings | SeerSignals, Guru, previous deliverables |
| Creative | Brand Content Strategy | Competitor data, brand guidelines |
Pattern: Every agent must document:

1. Data Requirements (which MCP connections it needs)
2. Questions That Must Be Answered (success criteria)
3. Actions That Must Happen (output format)
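For example, the three required elements could be captured in a short spec checked into the registry. The structure below is illustrative, and the SEO example questions and actions are placeholders, not the documented requirements:

```python
# Illustrative agent spec covering the three required elements. Data sources
# come from the table above; questions and actions are placeholder examples.
seo_content_gap_spec = {
    "data_requirements": ["SERP Snapshots", "SeerSignals"],
    "questions_that_must_be_answered": [
        "Which competitor topics have no matching client content?",
        "Which gaps are quick wins versus long-term plays?",
    ],
    "actions_that_must_happen": [
        "Produce a prioritized content gap report with named data sources",
    ],
}
```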
## Decision Rule for Next Builds
Use this framework when evaluating new agent proposals:
```
IF task is repeatable AND data-dependent:
    → Build as shared skill/MCP capability first
    → Surface via NinjaCat (or other consumption layer)

IF task is mostly reasoning/drafting:
    → Agent-only build is acceptable

IF task doesn't reuse ≥3 shared modules:
    → Reject or redesign before approval
```
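Expressed as a function, the framework looks like the sketch below. It is a direct transcription of the rules above; the enum of build paths and the ordering of checks are assumptions:

```python
from enum import Enum

class BuildPath(Enum):
    SHARED_SKILL_FIRST = "build as shared skill/MCP capability, surface via NinjaCat"
    AGENT_ONLY = "agent-only build"
    REJECT_OR_REDESIGN = "reject or redesign before approval"

def evaluate_proposal(repeatable: bool, data_dependent: bool,
                      mostly_reasoning: bool, shared_modules_reused: int) -> BuildPath:
    # The reuse gate is applied first here; the framework does not state an
    # ordering, so treating it as an overriding check is an interpretation.
    if shared_modules_reused < 3:
        return BuildPath.REJECT_OR_REDESIGN
    if repeatable and data_dependent:
        return BuildPath.SHARED_SKILL_FIRST
    if mostly_reasoning:
        return BuildPath.AGENT_ONLY
    # Fall-through is not covered by the framework; default conservatively.
    return BuildPath.REJECT_OR_REDESIGN
```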
## Prioritization Criteria
Strongest candidates for next-wave agents:

1. Tasks that repeat across teams and deliverables
2. Tasks that require the same core steps every time (ingest → analyze → revise → QA)
3. Tasks that currently cause rework or cycle-time drag
## Build vs Consume Split
| Role | Tooling | Purpose |
|---|---|---|
| Builders | Claude, IDE, modular files | Create and QA agents |
| Practitioners | NinjaCat | Run agents with client data connections |
This split is intentional:

- Builders get power and flexibility
- Practitioners get simplicity and data access
- Agents are deployed once and consumed everywhere
## Year 1 Timeline Checkpoints
| Milestone | Target | Success Indicator |
|---|---|---|
| Q1 | Gateway Agent in production | 5+ users, 5x efficiency validated |
| Q2 | Second division online | Reuse rate ≥60% |
| Q3 | Mesh backbone operational | Registry + observability live |
| Q4 | Scale decision | Meet all metrics → expand; miss → diagnose |
## Measure, Learn, Iterate
Track not only efficiency metrics but also trust, reuse, and satisfaction metrics:
| Category | Metrics |
|---|---|
| Efficiency | Time saved, throughput, adoption rate |
| Trust | Escalation rate, error rate, correction rate |
| Reuse | Shared module adoption, duplicate code detection |
| Satisfaction | User ratings, qualitative feedback |
Monthly review: Dashboard review with division leads. Adjust priorities based on what's working.
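One way to structure the dashboard row feeding that review is a single record per agent per month that groups the four categories above (field names are assumed, not a defined schema; values shown are placeholders):

```python
# One dashboard row per agent per month; placeholder values, illustrative fields.
monthly_review_row = {
    "agent": "content-audit-outline",
    "month": "2026-01",
    "efficiency": {"hours_saved": 0.0, "throughput": 0, "adoption_rate": 0.0},
    "trust": {"escalation_rate": 0.0, "error_rate": 0.0, "correction_rate": 0.0},
    "reuse": {"shared_modules_used": 0, "duplicate_code_flags": 0},
    "satisfaction": {"avg_rating": 0.0, "qualitative_notes": []},
}
```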
## Related Documents
- Executive Overview — Business value and ROI
- Strategic Roadmap — What's live and coming soon
- ROI Calculator — Efficiency metrics
- Plugin Architecture — Technical specification
- Constitution — Governance principles
Last updated: January 2026