QA Agent Infrastructure: Trust But Verify

Last Updated: 2026-01-29
Related Slack Discussion: "Trust + Verify, QA Agent" (AI Channel)
Status: ✅ Production - Fully Implemented


Overview

Seer's QA Agent Infrastructure provides automated quality validation for AI-generated deliverables. This system ensures data accuracy, proper citation, and brand compliance before client delivery.

Core Components

| Component | Purpose | Location |
| --- | --- | --- |
| Trust But Verify Playbook | Practitioner guide for data validation | docs/wiki/how-to/trust-but-verify.md |
| /qa-check Command | On-demand quality validation | plugins/core-dependencies/commands/qa-check.md |
| Stop Hook (QA Check) | Auto-validation before completion | plugins/core-dependencies/scripts/stop-qa-check.sh |
| Quality Standards Skill | Auto-activated QA standards | plugins/core-dependencies/skills/quality-standards/ |
| PostToolUse Hook | File change tracking for verification | plugins/core-dependencies/scripts/post-tool-use.sh |

How It Works

1. Automatic Quality Gates (Stop Hook)

When Claude attempts to complete work, the Stop hook automatically:

  • Detects completion claims ("done", "finished", "works")
  • Triggers QA validation checks
  • Blocks completion if critical issues found
  • Provides actionable feedback for fixes

What it checks:

  • ✅ Test coverage for code changes
  • ✅ Verification of completion claims
  • ✅ Commit checkpoints for multi-file edits
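The detection step can be sketched in a few lines (illustrative Python; the production logic lives in stop-qa-check.sh, and the claim list and function name here are assumptions):

```python
import re

# Hypothetical sketch of the Stop hook's claim detection.
COMPLETION_CLAIMS = re.compile(r"\b(done|finished|works|completed?)\b", re.IGNORECASE)

def should_trigger_qa(final_message: str, edited_files: list[str]) -> bool:
    """Trigger QA validation when a completion claim accompanies file edits."""
    return bool(COMPLETION_CLAIMS.search(final_message)) and len(edited_files) > 0
```

The key design point is the conjunction: a completion claim with no edits in the session is not worth blocking on, and edits without a claim mean work is still in progress.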

2. On-Demand Validation (/qa-check)

Runs comprehensive checks on the current deliverable:

/qa-check

Output:

QA CHECK RESULTS
================

Deliverable Type: slide-deck

Action Titles: PASS
  ✓ All titles are conclusions

Data Sources: FAIL
  ✗ Line 42: "Traffic increased 34%" - no source cited
  ✗ Line 67: "Competitor rankings improved" - no data reference

Brand Compliance: WARN
  ⚠ Line 89: "We recommend" - prefer "The data shows"

Completeness: PASS
  ✓ No placeholders found

Overall: NEEDS REVISION

3. Quality Standards Skill (Auto-Activated)

Activates automatically when working with:

  • Data analysis or metrics
  • Projections/forecasts
  • Recommendations based on data
  • Client-facing deliverables

Core Principles:

  1. Explicit is better than implicit - State assumptions, cite sources
  2. Evidence-based recommendations - Every claim backed by data
  3. Conservative estimates - Under-promise, over-deliver
  4. Sanity checks - Validate math, logic, conclusions

Quality Check Categories

1. Action Titles (Presentations)

Rule: Slide titles must be conclusions, not labels.

| ❌ Bad (Label) | ✅ Good (Conclusion) |
| --- | --- |
| Traffic Overview | Organic traffic increased 34% YoY |
| Performance Summary | Mobile conversion rate improved despite traffic decline |
| Q3 Results | Competitive gap closed by 15 positions |

2. Data Source Verification

Rule: All metrics must cite sources with date ranges.

Required format:

"Based on BigQuery OrganicRankings_Daily table (March 1-31, 2024), 
traffic increased 18% (12,400 → 14,616 sessions)."

Acceptable sources:

  • BigQuery / Seer Signals
  • Google Analytics 4
  • Google Search Console
  • DataForSEO
  • Platform APIs (Google Ads, Meta Ads, etc.)
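The citation rule can be approximated with a simple check (a sketch only - the metric pattern and source list are illustrative, not the actual /qa-check rules):

```python
import re

# Accepted source names, per the list above.
ACCEPTED_SOURCES = ("BigQuery", "Seer Signals", "Google Analytics 4", "GA4",
                    "Google Search Console", "GSC", "DataForSEO")

# Rough metric detector: percentages or comma-grouped counts (assumed pattern).
METRIC = re.compile(r"\d+(\.\d+)?%|\d{1,3}(,\d{3})+")

def flag_uncited_metric(sentence: str) -> bool:
    """True when a sentence contains a metric but names no accepted source."""
    has_metric = bool(METRIC.search(sentence))
    has_source = any(src in sentence for src in ACCEPTED_SOURCES)
    return has_metric and not has_source
```

For example, "Traffic increased 34%" is flagged, while the required format shown above passes because it names BigQuery and a date range.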

3. Projection Methodology

Rule: Projections must show calculation + assumptions.

Example:

Potential traffic lift: 180-220 clicks/month

Methodology:
- Current position: 7
- Target position: 3
- Monthly search volume: 2,400
- CTR improvement: 4.5% → 9.2% (AWR 2024 CTR Study)
- Calculation: 2,400 × (0.092 - 0.045) = 113 clicks (base)
- Scenarios: Best (+50%), Better (+30%), Good (+10%)

Assumptions:
1. Competitive landscape remains stable
2. Content implemented within 30 days
3. Technical SEO issues addressed
4. No major algorithm updates
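The calculation above can be reproduced in a few lines (the figures and scenario multipliers come from the example; the function name is ours):

```python
def projected_clicks(volume: int, ctr_target: float, ctr_current: float) -> int:
    """Base monthly click lift from a CTR improvement at a given search volume."""
    return round(volume * (ctr_target - ctr_current))

# Figures from the example: 2,400 searches/month, CTR 4.5% -> 9.2%
base = projected_clicks(2_400, 0.092, 0.045)  # 113 clicks (base)

# Scenario bands from the methodology: Good +10%, Better +30%, Best +50%
scenarios = {label: round(base * mult)
             for label, mult in [("Good", 1.10), ("Better", 1.30), ("Best", 1.50)]}
```

Keeping the arithmetic in code rather than prose makes the sanity check trivial: anyone can re-run it against the stated volume and CTR figures.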

4. Brand Compliance (Seer Voice)

Issues flagged:

  • Passive voice where active is better
  • Jargon without explanation
  • "We recommend" (prefer "The data shows" or "Testing confirmed")
  • Overly formal language
  • Guarantee language ("will", "guaranteed", "100% will")

5. Completeness

Issues flagged:

  • [TODO], [TBD], [PLACEHOLDER] markers
  • Empty sections in templates
  • Incomplete sentences or bullet points
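The brand-compliance and completeness categories lend themselves to simple pattern checks. A minimal sketch (the patterns are illustrative, not the actual /qa-check implementation):

```python
import re

# Assumed patterns for placeholder markers and guarantee language.
PLACEHOLDERS = re.compile(r"\[(TODO|TBD|PLACEHOLDER)\]", re.IGNORECASE)
GUARANTEES = re.compile(r"\b(guaranteed|100% will|will increase)\b", re.IGNORECASE)

def completeness_issues(text: str) -> list[str]:
    """Return one issue string per flagged line, with line numbers."""
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if PLACEHOLDERS.search(line):
            issues.append(f"Line {lineno}: placeholder marker")
        if GUARANTEES.search(line):
            issues.append(f"Line {lineno}: guarantee language")
    return issues
```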

QA Tiers (When Peer Review Required)

Auto-Ship (No Peer Review)

  • Data extraction queries
  • Keyword research (volume, difficulty, SERP features)
  • Competitive research (what competitors are doing)
  • Traffic/ranking reports

Peer Review Required

  • Strategic recommendations
  • Content differentiation strategies
  • Client-facing deliverables (audits, outlines, analyses)
  • ROI projections and Expected Outcome tables
  • Priority recommendations and roadmaps

Shadow Mode (Senior Review + Validation)

  • New methodologies not yet proven
  • Experimental approaches
  • High-stakes client deliverables (>$100K projected impact)
  • Sensitive competitive positioning

Decision rule:

  • If deliverable includes projections/recommendations → Peer Review
  • If deliverable goes directly to client → Peer Review
  • If stakes are high (revenue, relationship) → Shadow Mode
  • If just data extraction → Auto-Ship
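The decision rule reduces to a small function (tier names follow the doc; the boolean flags are hypothetical inputs):

```python
def qa_tier(has_projections: bool, client_facing: bool,
            high_stakes: bool, data_extraction_only: bool) -> str:
    """Map deliverable attributes to a QA tier, checking highest stakes first."""
    if high_stakes:
        return "Shadow Mode"
    if has_projections or client_facing:
        return "Peer Review"
    if data_extraction_only:
        return "Auto-Ship"
    return "Peer Review"  # default to the safer tier when unclassified
```

Ordering matters: high stakes wins over everything else, and anything unclassified falls back to Peer Review rather than Auto-Ship.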

Verification Paths by Data Source

BigQuery / Seer Signals

What to verify:

-- Re-run query to confirm data freshness
SELECT * FROM `project.dataset.table`
WHERE org_name = 'ClientName'
  AND date >= '2024-03-01'
  AND date <= '2024-03-31'

Check:

  • ✅ org_name filter matches client
  • ✅ Date range matches deliverable
  • ✅ No outliers or anomalies

Google Analytics 4

Where to verify:

  • GA4 UI → Reports → Acquisition → Traffic acquisition
  • GA4 UI → Reports → Engagement → Pages and screens
  • GA4 UI → Explore → Free form

Check:

  • ✅ Date range matches exactly
  • ✅ Segment filters correct (device, geography, user type)
  • ✅ Cross-check conversions with CRM (expect 5-15% variance)

Google Search Console

Where to verify:

  • GSC UI → Performance → Search results
  • GSC UI → Performance → Queries tab (keyword data)
  • GSC UI → Performance → Pages tab (landing page data)

Check:

  • ✅ Date range matches (GSC has 2-3 day lag)
  • ✅ Filter matches (device, country, search type)
  • ✅ Compare GSC clicks with GA4 organic (expect 10-20% variance)

Why variance exists:

  • GSC tracks Google-only; GA4 includes all search engines
  • GSC counts clicks; GA4 counts sessions
  • Bot traffic filtered differently

DataForSEO (SERP Analysis)

What to verify:

  • Manual Google search - Confirm rankings/SERP features
  • SEMrush/Ahrefs - Cross-check keyword volumes
  • Multiple browsers/locations - Account for personalization

Check:

  • ✅ Rankings can fluctuate daily - note snapshot date
  • ✅ SERP features are dynamic - verify current state
  • ✅ Competitor analysis is point-in-time

The Rule of Five (Self-Review Protocol)

Agent outputs are first drafts, not final deliverables.
Self-review at least 5 times before delivery:

Pass 1: Data Accuracy

  • Are all metrics cited with sources?
  • Are calculations correct?
  • Do date ranges make sense?
  • Is sample size adequate?

Pass 2: Logic & Reasoning

  • Do recommendations follow from data?
  • Are there alternative explanations?
  • Did I consider external factors (seasonality, algorithm updates)?
  • Are assumptions reasonable?

Pass 3: Client Context

  • Does this align with client's business model?
  • Is language client-appropriate (no jargon)?
  • Are recommendations actionable for their team?
  • Is tone suitable for relationship stage?

Pass 4: Deliverable Quality

  • Is formatting clean and consistent?
  • Do links work?
  • Are tables and charts clear?
  • Is document structure logical?

Pass 5: Final QA Gate

  • Run /qa-check for automated validation
  • Review Quality Standards
  • Review the QA Review Checklist (quality-standards skill resources)
  • Confirm peer review if required

Why 5 passes? Each pass catches a different class of issue: the first catches data problems, while the last catches subtle tone or framing issues.


Common QA Block Scenarios

| Block Message | What It Means | How to Fix |
| --- | --- | --- |
| "Metrics cited without data source" | Numbers without citation | Add (Source: GA4, March 2024) or (BigQuery OrganicRankings_Daily) |
| "Completion claimed but no verification" | Said "done" without proof | Run tests/build commands and confirm they pass |
| "Action titles required" | Slide titles are labels | Change to conclusions ("Traffic grew 34% YoY") |
| "Unsupported claim detected" | "Studies show..." without citation | Cite the specific study or rephrase as analysis |
| "Overpromising language detected" | "Will increase", "guaranteed" | Use qualified language: "could increase", "may improve" |

Integration with Core Infrastructure

Hook Execution Flow

  1. User completes work
  2. Stop hook fires automatically
  3. Hook reads edit-log.txt (from PostToolUse)
  4. Hook checks for:
       - TDD violations (source edited without tests)
       - Verification claims ("done", "works")
       - Commit reminders (3+ files edited)
  5. Hook provides gentle reminders:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💭 Remember: Write tests for changed code
✅ Before claiming complete: Run tests and verify
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
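In Python terms, the flow above might look like this (illustrative only - the production hook is stop-qa-check.sh, and the log path, claim list, and threshold are assumptions):

```python
from pathlib import Path

def stop_hook_reminders(edit_log: Path, final_message: str) -> list[str]:
    """Build context-aware reminders from the session edit log and final message."""
    edited = ([l for l in edit_log.read_text().splitlines() if l.strip()]
              if edit_log.exists() else [])
    reminders = []
    # Verification claims: remind rather than hard-block.
    if any(claim in final_message.lower() for claim in ("done", "works", "finished")):
        reminders.append("✅ Before claiming complete: Run tests and verify")
    # Commit reminder once 3+ files have been edited this session.
    if len(edited) >= 3:
        reminders.append(f"💾 {len(edited)} files edited - consider a commit checkpoint")
    return reminders
```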

File Change Tracking (PostToolUse Hook)

When: After every Edit/Write/MultiEdit operation

What it tracks:

  • File path and edit type
  • Division categorization (SEO, PDM, Analytics, etc.)
  • Session cache (.claude/edit-cache/)
  • Cross-hook state (context/edit-log.txt)

Why: Enables the Stop hook to provide context-aware reminders (e.g., "Remember to test the 3 files you edited")
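A minimal sketch of the tracking step (the real hook is post-tool-use.sh; the timestamped, tab-separated log format here is an assumption):

```python
from datetime import datetime, timezone
from pathlib import Path

def record_edit(log_path: Path, file_path: str, edit_type: str) -> None:
    """Append one edit record so the Stop hook can reason about the session."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with log_path.open("a") as log:
        log.write(f"{stamp}\t{edit_type}\t{file_path}\n")
```

Appending one line per operation keeps the cross-hook state dead simple: the Stop hook only needs to count lines and read file paths.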


Practitioner Shortcuts

"Does this number look right?"

Sanity check questions:

  • Would a 500% traffic increase actually be realistic?
  • Do these rankings align with competitive landscape?
  • Is this CTR projection reasonable for this keyword type?
  • Does conversion rate match client's historical average?

Quick verification:

  • Compare to prior period (does trend make sense?)
  • Check against industry benchmarks (within 2x of normal?)
  • Cross-reference with another data source (GA4 vs. GSC)
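The quick checks above can be expressed as small helpers (the 2x benchmark factor follows the bullet list; the function names are ours):

```python
def within_benchmark(value: float, benchmark: float, factor: float = 2.0) -> bool:
    """Is the value within `factor`x of the industry benchmark, in either direction?"""
    return benchmark / factor <= value <= benchmark * factor

def period_change_pct(current: float, prior: float) -> float:
    """Percent change vs the prior period, for a quick trend gut-check."""
    return (current - prior) / prior * 100
```

For instance, the 12,400 → 14,616 session example from earlier works out to roughly an 18% lift, which is easy to eyeball against the cited figure.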

"Where do I verify this metric?"

| Metric Type | Primary Source | Backup Source |
| --- | --- | --- |
| Organic traffic, rankings | Google Search Console | GA4 organic sessions |
| Sessions, conversions | Google Analytics 4 | Client CRM/backend |
| SERP features, competitors | DataForSEO | Manual Google search |
| Paid campaign performance | Google Ads / Meta Ads | Platform UI |
| Keyword volumes | Seer Signals / DataForSEO | SEMrush / Ahrefs |

"What if sources conflict?"

Variance tolerance guidelines:

| Comparison | Expected Variance | Action if Exceeded |
| --- | --- | --- |
| GA4 vs. CRM conversions | 5-15% | Investigate attribution, tracking lag |
| GSC clicks vs. GA4 organic | 10-20% | Note in deliverable (different definitions) |
| DataForSEO rankings vs. manual | ±2 positions | Use manual as source of truth |
| Backend conversions vs. pixel | >15% | Flag for CAPI/Redundant Event Pipeline |
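The tolerance bands translate directly into a check (a sketch; the keys and upper bounds mirror the guidelines above, expressed as fractions):

```python
# Upper tolerance per comparison, as a fraction of the first source's value.
TOLERANCES = {
    "ga4_vs_crm": 0.15,          # GA4 vs. CRM conversions: expect 5-15%
    "gsc_vs_ga4_organic": 0.20,  # GSC clicks vs. GA4 organic: expect 10-20%
}

def variance(a: float, b: float) -> float:
    """Absolute variance between two sources, relative to the first."""
    return abs(a - b) / a

def exceeds_tolerance(comparison: str, a: float, b: float) -> bool:
    """True when observed variance is beyond the expected band for this pair."""
    return variance(a, b) > TOLERANCES[comparison]
```

When the check returns True, follow the "Action if Exceeded" column rather than silently picking one number.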

Core QA Infrastructure

For Builders

The full QA skill resources (qa-review.md, fact-checking.md, quality.md) are in the plugin source at plugins/core-dependencies/skills/quality-standards/resources/.

Implementation Notes

Source: Slack Discussion Context

This document synthesizes QA infrastructure knowledge from:

  • Thread: "Trust + Verify, QA Agent" (AI Slack Channel, ~Jan 2026)
  • Implemented components in plugins/core-dependencies/
  • Practitioner feedback and quality gate patterns

Production Status

✅ Fully Operational:

  • Stop hook with gentle reminders
  • /qa-check command with structured validation
  • Quality Standards skill auto-activation
  • PostToolUse file tracking
  • Trust But Verify practitioner playbook

🚧 Enhancement Opportunities (from Slack discussion):

  • Override mechanism for /qa-check --override "reason"
  • Integration with /slide-deck (auto-run QA before output)
  • Division-specific quality checks (SEO, Analytics, PDM)
  • Automated peer review routing based on QA tier

Key Takeaways

  1. QA is automatic - Stop hook provides gentle reminders without being intrusive
  2. QA is on-demand - /qa-check provides detailed validation anytime
  3. QA is context-aware - Skills activate based on content type and division
  4. QA is practitioner-friendly - Trust But Verify playbook provides verification paths
  5. QA is non-blocking - Gentle reminders, not hard stops (unless critical)

Philosophy: Trust AI outputs, but verify before client delivery. The QA Agent Infrastructure makes verification systematic, not burdensome.