# Skills Overhaul — Implementation Spec

**Author:** Forge (Director of Product Architecture)
**Date:** 2026-03-20
**Status:** Draft — Pending Quinn + Jules review before Melody execution
**References:** BRIEF.md, anthropic-skills-guide.md, ANTHROPIC_SKILLS_GUIDE_REVIEW.md


## How to Read This Spec

- Melody builds directly from this. No clarifying questions expected.
- Quinn verifies against acceptance criteria — each AC is independently testable.
- Complexity ratings: ⭐ = ~30 min, ⭐⭐ = ~1 hr, ⭐⭐⭐ = ~2 hr, ⭐⭐⭐⭐ = ~3 hr, ⭐⭐⭐⭐⭐ = 4+ hr
- Model tags: `[cloud-required]` = needs Claude Sonnet or better. `[local-ok]` = local model fine.
- All paths are relative to `~/.openclaw/workspace/skills/` unless otherwise noted.

# MEDIUM TIER


## M1: cost-estimation

**Complexity:** ⭐⭐ | **Model:** `[local-ok]`

### Objective

Add a Python estimation script and move the pricing registry into the skill's `references/` directory for proper progressive disclosure. Update SKILL.md to reference the new paths.

### File Changes

#### New Files

**`cost-estimation/references/MODEL_PRICING_REGISTRY.md`**
- Move from `~/.openclaw/workspace/MODEL_PRICING_REGISTRY.md` (current location)
- No content changes — copy as-is
- Update all internal references if any exist

**`cost-estimation/scripts/estimate.py`**
- Language: Python 3
- Purpose: Parse session token data and compute estimated cost without manual math
- Inputs (CLI args):
  - `--input-tokens INT` — input tokens used
  - `--output-tokens INT` — output tokens used
  - `--cached-tokens INT` (optional, default 0) — cached input tokens
  - `--model STR` — model name matching the registry (e.g. `claude-sonnet-4-6`)
  - `--registry PATH` (optional) — path to the pricing registry MD; defaults to `../references/MODEL_PRICING_REGISTRY.md` relative to the script location
- Behavior:
  1. Parse the pricing registry MD file — find the row matching `--model`, extract input/output/cached per-million prices
  2. Calculate: `(input_tokens / 1_000_000 * input_price) + (output_tokens / 1_000_000 * output_price) + (cached_tokens / 1_000_000 * cached_price)`
  3. Print a formatted summary:

```
Model: claude-sonnet-4-6
Input:    1,234,567 tokens × $3.00/M  = $3.70
Output:      45,678 tokens × $15.00/M = $0.69
Cached:     123,456 tokens × $0.30/M  = $0.04
─────────────────────────────────────────────
Estimated total: $4.43
```
When the total exceeds the $5 single-session threshold, append a flag line: `⚠ Flag: Exceeds $5 single-session threshold` (not shown above — $4.43 is under the threshold).
  4. Exit code 0 on success, 1 on model-not-found or parse error
- Error handling: if the model is not found in the registry, print `ERROR: Model 'X' not found in registry. Available: [list]` and exit 1
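The step-2 arithmetic can be sanity-checked from the shell before `estimate.py` exists. A minimal sketch, hardcoding the example per-million rates used in this section ($3.00 input / $15.00 output / $0.30 cached); the function name is illustrative, and the real script reads rates from the registry file:

```shell
# estimate_cost INPUT OUTPUT CACHED — apply the per-million pricing formula.
# Rates are hardcoded here for illustration only.
estimate_cost() {
  LC_ALL=C awk -v i="$1" -v o="$2" -v c="$3" \
    'BEGIN { printf "%.2f\n", i/1e6 * 3.00 + o/1e6 * 15.00 + c/1e6 * 0.30 }'
}

estimate_cost 1200000 45000 120000   # prints: 4.31
```

Note the unrounded total is $4.31; the manual fallback in SKILL.md rounds each term first, which gives $4.32 — the script should round only the final total.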

#### Modified Files

**`cost-estimation/SKILL.md`** — frontmatter + body updates:

```yaml
---
name: cost-estimation
description: >
  Estimate API spend from local OpenClaw telemetry using the MODEL_PRICING_REGISTRY.
  Use during heartbeat checks or when Jeff asks about token burn or costs. Also use when
  computing post-session spend after heavy sub-agent runs (Melody/Quinn/Atlas). Run
  scripts/estimate.py for automated calculation — provide token counts from session_status.
  Do NOT use for billing disputes or exact invoicing — this produces estimates only,
  not billing-grade numbers.
---
```

SKILL.md body changes:
- Step 2: Update the path reference from the workspace root to `references/MODEL_PRICING_REGISTRY.md` within the skill folder
- Step 3: Add an instruction to use `python scripts/estimate.py` instead of manual math. Show the exact CLI invocation with token counts from `session_status`
- Add Examples section (see below)
- Add Anomaly Thresholds section (keep existing content)

Examples section to add:

## Examples

### Automated calculation
After a session shows 1.2M input, 45K output, 120K cached on claude-sonnet-4-6:
```bash
python scripts/estimate.py --model claude-sonnet-4-6 \
  --input-tokens 1200000 --output-tokens 45000 --cached-tokens 120000
```
### Manual fallback (if script unavailable)
Read references/MODEL_PRICING_REGISTRY.md, locate the model row, apply formula:
`(1200000/1000000 × 3.00) + (45000/1000000 × 15.00) + (120000/1000000 × 0.30) = $3.60 + $0.68 + $0.04 ≈ $4.32 estimated`
### What NOT to report
Never report "we spent $X" as exact billing. Always qualify: "approximately $X estimated."

### Acceptance Criteria (Quinn)

  1. cost-estimation/references/MODEL_PRICING_REGISTRY.md exists and contains the same pricing data as the source file
  2. cost-estimation/scripts/estimate.py exists and is executable (python scripts/estimate.py --help does not crash)
  3. Running the script with valid args produces formatted output matching the specified format
  4. Running with an unknown model produces a clear error and exits non-zero
  5. SKILL.md description field is under 1024 characters and includes a negative trigger
  6. SKILL.md body references references/MODEL_PRICING_REGISTRY.md (not the old workspace root path)
  7. SKILL.md body shows a concrete example of python scripts/estimate.py invocation

## M2: service-management (absorbs project-scaffolding)

**Complexity:** ⭐⭐⭐⭐ | **Model:** `[cloud-required]`

### Objective

Merge project-scaffolding into service-management. Expand service-management with shell scripts for common operations, a references directory, a troubleshooting section, and worked examples. Delete the project-scaffolding skill after the merge.

### File Changes

#### New Files

**`service-management/scripts/start-service.sh`**
- Language: Bash
- Purpose: Start a named service in detached production mode on the Mac mini
- Inputs (CLI args):
  - `$1` — service name (display label, e.g. "Mission Control")
  - `$2` — command to run (e.g. `npm start`)
  - `$3` — working directory (absolute path)
  - `$4` — port number to health-check
  - `$5` (optional) — health endpoint path, defaults to `/api/health`
- Behavior:
  1. `cd` to the working dir, fail if not found
  2. Launch: `nohup $COMMAND > service.log 2>&1 < /dev/null &`
  3. Write PID: `echo $! > service.pid`
  4. Poll `curl -sf http://localhost:$PORT$HEALTH_PATH` every 2s for up to 30s
  5. On 200 OK: print `✓ $SERVICE_NAME is up (PID $PID)` and exit 0
  6. On timeout: print `✗ $SERVICE_NAME failed to start. Last log:` + `tail service.log`, exit 1
- Error handling: Check that the port isn't already bound before launching (use `lsof -ti :$PORT` on macOS; if occupied, print a warning and exit 1)
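The step-4 poll loop is the fiddly part and can be factored into one helper so the timeout and interval stay tunable. A sketch with illustrative names — in the real script the command would be the `curl -sf` health probe with a 2s interval and 15 tries:

```shell
# poll_until_ok CMD TRIES INTERVAL — run CMD up to TRIES times, sleeping
# INTERVAL seconds between attempts; return 0 as soon as CMD succeeds,
# 1 if every attempt fails.
poll_until_ok() {
  cmd=$1; tries=$2; interval=$3
  while [ "$tries" -gt 0 ]; do
    if $cmd; then return 0; fi
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] && sleep "$interval"
  done
  return 1
}

poll_until_ok true 3 1 && echo "service up"    # prints: service up
poll_until_ok false 2 0 || echo "timed out"    # prints: timed out
```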

**`service-management/scripts/stop-service.sh`**
- Language: Bash
- Purpose: Stop a running service using its PID file
- Inputs:
  - `$1` — working directory containing `service.pid`
  - `$2` (optional) — service name for display
- Behavior:
  1. Read PID from `service.pid`
  2. Send SIGTERM, wait up to 10s
  3. If still alive, send SIGKILL
  4. Remove `service.pid`
  5. Print `✓ $SERVICE_NAME stopped` or `✗ No PID file found at $DIR/service.pid`
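Steps 1–4 can be sketched as a single function (the name is illustrative, not the final script):

```shell
# stop_by_pidfile DIR — read DIR/service.pid, send SIGTERM, escalate to
# SIGKILL if the process survives the grace period, then remove the PID file.
stop_by_pidfile() {
  pidfile="$1/service.pid"
  [ -f "$pidfile" ] || { echo "✗ No PID file found at $pidfile"; return 1; }
  pid=$(cat "$pidfile")
  kill -TERM "$pid" 2>/dev/null
  i=0
  while kill -0 "$pid" 2>/dev/null && [ "$i" -lt 10 ]; do
    sleep 1
    i=$((i + 1))
  done
  kill -0 "$pid" 2>/dev/null && kill -KILL "$pid" 2>/dev/null
  rm -f "$pidfile"
  echo "✓ stopped"
}
```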

**`service-management/scripts/check-health.sh`**
- Language: Bash
- Purpose: Verify a running service is healthy
- Inputs:
  - `$1` — port number
  - `$2` (optional) — health endpoint path, defaults to `/api/health`
  - `$3` (optional) — service name for display
- Behavior:
  1. `curl -sf -o /dev/null -w "%{http_code}" http://localhost:$PORT$HEALTH_PATH` (do not name the path variable `PATH` — that would clobber the shell's `PATH`)
  2. If 200: print `✓ $SERVICE_NAME healthy (HTTP 200)` and exit 0
  3. If any other code: print `✗ $SERVICE_NAME unhealthy (HTTP $CODE)` and exit 1
  4. If curl fails: print `✗ $SERVICE_NAME unreachable` and exit 1

**`service-management/scripts/scaffold.sh`**
- Language: Bash
- Purpose: Scaffold a new VV Next.js project to standard
- Inputs:
  - `$1` — project name (kebab-case, becomes directory name and npm name)
  - `$2` — target port number
  - `$3` (optional) — parent directory, defaults to `~/projects`
- Behavior:
  1. Validate that `$1` is kebab-case (reject if it contains spaces or uppercase)
  2. `cd $PARENT_DIR && npx create-next-app@latest $PROJECT_NAME --typescript --tailwind --app --src-dir=false --import-alias "@/*" --yes`
  3. Create `app/api/health/route.ts` with a `{ ok: true, ts: new Date().toISOString() }` response
  4. Create `scripts/start.sh`, `scripts/stop.sh`, `scripts/status.sh` using the service-management templates (see `references/script-templates.md`)
  5. Create `globals.css` with VV dark theme variables (see `references/globals-template.css`)
  6. Append the dev port override to `package.json` scripts: `"dev": "next dev -p $PORT"`
  7. Run `npm run build` — fail loudly on errors
  8. Print a summary: project path, port, next steps
- Error handling: If `create-next-app` fails, print the error and exit 1. Do NOT proceed with subsequent steps.
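Step 1's kebab-case gate is worth pinning down as an exact pattern — lowercase alphanumeric segments separated by single hyphens. A sketch (the function name is illustrative):

```shell
# is_kebab_case NAME — accept lowercase alphanumerics separated by single
# hyphens (e.g. vv-analytics); reject spaces, uppercase, and leading,
# trailing, or doubled hyphens.
is_kebab_case() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]+(-[a-z0-9]+)*$'
}

is_kebab_case vv-analytics && echo ok      # prints: ok
is_kebab_case "My App" || echo rejected    # prints: rejected
```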

**`service-management/references/standard-stack.md`**
- Purpose: Reference document defining the VV standard tech stack for new projects
- Content:
  - Next.js 15 (App Router), TypeScript, Tailwind CSS v4
  - Font choices: Inter (UI), JetBrains Mono (data/code)
  - Dark theme baseline: `#0a0a0a` bg, `#141414` cards
  - Health endpoint contract: `GET /api/health` → `{ ok: true, ts: ISO-8601 }`
  - PID file convention: `service.pid` in project root
  - Log file convention: `service.log` in project root
  - Port assignment registry (list known ports: 3100=MC, 3200=next project, etc.)
  - Jest for unit tests, minimum threshold: 100% pass on build
  - Git: initialize on scaffold, initial commit required

**`service-management/references/globals-template.css`**
- Purpose: Copy-paste starting CSS with VV dark theme variables
- Content: CSS custom properties for colors, spacing tokens, font stacks. Dark theme. Matches VV brand. Tailwind v4 compatible (use the `@theme` directive).

**`service-management/references/script-templates.md`**
- Purpose: Templates for the per-project start/stop/status scripts
- Content: Three shell script templates (`start.sh`, `stop.sh`, `status.sh`) with placeholders `{{PORT}}`, `{{PROJECT_NAME}}`, `{{HEALTH_PATH}}`. Note: these are per-project convenience wrappers that call the skill-level scripts.

**`service-management/references/troubleshooting.md`**
- Purpose: Reference for common service-management failure modes
- Content:
  - Port already bound: `lsof -ti :$PORT | xargs kill -9` (macOS), then re-run start
  - Stale PID file: Service crashed without cleanup. Remove the `.pid` file, verify the process is dead, restart.
  - launchd issues: MC is not a launchd service — do not try to create a plist unless explicitly requested
  - `next build` fails: TypeScript errors must be resolved before proceeding. Never bypass with `--no-lint` in production.
  - Health check returns 404: `/api/health` route not created. The scaffold may be incomplete — check that `app/api/health/route.ts` exists.
  - `nohup` log empty after 5s: Service likely crashed on startup. `cat service.log` to see the error.
  - Port conflicts between MC and a new project: Keep the port registry in `references/standard-stack.md` updated.

#### Modified Files

**`service-management/SKILL.md`** — full rewrite to absorb project-scaffolding:

Frontmatter:

```yaml
---
name: service-management
description: >
  Launch, maintain, verify, and scaffold long-running local web services on the Mac mini.
  Use when starting, stopping, or health-checking Mission Control or any VV web app. Also use
  when Melody or Jules needs to scaffold a new Next.js/React project to VV standards.
  Scripts: start-service.sh, stop-service.sh, check-health.sh, scaffold.sh.
  Do NOT use for cloud-deployed services, CI/CD pipelines, or Docker containers —
  this is for local Mac mini operations only.
---
```

Body section outline:
1. Core Principle — production mode only; a startup log line ≠ healthy
2. Launch Rules — existing 4 rules (preserved)
3. Scripts Reference — one-line description of each script, example invocations
4. New Project Setup — merged from project-scaffolding: standard stack, `scaffold.sh` usage, what `scaffold.sh` does, manual steps if the script fails
5. Health Verification Protocol — after any launch: run `check-health.sh`, verify 200 OK
6. Troubleshooting — link to `references/troubleshooting.md` + inline quick reference for the top 3 issues
7. Examples — worked walkthroughs (see below)
8. Local macOS Constraints — existing `lsof` note (preserved)

Examples section:

## Examples

### Starting Mission Control from scratch
```bash
./scripts/start-service.sh "Mission Control" "npm start" ~/projects/mission-control 3100
# Expected: ✓ Mission Control is up (PID 12345)
```

### Scaffolding a new VV app
```bash
./scripts/scaffold.sh vv-analytics 3200
# Expected: new project at ~/projects/vv-analytics, builds clean, port 3200
```

### Manual stop when PID file is stale
```bash
ps aux | grep "npm start" | grep -v grep
# Find PID, then:
kill -TERM $PID
rm ~/projects/mission-control/service.pid
```

#### Files to Delete

- `project-scaffolding/SKILL.md` — delete after the merge is verified

### Acceptance Criteria (Quinn)
1. All four scripts exist in `service-management/scripts/`: `start-service.sh`, `stop-service.sh`, `check-health.sh`, `scaffold.sh`
2. All scripts are executable (`chmod +x` applied)
3. `start-service.sh` with a running port returns exit 1 before attempting launch
4. `stop-service.sh` gracefully handles missing PID file (exit 1 with clear message, no crash)
5. `check-health.sh` returns exit 0 for a live service at the correct port
6. `scaffold.sh` creates a project directory with `app/api/health/route.ts` present
7. `scaffold.sh` runs `npm run build` and fails loudly (exit 1) if build errors exist
8. `service-management/references/standard-stack.md` exists and lists at least port 3100=MC
9. `service-management/references/globals-template.css` exists and contains at least 5 CSS custom properties
10. `service-management/references/troubleshooting.md` exists and covers at least 5 failure modes
11. `service-management/SKILL.md` description is under 1024 characters with negative trigger
12. SKILL.md body contains a "New Project Setup" section
13. SKILL.md body contains an "Examples" section with at least 2 walkthroughs
14. `project-scaffolding/SKILL.md` has been deleted

---

## M3: qa-validation

**Complexity:** ⭐⭐⭐ | **Model:** `[local-ok]`

### Objective
Add a validation script to automate the QA checklist, add a Peekaboo reference guide, and add a troubleshooting section for common build failure patterns.

### File Changes

#### New Files

**`qa-validation/scripts/validate.sh`**
- Language: Bash
- Purpose: Automate the QA checklist — run build, tests, health check in sequence
- Inputs (CLI args):
  - `$1` — project root directory (absolute path)
  - `$2` (optional) — port number for health check, defaults to 3100
  - `$3` (optional) — health endpoint, defaults to `/api/health`
- Behavior (ordered steps, halt on first failure):
  1. `cd $PROJECT_ROOT` — fail if directory not found
  2. Print `[1/4] Static analysis: npm run build`
     - Run `npm run build` — capture exit code
     - If non-zero: print `✗ Build failed. Fix TypeScript/lint errors before proceeding.` + last 20 lines of output, exit 1
     - If zero: print `✓ Build passed`
  3. Print `[2/4] Unit tests: npm test`
     - Run `npm test -- --watchAll=false` (non-interactive)
     - If non-zero: print `✗ Tests failed.` + output, exit 1
     - If zero: print `✓ Tests passed`
  4. Print `[3/4] Service check: verifying process is running on port $PORT`
     - `lsof -ti :$PORT > /dev/null 2>&1 || { echo "No process on port $PORT — start the service first"; exit 1; }` (use `{ }` rather than `( )` — `exit` inside a subshell would not abort the script)
  5. Print `[4/4] Health endpoint: curl $HEALTH_ENDPOINT`
     - `curl -sf http://localhost:$PORT$HEALTH_PATH` 
     - If fails: print `✗ Health check failed. Service may be starting — wait and retry.`, exit 1
     - If passes: print `✓ Health check passed`
  6. Print `✓ All QA checks passed. Safe to report complete.` and exit 0
- Notes: Script does NOT run Peekaboo (visual QA requires human/agent judgment). It prints a reminder to run Peekaboo visual checks after all automated checks pass.
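The halt-on-first-failure sequencing above can be centralised in one helper so each step stays a single line in `validate.sh`. A sketch with illustrative labels:

```shell
# run_step LABEL CMD... — print the step label, run the command, and abort
# the whole script on the first non-zero exit.
run_step() {
  label=$1; shift
  echo "$label"
  if "$@"; then
    echo "✓ passed"
  else
    echo "✗ failed"
    exit 1
  fi
}

# Demo in a subshell so the failing step doesn't kill this shell;
# "never reached" is not printed because step 2 aborts the sequence.
( run_step "[1/2] demo build" true
  run_step "[2/2] demo tests" false
  echo "never reached" ) || true
```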

**`qa-validation/references/peekaboo-guide.md`**
- Purpose: Step-by-step guide for visual QA using Peekaboo CLI
- Content sections:
  - **What is Peekaboo** — 2 sentences: macOS UI capture CLI, use for visual verification when automated checks pass
  - **Capture a screenshot:** `peekaboo capture --app "Brave Browser" --output /tmp/qa-screen.png` + example output
  - **Navigate and capture tabs:** how to use Peekaboo to click through tabs and capture each
  - **Button interaction:** how to use Peekaboo to click buttons and verify response
  - **MC-specific checklist:** ordered list of every MC tab/section to verify (derived from existing qa-validation SKILL.md Mission Control section)
  - **When visual QA is required:** any MC change, any new dashboard component, any data source change
  - **When visual QA can be skipped:** backend-only changes with no rendering impact (still requires health check)
  - **Reporting:** how to report visual QA pass/fail — include screenshot path in QA report

#### Modified Files

**`qa-validation/SKILL.md`** — add Troubleshooting section and script reference:

Frontmatter (no change needed — already has negative trigger):
```yaml
---
name: qa-validation
description: Standard operating procedure for Quinn and Jules to validate code builds, system changes, and UI implementations before marking a task complete. Do NOT use for research validation or business analysis — this is for code and build QA only.
---
```

Add to SKILL.md body:

Scripts Reference section (new, insert after Step 2):

## Automated QA Script
Run `scripts/validate.sh` to execute steps 1–4 programmatically:
```bash
./scripts/validate.sh /path/to/project 3100
```
The script halts on first failure and prints actionable error output. After it passes, proceed to visual QA via Peekaboo (see references/peekaboo-guide.md).
**Troubleshooting section (new, append at end):**
```markdown
## Troubleshooting

### Build fails with TypeScript errors
- Do NOT bypass with `// @ts-ignore` or `--noEmit` hacks
- Fix the actual type error. If it's in a 3rd-party type definition, add a proper type override.
- Common cause: Melody used a deprecated API. Check the actual error message.

### Tests fail but code looks right
- Check if tests are against stale build artifacts: `rm -rf .next && npm run build`
- Check if a test is importing a module that requires environment variables (mock them in jest.config)
- Never skip failing tests to ship. If a test is wrong, fix the test or document why it's wrong.

### Health check returns 404
- `/api/health` route is missing or renamed. Verify `app/api/health/route.ts` exists and exports GET.

### Health check connection refused
- Service is not running. Run `start-service.sh` first or check for crashed PID.

### Peekaboo can't find Brave Browser
- Verify Brave is open. Peekaboo requires the target app to be running and visible.
- Try `peekaboo list-windows` to see what's available.

### Visual QA shows blank page
- Next.js production mode doesn't hot-reload. After any change: rebuild + restart + verify.
- Check `service.log` for runtime errors that don't appear in build output.
```

### Acceptance Criteria (Quinn)

  1. qa-validation/scripts/validate.sh exists and is executable
  2. Running validate.sh against a project with a passing build, passing tests, and live health endpoint exits 0
  3. Running validate.sh against a project with a failing build exits 1 with non-empty error output
  4. Running validate.sh against a project where no service is running on the specified port exits 1 with a clear message
  5. qa-validation/references/peekaboo-guide.md exists with at least 5 distinct sections
  6. The peekaboo guide contains an MC-specific checklist with at least 5 items
  7. SKILL.md body contains a "Troubleshooting" section with at least 5 failure scenarios
  8. SKILL.md body references scripts/validate.sh with a concrete invocation example
  9. SKILL.md description is unchanged (already correct)

## M4: vv-sigint

**Complexity:** ⭐⭐ | **Model:** `[local-ok]`

### Objective

Add examples of good vs rejected signals, add a troubleshooting section for source failures, add a script to check source URL availability, and fix stale cron schedule references.

### File Changes

#### New Files

**`vv-sigint/scripts/check-sources.sh`**
- Language: Bash
- Purpose: Ping all sources in the watch list and report which are reachable
- Inputs: None (reads `references/sources.md` internally)
- Behavior:
  1. Parse `references/sources.md` — extract all URLs (lines matching `http://` or `https://`)
  2. For each URL: `curl -sf -o /dev/null -w "%{http_code}" --max-time 10 $URL`
  3. Collect results: OK (200-299), REDIRECT (300-399), BLOCKED (403/429), DEAD (other/timeout)
  4. Print a grouped summary:

```
✓ OK (12): [list of domains]
⚠ REDIRECT (2): [list — may need URL update]
✗ BLOCKED (1): [list — check if RSS still available]
✗ DEAD (1): [list — remove from sources.md]
```
  5. Exit 0 if all OK or REDIRECT. Exit 1 if any BLOCKED or DEAD.
- Notes: Rate-limit friendly — add a 0.5s sleep between requests
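The step-3 bucketing is a pure mapping from HTTP status code to report category, so it can live in its own function and be exercised without network access. A sketch (function name illustrative; curl reports `000` in `%{http_code}` on timeout or connection failure):

```shell
# classify CODE — map an HTTP status (or 000 for curl timeout/failure)
# to the report bucket used in the grouped summary.
classify() {
  case "$1" in
    2[0-9][0-9]) echo OK ;;
    3[0-9][0-9]) echo REDIRECT ;;
    403|429)     echo BLOCKED ;;
    *)           echo DEAD ;;
  esac
}

classify 200   # prints: OK
classify 301   # prints: REDIRECT
classify 403   # prints: BLOCKED
classify 000   # prints: DEAD
```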

#### Modified Files

**`vv-sigint/SKILL.md`** — add Examples and Troubleshooting sections, fix the cron reference:

Frontmatter: no change (already good).

Changes to body:
1. Fix the stale cron schedule section: Remove any hardcoded scan times. Replace with: "Scan schedule is configured in OpenClaw cron, not here. Check the active cron config for the current schedule. This skill covers the scan procedure, not the schedule."
2. Add Examples section (new, after Relevance Scoring):

## Examples

### Well-scored signal (8/10 — include)
> [2026-03-15] article — OpenAI launches "Coach GPT" for professional athletes, targeting NFL/NBA teams — source: techcrunch.com/2026/03/15/openai-coach-gpt

Why it scores high: Direct competitive signal in our target market (professional sports analytics). Affects positioning of any VV sports product. Actionable intelligence.

### Borderline signal (5/10 — include with lower priority)
> [2026-03-15] social — Reddit thread: "What fitness apps actually track VO2 max correctly?" r/fitness — source: reddit.com/r/fitness/...

Why borderline: Consumer space, not B2B. But reveals user frustration with existing tools — potential ICP signal.

### Rejected signal (2/10 — discard)
> General article about tech layoffs at Meta, no fitness/sports/AI angle

Why rejected: No connection to VV goals filter. Not a competitor. Not a market signal. Discard.

### Displacement example (column at 30-item cap)
New 7/10 signal arrives. Scan existing column. Find lowest-scored item (3/10). Move it to `## Dismissed` with note: `[displaced by higher-priority signal on YYYY-MM-DD]`. Insert new signal.

3. Add Troubleshooting section (new, append at end):
    ## Troubleshooting
    
    ### Source returns 403 Forbidden
    - The site is blocking automated crawlers.
    - Try `web_fetch` with a different user-agent, or check if they have a public RSS feed.
    - If consistently blocked, mark the source as "manual-only" in sources.md and check it via browser on digest day.
    - Run `scripts/check-sources.sh` to batch-check all sources and identify which are blocking.
    
    ### RSS feed returns empty or malformed XML
    - The feed URL may have changed. Check the source's website for a current feed link.
    - Blogwatcher may cache stale feeds — force a refresh: `blogwatcher refresh [source-name]`.
    
    ### Signal count drops to zero after a scan
    - Check if sources.md was recently modified (sources removed without replacement).
    - Check if the local model (used for triage) is returning "below threshold" for everything — test with a known-relevant article manually.
    
    ### Duplicate signals appearing in SIGINT.md
    - The retention check should catch these. If duplicates persist, search SIGINT.md for the source URL before inserting.
    - Add a grep step before writing: `grep -F "$URL" SIGINT.md` — skip if match found.
    
    ### MC events API call failing
    - The events endpoint may be down or require auth. Verify MC is running: `curl -sf http://localhost:3100/api/health`.
    - If MC is down, log the scan to the daily memory file instead and note that MC event was not emitted.
    

### Acceptance Criteria (Quinn)

  1. vv-sigint/scripts/check-sources.sh exists and is executable
  2. Running check-sources.sh produces grouped output (OK/REDIRECT/BLOCKED/DEAD) without crashing
  3. SKILL.md body contains an "Examples" section with at least 3 scored signal examples (good, borderline, rejected)
  4. SKILL.md body contains a "Troubleshooting" section with at least 4 failure scenarios
  5. SKILL.md body contains no hardcoded cron schedule times (replaced with cron config reference)
  6. Frontmatter description is unchanged

## M5: vv-dashboard-design

**Complexity:** ⭐⭐ | **Model:** `[local-ok]`

### Objective

Add worked implementation examples to `references/`, add a script to detect hardcoded hex values in component files, and update SKILL.md to link to these.

### File Changes

#### New Files

**`vv-dashboard-design/scripts/check-tokens.sh`**
- Language: Bash
- Purpose: Grep component files for hardcoded hex color values that should be CSS tokens
- Inputs:
  - `$1` — project root directory to scan
- Behavior:
  1. `grep -rn "#[0-9a-fA-F]\{3,6\}" $PROJECT_ROOT/app $PROJECT_ROOT/components` (if those dirs exist)
  2. Exclude: comments (`//`, `/*`), CSS variable definitions (lines containing `--`), and the tokens file itself
  3. Print matches grouped by file:

```
✗ Hardcoded hex values found (should use CSS tokens):
  components/StatCard.tsx:12:  className="text-[#40a060]"
  app/dashboard/page.tsx:45:  color: "#c04040"

Fix: Replace with var(--color-accent) or the equivalent Tailwind token class.
Run: grep -r "color-" references/tokens.md to find the right token name.
```
  4. If no matches: print `✓ No hardcoded hex values found. Token discipline maintained.`
  5. Exit 1 if violations are found, exit 0 if clean

**`vv-dashboard-design/references/examples/stat-card.md`**
- Purpose: Correct and incorrect implementation of the Stat Card component
- Content:
  - ✅ Correct: Full React component using CSS tokens, `tabular-nums`, delta badge
  - ❌ Incorrect: Same component with hardcoded colors, missing `tabular-nums`, no status icon
  - Explanation of each diff and why it matters

**`vv-dashboard-design/references/examples/alert-feed.md`**
- Purpose: Alert/activity feed implementation reference
- Content:
  - ✅ Correct: Timestamped entries, severity coded with icon+color (not color alone), auto-scroll, max-height with scroll
  - ❌ Incorrect: Color-only severity (accessibility fail), no timestamp, no scroll limit
  - VV severity color mappings: critical=`#c04040`, warning=`#c0a040`, ok=`#40a060`, idle=`#555`

**`vv-dashboard-design/references/examples/empty-state.md`**
- Purpose: Empty state pattern for when no data is available
- Content:
  - ✅ Correct: Helpful message, icon, suggested action
  - ❌ Incorrect: Blank panel, loading spinner that never resolves
  - Template JSX for the standard VV empty state

#### Modified Files

**`vv-dashboard-design/SKILL.md`** — add script reference and examples link:

In the "Component Patterns" section, add after the existing list:

## Examples
See `references/examples/` for correct vs incorrect implementations of:
- `stat-card.md` — KPI stat card with trend delta
- `alert-feed.md` — severity-coded activity feed
- `empty-state.md` — empty state pattern

## Token Compliance Check
Before marking any UI task complete, run:
```bash
./scripts/check-tokens.sh /path/to/mc-project
```
A clean scan is required. Any hardcoded hex values are a build blocker.

### Acceptance Criteria (Quinn)
1. `vv-dashboard-design/scripts/check-tokens.sh` exists and is executable
2. Running check-tokens.sh against a directory containing `color: "#40a060"` exits 1 and prints the file/line
3. Running check-tokens.sh against a clean directory exits 0
4. `references/examples/stat-card.md` exists with at least one ✅ and one ❌ example
5. `references/examples/alert-feed.md` exists with severity color mappings documented
6. `references/examples/empty-state.md` exists with template JSX
7. SKILL.md references `scripts/check-tokens.sh` with a concrete invocation example
8. SKILL.md references `references/examples/` directory

---

## M6: project-pipeline

**Complexity:** ⭐ | **Model:** `[local-ok]`

### Objective
Move the evaluation template from workspace root into the skill's `references/` directory. Add a completed example evaluation. Update SKILL.md to use relative path.

### File Changes

#### New Files

**`project-pipeline/references/evaluation-template.md`**
- Purpose: The reusable evaluation template, moved from workspace root
- Content: Copy from `~/.openclaw/workspace/PROJECT_EVALUATION_TEMPLATE.md` verbatim
- After verifying SKILL.md is updated, the workspace root copy is superseded by this move — archive it rather than deleting; do not delete unless Jeff confirms

**`project-pipeline/references/examples/apa-evaluation.md`**
- Purpose: Completed example evaluation showing how to use the template
- Content: A realistic (non-fictional) completed evaluation of the APA (Athletic Performance Analytics) initiative using the template structure. Show all fields filled in with real-style content. Demonstrate: conclusion-first, moat vs moat-hypothesis distinction, pursue/defer/reject recommendation.

#### Modified Files

**`project-pipeline/SKILL.md`** — update template path:
- Change: `use the evaluation template at /Users/viverevitalis/.openclaw/workspace/PROJECT_EVALUATION_TEMPLATE.md`
- To: `use the evaluation template at references/evaluation-template.md`
- Add after the template reference: `See references/examples/apa-evaluation.md for a completed example.`

### Acceptance Criteria (Quinn)
1. `project-pipeline/references/evaluation-template.md` exists and is non-empty
2. `project-pipeline/references/examples/apa-evaluation.md` exists with all template fields completed
3. SKILL.md references `references/evaluation-template.md` (not the old absolute workspace root path)
4. SKILL.md references `references/examples/apa-evaluation.md`
5. The example evaluation demonstrates the pursue/defer/reject recommendation format

---

## M7: frontend-design

**Complexity:** ⭐⭐ | **Model:** `[cloud-required]`

### Objective
Add a `references/` directory with curated font pairings and color palette examples. Update SKILL.md to link to these as starting-point resources.

### File Changes

#### New Files

**`frontend-design/references/font-pairings.md`**
- Purpose: Curated font pairing guide for distinctive VV frontend work
- Content sections:
  - **What makes a pairing work** — 3 sentences: contrast (weight/style), shared geometry, role clarity (display vs body)
  - **Pairings by aesthetic direction** — for each of the aesthetics in SKILL.md (minimal, maximalist, brutalist, editorial, luxury, retro-futuristic, organic), provide:
    - Display font name + Google Fonts URL
    - Body font name + Google Fonts URL
    - One-line rationale
    - CSS import snippet
  - **Anti-pairings** — fonts to avoid and why (Inter alone, Roboto alone, "Space Grotesk + Inter" as the cliché AI pairing)
  - **Variable font usage** — when and how to use variable fonts for animation
  - Minimum 8 distinct pairings total

**`frontend-design/references/color-examples.md`**
- Purpose: Color palette examples organized by aesthetic direction
- Content sections:
  - **How to read these palettes** — dominant, accent, neutral, and semantic roles
  - **Palettes by direction** — for each aesthetic in SKILL.md, provide:
    - A 5-color palette with hex values and role labels
    - CSS custom property declarations
    - "What this communicates" — one sentence
  - **Color anti-patterns** — purple gradients on white, neon on neon, flat grays everywhere
  - **Dark theme starting point** — the VV dark palette (from service-management globals) as a reference baseline
  - **Accessibility notes** — WCAG AA contrast minimum, how to check with browser devtools
  - Minimum 6 distinct palettes total

#### Modified Files

**`frontend-design/SKILL.md`** — add References section:

After the "Output Requirements" section, add:
```markdown
## Starting Resources

Before committing to an aesthetic, check:
- `references/font-pairings.md` — curated pairings organized by direction
- `references/color-examples.md` — palette examples by aesthetic

These are starting points, not constraints. Deviate intentionally, not by default.
```

### Acceptance Criteria (Quinn)

  1. frontend-design/references/font-pairings.md exists with at least 8 distinct pairings
  2. Each pairing includes display font, body font, rationale, and CSS import snippet
  3. An anti-pairings section exists naming at least 3 pairings to avoid
  4. frontend-design/references/color-examples.md exists with at least 6 distinct palettes
  5. Each palette includes hex values, role labels, and CSS variable declarations
  6. SKILL.md references both files in a "Starting Resources" or equivalent section

## M8: memory-manager

Complexity: ⭐⭐ | Model: [local-ok]

### Objective

Add concrete examples of good vs bad memory entries, and add an optional script to check for line count bloat and surface obvious contradictions.

### File Changes

#### New Files

**`memory-manager/scripts/consolidate-check.py`**
- Language: Python 3
- Purpose: Audit memory files for line count bloat and surface potential contradictions
- Inputs (CLI args):
  - `--memory-dir PATH` — path to memory/ directory (defaults to `~/.openclaw/workspace/memory/`)
  - `--semantic-file PATH` — path to MEMORY.md (defaults to `~/.openclaw/workspace/MEMORY.md`)
  - `--max-daily-lines INT` — warn if a daily file exceeds this (default: 200)
  - `--max-semantic-lines INT` — warn if MEMORY.md exceeds this (default: 500)
- Behavior:
  1. Scan the memory/ directory. For each `.md` file:
     - Count lines. If > max-daily-lines, flag as "bloated"
     - Print: `memory/2026-03-15.md: 312 lines ⚠ (exceeds 200 — consider consolidating)`
  2. Check the MEMORY.md line count. If > max-semantic-lines, flag.
  3. Naive contradiction detection: find lines where the same subject appears with conflicting modifiers (e.g., "Jeff prefers X" and "Jeff prefers Y" where X≠Y). This is a simple keyword scan, not semantic comparison — flag for human review, never auto-resolve.
  4. Print summary:

```
Memory Audit — 2026-03-20
Daily files: 14 total, 2 flagged for consolidation
MEMORY.md: 342 lines (OK)
Potential contradictions to review:
  Line 45 vs Line 112: "Jeff prefers concise replies" / "Jeff prefers thorough explanations"
```

  5. Exit 0 always (this is advisory, not blocking)
- Notes: This script should be run manually or as part of the nightly consolidation workflow, not on every session.
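The line-count audit above is straightforward; a minimal Python sketch of steps 1-2 (the contradiction scan and CLI parsing are omitted, and `audit_memory` is an illustrative name, not the script's final interface):

```python
from pathlib import Path

def audit_memory(memory_dir: str, max_daily_lines: int = 200):
    """Flag daily memory files whose line count exceeds the threshold.

    Returns a list of (filename, line_count) pairs for flagged files.
    """
    flagged = []
    for md in sorted(Path(memory_dir).glob("*.md")):
        count = len(md.read_text(encoding="utf-8").splitlines())
        if count > max_daily_lines:
            flagged.append((md.name, count))
            # Advisory output only; the caller decides what to consolidate.
            print(f"{md}: {count} lines ⚠ (exceeds {max_daily_lines} — consider consolidating)")
    return flagged
```

The MEMORY.md check is the same count applied to a single file with the 500-line default.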

#### Modified Files

**`memory-manager/SKILL.md`** — add Examples section and script reference:

Add Examples section after "What to Capture":

```markdown
## Examples

### Good memory entry — specific, durable, actionable
> **[2026-03-15] Jeff confirmed: APA is the top-priority initiative for Q2. Revenue target: $50K ARR by end of year. ICP is mid-market sports teams (50-200 athletes).**

Why it's good: Specific, time-stamped decision with clear business context. Actionable — affects how we prioritize work.

### Bad memory entry — vague, transient, not worth keeping
> **Jeff said he was tired today and might want shorter updates.**

Why it's bad: Transient state, not durable preference. Will be stale within a day. Not worth storing in long-term memory.

### Good MEMORY.md entry — distilled, not redundant
> **Jeff's communication preference (confirmed multiple times): conclusions first, then rationale. Never lead with caveats.**

Why it's good: Repeated pattern confirmed across sessions — worth keeping long-term. Not a one-time observation.

### Bad MEMORY.md entry — redundant with daily notes
> **On 2026-03-15, we discussed APA. Jeff said it was important. We then talked about the skills overhaul.**

Why it's bad: This belongs in the daily note, not MEMORY.md. MEMORY.md is for distilled insights, not session logs.

### Consolidation trigger
Daily file at 300+ lines → run `scripts/consolidate-check.py` to identify which sections to distill into MEMORY.md.
```

Add script reference after the Smart Startup Routine section:

```markdown
## Memory Health Check
Run `scripts/consolidate-check.py` when daily memory files feel bloated or when MEMORY.md grows unwieldy.
This script identifies files to consolidate and surfaces potential contradictions for human review.
It is advisory only — never auto-resolves contradictions.
```

### Acceptance Criteria (Quinn)

  1. memory-manager/scripts/consolidate-check.py exists and runs without crashing on an empty directory
  2. Running the script with a daily file exceeding 200 lines prints a warning for that file
  3. Running the script with MEMORY.md under 500 lines prints no warnings for the semantic file
  4. SKILL.md contains an "Examples" section with at least 4 entries (2 good, 2 bad)
  5. Each example is labeled with "Why it's good" or "Why it's bad" explanation
  6. SKILL.md references scripts/consolidate-check.py

## M9: openclaw-prime

Complexity: ⭐ | Model: [local-ok]

### Objective

Add a troubleshooting section for common gateway and admin issues. No scripts or reference files needed.

### File Changes

#### Modified Files

**`openclaw-prime/SKILL.md`** — add Troubleshooting section:

Frontmatter: no change needed (it already has a negative trigger).

Append to body:

```markdown
## Troubleshooting

### Gateway won't start
- Check `openclaw gateway status` first — may already be running.
- If crashed: `openclaw gateway stop && openclaw gateway start`
- Check logs: `openclaw gateway logs` (or equivalent) for the specific error.
- Common cause: port conflict. Verify the bind port is not in use.

### Node won't pair / QR code fails
- Use the node-connect skill for systematic pairing diagnosis.
- Quick check: Is the gateway reachable on its public URL? Try `curl -sf $PUBLIC_URL/health`.
- Bootstrap token may be expired — regenerate from gateway config.

### Channel not receiving messages
- Verify the channel is configured correctly: `openclaw gateway status` shows active channels.
- For Telegram: check the bot token is valid. Send `/start` to the bot directly.
- For webhook channels: verify the webhook URL matches the gateway's public URL.

### Model routing sending to wrong model
- Check `openclaw gateway config` for the routing table.
- Session-level overrides take precedence over defaults — verify no override is active.
- If fallback is triggering unexpectedly: the primary model may be rate-limited or timing out.

### Config changes not taking effect
- OpenClaw requires a gateway restart after config file changes.
- `openclaw gateway restart` — verify with `openclaw gateway status`.
- If using org-level config: individual settings may be overridden. Check the admin console.

### "Unauthorized" errors from agents
- API keys may have rotated. Check the key in gateway config against the current key in the provider console.
- Session authentication: verify the session token hasn't expired.

> **When in doubt:** Read live docs first (this skill's prime workflow), not memory. OpenClaw configs change; memory doesn't always keep up.
```

### Acceptance Criteria (Quinn)

  1. SKILL.md body contains a "Troubleshooting" section
  2. Troubleshooting section covers at least 6 distinct failure scenarios
  3. Each scenario has: symptom identification + at least one concrete resolution step
  4. The node-connect skill is referenced for pairing issues (do not duplicate that skill's content)
  5. Frontmatter description is unchanged

# COMPLEX TIER

---

## C1: doc-coauthoring

Complexity: ⭐⭐⭐⭐ | Model: [cloud-required]

### Objective

New skill. Systematic 3-stage document coauthoring workflow: Context Gathering → Refinement Loop → Reader Testing. Adapted from Anthropic's example skill. Primary users: Forge (specs), Jules (proposals), Atlas (research docs).

### File Structure

```
doc-coauthoring/
├── SKILL.md
├── references/
│   ├── document-types.md
│   ├── quality-checklist.md
│   └── examples/
│       ├── spec-example.md
│       └── proposal-example.md
└── scripts/
    └── word-count.sh
```

### Frontmatter

```yaml
---
name: doc-coauthoring
description: >
  Collaborative document creation with structured context gathering, iterative refinement,
  and reader testing. Use when creating specs, proposals, research docs, briefs, or any
  multi-section document that requires consistent quality and structure. Triggered by:
  "help me write a spec", "draft a proposal", "coauthor a brief", "write a doc with me",
  or when Forge, Atlas, or Jules needs to produce a formal document artifact.
  Do NOT use for quick single-paragraph responses, memory capture, or code documentation —
  this is for multi-section formal documents only.
---
```

### SKILL.md Body Outline

1. **When to Use** — multi-section documents, formal artifacts, anything that requires structure and quality review before delivery. Examples: specs, proposals, research briefs, strategic memos.

2. **Three-Stage Workflow**

**Stage 1: Context Gathering**
- Before writing a single word, collect:
  - Document type (spec, proposal, research, brief, memo, other)
  - Primary audience (Jeff, external client, agent team, public)
  - Purpose: what decision does this document enable or inform?
  - Required sections (explicit) and implied sections (based on type)
  - Length target: short (<500w), standard (500-2000w), long (2000+w)
  - Tone: formal, operational, strategic, technical
  - Deadline or urgency level
- Ask for any missing context. Do not guess audience or purpose.
- Read `references/document-types.md` to confirm the expected section structure for the document type.

**Stage 2: Draft → Quality Check → Refinement Loop**
- Write a complete first draft.
- Before sharing, self-evaluate against `references/quality-checklist.md`:
  - Does the opening state the purpose and conclusion (BLUF for ops docs)?
  - Is every section necessary? Remove anything that doesn't serve the purpose.
  - Are all claims supported or flagged as assumptions?
  - Is the tone consistent throughout?
  - Are there any undefined acronyms or jargon for the target audience?
- If 3+ checklist items fail, revise before sharing.
- Share the draft with a note: "Here's the first draft. Key decisions/gaps: [list]."
- Incorporate feedback. Repeat until approved.

**Stage 3: Reader Testing**
- Before finalizing, test from the reader's perspective:
  - Can someone unfamiliar with the context understand the purpose from the first paragraph?
  - Are action items (if any) unambiguous?
- Run `scripts/word-count.sh` — is the document within the target length?
- If the document is a spec, ask: "Can Melody build from this without clarifying questions?"
- If the document is a proposal, ask: "Does this make the decision easy for the reader?"
- Deliver the final version with a one-sentence summary of what changed from draft 1.

3. **Document Type Reference** — link to `references/document-types.md` for expected structure by type.

4. **Examples** — link to `references/examples/spec-example.md` and `references/examples/proposal-example.md`.

5. **Troubleshooting**
- "I don't know what sections to include" → Read `document-types.md` for your type.
- "The document is too long" → Run `word-count.sh`. Identify the longest section. Is it doing the work? If not, cut it.
- "Feedback keeps changing" → The audience may not have been defined correctly. Return to Stage 1.
- "The reader said it's unclear" → The opening paragraph probably failed reader testing. Fix the BLUF.

### Reference File Descriptions

**`references/document-types.md`**
- Sections for each document type: Spec, Proposal, Research Brief, Strategic Memo, Initiative Brief
- For each type: purpose, primary audience, required sections (ordered), common mistakes, length guidance

**`references/quality-checklist.md`**
- Universal checklist (applies to all types): 10 items
- Type-specific additions for Spec, Proposal, Research Brief

**`references/examples/spec-example.md`**
- An anonymized but realistic spec fragment showing: frontmatter, objective, scope, acceptance criteria
- Annotated to explain what makes it work

**`references/examples/proposal-example.md`**
- A realistic proposal fragment showing: BLUF opening, problem statement, proposed solution, recommendation
- Annotated

**`scripts/word-count.sh`**
- Language: Bash
- Inputs: `$1` — path to markdown file
- Behavior: Strip markdown syntax (sed to remove headers, bullets, code fences), count words (`wc -w`), print: `"$FILENAME: ~$WORD_COUNT words"`. Also flag if >3000 words, with a recommendation to split into sections.
- Exit 0 always (advisory)
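The deliverable is Bash, but the strip-then-count logic is worth pinning down precisely. A Python prototype of the same behavior (illustrative only; the shipped script remains `word-count.sh`):

```python
import re

def markdown_word_count(text: str) -> int:
    """Approximate prose word count: drop fenced code blocks and
    leading header/bullet/number markers, then count remaining words."""
    # Remove fenced code blocks entirely.
    text = re.sub(r"```.*?```", "", text, flags=re.DOTALL)
    words = []
    for line in text.splitlines():
        # Strip leading '#', '-', '*', '+', or 'N.' markers.
        line = re.sub(r"^\s*(#+|[-*+]|\d+\.)\s+", "", line)
        words.extend(line.split())
    return len(words)
```

The >3000-word flag is then a simple comparison on the returned count.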

### Acceptance Criteria (Quinn)

  1. All files exist at specified paths
  2. SKILL.md frontmatter is valid YAML with name, description under 1024 chars, negative trigger present
  3. SKILL.md body contains exactly 3 stages, clearly labeled
  4. Stage 1 includes a complete list of context questions (at least 7)
  5. Stage 2 references references/quality-checklist.md
  6. Stage 3 references scripts/word-count.sh with a concrete invocation
  7. references/document-types.md covers at least 4 document types with required sections listed
  8. references/quality-checklist.md contains at least 10 universal checklist items
  9. Both example files exist and are annotated
  10. scripts/word-count.sh is executable and produces word count output without crashing
  11. Running the skill-creator skill against this SKILL.md produces no "missing required field" warnings

## C2: webapp-testing

Complexity: ⭐⭐⭐⭐⭐ | Model: [cloud-required]

### Objective

New skill. Playwright-based automated web testing for Quinn. Replaces manual "check every page" QA. Includes a server lifecycle management script adapted from Anthropic's Playwright skill pattern. Primary users: Quinn (automated test execution), Melody (test authoring).

### File Structure

```
webapp-testing/
├── SKILL.md
├── scripts/
│   ├── with_server.py
│   ├── run-tests.sh
│   └── capture-baseline.sh
└── references/
    ├── playwright-setup.md
    ├── test-patterns.md
    └── mc-test-suite.md
```

### Frontmatter

```yaml
---
name: webapp-testing
description: >
  Playwright-based automated web testing for VV web applications. Use when Quinn needs to run
  a test suite against Mission Control or any VV web app, when Melody writes new Playwright tests,
  or when establishing a visual baseline for a new feature. Handles server lifecycle automatically
  via with_server.py. Triggered by: "run the test suite", "write Playwright tests",
  "verify the build with tests", "capture a visual baseline".
  Do NOT use for unit tests (use npm test), API-only validation (use qa-validation validate.sh),
  or mobile app testing — this is for web UIs with a browser context only.
---
```

### SKILL.md Body Outline

1. **When to Use** — automated browser testing, visual regression baselines, multi-page navigation verification. Contrast with qa-validation (which handles build, unit tests, and basic health check).

2. **Setup Prerequisites**
- Playwright must be installed: `npx playwright install` (check `references/playwright-setup.md`)
- The target app must be buildable in production mode
- `with_server.py` handles server start/stop automatically around test runs

3. **Core Workflow**

Running an existing test suite:

```
python scripts/with_server.py --port 3100 --start-cmd "npm start" --dir /path/to/project \
  -- npx playwright test
```

Writing a new test:
- Read `references/test-patterns.md` for the VV-standard test structure
- Tests live in the `tests/` directory of the target project
- Always test: navigation, data display, error states, empty states
- Never test implementation details — test what the user sees

Capturing a visual baseline:

```
./scripts/capture-baseline.sh /path/to/project 3100
```

Stores screenshots in `tests/baselines/` for future regression comparison.

4. **MC Test Suite** — for Mission Control specifically, read `references/mc-test-suite.md`. The MC suite covers all tabs, all data columns, all buttons. Run it before marking any MC change complete.

5. **Troubleshooting**
- "Playwright can't connect to browser" → Run `npx playwright install chromium`
- "Tests fail on CI but pass locally" → Likely a timing issue. Add `await page.waitForSelector(...)` before assertions.
- "with_server.py can't start server" → Check that the port isn't already bound. Run `check-health.sh` first.
- "Visual regression diff is huge" → Check whether a CSS token changed. Run `check-tokens.sh` first.
- "Tests are slow" → Run with `--workers 1` first to isolate failures. Then increase workers.

### Script Descriptions

**`scripts/with_server.py`**
- Language: Python 3
- Purpose: Manage the server lifecycle around a test command — start the server, wait for it to become healthy, run the tests, stop the server. Adapted from Anthropic's pattern.
- Inputs (CLI args):
  - `--port INT` — port to start the server on
  - `--start-cmd STR` — command to start the server (e.g. `"npm start"`)
  - `--dir PATH` — working directory for the server command
  - `--health-path STR` (optional) — defaults to `/api/health`
  - `--timeout INT` (optional) — seconds to wait for the server to become healthy, defaults to 30
  - `-- [test_command...]` — the test command to run after `--`
- Behavior:
  1. Start the server process using `subprocess.Popen` with start-cmd in dir
  2. Poll `http://localhost:PORT/<health-path>` every 2s, for up to timeout seconds
  3. If the server is healthy: run the test command via `subprocess.run`, capture the exit code
  4. In a `finally` block: terminate the server process (SIGTERM, then SIGKILL if needed)
  5. Exit with the test command's exit code
- Error handling: if the server never becomes healthy, print an error, terminate the server, exit 1

**`scripts/run-tests.sh`**
- Language: Bash
- Purpose: Convenience wrapper for common test patterns
- Inputs:
  - `$1` — project directory
  - `$2` — port
  - `$3` (optional) — test file pattern, defaults to all tests
- Behavior: Calls `with_server.py` with the correct args and `npx playwright test $3`

**`scripts/capture-baseline.sh`**
- Language: Bash
- Purpose: Navigate to key pages and capture screenshots for a visual baseline
- Inputs:
  - `$1` — project directory
  - `$2` — port
- Behavior:
  1. Start the server via `with_server.py`
  2. Use Playwright to navigate to each route defined in `references/mc-test-suite.md`
  3. Save screenshots to `$PROJECT/tests/baselines/YYYY-MM-DD/`
  4. Print the list of captured files
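The with_server.py lifecycle described above can be sketched as follows. This is a minimal version of the stated behavior (argparse wiring is omitted, and the `--health-path`/`--timeout` flags are reduced to function parameters):

```python
import shlex
import subprocess
import sys
import time
import urllib.request

def run_with_server(start_cmd, cwd, port, test_cmd,
                    health_path="/api/health", timeout=30):
    """Start the server, wait for the health endpoint, run the tests,
    and always tear the server down. Returns the test command's exit code."""
    server = subprocess.Popen(shlex.split(start_cmd), cwd=cwd)
    try:
        url = f"http://localhost:{port}{health_path}"
        deadline = time.time() + timeout
        while True:
            try:
                urllib.request.urlopen(url, timeout=2)
                break  # server responded: healthy
            except OSError:
                if time.time() > deadline:
                    print(f"server never became healthy at {url}", file=sys.stderr)
                    return 1
                time.sleep(2)
        return subprocess.run(test_cmd).returncode
    finally:
        server.terminate()  # SIGTERM first
        try:
            server.wait(timeout=5)
        except subprocess.TimeoutExpired:
            server.kill()   # escalate to SIGKILL
```

The `finally` block is the load-bearing part: the server dies even when the test command fails, which is what AC 5 verifies.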

### Reference File Descriptions

**`references/playwright-setup.md`**
- Playwright installation: `npm install -D @playwright/test && npx playwright install`
- `playwright.config.ts` template for VV projects
- How to run headed vs headless
- How to update snapshots: `npx playwright test --update-snapshots`

**`references/test-patterns.md`**
- VV standard test structure: `describe` → `beforeEach` (navigate) → `test` (assertion)
- Selector philosophy: prefer `data-testid`, then ARIA role, then text content. Never class names.
- Async patterns: `await page.waitForLoadState('networkidle')` for data-heavy pages
- Assertion patterns: `expect(locator).toBeVisible()`, `expect(locator).toHaveText()`
- Error state testing: test what happens when the API returns 500
- Empty state testing: test what happens when the API returns `[]`

**`references/mc-test-suite.md`**
- Complete test checklist for Mission Control
- Routes to test: `/`, `/sigint`, `/agents`, `/tasks`, `/initiatives`, `/settings` (all current MC routes)
- For each route: what to verify (data visible, no blank panels, no console errors)
- Button checklist: all interactive buttons by page
- Data accuracy checks: compare displayed agent count vs actual agent count

### Acceptance Criteria (Quinn)

  1. All files exist at specified paths
  2. SKILL.md frontmatter has valid YAML, description under 1024 chars, negative trigger present
  3. scripts/with_server.py is executable, starts a server, waits for health, runs a test command, and stops server — verified with a trivial echo command as the test
  4. scripts/with_server.py exits with the test command's exit code (not always 0)
  5. scripts/with_server.py terminates the server even if the test command fails
  6. scripts/run-tests.sh is executable and calls with_server.py correctly
  7. scripts/capture-baseline.sh is executable and creates at least one file in tests/baselines/
  8. references/playwright-setup.md contains a working playwright.config.ts template
  9. references/test-patterns.md covers selector philosophy, async patterns, and error state testing
  10. references/mc-test-suite.md covers all current MC routes with at least one assertion per route
  11. SKILL.md troubleshooting section covers at least 5 failure scenarios

## C3: skill-creator

Complexity: ⭐⭐⭐⭐ | Model: [cloud-required]

### Objective

New meta-skill for building and improving VV skills. Guides through use case definition, frontmatter generation, SKILL.md authoring, and trigger testing. Also reviews existing skills for structural issues and over/under-triggering risks. Replaces ad hoc skill authoring.

### File Structure

```
skill-creator/
├── SKILL.md
├── references/
│   ├── skill-spec-template.md
│   ├── frontmatter-guide.md
│   ├── trigger-test-suite.md
│   └── vv-skill-standards.md
└── scripts/
    └── validate-skill.sh
```

### Frontmatter

```yaml
---
name: skill-creator
description: >
  Interactive guide for creating new VV skills or improving existing ones. Walks through use case
  definition, frontmatter authoring, SKILL.md structure, and trigger testing. Also reviews existing
  SKILL.md files for structural issues, vague descriptions, missing negative triggers, or
  over/under-triggering risks. Triggered by: "create a skill", "build a skill", "improve this skill",
  "review this skill", "audit the skill", "tidy up a skill", "does this skill follow standards".
  Do NOT use for executing skill workflows — use the target skill itself. Do NOT use for non-skill
  documentation — use doc-coauthoring instead.
---
```

### SKILL.md Body Outline

1. **Two Modes**
- Create mode: building a new skill from scratch
- Review mode: auditing and improving an existing skill

2. **Create Mode Workflow**

**Step 1: Use Case Definition** — ask:
- What specific task or workflow does this skill enable?
- Who will use it? (Jules, Melody, Quinn, Forge, external)
- What trigger phrases will users say?
- What should NOT trigger this skill?
- Is there an existing skill that overlaps? (check the current skill list)
- Does this need scripts? references? assets?

**Step 2: Generate Frontmatter** — using the answers, produce:
- `name` — kebab-case, matches the folder name, under 30 chars
- `description` — WHAT + WHEN + negative triggers, under 1024 chars
- Optional fields: `compatibility`, `metadata`

Apply the `frontmatter-guide.md` rules. Validate with `validate-skill.sh`.

**Step 3: Write SKILL.md Body** — using Anthropic's recommended structure (adapted for VV):
1. When to Use
2. Workflow (numbered steps, explicit ordering)
3. Examples (good output, bad output, or scenario walkthroughs)
4. Troubleshooting (at least 3 failure modes)
5. References section (if `references/` files exist)

**Step 4: Trigger Testing** — generate 15 test cases using the `trigger-test-suite.md` format:
- 5 "should trigger" — obvious phrasing
- 5 "should trigger" — paraphrased
- 5 "should NOT trigger" — related but different

Evaluate each against the description. If any "should trigger" case doesn't clearly match the description, revise the description.

**Step 5: Validate** — run `scripts/validate-skill.sh path/to/skill/`. It must pass all checks before the skill is marked complete.

3. **Review Mode Workflow**

Read the target SKILL.md. Check it against `references/vv-skill-standards.md`:
- Frontmatter valid? `name`, `description`, negative trigger present?
- Description under 1024 chars? Contains WHAT + WHEN?
- Body has: When to Use, Workflow, Examples, Troubleshooting?
- Any references linked but missing from disk?
- Any scripts referenced but not executable?
- Any hardcoded absolute paths that should be relative?
- SKILL.md under 5000 words? If over, what should move to `references/`?

Report findings as PASS, WARN, or FAIL for each check. Provide specific fix instructions for each WARN/FAIL.

4. **VV Skill Standards** — read `references/vv-skill-standards.md` for the complete checklist.

5. **Troubleshooting**
- "Description is over 1024 chars" → Move details from the description into the SKILL.md body. The description is just the trigger, not the manual.
- "Skill triggers too often" → Add negative triggers. Be more specific about what it does NOT cover.
- "Skill never triggers" → The description is too generic. Add specific trigger phrases users would actually say.
- "validate-skill.sh fails on YAML" → Check for unescaped colons and unclosed quotes in the frontmatter.

### Reference File Descriptions

**`references/skill-spec-template.md`**
- Blank SKILL.md template with every section pre-populated with placeholder comments
- Shows the correct frontmatter structure
- Section headers in the right order
- Used as the starting skeleton in Create mode

**`references/frontmatter-guide.md`**
- Rules for each frontmatter field (from the Anthropic guide, VV-adapted)
- Good and bad description examples (at least 5 each)
- Negative trigger patterns with examples
- Common mistakes: XML tags, unclosed quotes, name with spaces

**`references/trigger-test-suite.md`**
- 15-test format template with fill-in-the-blank structure
- Example completed suite for the doc-coauthoring skill
- How to evaluate results: if 3+ should-trigger tests fail, revise the description

**`references/vv-skill-standards.md`**
- Complete VV skill quality checklist (30 items)
- Derived from: Anthropic guide checklist + VV-specific rules
- Each item: REQUIRED or RECOMMENDED
- Used by Review mode and validate-skill.sh

**`scripts/validate-skill.sh`**
- Language: Bash
- Purpose: Automated structural validation of a skill folder
- Inputs: `$1` — path to skill folder
- Checks (each produces PASS/WARN/FAIL):
  - SKILL.md exists at the exact path
  - Frontmatter has `---` delimiters
  - `name` field is kebab-case, no spaces, no capitals
  - `description` field exists and is non-empty
  - `description` length is under 1024 characters
  - `description` contains "Do NOT" or "NOT for" (negative trigger)
  - No `<` or `>` characters in frontmatter
  - SKILL.md word count under 5000
  - All files referenced in the SKILL.md body exist on disk
  - All scripts in the `scripts/` directory are executable
  - No README.md in the skill folder root
  - Folder name matches the `name` field in frontmatter
- Output: grouped PASS/WARN/FAIL list. Exit 1 if any FAIL. Exit 0 if all PASS (even with WARNs).
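The shipped validator is Bash; for clarity, the frontmatter subset of the checks can be prototyped in Python. `check_frontmatter` is an illustrative helper operating on already-parsed frontmatter, not part of the spec:

```python
import re

def check_frontmatter(fm: dict) -> dict:
    """Run the core frontmatter checks; returns {check_name: 'PASS' | 'FAIL'}."""
    name = fm.get("name", "")
    desc = fm.get("description", "")
    return {
        # kebab-case: lowercase alphanumeric chunks joined by single hyphens
        "name-kebab-case": "PASS" if re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", name) else "FAIL",
        "description-present": "PASS" if desc.strip() else "FAIL",
        "description-length": "PASS" if len(desc) < 1024 else "FAIL",
        "negative-trigger": "PASS" if ("Do NOT" in desc or "NOT for" in desc) else "FAIL",
        "no-angle-brackets": "PASS" if not re.search(r"[<>]", name + desc) else "FAIL",
    }
```

The filesystem checks (files on disk, executable bits, folder name match) remain plain `test`/`stat` calls in the Bash script.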

### Acceptance Criteria (Quinn)

  1. All files exist at specified paths
  2. SKILL.md frontmatter valid, under 1024 chars, negative trigger present
  3. SKILL.md body clearly defines Create mode and Review mode as separate workflows
  4. Create mode has exactly 5 steps, each with specific sub-instructions
  5. Review mode produces PASS/WARN/FAIL report (testable by running review on a known-bad skill)
  6. scripts/validate-skill.sh is executable
  7. Running validate-skill.sh on a correct skill exits 0
  8. Running validate-skill.sh on a skill with no description exits 1 with "FAIL" output
  9. Running validate-skill.sh on a skill with non-executable scripts exits 1
  10. references/vv-skill-standards.md has at least 20 checklist items
  11. references/frontmatter-guide.md has at least 5 good description examples and 5 bad ones
  12. references/trigger-test-suite.md has a completed example suite with 15 tests

## C4: agent-dispatch

Complexity: ⭐⭐⭐ | Model: [cloud-required]

### Objective

New skill. Codifies VV agent selection rules, model routing, timeout settings, and handoff protocol. Currently scattered in MEMORY.md as prose — this makes it procedural and reliably triggerable. Primary user: Jules (orchestrating work), Jeff (understanding who does what).

### File Structure

```
agent-dispatch/
├── SKILL.md
└── references/
    ├── agent-profiles.md
    ├── model-routing.md
    └── handoff-protocol.md
```

### Frontmatter

```yaml
---
name: agent-dispatch
description: >
  VV agent selection, model routing, and task handoff protocol. Use when deciding which agent
  (Atlas, Melody, Quinn, Forge) should handle a task, which model tier to use, how to structure
  a subagent prompt, or how to handle an agent handoff. Also use when a task needs to be split
  across agents or when verifying that the right agent is doing the right work.
  Triggered by: "who should do this", "which agent", "spawn a subagent", "delegate this",
  "route this task", "which model for this".
  Do NOT use for actually executing tasks — use the target skill for the work itself. Do NOT use
  for external API routing or OpenClaw model config — use openclaw-prime for that.
---
```

### SKILL.md Body Outline

1. **VV Agent Team** — quick-reference table:

| Agent | Role | Strengths | Not For |
|---|---|---|---|
| Jules | Orchestrator / COA | Planning, routing, synthesis, comms | Deep coding, sustained research |
| Atlas | Research Director | Deep research, signal gathering, synthesis | Writing code, UI work |
| Melody | Engineering Lead | Code implementation, refactors, scaffolding | Research, strategy |
| Quinn | QA Director | Code QA, testing, validation, verification | Building features |
| Forge | Product Architect | Specs, architecture decisions | Execution, coding |

2. **Task Routing Decision Tree**
- Is the task primarily research/market intelligence? → Atlas
- Is the task primarily code implementation (new feature, refactor, bug fix)? → Melody
- Is the task primarily validation (build pass, test, visual QA)? → Quinn
- Is the task primarily spec/architecture decisions? → Forge
- Does the task span multiple domains? → Jules orchestrates, spawns specialists
- Is it a quick synthesis or decision? → Jules handles it directly

3. **Model Tier Selection** — reference `references/model-routing.md` for the full routing table. Quick rules:
- Complex reasoning, architecture, strategy → `anthropic/claude-opus-4-6` (default)
- Code implementation, most agent tasks → `anthropic/claude-sonnet-4-6`
- Local simple tasks, triage, filtering → `qwen3.5:35b` or `glm-4.7-flash`
- Never use a heavy model for a task that a local one can handle
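The quick rules above amount to a lookup table. A sketch, using the model identifiers named in this spec (`references/model-routing.md` remains the source of truth, and the task-type keys here are illustrative):

```python
# Task type → model tier, mirroring the quick rules above.
ROUTING = {
    "complex-reasoning": "anthropic/claude-opus-4-6",
    "architecture":      "anthropic/claude-opus-4-6",
    "strategy":          "anthropic/claude-opus-4-6",
    "code":              "anthropic/claude-sonnet-4-6",
    "agent-task":        "anthropic/claude-sonnet-4-6",
    "triage":            "qwen3.5:35b",
    "filtering":         "glm-4.7-flash",
}

def route(task_type: str) -> str:
    # Unknown task types fall back to the standard tier, never the heavy one.
    return ROUTING.get(task_type, "anthropic/claude-sonnet-4-6")
```

Defaulting the fallback to Sonnet rather than Opus enforces the "never use a heavy model when a lighter one will do" rule.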

4. **Subagent Prompt Structure** — when spawning a subagent, every prompt must include:
1. Agent persona ("You are [Agent], [Role] for Vivere Vitalis.")
2. Convention files to read (CONVENTIONS.md, LESSONS_LEARNED.md if they exist)
3. Input files/context to read
4. The task — specific, scoped, completable in one session
5. Output destination — exact file path to write results
6. Acceptance signal — how the requester knows the task is done

See `references/handoff-protocol.md` for the full template.

5. **Handoff Protocol**
- Jules → Melody: include the spec file path, acceptance criteria, QA contact (Quinn)
- Melody → Quinn: include the PR/diff or file list, plus the spec file path for comparison
- Quinn → Jules: include pass/fail, with specific failure details if failed
- Jules → Jeff: include a summary of what was done, what's pending, what needs approval

6. **Escalation**
- If a subagent reports blocked (missing context, capability gap): Jules escalates to Jeff
- If Quinn reports repeated failures on the same spec: Forge re-specs
- Never re-assign a failed task to the same agent without changing the approach

7. **Examples**

Example: Research → Spec → Build → QA pipeline
Task: Build a new revenue dashboard widget

1. Jules assigns Atlas: "Research how other SaaS dashboards display MRR/ARR. Summarize the top 3 patterns."
2. Jules assigns Forge: "Spec the MRR widget using Atlas's research. Acceptance criteria for Quinn."
3. Jules assigns Melody: "Build to Forge's spec. Reference the vv-dashboard-design skill."
4. Jules assigns Quinn: "QA the MRR widget against Forge's acceptance criteria. Run qa-validation + visual Peekaboo checks."
5. Quinn → Jules: "All AC passed. Ready for Jeff review."

8. **Troubleshooting**
- "Melody keeps asking clarifying questions" → Forge's spec is incomplete. Return to the spec phase.
- "Quinn keeps failing QA" → Check whether the spec's acceptance criteria are testable. If not, Forge needs to revise.
- "Atlas research is too broad" → The scoping was wrong. Give Atlas a more specific brief.
- "Task took 3x expected time" → Break it into smaller atomic tasks. One task = one file or one logical unit.

### Reference File Descriptions

**`references/agent-profiles.md`**
- Full profile for each agent: Jules, Atlas, Melody, Quinn, Forge
- For each: role title, what they're optimized for, what they're NOT for, preferred model tier, known biases/tendencies (e.g., "Melody codes fast but skips architecture thinking")
- How to write effective prompts for each agent

**`references/model-routing.md`**
- Full routing table: task type → model recommendation
- Tier definitions: heavy (Opus), standard (Sonnet), local (Qwen/GLM)
- Cost consideration: when to use heavy vs standard
- Context window guidance: what tasks fit in what window sizes
- Update cadence: revisit monthly as new models become available

**`references/handoff-protocol.md`**
- Subagent prompt template (fill-in-the-blank)
- Handoff checklist for each agent pair (Jules→Melody, Melody→Quinn, etc.)
- Failure escalation paths
- "Task complete" signal standards — what constitutes a complete handoff

### Acceptance Criteria (Quinn)

  1. All files exist at specified paths
  2. SKILL.md frontmatter valid, under 1024 chars, negative triggers present
  3. SKILL.md body contains agent team table with all 5 agents (Jules, Atlas, Melody, Quinn, Forge)
  4. SKILL.md body contains a task routing decision tree (structured, not prose)
  5. SKILL.md body contains a subagent prompt structure with all 6 required fields
  6. SKILL.md body contains an Examples section with at least one multi-agent pipeline example
  7. SKILL.md body contains a Troubleshooting section
  8. references/agent-profiles.md has a profile for each of the 5 agents
  9. references/model-routing.md covers at least 3 model tiers with task-type recommendations
  10. references/handoff-protocol.md contains a fill-in-the-blank subagent prompt template

C5: revenue-modeling

Complexity: ⭐⭐⭐⭐ | Model: [cloud-required]

Objective

New skill. Standardizes ARR modeling, pricing tier design, and market sizing (TAM/SAM/SOM) analysis for VV. Used by Forge (specs with revenue implications) and Atlas (market research). Immediately relevant to APA pricing validation.

File Structure

revenue-modeling/
├── SKILL.md
├── references/
│   ├── arr-model-template.md
│   ├── pricing-tier-guide.md
│   ├── market-sizing-guide.md
│   └── examples/
│       ├── apa-arr-model.md
│       └── apa-market-sizing.md
└── scripts/
    └── arr-calc.py

Frontmatter

---
name: revenue-modeling
description: >
  ARR modeling, pricing tier design, and market sizing (TAM/SAM/SOM) for VV products and services.
  Use when evaluating a new product's revenue potential, designing pricing tiers, stress-testing
  revenue assumptions, or calculating TAM/SAM/SOM for a market. Triggered by: "model the revenue",
  "what should we charge", "design pricing tiers", "what's the market size", "ARR projections",
  "SOM calculation", "pricing validation", "revenue assumptions".
  Do NOT use for existing revenue tracking or invoicing — this is for planning and modeling only.
  Do NOT use for cost estimation of API spend — use cost-estimation skill instead.
---

SKILL.md Body Outline

1. When to Use
Revenue modeling before building, pricing decisions, market opportunity validation, pitch preparation.

2. Three Analysis Types
This skill covers three distinct analyses — use what the situation requires:
- ARR Modeling — what revenue can we make from this product?
- Pricing Tier Design — how should we structure pricing?
- Market Sizing — how big is the opportunity?

Often all three are needed together for a new product evaluation.

3. ARR Modeling Workflow

Step 1: Define the model inputs
- Product type: SaaS subscription, one-time, services, hybrid
- Billing cadence: monthly, annual, usage-based
- Customer segment count (conservative, base, optimistic)
- Churn rate assumption (monthly)
- Expansion revenue assumption (if applicable)

Step 2: Use the template
Read references/arr-model-template.md. Fill in all fields. Run scripts/arr-calc.py to compute projections.

Step 3: Stress-test
Run three scenarios: conservative (50% of base), base, optimistic (150% of base). Report all three. Never report only the optimistic case.

Step 4: Identify the biggest assumption
Every model has one number that determines everything. Name it explicitly: "This model's critical assumption is [X]. If [X] is wrong by Y%, the projection changes by Z%."
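The math behind Steps 2-3 can be sketched as follows (illustrative only: arr-calc.py is the canonical implementation, and the compounding-churn formula here is a simplifying assumption that ignores expansion revenue):

```python
def project_arr(prices, customers, churn, month):
    """MRR and ARR at a given month, assuming churn compounds monthly
    and no expansion revenue (a simplification of the full template)."""
    surviving = [c * (1 - churn) ** month for c in customers]
    mrr = sum(n * p for n, p in zip(surviving, prices))
    return mrr, mrr * 12

# Base case plus the conservative (0.5x) and optimistic (1.5x) scenarios
base = [50, 20, 5]
prices = [99, 499, 2000]
for label, mult in [("conservative", 0.5), ("base", 1.0), ("optimistic", 1.5)]:
    mrr, arr = project_arr(prices, [c * mult for c in base], churn=0.02, month=12)
    print(f"{label}: MRR ${mrr:,.0f} / ARR ${arr:,.0f}")
```

Reporting all three multipliers side by side is exactly the Step 3 requirement; the base case remains the planning figure.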

4. Pricing Tier Design Workflow

Step 1: Identify buyer types (ICP segmentation)
- Who are the distinct buyer types?
- What is each type's willingness to pay?
- What is each type's key value metric?

Step 2: Apply value-based pricing principles
- Price to the value delivered, not cost to produce
- Tier structure: 3 tiers (Starter/Pro/Enterprise or equivalent) is the default
- Anchor pricing: Enterprise tier anchors perception of value for Pro tier
- Free tier or trial: is it needed to reduce friction? What's the conversion assumption?

Step 3: Validate against market
- What do competitors charge for comparable value?
- What does the target ICP currently budget for this category?
- Reference references/pricing-tier-guide.md for VV standard structures

Step 4: Model the tier mix
- What % of customers will land in each tier?
- What is the resulting blended ARPU?
- Feed into ARR model
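Step 4's blended ARPU is a weighted average over the tier mix. A minimal sketch (the mix percentages and prices are illustrative, not VV figures):

```python
def blended_arpu(tier_mix, prices):
    """Weighted-average monthly revenue per customer.
    tier_mix: fraction of customers in each tier (must sum to 1.0)."""
    assert abs(sum(tier_mix) - 1.0) < 1e-9, "tier mix must sum to 100%"
    return sum(f * p for f, p in zip(tier_mix, prices))

# e.g. 60% Starter ($99), 30% Pro ($499), 10% Enterprise ($2000)
arpu = blended_arpu([0.6, 0.3, 0.1], [99, 499, 2000])  # about $409/mo blended
```

The result feeds directly into the ARR model as the per-customer revenue input.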

5. Market Sizing Workflow

Definitions (must use these consistently):
- TAM — Total Addressable Market: the entire category revenue if all possible customers bought
- SAM — Serviceable Addressable Market: the portion we can realistically reach with our GTM
- SOM — Serviceable Obtainable Market: what we can realistically capture in 3-5 years

Process:
1. Define the market category precisely (too broad = useless; too narrow = underestimates)
2. Use top-down approach (industry reports) AND bottom-up approach (customer count × ARPU)
3. Report both approaches. If they diverge significantly, explain why.
4. SAM = TAM × (% addressable by our GTM). Be honest about GTM reach.
5. SOM = SAM × (% we can capture). Justify the capture rate with comparable company benchmarks.
6. Reference references/market-sizing-guide.md for methodology and data sources.
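The bottom-up arithmetic in steps 2, 4, and 5 reduces to three multiplications. A sketch with hypothetical figures (not VV market data):

```python
def bottom_up_sizing(addressable_customers, annual_arpu, gtm_reach, capture_rate):
    """Bottom-up TAM, then SAM and SOM per the definitions above."""
    tam = addressable_customers * annual_arpu
    sam = tam * gtm_reach          # fraction addressable by our GTM
    som = sam * capture_rate       # fraction realistically captured in 3-5 years
    return tam, sam, som

# Hypothetical inputs: 5,000 target orgs x $12k/yr, 40% GTM reach, 5% capture
tam, sam, som = bottom_up_sizing(5_000, 12_000, 0.40, 0.05)
print(f"TAM ${tam:,.0f} | SAM ${sam:,.0f} | SOM ${som:,.0f}")
```

The top-down estimate should be computed independently from industry reports and compared against this figure, per step 3.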

6. Examples
See references/examples/apa-arr-model.md and references/examples/apa-market-sizing.md.

7. Output Format
Always deliver:
- Inputs table (all assumptions, explicitly labeled as assumptions)
- Projections table (3 scenarios)
- Single "critical assumption" statement
- Recommendation or implication (don't just present numbers — tell Jeff what they mean)

8. Troubleshooting
- "I don't have enough data for a bottom-up estimate" → Use top-down as primary, bottom-up sanity check from first principles. Flag the uncertainty explicitly.
- "The optimistic case looks unrealistically high" → It probably is. Use the base case as the planning figure. Optimistic is the ceiling if everything goes right.
- "We can't agree on what the TAM is" → The market definition is wrong. Narrow it until you can get a specific number.
- "The model shows we need 10,000 customers to hit $1M ARR but our SAM is 500 companies" → Pricing is too low. Return to pricing tier design.

Reference File Descriptions

references/arr-model-template.md
- Structured fill-in-the-blank ARR model
- Sections: Product inputs, Customer inputs (segment counts by tier), Churn assumptions, Expansion assumptions
- Monthly and annual projection tables (pre-formatted as markdown tables with formula comments)
- 3-scenario output table template

references/pricing-tier-guide.md
- VV standard 3-tier structure (Starter/Pro/Enterprise) with naming options
- Value metric examples by product type (per seat, per API call, per team, flat rate)
- Anchoring principle with examples
- Free trial / freemium decision framework: when it helps vs when it hurts ARR
- Common pricing mistakes: underpricing for enterprise, feature-gating the wrong things

references/market-sizing-guide.md
- TAM/SAM/SOM definitions and common mistakes
- Top-down methodology: how to use industry reports + adjustment factors
- Bottom-up methodology: how to estimate from customer count × ARPU
- Data sources for VV target markets (fitness, sports, wellness): which reports to reference
- Comparable company benchmarks for capture rate justification

references/examples/apa-arr-model.md
- Completed ARR model for Athletic Performance Analytics (APA)
- Uses real-style inputs, all three scenarios, critical assumption identified
- Annotated to show reasoning

references/examples/apa-market-sizing.md
- Completed TAM/SAM/SOM analysis for the APA target market (professional sports teams, NCAA)
- Both top-down and bottom-up approaches shown
- Annotations explain methodology choices

scripts/arr-calc.py

  • Language: Python 3
  • Purpose: Calculate ARR projections from model inputs, avoiding spreadsheet dependency
  • Inputs (CLI args):
      • --tiers STR — comma-separated tier names (e.g. "Starter,Pro,Enterprise")
      • --prices STR — comma-separated monthly prices per tier (e.g. "99,499,2000")
      • --customers STR — comma-separated customer counts per tier for base case (e.g. "50,20,5")
      • --churn FLOAT — monthly churn rate as decimal (e.g. 0.02 for 2%)
      • --months INT — projection horizon (default: 36)
      • --scenario FLOAT (optional) — multiplier for optimistic/conservative (e.g. 1.5 for optimistic, 0.5 for conservative); if omitted, runs all three
  • Output: Formatted table showing MRR and ARR at month 1, 6, 12, 24, 36 for each scenario
  • Also prints: blended ARPU, implied annual churn %, total customers at horizon
  • Exit 0 on success, 1 on invalid inputs
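The CLI surface above could be wired up as sketched below. This covers argument parsing and the tier-count check only; the projection math, output table, and the additional validation added in the V2 section are left to the real implementation:

```python
import argparse
import sys

def parse_args(argv=None):
    """Parse arr-calc.py's CLI flags as specified; exit 1 on tier-count mismatch."""
    p = argparse.ArgumentParser(description="ARR projection calculator (sketch)")
    p.add_argument("--tiers", required=True)      # e.g. "Starter,Pro,Enterprise"
    p.add_argument("--prices", required=True)     # monthly prices per tier
    p.add_argument("--customers", required=True)  # base-case counts per tier
    p.add_argument("--churn", type=float, required=True)
    p.add_argument("--months", type=int, default=36)
    p.add_argument("--scenario", type=float, default=None)
    args = p.parse_args(argv)

    tiers = args.tiers.split(",")
    prices = [float(x) for x in args.prices.split(",")]
    customers = [int(x) for x in args.customers.split(",")]
    if not (len(tiers) == len(prices) == len(customers)):
        print("ERROR: --tiers, --prices, and --customers must list the same number of tiers")
        sys.exit(1)
    return args, tiers, prices, customers

args, tiers, prices, customers = parse_args(
    ["--tiers", "Starter,Pro,Enterprise", "--prices", "99,499,2000",
     "--customers", "50,20,5", "--churn", "0.02"])
```

Running with mismatched tier counts (e.g. --prices "99,499" against three tiers) exits 1, which is what AC10 checks.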

Acceptance Criteria (Quinn)

  1. All files exist at specified paths
  2. SKILL.md frontmatter valid, under 1024 chars, both negative triggers present
  3. SKILL.md body defines TAM, SAM, SOM explicitly (exact definitions)
  4. SKILL.md body covers all 3 analysis types (ARR, Pricing, Market Sizing) as distinct workflows
  5. ARR modeling workflow includes 3-scenario requirement (conservative, base, optimistic)
  6. ARR modeling workflow includes "critical assumption" identification step
  7. Pricing tier design covers value-based pricing, tier structure, and tier-mix modeling
  8. scripts/arr-calc.py is executable and produces output without crashing with valid inputs
  9. scripts/arr-calc.py produces projections for 3 scenarios when --scenario is omitted
  10. scripts/arr-calc.py exits 1 with clear error when prices and customers have different tier counts
  11. references/arr-model-template.md has a complete 3-scenario output table template
  12. references/market-sizing-guide.md covers both top-down and bottom-up methodology
  13. Both example files (APA ARR model, APA market sizing) exist and are fully populated
  14. Troubleshooting section covers at least 4 failure scenarios

Summary Table

| ID | Skill | Type | Complexity | Model | Key Deliverables |
|----|-------|------|------------|-------|------------------|
| M1 | cost-estimation | Medium | ⭐⭐ | local-ok | estimate.py, references/MODEL_PRICING_REGISTRY.md |
| M2 | service-management | Medium | ⭐⭐⭐⭐ | cloud | 4 scripts, 3 reference files, absorb project-scaffolding |
| M3 | qa-validation | Medium | ⭐⭐⭐ | local-ok | validate.sh, references/peekaboo-guide.md, troubleshooting |
| M4 | vv-sigint | Medium | ⭐⭐ | local-ok | check-sources.sh, examples, troubleshooting |
| M5 | vv-dashboard-design | Medium | ⭐⭐ | local-ok | check-tokens.sh, 3 example files |
| M6 | project-pipeline | Medium | | local-ok | references/evaluation-template.md, example evaluation |
| M7 | frontend-design | Medium | ⭐⭐ | cloud | references/font-pairings.md, references/color-examples.md |
| M8 | memory-manager | Medium | ⭐⭐ | local-ok | consolidate-check.py, examples section |
| M9 | openclaw-prime | Medium | | local-ok | troubleshooting section |
| C1 | doc-coauthoring | Complex | ⭐⭐⭐⭐ | cloud | New skill, 3-stage workflow, 4 reference files, 1 script |
| C2 | webapp-testing | Complex | ⭐⭐⭐⭐⭐ | cloud | New skill, 3 scripts, 3 reference files, with_server.py |
| C3 | skill-creator | Complex | ⭐⭐⭐⭐ | cloud | New skill, validate-skill.sh, 4 reference files |
| C4 | agent-dispatch | Complex | ⭐⭐⭐ | cloud | New skill, 3 reference files, handoff protocol |
| C5 | revenue-modeling | Complex | ⭐⭐⭐⭐ | cloud | New skill, arr-calc.py, 5 reference files |

Total estimated effort: ~35-45 Melody hours across all items


Implementation Notes for Melody

  1. Work M1-M9 in order. They are mostly independent but M2 (service-management) must complete before any references to start-service.sh in other skills are added.
  2. M2 is the highest-risk task. The project-scaffolding merge involves content synthesis — don't just append, integrate. Read both existing SKILL.md files carefully before writing the merged version.
  3. Scripts must be tested before marking done. Run each script with at least one valid input and one invalid input. Verify exit codes.
  4. Reference files must be complete. A stub with "coming soon" is a build error, not a pass.
  5. For Complex tier: Build C1 (doc-coauthoring) first — it's the simplest and validates the workflow pattern. Then C3 (skill-creator) — you'll need it to validate the other skills. Then C2, C4, C5.
  6. Absolute path check: Scan every SKILL.md for /Users/viverevitalis/ paths and replace with relative paths using references/ structure.
  7. Deletion of project-scaffolding: Do NOT delete until M2 is verified by Quinn. Confirmation step required.

Spec complete. Ready for Quinn review, then Jules strategic review, then Jeff approval before Melody execution.



V2 REVISION — Post-Quinn Review Delta

Author: Forge (Director of Product Architecture) Date: 2026-03-20 Status: Addresses all blocking and non-blocking issues raised in QA_SPEC_REVIEW.md Scope: This section is a delta only. All original spec content above remains in effect unless explicitly superseded here.


BLOCKER 1: intelligence-suite — Disposition Added (NEW: M10)

Quinn's finding: intelligence-suite was called out in the BRIEF but completely absent from the spec.

Jules's decision: ARCHIVE. vv-sigint is VV's intelligence skill. intelligence-suite belongs to a different agent persona (Makima) and has no place in our stack. No adaptation needed.


M10: intelligence-suite (Archive)

Complexity: ⭐ | Model: [local-ok]

Objective

Archive the intelligence-suite skill by relocating its folder. No content adaptation required. vv-sigint handles all signal intelligence needs for VV.

File Changes

Move Operation

  • From: ~/.openclaw/workspace/skills/intelligence-suite/
  • To: ~/.openclaw/workspace/skills/.archived-intelligence-suite/

Use mv — do not copy-and-delete. Dot-prefix ensures it is excluded from normal skill discovery.

No Content Changes

Do NOT modify SKILL.md or any files inside the archived folder. The goal is preservation with suppression, not deletion.

Post-Move Verification

  • Confirm ~/.openclaw/workspace/skills/intelligence-suite/ no longer exists
  • Confirm ~/.openclaw/workspace/skills/.archived-intelligence-suite/SKILL.md exists and is readable
  • Confirm no other file in ~/.openclaw/workspace/skills/ references intelligence-suite by name in a live SKILL.md trigger (grep check)
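The move and all three verification checks can be rehearsed end-to-end in a throwaway sandbox before running against the live workspace (the sandbox path below is a stand-in for the real skills directory):

```shell
# Rehearse the M10 move in a throwaway sandbox before touching the live workspace
SANDBOX="$(mktemp -d)/skills"
mkdir -p "$SANDBOX/intelligence-suite"
echo "name: intelligence-suite" > "$SANDBOX/intelligence-suite/SKILL.md"

# The actual operation: a single mv, not copy-and-delete
mv "$SANDBOX/intelligence-suite" "$SANDBOX/.archived-intelligence-suite"

# Post-move verification mirrors the checklist above
[ ! -d "$SANDBOX/intelligence-suite" ] && echo "old path gone"
[ -s "$SANDBOX/.archived-intelligence-suite/SKILL.md" ] && echo "archive readable"
grep -rl "intelligence-suite" "$SANDBOX" --include="SKILL.md" | grep -v ".archived" \
  || echo "no live references"
```

Against the real workspace, only the mv and the three checks are needed; the mkdir/echo lines exist solely to make the rehearsal self-contained.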

Acceptance Criteria (Quinn)

  1. skills/intelligence-suite/ does not exist after M10 completes
  2. skills/.archived-intelligence-suite/SKILL.md exists and is non-empty
  3. grep -r "intelligence-suite" ~/.openclaw/workspace/skills/ --include="SKILL.md" returns zero results from non-archived skills
  4. M10 completes in under 5 minutes — this is a one-command move, not a rewrite

BLOCKER 2: skill-creator Name Conflict — Renamed to vv-skill-creator

Quinn's finding: A system-level skill-creator skill already exists at /opt/homebrew/lib/node_modules/openclaw/skills/skill-creator/. Creating a workspace-level skill with the same name creates an ambiguous resolution conflict.

Jules's decision: RENAME. The VV variant becomes vv-skill-creator. This avoids collision, follows VV naming convention for custom skills, and preserves the system-level skill for general OpenClaw use.

Changes to C3 (now vv-skill-creator)

C3 File Structure — UPDATED:

vv-skill-creator/
├── SKILL.md
├── references/
│   ├── skill-spec-template.md
│   ├── frontmatter-guide.md
│   ├── trigger-test-suite.md
│   └── vv-skill-standards.md
└── scripts/
    └── validate-skill.sh

C3 Frontmatter — UPDATED:

---
name: vv-skill-creator
description: >
  Interactive guide for creating new VV skills or improving existing ones. Walks through use case
  definition, frontmatter authoring, SKILL.md structure, and trigger testing. Also reviews existing
  SKILL.md files for structural issues, vague descriptions, missing negative triggers, or
  over/under-triggering risks. Triggered by: "create a skill", "build a skill", "improve this skill",
  "review this skill", "audit the skill", "tidy up a skill", "does this skill follow standards".
  Do NOT use for executing skill workflows — use the target skill itself. Do NOT use for non-skill
  documentation — use doc-coauthoring instead.
---

All internal C3 references updated: Every occurrence of skill-creator within the SKILL.md body, reference files, and script output messages must use vv-skill-creator. The folder name is vv-skill-creator/. The frontmatter name field is vv-skill-creator.

Cross-Reference Update: C1 AC11

Original AC11 (C1 — doc-coauthoring):

"Running the skill-creator skill against this SKILL.md produces no 'missing required field' warnings."

Updated AC11 (V2):

"Running the vv-skill-creator skill against this SKILL.md via scripts/validate-skill.sh produces no FAIL outputs."

Note: with the Blocker 3 build-order change (C3 now builds before C1), validate-skill.sh exists by the time C1 is built, so AC11 is verifiable at C1 completion. See updated implementation order below.


BLOCKER 3: C1/C3 Build Order — C3 Builds First

Quinn's finding: C1 AC11 depends on vv-skill-creator's validate-skill.sh existing. The original order (C1 → C3 → C2 → C4 → C5) makes AC11 unverifiable at C1 completion time.

Fix: Reverse C1 and C3 in the build sequence. C3 (vv-skill-creator) builds and passes QA first, then C1 (doc-coauthoring) builds and AC11 is verified using the newly-built validate-skill.sh.

Updated Complex Tier Build Order:
1. C3 — vv-skill-creator (builds first; provides validate-skill.sh for all subsequent skill QA)
2. C1 — doc-coauthoring (AC11 now verifiable post-C3)
3. C2 — webapp-testing
4. C4 — agent-dispatch
5. C5 — revenue-modeling

See updated Implementation Order at the end of this section.


NON-BLOCKING: M2 Deletion Ordering Clarification

Quinn's finding: AC14 instructs Melody to delete project-scaffolding/SKILL.md as part of M2 completion, while the Implementation Notes say "Do NOT delete until M2 is verified by Quinn." Direct contradiction.

Fix: Remove AC14 from M2's acceptance criteria. Deletion is a post-QA step, not a build step.

M2 Acceptance Criteria — UPDATED

Remove from M2 ACs:

~~14. project-scaffolding/SKILL.md has been deleted~~

Add as post-QA step (after Quinn signs off M2):

Post-QA Step — M2 Deletion:

After Quinn confirms M2 PASS, Jules or Melody executes: trash ~/.openclaw/workspace/skills/project-scaffolding/ (use trash, not rm). Confirm deletion with Quinn before marking M2 fully closed. This step is NOT part of Melody's build pass — it is a separate controlled action.


NON-BLOCKING: M4, M5, M8 — Negative Trigger Confirmation

Quinn's finding: Spec said "no frontmatter change needed" for M4, M5, M8 without quoting the existing negative triggers. Required verification.

Confirmed — negative triggers already exist (added to Simple tier today per Jules):

M4 — vv-sigint: Existing description includes:

"Do NOT use for internal project status or team updates — this is for external signal gathering only." ✅ Negative trigger confirmed. No frontmatter update required.

M5 — vv-dashboard-design: Existing description includes:

"Do NOT use for non-MC web projects — use frontend-design for general web work." ✅ Negative trigger confirmed. No frontmatter update required.

M8 — memory-manager: Existing description includes:

"Do NOT use for session-to-session conversation continuity — that is handled by the AGENTS.md startup routine." ✅ Negative trigger confirmed. No frontmatter update required.

Melody: treat the frontmatter for M4, M5, M8 as frozen. Do not touch it unless a check on the actual file reveals the text above is absent, in which case add the quoted trigger text verbatim.


NON-BLOCKING: M7 — Example Scenarios Added

Quinn's finding: The brief called for "2-3 example scenario walkthroughs." The spec provided reference files (font-pairings.md, color-examples.md) but not scenario walkthroughs. These are different things.

Fix: Add a references/examples/ directory to M7 with two scenario walkthrough files.

M7 Additional File Changes

frontend-design/references/examples/minimal-saas-landing.md
- Purpose: Scenario walkthrough — designing a minimal SaaS landing page
- Content structure:
  - The brief: "Build a landing page for a B2B analytics tool. Tone: trustworthy, clean, modern."
  - Aesthetic selection: Why minimal? (trust signals, B2B buyers, data-forward context)
  - Font pairing chosen: (reference font-pairings.md minimal entry) — reasoning explained
  - Color palette chosen: (reference color-examples.md minimal entry) — reasoning explained
  - Layout decisions: whitespace, typography scale, CTA placement
  - ❌ What NOT to do: purple gradient hero, generic stock photo, Inter/Roboto alone
  - ✅ Output: Full React component (hero + CTA section) using the chosen fonts and palette

frontend-design/references/examples/editorial-dashboard-widget.md
- Purpose: Scenario walkthrough — designing an editorial-style data widget
- Content structure:
  - The brief: "Build a stats widget that feels premium and distinctive. Not like every other dashboard."
  - Aesthetic selection: Why editorial? (contrast with typical dashboards, creates visual hierarchy)
  - Font pairing chosen: (reference font-pairings.md editorial entry) — reasoning explained
  - Color palette chosen: (reference color-examples.md editorial entry) — reasoning explained
  - Typography choices: large display numbers, small label type, intentional weight contrast
  - ❌ What NOT to do: rounded cards with soft shadows everywhere, Inter everywhere, gradient accents
  - ✅ Output: Stat card React component using editorial typography and color

M7 SKILL.md Updates — REVISED

In addition to the "Starting Resources" section already specified, add to the SKILL.md body:

## Example Scenarios
See `references/examples/` for end-to-end scenario walkthroughs:
- `minimal-saas-landing.md` — B2B landing page, minimal aesthetic, full reasoning chain
- `editorial-dashboard-widget.md` — Premium data widget, editorial aesthetic, full reasoning chain

Read these before starting any project to see how aesthetic selection → font → color → component flows in practice.

M7 Updated Acceptance Criteria

Add to existing M7 ACs:

  7. references/examples/minimal-saas-landing.md exists with aesthetic selection reasoning and a React component
  8. references/examples/editorial-dashboard-widget.md exists with aesthetic selection reasoning and a React component
  9. SKILL.md references references/examples/ with links to both files

NON-BLOCKING: M1 — Input Validation for Negative Token Values

Quinn's finding: estimate.py has no handling for --input-tokens -1000. Should reject negative values.

Fix: Add to cost-estimation/scripts/estimate.py spec:

Validation step (insert after arg parsing, before all calculation logic):

# Input validation (requires import sys at the top of the script)
for flag, value in [("--input-tokens", args.input_tokens),
                    ("--output-tokens", args.output_tokens),
                    ("--cached-tokens", args.cached_tokens)]:
    if value < 0:
        print(f"ERROR: {flag} cannot be negative. Got: {value}")
        sys.exit(1)

M1 AC addition:

  8. Running estimate.py --input-tokens -1000 --output-tokens 1000 --model claude-sonnet-4-6 exits 1 with a clear error message containing "cannot be negative"

NON-BLOCKING: M2 — scaffold.sh Directory Existence Check

Quinn's finding: If ~/projects/vv-analytics already exists, create-next-app will fail or prompt interactively.

Fix: Add to scaffold.sh spec, before step 2 (the create-next-app invocation):

Insert as step 1.5:

1.5. Check if target directory already exists:

if [ -d "$PARENT_DIR/$PROJECT_NAME" ]; then
  echo "ERROR: Directory $PARENT_DIR/$PROJECT_NAME already exists. Choose a different project name or remove the existing directory."
  exit 1
fi


NON-BLOCKING: M2 — start-service.sh Complex Command Fix

Quinn's finding: Passing $2 (the command) as a bare string to nohup $COMMAND breaks for commands with pipes, redirects, or chained operators.

Fix: Update start-service.sh spec, step 2 launch line:

Original:

nohup $COMMAND > service.log 2>&1 < /dev/null &

Updated:

nohup bash -c "$COMMAND" > service.log 2>&1 < /dev/null &

This ensures complex commands like "cd /app && npm start" or "npm run build && npm start" execute correctly.
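The failure mode is easy to reproduce without nohup or a real service; echo stands in for the service command here:

```shell
COMMAND='echo first && echo second'

# Bare expansion: '&&' is word-split into literal arguments of the first echo,
# so this prints a single line: first && echo second
$COMMAND

# bash -c re-parses the string as a command line, so both echos actually run
bash -c "$COMMAND"
```

The bare form prints one line because the operator becomes a literal argument; the bash -c form executes both commands, which is why the launch line needs it.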


NON-BLOCKING: C2 — capture-baseline.sh Playwright Companion File

Quinn's finding: A bash script cannot parse a markdown file and generate Playwright navigation commands. The spec needs to clarify the mechanism.

Fix: Update capture-baseline.sh spec behavior:

Replace step 2 in original spec:

~~2. Use Playwright to navigate to each route defined in references/mc-test-suite.md~~

Updated step 2:

  2. Invoke the companion Playwright test file: npx playwright test tests/capture-baselines.spec.ts --project=chromium — this file handles navigation to each route defined in references/mc-test-suite.md. If tests/capture-baselines.spec.ts does not exist in the target project, exit 1 with: "Playwright baseline spec not found. Create tests/capture-baselines.spec.ts first."

Additional note to C2 spec:

The webapp-testing skill should document that capture-baselines.spec.ts is a required file in the target project (not part of the skill itself). Add to references/playwright-setup.md:

## Baseline Capture Test File
capture-baseline.sh requires `tests/capture-baselines.spec.ts` in the target project.
This file should navigate to each route in mc-test-suite.md and call `page.screenshot()`.
See test-patterns.md for the correct Playwright screenshot API usage.


NON-BLOCKING: C3 — validate-skill.sh Check 9 Narrowed

Quinn's finding: Check 9 ("All files referenced in SKILL.md body exist on disk") is too broad. Parsing arbitrary markdown for all file references will produce a fragile parser.

Fix: Narrow check 9 in validate-skill.sh spec:

Original check 9:

  9. All files referenced in SKILL.md body exist on disk

Updated check 9:

  9. Check that all relative paths matching references/ or scripts/ patterns mentioned in the SKILL.md body exist on disk. Implementation: grep -oE '(references|scripts)/[a-zA-Z0-9._/-]+' SKILL.md — for each match, verify the file exists at $SKILL_DIR/$MATCH. Print FAIL for any missing file, PASS if all exist or none are referenced.
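The narrowed check can be exercised standalone. The sketch below builds a throwaway skill directory with one resolvable and one dangling reference; validate-skill.sh would run the same loop against the real skill directory:

```shell
# Throwaway skill dir with one resolvable and one dangling reference
SKILL_DIR="$(mktemp -d)"
mkdir -p "$SKILL_DIR/references"
echo "guide" > "$SKILL_DIR/references/guide.md"
printf 'See references/guide.md and scripts/missing.sh\n' > "$SKILL_DIR/SKILL.md"

# Check 9: every references/ or scripts/ path mentioned in SKILL.md must exist
STATUS=PASS
for ref in $(grep -oE '(references|scripts)/[a-zA-Z0-9._/-]+' "$SKILL_DIR/SKILL.md"); do
  if [ ! -e "$SKILL_DIR/$ref" ]; then
    echo "FAIL: missing $ref"
    STATUS=FAIL
  fi
done
echo "check 9: $STATUS"
```

The dangling scripts/missing.sh reference is flagged FAIL; with every referenced path present the loop falls through and PASS is reported.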

NON-BLOCKING: C4 — Date-Stamp Requirement for model-routing.md

Quinn's finding: The model routing table will go stale. AC9 should require a date-stamp to make staleness detectable.

Fix: Add to agent-dispatch/references/model-routing.md spec:

Required header in model-routing.md:

# Model Routing Table
**Last updated:** YYYY-MM-DD
**Review cadence:** Monthly. Revisit when new model tiers are released or pricing changes significantly.

Updated C4 AC9:

  9. references/model-routing.md covers at least 3 model tiers with task-type recommendations and includes a "Last updated: YYYY-MM-DD" date-stamp in the header

NON-BLOCKING: C5 — --scenario Flag Clarified + Input Validation

Quinn's finding (1): When --scenario IS provided, behavior is ambiguous — does it run only that multiplier, or still run all three?

Clarification: When --scenario FLOAT is provided, run only that single multiplier scenario against the base customer counts. When omitted, run all three (0.5× conservative, 1.0× base, 1.5× optimistic).

Updated arr-calc.py spec behavior:

  • --scenario 1.5 → runs only the optimistic scenario (1.5× customer counts)
  • --scenario 0.5 → runs only the conservative scenario (0.5× customer counts)
  • --scenario omitted → runs all three: conservative (0.5×), base (1.0×), optimistic (1.5×)

Quinn's finding (2): No handling for nonsensical inputs (churn > 50%, negative prices/customer counts).

Add to arr-calc.py spec — Input Validation section:

# Validation rules (after arg parsing; requires import sys at the top of the script):
if args.churn < 0 or args.churn > 1:
    print("ERROR: --churn must be between 0.0 and 1.0 (e.g., 0.02 for 2% monthly churn)")
    sys.exit(1)
if args.churn > 0.5:
    print("WARNING: Monthly churn > 50% detected. Double-check inputs — this is economically unusual.")
    # Continue with calculation (warning only, not exit)
for price in [float(p) for p in args.prices.split(",")]:
    if price < 0:
        print("ERROR: Prices cannot be negative.")
        sys.exit(1)
for count in [int(c) for c in args.customers.split(",")]:
    if count < 0:
        print("ERROR: Customer counts cannot be negative.")
        sys.exit(1)

Updated C5 AC additions:

  15. arr-calc.py --churn 1.5 ... exits 1 with a clear error about churn range
  16. arr-calc.py --churn 0.6 ... completes with a printed warning about high churn (does not exit 1)
  17. arr-calc.py --prices "-100,500,2000" ... exits 1 with a clear error about negative prices
  18. arr-calc.py --scenario 1.5 ... runs only the 1.5× scenario, not all three

NON-BLOCKING: C5 — APA Example Numbers — Atlas Cross-Reference Required

Quinn's finding: APA example files in revenue-modeling need realistic numbers that don't contradict Atlas's market research.

Directive to Melody: Do NOT populate references/examples/apa-arr-model.md or references/examples/apa-market-sizing.md with specific numbers until Atlas's APA market research is available. Instead:

Placeholder approach:
- Create both files with complete structure (all sections, headings, tables)
- Replace all specific dollar figures, customer counts, and market size numbers with [TBD — pending Atlas market research]
- Add a header note to both files: ⚠ Numbers in this file are placeholders. Cross-reference with Atlas's APA market research before treating as planning inputs.

Updated C5 AC13:

  13. Both APA example files exist with complete structure. All specific market figures are marked [TBD — pending Atlas market research] with a header warning. Files are not populated with invented numbers.

Updated Implementation Order (V2)

Medium Tier (M1-M9 order unchanged; M10 added at the front)

  1. M10 — intelligence-suite archive (new, run first — fast, no dependencies)
  2. M1 — cost-estimation
  3. M2 — service-management (absorbs project-scaffolding)
  4. M3 — qa-validation
  5. M4 — vv-sigint
  6. M5 — vv-dashboard-design
  7. M6 — project-pipeline
  8. M7 — frontend-design
  9. M8 — memory-manager
  10. M9 — openclaw-prime

Post-M2 QA gate: Quinn signs off M2 → Jules/Melody deletes project-scaffolding (controlled deletion, not part of build pass).

Complex Tier (C1/C3 order reversed from original)

  1. C3 — vv-skill-creator (builds first; validate-skill.sh required by C1 AC11)
  2. C1 — doc-coauthoring (AC11 now verifiable post-C3)
  3. C2 — webapp-testing
  4. C4 — agent-dispatch
  5. C5 — revenue-modeling

Post-C3 note: because C3 builds first, validate-skill.sh is available before C1 starts, so AC11 is verified as part of C1's normal QA pass rather than retroactively. If it fails, Melody patches C1 before proceeding to C2.


Updated Summary Table (V2)

| ID | Skill | Type | Complexity | Model | Key Deliverables | V2 Changes |
|----|-------|------|------------|-------|------------------|------------|
| M1 | cost-estimation | Medium | ⭐⭐ | local-ok | estimate.py, references/MODEL_PRICING_REGISTRY.md | Add negative token validation (AC8) |
| M2 | service-management | Medium | ⭐⭐⭐⭐ | cloud | 4 scripts, 3 reference files, absorb project-scaffolding | Remove AC14; post-QA deletion step; scaffold.sh dir check; start-service.sh bash -c fix |
| M3 | qa-validation | Medium | ⭐⭐⭐ | local-ok | validate.sh, references/peekaboo-guide.md, troubleshooting | No changes |
| M4 | vv-sigint | Medium | ⭐⭐ | local-ok | check-sources.sh, examples, troubleshooting | Negative trigger confirmed (no frontmatter change) |
| M5 | vv-dashboard-design | Medium | ⭐⭐ | local-ok | check-tokens.sh, 3 example files | Negative trigger confirmed (no frontmatter change) |
| M6 | project-pipeline | Medium | | local-ok | references/evaluation-template.md, example evaluation | No changes |
| M7 | frontend-design | Medium | ⭐⭐ | cloud | references/font-pairings.md, references/color-examples.md | Add references/examples/ with 2 scenario walkthroughs (AC7-9 added) |
| M8 | memory-manager | Medium | ⭐⭐ | local-ok | consolidate-check.py, examples section | Negative trigger confirmed (no frontmatter change) |
| M9 | openclaw-prime | Medium | | local-ok | troubleshooting section | No changes |
| M10 | intelligence-suite | Medium | ⭐ | local-ok | Archive to .archived-intelligence-suite/ | NEW — was missing from spec |
| C1 | doc-coauthoring | Complex | ⭐⭐⭐⭐ | cloud | New skill, 3-stage workflow, 4 reference files, 1 script | AC11 verified via validate-skill.sh; builds AFTER C3 |
| C2 | webapp-testing | Complex | ⭐⭐⭐⭐⭐ | cloud | New skill, 3 scripts, 3 reference files, with_server.py | capture-baseline.sh invokes companion Playwright spec file |
| C3 | vv-skill-creator | Complex | ⭐⭐⭐⭐ | cloud | New skill, validate-skill.sh, 4 reference files | RENAMED from skill-creator; builds FIRST in complex tier; validate-skill.sh check 9 narrowed |
| C4 | agent-dispatch | Complex | ⭐⭐⭐ | cloud | New skill, 3 reference files, handoff protocol | model-routing.md requires date-stamp header |
| C5 | revenue-modeling | Complex | ⭐⭐⭐⭐ | cloud | New skill, arr-calc.py, 5 reference files | --scenario flag clarified; churn/price validation added; APA examples use TBD placeholders pending Atlas research |

Total estimated effort: ~35-45 Melody hours (unchanged — M10 adds ~15 min)


V2 Revision complete. All 3 blocking issues resolved. All non-blocking issues addressed. Ready for Quinn delta sign-off, then Jeff approval before Melody execution.

— Forge