# Skills Overhaul — Implementation Spec

**Author:** Forge (Director of Product Architecture)
**Date:** 2026-03-20
**Status:** Draft — Pending Quinn + Jules review before Melody execution
**References:** BRIEF.md, anthropic-skills-guide.md, ANTHROPIC_SKILLS_GUIDE_REVIEW.md

## How to Read This Spec
- Melody builds directly from this. No clarifying questions expected.
- Quinn verifies against acceptance criteria — each AC is independently testable.
- Complexity ratings: ⭐ = ~30 min, ⭐⭐ = ~1 hr, ⭐⭐⭐ = ~2 hr, ⭐⭐⭐⭐ = ~3 hr, ⭐⭐⭐⭐⭐ = 4+ hr
- Model tags: `[cloud-required]` = needs Claude Sonnet or better; `[local-ok]` = local model fine.
- All paths are relative to `~/.openclaw/workspace/skills/` unless otherwise noted.
# MEDIUM TIER

## M1: cost-estimation
**Complexity:** ⭐⭐ | **Model:** `[local-ok]`
### Objective
Add a Python estimation script and move the pricing registry into the skill's `references/` directory for proper progressive disclosure. Update SKILL.md to reference the new paths.
### File Changes
#### New Files
**`cost-estimation/references/MODEL_PRICING_REGISTRY.md`**
- Move from ~/.openclaw/workspace/MODEL_PRICING_REGISTRY.md (current location)
- No content changes — copy as-is
- Update all internal references if any exist
**`cost-estimation/scripts/estimate.py`**
- Language: Python 3
- Purpose: Parse session token data and compute estimated cost without manual math
- Inputs (CLI args):
- --input-tokens INT — input tokens used
- --output-tokens INT — output tokens used
- --cached-tokens INT (optional, default 0) — cached input tokens
- --model STR — model name matching registry (e.g. claude-sonnet-4-6)
- --registry PATH (optional) — path to pricing registry MD; defaults to ../references/MODEL_PRICING_REGISTRY.md relative to script location
- Behavior:
1. Parse the pricing registry MD file — find the row matching --model, extract input/output/cached per-million prices
2. Calculate: (input_tokens / 1_000_000 * input_price) + (output_tokens / 1_000_000 * output_price) + (cached_tokens / 1_000_000 * cached_price)
3. Print a formatted summary:
Model: claude-sonnet-4-6
Input: 1,234,567 tokens × $3.00/M = $3.70
Output: 45,678 tokens × $15.00/M = $0.69
Cached: 123,456 tokens × $0.30/M = $0.04
─────────────────────────────────────────────
Estimated total: $4.43
⚠ Flag: Exceeds $5 single-session threshold
4. On an unknown model: print `ERROR: Model 'X' not found in registry. Available: [list]` and exit 1
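The step-2 arithmetic can be sketched as a small pure function. This is a sketch only — CLI parsing and registry lookup are omitted, and the `prices` dict below uses assumed illustrative rates, not values read from the real registry:

```python
# Sketch of the step-2 cost math only; argument parsing and the
# registry-file lookup are handled elsewhere in the real script.
def estimate_cost(input_tokens, output_tokens, cached_tokens, prices):
    """prices holds per-million-token rates: {'input', 'output', 'cached'}."""
    return (
        input_tokens / 1_000_000 * prices["input"]
        + output_tokens / 1_000_000 * prices["output"]
        + cached_tokens / 1_000_000 * prices["cached"]
    )

# Assumed rates for illustration (check the registry for real numbers):
prices = {"input": 3.00, "output": 15.00, "cached": 0.30}
print(f"Estimated total: ${estimate_cost(1_234_567, 45_678, 123_456, prices):.2f}")
# → Estimated total: $4.43
```

The numbers above reproduce the formatted-summary example in step 3.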
#### Modified Files
**`cost-estimation/SKILL.md`** — frontmatter + body updates:
---
name: cost-estimation
description: >
Estimate API spend from local OpenClaw telemetry using the MODEL_PRICING_REGISTRY.
Use during heartbeat checks or when Jeff asks about token burn or costs. Also use when
computing post-session spend after heavy sub-agent runs (Melody/Quinn/Atlas). Run
scripts/estimate.py for automated calculation — provide token counts from session_status.
Do NOT use for billing disputes or exact invoicing — this produces estimates only,
not billing-grade numbers.
---
SKILL.md body changes:
- Step 2: Update path reference from workspace root to references/MODEL_PRICING_REGISTRY.md within the skill folder
- Step 3: Add instruction to use python scripts/estimate.py instead of manual math. Show exact CLI invocation with token counts from session_status
- Add Examples section (see below)
- Add Anomaly Thresholds section (keep existing content)
Examples section to add:
## Examples
### Automated calculation
After a session shows 1.2M input, 45K output, 120K cached on claude-sonnet-4-6:
### Manual fallback (if script unavailable)
Read references/MODEL_PRICING_REGISTRY.md, locate the model row, apply formula:
### What NOT to report
Never report "we spent $X" as exact billing. Always qualify: "approximately $X estimated."
### Acceptance Criteria (Quinn)
1. `cost-estimation/references/MODEL_PRICING_REGISTRY.md` exists and contains the same pricing data as the source file
2. `cost-estimation/scripts/estimate.py` exists and is executable (`python scripts/estimate.py --help` does not crash)
3. Running the script with valid args produces formatted output matching the specified format
4. Running with an unknown model produces a clear error and exits non-zero
5. SKILL.md `description` field is under 1024 characters and includes a negative trigger
6. SKILL.md body references `references/MODEL_PRICING_REGISTRY.md` (not the old workspace root path)
7. SKILL.md body shows a concrete example of `python scripts/estimate.py` invocation
---
## M2: service-management (absorbs project-scaffolding)
**Complexity:** ⭐⭐⭐⭐ | **Model:** `[cloud-required]`
### Objective
Merge project-scaffolding into service-management. Expand service-management with shell scripts for common operations, a references directory, a troubleshooting section, and worked examples. Delete the project-scaffolding skill after the merge.
### File Changes
#### New Files
**`service-management/scripts/start-service.sh`**
- Language: Bash
- Purpose: Start a named service in detached production mode on Mac mini
- Inputs (CLI args):
- $1 — service name (display label, e.g. "Mission Control")
- $2 — command to run (e.g. npm start)
- $3 — working directory (absolute path)
- $4 — port number to health-check
- $5 (optional) — health endpoint path, defaults to /api/health
- Behavior:
1. cd to working dir, fail if not found
2. Launch: nohup $COMMAND > service.log 2>&1 < /dev/null &
3. Write PID: echo $! > service.pid
4. Poll curl -sf http://localhost:$PORT$HEALTH_PATH every 2s for up to 30s
5. On 200 OK: print ✓ $SERVICE_NAME is up (PID $PID) and exit 0
6. On timeout: print ✗ $SERVICE_NAME failed to start. Last log: + tail service.log, exit 1
- Error handling: Check that port isn't already bound before launching (use lsof -ti :$PORT on macOS; if occupied, print warning and exit 1)
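The step-4 poll could be factored as a helper like this (a sketch; `wait_for_health` is a hypothetical name, and the defaults are chosen to match the spec's 2s-interval/30s-timeout — launch and PID handling are omitted):

```shell
# Sketch: poll the health endpoint every 2s, up to tries × 2s total.
# Assumes curl is on PATH.
wait_for_health() {
  local port="$1" path="${2:-/api/health}" tries="${3:-15}"
  local i
  for i in $(seq 1 "$tries"); do
    if curl -sf --max-time 5 "http://localhost:${port}${path}" > /dev/null 2>&1; then
      return 0   # got a 2xx response: service is up
    fi
    sleep 2
  done
  return 1       # timed out: caller prints failure + tails service.log
}
```

The real script would call this after the `nohup` launch and branch on its exit status for steps 5 and 6.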
**`service-management/scripts/stop-service.sh`**
- Language: Bash
- Purpose: Stop a running service using its PID file
- Inputs:
- $1 — working directory containing service.pid
- $2 (optional) — service name for display
- Behavior:
1. Read PID from service.pid
2. Send SIGTERM, wait up to 10s
3. If still alive, send SIGKILL
4. Remove service.pid
5. Print ✓ $SERVICE_NAME stopped or ✗ No PID file found at $DIR/service.pid
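The steps above can be sketched as a function (a sketch: the 1-second poll loop implements the 10s graceful wait, and the message strings follow the spec but are illustrative):

```shell
# Sketch of stop-service.sh: read PID file, SIGTERM, escalate to SIGKILL.
stop_service() {
  local dir="$1" name="${2:-service}"
  local pidfile="$dir/service.pid" pid
  if [ ! -f "$pidfile" ]; then
    echo "✗ No PID file found at $pidfile"
    return 1
  fi
  pid=$(cat "$pidfile")
  kill -TERM "$pid" 2>/dev/null || true   # graceful shutdown request
  for _ in 1 2 3 4 5 6 7 8 9 10; do       # wait up to ~10s
    kill -0 "$pid" 2>/dev/null || break
    sleep 1
  done
  if kill -0 "$pid" 2>/dev/null; then      # still alive: force kill
    kill -KILL "$pid"
  fi
  rm -f "$pidfile"
  echo "✓ $name stopped"
}
```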
**`service-management/scripts/check-health.sh`**
- Language: Bash
- Purpose: Verify a running service is healthy
- Inputs:
- $1 — port number
- $2 (optional) — health endpoint path, defaults to /api/health
- $3 (optional) — service name for display
- Behavior:
1. curl -s -o /dev/null -w "%{http_code}" http://localhost:$PORT$HEALTH_PATH (name the variable HEALTH_PATH, not PATH — overwriting PATH breaks the shell's command lookup)
2. If 200: print ✓ $SERVICE_NAME healthy (HTTP 200) and exit 0
3. If other code: print ✗ $SERVICE_NAME unhealthy (HTTP $CODE) and exit 1
4. If curl fails: print ✗ $SERVICE_NAME unreachable and exit 1
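A minimal sketch of the three outcomes (assumes `curl`; dropping `-f` lets the script distinguish "unhealthy" from "unreachable", since `-f` collapses non-2xx responses into a curl failure — and the variable is deliberately not called `PATH`):

```shell
# Sketch of check-health.sh's decision logic.
check_health() {
  local port="$1" health_path="${2:-/api/health}" name="${3:-service}"
  local code
  # curl prints "000" (or nothing) when the connection itself fails.
  code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 5 \
    "http://localhost:${port}${health_path}" || true)
  if [ -z "$code" ] || [ "$code" = "000" ]; then
    echo "✗ $name unreachable"; return 1
  elif [ "$code" = "200" ]; then
    echo "✓ $name healthy (HTTP 200)"; return 0
  else
    echo "✗ $name unhealthy (HTTP $code)"; return 1
  fi
}
```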
**`service-management/scripts/scaffold.sh`**
- Language: Bash
- Purpose: Scaffold a new VV Next.js project to standard
- Inputs:
- $1 — project name (kebab-case, becomes directory name and npm name)
- $2 — target port number
- $3 (optional) — parent directory, defaults to ~/projects
- Behavior:
1. Validate that $1 is kebab-case (reject if contains spaces or uppercase)
2. cd $PARENT_DIR && npx create-next-app@latest $PROJECT_NAME --typescript --tailwind --app --src-dir=false --import-alias "@/*" --yes
3. Create app/api/health/route.ts with { ok: true, ts: new Date().toISOString() } response
4. Create scripts/start.sh, scripts/stop.sh, scripts/status.sh using the service-management templates (see references/script-templates.md)
5. Create globals.css with VV dark theme variables (see references/globals-template.css)
6. Append dev port override to package.json scripts: "dev": "next dev -p $PORT"
7. Run npm run build — fail loudly if errors
8. Print summary: project path, port, next steps
- Error handling: If create-next-app fails, print error and exit 1. Do NOT proceed with subsequent steps.
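The step-1 name check could be a small shell predicate (a sketch; the exact character rules beyond "no spaces or uppercase" — e.g. rejecting leading/trailing or doubled hyphens — are an assumption):

```shell
# Sketch: accept only kebab-case names (lowercase letters/digits,
# single hyphens, no leading/trailing hyphen, non-empty).
is_kebab_case() {
  case "$1" in
    *[!a-z0-9-]*|-*|*-|*--*|"") return 1 ;;
    *) return 0 ;;
  esac
}
```

Usage in the script: `is_kebab_case "$1" || { echo "Project name must be kebab-case"; exit 1; }`.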
**`service-management/references/standard-stack.md`**
- Purpose: Reference document defining the VV standard tech stack for new projects
- Content:
- Next.js 15 (App Router), TypeScript, Tailwind CSS v4
- Font choices: Inter (UI), JetBrains Mono (data/code)
- Dark theme baseline: #0a0a0a bg, #141414 cards
- Health endpoint contract: GET /api/health → { ok: true, ts: ISO-8601 }
- PID file convention: service.pid in project root
- Log file convention: service.log in project root
- Port assignment registry (list known ports: 3100=MC, 3200=next project, etc.)
- Jest for unit tests, minimum threshold: 100% pass on build
- Git: initialize on scaffold, initial commit required
**`service-management/references/globals-template.css`**
- Purpose: Copy-paste starting CSS with VV dark theme variables
- Content: CSS custom properties for colors, spacing tokens, font stacks. Dark theme. Matches VV brand. Tailwind v4 compatible (use @theme directive).
**`service-management/references/script-templates.md`**
- Purpose: Templates for the per-project start/stop/status scripts
- Content: Three shell script templates (start.sh, stop.sh, status.sh) with placeholders {{PORT}}, {{PROJECT_NAME}}, {{HEALTH_PATH}}. Note: these are per-project convenience wrappers that call the skill-level scripts.
**`service-management/references/troubleshooting.md`**
- Purpose: Reference for common service management failure modes
- Content:
- Port already bound: lsof -ti :$PORT | xargs kill -9 (macOS), then re-run start
- Stale PID file: Service crashed without cleanup. Remove .pid, verify process is dead, restart.
- launchd issues: MC not a launchd service — do not try to create plist unless explicitly requested
- next build fails: TypeScript errors must be resolved before proceeding. Never bypass with --no-lint in production.
- Health check returns 404: /api/health route not created. Scaffold may be incomplete — check app/api/health/route.ts exists.
- nohup log empty after 5s: Service likely crashed on startup. cat service.log to see error.
- Port conflicts between MC and new project: Keep port registry in references/standard-stack.md updated.
#### Modified Files
**`service-management/SKILL.md`** — full rewrite to absorb project-scaffolding:
Frontmatter:
---
name: service-management
description: >
Launch, maintain, verify, and scaffold long-running local web services on the Mac mini.
Use when starting, stopping, or health-checking Mission Control or any VV web app. Also use
when Melody or Jules needs to scaffold a new Next.js/React project to VV standards.
Scripts: start-service.sh, stop-service.sh, check-health.sh, scaffold.sh.
Do NOT use for cloud-deployed services, CI/CD pipelines, or Docker containers —
this is for local Mac mini operations only.
---
Body section outline:
1. Core Principle — production mode only; startup log ≠ healthy
2. Launch Rules — existing 4 rules (preserved)
3. Scripts Reference — one-line description of each script, example invocations
4. New Project Setup — merged from project-scaffolding: standard stack, scaffold.sh usage, what scaffold.sh does, manual steps if script fails
5. Health Verification Protocol — after any launch: run check-health.sh, verify 200 OK
6. Troubleshooting — link to references/troubleshooting.md + inline quick-reference for top 3 issues
7. Examples — two walkthroughs (see below)
8. Local macOS Constraints — existing lsof note (preserved)
Examples section:
## Examples
### Starting Mission Control from scratch
```bash
./scripts/start-service.sh "Mission Control" "npm start" ~/projects/mission-control 3100
# Expected: ✓ Mission Control is up (PID 12345)
```
### Scaffolding a new VV app
```bash
./scripts/scaffold.sh vv-analytics 3200
# Expected: new project at ~/projects/vv-analytics, builds clean, port 3200
```
### Manual stop when PID file is stale
```bash
ps aux | grep "npm start" | grep -v grep
# Find PID, then:
kill -TERM $PID
rm ~/projects/mission-control/service.pid
```
#### Files to Delete
- `project-scaffolding/SKILL.md` — delete after merge verified
### Acceptance Criteria (Quinn)
1. All four scripts exist in `service-management/scripts/`: `start-service.sh`, `stop-service.sh`, `check-health.sh`, `scaffold.sh`
2. All scripts are executable (`chmod +x` applied)
3. `start-service.sh` with a running port returns exit 1 before attempting launch
4. `stop-service.sh` gracefully handles missing PID file (exit 1 with clear message, no crash)
5. `check-health.sh` returns exit 0 for a live service at the correct port
6. `scaffold.sh` creates a project directory with `app/api/health/route.ts` present
7. `scaffold.sh` runs `npm run build` and fails loudly (exit 1) if build errors exist
8. `service-management/references/standard-stack.md` exists and lists at least port 3100=MC
9. `service-management/references/globals-template.css` exists and contains at least 5 CSS custom properties
10. `service-management/references/troubleshooting.md` exists and covers at least 5 failure modes
11. `service-management/SKILL.md` description is under 1024 characters with negative trigger
12. SKILL.md body contains a "New Project Setup" section
13. SKILL.md body contains an "Examples" section with at least 2 walkthroughs
14. `project-scaffolding/SKILL.md` has been deleted
---
## M3: qa-validation
**Complexity:** ⭐⭐⭐ | **Model:** `[local-ok]`
### Objective
Add a validation script to automate the QA checklist, add a Peekaboo reference guide, and add a troubleshooting section for common build failure patterns.
### File Changes
#### New Files
**`qa-validation/scripts/validate.sh`**
- Language: Bash
- Purpose: Automate the QA checklist — run build, tests, health check in sequence
- Inputs (CLI args):
- `$1` — project root directory (absolute path)
- `$2` (optional) — port number for health check, defaults to 3100
- `$3` (optional) — health endpoint, defaults to `/api/health`
- Behavior (ordered steps, halt on first failure):
1. `cd $PROJECT_ROOT` — fail if directory not found
2. Print `[1/4] Static analysis: npm run build`
- Run `npm run build` — capture exit code
- If non-zero: print `✗ Build failed. Fix TypeScript/lint errors before proceeding.` + last 20 lines of output, exit 1
- If zero: print `✓ Build passed`
3. Print `[2/4] Unit tests: npm test`
- Run `npm test -- --watchAll=false` (non-interactive)
- If non-zero: print `✗ Tests failed.` + output, exit 1
- If zero: print `✓ Tests passed`
4. Print `[3/4] Service check: verifying process is running on port $PORT`
- `lsof -ti :$PORT > /dev/null 2>&1 || { echo "No process on port $PORT — start the service first"; exit 1; }` (use braces, not parentheses — an `exit 1` inside `( … )` only exits the subshell, not the script)
5. Print `[4/4] Health endpoint: curl $HEALTH_ENDPOINT`
- `curl -sf http://localhost:$PORT$HEALTH_PATH`
- If fails: print `✗ Health check failed. Service may be starting — wait and retry.`, exit 1
- If passes: print `✓ Health check passed`
6. Print `✓ All QA checks passed. Safe to report complete.` and exit 0
- Notes: Script does NOT run Peekaboo (visual QA requires human/agent judgment). It prints a reminder to run Peekaboo visual checks after all automated checks pass.
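The halt-on-first-failure flow above can be skeleton-sketched like this (steps 2–3 are elided as comments; the `/tmp/qa-build.log` path and exact messages are illustrative assumptions, not the final script):

```shell
# Sketch of validate.sh's ordered flow; each step returns early on failure.
validate() {
  local root="$1" port="${2:-3100}" health_path="${3:-/api/health}"
  cd "$root" || { echo "✗ Project root not found: $root"; return 1; }

  echo "[1/4] Static analysis: npm run build"
  if ! npm run build > /tmp/qa-build.log 2>&1; then
    echo "✗ Build failed. Fix TypeScript/lint errors before proceeding."
    tail -n 20 /tmp/qa-build.log
    return 1
  fi
  echo "✓ Build passed"

  # [2/4] npm test -- --watchAll=false   (same pass/fail pattern as above)
  # [3/4] lsof -ti :$port                (fail if no process is bound)

  echo "[4/4] Health endpoint"
  curl -sf "http://localhost:${port}${health_path}" > /dev/null || {
    echo "✗ Health check failed. Service may be starting — wait and retry."
    return 1
  }
  echo "✓ Health check passed"
  echo "✓ All QA checks passed. Safe to report complete."
  echo "Reminder: run Peekaboo visual checks before closing the task."
}
```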
**`qa-validation/references/peekaboo-guide.md`**
- Purpose: Step-by-step guide for visual QA using Peekaboo CLI
- Content sections:
- **What is Peekaboo** — 2 sentences: macOS UI capture CLI, use for visual verification when automated checks pass
- **Capture a screenshot:** `peekaboo capture --app "Brave Browser" --output /tmp/qa-screen.png` + example output
- **Navigate and capture tabs:** how to use Peekaboo to click through tabs and capture each
- **Button interaction:** how to use Peekaboo to click buttons and verify response
- **MC-specific checklist:** ordered list of every MC tab/section to verify (derived from existing qa-validation SKILL.md Mission Control section)
- **When visual QA is required:** any MC change, any new dashboard component, any data source change
- **When visual QA can be skipped:** backend-only changes with no rendering impact (still requires health check)
- **Reporting:** how to report visual QA pass/fail — include screenshot path in QA report
#### Modified Files
**`qa-validation/SKILL.md`** — add Troubleshooting section and script reference:
Frontmatter (no change needed — already has negative trigger):
```yaml
---
name: qa-validation
description: Standard operating procedure for Quinn and Jules to validate code builds, system changes, and UI implementations before marking a task complete. Do NOT use for research validation or business analysis — this is for code and build QA only.
---
```
Add to SKILL.md body:
**Scripts Reference section (new, insert after Step 2):**
## Automated QA Script
Run `scripts/validate.sh` to execute steps 1–4 programmatically:
```bash
./scripts/validate.sh /path/to/project 3100
```
**Troubleshooting section (new, append at end):**
```markdown
## Troubleshooting
### Build fails with TypeScript errors
- Do NOT bypass with `// @ts-ignore` or `--noEmit` hacks
- Fix the actual type error. If it's in a 3rd-party type definition, add a proper type override.
- Common cause: Melody used a deprecated API. Check the actual error message.
### Tests fail but code looks right
- Check if tests are against stale build artifacts: `rm -rf .next && npm run build`
- Check if a test is importing a module that requires environment variables (mock them in jest.config)
- Never skip failing tests to ship. If a test is wrong, fix the test or document why it's wrong.
### Health check returns 404
- `/api/health` route is missing or renamed. Verify `app/api/health/route.ts` exists and exports GET.
### Health check connection refused
- Service is not running. Run `start-service.sh` first or check for crashed PID.
### Peekaboo can't find Brave Browser
- Verify Brave is open. Peekaboo requires the target app to be running and visible.
- Try `peekaboo list-windows` to see what's available.
### Visual QA shows blank page
- Next.js production mode doesn't hot-reload. After any change: rebuild + restart + verify.
- Check `service.log` for runtime errors that don't appear in build output.
```
### Acceptance Criteria (Quinn)
1. `qa-validation/scripts/validate.sh` exists and is executable
2. Running validate.sh against a project with a passing build, passing tests, and a live health endpoint exits 0
3. Running validate.sh against a project with a failing build exits 1 with non-empty error output
4. Running validate.sh against a project where no service is running on the specified port exits 1 with a clear message
5. `qa-validation/references/peekaboo-guide.md` exists with at least 5 distinct sections
6. The peekaboo guide contains an MC-specific checklist with at least 5 items
7. SKILL.md body contains a "Troubleshooting" section with at least 5 failure scenarios
8. SKILL.md body references `scripts/validate.sh` with a concrete invocation example
9. SKILL.md description is unchanged (already correct)
---
## M4: vv-sigint
**Complexity:** ⭐⭐ | **Model:** `[local-ok]`
### Objective
Add examples of good vs rejected signals, add a troubleshooting section for source failures, add a script to check source URL availability, and fix stale cron schedule references.
### File Changes
#### New Files
**`vv-sigint/scripts/check-sources.sh`**
- Language: Bash
- Purpose: Ping all sources in the watch list and report which are reachable
- Inputs: None (reads references/sources.md internally)
- Behavior:
1. Parse references/sources.md — extract all URLs (lines matching http:// or https://)
2. For each URL, curl -sf -o /dev/null -w "%{http_code}" --max-time 10 $URL
3. Collect results: OK (200-299), REDIRECT (300-399), BLOCKED (403/429), DEAD (other/timeout)
4. Print grouped summary:
✓ OK (12): [list of domains]
⚠ REDIRECT (2): [list — may need URL update]
✗ BLOCKED (1): [list — check if RSS still available]
✗ DEAD (1): [list — remove from sources.md]
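The step-3 bucketing is easy to isolate as a pure helper (a sketch; `classify_status` is a hypothetical name, and the grep pattern for step-1 URL extraction is a simplification):

```shell
# Sketch: map an HTTP status code to the report bucket from step 3.
classify_status() {
  case "$1" in
    2??)     echo OK ;;
    3??)     echo REDIRECT ;;
    403|429) echo BLOCKED ;;
    *)       echo DEAD ;;       # includes curl's "000" on timeout/refusal
  esac
}

# Step-1 extraction could be roughly:
#   grep -Eo 'https?://[^ )>"]+' references/sources.md
```

The main loop would then feed each `curl … -w "%{http_code}"` result through `classify_status` and group the output.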
#### Modified Files
**`vv-sigint/SKILL.md`** — add Examples and Troubleshooting sections, fix cron reference:
Frontmatter: no change (already good).
Changes to body:
1. Fix the stale cron schedule section: remove any hardcoded scan times. Replace with: "Scan schedule is configured in OpenClaw cron, not here. Check the active cron config for the current schedule. This skill covers the scan procedure, not the schedule."
2. Add Examples section (new, after Relevance Scoring):
## Examples
### Well-scored signal (8/10 — include)
> [2026-03-15] article — OpenAI launches "Coach GPT" for professional athletes, targeting NFL/NBA teams — source: techcrunch.com/2026/03/15/openai-coach-gpt
Why it scores high: Direct competitive signal in our target market (professional sports analytics). Affects positioning of any VV sports product. Actionable intelligence.
### Borderline signal (5/10 — include with lower priority)
> [2026-03-15] social — Reddit thread: "What fitness apps actually track VO2 max correctly?" r/fitness — source: reddit.com/r/fitness/...
Why borderline: Consumer space, not B2B. But reveals user frustration with existing tools — potential ICP signal.
### Rejected signal (2/10 — discard)
> General article about tech layoffs at Meta, no fitness/sports/AI angle
Why rejected: No connection to VV goals filter. Not a competitor. Not a market signal. Discard.
### Displacement example (column at 30-item cap)
New 7/10 signal arrives. Scan existing column. Find lowest-scored item (3/10). Move it to `## Dismissed` with note: `[displaced by higher-priority signal on YYYY-MM-DD]`. Insert new signal.
3. Add Troubleshooting section (new, append at end):
## Troubleshooting
### Source returns 403 Forbidden
- The site is blocking automated crawlers.
- Try `web_fetch` with a different user-agent, or check if they have a public RSS feed.
- If consistently blocked, mark the source as "manual-only" in sources.md and check it via browser on digest day.
- Run `scripts/check-sources.sh` to batch-check all sources and identify which are blocking.
### RSS feed returns empty or malformed XML
- The feed URL may have changed. Check the source's website for a current feed link.
- Blogwatcher may cache stale feeds — force a refresh: `blogwatcher refresh [source-name]`.
### Signal count drops to zero after a scan
- Check if sources.md was recently modified (sources removed without replacement).
- Check if the local model (used for triage) is returning "below threshold" for everything — test with a known-relevant article manually.
### Duplicate signals appearing in SIGINT.md
- The retention check should catch these. If duplicates persist, search SIGINT.md for the source URL before inserting.
- Add a grep step before writing: `grep -F "$URL" SIGINT.md` — skip if a match is found.
### MC events API call failing
- The events endpoint may be down or require auth. Verify MC is running: `curl -sf http://localhost:3100/api/health`.
- If MC is down, log the scan to the daily memory file instead and note that the MC event was not emitted.
### Acceptance Criteria (Quinn)
1. `vv-sigint/scripts/check-sources.sh` exists and is executable
2. Running check-sources.sh produces grouped output (OK/REDIRECT/BLOCKED/DEAD) without crashing
3. SKILL.md body contains an "Examples" section with at least 3 scored signal examples (good, borderline, rejected)
4. SKILL.md body contains a "Troubleshooting" section with at least 4 failure scenarios
5. SKILL.md body contains no hardcoded cron schedule times (replaced with a cron config reference)
6. Frontmatter description is unchanged
---
## M5: vv-dashboard-design
**Complexity:** ⭐⭐ | **Model:** `[local-ok]`
### Objective
Add worked implementation examples to `references/`, add a script to detect hardcoded hex values in component files, and update SKILL.md to link to these.
### File Changes
#### New Files
**`vv-dashboard-design/scripts/check-tokens.sh`**
- Language: Bash
- Purpose: Grep component files for hardcoded hex color values that should be CSS tokens
- Inputs:
- $1 — project root directory to scan
- Behavior:
1. grep -rn "#[0-9a-fA-F]\{3,6\}" $PROJECT_ROOT/app $PROJECT_ROOT/components (if those dirs exist)
2. Exclude: comments (//, /*), CSS variable definitions (lines containing --), and the tokens file itself
3. Print matches grouped by file:
✗ Hardcoded hex values found (should use CSS tokens):
components/StatCard.tsx:12: className="text-[#40a060]"
app/dashboard/page.tsx:45: color: "#c04040"
Fix: Replace with var(--color-accent) or the equivalent Tailwind token class.
Run: grep -r "color-" references/tokens.md to find the right token name.
4. If no violations: print ✓ No hardcoded hex values found. Token discipline maintained.
5. Exit 1 if violations found, exit 0 if clean
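The grep-plus-exclusion core might look like this (a sketch; `find_hardcoded_hex` is a hypothetical helper name, and the filters are simplified versions of step 2 — the real script should also skip the tokens file itself):

```shell
# Sketch: list file:line:content matches for hardcoded hex colors,
# excluding CSS variable definitions and obvious comment lines.
find_hardcoded_hex() {
  local root="$1"
  grep -rnE '#[0-9a-fA-F]{3,6}' "$root" 2>/dev/null \
    | grep -v -e '--' \
    | grep -v -E ':[0-9]+:[[:space:]]*(//|/\*)'
}
```

The wrapper would group this output by file and invert the status for the spec's exit codes (violations found → exit 1, clean → exit 0).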
**`vv-dashboard-design/references/examples/stat-card.md`**
- Purpose: Correct and incorrect implementation of the Stat Card component
- Content:
- ✅ Correct: Full React component using CSS tokens, tabular-nums, delta badge
- ❌ Incorrect: Same component with hardcoded colors, missing tabular-nums, no status icon
- Explanation of each diff and why it matters
**`vv-dashboard-design/references/examples/alert-feed.md`**
- Purpose: Alert/activity feed implementation reference
- Content:
- ✅ Correct: Timestamped entries, severity-coded with icon+color (not color alone), auto-scroll, max-height with scroll
- ❌ Incorrect: Color-only severity (accessibility fail), no timestamp, no scroll limit
- VV severity color mappings: critical=#c04040, warning=#c0a040, ok=#40a060, idle=#555
**`vv-dashboard-design/references/examples/empty-state.md`**
- Purpose: Empty state pattern for when no data is available
- Content:
- ✅ Correct: Helpful message, icon, suggested action
- ❌ Incorrect: Blank panel, loading spinner that never resolves
- Template JSX for standard VV empty state
#### Modified Files
**`vv-dashboard-design/SKILL.md`** — add script reference and examples link:
In the "Component Patterns" section, add after the existing list:
## Examples
See `references/examples/` for correct vs incorrect implementations of:
- `stat-card.md` — KPI stat card with trend delta
- `alert-feed.md` — severity-coded activity feed
- `empty-state.md` — empty state pattern
## Token Compliance Check
Before marking any UI task complete, run:
```bash
./scripts/check-tokens.sh /path/to/mc-project
```
### Acceptance Criteria (Quinn)
1. `vv-dashboard-design/scripts/check-tokens.sh` exists and is executable
2. Running check-tokens.sh against a directory containing `color: "#40a060"` exits 1 and prints the file/line
3. Running check-tokens.sh against a clean directory exits 0
4. `references/examples/stat-card.md` exists with at least one ✅ and one ❌ example
5. `references/examples/alert-feed.md` exists with severity color mappings documented
6. `references/examples/empty-state.md` exists with template JSX
7. SKILL.md references `scripts/check-tokens.sh` with a concrete invocation example
8. SKILL.md references `references/examples/` directory
---
## M6: project-pipeline
**Complexity:** ⭐ | **Model:** `[local-ok]`
### Objective
Move the evaluation template from workspace root into the skill's `references/` directory. Add a completed example evaluation. Update SKILL.md to use relative path.
### File Changes
#### New Files
**`project-pipeline/references/evaluation-template.md`**
- Purpose: The reusable evaluation template, moved from workspace root
- Content: Copy from `~/.openclaw/workspace/PROJECT_EVALUATION_TEMPLATE.md` verbatim
- After verifying SKILL.md is updated, the workspace root copy can be archived (move to `project-pipeline/references/`, do not delete unless Jeff confirms)
**`project-pipeline/references/examples/apa-evaluation.md`**
- Purpose: Completed example evaluation showing how to use the template
- Content: A realistic (non-fictional) completed evaluation of the APA (Athletic Performance Analytics) initiative using the template structure. Show all fields filled in with real-style content. Demonstrate: conclusion-first, moat vs moat-hypothesis distinction, pursue/defer/reject recommendation.
#### Modified Files
**`project-pipeline/SKILL.md`** — update template path:
- Change: `use the evaluation template at /Users/viverevitalis/.openclaw/workspace/PROJECT_EVALUATION_TEMPLATE.md`
- To: `use the evaluation template at references/evaluation-template.md`
- Add after the template reference: `See references/examples/apa-evaluation.md for a completed example.`
### Acceptance Criteria (Quinn)
1. `project-pipeline/references/evaluation-template.md` exists and is non-empty
2. `project-pipeline/references/examples/apa-evaluation.md` exists with all template fields completed
3. SKILL.md references `references/evaluation-template.md` (not the old absolute workspace root path)
4. SKILL.md references `references/examples/apa-evaluation.md`
5. The example evaluation demonstrates the pursue/defer/reject recommendation format
---
## M7: frontend-design
**Complexity:** ⭐⭐ | **Model:** `[cloud-required]`
### Objective
Add a `references/` directory with curated font pairings and color palette examples. Update SKILL.md to link to these as starting-point resources.
### File Changes
#### New Files
**`frontend-design/references/font-pairings.md`**
- Purpose: Curated font pairing guide for distinctive VV frontend work
- Content sections:
- **What makes a pairing work** — 3 sentences: contrast (weight/style), shared geometry, role clarity (display vs body)
- **Pairings by aesthetic direction** — for each of the aesthetics in SKILL.md (minimal, maximalist, brutalist, editorial, luxury, retro-futuristic, organic), provide:
- Display font name + Google Fonts URL
- Body font name + Google Fonts URL
- One-line rationale
- CSS import snippet
- **Anti-pairings** — fonts to avoid and why (Inter alone, Roboto alone, "Space Grotesk + Inter" as the cliché AI pairing)
- **Variable font usage** — when and how to use variable fonts for animation
- Minimum 8 distinct pairings total
**`frontend-design/references/color-examples.md`**
- Purpose: Color palette examples organized by aesthetic direction
- Content sections:
- **How to read these palettes** — dominant, accent, neutral, and semantic roles
- **Palettes by direction** — for each aesthetic in SKILL.md, provide:
- A 5-color palette with hex values and role labels
- CSS custom property declarations
- "What this communicates" — one sentence
- **Color anti-patterns** — purple gradients on white, neon on neon, flat grays everywhere
- **Dark theme starting point** — the VV dark palette (from service-management globals) as a reference baseline
- **Accessibility notes** — WCAG AA contrast minimum, how to check with browser devtools
- Minimum 6 distinct palettes total
#### Modified Files
**`frontend-design/SKILL.md`** — add References section:
After the "Output Requirements" section, add:
```markdown
## Starting Resources
Before committing to an aesthetic, check:
- `references/font-pairings.md` — curated pairings organized by direction
- `references/color-examples.md` — palette examples by aesthetic
These are starting points, not constraints. Deviate intentionally, not by default.
```
### Acceptance Criteria (Quinn)
1. `frontend-design/references/font-pairings.md` exists with at least 8 distinct pairings
2. Each pairing includes display font, body font, rationale, and CSS import snippet
3. An anti-pairings section exists naming at least 3 pairings to avoid
4. `frontend-design/references/color-examples.md` exists with at least 6 distinct palettes
5. Each palette includes hex values, role labels, and CSS variable declarations
6. SKILL.md references both files in a "Starting Resources" or equivalent section
---
## M8: memory-manager
**Complexity:** ⭐⭐ | **Model:** `[local-ok]`
### Objective
Add concrete examples of good vs bad memory entries, and add an optional script to check for line-count bloat and surface obvious contradictions.
### File Changes
#### New Files
**`memory-manager/scripts/consolidate-check.py`**
- Language: Python 3
- Purpose: Audit memory files for line count bloat and surface potential contradictions
- Inputs (CLI args):
- --memory-dir PATH — path to memory/ directory (defaults to ~/.openclaw/workspace/memory/)
- --semantic-file PATH — path to MEMORY.md (defaults to ~/.openclaw/workspace/MEMORY.md)
- --max-daily-lines INT — warn if a daily file exceeds this (default: 200)
- --max-semantic-lines INT — warn if MEMORY.md exceeds this (default: 500)
- Behavior:
1. Scan memory/ directory. For each .md file:
- Count lines. If > max-daily-lines, flag as "bloated"
- Print: memory/2026-03-15.md: 312 lines ⚠ (exceeds 200 — consider consolidating)
2. Check MEMORY.md line count. If > max-semantic-lines, flag.
3. Naive contradiction detection: Find lines where the same subject appears with conflicting modifiers (e.g., "Jeff prefers X" and "Jeff prefers Y" where X≠Y). This is a simple keyword scan, not semantic comparison — flag for human review, not auto-resolve.
4. Print summary:
Memory Audit — 2026-03-20
Daily files: 14 total, 2 flagged for consolidation
MEMORY.md: 342 lines (OK)
Potential contradictions to review:
Line 45 vs Line 112: "Jeff prefers concise replies" / "Jeff prefers thorough explanations"
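The behavior described above can be sketched as follows. This is a minimal illustration, not the deliverable: CLI argument handling is omitted, and the `X prefers Y` regex is one assumed heuristic for the naive keyword scan; Melody may choose a different pattern.

```python
#!/usr/bin/env python3
"""Sketch of consolidate-check: line-count audit plus a naive
contradiction scan. Illustrative only; no CLI parsing."""
import re
from pathlib import Path

MAX_DAILY_LINES = 200

def audit(memory_dir: str) -> list[str]:
    """Flag daily .md files exceeding the line budget."""
    flags = []
    for md in sorted(Path(memory_dir).glob("*.md")):
        n = len(md.read_text(encoding="utf-8").splitlines())
        if n > MAX_DAILY_LINES:
            flags.append(f"{md.name}: {n} lines (exceeds {MAX_DAILY_LINES})")
    return flags

def contradictions(lines: list[str]) -> list[tuple[str, str]]:
    """Group 'SUBJECT prefers OBJECT' statements by subject; differing
    objects for the same subject are flagged for human review only."""
    seen: dict[str, tuple[str, str]] = {}
    out = []
    for line in lines:
        m = re.search(r"(\w+) prefers ([^.]+)", line)
        if not m:
            continue
        subject, obj = m.group(1), m.group(2).strip()
        if subject in seen and seen[subject][0] != obj:
            out.append((seen[subject][1], line.strip()))
        else:
            seen.setdefault(subject, (obj, line.strip()))
    return out
```

The scan only pairs lines sharing the same literal subject word, which is exactly the "flag, never auto-resolve" posture the spec requires.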
Modified Files¶
memory-manager/SKILL.md — add Examples section and script reference:
Add Examples section after "What to Capture":
## Examples
### Good memory entry — specific, durable, actionable
> **[2026-03-15] Jeff confirmed: APA is the top-priority initiative for Q2. Revenue target: $50K ARR by end of year. ICP is mid-market sports teams (50-200 athletes).**
Why it's good: Specific, time-stamped decision with clear business context. Actionable — affects how we prioritize work.
### Bad memory entry — vague, transient, not worth keeping
> **Jeff said he was tired today and might want shorter updates.**
Why it's bad: Transient state, not durable preference. Will be stale within a day. Not worth storing in long-term memory.
### Good MEMORY.md entry — distilled, not redundant
> **Jeff's communication preference (confirmed multiple times): conclusions first, then rationale. Never lead with caveats.**
Why it's good: Repeated pattern confirmed across sessions — worth keeping long-term. Not a one-time observation.
### Bad MEMORY.md entry — redundant with daily notes
> **On 2026-03-15, we discussed APA. Jeff said it was important. We then talked about the skills overhaul.**
Why it's bad: This belongs in the daily note, not MEMORY.md. MEMORY.md is for distilled insights, not session logs.
### Consolidation trigger
Daily file at 300+ lines → run `scripts/consolidate-check.py` to identify which sections to distill into MEMORY.md.
Add script reference after the Smart Startup Routine section:
## Memory Health Check
Run `scripts/consolidate-check.py` when daily memory files feel bloated or when MEMORY.md grows unwieldy.
This script identifies files to consolidate and surfaces potential contradictions for human review.
It is advisory only — never auto-resolves contradictions.
Acceptance Criteria (Quinn)¶
- `memory-manager/scripts/consolidate-check.py` exists and runs without crashing on an empty directory
- Running the script with a daily file exceeding 200 lines prints a warning for that file
- Running the script with MEMORY.md under 500 lines prints no warnings for the semantic file
- SKILL.md contains an "Examples" section with at least 4 entries (2 good, 2 bad)
- Each example is labeled with "Why it's good" or "Why it's bad" explanation
- SKILL.md references `scripts/consolidate-check.py`
M9: openclaw-prime¶
Complexity: ⭐ | Model: [local-ok]
Objective¶
Add a troubleshooting section for common gateway and admin issues. No scripts or reference files needed.
File Changes¶
Modified Files¶
openclaw-prime/SKILL.md — add Troubleshooting section:
Frontmatter: no change needed (already has negative trigger).
Append to body:
## Troubleshooting
### Gateway won't start
- Check `openclaw gateway status` first — may already be running.
- If crashed: `openclaw gateway stop && openclaw gateway start`
- Check logs: `openclaw gateway logs` (or equivalent) for the specific error.
- Common cause: port conflict. Verify the bind port is not in use.
### Node won't pair / QR code fails
- Use the node-connect skill for systematic pairing diagnosis.
- Quick check: Is the gateway reachable on its public URL? Try `curl -sf $PUBLIC_URL/health`.
- Bootstrap token may be expired — regenerate from gateway config.
### Channel not receiving messages
- Verify the channel is configured correctly: `openclaw gateway status` shows active channels.
- For Telegram: check bot token is valid. Send `/start` to the bot directly.
- For webhook channels: verify the webhook URL matches the gateway's public URL.
### Model routing sending to wrong model
- Check `openclaw gateway config` for the routing table.
- Session-level overrides take precedence over defaults — verify no override is active.
- If fallback is triggering unexpectedly: the primary model may be rate-limited or timing out.
### Config changes not taking effect
- OpenClaw requires a gateway restart after config file changes.
- `openclaw gateway restart` — verify with `openclaw gateway status`.
- If using org-level config: individual settings may be overridden. Check admin console.
### "Unauthorized" errors from agents
- API keys may have rotated. Check the key in gateway config against the current key in the provider console.
- Session authentication: verify the session token hasn't expired.
> **When in doubt:** Read live docs first (this skill's prime workflow), not memory. OpenClaw configs change; memory doesn't always keep up.
Acceptance Criteria (Quinn)¶
- SKILL.md body contains a "Troubleshooting" section
- Troubleshooting section covers at least 6 distinct failure scenarios
- Each scenario has: symptom identification + at least one concrete resolution step
- The node-connect skill is referenced for pairing issues (do not duplicate that skill's content)
- Frontmatter description is unchanged
COMPLEX TIER¶
C1: doc-coauthoring¶
Complexity: ⭐⭐⭐⭐ | Model: [cloud-required]
Objective¶
New skill. Systematic 3-stage document coauthoring workflow: Context Gathering → Refinement Loop → Reader Testing. Adapted from Anthropic's example skill. Primary users: Forge (specs), Jules (proposals), Atlas (research docs).
File Structure¶
doc-coauthoring/
├── SKILL.md
├── references/
│ ├── document-types.md
│ ├── quality-checklist.md
│ └── examples/
│ ├── spec-example.md
│ └── proposal-example.md
└── scripts/
└── word-count.sh
Frontmatter¶
---
name: doc-coauthoring
description: >
Collaborative document creation with structured context gathering, iterative refinement,
and reader testing. Use when creating specs, proposals, research docs, briefs, or any
multi-section document that requires consistent quality and structure. Triggered by:
"help me write a spec", "draft a proposal", "coauthor a brief", "write a doc with me",
or when Forge, Atlas, or Jules needs to produce a formal document artifact.
Do NOT use for quick single-paragraph responses, memory capture, or code documentation —
this is for multi-section formal documents only.
---
SKILL.md Body Outline¶
1. When to Use — multi-section documents, formal artifacts, anything that requires structure and quality review before delivery. Examples: specs, proposals, research briefs, strategic memos.
2. Three-Stage Workflow
Stage 1: Context Gathering
- Before writing a single word, collect:
- Document type (spec, proposal, research, brief, memo, other)
- Primary audience (Jeff, external client, agent team, public)
- Purpose: what decision does this document enable or inform?
- Required sections (explicit) and implied sections (based on type)
- Length target: short (<500w), standard (500-2000w), long (2000+)
- Tone: formal, operational, strategic, technical
- Deadline or urgency level
- Ask for any missing context. Do not guess audience or purpose.
- Read references/document-types.md to confirm expected section structure for the document type.
Stage 2: Draft → Quality Check → Refinement Loop
- Write complete first draft.
- Before sharing, self-evaluate against references/quality-checklist.md:
- Does the opening state the purpose and conclusion (BLUF for ops docs)?
- Is every section necessary? Remove anything that doesn't serve the purpose.
- Are all claims supported or flagged as assumptions?
- Is tone consistent throughout?
- Are there any undefined acronyms or jargon for the target audience?
- If 3+ checklist items fail, revise before sharing.
- Share draft with note: "Here's the first draft. Key decisions/gaps: [list]."
- Incorporate feedback. Repeat until approved.
Stage 3: Reader Testing
- Before finalizing, test from the reader's perspective:
- Can someone unfamiliar with the context understand the purpose from the first paragraph?
- Are action items (if any) unambiguous?
- Run scripts/word-count.sh — is the document within the target length?
- If the document is a spec: ask "Can Melody build from this without clarifying questions?"
- If the document is a proposal: ask "Does this make the decision easy for the reader?"
- Deliver final version with a one-sentence summary of what changed from draft 1.
3. Document Type Reference
Link to references/document-types.md for expected structure by type.
4. Examples
Link to references/examples/spec-example.md and references/examples/proposal-example.md.
5. Troubleshooting
- "I don't know what sections to include" → Read document-types.md for your type.
- "The document is too long" → Run word-count.sh. Identify the longest section. Is it doing the work? If not, cut it.
- "Feedback keeps changing" → The audience may not have been defined correctly. Return to Stage 1.
- "The reader said it's unclear" → The opening paragraph probably failed reader testing. Fix BLUF.
Reference File Descriptions¶
references/document-types.md
- Sections for each document type: Spec, Proposal, Research Brief, Strategic Memo, Initiative Brief
- For each type: purpose, primary audience, required sections (ordered), common mistakes, length guidance
references/quality-checklist.md
- Universal checklist (applies to all types): 10 items
- Type-specific additions for Spec, Proposal, Research Brief
references/examples/spec-example.md
- An anonymized but realistic spec fragment showing: frontmatter, objective, scope, acceptance criteria
- Annotated to explain what makes it work
references/examples/proposal-example.md
- A realistic proposal fragment showing: BLUF opening, problem statement, proposed solution, recommendation
- Annotated
scripts/word-count.sh¶
- Language: Bash
- Inputs:
- `$1` — path to markdown file
- Behavior: Strip markdown syntax (`sed` to remove headers, bullets, code fences), count words (`wc -w`), print: `"$FILENAME: ~$WORD_COUNT words"`. Also flag if >3000 words with a recommendation to split into sections.
- Exit 0 always (advisory)
Acceptance Criteria (Quinn)¶
- All files exist at specified paths
- SKILL.md frontmatter is valid YAML with name, description under 1024 chars, negative trigger present
- SKILL.md body contains exactly 3 stages, clearly labeled
- Stage 1 includes a complete list of context questions (at least 7)
- Stage 2 references `references/quality-checklist.md`
- Stage 3 references `scripts/word-count.sh` with a concrete invocation
- `references/document-types.md` covers at least 4 document types with required sections listed
- `references/quality-checklist.md` contains at least 10 universal checklist items
- Both example files exist and are annotated
- `scripts/word-count.sh` is executable and produces word count output without crashing
- Running the skill-creator skill against this SKILL.md produces no "missing required field" warnings
C2: webapp-testing¶
Complexity: ⭐⭐⭐⭐⭐ | Model: [cloud-required]
Objective¶
New skill. Playwright-based automated web testing for Quinn. Replaces manual "check every page" QA. Includes a server lifecycle management script adapted from Anthropic's Playwright skill pattern. Primary users: Quinn (automated test execution), Melody (test authoring).
File Structure¶
webapp-testing/
├── SKILL.md
├── scripts/
│ ├── with_server.py
│ ├── run-tests.sh
│ └── capture-baseline.sh
└── references/
├── playwright-setup.md
├── test-patterns.md
└── mc-test-suite.md
Frontmatter¶
---
name: webapp-testing
description: >
Playwright-based automated web testing for VV web applications. Use when Quinn needs to run
a test suite against Mission Control or any VV web app, when Melody writes new Playwright tests,
or when establishing a visual baseline for a new feature. Handles server lifecycle automatically
via with_server.py. Triggered by: "run the test suite", "write Playwright tests",
"verify the build with tests", "capture a visual baseline".
Do NOT use for unit tests (use npm test), API-only validation (use qa-validation validate.sh),
or mobile app testing — this is for web UIs with a browser context only.
---
SKILL.md Body Outline¶
1. When to Use — automated browser testing, visual regression baselines, multi-page navigation verification. Contrast with qa-validation (which handles build, unit tests, and basic health check).
2. Setup Prerequisites
- Playwright must be installed: npx playwright install (check references/playwright-setup.md)
- The target app must be buildable in production mode
- with_server.py handles server start/stop automatically around test runs
3. Core Workflow
Running an existing test suite:
python scripts/with_server.py --port 3100 --start-cmd "npm start" --dir /path/to/project \
-- npx playwright test
Writing a new test:
- Read references/test-patterns.md for VV-standard test structure
- Tests live in tests/ directory of the target project
- Always test: navigation, data display, error states, empty states
- Never test implementation details — test what the user sees
Capturing a visual baseline:
Run `scripts/capture-baseline.sh <project-dir> <port>`. Stores screenshots in `tests/baselines/` for future regression comparison.
4. MC Test Suite
For Mission Control specifically, read references/mc-test-suite.md. The MC suite covers all tabs, all data columns, all buttons. Run before marking any MC change complete.
5. Troubleshooting
- "Playwright can't connect to browser" → Run npx playwright install chromium
- "Tests fail on CI but pass locally" → Likely a timing issue. Add await page.waitForSelector(...) before assertions.
- "with_server.py can't start server" → Check that the port isn't already bound. Run check-health.sh first.
- "Visual regression diff is huge" → Check if a CSS token changed. Run check-tokens.sh first.
- "Tests are slow" → Run with --workers 1 first to isolate failures. Then increase workers.
Script Descriptions¶
scripts/with_server.py
- Language: Python 3
- Purpose: Manage server lifecycle around a test command — start server, wait for it to be healthy, run tests, stop server. Adapted from Anthropic's pattern.
- Inputs (CLI args):
- --port INT — port to start server on
- --start-cmd STR — command to start the server (e.g. "npm start")
- --dir PATH — working directory for server command
- --health-path STR (optional) — defaults to /api/health
- --timeout INT (optional) — seconds to wait for server healthy, defaults to 30
- -- [test_command...] — the test command to run after --
- Behavior:
1. Start server process using subprocess.Popen with start-cmd in dir
2. Poll http://localhost:PORT/health-path every 2s up to timeout seconds
3. If server healthy: run test command via subprocess.run, capture exit code
4. In finally block: terminate server process (SIGTERM then SIGKILL if needed)
5. Exit with test command's exit code
- Error handling: if server never becomes healthy, print error, terminate server, exit 1
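The lifecycle described above can be sketched as follows, using the spec's defaults (2s polling, SIGTERM then SIGKILL). Argument parsing and logging are omitted; with `shell=True`, the production script should also use process groups so child processes of the start command are reaped.

```python
#!/usr/bin/env python3
"""Sketch of with_server.py: start server, poll health, run tests, clean up."""
import subprocess
import sys
import time
import urllib.request

def run_with_server(port, start_cmd, cwd, test_cmd,
                    health_path="/api/health", timeout=30):
    """Return the test command's exit code, or 1 if the server never came up."""
    server = subprocess.Popen(start_cmd, shell=True, cwd=cwd)
    try:
        url = f"http://localhost:{port}{health_path}"
        deadline = time.monotonic() + timeout
        healthy = False
        while time.monotonic() < deadline:
            try:
                with urllib.request.urlopen(url, timeout=2):
                    healthy = True
                break
            except OSError:  # connection refused / not ready yet
                time.sleep(2)
        if not healthy:
            print(f"server never became healthy at {url}", file=sys.stderr)
            return 1
        return subprocess.run(test_cmd).returncode
    finally:
        server.terminate()            # SIGTERM first...
        try:
            server.wait(timeout=10)
        except subprocess.TimeoutExpired:
            server.kill()             # ...then SIGKILL if needed
```

The `finally` block guarantees the server is torn down even when the test command fails, which is what the acceptance criteria verify with a trivial echo command.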
scripts/run-tests.sh
- Language: Bash
- Purpose: Convenience wrapper for common test patterns
- Inputs:
- $1 — project directory
- $2 — port
- $3 (optional) — test file pattern, defaults to all tests
- Behavior: Calls python with_server.py with correct args and npx playwright test $3
scripts/capture-baseline.sh
- Language: Bash
- Purpose: Navigate to key pages and capture screenshots for visual baseline
- Inputs:
- $1 — project directory
- $2 — port
- Behavior:
1. Start server via with_server.py
2. Use Playwright to navigate to each route defined in references/mc-test-suite.md
3. Save screenshots to $PROJECT/tests/baselines/YYYY-MM-DD/
4. Print list of captured files
Reference File Descriptions¶
references/playwright-setup.md
- Playwright installation: npm install -D @playwright/test && npx playwright install
- playwright.config.ts template for VV projects
- How to run headed vs headless
- How to update snapshots: npx playwright test --update-snapshots
references/test-patterns.md
- VV standard test structure: describe → beforeEach (navigate) → test (assertion)
- Selector philosophy: prefer data-testid, then ARIA role, then text content. Never class names.
- Async patterns: await page.waitForLoadState('networkidle') for data-heavy pages
- Assertion patterns: expect(locator).toBeVisible(), expect(locator).toHaveText()
- Error state testing: test what happens when API returns 500
- Empty state testing: test what happens when API returns []
references/mc-test-suite.md
- Complete test checklist for Mission Control
- Routes to test: /, /sigint, /agents, /tasks, /initiatives, /settings (all current MC routes)
- For each route: what to verify (data visible, no blank panels, no console errors)
- Button checklist: all interactive buttons by page
- Data accuracy checks: compare displayed agent count vs actual agent count
Acceptance Criteria (Quinn)¶
- All files exist at specified paths
- SKILL.md frontmatter has valid YAML, description under 1024 chars, negative trigger present
- `scripts/with_server.py` is executable, starts a server, waits for health, runs a test command, and stops the server — verified with a trivial echo command as the test
- `scripts/with_server.py` exits with the test command's exit code (not always 0)
- `scripts/with_server.py` terminates the server even if the test command fails
- `scripts/run-tests.sh` is executable and calls `with_server.py` correctly
- `scripts/capture-baseline.sh` is executable and creates at least one file in `tests/baselines/`
- `references/playwright-setup.md` contains a working `playwright.config.ts` template
- `references/test-patterns.md` covers selector philosophy, async patterns, and error state testing
- `references/mc-test-suite.md` covers all current MC routes with at least one assertion per route
- SKILL.md troubleshooting section covers at least 5 failure scenarios
C3: skill-creator¶
Complexity: ⭐⭐⭐⭐ | Model: [cloud-required]
Objective¶
New meta-skill for building and improving VV skills. Guides through use case definition, frontmatter generation, SKILL.md authoring, and trigger testing. Also reviews existing skills for structural issues and over/under-triggering risks. Replaces ad hoc skill authoring.
File Structure¶
skill-creator/
├── SKILL.md
├── references/
│ ├── skill-spec-template.md
│ ├── frontmatter-guide.md
│ ├── trigger-test-suite.md
│ └── vv-skill-standards.md
└── scripts/
└── validate-skill.sh
Frontmatter¶
---
name: skill-creator
description: >
Interactive guide for creating new VV skills or improving existing ones. Walks through use case
definition, frontmatter authoring, SKILL.md structure, and trigger testing. Also reviews existing
SKILL.md files for structural issues, vague descriptions, missing negative triggers, or
over/under-triggering risks. Triggered by: "create a skill", "build a skill", "improve this skill",
"review this skill", "audit the skill", "tidy up a skill", "does this skill follow standards".
Do NOT use for executing skill workflows — use the target skill itself. Do NOT use for non-skill
documentation — use doc-coauthoring instead.
---
SKILL.md Body Outline¶
1. Two Modes
- Create mode: Building a new skill from scratch
- Review mode: Auditing and improving an existing skill
2. Create Mode Workflow
Step 1: Use Case Definition
Ask:
- What specific task or workflow does this skill enable?
- Who will use it? (Jules, Melody, Quinn, Forge, external)
- What trigger phrases will users say?
- What should NOT trigger this skill?
- Is there an existing skill that overlaps? (check current skill list)
- Does this need scripts? references? assets?
Step 2: Generate Frontmatter
Using answers, produce:
- name — kebab-case, matches folder name, under 30 chars
- description — WHAT + WHEN + negative triggers, under 1024 chars
- Optional fields: compatibility, metadata
Apply frontmatter-guide.md rules. Validate with validate-skill.sh.
Step 3: Write SKILL.md Body
Using Anthropic's recommended structure (adapted for VV):
1. When to Use
2. Workflow (numbered steps, explicit ordering)
3. Examples (good output, bad output, or scenario walkthroughs)
4. Troubleshooting (at least 3 failure modes)
5. References section (if references/ files exist)
Step 4: Trigger Testing
Generate 15 test cases using trigger-test-suite.md format:
- 5 "should trigger" — obvious phrasing
- 5 "should trigger" — paraphrased
- 5 "should NOT trigger" — related but different
Evaluate each against the description. If any "should trigger" doesn't clearly match the description, revise the description.
Step 5: Validate
Run scripts/validate-skill.sh path/to/skill/ — must pass all checks before marking complete.
3. Review Mode Workflow
Read the target SKILL.md. Check against references/vv-skill-standards.md:
- Frontmatter valid? name, description, negative trigger present?
- Description under 1024 chars? Contains WHAT + WHEN?
- Body has: When to Use, Workflow, Examples, Troubleshooting?
- Any references linked but missing from disk?
- Any scripts referenced but not executable?
- Any hardcoded absolute paths that should be relative?
- SKILL.md under 5000 words? If over, what should move to references/?
Report findings as: PASS, WARN, or FAIL for each check. Provide specific fix instructions for each WARN/FAIL.
4. VV Skill Standards
Read references/vv-skill-standards.md for the complete checklist.
5. Troubleshooting
- "Description is over 1024 chars" → Move details from description into SKILL.md body. Description is just the trigger, not the manual.
- "Skill triggers too often" → Add negative triggers. Be more specific about what it does NOT cover.
- "Skill never triggers" → Your description is too generic. Add specific trigger phrases users would actually say.
- "validate-skill.sh fails on YAML" → Check for unescaped colons and unclosed quotes in frontmatter.
Reference File Descriptions¶
references/skill-spec-template.md
- Blank SKILL.md template with every section pre-populated with placeholder comments
- Shows correct frontmatter structure
- Section headers in the right order
- Used as the starting skeleton in Create mode
references/frontmatter-guide.md
- Rules for each frontmatter field (from Anthropic guide, VV-adapted)
- Good and bad description examples (at least 5 each)
- Negative trigger patterns with examples
- Common mistakes: XML tags, unclosed quotes, name with spaces
references/trigger-test-suite.md
- 15-test format template with fill-in-the-blank structure
- Example completed suite for the doc-coauthoring skill
- How to evaluate results: if 3+ should-trigger tests fail, revise description
references/vv-skill-standards.md
- Complete VV skill quality checklist (30 items)
- Derived from: Anthropic guide checklist + VV-specific rules
- Each item: REQUIRED or RECOMMENDED
- Used by Review mode and validate-skill.sh
scripts/validate-skill.sh¶
- Language: Bash
- Purpose: Automated structural validation of a skill folder
- Inputs:
- `$1` — path to skill folder
- Checks (each produces PASS/WARN/FAIL):
  - SKILL.md exists at exact path
  - Frontmatter has `---` delimiters
  - `name` field is kebab-case, no spaces, no capitals
  - `description` field exists and is non-empty
  - `description` length is under 1024 characters
  - `description` contains "Do NOT" or "NOT for" (negative trigger)
  - No `<` or `>` characters in frontmatter
  - SKILL.md word count under 5000
  - All files referenced in SKILL.md body exist on disk
  - All scripts in `scripts/` directory are executable
  - No `README.md` in skill folder root
  - Folder name matches `name` field in frontmatter
- Output: grouped PASS/WARN/FAIL list. Exit 1 if any FAIL. Exit 0 if all PASS (even with WARNs).
Acceptance Criteria (Quinn)¶
- All files exist at specified paths
- SKILL.md frontmatter valid, under 1024 chars, negative trigger present
- SKILL.md body clearly defines Create mode and Review mode as separate workflows
- Create mode has exactly 5 steps, each with specific sub-instructions
- Review mode produces PASS/WARN/FAIL report (testable by running review on a known-bad skill)
- `scripts/validate-skill.sh` is executable
- Running validate-skill.sh on a correct skill exits 0
- Running validate-skill.sh on a skill with no description exits 1 with "FAIL" output
- Running validate-skill.sh on a skill with non-executable scripts exits 1
- `references/vv-skill-standards.md` has at least 20 checklist items
- `references/frontmatter-guide.md` has at least 5 good description examples and 5 bad ones
- `references/trigger-test-suite.md` has a completed example suite with 15 tests
C4: agent-dispatch¶
Complexity: ⭐⭐⭐ | Model: [cloud-required]
Objective¶
New skill. Codifies VV agent selection rules, model routing, timeout settings, and handoff protocol. Currently scattered in MEMORY.md as prose — this makes it procedural and reliably triggerable. Primary user: Jules (orchestrating work), Jeff (understanding who does what).
File Structure¶
agent-dispatch/
├── SKILL.md
└── references/
├── agent-profiles.md
├── model-routing.md
└── handoff-protocol.md
Frontmatter¶
---
name: agent-dispatch
description: >
VV agent selection, model routing, and task handoff protocol. Use when deciding which agent
(Atlas, Melody, Quinn, Forge) should handle a task, which model tier to use, how to structure
a subagent prompt, or how to handle an agent handoff. Also use when a task needs to be split
across agents or when verifying that the right agent is doing the right work.
Triggered by: "who should do this", "which agent", "spawn a subagent", "delegate this",
"route this task", "which model for this".
Do NOT use for actually executing tasks — use the target skill for the work itself. Do NOT use
for external API routing or OpenClaw model config — use openclaw-prime for that.
---
SKILL.md Body Outline¶
1. VV Agent Team
Quick-reference table:
| Agent | Role | Strengths | Not For |
|---|---|---|---|
| Jules | Orchestrator / COA | Planning, routing, synthesis, comms | Deep coding, sustained research |
| Atlas | Research Director | Deep research, signal gathering, synthesis | Writing code, UI work |
| Melody | Engineering Lead | Code implementation, refactors, scaffolding | Research, strategy |
| Quinn | QA Director | Code QA, testing, validation, verification | Building features |
| Forge | Product Architect | Specs, architecture decisions | Execution, coding |
2. Task Routing Decision Tree
- Is the task primarily research/market intelligence? → Atlas
- Is the task primarily code implementation (new feature, refactor, bug fix)? → Melody
- Is the task primarily validation (build pass, test, visual QA)? → Quinn
- Is the task primarily spec/architecture decisions? → Forge
- Does the task span multiple domains? → Jules orchestrates, spawns specialists
- Is it a quick synthesis or decision? → Jules handles directly
3. Model Tier Selection
Reference references/model-routing.md for the full routing table. Quick rules:
- Complex reasoning, architecture, strategy → anthropic/claude-opus-4-6 (default)
- Code implementation, most agent tasks → anthropic/claude-sonnet-4-6
- Local simple tasks, triage, filtering → qwen3.5:35b or glm-4.7-flash
- Never use a heavy model for a task that local can handle
4. Subagent Prompt Structure
When spawning a subagent, every prompt must include:
1. Agent persona ("You are [Agent], [Role] for Vivere Vitalis.")
2. Convention files to read (CONVENTIONS.md, LESSONS_LEARNED.md if they exist)
3. Input files/context to read
4. The task — specific, scoped, completable in one session
5. Output destination — exact file path to write results
6. Acceptance signal — how the requester knows the task is done
See references/handoff-protocol.md for the full template.
5. Handoff Protocol
- Jules → Melody: Include spec file path, acceptance criteria, QA contact (Quinn)
- Melody → Quinn: Include PR/diff or file list, spec file path for comparison
- Quinn → Jules: Include pass/fail with specific failure details if failed
- Jules → Jeff: Include summary of what was done, what's pending, what needs approval
6. Escalation
- If a subagent reports blocked (missing context, capability gap): Jules escalates to Jeff
- If Quinn reports repeated failures on the same spec: Forge re-specs
- Never re-assign a failed task to the same agent without changing the approach
7. Examples
Example: Research → Spec → Build → QA pipeline
Task: Build a new revenue dashboard widget
1. Jules assigns Atlas: "Research how other SaaS dashboards display MRR/ARR. Summarize top 3 patterns."
2. Jules assigns Forge: "Spec the MRR widget using Atlas's research. Acceptance criteria for Quinn."
3. Jules assigns Melody: "Build to Forge's spec. Reference vv-dashboard-design skill."
4. Jules assigns Quinn: "QA the MRR widget against Forge's acceptance criteria. Run qa-validation + visual Peekaboo checks."
5. Quinn → Jules: "All AC passed. Ready for Jeff review."
8. Troubleshooting
- "Melody keeps asking clarifying questions" → Forge's spec is incomplete. Return to spec phase.
- "Quinn keeps failing QA" → Check if the spec's acceptance criteria are testable. If not, Forge needs to revise.
- "Atlas research is too broad" → Scoping was wrong. Give Atlas a more specific brief.
- "Task took 3x expected time" → Break it into smaller atomic tasks. One task = one file or one logical unit.
Reference File Descriptions¶
references/agent-profiles.md
- Full profile for each agent: Jules, Atlas, Melody, Quinn, Forge
- For each: role title, what they're optimized for, what they're NOT for, preferred model tier, known biases/tendencies (e.g., "Melody codes fast but skips architecture thinking")
- How to write effective prompts for each agent
references/model-routing.md
- Full routing table: task type → model recommendation
- Tier definitions: heavy (Opus), standard (Sonnet), local (Qwen/GLM)
- Cost consideration: when to use heavy vs standard
- Context window guidance: what tasks fit in what window sizes
- Update cadence: revisit monthly as new models become available
references/handoff-protocol.md
- Subagent prompt template (fill-in-the-blank)
- Handoff checklist for each agent pair (Jules→Melody, Melody→Quinn, etc.)
- Failure escalation paths
- "Task complete" signal standards — what constitutes a complete handoff
Acceptance Criteria (Quinn)¶
- All files exist at specified paths
- SKILL.md frontmatter valid, under 1024 chars, negative triggers present
- SKILL.md body contains agent team table with all 5 agents (Jules, Atlas, Melody, Quinn, Forge)
- SKILL.md body contains a task routing decision tree (structured, not prose)
- SKILL.md body contains a subagent prompt structure with all 6 required fields
- SKILL.md body contains an Examples section with at least one multi-agent pipeline example
- SKILL.md body contains a Troubleshooting section
- `references/agent-profiles.md` has a profile for each of the 5 agents
- `references/model-routing.md` covers at least 3 model tiers with task-type recommendations
- `references/handoff-protocol.md` contains a fill-in-the-blank subagent prompt template
C5: revenue-modeling¶
Complexity: ⭐⭐⭐⭐ | Model: [cloud-required]
Objective¶
New skill. Standardizes ARR modeling, pricing tier design, and market sizing (TAM/SAM/SOM) analysis for VV. Used by Forge (specs with revenue implications) and Atlas (market research). Immediately relevant to APA pricing validation.
File Structure¶
revenue-modeling/
├── SKILL.md
├── references/
│ ├── arr-model-template.md
│ ├── pricing-tier-guide.md
│ ├── market-sizing-guide.md
│ └── examples/
│ ├── apa-arr-model.md
│ └── apa-market-sizing.md
└── scripts/
└── arr-calc.py
Frontmatter¶
---
name: revenue-modeling
description: >
ARR modeling, pricing tier design, and market sizing (TAM/SAM/SOM) for VV products and services.
Use when evaluating a new product's revenue potential, designing pricing tiers, stress-testing
revenue assumptions, or calculating TAM/SAM/SOM for a market. Triggered by: "model the revenue",
"what should we charge", "design pricing tiers", "what's the market size", "ARR projections",
"SOM calculation", "pricing validation", "revenue assumptions".
Do NOT use for existing revenue tracking or invoicing — this is for planning and modeling only.
Do NOT use for cost estimation of API spend — use cost-estimation skill instead.
---
SKILL.md Body Outline¶
1. When to Use
Revenue modeling before building, pricing decisions, market opportunity validation, pitch preparation.
2. Three Analysis Types
This skill covers three distinct analyses — use what the situation requires:
- ARR Modeling — what revenue can we make from this product?
- Pricing Tier Design — how should we structure pricing?
- Market Sizing — how big is the opportunity?
Often all three are needed together for a new product evaluation.
3. ARR Modeling Workflow
Step 1: Define the model inputs
- Product type: SaaS subscription, one-time, services, hybrid
- Billing cadence: monthly, annual, usage-based
- Customer segment count (conservative, base, optimistic)
- Churn rate assumption (monthly)
- Expansion revenue assumption (if applicable)
Step 2: Use the template
Read references/arr-model-template.md. Fill in all fields. Run scripts/arr-calc.py to compute projections.
Step 3: Stress-test
Run three scenarios: conservative (50% of base), base, optimistic (150% of base). Report all three. Never report only the optimistic case.
Step 4: Identify the biggest assumption
Every model has one number that determines everything. Name it explicitly: "This model's critical assumption is [X]. If [X] is wrong by Y%, the projection changes by Z%."
4. Pricing Tier Design Workflow
Step 1: Identify buyer types (ICP segmentation)
- Who are the distinct buyer types?
- What is each type's willingness to pay?
- What is each type's key value metric?
Step 2: Apply value-based pricing principles
- Price to the value delivered, not cost to produce
- Tier structure: 3 tiers (Starter/Pro/Enterprise or equivalent) is the default
- Anchor pricing: Enterprise tier anchors perception of value for Pro tier
- Free tier or trial: is it needed to reduce friction? What's the conversion assumption?
Step 3: Validate against market
- What do competitors charge for comparable value?
- What does the target ICP currently budget for this category?
- Reference references/pricing-tier-guide.md for VV standard structures
Step 4: Model the tier mix
- What % of customers will land in each tier?
- What is the resulting blended ARPU?
- Feed into ARR model
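The tier-mix step reduces to a weighted average of tier prices. A minimal sketch — the 70/25/5 split and prices below are illustrative assumptions, not VV figures:

```python
def blended_arpu(prices, mix):
    """Weighted-average monthly revenue per customer.

    prices: monthly price per tier; mix: fraction of customers per tier (sums to 1.0).
    """
    assert abs(sum(mix) - 1.0) < 1e-9, "tier mix must sum to 100%"
    return sum(p * m for p, m in zip(prices, mix))

# Illustrative only: 3-tier structure with an assumed 70/25/5 customer split
arpu = blended_arpu([99, 499, 2000], [0.70, 0.25, 0.05])
print(f"Blended ARPU: ${arpu:,.2f}/mo")
```

The blended ARPU is the single number that feeds into the ARR model's revenue line.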
5. Market Sizing Workflow
Definitions (must use these consistently):
- TAM — Total Addressable Market: the entire category revenue if all possible customers bought
- SAM — Serviceable Addressable Market: the portion we can realistically reach with our GTM
- SOM — Serviceable Obtainable Market: what we can realistically capture in 3-5 years
Process:
1. Define the market category precisely (too broad = useless; too narrow = underestimates the opportunity)
2. Use top-down approach (industry reports) AND bottom-up approach (customer count × ARPU)
3. Report both approaches. If they diverge significantly, explain why.
4. SAM = TAM × (% addressable by our GTM). Be honest about GTM reach.
5. SOM = SAM × (% we can capture). Justify the capture rate with comparable company benchmarks.
6. Reference references/market-sizing-guide.md for methodology and data sources.
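The SAM/SOM arithmetic in the process above can be sketched directly. All figures below are illustrative placeholders, not VV market data:

```python
def market_sizing(tam_top_down, customers, arpu_annual, addressable_pct, capture_pct):
    """Return (tam_bottom_up, sam, som) per the definitions above.

    tam_top_down: category revenue from industry reports.
    customers * arpu_annual: bottom-up TAM cross-check.
    addressable_pct: share of TAM our GTM can reach (SAM = TAM x addressable_pct).
    capture_pct: share of SAM we can win in 3-5 years (SOM = SAM x capture_pct).
    """
    tam_bottom_up = customers * arpu_annual
    sam = tam_top_down * addressable_pct
    som = sam * capture_pct
    return tam_bottom_up, sam, som

# Illustrative: $500M top-down TAM vs. 10k potential customers at $40k/yr bottom-up
bottom_up, sam, som = market_sizing(500e6, 10_000, 40_000, 0.20, 0.05)
print(f"Bottom-up TAM: ${bottom_up/1e6:.0f}M  SAM: ${sam/1e6:.0f}M  SOM: ${som/1e6:.1f}M")
```

Note the two TAM figures diverge here ($500M vs $400M) — per step 3, that divergence must be explained, not averaged away.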
6. Examples
See references/examples/apa-arr-model.md and references/examples/apa-market-sizing.md.
7. Output Format — Always deliver:
- Inputs table (all assumptions, explicitly labeled as assumptions)
- Projections table (3 scenarios)
- Single "critical assumption" statement
- Recommendation or implication (don't just present numbers — tell Jeff what they mean)
8. Troubleshooting
- "I don't have enough data for a bottom-up estimate" → Use top-down as primary, bottom-up sanity check from first principles. Flag the uncertainty explicitly.
- "The optimistic case looks unrealistically high" → It probably is. Use the base case as the planning figure. Optimistic is the ceiling if everything goes right.
- "We can't agree on what the TAM is" → The market definition is wrong. Narrow it until you can get a specific number.
- "The model shows we need 10,000 customers to hit $1M ARR but our SAM is 500 companies" → Pricing is too low. Return to pricing tier design.
Reference File Descriptions¶
references/arr-model-template.md
- Structured fill-in-the-blank ARR model
- Sections: Product inputs, Customer inputs (segment counts by tier), Churn assumptions, Expansion assumptions
- Monthly and annual projection tables (pre-formatted as markdown tables with formula comments)
- 3-scenario output table template
references/pricing-tier-guide.md
- VV standard 3-tier structure (Starter/Pro/Enterprise) with naming options
- Value metric examples by product type (per seat, per API call, per team, flat rate)
- Anchoring principle with examples
- Free trial / freemium decision framework: when it helps vs when it hurts ARR
- Common pricing mistakes: underpricing for enterprise, feature-gating the wrong things
references/market-sizing-guide.md
- TAM/SAM/SOM definitions and common mistakes
- Top-down methodology: how to use industry reports + adjustment factors
- Bottom-up methodology: how to estimate from customer count × ARPU
- Data sources for VV target markets (fitness, sports, wellness): which reports to reference
- Comparable company benchmarks for capture rate justification
references/examples/apa-arr-model.md
- Completed ARR model for Athletic Performance Analytics (APA)
- Uses real-style inputs, all three scenarios, critical assumption identified
- Annotated to show reasoning
references/examples/apa-market-sizing.md
- Completed TAM/SAM/SOM analysis for the APA target market (professional sports teams, NCAA)
- Both top-down and bottom-up approaches shown
- Annotations explain methodology choices
scripts/arr-calc.py¶
- Language: Python 3
- Purpose: Calculate ARR projections from model inputs, avoiding spreadsheet dependency
- Inputs (CLI args):
  - `--tiers STR` — comma-separated tier names (e.g. `"Starter,Pro,Enterprise"`)
  - `--prices STR` — comma-separated monthly prices per tier (e.g. `"99,499,2000"`)
  - `--customers STR` — comma-separated customer counts per tier for base case (e.g. `"50,20,5"`)
  - `--churn FLOAT` — monthly churn rate as decimal (e.g. `0.02` for 2%)
  - `--months INT` — projection horizon (default: 36)
  - `--scenario FLOAT` (optional) — multiplier for optimistic/conservative (e.g. `1.5` for optimistic, `0.5` for conservative); if omitted, runs all three
- Output: Formatted table showing MRR and ARR at month 1, 6, 12, 24, 36 for each scenario
- Also prints: blended ARPU, implied annual churn %, total customers at horizon
- Exit 0 on success, 1 on invalid inputs
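A minimal sketch of the projection core the script could implement. The churn model (uniform monthly decay across tiers, no expansion revenue) is an assumption this sketch makes — the spec leaves that modeling choice to Melody:

```python
def project(prices, customers, churn, months, mult=1.0):
    """MRR/ARR at checkpoint months, applying a scenario multiplier to base counts.

    Assumes uniform monthly churn across tiers and no expansion revenue.
    Returns {month: (mrr, arr_run_rate)}.
    """
    if len(prices) != len(customers):
        raise ValueError("prices and customers must have the same tier count")
    out = {}
    for m in (1, 6, 12, 24, 36):
        if m > months:
            continue
        retained = (1 - churn) ** (m - 1)  # fraction of the cohort surviving to month m
        mrr = sum(p * c * mult * retained for p, c in zip(prices, customers))
        out[m] = (mrr, mrr * 12)           # (MRR, annualized run-rate)
    return out

base = project([99, 499, 2000], [50, 20, 5], churn=0.02, months=36)
print(f"Month 1 ARR run-rate: ${base[1][1]:,.0f}")
```

Running the same function with `mult=0.5` and `mult=1.5` produces the conservative and optimistic scenarios when `--scenario` is omitted.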
Acceptance Criteria (Quinn)¶
- All files exist at specified paths
- SKILL.md frontmatter valid, under 1024 chars, both negative triggers present
- SKILL.md body defines TAM, SAM, SOM explicitly (exact definitions)
- SKILL.md body covers all 3 analysis types (ARR, Pricing, Market Sizing) as distinct workflows
- ARR modeling workflow includes 3-scenario requirement (conservative, base, optimistic)
- ARR modeling workflow includes "critical assumption" identification step
- Pricing tier design covers value-based pricing, tier structure, and tier-mix modeling
- `scripts/arr-calc.py` is executable and produces output without crashing with valid inputs
- `scripts/arr-calc.py` produces projections for 3 scenarios when `--scenario` is omitted
- `scripts/arr-calc.py` exits 1 with clear error when prices and customers have different tier counts
- `references/arr-model-template.md` has a complete 3-scenario output table template
- `references/market-sizing-guide.md` covers both top-down and bottom-up methodology
- Both example files (APA ARR model, APA market sizing) exist and are fully populated
- Troubleshooting section covers at least 4 failure scenarios
Summary Table¶
| ID | Skill | Type | Complexity | Model | Key Deliverables |
|---|---|---|---|---|---|
| M1 | cost-estimation | Medium | ⭐⭐ | local-ok | estimate.py, references/MODEL_PRICING_REGISTRY.md |
| M2 | service-management | Medium | ⭐⭐⭐⭐ | cloud | 4 scripts, 3 reference files, absorb project-scaffolding |
| M3 | qa-validation | Medium | ⭐⭐⭐ | local-ok | validate.sh, references/peekaboo-guide.md, troubleshooting |
| M4 | vv-sigint | Medium | ⭐⭐ | local-ok | check-sources.sh, examples, troubleshooting |
| M5 | vv-dashboard-design | Medium | ⭐⭐ | local-ok | check-tokens.sh, 3 example files |
| M6 | project-pipeline | Medium | ⭐ | local-ok | references/evaluation-template.md, example evaluation |
| M7 | frontend-design | Medium | ⭐⭐ | cloud | references/font-pairings.md, references/color-examples.md |
| M8 | memory-manager | Medium | ⭐⭐ | local-ok | consolidate-check.py, examples section |
| M9 | openclaw-prime | Medium | ⭐ | local-ok | troubleshooting section |
| C1 | doc-coauthoring | Complex | ⭐⭐⭐⭐ | cloud | New skill, 3-stage workflow, 4 reference files, 1 script |
| C2 | webapp-testing | Complex | ⭐⭐⭐⭐⭐ | cloud | New skill, 3 scripts, 3 reference files, with_server.py |
| C3 | skill-creator | Complex | ⭐⭐⭐⭐ | cloud | New skill, validate-skill.sh, 4 reference files |
| C4 | agent-dispatch | Complex | ⭐⭐⭐ | cloud | New skill, 3 reference files, handoff protocol |
| C5 | revenue-modeling | Complex | ⭐⭐⭐⭐ | cloud | New skill, arr-calc.py, 5 reference files |
Total estimated effort: ~35-45 Melody hours across all items
Implementation Notes for Melody¶
- Work M1-M9 in order. They are mostly independent, but M2 (service-management) must complete before any references to `start-service.sh` in other skills are added.
- M2 is the highest-risk task. The project-scaffolding merge involves content synthesis — don't just append, integrate. Read both existing SKILL.md files carefully before writing the merged version.
- Scripts must be tested before marking done. Run each script with at least one valid input and one invalid input. Verify exit codes.
- Reference files must be complete. A stub with "coming soon" is a build error, not a pass.
- For Complex tier: Build C1 (doc-coauthoring) first — it's the simplest and validates the workflow pattern. Then C3 (skill-creator) — you'll need it to validate the other skills. Then C2, C4, C5.
- Absolute path check: Scan every SKILL.md for `/Users/viverevitalis/` paths and replace with relative paths using the `references/` structure.
- Deletion of project-scaffolding: Do NOT delete until M2 is verified by Quinn. Confirmation step required.
Spec complete. Ready for Quinn review, then Jules strategic review, then Jeff approval before Melody execution.
V2 REVISION — Post-Quinn Review Delta¶
Author: Forge (Director of Product Architecture) Date: 2026-03-20 Status: Addresses all blocking and non-blocking issues raised in QA_SPEC_REVIEW.md Scope: This section is a delta only. All original spec content above remains in effect unless explicitly superseded here.
BLOCKER 1: intelligence-suite — Disposition Added (NEW: M10)¶
Quinn's finding: intelligence-suite was called out in the BRIEF but completely absent from the spec.
Jules's decision: ARCHIVE. vv-sigint is VV's intelligence skill. intelligence-suite belongs to a different agent persona (Makima) and has no place in our stack. No adaptation needed.
M10: intelligence-suite (Archive)¶
Complexity: ⭐ | Model: [local-ok]
Objective¶
Archive the intelligence-suite skill by relocating its folder. No content adaptation required. vv-sigint handles all signal intelligence needs for VV.
File Changes¶
Move Operation¶
- From: `~/.openclaw/workspace/skills/intelligence-suite/`
- To: `~/.openclaw/workspace/skills/.archived-intelligence-suite/`
Use `mv` — do not copy-and-delete. The dot-prefix ensures it is excluded from normal skill discovery.
No Content Changes¶
Do NOT modify SKILL.md or any files inside the archived folder. The goal is preservation with suppression, not deletion.
Post-Move Verification¶
- Confirm `~/.openclaw/workspace/skills/intelligence-suite/` no longer exists
- Confirm `~/.openclaw/workspace/skills/.archived-intelligence-suite/SKILL.md` exists and is readable
- Confirm no other file in `~/.openclaw/workspace/skills/` references `intelligence-suite` by name in a live `SKILL.md` trigger (grep check)
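The move and verification steps can be sketched as a self-contained shell demo. It runs against a throwaway temp directory standing in for the real skills path, so the fixture names are stand-ins, not the live workspace:

```shell
set -eu
# Demo fixture: a throwaway dir standing in for ~/.openclaw/workspace/skills/
SKILLS_DIR="$(mktemp -d)"
mkdir -p "$SKILLS_DIR/intelligence-suite"
echo "demo skill" > "$SKILLS_DIR/intelligence-suite/SKILL.md"

# M10 is a single mv; the dot-prefix suppresses skill discovery
mv "$SKILLS_DIR/intelligence-suite" "$SKILLS_DIR/.archived-intelligence-suite"

# Post-move verification mirrors the three checks above
PASS_COUNT=0
[ ! -d "$SKILLS_DIR/intelligence-suite" ] && PASS_COUNT=$((PASS_COUNT + 1))
[ -s "$SKILLS_DIR/.archived-intelligence-suite/SKILL.md" ] && PASS_COUNT=$((PASS_COUNT + 1))
grep -rq "intelligence-suite" "$SKILLS_DIR" --include="SKILL.md" || PASS_COUNT=$((PASS_COUNT + 1))
echo "checks passed: $PASS_COUNT/3"
rm -rf "$SKILLS_DIR"
```

Against the real workspace, only the `mv` line and the three checks apply, with `SKILLS_DIR` set to `~/.openclaw/workspace/skills`.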
Acceptance Criteria (Quinn)¶
- `skills/intelligence-suite/` does not exist after M10 completes
- `skills/.archived-intelligence-suite/SKILL.md` exists and is non-empty
- `grep -r "intelligence-suite" ~/.openclaw/workspace/skills/ --include="SKILL.md"` returns zero results from non-archived skills
- M10 completes in under 5 minutes — this is a one-command move, not a rewrite
BLOCKER 2: skill-creator Name Conflict — Renamed to vv-skill-creator¶
Quinn's finding: A system-level skill-creator skill already exists at /opt/homebrew/lib/node_modules/openclaw/skills/skill-creator/. Creating a workspace-level skill with the same name creates an ambiguous resolution conflict.
Jules's decision: RENAME. The VV variant becomes vv-skill-creator. This avoids collision, follows VV naming convention for custom skills, and preserves the system-level skill for general OpenClaw use.
Changes to C3 (now vv-skill-creator)¶
C3 File Structure — UPDATED:
vv-skill-creator/
├── SKILL.md
├── references/
│ ├── skill-spec-template.md
│ ├── frontmatter-guide.md
│ ├── trigger-test-suite.md
│ └── vv-skill-standards.md
└── scripts/
└── validate-skill.sh
C3 Frontmatter — UPDATED:
---
name: vv-skill-creator
description: >
Interactive guide for creating new VV skills or improving existing ones. Walks through use case
definition, frontmatter authoring, SKILL.md structure, and trigger testing. Also reviews existing
SKILL.md files for structural issues, vague descriptions, missing negative triggers, or
over/under-triggering risks. Triggered by: "create a skill", "build a skill", "improve this skill",
"review this skill", "audit the skill", "tidy up a skill", "does this skill follow standards".
Do NOT use for executing skill workflows — use the target skill itself. Do NOT use for non-skill
documentation — use doc-coauthoring instead.
---
All internal C3 references updated: Every occurrence of skill-creator within the SKILL.md body, reference files, and script output messages must use vv-skill-creator. The folder name is vv-skill-creator/. The frontmatter name field is vv-skill-creator.
Cross-Reference Update: C1 AC11¶
Original AC11 (C1 — doc-coauthoring):
"Running the skill-creator skill against this SKILL.md produces no 'missing required field' warnings."
Updated AC11 (V2):
"Running the `vv-skill-creator` skill against this SKILL.md via `scripts/validate-skill.sh` produces no FAIL outputs."
Note: AC11 cannot be verified at C1 completion time (see Blocker 3). It is demoted to a post-C3 retroactive check. See updated implementation order below.
BLOCKER 3: C1/C3 Build Order — C3 Builds First¶
Quinn's finding: C1 AC11 depends on vv-skill-creator's validate-skill.sh existing. The original order (C1 → C3 → C2 → C4 → C5) makes AC11 unverifiable at C1 completion time.
Fix: Reverse C1 and C3 in the build sequence. C3 (vv-skill-creator) builds and passes QA first, then C1 (doc-coauthoring) builds and AC11 is verified using the newly-built validate-skill.sh.
Updated Complex Tier Build Order:
1. C3 — vv-skill-creator (builds first; provides validate-skill.sh for all subsequent skill QA)
2. C1 — doc-coauthoring (AC11 now verifiable post-C3)
3. C2 — webapp-testing
4. C4 — agent-dispatch
5. C5 — revenue-modeling
See updated Implementation Order at the end of this section.
NON-BLOCKING: M2 Deletion Ordering Clarification¶
Quinn's finding: AC14 instructs Melody to delete project-scaffolding/SKILL.md as part of M2 completion, while the Implementation Notes say "Do NOT delete until M2 is verified by Quinn." Direct contradiction.
Fix: Remove AC14 from M2's acceptance criteria. Deletion is a post-QA step, not a build step.
M2 Acceptance Criteria — UPDATED¶
Remove from M2 ACs:
~~14. `project-scaffolding/SKILL.md` has been deleted~~
Add as post-QA step (after Quinn signs off M2):
Post-QA Step — M2 Deletion:
After Quinn confirms M2 PASS, Jules or Melody executes:
`trash ~/.openclaw/workspace/skills/project-scaffolding/` (use `trash`, not `rm`). Confirm deletion with Quinn before marking M2 fully closed. This step is NOT part of Melody's build pass — it is a separate controlled action.
NON-BLOCKING: M4, M5, M8 — Negative Trigger Confirmation¶
Quinn's finding: Spec said "no frontmatter change needed" for M4, M5, M8 without quoting the existing negative triggers. Required verification.
Confirmed — negative triggers already exist (added to Simple tier today per Jules):
M4 — vv-sigint: Existing description includes:
"Do NOT use for internal project status or team updates — this is for external signal gathering only." ✅ Negative trigger confirmed. No frontmatter update required.
M5 — vv-dashboard-design: Existing description includes:
"Do NOT use for non-MC web projects — use frontend-design for general web work." ✅ Negative trigger confirmed. No frontmatter update required.
M8 — memory-manager: Existing description includes:
"Do NOT use for session-to-session conversation continuity — that is handled by the AGENTS.md startup routine." ✅ Negative trigger confirmed. No frontmatter update required.
Melody: treat the frontmatter for M4, M5, M8 as frozen. Do not touch it unless a check on the actual file reveals the text above is absent, in which case add the quoted trigger text verbatim.
NON-BLOCKING: M7 — Example Scenarios Added¶
Quinn's finding: The brief called for "2-3 example scenario walkthroughs." The spec provided reference files (font-pairings.md, color-examples.md) but not scenario walkthroughs. These are different things.
Fix: Add a references/examples/ directory to M7 with two scenario walkthrough files.
M7 Additional File Changes¶
frontend-design/references/examples/minimal-saas-landing.md
- Purpose: Scenario walkthrough — designing a minimal SaaS landing page
- Content structure:
- The brief: "Build a landing page for a B2B analytics tool. Tone: trustworthy, clean, modern."
- Aesthetic selection: Why minimal? (trust signals, B2B buyers, data-forward context)
- Font pairing chosen: (reference font-pairings.md minimal entry) — reasoning explained
- Color palette chosen: (reference color-examples.md minimal entry) — reasoning explained
- Layout decisions: whitespace, typography scale, CTA placement
- ❌ What NOT to do: purple gradient hero, generic stock photo, Inter/Roboto alone
- ✅ Output: Full React component (hero + CTA section) using the chosen fonts and palette
frontend-design/references/examples/editorial-dashboard-widget.md
- Purpose: Scenario walkthrough — designing an editorial-style data widget
- Content structure:
- The brief: "Build a stats widget that feels premium and distinctive. Not like every other dashboard."
- Aesthetic selection: Why editorial? (contrast with typical dashboards, creates visual hierarchy)
- Font pairing chosen: (reference font-pairings.md editorial entry) — reasoning explained
- Color palette chosen: (reference color-examples.md editorial entry) — reasoning explained
- Typography choices: large display numbers, small label type, intentional weight contrast
- ❌ What NOT to do: rounded cards with soft shadows everywhere, Inter everywhere, gradient accents
- ✅ Output: Stat card React component using editorial typography and color
M7 SKILL.md Updates — REVISED¶
In addition to the "Starting Resources" section already specified, add to the SKILL.md body:
## Example Scenarios
See `references/examples/` for end-to-end scenario walkthroughs:
- `minimal-saas-landing.md` — B2B landing page, minimal aesthetic, full reasoning chain
- `editorial-dashboard-widget.md` — Premium data widget, editorial aesthetic, full reasoning chain
Read these before starting any project to see how aesthetic selection → font → color → component flows in practice.
M7 Updated Acceptance Criteria¶
Add to existing M7 ACs:
- `references/examples/minimal-saas-landing.md` exists with aesthetic selection reasoning and a React component
- `references/examples/editorial-dashboard-widget.md` exists with aesthetic selection reasoning and a React component
- SKILL.md references `references/examples/` with links to both files
NON-BLOCKING: M1 — Input Validation for Negative Token Values¶
Quinn's finding: estimate.py has no handling for --input-tokens -1000. Should reject negative values.
Fix: Add to cost-estimation/scripts/estimate.py spec:
Validation step (insert after arg parsing, before all calculation logic):
    # Input validation (estimate.py must `import sys` at the top)
    for flag, value in [("--input-tokens", args.input_tokens),
                        ("--output-tokens", args.output_tokens),
                        ("--cached-tokens", args.cached_tokens)]:
        if value < 0:
            print(f"ERROR: {flag} cannot be negative. Got: {value}")
            sys.exit(1)
M1 AC addition:
- Running `estimate.py --input-tokens -1000 --output-tokens 1000 --model claude-sonnet-4-6` exits 1 with a clear error message containing "cannot be negative"
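Putting the validation together with the cost formula from the M1 spec, a minimal sketch of the estimation core — registry parsing is omitted, and the per-million rates below are invented placeholders, not real registry values:

```python
def estimate_cost(input_tokens, output_tokens, cached_tokens, prices):
    """Estimated spend in dollars. `prices` maps "input"/"output"/"cached"
    to per-million-token rates (placeholder values in this demo)."""
    for name, value in [("input", input_tokens), ("output", output_tokens),
                        ("cached", cached_tokens)]:
        if value < 0:
            raise ValueError(f"--{name}-tokens cannot be negative. Got: {value}")
    return (input_tokens / 1_000_000 * prices["input"]
            + output_tokens / 1_000_000 * prices["output"]
            + cached_tokens / 1_000_000 * prices["cached"])

# Placeholder rates — real values come from references/MODEL_PRICING_REGISTRY.md
demo_prices = {"input": 3.00, "output": 15.00, "cached": 0.30}
print(f"${estimate_cost(200_000, 50_000, 0, demo_prices):.4f}")
```

The real script wraps this in argparse and exits 1 (rather than raising) on invalid input, per the validation spec above.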
NON-BLOCKING: M2 — scaffold.sh Directory Existence Check¶
Quinn's finding: If ~/projects/vv-analytics already exists, create-next-app will fail or prompt interactively.
Fix: Add to scaffold.sh spec, before step 2 (the create-next-app invocation):
Insert as step 1.5:
1.5. Check if target directory already exists:
    if [ -d "$PARENT_DIR/$PROJECT_NAME" ]; then
      echo "ERROR: Directory $PARENT_DIR/$PROJECT_NAME already exists. Choose a different project name or remove the existing directory."
      exit 1
    fi
NON-BLOCKING: M2 — start-service.sh Complex Command Fix¶
Quinn's finding: Passing $2 (the command) as a bare string to nohup $COMMAND breaks for commands with pipes, redirects, or chained operators.
Fix: Update start-service.sh spec, step 2 launch line:
Original (breaks when the command string contains pipes, redirects, or `&&` chains): `nohup $COMMAND`

Updated (wraps the command string in a shell so operators are interpreted): `nohup bash -c "$COMMAND"`
This ensures complex commands like "cd /app && npm start" or "npm run build && npm start" execute correctly.
NON-BLOCKING: C2 — capture-baseline.sh Playwright Companion File¶
Quinn's finding: A bash script cannot parse a markdown file and generate Playwright navigation commands. The spec needs to clarify the mechanism.
Fix: Update capture-baseline.sh spec behavior:
Replace step 2 in original spec:
~~2. Use Playwright to navigate to each route defined in `references/mc-test-suite.md`~~
Updated step 2:
- Invoke the companion Playwright test file:
`npx playwright test tests/capture-baselines.spec.ts --project=chromium` — this file handles navigation to each route defined in `references/mc-test-suite.md`. If `tests/capture-baselines.spec.ts` does not exist in the target project, exit 1 with: "Playwright baseline spec not found. Create tests/capture-baselines.spec.ts first."
Additional note to C2 spec:
The webapp-testing skill should document that capture-baselines.spec.ts is a required file in the target project (not part of the skill itself). Add to references/playwright-setup.md:
## Baseline Capture Test File
capture-baseline.sh requires `tests/capture-baselines.spec.ts` in the target project.
This file should navigate to each route in mc-test-suite.md and call `page.screenshot()`.
See test-patterns.md for the correct Playwright screenshot API usage.
NON-BLOCKING: C3 — validate-skill.sh Check 9 Narrowed¶
Quinn's finding: Check 9 ("All files referenced in SKILL.md body exist on disk") is too broad. Parsing arbitrary markdown for all file references will produce a fragile parser.
Fix: Narrow check 9 in validate-skill.sh spec:
Original check 9:
- All files referenced in SKILL.md body exist on disk
Updated check 9:
- Check that all relative paths matching `references/` or `scripts/` patterns mentioned in the SKILL.md body exist on disk. Implementation: `grep -oE '(references|scripts)/[a-zA-Z0-9._/-]+' SKILL.md` — for each match, verify the file exists at `$SKILL_DIR/$MATCH`. Print FAIL for any missing file, PASS if all exist or none are referenced.
NON-BLOCKING: C4 — Date-Stamp Requirement for model-routing.md¶
Quinn's finding: The model routing table will go stale. AC9 should require a date-stamp to make staleness detectable.
Fix: Add to agent-dispatch/references/model-routing.md spec:
Required header in model-routing.md:
# Model Routing Table
**Last updated:** YYYY-MM-DD
**Review cadence:** Monthly. Revisit when new model tiers are released or pricing changes significantly.
Updated C4 AC9:
- `references/model-routing.md` covers at least 3 model tiers with task-type recommendations and includes a "Last updated: YYYY-MM-DD" date-stamp in the header
NON-BLOCKING: C5 — --scenario Flag Clarified + Input Validation¶
Quinn's finding (1): When --scenario IS provided, behavior is ambiguous — does it run only that multiplier, or still run all three?
Clarification: When --scenario FLOAT is provided, run only that single multiplier scenario against the base customer counts. When omitted, run all three (0.5× conservative, 1.0× base, 1.5× optimistic).
Updated arr-calc.py spec behavior:
- `--scenario 1.5` → runs only the optimistic scenario (1.5× customer counts)
- `--scenario 0.5` → runs only the conservative scenario (0.5× customer counts)
- `--scenario` omitted → runs all three: conservative (0.5×), base (1.0×), optimistic (1.5×)
Quinn's finding (2): No handling for nonsensical inputs (churn > 50%, negative prices/customer counts).
Add to arr-calc.py spec — Input Validation section:
    # Validation rules (after arg parsing; arr-calc.py must `import sys` at the top):
    if args.churn < 0 or args.churn > 1:
        print("ERROR: --churn must be between 0.0 and 1.0 (e.g., 0.02 for 2% monthly churn)")
        sys.exit(1)
    if args.churn > 0.5:
        print("WARNING: Monthly churn > 50% detected. Double-check inputs — this is economically unusual.")
        # Continue with calculation (warning only, not exit)
    for price in [float(p) for p in args.prices.split(",")]:
        if price < 0:
            print("ERROR: Prices cannot be negative.")
            sys.exit(1)
    for count in [int(c) for c in args.customers.split(",")]:
        if count < 0:
            print("ERROR: Customer counts cannot be negative.")
            sys.exit(1)
Updated C5 AC additions:
- `arr-calc.py --churn 1.5 ...` exits 1 with a clear error about churn range
- `arr-calc.py --churn 0.6 ...` completes with a printed warning about high churn (does not exit 1)
- `arr-calc.py --prices "-100,500,2000" ...` exits 1 with a clear error about negative prices
- `arr-calc.py --scenario 1.5 ...` runs only the 1.5× scenario, not all three
NON-BLOCKING: C5 — APA Example Numbers — Atlas Cross-Reference Required¶
Quinn's finding: APA example files in revenue-modeling need realistic numbers that don't contradict Atlas's market research.
Directive to Melody: Do NOT populate references/examples/apa-arr-model.md or references/examples/apa-market-sizing.md with specific numbers until Atlas's APA market research is available. Instead:
Placeholder approach:
- Create both files with complete structure (all sections, headings, tables)
- Replace all specific dollar figures, customer counts, and market size numbers with [TBD — pending Atlas market research]
- Add a header note to both files: ⚠ Numbers in this file are placeholders. Cross-reference with Atlas's APA market research before treating as planning inputs.
Updated C5 AC13:
- Both APA example files exist with complete structure. All specific market figures are marked `[TBD — pending Atlas market research]` with a header warning. Files are not populated with invented numbers.
Updated Implementation Order (V2)¶
Medium Tier (unchanged order)¶
- M10 — intelligence-suite archive (new, run first — fast, no dependencies)
- M1 — cost-estimation
- M2 — service-management (absorbs project-scaffolding)
- M3 — qa-validation
- M4 — vv-sigint
- M5 — vv-dashboard-design
- M6 — project-pipeline
- M7 — frontend-design
- M8 — memory-manager
- M9 — openclaw-prime
Post-M2 QA gate: Quinn signs off M2 → Jules/Melody deletes project-scaffolding (controlled deletion, not part of build pass).
Complex Tier (C1/C3 order reversed from original)¶
1. C3 — `vv-skill-creator` (builds first; validate-skill.sh required by C1 AC11)
2. C1 — `doc-coauthoring` (AC11 now verifiable post-C3)
3. C2 — `webapp-testing`
4. C4 — `agent-dispatch`
5. C5 — `revenue-modeling`
Post-C3 retroactive check: After C3 passes QA, run validate-skill.sh against C1's SKILL.md as a retroactive AC11 verification. If it fails, Melody patches C1 before proceeding to C2.
Updated Summary Table (V2)¶
| ID | Skill | Type | Complexity | Model | Key Deliverables | V2 Changes |
|---|---|---|---|---|---|---|
| M1 | cost-estimation | Medium | ⭐⭐ | local-ok | estimate.py, references/MODEL_PRICING_REGISTRY.md | Add negative token validation (AC8) |
| M2 | service-management | Medium | ⭐⭐⭐⭐ | cloud | 4 scripts, 3 reference files, absorb project-scaffolding | Remove AC14; post-QA deletion step; scaffold.sh dir check; start-service.sh bash -c fix |
| M3 | qa-validation | Medium | ⭐⭐⭐ | local-ok | validate.sh, references/peekaboo-guide.md, troubleshooting | No changes |
| M4 | vv-sigint | Medium | ⭐⭐ | local-ok | check-sources.sh, examples, troubleshooting | Negative trigger confirmed (no frontmatter change) |
| M5 | vv-dashboard-design | Medium | ⭐⭐ | local-ok | check-tokens.sh, 3 example files | Negative trigger confirmed (no frontmatter change) |
| M6 | project-pipeline | Medium | ⭐ | local-ok | references/evaluation-template.md, example evaluation | No changes |
| M7 | frontend-design | Medium | ⭐⭐ | cloud | references/font-pairings.md, references/color-examples.md | Add references/examples/ with 2 scenario walkthroughs (AC7-9 added) |
| M8 | memory-manager | Medium | ⭐⭐ | local-ok | consolidate-check.py, examples section | Negative trigger confirmed (no frontmatter change) |
| M9 | openclaw-prime | Medium | ⭐ | local-ok | troubleshooting section | No changes |
| M10 | intelligence-suite | Medium | ⭐ | local-ok | Archive to .archived-intelligence-suite/ | NEW — was missing from spec |
| C1 | doc-coauthoring | Complex | ⭐⭐⭐⭐ | cloud | New skill, 3-stage workflow, 4 reference files, 1 script | AC11 demoted to post-C3 retroactive check; builds AFTER C3 |
| C2 | webapp-testing | Complex | ⭐⭐⭐⭐⭐ | cloud | New skill, 3 scripts, 3 reference files, with_server.py | capture-baseline.sh invokes companion Playwright spec file |
| C3 | vv-skill-creator | Complex | ⭐⭐⭐⭐ | cloud | New skill, validate-skill.sh, 4 reference files | RENAMED from skill-creator; builds FIRST in complex tier; validate-skill.sh check 9 narrowed |
| C4 | agent-dispatch | Complex | ⭐⭐⭐ | cloud | New skill, 3 reference files, handoff protocol | model-routing.md requires date-stamp header |
| C5 | revenue-modeling | Complex | ⭐⭐⭐⭐ | cloud | New skill, arr-calc.py, 5 reference files | --scenario flag clarified; churn/price validation added; APA examples use TBD placeholders pending Atlas research |
Total estimated effort: ~35-45 Melody hours (unchanged — M10 adds ~15 min)
V2 Revision complete. All 3 blocking issues resolved. All non-blocking issues addressed. Ready for Quinn delta sign-off, then Jeff approval before Melody execution.
— Forge