# QA Spec Review — Skills Overhaul Implementation Spec
Reviewer: Quinn (Lead QA Engineer, Vivere Vitalis)
Date: 2026-03-20
Spec reviewed: IMPLEMENTATION_SPEC.md (Author: Forge)
Review type: Pre-build spec quality gate
## Executive Summary
The spec is well-structured and covers most ground, but has 3 blocking issues that must be resolved before Melody touches a file:
1. intelligence-suite is entirely absent despite explicit brief callout
2. C3 (skill-creator) conflicts with the existing system-level skill at /opt/homebrew/lib/node_modules/openclaw/
3. C1 AC11 creates a circular dependency with C3
There are also 4 significant (non-blocking) issues and a collection of minor edge case gaps. Overall verdict: FAIL — spec requires revision on blocking items before build begins.
## Item-by-Item Review
### M1 — cost-estimation
Verdict: PASS WITH NOTES
✅ Completeness: All brief items present (move registry, add estimate.py, example output, negative trigger)
✅ Acceptance criteria: All 7 ACs are independently testable
✅ Frontmatter: Description well under 1024 chars, has WHAT + WHEN + negative trigger
✅ Feasibility: Python script spec is realistic and well-scoped
⚠️ Note — Edge case: negative token counts. The script spec has no handling for --input-tokens -1000. Should add input validation that rejects negative values with exit 1 and a clear message.
⚠️ Note — AC1 wording ambiguity. AC1 says "contains the same pricing data as the source file." Melody has no way to verify this programmatically unless she diffs the files. Suggest revising AC1 to: "File exists, is non-empty, and diff with source file shows no content changes."
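A minimal sketch of the suggested validation for estimate.py. This is illustrative, not the spec's implementation: the flag names (`--input-tokens`, `--output-tokens`) are assumed from the note above, and the exit-1-with-message behavior is the recommendation, implemented here with an explicit check rather than `argparse`'s own `error()` (which exits 2).

```python
import argparse
import sys

def parse_args(argv=None):
    # Hypothetical argument parser for estimate.py; flag names are assumptions.
    parser = argparse.ArgumentParser(description="Estimate API call cost")
    parser.add_argument("--input-tokens", type=int, required=True)
    parser.add_argument("--output-tokens", type=int, required=True)
    args = parser.parse_args(argv)
    # Reject negative token counts with exit 1 and a clear message,
    # per the edge-case note above.
    if args.input_tokens < 0 or args.output_tokens < 0:
        print("error: token counts must be non-negative", file=sys.stderr)
        sys.exit(1)
    return args
```
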
### M2 — service-management (absorbs project-scaffolding)
Verdict: PASS WITH NOTES
✅ Completeness: All brief items present (4 scripts, 3 reference files, troubleshooting, examples, merge)
✅ Most ACs are independently testable
✅ Frontmatter: Description under 1024 chars, has WHAT + WHEN + negative trigger
⚠️ Significant — Deletion ordering contradiction. AC14 says Melody should delete project-scaffolding/SKILL.md as part of M2 completion. The Implementation Notes say "Do NOT delete until M2 is verified by Quinn. Confirmation step required." These directly contradict each other. AC14 will read as a green light to delete during build. The spec must clarify: is deletion an AC (Melody does it) or a post-QA step (Jules/Quinn triggers it)? Recommend removing AC14 from the build criteria and adding a separate step: "Quinn confirms M2 pass, then Melody deletes project-scaffolding."
⚠️ Note — scaffold.sh edge case: target directory already exists. If ~/projects/vv-analytics already exists, create-next-app will fail or prompt interactively. The spec should add: "Before running create-next-app, check if the target directory exists. If it does, print an error and exit 1."
⚠️ Note — start-service.sh: complex commands. Passing $2 (the command) as a bare string to nohup $COMMAND will break if the command contains pipes, redirects, or multiple chained operators. Suggest wrapping in bash -c "$COMMAND".
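Hedged sketches of the two script fixes suggested above. Function names, paths, and argument order are illustrative; the real scripts' interfaces are defined by the spec, not here.

```shell
# Hypothetical guard for scaffold.sh: refuse to run create-next-app
# into a directory that already exists.
ensure_target_absent() {
  if [ -d "$1" ]; then
    echo "error: $1 already exists — remove it or choose another name" >&2
    return 1
  fi
}

# Hypothetical fix for start-service.sh: hand the whole command string to a
# subshell via `bash -c` so pipes, redirects, and chained operators survive
# instead of being word-split as arguments to nohup.
start_service() {
  local name="$1" cmd="$2"
  nohup bash -c "$cmd" > "/tmp/${name}.log" 2>&1 &
  echo "$!" > "/tmp/${name}.pid"
}
```

With the `bash -c` wrapper, a command like `'npm run build && npm start'` runs as written; passed bare to `nohup`, the `&&` would be interpreted by the calling shell instead.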
### M3 — qa-validation
Verdict: PASS WITH NOTES
✅ Completeness: All brief items present (validate.sh, peekaboo-guide.md, troubleshooting, negative trigger already exists)
✅ ACs are testable
✅ Frontmatter: Spec correctly notes existing description is unchanged and already compliant
⚠️ Note — validate.sh test runner assumption. Step 3 runs npm test -- --watchAll=false (a Jest-specific flag). If a VV project uses Vitest (which is common with newer Next.js setups), this flag will be unknown and the command will fail. The script spec should add: "If npm test -- --watchAll=false fails with 'unknown option', retry with npm test -- --run."
⚠️ Note — AC2 requires specific preconditions. "Running validate.sh against a project with passing build, passing tests, and live health endpoint exits 0" — Melody can't test this AC in isolation without a real VV project running. Suggest adding: "Use Mission Control as the reference project for positive-path AC testing."
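A sketch of the suggested Jest-to-Vitest fallback. The runner command is parameterized here purely so the retry logic is testable in isolation; validate.sh itself would hard-code `npm test --`.

```shell
# Hypothetical fallback: try Jest's --watchAll=false first; if the runner
# rejects it as an unknown option (Vitest), retry with --run.
run_tests() {
  local runner="${1:-npm test --}"   # real script would just use: npm test --
  local out
  if out=$($runner --watchAll=false 2>&1); then
    printf '%s\n' "$out"
  elif printf '%s' "$out" | grep -qi "unknown option"; then
    $runner --run                    # Vitest's non-watch equivalent
  else
    printf '%s\n' "$out" >&2         # a genuine test failure: surface it
    return 1
  fi
}
```
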
### M4 — vv-sigint
Verdict: PASS WITH NOTES
✅ Completeness: All brief items present (fix cron, add examples, troubleshooting, check-sources.sh)
✅ ACs are independently testable
✅ Frontmatter: Spec claims no change needed
⚠️ Note — Frontmatter compliance not verified. The spec says "Frontmatter: no change (already good)" but doesn't show the existing vv-sigint description for inspection. The brief requires ALL skills to have negative triggers. Melody should verify the existing description has one before skipping frontmatter updates. If it doesn't, a frontmatter update is required.
⚠️ Note — check-sources.sh: missing sources.md guard. If references/sources.md doesn't exist (fresh install), the script will error silently or behave unexpectedly. Script should check for file existence and exit 1 with a clear message if not found.
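A minimal sketch of the missing guard, assuming check-sources.sh takes (or defaults to) a path to references/sources.md; the function name and message wording are illustrative.

```shell
# Hypothetical guard: fail fast with a clear message when sources.md
# is absent (e.g. on a fresh install), instead of erroring later.
require_sources_file() {
  local sources="${1:-references/sources.md}"
  if [ ! -f "$sources" ]; then
    echo "error: $sources not found — is this a fresh install?" >&2
    return 1
  fi
}
```
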
### M5 — vv-dashboard-design
Verdict: PASS WITH NOTES
✅ Completeness: All brief items present (example references, check-tokens.sh)
✅ ACs are testable
⚠️ Missing brief item — negative trigger not addressed. The brief explicitly says to add a negative trigger for vv-dashboard-design ("MC only, use frontend-design for general web"). The spec contains no frontmatter update for M5. Melody needs to check whether the existing description has a negative trigger; if not, this is a build requirement that's unspecified.
⚠️ Note — check-tokens.sh grep robustness. The spec's exclusion logic ("exclude comments, CSS variable definitions, tokens file itself") is described in prose but not implemented in the grep command shown. A naïve grep -rn "#[0-9a-fA-F]\{3,6\}" will generate false positives in comment blocks, hex color names in documentation, and the token definition file itself. Recommend the spec provide the actual grep pipeline (e.g., grep -rn --include="*.tsx" --include="*.ts" "#[0-9a-fA-F]" | grep -v "var(--" | grep -v "//.*#") rather than leaving exclusion logic to Melody's interpretation.
⚠️ Note — check-tokens.sh directory existence. If $PROJECT_ROOT/app or $PROJECT_ROOT/components don't exist, grep will emit an error to stderr. Script should check directories exist before running.
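One possible shape for the check-tokens.sh pipeline, addressing both notes above (exclusion filters made explicit, missing-directory guard). The directory layout (`app/`, `components/`) and the comment filter are assumptions; the comment filter in particular is crude string matching, not comment-aware parsing, and the spec should still spell out the exclusions it actually wants.

```shell
# Hypothetical check: flag raw hex colors in TS/TSX source, skipping
# missing directories, CSS variable usages, and line-comment matches.
check_hex_tokens() {
  local root="$1" dir hits=0
  for dir in "$root/app" "$root/components"; do
    [ -d "$dir" ] || continue            # guard: skip absent directories
    if grep -rn --include="*.ts" --include="*.tsx" "#[0-9a-fA-F]\{3,\}" "$dir" \
        | grep -v "var(--" \
        | grep -v ":[[:space:]]*//"; then  # drop lines that are line comments
      hits=1
    fi
  done
  return $hits                           # nonzero = raw hex found
}
```
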
### M6 — project-pipeline
Verdict: PASS
✅ Completeness: All brief items present (evaluation template moved, completed example, path updated, negative trigger)
✅ All 5 ACs are independently testable
✅ Feasibility: Simple file moves and SKILL.md edits — no risks
✅ AC5 is specific about what the example must demonstrate
No issues.
### M7 — frontend-design
Verdict: PASS WITH NOTES
✅ Frontmatter: Spec adds Starting Resources section. Existing description appears to already have negative trigger (spec implies it).
✅ ACs are testable with specific count requirements
⚠️ Missing brief item — example scenarios not addressed. The brief says "Add 2-3 example scenarios" to frontend-design. The spec adds reference files (font-pairings.md, color-examples.md) but no scenario walkthroughs as described in the brief's universal gaps section. These are different things: reference files are reference material; scenario walkthroughs show "given this request, here's good vs bad output." This is a gap. Recommend adding an Examples section or references/examples/ to M7 with at least 2 scenario walkthroughs.
### M8 — memory-manager
Verdict: PASS WITH NOTES
✅ Completeness: Core brief items present (good/bad examples, consolidate-check.py)
✅ ACs are testable
✅ consolidate-check.py is appropriately scoped as advisory
⚠️ Missing brief item — negative trigger not specified. The brief says ALL skills need negative triggers. The spec has no frontmatter update for M8. The current description should be verified — if it lacks a negative trigger, M8 must add one.
⚠️ Feasibility concern — contradiction detection. The spec describes "naive contradiction detection: find lines where the same subject appears with conflicting modifiers." This is significantly underspecified. Implementing a subject-modifier scanner in Python that doesn't generate massive false positives is non-trivial. The spec should either: (a) lower expectations ("scan for duplicate lines containing the same key phrase") or (b) provide a clearer algorithm. As written, Melody will produce something that barely works or floods output. Suggest limiting scope to: "flag any line in MEMORY.md that contains a key phrase already appearing in another line" with a configurable phrase list.
⚠️ Note — consolidate-check.py: MEMORY.md missing. If MEMORY.md doesn't exist, the script should handle gracefully and print "MEMORY.md not found at [path]" rather than crashing.
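A sketch of the narrowed scope proposed above, combined with the graceful missing-file handling: flag any line whose key phrase already appeared on an earlier line. The phrase list is purely illustrative; the real list would be configurable per the recommendation.

```python
from pathlib import Path

# Illustrative phrase list — the recommendation is a configurable list.
KEY_PHRASES = ["prefers", "always", "never", "port"]

def consolidate_check(memory_path):
    """Return (phrase, first_line, duplicate_line) tuples for repeated phrases."""
    path = Path(memory_path)
    if not path.is_file():
        # Graceful handling per the note above, instead of crashing.
        print(f"MEMORY.md not found at {path}")
        return []
    seen = {}    # phrase -> line number of first occurrence
    flags = []
    for lineno, line in enumerate(path.read_text().splitlines(), 1):
        lowered = line.lower()
        for phrase in KEY_PHRASES:
            if phrase in lowered:
                if phrase in seen:
                    flags.append((phrase, seen[phrase], lineno))
                else:
                    seen[phrase] = lineno
    return flags
```

This stays advisory: it surfaces candidate lines for a human to reconcile rather than attempting real contradiction detection.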
### M9 — openclaw-prime
Verdict: PASS
✅ Completeness: Brief items present (troubleshooting, negative trigger already exists per spec)
✅ All 5 ACs are independently testable and specific
✅ AC4 correctly calls out cross-referencing node-connect skill
✅ Frontmatter: Spec correctly notes it's already compliant
No issues.
### C1 — doc-coauthoring (New Skill)
Verdict: FAIL
✅ Completeness: Covers all brief items
✅ Frontmatter: Description under 1024 chars, has WHAT + WHEN + negative trigger
✅ Most ACs are testable
🚫 Blocking — AC11 circular dependency. AC11 states: "Running the skill-creator skill against this SKILL.md produces no 'missing required field' warnings." The Implementation Notes specify building C1 first, then C3 (skill-creator). AC11 cannot be verified at C1 completion time because skill-creator and its validate-skill.sh won't exist yet. Either:
- Remove AC11 from C1 and add it as a post-C3 retroactive check, OR
- Reorder so C3 is built before C1 (or, at minimum, extract validate-skill.sh to a standalone script first)
⚠️ Note — AC4 verifiability. "Stage 1 includes at least 7 context questions" — Melody should count questions explicitly in the SKILL.md body. Recommend adding a numbered list format requirement to make this easier to audit.
### C2 — webapp-testing (New Skill)
Verdict: PASS WITH NOTES
✅ Completeness: All brief items present (with_server.py, run-tests.sh, capture-baseline.sh, references)
✅ Frontmatter: Under 1024 chars, WHAT + WHEN + negative trigger ✓
✅ AC3–5 test the critical server lifecycle behavior correctly
⚠️ Note — AC7 requires running service. AC7: "capture-baseline.sh creates at least one file in tests/baselines/" — this requires a live MC instance on port 3100. The AC should specify: "Run against a running Mission Control instance. If MC is down, note this dependency explicitly."
⚠️ Note — capture-baseline.sh implementation gap. The script is described as a bash script that "uses Playwright to navigate to each route defined in mc-test-suite.md." A bash script cannot parse a markdown file and generate Playwright navigation commands directly. The spec needs to clarify: does this script call a pre-written Playwright test file (which itself reads the test suite), or does it use the Playwright CLI screenshot command? As written, Melody will have to make this up. Recommend specifying that capture-baseline.sh invokes a companion tests/capture-baselines.spec.ts file that handles the actual navigation.
⚠️ Note — mc-test-suite.md route staleness. AC10 requires coverage of "all current MC routes." The spec lists them inline. As MC grows, this list will go stale. Add a note that mc-test-suite.md is a living document requiring update when MC routes change.
### C3 — skill-creator (New Skill)
Verdict: FAIL
✅ Completeness: All brief items present
✅ Frontmatter: Under 1024 chars, WHAT + WHEN + negative trigger ✓
✅ ACs are specific and mostly testable
🚫 Blocking — conflicts with existing system-level skill. There is already a skill-creator skill at /opt/homebrew/lib/node_modules/openclaw/skills/skill-creator/SKILL.md (listed in available_skills). The spec creates a new skill-creator at ~/.openclaw/workspace/skills/skill-creator/. These will conflict. OpenClaw's skill resolution order determines which one wins, but this isn't defined in the spec. Before building C3, Jules/Forge must decide: (a) replace the system-level skill, (b) extend it, or (c) use a different name for the VV-specific variant (e.g., vv-skill-creator). This is a design decision that blocks implementation.
⚠️ Note — validate-skill.sh check 9 complexity. Check 9: "All files referenced in SKILL.md body exist on disk." Parsing a markdown file to extract all file references is non-trivial — SKILL.md files contain code blocks, inline paths, and references in varying formats. The spec should narrow this: "Check that all relative paths matching references/ or scripts/ patterns mentioned in SKILL.md body exist on disk." Without narrowing, Melody will either produce a fragile parser or skip this check entirely.
⚠️ Note — AC5 requires "known-bad skill." AC5: "Review mode produces PASS/WARN/FAIL report testable by running review on a known-bad skill." The AC should specify which skill to use as the known-bad test case, or instruct Melody to create a synthetic test fixture in references/test-fixtures/.
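To make the narrowed check 9 concrete, here is one possible shape, assuming the skill layout the spec already uses (SKILL.md at the skill root, assets under references/ and scripts/). The extraction regex is an assumption and deliberately only matches those two prefixes.

```shell
# Hypothetical narrowed check 9: extract only references/... and scripts/...
# paths from SKILL.md and verify each exists on disk.
check_referenced_paths() {
  local skill_dir="$1" missing=0 p
  for p in $(grep -oE '(references|scripts)/[A-Za-z0-9._/-]+' "$skill_dir/SKILL.md" | sort -u); do
    if [ ! -e "$skill_dir/$p" ]; then
      echo "FAIL: $p referenced in SKILL.md but not found on disk"
      missing=1
    fi
  done
  return $missing
}
```
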
### C4 — agent-dispatch (New Skill)
Verdict: PASS WITH NOTES
✅ Completeness: Covers all brief items (agent selection, model routing, handoff protocol)
✅ Frontmatter: Under 1024 chars, WHAT + WHEN + two negative triggers ✓
✅ ACs are specific and testable
✅ Well-scoped: knowledge consolidation from MEMORY.md prose is a clear value-add
⚠️ Note — model routing table will go stale quickly. The spec notes "Update cadence: revisit monthly." This is in references/model-routing.md, not the SKILL.md body, which is appropriate. But AC9 should require a date-stamp in model-routing.md ("Last updated: YYYY-MM-DD") so staleness is detectable.
⚠️ Note — AC5 verifiability. "All 6 required fields" — the 6 fields are listed in the SKILL.md body outline but not labeled 1-6 explicitly. Recommend requiring a numbered list in the spec to make AC5 auditable.
### C5 — revenue-modeling (New Skill)
Verdict: PASS WITH NOTES
✅ Completeness: All brief items present (ARR, pricing, market sizing, scripts, examples)
✅ Frontmatter: Under 1024 chars, WHAT + WHEN + two negative triggers ✓
✅ ACs are specific and cover all major behaviors including error cases
⚠️ Note — arr-calc.py --scenario flag ambiguity. The spec says --scenario FLOAT is a "multiplier for optimistic/conservative (e.g. 1.5 for optimistic, 0.5 for conservative)." It's unclear whether passing --scenario 1.5 runs only the optimistic scenario or still runs all three with 1.5 as the ceiling. AC9 confirms all three run when omitted. But the behavior when --scenario IS provided is unspecified. Clarify: when --scenario is provided, run only that multiplier scenario against base; when omitted, run all three (0.5, 1.0, 1.5).
⚠️ Note — arr-calc.py: missing input validation for nonsensical values. No mention of handling --churn 1.5 (150% monthly churn — mathematically valid but economically nonsensical) or negative prices. Add: "If churn > 0.5 (50%), print a warning: 'Monthly churn > 50% — double-check inputs.' If prices or customer counts are negative, exit 1."
⚠️ Note — APA example files. The spec calls for "real-style inputs" in the APA examples. Forge should review these before Melody writes them — fictional-but-realistic numbers for APA need to align with Atlas's market research, not contradict it. Flag for Jules to confirm APA numbers with Atlas before Melody populates the examples.
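A sketch of the arr-calc.py behavior as the two notes above propose it: input validation plus the clarified --scenario semantics (single multiplier when provided, all three defaults when omitted). The ARR formula here is deliberately simplified and illustrative; the spec's actual model governs.

```python
import sys

def run_scenarios(price, customers, churn, scenario=None):
    """Hypothetical sketch: returns {multiplier: ARR} per the clarified semantics."""
    # Validation per the note above: hard-fail on negative inputs,
    # warn on economically nonsensical churn.
    if price < 0 or customers < 0:
        print("error: prices and customer counts must be non-negative", file=sys.stderr)
        sys.exit(1)
    if churn > 0.5:
        print("warning: monthly churn > 50% — double-check inputs", file=sys.stderr)
    # --scenario provided: run only that multiplier; omitted: run all three.
    multipliers = [scenario] if scenario is not None else [0.5, 1.0, 1.5]
    results = {}
    for m in multipliers:
        # Simplified illustrative ARR: 12 months of MRR on a
        # churn-adjusted, scenario-scaled customer base.
        retained = customers * (1 - churn) * m
        results[m] = round(price * retained * 12, 2)
    return results
```
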
## Cross-Cutting Issues
### Issue 1 — intelligence-suite COMPLETELY MISSING (BLOCKING)
The brief explicitly calls out: "intelligence-suite — Decision: Archive or adapt. It's Makima's skill, not ours. vv-sigint is our intel skill. If keeping: strip Makima references, align with our agent team."
There is no M10 or any entry in the spec addressing intelligence-suite. This must be resolved before the spec is finalized. Options for Forge:
- Add M10: Archive intelligence-suite (move to a deprecated/ folder, update available_skills references)
- Add M10: Adapt intelligence-suite (strip Makima references, realign with VV agents)
- Add a note explaining it was intentionally deferred, with a reason
Leaving it unaddressed means the initiative is incomplete per the brief.
### Issue 2 — Universal "negative triggers on ALL skills" partially unverified
The brief states: "Add negative triggers to every skill description." The spec handles this for skills with explicit frontmatter sections, but for M4 (vv-sigint), M5 (vv-dashboard-design), and M8 (memory-manager), the spec says "no change" or doesn't address frontmatter. These three skills must be verified to already have negative triggers before the spec can claim compliance. Recommend Forge add an explicit verification step: "Confirmed [skill] already has negative trigger: [quote the trigger]" for every case where frontmatter is left unchanged.
### Issue 3 — C1 build order must change
As noted in C1 and C3, AC11 of C1 depends on skill-creator's validate-skill.sh existing. The correct order is:
1. C3 (or, at minimum, extract validate-skill.sh first)
2. C1 (AC11 can now be verified)
3. C2, C4, C5
The spec's current order (C1 → C3 → C2 → C4 → C5) needs revision.
## Ordering Risk Summary
| Risk | Severity | Notes |
|---|---|---|
| C1 AC11 can't be verified before C3 exists | HIGH | Spec order must change |
| M2 deletion instruction contradicts AC14 | MEDIUM | Choose: Melody deletes OR Quinn triggers deletion |
| C3 conflicts with system skill-creator | BLOCKING | Requires design decision before build |
| intelligence-suite not in spec | BLOCKING | Missing initiative item |
| M2 must complete before other scripts reference start-service.sh | LOW | Spec already notes this ✓ |
| C2 capture-baseline.sh requires MC running | LOW | Note as environmental prerequisite in AC7 |
## Verdict by Item
| ID | Skill | Verdict | Key Issues |
|---|---|---|---|
| M1 | cost-estimation | PASS WITH NOTES | None |
| M2 | service-management | PASS WITH NOTES | Deletion ordering contradiction |
| M3 | qa-validation | PASS WITH NOTES | None |
| M4 | vv-sigint | PASS WITH NOTES | Frontmatter compliance unverified |
| M5 | vv-dashboard-design | PASS WITH NOTES | Negative trigger not addressed |
| M6 | project-pipeline | PASS | None |
| M7 | frontend-design | PASS WITH NOTES | Brief's example scenarios not in spec |
| M8 | memory-manager | PASS WITH NOTES | Negative trigger not addressed |
| M9 | openclaw-prime | PASS | None |
| C1 | doc-coauthoring | FAIL | AC11 circular dependency with C3 |
| C2 | webapp-testing | PASS WITH NOTES | capture-baseline.sh underspecified |
| C3 | skill-creator | FAIL | Conflicts with system-level skill-creator |
| C4 | agent-dispatch | PASS WITH NOTES | None |
| C5 | revenue-modeling | PASS WITH NOTES | --scenario flag ambiguous |
intelligence-suite: NOT IN SPEC — BLOCKING
## Overall Verdict: FAIL
3 blocking issues must be resolved before Melody begins:
1. intelligence-suite is missing — Forge must add a disposition for this skill (archive, adapt, or explicitly defer with justification). The brief called it out; the spec must address it.
2. skill-creator (C3) conflicts with the existing system-level skill — Forge must decide whether C3 replaces, extends, or renames around the existing /opt/homebrew/lib/node_modules/openclaw/skills/skill-creator/. This is a design decision, not a build decision.
3. C1 AC11 circular dependency — the build order must be revised. C3 (or its validate-skill.sh) must be built before C1's acceptance criteria can be fully verified.
Once blocking issues are resolved, the M-tier work (M1-M9) can begin immediately — those items are PASS or PASS WITH NOTES and the notes are non-blocking clarifications. The PASS WITH NOTES items should be addressed in a Forge revision to the spec before Melody touches those items, but they don't gate the start of work.
Recommend: Forge addresses blocking items, appends a V2 revision section to the spec, and submits back to Quinn for sign-off on the delta only (not a full re-review).
Review complete. Ready for Jules strategic review and Jeff approval before execution.
Quinn
Lead QA Engineer, Vivere Vitalis