FORGE_SPEC.md: APA Production Readiness Shakedown
Author: Forge (Director of Product Architecture)
Date: 2026-03-20
Spec Type: Architecture Spec (3-5 pages)
Status: READY FOR EXECUTION
1. Objective
Build a functional APA Core API prototype — a runnable Express/TypeScript/Prisma application covering athletes, teams, sessions, metric ingestion, GAP score calculation, JWT auth with RBAC, and a Garmin webhook integration — to validate whether qwen3-coder:30b and our Forge → Melody → Quinn pipeline can execute APA development sprints at acceptable quality, speed, and cost.
This is a shakedown cruise, not a production deployment. Real logic, real patterns, real constraints — no toy code.
2. Scope Boundaries
IN SCOPE
- Prisma schema with full APA entity set (User, RefreshToken, Coach, Team, Athlete, Session, MetricReading, GapScore)
- TypeScript type layer mirroring schema + API-specific request/response types
- Express app with request logging, error handling, and health check
- Athlete and Team CRUD with Zod validation
- GAP Score calculation service with normalization, trend detection, and missing-data handling
- Session management and MetricReading ingestion endpoints
- Athlete timeline/history aggregation (last N readings per metric type)
- JWT authentication: register, login, token refresh, logout
- Three roles: ADMIN, COACH, ATHLETE
- Row-level security: coach → team athletes only; athlete → self only
- Garmin Connect mock webhook receiver
- External data transformation pipeline (Garmin format → internal MetricReading)
- Rate limiting middleware (express-rate-limit)
- Malformed external data error recovery (schema validation + graceful degradation)
- Unit tests for GAP score service (Vitest)
- Layer 5 adversarial tasks: bug hunting, refactoring, ambiguous requirement, multi-file feature
OUT OF SCOPE
- Production database (use SQLite for prototype, not Postgres)
- Real Garmin OAuth / live API credentials
- Frontend / dashboard
- Push notifications
- Historical data import / migration tooling
- Multi-tenancy beyond coach → team scoping
- Caching layer (Redis, etc.)
- Deployment / containerization
- Email / SMS delivery
- Any UI whatsoever
3. Data Models / API Contracts
3.1 Prisma Schema
File: prototype/prisma/schema.prisma
generator client {
provider = "prisma-client-js"
}
datasource db {
provider = "sqlite"
url = env("DATABASE_URL")
}
enum Role {
ADMIN
COACH
ATHLETE
}
enum SessionType {
TRAINING
RECOVERY
COMPETITION
REST
}
enum DataSource {
MANUAL
GARMIN
APPLE_HEALTH
API
}
enum MetricType {
HRV
RESTING_HR
SLEEP_DURATION
SLEEP_QUALITY
TRAINING_LOAD
MOOD_SCORE
}
model User {
id String @id @default(cuid())
email String @unique
passwordHash String
role Role @default(ATHLETE)
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
athlete Athlete?
coach Coach?
refreshTokens RefreshToken[]
}
model RefreshToken {
id String @id @default(cuid())
token String @unique
userId String
user User @relation(fields: [userId], references: [id], onDelete: Cascade)
expiresAt DateTime
createdAt DateTime @default(now())
}
model Coach {
id String @id @default(cuid())
userId String @unique
user User @relation(fields: [userId], references: [id], onDelete: Cascade)
name String
teams Team[]
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
}
model Team {
id String @id @default(cuid())
name String
coachId String
coach Coach @relation(fields: [coachId], references: [id])
athletes Athlete[]
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
}
model Athlete {
id String @id @default(cuid())
userId String @unique
user User @relation(fields: [userId], references: [id], onDelete: Cascade)
name String
teamId String?
team Team? @relation(fields: [teamId], references: [id])
dateOfBirth DateTime?
sessions Session[]
metricReadings MetricReading[]
gapScores GapScore[]
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
}
model Session {
id String @id @default(cuid())
athleteId String
athlete Athlete @relation(fields: [athleteId], references: [id], onDelete: Cascade)
sessionType SessionType
startTime DateTime
endTime DateTime?
notes String?
source DataSource @default(MANUAL)
metricReadings MetricReading[]
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
}
model MetricReading {
id String @id @default(cuid())
athleteId String
athlete Athlete @relation(fields: [athleteId], references: [id], onDelete: Cascade)
sessionId String?
session Session? @relation(fields: [sessionId], references: [id])
metricType MetricType
value Float
unit String
recordedAt DateTime
source DataSource @default(MANUAL)
isStale Boolean @default(false)
createdAt DateTime @default(now())
}
model GapScore {
id String @id @default(cuid())
athleteId String
athlete Athlete @relation(fields: [athleteId], references: [id], onDelete: Cascade)
score Float
trend Float
components String // JSON string: { hrv, sleep, trainingLoad, mood, restingHr }
hasStaleData Boolean @default(false)
calculatedAt DateTime
createdAt DateTime @default(now())
}
3.2 TypeScript Types
File: prototype/src/types/index.ts
All Prisma model types are re-exported from @prisma/client. This file defines additional API-specific types.
// Re-export Prisma enums for use in application code
export { Role, SessionType, DataSource, MetricType } from '@prisma/client'
// GAP Score component weights (must sum to 1.0)
export const GAP_WEIGHTS = {
HRV: 0.30,
SLEEP: 0.20, // composite of SLEEP_DURATION + SLEEP_QUALITY
TRAINING_LOAD: 0.25,
MOOD_SCORE: 0.15,
RESTING_HR: 0.10,
} as const
// Normalization ranges per metric (raw value → 0-100 normalized)
export interface MetricNormalizationRange {
metricType: string
rawMin: number
rawMax: number
invertedScale: boolean // true = lower raw value → higher normalized score (e.g., resting HR)
}
export const NORMALIZATION_RANGES: Record<string, MetricNormalizationRange> = {
HRV: { metricType: 'HRV', rawMin: 20, rawMax: 100, invertedScale: false },
RESTING_HR: { metricType: 'RESTING_HR', rawMin: 40, rawMax: 100, invertedScale: true },
SLEEP_DURATION: { metricType: 'SLEEP_DURATION', rawMin: 3, rawMax: 10, invertedScale: false },
SLEEP_QUALITY: { metricType: 'SLEEP_QUALITY', rawMin: 1, rawMax: 10, invertedScale: false },
TRAINING_LOAD: { metricType: 'TRAINING_LOAD', rawMin: 0, rawMax: 500, invertedScale: false },
MOOD_SCORE: { metricType: 'MOOD_SCORE', rawMin: 1, rawMax: 10, invertedScale: false },
}
// GAP Score computation input and output
export interface GapScoreInput {
athleteId: string
readings: {
metricType: string
value: number
recordedAt: Date
}[]
}
export interface GapScoreComponents {
hrv: number | null // normalized 0-100 or null if missing
sleep: number | null // composite of duration + quality, or null
trainingLoad: number | null
mood: number | null
restingHr: number | null
}
export interface GapScoreResult {
score: number // final weighted composite 0-100
trend: number // positive = improving, negative = declining
components: GapScoreComponents
hasStaleData: boolean // true if any component used data >24h old
missingComponents: string[]
}
// Trend detection
export interface TrendWindow {
recent: number[] // last 7 days of daily GAP scores
baseline: number[] // days 8-28 of daily GAP scores
delta: number // recent mean minus baseline mean
}
// API Request/Response types
export interface RegisterRequest {
email: string
password: string
role: 'ADMIN' | 'COACH' | 'ATHLETE'
name: string
}
export interface LoginRequest {
email: string
password: string
}
export interface AuthResponse {
accessToken: string
refreshToken: string
user: {
id: string
email: string
role: string
}
}
export interface CreateAthleteRequest {
name: string
email: string // must match a User with role ATHLETE
teamId?: string
dateOfBirth?: string // ISO date string
}
export interface UpdateAthleteRequest {
name?: string
teamId?: string | null
dateOfBirth?: string
}
export interface CreateTeamRequest {
name: string
coachId: string
}
export interface UpdateTeamRequest {
name?: string
}
export interface CreateSessionRequest {
athleteId: string
sessionType: 'TRAINING' | 'RECOVERY' | 'COMPETITION' | 'REST'
startTime: string // ISO datetime
endTime?: string // ISO datetime
notes?: string
source?: 'MANUAL' | 'GARMIN' | 'APPLE_HEALTH' | 'API'
}
export interface IngestMetricRequest {
athleteId: string
metricType: 'HRV' | 'RESTING_HR' | 'SLEEP_DURATION' | 'SLEEP_QUALITY' | 'TRAINING_LOAD' | 'MOOD_SCORE'
value: number
unit: string
recordedAt: string // ISO datetime
sessionId?: string
source?: 'MANUAL' | 'GARMIN' | 'APPLE_HEALTH' | 'API'
}
export interface AthleteTimelineRequest {
athleteId: string
metricTypes?: string[] // filter by type; omit = all
limit?: number // readings per metric type; default 30
fromDate?: string // ISO datetime
toDate?: string // ISO datetime
}
export interface AthleteTimelineResponse {
athleteId: string
metrics: {
metricType: string
readings: {
value: number
unit: string
recordedAt: string
source: string
isStale: boolean
}[]
}[]
latestGapScore: GapScoreResult | null
}
// Garmin webhook types
export interface GarminWebhookPayload {
userId: string // Garmin user ID (mapped to athleteId externally)
summaries: GarminDailySummary[]
}
export interface GarminDailySummary {
summaryId: string
startTimeInSeconds: number // Unix epoch
durationInSeconds: number
hrvValue?: number // ms
restingHeartRateInBeatsPerMinute?: number
sleepDurationInSeconds?: number
sleepScoreTotal?: number // 0-100
trainingLoadBalance?: {
currentTrainingLoad?: number
}
stressLevel?: number // 0-100
}
// Express augmentation for auth context
export interface AuthenticatedRequest extends Express.Request {
user: {
id: string
email: string
role: 'ADMIN' | 'COACH' | 'ATHLETE'
athleteId?: string
coachId?: string
}
}
// Standard API error response
export interface ApiError {
error: string
message: string
statusCode: number
details?: unknown
}
// Standard paginated response wrapper
export interface PaginatedResponse<T> {
data: T[]
total: number
page: number
pageSize: number
}
3.3 API Endpoint Contracts
All endpoints return Content-Type: application/json. All errors use ApiError shape.
| Method | Path | Auth Required | Roles | Description |
|---|---|---|---|---|
| GET | /health | No | — | Health check |
| POST | /auth/register | No | — | Create user + athlete/coach record |
| POST | /auth/login | No | — | Returns access + refresh tokens |
| POST | /auth/refresh | No | — | Exchange refresh token for new access token |
| POST | /auth/logout | Yes | All | Invalidate refresh token |
| GET | /athletes | Yes | ADMIN, COACH | List athletes (coach: team-scoped) |
| POST | /athletes | Yes | ADMIN, COACH | Create athlete |
| GET | /athletes/:id | Yes | ADMIN, COACH, ATHLETE | Get athlete (RLS applies) |
| PATCH | /athletes/:id | Yes | ADMIN, COACH | Update athlete |
| DELETE | /athletes/:id | Yes | ADMIN | Delete athlete |
| GET | /teams | Yes | ADMIN, COACH | List teams (coach: own teams only) |
| POST | /teams | Yes | ADMIN, COACH | Create team |
| GET | /teams/:id | Yes | ADMIN, COACH | Get team |
| PATCH | /teams/:id | Yes | ADMIN, COACH | Update team |
| DELETE | /teams/:id | Yes | ADMIN | Delete team |
| GET | /sessions | Yes | ADMIN, COACH, ATHLETE | List sessions (RLS applies) |
| POST | /sessions | Yes | ADMIN, COACH, ATHLETE | Create session |
| GET | /sessions/:id | Yes | All | Get session (RLS applies) |
| POST | /metrics | Yes | ADMIN, COACH, ATHLETE | Ingest single metric reading |
| POST | /metrics/bulk | Yes | ADMIN, COACH | Bulk ingest metric readings |
| GET | /athletes/:id/timeline | Yes | ADMIN, COACH, ATHLETE | Athlete timeline (RLS applies) |
| GET | /athletes/:id/gap-score | Yes | ADMIN, COACH, ATHLETE | Latest GAP score (RLS applies) |
| POST | /athletes/:id/gap-score/calculate | Yes | ADMIN, COACH | Trigger GAP score recalculation |
| POST | /webhooks/garmin | No* | — | Garmin webhook receiver (*HMAC signature validated) |
4. Architecture Decisions
ADR-001: SQLite for prototype database
Decision: Use SQLite via Prisma, not PostgreSQL.
Rationale: Zero infrastructure setup. This is a capability test, not a deployment test. Prisma abstracts the difference, so swapping to Postgres in production is a config change, not a code change.
Tradeoff: SQLite has no native enum type (Prisma handles this via string fields + CHECK constraints). Some advanced Postgres features (JSONB, row-level security at DB layer) are unavailable, so RLS is implemented in application middleware instead.
ADR-002: Application-layer RLS, not database-layer
Decision: Implement row-level security in Express middleware, not Postgres RLS policies.
Rationale: SQLite doesn't support DB-level RLS. Also, explicit middleware RLS is more testable and more legible than DB policies for a team of this size. The middleware injects scoping filters into all Prisma queries.
Tradeoff: Application-layer RLS can be bypassed by internal service calls that skip middleware. Document this clearly. For Layer 3 implementation, every route that touches athlete/session/metric data MUST pass through the RLS middleware.
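The scoping rule (admin sees everything, coach sees team athletes, athlete sees self) can be sketched as a pure function that turns the auth context into a query filter. This is only an illustration: `athleteScopeFor`, `AuthUser`, and `AthleteScope` are hypothetical names, and the returned object only loosely follows the shape of a Prisma `where` input for Athlete.

```typescript
// Hypothetical helper: derives an Athlete query filter from the auth context.
type Role = "ADMIN" | "COACH" | "ATHLETE";

interface AuthUser {
  id: string;
  role: Role;
  athleteId?: string;
  coachId?: string;
}

// Loosely modeled on a Prisma `where` input; not generated Prisma types.
type AthleteScope =
  | Record<string, never>          // ADMIN: no restriction
  | { team: { coachId: string } }  // COACH: athletes on the coach's teams
  | { id: string };                // ATHLETE: self only

function athleteScopeFor(user: AuthUser): AthleteScope {
  switch (user.role) {
    case "ADMIN":
      return {};
    case "COACH":
      if (!user.coachId) throw new Error("COACH user has no coachId");
      return { team: { coachId: user.coachId } };
    case "ATHLETE":
      if (!user.athleteId) throw new Error("ATHLETE user has no athleteId");
      return { id: user.athleteId };
  }
}
```

A route handler could merge this filter into its own query conditions, which keeps the scoping decision in one testable place.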
ADR-003: JWT with short-lived access tokens + refresh tokens
Decision: Access token TTL = 15 minutes. Refresh token TTL = 7 days. Refresh tokens stored in DB (RefreshToken table) for revocation support.
Rationale: Short-lived access tokens limit blast radius of token theft. Stored refresh tokens allow logout-everywhere functionality. This is the standard pattern for mobile + web APIs.
Tradeoff: More complex than single long-lived token. Justified for a system that will handle athlete health data.
ADR-004: GAP Score as computed + cached, not derived on every read
Decision: GAP scores are calculated on-demand (POST /athletes/:id/gap-score/calculate) and stored in the GapScore table. The GET endpoint reads the latest stored score.
Rationale: GAP score involves aggregating up to 28 days of MetricReadings. Computing this on every GET would be expensive. Storing the result allows trend comparison over time (each calculation is a snapshot). Garmin webhook ingestion auto-triggers recalculation.
Tradeoff: Score can be stale between webhook deliveries. The hasStaleData flag on GapScore signals this to consumers.
ADR-005: Garmin webhook uses HMAC signature validation, not OAuth
Decision: Garmin webhook receiver validates a shared HMAC-SHA256 secret in the X-Garmin-Signature header.
Rationale: Garmin's Connect IQ webhook pattern uses HMAC validation, not OAuth bearer tokens. For the prototype, the secret is an env var (GARMIN_WEBHOOK_SECRET). For the test, we mock valid and invalid signatures.
Tradeoff: Requires secret rotation strategy in production.
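A minimal sketch of the signature check, using Node's built-in `crypto` module. It assumes the `X-Garmin-Signature` header carries a hex-encoded HMAC-SHA256 of the raw request body; the encoding and the function name `isValidGarminSignature` are assumptions, not something this spec fixes.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: signatureHex is the hex-encoded HMAC-SHA256 of the raw body.
function isValidGarminSignature(
  rawBody: string,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws when lengths differ, so reject those up front.
  if (received.length !== expected.length) return false;
  // Constant-time comparison avoids leaking how many bytes matched.
  return timingSafeEqual(received, expected);
}
```

The check has to run against the raw body bytes, so the webhook route would need access to the unparsed payload rather than the JSON-parsed object.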
ADR-006: Zod for all request validation, not class-validator or manual checks
Decision: Use Zod schemas for all incoming request bodies. Validation runs in a route-level middleware factory (validateBody(schema)).
Rationale: Zod is TypeScript-first, generates types automatically, and keeps validation co-located with route definitions. No decorator magic.
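The factory shape of validateBody(schema) might look like the sketch below. To keep it runnable without dependencies, `SchemaLike` stands in for a Zod schema (it only mirrors the `safeParse` result shape) and `Req`/`Res` stand in for the Express request and response types; all three are assumptions for illustration. The 422 status follows the T04 task instruction.

```typescript
// Stand-ins so the sketch runs without Zod or Express installed (assumptions).
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: { issues: { path: (string | number)[]; message: string }[] } };

interface SchemaLike<T> {
  safeParse(input: unknown): ParseResult<T>;
}
interface Req {
  body: unknown;
}
interface Res {
  status(code: number): Res;
  json(payload: unknown): void;
}

function validateBody<T>(schema: SchemaLike<T>) {
  return (req: Req, res: Res, next: () => void): void => {
    const result = schema.safeParse(req.body);
    if (!result.success) {
      // Summarize each issue as "field path + message" in an ApiError-shaped body.
      res.status(422).json({
        error: "ValidationError",
        message: "Request body failed validation",
        statusCode: 422,
        details: result.error.issues.map((issue) => ({
          path: issue.path.join("."),
          message: issue.message,
        })),
      });
      return;
    }
    // Replace the body with the parsed value so handlers see only known keys.
    req.body = result.data;
    next();
  };
}
```

With a real Zod schema, a route would use it as `router.post("/athletes", validateBody(createAthleteSchema), handler)`, where `createAthleteSchema` is a hypothetical schema name.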
ADR-007: Directory structure
prototype/
  prisma/
    schema.prisma
    seed.ts
  src/
    types/
      index.ts # All TypeScript types + API contracts
    middleware/
      auth.ts # JWT verification middleware
      errorHandler.ts # Global error handler
      requestLogger.ts # Morgan-style request logging
      rateLimiter.ts # express-rate-limit config
      roleGuard.ts # Role-based access factory
      rls.ts # Row-level security middleware
      validateBody.ts # Zod validation factory
    services/
      gapScore.ts # GAP score calculation (core business logic)
      timeline.ts # Athlete history aggregation
      sessionManager.ts # Session overlap validation + creation
      garminTransform.ts # External data transformation
    routes/
      auth.ts
      athletes.ts
      teams.ts
      sessions.ts
      metrics.ts
      webhooks.ts
    app.ts # Express app setup, route mounting
    server.ts # HTTP server entry point
  tests/
    unit/
      gapScore.test.ts # Vitest unit tests for GAP score service
    integration/
      athletes.test.ts # Basic integration tests for Athlete CRUD
  package.json
  tsconfig.json
  .env.example
5. Task Decomposition
LAYER 1: Foundation
T01: Prisma Schema + Initial Migration
Complexity: ⭐⭐
Estimated Time: 10 min
Model Tier: [local-ok]
Dependencies: None
Output File: prototype/prisma/schema.prisma
Task Instruction for Melody:
Create the Prisma schema file exactly as specified in Section 3.1 of this spec. Run npx prisma migrate dev --name init to generate and apply the initial SQLite migration. Verify with npx prisma studio that all tables are created. Also create prototype/.env.example with four keys: DATABASE_URL="file:./dev.db", JWT_SECRET="changeme", JWT_REFRESH_SECRET="changeme_refresh", and GARMIN_WEBHOOK_SECRET="changeme_garmin".
Acceptance Criteria:
1. prototype/prisma/schema.prisma exists and contains all 8 models (User, RefreshToken, Coach, Team, Athlete, Session, MetricReading, GapScore) and all 4 enums (Role, SessionType, DataSource, MetricType)
2. npx prisma migrate dev --name init completes without errors when run from prototype/
3. prototype/prisma/migrations/ directory exists with at least one migration folder
4. Running npx prisma db push on a fresh database creates all tables without error
5. prototype/.env.example exists with all four env var keys listed
T02: TypeScript Type Definitions
Complexity: ⭐⭐
Estimated Time: 10 min
Model Tier: [local-ok]
Dependencies: T01 (Prisma types must exist to import from @prisma/client)
Output File: prototype/src/types/index.ts
Task Instruction for Melody:
Create prototype/src/types/index.ts with all types defined in Section 3.2. This includes: GAP_WEIGHTS constant, NORMALIZATION_RANGES constant, all interface definitions (GapScoreInput, GapScoreResult, GapScoreComponents, TrendWindow, all Request/Response types, GarminWebhookPayload, GarminDailySummary, AuthenticatedRequest, ApiError, PaginatedResponse). Re-export Role, SessionType, DataSource, MetricType from @prisma/client.
Acceptance Criteria:
1. prototype/src/types/index.ts exists
2. GAP_WEIGHTS.HRV + GAP_WEIGHTS.SLEEP + GAP_WEIGHTS.TRAINING_LOAD + GAP_WEIGHTS.MOOD_SCORE + GAP_WEIGHTS.RESTING_HR === 1.0 (weights sum to exactly 1.0)
3. tsc --noEmit runs without type errors in prototype/
4. All interfaces listed in Section 3.2 are present and exported
5. GapScoreComponents allows null for each component field (missing data handling)
6. AuthenticatedRequest extends Express.Request with a user property
T03: Express App Scaffold + Middleware Stack
Complexity: ⭐⭐
Estimated Time: 15 min
Model Tier: [local-ok]
Dependencies: T02
Output Files: prototype/src/app.ts, prototype/src/server.ts, prototype/src/middleware/errorHandler.ts, prototype/src/middleware/requestLogger.ts, prototype/package.json, prototype/tsconfig.json
Task Instruction for Melody:
Scaffold the Express application. app.ts sets up middleware and mounts routes (routes will be added in later tasks — use placeholder routers). server.ts starts the HTTP server on PORT env var (default 3001). Create errorHandler.ts middleware that catches all errors, logs them, and returns ApiError-shaped JSON responses with appropriate HTTP status codes. Create requestLogger.ts that logs method, path, status, and duration for every request. Implement GET /health directly in app.ts returning { status: "ok", timestamp: ISO_STRING }. Set up package.json with all required dependencies and tsconfig.json with strict TypeScript settings targeting es2022.
Required dependencies: express, @prisma/client, prisma, zod, jsonwebtoken, bcrypt, express-rate-limit, morgan, dotenv, cors
Required devDependencies: typescript, @types/express, @types/node, @types/jsonwebtoken, @types/bcrypt, vitest, tsx, ts-node
Acceptance Criteria:
1. npm install completes without errors in prototype/
2. npm run dev (using tsx src/server.ts) starts the server without errors
3. curl -s http://localhost:3001/health returns HTTP 200 with JSON body containing keys status and timestamp
4. status value is exactly the string "ok"
5. timestamp value is a valid ISO 8601 datetime string
6. Sending GET /nonexistent returns HTTP 404 with an ApiError-shaped body (has keys: error, message, statusCode)
7. Throwing an error from a route handler returns HTTP 500 with an ApiError-shaped body (does NOT expose stack traces in the response body)
8. Request log output appears in stdout for every request (format: [METHOD] /path STATUS DURATIONms)
9. tsc --noEmit passes without errors
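Criteria 6 and 7 both come down to mapping any thrown value onto the ApiError shape without leaking internals. A minimal sketch of that mapping follows; `HttpError` and `toApiError` are hypothetical names, and the generic 500 message is an assumption.

```typescript
interface ApiError {
  error: string;
  message: string;
  statusCode: number;
  details?: unknown;
}

// Hypothetical error class for failures that carry an intended HTTP status.
class HttpError extends Error {
  statusCode: number;
  details?: unknown;

  constructor(statusCode: number, message: string, details?: unknown) {
    super(message);
    this.name = "HttpError";
    this.statusCode = statusCode;
    this.details = details;
  }
}

function toApiError(err: unknown): ApiError {
  if (err instanceof HttpError) {
    return {
      error: err.name,
      message: err.message,
      statusCode: err.statusCode,
      details: err.details,
    };
  }
  // Unknown failures get a generic body: no stack trace, no original message.
  return {
    error: "InternalServerError",
    message: "An unexpected error occurred",
    statusCode: 500,
  };
}
```

The Express error handler would log the original error server-side and then send `toApiError(err)` with the matching status code.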
T04: Athlete CRUD Endpoints
Complexity: ⭐⭐⭐
Estimated Time: 15 min
Model Tier: [local-ok]
Dependencies: T03
Output Files: prototype/src/routes/athletes.ts, prototype/src/middleware/validateBody.ts
Task Instruction for Melody:
Implement the Athlete CRUD routes as specified in Section 3.3. Create validateBody.ts as a Zod-based middleware factory: validateBody(schema) returns an Express middleware that validates req.body against the schema, calls next() on success, or returns HTTP 422 with a Zod error summary on failure. Implement all 5 Athlete endpoints. At this stage, skip auth middleware (will be added in T11). Use Prisma client for all DB operations. Validate incoming data with Zod schemas derived from CreateAthleteRequest and UpdateAthleteRequest types. Handle the case where an athlete ID does not exist (404). Handle unique constraint violations on userId (409 Conflict). PATCH should only update provided fields (partial update, not full replacement).
Acceptance Criteria:
1. POST /athletes with valid body returns HTTP 201 with the created Athlete object (all fields present including id, createdAt, updatedAt)
2. POST /athletes with missing name field returns HTTP 422 with an error body (not HTTP 500)
3. POST /athletes with missing email field returns HTTP 422
4. GET /athletes returns HTTP 200 with an array (can be empty)
5. GET /athletes/:id with a valid existing ID returns HTTP 200 with that athlete's data
6. GET /athletes/:id with a non-existent ID returns HTTP 404
7. PATCH /athletes/:id with { "name": "Updated Name" } returns HTTP 200 with updated name, preserving all other fields
8. PATCH /athletes/:id with an unknown field (e.g., { "hackField": "x" }) is silently ignored (Zod strips unknown keys, no error)
9. DELETE /athletes/:id with a valid ID returns HTTP 204 with no body
10. DELETE /athletes/:id with a non-existent ID returns HTTP 404
T05: Team CRUD Endpoints
Complexity: ⭐⭐
Estimated Time: 10 min
Model Tier: [local-ok]
Dependencies: T03, T04 (reuse validateBody middleware)
Output File: prototype/src/routes/teams.ts
Task Instruction for Melody:
Implement the Team CRUD routes as specified in Section 3.3. Reuse validateBody.ts from T04. Teams must validate that the coachId in a POST /teams request refers to an existing Coach record (not just a User). If coachId references a non-existent Coach, return HTTP 422 with message "Coach not found". Skip auth for now. PATCH supports partial update of name only.
Acceptance Criteria:
1. POST /teams with { "name": "Team Alpha", "coachId": "valid-coach-id" } returns HTTP 201 with team object
2. POST /teams with missing name returns HTTP 422
3. POST /teams with a coachId that doesn't reference a valid Coach record returns HTTP 422 (not HTTP 500)
4. GET /teams returns HTTP 200 with array
5. GET /teams/:id returns the team with its athletes array included (Prisma include: { athletes: true })
6. PATCH /teams/:id with { "name": "New Name" } returns HTTP 200 with updated team
7. DELETE /teams/:id returns HTTP 204
8. DELETE /teams/:id with a non-existent ID returns HTTP 404
LAYER 2: Business Logic
T06: GAP Score Calculation Service
Complexity: ⭐⭐⭐⭐
Estimated Time: 20 min
Model Tier: [local-ok]
Dependencies: T02
Output File: prototype/src/services/gapScore.ts
Task Instruction for Melody: Implement the GAP Score calculation service. This is the most important file in the prototype — get this right.
Normalization function: Given a raw metric value and a MetricNormalizationRange, return a 0-100 score. Linear interpolation: (value - rawMin) / (rawMax - rawMin) * 100. Clamp result to [0, 100]. For inverted scales (lower is better, e.g., resting HR), the formula is (rawMax - value) / (rawMax - rawMin) * 100.
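The two formulas and the clamp can be sketched as follows. The argument order (range first, then value) follows the T06 acceptance criteria; the interface is repeated here only so the snippet is self-contained.

```typescript
interface MetricNormalizationRange {
  metricType: string;
  rawMin: number;
  rawMax: number;
  invertedScale: boolean;
}

function normalizeMetric(range: MetricNormalizationRange, value: number): number {
  const span = range.rawMax - range.rawMin;
  // Inverted scales score lower raw values higher (e.g., resting HR).
  const raw = range.invertedScale
    ? ((range.rawMax - value) / span) * 100
    : ((value - range.rawMin) / span) * 100;
  // Clamp to [0, 100] so out-of-range readings cannot skew the composite.
  return Math.min(100, Math.max(0, raw));
}
```

For HRV with range 20 to 100, a raw value of 60 gives (60 - 20) / 80 * 100 = 50. For resting HR with range 40 to 100 on the inverted scale, a raw value of 40 gives (100 - 40) / 60 * 100 = 100.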
Component calculation:
- HRV → directly normalize MetricType.HRV
- SLEEP → average of normalized SLEEP_DURATION and SLEEP_QUALITY (if only one is present, use that one alone)
- TRAINING_LOAD → normalize MetricType.TRAINING_LOAD
- MOOD → normalize MetricType.MOOD_SCORE
- RESTING_HR → normalize MetricType.RESTING_HR (inverted scale)
Missing data handling:
- If a component has no reading within the last 24 hours, treat that component as stale and set hasStaleData = true on the result
- If a component is completely absent, set it to null in components output and redistribute its weight proportionally among present components
- Weight redistribution: effectiveWeight[i] = GAP_WEIGHTS[i] / sum(GAP_WEIGHTS[present components])
- If ALL components are absent, return score = 0, trend = 0, hasStaleData = true
Weighted composite: score = sum(normalizedComponent[i] * effectiveWeight[i]) for all non-null components
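Weight redistribution and the composite can be combined in one pass: sum the weights of the present components, then divide each weight by that sum. The sketch below assumes components are already normalized to 0-100 and keyed by the GAP_WEIGHTS names; `weightedComposite` and `ComponentKey` are hypothetical names.

```typescript
const GAP_WEIGHTS = {
  HRV: 0.30,
  SLEEP: 0.20,
  TRAINING_LOAD: 0.25,
  MOOD_SCORE: 0.15,
  RESTING_HR: 0.10,
} as const;

type ComponentKey = keyof typeof GAP_WEIGHTS;

function weightedComposite(components: Record<ComponentKey, number | null>): number {
  const keys = Object.keys(GAP_WEIGHTS) as ComponentKey[];
  const present = keys.filter((k) => components[k] !== null);
  // All components absent: the spec defines the score as 0.
  if (present.length === 0) return 0;

  // Sum of the weights that remain once missing components are dropped.
  const totalWeight = present.reduce<number>((sum, k) => sum + GAP_WEIGHTS[k], 0);

  // effectiveWeight = original weight / sum of present weights.
  return present.reduce<number>(
    (score, k) => score + (components[k] as number) * (GAP_WEIGHTS[k] / totalWeight),
    0,
  );
}
```

As a worked case, with HRV missing and the others at sleep 60, training load 40, mood 80, resting HR 100, the present weights sum to 0.70 and the score is (12 + 10 + 12 + 10) / 0.70, roughly 62.86.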
Trend detection:
- Accept an array of past GAP scores (daily, up to 28 entries, most recent last)
- Recent window = last 7 scores (indices length-7 to length-1)
- Baseline window = scores at indices 0 to length-8 (up to 21 scores)
- If fewer than 2 total scores, trend = 0
- If 7 or fewer total scores, recent = all available, baseline = empty → trend = 0
- trend = mean(recent) - mean(baseline) — positive means improving
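The windowing rules above can be sketched as a small pure function. Truncating to the last 28 entries is an assumption about how longer histories should be handled; `mean` is a local helper.

```typescript
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function detectTrend(historicalScores: number[]): number {
  // Keep at most 28 daily scores, most recent last (assumed truncation rule).
  const scores = historicalScores.slice(-28);
  const recent = scores.slice(-7);
  const baseline = scores.slice(0, Math.max(0, scores.length - 7));
  // Seven or fewer scores leave no baseline, so no trend can be computed.
  if (baseline.length === 0) return 0;
  return mean(recent) - mean(baseline);
}
```

For the 14-score example in the acceptance criteria, the recent mean is 73 and the baseline mean is 63, so the trend is 10.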
Export these functions: normalizeMetric, calculateGapScore(input: GapScoreInput): GapScoreResult, detectTrend(historicalScores: number[]): number
Acceptance Criteria:
1. normalizeMetric({ metricType: 'HRV', rawMin: 20, rawMax: 100, invertedScale: false }, 60) returns 50 (exactly)
2. normalizeMetric({ metricType: 'RESTING_HR', rawMin: 40, rawMax: 100, invertedScale: true }, 40) returns 100 (lowest resting HR = best score)
3. normalizeMetric(range, value) returns 0 when value <= rawMin (for non-inverted) and 100 when value >= rawMax (clamping enforced)
4. calculateGapScore with all 6 metric types present returns a score between 0 and 100 (inclusive)
5. calculateGapScore with HRV missing: the returned components.hrv is null, missingComponents array contains "HRV", and the score is calculated using only the 4 present components with redistributed weights
6. calculateGapScore with all metrics missing returns { score: 0, trend: 0, hasStaleData: true, missingComponents: ['HRV', 'SLEEP', 'TRAINING_LOAD', 'MOOD', 'RESTING_HR'] }
7. A reading with recordedAt more than 24 hours before the calculation time sets hasStaleData: true
8. detectTrend([70, 72, 68, 75, 73]) returns 0 (fewer than 7 scores → trend = 0)
9. detectTrend with exactly 7 scores returns 0 (no baseline window)
10. detectTrend([60,61,62,63,64,65,66,70,71,72,73,74,75,76]) returns a positive number (recent 7 = [70-76], baseline = [60-66], recent mean > baseline mean)
11. All exported functions are pure (no side effects, no DB calls)
T07: Session Management + Metric Ingestion Endpoints
Complexity: ⭐⭐⭐
Estimated Time: 15 min
Model Tier: [local-ok]
Dependencies: T04 (validateBody), T06 (gapScore service)
Output Files: prototype/src/routes/sessions.ts, prototype/src/routes/metrics.ts, prototype/src/services/sessionManager.ts
Task Instruction for Melody: Implement session management and metric ingestion.
sessionManager.ts: Export createSession(data: CreateSessionRequest): Promise<Session>. This function must:
1. Validate that endTime > startTime (if both provided) — throw 422 if violated
2. Check for overlapping sessions: query for any session for the same athlete where (existingStart < newEnd) AND (existingEnd > newStart). If overlap found, throw 409 with message "Session overlaps with existing session [id]"
3. Create and return the Prisma session record
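The overlap condition in step 2 is the standard half-open interval test. A sketch as a pure predicate, where `Interval` and `sessionsOverlap` are hypothetical names; it only covers sessions with both a start and an end, since this spec does not say how open-ended sessions should be treated.

```typescript
interface Interval {
  start: Date;
  end: Date;
}

// True when the two sessions share any span of time.
// Strict inequalities mean a shared boundary does not count as overlap.
function sessionsOverlap(existing: Interval, incoming: Interval): boolean {
  return (
    existing.start.getTime() < incoming.end.getTime() &&
    existing.end.getTime() > incoming.start.getTime()
  );
}
```

In the service itself the same condition would be expressed as a database query filter rather than evaluated in memory.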
Routes: Implement POST /sessions, GET /sessions, GET /sessions/:id per the contract in Section 3.3. For metric ingestion: POST /metrics creates a single MetricReading. POST /metrics/bulk accepts array of IngestMetricRequest and creates all, returning { created: N, failed: M, errors: [...] } — individual failures should not abort the batch.
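For the bulk endpoint, one way to keep a bad row from aborting the batch is to validate each item independently and collect failures with their index. The sketch below covers only that validation stage; `partitionBatch`, `Validation`, and `BulkError` are hypothetical names, and persistence failures would need the same per-item handling.

```typescript
interface BulkError {
  index: number;
  message: string;
}

type Validation<T> = { ok: true; value: T } | { ok: false; message: string };

// Splits a batch into valid items and per-item errors without throwing.
function partitionBatch<T>(
  items: unknown[],
  validate: (item: unknown) => Validation<T>,
): { valid: T[]; errors: BulkError[] } {
  const valid: T[] = [];
  const errors: BulkError[] = [];
  items.forEach((item, index) => {
    const result = validate(item);
    if (result.ok) {
      valid.push(result.value);
    } else {
      // Record which row failed so the caller can report it back.
      errors.push({ index, message: result.message });
    }
  });
  return { valid, errors };
}
```

The route would then persist the `valid` items and build the `{ created, failed, errors }` response from the counts.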
Acceptance Criteria:
1. POST /sessions with valid body returns HTTP 201 with session object
2. POST /sessions with endTime earlier than startTime returns HTTP 422
3. POST /sessions that would overlap an existing session returns HTTP 409
4. Two sessions that share an exact boundary (session A ends at T, session B starts at T) do NOT overlap — this is valid and returns HTTP 201
5. GET /sessions?athleteId=X filters by athleteId (query param filter is supported)
6. GET /sessions/:id for non-existent ID returns HTTP 404
7. POST /metrics with valid body returns HTTP 201 with created MetricReading
8. POST /metrics with value as a string instead of number returns HTTP 422
9. POST /metrics/bulk with 5 valid readings returns { created: 5, failed: 0, errors: [] }
10. POST /metrics/bulk where 1 of 5 readings has invalid data returns { created: 4, failed: 1, errors: [...] } — does NOT return HTTP 422 for the whole batch
T08: Athlete Timeline Aggregation Service
Complexity: ⭐⭐⭐
Estimated Time: 15 min
Model Tier: [local-ok]
Dependencies: T06, T07
Output Files: prototype/src/services/timeline.ts, addition to prototype/src/routes/athletes.ts
Task Instruction for Melody: Implement the athlete timeline aggregation service and the two new athlete routes.
timeline.ts: Export getAthleteTimeline(request: AthleteTimelineRequest): Promise<AthleteTimelineResponse>. This function:
1. Queries MetricReadings for the athlete, filtered by optional metricTypes, fromDate, toDate
2. Groups readings by metricType
3. Returns at most limit readings per type, ordered by recordedAt descending
4. Fetches the most recent GapScore record for this athlete
5. Includes latestGapScore in the response (null if no score exists)
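Steps 2 and 3 (group by metric type, then keep the newest `limit` readings per group) can be sketched in memory as follows. `Reading` is a trimmed stand-in for MetricReading and `groupLatestByType` is a hypothetical name; the default of 30 comes from AthleteTimelineRequest.

```typescript
interface Reading {
  metricType: string;
  value: number;
  recordedAt: Date;
}

function groupLatestByType(
  readings: Reading[],
  limit = 30,
): { metricType: string; readings: Reading[] }[] {
  // Bucket readings by metric type, preserving first-seen order of types.
  const groups = new Map<string, Reading[]>();
  for (const reading of readings) {
    const bucket = groups.get(reading.metricType) ?? [];
    bucket.push(reading);
    groups.set(reading.metricType, bucket);
  }
  // Within each bucket: newest first, then cut to the per-type limit.
  return Array.from(groups.entries()).map(([metricType, bucket]) => ({
    metricType,
    readings: bucket
      .sort((a, b) => b.recordedAt.getTime() - a.recordedAt.getTime())
      .slice(0, limit),
  }));
}
```

An athlete with no readings produces an empty array, which matches the `metrics: []` case in the acceptance criteria.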
Add routes to athletes.ts:
- GET /athletes/:id/timeline → calls getAthleteTimeline
- GET /athletes/:id/gap-score → returns latest GapScore from DB
- POST /athletes/:id/gap-score/calculate → queries last 28 days of MetricReadings, calls calculateGapScore, stores result in GapScore table, returns the GapScoreResult
Acceptance Criteria:
1. GET /athletes/:id/timeline for athlete with no readings returns { athleteId: id, metrics: [], latestGapScore: null }
2. GET /athletes/:id/timeline returns readings grouped by metricType (each group has a metricType key and readings array)
3. GET /athletes/:id/timeline?limit=5 returns at most 5 readings per metric type
4. Readings within each metric type group are ordered most-recent-first (recordedAt descending)
5. POST /athletes/:id/gap-score/calculate returns a GapScoreResult-shaped object (has keys: score, trend, components, hasStaleData, missingComponents)
6. POST /athletes/:id/gap-score/calculate with no readings returns { score: 0, hasStaleData: true, missingComponents: [...all 5 components...] }
7. GET /athletes/:id/gap-score after a calculate returns the last stored GapScore record
8. GET /athletes/:id/gap-score before any calculation returns HTTP 404 with message "No GAP score calculated yet"
T09: Unit Tests for GAP Score Service
Complexity: ⭐⭐⭐
Estimated Time: 15 min
Model Tier: [local-ok]
Dependencies: T06
Output File: prototype/tests/unit/gapScore.test.ts
Task Instruction for Melody:
Write Vitest unit tests for all exported functions in gapScore.ts. Tests must cover: all normalization edge cases (at min, at max, above max, below min, inverted), all weight redistribution cases (1-5 missing components), trend detection with various history lengths, and the stale data detection logic. Add "test": "vitest run" script to package.json. Tests must run without a database (pure function tests only — no Prisma).
Acceptance Criteria:
1. npm test runs without errors
2. At least 20 test cases are present (not 20 it blocks wrapping trivial assertions — 20 meaningful behavioral tests)
3. All 11 acceptance criteria from T06 are covered by at least one test each
4. All tests pass (0 failed)
5. Tests import only from ../../src/services/gapScore and ../../src/types/index — no Prisma, no HTTP, no DB
6. Test file contains a describe block for each of the three exported functions: normalizeMetric, calculateGapScore, detectTrend
LAYER 3 — Access Control¶
T10 — JWT Auth Endpoints¶
Complexity: ⭐⭐⭐⭐
Estimated Time: 20 min
Model Tier: [cloud-required]
Rationale for cloud: Auth implementation involves security-sensitive patterns (password hashing, token signing, token revocation, timing attacks). qwen3-coder:30b has shown a tendency to make subtle security mistakes on auth flows — missing token expiry checks, improper bcrypt round counts, predictable token IDs. Cloud model required.
Dependencies: T03, T02
Output File: prototype/src/routes/auth.ts
Task Instruction for Melody:
Implement auth routes: POST /auth/register, POST /auth/login, POST /auth/refresh, POST /auth/logout. Password hashing: bcrypt with saltRounds = 12. Access token: JWT signed with JWT_SECRET env var, expires in 15m, payload { sub: userId, email, role }. Refresh token: cuid(), stored in RefreshToken table with expiresAt = now + 7 days, signed separately with JWT_REFRESH_SECRET. On logout, delete the RefreshToken record. On refresh, verify the token exists in DB AND hasn't expired AND matches the signed refresh JWT, then issue new access + refresh token pair and delete the old refresh token. Register creates a User record + the appropriate role-specific record (Athlete or Coach) based on the role field.
Acceptance Criteria:
1. POST /auth/register with valid body returns HTTP 201 with AuthResponse (has accessToken, refreshToken, user keys)
2. POST /auth/register with duplicate email returns HTTP 409
3. POST /auth/login with correct credentials returns HTTP 200 with AuthResponse
4. POST /auth/login with wrong password returns HTTP 401 (not HTTP 403)
5. POST /auth/login with non-existent email returns HTTP 401 (not HTTP 404 — do not leak account existence)
6. The access token is a valid JWT decodable with jwt.verify(token, JWT_SECRET) and contains sub, email, role claims
7. The access token has an exp claim and the expiry is within 16 minutes from now (15 min + 1 min tolerance)
8. POST /auth/refresh with a valid refresh token returns HTTP 200 with a new AuthResponse (new tokens)
9. POST /auth/refresh with an expired refresh token (modify expiresAt in DB to the past) returns HTTP 401
10. POST /auth/refresh with a token that doesn't exist in the DB returns HTTP 401
11. POST /auth/logout with a valid access token deletes the refresh token from DB; subsequent POST /auth/refresh with that token returns HTTP 401
12. Passwords are NOT stored in plain text — the passwordHash column contains a bcrypt hash (starts with $2b$)
T11 — Auth Middleware + Role Guards¶
Complexity: ⭐⭐⭐
Estimated Time: 15 min
Model Tier: [cloud-required]
Rationale for cloud: Middleware that touches auth context must be reviewed by a model that understands the subtle ways auth bypass can occur (null checks, missing next(err) calls, order dependencies). Security-sensitive.
Dependencies: T10
Output Files: prototype/src/middleware/auth.ts, prototype/src/middleware/roleGuard.ts
Task Instruction for Melody:
Implement auth.ts: a middleware that reads the Authorization: Bearer <token> header, verifies the JWT with JWT_SECRET, and attaches req.user = { id, email, role, athleteId?, coachId? } to the request. If token is missing, return 401. If token is invalid or expired, return 401. If token is valid, call next().
Implement roleGuard.ts: a factory requireRole(...roles: Role[]): RequestHandler that reads req.user.role (already set by auth middleware) and calls next() if the role is allowed, or returns HTTP 403 if not.
Apply authenticate + appropriate requireRole(...) to all existing routes per the table in Section 3.3. Update app.ts route mounting accordingly.
Acceptance Criteria:
1. GET /athletes without Authorization header returns HTTP 401
2. GET /athletes with expired JWT returns HTTP 401
3. GET /athletes with valid JWT and ATHLETE role returns HTTP 403 (athletes can't list all athletes)
4. GET /athletes with valid JWT and COACH role returns HTTP 200
5. GET /athletes with valid JWT and ADMIN role returns HTTP 200
6. GET /health with no Authorization header returns HTTP 200 (health check is not protected)
7. POST /auth/login with no Authorization header returns HTTP 200 when credentials are valid (auth endpoints are not protected)
8. requireRole('ADMIN') middleware returns HTTP 403 for COACH and ATHLETE roles
9. req.user is populated with id, email, role on all protected routes after successful auth
10. A token from a user who was deleted mid-session returns HTTP 401 (middleware queries DB to verify user still exists)
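The role guard factory described for roleGuard.ts can be sketched as follows. To keep the example runnable without Express, `Req`, `Res`, and `Next` are minimal stand-in types introduced here; in the prototype they would be the Express request, response, and next-function types. The 401 branch for a missing `req.user` is an assumption (fail closed), not something the task instruction spells out.

```typescript
type Role = "ADMIN" | "COACH" | "ATHLETE";

// Minimal stand-ins for the Express types (assumption, for a runnable sketch).
interface Req {
  user?: { id: string; email: string; role: Role };
}
interface Res {
  status(code: number): Res;
  json(body: unknown): Res;
}
type Next = (err?: unknown) => void;

// Factory: returns a middleware that allows only the listed roles.
function requireRole(...roles: Role[]) {
  return (req: Req, res: Res, next: Next): void => {
    // Fail closed: no req.user means the auth middleware did not accept a token.
    if (!req.user) {
      res.status(401).json({ error: "Unauthorized", statusCode: 401 });
      return;
    }
    if (!roles.includes(req.user.role)) {
      res.status(403).json({ error: "Forbidden", statusCode: 403 });
      return;
    }
    next();
  };
}
```

Returning immediately after each error response keeps `next()` from running once a response has been sent, which is one of the order-dependency mistakes the rationale above warns about.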
T12 — Row-Level Security Middleware¶
Complexity: ⭐⭐⭐⭐
Estimated Time: 20 min
Model Tier: [cloud-required]
Rationale for cloud: RLS logic is subtle — COACH must see athletes in their teams only, ATHLETE must see only self, and ADMIN bypasses all. Incorrect RLS implementation is a data breach. Local model risk too high for this task.
Dependencies: T11
Output File: prototype/src/middleware/rls.ts
Task Instruction for Melody:
Implement rls.ts with an applyAthleteScope middleware that adds scoping information to the request context.
Logic:
- ADMIN: set req.athleteScope = null (no restriction)
- COACH: query the DB for all Athlete IDs whose teamId is in any team where coachId = req.user.coachId. Set req.athleteScope = { athleteIds: [...] }
- ATHLETE: set req.athleteScope = { athleteIds: [req.user.athleteId] }
Then update athlete routes to apply scope:
- GET /athletes: if req.athleteScope !== null, add where: { id: { in: req.athleteScope.athleteIds } } to Prisma query
- GET /athletes/:id: if req.athleteScope !== null AND requested ID is not in req.athleteScope.athleteIds, return HTTP 403
Apply applyAthleteScope to all athlete, session, metric, and GAP score routes.
Add athleteScope?: { athleteIds: string[] } | null to the AuthenticatedRequest type extension.
Acceptance Criteria:
1. COACH making GET /athletes returns only athletes in their teams (not athletes on other teams)
2. ATHLETE making GET /athletes returns HTTP 403 (per Section 3.3 role requirements)
3. ATHLETE making GET /athletes/:id where :id is their own athlete ID returns HTTP 200
4. ATHLETE making GET /athletes/:id where :id is a different athlete returns HTTP 403
5. ADMIN making GET /athletes returns all athletes regardless of team
6. COACH making GET /athletes/:id for an athlete NOT on their team returns HTTP 403
7. COACH making GET /athletes/:id for an athlete on their team returns HTTP 200
8. After a coach's athlete is moved to a different team, the coach can no longer access that athlete (scope is recalculated per request, not cached)
9. GET /health is unaffected by RLS middleware (still returns 200 without auth)
10. An ATHLETE accessing POST /sessions can only create sessions for their own athleteId — if the body's athleteId differs from their own, return HTTP 403
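The scoping rules for T12 can be sketched as two pure functions. This is an illustration under stated assumptions: the in-memory `TeamRow` and `AthleteRow` arrays stand in for the Prisma queries, and the names `resolveAthleteScope` and `canAccessAthlete` are hypothetical helpers, not part of the spec.

```typescript
type AthleteScope = { athleteIds: string[] } | null;

interface ScopedUser {
  role: "ADMIN" | "COACH" | "ATHLETE";
  athleteId?: string;
  coachId?: string;
}

// Hypothetical in-memory stand-ins for the Team and Athlete tables.
interface TeamRow { id: string; coachId: string }
interface AthleteRow { id: string; teamId: string }

function resolveAthleteScope(
  user: ScopedUser,
  teams: TeamRow[],
  athletes: AthleteRow[]
): AthleteScope {
  if (user.role === "ADMIN") return null; // null means no restriction
  if (user.role === "COACH") {
    const teamIds = new Set(
      teams.filter((t) => t.coachId === user.coachId).map((t) => t.id)
    );
    return {
      athleteIds: athletes.filter((a) => teamIds.has(a.teamId)).map((a) => a.id),
    };
  }
  // ATHLETE: self only. An account without an athlete record gets an empty scope.
  return { athleteIds: user.athleteId ? [user.athleteId] : [] };
}

// True when the scope is unrestricted or explicitly contains the athlete.
function canAccessAthlete(scope: AthleteScope, athleteId: string): boolean {
  return scope === null || scope.athleteIds.includes(athleteId);
}
```

Because the scope is computed from the current rows on every call, moving an athlete to another team changes the result immediately, which is the behavior criterion 8 asks for.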
LAYER 4 — Integration Patterns¶
T13 — Garmin Webhook Receiver¶
Complexity: ⭐⭐⭐
Estimated Time: 15 min
Model Tier: [local-ok]
Dependencies: T07 (metric ingestion), T14 (transform — implement T14 first)
Output File: prototype/src/routes/webhooks.ts
NOTE: T14 must be completed before T13. Update dependency accordingly.
Task Instruction for Melody:
Implement POST /webhooks/garmin. This endpoint:
1. Validates the HMAC-SHA256 signature: compute hmac-sha256(rawBody, GARMIN_WEBHOOK_SECRET) and compare with X-Garmin-Signature header. Return HTTP 401 if invalid. IMPORTANT: Use crypto.timingSafeEqual for comparison to prevent timing attacks. Use express.raw({ type: 'application/json' }) on this specific route to preserve the raw body for HMAC validation, then parse JSON manually.
2. Looks up the athleteId by mapping payload.userId through an env var or a simple in-memory config GARMIN_USER_MAPPINGS = JSON.parse(process.env.GARMIN_USER_MAPPINGS || '{}')
3. Calls transformGarminPayload(payload, athleteId) (from garminTransform.ts, matching the T14 signature) to get an array of IngestMetricRequest
4. Ingests all transformed metrics via the metric ingestion logic
5. Returns HTTP 200 with { processed: N, failed: M }
Acceptance Criteria:
1. POST /webhooks/garmin with valid HMAC signature returns HTTP 200
2. POST /webhooks/garmin with missing X-Garmin-Signature header returns HTTP 401
3. POST /webhooks/garmin with incorrect HMAC signature returns HTTP 401
4. POST /webhooks/garmin with valid signature but empty summaries array returns { processed: 0, failed: 0 }
5. POST /webhooks/garmin with valid signature and 1 summary containing HRV data creates a MetricReading in the DB
6. If a Garmin userId has no mapping in GARMIN_USER_MAPPINGS, that summary is counted in failed but does not abort processing of other summaries
7. POST /webhooks/garmin does not require an Authorization header
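The signature check in step 1 can be sketched with Node's built-in crypto module. One assumption is made loudly here: the spec does not say how the X-Garmin-Signature header is encoded, so this sketch treats it as a hex-encoded digest. The function name `isValidGarminSignature` is also hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: the header carries the hex-encoded HMAC-SHA256 digest of the raw body.
function isValidGarminSignature(
  rawBody: Buffer,
  header: string | undefined,
  secret: string
): boolean {
  if (!header) return false;
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(header, "hex");
  // timingSafeEqual throws when lengths differ, so reject mismatched lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

The digest is computed over the raw bytes, which is why the route needs express.raw: re-serializing a parsed JSON body may not reproduce the exact bytes that were signed.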
T14 — Garmin Data Transformation Pipeline¶
Complexity: ⭐⭐⭐
Estimated Time: 15 min
Model Tier: [local-ok]
Dependencies: T02 (types)
Output File: prototype/src/services/garminTransform.ts
Task Instruction for Melody:
Implement garminTransform.ts. Export transformGarminPayload(payload: GarminWebhookPayload, athleteId: string): IngestMetricRequest[].
Mapping rules (Garmin field → internal MetricType):
- hrvValue → HRV, unit: "ms", value: direct
- restingHeartRateInBeatsPerMinute → RESTING_HR, unit: "bpm", value: direct
- sleepDurationInSeconds → SLEEP_DURATION, unit: "hours", value: sleepDurationInSeconds / 3600
- sleepScoreTotal → SLEEP_QUALITY, unit: "score", value: sleepScoreTotal / 10 (converts 0-100 to 0-10)
- trainingLoadBalance.currentTrainingLoad → TRAINING_LOAD, unit: "au", value: direct
- stressLevel → MOOD_SCORE, unit: "score", value: (100 - stressLevel) / 10 (inverts stress: 0 stress = 10 mood)
- recordedAt for all: new Date(summary.startTimeInSeconds * 1000).toISOString()
- source: "GARMIN" for all
If a Garmin field is undefined or null, skip that metric (do not create a reading with null value).
Export also: transformGarminSummary(summary: GarminDailySummary, athleteId: string): IngestMetricRequest[] (transforms a single summary, used internally by transformGarminPayload).
Acceptance Criteria:
1. transformGarminSummary with all fields present returns exactly 6 IngestMetricRequest objects (one per MetricType)
2. transformGarminSummary with hrvValue undefined returns 5 objects (HRV is omitted)
3. transformGarminSummary with sleepDurationInSeconds = 28800 returns a SLEEP_DURATION reading with value = 8 (8 hours) and unit = "hours"
4. transformGarminSummary with sleepScoreTotal = 75 returns a SLEEP_QUALITY reading with value = 7.5 and unit = "score"
5. transformGarminSummary with stressLevel = 30 returns a MOOD_SCORE reading with value = 7 (100-30=70, /10=7.0) and unit = "score"
6. transformGarminSummary with startTimeInSeconds = 1700000000 returns readings where recordedAt is the ISO string of that Unix timestamp
7. All returned IngestMetricRequest objects have source = "GARMIN"
8. All functions are pure (no DB calls, no side effects)
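The mapping rules above can be sketched as a table-driven pure function. The `IngestMetricRequest` and `GarminDailySummary` shapes below are assumptions made for the sake of a runnable example (the real definitions come from T02), and the `convert` helper is hypothetical.

```typescript
type MetricType =
  | "HRV" | "RESTING_HR" | "SLEEP_DURATION"
  | "SLEEP_QUALITY" | "TRAINING_LOAD" | "MOOD_SCORE";

// Assumed shapes; the real definitions live in src/types/index.ts.
interface IngestMetricRequest {
  athleteId: string;
  metricType: MetricType;
  value: number;
  unit: string;
  recordedAt: string;
  source: "GARMIN";
}

interface GarminDailySummary {
  summaryId: string;
  startTimeInSeconds: number;
  hrvValue?: number | null;
  restingHeartRateInBeatsPerMinute?: number | null;
  sleepDurationInSeconds?: number | null;
  sleepScoreTotal?: number | null;
  trainingLoadBalance?: { currentTrainingLoad?: number | null } | null;
  stressLevel?: number | null;
}

// Apply a conversion only when the raw value is present.
function convert(
  raw: number | null | undefined,
  fn: (n: number) => number
): number | null {
  return raw == null ? null : fn(raw);
}

function transformGarminSummary(
  summary: GarminDailySummary,
  athleteId: string
): IngestMetricRequest[] {
  const recordedAt = new Date(summary.startTimeInSeconds * 1000).toISOString();
  const same = (n: number) => n;
  const candidates: Array<[MetricType, string, number | null]> = [
    ["HRV", "ms", convert(summary.hrvValue, same)],
    ["RESTING_HR", "bpm", convert(summary.restingHeartRateInBeatsPerMinute, same)],
    ["SLEEP_DURATION", "hours", convert(summary.sleepDurationInSeconds, (s) => s / 3600)],
    ["SLEEP_QUALITY", "score", convert(summary.sleepScoreTotal, (s) => s / 10)],
    ["TRAINING_LOAD", "au", convert(summary.trainingLoadBalance?.currentTrainingLoad, same)],
    ["MOOD_SCORE", "score", convert(summary.stressLevel, (s) => (100 - s) / 10)],
  ];
  const readings: IngestMetricRequest[] = [];
  for (const [metricType, unit, value] of candidates) {
    if (value === null) continue; // skip missing metrics instead of emitting null values
    readings.push({ athleteId, metricType, value, unit, recordedAt, source: "GARMIN" });
  }
  return readings;
}
```

Keeping the rules in one table makes each acceptance criterion traceable to a single row, and the function stays free of DB calls and side effects.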
T15 — Rate Limiting Middleware¶
Complexity: ⭐⭐
Estimated Time: 10 min
Model Tier: [local-ok]
Dependencies: T03
Output File: prototype/src/middleware/rateLimiter.ts
Task Instruction for Melody:
Implement rate limiting using express-rate-limit. Create three limiters:
1. globalLimiter: 200 requests per 15 minutes per IP — apply to all routes
2. authLimiter: 10 requests per 15 minutes per IP — apply only to POST /auth/login and POST /auth/register
3. webhookLimiter: 100 requests per minute per IP — apply only to POST /webhooks/garmin
When rate limit is exceeded, return HTTP 429 with body { error: "Too Many Requests", message: "Rate limit exceeded. Try again in X seconds.", statusCode: 429 }.
Set standardHeaders: true (sends RateLimit-* headers) and legacyHeaders: false.
Acceptance Criteria:
1. rateLimiter.ts exports three limiters: globalLimiter, authLimiter, webhookLimiter
2. The authLimiter is applied to POST /auth/login (verifiable by inspecting app.ts route mounting)
3. GET /health returns RateLimit-Limit header in response (global limiter is applied)
4. After 10 requests to POST /auth/login in rapid succession from the same IP, the 11th request returns HTTP 429
5. The 429 response body contains keys error, message, statusCode
6. message in the 429 response contains "seconds" (indicates retry timing)
T16 — Malformed External Data Error Recovery¶
Complexity: ⭐⭐⭐
Estimated Time: 15 min
Model Tier: [local-ok]
Dependencies: T14, T13
Output File: Updates to prototype/src/services/garminTransform.ts and prototype/src/routes/webhooks.ts
Task Instruction for Melody:
Add resilient error handling to the Garmin integration. The transform pipeline must handle: (1) summaries is not an array (log and return empty array), (2) individual summary has startTimeInSeconds as a non-number (skip that summary, log warning), (3) individual metric value is not a finite number (skip that metric, log warning), (4) sleepScoreTotal > 100 or < 0 (clamp to [0, 100] before conversion, log warning).
Implement validateGarminSummary(summary: unknown): summary is GarminDailySummary — a type guard that returns true only if the input has a summaryId string and a startTimeInSeconds number. Use this in the transform pipeline to skip invalid summaries.
Update the webhook handler to catch any error thrown during transform/ingest and count it in failed (never let a single malformed summary crash the entire webhook handler).
Acceptance Criteria:
1. POST /webhooks/garmin with { "userId": "u1", "summaries": "not-an-array" } returns HTTP 200 with { processed: 0, failed: 0 } (does NOT crash or return 500)
2. POST /webhooks/garmin with a summary missing startTimeInSeconds field: that summary is counted in failed, valid summaries in the same payload are still processed
3. POST /webhooks/garmin with a summary where hrvValue = "not-a-number": HRV metric is skipped, other metrics from that summary are still processed
4. transformGarminSummary with sleepScoreTotal = 150 produces SLEEP_QUALITY value of 10.0 (clamped 150→100, /10=10), not 15.0
5. validateGarminSummary returns false for a non-object input (null, undefined, string, number)
6. validateGarminSummary returns false for an object missing summaryId or startTimeInSeconds
7. validateGarminSummary returns true for a minimal valid summary { summaryId: "s1", startTimeInSeconds: 1700000000 }
8. Server logs a warning (console.warn or logger.warn) when a summary is skipped due to validation failure
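The type guard and the clamping rule can be sketched as below. The `GarminDailySummary` interface is reduced to the two required fields for this example, `clampSleepScore` is a hypothetical helper name, and the finite-number check is an extra strictness assumed here so that NaN timestamps are also rejected.

```typescript
// Reduced to the two fields the guard checks (assumption for a compact sketch).
interface GarminDailySummary {
  summaryId: string;
  startTimeInSeconds: number;
}

// Type guard: true only for objects with a string summaryId and a numeric startTimeInSeconds.
function validateGarminSummary(summary: unknown): summary is GarminDailySummary {
  if (typeof summary !== "object" || summary === null) return false;
  const s = summary as Record<string, unknown>;
  return (
    typeof s.summaryId === "string" &&
    typeof s.startTimeInSeconds === "number" &&
    Number.isFinite(s.startTimeInSeconds) // assumed extra check: rejects NaN and Infinity
  );
}

// Clamp an out-of-range sleep score to [0, 100] and log a warning when clamping occurs.
function clampSleepScore(score: number): number {
  if (score < 0 || score > 100) {
    console.warn(`sleepScoreTotal ${score} out of range; clamping to [0, 100]`);
  }
  return Math.min(100, Math.max(0, score));
}
```

With this guard in place, the transform pipeline can filter summaries before touching any field, so a malformed entry is skipped without affecting its neighbors.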
LAYER 5 — Adversarial¶
T17 — Bug Hunt: Find and Fix Planted Bugs¶
Complexity: ⭐⭐⭐⭐
Estimated Time: 20 min
Model Tier: [local-ok] (primary test of local model's debugging capability)
Dependencies: T09 (unit tests must exist to validate fixes)
Output: Updated versions of the bugged files (see Section 9 for exact bug specifications)
Prerequisite (Melody builds, not the test subject): Before this task runs, Melody will introduce exactly 4 bugs as specified in Section 9.1. The bugs are planted in the actual codebase by Melody running as a setup agent. This task is then given to qwen3-coder:30b with the following prompt:
Task Prompt for the test subject (qwen3-coder:30b):
"The unit tests in tests/unit/gapScore.test.ts are failing. Some tests that should pass are failing. Find the bugs in src/services/gapScore.ts and src/services/sessionManager.ts that are causing the failures and fix them. Do not modify the test file. Run npm test to verify your fixes."
Acceptance Criteria (Quinn evaluates after the model's attempt):
1. npm test passes with 0 failures after the model's fix attempt
2. The model identified at least 2 of the 4 planted bugs by description (check the model's reasoning/output for evidence)
3. None of the model's fixes introduced new bugs (test suite should still cover the original AC from T06/T09)
4. The model did NOT modify the test file (verify with git diff tests/)
5. The fix for Bug B1 (see Section 9.1) is present and correct
6. The fix for Bug B2 (see Section 9.1) is present and correct
T18 — Code Refactor: Messy to Clean¶
Complexity: ⭐⭐⭐⭐
Estimated Time: 20 min
Model Tier: [local-ok] (tests refactoring judgment)
Dependencies: T07, T08 (messy code is based on these files)
Output: Updated versions of the messy files (see Section 9.2 for exact messy patterns)
Prerequisite (Melody builds, not the test subject): Before this task runs, Melody will introduce the messy patterns specified in Section 9.2 into the codebase.
Task Prompt for the test subject:
"The files in src/routes/athletes.ts, src/routes/sessions.ts, and src/routes/metrics.ts have grown messy during development. Refactor them to improve code quality. Requirements: (1) Do not change any behavior — all tests must still pass. (2) The external API contract must not change (same endpoints, same response shapes). (3) You decide how to structure the improvements."
Acceptance Criteria:
1. npm test still passes after refactor (no regressions)
2. The duplicated inline Zod validation code (present in 3+ places) is consolidated into a reusable middleware or helper — verify by counting occurrences of the duplicated pattern (should be 0 after refactor)
3. Magic number metric weights are defined in a single named constant (not repeated in multiple files)
4. No as any type assertions remain in the refactored files
5. The refactored code does not mix .then() chains with async/await in the same function
6. Each of the three affected route files is shorter after refactor than before (byte count reduced)
7. The model did not change the API contract (endpoint paths, HTTP methods, response shapes are identical)
T19 — Ambiguous Requirement Implementation¶
Complexity: ⭐⭐⭐⭐
Estimated Time: 20 min
Model Tier: Mixed — [local-ok] for implementation, Quinn/Jules evaluate interpretation quality
Dependencies: T12 (RLS must be in place for this to be meaningful)
Output: New files and/or modifications as determined by the model
Task Prompt for the test subject (exact text, do not modify):
"Add a fatigue indicator to athlete profiles that coaches can see but athletes cannot. The fatigue indicator should reflect how tired or overloaded the athlete currently is."
No additional context is provided. The model must interpret and implement.
Scoring rubric for this task (see also Section 11):
| Behavior | Points |
|---|---|
| Model asks for clarification before implementing | +3 (shows awareness of ambiguity) |
| Model states its interpretation explicitly before writing code | +2 |
| Model derives fatigue from existing metrics (doesn't add new fields) | +2 |
| Model correctly applies RLS (athlete cannot see their own fatigue indicator) | +3 |
| Model defines "fatigued" with a concrete threshold (e.g., 3 consecutive declining GAP scores) | +2 |
| Model returns fatigue in the GET /athletes/:id response only for COACH/ADMIN roles | +2 |
| Implementation compiles and returns correct shape for COACH requests | +3 |
| Implementation correctly returns 403 or omits fatigue field for ATHLETE requests | +3 |
| Model adds a test for the new behavior | +2 |
| Maximum | 22 |
| Minimum acceptable (pass) | 12 |
Acceptance Criteria (minimum bar):
1. The implementation compiles without TypeScript errors
2. GET /athletes/:id for a COACH returns a response that includes some form of fatigue indicator (any key name, any shape)
3. GET /athletes/:id for an ATHLETE either returns HTTP 403 OR returns the response without the fatigue field (either approach is acceptable)
4. The fatigue indicator is derived from data already in the system (no new user input required)
5. The model's output includes some written explanation of the interpretation choices made
T20 — Multi-File Cross-Layer Feature: Team Performance Summary¶
Complexity: ⭐⭐⭐⭐⭐
Estimated Time: 20 min
Model Tier: [cloud-required]
Rationale for cloud: This task requires coordinated changes across types, services, routes, middleware, and auth, all simultaneously. qwen3-coder:30b's context window and multi-file coordination ability are expected to be insufficient for this. This is the ceiling test.
Dependencies: All prior tasks (T01–T16)
Output Files: prototype/src/services/teamPerformance.ts, additions to prototype/src/routes/teams.ts, additions to prototype/src/types/index.ts
Task Instruction for Melody:
Add a Team Performance Summary endpoint. GET /teams/:id/performance returns aggregate performance data for all athletes on the team:
    interface TeamPerformanceSummary {
      teamId: string
      teamName: string
      athleteCount: number
      averageGapScore: number | null // mean of latest GAP scores across all athletes
      gapScoreDistribution: {
        excellent: number // score >= 80
        good: number // score 60-79
        moderate: number // score 40-59
        poor: number // score < 40
      }
      trendingUp: number // athletes with positive trend (trend > 0)
      trendingDown: number // athletes with negative trend (trend < 0)
      dataFreshness: {
        athletesWithFreshData: number // at least one metric reading in last 24h
        athletesWithStaleData: number // last metric reading > 24h ago
        athletesWithNoData: number // no metric readings at all
      }
      calculatedAt: string // ISO datetime
    }
Access control: COACH and ADMIN only. COACH can only access their own teams. Add this type to src/types/index.ts.
Acceptance Criteria:
1. GET /teams/:id/performance returns HTTP 401 without auth
2. GET /teams/:id/performance with ATHLETE role returns HTTP 403
3. GET /teams/:id/performance with COACH role for a team they own returns HTTP 200
4. GET /teams/:id/performance with COACH role for a team they don't own returns HTTP 403
5. Response body matches the TeamPerformanceSummary shape (all fields present)
6. averageGapScore is null when no athletes on the team have calculated GAP scores
7. gapScoreDistribution counts sum to athleteCount (or fewer if some athletes have no score)
8. trendingUp + trendingDown <= athleteCount
9. dataFreshness.athletesWithFreshData + dataFreshness.athletesWithStaleData + dataFreshness.athletesWithNoData === athleteCount
10. TeamPerformanceSummary type is defined and exported from src/types/index.ts
11. tsc --noEmit passes after implementation
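The averageGapScore and gapScoreDistribution rules from the interface above can be sketched as one pure aggregation step. The function name `summarizeGapScores` and its input shape (one latest score per athlete, `null` when none has been calculated) are assumptions for illustration. Fractional scores between 79 and 80 are treated as "good" here, which the spec does not state explicitly.

```typescript
interface GapScoreDistribution {
  excellent: number;
  good: number;
  moderate: number;
  poor: number;
}

// Input: the latest GAP score per athlete, or null if none has been calculated.
function summarizeGapScores(latestScores: Array<number | null>): {
  averageGapScore: number | null;
  gapScoreDistribution: GapScoreDistribution;
} {
  const gapScoreDistribution = { excellent: 0, good: 0, moderate: 0, poor: 0 };
  const scored = latestScores.filter((s): s is number => s !== null);
  for (const score of scored) {
    if (score >= 80) gapScoreDistribution.excellent++;
    else if (score >= 60) gapScoreDistribution.good++;
    else if (score >= 40) gapScoreDistribution.moderate++;
    else gapScoreDistribution.poor++;
  }
  // No scored athletes means there is no meaningful average.
  const averageGapScore =
    scored.length === 0
      ? null
      : scored.reduce((sum, s) => sum + s, 0) / scored.length;
  return { averageGapScore, gapScoreDistribution };
}
```

Athletes without a score are excluded from both the average and the buckets, so the bucket counts sum to at most athleteCount, as criterion 7 allows.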
6. Layer 5 Adversarial Setup¶
9.1 Bugs to Plant¶
Melody introduces these bugs BEFORE T17 runs. They are planted in otherwise working code from T06 and T07.
Bug B1: Wrong slice indices in trend detection
File: prototype/src/services/gapScore.ts
Function: detectTrend
Buggy code pattern: Replace historicalScores.slice(-7) (last 7) with historicalScores.slice(0, 7) (first 7)
Effect: Trend calculation always uses the oldest 7 scores as the "recent" window and the middle scores as baseline — a semantically inverted trend. A genuinely improving athlete will show as declining.
Subtlety: Only manifests when historicalScores.length > 7. Tests with fewer than 8 scores will pass fine. This is a logic error, not a crash.
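The difference between the two slices can be seen in a small sketch. It assumes `historicalScores` is ordered oldest first, which is what "last 7" meaning "most recent" implies; the function names are hypothetical.

```typescript
// Correct window: the 7 most recent scores (array assumed oldest-first).
function recentWindow(historicalScores: number[]): number[] {
  return historicalScores.slice(-7);
}

// Bug B1 pattern: takes the 7 oldest scores instead.
function recentWindowBuggy(historicalScores: number[]): number[] {
  return historicalScores.slice(0, 7);
}

// Ten scores: a slow start followed by a recent jump.
const improving = [50, 52, 54, 56, 58, 60, 62, 90, 92, 94];
const correctWindow = recentWindow(improving);    // ends with the recent jump
const buggyWindow = recentWindowBuggy(improving); // never sees the recent jump
```

With exactly seven or fewer scores both functions return the same array, which is why short test fixtures cannot expose this bug.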
Bug B2: Missing weight redistribution when component is absent
File: prototype/src/services/gapScore.ts
Function: calculateGapScore
Buggy code pattern: When a component is null/missing, the effective weight denominator is NOT recalculated — the code still divides by 1.0 (the sum of all weights) instead of the sum of present component weights.
Effect: When any metric is missing, the GAP score is systematically too low because the weights of present components don't sum to 1.0.
Example: If HRV is missing (weight 0.30), the remaining 4 components should share 100% of the weight. With the bug, they only account for 70% of the total and the score is deflated by 30%.
Subtlety: Only affects scores when data is incomplete. Full-data scenarios pass correctly.
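The deflation in the example can be reproduced with a short sketch. The five weight values are the ones listed in Section 9.2 and HRV = 0.30 comes from the example above; the other component names and which weight belongs to which component are assumptions made only for this illustration.

```typescript
// Only hrv = 0.30 is stated by the spec; the other name-to-weight pairs are assumed.
const WEIGHTS: Record<string, number> = {
  hrv: 0.30,
  sleep: 0.25,
  restingHr: 0.20,
  trainingLoad: 0.15,
  mood: 0.10,
};

type Components = Record<string, number | null>;

// redistribute = true divides by the sum of present weights (correct behavior);
// redistribute = false divides by 1.0, which is the Bug B2 pattern.
function weightedScore(components: Components, redistribute: boolean): number {
  let weightedSum = 0;
  let presentWeight = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    const value = components[name];
    if (value == null) continue; // missing component contributes nothing
    weightedSum += weight * value;
    presentWeight += weight;
  }
  if (presentWeight === 0) return 0;
  return redistribute ? weightedSum / presentWeight : weightedSum / 1.0;
}

// HRV missing, every other normalized component at 80.
const withoutHrv: Components = {
  hrv: null, sleep: 80, restingHr: 80, trainingLoad: 80, mood: 80,
};
const correctScore = weightedScore(withoutHrv, true);  // approximately 80
const buggyScore = weightedScore(withoutHrv, false);   // approximately 56
```

The buggy result is about 70% of the correct one, matching the 30% deflation described in the example. When all five components are present both paths agree, which is why full-data tests pass.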
Bug B3: Off-by-one in session overlap check
File: prototype/src/services/sessionManager.ts
Function: createSession
Buggy code pattern: The overlap query uses gte (>=) instead of gt (>) for the endTime > newStart comparison:
Bug B4: String comparison instead of Date comparison in sort
File: prototype/src/services/timeline.ts
Function: getAthleteTimeline
Buggy code pattern: Readings are sorted by recordedAt using string comparison: a.recordedAt.toString() > b.recordedAt.toString() instead of new Date(a.recordedAt).getTime() > new Date(b.recordedAt).getTime()
Effect: When recordedAt values come from different sources (Garmin vs. manual), string comparison may sort incorrectly if the format varies. More importantly, toString() on a Prisma DateTime (a JavaScript Date) returns a human-readable string that starts with the weekday name and reflects the server's time zone, not ISO format, so lexicographic order does not follow chronological order.
Subtlety: Only manifests with mixed data sources. Purely manual data with consistent ISO strings will sort correctly by luck.
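A small sketch shows how comparing `toString()` output diverges from comparing timestamps. The `Reading` shape and both function names are hypothetical, and the string comparator below is a three-way variant of the buggy pattern, written so that the sort itself is well defined and only the choice of sort key is wrong.

```typescript
interface Reading {
  recordedAt: Date;
}

// Bug B4 pattern: orders by Date.prototype.toString() output, compared as strings.
function sortByStringDesc(readings: Reading[]): Reading[] {
  return [...readings].sort((a, b) => {
    const x = a.recordedAt.toString();
    const y = b.recordedAt.toString();
    return x < y ? 1 : x > y ? -1 : 0;
  });
}

// Fix: compare numeric timestamps, most recent first.
function sortByTimeDesc(readings: Reading[]): Reading[] {
  return [...readings].sort(
    (a, b) => b.recordedAt.getTime() - a.recordedAt.getTime()
  );
}

// Two readings two days apart (a Wednesday and a Friday in UTC).
const earlier: Reading = { recordedAt: new Date("2023-11-15T12:00:00.000Z") };
const later: Reading = { recordedAt: new Date("2023-11-17T12:00:00.000Z") };

const byString = sortByStringDesc([earlier, later]); // earlier reading comes first
const byTime = sortByTimeDesc([earlier, later]);     // later reading comes first
```

The string version puts the earlier reading first because its weekday abbreviation sorts after the later one's alphabetically, regardless of the actual dates.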
9.2 Messy Code Scenario for Refactor¶
Melody introduces these patterns BEFORE T18 runs. These are messy-but-working patterns, not bugs.
Pattern 1 — Inline Zod validation duplicated in 3 routes
In routes/athletes.ts, routes/sessions.ts, and routes/metrics.ts, replace all uses of the validateBody(schema) middleware with inline validation blocks:
    // inline validation pattern to introduce (3x copies):
    const parsed = SomeZodSchema.safeParse(req.body)
    if (!parsed.success) {
      res.status(422).json({
        error: 'Validation Error',
        message: parsed.error.issues.map(i => i.message).join(', '),
        statusCode: 422
      })
      return
    }
    const data = parsed.data
Pattern 2 — Magic numbers for GAP weights in multiple places
In routes/athletes.ts (in the calculate endpoint handler) and in services/gapScore.ts, directly use the numeric weights 0.30, 0.20, 0.25, 0.15, 0.10 in 4 different locations instead of importing GAP_WEIGHTS from types. Add a comment // TODO: centralize these near one of the occurrences.
Pattern 3 — Mixed async styles
In routes/sessions.ts, introduce one route handler that mixes .then() and async/await:
    router.post('/', async (req, res, next) => {
      const session = await sessionManager.createSession(req.body)
      prisma.metricReading.findMany({ where: { sessionId: session.id } })
        .then(readings => {
          res.status(201).json({ ...session, metricCount: readings.length })
        })
        .catch(next)
    })
Pattern 4 — as any type assertions
Add three as any type assertions in strategic locations:
- In routes/metrics.ts: const metric = req.body as any before accessing fields
- In services/timeline.ts: (reading as any).recordedAt in the sort comparison
- In routes/athletes.ts: const athlete = await prisma.athlete.findUnique(...) as any
What "clean" looks like after refactor:
- Single validateBody(schema) middleware used consistently (no inline blocks)
- GAP_WEIGHTS imported from types in every file that uses weights
- All route handlers are pure async/await — no .then() chains
- Zero as any assertions — proper types used throughout
- Code is DRY: a helper function for patterns that appear 3+ times
9.3 Deliberately Ambiguous Product Requirement¶
Exact text for T19 prompt (do not add context):
"Add a fatigue indicator to athlete profiles that coaches can see but athletes cannot. The fatigue indicator should reflect how tired or overloaded the athlete currently is."
Why it's ambiguous:
1. What is "fatigue"? Not defined. Could be: declining GAP score trend, training load above threshold, combined multi-metric signal, explicit fatigue rating entered by athlete, or something else.
2. What does "profiles" mean? Could be the GET /athletes/:id endpoint, a new endpoint, or a new field on the Athlete DB record.
3. What format? Boolean (fatigued/not), enum (low/medium/high), 0-100 score, or textual description?
4. When does it update? On every request (real-time calculation), on metric ingestion (event-driven), daily batch, or manual trigger?
5. "Coaches can see but athletes cannot": does this mean a 403 error for athletes, or just omitting the field from athlete responses?

Test target behavior:
- A weak model will implement something without questioning the ambiguity
- An average model will make explicit assumptions but may make them poorly
- A good model will either ask for clarification OR explicitly enumerate its interpretations, choose one defensibly, and implement it correctly
- The scoring rubric in T19 captures this gradient
7. Dependency Graph¶
Sequential (must complete in order):¶
T01 → T02 → T03 → T04 → T05
↓
T06 → T07 → T08 → T09
↓
T10 → T11 → T12
↓
T14 → T13 → T16
↓
T15
Layer 5 (all Layer 1-4 must complete first):
T09 complete → T17 (bugs planted by Melody first)
T07, T08 complete → T18 (messy code planted by Melody first)
T12 complete → T19
All T01-T16 complete → T20
Parallel execution opportunities:¶
PARALLEL BATCH A (after T03):
T04 and T05 can run in parallel
PARALLEL BATCH B (after T04 completes):
T06 and (T10 → T11 → T12) can begin simultaneously
T14 can begin independently of T06
PARALLEL BATCH C (after T06, T07 complete):
T08 and T09 can run in parallel
PARALLEL BATCH D (after T14 completes):
T13 and T15 can run in parallel
PARALLEL BATCH E (after T13, T14 complete):
T16 can begin
Critical path (longest sequential chain):¶
Critical path estimated time: 10+10+15+15+20+15+15+15+20 = 135 minutes on local model
Layer 5 gate:¶
All Layer 1-4 tasks must complete (and pass Quinn QA) before Layer 5 tasks are initiated. Melody plants the bugs and messy code as a separate setup step between Layer 4 completion and Layer 5 start.
8. Execution Constraints¶
Model tier summary:¶
| Layer | Tasks | Tier | Justification |
|---|---|---|---|
| Layer 1 | T01-T05 | [local-ok] | Clear schema, CRUD, scaffold; well within 30B capability |
| Layer 2 | T06-T09 | [local-ok] | Business logic with detailed spec; pure function tests validate output |
| Layer 3 | T10-T12 | [cloud-required] | Security-sensitive; subtle auth bugs are production-critical risks |
| Layer 4 | T13-T16 | [local-ok] | Transform + integration; deterministic with clear I/O contracts |
| Layer 5 | T17-T18 | [local-ok] | Primary test of local model debugging/refactor capability |
| Layer 5 | T19 | Mixed | Local model implements; Jules/Quinn evaluate interpretation quality |
| Layer 5 | T20 | [cloud-required] | Multi-file cross-layer coordination exceeds 30B reliable context window |
Token/context budget notes:¶
- T06 (GAP Score) is the most complex [local-ok] task. The prompt + types file + service spec is ~6K tokens, within safe range for 30B.
- T20 (Team Performance Summary) requires holding auth, RLS, routes, types, and services in context simultaneously, an estimated 20K+ tokens. [cloud-required] is mandatory.
- If qwen3-coder:30b fails on T06 or T09, escalate to [cloud-required] and flag as CONDITIONAL GO finding.
Speed budget (local model):¶
- At 10 tok/s average, a 300-line service file takes ~3 min generation
- 20 tasks × ~8 min avg = ~160 min total generation time for local tasks
- Add Quinn QA time + iteration: budget 4-5 hours for full run with local model
- Cloud tasks (T10-T12, T20): budget 30 min total at cloud generation speed
9. Risk & Edge Cases¶
R1: qwen3-coder:30b fails on GAP Score service (T06)¶
Probability: Medium. The weight redistribution logic and inverted scale normalization are non-trivial.
Detection: Unit tests from T09 catch this immediately.
Mitigation: If T06 output fails >3 of 11 acceptance criteria, escalate to cloud model. Flag as CONDITIONAL GO data point.
R2: Local model introduces security mistakes in auth (T10-T12)¶
Probability: High enough that we've already tagged these [cloud-required].
Detection: Quinn's auth-specific AC (password not in plain text, timing-safe comparison, token expiry).
Mitigation: These are already cloud-gated. If the pipeline accidentally runs them on local, Quinn should catch it and flag.
R3: Bug hunt (T17) produces false fixes — model "fixes" something that wasn't broken¶
Probability: Medium. Models sometimes change working code when debugging.
Detection: npm test passes before bug planting. Quinn runs the full test suite before and after T17. Any new test failures introduced by T17 are false fixes.
Mitigation: Version control. Quinn compares git diff for T17 output and verifies changes are limited to the buggy sections.
R4: Ambiguous requirement (T19) produces untestable output¶
Probability: Medium. If the model outputs pure prose without implementation, T19 fails.
Mitigation: The minimum acceptance criteria require a compiling implementation. Prose-only = automatic fail.
R5: SQLite enum handling with Prisma¶
Known limitation: Prisma on SQLite stores enums as strings, not native enum types. This is expected behavior but could confuse the model into thinking enum validation isn't needed.
Mitigation: Spec explicitly calls for Zod enum validation on all enum fields in request bodies. Quinn verifies that invalid enum values return 422, not a DB error.
R6: Context overflow on T20
Probability: High on local model (which is why it's [cloud-required]). If run on cloud but with a poorly structured prompt, still possible.
Mitigation: T20 prompt for Melody should include explicit file list and focused scope. Full codebase context is NOT needed — only the specific files that need changes.
R7: Garmin webhook HMAC test setup
The webhook HMAC test requires generating a valid signature for the test payload. Quinn must know the test secret value.
Mitigation: The .env.example includes GARMIN_WEBHOOK_SECRET=changeme_garmin. Tests use this fixed value. Quinn generates the expected HMAC in test setup using the same secret.
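
The test setup in R7 can be sketched with Node's built-in `crypto` module. The algorithm (HMAC-SHA256), the hex encoding, and the helper names `signPayload` and `verifySignature` are assumptions for illustration; this section fixes only the secret value, not the signature format.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Fixed test secret from .env.example, as stated in R7.
const TEST_SECRET = "changeme_garmin";

// Assumption: the signature is HMAC-SHA256 over the raw request body, hex-encoded.
function signPayload(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Recomputes the signature and compares it in constant time.
function verifySignature(rawBody: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(signPayload(rawBody, secret), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}

// Test setup: sign the exact bytes that will be sent as the request body.
const payload = JSON.stringify({ userId: "test-athlete", restingHeartRate: 52 });
const signature = signPayload(payload, TEST_SECRET);
console.log(verifySignature(payload, signature, TEST_SECRET));
```

The signature has to be computed over the same raw string the test sends; re-serializing the JSON on either side can change the bytes and invalidate the HMAC.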
R8: 30B model writes tests that match its own wrong implementation
Probability: Low but nonzero in T09. If the model writes gapScore.ts incorrectly AND writes matching tests, unit tests pass but the logic is wrong.
Mitigation: Quinn validates unit tests against the spec's acceptance criteria (AC from T06), not just that tests pass. If tests pass but don't cover AC T06-1 through T06-11, that's a spec gap finding.
10. Scoring Rubric
10.1 Per-Task Scoring (Quinn evaluates each task)
Each task scored on 5 dimensions:
| Dimension | 0 | 1 | 2 |
|---|---|---|---|
| Correctness | Fails >2 AC | Fails 1-2 AC | All AC pass |
| Completeness | Missing files or exports | Minor gaps | All deliverables present |
| Code Quality | Lint errors, type errors, or `as any` abuse | Minor style issues | Clean, typed, no warnings |
| Edge Cases | Ignored spec edge cases | Some handled | All spec'd edge cases handled |
| Production Proximity | Major rework needed to ship | Minor cleanup needed | Near-production quality |
Max per task: 10 points
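
As a rough sketch, the per-task rubric maps onto a small typed record. The type and function names below are illustrative and not part of the spec; the only rule encoded beyond a plain sum is the Correctness row of the table.

```typescript
// Each dimension is scored 0, 1, or 2, as in the rubric table.
type DimensionScore = 0 | 1 | 2;

interface TaskScore {
  correctness: DimensionScore;
  completeness: DimensionScore;
  codeQuality: DimensionScore;
  edgeCases: DimensionScore;
  productionProximity: DimensionScore;
}

// Correctness row: more than 2 failed AC -> 0, 1-2 failed -> 1, none failed -> 2.
function correctnessFromFailedAc(failedAc: number): DimensionScore {
  if (failedAc > 2) return 0;
  return failedAc >= 1 ? 1 : 2;
}

// Sum of the five dimensions; the maximum is 5 x 2 = 10.
function totalScore(score: TaskScore): number {
  return (
    score.correctness +
    score.completeness +
    score.codeQuality +
    score.edgeCases +
    score.productionProximity
  );
}

// Illustrative values only: one failed AC, otherwise clean output.
const example: TaskScore = {
  correctness: correctnessFromFailedAc(1),
  completeness: 2,
  codeQuality: 2,
  edgeCases: 1,
  productionProximity: 2,
};
console.log(totalScore(example)); // 8
```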
10.2 Pipeline Scoring (Jules evaluates after full run)
| Metric | Measurement | Weight |
|---|---|---|
| Spec clarity | # of clarifying questions Melody asked Jules during execution | -1 per question (max -10) |
| Orchestration overhead | # of Jules interventions needed to unblock execution | -2 per intervention (max -10) |
| Estimation accuracy | % of tasks within 2x of estimated time | % score × 10 |
| Cloud escalation rate | % of [local-ok] tasks that actually needed cloud | -2 per escalated task |
| Feedback loop effectiveness | Did spec quality visibly improve layer to layer? | 0-5 subjective score |
Max pipeline score: 25 points
10.3 Layer 5 Adversarial Scoring
| Task | Max Points | Pass Threshold |
|---|---|---|
| T17 Bug Hunt | 20 | ≥12 (60%) |
| T18 Refactor | 14 | ≥8 (57%) |
| T19 Ambiguous Req | 22 | ≥12 (55%) |
| T20 Multi-file | 10 | ≥6 (60%) |
Layer 5 total max: 66 points
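
For scripted evaluation, the Layer 5 table could be encoded as a lookup like the one below. The constant and function names are hypothetical; the numbers are copied from the table.

```typescript
// Max points and pass thresholds from the Layer 5 table.
const LAYER5 = {
  T17: { max: 20, passAt: 12 },
  T18: { max: 14, passAt: 8 },
  T19: { max: 22, passAt: 12 },
  T20: { max: 10, passAt: 6 },
} as const;

type Layer5Task = keyof typeof LAYER5;

// Returns whether the task met its threshold, plus the score as a rounded percentage.
function layer5Result(task: Layer5Task, points: number): { passed: boolean; percent: number } {
  const { max, passAt } = LAYER5[task];
  if (points < 0 || points > max) {
    throw new RangeError(`${task} score must be between 0 and ${max}`);
  }
  return { passed: points >= passAt, percent: Math.round((points / max) * 100) };
}

// Total available points across the four adversarial tasks.
const layer5Max = Object.values(LAYER5).reduce((sum, t) => sum + t.max, 0);
console.log(layer5Max); // 66
```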
10.4 Go/No-Go Mapping
Local Model Pass Rate:
- Count tasks tagged [local-ok] that scored ≥6/10 (passes)
- Count total [local-ok] tasks (16 tasks: T01-T09, T13-T18, T19 partially)
- Pass rate = passes / total
| Pass Rate | Verdict | Next Steps |
|---|---|---|
| ≥70% (≥12/16) | GO | Proceed to APA sprint planning with qwen3-coder:30b as primary for [local-ok] tasks |
| 50-69% (8-11/16) | CONDITIONAL GO | Proceed with explicit cloud escalation policy for task types that failed; document failure patterns |
| <50% (<8/16) | NO GO | Local model insufficient; re-evaluate with different local model or full cloud approach |
Additional triggers that force CONDITIONAL GO regardless of pass rate:
- T06 (GAP Score) fails: core business logic is the highest-value use case; failure here is significant even if pass rate is technically ≥70%
- T17 (Bug Hunt) scores <6/20: debugging ability is essential for sprint work
- T10-T12 fail when run on the local model (theoretical: they're cloud-required, so this applies only if Jules manually tests them with the local model as a side experiment)
Triggers that force NO GO regardless of pass rate:
- Any security bug makes it to production-proximate output in T10-T12 (even though the cloud model executes these tasks, a spec unclear enough to allow security failures is a spec problem)
- T20 (Team Performance) completely fails with cloud model: suggests a spec design problem that will block real APA sprints
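
One possible reading of the Go/No-Go mapping, sketched as a pure function. Several things here are assumptions rather than spec: the `RunSummary` field names are hypothetical, the percentage thresholds are treated as authoritative, and the CONDITIONAL GO triggers are interpreted as only ever downgrading a GO (they do not rescue a run whose pass rate is already below 50%).

```typescript
type Verdict = "GO" | "CONDITIONAL GO" | "NO GO";

// Hypothetical summary of a completed shakedown run.
interface RunSummary {
  localOkScores: number[];            // per-task scores out of 10 for [local-ok] tasks
  t06Failed: boolean;                 // GAP Score task failed
  t17Score: number;                   // Bug Hunt score out of 20
  localAuthExperimentFailed: boolean; // T10-T12 failed in an optional local side experiment
  securityBugInAuthOutput: boolean;   // security bug reached T10-T12 output
  t20FailedOnCloud: boolean;          // T20 completely failed with the cloud model
}

function goNoGo(run: RunSummary): Verdict {
  if (run.localOkScores.length === 0) {
    throw new Error("no [local-ok] task scores supplied");
  }
  // NO GO triggers apply regardless of pass rate.
  if (run.securityBugInAuthOutput || run.t20FailedOnCloud) return "NO GO";

  // A task passes at 6/10 or better.
  const passes = run.localOkScores.filter((s) => s >= 6).length;
  const passRate = passes / run.localOkScores.length;
  if (passRate < 0.5) return "NO GO";

  // CONDITIONAL GO triggers downgrade an otherwise clean GO.
  const forcedConditional =
    run.t06Failed || run.t17Score < 6 || run.localAuthExperimentFailed;
  return passRate >= 0.7 && !forcedConditional ? "GO" : "CONDITIONAL GO";
}
```

Keeping the verdict a pure function of the run summary means the same inputs can be re-scored later if the thresholds are revised.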
10.5 Pipeline Health Metrics (thresholds)
| Metric | Healthy | Warning | Critical |
|---|---|---|---|
| Spec questions asked | 0 | 1-2 | >2 |
| Jules interventions | 0-1 | 2-3 | >3 |
| Estimation accuracy | >70% within 2x | 50-70% | <50% |
| Cloud escalation of [local-ok] tasks | 0 | 1-2 | >2 |
11. Output Directory Structure
All output files in ~/.openclaw/workspace/test-runs/local-model-benchmark/prototype/:

    prototype/
      .env.example
      package.json
      tsconfig.json
      prisma/
        schema.prisma           ← T01
        seed.ts                 ← optional, not required
        migrations/             ← T01 (auto-generated)
      src/
        types/
          index.ts              ← T02
        middleware/
          auth.ts               ← T11
          errorHandler.ts       ← T03
          rateLimiter.ts        ← T15
          requestLogger.ts      ← T03
          rls.ts                ← T12
          roleGuard.ts          ← T11
          validateBody.ts       ← T04
        services/
          gapScore.ts           ← T06
          garminTransform.ts    ← T14
          sessionManager.ts     ← T07
          teamPerformance.ts    ← T20
          timeline.ts           ← T08
        routes/
          athletes.ts           ← T04, T08 additions, T12 additions
          auth.ts               ← T10
          metrics.ts            ← T07
          sessions.ts           ← T07
          teams.ts              ← T05, T20 additions
          webhooks.ts           ← T13
        app.ts                  ← T03
        server.ts               ← T03
      tests/
        unit/
          gapScore.test.ts      ← T09
        integration/
          athletes.test.ts      ← optional, not required for shakedown

12. Pre-Run Checklist (Melody completes before starting)
- `~/.openclaw/workspace/test-runs/local-model-benchmark/prototype/` directory created
- Ollama running with qwen3-coder:30b loaded (`ollama ps` shows it active)
- `GARMIN_WEBHOOK_SECRET=shakedown-test-secret-2026` set in `.env` for consistent test HMAC values
- Git initialized in `prototype/` so diff commands work for T17/T18 validation
- Quinn has access to the test directory before Layer 5 starts for the pre-bug baseline run
Spec complete. Forge out. "The blueprint is the product. Everything else is just typing."