FORGE_SPEC.md: APA Production Readiness Shakedown

Author: Forge (Director of Product Architecture)
Date: 2026-03-20
Spec Type: Architecture Spec (3-5 pages)
Status: READY FOR EXECUTION

1. Objective

Build a functional APA Core API prototype — a runnable Express/TypeScript/Prisma application covering athletes, teams, sessions, metric ingestion, GAP score calculation, JWT auth with RBAC, and a Garmin webhook integration — to validate whether qwen3-coder:30b and our Forge → Melody → Quinn pipeline can execute APA development sprints at acceptable quality, speed, and cost.

This is a shakedown cruise, not a production deployment. Real logic, real patterns, real constraints — no toy code.

2. Scope Boundaries

IN SCOPE

Prisma schema with full APA entity set (User, RefreshToken, Coach, Team, Athlete, Session, MetricReading, GapScore)
TypeScript type layer mirroring schema + API-specific request/response types
Express app with request logging, error handling, and health check
Athlete and Team CRUD with Zod validation
GAP Score calculation service with normalization, trend detection, and missing-data handling
Session management and MetricReading ingestion endpoints
Athlete timeline/history aggregation (last N readings per metric type)
JWT authentication: register, login, token refresh, logout
Three roles: ADMIN, COACH, ATHLETE
Row-level security: coach → team athletes only; athlete → self only
Garmin Connect mock webhook receiver
External data transformation pipeline (Garmin format → internal MetricReading)
Rate limiting middleware (express-rate-limit)
Malformed external data error recovery (schema validation + graceful degradation)
Unit tests for GAP score service (Vitest)
Layer 5 adversarial tasks: bug hunting, refactoring, ambiguous requirement, multi-file feature

OUT OF SCOPE

Production database (use SQLite for prototype, not Postgres)
Real Garmin OAuth / live API credentials
Frontend / dashboard
Push notifications
Historical data import / migration tooling
Multi-tenancy beyond coach → team scoping
Caching layer (Redis, etc.)
Deployment / containerization
Email / SMS delivery
Any UI whatsoever

3. Data Models / API Contracts

3.1 Prisma Schema

File: prototype/prisma/schema.prisma

generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "sqlite"
  url      = env("DATABASE_URL")
}

enum Role {
  ADMIN
  COACH
  ATHLETE
}

enum SessionType {
  TRAINING
  RECOVERY
  COMPETITION
  REST
}

enum DataSource {
  MANUAL
  GARMIN
  APPLE_HEALTH
  API
}

enum MetricType {
  HRV
  RESTING_HR
  SLEEP_DURATION
  SLEEP_QUALITY
  TRAINING_LOAD
  MOOD_SCORE
}

model User {
  id            String         @id @default(cuid())
  email         String         @unique
  passwordHash  String
  role          Role           @default(ATHLETE)
  createdAt     DateTime       @default(now())
  updatedAt     DateTime       @updatedAt
  athlete       Athlete?
  coach         Coach?
  refreshTokens RefreshToken[]
}

model RefreshToken {
  id        String   @id @default(cuid())
  token     String   @unique
  userId    String
  user      User     @relation(fields: [userId], references: [id], onDelete: Cascade)
  expiresAt DateTime
  createdAt DateTime @default(now())
}

model Coach {
  id        String   @id @default(cuid())
  userId    String   @unique
  user      User     @relation(fields: [userId], references: [id], onDelete: Cascade)
  name      String
  teams     Team[]
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt
}

model Team {
  id        String    @id @default(cuid())
  name      String
  coachId   String
  coach     Coach     @relation(fields: [coachId], references: [id])
  athletes  Athlete[]
  createdAt DateTime  @default(now())
  updatedAt DateTime  @updatedAt
}

model Athlete {
  id             String          @id @default(cuid())
  userId         String          @unique
  user           User            @relation(fields: [userId], references: [id], onDelete: Cascade)
  name           String
  teamId         String?
  team           Team?           @relation(fields: [teamId], references: [id])
  dateOfBirth    DateTime?
  sessions       Session[]
  metricReadings MetricReading[]
  gapScores      GapScore[]
  createdAt      DateTime        @default(now())
  updatedAt      DateTime        @updatedAt
}

model Session {
  id             String          @id @default(cuid())
  athleteId      String
  athlete        Athlete         @relation(fields: [athleteId], references: [id], onDelete: Cascade)
  sessionType    SessionType
  startTime      DateTime
  endTime        DateTime?
  notes          String?
  source         DataSource      @default(MANUAL)
  metricReadings MetricReading[]
  createdAt      DateTime        @default(now())
  updatedAt      DateTime        @updatedAt
}

model MetricReading {
  id          String     @id @default(cuid())
  athleteId   String
  athlete     Athlete    @relation(fields: [athleteId], references: [id], onDelete: Cascade)
  sessionId   String?
  session     Session?   @relation(fields: [sessionId], references: [id])
  metricType  MetricType
  value       Float
  unit        String
  recordedAt  DateTime
  source      DataSource @default(MANUAL)
  isStale     Boolean    @default(false)
  createdAt   DateTime   @default(now())
}

model GapScore {
  id            String   @id @default(cuid())
  athleteId     String
  athlete       Athlete  @relation(fields: [athleteId], references: [id], onDelete: Cascade)
  score         Float
  trend         Float
  components    String   // JSON string: { hrv, sleep, trainingLoad, mood, restingHr }
  hasStaleData  Boolean  @default(false)
  calculatedAt  DateTime
  createdAt     DateTime @default(now())
}

3.2 TypeScript Types

File: prototype/src/types/index.ts

All Prisma model types are re-exported from @prisma/client. This file defines additional API-specific types.

// Re-export Prisma enums for use in application code
export { Role, SessionType, DataSource, MetricType } from '@prisma/client'

// GAP Score component weights (must sum to 1.0)
export const GAP_WEIGHTS = {
  HRV: 0.30,
  SLEEP: 0.20,       // composite of SLEEP_DURATION + SLEEP_QUALITY
  TRAINING_LOAD: 0.25,
  MOOD_SCORE: 0.15,
  RESTING_HR: 0.10,
} as const

// Normalization ranges per metric (raw value → 0-100 normalized)
export interface MetricNormalizationRange {
  metricType: string
  rawMin: number
  rawMax: number
  invertedScale: boolean  // true = lower raw value → higher normalized score (e.g., resting HR)
}

export const NORMALIZATION_RANGES: Record<string, MetricNormalizationRange> = {
  HRV: { metricType: 'HRV', rawMin: 20, rawMax: 100, invertedScale: false },
  RESTING_HR: { metricType: 'RESTING_HR', rawMin: 40, rawMax: 100, invertedScale: true },
  SLEEP_DURATION: { metricType: 'SLEEP_DURATION', rawMin: 3, rawMax: 10, invertedScale: false },
  SLEEP_QUALITY: { metricType: 'SLEEP_QUALITY', rawMin: 1, rawMax: 10, invertedScale: false },
  TRAINING_LOAD: { metricType: 'TRAINING_LOAD', rawMin: 0, rawMax: 500, invertedScale: false },
  MOOD_SCORE: { metricType: 'MOOD_SCORE', rawMin: 1, rawMax: 10, invertedScale: false },
}

// GAP Score computation input and output
export interface GapScoreInput {
  athleteId: string
  readings: {
    metricType: string
    value: number
    recordedAt: Date
  }[]
}

export interface GapScoreComponents {
  hrv: number | null        // normalized 0-100 or null if missing
  sleep: number | null      // composite of duration + quality, or null
  trainingLoad: number | null
  mood: number | null
  restingHr: number | null
}

export interface GapScoreResult {
  score: number             // final weighted composite 0-100
  trend: number             // positive = improving, negative = declining
  components: GapScoreComponents
  hasStaleData: boolean     // true if any component used data >24h old
  missingComponents: string[]
}

// Trend detection
export interface TrendWindow {
  recent: number[]          // last 7 days of daily GAP scores
  baseline: number[]        // days 8-28 of daily GAP scores
  delta: number             // recent mean minus baseline mean
}

// API Request/Response types

export interface RegisterRequest {
  email: string
  password: string
  role: 'ADMIN' | 'COACH' | 'ATHLETE'
  name: string
}

export interface LoginRequest {
  email: string
  password: string
}

export interface AuthResponse {
  accessToken: string
  refreshToken: string
  user: {
    id: string
    email: string
    role: string
  }
}

export interface CreateAthleteRequest {
  name: string
  email: string            // must match a User with role ATHLETE
  teamId?: string
  dateOfBirth?: string     // ISO date string
}

export interface UpdateAthleteRequest {
  name?: string
  teamId?: string | null
  dateOfBirth?: string
}

export interface CreateTeamRequest {
  name: string
  coachId: string
}

export interface UpdateTeamRequest {
  name?: string
}

export interface CreateSessionRequest {
  athleteId: string
  sessionType: 'TRAINING' | 'RECOVERY' | 'COMPETITION' | 'REST'
  startTime: string         // ISO datetime
  endTime?: string          // ISO datetime
  notes?: string
  source?: 'MANUAL' | 'GARMIN' | 'APPLE_HEALTH' | 'API'
}

export interface IngestMetricRequest {
  athleteId: string
  metricType: 'HRV' | 'RESTING_HR' | 'SLEEP_DURATION' | 'SLEEP_QUALITY' | 'TRAINING_LOAD' | 'MOOD_SCORE'
  value: number
  unit: string
  recordedAt: string        // ISO datetime
  sessionId?: string
  source?: 'MANUAL' | 'GARMIN' | 'APPLE_HEALTH' | 'API'
}

export interface AthleteTimelineRequest {
  athleteId: string
  metricTypes?: string[]    // filter by type; omit = all
  limit?: number            // readings per metric type; default 30
  fromDate?: string         // ISO datetime
  toDate?: string           // ISO datetime
}

export interface AthleteTimelineResponse {
  athleteId: string
  metrics: {
    metricType: string
    readings: {
      value: number
      unit: string
      recordedAt: string
      source: string
      isStale: boolean
    }[]
  }[]
  latestGapScore: GapScoreResult | null
}

// Garmin webhook types
export interface GarminWebhookPayload {
  userId: string            // Garmin user ID (mapped to athleteId externally)
  summaries: GarminDailySummary[]
}

export interface GarminDailySummary {
  summaryId: string
  startTimeInSeconds: number    // Unix epoch
  durationInSeconds: number
  hrvValue?: number             // ms
  restingHeartRateInBeatsPerMinute?: number
  sleepDurationInSeconds?: number
  sleepScoreTotal?: number      // 0-100
  trainingLoadBalance?: {
    currentTrainingLoad?: number
  }
  stressLevel?: number          // 0-100
}

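// Express augmentation for auth context.
// Extends the full express Request (params, body, headers); the bare global
// Express.Request namespace interface does not declare those members.
import type { Request } from 'express'

export interface AuthenticatedRequest extends Request {
  user: {
    id: string
    email: string
    role: 'ADMIN' | 'COACH' | 'ATHLETE'
    athleteId?: string
    coachId?: string
  }
}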

// Standard API error response
export interface ApiError {
  error: string
  message: string
  statusCode: number
  details?: unknown
}

// Standard paginated response wrapper
export interface PaginatedResponse<T> {
  data: T[]
  total: number
  page: number
  pageSize: number
}

3.3 API Endpoint Contracts

All endpoints return Content-Type: application/json. All errors use ApiError shape.

| Method | Path | Auth Required | Roles | Description |
| --- | --- | --- | --- | --- |
| GET | /health | No | n/a | Health check |
| POST | /auth/register | No | n/a | Create user + athlete/coach record |
| POST | /auth/login | No | n/a | Returns access + refresh tokens |
| POST | /auth/refresh | No | n/a | Exchange refresh token for a new access + refresh token pair |
| POST | /auth/logout | Yes | All | Invalidate refresh token |
| GET | /athletes | Yes | ADMIN, COACH | List athletes (coach: team-scoped) |
| POST | /athletes | Yes | ADMIN, COACH | Create athlete |
| GET | /athletes/:id | Yes | ADMIN, COACH, ATHLETE | Get athlete (RLS applies) |
| PATCH | /athletes/:id | Yes | ADMIN, COACH | Update athlete |
| DELETE | /athletes/:id | Yes | ADMIN | Delete athlete |
| GET | /teams | Yes | ADMIN, COACH | List teams (coach: own teams only) |
| POST | /teams | Yes | ADMIN, COACH | Create team |
| GET | /teams/:id | Yes | ADMIN, COACH | Get team |
| PATCH | /teams/:id | Yes | ADMIN, COACH | Update team |
| DELETE | /teams/:id | Yes | ADMIN | Delete team |
| GET | /sessions | Yes | ADMIN, COACH, ATHLETE | List sessions (RLS applies) |
| POST | /sessions | Yes | ADMIN, COACH, ATHLETE | Create session |
| GET | /sessions/:id | Yes | All | Get session (RLS applies) |
| POST | /metrics | Yes | ADMIN, COACH, ATHLETE | Ingest single metric reading |
| POST | /metrics/bulk | Yes | ADMIN, COACH | Bulk ingest metric readings |
| GET | /athletes/:id/timeline | Yes | ADMIN, COACH, ATHLETE | Athlete timeline (RLS applies) |
| GET | /athletes/:id/gap-score | Yes | ADMIN, COACH, ATHLETE | Latest GAP score (RLS applies) |
| POST | /athletes/:id/gap-score/calculate | Yes | ADMIN, COACH | Trigger GAP score recalculation |
| POST | /webhooks/garmin | No (HMAC signature) | n/a | Garmin webhook receiver; HMAC signature validated instead of JWT |

4. Architecture Decisions

ADR-001: SQLite for prototype database

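Decision: Use SQLite via Prisma, not PostgreSQL.

Rationale: Zero infrastructure setup. This is a capability test, not a deployment test. Prisma abstracts most of the difference: swapping to Postgres later is a datasource config change plus a regenerated migration history (Prisma migrations are provider-specific), not an application code change.

Tradeoff: SQLite has no native enum type, so the enum blocks in Section 3.1 rely on Prisma's enum handling for the sqlite provider; confirm that the pinned Prisma version accepts them, and fall back to String fields validated in application code if it does not. Some advanced Postgres features (JSONB, row-level security at the DB layer) are unavailable, so RLS is implemented in application middleware instead.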

ADR-002: Application-layer RLS, not database-layer

Decision: Implement row-level security in Express middleware, not Postgres RLS policies.

Rationale: SQLite doesn't support DB-level RLS. Also, explicit middleware RLS is more testable and more legible than DB policies for a team of this size. The middleware injects scoping filters into all Prisma queries.

Tradeoff: Application-layer RLS can be bypassed by internal service calls that skip middleware. Document this clearly. For Layer 3 implementation, every route that touches athlete/session/metric data MUST pass through the RLS middleware.
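The scoping rule above (coach sees team athletes only, athlete sees self only) can be sketched as a pure function that turns the authenticated user into a filter for Athlete queries. This is a minimal sketch, not the required implementation: `athleteScopeFor`, `AuthUser`, and `AthleteScope` are hypothetical names, and the nested `team: { coachId }` shape assumes the Team and Coach relations from Section 3.1.

```typescript
// Hypothetical helper: maps the authenticated user to a Prisma-style `where`
// fragment for Athlete queries. Names are illustrative, not part of the spec.
type Role = 'ADMIN' | 'COACH' | 'ATHLETE'

interface AuthUser {
  id: string
  role: Role
  athleteId?: string
  coachId?: string
}

type AthleteScope =
  | Record<string, never>          // ADMIN: no restriction
  | { team: { coachId: string } }  // COACH: athletes on the coach's teams
  | { id: string }                 // ATHLETE: self only

export function athleteScopeFor(user: AuthUser): AthleteScope {
  switch (user.role) {
    case 'ADMIN':
      return {}
    case 'COACH':
      // Fail closed: a coach without a Coach record must not see everything.
      if (!user.coachId) throw new Error('COACH user has no coachId')
      return { team: { coachId: user.coachId } }
    case 'ATHLETE':
      if (!user.athleteId) throw new Error('ATHLETE user has no athleteId')
      return { id: user.athleteId }
  }
}
```

A route handler could merge the returned object into the `where` clause of its Athlete query. Throwing when the role-specific ID is missing keeps the filter from silently widening to an empty (unrestricted) scope.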

ADR-003: JWT with short-lived access tokens + refresh tokens

Decision: Access token TTL = 15 minutes. Refresh token TTL = 7 days. Refresh tokens stored in DB (RefreshToken table) for revocation support.

Rationale: Short-lived access tokens limit blast radius of token theft. Stored refresh tokens allow logout-everywhere functionality. This is the standard pattern for mobile + web APIs.

Tradeoff: More complex than single long-lived token. Justified for a system that will handle athlete health data.

ADR-004: GAP Score as computed + cached, not derived on every read

Decision: GAP scores are calculated on-demand (POST /athletes/:id/gap-score/calculate) and stored in the GapScore table. The GET endpoint reads the latest stored score.

Rationale: GAP score involves aggregating up to 28 days of MetricReadings. Computing this on every GET would be expensive. Storing the result allows trend comparison over time (each calculation is a snapshot). Garmin webhook ingestion auto-triggers recalculation.

Tradeoff: Score can be stale between webhook deliveries. The hasStaleData flag on GapScore signals this to consumers.

ADR-005: Garmin webhook uses HMAC signature validation, not OAuth

Decision: Garmin webhook receiver validates a shared HMAC-SHA256 secret in the X-Garmin-Signature header.

Rationale: Garmin's Connect IQ webhook pattern uses HMAC validation, not OAuth bearer tokens. For the prototype, the secret is an env var (GARMIN_WEBHOOK_SECRET). For the test, we mock valid and invalid signatures.

Tradeoff: Requires secret rotation strategy in production.
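The signature check described in ADR-005 might look like the sketch below, using Node's built-in `crypto` module. It assumes the header carries a hex-encoded HMAC-SHA256 of the raw request body; the spec does not fix the encoding, and `isValidGarminSignature` is a hypothetical name.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Assumption: the signature header value is a hex-encoded HMAC-SHA256 digest
// computed over the raw (unparsed) request body with the shared secret.
export function isValidGarminSignature(
  rawBody: string,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest()
  const provided = Buffer.from(signatureHex, 'hex')
  // timingSafeEqual throws on length mismatch, so reject those up front.
  if (provided.length !== expected.length) return false
  // Constant-time comparison avoids leaking how many leading bytes matched.
  return timingSafeEqual(provided, expected)
}
```

The digest has to be computed over the bytes as received, so the webhook route needs access to the raw body before JSON parsing; re-serializing the parsed object may not reproduce the signed payload.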

ADR-006: Zod for all request validation, not class-validator or manual checks

Decision: Use Zod schemas for all incoming request bodies. Validation runs in a route-level middleware factory (validateBody(schema)).

Rationale: Zod is TypeScript-first, generates types automatically, and keeps validation co-located with route definitions. No decorator magic.

ADR-007: Directory structure

prototype/
  prisma/
    schema.prisma
    seed.ts
  src/
    types/
      index.ts          # All TypeScript types + API contracts
    middleware/
      auth.ts           # JWT verification middleware
      errorHandler.ts   # Global error handler
      requestLogger.ts  # Morgan-style request logging
      rateLimiter.ts    # express-rate-limit config
      roleGuard.ts      # Role-based access factory
      rls.ts            # Row-level security middleware
      validateBody.ts   # Zod validation factory
    services/
      gapScore.ts       # GAP score calculation (core business logic)
      timeline.ts       # Athlete history aggregation
      sessionManager.ts # Session overlap validation + creation
      garminTransform.ts # External data transformation
    routes/
      auth.ts
      athletes.ts
      teams.ts
      sessions.ts
      metrics.ts
      webhooks.ts
    app.ts              # Express app setup, route mounting
    server.ts           # HTTP server entry point
  tests/
    unit/
      gapScore.test.ts  # Vitest unit tests for GAP score service
    integration/
      athletes.test.ts  # Basic integration tests for Athlete CRUD
  package.json
  tsconfig.json
  .env.example

5. Task Decomposition

LAYER 1: Foundation

T01: Prisma Schema + Initial Migration

Complexity: ⭐⭐
Estimated Time: 10 min
Model Tier: [local-ok]
Dependencies: None
Output File: prototype/prisma/schema.prisma

Task Instruction for Melody: Create the Prisma schema file exactly as specified in Section 3.1 of this spec. Run npx prisma migrate dev --name init to generate and apply the initial SQLite migration. Verify with npx prisma studio that all tables are created. Also create prototype/.env.example with four keys: DATABASE_URL="file:./dev.db", JWT_SECRET="changeme", JWT_REFRESH_SECRET="changeme_refresh", and GARMIN_WEBHOOK_SECRET="changeme_garmin".

Acceptance Criteria:
1. prototype/prisma/schema.prisma exists and contains all 8 models (User, RefreshToken, Coach, Team, Athlete, Session, MetricReading, GapScore) and all 4 enums (Role, SessionType, DataSource, MetricType)
2. npx prisma migrate dev --name init completes without errors when run from prototype/
3. prototype/prisma/migrations/ directory exists with at least one migration folder
4. Running npx prisma db push on a fresh database creates all tables without error
5. prototype/.env.example exists with all four env var keys listed

T02: TypeScript Type Definitions

Complexity: ⭐⭐
Estimated Time: 10 min
Model Tier: [local-ok]
Dependencies: T01 (Prisma types must exist to import from @prisma/client)
Output File: prototype/src/types/index.ts

Task Instruction for Melody: Create prototype/src/types/index.ts with all types defined in Section 3.2. This includes: GAP_WEIGHTS constant, NORMALIZATION_RANGES constant, all interface definitions (GapScoreInput, GapScoreResult, GapScoreComponents, TrendWindow, all Request/Response types, GarminWebhookPayload, GarminDailySummary, AuthenticatedRequest, ApiError, PaginatedResponse). Re-export Role, SessionType, DataSource, MetricType from @prisma/client.

Acceptance Criteria:
1. prototype/src/types/index.ts exists
2. GAP_WEIGHTS.HRV + GAP_WEIGHTS.SLEEP + GAP_WEIGHTS.TRAINING_LOAD + GAP_WEIGHTS.MOOD_SCORE + GAP_WEIGHTS.RESTING_HR === 1.0 (weights sum to exactly 1.0)
3. tsc --noEmit runs without type errors in prototype/
4. All interfaces listed in Section 3.2 are present and exported
5. GapScoreComponents allows null for each component field (missing data handling)
6. AuthenticatedRequest extends the Express request type with a user property

T03: Express App Scaffold + Middleware Stack

Complexity: ⭐⭐
Estimated Time: 15 min
Model Tier: [local-ok]
Dependencies: T02
Output Files: prototype/src/app.ts, prototype/src/server.ts, prototype/src/middleware/errorHandler.ts, prototype/src/middleware/requestLogger.ts, prototype/package.json, prototype/tsconfig.json

Task Instruction for Melody: Scaffold the Express application. app.ts sets up middleware and mounts routes (routes will be added in later tasks — use placeholder routers). server.ts starts the HTTP server on PORT env var (default 3001). Create errorHandler.ts middleware that catches all errors, logs them, and returns ApiError-shaped JSON responses with appropriate HTTP status codes. Create requestLogger.ts that logs method, path, status, and duration for every request. Implement GET /health directly in app.ts returning { status: "ok", timestamp: ISO_STRING }. Set up package.json with all required dependencies and tsconfig.json with strict TypeScript settings targeting es2022.

Required dependencies: express, @prisma/client, prisma, zod, jsonwebtoken, bcrypt, express-rate-limit, morgan, dotenv, cors

Required devDependencies: typescript, @types/express, @types/node, @types/jsonwebtoken, @types/bcrypt, vitest, tsx, ts-node

Acceptance Criteria:
1. npm install completes without errors in prototype/
2. npm run dev (using tsx src/server.ts) starts the server without errors
3. curl -s http://localhost:3001/health returns HTTP 200 with JSON body containing keys status and timestamp
4. status value is exactly the string "ok"
5. timestamp value is a valid ISO 8601 datetime string
6. Sending GET /nonexistent returns HTTP 404 with an ApiError-shaped body (has keys: error, message, statusCode)
7. Throwing an error from a route handler returns HTTP 500 with an ApiError-shaped body (does NOT expose stack traces in the response body)
8. Request log output appears in stdout for every request (format: [METHOD] /path STATUS DURATIONms)
9. tsc --noEmit passes without errors

T04: Athlete CRUD Endpoints

Complexity: ⭐⭐⭐
Estimated Time: 15 min
Model Tier: [local-ok]
Dependencies: T03
Output Files: prototype/src/routes/athletes.ts, prototype/src/middleware/validateBody.ts

Task Instruction for Melody: Implement the Athlete CRUD routes as specified in Section 3.3. Create validateBody.ts as a Zod-based middleware factory: validateBody(schema) returns an Express middleware that validates req.body against the schema, calls next() on success, or returns HTTP 422 with a Zod error summary on failure. Implement all 5 Athlete endpoints. At this stage, skip auth middleware (will be added in T11). Use Prisma client for all DB operations. Validate incoming data with Zod schemas derived from CreateAthleteRequest and UpdateAthleteRequest types. Handle the case where an athlete ID does not exist (404). Handle unique constraint violations on userId (409 Conflict). PATCH should only update provided fields (partial update, not full replacement).
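A minimal sketch of the validateBody factory described above. To stay runnable without express or zod installed, it uses small structural stand-ins: `SafeParser` is modelled on the result shape of Zod's `safeParse`, and `Req`, `Res`, and `Next` are hypothetical reductions of the Express types. The error body fields follow the ApiError shape from Section 3.2; the `ValidationError` label and the `details` layout are assumptions.

```typescript
// Structural stand-ins (assumptions) so the sketch runs on its own.
interface SafeParser<T> {
  safeParse(input: unknown):
    | { success: true; data: T }
    | { success: false; error: { issues: { path: (string | number)[]; message: string }[] } }
}
interface Req { body: unknown }
interface Res { status(code: number): Res; json(payload: unknown): Res }
type Next = () => void

export function validateBody<T>(schema: SafeParser<T>) {
  return (req: Req, res: Res, next: Next): void => {
    const result = schema.safeParse(req.body)
    if (!result.success) {
      // 422 with a flattened issue summary, in the ApiError shape.
      res.status(422).json({
        error: 'ValidationError',
        message: 'Request body failed validation',
        statusCode: 422,
        details: result.error.issues.map((issue) => ({
          path: issue.path.join('.'),
          message: issue.message,
        })),
      })
      return
    }
    // Hand the parsed value downstream so keys the schema dropped stay dropped.
    req.body = result.data
    next()
  }
}
```

Replacing `req.body` with the parsed result is what lets an unknown field in a PATCH body be silently ignored rather than reaching the Prisma update.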

Acceptance Criteria:
1. POST /athletes with valid body returns HTTP 201 with the created Athlete object (all fields present including id, createdAt, updatedAt)
2. POST /athletes with missing name field returns HTTP 422 with an error body (not HTTP 500)
3. POST /athletes with missing email field returns HTTP 422
4. GET /athletes returns HTTP 200 with an array (can be empty)
5. GET /athletes/:id with a valid existing ID returns HTTP 200 with that athlete's data
6. GET /athletes/:id with a non-existent ID returns HTTP 404
7. PATCH /athletes/:id with { "name": "Updated Name" } returns HTTP 200 with updated name, preserving all other fields
8. PATCH /athletes/:id with an unknown field (e.g., { "hackField": "x" }) is silently ignored (Zod strips unknown keys, no error)
9. DELETE /athletes/:id with a valid ID returns HTTP 204 with no body
10. DELETE /athletes/:id with a non-existent ID returns HTTP 404

T05: Team CRUD Endpoints

Complexity: ⭐⭐
Estimated Time: 10 min
Model Tier: [local-ok]
Dependencies: T03, T04 (reuse validateBody middleware)
Output File: prototype/src/routes/teams.ts

Task Instruction for Melody: Implement the Team CRUD routes as specified in Section 3.3. Reuse validateBody.ts from T04. Teams must validate that the coachId in a POST /teams request refers to an existing Coach record (not just a User). If coachId references a non-existent Coach, return HTTP 422 with message "Coach not found". Skip auth for now. PATCH supports partial update of name only.

Acceptance Criteria:
1. POST /teams with { "name": "Team Alpha", "coachId": "valid-coach-id" } returns HTTP 201 with team object
2. POST /teams with missing name returns HTTP 422
3. POST /teams with a coachId that doesn't reference a valid Coach record returns HTTP 422 (not HTTP 500)
4. GET /teams returns HTTP 200 with array
5. GET /teams/:id returns the team with its athletes array included (Prisma include: { athletes: true })
6. PATCH /teams/:id with { "name": "New Name" } returns HTTP 200 with updated team
7. DELETE /teams/:id returns HTTP 204
8. DELETE /teams/:id with a non-existent ID returns HTTP 404

LAYER 2: Business Logic

T06: GAP Score Calculation Service

Complexity: ⭐⭐⭐⭐
Estimated Time: 20 min
Model Tier: [local-ok]
Dependencies: T02
Output File: prototype/src/services/gapScore.ts

Task Instruction for Melody: Implement the GAP Score calculation service. This is the most important file in the prototype — get this right.

Normalization function: Given a raw metric value and a MetricNormalizationRange, return a 0-100 score. Linear interpolation: (value - rawMin) / (rawMax - rawMin) * 100. Clamp result to [0, 100]. For inverted scales (lower is better, e.g., resting HR), the formula is (rawMax - value) / (rawMax - rawMin) * 100.
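The normalization rule above, sketched as a pure function. The interface is repeated from Section 3.2 so the block stands alone, and the argument order (range first, then value) follows the acceptance criteria for this task.

```typescript
interface MetricNormalizationRange {
  metricType: string
  rawMin: number
  rawMax: number
  invertedScale: boolean
}

export function normalizeMetric(range: MetricNormalizationRange, value: number): number {
  const span = range.rawMax - range.rawMin
  // Inverted scales score lower raw values higher (e.g. resting HR).
  const raw = range.invertedScale
    ? ((range.rawMax - value) / span) * 100
    : ((value - range.rawMin) / span) * 100
  // Clamp to [0, 100] so out-of-range readings cannot skew the composite.
  return Math.min(100, Math.max(0, raw))
}
```

For the HRV range (20 to 100), a raw value of 60 gives (60 - 20) / 80 * 100 = 50; for the inverted RESTING_HR range (40 to 100), a raw value of 40 gives (100 - 40) / 60 * 100 = 100.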

Component calculation:
- HRV → directly normalize MetricType.HRV
- SLEEP → average of normalized SLEEP_DURATION and SLEEP_QUALITY (if only one is present, use that one alone)
- TRAINING_LOAD → normalize MetricType.TRAINING_LOAD
- MOOD → normalize MetricType.MOOD_SCORE
- RESTING_HR → normalize MetricType.RESTING_HR (inverted scale)

Missing data handling:
- If a component's most recent reading is more than 24 hours old, treat that component as stale and set hasStaleData = true on the result
- If a component is completely absent, set it to null in the components output, add it to missingComponents, and redistribute its weight proportionally among the present components
- Weight redistribution: effectiveWeight[i] = GAP_WEIGHTS[i] / sum(GAP_WEIGHTS[present components])
- If ALL components are absent, return score = 0, trend = 0, hasStaleData = true

Weighted composite: score = sum(normalizedComponent[i] * effectiveWeight[i]) for all non-null components
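Weight redistribution and the weighted composite can be sketched together. `weightedComposite` and `ComponentKey` are hypothetical names, the input is assumed to be the already-normalized component values keyed like GAP_WEIGHTS, and mapping the MOOD_SCORE key to the "MOOD" label used in missingComponents is left out.

```typescript
const GAP_WEIGHTS = {
  HRV: 0.30,
  SLEEP: 0.20,
  TRAINING_LOAD: 0.25,
  MOOD_SCORE: 0.15,
  RESTING_HR: 0.10,
} as const

type ComponentKey = keyof typeof GAP_WEIGHTS

export function weightedComposite(
  normalized: Record<ComponentKey, number | null>,
): { score: number; missing: ComponentKey[] } {
  const keys = Object.keys(GAP_WEIGHTS) as ComponentKey[]
  const present = keys.filter((k) => normalized[k] !== null)
  const missing = keys.filter((k) => normalized[k] === null)
  // All components absent: the spec defines the score as 0.
  if (present.length === 0) return { score: 0, missing }

  // Dividing by the weight of present components makes effective weights sum to 1.
  const presentWeight = present.reduce<number>((sum, k) => sum + GAP_WEIGHTS[k], 0)
  const score = present.reduce<number>(
    (sum, k) => sum + (normalized[k] as number) * (GAP_WEIGHTS[k] / presentWeight),
    0,
  )
  return { score, missing }
}
```

With HRV missing, the remaining weights total 0.70, so SLEEP's effective weight becomes 0.20 / 0.70 ≈ 0.286, and the four effective weights again sum to 1.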

Trend detection:
- Accept an array of past GAP scores (daily, up to 28 entries, most recent last)
- Recent window = last 7 scores (indices length-7 to length-1)
- Baseline window = scores at indices 0 to length-8 (up to 21 scores)
- If there are 7 or fewer total scores, the baseline window is empty, so trend = 0 (this also covers the fewer-than-2 case)
- Otherwise trend = mean(recent) - mean(baseline); positive means improving
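A minimal sketch of the trend rule. It returns 0 whenever the baseline window would be empty, which matches the acceptance criterion that exactly 7 scores yield a trend of 0; keeping only the most recent 28 entries when more are passed is an assumption.

```typescript
const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length

export function detectTrend(historicalScores: number[]): number {
  // Assumption: if more than 28 scores arrive, keep the most recent 28.
  const scores = historicalScores.slice(-28)
  // With 7 or fewer scores there is nothing left for the baseline window.
  if (scores.length <= 7) return 0
  const recent = scores.slice(-7)       // last 7 entries
  const baseline = scores.slice(0, -7)  // everything before them (up to 21)
  return mean(recent) - mean(baseline)
}
```

For the 14-score example in the acceptance criteria, the recent mean is 73 and the baseline mean is 63, so the trend is 10.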

Export these functions: normalizeMetric, calculateGapScore(input: GapScoreInput): GapScoreResult, detectTrend(historicalScores: number[]): number

Acceptance Criteria:
1. normalizeMetric({ metricType: 'HRV', rawMin: 20, rawMax: 100, invertedScale: false }, 60) returns 50 (exactly)
2. normalizeMetric({ metricType: 'RESTING_HR', rawMin: 40, rawMax: 100, invertedScale: true }, 40) returns 100 (lowest resting HR = best score)
3. normalizeMetric(range, value) returns 0 when value <= rawMin (for non-inverted) and 100 when value >= rawMax (clamping enforced)
4. calculateGapScore with all 6 metric types present returns a score between 0 and 100 (inclusive)
5. calculateGapScore with HRV missing: the returned components.hrv is null, missingComponents array contains "HRV", and the score is calculated using only the 4 present components with redistributed weights
6. calculateGapScore with all metrics missing returns { score: 0, trend: 0, hasStaleData: true, missingComponents: ['HRV', 'SLEEP', 'TRAINING_LOAD', 'MOOD', 'RESTING_HR'] }
7. A reading with recordedAt more than 24 hours before the calculation time sets hasStaleData: true
8. detectTrend([70, 72, 68, 75, 73]) returns 0 (fewer than 7 scores → trend = 0)
9. detectTrend with exactly 7 scores returns 0 (no baseline window)
10. detectTrend([60,61,62,63,64,65,66,70,71,72,73,74,75,76]) returns a positive number (recent 7 = [70-76], baseline = [60-66], recent mean > baseline mean)
11. All exported functions are pure (no side effects, no DB calls)

T07: Session Management + Metric Ingestion Endpoints

Complexity: ⭐⭐⭐
Estimated Time: 15 min
Model Tier: [local-ok]
Dependencies: T04 (validateBody), T06 (gapScore service)
Output Files: prototype/src/routes/sessions.ts, prototype/src/routes/metrics.ts, prototype/src/services/sessionManager.ts

Task Instruction for Melody: Implement session management and metric ingestion.

sessionManager.ts: Export createSession(data: CreateSessionRequest): Promise<Session>. This function must:
1. Validate that endTime > startTime (if both provided); throw 422 if violated
2. Check for overlapping sessions: query for any session for the same athlete where (existingStart < newEnd) AND (existingEnd > newStart). If an overlap is found, throw 409 with message "Session overlaps with existing session [id]"
3. Create and return the Prisma session record
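The overlap condition in step 2 reduces to a two-comparison predicate. The sketch below assumes both sessions have an end time; how to treat a session whose endTime is null is not defined in this spec and is left open. `Interval` and `sessionsOverlap` are hypothetical names.

```typescript
interface Interval {
  start: Date
  end: Date
}

// Strict inequalities: intervals that merely touch at a boundary do not overlap.
export function sessionsOverlap(existing: Interval, incoming: Interval): boolean {
  return (
    existing.start.getTime() < incoming.end.getTime() &&
    existing.end.getTime() > incoming.start.getTime()
  )
}
```

Because both comparisons are strict, a session that starts exactly when another ends passes the check, which is the boundary behavior the acceptance criteria require.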

Routes: Implement POST /sessions, GET /sessions, GET /sessions/:id per the contract in Section 3.3. For metric ingestion: POST /metrics creates a single MetricReading. POST /metrics/bulk accepts array of IngestMetricRequest and creates all, returning { created: N, failed: M, errors: [...] } — individual failures should not abort the batch.
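The partial-failure behavior of POST /metrics/bulk can be sketched as a loop that validates and persists each item independently. The `validate` and `persist` callbacks are stand-ins for a Zod parse and a Prisma create call, and the shape of each `errors` entry (index plus message) is an assumption, since the spec only shows `errors: [...]`.

```typescript
interface BulkResult {
  created: number
  failed: number
  errors: { index: number; message: string }[]
}

export async function ingestBulk<T>(
  items: unknown[],
  validate: (item: unknown) => T,          // throws on invalid input
  persist: (reading: T) => Promise<void>,  // stand-in for the DB write
): Promise<BulkResult> {
  const result: BulkResult = { created: 0, failed: 0, errors: [] }
  for (let index = 0; index < items.length; index++) {
    try {
      await persist(validate(items[index]))
      result.created += 1
    } catch (err) {
      // Record the failure and keep going; one bad item must not abort the batch.
      result.failed += 1
      result.errors.push({
        index,
        message: err instanceof Error ? err.message : String(err),
      })
    }
  }
  return result
}
```

Processing items sequentially keeps the error indices aligned with the request array; wrapping the whole batch in a single transaction would work against the requirement that valid readings are still created.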

Acceptance Criteria:
1. POST /sessions with valid body returns HTTP 201 with session object
2. POST /sessions with endTime earlier than startTime returns HTTP 422
3. POST /sessions that would overlap an existing session returns HTTP 409
4. Two sessions that share an exact boundary (session A ends at T, session B starts at T) do NOT overlap; this is valid and returns HTTP 201
5. GET /sessions?athleteId=X filters by athleteId (query param filter is supported)
6. GET /sessions/:id for non-existent ID returns HTTP 404
7. POST /metrics with valid body returns HTTP 201 with created MetricReading
8. POST /metrics with value as a string instead of number returns HTTP 422
9. POST /metrics/bulk with 5 valid readings returns { created: 5, failed: 0, errors: [] }
10. POST /metrics/bulk where 1 of 5 readings has invalid data returns { created: 4, failed: 1, errors: [...] } and does NOT return HTTP 422 for the whole batch

T08: Athlete Timeline Aggregation Service

Complexity: ⭐⭐⭐
Estimated Time: 15 min
Model Tier: [local-ok]
Dependencies: T06, T07
Output Files: prototype/src/services/timeline.ts, addition to prototype/src/routes/athletes.ts

Task Instruction for Melody: Implement the athlete timeline aggregation service and the two new athlete routes.

timeline.ts: Export getAthleteTimeline(request: AthleteTimelineRequest): Promise<AthleteTimelineResponse>. This function:
1. Queries MetricReadings for the athlete, filtered by optional metricTypes, fromDate, toDate
2. Groups readings by metricType
3. Returns at most limit readings per type, ordered by recordedAt descending
4. Fetches the most recent GapScore record for this athlete
5. Includes latestGapScore in the response (null if no score exists)
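Steps 2 and 3 (group by metric type, newest first, capped at limit) can be sketched as a pure function over readings that have already been fetched. `groupReadings` and the reduced `Reading` shape are hypothetical; the default of 30 comes from AthleteTimelineRequest in Section 3.2.

```typescript
interface Reading {
  metricType: string
  value: number
  recordedAt: Date
}

export function groupReadings(
  readings: Reading[],
  limit = 30,
): { metricType: string; readings: Reading[] }[] {
  // Bucket readings by metric type; key order follows first appearance.
  const groups: Record<string, Reading[]> = {}
  for (const r of readings) {
    if (!groups[r.metricType]) groups[r.metricType] = []
    groups[r.metricType].push(r)
  }
  return Object.keys(groups).map((metricType) => ({
    metricType,
    // Sort newest first, then keep at most `limit` per type.
    readings: groups[metricType]
      .sort((a, b) => b.recordedAt.getTime() - a.recordedAt.getTime())
      .slice(0, limit),
  }))
}
```

An empty input yields an empty array, matching the `metrics: []` response expected for an athlete with no readings. For large histories, pushing the ordering and per-type limit into the database query would avoid loading every reading into memory.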

Add routes to athletes.ts:
- GET /athletes/:id/timeline → calls getAthleteTimeline
- GET /athletes/:id/gap-score → returns latest GapScore from DB
- POST /athletes/:id/gap-score/calculate → queries last 28 days of MetricReadings, calls calculateGapScore, stores result in GapScore table, returns the GapScoreResult

Acceptance Criteria:
1. GET /athletes/:id/timeline for athlete with no readings returns { athleteId: id, metrics: [], latestGapScore: null }
2. GET /athletes/:id/timeline returns readings grouped by metricType (each group has a metricType key and readings array)
3. GET /athletes/:id/timeline?limit=5 returns at most 5 readings per metric type
4. Readings within each metric type group are ordered most-recent-first (recordedAt descending)
5. POST /athletes/:id/gap-score/calculate returns a GapScoreResult-shaped object (has keys: score, trend, components, hasStaleData, missingComponents)
6. POST /athletes/:id/gap-score/calculate with no readings returns { score: 0, hasStaleData: true, missingComponents: [...all 5 components...] }
7. GET /athletes/:id/gap-score after a calculate returns the last stored GapScore record
8. GET /athletes/:id/gap-score before any calculation returns HTTP 404 with message "No GAP score calculated yet"

T09 — Unit Tests for GAP Score Service

Complexity: ⭐⭐⭐
Estimated Time: 15 min
Model Tier: [local-ok]
Dependencies: T06
Output File: prototype/tests/unit/gapScore.test.ts

Task Instruction for Melody: Write Vitest unit tests for all exported functions in gapScore.ts. Tests must cover: all normalization edge cases (at min, at max, above max, below min, inverted), all weight redistribution cases (1-5 missing components), trend detection with various history lengths, and the stale data detection logic. Add "test": "vitest run" script to package.json. Tests must run without a database (pure function tests only — no Prisma).

Acceptance Criteria:
1. npm test runs without errors
2. At least 20 test cases are present (not 20 it blocks wrapping trivial assertions; 20 meaningful behavioral tests)
3. All 11 acceptance criteria from T06 are covered by at least one test each
4. All tests pass (0 failed)
5. Tests import only from ../../src/services/gapScore and ../../src/types/index (no Prisma, no HTTP, no DB)
6. Test file contains a describe block for each of the three exported functions: normalizeMetric, calculateGapScore, detectTrend

LAYER 3 — Access Control

T10 — JWT Auth Endpoints

Complexity: ⭐⭐⭐⭐
Estimated Time: 20 min
Model Tier: [cloud-required]
Rationale for cloud: Auth implementation involves security-sensitive patterns (password hashing, token signing, token revocation, timing attacks). qwen3-coder:30b has shown a tendency to make subtle security mistakes on auth flows — missing token expiry checks, improper bcrypt round counts, predictable token IDs. Cloud model required.
Dependencies: T03, T02
Output File: prototype/src/routes/auth.ts

Task Instruction for Melody: Implement auth routes: POST /auth/register, POST /auth/login, POST /auth/refresh, POST /auth/logout.
- Password hashing: bcrypt with saltRounds = 12.
- Access token: JWT signed with JWT_SECRET env var, expires in 15m, payload { sub: userId, email, role }.
- Refresh token: cuid(), stored in RefreshToken table with expiresAt = now + 7 days, signed separately with JWT_REFRESH_SECRET.
- Logout: delete the RefreshToken record.
- Refresh: verify the token exists in DB AND hasn't expired AND matches the signed refresh JWT, then issue new access + refresh token pair and delete the old refresh token.
- Register: create a User record + the appropriate role-specific record (Athlete or Coach) based on the role field.

Acceptance Criteria:
1. POST /auth/register with valid body returns HTTP 201 with AuthResponse (has accessToken, refreshToken, user keys)
2. POST /auth/register with duplicate email returns HTTP 409
3. POST /auth/login with correct credentials returns HTTP 200 with AuthResponse
4. POST /auth/login with wrong password returns HTTP 401 (not HTTP 403)
5. POST /auth/login with non-existent email returns HTTP 401 (not HTTP 404; do not leak account existence)
6. The access token is a valid JWT decodable with jwt.verify(token, JWT_SECRET) and contains sub, email, role claims
7. The access token has an exp claim and the expiry is within 16 minutes from now (15 min + 1 min tolerance)
8. POST /auth/refresh with a valid refresh token returns HTTP 200 with a new AuthResponse (new tokens)
9. POST /auth/refresh with an expired refresh token (modify expiresAt in DB to the past) returns HTTP 401
10. POST /auth/refresh with a token that doesn't exist in the DB returns HTTP 401
11. POST /auth/logout with a valid access token deletes the refresh token from DB; subsequent POST /auth/refresh with that token returns HTTP 401
12. Passwords are NOT stored in plain text: the passwordHash column contains a bcrypt hash (starts with $2b$)
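The refresh-token rotation rules (criteria 8 through 11) can be illustrated with an in-memory sketch. Everything here is an assumption for illustration: RefreshTokenStore stands in for the RefreshToken table, randomUUID() stands in for cuid(), and the JWT signing with JWT_REFRESH_SECRET is left out entirely.

```typescript
import { randomUUID } from "node:crypto";

interface StoredRefreshToken {
  token: string;
  userId: string;
  expiresAt: Date;
}

const REFRESH_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days, per the task instruction

// Hypothetical in-memory stand-in for the RefreshToken table.
class RefreshTokenStore {
  private readonly tokens = new Map<string, StoredRefreshToken>();

  issue(userId: string, now: Date = new Date()): StoredRefreshToken {
    const record: StoredRefreshToken = {
      token: randomUUID(), // stand-in for cuid()
      userId,
      expiresAt: new Date(now.getTime() + REFRESH_TTL_MS),
    };
    this.tokens.set(record.token, record);
    return record;
  }

  // Returns a replacement token, or null when the route should answer 401.
  rotate(token: string, now: Date = new Date()): StoredRefreshToken | null {
    const existing = this.tokens.get(token);
    if (!existing) return null; // unknown, revoked, or already rotated
    if (existing.expiresAt.getTime() <= now.getTime()) return null; // expired
    this.tokens.delete(token); // single use: the old token stops working
    return this.issue(existing.userId, now);
  }

  // Logout: returns true if a token was actually removed.
  revoke(token: string): boolean {
    return this.tokens.delete(token);
  }
}
```

Deleting the old token before issuing the new one is what makes a replayed refresh token fail with 401 on its second use.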

T11 — Auth Middleware + Role Guards

Complexity: ⭐⭐⭐
Estimated Time: 15 min
Model Tier: [cloud-required]
Rationale for cloud: Middleware that touches auth context must be reviewed by a model that understands the subtle ways auth bypass can occur (null checks, missing next(err) calls, order dependencies). Security-sensitive.
Dependencies: T10
Output Files: prototype/src/middleware/auth.ts, prototype/src/middleware/roleGuard.ts

Task Instruction for Melody: Implement auth.ts: a middleware that reads the Authorization: Bearer <token> header, verifies the JWT with JWT_SECRET, and attaches req.user = { id, email, role, athleteId?, coachId? } to the request. If token is missing, return 401. If token is invalid or expired, return 401. If token is valid, call next().

Implement roleGuard.ts: a factory requireRole(...roles: Role[]): RequestHandler that reads req.user.role (already set by auth middleware) and calls next() if the role is allowed, or returns HTTP 403 if not.

Apply authenticate + appropriate requireRole(...) to all existing routes per the table in Section 3.3. Update app.ts route mounting accordingly.

Acceptance Criteria:
1. GET /athletes without Authorization header returns HTTP 401
2. GET /athletes with expired JWT returns HTTP 401
3. GET /athletes with valid JWT and ATHLETE role returns HTTP 403 (athletes can't list all athletes)
4. GET /athletes with valid JWT and COACH role returns HTTP 200
5. GET /athletes with valid JWT and ADMIN role returns HTTP 200
6. GET /health with no Authorization header returns HTTP 200 (health check is not protected)
7. POST /auth/login with no Authorization header returns HTTP 200 when credentials are valid (auth endpoints are not protected)
8. requireRole('ADMIN') middleware returns HTTP 403 for COACH and ATHLETE roles
9. req.user is populated with id, email, role on all protected routes after successful auth
10. A token from a user who was deleted mid-session returns HTTP 401 (middleware queries DB to verify user still exists)
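A minimal sketch of the requireRole factory follows. To stay self-contained it uses small structural stand-ins (AuthedRequest, JsonResponse, Next) instead of the Express Request, Response, and NextFunction types; those names and the error body shape are assumptions, not part of the spec.

```typescript
type Role = "ADMIN" | "COACH" | "ATHLETE";

// Structural stand-ins for the Express types (assumed shapes, for illustration only).
interface AuthedRequest {
  user?: { id: string; email: string; role: Role };
}
interface JsonResponse {
  status(code: number): JsonResponse;
  json(body: unknown): JsonResponse;
}
type Next = (err?: unknown) => void;

// Factory: returns a middleware that lets the request through only for allowed roles.
function requireRole(...roles: Role[]) {
  return (req: AuthedRequest, res: JsonResponse, next: Next): void => {
    if (!req.user) {
      // Fail closed if the auth middleware did not run or did not set req.user.
      res.status(401).json({ error: "Unauthorized", statusCode: 401 });
      return;
    }
    if (!roles.includes(req.user.role)) {
      res.status(403).json({ error: "Forbidden", statusCode: 403 });
      return;
    }
    next();
  };
}
```

The explicit return after each response matters: falling through to next() after sending a 403 is one of the bypass patterns the cloud rationale warns about.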

T12 — Row-Level Security Middleware

Complexity: ⭐⭐⭐⭐
Estimated Time: 20 min
Model Tier: [cloud-required]
Rationale for cloud: RLS logic is subtle — COACH must see athletes in their teams only, ATHLETE must see only self, and ADMIN bypasses all. Incorrect RLS implementation is a data breach. Local model risk too high for this task.
Dependencies: T11
Output File: prototype/src/middleware/rls.ts

Task Instruction for Melody: Implement rls.ts with an applyAthleteScope middleware that adds scoping information to the request context.

Logic:
- ADMIN: set req.athleteScope = null (no restriction)
- COACH: query the DB for all Athlete IDs whose teamId is in any team where coachId = req.user.coachId. Set req.athleteScope = { athleteIds: [...] }
- ATHLETE: set req.athleteScope = { athleteIds: [req.user.athleteId] }

Then update athlete routes to apply scope:
- GET /athletes: if req.athleteScope !== null, add where: { id: { in: req.athleteScope.athleteIds } } to Prisma query
- GET /athletes/:id: if req.athleteScope !== null AND requested ID is not in req.athleteScope.athleteIds, return HTTP 403

Apply applyAthleteScope to all athlete, session, metric, and GAP score routes.

Add athleteScope?: { athleteIds: string[] } | null to the AuthenticatedRequest type extension.

Acceptance Criteria:
1. COACH making GET /athletes returns only athletes in their teams (not athletes on other teams)
2. ATHLETE making GET /athletes returns HTTP 403 (per Section 3.3 role requirements)
3. ATHLETE making GET /athletes/:id where :id is their own athlete ID returns HTTP 200
4. ATHLETE making GET /athletes/:id where :id is a different athlete returns HTTP 403
5. ADMIN making GET /athletes returns all athletes regardless of team
6. COACH making GET /athletes/:id for an athlete NOT on their team returns HTTP 403
7. COACH making GET /athletes/:id for an athlete on their team returns HTTP 200
8. After a coach's athlete is moved to a different team, the coach can no longer access that athlete (scope is recalculated per request, not cached)
9. GET /health is unaffected by RLS middleware (still returns 200 without auth)
10. An ATHLETE accessing POST /sessions can only create sessions for their own athleteId; if the body's athleteId differs from their own, return HTTP 403
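The scoping rules above reduce to two small functions: one that resolves a scope per request and one that checks a requested ID against it. The sketch below is an assumption-laden illustration: resolveAthleteScope, canAccessAthlete, and the CoachAthleteLookup callback (standing in for the Prisma query) are hypothetical names.

```typescript
type Role = "ADMIN" | "COACH" | "ATHLETE";

interface ScopeUser {
  role: Role;
  athleteId?: string;
  coachId?: string;
}

// null means "no restriction" (ADMIN); otherwise an allow-list of athlete IDs.
type AthleteScope = { athleteIds: string[] } | null;

// Hypothetical lookup; in the prototype this would be a Prisma query over the coach's teams.
type CoachAthleteLookup = (coachId: string) => Promise<string[]>;

// Resolved on every request (never cached), so team changes take effect immediately.
async function resolveAthleteScope(user: ScopeUser, lookup: CoachAthleteLookup): Promise<AthleteScope> {
  switch (user.role) {
    case "ADMIN":
      return null;
    case "COACH":
      // A missing coachId yields an empty scope: fail closed rather than open.
      return { athleteIds: user.coachId ? await lookup(user.coachId) : [] };
    case "ATHLETE":
      return { athleteIds: user.athleteId ? [user.athleteId] : [] };
  }
}

function canAccessAthlete(scope: AthleteScope, athleteId: string): boolean {
  return scope === null || scope.athleteIds.includes(athleteId);
}
```

Treating a missing coachId or athleteId as an empty allow-list is a design choice of this sketch; the spec does not say how that case should be handled.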

LAYER 4 — Integration Patterns

T13 — Garmin Webhook Receiver

Complexity: ⭐⭐⭐
Estimated Time: 15 min
Model Tier: [local-ok]
Dependencies: T07 (metric ingestion), T14 (transform — implement T14 first)
Output File: prototype/src/routes/webhooks.ts

NOTE: T14 must be completed before T13. Update dependency accordingly.

Task Instruction for Melody: Implement POST /webhooks/garmin. This endpoint:
1. Validates the HMAC-SHA256 signature: compute hmac-sha256(rawBody, GARMIN_WEBHOOK_SECRET) and compare with X-Garmin-Signature header. Return HTTP 401 if invalid. IMPORTANT: Use crypto.timingSafeEqual for comparison to prevent timing attacks. Use express.raw({ type: 'application/json' }) on this specific route to preserve the raw body for HMAC validation, then parse JSON manually.
2. Looks up the athleteId by mapping payload.userId through an env var or a simple in-memory config: GARMIN_USER_MAPPINGS = JSON.parse(process.env.GARMIN_USER_MAPPINGS || '{}')
3. Calls transformGarminPayload(payload, athleteId) (from garminTransform.ts; signature as defined in T14) to get an array of IngestMetricRequest
4. Ingests all transformed metrics via the metric ingestion logic
5. Returns HTTP 200 with { processed: N, failed: M }

Acceptance Criteria:
1. POST /webhooks/garmin with valid HMAC signature returns HTTP 200
2. POST /webhooks/garmin with missing X-Garmin-Signature header returns HTTP 401
3. POST /webhooks/garmin with incorrect HMAC signature returns HTTP 401
4. POST /webhooks/garmin with valid signature but empty summaries array returns { processed: 0, failed: 0 }
5. POST /webhooks/garmin with valid signature and 1 summary containing HRV data creates a MetricReading in the DB
6. If a Garmin userId has no mapping in GARMIN_USER_MAPPINGS, that summary is counted in failed but does not abort processing of other summaries
7. POST /webhooks/garmin does not require an Authorization header
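The signature check in step 1 could look roughly like the sketch below. It assumes the X-Garmin-Signature header carries a hex-encoded digest, which the spec does not state; isValidGarminSignature is a hypothetical helper name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: the header value is the hex-encoded HMAC-SHA256 of the raw request body.
function isValidGarminSignature(
  rawBody: Buffer,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader) return false; // missing header: caller answers 401
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHeader, "hex");
  // timingSafeEqual throws when lengths differ, so reject mismatched lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

The function takes the raw Buffer rather than a parsed object because re-serializing parsed JSON is not guaranteed to reproduce the exact bytes that were signed.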

T14 — Garmin Data Transformation Pipeline

Complexity: ⭐⭐⭐
Estimated Time: 15 min
Model Tier: [local-ok]
Dependencies: T02 (types)
Output File: prototype/src/services/garminTransform.ts

Task Instruction for Melody: Implement garminTransform.ts. Export transformGarminPayload(payload: GarminWebhookPayload, athleteId: string): IngestMetricRequest[].

Mapping rules (Garmin field → internal MetricType):
- hrvValue → HRV, unit: "ms", value: direct
- restingHeartRateInBeatsPerMinute → RESTING_HR, unit: "bpm", value: direct
- sleepDurationInSeconds → SLEEP_DURATION, unit: "hours", value: sleepDurationInSeconds / 3600
- sleepScoreTotal → SLEEP_QUALITY, unit: "score", value: sleepScoreTotal / 10 (converts 0-100 to 0-10)
- trainingLoadBalance.currentTrainingLoad → TRAINING_LOAD, unit: "au", value: direct
- stressLevel → MOOD_SCORE, unit: "score", value: (100 - stressLevel) / 10 (inverts stress: 0 stress = 10 mood)
- recordedAt for all: new Date(summary.startTimeInSeconds * 1000).toISOString()
- source: "GARMIN" for all

If a Garmin field is undefined or null, skip that metric (do not create a reading with null value).

Export also: transformGarminSummary(summary: GarminDailySummary, athleteId: string): IngestMetricRequest[] (transforms a single summary, used internally by transformGarminPayload).

Acceptance Criteria:
1. transformGarminSummary with all fields present returns exactly 6 IngestMetricRequest objects (one per MetricType)
2. transformGarminSummary with hrvValue undefined returns 5 objects (HRV is omitted)
3. transformGarminSummary with sleepDurationInSeconds = 28800 returns a SLEEP_DURATION reading with value = 8 (8 hours) and unit = "hours"
4. transformGarminSummary with sleepScoreTotal = 75 returns a SLEEP_QUALITY reading with value = 7.5 and unit = "score"
5. transformGarminSummary with stressLevel = 30 returns a MOOD_SCORE reading with value = 7 (100-30=70, /10=7.0) and unit = "score"
6. transformGarminSummary with startTimeInSeconds = 1700000000 returns readings where recordedAt is the ISO string of that Unix timestamp
7. All returned IngestMetricRequest objects have source = "GARMIN"
8. All functions are pure (no DB calls, no side effects)
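One way to express the mapping rules as a pure function is sketched below. The GarminDailySummary and IngestMetricRequest interfaces here are guesses at the shapes defined in T02 (in particular, whether IngestMetricRequest carries athleteId is an assumption), so treat the field lists as illustrative.

```typescript
type MetricType = "HRV" | "RESTING_HR" | "SLEEP_DURATION" | "SLEEP_QUALITY" | "TRAINING_LOAD" | "MOOD_SCORE";

// Assumed shapes; the authoritative definitions live in src/types/index.ts.
interface GarminDailySummary {
  summaryId: string;
  startTimeInSeconds: number;
  hrvValue?: number | null;
  restingHeartRateInBeatsPerMinute?: number | null;
  sleepDurationInSeconds?: number | null;
  sleepScoreTotal?: number | null;
  trainingLoadBalance?: { currentTrainingLoad?: number | null } | null;
  stressLevel?: number | null;
}

interface IngestMetricRequest {
  athleteId: string;
  metricType: MetricType;
  value: number;
  unit: string;
  recordedAt: string;
  source: "GARMIN";
}

function transformGarminSummary(summary: GarminDailySummary, athleteId: string): IngestMetricRequest[] {
  const recordedAt = new Date(summary.startTimeInSeconds * 1000).toISOString();
  // Apply a conversion only when the source field is present.
  const convert = (v: number | null | undefined, fn: (n: number) => number): number | undefined =>
    v == null ? undefined : fn(v);
  const direct = (n: number): number => n;

  const candidates: Array<[MetricType, string, number | undefined]> = [
    ["HRV", "ms", convert(summary.hrvValue, direct)],
    ["RESTING_HR", "bpm", convert(summary.restingHeartRateInBeatsPerMinute, direct)],
    ["SLEEP_DURATION", "hours", convert(summary.sleepDurationInSeconds, (s) => s / 3600)],
    ["SLEEP_QUALITY", "score", convert(summary.sleepScoreTotal, (s) => s / 10)],
    ["TRAINING_LOAD", "au", convert(summary.trainingLoadBalance?.currentTrainingLoad, direct)],
    ["MOOD_SCORE", "score", convert(summary.stressLevel, (s) => (100 - s) / 10)],
  ];

  const result: IngestMetricRequest[] = [];
  for (const [metricType, unit, value] of candidates) {
    if (value === undefined) continue; // skip missing fields; never emit a null value
    result.push({ athleteId, metricType, value, unit, recordedAt, source: "GARMIN" });
  }
  return result;
}
```

Keeping the rules in one table-like array makes each acceptance criterion map to a single row, which should make the unit tests straightforward to write.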

T15 — Rate Limiting Middleware

Complexity: ⭐⭐
Estimated Time: 10 min
Model Tier: [local-ok]
Dependencies: T03
Output File: prototype/src/middleware/rateLimiter.ts

Task Instruction for Melody: Implement rate limiting using express-rate-limit. Create three limiters:
1. globalLimiter: 200 requests per 15 minutes per IP (apply to all routes)
2. authLimiter: 10 requests per 15 minutes per IP (apply only to POST /auth/login and POST /auth/register)
3. webhookLimiter: 100 requests per minute per IP (apply only to POST /webhooks/garmin)

When rate limit is exceeded, return HTTP 429 with body { error: "Too Many Requests", message: "Rate limit exceeded. Try again in X seconds.", statusCode: 429 }.

Set standardHeaders: true (sends RateLimit-* headers) and legacyHeaders: false.

Acceptance Criteria:
1. rateLimiter.ts exports three limiters: globalLimiter, authLimiter, webhookLimiter
2. The authLimiter is applied to POST /auth/login (verifiable by inspecting app.ts route mounting)
3. GET /health returns RateLimit-Limit header in response (global limiter is applied)
4. After 10 requests to POST /auth/login in rapid succession from the same IP, the 11th request returns HTTP 429
5. The 429 response body contains keys error, message, statusCode
6. message in the 429 response contains "seconds" (indicates retry timing)
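The 429 body, including the "Try again in X seconds" message, can be built by a small pure helper. This sketch assumes the limiter's custom handler has access to the window's reset time; buildRateLimitBody is a hypothetical name, and how the reset time is obtained from express-rate-limit is left to the implementation.

```typescript
interface RateLimitBody {
  error: string;
  message: string;
  statusCode: 429;
}

// resetTime: when the current window ends (assumed to be supplied by the limiter).
function buildRateLimitBody(resetTime: Date | undefined, now: Date = new Date()): RateLimitBody {
  // Round up so the client never retries a fraction of a second too early.
  const retryAfterSeconds = resetTime
    ? Math.max(0, Math.ceil((resetTime.getTime() - now.getTime()) / 1000))
    : 0;
  return {
    error: "Too Many Requests",
    message: `Rate limit exceeded. Try again in ${retryAfterSeconds} seconds.`,
    statusCode: 429,
  };
}
```

Keeping this logic outside the limiter configuration lets criteria 5 and 6 be unit tested without sending 11 real requests.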

T16 — Malformed External Data Error Recovery

Complexity: ⭐⭐⭐
Estimated Time: 15 min
Model Tier: [local-ok]
Dependencies: T14, T13
Output File: Updates to prototype/src/services/garminTransform.ts and prototype/src/routes/webhooks.ts

Task Instruction for Melody: Add resilient error handling to the Garmin integration. The transform pipeline must handle: (1) summaries is not an array (log and return empty array), (2) individual summary has startTimeInSeconds as a non-number (skip that summary, log warning), (3) individual metric value is not a finite number (skip that metric, log warning), (4) sleepScoreTotal > 100 or < 0 (clamp to [0, 100] before conversion, log warning).

Implement validateGarminSummary(summary: unknown): summary is GarminDailySummary — a type guard that returns true only if the input has a summaryId string and a startTimeInSeconds number. Use this in the transform pipeline to skip invalid summaries.

Update the webhook handler to catch any error thrown during transform/ingest and count it in failed (never let a single malformed summary crash the entire webhook handler).

Acceptance Criteria:
1. POST /webhooks/garmin with { "userId": "u1", "summaries": "not-an-array" } returns HTTP 200 with { processed: 0, failed: 0 } (does NOT crash or return 500)
2. POST /webhooks/garmin with a summary missing startTimeInSeconds field: that summary is counted in failed, valid summaries in the same payload are still processed
3. POST /webhooks/garmin with a summary where hrvValue = "not-a-number": HRV metric is skipped, other metrics from that summary are still processed
4. transformGarminSummary with sleepScoreTotal = 150 produces SLEEP_QUALITY value of 10.0 (clamped 150→100, /10=10), not 15.0
5. validateGarminSummary returns false for a non-object input (null, undefined, string, number)
6. validateGarminSummary returns false for an object missing summaryId or startTimeInSeconds
7. validateGarminSummary returns true for a minimal valid summary { summaryId: "s1", startTimeInSeconds: 1700000000 }
8. Server logs a warning (console.warn or logger.warn) when a summary is skipped due to validation failure
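The type guard and the clamping rule might be sketched as follows. The GarminDailySummary shape is reduced to the two required fields, clampSleepScore is a hypothetical helper name, and rejecting NaN or Infinity for startTimeInSeconds goes slightly beyond the stated "is a number" rule, so that part is an assumption.

```typescript
// Reduced to the required fields; the full type is defined in src/types/index.ts.
interface GarminDailySummary {
  summaryId: string;
  startTimeInSeconds: number;
  [field: string]: unknown;
}

function validateGarminSummary(summary: unknown): summary is GarminDailySummary {
  if (typeof summary !== "object" || summary === null) return false;
  const candidate = summary as Record<string, unknown>;
  const start = candidate["startTimeInSeconds"];
  return (
    typeof candidate["summaryId"] === "string" &&
    typeof start === "number" &&
    Number.isFinite(start) // assumption: NaN and Infinity are treated as invalid
  );
}

// Clamp a Garmin sleep score to [0, 100] before the /10 conversion.
function clampSleepScore(score: number): number {
  return Math.min(100, Math.max(0, score));
}
```

With these two helpers, a caller can filter the summaries array with validateGarminSummary and log a warning for each rejected entry before transforming the rest.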

LAYER 5 — Adversarial

T17 — Bug Hunt: Find and Fix Planted Bugs

Complexity: ⭐⭐⭐⭐
Estimated Time: 20 min
Model Tier: [local-ok] (primary test of local model's debugging capability)
Dependencies: T09 (unit tests must exist to validate fixes)
Output: Updated versions of the bugged files (see Section 9 for exact bug specifications)

Prerequisite (Melody builds, not the test subject): Before this task runs, Melody will introduce exactly 4 bugs as specified in Section 9.1. The bugs are planted in the actual codebase by Melody running as a setup agent. This task is then given to qwen3-coder:30b with the following prompt:

Task Prompt for the test subject (qwen3-coder:30b): "The unit tests in tests/unit/gapScore.test.ts are failing. Some tests that should pass are failing. Find the bugs in src/services/gapScore.ts and src/services/sessionManager.ts that are causing the failures and fix them. Do not modify the test file. Run npm test to verify your fixes."

Acceptance Criteria (Quinn evaluates after the model's attempt):
1. npm test passes with 0 failures after the model's fix attempt
2. The model identified at least 2 of the 4 planted bugs by description (check the model's reasoning/output for evidence)
3. None of the model's fixes introduced new bugs (test suite should still cover the original AC from T06/T09)
4. The model did NOT modify the test file (verify with git diff tests/)
5. The fix for Bug B1 (see Section 9.1) is present and correct
6. The fix for Bug B2 (see Section 9.1) is present and correct

T18 — Code Refactor: Messy to Clean

Complexity: ⭐⭐⭐⭐
Estimated Time: 20 min
Model Tier: [local-ok] (tests refactoring judgment)
Dependencies: T07, T08 (messy code is based on these files)
Output: Updated versions of the messy files (see Section 9.2 for exact messy patterns)

Prerequisite (Melody builds, not the test subject): Before this task runs, Melody will introduce the messy patterns specified in Section 9.2 into the codebase.

Task Prompt for the test subject: "The files in src/routes/athletes.ts, src/routes/sessions.ts, and src/routes/metrics.ts have grown messy during development. Refactor them to improve code quality. Requirements: (1) Do not change any behavior — all tests must still pass. (2) The external API contract must not change (same endpoints, same response shapes). (3) You decide how to structure the improvements."

Acceptance Criteria:
1. npm test still passes after refactor (no regressions)
2. The duplicated inline Zod validation code (present in 3+ places) is consolidated into a reusable middleware or helper; verify by counting occurrences of the duplicated pattern (should be 0 after refactor)
3. Magic number metric weights are defined in a single named constant (not repeated in multiple files)
4. No as any type assertions remain in the refactored files
5. The refactored code does not mix .then() chains with async/await in the same function
6. Each of the three affected route files is shorter after refactor than before (byte count reduced)
7. The model did not change the API contract (endpoint paths, HTTP methods, response shapes are identical)

T19 — Ambiguous Requirement Implementation

Complexity: ⭐⭐⭐⭐
Estimated Time: 20 min
Model Tier: Mixed — [local-ok] for implementation, Quinn/Jules evaluate interpretation quality
Dependencies: T12 (RLS must be in place for this to be meaningful)
Output: New files and/or modifications as determined by the model

Task Prompt for the test subject (exact text, do not modify):

"Add a fatigue indicator to athlete profiles that coaches can see but athletes cannot. The fatigue indicator should reflect how tired or overloaded the athlete currently is."

No additional context is provided. The model must interpret and implement.

Scoring rubric for this task (see also Section 11):

| Behavior | Points |
|---|---|
| Model asks for clarification before implementing | +3 (shows awareness of ambiguity) |
| Model states its interpretation explicitly before writing code | +2 |
| Model derives fatigue from existing metrics (doesn't add new fields) | +2 |
| Model correctly applies RLS (athlete cannot see their own fatigue indicator) | +3 |
| Model defines "fatigued" with a concrete threshold (e.g., 3 consecutive declining GAP scores) | +2 |
| Model returns fatigue in the `GET /athletes/:id` response only for COACH/ADMIN roles | +2 |
| Implementation compiles and returns correct shape for COACH requests | +3 |
| Implementation correctly returns 403 or omits fatigue field for ATHLETE requests | +3 |
| Model adds a test for the new behavior | +2 |
| Maximum | 22 |
| Minimum acceptable (pass) | 12 |

Acceptance Criteria (minimum bar):
1. The implementation compiles without TypeScript errors
2. GET /athletes/:id for a COACH returns a response that includes some form of fatigue indicator (any key name, any shape)
3. GET /athletes/:id for an ATHLETE either returns HTTP 403 OR returns the response without the fatigue field (either approach is acceptable)
4. The fatigue indicator is derived from data already in the system (no new user input required)
5. The model's output includes some written explanation of the interpretation choices made

T20 — Multi-File Cross-Layer Feature: Team Performance Summary

Complexity: ⭐⭐⭐⭐⭐
Estimated Time: 20 min
Model Tier: [cloud-required]
Rationale for cloud: This task requires coordinated changes across types, services, routes, middleware, and auth, all simultaneously. qwen3-coder:30b's context window and multi-file coordination ability are expected to be insufficient for this. This is the ceiling test.
Dependencies: All prior tasks (T01–T16)
Output Files: prototype/src/services/teamPerformance.ts, additions to prototype/src/routes/teams.ts, additions to prototype/src/types/index.ts

Task Instruction for Melody: Add a Team Performance Summary endpoint. GET /teams/:id/performance returns aggregate performance data for all athletes on the team:

interface TeamPerformanceSummary {
  teamId: string
  teamName: string
  athleteCount: number
  averageGapScore: number | null      // mean of latest GAP scores across all athletes
  gapScoreDistribution: {
    excellent: number   // score >= 80
    good: number        // 60 <= score < 80
    moderate: number    // 40 <= score < 60
    poor: number        // score < 40
  }
  trendingUp: number     // athletes with positive trend (trend > 0)
  trendingDown: number   // athletes with negative trend (trend < 0)
  dataFreshness: {
    athletesWithFreshData: number     // at least one metric reading in last 24h
    athletesWithStaleData: number     // last metric reading > 24h ago
    athletesWithNoData: number        // no metric readings at all
  }
  calculatedAt: string   // ISO datetime
}

Access control: COACH and ADMIN only. COACH can only access their own teams. Add this type to src/types/index.ts.

Acceptance Criteria:
1. GET /teams/:id/performance returns HTTP 401 without auth
2. GET /teams/:id/performance with ATHLETE role returns HTTP 403
3. GET /teams/:id/performance with COACH role for a team they own returns HTTP 200
4. GET /teams/:id/performance with COACH role for a team they don't own returns HTTP 403
5. Response body matches the TeamPerformanceSummary shape (all fields present)
6. averageGapScore is null when no athletes on the team have calculated GAP scores
7. gapScoreDistribution counts sum to athleteCount (or fewer if some athletes have no score)
8. trendingUp + trendingDown <= athleteCount
9. dataFreshness.athletesWithFreshData + dataFreshness.athletesWithStaleData + dataFreshness.athletesWithNoData === athleteCount
10. TeamPerformanceSummary type is defined and exported from src/types/index.ts
11. tsc --noEmit passes after implementation
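The score bucketing and the nullable average are the two purely computational pieces of this endpoint. A minimal sketch, with bucketGapScores and averageOrNull as hypothetical helper names, and assuming the input holds only the latest score of each athlete that has one:

```typescript
interface GapScoreDistribution {
  excellent: number;
  good: number;
  moderate: number;
  poor: number;
}

// Each score lands in exactly one bucket, so non-integer scores such as 79.5 are still counted.
function bucketGapScores(latestScores: number[]): GapScoreDistribution {
  const distribution: GapScoreDistribution = { excellent: 0, good: 0, moderate: 0, poor: 0 };
  for (const score of latestScores) {
    if (score >= 80) distribution.excellent += 1;
    else if (score >= 60) distribution.good += 1;
    else if (score >= 40) distribution.moderate += 1;
    else distribution.poor += 1;
  }
  return distribution;
}

// Mean of the latest scores, or null when no athlete has a score yet.
function averageOrNull(latestScores: number[]): number | null {
  if (latestScores.length === 0) return null;
  return latestScores.reduce((sum, score) => sum + score, 0) / latestScores.length;
}
```

Because athletes without a score are simply absent from the input, the bucket counts sum to at most athleteCount, in line with criterion 7.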

6. Layer 5 Adversarial Setup

9.1 Bugs to Plant

Melody introduces these bugs BEFORE T17 runs. They are planted in otherwise working code from T06, T07, and T08 (Bug B4 lives in timeline.ts, which is a T08 output).

Bug B1: Wrong slice indices in trend detection
File: prototype/src/services/gapScore.ts
Function: detectTrend
Buggy code pattern: Replace historicalScores.slice(-7) (last 7) with historicalScores.slice(0, 7) (first 7)
Effect: Trend calculation always uses the oldest 7 scores as the "recent" window and the middle scores as baseline, which is a semantically inverted trend. A genuinely improving athlete will show as declining.
Subtlety: Only manifests when historicalScores.length > 7. Tests with fewer than 8 scores will pass fine. This is a logic error, not a crash.
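The effect of Bug B1 is easy to see on a concrete history. The sketch below only compares the two windows; it assumes scores are ordered oldest first and does not reproduce the actual detectTrend algorithm, which is specified in T06.

```typescript
const mean = (values: number[]): number =>
  values.reduce((sum, value) => sum + value, 0) / values.length;

// Assumed ordering: oldest first. Ten scores from a steadily improving athlete.
const historicalScores = [40, 42, 44, 46, 48, 50, 52, 54, 56, 58];

const recentCorrect = historicalScores.slice(-7);  // last 7: 46 through 58
const recentBuggy = historicalScores.slice(0, 7);  // first 7: 40 through 52

console.log(mean(historicalScores)); // 49
console.log(mean(recentCorrect));    // 52, above the overall mean
console.log(mean(recentBuggy));      // 46, below the overall mean

// With 7 or fewer scores the two slices are identical, which is why short histories hide the bug.
const shortHistory = [40, 42, 44];
console.log(shortHistory.slice(-7).join() === shortHistory.slice(0, 7).join()); // true
```

A test that catches B1 therefore needs at least 8 scores with a clear direction.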

Bug B2: Missing weight redistribution when component is absent
File: prototype/src/services/gapScore.ts
Function: calculateGapScore
Buggy code pattern: When a component is null/missing, the effective weight denominator is NOT recalculated. The code still divides by 1.0 (the sum of all weights) instead of the sum of present component weights.
Effect: When any metric is missing, the GAP score is systematically too low because the weights of present components don't sum to 1.0. Example: If HRV is missing (weight 0.30), the remaining 4 components should share 100% of the weight. With the bug, they only account for 70% of the total and the score is deflated by 30%.
Subtlety: Only affects scores when data is incomplete. Full-data scenarios pass correctly.
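The 30% deflation in the HRV example can be reproduced with a few lines. In this sketch only the HRV weight of 0.30 comes from the spec; the other component names are placeholders, the weightedScore function is hypothetical, and the redistribute flag exists purely to show the buggy and fixed behavior side by side.

```typescript
// Only HRV = 0.30 is named in the spec; the remaining keys are placeholder names.
const WEIGHTS: Record<string, number> = {
  HRV: 0.30,
  componentB: 0.20,
  componentC: 0.25,
  componentD: 0.15,
  componentE: 0.10,
};

// Normalized component scores on a 0-100 scale; null marks a missing component.
type ComponentScores = Record<string, number | null>;

function weightedScore(scores: ComponentScores, redistribute: boolean): number {
  let weightedSum = 0;
  let presentWeight = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    const value = scores[name];
    if (value == null) continue; // missing component contributes nothing
    weightedSum += weight * value;
    presentWeight += weight;
  }
  if (presentWeight === 0) return 0;
  // Fixed: divide by the weight actually present. Buggy: divide by the full 1.0.
  return redistribute ? weightedSum / presentWeight : weightedSum / 1.0;
}

const hrvMissing: ComponentScores = {
  HRV: null, componentB: 80, componentC: 80, componentD: 80, componentE: 80,
};
console.log(weightedScore(hrvMissing, true));  // approximately 80
console.log(weightedScore(hrvMissing, false)); // approximately 56, i.e. 70% of the correct score
```

When every present component scores 80, the redistributed result stays at about 80, which is the invariant a regression test for B2 can assert.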

Bug B3: Off-by-one in session overlap check
File: prototype/src/services/sessionManager.ts
Function: createSession
Buggy code pattern: The overlap query uses gte (>=) instead of gt (>) for the endTime > newStart comparison:

// BUG: should be { gt: newStart } not { gte: newStart }
existingEnd: { gte: newStart }

Effect: Two sessions that share an exact boundary (A ends at T, B starts at T) are incorrectly flagged as overlapping. Acceptance criterion T07-AC4 explicitly tests this boundary case.
Subtlety: Only fails for the exact boundary case. Normal overlaps and non-overlaps are unaffected.
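The intended rule treats sessions as half-open intervals, so back-to-back sessions do not overlap. A pure-function sketch of that check, with sessionsOverlap as a hypothetical name (the prototype expresses the same condition as a Prisma where clause):

```typescript
// Two sessions overlap only if each one starts strictly before the other ends.
function sessionsOverlap(
  existingStart: Date,
  existingEnd: Date,
  newStart: Date,
  newEnd: Date,
): boolean {
  return (
    existingStart.getTime() < newEnd.getTime() &&
    existingEnd.getTime() > newStart.getTime() // strict: ending at T does not block a start at T
  );
}

const nine = new Date("2026-03-20T09:00:00Z");
const ten = new Date("2026-03-20T10:00:00Z");
const eleven = new Date("2026-03-20T11:00:00Z");

console.log(sessionsOverlap(nine, ten, ten, eleven));  // false: shared boundary only
console.log(sessionsOverlap(nine, eleven, ten, eleven)); // true: genuine overlap
```

Replacing the strict comparison with >= would flip the first result to true, which is exactly the planted bug.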

Bug B4: String comparison instead of Date comparison in sort
File: prototype/src/services/timeline.ts
Function: getAthleteTimeline
Buggy code pattern: Readings are sorted by recordedAt using string comparison: a.recordedAt.toString() > b.recordedAt.toString() instead of new Date(a.recordedAt).getTime() > new Date(b.recordedAt).getTime()
Effect: When recordedAt values are a mix of ISO strings from different sources (Garmin vs Manual), string comparison may sort incorrectly if format varies. More importantly, Prisma returns DateTime fields as JavaScript Date objects, and Date.prototype.toString() produces a human-readable string that begins with the weekday name and depends on the server time zone (not ISO format), so lexicographic order no longer matches chronological order.
Subtlety: Only manifests with mixed data sources. Purely manual data with consistent ISO strings will sort correctly by luck.
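A comparator that avoids the string pitfall converts both sides to epoch milliseconds and returns their difference. This is a sketch of the intended fix, with TimedReading and byRecordedAtDesc as hypothetical names:

```typescript
// recordedAt may arrive as a Date (from Prisma) or as an ISO string (from a request body).
interface TimedReading {
  recordedAt: Date | string;
}

// Numeric comparator: newest first, independent of string formatting or time zone.
function byRecordedAtDesc(a: TimedReading, b: TimedReading): number {
  return new Date(b.recordedAt).getTime() - new Date(a.recordedAt).getTime();
}

const mixedReadings: TimedReading[] = [
  { recordedAt: new Date("2026-03-19T12:00:00Z") },
  { recordedAt: "2026-03-20T12:00:00Z" },
  { recordedAt: new Date("2026-03-18T12:00:00Z") },
];

const newestFirst = [...mixedReadings].sort(byRecordedAtDesc);
console.log(newestFirst.map((r) => new Date(r.recordedAt).toISOString()));
```

Note that a sort comparator should return a number; the buggy pattern's boolean result from > is a second reason the string version misbehaves.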

9.2 Messy Code Scenario for Refactor

Melody introduces these patterns BEFORE T18 runs. These are messy-but-working patterns, not bugs.

Pattern 1 — Inline Zod validation duplicated in 3 routes

In routes/athletes.ts, routes/sessions.ts, and routes/metrics.ts, replace all uses of the validateBody(schema) middleware with inline validation blocks:

// inline validation pattern to introduce (3x copies):
const parsed = SomeZodSchema.safeParse(req.body)
if (!parsed.success) {
  res.status(422).json({ 
    error: 'Validation Error', 
    message: parsed.error.issues.map(i => i.message).join(', '),
    statusCode: 422
  })
  return
}
const data = parsed.data

Pattern 2 — Magic numbers for GAP weights in multiple places

In routes/athletes.ts (in the calculate endpoint handler) and in services/gapScore.ts, directly use the numeric weights 0.30, 0.20, 0.25, 0.15, 0.10 in 4 different locations instead of importing GAP_WEIGHTS from types. Add a comment // TODO: centralize these near one of the occurrences.

Pattern 3 — Mixed async styles

In routes/sessions.ts, introduce one route handler that mixes .then() and async/await:

router.post('/', async (req, res, next) => {
  const session = await sessionManager.createSession(req.body)
  prisma.metricReading.findMany({ where: { sessionId: session.id } })
    .then(readings => {
      res.status(201).json({ ...session, metricCount: readings.length })
    })
    .catch(next)
})

Pattern 4 — as any type assertions

Add three as any type assertions in strategic locations:
- In routes/metrics.ts: const metric = req.body as any before accessing fields
- In services/timeline.ts: (reading as any).recordedAt in the sort comparison
- In routes/athletes.ts: const athlete = await prisma.athlete.findUnique(...) as any

What "clean" looks like after refactor:
- Single validateBody(schema) middleware used consistently (no inline blocks)
- GAP_WEIGHTS imported from types in every file that uses weights
- All route handlers are pure async/await, with no .then() chains
- Zero as any assertions; proper types used throughout
- Code is DRY: a helper function for patterns that appear 3+ times
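For reference, a validateBody middleware of the kind the clean state relies on might look like the sketch below. It is written against a structural SchemaLike interface instead of importing Zod, on the assumption that a Zod schema's safeParse result fits that shape; BodyRequest, JsonResponse, and Next are stand-ins for the Express types.

```typescript
// Structural stand-in for a Zod schema: only safeParse is needed here (assumed shape).
interface SchemaLike<T> {
  safeParse(input: unknown):
    | { success: true; data: T }
    | { success: false; error: { issues: { message: string }[] } };
}

// Stand-ins for the Express request, response, and next types.
interface BodyRequest {
  body: unknown;
}
interface JsonResponse {
  status(code: number): JsonResponse;
  json(body: unknown): JsonResponse;
}
type Next = () => void;

// One reusable middleware replaces the three inline validation blocks from Pattern 1.
function validateBody<T>(schema: SchemaLike<T>) {
  return (req: BodyRequest, res: JsonResponse, next: Next): void => {
    const parsed = schema.safeParse(req.body);
    if (!parsed.success) {
      res.status(422).json({
        error: "Validation Error",
        message: parsed.error.issues.map((issue) => issue.message).join(", "),
        statusCode: 422,
      });
      return;
    }
    req.body = parsed.data; // downstream handlers see the parsed, typed value
    next();
  };
}
```

The 422 body mirrors the inline block from Pattern 1, so swapping the inline copies for this middleware should leave the API contract unchanged.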

9.3 Deliberately Ambiguous Product Requirement

Exact text for T19 prompt (do not add context):

"Add a fatigue indicator to athlete profiles that coaches can see but athletes cannot. The fatigue indicator should reflect how tired or overloaded the athlete currently is."

Why it's ambiguous:
1. What is "fatigue"? Not defined. Could be: declining GAP score trend, training load above threshold, combined multi-metric signal, explicit fatigue rating entered by athlete, or something else.
2. What does "profiles" mean? Could be the GET /athletes/:id endpoint, a new endpoint, or a new field on the Athlete DB record.
3. What format? Boolean (fatigued/not), enum (low/medium/high), 0-100 score, or textual description?
4. When does it update? On every request (real-time calculation), on metric ingestion (event-driven), daily batch, or manual trigger?
5. "Coaches can see but athletes cannot": does this mean a 403 error for athletes, or just omit the field from athlete responses?

Test target behavior:
- A weak model will implement something without questioning the ambiguity
- An average model will make explicit assumptions but may make them poorly
- A good model will either ask for clarification OR explicitly enumerate its interpretations, choose one defensibly, and implement it correctly
- The scoring rubric in T19 captures this gradient

7. Dependency Graph

Sequential (must complete in order):

T01 → T02 → T03 → T04 → T05
                    ↓
              T06 → T07 → T08 → T09
                    ↓
              T10 → T11 → T12
                    ↓
              T14 → T13 → T16
                    ↓
                   T15

Layer 5 (all Layer 1-4 must complete first):
T09 complete → T17 (bugs planted by Melody first)
T07, T08 complete → T18 (messy code planted by Melody first)
T12 complete → T19
All T01-T16 complete → T20

Parallel execution opportunities:

PARALLEL BATCH A (after T03):
  T04 and T05 can run in parallel

PARALLEL BATCH B (after T04 completes):
  T06 and (T10 → T11 → T12) can begin simultaneously
  T14 can begin independently of T06

PARALLEL BATCH C (after T06, T07 complete):
  T08 and T09 can run in parallel

PARALLEL BATCH D (after T14 completes):
  T13 and T15 can run in parallel

PARALLEL BATCH E (after T13, T14 complete):
  T16 can begin

Critical path (longest sequential chain):

T01 → T02 → T03 → T04 → T06 → T07 → T08 → T09 → T17 [Layer 5]

Critical path estimated time: 10+10+15+15+20+15+15+15+20 = 135 minutes on local model

Layer 5 gate:

All Layer 1-4 tasks must complete (and pass Quinn QA) before Layer 5 tasks are initiated. Melody plants the bugs and messy code as a separate setup step between Layer 4 completion and Layer 5 start.

8. Execution Constraints

Model tier summary:

| Layer | Tasks | Tier | Justification |
|---|---|---|---|
| Layer 1 | T01-T05 | `[local-ok]` | Clear schema, CRUD, scaffold; well within 30B capability |
| Layer 2 | T06-T09 | `[local-ok]` | Business logic with detailed spec; pure function tests validate output |
| Layer 3 | T10-T12 | `[cloud-required]` | Security-sensitive; subtle auth bugs are production-critical risks |
| Layer 4 | T13-T16 | `[local-ok]` | Transform + integration; deterministic with clear I/O contracts |
| Layer 5 | T17-T18 | `[local-ok]` | Primary test of local model debugging/refactor capability |
| Layer 5 | T19 | Mixed | Local model implements; Jules/Quinn evaluate interpretation quality |
| Layer 5 | T20 | `[cloud-required]` | Multi-file cross-layer coordination exceeds 30B reliable context window |

Token/context budget notes:

T06 (GAP Score) is the most complex [local-ok] task. The prompt + types file + service spec is ~6K tokens — within safe range for 30B.
T20 (Team Performance Summary) requires holding auth, RLS, routes, types, and services in context simultaneously — estimated 20K+ tokens. [cloud-required] is mandatory.
If qwen3-coder:30b fails on T06 or T09, escalate to [cloud-required] and flag as CONDITIONAL GO finding.

Speed budget (local model):

At 10 tok/s average, a 300-line service file takes ~3 min generation
20 tasks × ~8 min avg = ~160 min total generation time (an upper bound: it counts all 20 tasks at local speed, including the 4 cloud tasks)
Add Quinn QA time + iteration: budget 4-5 hours for full run with local model
Cloud tasks (T10-T12, T20): budget 30 min total at cloud generation speed
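A rough sketch of the per-file estimate behind these numbers. The spec states throughput and file length but not tokens per line; the value of 6 used here is an assumption, chosen because it is what "300 lines in ~3 min at 10 tok/s" implies. The function name `generationMinutes` is illustrative.

```typescript
// Estimate generation time from output size and model throughput.
// ASSUMPTION: tokensPerLine is not given in the spec; ~6 is implied by the
// stated figures (300 lines, 10 tok/s, ~3 min).
function generationMinutes(
  lines: number,
  tokensPerLine: number,
  tokensPerSecond: number,
): number {
  if (tokensPerSecond <= 0) {
    throw new RangeError("tokensPerSecond must be positive");
  }
  const totalTokens = lines * tokensPerLine;
  return totalTokens / tokensPerSecond / 60;
}

const serviceFileMinutes = generationMinutes(300, 6, 10);
console.log(`300-line service file: ~${serviceFileMinutes} min`);
```

If measured throughput on the actual hardware differs from 10 tok/s, rerunning this with the observed rate gives a quick sanity check on the 4-5 hour budget.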

9. Risk & Edge Cases

R1: qwen3-coder:30b fails on GAP Score service (T06)

Probability: Medium. The weight redistribution logic and inverted scale normalization are non-trivial.
Detection: Unit tests from T09 catch this immediately.
Mitigation: If T06 output fails >3 of 11 acceptance criteria, escalate to cloud model. Flag as CONDITIONAL GO data point.

R2: Local model introduces security mistakes in auth (T10-T12)

Probability: High enough that we've already tagged these [cloud-required].
Detection: Quinn's auth-specific AC (password not in plain text, timing-safe comparison, token expiry).
Mitigation: These are already cloud-gated. If the pipeline accidentally runs them on local, Quinn should catch it and flag.

R3: Bug hunt (T17) produces false fixes: model "fixes" something that wasn't broken

Probability: Medium. Models sometimes change working code when debugging.
Detection: npm test passes before bug planting. Quinn runs the full test suite before and after T17. Any new test failures introduced by T17 are false fixes.
Mitigation: Version control. Quinn compares git diff for T17 output and verifies changes are limited to the buggy sections.
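The before/after comparison for R3 can be sketched as a set difference over failing test names. How the names are collected (for example, by parsing test reporter output) is outside this sketch, and `newFailures` plus the test names in the usage lines are hypothetical.

```typescript
// Return tests that fail after T17 but did not fail before it.
// These are the "false fix" candidates: working code the model broke.
function newFailures(
  failingBefore: Iterable<string>,
  failingAfter: Iterable<string>,
): string[] {
  const before = new Set(failingBefore);
  return Array.from(new Set(failingAfter))
    .filter((testName) => !before.has(testName))
    .sort();
}

// Hypothetical test names, for illustration only.
const failingWithPlantedBugs = ["gapScore: redistributes weights"];
const failingAfterT17 = ["timeline: returns last N readings"];

const falseFixes = newFailures(failingWithPlantedBugs, failingAfterT17);
console.log(falseFixes);
```

An empty result means T17 introduced no regressions; it does not by itself confirm that the planted bugs were fixed, which the remaining failures in the "after" run would show.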

R4: Ambiguous requirement (T19) produces untestable output

Probability: Medium. If the model outputs pure prose without implementation, T19 fails.
Mitigation: The minimum acceptance criteria require a compiling implementation. Prose-only = automatic fail.

R5: SQLite enum handling with Prisma

Known limitation: Prisma on SQLite stores enums as strings, not native enum types. This is expected behavior but could confuse the model into thinking enum validation isn't needed.
Mitigation: Spec explicitly calls for Zod enum validation on all enum fields in request bodies. Quinn verifies that invalid enum values return 422, not a DB error.

R6: Context overflow on T20

Probability: High on local model (which is why it's [cloud-required]). Overflow is still possible on cloud if the prompt is poorly structured.
Mitigation: T20 prompt for Melody should include explicit file list and focused scope. Full codebase context is NOT needed; only the specific files that need changes.

R7: Garmin webhook HMAC test setup

The webhook HMAC test requires generating a valid signature for the test payload. Quinn must know the test secret value.
Mitigation: The .env.example includes GARMIN_WEBHOOK_SECRET=changeme_garmin. Tests use this fixed value. Quinn generates the expected HMAC in test setup using the same secret.
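A minimal sketch of the test-setup signing step, using Node's built-in `crypto` module. The spec fixes only the secret; HMAC-SHA256 over the raw body with hex encoding is an assumption here, as are the helper names `signPayload` and `isValidSignature` and the sample payload.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// ASSUMPTION: HMAC-SHA256 over the raw request body, hex-encoded.
// The spec does not name the algorithm or encoding.
function signPayload(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Constant-time comparison. Lengths are checked first because
// timingSafeEqual throws when the buffers differ in length.
function isValidSignature(
  rawBody: string,
  secret: string,
  receivedHex: string,
): boolean {
  const expected = Buffer.from(signPayload(rawBody, secret), "hex");
  const received = Buffer.from(receivedHex, "hex");
  return expected.length === received.length && timingSafeEqual(expected, received);
}

const testSecret = "changeme_garmin"; // value from .env.example
const testBody = JSON.stringify({ userId: "demo", steps: 1200 }); // hypothetical payload
const testSignature = signPayload(testBody, testSecret);
console.log(isValidSignature(testBody, testSecret, testSignature));
```

The signature must be computed over the exact bytes sent in the request, so a test should sign the serialized string it posts rather than re-serializing the object on the receiving side.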

R8: 30B model writes tests that match its own wrong implementation

Probability: Low but nonzero in T09. If the model writes gapScore.ts incorrectly AND writes matching tests, unit tests pass but the logic is wrong.
Mitigation: Quinn validates unit tests against the spec's acceptance criteria (AC from T06), not just that tests pass. If tests pass but don't cover AC T06-1 through T06-11, that's a spec gap finding.

10. Scoring Rubric

10.1 Per-Task Scoring (Quinn evaluates each task)

Each task is scored on 5 dimensions:

| Dimension | 0 | 1 | 2 |
|---|---|---|---|
| Correctness | Fails >2 AC | Fails 1-2 AC | All AC pass |
| Completeness | Missing files or exports | Minor gaps | All deliverables present |
| Code Quality | Lint errors, type errors, or `as any` abuse | Minor style issues | Clean, typed, no warnings |
| Edge Cases | Ignored spec edge cases | Some handled | All spec'd edge cases handled |
| Production Proximity | Major rework needed to ship | Minor cleanup needed | Near-production quality |

Max per task: 10 points
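The rubric can be expressed as a small type plus a sum, which also makes the 10-point ceiling explicit (5 dimensions × 2). The type and function names below are illustrative, not part of the spec.

```typescript
type DimensionScore = 0 | 1 | 2;

interface TaskScore {
  correctness: DimensionScore;
  completeness: DimensionScore;
  codeQuality: DimensionScore;
  edgeCases: DimensionScore;
  productionProximity: DimensionScore;
}

// Total is the plain sum of the five dimensions (0-10).
function totalScore(score: TaskScore): number {
  return (
    score.correctness +
    score.completeness +
    score.codeQuality +
    score.edgeCases +
    score.productionProximity
  );
}

// Correctness row of the rubric: >2 failed AC -> 0, 1-2 failed -> 1, none -> 2.
function correctnessFromFailedAc(failedAc: number): DimensionScore {
  if (failedAc > 2) return 0;
  return failedAc >= 1 ? 1 : 2;
}

const exampleScore: TaskScore = {
  correctness: correctnessFromFailedAc(1),
  completeness: 2,
  codeQuality: 2,
  edgeCases: 1,
  productionProximity: 1,
};
const exampleTotal = totalScore(exampleScore);
console.log(`${exampleTotal}/10`);
```

Only the Correctness dimension has a mechanical mapping in the rubric; the other four remain Quinn's judgment and are entered directly.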

10.2 Pipeline Scoring (Jules evaluates after full run)

| Metric | Measurement | Weight |
|---|---|---|
| Spec clarity | # of clarifying questions Melody asked Jules during execution | -1 per question (max -10) |
| Orchestration overhead | # of Jules interventions needed to unblock execution | -2 per intervention (max -10) |
| Estimation accuracy | % of tasks within 2x of estimated time | % score × 10 |
| Cloud escalation rate | % of `[local-ok]` tasks that actually needed cloud | -2 per escalated task |
| Feedback loop effectiveness | Did spec quality visibly improve layer to layer? | 0-5 subjective score |

Max pipeline score: 25 points

10.3 Layer 5 Adversarial Scoring

| Task | Max Points | Pass Threshold |
|---|---|---|
| T17 Bug Hunt | 20 | ≥12 (60%) |
| T18 Refactor | 14 | ≥8 (57%) |
| T19 Ambiguous Req | 22 | ≥12 (55%) |
| T20 Multi-file | 10 | ≥6 (60%) |

Layer 5 total max: 66 points

10.4 Go/No-Go Mapping

Local Model Pass Rate:
- Count tasks tagged [local-ok] that scored ≥6/10 (passes)
- Count total [local-ok] tasks (16 tasks: T01-T09, T13-T18, T19 partially)
- Pass rate = passes / total

| Pass Rate | Verdict | Next Steps |
|---|---|---|
| ≥70% (≥12/16) | GO | Proceed to APA sprint planning with qwen3-coder:30b as primary for `[local-ok]` tasks |
| 50-69% (8-11/16) | CONDITIONAL GO | Proceed with explicit cloud escalation policy for task types that failed; document failure patterns |
| <50% (<8/16) | NO GO | Local model insufficient; re-evaluate with different local model or full cloud approach |

Additional triggers that force CONDITIONAL GO regardless of pass rate:
- T06 (GAP Score) fails: core business logic is the highest-value use case; failure here is significant even if pass rate is technically ≥70%
- T17 (Bug Hunt) scores <6/20: debugging ability is essential for sprint work
- T10-T12 fail on the local model: theoretical, since they are cloud-required; applies only if Jules manually tests them with the local model as a side experiment

Triggers that force NO GO regardless of pass rate:
- Any security bug makes it to production-proximate output in T10-T12: the cloud model executes these tasks, so a spec unclear enough to allow security failures is a spec problem
- T20 (Team Performance) completely fails with cloud model: suggests a spec design problem that will block real APA sprints
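The mapping in this section can be sketched as one function over a run summary. Two points are assumptions rather than spec: NO GO triggers are checked before CONDITIONAL GO triggers, and the side-experiment trigger for T10-T12 on the local model is left out because it only applies when that optional test is run. Thresholds use the percentages from the table; the `RunSummary` shape and `verdictFor` name are illustrative.

```typescript
type Verdict = "GO" | "CONDITIONAL GO" | "NO GO";

interface RunSummary {
  localPasses: number;       // [local-ok] tasks scoring >= 6/10
  localTotal: number;        // total [local-ok] tasks
  t06Failed: boolean;        // GAP Score task failed
  t17Score: number;          // Bug Hunt score out of 20
  authSecurityBug: boolean;  // security bug reached T10-T12 output
  t20FailedOnCloud: boolean; // T20 completely failed with cloud model
}

function verdictFor(run: RunSummary): Verdict {
  // ASSUMPTION: forced NO GO takes precedence over everything else.
  if (run.authSecurityBug || run.t20FailedOnCloud) return "NO GO";

  const passRate = run.localPasses / run.localTotal;
  if (passRate < 0.5) return "NO GO";

  // Forced CONDITIONAL GO downgrades an otherwise passing run.
  const forcedConditional = run.t06Failed || run.t17Score < 6;
  if (passRate >= 0.7 && !forcedConditional) return "GO";
  return "CONDITIONAL GO";
}

const cleanRun: RunSummary = {
  localPasses: 13, localTotal: 16, t06Failed: false,
  t17Score: 14, authSecurityBug: false, t20FailedOnCloud: false,
};
console.log(verdictFor(cleanRun));
```

Keeping the forced triggers as explicit inputs means the verdict cannot be reached from the pass rate alone, which matches the intent that a T06 or T17 failure matters even in an otherwise strong run.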

10.5 Pipeline Health Metrics (thresholds)

| Metric | Healthy | Warning | Critical |
|---|---|---|---|
| Spec questions asked | 0 | 1-2 | >2 |
| Jules interventions | 0-1 | 2-3 | >3 |
| Estimation accuracy | >70% within 2x | 50-70% | <50% |
| Cloud escalation of `[local-ok]` tasks | 0 | 1-2 | >2 |

11. Output Directory Structure

All output files in ~/.openclaw/workspace/test-runs/local-model-benchmark/prototype/:

prototype/
  .env.example
  package.json
  tsconfig.json
  prisma/
    schema.prisma          ← T01
    seed.ts                ← optional, not required
    migrations/            ← T01 (auto-generated)
  src/
    types/
      index.ts             ← T02
    middleware/
      auth.ts              ← T11
      errorHandler.ts      ← T03
      rateLimiter.ts       ← T15
      requestLogger.ts     ← T03
      rls.ts               ← T12
      roleGuard.ts         ← T11
      validateBody.ts      ← T04
    services/
      gapScore.ts          ← T06
      garminTransform.ts   ← T14
      sessionManager.ts    ← T07
      teamPerformance.ts   ← T20
      timeline.ts          ← T08
    routes/
      athletes.ts          ← T04, T08 additions, T12 additions
      auth.ts              ← T10
      metrics.ts           ← T07
      sessions.ts          ← T07
      teams.ts             ← T05, T20 additions
      webhooks.ts          ← T13
    app.ts                 ← T03
    server.ts              ← T03
  tests/
    unit/
      gapScore.test.ts     ← T09
    integration/
      athletes.test.ts     ← optional, not required for shakedown

12. Pre-Run Checklist (Melody completes before starting)

~/.openclaw/workspace/test-runs/local-model-benchmark/prototype/ directory created
Ollama running with qwen3-coder:30b loaded (ollama ps shows it active)
GARMIN_WEBHOOK_SECRET=shakedown-test-secret-2026 set in .env for consistent test HMAC values
Git initialized in prototype/ so diff commands work for T17/T18 validation
Quinn has access to the test directory before Layer 5 starts for the pre-bug baseline run

Spec complete. Forge out. "The blueprint is the product. Everything else is just typing."