GAP Score — Public Dataset Validation Research

Date: 2026-03-19
Status: Complete — Ready for Review
Origin: Jeff priority directive (2026-03-19)
Assigned: Atlas


Research Question

Can we retroactively validate the GAP Score algorithm against existing public datasets that contain wearable biometric data (HRV, sleep, activity) and injury/performance outcomes?

Short answer: Partially. No single public dataset covers all GAP Score inputs simultaneously. Full retroactive validation isn't possible — but partial validation of the CV Index component is, and the prospective path with Jeff's Garmin data is the fastest credible route to full validation.


1. Public Dataset Inventory

| Dataset | Source | Key Biometric Signals | Sample Size | Outcome Data | Access |
|---|---|---|---|---|---|
| PMData | datasets.simula.no/pmdata | HR, activity (Fitbit Versa 2), subjective sleep quality (self-report), wellness questionnaires | 16 runners, 5 months | Injury reports, RPE, perceived fatigue via PMSys app | Free download, no IRB required |
| Soccer Athlete Health Dataset | nature.com/articles/s41597-024-03386-x | GPS training load (ATL/CTL/ACWR), RPE, subjective wellness/HRV (Polar), injury status | ~120 athletes, multi-season | Injury incidence, match availability, training load spikes | Free download, CC license |
| LifeSnaps | zenodo.org/record/6826682 | Fitbit HRV, sleep (light/deep/REM via Fitbit algorithm), activity, EMA stress surveys | ~71 participants, 4 months | Wellbeing scores, stress, mood — no injury outcomes | Free download via Zenodo |
| Garmin-RUNSAFE Running Health Study | jospt.org/doi/10.2519/jospt.2023.11959 | Running dynamics (GCT, vertical oscillation, cadence, stride length), training load | 7,000+ runners, 18 months | Running-related injury (self-reported, 3-tier classification) | Not publicly available — proprietary to study/Garmin; papers publish findings only |
| HybridSense / Nature HRV+Sleep Diary | nature.com/articles/s41597-025-05801-3 | Smartwatch HRV (100ms sampling), self-reported sleep diary | N=TBD (new 2025) | Sleep quality — no injury or performance outcomes | Open access, data availability pending publication |
| WESAD | dl.acm.org/doi/10.1145/3242969.3242985 | EDA, BVP, IBI, HR, temp (Empatica E4 + Respiban) | 15 subjects, lab study | Stress/affect states (lab-induced) — no injury, no sleep staging | Free download from authors |
| PhysioNet Wearable Stress+Exercise | physionet.org/content/wearable-device-dataset/1.0.1 | EDA, BVP, HR, temp, ACC (Empatica E4) | 36 healthy volunteers | Acute stress/exercise conditions — no injury, no longitudinal sleep | Free, no credentialed access required |

2. Signal Overlap Analysis — Top 3 Recommendations

The GAP Score algorithm requires: overnight HRV (RMSSD), sleep staging (deep/REM split), running dynamics (GCT, vertical oscillation), and injury/performance outcome.
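The required inputs can be pictured as a single per-day record. A minimal sketch (field names and units are illustrative assumptions, not the production schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GapInputs:
    """One day's worth of signals feeding the GAP Score."""
    rmssd_ms: float                      # overnight HRV (RMSSD, ms)
    deep_sleep_min: float                # deep-sleep minutes from staging
    rem_sleep_min: float                 # REM minutes from staging
    gct_ms: float                        # ground contact time (ms)
    vertical_osc_cm: float               # vertical oscillation (cm)
    injury_event: Optional[bool] = None  # outcome label, when known
```

No public dataset fills every field of this record for the same subjects over the same period, which is the core finding of the overlap analysis.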

Signal overlap rated 1–5 where: 1 = minimal overlap, 5 = near-complete match.

Rank 1: PMData — Signal Overlap: 2/5

Best available for partial CV Index validation.

  • ✅ Longitudinal (5 months, real-world)
  • ✅ Injury reports + wellness/fatigue questionnaires (proxy outcome)
  • ✅ Activity tracking (step-based training load proxy)
  • ⚠️ HR data present but HRV is subjective/self-reported — no RMSSD
  • ❌ No sleep staging (deep sleep / REM split) — only subjective sleep quality score
  • ❌ No running dynamics (GCT, vertical oscillation, stride length)
  • ❌ Small N (16) — statistical power is limited

Verdict: Can test a simplified version of the CV Index (RHR proxy + subjective wellness). Useful as proof-of-concept for the injury-prediction logic but not the full GAP Score.


Rank 2: Soccer Athlete Health Dataset — Signal Overlap: 2/5

Best available for training load + injury correlation testing.

  • ✅ Large N (~120 athletes, multi-season)
  • ✅ Injury incidence data (well-documented)
  • ✅ Acute:Chronic Workload Ratio — structural parallel to GAP Score's CV/CNS split concept
  • ✅ Subjective wellness + perceived recovery
  • ⚠️ Sport-specific (soccer) — not running; no running dynamics translatable to GCT
  • ❌ No sleep staging
  • ❌ No overnight HRV (RMSSD) from wearable
  • ❌ GAP Score CNS Index is running-specific — soccer biomechanics don't map

Verdict: Best for testing the conceptual architecture — does separating training load from recovery state predict injury? Yes, but can't validate the specific formula, only the framework.
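For concreteness, the ACWR baseline this dataset provides is just a ratio of rolling load averages. A minimal sketch using the classic coupled 7-day/28-day formulation (window lengths vary across studies):

```python
def acwr(daily_load, acute_days=7, chronic_days=28):
    """Acute:Chronic Workload Ratio for the most recent day.

    daily_load: per-day training loads, oldest first. Coupled
    formulation: the acute window is included in the chronic window,
    as in most published ACWR work.
    """
    if len(daily_load) < chronic_days:
        raise ValueError("need at least chronic_days of history")
    acute = sum(daily_load[-acute_days:]) / acute_days
    chronic = sum(daily_load[-chronic_days:]) / chronic_days
    return acute / chronic

# A sudden load spike pushes the ratio into the often-cited
# elevated-risk zone (> ~1.5):
steady = [100.0] * 28                 # four flat weeks -> ACWR 1.0
spiked = [100.0] * 21 + [200.0] * 7   # doubled final week -> ACWR 1.6
```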


Rank 3: LifeSnaps — Signal Overlap: 2/5

Best for HRV + sleep quality + stress signal chain.

  • ✅ Fitbit HRV + sleep data (4-stage algorithm: wake, light, deep, REM)
  • ✅ Longitudinal (4 months, free-living)
  • ✅ Stress and wellbeing EMA surveys
  • ❌ No injury outcomes — can only test HRV-sleep-stress correlations, not injury prediction
  • ❌ No running dynamics
  • ❌ Fitbit sleep staging is less accurate than Garmin/Oura (validation studies show Fitbit underperforms on deep-sleep detection)

Verdict: Can validate the input signal chain (does HRV predict next-day reported wellness/fatigue?) but not the output (injury risk). Good for algorithm calibration, not validation.


Why No Dataset Scores Higher Than 2/5

The fundamental gap: running dynamics (GCT, vertical oscillation) — the CNS Index backbone — does not exist in any public dataset alongside longitudinal HRV, sleep staging, AND injury outcomes. The Garmin-RUNSAFE study is the only research that comes close, and its raw data is not publicly released. Every existing public dataset makes at least one fatal compromise for GAP Score validation purposes.

This isn't a research oversight — it reflects how siloed wearable data collection has been. Most studies capture either (a) physiological readiness (HRV/sleep) or (b) biomechanical performance (running dynamics) or (c) injury outcomes, rarely all three in one longitudinal dataset.


3. Validation Approach

What "Validated" Would Mean Statistically

A valid GAP Score demonstrates predictive validity — meaning the score produced before a training session predicts injury occurrence or performance degradation during/after that session, better than chance and better than a simpler baseline (e.g., just HRV, or just training load).

Minimum viable validation thresholds:

  • AUC-ROC ≥ 0.70 for injury classification (better than ACWR alone at ~0.62)
  • Pearson r ≥ 0.40 between GAP Score and next-day RPE/performance output
  • Hazard ratio ≥ 2.0 for injury risk in high-GAP vs. low-GAP days (Cox proportional hazards model)
  • N ≥ 50 subjects, ≥ 6 months longitudinal data for meaningful injury incidence
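The first threshold, AUC-ROC, has a simple rank interpretation: the probability that a randomly chosen injury day gets a higher score than a randomly chosen healthy day. A self-contained sketch with illustrative toy numbers (not real results):

```python
def auc_roc(scores, labels):
    """AUC-ROC via its rank interpretation: the probability that a
    random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy day-level data (illustrative only): the candidate score must
# clear 0.70 AND beat the ACWR baseline on the same days.
injured     = [1, 1, 0, 0, 0, 0]
gap_scores  = [0.9, 0.7, 0.6, 0.3, 0.2, 0.1]   # -> AUC 1.0
acwr_scores = [0.6, 0.5, 0.7, 0.4, 0.3, 0.2]   # -> AUC 0.75
```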

Study Design (Retroactive, with Available Data)

Option A — Partial CV Index validation using PMData:

  1. Compute a simplified CV Index: RHR (from Fitbit HR data, overnight min) + subjective fatigue/wellness as HRV proxy
  2. Define outcome: injury report days (binary) or "yellow/red" wellness threshold
  3. Run logistic regression: does CV Index delta predict the next injury event?
  4. Baseline comparison: does plain training load (ACWR) perform similarly?
  5. Result: validates injury-prediction logic but not the full GAP Score formula
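The regression step of Option A reduces to a one-feature logistic model. A minimal sketch with a hand-rolled gradient-descent fit standing in for statsmodels/sklearn; the feature values and labels here are hypothetical, not PMData:

```python
import math

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Single-feature logistic regression fitted by batch gradient
    descent. Returns (weight, intercept)."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += (p - y)
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# Hypothetical day-level data: CV Index delta (negative = recovery
# drop) vs. whether an injury was reported the next day.
cv_delta = [-0.8, -0.6, -0.5, -0.1, 0.2, 0.4, 0.5, 0.9]
injury   = [1, 1, 1, 0, 0, 0, 0, 0]
w, b = fit_logistic(cv_delta, injury)  # w < 0: drops raise injury odds
```

The baseline comparison in step 4 is the same fit with ACWR substituted for the CV Index delta, scored with AUC-ROC on held-out days.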

Option B — Framework validation using Soccer Dataset:

  1. Test whether separating ACWR (load) from wellness (recovery) improves injury prediction vs. ACWR alone
  2. Calculate whether an ACWR-wellness GAP (structural analog to GAP Score) predicts injury better
  3. Result: validates the concept (CV/CNS split matters) but not the specific running algorithm
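The "ACWR-wellness GAP" in Option B can be any monotone combination of the two signals; the simplest is a weighted difference. In the sketch below, the linear form and the weights are assumptions for illustration, not the production formula:

```python
def gap_analog(acwr_value, wellness, w_load=1.0, w_recovery=1.0):
    """Structural analog of the GAP Score for the soccer dataset:
    the 'gap' widens when load is high AND recovery is poor.
    wellness is assumed normalized to [0, 1] (1 = fully recovered)."""
    return w_load * acwr_value - w_recovery * wellness

# Same load spike, different recovery states: only the poorly
# recovered athlete should flag as high risk.
fatigued_score  = gap_analog(1.6, 0.2)
recovered_score = gap_analog(1.6, 0.9)
```

If this composite predicts injury better than ACWR alone across the ~120 athletes, the framework is supported even though the running-specific formula remains untested.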

Option C — Full prospective validation with Jeff's data (recommended):

  1. Collect: Garmin Fenix 8 data (GCT, vertical oscillation, overnight HRV, sleep staging via Garmin Connect API)
  2. Log: training sessions, perceived recovery, any soreness/injury events (weekly self-report)
  3. Duration: 3–6 months minimum; 6 months preferred for meaningful injury incidence
  4. N=1 initially (Jeff), then expand to 5–10 volunteer runners
  5. Statistical approach: LASSO regression for coefficient tuning (w1–w5), ROC analysis for threshold setting
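The LASSO tuning in step 5 can be prototyped without sklearn. A self-contained coordinate-descent sketch; the five candidate features are hypothetical stand-ins for the w1–w5 inputs, and the synthetic outcome is illustrative only:

```python
import numpy as np

def lasso_cd(X, y, alpha=0.1, iters=200):
    """LASSO regression via coordinate descent with soft-thresholding.
    Minimizes (1/2n)||y - Xw||^2 + alpha * ||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]   # residual excluding feature j
            rho = X[:, j] @ r / n
            z = (X[:, j] @ X[:, j]) / n
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / z
    return w

# Hypothetical training matrix: five candidate daily features vs. a
# next-day outcome in which only the first two actually matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1]
weights = lasso_cd(X, y)  # irrelevant features shrink toward zero
```

The L1 penalty is the point: if a coefficient survives shrinkage, the corresponding signal is earning its place in the formula; if it zeroes out, the formula can be simplified before the volunteer expansion.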


4. Bottom Line

Can we do retroactive public dataset validation? Partially — but not fully.

The CNS Index (the part that makes GAP Score different from everything else) requires running dynamics data that simply isn't available in any open public dataset alongside HRV, sleep staging, and injury outcomes. The Garmin-RUNSAFE study comes closest but the data isn't released.

What we CAN do retroactively:

  • Validate the CV Index component and injury-prediction logic using PMData (quick, cheap, 2–3 days of work)
  • Validate the conceptual architecture (does the CV/CNS gap predict outcomes better than single metrics?) using the soccer dataset
  • These partial validations give credible "supporting evidence" language, not "fully validated" claims

Fastest path to meaningful validation:

  1. Immediate (1 week): Run PMData retroactive analysis to prove CV Index injury-prediction logic. Generates a chart/result we can show.
  2. Short-term (2–4 months): Collect Jeff's Garmin data prospectively. This is already the plan (see README.md). N=1 with clean data is more credible than N=16 with incomplete signals.
  3. Medium-term (6–12 months): Expand to 5–10 volunteer runners. At that point, publish a small case study or whitepaper. This is when "clinically validated" language becomes defensible.

One more thing worth flagging: the Garmin-RUNSAFE team has published its findings, and the PI may be reachable for data sharing under a research collaboration agreement. If we want a fast lane to running dynamics + injury data, a cold-email data-sharing request to the University of Southern Denmark (Rasmus Ørntoft's group) is worth sending. Long shot, but low cost.


Sources Cited

  • PMData: Johansen HD et al. (2020). PMData: a sports logging dataset. MMSports '20. https://datasets.simula.no/pmdata/
  • Soccer Athlete Health Dataset: Nature Scientific Data (2024). https://www.nature.com/articles/s41597-024-03386-x
  • LifeSnaps: Servia-Rodríguez S et al. (2022). Scientific Data. https://www.nature.com/articles/s41597-022-01764-x | Zenodo: https://doi.org/10.5281/zenodo.6826682
  • Garmin-RUNSAFE: Ørntoft C et al. (2019). BMJ Open. https://bmjopen.bmj.com/content/9/9/e032627
  • Garmin-RUNSAFE Results: Møller M et al. (2023). JOSPT. https://www.jospt.org/doi/10.2519/jospt.2023.11959
  • HybridSense/HRV+Sleep: Scientific Data (2025). https://www.nature.com/articles/s41597-025-05801-3
  • WESAD: Schmidt P et al. (2018). ICMI '18. https://dl.acm.org/doi/10.1145/3242969.3242985
  • PhysioNet Wearable Dataset: https://physionet.org/content/wearable-device-dataset/1.0.1/
  • Wearable validation for running injury (ScienceDirect 2023): https://www.sciencedirect.com/science/article/pii/S1466853X23001578