Purged k-Fold CV
research process · L2 · idiom · stub
Replaces the belief that scikit-learn's `KFold` is safe for time-series data.
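Vanilla k-fold assumes samples are IID. Financial labels are not: a label computed over several future bars shares its observation window with its neighbours' labels, so neighbouring labels correlate. Shuffling, as with `KFold(shuffle=True)`, scatters those neighbours across train and test, and even unshuffled folds leak at every fold boundary. Purged k-fold (López de Prado, AFML ch. 7) removes every training sample whose label window overlaps the time span of the test labels, then adds an embargo after each test fold, dropping the training samples that immediately follow it, to prevent leakage through serial correlation. This is the fix that makes CV credible for financial backtesting.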
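The purge-then-embargo procedure can be sketched as an index generator. This is a minimal sketch under stated assumptions, not the AFML reference implementation: time is measured in integer bars, each sample `i` carries a hypothetical label window `[t0[i], t1[i]]` (inclusive), samples are sorted by `t0`, and `embargo` is a number of bars. The function name `purged_kfold_indices` is illustrative only.

```python
import numpy as np


def purged_kfold_indices(t0, t1, n_splits=5, embargo=0):
    """Yield (train_idx, test_idx) pairs for purged k-fold CV with an embargo.

    Hypothetical helper. t0[i], t1[i]: first and last bar (inclusive) of the
    observation window behind sample i's label; samples sorted by t0.
    embargo: bars after each test window whose samples are also dropped.
    """
    t0 = np.asarray(t0)
    t1 = np.asarray(t1)
    n = len(t0)
    # Contiguous, unshuffled folds so each test fold is one block of time.
    for test_idx in np.array_split(np.arange(n), n_splits):
        test_start = t0[test_idx].min()
        test_end = t1[test_idx].max()
        # Purge: samples whose label window overlaps the test labels' time span.
        overlaps = (t0 <= test_end) & (t1 >= test_start)
        # Embargo: samples starting within `embargo` bars after the test window.
        embargoed = (t0 > test_end) & (t0 <= test_end + embargo)
        train_mask = ~(overlaps | embargoed)
        train_mask[test_idx] = False  # test samples never train
        yield np.flatnonzero(train_mask), test_idx


# Usage: 20 samples, each label spanning 3 bars (t1 = t0 + 2).
t0 = np.arange(20)
t1 = t0 + 2
for train_idx, test_idx in purged_kfold_indices(t0, t1, n_splits=4, embargo=2):
    print(test_idx.tolist(), train_idx.tolist())
```

For the test fold `5..9`, unshuffled `KFold` would train on `0..4` and `10..19`. The sketch instead purges samples 3 and 4 (their labels end inside the test window), purges 10 and 11 (they start before bar 11, where the last test label ends), and embargoes 12 and 13, leaving `[0, 1, 2, 14, ..., 19]`. Purging is asymmetric because labels look forward: only samples just before the fold, plus those covered by the test labels' tail, are affected.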
Prerequisites
Unlocks—
Bridges
- combinatorially-symmetric-cv · model to implementation: CPCV (López de Prado, AFML ch. 12) generalises purged k-fold to many combinatorial train/test splits. The same combinatorial splitting lets you measure the Probability of Backtest Overfitting (PBO) directly: the fraction of splits in which the configuration ranked best in-sample ranks below the median out-of-sample. Same purging mechanism, richer statistic.
- embargo-period-sizing · shared measurement: Embargo length is the serial-correlation horizon of the label generator. Sizing it wrong fails either way: too small and leakage survives; too large and training samples are wasted. The right size is measured (Ljung-Box test for autocorrelation in label residuals), not picked.
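For the CPCV bridge, a split-and-path count makes "many train/test splits" concrete. The notation below roughly follows AFML ch. 12 and is offered as a worked count, not a full statement of the method: partition the sample into $N$ contiguous groups and hold out $k$ groups per split. Each group lands in the test set of $\binom{N-1}{k-1}$ splits, so the $k\binom{N}{k}$ test-group slots assemble into $\varphi$ complete backtest paths of $N$ groups each:

```latex
\text{splits} = \binom{N}{k}, \qquad
\varphi[N,k] = \frac{k}{N}\binom{N}{k} = \binom{N-1}{k-1}
```

For example, $N = 6$ and $k = 2$ give $\binom{6}{2} = 15$ splits and $\binom{5}{1} = 5$ paths, where plain purged k-fold ($k = 1$) yields only one path.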
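The embargo-sizing bullet calls for measuring the horizon rather than picking it. The sketch below does that on a residual series, with one plainly named substitution: Ljung-Box is a joint (portmanteau) test over lags 1..h, so it says whether any autocorrelation exists up to lag h but not where it stops. This sketch instead uses a per-lag autocorrelation cutoff with Bartlett standard errors. The function names, the `z` threshold, and the "first lag inside the band" rule are hypothetical choices for illustration, not a standard API.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelations rho_1..rho_max_lag of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])


def embargo_from_acf(resid, max_lag=50, z=1.96):
    """Hypothetical sizing rule: embargo = last lag before the ACF first
    falls inside a Bartlett band of +/- z standard errors.

    Stand-in for a Ljung-Box-based measurement, not Ljung-Box itself.
    """
    n = len(resid)
    rho = sample_acf(resid, max_lag)
    for k in range(1, max_lag + 1):
        # Bartlett SE at lag k, assuming autocorrelation stops before lag k.
        se = np.sqrt((1.0 + 2.0 * np.sum(rho[: k - 1] ** 2)) / n)
        if abs(rho[k - 1]) <= z * se:
            return k - 1  # lag k indistinguishable from zero
    raise ValueError("ACF still significant at max_lag; increase max_lag")


# Usage: overlapping 6-bar sums of white noise, like labels built from
# 6-bar forward returns sampled every bar (an MA(5) process).
rng = np.random.default_rng(0)
noise = rng.standard_normal(20_005)
resid = np.convolve(noise, np.ones(6), mode="valid")
print(embargo_from_acf(resid, z=3.0))
```

For that MA(5) series the true autocorrelation vanishes beyond lag 5, so the measured embargo should come out at or just above 5 bars. A larger `z` trades a slightly shorter embargo for fewer false exceedances, which matters because autocorrelation estimates at neighbouring lags are strongly correlated when labels overlap.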
This concept is a node in the curriculum DAG. The full lab — page blocks, done state, references — has not been authored yet. The relations above describe where it sits in the graph.
Author at: content/concepts/purged-k-fold-cv/card.ts