StateTrace
Visual Quant & Low-Latency Systems Lab

Feature Lookahead Bias

research process · L1 · combinator · stub
Replaces the belief that lookahead bias only happens in labels.

Every feature has an information-cutoff time: the latest timestamp whose data the feature could reference. A 'feature at time t' that actually depends on data from t+ε yields a backtest that looks great and a live model that breaks, because the live system never sees that future data. The bias is silent in pipelines that join across time without checking cutoffs, and much ML-on-finance code does this without knowing.
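
As a minimal sketch of the cutoff idea, the snippet below contrasts an as-of lookup (only values available at or before t) with a nearest-timestamp lookup that can silently reach past t. The function names, the integer timestamps, and the sample values are all hypothetical, and it assumes each value carries the time at which it became available.

```python
from bisect import bisect_right

def asof_value(available_at, values, t):
    """Latest value whose availability time is <= t; None if nothing is known yet.

    `available_at` must be sorted ascending and aligned with `values`.
    """
    i = bisect_right(available_at, t)
    return values[i - 1] if i > 0 else None

def nearest_value(available_at, values, t):
    """Leaky variant: picks the closest timestamp, even one after t."""
    j = min(range(len(available_at)), key=lambda k: abs(available_at[k] - t))
    return values[j]

# Hypothetical data: one value becomes available at each of these times.
available_at = [100, 200, 300]
values = [1.0, 2.0, 3.0]

t = 290
safe = asof_value(available_at, values, t)      # uses the value available at 200
leaky = nearest_value(available_at, values, t)  # uses the value available at 300 > t
```

Whether the cutoff should be `<= t` or strictly `< t` (or `t` minus some latency) depends on when data is really usable in production; the sketch picks `<= t` only for illustration.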

Prerequisites
Bridges
  • walk-forward-validation · shared mechanism
    Walk-forward validation re-trains the model on data up to time t, evaluates on [t, t+Δ], then rolls forward and repeats. The discipline makes feature-cutoff violations measurable: a model that does well under walk-forward but fails purged k-fold has lookahead in features, not just labels.
  • train-test-time-leakage · shared failure mode
    Time leakage is the meta-failure mode: any pipeline step that mixes information across time produces it. Feature lookahead is one form; label overlap is another; rolling normalisations computed on the full series are a third. The fix is the same: every step gets a cutoff.
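
The two bridges above can be sketched together: walk-forward folds in which the normalisation statistics are computed only from data before each fold's cutoff. This is an illustrative sketch, not the lab's implementation; `walk_forward_splits`, `zscore_with_cutoff`, the expanding training window, and the toy series are all assumptions.

```python
from statistics import mean, pstdev

def walk_forward_splits(n, train_min, test_size):
    """Yield (train_idx, test_idx) index ranges; train always ends where test begins."""
    start = train_min
    while start + test_size <= n:
        yield range(0, start), range(start, start + test_size)
        start += test_size

def zscore_with_cutoff(series, train_idx, test_idx):
    """Normalise the test window using mean/std from the training window only."""
    train = [series[i] for i in train_idx]
    mu = mean(train)
    sigma = pstdev(train) or 1.0  # guard against a constant training window
    return [(series[i] - mu) / sigma for i in test_idx]

# Hypothetical toy series, ordered by time.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

folds = list(walk_forward_splits(len(series), train_min=2, test_size=2))
normalised = [zscore_with_cutoff(series, tr, te) for tr, te in folds]
```

Computing `mu` and `sigma` once on the full series instead would let every fold's features depend on later observations, which is the full-series normalisation leak described above.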
Status

This concept is a node in the curriculum DAG. The full lab — page blocks, done state, references — has not been authored yet. The relations above describe where it sits in the graph.

Author at: content/concepts/feature-lookahead-bias/card.ts