Probabilistic Calibration

statistics·L2 · idiom·stub

Replaces the belief that a classifier's predicted probability is its true probability.

Many ML models are *sharp* (they issue confident predictions) but *poorly calibrated*: among cases given a predicted probability of 0.9, the event may occur only 70% of the time. The standard check is the reliability diagram: bin predictions by predicted probability, plot each bin's mean prediction against its empirical frequency of positives, and look for departures from the diagonal. Platt scaling and isotonic regression correct calibration post hoc, fitted on held-out data; a weighted log-likelihood loss addresses it at training time.
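The binning step behind a reliability diagram can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the helper name `reliability_table`, the choice of ten equal-width bins, and the synthetic overconfident model are all assumptions made for the example.

```python
import random

def reliability_table(probs, labels, n_bins=10):
    """Group predictions into equal-width probability bins.

    Returns one (mean_predicted, empirical_frequency, count) tuple
    per non-empty bin, ordered from low to high probability.
    """
    sums = [0.0] * n_bins   # sum of predicted probabilities per bin
    hits = [0] * n_bins     # number of positive labels per bin
    counts = [0] * n_bins   # number of predictions per bin
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sums[b] += p
        hits[b] += y
        counts[b] += 1
    return [
        (sums[b] / counts[b], hits[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]

# Synthetic, deliberately overconfident model (an assumption for the demo):
# the true event frequency is the prediction shrunk toward 0.5.
rng = random.Random(0)
probs, labels = [], []
for _ in range(20000):
    p = rng.random()
    true_p = 0.5 + 0.6 * (p - 0.5)
    probs.append(p)
    labels.append(1 if rng.random() < true_p else 0)

table = reliability_table(probs, labels)
for mean_pred, freq, count in table:
    # A calibrated model would print near-equal values in the first two columns.
    print(f"predicted {mean_pred:.2f}  observed {freq:.2f}  n={count}")
```

Plotting the first column against the second gives the diagram itself. For this synthetic model the points sit above the diagonal at low probabilities and below it at high ones, the usual signature of overconfidence.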

Prerequisites

bootstrap-resampling

Unlocks

(none)

Bridges

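reliability-diagrams · shared measurement
Reliability diagrams visualise calibration: predicted probability on the x-axis, empirical frequency on the y-axis, with perfect calibration on the diagonal. The Brier score is the mean squared difference between predicted probability and the 0/1 outcome; its reliability term is the count-weighted squared deviation of the bins from the diagonal. Both are essential for any probabilistic strategy: Kelly sizing, for example, sizes positions correctly only when its input probabilities are calibrated.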
platt-scaling-vs-isotonic · model to implementation
Platt scaling (logistic regression on raw scores) is the simple post-hoc fix; isotonic regression (non-parametric monotonic fit) is more flexible but data-hungry. For typical sample sizes in finance (thousands to tens of thousands of labels), Platt is the default; isotonic earns its keep only at >100k labels.
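
To make the two post-hoc fixes concrete, here is a dependency-free sketch that fits both on synthetic data and compares Brier scores on a held-out set. Everything in it is an assumption made for illustration: the function names, the Newton solver for the logistic fit (without the target smoothing of Platt's original method), the step-function isotonic predictor, and the data generator in which the true probability is `sigmoid(0.5 * score)` while the uncalibrated model reports `sigmoid(score)`.

```python
import bisect
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_platt(scores, labels, iters=25):
    """Fit p = sigmoid(a * score + b) by Newton's method on log-loss."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        ga = gb = hab = 0.0
        haa = hbb = 1e-9  # tiny ridge keeps the Hessian invertible
        for s, y in zip(scores, labels):
            p = sigmoid(a * s + b)
            w = p * (1.0 - p)
            ga += (p - y) * s
            gb += p - y
            haa += w * s * s
            hab += w * s
            hbb += w
        det = haa * hbb - hab * hab
        a -= (hbb * ga - hab * gb) / det
        b -= (haa * gb - hab * ga) / det
    return a, b

def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit: a non-decreasing step function of score."""
    blocks = []  # each block: [sum_of_labels, count, largest_score_in_block]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, s])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            top = blocks.pop()
            blocks[-1][0] += top[0]
            blocks[-1][1] += top[1]
            blocks[-1][2] = top[2]
    thresholds = [blk[2] for blk in blocks]
    values = [blk[0] / blk[1] for blk in blocks]
    return thresholds, values

def predict_isotonic(model, s):
    thresholds, values = model
    i = bisect.bisect_left(thresholds, s)
    return values[min(i, len(values) - 1)]  # clamp scores above the last block

def brier(probs, labels):
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def make_data(n, rng):
    # Synthetic assumption: true P(y=1) = sigmoid(0.5 * score).
    scores = [rng.gauss(0.0, 2.0) for _ in range(n)]
    labels = [1 if rng.random() < sigmoid(0.5 * s) else 0 for s in scores]
    return scores, labels

rng = random.Random(0)
train_s, train_y = make_data(4000, rng)  # calibration set
test_s, test_y = make_data(4000, rng)    # held-out evaluation set

a, b = fit_platt(train_s, train_y)
iso = fit_isotonic(train_s, train_y)

brier_raw = brier([sigmoid(s) for s in test_s], test_y)
brier_platt = brier([sigmoid(a * s + b) for s in test_s], test_y)
brier_iso = brier([predict_isotonic(iso, s) for s in test_s], test_y)
print(f"a={a:.3f} b={b:.3f}")
print(f"Brier raw={brier_raw:.4f} platt={brier_platt:.4f} isotonic={brier_iso:.4f}")
```

Platt scaling estimates only two parameters, which is why it stays stable at a few thousand labels. The isotonic fit estimates one value per pooled block, so its held-out score depends much more on how many labels are available.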

Status

This concept is a node in the curriculum DAG. The full lab — page blocks, done state, references — has not been authored yet. The relations above describe where it sits in the graph.

Author at: content/concepts/probabilistic-calibration/card.ts