StateTrace
Visual Quant & Low-Latency Systems Lab

Profiling C++

execution quality · L2 · idiom · stub
Replaces the belief that 'I think this is the hot path' is enough.

perf records sampled CPU stacks under load; Tracy gives nanosecond-resolution timelines of named regions; VTune adds hardware counters (cache misses, branch mispredicts, IPC). The discipline is: write the code, run the profiler, find the actual hot path (which is often not what you expected), optimise *that*. Optimising the measured hot path routinely pays off far more than optimising the one you guessed, because speeding up code that accounts for only a small share of the runtime cannot move the total much.
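The "measure first, then optimise" loop above can be sketched with a hand-rolled scoped timer. This is only an illustration of timing named regions, not Tracy's or perf's API: `RegionTable`, `ScopedRegion`, and `hottest()` are hypothetical names introduced here, and wall-clock timing with `std::chrono::steady_clock` is an assumed stand-in for a real profiler.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper types for illustration; not part of any profiler's API.
struct RegionStats {
    std::uint64_t calls = 0;
    std::chrono::nanoseconds total{0};
};

// Accumulates wall-clock time per named region.
class RegionTable {
public:
    void add(const std::string& name, std::chrono::nanoseconds dt) {
        assert(dt.count() >= 0);  // steady_clock never runs backwards
        RegionStats& s = stats_[name];
        ++s.calls;
        s.total += dt;
    }

    const std::map<std::string, RegionStats>& stats() const { return stats_; }

    // Name of the region with the largest accumulated time ("" if empty).
    std::string hottest() const {
        std::string best;
        std::chrono::nanoseconds best_total{-1};
        for (const auto& kv : stats_) {
            if (kv.second.total > best_total) {
                best_total = kv.second.total;
                best = kv.first;
            }
        }
        return best;
    }

private:
    std::map<std::string, RegionStats> stats_;
};

// RAII timer: records the lifetime of the enclosing scope under `name`.
class ScopedRegion {
public:
    ScopedRegion(RegionTable& table, std::string name)
        : table_(table),
          name_(std::move(name)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedRegion() {
        const auto end = std::chrono::steady_clock::now();
        table_.add(name_,
                   std::chrono::duration_cast<std::chrono::nanoseconds>(end - start_));
    }

    ScopedRegion(const ScopedRegion&) = delete;
    ScopedRegion& operator=(const ScopedRegion&) = delete;

private:
    RegionTable& table_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};
```

In use, each candidate function would open a `ScopedRegion` at the top of its body, the workload would run under realistic load, and `hottest()` would name the region to optimise first. A sampling run such as `perf record -g ./your_binary` followed by `perf report` answers the same question without touching the source, at the cost of statistical rather than exact attribution.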

Prerequisites
Bridges
  • perf-vs-tracy-vs-vtune · shared measurement
    perf is the open-source default (Linux only, sampling profiler). Tracy is the low-overhead interactive timeline tool (cross-platform, instrumentation-based). VTune adds hardware counters and microarchitectural analysis (Intel CPUs). Each surfaces a different signal; serious work uses two of the three.
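  • cpu-cache-hierarchy · shared mechanism
    Profiling that ignores cache misses overlooks one of the most common causes of a hot path. Typical access latencies are roughly L1 ≈ 1 ns, L2 ≈ 4 ns, L3 ≈ 12 ns, RAM ≈ 100 ns. By those figures, a 'fast' loop whose loads are served from L3 runs about 10× slower than the same loop with L1 hits, and one that misses L3 and goes to RAM runs about 100× slower. Only the profiler shows you which is which.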
Status

This concept is a node in the curriculum DAG. The full lab — page blocks, done state, references — has not been authored yet. The relations above describe where it sits in the graph.

Author at: content/concepts/profiling-cpp/card.ts