ST
StateTrace
Visual Quant & Low-Latency Systems Lab
GitHub
Curriculum/race-condition

Race Condition

synchronization·L1 · combinator
Replacesthe belief that 'thread-safe' is a property of operations.

Race conditions are properties of programs — the outcome depends on an interleaving the program cannot control. Shared Counter is one instance; the same shape recurs in bank-balance updates, lazy initialization, ticket booking, and event-driven trading systems.

Same read-compute-write race in three surfaces. Switch approach to see the pattern repeat.

Steps
1/12
Stateshared + per-thread
T1
T1 local
statusidle
balance
new_balance
amount
shared
shared memoryread
balance100
counter per stepnow = 100
T2
T2 local
statusidle
balance
new_balance
amount
Race conditions are a bug class. Correctness, not latency, is the measurement.
OBSERVED

shared.balance = 100. Both worker threads are idle, waiting for a withdraw call.

Bank Balance
1account = {"balance": 100}
2
3def withdraw(amount):
4    balance = account["balance"]       # read
5    new_balance = balance - amount     # compute
6    account["balance"] = new_balance   # write
Historystep 1 / 12
step 1/12
Same race shape, different surface. Walk the other two for the full pattern.

The pattern

A race condition is a bug whose presence depends on the order in which concurrent operations execute. Three ingredients are required and sufficient: (1) shared mutable state, (2) two or more concurrent operations touching it, (3) no contract that orders them. Remove any one ingredient and the race cannot exist.

Where you've already seen this

Shared Counter is the smallest possible race: two threads, one int64_t, one +1 each. The read–compute–write window is the gap where the bug lives. The same gap exists at larger scales — anywhere a program reads a value, decides based on it, and writes back, another thread can interleave between the read and the write.

Three surfaces, same shape

Bank balance. Thread A and thread B each withdraw $50 from an account starting at $100. Both read $100, both compute $50, both write $50. The account ends at $50; one withdrawal vanished.

Lazy initialization. A singleton's constructor runs twice because two threads each check if (instance == null), both see null, both allocate, both write. The second new is leaked. This is the bug that motivated double-checked locking — a pattern that itself has subtle memory-model bugs.

Ticket booking. Thread A and thread B each check whether seat 12C is free, both see free, both attempt to book. The first write succeeds; the second silently overwrites; two customers now hold a confirmation for the same seat.

Bank balance — the smallest non-counter race

python
# Two threads withdrawing from the same account, starting at $100.
# Each withdraws $50. Expected final balance: $0. Observed: $50.

def transfer(account, amount):
    balance = account.read_balance()       # T1 reads 100, T2 reads 100
    new_balance = balance - amount         # both compute 50
    account.write_balance(new_balance)     # both write 50; one withdrawal disappears

# The bank ate $50.

Race condition vs data race

These are not synonyms.

Data race is a specific memory-model violation: two threads access the same memory location, at least one access is a write, and there is no synchronization between them. C++ labels this undefined behaviour. Rust rejects it at compile time.

Race condition is the broader bug class: the program's outcome depends on an uncontrolled interleaving. A program can have a race condition without a data race — e.g. two threads each call a properly-synchronized read_balance() and write_balance(), but the compound read; compute; write sequence is not atomic. Each individual access is race-free; the operation as a whole is not.

Fixing data races (atomics or mutexes around each access) does not automatically fix race conditions. The unit of correctness is the operation, not the access.

The shape of every fix

Three fix shapes account for the concurrency primitives Stage 1 introduces. Other shapes — lock-free CAS loops, transactional memory, single-threaded event loops — live in later stages and are not reducible to these three.

Serialize. Make the compound operation indivisible. Mutex holds a lock for the duration of the section. Atomic RMW collapses three instructions into one indivisible one. The cost is contention.

Immutable. Do not share mutable state. Pass copies, return new values instead of mutating. Functional-style concurrency. The cost is memory allocation and bookkeeping.

Own, don't share. Ownership transfer. Only one thread at a time has the right to mutate; transfer is explicit. Rust's Send model, Go's share by communicating, the actor model. The cost is a different mental model.

GIL is serialize at runtime scope (one giant mutex on all interpreter state). Mutex is serialize at section scope. Borrow checker is own, don't share enforced at compile time. The three Stage 1 primitives are three points in this design space.

Identify the race

Look at the bank-balance code above. Answer three questions in writing: (a) precisely which two steps the race window sits between; (b) which of the three fix shapes applies most naturally here; (c) why adding volatile to the balance variable would not help.

Expected: (a) Between `read_balance()` returning and `write_balance()` being called. (b) Serialize — wrap the entire transfer in a mutex held for the duration. (c) `volatile` constrains compiler caching of a single access; the bug is at the compound-operation level. Both threads still execute `read; compute; write` and can still interleave between the steps even with the variable marked volatile.

Bridges
  • order-book-event-orderingshared failure mode
    Race conditions and market-event-ordering bugs share the same failure shape: state that depends on an unintended interleaving. Same shape, different domain — the bridge from Stage 1 to Stage 5.
    Where this shows up
    • L2 update applied before the L1 quote it depends on
    • Two venues delivering the same trade in different order across the feed
    • Sequence-gap recovery that replays out of order under load
  • deterministic-replayshared measurement
    Deterministic replay is how an unintended interleaving becomes observable enough to reproduce and classify.
    Where this shows up
    • rr record / rr replay reproduces a race on the same syscall sequence
    • Recorded scheduler decisions let the failing interleaving run twice
    • ChaosMonkey-style fuzzers seed the scheduler to replay a known bad trace
Done state

Evidence the learner produces, checks that confirm it.

Evidence
  • observable behaviorArticulates the three-ingredient pattern (shared mutable state + concurrent operations + no ordering contract) and applies it to a code snippet not previously seen.
  • artifactWritten distinction between race condition and data race, one paragraph each, without using the phrase 'memory model'.
Checks
  • trace · race-condition/bank-balanceIdentifies the step at which both threads hold balance = 100 in their local copy and neither has written back — the moment one of the two withdrawals is guaranteed to vanish. The race window made visible in the smallest non-counter form.
  • manualExplains the bank-balance lab in three sentences: where the race window sits, which fix shape applies, why `volatile` does not help.
  • manualStates the difference between race condition and data race in one sentence each, without using the word 'memory'. Reference answer: data race is the memory-level form — two unsynchronized accesses, at least one a write; race condition is the program-level form — outcome depends on uncontrolled interleaving. A program can have the latter without the former.
References