
Memory Model

JMM, happens-before relationship, visibility and atomicity

Java Memory Model

The JMM (JSR-133) defines the rules by which threads communicate through memory — specifying exactly when a write by one thread becomes visible to reads by another.

1. Definition

What is it?

The Java Memory Model (JMM), standardised by JSR-133 and introduced in Java 5, is the formal specification that governs how threads interact through shared memory. It answers one key question:

"When is a value written by Thread A guaranteed to be visible to Thread B?"

Without a well-defined memory model, the JVM, the compiler (JIT), and the CPU are all free to reorder operations and cache values in ways that improve single-threaded performance but break multi-threaded correctness.

Why does it exist?

Modern hardware and compilers apply many optimisations that are invisible at the source level:

  • CPU caches: each core has its own L1/L2 cache; writes may not immediately reach main memory.
  • Store buffers: a write issued by a CPU may sit in a store buffer before becoming globally visible.
  • Instruction reordering: both compilers (JIT) and out-of-order CPUs may execute instructions in a different order than written, as long as the single-threaded result is unchanged.

These optimisations are safe in a single-threaded world. In a multi-threaded world, they can cause one thread to see a stale or partial view of another thread's work — leading to data races and visibility bugs.

Where does it fit?

The JMM sits between the Java language/libraries and the underlying hardware. It is the contract that:

  • Library authors (like java.util.concurrent) rely on to build safe abstractions.
  • Application developers invoke implicitly whenever they use volatile, synchronized, or java.util.concurrent primitives.
  • The JVM implementor must uphold when compiling to native code on any hardware platform.

2. Core Concepts

2.1 The Problem Without JMM

Consider two threads sharing variables x and flag:

Example timeline without synchronization:

  1. Thread 1 writes x = 1.
  2. Thread 1 writes flag = true.
  3. Thread 2 observes flag == true.
  4. Thread 2 may still observe x == 0 because visibility is not guaranteed.

Without synchronisation:

  • The compiler may reorder the two writes in Thread 1 (flag before x).
  • Thread 2 may read from its CPU cache and see flag = true but still x = 0.
  • The CPU's store buffer may flush writes in a different order.

Any of these can happen on real hardware (especially ARM and POWER).
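A minimal runnable sketch of this race (class and variable names are illustrative; whether the stale read or the infinite spin actually occurs depends on the JIT and hardware — it is rare on x86 and more likely on ARM/POWER):

```java
public class ReorderingDemo {
    static int x = 0;
    static boolean flag = false; // intentionally NOT volatile

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!flag) { /* may spin forever: no HB edge forces visibility */ }
            System.out.println("x = " + x); // may print 0 on weak-memory hardware
        });
        reader.setDaemon(true); // so a stuck reader does not prevent JVM exit
        reader.start();

        Thread writer = new Thread(() -> {
            x = 1;       // write 1
            flag = true; // write 2 — may become visible before write 1
        });
        writer.start();
        writer.join();
        reader.join(1000); // bound the wait: the reader is not guaranteed to finish
    }
}
```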

2.2 Happens-Before (HB) Relationship

The happens-before relationship is the core of the JMM. If action A happens-before action B, then:

  • All side-effects of A (and everything that A happens-before) are visible to B.
  • The JVM/CPU must not reorder them in a way that violates this guarantee.

HB is not wall-clock time ordering. A can happen-before B even if B executes on a different CPU nanoseconds earlier — it is a logical ordering guarantee.

Built-in HB Rules

Rule Description
Program order Each action in a thread happens-before every subsequent action in that same thread.
Monitor unlock An unlock of a monitor happens-before every subsequent lock of that same monitor.
Volatile write A write to a volatile field happens-before every subsequent read of that field.
Thread start thread.start() happens-before any action in the started thread.
Thread termination All actions in a thread happen-before another thread detects that it has terminated (via join() returning or isAlive() returning false).
Interruption A call to interrupt() happens-before the interrupted thread detects the interrupt.
Finalizer Completion of a constructor happens-before the start of the finalizer for that object.
Transitivity If A HB B and B HB C, then A HB C.
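The thread-start and thread-termination rules are what make the everyday pattern "prepare data, start(), join(), read result" safe without any volatile or locks — a minimal sketch:

```java
public class StartJoinHB {
    static int input;   // plain fields — no volatile needed here
    static int result;

    public static void main(String[] args) throws InterruptedException {
        input = 42;                        // written before start()
        Thread worker = new Thread(() -> {
            // start() rule: the worker is guaranteed to see input = 42
            result = input * 2;
        });
        worker.start();
        worker.join();                     // join() rule: main sees result
        System.out.println(result);        // prints 84
    }
}
```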

Visualising HB

HB via monitor rule:

  1. Thread 1 writes x = 1.
  2. Thread 1 unlocks the monitor.
  3. Thread 2 locks the same monitor.
  4. Thread 2 must observe x = 1 when reading it.

2.3 Memory Architecture and the Visibility Problem

Without synchronisation, a thread's view of a variable involves two layers:

  • Thread-local cache or registers — a thread may keep reading stale values from here.
  • Main memory — the updated value may already exist here, but other threads are not guaranteed to observe it immediately.

This is why Thread 2 can keep seeing an outdated value even after Thread 1 has already written the new one.

Without a happens-before edge between the write in Thread 1 and the read in Thread 2, the JMM makes no guarantee that Thread 2 will ever see the updated value. This is the visibility problem.

2.4 Atomicity

Atomicity means an operation is performed as a single, indivisible unit — no other thread can observe a partially-completed state.

Operation Atomic? Notes
int / boolean / byte / short / char / float read/write ✅ Guaranteed by JLS
long / double read/write ⚠ May be non-atomic — the JLS permits splitting into two 32-bit operations (mainly 32-bit JVMs)
long / double declared volatile ✅ volatile forces atomic 64-bit access
i++ (any type) ❌ Read-modify-write: three separate operations
AtomicInteger.incrementAndGet() ✅ Uses CAS (compare-and-swap) hardware instruction

Key insight: Even if a field is volatile, the compound operation i++ is not atomic. It reads i, increments, then writes back — another thread can interleave between the read and write.

2.5 `volatile`

Declaring a field volatile provides two guarantees:

  1. Visibility: A write to a volatile field happens-before any subsequent read of that field (volatile write → HB → volatile read).
  2. Reordering prevention: The JVM inserts memory barriers around volatile accesses, preventing the compiler and CPU from reordering ordinary reads/writes across a volatile access.

What volatile does NOT guarantee:

  • Atomicity of compound operations (flag++ remains a read-modify-write race even on a volatile field)
  • Mutual exclusion

When to use volatile:

  • Simple status flags read by multiple threads (e.g., volatile boolean running)
  • Publishing an immutable object reference (one writer, many readers)
  • Double-checked locking (DCL) pattern — the reference field must be volatile
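A sketch of the second use case — one writer publishes an immutable object, many readers consume it. The Config class and its fields are hypothetical; the point is that the volatile write/read pair is the only synchronisation needed:

```java
public class ConfigHolder {
    // Immutable value object: all fields final, set in the constructor
    static final class Config {
        final String endpoint;
        final int timeoutMs;
        Config(String endpoint, int timeoutMs) {
            this.endpoint = endpoint;
            this.timeoutMs = timeoutMs;
        }
    }

    // volatile: readers are guaranteed to see a fully constructed Config
    private static volatile Config current;

    static void publish(Config c) { current = c; }    // volatile write
    static Config read()          { return current; } // volatile read — HB with publish
}
```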

2.6 `synchronized`

synchronized provides both:

  1. Mutual exclusion (atomicity of blocks): Only one thread holds the monitor at a time.
  2. Visibility (happens-before): All writes before unlock are visible to any thread that subsequently acquires the same lock.

Thread 1                         Thread 2
synchronized(lock) {             synchronized(lock) {
  x = 1;          ──── HB ────▶    read x  → 1 ✅
  y = 2;                           read y  → 2 ✅
}                                }

2.7 Memory Barriers / Fences

volatile and synchronized compile down to memory barrier instructions that prevent the CPU from reordering loads and stores across the barrier.

Barrier type Effect
LoadLoad No load may be reordered before a preceding load
StoreStore No store may be reordered before a preceding store
LoadStore No store may be reordered before a preceding load
StoreLoad No load may be reordered before a preceding store — the most expensive

A volatile write inserts a StoreStore barrier before and a StoreLoad barrier after.
A volatile read inserts a LoadLoad + LoadStore barrier after.
synchronized effectively inserts a full fence on lock acquisition and release.
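Since Java 9 these fences are also exposed directly as static methods on java.lang.invoke.VarHandle. The following sketch hand-expands volatile-like access using them — an illustration only, not how the JVM actually implements volatile (the real semantics come from the JMM, not this expansion):

```java
import java.lang.invoke.VarHandle;

public class FenceSketch {
    static int x;
    static boolean flag;

    static void volatileStyleWrite() {
        x = 1;
        VarHandle.releaseFence();  // StoreStore + LoadStore: x = 1 cannot sink below
        flag = true;               // the "volatile-like" store
        VarHandle.fullFence();     // StoreLoad: the store is visible before later loads
    }

    static boolean volatileStyleRead() {
        boolean f = flag;          // the "volatile-like" load
        VarHandle.acquireFence();  // LoadLoad + LoadStore: later ops cannot float above
        return f && x == 1;
    }
}
```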


3. Practical Usage

When to Use `volatile`

  • Simple boolean flags or status indicators (volatile boolean shutdown)
  • Publishing a single immutable object reference safely
  • The reference field in double-checked locking (DCL)
  • Counters where you only need visibility (one thread writes, others only read)

When to Use `synchronized`

  • Any situation requiring mutual exclusion (check-then-act, read-modify-write)
  • When multiple fields must be updated atomically as a group
  • When volatile alone is insufficient (compound operations)

When to Use `AtomicXxx`

  • High-contention single-variable counters or accumulators
  • CAS-based non-blocking algorithms
  • Prefer AtomicInteger, AtomicLong, AtomicReference over volatile + manual CAS

When to Use Immutable Objects

If an object is immutable (all fields final, set in constructor, no escaping this), it is safely published to all threads via any mechanism — including plain assignment. The JMM special-cases final fields: their values are guaranteed visible after the constructor completes.

Safe Publication

An object is safely published when the reference to it is made visible to other threads through a properly synchronised mechanism:

Mechanism Why it's safe
static initialiser Class loading is synchronised by the JVM
final field Frozen after construction by JMM guarantee
volatile field Volatile write HB volatile read
Properly locked field Monitor rule
java.util.concurrent collections Internal use of volatile/locking

Double-Checked Locking (DCL)

DCL is a common singleton pattern. Without volatile on the reference field it is broken — and before Java 5 it could not be fixed at all, because the pre-JSR-133 memory model gave volatile weaker semantics:

// ❌ BROKEN — reference may be seen partially initialised
class Singleton {
    private static Singleton instance;
    public static Singleton getInstance() {
        if (instance == null) {               // check 1 (no lock)
            synchronized (Singleton.class) {
                if (instance == null) {       // check 2 (with lock)
                    instance = new Singleton(); // reordering possible!
                }
            }
        }
        return instance;
    }
}

new Singleton() is three operations: allocate, initialise fields, assign reference. The JIT can reorder assign before initialise. Another thread may see a non-null but incompletely initialised object.

// ✅ CORRECT — volatile prevents reordering of assignment
class Singleton {
    private static volatile Singleton instance;
    public static Singleton getInstance() {
        if (instance == null) {
            synchronized (Singleton.class) {
                if (instance == null) {
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}

4. Code Examples

Example 1 — Visibility Bug (Infinite Loop)

// Without volatile, this loop may never terminate!
public class VisibilityBug {
    private static boolean running = true; // ❌ not volatile

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (running) { /* spin */ }
            System.out.println("Stopped.");
        });
        worker.start();

        Thread.sleep(100);
        running = false; // Thread 1 writes, but worker may never see it!
        System.out.println("Set running = false");
    }
}

Fix: declare private static volatile boolean running = true;

Example 2 — Broken vs Correct Double-Checked Locking

// ❌ Broken DCL — missing volatile
class BrokenSingleton {
    private static BrokenSingleton instance;
    public static BrokenSingleton get() {
        if (instance == null) {
            synchronized (BrokenSingleton.class) {
                if (instance == null) instance = new BrokenSingleton();
            }
        }
        return instance; // May return partially initialised object!
    }
}

// ✅ Correct DCL — volatile on instance
class CorrectSingleton {
    private static volatile CorrectSingleton instance;
    public static CorrectSingleton get() {
        if (instance == null) {
            synchronized (CorrectSingleton.class) {
                if (instance == null) instance = new CorrectSingleton();
            }
        }
        return instance;
    }
}

Example 3 — volatile Counter Pitfall (Not Atomic!)

public class VolatileCounter {
    private volatile int count = 0; // ❌ volatile does NOT make ++ atomic!

    public void increment() {
        count++; // read → increment → write (3 ops, can race)
    }

    public int get() { return count; }
}

// With 1000 threads each calling increment() once, final count < 1000 is possible!

Example 4 — Correct Atomic Counter

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounter {
    private final AtomicInteger count = new AtomicInteger(0);

    public void increment() {
        count.incrementAndGet(); // ✅ CAS-based, lock-free, atomic
    }

    public int get() { return count.get(); }
}

Example 5 — Safe Publication via Final Fields

// Immutable object: safe to publish via any reference
public final class ImmutablePoint {
    private final int x;
    private final int y;

    public ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
        // After constructor returns, all readers guaranteed to see x and y
    }

    public int getX() { return x; }
    public int getY() { return y; }
}

// Even a plain (non-volatile) assignment is safe for immutable objects
// published via static initializer:
public class Config {
    public static final ImmutablePoint ORIGIN = new ImmutablePoint(0, 0); // ✅ safe
}

Common Pitfall — Checking Flag Without Synchronisation

// ❌ race condition: check-then-act without atomicity
if (!map.containsKey(key)) {
    map.put(key, computeValue()); // another thread may have inserted between check and put
}

// ✅ Use ConcurrentHashMap.computeIfAbsent for atomic check-then-put
map.computeIfAbsent(key, k -> computeValue());

5. Trade-offs

Aspect volatile synchronized AtomicXxx
⚡ Performance Low overhead, ~memory barrier only Higher — lock acquisition/release involves OS or spin Low–medium — CAS may retry under contention
🔒 Mutual exclusion ❌ None ✅ Yes ✅ Per-variable (CAS)
đŸ‘ïž Visibility ✅ Yes ✅ Yes ✅ Yes
🔱 Compound ops ❌ Not atomic ✅ If in same block ✅ Per-method (e.g. compareAndSet)
đŸ’Ÿ Memory Minimal Monitor object overhead Object per variable
🔧 Maintainability Simple for flags Clear intent, familiar Good for counters/references
🔄 Scalability High — no blocking Contention reduces throughput High — non-blocking algorithms

False sharing is a hidden performance problem: two volatile fields in the same CPU cache line cause the entire cache line to be invalidated on every write, even from different threads accessing different variables. Use padding or @jdk.internal.vm.annotation.Contended to separate hot fields.


6. Common Mistakes

❌ Mistake 1: Assuming `volatile` makes compound operations atomic

// ❌ volatile does NOT make ++ atomic!
private volatile int counter = 0;
public void increment() { counter++; } // DATA RACE

// ✅ Use AtomicInteger
private final AtomicInteger counter = new AtomicInteger(0);
public void increment() { counter.incrementAndGet(); }

❌ Mistake 2: Double-checked locking without `volatile`

// ❌ Broken — without volatile, the JIT/CPU may publish the reference before the fields are initialised
private static Resource instance;
// ...
if (instance == null) { synchronized(...) { if (instance == null) instance = new Resource(); } }

// ✅ Must declare volatile
private static volatile Resource instance;

❌ Mistake 3: Sharing mutable state without any synchronisation

// ❌ Both threads access 'list' with no sync — ConcurrentModificationException / data loss
List<String> list = new ArrayList<>();
// Thread 1: list.add("a");
// Thread 2: list.add("b");

// ✅ Use thread-safe collection
List<String> list = Collections.synchronizedList(new ArrayList<>());
// or
List<String> list = new CopyOnWriteArrayList<>();

❌ Mistake 4: Over-synchronizing

// ❌ Locking on every read of an immutable value — unnecessary contention
public synchronized String getImmutableConfig() { return config; }

// ✅ Immutable / final fields need no locking
private final String config = "value";
public String getConfig() { return config; }

❌ Mistake 5: Confusing happens-before with wall-clock ordering

// WRONG mental model: "Thread 1 writes before Thread 2 reads, so Thread 2 sees it"
// HB is a LOGICAL guarantee, not a time guarantee.
// Without a HB edge (volatile/synchronized/etc.), there is NO visibility guarantee
// regardless of how Thread 1 ran earlier in wall-clock time.

7. Senior-level Insights

JSR-133 and the Java 5 Rewrite

The original Java Memory Model (JDK 1.0–1.4) was widely recognised as broken — it couldn't even guarantee correct behaviour for double-checked locking. JSR-133 rewrote the JMM for Java 5, introducing the happens-before formalism, the strengthened volatile semantics, and the final field guarantees that underpin modern concurrent Java.

CPU Memory Models: x86 TSO vs ARM

The JMM is hardware-agnostic, but its implementation cost varies by CPU:

  • x86 (TSO — Total Store Order): x86 already has a relatively strong memory model. volatile reads are essentially free (just a load); only volatile writes need a LOCK XCHG or MFENCE. This is why many JMM bugs only manifest on ARM or POWER.
  • ARM/POWER (weak memory model): Requires explicit dmb / sync barrier instructions for both reads and writes, making volatile more expensive.

False Sharing and `@Contended`

When two hot volatile fields share a CPU cache line (typically 64 bytes), writes to either field invalidate the entire cache line for all other CPUs, causing false sharing — a silent performance killer in high-throughput concurrent code.

// ❌ False sharing: counter and flag likely on same cache line
class Shared {
    volatile long counter = 0;
    volatile boolean flag = false;
}

// ✅ Use @jdk.internal.vm.annotation.Contended (requires -XX:-RestrictContended)
// or manual padding
class Padded {
    volatile long counter = 0;
    long p1, p2, p3, p4, p5, p6, p7; // 56 bytes padding
    volatile boolean flag = false;
}

VarHandle (Java 9+)

java.lang.invoke.VarHandle provides fine-grained control over memory ordering semantics without the overhead of full volatile:

Access mode Ordering guarantee
getPlain / setPlain No ordering (like non-volatile)
getOpaque / setOpaque Coherent per-variable ordering
getAcquire / setRelease Acquire/release semantics (cheaper than full volatile)
getVolatile / setVolatile Full volatile semantics
compareAndSet CAS with full volatile semantics

Acquire/release (used extensively in java.util.concurrent) is cheaper than full volatile on weak-memory architectures because it only requires one-directional barriers.
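A minimal sketch of acquire/release publication with a VarHandle (class and field names are illustrative):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class AcquireReleaseFlag {
    private int data;
    private boolean ready;

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(AcquireReleaseFlag.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Writer thread
    void publish(int value) {
        data = value;                  // plain write
        READY.setRelease(this, true);  // release store: data cannot be reordered after it
    }

    // Reader thread: returns -1 if the value is not yet published
    int tryConsume() {
        if ((boolean) READY.getAcquire(this)) { // acquire load: later reads cannot move above it
            return data;               // guaranteed to see the published value
        }
        return -1;
    }
}
```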

`final` Fields and Safe Publication

The JMM provides a special guarantee for final fields: once a constructor completes and the reference does not escape during construction, all threads will see the correctly initialised values of all final fields without any additional synchronisation. This is the foundation of immutability-based safe publication.

Lock-Free Algorithms and CAS

AtomicInteger, AtomicReference, etc. use compare-and-swap (CAS) — a single atomic CPU instruction (CMPXCHG on x86) that reads, compares, and conditionally writes in one indivisible step. This enables non-blocking algorithms with higher throughput than lock-based alternatives under moderate contention. Under very high contention, CAS retry loops (ABA problem, contended CAS) can degrade to worse performance than a well-tuned lock.
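The classic CAS retry loop looks like this — here an "update to maximum" operation built on AtomicInteger (a sketch; recent JDKs also let you express this with accumulateAndGet):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicMax {
    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    /** Lock-free "raise to maximum" using a CAS retry loop. */
    public void update(int candidate) {
        while (true) {
            int current = max.get();
            if (candidate <= current) {
                return;                            // nothing to do
            }
            if (max.compareAndSet(current, candidate)) {
                return;                            // CAS succeeded — new max published
            }
            // CAS failed: another thread changed max concurrently — retry with a fresh read
        }
    }

    public int get() { return max.get(); }
}
```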


8. Glossary

Term Definition
JMM Java Memory Model — the formal specification (JSR-133) defining how threads share memory.
Happens-Before A logical ordering guarantee: if A HB B, all of A's effects are visible to B.
Visibility Whether a write by one thread can be observed by a read in another thread.
Atomicity A property of an operation: it executes as one indivisible unit, with no observable intermediate state.
volatile Java keyword that enforces visibility and prevents reordering, but not mutual exclusion.
synchronized Java keyword providing mutual exclusion and visibility via monitor locks.
Memory Barrier A CPU/compiler instruction that prevents reordering of reads/writes across the barrier.
Race Condition A flaw where the outcome depends on the relative timing of thread execution.
Data Race Two threads access the same memory location concurrently, at least one writes, with no synchronisation.
Monitor The per-object lock mechanism used by synchronized in Java.
Safe Publication Making an object reference visible to other threads in a way that guarantees visibility of its state.
Reordering Compiler or CPU changing the order of memory operations (safe for single-threaded, dangerous for multi-threaded).
Store Buffer CPU hardware buffer that holds pending writes before they reach the cache/main memory.
Cache Coherence Hardware protocol (e.g., MESI) ensuring all CPUs eventually agree on the value of a shared memory location.
False Sharing Two threads unintentionally contending on the same CPU cache line due to proximity of unrelated fields.
CAS Compare-And-Swap — an atomic CPU instruction used to implement lock-free data structures.
Acquire/Release Weaker memory ordering semantics: acquire (after a read) prevents subsequent reads/writes from moving before it; release (before a write) prevents preceding reads/writes from moving after it.

9. Cheatsheet

  • 🔑 HB is the JMM's core rule: establish a happens-before edge or you have no visibility guarantee.
  • đŸ·ïž volatile = visibility + reordering prevention; NOT mutual exclusion or compound-operation atomicity.
  • 🔒 synchronized = mutual exclusion + visibility; use when multiple fields or compound operations must be atomic.
  • ⚛ AtomicInteger / AtomicReference = lock-free, CAS-based atomic operations for single variables.
  • ♟ i++ is NEVER atomic, even on a volatile field — it is read + modify + write.
  • đŸ—ïž DCL pattern requires volatile on the reference field to prevent partially-initialised objects.
  • 🧊 final fields are freely safe after construction — use immutable objects for the simplest thread safety.
  • 🐌 False sharing silently kills performance; pad hot volatile fields or use @Contended.
  • 🔧 VarHandle (Java 9+) offers acquire/release semantics — cheaper than full volatile on weak-memory CPUs.
  • ⚠ Happens-before ≠ wall-clock time: logical ordering, not chronological ordering.
