Atomic Variables

The java.util.concurrent.atomic package provides classes that support lock-free, thread-safe operations on single variables: AtomicBoolean, AtomicInteger, AtomicLong, AtomicReference, AtomicStampedReference, AtomicMarkableReference, AtomicIntegerArray, AtomicLongArray, AtomicReferenceArray, AtomicIntegerFieldUpdater, AtomicLongFieldUpdater, AtomicReferenceFieldUpdater, and the Java 8 additions LongAdder, LongAccumulator, DoubleAdder, and DoubleAccumulator. These classes are built on compare-and-swap (CAS), a hardware primitive that atomically reads a memory location, compares it to an expected value, and writes a new value only if the comparison succeeds, reporting the outcome either as a boolean or as the value it found. CAS enables atomic compound operations (read-modify-write, check-then-act) without locks, avoiding the blocking, context-switch overhead, and lock contention associated with synchronized blocks.

This entry covers the CAS primitive and its JVM mapping, every major atomic class and its API, the ABA problem and its solution via AtomicStampedReference, the performance characteristics of CAS under contention versus synchronized, LongAdder and LongAccumulator as high-throughput alternatives to AtomicLong for specific patterns, and the correct use of atomic classes in building lock-free data structures and algorithms.

Compare-and-Swap — The Hardware Foundation of Atomic Classes

Compare-and-swap (CAS) is an atomic primitive available on all modern processor architectures. x86 provides it as a single instruction (LOCK CMPXCHG); ARM (LDREX/STREX), POWER, and MIPS traditionally build it from a load-linked/store-conditional (LL/SC) instruction pair. It takes three operands: a memory address, an expected value, and a new value. Atomically, as a single indivisible operation at the hardware level, it reads the current value at the address, compares it to the expected value, and if they match, writes the new value and signals success. If the current value does not match, it leaves the address unchanged and signals failure. No other thread or core can observe the address in a state between the comparison and the write.

Historically the JDK implemented the atomic classes on sun.misc.Unsafe.compareAndSwapInt/Long/Object(), which the JIT compiler treats as an intrinsic and compiles down to the CPU primitive. Since Java 9 the implementations use VarHandle and an internal Unsafe class, and application code can reach the same operations through java.lang.invoke.VarHandle. Either way, the java.util.concurrent.atomic classes are thin wrappers that expose CAS through safe, documented APIs: compareAndSet(expectedValue, newValue) returns true if the swap succeeded and false if the current value did not match the expected value.

The calling code is responsible for retrying on failure. The standard pattern is a retry loop: read the current value, compute the new value, attempt the CAS, and if it fails, re-read and try again. This loop is the basis of most lock-free algorithms. It is correct because a successful CAS proves that the location still held the value the thread read, which for most uses means no conflicting modification happened between the read and the write (the ABA problem, covered later in this entry, is the exception). If the CAS fails, the thread re-reads the current value, which reflects whatever modification caused the failure, and retries from a consistent starting point.

Under low contention, CAS loops complete in one or two iterations and are significantly faster than acquiring a contended lock, which may require OS involvement and thread suspension. Under high contention with many threads competing, CAS loops can spin many times before succeeding, and each failed attempt still generates cache coherence traffic, which can make CAS less efficient than a mutex that parks the losers. On x86, LOCK-prefixed instructions also act as a full memory fence, so a CAS needs no additional barrier instructions to provide volatile semantics. As a rough guide, an uncontended AtomicInteger.incrementAndGet() on x86 typically compiles to a single LOCK XADD or LOCK CMPXCHG and costs somewhere between a few nanoseconds and a few tens of nanoseconds, broadly comparable to an uncontended lock acquisition (which itself relies on a CAS), whereas a contended lock that parks a thread costs hundreds of nanoseconds or more. Exact figures depend on the hardware and the workload.

Java

// ── CAS semantics — what compareAndSet does ──────────────────────────
AtomicInteger ai = new AtomicInteger(10);

// compareAndSet(expected, update):
boolean success1 = ai.compareAndSet(10, 20);   // current=10, expected=10: SWAP  → true
System.out.println(success1 + " value=" + ai.get());  // true value=20

boolean success2 = ai.compareAndSet(10, 30);   // current=20, expected=10: NO SWAP → false
System.out.println(success2 + " value=" + ai.get());  // false value=20 (unchanged)

// ── The CAS retry loop — foundation of all lock-free operations ───────
AtomicInteger counter = new AtomicInteger(0);

// Manual CAS loop implementing increment (equivalent in effect to incrementAndGet()):
int prev, next;
do {
    prev = counter.get();           // 1. Read current value
    next = prev + 1;               // 2. Compute new value
} while (!counter.compareAndSet(prev, next));  // 3. Attempt CAS; retry if it fails
// After loop: counter is guaranteed to be exactly prev+1 where prev was the
// value seen on the successful iteration — no lost updates possible

System.out.println("Manual CAS increment: " + counter.get());  // 1

// ── Built-in atomic compound operations ──────────────────────────────
AtomicInteger a = new AtomicInteger(5);

System.out.println(a.getAndIncrement());   // 5  (returns old, now 6)
System.out.println(a.incrementAndGet());   // 7  (increments, returns new)
System.out.println(a.getAndDecrement());   // 7  (returns old, now 6)
System.out.println(a.decrementAndGet());   // 5  (decrements, returns new)
System.out.println(a.getAndAdd(10));       // 5  (returns old, now 15)
System.out.println(a.addAndGet(-3));       // 12 (adds, returns new)
System.out.println(a.getAndSet(100));      // 12 (returns old, sets to 100)

// ── CAS under contention — multiple threads incrementing ──────────────
AtomicInteger shared = new AtomicInteger(0);
int numThreads = 10, incPerThread = 100_000;

List<Thread> threads = new ArrayList<>();
for (int i = 0; i < numThreads; i++) {
    threads.add(new Thread(() -> {
        for (int j = 0; j < incPerThread; j++) {
            shared.incrementAndGet();   // CAS loop internally — always correct
        }
    }));
}
threads.forEach(Thread::start);
for (Thread t : threads) t.join();
System.out.println("Expected: " + (numThreads * incPerThread));
System.out.println("Actual:   " + shared.get());  // always exactly 1,000,000

// ── updateAndGet / accumulateAndGet — Java 8+ functional CAS ─────────
AtomicInteger val = new AtomicInteger(3);
int result = val.updateAndGet(n -> n * n);       // atomically: 3 → 9
System.out.println(result);  // 9

int accumulated = val.accumulateAndGet(2, Integer::sum);  // atomically: 9 + 2 = 11
System.out.println(accumulated);  // 11

// getAndUpdate / getAndAccumulate return the OLD value:
int old = val.getAndUpdate(n -> n - 1);   // atomically: 11 → 10
System.out.println("Old: " + old + ", New: " + val.get());  // Old: 11, New: 10
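
The retry loop also handles check-then-act updates that the built-in methods do not cover directly. The following is a minimal sketch, assuming a hypothetical BoundedCounter class (the name, the limit, and the thread counts are illustrative and do not come from any library). It increments a counter only while the counter is below a maximum, and re-reads and retries whenever the CAS fails.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class; the name and the limit are illustrative only.
final class BoundedCounter {
    private final AtomicInteger count = new AtomicInteger(0);
    private final int max;

    BoundedCounter(int max) {
        this.max = max;
    }

    // Atomically increments the counter unless it has already reached max.
    boolean tryIncrement() {
        while (true) {
            int current = count.get();              // 1. read
            if (current >= max) {
                return false;                       // 2. check: limit reached
            }
            if (count.compareAndSet(current, current + 1)) {
                return true;                        // 3. act: CAS validated the check
            }
            // CAS failed: another thread changed the value, so re-read and retry.
        }
    }

    int get() {
        return count.get();
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedCounter permits = new BoundedCounter(1_000);
        AtomicInteger granted = new AtomicInteger();
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 500; j++) {
                    if (permits.tryIncrement()) granted.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        // 4,000 attempts against a limit of 1,000:
        System.out.println("granted=" + granted.get() + " count=" + permits.get());
        // granted=1000 count=1000
    }
}
```

Because the limit check and the increment are validated by the same CAS, the counter can never exceed the limit, no matter how the threads interleave.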

AtomicReference, AtomicArray, FieldUpdater, and the ABA Problem

AtomicReference<V> applies CAS semantics to object references. compareAndSet(expectedRef, newRef) succeeds only if the current reference is the same object as expectedRef (identity comparison, not equals()). This makes AtomicReference useful for atomically swapping out entire immutable value objects, such as configuration records, state snapshots, and node pointers in lock-free data structures, as an alternative to locking. AtomicReference's get() and set() have the same semantics as a volatile field: get() is a volatile read, set() is a volatile write, and compareAndSet() is a volatile read-modify-write with full fence semantics.

The ABA problem is a subtle hazard in CAS-based algorithms that use object identity or primitive values as the expected value. Thread A reads value V from a shared location. Thread B changes it to W. Thread C changes it back to V. Thread A's CAS expecting V succeeds, even though the value went through an intermediate state that A did not observe. For primitive counters (AtomicInteger, AtomicLong), ABA is usually harmless because only the current numeric value matters, not its history. For pointer-based data structures (lock-free stacks, queues, linked lists), ABA is dangerous: a node that was removed and re-added at the same memory address looks like it was never touched, but the structure around it has changed, and a CAS that succeeds based on the pointer value alone can corrupt the data structure.

AtomicStampedReference<V> solves the ABA problem by pairing the reference with an integer stamp (version counter). CAS on an AtomicStampedReference requires both the reference and the stamp to match. The class does not advance the stamp by itself; the caller supplies the new stamp with each update and must change it (typically by incrementing) on every write. With that discipline, a value that goes A→B→A has a different stamp at each A, so a stale CAS expecting (A, stamp=1) fails after the second A (which has stamp=3). AtomicMarkableReference<V> is a lighter version that pairs the reference with a single boolean mark, useful for logical deletion in lock-free linked lists.

The atomic array classes (AtomicIntegerArray, AtomicLongArray, AtomicReferenceArray) provide CAS semantics on individual elements of an array. They are more memory-efficient than an array of AtomicInteger objects because they store values in a single backing array (avoiding a separate object header per element) and perform element-level CAS through low-level intrinsics (Unsafe or VarHandle, depending on the JDK version). Each element has the same volatile semantics as a standalone AtomicInteger.

The FieldUpdater classes (AtomicIntegerFieldUpdater, AtomicLongFieldUpdater, AtomicReferenceFieldUpdater) allow atomic CAS operations on volatile fields of existing objects without wrapping those fields in atomic wrapper objects. This is a memory optimization for high-volume objects (like nodes in a concurrent data structure) where an AtomicInteger or AtomicReference field would add one extra object, with its own header, per node. The updater is created once (statically) and applied to any instance of the target class. The target field must be a volatile instance field that is accessible to the code creating the updater.
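
As an illustration of the logical-deletion use of AtomicMarkableReference, here is a minimal sketch. The MarkedList class, its Node type, and the helper method names are hypothetical; a full lock-free list would also need traversal and physical unlinking, which are omitted here. The mark on a node's next reference means that the node itself is logically deleted, and because the mark and the reference are compared together, an insert after a deleted node fails.

```java
import java.util.concurrent.atomic.AtomicMarkableReference;

// Hypothetical sketch: names are illustrative, not a library API.
final class MarkedList {
    static final class Node {
        final String value;
        // The mark on 'next' is true once THIS node has been logically deleted.
        final AtomicMarkableReference<Node> next;

        Node(String value, Node successor) {
            this.value = value;
            this.next = new AtomicMarkableReference<>(successor, false);
        }
    }

    // Marks the node as deleted; returns false if it was already marked.
    static boolean logicallyDelete(Node node) {
        boolean[] marked = new boolean[1];
        while (true) {
            Node successor = node.next.get(marked);   // reads reference and mark together
            if (marked[0]) return false;
            // Keep the same successor, flip the mark from false to true.
            if (node.next.compareAndSet(successor, successor, false, true)) return true;
        }
    }

    // Links a new node after 'node' unless 'node' has been logically deleted.
    static boolean insertAfter(Node node, String value) {
        boolean[] marked = new boolean[1];
        while (true) {
            Node successor = node.next.get(marked);
            if (marked[0]) return false;
            Node fresh = new Node(value, successor);
            // Succeeds only if the successor is unchanged AND the node is still unmarked.
            if (node.next.compareAndSet(successor, fresh, false, false)) return true;
        }
    }

    public static void main(String[] args) {
        Node c = new Node("c", null);
        Node b = new Node("b", c);
        Node a = new Node("a", b);

        System.out.println(logicallyDelete(b));            // true
        System.out.println(logicallyDelete(b));            // false (already deleted)
        System.out.println(insertAfter(b, "x"));           // false (b is deleted)
        System.out.println(insertAfter(a, "x"));           // true
        System.out.println(a.next.getReference().value);   // x
    }
}
```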

Java

// ── AtomicReference — atomic reference swap ──────────────────────────
record Config(String host, int port, int timeout) {}

AtomicReference<Config> configRef = new AtomicReference<>(
    new Config("localhost", 8080, 30)
);

// Safe configuration update — swap entire immutable Config record:
Config oldConfig = configRef.get();
Config newConfig = new Config("prod.example.com", 443, 60);
boolean swapped = configRef.compareAndSet(oldConfig, newConfig);
System.out.println("Config swapped: " + swapped);
System.out.println("Current: " + configRef.get());

// Atomic conditional update pattern (retry until success):
Config current, updated;
do {
    current = configRef.get();   // read once per attempt
    updated = new Config(current.host(), current.port(), current.timeout() * 2);
} while (!configRef.compareAndSet(current, updated));
// Note: the expected value must be the same reference the update was derived from.
// Calling configRef.get() again inside compareAndSet would defeat the check.

// ── ABA problem — why AtomicStampedReference exists ──────────────────
AtomicInteger simpleRef = new AtomicInteger(1);   // value = 1 (represents 'A')

// Thread A reads value = 1:
int observedByA = simpleRef.get();   // 1

// Thread B changes 1 → 2 → 1:
simpleRef.compareAndSet(1, 2);   // A → B
simpleRef.compareAndSet(2, 1);   // B → A (back to same value)

// Thread A's CAS expects 1, sees 1 — SUCCEEDS even though value changed and came back:
boolean casResult = simpleRef.compareAndSet(observedByA, 99);
System.out.println("ABA CAS succeeded (incorrectly): " + casResult);  // true — ABA problem!

// ── AtomicStampedReference — version counter defeats ABA ─────────────
AtomicStampedReference<String> stampedRef =
    new AtomicStampedReference<>("A", 0);   // initial value "A", stamp 0

// Thread A reads value and stamp:
int[] stampHolder = new int[1];
String observedVal = stampedRef.get(stampHolder);   // "A", stamp=0
int   observedStamp = stampHolder[0];               // 0

// Thread B: A → B → A, incrementing stamp each time:
stampedRef.compareAndSet("A", "B", 0, 1);   // success: (A,0) → (B,1)
stampedRef.compareAndSet("B", "A", 1, 2);   // success: (B,1) → (A,2)

// Thread A: tries CAS with stale stamp — FAILS:
boolean abaFixed = stampedRef.compareAndSet(
    observedVal, "C", observedStamp, observedStamp + 1);
System.out.println("ABA fixed — CAS failed: " + !abaFixed);  // true — correctly rejected
// Current state: ("A", stamp=2) — A's stale (stamp=0) doesn't match

// ── AtomicIntegerArray — element-level CAS on arrays ─────────────────
AtomicIntegerArray scores = new AtomicIntegerArray(new int[]{10, 20, 30, 40, 50});

System.out.println(scores.get(2));                    // 30
scores.set(2, 35);                                    // volatile set
System.out.println(scores.getAndAdd(2, 5));           // 35 (now 40)
scores.compareAndSet(2, 40, 100);                     // 40→100
System.out.println(scores.get(2));                    // 100

// Thread-safe increment of array elements by multiple threads:
AtomicIntegerArray counters = new AtomicIntegerArray(5);
Thread[] workers = new Thread[50];
for (int i = 0; i < workers.length; i++) {
    int bucket = i % 5;
    workers[i] = new Thread(() -> {
        for (int j = 0; j < 1000; j++) counters.incrementAndGet(bucket);
    });
    workers[i].start();
}
for (Thread t : workers) t.join();
// Each bucket gets exactly 10,000 increments (50 threads / 5 buckets * 1000):
for (int i = 0; i < 5; i++) System.out.print(counters.get(i) + " ");  // 10000 10000 ...

// ── AtomicReferenceFieldUpdater — zero-overhead atomic on existing fields
class QueueNode<T> {
    volatile QueueNode<T> next = null;   // volatile field — target for updater
    final T value;
    QueueNode(T value) { this.value = value; }
}

// Created once as a static field of the enclosing class (it cannot be declared
// inside a method), so there is no overhead per node:
@SuppressWarnings("rawtypes")
static final AtomicReferenceFieldUpdater<QueueNode, QueueNode> NEXT_UPDATER =
    AtomicReferenceFieldUpdater.newUpdater(QueueNode.class, QueueNode.class, "next");

QueueNode<String> head = new QueueNode<>("head");
QueueNode<String> tail = new QueueNode<>("tail");
// CAS on 'next' field without wrapping in AtomicReference — saves one object per node:
boolean linked = NEXT_UPDATER.compareAndSet(head, null, tail);
System.out.println("Linked: " + linked + ", head.next=" + head.next.value);  // Linked: true
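
To show AtomicReference used for node pointers in a lock-free structure, here is a minimal sketch of a Treiber-style stack. The TreiberStack class and its method names are illustrative. The sketch assumes that every push allocates a fresh node and that the garbage collector will not reuse a node while another thread still holds a reference to it; that assumption is what lets this version work without AtomicStampedReference. An implementation that recycles nodes would need a stamp or similar protection against ABA.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a lock-free stack; names are illustrative.
final class TreiberStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;   // immutable once the node is published

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    void push(T value) {
        Node<T> oldTop, newTop;
        do {
            oldTop = top.get();
            newTop = new Node<>(value, oldTop);        // fresh node on every attempt
        } while (!top.compareAndSet(oldTop, newTop));  // retry if top moved
    }

    // Returns the most recently pushed value, or null if the stack is empty.
    T pop() {
        Node<T> oldTop;
        do {
            oldTop = top.get();
            if (oldTop == null) return null;
        } while (!top.compareAndSet(oldTop, oldTop.next));
        return oldTop.value;
    }

    public static void main(String[] args) throws InterruptedException {
        TreiberStack<Integer> stack = new TreiberStack<>();
        Thread[] producers = new Thread[4];
        for (int i = 0; i < producers.length; i++) {
            producers[i] = new Thread(() -> {
                for (int j = 0; j < 1_000; j++) stack.push(j);
            });
            producers[i].start();
        }
        for (Thread t : producers) t.join();

        int popped = 0;
        while (stack.pop() != null) popped++;
        System.out.println("popped=" + popped);   // popped=4000
    }
}
```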

LongAdder, LongAccumulator, and Choosing Between Atomic Classes

AtomicLong is correct and efficient for concurrent counters when the number of threads contending for it is small or moderate. Under high contention, with many threads updating the same AtomicLong simultaneously, performance degrades significantly. In a CAS loop, only one thread succeeds per round; all others see a failed CAS and must retry. Every attempt, successful or not, needs exclusive ownership of the cache line, which forces the other cores to invalidate their cached copies. The resulting cache line bouncing, with the same line shuttled between cores, becomes the bottleneck. With dozens of threads on a modern server, AtomicLong throughput can be many times lower than LongAdder's.

LongAdder, introduced in Java 8, solves this by distributing contention across multiple cells. Internally, LongAdder maintains a base variable and an array of Cell objects, each padded so that two cells do not share a cache line (false sharing). Under low contention, all additions go to base. Under contention, threads are hashed to different cells and each updates its assigned cell. The cell array grows as contention increases, up to roughly the number of CPU cores (rounded up to a power of two). The sum() method adds base and all cell values to produce the current total. This turns all threads fighting over one cache line into each thread mostly owning its own cell's cache line, which dramatically improves throughput.

The trade-off is that sum() does not return a point-in-time consistent value: increments may land in other cells while sum() is traversing them. LongAdder is ideal for high-throughput counters where you accumulate frequently and read the total infrequently: monitoring metrics, event counters, performance counters, hit rate trackers. It is the wrong choice when every read must be exact (use AtomicLong, whose get() always returns the precise current value), when you need compareAndSet or the return value of each update, or when the value is read as frequently as it is written (every read sums all the cells, which eliminates the throughput benefit).

LongAccumulator generalizes LongAdder to arbitrary accumulation functions. It is constructed with a LongBinaryOperator (the accumulation function) and an identity value. Threads call accumulate(long x) and the accumulator applies the function to combine x with the current value. LongAccumulator correctly handles non-sum operations like max, min, product, or bitwise operations as long as the function is side-effect-free, associative, and commutative, and the identity value really is an identity for it. DoubleAdder and DoubleAccumulator provide the same patterns for floating-point values, with the caveat that floating-point addition is not perfectly associative, so results may vary slightly from a single-threaded sum depending on the order of cell accumulation.

The choice between atomic classes follows the access pattern. Use AtomicInteger/AtomicLong/AtomicBoolean for single-variable state that requires compare-and-set or complex conditional updates (CAS loops). Use AtomicReference for atomically swapping immutable value objects. Use LongAdder when throughput on frequent increments under high contention is the priority and sum() is called infrequently. Use AtomicStampedReference when the ABA problem is a concern. Use FieldUpdater when memory per node matters and the field is already volatile. Use synchronized or ReentrantLock when the critical section includes multiple variables or complex invariants that cannot be expressed as a single CAS.
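
The striping idea behind LongAdder can be sketched in a few lines. The StripedCounter class below is a hypothetical, heavily simplified illustration and is not how the JDK implements LongAdder: it has a fixed number of cells, no base field, no dynamic growth, and no cache-line padding (adjacent AtomicLongArray elements can still share a cache line). It only shows the core mechanism, which is that each thread updates its own slot and sum() adds the slots together.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical, simplified illustration of striping; NOT the JDK's LongAdder.
final class StripedCounter {
    private static final AtomicInteger NEXT_SLOT = new AtomicInteger();
    // Each thread is assigned a fixed slot number the first time it increments.
    private static final ThreadLocal<Integer> SLOT =
        ThreadLocal.withInitial(() -> NEXT_SLOT.getAndIncrement());

    private final AtomicLongArray cells;
    private final int mask;

    StripedCounter(int stripes) {
        if (stripes <= 0 || Integer.bitCount(stripes) != 1) {
            throw new IllegalArgumentException("stripes must be a power of two");
        }
        this.cells = new AtomicLongArray(stripes);
        this.mask = stripes - 1;
    }

    // Threads with different slots update different array elements.
    void increment() {
        cells.incrementAndGet(SLOT.get() & mask);
    }

    // Not a point-in-time snapshot if other threads are still incrementing.
    long sum() {
        long total = 0;
        for (int i = 0; i < cells.length(); i++) {
            total += cells.get(i);
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        StripedCounter counter = new StripedCounter(8);
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) counter.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        // All writers have finished, so the sum is exact here:
        System.out.println(counter.sum());   // 800000
    }
}
```

In real code, prefer LongAdder itself; the sketch exists only to make the base-plus-cells description concrete.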

Java

// ── LongAdder vs AtomicLong under high contention ────────────────────
int numThreads = 16;
int iterations = 1_000_000;

// AtomicLong — all threads contend for same cache line:
AtomicLong atomicLong = new AtomicLong(0);
long atomicStart = System.nanoTime();
List<Thread> atomicWorkers = new ArrayList<>();
for (int i = 0; i < numThreads; i++) {
    atomicWorkers.add(new Thread(() -> {
        for (int j = 0; j < iterations; j++) atomicLong.incrementAndGet();
    }));
}
atomicWorkers.forEach(Thread::start);
for (Thread t : atomicWorkers) t.join();
long atomicTime = System.nanoTime() - atomicStart;
System.out.printf("AtomicLong:  value=%d  time=%dms%n",
    atomicLong.get(), atomicTime / 1_000_000);

// LongAdder — threads distributed across cells, minimal contention:
LongAdder adder = new LongAdder();
long adderStart = System.nanoTime();
List<Thread> adderWorkers = new ArrayList<>();
for (int i = 0; i < numThreads; i++) {
    adderWorkers.add(new Thread(() -> {
        for (int j = 0; j < iterations; j++) adder.increment();
    }));
}
adderWorkers.forEach(Thread::start);
for (Thread t : adderWorkers) t.join();
long adderTime = System.nanoTime() - adderStart;
System.out.printf("LongAdder:   value=%d  time=%dms%n",
    adder.sum(), adderTime / 1_000_000);
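// LongAdder is usually several times faster under high contention. This naive
// timing has no warmup, so treat the numbers as indicative and use a harness
// such as JMH for real measurements.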

// ── LongAdder API ─────────────────────────────────────────────────────
LongAdder counter = new LongAdder();
counter.increment();            // add 1
counter.decrement();            // subtract 1
counter.add(42);                // add arbitrary long
System.out.println(counter.sum());           // 42: current total (not a snapshot)
System.out.println(counter.sumThenReset());  // 42: sum() followed by reset(); not atomic under concurrent updates
System.out.println(counter.intValue());      // 0: sum() cast to int
System.out.println(counter.longValue());     // 0: same as sum()

// ── LongAccumulator — arbitrary associative+commutative functions ─────
// Running maximum:
LongAccumulator maxTracker = new LongAccumulator(Long::max, Long.MIN_VALUE);
List<Thread> maxWorkers = new ArrayList<>();
long[] values = {42L, 17L, 99L, 3L, 77L, 55L, 100L, 8L};
for (long v : values) {
    maxWorkers.add(new Thread(() -> maxTracker.accumulate(v)));
}
maxWorkers.forEach(Thread::start);
for (Thread t : maxWorkers) t.join();
System.out.println("Max: " + maxTracker.get());  // 100

// Running minimum:
LongAccumulator minTracker = new LongAccumulator(Long::min, Long.MAX_VALUE);
for (long v : values) minTracker.accumulate(v);
System.out.println("Min: " + minTracker.get());  // 3

// Product accumulator (identity = 1):
LongAccumulator productAcc = new LongAccumulator((a, b) -> a * b, 1L);
for (long v : new long[]{2L, 3L, 5L}) productAcc.accumulate(v);
System.out.println("Product: " + productAcc.get());  // 30

// ── DoubleAdder — high-throughput floating-point accumulation ─────────
DoubleAdder totalRevenue = new DoubleAdder();
double[] transactions = {19.99, 49.99, 5.00, 129.99, 9.99};
Arrays.stream(transactions)
    .parallel()   // worker threads add concurrently; cells spread the contention
    .forEach(totalRevenue::add);
System.out.printf("Total revenue: %.2f%n", totalRevenue.sum());  // 214.96

// ── Choosing the right atomic class ──────────────────────────────────
// Single counter, low contention, need CAS:
AtomicLong sequence = new AtomicLong(0);
long nextId = sequence.incrementAndGet();  // unique ID generation — AtomicLong correct

// High-throughput counter, many writers, sum read infrequently:
LongAdder requestCount = new LongAdder();
requestCount.increment();    // called millions of times per second — LongAdder correct

// Atomic object swap without ABA concern:
AtomicReference<String> token = new AtomicReference<>("old-token");
token.set("new-token");      // or compareAndSet for conditional swap

// Atomic object swap with ABA concern (e.g., lock-free stack node):
AtomicStampedReference<Object> nodeRef =
    new AtomicStampedReference<>(new Object(), 0);
// Bump stamp on every update to detect ABA

// Multiple variables that must change as a group:
// use synchronized or ReentrantLock; a single CAS covers only one variable
synchronized (this) {
    balance -= amount;     // both must change atomically
    ledger.add(amount);    // one CAS cannot update two separate fields
}
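
When the related values can be bundled into one immutable object, the AtomicReference swap pattern offers a lock-free alternative for small groups of fields. The following is a minimal sketch; the AccountState class, its Snapshot type, and the field names are hypothetical. It keeps a balance and a withdrawal count consistent by replacing both in a single CAS. This does not extend to mutable collaborators such as a separate ledger collection, for which a lock remains the simpler choice.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: two related fields updated together through one CAS.
final class AccountState {
    // Immutable pair; a new instance is created for every successful update.
    static final class Snapshot {
        final long balance;
        final int withdrawals;

        Snapshot(long balance, int withdrawals) {
            this.balance = balance;
            this.withdrawals = withdrawals;
        }
    }

    private final AtomicReference<Snapshot> state;

    AccountState(long initialBalance) {
        this.state = new AtomicReference<>(new Snapshot(initialBalance, 0));
    }

    // Withdraws only if funds suffice; balance and count change in one step.
    boolean withdraw(long amount) {
        while (true) {
            Snapshot current = state.get();
            if (current.balance < amount) return false;
            Snapshot next = new Snapshot(current.balance - amount, current.withdrawals + 1);
            if (state.compareAndSet(current, next)) return true;
            // Another thread replaced the snapshot; re-read and retry.
        }
    }

    Snapshot snapshot() {
        return state.get();
    }

    public static void main(String[] args) throws InterruptedException {
        AccountState account = new AccountState(1_000);
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 100; j++) account.withdraw(5);
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        Snapshot s = account.snapshot();
        // 400 attempts of 5 against 1,000: exactly 200 can succeed.
        System.out.println("balance=" + s.balance + " withdrawals=" + s.withdrawals);
        // balance=0 withdrawals=200
    }
}
```

A reader always sees a balance and a withdrawal count that belong to the same update, because both live in one immutable object behind a single reference.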
