☕ Java

GC Algorithms

Garbage collection algorithms are the strategies the JVM uses to identify and reclaim unreachable objects. Different algorithms make different trade-offs between throughput, latency, and memory overhead. The foundational algorithms are mark-sweep, mark-compact, and copying — each with distinct properties that make them suitable for different parts of the heap and different application requirements. Modern production collectors (G1GC, ZGC, Shenandoah) combine these algorithms with concurrency, parallelism, and incremental collection to achieve the best overall performance. This entry covers the core algorithms in depth, their correctness properties, performance characteristics, why generational collection is effective, and how the major production collectors implement these algorithms.

Core Algorithms — Mark-Sweep, Mark-Compact, Copying

Mark-sweep is the foundational GC algorithm. It consists of two phases. The mark phase traverses the object graph from all GC roots, marking every reachable object. The sweep phase scans the entire heap, reclaiming the memory of every unmarked (unreachable) object. Mark-sweep is conceptually simple and handles cyclic garbage correctly (a strength over reference counting), but has two significant weaknesses: it does not compact the heap, leading to fragmentation where free memory is split into many small pieces making large allocations impossible; and it must scan the entire heap during the sweep phase, making its cost proportional to heap size rather than live set size. Mark-compact extends mark-sweep by adding a compaction phase after sweep: live objects are moved to one end of the heap, eliminating fragmentation and enabling fast bump-pointer allocation. Compaction eliminates fragmentation at the cost of moving objects, which requires updating all references to moved objects — an expensive operation requiring multiple passes over the heap. Mark-compact is stop-the-world and proportional to the live set size. Copying collection divides the heap into two semi-spaces (fromspace and tospace). It copies all live objects from fromspace to tospace, then swaps the roles of the two spaces. Only live objects are traversed — dead objects are implicitly collected by being left in fromspace which is entirely wiped. Copying is extremely fast for heaps with high garbage rates because its cost is proportional to the live set, not the garbage set. The cost is the 50% memory overhead (only half the heap is usable at any time) and the need to update all references to copied objects. The young generation in all modern collectors uses a form of copying collection — because most young objects are dead, copying the few survivors is very cheap. The old generation uses mark-compact or concurrent marking with incremental compaction — because most old objects are live, copying would be expensive.
Java
// ── Mark-Sweep conceptually ──────────────────────────────────────────
//
// BEFORE GC:
// GC Roots → A → B → C
//                 ↘
//                  D    (D reachable via A→B→D)
//     E → F        (E, F not reachable from any GC root)
//     G             (G not reachable)
//
// MARK phase (traverses from GC roots):
// Marked: A, B, C, D    (reachable)
// Unmarked: E, F, G     (unreachable)
//
// SWEEP phase (scan heap, reclaim unmarked):
// Reclaimed: E, F, G
//
// AFTER GC:
// GC Roots → A → B → C
//                 ↘
//                  D
// Free: [E space][F space][G space]  ← fragmented free space

// ── Copying collection (young gen) ───────────────────────────────────
//
// BEFORE GC (fromspace):
// [A][B][C][D][E][F][G][H][I] ← mostly garbage (dark = live, light = dead)
//  ↑   ↑       ↑       ↑
// live  live   live    live   (only 4 objects are live)
//
// COPY phase (copy only live objects to tospace):
// fromspace: [A][B][C][D][E][F][G][H][I] ← all reclaimed in one step
// tospace:   [A][B][D][H]                 ← only 4 live objects
//             ↑ bump pointer advances sequentially
//
// RESULT: No fragmentation, fast allocation, cost proportional to LIVE set
// Only 4 objects processed — 5 dead objects cost nothing!

// ── When each algorithm is used in HotSpot ────────────────────────────
//
// Young generation:   Copying (semi-space or to-survivor)
//   Cost: proportional to LIVE set → fast because most young objects die
//
// Old generation (G1): Concurrent marking + incremental compaction
//   Cost: proportional to heap occupancy → slower but infrequent
//
// Old generation (ZGC): Concurrent marking + concurrent compaction
//   Cost: concurrent → spread across application execution, minimal STW

// ── Fragmentation — why it matters ───────────────────────────────────
// After many mark-sweep cycles WITHOUT compaction:
// [  free  ][obj][  free  ][obj][  free  ][obj][  free  ]
//   8 bytes   4    16 bytes  4    32 bytes  4    8 bytes
// Total free: 64 bytes
// Can allocate: up to 32-byte object — fragmented, cannot use full free space
//
// After compaction:
// [obj][obj][obj][         64 bytes free          ]
// Can allocate: up to 64-byte object — contiguous, fully usable

Generational Collection — Why It Works

Generational collection exploits the weak generational hypothesis: most objects die young. By segregating objects by age and collecting young objects more frequently than old objects, the collector concentrates effort where garbage density is highest. A minor collection that scans only the young generation (5-50MB) is much faster than a full collection of the entire heap (1-16GB), even though most garbage is young. The young generation in HotSpot consists of Eden space and two Survivor spaces (S0 and S1, also called "from" and "to"). New objects are allocated in Eden using bump-pointer allocation. When Eden fills, a minor GC copies all live objects from Eden and the current "from" Survivor space to the "to" Survivor space (or directly to old gen if they are large or have survived enough GCs). Eden and "from" are then entirely reclaimed. The roles of "to" and "from" swap, and Eden is again empty for new allocations. The tenuring threshold controls when objects are promoted from young to old generation. An object's age is the number of minor GCs it has survived. When age reaches the tenuring threshold (default 15 in HotSpot, dynamically adjusted by adaptive tenuring), the object is promoted to old generation. Objects that are too large to fit in a Survivor space are also promoted directly to old generation. The remembered set (or card table) solves the cross-generational reference problem: old-generation objects can reference young-generation objects. During a minor GC, if only young-generation reachability is traced, an object referenced only from an old-generation object would appear unreachable and be incorrectly collected. The card table records which old-generation regions contain references to young-generation objects. Minor GC scans these recorded regions in addition to GC roots, ensuring cross-generational references are not missed.
Java
// ── Young generation lifecycle ───────────────────────────────────────
//
// Allocation in Eden:
// ┌──────────────────────────────────┬────────┬────────┐
// │     Eden (empty, fills fast)     │ Surv S0│ Surv S1│
// └──────────────────────────────────┴────────┴────────┘
//
// After Eden fills → Minor GC:
// 1. Mark live objects in Eden + S0 (from space)
// 2. Copy survivors to S1 (to space), incrementing their age
// 3. Promote objects with age >= threshold to Old Gen
// 4. Eden + S0 entirely wiped → bump pointer reset to start of Eden
//
// ┌──────────────────────────────────┬────────┬────────┐
// │     Eden (empty again!)          │ empty  │[A,age2]│
// └──────────────────────────────────┴────────┴────────┘
//   ↑ New allocations start here again        ↑ A survived 2 GCs

// ── The card table for cross-generational references ──────────────────
//
// Old Gen:                             Young Gen (Eden):
// ┌────────┬────────┬────────┐         ┌────────────────┐
// │ OldObj │ OldObj │ OldObj │         │ YoungObj       │
// │  ref ──┼────────┼────────┼─────────►  (referenced   │
// │        │        │        │         │  from old gen) │
// └────────┴────────┴────────┘         └────────────────┘
//    card 0   card 1  card 2
//    dirty!                  ← card table records this card is "dirty"
//
// During minor GC: dirty cards are added to GC root set
// → YoungObj is reached via OldObj.ref → kept alive (correctly)

// ── Tenuring threshold — adaptive promotion ───────────────────────────
// java -XX:MaxTenuringThreshold=15   (max age before forced promotion)
// java -XX:InitialTenuringThreshold=7 (starting threshold)
// The JVM dynamically adjusts threshold to keep Survivor spaces ~50% full
//
// jstat -gcnew <pid> shows tenuring info:
// TT MTT  — current threshold, max threshold
// S0U S1U — Survivor space usage
// If Survivor fills → "premature promotion" → more old gen GCs

// ── Object allocation decision tree ──────────────────────────────────
// new Object() call:
//   Is it larger than TLAB?
//     YES → allocate in shared Eden (slower, lock needed)
//     NO  → allocate in TLAB (fast bump pointer, no lock)
//   Does TLAB have space?
//     YES → bump pointer + return (< 10ns)
//     NO  → retire TLAB, get new TLAB from Eden
//          Eden has space?
//            YES → new TLAB from Eden
//            NO  → trigger Minor GC
//                  After GC, Eden has space?
//                    YES → allocate
//                    NO  → promote to Old Gen or OOM

G1GC — The Default Collector

G1GC (Garbage First) is the default collector since Java 9. It divides the heap into a large number of equal-sized regions (typically 2048 regions, 1-32MB each) rather than the fixed young/old areas of earlier collectors. Regions are dynamically assigned roles: Eden, Survivor, Old, or Humongous (for objects larger than half a region). This flexible layout allows G1 to adapt to the application's allocation pattern. G1GC performs most of its work concurrently. The concurrent marking cycle traces the live set while the application runs, then uses the results to prioritise which regions to collect. G1 always collects the regions with the highest garbage density first — hence "Garbage First." This allows G1 to collect only as much as needed to meet the pause time goal without collecting the entire heap. The pause time goal (-XX:MaxGCPauseMillis, default 200ms) is a soft target. G1 tries to limit stop-the-world work to fit within the goal by choosing how many regions to collect per cycle. If the goal is set too low, G1 cannot collect enough and the heap fills, eventually forcing a full STW GC. If the goal is set appropriately, G1 achieves predictable pause times. Mixed GC is G1's mechanism for collecting old-generation regions. After a concurrent marking cycle determines that old generation occupancy is high enough, G1 schedules a series of "mixed" collections that include both young and selected old regions. This avoids the stop-the-world full collection of the old generation that traditional collectors require.
Java
// ── G1GC key flags ────────────────────────────────────────────────────
// java -XX:+UseG1GC             (default Java 9+, explicit for clarity)
// java -XX:MaxGCPauseMillis=200 (target max pause, default 200ms)
// java -XX:InitiatingHeapOccupancyPercent=45  (trigger concurrent mark at 45% full)
// java -XX:G1HeapRegionSize=4m  (region size: 1-32MB, auto-calculated if not set)
// java -XX:G1NewSizePercent=5   (min % of heap for young gen)
// java -XX:G1MaxNewSizePercent=60 (max % of heap for young gen)

// ── G1GC phases ───────────────────────────────────────────────────────
//
// 1. Young-only GC (most frequent — every Eden fill):
//    STW pause, evacuates Eden + Survivor to new Survivor/Old
//    Duration: proportional to live set in young gen
//    Goal: stay within MaxGCPauseMillis
//
// 2. Concurrent Marking Cycle (triggered by IHOP threshold):
//    Initial Mark:    STW, brief — piggybacks on Young GC
//    Root Region Scan: concurrent
//    Concurrent Mark: concurrent — traverses object graph
//    Remark:          STW, brief — finalises marking
//    Cleanup:         STW, brief — identifies empty regions
//
// 3. Mixed GC (follows concurrent marking if old gen is full enough):
//    STW, evacuates young gen + selected old gen regions
//    "Garbage First": selects regions with highest garbage density
//    Continues in multiple increments until old gen reclaimed

// ── G1GC log analysis ─────────────────────────────────────────────────
// Healthy application log pattern:
// [0.234s] GC(1)  Pause Young (Normal) 45M->22M(256M) 8ms    ← Normal
// [1.456s] GC(8)  Pause Young (Normal) 68M->31M(256M) 11ms   ← Normal
// [5.678s] GC(25) Pause Young (Concurrent Start) ...          ← Mark beginning
// [5.679s] Concurrent Cycle                                   ← Mark running
// [8.901s] GC(26) Pause Remark 180M->180M(256M) 3ms           ← Mark finishing
// [9.012s] GC(27) Pause Cleanup 180M->128M(256M) 2ms          ← Empty regions reclaimed
// [9.234s] GC(28) Pause Young (Mixed) 128M->65M(256M) 15ms    ← Mixed collection
//
// PROBLEM signs:
// Pause Full (G1 Compaction Pause) → emergency full GC, investigate
// "to-space exhausted" → heap too small or promotion failure
// Concurrent cycles not completing before next → IHOP too low

ZGC and Shenandoah — Sub-Millisecond Latency

ZGC (Z Garbage Collector) achieves sub-millisecond pause times for heaps ranging from megabytes to terabytes by performing almost all GC work concurrently. Its key innovations are load barriers and coloured pointers. ZGC stores metadata (remapped, marked, finalizable flags) in the unused upper bits of 64-bit pointers. When any thread accesses an object through a reference, a load barrier checks the pointer's metadata and performs any necessary healing (updating the reference to point to the object's new location after compaction) before returning. This allows ZGC to compact the heap concurrently while the application runs — references are transparently updated on first access. The consequence of concurrent compaction is that ZGC's stop-the-world pauses contain only a few operations: scanning thread stacks, synchronising marking state, and a few bookkeeping tasks. These are O(number of GC roots), not O(live set size), so pause times are bounded to single milliseconds regardless of heap size. A 4TB heap has the same pause time as a 4GB heap with ZGC — a revolutionary property. Shenandoah (originally a Red Hat contribution) achieves similar goals through different mechanics: it uses Brooks pointers (forwarding pointers added to every object header) rather than coloured pointers, and performs load barriers on object accesses as well. Both achieve similar latency goals; ZGC is in the OpenJDK mainline and available across all distributions, while Shenandoah is also now in OpenJDK but has stronger Red Hat support. The cost of these low-latency collectors is throughput: the concurrent GC work and load barriers consume CPU that would otherwise be available to the application. Under heavy load, applications using ZGC or Shenandoah may achieve 10-15% lower throughput than G1GC. This is acceptable for latency-critical services but may not be acceptable for batch processing.
Java
// ── ZGC key flags (Java 21 LTS) ──────────────────────────────────────
// java -XX:+UseZGC                         (enable ZGC)
// java -XX:+UseZGC -XX:+ZGenerational      (generational ZGC, Java 21+, recommended)
// java -XX:SoftMaxHeapSize=2g              (soft limit, GC more aggressively before this)
// java -Xmx4g                              (hard limit)
// ZGC automatically sizes GC threads — usually no tuning needed

// ── Shenandoah key flags ──────────────────────────────────────────────
// java -XX:+UseShenandoahGC
// java -XX:ShenandoahGCMode=iu             (incremental update, default)
// java -XX:ShenandoahGCMode=satb           (snapshot-at-beginning)

// ── Choosing the right GC ─────────────────────────────────────────────
//
// Criterion                 Serial  Parallel   G1GC    ZGC    Shenandoah
// Heap size                 Small   Any        Any     Any    Any
// Throughput                LOW     BEST       GOOD    GOOD   GOOD
// Max pause (200MB heap)    Secs    100-500ms  <200ms  <1ms   <1ms
// Max pause (64GB heap)     N/A     Secs       200ms+  <1ms   <1ms
// Memory overhead           Lowest  Low        Medium  Medium Medium
// JDK availability          All     All        All     11+    12+
// Maturity                  Oldest  Old        8+years 4years 4years
//
// Decision:
// Default (most apps):      G1GC — well-tested, balanced
// Latency-critical (<1ms):  ZGC or Shenandoah
// Batch/throughput jobs:    Parallel GC
// Memory-constrained:       Serial GC
// Java 21+ new apps:        ZGC with generational mode

// ── ZGC in practice — tuning is minimal ──────────────────────────────
// Unlike G1GC, ZGC rarely needs tuning beyond heap size.
// Key metrics to watch:
//   Allocation stall: application threads blocked waiting for GC
//   → Fix: increase heap (ZGC needs headroom for concurrent work)
//
// Recommendation: set Xmx 30-50% larger than live set
// Live set 2GB → -Xmx3g or -Xmx4g provides enough headroom for ZGC

// ── Concurrent GC overhead measurement ───────────────────────────────
// -Xlog:gc*:gc.log:time,uptime,level,tags
// Look for: concurrent-mark, concurrent-relocate durations
// These consume CPU but do NOT stop the application
// Throughput reduction ≈ GC thread CPU / total CPU

Related Topics in Java Memory Management

Stack Memory
Stack memory is the region of memory where the JVM stores method invocation frames, local variables, and partial results. Every thread has its own private stack created at thread creation, and the stack grows and shrinks as methods are called and return. Stack memory operates on a last-in, first-out discipline — the frame for the most recently called method sits on top, and when that method returns its frame is immediately discarded. Understanding stack memory explains why local variables are thread-safe by default, why recursive algorithms can cause StackOverflowError, why primitive values behave differently from objects, and what the JVM does at every method call and return. This entry covers stack frame structure, the stack pointer, local variable storage, operand stacks, frame lifecycle, thread isolation, and the performance characteristics that make stack allocation extremely fast.
Heap Memory
The heap is the runtime data area from which all Java object instances and arrays are allocated. It is shared across all threads in a JVM process, grows dynamically up to a configured maximum, and is managed entirely by the garbage collector. Understanding heap memory means understanding how objects are allocated, how the generational hypothesis drives GC design, how the major garbage collectors (G1, ZGC, Shenandoah, Parallel) partition and manage the heap, what triggers garbage collection, how to tune heap size and GC behaviour, and how to diagnose heap-related problems including OutOfMemoryError and excessive GC pause times. This entry covers heap structure, generational design, object allocation, garbage collection triggers, common collectors, heap tuning, and memory leak detection.
Metaspace
Metaspace is the JVM memory region that stores class metadata — the internal representations of loaded classes, methods, fields, constant pools, and annotations. It replaced PermGen (Permanent Generation) in Java 8. Unlike PermGen which was a fixed-size heap region, Metaspace is allocated from native memory (outside the Java heap) and grows dynamically up to an optional maximum. Understanding Metaspace means understanding what class metadata contains, what causes Metaspace to grow, how class unloading reclaims Metaspace, what OutOfMemoryError from Metaspace looks like, how to monitor and limit it, and what the practical implications are for application servers, OSGi containers, and frameworks that generate classes dynamically.
JVM Memory Model
The JVM Memory Model encompasses two related but distinct concepts. The first is the runtime memory architecture — how the JVM partitions memory into regions (stack, heap, Metaspace, code cache, native memory) and what each region stores. The second, more precise meaning is the Java Memory Model (JMM) defined in the Java Language Specification — the formal set of rules that govern how multithreaded programs observe memory: when writes made by one thread become visible to other threads, what ordering guarantees exist, and how volatile, synchronized, and final provide those guarantees. This entry covers both: the complete runtime memory architecture with all regions explained, and the Java Memory Model's happens-before relationship, visibility rules, and the practical consequences for concurrent code.