☕ Java

LinkedHashSet

LinkedHashSet is a Set implementation that combines the O(1) performance of a hash table with the predictable iteration order of a doubly-linked list. It maintains a linked list running through its entries in the order elements were inserted, so iteration always produces elements in insertion order. LinkedHashSet occupies the middle ground between HashSet (no order guarantee) and TreeSet (sorted order, O(log n) operations) — it provides the ordering guarantee that HashSet lacks at nearly the same performance cost. This entry covers the dual data structure implementation, insertion-order semantics, the memory overhead of the linked list, when LinkedHashSet is the right choice over HashSet and TreeSet, and practical patterns including deduplication with preserved order.

Dual Structure — Hash Table Plus Linked List

LinkedHashSet extends HashSet and is backed by a LinkedHashMap. It combines two data structures: a hash table for O(1) average-case add, remove, and contains operations (identical to HashSet), and a doubly-linked list that connects all entries in insertion order. Every element is both an entry in the hash table and a node in the linked list. The linked list is used exclusively for iteration — it provides the ordered traversal that the hash table cannot. When an element is added, it is inserted into the hash table (its bucket determined by its hash code) and appended to the tail of the linked list. When an element is removed, it is removed from its hash table bucket and unlinked from the doubly-linked list by updating its predecessor's next pointer and its successor's prev pointer. These linked list operations are O(1) given the node reference, which the hash table lookup provides. The memory overhead of LinkedHashSet over HashSet is exactly the doubly-linked list: two additional reference fields per entry (before and after, pointing to the predecessor and successor in insertion order). For an entry holding a reference to an element, the overhead is roughly 50%: two extra references on top of the existing three (key, value, next-in-bucket). This is significant for large sets with small elements but negligible for large objects. The key guarantee: iteration order is insertion order, defined as the order in which elements were first successfully added (ignored re-insertions of existing elements). If "Alice" is added at time 1 and "Bob" at time 2, the iterator always produces "Alice" before "Bob" regardless of their hash codes.
Java
// ── Insertion order preserved ─────────────────────────────────────────
Set<String> linked = new LinkedHashSet<>();
linked.add("banana");
linked.add("apple");
linked.add("cherry");
linked.add("date");

// Iteration produces elements in INSERTION order:
for (String s : linked) System.out.print(s + " ");
// banana apple cherry date — always, regardless of hash codes

// Compare with HashSet:
Set<String> hashed = new HashSet<>(linked);
for (String s : hashed) System.out.print(s + " ");
// might print: apple banana date cherry — order unpredictable

// ── Re-insertion does not change order ────────────────────────────────
Set<String> set = new LinkedHashSet<>();
set.add("A");
set.add("B");
set.add("C");
set.add("A");   // already present — ignored, position in list unchanged

for (String s : set) System.out.print(s + " ");
// A B C — A stays at its original insertion position

// ── Dual structure explained ──────────────────────────────────────────
//
// LinkedHashSet with {"banana", "apple", "cherry"}:
//
// Hash table (for O(1) lookup):
// Bucket 2: [banana]
// Bucket 5: [apple]
// Bucket 7: [cherry]
//
// Doubly-linked list (for ordered iteration):
// head ↔ banana ↔ apple ↔ cherry ↔ tail
//
// contains("apple"): hash("apple") → bucket 5 → found → O(1)
// iteration: follow linked list: banana → apple → cherry

// ── Memory overhead per entry ─────────────────────────────────────────
// HashSet entry: { hash, key, value(dummy), next-in-bucket }
// LinkedHashSet entry: { hash, key, value(dummy), next-in-bucket,
//                        before(linked list prev),
//                        after(linked list next) }
// Extra: 2 references × 8 bytes = 16 bytes per entry
// For large sets of small objects, this is significant
// For large objects (e.g., complex DTOs), the overhead is negligible

When to Choose LinkedHashSet

The choice between HashSet, LinkedHashSet, and TreeSet maps to three distinct requirements. Use HashSet when ordering does not matter and maximum performance is the priority. Use LinkedHashSet when insertion order must be preserved — for deduplication that maintains original order, for output that should appear in the same sequence as input, or for any set that will be iterated repeatedly where a consistent traversal order matters for correctness or readability. Use TreeSet when elements must be in sorted order. LinkedHashSet is the natural tool for deduplication that preserves the order of first occurrence. Converting a list to a HashSet deduplicates but loses order. Converting to a LinkedHashSet deduplicates while maintaining the order in which unique elements first appeared. This is a common requirement in data processing: deduplicate a list of user events while preserving chronological order, deduplicate a CSV of product codes while preserving the sequence for downstream processing. LinkedHashSet is also useful for caches, recently-used lists, and any set that represents an ordered history. The insertion-order semantics mean the set can serve as a compact deduplicating buffer: as items arrive, inserting them in a LinkedHashSet automatically eliminates duplicates while preserving arrival order. The performance of LinkedHashSet versus HashSet is nearly identical for add, remove, and contains — both are O(1) average case. The difference is the constant factor: LinkedHashSet's operations are slightly slower due to linked list maintenance. For iteration, LinkedHashSet is actually faster than HashSet in practice: the linked list traversal follows a predictable pointer chain in memory order, while HashSet iteration scans the entire bucket array including empty buckets.
Java
// ── Deduplication preserving insertion order ─────────────────────────
List<String> events = List.of(
    "login", "view", "login", "purchase", "view", "logout", "login");

// HashSet dedup — order lost:
Set<String> unordered = new HashSet<>(events);
// might print: purchase logout login view

// LinkedHashSet dedup — insertion order preserved:
Set<String> ordered = new LinkedHashSet<>(events);
for (String e : ordered) System.out.print(e + " ");
// login view purchase logout — first occurrence order

// ── Preserving CSV column order while deduplicating ───────────────────
String[] columns = {"name", "email", "phone", "email", "name", "city"};
Set<String> uniqueColumns = new LinkedHashSet<>(Arrays.asList(columns));
// uniqueColumns iteration: name, email, phone, city — preserves first-seen order

// ── OrderedHistory: recent items without duplicates ───────────────────
public class RecentItems<T> {
    private final Set<T> items = new LinkedHashSet<>();
    private final int    maxSize;

    public RecentItems(int maxSize) { this.maxSize = maxSize; }

    public void add(T item) {
        items.remove(item);    // remove if present (to re-insert at end)
        items.add(item);       // add at tail of linked list
        if (items.size() > maxSize) {
            // Remove the oldest item (head of linked list)
            items.remove(items.iterator().next());
        }
    }

    public List<T> getRecentItems() {
        List<T> result = new ArrayList<>(items);
        Collections.reverse(result);  // most recent first
        return result;
    }
}

// Usage:
RecentItems<String> history = new RecentItems<>(5);
history.add("page1");
history.add("page2");
history.add("page3");
history.add("page1");   // moves page1 to end (most recent)
// getRecentItems(): [page1, page3, page2] — most recent first

// ── Performance comparison ────────────────────────────────────────────
// Operation      HashSet    LinkedHashSet   TreeSet
// ─────────────────────────────────────────────────────
// add            O(1)*      O(1)*           O(log n)
// remove         O(1)*      O(1)*           O(log n)
// contains       O(1)*      O(1)*           O(log n)
// iteration      O(capacity) O(size)        O(size)
// memory/entry   baseline   +16 bytes       +node overhead
//
// * amortised average case
// LinkedHashSet iteration is FASTER than HashSet because:
// - HashSet scans the entire bucket array (capacity slots, mostly empty)
// - LinkedHashSet follows a pointer chain through only live entries

Practical Patterns with LinkedHashSet

LinkedHashSet's combination of O(1) operations and insertion-order iteration makes it versatile beyond simple set semantics. It naturally implements a seen-before filter, a deduplicating pipeline stage, and any component that needs to track unique items without losing the sequence in which they appeared. The toList() conversion from a LinkedHashSet preserves insertion order, making it suitable as a deduplication step in a processing pipeline that ends with an ordered list. This pattern appears in recommendation systems (deduplicate candidate items while preserving relevance ranking), in data ETL (deduplicate rows while preserving record sequence), and in UI rendering (deduplicate tags or categories while preserving display order). LinkedHashSet is also useful when printing or serialising a set and a consistent, human-predictable output order is desired. HashSet output changes between runs and between JVM versions because hash code distributions can change. LinkedHashSet output is always the same insertion order, making logs, diffs, and test assertions predictable.
Java
// ── Deduplication pipeline step ──────────────────────────────────────
public List<String> deduplicatePreservingOrder(List<String> input) {
    return new ArrayList<>(new LinkedHashSet<>(input));
    // One line: deduplicate + preserve first-occurrence order + return as List
}

List<String> tags = List.of("java", "spring", "java", "rest", "spring", "api");
List<String> uniqueTags = deduplicatePreservingOrder(tags);
// [java, spring, rest, api] — first occurrence order

// ── Stream-based deduplication with order ─────────────────────────────
// Stream.distinct() also preserves encounter order:
List<String> distinct = tags.stream()
    .distinct()
    .collect(Collectors.toList());
// [java, spring, rest, api] — same result, but LinkedHashSet is faster for large input

// ── Predictable serialisation ──────────────────────────────────────────
// Configuration keys in defined order:
Set<String> configKeys = new LinkedHashSet<>();
configKeys.add("database.url");
configKeys.add("database.username");
configKeys.add("database.password");
configKeys.add("server.port");

// Serialising always produces the same order:
configKeys.forEach(key ->
    System.out.println(key + " = " + config.get(key)));
// database.url = jdbc:...
// database.username = admin
// database.password = ***
// server.port = 8080
// Always in this exact order — predictable for diffs and logs

// ── Testing: predictable assertion order ─────────────────────────────
// HashSet assertion can be flaky:
// assertThat(processResult()).containsExactly("A", "B", "C");
// fails intermittently because HashSet iteration order varies

// LinkedHashSet assertion is stable:
Set<String> result = new LinkedHashSet<>();
result.add("A");
result.add("B");
result.add("C");
// This assertion is always correct:
assertThat(result).containsExactly("A", "B", "C");

// ── Intersection preserving left-operand order ─────────────────────────
public <T> Set<T> orderedIntersection(Set<T> ordered, Set<T> filter) {
    Set<T> result = new LinkedHashSet<>(ordered);
    result.retainAll(filter);  // keeps elements in 'ordered' that are in 'filter'
    return result;             // preserves 'ordered's insertion order
}

Set<String> preferred = new LinkedHashSet<>(
    List.of("cherry", "banana", "apple", "date"));
Set<String> available = Set.of("apple", "cherry", "elderberry");
Set<String> inStock = orderedIntersection(preferred, available);
// [cherry, apple] — preferred order preserved, banana/date excluded

Related Topics in Collections Framework

Collections Overview
The Java Collections Framework (JCF) is a unified architecture for representing and manipulating groups of objects. It provides a set of interfaces that define the operations a collection must support, a set of abstract classes that provide partial implementations, and a set of concrete implementations optimised for different use cases. Every Java developer uses collections daily — lists for sequences, sets for uniqueness, maps for key-value pairs, and queues for ordering — and choosing the right implementation for the right use case is one of the most fundamental practical skills in Java.
Iterable
Iterable<E> is the root interface of the Java Collections hierarchy. Any class that implements Iterable can be used in a for-each loop. It declares a single abstract method: iterator(), which returns an Iterator<E> that the for-each loop uses to traverse the elements. Implementing Iterable is all that is required to make a custom data structure work with Java's enhanced for loop, the Stream API, and any method that accepts an Iterable.
Collection Interface
Collection<E> is the root interface of the main collection hierarchy, extending Iterable<E>. It defines the common operations that all collection types must support: adding elements, removing elements, checking containment, querying size, clearing, converting to an array, and bulk operations. List, Set, and Queue all extend Collection. Map does not extend Collection because a map operates on key-value pairs rather than individual elements.
ArrayList
ArrayList is a resizable-array implementation of the List interface. It is the most commonly used collection in Java, providing dynamic sizing on top of a standard array. Elements are stored in contiguous memory, enabling O(1) random access by index. When the internal array is full, ArrayList automatically allocates a larger array and copies all elements. ArrayList is not thread-safe and preserves insertion order, allowing duplicates.