
Intermediate Operations

Intermediate operations are Stream API methods that transform one stream into another stream, enabling pipeline construction through method chaining. Every intermediate operation is lazy — it does not process any elements when called; it only builds a description of the computation to be performed. The actual processing occurs only when a terminal operation triggers stream traversal, at which point all intermediate operations execute in a single pass over the data, interleaved element by element rather than stage by stage. This laziness and fusion model is the defining architectural feature of the Streams API, distinguishing it from eager collection-transformation approaches. Intermediate operations fall into categories: filtering (filter, distinct), transformation (map and its primitive variants, flatMap), ordering (sorted), size-limiting (limit, skip), peeking (peek), and the Java 9 additions for prefix-based selection (takeWhile, dropWhile). This entry covers the laziness model and why it matters, every intermediate operation with its exact semantics and performance characteristics, stateless versus stateful intermediate operations, short-circuiting operations and how they interact with infinite streams, the map/flatMap distinction, and operation ordering and its performance implications.

Laziness, Pipeline Fusion, and Stateless vs Stateful Operations

Every intermediate operation in the Stream API returns a new Stream without consuming any elements from the source. Calling stream.filter(predicate) does not iterate the source; it returns a new Stream object that records the filtering operation. This laziness is fundamental: a pipeline built from any number of intermediate operations performs zero work until a terminal operation is invoked. Stream.of(1,2,3).filter(n -> { throw new RuntimeException(); }) does not throw, because the lambda is never called when no terminal operation triggers traversal.

When a terminal operation does execute, the stream implementation does not process the pipeline stage by stage (filtering all elements, then mapping all results, then collecting). Instead, it fuses the intermediate operations into a single pass: each source element flows through the entire pipeline (filter, then map, then any subsequent operations) before the next element is processed. This is why peek() calls interleave with terminal-operation processing rather than completing before the next stage begins, and why a filter() that rejects 99% of elements means the subsequent map() runs only on the 1% that pass, without ever materializing an intermediate collection of filtered results.

Intermediate operations are classified as stateless or stateful based on whether processing one element requires knowledge of other elements. filter(), map(), mapToInt/Long/Double(), flatMap(), peek(), and unordered() are stateless: each element is processed independently of all others, with no buffering. sorted(), distinct(), limit(), and skip() are stateful: sorted() must see every element before it can emit the first one, distinct() must remember every element already seen, and limit() and skip() must count the elements that have passed through.

The performance consequence of stateful operations is significant for parallel streams. Stateless operations parallelize trivially because each thread can process its chunk of elements independently. Stateful operations require coordination: sorted() must merge sorted chunks from each thread (effectively a parallel merge sort), distinct() must synchronize on a shared set or merge per-thread sets, and limit() must coordinate to know when enough elements have been produced across all threads. This coordination overhead can make a stateful operation in a parallel stream slower than the sequential equivalent for small or moderately sized inputs.

Short-circuiting operations are those that may terminate processing before examining all elements: limit(n) stops after producing n elements, and takeWhile() stops at the first element that fails the predicate. These are essential for working with infinite streams (Stream.iterate(), Stream.generate()): without a short-circuiting operation somewhere in the pipeline, processing an infinite stream never terminates.
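
The laziness claims above can be checked with a small program. The following is a minimal sketch, not a definitive test harness; the class name LazinessDemo and its method names are hypothetical and exist only for this illustration. It builds a pipeline whose predicate always throws, shows that nothing is thrown until a terminal operation runs, and counts how often map() is invoked behind a selective filter().

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; the names are illustrative only.
public class LazinessDemo {

    // Builds a pipeline whose predicate always throws, without a terminal operation.
    // Returns true when building the pipeline raised no exception.
    public static boolean buildsWithoutThrowing() {
        try {
            Stream<Integer> pipeline = Stream.of(1, 2, 3)
                .filter(n -> { throw new IllegalStateException("predicate called"); });
            return true;   // the lambda was recorded, never invoked
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // Same pipeline, but with a terminal operation: the predicate now runs and throws.
    public static boolean throwsOnTerminal() {
        try {
            Stream.of(1, 2, 3)
                .filter(n -> { throw new IllegalStateException("predicate called"); })
                .collect(Collectors.toList());
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Counts map() invocations when filter() lets only two of ten elements through.
    public static int mapCallsAfterFilter() {
        AtomicInteger mapCalls = new AtomicInteger();
        List<Integer> kept = Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
            .filter(n -> n % 5 == 0)                                  // passes 5 and 10
            .map(n -> { mapCalls.incrementAndGet(); return n * 2; })  // runs only for survivors
            .collect(Collectors.toList());
        return mapCalls.get();
    }

    public static void main(String[] args) {
        System.out.println(buildsWithoutThrowing());  // true
        System.out.println(throwsOnTerminal());       // true
        System.out.println(mapCallsAfterFilter());    // 2
    }
}
```

The counter is an AtomicInteger only because a lambda cannot modify a local int; in a sequential pipeline no actual concurrency is involved.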
Java
// ── Laziness — nothing happens until a terminal operation ─────────────
Stream<Integer> lazy = Stream.of(1, 2, 3, 4, 5)
    .filter(n -> {
        System.out.println("filtering " + n);
        return n % 2 == 0;
    })
    .map(n -> {
        System.out.println("mapping " + n);
        return n * 10;
    });
// Nothing printed yet — no terminal operation called

System.out.println("Pipeline built, about to consume...");
List<Integer> result = lazy.collect(Collectors.toList());  // NOW the pipeline executes
System.out.println(result);

// Output:
// Pipeline built, about to consume...
// filtering 1
// filtering 2
// mapping 2
// filtering 3
// filtering 4
// mapping 4
// filtering 5
// [20, 40]
// Note: element-by-element interleaving — NOT all filters then all maps

// ── Pipeline fusion — single pass, not staged ──────────────────────────
List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

List<String> fused = numbers.stream()
    .peek(n -> System.out.println("1. saw: " + n))
    .filter(n -> n % 2 == 0)
    .peek(n -> System.out.println("2. passed filter: " + n))
    .map(n -> n * n)
    .peek(n -> System.out.println("3. mapped: " + n))
    .map(String::valueOf)
    .collect(Collectors.toList());
// Each number flows through ALL stages before the next number starts:
// 1. saw: 1
// 1. saw: 2
// 2. passed filter: 2
// 3. mapped: 4
// 1. saw: 3
// 1. saw: 4
// 2. passed filter: 4
// 3. mapped: 16
// ... (interleaved, not staged)

// ── Stateless operations — parallelize trivially ──────────────────────
List<Integer> bigList = IntStream.rangeClosed(1, 1_000_000).boxed().collect(Collectors.toList());

// filter, map are stateless — each thread handles its chunk independently:
long count = bigList.parallelStream()
    .filter(n -> n % 7 == 0)   // stateless: no coordination needed
    .map(n -> n * 2)            // stateless: no coordination needed
    .count();
System.out.println(count);

// ── Stateful operations — require coordination in parallel ────────────
// sorted() requires global ordering knowledge:
List<Integer> sorted = bigList.parallelStream()
    .sorted()                   // stateful: parallel merge sort under the hood
    .collect(Collectors.toList());

// distinct() requires tracking all seen elements:
List<Integer> withDupes = IntStream.range(0, 1_000_000)
    .mapToObj(n -> n % 1000)    // creates many duplicates
    .collect(Collectors.toList());
List<Integer> unique = withDupes.parallelStream()
    .distinct()                 // stateful: must coordinate seen-set across threads
    .collect(Collectors.toList());

// ── Short-circuiting with infinite streams ─────────────────────────────
// Without short-circuit: infinite loop (would run forever):
// Stream.iterate(1, n -> n + 1).forEach(System.out::println);  // NEVER terminates

// With limit() (short-circuiting): terminates correctly:
List<Integer> firstTen = Stream.iterate(1, n -> n + 1)
    .limit(10)   // SHORT-CIRCUIT: stops infinite stream after 10 elements
    .collect(Collectors.toList());
System.out.println(firstTen);  // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

// takeWhile (short-circuiting, Java 9+):
List<Integer> belowHundred = Stream.iterate(1, n -> n * 2)
    .takeWhile(n -> n < 100)   // SHORT-CIRCUIT: stops at first n >= 100
    .collect(Collectors.toList());
System.out.println(belowHundred);  // [1, 2, 4, 8, 16, 32, 64]

// Generate with limit:
Stream.generate(Math::random)
    .limit(3)
    .forEach(System.out::println);  // exactly 3 random doubles

filter, map, flatMap, and Primitive Variants

filter(Predicate<? super T> predicate) returns a stream containing only the elements that satisfy the predicate. It is stateless, and the only guarantee about the output size is that it is less than or equal to the input size. Multiple chained filter() calls are equivalent to a single filter() with an AND of all predicates, but the stream library does not automatically merge them: each filter() call adds a stage to the pipeline.

map(Function<? super T, ? extends R> mapper) transforms each element using the mapper function, producing a stream of the mapper's return type. It is a strict one-to-one transformation: exactly one output element for each input element (unlike flatMap, which can produce zero, one, or many output elements per input). mapToInt(), mapToLong(), and mapToDouble() are specialized forms that produce primitive streams (IntStream, LongStream, DoubleStream) instead of a boxed Stream<Integer> and so on, avoiding boxing overhead in numeric pipelines. The boxed() method on a primitive stream converts back to the boxed form when object semantics are needed downstream (for collecting into a List<Integer>, for example).

flatMap(Function<? super T, ? extends Stream<? extends R>> mapper) is the operation for one-to-many transformations: each input element is mapped to a Stream (possibly empty, possibly containing multiple elements), and all the resulting streams are concatenated (flattened) into a single output stream. This is the correct tool for "list of lists to flat list," "object to its collection of children," and any transformation where the natural mapping produces a variable number of outputs per input. flatMapToInt(), flatMapToLong(), and flatMapToDouble() are the variants that flatten into a primitive stream. mapToObj() goes the other direction: it is defined on the primitive streams and maps each primitive value to an object, producing a Stream<R>.

The map vs flatMap distinction is the most common point of confusion for developers new to streams. map(x -> someList) produces a Stream<List<T>>, a stream where each element is itself a list. flatMap(x -> someList.stream()) produces a Stream<T>, a flattened stream of all the lists' elements combined. The rule: if your mapper function naturally returns a Stream (or something convertible to one, like a Collection via .stream()), and you want the elements of all those streams combined into one stream, use flatMap. If your mapper returns a single value per input, use map.

mapMulti() (Java 16+) is an alternative to flatMap for one-to-many transformations that avoids the overhead of stream creation: instead of returning a Stream, the mapper is a BiConsumer that calls a provided consumer for each output element. mapMulti() can be more efficient than flatMap() when the number of output elements per input is small and creating a Stream object for each input element adds measurable overhead.
Java
// ── filter — selecting elements ─────────────────────────────────────
List<String> words = List.of("apple", "fig", "banana", "kiwi", "cherry", "date");

List<String> longWords = words.stream()
    .filter(w -> w.length() > 4)
    .collect(Collectors.toList());
System.out.println(longWords);  // [apple, banana, cherry]

// Multiple filter() calls = AND of predicates (separate pipeline stages):
List<String> filtered = words.stream()
    .filter(w -> w.length() > 3)
    .filter(w -> w.startsWith("b") || w.startsWith("c"))
    .collect(Collectors.toList());
System.out.println(filtered);  // [banana, cherry]

// ── map — one-to-one transformation ──────────────────────────────────
List<Integer> lengths = words.stream()
    .map(String::length)
    .collect(Collectors.toList());
System.out.println(lengths);  // [5, 3, 6, 4, 6, 4]

// map() always preserves count: input.size() == output.size()
List<String> upper = words.stream()
    .map(String::toUpperCase)
    .collect(Collectors.toList());
System.out.println(upper);  // [APPLE, FIG, BANANA, KIWI, CHERRY, DATE]

// ── mapToInt/Long/Double — avoid boxing ────────────────────────────────
int totalLength = words.stream()
    .mapToInt(String::length)   // IntStream — no Integer boxing
    .sum();
System.out.println(totalLength);  // 28

OptionalDouble avgLength = words.stream()
    .mapToInt(String::length)
    .average();
System.out.println(avgLength);  // OptionalDouble[4.666...]

// boxed() — convert primitive stream back to object stream:
List<Integer> lengthsAsObjects = words.stream()
    .mapToInt(String::length)
    .boxed()                     // IntStream → Stream<Integer>
    .collect(Collectors.toList());

// ── flatMap — one-to-many flattening ────────────────────────────────
List<List<Integer>> nested = List.of(
    List.of(1, 2, 3),
    List.of(4, 5),
    List.of(6, 7, 8, 9)
);

// WITHOUT flatMap — map produces Stream<List<Integer>>:
List<List<Integer>> stillNested = nested.stream()
    .map(list -> list)   // no-op, illustrating the shape
    .collect(Collectors.toList());
// stillNested is List<List<Integer>> — not flattened

// WITH flatMap — produces Stream<Integer>:
List<Integer> flattened = nested.stream()
    .flatMap(List::stream)   // each inner list → its own stream, concatenated
    .collect(Collectors.toList());
System.out.println(flattened);  // [1, 2, 3, 4, 5, 6, 7, 8, 9]

// Common use case: object → collection of children
record Author(String name, List<String> books) {}
List<Author> authors = List.of(
    new Author("Author A", List.of("Book 1", "Book 2")),
    new Author("Author B", List.of("Book 3")),
    new Author("Author C", List.of())   // no books
);

List<String> allBooks = authors.stream()
    .flatMap(a -> a.books().stream())
    .collect(Collectors.toList());
System.out.println(allBooks);  // [Book 1, Book 2, Book 3]  (empty list contributes nothing)

// flatMap with Optional (treating Optional as a 0-or-1 element stream):
List<Optional<String>> optionals = List.of(
    Optional.of("present"), Optional.empty(), Optional.of("also present")
);
List<String> presentValues = optionals.stream()
    .flatMap(Optional::stream)   // Optional.stream() — Java 9+
    .collect(Collectors.toList());
System.out.println(presentValues);  // [present, also present]

// ── flatMapToInt/Long/Double — primitive flattening ────────────────────
List<String> sentences = List.of("hello world", "java streams api");
int totalChars = sentences.stream()
    .flatMapToInt(s -> s.chars())   // each sentence → IntStream of char codes
    .filter(c -> c != ' ')
    .map(c -> 1)
    .sum();
System.out.println("Non-space chars: " + totalChars);

// ── mapMulti — Java 16+, avoids stream creation overhead ──────────────
List<Integer> numbers = List.of(1, 2, 3, 4, 5);

// flatMap version — creates a Stream object per element:
List<Integer> flatMapResult = numbers.stream()
    .flatMap(n -> n % 2 == 0 ? Stream.of(n, n * 10) : Stream.empty())
    .collect(Collectors.toList());

// mapMulti version — uses a consumer, no Stream object per element:
List<Integer> mapMultiResult = numbers.stream()
    .<Integer>mapMulti((n, consumer) -> {
        if (n % 2 == 0) {
            consumer.accept(n);
            consumer.accept(n * 10);
        }
    })
    .collect(Collectors.toList());

System.out.println(flatMapResult);   // [2, 20, 4, 40]
System.out.println(mapMultiResult);  // [2, 20, 4, 40]  — same result, less overhead
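
Two points from this section are not exercised by the snippets above: mapToObj() as the route from a primitive stream back to an object stream, and the equivalence between chained filter() calls and a single filter() with a composed predicate. The sketch below illustrates both under stated assumptions; the class name MappingShapesDemo, its method names, and the "item-" label format are hypothetical and chosen only for this example.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class; the names are illustrative only.
public class MappingShapesDemo {

    // IntStream -> Stream<String>: mapToObj is defined on the primitive stream.
    public static List<String> labels(int count) {
        return IntStream.rangeClosed(1, count)
            .mapToObj(i -> "item-" + i)
            .collect(Collectors.toList());
    }

    // Two chained filter() calls: two pipeline stages.
    public static List<String> chained(List<String> words) {
        return words.stream()
            .filter(w -> w.length() > 3)
            .filter(w -> w.startsWith("b"))
            .collect(Collectors.toList());
    }

    // One filter() call with the predicates combined via Predicate.and: one stage.
    public static List<String> combined(List<String> words) {
        Predicate<String> longerThanThree = w -> w.length() > 3;
        Predicate<String> startsWithB = w -> w.startsWith("b");
        return words.stream()
            .filter(longerThanThree.and(startsWithB))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "fig", "banana", "bat", "blueberry");
        System.out.println(labels(3));        // [item-1, item-2, item-3]
        System.out.println(chained(words));   // [banana, blueberry]
        System.out.println(combined(words));  // [banana, blueberry]
    }
}
```

Both filtering styles produce the same result. Which one reads better is largely a matter of taste; the composed form is convenient when the individual predicates are reused elsewhere.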

sorted, distinct, limit, skip, peek, takeWhile, dropWhile

sorted() sorts elements according to their natural order; the elements must implement Comparable, otherwise a ClassCastException may be thrown when the terminal operation runs. sorted(Comparator<? super T> comparator) sorts according to the provided comparator. Both are stateful operations: the entire upstream must be consumed before any element can be emitted downstream, because the relative order of any two elements may depend on elements not yet seen. sorted() therefore acts as a barrier in the element-by-element fusion model: all elements accumulate, are sorted, and only then proceed to the downstream stages.

distinct() removes duplicate elements according to equals() and hashCode(), preserving encounter order for ordered streams. It is stateful: it must remember every element seen to detect duplicates, requiring O(n) memory in the worst case (all elements unique). For unordered streams (after BaseStream.unordered() or from a HashSet source), distinct() may execute more efficiently in parallel because it is not constrained to preserve a specific order, but it must still track seen elements.

limit(long maxSize) truncates the stream to at most maxSize elements. It is a short-circuiting operation: a pipeline with limit() can terminate after producing maxSize elements even if the source is infinite. limit() applied after sorted() requires the full sort to complete first, since the "first n" elements after sorting depend on the complete order. limit() applied directly to a source in a sequential pipeline is much cheaper because it only needs to count.

skip(long n) discards the first n elements and emits the rest. It is stateful in the sense that it must count elements, but it requires no buffering: once n elements have been counted and discarded, all subsequent elements pass through. skip() combined with limit() implements pagination with a zero-based page number: stream.skip(pageSize * pageNumber).limit(pageSize).

peek(Consumer<? super T> action) performs the action on each element as it flows through, without changing the stream's content. The Javadoc describes it as existing mainly to support debugging. Relying on it for other side effects is fragile because the action runs only for elements that are actually traversed: a short-circuiting terminal operation such as findFirst() stops early, so the action never runs for unvisited elements, and an implementation may skip the pipeline entirely when the result can be computed without it (count() on a source of known size, since Java 9).

takeWhile(Predicate<? super T> predicate) (Java 9+) emits elements from the start of the stream as long as the predicate is true, stopping at the first element that fails. It is short-circuiting. dropWhile(Predicate<? super T> predicate) (Java 9+) discards elements from the start as long as the predicate is true, then emits all remaining elements: the first one that failed the predicate and everything after it, regardless of whether later elements would also satisfy the predicate. These differ from filter() because they respect encounter order and stop or start based on a contiguous prefix, not on a per-element test applied to the whole stream.
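
The caveat about peek() can be observed by counting how often its action runs. The sketch below is illustrative only; the class name PeekTraversalDemo and its method names are hypothetical. It assumes a sequential stream over a List. The number of peek() calls behind count() is deliberately not stated as a fixed value, because it depends on the Java version and stream implementation.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; the names are illustrative only.
public class PeekTraversalDemo {

    // findFirst() short-circuits: a sequential pipeline pulls one element and stops.
    public static int peekCallsWithFindFirst(List<String> source) {
        AtomicInteger calls = new AtomicInteger();
        source.stream()
            .peek(s -> calls.incrementAndGet())
            .findFirst();
        return calls.get();
    }

    // count() on a sized source: the implementation may answer from the source size
    // without running the pipeline, so the number of peek() calls is not guaranteed.
    // Returns { count, peekCalls }.
    public static long[] countWithPeek(List<String> source) {
        AtomicInteger calls = new AtomicInteger();
        long count = source.stream()
            .peek(s -> calls.incrementAndGet())
            .count();
        return new long[] { count, calls.get() };
    }

    public static void main(String[] args) {
        List<String> letters = List.of("a", "b", "c", "d");
        System.out.println(peekCallsWithFindFirst(letters));  // 1
        long[] result = countWithPeek(letters);
        System.out.println("count = " + result[0]);           // count = 4
        System.out.println("peek calls = " + result[1]);      // implementation-dependent
    }
}
```

If the side effect matters, it is safer to perform it in the terminal operation (for example inside forEach) than to rely on peek().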
Java
// ── sorted() — natural order and custom comparator ────────────────────
List<String> words = List.of("banana", "apple", "cherry", "date", "fig");

List<String> naturalSort = words.stream()
    .sorted()   // natural order — requires Comparable
    .collect(Collectors.toList());
System.out.println(naturalSort);  // [apple, banana, cherry, date, fig]

List<String> byLengthThenAlpha = words.stream()
    .sorted(Comparator.comparingInt(String::length).thenComparing(Comparator.naturalOrder()))
    .collect(Collectors.toList());
System.out.println(byLengthThenAlpha);  // [fig, date, apple, banana, cherry]

// sorted() is stateful — must consume ALL elements before emitting ANY:
Stream.of(5, 3, 1, 4, 2)
    .peek(n -> System.out.println("before sort: " + n))
    .sorted()
    .peek(n -> System.out.println("after sort: " + n))
    .forEach(n -> {});
// ALL "before sort" prints happen BEFORE any "after sort" — unlike stateless ops

// ── distinct() — deduplication by equals/hashCode ─────────────────────
List<Integer> withDupes = List.of(1, 2, 2, 3, 3, 3, 4, 1, 5);
List<Integer> unique = withDupes.stream()
    .distinct()   // preserves first-encounter order
    .collect(Collectors.toList());
System.out.println(unique);  // [1, 2, 3, 4, 5]

// distinct() on custom objects requires equals/hashCode override:
record Point(int x, int y) {}   // record auto-generates equals/hashCode
List<Point> points = List.of(new Point(1,1), new Point(2,2), new Point(1,1));
List<Point> uniquePoints = points.stream().distinct().collect(Collectors.toList());
System.out.println(uniquePoints.size());  // 2

// ── limit() and skip() — pagination pattern ───────────────────────────
List<Integer> data = IntStream.rangeClosed(1, 100).boxed().collect(Collectors.toList());

int pageSize = 10, pageNumber = 2;  // 0-indexed: page 2 = items 21-30
List<Integer> page = data.stream()
    .skip((long) pageSize * pageNumber)
    .limit(pageSize)
    .collect(Collectors.toList());
System.out.println(page);  // [21, 22, ..., 30]

// limit() is short-circuiting — efficient for infinite/large streams:
List<Integer> firstFew = Stream.iterate(1, n -> n + 1)
    .limit(5)
    .collect(Collectors.toList());
System.out.println(firstFew);  // [1, 2, 3, 4, 5]

// limit() after sorted() requires full sort first (expensive for large data):
List<Integer> top5 = data.stream()
    .sorted(Comparator.reverseOrder())  // must sort ALL 100 elements
    .limit(5)                           // then take top 5
    .collect(Collectors.toList());
// For large inputs, a bounded min-heap (a PriorityQueue capped at N entries) finds the top N without a full sort

// ── peek() — debugging only, NOT for side effects in production ──────
List<Integer> debugged = Stream.of(1, 2, 3, 4, 5)
    .peek(n -> System.out.println("Processing: " + n))
    .filter(n -> n % 2 == 0)
    .peek(n -> System.out.println("After filter: " + n))
    .collect(Collectors.toList());

// CAUTION: peek() runs only for elements that are actually traversed:
Optional<Integer> first = Stream.of(1, 2, 3, 4, 5)
    .peek(n -> System.out.println("Peeked: " + n))  // this sequential run prints only "Peeked: 1"
    .filter(n -> n > 0)
    .findFirst();   // short-circuits after first match
// Elements 2-5 are never pulled from the source, so peek() never sees them

// ── takeWhile() and dropWhile() — Java 9+ ─────────────────────────────
List<Integer> sequence = List.of(2, 4, 6, 7, 8, 10, 12);  // 7 (index 3) breaks the run of leading evens

// takeWhile: stops at FIRST failure (7 is odd), even though 8,10,12 are even:
List<Integer> leadingEvens = sequence.stream()
    .takeWhile(n -> n % 2 == 0)
    .collect(Collectors.toList());
System.out.println(leadingEvens);  // [2, 4, 6] — stops at 7, doesn't continue past

// Compare with filter — examines EVERY element, not just the prefix:
List<Integer> allEvens = sequence.stream()
    .filter(n -> n % 2 == 0)
    .collect(Collectors.toList());
System.out.println(allEvens);  // [2, 4, 6, 8, 10, 12] — includes elements after the odd one

// dropWhile: discards leading matches, keeps everything from first failure onward:
List<Integer> afterFirstOdd = sequence.stream()
    .dropWhile(n -> n % 2 == 0)
    .collect(Collectors.toList());
System.out.println(afterFirstOdd);  // [7, 8, 10, 12] — 7 onward, including later evens

// Practical use: parsing structured text with a header section:
List<String> fileLines = List.of(
    "# comment", "# another comment", "data1", "data2", "data3"
);
List<String> dataLines = fileLines.stream()
    .dropWhile(line -> line.startsWith("#"))
    .collect(Collectors.toList());
System.out.println(dataLines);  // [data1, data2, data3]
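
The "top N" comment in the snippet above mentions a PriorityQueue as an alternative to sorting everything and then applying limit(). The following is a rough sketch of that idea, assuming a bounded min-heap that keeps only the N largest values seen so far; the class name TopNDemo and its method names are hypothetical. It is one possible approach, not a claim about how sorted().limit() is implemented internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class; the names are illustrative only.
public class TopNDemo {

    // Stream version: sorts every element, then keeps the first n.
    public static List<Integer> topNSorted(List<Integer> data, int n) {
        return data.stream()
            .sorted(Comparator.reverseOrder())
            .limit(n)
            .collect(Collectors.toList());
    }

    // Heap version: keeps at most n elements in a min-heap, O(size * log n) overall.
    public static List<Integer> topNHeap(List<Integer> data, int n) {
        if (n <= 0) {
            return new ArrayList<>();   // PriorityQueue rejects an initial capacity below 1
        }
        PriorityQueue<Integer> heap = new PriorityQueue<>(n);  // smallest retained value on top
        for (Integer value : data) {
            if (heap.size() < n) {
                heap.offer(value);
            } else if (value.compareTo(heap.peek()) > 0) {
                heap.poll();            // evict the smallest retained value
                heap.offer(value);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Comparator.reverseOrder());  // only n elements are sorted here
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.rangeClosed(1, 100).boxed().collect(Collectors.toList());
        System.out.println(topNSorted(data, 5));  // [100, 99, 98, 97, 96]
        System.out.println(topNHeap(data, 5));    // [100, 99, 98, 97, 96]
    }
}
```

For 100 elements the difference is negligible and the stream version is easier to read; the heap version is worth considering only when the input is large and N is small.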
