Matcher Class

The Matcher class (java.util.regex.Matcher) is the engine that applies a compiled regular expression Pattern to a specific input string and performs matching operations. While Pattern represents the compiled regex, Matcher is the stateful object that tracks the current position in the input, finds matches, captures groups, and supports find-and-replace operations. Every interaction with a regex against actual text goes through a Matcher instance.

Pattern and Matcher — The Two-Class Design

Java's regular expression API splits its functionality across two classes: Pattern and Matcher. Understanding why requires understanding the performance model of regular expressions. A regular expression must be compiled from its text form (like "\d{3}-\d{4}") into an internal representation — a finite automaton or equivalent structure — that the matching engine can efficiently execute. This compilation step is expensive relative to the matching step itself. If you re-compiled the same pattern for every string you wanted to check, you would pay the compilation cost repeatedly for no benefit. Pattern represents the compiled, immutable, thread-safe form of the regex. It is compiled once and can be stored as a static final field and reused across thousands of matching operations without any synchronisation concern. Pattern has no mutable state — it is safe to share freely between threads. Matcher is the stateful engine that applies a specific Pattern to a specific input string. It tracks where the last match ended so find() can advance through the input finding successive matches, and it remembers which groups matched what content so group() can retrieve them. Because Matcher maintains this mutable state, a single Matcher is not safe for concurrent use. But since Matchers are cheap to create (pattern.matcher(input) allocates a small stateful wrapper), the correct pattern is to create one Matcher per matching operation rather than sharing. This two-class design is the correct approach: compile the pattern once (expensive, shared), apply it per operation (cheap, not shared). Failing to follow this model — compiling a pattern inside a loop or method that is called repeatedly — wastes CPU time and is a common performance mistake in Java regex code.
Java
// ── Two-class design: Pattern (compiled, shared) + Matcher (stateful, per-use): ─
import java.util.regex.Pattern;
import java.util.regex.Matcher;

// ── BAD — recompiles the pattern every call: ──────────────────────────
public boolean isValidEmail(String email) {
    return email.matches("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}");
    // String.matches() compiles the pattern every time it is called.
    // If this method is called 10,000 times, the pattern is compiled 10,000 times.
}

// ── GOOD — compile once, reuse many times: ────────────────────────────
public class EmailValidator {
    // Pattern is compiled ONCE when the class is loaded:
    private static final Pattern EMAIL_PATTERN =
        Pattern.compile("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}");

    public boolean isValid(String email) {
        // Matcher created per call — cheap, not shared:
        return EMAIL_PATTERN.matcher(email).matches();
    }
}

// ── Creating a Matcher: ───────────────────────────────────────────────
Pattern pattern = Pattern.compile("\\d+");   // one or more digits
Matcher matcher = pattern.matcher("abc 123 def 456 ghi");
// matcher now knows:
//   - what pattern to apply: \d+
//   - what input to apply it to: "abc 123 def 456 ghi"
//   - current position: 0 (start of input)

// ── Pattern is thread-safe — Matcher is NOT: ─────────────────────────
// Safe to share across threads — Pattern is immutable:
public static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{4}");

// Each thread creates its own Matcher:
String input = "Call 555-1234 or 555-9876";
Matcher m = PHONE.matcher(input);   // per-call, per-thread
while (m.find()) {
    System.out.println("Found: " + m.group());
}
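
The sharing rule above can be sketched as a small runnable class. This is only an illustration under stated assumptions: the class name SharedPatternDemo and the helper firstPhone are hypothetical, and List.of needs Java 9 or later. One Pattern constant is shared by every thread of a parallel stream, while each call creates its own Matcher.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SharedPatternDemo {
    // Compiled once; safe to share because Pattern is immutable.
    private static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{4}");

    // Hypothetical helper: returns the first phone number in a line, or "" if none.
    static String firstPhone(String line) {
        Matcher m = PHONE.matcher(line);   // a fresh Matcher per call, never shared
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "Call 555-1234 today",
            "No number here",
            "Fax: 555-9876");

        // parallelStream() may run firstPhone on several threads at once.
        List<String> phones = lines.parallelStream()
            .map(SharedPatternDemo::firstPhone)
            .collect(Collectors.toList());

        System.out.println(phones);   // [555-1234, , 555-9876]
    }
}
```

Because the Matcher never escapes firstPhone, no synchronisation is needed even though the Pattern is used concurrently.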

Core Matching Methods — matches(), find(), lookingAt()

Matcher provides three distinct matching operations that are often confused because they all perform matching but differ fundamentally in scope. Understanding when to use each is essential for writing correct regex code. matches() attempts to match the entire input string against the pattern. It returns true only if the pattern matches the whole string from start to finish; there can be no leading or trailing characters that are not part of the match. This is roughly equivalent to wrapping the pattern in the \A and \z anchors. Anchoring with ^ and $ comes close, but $ also matches just before a final line terminator, so an anchored find() can accept "123\n" where matches() rejects it. Use matches() when you want to validate that an entire value (an email address, a phone number, a date) conforms to a pattern. find() searches for the next occurrence of the pattern anywhere within the input, starting from the current position. Each call to find() advances the position past the last match, allowing successive calls to find all non-overlapping occurrences of the pattern in the input. find() does not require the pattern to consume the entire string; it looks for any substring that matches. Use find() when you want to extract all occurrences of a pattern from within a larger text. lookingAt() is a hybrid: it attempts to match the pattern at the beginning of the input but does not require the match to extend to the end. The pattern must match a prefix of the input. lookingAt() is used much less frequently than the other two. It is appropriate when you are parsing a string from left to right and need to determine whether the text at a given point starts with a particular pattern without consuming the entire remaining input. Because lookingAt() always starts at the beginning of the matcher's region, not at the position left by find(), use region() to move that starting point. The position model of find() is crucial to understand. After a successful find(), the Matcher remembers where the match ended. The next find() call starts searching from that position. The reset() method resets the position back to the beginning of the input, allowing you to iterate through matches a second time without creating a new Matcher.
Java
Pattern digits = Pattern.compile("\\d+");

// ── matches() — entire input must match: ─────────────────────────────
System.out.println(digits.matcher("12345").matches());      // true  — all digits
System.out.println(digits.matcher("123abc").matches());     // false (trailing "abc" is not matched)
System.out.println(digits.matcher("abc123").matches());     // false (leading "abc" is not matched)

Pattern emailPattern = Pattern.compile("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}");
System.out.println(emailPattern.matcher("user@example.com").matches());   // true
System.out.println(emailPattern.matcher("not-an-email").matches());       // false
System.out.println(emailPattern.matcher("user@example.com extra").matches()); // false

// ── find() — search anywhere in input, advance position: ─────────────
Matcher m = digits.matcher("abc 123 def 456 ghi 789");

while (m.find()) {
    System.out.printf("Found '%s' at positions [%d, %d)%n",
        m.group(), m.start(), m.end());
}
// Found '123' at positions [4, 7)
// Found '456' at positions [12, 15)
// Found '789' at positions [20, 23)

// ── find() position model: ────────────────────────────────────────────
Matcher pos = digits.matcher("a1b2c3");  // a new Matcher starts at position 0

pos.find();  System.out.println(pos.group() + " at " + pos.start());  // 1 at 1
pos.find();  System.out.println(pos.group() + " at " + pos.start());  // 2 at 3
pos.find();  System.out.println(pos.group() + " at " + pos.start());  // 3 at 5
System.out.println(pos.find());  // false — no more matches

pos.reset();  // back to position 0
System.out.println(pos.find());  // true — finds "1" again
System.out.println(pos.group()); // 1

// ── lookingAt() — match at start, not necessarily to end: ────────────
System.out.println(digits.matcher("123abc").lookingAt());   // true  — starts with digits
System.out.println(digits.matcher("abc123").lookingAt());   // false — does not start with digits
System.out.println(digits.matcher("12345").lookingAt());    // true  — works like matches() here
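
The left-to-right parsing use of lookingAt() described above can be sketched as a tiny tokenizer. This is a minimal illustration, not a definitive design: the class name LookingAtTokenizer, the token labels, and the three token kinds are assumptions made for the example. region() moves the start of the matcher's region, and lookingAt() then insists that a token begins exactly there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookingAtTokenizer {
    // One alternative per token kind; exactly one named group participates per match.
    private static final Pattern TOKEN =
        Pattern.compile("(?<space>\\s+)|(?<num>\\d+)|(?<word>[A-Za-z]+)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());   // lookingAt() anchors at the region start
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at index " + pos);
            }
            if (m.group("num") != null) {
                tokens.add("NUM:" + m.group("num"));
            } else if (m.group("word") != null) {
                tokens.add("WORD:" + m.group("word"));
            }                                // whitespace is matched but not recorded
            pos = m.end();                   // continue right after this token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abc 123 x9"));
        // [WORD:abc, NUM:123, WORD:x, NUM:9]
    }
}
```

Using find() in the same loop would silently skip characters that match no token, such as the "+" in "a+b"; lookingAt() fails at that index instead, which lets the tokenizer report the error.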

Capturing Groups — Extracting Sub-Matches

Capturing groups are one of the most powerful features of regular expressions. A capturing group is a portion of a pattern enclosed in parentheses — the regex engine records what text matched that portion, and you can retrieve it separately from the overall match. Groups are numbered from 1 in the order their opening parentheses appear in the pattern. Group 0 is special — it always refers to the entire match. When a match is found (by matches(), find(), or lookingAt()), the group() method retrieves the text that matched each group. group() with no argument or group(0) returns the entire match. group(1) returns what the first group matched. group(2) returns what the second group matched. If a group was optional and did not participate in the match, group(n) returns null. The start(n) and end(n) methods return the start (inclusive) and end (exclusive) positions of each group within the input string, which is useful for operations that need the position of a specific part of a match, such as highlighting matches in a text editor or replacing only specific portions. Named capturing groups provide an alternative to numeric group references. A named group is declared with (?<name>pattern) and retrieved with group("name"). Named groups make complex patterns more readable because the group purpose is documented in the pattern itself, and the retrieval code reads like "get the year component" rather than "get group 3."
Java
// ── Numbered groups: ─────────────────────────────────────────────────
Pattern datePattern = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
//                                     group 1   group 2   group 3

Matcher m = datePattern.matcher("Event date: 2025-05-30, end: 2025-06-15");

while (m.find()) {
    System.out.println("Full match: " + m.group(0));  // or m.group()
    System.out.println("  Year:     " + m.group(1));
    System.out.println("  Month:    " + m.group(2));
    System.out.println("  Day:      " + m.group(3));
    System.out.println();
}
// Full match: 2025-05-30
//   Year:     2025
//   Month:    05
//   Day:      30
//
// Full match: 2025-06-15
//   Year:     2025
//   Month:    06
//   Day:      15

// ── Group positions with start() and end(): ───────────────────────────
Pattern p = Pattern.compile("(\\w+)@(\\w+)\\.(\\w+)");
Matcher em = p.matcher("Contact: user@example.com please");

if (em.find()) {
    System.out.println("Full:   [" + em.start()    + "," + em.end()    + ")");
    System.out.println("User:   [" + em.start(1)   + "," + em.end(1)   + ")");
    System.out.println("Domain: [" + em.start(2)   + "," + em.end(2)   + ")");
    System.out.println("TLD:    [" + em.start(3)   + "," + em.end(3)   + ")");
}
// Full:   [9,25)
// User:   [9,13)
// Domain: [14,21)
// TLD:    [22,25)

// ── Named capturing groups — more readable: ───────────────────────────
Pattern namedDate = Pattern.compile(
    "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

Matcher nm = namedDate.matcher("Submitted: 2025-05-30");
if (nm.find()) {
    System.out.println("Year:  " + nm.group("year"));   // 2025
    System.out.println("Month: " + nm.group("month"));  // 05
    System.out.println("Day:   " + nm.group("day"));    // 30
}

// ── Optional groups — may be null: ───────────────────────────────────
Pattern optional = Pattern.compile("(\\+1)?(\\d{10})");
Matcher om = optional.matcher("Call 8005551234 or +18005559876");

while (om.find()) {
    String countryCode = om.group(1);   // null if no country code
    String number      = om.group(2);
    System.out.println("Code: " + countryCode + "  Number: " + number);
}
// Code: null  Number: 8005551234
// Code: +1  Number: 8005559876
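
The behaviour of a group that does not participate in a match can be checked directly. The sketch below is illustrative only; the class name OptionalGroupDemo, the helper describe, and the "(none)" placeholder are assumptions. It shows that group(n) returns null and start(n) returns -1 for such a group, so code that uses the value should guard against both.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalGroupDemo {
    private static final Pattern PHONE = Pattern.compile("(\\+1)?(\\d{10})");

    // Hypothetical helper: summarises the first phone number found in the input.
    static String describe(String input) {
        Matcher m = PHONE.matcher(input);
        if (!m.find()) {
            return "no match";               // group() would throw IllegalStateException here
        }
        String code = m.group(1);            // null when group 1 did not participate
        int codeStart = m.start(1);          // -1 when group 1 did not participate
        String shown = (code != null) ? code : "(none)";
        return "code=" + shown + " start=" + codeStart + " number=" + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(describe("Call 8005551234"));
        // code=(none) start=-1 number=8005551234
        System.out.println(describe("Call +18005559876"));
        // code=+1 start=5 number=8005559876
        System.out.println(describe("no digits"));
        // no match
    }
}
```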

Replace Operations — replaceAll(), replaceFirst(), appendReplacement()

Matcher provides three mechanisms for replacing matched text, each serving a different level of complexity. replaceAll() and replaceFirst() are the simple forms — they replace all or only the first match with a fixed replacement string, returning the new string. They are concise and appropriate when the replacement is the same for every match. The replacement string supports backreferences to captured groups using $1, $2 syntax (where $1 refers to group 1) or ${name} for named groups. This allows rearranging or reformatting matched content — for example, converting a date from YYYY-MM-DD to DD/MM/YYYY by matching the year, month, and day as groups and referencing them in the replacement string in a different order. appendReplacement() and appendTail() form the most powerful replacement pattern, enabling computed replacements where the replacement string is determined by code logic at runtime for each individual match. appendReplacement() copies the text between the previous match end and the current match start into a StringBuffer, then appends the specified replacement. appendTail() copies the remaining text after the last match. Together, they support arbitrary transformation of each matched substring. The appendReplacement() pattern is the correct tool when different matches need different replacements — such as looking up each matched word in a dictionary and replacing it with its definition, or censoring profanity with asterisks where the number of asterisks matches the length of the censored word. Java 9 added a replaceAll(Function<MatchResult, String>) overload that provides the same capability more cleanly without the verbose StringBuffer accumulation.
Java
// ── replaceAll() — replace all matches with fixed string: ────────────
String text = "The cat sat on the mat with another cat";
String result = text.replaceAll("cat", "dog");
System.out.println(result);
// The dog sat on the mat with another dog

// ── replaceFirst() — replace only the first match: ────────────────────
System.out.println(text.replaceFirst("cat", "dog"));
// The dog sat on the mat with another cat

// ── Backreferences in replacement — reformat dates: ───────────────────
String dates = "Events: 2025-05-30 and 2025-06-15";
String reformatted = dates.replaceAll(
    "(\\d{4})-(\\d{2})-(\\d{2})",
    "$3/$2/$1"              // rearrange: day/month/year
);
System.out.println(reformatted);
// Events: 30/05/2025 and 15/06/2025

// ── Named group backreference: ────────────────────────────────────────
String namedResult = "2025-05-30".replaceAll(
    "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})",
    "${day}/${month}/${year}"
);
System.out.println(namedResult);   // 30/05/2025

// ── appendReplacement() + appendTail() — computed replacement: ────────
Pattern wordPattern = Pattern.compile("\\b\\w+\\b");
String sentence = "hello world java programming";
StringBuffer sb = new StringBuffer();
Matcher wm = wordPattern.matcher(sentence);

while (wm.find()) {
    String word = wm.group();
    // Replace each word with its uppercase version plus its length:
    String replacement = word.toUpperCase() + "(" + word.length() + ")";
    wm.appendReplacement(sb, replacement);
}
wm.appendTail(sb);   // append remaining text after last match
System.out.println(sb);
// HELLO(5) WORLD(5) JAVA(4) PROGRAMMING(11)

// ── Java 9+ functional replacement — cleaner: ─────────────────────────
String modern = wordPattern.matcher(sentence)
    .replaceAll(mr -> mr.group().toUpperCase() + "(" + mr.group().length() + ")");
System.out.println(modern);   // same result

// ── Censor sensitive words — replacement depends on match content: ────
Pattern badWords = Pattern.compile("foo|bar|baz", Pattern.CASE_INSENSITIVE);
String censored = badWords.matcher("This FOO and Bar and BAZ here")
    .replaceAll(mr -> "*".repeat(mr.group().length()));   // String.repeat requires Java 11+
System.out.println(censored);   // This *** and *** and *** here
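
The dictionary-lookup style of replacement mentioned above can be sketched with the appendReplacement()/appendTail() loop. This is a hedged illustration: the class name TemplateDemo, the {key} placeholder syntax, and the choice to leave unknown keys untouched are all assumptions for the example. It uses the StringBuilder overloads, which exist from Java 9; on Java 8 a StringBuffer would be needed instead.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateDemo {
    // Matches placeholders such as {user}; group 1 is the key.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    static String render(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String key = m.group(1);
            // Unknown keys keep their original placeholder text.
            String value = values.getOrDefault(key, m.group());
            // quoteReplacement keeps any '$' or '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);   // copy whatever follows the last placeholder
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of("user", "Ada", "total", "$50.00");
        System.out.println(render("Hi {user}, you owe {total} ({ref})", values));
        // Hi Ada, you owe $50.00 ({ref})
    }
}
```

Each match gets its own replacement, looked up at runtime, which a fixed replacement string cannot express.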

Practical Regex Patterns and Matcher Use Cases

Regular expressions combined with Matcher are used for a specific but important set of tasks: validating user input against a format, extracting structured data from text, transforming text by replacing patterns, parsing log files, and tokenising input. Understanding which Matcher method to use for each use case, and knowing a library of useful patterns, makes regex a practical everyday tool rather than an obscure syntax. Input validation is the most common use case. Phone numbers, email addresses, postal codes, IP addresses, dates, and other structured inputs have format rules that can be expressed as regular expressions. The pattern is compiled once as a static constant, and matches() is called per validation check. Data extraction from unstructured text is where find() with capturing groups excels. Log parsing is the canonical example: a log line like [2025-05-30 14:23:01] INFO OrderService - Order placed has a known structure that can be captured with a single pattern into timestamp, level, service, and message groups, making each field directly accessible without complex string splitting. Escaping special characters in replacement strings is a common pitfall. The $ character in a replacement string introduces a group reference, and the backslash is an escape character. If the replacement string is dynamically computed and might contain these characters, use Matcher.quoteReplacement() to escape them. This applies to the functional replaceAll(Function) form as well: the string returned by the function is still processed as a replacement string, so $ and backslash keep their special meaning there.
Java
// ── Common validation patterns as static constants: ──────────────────
public class Patterns {
    // Email (simplified):
    public static final Pattern EMAIL =
        Pattern.compile("^[a-zA-Z0-9._%+\\-]+@[a-zA-Z0-9.\\-]+\\.[a-zA-Z]{2,}$");

    // UK postcode:
    public static final Pattern UK_POSTCODE =
        Pattern.compile("^[A-Z]{1,2}\\d[A-Z\\d]? ?\\d[A-Z]{2}$",
            Pattern.CASE_INSENSITIVE);

    // IPv4 address:
    public static final Pattern IPV4 =
        Pattern.compile("^((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}" +
                        "(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)$");

    // UK phone (E.164):
    public static final Pattern UK_PHONE =
        Pattern.compile("^\\+44\\d{10}$");

    // ISO date (YYYY-MM-DD):
    public static final Pattern ISO_DATE =
        Pattern.compile("^\\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])$");
}

System.out.println(Patterns.EMAIL.matcher("user@example.com").matches());  // true
System.out.println(Patterns.UK_POSTCODE.matcher("SW1A 1AA").matches());    // true
System.out.println(Patterns.IPV4.matcher("192.168.1.1").matches());        // true
System.out.println(Patterns.IPV4.matcher("999.0.0.1").matches());          // false

// ── Log parsing — extract fields from structured log lines: ───────────
Pattern logPattern = Pattern.compile(
    "\\[(?<date>\\d{4}-\\d{2}-\\d{2}) (?<time>\\d{2}:\\d{2}:\\d{2})\\]" +
    "\\s+(?<level>\\w+)" +        // \s+ tolerates the column padding in the sample lines
    "\\s+(?<service>\\w+)" +
    "\\s+-\\s+(?<message>.+)");

String[] logLines = {
    "[2025-05-30 14:23:01] INFO  OrderService - Order placed orderId=1001",
    "[2025-05-30 14:23:02] ERROR PayService   - Payment failed orderId=1001",
    "[2025-05-30 14:23:03] INFO  UserService  - User logged in userId=42"
};

for (String line : logLines) {
    Matcher lm = logPattern.matcher(line);
    if (lm.matches()) {
        System.out.printf("[%s %s] %-5s %-12s → %s%n",
            lm.group("date"), lm.group("time"),
            lm.group("level"), lm.group("service"),
            lm.group("message"));
    }
}

// ── Extract all URLs from text: ───────────────────────────────────────
Pattern urlPattern = Pattern.compile("https?://[\\w./%-]+");
String html = "Visit https://example.com or https://docs.oracle.com/javase";
Matcher um = urlPattern.matcher(html);

List<String> urls = new ArrayList<>();   // needs java.util.List and java.util.ArrayList
while (um.find()) {
    urls.add(um.group());
}
System.out.println(urls);
// [https://example.com, https://docs.oracle.com/javase]

// ── Matcher.quoteReplacement() — safe dynamic replacements: ───────────
String input = "price is 100";
String dynamicReplacement = "$50.00";  // contains $ — dangerous in replacement!

// WRONG — $ interpreted as group reference, throws or produces garbage:
// input.replaceFirst("100", dynamicReplacement);

// CORRECT — escape special chars in replacement string:
String safe = input.replaceFirst("100",
    Matcher.quoteReplacement(dynamicReplacement));
System.out.println(safe);  // price is $50.00
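
The same escaping concern applies when the replacement comes from a lambda, because Matcher.replaceAll(Function) processes the returned string as a replacement string. The sketch below is an assumption-laden illustration: the class name FunctionalReplaceDemo, the price table, and the helper priceList are hypothetical. It wraps the computed text in Matcher.quoteReplacement() so the dollar signs are kept literally. Map.of and the functional replaceAll both need Java 9 or later.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FunctionalReplaceDemo {
    private static final Pattern ITEM = Pattern.compile("apple|pear");

    // Hypothetical price table; the values deliberately contain '$'.
    private static final Map<String, String> PRICES =
        Map.of("apple", "$1.20", "pear", "$0.80");

    static String priceList(String text) {
        return ITEM.matcher(text).replaceAll(mr ->
            // Without quoteReplacement, "$1" would be read as a reference to group 1.
            Matcher.quoteReplacement(mr.group() + "=" + PRICES.get(mr.group())));
    }

    public static void main(String[] args) {
        System.out.println(priceList("apple, pear"));
        // apple=$1.20, pear=$0.80
    }
}
```

Returning "$1.20" unquoted from the lambda would fail at runtime, since the pattern has no group 1 to refer to.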

Matcher State and Advanced Features

Matcher maintains position state that enables streaming iteration over matches but can also cause confusion if the state is not managed correctly. The current position advances automatically after each find() call. reset() resets the position without creating a new Matcher. reset(newInput) replaces the input string and resets the position, allowing a Matcher to be reused for a different input string while keeping the same Pattern, which avoids allocating a new Matcher through pattern.matcher(). The groupCount() method returns the number of capturing groups in the pattern (not counting group 0). This is useful when iterating over all groups programmatically: loop from 1 to groupCount() inclusive, or from 0 if the full match should be included. The find(int start) overload resets the matcher and starts the search at the specified index rather than at the current position, which lets you jump past text you have already handled. Pattern flags control matching behaviour and can be specified either inline in the pattern string, as in (?i), or as constants in the second argument of Pattern.compile(). CASE_INSENSITIVE enables case-insensitive matching. MULTILINE makes ^ and $ match at the start and end of each line rather than just the entire string. DOTALL makes . match any character including line terminators (by default, . does not match line terminators such as \n). COMMENTS allows whitespace and comments in the pattern for readability. Combining flags with | produces the combined behaviour. The toMatchResult() method captures a snapshot of the current match state as an immutable MatchResult, which is useful when you need to store match results for later processing without keeping the Matcher object around. The results() method (Java 9+) returns a Stream<MatchResult> of all matches, enabling functional-style processing of all matches in a single expression.
Java
// ── reset() and reset(input) — reuse a Matcher: ──────────────────────
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("abc 123 def");

m.find(); System.out.println(m.group());  // 123
m.reset();                                // back to position 0
m.find(); System.out.println(m.group());  // 123 again

m.reset("xyz 456 789");                   // new input, same pattern
while (m.find()) System.out.println(m.group());  // 456, 789

// ── groupCount() — number of groups in the pattern: ───────────────────
Pattern grp = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
Matcher gm  = grp.matcher("2025-05-30");
gm.matches();
System.out.println("Groups: " + gm.groupCount());  // 3

for (int i = 0; i <= gm.groupCount(); i++) {       // 0 = full match, 1-3 = groups
    System.out.println("Group " + i + ": " + gm.group(i));
}
// Group 0: 2025-05-30
// Group 1: 2025
// Group 2: 05
// Group 3: 30

// ── Pattern flags: ────────────────────────────────────────────────────
// CASE_INSENSITIVE:
Pattern ci = Pattern.compile("hello", Pattern.CASE_INSENSITIVE);
System.out.println(ci.matcher("Hello World").find());     // true
System.out.println(ci.matcher("HELLO").matches());        // true

// MULTILINE — ^ and $ match line starts/ends:
Pattern ml = Pattern.compile("^\\d+$", Pattern.MULTILINE);
Matcher mm = ml.matcher("abc\n123\ndef\n456\nghi");
while (mm.find()) System.out.println("Line number: " + mm.group());
// Line number: 123
// Line number: 456

// DOTALL — . matches newline too:
Pattern ds = Pattern.compile("start.*end", Pattern.DOTALL);
String multiLine = "start\nsome\ntext\nend";
System.out.println(ds.matcher(multiLine).find());  // true (. matches \n)

// Combining flags:
Pattern combo = Pattern.compile("hello.world",
    Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
System.out.println(combo.matcher("HELLO\nWORLD").matches());  // true

// Inline flags — equivalent to constructor flags:
Pattern inline = Pattern.compile("(?i)hello");  // (?i) = CASE_INSENSITIVE
System.out.println(inline.matcher("HELLO").matches());  // true

// ── Java 9+ results() stream: ────────────────────────────────────────
Pattern words = Pattern.compile("[A-Z][a-z]+");
String text = "The Quick Brown Fox Jumps";
words.matcher(text)
    .results()
    .map(mr -> mr.group() + "[" + mr.start() + "]")
    .forEach(System.out::println);
// The[0]
// Quick[4]
// Brown[10]
// Fox[16]
// Jumps[20]
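
Three features described above have no example yet: the COMMENTS flag, toMatchResult(), and find(int start). The following is a minimal sketch under stated assumptions; the class name MatcherStateDemo and the helper snapshots are hypothetical. It stores one snapshot per match so the results remain readable after the Matcher has moved on, then uses find(int) to begin a search partway through the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherStateDemo {
    // COMMENTS lets the pattern carry whitespace and # comments.
    private static final Pattern DATE = Pattern.compile(
        "(\\d{4})  # year\n" +
        "-(\\d{2}) # month\n" +
        "-(\\d{2}) # day",
        Pattern.COMMENTS);

    // Hypothetical helper: collects an immutable snapshot of every match.
    static List<MatchResult> snapshots(String input) {
        List<MatchResult> results = new ArrayList<>();
        Matcher m = DATE.matcher(input);
        while (m.find()) {
            results.add(m.toMatchResult());   // unaffected by later find() calls
        }
        return results;
    }

    public static void main(String[] args) {
        String input = "2025-05-30 then 2025-06-15";

        List<MatchResult> all = snapshots(input);
        System.out.println(all.get(0).group(2));   // 05
        System.out.println(all.get(1).group(3));   // 15

        Matcher m = DATE.matcher(input);
        // find(int) resets the matcher, then searches from the given index.
        if (m.find(5)) {
            System.out.println(m.group() + " at " + m.start());   // 2025-06-15 at 16
        }
    }
}
```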
