StringTokenizer

StringTokenizer is a legacy Java class that breaks a string into tokens: substrings separated by delimiter characters. It was the standard string-splitting mechanism before Java 1.4 introduced String.split() with regular expression support. StringTokenizer is still available and occasionally used for simple, low-overhead tokenisation where the delimiters are a fixed set of characters, although its Javadoc discourages its use in new code. It implements Enumeration<Object> and produces tokens one at a time rather than building an array of all tokens up front.

What StringTokenizer Does and How It Works

StringTokenizer breaks a string into substrings (tokens) by scanning for delimiter characters. Unlike String.split(), which takes a regular expression, StringTokenizer works with a set of delimiter characters specified as a single string, and any character in that string is treated as a delimiter. The tokeniser scans the input left to right, skips delimiter characters, and extracts the runs of non-delimiter characters as tokens.

The fundamental behavioural difference from String.split() is how consecutive delimiters are handled. StringTokenizer treats a run of consecutive delimiters as a single separator, so "a,,b" with comma as the delimiter produces two tokens, "a" and "b". String.split(",") on the same input produces three: "a", "", and "b", with the empty string standing for the field between the two commas. StringTokenizer is therefore unsuitable for parsing CSV-style data where empty fields must be preserved, while String.split() returns an empty string for each interior empty field.

StringTokenizer implements the Enumeration interface and processes the input lazily: it extracts tokens one at a time on demand rather than computing all of them up front. This avoids allocating an array of every token, which can matter for very long strings when only the first few tokens are needed. The hasMoreTokens() and nextToken() methods form the standard iteration loop, analogous to Iterator's hasNext() and next().

For modern Java code, StringTokenizer is rarely the right choice. String.split() and Pattern.split() handle nearly all real-world tokenisation needs with cleaner syntax. StringTokenizer remains relevant in competitive programming, where its low per-token overhead is valued, and in legacy codebases where it is already in use.
Java
import java.util.StringTokenizer;

// ── Basic StringTokenizer usage: ─────────────────────────────────────
String sentence = "The quick brown fox";

StringTokenizer st = new StringTokenizer(sentence);
// Default delimiter set is " \t\n\r\f" (space, tab, newline, carriage return, form feed)

System.out.println("Token count: " + st.countTokens()); // 4

while (st.hasMoreTokens()) {
    System.out.println(st.nextToken());
}
// The
// quick
// brown
// fox

// ── Custom delimiter: ─────────────────────────────────────────────────
String csv = "Alice,30,Engineer,London";
StringTokenizer csvSt = new StringTokenizer(csv, ",");

while (csvSt.hasMoreTokens()) {
    System.out.println(csvSt.nextToken());
}
// Alice
// 30
// Engineer
// London

// ── Multiple delimiter characters — any char in the string is a delimiter: ─
String mixed = "one:two;three:four;five";
StringTokenizer multiSt = new StringTokenizer(mixed, ":;");
// Both ':' and ';' are delimiters

while (multiSt.hasMoreTokens()) {
    System.out.print(multiSt.nextToken() + " ");
}
// one two three four five

// ── Consecutive delimiters — SKIPPED (no empty tokens): ───────────────
String withGaps = "one,,two,,three";
StringTokenizer gapSt = new StringTokenizer(withGaps, ",");
System.out.println("Tokens: " + gapSt.countTokens());  // 3 — not 5!

while (gapSt.hasMoreTokens()) {
    System.out.println("[" + gapSt.nextToken() + "]");
}
// [one]
// [two]
// [three]
// Empty strings between consecutive commas are SKIPPED.

// ── Compare with String.split() — different behaviour: ────────────────
String[] parts = "one,,two,,three".split(",");
System.out.println("Parts: " + parts.length);    // 5
for (String p : parts) System.out.println("[" + p + "]");
// [one]
// []      ← empty token for the empty field
// [two]
// []      ← empty token for the empty field
// [three]
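
The lazy, Enumeration-based behaviour described above can be sketched as follows. This is a minimal illustration, not a recommended utility: the class name LazyTokens and the helpers firstTokens and viaEnumeration are hypothetical names introduced for this example. The first helper stops pulling tokens once it has enough, so the rest of the input is never scanned; the second relies only on the fact that StringTokenizer is an Enumeration<Object>.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.StringTokenizer;

class LazyTokens {

    // Hypothetical helper: returns at most `limit` tokens and stops early,
    // leaving the remainder of the input unscanned.
    static List<String> firstTokens(String input, String delimiters, int limit) {
        StringTokenizer st = new StringTokenizer(input, delimiters);
        List<String> result = new ArrayList<>();
        while (result.size() < limit && st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    // Hypothetical helper: drains a tokenizer through its Enumeration<Object> view.
    // The element type is Object because that is what StringTokenizer declares.
    static List<Object> viaEnumeration(String input) {
        return Collections.list(new StringTokenizer(input));
    }

    public static void main(String[] args) {
        System.out.println(firstTokens("GET /index.html HTTP/1.1 ignored tail", " ", 2));
        // [GET, /index.html]
        System.out.println(viaEnumeration("a b c"));
        // [a, b, c]
    }
}
```

Because nextElement() returns Object, code that needs String values is usually better served by the hasMoreTokens() / nextToken() pair.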

StringTokenizer Constructor Options

StringTokenizer has three constructors that control its behaviour. The simplest takes only the string to tokenise and uses the default whitespace delimiter set. The second takes the string and a custom delimiter string. The third takes the string, a delimiter string, and a boolean flag that controls whether delimiters are returned as tokens.

The returnDelims flag is the most useful advanced feature. When it is true, each delimiter character is returned as a one-character token alongside the content tokens. This lets the caller determine exactly where delimiters appeared in the original string, which is useful when the position and type of delimiter matter for parsing logic. Without the flag, the tokeniser reports only the content, not where it was or what separated it.

The delimiter argument is a common source of confusion. It is neither a regular expression nor a single multi-character separator: every character in it is individually treated as a delimiter. Passing ".,;" means each of the three characters (period, comma, and semicolon) is a delimiter.

The nextToken(String delim) overload changes the delimiter set mid-stream. This supports structured formats where different sections use different delimiters, for example a protocol message whose header uses one delimiter style and whose body uses another. Two details matter: the new set remains in effect for subsequent calls, and the delimiter that ended the previous token has not been consumed, so it must be part of the new set if it should be skipped. String.split() has no equivalent.
Java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// ── Constructor 1: default whitespace delimiters: ────────────────────
StringTokenizer st1 = new StringTokenizer("  hello   world  ");
System.out.println(st1.countTokens());  // 2 — leading/trailing spaces skipped
while (st1.hasMoreTokens()) System.out.print(st1.nextToken() + "|");
// hello|world|

// ── Constructor 2: custom delimiter string: ───────────────────────────
StringTokenizer st2 = new StringTokenizer("a=b&c=d", "=&");
// Every char in "=&" is a delimiter: '=' and '&'
while (st2.hasMoreTokens()) System.out.print(st2.nextToken() + " ");
// a b c d

// ── Constructor 3: returnDelims=true — delimiters returned as tokens: ─
StringTokenizer st3 = new StringTokenizer("a+b*c", "+*", true);
while (st3.hasMoreTokens()) {
    String token = st3.nextToken();
    if (token.equals("+"))      System.out.print("[PLUS]");
    else if (token.equals("*")) System.out.print("[MULT]");
    else                         System.out.print(token);
}
// a[PLUS]b[MULT]c

// ── nextToken(delimiter) — change delimiter mid-stream: ───────────────
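StringTokenizer dynamic = new StringTokenizer("key=value&next");
String key   = dynamic.nextToken("=");    // "key": scanning stops at '=' without consuming it
// The '=' is still pending, so the next delimiter set must include it;
// nextToken("&") alone would return "=value".
String value = dynamic.nextToken("=&");   // "value": skips '=', stops at '&'
String rest  = dynamic.nextToken();       // "next": "=&" stays in effect as the delimiter set
System.out.printf("key=%s value=%s rest=%s%n", key, value, rest);
// key=key value=value rest=next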

// ── Collecting all tokens into a List: ────────────────────────────────
StringTokenizer st4 = new StringTokenizer("one two three four five");
List<String> tokens = new ArrayList<>();
while (st4.hasMoreTokens()) {
    tokens.add(st4.nextToken());
}
System.out.println(tokens);  // [one, two, three, four, five]
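
Because returnDelims exposes where each delimiter sat, it can be used to recover the empty fields that StringTokenizer normally skips. The sketch below is one possible approach, not an established API: the class EmptyFieldTokens and the method splitKeepingEmpty are hypothetical names, and the code assumes every delimiter is a single char (no supplementary code points). It treats two field boundaries in a row as an empty field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class EmptyFieldTokens {

    // Hypothetical helper: splits on any character in `delimiters`
    // and keeps leading, interior, and trailing empty fields.
    static List<String> splitKeepingEmpty(String input, String delimiters) {
        StringTokenizer st = new StringTokenizer(input, delimiters, true);
        List<String> fields = new ArrayList<>();
        boolean previousWasDelimiter = true; // start of input counts as a boundary

        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            // With returnDelims=true a delimiter arrives as a one-character token;
            // content tokens never contain delimiter characters.
            boolean isDelimiter =
                    token.length() == 1 && delimiters.indexOf(token.charAt(0)) >= 0;
            if (isDelimiter) {
                if (previousWasDelimiter) {
                    fields.add(""); // two boundaries in a row: empty field
                }
                previousWasDelimiter = true;
            } else {
                fields.add(token);
                previousWasDelimiter = false;
            }
        }
        if (previousWasDelimiter) {
            fields.add(""); // input was empty or ended on a delimiter
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingEmpty("one,,two,,three", ","));
        // [one, , two, , three]
        System.out.println(splitKeepingEmpty("a;b,,c,", ",;"));
        // [a, b, , c, ]
    }
}
```

For a single literal delimiter the result should match input.split(",", -1). In practice split() is the simpler choice, and this sketch mainly illustrates what the flag makes possible.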

StringTokenizer vs String.split() vs Scanner

Modern Java offers three primary string-splitting mechanisms, each with different capabilities and appropriate use cases. StringTokenizer is the oldest and has very low overhead for tokenisation on a fixed set of delimiter characters, but it lacks support for regular expressions, cannot return empty tokens, and uses the legacy Enumeration API. String.split() uses regular expressions, returns an array (all tokens computed at once), and preserves leading and interior empty tokens. Scanner with useDelimiter() uses regular expressions, supports streaming access, handles type conversion (nextInt(), nextDouble()), and can read from files and streams in addition to strings.

The practical choice is almost always String.split() for splitting a string into an array of parts, Scanner for parsing typed values from a string, and Stream-based approaches for functional transformations. StringTokenizer's place in new code is narrow: it can be faster when tokenising very large numbers of strings if the regex handling and array allocation in split() show up in measurements, and it is used in competitive programming as a fast input-reading tool. OpenJDK's split() already bypasses the regex engine for most single-character literal delimiters, so any gain should be measured rather than assumed.

String.split() has one known pitfall: trailing empty strings are silently dropped by default. For example, "a,b,c,".split(",") produces ["a", "b", "c"], not ["a", "b", "c", ""]. Passing a negative limit disables this: "a,b,c,".split(",", -1) produces ["a", "b", "c", ""].
Java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.StringTokenizer;

// ── Three ways to split a string — comparison: ────────────────────────

String input = "Alice,30,Engineer,London";

// 1. StringTokenizer — fast, no empty tokens, legacy API:
StringTokenizer st = new StringTokenizer(input, ",");
List<String> tokenizerResult = new ArrayList<>();
while (st.hasMoreTokens()) tokenizerResult.add(st.nextToken());
System.out.println("Tokenizer: " + tokenizerResult);
// [Alice, 30, Engineer, London]

// 2. String.split() — regex, returns array, modern:
String[] splitResult = input.split(",");
System.out.println("split():   " + Arrays.toString(splitResult));
// [Alice, 30, Engineer, London]

// 3. Scanner — typed parsing, streaming:
Scanner sc = new Scanner(input).useDelimiter(",");
List<String> scannerResult = new ArrayList<>();
while (sc.hasNext()) scannerResult.add(sc.next());
System.out.println("Scanner:   " + scannerResult);
// [Alice, 30, Engineer, London]

// ── Empty token behaviour — critical difference: ──────────────────────
String withEmpties = "one,,three,,five";

// StringTokenizer — skips empty tokens:
StringTokenizer st2 = new StringTokenizer(withEmpties, ",");
System.out.println("Tokenizer count: " + st2.countTokens());  // 3

// String.split() — preserves empty tokens by default:
String[] parts = withEmpties.split(",");
System.out.println("split() count:   " + parts.length);       // 5
System.out.println(Arrays.toString(parts));  // [one, , three, , five]

// ── String.split() with limit = -1: keeps trailing empty strings ─────
String trailing = "a,b,c,";
System.out.println(Arrays.toString(trailing.split(",")));       // [a, b, c]
System.out.println(Arrays.toString(trailing.split(",", -1)));   // [a, b, c, ]

// ── When to use each: ─────────────────────────────────────────────────
//
// StringTokenizer:
//   ✓ Simple delimiter chars (no regex)
//   ✓ Fastest for simple splits
//   ✓ Memory-efficient (lazy, no array allocation)
//   ✗ No empty tokens (wrong for CSV with empty fields)
//   ✗ No regex support
//   ✗ Legacy Enumeration API
//   → Use in competitive programming / legacy code only
//
// String.split(regex):
//   ✓ Regex support — powerful pattern matching
//   ✓ Returns array — easy iteration
//   ✓ Preserves empty tokens (trailing ones only with a negative limit)
//   ✗ Can be slower (regex overhead for non-trivial patterns)
//   ✗ Returns all tokens at once
//   → Standard choice for most use cases
//
// Scanner.useDelimiter(regex):
//   ✓ Regex support
//   ✓ Lazy streaming — one token at a time
//   ✓ Type-parsing (nextInt, nextDouble, etc.)
//   ✓ Works with files, streams, strings
//   → Use when parsing typed data from any character source
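
The two alternatives that the comparison recommends beyond plain split(), typed parsing with Scanner and Stream-based splitting, can be sketched like this. The class TypedAndStreamSplit, both method names, and the name,age,role,city record layout are assumptions made for illustration. Pattern.splitAsStream is used here as one Stream-based option; like split() with its default limit, it discards trailing empty strings.

```java
import java.util.List;
import java.util.Locale;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TypedAndStreamSplit {

    // Compiled once and reused, so repeated calls do not re-parse the pattern.
    private static final Pattern COMMA = Pattern.compile(",");

    // Hypothetical record layout: name,age,role,city
    static String describe(String record) {
        // Locale.ROOT keeps number parsing independent of the default locale.
        try (Scanner sc = new Scanner(record).useDelimiter(",").useLocale(Locale.ROOT)) {
            String name = sc.next();
            int age = sc.nextInt();   // typed read: throws InputMismatchException if not an int
            String role = sc.next();
            String city = sc.next();
            return name + " (" + age + "), " + role + " in " + city;
        }
    }

    // Splits lazily into a Stream, then trims and upper-cases each field.
    static List<String> upperCaseFields(String record) {
        return COMMA.splitAsStream(record)
                .map(String::trim)
                .map(s -> s.toUpperCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(describe("Alice,30,Engineer,London"));
        // Alice (30), Engineer in London
        System.out.println(upperCaseFields("a, b,,c"));
        // [A, B, , C]
    }
}
```

Scanner is convenient when fields have different types. The precompiled Pattern is more about avoiding repeated pattern construction in hot loops than about changing the results.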
