StringTokenizer

StringTokenizer is a legacy Java class that breaks a string into tokens — substrings separated by delimiter characters. It was the standard string-splitting mechanism before Java 1.4 introduced String.split() with regular expression support. StringTokenizer is still available and occasionally used for simple, high-performance tokenisation of strings with single-character delimiters. It implements Enumeration<Object> and processes tokens one at a time without loading all tokens into memory simultaneously.

What StringTokenizer Does and How It Works

StringTokenizer breaks a string into substrings (tokens) by scanning for delimiter characters. Unlike String.split(), which takes a regular expression, StringTokenizer works with a set of delimiter characters specified as a single string. Any character in the delimiter string is treated as a delimiter. The tokeniser scans the input string left to right, skips delimiter characters, and extracts the non-delimiter character sequences as tokens.

The fundamental behavioural difference from String.split() is how consecutive delimiters are handled. StringTokenizer skips consecutive delimiters and treats them as a single separator: a string like "a,,b" with comma as delimiter produces two tokens, "a" and "b". String.split(",") on the same input produces three tokens, "a", "" and "b", because the empty string between the two consecutive commas is included as an empty token. This means StringTokenizer is not suitable for parsing CSV data where empty fields (consecutive delimiters) must be preserved, while String.split() produces empty strings for those fields.

StringTokenizer implements the Enumeration interface and processes the input string lazily: it extracts tokens one at a time on demand rather than computing all tokens upfront. This makes it slightly more memory-efficient for very long strings when only the first few tokens are needed. The hasMoreTokens() and nextToken() methods form the standard iteration loop, analogous to Iterator's hasNext() and next().

For modern Java code, StringTokenizer is rarely the right choice. String.split() and the more powerful Pattern.split() handle nearly all real-world tokenisation needs with cleaner syntax. StringTokenizer remains relevant in competitive programming, where its simple tokenisation is fast, and in legacy codebases where it is already in use.

Java

// ── Basic StringTokenizer usage: ─────────────────────────────────────
String sentence = "The quick brown fox";

StringTokenizer st = new StringTokenizer(sentence);
// Default delimiter set is " \t\n\r\f" (space, tab, newline, carriage return, form feed)

System.out.println("Token count: " + st.countTokens()); // 4

while (st.hasMoreTokens()) {
    System.out.println(st.nextToken());
}
// The
// quick
// brown
// fox

// ── Custom delimiter: ─────────────────────────────────────────────────
String csv = "Alice,30,Engineer,London";
StringTokenizer csvSt = new StringTokenizer(csv, ",");

while (csvSt.hasMoreTokens()) {
    System.out.println(csvSt.nextToken());
}
// Alice
// 30
// Engineer
// London

// ── Multiple delimiter characters — any char in the string is a delimiter: ─
String mixed = "one:two;three:four;five";
StringTokenizer multiSt = new StringTokenizer(mixed, ":;");
// Both ':' and ';' are delimiters

while (multiSt.hasMoreTokens()) {
    System.out.print(multiSt.nextToken() + " ");
}
// one two three four five

// ── Consecutive delimiters — SKIPPED (no empty tokens): ───────────────
String withGaps = "one,,two,,three";
StringTokenizer gapSt = new StringTokenizer(withGaps, ",");
System.out.println("Tokens: " + gapSt.countTokens());  // 3 — not 5!

while (gapSt.hasMoreTokens()) {
    System.out.println("[" + gapSt.nextToken() + "]");
}
// [one]
// [two]
// [three]
// Empty strings between consecutive commas are SKIPPED.

// ── Compare with String.split() — different behaviour: ────────────────
String[] parts = "one,,two,,three".split(",");
System.out.println("Parts: " + parts.length);    // 5
for (String p : parts) System.out.println("[" + p + "]");
// [one]
// []      ← empty token for the empty field
// [two]
// []      ← empty token for the empty field
// [three]
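
The lazy, Enumeration-based behaviour described above can be sketched as follows. This is a minimal illustration, not a recommended pattern: the class name TokenizerEnumerationSketch and the helpers firstToken and viaEnumeration are made up for this example. It relies only on StringTokenizer and the standard Collections.list(Enumeration) utility.

```java
import java.util.Collections;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerEnumerationSketch {

    // Returns only the first token, or null if there is none.
    // The tokenizer stops at the first delimiter after that token,
    // so the remainder of the input is not scanned.
    static String firstToken(String input, String delims) {
        StringTokenizer st = new StringTokenizer(input, delims);
        return st.hasMoreTokens() ? st.nextToken() : null;
    }

    // Drains the tokenizer through its Enumeration<Object> view.
    // The element type is Object because that is what StringTokenizer declares.
    static List<Object> viaEnumeration(String input, String delims) {
        return Collections.list(new StringTokenizer(input, delims));
    }

    public static void main(String[] args) {
        System.out.println(firstToken("GET /index.html HTTP/1.1", " ")); // GET
        System.out.println(viaEnumeration("a,b,c", ","));                // [a, b, c]
    }
}
```

Because the Enumeration view is typed as Object, callers who need String values usually prefer the hasMoreTokens() and nextToken() loop shown earlier.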

StringTokenizer Constructor Options

StringTokenizer has three constructors that control its behaviour. The simplest takes only the string to tokenise and uses the default whitespace delimiter set. The second takes the string and a custom delimiter string. The third takes the string, a delimiter string, and a boolean flag, returnDelims, that controls whether delimiters are returned as tokens.

The returnDelims flag is the most useful advanced feature. When set to true, each delimiter character is returned as a single-character token alongside the content tokens. This lets the caller determine exactly where delimiters appeared in the original string, which is useful when the position and type of delimiter matter for parsing logic. Without this flag, the tokeniser reports only the content, not where it was or what separated it.

The constructor arguments, the delimiter string in particular, are a common source of confusion. The delimiter string is not a regular expression and not a single separator: it is a string in which every character is individually treated as a delimiter. Passing ".,;" means each of the three characters (period, comma, and semicolon) is a delimiter character.

The nextToken(newDelimiter) method allows changing the delimiter set mid-stream. This supports structured formats where different sections use different delimiters, for example a protocol message whose header uses one delimiter style and whose body uses another. Two details matter in practice: the new set stays in effect for later calls, and scanning resumes at the delimiter that ended the previous token, so the new set usually needs to include that character or it becomes part of the next token. String.split() has no equivalent.

Java

// ── Constructor 1: default whitespace delimiters: ────────────────────
StringTokenizer st1 = new StringTokenizer("  hello   world  ");
System.out.println(st1.countTokens());  // 2 — leading/trailing spaces skipped
while (st1.hasMoreTokens()) System.out.print(st1.nextToken() + "|");
// hello|world|

// ── Constructor 2: custom delimiter string: ───────────────────────────
StringTokenizer st2 = new StringTokenizer("a=b&c=d", "=&");
// Every char in "=&" is a delimiter: '=' and '&'
while (st2.hasMoreTokens()) System.out.print(st2.nextToken() + " ");
// a b c d

// ── Constructor 3: returnDelims=true — delimiters returned as tokens: ─
StringTokenizer st3 = new StringTokenizer("a+b*c", "+*", true);
while (st3.hasMoreTokens()) {
    String token = st3.nextToken();
    if (token.equals("+"))      System.out.print("[PLUS]");
    else if (token.equals("*")) System.out.print("[MULT]");
    else                         System.out.print(token);
}
// a[PLUS]b[MULT]c

// ── nextToken(delimiter) — change delimiter mid-stream: ───────────────
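StringTokenizer dynamic = new StringTokenizer("key=value&next");
String key   = dynamic.nextToken("=");    // "key": scanning stops at '=' without consuming it
String value = dynamic.nextToken("=&");   // "value": '=' stays in the set so it is skipped first;
                                          // nextToken("&") alone would return "=value"
String rest  = dynamic.nextToken();       // "next": the set "=&" is still in effect
System.out.printf("key=%s value=%s rest=%s%n", key, value, rest);
// key=key value=value rest=next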

// ── Collecting all tokens into a List: ────────────────────────────────
StringTokenizer st4 = new StringTokenizer("one two three four five");
List<String> tokens = new ArrayList<>();
while (st4.hasMoreTokens()) {
    tokens.add(st4.nextToken());
}
System.out.println(tokens);  // [one, two, three, four, five]
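
One use of returnDelims that the description above suggests is recovering empty fields, which the tokenizer otherwise skips. The sketch below is an illustration under stated assumptions: the class ReturnDelimsSketch and the helper splitKeepingEmpties are invented for this example, it handles a single delimiter character only, and it does no CSV quoting or escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class ReturnDelimsSketch {

    // Splits on one delimiter character while keeping empty fields.
    static List<String> splitKeepingEmpties(String line, char delim) {
        String d = String.valueOf(delim);
        StringTokenizer st = new StringTokenizer(line, d, true); // returnDelims = true
        List<String> fields = new ArrayList<>();
        boolean lastWasDelim = true; // the start of the line behaves like "just after a delimiter"
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (token.equals(d)) {
                if (lastWasDelim) {
                    fields.add(""); // two delimiters in a row (or a leading one): empty field
                }
                lastWasDelim = true;
            } else {
                fields.add(token);
                lastWasDelim = false;
            }
        }
        if (lastWasDelim) {
            fields.add(""); // trailing delimiter or empty line: final empty field
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingEmpties("one,,two", ',')); // [one, , two]
        System.out.println(splitKeepingEmpties("a,b,", ','));     // [a, b, ]
    }
}
```

For these inputs the result matches what split(",", -1) returns, which is usually the simpler way to get the same behaviour.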

StringTokenizer vs String.split() vs Scanner

Modern Java offers three primary string-splitting mechanisms, each with different capabilities and appropriate use cases.

StringTokenizer is the oldest and is fast for simple single-character delimiter tokenisation, but it lacks support for regular expressions, cannot return empty tokens, and uses the legacy Enumeration API. String.split() uses regular expressions, returns an array (all tokens computed at once), and preserves interior empty tokens. Scanner with useDelimiter() uses regular expressions, supports streaming access, handles type conversion (nextInt(), nextDouble()), and can read from files and streams in addition to strings.

The practical choice is almost always String.split() for splitting a string into an array of parts, Scanner for parsing typed values from a string, and Stream-based approaches for functional transformations. StringTokenizer's place in new code is narrow: it can be faster for very simple tokenisation of very large numbers of strings where the overhead of regex handling and array allocation in split() is measurable, and it is used in competitive programming as a fast input-reading tool. Measure before relying on this, since OpenJDK's split() already skips the regex engine when the delimiter is a single literal character.

String.split() has one known pitfall: trailing empty strings are silently dropped by default. For example, "a,b,c,".split(",") produces ["a", "b", "c"], not ["a", "b", "c", ""]. This trailing-empty-string elimination can be disabled by passing -1 as the limit parameter: "a,b,c,".split(",", -1) produces ["a", "b", "c", ""].
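
The fast input-reading use mentioned above typically pairs StringTokenizer with BufferedReader. The following is a minimal sketch of that idiom, assuming whitespace-separated integers. The class FastInputSketch and the method sumInts are hypothetical names, and a StringReader stands in for standard input so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.StringTokenizer;

class FastInputSketch {

    // Sums every whitespace-separated integer available from the reader.
    static long sumInts(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            long sum = 0;
            String line;
            while ((line = in.readLine()) != null) {
                // Default delimiters: space, tab, newline, carriage return, form feed
                StringTokenizer st = new StringTokenizer(line);
                while (st.hasMoreTokens()) {
                    sum += Integer.parseInt(st.nextToken());
                }
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In a contest setting the reader would wrap System.in instead.
        System.out.println(sumInts(new StringReader("1 2 3\n40 50\n"))); // 96
    }
}
```

Blank lines and runs of spaces need no special handling here, because the tokenizer skips consecutive delimiters.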

Java

// ── Three ways to split a string — comparison: ────────────────────────

String input = "Alice,30,Engineer,London";

// 1. StringTokenizer — fast, no empty tokens, legacy API:
StringTokenizer st = new StringTokenizer(input, ",");
List<String> tokenizerResult = new ArrayList<>();
while (st.hasMoreTokens()) tokenizerResult.add(st.nextToken());
System.out.println("Tokenizer: " + tokenizerResult);
// [Alice, 30, Engineer, London]

// 2. String.split() — regex, returns array, modern:
String[] splitResult = input.split(",");
System.out.println("split():   " + Arrays.toString(splitResult));
// [Alice, 30, Engineer, London]

// 3. Scanner — typed parsing, streaming:
Scanner sc = new Scanner(input).useDelimiter(",");
List<String> scannerResult = new ArrayList<>();
while (sc.hasNext()) scannerResult.add(sc.next());
System.out.println("Scanner:   " + scannerResult);
// [Alice, 30, Engineer, London]

// ── Empty token behaviour — critical difference: ──────────────────────
String withEmpties = "one,,three,,five";

// StringTokenizer — skips empty tokens:
StringTokenizer st2 = new StringTokenizer(withEmpties, ",");
System.out.println("Tokenizer count: " + st2.countTokens());  // 3

// String.split() — preserves empty tokens by default:
String[] parts = withEmpties.split(",");
System.out.println("split() count:   " + parts.length);       // 5
System.out.println(Arrays.toString(parts));  // [one, , three, , five]

// ── String.split() with limit -1 keeps trailing empty strings: ────────
String trailing = "a,b,c,";
System.out.println(Arrays.toString(trailing.split(",")));       // [a, b, c]
System.out.println(Arrays.toString(trailing.split(",", -1)));   // [a, b, c, ]

// ── When to use each: ─────────────────────────────────────────────────
//
// StringTokenizer:
//   ✓ Simple delimiter chars (no regex)
//   ✓ Fastest for simple splits
//   ✓ Memory-efficient (lazy, no array allocation)
//   ✗ No empty tokens (wrong for CSV with empty fields)
//   ✗ No regex support
//   ✗ Legacy Enumeration API
//   → Use in competitive programming / legacy code only
//
// String.split(regex):
//   ✓ Regex support — powerful pattern matching
//   ✓ Returns array — easy iteration
//   ✓ Preserves empty tokens (trailing ones only with limit -1)
//   ✗ Can be slower (regex overhead for non-trivial patterns)
//   ✗ Returns all tokens at once
//   → Standard choice for most use cases
//
// Scanner.useDelimiter(regex):
//   ✓ Regex support
//   ✓ Lazy streaming — one token at a time
//   ✓ Type-parsing (nextInt, nextDouble, etc.)
//   ✓ Works with files, streams, strings
//   → Use when parsing typed data from any character source
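
The comparison above lists Scanner's type-parsing methods without showing them. The sketch below illustrates them under stated assumptions: the class ScannerTypedSketch, the method describe, and the name,age,salary record layout are all invented for this example. The locale is fixed to Locale.US so that nextDouble() expects '.' as the decimal separator regardless of the machine's default locale.

```java
import java.util.Locale;
import java.util.Scanner;

class ScannerTypedSketch {

    // Hypothetical record layout for illustration: name,age,salary
    static String describe(String record) {
        try (Scanner sc = new Scanner(record).useDelimiter(",")) {
            sc.useLocale(Locale.US);         // '.' as the decimal separator
            String name = sc.next();         // raw token
            int age = sc.nextInt();          // token parsed as int
            double salary = sc.nextDouble(); // token parsed as double
            return name + " is " + age + " and earns " + salary;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("Alice,30,5250.5"));
        // Alice is 30 and earns 5250.5
    }
}
```

With StringTokenizer or split(), the same record would need explicit Integer.parseInt and Double.parseDouble calls on each token.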
