String Class

String is one of the most fundamental classes in Java — used in virtually every program, yet deeply misunderstood by many developers. A String represents an immutable sequence of Unicode characters. It is not a primitive type but a full class in java.lang, automatically imported into every Java file. Understanding String means understanding how it is stored in memory, why it is immutable, how the string pool works, what the difference between == and equals() means for strings, and how to use the class efficiently. This entry covers String's nature as a class, its internal representation, the critical distinction between reference equality and value equality, String's place in the type hierarchy, and the design decisions that make String behave the way it does.

String as a Class — Not a Primitive

String occupies a unique position in Java's type system. It is not a primitive like int or boolean, but it behaves more like a primitive than any other class. Java gives String special syntax that no other class enjoys: string literals written in double quotes and the + concatenation operator. This special treatment hides the fact that every string literal denotes an object, every concatenation potentially creates objects, and string variables hold references, not values.

Unlike a variable of primitive type, a String variable holds a reference to a heap object. Assigning one String variable to another copies the reference, not the string content. Two String variables can hold the same reference (pointing to the same object) or different references (pointing to different objects that happen to contain the same characters). These reference semantics are why == on strings compares object identity, not character content, the same trap that applies to all Java objects.

String is declared final, preventing subclassing. This finality is essential to the string pool (all pool entries must be the exact String class, not an unknown subclass) and to the immutability guarantee (a subclass could override methods to introduce mutable behaviour).

String implements three core interfaces: Serializable (strings can be written to streams), Comparable<String> (strings have a natural lexicographic ordering), and CharSequence (strings participate in the common character sequence abstraction shared with StringBuilder and CharBuffer). Since Java 12 it also implements Constable and ConstantDesc, which support the constant-description API in java.lang.constant.
Java
// ── String is a class, not a primitive ───────────────────────────────
String s = "Hello";   // s holds a REFERENCE to a String object on the heap

// ── Reference semantics ───────────────────────────────────────────────
String a = "Hello";
String b = a;         // b and a hold the SAME reference

// ── == compares references, not content ──────────────────────────────
String x = new String("Hello");
String y = new String("Hello");

System.out.println(x == y);        // false — different objects
System.out.println(x.equals(y));   // true  — same content

// ── String is final — cannot subclass ────────────────────────────────
// public class MyString extends String { }  // COMPILE ERROR

// ── String implements CharSequence ───────────────────────────────────
CharSequence cs = "Hello";          // String is-a CharSequence
CharSequence sb = new StringBuilder("Hello");  // StringBuilder is-a CharSequence

// Methods accepting CharSequence work with both
// (declare the method at class level, outside main):
public static int charCount(CharSequence seq) {
    return seq.length();
}
System.out.println(charCount("Hello"));                    // 5
System.out.println(charCount(new StringBuilder("Hello"))); // 5

// ── String implements Comparable<String> ─────────────────────────────
String[] words = {"banana", "apple", "cherry"};
Arrays.sort(words);   // uses compareTo(): natural lexicographic order (needs import java.util.Arrays)
System.out.println(Arrays.toString(words)); // [apple, banana, cherry]

// ── Type hierarchy ────────────────────────────────────────────────────
String str = "Hello";
System.out.println(str instanceof String);       // true
System.out.println(str instanceof Object);       // true: every class extends Object
System.out.println(str instanceof CharSequence); // true
System.out.println(str instanceof Comparable);   // true
System.out.println(str instanceof Serializable); // true (needs import java.io.Serializable)
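
The claim that assignment copies the reference rather than the characters can be checked directly. The following is a minimal sketch, not part of the original entry; the class name ReferenceSemanticsDemo and the helper sameObject are hypothetical names chosen for illustration.

```java
class ReferenceSemanticsDemo {

    // Hypothetical helper: true only when both variables refer to the very same object.
    static boolean sameObject(String first, String second) {
        return first == second;
    }

    public static void main(String[] args) {
        String a = "Hello";
        String b = a;                            // copies the reference, not the characters
        System.out.println(sameObject(a, b));    // true

        b = b.concat("!");                       // concat returns a NEW String; b is rebound to it
        System.out.println(a);                   // Hello
        System.out.println(b);                   // Hello!
        System.out.println(sameObject(a, b));    // false
    }
}
```

Rebinding b leaves a untouched because no method on String changes the object it is called on; only the variable b is pointed at a different object.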

Internal Representation and Memory Layout

Internally, String stores its characters in an array: a char[] up to Java 8, and a byte[] from Java 9 onwards (compact strings). The String object contains a reference to this array, a hash field that caches the hash code after first computation, and, from Java 9, a coder byte indicating whether the string uses Latin-1 (1 byte per character) or UTF-16 (2 bytes per character) encoding. The compact strings optimisation stores a string whose characters all fall in the Latin-1 range (code points 0 to 255) at one byte per character, roughly halving character storage for ASCII-heavy applications. A string containing even one character outside that range is stored entirely as UTF-16.

The length of a String is the number of char values (UTF-16 code units), not the number of Unicode code points. For characters in the Basic Multilingual Plane (code points 0x0000 to 0xFFFF), one char equals one code point. For supplementary characters (code points above 0xFFFF, such as emoji and many CJK extension characters), Java uses surrogate pairs: two chars to represent one code point. This means length() can return a value larger than the number of visible characters for strings containing emoji or other supplementary characters.

The hash code is computed lazily and cached after first computation. Calling hashCode() the first time traverses all characters to compute the hash; subsequent calls return the cached value in O(1). This caching is safe because String is immutable, so the hash can never become stale.
Java
// ── String length vs code point count ───────────────────────────────
String ascii   = "Hello";
String emoji   = "Hello 👋";          // waving hand = U+1F44B, a supplementary char
String chinese = "你好";               // BMP characters, 1 char each

System.out.println(ascii.length());           // 5
System.out.println(emoji.length());           // 8 (5 + space + 2 surrogates)
System.out.println(emoji.codePointCount(0, emoji.length())); // 7 (5 + space + 1 emoji)
System.out.println(chinese.length());         // 2

// ── Iterating code points correctly ──────────────────────────────────
String text = "Hi 👋";
// WRONG for supplementary chars:
for (int i = 0; i < text.length(); i++) {
    char c = text.charAt(i);   // may get half a surrogate pair
}

// CORRECT — iterate by code point:
text.codePoints().forEach(cp ->
    System.out.printf("U+%04X %s%n",
        cp, Character.getName(cp)));

// ── Hash code caching ─────────────────────────────────────────────────
String s = "hello world";
System.out.println(s.hashCode());  // computes and caches
System.out.println(s.hashCode());  // returns cached value — O(1)

// ── Memory comparison: String vs char[] ──────────────────────────────
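// String object overhead (rough figures; exact sizes depend on the JVM,
// compressed references and alignment; the array is shown as UTF-16, 2 bytes per char):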
// - Object header:      16 bytes
// - char[]/byte[] ref:   8 bytes
// - hash field:          4 bytes
// - coder field:         1 byte (Java 9+)
// - char[] object:      16 bytes header + 2 bytes per char
//
// "Hello" (5 chars): ~16 + 8 + 4 + 1 + 16 + 10 = ~55 bytes
// vs char[] alone:   ~16 + 10 = 26 bytes
// The String wrapper adds ~30 bytes overhead per string object

// ── Compact strings (Java 9+) ─────────────────────────────────────────
// Latin-1 strings (all chars <= U+00FF) use 1 byte per char:
// "Hello" -> byte[5] = {72, 101, 108, 108, 111}
// "Héllo" -> byte[5] (é = U+00E9 is still within Latin-1)
//
// Strings with any char above U+00FF use 2 bytes per char (UTF-16):
// "H€llo" -> byte[10] (because € = U+20AC > U+00FF)
//
// This roughly halves character storage for ASCII-heavy applications
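
To make the first-call traversal concrete, the sketch below recomputes the hash by hand using the formula given in the String.hashCode() documentation, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], evaluated over the UTF-16 code units. The class name HashSketch and the method manualHash are hypothetical; this illustrates the documented formula and is not the JDK's actual implementation.

```java
class HashSketch {

    // Hypothetical helper: evaluates the documented polynomial with Horner's rule.
    // int arithmetic wraps on overflow, which matches the documented behaviour.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);   // one pass over every UTF-16 code unit
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "hello world";
        System.out.println(manualHash(s) == s.hashCode());   // true
        System.out.println(manualHash("a"));                 // 97
    }
}
```

The loop is the O(n) work that String performs once; afterwards the cached field makes hashCode() a constant-time read.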

equals() vs == — The Critical Distinction

The single most common Java beginner mistake is comparing strings with ==. The == operator on reference types tests reference equality: do both variables point to the same object in memory? The equals() method tests value equality: do both strings contain the same sequence of characters? For most string comparisons in application code, value equality is what you want, which means equals() is almost always the right choice.

The reason == sometimes appears to work for strings is the string pool. String literals with the same content are interned, so they refer to the same object in the pool, and "hello" == "hello" is true because both literals resolve to the same pool entry. This holds only for literals and other compile-time constants, not for strings in general. Strings created with new String(), strings read from files, strings returned by methods like substring() or trim(), and strings built with StringBuilder are not guaranteed to be in the pool. Relying on == for correctness is a latent bug.

The correct discipline is: use equals() for content comparison, and == only to check identity (same object). For null-safe comparison, use Objects.equals(a, b), which handles null on either side. For case-insensitive comparison, use equalsIgnoreCase(). Never use == to compare string content in production code.
Java
// ── == vs equals() — the fundamental difference ──────────────────────
String a = "Hello";
String b = "Hello";
String c = new String("Hello");

System.out.println(a == b);          // true  — same pool object
System.out.println(a == c);          // false — c is a new heap object
System.out.println(a.equals(b));     // true  — same content
System.out.println(a.equals(c));     // true  — same content

// ── Why == sometimes "works" and sometimes fails ──────────────────────
String s1 = "Hello";
String s2 = "Hel" + "lo";           // compile-time constant, interned
System.out.println(s1 == s2);       // true — compiler optimised

String part = "Hel";
String s3 = part + "lo";            // runtime concatenation, not interned
System.out.println(s1 == s3);       // false — s3 is a new heap object
System.out.println(s1.equals(s3));  // true  — same content

// ── Correct comparison patterns ───────────────────────────────────────
// Standard comparison
if (s1.equals(s3)) { System.out.println("same content"); }

// Null-safe comparison (avoids NullPointerException; needs import java.util.Objects)
String maybeNull = null;
System.out.println(Objects.equals(s1, maybeNull)); // false, no NPE

// Literal on left — protects against NPE if variable is null
if ("Hello".equals(maybeNull)) { }   // safe — no NPE

// Case-insensitive comparison
System.out.println("Hello".equalsIgnoreCase("HELLO")); // true

// ── Pitfall: == for content checks ───────────────────────────────────
// (switch on a String, available since Java 7, matches with equals() semantics and is safe)
// Never do:
// if (status == "active") { ... }   // only works by accident sometimes
// Always do:
// if ("active".equals(status)) { ... }

// ── intern() — force pool entry ──────────────────────────────────────
String c2 = c.intern();    // return pool entry for c's content
System.out.println(a == c2);   // true — now same pool object
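
A switch on a String (Java 7 and later) selects its case by content, as if by equals(), so it works even when the value is not the pooled literal. The sketch below illustrates this; the class StatusSwitchDemo, the method describe, and the status values are hypothetical names used only for the example.

```java
class StatusSwitchDemo {

    // Hypothetical helper: maps a status string to a description by content.
    static String describe(String status) {
        if (status == null) {
            return "unknown";            // switching on a null String would throw NullPointerException
        }
        switch (status) {                // case selection behaves like equals(), not ==
            case "active":
                return "account is active";
            case "suspended":
                return "account is suspended";
            default:
                return "unknown";
        }
    }

    public static void main(String[] args) {
        String fromRuntime = new String("active");        // deliberately not the pooled literal
        System.out.println(fromRuntime == "active");      // false
        System.out.println(describe(fromRuntime));        // account is active
    }
}
```

The explicit null check is a design choice for this sketch; callers that treat null as an error could let the NullPointerException propagate instead.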

String Concatenation — Performance and Mechanics

The + operator for string concatenation is syntactic sugar. Up to Java 8, javac translated a + b + c into new StringBuilder().append(a).append(b).append(c).toString(). From Java 9 onward, javac instead emits an invokedynamic call bootstrapped by StringConcatFactory, which lets the runtime choose the concatenation strategy. For compile-time-constant expressions, concatenation is resolved entirely at compile time. Either way, a single concatenation expression is efficient.

The approach breaks down in loops. When a string is accumulated with +=, each iteration builds a brand-new String and copies everything accumulated so far into it, because concatenation is compiled per expression and no builder is reused across iterations. Building an n-character string one character at a time therefore takes O(n²) time and allocates O(n²) characters of short-lived garbage in total, which is catastrophic for large strings. The correct pattern is to create a StringBuilder before the loop and call append() inside it. This holds on Java 9+ as well: explicit StringBuilder is still required for loop-based accumulation.
Java
// ── Compiler translation of + ────────────────────────────────────────
String first = "Hello";
String last  = "World";

// What you write:
String full = first + ", " + last + "!";

// What javac generated before Java 9 (conceptually; Java 9+ emits an invokedynamic call instead):
String full2 = new StringBuilder()
    .append(first)
    .append(", ")
    .append(last)
    .append("!")
    .toString();

// ── The loop problem — O(n²) without StringBuilder ────────────────────
// WRONG: copies the whole accumulated string on every iteration:
String result = "";
for (int i = 0; i < 10_000; i++) {
    result += i + ",";   // new String (and a full copy) each time, O(n²) overall
}

// CORRECT — one StringBuilder for the whole loop — O(n):
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10_000; i++) {
    sb.append(i).append(",");
}
String efficient = sb.toString();

// ── Performance demonstration ─────────────────────────────────────────
int N = 50_000;

// String += accumulation
long start = System.currentTimeMillis();
String s = "";
for (int i = 0; i < N; i++) s += "x";
System.out.println("String +=:    " + (System.currentTimeMillis() - start) + "ms");
// e.g. ~2000ms for N=50,000 (illustrative only; actual timings depend on hardware and JVM)

// StringBuilder
start = System.currentTimeMillis();
StringBuilder sb2 = new StringBuilder();
for (int i = 0; i < N; i++) sb2.append("x");
String s2 = sb2.toString();
System.out.println("StringBuilder: " + (System.currentTimeMillis() - start) + "ms");
// e.g. ~1ms for N=50,000 (illustrative only; actual timings depend on hardware and JVM)

// ── When + is fine: non-loop expressions ─────────────────────────────
// Each of these compiles to one efficient concatenation
// (userId, timestamp and ex are assumed to be in scope):
String msg = "User " + userId + " logged in at " + timestamp;
String err = "Error: " + ex.getMessage() + " in " + getClass().getName();

// ── String.join() — joining with delimiter ────────────────────────────
String csv = String.join(", ", "Alice", "Bob", "Carol");
System.out.println(csv);   // Alice, Bob, Carol

List<String> items = List.of("apple", "banana", "cherry");
String joined = String.join(" | ", items);
System.out.println(joined); // apple | banana | cherry
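
The loop examples above leave a trailing comma after the last element. When the goal is a delimited list built in a loop, one of the following patterns may fit better. This is a sketch under stated assumptions: the class DelimitedBuildDemo and its three method names are hypothetical, while StringBuilder, StringJoiner, and Collectors.joining are standard JDK APIs.

```java
import java.util.StringJoiner;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class DelimitedBuildDemo {

    // One StringBuilder for the whole loop; the delimiter goes before every element but the first.
    static String withStringBuilder(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(i);
        }
        return sb.toString();
    }

    // StringJoiner handles the delimiter placement itself.
    static String withStringJoiner(int count) {
        StringJoiner joiner = new StringJoiner(",");
        for (int i = 0; i < count; i++) {
            joiner.add(Integer.toString(i));
        }
        return joiner.toString();
    }

    // Stream variant: Collectors.joining performs the same delimiter handling.
    static String withStream(int count) {
        return IntStream.range(0, count)
                .mapToObj(i -> Integer.toString(i))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(withStringBuilder(5));   // 0,1,2,3,4
        System.out.println(withStringJoiner(5));    // 0,1,2,3,4
        System.out.println(withStream(5));          // 0,1,2,3,4
    }
}
```

All three avoid repeated copying of the accumulated string; which one reads best depends on whether the data is already in a loop, a collection, or a stream.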

Related Topics in Strings

String Pool
The string pool (also called the string intern pool or string constant pool) is a special memory region maintained by the JVM that stores a single copy of each unique string value. When two string literals have the same content, they refer to the same object in the pool rather than two separate objects. The pool is a flyweight pattern applied at the language level — it dramatically reduces memory consumption in applications that use many repeated string values, which is nearly every application. This entry covers how the pool works, where it lives in JVM memory, how to interact with it programmatically, the intern() method, performance implications, and when to use or avoid pool entries.
Immutable String
String immutability is the most important design decision in Java's String class. Once a String object is created, its character sequence can never change. No method on String modifies the string; every method that appears to modify returns a new String object containing the result. This design decision drives thread safety, enables the string pool, makes strings safe hash map keys, and simplifies reasoning about string values. Understanding why String is immutable, how immutability is enforced, and what the consequences of immutability are clarifies the behaviour of virtually every piece of Java code that handles strings.
Mutable String
Java provides two mutable string classes for scenarios where String's immutability would be inefficient: StringBuilder and StringBuffer. Both maintain an internal character buffer that can be modified in place — characters can be appended, inserted, deleted, and replaced without creating new objects. StringBuilder is the modern choice for single-threaded use; StringBuffer is the legacy thread-safe version with synchronised methods. This entry covers the internal buffer mechanics, the full API of both classes, performance characteristics, when to use each, thread safety implications, and the complete patterns for efficient string construction.
String Methods
The String class provides over 60 methods covering character inspection, searching, comparison, transformation, splitting, joining, formatting, and encoding. Knowing these methods well eliminates the need to write manual character-by-character loops for common string operations and prevents the common mistake of reimplementing logic that the library already provides efficiently. This entry covers every major method category with precise semantics, edge cases, performance characteristics, and practical examples — from the fundamental length() and charAt() through to the modern Java 11+ additions like strip(), isBlank(), lines(), and repeat().