Java Unicode System
Java was built with internationalization in mind from day one. Every string, every character, every source file is Unicode. Understanding how Java handles Unicode — from char and String internals to escape sequences and encoding — is essential for building software that works correctly in every language.
Why Java Chose Unicode
When Java was released in 1995, much software still assumed ASCII, a 7-bit encoding covering only 128 characters, barely enough for English, or one of many incompatible 8-bit code pages. Building global software meant layering character encoding conversions on top, with frequent bugs when systems disagreed on the encoding.
Java made a different choice: Unicode as the native character system, from the ground up. Every Java char is a UTF-16 code unit. Every String is a sequence of those code units. Every .java source file is decoded to Unicode before it is compiled. This means Java programs can represent English, Chinese, Arabic, Hindi, Japanese, emoji, and mathematical symbols out of the box, without external libraries.
char — A 16-bit Unicode Character
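Java's char primitive is a 16-bit unsigned integer that holds one UTF-16 code unit. A single char can represent any character in the Basic Multilingual Plane (U+0000 to U+FFFF, excluding the surrogate range U+D800 to U+DFFF); characters beyond the BMP need two char values.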
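To make the 16-bit limit concrete, here is a minimal sketch that checks how many char values different code points need. The class name CharWidthDemo and the helper charsNeeded are illustrative names introduced for this example; the Character methods it calls are standard JDK APIs.

```java
// Illustrative sketch: CharWidthDemo and charsNeeded are hypothetical names.
class CharWidthDemo {

    // Number of 16-bit char values needed to store one code point.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int omega = 0x03A9;      // Greek capital Omega, inside the BMP
        int grinning = 0x1F600;  // grinning-face emoji, outside the BMP

        System.out.println(Character.isBmpCodePoint(omega));     // true
        System.out.println(Character.isBmpCodePoint(grinning));  // false

        System.out.println(charsNeeded(omega));     // 1
        System.out.println(charsNeeded(grinning));  // 2

        // Casting a supplementary code point to char keeps only the low
        // 16 bits, so the result is a different character.
        char truncated = (char) grinning;
        System.out.println((int) truncated == 0xF600);  // true

        // Character.toChars produces the correct one- or two-char sequence.
        char[] units = Character.toChars(grinning);
        System.out.println(units.length);  // 2
    }
}
```

The practical consequence is that code which may see emoji or other supplementary characters is usually safer storing code points in an int rather than a char.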
Java
// char holds a single Unicode character:
char letter = 'A'; // U+0041 — Latin capital A
char digit = '5'; // U+0035
char space = ' '; // U+0020
char newline = '\n'; // U+000A — escape sequence
// Unicode escape sequences (backslash, u, then 4 hex digits):
char omega = '\u03A9'; // Ω — Greek capital Omega
char rupee = '\u20B9'; // ₹ — Indian Rupee sign
char snowman = '\u2603'; // ☃
// char is numerically a 16-bit integer:
char c = 'A';
int code = c; // code = 65 (Unicode code point for 'A')
char next = (char)(c + 1); // next = 'B'
// Printing Unicode:
System.out.println('\u03A9'); // prints: Ω
System.out.println('\u20B9'); // prints: ₹
// char arithmetic:
for (char ch = 'A'; ch <= 'Z'; ch++) {
    System.out.print(ch); // prints: ABCDEFGHIJKLMNOPQRSTUVWXYZ
}
Unicode Escape Sequences
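Java supports Unicode escapes using the \uXXXX syntax: four hexadecimal digits that name one UTF-16 code unit. The compiler translates them before it tokenizes the source, which means they are recognized in string literals, character literals, identifiers, and even comments. The same rule has a sharp edge: an escape for a line terminator or a quote character (for example U+000A or U+0022) is treated as that raw character, so it ends the surrounding comment or literal and usually breaks compilation.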
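The early translation step can be observed directly. The following is a small sketch, with EscapeDemo and its two helper methods as hypothetical names, that contrasts a real Unicode escape with a doubled backslash, which the compiler leaves alone.

```java
// Illustrative sketch: EscapeDemo and its helpers are hypothetical names.
class EscapeDemo {

    // The escape in this literal is translated to the letter A before the
    // compiler tokenizes the file, so the literal is exactly "A".
    static String translated() {
        return "\u0041";
    }

    // A doubled backslash is the ordinary string escape for one backslash.
    // The 'u' after it does not start a Unicode escape, so all six
    // characters stay in the string as plain text.
    static String literalText() {
        return "\\u0041";
    }

    public static void main(String[] args) {
        System.out.println(translated().equals("A"));  // true
        System.out.println(translated().length());     // 1
        System.out.println(literalText().length());    // 6

        // Escapes are also valid inside identifiers: this declares "Age".
        int \u0041ge = 25;
        System.out.println(Age);                       // 25
    }
}
```

The doubled-backslash form is handy when a program needs to print or log escape text itself rather than the character it stands for.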
Java
// In string literals:
String greeting = "\u4E2D\u6587"; // 中文 — Chinese characters
String hello = "\u0048\u0065\u006C\u006C\u006F"; // "Hello"
// In char literals:
char euro = '\u20AC'; // €
char pi = '\u03C0'; // π
// In identifiers (unusual but valid):
int \u0041ge = 25; // same as: int Age = 25;
// Unicode escapes are translated BEFORE the compiler tokenizes the source.
// An escape for U+000A (line feed) therefore acts as a real line break in the
// source, even inside a string literal or a // comment, and the code fails to
// compile. Write "line1\nline2" with the ordinary escape instead.
// For control and quote characters, use the ordinary escape sequences:
// U+0009 horizontal tab: \t
// U+000A line feed / newline: \n
// U+000D carriage return: \r
// U+0022 double quote: \"
// U+0027 single quote: \'
// U+005C backslash: \\
String and Unicode — Internal Representation
A Java String is a sequence of char values, that is, UTF-16 code units. For characters in the Basic Multilingual Plane (the vast majority of characters in everyday text), one char = one character. For supplementary characters outside the BMP (emoji, rare CJK characters, ancient scripts), Java uses surrogate pairs: two char values that together represent one character.
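As a rough illustration of how a surrogate pair is built and why char-based indexing can split one, consider the sketch below. SurrogateDemo and firstCodePoints are hypothetical names chosen for this example; the Character and String methods are standard.

```java
// Illustrative sketch: SurrogateDemo and firstCodePoints are hypothetical names.
class SurrogateDemo {

    // Returns the first n code points of s without splitting a surrogate pair.
    static String firstCodePoints(String s, int n) {
        int end = s.offsetByCodePoints(0, n);
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        int grinning = 0x1F600; // grinning-face emoji, a supplementary code point

        // Split the code point into its two UTF-16 code units.
        char high = Character.highSurrogate(grinning);
        char low = Character.lowSurrogate(grinning);
        System.out.println(Integer.toHexString(high)); // d83d
        System.out.println(Integer.toHexString(low));  // de00

        // Recombine the pair into the original code point.
        System.out.println(Character.isSurrogatePair(high, low));         // true
        System.out.println(Character.toCodePoint(high, low) == grinning); // true

        String text = "A\uD83D\uDE00B"; // A + emoji + B, four char values

        // substring counts char values, so an end index of 2 cuts the pair in half.
        String broken = text.substring(0, 2);
        System.out.println(Character.isHighSurrogate(broken.charAt(1)));  // true

        // Converting a code-point count to a char index keeps the pair intact.
        System.out.println(firstCodePoints(text, 2).length());            // 3
    }
}
```

Note that offsetByCodePoints throws IndexOutOfBoundsException if the string has fewer code points than requested, so real code would need to guard the count.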
Java
// String length returns the number of char values, not visual characters:
String s = "Hello";
System.out.println(s.length()); // 5 — as expected
// Emoji are supplementary characters — stored as surrogate pairs:
String emoji = "😀";
System.out.println(emoji.length()); // 2 — two char values!
System.out.println(emoji.codePointCount(0, emoji.length())); // 1 — one character
// Safe iteration over code points (handles surrogates correctly):
String text = "Hello 😀";
text.codePoints().forEach(cp -> {
System.out.println(Character.toString(cp)); // correctly handles emoji
});
// String.chars() returns char values — may split surrogate pairs:
// String.codePoints() returns code points — handles supplementary characters
String mixed = "A😀B"; // A + 😀 (surrogate pair) + B
System.out.println(mixed.length()); // 4 (A + 2 surrogates + B)
System.out.println(mixed.codePointCount(0, 4)); // 3 (A + 😀 + B)
File Encoding — Reading and Writing Unicode
When reading or writing files, encoding matters. Java's default file encoding can vary by platform — always specify UTF-8 explicitly to guarantee correct Unicode handling across all operating systems.
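Why the charset matters is easiest to see at the byte level, without touching the file system. This is a minimal sketch, with EncodingDemo and roundTrip as hypothetical names, showing that the same text yields different bytes per encoding and that decoding with the wrong charset garbles it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: EncodingDemo and roundTrip are hypothetical names.
class EncodingDemo {

    // Encodes text as UTF-8, then decodes those bytes with the given charset.
    static String roundTrip(String text, Charset decodeWith) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String text = "caf\u00E9"; // "cafe" with an acute accent on the e

        // The same text has a different byte length in each encoding.
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length);      // 5
        System.out.println(text.getBytes(StandardCharsets.ISO_8859_1).length); // 4
        System.out.println(text.getBytes(StandardCharsets.UTF_16BE).length);   // 8

        // Matching charsets on both sides restore the original text.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true

        // Decoding UTF-8 bytes as ISO-8859-1 turns the two-byte accented
        // letter into two unrelated characters.
        String garbled = roundTrip(text, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // 5
    }
}
```

The same mismatch happens with files when the writer and the reader assume different default charsets, which is the situation an explicit StandardCharsets.UTF_8 argument avoids.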
Java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
// ALWAYS specify charset explicitly — never rely on the default:
// Writing UTF-8 file:
Path path = Path.of("output.txt");
Files.writeString(path, "Hello 世界 🌍", StandardCharsets.UTF_8);
// Reading UTF-8 file:
String content = Files.readString(path, StandardCharsets.UTF_8);
// PrintWriter with explicit encoding (try-with-resources closes the stream):
try (PrintWriter writer = new PrintWriter(
        new OutputStreamWriter(
            new FileOutputStream("file.txt"), StandardCharsets.UTF_8))) {
    writer.println("Hello 世界");
}
// BufferedReader with explicit encoding:
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(
            new FileInputStream("file.txt"), StandardCharsets.UTF_8))) {
    String firstLine = reader.readLine();
}
// Java 18+ — UTF-8 is the default charset (finally!)
// Before Java 18: the default was platform-dependent (typically Cp1252 on Western-locale Windows, UTF-8 on most Linux systems)
// Best practice: still specify StandardCharsets.UTF_8 explicitly for clarity
The Character Class — Unicode Utilities
Java's Character class provides a rich set of static methods for working with Unicode characters — testing categories, converting case, and handling supplementary characters.
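One way to combine these utilities is to classify text by code point rather than by char. The sketch below assumes a helper named countLetters in a class named CharacterDemo (both hypothetical) and uses only standard Character methods.

```java
// Illustrative sketch: CharacterDemo and countLetters are hypothetical names.
class CharacterDemo {

    // Counts letters in any script, iterating by code point so that
    // supplementary characters are classified as a whole.
    static long countLetters(String text) {
        return text.codePoints().filter(Character::isLetter).count();
    }

    public static void main(String[] args) {
        // "Java", two Greek letters, two Chinese characters, a digit, an emoji
        String text = "Java \u03B1\u03B2 \u4E2D\u6587 7 \uD83D\uDE00";

        System.out.println(countLetters(text)); // 8

        // Script and official name of a single code point (Greek small alpha).
        System.out.println(Character.UnicodeScript.of(0x03B1)); // GREEK
        System.out.println(Character.getName(0x03B1));          // GREEK SMALL LETTER ALPHA

        // Digits are not limited to ASCII: U+096D is Devanagari digit seven.
        System.out.println(Character.isDigit(0x096D));          // true
        System.out.println(Character.getNumericValue(0x096D));  // 7
    }
}
```

Passing int code points instead of char values is what lets the same checks work for characters outside the BMP.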
Java
// Testing character categories:
Character.isLetter('A') // true
Character.isLetter('5') // false
Character.isDigit('7') // true
Character.isLetterOrDigit('_') // false
Character.isWhitespace(' ') // true
Character.isUpperCase('A') // true
Character.isLowerCase('a') // true
// Unicode-aware:
Character.isLetter('α') // true — Greek alpha
Character.isLetter('中') // true — Chinese character
// Case conversion:
Character.toUpperCase('a') // 'A'
Character.toLowerCase('Ω') // 'ω'
// Supplementary character support (code point versions):
int codePoint = "😀".codePointAt(0);
Character.isLetterOrDigit(codePoint) // false — emoji is not a letter
Character.getType(codePoint) // 28 = OTHER_SYMBOL
// Get Unicode block and category:
Character.UnicodeBlock block = Character.UnicodeBlock.of('α');
// block = Character.UnicodeBlock.GREEK