Character Streams

Character streams, represented by the Reader and Writer abstract base classes, handle text data by abstracting away the encoding and decoding between Java's internal char/String representation (UTF-16) and the byte encoding used in files and network connections. Where byte streams treat data as raw octets, character streams treat data as Unicode characters, handling multi-byte sequences transparently according to a specified Charset. InputStreamReader and OutputStreamWriter are the bridge classes that connect byte streams to character streams, applying charset encoding on write and decoding on read. BufferedReader adds line-at-a-time reading via readLine() and multi-character buffering. PrintWriter adds print/println/printf formatting output. StringReader and StringWriter enable in-memory character stream operations on String data. This entry covers the complete Reader and Writer APIs, charset handling and the consequences of using the wrong charset, the complete class hierarchy of character streams with the use case for each, BufferedReader.readLine() semantics and the lines() stream, the bridge classes in depth, character encoding best practices, and the interaction between character streams and Java's String.lines() and Files.readString()/writeString() alternatives.

Reader and Writer Hierarchy, and the Bridge Classes

Reader is the abstract base class for character input streams. Its core method is read(), which returns a single char as an int in the range [0, 65535], or -1 at end of stream. Each value is one UTF-16 code unit, not a full Unicode code point, so a supplementary character such as an emoji arrives as two consecutive values (a surrogate pair). read(char[] cbuf, int off, int len) reads up to len chars into cbuf starting at off and returns the count read, or -1. read(char[] cbuf) reads up to cbuf.length chars. Like InputStream.read(byte[]), these methods may fill the array only partially, so the returned count must be used, not the array length. Reader also defines skip(long n), ready() (loosely analogous to InputStream.available(): it returns true only if the next read is guaranteed not to block), markSupported(), mark(), and reset().
Writer is the abstract base class for character output streams. Its core method is write(int c), which writes a single char (the low 16 bits of the int). write(char[] cbuf, int off, int len) writes len chars from cbuf. write(String str, int off, int len) writes a substring; OutputStream has no corresponding method. append(CharSequence csq) and append(char c) return the Writer itself, providing a fluent API for building character output. flush() and close() have the same semantics as in OutputStream.
InputStreamReader extends Reader and wraps an InputStream, applying charset decoding as it reads: bytes from the InputStream are decoded according to the specified Charset into the chars returned by read(). If no Charset is specified, it uses Charset.defaultCharset(), which is UTF-8 on Java 18 and later (JEP 400) but was platform-dependent before that, so always specify the charset explicitly. OutputStreamWriter extends Writer and wraps an OutputStream, applying charset encoding: chars written to the OutputStreamWriter are encoded according to the specified Charset into bytes sent to the underlying OutputStream. These two bridge classes are the standard connection points between the byte stream world and the character stream world.
Network socket communication that carries text should go through InputStreamReader/OutputStreamWriter so that encoding is handled correctly. Any file containing text should be read and written through these classes, or through FileReader/FileWriter with an explicit charset. FileReader and FileWriter are convenience subclasses of InputStreamReader and OutputStreamWriter that have existed since JDK 1.1, but their charset-accepting constructors were added only in Java 11. Before Java 11 they always used the platform default charset and had no charset parameter, which made them treacherous. The combination of FileInputStream + InputStreamReader + BufferedReader is the explicit way to read text files that works on all Java versions and all platforms.
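To make the range of values returned by read() concrete, the following minimal sketch reads a short string one value at a time. The class name CodeUnitDemo and the helper readAll are hypothetical names chosen for this illustration, and a StringReader stands in for a file or socket.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: shows that Reader.read() yields UTF-16 code units.
class CodeUnitDemo {

    // Collects every value returned by read() until it signals end of stream with -1.
    static List<Integer> readAll(String text) {
        List<Integer> units = new ArrayList<>();
        try (Reader reader = new StringReader(text)) {
            int c;
            while ((c = reader.read()) != -1) {
                units.add(c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return units;
    }

    public static void main(String[] args) {
        // "A" is a single char; U+1F600 lies outside the BMP and needs a surrogate pair.
        String text = "A" + new String(Character.toChars(0x1F600));
        List<Integer> units = readAll(text);
        System.out.println(units.size());                                   // 3
        System.out.println(text.codePointCount(0, text.length()));          // 2
        char second = (char) units.get(1).intValue();
        System.out.println(Character.isHighSurrogate(second));              // true
    }
}
```

Code that needs whole code points rather than chars can collect the text into a String and use String.codePoints(), which pairs surrogates back together.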
Java
// ── InputStreamReader — byte stream to char stream bridge ───────────
// Always specify charset explicitly — never rely on platform default:
try (Reader reader = new InputStreamReader(
        new FileInputStream("document.txt"), StandardCharsets.UTF_8)) {
    char[] buffer = new char[4096];
    int charsRead;
    while ((charsRead = reader.read(buffer)) != -1) {
        process(buffer, 0, charsRead);   // process only charsRead chars, not buffer.length
    }
}

// WRONG: relies on platform default charset — breaks across environments
try (Reader broken = new InputStreamReader(new FileInputStream("document.txt"))) {
    // Platform default charset may be UTF-8 on Linux, Cp1252 on Windows — inconsistent
}

// ── OutputStreamWriter — char stream to byte stream bridge ────────────
try (Writer writer = new OutputStreamWriter(
        new FileOutputStream("output.txt"), StandardCharsets.UTF_8)) {
    writer.write("Hello, 世界!");   // writes UTF-8 bytes to the underlying stream
    writer.write('\n');
    writer.write(new char[]{'A', 'B', 'C'}, 0, 3);
    writer.write("substring", 3, 6);   // writes chars 3..8 (inclusive of 3, exclusive of 9)
}

// ── Full stack: FileInputStream → InputStreamReader → BufferedReader ──
// This is the most explicit and portable way to read text files:
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream("data.csv"), StandardCharsets.UTF_8))) {
    String line;
    while ((line = br.readLine()) != null) {
        String[] fields = line.split(",");
        processCsvRow(fields);
    }
}

// Java 11+: FileReader with charset (simpler, same result):
try (BufferedReader br = new BufferedReader(
        new FileReader("data.csv", StandardCharsets.UTF_8))) {
    // Equivalent but cleaner
}

// ── Reader hierarchy — all concrete Reader classes ────────────────────
// StringReader: reads chars from a String
try (Reader sr = new StringReader("Hello, Reader!")) {
    char[] buf = new char[5];
    int n = sr.read(buf);    // reads "Hello"
    System.out.println(new String(buf, 0, n));  // Hello
}

// CharArrayReader: reads chars from a char[]
char[] chars = "CharArray".toCharArray();
try (Reader cr = new CharArrayReader(chars)) {
    System.out.println((char) cr.read());  // C
}

// PipedReader/PipedWriter: inter-thread character piping (rarely used directly)
PipedWriter pw = new PipedWriter();
PipedReader pr = new PipedReader(pw, 4096);
new Thread(() -> {
    try { pw.write("Inter-thread text"); pw.close(); }
    catch (IOException e) { e.printStackTrace(); }
}).start();

try (BufferedReader br = new BufferedReader(pr)) {
    System.out.println(br.readLine());  // Inter-thread text
}
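
The mark()/reset() pair on Reader and the fluent append() methods on Writer are described above but not exercised by the examples. The following is a small sketch under the assumption that the reader supports marking (StringReader and BufferedReader do); MarkResetAppendDemo, peek, and keyValue are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical demo class for Reader.mark()/reset() and Writer.append().
class MarkResetAppendDemo {

    // Returns up to n chars from the current position, then rewinds the reader.
    static String peek(Reader reader, int n) {
        try {
            if (!reader.markSupported()) {
                throw new IllegalArgumentException("reader does not support mark/reset");
            }
            reader.mark(n);                       // remember the current position
            char[] buf = new char[n];
            int total = 0;
            while (total < n) {                   // a single read may return fewer chars
                int count = reader.read(buf, total, n - total);
                if (count == -1) {
                    break;                        // end of stream before n chars
                }
                total += count;
            }
            reader.reset();                       // go back to the marked position
            return new String(buf, 0, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds "key=value\n" using the chained append() calls that Writer provides.
    static String keyValue(String key, String value) {
        try (Writer out = new StringWriter()) {
            out.append(key).append('=').append(value).append('\n');
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Reader reader = new StringReader("Hello, Reader!");
        System.out.println(peek(reader, 5));             // Hello
        System.out.println(peek(reader, 5));             // Hello (position was restored)
        System.out.print(keyValue("charset", "UTF-8"));  // charset=UTF-8
    }
}
```

Peeking like this is one way to inspect a header or signature before handing the same reader to a parser, provided the peeked length stays within the limit passed to mark().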

BufferedReader, BufferedWriter, PrintWriter — High-Level Text I/O

BufferedReader is the workhorse of text input in Java. It wraps any Reader and adds two critical capabilities: an internal char buffer (default 8192 chars) that amortizes the cost of reads on the underlying stream, and readLine(), which reads a complete line of text and strips the line terminator. readLine() handles all three line terminator conventions: carriage return (\r), line feed (\n), and carriage return followed by line feed (\r\n). It returns null at end of stream rather than -1, because null is the natural sentinel for a String return type. The lines() method (Java 8+) returns a Stream<String> of all lines, enabling functional processing with filter, map, and collect. The null sentinel is a frequent source of BufferedReader bugs. Code that compares the return value of readLine() to "" (empty string) to detect end of stream is wrong: an empty line returns "" and end of stream returns null, so such a loop either stops at the first blank line or fails with a NullPointerException at the end. The correct idiom is while ((line = br.readLine()) != null).
BufferedWriter wraps any Writer and adds buffering plus one extra method: newLine(). newLine() writes the platform-specific line separator (System.lineSeparator()), which is \r\n on Windows and \n on Unix/macOS. This matters when writing files that will be read on the same platform and should conform to its line ending convention. For cross-platform files (configuration files shared between systems, files committed to version control), write \n explicitly rather than calling newLine(), to avoid accidentally writing Windows-style line endings from a Windows build server.
PrintWriter adds the full print/println/printf/format API to any Writer, making it the most convenient class for formatted text output to files. A PrintWriter can be constructed from a Writer (explicit control over charset and buffering, with an optional autoFlush flag), from an OutputStream, or from a File or file name (with a Charset parameter in Java 10+; the File and file name constructors have no autoFlush option). When constructed from a Writer with autoFlush=false (the default), output reaches the destination only when it is flushed manually or by close(). When constructed with autoFlush=true, println(), printf(), and format() flush automatically. PrintWriter suppresses IOExceptions, just as PrintStream does (although, unlike PrintStream, it extends Writer rather than FilterOutputStream); use checkError() to detect write failures.
StringWriter is a Writer backed by an internal StringBuffer. Everything written to it accumulates in the buffer, which is retrieved as a String via toString(). It is used when a method accepts a Writer parameter and you want to capture its output as a String: for testing, for building formatted strings, or for serializing object state to a string.
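Because PrintWriter swallows IOExceptions, a failed write is easy to miss. The sketch below illustrates this with a deliberately broken Writer; FailingWriter is a hypothetical class invented for this example (it stands in for a full disk or a closed socket), as are CheckErrorDemo and printAndCheck.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical demo class: PrintWriter hides IOExceptions behind checkError().
class CheckErrorDemo {

    // Hypothetical Writer that fails on every write.
    static final class FailingWriter extends Writer {
        @Override
        public void write(char[] cbuf, int off, int len) throws IOException {
            throw new IOException("simulated write failure");
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    // Prints one line to the target and reports whether PrintWriter recorded an error.
    static boolean printAndCheck(Writer target) {
        PrintWriter pw = new PrintWriter(target);
        pw.println("this line is lost if the target fails");  // never throws IOException
        return pw.checkError();                               // flushes, then returns the error flag
    }

    public static void main(String[] args) {
        System.out.println(printAndCheck(new FailingWriter()));  // true
        System.out.println(printAndCheck(new StringWriter()));   // false
    }
}
```

When a write failure must not go unnoticed, one option is to call checkError() after the last print call and convert a true result into an exception, as the report example in this section does.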
Java
// ── BufferedReader.readLine() — correct and incorrect usage ──────────
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream("lines.txt"), StandardCharsets.UTF_8))) {
    String line;

    // CORRECT: check for null to detect end-of-stream
    while ((line = br.readLine()) != null) {
        if (!line.isBlank()) processLine(line);
    }

    // WRONG: comparing to "" — empty lines return "", end-of-stream returns null
    // while (!(line = br.readLine()).equals("")) { }  // NullPointerException at end!
}

// ── BufferedReader.lines() — functional stream API (Java 8+) ──────────
try (BufferedReader br = Files.newBufferedReader(
        Path.of("data.txt"), StandardCharsets.UTF_8)) {
    long wordCount = br.lines()
        .filter(line -> !line.isBlank())
        .flatMap(line -> Arrays.stream(line.split("\\s+")))   // regex \s+ needs a doubled backslash in Java source
        .filter(word -> !word.isEmpty())
        .count();
    System.out.println("Word count: " + wordCount);
}

// Lines stream is lazy — reads lines on demand, not all at once
try (Stream<String> lines = Files.lines(Path.of("large.txt"), StandardCharsets.UTF_8)) {
    lines.filter(l -> l.contains("ERROR"))
         .limit(100)
         .forEach(System.out::println);
    // Only reads until 100 ERROR lines found — doesn't read entire file
}

// ── BufferedWriter.newLine() vs \n ────────────────────────────────────
// For platform-specific line endings (e.g., Windows batch files):
try (BufferedWriter bw = new BufferedWriter(
        new OutputStreamWriter(new FileOutputStream("windows.bat"), StandardCharsets.UTF_8))) {
    bw.write("@echo off");
    bw.newLine();        // \r\n on Windows, \n on Unix: matches the platform
    bw.write("echo Hello");
    bw.newLine();
}

// For cross-platform files (config files, source code, version-controlled files):
try (BufferedWriter bw = new BufferedWriter(
        new OutputStreamWriter(new FileOutputStream("config.txt"), StandardCharsets.UTF_8))) {
    bw.write("key=value");
    bw.write("\n");     // ALWAYS \n, never \r\n, in cross-platform files
    bw.write("other=data");
    bw.write("\n");
}

// ── PrintWriter — convenient formatted text output to files ───────────
// From Writer (explicit charset and buffering control — recommended):
try (PrintWriter pw = new PrintWriter(
        new BufferedWriter(new OutputStreamWriter(
            new FileOutputStream("report.txt"), StandardCharsets.UTF_8)))) {
    pw.println("Report: " + LocalDate.now());
    pw.printf("Total items: %,d%n", 1_234_567);
    pw.printf("Average: %.2f%n", 98.76);
    pw.println("Done");
    if (pw.checkError()) throw new IOException("PrintWriter write failed");
}

// From File (Java 10+: charset as second parameter):
try (PrintWriter pw = new PrintWriter(new File("output.txt"), StandardCharsets.UTF_8)) {
    pw.println("Simple output");
}

// autoFlush=true: println/printf/format flush automatically:
try (PrintWriter autoFlushed = new PrintWriter(
        new FileWriter("live.txt", StandardCharsets.UTF_8), true)) {   // FileWriter charset overload: Java 11+
    autoFlushed.println("Line 1");  // flushed immediately
    autoFlushed.println("Line 2");  // flushed immediately
    // Useful for log files that must be visible while program is running
}

// ── StringWriter — capture Writer output as String ────────────────────
StringWriter sw = new StringWriter();
try (PrintWriter pw = new PrintWriter(sw)) {
    pw.printf("Name: %s%n", "Alice");
    pw.printf("Score: %d%n", 95);
}   // pw closed; sw still valid
String report = sw.toString();
System.out.println(report);
// Name: Alice
// Score: 95

// Testing utility: capture method output that writes to a Writer:
StringWriter capturedOutput = new StringWriter();
generateReport(new PrintWriter(capturedOutput));   // method under test
assertThat(capturedOutput.toString()).contains("Expected Section");
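
The readLine() rules described in this section (three terminator styles, "" for an empty line, null only at end of stream) can be checked with an in-memory reader. This is a minimal sketch; ReadLineDemo and readLines are hypothetical names, and the input string is made up to mix all three terminators.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: how readLine() splits text with mixed line terminators.
class ReadLineDemo {

    // Reads all lines using the null-sentinel idiom.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {   // null, not "", marks end of stream
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // \r, \n, and \r\n each end one line; the lone \n after \r\n yields an empty line.
        List<String> lines = readLines("a\rb\nc\r\n\nd");
        System.out.println(lines.size());          // 5
        System.out.println(lines.get(3).isEmpty()); // true
        System.out.println(lines);                 // [a, b, c, , d]
    }
}
```

Note that a trailing terminator does not produce an extra empty line: readLines("x\n") returns a single element.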

Charset Handling, Encoding Best Practices, and Modern Alternatives

Character encoding is a common source of silent data corruption in Java I/O. Java strings are sequences of UTF-16 code units; files and network streams are sequences of bytes; the mapping between them is defined by a Charset. Using the wrong charset, or relying on the platform default charset, causes characters outside the charset's repertoire to be replaced with question marks or other substitution characters, and causes some byte sequences to be misinterpreted as different characters, all without any exception being thrown. The StandardCharsets class provides constants for the six charsets guaranteed to be available on every Java platform: US_ASCII, ISO_8859_1, UTF_8, UTF_16, UTF_16BE, and UTF_16LE. For virtually all text I/O, UTF_8 is the correct choice: it can represent every Unicode character, it is the dominant encoding on the internet, it is backward compatible with ASCII, and it is the default on most modern operating systems. ISO_8859_1 (Latin-1) is sometimes needed for HTTP headers (which are historically Latin-1 encoded), legacy files, or binary data masquerading as text. US_ASCII is appropriate only when you know the data is pure ASCII and want non-ASCII input to surface immediately (as a replacement character or, with a strict decoder, an exception). Charset.defaultCharset() is UTF-8 on Java 18 and later, where JEP 400 made UTF-8 the standard default (unless it is overridden through the file.encoding system property), but it was platform-dependent on earlier releases: typically UTF-8 on Linux/macOS and Cp1252 on Western-locale Windows. Code that implicitly relies on the default charset (by using FileReader, FileWriter, new String(bytes), or String.getBytes() without a charset) can produce different results on different platforms and Java versions. This is the definition of a latent cross-platform bug. Always pass an explicit charset.
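The two failure modes named above, misinterpreted bytes and substituted question marks, can be reproduced in memory without any files. This is a minimal sketch; WrongCharsetDemo and roundTrip are hypothetical names, and the non-ASCII literals are written as Unicode escapes so the source compiles identically under any source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: what happens when encode and decode charsets disagree.
class WrongCharsetDemo {

    // Encodes text with one charset and decodes the resulting bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute) is two bytes in UTF-8; Latin-1 decodes each byte separately.
        String misread = roundTrip("\u00e9", StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length());                   // 2
        System.out.println(misread.equals("\u00c3\u00a9"));     // true

        // US-ASCII cannot represent U+4E16 U+754C, so each char is encoded as '?'.
        String lost = roundTrip("\u4e16\u754c", StandardCharsets.US_ASCII, StandardCharsets.US_ASCII);
        System.out.println(lost);                               // ??

        // Matching charsets round-trip without loss.
        String intact = roundTrip("\u00e9", StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(intact.equals("\u00e9"));            // true
    }
}
```

Neither lossy case throws an exception, which is why this kind of corruption tends to be discovered only when someone reads the output.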
Java NIO.2 provides higher-level alternatives to character stream boilerplate for common file operations: Files.readString(Path, Charset) reads an entire text file as a String (Java 11+); Files.writeString(Path, CharSequence, Charset, OpenOption...) writes a String to a file (Java 11+); Files.readAllLines(Path, Charset) reads all lines as a List<String>; Files.write(Path, Iterable<? extends CharSequence>, Charset, OpenOption...) writes a collection of lines. These methods are cleaner than constructing stream chains for simple cases, but they read or write the entire file at once, which makes them unsuitable for very large files. For large files or streaming processing, use a buffered reader (Files.newBufferedReader, or BufferedReader wrapping InputStreamReader wrapping FileInputStream) or the lazy Files.lines() stream instead.
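As a quick illustration of how these methods fit together, the sketch below writes a string with Files.writeString and reads it back with Files.readAllLines. It assumes Java 11 or later; FilesRoundTripDemo and writeThenReadLines are hypothetical names, and a temporary file is used so that nothing is left behind.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical demo class: whole-file write and read with the Files utility methods.
class FilesRoundTripDemo {

    // Writes text to a temporary UTF-8 file, reads it back as lines, then deletes the file.
    static List<String> writeThenReadLines(String text) {
        try {
            Path tmp = Files.createTempFile("roundtrip", ".txt");
            try {
                Files.writeString(tmp, text, StandardCharsets.UTF_8);
                return Files.readAllLines(tmp, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> lines = writeThenReadLines("first\nsecond\n");
        System.out.println(lines);   // [first, second]
    }
}
```

Both calls hold the complete content in memory, so this pattern suits configuration files and test fixtures better than multi-gigabyte logs.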
Java
// ── Always specify charset — never rely on default ───────────────────
// WRONG: platform-dependent charset
String fromBytes = new String(bytes);               // uses Charset.defaultCharset()
byte[] toBytes   = string.getBytes();               // uses Charset.defaultCharset()
FileReader fr    = new FileReader("file.txt");       // uses Charset.defaultCharset()
FileWriter fw    = new FileWriter("file.txt");       // uses Charset.defaultCharset()

// CORRECT: always explicit
String fromBytesUTF8 = new String(bytes, StandardCharsets.UTF_8);
byte[] toBytesUTF8   = string.getBytes(StandardCharsets.UTF_8);
FileReader frExplicit = new FileReader("file.txt", StandardCharsets.UTF_8);   // Java 11+
FileWriter fwExplicit = new FileWriter("file.txt", StandardCharsets.UTF_8);   // Java 11+

// ── Charset detection — when encoding is unknown ──────────────────────
// If charset is truly unknown: read raw bytes and detect via BOM or external library
try (InputStream is = new FileInputStream("unknown.txt")) {
    byte[] bom = is.readNBytes(3);   // may hold fewer than 3 bytes if the file is very short
    Charset charset;
    int skip = 0;
    if (bom.length >= 3 && bom[0] == (byte)0xEF && bom[1] == (byte)0xBB && bom[2] == (byte)0xBF) {
        charset = StandardCharsets.UTF_8; skip = 3;   // UTF-8 BOM
    } else if (bom.length >= 2 && bom[0] == (byte)0xFF && bom[1] == (byte)0xFE) {
        charset = StandardCharsets.UTF_16LE; skip = 2; // UTF-16 LE BOM
    } else if (bom.length >= 2 && bom[0] == (byte)0xFE && bom[1] == (byte)0xFF) {
        charset = StandardCharsets.UTF_16BE; skip = 2; // UTF-16 BE BOM
    } else {
        charset = StandardCharsets.UTF_8; skip = 0;   // assume UTF-8 (most common)
    }
    // Re-read without BOM bytes consumed:
    InputStream adjusted = new SequenceInputStream(
        new ByteArrayInputStream(bom, skip, bom.length - skip), is);
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(adjusted, charset))) {
        br.lines().forEach(System.out::println);
    }
}

// ── CodingErrorAction — control behavior on invalid byte sequences ────
Charset utf8 = StandardCharsets.UTF_8;

// Lenient: REPLACE substitutes U+FFFD (the replacement char) for each invalid sequence.
// InputStreamReader(in, charset) and new String(bytes, charset) behave this way.
CharsetDecoder lenientDecoder = utf8.newDecoder()
    .onMalformedInput(CodingErrorAction.REPLACE)
    .onUnmappableCharacter(CodingErrorAction.REPLACE);

// Strict: REPORT (throws exception on invalid sequence).
// REPORT is already the default for newDecoder(); setting it makes the intent explicit.
CharsetDecoder strictDecoder = utf8.newDecoder()
    .onMalformedInput(CodingErrorAction.REPORT)
    .onUnmappableCharacter(CodingErrorAction.REPORT);

// Using strict decoder with InputStreamReader:
try (Reader reader = new InputStreamReader(
        new FileInputStream("strict.txt"), strictDecoder)) {
    // An invalid UTF-8 sequence raises MalformedInputException (a CharacterCodingException);
    // lines() rethrows it wrapped in UncheckedIOException
    String content = new BufferedReader(reader).lines()
        .collect(Collectors.joining("\n"));
}

// ── NIO.2 alternatives for simple file operations ────────────────────
Path path = Path.of("data.txt");

// Read entire file as String (suitable for small-to-medium files):
String content = Files.readString(path, StandardCharsets.UTF_8);

// Write String to file:
Files.writeString(path, "Hello, World!\nSecond line\n",
    StandardCharsets.UTF_8,
    StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);

// Read all lines as List<String> (loads entire file into memory):
List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);

// Write lines (adds system line separator after each):
Files.write(path, List.of("Line 1", "Line 2", "Line 3"),
    StandardCharsets.UTF_8);

// Stream lines lazily (for large files):
try (Stream<String> stream = Files.lines(path, StandardCharsets.UTF_8)) {
    stream.filter(l -> l.startsWith("ERROR"))
          .forEach(System.err::println);
}

// ── Newline normalization on read ─────────────────────────────────────
// BufferedReader.readLine() strips ALL line terminators (\r\n, \n, \r)
// If you need to preserve original line endings, read with char[] not readLine():
try (Reader reader = new BufferedReader(new FileReader("mixed.txt", StandardCharsets.UTF_8))) {
    StringBuilder sb = new StringBuilder();
    char[] buf = new char[4096];
    int n;
    while ((n = reader.read(buf)) != -1) {
        sb.append(buf, 0, n);  // preserves all original \r and \n characters
    }
    String withOriginalLineEndings = sb.toString();
}
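
To see the difference between reporting and replacing invalid input without touching the file system, the following sketch decodes a byte array directly. It is an illustration under stated assumptions, not a complete validation utility: StrictDecodeDemo, isValidUtf8, and decodeLenient are hypothetical names, and the single byte 0xFF is used as the invalid input because it never occurs in well-formed UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: strict (REPORT) versus lenient (replace) UTF-8 decoding.
class StrictDecodeDemo {

    // Returns true if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // new String(bytes, charset) never throws; invalid sequences become U+FFFD.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] valid = "ok".getBytes(StandardCharsets.UTF_8);
        byte[] invalid = { 'o', (byte) 0xFF, 'k' };

        System.out.println(isValidUtf8(valid));                           // true
        System.out.println(isValidUtf8(invalid));                         // false
        System.out.println(decodeLenient(invalid).equals("o\uFFFDk"));    // true
    }
}
```

A strict check like this is one way to reject corrupt input at a system boundary instead of letting replacement characters spread into stored data.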
