Byte Streams

Byte streams are the fundamental I/O abstraction in Java for reading and writing raw binary data. InputStream and OutputStream are the abstract base classes for all byte-oriented I/O, and their concrete subclasses cover every byte-level data source and destination: files, byte arrays in memory, network sockets, pipes between threads, and process standard streams. The critical read() contract — returning an int from 0 to 255 for valid bytes and -1 for end-of-stream — is the foundation of all stream-based binary processing. Byte streams do not perform character encoding or decoding; every byte is passed through as-is, making them correct for binary formats (images, audio, archives, serialized data, protocol buffers), and incorrect for text unless the encoding is explicitly managed. This entry covers the complete InputStream and OutputStream APIs, every major concrete byte stream class and its use case, DataInputStream and DataOutputStream for structured binary I/O, the mark/reset mechanism, available() and its correct interpretation, skipping and transferTo, and ObjectInputStream and ObjectOutputStream for Java serialization.

InputStream — The Complete API and Read Contract

InputStream defines the contract for all byte input. Only the single-byte read() is abstract; every other method has a default implementation that concrete subclasses inherit or override. Understanding the exact semantics of each method is a prerequisite for correct byte stream usage.

read() is the atomic unit of byte input. It reads a single byte and returns it as an int in the range [0, 255]. The return type is int, not byte, specifically to accommodate the end-of-stream sentinel -1 without ambiguity: if the return type were byte, -1 would collide with the valid byte value 0xFF. The method blocks until a byte is available, end of stream is reached, or an IOException occurs. The default implementations of the other read methods are built on top of this one method, which each concrete subclass must implement.

read(byte[] b) reads bytes into an array and returns the number of bytes actually read, which may be less than b.length. This is the most important contract detail: partial reads are legal and common. A single call may return as few as one byte even when more data will arrive later, because the underlying I/O source may deliver data in chunks. End of stream is signaled by returning -1; a return value of 0 occurs only when b.length is 0. Code that assumes read(byte[] b) fills the entire array is one of the most prevalent bugs in Java I/O code. read(byte[] b, int off, int len) reads at most len bytes into b starting at offset off, returning the count actually read or -1.

readNBytes(byte[] b, int off, int len) (Java 9+) differs critically: it blocks until len bytes have been read, end of stream occurs, or an IOException is thrown, so a return value smaller than len means the stream ended early. The convenience overload readNBytes(int len) (Java 11+) returns a new array holding up to len bytes. For fixed-size binary protocols where a frame header or record must be read completely before processing, readNBytes is correct and a single read call is not. readAllBytes() (Java 9+) reads the entire remaining stream into a byte array. This is convenient but dangerous for large streams: it allocates memory proportional to the stream length, which may be hundreds of megabytes or unknown.

available() returns an estimate of the number of bytes that can be read without blocking. It does not return the total bytes remaining in the stream; it reports only how many bytes are immediately available in Java-level or kernel read buffers. For FileInputStream, available() typically returns the remaining bytes in the file, which happens to equal the total remaining bytes. For network streams, available() typically returns the bytes currently in the receive buffer, which may be far less than the total data to be received. Using available() to allocate a buffer for a complete network read is incorrect.

skip(long n) attempts to skip over and discard n bytes. Its return value is the number of bytes actually skipped, which may be less than n and may even be 0; a short skip does not by itself indicate end of stream. skipNBytes(long n) (Java 12+) skips exactly n bytes or throws EOFException. mark(int readlimit) records the current position and reset() returns to it, provided no more than readlimit bytes were read in between; markSupported() reports whether a stream implements this. transferTo(OutputStream out) (Java 9+) copies the remaining stream content to the provided OutputStream and returns the total bytes transferred. It replaces the common while-loop copy pattern.
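The partial-read contract is easiest to see with a stream that deliberately returns short counts. The sketch below is illustrative only: TrickleInputStream and readFully are hypothetical helpers written for this example, not JDK classes. TrickleInputStream hands out at most one byte per bulk read, much as a slow socket might, and readFully shows the loop needed to fill a buffer using plain read().

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class PartialReadDemo {

    /** Hypothetical test stream: returns at most one byte per bulk read. */
    static final class TrickleInputStream extends InputStream {
        private final InputStream in;

        TrickleInputStream(InputStream in) { this.in = in; }

        @Override public int read() throws IOException { return in.read(); }

        @Override public int read(byte[] b, int off, int len) throws IOException {
            // Legal under the InputStream contract: fewer bytes than requested.
            return in.read(b, off, Math.min(len, 1));
        }
    }

    /** Loops until len bytes have arrived; throws EOFException if the stream ends first. */
    static void readFully(InputStream in, byte[] b, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(b, off + total, len - total);
            if (n == -1) {
                throw new EOFException("Expected " + len + " bytes, got " + total);
            }
            total += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] source = {10, 20, 30, 40, 50, 60, 70, 80};

        // A single read(byte[]) call returns only what the stream chose to deliver.
        byte[] once = new byte[8];
        int n = new TrickleInputStream(new ByteArrayInputStream(source)).read(once);
        System.out.println("single read() returned " + n + " byte(s)");   // 1

        // Looping on the returned count fills the whole buffer.
        byte[] full = new byte[8];
        readFully(new TrickleInputStream(new ByteArrayInputStream(source)), full, 0, full.length);
        System.out.println("last byte after readFully = " + full[7]);     // 80
    }
}
```

On Java 9 and later, readNBytes(byte[], int, int) performs the same loop, so a hand-written helper like this is mainly useful on older runtimes or when an EOFException is preferred over a short count.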
Java
// ── read() return contract — the most important detail ───────────────
try (InputStream is = new FileInputStream("data.bin")) {
    int byteValue;
    while ((byteValue = is.read()) != -1) {   // int, not byte — crucial
        byte actualByte = (byte) byteValue;    // safe: value is always [0, 255]
        processByte(actualByte);
    }
}

// WRONG — misidentifies 0xFF as end-of-stream:
// byte b;
// while ((b = (byte) is.read()) != -1) { }   // 0xFF → (byte)-1 → stops too early!

// ── read(byte[]) — partial reads are LEGAL and COMMON ────────────────
byte[] buffer = new byte[1024];

// WRONG: assumes read() fills the entire buffer:
try (InputStream is = socketInputStream) {
    int n = is.read(buffer);
    processAll(buffer, 0, buffer.length);   // WRONG: only 'n' bytes were read, not 1024
}

// CORRECT: always use the actual count returned by read():
try (InputStream is = socketInputStream) {
    byte[] buf = new byte[4096];
    int bytesRead;
    while ((bytesRead = is.read(buf)) != -1) {
        process(buf, 0, bytesRead);   // process only the bytes that were actually read
    }
}

// ── readNBytes(): reads until len bytes, EOF, or an exception ────────
// readNBytes(byte[], int, int) is Java 9+; readNBytes(int) is Java 11+.
// For fixed-size binary protocol headers: always need exactly N bytes
try (InputStream is = new FileInputStream("protocol.bin")) {
    byte[] header = new byte[16];
    int actuallyRead = is.readNBytes(header, 0, 16);
    if (actuallyRead < 16) {
        throw new EOFException("Expected 16-byte header, got " + actuallyRead);
    }
    int version  = ((header[0] & 0xFF) << 8) | (header[1] & 0xFF);
    int bodyLen  = ((header[4] & 0xFF) << 24) | ((header[5] & 0xFF) << 16)
                 | ((header[6] & 0xFF) << 8)  |  (header[7] & 0xFF);

    byte[] body = is.readNBytes(bodyLen);   // convenience overload — returns new array
    process(version, body);
}

// ── readAllBytes() — convenience for small streams only (Java 9+) ────
try (InputStream is = new FileInputStream("small-config.json")) {
    byte[] allBytes = is.readAllBytes();   // safe for small files (<few MB)
    String json = new String(allBytes, StandardCharsets.UTF_8);
    parseJson(json);
}
// DANGEROUS for large or unknown-size streams:
// byte[] huge = largeFileInputStream.readAllBytes();   // may throw OutOfMemoryError

// ── available() — bytes immediately available, NOT total remaining ────
try (InputStream is = new FileInputStream("file.bin")) {
    System.out.println(is.available());   // = remaining file bytes (FileInputStream is special)
}

try (InputStream is = socket.getInputStream()) {
    System.out.println(is.available());   // = bytes in receive buffer ONLY, not total to come
    // WRONG: byte[] all = new byte[is.available()]; is.read(all);  — reads only current buffer
}

// ── skip() and transferTo() ───────────────────────────────────────────
try (InputStream is = new FileInputStream("data.bin")) {
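    long skipped = is.skip(1024);   // try to skip the first 1KB; may skip fewer bytes
    if (skipped < 1024) {
        // A short skip does not by itself prove end of stream: skip() may stop early
        // for other reasons. Call skip() again in a loop, or use skipNBytes (Java 12+).
        System.out.println("Could only skip " + skipped + " bytes");
    }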
}

// transferTo — copy entire stream to output (Java 9+):
try (InputStream  src  = new FileInputStream("source.bin");
     OutputStream dest = new FileOutputStream("dest.bin")) {
    long bytesCopied = src.transferTo(dest);
    System.out.println("Copied " + bytesCopied + " bytes");
}

// ── mark/reset — re-reading data from a stream ────────────────────────
try (InputStream is = new BufferedInputStream(new FileInputStream("data.bin"))) {
    if (is.markSupported()) {
        is.mark(100);            // mark current position; readlimit=100 bytes
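        byte[] probe = is.readNBytes(4);   // may return fewer than 4 bytes at end of stream
        // Check magic bytes to determine file type.
        // Java bytes are signed, so mask with 0xFF before comparing against 0x89:
        if (probe.length == 4 && (probe[0] & 0xFF) == 0x89 && probe[1] == 'P') {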
            is.reset();          // go back to marked position
            processPng(is);      // re-read from start
        } else {
            is.reset();
            processOther(is);
        }
    }
}
// FileInputStream.markSupported() = false
// BufferedInputStream.markSupported() = true
// ByteArrayInputStream.markSupported() = true
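
The mark/reset pattern can be packaged as a small peek helper. The following is a minimal sketch, assuming a hypothetical MagicSniffer class and a two-byte check chosen only for illustration. It also shows why a raw byte must be masked with 0xFF before it is compared with a constant above 0x7F.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class MagicSniffer {

    /** Peeks at the first two bytes without consuming them; the stream must support mark. */
    static boolean looksLikePng(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(2);                            // up to 2 bytes may be read before reset()
        byte[] probe = new byte[2];
        int n = in.readNBytes(probe, 0, 2);    // Java 9+; returns less than 2 at end of stream
        in.reset();                            // rewind so the caller sees the stream from the mark
        // (byte) 0x89 is -119 as a signed byte, so mask before comparing with 0x89.
        return n == 2 && (probe[0] & 0xFF) == 0x89 && probe[1] == 'P';
    }

    public static void main(String[] args) throws IOException {
        byte[] pngLike = {(byte) 0x89, 'P', 'N', 'G'};
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(pngLike));

        System.out.println(looksLikePng(in));     // true
        System.out.println(in.read() == 0x89);    // true: reset() rewound the stream
        System.out.println(pngLike[0] == 0x89);   // false: signed byte -119 vs int 137
    }
}
```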

OutputStream and Key Concrete Stream Classes

OutputStream defines the contract for all byte output through three core methods: write(int b) writes a single byte (only the low 8 bits of the int are written; the upper 24 bits are ignored), write(byte[] b) writes all bytes in the array, and write(byte[] b, int off, int len) writes len bytes from the array starting at offset off. Unlike InputStream.read(), OutputStream.write() does not return a partial count: it either writes all requested bytes or throws an IOException. The flush() method forces any buffered bytes to the underlying sink; for unbuffered streams it is a no-op. close() releases the underlying resource, and buffering wrappers such as BufferedOutputStream flush their contents before closing.

ByteArrayInputStream and ByteArrayOutputStream are in-memory streams backed by byte arrays. ByteArrayInputStream wraps an existing byte array for reading, which is useful for testing code that expects an InputStream or for parsing binary data already in memory. ByteArrayOutputStream grows dynamically as bytes are written; toByteArray() returns a copy of the accumulated bytes, and toString(charset) (Java 10+) converts them to a string with the specified encoding. ByteArrayOutputStream.writeTo(OutputStream) writes the accumulated bytes to another stream without creating an intermediate copy, which is more efficient than toByteArray() when chaining streams.

PipedInputStream and PipedOutputStream enable byte-level communication between threads. One thread writes to PipedOutputStream; another reads from the connected PipedInputStream. The pipe is backed by an internal circular buffer (default 1024 bytes). Reads block when the buffer is empty; writes block when the buffer is full. If the reading thread terminates while the writing thread is writing (or vice versa), a "Pipe broken" IOException is thrown. PipedInputStream/PipedOutputStream are rarely the best tool for inter-thread communication in modern Java (BlockingQueue or channels are generally preferable), but they remain correct for scenarios requiring stream-based inter-thread data passing.

DataInputStream and DataOutputStream wrap any InputStream/OutputStream and add read/write methods for every Java primitive type in a defined binary format: readInt()/writeInt() read and write a 32-bit big-endian signed integer, readDouble()/writeDouble() read and write a 64-bit IEEE 754 double, and readUTF()/writeUTF() read and write a length-prefixed modified-UTF-8 string. The primitive encodings are fixed and platform-independent, so other languages can read them as ordinary big-endian values, and the format is compact and fast for Java-to-Java binary protocols and file formats. The string format is the Java-specific part. The modified UTF-8 used by readUTF/writeUTF is not standard UTF-8: the null character U+0000 is encoded as the two bytes 0xC0 0x80 rather than a single zero byte, and supplementary characters (above U+FFFF) are written as two 3-byte surrogate encodings rather than one 4-byte sequence. The 2-byte length prefix also limits a single writeUTF string to 65,535 encoded bytes.

SequenceInputStream concatenates multiple InputStreams into a single sequential stream. Reading from it transparently moves from one stream to the next as each is exhausted. It is useful for logically concatenating files or byte arrays without copying them into a single buffer.
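To make the wire format concrete, the sketch below dumps the bytes that DataOutputStream produces and compares writeUTF with standard UTF-8. The DataFormatDemo class and its helper names are assumptions made for this illustration; only the JDK calls themselves come from the text above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class DataFormatDemo {

    /** Returns the bytes DataOutputStream.writeInt produces for v. */
    static byte[] intBytes(int v) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(baos)) {
            dos.writeInt(v);
        }
        return baos.toByteArray();
    }

    /** Returns the bytes DataOutputStream.writeUTF produces for s, length prefix included. */
    static byte[] utfBytes(String s) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(baos)) {
            dos.writeUTF(s);
        }
        return baos.toByteArray();
    }

    /** Formats bytes as space-separated uppercase hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // Big-endian: most significant byte first.
        System.out.println(hex(intBytes(0x01020304)));                          // 01 02 03 04

        // Modified UTF-8: U+0000 becomes C0 80 after the 2-byte length prefix.
        System.out.println(hex(utfBytes("\u0000")));                            // 00 02 C0 80
        System.out.println(hex("\u0000".getBytes(StandardCharsets.UTF_8)));     // 00

        // A supplementary character (U+1F600) is written as two 3-byte surrogates.
        String emoji = "\uD83D\uDE00";
        System.out.println(utfBytes(emoji).length - 2);                         // 6
        System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length);      // 4
    }
}
```

A reader in another language can therefore decode the integers directly, but it needs a modified UTF-8 decoder (or a different string encoding agreed on by both sides) for anything written with writeUTF.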
Java
// ── ByteArrayInputStream — in-memory byte reading ───────────────────
byte[] data = {72, 101, 108, 108, 111};   // "Hello" in ASCII
try (InputStream is = new ByteArrayInputStream(data)) {
    byte[] read = is.readAllBytes();
    System.out.println(new String(read, StandardCharsets.US_ASCII));  // Hello
}

// Test utility: create InputStream from a string
InputStream fromString(String s) {
    return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
}

// ── ByteArrayOutputStream — accumulate bytes then extract ─────────────
ByteArrayOutputStream baos = new ByteArrayOutputStream(256);  // initial capacity hint
try (DataOutputStream dos = new DataOutputStream(baos)) {
    dos.writeInt(42);
    dos.writeDouble(3.14159);
    dos.writeUTF("hello");
}   // dos.close() flushes to baos; baos itself doesn't need closing

byte[] serialized = baos.toByteArray();   // copy of accumulated bytes
System.out.println("Serialized " + serialized.length + " bytes");

// writeTo — no extra copy:
try (FileOutputStream fos = new FileOutputStream("serialized.bin")) {
    baos.writeTo(fos);   // writes bytes directly from baos buffer to file
}

// Convert to String:
ByteArrayOutputStream textBaos = new ByteArrayOutputStream();
textBaos.write("Hello, World!".getBytes(StandardCharsets.UTF_8));
String s = textBaos.toString(StandardCharsets.UTF_8);   // Java 10+
System.out.println(s);

// ── PipedInputStream/PipedOutputStream — inter-thread piping ─────────
PipedOutputStream pipeOut = new PipedOutputStream();
PipedInputStream  pipeIn  = new PipedInputStream(pipeOut, 8192);  // 8KB buffer

Thread producer = new Thread(() -> {
    try (DataOutputStream dos = new DataOutputStream(pipeOut)) {
        for (int i = 0; i < 100; i++) {
            dos.writeInt(i);
        }
    } catch (IOException e) { e.printStackTrace(); }
}, "pipe-producer");

Thread consumer = new Thread(() -> {
    try (DataInputStream dis = new DataInputStream(pipeIn)) {
        for (int i = 0; i < 100; i++) {
            int value = dis.readInt();
            System.out.println("Received: " + value);
        }
    } catch (IOException e) { e.printStackTrace(); }
}, "pipe-consumer");

producer.start();
consumer.start();
producer.join();
consumer.join();

// ── DataInputStream/DataOutputStream — structured binary I/O ─────────
// Write a binary record:
try (DataOutputStream dos = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream("record.bin")))) {
    dos.writeInt(1);              // 4 bytes: record ID
    dos.writeLong(System.currentTimeMillis());  // 8 bytes: timestamp
    dos.writeDouble(98.6);        // 8 bytes: temperature
    dos.writeBoolean(true);       // 1 byte: flag
    dos.writeUTF("John Doe");     // 2-byte length prefix + modified UTF-8 bytes
}

// Read the same binary record:
try (DataInputStream dis = new DataInputStream(
        new BufferedInputStream(new FileInputStream("record.bin")))) {
    int id          = dis.readInt();
    long timestamp  = dis.readLong();
    double temp     = dis.readDouble();
    boolean flag    = dis.readBoolean();
    String name     = dis.readUTF();
    System.out.printf("id=%d ts=%d temp=%.1f flag=%b name=%s%n",
        id, timestamp, temp, flag, name);
}

// ── SequenceInputStream — concatenate multiple streams ────────────────
// Pass an explicit charset so the result does not depend on the platform default.
InputStream part1 = new ByteArrayInputStream("Hello ".getBytes(StandardCharsets.UTF_8));
InputStream part2 = new ByteArrayInputStream("World".getBytes(StandardCharsets.UTF_8));
InputStream part3 = new ByteArrayInputStream("!".getBytes(StandardCharsets.UTF_8));

try (SequenceInputStream seq = new SequenceInputStream(
        new SequenceInputStream(part1, part2), part3)) {
    System.out.println(new String(seq.readAllBytes(), StandardCharsets.UTF_8));  // Hello World!
}

// With Enumeration (for many streams):
Vector<InputStream> streams = new Vector<>();
for (int i = 0; i < 10; i++) {
    streams.add(new ByteArrayInputStream(("Part" + i + " ").getBytes(StandardCharsets.UTF_8)));
}
try (SequenceInputStream seq2 = new SequenceInputStream(streams.elements())) {
    System.out.println(new String(seq2.readAllBytes(), StandardCharsets.UTF_8));
}

ObjectInputStream, ObjectOutputStream, and Serialization

ObjectOutputStream and ObjectInputStream support Java object serialization: converting an object graph into a byte stream (serialization) and reconstructing it (deserialization). writeObject(Object obj) serializes an object and all objects reachable from it through non-transient, non-static fields, producing a byte sequence that encodes the object's class, all its field values, and the structure of the object graph. readObject() reconstructs the object graph from the byte stream, returning an Object that must be cast to the expected type.

For a class to be serializable, it must implement java.io.Serializable (a marker interface with no methods). All non-transient, non-static fields must also be serializable. If a field references a non-serializable object, writeObject() throws NotSerializableException. The transient keyword marks a field as excluded from serialization: its value is not written, and it is initialized to its default value (null, 0, false) on deserialization. This is used for fields that cannot be serialized (database connections, thread references, file handles) or that should not be persisted (cached computed values, sensitive data like passwords).

The serialVersionUID is a long field that must match between the serializing and deserializing classes for deserialization to succeed. If a class does not declare serialVersionUID, the JVM computes one based on the class structure; any structural change (adding a field, changing a method signature) changes the computed UID and causes deserialization of old data to fail with InvalidClassException. Always declare serialVersionUID explicitly: private static final long serialVersionUID = 1L; is the minimum declaration. Changing serialVersionUID signals an incompatible change; keeping it fixed and handling schema evolution through custom readObject/writeObject is the standard approach for version management.
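The transient rules above can be checked with an in-memory round trip. This is a minimal sketch, assuming a hypothetical Session class invented for the example: one transient field is left at its default after deserialization, and another is recomputed in a private readObject hook.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class TransientDemo {

    /** Hypothetical serializable type used only for this illustration. */
    static final class Session implements Serializable {
        private static final long serialVersionUID = 1L;

        final String user;
        transient String token;         // never written to the stream
        transient String displayName;   // derived value, recomputed on read

        Session(String user, String token) {
            this.user = user;
            this.token = token;
            this.displayName = "user:" + user;
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();              // restores the non-transient field 'user'
            this.displayName = "user:" + user;   // rebuild the derived transient field
        }
    }

    /** Serializes s to a byte array and reads it back as a new object. */
    static Session roundTrip(Session s) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(s);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(baos.toByteArray()))) {
            return (Session) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Session copy = roundTrip(new Session("alice", "secret-token"));
        System.out.println(copy.user);          // alice
        System.out.println(copy.token);         // null: transient fields get default values
        System.out.println(copy.displayName);   // user:alice
    }
}
```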
ObjectOutputStream's writeObject implements reference sharing: if the same object is written multiple times to the same stream, subsequent writes store a reference to the first occurrence rather than duplicating the object. This preserves the object graph structure (shared references remain shared after deserialization) and avoids infinite loops in graphs with cycles. The downside is that ObjectOutputStream maintains a reference table of all written objects, preventing garbage collection of those objects for the lifetime of the stream. For long-lived streams writing many objects, call reset() periodically to clear the reference table.

Java serialization has well-documented security vulnerabilities: deserializing untrusted data can execute arbitrary code (gadget chains). This has made Java serialization a major attack vector in enterprise Java applications. Modern Java applications should not use Java serialization for data that crosses trust boundaries. Alternatives include JSON (Jackson, Gson), Protocol Buffers, Apache Avro, MessagePack, or XML, all of which provide safer, language-neutral serialization with better version management.
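Reference sharing has one consequence that is easy to miss: if an object is modified between two writes to the same stream, the second write is only a back-reference, so the reader never sees the new state. The sketch below illustrates this with a hypothetical Counter class and shows that calling reset() between the writes causes the object to be written again in full.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SharedReferenceDemo {

    /** Hypothetical mutable type used only for this illustration. */
    static final class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    /** Writes the same Counter twice, mutating it in between; returns the two values read back. */
    static int[] roundTrip(boolean resetBetweenWrites) throws IOException, ClassNotFoundException {
        Counter c = new Counter();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            c.value = 1;
            oos.writeObject(c);      // full object written
            c.value = 2;
            if (resetBetweenWrites) {
                oos.reset();         // forget previously written objects
            }
            oos.writeObject(c);      // back-reference unless reset() was called
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(baos.toByteArray()))) {
            Counter first = (Counter) ois.readObject();
            Counter second = (Counter) ois.readObject();
            return new int[] {first.value, second.value};
        }
    }

    public static void main(String[] args) throws Exception {
        int[] shared = roundTrip(false);
        System.out.println(shared[0] + ", " + shared[1]);   // 1, 1 (update lost)

        int[] fresh = roundTrip(true);
        System.out.println(fresh[0] + ", " + fresh[1]);     // 1, 2
    }
}
```

So reset() matters for correctness as well as memory whenever a stream carries successive snapshots of mutable objects.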
Java
// ── Basic serialization ──────────────────────────────────────────────
import java.io.Serializable;

class Employee implements Serializable {
    private static final long serialVersionUID = 1L;   // ALWAYS declare this

    private final int id;
    private final String name;
    private double salary;
    private transient String cachedDisplayName;   // excluded from serialization
    private transient Connection dbConnection;    // non-serializable — must be transient

    Employee(int id, String name, double salary) {
        this.id = id;
        this.name = name;
        this.salary = salary;
    }

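    // Custom readObject hook: restore the serialized fields, then recompute transient ones.
    // (readResolve is intended for substituting the deserialized object, not for re-initialization.)
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();                             // restores id, name, salary
        this.cachedDisplayName = name + " (" + id + ")";   // recompute transient
    }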

    @Override public String toString() {
        return "Employee{id=" + id + ", name='" + name + "', salary=" + salary + "}";
    }
}

// Write:
List<Employee> employees = List.of(
    new Employee(1, "Alice", 95000.0),
    new Employee(2, "Bob",   87000.0)
);

try (ObjectOutputStream oos = new ObjectOutputStream(
        new BufferedOutputStream(new FileOutputStream("employees.ser")))) {
    oos.writeObject(employees);
    System.out.println("Serialized " + employees.size() + " employees");
}

// Read:
try (ObjectInputStream ois = new ObjectInputStream(
        new BufferedInputStream(new FileInputStream("employees.ser")))) {
    @SuppressWarnings("unchecked")
    List<Employee> restored = (List<Employee>) ois.readObject();
    restored.forEach(System.out::println);
}

// ── Reference sharing — object graph preservation ─────────────────────
Employee sharedManager = new Employee(99, "Carol", 120000.0);
Employee worker1 = new Employee(1, "Dave", 80000.0);
Employee worker2 = new Employee(2, "Eve",  82000.0);

ByteArrayOutputStream baos = new ByteArrayOutputStream();
try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
    oos.writeObject(sharedManager);   // write #1: full object
    oos.writeObject(sharedManager);   // write #2: reference to #1 (not a duplicate)
    oos.writeObject(worker1);
    oos.writeObject(worker2);
}

try (ObjectInputStream ois = new ObjectInputStream(
        new ByteArrayInputStream(baos.toByteArray()))) {
    Employee m1 = (Employee) ois.readObject();
    Employee m2 = (Employee) ois.readObject();   // same object as m1
    System.out.println("Same object: " + (m1 == m2));  // true — reference preserved
}

// ── reset() — prevent memory leak for long-lived streams ──────────────
try (ObjectOutputStream oos = new ObjectOutputStream(
        new BufferedOutputStream(new FileOutputStream("stream.ser")))) {
    for (int i = 0; i < 100_000; i++) {
        oos.writeObject(new Employee(i, "Worker-" + i, 50000.0));
        if (i % 1000 == 0) {
            oos.reset();   // clear reference table every 1000 objects
            // Without reset(): all 100,000 objects held in memory until stream closes
        }
    }
}

// ── Custom serialization — writeObject/readObject ─────────────────────
class SecureEmployee implements Serializable {
    private static final long serialVersionUID = 2L;
    private final String name;
    private transient byte[] encryptedSalary;   // store encrypted
    private transient double salary;             // not directly serialized

    SecureEmployee(String name, double salary) {
        this.name = name;
        this.salary = salary;
        this.encryptedSalary = encrypt(salary);
    }

    private void writeObject(ObjectOutputStream oos) throws IOException {
        oos.defaultWriteObject();         // write non-transient fields (name)
        oos.writeObject(encryptedSalary); // write encrypted salary bytes
    }

    private void readObject(ObjectInputStream ois) throws IOException, ClassNotFoundException {
        ois.defaultReadObject();                          // restore name
        this.encryptedSalary = (byte[]) ois.readObject(); // read encrypted bytes
        this.salary = decrypt(encryptedSalary);           // restore transient salary
    }

    private byte[] encrypt(double v) { /* ... */ }
    private double decrypt(byte[] b) { /* ... */ }
}
