Deserialization

Deserialization is the process of reconstructing a Java object from a byte stream previously produced by serialization. It is performed by ObjectInputStream.readObject(), which reads the class descriptor from the stream, loads the corresponding class, allocates a new instance without running any constructor of a Serializable class, and populates the fields from the stream data. Because the constructor is bypassed, the invariants it establishes are not automatically re-enforced. This constructor bypass is the source of both the power and the danger of Java deserialization: it can reconstruct complex object graphs in one call, but it can also produce objects in states that the constructor would have rejected. This entry covers the deserialization process step by step, the constructor bypass and its security implications, checking class versions with serialVersionUID, handling missing and extra fields during version evolution, the readObject and readResolve hooks, ObjectInputStream configuration (class loader, filter), and safe deserialization practices.

The Deserialization Process — Constructor Bypass and Field Population

ObjectInputStream.readObject() reconstructs an object through a sequence of steps that deliberately bypass the normal construction path. First, it reads the class descriptor from the stream: the class name, the serialVersionUID, and the names and types of the serialized fields. It resolves the name to a Class object through resolveClass(), which by default uses the nearest user-defined class loader on the call stack (in a simple application, the application class loader). If the class cannot be found, ClassNotFoundException is thrown. Second, it compares the stream's serialVersionUID with the local class's serialVersionUID (the declared value, or one computed from the class structure if none is declared). If they differ, InvalidClassException is thrown. Third, it allocates a new instance without running any constructor of a Serializable class. In OpenJDK this is done through a special serialization constructor obtained from the internal ReflectionFactory; the exact mechanism is an implementation detail. Fourth, it populates the instance's fields directly from the stream data, matching fields by name between the stream's class descriptor and the current class definition. A field whose type has changed incompatibly causes InvalidClassException.

The constructor bypass means that the deserialized object starts its life in a state that was not validated by any constructor. Fields declared final can still be set by the deserialization mechanism (this is one of the few ways to set final fields outside a constructor). Invariants that the constructor enforces, such as range checks, null checks, and consistency between fields, are not checked unless a readObject() method explicitly re-checks them. An attacker who controls the byte stream can craft an object in any field configuration, including configurations that the constructor would reject.

The field-matching process handles version mismatches gracefully. If the stream contains a field that the current class does not have, the field is ignored during deserialization. If the current class has a field that the stream does not contain, the field is set to its default value (null for references, 0 for numeric primitives, false for boolean). This is the basis for compatible version evolution: adding fields (with the same serialVersionUID) is safe because old serialized data leaves the new fields at their defaults.

Deserialization of the superclass chain follows the class hierarchy. For each Serializable superclass, the corresponding fields are read from the stream. For the first non-Serializable superclass, the no-argument constructor is called, so that part of the object is initialized normally; if that constructor is missing or not accessible, deserialization fails with InvalidClassException. Any initialization in non-Serializable superclass constructors therefore does execute; only the constructors of Serializable classes are bypassed.
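The superclass rule described above can be observed directly. The following is a minimal sketch, assuming two hypothetical classes (Base, which is not Serializable, and Child, which is) and an in-memory round trip; the class names and counters are illustrative only and are not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class ConstructorBypassDemo {

    // Counters recording which constructors actually ran.
    static int baseConstructorCalls;
    static int childConstructorCalls;

    // Not Serializable: its no-arg constructor runs on every deserialization,
    // and its fields are never written to the stream.
    static class Base {
        String origin;

        Base() {
            baseConstructorCalls++;
            origin = "set by Base()";
        }
    }

    // Serializable: its constructor is skipped on deserialization.
    static class Child extends Base implements Serializable {
        private static final long serialVersionUID = 1L;
        final int value;

        Child(int value) {
            childConstructorCalls++;
            this.value = value;
        }
    }

    // Serializes the object and immediately deserializes it, all in memory.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(obj);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Child original = new Child(42);            // Base() and Child(int) both run
        Child copy = (Child) roundTrip(original);  // only Base() runs again

        System.out.println("Base() calls:     " + baseConstructorCalls);
        System.out.println("Child(int) calls: " + childConstructorCalls);
        System.out.println("copy.value  = " + copy.value);
        System.out.println("copy.origin = " + copy.origin);
    }
}
```

After the round trip, the Base counter has advanced twice and the Child counter only once, while the final field value in the copy still carries the value from the stream.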

Java

// ── Basic deserialization ─────────────────────────────────────────────
try (ObjectInputStream ois = new ObjectInputStream(
        new BufferedInputStream(new FileInputStream("person.ser")))) {
    Person person = (Person) ois.readObject();
    System.out.println("Deserialized: " + person);
    // Constructor was NOT called — object allocated and populated directly
}

// ── Reading multiple objects from the same stream ──────────────────────
try (ObjectInputStream ois = new ObjectInputStream(
        new BufferedInputStream(new FileInputStream("people.ser")))) {
    try {
        while (true) {
            Person p = (Person) ois.readObject();
            System.out.println("Read: " + p);
        }
    } catch (EOFException e) {
        System.out.println("All objects read");
    }
}

// ── Constructor bypass: invariants not re-enforced ─────────────────────
public class SafePeriod implements Serializable {
    private static final long serialVersionUID = 1L;
    private final Date start;
    private final Date end;

    public SafePeriod(Date start, Date end) {
        if (start.after(end)) throw new IllegalArgumentException("start > end");
        this.start = new Date(start.getTime());
        this.end   = new Date(end.getTime());
    }

    // Without a readObject method, an attacker can craft a stream in which
    // start > end: the constructor check never runs during deserialization.
}

// Hand-crafting such a stream is fiddly but entirely feasible. The takeaway:
// security-sensitive classes must validate their invariants in readObject.

// ── readObject: re-enforce invariants after deserialization ───────────
public class ValidatedPeriod implements Serializable {
    private static final long serialVersionUID = 1L;
    private final Date start;
    private final Date end;

    public ValidatedPeriod(Date start, Date end) {
        if (start.after(end)) throw new IllegalArgumentException("start > end");
        this.start = new Date(start.getTime());
        this.end   = new Date(end.getTime());
    }

    private void readObject(ObjectInputStream ois)
            throws IOException, ClassNotFoundException {
        ois.defaultReadObject();   // populate fields from stream

        // Re-enforce constructor invariants, as if the constructor had run
        // (a forged stream can also leave either field null):
        if (start == null || end == null || start.after(end)) {
            throw new InvalidObjectException("start and end must be non-null, start <= end");
        }

        // Defensive copies should be re-applied here too, because a forged
        // stream can keep its own reference to the Date objects it supplies.
        // Final fields cannot be reassigned here without reflection tricks,
        // so use non-final fields or the serialization proxy pattern instead.
    }
}

// ── Field matching during version evolution ───────────────────────────
// Version 1 of Person: {name, age, email}
// Version 2 of Person: {name, age, email, phone}  (added field)

// Deserializing v1 data into v2 class:
// - name, age, email: populated from stream
// - phone: set to null (not in stream — gets default for reference type)
// No error — compatible evolution with same serialVersionUID

// Deserializing v2 data into v1 class (with same serialVersionUID):
// - name, age, email: populated from stream
// - phone: in stream but not in class — IGNORED
// No error — extra fields are silently skipped
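The ValidatedPeriod comments point to the serialization proxy pattern as the way around the final-field problem. The following is a minimal sketch of that pattern, not a definitive implementation: the names Period, PeriodProxy, roundTrip, and isRejected are hypothetical, and the proxy is package-private (it would normally be a private nested class) only so that the example can simulate a forged stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.Date;

final class SerializationProxyDemo {

    static final class Period implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Date start;
        private final Date end;

        Period(Date start, Date end) {
            // Copy first, then validate the copies.
            this.start = new Date(start.getTime());
            this.end = new Date(end.getTime());
            if (this.start.after(this.end)) {
                throw new IllegalArgumentException("start > end");
            }
        }

        Date start() { return new Date(start.getTime()); }
        Date end()   { return new Date(end.getTime()); }

        // Write the proxy to the stream in place of this object.
        private Object writeReplace() {
            return new PeriodProxy(start.getTime(), end.getTime());
        }

        // A stream that claims to contain a Period directly is rejected.
        private void readObject(ObjectInputStream in) throws InvalidObjectException {
            throw new InvalidObjectException("serialization proxy required");
        }
    }

    static final class PeriodProxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final long startMillis;
        private final long endMillis;

        PeriodProxy(long startMillis, long endMillis) {
            this.startMillis = startMillis;
            this.endMillis = endMillis;
        }

        // Rebuild the real object through its constructor, so validation
        // and defensive copying happen again on the deserializing side.
        private Object readResolve() throws ObjectStreamException {
            try {
                return new Period(new Date(startMillis), new Date(endMillis));
            } catch (IllegalArgumentException e) {
                InvalidObjectException invalid = new InvalidObjectException(e.getMessage());
                invalid.initCause(e);
                throw invalid;
            }
        }
    }

    // Serializes the object and immediately deserializes it, all in memory.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(obj);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if deserialization fails with InvalidObjectException.
    static boolean isRejected(Object obj) {
        try {
            roundTrip(obj);
            return false;
        } catch (IllegalStateException e) {
            return e.getCause() instanceof InvalidObjectException;
        }
    }

    public static void main(String[] args) {
        Period p = (Period) roundTrip(new Period(new Date(1_000L), new Date(2_000L)));
        System.out.println(p.start().getTime() + " .. " + p.end().getTime());
        // A proxy with start > end stands in for a forged stream.
        System.out.println("forged proxy rejected: " + isRejected(new PeriodProxy(2_000L, 1_000L)));
    }
}
```

Because every Period that comes out of a stream is built by the constructor, the fields can stay final and the invariant check lives in exactly one place.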

readObject, readResolve, and ObjectInputStream Configuration

The readObject hook and the readResolve hook serve different purposes. readObject (private void readObject(ObjectInputStream ois)) runs during deserialization to customize field population. It is the mirror of writeObject. It should first call ois.defaultReadObject() (or ois.readFields()) to populate the standard fields, then read any extra data that writeObject() wrote, in exactly the same order. readObject should also re-enforce invariants and re-derive transient fields that were not serialized.

readResolve (Object readResolve(), usually declared private) runs after readObject (or after default deserialization) and allows the deserialized object to replace itself with a different object. Its return value is what the caller of ObjectInputStream.readObject() receives. This is used for the singleton pattern (return the singleton instance rather than the newly deserialized instance), for enum-like types (return the canonical constant), and for the serialization proxy pattern (the proxy's readResolve returns the final target object).

ObjectInputStream configuration: by default, resolveClass() loads classes with the nearest user-defined class loader on the call stack, which is typically the loader of the code that called readObject(). It does not consult the thread's context class loader. In application servers and OSGi containers where multiple class loaders are active, this may not find the correct class. ObjectInputStream can be subclassed to override resolveClass(ObjectStreamClass desc) and supply the correct ClassLoader for the environment.

ObjectInputFilter (Java 9+) provides a hook for accepting or rejecting classes during deserialization. The filter is invoked for each class descriptor, each array (with its length), and each back reference read from the stream, and every call also reports the current graph depth, the number of references, and the bytes consumed so far. A filter returns ALLOWED, REJECTED, or UNDECIDED; a REJECTED result makes readObject() throw InvalidClassException. Filters can be set per stream (ois.setObjectInputFilter()) or globally (via ObjectInputFilter.Config.setSerialFilter(), which can be called only once, or the jdk.serialFilter system property). Filters should whitelist only known-safe classes rather than blacklisting known-bad ones: the whitelist approach is robust against unknown gadget chains.
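The effect of readResolve on object identity can be checked with a small round trip. This is a sketch under stated assumptions: PlainSingleton and ResolvedSingleton are hypothetical classes that differ only in whether they declare readResolve.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class ReadResolveDemo {

    // No readResolve: every deserialization produces a second instance.
    static final class PlainSingleton implements Serializable {
        private static final long serialVersionUID = 1L;
        static final PlainSingleton INSTANCE = new PlainSingleton();
        private PlainSingleton() {}
    }

    // With readResolve: the freshly deserialized instance is discarded
    // and the canonical one is returned to the caller instead.
    static final class ResolvedSingleton implements Serializable {
        private static final long serialVersionUID = 1L;
        static final ResolvedSingleton INSTANCE = new ResolvedSingleton();
        private ResolvedSingleton() {}

        private Object readResolve() {
            return INSTANCE;
        }
    }

    // Serializes the object and immediately deserializes it, all in memory.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(obj);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("plain is same instance:    "
                + (roundTrip(PlainSingleton.INSTANCE) == PlainSingleton.INSTANCE));
        System.out.println("resolved is same instance: "
                + (roundTrip(ResolvedSingleton.INSTANCE) == ResolvedSingleton.INSTANCE));
    }
}
```

Only the class with readResolve keeps the == guarantee across a round trip. Note that the private constructors do not prevent deserialization, since constructors of Serializable classes are never invoked.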

Java

// ── readObject: custom deserialization with extra data ────────────────
public class VersionedData implements Serializable {
    private static final long serialVersionUID = 1L;

    private String  name;
    private int     value;
    private transient String derived;    // not serialized
    private transient long   timestamp;  // not serialized in v1, serialized in v2

    private void writeObject(ObjectOutputStream oos) throws IOException {
        oos.defaultWriteObject();   // write name, value
        oos.writeLong(System.currentTimeMillis());  // write extra timestamp (v2 addition)
    }

    private void readObject(ObjectInputStream ois)
            throws IOException, ClassNotFoundException {
        ois.defaultReadObject();    // read name, value

        // Read extra data written by writeObject (if available):
        // ObjectInputStream.available() is unreliable — use try/catch for version compat:
        try {
            this.timestamp = ois.readLong();   // v2 data
        } catch (EOFException e) {
            this.timestamp = 0L;   // v1 data had no timestamp — use default
        }

        // Re-enforce invariants before using the fields:
        if (name == null) throw new InvalidObjectException("name cannot be null");
        if (value < 0) throw new InvalidObjectException("value cannot be negative");

        // Re-derive transient computed fields:
        this.derived = name.toUpperCase() + "_" + value;
    }
}

// ── readResolve: singleton and enum-like patterns ─────────────────────
public final class Weekday implements Serializable {
    private static final long serialVersionUID = 1L;

    public static final Weekday MONDAY    = new Weekday("MONDAY");
    public static final Weekday TUESDAY   = new Weekday("TUESDAY");
    public static final Weekday WEDNESDAY = new Weekday("WEDNESDAY");
    // ... etc.

    private final String name;
    private Weekday(String name) { this.name = name; }

    // readResolve: return the canonical constant instead of the new instance.
    // InvalidObjectException is checked, so it must appear in the throws clause.
    private Object readResolve() throws ObjectStreamException {
        return switch (name) {
            case "MONDAY"    -> MONDAY;
            case "TUESDAY"   -> TUESDAY;
            case "WEDNESDAY" -> WEDNESDAY;
            default -> throw new InvalidObjectException("Unknown weekday: " + name);
        };
    }
}

// ── Custom class loader via ObjectInputStream subclass ────────────────
public class ContextClassLoaderOIS extends ObjectInputStream {
    private final ClassLoader classLoader;

    public ContextClassLoaderOIS(InputStream in, ClassLoader cl)
            throws IOException {
        super(in);
        this.classLoader = cl;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            return Class.forName(desc.getName(), false, classLoader);
        } catch (ClassNotFoundException e) {
            return super.resolveClass(desc);  // fall back to default resolution
        }
    }
}

// ── ObjectInputFilter: whitelist-based security ───────────────────────
// Per-stream filter:
try (ObjectInputStream ois = new ObjectInputStream(
        new BufferedInputStream(new FileInputStream("data.ser")))) {

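    ois.setObjectInputFilter(filterInfo -> {
        Class<?> clazz = filterInfo.serialClass();
        if (clazz == null) return ObjectInputFilter.Status.UNDECIDED;

        // Arrays are reported with their array type. ArrayList, for example,
        // has Object[] checked while it reads its elements, so judge the
        // element type and let Object[] and primitive arrays pass through:
        while (clazz.isArray()) clazz = clazz.getComponentType();
        if (clazz.isPrimitive() || clazz == Object.class) {
            return ObjectInputFilter.Status.UNDECIDED;
        }

        // Whitelist: accept only these classes
        if (clazz == Person.class     ||
            clazz == Department.class  ||
            clazz == java.util.ArrayList.class ||
            clazz == java.lang.String.class) {
            return ObjectInputFilter.Status.ALLOWED;
        }
        // Reject everything else:
        System.err.println("Rejected class: " + clazz.getName());
        return ObjectInputFilter.Status.REJECTED;
    });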

    Object obj = ois.readObject();  // filter applied to every class in graph
}

// Global JVM filter (set once at startup):
// System property: -Djdk.serialFilter=com.example.*;java.util.*;java.lang.String;!*
// Programmatic:
ObjectInputFilter.Config.setSerialFilter(
    ObjectInputFilter.Config.createFilter("com.example.*;java.util.*;java.lang.String;!*")
);
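To see a pattern-based filter accept and reject input, the sketch below applies ObjectInputFilter.Config.createFilter() to in-memory streams. The classes Allowed and Forbidden and the helper acceptedBy are hypothetical names introduced for illustration, and the maxarray limit of 4 is an arbitrary value chosen for the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class FilterDemo {

    static final class Allowed implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        Allowed(int id) { this.id = id; }
    }

    static final class Forbidden implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        Forbidden(int id) { this.id = id; }
    }

    // Returns true if the object deserializes under the given filter
    // pattern, false if the filter rejects it.
    static boolean acceptedBy(String pattern, Object graph) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(graph);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                // The filter must be installed before the first readObject().
                ois.setObjectInputFilter(ObjectInputFilter.Config.createFilter(pattern));
                ois.readObject();
                return true;
            }
        } catch (InvalidClassException e) {
            return false;   // thrown when the filter returns REJECTED
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Allow exactly one class, reject every other class.
        String allowList = Allowed.class.getName() + ";!*";
        System.out.println("Allowed accepted:   " + acceptedBy(allowList, new Allowed(1)));
        System.out.println("Forbidden accepted: " + acceptedBy(allowList, new Forbidden(1)));

        // Resource limit: arrays longer than 4 elements are rejected.
        System.out.println("int[3] accepted:    " + acceptedBy("maxarray=4", new int[3]));
        System.out.println("int[10] accepted:   " + acceptedBy("maxarray=4", new int[10]));
    }
}
```

Building the pattern from Class.getName() keeps the sketch independent of the package it is compiled in; a real deployment would more likely list fully qualified names or package prefixes in configuration.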
