
Serialization

Java serialization is the mechanism for converting an object graph into a sequence of bytes that can be stored to a file, transmitted over a network, or persisted in a database. An object is serializable if its class implements the java.io.Serializable marker interface, which has no methods — it serves only as a type token that grants the JVM permission to serialize instances of the class. The serialization process is performed by ObjectOutputStream.writeObject(), which traverses the object graph recursively, encoding each object's class descriptor and field values into a binary stream in a platform-independent format. The format captures the full object graph — if two references point to the same object, it is written once and both references are restored correctly on deserialization. This entry covers the Serializable marker interface and what it commits to, the binary format structure and version negotiation via serialVersionUID, how the serialization engine traverses object graphs and handles cycles and shared references, the customization hooks writeObject and readObject, the serialization proxy pattern for robust versioning, security implications of deserialization, and when to use Java serialization versus alternatives.

The Serializable Contract and serialVersionUID

Implementing Serializable is a declaration of intent: the class commits to a persistent external representation of its state. This commitment has implications that extend beyond the class itself. The serialized form becomes part of the class's public API: changing field names, types, or the class hierarchy can break deserialization of previously serialized data. Changing a serializable class without considering its serialized form is therefore a compatibility break for anyone holding old data. The serialVersionUID is a 64-bit long that acts as a version fingerprint for the class. It appears in the serialized stream alongside the class name. On deserialization, the runtime compares the serialVersionUID in the stream with the serialVersionUID of the locally loaded class. If they differ, InvalidClassException is thrown, indicating an incompatible class version. If serialVersionUID is not explicitly declared, the runtime computes one by hashing the class structure: the class name and modifiers, implemented interfaces, fields (except private static and private transient ones), and non-private constructors and methods. A change to any of these elements changes the computed serialVersionUID, making all previously serialized instances unreadable, and the computed value can also vary between compilers. Declaring serialVersionUID explicitly gives the developer control: instances serialized with the old version can still be deserialized with the new version. Fields added since then receive their type's default value (null, 0, false), and stream fields that no longer exist in the class are ignored. The convention for the declaration is private static final long serialVersionUID = 1L; start at 1L and increment only when making an incompatible change that requires rejecting old serialized data. For compatible changes (adding fields, changing a field's access modifier), keep the same value. Tooling warns about missing serialVersionUID declarations: javac with -Xlint:serial, and the inspections in IntelliJ IDEA and Eclipse.
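The difference between a declared and a computed serialVersionUID can be inspected directly. The following is a minimal sketch using java.io.ObjectStreamClass; the class names UidInspection, Declared, and Undeclared are hypothetical and exist only for illustration. The computed value for Undeclared is not shown because it depends on the exact class structure.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class UidInspection {
    // Hypothetical example classes, not part of the text above.
    static class Declared implements Serializable {
        private static final long serialVersionUID = 42L;
        String data;
    }

    static class Undeclared implements Serializable {
        String data; // no serialVersionUID: a default is computed from the class structure
    }

    /** Returns the UID that the serialization runtime associates with the given class. */
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Declared:   " + uidOf(Declared.class));   // the declared 42
        System.out.println("Undeclared: " + uidOf(Undeclared.class)); // a computed hash
    }
}
```

Running this before and after adding a public method to Undeclared is a quick way to see the computed value change while the declared one stays fixed.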
The Serializable marker interface triggers the JVM's default serialization: all non-transient, non-static fields of the class and its superclasses (up to the first non-Serializable superclass) are written. Fields of the first non-Serializable superclass are not serialized — they are initialized via that class's no-argument constructor on deserialization. If the first non-Serializable superclass has no accessible no-argument constructor, deserialization throws InvalidClassException.
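The superclass rule described above can be observed with a small round trip. This is a sketch under stated assumptions: SuperclassDemo, Base, and Child are hypothetical names, and the round trip goes through an in-memory byte array rather than a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SuperclassDemo {
    // Not Serializable: its fields are never written to the stream.
    static class Base {
        String label;
        Base() { this.label = "from-no-arg-constructor"; }
        Base(String label) { this.label = label; }
    }

    static class Child extends Base implements Serializable {
        private static final long serialVersionUID = 1L;
        int count;
        Child(String label, int count) { super(label); this.count = count; }
    }

    /** Serializes and deserializes a Child through an in-memory buffer. */
    static Child roundTrip(Child original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(original);
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Child) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Child copy = roundTrip(new Child("custom", 7));
        // label was reset by Base(), count was restored from the stream
        System.out.println(copy.label + " / " + copy.count); // from-no-arg-constructor / 7
    }
}
```

The value "custom" is lost because Base's state is not part of the serialized form; only the no-argument Base constructor runs during deserialization.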
Java
// ── Basic Serializable class ──────────────────────────────────────────
import java.io.*;
import java.util.List;

public class Person implements Serializable {
    // ALWAYS declare serialVersionUID explicitly:
    private static final long serialVersionUID = 1L;

    private String name;
    private int    age;
    private String email;

    // All non-static, non-transient fields are serialized by default
    public Person(String name, int age, String email) {
        this.name  = name;
        this.age   = age;
        this.email = email;
    }

    @Override public String toString() {
        return "Person{name=" + name + ", age=" + age + ", email=" + email + "}";
    }
}

// ── Serializing to a file ─────────────────────────────────────────────
Person alice = new Person("Alice", 30, "alice@example.com");
try (ObjectOutputStream oos = new ObjectOutputStream(
        new BufferedOutputStream(new FileOutputStream("person.ser")))) {
    oos.writeObject(alice);   // writes class descriptor + field values
    System.out.println("Serialized: " + alice);
}

// ── Serializing multiple objects to the same stream ───────────────────
List<Person> people = List.of(
    new Person("Alice", 30, "alice@example.com"),
    new Person("Bob",   25, "bob@example.com"),
    new Person("Carol", 35, "carol@example.com")
);

try (ObjectOutputStream oos = new ObjectOutputStream(
        new BufferedOutputStream(new FileOutputStream("people.ser")))) {
    for (Person p : people) {
        oos.writeObject(p);   // each object written sequentially
    }
}

// ── serialVersionUID: explicit control of version compatibility ────────
public class BankAccount implements Serializable {
    private static final long serialVersionUID = 1L;  // version 1

    private String accountNumber;
    private double balance;
    // Adding fields in version 2 with same serialVersionUID = 1L:
    // private String currency = "USD";  // OK, but old streams yield null here:
    //                                   // field initializers do not run on deserialization
    // Changing field type from double to BigDecimal: INCOMPATIBLE, increment to 2L
}

// ── Automatic serialVersionUID computation (risky): ───────────────────
public class NoVersionUID implements Serializable {
    // No explicit serialVersionUID: the runtime computes one from the class name,
    // interface names, field names+types, and non-private constructor/method signatures
    private String data;   // Adding a public method changes the computed UID and breaks deserialization
}

// ── Compiler warning: -Xlint:serial catches missing declarations ──────
// javac -Xlint:serial MyClass.java
// warning: [serial] serializable class Person has no definition of serialVersionUID
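The examples above only write objects. Reading them back uses ObjectInputStream.readObject() in the same order the objects were written. The sketch below is one way to do it, not the only one: ReadBackDemo and Contact are hypothetical names (Contact stands in for Person), and writing a count before the objects is an assumed convention so the reader knows how many objects follow.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class ReadBackDemo {
    // Hypothetical stand-in for the Person class above
    static class Contact implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;
        Contact(String name, int age) { this.name = name; this.age = age; }
    }

    /** Writes a count first so the reader knows how many objects follow. */
    static byte[] writeAll(List<Contact> contacts) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeInt(contacts.size());
            for (Contact c : contacts) {
                oos.writeObject(c);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the count, then that many objects, in the order they were written. */
    static List<Contact> readAll(byte[] data) {
        List<Contact> result = new ArrayList<>();
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int count = ois.readInt();
            for (int i = 0; i < count; i++) {
                result.add((Contact) ois.readObject());
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = writeAll(List.of(new Contact("Alice", 30), new Contact("Bob", 25)));
        for (Contact c : readAll(data)) {
            System.out.println(c.name + " " + c.age);
        }
    }
}
```

An alternative is to serialize the whole List as a single object, which avoids the explicit count at the cost of tying the stream to the list implementation.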

Object Graph Traversal, Cycles, and writeObject/readObject

The serialization engine traverses the object graph starting from the root object passed to writeObject(). For each object, it writes a class descriptor (class name, serialVersionUID, field descriptors) followed by the field values. If a field is a reference to another object, that object is serialized recursively. If a field is null, a null reference token is written. If an object has already been serialized in this stream, a back-reference handle is written instead of serializing it again, which handles shared references and circular references without infinite recursion. The shared-reference mechanism preserves object identity across serialization. If two fields reference the same String object, after deserialization they reference the same deserialized String object. If an object graph contains a cycle (object A references object B, which references object A), it is serialized correctly: A is assigned a handle when it is first written, and B's reference back to A is written as that handle, so there is no infinite loop. The writeObject and readObject customization hooks allow a class to supplement or replace the default serialization. A class declares private void writeObject(ObjectOutputStream oos) throws IOException to customize writing. The method typically calls oos.defaultWriteObject() to write all non-transient fields, then writes additional data using the oos write methods. The corresponding private void readObject(ObjectInputStream ois) throws IOException, ClassNotFoundException customizes reading, typically calling ois.defaultReadObject() first, then reading the additional data in the same order writeObject wrote it. Typical uses of writeObject/readObject are: writing transient state in a custom form (for example, compressing or encrypting data before writing), re-deriving computed transient fields after reading, enforcing invariants after deserialization (normalizing a deserialized date, validating constraints), and adding version-compatible extra data (writing additional fields in a new version and handling their absence gracefully when reading old data).
The ObjectStreamField[] serialPersistentFields declaration provides an alternative to transient for precisely controlling which fields are included in the default serialized form.
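The identity guarantees described above can be checked with an in-memory round trip. This is a minimal sketch: IdentityDemo and its copy helper are hypothetical names, and Node mirrors the doubly linked node used in the examples of this section.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class IdentityDemo {
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        String value;
        Node next;
        Node prev;
    }

    /** Deep-copies an object graph by serializing and deserializing it in memory. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(original);
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node a = new Node(); a.value = "A";
        Node b = new Node(); b.value = "B";
        a.next = b; b.prev = a;                       // cycle: a -> b -> a

        Node aCopy = copy(a);
        System.out.println(aCopy != a);               // true: a new object graph
        System.out.println(aCopy.next.prev == aCopy); // true: the cycle is preserved

        ArrayList<Node> pair = new ArrayList<>(List.of(a, a)); // same object twice
        ArrayList<Node> pairCopy = copy(pair);
        System.out.println(pairCopy.get(0) == pairCopy.get(1)); // true: shared reference preserved
    }
}
```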
Java
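// ── Object graph with shared references ──────────────────────────────
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class Department implements Serializable {
    private static final long serialVersionUID = 1L;
    private String name;
    private List<Person> members = new ArrayList<>();

    // Constructor required by the usage below
    public Department(String name) { this.name = name; }

    public void addMember(Person p) { members.add(p); }
}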

Person alice = new Person("Alice", 30, "alice@example.com");
Department dept = new Department("Engineering");
dept.addMember(alice);
dept.addMember(alice);   // same object referenced twice

try (ObjectOutputStream oos = new ObjectOutputStream(
        new BufferedOutputStream(new FileOutputStream("dept.ser")))) {
    oos.writeObject(dept);  // alice serialized ONCE — second reference is a handle
}

// After deserialization: both list entries reference the same Person object ✓

// ── Circular references are handled correctly ─────────────────────────
public class Node implements Serializable {
    private static final long serialVersionUID = 1L;
    String value;
    Node   next;
    Node   prev;  // bidirectional list — creates cycles
}

Node n1 = new Node(); n1.value = "A";
Node n2 = new Node(); n2.value = "B";
n1.next = n2; n2.prev = n1;   // cycle: n1 → n2 → n1

try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
    oos.writeObject(n1);   // no StackOverflowError — cycles handled via handle table
    System.out.println("Circular graph serialized successfully");
}

// ── writeObject / readObject customization ────────────────────────────
public class SecureRecord implements Serializable {
    private static final long serialVersionUID = 1L;

    private String username;
    private transient String password;  // transient: excluded from default serialization
    private transient String derivedKey; // computed field — not serialized

    public SecureRecord(String username, String password) {
        this.username   = username;
        this.password   = password;
        this.derivedKey = deriveKey(password);
    }

    // Custom serialization: encrypt password before writing
    private void writeObject(ObjectOutputStream oos) throws IOException {
        oos.defaultWriteObject();   // writes 'username' (non-transient fields)
        String encrypted = encrypt(password);
        oos.writeObject(encrypted); // write encrypted password as extra data
    }

    // Custom deserialization: decrypt password after reading
    private void readObject(ObjectInputStream ois)
            throws IOException, ClassNotFoundException {
        ois.defaultReadObject();    // reads 'username'
        String encrypted = (String) ois.readObject();
        this.password   = decrypt(encrypted);
        this.derivedKey = deriveKey(password);  // re-derive computed field
    }

    private String encrypt(String s) { return "ENC:" + s; }  // placeholder
    private String decrypt(String s) { return s.substring(4); }
    private String deriveKey(String pw) { return "KEY:" + pw.hashCode(); }
}

// ── serialPersistentFields: explicit field list ────────────────────────
public class LegacyClass implements Serializable {
    private static final long serialVersionUID = 1L;

    // Explicit serialized field list — only 'id' and 'name' are serialized:
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("id",   int.class),
        new ObjectStreamField("name", String.class)
    };

    private int    id;
    private String name;
    private int    internalCache;   // excluded — acts like transient
}
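Enforcing invariants after deserialization, one of the readObject uses listed above, can be sketched as follows. The names ValidationDemo, Range, and isRejected are hypothetical. The fields are left mutable on purpose so the demo can produce a stream with invalid state without hand-crafting bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ValidationDemo {
    static class Range implements Serializable {
        private static final long serialVersionUID = 1L;
        int low;   // deliberately mutable so the demo can corrupt the state
        int high;

        Range(int low, int high) {
            if (low > high) throw new IllegalArgumentException("low > high");
            this.low = low;
            this.high = high;
        }

        // Re-checks the constructor's invariant, because deserialization bypasses the constructor.
        private void readObject(ObjectInputStream ois) throws IOException, ClassNotFoundException {
            ois.defaultReadObject();
            if (low > high) {
                throw new InvalidObjectException("low > high");
            }
        }
    }

    /** Returns true if a serialized copy of the range is refused on deserialization. */
    static boolean isRejected(Range range) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(range);
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                ois.readObject();
                return false;
            }
        } catch (InvalidObjectException e) {
            return true;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Range valid = new Range(1, 5);
        System.out.println(isRejected(valid));   // false

        Range corrupted = new Range(1, 5);
        corrupted.high = 0;                      // simulates a tampered stream
        System.out.println(isRejected(corrupted)); // true
    }
}
```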

Security, Serialization Proxy, and Alternatives

Java deserialization is a well-known attack vector. Deserializing data from an untrusted source can execute arbitrary code if the classpath contains vulnerable classes (gadget chains). The attack works by crafting a byte stream that, when deserialized, constructs objects whose methods (readObject, finalize, etc.) chain together to execute arbitrary code. Libraries like Apache Commons Collections, Spring Framework, and others have had deserialization gadget chains that allowed remote code execution. The attack requires no custom code on the server, only that the vulnerable library is on the classpath. Mitigations: never deserialize data from untrusted sources with an unfiltered ObjectInputStream. Use ObjectInputStream.setObjectInputFilter() (Java 9+) or the system-wide jdk.serialFilter property to whitelist acceptable class names. The filter is consulted for each class encountered during deserialization, along with array lengths, graph depth, reference counts, and stream size, and it can accept, reject, or leave the decision undecided. The simplest filter accepts only classes from a known list and rejects everything else. An alternative is to use a completely different serialization format (JSON, Protocol Buffers, Avro, MessagePack) for network communication and persistent storage, reserving Java serialization for JVM-internal use. The serialization proxy pattern is the most robust approach for classes that must be Serializable. Instead of serializing the actual object, a writeReplace() method returns a private static nested SerializationProxy object (a simple record-like class with the minimum state needed to reconstruct the original). The proxy implements Serializable and its readResolve() method reconstructs the original object from the proxy's state, going through the class's normal constructor.
This ensures invariants are always enforced on deserialization (the constructor validates the data), makes the serialized form independent of the internal representation, and stops a forged byte stream from producing an invalid instance of the class itself, because no instance of the class is ever created directly by the deserialization machinery. It does not protect against gadget chains in other classes on the classpath, so input filtering is still needed for untrusted data. Modern alternatives to Java serialization: JSON (Jackson, Gson) for human-readable, language-interoperable data; Protocol Buffers (Protobuf) for compact, schema-versioned binary data; Apache Avro for Hadoop-ecosystem data; MessagePack for compact binary with JSON semantics; CBOR (Concise Binary Object Representation) for binary JSON. By default these formats do not instantiate arbitrary classes named in the input, which removes the gadget-chain attack surface as long as features such as polymorphic type handling stay disabled for untrusted data, and they generally provide better schema evolution support.
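The effect of an allow-list filter can be demonstrated end to end. The sketch below assumes Java 9 or later; FilterDemo, Allowed, Other, and accepted are hypothetical names, and both classes hold only primitive fields so that exactly one class is checked per stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class FilterDemo {
    static class Allowed implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 1;
    }

    static class Other implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 2;
    }

    static byte[] toBytes(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(o);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns true if the object deserializes under a filter that allows only Allowed. */
    static boolean accepted(Object o) {
        // Allow exactly one class by name; "!*" rejects every other class.
        ObjectInputFilter filter =
            ObjectInputFilter.Config.createFilter(Allowed.class.getName() + ";!*");
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(toBytes(o)))) {
            ois.setObjectInputFilter(filter); // must be set before the first read
            ois.readObject();
            return true;
        } catch (InvalidClassException e) {
            return false;                     // thrown when the filter rejects a class
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepted(new Allowed())); // true
        System.out.println(accepted(new Other()));   // false
    }
}
```

Real object graphs usually involve more classes than the root type (collections, arrays, wrapper types), so an allow-list generally has to be built by testing against representative data.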
Java
// ── Security: ObjectInputFilter to whitelist classes ──────────────────
import java.io.ObjectInputFilter;

// System-wide filter (JVM startup property):
// -Djdk.serialFilter=com.example.**;java.util.*;!*
// Accepts com.example and java.util classes; rejects everything else

// Programmatic filter on a specific stream:
try (ObjectInputStream ois = new ObjectInputStream(
        new BufferedInputStream(new FileInputStream("data.ser")))) {

    ObjectInputFilter filter = ObjectInputFilter.Config.createFilter(
        "com.example.Person;com.example.Department;" +  // whitelist
        "java.util.ArrayList;java.lang.String;"        +
        "!*"   // reject everything not explicitly whitelisted
    );
    ois.setObjectInputFilter(filter);

    Object obj = ois.readObject();  // filter checked for every class in the graph
}

// ── Serialization proxy pattern ────────────────────────────────────────
import java.util.Date;
public final class Period implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Date start;
    private final Date end;

    public Period(Date start, Date end) {
        // Copy first, then validate the copies, so a caller cannot mutate the
        // arguments between the check and the assignment:
        this.start = new Date(start.getTime());  // defensive copy
        this.end   = new Date(end.getTime());
        if (this.start.after(this.end)) throw new IllegalArgumentException("start after end");
    }

    // writeReplace: instead of serializing 'this', serialize the proxy
    private Object writeReplace() {
        return new SerializationProxy(this);
    }

    // readObject: prevent direct deserialization of Period instances
    private void readObject(ObjectInputStream ois) throws InvalidObjectException {
        throw new InvalidObjectException("Use serialization proxy");
    }

    // Private static proxy class — minimal, correct state
    private static class SerializationProxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Date start;
        private final Date end;

        SerializationProxy(Period p) {
            this.start = p.start;
            this.end   = p.end;
        }

        // readResolve: reconstruct Period through its public constructor
        private Object readResolve() {
            return new Period(start, end);  // invariant enforced — no way to bypass
        }
    }

    public Date start() { return new Date(start.getTime()); }
    public Date end()   { return new Date(end.getTime()); }
}

// ── Modern alternative: Jackson JSON serialization ─────────────────────
import com.fasterxml.jackson.databind.ObjectMapper;

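// Assumes Person exposes getters plus a no-arg or @JsonCreator constructor;
// the Person class shown earlier has neither, so Jackson would reject it as written.
ObjectMapper mapper = new ObjectMapper();

// Serialize to JSON byte array:
byte[] json = mapper.writeValueAsBytes(alice);
System.out.println(new String(json, java.nio.charset.StandardCharsets.UTF_8));
// {"name":"Alice","age":30,"email":"alice@example.com"}

// Deserialize from JSON:
Person restored = mapper.readValue(json, Person.class);
// Much smaller attack surface for untrusted input: no gadget chains, provided
// polymorphic default typing stays disabled
// Schema evolution: fields missing from the JSON keep their defaults; unknown fields are
// rejected unless DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES is disabled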

// ── writeReplace / readResolve for singleton pattern ─────────────────
public class Config implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final Config INSTANCE = new Config();

    private Config() {}

    public static Config getInstance() { return INSTANCE; }

    // Preserve singleton property across serialization:
    private Object readResolve() {
        return INSTANCE;   // replace deserialized instance with the singleton
    }
}

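// serialize/deserialize are hypothetical helpers that wrap ObjectOutputStream
// and ObjectInputStream around a byte array:
Config c1 = Config.getInstance();
byte[] bytes = serialize(c1);       // serialize
Config c2 = (Config) deserialize(bytes); // deserialize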
System.out.println(c1 == c2);      // true — readResolve ensures singleton
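The singleton check above relies on serialize and deserialize helpers that are not defined in the listing. A minimal, self-contained sketch of such helpers follows. The names SingletonRoundTrip and Settings are hypothetical (Settings plays the role of Config), and the helpers wrap checked exceptions for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SingletonRoundTrip {
    static final class Settings implements Serializable {
        private static final long serialVersionUID = 1L;
        static final Settings INSTANCE = new Settings();

        private Settings() {}

        // Replaces the freshly deserialized instance with the canonical one.
        private Object readResolve() {
            return INSTANCE;
        }
    }

    /** Serializes an object graph into a byte array. */
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(o);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    /** Deserializes a single object from a byte array. Use only on trusted data. */
    static Object deserialize(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object restored = deserialize(serialize(Settings.INSTANCE));
        System.out.println(restored == Settings.INSTANCE); // true
    }
}
```

Without the readResolve method, the same round trip would return a second Settings instance and the comparison would print false.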

Related Topics in Java I/O

I/O Basics
Java I/O is built on a small set of abstract concepts that underlie every I/O operation in the language: streams, readers, writers, channels, and buffers. A stream is a sequential flow of data: bytes moving from a source to a destination one at a time or in chunks. Java organizes I/O around two fundamental distinctions: byte I/O (reading and writing raw bytes, the universal representation that everything ultimately reduces to) and character I/O (reading and writing text encoded in a specific character set, with automatic encoding and decoding). The original java.io package, introduced in Java 1.0, provides stream-based I/O through four abstract base classes: InputStream, OutputStream, Reader, and Writer. The java.nio package, introduced in Java 1.4, adds a channel-and-buffer model for non-blocking and memory-mapped I/O. The java.nio.file package, introduced in Java 7 as part of NIO.2, provides a modern, comprehensive file system API that supersedes much of java.io.File. This entry covers the conceptual model of streams and their abstract base classes, the decorator pattern that underlies the Java I/O class hierarchy, the source-processor-sink taxonomy of stream classes, blocking versus non-blocking I/O, buffering and why it is almost always necessary, the standard I/O streams (System.in, System.out, System.err), and the resource management contract that every I/O class must satisfy.
Byte Streams
Byte streams are the fundamental I/O abstraction in Java for reading and writing raw binary data. InputStream and OutputStream are the abstract base classes for all byte-oriented I/O, and their concrete subclasses cover every byte-level data source and destination: files, byte arrays in memory, network sockets, pipes between threads, and process standard streams. The critical read() contract — returning an int from 0 to 255 for valid bytes and -1 for end-of-stream — is the foundation of all stream-based binary processing. Byte streams do not perform character encoding or decoding; every byte is passed through as-is, making them correct for binary formats (images, audio, archives, serialized data, protocol buffers), and incorrect for text unless the encoding is explicitly managed. This entry covers the complete InputStream and OutputStream APIs, every major concrete byte stream class and its use case, DataInputStream and DataOutputStream for structured binary I/O, the mark/reset mechanism, available() and its correct interpretation, skipping and transferTo, and ObjectInputStream and ObjectOutputStream for Java serialization.
Character Streams
Character streams, represented by the Reader and Writer abstract base classes, handle text data by abstracting away the encoding and decoding between Java's internal char/String representation (UTF-16) and the byte encoding used in files and network connections. Where byte streams treat data as raw octets, character streams treat data as Unicode characters, handling multi-byte sequences transparently according to a specified Charset. InputStreamReader and OutputStreamWriter are the bridge classes that connect byte streams to character streams, applying charset encoding on write and decoding on read. BufferedReader adds line-at-a-time reading via readLine() and multi-character buffering. PrintWriter adds print/println/printf formatting output. StringReader and StringWriter enable in-memory character stream operations on String data. This entry covers the complete Reader and Writer APIs, charset handling and the consequences of using the wrong charset, the complete class hierarchy of character streams with the use case for each, BufferedReader.readLine() semantics and the lines() stream, the bridge classes in depth, character encoding best practices, and the interaction between character streams and Java's String.lines() and Files.readString()/writeString() alternatives.
File Handling
File handling in Java spans two generations of API: the legacy java.io.File class introduced in Java 1.0, and the modern java.nio.file package (NIO.2) introduced in Java 7 with its Path interface, Files utility class, and FileSystem abstraction. The File class represents a file or directory path as an abstract pathname and provides methods for querying metadata, listing directory contents, creating and deleting files, and basic path manipulation. Its limitations — no symbolic link support, inconsistent error reporting (methods return boolean instead of throwing exceptions), no atomic operations, limited metadata access, and performance issues for large directory traversals — motivated the complete redesign in NIO.2. The Path interface and Files class cover all functionality of File with better exception handling, symbolic link support, atomic operations, rich metadata via BasicFileAttributes, efficient directory walking with Files.walk() and Files.walkFileTree(), file watching with WatchService, and a provider model for custom file system implementations. This entry covers the complete File API and its limitations, the NIO.2 Path and Files APIs, directory traversal strategies, file watching, temporary files, and best practices for cross-platform path handling.