
Externalization

Externalization is Java's mechanism for giving a class complete, explicit control over its serialized form. A class that implements java.io.Externalizable takes full responsibility for reading and writing its own state — the JVM provides no default field serialization. Externalizable declares two methods: writeExternal(ObjectOutput out) writes the object's state using the provided ObjectOutput; readExternal(ObjectInput in) reads it back. Unlike Serializable, which uses the JVM's reflection-based automatic field serialization, Externalizable gives developers explicit control over what is written, the byte-level format, the order of fields, and the encoding of each value. Externalizable classes must have a public no-argument constructor, which is called by the deserialization mechanism before readExternal() is invoked. This constructor-calling behavior is a key difference from Serializable's constructor-bypass: Externalizable deserialization does call a constructor, though it may be a no-op constructor. This entry covers the Externalizable contract in full, the two-method API and ObjectOutput/ObjectInput interfaces, the public no-arg constructor requirement, performance characteristics versus Serializable, the identity-preservation mechanism for shared references, version evolution challenges, and when Externalizable is the right choice.

The Externalizable Contract — writeExternal and readExternal

Externalizable extends Serializable and adds two abstract methods. writeExternal(ObjectOutput out) is called when the object is being serialized. It must write all state needed to reconstruct the object, using the ObjectOutput interface's write methods: writeInt(), writeLong(), writeObject(), writeUTF(), write(byte[]), and so on. These are the DataOutput methods, which ObjectOutput inherits, plus writeObject() for writing other serializable or externalizable objects. readExternal(ObjectInput in) is called during deserialization and must read the state back in exactly the same order that writeExternal wrote it. The ObjectInput interface extends DataInput, providing the same read methods as DataInputStream plus readObject().

The JVM's role in Externalizable is minimal: it writes a class descriptor (class name, serialVersionUID, and flags, but no field descriptors, since fields are entirely the developer's responsibility) and handles object identity tracking (so the same object referenced multiple times in a graph is serialized once and back-referenced correctly). Everything else is up to the class: what fields to write, their encoding, their order, and any additional data.

The public no-argument constructor is required by the Externalizable contract and enforced at deserialization time. The JVM calls the no-arg constructor before calling readExternal(). If no public no-arg constructor exists, ObjectInputStream throws an InvalidClassException ("no valid constructor") during deserialization. This is a hard requirement with no workaround within the Externalizable contract (unlike Serializable, where the class's own constructors are bypassed and only the nearest non-serializable superclass needs an accessible no-arg constructor). The no-arg constructor typically does minimal initialization, since readExternal will set all the real field values immediately after.

The key behavioral difference from Serializable: with Externalizable, the developer writes every byte of the object's external representation. Because the class descriptor does not include field names or types, there is no automatic field-matching on deserialization. The order in which readExternal reads values must match exactly the order in which writeExternal wrote them. Any mismatch (a field added in a new version, a field removed, a type changed) requires explicit version handling code in readExternal.
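
The identity tracking mentioned above can be observed with a small round trip. The following is a minimal sketch, not part of the original entry: the class names SharedReferenceDemo, Tag, and Pair are hypothetical and exist only for illustration. Pair writes the same Tag instance twice through out.writeObject(), and the restored Pair should again hold one shared instance rather than two copies.

```java
import java.io.*;

final class SharedReferenceDemo {
    /** Hypothetical leaf class; plain Serializable is enough here. */
    static class Tag implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Tag(String name) { this.name = name; }
    }

    /** Hypothetical Externalizable that may reference the same Tag twice. */
    static class Pair implements Externalizable {
        private static final long serialVersionUID = 1L;
        Tag first;
        Tag second;

        public Pair() { }   // required public no-arg constructor

        Pair(Tag first, Tag second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeObject(first);    // written in full the first time
            out.writeObject(second);   // same instance: written as a back-reference
        }

        @Override
        public void readExternal(ObjectInput in)
                throws IOException, ClassNotFoundException {
            first  = (Tag) in.readObject();
            second = (Tag) in.readObject();
        }
    }

    /** Serializes p to a byte array and reads it back. */
    static Pair roundTrip(Pair p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(p);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(baos.toByteArray()))) {
            return (Pair) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Tag shared = new Tag("shared");
        Pair restored = roundTrip(new Pair(shared, shared));
        // Identity, not just equality, survives the round trip:
        System.out.println(restored.first == restored.second);   // prints "true"
    }
}
```

Had writeExternal encoded the tag name with writeUTF() instead of writeObject(), the two fields would come back as separate objects, because identity tracking only applies to values written through writeObject().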

Java

// ── Basic Externalizable implementation ──────────────────────────────
import java.io.*;

public class Point implements Externalizable {
    // Externalizable writes no field metadata, but serialVersionUID is still written
    // to the class descriptor and checked on deserialization, so declare it explicitly:
    private static final long serialVersionUID = 1L;

    private int x;
    private int y;

    // REQUIRED: public no-arg constructor — called by deserialization BEFORE readExternal
    public Point() { }   // must be public, must exist

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Write ONLY what we need — in any format we choose:
        out.writeInt(x);   // 4 bytes big-endian
        out.writeInt(y);   // 4 bytes big-endian
        // Payload: 8 bytes for this object. The stream still carries a class descriptor
        //        (class name, serialVersionUID, flags) and a few bytes of block-data framing.
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        // Read in EXACTLY the same order as writeExternal:
        x = in.readInt();
        y = in.readInt();
        // After readExternal: x and y are set. The public no-arg constructor ran first (no-op).
    }

    @Override public String toString() { return "Point(" + x + ", " + y + ")"; }
}

// ── Serialization round-trip ──────────────────────────────────────────
Point original = new Point(10, 20);

// Serialize:
byte[] bytes;
try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
     ObjectOutputStream oos = new ObjectOutputStream(baos)) {
    oos.writeObject(original);
    bytes = baos.toByteArray();
}
System.out.println("Serialized size: " + bytes.length + " bytes");

// Deserialize:
// 1. JVM reads class descriptor (class name only)
// 2. JVM calls Point() no-arg constructor  ← constructor IS called
// 3. JVM calls readExternal(in) on the new instance
try (ObjectInputStream ois = new ObjectInputStream(
        new ByteArrayInputStream(bytes))) {
    Point restored = (Point) ois.readObject();
    System.out.println("Restored: " + restored);  // Point(10, 20)
}

// ── Missing public no-arg constructor: runtime failure ────────────────
public class BadExternalizable implements Externalizable {
    private int value;

    // Only private constructor — no public no-arg constructor:
    private BadExternalizable(int v) { this.value = v; }

    @Override public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(value);
    }
    @Override public void readExternal(ObjectInput in) throws IOException {
        value = in.readInt();
    }
}

try (ObjectInputStream ois = new ObjectInputStream(...)) {
    BadExternalizable b = (BadExternalizable) ois.readObject();
    // Throws: java.io.InvalidClassException: BadExternalizable; no valid constructor
    //         (no public no-arg constructor)
}
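
The constructor behavior described above can be checked directly by counting constructor calls. This is a minimal sketch under stated assumptions: ConstructorCallDemo, ExtBox, SerBox, and NoCtorBox are hypothetical names introduced for illustration, and the static counters exist only to make the calls visible.

```java
import java.io.*;

final class ConstructorCallDemo {
    static int extCtorCalls = 0;   // how many times ExtBox() has run
    static int serCtorCalls = 0;   // how many times SerBox() has run

    static class ExtBox implements Externalizable {
        private static final long serialVersionUID = 1L;
        int value;
        public ExtBox() { extCtorCalls++; }   // the constructor itself must be public
        @Override public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(value);
        }
        @Override public void readExternal(ObjectInput in) throws IOException {
            value = in.readInt();
        }
    }

    static class SerBox implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
        SerBox() { serCtorCalls++; }          // bypassed during deserialization
    }

    static class NoCtorBox implements Externalizable {
        private static final long serialVersionUID = 1L;
        int value;
        NoCtorBox(int value) { this.value = value; }   // no public no-arg constructor
        @Override public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(value);
        }
        @Override public void readExternal(ObjectInput in) throws IOException {
            value = in.readInt();
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(o);
        }
        return baos.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        deserialize(serialize(new ExtBox()));
        deserialize(serialize(new SerBox()));
        // ExtBox(): once for "new", once more during deserialization.
        System.out.println("ExtBox constructor calls: " + extCtorCalls);   // 2
        // SerBox(): only for "new"; deserialization bypasses it.
        System.out.println("SerBox constructor calls: " + serCtorCalls);   // 1

        try {
            deserialize(serialize(new NoCtorBox(7)));   // writing succeeds, reading fails
        } catch (InvalidClassException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Note that serializing NoCtorBox works; the missing constructor is only detected when the stream is read back, which is why this kind of mistake tends to surface late.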

ObjectOutput/ObjectInput, Shared References, and Version Evolution

ObjectOutput and ObjectInput extend DataOutput and DataInput, adding writeObject(Object) and readObject(). When writeExternal calls out.writeObject(someObject), the JVM serializes someObject using the standard serialization mechanism: if someObject is Externalizable, its writeExternal is called recursively; if it is Serializable, the normal field-based serialization applies; if it is neither, a NotSerializableException is thrown. The JVM's object identity tracking applies to objects written via writeObject: the same object instance written multiple times within the same stream is written once and referenced by handle on subsequent writes.

The ObjectOutput interface methods available in writeExternal: write(int b), write(byte[]), write(byte[], int, int), writeBoolean(boolean), writeByte(int), writeShort(int), writeChar(int), writeInt(int), writeLong(long), writeFloat(float), writeDouble(double), writeBytes(String), writeChars(String), writeUTF(String), writeObject(Object). All but writeObject come from DataOutput; ObjectOutput also declares flush() and close().

Version evolution is the primary weakness of Externalizable compared to Serializable. With Serializable's field-matching mechanism, adding a field is backward-compatible: old serialized data leaves the new field at its default. With Externalizable, the object's data is a raw byte sequence with no field name metadata. If a new version of readExternal reads an additional field, an old stream that never wrote that field fails when readExternal tries to read more data than exists, typically with an EOFException. The developer must handle this manually, usually by versioning the stream: write a version number as the first value in writeExternal, and in readExternal, check the version number to determine which fields to read.

The performance advantage of Externalizable over Serializable: Externalizable can produce a more compact stream (writing only what is needed, using the most efficient encoding), and avoids the overhead of reflection-based field discovery and the field descriptors written per class in the Serializable stream. For the first instance of a class, the class descriptor overhead dominates. For thousands of instances of the same class in the same stream, the per-instance overhead dominates, and Externalizable's control over field encoding can reduce stream size significantly.
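
As one illustration of choosing a more efficient encoding, the sketch below writes long values as variable-length integers (7 data bits per byte, high bit as a continuation flag) instead of fixed 8-byte values. It is a minimal example under stated assumptions, not a recommended format: VarintDemo, Counters, writeVarLong, readVarLong, and MAX_LENGTH are hypothetical names, and the encoding only saves space when most values are small and non-negative.

```java
import java.io.*;
import java.util.Arrays;

final class VarintDemo {
    static final int MAX_LENGTH = 1_000_000;   // hypothetical sanity limit on array length

    /** Hypothetical class holding counters that are usually small. */
    static class Counters implements Externalizable {
        private static final long serialVersionUID = 1L;
        long[] values = new long[0];

        public Counters() { }
        Counters(long... values) { this.values = values.clone(); }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            writeVarLong(out, values.length);
            for (long v : values) {
                writeVarLong(out, v);   // 1 byte for values below 128, up to 10 bytes otherwise
            }
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            long n = readVarLong(in);
            if (n < 0 || n > MAX_LENGTH) {
                throw new InvalidObjectException("Bad length: " + n);
            }
            long[] v = new long[(int) n];
            for (int i = 0; i < v.length; i++) {
                v[i] = readVarLong(in);
            }
            values = v;
        }
    }

    /** Writes v in little-endian base-128 groups; the high bit marks "more bytes follow". */
    static void writeVarLong(DataOutput out, long v) throws IOException {
        while ((v & ~0x7FL) != 0) {
            out.writeByte((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.writeByte((int) v);
    }

    /** Reads a value written by writeVarLong. */
    static long readVarLong(DataInput in) throws IOException {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.readUnsignedByte();
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("Malformed varint");
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(o);
        }
        return baos.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Counters original = new Counters(0, 1, 127, 128, 300, 1_000_000);
        Counters restored = (Counters) deserialize(serialize(original));
        System.out.println(Arrays.equals(original.values, restored.values));   // prints "true"
    }
}
```

The trade-off is the one the text describes: the format is smaller for typical data, but it is invisible to the serialization machinery, so any later change to it needs explicit version handling.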

Java

// ── writeObject inside writeExternal: nested serialization ───────────
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class Order implements Externalizable {
    private long   orderId;
    private String customerId;
    private List<OrderLine> lines;   // OrderLine is assumed to be a Serializable class defined elsewhere

    public Order() { }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeLong(orderId);
        out.writeUTF(customerId);
        out.writeInt(lines.size());
        for (OrderLine line : lines) {
            out.writeObject(line);   // each OrderLine serialized via its own mechanism
        }
    }

    @Override
    public void readExternal(ObjectInput in)
            throws IOException, ClassNotFoundException {
        orderId    = in.readLong();
        customerId = in.readUTF();
        int count  = in.readInt();
        lines      = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            lines.add((OrderLine) in.readObject());  // cast required
        }
    }
}

// ── Shared reference preservation ────────────────────────────────────
// Even with Externalizable, the JVM tracks object identity at the writeObject level.
// Point (defined earlier) is used because a default-constructed Order has null
// fields, and writeUTF(null) in its writeExternal would throw NullPointerException.
Point p1 = new Point(1, 2);
Point p2 = new Point(3, 4);

try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
    oos.writeObject(p1);   // p1 written in full and assigned a handle
    oos.writeObject(p1);   // p1 again: written as a back-reference to that handle (not duplicated)
    oos.writeObject(p2);   // p2 written in full and assigned its own handle
}
// Deserialization: the first two readObject() calls return the SAME Point instance

// ── Version evolution with explicit version number ─────────────────────
public class VersionedPoint implements Externalizable {
    private static final int CURRENT_VERSION = 2;

    private int    x, y;
    private double z;      // added in version 2

    public VersionedPoint() { }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(CURRENT_VERSION);  // always write version first
        out.writeInt(x);
        out.writeInt(y);
        out.writeDouble(z);   // version 2 addition
    }

    @Override
    public void readExternal(ObjectInput in)
            throws IOException, ClassNotFoundException {
        int version = in.readInt();   // read version first
        if (version < 1 || version > CURRENT_VERSION) {
            // Reject unknown versions before reading any further data
            throw new IOException("Unknown version: " + version);
        }
        x = in.readInt();
        y = in.readInt();
        if (version >= 2) {
            z = in.readDouble();      // only read if stream has version 2 data
        } else {
            z = 0.0;                  // default for v1 streams that didn't write z
        }
    }
}

// ── Externalizable vs Serializable: when to use each ─────────────────
// USE Serializable when:
//   - Simplicity is the priority
//   - The class has few fields and the default format is acceptable
//   - Version evolution via field addition/deletion is the main concern
//   - The serialization proxy pattern (writeReplace/readResolve) is the main safeguard

// USE Externalizable when:
//   - Full control over the payload bytes is required (e.g., a layout fixed by an external
//     specification; the surrounding stream is still in Java serialization format)
//   - Maximum performance and minimum stream size are critical
//   - Custom encoding (variable-length integers, packed bytes) is needed
//   - The class has non-serializable fields that require custom logic anyway

// ── Performance comparison (rough estimates) ──────────────────────────
// Serializable, first occurrence of a class in a stream:
//   Class descriptor: ~60-100 bytes (class name, serialVersionUID, flags, field count,
//                     one descriptor per field)
//   Instance data: actual field values

// Externalizable, first occurrence of a class in a stream:
//   Class descriptor: ~30-50 bytes (class name, serialVersionUID, flags; no field descriptors)
//   Instance data: exactly what writeExternal writes (developer-controlled),
//                  wrapped in a few bytes of block-data framing

// For a class with 3 int fields (12 bytes of actual data):
// Serializable first instance: ~80 + 12 = 92 bytes total
// Externalizable:              ~40 + 12 = 52 bytes total  (about 43% smaller)
// For subsequent instances in same stream, class descriptor is referenced not repeated:
// Serializable:   ~6 + 12 = 18 bytes (object marker + back-reference to class descriptor)
// Externalizable: ~6 + 12 + ~3 = 21 bytes (same, plus block-data framing)
// So for large arrays the sizes are similar unless writeExternal uses a more compact
// encoding; the descriptor saving applies to the first instance only
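
Because the figures above are estimates, it is worth measuring on the JDK in use. The following sketch, with the hypothetical names StreamSizeDemo, SerP, ExtP, and sizeOf, serializes one instance and then 1000 instances of two otherwise identical classes and prints the resulting stream sizes. The class names have the same length so that the comparison is not skewed by the name stored in the descriptor. Exact numbers are deliberately not shown, since they depend on class and field names.

```java
import java.io.*;

final class StreamSizeDemo {
    /** Three int fields, default field-based serialization. */
    static class SerP implements Serializable {
        private static final long serialVersionUID = 1L;
        int a = 1, b = 2, c = 3;
    }

    /** Same three int fields, written by hand with fixed-width ints. */
    static class ExtP implements Externalizable {
        private static final long serialVersionUID = 1L;
        int a = 1, b = 2, c = 3;

        public ExtP() { }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(a);
            out.writeInt(b);
            out.writeInt(c);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            a = in.readInt();
            b = in.readInt();
            c = in.readInt();
        }
    }

    /** Writes all objects to one stream and returns the total size in bytes. */
    static int sizeOf(Object... objects) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            for (Object o : objects) {
                oos.writeObject(o);
            }
        }
        return baos.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("1 Serializable:      " + sizeOf(new SerP()));
        System.out.println("1 Externalizable:    " + sizeOf(new ExtP()));

        Object[] manySer = new Object[1000];
        Object[] manyExt = new Object[1000];
        for (int i = 0; i < 1000; i++) {
            manySer[i] = new SerP();   // distinct instances, so each is written in full
            manyExt[i] = new ExtP();
        }
        System.out.println("1000 Serializable:   " + sizeOf(manySer));
        System.out.println("1000 Externalizable: " + sizeOf(manyExt));
    }
}
```

With the same fixed-width encoding on both sides, the single-instance stream should come out smaller for the Externalizable class because no field descriptors are written. For many instances, any saving has to come from what writeExternal chooses to write, not from the descriptor.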

Externalizable vs Serializable — Security and Design Trade-offs

Externalizable has a different security posture than Serializable because it calls a public no-arg constructor before readExternal. The constructor can initialize internal state (set sentinel values, initialize security checks, establish class invariants) before readExternal reads any untrusted data. It cannot, however, validate the data that readExternal will read, since that data has not been read yet at that point; validation belongs in readExternal itself.

Conversely, Externalizable is more exposed in one specific way: the public no-arg constructor is called for every deserialization, even from untrusted sources. If the no-arg constructor has side effects (acquiring resources, registering with a registry, creating files), those side effects can be triggered by an attacker who sends malicious byte streams. This is generally a smaller risk than the gadget chain attacks known from Java deserialization, but it is a consideration.

The absence of automatic field handling in Externalizable means that adding a field is a breaking change to the external format unless version numbers are managed explicitly. This is a significant maintenance burden for classes that evolve frequently. Serializable's field-based approach handles most common evolution patterns (adding fields, removing optional fields) automatically with the same serialVersionUID.

A hybrid approach: use Serializable with writeObject/readObject for customization rather than switching to Externalizable. writeObject/readObject gives similar control over what is written while keeping the default field-matching for fields that do not need customization. This hybrid is the preferred approach for most cases; full Externalizable is appropriate for performance-critical scenarios where the stream format must be precisely controlled.

Java

// ── Security: constructor called before readExternal ─────────────────
public class SecureExternalizable implements Externalizable {
    private int    value;
    private String data;
    private boolean initialized = false;

    // Public no-arg constructor: initializes security sentinels
    public SecureExternalizable() {
        // Called BEFORE readExternal — cannot validate stream data yet,
        // but can initialize internal state:
        this.initialized = false;
        this.value = Integer.MIN_VALUE;  // sentinel
        System.out.println("Constructor called (may be from untrusted source)");
    }

    public SecureExternalizable(int value, String data) {
        if (value < 0) throw new IllegalArgumentException("value must be non-negative");
        if (data == null) throw new NullPointerException("data cannot be null");
        this.value = value;
        this.data  = data;
        this.initialized = true;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(value);
        out.writeUTF(data);
    }

    @Override
    public void readExternal(ObjectInput in)
            throws IOException, ClassNotFoundException {
        int    v = in.readInt();
        String d = in.readUTF();

        // Validate deserialized values, as readObject would in a Serializable class.
        // readUTF() never returns null, so only the numeric range needs checking here:
        if (v < 0) throw new IOException("Invalid value: " + v);

        this.value = v;
        this.data  = d;
        this.initialized = true;
    }

    // Always validate that initialization completed:
    public int getValue() {
        if (!initialized) throw new IllegalStateException("Not properly initialized");
        return value;
    }
}

// ── Hybrid: Serializable + writeObject/readObject (usually better) ────
// Instead of Externalizable, use Serializable with custom writeObject/readObject
// for most cases:
public class HybridCustom implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient int[] rawData;    // large array: transient, so it is written only in compressed form
    private String name;                // normal field, handled by default serialization

    private void writeObject(ObjectOutputStream oos) throws IOException {
        oos.defaultWriteObject();       // writes name (the only non-transient field)
        // Write rawData in a compact, custom encoding:
        byte[] compressed = compress(rawData);
        oos.writeInt(compressed.length);
        oos.write(compressed);
    }

    private void readObject(ObjectInputStream ois)
            throws IOException, ClassNotFoundException {
        ois.defaultReadObject();        // restores name
        int len = ois.readInt();
        byte[] compressed = new byte[len];
        ois.readFully(compressed);
        this.rawData = decompress(compressed);
    }

    private byte[] compress(int[] data) { return new byte[0]; }   // placeholder
    private int[]  decompress(byte[] b) { return new int[0]; }    // placeholder
}

// ── Summary: choosing between Externalizable and Serializable ─────────
//
//                       Serializable            Externalizable
// ──────────────────────────────────────────────────────────────────────
// Control over format   Low (automatic)         Complete
// Version evolution     Easy (field matching)   Manual (version numbers)
// Constructor behavior  Bypassed                Called (public no-arg required)
// Validation hook       readObject              readExternal
// Maintenance burden    Low                     High
// Performance           Good (JVM-optimized)    Better (hand-tuned possible)
// Best for              Most cases              Performance-critical, format-specific
//
// writeReplace/readResolve are honored for both kinds of class.
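
The readExternal validation listed in the summary can be exercised with a forged object. This is a minimal sketch under stated assumptions: ValidationDemo and Quantity are hypothetical names, and the package-private field exists only so the demo can create an invalid instance without hand-crafting bytes. InvalidObjectException is used here as the conventional exception for failed validation; it is a subclass of IOException, so it fits the readExternal signature.

```java
import java.io.*;

final class ValidationDemo {
    /** Hypothetical class with the invariant amount >= 0. */
    static class Quantity implements Externalizable {
        private static final long serialVersionUID = 1L;
        int amount;   // package-private only so the demo below can forge a bad value

        public Quantity() { }

        Quantity(int amount) {
            if (amount < 0) throw new IllegalArgumentException("amount must be non-negative");
            this.amount = amount;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(amount);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            int a = in.readInt();
            // The stream is untrusted: re-check the invariant before assigning.
            if (a < 0) throw new InvalidObjectException("amount must be non-negative: " + a);
            amount = a;
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(o);
        }
        return baos.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Quantity ok = (Quantity) deserialize(serialize(new Quantity(5)));
        System.out.println("Restored amount: " + ok.amount);   // 5

        Quantity forged = new Quantity(0);
        forged.amount = -1;   // simulate a stream produced by an attacker or a buggy writer
        try {
            deserialize(serialize(forged));
        } catch (InvalidObjectException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Reading into local variables first and assigning the fields only after the checks pass keeps a rejected object from ever holding the invalid value.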
