DataOutputStream

DataOutputStream wraps any OutputStream and adds methods for writing Java primitive types in a machine-independent binary format using big-endian byte order. It writes boolean, byte, short, int, long, float, double, and char values using fixed byte widths, and strings using a modified UTF-8 encoding with a 2-byte length prefix. Every value written by DataOutputStream can be read back by DataInputStream using the corresponding read method, making them the standard pair for binary serialization of primitive data across files, sockets, pipes, and inter-process communication. DataOutputStream is unbuffered: each write call propagates to the underlying OutputStream, so for file or network I/O it should wrap a BufferedOutputStream. DataOutputStream also counts the bytes it has written and exposes the count via the size() method, which returns the total bytes written since construction (the counter is pinned at Integer.MAX_VALUE if it overflows). This entry covers all write methods and their byte-level encoding, size() semantics, the writeUTF contract and its limitations, flush() behavior, composition with BufferedOutputStream, and patterns for building binary protocol messages.

Construction, Write Methods, and Byte Encoding

DataOutputStream is constructed with new DataOutputStream(OutputStream out). Like DataInputStream, it adds no buffer, so for file or network output it should wrap a BufferedOutputStream: new DataOutputStream(new BufferedOutputStream(new FileOutputStream(path))). The write methods encode values as fixed-size big-endian byte sequences and pass them to the underlying OutputStream. writeBoolean(boolean v) writes 1 byte: 1 for true, 0 for false. writeByte(int v) writes the low-order byte of the int (only the lowest 8 bits are written). writeShort(int v) writes the low-order 16 bits as a 2-byte big-endian short. writeChar(int v) writes the low-order 16 bits as a 2-byte big-endian character. writeInt(int v) writes a 4-byte big-endian integer. writeLong(long v) writes an 8-byte big-endian long. writeFloat(float v) converts to an int bit pattern via Float.floatToIntBits(v) and writes 4 bytes. writeDouble(double v) converts to a long bit pattern via Double.doubleToLongBits(v) and writes 8 bytes. write(byte[] b, int off, int len) writes len bytes from the array starting at off; DataOutputStream overrides it to pass the bytes directly to the underlying OutputStream and add len to the written-byte counter. The big-endian encoding is fixed and cannot be changed. For writing little-endian data (required by many binary file formats, including BMP, WAV, AVI, and Windows PE, and by ELF on little-endian targets), use java.nio.ByteBuffer with order(ByteOrder.LITTLE_ENDIAN) to encode values, then write the backing array. The ByteBuffer approach also allows batch encoding: accumulate multiple fields in a buffer and write them all at once, reducing write() call overhead. writeBytes(String s) writes only the low-order byte of each character. It is a legacy ASCII-only method that silently discards the high byte of every char, so it should never be used for non-ASCII strings. Use writeUTF() for strings in the DataOutputStream format, or encode the string with String.getBytes(charset) and write the resulting byte array when a standard encoding is required.

Java

// ── Construction: always wrap with BufferedOutputStream ───────────────
try (DataOutputStream dos = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream("data.bin")))) {
    // writes accumulate in 8192-byte buffer
}

// ── Writing all primitive types ───────────────────────────────────────
try (DataOutputStream dos = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream("primitives.bin")))) {

    dos.writeBoolean(true);       // 1 byte: value 1
    dos.writeBoolean(false);      // 1 byte: value 0
    dos.writeByte(127);           // 1 byte: 0x7F (low-order 8 bits of int)
    dos.writeByte(-1);            // 1 byte: 0xFF
    dos.writeShort(1000);         // 2 bytes big-endian: 0x03 0xE8
    dos.writeShort(-1);           // 2 bytes: 0xFF 0xFF
    dos.writeChar('A');           // 2 bytes big-endian: 0x00 0x41
    dos.writeChar('€');           // 2 bytes big-endian: 0x20 0xAC (U+20AC)
    dos.writeInt(0x12345678);     // 4 bytes: 0x12 0x34 0x56 0x78
    dos.writeInt(-1);             // 4 bytes: 0xFF 0xFF 0xFF 0xFF
    dos.writeLong(Long.MAX_VALUE);// 8 bytes: 0x7F 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF
    dos.writeFloat(3.14f);        // 4 bytes IEEE 754: Float.floatToIntBits(3.14f)
    dos.writeDouble(Math.PI);     // 8 bytes IEEE 754: Double.doubleToLongBits(π)

    System.out.println("Written: " + dos.size() + " bytes");  // total written so far
}

// ── Byte layout of writeInt(0x12345678) ──────────────────────────────
// Byte 0 (MSB): 0x12
// Byte 1:       0x34
// Byte 2:       0x56
// Byte 3 (LSB): 0x78
// Big-endian: most significant byte written first (network byte order)

// ── Little-endian via ByteBuffer ─────────────────────────────────────
// For BMP, WAV, AVI, Windows PE, and other little-endian formats
// (ELF declares its byte order in the header; little-endian on x86 targets):
try (DataOutputStream dos = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream("le_data.bin")))) {

    // Encode int 0x12345678 in little-endian:
    ByteBuffer bb = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
    bb.putInt(0x12345678);
    dos.write(bb.array());   // writes: 0x78 0x56 0x34 0x12 (little-endian)

    // Batch encoding: encode multiple fields at once
    ByteBuffer multi = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
    multi.putInt(42);
    multi.putLong(System.currentTimeMillis());
    multi.putInt(100);
    dos.write(multi.array());
}

// ── writeBytes: ASCII-only legacy method — avoid for Unicode ──────────
// (dos is an open DataOutputStream, as in the blocks above)
dos.writeBytes("ASCII only");   // LOW BYTE ONLY: the high byte of every char is dropped silently
// 'é' (U+00E9) → writes 0xE9 (a Latin-1 byte, not valid UTF-8)
// '€' (U+20AC) → writes 0xAC only; the high byte 0x20 is lost
// NEVER use writeBytes for non-ASCII content

// For UTF-8 string writing with explicit length:
byte[] utf8Bytes = "Hello, 世界!".getBytes(StandardCharsets.UTF_8);
dos.writeInt(utf8Bytes.length);   // 4-byte length prefix
dos.write(utf8Bytes);             // followed by UTF-8 bytes
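
The byte layouts described in the comments above can be checked directly by writing to an in-memory stream and printing the result as hex. The following is a minimal sketch; the class name ByteLayoutDemo, the FieldWriter callback, and the helper names are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ByteLayoutDemo {

    /** Hypothetical callback type so each demo can name the write it performs. */
    public interface FieldWriter {
        void writeTo(DataOutputStream out) throws IOException;
    }

    /** Runs the writer against an in-memory stream and returns the bytes as spaced hex. */
    public static String hexOf(FieldWriter writer) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(baos)) {
            writer.writeTo(dos);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : baos.toByteArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Writes an int and a double, then reads them back in the same order. */
    public static boolean roundTrips(int i, double d) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(baos);
            dos.writeInt(i);
            dos.writeDouble(d);
            dos.flush();
            DataInputStream dis = new DataInputStream(
                    new ByteArrayInputStream(baos.toByteArray()));
            return dis.readInt() == i && Double.compare(dis.readDouble(), d) == 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hexOf(out -> out.writeInt(0x12345678)));   // 12 34 56 78
        System.out.println(hexOf(out -> out.writeShort(1000)));       // 03 E8
        System.out.println(hexOf(out -> out.writeChar('\u20AC')));    // 20 AC
        System.out.println(hexOf(out -> out.writeFloat(1.0f)));       // 3F 80 00 00
        System.out.println(hexOf(out -> out.writeBytes("\u20AC")));   // AC (high byte 0x20 dropped)
        System.out.println(roundTrips(-42, Math.PI));                 // true
    }
}
```

Using a ByteArrayOutputStream here avoids touching the file system, and no BufferedOutputStream is needed because the destination is already in memory.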

writeUTF(), size(), flush(), and the DataOutputStream/DataInputStream Contract

writeUTF(String s) encodes the string in DataOutputStream's modified UTF-8 format: first a 2-byte big-endian unsigned short containing the number of bytes in the encoded representation (not the number of characters), followed by that many bytes of modified UTF-8 text. The maximum encoded size is 65535 bytes; a string whose modified UTF-8 form is longer causes writeUTF to throw UTFDataFormatException. For BMP characters (U+0000 to U+FFFF), modified UTF-8 differs from standard UTF-8 only for the null character (U+0000, encoded as 0xC0 0x80 in modified UTF-8 versus 0x00 in standard UTF-8). Supplementary characters (U+10000 and above) are encoded as 6 bytes (each half of the surrogate pair as its own 3-byte sequence) rather than the 4-byte standard UTF-8 encoding. The writeUTF/readUTF pair is intended for Java-to-Java exchange: the same encoding is used by Java object serialization and by the class file format (for constant pool strings), and it is convenient when a short text string must be embedded in a binary stream without a separate length field. It is not suitable for writing files intended to be read by non-Java programs, because the format is not standard UTF-8. size() returns the total number of bytes written to the DataOutputStream since it was constructed, as an int. The counter does not wrap to negative values: if it overflows, it is pinned at Integer.MAX_VALUE, so size() is no longer meaningful once more than about 2GB has been written. size() is useful for: checking how large a serialized object is before deciding whether to buffer or stream it, computing message offsets for protocol framing, and verifying that a write sequence produces the expected byte count. flush() propagates to the underlying OutputStream. For DataOutputStream over BufferedOutputStream, flush() causes the BufferedOutputStream to write its buffer to the underlying FileOutputStream (or socket OutputStream). It does not guarantee disk durability; for that, call getFD().sync() on the underlying FileOutputStream or use NIO FileChannel.force(). The close() method calls flush() and then closes the underlying OutputStream.
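
The two differences between modified UTF-8 and standard UTF-8 described above can be observed by comparing the output of writeUTF with String.getBytes(StandardCharsets.UTF_8). This is a small sketch under that framing; the class name ModifiedUtf8Demo and its helper methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Demo {

    /** Returns the full writeUTF output: 2-byte length prefix plus the encoded bytes. */
    public static byte[] writeUtfBytes(String s) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(baos)) {
            dos.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    /** Decodes the unsigned 16-bit big-endian length stored in the first two bytes. */
    public static int prefixLength(byte[] encoded) {
        return ((encoded[0] & 0xFF) << 8) | (encoded[1] & 0xFF);
    }

    public static void main(String[] args) {
        String nul = "\0";                                       // U+0000
        String emoji = new String(Character.toChars(0x1F600));   // one supplementary code point

        System.out.println(prefixLength(writeUtfBytes(nul)));              // 2 (0xC0 0x80)
        System.out.println(nul.getBytes(StandardCharsets.UTF_8).length);   // 1 (0x00)
        System.out.println(prefixLength(writeUtfBytes(emoji)));            // 6
        System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length); // 4
    }
}
```

Because the length prefix counts encoded bytes, a string made of supplementary characters reaches the 65535-byte limit after far fewer characters than an ASCII string does.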

Java

// ── writeUTF: DataOutputStream's modified UTF-8 format ────────────────
// Write first and close the stream, then reopen the file for reading:
try (DataOutputStream dos = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream("strings.bin")))) {

    dos.writeUTF("Hello");          // 2-byte length (5) + 5 bytes
    dos.writeUTF("世界");           // 2-byte length (6) + 6 bytes (3 bytes per CJK char)
    dos.writeUTF("Hello, 世界!");   // 2-byte length (14) + 14 bytes
}   // close() flushes the buffer, so all bytes reach the file

try (DataInputStream dis = new DataInputStream(
        new BufferedInputStream(new FileInputStream("strings.bin")))) {

    // Read back in the same order the values were written:
    System.out.println(dis.readUTF()); // "Hello"
    System.out.println(dis.readUTF()); // "世界"
    System.out.println(dis.readUTF()); // "Hello, 世界!"
}

// ── writeUTF limitations ─────────────────────────────────────────────
// Max 65535 bytes in modified UTF-8 (NOT max 65535 characters):
String tooLong = "A".repeat(65536);  // 65536 ASCII chars = 65536 bytes
try {
    dos.writeUTF(tooLong);  // throws UTFDataFormatException: string too long
} catch (UTFDataFormatException e) {
    System.out.println("String too long for writeUTF: " + e.getMessage());
}

// For long strings: write 4-byte length + raw UTF-8 bytes
byte[] utf8 = tooLong.getBytes(StandardCharsets.UTF_8);
dos.writeInt(utf8.length);   // 4-byte length — handles strings up to 2GB
dos.write(utf8);             // raw UTF-8 bytes (standard format)

// ── size(): total bytes written since construction ────────────────────
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream counter = new DataOutputStream(baos);

counter.writeInt(42);         // 4 bytes
counter.writeDouble(3.14);    // 8 bytes
counter.writeUTF("Hi");       // 2 (length) + 2 (bytes) = 4 bytes

System.out.println("Written: " + counter.size() + " bytes");  // 16 bytes
System.out.println("Buffer size: " + baos.size() + " bytes"); // also 16 bytes

// size() counts ALL bytes, including multi-byte writes:
ByteArrayOutputStream baos2 = new ByteArrayOutputStream();
DataOutputStream dos2 = new DataOutputStream(baos2);
dos2.writeLong(0L);      // 8 bytes
dos2.writeLong(0L);      // 8 bytes more
dos2.write(new byte[100]); // 100 bytes more
System.out.println(dos2.size());  // 116

// ── flush() propagation chain ─────────────────────────────────────────
try (FileOutputStream fos = new FileOutputStream("flushed.bin");
     DataOutputStream dos3 = new DataOutputStream(new BufferedOutputStream(fos))) {

    dos3.writeInt(42);
    dos3.writeLong(123456789L);
    // At this point: bytes in BufferedOutputStream's internal buffer (not on disk)

    dos3.flush();
    // After flush: BufferedOutputStream.flush() writes buffer to FileOutputStream
    // FileOutputStream makes OS system call; bytes are now in the OS kernel buffer
    // NOT guaranteed on disk yet (OS may still buffer)

    // For disk durability, sync the descriptor of the FileOutputStream itself.
    // (FilterOutputStream.out is protected, so keep your own reference to fos
    // instead of trying to reach it through the wrapper chain.)
    fos.getFD().sync();
}
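
The behavior of size() near the int limit can be observed without writing 2GB to disk by sending the bytes to a discarding sink. The sketch below assumes Java 11 or later for OutputStream.nullOutputStream(); the class name SizeSaturationDemo and the 1 MiB chunk size are arbitrary choices for the illustration.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class SizeSaturationDemo {

    /** Writes chunkCount chunks of 1 MiB to a discarding sink and returns size(). */
    public static int sizeAfterWriting(int chunkCount) {
        byte[] chunk = new byte[1 << 20]; // 1 MiB of zeros
        // OutputStream.nullOutputStream() (Java 11+) discards everything it is given.
        try (DataOutputStream dos = new DataOutputStream(OutputStream.nullOutputStream())) {
            for (int i = 0; i < chunkCount; i++) {
                dos.write(chunk, 0, chunk.length);
            }
            return dos.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterWriting(3));     // 3145728
        System.out.println(sizeAfterWriting(2047));  // 2146435072 (still below the limit)
        System.out.println(sizeAfterWriting(2048));  // 2147483647 (Integer.MAX_VALUE)
        System.out.println(sizeAfterWriting(3000));  // 2147483647 (stays pinned)
    }
}
```

Code that needs an accurate count beyond this range would have to keep its own long counter alongside the stream.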

Binary Protocol Serialization and Message Building Patterns

DataOutputStream is the standard tool for building binary protocol messages in Java. The typical pattern for a framed binary protocol: write the message body to a ByteArrayOutputStream wrapped in DataOutputStream (so size() gives the exact message length), then write the frame header (type + length) to the real output stream, then write the body bytes. This two-phase approach avoids the problem of not knowing the message length before serializing the body. For high-performance binary serialization, the ByteBuffer approach competes with DataOutputStream. ByteBuffer.allocate() creates a buffer of fixed size; the put methods (putInt, putLong, putDouble) write values; flip() prepares the buffer for reading; and the bytes are then sent either by writing the backing array() to a stream or by passing the buffer to a channel write. ByteBuffer has advantages in: supporting both big-endian and little-endian byte orders, offering direct buffers (ByteBuffer.allocateDirect) that can avoid an extra copy in NIO channel I/O, and allowing absolute-position writes such as putInt(index, value), which make it possible to fill in a length field after the body has been encoded. DataOutputStream has advantages in: being a standard OutputStream chain component, supporting writeUTF() for quick string serialization, and being more readable for protocol implementations that map naturally to a sequence of typed field writes. A common pattern for writing variable-length fields: writeShort(byteArray.length) followed by write(byteArray) for fields up to 65535 bytes (read back with readUnsignedShort()); writeInt(byteArray.length) followed by write(byteArray) for larger fields. writeUTF() is not an alternative for arbitrary binary data, because it accepts only a String and applies modified UTF-8 encoding. For binary data like cryptographic material, image data, or arbitrary byte sequences, always use explicit length + raw bytes.

Java

// ── Two-phase message building: serialize body first, then frame ───────
public static void sendMessage(DataOutputStream networkDos, int msgType, Object payload)
        throws IOException {

    // Phase 1: serialize payload to byte array to get its length
    ByteArrayOutputStream bodyBaos = new ByteArrayOutputStream(256);
    DataOutputStream bodyDos = new DataOutputStream(bodyBaos);

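    if (payload instanceof LoginRequest lr) {
        bodyDos.writeInt(lr.userId());
        bodyDos.writeShort(lr.flags());
        byte[] nameBytes = lr.username().getBytes(StandardCharsets.UTF_8);
        if (nameBytes.length > 0xFFFF) {        // writeShort would silently truncate the length
            throw new IllegalArgumentException("username exceeds 65535 bytes");
        }
        bodyDos.writeShort(nameBytes.length);   // 2-byte length prefix
        bodyDos.write(nameBytes);               // UTF-8 bytes
    } else if (payload instanceof TradeRequest tr) {
        bodyDos.writeLong(tr.timestamp());
        bodyDos.writeInt(tr.symbolId());
        bodyDos.writeLong(Math.round(tr.price() * 10000));  // fixed-point
        bodyDos.writeInt(tr.quantity());
    } else {
        // Fail loudly instead of sending a frame with an empty body
        throw new IllegalArgumentException("Unsupported payload type: " + payload);
    }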
    bodyDos.flush();
    byte[] body = bodyBaos.toByteArray();

    // Phase 2: write frame header + body to network
    networkDos.writeInt(msgType);   // 4-byte type
    networkDos.writeInt(body.length); // 4-byte length
    networkDos.write(body);           // body bytes
    networkDos.flush();               // push any buffered bytes to the socket now
}

// ── ByteBuffer alternative for big-endian binary serialization ────────
public static byte[] serializeTrade(long timestamp, int symbolId,
        double price, int quantity) {
    ByteBuffer buf = ByteBuffer.allocate(24);  // 8+4+8+4 bytes
    buf.order(ByteOrder.BIG_ENDIAN);           // match DataOutputStream
    buf.putLong(timestamp);
    buf.putInt(symbolId);
    buf.putLong(Math.round(price * 10000));    // fixed-point
    buf.putInt(quantity);
    return buf.array();   // ready to write to any OutputStream
}

// ── Writing a complete binary file format ─────────────────────────────
// Simple custom binary format: [8-byte magic][4-byte version][4-byte record_count][records...]
// Each record: [8-byte id][4-byte value][2-byte tag_length][N-byte tag]
public static void writeDataFile(Path path, List<DataRecord> records) throws IOException {
    try (DataOutputStream dos = new DataOutputStream(
            new BufferedOutputStream(new FileOutputStream(path.toFile()), 65536))) {

        // File header:
        dos.writeLong(0x4A415641_44415441L);  // "JAVADATA" magic bytes
        dos.writeInt(1);                       // format version 1
        dos.writeInt(records.size());          // record count

        // Records:
        for (DataRecord rec : records) {
            dos.writeLong(rec.id());
            dos.writeInt(rec.value());

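            byte[] tagBytes = rec.tag().getBytes(StandardCharsets.UTF_8);
            if (tagBytes.length > 0xFFFF) {    // writeShort keeps only the low 16 bits
                throw new IOException("Tag exceeds 65535 bytes for record " + rec.id());
            }
            dos.writeShort(tagBytes.length);   // 2-byte tag length (max 65535 bytes)
            dos.write(tagBytes);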
        }

        System.out.printf("Wrote %d records, %d total bytes%n",
            records.size(), dos.size());
    }
}

// ── Reading back the same format with DataInputStream ─────────────────
public static List<DataRecord> readDataFile(Path path) throws IOException {
    List<DataRecord> records = new ArrayList<>();
    try (DataInputStream dis = new DataInputStream(
            new BufferedInputStream(new FileInputStream(path.toFile()), 65536))) {

        // Verify magic:
        long magic = dis.readLong();
        if (magic != 0x4A415641_44415441L) throw new IOException("Not a JAVADATA file");

        int version = dis.readInt();
        if (version != 1) throw new IOException("Unsupported version: " + version);

        int count = dis.readInt();
        if (count < 0) throw new IOException("Corrupt record count: " + count);

        for (int i = 0; i < count; i++) {
            long id    = dis.readLong();
            int  value = dis.readInt();

            int tagLen    = dis.readUnsignedShort();
            byte[] tagBytes = new byte[tagLen];
            dis.readFully(tagBytes);
            String tag = new String(tagBytes, StandardCharsets.UTF_8);

            records.add(new DataRecord(id, value, tag));
        }
    }
    return records;
}
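
The receiving side of the two-phase framing pattern reads the header first and then uses readFully to collect exactly the announced number of body bytes. The following is a minimal sketch of that idea, assuming the [4-byte type][4-byte length][body] header used by sendMessage above; the class name FrameCodec, the Frame record, and the MAX_BODY_BYTES limit are hypothetical choices for the illustration, not part of any standard API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class FrameCodec {

    /** Hypothetical upper bound on body size; a real protocol would define its own. */
    public static final int MAX_BODY_BYTES = 1 << 20;

    /** Decoded frame: message type plus raw body bytes. */
    public record Frame(int type, byte[] body) {}

    /** Writes [4-byte type][4-byte length][body]. */
    public static void writeFrame(DataOutputStream out, int type, byte[] body)
            throws IOException {
        out.writeInt(type);
        out.writeInt(body.length);
        out.write(body);
        out.flush();
    }

    /** Reads one frame, rejecting lengths that are negative or above the limit. */
    public static Frame readFrame(DataInputStream in) throws IOException {
        int type = in.readInt();
        int length = in.readInt();
        if (length < 0 || length > MAX_BODY_BYTES) {
            throw new IOException("Invalid frame length: " + length);
        }
        byte[] body = new byte[length];
        in.readFully(body); // blocks until the whole body arrives, or throws EOFException
        return new Frame(type, body);
    }

    /** In-memory round trip used for demonstration; returns the decoded frame. */
    public static Frame roundTrip(int type, String text) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            writeFrame(new DataOutputStream(wire), type,
                    text.getBytes(StandardCharsets.UTF_8));
            return readFrame(new DataInputStream(
                    new ByteArrayInputStream(wire.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Frame f = roundTrip(7, "login:alice");
        System.out.println(f.type());                                     // 7
        System.out.println(new String(f.body(), StandardCharsets.UTF_8)); // login:alice
    }
}
```

Validating the length before allocating the array matters on a network connection, since a corrupt or hostile header could otherwise request an allocation of up to 2GB.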
