DataInputStream

DataInputStream wraps any InputStream and adds methods for reading Java primitive types in a machine-independent binary format. It reads boolean, byte, short, int, long, float, double, and char values from the underlying stream using fixed byte widths and big-endian byte order. This encoding is identical to the format written by DataOutputStream, which makes the two classes the natural pair for serializing and deserializing primitive data across files, network connections, or inter-process pipes. DataInputStream also provides readFully(), which blocks until exactly the specified number of bytes have been read, filling a buffer completely rather than returning a partial read as InputStream.read() may do. readUTF() reads a string written by DataOutputStream.writeUTF() in a modified UTF-8 format (a two-byte length prefix followed by the encoded string bytes). DataInputStream does no buffering of its own, so for file and socket sources it should normally wrap a BufferedInputStream for performance. This entry covers the full method API, the big-endian byte order contract, readFully() vs read() semantics, the modified UTF-8 format and its limitations, end-of-file detection, and composition patterns for binary protocol parsing.

Construction, Read Methods, and Big-Endian Contract

DataInputStream is constructed with new DataInputStream(InputStream in). The only parameter is the underlying InputStream; DataInputStream adds no buffer of its own. For file and socket sources it is therefore usually layered over a BufferedInputStream: new DataInputStream(new BufferedInputStream(new FileInputStream(path))).

The primitive read methods each read a fixed number of bytes and decode them as the corresponding Java type using big-endian byte order (most significant byte first). readBoolean() reads 1 byte: zero is false, any non-zero value is true. readByte() reads 1 signed byte (-128 to 127). readUnsignedByte() reads 1 byte and returns it as an int in the range 0-255. readShort() reads 2 bytes as a signed 16-bit big-endian integer. readUnsignedShort() reads 2 bytes as an unsigned 16-bit value (0-65535). readChar() reads 2 bytes as a big-endian UTF-16 char. readInt() reads 4 bytes as a signed 32-bit big-endian integer. readLong() reads 8 bytes as a signed 64-bit big-endian integer. readFloat() reads 4 bytes and interprets them as a 32-bit IEEE 754 float using Float.intBitsToFloat(). readDouble() reads 8 bytes and interprets them as a 64-bit IEEE 754 double using Double.longBitsToDouble().

All of these methods block until the required bytes are available, and all throw EOFException (a subclass of IOException) if the stream reaches end of file before all required bytes are read. This consistent behavior, throwing EOFException on premature EOF rather than returning -1 or a partial value, makes truncation easy to detect: an EOFException in the middle of a message means the stream ended (for example, the connection was closed) before the message was complete.

The big-endian byte order is a fixed contract. If the data source uses little-endian byte order (common for binary files written by C or C++ programs on x86, and for formats such as BMP and WAV; TIFF declares its byte order in the file header and can be either), DataInputStream decodes multi-byte values with their bytes in the wrong order. To handle little-endian data, use java.nio.ByteBuffer with order(ByteOrder.LITTLE_ENDIAN), or read the raw bytes and reverse them manually (for example with Integer.reverseBytes()).
Java
// ── Construction: always wrap with BufferedInputStream ────────────────
try (DataInputStream dis = new DataInputStream(
        new BufferedInputStream(new FileInputStream("data.bin")))) {
    // reads from file with 8192-byte buffer
}

// ── Reading all primitive types ───────────────────────────────────────
try (DataInputStream dis = new DataInputStream(
        new BufferedInputStream(new FileInputStream("primitives.bin")))) {

    boolean flag    = dis.readBoolean();       // 1 byte: 0=false, nonzero=true
    byte    b       = dis.readByte();          // 1 byte signed: -128 to 127
    int     ub      = dis.readUnsignedByte();  // 1 byte as int: 0 to 255
    short   s       = dis.readShort();         // 2 bytes big-endian signed
    int     us      = dis.readUnsignedShort(); // 2 bytes as int: 0 to 65535
    char    c       = dis.readChar();          // 2 bytes big-endian UTF-16 char
    int     i       = dis.readInt();           // 4 bytes big-endian signed
    long    l       = dis.readLong();          // 8 bytes big-endian signed
    float   f       = dis.readFloat();         // 4 bytes IEEE 754 big-endian
    double  d       = dis.readDouble();        // 8 bytes IEEE 754 big-endian

    System.out.printf("bool=%b byte=%d ubyte=%d short=%d ushort=%d%n", flag, b, ub, s, us);
    System.out.printf("char=%c int=%d long=%d float=%f double=%f%n", c, i, l, f, d);
}

// ── Big-endian byte layout visualization ─────────────────────────────
// Big-endian source for 0x12345678 (what DataOutputStream.writeInt() produces):
// Stream bytes:   0x12  0x34  0x56  0x78
// readInt() sees: 0x12  0x34  0x56  0x78 → 0x12345678 (correct)
// Little-endian source for 0x12345678:
// Stream bytes:   0x78  0x56  0x34  0x12
// readInt() sees: 0x78  0x56  0x34  0x12 → 0x78563412 (WRONG: bytes treated as big-endian)

// ── EOFException on premature stream end ──────────────────────────────
byte[] partialData = {0x00, 0x00};  // only 2 bytes — not enough for readInt() (needs 4)
try (DataInputStream dis = new DataInputStream(
        new ByteArrayInputStream(partialData))) {
    int value = dis.readInt();  // throws EOFException — stream ended after 2 bytes
} catch (EOFException e) {
    System.out.println("Stream ended before readInt could read 4 bytes");
}

// ── Little-endian: use ByteBuffer instead of DataInputStream ──────────
byte[] leBytes = {0x78, 0x56, 0x34, 0x12};  // 0x12345678 in little-endian
ByteBuffer bb = ByteBuffer.wrap(leBytes).order(ByteOrder.LITTLE_ENDIAN);
int leValue = bb.getInt();   // 0x12345678 — correct little-endian interpretation
System.out.printf("Little-endian: 0x%08X%n", leValue);
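
The pairing with DataOutputStream and the fixed big-endian layout described above can be checked with an in-memory round trip. The following is a minimal sketch, not a reference implementation: the class name EndianRoundTrip and its helper methods are hypothetical names chosen for illustration, and checked IOExceptions are wrapped only to keep the helpers easy to call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class EndianRoundTrip {

    /** Encodes one int the way DataOutputStream does: 4 bytes, most significant byte first. */
    static byte[] encodeInt(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Decodes 4 bytes with DataInputStream, which always assumes big-endian order. */
    static int decodeBigEndian(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes 4 little-endian bytes by reading them big-endian and then swapping the byte order. */
    static int decodeLittleEndian(byte[] data) {
        return Integer.reverseBytes(decodeBigEndian(data));
    }

    public static void main(String[] args) {
        byte[] encoded = encodeInt(0x12345678);
        for (byte b : encoded) {
            System.out.printf("%02X ", b);                              // 12 34 56 78
        }
        System.out.println();
        System.out.printf("0x%08X%n", decodeBigEndian(encoded));        // 0x12345678

        byte[] littleEndian = {0x78, 0x56, 0x34, 0x12};
        System.out.printf("0x%08X%n", decodeBigEndian(littleEndian));    // 0x78563412 (misread)
        System.out.printf("0x%08X%n", decodeLittleEndian(littleEndian)); // 0x12345678
    }
}
```

Swapping with Integer.reverseBytes() is convenient for an occasional little-endian field; for formats that are little-endian throughout, a ByteBuffer with an explicit byte order is probably the tidier choice.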

readFully(), readUTF(), and End-of-File Detection

readFully(byte[] b) and readFully(byte[] b, int off, int len) are the most important methods in DataInputStream beyond the primitives. The standard InputStream.read(byte[]) makes a best-effort read: it may return fewer bytes than requested, particularly on network connections where data arrives in packets. readFully() blocks until exactly len bytes have been read, accumulating partial reads internally. If the stream reaches end of file before all bytes are read, it throws EOFException. readFully() is essential for binary protocol parsing where each message has a precisely known size.

readUTF() reads a string encoded in DataOutputStream's modified UTF-8 format: a 2-byte big-endian unsigned short specifying the number of bytes in the encoded string, followed by that many bytes of modified UTF-8 encoded text. The 2-byte length prefix limits strings to 65535 bytes of modified UTF-8, which can be far fewer than 65535 characters: characters from U+0800 to U+FFFF take 3 bytes each, and supplementary Unicode characters take 6 bytes each (two 3-byte encoded surrogates). This format is not standard UTF-8: U+0000 is encoded as the 2-byte sequence 0xC0 0x80 rather than a single 0x00 byte, so a zero byte never appears in the encoded string bytes.

End-of-file detection: the read() methods inherited from InputStream return -1 at EOF, while every DataInput method, including readUnsignedByte(), the primitive-returning methods (readInt(), readDouble(), etc.), and readFully(), throws EOFException. The difference reflects where the methods come from: read() follows the InputStream contract, whereas the typed methods are specified by the DataInput interface, which reports end of stream by exception because most of its return types have no spare value to signal EOF. For protocol parsing with DataInputStream, an EOFException thrown while reading the first field of a message is the usual sign of clean stream termination at a message boundary. An EOFException mid-message indicates a truncated stream (connection dropped, file corrupted).

skipBytes(int n) attempts to skip n bytes and returns the actual number skipped, which may be less than n. Like InputStream.skip(), it does not guarantee skipping all n bytes, and it never throws EOFException. For guaranteed skipping, use readFully() into a discard buffer.
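
The byte-level differences between modified UTF-8 and standard UTF-8 can be made visible by dumping what DataOutputStream.writeUTF() produces. This is a small illustrative sketch; the class ModifiedUtf8Demo and its helpers are hypothetical names, not JDK API.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ModifiedUtf8Demo {

    /** Returns the bytes written by writeUTF(), including the 2-byte length prefix. */
    static byte[] writeUtfBytes(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Formats bytes as space-separated two-digit hex values. */
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String withNul = "A\u0000";
        // Length prefix 00 03, then 'A', then U+0000 as the two bytes C0 80:
        System.out.println(hex(writeUtfBytes(withNul)));                        // 00 03 41 C0 80
        // Standard UTF-8 encodes U+0000 as a single zero byte:
        System.out.println(hex(withNul.getBytes(StandardCharsets.UTF_8)));      // 41 00

        String supplementary = "\uD83D\uDE00";  // U+1F600, stored as a surrogate pair
        // Modified UTF-8 encodes each surrogate separately (3 bytes each):
        System.out.println(writeUtfBytes(supplementary).length - 2);            // 6
        // Standard UTF-8 encodes the code point in 4 bytes:
        System.out.println(supplementary.getBytes(StandardCharsets.UTF_8).length); // 4
    }
}
```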
Java
// ── readFully: guaranteed to fill the buffer completely ───────────────
try (DataInputStream dis = new DataInputStream(
        new BufferedInputStream(new FileInputStream("messages.bin")))) {

    // Each message: 4-byte length header + N bytes body
    while (true) {
        int length;
        try {
            length = dis.readInt();   // read 4-byte length — EOFException at clean EOF
        } catch (EOFException e) {
            System.out.println("No more messages");
            break;
        }

        byte[] body = new byte[length];
        dis.readFully(body);   // blocks until EXACTLY 'length' bytes are read
                               // InputStream.read() might return partial data on network streams
        processMessage(body);
    }
}

// ── readFully vs read: the critical difference ────────────────────────
byte[] buf = new byte[1024];

// InputStream.read(buf): may return any count from 1 to 1024, or -1 at end of stream
// Particularly on networks: returns when ONE packet arrives (possibly just 512 bytes)
int bytesRead = inputStream.read(buf);  // may be 512, not 1024

// DataInputStream.readFully(buf): blocks until EXACTLY 1024 bytes are read
// or EOFException if stream ends first
dis.readFully(buf);   // always fills buf completely

// ── readFully with offset and length ─────────────────────────────────
byte[] largeBuffer = new byte[4096];
dis.readFully(largeBuffer, 0, 100);    // fill bytes 0-99 from stream
dis.readFully(largeBuffer, 100, 200);  // fill bytes 100-299 from stream

// ── readUTF: reads DataOutputStream.writeUTF() format ─────────────────
try (DataInputStream dis2 = new DataInputStream(
        new BufferedInputStream(new FileInputStream("strings.bin")))) {

    String s1 = dis2.readUTF();   // reads 2-byte length prefix, then UTF bytes
    String s2 = dis2.readUTF();
    System.out.println(s1 + " " + s2);
}

// readUTF() limitations:
// - Only works with DataOutputStream.writeUTF() format (not standard UTF-8 files)
// - Max 65535 bytes of modified UTF-8 per string
// - U+0000 encoded as 2-byte sequence (not standard UTF-8)

// For general UTF-8 strings from files/network:
// Read length prefix manually, then readFully into byte[], then new String(bytes, UTF_8)
byte[] strBytes = new byte[dis.readInt()];  // 4-byte length
dis.readFully(strBytes);
String s = new String(strBytes, StandardCharsets.UTF_8);  // standard UTF-8 decode

// ── EOF detection patterns ────────────────────────────────────────────
// At message boundary (clean termination):
try {
    while (true) {
        int msgType = dis.readInt();    // EOFException here = clean end of stream
        int msgLen  = dis.readInt();
        byte[] payload = new byte[msgLen];
        dis.readFully(payload);         // EOFException here = truncated stream (error)
        dispatch(msgType, payload);
    }
} catch (EOFException e) {
    // At this point: either clean EOF (after last message) or truncated stream
    // Protocol design determines which: check if we're at a message boundary
    System.out.println("Stream ended");
}

// ── skipBytes: skips UP TO n bytes (best-effort, may skip fewer) ──────
int toSkip = 16;
int skipped = dis.skipBytes(toSkip);   // may skip fewer than 16
if (skipped < toSkip) {
    // Guarantee exact skip with readFully:
    byte[] discard = new byte[toSkip - skipped];
    dis.readFully(discard);
}
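
The end-of-file rules and the skipBytes() caveat can be exercised against in-memory streams. The sketch below is one possible way to do it, under the assumption that a small helper is acceptable: EofBehaviorDemo, skipFully, and byteAfterSkipping are hypothetical names, and skipFully is not a JDK method.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

class EofBehaviorDemo {

    /** read() is inherited from InputStream and reports end of stream as -1. */
    static int readAtEof() {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(new byte[0]))) {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** readUnsignedByte() is a DataInput method and reports end of stream by exception. */
    static boolean readUnsignedByteThrowsAtEof() {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(new byte[0]))) {
            in.readUnsignedByte();
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Skips exactly count bytes, or throws EOFException if the stream ends first. */
    static void skipFully(DataInputStream in, int count) throws IOException {
        int remaining = count;
        while (remaining > 0) {
            int skipped = in.skipBytes(remaining);
            if (skipped == 0) {
                in.readByte();   // no progress: consume one byte, or fail with EOFException
                skipped = 1;
            }
            remaining -= skipped;
        }
    }

    /** Returns the unsigned byte after the skipped region, or -1 if the data is too short. */
    static int byteAfterSkipping(byte[] data, int count) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            skipFully(in, count);
            return in.readUnsignedByte();
        } catch (EOFException e) {
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAtEof());                                   // -1
        System.out.println(readUnsignedByteThrowsAtEof());                 // true
        System.out.println(byteAfterSkipping(new byte[] {1, 2, 3, 4}, 3)); // 4
        System.out.println(byteAfterSkipping(new byte[] {1, 2}, 3));       // -1
    }
}
```

Falling back to readByte() when skipBytes() makes no progress avoids allocating a discard buffer while still turning a short stream into an EOFException.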

Binary Protocol Parsing and Composition Patterns

DataInputStream's primary use case is parsing binary protocols and binary file formats where data is laid out in a precisely defined sequence of typed fields. Network protocols (custom TCP protocols, binary game protocols, financial market data feeds), big-endian file formats (Java class files, PNG images), and inter-process communication over pipes or sockets all use binary layouts that DataInputStream can parse field by field. Little-endian formats such as ZIP archives need byte swapping or a ByteBuffer instead.

The standard composition for network protocol parsing is DataInputStream(BufferedInputStream(socket.getInputStream())). The BufferedInputStream reduces the number of native read() system calls on the socket; the DataInputStream provides the typed read methods. For protocols with framing (each message prefixed with its length), the pattern is: readInt() for the length, then readFully() for the message body, then parse the body using a ByteArrayInputStream wrapped in a second DataInputStream, so that body parsing cannot run into the next message.

A critical design consideration: the BufferedInputStream reads ahead from the socket, so its buffer may already hold bytes of the following message. The byte array filled by readFully() contains exactly one message's bytes with no overlap into the next, and a parsing bug that reads too far in the nested DataInputStream hits EOFException instead of consuming the next message. Parsing the ByteArrayInputStream with a nested DataInputStream therefore cleanly separates each message's parsing from the stream management.

For high-performance binary I/O where DataInputStream becomes a bottleneck (each typed read goes through one or more calls on the wrapped stream), java.nio.ByteBuffer is the alternative. ByteBuffer.wrap(byte[]) creates a buffer over an existing byte array; getInt(), getLong(), etc. read primitive values using the buffer's byte order, which is big-endian unless changed with order(). The NIO ByteBuffer approach is particularly useful when reading messages into a ByteBuffer via NIO channels (SocketChannel.read(ByteBuffer)) and then parsing the buffer in place, which avoids an intermediate copy into a byte array.
Java
// ── Binary protocol parsing: message framing ─────────────────────────
public class BinaryProtocolParser {
    private final DataInputStream dis;

    public BinaryProtocolParser(InputStream raw) {
        // BufferedInputStream with a 64 KiB buffer batches many small reads into fewer socket reads:
        this.dis = new DataInputStream(new BufferedInputStream(raw, 65536));
    }

    public Message readMessage() throws IOException {
        // Frame header: [4-byte type][4-byte length]
        int msgType;
        try {
            msgType = dis.readInt();    // throws EOFException at clean EOF
        } catch (EOFException e) {
            return null;   // clean end of stream
        }
        int msgLen = dis.readInt();
        if (msgLen < 0) {
            throw new IOException("Negative message length: " + msgLen);
        }

        // Read exactly msgLen bytes into a buffer:
        byte[] body = new byte[msgLen];
        dis.readFully(body);   // blocks until complete — no partial reads

        // Parse body independently:
        return parseBody(msgType, body);
    }

    private Message parseBody(int type, byte[] body) throws IOException {
        // Wrap body bytes in DataInputStream for field parsing:
        try (DataInputStream bodyDis = new DataInputStream(
                new ByteArrayInputStream(body))) {

            return switch (type) {
                case 0x01 -> {
                    // LOGIN: [4-byte user_id][2-byte flags][2-byte name length][UTF-8 username bytes]
                    int userId  = bodyDis.readInt();
                    short flags = bodyDis.readShort();
                    int nameLen = bodyDis.readUnsignedShort();
                    byte[] nameBytes = new byte[nameLen];
                    bodyDis.readFully(nameBytes);
                    String username = new String(nameBytes, StandardCharsets.UTF_8);
                    yield new LoginMessage(userId, flags, username);
                }
                case 0x02 -> {
                    // TRADE: [8-byte timestamp][4-byte symbol_id][8-byte price][4-byte quantity]
                    long  timestamp = bodyDis.readLong();
                    int   symbolId  = bodyDis.readInt();
                    long  priceRaw  = bodyDis.readLong();  // price * 10000 (fixed-point)
                    int   quantity  = bodyDis.readInt();
                    yield new TradeMessage(timestamp, symbolId, priceRaw / 10000.0, quantity);
                }
                default -> throw new IOException("Unknown message type: " + type);
            };
        }
    }
}

// ── PNG signature parsing with DataInputStream ────────────────────────
public static boolean isPng(Path path) throws IOException {
    try (DataInputStream dis = new DataInputStream(
            new BufferedInputStream(new FileInputStream(path.toFile())))) {
        // PNG signature: 8 bytes: 0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A
        byte[] sig = new byte[8];
        try {
            dis.readFully(sig);
        } catch (EOFException e) {
            return false;   // file too small
        }
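        // Compare all 8 signature bytes, not only the first four:
        return sig[0] == (byte) 0x89
            && sig[1] == 'P' && sig[2] == 'N' && sig[3] == 'G'
            && sig[4] == 0x0D && sig[5] == 0x0A
            && sig[6] == 0x1A && sig[7] == 0x0A;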
    }
}

// ── NIO ByteBuffer alternative for high performance ───────────────────
// Read binary message via NIO channel, parse with ByteBuffer:
SocketChannel channel = SocketChannel.open(new InetSocketAddress("host", 8080));

ByteBuffer headerBuf = ByteBuffer.allocateDirect(8);  // 4-byte type + 4-byte length
headerBuf.order(ByteOrder.BIG_ENDIAN);  // match DataOutputStream's byte order

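// Read exactly 8 bytes; read() returns -1 if the peer closes the connection,
// so check for it instead of looping forever:
while (headerBuf.hasRemaining()) {
    if (channel.read(headerBuf) < 0) throw new EOFException("Connection closed in header");
}
headerBuf.flip();

int msgType = headerBuf.getInt();   // reads 4 bytes big-endian
int msgLen  = headerBuf.getInt();   // reads 4 bytes big-endian

ByteBuffer bodyBuf = ByteBuffer.allocateDirect(msgLen);
while (bodyBuf.hasRemaining()) {
    if (channel.read(bodyBuf) < 0) throw new EOFException("Connection closed in body");
}
bodyBuf.flip();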

// Parse body fields:
int userId   = bodyBuf.getInt();
double price = bodyBuf.getDouble();
// etc.
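
Because ByteBuffer defaults to big-endian order, the same frame bytes can be parsed either way with identical results. The sketch below compares the two styles on an in-memory frame; the class FrameParsingComparison, its method names, and the MAX_BODY_BYTES limit are assumptions made for illustration and are not part of any protocol described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

class FrameParsingComparison {

    // Hypothetical upper bound on body size; a real protocol would define its own limit.
    static final int MAX_BODY_BYTES = 1 << 20;

    /** Builds one frame: [4-byte type][4-byte length][body], all big-endian. */
    static byte[] buildFrame(int type, byte[] body) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(type);
            out.writeInt(body.length);
            out.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Stream style: readInt() for the header fields, readFully() for the body. */
    static byte[] bodyViaDataInputStream(byte[] frame) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame))) {
            in.readInt();                        // message type, ignored in this sketch
            int length = in.readInt();
            if (length < 0 || length > MAX_BODY_BYTES) {
                throw new IOException("Implausible body length: " + length);
            }
            byte[] body = new byte[length];
            in.readFully(body);
            return body;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Buffer style: ByteBuffer is big-endian by default, so getInt() matches readInt(). */
    static byte[] bodyViaByteBuffer(byte[] frame) {
        ByteBuffer buffer = ByteBuffer.wrap(frame);
        buffer.getInt();                         // message type, ignored in this sketch
        int length = buffer.getInt();
        if (length < 0 || length > buffer.remaining()) {
            throw new IllegalArgumentException("Implausible body length: " + length);
        }
        byte[] body = new byte[length];
        buffer.get(body);
        return body;
    }

    public static void main(String[] args) {
        byte[] frame = buildFrame(0x01, new byte[] {10, 20, 30});
        System.out.println(Arrays.toString(bodyViaDataInputStream(frame))); // [10, 20, 30]
        System.out.println(Arrays.toString(bodyViaByteBuffer(frame)));      // [10, 20, 30]
    }
}
```

Validating the length field before allocating is worth keeping in either style, since the value comes straight from untrusted input.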
