FileInputStream

FileInputStream is a concrete InputStream subclass that reads raw bytes from a file on the file system. It is one of the most fundamental I/O classes in Java, providing a direct connection between the Java byte stream model and the OS file descriptor for reading. FileInputStream opens a file, holds an OS file descriptor for it, and reads bytes by delegating to the OS read() system call. Its read() methods are direct I/O — unbuffered — making BufferedInputStream an essential wrapper for any usage pattern that reads more than a few bytes. FileInputStream can be constructed from a String path, a File object, or (less commonly) an existing FileDescriptor. It provides the standard InputStream API plus getChannel() for NIO integration and getFD() for low-level file descriptor access. This entry covers FileInputStream construction and its open-file resource contract, the performance implications of unbuffered reading, reading specific file types (binary, structured, large), the getChannel() bridge to FileChannel and memory-mapped I/O, the complete interaction between FileInputStream and NIO.2, common mistakes including the partial-read bug, and when to use NIO.2's Files methods instead.

Construction, the File Descriptor Contract, and Buffering

FileInputStream has three constructors: FileInputStream(String name) opens the file at the given path; FileInputStream(File file) opens the file represented by the File object; FileInputStream(FileDescriptor fdObj) wraps an existing file descriptor (rarely used directly; it appears mostly in library code and for standard input). The two path-based constructors open the file immediately: the constructor makes a system call, acquires an OS file descriptor, and returns a FileInputStream connected to that descriptor. The FileDescriptor constructor opens nothing; it attaches the stream to a descriptor that already exists. If the file does not exist, the constructor throws FileNotFoundException (a subclass of IOException). If the path refers to a directory rather than a file, or the file cannot be opened for reading for some other reason, FileNotFoundException is thrown as well (on Linux the message typically says "Is a directory").

Opening a file acquires an OS resource, the file descriptor, that is distinct from the Java object. The garbage collector reclaims the Java object but makes no promise about when, or whether, the descriptor is released. If FileInputStream objects are created without being closed, file descriptors accumulate until the process hits its descriptor limit (the default soft limit is commonly 1024 per process on Linux; configurable but finite). This is the file descriptor leak problem. Older JDKs closed forgotten streams from a finalizer; finalize() on FileInputStream was deprecated in Java 9 and later releases replaced it with a Cleaner-based mechanism. Either way, cleanup happens only after the collector discovers that the stream is unreachable, which may be much later or never, whichever collector is in use. Always close FileInputStream in a try-with-resources block; relying on the cleanup mechanism instead of explicit close() is incorrect.

FileInputStream's read() methods are unbuffered: every call goes to the OS. Each call to read() makes one system call for a single byte; each call to read(buf, 0, 1024) makes one system call for up to 1024 bytes. Because every system call carries fixed overhead (a transition into kernel mode, on the order of hundreds of nanoseconds on modern hardware), reading one byte at a time is extremely slow for any significant file. Wrap the stream in BufferedInputStream for byte-at-a-time or small-chunk reading. For array-based reading with a large buffer (tens of kilobytes), BufferedInputStream adds little, because each call already amortizes the system call cost over many bytes; the OS performs read-ahead for sequential access in either case.

The NIO.2 alternative for simple small-file reading, Files.readAllBytes(Path), opens the file, reads all content, and closes it in a single method call. For files that fit comfortably in memory (as a rough guide, less than ~100MB), Files.readAllBytes() is cleaner and less error-prone than managing FileInputStream directly. For large files, streaming access through a buffered FileInputStream is more appropriate than loading everything into memory.
Java
// ── Three constructors ────────────────────────────────────────────────
// From String path:
FileInputStream fis1 = new FileInputStream("/home/user/data.bin");

// From File object:
File f = new File("/home/user/data.bin");
FileInputStream fis2 = new FileInputStream(f);

// From FileDescriptor (rare — usually for stdin):
FileInputStream fromStdin = new FileInputStream(FileDescriptor.in);

// ── FileNotFoundException — thrown at construction ─────────────────────
try {
    FileInputStream missing = new FileInputStream("/nonexistent/path/file.bin");
} catch (FileNotFoundException e) {
    System.err.println("File not found: " + e.getMessage());
    // Message includes path — useful for diagnosis
}

// FileNotFoundException for directory:
try {
    FileInputStream dir = new FileInputStream("/home/user");   // Is a directory
} catch (FileNotFoundException e) {
    System.err.println("Cannot read directory: " + e.getMessage());
}

// ── Always use try-with-resources — file descriptor leak without it ────
// WRONG: close() is skipped if read() throws, so the file descriptor leaks
FileInputStream leaky = new FileInputStream("data.bin");
int b = leaky.read();
leaky.close();   // not called if read() throws

// CORRECT: try-with-resources guarantees close() even on exception:
try (FileInputStream fis = new FileInputStream("data.bin")) {
    int b2 = fis.read();
}   // fis.close() always called here

// ── Unbuffered reading performance — always wrap in BufferedInputStream
// Unbuffered: one system call per byte
long start = System.nanoTime();
try (FileInputStream fis = new FileInputStream("1mb.bin")) {
    while (fis.read() != -1) {}  // 1,000,000 individual read() calls
}
System.out.printf("Unbuffered: %.0f ms%n", (System.nanoTime()-start)/1e6); // ~500ms

// Buffered: one system call per 8KB
start = System.nanoTime();
try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream("1mb.bin"))) {
    while (bis.read() != -1) {}  // reads fill 8KB buffer; ~128 system calls total
}
System.out.printf("Buffered: %.0f ms%n", (System.nanoTime()-start)/1e6); // ~5ms

// Array-based read with buffer:
start = System.nanoTime();
try (FileInputStream fis = new FileInputStream("1mb.bin")) {
    byte[] buf = new byte[65536];
    while (fis.read(buf) != -1) {}  // ~16 system calls for 1MB
}
System.out.printf("Chunked: %.0f ms%n", (System.nanoTime()-start)/1e6); // ~2ms

// ── NIO.2 alternatives for simple cases ──────────────────────────────
// Small file: read entire content at once (cleaner than FileInputStream)
byte[] content = Files.readAllBytes(Path.of("config.json"));  // ~1 line, no resource mgmt

// Medium file: stream with buffering (explicit control)
try (BufferedInputStream bis = new BufferedInputStream(
        Files.newInputStream(Path.of("data.bin")))) {
    // Files.newInputStream() returns an InputStream backed by FileChannel
}

// Large file: avoid loading all into memory
try (FileInputStream fis = new FileInputStream("large.bin");
     BufferedInputStream bis = new BufferedInputStream(fis, 65536)) {
    byte[] chunk = new byte[65536];
    int n;
    while ((n = bis.read(chunk)) != -1) {
        processChunk(chunk, n);
    }
}
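
The effect of buffering can also be observed without timing anything, by counting how many read calls actually reach the underlying stream. The sketch below is an illustration only: CountingInputStream and underlyingReads are hypothetical helpers, and an in-memory ByteArrayInputStream stands in for the file so that the example is deterministic. With a real FileInputStream each counted call would correspond to a system call.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadCountDemo {

    /** Hypothetical helper: counts how many read calls reach the wrapped stream. */
    static final class CountingInputStream extends FilterInputStream {
        long calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Drains size bytes one byte at a time; returns the number of reads that hit the source. */
    static long underlyingReads(int size, boolean buffered) {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        try (InputStream in = buffered ? new BufferedInputStream(source, 8192) : source) {
            while (in.read() != -1) {
                // discard the byte; only the call count matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        int size = 64 * 1024;
        System.out.println("unbuffered source reads: " + underlyingReads(size, false));
        System.out.println("buffered source reads:   " + underlyingReads(size, true));
    }
}
```

The unbuffered count grows with the file size, one call per byte plus the final end-of-stream call, while the buffered count grows only with the number of 8KB refills.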

Binary File Reading, Partial Read Bug, and Structured Data

The most prevalent bug in FileInputStream usage is failing to handle partial reads. read(byte[] buf, int off, int len) returns the number of bytes actually read, which may be less than len even when more data remains. The InputStream contract promises only that at least one byte is read (or -1 is returned at end-of-stream), and the OS is free to return fewer bytes than requested, for example when the file is on a network share, when the read is interrupted, or when the source is a pipe or special file. For a local regular file, partial reads before end-of-file are rare, which is exactly why the bug survives testing and then surfaces on network files, pipes, or special files. The correct pattern for reading a known number of bytes is one of: a loop that accumulates until the count is reached (verbose); InputStream.readNBytes(byte[], int, int) (Java 9+) or readNBytes(int) (Java 11+), which keep reading until the requested count or end-of-stream is reached and report how much they got; or DataInputStream.readFully(). For reading until end-of-stream into an array, readAllBytes() (Java 9+) is the cleanest option for files that fit in memory.

For structured binary formats (protocol buffers, PNG headers, ZIP local file headers, custom binary protocols), DataInputStream wrapping a BufferedInputStream wrapping FileInputStream is the standard approach. DataInputStream.readInt() reads exactly 4 bytes in big-endian order and throws EOFException if the stream ends before all four arrive. DataInputStream always decodes multi-byte values as big-endian; for little-endian fields (ZIP headers, for instance), read the bytes with readFully() and decode them through a ByteBuffer set to ByteOrder.LITTLE_ENDIAN. DataInputStream.readFully(byte[] b) (specified by the DataInput interface) reads exactly b.length bytes, throwing EOFException if end-of-stream is reached first; this is the definitive solution to the partial-read problem for fixed-size records.

For very large binary files (multi-gigabyte), memory-mapped I/O via FileChannel and MappedByteBuffer can be considerably more efficient than stream-based reading, especially for random access: the file is mapped directly into the process's virtual address space, and page faults bring in data on demand without copying through Java heap buffers. A single mapping is limited to Integer.MAX_VALUE bytes, so larger files are mapped in windows. This approach requires NIO, not FileInputStream directly, but FileInputStream.getChannel() provides the FileChannel needed to create a MappedByteBuffer.
Java
// ── Partial read bug — the most common FileInputStream mistake ────────
byte[] header = new byte[16];

// WRONG: assumes read() fills the entire buffer
try (FileInputStream fis = new FileInputStream("protocol.bin")) {
    fis.read(header);                    // reads up to 16 bytes — may read fewer!
    int version = (header[0] << 8) | (header[1] & 0xFF);  // may be wrong if partial read
}

// CORRECT option 1: loop until full
try (FileInputStream fis = new FileInputStream("protocol.bin")) {
    int totalRead = 0;
    while (totalRead < header.length) {
        int n = fis.read(header, totalRead, header.length - totalRead);
        if (n == -1) throw new EOFException("File too short: expected 16 bytes header");
        totalRead += n;
    }
    int version = ((header[0] & 0xFF) << 8) | (header[1] & 0xFF);
}

// CORRECT option 2: readNBytes(byte[], int, int), inherited from InputStream (Java 9+)
try (DataInputStream dis = new DataInputStream(new FileInputStream("protocol.bin"))) {
    int n = dis.readNBytes(header, 0, header.length);
    if (n < header.length) throw new EOFException("File too short");
}

// CORRECT option 3: DataInputStream.readFully() (available since Java 1.0)
try (DataInputStream dis = new DataInputStream(
        new BufferedInputStream(new FileInputStream("protocol.bin")))) {
    dis.readFully(header);   // throws EOFException if stream ends before header.length bytes
    int version = ((header[0] & 0xFF) << 8) | (header[1] & 0xFF);  // first two header bytes, big-endian
}

// ── Structured binary format reading ──────────────────────────────────
// PNG file header parsing example:
try (DataInputStream dis = new DataInputStream(
        new BufferedInputStream(new FileInputStream("image.png")))) {

    // PNG magic bytes: 137 80 78 71 13 10 26 10
    byte[] magic = new byte[8];
    dis.readFully(magic);
    if (magic[0] != (byte)0x89 || magic[1] != 'P' || magic[2] != 'N' || magic[3] != 'G') {
        throw new IOException("Not a PNG file");
    }

    // First chunk: IHDR
    int chunkLength = dis.readInt();     // 4 bytes big-endian — always 13 for IHDR
    byte[] chunkType = new byte[4];
    dis.readFully(chunkType);
    String type = new String(chunkType, StandardCharsets.US_ASCII);  // "IHDR"

    // IHDR data:
    int width  = dis.readInt();
    int height = dis.readInt();
    byte bitDepth     = dis.readByte();
    byte colorType    = dis.readByte();
    byte compression  = dis.readByte();
    byte filter       = dis.readByte();
    byte interlace    = dis.readByte();
    int crc           = dis.readInt();   // CRC32 of type+data

    System.out.printf("PNG: %dx%d, bitDepth=%d, colorType=%d%n",
        width, height, bitDepth, colorType);
}

// ── Memory-mapped I/O via getChannel() — for very large files ──────────
try (FileInputStream fis = new FileInputStream("large-database.bin");
     FileChannel channel = fis.getChannel()) {

    long fileSize = channel.size();
    System.out.printf("File size: %.2f GB%n", fileSize / 1e9);

    // Map the start of the file (a single mapping is limited to Integer.MAX_VALUE bytes):
    MappedByteBuffer mapped = channel.map(
        FileChannel.MapMode.READ_ONLY, 0, Math.min(fileSize, Integer.MAX_VALUE));

    // Read directly from virtual memory — no copy through heap:
    mapped.order(ByteOrder.LITTLE_ENDIAN);   // many binary formats are little-endian
    int magic = mapped.getInt();             // reads 4 bytes at current position
    long timestamp = mapped.getLong();
    System.out.printf("Magic: 0x%08X  Timestamp: %d%n", magic, timestamp);

    // Random access: seek to specific offset
    mapped.position(1024);                   // seek to byte 1024
    byte recordType = mapped.get();          // read one byte

    // For files > 2GB: map in chunks
    long chunkSize = 512L * 1024 * 1024;    // 512MB chunks
    for (long offset = 0; offset < fileSize; offset += chunkSize) {
        long len = Math.min(chunkSize, fileSize - offset);
        MappedByteBuffer chunk = channel.map(FileChannel.MapMode.READ_ONLY, offset, len);
        processChunkBuffer(chunk);
    }
}

// ── getFD() — accessing the raw file descriptor ───────────────────────
try (FileInputStream fis = new FileInputStream("data.bin")) {
    FileDescriptor fd = fis.getFD();
    fd.sync();   // forces OS-buffered writes to the device (fsync); nothing to flush for a read-only stream
    // Rarely needed for reads; getFD() is mainly useful when the descriptor must be shared or passed to native code
    System.out.println("FD valid: " + fd.valid());
}
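
Partial reads are hard to reproduce with a local file, so the following sketch forces them with a hypothetical TrickleInputStream that never hands back more than 3 bytes per call. It is a test double over in-memory data, not something the JDK provides, and the helper names are made up for the illustration. It contrasts a single read() call with DataInputStream.readFully() on the same data.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class PartialReadDemo {

    /** Hypothetical test double: delivers at most 3 bytes per array read. */
    static final class TrickleInputStream extends FilterInputStream {
        TrickleInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 3));
        }
    }

    /** Builds a trickling stream over size bytes whose values are 0, 1, 2, ... */
    static InputStream trickle(int size) {
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) i;
        }
        return new TrickleInputStream(new ByteArrayInputStream(data));
    }

    /** One read() call: returns however many bytes the stream chose to deliver. */
    static int naiveRead(int size, byte[] target) {
        try (InputStream in = trickle(size)) {
            return in.read(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** readFully(): loops until target is full; returns false if the stream ended first. */
    static boolean fullRead(int size, byte[] target) {
        try (DataInputStream in = new DataInputStream(trickle(size))) {
            in.readFully(target);
            return true;
        } catch (EOFException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] header = new byte[16];
        System.out.println("single read() delivered: " + naiveRead(32, header));
        System.out.println("readFully() filled header: " + fullRead(32, header));
        System.out.println("readFully() on short input: " + fullRead(8, new byte[16]));
    }
}
```

The single read() leaves most of the header array untouched even though the source holds enough data, which is the failure mode described above; readFully() either fills the array or reports the truncation.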

Reading Text Files, Skip, Available, and Complete Usage Patterns

FileInputStream is not the right class for reading text files directly. Reading text requires character decoding, which FileInputStream does not perform; it returns raw bytes. Passing those bytes to a String constructor without specifying a charset uses the default charset, which before Java 18 depended on the platform and locale settings (since Java 18, JEP 400 makes UTF-8 the default). The correct approach for text reading is new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8) or, more cleanly in modern Java, Files.newBufferedReader(Path.of(path), StandardCharsets.UTF_8).

skip(long n) on FileInputStream is implemented as a seek (lseek() on POSIX systems) for regular files, so it is O(1) and does not read the skipped bytes. The consequence is that its return value is not a reliable end-of-file check: the FileInputStream documentation states that skip() may move past the end of the file without an exception, returning a count that includes bytes beyond EOF, after which read() returns -1. On streams that cannot seek, skip() may skip fewer bytes than requested or, depending on the JDK version, throw IOException. DataInputStream.skipBytes() delegates to the underlying skip() and gives no stronger guarantee. When the skipped bytes must actually exist, reading them into a discarded array with readNBytes() is the most reliable option.

available() on FileInputStream is one of the rare cases where available() is informative: for a regular file it normally returns the remaining number of bytes (file size minus current position), capped at Integer.MAX_VALUE, whereas on network streams it reports only what is already buffered. The API still documents the value as an estimate: it can be wrong for files on some network file systems, for special files (devices, FIFOs), and for files that are being modified concurrently. Pre-sizing a byte array from available() therefore works for typical local files, but the read must still check how many bytes actually arrived; readAllBytes() avoids the question entirely.

The complete pattern for reading a binary file with structured content is: FileInputStream for the source, BufferedInputStream for performance, DataInputStream for typed reads, try-with-resources for resource safety, and readFully() (or readNBytes() with a length check) for complete reads of fixed-size structures. For reading an entire small binary file into memory, Files.readAllBytes(Path) is the cleanest and most correct option, requiring no explicit resource management and handling partial reads internally.
Java
// ── Text reading — never use FileInputStream directly for text ────────
// WRONG: platform-dependent charset, byte-level reading
try (FileInputStream fis = new FileInputStream("document.txt")) {
    byte[] bytes = fis.readAllBytes();
    String text = new String(bytes);   // uses platform default charset — WRONG
}

// CORRECT: explicit charset with InputStreamReader bridge
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream("document.txt"), StandardCharsets.UTF_8))) {
    String line;
    while ((line = br.readLine()) != null) {
        processLine(line);
    }
}

// BEST for text files (Java 11+):
String text = Files.readString(Path.of("document.txt"), StandardCharsets.UTF_8);
// Or for line-by-line:
try (Stream<String> lines = Files.lines(Path.of("document.txt"), StandardCharsets.UTF_8)) {
    lines.forEach(this::processLine);
}

// ── available() — meaningful for FileInputStream ──────────────────────
try (FileInputStream fis = new FileInputStream("data.bin")) {
    int remainingBytes = fis.available();   // = total file size (at start)
    System.out.printf("File size: %,d bytes%n", remainingBytes);

    // Pre-size buffer based on available(): fine for typical local files, but the value is only an estimate:
    byte[] buffer = new byte[remainingBytes];
    // But readAllBytes() is simpler and more correct:
    byte[] allData = fis.readAllBytes();
}

// ── skip() — reliable skipping ────────────────────────────────────────
try (FileInputStream fis = new FileInputStream("data.bin")) {
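    // Skip a 1024-byte header. FileInputStream.skip() seeks and may move past
    // end-of-file without reporting it, so read and discard the header instead:
    byte[] skippedHeader = fis.readNBytes(1024);   // readNBytes(int) is Java 11+
    if (skippedHeader.length < 1024) {
        throw new EOFException("File too short to skip 1024 bytes");
    }
    // Now read the data after the header:
    byte[] data = fis.readAllBytes();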
}

// ── Complete binary file reading patterns ─────────────────────────────

// Pattern 1: small binary file — read all bytes at once:
byte[] smallFile = Files.readAllBytes(Path.of("config.bin"));
ByteBuffer bb = ByteBuffer.wrap(smallFile).order(ByteOrder.BIG_ENDIAN);
int magic    = bb.getInt();
int version  = bb.getShort() & 0xFFFF;
// Process the whole file from memory — simplest and most correct

// Pattern 2: large binary file with records — streaming:
try (DataInputStream dis = new DataInputStream(
        new BufferedInputStream(new FileInputStream("records.bin"), 65536))) {

    // Read file header:
    byte[] fileHeader = new byte[32];
    dis.readFully(fileHeader);   // guaranteed: throws EOFException if < 32 bytes

    int recordCount = ByteBuffer.wrap(fileHeader).getInt(4);
    System.out.println("Records: " + recordCount);

    // Read each fixed-size record:
    byte[] record = new byte[64];
    for (int i = 0; i < recordCount; i++) {
        try {
            dis.readFully(record);   // throws EOFException on truncated file
        } catch (EOFException e) {
            throw new IOException("File truncated at record " + i, e);
        }
        processRecord(i, record);
    }
}

// Pattern 3: CSV-like text file disguised as "binary" reading — DON'T:
// Some developers use FileInputStream to read text files byte-by-byte and manually
// assemble lines. This is always wrong. Use BufferedReader.readLine() instead.

// Pattern 4: copy one file to another efficiently:
try (FileInputStream  src  = new FileInputStream("source.bin");
     FileOutputStream dest = new FileOutputStream("dest.bin")) {
    src.transferTo(dest);   // Java 9+: efficient, handles partial reads internally
}
// NIO.2 is cleaner:
Files.copy(Path.of("source.bin"), Path.of("dest.bin"),
    StandardCopyOption.REPLACE_EXISTING);
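
As a self-contained illustration of skipping a fixed-size header in a way that also proves the header bytes exist, the sketch below reads and discards the header with readNBytes(int) (Java 11+) and then returns the rest of the file. The class name, the helper names, and the temporary file are assumptions made for the example, not part of any library.

```java
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeaderSkipDemo {

    /** Reads and discards headerLen bytes, then returns the remaining bytes of the file. */
    static byte[] readPayload(Path file, int headerLen) throws IOException {
        try (FileInputStream fis = new FileInputStream(file.toFile())) {
            byte[] discarded = fis.readNBytes(headerLen);
            if (discarded.length < headerLen) {
                throw new EOFException("Header incomplete: got " + discarded.length
                        + " of " + headerLen + " bytes");
            }
            return fis.readAllBytes();
        }
    }

    /** Writes a temp file of fileSize bytes; returns the payload length, or -1 if the header is incomplete. */
    static int payloadLength(int fileSize, int headerLen) {
        try {
            Path tmp = Files.createTempFile("header-skip-demo", ".bin");
            try {
                Files.write(tmp, new byte[fileSize]);
                return readPayload(tmp, headerLen).length;
            } catch (EOFException e) {
                return -1;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("100-byte file, 16-byte header: " + payloadLength(100, 16));
        System.out.println("10-byte file, 16-byte header:  " + payloadLength(10, 16));
    }
}
```

Reading the header costs a small allocation compared with a seek, which is usually acceptable for headers; for very large skips, checking the file size through getChannel().size() before seeking is another option.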
