ELK Stack

The ELK Stack (Elasticsearch, Logstash, and Kibana) is one of the most widely used log aggregation and search platforms for microservices. Spring Boot services emit JSON-structured logs, which Filebeat ships to Logstash (or directly to Elasticsearch). Logstash parses and enriches the events, and Elasticsearch indexes them. Kibana provides full-text search, field filtering, dashboards, and alerting over the aggregated logs from all services.

ELK Stack Architecture

The ELK stack has four components in a modern deployment. Filebeat (a lightweight shipper) runs as a sidecar or DaemonSet, tails log files or collects container stdout, and forwards logs to Logstash. Logstash parses, enriches, and routes log events to Elasticsearch. Elasticsearch indexes and stores the logs. Kibana queries Elasticsearch and visualises the data.

Java

// ── Modern ELK data flow: ─────────────────────────────────────────────
//
//  Spring Boot Service
//    │  writes JSON logs to stdout / file
//    │
//    ▼
//  Filebeat (sidecar / DaemonSet)
//    │  tails log files or Docker/K8s container logs
//    │  forwards to Logstash (or directly to Elasticsearch)
//    │
//    ▼
//  Logstash (optional — skip for simple setups)
//    │  parse, filter, enrich, route
//    │  e.g. parse stack traces, add geo-IP, route by service name
//    │
//    ▼
//  Elasticsearch
//    │  indexes log documents
//    │  one index per service per day: logs-order-service-2025.05.30
//    │  stores: message, level, traceId, userId, service, timestamp, ...
//    │
//    ▼
//  Kibana
//    │  Discover: full-text search + field filters
//    │  Dashboards: error rate charts, latency histograms
//    │  Alerting: notify on error spike or keyword match
//    │  Lens: drag-and-drop visualisations

// ── Two common simpler alternatives to full ELK: ─────────────────────
//
// EFK Stack (Elasticsearch + Fluentd + Kibana):
//   Fluentd replaces Logstash — lighter, more cloud-native.
//   Common in Kubernetes environments.
//
// Grafana Loki (alternative to Elasticsearch):
//   Indexes only metadata (labels); log lines are stored compressed
//   but are not full-text indexed.
//   Much cheaper storage: trades indexed full-text search for lower cost.
//   Query language: LogQL (similar to PromQL).
//   Best choice when already using Grafana for metrics.

Docker Compose Setup

Running the full ELK stack locally requires Elasticsearch, Logstash, Kibana, and Filebeat. Elasticsearch needs significant memory: set the heap to at least 512 MB and, on Linux, raise vm.max_map_count to at least 262144. The stack below is configured for local development, with security disabled for simplicity.

yaml

# docker-compose.yml — full ELK stack + Spring Boot service
version: "3.9"

services:

  # ── Spring Boot microservice: ─────────────────────────────────────────
  order-service:
    build: ./order-service
    ports:
      - "8082:8082"
    environment:
      SPRING_PROFILES_ACTIVE: docker
    volumes:
      # Mount log directory so Filebeat can read it:
      - app-logs:/var/log/order-service
    logging:
      driver: json-file         # Docker writes stdout as JSON
      options:
        max-size: "10m"
        max-file: "3"

  # ── Elasticsearch: ────────────────────────────────────────────────────
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false      # disable for local dev
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ulimits:
      memlock:
        soft: -1
        hard: -1

  # ── Logstash: ─────────────────────────────────────────────────────────
  logstash:
    image: docker.elastic.co/logstash/logstash:8.13.0
    ports:
      - "5044:5044"       # Beats input (from Filebeat)
      - "5000:5000/tcp"   # TCP input (direct from app); the pipeline defines no UDP input
      - "9600:9600"       # Logstash monitoring API
    volumes:
      - ./elk/logstash/pipeline:/usr/share/logstash/pipeline
      - ./elk/logstash/config/logstash.yml:/usr/share/logstash/config/logstash.yml
    depends_on:
      - elasticsearch

  # ── Kibana: ───────────────────────────────────────────────────────────
  kibana:
    image: docker.elastic.co/kibana/kibana:8.13.0
    ports:
      - "5601:5601"
    environment:
      ELASTICSEARCH_HOSTS: http://elasticsearch:9200
    depends_on:
      - elasticsearch

  # ── Filebeat: ─────────────────────────────────────────────────────────
  filebeat:
    image: docker.elastic.co/beats/filebeat:8.13.0
    user: root
    volumes:
      - ./elk/filebeat/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
      - app-logs:/var/log/order-service:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    depends_on:
      - logstash

volumes:
  elasticsearch-data:
  app-logs:

Logstash Pipeline Configuration

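A Logstash pipeline has three stages: input (where logs come from), filter (how to parse and enrich them), and output (where to send them). For Spring Boot JSON logs, the filter stage parses the JSON payload, sets @timestamp from the log's own timestamp, promotes key fields to the top level, and computes the Elasticsearch index name. Stack traces need no multi-line handling at this stage, because the JSON encoder writes each one as an escaped string inside a single line.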

ruby

# elk/logstash/pipeline/spring-boot.conf

input {
  # ── Receive logs from Filebeat: ────────────────────────────────────
  beats {
    port => 5044
    type => "spring-boot"
  }

  # ── Direct TCP input from Spring Boot (Logback TCP appender): ──────
  tcp {
    port => 5000
    codec => json_lines    # expect one JSON object per line
    type  => "spring-boot-direct"
  }
}

filter {

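  # ── Parse JSON log payload from Spring Boot: ───────────────────────
  if [type] == "spring-boot" or [type] == "spring-boot-direct" {

    if [type] == "spring-boot" {
      # Filebeat events carry the raw JSON string in "message".
      json {
        source  => "message"
        target  => "log"
        remove_field => ["message"]   # avoid duplicating the field
      }

      # ── Parse the timestamp: ───────────────────────────────────────
      # @timestamp must hold a timestamp object, so set it with the date
      # filter instead of renaming a string into it.
      date {
        match        => ["[log][timestamp]", "ISO8601"]
        target       => "@timestamp"
        timezone     => "UTC"
        remove_field => ["[log][timestamp]"]
      }

      # ── Promote key fields to top level for easier Kibana querying: ─
      mutate {
        rename => {
          "[log][level]"       => "level"
          "[log][service]"     => "service"
          "[log][traceId]"     => "traceId"
          "[log][spanId]"      => "spanId"
          "[log][userId]"      => "userId"
          "[log][message]"     => "message"
          "[log][logger_name]" => "logger"
          "[log][thread_name]" => "thread"
          "[log][stack_trace]" => "stackTrace"
        }
      }
    } else {
      # Direct TCP events were already decoded by the json_lines codec;
      # only align the encoder's default field names with the ones above.
      mutate {
        rename => {
          "logger_name" => "logger"
          "thread_name" => "thread"
          "stack_trace" => "stackTrace"
        }
      }
    }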

    # ── Choose the index name from the service name: ─────────────────
    if [service] {
      mutate {
        add_field => {
          "[@metadata][index]" => "logs-%{service}-%{+YYYY.MM.dd}"
        }
      }
    } else {
      mutate {
        add_field => {
          "[@metadata][index]" => "logs-unknown-%{+YYYY.MM.dd}"
        }
      }
    }

    # ── Tag ERROR logs for easy filtering: ───────────────────────────
    if [level] == "ERROR" {
      mutate { add_tag => ["error"] }
    }

    # ── Drop noisy health check logs: ────────────────────────────────
    if [message] =~ "GET /actuator/health" {
      drop { }
    }
  }
}

output {
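  elasticsearch {
    hosts    => ["elasticsearch:9200"]
    index    => "%{[@metadata][index]}"
    # The plugin's ILM options write to a single rollover alias, and the
    # alias takes precedence over "index", so they cannot be combined
    # with the per-service daily indices above. To use ILM rollover
    # instead, remove "index" and set:
    # ilm_enabled         => true
    # ilm_rollover_alias  => "logs"
    # ilm_policy          => "logs-policy"
  }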

  # ── Debug: also print to Logstash console in dev: ────────────────
  # stdout { codec => rubydebug }
}
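
To make the filter stage's decisions concrete, the following Java sketch mimics three of them on an already-parsed event: choosing the daily index name, tagging errors, and dropping health-check noise. It is an illustration only, not how Logstash is implemented, and the names `PipelineSketch`, `indexFor`, `tagsFor`, and `shouldDrop` are hypothetical. It assumes the `%{+YYYY.MM.dd}` date reference is rendered from the event's `@timestamp` in UTC.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the filter-stage decisions; not Logstash code. */
final class PipelineSketch {

    // Same shape as the Logstash date reference %{+YYYY.MM.dd}, rendered in UTC.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy.MM.dd").withZone(ZoneOffset.UTC);

    /** Mirrors the if/else that sets [@metadata][index]. */
    static String indexFor(String service, Instant timestamp) {
        String name = (service == null) ? "unknown" : service;
        return "logs-" + name + "-" + DAY.format(timestamp);
    }

    /** Mirrors: if [level] == "ERROR" { mutate { add_tag => ["error"] } } */
    static List<String> tagsFor(String level) {
        List<String> tags = new ArrayList<>();
        if ("ERROR".equals(level)) {
            tags.add("error");
        }
        return tags;
    }

    /** Mirrors: if [message] =~ "GET /actuator/health" { drop { } } */
    static boolean shouldDrop(String message) {
        return message != null && message.contains("GET /actuator/health");
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2025-05-30T14:23:01.456Z");
        System.out.println(indexFor("order-service", ts));
        System.out.println(tagsFor("ERROR"));
        System.out.println(shouldDrop("GET /actuator/health 200"));
    }
}
```

Because the date part comes from the event timestamp rather than the wall clock, a log line that arrives late still lands in the index for the day it was written.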

Filebeat Configuration

Filebeat tails log files and container stdout, joins multiline stack traces in plain-text logs, adds metadata (container name, service name, Kubernetes pod labels), and forwards events to Logstash or Elasticsearch. Running it as a Kubernetes DaemonSet ensures every node's pod logs are collected without per-service configuration.

yaml

# elk/filebeat/filebeat.yml

filebeat.inputs:

  # ── Read Spring Boot log files from mounted volume: ──────────────────
  - type: log
    enabled: true
    paths:
      - /var/log/order-service/*.log
      - /var/log/user-service/*.log
    fields:
      log_type: spring-boot
    fields_under_root: true

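    # ── Handle multi-line Java stack traces (plain-text logs only): ──
    # JSON logs from LogstashEncoder are already one event per line and
    # no line starts with a date, so remove this block for JSON files;
    # otherwise every line is merged into a single event.
    multiline:
      type:     pattern
      pattern:  '^\d{4}-\d{2}-\d{2}'   # new log line starts with date
      negate:   true
      match:    after
      # Lines NOT starting with a date (stack trace lines) are
      # appended to the previous line.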

  # ── Collect Docker container stdout logs: ────────────────────────────
  - type: container
    enabled: true
    paths:
      - /var/lib/docker/containers/*/*.log
    processors:
      - add_docker_metadata:
          host: "unix:///var/run/docker.sock"
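      # Decode JSON here only if Logstash does not also run a json filter
      # on "message"; a second parse of the decoded text would fail.
      - decode_json_fields:
          fields: ["message"]
          target: ""
          overwrite_keys: true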

# ── Processors — enrich all events: ──────────────────────────────────
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~

# ── Output — forward to Logstash: ────────────────────────────────────
output.logstash:
  hosts: ["logstash:5044"]
  loadbalance: true       # distribute across multiple Logstash nodes

# ── OR: Output directly to Elasticsearch (no Logstash): ──────────────
# output.elasticsearch:
#   hosts: ["elasticsearch:9200"]
#   index: "logs-%{[fields.service]}-%{+yyyy.MM.dd}"

# ── Kubernetes DaemonSet — collect from all pods: ─────────────────────
# filebeat.autodiscover:
#   providers:
#     - type: kubernetes
#       node: ${NODE_NAME}
#       hints.enabled: true
#       hints.default_config:
#         type: container
#         paths:
#           - /var/log/containers/*${data.kubernetes.container.id}*.log
#       templates:
#         - condition:
#             contains:
#               kubernetes.labels.app: spring-boot
#           config:
#             - type: container
#               paths:
#                 - /var/log/containers/*.log
#               multiline:
#                 type: pattern
#                 pattern: '^\d{4}-\d{2}-\d{2}'
#                 negate: true
#                 match: after
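
The multiline settings above (a date pattern with `negate: true` and `match: after`) are easier to reason about with a small model. The Java sketch below is a simplified illustration of that grouping rule, not Filebeat's implementation: a line that starts with a date opens a new event, and every other line is appended to the event before it. The class name `MultilineSketch` and the sample lines are made up, and Filebeat's timeout and max-lines limits are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical model of pattern + negate:true + match:after grouping. */
final class MultilineSketch {

    // Same pattern as in filebeat.yml: a new log entry starts with a date.
    private static final Pattern STARTS_WITH_DATE =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2}");

    static List<String> group(List<String> lines) {
        List<String> events = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            boolean startsEvent = STARTS_WITH_DATE.matcher(line).find();
            if (startsEvent || current == null) {
                // Flush the previous event and begin a new one.
                if (current != null) {
                    events.add(current.toString());
                }
                current = new StringBuilder(line);
            } else {
                // Continuation line (e.g. a stack frame): append to the current event.
                current.append('\n').append(line);
            }
        }
        if (current != null) {
            events.add(current.toString());
        }
        return events;
    }

    public static void main(String[] args) {
        List<String> raw = List.of(
                "2025-05-30 14:23:01 ERROR OrderService - Order failed",
                "java.lang.IllegalStateException: stock unavailable",
                "\tat com.example.OrderService.place(OrderService.java:42)",
                "2025-05-30 14:23:02 INFO OrderService - Order placed");
        for (String event : group(raw)) {
            System.out.println("EVENT: " + event.replace("\n", " | "));
        }
    }
}
```

The same model shows why the rule suits only plain-text logs: if no line starts with a date, as with JSON lines, everything collapses into one event.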

Spring Boot Logback Configuration for ELK

Spring Boot must emit JSON-structured logs so Logstash can parse them without fragile regex patterns. The Logstash Logback encoder formats every log entry as a single JSON object per line — including the message, level, logger, thread, MDC fields (traceId, userId), and exception stack traces.

XML

<!-- pom.xml — Logstash Logback encoder: -->
<dependency>
    <groupId>net.logstash.logback</groupId>
    <artifactId>logstash-logback-encoder</artifactId>
    <version>7.4</version>
</dependency>

<!-- ── src/main/resources/logback-spring.xml: ──────────────────────── -->
<?xml version="1.0" encoding="UTF-8"?>
<configuration>

  <!-- Spring Boot's bundled defaults; console-appender.xml defines the
       CONSOLE appender referenced by the local/test profile below: -->
  <include resource="org/springframework/boot/logging/logback/defaults.xml"/>
  <include resource="org/springframework/boot/logging/logback/console-appender.xml"/>

  <springProperty name="APP_NAME"
      source="spring.application.name"
      defaultValue="unknown"/>

  <!-- ── JSON appender for ELK (production / docker profiles): ───── -->
  <appender name="LOGSTASH"
            class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">

      <!-- Include all MDC fields (traceId, spanId, userId, etc.): -->
      <includeMdc>true</includeMdc>

      <!-- Custom fields added to every log event: -->
      <customFields>{"service":"${APP_NAME}"}</customFields>

      <!-- Caller info (class, method, line) is expensive, so it stays off: -->
      <includeCallerData>false</includeCallerData>

      <!-- Field name overrides: -->
      <fieldNames>
        <timestamp>timestamp</timestamp>
        <version>[ignore]</version>
        <levelValue>[ignore]</levelValue>
      </fieldNames>

      <!-- Shorten stack traces written to the stack_trace field: -->
      <throwableConverter
          class="net.logstash.logback.stacktrace.ShortenedThrowableConverter">
        <maxDepthPerThrowable>20</maxDepthPerThrowable>
        <shortenedClassNameLength>20</shortenedClassNameLength>
        <rootCauseFirst>true</rootCauseFirst>
      </throwableConverter>

    </encoder>
  </appender>

  <!-- ── Direct TCP appender: sends to Logstash without Filebeat. ──
       Not referenced by the profiles below; add an appender-ref to use it. -->
  <appender name="LOGSTASH_TCP"
            class="net.logstash.logback.appender.LogstashTcpSocketAppender">
    <destination>logstash:5000</destination>
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
    <reconnectionDelay>10 seconds</reconnectionDelay>
    <keepAliveDuration>5 minutes</keepAliveDuration>
  </appender>

  <!-- ── Profile selection: ──────────────────────────────────────── -->
  <springProfile name="local,test">
    <root level="INFO">
      <appender-ref ref="CONSOLE"/>   <!-- plain text for local dev -->
    </root>
  </springProfile>

  <springProfile name="docker,prod,staging">
    <root level="INFO">
      <appender-ref ref="LOGSTASH"/>  <!-- JSON to stdout → Filebeat -->
    </root>
  </springProfile>

</configuration>

// ── Sample JSON log output (one line, formatted here for readability): ─
// {
//   "timestamp":   "2025-05-30T14:23:01.456Z",
//   "message":     "Order placed orderId=1001 userId=42",
//   "logger_name": "c.e.order.service.OrderService",
//   "thread_name": "http-nio-8082-exec-3",
//   "level":       "INFO",
//   "service":     "order-service",
//   "traceId":     "4bf92f3577b34da6",
//   "spanId":      "00f067aa0ba902b7",
//   "userId":      "42",
//   "correlationId": "a3ce929d0e0e4736"
// }
// The timestamp key is "timestamp" because of the fieldNames override
// above. A stack_trace field appears only when an exception is logged.
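
The one-object-per-line format is what lets Filebeat and Logstash treat every line as a complete event, even when a stack trace is attached. The JDK-only Java sketch below illustrates why: newlines inside field values are escaped, so the serialized event never spans lines. It is a hand-rolled illustration under that single assumption, not the LogstashEncoder implementation (which relies on Jackson), and `JsonLineSketch` with its methods is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: serialize string fields as one JSON object on one line. */
final class JsonLineSketch {

    /** Escapes the characters JSON does not allow raw inside a string. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    /** Builds {"key":"value",...} in insertion order, with no raw newlines. */
    static String toJsonLine(Map<String, String> fields) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        fields.forEach((k, v) ->
                json.add("\"" + escape(k) + "\":\"" + escape(v) + "\""));
        return json.toString();
    }

    public static void main(String[] args) {
        Map<String, String> event = new LinkedHashMap<>();
        event.put("level", "ERROR");
        event.put("service", "order-service");
        event.put("stack_trace",
                "java.lang.IllegalStateException: stock unavailable\n"
                + "\tat com.example.OrderService.place(OrderService.java:42)");
        System.out.println(toJsonLine(event));
    }
}
```

In a real service the encoder does this work; application code only needs to log normally and put request-scoped values such as traceId or userId into the MDC.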

Kibana Queries and Dashboards

Kibana's Discover view is the primary tool for searching logs during an incident. KQL (Kibana Query Language) filters log documents by field values. Saved searches, dashboards, and alerts are built on top of KQL queries. The most important queries for microservices operations are error searches, trace lookups, and per-service error rate trends.

yaml

# ── KQL — Kibana Query Language examples: ────────────────────────────

# Find all errors (set the time picker to "Last 1 hour"):
level: "ERROR"

# Find all logs for a specific trace (distributed request trace):
traceId: "4bf92f3577b34da6"

# Find errors for a specific service:
level: "ERROR" AND service: "order-service"

# Find all logs for a specific user:
userId: "42"

# Find error logs that mention an exception (matching HTTP 5xx
# responses would need a status field in the logs):
level: "ERROR" AND message: *exception*

# Find slow requests (requires duration field in logs):
duration_ms > 1000 AND service: "order-service"

# Find NullPointerExceptions across all services (set the time picker to "Today"):
stackTrace: *NullPointerException*

# Find all logs for a specific correlation ID:
correlationId: "a3ce929d0e0e4736"

# Exclude health check noise:
NOT message: "/actuator/health"

# ── Kibana Dashboard panels for microservices: ────────────────────────
#
# 1. Error Rate Over Time (Area chart):
#    Metric: Count of documents
#    Filter: level: "ERROR"
#    X-axis: @timestamp (date histogram, 1 minute)
#    Split series by: service.keyword (terms aggregation)
#
# 2. Log Volume by Service (Bar chart):
#    Metric: Count of documents
#    Group by: service.keyword (terms aggregation)
#
# 3. Top 10 Error Messages (Data Table):
#    Metric: Count
#    Group by: message.keyword (terms, top 10)
#    Filter: level: "ERROR"
#
# 4. Error Heatmap (service × time):
#    X-axis: @timestamp (date histogram)
#    Y-axis: service.keyword (terms)
#    Color:  Count (filter: level: ERROR)
#
# 5. Trace Lookup (Saved Search):
#    Columns: @timestamp, service, level, traceId, message
#    Filter:  traceId: $traceId  (with variable input)

# ── Index Lifecycle Management policy (auto-delete old logs): ─────────
# PUT _ilm/policy/logs-policy
# {
#   "policy": {
#     "phases": {
#       "hot": {
#         "actions": {
#           "rollover": {
#             "max_size":  "10gb",
#             "max_age":   "1d"
#           }
#         }
#       },
#       "warm": {
#         "min_age": "7d",
#         "actions": {
#           "shrink":   { "number_of_shards": 1 },
#           "forcemerge": { "max_num_segments": 1 }
#         }
#       },
#       "delete": {
#         "min_age": "30d",
#         "actions": {
#           "delete": {}
#         }
#       }
#     }
#   }
# }
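
The policy above can be read as a mapping from index age to lifecycle phase. The short Java sketch below encodes only that mapping, using the two min_age thresholds from the policy (7d and 30d). It is a reading aid, not Elasticsearch code, and `IlmPhaseSketch` is a hypothetical name. It assumes age is counted from rollover, which is how ILM measures min_age for rolled-over indices, and it ignores that ILM evaluates policies periodically rather than instantly.

```java
import java.time.Duration;

/** Hypothetical reading aid for the logs-policy phases; not Elasticsearch code. */
final class IlmPhaseSketch {

    // Thresholds copied from the policy: warm.min_age and delete.min_age.
    static final Duration WARM_MIN_AGE = Duration.ofDays(7);
    static final Duration DELETE_MIN_AGE = Duration.ofDays(30);

    /** Returns the phase an index of the given age (since rollover) belongs to. */
    static String phaseFor(Duration ageSinceRollover) {
        if (ageSinceRollover.compareTo(DELETE_MIN_AGE) >= 0) {
            return "delete";
        }
        if (ageSinceRollover.compareTo(WARM_MIN_AGE) >= 0) {
            return "warm";
        }
        return "hot";
    }

    public static void main(String[] args) {
        for (int days : new int[] {1, 7, 29, 30}) {
            System.out.println(days + "d -> " + phaseFor(Duration.ofDays(days)));
        }
    }
}
```

With these thresholds an index is searchable for roughly 30 days after rollover: about a week in the hot phase and the remainder shrunk and force-merged in the warm phase.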
