Spring Boot Prometheus Integration

Prometheus is a pull-based time-series metrics database. It scrapes the /actuator/prometheus endpoint of each service at a configurable interval, stores the metrics, and makes them queryable via PromQL. Spring Boot exposes metrics in the Prometheus text format through the micrometer-registry-prometheus dependency — no agent or sidecar required.

Spring Boot Setup for Prometheus

The micrometer-registry-prometheus dependency adds the /actuator/prometheus endpoint that Prometheus scrapes. All Micrometer metrics are automatically translated to Prometheus format — counters, timers, gauges, and summaries all appear with correct Prometheus type annotations.
XML
<!-- pom.xml: -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <!-- version managed by Spring Boot BOM -->
</dependency>

Prometheus Endpoint and Metric Format

Prometheus scrapes /actuator/prometheus and stores each metric as a time series. Micrometer's Prometheus registry converts metric names automatically: dots become underscores, counters get a _total suffix, and timers get a _seconds suffix for their base unit. Understanding this naming conversion is essential for writing correct PromQL queries.
yaml
# application.yml — expose and configure the Prometheus endpoint: ─────
management:
  endpoints:
    web:
      exposure:
        include: health, prometheus, metrics
  metrics:
    tags:
      application: ${spring.application.name}  # global tag on every metric
      environment: ${spring.profiles.active:local}
    distribution:
      percentiles-histogram:
        http.server.requests: true          # enables Prometheus histogram
        orders.placement.duration: true
      percentiles:
        http.server.requests: 0.5, 0.95, 0.99
      slo:
        http.server.requests: 100ms, 250ms, 500ms

# GET /actuator/prometheus, sample output (abridged; the _bucket series
# added by percentiles-histogram are omitted): ───────────────────────
# # HELP http_server_requests_seconds Duration of HTTP server request handling
# # TYPE http_server_requests_seconds summary
# http_server_requests_seconds{application="order-service",
#   environment="prod",method="GET",status="200",
#   uri="/api/orders/{id}",quantile="0.5"} 0.042
# http_server_requests_seconds{...,quantile="0.95"} 0.187
# http_server_requests_seconds{...,quantile="0.99"} 0.443
# http_server_requests_seconds_count{...} 15234
# http_server_requests_seconds_sum{...}   643.21
#
# # HELP orders_placed_total Total orders placed
# # TYPE orders_placed_total counter
# orders_placed_total{application="order-service",
#   channel="mobile",tier="premium"} 1523.0
#
# # HELP jvm_memory_used_bytes Used JVM memory
# # TYPE jvm_memory_used_bytes gauge
# jvm_memory_used_bytes{area="heap",id="G1 Eden Space"} 5.24288E7

# ── Micrometer → Prometheus naming conversion: ───────────────────────
# Micrometer name          Prometheus name
# ────────────────────     ──────────────────────────────────────
# http.server.requests  →  http_server_requests_seconds (Timer adds _seconds)
# orders.placed         →  orders_placed_total          (Counter adds _total)
# jvm.memory.used       →  jvm_memory_used_bytes        (with baseUnit=bytes)
# orders.active         →  orders_active                (Gauge — no suffix)
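
Prometheus can only scrape the endpoint if it can reach it over the network. One common arrangement, sketched here as an assumption rather than a requirement, is to serve the actuator endpoints on a dedicated management port so that metrics traffic stays separate from application traffic. The port numbers below are illustrative; 9090 is chosen only to match the scrape targets used later in this guide.

```yaml
# application.yml (sketch): separate application and management ports.
# Both port values are assumptions chosen for illustration.
server:
  port: 8080            # application traffic
management:
  server:
    port: 9090          # actuator endpoints, including /actuator/prometheus
  endpoints:
    web:
      exposure:
        include: health, prometheus
```

With this layout, /actuator/prometheus is served on port 9090 only, so that port can be kept off public ingress while remaining reachable by Prometheus.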

Prometheus Configuration (prometheus.yml)

Prometheus is configured with a YAML file that defines scrape jobs: which services to scrape, how often, and with what labels. In the static configuration below, each microservice is a separate job. In Kubernetes, ServiceMonitor resources (a Prometheus Operator CRD, bundled with kube-prometheus-stack) replace static scrape configs with dynamic discovery.
yaml
# prometheus.yml — scrape configuration: ────────────────────────────
global:
  scrape_interval:     15s    # scrape every service every 15 seconds
  evaluation_interval: 15s    # evaluate alerting rules every 15 seconds
  external_labels:
    cluster: production
    region:  us-east-1

# ── Alerting rules files: ─────────────────────────────────────────────
rule_files:
  - "rules/microservices-alerts.yml"
  - "rules/jvm-alerts.yml"

# ── Alertmanager integration: ─────────────────────────────────────────
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

# ── Scrape jobs: ──────────────────────────────────────────────────────
scrape_configs:

  # API Gateway:
  - job_name: api-gateway
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ["api-gateway:9090"]
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance

  # User Service:
  - job_name: user-service
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ["user-service:9090"]

  # Order Service:
  - job_name: order-service
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ["order-service:9090"]

  # Payment Service:
  - job_name: payment-service
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ["payment-service:9090"]

  # ── Kubernetes: dynamic discovery via pod annotations ────────────────
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods with annotation prometheus.io/scrape: "true":
      - source_labels:
          [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use custom metrics path if specified:
      - source_labels:
          [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Add namespace and pod name as labels:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
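
For clusters that run the Prometheus Operator, the same scrape can be declared as a ServiceMonitor instead of editing prometheus.yml. The manifest below is a minimal sketch. The namespaces, the `app` and `release` labels, and the `http-management` port name are all assumptions that must be adapted to the actual Service and to the selector configured on the Prometheus resource.

```yaml
# servicemonitor.yml (sketch): all names and labels are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: order-service
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # assumed to match the Prometheus serviceMonitorSelector
spec:
  namespaceSelector:
    matchNames:
      - default                      # namespace where the Service lives (assumption)
  selector:
    matchLabels:
      app: order-service             # label on the Kubernetes Service (assumption)
  endpoints:
    - port: http-management          # name of the Service port, not a port number
      path: /actuator/prometheus
      interval: 15s
```

The operator watches these resources and regenerates the scrape configuration, so adding a new service means adding a manifest rather than editing the central prometheus.yml.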

PromQL Queries for Microservices

PromQL (Prometheus Query Language) is used to query, aggregate, and derive metrics for dashboards and alerts. These queries cover the most common microservices observability needs — request rate, error rate, latency percentiles, and JVM health.
promql
# ── REQUEST RATE (requests per second): ──────────────────────────────
# All services, all endpoints:
rate(http_server_requests_seconds_count[5m])

# Specific service:
rate(http_server_requests_seconds_count{
  application="order-service"}[5m])

# ── ERROR RATE (% of requests returning 5xx): ────────────────────────
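# Per service (aggregate both sides so their label sets match; without
# the aggregation, 5xx series only divide by themselves):
sum by (application) (
  rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
  /
sum by (application) (
  rate(http_server_requests_seconds_count[5m]))
* 100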

# ── LATENCY PERCENTILES: ──────────────────────────────────────────────
# p99 latency per endpoint (requires percentiles-histogram: true):
histogram_quantile(
  0.99,
  sum by (uri, le) (
    rate(http_server_requests_seconds_bucket{
      application="order-service"}[5m])
  )
)

# p95 latency across all endpoints for a service:
histogram_quantile(
  0.95,
  sum by (le) (
    rate(http_server_requests_seconds_bucket{
      application="order-service"}[5m])
  )
)

# ── BUSINESS METRICS: ────────────────────────────────────────────────
# Order placement rate per second:
rate(orders_placed_total[5m])

# Order failure rate by reason:
rate(orders_failed_total[5m])

# Orders by channel:
sum by (channel) (rate(orders_placed_total[5m]))

# ── JVM METRICS: ─────────────────────────────────────────────────────
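# Heap usage percentage per instance (summed across heap pools,
# because individual pools can report a max of -1):
sum by (application, instance) (jvm_memory_used_bytes{area="heap"})
  /
sum by (application, instance) (jvm_memory_max_bytes{area="heap"})
* 100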

# GC pause time rate:
rate(jvm_gc_pause_seconds_sum[5m])

# Active threads:
jvm_threads_live_threads

# ── CONNECTION POOL: ─────────────────────────────────────────────────
# Pool utilisation %:
hikaricp_connections_active
  /
hikaricp_connections_max
* 100

# Pending threads waiting for connection:
hikaricp_connections_pending
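
Queries that dashboards and alerts evaluate repeatedly can be precomputed with recording rules. The file below is a sketch that assumes a hypothetical file name, rules/microservices-recording.yml, which would also need an entry under rule_files in prometheus.yml. The rule names follow the level:metric:operations naming convention and are illustrative choices.

```yaml
# rules/microservices-recording.yml (hypothetical file name)
groups:
  - name: microservices-recording
    interval: 30s
    rules:
      # Requests per second, per service:
      - record: application:http_server_requests:rate5m
        expr: sum by (application) (rate(http_server_requests_seconds_count[5m]))

      # Fraction of requests returning 5xx, per service (0.0 to 1.0):
      - record: application:http_server_requests_errors:ratio_rate5m
        expr: |
          sum by (application) (rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
          /
          sum by (application) (rate(http_server_requests_seconds_count[5m]))

      # p99 latency in seconds, per service (requires histogram buckets):
      - record: application:http_server_requests_seconds:p99_5m
        expr: |
          histogram_quantile(0.99,
            sum by (application, le) (rate(http_server_requests_seconds_bucket[5m])))
```

A dashboard panel or alert can then query application:http_server_requests_errors:ratio_rate5m directly, which is cheaper than re-evaluating the full expression on every refresh.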

Alerting Rules

Prometheus alerting rules evaluate PromQL expressions on a schedule. When an expression keeps returning results for longer than the rule's for duration, the alert moves from pending to firing and Prometheus sends it to Alertmanager, which routes it to PagerDuty, Slack, or email. Alerts must be actionable: every alert should have a clear runbook and a defined on-call response.
yaml
# rules/microservices-alerts.yml
groups:
  - name: microservices
    interval: 30s
    rules:

      # ── Service down: ────────────────────────────────────────────────
      - alert: ServiceDown
        expr: up{job=~".*-service"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.job }} is down"
          description: >
            {{ $labels.job }} on {{ $labels.instance }}
            has been unreachable for more than 1 minute.
          runbook: "https://wiki.example.com/runbooks/service-down"

      # ── High error rate: ─────────────────────────────────────────────
      - alert: HighErrorRate
        expr: >
          sum by (application) (
            rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
          /
          sum by (application) (
            rate(http_server_requests_seconds_count[5m]))
          > 0.01
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: >
            High error rate on {{ $labels.application }}
          description: >
            Error rate is {{ $value | humanizePercentage }}
            on {{ $labels.application }} (threshold: 1%)

      # ── High p99 latency: ────────────────────────────────────────────
      - alert: HighP99Latency
        expr: >
          histogram_quantile(0.99,
            sum by (application, le)(
              rate(http_server_requests_seconds_bucket[5m])
            )
          ) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: >
            High p99 latency on {{ $labels.application }}
          description: >
            p99 latency is {{ $value | humanizeDuration }}
            (threshold: 1s)

      # ── JVM heap usage: ──────────────────────────────────────────────
      - alert: HighHeapUsage
        expr: >
          sum by (application, instance) (jvm_memory_used_bytes{area="heap"})
          /
          sum by (application, instance) (jvm_memory_max_bytes{area="heap"})
          > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: >
            High heap usage on {{ $labels.application }}
          description: >
            Heap usage is {{ $value | humanizePercentage }}
            (threshold: 85%). Risk of OOM.

      # ── DB connection pool exhaustion: ───────────────────────────────
      - alert: ConnectionPoolExhaustion
        expr: hikaricp_connections_pending > 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: >
            DB connection pool exhausted on
            {{ $labels.application }}
          description: >
            {{ $value }} threads waiting for a DB connection.
            Increase pool size or optimise queries.
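
Routing by severity happens in Alertmanager, not in the rules file. The following alertmanager.yml is a minimal sketch of one possible layout: warnings go to a Slack channel and critical alerts page the on-call engineer. The receiver names, the channel, the webhook URL, and the routing key are placeholders, and the matchers syntax assumes Alertmanager 0.22 or later.

```yaml
# alertmanager.yml (sketch): all receiver names and credentials are placeholders.
route:
  receiver: slack-default            # fallback for anything not matched below
  group_by: [alertname, application]
  group_wait: 30s                    # wait before sending the first notification for a group
  group_interval: 5m                 # wait before notifying about new alerts in the group
  repeat_interval: 4h                # re-send while the alert keeps firing
  routes:
    - matchers:
        - severity="critical"
      receiver: pagerduty-oncall

receivers:
  - name: slack-default
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/ME
        channel: "#alerts"
        send_resolved: true
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: REPLACE_WITH_INTEGRATION_KEY
```

Grouping by alertname and application keeps a multi-instance outage from producing one notification per instance.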