Spring Boot
Prometheus Integration
Prometheus is a pull-based time-series metrics database. It scrapes the /actuator/prometheus endpoint of each service at a configurable interval, stores the metrics, and makes them queryable via PromQL. Spring Boot exposes metrics in the Prometheus text format through the micrometer-registry-prometheus dependency — no agent or sidecar required.
Spring Boot Setup for Prometheus
The micrometer-registry-prometheus dependency adds the /actuator/prometheus endpoint that Prometheus scrapes. All Micrometer metrics are automatically translated to Prometheus format — counters, timers, gauges, and summaries all appear with correct Prometheus type annotations.
XML
<!-- pom.xml: -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <!-- version managed by Spring Boot BOM -->
</dependency>
Prometheus Endpoint and Metric Format
Prometheus scrapes /actuator/prometheus and stores the metrics as time series. Spring Boot's Micrometer bridge converts metric names (dots to underscores), adds _total suffix to counters, and adds _seconds suffix to timers automatically. Understanding the naming conversion is essential for writing correct PromQL queries.
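The renaming rules described above can be sketched in a few lines of plain Java. This is only an illustrative approximation, not Micrometer's actual naming convention class; PrometheusNameSketch, MeterKind, and toPrometheusName are hypothetical names introduced for this example, and real Micrometer handles more cases (other meter types, name sanitising) than shown here.

```java
/**
 * Illustrative approximation of the Micrometer -> Prometheus naming rules.
 * All names in this class are hypothetical; this is NOT Micrometer code.
 */
final class PrometheusNameSketch {

    enum MeterKind { COUNTER, TIMER, GAUGE }

    /**
     * @param micrometerName dotted Micrometer name, e.g. "orders.placed"
     * @param kind           meter type, which decides the suffix
     * @param baseUnit       optional base unit such as "bytes"; may be null
     */
    static String toPrometheusName(String micrometerName, MeterKind kind, String baseUnit) {
        // Traditional Prometheus metric names allow letters, digits,
        // underscores and colons, so dots become underscores.
        String name = micrometerName.replace('.', '_');

        switch (kind) {
            case TIMER:
                // Timers are exported in seconds.
                if (!name.endsWith("_seconds")) {
                    name += "_seconds";
                }
                break;
            case COUNTER:
                name = appendUnit(name, baseUnit);
                if (!name.endsWith("_total")) {
                    name += "_total";
                }
                break;
            case GAUGE:
                // Gauges get the base unit (if any) but no type suffix.
                name = appendUnit(name, baseUnit);
                break;
        }
        return name;
    }

    private static String appendUnit(String name, String baseUnit) {
        if (baseUnit == null || name.endsWith("_" + baseUnit)) {
            return name;
        }
        return name + "_" + baseUnit;
    }

    public static void main(String[] args) {
        System.out.println(toPrometheusName("http.server.requests", MeterKind.TIMER, null));
        System.out.println(toPrometheusName("orders.placed", MeterKind.COUNTER, null));
        System.out.println(toPrometheusName("jvm.memory.used", MeterKind.GAUGE, "bytes"));
        System.out.println(toPrometheusName("orders.active", MeterKind.GAUGE, null));
    }
}
```

When a PromQL query returns nothing, checking the expected name against these rules (or simply reading the raw /actuator/prometheus output) is usually the quickest way to spot a missing _total or _seconds suffix.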
yaml
# application.yml: expose and configure the Prometheus endpoint ────────
management:
  endpoints:
    web:
      exposure:
        include: health, prometheus, metrics
  metrics:
    tags:
      application: ${spring.application.name}   # global tag on every metric
      environment: ${spring.profiles.active:local}
    distribution:
      percentiles-histogram:
        http.server.requests: true              # enables Prometheus histogram
        orders.placement.duration: true
      percentiles:
        http.server.requests: 0.5, 0.95, 0.99
      slo:
        http.server.requests: 100ms, 250ms, 500ms
# GET /actuator/prometheus, sample output: ────────────────────────────
# # HELP http_server_requests_seconds Duration of HTTP server request handling
# # TYPE http_server_requests_seconds summary
# http_server_requests_seconds{application="order-service",
#     environment="prod",method="GET",status="200",
#     uri="/api/orders/{id}",quantile="0.5"} 0.042
# http_server_requests_seconds{...,quantile="0.95"} 0.187
# http_server_requests_seconds{...,quantile="0.99"} 0.443
# http_server_requests_seconds_count{...} 15234
# http_server_requests_seconds_sum{...} 643.21
#
# # HELP orders_placed_total Total orders placed
# # TYPE orders_placed_total counter
# orders_placed_total{application="order-service",
#     channel="mobile",tier="premium"} 1523.0
#
# # HELP jvm_memory_used_bytes Used JVM memory
# # TYPE jvm_memory_used_bytes gauge
# jvm_memory_used_bytes{area="heap",id="G1 Eden Space"} 5.24288E7
# ── Micrometer → Prometheus naming conversion: ───────────────────────
# Micrometer name          Prometheus name
# ────────────────────     ──────────────────────────────────────
# http.server.requests  →  http_server_requests_seconds  (Timer adds _seconds)
# orders.placed         →  orders_placed_total           (Counter adds _total)
# jvm.memory.used       →  jvm_memory_used_bytes         (with baseUnit=bytes)
# orders.active         →  orders_active                 (Gauge: no suffix)
Prometheus Configuration (prometheus.yml)
Prometheus is configured with a YAML file that defines scrape jobs — which services to scrape, how often, and with what labels. Each microservice is a separate job. In Kubernetes, ServiceMonitor CRDs (from the kube-prometheus-stack) replace static scrape configs with dynamic discovery.
yaml
# prometheus.yml: scrape configuration ──────────────────────────────
global:
  scrape_interval: 15s        # scrape every service every 15 seconds
  evaluation_interval: 15s    # evaluate alerting rules every 15 seconds
  external_labels:
    cluster: production
    region: us-east-1
# ── Alerting rules files: ─────────────────────────────────────────────
rule_files:
  - "rules/microservices-alerts.yml"
  - "rules/jvm-alerts.yml"
# ── Alertmanager integration: ─────────────────────────────────────────
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]
# ── Scrape jobs: ──────────────────────────────────────────────────────
scrape_configs:
  # API Gateway:
  - job_name: api-gateway
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ["api-gateway:9090"]
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
  # User Service:
  - job_name: user-service
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ["user-service:9090"]
  # Order Service:
  - job_name: order-service
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ["order-service:9090"]
  # Payment Service:
  - job_name: payment-service
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ["payment-service:9090"]
  # ── Kubernetes: dynamic discovery via pod annotations ────────────────
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods with annotation prometheus.io/scrape: "true":
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use custom metrics path if specified:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Add namespace and pod name as labels:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
PromQL Queries for Microservices
PromQL (Prometheus Query Language) is used to query, aggregate, and derive metrics for dashboards and alerts. These queries cover the most common microservices observability needs — request rate, error rate, latency percentiles, and JVM health.
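The latency queries in this section rely on histogram_quantile, which does not read an exact percentile from the data. It estimates one from cumulative le buckets by linear interpolation inside the bucket that contains the target rank. The following is a simplified sketch of that idea in plain Java; the class and method names are hypothetical, the bucket counts are made up, and Prometheus itself handles further edge cases (native histograms, negative bounds, q outside 0..1) that are left out here.

```java
/**
 * Simplified sketch of quantile estimation from cumulative histogram buckets.
 * Hypothetical names; not Prometheus source code.
 */
final class HistogramQuantileSketch {

    /**
     * @param q                target quantile, 0 < q <= 1
     * @param upperBounds      sorted "le" bounds; the last one must be +Inf
     * @param cumulativeCounts observations with value <= the matching bound
     */
    static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        int n = upperBounds.length;
        if (n < 2 || n != cumulativeCounts.length
                || upperBounds[n - 1] != Double.POSITIVE_INFINITY) {
            throw new IllegalArgumentException("need >= 2 buckets, the last with le=+Inf");
        }
        if (q <= 0 || q > 1) {
            throw new IllegalArgumentException("this sketch only supports 0 < q <= 1");
        }

        double total = cumulativeCounts[n - 1];
        if (total == 0) {
            return Double.NaN; // no observations in the window
        }

        // Rank of the observation we are looking for.
        double rank = q * total;

        // First bucket whose cumulative count reaches the rank.
        int b = 0;
        while (b < n - 1 && cumulativeCounts[b] < rank) {
            b++;
        }

        // Rank falls into the +Inf bucket: the best available answer is the
        // highest finite bound.
        if (b == n - 1) {
            return upperBounds[n - 2];
        }

        double lowerBound = (b == 0) ? 0.0 : upperBounds[b - 1];
        double countBelow = (b == 0) ? 0.0 : cumulativeCounts[b - 1];
        double countInBucket = cumulativeCounts[b] - countBelow;

        // Assume observations are spread evenly inside the bucket.
        return lowerBound
                + (upperBounds[b] - lowerBound) * ((rank - countBelow) / countInBucket);
    }

    public static void main(String[] args) {
        // Made-up cumulative counts for le = 0.1, 0.25, 0.5, 1.0, +Inf (seconds).
        double[] le = {0.1, 0.25, 0.5, 1.0, Double.POSITIVE_INFINITY};
        double[] counts = {50, 80, 95, 99, 100};
        System.out.println("p50 estimate: " + quantile(0.50, le, counts));
        System.out.println("p95 estimate: " + quantile(0.95, le, counts));
    }
}
```

Because the estimate is interpolated, it can only be as precise as the bucket boundaries around it. This is one reason to place slo boundaries near the latency thresholds you alert on.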
promql
# ── REQUEST RATE (requests per second): ──────────────────────────────
# All services, all endpoints (one result per label combination):
rate(http_server_requests_seconds_count[5m])
# Specific service:
rate(http_server_requests_seconds_count{application="order-service"}[5m])
# ── ERROR RATE (% of requests returning 5xx): ────────────────────────
# Per service. Aggregate both sides before dividing: without sum by (...),
# each 5xx series is only matched against itself and the result is always 100.
sum by (application) (rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
/
sum by (application) (rate(http_server_requests_seconds_count[5m]))
* 100
# ── LATENCY PERCENTILES: ──────────────────────────────────────────────
# p99 latency per endpoint (requires percentiles-histogram: true):
histogram_quantile(
  0.99,
  sum by (uri, le) (
    rate(http_server_requests_seconds_bucket{application="order-service"}[5m])
  )
)
# p95 latency across all endpoints for a service:
histogram_quantile(
  0.95,
  sum by (le) (
    rate(http_server_requests_seconds_bucket{application="order-service"}[5m])
  )
)
# ── BUSINESS METRICS: ────────────────────────────────────────────────
# Order placement rate per second:
rate(orders_placed_total[5m])
# Order failure rate by reason:
rate(orders_failed_total[5m])
# Orders by channel:
sum by (channel) (rate(orders_placed_total[5m]))
# ── JVM METRICS: ─────────────────────────────────────────────────────
# Heap usage percentage per instance, summed over heap pools
# (pools without a defined maximum report -1, which skews per-pool ratios):
sum by (application, instance) (jvm_memory_used_bytes{area="heap"})
/
sum by (application, instance) (jvm_memory_max_bytes{area="heap"})
* 100
# GC pause time rate (seconds spent in GC per second):
rate(jvm_gc_pause_seconds_sum[5m])
# Active threads:
jvm_threads_live_threads
# ── CONNECTION POOL: ─────────────────────────────────────────────────
# Pool utilisation %:
hikaricp_connections_active
/
hikaricp_connections_max
* 100
# Pending threads waiting for connection:
hikaricp_connections_pending
Alerting Rules
Prometheus alerting rules evaluate PromQL expressions on a schedule. When an expression keeps returning a result for at least the duration set in the rule's for field, Prometheus fires an alert to Alertmanager, which routes it to PagerDuty, Slack, or email. Alerts must be actionable: every alert should have a clear runbook and a defined on-call response.
yaml
# rules/microservices-alerts.yml
groups:
  - name: microservices
    interval: 30s
    rules:
      # ── Service down: ────────────────────────────────────────────────
      - alert: ServiceDown
        expr: up{job=~".*-service"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.job }} is down"
          description: >
            {{ $labels.job }} on {{ $labels.instance }}
            has been unreachable for more than 1 minute.
          runbook: "https://wiki.example.com/runbooks/service-down"
      # ── High error rate: ─────────────────────────────────────────────
      # Both sides are aggregated per application; dividing the raw series
      # would match 5xx series only against themselves (ratio always 1).
      - alert: HighErrorRate
        expr: >
          sum by (application) (rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
          /
          sum by (application) (rate(http_server_requests_seconds_count[5m]))
          > 0.01
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: >
            High error rate on {{ $labels.application }}
          description: >
            Error rate is {{ $value | humanizePercentage }}
            on {{ $labels.application }} (threshold: 1%)
      # ── High p99 latency: ────────────────────────────────────────────
      - alert: HighP99Latency
        expr: >
          histogram_quantile(0.99,
            sum by (application, le) (
              rate(http_server_requests_seconds_bucket[5m])
            )
          ) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: >
            High p99 latency on {{ $labels.application }}
          description: >
            p99 latency is {{ $value | humanizeDuration }}
            (threshold: 1s)
      # ── JVM heap usage (summed over heap pools per instance): ────────
      - alert: HighHeapUsage
        expr: >
          sum by (application, instance) (jvm_memory_used_bytes{area="heap"})
          /
          sum by (application, instance) (jvm_memory_max_bytes{area="heap"})
          > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: >
            High heap usage on {{ $labels.application }}
          description: >
            Heap usage is {{ $value | humanizePercentage }}
            (threshold: 85%). Risk of OOM.
      # ── DB connection pool exhaustion: ───────────────────────────────
      - alert: ConnectionPoolExhaustion
        expr: hikaricp_connections_pending > 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: >
            DB connection pool exhausted on
            {{ $labels.application }}
          description: >
            {{ $value }} threads waiting for a DB connection.
            Increase pool size or optimise queries.
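The for: field in the rules above is what keeps a single bad scrape from paging someone. An alert first becomes pending, and it only fires once the condition has held at every evaluation for at least that long. A minimal sketch of that lifecycle in plain Java follows; AlertForSketch and its members are hypothetical names, the timestamps are arbitrary, and this models only the pending/firing transition, not Alertmanager routing or resolution notifications.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Minimal model of an alerting rule's "for" behaviour.
 * Hypothetical names; not Prometheus source code.
 */
final class AlertForSketch {

    enum State { INACTIVE, PENDING, FIRING }

    private final Duration holdFor;   // value of the rule's "for" field
    private Instant activeSince;      // null while the condition is not met

    AlertForSketch(Duration holdFor) {
        this.holdFor = holdFor;
    }

    /**
     * Called once per rule evaluation.
     *
     * @param now          evaluation timestamp
     * @param conditionMet whether the alert expression returned a result
     */
    State evaluate(Instant now, boolean conditionMet) {
        if (!conditionMet) {
            // Any evaluation where the condition is not met resets the clock.
            activeSince = null;
            return State.INACTIVE;
        }
        if (activeSince == null) {
            activeSince = now;
        }
        Duration active = Duration.between(activeSince, now);
        return active.compareTo(holdFor) >= 0 ? State.FIRING : State.PENDING;
    }

    public static void main(String[] args) {
        // for: 2m, evaluated every 30s (the group interval used above).
        AlertForSketch alert = new AlertForSketch(Duration.ofMinutes(2));
        Instant start = Instant.parse("2024-01-01T00:00:00Z"); // arbitrary
        for (int i = 0; i <= 4; i++) {
            Instant now = start.plusSeconds(30L * i);
            System.out.println(now + " -> " + alert.evaluate(now, true));
        }
    }
}
```

A longer for: value trades detection speed for fewer false alarms, which is why the critical ServiceDown rule uses 1m while the latency and heap warnings wait 5m.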