Service Discovery

Service discovery is the mechanism by which microservices dynamically locate each other at runtime without hard-coded host/port configuration. In a containerised environment instances start, stop, and move constantly — service discovery maintains a live registry of healthy instances and lets clients resolve a logical service name to a real network address.

Why Service Discovery Is Needed

In a monolith, a method call is resolved within a single process. In microservices, a service call goes over the network, and the target's IP address and port can change whenever a container restarts or the service scales. Hard-coded addresses in configuration therefore go stale quickly in any dynamic environment. Service discovery replaces static addresses with a runtime registry that tracks live instances.

Java

// ── The problem: hard-coded addresses ────────────────────────────────

// BAD — hard-coded host and port:
// application.yml:
// user-service:
//   url: http://192.168.1.101:8081    ← breaks when container restarts

// What happens in Kubernetes / Docker Compose:
//   Container starts  → gets IP 172.18.0.5
//   Container crashes → gets IP 172.18.0.9  (different!)
//   Scale to 3 instances → which IP do you call?

// ── The solution: service registry ───────────────────────────────────
//
//  Step 1: each service instance registers itself on startup:
//  ┌──────────────────┐            ┌─────────────────────────┐
//  │  UserService     │  register  │  Service Registry       │
//  │  172.18.0.5:8081 │───────────▶│  user-service           │
//  └──────────────────┘            │    172.18.0.5:8081  ✓   │
//                                  │    172.18.0.9:8081  ✓   │
//                                  └─────────────────────────┘
//
//  Step 2: the client queries the registry:
//  ┌──────────────────┐  "where is user-service?"   ┌──────────────────┐
//  │  OrderService    │────────────────────────────▶│ Service Registry │
//  │                  │◀─ [172.18.0.5, 172.18.0.9] ─│                  │
//  └──────────────────┘                             └──────────────────┘
//
//  Step 3: the client picks an instance and calls it:
//  OrderService → HTTP GET http://172.18.0.5:8081/api/users/1

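// ── With service discovery: no hard-coded IPs ────────────────────────
// application.yml:
// (no address for user-service; the name is resolved at runtime via Eureka)
//
// Java (the RestTemplate bean must be annotated with @LoadBalanced,
// otherwise "user-service" is treated as an ordinary DNS hostname):
// restTemplate.getForObject("http://user-service/api/users/1", ...)
//                                    ↑
//                    logical name, resolved at runtime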
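
The three steps above can be sketched with a tiny in-memory registry. This is a conceptual illustration only, not Eureka's API: the names `Instance`, `InMemoryRegistry`, and `RegistryDemo` are hypothetical, and a real registry also handles heartbeats, replication, and client-side caching.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical types for illustration only; not Eureka's API.
record Instance(String host, int port) {
    String url(String path) { return "http://" + host + ":" + port + path; }
}

class InMemoryRegistry {
    private final Map<String, List<Instance>> instancesByService = new ConcurrentHashMap<>();

    // Step 1: an instance announces itself under a logical service name.
    void register(String serviceName, Instance instance) {
        instancesByService
            .computeIfAbsent(serviceName, k -> new CopyOnWriteArrayList<>())
            .add(instance);
    }

    // Called on shutdown, or by the registry when an instance is considered dead.
    void deregister(String serviceName, Instance instance) {
        List<Instance> list = instancesByService.get(serviceName);
        if (list != null) list.remove(instance);
    }

    // Step 2: a client resolves the logical name to the current instances.
    List<Instance> lookup(String serviceName) {
        return List.copyOf(instancesByService.getOrDefault(serviceName, List.of()));
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        InMemoryRegistry registry = new InMemoryRegistry();
        registry.register("user-service", new Instance("172.18.0.5", 8081));
        registry.register("user-service", new Instance("172.18.0.9", 8081));

        // Step 3: pick an instance (here simply the first) and build the real URL.
        Instance chosen = registry.lookup("user-service").get(0);
        System.out.println(chosen.url("/api/users/1"));
    }
}
```

The caller only ever names `user-service`; the concrete address is looked up at call time, which is the property that makes container restarts and scaling transparent to clients.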

Client-Side vs Server-Side Discovery

There are two discovery patterns. In client-side discovery the client queries the registry itself, receives a list of instances, and picks one using a load-balancing algorithm. In server-side discovery a load balancer or proxy (e.g. AWS ALB, Kubernetes Service) performs the lookup on behalf of the client. Spring Cloud uses client-side discovery by default.

Java

// ── CLIENT-SIDE DISCOVERY (Spring Cloud default) ─────────────────────
//
//  Client (OrderService)
//    │
//    ├─ 1. Ask Eureka: "give me instances of user-service"
//    │      Eureka returns: [172.18.0.5:8081, 172.18.0.9:8081]
//    │
//    ├─ 2. Spring Cloud LoadBalancer picks one (round-robin by default)
//    │      Selected: 172.18.0.5:8081
//    │
//    └─ 3. Client calls 172.18.0.5:8081/api/users/1 directly
//
//  Pros: simple, no extra hop, client controls load-balancing strategy
//  Cons: every client must include discovery logic + registry dependency

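// Custom Spring Cloud LoadBalancer configuration. Per the Spring Cloud docs,
// this class should NOT be annotated with @Configuration (or it must sit outside
// component scanning). Attach it to one client from a scanned class instead:
//   @LoadBalancerClient(name = "user-service", configuration = LoadBalancerConfig.class)
import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.loadbalancer.core.RandomLoadBalancer;
import org.springframework.cloud.loadbalancer.core.ReactorLoadBalancer;
import org.springframework.cloud.loadbalancer.core.ServiceInstanceListSupplier;
import org.springframework.cloud.loadbalancer.support.LoadBalancerClientFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.core.env.Environment;

public class LoadBalancerConfig {

    // Override default round-robin with random selection:
    @Bean
    ReactorLoadBalancer<ServiceInstance> randomLoadBalancer(
            Environment env,
            LoadBalancerClientFactory factory) {
        // Name of the client this child context was created for, e.g. "user-service".
        String name = env.getProperty(
            LoadBalancerClientFactory.PROPERTY_NAME);
        return new RandomLoadBalancer(
            factory.getLazyProvider(name, ServiceInstanceListSupplier.class),
            name
        );
    }
}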

// ── SERVER-SIDE DISCOVERY (Kubernetes / AWS) ──────────────────────────
//
//  Client (OrderService)
//    │
//    └─ 1. Call http://user-service/api/users/1
//                       ↓
//           Kubernetes Service (ClusterIP)   ← acts as virtual IP
//                       ↓
//           kube-proxy selects a Pod:
//              Pod A: 10.0.0.5:8081
//              Pod B: 10.0.0.6:8081
//                       ↓
//           Request forwarded to selected Pod
//
//  Pros: client is simple — no registry SDK, no load-balancer logic
//  Cons: extra network hop; less flexible load-balancing strategies

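// ── In Kubernetes you often skip Eureka entirely: ────────────────────
// Kubernetes provides built-in server-side discovery via Services.
// The DNS name is the name of the Kubernetes Service object (metadata.name),
// e.g. "user-service"; matching spring.application.name to it is only a convention.
// In another service in the same namespace: http://user-service, resolved by cluster DNS.
// No Eureka needed: Kubernetes itself acts as the registry.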
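
To make step 2 of client-side discovery concrete, here is a minimal round-robin selection sketch in plain Java. It is not Spring Cloud LoadBalancer's implementation (which is reactive and more involved); `RoundRobinChooser` and `RoundRobinDemo` are hypothetical names, and the address list stands in for whatever the registry returned.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not part of Spring Cloud.
class RoundRobinChooser {
    private final AtomicInteger position = new AtomicInteger();

    // Returns the next address in rotation, or null when no instance is known.
    String choose(List<String> addresses) {
        if (addresses.isEmpty()) return null;
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(position.getAndIncrement(), addresses.size());
        return addresses.get(index);
    }
}

class RoundRobinDemo {
    public static void main(String[] args) {
        List<String> fromRegistry = List.of("172.18.0.5:8081", "172.18.0.9:8081");
        RoundRobinChooser chooser = new RoundRobinChooser();
        // Successive calls alternate between the two instances.
        for (int i = 0; i < 4; i++) {
            System.out.println("http://" + chooser.choose(fromRegistry) + "/api/users/1");
        }
    }
}
```

In server-side discovery this selection logic lives in the proxy or virtual IP layer instead, which is why the client there needs no such code.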

Health Checks and Heartbeats

A registry is only useful if it reflects live instances. Services send periodic heartbeats to the registry to prove they are alive. If no heartbeat arrives for a configurable period, the registry evicts the instance so that traffic is no longer routed to a dead service. By default Eureka relies on these heartbeats alone; the Spring Boot Actuator health status is propagated to Eureka only when eureka.client.healthcheck.enabled is set to true.

Java

// ── How Eureka heartbeats work: ───────────────────────────────────────
//
//  UserService                    Eureka Server
//      │── register (on startup) ──▶│
//      │── heartbeat every 30s ────▶│  "still alive"
//      │── heartbeat ───────────────▶│
//      │   [crash / no heartbeat]    │
//      │                            │  after 90s (3 missed) → evict instance
//      │                            │  user-service 172.18.0.5 removed

// ── Actuator health endpoint (used by Eureka only if eureka.client.healthcheck.enabled=true): ──

// pom.xml:
// <dependency>
//   <groupId>org.springframework.boot</groupId>
//   <artifactId>spring-boot-starter-actuator</artifactId>
// </dependency>

// application.yml:
// management:
//   endpoints:
//     web:
//       exposure:
//         include: health, info
//   endpoint:
//     health:
//       show-details: always    # show DB, disk, custom indicators

// GET http://localhost:8081/actuator/health
// Response:
// {
//   "status": "UP",
//   "components": {
//     "db":          { "status": "UP" },
//     "diskSpace":   { "status": "UP" },
//     "eureka":      { "status": "UP" }
//   }
// }

// ── Custom HealthIndicator: ───────────────────────────────────────────
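// Spring Boot 2.x/3.x package names. ExternalApiClient is an application-specific
// client bean that is assumed to exist elsewhere in the project.
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class ExternalApiHealthIndicator implements HealthIndicator {

    private final ExternalApiClient apiClient;

    // Constructor injection: a final field must be initialised here.
    public ExternalApiHealthIndicator(ExternalApiClient apiClient) {
        this.apiClient = apiClient;
    }

    @Override
    public Health health() {
        try {
            apiClient.ping();
            return Health.up()
                .withDetail("external-api", "reachable")
                .build();
        } catch (Exception ex) {
            return Health.down()
                .withDetail("external-api", "unreachable")
                .withException(ex)
                .build();
        }
    }
}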

// ── Eureka heartbeat tuning (application.yml): ───────────────────────
// eureka:
//   instance:
//     lease-renewal-interval-in-seconds: 30   # heartbeat frequency
//     lease-expiration-duration-in-seconds: 90 # evict after 3 missed
//   client:
//     registry-fetch-interval-seconds: 30      # how often to refresh cache
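
The lease timing above (30 s renewals, 90 s expiry) can be modelled in a few lines. This is a simplified sketch, not Eureka's implementation: `LeaseTable` and `LeaseDemo` are hypothetical names, and time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of lease tracking; not Eureka's implementation.
class LeaseTable {
    private final long leaseDurationMillis;
    private final Map<String, Long> lastRenewalByInstance = new ConcurrentHashMap<>();

    LeaseTable(long leaseDurationMillis) {
        this.leaseDurationMillis = leaseDurationMillis;
    }

    // Registration and heartbeat both record "last seen at nowMillis".
    void renew(String instanceId, long nowMillis) {
        lastRenewalByInstance.put(instanceId, nowMillis);
    }

    // Removes every instance whose last heartbeat is older than the lease
    // duration and returns the evicted ids.
    List<String> evictExpired(long nowMillis) {
        List<String> evicted = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastRenewalByInstance.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > leaseDurationMillis) {
                evicted.add(entry.getKey());
                it.remove();
            }
        }
        return evicted;
    }

    boolean isRegistered(String instanceId) {
        return lastRenewalByInstance.containsKey(instanceId);
    }
}

class LeaseDemo {
    public static void main(String[] args) {
        LeaseTable table = new LeaseTable(90_000);       // 90 s lease
        table.renew("user-service@172.18.0.5", 0);       // register at t = 0
        table.renew("user-service@172.18.0.5", 30_000);  // heartbeat at t = 30 s
        // The instance crashes; no further heartbeats arrive.
        System.out.println(table.evictExpired(100_000)); // 70 s since last heartbeat: kept
        System.out.println(table.evictExpired(121_000)); // 91 s since last heartbeat: evicted
    }
}
```

Note that a real server only runs the eviction pass periodically, so the time between a crash and the instance disappearing from client caches can be noticeably longer than the lease duration itself.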

Self-Preservation Mode

Eureka's self-preservation mode protects the registry from incorrectly evicting healthy instances during a network partition. If the number of heartbeats received per minute drops below a threshold (85% of the expected number by default), Eureka assumes a network problem rather than a mass service failure and stops evicting instances. This prevents the cascading removal of healthy services, at the cost of possibly keeping stale entries for instances that really are down.

Java

// ── Self-preservation explained: ─────────────────────────────────────
//
//  Normal: Eureka receives heartbeats from 100 instances.
//  Suddenly: 40 instances stop sending heartbeats.
//
//  Without self-preservation:
//    Eureka evicts all 40 → clients get errors → cascading failure.
//
//  With self-preservation (default: ON):
//    Eureka thinks: "40 services don't die simultaneously.
//                   This is probably a network partition."
//    → Eureka KEEPS the stale registrations rather than evicting them.
//    → When network recovers, heartbeats resume, registry self-heals.
//
//  Threshold: if Eureka receives < 85% of expected heartbeats
//             → self-preservation activates.

// ── Configuration (Eureka Server application.yml): ───────────────────
// eureka:
//   server:
//     enable-self-preservation: true          # default: true
//     renewal-percent-threshold: 0.85         # 85% threshold
//     eviction-interval-timer-in-ms: 60000    # check every 60s

// ── Disable self-preservation in local dev (avoid stale entries): ────
// eureka:
//   server:
//     enable-self-preservation: false
//     eviction-interval-timer-in-ms: 5000     # evict stale faster in dev

// ── Eureka dashboard warning when self-preservation is active: ────────
// "EMERGENCY! EUREKA MAY BE INCORRECTLY CLAIMING INSTANCES ARE UP
//  WHEN THEY'RE NOT. RENEWALS ARE LESSER THAN THRESHOLD AND HENCE
//  THE INSTANCES ARE NOT BEING EXPIRED JUST TO BE SAFE."
//
// This is expected during network partitions — not an error.
// The registry is intentionally being conservative.
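
The 100-instance example above can be worked through numerically. The sketch below is a simplified model of the threshold check, assuming a 30 s renewal interval and the 85% threshold; `SelfPreservationCheck` is a hypothetical name, and Eureka's real bookkeeping differs in detail between versions.

```java
// Simplified model of the self-preservation check; not Eureka's actual code.
class SelfPreservationCheck {
    // Heartbeats the server expects per minute if every instance renews on schedule.
    // Assumes the renewal interval divides 60 evenly (integer division otherwise truncates).
    static int expectedRenewalsPerMinute(int instances, int renewalIntervalSeconds) {
        return instances * (60 / renewalIntervalSeconds);
    }

    // Minimum renewals per minute; below this value eviction is suspended.
    static int renewalThreshold(int expectedPerMinute, int thresholdPercent) {
        return expectedPerMinute * thresholdPercent / 100;
    }

    static boolean selfPreservationActive(int actualPerMinute, int threshold) {
        return actualPerMinute < threshold;
    }

    public static void main(String[] args) {
        int expected = expectedRenewalsPerMinute(100, 30);  // 100 instances, 30 s interval
        int threshold = renewalThreshold(expected, 85);     // 85% threshold
        int actual = expectedRenewalsPerMinute(60, 30);     // only 60 instances still renewing
        System.out.println(expected + " expected, threshold " + threshold + ", actual " + actual);
        System.out.println("self-preservation active: " + selfPreservationActive(actual, threshold));
    }
}
```

With 100 instances the server expects 200 renewals per minute, so the threshold is 170. When 40 instances go silent only 120 renewals arrive, which is below 170, and eviction is suspended.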
