Load Balancing & Networking

Load balancers distribute traffic across servers. Understanding routing algorithms and failure handling is essential for any high-availability design.

L4 vs L7 Load Balancing

Load balancers operate at different layers of the network stack. The layer determines what information they can see and act on.

L4 vs L7 Comparison

Layer	What it sees	Routing decisions based on	Examples
L4 (Transport Layer)	IP addresses, ports, TCP/UDP protocol	IP address and port only. Fast. Cannot see request content.	AWS NLB, HAProxy in TCP mode
L7 (Application Layer)	HTTP headers, URL paths, cookies, body	URL path, hostname, user session. Slower, because each request must be parsed, but far more flexible.	AWS ALB, Nginx, Kong

Most web systems use L7 load balancing because it enables content-based routing: /api/* goes to API servers, /static/* goes to CDN origin servers, /admin/* goes to admin servers. L4 is used in front of L7 when you need maximum throughput with minimal overhead.
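Content-based routing can be sketched as a longest-prefix match on the request path. This is a minimal illustration, not how any particular load balancer implements it; the pool names, `DEFAULT_POOL`, and `choose_pool` are hypothetical.

```python
# Hypothetical pool names; a real L7 balancer would map these to backend addresses.
ROUTES = [
    ("/api/", "api-servers"),
    ("/static/", "cdn-origin-servers"),
    ("/admin/", "admin-servers"),
]
DEFAULT_POOL = "web-servers"  # assumption: fallback pool for unmatched paths


def choose_pool(path: str) -> str:
    """Return the pool whose prefix is the longest match for the request path."""
    best_prefix, best_pool = "", DEFAULT_POOL
    for prefix, pool in ROUTES:
        if path.startswith(prefix) and len(prefix) > len(best_prefix):
            best_prefix, best_pool = prefix, pool
    return best_pool


if __name__ == "__main__":
    for p in ("/api/v1/users/42", "/static/app.js", "/admin/login", "/"):
        print(p, "->", choose_pool(p))
```

An L4 balancer could not make this decision, because the path is only visible after the HTTP request has been parsed.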

Load Balancing Algorithms

Algorithm Comparison

Algorithm	How it works	Best for
Round Robin	Send request 1 to server 1, request 2 to server 2, etc., cycling through.	Stateless services with identical hardware and request costs.
Weighted Round Robin	Same but powerful servers get proportionally more traffic.	Heterogeneous server fleet (different CPU/memory).
Least Connections	Send next request to server currently handling fewest active connections.	Long-lived connections (WebSockets, streaming) where request duration varies.
Consistent Hashing	Hash request key (e.g., user_id) onto a ring so the same user always reaches the same server, and only a small share of keys move when servers are added or removed.	Stateful services, caches (same user hits same server = warm cache), chat servers.
Least Response Time	Route to the server with lowest current latency AND fewest connections.	Heterogeneous workloads where response time varies significantly.
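The first three algorithms in the table can be sketched in a few lines each. This is a simplified illustration under stated assumptions: the class names are hypothetical, the weighted variant simply repeats each server `weight` times per cycle (production balancers usually interleave more smoothly), and ties in least connections go to the first server listed.

```python
import itertools


class RoundRobin:
    """Cycle through servers in order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class WeightedRoundRobin:
    """Naive weighted variant: each server appears `weight` times per cycle."""

    def __init__(self, weights):  # weights: dict of server -> positive int
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def acquire(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


if __name__ == "__main__":
    rr = RoundRobin(["a", "b", "c"])
    print([rr.pick() for _ in range(4)])

    wrr = WeightedRoundRobin({"big": 2, "small": 1})
    print([wrr.pick() for _ in range(6)])

    lc = LeastConnections(["a", "b"])
    first, second = lc.acquire(), lc.acquire()
    lc.release(second)          # the second connection finishes early
    print(lc.acquire())         # goes back to the now-idle server
```

Least connections needs the `release` call: the balancer must track when a connection ends, which round robin never has to do.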

Health Checks & Failover

A load balancer continuously checks whether backend servers are healthy via health check endpoints, typically a GET /health that returns 200 OK. If a server fails to respond (times out) or returns a non-200 status for N consecutive checks, the load balancer removes it from the rotation and stops sending it traffic. When the server recovers, it is automatically added back. This is the foundation of high availability: a server can die and only the requests in flight or routed to it before detection are affected.

Health check best practices:

Check every 10 seconds; fail after 3 consecutive failures (~30 seconds to remove)
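A deep health endpoint should test actual dependencies (DB connection, Redis connection), not just return 200
Use different health endpoints for the load balancer (shallow) and the orchestrator (deep), so an outage in a shared dependency does not pull every server out of rotation at once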
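The removal and recovery rule described above can be sketched as a small state tracker. This is a minimal sketch, not any vendor's implementation: `HealthTracker` and `approx_removal_delay` are hypothetical names, and returning a server to rotation after a single successful check is an assumption (many balancers require several consecutive successes).

```python
class HealthTracker:
    """Track consecutive health check failures per server."""

    def __init__(self, servers, fail_threshold=3):
        self.fail_threshold = fail_threshold
        self.failures = {s: 0 for s in servers}
        self.in_rotation = set(servers)

    def record(self, server, status_code):
        """Record one check result; status_code is None when the check timed out."""
        if status_code == 200:
            self.failures[server] = 0
            self.in_rotation.add(server)  # assumption: one success restores the server
        else:
            self.failures[server] += 1
            if self.failures[server] >= self.fail_threshold:
                self.in_rotation.discard(server)


def approx_removal_delay(interval_s, fail_threshold):
    """Rough time from failure to removal: interval times threshold."""
    return interval_s * fail_threshold


if __name__ == "__main__":
    tracker = HealthTracker(["a", "b"])
    for status in (500, None, 503):       # three consecutive failed checks
        tracker.record("a", status)
    print(sorted(tracker.in_rotation))    # "a" has been removed
    tracker.record("a", 200)
    print(sorted(tracker.in_rotation))    # "a" is back
    print(approx_removal_delay(10, 3))    # seconds, matching the ~30 s figure
```

Lowering the interval or the threshold shortens detection time but makes the balancer more likely to eject a server over a brief blip.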

Load Balancer Types in Practice

Type	Product	Use Case
Hardware LB	F5, Citrix ADC	Enterprise, telco. Expensive but handle millions of connections at line rate
Cloud-managed L7	AWS ALB, GCP HTTPS LB	Web applications, microservices. Auto-scaling. Path/header routing
Cloud-managed L4	AWS NLB	Ultra-high throughput. Preserves source IP. Static IP addresses
Software LB (self-hosted)	Nginx, HAProxy	Full control. Used inside VPC for internal traffic
Service Mesh	Istio (Envoy data plane), Linkerd	Kubernetes. Per-pod load balancing, circuit breaking, mTLS
DNS-based	AWS Route 53, Cloudflare	Global routing, geo-based routing, health-check-based failover

Global Load Balancing (Multi-Region)

For global deployments, you need a layer above regional load balancers:

GeoDNS (Route 53 Geolocation routing): DNS returns different IPs based on user's geographic location. US users → us-east-1, European users → eu-west-1.
Anycast routing: The same IP address is advertised from multiple data centers. Network routing sends users to the nearest one. Used by Cloudflare, Fastly.
Global HTTP LB (e.g., GCP Global LB): Anycast front-end routes to the nearest healthy backend. Works at L7. AWS CloudFront fills a similar L7 edge role, though it is primarily a CDN.
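The common idea behind these options, "prefer the nearest region, fall back to the next one if it is unhealthy", can be sketched as a lookup. This is an illustration only: the preference table, `DEFAULT_ORDER`, and `resolve_region` are hypothetical, and real GeoDNS or anycast systems make this decision inside DNS resolvers or network routing rather than in application code.

```python
# Hypothetical preference table: nearest region first, then fallbacks.
REGION_PREFERENCE = {
    "US": ["us-east-1", "eu-west-1"],
    "EU": ["eu-west-1", "us-east-1"],
}
DEFAULT_ORDER = ["us-east-1", "eu-west-1"]  # assumption: order for unknown locations


def resolve_region(user_location, healthy_regions):
    """Return the first healthy region in the user's preference order, or None."""
    for region in REGION_PREFERENCE.get(user_location, DEFAULT_ORDER):
        if region in healthy_regions:
            return region
    return None


if __name__ == "__main__":
    print(resolve_region("EU", {"us-east-1", "eu-west-1"}))  # nearest region
    print(resolve_region("EU", {"us-east-1"}))               # failover region
```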

Sticky Sessions (Session Affinity)

Some applications require the same user to always reach the same backend server (stateful applications). Load balancers achieve this via sticky sessions:

Cookie-based: LB injects a cookie with the server ID. Subsequent requests with that cookie go to the same server.
IP hash: Hash the client's IP address to always route to the same server.
Application-level: Use consistent hashing on user_id.
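The application-level option can be sketched as a hash ring with virtual nodes. This is a minimal sketch under stated assumptions: `HashRing`, the choice of SHA-256, and 100 virtual nodes per server are illustrative, not taken from any specific product.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted list of (ring position, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server is placed at many positions to even out the key distribution.
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._points = [p for p in self._points if p[1] != server]

    def lookup(self, key):
        """Return the server owning the first ring position at or after the key."""
        if not self._points:
            raise LookupError("no servers in ring")
        idx = bisect.bisect_right(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]


if __name__ == "__main__":
    ring = HashRing(["s1", "s2", "s3"])
    users = [f"user-{i}" for i in range(1000)]
    before = {u: ring.lookup(u) for u in users}
    ring.remove("s3")
    moved = sum(1 for u in users if ring.lookup(u) != before[u])
    print(f"{moved} of {len(users)} users moved after removing s3")
```

Only users that were on the removed server get reassigned; with a plain `hash(user_id) % N` scheme, most users would move whenever N changes.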

⚠️ Warning: Sticky sessions undermine the purpose of load balancing. If a user's "sticky" server is overloaded, that user has poor performance even though other servers are idle. Prefer stateless servers + externalized state (Redis sessions) over sticky sessions.

Connection Draining (Graceful Shutdown)

When removing a server from rotation (deployment, scale-in), you need connection draining:

Load balancer stops sending new requests to the server (deregistered)
Existing active requests are allowed to complete (drain timeout: 30–300 seconds)
Server shuts down after all connections close or drain timeout expires

This prevents cutting off in-flight requests during deployments.
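The three draining steps can be sketched as follows. This is a simplified, single-process model: `Backend` and `drain` are hypothetical names, and the injectable `sleep` and `clock` parameters exist only so the sketch can be exercised without waiting in real time.

```python
import time


class Backend:
    """Toy model of a server as seen by the load balancer."""

    def __init__(self, name):
        self.name = name
        self.accepting = True
        self.in_flight = 0

    def start_request(self):
        if not self.accepting:
            raise RuntimeError(f"{self.name} is draining; route elsewhere")
        self.in_flight += 1

    def finish_request(self):
        self.in_flight -= 1


def drain(backend, drain_timeout=30.0, poll_interval=0.01,
          sleep=time.sleep, clock=time.monotonic):
    """Deregister the backend, then wait for in-flight requests or the timeout.

    Returns True if every request completed, False if the timeout expired first.
    """
    backend.accepting = False                      # step 1: no new requests
    deadline = clock() + drain_timeout
    while backend.in_flight > 0 and clock() < deadline:
        sleep(poll_interval)                       # step 2: let active requests finish
    return backend.in_flight == 0                  # step 3: caller may now shut down


if __name__ == "__main__":
    server = Backend("a")
    server.start_request()
    # Simulate the request finishing during the first wait.
    clean = drain(server, drain_timeout=1.0, sleep=lambda _: server.finish_request())
    print("drained cleanly:", clean)
```

A `False` return means the timeout expired with requests still active; those requests are cut off when the server shuts down, which is why the timeout should exceed the longest expected request.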

Interview Talking Points

"I'd use an L7 load balancer (AWS ALB) in front of the API servers. It routes /api/v1/users/* to the user service and /api/v1/orders/* to the order service. Health checks on /health every 10 seconds."
"For WebSocket connections, I'd use Least Connections algorithm — since WebSocket connections are long-lived, round robin would unevenly distribute load over time."
"Consistent hashing at the load balancer routes all requests from the same user to the same WebSocket server, so their connection can always be located without Redis pub/sub for cross-server routing."
"GeoDNS routes European users to our Frankfurt region and US users to Virginia — this cuts latency by 100ms+ versus serving everyone from one region."
