How Server-Side Sharding Reduces API Server Load in Kubernetes v1.36

Published: 2026-05-14 00:43:19 | Category: Cloud Computing

Introduction

As Kubernetes clusters grow to tens of thousands of nodes, controllers that watch high-cardinality resources like Pods face significant scaling barriers. In a horizontally scaled controller, every replica receives the full stream of events from the API server. Each replica must therefore deserialize and process every event, only to discard the objects it isn't responsible for, wasting CPU, memory, and network resources. Adding replicas doesn't reduce this per-replica cost; it multiplies the total. Kubernetes v1.36 addresses this with an alpha feature called server-side sharded list and watch (KEP-5866). By filtering events at the source, the API server sends each replica only the slice of the resource collection it owns.

The Limitations of Client-Side Sharding

Some controllers, such as kube-state-metrics, already support horizontal sharding on the client side. In this approach, each replica is assigned a portion of the keyspace and discards objects that do not belong to it. While this works functionally, it does nothing to reduce the volume of data flowing from the API server. The result is a clear scaling inefficiency:

  • N replicas × full event stream: Every replica deserializes and processes every event, then discards what it doesn't need.
  • Network bandwidth scales with replicas, not shard size: Adding more replicas increases total bandwidth consumption.
  • Wasted CPU: Deserialization effort is spent on data that is later thrown away.
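The discard step in this pattern can be sketched in a few lines of Go; shardOf and owns are illustrative names for this article, not APIs from kube-state-metrics:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardOf maps an object key to one of n shards via a 64-bit FNV-1a hash.
func shardOf(key string, n uint64) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64() % n
}

// owns reports whether the given replica is responsible for the key.
// In client-side sharding, every replica still receives every event and
// calls something like this to drop the objects it does not own.
func owns(key string, replica, n uint64) bool {
	return shardOf(key, n) == replica
}

func main() {
	for _, uid := range []string{"uid-a", "uid-b", "uid-c"} {
		fmt.Printf("%s -> replica %d\n", uid, shardOf(uid, 2))
	}
}
```

Note that this filtering happens only after the full event has already crossed the network and been deserialized, which is exactly the overhead the bullets above describe.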

Server-side sharding solves all these problems by moving the filtering logic upstream into the API server itself.

How Server-Side Sharding Works

The feature introduces a new shardSelector field in ListOptions. Clients specify a hash range using the shardRange() function. For example:

shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')

The API server computes a deterministic 64-bit FNV-1a hash of the specified field (currently supporting object.metadata.uid or object.metadata.namespace) and returns only objects whose hash falls within the range [start, end). This applies to both list responses and watch event streams. Because the hash function produces the same result across all API server instances, the feature is safe to use with multiple API server replicas.
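To make the [start, end) semantics concrete, here is a self-contained Go sketch using the standard hash/fnv package. It mirrors the behavior described above, including treating an end of zero as wrapping to the top of the 64-bit space; the function names and the exact bytes hashed are assumptions for illustration, not the API server's actual code:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// fnv64a returns the deterministic 64-bit FNV-1a hash of s.
func fnv64a(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// inShardRange reports whether the hash of field falls in [start, end).
// An end of zero is interpreted as wrapping to the top of the hash space,
// i.e. [start, 2^64).
func inShardRange(field string, start, end uint64) bool {
	h := fnv64a(field)
	if end == 0 {
		return h >= start
	}
	return h >= start && h < end
}

func main() {
	uid := "8f2c7e4a-1b3d-4c5e-9f0a-0123456789ab" // illustrative UID
	half := uint64(0x8000000000000000)
	fmt.Printf("hash=%#018x lowerHalf=%v upperHalf=%v\n",
		fnv64a(uid), inShardRange(uid, 0, half), inShardRange(uid, half, 0))
}
```

Because FNV-1a is deterministic, every API server instance computes the same hash for the same object, which is what makes the feature safe behind a load-balanced API server endpoint.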

Implementing Sharded Watches in Controllers

Controllers typically use informers to list and watch resources. To shard the workload, each replica injects the shardSelector into the ListOptions used by its informers via WithTweakListOptions. Here's an example:

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/informers"
)

// Replica 0 of 2: claim the lower half of the hash space.
shardSelector := "shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')"

// Inject the selector into every list and watch issued by this factory's informers.
factory := informers.NewSharedInformerFactoryWithOptions(client, resyncPeriod,
    informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
        opts.ShardSelector = shardSelector
    }),
)

For a 2-replica deployment, the selectors split the hash space in half. Since ranges are half-open, an end of '0x0000000000000000' wraps around to the top of the 64-bit space:

  • Replica 0: shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')
  • Replica 1: shardRange(object.metadata.uid, '0x8000000000000000', '0x0000000000000000')

Each replica now receives only the events relevant to its assigned hash range, dramatically reducing the load on both the API server and the controller.

Key Benefits

The primary advantage of server-side sharding is the dramatic reduction in network and CPU overhead. Controllers no longer need to deserialize and process events they don't own. This leads to:

  1. Lower bandwidth usage: Data sent to each replica is proportional to its shard size, not the total resource count.
  2. Reduced API server load: The API server performs filtering once per shard instead of sending the full stream to every replica.
  3. Better scalability: Controllers can be scaled horizontally without multiplying the cost.

Conclusion

Server-side sharded list and watch is a powerful addition to Kubernetes v1.36. By shifting filtering from the client to the API server, it addresses one of the most important scaling bottlenecks for large clusters. Controllers that adopt this alpha feature will see immediate improvements in resource efficiency and can continue to scale with confidence. As the feature matures, it is expected to become a standard tool for optimizing Kubernetes event streaming.