Description
Is your feature request related to a problem? Please describe
OpenSearch cluster metadata (cluster state) is managed by a leader elected from the cluster-manager-eligible nodes. Metadata updates are coordinated by the leader and applied atomically as a whole. The updated metadata is published to follower nodes in the cluster, and the leader tracks the propagation of each update to individual nodes (via cluster appliers). The leader is also responsible for tracking the health of follower nodes and taking corrective action if they do not have the latest metadata. Cluster state is persisted on the local disk of nodes and remotely to S3.
Besides coordinating metadata updates, the leader is responsible for cluster tasks such as shard allocation, cluster information refresh, follower checker, lag detector, and snapshots, which need schedule-once guarantees so that they run on only one node in the cluster. Cluster management activities are time sensitive, and dedicated cluster manager nodes ensure resource isolation for admin operations. This requires customers to provision dedicated cluster manager nodes (typically 3 nodes, with 2 of them on standby).
The core logic of cluster coordination (cluster state persistence, cluster state publication, node discovery, health checks, and leader election) is implemented in the Coordinator module of OpenSearch. The module is not extensible enough to adapt to cloud-native deployment alternatives for cluster management.

Version specific UpdateTask
OpenSearch follows semantic versioning and provides wire compatibility of REST and Transport endpoints across the minor versions of a given major version. Cluster state updates, however, are processed using UpdateTask, whose implementation can change across minor versions. Unlike endpoints, the UpdateTask implementations that mutate metadata are not guaranteed to be compatible across all minor versions of a major version.
Describe the solution you'd like
As OpenSearch evolves toward a cloud-native architecture, we propose leveraging external cloud services to:
- Coordinate metadata updates
- Enable leaderless cluster operation
- Distribute cluster tasks across external services
This will eliminate the need to provision dedicated cluster manager nodes while maintaining reliable cluster management through cloud services. The transition requires the cluster manager to be stateless and to coordinate cluster tasks using a remote service.
Decoupled Metadata Updates
We propose to decouple the following entities from cluster state so that each of them can be persisted to a remote store, outside the distributed cluster state.
- Metadata :- index metadata, cluster settings and templates
- Routing information :- shard assignment to nodes
- Node :- cluster membership information
The updates to each of these entities can happen concurrently without interfering with one another.
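The independence of these entities can be sketched with a store that keeps a separate version per entity and accepts a write only if the writer saw the latest version of that entity. This is a minimal in-memory illustration, not OpenSearch code: `EntityType`, `VersionedEntity`, and `RemoteEntityStore` are hypothetical names, and a real remote store would be assumed to offer an equivalent conditional write.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical names; none of these types exist in OpenSearch.
enum EntityType { METADATA, ROUTING, NODES }

// Immutable snapshot of one entity together with its own version counter.
final class VersionedEntity {
    final long version;
    final String payload;

    VersionedEntity(long version, String payload) {
        this.version = version;
        this.payload = payload;
    }
}

// In-memory stand-in for a remote store that supports conditional writes.
final class RemoteEntityStore {
    private final Map<EntityType, VersionedEntity> entities = new EnumMap<>(EntityType.class);

    RemoteEntityStore() {
        for (EntityType type : EntityType.values()) {
            entities.put(type, new VersionedEntity(0L, ""));
        }
    }

    synchronized VersionedEntity read(EntityType type) {
        return entities.get(type);
    }

    // Succeeds only if the caller saw the latest version of THIS entity.
    // Versions of the other entities are not consulted, so a routing update
    // never invalidates an in-flight metadata update.
    synchronized boolean compareAndSet(EntityType type, long expectedVersion, String newPayload) {
        VersionedEntity current = entities.get(type);
        if (current.version != expectedVersion) {
            return false;
        }
        entities.put(type, new VersionedEntity(expectedVersion + 1, newPayload));
        return true;
    }
}
```

With this shape, two writers that update `METADATA` and `ROUTING` at the same expected version both succeed, while a second writer to `METADATA` with a stale version is rejected and has to re-read and retry.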
Task Queue Per Update Type
As discussed above, the implementation of update tasks can change between minor versions of OpenSearch, so it is not feasible to deploy version-specific update tasks on a remote service. The local OpenSearch cluster needs to process metadata update requests using its version-specific update tasks, with coordination from the remote service. The remote service can coordinate by queuing update requests, persisting the metadata, publishing the updates, and tracking whether the nodes in the cluster have the latest update. With coordination performed by the remote service, metadata updates can be processed on any node in the cluster, not necessarily on the leader node.
We propose introducing multiple TaskQueues for processing metadata updates, so that non-interfering UpdateTasks can run in parallel and publish updates to the remote service independently. The task queues can be hosted on any cluster-manager-eligible node in the cluster, not necessarily on the leader.
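One possible reading of "multiple task queues" is one single-threaded queue per update type: tasks of the same type run in submission order, while tasks of different types run in parallel. The sketch below uses only `java.util.concurrent`; `UpdateType` and `PerTypeTaskQueues` are hypothetical names, and the tasks are plain `Supplier`s standing in for version-specific update tasks.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical names; this is not the OpenSearch TaskQueue API.
enum UpdateType { METADATA, ROUTING, NODES }

final class PerTypeTaskQueues implements AutoCloseable {
    // One single-threaded executor per update type: FIFO within a type,
    // parallel across types.
    private final Map<UpdateType, ExecutorService> queues = new EnumMap<>(UpdateType.class);

    PerTypeTaskQueues() {
        for (UpdateType type : UpdateType.values()) {
            queues.put(type, Executors.newSingleThreadExecutor());
        }
    }

    // Enqueues the task on the queue for its type and returns a future that
    // completes with the task's result.
    <T> CompletableFuture<T> submit(UpdateType type, Supplier<T> task) {
        return CompletableFuture.supplyAsync(task, queues.get(type));
    }

    // Stops accepting new tasks; already queued tasks still run.
    @Override
    public void close() {
        queues.values().forEach(ExecutorService::shutdown);
    }
}
```

A caller would submit, for example, an index-settings change to `UpdateType.METADATA` and a shard reroute to `UpdateType.ROUTING`; neither waits for the other, and each task would publish its own result to the remote service when it completes.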
Coordination Interfaces
OpenSearch should remain compatible for users who choose local cluster coordination. We propose introducing interfaces into the cluster coordination module and making remote implementations pluggable via Cluster Plugins. The interfaces will let an OpenSearch cluster plug in remote implementations for 1) Leader Election, 2) Follower & Leader Checker, and 3) Cluster State Applier.
For remote coordination, the leader election process needs to be externalised, and it should be possible for the remote service to assign a cluster leader without running an election loop within the cluster. Likewise, nodes in the cluster should be able to listen for notifications from the remote service to consume the latest update rather than rely on the leader.
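To make the leader election part of this concrete, here is a rough sketch of what such an interface and a remote-backed implementation might look like. All names (`LeaderElectionStrategy`, `RemoteAssignedLeaderElection`, `handleRemoteAssignment`) are assumptions made for illustration; they are not existing OpenSearch or plugin APIs, and how the remote service delivers its assignment is left open.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical extension point; a local implementation would wrap the
// existing election loop, a remote one would defer to an external service.
interface LeaderElectionStrategy {
    Optional<String> currentLeader();

    void onLeaderChange(Consumer<Optional<String>> listener);
}

// Remote variant: the node never runs an election. It only records the
// leader that the remote service assigned and notifies local listeners.
final class RemoteAssignedLeaderElection implements LeaderElectionStrategy {
    private volatile Optional<String> leader = Optional.empty();
    private final List<Consumer<Optional<String>>> listeners = new CopyOnWriteArrayList<>();

    @Override
    public Optional<String> currentLeader() {
        return leader;
    }

    @Override
    public void onLeaderChange(Consumer<Optional<String>> listener) {
        listeners.add(listener);
    }

    // Assumed entry point for a notification from the remote service;
    // a null nodeId means the service currently assigns no leader.
    void handleRemoteAssignment(String nodeId) {
        Optional<String> assigned = Optional.ofNullable(nodeId);
        leader = assigned;
        listeners.forEach(listener -> listener.accept(assigned));
    }
}
```

The same pattern would apply to the other two extension points: a follower and leader checker backed by remote health tracking, and a cluster state applier that is triggered by a remote notification rather than by a publication from the leader.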
Architecture
The proposed architecture paves the way for offloading cluster coordination activities and metadata persistence completely to a remote multi-tenant service in the cloud. The leader of an OpenSearch cluster can be a stateless cluster coordinator with no metadata persisted in the local distributed cluster state.

Customer Benefits
- Stability and availability of the leader without dedicated cluster manager nodes
- Ease of use (fewer configuration changes) for customers, since no vertical scaling of the leader is required
- Scale is not limited by the leader node in the cluster
Goals
- Break down cluster state into entities that can be mutated independently. The cluster should be able to bootstrap metadata from the remote store even if a quorum of nodes is not available.
- It should be possible for a remote service (via a lease lock or other mechanisms) to assign a cluster leader without running an election loop within the cluster.
- Task Queue Per Metadata-type :- Cluster metadata updates use multiple task queues based on the entity to be updated
- Task Delegation :- Ability to delegate cluster coordination tasks to other nodes in the cluster, or to a remote service, and track them to completion.
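The lease lock mentioned in the goals above can be illustrated with a small sketch. It assumes the remote service keeps a single lease record (holder plus expiry) and grants it to whichever node asks first once the lease is free or expired; the holder stays leader for as long as it keeps renewing. `LeaseLock` is a hypothetical in-memory stand-in, and timestamps are passed in explicitly so the behaviour is deterministic.

```java
import java.util.Optional;

// Hypothetical in-memory model of a lease record held by a remote service.
final class LeaseLock {
    private final long leaseDurationMillis;
    private String holder;          // null while nobody holds the lease
    private long expiresAtMillis;   // meaningful only when holder != null

    LeaseLock(long leaseDurationMillis) {
        this.leaseDurationMillis = leaseDurationMillis;
    }

    // Grants the lease if it is free or expired, or renews it if the caller
    // already holds it. Returns true when the caller is leader afterwards.
    synchronized boolean tryAcquireOrRenew(String nodeId, long nowMillis) {
        boolean free = holder == null || nowMillis >= expiresAtMillis;
        if (free || holder.equals(nodeId)) {
            holder = nodeId;
            expiresAtMillis = nowMillis + leaseDurationMillis;
            return true;
        }
        return false;
    }

    // The current leader, or empty if the lease is unheld or has expired.
    synchronized Optional<String> leader(long nowMillis) {
        if (holder != null && nowMillis < expiresAtMillis) {
            return Optional.of(holder);
        }
        return Optional.empty();
    }
}
```

With a 10 second lease, a node that acquires at time 0 and renews at 5 seconds holds leadership until 15 seconds; a competing node is refused at 12 seconds and can take over only once the holder stops renewing and the lease lapses. No election loop runs inside the cluster; nodes only ask the service.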
Related component
Cluster Manager
Describe alternatives you've considered
No response
Additional context
No response