Optimal Shard Placement in a Petabyte Scale Elasticsearch Cluster

  • November 9, 2018

By default, Elasticsearch's shard allocation takes into account:

• The number of shards on each node. Elasticsearch tries to balance the number of shards per node evenly across the cluster.
• The high and low disk watermarks. Elasticsearch considers the available disk space on a node before deciding whether to allocate new shards to that node or to actively relocate shards away from it. A node that has reached the low watermark (e.g. 80% disk used) is not allowed to receive any more shards. A node that has reached the high watermark (e.g. 90% disk used) will start to actively move shards away.
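
The two allocation factors above can be sketched as a toy model. This is a simplified illustration under stated assumptions, not Elasticsearch's actual allocator, which combines weighted shard and index balance factors with a set of allocation deciders. The `Node` class, the helper functions and the 80%/90% thresholds are hypothetical and taken from the example figures in the text (Elasticsearch's own default low watermark is 85%). The `watermark_settings` helper only builds a JSON body for the real `PUT _cluster/settings` endpoint; it does not send anything.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed thresholds, matching the example percentages in the text.
# In a real cluster these are set via the
# cluster.routing.allocation.disk.watermark.low / .high settings.
LOW_WATERMARK = 0.80
HIGH_WATERMARK = 0.90


@dataclass
class Node:
    """Hypothetical view of a data node: its shard count and disk usage."""
    name: str
    shard_count: int
    disk_used: float  # fraction of disk in use, 0.0 to 1.0


def can_receive_shards(node: Node) -> bool:
    """A node that has reached the low watermark gets no new shards."""
    return node.disk_used < LOW_WATERMARK


def must_shed_shards(node: Node) -> bool:
    """A node that has reached the high watermark should move shards away."""
    return node.disk_used >= HIGH_WATERMARK


def pick_target(nodes: List[Node]) -> Optional[Node]:
    """Simplified balancing: among nodes below the low watermark,
    pick the one holding the fewest shards. Returns None if no node qualifies."""
    eligible = [n for n in nodes if can_receive_shards(n)]
    if not eligible:
        return None
    return min(eligible, key=lambda n: n.shard_count)


def watermark_settings(low: str = "80%", high: str = "90%") -> dict:
    """Build a request body for PUT _cluster/settings that sets the watermarks."""
    return {
        "persistent": {
            "cluster.routing.allocation.disk.watermark.low": low,
            "cluster.routing.allocation.disk.watermark.high": high,
        }
    }


if __name__ == "__main__":
    nodes = [
        Node("a", shard_count=40, disk_used=0.85),  # past low watermark
        Node("b", shard_count=55, disk_used=0.60),
        Node("c", shard_count=30, disk_used=0.92),  # past high watermark
        Node("d", shard_count=50, disk_used=0.70),
    ]
    print(pick_target(nodes).name)
    print([n.name for n in nodes if must_shed_shards(n)])
```

In this model, node "d" is chosen for a new shard: "a" and "c" are excluded by the low watermark even though they hold fewer shards, and "c" is also flagged for relocation. That interaction between disk pressure and shard-count balancing is one reason shards keep moving in a large cluster.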

Shards move between nodes for several reasons:

• New indices being created and old indices being dropped.
• Disk watermarks being triggered by indexing and by other shard movements.
• Elasticsearch randomly deciding that a node has too few or too many shards compared to the cluster average.
• Hardware and OS-level failures causing new AWS instances to spin up and join the cluster. With 500+ nodes, this happens several times a week on average.
• New nodes being added, almost every week, because of normal data growth.

Source: meltwater.com

