Optimal Shard Placement in a Petabyte Scale Elasticsearch Cluster

  • November 9, 2018

Elasticsearch's shard allocation takes two things into account:

  • The number of shards on each node. Elasticsearch tries to balance the number of shards per node evenly across the cluster.
  • The high and low disk watermarks. Elasticsearch considers the available disk space on a node before deciding whether to allocate new shards to that node or to actively relocate shards away from it. A node that has reached the low watermark (i.e. 80% disk used) is not allowed to receive any more shards. A node that has reached the high watermark (i.e. 90% disk used) will start to actively move shards away from itself.

Shards move around the cluster for several reasons:

  • New indices are created and old indices are dropped.
  • Disk watermarks trigger due to indexing and other shard movements.
  • Elasticsearch randomly decides that a node has too few or too many shards compared to the cluster average.
  • Hardware and OS-level failures cause new AWS instances to spin up and join the cluster. With 500+ nodes this happens several times a week on average.
  • New nodes are added almost every week because of normal data growth.
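To make the two allocation criteria described above concrete, here is a minimal sketch in Python. It is a toy model, not Elasticsearch's actual allocator: the `Node` class and the helper functions are hypothetical names introduced for illustration, and the thresholds are simply the 80% and 90% figures quoted above. In a real cluster the thresholds are set through the `cluster.routing.allocation.disk.watermark.low` and `cluster.routing.allocation.disk.watermark.high` settings.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds taken from the figures quoted in the text (assumed, configurable in practice).
LOW_WATERMARK = 0.80
HIGH_WATERMARK = 0.90


@dataclass
class Node:
    """Hypothetical, simplified view of a data node."""
    name: str
    disk_used: float   # fraction of disk in use, 0.0 to 1.0
    shard_count: int


def can_receive_shards(node: Node) -> bool:
    # A node that has reached the low watermark gets no new shards.
    return node.disk_used < LOW_WATERMARK


def must_shed_shards(node: Node) -> bool:
    # A node that has reached the high watermark starts moving shards away.
    return node.disk_used >= HIGH_WATERMARK


def pick_target(nodes: List[Node]) -> Optional[Node]:
    # Among nodes below the low watermark, prefer the one with the fewest
    # shards, which nudges the cluster toward an even shard count per node.
    candidates = [n for n in nodes if can_receive_shards(n)]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.shard_count)


nodes = [
    Node("a", disk_used=0.50, shard_count=10),
    Node("b", disk_used=0.85, shard_count=2),   # past the low watermark
    Node("c", disk_used=0.95, shard_count=1),   # past the high watermark
    Node("d", disk_used=0.60, shard_count=7),
]
target = pick_target(nodes)
to_drain = [n.name for n in nodes if must_shed_shards(n)]
```

In this sketch node "b" has the second-fewest shards but is skipped because it has passed the low watermark, and node "c" is flagged for draining. Note that neither criterion looks at how large or how busy a shard is, only at shard counts and disk usage.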

Source: meltwater.com
