Cross shard transactions at 10 million requests per second

Cross shard transactions at 10 million requests per second

  • November 9, 2018
Table of Contents

Cross shard transactions at 10 million requests per second

Dropbox stores petabytes of metadata to support user-facing features and to power our production infrastructure. The primary system we use to store this metadata is named Edgestore and is described in a previous blog post, (Re)Introducing Edgestore. In simple terms, Edgestore is a service and abstraction over thousands of MySQL nodes that provides users with strongly consistent, transactional reads and writes at low latency.

Edgestore hides details of physical sharding from the application layer to allow developers to scale out their metadata storage needs without thinking about complexities of data placement and distribution. Central to building a distributed database on top of individual MySQL shards in Edgestore is the ability to collocate related data items together on the same shard. Developers express logical collocation of data via the concept of a colo, indicating that two pieces of data are typically accessed together.

In turn, Edgestore provides low-latency, transactional guarantees for reads and writes within a given colo (by placing them on the same physical MySQL shard), but only best-effort support across colos. While the product use-cases at Dropbox are usually a good fit for collocation, over time we found that certain ones just aren’t easily partitionable. As a simple example, an association between a user and the content they share with another user is unlikely to be collocated, since the users likely live on different shards.

Even if we were to attempt to reorganize physical storage such that related colos land on the same physical shards, we would never get a perfect cut of data.

Source: dropbox.com

Share :
comments powered by Disqus

Related Posts

Learning Concepts with Energy Functions

Learning Concepts with Energy Functions

We’ve developed an energy-based model that can quickly learn to identify and generate instances of concepts, such as near, above, between, closest, and furthest, expressed as sets of 2d points. Our model learns these concepts after only five demonstrations.

Read More
Horizon: An open-source reinforcement learning platform

Horizon: An open-source reinforcement learning platform

Horizon is the first open source end-to-end platform that uses applied reinforcement learning (RL) to optimize systems in large-scale production environments. The workflows and algorithms included in this release were built on open frameworks — PyTorch 1.0, Caffe2, and Spark — making Horizon accessible to anyone using RL at scale. We’ve put Horizon to work internally over the past year in a wide range of applications, including helping to personalize M suggestions, delivering more meaningful notifications, and optimizing streaming video quality.

Read More
New Theory of Intelligence May Disrupt AI and Neuroscience

New Theory of Intelligence May Disrupt AI and Neuroscience

Recent advancement in artificial intelligence, namely in deep learning, has borrowed concepts from the human brain. The architecture of most deep learning models is based on layers of processing– an artificial neural network that is inspired by the neurons of the biological brain. Yet neuroscientists do not agree on exactly what intelligence is, and how it is formed in the human brain — it’s a phenomena that remains unexplained.

Read More