Keeping sync fast with automated performance regression detection

September 25, 2020

Sync is a hard distributed-systems problem, and rewriting the heart of our sync engine on the desktop client was a monumental effort. We’ve previously discussed our efforts to heavily test durability at different layers of the system. Today, we are going to talk about how we ensured the performance of our new sync engine.

In particular, we describe a performance regression testing framework we call Apogee. Apogee helps us find unanticipated performance issues during development and catch bugs that we would otherwise have released to our users. As we developed our new sync engine, we used Apogee to compare the performance of the new engine against the old one, ensuring that the Dropbox sync experience didn’t suffer when we rolled out Nucleus, our new sync engine, to our users.

When we specifically sought to improve sync performance, we used Apogee as pre-release validation that our improvements had the intended impact. In this post, we’ll cover Apogee’s system design and how we overcame the challenges we faced while building it. We'll finish by discussing a few performance regressions it caught for us over the past two years.

Source: dropbox.tech

