Keeping sync fast with automated performance regression detection

  • September 25, 2020

Sync is a hard distributed systems problem, and rewriting the heart of our sync engine on the desktop client was a monumental effort. We’ve previously discussed our efforts to heavily test durability at different layers of the system. Today, we are going to talk about how we ensured the performance of our new sync engine.

In particular, we describe a performance regression testing framework we call Apogee. Apogee helps us find unanticipated performance issues in the development process and safeguard against bugs that we would otherwise release to our users. As we developed our new sync engine, we used Apogee to compare the performance of new vs. old, ensuring that the Dropbox sync experience didn’t suffer when we rolled Nucleus out to our users.

When we specifically sought to improve sync performance, we used Apogee as pre-release validation that our improvements had the intended impact. In this post, we’ll cover Apogee’s system design, the challenges we overcame while building it, and a few of the performance regressions it caught for us over the past two years.
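The excerpt doesn’t show how Apogee makes its comparisons, but the core idea it describes (run the same sync workload on an old and a new build, then flag meaningful slowdowns) can be sketched in a few lines. The sketch below is illustrative only: the function names, the use of medians, and the 5% threshold are assumptions, not details from the post.

```python
"""A minimal sketch of performance regression detection between two builds.

This is NOT Apogee itself; the post includes no code. It illustrates the
general approach the excerpt describes: collect repeated timing samples of
the same sync workload from a baseline build and a candidate build, then
flag the candidate if it is meaningfully slower.
"""

import statistics
from typing import List


def is_regression(baseline: List[float], candidate: List[float],
                  threshold: float = 0.05) -> bool:
    """Flag a regression when the candidate's median run time exceeds the
    baseline's median by more than `threshold` (here, a hypothetical 5%).

    Medians are compared instead of means so that a single noisy run
    (a GC pause, background I/O on the test machine) is less likely to
    trigger a false alarm.
    """
    base_med = statistics.median(baseline)
    cand_med = statistics.median(candidate)
    return cand_med > base_med * (1.0 + threshold)


if __name__ == "__main__":
    # Hypothetical wall-clock timings (seconds) for syncing the same files.
    old_engine = [10.2, 10.4, 10.1, 10.3, 10.2]
    new_engine = [11.5, 11.3, 11.6, 11.4, 11.7]
    if is_regression(old_engine, new_engine):
        print("Regression: candidate is >5% slower on median run time")
    else:
        print("No significant regression detected")
```

A real framework has to do considerably more than this, such as controlling for noisy benchmark machines and deciding how many runs are enough; those are the kinds of challenges the full post goes on to discuss.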

Source: dropbox.tech


Related Posts

3 Years of Kubernetes in Production–Here’s What We Learned

We built our first Kubernetes cluster in 2017, on version 1.9.4. We had two clusters: one ran on bare-metal RHEL VMs, the other on AWS EC2. Today, our Kubernetes infrastructure fleet consists of over 400 virtual machines spread across multiple data centres.

Read More
Production testing with dark canaries

Back in 2013, one of our large backend services wanted support in Rest.li for dark canaries. At the time, this meant duplicating requests from one host machine and sending them to another. Support was added via a Python tool that populated the host-to-host mapping in Apache ZooKeeper, along with a filter that read this mapping and multiplied traffic.

Read More
How we upgraded PostgreSQL at GitLab.com

We explain the precise maintenance process we used to execute a major version upgrade of PostgreSQL. The biggest challenge was performing a major upgrade across the whole fleet through an orchestrated pg_upgrade. We needed a rollback plan that fit within our Recovery Time Objective (RTO) while keeping a 12-node cluster’s 6 TB of data consistent and serving 300,000 aggregated transactions per second from around six million users.

Read More