Keeping Sync Fast With Automated Performance Regression Detection

Keeping sync fast with automated performance regression detection

Sync is a hard distributed systems problem and re-writing the heart of our sync engine on the desktop client was a monumental effort. We’ve previously discussed our efforts to heavily test durability at different layers of the system. Today, we are going to talk about how we ensured the performance of our new sync engine.

In particular, we describe a performance regression testing framework we call Apogee. Apogee helps us find unanticipated performance issues in the development process and safeguard against bugs that we would otherwise release to our users. As we developed our new sync engine, we used Apogee to compare the performance of new vs. old, ensuring that the Dropbox sync experience didn’t suffer when we rolled Nucleus out to our users.

When we specifically sought to improve sync performance, we used Apogee as pre-release validation that our improvements had the intended impact. In this post, we’ll be covering Apogee’s system design, how we overcame challenges we faced while building it, and finish by discussing a few performance regressions it caught for us over the past two years.