Production testing with dark canaries

  • September 25, 2020

Back in 2013, one of our large backend services wanted support in Rest.li for dark canaries. The solution at the time involved duplicating requests received by one host machine and sending them to another host machine. It was implemented with a Python tool that populated a host-to-host mapping in Apache ZooKeeper, along with a filter that read this mapping and multiplied traffic.
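
To make that original setup concrete, here is a minimal sketch of how such a scheme could look: a small script writes the production-host-to-dark-canary mapping into ZooKeeper, and a duplication step consults the mapping and tees a copy of each request to the canary while discarding its response. The znode layout, host names, and the use of the kazoo and requests libraries are illustrative assumptions, not LinkedIn's actual implementation.

```python
# Sketch of the 2013-era approach (assumed details, not LinkedIn's actual tooling):
# a script populates a production-host -> dark-canary-host mapping in Apache
# ZooKeeper, and a request filter consults that mapping to fire off a duplicate
# ("dark") copy of each request whose response is thrown away.

import threading

import requests                      # hypothetical transport for the dark copy
from kazoo.client import KazooClient

MAPPING_ROOT = "/dark-canary/host-mapping"   # assumed znode layout


def register_dark_canary(zk: KazooClient, prod_host: str, canary_host: str) -> None:
    """What the Python tool did: write one host-to-host entry (per data center)."""
    path = f"{MAPPING_ROOT}/{prod_host}"
    zk.ensure_path(MAPPING_ROOT)
    if zk.exists(path):
        zk.set(path, canary_host.encode())
    else:
        zk.create(path, canary_host.encode())


def lookup_dark_canary(zk: KazooClient, prod_host: str) -> str | None:
    """What the duplicating filter did per request: read the mapping from ZooKeeper."""
    path = f"{MAPPING_ROOT}/{prod_host}"
    if not zk.exists(path):
        return None
    data, _stat = zk.get(path)
    return data.decode() or None


def tee_request(zk: KazooClient, prod_host: str, method: str, uri: str, body: bytes) -> None:
    """Fire-and-forget a duplicate request to the dark canary; ignore its response."""
    canary_host = lookup_dark_canary(zk, prod_host)
    if canary_host is None:
        return  # no dark canary mapped for this production host

    def _send() -> None:
        try:
            requests.request(method, f"http://{canary_host}{uri}", data=body, timeout=1)
        except requests.RequestException:
            pass  # dark traffic must never affect the real request path

    threading.Thread(target=_send, daemon=True).start()


if __name__ == "__main__":
    zk = KazooClient(hosts="zk.example.com:2181")  # placeholder ZooKeeper ensemble
    zk.start()
    register_dark_canary(zk, "prod-host-01.example.com:8080",
                         "canary-host-01.example.com:8080")
    tee_request(zk, "prod-host-01.example.com:8080", "GET", "/resources/widgets", b"")
    zk.stop()
```

The key property of the duplication step is that it is fire-and-forget: the dark copy must never add latency or failures to the real request, and its response is never returned to the client.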

As operational complexity grew (due to additional data centers, dark canaries being used in midtier and even frontend services, and dynamic scale-up and scale-down of instances), this setup became harder to maintain. More teams were using dark canaries, but our developers and SREs were still hindered by how difficult it was to onboard and maintain them. For example, when dark canaries suddenly stopped receiving traffic or disappeared because hosts were swapped out from underneath them, engineers had to tediously recreate the host-to-host mapping in every data center.

It was clear we needed a new solution.

Source: linkedin.com

