Automating Datacenter Operations at Dropbox

Automating Datacenter Operations at Dropbox

  • January 23, 2019
Table of Contents

Automating Datacenter Operations at Dropbox

Switch provisioning at Dropbox is handled by a Pirlo component called the TOR Starter. The TOR Starter is responsible for validating and configuring switches in our datacenter server racks, PoP server racks, and at the different layers of our datacenter fabric that connect racks in the same facility together. Writing the TOR Starter on top of the ClusterOps queue provides us with a basic manager-worker queuing service.

We also have the ability to customize the queue to fit our needs in switch provisioning. The switch job table (shown below) is an extension of the basic job table. Similarly, the TOR Starter queue manager thread implementation is customized to queue switch jobs, and the TOR Starter worker implements all of the switch validation and provisioning logic.

Along with all of the switch job attributes, there are several tables that provide a comprehensive view of a switch job. As a switch job is running, the current state can be queried by a client and displayed in our user interface. After a switch job has completed, all of the job’s state is kept in the database and can be queried for reporting and analytics purposes.

Tables in the database also hold information related to each component in the switch such as its network uplinks, fans, and power supplies. All of the captured data from a switch is linked to a particular switch job.

Source: dropbox.com

Share :
comments powered by Disqus

Related Posts

Building Services at Airbnb Part 3

Building Services at Airbnb Part 3

In the third post of our series on scaling service development, we dive into resilience engineering practices built into the standard service platform that powers the new Services Oriented Architecture atAirbnb. Airbnb is moving its infrastructure towards a Service Oriented Architecture. A reliable, performant, and developer-friendly polyglot service platform is an underpinning component in Airbnb’s architectural evolution.

Read More
Kubernetes Failure Stories

Kubernetes Failure Stories

I started to compile a list of public failure/horror stories related to Kubernetes. It should make it easier for people tasked with operations to find outage reports to learn from. Since we started with Kubernetes at Zalando in 2016, we collected many internal postmortems.

Read More