Engineering For Failure

September 25, 2020

Table of Contents

A set of practical patterns to recover from failures in external services Not so long ago, our systems were simple: we had one machine, with one process, probably no more than one external datastore, and the entire request lifecycle was processed and handled within this simple world. Our users were also accustomed to a certain SLA standard — a 2-second page load time could have been acceptable a few years ago, but waiting more than a second for an Instagram post is unthinkable nowadays. When systems get more complex, with strict latency requirements and a distributed infrastructure, an uninvited guest crawls up our systems — request failure.

Source: medium.com

Designing Edge Gateway, Uber’s API Lifecycle Management Platform

In October 2014, Uber had started its journey of scale in what would eventually turn out to be one of the most impressive growth phases in the company. Over time we were scaling our engineering teams non-linearly each month and acquiring millions of users across the world. In this article, we will go through the different phases of the evolution of Uber’s API gateway that powers Uber products.

How AWS Cloud Customers Are Using Local Zones for Edge Computing

Keeping sync fast with automated performance regression detection

Sync is a hard distributed systems problem and re-writing the heart of our sync engine on the desktop client was a monumental effort. We’ve previously discussed our efforts to heavily test durability at different layers of the system. Today, we are going to talk about how we ensured the performance of our new sync engine.

Engineering For Failure

Tags :

Share :

Related Posts

Designing Edge Gateway, Uber’s API Lifecycle Management Platform

How AWS Cloud Customers Are Using Local Zones for Edge Computing

Keeping sync fast with automated performance regression detection