VIPKid chose TiDB to manage its high-data-volume, highly concurrent write application. Learn how TiDB excels in that scenario, along with multidimensional queries, data life cycle management, and real-time analytics. We use MySQL as our backend database. But as our application data grew rapidly, standalone MySQL’s storage capacity became a bottleneck, and it could […]
A set of practical patterns to recover from failures in external services. Not so long ago, our systems were simple: we had one machine, with one process, probably no more than one external datastore, and the entire request lifecycle was processed and handled within this simple world. Our users were also accustomed to a certain […]
Amazon Web Services says users are tapping Local Zones to run hybrid environments and support latency-intensive tasks like game rendering. Source: datacenterfrontier
Sync is a hard distributed systems problem and re-writing the heart of our sync engine on the desktop client was a monumental effort. We’ve previously discussed our efforts to heavily test durability at different layers of the system. Today, we are going to talk about how we ensured the performance of our new sync engine. […]
Back in 2013, one of our large backend services wanted support in Rest.li for dark canaries. The service, at the time, involved duplicating requests from one host machine and sending them to another host machine. This was added via a Python tool to populate the host-to-host mapping in Apache ZooKeeper along with a filter to […]
Over the last year, I have been talking about “storage first” serverless patterns. With these patterns, data is stored persistently before any business logic is applied. The advantage of this pattern is increased application resiliency. By persisting the data before processing, the original data is still available, if or when errors occur. Using Amazon API […]
We look at how Shard Manager is fully integrated in Facebook’s infrastructure ecosystem and provides a holistic, end-to-end solution supporting basic shard failover as well as sophisticated load balancing, shard scaling, and operational safety. Over the years, as we’ve expanded in scale and functionalities, Facebook has evolved from a basic web server architecture into a […]
We started out building our first Kubernetes cluster in 2017, version 1.9.4. We had two clusters, one that ran on bare-metal RHEL VMs, and another that ran on AWS EC2. Today, our Kubernetes infrastructure fleet consists of over 400 virtual machines spread across multiple data-centres. The platform hosts highly available, mission-critical software applications and systems, to […]
What happened to our infrastructure when a customer got over 10 million page views in a few hours? Yesterday was an exhausting day. I woke up at 6 AM to a 1.5 million job backlog in our queue and immediately jumped out of bed. Now, this has happened before, so I no longer stress about […]
In October 2014, Uber had started its journey of scale in what would eventually turn out to be one of the most impressive growth phases in the company. Over time we were scaling our engineering teams non-linearly each month and acquiring millions of users across the world. In this article, we will go through the […]