News

Building storage-first serverless applications with HTTP APIs service integrations

Over the last year, I have been talking about “storage first” serverless patterns. With these patterns, data is stored persistently before any business logic is applied. The advantage of this pattern is increased application resiliency.

Scaling services with Shard Manager

We look at how Shard Manager is fully integrated in Facebook’s infrastructure ecosystem and provides a holistic, end-to-end solution supporting basic shard failover as well as sophisticated load balancing, shard scaling, and operational safety. Over the years, as we’ve expanded in scale and functionalities, Facebook has evolved from a basic web server architecture into a complex one with thousands of services working behind the scenes. It’s no trivial task to scale the wide range of back-end services needed for Facebook’s products.

3 Years of Kubernetes in Production–Here’s What We Learned

We started out building our first Kubernetes cluster in 2017, version 1.9.4. We had two clusters, one that ran on bare-metal RHEL VMs, and another that ran on AWS EC2. Today, our Kubernetes infrastructure fleet consists of over 400 virtual machines spread across multiple data-centres.

Designing Edge Gateway, Uber’s API Lifecycle Management Platform

In October 2014, Uber had started its journey of scale in what would eventually turn out to be one of the most impressive growth phases in the company. Over time we were scaling our engineering teams non-linearly each month and acquiring millions of users across the world. In this article, we will go through the different phases of the evolution of Uber’s API gateway that powers Uber products.

VALORANT’s 128-Tick Servers

To provide a short summary – in VALORANT, a key part of the gameplay is taking strategic positions and holding them. Holding positions can become impossible if other players can run around a corner and kill the defender before the defender can react due to latency. That latency is partly based on the network and partly based on the server tick rate.

How we upgraded PostgreSQL at GitLab.com

We explain the precise maintenance process used to execute a major version upgrade of PostgreSQL. The biggest challenge was to perform a major upgrade of the complete fleet through an orchestrated pg_upgrade. We needed a rollback plan to optimize our capacity right after the Recovery Time Objective (RTO), while keeping the 6 TB of data on a 12-node cluster consistent and serving 300,000 aggregated transactions per second from around six million users.

Three Basecamp outages. One week. What happened?

Basecamp has suffered through three serious outages in the last week, on Friday, August 28th, on Tuesday, September 1, and again today. It’s embarrassing, and we’re deeply sorry. This is more than a blip or two.
