BUILDING STORAGE-FIRST SERVERLESS APPLICATIONS WITH HTTP APIS SERVICE INTEGRATIONS

Over the last year, I have been talking about “storage first” serverless patterns. With these patterns, data is stored persistently before any business logic is applied. The advantage of this pattern is increased application resiliency. By persisting the data before processing, the original data is still available, if or when errors occur. Using Amazon API Gateway as a proxy to an AWS Lambda function is a common pattern in serverless applications.

SCALING SERVICES WITH SHARD MANAGER

We look at how Shard Manager is fully integrated in Facebook’s infrastructure ecosystem and provides a holistic, end-to-end solution supporting basic shard failover as well as sophisticated load balancing, shard scaling, and operational safety. Over the years, as we’ve expanded in scale and functionalities, Facebook has evolved from a basic web server architecture into a complex one with thousands of services working behind the scenes. It’s no trivial task to scale the wide range of back-end services needed for Facebook’s products.

3 YEARS OF KUBERNETES IN PRODUCTION–HERE’S WHAT WE LEARNED

We started out building our first Kubernetes cluster in 2017, version 1.9.4. We had two clusters, one that ran on bare-metal RHEL VMs, and another that ran on AWS EC2. Today, our Kubernetes infrastructure fleet consists of over 400 virtual machines spread across multiple data-centres. The platform hosts highly-available mission-critical software applications and systems, to manage a massive live network with nearly four million active devices. Kubernetes eventually made our lives easier, but the journey was a hard one, a paradigm shift.

WHAT HAPPENED TO OUR INFRASTRUCTURE WHEN A CUSTOMER GOT OVER 10 MILLION PAGE VIEWS IN A FEW HOURS?

Yesterday was an exhausting day. I woke up at 6 AM to a 1.5 million job backlog in our queue and immediately jumped out of bed. Now, this has happened before, so I no longer stress about it, and I got on my computer. The main detriment here was that people couldn’t see their current visitors & there was a delay in their dashboard stats.

DESIGNING EDGE GATEWAY, UBER’S API LIFECYCLE MANAGEMENT PLATFORM

In October 2014, Uber had started its journey of scale in what would eventually turn out to be one of the most impressive growth phases in the company. Over time we were scaling our engineering teams non-linearly each month and acquiring millions of users across the world. In this article, we will go through the different phases of the evolution of Uber’s API gateway that powers Uber products. We will walk through history to understand the evolution of architectural patterns that occurred alongside this breakneck growth phase.

HOW WE BUILT A SERVERLESS E-COMMERCE WEBSITE ON AWS TO COMBAT COVID-19

Lessons learned from manufacturing a physical product to building a scalable e-commerce platform on AWS. In a phone conversation in March 2020, Olalekan Elesin informed me of an idea he had while doing his regular grocery shopping at DM Drogerie. The idea is centered on a bracelet, like your typical watch, that dispenses sanitizer. Before this time, sanitizer came packaged in bottles and cans of different sizes and kinds.

VALORANT’S 128-TICK SERVERS

To provide a short summary – in VALORANT, a key part of the gameplay is taking strategic positions and holding them. Holding positions can become impossible if other players can run around a corner and kill the defender before the defender can react due to latency. That latency is partly based on the network and partly based on the server tick rate. To give defenders the time they need to react to aggressors, we determined that VALORANT would require 128-tick servers.

EVERNOTE’S CEO ON THE COMPANY’S LONG, TRICKY JOURNEY TO FIX ITSELF

When Small joined Evernote in late 2018, the problem had eclipsed the room at the once-highly-regarded and still-hugely-popular company. (Even after years of problems, it still has more than 250 million users.) Evernote had five different apps run by five different teams for five different platforms, and each had its own set of features, design touches and technical issues. Internally, Evernote employees called the app’s codebase ‘the monolith,’ and that monolith had grown so big and complex, it was preventing the company from shipping cross-platform features or doing much of anything in a short time.

HOW WE UPGRADED POSTGRESQL AT GITLAB.COM

We explain the precise maintenance process to execute a major version upgrade of PostgreSQL. The biggest challenge was to do a complete fleet major upgrade through an orchestrated pg_upgrade. We needed to have a rollback plan to optimize our capacity right after Recovery Time Objective (RTO) while keeping a 12-node cluster’s 6 TB of data consistent and serving 300,000 aggregated transactions per second from around six million users. The best way to resolve an engineering challenge is to follow the blueprints and design docs.

THREE BASECAMP OUTAGES. ONE WEEK. WHAT HAPPENED?

Basecamp has suffered through three serious outages in the last week, on Friday, August 28th, on Tuesday, September 1, and again today. It’s embarrassing, and we’re deeply sorry. This is more than a blip or two. Basecamp has been down during the middle of your day. We know these outages have really caused issues for you and your work. We’ve put you in the position of explaining Basecamp’s reliability to your customers and clients, too.
