Tips for High Availability

April 30, 2018

Over the past four years, Netflix has grown from fewer than 50 million subscribers to 125 million. While this kind of growth has caused us no shortage of scaling challenges, we actually managed to improve the overall availability of our service over the same period. Along the way, we have learned a lot and now have a much better understanding of what it takes to make our system highly available.

But the news is not all good. The truth is that we learned many of our lessons the hard way: through heroics, through mad scrambles when things went wrong, and sometimes, unfortunately, through customer-facing incidents. Even though we haven’t figured everything out and still have many opportunities to improve our systems, we want to share some of the experience we have gained and the tips and best practices we derived from it.

Hopefully some of you will take something away that saves you a 3 a.m. wake-up call for a customer-facing incident.

Source: medium.com

