Over the past four years, Netflix has gone from less than 50 Million subscribers to 125 Million subscribers. While this kind of growth has caused us no shortage of scaling challenges, we actually managed to improve the overall availability of our service in that time frame. Along the way, we have learned a lot and now have a much better understanding of what it takes to make our system more highly available. But the news is not all good. The truth is that we learned many of our lessons the hard way: through heroics, through mad scrambles when things went wrong, and sometimes unfortunately through customer-facing incidents. Even though we haven’t figured everything out and still have many opportunities to improve our systems, we want to share some of the experience we have gained and the tips or best practices we derived. Hopefully some of you will take something away that will save you a wake-up call at 3am for a customer-facing incident.