Building Services at Airbnb Part 3

In the third post of our series on scaling service development, we dive into the resilience engineering practices built into the standard service platform that powers the new Service Oriented Architecture at Airbnb. Airbnb is moving its infrastructure towards a Service Oriented Architecture. A reliable, performant, and developer-friendly polyglot service platform is an underpinning component in Airbnb’s architectural evolution.

In Part 1 and Part 2 of our Building Services series, we shared how we used a Thrift IDL-centered service framework to scale the development of services; how a standardized service platform encourages and enforces infrastructure standards; and how to enforce best practices for all new services without incurring additional development overhead. Service Oriented Architecture cultivates ownership and boosts development velocity. However, it imposes a new set of challenges.

The system complexity of distributed services is much higher than that of monolithic applications, and many techniques that used to work in a monolithic architecture are no longer sufficient. In this post, we share how we built resilience engineering into our service platform standards and helped service owners improve the availability of their services. In a distributed services architecture, service resilience is a hard requirement.

Each service’s ability to respond and avoid downtime decreases as the complexity of inter-service communication increases. As an example, the Airbnb Homes PDP (Product Detail Page) needs to fetch data from 20 downstream services. Assuming we didn’t take measures to improve our resilience, if these 20 dependent services each had 99.9% availability and failed independently, their availabilities would multiply (0.999^20 ≈ 0.980), so our PDP would only have 98.0% uptime, or about 14.5 hours of downtime each month.
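
The arithmetic behind that estimate can be reproduced with a short sketch. It assumes that the page needs every dependency to be up, that the dependencies fail independently, and that a month is roughly 730 hours; the function names are illustrative and not part of any Airbnb tooling.

```python
def composite_availability(per_service: float, n_services: int) -> float:
    """Availability of a page that needs all n_services to be up.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return per_service ** n_services


def monthly_downtime_hours(availability: float, hours_in_month: float = 730.0) -> float:
    """Expected downtime per month; 730 hours is an assumed average month length."""
    return (1.0 - availability) * hours_in_month


# The PDP example: 20 downstream services at 99.9% availability each.
pdp_availability = composite_availability(0.999, 20)
pdp_downtime = monthly_downtime_hours(pdp_availability)

print(f"PDP availability: {pdp_availability:.1%}")
print(f"Downtime per month: {pdp_downtime:.1f} hours")
```

Under these assumptions, every additional required dependency multiplies in another factor below 1, which is why a page can be noticeably less available than any single service it calls.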