Using Machine Learning to Ensure the Capacity Safety of Individual Microservices

Using Machine Learning to Ensure the Capacity Safety of Individual Microservices

  • March 16, 2019
Table of Contents

Using Machine Learning to Ensure the Capacity Safety of Individual Microservices

Reliability engineering teams at Uber build the tools, libraries, and infrastructure that enable engineers to operate our thousands of microservices reliably at scale. At its essence, reliability engineering boils down to actively preventing outages that affect the mean time between failures (MTBF). As Uber’s global mobility platform grows, our global scale and complex network of microservice call patterns have made capacity requirements for individual services difficult to predict.

When we’re unable to predict service-level capacity requirements, capacity-related outages can occur. Given the real-time nature of our operations, capacity-related outages (affecting microservices at the feature level as opposed to directly impacting user experiences) constitute a large chunk of Uber’s overall outages. To help prevent these outages, our reliability engineers abide by a concept called capacity safety, in other words, the ability to support historical peaks or a 90-day forecast of concurrent riders-on-trip from a single data center without causing significant CPU, memory, and general resource starvation.

On the Maps Reliability Engineering team, we built in-house machine learning tooling that forecasts core service metrics—requests per second, latency, and CPU usage—to provide an idea of how these metrics will change throughout the course of a week or month. Using these forecasts, teams can perform accurate capacity safety tests that capture the nuances of each microservice.

Source: uber.com

Share :
comments powered by Disqus

Related Posts

AI year in review

AI year in review

At Facebook, we think that artificial intelligence that learns in new, more efficient ways – much like humans do – can play an important role in bringing people together. That core belief helps drive our AI strategy, focusing our investments in long-term research related to systems that learn using real-world data, inspiring our engineers to share cutting-edge tools and platforms with the wider AI community, and ultimately demonstrating new ways to use the technology to benefit the world. In 2018, we made important progress in all these areas.

Read More
AI Blueprints: Implementing content-based recommendations using Python

AI Blueprints: Implementing content-based recommendations using Python

In this article, we’ll have a look at how you can implement a content-based recommendation system using Python and the scikit-learn library. But before diving straight into this, it’s important to have some prerequisite knowledge of the different ways by which recommendation systems can recommend an item to users. Content-based: A content-based recommendation finds similar items to a given item by examining the item’s properties, such as its title or description, category, or dependencies on other items (for example, electronic toys require batteries).

Read More
Machine Learning-Powered Search Ranking of Airbnb Experiences

Machine Learning-Powered Search Ranking of Airbnb Experiences

How we built and iterated on a machine learning Search Ranking platform for a new two-sided marketplace and how we helped itgrow. Airbnb Experiences are handcrafted activities designed and led by expert hosts that offer a unique taste of local scene and culture. Each experience is vetted for quality by a team of editors before it makes its way onto the platform.

Read More