Using Machine Learning to Ensure the Capacity Safety of Individual Microservices

Using Machine Learning to Ensure the Capacity Safety of Individual Microservices

  • March 16, 2019
Table of Contents

Using Machine Learning to Ensure the Capacity Safety of Individual Microservices

Reliability engineering teams at Uber build the tools, libraries, and infrastructure that enable engineers to operate our thousands of microservices reliably at scale. At its essence, reliability engineering boils down to actively preventing outages that affect the mean time between failures (MTBF). As Uber’s global mobility platform grows, our global scale and complex network of microservice call patterns have made capacity requirements for individual services difficult to predict.

When we’re unable to predict service-level capacity requirements, capacity-related outages can occur. Given the real-time nature of our operations, capacity-related outages (affecting microservices at the feature level as opposed to directly impacting user experiences) constitute a large chunk of Uber’s overall outages. To help prevent these outages, our reliability engineers abide by a concept called capacity safety, in other words, the ability to support historical peaks or a 90-day forecast of concurrent riders-on-trip from a single data center without causing significant CPU, memory, and general resource starvation.

On the Maps Reliability Engineering team, we built in-house machine learning tooling that forecasts core service metrics—requests per second, latency, and CPU usage—to provide an idea of how these metrics will change throughout the course of a week or month. Using these forecasts, teams can perform accurate capacity safety tests that capture the nuances of each microservice.

Source: uber.com

Share :
comments powered by Disqus

Related Posts

Teaching AI to learn speech the way children do

Teaching AI to learn speech the way children do

A collaboration between the Facebook AI Research (FAIR) group and the Paris Sciences & Lettres University, with additional sponsorship from Microsoft Research, to challenge other researchers to teach AI systems to learn speech in a way that more closely resembles how young children learn. The ZeroSpeech 2019 challenge (which builds on previous efforts in 2015 and 2017) asks participants to build a speech synthesizer using only audio input, without any text or phonetic labels. The challenge’s central task is to build an AI system that can discover, in an unknown language, the machine equivalent of text of phonetic labels and use them to re-synthesize a sentence in a given voice.

Read More
Machine Learning-Powered Search Ranking of Airbnb Experiences

Machine Learning-Powered Search Ranking of Airbnb Experiences

How we built and iterated on a machine learning Search Ranking platform for a new two-sided marketplace and how we helped itgrow. Airbnb Experiences are handcrafted activities designed and led by expert hosts that offer a unique taste of local scene and culture. Each experience is vetted for quality by a team of editors before it makes its way onto the platform.

Read More