Observability at Scale: Building Uber’s Alerting Ecosystem

Observability at Scale: Building Uber’s Alerting Ecosystem

  • December 22, 2018
Table of Contents

Observability at Scale: Building Uber’s Alerting Ecosystem

Uber’s software architectures consists of thousands of microservices that empower teams to iterate quickly and support our company’s global growth. These microservices support a variety of solutions, such as mobile applications, internal and infrastructure services, and products along with complex configurations that affect these products at city and sub-city levels. To maintain our growth and architecture, Uber’s Observability team built a robust, scalable metrics and alerting pipeline responsible for detecting, mitigating, and notifying engineers of issues with their services as soon as they occur.

Specifically, we built two in-data center alerting systems, called uMonitor and Neris, that flow into the same notification and alerting pipeline. uMonitor is our metrics-based alerting system that runs checks against our metrics database M3, while Neris primarily looks for alerts in host-level infrastructure.

Source: uber.com

Share :
comments powered by Disqus

Related Posts

Bye bye Mongo, Hello Postgres

Bye bye Mongo, Hello Postgres

In April the Guardian switched off the Mongo DB cluster used to store our content after completing a migration to PostgreSQL on Amazon RDS. This post covers why and how At the Guardian, the majority of content – including articles, live blogs, galleries and video content – is produced in our in-house CMS tool, Composer. This, until recently, was backed by a Mongo DB database running on AWS.

Read More
Kubernetes Federation V2

Kubernetes Federation V2

With datacenters spread across the globe, users are increasingly looking at ways to spread their applications and services across multiple locales or clusters. This need is driven by multiple use cases: from providing high availability, spreading load across multiple clusters while being resilient to individual cluster failures; to avoiding provider lock-in by using hybrid cloud … With datacenters spread across the globe, users are increasingly looking at ways to spread their applications and services across multiple locales or clusters.

Read More