How should pipelines be monitored?

  • August 4, 2019

For online serving systems, it’s fairly well known that you should monitor request rate, errors, and duration. What about offline processing pipelines, though? For a typical web application, high latency or a high error rate is the sort of thing you want to wake someone up about, as it usually hurts the end user’s experience.

Request rate isn’t something to alert on in and of itself; however, it’s important to know, as it’s often related to errors and latency, and you’ll want it for capacity planning. An offline processing pipeline typically involves queues (such as Kafka) between various stages of computation. There’s no end user eagerly waiting for a web page to load, but how long it takes for data to get through the pipeline is still a key metric.
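One way to track how long data takes to get through is to stamp each record when it first enters the pipeline and compute its age when the final stage finishes with it. The sketch below is only an illustration of that idea: the `Record` type, the `ingested_at` field, and both function names are hypothetical, not taken from any particular queueing system.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Record:
    payload: str
    # Hypothetical field: Unix seconds stamped when the record first
    # entered the pipeline, carried unchanged through every stage.
    ingested_at: float


def end_to_end_latency(record: Record, now: float) -> float:
    """Seconds between ingestion and `now`, clamped at zero for clock skew."""
    return max(0.0, now - record.ingested_at)


def worst_latency(records: Iterable[Record], now: float) -> float:
    """Largest end-to-end latency in a batch leaving the final stage."""
    return max((end_to_end_latency(r, now) for r in records), default=0.0)


if __name__ == "__main__":
    batch = [Record("a", ingested_at=100.0), Record("b", ingested_at=130.0)]
    print(worst_latency(batch, now=160.0))
```

Exporting the worst (or a high percentile) latency of each completed batch gives a single number that describes the pipeline as a whole rather than any one stage.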

Similarly, if data goes in but an error causes it to be dropped or otherwise not correctly processed, that’s usually something to be concerned about. In addition, there’s how much data is sitting in each queue, how fast data is being added, and how fast data is being removed. Many teams alert on too much data being in a queue, and such alerts tend to be a bit spammy.
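The three per-queue numbers mentioned here (depth, add rate, remove rate) can all be derived from two ever-increasing counters sampled at two points in time. The following is a minimal sketch under that assumption; `QueueSample` and `queue_stats` are hypothetical names, and a real system would typically get these counters from its queue or metrics library.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    timestamp: float    # Unix seconds when the counters were read
    added_total: int    # items ever added to the queue (assumed monotonic)
    removed_total: int  # items ever removed from the queue (assumed monotonic)


def queue_stats(prev: QueueSample, curr: QueueSample) -> dict:
    """Derive depth and per-second add/remove rates from two samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return {
        # Items currently sitting in the queue.
        "depth": curr.added_total - curr.removed_total,
        # Average items per second over the sampling interval.
        "add_rate": (curr.added_total - prev.added_total) / elapsed,
        "remove_rate": (curr.removed_total - prev.removed_total) / elapsed,
    }


if __name__ == "__main__":
    prev = QueueSample(timestamp=0.0, added_total=1000, removed_total=900)
    curr = QueueSample(timestamp=60.0, added_total=1600, removed_total=1200)
    print(queue_stats(prev, curr))
```

These values are useful for debugging and capacity planning even when they are not used for alerting.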

First off, any alert on a fixed threshold tends to get out of date as traffic grows; for the same reason, it’s better to alert on the ratio of HTTP errors to total requests rather than on how many errors happen per second. The more serious issue, however, is that one queue having a certain number of items in it doesn’t mean that the overall pipeline is processing data too slowly, and setting thresholds high enough to avoid such false positives would miss actual problems. This is typical of alerts that work off causes rather than symptoms.
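To illustrate why a ratio ages better than a fixed per-second threshold, here is a small sketch. The function names and the 5% threshold are assumptions chosen purely for the example.

```python
def error_ratio(errors_per_sec: float, requests_per_sec: float) -> float:
    """Fraction of requests that failed; 0.0 when there is no traffic."""
    if requests_per_sec <= 0:
        return 0.0
    return errors_per_sec / requests_per_sec


def should_alert(errors_per_sec: float, requests_per_sec: float,
                 max_ratio: float = 0.05) -> bool:
    """Alert when the error ratio exceeds `max_ratio` (assumed 5% here)."""
    return error_ratio(errors_per_sec, requests_per_sec) > max_ratio


if __name__ == "__main__":
    # The same 10 errors/sec means very different things as traffic grows.
    print(should_alert(errors_per_sec=10, requests_per_sec=100))
    print(should_alert(errors_per_sec=10, requests_per_sec=10_000))
```

With 10 errors per second, the ratio is 10% at 100 requests per second but only 0.1% at 10,000, so a fixed "10 errors per second" threshold would fire on healthy traffic once volume grows, while the ratio-based check would not.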

Source: robustperception.io

