How much RAM does Prometheus 2.x need for cardinality and ingestion?

May 11, 2019

Prometheus 2.x has a very different ingestion system to 1.x, with many performance improvements. This time I’m also going to take into account the cost of cardinality in the head block. To start with I took a profile of a Prometheus 2.9.2 ingesting from a single target with 100k unique time series. This gives a good starting point to find the relevant bits of code, but as my Prometheus has only just started it doesn’t have quite everything.
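
The post doesn’t show the target used for the test, but a minimal sketch of one way to expose 100k unique series from a single target (assuming the client_golang library; the test_metric name, the id label, and port 8080 are placeholders of my own) could look like this:

    package main

    import (
        "fmt"
        "log"
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    func main() {
        // One gauge vector with 100k distinct label values gives 100k unique
        // time series from a single scrape target.
        series := prometheus.NewGaugeVec(
            prometheus.GaugeOpts{Name: "test_metric", Help: "Synthetic series for memory testing."},
            []string{"id"},
        )
        prometheus.MustRegister(series)
        for i := 0; i < 100000; i++ {
            series.WithLabelValues(fmt.Sprintf("series_%06d", i)).Set(1)
        }

        http.Handle("/metrics", promhttp.Handler())
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

Prometheus serves Go’s pprof endpoints on its web port, so once it has scraped the target the heap profile can be taken with go tool pprof against http://localhost:9090/debug/pprof/heap.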

From here I can start digging through the code to understand what each bit of usage is. PromParser.Metric, for example, looks to be the length of the full time series name, while the scrapeCache is a constant cost of around 145 bytes per time series, and under getOrCreateWithID there’s a mix of constants, usage per unique label value, usage per unique symbol, and usage per sample label.

The usage under fanoutAppender.commit is from the initial writing of all the series to the WAL, which just hasn’t been GCed yet. One thing missing is chunks, which work out to 192B of memory for 128B of data, a 50% overhead. From here I take various worst-case assumptions.

For example, half of the space in most lists is unused and chunks are practically empty. To simplify, I ignore the number of label names, as there should never be many of those. This works out to about 732B per series, another 32B per label pair, 120B per unique label value, and on top of all that the time series name twice.

Last, but not least, all of that must be doubled given how Go garbage collection works.
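
As a rough sketch of that arithmetic (the function, its parameter names, and the example inputs are mine; the constants are just the approximate figures quoted above, not exact values):

    package main

    import "fmt"

    // estimateHeadBytes is a back-of-the-envelope estimate of head block memory,
    // built from the rough per-series figures above. A sketch, not an exact formula.
    func estimateHeadBytes(numSeries, labelPairsPerSeries, uniqueLabelValues, avgSeriesNameLen int) int {
        perSeries := 732 + // approximate fixed cost per series
            32*labelPairsPerSeries + // per label pair on each series
            2*avgSeriesNameLen // the full series name counted twice
        total := numSeries*perSeries +
            120*uniqueLabelValues // per unique label value across the head block
        return 2 * total // doubled to allow for Go garbage collection
    }

    func main() {
        // Made-up example inputs: 100k series, 5 label pairs each,
        // 100k unique label values, 60-byte series names.
        fmt.Println(estimateHeadBytes(100000, 5, 100000, 60)) // 226400000, i.e. ~226MB
    }

For those entirely invented inputs that comes to roughly 2 × (100,000 × (732 + 160 + 120) + 120 × 100,000) ≈ 226MB for the head block alone.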

Source: robustperception.io

