Predictive CPU isolation of containers at Netflix

Predictive CPU isolation of containers at Netflix

  • June 8, 2019
Table of Contents

Predictive CPU isolation of containers at Netflix

Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. However, the key insight here is that these caches are partially shared among the CPUs, which means that perfect performance isolation of co-hosted containers is not possible. If the container running on the core next to your container suddenly decides to fetch a lot of data from the RAM, it will inevitably result in more cache misses for you (and hence a potential performance degradation).

Traditionally it has been the responsibility of the operating system’s task scheduler to mitigate this performance isolation problem. In Linux, the current mainstream solution is CFS (Completely Fair Scheduler). Its goal is to assign running processes to time slices of the CPU in a “fair” way.

CFS is widely used and therefore well tested and Linux machines around the world run with reasonable performance. So why mess with it? As it turns out, for the large majority of Netflix use cases, its performance is far from optimal.

Titus is Netflix’s container platform. Every month, we run millions of containers on thousands of machines on Titus, serving hundreds of internal applications and customers. These applications range from critical low-latency services powering our customer-facing video streaming service, to batch jobs for encoding or machine learning.

Maintaining performance isolation between these different applications is critical to ensuring a good experience for internal and external customers.

Source: medium.com

Share :
comments powered by Disqus

Related Posts

Dockter: A Docker image builder for researchers

Dockter: A Docker image builder for researchers

Dependency hell is ubiquitous in the world of software for research, and this affects research transparency and reproducibility. Containerization is one solution to this problem, but it creates new challenges for researchers. Docker is gaining popularity in the research community—but using it efficiently requires solid Dockerfile writing skills.

Read More
Anatomy of CVE-2019-5736: A runc container escape!

Anatomy of CVE-2019-5736: A runc container escape!

On Monday, February 11, CVE-2019-5736 was disclosed. This vulnerability is a flaw in runc, which can be exploited to escape Linux containers launched with Docker, containerd, CRI-O, or any other user of runc. But how does it work?

Read More
Containers, Security and Echo chambers

Containers, Security and Echo chambers

There seems to be some confusion around sandboxing containers as of late, mostly because of the recent launch of gvisor. Before I get into the body of this post I would like to make one thing clear. I have no problem with gvisor itself.

Read More