Kubernetes Failure Stories

  • January 20, 2019

I started to compile a list of public failure/horror stories related to Kubernetes. It should make it easier for people tasked with operations to find outage reports to learn from. Since we started with Kubernetes at Zalando in 2016, we have collected many internal postmortems.

Docker bugs (daemon unresponsive, process stuck in pipe wait, etc.) were a major pain point in the beginning, but Docker itself has become more mature and has not bitten us recently. The biggest chunk of problems can be attributed to the nature of distributed systems and ‘cascading failures’: for example, a Kubernetes API server outage should not affect running workloads, but it did. See also our recent CoreDNS incident.

Source: srcco.de


Related Posts

Istio Multicluster

Istio Multicluster is a feature of Istio, the basis of Red Hat OpenShift Service Mesh, that extends the service mesh across multiple Kubernetes or Red Hat OpenShift clusters. The primary goal of this feature is to enable control of services deployed across multiple clusters with a single control plane. The main requirement for Istio multicluster to work is that the pods in the mesh and the Istio control plane can talk to each other.

The Biggest IT Failures of 2018

This year proved once again that IT-related failures “are universally unprejudiced: they happen in every country; to large companies and small; in commercial, nonprofit, and governmental organizations; and without regard to status or reputation.” Below is a review that just scratches the surface of the sundry failures, glitches, and other IT hiccups that made the news in 2018. This year saw a slight reduction in the number of flight cancellations and delays due to computer-related problems as compared with the past three years, especially in the United States.

8 emerging trends in container orchestration

Containerization is now officially mainstream. A quarter of Datadog’s total customer base has adopted Docker and other container technologies, and half of the companies with more than 1,000 hosts have done so. As containers take a more prominent place in the infrastructure landscape, we see our customers adding automation and orchestration to help manage their fleets of ephemeral containers.
