Kubernetes Failure Stories

Kubernetes Failure Stories

  • January 20, 2019
Table of Contents

Kubernetes Failure Stories

I started to compile a list of public failure/horror stories related to Kubernetes. It should make it easier for people tasked with operations to find outage reports to learn from. Since we started with Kubernetes at Zalando in 2016, we collected many internal postmortems.

Docker bugs (daemon unresponsive, process stuck in pipe wait, ..) were a major pain point in the beginning, but Docker itself has become more mature and did not bite us recently. The biggest chunk of problems can be attributed to the nature of distributed systems and ‘cascading failures’, e.g. a Kubernetes API server outage should not affect running workloads, but it did, or see our recent CoreDNS incident.

Source: srcco.de

Share :
comments powered by Disqus

Related Posts

Deployment strategies for the Jaeger Agent

Deployment strategies for the Jaeger Agent

If you’ve been following the evolution of the Kubernetes templates for Jaeger, you might have noticed an important change recently: the Jaeger Agent is not being deployed as a DaemonSet anymore. Instead, instructions are now being provided on how to deploy it as a “Sidecar”. The Agent component was developed to act as a “buffer” between the tracer and the collector.

Read More
PostgreSQL across clouds and on-premises

PostgreSQL across clouds and on-premises

PostgreSQL is a very popular open source relational database. It’s been in active development for over 30 years and has achieved a very high level of reliability and performance, as well as a very robust feature set.

Read More
8 emerging trends in container orchestration

8 emerging trends in container orchestration

Containerization is now officially mainstream. A quarter of Datadog’s total customer base has adopted Docker and other container technologies, and half of the companies with more than 1,000 hosts have done so. As containers take a more prominent place in the infrastructure landscape, we see our customers adding automation and orchestration to help manage their fleets of ephemeral containers.

Read More