Kubernetes Failure Stories

January 20, 2019

Table of Contents

I started to compile a list of public failure/horror stories related to Kubernetes. It should make it easier for people tasked with operations to find outage reports to learn from. Since we started with Kubernetes at Zalando in 2016, we collected many internal postmortems.

Docker bugs (daemon unresponsive, process stuck in pipe wait, ..) were a major pain point in the beginning, but Docker itself has become more mature and did not bite us recently. The biggest chunk of problems can be attributed to the nature of distributed systems and ‘cascading failures’, e.g. a Kubernetes API server outage should not affect running workloads, but it did, or see our recent CoreDNS incident.

Source: srcco.de

Tags :

comments powered by Disqus

Kubernetes Federation V2

With datacenters spread across the globe, users are increasingly looking at ways to spread their applications and services across multiple locales or clusters. This need is driven by multiple use cases: from providing high availability, spreading load across multiple clusters while being resilient to individual cluster failures; to avoiding provider lock-in by using hybrid cloud … With datacenters spread across the globe, users are increasingly looking at ways to spread their applications and services across multiple locales or clusters.

8 emerging trends in container orchestration

Containerization is now officially mainstream. A quarter of Datadog’s total customer base has adopted Docker and other container technologies, and half of the companies with more than 1,000 hosts have done so. As containers take a more prominent place in the infrastructure landscape, we see our customers adding automation and orchestration to help manage their fleets of ephemeral containers.

Kubernetes Failure Stories

Tags :

Share :

Related Posts

Kubernetes Federation V2

8 emerging trends in container orchestration

8 emerging trends in container orchestration