Building a Kubernetes platform at Pinterest

  • October 5, 2019

Over the years, 300 million Pinners have saved more than 200 billion Pins on Pinterest across more than 4 billion boards. To serve this vast user base and content pool, we’ve developed thousands of services, ranging from microservices that use a handful of CPUs to huge monolithic services that occupy a whole VM fleet. There are also many kinds of batch jobs, built on a variety of frameworks, which can be CPU, memory or I/O intensive.

To support these diverse workloads, the infrastructure team at Pinterest faces multiple challenges:

Engineers don’t have a unified experience when launching their workloads. Stateless services, stateful services and batch jobs are deployed and managed by entirely different tech stacks. This has created a steep learning curve for our engineers, as well as huge maintenance and customer support burdens for the infrastructure team.

Having engineers manage their own VM fleets creates a huge maintenance load for the infra team. Simple operations such as an OS or AMI upgrade can take weeks to months. Production workloads are also disrupted during these operations, even though the upgrades are supposed to be transparent to them.

It’s hard to build infrastructure governance tools on top of separate management systems. It’s even more difficult for us to determine who owns which machines and whether they can be safely recycled.

Source: medium.com
