News

Yes, you can run VMs on Kubernetes with KubeVirt

Containers and Kubernetes are awesome technologies that let applications run without the heavy operating system (OS) that a virtual machine (VM) would require. Container-first, cloud-native applications are the future, but not every application is suitable to be cloud-native. KubeVirt allows you to run your virtual machines alongside your containers on a Kubernetes platform.

Telltale: Netflix Application Monitoring Simplified

Our Netflix teams need to quickly detect, diagnose, and remediate problems. The Telltale application health model yields intelligent monitoring and intelligent alerting.

Introducing visx from Airbnb

After 3 years of development, 2.5 years of production use at Airbnb, and a rewrite in TypeScript, we are excited to announce the official 1.0 release of visx (formerly vx). You can find the project on GitHub and browse documentation and examples on airbnb.io. At Airbnb, we made it a goal to unify our visualization stack across the company, and in the process we created a new project that brings together the power of D3 with the joy of React.

How Alibaba Cloud uses Cilium for High-Performance Cloud

A couple of weeks ago, the Alibaba team presented details on the new datapath for the Alibaba Cloud during the SIG Cloud-Provider-Alibaba meeting and also published a blog post with the technical architecture. Guess what, it is all Cilium & eBPF based. Alibaba Cloud is not the first cloud provider to directly embed Cilium.

Engineering For Failure

A set of practical patterns to recover from failures in external services. Not so long ago, our systems were simple: we had one machine, with one process, probably no more than one external datastore, and the entire request lifecycle was processed and handled within this simple world. Our users were also accustomed to a certain SLA standard: a 2-second page load time could have been acceptable a few years ago, but waiting more than a second for an Instagram post is unthinkable nowadays. When systems get more complex, with strict latency requirements and a distributed infrastructure, an uninvited guest creeps into our systems: request failure.

Keeping sync fast with automated performance regression detection

Sync is a hard distributed systems problem, and rewriting the heart of our sync engine on the desktop client was a monumental effort. We’ve previously discussed our efforts to heavily test durability at different layers of the system. Today, we are going to talk about how we ensured the performance of our new sync engine.

Production testing with dark canaries

Back in 2013, one of our large backend services wanted support in Rest.li for dark canaries. The service, at the time, involved duplicating requests from one host machine and sending them to another host machine. This was added via a Python tool to populate the host-to-host mapping in Apache ZooKeeper along with a filter to read this mapping and multiply traffic.
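
To make the idea concrete, here is a minimal sketch of a dark-canary filter. It is not the Rest.li implementation: the function name, the callbacks, and the per-target multiplier are hypothetical, and a plain in-memory dictionary stands in for the host-to-host mapping that the teaser says was stored in Apache ZooKeeper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the host-to-host mapping kept in ZooKeeper:
# source host -> list of (dark host, traffic multiplier).
DarkMapping = Dict[str, List[Tuple[str, int]]]


def dark_canary_filter(
    host: str,
    request: str,
    mapping: DarkMapping,
    handle: Callable[[str], str],
    send_dark: Callable[[str, str], None],
) -> str:
    """Serve the request normally, then mirror it to any mapped dark hosts."""
    # The real response always comes from the normal handler.
    response = handle(request)

    # Duplicate the request to each dark host, repeated `multiplier` times.
    for dark_host, multiplier in mapping.get(host, []):
        for _ in range(multiplier):
            try:
                send_dark(dark_host, request)
            except Exception:
                # Dark traffic is best effort: a failing dark host must
                # never change what the real caller sees.
                pass

    return response
```

In this sketch the dark host's response is discarded and failures on the dark path are swallowed, so the mirrored traffic can exercise a new build without affecting the caller.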
