YES, YOU CAN RUN VMS ON KUBERNETES WITH KUBEVIRT

Containers and Kubernetes are awesome technologies that let applications run without the heavy operating system (OS) that a virtual machine (VM) would require. Container-first, cloud-native applications are the future, but not every application is suitable to be cloud-native. KubeVirt allows you to run your virtual machines alongside your containers on a Kubernetes platform.

Read more

TELLTALE: NETFLIX APPLICATION MONITORING SIMPLIFIED

Our Netflix teams need to quickly detect, diagnose, and remediate problems. The Telltale application health model yields intelligent monitoring and intelligent alerting. Netflix service owners get alerts they can trust with little configuration and no need for constant tuning. When health problems strike, Telltale presents only the most relevant context and suggests possible causes. An alert fires and you get paged in the middle of the night.

Read more

INTRODUCING VISX FROM AIRBNB

After 3 years of development, 2.5 years of production use at Airbnb, and a rewrite in TypeScript, we are excited to announce the official 1.0 release of visx (formerly vx). You can find the project on GitHub and browse documentation and examples on airbnb.io. At Airbnb, we made it a goal to unify our visualization stack across the company, and in the process we created a new project that brings together the power of D3 with the joy of React.

Read more

HOW ALIBABA CLOUD USES CILIUM FOR HIGH-PERFORMANCE CLOUD

A couple of weeks ago, the Alibaba team presented details on the new datapath for the Alibaba Cloud during the SIG Cloud-Provider-Alibaba meeting and also published a blog post with the technical architecture. Guess what, it is all Cilium & eBPF based. Alibaba Cloud is not the first cloud provider to directly embed Cilium. Recently, Google announced the availability of Dataplane V2 based on Cilium & eBPF for GKE and Anthos.

Read more

HOW TO PERFORM A CNI LIVE MIGRATION FROM FLANNEL+CALICO TO CILIUM

Container Network Interface (CNI) is a big topic, but in short, CNI is a set of specifications that define an interface used by container orchestrators to set up networking between containers. In the Kubernetes space, the Kubelet is responsible for calling the CNI installed on the cluster so Pods are attached to the Kubernetes cluster network during creation, and their resources are properly released during deletion. CNIs can also be responsible for more advanced features than just setting up routes in the cluster, such as network policy enforcement, encryption, load balancing, etc.

Read more

WHY WE CHOSE A DISTRIBUTED SQL DATABASE TO COMPLEMENT MYSQL

VIPKid chose TiDB to manage its high data volume, highly concurrent write application. Learn how TiDB excels in that scenario, along with multidimensional queries, data life cycle management, and real-time analytics. We use MySQL as our backend database. But as our application data grew rapidly, standalone MySQL’s storage capacity became a bottleneck, and it could no longer meet our application requirements. We tried MySQL sharding on our core applications, but it was difficult to run multi-dimensional queries on sharded data.

Read more

ENGINEERING FOR FAILURE

A set of practical patterns to recover from failures in external services. Not so long ago, our systems were simple: we had one machine, with one process, probably no more than one external datastore, and the entire request lifecycle was processed and handled within this simple world. Our users were also accustomed to a certain SLA standard: a 2-second page load time could have been acceptable a few years ago, but waiting more than a second for an Instagram post is unthinkable nowadays.

Read more

HOW AWS CLOUD CUSTOMERS ARE USING LOCAL ZONES FOR EDGE COMPUTING

Amazon Web Services says users are tapping Local Zones to run hybrid environments and support latency-intensive tasks like game rendering. Source: datacenterfrontier.com

KEEPING SYNC FAST WITH AUTOMATED PERFORMANCE REGRESSION DETECTION

Sync is a hard distributed systems problem and re-writing the heart of our sync engine on the desktop client was a monumental effort. We’ve previously discussed our efforts to heavily test durability at different layers of the system. Today, we are going to talk about how we ensured the performance of our new sync engine. In particular, we describe a performance regression testing framework we call Apogee. Apogee helps us find unanticipated performance issues in the development process and safeguard against bugs that we would otherwise release to our users.

Read more

PRODUCTION TESTING WITH DARK CANARIES

Back in 2013, one of our large backend services wanted support in Rest.li for dark canaries. The service, at the time, involved duplicating requests from one host machine and sending them to another host machine. This was added via a Python tool to populate the host-to-host mapping in Apache ZooKeeper along with a filter to read this mapping and multiply traffic. As operational complexity grew (due to additional data centers, dark canaries being used in midtier and even frontend services, and dynamic scale up-down of instances), this became more complex to maintain.

Read more