How we 30x’d our Node parallelism

How we 30x’d our Node parallelism

  • January 23, 2020
Table of Contents

How we 30x’d our Node parallelism

What’s the best way to safely increase parallelism in a production Node service? That’s a question my team needed to answer a couple of months ago. We were running 4,000 Node containers (or ‘workers’) for our bank integration service.

The service was originally designed such that each worker would process only a single request at a time. This design lessened the impact of integrations that accidentally blocked the event loop, and allowed us to ignore the variability in resource usage across different integrations. But since our total capacity was capped at 4,000 concurrent requests, the system did not gracefully scale.

Most requests were network-bound, so we could improve our capacity and costs if we could just figure out how to increase parallelism safely. In our research, we couldn’t find a good playbook for going from ‘no parallelism’ to ‘lots of parallelism’ in a Node service. So we put together our own plan, which relied on careful planning, good tooling and observability, and a healthy dose of debugging.

In the end, we were able to 30x our parallelism, which equated to a cost savings of about $300k annually. This post will outline how we increased the performance and efficiency of our Node workers and describe the lessons that we learned in the process.

Source: plaid.com

Share :
comments powered by Disqus

Related Posts

Lyft’s Journey through Mobile Networking

Lyft’s Journey through Mobile Networking

In 5 years, the number of endpoints consumed by Lyft’s mobile apps grew to over 500, and the size of our mobile engineering team increased by more than 15x. To scale with this growth, our infrastructure had to evolve dramatically to utilize new advances in modern networking in order to continue to provide benefits for our users. This post describes the journey through the evolution of Lyft’s mobile networking: how it’s changed, what we’ve learned, and why it’s important for us as a growing business.

Read More
The Biggest IT Failures of 2018

The Biggest IT Failures of 2018

This year provedonce againthat IT-related failures “are universally unprejudiced: they happen in every country; to large companies and small; in commercial, nonprofit, and governmental organizations; and without regard to status or reputation.” Below is a review that just scratches the surface of the sundry failures, glitches, and other IT hiccups that made the news in 2018. This year saw a slight reduction in the number of flight cancellations and delays due to computer-related problems as compared with the past three years, especially in the United States.

Read More