How we 30x’d our Node parallelism

January 23, 2020

What’s the best way to safely increase parallelism in a production Node service? That’s a question my team needed to answer a couple of months ago. We were running 4,000 Node containers (or ‘workers’) for our bank integration service.

The service was originally designed such that each worker would process only a single request at a time. This design lessened the impact of integrations that accidentally blocked the event loop, and allowed us to ignore the variability in resource usage across different integrations. But since our total capacity was capped at 4,000 concurrent requests, the system did not gracefully scale.

Most requests were network-bound, so we could improve our capacity and reduce our costs if we could just figure out how to increase parallelism safely. In our research, we couldn’t find a good playbook for going from ‘no parallelism’ to ‘lots of parallelism’ in a Node service. So we put together our own plan, which relied on careful planning, good tooling and observability, and a healthy dose of debugging.
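The core of ‘increasing parallelism safely’ is letting each worker handle several network-bound requests at once while keeping the in-flight count bounded. As a minimal sketch (the `Semaphore` class below is a hypothetical illustration, not Plaid’s actual implementation), a worker could gate request handling through a small concurrency limiter:

```javascript
// Minimal concurrency limiter: caps the number of async tasks
// running at once, queueing the rest until a slot frees up.
class Semaphore {
  constructor(max) {
    this.max = max;    // maximum concurrent tasks per worker
    this.active = 0;   // tasks currently holding a slot
    this.queue = [];   // resolvers for tasks waiting on a slot
  }

  async run(task) {
    if (this.active >= this.max) {
      // No free slot: park until a finishing task hands us its slot.
      await new Promise((resolve) => this.queue.push(resolve));
    } else {
      this.active += 1;
    }
    try {
      return await task();
    } finally {
      const next = this.queue.shift();
      if (next) {
        next(); // transfer our slot directly to the next waiter
      } else {
        this.active -= 1;
      }
    }
  }
}
```

With a limiter like this, raising parallelism is a one-line config change (the `max` value), and the cap preserves the original design’s protection against any one integration monopolizing a worker.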

In the end, we were able to 30x our parallelism, which equated to a cost savings of about $300k annually. This post will outline how we increased the performance and efficiency of our Node workers and describe the lessons that we learned in the process.

Source: plaid.com
