TL;DR: Leverage the backpressure-transform npm package to prevent memory growth in your NodeJS Transforms!
I wanted to make my first post something very practical and hands-on, just to get some momentum started on writing. With that said, I figured I’d dive a little into NodeJS Streams, specifically how backpressure should be handled in Transforms. Without handling backpressure, you can easily crash your NodeJS application due to unbounded memory growth.
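To make that failure mode concrete, here’s a minimal sketch of the anti-pattern (the million-fold fan-out, the slow sink, and all the names here are contrived for illustration). Because the return value of this.push() is ignored, every output piles up in the readable buffer within a single _transform() call, long before backpressure can kick in:

const { Transform, Writable } = require('stream');

// Anti-pattern: the false return from push() is ignored, so every output
// lands in the Transform's readable buffer inside one _transform() call.
const exploder = new Transform({
  readableObjectMode: true,
  transform(chunk, enc, callback) {
    for (let i = 0; i < 1000000; i++) {
      this.push(`${chunk}-${i}`); // return value ignored -- no backpressure
    }
    callback();
  },
});

// A deliberately slow consumer; the buffered outputs above sit in memory
// the whole time they trickle through.
const slowSink = new Writable({
  objectMode: true,
  write(data, enc, callback) {
    setTimeout(callback, 10);
  },
});

exploder.pipe(slowSink);
exploder.write('big-input');

Feed this a large enough fan-out and you can watch the process’s memory climb while the sink slowly works through the backlog.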
I’ve spent the last five years deep in NodeJS, building petabyte-scale data streaming software for Cribl, so needless to say, I’ve spent my fair share of time debugging NodeJS Streams, whether that’s stuck data, race conditions, non-performant code, memory leaks, etc. A lot of the time, debugging a NodeJS Stream results in me going directly to the NodeJS code base and reading how the code is expected to work.
If you’re unfamiliar with NodeJS Streams, I’d suggest reading up on them in the official docs here. There’s also a great guide from NodeJS on handling backpressure here. That said, neither covers how to properly handle backpressure in custom-built Transforms.
You can find the following in the NodeJS documentation:
“Care must be taken when using Transform streams in that data written to the stream can cause the Writable side of the stream to become paused if the output on the Readable side is not consumed.”
Despite the warning, NodeJS does not actually help you figure out the best way to handle backpressure inside a custom Transform. Once you dig a little into the actual Transform code, it becomes clear how to ensure backpressure is respected. Here is an example of handling backpressure properly in a Transform:
_transform(data, enc, callback) {
  // do some work / parse the data, keeping state of where we're at, etc.
  // (this.parse() is a stand-in for whatever work your Transform does)
  const events = this.parse(data);
  let i = 0;

  const continueTransforming = () => {
    while (i < events.length) {
      if (!this.push(events[i++])) {
        // will get called again when the reader can consume more data
        this.once('data', continueTransforming);
        return;
      }
    }
    // all done with this chunk; ready for the next _transform() call
    callback();
  };

  continueTransforming();
}
This works because 'data' is only emitted when someone downstream is consuming from the Transform's readable buffer that you're this.push()-ing to. So whenever the downstream has capacity to pull off of this buffer, you should be able to start writing back to the buffer.
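To see the resumption in action, here’s a sketch of wiring a Transform built around the _transform above into a pipeline. LineSplitter and the slow sink are hypothetical stand-ins, not part of any package:

const { Transform, Writable } = require('stream');

// LineSplitter is hypothetical: a Transform whose _transform is the
// backpressure-aware one shown above.
class LineSplitter extends Transform { /* _transform from above */ }

const splitter = new LineSplitter({ readableObjectMode: true });

// A slow consumer to force backpressure.
const slowSink = new Writable({
  objectMode: true,
  write(event, enc, callback) {
    setTimeout(callback, 100);
  },
});

// As slowSink consumes, the pipe pulls from splitter's readable buffer,
// 'data' fires on splitter, and continueTransforming resumes pushing
// exactly where it left off.
process.stdin.pipe(splitter).pipe(slowSink);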
The flaw with listening for 'drain' on the downstream (other than reaching into the internals of Node) is that you’re also relying on your Transform's own buffer having been drained, and there’s no guarantee that it has been by the time the downstream emits 'drain'.
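For contrast, here’s roughly what that drain-based approach looks like. This is a sketch of the pattern warned against above, and it assumes you’ve somehow gotten a handle on the downstream writable (e.g., by digging into internals like _readableState.pipes, whose shape varies across Node versions):

_transform(data, enc, callback) {
  if (!this.push(data)) {
    // 'drain' only tells us the downstream's write buffer has emptied;
    // our own readable buffer may still be full when it fires, so
    // resuming here can keep growing it.
    this.downstream.once('drain', callback); // this.downstream: assumed wiring
    return;
  }
  callback();
}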
As part of this blog, I decided to also build out an npm package that implements this backpressure behavior to make it much easier for folks to consume! Hopefully this package helps you produce more reliable software.