Utilising streams to optimise Lambda costs
At CreateTOTALLY, AWS Lambda is a centerpiece of our platform. From sending AWS Cognito invitation emails using SES to processing large automated videos uploaded to an S3 bucket, we use serverless heavily.
Lambdas should be short-lived, low-memory functions. Increasing a function's memory size gives it a proportional increase in CPU, which also means an increase in cost. And no one wants that.
So what happens when you want to, for example, extract a 5GB zip file to individual files in S3?
One solution is to extract the files to /tmp, but then you'd hit the default 512MB storage limit.
Another is to extract the files in memory using something like adm-zip, but then you'd quickly hit the memory limit.
Streams to the rescue
Streams solve this problem by removing the memory limitation of traditional approaches. By utilising a stream we can process a file chunk by chunk rather than loading the entire thing into memory. Only a small buffer is held at any one time, so even with that 5GB file we won't run out of memory.
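To make the idea concrete, here is a minimal sketch of chunked processing. It doesn't touch a real file: `fakeLargeFile` is a hypothetical stand-in that yields fixed-size buffers, and `countBytes` is an illustrative helper, not part of any library. With a real file you would pass `fs.createReadStream(path)` instead, which by default hands you chunks of up to 64KB.

```javascript
const { Readable } = require('node:stream')

// Hypothetical source: yields `totalChunks` buffers of `chunkSize` bytes each,
// standing in for a large file read from disk or S3.
function* fakeLargeFile(totalChunks, chunkSize) {
  for (let i = 0; i < totalChunks; i++) {
    yield Buffer.alloc(chunkSize, 1)
  }
}

// Consume a readable stream one chunk at a time. Each chunk can be
// garbage-collected once the loop moves on to the next one.
async function countBytes(stream) {
  let total = 0
  let largestChunk = 0
  for await (const chunk of stream) {
    total += chunk.length
    largestChunk = Math.max(largestChunk, chunk.length)
  }
  return { total, largestChunk }
}

// A 64MB "file" delivered as 1,024 chunks of 64KB
const source = Readable.from(fakeLargeFile(1024, 64 * 1024))
countBytes(source).then(({ total, largestChunk }) => {
  console.log(`read ${total} bytes, largest chunk was ${largestChunk} bytes`)
})
```

The point to notice is that the total size of the data only affects how long the loop runs, not how much memory a single iteration needs.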
Take a common problem for us: zipping large video files. Each video could run to many gigabytes in size, but our lambdas might have a memory limit of just 512MB (depending on how the function is configured).
const fs = require('fs')
const archiver = require('archiver')
// Create read streams for two large videos
const input1 = fs.createReadStream('./files/large-video1.mp4') // 1GB video
const input2 = fs.createReadStream('./files/large-video2.mp4') // 1GB video
// Create a write stream to our destination zip
const output = fs.createWriteStream('./output.zip')
// Initialize the archiver, which produces the zip as a stream
const archive = archiver('zip', {
  zlib: { level: 9 }, // Sets the compression level.
})
// Fail loudly if reading or compressing goes wrong
archive.on('error', (err) => {
  throw err
})
// Pipe the zip data into the output file as it is produced
archive.pipe(output)
// Add our video files to the zip
archive.append(input1, { name: 'large-video1.mp4' })
archive.append(input2, { name: 'large-video2.mp4' })
// Once everything is set up, we finally begin the zipping process.
archive.finalize()
Despite processing 2GB of video, this script will run with the V8 heap capped at just 5MB :D
node --max-old-space-size=5 index.js
Scaling and costs
Let's compare the cost of streaming to non-streaming approaches. In our example above let's stick to the assumption we are using a lambda with 512MB of memory and processing 2GB of data.
Streaming:
512MB -> $0.0000000083 per ms
Non-streaming:
2048MB -> $0.0000000333 per ms
Here the non-streaming approach would be roughly 4x more expensive for the same run time (333/83 ≈ 4).
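The comparison can be worked through in a few lines. This is a rough sketch, assuming the two figures above are Lambda's per-millisecond prices and that both approaches run for the same length of time. The one-minute duration and the `invocationCost` helper are hypothetical, chosen only for illustration.

```javascript
// Prices quoted above (USD), assumed to be per millisecond,
// keyed by memory size in MB.
const PRICE_PER_MS = { 512: 0.0000000083, 2048: 0.0000000333 }

// Cost of one invocation, assuming cost = per-ms price * billed duration.
function invocationCost(memoryMb, durationMs) {
  return PRICE_PER_MS[memoryMb] * durationMs
}

const durationMs = 60000 // hypothetical: a one-minute zip job
const streaming = invocationCost(512, durationMs)
const nonStreaming = invocationCost(2048, durationMs)
const ratio = nonStreaming / streaming

console.log(`streaming:     $${streaming.toFixed(6)}`)
console.log(`non-streaming: $${nonStreaming.toFixed(6)}`)
console.log(`ratio:         ${ratio.toFixed(2)}x`)
```

Because the duration cancels out, the ratio stays the same however long the job runs. Only the absolute saving grows with run time and invocation count.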
But most importantly, streaming scales. It doesn't matter whether our video is 1MB or 10GB: the memory limit will never be hit.
Now that helps me sleep at night.