AWS Lambda storage options

If you're building a serverless app, you're most likely using AWS Lambda.

Lambda functions are (by design) ephemeral, which means that their execution environments exist only briefly when the function is invoked. This presents an interesting challenge:

What if I need access to storage in my Lambda function?

The seemingly obvious answer to this question is "use a database".

As with most obvious answers, this one is not entirely correct. For instance, storing third-party libraries in DynamoDB would surely be an interesting idea, but not exactly practical.

Good luck storing node_modules in DynamoDB, by the way.

Other potential use cases include machine learning models, image processing, the output of your business-specific compute operation and more.

The goal of this post is to give you an overview of the storage options available when building serverless applications with AWS Lambda, how they differ, and their common use cases.

Amazon S3

Amazon S3 is a widely popular object storage service, offering high availability and eleven nines (99.999999999%) of durability. It's a great choice for storing unstructured static assets, such as images, videos, documents, etc.

S3 is a common element of serverless architecture diagrams. To quote the AWS docs:

S3 has important event integrations for serverless developers. It has a native integration with Lambda, which allows you to invoke a function in response to an S3 event. This can provide a scalable way to trigger application workflows when objects are created or deleted in S3.

Not only can you invoke a Lambda function whenever an object is placed into an S3 bucket, you can also read from and write to S3 from within the function. This is often useful because Lambda can be invoked with a payload of at most 6 MB synchronously and 256 KB asynchronously. Should you need a larger dataset, consider fetching it from S3.
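As a minimal sketch of that flow, the Python handler below reads the bucket and key from each record of an S3 event notification and fetches the object body. It assumes the standard S3 notification event shape and boto3's get_object call; the s3_client parameter and the returned summary format are illustrative choices, not part of any AWS API.

```python
import urllib.parse


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (a space becomes "+").
        key = urllib.parse.unquote_plus(s3_info["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event, context, s3_client=None):
    """Hypothetical Lambda entry point: report the size of each new object."""
    if s3_client is None:
        # Imported lazily so the parsing logic can be exercised without boto3.
        import boto3

        s3_client = boto3.client("s3")

    results = []
    for bucket, key in extract_s3_objects(event):
        response = s3_client.get_object(Bucket=bucket, Key=key)
        body = response["Body"].read()
        results.append({"bucket": bucket, "key": key, "bytes": len(body)})
    return results
```

Passing the client as an optional argument keeps the handler easy to test locally with a stand-in object; in Lambda itself only event and context are supplied.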

Storing data in S3 has an additional benefit, given how well it integrates with other AWS services. For instance, you can use Amazon Athena to query your S3 data, or Amazon Rekognition to analyze it. Additionally, you can use AWS Glue to perform extract, transform, and load (ETL) operations. To create ad hoc visualizations and business analysis reports, Amazon QuickSight can connect to your S3 buckets and produce interactive dashboards.

Check out the S3 FAQ to learn more.

Temporary storage, also known as /tmp

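Another interesting storage option available for AWS Lambda functions is the execution environment's file system, available at /tmp. There are several factors to consider before using /tmp as a storage option: its size is limited (512 MB in the comparison table below), its contents are ephemeral, and it is only accessible to the function itself.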

In short, /tmp works well for ephemeral data that can be reused across invocations handled by the same execution environment, with the added benefit of fast I/O throughput. For example, you may want to fetch a machine learning model, store it in /tmp and use it in your Lambda function. That way you won't need to fetch it from S3 on every invocation.
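The caching pattern described above could be sketched roughly as follows. The get_cached_file helper and its fetch callback are hypothetical names introduced for illustration; in a real function the callback might wrap an S3 download.

```python
import os
import tempfile


def get_cached_file(name, fetch, cache_dir="/tmp"):
    """Return a local path for `name`, calling `fetch(path)` only on a cache miss."""
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        # A warm execution environment already holds the file.
        return path

    # Download to a temporary file first, then rename it into place,
    # so an interrupted download never leaves a partial file behind.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    os.close(fd)
    try:
        fetch(tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return path
```

With boto3, the callback could be something like lambda p: s3.download_file("my-bucket", "model.bin", p), where the bucket and key are placeholders. Since a fresh execution environment starts with an empty /tmp, the first invocation there still pays for the download.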

Amazon EFS for Lambda

Speaking of file systems, AWS Lambda also supports Amazon EFS, a fully managed, elastic, shared file system that integrates with other AWS services.

The biggest difference from the aforementioned /tmp is that EFS is durable storage that offers high availability.

You may wonder whether mounting a file system increases the cold start time. According to AWS:

The Lambda service mounts EFS file systems when the execution environment is prepared. This happens in parallel with other initialization operations so typically does not impact cold start latency. If the execution environment is warm from previous invocations, the mount is already prepared. To use EFS, your Lambda function must be in the same VPC as the file system.

Potential use cases for EFS include ingesting/writing large files durably, for instance large zip archives (e.g. machine learning models). Since EFS is a file system, you can append to existing files (unlike S3 where a new version of a whole object gets created).
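To illustrate the append case, here is a minimal sketch that adds one JSON line per invocation to a file on the mounted file system. The mount path /mnt/data and the file layout are assumptions made for this example; the actual path is whatever local mount path you configure on the function.

```python
import json
import os


def append_record(record, log_path):
    """Append one JSON line to a file, creating parent directories if needed."""
    directory = os.path.dirname(log_path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    # Mode "a" appends to the existing file instead of rewriting it.
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def handler(event, context, mount_path="/mnt/data"):
    """Hypothetical Lambda entry point; "/mnt/data" is an assumed EFS mount path."""
    log_path = os.path.join(mount_path, "events", "log.jsonl")
    request_id = getattr(context, "aws_request_id", None)
    append_record({"request_id": request_id, "event": event}, log_path)
    return log_path
```

This sketch ignores coordination between concurrent writers; if several execution environments append to the same file at once, you would likely want one file per writer or some form of locking.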

Lambda layers

Insert Shrek onion quote here

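Lambda functions can (and often do) use additional libraries as a part of the deployment package (after all, who can live without node_modules?). Layers let you package those libraries separately from the function code. Each function can have up to 5 layers, which count toward the deployment package size limits: 50 MB zipped for direct upload, and 250 MB unzipped for the function and all of its layers combined.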

Since layers are not temporary, they are not available in /tmp. Instead, their contents are extracted to the /opt directory. There's an added benefit of using layers: they can be shared with other AWS accounts (you may want to read about the benefits of using multiple AWS accounts).
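Because layer contents end up under /opt, a function can read files shipped in a layer like any other local file. The sketch below looks up such a file and falls back to a default when it is missing; the helper names and the example file path are hypothetical.

```python
from pathlib import Path


def find_layer_file(relative_path, layer_root="/opt"):
    """Return the Path of a file shipped in a layer, or None if it is absent."""
    candidate = Path(layer_root) / relative_path
    return candidate if candidate.is_file() else None


def load_shared_text(relative_path, default="", layer_root="/opt"):
    """Read a text file from a layer, falling back to `default` when missing."""
    path = find_layer_file(relative_path, layer_root)
    if path is None:
        return default
    return path.read_text(encoding="utf-8")
```

For example, load_shared_text("config/shared.json", default="{}") would return the contents of /opt/config/shared.json if some layer provides it. The layer_root parameter exists only so the helper can be exercised outside Lambda.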

Using Lambda layers does not incur any additional costs.

Read more about layers in "Using Lambda layers to simplify your development process" on the AWS Compute Blog.

Comparing the different data storage options

| | Amazon S3 | /tmp | Lambda Layers | Amazon EFS |
|---|---|---|---|---|
| Maximum size | Elastic | 512 MB | 50 MB (direct upload; larger if from S3) | Elastic |
| Persistence | Durable | Ephemeral | Durable | Durable |
| Content | Dynamic | Dynamic | Static | Dynamic |
| Storage type | Object | File system | Archive | File system |
| Lambda event source integration | Native | N/A | N/A | N/A |
| Operations supported | Atomic with versioning | Any file system operation | Immutable | Any file system operation |
| Object tagging | Y | N | N | N |
| Object metadata | Y | N | N | N |
| Pricing model | Storage + requests + data transfer | Included in Lambda | Included in Lambda | Storage + data transfer + throughput |
| Sharing/permissions model | IAM | Function-only | IAM | IAM + NFS |
| Source for AWS Glue | Y | N | N | N |
| Source for Amazon QuickSight | Y | N | N | N |
| Relative data access speed from Lambda | Fast | Fastest | Fastest | Very fast |

Source: https://aws.amazon.com/blogs/compute/choosing-between-aws-lambda-data-storage-options-in-web-apps/
