AWS Lambda storage options

If you're building a serverless app, you're most likely using AWS Lambda.

Lambda functions are ephemeral by design: their execution environments exist only briefly, around the time the function is invoked. That presents an interesting challenge:

What if I need access to storage in my Lambda function?

The seemingly obvious answer to this question is "use a database".

As with most obvious answers, this one is not entirely correct. For instance, storing third-party libraries in DynamoDB would surely be an interesting idea, but not exactly practical.

Good luck storing node_modules in DynamoDB, by the way.

Other potential use cases include machine learning models, image processing, the output of your business-specific compute operation and more.

The goal of this post is to give you an overview of the different storage options available to you when building serverless applications with AWS Lambda, their differences, and common use cases.

Amazon S3

Amazon S3 is a widely used object storage service offering high availability and 11 nines (99.999999999%) of durability. It's a great choice for storing unstructured static assets such as images, videos, and documents.

S3 is a common element of serverless architecture diagrams. To quote the AWS docs:

S3 has important event integrations for serverless developers. It has a native integration with Lambda, which allows you to invoke a function in response to an S3 event. This can provide a scalable way to trigger application workflows when objects are created or deleted in S3.

Not only can you invoke a Lambda function whenever an object is placed into an S3 bucket, you can also read from and write to S3 during a function invocation. This is often useful because Lambda invocation payloads are limited to 6 MB for synchronous invocations and 256 KB for asynchronous ones. Should you need a larger dataset, consider fetching it from S3.
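To make the S3 trigger a bit more concrete, here is a minimal sketch of how a function might pull the bucket name and object key out of an S3 event notification. The function names (extract_s3_objects, handler) are made up for illustration, and the handler only lists the objects it was notified about; a real function would go on to fetch them with an S3 client such as boto3.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in the event (spaces become '+').
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    # Hypothetical handler: a real function would download each object
    # with an S3 client here instead of just returning its location.
    return [f"s3://{bucket}/{key}" for bucket, key in extract_s3_objects(event)]
```

Because the event only carries a reference to the object, the payload limits mentioned above don't apply to the object itself: the function reads it from S3 on its own.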

Storing data in S3 has an additional benefit, given how well it integrates with other AWS services. For instance, you can use Amazon Athena to query your S3 data, or Amazon Rekognition to analyze it. Additionally, you can use AWS Glue to perform extract, transform, and load (ETL) operations. To create ad hoc visualizations and business analysis reports, Amazon QuickSight can connect to your S3 buckets and produce interactive dashboards.

Check out the S3 FAQ to learn more.

Temporary storage, also known as /tmp

Another interesting storage option available for AWS Lambda functions is its execution environment file system, available at /tmp. There are multiple factors to consider before using /tmp as a storage option:

By default it has a size of 512 MB. EDIT: Justin Plock pointed out that you can configure up to 10 GB of ephemeral storage in /tmp.
Because of the way Lambda is designed, the same execution environment may be reused by multiple invocations to optimize performance, so files written to /tmp can still be there on the next invocation.
Each new execution environment starts with an empty /tmp directory.

In short, /tmp works well for ephemeral data that can be reused across invocations in the same execution environment, with the added benefit of fast I/O throughput. As an example, you may want to fetch a machine learning model, store it in /tmp, and use it in your Lambda function. That way you won't need to fetch it from S3 on every invocation, only when a new execution environment starts.
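The caching idea can be sketched roughly like this. Both get_cached_file and the fetch callback are hypothetical names: fetch stands in for whatever writes the file to a given path (for example, a download from S3), and is only called when the file is not in /tmp yet.

```python
import os

CACHE_DIR = "/tmp"  # Lambda's writable ephemeral storage


def get_cached_file(name, fetch, cache_dir=CACHE_DIR):
    """Return the local path of `name`, calling `fetch(path)` only on a cache miss."""
    path = os.path.join(cache_dir, name)
    if not os.path.exists(path):
        # Write under a temporary name first so an interrupted fetch
        # never leaves a half-written file behind under the final name.
        partial = path + ".partial"
        fetch(partial)
        os.replace(partial, path)
    return path
```

On a warm execution environment the file is already there and the function returns immediately; on a fresh one /tmp is empty, so the fetch runs once.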

Amazon EFS for Lambda

Speaking of file systems: AWS Lambda supports Amazon EFS, a fully managed, elastic, shared file system that integrates with other AWS services.

The biggest difference from the aforementioned /tmp is that EFS is durable storage that offers high availability.

You may wonder whether mounting a file system increases the cold start time. According to AWS:

The Lambda service mounts EFS file systems when the execution environment is prepared. This happens in parallel with other initialization operations so typically does not impact cold start latency. If the execution environment is warm from previous invocations, the mount is already prepared. To use EFS, your Lambda function must be in the same VPC as the file system.

Potential use cases for EFS include durably ingesting or writing large files, for instance large zip archives (e.g. machine learning models). Since EFS is a file system, you can append to existing files (unlike S3, where a new version of the whole object gets created).
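As a rough illustration of the append case, the sketch below adds one JSON line per call to a file on a mounted file system. The path /mnt/data/events.jsonl and the function name append_event are assumptions for this example (Lambda requires the local mount path to start with /mnt/, the rest is made up).

```python
import json
import os

# Hypothetical mount path; EFS mounts for Lambda live under /mnt/.
EFS_LOG_PATH = "/mnt/data/events.jsonl"


def append_event(record, path=EFS_LOG_PATH):
    """Append one JSON line to a file; existing content is left untouched."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Keep in mind that many function instances can share the same file system, so concurrent writers to a single file may need some coordination (file locking, or one file per writer).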

Lambda layers

Insert Shrek onion quote here

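Lambda functions can (and often do) use additional libraries as a part of the deployment package (after all, who can live without node_modules?). Layers let you package those libraries separately from your function code. Each function can have up to 5 layers, and the function together with all of its layers counts toward the deployment package limits: 50 MB zipped for direct upload and 250 MB unzipped.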

Since layers are not temporary, they are not available in /tmp. Instead, their contents are extracted to the /opt directory. There's an added benefit of using layers: they can be shared with other AWS accounts (you may want to read about the benefits of using multiple AWS accounts).

Using Lambda layers does not incur any additional costs.

Comparing the different data storage options

	Amazon S3	/tmp	Lambda Layers	Amazon EFS
Maximum size	Elastic	512 MB (default; configurable up to 10 GB)	50 MB (direct upload; larger if from S3)	Elastic
Persistence	Durable	Ephemeral	Durable	Durable
Content	Dynamic	Dynamic	Static	Dynamic
Storage type	Object	File system	Archive	File system
Lambda event source integration	Native	N/A	N/A	N/A
Operations supported	Atomic with versioning	Any file system operation	Immutable	Any file system operation
Object tagging	Y	N	N	N
Object metadata	Y	N	N	N
Pricing model	Storage + requests + data transfer	Included in Lambda	Included in Lambda	Storage + data transfer + throughput
Sharing/permissions model	IAM	Function-only	IAM	IAM + NFS
Source for AWS Glue	Y	N	N	N
Source for Amazon QuickSight	Y	N	N	N
Relative data access speed from Lambda	Fast	Fastest	Fastest	Very fast

Source: https://aws.amazon.com/blogs/compute/choosing-between-aws-lambda-data-storage-options-in-web-apps/