Building a simple AWS Serverless app to serve images
Introduction
In this short article I will show you how to use Terraform to build a simple serverless app on AWS that uploads and updates images.
The basic idea is to build an app that uploads a set of images with some metadata. The app must also be able to update images and keep every version of them.
Following best practices, I have decided to build the infrastructure using Terraform modules. Those modules create the whole AWS infrastructure and also upload the Lambda function code.
The goal of this article is to provide a template that you can modify and use in your own project. All source code is here.
Services Used
The following AWS services were used to build the app:
- API Gateway is an AWS service for creating, publishing, maintaining, monitoring, and securing REST and WebSocket APIs at any scale.
- Lambda is a compute service that lets you run code without provisioning or managing servers. AWS Lambda executes your code only when needed and scales automatically, from a few requests per day to thousands per second.
- DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. DynamoDB lets you offload the administrative burdens of operating and scaling a distributed database so that you don’t have to worry about hardware provisioning, setup and configuration, replication, software patching, or cluster scaling.
- S3 has a simple web services interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web.
Architecture
RestAPI
An AWS API Gateway service was created to host the REST API and route traffic to the Lambda functions. The API contains just one proxy resource with two methods, each of which triggers its own Lambda function:
- Method PUT: Updates an existing image, passing the new image in the body, the new metadata in headers and the image id as a query parameter. It returns the image id, the new image url and the version.
- Method POST: Creates a new image, passing the image in the body, the metadata in headers and the image name as a query parameter. It returns the image url, id and version.
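As a rough illustration of the POST method described above, a handler could pull its three inputs out of the API Gateway proxy event as sketched here. The query parameter name `name`, the `x-meta-` header prefix and the helper name `parseUploadRequest` are assumptions made for this sketch. The real contract is the one defined in the OpenAPI specification.

```javascript
'use strict';

// Hypothetical helper: extracts the image, its name and its metadata
// from an API Gateway Lambda proxy event. The parameter and header
// names used here are illustrative assumptions, not the app's contract.
function parseUploadRequest(event) {
  const query = event.queryStringParameters || {};
  const headers = event.headers || {};

  if (!query.name) {
    throw new Error('Missing "name" query parameter');
  }
  if (!event.body) {
    throw new Error('Missing image body');
  }

  // Binary payloads arrive base64-encoded when isBase64Encoded is true.
  const encoding = event.isBase64Encoded ? 'base64' : 'utf8';
  const image = Buffer.from(event.body, encoding);

  // Collect metadata from the headers that start with the assumed prefix.
  const prefix = 'x-meta-';
  const metadata = {};
  for (const [key, value] of Object.entries(headers)) {
    const lower = key.toLowerCase();
    if (lower.startsWith(prefix)) {
      metadata[lower.slice(prefix.length)] = value;
    }
  }

  return { name: query.name, image, metadata };
}

// Usage with a trimmed-down proxy event.
const request = parseUploadRequest({
  queryStringParameters: { name: 'test5.jpg' },
  headers: { 'X-Meta-Author': 'Matias Zilli', 'X-Meta-Category': 'cats' },
  body: Buffer.from('fake image bytes').toString('base64'),
  isBase64Encoded: true,
});
console.log(request.name, request.metadata, request.image.length);
```

The PUT method could reuse the same helper, reading an image id from the query string instead of a name.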
See more details in the OpenAPI specification here.
Metadata storage
All the image metadata, including the version tag, is stored in a DynamoDB table with a composite primary key. This type of key is composed of two attributes: the partition key (“id” in this case) and the sort key (“version”).
Example Dynamo item:
{
  "Author": "Matias Zilli",
  "Bucket": "imagesproject",
  "Category": "cats",
  "Date": 1584453887564,
  "Filename": "1b2d32b0-25bb-41d4-ba74-9bb14f5c8369-test5.jpg",
  "id": "1b2d32b0-25bb-41d4-ba74-9bb14f5c8369",
  "Version": "RW3GW7UqujewJL_o1We4mMRhT6I3Y66j"
}
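To make the composite key more concrete, here is a minimal sketch of how a Lambda function might assemble such an item before writing it. The helper name `buildPutParams` and the table name `images` are hypothetical. The `Filename` pattern (id, a dash, then the image name) and the use of an S3 version id as the `Version` value are inferred from the example item, not confirmed by the source.

```javascript
'use strict';

// Hypothetical helper: builds the parameters for a DynamoDB put call.
// Attribute names follow the example item; the table name is assumed.
function buildPutParams(tableName, image) {
  return {
    TableName: tableName,
    Item: {
      id: image.id,             // partition key: the same for every version
      Version: image.versionId, // sort key: one item per image version
      Author: image.author,
      Bucket: image.bucket,
      Category: image.category,
      Date: image.date,
      Filename: `${image.id}-${image.name}`,
    },
  };
}

const params = buildPutParams('images', {
  id: '1b2d32b0-25bb-41d4-ba74-9bb14f5c8369',
  versionId: 'RW3GW7UqujewJL_o1We4mMRhT6I3Y66j',
  author: 'Matias Zilli',
  bucket: 'imagesproject',
  category: 'cats',
  date: 1584453887564,
  name: 'test5.jpg',
});
console.log(JSON.stringify(params, null, 2));
```

With the AWS SDK for JavaScript v2, these parameters would typically be passed to `DocumentClient.put(params).promise()`. Because the sort key changes on every update, each update adds a new item under the same id instead of overwriting the previous one.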
Image storage
Images are stored in two S3 buckets with versioning enabled, which allows multiple variants of an image to be kept in a bucket without changing the image name. Versioning makes it possible to preserve, retrieve, and restore every version of each image stored in an S3 bucket, so you can easily recover from both unintended user actions and application failures.
A versioning example is shown below:
To achieve better availability and reduce costs, there is a second bucket (shown as “backup” in the architecture diagram) that holds a replicated backup of the images in a different region. Replication enables automatic, asynchronous copying of objects across S3 buckets.
The main bucket uses the “Standard” storage class, which provides the best availability and the lowest cost for frequently accessed assets. The backup bucket uses the “One Zone-IA” storage class, which is designed for long-lived, infrequently accessed and non-critical data.
Images are continuously replicated from the main bucket to the backup bucket. Noncurrent versions (the old versions left behind after an update) expire from the main bucket after 6 days and never expire from the backup bucket. This mechanism keeps old image versions in low-cost, infrequently accessed storage, which reduces storage costs.
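The project sets these rules up through Terraform. Purely as an illustration in JavaScript, the two rules could be written as the request shapes of the S3 `PutBucketLifecycleConfiguration` and `PutBucketReplication` API calls. The function names, rule ID, backup bucket name and role ARN below are placeholders invented for this sketch.

```javascript
'use strict';

// Request shape for S3 PutBucketLifecycleConfiguration: expire
// noncurrent object versions in the main bucket after a number of days.
function mainBucketLifecycle(bucket, noncurrentDays) {
  return {
    Bucket: bucket,
    LifecycleConfiguration: {
      Rules: [
        {
          ID: 'expire-noncurrent-versions', // hypothetical rule name
          Status: 'Enabled',
          Filter: { Prefix: '' },           // empty prefix: every object
          NoncurrentVersionExpiration: { NoncurrentDays: noncurrentDays },
        },
      ],
    },
  };
}

// Request shape for S3 PutBucketReplication: copy every object to the
// backup bucket and store the replicas in the One Zone-IA storage class.
function mainBucketReplication(bucket, backupBucketArn, roleArn) {
  return {
    Bucket: bucket,
    ReplicationConfiguration: {
      Role: roleArn,
      Rules: [
        {
          Status: 'Enabled',
          Prefix: '',
          Destination: { Bucket: backupBucketArn, StorageClass: 'ONEZONE_IA' },
        },
      ],
    },
  };
}

const lifecycle = mainBucketLifecycle('imagesproject', 6);
const replication = mainBucketReplication(
  'imagesproject',
  'arn:aws:s3:::imagesproject-backup',             // placeholder bucket
  'arn:aws:iam::123456789012:role/s3-replication', // placeholder role
);
console.log(JSON.stringify({ lifecycle, replication }, null, 2));
```

The backup bucket gets no expiration rule at all, which is what keeps the old versions there indefinitely.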
Lambda functions and app code
The app has two AWS Lambda functions, one to upload images and the other to update them. Both were developed in JavaScript and run on Node.js 12. Both functions were deployed as public functions (i.e. not inside a VPC), which was possible because S3 and DynamoDB can be called from Lambda functions that run outside a VPC.
Lambda functions that run inside a VPC can add a huge overhead to “cold starts”: a delay of about 8 seconds, which is a horrible user experience, and running into several cold starts makes it even worse. Also keep in mind that each time a Lambda function is executed inside a VPC, it uses a proportion of the IP capacity of the subnet, so you must have enough IP capacity to support your Lambda scaling requirements.
Another optimization was enabling “keep-alive” for DynamoDB connections. By default, the Node.js HTTP/HTTPS agent creates a new TCP connection for every new request. To avoid that cost, you can reuse an existing connection by setting the AWS_NODEJS_CONNECTION_REUSE_ENABLED environment variable to 1.
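For readers who prefer to see what this setting amounts to in code, here is a minimal sketch of a keep-alive agent built with the Node.js `https` module. The variable name `keepAliveAgent` is illustrative. The commented-out wiring assumes the AWS SDK for JavaScript v2 and is not necessarily how the functions in this project are written.

```javascript
'use strict';

const https = require('https');

// An agent with keepAlive enabled keeps sockets open between requests,
// so later calls can skip the TCP and TLS handshakes.
const keepAliveAgent = new https.Agent({ keepAlive: true });

// Assumed wiring for the AWS SDK for JavaScript v2 (not executed here):
//   const AWS = require('aws-sdk');
//   const dynamo = new AWS.DynamoDB({
//     httpOptions: { agent: keepAliveAgent },
//   });

console.log('keep-alive enabled:', keepAliveAgent.options.keepAlive);
```

The environment variable is usually the simpler option because it needs no code change. An explicit agent is mainly useful when you want to tune other socket options at the same time.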
upload Lambda function source code.
update Lambda function source code.
Future improvements
A better way to implement this would be to generate S3 pre-signed URLs and use them to upload images directly to S3. We did not follow this approach here because of some constraints.
Also, the best way to serve images from S3 is to create a CloudFront distribution that serves HTTPS image requests directly from an edge location, which gives the end user the lowest latency and the best possible performance.
Another optimization might be to minify and uglify the Node.js code, which reduces the code size and therefore shortens “cold starts” by reducing the code download time.