
Applying transformations to images at scale is a task that parallelizes and scales easily. In computer vision, we often need to represent images in a more concise and uniform way, such as fixed-length feature vectors.
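As a minimal sketch of what "vectorizing" an image means here, the snippet below extracts a fixed-length embedding with a pretrained ResNet-50 from torchvision, with the classification head removed. The post does not prescribe a specific model, so ResNet-50 and the preprocessing pipeline are assumptions for illustration:

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Assumed model choice: pretrained ResNet-50 with the classifier removed,
# leaving the 2048-dimensional pooled feature vector as the embedding.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = torch.nn.Identity()
model.eval()

# Standard ImageNet preprocessing for ResNet-style models.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def vectorize(path: str) -> list[float]:
    """Return a fixed-length embedding for the image at `path`."""
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        vector = model(image).squeeze(0)
    return vector.tolist()
```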

To run our image vectorization task, we will use the following AWS cloud components:

- Amazon ECR (Elastic Container Registry): the Docker image repository from which our Batch instances pull the job images.
- Amazon S3 (Simple Storage Service): the image source from which our Batch jobs read the images.
- Amazon DynamoDB: the NoSQL database into which we write the resulting vectors and other metadata.
- AWS Lambda: the serverless compute environment that performs some pre-processing and, ultimately, triggers the Batch job execution (see the sketch after this list).
- AWS Batch: the scalable computing environment that powers our models as embarrassingly parallel (https://en.wikipedia.org/wiki/Embarrassingly_parallel) tasks running as AWS Batch jobs.
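The Lambda-to-Batch handoff can be sketched as follows with boto3. This assumes the Lambda is subscribed to S3 "object created" notifications; the job queue and job definition names are hypothetical placeholders for whatever your Batch setup defines:

```python
import boto3

batch = boto3.client("batch")

def handler(event, context):
    # Assumed trigger: S3 object-created notifications wired to this Lambda.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        batch.submit_job(
            # Job names must use letters, digits, hyphens, and underscores.
            jobName=f"vectorize-{key.replace('/', '-')}",
            jobQueue="image-vectorization-queue",   # assumed queue name
            jobDefinition="image-vectorizer:1",     # assumed job definition
            containerOverrides={
                "environment": [
                    {"name": "IMAGE_BUCKET", "value": bucket},
                    {"name": "IMAGE_KEY", "value": key},
                ]
            },
        )
```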

Figure: Screenshot of a running AWS Batch job that creates feature vectors from images and stores them in DynamoDB.
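Conceptually, each Batch job downloads its image from S3, computes the embedding, and writes the result to DynamoDB. A minimal sketch of that worker, assuming the environment variables passed via the `containerOverrides` above, a hypothetical DynamoDB table named `image-vectors`, and the `vectorize` function sketched earlier:

```python
import os
from decimal import Decimal

import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("image-vectors")  # assumed table name

def main():
    bucket = os.environ["IMAGE_BUCKET"]  # set by the Lambda via containerOverrides
    key = os.environ["IMAGE_KEY"]
    local_path = "/tmp/" + os.path.basename(key)
    s3.download_file(bucket, key, local_path)

    vector = vectorize(local_path)  # embedding function sketched earlier

    # DynamoDB has no native float type; store the components as Decimals.
    table.put_item(Item={
        "image_key": key,
        "vector": [Decimal(str(x)) for x in vector],
        "dimensions": len(vector),
    })

if __name__ == "__main__":
    main()
```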

In this post, we solved an embarrassingly parallel (https://en.wikipedia.org/wiki/Embarrassingly_parallel) problem: creating vector embeddings from images using AWS Batch.
