
Applying transformations to images at scale is a task that parallelizes and scales easily. In computer vision, we often need to represent images in a more concise and uniform way, such as fixed-length feature vectors.
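As a minimal sketch of what "vectorizing" an image means here, the snippet below extracts a fixed-length embedding with a pretrained ResNet-50 from torchvision, with the classification head removed. The post does not prescribe a specific model, so ResNet-50 and the preprocessing pipeline are assumptions for illustration:

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Assumed model choice: pretrained ResNet-50 with the classifier removed,
# leaving the 2048-dimensional pooled feature vector as the embedding.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = torch.nn.Identity()
model.eval()

# Standard ImageNet preprocessing for ResNet-style models.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def vectorize(path: str) -> list[float]:
    """Return a fixed-length embedding for the image at `path`."""
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        vector = model(image).squeeze(0)
    return vector.tolist()
```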

To run our image vectorization task, we will use the following AWS cloud components:

- Amazon ECR (Elastic Container Registry): the Docker image repository from which our Batch instances pull the job images.
- Amazon S3 (Simple Storage Service): the image source from which our Batch jobs read the images.
- Amazon DynamoDB: the NoSQL database into which we write the resulting vectors and other metadata.
- AWS Lambda: the serverless compute environment that performs some pre-processing and, ultimately, triggers the Batch job execution (see the sketch after this list).
- AWS Batch: the scalable computing environment that powers our models as embarrassingly parallel (https://en.wikipedia.org/wiki/Embarrassingly_parallel) tasks running as AWS Batch jobs.
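The Lambda-to-Batch handoff can be sketched as follows with boto3. This assumes the Lambda is subscribed to S3 "object created" notifications; the job queue and job definition names are hypothetical placeholders for whatever your Batch setup defines:

```python
import boto3

batch = boto3.client("batch")

def handler(event, context):
    # Assumed trigger: S3 object-created notifications wired to this Lambda.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        batch.submit_job(
            # Job names must use letters, digits, hyphens, and underscores.
            jobName=f"vectorize-{key.replace('/', '-')}",
            jobQueue="image-vectorization-queue",   # assumed queue name
            jobDefinition="image-vectorizer:1",     # assumed job definition
            containerOverrides={
                "environment": [
                    {"name": "IMAGE_BUCKET", "value": bucket},
                    {"name": "IMAGE_KEY", "value": key},
                ]
            },
        )
```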

Figure: Screenshot of a running AWS Batch job that creates feature vectors from images and stores them in DynamoDB.
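Conceptually, each Batch job downloads its image from S3, computes the embedding, and writes the result to DynamoDB. A minimal sketch of that worker, assuming the environment variables passed via the `containerOverrides` above, a hypothetical DynamoDB table named `image-vectors`, and the `vectorize` function sketched earlier:

```python
import os
from decimal import Decimal

import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("image-vectors")  # assumed table name

def main():
    bucket = os.environ["IMAGE_BUCKET"]  # set by the Lambda via containerOverrides
    key = os.environ["IMAGE_KEY"]
    local_path = "/tmp/" + os.path.basename(key)
    s3.download_file(bucket, key, local_path)

    vector = vectorize(local_path)  # embedding function sketched earlier

    # DynamoDB has no native float type; store the components as Decimals.
    table.put_item(Item={
        "image_key": key,
        "vector": [Decimal(str(x)) for x in vector],
        "dimensions": len(vector),
    })

if __name__ == "__main__":
    main()
```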

In this post, we solved an embarrassingly parallel (https://en.wikipedia.org/wiki/Embarrassingly_parallel) problem: creating vector embeddings from images using AWS Batch.
