Deep learning (DL) models have been increasing in size and complexity over the last few years, pushing the time to train from days to weeks. To reduce model training times and enable machine learning (ML) practitioners to iterate fast, AWS has been innovating across chips, servers, and data center connectivity.
New Trn1 Instance Highlights

Trn1 instances are available today in two sizes and are powered by up to 16 AWS Trainium chips with 128 vCPUs.
Trn1 EC2 UltraClusters

For large-scale model training, Trn1 instances integrate with Amazon FSx for Lustre (https://aws.amazon.com/fsx/lustre/) high-performance storage and are deployed in EC2 UltraClusters.
Get Started with Trn1 Instances

In this example, I train a PyTorch model on an EC2 Trn1 instance using the available PyTorch Neuron packages.