Category: Deployment, Software, Kubernetes, github, machine-learning

It combines the developer agility and the scalability of Kubernetes, with the wide selection of Amazon Elastic Compute Cloud (EC2) instance types available on AWS, such as the C5, P3, and G4 families.

Today, we’re very happy to announce that AWS customers can now use the Amazon EC2 Inf1 instances on Amazon Elastic Kubernetes Service, for high performance and the lowest prediction cost in the cloud.

Inf1 instances are available in multiple sizes, with 1, 4, or 16 AWS Inferentia chips, with up to 100 Gbps network bandwidth and up to 19 Gbps EBS bandwidth.

Compiling Models for EC2 Inf1 Instances To run machine learning models on Inf1 instances, you need to compile them to a hardware-optimized representation using the AWS Neuron SDK.

In the demo below, I will show you how to deploy a Neuron-optimized model on an EKS cluster of Inf1 instances, and how to serve predictions with TensorFlow Serving.

Related Articles