Deep Learning (DL) keeps growing, pushing the boundaries of where AI is going, and compute is expanding to keep up (https://thenewstack.io/brain-js-brings-deep-learning-to-the-browser-and-node-js/). With expanded compute comes expanded deployment in production.
https://aws.amazon.com/?utm_content=inline-mention's recently launched AWS Inf2 instances can handle ML models with up to 175 billion parameters in production at scale, with 4x higher throughput and 10x lower latency than their previous offering.
Though a single instance can handle models with hundreds of billions of parameters, multiple machines can work concurrently to serve even larger models.
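To make that idea concrete, here is a minimal sketch of how a layer too large for one accelerator can be split column-wise across several devices, each holding a shard of the weights and computing a partial result that is then gathered. The dimensions and device count are invented, and this is generic tensor parallelism in plain NumPy, not the AWS Neuron SDK or any Inf2-specific API.

```python
# Conceptual sketch only: split a weight matrix across "devices" so no single
# device stores the whole layer, then gather the partial outputs.
import numpy as np

rng = np.random.default_rng(0)

hidden, ffn = 1024, 4096          # illustrative layer dimensions
num_devices = 4                   # pretend each "device" is one machine/chip

x = rng.standard_normal((1, hidden)).astype(np.float32)    # one token's activations
W = rng.standard_normal((hidden, ffn)).astype(np.float32)  # full weight matrix

# Shard the weights by columns; each device only ever sees its own slice.
shards = np.split(W, num_devices, axis=1)

# Each device multiplies the same input by its shard; in a real cluster these
# matmuls run concurrently and the outputs are gathered over the interconnect.
partial_outputs = [x @ shard for shard in shards]
y_parallel = np.concatenate(partial_outputs, axis=1)

# The gathered result matches the single-device computation.
assert np.allclose(y_parallel, x @ W, atol=1e-4)
print("sharded output shape:", y_parallel.shape)
```

The same column-split trick is what lets a model that exceeds any one machine's memory be served by several machines at once, at the cost of moving activations between them.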
“These deep learning models were exploding in size, going from being a few million parameters to billions of parameters.