Nvidia is announcing today that its NeMo Megatron product (https://developer.nvidia.com/nvidia-nemo) — an open source, full-stack framework for developing and managing large language models (LLMs) — will ship with several improvements that reduce LLM training times. These improvements are not small: Nvidia says they can trim training times by as much as 30%. LLMs are a specific type of deep learning/neural network model used for a variety of natural language use cases, including content generation, text summarization, chatbots and other conversational AI applications.

That definitely seems to be the credo in place for these NeMo Megatron improvements, which come down to two novel approaches to training LLMs: selective activation recomputation and sequence parallelism.
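To give a rough sense of the first idea, here is a minimal sketch of plain activation recomputation (often called gradient checkpointing) using PyTorch's torch.utils.checkpoint. The Block class and its dimensions are illustrative assumptions, not Nvidia's code; NeMo Megatron's selective variant goes further by recomputing only a chosen subset of each layer's activations rather than the whole block.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

# Hypothetical transformer block, used only for illustration.
class Block(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x

blocks = nn.ModuleList(Block() for _ in range(4))
x = torch.randn(2, 128, 512, requires_grad=True)

# Instead of storing every intermediate activation for the backward pass,
# recompute each block's activations when gradients are needed,
# trading a little extra compute for a large memory saving.
for block in blocks:
    x = checkpoint(block, x, use_reentrant=False)

x.sum().backward()
```

The memory freed this way lets larger batches or longer sequences fit on each GPU, which is where the training-time savings come from.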

Training deep learning models in general, and LLMs specifically, involves a process of iterative improvement.
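As a minimal illustration of that iterative loop, here is a toy training loop in PyTorch; the linear model and random data stand in for an LLM and its corpus, and every name here is illustrative rather than part of NeMo Megatron.

```python
import torch
from torch import nn

# Toy model and data, standing in for an LLM and its training corpus.
model = nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(64, 16)
targets = torch.randint(0, 4, (64,))

# Each pass nudges the weights slightly closer to minimizing the loss;
# LLM training repeats this loop over billions of tokens.
for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
```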
