Researchers at the King Abdullah University of Science and Technology (KAUST) are proposing a method of accelerating distributed deep learning by dropping data blocks that contain only zero values, which are frequently produced when training on large datasets. Distributed deep learning, which scales model training out over a wider base of computational resources, relies on collective communication libraries to exchange data between machines. These libraries assume dense input data and make inefficient use of precious network bandwidth to transmit large volumes of zeroes.
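The underlying idea of skipping all-zero blocks can be sketched with a toy block-sparse encoding. The block size, function names, and NumPy implementation below are illustrative assumptions for exposition, not OmniReduce's actual protocol or wire format:

```python
import numpy as np

BLOCK = 4  # illustrative block size, not OmniReduce's actual choice

def compress(grad):
    """Split a flat gradient into fixed-size blocks and keep only
    the blocks that contain at least one nonzero value."""
    blocks = grad.reshape(-1, BLOCK)
    keep = np.any(blocks != 0, axis=1)      # mark non-zero blocks
    indices = np.flatnonzero(keep)          # positions to transmit
    return indices, blocks[keep]

def decompress(indices, payload, length):
    """Rebuild the dense gradient from the transmitted blocks,
    filling the skipped (all-zero) blocks with zeroes."""
    blocks = np.zeros((length // BLOCK, BLOCK), dtype=payload.dtype)
    blocks[indices] = payload
    return blocks.reshape(-1)

# A mostly-zero gradient, as often produced by sparse model updates.
grad = np.zeros(16, dtype=np.float32)
grad[1] = 0.5
grad[13] = -2.0

indices, payload = compress(grad)
restored = decompress(indices, payload, grad.size)
assert np.array_equal(grad, restored)
```

In this sketch only 2 of the 4 blocks are transmitted, so the payload is half the dense size; on gradients that are overwhelmingly zero, the bandwidth saving grows accordingly.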
In tests of OmniReduce against existing collective libraries such as NCCL and Gloo, run across six popular deep neural network models including BERT and ResNet152, the researchers found that OmniReduce performed well, accelerating training by up to 8.2 times.
The team is now working to adapt OmniReduce to run on programmable switches, using in-network computation to boost performance further.