When the cost of machine learning is discussed, it is usually in the context of model training. Left out of these discussions is the bottleneck presented by inference costs, particularly for real-time inference.
Here, I want to share what we’ve learned as a short checklist of opportunities for cost optimization in inference pipelines. Depending on how well optimized your current pipeline already is, these steps could reduce inference costs by more than 80%. Those costs, especially for real-time inference, are driven largely by cloud compute.