Category: Business, Data, Automation, Artificial Intelligence

Businesses are ingesting more and more data from sensors, smartphones, IT equipment, websites and other non-traditional sources, and processing it in real time to improve operations and better serve customers. More often than not, data arrives from multiple sources and is collected in an open data lake, where it is combined with existing historical data to deliver business value, often with machine learning and AI. The challenge for data engineers is to build streaming data pipelines that allow for rapid experimentation and operate reliably at scale.
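The pattern described above, events from multiple sources landing in a shared store and being combined with historical data, can be sketched in a few lines. This is a toy illustration using only the standard library; the names (`merge_batch`, `enrich`, `HISTORY`) are hypothetical, not any particular framework's API.

```python
# Toy sketch: JSON events from several "sources" arrive in micro-batches,
# are folded into a shared store (the "lake"), and are joined with
# historical data to compute a derived metric.
import json
from collections import defaultdict

# Pre-existing historical data in the lake: average reading per sensor.
HISTORY = {"sensor-1": 20.0, "sensor-2": 30.0}

def merge_batch(lake, batch):
    """Fold a micro-batch of raw JSON events into the lake, keyed by source id."""
    for raw in batch:
        event = json.loads(raw)
        lake[event["id"]].append(event["value"])
    return lake

def enrich(lake, history):
    """Combine fresh readings with history: deviation from the historical mean."""
    return {
        sid: round(sum(values) / len(values) - history.get(sid, 0.0), 2)
        for sid, values in lake.items()
    }

# Two sources emitting JSON events, processed as one micro-batch.
batch = [
    '{"id": "sensor-1", "value": 22.0}',
    '{"id": "sensor-1", "value": 24.0}',
    '{"id": "sensor-2", "value": 27.0}',
]
lake = merge_batch(defaultdict(list), batch)
deviations = enrich(lake, HISTORY)
print(deviations)  # {'sensor-1': 3.0, 'sensor-2': -3.0}
```

A production pipeline would replace the in-memory dict with durable storage and the list of strings with a real event stream, but the shape of the computation, ingest, merge, enrich, is the same.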

This piece explores some of the best practices to overcome those hurdles and build streaming pipelines that are fast, scalable and robust.

In a world where data is generated and stored in multiple clouds, it's imperative to have a streaming pipeline strategy that doesn't lock you into a particular repository, storage format, data processing framework or user interface.
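One way to keep a pipeline from being locked into a single storage format is to write against a small sink interface, so that the format becomes a one-line swap. The sketch below is illustrative only; the `Sink` protocol and the two writers are made-up names, not a real library API.

```python
# Minimal sketch of format-agnostic output: the pipeline body depends only
# on a Sink interface, and concrete writers (JSON Lines, CSV) plug in.
import csv
import io
import json
from typing import Iterable, Protocol

class Sink(Protocol):
    def write(self, records: Iterable[dict]) -> str: ...

class JsonLinesSink:
    def write(self, records):
        # One JSON object per line; sort keys for deterministic output.
        return "\n".join(json.dumps(r, sort_keys=True) for r in records)

class CsvSink:
    def write(self, records):
        records = list(records)
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=sorted(records[0]))
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()

def run_pipeline(events: Iterable[dict], sink: Sink) -> str:
    # The pipeline itself never names a concrete format.
    return sink.write(events)

events = [{"id": "sensor-1", "value": 22.0}]
print(run_pipeline(events, JsonLinesSink()))
```

Swapping `JsonLinesSink()` for `CsvSink()` changes the on-disk format without touching the pipeline logic; the same idea applies to swapping processing frameworks behind a stable boundary.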
