Category: Business, Data, Automation, Artificial Intelligence

Businesses are ingesting more and more data from sensors, smartphones, IT equipment, websites and other non-traditional sources, and processing it in real time to improve operations and better serve customers. More often than not, data arrives from multiple sources and is collected in an open data lake, where it is combined with existing historical data to deliver business value, often with machine learning and AI. The challenge for data engineers is to build streaming data pipelines that allow for rapid experimentation and operate reliably at scale.
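The pattern described above, events from multiple sources landing in a shared store and being combined with historical data, can be sketched in a few lines. This is a toy illustration using only the standard library; the names (`merge_batch`, `enrich`, `HISTORY`) are hypothetical, not any particular framework's API.

```python
# Toy sketch: JSON events from several "sources" arrive in micro-batches,
# are folded into a shared store (the "lake"), and are joined with
# historical data to compute a derived metric.
import json
from collections import defaultdict

# Pre-existing historical data in the lake: average reading per sensor.
HISTORY = {"sensor-1": 20.0, "sensor-2": 30.0}

def merge_batch(lake, batch):
    """Fold a micro-batch of raw JSON events into the lake, keyed by source id."""
    for raw in batch:
        event = json.loads(raw)
        lake[event["id"]].append(event["value"])
    return lake

def enrich(lake, history):
    """Combine fresh readings with history: deviation from the historical mean."""
    return {
        sid: round(sum(values) / len(values) - history.get(sid, 0.0), 2)
        for sid, values in lake.items()
    }

# Two sources emitting JSON events, processed as one micro-batch.
batch = [
    '{"id": "sensor-1", "value": 22.0}',
    '{"id": "sensor-1", "value": 24.0}',
    '{"id": "sensor-2", "value": 27.0}',
]
lake = merge_batch(defaultdict(list), batch)
deviations = enrich(lake, HISTORY)
print(deviations)  # {'sensor-1': 3.0, 'sensor-2': -3.0}
```

A production pipeline would replace the in-memory dict with durable storage and the list of strings with a real event stream, but the shape of the computation, ingest, merge, enrich, is the same.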

This piece explores some of the best practices to overcome those hurdles and build streaming pipelines that are fast, scalable and robust.

In a world where data is generated and stored in multiple clouds, it's imperative to have a streaming pipeline strategy that doesn't lock you into a particular repository, storage format, data processing framework or user interface.
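One way to keep a pipeline from being locked into a single storage format is to write against a small sink interface, so that the format becomes a one-line swap. The sketch below is illustrative only; the `Sink` protocol and the two writers are made-up names, not a real library API.

```python
# Minimal sketch of format-agnostic output: the pipeline body depends only
# on a Sink interface, and concrete writers (JSON Lines, CSV) plug in.
import csv
import io
import json
from typing import Iterable, Protocol

class Sink(Protocol):
    def write(self, records: Iterable[dict]) -> str: ...

class JsonLinesSink:
    def write(self, records):
        # One JSON object per line; sort keys for deterministic output.
        return "\n".join(json.dumps(r, sort_keys=True) for r in records)

class CsvSink:
    def write(self, records):
        records = list(records)
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=sorted(records[0]))
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()

def run_pipeline(events: Iterable[dict], sink: Sink) -> str:
    # The pipeline itself never names a concrete format.
    return sink.write(events)

events = [{"id": "sensor-1", "value": 22.0}]
print(run_pipeline(events, JsonLinesSink()))
```

Swapping `JsonLinesSink()` for `CsvSink()` changes the on-disk format without touching the pipeline logic; the same idea applies to swapping processing frameworks behind a stable boundary.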
