
Their purpose is pretty simple: they are implemented and deployed to copy or move data from “System A” to “System B.” To be a bit more formal (and abstract enough to justify our titles as engineers), a data pipeline is a process responsible for replicating state from “System A” to “System B.”

The pipeline keeps track of how far this replication has progressed. This saved state is called a checkpoint, and we will use it to resume our pipeline where it left off instead of syncing every piece of data again and again.
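To make the idea concrete, here is a minimal sketch of checkpoint-based syncing in Python. The `fetch_since` and `write` calls, the `updated_at` cursor field, and the local `checkpoint.json` file are all hypothetical placeholders for whatever the real source and destination systems expose:

```python
import json
import os

CHECKPOINT_FILE = "checkpoint.json"  # hypothetical local checkpoint store


def load_checkpoint():
    """Return the last synced cursor, or None on a fresh run."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["cursor"]
    return None


def save_checkpoint(cursor):
    """Persist the cursor so the next run resumes where this one stopped."""
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"cursor": cursor}, f)


def sync(source, destination):
    """Copy only the records newer than the checkpoint from source to destination."""
    cursor = load_checkpoint()
    for record in source.fetch_since(cursor):  # hypothetical source API
        destination.write(record)              # hypothetical destination API
        cursor = record["updated_at"]          # advance the cursor as we go
    save_checkpoint(cursor)
```

If the pipeline crashes mid-run, the next invocation reads the last persisted cursor and picks up from there rather than replaying the full dataset.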

Checkpointing is a very important design decision, and, as we will see in the second part of this series, it is the one we want to rethink.

As I interacted with more customers, first at Blendo and now at RudderStack, it became increasingly clear that these pain points are actually opportunities to innovate and create a new type of data pipeline architecture.
