
Their purpose is pretty simple: they are implemented and deployed to copy or move data from “System A” to “System B.” To be a bit more formal (and abstract enough to justify our titles as engineers), a data pipeline is a process responsible for replicating state from “System A” to “System B.”

The pipeline keeps track of how far this replication has progressed. This saved state is called a checkpoint, and we will use it to resume our pipeline where it left off instead of syncing every piece of data again and again.
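To make the idea concrete, here is a minimal sketch of checkpoint-based syncing in Python. The `fetch_since` and `write` calls, the `updated_at` cursor field, and the local `checkpoint.json` file are all hypothetical placeholders for whatever the real source and destination systems expose:

```python
import json
import os

CHECKPOINT_FILE = "checkpoint.json"  # hypothetical local checkpoint store


def load_checkpoint():
    """Return the last synced cursor, or None on a fresh run."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["cursor"]
    return None


def save_checkpoint(cursor):
    """Persist the cursor so the next run resumes where this one stopped."""
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"cursor": cursor}, f)


def sync(source, destination):
    """Copy only the records newer than the checkpoint from source to destination."""
    cursor = load_checkpoint()
    for record in source.fetch_since(cursor):  # hypothetical source API
        destination.write(record)              # hypothetical destination API
        cursor = record["updated_at"]          # advance the cursor as we go
    save_checkpoint(cursor)
```

If the pipeline crashes mid-run, the next invocation reads the last persisted cursor and picks up from there rather than replaying the full dataset.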

Checkpointing is a very important design decision, and, as we will see in the second part of this series, it is the one we want to rethink.

As I interacted with more customers, first at Blendo and now at RudderStack, it became increasingly clear that these pain points are actually opportunities to innovate and create a new type of data pipeline architecture.
