Anytime someone takes an action on Twitter, from something as small as a click or a scroll to something as large as signing up or tweeting, Twitter logs it. To date, the company has relied mainly on a batch processing system to take a first pass at that data.

Before Twitter Sparrow, the log ingestion pipeline relied on a batching system that left data science engineers waiting several hours for fresh customer-event data.

Streaming data pipelines, by contrast, give data scientists access to fresh data in real time.

Twitter engineers developed a Streaming Event Aggregator that collected log events from services and passed them to a message queue such as Apache Kafka (https://thenewstack.io/apache-kafka-primer/) or Google Cloud Pub/Sub (https://cloud.google.com/pubsub/docs/overview).
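To make the hand-off concrete, here is a minimal sketch of that step: serializing a logged client event and publishing it to a Kafka topic via the open source kafka-python client. The broker address, topic name, and event fields are all illustrative assumptions; Twitter's actual Streaming Event Aggregator is internal and not described in detail here.

```python
# Sketch only: publish a logged client event to a message queue.
# Broker address, topic name, and event schema are hypothetical.
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # hypothetical broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_client_event(user_id: str, action: str) -> None:
    """Publish one client event (e.g., a click or scroll) to the queue."""
    event = {
        "user_id": user_id,
        "action": action,  # "click", "scroll", "tweet", ...
        "timestamp_ms": int(time.time() * 1000),
    }
    producer.send("client-events", value=event)  # hypothetical topic

publish_client_event("user-123", "click")
producer.flush()  # make sure buffered events are delivered before exit
```

Once events land on the queue, downstream stream-processing jobs can consume them within seconds rather than waiting for the next batch run.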
