Source: Pittsburgh Code & Supply

Handling Data at Petabyte Scale

Ever wondered how organizations handle the petabytes of data being generated today? Data analytics traditionally ran on a single monolithic database, but has now evolved to depend on several specialized components at the compute, storage, and file-format layers. In this talk, we will discuss how to achieve data analytics at scale by integrating multiple specialized components such as Hadoop, Spark, TensorFlow, and Vertica.

Some of the challenges at this scale include keeping results consistent, securing data, and maintaining ease of use. I will describe a few solutions to these challenges.

Another interesting facet of this space is the co-existence of open-source products (Spark, Hadoop) and enterprise products (Vertica), and I will share some insights into how they work together.

Deepak Majeti is a technical lead at Vertica in Pittsburgh, where he leads the Vertica SQL on Data Lakes product. He is also a PMC member of the Apache ORC project and a committer on the Apache Parquet project. Deepak enjoys hiking, board games, and science fiction.
