** PLEASE REGISTER HERE **
https://trifork.zoom.us/webinar/register/WN_UUFKKiKmSKqAfZgBIrNIXA

ABOUT THE TALK

Apache Spark has been able to run natively on Kubernetes (as a replacement for Hadoop YARN) since Spark 2.3 (2018). Running Apache Spark on Kubernetes has many benefits, including native containerization and cloud-provider abstraction, integration with the rest of the Kubernetes ecosystem, and significant cost savings from more efficient resource sharing.

Despite growing interest and adoption in the community, making Spark on Kubernetes performant and stable remains a challenge for most companies. In this talk, we will walk you through an example data pipeline in PySpark running on Kubernetes, and give you tips and best practices for getting started with Apache Spark on Kubernetes.

Topics include:
- Core concepts of Spark on Kubernetes
- App-level dynamic allocation and cluster-level autoscaling
- How to get the maximum performance out of shuffle stages (all-to-all data exchanges)
- Monitoring and security best practices on Kubernetes
- Limitations of Spark on Kubernetes and planned future work

Agenda
17.00: Welcome to this GOTO Night with Jean-Yves Stephan
17.05: Jean-Yves Stephan will give his talk on Apache Spark on Kubernetes
17.45: Live Q&A with Jean-Yves Stephan
17.55: Thank you for joining us for this GOTO Night

ABOUT THE SPEAKER

Jean-Yves is the co-founder of Data Mechanics, a Y Combinator startup building a serverless platform for Apache Spark. Prior to that, he was a software engineer and the Spark infrastructure lead at Databricks, the data science platform founded by the creators of Apache Spark. JY is passionate about making Spark and distributed data engineering 10x easier to use and more performant.

