Source: medium.com

Spark on Kubernetes

Category: Docker, shell

It is my hope that you will be able to use the skills developed across this series to become proficient at building and deploying Spark applications with the Kubernetes scheduler.

The idea here is to introduce the concepts and components we will be using across the series, including Kubernetes (K8s), Docker and Spark 3.0.1.

Kubernetes, in a nutshell, is an open-source infrastructure management framework that lets you provision logical slices of your clustered compute infrastructure in order to deploy, scale, and manage containerized applications of varying sizes.
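To make those "logical slices" concrete: a Kubernetes namespace paired with a resource quota carves out a bounded portion of the cluster for a team or workload. The sketch below is illustrative; the namespace name and the CPU/memory/pod limits are placeholder values, not prescriptions from this series.

```bash
# Carve out a logical slice of the cluster for Spark workloads.
# The namespace name and limits are illustrative placeholders.
kubectl create namespace spark-apps

# Cap how much of the cluster this slice may consume.
kubectl create quota spark-apps-quota \
  --namespace=spark-apps \
  --hard=cpu=16,memory=64Gi,pods=50

# Verify the slice and its limits.
kubectl describe quota spark-apps-quota --namespace=spark-apps
```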

In the case of Spark applications, this application runtime logic comprises the standard spark.* configuration settings, as well as any shared volumes (disk), application dependencies (JARs, Python, R, etc.), and really anything else you want fine-grained control over when running your Spark application.
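For example, submitting a job against a Kubernetes cluster passes most of this runtime logic as --conf settings. This is a sketch using Spark 3.0.1's built-in Kubernetes support; the API server host, registry, namespace, and service account are placeholders you would replace with your own.

```bash
# Sketch of a spark-submit against a Kubernetes master (Spark 3.0.1).
# <api-server-host> and <registry> are placeholders for your environment.
./bin/spark-submit \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.namespace=spark-apps \
  --conf spark.kubernetes.container.image=<registry>/spark:3.0.1 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar
```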

This is essential to running Spark on Kubernetes, and it is also useful if you want to create a base Spark image to reuse within your organization.
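The Spark binary distribution ships a helper script for exactly this. A minimal sketch of building and pushing a reusable base image, assuming you have the Spark 3.0.1 distribution unpacked and a registry you can push to (the registry name and tag below are placeholders):

```bash
# Run from the root of the unpacked Spark 3.0.1 distribution.
# <registry> is a placeholder for your own container registry.
./bin/docker-image-tool.sh -r <registry> -t 3.0.1 build

# Optionally build the PySpark image as well; -p points at the
# PySpark Dockerfile bundled with the distribution.
./bin/docker-image-tool.sh -r <registry> -t 3.0.1 \
  -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build

# Push the resulting images so the cluster can pull them.
./bin/docker-image-tool.sh -r <registry> -t 3.0.1 push
```

The resulting image is what the spark.kubernetes.container.image setting shown earlier points at, and it can serve as the base layer for your organization's own application images.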
