
Sunday, May 15, 2022

Spark Deployment modes (Class -42)

Spark applications are deployed and executed on a cluster using the spark-submit shell command. Through its uniform interface, spark-submit can work with any of the supported cluster managers, such as YARN, Mesos, or Spark's own standalone cluster manager, with no extra configuration needed for each of them separately.

To deploy our Spark application on a cluster, we use Spark's spark-submit script.

The general form of the command is:

                 spark-submit --class <class-name> --master <mode> <application-jar> <input-file> <output-location>
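For example, the SparkPi program that ships with Spark can be submitted as shown below (a rough illustration; the examples jar path and file name depend on your installation):

                 spark-submit --class org.apache.spark.examples.SparkPi --master local[*] $SPARK_HOME/examples/jars/spark-examples_2.11-2.2.0.jar 100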



The spark-submit command is a utility to run or submit a Spark or PySpark application program (or job) to the cluster by specifying options and configurations. The application you are submitting can be written in Scala, Java, or Python (PySpark). You can use this utility to do the following:

1.    Submit a Spark application to different cluster managers such as YARN, Kubernetes, Mesos, and Standalone.

2.    Submit a Spark application in client or cluster deploy mode.

A Spark application can be deployed in 3 modes (example commands for each mode follow the list):

a)      Local → Spark itself allocates the resources (Standalone).

b)      YARN Client → the driver runs on the Edge node. (Dev)

c)      YARN Cluster → the driver runs on any one of the Data Nodes. (Prod)
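For reference, a typical spark-submit invocation for each mode looks roughly like the following sketch (the class name, jar name, and input/output paths are placeholders, not from this project):

                 # Local / Standalone - Spark itself allocates the resources
                 spark-submit --class com.example.MyApp --master local[*] myapp.jar input output

                 # YARN client mode - the driver runs on the edge node
                 spark-submit --class com.example.MyApp --master yarn --deploy-mode client myapp.jar input output

                 # YARN cluster mode - the driver runs on one of the cluster nodes
                 spark-submit --class com.example.MyApp --master yarn --deploy-mode cluster myapp.jar input output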

We need to create a Spark environment in an IDE such as Eclipse or IntelliJ IDEA. After creating the environment, we write the code and package it into an executable file (jar) using a build tool (Maven). Finally, we take the jar file and put it onto the cluster.

Go to Eclipse IDE for Spark Project:

Click on ‘Next’ for Maven Project:


Click on ‘Simple Project’ and Next:


Give any name for Group Id & Artifact Id and click Finish:


See that the project has been created:


You can see that the default project is Java only.

Next, right-click on the ‘Sparkworks’ project → Configure → Add ‘Scala Nature’.


Right-click on src/main/java → Refactor → Rename to scala:


Similarly, do the same for src/test/java.

Right-click on Scala Library Container → Configure Build Path → Scala Compiler.

Click on ‘Use Project Settings’ and set it to Latest 2.11 bundle (dynamic).




Configure JRE System Library:



Open ‘pom.xml’


Further, to create the Spark environment, we need to import the Maven dependencies for the Spark modules into pom.xml:

We process data in Spark using 3 modules:

a)      Spark Core

b)      Spark SQL

c)       Spark Streaming

 

Go to Google → search for ‘Spark Core Maven Dependency 2.2.0’:


Spark SQL Maven dependency:


Spark Streaming Maven dependency:


Paste the above Maven dependencies into pom.xml:
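For Spark 2.2.0 with Scala 2.11 (the versions used in this class), the pasted dependency section of pom.xml should look roughly like the sketch below; double-check the exact coordinates against the Maven pages you looked up:

                 <dependencies>
                   <!-- Spark Core -->
                   <dependency>
                     <groupId>org.apache.spark</groupId>
                     <artifactId>spark-core_2.11</artifactId>
                     <version>2.2.0</version>
                   </dependency>
                   <!-- Spark SQL -->
                   <dependency>
                     <groupId>org.apache.spark</groupId>
                     <artifactId>spark-sql_2.11</artifactId>
                     <version>2.2.0</version>
                   </dependency>
                   <!-- Spark Streaming -->
                   <dependency>
                     <groupId>org.apache.spark</groupId>
                     <artifactId>spark-streaming_2.11</artifactId>
                     <version>2.2.0</version>
                   </dependency>
                 </dependencies>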


Now create a package by right-clicking on Sparkworks → Sparkworks.scala:


Create ‘Scala Object’ under Sparkworks.scala:


See that the object ‘singleton’ has been created:


Import SparkContext & SparkConf:


Write the main function and create the configuration object.

Assign an instance of the class ‘SparkConf’ to the object ‘conf’, as sketched below.
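Put together, the Scala object should look roughly like this sketch (the app name and master setting are placeholders; in YARN client/cluster mode the master is normally supplied through spark-submit rather than hard-coded):

                 import org.apache.spark.SparkConf
                 import org.apache.spark.SparkContext

                 object singleton {
                   def main(args: Array[String]): Unit = {
                     // Configuration object: assign a SparkConf instance to 'conf'
                     val conf = new SparkConf().setAppName("Sparkworks").setMaster("local[*]")
                     // Create the SparkContext from the configuration
                     val sc = new SparkContext(conf)
                   }
                 }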


Now right-click on ‘Sparkworks’ and choose Run As → Maven Clean.

Next, choose Run As → Maven Build... and type ‘package’ in Goals.


Right-click on the project ‘Sparkworks’ and click Refresh.

See that the jar file has been created.
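The same build can also be run from a terminal, and the resulting jar can then be copied to the cluster and submitted; the jar name, paths, host, and fully qualified class name below are placeholders for whatever your project produces:

                 # Command-line equivalent of the Eclipse Maven Clean + Maven Build (package) steps
                 mvn clean package

                 # Copy the jar to the edge node and submit it with spark-submit
                 scp target/<artifact-id>-<version>.jar user@edgenode:/home/user/
                 spark-submit --class <package>.singleton --master yarn --deploy-mode client /home/user/<artifact-id>-<version>.jar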



