
Sunday, May 15, 2022

Spark Deployment modes (Class -42)

Spark applications are deployed and executed on a cluster using the spark-submit shell command. Through its uniform interface, spark-submit can work with any of the supported cluster managers, such as YARN, Mesos, or Spark's own standalone cluster manager, with no extra configuration needed for each of them separately.

To deploy our Spark application on a cluster, we use Spark's spark-submit script.

The general form of the command is:

                 spark-submit --class <class-name> --master <mode> <application-jar> <input-file> <output-location>
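For example, the SparkPi program that ships with Spark can be submitted as shown below (a rough illustration; the examples jar path and file name depend on your installation):

                 spark-submit --class org.apache.spark.examples.SparkPi --master local[*] $SPARK_HOME/examples/jars/spark-examples_2.11-2.2.0.jar 100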



The spark-submit command is a utility to run or submit a Spark or PySpark application program (or job) to the cluster by specifying options and configurations. The application you are submitting can be written in Scala, Java, or Python (PySpark). You can use this utility to do the following:

1.    Submit a Spark application to different cluster managers such as YARN, Kubernetes, Mesos, and Standalone.

2.    Submit a Spark application in client or cluster deploy mode.

A Spark application can be deployed in 3 modes (example commands for each mode follow the list):

a)      Local → Spark itself allocates the resources (Standalone).

b)      YARN Client → the driver runs on the Edge node. (Dev)

c)      YARN Cluster → the driver runs on any one of the Data Nodes. (Prod)
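For reference, a typical spark-submit invocation for each mode looks roughly like the following sketch (the class name, jar name, and input/output paths are placeholders, not from this project):

                 # Local / Standalone - Spark itself allocates the resources
                 spark-submit --class com.example.MyApp --master local[*] myapp.jar input output

                 # YARN client mode - the driver runs on the edge node
                 spark-submit --class com.example.MyApp --master yarn --deploy-mode client myapp.jar input output

                 # YARN cluster mode - the driver runs on one of the cluster nodes
                 spark-submit --class com.example.MyApp --master yarn --deploy-mode cluster myapp.jar input output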

We need to create a Spark environment in an IDE such as Eclipse or IntelliJ IDEA. After creating the environment, we write the code and package it into an executable file (jar) using a build tool (Maven). Finally, we take the jar file and put it onto the cluster.

Go to Eclipse IDE for Spark Project:

Click on ‘Next’ for Maven Project:


Click on ‘Simple Project’ and Next:


Give any name for Group Id & Artifact Id and click Finish:


See that the project has been created:


You can see that the default project is Java only.

Next, right-click on the ‘Sparkworks’ project → Configure → Add ‘Scala Nature’.


Right-click on src/main/java → Refactor → Rename to scala:


Similarly, do the same for src/test/java.

Right-click on Scala Library Container → Configure Build Path → Scala Compiler.

Click on ‘Use Project Settings’ and set it to Latest 2.11 bundle (dynamic).




Configure JRE System Library:



Open ‘pom.xml’


Further, to create the Spark environment, we need to import the Maven dependencies for the Spark modules into pom.xml:

We process data in Spark using 3 modules:

a)      Spark Core

b)      Spark SQL

c)       Spark Streaming

 

Go to Google → search for ‘Spark Core Maven Dependency 2.2.0’:


Spark SQL Maven dependency:


Spark Streaming Maven dependency:


Paste the above Maven dependencies into pom.xml:
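For Spark 2.2.0 with Scala 2.11 (the versions used in this class), the pasted dependency section of pom.xml should look roughly like the sketch below; double-check the exact coordinates against the Maven pages you looked up:

                 <dependencies>
                   <!-- Spark Core -->
                   <dependency>
                     <groupId>org.apache.spark</groupId>
                     <artifactId>spark-core_2.11</artifactId>
                     <version>2.2.0</version>
                   </dependency>
                   <!-- Spark SQL -->
                   <dependency>
                     <groupId>org.apache.spark</groupId>
                     <artifactId>spark-sql_2.11</artifactId>
                     <version>2.2.0</version>
                   </dependency>
                   <!-- Spark Streaming -->
                   <dependency>
                     <groupId>org.apache.spark</groupId>
                     <artifactId>spark-streaming_2.11</artifactId>
                     <version>2.2.0</version>
                   </dependency>
                 </dependencies>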


Now create a package by right-clicking on Sparkworks → Sparkworks.scala:


Create ‘Scala Object’ under Sparkworks.scala:


See that the object ‘singleton’ has been created:


Import SparkContext & SparkConf:


Write the main function and create the configuration object.

Assign an instance of the class ‘SparkConf’ to the object ‘conf’, as sketched below.
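Put together, the Scala object should look roughly like this sketch (the app name and master setting are placeholders; in YARN client/cluster mode the master is normally supplied through spark-submit rather than hard-coded):

                 import org.apache.spark.SparkConf
                 import org.apache.spark.SparkContext

                 object singleton {
                   def main(args: Array[String]): Unit = {
                     // Configuration object: assign a SparkConf instance to 'conf'
                     val conf = new SparkConf().setAppName("Sparkworks").setMaster("local[*]")
                     // Create the SparkContext from the configuration
                     val sc = new SparkContext(conf)
                   }
                 }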


Now right-click on ‘Sparkworks’ and choose Run As → Maven Clean.

Next, choose Run As → Maven Build... and type ‘package’ in Goals.


Right-click on the project ‘Sparkworks’ and click Refresh.

See that the jar file has been created.
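The same build can also be run from a terminal, and the resulting jar can then be copied to the cluster and submitted; the jar name, paths, host, and fully qualified class name below are placeholders for whatever your project produces:

                 # Command-line equivalent of the Eclipse Maven Clean + Maven Build (package) steps
                 mvn clean package

                 # Copy the jar to the edge node and submit it with spark-submit
                 scp target/<artifact-id>-<version>.jar user@edgenode:/home/user/
                 spark-submit --class <package>.singleton --master yarn --deploy-mode client /home/user/<artifact-id>-<version>.jar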



