Spark applications can be deployed and executed on a cluster using the spark-submit shell command. Through its uniform interface, spark-submit can work with any of the cluster managers, such as YARN, Mesos, or Spark's own standalone cluster manager, and no extra configuration is needed for each of them separately.
To deploy our Spark application on a cluster, we use Spark's spark-submit script. The general form of the command is:

spark-submit --class <class-name> --master <master> --deploy-mode <deploy-mode> <application-jar> <input-file> <output-location>

spark-submit takes care of:
1. Submitting the Spark application to different cluster managers such as YARN, Kubernetes, Mesos, and Standalone.
2. Submitting the Spark application in client or cluster deploy mode.
A Spark application can be deployed in three modes (example submit commands follow the list):
a) Local → Spark itself allocates the resources (Standalone).
b) YARN client → the driver runs on the edge node. (Dev)
c) YARN cluster → the driver runs on one of the data nodes. (Prod)
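As an illustration, the same JAR could be submitted in each of these modes roughly as follows. The class name, JAR name, and paths are placeholders for illustration, not values from this project:

# Local mode: Spark runs in a single JVM and allocates resources itself
spark-submit --class com.example.MyApp --master "local[*]" myapp.jar /input /output

# YARN client mode: the driver runs on the edge node where the command is issued (Dev)
spark-submit --class com.example.MyApp --master yarn --deploy-mode client myapp.jar /input /output

# YARN cluster mode: the driver runs inside the cluster on one of the data nodes (Prod)
spark-submit --class com.example.MyApp --master yarn --deploy-mode cluster myapp.jar /input /output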
We need to set up a Spark environment in an IDE such as Eclipse or IntelliJ IDEA. After creating the environment, we build the code into an executable file (JAR) using a build tool (Maven). Finally, we take the JAR file and copy it to the cluster.
Go to the Eclipse IDE and create the Spark project:
1. Choose a Maven Project and click ‘Next’.
2. Select ‘Simple Project’ and click Next.
3. Give any name for the Group Id and Artifact Id and click Finish.
4. The project has been created; note that the default project is Java only.
5. Right-click on src/main/java → Refactor → Rename it to scala.
6. Do the same for src/test/java.
7. Click ‘Use Project Settings’ and select the latest 2.11 bundle (dynamic).
8. Configure the JRE System Library.
9. Open ‘pom.xml’.
Next, to complete the Spark environment, we need to add the Maven dependencies for the Spark modules to pom.xml.
We process data in Spark with three modules (their entry points are sketched after this list):
a) Spark Core
b) Spark SQL
c) Spark Streaming
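For orientation, here is a minimal Scala sketch of the entry point each module provides. The object name and app name are arbitrary, and setMaster("local[*]") is only for local testing:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ModulesSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ModulesSketch").setMaster("local[*]")

    val sc = new SparkContext(conf)                               // Spark Core: RDD API
    val spark = SparkSession.builder().config(conf).getOrCreate() // Spark SQL: DataFrame/Dataset API
    val ssc = new StreamingContext(sc, Seconds(10))               // Spark Streaming: DStream API

    // ... use sc, spark, and ssc here ...

    ssc.stop(stopSparkContext = true)
  }
}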
Go to Google → search for the Spark Core Maven dependency 2.2.0, the Spark SQL Maven dependency, and the Spark Streaming Maven dependency. Paste these Maven dependencies into pom.xml; they look roughly like the block below.
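For reference, assuming Spark 2.2.0 built for Scala 2.11 (match the version to your cluster), the dependencies section of pom.xml would look like this:

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.2.0</version>
  </dependency>
</dependencies>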
Now create a package: right-click on Sparkworks → Sparkworks.scala. Create a ‘Scala Object’ under Sparkworks.scala; you can see the object (a singleton) has been created.
Import SparkContext and SparkConf, write the main function, and create the configuration object by assigning an instance of the class ‘SparkConf’ to a value named ‘conf’, as in the sketch below.
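A minimal sketch of the object at this stage. The object name follows this walkthrough; the app name and the local master setting are assumptions for local testing, since on a cluster the master is normally supplied via spark-submit:

// add a package declaration here matching the package created above
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Sparkworks {
  def main(args: Array[String]): Unit = {
    // Configuration object: assign a SparkConf instance to 'conf'
    val conf = new SparkConf()
      .setAppName("Sparkworks")
      .setMaster("local[*]") // assumption: local testing only; omit when submitting to YARN

    // SparkContext built from the configuration
    val sc = new SparkContext(conf)

    // ... Spark Core logic goes here ...

    sc.stop()
  }
}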
Now right-click on ‘Sparkworks’ and choose Run As → Maven Clean. Next, choose Run As → Maven Build... and type ‘package’ in Goals. Then right-click on the project ‘Sparkworks’ and click Refresh. The equivalent command-line build and a sample submission are shown below.
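Outside the IDE, the same build can be run from a terminal, and the resulting JAR (created under target/) can then be copied to the cluster and submitted. The JAR name, class name, host, and paths below are placeholders based on this walkthrough, not exact values:

# Build the executable JAR with Maven (equivalent to Maven Clean + Maven Build with goal 'package')
mvn clean package

# Copy the JAR from target/ to the edge node of the cluster (host and path are placeholders)
scp target/Sparkworks-0.0.1-SNAPSHOT.jar user@edge-node:/home/user/

# Submit it, e.g. in YARN cluster mode (class name and file paths are placeholders)
spark-submit --class Sparkworks --master yarn --deploy-mode cluster Sparkworks-0.0.1-SNAPSHOT.jar /input/path /output/path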