Apache Spark is an open-source cluster-computing framework for real-time
processing. It is often cited as running up to 100 times faster in memory and
up to 10 times faster on disk than Apache Hadoop MapReduce.
Apache Spark has a well-defined architecture that integrates with various
extensions and libraries, and all Spark components and layers are loosely
coupled.
Spark is a distributed processing engine that follows a master-slave
architecture: for every Spark application, it creates one master process (the
driver) and multiple slave processes (the executors).
When you run a Spark application, the Spark driver creates a context that
serves as the entry point to your application. All operations (transformations
and actions) are executed on the worker nodes, and the resources are managed
by the cluster manager.
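
The flow described above (a driver-side context as the entry point,
transformations and actions executed on workers) can be illustrated with a
small pure-Python toy model. This is only a sketch and not Spark's
implementation or API: ToyContext, ToyDataset, and the thread pool standing in
for worker nodes are all hypothetical names made up for this example. It also
assumes one partition per worker, and it shows the usual Spark behaviour that
transformations are only recorded until an action triggers the actual work.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyDataset:
    """Toy stand-in for a distributed dataset; NOT Spark's RDD implementation."""

    def __init__(self, partitions, steps=()):
        self.partitions = partitions   # list of lists, one per "worker"
        self.steps = tuple(steps)      # recorded transformations

    def map(self, fn):
        # Transformation: only recorded, nothing runs yet.
        return ToyDataset(self.partitions, self.steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: only recorded, nothing runs yet.
        return ToyDataset(self.partitions, self.steps + (("filter", pred),))

    def _run_partition(self, rows):
        # What a single "worker" does with its own partition.
        for kind, fn in self.steps:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        return rows

    def collect(self):
        # Action: the "driver" hands the recorded steps to the workers,
        # then gathers the partial results in partition order.
        with ThreadPoolExecutor(max_workers=len(self.partitions)) as pool:
            parts = list(pool.map(self._run_partition, self.partitions))
        return [row for part in parts for row in part]


class ToyContext:
    """Toy entry point, loosely playing the role of the driver's context."""

    def __init__(self, num_workers):
        self.num_workers = num_workers

    def parallelize(self, data):
        data = list(data)
        # Round-robin split: one partition per worker.
        parts = [data[i::self.num_workers] for i in range(self.num_workers)]
        return ToyDataset(parts)


calls = []


def square(x):
    calls.append(x)  # records when the function really runs
    return x * x


ctx = ToyContext(num_workers=3)
squares = ctx.parallelize(range(1, 7)).map(square).filter(lambda v: v % 2 == 0)
ran_before_action = len(calls)      # no call to square() has happened yet
result = sorted(squares.collect())  # the action triggers the work
```

In real PySpark code the corresponding calls would be sc.parallelize, rdd.map,
rdd.filter, and rdd.collect on a SparkContext, with the cluster manager rather
than a local thread pool deciding where the work runs.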
Features of Apache Spark: