Spark, now maintained by the Apache Software Foundation, is a lightning-fast cluster computing technology, designed to speed up the kinds of computations traditionally handled by Hadoop.
Spark is not a modified version of Hadoop, and it is not entirely dependent on
Hadoop, because it has its own cluster management. The
main feature of Spark is its in-memory cluster computing, which
increases the processing speed of an application.
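As a minimal sketch of what in-memory computing looks like in practice (the file path and local master setting below are illustrative assumptions, not part of the original text), the Scala snippet caches a filtered RDD so that the second action is served from executor memory instead of re-reading the file:

import org.apache.spark.sql.SparkSession

object CacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CacheSketch")
      .master("local[*]")   // assumption: local mode, for illustration only
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input file; any large text file will do.
    val errors = sc.textFile("data/events.log")
      .filter(_.contains("ERROR"))
      .cache()   // keep the filtered records in memory after the first action

    println(errors.count())                                // reads from disk, fills the cache
    println(errors.filter(_.contains("timeout")).count())  // reuses the in-memory data

    spark.stop()
  }
}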
Spark is also not confined to the Hadoop
ecosystem, as it can run on its own. When Spark does use Hadoop, it uses it in two ways –
one is storage and the second is processing (MapReduce).
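To make the storage half concrete, here is a small sketch (the HDFS URIs and namenode host are hypothetical) in which Spark uses Hadoop purely as a storage layer, reading from and writing back to HDFS:

import org.apache.spark.sql.SparkSession

object HdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("HdfsSketch").getOrCreate()

    // Read from HDFS (Hadoop as storage) – host and paths are assumptions.
    val lines = spark.read.textFile("hdfs://namenode:9000/input/logs")

    // Write the non-empty lines back to HDFS.
    lines.filter(_.nonEmpty).write.text("hdfs://namenode:9000/output/nonempty")

    spark.stop()
  }
}

Submitting the same job with spark-submit --master yarn additionally hands scheduling to Hadoop's resource manager, covering the processing side as well.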
The Hadoop framework is
known for analyzing datasets with a simple programming model (MapReduce);
its main concern is maintaining speed when processing large datasets, measured both as the
waiting time between queries and as the time to run a program.
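For readers who have not seen the model, word count is the canonical MapReduce example. The sketch below (the input path is a placeholder) expresses the same map and reduce phases with Spark's RDD API:

import org.apache.spark.sql.SparkSession

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("WordCountSketch").getOrCreate()
    val sc = spark.sparkContext

    val counts = sc.textFile("hdfs:///books/sample.txt") // placeholder path
      .flatMap(_.split("\\s+"))   // map phase: split lines into words
      .map(word => (word, 1))     // map phase: emit (word, 1) pairs
      .reduceByKey(_ + _)         // reduce phase: sum the counts per word

    counts.take(10).foreach(println)
    spark.stop()
  }
}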
Earlier Hadoop Versions:
Hadoop 1.0 was introduced
in 2006 and used up to 2012, when Hadoop 2.0 (YARN) came into the picture.
Main drawbacks of Hadoop 1.0 are
a) Single point of failure (a single NameNode)
b) Block size
c) Relying on MapReduce (MR) for both resource management and the processing engine
In 2008, Cloudera began offering a
commercial distribution of Hadoop, in both open-source and enterprise editions.
Spark began as
one of Hadoop's sub-projects. It was developed in 2009 in UC Berkeley's AMPLab by Matei
Zaharia, originally as a workload for testing the Mesos cluster resource manager rather than for
processing data.
Spark was
open-sourced in 2010 under a BSD license. It was donated to the Apache Software
Foundation in 2013, and Apache Spark has been a top-level Apache project
since February 2014.