
Saturday, April 9, 2022

Data Analytics Syllabus

The main topics covered in Data Analytics are:

a) Big Data Architecture








i) AWS


Module 01 - Hadoop Introduction

Big Data Curriculum

Introduction to Big Data

Big Data and Its Importance

Simple Architecture of Big Data

Hadoop 1.0 Architecture

Hadoop 2.0 Architecture

Big Data Environments

MapReduce Explanation with an Example

YARN Architecture

Setting Up Cloudera environment

What is Linux?

Linux Basic commands sessions

Unix shell Scripting basics 

Hadoop Basic Commands
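To give a feel for the MapReduce topic listed in this module, here is a minimal sketch of the classic word count in plain Python. It only imitates the map, shuffle, and reduce phases in a single process; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample sentences are illustrative assumptions, not part of Hadoop or of the course material.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: collect all values that share a key into one list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the list of values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data is big", "hadoop stores big data"]  # made-up input
    print(word_count(sample))
    # {'big': 3, 'data': 2, 'is': 1, 'hadoop': 1, 'stores': 1}
```

On a real cluster the map and reduce steps run on different machines and the framework performs the shuffle; the sketch keeps everything in memory only to show the data flow.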

Module 02 - SQOOP

Sqoop Introduction

Sqoop Internal Process

Sqoop Explanation with Example

Sqoop with Eval

Sqoop with Split by

Hands-on 1 - Eval, Split By, Basic Import from MySQL

Sqoop Import Properties

Sqoop Incremental Import

Hands-on 2 - Sqoop Incremental Import

Sqoop Incremental lastmodified

Hands-on 3 - Sqoop Incremental Append

Sqoop Job Creation - Basic

Sqoop Job - Creating a Password File

Direct Mode

Sqoop Import using Shell Scripting

Sqoop Hands-on Session - 2

Sqoop validate Command

Sqoop Import into Hive table

Sqoop Import All Tables

Sqoop Import All Tables Exclude command

Sqoop Export Introduction

Sqoop Export Internal Process

Sqoop Export Incremental load

Sqoop Export properties

Sqoop Export Transactionality

Module 03 - HIVE

Hive Introduction



Hive Architecture

Different Types of Hive metastore

Different ways of Accessing Hive

Hive Beeline Explanation

Different types of Execution Engines in Hive

Hive - Hadoop Integration

Hive - Tables - Managed and External Tables

Hive Internal tables Explanation

How to create the Internal tables

Hive Internal Table creation on top of Directory

Loading Data from a File to Hive table

Hive External tables Explanation

How to create External Tables on a Directory

Difference between Internal and External table

Hands-on - Internal and External Tables

Partitions Introduction

Static Partition - Load and Insert

Dynamic Partitions Insert

Hands-on - Static and Dynamic Partitions

Hive Sub partitions Explanation

Hands-on - Sub partitions

Bucketing in Hive Explanation

Bucketing on INTEGER column

Bucketing on String Column

Bucketing on Date Column

Hands-on - Bucketing

Bucketing and Partition on Same Table

Hive Query Optimization

Hive built In Functions

Views in Hive

Hive Subqueries

SCD Types Explanation in Hive

Implementation of SCD Type 1 in Hive

How to remove the duplicates in Hive table

Hive Serde Properties Explanation

Hive table creation on parquet

Hive Table creation on Avro

Hive table creation on XML files

Different types of Joins in Hive

Map side Join In Hive

Bucket Map Join in Hive

Sort Merge Bucket Join in Hive

Hands-on - Joins in Hive

Hive UDF creation

Hands-on - Handle Incremental Load in Hive through Views

Hive Ranking functions

Concept of Vectorization

Choosing File format in Hive - Industry based

Hive MSCK command Explanation

Hive Advanced commands

ACID Properties In Hive

Hands-on - DML operations in Hive

ORC vs Parquet vs Avro


Introduction to HBase

Types of NoSQL Databases

Characteristics of NoSQL


Why Column-Based Storage is preferred over Row-Based Storage

RDBMS vs HBase

Storage Hierarchy in HBase

HBase Architecture


What is a column family in HBase?

Hands-on Session on HBase commands

How to create the HBase table

How to insert the data into HBase Table

How to scan the data

How to enable the table

How to disable the table


Basics of Scala

Why Scala is called a Functional Programming language

What is the use of Functional Programming

Difference between VAL and VAR

Data Types of Scala

Use of UNIT Data Type

Collection in Scala

List Collection

Set collection

Tuple Collection

Map Collection

Range Collection

Expressions in Scala

Statements in Scala

Scala Class Hierarchy

For Loop

If Expression

Match Expression

Wild Card Pattern Matching

String Interpolation

Functions in Scala

Methods and Operations

Nested Functions

Variable Args Functions

Vector collection explanation

Recursive functions

Higher Order Functions Introduction

What is the use of Higher Order functions

Map Higher Order function

Filter Higher Order function

Foreach higher Order function

Reduceby Higher Order function

Currying In scala

Singleton Object in Scala

How to create a singleton object

Classes in Scala

Companion Object and Case Class

Main method in Scala

Factory Design Pattern

Traits in Scala

How to create traits in Projects

Options in Scala

Handling Nulls in Scala


Spark Introduction

Why Spark?

Spark Ecosystem Components

Spark and MapReduce differences

Architecture of Spark

Different ways of processing the data in Spark

Spark Core Introduction

What is SparkContext?

What is RDD and its importance?

What is DAG?

RDD Lineage

Concept of Resilience

Lazy transformations

What is transformation in RDD

Examples of Transformations in RDD

What are actions in RDD?

Examples of RDD Actions

Narrow and Wide Transformation

Setting Up Eclipse IDE

How to perform word count processing in Spark Core

Jar creation

Jar deployment

Spark Submit Introduction

Spark Submit Architecture explanation

Spark Submit - Stages in Spark

Different modes of Spark Submit

Spark Submit in Client mode

Spark Submit In cluster mode

Spark submit in Standalone mode

Spark Dynamic memory Allocation of resources

Difference between Group By Vs ReduceBy

Concept of Accumulators

Concepts of Broadcast variables

How Accumulators and Broadcast variables act as Optimization techniques in Spark



Difference between repartition and coalesce - Real-time scenario

How to increase the parallelism in spark

Hands On Document for Spark Core

Spark Core HandsOn Session -1

Spark Core HandsOn Session -2

Concept of Map partition

Cache Concept In Detail

Units of Caching

Different memory Levels in Spark

Difference between cache vs persist

Concept of Serialization in Spark

Java serialization

Kryo Serialization

Why Kryo Serialization is best for Spark?

Joins in Spark Core

Benefits of Repartitions

partitionBy vs bucketBy

Saving files in various file formats
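The "Group By vs ReduceBy" topic in this module comes down to how many records cross the shuffle. The sketch below imitates that difference in plain Python rather than Spark: the helper names (`group_by_key`, `reduce_by_key`), the two hand-written partitions, and the record counts are illustrative assumptions, not Spark APIs or measurements.

```python
from collections import defaultdict


def group_by_key(partitions):
    """Send every (key, value) record through the shuffle, then sum per key."""
    shuffled = [pair for part in partitions for pair in part]
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


def reduce_by_key(partitions):
    """Sum per key inside each partition first, then shuffle only partial sums."""
    shuffled = []
    for part in partitions:
        partial = defaultdict(int)
        for key, value in part:
            partial[key] += value
        shuffled.extend(partial.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


# Two made-up partitions of (word, 1) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]

if __name__ == "__main__":
    print(group_by_key(partitions))   # ({'a': 3, 'b': 3}, 6)
    print(reduce_by_key(partitions))  # ({'a': 3, 'b': 3}, 4)
```

Both helpers return the same totals, but the second one moves four records instead of six because it combines values locally before the shuffle, which is the usual reason for preferring a reduce-style aggregation when only the aggregate is needed.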

Module 11 - KAFKA

Why Kafka?

Kafka explanation with real-time scenario

Kafka Message Queue Components explanation


What is Producer and Consumer?

Broker and its importance

Controller Broker explanation and its election

Use of Zookeeper

What is an Offset?

What are Bootstrap Servers?

Installing One Node Kafka cluster locally

Introduction to KAFKA

Data storage in Brokers

Leader Copy in Kafka

Follower copy in Kafka

Consumer Groups

Data Serialization in Kafka

Module 12 - Cassandra

What is Cassandra?


Cassandra vs RDBMS

NoSQL DB Comparisons

Features of Cassandra


Replica placement Strategy

Simple Strategy

Network Topology Strategy

CQL Table

Cassandra Handson Session

Module 13 - Cloud Computing

What is cloud computing?

Different types of Cloud

Customer Definition for Cloud Computing

Business Definition for Cloud Computing

What is Public Cloud

What is Private Cloud

What is Hybrid Cloud

What are cloud services?

IaaS Service

PaaS Service

SaaS Service

Module 14 - AWS In Big Data

Why do we go for AWS?

Why AWS is the world's largest cloud provider

Storage services in AWS

What is S3 Storage?

How to upload the data in S3 Storage?

How to process the data that is present in S3 Storage?

EMR - Hadoop service in AWS

How to create an EMR cluster

How to process the data in EMR through Hive?

How to create Hive tables in EMR on S3 Storage

How to copy the data from S3 to local

How to create EC2 Instance

How to generate a Key Pair

AWS basic commands required for Big Data processing

What is Athena?

Why do we go for Athena?

Module 15 - Azure in Big Data

What is Azure?

Why Azure is the world's number one Cloud provider in terms of Security

Services Offered by Azure

How to create the Free Azure account

Data Storage Services in Azure - Azure BLOB storage

HDInsight cluster - Hadoop service in Azure

Creation of HDInsight cluster

Performing Hive analytics in Azure HDInsight Cluster

Upload the processed data into Azure Data Lake Storage

Full Stack Data Analytics Certification Course Program

You will have:

Online Live Video Sessions Weekly

Course Material

Practical Exercises


Get a 10% discount by joining through the link below:

Full Stack Data Science Boot Camp

The benefits of the course are:

 Online Video Sessions (500 Hrs.) on Weekends

One Year Internship

Personalized Mentorship

56+ Real Time Projects

Resume building

Career guidance

Interview Preparation

Job Guarantee

Get a 10% discount by joining through the link below:

Data Analytics (DA) Course:

I can provide the complete DA course material through a Google Drive link. It includes:

 a) 100 Hrs. of videos by an expert Data Analyst working at an IT MNC.

 b) Workouts data

 c) Interview Preparations

 d) IDE Software ZIP files (Cloudera, Oracle VM, Eclipse)

Topics covered in Big Data are:

 A) Hadoop

 B) Sqoop

 C) Hive

 D) Scala

 E) Spark


  All this for a very reasonable price. You can learn on your own in a systematic way.

        Call me at 7013965360.
