Big Data refers to technologies that handle huge amounts of data, which come in 3 types: Structured, Semi-structured & Unstructured.
Data is processed into information, and its size can be measured in Kilobytes (KB) / Megabytes (MB) / Gigabytes (GB) / Terabytes (TB) / Petabytes (PB) / Exabytes (EB) / Zettabytes (ZB) / Yottabytes (YB).
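A huge volume of data is generated worldwide every day. Published estimates vary by source and year, but they fall in the exabyte range, far below a yottabyte.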
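To make the unit ladder concrete, here is a small sketch that converts a raw byte count into the largest fitting unit. It assumes decimal (SI) units, where each step is a factor of 1000; the function name `human_readable` is made up for this example.

```python
# Decimal (SI) storage units, smallest to largest; each step is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: float) -> str:
    """Format a byte count using the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the last unit available.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


print(human_readable(1_500))       # 1.5 KB
print(human_readable(2 * 10**15))  # 2.0 PB
```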
Data Science work is usually divided among:
A) Data Engineers
B) Data Analysts
C) Data Scientists
DATA TYPES | EXAMPLES
Structured | Tables [Rows & Columns]
Semi-Structured | JSON, XML, CSV [No Schema]
Unstructured | Twitter logs, WhatsApp chats, Audio files, Video files
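The three data types in the table above can be illustrated with a short sketch. The sample names, cities and chat text are invented for this example and do not come from any real dataset.

```python
import json

# Structured: fixed columns, and every row has the same shape (like a table).
columns = ("id", "name", "city")
table = [(1, "Asha", "Pune"), (2, "Ravi", "Delhi")]

# Semi-structured: each record names its own fields, and fields can vary per record.
json_text = '[{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ravi", "phones": ["555-0100"]}]'
records = json.loads(json_text)

# Unstructured: free text with no fields at all; any structure has to be derived.
chat = "hey, are we still meeting at 5? bring the slides"
word_count = len(chat.split())

print(all(len(row) == len(columns) for row in table))  # True
print(sorted(records[1].keys()))                       # ['id', 'name', 'phones']
print(word_count)                                      # 10
```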
Company input data is stored in the Cloud, an RDBMS server, etc. This is called the "Source Layer".
SQL (Structured Query Language) is used to process the structured table data held in an RDBMS server.
DATA STORAGE | SOURCE LAYER
MySQL RDBMS Server | Official Website Company Signup Registration
CSV file | Telephone File
JSON | Facebook social media
XML | Twitter
An RDBMS works on Process Locality: the data is brought to the program that processes it, which suits small amounts of data. It is server-based storage.
An RDBMS server is used only for structured data, typically for OLTP [Online Transaction Processing], which allows you to Insert, Update & Delete data [DML operations].
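A minimal sketch of the Insert, Update and Delete operations, using Python's built-in `sqlite3` module with an in-memory database as a stand-in for an RDBMS server. The `signup` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for an RDBMS server (an assumption
# for this sketch; a production source layer would use MySQL or similar).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signup (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# INSERT: add two rows.
conn.executemany(
    "INSERT INTO signup (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi")],
)

# UPDATE: change one column of one row.
conn.execute("UPDATE signup SET city = ? WHERE id = ?", ("Mumbai", 1))

# DELETE: remove one row.
conn.execute("DELETE FROM signup WHERE id = ?", (2,))
conn.commit()

# SELECT: read back what is left.
remaining = conn.execute("SELECT id, name, city FROM signup ORDER BY id").fetchall()
print(remaining)  # [(1, 'Asha', 'Mumbai')]
conn.close()
```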
Hadoop is a framework that can store and process Structured, Semi-structured and Unstructured data, because HDFS stores files of any format. Spark can likewise process all three types, and Spark ML adds machine learning on top of it.
Frameworks which follow a Master-Slave Architecture for distributing data are:
HADOOP
HIVE (runs on top of Hadoop)
SPARK
PIG (runs on top of Hadoop)
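The master-slave idea can be sketched on a single machine: a master splits the data into blocks, workers process the blocks in parallel, and the master merges the partial results. This is only a toy illustration using Python threads, not how Hadoop or Spark are implemented; the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(block: list[str]) -> Counter:
    """Worker task: count the words in one block of lines."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def master_word_count(lines: list[str], num_workers: int = 2) -> Counter:
    """Master: split lines into blocks, hand them to workers, merge the results."""
    block_size = max(1, -(-len(lines) // num_workers))  # ceiling division
    blocks = [lines[i:i + block_size] for i in range(0, len(lines), block_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for partial in pool.map(count_words, blocks):
            total.update(partial)
    return total


lines = ["big data big", "data lake", "big table"]
print(master_word_count(lines)["big"])  # 3
```

In a real cluster the blocks live on different machines and the work is sent to where the data is, but the split, process and merge pattern is the same.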
Data in a Data Warehouse is of 3 types:
a) Hot Data (frequently used)
b) Warm Data (less frequently used)
c) Cold Data (historical data)
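One way to picture the hot, warm and cold split is to classify data by how recently it was accessed. The thresholds below (30 days and 365 days) are assumptions chosen for this sketch; real systems set their own cut-offs.

```python
from datetime import date


def storage_tier(last_accessed: date, today: date,
                 hot_days: int = 30, warm_days: int = 365) -> str:
    """Classify data as hot, warm or cold from its last access date.

    hot_days and warm_days are illustrative thresholds, not a standard.
    """
    age_in_days = (today - last_accessed).days
    if age_in_days <= hot_days:
        return "hot"
    if age_in_days <= warm_days:
        return "warm"
    return "cold"


today = date(2024, 6, 30)
print(storage_tier(date(2024, 6, 20), today))  # hot
print(storage_tier(date(2024, 1, 15), today))  # warm
print(storage_tier(date(2020, 3, 1), today))   # cold
```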
3 Phases of Big Data:
a) Ingestion Phase [Source layer, SQOOP]
b) Enrichment Phase [HADOOP]
c) Extraction Phase [HBase]
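The three phases can be sketched as three plain functions chained together. This does not use Sqoop, Hadoop or HBase; it only mirrors the flow of ingest, enrich and extract on a small in-memory list. All function names and sample records are hypothetical.

```python
def ingest(source_rows):
    """Ingestion: copy raw records from the source layer unchanged."""
    return list(source_rows)


def enrich(raw_rows):
    """Enrichment: clean and standardize the raw records."""
    cleaned = []
    for row in raw_rows:
        name = row.get("name", "").strip()
        if not name:
            continue  # drop records that have no usable name
        city = row.get("city", "").strip().title() or "Unknown"
        cleaned.append({"name": name.title(), "city": city})
    return cleaned


def extract(enriched_rows, city):
    """Extraction: serve only the subset a consumer asks for."""
    return [row["name"] for row in enriched_rows if row["city"] == city]


source = [
    {"name": " asha ", "city": "pune"},
    {"name": "", "city": "delhi"},
    {"name": "ravi"},
]
enriched = enrich(ingest(source))
print(enriched)                   # [{'name': 'Asha', 'city': 'Pune'}, {'name': 'Ravi', 'city': 'Unknown'}]
print(extract(enriched, "Pune"))  # ['Asha']
```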
3 Layers of Big Data processing:
a) Source Layer
b) Landing Layer
c) Presentation Layer
There are 5 Vs of Big Data:
a) Volume
b) Velocity
c) Variety
d) Veracity
e) Value