SK DATA SHARE

Friday, April 8, 2022

Transactional vs Analytical Data - Key Difference (Class -21)

We deal with data on a day-to-day basis, whether it is an individual's social or personal data or an organization's business data. To store and use this data, it has to be processed from time to time. This is where data processing comes into the picture, and it is of two types:

1) Transactional System Data (OLTP)

2) Analytical System Data (OLAP)


OLAP (Online Analytical Processing) provides a single platform for all types of business analysis needs, including planning, budgeting, forecasting, and analysis, while OLTP (Online Transaction Processing) is used to administer the day-to-day transactions of an organization.
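As a rough illustration of the two workloads, the sketch below uses Python's built-in sqlite3 module as a stand-in for a real system. The orders table and its columns are hypothetical, and a production OLAP platform would normally be a separate warehouse rather than the same database.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: short transactions that insert or update individual rows.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("east", 120.0))
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("west", 80.0))
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("east", 40.0))
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (100.0, 2))

# OLAP-style work: a read-only query that scans and aggregates many rows.
sales_by_region = dict(
    conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"
    ).fetchall()
)
print(sales_by_region)
```

The transactional statements touch one row at a time and must be committed safely, while the analytical query reads the whole table to produce a summary.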

Wednesday, March 30, 2022

Static vs Dynamic Partitions - Key Differences in Hive (Class -20)

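Hive has two types of partitions, static and dynamic, and both operate on the same basic principles of partitioning. They differ only in how the partition value is supplied when data is loaded: with static partitioning it is given in the statement itself, while with dynamic partitioning Hive derives it from the data in the partition column. Once the partitions are created, the table makes no distinction between static and dynamic partitions. All partitions are treated as one and the same.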
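The idea can be sketched in plain Python. This is a toy model, not Hive code: the rows, the country column, and the two loader functions are made up for illustration, and only the country=VALUE directory naming follows Hive's convention.

```python
from collections import defaultdict

rows = [
    {"id": 1, "name": "Asha", "country": "IN"},
    {"id": 2, "name": "Bob", "country": "US"},
    {"id": 3, "name": "Chen", "country": "IN"},
]

def load_static(table, partition_value, batch):
    """Static: the caller names the partition; every row in the batch goes there."""
    for row in batch:
        table[f"country={partition_value}"].append(row["id"])

def load_dynamic(table, batch, partition_col="country"):
    """Dynamic: the partition is derived from each row's partition column."""
    for row in batch:
        table[f"{partition_col}={row[partition_col]}"].append(row["id"])

static_table = defaultdict(list)
# One load per partition, with the rows pre-filtered by the caller.
load_static(static_table, "IN", [r for r in rows if r["country"] == "IN"])
load_static(static_table, "US", [r for r in rows if r["country"] == "US"])

dynamic_table = defaultdict(list)
load_dynamic(dynamic_table, rows)  # a single load; partitions are created as needed

print(dict(static_table))
print(dict(dynamic_table))
```

Both loads end with the same partition layout, which is why the table itself does not record how a partition was created.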


Tuesday, March 29, 2022

Partitioning vs Bucketing - Key Differences in Hive (Class -19)

Hive is a distributed data warehouse system that manages the data stored in HDFS (Hadoop Distributed File System) and provides a SQL-like language (HiveQL) for querying the data.

For data storage, Hive has four main components for organizing data: databases, tables, partitions and buckets.
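A minimal Python sketch of how the last two components differ: a partition is chosen by the value of a column, while a bucket is chosen by hashing a column and taking the result modulo a fixed bucket count. The column names, the bucket count of 4, and the use of the integer id as its own hash are assumptions made for this example.

```python
NUM_BUCKETS = 4  # assumed bucket count, fixed when the table is defined

def partition_dir(country):
    # Partitioning: one directory per distinct value of the partition column.
    return f"country={country}"

def bucket_index(user_id, num_buckets=NUM_BUCKETS):
    # Bucketing: hash of the bucketing column modulo the bucket count.
    # For this sketch the integer id stands in for its own hash.
    return user_id % num_buckets

rows = [(101, "IN"), (102, "US"), (105, "IN"), (204, "US")]
layout = {}
for user_id, country in rows:
    key = (partition_dir(country), bucket_index(user_id))
    layout.setdefault(key, []).append(user_id)

for (directory, bucket), ids in sorted(layout.items()):
    print(directory, "bucket", bucket, ids)
```

The number of partitions grows with the number of distinct column values, whereas the number of buckets stays fixed, which is why bucketing suits columns with many distinct values.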


Internal vs External Tables - Key Differences in Hive (Class -18)

The main difference between an internal table and an external table in Hive:

An internal table is also called a managed table, meaning it’s “managed” by Hive. When you DROP an internal table, Hive deletes the schema/table definition from its metadata and also physically deletes the data/rows associated with that table from the Hadoop Distributed File System (HDFS).

An external table is not “managed” by Hive. When you drop an external table, only the schema/table definition is deleted; the data/rows associated with it are left in place.
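The drop behaviour can be modelled with a small Python sketch. ToyMetastore is a hypothetical stand-in written for this example, not Hive's metastore API, and a temporary local directory plays the role of HDFS.

```python
import shutil
import tempfile
from pathlib import Path

class ToyMetastore:
    """Toy stand-in for a table catalog; not Hive's actual metastore API."""

    def __init__(self):
        self.tables = {}  # table name -> (data directory, is_managed)

    def create_table(self, name, location, managed):
        self.tables[name] = (Path(location), managed)

    def drop_table(self, name):
        location, managed = self.tables.pop(name)  # definition is always removed
        if managed:
            shutil.rmtree(location)  # managed table: data files are deleted too

root = Path(tempfile.mkdtemp())
for table in ("sales_managed", "sales_external"):
    (root / table).mkdir()
    (root / table / "part-00000").write_text("1,east,120\n")

catalog = ToyMetastore()
catalog.create_table("sales_managed", root / "sales_managed", managed=True)
catalog.create_table("sales_external", root / "sales_external", managed=False)

catalog.drop_table("sales_managed")
catalog.drop_table("sales_external")

managed_data_exists = (root / "sales_managed").exists()
external_data_exists = (root / "sales_external" / "part-00000").exists()
print(managed_data_exists, external_data_exists)
shutil.rmtree(root)  # clean up the temporary directory
```

After both drops the catalog is empty, but only the external table's data file is still on disk.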

Wednesday, March 23, 2022

Performance Optimization Techniques in Hive (Class -17)

Several Hive query optimization techniques are available to make Hive queries execute faster on a Hadoop cluster.

Hive provides a SQL-like query language built on the Hadoop ecosystem and can process petabytes of data.

1) Partitions

2) Bucketing

3) Tez execution

4) CBO (cost-based optimization)

5) Vectorization
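Partitions and bucketing are decided when a table is designed, while the last three items are usually switched on through configuration properties. The sketch below collects the commonly used property names and prints them as HiveQL SET statements. The helper function is hypothetical, and defaults and availability depend on the Hive version and distribution, so treat the values as a starting point rather than a recommended configuration.

```python
# Session-level settings for three of the techniques listed above.
# Defaults and availability vary by Hive version and distribution.
HIVE_SETTINGS = {
    "hive.execution.engine": "tez",               # Tez execution
    "hive.cbo.enable": "true",                    # cost-based optimization
    "hive.vectorized.execution.enabled": "true",  # vectorization
}

def render_set_statements(settings):
    """Return one HiveQL SET statement per property."""
    return [f"SET {key}={value};" for key, value in settings.items()]

statements = render_set_statements(HIVE_SETTINGS)
print("\n".join(statements))
```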