Labels

Wednesday, March 30, 2022

Static vs Dynamic Partitions - Key Differences in Hive (Class -20)

Two types of Partitions in Hive - Static and Dynamic, both operate on the same basic principles of Partitioning. Once the partitions are created, the tables won’t have any difference like static and dynamic partitions. All partitions are treated and one and the same.


1) Static Partition steps:

a) First Create a Table (say name "Student") or choose a table from /in Database.

     Example: CREATE TABLE student(student_name STRING ,school_name STRING ,percentage FLOAT) partitioned by (section STRINGROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

b) Select particular "Column" (say String) which you want to do partition using "Partitioned By" statement.

c) Describe Table name ('Student' in our case) - here we see section 'Column" has been marked as the partition attribute.

d) Create Different Files (say 5) containing Students (source) Data from respective Sections.

      (Note: Make sure that Section Column which is our Partition Column is never added to Actual table).

e) Load Data from Different Files (say 5) containing students Data Section-wise into main Table Name (Student) along with Partitioned attribute value.

LOAD DATA LOCAL INPATH '/home/cloudera/Documents/student_A' INTO TABLE STUDENT partition(section= "A");

f) Now go to your HDFS (/user/hive/warehouse/) and check the student table to see how the partitions are made.

Here we can easily observe that the student table is partitioned and contain data of student in accordance to their section.

g) Now operate with "WHERE" clause to check particular section data.

hive> select * from student where Section"A";

This is the way to perform static partitioning in the hive.


2) Dynamic Partition steps:

a) We need to create a non-partitioned table to store the data.

b) Create a Table

c) Load the Data in Table from any external source say it a text file.

d) Now Create a partitioned table where we want to insert the data with dynamic partition.

e) Once this table is created, we can check for the partition where the partition is done in the right way or not with the following commands:

SHOW PARTITIONS student_part;

f) Insert the data we want to insert with the partition needed:

Insert into student_part  partition(course)
Select id,name,age,institute,course from  stud_demo;

g) With this Query, we can insert data with the dynamic partition of Table over column course.

Differences

Static

Dynamic

Meaning

Insert input data files individually into a partition table.

Single insert to partition table.

Creation

We need to manually create each partition before inserting data into a Partition.

Partitions will be created dynamically based on input data to the table.

Commands

Use LOAD command to create the partition.

INSERT with a SELECT query we create multiple partitions in table.

MapReduce Job

Will not result in MapReduce job execution since the data is already physically categorized/partitioned.

Will result in MapReduce job execution to group the data first and then partitions will be added to the table.

Property

hive.mapred.mode = strict

set hive.exec.dynamic.partition=true;

Data Load

Saves your time in loading data.

Takes more time in loading data.

Limitations

Use WHERE clause to use limit in the static partition.

 There is no required WHERE clause to use limit.

Changes

We can alter the partition.

We can’t perform alter.

Usage

To Load Small amounts of Data.

To handle Large amounts of Data without knowing the no. of columns.

Examples

Departments, State Names, etc.

Date, city names, food prices, etc.

Drawback

when we are loading data to some partition, we have to make sure we are putting the right data in the right partition.

Difficult Implementation of Variable (Dynamic) Partitioning.

No comments:

Post a Comment