Two types of Partitions in Hive - Static and Dynamic, both operate on the same basic principles of Partitioning. Once the partitions are created, the tables won’t have any difference like static and dynamic partitions. All partitions are treated and one and the same.
1) Static Partition steps:
a) First Create a Table (say name "Student") or choose a table from /in Database.
Example: CREATE TABLE student(student_name STRING ,school_name STRING ,percentage FLOAT) partitioned by (section STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
b) Select particular "Column" (say String) which you want to do partition using "Partitioned By" statement.
c) Describe Table name ('Student' in our case) - here we see section 'Column" has been marked as the partition attribute.
d) Create Different Files (say 5) containing Students (source) Data from respective Sections.
(Note: Make sure that Section Column which is our Partition Column is never added to Actual table).
e) Load Data from Different Files (say 5) containing students Data Section-wise into main Table Name (Student) along with Partitioned attribute value.
LOAD DATA LOCAL INPATH '/home/cloudera/Documents/student_A' INTO TABLE STUDENT partition(section= "A");
f) Now go to your HDFS (/user/hive/warehouse/) and check the student table to see how the partitions are made.
Here we can easily observe that the student table is partitioned and contain data of student in accordance to their section.
g) Now operate with "WHERE" clause to check particular section data.
hive> select * from student where Section"A";
This is the way to perform static partitioning in the hive.
2) Dynamic Partition steps:
a) We need to create a non-partitioned table to store the data.
b) Create a Table
c) Load the Data in Table from any external source say it a text file.
d) Now Create a partitioned table where we want to insert the data with dynamic partition.
e) Once this table is created, we can check for the partition where the partition is done in the right way or not with the following commands:
SHOW PARTITIONS student_part;
f) Insert the data we want to insert with the partition needed:
g) With this Query, we can insert data with the dynamic partition of Table over column course.
Differences |
Static |
Dynamic |
Meaning |
Insert
input data files individually into a partition table. |
Single
insert to partition table. |
Creation |
We need
to manually create each partition before inserting data into a Partition. |
Partitions
will be created dynamically based on input data to the table. |
Commands |
Use LOAD
command to create the partition. |
INSERT
with a SELECT query we create multiple partitions in table. |
MapReduce
Job |
Will not
result in MapReduce job execution since the data is already physically
categorized/partitioned. |
Will
result in MapReduce job execution to group the data first and then partitions
will be added to the table. |
Property |
hive.mapred.mode
= strict |
set
hive.exec.dynamic.partition=true; |
Data
Load |
Saves
your time in loading data. |
Takes
more time in loading data. |
Limitations |
Use WHERE
clause to use limit in the static partition. |
There is no required WHERE clause to use
limit. |
Changes |
We can
alter the partition. |
We can’t
perform alter. |
Usage |
To Load
Small amounts of Data. |
To handle
Large amounts of Data without knowing the no. of columns. |
Examples |
Departments,
State Names, etc. |
Date,
city names, food prices, etc. |
Drawback |
when we
are loading data to some partition, we have to make sure we are putting the
right data in the right partition. |
Difficult
Implementation of Variable (Dynamic) Partitioning. |
No comments:
Post a Comment