Apache Hive lets us organize a table into multiple partitions, grouping the same kind of data together. Partitioning is used to distribute the load horizontally. The partitions behave like smaller logical tables, but users do not see them as separate tables; they still access the data through a single table.
Hive organizes tables into partitions: a way of dividing a table into related parts based on the values of particular columns such as date, city, and department.
Each Hive table can have one or more partition keys that identify a particular partition. Partitions make it easy to query slices of the data.
Partitioning in Hive distributes execution load horizontally.
A query that filters on the partition key reads only the matching partitions, so it scans a lower volume of data and runs faster.
When you load data into a partitioned table, Hive internally splits the records by partition key and stores each partition's data in a sub-directory of the table's directory on HDFS. The directory is named after the partition key and its value (for example, department=sales).
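As a rough illustration of the layout described above, the sketch below creates a partitioned table and queries one slice of it. The table name `emp_part`, its columns, and the `<warehouse-dir>` placeholder are assumptions made for this example; they do not come from the original post.

```sql
-- Hypothetical table; names and columns are illustrative only.
CREATE TABLE IF NOT EXISTS emp_part (
  id     INT,
  name   STRING,
  salary DOUBLE
)
-- The partition key is declared here, not in the column list above.
PARTITIONED BY (department STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Once data is loaded, each partition is a sub-directory of the table directory:
--   <warehouse-dir>/emp_part/department=sales/
--   <warehouse-dir>/emp_part/department=hr/

-- List the partitions Hive currently knows about.
SHOW PARTITIONS emp_part;

-- Filtering on the partition key lets Hive read only the matching sub-directory.
SELECT id, name, salary
FROM emp_part
WHERE department = 'sales';
```

The partition column can be selected and filtered like any other column, even though its value is stored in the directory name rather than in the data files.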
Hive supports two types of partitioning:
· Static Partitioning
· Dynamic Partitioning
A) Hive Static Partitioning:
· Inserting input data files individually into a partition table is static partitioning.
· Static partitions are usually preferred when loading files (especially big files) into Hive tables.
· Static partitioning saves time when loading data compared to dynamic partitioning.
· You “statically” add a partition to the table and move the file into that partition.
· You can alter a partition in static partitioning.
· Static partitioning works in strict mode (hive.exec.dynamic.partition.mode=strict), so that property does not need to be changed to use it.
· Separately, setting hive.mapred.mode=strict makes Hive reject queries on a partitioned table that do not filter on a partition column. This property can be set in hive-site.xml, and its default value depends on the Hive version.
· In strict mode, use a “where” clause on the partition column to limit the data a query reads.
· You can perform static partitioning on a Hive managed table or an external table.
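The points above can be sketched in HiveQL as follows. This is only an illustration: the tables `emp_part` and `emp_staging`, their columns, and the file path `/tmp/emp_sales.csv` are hypothetical, and the `LOAD DATA` statement will only succeed if such a file actually exists.

```sql
-- Hypothetical tables used only for this sketch.
CREATE TABLE IF NOT EXISTS emp_staging (
  id INT, name STRING, salary DOUBLE, department STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

CREATE TABLE IF NOT EXISTS emp_part (
  id INT, name STRING, salary DOUBLE
)
PARTITIONED BY (department STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load one input file into one explicitly named partition.
-- The path is an assumption; the file should contain only id,name,salary.
LOAD DATA LOCAL INPATH '/tmp/emp_sales.csv'
INTO TABLE emp_part
PARTITION (department = 'sales');

-- Statically add a partition ahead of time.
ALTER TABLE emp_part ADD IF NOT EXISTS PARTITION (department = 'hr');

-- Static insert from another table: the partition value is fixed in the
-- PARTITION clause, and the WHERE clause picks the rows that belong to it.
INSERT INTO TABLE emp_part PARTITION (department = 'hr')
SELECT id, name, salary
FROM emp_staging
WHERE department = 'hr';

-- A statically managed partition can be altered, for example renamed.
ALTER TABLE emp_part PARTITION (department = 'hr')
RENAME TO PARTITION (department = 'human_resources');
```

Note that `LOAD DATA` moves or copies the file as-is into the partition directory; Hive does not check that the rows really belong to the named partition.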
B) Hive Dynamic Partitioning:
· A single insert into the partition table that creates the partitions is known as dynamic partitioning.
· Usually, dynamic partitioning loads the data from a non-partitioned table.
· Dynamic partitioning takes more time to load data than static partitioning.
· Dynamic partitioning is suitable when you have a large amount of data already stored in a table.
· It is also suitable when you do not know in advance how many partitions the data will produce.
· Dynamic partitioning does not require a “where” clause to limit the data.
· You do not add partitions beforehand with an alter statement; Hive creates them during the insert.
· You can perform dynamic partitioning on a Hive external table or a managed table.
· To use dynamic partitioning for all partition columns, the mode must be non-strict (hive.exec.dynamic.partition.mode=nonstrict).
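A minimal sketch of the single insert described above, assuming a hypothetical non-partitioned table `emp_staging` and a partitioned table `emp_part` (both names and their columns are invented for illustration):

```sql
-- Hypothetical source (non-partitioned) and target (partitioned) tables.
CREATE TABLE IF NOT EXISTS emp_staging (
  id INT, name STRING, salary DOUBLE, department STRING
);

CREATE TABLE IF NOT EXISTS emp_part (
  id INT, name STRING, salary DOUBLE
)
PARTITIONED BY (department STRING);

-- Allow dynamic partitions, with every partition column dynamic.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- One insert, no WHERE clause: Hive creates a partition for each distinct
-- department value. The partition column comes last in the SELECT list.
INSERT INTO TABLE emp_part PARTITION (department)
SELECT id, name, salary, department
FROM emp_staging;
```

In strict mode the same statement would be rejected, because strict mode expects at least one partition column to be given a static value.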
Static Partition with example:
Steps included are:
· Create a parent table or use an already existing one.
· Create the partition table with the PARTITIONED BY clause, giving the column you want to partition on and its type. Let’s create a table and load the CSV file.
· With dynamic partitioning there is no need to use a “where” condition as in static partitioning.
· Enable dynamic partitioning by using the following command:
hive> set hive.exec.dynamic.partition.mode=nonstrict;
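The preparation steps above might look like the following sketch. The table names, columns, and the CSV path `/tmp/employees.csv` are assumptions for illustration. The property `hive.exec.dynamic.partition` is already on by default in many Hive versions, so setting it explicitly may be redundant.

```sql
-- 1. Parent (non-partitioned) table whose columns match the CSV layout.
CREATE TABLE IF NOT EXISTS emp_staging (
  id INT, name STRING, salary DOUBLE, department STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- 2. Load the CSV file into the parent table (hypothetical local path).
LOAD DATA LOCAL INPATH '/tmp/employees.csv'
INTO TABLE emp_staging;

-- 3. Partition table: the partition column and its type go in PARTITIONED BY.
CREATE TABLE IF NOT EXISTS emp_part (
  id INT, name STRING, salary DOUBLE
)
PARTITIONED BY (department STRING);

-- 4. Enable dynamic partitioning for the session.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition table is now ready for an INSERT ... SELECT from emp_staging.
```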
Sub-Partitions
Create a Sub-Partition table:
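A sub-partitioned table is one with more than one partition column. The sketch below assumes a hypothetical table `emp_subpart` partitioned first by `city` and then by `department`; the names are illustrative only.

```sql
-- Hypothetical table with two partition levels: city, then department.
CREATE TABLE IF NOT EXISTS emp_subpart (
  id     INT,
  name   STRING,
  salary DOUBLE
)
PARTITIONED BY (city STRING, department STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- The partition columns are listed after the regular columns.
DESCRIBE emp_subpart;
```

The order of the columns in `PARTITIONED BY` determines the nesting order of the partition directories.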
Show Tables:
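To confirm that the table exists, something like the following can be run. The `emp*` pattern assumes the hypothetical table names used for illustration here.

```sql
-- List all tables in the current database.
SHOW TABLES;

-- List only tables whose names match a wildcard pattern.
SHOW TABLES 'emp*';
```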
Insert data into the Sub-Partition table:
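A sketch of inserting into a two-level partition table, assuming a hypothetical source table `emp_city_staging` and target table `emp_subpart`. The city value 'Pune' is just an example value.

```sql
-- Hypothetical source table that carries both partition columns as data.
CREATE TABLE IF NOT EXISTS emp_city_staging (
  id INT, name STRING, salary DOUBLE, city STRING, department STRING
);

-- Hypothetical target table partitioned by city, then department.
CREATE TABLE IF NOT EXISTS emp_subpart (
  id INT, name STRING, salary DOUBLE
)
PARTITIONED BY (city STRING, department STRING);

SET hive.exec.dynamic.partition=true;

-- Mixed insert: city is static, department is dynamic. The static column
-- must come before the dynamic one, and this form is accepted in strict mode.
INSERT INTO TABLE emp_subpart PARTITION (city = 'Pune', department)
SELECT id, name, salary, department
FROM emp_city_staging
WHERE city = 'Pune';

-- Fully dynamic insert: both partition columns come from the data, in the
-- same order as in the PARTITION clause, at the end of the SELECT list.
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT INTO TABLE emp_subpart PARTITION (city, department)
SELECT id, name, salary, city, department
FROM emp_city_staging;
```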
Check in Cloudera Hue:
Each partition is created as a folder.
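Besides browsing in Hue, the folders can also be inspected from the Hive shell. This sketch assumes the hypothetical `emp_subpart` table and the common default warehouse location `/user/hive/warehouse`; the actual path on a given cluster may differ and is shown in the Location field of `DESCRIBE FORMATTED`.

```sql
-- Hypothetical two-level partition table.
CREATE TABLE IF NOT EXISTS emp_subpart (
  id INT, name STRING, salary DOUBLE
)
PARTITIONED BY (city STRING, department STRING);

-- Each row of output corresponds to one folder path, such as
--   city=Pune/department=sales
SHOW PARTITIONS emp_subpart;

-- Restrict the listing to the sub-partitions of one city.
SHOW PARTITIONS emp_subpart PARTITION (city = 'Pune');

-- Shows, among other details, the table's Location on HDFS.
DESCRIBE FORMATTED emp_subpart;

-- List the partition folders directly (the path is an assumption).
dfs -ls /user/hive/warehouse/emp_subpart;
```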