SK DATA SHARE: Partitions in Hive (Class -9)

Apache Hive lets us organize a table into multiple partitions, each grouping together the rows that share the same value of a chosen column. Partitioning is used to distribute the load horizontally. The partitions act as smaller logical tables, but users do not see them as separate tables: they still query a single table.

Hive organizes tables into partitions, a way of dividing a table into related parts based on the values of particular columns such as date, city, and department.

Each table in Hive can have one or more partition keys that identify a particular partition. Partitioning makes it easy to run queries on slices of the data.

When a query filters on a partition key, Hive reads only the matching partitions (partition pruning). The query therefore touches a much smaller volume of data and runs faster.
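To see why pruning helps, here is a toy Python model, not Hive code: an unpartitioned table must read every row and then filter, while a table partitioned by city can jump straight to one partition. The function names and sample rows are hypothetical and only count how many rows each approach reads.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]

def full_scan(rows: List[Row], column: str, value: str) -> Tuple[List[Row], int]:
    """Unpartitioned table: every row is read, then filtered."""
    matches = [r for r in rows if r[column] == value]
    return matches, len(rows)

def pruned_scan(partitions: Dict[str, List[Row]], value: str) -> Tuple[List[Row], int]:
    """Partitioned table: only the partition for `value` is read."""
    matches = partitions.get(value, [])
    return matches, len(matches)

# Hypothetical sample data.
rows: List[Row] = [
    {"id": "1", "city": "Pune"},
    {"id": "2", "city": "Delhi"},
    {"id": "3", "city": "Pune"},
    {"id": "4", "city": "Mumbai"},
]

# Group rows by the partition key, as a table partitioned by city would.
partitions: Dict[str, List[Row]] = {}
for r in rows:
    partitions.setdefault(r["city"], []).append(r)

print(full_scan(rows, "city", "Pune")[1])  # 4 rows read
print(pruned_scan(partitions, "Pune")[1])  # 2 rows read
```

Both calls return the same two rows; the difference is only in how much data had to be read to find them.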

When you load data into a partitioned table, Hive splits the records based on the partition key and stores each partition's data in a sub-directory of the table's directory on HDFS. Each sub-directory is named after the partition key and its value, in the form key=value (for example, city=Pune).
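The naming rule can be sketched in a few lines of Python. This is an illustration, not Hive's own code: `partition_path` is a hypothetical helper, the table path under /user/hive/warehouse is an example, and Hive's escaping of special characters in partition values is not modeled.

```python
from typing import List, Tuple

def partition_path(table_dir: str, spec: List[Tuple[str, str]]) -> str:
    """Return the directory that holds one partition's data files.

    Each partition column adds one nested level named key=value, in the
    order the columns are listed in PARTITIONED BY. Escaping of special
    characters in values is not modeled here.
    """
    levels = [f"{key}={value}" for key, value in spec]
    return "/".join([table_dir.rstrip("/")] + levels)

print(partition_path("/user/hive/warehouse/sales", [("year", "2023"), ("city", "Pune")]))
# /user/hive/warehouse/sales/year=2023/city=Pune
```

With two partition keys, the second key becomes a sub-directory of the first, so every (year, city) combination gets its own leaf directory.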

There are two types of partitioning in Apache Hive:

· Static Partitioning

· Dynamic Partitioning

A) Hive Static Partitioning:

· Inserting input data files individually into a partitioned table, naming the target partition explicitly, is static partitioning.

· Static partitions are usually preferred when loading big files into Hive tables.

· Static partitioning loads data faster than dynamic partitioning.

· You “statically” add a partition to the table and move the file into that partition.

· Static partitions can be altered, for example with ALTER TABLE ADD PARTITION or ALTER TABLE DROP PARTITION.

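· Static partitioning does not need any special setting. The property often mentioned with it is hive.mapred.mode, which can be set to strict per session with SET or in hive-site.xml.

· Static partitioning also works when hive.exec.dynamic.partition.mode is strict (its default), because every partition value is given explicitly.

· When hive.mapred.mode is strict, a query on a partitioned table must have a “where” clause that filters on a partition column, which limits the partitions Hive scans.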

· You can use static partitioning on both Hive managed tables and external tables.
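The strict-mode rule about the “where” clause can be modeled as a simple check. This is a simplified Python sketch, not Hive's implementation: `check_query` and its arguments are hypothetical, and it only asks whether the WHERE clause mentions at least one partition column.

```python
from typing import Set

def check_query(partition_cols: Set[str], where_cols: Set[str], mode: str) -> None:
    """Simplified model of the strict-mode rule for partitioned tables.

    With mode == "strict", a query on a partitioned table is rejected
    unless its WHERE clause references at least one partition column.
    Any other mode (e.g. "nonstrict") lets the query through.
    """
    if mode == "strict" and partition_cols and not (partition_cols & where_cols):
        raise ValueError("strict mode: no partition filter in WHERE clause")

check_query({"city"}, {"city", "id"}, "strict")  # allowed: filters on city
check_query({"city"}, {"id"}, "nonstrict")       # allowed: not strict
try:
    check_query({"city"}, {"id"}, "strict")      # rejected: would scan every partition
except ValueError as err:
    print(err)
```

The point of the rule is protective: it stops an accidental full scan of every partition of a large table.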

B) Hive Dynamic Partitioning:

· A single insert into a partitioned table that creates the partitions from the partition-column values of each row is known as dynamic partitioning.

· Dynamic partitioning usually loads data from a non-partitioned table.

· Dynamic partitioning takes longer to load data than static partitioning.

· Dynamic partitioning is suitable when a large amount of data is already stored in a table.

· It is also suitable when you know which column to partition on but not how many partitions (distinct values) it will produce.

· With dynamic partitioning, no “where” clause is needed to restrict the rows that go into each partition.

· Unlike static partitioning, you do not create the partitions with ALTER TABLE; Hive creates them during the insert.

· You can use dynamic partitioning on both Hive external tables and managed tables.

· To make all partition columns dynamic, set hive.exec.dynamic.partition = true and hive.exec.dynamic.partition.mode = nonstrict.
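The behavior described in this list can be sketched as a small Python model. It is an illustration under stated assumptions, not Hive's engine: `dynamic_insert`, the staging rows, and the "strict"/"nonstrict" strings are hypothetical stand-ins for hive.exec.dynamic.partition.mode, and details such as static columns having to precede dynamic ones are left out.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]

def dynamic_insert(
    rows: List[Row],
    partition_cols: List[str],
    static_values: Dict[str, str],
    mode: str = "strict",
) -> Dict[Tuple[str, ...], List[Row]]:
    """Route each source row to a partition taken from its own values.

    Partition columns in `static_values` get a fixed value for every row;
    the rest are dynamic and read from each row. In "strict" mode at least
    one partition column must be static, as with
    hive.exec.dynamic.partition.mode=strict.
    """
    if mode == "strict" and not static_values:
        raise ValueError("strict mode: at least one static partition column is required")
    out: Dict[Tuple[str, ...], List[Row]] = {}
    for row in rows:
        key = tuple(static_values[c] if c in static_values else row[c] for c in partition_cols)
        # Partition values live in the directory name, not in the data file.
        data = {k: v for k, v in row.items() if k not in partition_cols}
        out.setdefault(key, []).append(data)
    return out

# Hypothetical non-partitioned staging data.
staging: List[Row] = [
    {"id": "1", "name": "A", "city": "Pune"},
    {"id": "2", "name": "B", "city": "Delhi"},
    {"id": "3", "name": "C", "city": "Pune"},
]

# One insert creates every city partition; no WHERE filter per partition.
parts = dynamic_insert(staging, ["city"], {}, mode="nonstrict")
print(sorted(parts))  # [('Delhi',), ('Pune',)]
```

Note that the number of partitions is discovered from the data, which is why dynamic partitioning fits the case where you do not know in advance how many partitions there will be.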

a) Static Partition using Insert:

Static Partition with example:

Static Partition using Insert:

Real projects typically populate static partitions with INSERT rather than with LOAD alone.
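A static insert names its partition up front, so the “where” filter is what keeps the other rows out. The Python sketch below models that idea; the `static_insert` helper and the `sales`/`staging` names in its docstring are hypothetical, and appending mirrors INSERT INTO (INSERT OVERWRITE would replace the partition instead).

```python
from typing import Dict, List

Row = Dict[str, str]

def static_insert(table: Dict[str, List[Row]], source: List[Row],
                  part_col: str, part_value: str) -> None:
    """Append rows to one named partition, like
    INSERT INTO sales PARTITION (city='Pune') SELECT id, name FROM staging WHERE city='Pune'.

    The caller names the partition. Without the WHERE filter, every
    selected row would land in that single partition.
    """
    selected = [r for r in source if r[part_col] == part_value]  # the WHERE clause
    # The partition value is encoded in the directory name, so drop the column.
    rows = [{k: v for k, v in r.items() if k != part_col} for r in selected]
    table.setdefault(part_value, []).extend(rows)

# Hypothetical staging data.
staging: List[Row] = [
    {"id": "1", "name": "A", "city": "Pune"},
    {"id": "2", "name": "B", "city": "Delhi"},
    {"id": "3", "name": "C", "city": "Pune"},
]

sales: Dict[str, List[Row]] = {}
static_insert(sales, staging, "city", "Pune")   # one insert per partition
static_insert(sales, staging, "city", "Delhi")
print(sales["Pune"])  # [{'id': '1', 'name': 'A'}, {'id': '3', 'name': 'C'}]
```

Compared with the dynamic case, you need one insert per partition value, which is manageable when the set of partitions is small and known.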

b) Dynamic Partition:

Steps included are:

Unlike static partitioning, no ‘where’ condition is needed.

Create a parent table, or use an existing one:

To create a partitioned Hive table, use the PARTITIONED BY clause with the column you want to partition on and its type. A partition column is declared only in PARTITIONED BY, not in the regular column list. Let’s create a table and load the CSV file.
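To picture what loading the CSV produces on disk, here is a Python sketch that uses a local temporary directory as a stand-in for HDFS. The helper `load_csv_partitioned`, the file name data.csv, and the sample CSV are hypothetical; the sketch only shows the layout (one key=value directory per partition value, with the partition column left out of the data files).

```python
import csv
import io
import os
import tempfile
from typing import Dict, List

def load_csv_partitioned(csv_text: str, table_dir: str, part_col: str) -> List[str]:
    """Write CSV rows into key=value sub-directories under table_dir.

    A local-filesystem stand-in for HDFS: the partition column becomes the
    directory name and is removed from the data file, just as it is
    declared only in PARTITIONED BY. Returns the created directories, sorted.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    data_cols = [c for c in (reader.fieldnames or []) if c != part_col]
    groups: Dict[str, List[List[str]]] = {}
    for row in reader:
        groups.setdefault(row[part_col], []).append([row[c] for c in data_cols])
    created = []
    for value, rows in groups.items():
        part_dir = os.path.join(table_dir, f"{part_col}={value}")
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "data.csv"), "w", newline="") as f:
            csv.writer(f).writerows(rows)
        created.append(part_dir)
    return sorted(created)

# Hypothetical CSV with a header row.
csv_text = "id,name,city\n1,A,Pune\n2,B,Delhi\n3,C,Pune\n"
with tempfile.TemporaryDirectory() as d:
    dirs = load_csv_partitioned(csv_text, os.path.join(d, "sales"), "city")
    print([os.path.basename(p) for p in dirs])  # ['city=Delhi', 'city=Pune']
```

Each data file holds only id and name; the city value is recovered from the directory name when the table is queried.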