For data storage, Hive has four main components for organizing data: databases, tables, partitions and buckets.
Partitioning is a mechanism that roughly divides the table based on the value of the partition column. Each partition in Hive corresponds to many subdirectories of the table, and all data is put into different subdirectories according to the partition column. Entire Table does not need to be scanned when searching for data in the partition, which is very helpful for improving search efficiency.
Buckets are implemented by performing hash calculations on specified columns. The data in the partition can be further divided into buckets. Unlike partitions that directly split the columns, the buckets often use the hash value of the column to break up the data and distribute it to different buckets to complete the data bucketing process. When the number of partitions is so large that it may cause the file system to crash , we need to use bucketing to solve the problem.
Differences |
Partitioning |
Bucketing |
Meaning |
Hive Partition
is a way to split the large table into smaller tables based on the values of
a column. |
Bucket
is Clustering technique to split the data into more manageable files (by
specify how many buckets you want). |
Similarity |
Divide
the Data into Multiple Parts & then Scan only One Part of it. |
improve performance by eliminating table
scans when dealing with a large set of data on HDFS. |
Columns |
You can
have one or more Partition columns. |
You can
have only one Bucketing column. |
Commands |
Uses
PARTITIONED BY |
Uses
CLUSTERED BY |
Cardinality |
partitioning
on a column with Low Cardinality (no. of Partitions = no. of Distinct Values) |
Cardinality
of a Column is high (fixed no. of Buckets) |
Based on |
Logical
Division |
Hash
Function of a Column. |
Usage |
It is
effective when the data volume in each partition is not very high. |
If some
map-side joins are involved in your queries, then bucketed tables are a good
option. |
Storage |
Partition
is a Folder. |
Bucket
is a File. |
No comments:
Post a Comment