Labels

Tuesday, March 29, 2022

Internal vs External Tables - Key Differences in Hive (Class -18)

The main difference between an internal table and an external table in Hive:

An Internal table is also called a managed table, meaning it’s “managed” by Hive.  When you 'DROP' the Internal table, Hive will delete both the schema/definition and the Metadata and it will also physically delete the data/rows (truncation) associated with that table from the Hadoop Distributed File System (HDFS).

An External table is not “managed” by Hive. When you drop an external table, the schema/table definition is deleted and gone, but the data/rows associated with it are left alone. The table’s rows are not deleted. 

All the write operations to the Managed tables are performed using Hive SQL commands. If a Managed table or partition is dropped, the data and metadata associated with that table or partition are deleted. The transactional semantics (ACID) are also supported only on Managed tables.

The writes on External tables can be performed using Hive SQL commands but data files can also be accessed and managed by processes outside of Hive. If an External table or partition is dropped, only the metadata associated with the table or partition is deleted but the underlying data files stay intact. 


Differences

Internal Table

External Table

Owns

also known as Managed tables manages the lifecycle of the table (metadata & data).

Data here are not owned or managed by Hive. To create an External table you need to use EXTERNAL clause.

Storage

Hive by default stores the files at the data warehouse location which is located at /user/hive/warehouse

stored outside the warehouse directory.

Drop semantics

both Table data (Schema) and the Metadata will be deleted from HDFS.

only the metadata (not actual data) associated with the table will get deleted.

Load semantics

Hive moves data into the warehouse directory.

With the EXTERNAL keyword, Hive knows that it is not managing the table data, so it does not move data to its warehouse directory.

Supports

ACID/Transactional, ARCHIVE, UNARCHIVE, TRUNCATE, MERGE, CONCATENATE operations.

Not supported.

Usage

If Data is temporary and doesn’t affect businesses in real-time.

use data outside HIVE for performing a different operations such as loading and merging.

Security

exclusively responsible for the security and management of the data present in the internal table.

managed at the HDFS level as anyone having access to the HDFS file structure can access an external table.


No comments:

Post a Comment