The main difference between an internal table and an external table in Hive:
An internal table is also called a managed table, meaning it’s “managed” by Hive. When you DROP an internal table, Hive deletes the table’s metadata (its schema/definition) from the metastore and also physically deletes the table’s data files from the Hadoop Distributed File System (HDFS).
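As a rough illustration of the managed-table lifecycle described above, here is a minimal HiveQL sketch. The table name `page_views_managed` and its columns are hypothetical, and the path in the comment assumes the default warehouse location has not been changed.

```sql
-- Hypothetical table: no EXTERNAL keyword, so Hive creates a managed (internal) table.
CREATE TABLE page_views_managed (
  user_id  BIGINT,
  url      STRING,
  view_ts  TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- With default settings, the table's data directory lives under the warehouse,
-- e.g. /user/hive/warehouse/page_views_managed

-- Removes the metastore entry AND the table's data directory.
DROP TABLE page_views_managed;
```

If HDFS trash is configured, the dropped files are normally moved to the trash directory first rather than erased immediately; adding `PURGE` to the `DROP TABLE` statement skips the trash.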
An external table is not “managed” by Hive. When you drop an external table, the schema/table definition is deleted, but the data/rows associated with it are left alone.
Writes to external tables can be performed with Hive SQL commands, but the data files can also be accessed and managed by processes outside of Hive. If an external table or partition is dropped, only the metadata associated with the table or partition is deleted; the underlying data files stay intact.
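The external-table behavior can be sketched the same way. The table name, columns, and the path `/data/raw/page_views` below are all hypothetical; the sketch assumes that directory already holds comma-delimited files written by some process outside Hive.

```sql
-- Hypothetical external table over files that Hive does not own.
CREATE EXTERNAL TABLE page_views_ext (
  user_id  BIGINT,
  url      STRING,
  view_ts  TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/raw/page_views';

-- Only the metastore entry is removed; the files under
-- /data/raw/page_views are left in place.
DROP TABLE page_views_ext;
```

Running the same `CREATE EXTERNAL TABLE` statement again after the drop should make the existing files queryable once more, since they were never deleted.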
Differences | Internal Table | External Table |
Owns | Also known as a managed table. Hive owns the table and manages its whole lifecycle (metadata and data). | The data is not owned or managed by Hive. To create an external table you need to use the EXTERNAL clause. |
Storage | By default, Hive stores the files in its data warehouse location, /user/hive/warehouse. | The files are stored outside the warehouse directory. |
Drop semantics | Both the table data and the metadata (schema) are deleted. | Only the metadata associated with the table is deleted, not the actual data. |
Load semantics | Hive moves the data into the warehouse directory. | With the EXTERNAL keyword, Hive knows that it is not managing the table data, so it does not move the data to its warehouse directory. |
Supports | ACID/transactional, ARCHIVE, UNARCHIVE, TRUNCATE, MERGE, and CONCATENATE operations. | Not supported. |
Usage | Use when the data is temporary and does not affect the business in real time. | Use when the data is also used outside Hive, for example by other processes that load or merge it. |
Security | Hive is exclusively responsible for the security and management of the data in the internal table. | Security is managed at the HDFS level: anyone with access to the HDFS file structure can access the data of an external table. |
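To check which kind of table you are dealing with, or to switch a managed table to external, something like the following sketch may help. The table name `page_views` is hypothetical, and the conversion may behave differently across Hive versions, particularly for transactional tables.

```sql
-- Hypothetical table name.
DESCRIBE FORMATTED page_views;
-- In the output, "Table Type:" reads MANAGED_TABLE or EXTERNAL_TABLE,
-- and "Location:" shows where the data files live.

-- Mark a managed table as external, so that a later DROP
-- removes only the metadata and leaves the files in place.
ALTER TABLE page_views SET TBLPROPERTIES ('EXTERNAL' = 'TRUE');
```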