The main difference between an internal table and an external table in Hive:
An internal table is also called a managed table because Hive manages it. When you DROP an internal table, Hive deletes the table definition (schema and metadata) from the metastore and also physically deletes the table's data files from the Hadoop Distributed File System (HDFS).
An external table is not managed by Hive. When you drop an external table, the table definition is deleted, but the data files associated with it are left untouched.
Writes to external tables can be performed with Hive SQL commands, but the data files can also be accessed and managed by processes outside Hive. If an external table or partition is dropped, only the metadata associated with that table or partition is deleted; the underlying data files stay intact.
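The drop behavior described above can be sketched in HiveQL. This is a minimal illustration, not a definitive setup: the table names, columns, and the `/data/landing/sales` path are hypothetical, and the exact fate of dropped files (for example, whether they are moved to a trash directory first) depends on the cluster configuration.

```sql
-- Managed (internal) table: Hive owns both the metadata and the data files.
-- Table name and columns are hypothetical.
CREATE TABLE sales_managed (
  id     INT,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- External table: Hive records only the metadata and points at an
-- existing directory. The LOCATION path is a hypothetical example.
CREATE EXTERNAL TABLE sales_external (
  id     INT,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/landing/sales';

-- Dropping the managed table removes the metadata and the data files.
DROP TABLE sales_managed;

-- Dropping the external table removes only the metadata;
-- the files under /data/landing/sales remain in HDFS.
DROP TABLE sales_external;
```

After the second `DROP`, re-running the same `CREATE EXTERNAL TABLE` statement would make the existing files queryable again, which is one reason external tables suit data shared with other tools.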
| Differences | Internal Table | External Table |
| --- | --- | --- |
| Ownership | Also known as a managed table. Hive manages the full lifecycle of the table (metadata and data). | The data is not owned or managed by Hive. To create an external table, use the EXTERNAL clause. |
| Storage | By default, Hive stores the files in the data warehouse location, /user/hive/warehouse. | The data is typically stored outside the warehouse directory. |
| Drop semantics | Both the table data and the metadata are deleted. | Only the metadata associated with the table is deleted, not the actual data. |
| Load semantics | Hive moves data into the warehouse directory. | With the EXTERNAL keyword, Hive knows that it is not managing the table data, so it does not move data to its warehouse directory. |
| Supports | ACID/transactional, ARCHIVE, UNARCHIVE, TRUNCATE, MERGE, and CONCATENATE operations. | Not supported. |
| Usage | Use when the data is temporary and does not affect the business in real time. | Use when the data is also used outside Hive, for example by other processes that load or merge files. |
| Security | Hive is exclusively responsible for the security and management of the data in the table. | Managed at the HDFS level: anyone with access to the HDFS file structure can access the table's data. |
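The storage and load rows of the comparison can be checked with a short HiveQL sketch. The table names and HDFS paths below are hypothetical, and the example assumes the files already exist at those paths.

```sql
-- Managed table: with no LOCATION clause, the data lives under the
-- warehouse directory. Table name and columns are hypothetical.
CREATE TABLE events_managed (
  id      INT,
  payload STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- LOAD DATA INPATH moves (does not copy) the HDFS file into the
-- table's directory. The source path is a hypothetical example.
LOAD DATA INPATH '/data/incoming/events.csv' INTO TABLE events_managed;

-- External table: the files stay where they are; Hive only reads them.
CREATE EXTERNAL TABLE events_external (
  id      INT,
  payload STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/shared/events';

-- The "Table Type" field in the output shows MANAGED_TABLE or
-- EXTERNAL_TABLE, and "Location" shows where the data lives.
DESCRIBE FORMATTED events_managed;
DESCRIBE FORMATTED events_external;
```

`DESCRIBE FORMATTED` is a quick way to confirm which kind of table you are about to drop before running a `DROP TABLE`.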
