Labels

Thursday, March 17, 2022

How to create a Table in Hive? (Class -8)

 A table in Hive is a set of data that uses a schema to sort the data by given identifiers.

The way of creating tables in the hive is very much similar to the way we create tables in SQL. 

We can perform the various operations with these tables like Joins, Filtering, etc. 


 Create a Database in Hive:

The default location where the database is stored on HDFS is /user/hive/warehouse. 




Config file of Hive is hive-site.xml

Type “vi hive-site.xml



Go to Windows tab in Cloudera, Click on HUE.

Sign In into Hue as

Username: cloudera

Password: cloudera

Go to File Browser (Hue) – Cloudera Windows.



Click on ‘user’.


Click on ‘Hive’.


Click on ‘warehouse’.


See the database which we have created.


Use Database:


See there are no tables inside.


Create a Table inside database:

Since this is warehouse, you cannot create data inside table but can load the data from hdfs using delimiter.

hive> create table t1(id int,name string,salary int) row format delimited fields terminated by ‘,’;

Take one data in Cloudera insert into Hive.



Type ‘show tables’ to check whether the table is created or not.


But no data inside table here:


We can only load data into the Hive.

See the table t1 in database, but we don’t have a data till now.


hive> load data local inpath ‘/home/cloudera/employee_data.txt’ into table t1;

Data loading will happen in above command.


Hive> select * from t1;

Hive> describe formatted t1;

Two Types of Tables in Hive:

  • Internal table
  • External table
A) Internal Table:

The internal tables are also called managed tables as the lifecycle of their data is controlled by the Hive. By default, these tables are stored in a subdirectory under the directory defined by hive.metastore.warehouse.dir (i.e. /user/hive/warehouse). The internal tables are not flexible enough to share with other tools like Pig. If we try to drop the internal table, Hive deletes both table schema and data.
  • Let's create an internal table by using the following command:-
Create a table b10_internal:


  • Let's see the metadata of the created table by using the following command:-




Data in the Edge node:



Load Data:


Whatever process we do in Hive, this will reflect on HDFS HUE.

Check in Hue:


B) External Table:

All production tables are external.

The external table allows us to create and access a table and a data externally. The external keyword is used to specify the external table, whereas the location keyword is used to determine the location of loaded data.

As the table is external, the data is not present in the Hive directory. Therefore, if we try to drop the table, the metadata of the table will be deleted, but the data still exists.

To create an external table, follow the below steps: -

  • Let's create a directory on HDFS by using the following command: -
  • hdfs dfs -mkdir /HiveDirectory 
  • Now, store the file on the created directory.
  • hdfs dfs -put hive/emp_details /HiveDirectory
  • Let's create an external table using the following command: -

       Create External table à b10_external

Hive> describe formatted b10_external;


  • Now, we can use the following command to retrieve the data: -




No comments:

Post a Comment