
Sunday, March 20, 2022

Hive Metastore (Class -11)

The metastore is the central repository where Hive stores all its schemas (column names, data types, and so on). It keeps the metadata for Hive tables (such as their schema and location) and partitions in a relational database, and it gives clients access to this information through the metastore service API.

By default, Hive uses an embedded Derby database for the metastore.

When you run a Hive query with the default Derby database, a new sub-directory named metastore_db appears in your current directory.

This location comes from the javax.jdo.option.ConnectionURL property, whose default value is jdbc:derby:;databaseName=metastore_db;create=true. This value specifies that the embedded Derby database is used as the Hive metastore and that the metastore is stored in metastore_db.
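As a small illustration of how that default connection string breaks down, the Python sketch below splits it into its parts. The helper name parse_derby_url is hypothetical and exists only for this example; it is not part of Hive or Derby.

```python
def parse_derby_url(url):
    """Split an embedded-Derby JDBC URL into (database path, attributes)."""
    prefix = "jdbc:derby:"
    if not url.startswith(prefix):
        raise ValueError("not a Derby JDBC URL: " + url)
    # Everything after the prefix has the form "<path>;key=value;key=value".
    path, *pairs = url[len(prefix):].split(";")
    attrs = dict(pair.split("=", 1) for pair in pairs if pair)
    return path, attrs


default_url = "jdbc:derby:;databaseName=metastore_db;create=true"
path, attrs = parse_derby_url(default_url)
print(repr(path))  # the path part is empty; the attribute names the database
print(attrs)
```

Because databaseName is a relative path, Derby creates metastore_db under whatever directory Hive was started from, which is why the sub-directory shows up in the current directory.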

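We can also configure the directory where Hive stores table data. By default, the warehouse location is /user/hive/warehouse (the hive.metastore.warehouse.dir property); it is resolved against the default file system, so it becomes file:///user/hive/warehouse when Hive runs on the local file system. The same hive-site.xml file is used to configure a local or remote metastore.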

MySQL server → Metastore (Hive table schema details): this is the setup most projects use.


Hive metastore consists of two fundamental units:

1. A service that provides metastore access to other Apache Hive services.

2. Disk storage for the Hive metadata, which is separate from HDFS storage.

 

There are three modes for Hive Metastore deployment:

· Embedded Metastore

· Local Metastore

· Remote Metastore



i. Embedded Metastore:

Embedded mode is the default metastore deployment mode for the Cloudera distribution. In this mode, the metastore uses a Derby database, and both the database and the metastore service are embedded in the main HiveServer process, i.e. they run in the same JVM as the Hive service.

This mode supports only one active user at a time, i.e. only one Hive session can be open at a time. Note that this mode is not certified for production use.

ii. Local Metastore:

Hive is a data-warehousing framework, so a single-session limit is impractical. Local Metastore mode was introduced to overcome this limitation of the Embedded Metastore. It allows many Hive sessions at once, i.e. many users can use the metastore at the same time.

In this mode the metastore service still runs in the same JVM as the Hive service, but it communicates over JDBC with a metastore database that runs in a separate process, either on the same host or on a remote one.

MySQL is a popular choice for the standalone metastore. In this case, the javax.jdo.option.ConnectionURL property is set to jdbc:mysql://host/dbname?createDatabaseIfNotExist=true, and javax.jdo.option.ConnectionDriverName is set to com.mysql.jdbc.Driver. The JDBC driver JAR file for MySQL (Connector/J) must be on Hive’s classpath, which is achieved by placing it in Hive’s lib directory.
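To make the MySQL settings concrete, here is a minimal Python sketch that renders them in the configuration/property/name/value layout that hive-site.xml uses. The function name render_hive_site is hypothetical, and "host" and "dbname" are the placeholders from the text rather than real values.

```python
import xml.etree.ElementTree as ET


def render_hive_site(properties):
    """Render a dict of Hive properties as hive-site.xml text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# "host" and "dbname" are placeholders; replace them for a real deployment.
mysql_metastore = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://host/dbname?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
}
xml_text = render_hive_site(mysql_metastore)
print(xml_text)
```

A real setup would normally also set the database login through javax.jdo.option.ConnectionUserName and javax.jdo.option.ConnectionPassword; they are left out here because the text does not give values for them.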

iii. Remote Metastore:

In Remote mode, the Hive Metastore service runs in its own JVM process. HiveServer2, HCatalog, Impala, and other processes communicate with it using the Thrift network API (configured using the hive.metastore.uris property). The metastore service communicates with the metastore database over JDBC (configured using the javax.jdo.option.ConnectionURL property). The database, the HiveServer process, and the metastore service can all be on the same host, but running the HiveServer process on a separate host provides better availability and scalability.

This also improves manageability and security, because the database tier can be completely firewalled off and there is no longer any need to share database credentials with each Hive user to access the metastore database.
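The three modes can be told apart from the configuration alone, and the Python sketch below shows one way to do it. Hive treats a non-empty hive.metastore.uris as remote mode; the embedded-versus-local test on the JDBC URL is a heuristic assumed for this example, and the function name metastore_mode and the host name metastore-host are hypothetical.

```python
DEFAULT_URL = "jdbc:derby:;databaseName=metastore_db;create=true"


def metastore_mode(conf):
    """Guess the metastore deployment mode from hive-site.xml properties."""
    # A non-empty hive.metastore.uris means clients reach a separate
    # metastore service over Thrift: remote mode.
    if conf.get("hive.metastore.uris", "").strip():
        return "remote"
    url = conf.get("javax.jdo.option.ConnectionURL", DEFAULT_URL)
    # Heuristic (assumption): an embedded Derby URL has no "//host" part.
    if url.startswith("jdbc:derby:") and not url.startswith("jdbc:derby://"):
        return "embedded"
    # Any other JDBC URL points at an external database: local mode.
    return "local"


print(metastore_mode({}))
print(metastore_mode({
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://host/dbname?createDatabaseIfNotExist=true",
}))
print(metastore_mode({"hive.metastore.uris": "thrift://metastore-host:9083"}))
```

In remote mode the client configuration only needs the Thrift URI; the JDBC URL and database login live on the metastore service's side, which is what keeps the credentials away from end users.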

To show the name of the current database in the Hive CLI prompt:

hive> set hive.cli.print.current.db=true;

The Hive Driver performs two checks on every query:

a) Syntax error: whether the query is syntactically correct.

b) Semantic exception: whether the tables the query refers to are present in the Hive Metastore.
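The order of these two checks can be sketched in Python with a toy stand-in for the driver. This is not Hive's parser: the grammar is a single regular expression for "SELECT ... FROM table", the metastore is a plain set of table names, and check_query, ParseError, SemanticError, and the employees table are all hypothetical names chosen for this example.

```python
import re


class ParseError(Exception):
    """Raised when the query text is not well formed."""


class SemanticError(Exception):
    """Raised when the query names a table the metastore does not know."""


# Toy grammar: SELECT <columns> FROM <table>, optional trailing semicolon.
QUERY = re.compile(r"^\s*select\s+.+?\s+from\s+(\w+)\s*;?\s*$", re.IGNORECASE)


def check_query(sql, metastore_tables):
    """Run the syntax check first, then the metastore lookup."""
    match = QUERY.match(sql)
    if match is None:                    # check a) syntax
        raise ParseError("cannot parse: " + sql)
    table = match.group(1).lower()
    if table not in metastore_tables:    # check b) semantics
        raise SemanticError("table not found: " + table)
    return table


tables = {"employees"}  # hypothetical metastore contents
print(check_query("SELECT name FROM employees;", tables))
```

A misspelled keyword such as "SELEC" fails at step a) before the metastore is consulted, while a well-formed query against an unknown table passes step a) and fails at step b).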
