The Metastore is the central repository where Hive stores all of its schemas (column names, data types, etc.); by default it is backed by Derby. It stores metadata for Hive tables (such as their schema and location) and partitions in a relational database, and it gives clients access to this information through the metastore service API.
By default, Hive uses a built-in (embedded) Derby SQL database. When you run a Hive query with the default Derby database, you will find that your current directory now contains a new sub-directory, metastore_db.
The default value of the javax.jdo.option.ConnectionURL property is jdbc:derby:;databaseName=metastore_db;create=true. This value specifies that embedded Derby is used as the Hive metastore and that the metastore database is located in metastore_db.
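As a hedged illustration, the embedded Derby default corresponds to a hive-site.xml fragment along the following lines. The connection URL is the one quoted above; the driver class is the usual embedded Derby driver and is shown here as an assumption about a stock installation.

```xml
<configuration>
  <!-- JDBC URL of the metastore database. create=true tells Derby to
       create metastore_db in the current working directory if missing. -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
  </property>
  <!-- Assumption: the embedded Derby JDBC driver of a stock install. -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  </property>
</configuration>
```

Because databaseName is a relative path, starting Hive from a different directory creates a separate metastore_db there, so tables defined in one directory will not be visible from another.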
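We can also configure the directory in which Hive stores table data. By default, the warehouse location is /user/hive/warehouse (file:///user/hive/warehouse when Hive runs against the local file system), and it is controlled by the hive.metastore.warehouse.dir property. The hive-site.xml file is also where a local or remote metastore is configured.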
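A minimal sketch of overriding the warehouse directory in hive-site.xml. The property hive.metastore.warehouse.dir is the standard Hive setting for this; the path /data/hive/warehouse is a made-up example and should be replaced with a location that suits your cluster.

```xml
<configuration>
  <!-- Directory under which Hive creates the data directories of
       managed tables. The path below is hypothetical. -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/data/hive/warehouse</value>
  </property>
</configuration>
```

A path without a scheme is resolved against the default file system, so the same value can point to HDFS on a cluster and to the local disk on a single-machine setup.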
In most projects, the metastore (the Hive table schema details) is kept in a MySQL server.
The Hive metastore consists of two fundamental units:
1. A service that provides metastore access to other Apache Hive services.
2. Disk storage for the Hive metadata, which is separate from HDFS storage.
There are three modes for Hive Metastore deployment:
· Embedded Metastore
· Local Metastore
· Remote Metastore
i. Embedded Metastore
Embedded mode is the default metastore deployment mode for the Cloudera distribution. In this mode, the metastore uses a Derby database, and both the database and the metastore service are embedded in the main HiveServer process, i.e. they run in the same JVM as the Hive service.
This mode supports only one active user at a time, i.e. only one Hive session can be open at a time. Note that this mode is not certified for production use.
ii. Local Metastore
Hive is a data-warehousing framework, so a single session is rarely enough. The Local Metastore mode was introduced to overcome this limitation of the Embedded Metastore: it allows many Hive sessions, i.e. many users can use the metastore at the same time.
In this mode the metastore service still runs embedded in the same JVM as the Hive service, but the metastore database runs as a separate process, and the service communicates with it over JDBC.
MySQL is a popular choice for the standalone metastore database. In this case, the javax.jdo.option.ConnectionURL property is set to jdbc:mysql://host/dbname?createDatabaseIfNotExist=true, and javax.jdo.option.ConnectionDriverName is set to com.mysql.jdbc.Driver. The JDBC driver JAR file for MySQL (Connector/J) must be on Hive’s classpath, which is achieved by placing it in Hive’s lib directory.
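The MySQL settings described above can be sketched as a hive-site.xml fragment. The host name, database name, user name and password below are placeholders invented for illustration. The user name and password properties are standard metastore settings that the text does not mention but that a MySQL connection normally needs.

```xml
<configuration>
  <!-- JDBC URL of the MySQL metastore database.
       Host and database name are hypothetical. -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-db.example.com/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <!-- Driver class as named in the text. Newer Connector/J releases
       use com.mysql.cj.jdbc.Driver instead. -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <!-- Placeholder credentials for the metastore database account. -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>change-me</value>
  </property>
</configuration>
```

If the URL ever needs a second query parameter, remember that the separator must be written as an escaped ampersand inside an XML value.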
iii. Remote Metastore
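In this mode, the metastore service runs in its own JVM process. HiveServer and other clients communicate with it over the Thrift network API (configured using the hive.metastore.uris property).
The metastore service communicates with the metastore database over JDBC (configured using the javax.jdo.option.ConnectionURL property).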
The database, the HiveServer process and the metastore service can all be on the same host, but running the HiveServer process on a separate host provides better availability and scalability. It also brings better manageability and security, because the database tier can be completely firewalled off, and the database credentials no longer need to be shared with each Hive user in order to access the metastore database.
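A minimal sketch of the client-side configuration for a remote metastore. The host name is a placeholder, and port 9083 is the port conventionally used by the metastore Thrift service; both are assumptions to adapt to your environment.

```xml
<configuration>
  <!-- Thrift URI of the remote metastore service.
       Host is hypothetical; 9083 is the conventional port. -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host.example.com:9083</value>
  </property>
</configuration>
```

With this layout, the javax.jdo.option.* connection properties only need to appear in the hive-site.xml on the metastore host, which is what keeps the database credentials away from individual Hive clients.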