SerDe stands for Serializer/Deserializer. Hive uses a SerDe together with a FileFormat to read and write table rows; the SerDe interface is primarily used for I/O.
"Serialization" is the process of compressing data and converting it into a binary format before it is transferred to HDFS over the network.
Serialized data typically lands in HDFS in a binary file format such as Avro or Parquet.
"Deserialization" is the reverse process: binary data in HDFS is converted back into a human-readable form.
Now suppose we have an Avro data file and its Avro schema on HDFS. We create the Hive table as an external table, and both its input and output must use the Avro file format.
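A minimal sketch of such a table, using Hive's built-in AvroSerDe (the table name, HDFS location, and schema URL here are hypothetical):

```sql
-- External table backed by Avro files; the schema is read from an
-- .avsc file on HDFS, so no column list is needed in the DDL.
CREATE EXTERNAL TABLE avro_hive_tab
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/hive/avro_data'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/hive/schemas/person.avsc');
```

On recent Hive versions the same effect can be had with the shorthand `STORED AS AVRO`.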
hive> show tables;
avro_hive_tab
hive_dml
hive> describe formatted hive_dml;
Storing the file in ORC file format:
ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK');
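Note that the ALTER statement above only changes a SerDe property (the character encoding) on an existing table. To actually store a table in ORC, declare it with `STORED AS ORC`; a minimal sketch, assuming a source table `person(id, name)` (the table and column names are hypothetical):

```sql
-- Create an ORC-backed table and copy the existing rows into it.
CREATE TABLE person_orc (id INT, name STRING)
STORED AS ORC;

INSERT INTO TABLE person_orc
SELECT id, name FROM person;
```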
· ThriftSerDe: This SerDe is used to read/write Thrift serialized objects. The class file for the Thrift object must be loaded first.
· DynamicSerDe: This SerDe also reads/writes Thrift serialized objects, but it understands Thrift DDL, so the schema of the object can be provided at runtime. It also supports many different protocols, including TBinaryProtocol, TJSONProtocol, and TCTLSeparatedProtocol (which writes data in delimited records).
Other built-in SerDes include Avro, ORC, RegEx, Parquet, CSV, JsonSerDe, etc.
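As an illustration of picking one of these built-in SerDes, here is a sketch of a table that reads newline-delimited JSON text files via the HCatalog JsonSerDe (the table name and columns are hypothetical; depending on the Hive version, the hive-hcatalog-core jar may need to be on the classpath):

```sql
-- Each line of the underlying text file is one JSON object,
-- e.g. {"id": 1, "name": "asha"}.
CREATE TABLE json_tab (id INT, name STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;
```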