Apache Hive is a popular data warehouse system built on Hadoop infrastructure, and it is in high demand for data analytics. Today Hive is used in a wide range of data analytics jobs. Its query language closely resembles the SQL of a traditional RDBMS, but Hive's purpose is quite different: it is designed mainly for batch processing of large datasets.

In this article, I am going to show you an example of one of Hive's collection data types, the struct, although an earlier tutorial already covered the Hive data types in full. Hive supports four collection data types:
Collection data types in Hive:
  • Array: An index-based collection of elements of the same type.
  • Struct: An object that contains fields of different types.
  • Map: A collection of key-value pairs.
  • Uniontype: A value that can hold any one of several specified data types.
In our previous post, we looked at the Array collection type in Hive; now let's explore the Struct type.

Struct data type in Hive:
It is very similar to a Java object or a struct in the C language. Unlike an array, whose elements all share one type, a struct contains fields of different types, and each field is accessed with dot (.) notation, for example product.id.
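
To make the dot notation concrete, here is a minimal sketch that does not need any table. It assumes a Hive version that allows a SELECT without a FROM clause (0.13 or later); the names product, id and name are hypothetical and only mirror the product.id example above.

```sql
-- Hypothetical example: "product", "id" and "name" are illustrative names only.
-- named_struct() builds a struct value inline from name/value pairs.
select p.product.id   as product_id,
       p.product.name as product_name
from (
  select named_struct('id', 1, 'name', 'pen') as product
) p;
```

Each field is read by its name after the dot, just as a member of a C struct would be.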

Sample Cricket Player Dataset:

Let's create a table with a struct column:
create table cricket_players(id int,team string,country string,player string,
match_details struct<total_test:int,total_odi:int,debut_dt:string>) 
row format delimited
fields terminated by '\001'
collection items terminated by '\002'
stored as textfile;

Describe command to verify table creation:
hive> desc cricket_players;
id                   int                                      
team                 string                                   
country              string                                   
player               string                                   
match_details        struct<total_test:int,total_odi:int,debut_dt:string>      
Time taken: 0.083 seconds, Fetched: 5 row(s)

Load data into cricket_players table:
Put the data file on HDFS:
hadoop fs -put cricket_players /
Load the dataset into the table:
hive> load data inpath '/cricket_players' into table cricket_players;
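
As an alternative to loading a file, a row with a struct value can also be written from a query. The sketch below is only an illustration: the values are placeholders and not part of the sample dataset, and it assumes a Hive version that allows SELECT without a FROM clause (0.13 or later). Hive has no literal syntax for complex types in INSERT ... VALUES, so the struct is built with named_struct().

```sql
-- Placeholder values for illustration only; they are not taken from the dataset.
-- The named_struct() field names and types must match the match_details column:
-- struct<total_test:int,total_odi:int,debut_dt:string>
insert into table cricket_players
select 1,
       'Team A',
       'Country A',
       'Player A',
       named_struct('total_test', 10, 'total_odi', 20, 'debut_dt', '2000-01-01');
```

In the text file behind the table, the top-level columns of such a row are separated by '\001' and the three struct fields by '\002', as declared in the create table statement.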
Select Query:
hive> select * from cricket_players;

Let's access some fields of the struct column:
hive> select team,country,player,match_details.total_test as total_test,match_details.total_odi as total_odi from cricket_players;
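
Struct fields are not limited to the select list. As a further sketch, the query below filters and sorts on a struct field; the threshold of 100 is an arbitrary value chosen for illustration.

```sql
-- Sketch: struct fields can be used in WHERE and ORDER BY like ordinary columns.
-- The threshold 100 is an arbitrary illustrative value.
select player,
       match_details.total_odi as total_odi,
       match_details.debut_dt  as debut_dt
from cricket_players
where match_details.total_odi > 100
order by total_odi desc;
```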

That is a simple way to use the struct type in Hive.

