Big data is a major trend right now, and new technologies for analysing large datasets arrive every year, if not every six months. California is a hub for the start-ups where many of these technologies mature. Apache Hive is one of the popular big data technologies in use today.

In this tutorial post, I am going to list the complex data types supported in Apache Hive, with an example of the Map collection data type. If you are new to Apache Hive, then you need to set up your Hive environment first, following a step-by-step set-up guide, before starting.

What is Apache Hive?
Apache Hive is a data warehouse tool for processing and analysing structured datasets. It is built on Hadoop infrastructure and applies the schema on read. Hive was developed at Facebook and is mainly used in the data presentation phase. You can start with the word count tutorial in Hive.

Collection types in Hive
Hive supports four complex (collection) types, listed below:
  • Array: Index-based collection of elements of the same type.
  • Struct: An object that contains fields of different types.
  • Map: Collection of key-value pairs.
  • uniontype: A value that holds exactly one of several specified data types at a time.
We have already seen the Array and Struct data types in Hive, and now it is time to look at Map. A complete Hive data type tutorial covers the rest.

Map in Hive:
A map is a collection of key-value pairs. Use a map when elements need to be accessed by key. Values are read by putting the key in square brackets after the column name, for example mapColumn['key'].

Let's try this with sample cricket player data: create a table, load the sample data, then fetch some of the details using the keys of the map.
Sample Data set:
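The field and collection delimiters used by the table in this post (\001 and \002) are control characters, so a data file is awkward to type by hand. As a minimal sketch, a small file in that layout could be generated from the shell as follows. The two rows are hypothetical placeholders, not real player statistics, and the file name cricket_players_data is taken from the load step later in this post.

```shell
# Control-character delimiters matching the table definition:
# \001 separates fields, \002 separates map entries, ':' separates key from value.
SOH=$(printf '\001')
STX=$(printf '\002')

# Hypothetical placeholder rows: id, team, country, player, match_info map.
{
  printf '%s\n' "1${SOH}team_a${SOH}country_a${SOH}player_a${SOH}test_match:10${STX}odi_match:25"
  printf '%s\n' "2${SOH}team_b${SOH}country_b${SOH}player_b${SOH}test_match:4${STX}odi_match:12"
} > cricket_players_data

# Print the file with the invisible delimiters swapped for '|' and ',' to check it by eye.
tr '\001\002' '|,' < cricket_players_data
```

The keys test_match and odi_match match the ones queried at the end of this post, so the final select returns values for both rows.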

Create a table to hold the data, with match_info as a map column:
create table cricket_players_info(id int,team string,country string,
player string,match_info map<string,string>) 
row format delimited
fields terminated by '\001'
collection items terminated by '\002'
map keys terminated by ':' 
stored as textfile;
Describe command to verify table creation:
hive> desc cricket_players_info;
id                   int                                      
team                 string                                   
country              string                                   
player               string                                   
match_info            map<string,string>                      
Time taken: 0.078 seconds, Fetched: 5 row(s)

Load data into cricket_players_info table:
Put the data file on HDFS:
hadoop fs -put cricket_players_data /
Load the dataset into the table:
hive> load data inpath '/cricket_players_data' into table cricket_players_info;
Select Query:
hive> select * from cricket_players_info;

Let's access some elements from the map:
select player, country, match_info["test_match"] as total_test,
match_info["odi_match"] as total_odi from cricket_players_info;
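
Beyond bracket access, Hive has a few built-in functions that work on maps. The sketch below writes some follow-up queries against the same table to a script file and runs it with hive -f only when the Hive CLI is found on the PATH. The file name map_queries.hql, the key t20_match, and the column aliases are illustrative assumptions and do not come from the original example.

```shell
# Write the follow-up queries to a script file (the file name is an assumption).
cat > map_queries.hql <<'EOF'
-- size() counts the entries; map_keys() and map_values() return arrays.
select player, size(match_info) as entries,
       map_keys(match_info) as info_keys,
       map_values(match_info) as info_values
from cricket_players_info;

-- Looking up a key that is not in the map yields NULL rather than an error.
select player, match_info['t20_match'] as total_t20
from cricket_players_info;

-- explode() turns every key-value pair into its own row.
select player, k, v
from cricket_players_info
lateral view explode(match_info) m as k, v;
EOF

# Run the script only when the Hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f map_queries.hql
fi
```

Keeping the queries in a file makes them easy to rerun after the table is reloaded.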

That's it for the Map complex type in Hive.
