Apache Hive is a popular data warehouse system built on top of Hadoop, and one of the most in-demand big data technologies. Many analytics companies use Hive for big data analysis because it is simple and closely follows SQL. Hive makes it easy to summarize big data, run ad-hoc queries, and analyze large data sets.
If you are new to big data and want to build your career in it, you should know what Hadoop and Hive are. For installation, you may visit my previous step-by-step guide to installing Apache Hive. In this tutorial, I'm going to show you an example of one of the most commonly used collection data types in Hive: the array.
Collection data types:
Hive supports four collection data types:
  • Array: An index-based, ordered collection of elements of the same type.
  • Struct: An object whose named fields can have different types.
  • Map: A collection of key-value pairs.
  • uniontype: A value that holds exactly one of several declared data types at a time.
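As a rough illustration of how the four types look in a table definition, here is a minimal sketch. The table collection_demo and all of its column names are hypothetical and are not part of this tutorial's dataset.

```sql
-- Hypothetical table, only to show the declaration syntax of each collection type.
create table collection_demo(
  tags    array<string>,                 -- ordered list of strings
  address struct<city:string, zip:int>,  -- named fields of different types
  scores  map<string,int>,               -- key-value pairs
  misc    uniontype<int,string>          -- holds either an int or a string
);

-- Element access syntax for the first three types:
-- array by index, struct by field name, map by key.
select tags[0], address.city, scores['math'] from collection_demo;
```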
Array in Hive:
An array is an index-based, ordered collection. Indexes start at zero; to access an element, pass its index position.
Example:
array("Apple","Banana","Mango","Papaya")
Now the 3rd element (Mango) can be accessed using:
array[2]
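The same idea can be tried directly in the Hive shell without creating a table. This is a small sketch that assumes a Hive version that allows select without a from clause (0.13 or later, as far as I know); on older versions you would need to select from some existing table.

```sql
-- array() builds an array from its arguments; [n] reads the element at index n.
select array("Apple","Banana","Mango","Papaya")[2];

-- size() returns the number of elements in the array.
select size(array("Apple","Banana","Mango","Papaya"));

-- An index past the end of the array is expected to yield NULL rather than an error.
select array("Apple","Banana","Mango","Papaya")[10];
```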

Let's create a table with an array type:
create table cricket_team(id int,name string,country string,
players array<string>) row format delimited
fields terminated by '\001'
collection items terminated by '\002'
stored as textfile;

Describe command to verify table creation:
hive> describe cricket_team;
OK
id                   int                                      
name                 string                                   
country              string                                   
players              array<string>                            
Time taken: 0.161 seconds, Fetched: 4 row(s)

Load data into cricket_team:
Put the dataset on HDFS:
hadoop fs -put cricket_data /
Load the dataset:
hive> load data inpath '/cricket_data' into table cricket_team;
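The contents of the cricket_data file are not shown here. Given the table definition, each line of the file would need its fields separated by the '\001' character and the players separated by '\002'. If you only want a row to experiment with, one possible alternative to loading a file is an insert with a select that uses array(). The values below are made up for illustration, and the statement assumes a Hive version that accepts a select without a from clause.

```sql
-- Made-up sample row, not taken from the cricket_data file.
-- array() builds the value for the players column.
insert into table cricket_team
select 1, 'Team A', 'India', array('Player1','Player2','Player3');
```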
Select Query:
hive> select * from cricket_team;
Access an element of the players array:
hive> select country,players[0] from cricket_team;
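Beyond reading a single index, a few built-in functions are commonly used with array columns. The queries below are a sketch against the cricket_team table from this tutorial; 'Player1' is a placeholder name, not a value known to be in the dataset.

```sql
-- Number of players stored for each team.
select country, size(players) from cricket_team;

-- Teams whose players array contains a given name ('Player1' is a placeholder).
select country from cricket_team where array_contains(players, 'Player1');

-- explode() with lateral view produces one output row per array element.
select country, player
from cricket_team
lateral view explode(players) p as player;
```

The lateral view form is useful when you want to filter or group by individual players instead of by fixed index positions.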

