Word count program using hive query

We have already set up Hive on our local machine with the help of my previous article, hive installation, so now it's time to start with the hello world of Hadoop in Hive, also known as word count in Hive :)
In this post, I am going to show you an example of a word count program using Hive, even though we have already done the same with a MapReduce program in the word count in map reduce tutorial.

Word count program using hive

Input sample data:
This is my first hive tutorial, which is known as hello world program in big data , big data technologies are now on demand.

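Expected output: every distinct word in the input along with the number of times it occurs (the complete result is listed in the final step).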

To achieve this, I am going to give you a very short and simple query instead of the many lines of code needed with MapReduce :)

Instead of executing the complete query directly, I'm breaking it into steps so that it is easy to read and understand. Of course, at the end you will find the complete query to fulfill your needs.

Step 1. Create a table in hive
Create a Hive table that will hold the sample input data loaded from the file.
Syntax:

hive> create table feedback(comments string);

Step 2. Load data from the sample file
Syntax:

hive> load data local inpath '/home/subodh/hadoop_data/comments.txt' into table feedback;
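To picture what ends up in the table, here is a rough Python sketch (not Hive code) of how a newline-terminated text file maps onto the one-column feedback table. It assumes Hive's default text format, where each line becomes one row and fields are separated by the Ctrl-A character; LOAD DATA itself only copies the file, and the rows are parsed when the table is read. The helper name load_single_column and the file contents are hypothetical.

```python
# Rough sketch (assumption: Hive's default text format). LOAD DATA only copies the
# file; this function mimics how rows are read back from that file afterwards.
def load_single_column(file_text):
    """Return the rows of a one-STRING-column table as a list of 1-tuples."""
    lines = file_text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a final newline ends the last row; it does not start a new one
    rows = []
    for line in lines:
        # Fields are separated by Ctrl-A ('\x01'); with a single column,
        # only the text before the first separator is kept.
        rows.append((line.split("\x01")[0],))
    return rows


# Hypothetical contents of comments.txt: the single sample line plus a newline.
sample_file = ("This is my first hive tutorial, which is known as hello world program "
               "in big data , big data technologies are now on demand.\n")
feedback = load_single_column(sample_file)
print(len(feedback))  # 1 row
```

So the sample file gives the feedback table exactly one row, and that row's comments value is the whole sentence.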

OK, the data is now loaded into the Hive table, so it's time to work out how to count the words.

Step 3. Convert comments into an array
Now convert the comments column of the feedback table into an array of strings.
Syntax:

hive> select split(comments,' ') from feedback;

The split UDF above returns the following output:
["This","is","my","first","hive","tutorial,","which","is","known","as","hello","world","program","in","big","data",",","big","data","technologies","are","now","on","demand."]
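Hive's split(str, pat) treats its second argument as a regular expression, so ' ' matches exactly one space. The Python sketch below is only a stand-in for that behaviour (the helper hive_like_split is hypothetical) and uses re.split for the same regex semantics. It shows why punctuation stays glued to words ("tutorial,", "demand.") and why the lone "," becomes a word of its own.

```python
import re


# Rough stand-in for Hive's split(str, regex): split the string around every
# match of the regular expression (here a single space, as in the query above).
def hive_like_split(text, pattern):
    return re.split(pattern, text)


comment = ("This is my first hive tutorial, which is known as hello world program "
           "in big data , big data technologies are now on demand.")
words = hive_like_split(comment, " ")
print(words)
print(len(words))  # 24 elements, matching the array above
```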

Step 4. Use a table-generating function (UDTF)
Now return multiple rows from the array of strings above, one row per element; for that, Hive has the built-in table-generating function (UDTF) explode.
Syntax:

hive> select explode( split(comments,' ')) from feedback;

The output of explode combined with split is:

This
is
my
first
hive
tutorial,
which
is
known
as
hello
world
program
in
big
data
,
big
data
technologies
are
now
on
demand.
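Conceptually, explode flattens each array into one row per element. The Python sketch below mimics that flattening for the single-row feedback table; the generator named explode is just an illustrative Python helper, not Hive's implementation.

```python
import re


def explode(arrays):
    """Sketch of explode(): each element of each input array becomes one output row."""
    for array in arrays:
        for element in array:
            yield element


# The feedback table holds one row; split(comments, ' ') turns it into one array.
feedback_comments = ["This is my first hive tutorial, which is known as hello world program "
                     "in big data , big data technologies are now on demand."]
rows = list(explode(re.split(" ", c) for c in feedback_comments))
for row in rows:
    print(row)
```

The printed rows follow the same order as the listing above, one word per line.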

Step 5. Final step
OK, let's put it all together and see the result.
Syntax:

hive> select word, count(*) from (select explode(split(comments, ' ')) as word from feedback) tmp group by word;

Or, alternatively, using LATERAL VIEW:

hive> SELECT word, COUNT(*) FROM feedback LATERAL VIEW explode(split(comments, ' ')) tmp as word GROUP BY word;
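LATERAL VIEW applies the UDTF to every row of feedback and joins each generated row back to the row it came from, so the table's own columns remain selectable next to word. The Python sketch below models that join as a nested loop over (source row, word) pairs and then counts per word; lateral_view is a hypothetical helper, and a source row whose array is empty simply contributes no pairs here.

```python
import re
from collections import Counter


def lateral_view(rows, udtf):
    """Sketch of LATERAL VIEW: join each source row with every row its UDTF generates."""
    for row in rows:
        for generated in udtf(row):
            yield row, generated  # the source row's columns stay available next to the new one


# The feedback table as a list of (comments,) tuples.
feedback = [("This is my first hive tutorial, which is known as hello world program "
             "in big data , big data technologies are now on demand.",)]

# LATERAL VIEW explode(split(comments, ' ')) tmp AS word
joined = list(lateral_view(feedback, lambda row: re.split(" ", row[0])))

# GROUP BY word with COUNT(*)
counts = Counter(word for _source_row, word in joined)
print(counts["big"], counts["data"], counts["is"])  # 2 2 2
```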

Below is the final output:

,	1
This	1
are	1
as	1
big	2
data	2
demand.	1
first	1
hello	1
hive	1
in	1
is	2
known	1
my	1
now	1
on	1
program	1
technologies	1
tutorial,	1
which	1
world	1
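As a cross-check of the result above, this Python sketch (not Hive itself) reproduces the subquery version: the inner query yields one row per word and the outer GROUP BY counts rows per distinct word. As far as I know, Hive does not guarantee any order for GROUP BY without ORDER BY; the listing above happens to be in byte order, with "," and "This" ahead of the lowercase words, so the sketch sorts by UTF-8 bytes to make the comparison line up.

```python
import re
from collections import Counter

feedback_comments = ["This is my first hive tutorial, which is known as hello world program "
                     "in big data , big data technologies are now on demand."]

# Inner query: explode(split(comments, ' ')) AS word -> one row per word
words = [word for comment in feedback_comments for word in re.split(" ", comment)]

# Outer query: GROUP BY word with count(*) -> number of rows per distinct word
counts = Counter(words)

# Sort by UTF-8 bytes to match the listing above (',' and 'This' before lowercase words).
result = [(word, counts[word]) for word in sorted(counts, key=lambda w: w.encode("utf-8"))]
for word, n in result:
    print(f"{word}\t{n}")
```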

That's it. Don't forget to provide your valuable feedback.