Difference between Apache Pig and Apache Hive

Apache Pig and Apache Hive are both very popular for batch processing in Hadoop, and both rank among the top Big Data technologies.
From 10,000 feet the two look almost the same, and many people believe Pig and Hive are interchangeable, but that is not actually true.
In this post, I am going to show you a few key differences between Pig and Hive. This is also a very popular Big Data interview question, so be sure to review these differences before your next interview.

Apache Hive was developed at Facebook and Apache Pig at Yahoo. Pig is better suited to ETL work, whereas Hive is better suited to the data presentation phase. Let's look at the key differences:

Key Differences Between Hive & Pig

Pig is a data flow programming language (Pig Latin), whereas Hive is a data warehouse with a SQL-oriented query language (HiveQL).
Pig is all about loading and storing datasets, whereas Hive can also perform updates and deletes on datasets (on transactional tables, in Hive 0.14 and later).
Pig lets you name and save the intermediate result of each transformation step, whereas a Hive query does not expose its intermediate results.
Hive requires a predefined schema, whereas in Pig a schema is optional.
Hive supports partitioning and bucketing, whereas Pig doesn't.
Hive supports JDBC/ODBC connectivity, while Pig doesn't.
Pig is easier to manage than Hive, since it runs as a client-side tool and does not need a metastore.
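
To make the data-flow style concrete, here is a minimal Python sketch (an analogy, not Pig itself) in which each transformation step is bound to a name, the way a Pig Latin script assigns each step to an alias. Every intermediate result therefore stays available to inspect or save. The visit records and field names are invented for illustration, and the Pig Latin in the comments is approximate.

```python
from collections import defaultdict

# Toy input standing in for a file Pig would read with LOAD:
# (visitor, url, seconds_on_page). Data and field names are invented.
visits = [
    ("alice", "/home", 12),
    ("bob", "/cart", 45),
    ("alice", "/cart", 30),
    ("carol", "/home", 5),
    ("bob", "/home", 20),
]

# Step 1, roughly: long_visits = FILTER visits BY seconds_on_page >= 10;
long_visits = [v for v in visits if v[2] >= 10]

# Step 2, roughly: by_visitor = GROUP long_visits BY visitor;
by_visitor = defaultdict(list)
for visitor, _url, seconds in long_visits:
    by_visitor[visitor].append(seconds)

# Step 3, roughly:
# totals = FOREACH by_visitor GENERATE group, SUM(long_visits.seconds_on_page);
totals = {visitor: sum(secs) for visitor, secs in by_visitor.items()}

# Every named intermediate is still available, much as Pig lets you
# DUMP or STORE any alias in the middle of a script.
print(long_visits)
print(dict(by_visitor))
print(totals)  # {'alice': 42, 'bob': 65}
```

In Hive, the same logic would usually be written as one declarative query, such as SELECT visitor, SUM(seconds_on_page) FROM visits WHERE seconds_on_page >= 10 GROUP BY visitor, and the filtered and grouped data in between is never exposed by name.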

Use case better suited to Pig than Hive -
When you have petabytes of unstructured or semi-structured data and want to test an algorithm or theory against it, Pig is the better fit, because Pig doesn't force you to design a schema first.
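
The schema-on-read idea can be sketched in Python (an analogy, not Pig's API): records are loaded as plain tuples with no declared schema and filtered by position, much as Pig lets you refer to fields as $0, $1 and so on when a LOAD statement has no AS clause. Types are applied only after exploring the data. The log format and field meanings here are assumptions made for the example.

```python
# Raw, loosely structured log lines: the number of fields varies per line,
# and no schema has been declared up front. The sample data is invented.
raw_lines = [
    "2015-03-01\talice\tlogin",
    "2015-03-01\tbob\tpurchase\t19.99",
    "2015-03-02\talice\tpurchase\t5.00",
    "malformed line",
]

# Load without a schema: each record is just a tuple of strings, similar to
# loading in Pig without an AS clause and referring to fields as $0, $1, ...
records = [tuple(line.split("\t")) for line in raw_lines]

# Experiment positionally: keep records whose third field ($2) is "purchase"
# and that actually carry an amount in the fourth field ($3).
purchases = [r for r in records if len(r) >= 4 and r[2] == "purchase"]

# Only now, once the data is understood, project a typed view of it.
amounts = [(r[1], float(r[3])) for r in purchases]
print(amounts)  # [('bob', 19.99), ('alice', 5.0)]
```

Note that the malformed line and the login event are simply skipped rather than rejected at load time, which is what makes this style convenient for early exploration.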

Use case better suited to Hive than Pig -
When you have petabytes of data and want SELECT queries to run fast, Hive is the better choice, because you can use partitioning and bucketing so that each query reads only the part of the data it needs.
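
A rough Python model of why partitioning helps (a simplification, not Hive's storage format): each partition value gets its own slot of storage, much as Hive stores each partition of a table in its own directory, so a query filtering on the partition column only reads the matching partition. Within a partition, rows are further split into a fixed number of buckets by hashing a column. The table, column names, and bucket count are invented for the example, and crc32 stands in for Hive's own hash function.

```python
import zlib
from collections import defaultdict

# Hypothetical sales rows: (sale_date, customer_id, amount).
rows = [
    ("2015-03-01", "c1", 10.0),
    ("2015-03-01", "c2", 25.0),
    ("2015-03-02", "c1", 7.5),
    ("2015-03-02", "c3", 12.0),
    ("2015-03-03", "c2", 3.0),
]

NUM_BUCKETS = 4  # arbitrary choice for the example


def bucket_for(customer_id: str) -> int:
    # Bucket = hash(key) mod bucket count. Hive uses its own hash function;
    # crc32 is only a deterministic stand-in, so numbers won't match Hive's.
    return zlib.crc32(customer_id.encode("utf-8")) % NUM_BUCKETS


# Layout: partition value (like a sale_date=... directory) -> bucket -> rows.
table = defaultdict(lambda: defaultdict(list))
for row in rows:
    sale_date, customer_id, _ = row
    table[sale_date][bucket_for(customer_id)].append(row)


def total_for_date(sale_date):
    """Sum one day's sales, returning (total, rows_scanned)."""
    # Partition pruning: only the requested partition is read; rows for
    # other dates are never touched.
    partition = table.get(sale_date, {})
    scanned = [row for bucket in partition.values() for row in bucket]
    return sum(amount for _, _, amount in scanned), len(scanned)


print(total_for_date("2015-03-02"))  # (19.5, 2): reads 2 of the 5 rows
```

In Hive itself this layout is declared in the table definition, with PARTITIONED BY for the partition column and CLUSTERED BY (...) INTO n BUCKETS for bucketing.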
