Ad by Google
Nowadays you will find this technology is trending in the market, Hadoop is very popular in very short period,although not very short it's already crossed the 10 years but now actually market is getting opted the big data technologies and of course Hadoop is the fundamental of big data related technology although more than 50 popular big data technologies are in the market but the initial one is Hadoop and known as fundamental of big data technology and in this post, I am trying to explain you what is Hadoop? If you want to read about big data you can follow this What is Big Data.
What is Hadoop ??
The Apache Hadoop is an open source software framework for storing and processing the large data(big data) sets in a distributed fashion(cluster of computer) with the help of commodity(low cost) hardware.
Hadoop allows you to run your application on large cluster of computer, having more than thousand of nodes to process peta bytes,tera bytes of data, something similar to grid computing, but not limited to grid computing, here is a detail differences between grid computing and Hadoop.
Although Hadoop has many features which can not be compare with grid computing or other similar technologies but, what Hadoop makes totally different from other distributed system is known as "Data locality". It's the heart of Hadoop.
Unlike traditional system, instead of data coming to code/logic. In Hadoop your MapReduce logic goes to data node and it become local for the data-node, which means your processing become much much faster as compared to other distributed system.
You have a dataset of user activity log of 1TB and you written a map reduce logic to get only those users data who belong to USA.
- Your 1TB data sets will be broken into number of chunks(blocks).
- All blocks goes to different data nodes.
- Your map reduce logic to find the user details would be available to every nodes.
- Task tracker is the one who execute your mapreduce job on every data node corresponding to the block of data and this is known as data locality.
Because built for data, play with data, get the insights of your data and innovate something.
- Accessible: Hadoop runs on cluster of computers.
- Robust: Runs on commodity hardware,easily handle failure/crash. It's designed for failure handling.
- Scalable: To handle large dataset, easily add nodes whenever required.
- Simple: Easy to learn, you can start with word count from the day one.(How to set-up single node cluster)
Hadoop inspired from google papers of Google File System(GFS) and MapReduce paper, initially developed under Apache Nutch project, under the supervision of Doug Cutting (The inventor of Apache Lucene,Apache Nutch and Apache Hadoop) and later on moved to a separate Hadoop sub project in 2006. Now Apache Hadoop become one of the popular Apache project under Apache Software Foundation(ASF).
Who Uses Hadoop??
Google,Yahoo, IBM, Amazon, Facebook, Linkedin, Adobe, Twitter, HP, EMC, NYSE, Ebay and a lot of companies.
Top Hadoop Distribution Providers
Hello World in Hadoop