In this tutorial, I am going to show you how to install Hadoop in pseudo-distributed mode, that is, a step-by-step installation of an Apache Hadoop single node cluster.
If you are new to the Big Data concept, you can first go back and get a glimpse of it at What is Big Data.

Apache Hadoop 2.7.1 Single Node Cluster Setup
Prerequisites for this tutorial -
  • A Linux operating system must be installed (I am using Ubuntu 14.x)
  • Java must be installed (I am using Java 8)
  • ssh must be installed and sshd must be running.
Step 1. Verify Java is installed
Type java -version on the terminal. If you get output like the following, Java is installed.
subodh@subodh-Inspiron-3520:~$ java -version
java version "1.8.0_71"
Java(TM) SE Runtime Environment (build 1.8.0_71-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.71-b15, mixed mode)
If Java is not installed, install it first; you can follow the step-by-step Java installation on Ubuntu guide.

Step 2. Verify ssh and sshd
Type which ssh and which sshd. If you get output like the following, they are installed and available.
subodh@subodh-Inspiron-3520:~$ which ssh
/usr/bin/ssh
subodh@subodh-Inspiron-3520:~$ which sshd
/usr/sbin/sshd
If not, install them using the below command -
subodh@subodh-Inspiron-3520:~$ sudo apt-get install ssh
Once installed, verify again using the above commands.
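The which command only confirms the binaries exist; to also confirm that the sshd daemon is actually running on Ubuntu, you can check its service status, for example:
subodh@subodh-Inspiron-3520:~$ sudo service ssh status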

Step 3. Download & Install Hadoop
You can download the latest Hadoop release from the official Apache site, or download it from the command line; below is the syntax using the wget command.
subodh@subodh-Inspiron-3520:~/software$ wget http://redrockdigimark.com/apachemirror/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
The download will take some time, depending on your internet connection speed. Once it finishes, extract the archive using the below command.
subodh@subodh-Inspiron-3520:~/software$ tar -xzf hadoop-2.7.1.tar.gz 
Now set HADOOP_HOME and the related environment variables inside the ~/.bashrc file
subodh@subodh-Inspiron-3520:~/software$ vi ~/.bashrc
This opens the vi editor; place the below statements inside the file and save it.
# hadoop installed directory
export HADOOP_HOME=/home/subodh/software/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
Note - The value of HADOOP_HOME is your Hadoop installation directory; in my case it is /home/subodh/software/hadoop-2.7.1
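Reload ~/.bashrc in the current terminal (or open a new one) so that the hadoop, hdfs and start-all.sh commands used in the later steps are available on your PATH:
subodh@subodh-Inspiron-3520:~/software$ source ~/.bashrc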

Step 4. Configure passwordless ssh
Hadoop uses ssh to access its nodes, so we have to configure passwordless ssh. We have already installed ssh; now we just need to set up key-based login that does not ask for a password.
Generate an rsa key pair with an empty passphrase using the below syntax,
subodh@subodh-Inspiron-3520:~/software$ ssh-keygen -t rsa -P ""
Once you enter the above command, it will ask you for a file name; just leave it blank and press Enter at the prompt shown below,
subodh@subodh-Inspiron-3520:~/software$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/subodh/.ssh/id_rsa): 
Once it completes successfully, it generates the below output and creates the key pair inside the .ssh directory.
subodh@subodh-Inspiron-3520:~/software$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/subodh/.ssh/id_rsa): 
Your identification has been saved in /home/subodh/.ssh/id_rsa.
Your public key has been saved in /home/subodh/.ssh/id_rsa.pub.
The key fingerprint is:
6c:d9:c4:24:b1:dd:bb:aa:15:ba:4f:4b:80:4c:55:ff subodh@subodh-Inspiron-3520
The key's randomart image is:
+--[ RSA 2048]----+
|        +oo      |
|       . * o     |
|      . . + o    |
|     o o +   o   |
|      o S o . E  |
|       . o . .   |
|        . + .    |
|         = o     |
|        oo+      |
+-----------------+
Now append the newly created public key to the authorized keys using the below syntax, so that ssh will not ask for a password.
subodh@subodh-Inspiron-3520:~/software$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
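On some systems ssh refuses the key unless the authorized_keys file has restricted permissions, so it is a good idea to tighten them, as the official Hadoop single-node setup guide also does:
subodh@subodh-Inspiron-3520:~/software$ chmod 0600 $HOME/.ssh/authorized_keys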
Now verify the ssh configuration using the below syntax. If it asks for a password, the configuration did not work, so repeat the steps above. If it logs you in without asking for a password, ssh is configured successfully.
subodh@subodh-Inspiron-3520:~/software$ ssh localhost
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.19.0-25-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

Last login: Sat Jan 23 22:43:10 2016 from localhost
Wow! ssh is configured successfully :)
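Type exit to close this ssh session and come back to your local shell before moving on:
subodh@subodh-Inspiron-3520:~$ exit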

Step 5. Do Some Configuration
Edit these files, which exist inside your installed Hadoop directory -
  • /home/subodh/software/hadoop-2.7.1/etc/hadoop/hadoop-env.sh
  • /home/subodh/software/hadoop-2.7.1/etc/hadoop/core-site.xml
  • /home/subodh/software/hadoop-2.7.1/etc/hadoop/mapred-site.xml.template
  • /home/subodh/software/hadoop-2.7.1/etc/hadoop/hdfs-site.xml
i. Tell Hadoop where your Java is installed by editing the hadoop-env.sh file
subodh@subodh-Inspiron-3520:~/software$ vi /home/subodh/software/hadoop-2.7.1/etc/hadoop/hadoop-env.sh 
Place export JAVA_HOME=/home/subodh/software/jdk1.8.0_71 (adjust the path to your own JDK installation) inside hadoop-env.sh and save it.

ii. Edit core-site.xml (this file contains core properties such as the URL of the HDFS file system).
subodh@subodh-Inspiron-3520:~/software$ vi /home/subodh/software/hadoop-2.7.1/etc/hadoop/core-site.xml
Place the below property inside the <configuration> tag.
<configuration>
  <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
        <description>The name of the default file system.  A URI whose
          scheme and authority determine the FileSystem implementation.  The
          uri's scheme determines the config property (fs.SCHEME.impl) naming
          the FileSystem implementation class.  The uri's authority is used to
          determine the host, port, etc. for a filesystem.</description>
    </property>
</configuration>
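A small note on the property name: fs.default.name is the old Hadoop 1.x name and is marked deprecated in Hadoop 2.x in favor of fs.defaultFS. Both still work in 2.7.1, but if you prefer the newer name, the equivalent entry would be:
  <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>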
iii. Copy mapred-site.xml.template to mapred-site.xml with the below command.
subodh@subodh-Inspiron-3520:~/software/hadoop-2.7.1/etc/hadoop$ pwd
/home/subodh/software/hadoop-2.7.1/etc/hadoop
subodh@subodh-Inspiron-3520:~/software/hadoop-2.7.1/etc/hadoop$ cp mapred-site.xml.template mapred-site.xml
Now, edit mapred-site.xml
subodh@subodh-Inspiron-3520:~/software$ vi /home/subodh/software/hadoop-2.7.1/etc/hadoop/mapred-site.xml
Place the below property inside the <configuration> tag -
<configuration>
 <property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
 </property>
</configuration>
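A note for Hadoop 2.x: mapred.job.tracker is the old Hadoop 1.x JobTracker property. If you want MapReduce jobs to actually run on the YARN daemons that start-all.sh brings up later, the official pseudo-distributed setup instead puts the following in mapred-site.xml (together with yarn.nodemanager.aux-services set to mapreduce_shuffle in yarn-site.xml):
<configuration>
 <property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
 </property>
</configuration>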

iv. Let's create two folders, namenode and datanode, anywhere on the local file system; I am using the below paths, and the command to create them follows.
/home/subodh/hadoop_data/hdfs/namenode
/home/subodh/hadoop_data/hdfs/datanode
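Assuming the same paths as above, both folders can be created in one go with mkdir -p:
subodh@subodh-Inspiron-3520:~$ mkdir -p /home/subodh/hadoop_data/hdfs/namenode /home/subodh/hadoop_data/hdfs/datanode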

v. Edit hdfs-site.xml
subodh@subodh-Inspiron-3520:~/software$ vi /home/subodh/software/hadoop-2.7.1/etc/hadoop/hdfs-site.xml
Place the below properties inside the <configuration> tag -
<configuration>
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/home/subodh/hadoop_data/hdfs/namenode</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/home/subodh/hadoop_data/hdfs/datanode</value>
 </property>
</configuration>

Step 6. Format the New Hadoop File System
Before starting Hadoop for the first time, format HDFS; this initializes the namenode metadata directory we configured in hdfs-site.xml.
subodh@subodh-Inspiron-3520:~/software$ hdfs namenode -format
Step 7. Start Hadoop (Type start-all.sh on the command prompt)
subodh@subodh-Inspiron-3520:~/software$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
16/01/24 00:56:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/subodh/software/hadoop-2.7.1/logs/hadoop-subodh-namenode-subodh-Inspiron-3520.out
localhost: starting datanode, logging to /home/subodh/software/hadoop-2.7.1/logs/hadoop-subodh-datanode-subodh-Inspiron-3520.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is e4:33:d0:0c:6e:96:d1:eb:81:37:98:24:e6:dc:23:99.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/subodh/software/hadoop-2.7.1/logs/hadoop-subodh-secondarynamenode-subodh-Inspiron-3520.out
16/01/24 00:57:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /home/subodh/software/hadoop-2.7.1/logs/yarn-subodh-resourcemanager-subodh-Inspiron-3520.out
localhost: starting nodemanager, logging to /home/subodh/software/hadoop-2.7.1/logs/yarn-subodh-nodemanager-subodh-Inspiron-3520.out
Step 8. Verify whether Hadoop is running (Type jps)
subodh@subodh-Inspiron-3520:~/software$ jps
6021 DataNode
6807 Jps
5866 NameNode
6220 SecondaryNameNode
6381 ResourceManager
6510 NodeManager

If you see all of the above processes, it means all the Hadoop components are running properly.
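If any of the daemons is missing from the jps output, look into its log file under $HADOOP_HOME/logs (the same directory the start-all.sh output above points to) to find out why it failed to start:
subodh@subodh-Inspiron-3520:~/software$ ls $HADOOP_HOME/logs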

Step 9. Access the NameNode web UI
Open your favorite browser and type http://localhost:50070/dfshealth.html#tab-overview
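Similarly, you can open the YARN ResourceManager web UI at http://localhost:8088 to have a look at the YARN daemons started above.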

Step 10. Stop Hadoop (Type stop-all.sh)
subodh@subodh-Inspiron-3520:~/software$ stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
16/01/24 01:23:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
16/01/24 01:23:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop

That's it. Congratulations, your Hadoop single node cluster setup is done :)