In this tutorial, I am going to show you how to install Hadoop in pseudo-distributed mode. This tutorial walks through the step-by-step installation of an Apache Hadoop single node cluster.
If you are very new to the Big Data concept, then of course you can go back and get a glimpse of Big Data at What is Big Data.
Apache Hadoop 2.7.1 Single Node Cluster Setup
Prerequisites for this tutorial -
- Linux operating system must be installed (I am using Ubuntu 14.x)
- Java must be installed (I am using Java 8)
- ssh must be installed and sshd must be running.
Step 1. Verify Java

Type java -version on the terminal. If you get output like the one below, Java is installed.

subodh@subodh-Inspiron-3520:~$ java -version
java version "1.8.0_71"
Java(TM) SE Runtime Environment (build 1.8.0_71-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.71-b15, mixed mode)

If Java is not installed, then install it first; you can follow the step-by-step Java installation on Ubuntu guide to install Java.
Step 2. Verify ssh and sshd
Type which ssh and which sshd. If you get output like the below, they are available and working fine.

subodh@subodh-Inspiron-3520:~$ which ssh
/usr/bin/ssh
subodh@subodh-Inspiron-3520:~$ which sshd
/usr/sbin/sshd

If not, then install them using the command below -

subodh@subodh-Inspiron-3520:~$ sudo apt-get install ssh

Once installed, verify again using the above commands.
Step 3. Download & Install Hadoop
You can download the latest Hadoop release from the official Apache site. Alternatively, you can download it from the command line; below is the syntax for downloading it with the wget command.

subodh@subodh-Inspiron-3520:~/software$ wget http://redrockdigimark.com/apachemirror/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz

The download will take some time, depending on your internet connection speed. Once it has finished, unpack the archive using the command below.

subodh@subodh-Inspiron-3520:~/software$ tar -xzf hadoop-2.7.1.tar.gz

Now set HADOOP_HOME and the related environment variables inside the ~/.bashrc file.

subodh@subodh-Inspiron-3520:~/software$ vi ~/.bashrc

The above opens the file in the vi editor; place the statements below inside it and save the file.

# hadoop installed directory
export HADOOP_HOME=/home/subodh/software/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

Note - The value of HADOOP_HOME is your Hadoop installation directory; in my case it is /home/subodh/software/hadoop-2.7.1.
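To make these variables take effect in your current shell, reload ~/.bashrc and confirm that the hadoop command is found. A minimal check, assuming the paths above:

source ~/.bashrc
# prints the Hadoop version and build details if HADOOP_HOME/bin is on the PATH
hadoop version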
Step 4. Configure passphrase-less ssh

Hadoop uses ssh to access its nodes, so we now have to configure passwordless ssh. We have already installed ssh; we just need to set up key-based, passwordless login.

Configure passphrase-less ssh using the syntax below.

subodh@subodh-Inspiron-3520:~/software$ ssh-keygen -t rsa -P ""

Once you enter the above command it will ask you for a file name; just leave it blank and press Enter, as in the message below.

subodh@subodh-Inspiron-3520:~/software$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/subodh/.ssh/id_rsa):

Once it succeeds, it will generate the output below and create the key files under the .ssh directory.

subodh@subodh-Inspiron-3520:~/software$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/subodh/.ssh/id_rsa):
Your identification has been saved in /home/subodh/.ssh/id_rsa.
Your public key has been saved in /home/subodh/.ssh/id_rsa.pub.
The key fingerprint is:
6c:d9:c4:24:b1:dd:bb:aa:15:ba:4f:4b:80:4c:55:ff subodh@subodh-Inspiron-3520
The key's randomart image is:
+--[ RSA 2048]----+
|       +oo       |
|      . * o      |
|     . . + o     |
|      o o + o    |
|     o S o . E   |
|      . o . .    |
|       . + .     |
|        = o      |
|       oo+       |
+-----------------+

Now add the just-created key to the authorized keys using the syntax below, so that you can use ssh without being asked for a password.

subodh@subodh-Inspiron-3520:~/software$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Now verify the ssh configuration using the syntax below. If it asks for a password, the ssh configuration was not set up properly, so go through the steps again. If it does not ask for a password, you have configured ssh successfully.

subodh@subodh-Inspiron-3520:~/software$ ssh localhost
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.19.0-25-generic x86_64)
 * Documentation:  https://help.ubuntu.com/
Last login: Sat Jan 23 22:43:10 2016 from localhost

Wow! ssh is configured successfully :)
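If ssh localhost still prompts for a password, a common cause is overly permissive permissions on the .ssh directory or the authorized_keys file, which makes OpenSSH ignore the key. A hedged fix (not from the original post):

chmod 700 $HOME/.ssh                   # the .ssh directory must not be writable by others
chmod 600 $HOME/.ssh/authorized_keys   # the key file should be readable/writable only by its owner

Also note that a successful ssh localhost drops you into a new shell; type exit to return to your original session before continuing.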
Step 5. Do Some Configuration
Edit the following files, which exist inside your installed Hadoop directory:
- /home/subodh/software/hadoop-2.7.1/etc/hadoop/hadoop-env.sh
- /home/subodh/software/hadoop-2.7.1/etc/hadoop/core-site.xml
- /home/subodh/software/hadoop-2.7.1/etc/hadoop/mapred-site.xml.template
- /home/subodh/software/hadoop-2.7.1/etc/hadoop/hdfs-site.xml
i. Edit hadoop-env.sh

subodh@subodh-Inspiron-3520:~/software$ vi /home/subodh/software/hadoop-2.7.1/etc/hadoop/hadoop-env.sh

Place export JAVA_HOME=/home/subodh/software/jdk1.8.0_71 inside hadoop-env.sh and save it.
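If java is already on your PATH but you are unsure where the JDK actually lives, one way to find the right JAVA_HOME value (an illustrative check, not from the original post) is:

readlink -f "$(which java)"
# prints the resolved path of the java binary, e.g. /usr/lib/jvm/<your-jdk>/bin/java;
# JAVA_HOME is the directory that contains that bin directory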
ii. Edit core-site.xml (this file contains a few properties related to HDFS, such as the HDFS URL).

subodh@subodh-Inspiron-3520:~/software$ vi /home/subodh/software/hadoop-2.7.1/etc/hadoop/core-site.xml

Place the property below inside the configuration tag.

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>

iii. Rename mapred-site.xml.template to mapred-site.xml with the command below.
subodh@subodh-Inspiron-3520:~/software/hadoop-2.7.1/etc/hadoop$ pwd
/home/subodh/software/hadoop-2.7.1/etc/hadoop
subodh@subodh-Inspiron-3520:~/software/hadoop-2.7.1/etc/hadoop$ cp mapred-site.xml.template mapred-site.xml

Now, edit mapred-site.xml.

subodh@subodh-Inspiron-3520:~/software$ vi /home/subodh/software/hadoop-2.7.1/etc/hadoop/mapred-site.xml

And place the below configuration inside the configuration tag -

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
</configuration>
iv. Let's create two folders, namenode and datanode; they can be placed anywhere, for example -

/home/subodh/hadoop_data/hdfs/namenode
/home/subodh/hadoop_data/hdfs/datanode
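One way to create both folders in a single step, using the paths above:

mkdir -p /home/subodh/hadoop_data/hdfs/namenode
mkdir -p /home/subodh/hadoop_data/hdfs/datanode
# -p creates the intermediate directories (hadoop_data, hdfs) as needed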
v. Edit hdfs-site.xml
subodh@subodh-Inspiron-3520:~/software$ vi /home/subodh/software/hadoop-2.7.1/etc/hadoop/hdfs-site.xml

And place the below configuration inside the configuration tag -

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/subodh/hadoop_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/subodh/hadoop_data/hdfs/datanode</value>
  </property>
</configuration>
Step 6. Format New Hadoop File System
subodh@subodh-Inspiron-3520:~/software$ hdfs namenode -format

This initializes the HDFS metadata in the namenode directory configured above; it only needs to be run once, when the cluster is first set up.

Step 7. Start Hadoop (Type start-all.sh on the command prompt)
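Note that start-all.sh is deprecated in Hadoop 2.x; as the script itself reports in the output below, the equivalent two-step startup is:

start-dfs.sh    # starts the NameNode, DataNode and SecondaryNameNode daemons
start-yarn.sh   # starts the ResourceManager and NodeManager daemons

Either way works for a single node setup; the output below is from start-all.sh.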
subodh@subodh-Inspiron-3520:~/software$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
16/01/24 00:56:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/subodh/software/hadoop-2.7.1/logs/hadoop-subodh-namenode-subodh-Inspiron-3520.out
localhost: starting datanode, logging to /home/subodh/software/hadoop-2.7.1/logs/hadoop-subodh-datanode-subodh-Inspiron-3520.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is e4:33:d0:0c:6e:96:d1:eb:81:37:98:24:e6:dc:23:99.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/subodh/software/hadoop-2.7.1/logs/hadoop-subodh-secondarynamenode-subodh-Inspiron-3520.out
16/01/24 00:57:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /home/subodh/software/hadoop-2.7.1/logs/yarn-subodh-resourcemanager-subodh-Inspiron-3520.out
localhost: starting nodemanager, logging to /home/subodh/software/hadoop-2.7.1/logs/yarn-subodh-nodemanager-subodh-Inspiron-3520.out

Step 8. Verify Hadoop is running or not (Type jps)

subodh@subodh-Inspiron-3520:~/software$ jps
6021 DataNode
6807 Jps
5866 NameNode
6220 SecondaryNameNode
6381 ResourceManager
6510 NodeManager
If you see the above output, all of the Hadoop components are running properly.
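As an extra sanity check (not part of the original post), you can run a couple of HDFS commands against the running cluster; the directory name is only an example:

hdfs dfs -mkdir -p /user/subodh   # create a home directory in HDFS (illustrative path)
hdfs dfs -ls /                    # list the HDFS root; a listing confirms the NameNode is serving requests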
Step 9. Access the NameNode web UI
Open your favorite browser and type http://localhost:50070/dfshealth.html#tab-overview
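If you are working on a headless machine, you can confirm from the terminal that the UI is up (an illustrative check, not from the original post):

wget -qO- http://localhost:50070/ | head
# any HTML output means the NameNode web UI is listening on port 50070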
[Screenshot: NameNode web UI overview page]
Step 10. Stop Hadoop (Type stop-all.sh)
subodh@subodh-Inspiron-3520:~/software$ stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
16/01/24 01:23:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
16/01/24 01:23:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop
That's it. Congratulations, your Hadoop single node cluster setup is done :)