Saturday, 14 July 2012

Installation setup of Hadoop single node cluster on Ubuntu


Many people are eager to work with big data, NoSQL, and the MapReduce framework these days. During my summer internship I had the opportunity to work with big data. We used Hadoop as our MapReduce framework, and in this post I describe how to set up a Hadoop single-node cluster on Ubuntu.

Run the following commands on your Linux machine and enjoy Hadoop!!!

Install Java 6 

$ sudo add-apt-repository ppa:ferramroberto/java
$ sudo apt-get update
$ sudo apt-get install sun-java6-jdk
$ sudo update-java-alternatives -s java-6-sun (sets Java 6 as the system default)
Check the version: $ java -version

Install SSH

$ sudo apt-get install openssh-client
$ sudo apt-get install openssh-server

Configuring SSH


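Generate an RSA key pair with an empty passphrase, authorize the public key for the local machine, and test the login:

$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ ssh localhost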

Download the Hadoop package from http://www.apache.org/dyn/closer.cgi/hadoop/core and make the following changes.

Extract the Hadoop package from the downloaded tar file.
Change the permissions of the Hadoop package directory (chmod -R 777 hadoop_package_name).
NOTE: Sometimes this does not take effect for all internal files, so you have to right-click the folder and change the permissions for all the internal files.

Set the relevant environment variables.

$ export HADOOP_HOME=(path of your Hadoop package)
$ export JAVA_HOME=/usr/lib/jvm/java-6-sun
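
Exports typed at the prompt last only for the current shell session, so it is worth confirming that both variables are really set before going further. The helper below is a minimal sketch of such a check; the function name check_hadoop_env is made up for this example and is not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): verify that JAVA_HOME and
# HADOOP_HOME are set and that each one points to an existing directory.
check_hadoop_env() {
    rc=0
    for name in JAVA_HOME HADOOP_HOME; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            rc=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value"
            rc=1
        else
            echo "$name=$value"
        fi
    done
    return $rc
}

# Usage (after the two export commands): check_hadoop_env
```

The function returns a non-zero status if either variable is missing, so it can also be used in a script before the Hadoop commands are called.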

In Hadoop/conf/hadoop-env.sh

Update the following line as shown (inside the file there is no $ prompt):
# The java implementation to use. Required.

export JAVA_HOME=/usr/lib/jvm/java-6-sun

In Hadoop/conf/core-site.xml

Make the following changes

In Hadoop/conf/mapred-site.xml

Make the following changes

In Hadoop/conf/hdfs-site.xml

Make the following changes
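
The exact property values for the three files above are not reproduced here. As a minimal sketch, assuming the usual Hadoop 1.x pseudo-distributed settings (fs.default.name, mapred.job.tracker, and dfs.replication), the files could be written as follows. The host and port values hdfs://localhost:9000 and localhost:9001 are assumptions, not values taken from this post, and the function name write_hadoop_conf is hypothetical.

```shell
#!/bin/sh
# Sketch only: writes minimal single-node settings for Hadoop 1.x.
# Host/port values are assumptions; adjust them to your own setup.
# WARNING: overwrites the three files in the given directory.
write_hadoop_conf() {
    conf_dir=$1   # for example "$HADOOP_HOME/conf"

    # core-site.xml: URI of the default file system (the NameNode).
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

    # mapred-site.xml: host and port of the JobTracker.
    cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF

    # hdfs-site.xml: one replica per block, since there is only one DataNode.
    cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}

# Usage: write_hadoop_conf "$HADOOP_HOME/conf"
```

Setting dfs.replication to 1 matters on a single node: the default of 3 cannot be satisfied with one DataNode, and HDFS would report blocks as under-replicated.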

Getting Started with Hadoop

To format your NameNode

$ .../Hadoop/bin/hadoop namenode -format

To start your single-node cluster

$ .../Hadoop/bin/start-all.sh
              (or)
$ .../Hadoop/bin/start-dfs.sh
$ .../Hadoop/bin/start-mapred.sh
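
Once the start scripts return, the JDK's jps tool lists the running Java processes. On a Hadoop 1.x single-node cluster the five daemons NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker should all appear. The helper below is a sketch of how that check could be automated; the function name check_daemons is hypothetical, and it reads the jps output from standard input.

```shell
#!/bin/sh
# Hypothetical helper: reads `jps` output on stdin and reports which of the
# Hadoop 1.x daemons are missing. Returns non-zero if any daemon is missing.
check_daemons() {
    listing=$(cat)
    missing=0
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # jps prints "<pid> <ClassName>". Match whole words only, so that
        # "NameNode" is not satisfied by "SecondaryNameNode".
        if ! printf '%s\n' "$listing" | grep -qw "$daemon"; then
            echo "not running: $daemon"
            missing=1
        fi
    done
    return $missing
}

# Usage: jps | check_daemons
```

If a daemon is reported as not running, its log file under the Hadoop logs directory is usually the first place to look.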



Running a sample application on MapReduce (word count program)

Download the plain-text files from the following links and save them in .../input/

http://www.gutenberg.org/ebooks/20417
http://www.gutenberg.org/ebooks/5000
http://www.gutenberg.org/ebooks/4300

Copy the input folder to the Hadoop Distributed File System (HDFS)

Here /user/hduser/input is the path that will be created in the HDFS.

$ bin/hadoop dfs -copyFromLocal .../input /user/hduser/input

To run the word count program

Here each * is a wildcard that matches the version of your Hadoop release in the jar file name.

$ bin/hadoop jar hadoop*examples*.jar wordcount /user/hduser/input /user/hduser/output


Retrieve the job results from HDFS

$ bin/hadoop dfs -cat /user/hduser/output/part-r-00000
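
The word count output has one line per word, with the word and its count separated by a tab, and it is sorted by word rather than by count. To see the most frequent words, the output can be piped through sort and head. The sketch below wraps that in a small function; the name top_words is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: reads "word<TAB>count" lines on stdin and prints the
# N lines with the highest counts (default N = 10).
top_words() {
    n=${1:-10}
    # Use a tab as the field separator and sort on the second field
    # (the count) numerically, in descending order.
    sort -t "$(printf '\t')" -k2,2nr | head -n "$n"
}

# Usage:
#   bin/hadoop dfs -cat /user/hduser/output/part-r-00000 | top_words 20
```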


Hadoop web interfaces

http://localhost:50070/ - Web UI of the NameNode (browse the HDFS filesystem and the log files).

http://localhost:50030/ - Web UI of the JobTracker.

http://localhost:50060/ - Web UI of the TaskTracker.


Once this is done, you can work with your Hadoop single-node cluster on Ubuntu. Have fun!!!

Cheers,
Kiran.

2 comments:

  1. Great Work!! Successfully installed hadoop and ran the word count program.