Installing Hadoop as a Single-Node Cluster on Ubuntu
Many people today are eager to work with big data, NoSQL and the MapReduce framework. During my summer internship I also had the opportunity to work on big data. We used Hadoop as our MapReduce framework, and in this post I describe how to set up a single-node Hadoop cluster on Ubuntu.
Run the following commands on your Linux machine and enjoy Hadoop!
Install Java 6
$sudo add-apt-repository ppa:ferramroberto/java
$sudo apt-get update
$sudo apt-get install sun-java6-jdk
$sudo update-java-alternatives -s java-6-sun (to make Sun Java 6 the default Java)
Check the version: $java -version
Install SSH
$sudo apt-get install openssh-client
$sudo apt-get install openssh-server
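$ssh-keygen -t rsa -P "" (creates an RSA key pair with an empty passphrase, so Hadoop can log in to localhost without a password)
$cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys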
$ssh localhost
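The key setup above can also be written as a small script that is safe to re-run. This is a sketch assuming OpenSSH is installed and the default key location ~/.ssh/id_rsa is used: it generates a key only if none exists yet, never appends the same public key twice, and tightens the file permissions, because sshd normally ignores an authorized_keys file that other users can write to.

```shell
#!/bin/sh
# Idempotent passwordless-SSH setup for localhost (sketch; assumes OpenSSH is installed).
SSH_DIR="$HOME/.ssh"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"

# Generate an RSA key pair with an empty passphrase, only if there is none yet.
if [ ! -f "$SSH_DIR/id_rsa" ]; then
  ssh-keygen -t rsa -P "" -f "$SSH_DIR/id_rsa" -q
fi

# Authorize the public key for logins to this machine, without adding duplicates.
touch "$SSH_DIR/authorized_keys"
if ! grep -qxF "$(cat "$SSH_DIR/id_rsa.pub")" "$SSH_DIR/authorized_keys"; then
  cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
fi

# sshd refuses authorized_keys files that are writable by other users.
chmod 600 "$SSH_DIR/authorized_keys"
```

Afterwards, ssh localhost should log in without asking for a password (the first connection still asks you to accept the host key).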
Download the Hadoop package from http://www.apache.org/dyn/closer.cgi/hadoop/core and make the following changes.
Extract the Hadoop package from the downloaded tar file.
Change the permissions of the Hadoop package directory: $chmod -R 777 (hadoop_package_name)
NOTE: Sometimes this does not change the permissions of every file inside the folder, so you may have to right-click the folder and change the permissions for all the enclosed files.
Set the relevant environment variables.
$export HADOOP_HOME=(path of your Hadoop package)
$export JAVA_HOME=/usr/lib/jvm/java-6-sun
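Variables set with export only last for the current terminal session. A minimal sketch for making them permanent, assuming bash reads ~/.bashrc for your interactive shells; /usr/local/hadoop is a hypothetical location standing in for the path of your Hadoop package.

```shell
#!/bin/sh
# Persist the Hadoop environment variables for future shells (sketch).
# HADOOP_DIR is a hypothetical path: replace it with the path of your Hadoop package.
HADOOP_DIR="/usr/local/hadoop"
PROFILE="$HOME/.bashrc"
touch "$PROFILE"

# Append each export line only if an identical line is not already present.
for line in \
  "export HADOOP_HOME=$HADOOP_DIR" \
  "export JAVA_HOME=/usr/lib/jvm/java-6-sun"
do
  grep -qxF "$line" "$PROFILE" || echo "$line" >> "$PROFILE"
done
```

Open a new terminal, or run . ~/.bashrc, so the current shell picks up the variables.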
In Hadoop/conf/hadoop-env.sh
update the following lines to read:
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
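The same edit can be made without opening an editor. A sketch assuming GNU sed (as shipped with Ubuntu) and an exported HADOOP_HOME (it falls back to the current directory otherwise): it rewrites an existing, possibly commented-out, export JAVA_HOME line, or appends one if the file has none.

```shell
#!/bin/sh
# Set JAVA_HOME in conf/hadoop-env.sh non-interactively (sketch).
# Falls back to the current directory when HADOOP_HOME is not set.
ENV_FILE="${HADOOP_HOME:-.}/conf/hadoop-env.sh"
JAVA_LINE='export JAVA_HOME=/usr/lib/jvm/java-6-sun'

# Make sure the file exists so that grep and sed have something to work on.
mkdir -p "$(dirname "$ENV_FILE")"
touch "$ENV_FILE"

if grep -q '^#* *export JAVA_HOME=' "$ENV_FILE"; then
  # Replace the (possibly commented-out) JAVA_HOME line in place; -i is GNU sed syntax.
  sed -i "s|^#* *export JAVA_HOME=.*|$JAVA_LINE|" "$ENV_FILE"
else
  # No JAVA_HOME line yet: append one.
  echo "$JAVA_LINE" >> "$ENV_FILE"
fi
```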
In Hadoop/conf/core-site.xml
Make the following changes:
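As a hedged sketch of this change: Apache's single-node setup guide for Hadoop 1.x configures pseudo-distributed mode by pointing fs.default.name at a NameNode running on localhost. The port 9000 is that guide's example value and an assumption here, not necessarily the value used in this post. The script overwrites conf/core-site.xml under HADOOP_HOME (or under the current directory if HADOOP_HOME is unset).

```shell
#!/bin/sh
# Write a minimal core-site.xml for single-node (pseudo-distributed) mode (sketch).
# Overwrites the existing file; port 9000 is an assumed example value.
CONF_DIR="${HADOOP_HOME:-.}/conf"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- URI of the default file system: the NameNode on this machine -->
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
```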
In Hadoop/conf/mapred-site.xml
Make the following changes:
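A hedged sketch of this change, following the same Hadoop 1.x pseudo-distributed example: mapred.job.tracker gives the host and port of the JobTracker. The port 9001 is that guide's example value and an assumption here. The script overwrites conf/mapred-site.xml under HADOOP_HOME (or under the current directory if HADOOP_HOME is unset).

```shell
#!/bin/sh
# Write a minimal mapred-site.xml for single-node (pseudo-distributed) mode (sketch).
# Overwrites the existing file; port 9001 is an assumed example value.
CONF_DIR="${HADOOP_HOME:-.}/conf"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Host and port of the JobTracker on this machine -->
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
```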
In Hadoop/conf/hdfs-site.xml
Make the following changes:
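A hedged sketch of this change, again following the Hadoop 1.x pseudo-distributed example: with only one DataNode, each block can be stored only once, so dfs.replication is lowered from its default of 3 to 1. The script overwrites conf/hdfs-site.xml under HADOOP_HOME (or under the current directory if HADOOP_HOME is unset).

```shell
#!/bin/sh
# Write a minimal hdfs-site.xml for a single-node cluster (sketch).
# Overwrites the existing file.
CONF_DIR="${HADOOP_HOME:-.}/conf"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- One DataNode, so keep a single copy of each block -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
```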
Getting Started with Hadoop
To format your NameNode:
$.../Hadoop/bin/hadoop namenode -format
To start your single-node cluster:
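A hedged sketch of this step, assuming a Hadoop 1.x layout in which bin/start-all.sh starts the HDFS and MapReduce daemons together and bin/stop-all.sh stops them. start_hadoop and stop_hadoop are hypothetical helper names, and jps (a tool that ships with the JDK) is used to list the running daemons.

```shell
#!/bin/sh
# Helpers to start and stop the single-node cluster (sketch).
# start_hadoop/stop_hadoop are hypothetical names; start-all.sh and stop-all.sh
# are the Hadoop 1.x scripts that start/stop all daemons.

start_hadoop() {
  hadoop_dir="${HADOOP_HOME:?set HADOOP_HOME first}"
  "$hadoop_dir/bin/start-all.sh" || return 1
  # On a healthy node, jps should list NameNode, DataNode, SecondaryNameNode,
  # JobTracker and TaskTracker.
  jps
}

stop_hadoop() {
  hadoop_dir="${HADOOP_HOME:?set HADOOP_HOME first}"
  "$hadoop_dir/bin/stop-all.sh"
}
```

Usage: start_hadoop after formatting the NameNode, and stop_hadoop when you are done.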
Running a sample MapReduce application (word count program)
Download the sample plain-text files from the following links and save them in …./input/
http://www.gutenberg.org/ebooks/20417
http://www.gutenberg.org/ebooks/5000
http://www.gutenberg.org/ebooks/4300
Copy the input folder to the Hadoop Distributed File System (HDFS).
Here /user/hduser/input is the path that will be created in HDFS.
$bin/hadoop dfs -copyFromLocal …/input /user/hduser/input
To run the word count program:
Here * stands for the version of your Hadoop release in the jar file name.
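A sketch of this step, assuming the word count program is taken from the examples jar that ships in the Hadoop directory (the * covers the version in its name). find_examples_jar and run_wordcount are hypothetical helper names, and the glob hadoop*examples*.jar is an assumption intended to match names such as hadoop-0.20.2-examples.jar and hadoop-examples-1.0.4.jar.

```shell
#!/bin/sh
# Locate the examples jar and run its wordcount job (sketch; hypothetical helper names).

# Print the path of the first examples jar in the given directory.
find_examples_jar() {
  for jar in "$1"/hadoop*examples*.jar; do
    if [ -e "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  return 1
}

# Usage: run_wordcount <hdfs-input-dir> <hdfs-output-dir>
# The output directory must not exist yet, otherwise Hadoop refuses to start the job.
run_wordcount() {
  hadoop_dir="${HADOOP_HOME:?set HADOOP_HOME first}"
  jar=$(find_examples_jar "$hadoop_dir") || {
    echo "no examples jar found in $hadoop_dir" >&2
    return 1
  }
  "$hadoop_dir/bin/hadoop" jar "$jar" wordcount "$1" "$2"
}
```

For example, run_wordcount /user/hduser/input /user/hduser/output, where /user/hduser/output is a hypothetical HDFS output directory.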
Retrieve the job results from HDFS
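A minimal sketch for reading the results, assuming the job wrote its reducer output as part-* files in a single HDFS directory, as MapReduce jobs normally do. show_job_output is a hypothetical helper name; the glob is quoted so that Hadoop, not the local shell, expands it against the HDFS directory.

```shell
#!/bin/sh
# Print the reducer output of a finished job straight from HDFS (sketch).
# Usage: show_job_output <hdfs-output-dir>
show_job_output() {
  hadoop_dir="${HADOOP_HOME:?set HADOOP_HOME first}"
  # Quoted glob: expanded by Hadoop against HDFS, not by the local shell.
  "$hadoop_dir/bin/hadoop" dfs -cat "$1/part-*"
}
```

For example, show_job_output /user/hduser/output | head -n 20. To copy the merged result to the local disk instead, Hadoop 1.x also provides bin/hadoop dfs -getmerge with an HDFS directory and a local file name.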
Hadoop web interfaces
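By default, Hadoop 1.x serves its web interfaces on fixed ports: the NameNode on 50070, the JobTracker on 50030 and the TaskTracker on 50060. A sketch that prints these addresses and reports whether each one answers, assuming the default ports are unchanged and curl is installed; check_hadoop_ui is a hypothetical helper name.

```shell
#!/bin/sh
# Print the default Hadoop 1.x web UI addresses and whether each responds (sketch).
check_hadoop_ui() {
  for entry in "NameNode:50070" "JobTracker:50030" "TaskTracker:50060"; do
    name="${entry%%:*}"   # text before the colon
    port="${entry##*:}"   # text after the colon
    # curl exits with 0 if any HTTP response came back within 5 seconds.
    if curl -s -o /dev/null --max-time 5 "http://localhost:$port/"; then
      echo "$name: http://localhost:$port/ (up)"
    else
      echo "$name: http://localhost:$port/ (not reachable)"
    fi
  done
}
```

Run check_hadoop_ui after starting the cluster, then open the reported addresses in a browser.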
Once this is done, you can work with your single-node Hadoop cluster on Ubuntu. Have fun!
Cheers,
Kiran.








Great Work!! Successfully installed hadoop and ran the word count program.
Thanks aps!!!