CLOUDY THOUGHTS!!!: Installation setup of Hadoop single node cluster on Ubuntu

Installing Hadoop as a single-node cluster on Ubuntu.

Many people are eager to work with big data, NoSQL, and the MapReduce framework these days. I was given the opportunity to work with big data during my summer internship. We used Hadoop for MapReduce, and in this post I would like to walk through the installation of a Hadoop single-node cluster on Ubuntu.

Run the following commands on your Linux machine and enjoy Hadoop!

Install Java 6

$ sudo add-apt-repository ppa:ferramroberto/java
$ sudo apt-get update
$ sudo apt-get install sun-java6-jdk

Select Sun Java 6 as the system default:

$ sudo update-java-alternatives -s java-6-sun

Check the version:

$ java -version

Install SSH

$ sudo apt-get install openssh-client
$ sudo apt-get install openssh-server

Configuring SSH

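Generate an RSA key pair with an empty passphrase, so that the Hadoop start scripts can log in without prompting:

$ ssh-keygen -t rsa -P ""

Authorize the new key for logins to this machine, then test the connection:

$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ ssh localhost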
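
If you expect to repeat the SSH setup, the steps above can be wrapped so that they are safe to re-run. This is only a sketch, assuming OpenSSH's ssh-keygen is installed; the function name setup_local_ssh and its directory argument are my own invention, not part of Hadoop or OpenSSH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a passphrase-less RSA key (only if none exists)
# and authorize it for logins to this machine. The directory argument is
# illustrative and defaults to ~/.ssh.
setup_local_ssh() {
    local ssh_dir="${1:-$HOME/.ssh}"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"

    # Generate the key only once, so re-running does not overwrite it.
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -P "" -f "$ssh_dir/id_rsa"
    fi

    # Append the public key only if it is not already authorized.
    touch "$ssh_dir/authorized_keys"
    if ! grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys"; then
        cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    fi

    # With its default StrictModes setting, sshd rejects key files that
    # other users can write to, so tighten the permissions.
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Call setup_local_ssh with no arguments to work on ~/.ssh, then run ssh localhost as before to confirm that no password prompt appears.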

Download the Hadoop package from http://www.apache.org/dyn/closer.cgi/hadoop/core and make the following changes.

Extract the Hadoop package from the downloaded tar file.

Change the permissions of the extracted Hadoop directory (chmod -R 777 (hadoop_package_name)).

NOTE: Sometimes this does not take effect for all of the internal files, in which case you have to right-click the folder and change the permissions for all of the internal files.

Set the relevant environmental variables.

$ export HADOOP_HOME=(path of your Hadoop package)
$ export JAVA_HOME=/usr/lib/jvm/java-6-sun
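
Note that export only lasts for the current shell session. One way to keep the variables across logins is to append the lines to your shell profile. Here is a minimal sketch, assuming a bash setup that reads ~/.bashrc; the helper name persist_export is hypothetical.

```shell
# Hypothetical helper: append an export line to a shell profile unless the
# exact same line is already there. Both arguments are illustrative.
persist_export() {
    local line="$1"
    local profile="${2:-$HOME/.bashrc}"
    touch "$profile"
    grep -qxF "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}

# Example usage (uncomment to apply to ~/.bashrc):
# persist_export 'export JAVA_HOME=/usr/lib/jvm/java-6-sun'
```

HADOOP_HOME can be stored the same way once you know where you extracted the package. Open a new terminal, or run source ~/.bashrc, for the change to take effect.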

In Hadoop/conf/hadoop-env.sh

Set the JAVA_HOME line as follows:

# The java implementation to use. Required.

export JAVA_HOME=/usr/lib/jvm/java-6-sun

In Hadoop/conf/core-site.xml

Make the following changes (the properties go inside the <configuration> element):

<!-- In: conf/core-site.xml -->
<property>
<name>hadoop.tmp.dir</name>
<value>(path where you want to create the cluster)</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system.  A URI whose scheme and authority determine the FileSystem implementation.  The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class.  The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
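
For reference, here is a sketch of what a complete, minimal core-site.xml could look like, written out from the shell. The function name write_core_site and its two arguments are hypothetical, the temporary directory is a placeholder you have to choose yourself, and I have left out the descriptions for brevity.

```shell
# Hypothetical helper: write a minimal core-site.xml into the given conf
# directory. This overwrites any existing core-site.xml there.
#   $1 = Hadoop conf directory, $2 = value for hadoop.tmp.dir
write_core_site() {
    local conf_dir="$1"
    local tmp_dir="$2"
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$tmp_dir</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
EOF
}
```

Back up your existing file first if you have already edited it by hand.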

In Hadoop/conf/mapred-site.xml

Make the following changes (the properties go inside the <configuration> element):

<!-- In: conf/mapred-site.xml -->
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.
</description>
</property>

In Hadoop/conf/hdfs-site.xml

Make the following changes (the properties go inside the <configuration> element):

<!-- In: conf/hdfs-site.xml -->
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.
</description>
</property>

Getting Started with Hadoop

To format your NameNode

$ .../Hadoop/bin/hadoop namenode -format

To start your single-node cluster

$ .../Hadoop/bin/start-all.sh
(or)
$ .../Hadoop/bin/start-dfs.sh
$ .../Hadoop/bin/start-mapred.sh
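
To confirm that everything came up, you can list the running Java processes with jps, which ships with the JDK. On a Hadoop 1.x single-node setup I would expect NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker to appear. The sketch below reports any that are missing; the function name missing_daemons is hypothetical.

```shell
# Hypothetical helper: read `jps` output on stdin and print the name of every
# expected Hadoop daemon that is not listed. Prints nothing if all are running.
missing_daemons() {
    local listing daemon
    listing="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # jps prints "<pid> <main class>"; -w matches the class name as a whole
        # word, so SecondaryNameNode does not count as NameNode.
        printf '%s\n' "$listing" | grep -qw "$daemon" || echo "$daemon"
    done
}

# Example usage on a machine with the JDK installed:
# jps | missing_daemons
```

If a daemon is reported as missing, the log files under the Hadoop logs directory are the first place to look.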

Running a sample application on MapReduce (word count program)

Download the plain-text versions of the following books and save them in .../input/

http://www.gutenberg.org/ebooks/20417
http://www.gutenberg.org/ebooks/5000
http://www.gutenberg.org/ebooks/4300

Copy the input folder to the Hadoop Distributed File System (HDFS)

Here /user/hduser/input is the path that will be created in HDFS.

$ bin/hadoop dfs -copyFromLocal .../input /user/hduser/input

To run the word count program

Here each * is a wildcard that matches the version string in the name of your Hadoop examples jar.

$ bin/hadoop jar hadoop*examples*.jar wordcount /user/hduser/input /user/hduser/output
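
If you want a feel for what the job computes before running it on the cluster, the same count can be approximated locally with standard Unix tools. This is only a sketch: it assumes the example splits words on whitespace and is case-sensitive, which is how I understand the bundled wordcount to behave, and the function name local_wordcount is hypothetical.

```shell
# Hypothetical helper: count whitespace-separated words in the given files and
# print "word<TAB>count" lines sorted by word, similar to the job output.
local_wordcount() {
    cat "$@" |
        tr -s '[:space:]' '\n' |   # put one word on each line
        grep -v '^$' |             # drop a possible empty first line
        LC_ALL=C sort |            # group identical words together
        uniq -c |                  # prefix each distinct word with its count
        awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Example usage on the downloaded books:
# local_wordcount .../input/*
```

On large inputs this runs on a single machine only, which is exactly the limitation that MapReduce is meant to remove.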

Retrieve the job results from HDFS

$ bin/hadoop dfs -cat /user/hduser/output/part-r-00000
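
The output is one word per line followed by a tab and its count, sorted by word, so the most frequent words are not easy to spot. A small filter can help. This is a sketch, and the function name top_words is hypothetical.

```shell
# Hypothetical helper: read "word<TAB>count" lines on stdin and print the N
# most frequent words (default 10), breaking ties alphabetically.
top_words() {
    local n="${1:-10}"
    LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "$n"
}

# Example usage against the job output:
# bin/hadoop dfs -cat /user/hduser/output/part-r-00000 | top_words 20
```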

Hadoop web interfaces

http://localhost:50070/ - Web UI of the NameNode (browse the HDFS filesystem and log files).

http://localhost:50030/ - Web UI of the JobTracker.

http://localhost:50060/ - Web UI of the TaskTracker.

Once this is done, you can work with your single-node Hadoop cluster on Ubuntu. Have fun!

Cheers,

Kiran.

Saturday, 14 July 2012
