CLOUDY THOUGHTS!!!: July 2012

Tuesday, 17 July 2012

Transferring files to a remote desktop in Windows 7

Hi guys,

Many times while working, we need to transfer files from our local system to a remote desktop. You can always mail the files to yourself and download them on the remote machine, or download them there again from the original source. But there is a much easier way to transfer files from the local machine to a remote desktop using Windows 7. The procedure in Windows 7 is slightly different from the one in Windows XP.

Recently I happened to do this with Amazon EC2. I had to transfer files from my local system to the EC2 instance. All I had to do was:

Go to Start -> Remote Desktop Connection. The Remote Desktop Connection window opens up.

Now click on Options, which is in the lower left corner.

Go to the Local Resources tab in the following window and click on the More button at the bottom.

Now you can select the drive that contains the files.

Once you log into the instance, you can see the drive that you shared along with your other drives in My Computer of the EC2 instance.

So now you can copy files from your local drive to your instance.

Beware!!! If you log into your EC2 instance via the AWS Console you will not be able to access the drive you selected. You have to make use of the Remote Desktop Connection facility provided by Windows.

Regards,
Aps.

Sunday, 15 July 2012

Installing Apache Mahout on Hadoop Cluster

Mahout is an open source, scalable machine learning library from Apache. To process huge amounts of data, you run Mahout on top of a Hadoop cluster, although Mahout can also run as a stand-alone application. To run Mahout on a Hadoop cluster it is sufficient to install Mahout only on the Hadoop master node.
Mahout can be installed from its source distribution, which has to be built with a build tool such as Ant. Here we use Apache Maven, which builds the source and downloads the required dependencies and plug-ins from the appropriate repositories.

Installing Maven on Ubuntu:

1) Download the latest version of apache-maven from the link http://maven.apache.org/download.html and extract it.

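2) Open the /etc/environment file.

sudo gedit /etc/environment

3) Set MAVEN_HOME and add the Java and Maven bin directories to the end of the existing PATH value, with no spaces around the = sign. At login this file is read as plain NAME=value pairs, so references such as $MAVEN_HOME are not expanded there; write the full paths instead.

MAVEN_HOME="/your location/apache-maven-*"

PATH="(existing PATH value):/your java location/bin:/your location/apache-maven-*/bin"

Save the file. Log out and log in again for the changes to take effect, or load them into the current shell with the source command.

source /etc/environment

4) Maven is now installed. Check your Maven version.

mvn --version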
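If you only want Maven for the current terminal session, the same setup can be done with export instead of editing /etc/environment. This is a minimal sketch, assuming Maven was extracted to /opt/apache-maven (a hypothetical location, replace it with your own directory).

```shell
#!/bin/sh
# Sketch: put Maven on the PATH for the current shell session only.
# /opt/apache-maven is a hypothetical location; use the directory you extracted.
MAVEN_HOME="${MAVEN_HOME:-/opt/apache-maven}"
export MAVEN_HOME

# Append Maven's bin directory to PATH unless it is already there.
case ":$PATH:" in
  *":$MAVEN_HOME/bin:"*) ;;
  *) PATH="$PATH:$MAVEN_HOME/bin"; export PATH ;;
esac

# Report the version if mvn is reachable, otherwise say what to check.
if command -v mvn >/dev/null 2>&1; then
  mvn --version
else
  echo "mvn not found; check that $MAVEN_HOME/bin exists"
fi
```

Unlike /etc/environment, these settings are gone when you close the terminal, which is handy while you are still trying things out.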

Mahout Installation:

1) Download and extract mahout-distribution-* from the link http://www.apache.org/dyn/closer.cgi/mahout/ . Alternatively you can check out the Mahout source from the trunk.

svn co http://svn.apache.org/repos/asf/mahout/trunk

Then change into the extracted directory (or into the trunk directory if you used the svn checkout).

cd mahout-distribution-*

2) Now build the package using Maven.

mvn clean install

[To skip running the tests, add the option -DskipTests]

3) Mahout is now installed. You can check it by running the command

bin/mahout

4) Before running Mahout on top of Hadoop, ensure Hadoop is installed and running in any of its modes. To run Mahout on top of Hadoop, set the environment variables HADOOP_HOME and HADOOP_CONF_DIR and comment out the MAHOUT_LOCAL variable in the mahout script file.

sudo gedit bin/mahout

5) Add the following lines at the top, with no spaces around the = sign.

HADOOP_HOME=/your location/hadoop

HADOOP_CONF_DIR=/your location/hadoop/conf

6) Comment out the following line, which makes the Mahout application run locally.

#MAHOUT_LOCAL=/your location/mahout-distribution-*
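Before launching a job it can help to check which mode these variables lead to. The sketch below only restates the rule from the steps above (no HADOOP_HOME, or MAHOUT_LOCAL set, means a local run). The helper name mahout_mode is made up for this example and is not part of Mahout, and the exact checks in your copy of bin/mahout may differ.

```shell
#!/bin/sh
# Sketch: report which mode the current environment would likely select.
# mahout_mode is a hypothetical helper, not part of Mahout.
mahout_mode() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "local (HADOOP_HOME is not set)"
  elif [ -n "${MAHOUT_LOCAL:-}" ]; then
    echo "local (MAHOUT_LOCAL is set)"
  else
    echo "hadoop ($HADOOP_HOME)"
  fi
}

# Print the mode for the variables of the current shell.
mahout_mode
```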

Simple Program in Mahout:

1) This example runs the naive Bayes classifier on the 20 Newsgroups data set. Download the train and test data set from the link http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz

2) Extract the tar file. Now you need to prepare the data set so that it is accepted by the Bayes algorithm: each directory has to be scanned and each file transformed into the relevant format. The Mahout program prepare20newsgroups does this. Use the following commands to convert the train and test data.

bin/mahout prepare20newsgroups -p /your location/20news-bydate-train -o /your location/20news-train -a org.apache.mahout.vectorizer.DefaultAnalyzer -c UTF-8

bin/mahout prepare20newsgroups -p /your location/20news-bydate-test -o /your location/20news-test -a org.apache.mahout.vectorizer.DefaultAnalyzer -c UTF-8

3) Now train the classifier to create the model.

bin/mahout trainclassifier -i /your location/20news-train -o /your location/20news-model -type bayes -ng 1 -source hdfs

4) The model generated is tested with the test data set to measure the accuracy. A confusion matrix is generated for the classification.

bin/mahout testclassifier -d /your location/20news-test -m /your location/20news-model -type bayes -ng 1 -source hdfs -method parallel

[parallel - runs as a map-reduce job, sequential - runs on the local system]

5) The confusion matrix is generated as the output of the test.

That's it!!! Mahout is configured to run on Hadoop and you can play with it.

>>Venk@_7harun<<

Remote Desktop Access for Amazon EC2

Interaction with Amazon EC2 can be done via a command line SSH connection. But it's a lot easier to develop the code on the target server than it is to develop on the local machine and sync the files repeatedly. So I had to run the Eclipse IDE on an Amazon EC2 instance, which required a GUI front-end to operate.
Connecting to the desktop of the instance requires a means of sending over the X Window screen. We will use the NoMachine NX server, which allows remote GUI connections to the desktop even over low bandwidth connections.

Server-side Commands[Amazon EC2]:

1) Installation of NoMachine's NX server involves installing 3 packages that depend on each other to function. Follow the link http://www.nomachine.com/select-package.php?os=linux&id=1
to download the 3 packages appropriate for the platform. I chose the debian packages.
a) i386 - 32 bit.
b) x86_64 - 64 bit

2) Once downloaded, install the packages in the following order.
a) sudo dpkg -i nxclient_3.5.0-7_i386.deb
b) sudo dpkg -i nxnode_3.5.0-9_i386.deb
c) sudo dpkg -i nxserver_3.5.0-11_i386.deb

[*replace 'dpkg -i' by 'rpm -iv' for rpm packages]

3) Now run the nxserver install script.
a) /usr/NX/scripts/setup/nxserver --install

4) In order to connect from the client machine, the ssh keys must match between client and server. The easiest approach is to generate the key on the server using the command below. The resulting ssh key will be stored in /usr/NX/share/keys/default.id_dsa.key.
a) /usr/NX/bin/nxserver --keygen

Client-side Installation:

1) On your client machine download the nxclient from this link http://www.nomachine.com/download.php appropriate for your platform.

2) Install the package and then start nxclient to configure the connection details such as IP address, port etc.

3) Copy the generated key from the server [/usr/NX/share/keys/default.id_dsa.key]. Back on the client machine, click the Keys tab in the configuration window and paste the copied key.

4) Give the username and password for the server login.

Done!!! You should be able to start a new session on the Amazon instance.

>>Venk@_7harun<<

Saturday, 14 July 2012

How to create a Hadoop project in eclipse

Most of us prefer an IDE for development as it makes life easier. So here we will see how to create a Hadoop project in Eclipse.

Hadoop on Eclipse:

1) Follow the installation setup of hadoop single node cluster on Ubuntu (My previous blog).

2) Download Hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar from the following site

https://issues.apache.org/jira/browse/MAPREDUCE-1280

(I prefer this plugin and it works well for me).

3) Paste that jar file into /usr/lib/eclipse/plugins/

4) Run Eclipse from the terminal

$ cd /usr/lib/eclipse
$ ./eclipse -clean

5) Create a new map-reduce project

a) File->new->map/reduce project

b) Select configure Hadoop install directory or specify hadoop library location and give the pathname of hadoop folder in which you configured single node cluster.

6) From window->show view -> map-reduce locations.

a) In the Map/Reduce Locations view, set the Hadoop location by using the New Hadoop location icon (blue elephant symbol). The location name can be anything you like.

b) In General

1) Give Map/Reduce Master as

Host -> localhost

port -> 54311

2) Give HDFS Master as

Host -> localhost

port -> 54310

3) Refer to the conf/*-site.xml files of your Hadoop installation for the values.

7) Refresh the dfs locations in project explorer tab.

a) You can upload files to the DFS location and also download them.

b) This location is helpful for the input and output of Hadoop map-reduce programs.

8) Finally you can run your project on Hadoop.

Now create your own map/reduce program in Eclipse and enjoy Hadoop.

Cheers,

Kiran.

Installation setup of Hadoop single node cluster on Ubuntu

Installation of hadoop for single node cluster on Ubuntu.

Many people are eager to work with big data, NoSQL and the map/reduce framework these days. I was also given the opportunity to work with big data during my summer internship. We used Hadoop for the map/reduce framework, and I would like to write about the installation setup of a Hadoop single node cluster on Ubuntu.

Run the following commands on your Linux machine and enjoy Hadoop!!!

Install Java 6

$ sudo add-apt-repository ppa:ferramroberto/java
$ sudo apt-get update
$ sudo apt-get install sun-java6-jdk
$ sudo update-java-alternatives -s java-6-sun (sets Sun Java 6 as the default Java)
Check the version -> $ java -version

Install SSH

$ sudo apt-get install openssh-client
$ sudo apt-get install openssh-server

Configuring SSH

$ ssh-keygen -t rsa -P ""

$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ ssh localhost
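Running the cat command above a second time adds the same key twice. A small sketch that appends the key only when it is missing and tightens the file permissions could look like this. The helper name add_authorized_key is made up for this example.

```shell
#!/bin/sh
# Sketch: append a public key to authorized_keys only if it is not there yet.
# add_authorized_key is a hypothetical helper name.
add_authorized_key() {
  pub_key_file=$1
  auth_file=$2
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  # -x matches whole lines, -F treats the key as a literal string.
  if ! grep -qxF -- "$(cat "$pub_key_file")" "$auth_file"; then
    cat "$pub_key_file" >> "$auth_file"
  fi
  # Keep the file private to the owner.
  chmod 600 "$auth_file"
}

# Usage, after ssh-keygen has created the key pair:
#   add_authorized_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```

After that, ssh localhost should log you in without asking for a password, which is what the Hadoop start scripts rely on.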

Download the Hadoop package from the following link http://www.apache.org/dyn/closer.cgi/hadoop/core and make the following changes.

Extract the Hadoop package from the downloaded tar file.

Change the permissions of the Hadoop package: chmod -R 777 (hadoop_package_name)

NOTE: Sometimes this does not work for all the internal files, so you have to right click the folder and change the permission for all the internal files.

Set the relevant environment variables.

$ export HADOOP_HOME=(path of your hadoop package)
$ export JAVA_HOME=/usr/lib/jvm/java-6-sun

In Hadoop/conf/hadoop-env.sh

Update the following line as

# The java implementation to use. Required.

export JAVA_HOME=/usr/lib/jvm/java-6-sun

In Hadoop/conf/core-site.xml

Add the following properties inside the <configuration> element

<!-- In: conf/core-site.xml -->
<property>
<name>hadoop.tmp.dir</name>
<value>(Path where u want to create cluster)</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system.  A URI whose scheme and authority determine the FileSystem implementation.  The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class.  The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>

In Hadoop/conf/mapred-site.xml

Add the following property inside the <configuration> element

<!-- In: conf/mapred-site.xml -->
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.
</description>
</property>

In Hadoop/conf/hdfs-site.xml

Add the following property inside the <configuration> element

<!-- In: conf/hdfs-site.xml -->
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.
</description>
</property>
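To double check the values you just entered (for example the two ports, which you need again when configuring Eclipse), a small helper can read a property back from a config file. This is a rough sketch and not a real XML parser: it assumes the name and value tags are written one per line, as in the snippets above. The helper name get_hadoop_prop is made up for this example.

```shell
#!/bin/sh
# Sketch: print the <value> that belongs to a given <name> in a Hadoop *-site.xml file.
# get_hadoop_prop is a hypothetical helper; it relies on simple line matching only.
get_hadoop_prop() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1 }
    found && match($0, /<value>.*<\/value>/) {
      # Skip the 7 characters of <value> and drop the 8 characters of </value>.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

# Usage:
#   get_hadoop_prop Hadoop/conf/core-site.xml fs.default.name
#   get_hadoop_prop Hadoop/conf/mapred-site.xml mapred.job.tracker
```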

Getting Started with Hadoop

To format your NameNode

$ .../Hadoop/bin/hadoop namenode -format

To start your single node cluster

$ .../Hadoop/bin/start-all.sh
(or)
$ .../Hadoop/bin/start-dfs.sh
$ .../Hadoop/bin/start-mapred.sh
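Once the start scripts return, the jps tool that ships with the JDK lists the running Java processes, so it is a quick way to see whether all five daemons came up. The sketch below filters that output. The helper name missing_daemons is made up for this example.

```shell
#!/bin/sh
# Sketch: read jps-style lines ("<pid> <name>") on stdin and print the
# daemons of a single node cluster that are NOT running.
# missing_daemons is a hypothetical helper name.
missing_daemons() {
  running=$(awk '{ print $2 }')
  for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
    echo "$running" | grep -qx "$d" || echo "$d"
  done
}

# Usage on the cluster machine:
#   jps | missing_daemons
```

No output means all five daemons are up. If one is listed, its log file under the Hadoop logs directory is the first place to look.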

Running a sample application on MapReduce (word count program)

Download the plain text files from the following links and save them in .../input/

http://www.gutenberg.org/ebooks/20417
http://www.gutenberg.org/ebooks/5000
http://www.gutenberg.org/ebooks/4300

Copy the input folder to the Hadoop distributed filesystem

Here /user/hduser/input is the path that will be created in HDFS.

$ bin/hadoop dfs -copyFromLocal .../input /user/hduser/input

To run the word count program

Here * stands for the version of your Hadoop in the jar file name.

$ bin/hadoop jar hadoop*examples*.jar wordcount /user/hduser/input /user/hduser/output

Retrieve the job results from HDFS

$ bin/hadoop dfs -cat /user/hduser/output/part-r-00000
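To get a feel for what the job computes, the same word count can be imitated on a small file with plain shell tools. This is only a local sketch for comparison, not how Hadoop runs it. The function name wordcount_local is made up for this example.

```shell
#!/bin/sh
# Sketch: local imitation of the word count output (word, tab, count).
# wordcount_local is a hypothetical helper name.
wordcount_local() {
  # Put every whitespace separated word on its own line, drop empty lines,
  # sort so that equal words are adjacent, then count each group.
  tr -s '[:space:]' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c |
    awk '{ print $2 "\t" $1 }'
}

# Usage:
#   wordcount_local < some-small-file.txt | head
```

For small inputs the lines should look like those in part-r-00000, with one word and its count per line.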

Hadoop web interfaces

http://localhost:50070/ - Web UI of the NameNode (browse the HDFS filesystem and log files).

http://localhost:50030/ - Web UI of the JobTracker.

http://localhost:50060/ - Web UI of the TaskTracker.

Once this is over you can work with the single node cluster of Hadoop on Ubuntu. Have fun!!!

Cheers,

Kiran.

Friday, 13 July 2012

Creating Image of an EC2 Instance

Hi,
This is my first technical article and I will try my best to keep it as technical as possible, though I have no idea to what extent I will achieve that. I will start with a simple article.

Well, recently when I was working with Amazon EC2, I had to create almost 100 instances and install MySQL, Tomcat, and other software on every instance. I would probably have spent the entire time doing it. But Amazon always makes our work easier, doesn't it? I read about creating an image of an instance and creating any number of instances from that image. The created instances will be exactly like the one from which the image was created. When I 'googled' how to create Amazon EC2 instance images, nothing much turned up. Finally it was a simple right click that did the trick.

So when you want to create an image of an instance, all you have to do is log into your AWS Management Console and navigate to your EC2 tab. Right click on the instance and select Create Image (EBS AMI).

Next, navigate to the AMIs page, right click on the desired AMI and click Launch Instance.

The Request Instances Wizard opens up, where you can choose the type of instance and specify the number of instances.

This way you can exactly replicate Amazon EC2 instances using the AWS Management Console.
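If you prefer the command line over right clicks, the same two steps can also be done with the AWS command line interface. I used the console, so take this as an untested sketch: it assumes the aws tool is installed and configured, and every ID, name and instance type below is a made-up placeholder. By default the script only prints the commands; set RUN=1 to actually execute them.

```shell
#!/bin/sh
# Sketch: create an image from an instance and launch copies of it.
# All values below are hypothetical placeholders; replace them with your own.
INSTANCE_ID="i-0123456789abcdef0"
IMAGE_NAME="my-configured-server"
AMI_ID="ami-0123456789abcdef0"   # the ImageId that create-image returns
COUNT=100
INSTANCE_TYPE="t2.micro"

# Print the command by default; execute it only when RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Step 1: create the image (AMI) from the configured instance.
run aws ec2 create-image --instance-id "$INSTANCE_ID" --name "$IMAGE_NAME"

# Step 2: once the AMI is available, launch the copies from it.
run aws ec2 run-instances --image-id "$AMI_ID" --count "$COUNT" --instance-type "$INSTANCE_TYPE"
```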

Aps.

Thursday, 12 July 2012

Hi There!!! :)

#include<stdio.h>
int main()
{
char* common_interest="cloud computing";
printf("HELLO WORLD !! Welcome to CLOUDY THOUGHTS!!! :) \n");
printf(" We are a group of five people who are very diverse, yet share a common interest, %s.",common_interest);
return 0;
}

Compilation error: Not enough content..

Hi there,

Well another blog, another concept. Hope this survives. I invite others to attend the ribbon cutting, and start writing something. In this blog, we have decided to pen our thoughts and our experiences on cloud computing.

Aps...
