Showing posts with label aws.

Tuesday, 18 June 2013

One Cap to rule 'em all ...

Wondering which cap it could be?
....Well, it's Capistrano :D

I have been a fan of Capistrano from way back, and we use it for almost all kinds of deployments - Hadoop, MongoDB clusters and so on.
If you have not tried Capistrano, you must try it and figure out how you can use it for deployments in your environment.

It's highly configurable - so capify your stuff!
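To give you a flavour of what "capifying" looks like, here is a minimal Capistrano (v2-era) recipe sketch. The role name, hosts and task are illustrative placeholders of mine, not taken from the repository linked below:

```ruby
# config/deploy.rb -- a minimal sketch; hosts and roles are placeholders
role :mongod_nodes, "ec2-host-1.compute.amazonaws.com",
                    "ec2-host-2.compute.amazonaws.com"

desc "Check that all nodes are reachable"
task :uptime, :roles => :mongod_nodes do
  run "uptime"
end
```

Running `capify .` in a project generates the Capfile scaffolding, and `cap uptime` would then run the task above on every host in the role.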

Check out the capified scripts to deploy a replicated, sharded MongoDB cluster on AWS EC2 instances at the following link:
https://github.com/SwathiMystery/deploy_shard_mongodb
Feel free to experiment, report bugs/issues and contribute back.

For more details, follow the link below:
https://github.com/SwathiMystery/deploy_shard_mongodb/blob/master/README.md#deploy-replicated-sharded-mongodb-cluster

Monday, 11 February 2013

Hadoop Hangover: How-to launch a hadoop cluster CDH4 [MRv1 / YARN + Ganglia] using Apache Whirr


  This post is about how to launch a CDH4 MRv1 or CDH4 YARN cluster on EC2 instances. It is said that, with the help of Whirr, you can launch a cluster in a matter of 5 minutes! This is very true, if and only if everything works out well! ;)

Hopefully, this article helps you in that regard.
So, let's row the boat...
  • Download the stable version of Apache Whirr, i.e. whirr-0.8.1.tar.gz.
  • Extract the tarball.
  • Generate an SSH key pair for Whirr to use.
  • Make a properties file to launch the cluster with that configuration.
  • Now let me tell you how to avoid getting headaches!
    • Cluster name: keep it simple and lowercase. Avoid names like testCluster, testCluster1 etc., i.e. no capitals or numerals.
    • Decide judiciously on the number of datanodes you want.
    • Your launch may not be successful if Java is not installed. Make sure the image has Java; the properties file takes care of that, however.
    • It is good to go ahead with MRv1 for now and switch to MRv2 later, when we get a production-stable release.
    • This is a minimal set of configurations for launching a Hadoop cluster, but you can do a lot of performance tuning on top of it.
    • I launched this cluster from an EC2 instance; initially I faced errors regarding the user. Setting the user-related configuration solved the problem.
    • Set proper permissions on ~/.ssh and the whirr-0.8.1 folder before launching.
  • Well, we are ready to launch the cluster. Name the properties file "whirr_cdh.properties".
In the console you will see links to the Namenode and JobTracker web UIs; at the end it also prints how to SSH into the instances.
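As a rough sketch of "whirr_cdh.properties" for a CDH4 MRv1 cluster (the AMI id, hardware id, node counts and key paths below are placeholders you must adapt to your account and region):

```
whirr.cluster-name=hadoopcluster
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/whirr_id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/whirr_id_rsa.pub
whirr.env.repo=cdh4
whirr.hadoop.install-function=install_cdh_hadoop
whirr.hadoop.configure-function=configure_cdh_hadoop
whirr.hardware-id=m1.large
whirr.image-id=us-east-1/<ami-id>
whirr.location-id=us-east-1
```

Generate the key with `ssh-keygen -t rsa -P '' -f ~/.ssh/whirr_id_rsa`, then from the whirr-0.8.1 directory launch with `bin/whirr launch-cluster --config whirr_cdh.properties`, and tear it down later with `bin/whirr destroy-cluster --config whirr_cdh.properties`.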

  • Now the files should have been generated. You will be able to see these files: instances, hadoop-proxy.sh and hadoop-site.xml.
  • Start the proxy.
  • Open another terminal and run your HDFS commands there.
  • You should be able to access HDFS.
  • You can alternatively download the Hadoop tarball and launch commands with it.
  • Okay! I know you will not be satisfied unless you see a web UI.
So, we are good to go!
  • If you want to launch MRv2 (YARN), use the corresponding properties file and follow the same process!
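To make the proxy steps above concrete, this is roughly what they look like. Whirr writes its generated files under `~/.whirr/<cluster-name>`; substitute your actual cluster name:

```shell
# Start the SOCKS proxy that Whirr generated (keep this terminal open)
sh ~/.whirr/<cluster-name>/hadoop-proxy.sh

# In another terminal, point the Hadoop client at the generated config
export HADOOP_CONF_DIR=~/.whirr/<cluster-name>
hadoop fs -ls /
```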
Happy Learning! :)

Tuesday, 18 December 2012

FUSE on Amazon S3

FUSE (Filesystem in Userspace) is a well-known open source project hosted on SourceForge: http://fuse.sourceforge.net/
With s3fs, you can either put files into the S3 bucket directly or into the mount point; both always show the same hierarchy and stay in sync. The best thing is that any arbitrary program can simply point at this mount point and run normal file commands, rather than file-system-specific ones.

Here is a small documentation about how we can achieve this.

1. Check out the code from Google Code.
$ svn checkout http://s3fs.googlecode.com/svn/trunk/ s3fs

2. Switch to the working directory
$ cd s3fs
$ ls 
AUTHORS  autogen.sh  ChangeLog  configure.ac  COPYING  doc  INSTALL  Makefile.am  NEWS  README  src  test

3. Now the same old ritual of configure, make and install.
To run the subsequent command you need autoconf, so make sure you have it:
$ sudo apt-get install autoconf
$ autoreconf --install
If this step fails, it is silently notifying you that you lack some libraries. Time to get them installed...
$ sudo apt-get install build-essential libfuse-dev fuse-utils libcurl4-openssl-dev libxml2-dev mime-support

Getting back...
$ ./configure --prefix=/usr
$ make
$ sudo make install

4. Done with the installation process.
Cross-check:
$ /usr/bin/s3fs
s3fs: missing BUCKET argument
Usage: s3fs BUCKET:[PATH] MOUNTPOINT [OPTION]...

5. Add the following line to your ~/.bashrc file and source it.
export s3fs=/usr/bin/s3fs
$ source ~/.bashrc
$ s3fs
s3fs: missing BUCKET argument
Usage: s3fs BUCKET:[PATH] MOUNTPOINT [OPTION]...

6. Install s3cmd. Many of you must be using this tool to interact with S3.
$ sudo apt-get install s3cmd
$ s3cmd --configure
This will configure with the S3 account using Access and Secret Key.

Configuring FUSE
1. First, allow non-root users to use the allow_other mount option: uncomment user_allow_other in fuse.conf.
$ sudo vi /etc/fuse.conf

2. Set your AccessKey and SecretKey, in the expected format, in the passwd-s3fs file.
$ sudo vi /etc/passwd-s3fs
$ sudo chmod 640 /etc/passwd-s3fs
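The expected format is one line per bucket, bucketName:accessKeyId:secretAccessKey. A sketch of the file (the keys below are AWS's documented example placeholders, not real credentials):

```
# /etc/passwd-s3fs -- one line per bucket: bucket:accessKeyId:secretAccessKey
s3dir-sync:AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```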
3. I created a bucket called "s3dir-sync" for this experiment.
$ s3cmd ls
2012-12-18 09:23  s3://s3dir-sync
4. Create a mount point where you want to place the files and keep them in sync with the S3 bucket. Create it as the root user.
$ sudo mkdir -p /mnt/s3Sync
$ sudo chmod 777 /mnt/s3Sync

5. Mount the bucket with s3fs, as the root user.
$ sudo s3fs s3dir-sync -o default_acl=public-read -o allow_other /mnt/s3Sync/
Cross-check:
$ mount -l
s3fs on /mnt/s3Sync type fuse.s3fs (rw,nosuid,nodev,allow_other)
If you try mounting again, you will get the following warning:
mount: according to mtab, s3fs is already mounted on /mnt/s3Sync
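If you want the bucket mounted automatically at boot, s3fs of this vintage also supports an /etc/fstab entry. A sketch equivalent to the mount command above:

```
# /etc/fstab -- mount the s3dir-sync bucket at boot via s3fs
s3fs#s3dir-sync /mnt/s3Sync fuse allow_other,default_acl=public-read 0 0
```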

6. I created a directory structure 2012/12/18 under the mount point, with a test file in it:
$ more /mnt/s3Sync/2012/12/18/test.txt
This is a check file to sync with the s3dir-sync.
Blah..!

The same structure is synced in the bucket "s3dir-sync".
Cross-check:
$ s3cmd ls s3://s3dir-sync
DIR   s3://s3dir-sync/2012/
2012-12-18 09:57         0   s3://s3dir-sync/2012

Happy Learning! :)