Monday 11 February 2013

Hadoop Hangover: How to launch a CDH4 Hadoop cluster [MRv1 / YARN + Ganglia] using Apache Whirr


  This post is about how to launch a CDH4 MRv1 or CDH4 YARN cluster on EC2 instances. It's said that with Whirr you can launch a cluster in a matter of 5 minutes! That's very true, if and only if everything works out well! ;) 

Hopefully, this article helps you in that regard.
So, let's row the boat...
  • Download the stable version of Apache Whirr, i.e. whirr-0.8.1.tar.gz, from the following link: whirr-0.8.1.tar.gz
  • Extract the tarball.
  • Generate the key.
  • Create a properties file that describes the cluster you want to launch.
  • Now let me tell you how to avoid getting headaches!
    • Cluster name: keep your cluster name simple. Avoid names like testCluster or testCluster1, i.e. no capitals and no digits.
    • Choose the number of datanodes judiciously.
    • The launch may fail if Java is not installed, so make sure the image has Java. However, the properties file used here takes care of that.
    • It's best to go with MRv1 for now and switch to MRv2 (YARN) later, once it has a production-stable release.
    • This is the minimal set of configuration for launching a Hadoop cluster, but you can do a lot of performance tuning on top of it.
    • I launched this cluster from an EC2 instance, and initially I ran into errors related to the user. Setting the configuration below solved the problem.
    • Set proper permissions on ~/.ssh and the whirr-0.8.1 folder before launching.
  • Well, we are ready to launch the cluster. Name the properties file "whirr_cdh.properties".
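As a rough sketch of what such a properties file can contain, the Python snippet below generates a CDH4 MRv1 + Ganglia configuration. The property keys follow the Whirr 0.8 CDH recipes as far as I know; the hardware and location ids, the key file name id_rsa_whirr, the install_openjdk Java function, the role layout (Ganglia collector on the master) and the helper names are assumptions to adapt to your own setup. The name check only encodes the naming advice above; Whirr itself does not enforce it.

```python
import re
from pathlib import Path

# Encodes this post's naming advice (lowercase letters, optional hyphens,
# no capitals or digits). This check is our own, not something Whirr enforces.
CLUSTER_NAME_RE = re.compile(r"^[a-z]+(-[a-z]+)*$")


def cdh4_mrv1_properties(cluster_name: str, datanodes: int) -> str:
    """Build a minimal whirr_cdh.properties body for CDH4 MRv1 + Ganglia on EC2."""
    if not CLUSTER_NAME_RE.match(cluster_name):
        raise ValueError(f"cluster name {cluster_name!r}: use lowercase letters only")
    if datanodes < 1:
        raise ValueError("at least one datanode is required")

    props = {
        "whirr.cluster-name": cluster_name,
        # One master (NameNode, JobTracker, Ganglia collector) plus N workers.
        "whirr.instance-templates": (
            "1 hadoop-namenode+hadoop-jobtracker+ganglia-metad+ganglia-monitor,"
            f"{datanodes} hadoop-datanode+hadoop-tasktracker+ganglia-monitor"
        ),
        "whirr.provider": "aws-ec2",
        # Whirr resolves ${env:...} from environment variables at launch time.
        "whirr.identity": "${env:AWS_ACCESS_KEY_ID}",
        "whirr.credential": "${env:AWS_SECRET_ACCESS_KEY}",
        "whirr.private-key-file": "${sys:user.home}/.ssh/id_rsa_whirr",
        "whirr.public-key-file": "${whirr.private-key-file}.pub",
        # Install CDH4 packages instead of Apache Hadoop.
        "whirr.env.repo": "cdh4",
        "whirr.hadoop.install-function": "install_cdh_hadoop",
        "whirr.hadoop.configure-function": "configure_cdh_hadoop",
        # Assumption: install OpenJDK on every node so a Java-less image still works.
        "whirr.java.install-function": "install_openjdk",
        # Assumed instance type and region; change to suit your account.
        "whirr.hardware-id": "m1.large",
        "whirr.location-id": "us-east-1",
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


def write_properties(path: Path, cluster_name: str, datanodes: int) -> Path:
    """Write the generated properties to `path` and return it."""
    path.write_text(cdh4_mrv1_properties(cluster_name, datanodes))
    return path
```

For example, write_properties(Path("whirr_cdh.properties"), "hadoopcluster", 3) produces a file for one master and three workers, while a name such as "testCluster1" is rejected up front instead of causing trouble later.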
In the console you can see links to the NameNode and JobTracker web UIs. At the end, it also prints how to SSH into the instances.
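The preparation steps (key generation, permissions) and the launch itself can be scripted. Here is a minimal Python sketch, assuming the key lives at ~/.ssh/id_rsa_whirr (a name chosen here; it must match whirr.private-key-file in your properties) and that whirr-0.8.1 has already been extracted. The helper names are hypothetical; ssh-keygen and bin/whirr launch-cluster --config are the actual commands being wrapped.

```python
import os
import stat
import subprocess
from pathlib import Path


def prepare_ssh_key(ssh_dir: Path, key_name: str = "id_rsa_whirr") -> Path:
    """Create an RSA key pair with ssh-keygen if missing, then tighten permissions."""
    ssh_dir.mkdir(parents=True, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # ~/.ssh: owner-only access
    key = ssh_dir / key_name
    if not key.exists():
        # -q: quiet, -t rsa: key type, -P "": empty passphrase, -f: output file
        subprocess.run(
            ["ssh-keygen", "-q", "-t", "rsa", "-P", "", "-f", str(key)], check=True
        )
    os.chmod(key, 0o600)  # private key: owner read/write only
    return key


def make_whirr_executable(whirr_home: Path) -> Path:
    """Ensure bin/whirr in the extracted folder is readable and executable by its owner."""
    script = whirr_home / "bin" / "whirr"
    mode = stat.S_IMODE(script.stat().st_mode)
    os.chmod(script, mode | stat.S_IRUSR | stat.S_IXUSR)
    return script


def launch_command(whirr_home: Path, config: Path) -> list[str]:
    """Argument list for: bin/whirr launch-cluster --config <properties file>."""
    return [str(whirr_home / "bin" / "whirr"), "launch-cluster", "--config", str(config)]
```

Calling subprocess.run(launch_command(Path("whirr-0.8.1"), Path("whirr_cdh.properties")), check=True) starts the launch; when you are done, bin/whirr destroy-cluster --config whirr_cdh.properties tears the instances down again so they stop costing money.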

  • By now the files should have been generated. You will find these files under ~/.whirr/<cluster-name>/: instances, hadoop-proxy.sh and hadoop-site.xml.
  • Start the proxy.
  • Open another terminal and type:
  • You should now be able to access HDFS.
  • Alternatively, you can download the Hadoop tarball and launch with:
  • Okay! I know you won't be satisfied until you see a web UI.
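To get from the generated files to something clickable, here is a small Python sketch that reads hadoop-site.xml and derives the web UI addresses. It assumes the CDH4 MRv1 default web ports (50070 for the NameNode, 50030 for the JobTracker) and that the Ganglia web front end is served at /ganglia/ on the NameNode host; the function names are hypothetical. The browser has to go through the SOCKS proxy started by hadoop-proxy.sh (localhost:6666 by default, as far as I know; check the ssh -D option in the script).

```python
import os
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlparse


def read_hadoop_conf(path: Path) -> dict[str, str]:
    """Parse a Hadoop configuration file into {property name: value}."""
    root = ET.parse(path).getroot()
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def web_ui_urls(conf: dict[str, str]) -> dict[str, str]:
    """Derive web UI URLs from hadoop-site.xml, assuming default MRv1 ports."""
    # The default filesystem looks like hdfs://<namenode-host>:<port>/
    fs_uri = conf.get("fs.defaultFS") or conf["fs.default.name"]
    namenode = urlparse(fs_uri).hostname
    urls = {"namenode": f"http://{namenode}:50070/"}
    # mapred.job.tracker is <host>:<port>; only present on an MRv1 cluster.
    if "mapred.job.tracker" in conf:
        jobtracker = conf["mapred.job.tracker"].rsplit(":", 1)[0]
        urls["jobtracker"] = f"http://{jobtracker}:50030/"
    # Assumption: ganglia-metad runs on the NameNode host and serves /ganglia/.
    urls["ganglia"] = f"http://{namenode}/ganglia/"
    return urls


def hdfs_env(cluster_name: str) -> dict[str, str]:
    """Environment for running `hadoop fs` against the generated client config."""
    env = dict(os.environ)
    env["HADOOP_CONF_DIR"] = str(Path.home() / ".whirr" / cluster_name)
    return env
```

With the proxy running, subprocess.run(["hadoop", "fs", "-ls", "/"], env=hdfs_env("hadoopcluster"), check=True) lists the HDFS root using the cluster's generated hadoop-site.xml.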
So, we are good to go! 
  • If you want to launch MRv2 (YARN), use this configuration
and follow the same process!
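For reference, an MRv2 variant of the properties file might look like the sketch below. Compared with MRv1, the JobTracker/TaskTracker roles are replaced by yarn-resourcemanager, mapreduce-historyserver and yarn-nodemanager, and YARN configure/start functions are added. These keys are my reading of the Whirr 0.8 CDH4 YARN recipe and should be checked against the recipes shipped in whirr-0.8.1; the hardware and location ids, the key file name and the function name cdh4_yarn_properties are assumptions.

```python
def cdh4_yarn_properties(cluster_name: str, workers: int) -> str:
    """Build a minimal whirr properties body for a CDH4 MRv2 (YARN) + Ganglia cluster."""
    props = {
        "whirr.cluster-name": cluster_name,
        # Master: NameNode, ResourceManager, JobHistory server, Ganglia collector.
        "whirr.instance-templates": (
            "1 hadoop-namenode+yarn-resourcemanager+mapreduce-historyserver"
            "+ganglia-metad+ganglia-monitor,"
            f"{workers} hadoop-datanode+yarn-nodemanager+ganglia-monitor"
        ),
        "whirr.provider": "aws-ec2",
        "whirr.identity": "${env:AWS_ACCESS_KEY_ID}",
        "whirr.credential": "${env:AWS_SECRET_ACCESS_KEY}",
        "whirr.private-key-file": "${sys:user.home}/.ssh/id_rsa_whirr",
        "whirr.public-key-file": "${whirr.private-key-file}.pub",
        "whirr.env.repo": "cdh4",
        # Tells the CDH scripts to set up MapReduce on YARN (MRv2).
        "whirr.env.mapreduce_version": "2",
        "whirr.hadoop.install-function": "install_cdh_hadoop",
        "whirr.hadoop.configure-function": "configure_cdh_hadoop",
        "whirr.yarn.configure-function": "configure_cdh_yarn",
        "whirr.yarn.start-function": "start_cdh_yarn",
        "whirr.mr_jobhistory.start-function": "start_cdh_mr_jobhistory",
        "whirr.hardware-id": "m1.large",
        "whirr.location-id": "us-east-1",
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"
```

Save the output under its own file name and pass it to bin/whirr launch-cluster --config as before. On a YARN cluster the ResourceManager web UI (port 8088 by default in Hadoop 2) takes the place of the JobTracker UI.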
Happy Learning! :)
