Showing posts with label Apache Bigtop. Show all posts

Wednesday, 27 June 2012

Hadoop Hangover: Introduction To Apache Bigtop and Installing Hive, HBase and Pig

In the previous post we learnt how easy it is to install Hadoop with Apache Bigtop!
We know it's not just Hadoop: there are sub-projects around the table too! So, let's have a look at how to install Hive, HBase and Pig in this post.


Before rowing your boat...
Please follow the previous post and get ready with Hadoop installed!
Follow the link for previous post:
http://femgeekz.blogspot.in/2012/06/hadoop-hangover-introduction-to-apache.html
The same post is also available on DZone, the developer site: http://www.dzone.com/links/hadoop_hangover_introduction_to_apache_bigtop_and.html


All Set?? Great! Head On..
Make sure all the services of Hadoop are running. Namely, JobTracker, SecondaryNameNode, TaskTracker, DataNode and NameNode. [standalone mode]
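A quick way to check this is to look for the five daemon names in the output of jps. Here is a minimal sketch; the helper name missing_daemons is made up for this post and is not part of Hadoop or Bigtop. The daemons run under their own service users, so you may need to run it with sudo to see them.

```shell
#!/bin/sh
# Print the name of every expected Hadoop daemon that is absent from a jps listing.
# missing_daemons is an illustrative helper, not a Hadoop or Bigtop command.
missing_daemons() {
    listing="$1"
    for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # -w matches whole words, so NameNode does not match SecondaryNameNode
        echo "$listing" | grep -qw "$d" || echo "$d"
    done
}

# Only query jps when a JDK is actually on the PATH.
if command -v jps >/dev/null 2>&1; then
    missing_daemons "$(jps 2>/dev/null)"
fi
```

If nothing is printed, all five daemons were found in the listing.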


Hive with Bigtop:
The steps here are almost the same as installing Hive as a separate project.
However, a few steps are saved.
The Hadoop installed in the previous post is Release 1.0.1


We had installed Hadoop with the following command
sudo apt-get install hadoop\*
Step 1: Installing Hive
We have installed Bigtop 0.3.0, so issuing the following command installs all the Hive components,
i.e. hive, hive-metastore and hive-server. The daemon names are different in Bigtop 0.3.0.
sudo apt-get install hive\*
After installing, Hive needs /tmp and /user/hive/warehouse to exist in HDFS. The install scripts cannot create them, because they run without knowing the path to Java. So, create the directories if they do not exist and grant group execute permission on them.
Run these from the Hadoop directory, i.e. /usr/lib/hadoop/:
bin/hadoop fs -mkdir /tmp
bin/hadoop fs -mkdir /user/hive/warehouse

bin/hadoop fs -chmod g+x /tmp
bin/hadoop fs -chmod g+x /user/hive/warehouse



Step 2: The alternative directories could be /var/run/hive and /var/lock/subsys
sudo mkdir /var/run/hive
sudo mkdir /var/lock/subsys



Step 3: Start the hive server, a daemon
sudo /etc/init.d/hive-server start
Image:
start hive-server






Step 4: Running Hive
Go to the directory /usr/lib/hive and start the Hive shell:
bin/hive
Image:
bin/hive

Step 5: Operations on Hive
Image: 
Basic hive operations
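For readers who cannot see the screenshot, here is a minimal sketch of the kind of basic operations you can try. It is not a transcript of the image: the table name "books", its columns and the script path are all made up for illustration.

```shell
#!/bin/sh
# Write a few basic HiveQL statements to a script file.
# The table name "books" and its columns are invented for this example.
cat > /tmp/hive_basics.q <<'EOF'
CREATE TABLE books (id INT, title STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
SHOW TABLES;
DESCRIBE books;
DROP TABLE books;
EOF

# Run the script only when the hive launcher is on the PATH.
if command -v hive >/dev/null 2>&1; then
    hive -f /tmp/hive_basics.q
fi
```

The same statements can be typed one by one at the hive> prompt instead.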






HBase with Bigtop:
Installing HBase is similar to installing Hive.


Step 1: Installing HBase
sudo apt-get install hbase\*
Image: 
hbase-0.92.0




Step 2: Starting HMaster
sudo service hbase-master start
Image: 
Starting HMaster


Image: 
jps (HMaster started)




Step 3: Starting HBase shell
hbase shell
Image: 
start HBase shell




Step 4: HBase Operations
Image: 
HBase table operations


Image: 
list,scan,get,describe In HBase
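The screenshots show table operations plus list, scan, get and describe. Here is a rough sketch of such a session, fed to the HBase shell from a file. The table "books", the column family "info" and the row contents are assumptions made up for this example, not what the images contained.

```shell
#!/bin/sh
# Write a short sequence of HBase shell commands to a file.
# Table, column family and cell values are invented for illustration.
cat > /tmp/hbase_basics.txt <<'EOF'
create 'books', 'info'
put 'books', 'row1', 'info:title', 'Title One'
list
scan 'books'
get 'books', 'row1'
describe 'books'
disable 'books'
drop 'books'
exit
EOF

# Feed the commands to the HBase shell only when it is installed.
if command -v hbase >/dev/null 2>&1; then
    hbase shell < /tmp/hbase_basics.txt
fi
```

A table has to be disabled before it can be dropped, which is why disable comes right before drop.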




Pig with Bigtop:
Installing Pig is similar too.


Step 1: Installing Pig
sudo apt-get install pig
Image: 
Installing Pig




Step 2: Moving a file to HDFS
Image: 
Moving a tab separated file "book.csv" to HDFS
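The command itself is only visible in the image, so here is a sketch of one way to do it. It assumes the /user/bigtop directory from the previous post as the target, and the two sample rows are invented.

```shell
#!/bin/sh
# Build a tiny tab-separated sample file; the rows are made up for illustration.
printf '1\tTitle One\n2\tTitle Two\n' > /tmp/book.csv

# Copy it into HDFS only when the hadoop launcher is available.
# /user/bigtop is an assumed target directory, not taken from the screenshot.
if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -put /tmp/book.csv /user/bigtop/book.csv
    hadoop fs -ls /user/bigtop
fi
```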




Step 3: Installed Pig-0.9.2
Image: 
Pig installed Pig-0.9.2




Step 4: Starting the grunt shell
pig
Image: 
Starting Pig




Step 5: Pig Basic Operations
Image:
Basic Pig Operations


Image:
Job Completion
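As a stand-in for the screenshots, here is a minimal sketch of basic Pig operations on a tab-separated file. The HDFS path /user/bigtop/book.csv and the two-column schema are assumptions for this example. DUMP is the statement that launches the MapReduce job.

```shell
#!/bin/sh
# Write a small Pig Latin script that loads a tab-separated file and prints one column.
# The input path and the (id, title) schema are assumed for illustration.
cat > /tmp/book_basics.pig <<'EOF'
books = LOAD '/user/bigtop/book.csv' USING PigStorage('\t') AS (id:int, title:chararray);
titles = FOREACH books GENERATE title;
DUMP titles;
EOF

# Run the script in the default MapReduce mode only when pig is installed.
if command -v pig >/dev/null 2>&1; then
    pig /tmp/book_basics.pig
fi
```

The three statements can also be entered one at a time at the grunt> prompt.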




We saw that it is possible to install the sub-projects and work with Hadoop without any issues.
Apache Bigtop has its own spark! :)
The upcoming release, BIGTOP-0.4.0, is expected to fix the following issues:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12318889&styleName=Html&projectId=12311420
Source and binary files:
http://people.apache.org/~rvs/bigtop-0.4.0-incubating-RC0
Maven staging repo:
https://repository.apache.org/content/repositories/orgapachebigtop-279
Bigtop's KEYS file containing PGP keys we use to sign the release:
http://svn.apache.org/repos/asf/incubator/bigtop/dist/KEYS


Let us see how to install other sub-projects in the coming posts!
Until then, Happy Learning!! :):)

Friday, 22 June 2012

Hadoop Hangover : Introduction To Apache Bigtop and Playing With It (Installing Hadoop)!

Ah!! The name is everywhere, carried with the wind. Apache Hadoop!!
The BIG DATA crunching platform!
We all know how alien it can be at the start too! Phew!! :o

This is from personal experience: nearly 11 months ago, I was trying to install HBase and faced a few issues! The problem was version compatibility, e.g. "HBase some x.version" with "Hadoop some y.version".
This is a real issue because you never know which version of one package blends well with another unless someone has tested the combination. That testing in turn depends on the environment where it was set up, which could be another issue.
There was a pressing demand for managed distributions, and then came an open source project that attempts to create a fully integrated and tested Big Data management distribution: "Apache Bigtop".

Goals of Apache Bigtop:
-Packaging
-Deployment
-Integration Testing
of all the sub-projects of Hadoop. The project targets the system as a whole rather than the individual projects.

I love the way Doug Cutting put it in his keynote back then, comparing Hadoop to the Linux kernel and the big Hadoop stack (Hive, HBase, Pig, Avro, etc.) to the fully operational operating systems with their distributions (RedHat, Ubuntu, Fedora, Debian, etc.). This is an awesome analogy! :)

Life is made easy with Bigtop:
Bigtop Hadoop distribution artifacts won't make you feel that you live in an alien world! After installing, you get a chance to blend a Hadoop cluster, in any mode, with its sub-projects. It's all for you to garnish next! :)

Setup Of Bigtop and Installing Hadoop:
It's time to welcome all your packages home. [I also mean /home/..]  ;)
I've tested on Ubuntu 11.04 and here goes a quick and easy installation process.

Step 1: Installing the Bigtop GPG (GNU Privacy Guard) key, so that apt can verify the signed Bigtop packages.
wget -O- http://www.apache.org/dist/incubator/bigtop/bigtop-0.3.0-incubating/repos/GPG-KEY-bigtop | sudo apt-key add -

Step 3: Updating the apt cache
sudo apt-get update

Step 4: Checking in the artifacts
sudo apt-cache search hadoop
Image: 
Search in the apt cache

Step 5: Set your JAVA_HOME
export JAVA_HOME=path_to_your_Java
Also add this export line to ~/.bashrc so that it persists across sessions.
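One way to put the export line into ~/.bashrc without adding it twice is sketched below. The helper name persist_java_home is made up, and the JDK path you pass in is whatever your own system uses.

```shell
#!/bin/sh
# Append "export JAVA_HOME=<path>" to an rc file unless that exact line is already there.
# persist_java_home is an illustrative helper, not a standard command.
persist_java_home() {
    java_home="$1"
    rc_file="$2"
    line="export JAVA_HOME=$java_home"
    # -x matches the whole line, -F treats the pattern as a fixed string
    grep -qxF "$line" "$rc_file" 2>/dev/null || echo "$line" >> "$rc_file"
}
```

For example, persist_java_home /usr/lib/jvm/java-6-openjdk "$HOME/.bashrc", where the JDK path is only a placeholder for your own.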

Step 6: Installing the complete Hadoop stack
sudo apt-get install hadoop\*
Image: (above)

Running Hadoop:

Step 1: Formatting the namenode
sudo -u hdfs hadoop namenode -format
Image : 
Formatting the namenode



Step 2: Starting the Namenode, Datanode, Jobtracker, Tasktracker of Hadoop
for i in hadoop-namenode hadoop-datanode hadoop-jobtracker hadoop-tasktracker ; do sudo service $i start ; done
Now, the cluster is up and running.
Image :
Start all the services


Step 3: Creating a new directory in hdfs
sudo -u hdfs hadoop fs -mkdir /user/bigtop
bigtop is the directory name; the next command makes $USER its owner
sudo -u hdfs hadoop fs -chown $USER /user/bigtop
Image :
Create a directory in HDFS


Step 4: List the directories in file system
hadoop fs -lsr /
Image :
HDFS directories


Step 5: Running a sample pi example
hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 10 1000
Image :
Running a sample program
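The two numbers passed to the pi example are the number of map tasks and the number of sample points each map generates, so the run above estimates pi from 10 x 1000 points. A small sketch that spells this out (the variable names are mine):

```shell
#!/bin/sh
# The two numeric arguments to the pi example are <nMaps> and <nSamples>.
maps=10
samples_per_map=1000
total_samples=$((maps * samples_per_map))
echo "pi example: $maps maps x $samples_per_map samples = $total_samples points"

# Run the job only where the Bigtop-installed examples jar is present.
examples_jar=/usr/lib/hadoop/hadoop-examples.jar
if command -v hadoop >/dev/null 2>&1 && [ -f "$examples_jar" ]; then
    hadoop jar "$examples_jar" pi "$maps" "$samples_per_map"
fi
```

Raising either number gives a better estimate at the cost of a longer job.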

Job Completed!


Enjoy with your cluster! :)
We shall see what more blending can be done with Hadoop (with Hive, HBase, etc.) in the next post!
Until then, 
Happy Learning!! :):)