
Saturday, 20 December 2014

Introduction and getting started with Apache Mesos

Introduction to Apache Mesos

In this era of distributed computing, where we spin up separate clusters for Hadoop, Storm, Jenkins, Cassandra, etc., we are not making effective use of our resources: a cluster often sits idle between bursts of work, which is very inefficient.
Now, what if all these frameworks shared the same set of machines and resources? Then the small slices of time one framework spends waiting on resources could be granted to other frameworks. This is the concept of time sharing.

Apache Mesos is a datacenter operating system, and it shares this same philosophy of time sharing. Mesos is called a datacenter OS because it hosts different frameworks under a single roof. It is called an operating system because it shares many concepts with Linux:
1. Isolation : Linux creates isolation through processes, where each process has its own file descriptors and its own address space. Mesos achieves this through Linux Containers (wiki:LXC).
2. Process Scheduler : Linux grants processes access to system resources by balancing workloads across the available computing resources, thereby optimizing utilization, maximizing throughput, minimizing response time and avoiding overload of any single resource. There are various scheduling algorithms to execute more than one process at a time (wiki:Multitasking) and to transmit multiple data streams simultaneously across a single physical channel (wiki:Multiplexing). Mesos uses similar scheduling algorithms.
3. Common Infrastructure : Linux exposes a common set of calls irrespective of filesystems, drivers etc. Similarly, Mesos has a common set of calls which drive the execution of tasks.
4. Package Manager : Linux has apt-get, aptitude, synaptic, yum etc. that automate the process of installing, upgrading, configuring and removing software. Similarly, Mesos recently added support for Docker (wiki:Docker).

Traditionally, a non-peer-to-peer distributed system has 2 components:
1. Coordinator : Generates tasks, sends the tasks to the workers and receives results from the workers.
2. Worker : Executes the tasks and sends the status and results back to the coordinator.

With Mesos, there are three levels: the coordinator, the Mesos master and the Mesos slaves. The coordinator negotiates with the Mesos master, and the master then decides how to partition the cluster to distribute the tasks. Thus, we can schedule jobs across the machines, running Hadoop, Cassandra, Spark etc. side by side.
All the distributed systems that run on Mesos are called applications or frameworks, and the coordinator is called a scheduler in Mesos vocabulary.

How does Mesos work?

In summary, Mesos works on a request/offer model. Whenever you want to run a job, you send a request. A request is a simplified specification of what you need at that point in time: the number of CPUs, the amount of RAM and so on. Mesos checks the request specification and replies with resource offers describing what resources are available on a set of machines. This is non-blocking and gives two levels of scheduling: offering and scheduling.
Mesos master: controls the allocation of resources to the schedulers.
Scheduler: uses the resource offers to decide which tasks to run and which one to run next.
More information on the Mesos architecture is here: Mesos Architecture

Getting started with mesos


1. Download the tarball from a mirror (Apache Mesos v0.21.0) and untar it.

tar -zvxf  mesos-0.21.0.tar.gz 
cd mesos-0.21.0/

2. Install the dependencies

sudo apt-get update
sudo apt-get install build-essential openjdk-6-jdk python-dev python-boto libcurl4-nss-dev libsasl2-dev maven  libapr1-dev libsvn-dev

3. Build Mesos
Please make sure you have the appropriate permissions while building.

mkdir build
cd build
../configure
make
make check 
make install
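(Optionally, on a multi-core machine the make and make check steps can be parallelized with the -j flag. This is a common speed-up, not part of the original steps:)

make -j$(nproc)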

Start Mesos Master

./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos

Start Mesos Slave

./bin/mesos-slave.sh --master=127.0.0.1:5050

Web UI

http://127.0.0.1:5050
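As a quick sanity check from the terminal, you can also query the master's state endpoint (a sketch, assuming the master is running on the default port as above and that Python is available for pretty-printing):

curl -s http://127.0.0.1:5050/master/state.json | python -m json.tool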

Running a test framework in Java

./src/examples/java/test-framework 127.0.0.1:5050

Mesos Home
Mesos Frameworks



[Screenshots: Mesos executor tasks and Mesos slave nodes]


In the next post, let us see how to build our own distributed framework on Apache Mesos.
Happy Learning! :)

Tuesday, 5 August 2014

Raspberry Pi B+ : Connect to your Pi with no display monitor, keyboard or mouse

        The new Raspberry Pi B+ was released on July 14th, 2014. The following link does full justice to the improved features of the B+. Please refer to it here: Raspberry Pi B+

        The moment I had this credit-card-sized thing on my palm, I was so excited to get started with it. I downloaded Raspbian from the Raspberry Pi Downloads Page and unzipped it; 2014-06-20-wheezy-raspbian.img is the image file you get after the unzip operation. I used a SanDisk 8GB Class 4 Micro SD card and the dd tool to write the image to it. Refer to Installing images on Linux here (please note, I used bs=1M with dd while writing). This was one of the guides I was following: Quick Start Guide
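For reference, the dd invocation looked roughly like this (a sketch: /dev/sdX is a placeholder for your SD card's device node, so double-check it with lsblk first, since writing to the wrong device destroys data):

$ sudo dd bs=1M if=2014-06-20-wheezy-raspbian.img of=/dev/sdX
$ sync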

But then, wait! Do you need an HDMI TV or display monitor, a USB keyboard and a USB mouse to connect to it? Oh really!? I was not interested in investing any further.

Here is what I did:

  1. I inserted the Micro SD card into the slot on the back.
  2. I had a micro USB cable for my Nexus 5. I used the same for the 5V power supply.
  3. I already had an Ethernet crossover cable with me from some VPN stuff. I took it and connected one end to the router and the other end to the Pi's Ethernet port (next to the 4 USB ports).
  4. Next, I used my laptop to connect to my Pi in the same network.
  5. On my laptop, 
    1. Opened a terminal and typed $ ifconfig to find my laptop's IP address.
      • For example: if it shows inet addr:192.168.0.101, then our Pi should have an address of the form inet addr:192.168.0.*
  6. Next, to my rescue came the nmap tool. If you do not have it installed already: $ sudo apt-get install nmap
    • Execute $ nmap -T4 -F 192.168.0.* on the terminal (as per the example considered). This will scan all the hosts which are up and list their open ports. You will find the Pi as one among them.
  7. If you prefer a GUI, you may like Zenmap, which can be installed with $ sudo apt-get install zenmap.
    • After installing, type $ sudo zenmap
    • Type Target as 192.168.0.*
    • Choose Profile as Quick Scan
    • Click on Scan
    • This will list the Raspberry Pi MAC Address and the IP Address
  8. Now you know the IP address of the Pi. The default login for Raspbian is username "pi" with the password "raspberry".
  9. Time to ssh! (See the sketch just after this list.)
  10. Need to change your configuration settings after login? Type $ sudo raspi-config and change them accordingly!
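A minimal first login, assuming the scan reported the Pi at 192.168.0.103 (substitute the address from your own scan):

$ ssh pi@192.168.0.103
(password: raspberry)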
Happy hacking! :)

Tuesday, 18 June 2013

One Cap to rule 'em all ...

Wondering which cap it could be?
....Well, it's Capistrano :D

I have been a fan of Capistrano from way back, and we use it for almost all kinds of deployments - Hadoop, MongoDB clusters and so on.
If you have not tried Capistrano, you must try it and figure out how you can use it for deployments in your environment.

It's highly configurable - so capify your stuff!

Check out the capified scripts to deploy a replicated, sharded MongoDB cluster on AWS EC2 instances at the following link:
https://github.com/SwathiMystery/deploy_shard_mongodb
Feel free to experiment, report bugs/issues and contribute back.

For more details, follow the link below :
https://github.com/SwathiMystery/deploy_shard_mongodb/blob/master/README.md#deploy-replicated-sharded-mongodb-cluster

Monday, 15 April 2013

Monitoring S3 uploads for a real time data

        If you are working on Big Data and its bleeding-edge technologies like Hadoop, the primary thing you need is a "dataset" to work on. This data can be reviews, blogs, news, social media data (Twitter, Facebook etc.), domain-specific data, research data, forums, groups, feeds, firehose data etc. Generally, companies reach out to data vendors to fetch such data.

        Normally, these data vendors dump the data into a shared server environment. For us to use this data for processing with MapReduce and the like, we move it to S3 first for storage and then for processing. Assume the data belongs to social media such as Twitter or Facebook; then the data can be dumped into directories named by date. In the majority of cases, that is the practice.
Assuming 140-150 GB/day of streamed data being dumped in a hierarchy like 2013/04/15, i.e. yyyy/mm/dd format, how do you
-  upload the files to s3 in the same hierarchy to a given bucket?
-  monitor the new incoming files and upload them?
-  save disk space effectively?
-  ensure the reliability of uploads to s3?
-  clean the logs, if logging is enabled for tracking?
-  retry the failed uploads?

These were some of the questions running at the back of my mind when I wanted to automate the uploads to S3. I also wanted zero human intervention, or at least as little as possible!
So, I came up with:
- s3sync / s3cmd
- the Python Watcher script by Greggory Hernandez, here: https://github.com/greggoryhz/Watcher
A big thanks! This helped me with the monitoring part, and it works great!
- a few of my own scripts.

What are the ingredients?
  •  Installation of s3sync. (I have actually used just one script from it, s3cmd, and not s3sync itself. Maybe in future -- so I keep it around.)
  • Installation of Watcher.
  • My own wrapper scripts.
  • cron
Next, with the environment set up, let's make some common "assumptions".
  • Data being dumped will be at /home/ubuntu/data/ -- under which it could land in 2013/04/15, for example.
  • s3sync is located at /home/ubuntu.
  • The Watcher repository is at /home/ubuntu.
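(For reference, fetching Watcher into place per these assumptions is just a clone -- a sketch, assuming git is installed:)

$ cd /home/ubuntu
$ git clone https://github.com/greggoryhz/Watcher.git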
Getting our hands dirty...
  • Go to Watcher and set the directory to be watched and the corresponding action to be taken.
  • Create a script called monitor.sh, to upload to s3, in the s3sync directory, as below.
    • The variable you may like to change is the s3 bucket path, "s3path", in monitor.sh.
    • This script will upload each new incoming file detected by the watcher script in the reduced redundancy storage (RRS) format. (You can remove the header if you are not interested in storing in RRS format.)
    • The script will call the s3cmd ruby script to upload recursively, and thus maintains the hierarchy, i.e. yyyy/mm/dd format with files *.*.
    • It will delete each file successfully uploaded to s3 from the local path -- to save disk space.
    • The script will not delete the directory; that is taken care of by yet another script, re-upload.sh, which acts as a backup so that failed uploads get uploaded to s3 again.
  • Create a script called re-upload.sh which will upload the failed file uploads.
    • This script ensures that the files left over by monitor.sh (failed uploads -- the chance of this is very small, maybe 2-4 files/day, due to various reasons) are uploaded to s3 again, in the same hierarchy and in RRS format.
    • Post successful upload, it deletes the file, and then the directory if empty.
  • Now, the dirtier work -- logging and cleaning logs.
    • All the "echo" output produced by monitor.sh can be found in ~/.watcher/watcher.log while watcher.py is running.
    • This log helps us initially, and maybe later too, to backtrack errors and the like.
    • Call of duty - a janitor for cleaning the logs. For this, we can use cron to run a script periodically. I wanted it to run every Saturday at 8.00 AM.
    • Create a script to clean the log, "clean_log.sh", in /home/ubuntu/s3sync.
  • Time for cron.
    • All set! Log cleaning happens every Saturday at 8.00 AM, and the re-upload script runs for the previous day, checking whether files still exist and cleaning up accordingly.
  • Let's start the script. (Sketches of these pieces follow below.)
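Here is a minimal sketch of what monitor.sh could look like under the assumptions above. Note the hedges: it uses the more common Python s3cmd syntax rather than the s3sync ruby script mentioned earlier, the bucket name s3://mybucket is a placeholder, and Watcher is assumed to pass the new file's absolute path as $1.

#!/bin/bash
# monitor.sh -- a minimal sketch, not the exact original script.
s3path="s3://mybucket"        # placeholder bucket -- change this
datadir="/home/ubuntu/data"

file="$1"                     # absolute path of the newly created file
rel="${file#$datadir/}"       # keep the yyyy/mm/dd hierarchy

# Upload in reduced redundancy storage; drop the header for standard storage.
if s3cmd put --add-header=x-amz-storage-class:REDUCED_REDUNDANCY \
    "$file" "$s3path/$rel"; then
  echo "uploaded: $rel"
  rm -f "$file"               # save disk space on success
else
  echo "FAILED: $rel"         # re-upload.sh will pick this up later
fi

And the crontab entry for the Saturday 8.00 AM cleanup might look like this (paths assumed):

0 8 * * 6 /home/ubuntu/s3sync/clean_log.sh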
So, this assures successful uploads  to S3. 
My bash-fu with truth! ;)
Happy Learning! :)

Tuesday, 18 December 2012

FUSE on Amazon S3

FUSE: Filesystem in Userspace, hosted on SourceForge, a well known open source project: http://fuse.sourceforge.net/
Whether you put files into the S3 bucket directly or into the mount point, both will always show the same hierarchy and stay in sync. The best thing is that any arbitrary program can just point to this mount point and perform simple, normal file commands rather than file-system-specific commands.

Here is a small documentation about how we can achieve this.

1. Check out the code from Google Code.
$ svn checkout http://s3fs.googlecode.com/svn/trunk/ s3fs

2. Switch to the working directory
$ cd s3fs
$ ls 
AUTHORS  autogen.sh  ChangeLog  configure.ac  COPYING  doc  INSTALL  Makefile.am  NEWS  README  src  test

3. Now the same old ritual of configure, make and install.
To run the next command you need autoconf, so make sure you have it by running the following command.
$ sudo apt-get install autoconf
$ autoreconf --install
If this step fails, it is silently notifying you that you lack some libraries. Time to get them installed...
$ sudo apt-get install build-essential libfuse-dev fuse-utils libcurl4-openssl-dev libxml2-dev mime-support

Getting back...
$ ./configure --prefix=/usr
$ make
$ sudo make install

4. Done with the Installation process.
Cross-check:
$ /usr/bin/s3fs
s3fs: missing BUCKET argument
Usage: s3fs BUCKET:[PATH] MOUNTPOINT [OPTION]...

5. Add the following line to your ~/.bashrc file and source it.
export s3fs=/usr/bin/s3fs
$ source ~/.bashrc
$ s3fs
s3fs: missing BUCKET argument
Usage: s3fs BUCKET:[PATH] MOUNTPOINT [OPTION]...

6. Install s3cmd. Many of you must be using this tool to interact with s3.
$ sudo apt-get install s3cmd
$ s3cmd --configure
This will configure s3cmd for your S3 account using the Access Key and Secret Key.

Configuring FUSE
1. First set user_allow_other so that other users can use the mount. Uncomment it in fuse.conf:
$ sudo vi /etc/fuse.conf

2. Set the AccessKey:SecretKey pair, in that format, in the passwd-s3fs file:
$ sudo vi /etc/passwd-s3fs
$ sudo chmod 640 /etc/passwd-s3fs
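(The file holds a single line of the form shown below -- placeholder values, not real keys:)

AKIAXXXXXXXXXXXXXXXX:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx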
3. Created a bucket called "s3dir-sync" for this experiment.
$ s3cmd ls
2012-12-18 09:23  s3://s3dir-sync
4. Create a mount point where you want to dump/place the files and keep them in sync with the S3 bucket. Create it as the root user.
$ sudo mkdir -p /mnt/s3Sync
$ sudo chmod 777 /mnt/s3Sync

5. Mount with s3fs, as the root user.
$ sudo s3fs s3dir-sync -o default_acl=public-read -o allow_other /mnt/s3Sync/
Cross-check:
$ mount -l
s3fs on /mnt/s3Sync type fuse.s3fs (rw,nosuid,nodev,allow_other)
If you try mounting again, you will get the following warning:
mount: according to mtab, s3fs is already mounted on /mnt/s3Sync
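(When you are done, the mount can be released as root -- standard umount usage, not from the original post:)

$ sudo umount /mnt/s3Sync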

6. I created a directory structure of /mnt/s3Sync/2012/12/18 with a test file in it:

$ more test.txt
This is a check file to sync with the s3dir-sync.
Blah..!

The same is synced in the bucket "s3dir-sync"
Cross-Check: 
$ s3cmd ls s3://s3dir-sync
DIR   s3://s3dir-sync/2012/
2012-12-18 09:57         0   s3://s3dir-sync/2012

Happy Learning! :)

Monday, 10 December 2012

Get your wireless working on DELL Inspiron 5220


I bought a new DELL Inspiron 5220. It's amazing!
Configuration :

  • 3rd Generation i5 Processor
  • 4GB RAM
  • 1TB Hard Disk
  • 15" Screen
  • 1GB Graphics

It ships with Windows 8! ;)
However, I set it up to dual boot, although the BIOS looked different this time!!

Well,
I'm working on Ubuntu :)
Release : 11.10 (Oneiric)
Kernel Linux : 3.0.0-28-generic
GNOME 3.2.1

But Wi-Fi was not getting detected. This was not unusual, as I had set this up on earlier Dell models.
Well, the remedy is easy.
Step 1: Make sure that you can see the device card and its ID, especially the network controller!!
Type in the following command.

$ lspci -nnk | grep Network
08:00.0 Network controller [0280]: Intel Corporation Device [8086:0887] (rev c4)


Step 2: Figure out the kernel version. This matters because the driver we will be installing works on 2.6.37 or higher.

$ uname -a
Linux Swathi 3.0.0-28-generic #45-Ubuntu SMP Wed Nov 14 21:57:26 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux


Step 3: Install the network manager

$ sudo apt-get install network-manager*

Step 4: Install a few packages

$ sudo apt-get install build-essential linux-headers-$(uname -r)
Step 5: Check the output of
    $ dmesg
If it reports a failure to load a firmware file, then it's time to download the corresponding .ucode file and place it in /lib/firmware.
Reboot.
It should be working. If not, try Step 6.
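(A quick way to spot such messages -- a simple filter, not from the original post:)

$ dmesg | grep -i firmware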

Step 6: Download the compat wireless tarball from this location
http://linuxwireless.org/download/compat-wireless-2.6/compat-wireless-2012-05-10-p.tar.bz2

Extract the tarball

$ tar -xvf <path_to_compat_wireless_bz2>
$ cd <extracted_path_compat_wireless>


Installing the packages

$ make
$ sudo make install

After this command, the console will show the commands to disable Bluetooth, Ethernet and Wi-Fi. Type in those 3 commands.
Then load the module into the kernel.

$ sudo modprobe alx
The Ethernet LAN should now be detected.
Add the driver module to this file: /etc/modules
Append the following lines; don't touch the rest. This will enable the module on every restart, as the system loads the modules listed in this file at boot.

$ sudo vi /etc/modules
#E2200 support
alx

Reboot your machine.
You should see the "Wi-Fi Networks Available!" notification on your desktop :)
Happy Learning! :)

Wednesday, 6 June 2012

ANTLR as an external tool in eclipse on ubuntu

This tutorial shows how to set up ANTLR as an external tool in Eclipse.
STEP 1:
Download the jar file antlrworks-1.4.2.jar from http://www.antlr.org/download.
For further details about ANTLRWorks, the ANTLR GUI development environment, follow the link: http://www.antlr.org/works/index.html
STEP 2:
Create a Java project in Eclipse as follows:
File->New->Project
Select Java and Java Project.
Click on Next.
Name the project "TestANTLR".
Press Finish.
Add the antlrworks-1.4.2.jar to the project classpath:
Right-click on the "TestANTLR" project.
Select Properties->Libraries.
Click on "Add External JARs".
Select the complete path of antlrworks-1.4.2.jar and press OK.
STEP 3: Make it an external tool.
Go to Run->External Tools->Configure
Click on New.
Name: ANTLR Compiler
Tool Location: /usr/lib/jvm/java-6-sun-1.6.0.26/bin/java
// this must be the complete path to your java binary
Tool Arguments: -classpath complete_path_to_antlrworks-1.4.2.jar org.antlr.Tool ${resource_name}
Working Directory: ${container_loc}
Here, org.antlr.Tool is the main class, which takes ${resource_name} as its input for processing.
${resource_name} and ${container_loc} can also be selected with the "Browse Variables" option.
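(Behind the scenes, this external tool entry is equivalent to running something like the following from the grammar file's directory -- the jar path here is a placeholder:)

$ java -classpath /path/to/antlrworks-1.4.2.jar org.antlr.Tool Example.g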

Going ahead:
* Creating a grammar file
Create a grammar file with the .g extension. Say, Example.g

//sample code
grammar Example;
start : 'hello' ID ';' {System.out.println("hiii... "+$ID.text);} ;
ID : 'a'..'z'+ ;
WS : (' ' | '\n' | '\r')+ {$channel=HIDDEN;} ;

*Running the above code:
Run->External Tools->ANTLR Compiler
Press F5, or right-click on the project and "Refresh".
You will see the lexer and parser files generated, along with the tokens file.


In our example:
ExampleLexer.java, ExampleParser.java and Example.tokens

Create Main.java in the same project with the following code:

import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        // create a CharStream that reads from standard input
        ANTLRInputStream input = new ANTLRInputStream(System.in);
        // create a lexer that feeds off of the input CharStream
        ExampleLexer lexer = new ExampleLexer(input);
        // create a buffer of tokens pulled from the lexer
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // create a parser that feeds off the tokens buffer
        ExampleParser parser = new ExampleParser(tokens);
        // begin parsing at rule start
        parser.start();
    }
}


Set the arguments in the Run Configurations, click Apply and then Run.
Now you have the output on the console.
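For example, with the grammar above, feeding the following line on standard input (end input with Ctrl+D) should parse it and print the greeting -- "world" here is just an arbitrary lowercase identifier:

hello world;

which prints:

hiii... world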
:)