Showing posts with label open source. Show all posts

Saturday, 20 December 2014

Introduction and getting started with Apache Mesos

Introduction to Apache Mesos

In this era of distributed computing, we spin up separate clusters for Hadoop, Storm, Jenkins, Cassandra and so on, and so we do not make effective use of the resources. A cluster often sits idle for long stretches after processing a burst of work, which makes it very inefficient.
Now, what if all these frameworks shared the same set of machines and resources? The small slices of time that one framework spends waiting for resources could then be granted to other frameworks. This is the concept of time sharing.

Apache Mesos is a datacenter operating system, and it shares the same philosophy of time sharing. It is called a datacenter system because it hosts different frameworks under a single roof, and an operating system because it shares many concepts with Linux:
1. Isolation: Linux isolates work through processes, each of which has its own file descriptors and its own address space. Mesos achieves this with Linux containers (wiki:LXC).
2. Process scheduler: Linux gives processes access to system resources by balancing the workload across multiple computing resources, thereby optimizing resource use, maximizing throughput, minimizing response time and avoiding overload of any single resource. There are various scheduling algorithms for executing more than one process at a time (wiki:Multitasking) and for transmitting multiple data streams simultaneously across a single physical channel (wiki:Multiplexing). Mesos uses such scheduling algorithms.
3. Common infrastructure: Linux offers one set of calls regardless of the underlying filesystems, drivers, etc. Similarly, Mesos has a common set of calls that help in executing tasks.
4. Package manager: Linux has apt-get, aptitude, synaptic, yum, etc., which automate installing, upgrading, configuring and removing software. Similarly, Mesos has recently added support for Docker (wiki:Docker).

Traditionally, a distributed system that is not peer-to-peer has two components:
1. Coordinator: generates tasks, sends them to the workers and receives the results from the workers.
2. Worker: executes the tasks and sends the status and results back to the coordinator.
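
To make the two roles concrete, here is a minimal sketch in Python. It is only an illustration: the function names are made up for this post, a thread pool stands in for a real cluster, and "squaring a number" stands in for real work.

```python
from concurrent.futures import ThreadPoolExecutor


def worker(task):
    """Worker: execute one task and report its status and result."""
    try:
        return {"task": task, "status": "finished", "result": task * task}
    except Exception as exc:  # report the failure instead of crashing the pool
        return {"task": task, "status": "failed", "result": str(exc)}


def coordinator(tasks, num_workers=2):
    """Coordinator: hand tasks to the workers and collect their reports in order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(worker, tasks))


reports = coordinator(range(4))
for report in reports:
    print(report)
```

The coordinator never runs a task itself; it only hands out work and gathers the status reports, which is the split described above.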

With Mesos, there are three levels: the coordinator, the Mesos master and the Mesos slaves. The coordinator negotiates with the Mesos master, and the master decides how to partition the cluster to distribute the tasks. Thus, we can schedule jobs across the machines and run Hadoop, Cassandra, Spark, etc. on the same cluster.
All the distributed systems that run on Mesos are called applications or frameworks, and the coordinator is called a scheduler in Mesos vocabulary.

How does Mesos work?

In summary, Mesos works on a request/offer model. Whenever you want to run a job, you send a request. A request is a simplified specification of what the job needs at that point in time, such as the number of CPUs and the amount of RAM. Mesos checks the request and replies with resource offers describing which resources are available on a set of machines. This exchange is non-blocking and gives two levels of scheduling: offering and scheduling.
Mesos master: controls the allocation of resources to the schedulers.
Scheduler: uses the resource offers to decide which tasks to run and which one to run next.
More information on the Mesos architecture is here: Mesos Architecture
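
The scheduler's half of this two-level model can be sketched in a few lines of Python. This is not the Mesos API: the Offer and Task classes, the field names and the first-fit placement rule are all assumptions made for illustration. The master's side (deciding which offers to send) is simply represented by the list of offers passed in.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Resources the master reports as free on one machine (hypothetical shape)."""
    host: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    """What a pending task needs (hypothetical shape)."""
    name: str
    cpus: float
    mem_mb: int


def schedule(offers, pending):
    """Scheduler side: place each pending task on the first offer it fits in.

    Returns (launched, remaining); a task that fits no offer stays pending.
    """
    free = {o.host: [o.cpus, o.mem_mb] for o in offers}
    launched, remaining = [], []
    for task in pending:
        for host, (cpus, mem) in free.items():
            if task.cpus <= cpus and task.mem_mb <= mem:
                # Consume part of the offer so later tasks see what is left.
                free[host] = [cpus - task.cpus, mem - task.mem_mb]
                launched.append((task.name, host))
                break
        else:
            remaining.append(task)
    return launched, remaining


offers = [Offer("node1", 2.0, 2048), Offer("node2", 1.0, 1024)]
pending = [Task("a", 1.5, 1024), Task("b", 1.0, 1024), Task("c", 4.0, 512)]
launched, remaining = schedule(offers, pending)
print(launched)
print([t.name for t in remaining])
```

Task "c" asks for more CPUs than any single offer holds, so it stays pending until a bigger offer arrives. A real framework would also decline the offers it does not use so that other frameworks can have them.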

Getting started with Mesos


1. Download the Apache Mesos v0.21.0 tarball from a mirror and untar it.

tar -zvxf  mesos-0.21.0.tar.gz 
cd mesos-0.21.0/

2. Install the dependencies

sudo apt-get update
sudo apt-get install build-essential openjdk-6-jdk python-dev python-boto libcurl4-nss-dev libsasl2-dev maven  libapr1-dev libsvn-dev

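3. Build Mesos
Make sure you have the appropriate permissions while building. In particular, make install usually needs root privileges (sudo) because it writes to system directories.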

mkdir build
cd build
../configure
make
make check 
make install

Start Mesos Master

./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos

Start Mesos Slave

./bin/mesos-slave.sh --master=127.0.0.1:5050

Web UI

http://127.0.0.1:5050
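
If you would rather check the master from a script than from the browser, something like the following may work. Treat it as a sketch under assumptions: the /master/state.json endpoint and the field names used here (version, slaves, frameworks) are what I would expect this Mesos version to expose, so verify them against your own master first.

```python
import json
from urllib.request import urlopen


def fetch_state(master="127.0.0.1:5050"):
    """Fetch the master's state document (assumed endpoint: /master/state.json)."""
    with urlopen("http://%s/master/state.json" % master, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(state):
    """Reduce the state document to a few headline numbers (assumed field names)."""
    return {
        "version": state.get("version"),
        "slaves": len(state.get("slaves", [])),
        "frameworks": [f.get("name") for f in state.get("frameworks", [])],
    }
```

With the master and one slave running, print(summarize(fetch_state())) should show one slave, and the test framework should appear in the list while it is running.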

Running a test framework in Java

./src/examples/java/test-framework 127.0.0.1:5050

[Screenshots of the Mesos web UI: Mesos Home, Mesos Frameworks, Mesos executor tasks, Mesos Slave Nodes]



In the next post, let us see how to build our own distributed framework on Apache Mesos.
Happy Learning! :)

Tuesday, 18 June 2013

One Cap to rule 'em all ...

Wondering which cap it could be?
...Well, it's Capistrano :D

I have been a fan of Capistrano from way back, and we use it for almost all kinds of deployments: Hadoop, MongoDB clusters and so on.
If you have not tried Capistrano, you should try it and figure out how you can use it for deployments in your environment.

It's highly configurable, so capify your stuff!

Check out the capified scripts to deploy a replicated, sharded MongoDB cluster on AWS EC2 instances at the following link:
https://github.com/SwathiMystery/deploy_shard_mongodb
Feel free to experiment, report bugs/issues and contribute back.

For more details, follow the link below :
https://github.com/SwathiMystery/deploy_shard_mongodb/blob/master/README.md#deploy-replicated-sharded-mongodb-cluster

Tuesday, 18 December 2012

FUSE on Amazon S3

FUSE (Filesystem in Userspace) is a well-known open source project hosted on SourceForge: http://fuse.sourceforge.net/
With s3fs, a FUSE-based filesystem, you can mount an S3 bucket at a local mount point. You can put files either in the S3 bucket directly or in the mount point; both always have the same hierarchy and stay in sync. The best thing is that any arbitrary program can simply point to this mount point and use normal file commands rather than filesystem-specific commands.

Here is a short walkthrough of how we can achieve this.

1. Check out the code from Google Code.
$ svn checkout http://s3fs.googlecode.com/svn/trunk/ s3fs

2. Switch to the working directory
$ cd s3fs
$ ls 
AUTHORS  autogen.sh  ChangeLog  configure.ac  COPYING  doc  INSTALL  Makefile.am  NEWS  README  src  test

3. Now the same old ritual of configure, make and install.
To run the subsequent command you need autoconf, so make sure you have it by running the following command.
$ sudo apt-get install autoconf
$ autoreconf --install 
It is silently notifying you that you lack the libraries. Time to get them installed...
$ sudo apt-get install build-essential libfuse-dev fuse-utils libcurl4-openssl-dev libxml2-dev mime-support

Getting back...
$ ./configure --prefix=/usr
$ make
$ sudo make install

4. Done with the Installation process.
Cross-check:
$ /usr/bin/s3fs  
s3fs: missing BUCKET argument
Usage: s3fs BUCKET:[PATH] MOUNTPOINT [OPTION]...

5. Add the following line to your ~/.bashrc file and source it.
export s3fs=/usr/bin/s3fs
$ source ~/.bashrc
$ s3fs
s3fs: missing BUCKET argument
Usage: s3fs BUCKET:[PATH] MOUNTPOINT [OPTION]...

6. Install s3cmd. Many of you are probably already using this tool to interact with S3.
$ sudo apt-get install s3cmd
$ s3cmd --configure
This configures s3cmd for your S3 account using the Access Key and Secret Key.

Configuring FUSE
1. First set user_allow_other so that other users can access the mount. Uncomment it in fuse.conf.
$ sudo vi /etc/fuse.conf

2. Put your keys in the passwd-s3fs file in the format AccessKey:SecretKey.
$ sudo vi /etc/passwd-s3fs
$ sudo chmod 640 /etc/passwd-s3fs
3. Create a bucket for this experiment. I created one called "s3dir-sync".
$ s3cmd ls
2012-12-18 09:23  s3://s3dir-sync
4. Create a mount point where you want to place the files and keep them in sync with the S3 bucket. Create it as the root user.
$ sudo mkdir -p /mnt/s3Sync
$ sudo chmod 777 /mnt/s3Sync

5. Mount the bucket with s3fs, as the root user.
$ sudo s3fs s3dir-sync -o default_acl=public-read -o allow_other /mnt/s3Sync/
Cross-check:
$ mount -l
s3fs on /mnt/s3Sync type fuse.s3fs (rw,nosuid,nodev,allow_other)
If you try mounting again, you will get the following warning:
mount: according to mtab, s3fs is already mounted on /mnt/s3Sync

6. I created the following directory structure with a test file in it:
/mnt/s3Sync/
-> 2012/12/18
$ more test.txt
This is a check file to sync with the s3dir-sync.
Blah..!

The same is synced in the bucket "s3dir-sync"
Cross-Check: 
$ s3cmd ls s3://s3dir-sync
DIR   s3://s3dir-sync/2012/
2012-12-18 09:57         0   s3://s3dir-sync/2012
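
The two cross-checks in this walkthrough (is the bucket mounted, and where should a local file show up in S3) can also be scripted. The sketch below is an assumption-laden helper, not part of s3fs: the function names are made up, the parser only handles the simple "X on Y type Z (opts)" line format shown earlier, and it does not cope with mount points that contain spaces.

```python
import posixpath
import re

# Matches lines such as:
#   s3fs on /mnt/s3Sync type fuse.s3fs (rw,nosuid,nodev,allow_other)
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \(([^)]*)\)")


def s3fs_mount_points(mount_output):
    """Return the mount points of all fuse.s3fs entries in `mount -l` output."""
    points = []
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match and match.group(3) == "fuse.s3fs":
            points.append(match.group(2))
    return points


def to_s3_url(bucket, mount_point, local_path):
    """Map a path under the mount point to the s3:// URL it should sync to."""
    rel = posixpath.relpath(local_path, mount_point)
    if rel in (".", "..") or rel.startswith("../"):
        raise ValueError("%s is not inside %s" % (local_path, mount_point))
    return "s3://%s/%s" % (bucket, rel)
```

For example, to_s3_url("s3dir-sync", "/mnt/s3Sync", "/mnt/s3Sync/2012/12/18/test.txt") gives the URL you could then pass to s3cmd ls to confirm that the file really arrived in the bucket.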

Happy Learning! :)

Monday, 18 June 2012

Start Darting...(Contd.)

Back again! :)
The first part of "Start Darting" is at the following link:
http://www.dzone.com/links/start_darting_.html (or)
http://femgeekz.blogspot.in/2012/06/start-darting.html

Well, to start afresh, there is no better place than http://www.dartlang.org/docs/editor/getting-started/
The steps below are covered in more detail at the aforementioned link. Please do go through them. :)
Step 1: Download the Dart Editor. It's a nearly 110 MB .zip file, available for Windows, Linux and Mac.
Step 2: Launch the editor by double-clicking the executable.
Step 3: Run some Dart code.
We saw a Hello World sample in the previous post. The same sample can be run here.
If your code is a web app, the editor will launch it in Dartium (Chromium with the Dart VM).

  • New features in Dart:
      • Snapshots: Modern browsers need to parse a web app’s source code before the app can run. A snapshot records the app's state and code at a certain point in time, which speeds up startup.
      • Isolates: These are like processes, but without the overhead. Each isolate has its own memory and code, which can’t be affected by any other isolate. The only way an isolate can communicate with another isolate is by way of messages. Isolates allow a single app to use multi-core computers effectively. Another use for isolates is running code from different origins in the same page without compromising security.
      • Interfaces with default implementations: A Dart interface can have a default implementation: a class (usually private) that is the default way to create objects that implement the interface.
      • Easy generics: Dart takes a new approach by designing a generics system that’s more understandable.
      • Optional typing: Types are optional in Dart and don't change the way your app executes, but they can help developers and programming tools understand your code. You might not bother with types while you’re developing a prototype, but you might add types when you’re ready to commit to an implementation. An emerging pattern is to add types to interfaces and method signatures, and omit types inside methods.
      • HTML library: The Dart team also took a fresh look at how you should use the HTML DOM. (DOM is short for Document Object Model; it’s the interface that lets you programmatically update the content, structure, and style of a web page.) The result is a native Dart library (dart:html) for accessing and manipulating the DOM, with elements, attributes and nodes to work with.
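
The optional typing idea is easiest to see side by side. The sketch below is a Python analogy, not Dart code: Python's type hints behave similarly in that they document intent for people and tools but are not enforced when the program runs. The function names are made up for illustration.

```python
def area_untyped(width, height):
    """Prototype style: no types at all."""
    return width * height


def area_typed(width: float, height: float) -> float:
    """Same logic with types on the signature and none inside the body."""
    return width * height


# Both versions execute identically; the annotations only add information.
print(area_untyped(3, 4), area_typed(3, 4))
print(area_typed.__annotations__)
```

This mirrors the pattern mentioned above: put types on the signatures that other people call, and leave them out inside the method body.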
Un-boxing Dart:
Dart is more of a platform than a language. It has

    • Libraries: The core libraries cover things like math, server-side code, I/O and JSON.
    • Virtual Machine: The Dart Virtual Machine runs either embedded in a browser or on the command line.
    • Compiler to JavaScript: The Frog compiler compiles Dart code to JavaScript.
    • Language specification: The language looks very familiar, like Java or JavaScript, because Google wanted mass adoption of the language. If it were a completely unfamiliar language, only a few people would adopt it. Since it looks familiar, people feel comfortable with its newer ideas such as optional typing and isolates.
    • Dart Editor: It's an editor that helps you build a complete end-to-end app. It helps with syntax highlighting, auto-completion and running the code.
  • Compiling Dart code to JavaScript:
    • The Frog compiler, which comes with the SDK, compiles the given Dart code to JavaScript so that it can run on the client side. A future version of Chrome will ship with an embedded Dart VM, allowing your Dart code to run directly in Chrome without first being compiled to JavaScript.
  • Libraries to play with:
    • dart:core: This is the basic library used in the majority of scripts.
        • print() -> display basic text
        • Collection, Set, List, Queue -> group objects in collections
        • Map -> manipulate key-value pairs
        • Date, TimeZone, Duration, Stopwatch -> specify dates and times
        • String, Pattern, RegExp -> use strings
        • num, int, double, Math -> use numbers
        • Future -> return a value before your operation is complete
        • Comparable -> compare similar objects
        • Iterable -> perform an operation on each item in a multi-item object
    • dart:io: This library is used on the server side to connect with the outside world.
        • InputStream and OutputStream : Reading and writing data
        • Files and directories : Open and close files
        • Sockets: Connection to the network
    • dart:html: APIs for building the UI of web apps.
        • Document and window: get global objects
        • Element query and queryAll: find HTML elements
        • Element on property: add or remove event handlers
        • Collection interfaces: operate on groups of objects
There are many more libraries, and you can see them all when you download the editor and play with it.
Happy Learning! :)