Recipe for Project-001

Install Antelope

  • Download antelope 5.6 ISO from http://www.brtt.com/home.html
  • Mount the ISO on CentOS 7 and install
  • Email the license request file with the IP address associate with the VM

Create Instances

  • Launch instance.

  • Fill "Instance Name".

  • Choose flavor for your instance. For MongoDB node, small or medium is sufficient. However, if the data set is extremely large, extra storage node is recommended.

  • Select Instance Count.

  • Select Instance Boot Source. The recommend way to do this is to Launch an empty instance from scratch. Walk through the whole recipe, install the necessity. Save it as a snapshot. Launch the rest instance from the snapshot.

  • CentOS 7 and Ubuntu 16.04 is recommended when selecting OS.

Config the network of the Cluster

The first node we launched is head node. Only this node needs to be associated with a floating IP. Every time we log through head node and ssh to the rest. 1.After launching all the instance, edit /etc/hosts file in the head node, put the ip address and hostname in this file.

2.To log passwordlessly, generate a new ssh key and add it to the ssh-agent on the head node. Also, upload the public key to the Chameleon key pair so that every node can be accessed through this key.

3.The final step is to log to every node manually and add the head node to the known_hosts on every node.

MongoDB Installment

To install MongoDB, we nee to configure the package management system. Here Yum for example.

1.Create a /etc/yum.repos.d/mongodb-org-3.2.repo file with root privilege so that you can install MongoDB directly, add:

[mongodb-org-3.2]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.2/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-3.2.asc

2.Install the MongoDB packages and associated tools.

sudo yum -y install mongodb-org

3.Before we start the Mongod service, we need to edit /etc/mongo.conf first. Since we are building a MongoDB cluster, we need to let each MongoDB node to bind the internal network kIP address. We will use default setting for other configurations.

Deploy a Sharded MongoDB Cluster

A sharded cluster can be deployed in the Cloud Manager, Ops Manager or with command lines.

A sharded MongoDB cluster includes config server and shard server.

Create the Config Server Replica Set

1.Start each member of the config server replica set. It's recommended to have at least three members within a config server replica set in production deployment for load balancing and fail safe. But I create a single-member replica set here for testing purposes.

2.Modify configuration file /etc/mongo.conf:

sharding:
  clusterRole: configsvr
replication:
  replSetName: <setname>

3.Start the mongod service.

sudo service mongod start

4.Initiate the replica set using the rs.initiate() method and a configuration document:

rs.initiate(
  {
    _id: "<replSetName>",
    configsvr: true,
    members: [
      { _id : 0, host : "cfg1.example.net:27017" }
    ]
  }
)

Create the Shard Replica Sets

1.Start each memeber of the shard replica set within the shard server.

2.Modify configuration file /etc/mongo.conf:

sharding:
  clusterRole: shardsvr
replication:
  replSetName: <replSetName>

3.start the mongod service.

sudo service mongod start

4.Initiate the replica set using rs.initiate() method:

rs.initiate(
  {
    _id : <replicaSetName>,
    members: [
      { _id : 0, host : "s1-mongo1.example.net:27017" }
  }
)

Connect a mongos to the sharded cluster

1.Modify the configuration file, set the sharding.configDB to the config server replica set name and at least one member of the replica set in /host:port format.

sharding:
  configDB: <configReplSetName>/cfg1.example.net:27017,...

2.Connect a mongo shell to the mongos.

mongos --host ip --port 27017

Add Shards to the cluster

1.Use the sh.addShard() method within mongo shell to add each shard to the cluster. If the shard is a replica set, specify the name of the replica set and specify a member of the set. In production deployments, all shards should be replica sets. Before doing so, make sure Sharding is enabled for a database. To proceed, you must be connected to a mongos associated to the target sharded cluster.

sh.addShard( "<replSetName>/s1-mongo1.example.net:27017")

2.Before you can shard a collection, you must enable sharding for the collection’s database. Enabling sharding for a database does not redistribute data but make it possible to shard the collections in that database. Once you enable sharding for a database, MongoDB assigns a primary shard for that database where MongoDB stores all data in that database. Use the sh.enableSharding() method to enable sharding on the target database.

sh.enableSharding("<database>")

3.To shard a collection, use the sh.shardCollection() method. You must specify the full namespace of the collection and a document containing the shard key. The database must have sharding enabled.

sh.shardCollection("<database>.<collection>", { <key> : <direction> } )

Jupyter Notebook Installment

To use Jupyter notebook, there are two ways to install it. One way is to use easy_install or pip, another way is to install through Anaconda. Anaconda is one of the best open data science platform powered by Python. It helps to manage and switch multiple isolated Python environments easily.

To install, you need to download the installer from its website based on your Operating System. After installment, you will need to install PyMongo for accessing MongoDB and ObsPy for accessing Seismic Processing libraries.

conda install pymongo
conda config --add channels conda-forge
conda config --add channels obspy
conda install obspy

Since we only need to execute our notebook on the head node. Hence, this step only needs to be performed on the head node.