Starting the Infrastructure Cluster

NEXUS relies on Apache Solr to store metadata about tiles and Apache Cassandra to store the floating point array data associated with those tiles. Both Solr and Cassandra are distributed storage systems and can be run in a cluster.

Solr requires Apache ZooKeeper to run in cluster mode (called SolrCloud). This notebook walks through bringing up a 3-node Cassandra cluster, a 3-node ZooKeeper cluster, and a 3-node SolrCloud.

Step 1: Start One Cassandra Container

When initializing a Cassandra cluster, one or more nodes must be designated as 'seed' nodes. Seed nodes help bootstrap internode communication (gossip) when the other nodes join.

Therefore, the first step is to start one Cassandra container so that it can act as the seed node for the rest of our cluster.

TODO

  1. Navigate to the directory containing the docker-compose.yml file for the infrastructure cluster

    $ cd ~/nexus/esip-workshop/docker/infrastructure
    
  2. Use docker-compose to bring up the cassandra1 container.

    $ docker-compose up -d cassandra1
    
  3. Follow the logs for cassandra1 so you can tell when the node is ready before continuing:

    $ docker logs -f cassandra1
    
  4. Wait for the Cassandra node to start listening for CQL clients, which should take only a minute or so. Look for this line in the logs (press Ctrl+C to stop following them):

    Starting listening for CQL clients on /0.0.0.0:9042
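If you would rather script the wait than watch the logs by eye, the check in items 3 and 4 can be sketched in Python. This is a minimal, hypothetical sketch rather than part of the workshop materials: the names `wait_for_log_line`, `docker_logs`, and `READY_MARKER` are made up for illustration, and it assumes the docker CLI is available wherever the code runs (for example your host shell; that may not be true inside the Jupyter container). It polls `docker logs cassandra1` until the CQL-listener line appears or a timeout expires.

```python
import subprocess
import time

# Log line Cassandra prints once it accepts CQL connections (from item 4 above).
READY_MARKER = "Starting listening for CQL clients"


def docker_logs(container):
    """Return a container's logs via the docker CLI, with stdout and stderr merged.

    Raises subprocess.CalledProcessError if docker cannot read the logs
    (for example, if the container does not exist yet).
    """
    result = subprocess.run(
        ["docker", "logs", container],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def wait_for_log_line(container, marker=READY_MARKER, timeout=300.0,
                      interval=5.0, fetch_logs=docker_logs):
    """Poll a container's logs until `marker` appears or `timeout` seconds pass.

    Returns True if the marker was seen, False if the timeout expired first.
    `fetch_logs` is a hypothetical hook so a different log source can be used.
    """
    deadline = time.monotonic() + timeout
    while True:
        if marker in fetch_logs(container):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_log_line("cassandra1")` returns True once the seed node is ready. Re-reading the full log on every poll is simple rather than efficient, which is acceptable for a one-off startup check.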

Step 2: Start the Remaining Infrastructure Containers

Once the first Cassandra node is running, the rest of the infrastructure cluster can be brought online. The remaining 8 infrastructure containers can be started by running docker-compose again, this time without naming a service.

TODO

  1. Use docker-compose to bring up the remaining containers. Note: make sure you are still in the same directory as in Step 1 (~/nexus/esip-workshop/docker/infrastructure).

    $ docker-compose up -d
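Steps 1 and 2 together form a two-phase startup: the seed node first, everything else second. The Python sketch below captures that ordering under stated assumptions: `start_infrastructure` and `COMPOSE_DIR` are hypothetical names, `docker-compose` is assumed to be on the PATH of the machine running it, and the readiness check is passed in as a callable (for instance, one that looks for the CQL-listener log line from Step 1).

```python
import os
import subprocess

# Directory from Step 1 (hypothetical constant); adjust if your checkout lives elsewhere.
COMPOSE_DIR = os.path.expanduser("~/nexus/esip-workshop/docker/infrastructure")


def start_infrastructure(wait_for_seed, run=subprocess.check_call, compose_dir=COMPOSE_DIR):
    """Start the seed Cassandra node, wait for it, then start the other services.

    wait_for_seed: callable returning True once cassandra1 accepts CQL clients.
    run: callable taking a command list and a `cwd` keyword; the default,
         subprocess.check_call, raises if the command exits non-zero.
    """
    # Step 1: bring up only the seed node.
    run(["docker-compose", "up", "-d", "cassandra1"], cwd=compose_dir)
    if not wait_for_seed():
        raise RuntimeError("cassandra1 did not become ready; not starting the rest")
    # Step 2: with no service names, docker-compose starts every service in the
    # compose file; the already-running cassandra1 is left in place if unchanged.
    run(["docker-compose", "up", "-d"], cwd=compose_dir)
```

Passing `run` and `wait_for_seed` in as parameters keeps the ordering logic separate from Docker itself, so it can be tried with stand-in functions before pointing it at a real host.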
    

Step 3: Verify the Infrastructure has Started

There should now be 9 containers running, making up the 3-node Cassandra cluster, the 3-node ZooKeeper cluster, and the 3-node SolrCloud. Several commands can be used to verify that the cluster is active and healthy.

TODO

  1. List all running docker containers.

    $ docker ps
    

    The output should look similar to this (the jupyter container in the list is not one of the 9 infrastructure containers):

    CONTAINER ID        IMAGE                 COMMAND                  CREATED             STATUS              PORTS                                         NAMES  
    90d370eb3a4e        nexusjpl/jupyter      "tini -- start-not..."   30 hours ago        Up 30 hours         0.0.0.0:8000->8888/tcp                        jupyter  
    cd0f47fe303d        nexusjpl/nexus-solr   "docker-entrypoint..."   30 hours ago        Up 30 hours         8983/tcp                                      solr2  
    8c0f5c8eeb45        nexusjpl/nexus-solr   "docker-entrypoint..."   30 hours ago        Up 30 hours         8983/tcp                                      solr3  
    27e34d14c16e        nexusjpl/nexus-solr   "docker-entrypoint..."   30 hours ago        Up 30 hours         8983/tcp                                      solr1  
    247f807cb5ec        cassandra:2.2.8       "/docker-entrypoin..."   30 hours ago        Up 30 hours         7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp   cassandra3  
    09cc86a27321        zookeeper             "/docker-entrypoin..."   30 hours ago        Up 30 hours         2181/tcp, 2888/tcp, 3888/tcp                  zk1  
    33e9d9b1b745        zookeeper             "/docker-entrypoin..."   30 hours ago        Up 30 hours         2181/tcp, 2888/tcp, 3888/tcp                  zk3  
    dd29e4d09124        cassandra:2.2.8       "/docker-entrypoin..."   30 hours ago        Up 30 hours         7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp   cassandra2  
    11e57e0c972f        zookeeper             "/docker-entrypoin..."   30 hours ago        Up 30 hours         2181/tcp, 2888/tcp, 3888/tcp                  zk2  
    2292803d942d        cassandra:2.2.8       "/docker-entrypoin..."   30 hours ago        Up 30 hours         7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp   cassandra1  
    
  2. Get the Cassandra cluster status by running nodetool status inside the cassandra1 container.

    $ docker exec cassandra1 nodetool status
    

    You should see 3 nodes, each with the status UN (Up/Normal):

    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
    UN  172.18.0.2  4.8 GB     256          35.3%             d9a0d273-b11c-41dd-9da1-cb77882f275f  rack1
    UN  172.18.0.5  4.42 GB    256          33.2%             d68d9ea7-04a0-4eaf-b9c6-333b606bd2b1  rack1
    UN  172.18.0.7  4.16 GB    256          31.5%             6f8683f9-abf8-4466-87bc-a5faa048956d  rack1
    
  3. Get the status of the SolrCloud by running the cell below.


In [ ]:
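# TODO Run this cell to get the status of the Solr cluster. You should see a collection called
# 'nexustiles' with 3 shards spread across all 3 nodes.

import json

import requests

# Ask the SolrCloud Collections API on solr1 for the status of the whole cluster.
response = requests.get(
    'http://solr1:8983/solr/admin/collections',
    params={'action': 'CLUSTERSTATUS', 'wt': 'json'},
    timeout=30,
)
response.raise_for_status()  # stop with a clear error if Solr returned an HTTP error
print(json.dumps(response.json(), indent=2))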
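The three checks in Step 3 can also be done programmatically. The helpers below are a hypothetical sketch: they only parse text or JSON you have already collected, namely the output of `docker ps --format '{{.Names}}'`, the output of `nodetool status`, and the dictionary returned by `response.json()` in the cell above. The expected container names are taken from the sample `docker ps` output, and the CLUSTERSTATUS field names used here (`cluster`, `collections`, `shards`, `replicas`, `node_name`, `state`) reflect the Solr Collections API response format as we understand it; check them against your Solr version.

```python
import re

# Container names taken from the sample `docker ps` output in item 1.
EXPECTED_CONTAINERS = {
    name + str(i) for name in ("cassandra", "zk", "solr") for i in (1, 2, 3)
}

# A `nodetool status` node row starts with a two-letter code: U/D (Up/Down)
# followed by N/L/J/M (Normal/Leaving/Joining/Moving), then the node address.
NODE_ROW = re.compile(r"^([UD][NLJM])\s+(\S+)")


def missing_containers(names_output, expected=EXPECTED_CONTAINERS):
    """Given `docker ps --format '{{.Names}}'` output, list expected names not running."""
    running = {line.strip() for line in names_output.splitlines() if line.strip()}
    return sorted(expected - running)


def cassandra_node_states(nodetool_output):
    """Map each node address in `nodetool status` output to its state code."""
    states = {}
    for line in nodetool_output.splitlines():
        match = NODE_ROW.match(line.strip())
        if match:
            states[match.group(2)] = match.group(1)
    return states


def solr_collection_summary(cluster_status, collection="nexustiles"):
    """Summarize one collection from a Collections API CLUSTERSTATUS response."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    nodes = set()
    inactive = []
    for shard_name, shard in shards.items():
        if shard.get("state") != "active":
            inactive.append(shard_name)
        # Record every node that hosts a replica of this shard.
        for replica in shard["replicas"].values():
            nodes.add(replica["node_name"])
    return {
        "shards": len(shards),
        "nodes": sorted(nodes),
        "inactive_shards": sorted(inactive),
    }
```

For example, `solr_collection_summary(response.json())` should report 3 shards hosted on 3 distinct node names for a healthy `nexustiles` collection, and every value returned by `cassandra_node_states` should be `UN`.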

Congratulations!

You have successfully started the NEXUS infrastructure. Your EC2 instance now has 9 containers running: