(2) Reading Meetups from IBM Cloudant using Spark

This notebook will be the second in our series, we are going to utilize the Apache Spark and Cloudant Bluemix services to read data into Spark Dataframes from our IBM Cloudant Bluemix service.

Please reference the first notebook in our series, Streaming Meetups to IBM Cloudant using Spark, for a detailed list of prerequisites to get up and running.

1. Add Dependencies/Jars

In order to run this demonstration notebook we are using the cloudant-spark library. These scala dependencies/jars are added to our environment using the AddJar magic from the Spark Kernel, which adds the specified jars to the Spark Kernel and Spark cluster.

  • cloudant-spark - Allows us to perform Spark SQL queries against RDDs backed by Cloudant

In [1]:
%AddJar https://dl.dropboxusercontent.com/u/19043899/cloudant-spark.jar


Using cached version of cloudant-spark.jar

2. Initialize spark context with cloudant configurations

The Bluemix Apache Spark service notebook comes with a spark context ready to use, but we are going to have to modify this one to configure built in support for cloudant. Note for the demo purposes we are setting the spark master to run in local mode, but by default the Spark service will run in cluster mode. Update the HOST, USERNAME, and PASSWORD below with the credentials to connect to your Bluemix Cloudant service which our demo depends on. You can get these credentials from the Palette on the right by clicking on the Data Source option. If your data source does not exist add it using the Add Source button or if it already does you can use the "Insert to code" button to add the information to the notebook.


In [2]:
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Time, Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}

val conf = sc.getConf
conf.set("cloudant.host","HOST")
conf.set("cloudant.username", "USERNAME")
conf.set("cloudant.password","PASSWORD")

conf.setJars(ClassLoader.getSystemClassLoader.asInstanceOf[java.net.URLClassLoader].getURLs.map(_.toString).toSet.toSeq ++ kernel.interpreter.classLoader.asInstanceOf[java.net.URLClassLoader].getURLs.map(_.toString).toSeq)
conf.set("spark.driver.allowMultipleContexts", "true")
conf.set("spark.master","local[*]")
val scCloudant = new SparkContext(conf)
scCloudant.getConf.getAll.foreach(println)

3. Read from the IBM Cloudant Bluemix service

Using the cloudant-spark library we are able to seemlessly interact with our IBM Cloudant Bluemix Service meetup_group database through the abstraction of Spark Dataframes.


In [3]:
def readFromDatabase(sqlContext: SQLContext, databaseName: String) = {
    val df = sqlContext.read.format("com.cloudant.spark").load(databaseName)
    df
}

4. Read from IBM Cloudant and filter on our dataframe

First we must create an SQL context from our Spark context we created in step 2. We can then simply use our readFromDatabase function previously defined to perform Spark Dataframe operations such as filtering on fields.


In [4]:
val sqlContext = new SQLContext(scCloudant)
import sqlContext.implicits._
val df = readFromDatabase(sqlContext, "meetup_group")
df.show(5)
df.filter(df("group_city")==="Austin").show(5)


Use dbName=meetup_group, indexName=null, jsonstore.rdd.partitions=5, jsonstore.rdd.maxInPartition=-1, jsonstore.rdd.minInPartition=10, jsonstore.rdd.requestTimeout=100000,jsonstore.rdd.concurrentSave=-1,jsonstore.rdd.bulkSize=1
+--------------------+--------------------+----------+-------------+--------+---------+---------+--------------------+-----------+--------------------+--------------------+
|                 _id|                _rev|group_city|group_country|group_id|group_lat|group_lon|          group_name|group_state|        group_topics|       group_urlname|
+--------------------+--------------------+----------+-------------+--------+---------+---------+--------------------+-----------+--------------------+--------------------+
|1d335ff5eb23f7f00...|1-acea197c65b80be...|Richardson|           us|15196832|    32.96|   -96.75|Richardson/Plano ...|         TX|List([Referral Ma...|RichardsonPlanoNe...|
|1d335ff5eb23f7f00...|1-72d2c8a62057bcd...|    Austin|           us|18179233|    30.24|   -97.76|Doodle Crew USA :...|         TX|List([Cartoonists...|     Doodle-Dudes-TX|
|2c9d4bb8ad5eaa34f...|1-bbc4604e3e53946...|    Spring|           us|18652297|    30.14|   -95.47|Spring/Woodlands ...|         TX|List([Dining Out,...|Spring-Woodlands-...|
|2c9d4bb8ad5eaa34f...|1-ec2d7e34afb7173...|    Austin|           us| 5256562|    30.24|   -97.76|Austin Associatio...|         TX|List([Investing,i...|Austin-Associatio...|
|2c9d4bb8ad5eaa34f...|1-acea197c65b80be...|Richardson|           us|15196832|    32.96|   -96.75|Richardson/Plano ...|         TX|List([Referral Ma...|RichardsonPlanoNe...|
+--------------------+--------------------+----------+-------------+--------+---------+---------+--------------------+-----------+--------------------+--------------------+

+--------------------+--------------------+----------+-------------+--------+---------+---------+--------------------+-----------+--------------------+--------------------+
|                 _id|                _rev|group_city|group_country|group_id|group_lat|group_lon|          group_name|group_state|        group_topics|       group_urlname|
+--------------------+--------------------+----------+-------------+--------+---------+---------+--------------------+-----------+--------------------+--------------------+
|1d335ff5eb23f7f00...|1-72d2c8a62057bcd...|    Austin|           us|18179233|    30.24|   -97.76|Doodle Crew USA :...|         TX|List([Cartoonists...|     Doodle-Dudes-TX|
|2c9d4bb8ad5eaa34f...|1-ec2d7e34afb7173...|    Austin|           us| 5256562|    30.24|   -97.76|Austin Associatio...|         TX|List([Investing,i...|Austin-Associatio...|
|2c9d4bb8ad5eaa34f...|1-b07f0f605694ffc...|    Austin|           us| 7894862|    30.26|   -97.87|  Dance Walk! Austin|         TX|List([New In Town...|   Dance-Walk-Austin|
|39cc2df929299bb06...|1-f1b6882f0b1912a...|    Austin|           us| 3648022|    30.27|   -97.74|Austin Geeks and ...|         TX|List([Doctor Who,...|AustinGeeksandGamers|
|39cc2df929299bb06...|1-ee5bc44c0904ece...|    Austin|           us| 1758659|    30.27|   -97.74|AWESOME AUSTIN! I...|         TX|List([Dining Out,...|         AustinTexas|
+--------------------+--------------------+----------+-------------+--------+---------+---------+--------------------+-----------+--------------------+--------------------+