Twitter + Watson Tone Analyzer sample Notebook Part 1: Loading the data

In this Notebook, we show how to load the custom library generated as part of the Twitter + Watson Tone Analyzer streaming application. The code can be found here: https://github.com/ibm-watson-data-lab/spark.samples/tree/master/streaming-twitter. The following code uses a pre-built jar that has been posted on the GitHub project, but you can replace it with your own URL if needed.


In [1]:
%AddJar https://github.com/ibm-watson-data-lab/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar -f


Starting download from https://github.com/ibm-watson-data-lab/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar
Finished download of streaming-twitter-assembly-1.6.jar

Set up the Twitter and Watson credentials

Please refer to the tutorial for details on how to find the Twitter and Watson credentials, then replace the XXXX placeholders in the code below with your own values.


In [2]:
val demo = com.ibm.cds.spark.samples.StreamingTwitter
demo.setConfig("twitter4j.oauth.consumerKey","XXXX")
demo.setConfig("twitter4j.oauth.consumerSecret","XXXX")
demo.setConfig("twitter4j.oauth.accessToken","XXXX")
demo.setConfig("twitter4j.oauth.accessTokenSecret","XXXX")
demo.setConfig("watson.tone.url","https://gateway.watsonplatform.net/tone-analyzer-beta/api")
demo.setConfig("watson.tone.password","XXXX")
demo.setConfig("watson.tone.username","XXXX")
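
Hard-coding secrets in a notebook makes it easy to share them by accident. As a minimal sketch, the credentials could instead be read from environment variables and then passed to setConfig. The environment variable names and the resolveCredentials helper below are hypothetical and not part of the sample library; only the configuration keys come from the cell above, and whether environment variables are available depends on where the notebook kernel runs.

```scala
// Hypothetical environment variable names, mapped to the configuration keys used above.
val envToConfigKey: Seq[(String, String)] = Seq(
  "TWITTER_CONSUMER_KEY"        -> "twitter4j.oauth.consumerKey",
  "TWITTER_CONSUMER_SECRET"     -> "twitter4j.oauth.consumerSecret",
  "TWITTER_ACCESS_TOKEN"        -> "twitter4j.oauth.accessToken",
  "TWITTER_ACCESS_TOKEN_SECRET" -> "twitter4j.oauth.accessTokenSecret",
  "WATSON_TONE_USERNAME"        -> "watson.tone.username",
  "WATSON_TONE_PASSWORD"        -> "watson.tone.password"
)

// Hypothetical helper: resolve every configuration key from the given environment.
// Returns Left with the names of the missing variables, or Right with (key, value) pairs.
def resolveCredentials(env: Map[String, String]): Either[Seq[String], Seq[(String, String)]] = {
  val missing = envToConfigKey.collect { case (name, _) if !env.contains(name) => name }
  if (missing.nonEmpty) Left(missing)
  else Right(envToConfigKey.map { case (name, key) => key -> env(name) })
}

// In the notebook, the resolved pairs could then be applied with setConfig:
// resolveCredentials(sys.env) match {
//   case Right(pairs)  => pairs.foreach { case (k, v) => demo.setConfig(k, v) }
//   case Left(missing) => println("Missing environment variables: " + missing.mkString(", "))
// }
```

Returning the list of missing names, rather than failing on the first one, lets you fix every missing variable in one pass.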

Start the Spark Stream to collect live tweets

Start a new Twitter Stream that collects the live tweets and enriches them with tone scores from Watson Tone Analyzer. The stream runs for the duration specified in the second argument of the startTwitterStreaming method. Note: if no duration is specified, the stream runs until the stopTwitterStreaming method is called.


In [3]:
import org.apache.spark.streaming._
demo.startTwitterStreaming(sc, Seconds(40))


Twitter stream started
Tweets are collected real-time and analyzed
To stop the streaming and start interacting with the data use: StreamingTwitter.stopTwitterStreaming
Receiver Started: TwitterReceiver-0
Batch started with 139 records
Batch completed with 139 records
Batch started with 270 records
Stopping Twitter stream. Please wait this may take a while
Receiver Stopped: TwitterReceiver-0
Reason:  : Stopped by driver
Batch completed with 270 records
Twitter stream stopped
You can now create a sqlContext and DataFrame with 38 Tweets created. Sample usage: 
val (sqlContext, df) = com.ibm.cds.spark.samples.StreamingTwitter.createTwitterDataFrames(sc)
df.printSchema
sqlContext.sql("select author, text from tweets").show
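
The open-ended variant mentioned in the note above can be sketched as follows. This assumes that `sc` and `demo` are defined as in the earlier cells, and that omitting the duration argument is accepted as the note describes. Whether the start call returns immediately is not stated here, so treat the two calls as belonging to separate cells.

```scala
// Assumes `sc` and `demo` from the cells above.
// Start the stream without a duration, so it keeps collecting tweets.
demo.startTwitterStreaming(sc)

// Later, in a separate cell, once enough tweets have been collected:
demo.stopTwitterStreaming
```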

Create a SQLContext and a dataframe with all the tweets

Note: this method will register a SparkSQL table called tweets


In [4]:
val (sqlContext, df) = demo.createTwitterDataFrames(sc)


A new table named tweets with 38 records has been correctly created and can be accessed through the SQLContext variable
Here's the schema for tweets
root
 |-- author: string (nullable = true)
 |-- date: string (nullable = true)
 |-- lang: string (nullable = true)
 |-- text: string (nullable = true)
 |-- lat: double (nullable = true)
 |-- long: double (nullable = true)
 |-- Anger: double (nullable = true)
 |-- Disgust: double (nullable = true)
 |-- Fear: double (nullable = true)
 |-- Joy: double (nullable = true)
 |-- Sadness: double (nullable = true)
 |-- Analytical: double (nullable = true)
 |-- Confident: double (nullable = true)
 |-- Tentative: double (nullable = true)
 |-- Openness: double (nullable = true)
 |-- Conscientiousness: double (nullable = true)
 |-- Extraversion: double (nullable = true)
 |-- Agreeableness: double (nullable = true)
 |-- EmotionalRange: double (nullable = true)

Execute a SparkSQL query that contains all the data


In [5]:
val fullSet = sqlContext.sql("select * from tweets")  //Select all columns
fullSet.show


+--------------------+--------------------+-----+--------------------+---+----+------------------+------------------+------------------+-----------------+------------------+----------+---------+-----------------+-----------------+------------------+-----------------+-----------------+-----------------+
|              author|                date| lang|                text|lat|long|             Anger|           Disgust|              Fear|              Joy|           Sadness|Analytical|Confident|        Tentative|         Openness| Conscientiousness|     Extraversion|    Agreeableness|   EmotionalRange|
+--------------------+--------------------+-----+--------------------+---+----+------------------+------------------+------------------+-----------------+------------------+----------+---------+-----------------+-----------------+------------------+-----------------+-----------------+-----------------+
|Three Words o Wisdom|Sun Mar 06 13:00:...|en-gb|wildebeest rebuff...|0.0| 0.0|              11.0|              20.0|              19.0|             44.0|              22.0|       0.0|      0.0|              0.0|             80.0| 56.00000000000001|             15.0|              1.0|             39.0|
|             Jonny P|Sun Mar 06 13:00:...|   en|Getting a pizza i...|0.0| 0.0|               8.0|               5.0|              13.0|56.00000000000001|               5.0|       0.0|      0.0|56.99999999999999|             24.0|              23.0|             83.0|56.99999999999999|             82.0|
|               Kayla|Sun Mar 06 13:00:...|   en|RT @ebhoniogarro:...|0.0| 0.0|               2.0|               0.0|               1.0|             99.0|               2.0|       0.0|      0.0|              0.0|             30.0| 56.00000000000001|             85.0|             66.0|             39.0|
|             Adamlbr|Sun Mar 06 13:00:...|   en|New Event now on....|0.0| 0.0|              24.0|              10.0|              11.0|             46.0|               4.0|       0.0|      0.0|              0.0|             11.0|              98.0|             46.0|             49.0|              6.0|
|Lexa deserved better|Sun Mar 06 13:00:...|   en|RT @canoodleclexa...|0.0| 0.0|               8.0| 7.000000000000001|               9.0|             80.0| 7.000000000000001|      84.0|      0.0|              0.0|             12.0|28.000000000000004|             73.0|             59.0|             51.0|
|  LoveBakesGoodCakes|Sun Mar 06 13:00:...|   en|Yum, yum! Honey B...|0.0| 0.0|              41.0|               2.0|               6.0|             62.0| 7.000000000000001|       0.0|      0.0|              0.0|             60.0|              69.0|             64.0|             18.0|             11.0|
|    High Tech Planet|Sun Mar 06 13:00:...|   en|Google is testing...|0.0| 0.0|              11.0|               5.0|              32.0|             37.0|               5.0|      78.0|      0.0|              0.0|56.99999999999999|              30.0|              6.0|             13.0|57.99999999999999|
|                Kael|Sun Mar 06 13:00:...|   en|RT @mgiseelle: Ha...|0.0| 0.0|              16.0|               4.0|14.000000000000002|             23.0|              13.0|       0.0|      0.0|              0.0|             68.0|              85.0|57.99999999999999|             35.0|              6.0|
|                Ryan|Sun Mar 06 13:00:...|   en|ALL THAT EFFORT T...|0.0| 0.0|              19.0|14.000000000000002|              24.0|             12.0|              24.0|      61.0|     79.0|              0.0|             78.0|               3.0|             49.0|              1.0|             91.0|
|           princesss|Sun Mar 06 13:00:...|   en|RT @SexualGif: Be...|0.0| 0.0|              13.0| 7.000000000000001|              13.0|             34.0|              15.0|       0.0|      0.0|              0.0|56.00000000000001|              93.0|             62.0|             38.0|             39.0|
|         Fadi Nasser|Sun Mar 06 13:00:...|   en|#USA missiles cha...|0.0| 0.0| 7.000000000000001|              10.0|               8.0|             30.0|              13.0|       0.0|      0.0|              0.0|             94.0|              75.0|             27.0|             23.0|             20.0|
|            Briyon?e|Sun Mar 06 13:00:...|   en|RT @tonestradamus...|0.0| 0.0|              52.0|              19.0|               5.0|              1.0|14.000000000000002|      23.0|      0.0|             75.0|             21.0|               6.0|             84.0|             44.0|             59.0|
|       BarnBurnerBBQ|Sun Mar 06 13:00:...|   en|Presenting sponso...|0.0| 0.0|              10.0|              18.0|              10.0|             26.0|               8.0|      67.0|      0.0|              0.0|             36.0|              91.0|             71.0|             91.0|              2.0|
|        Majid Navabi|Sun Mar 06 13:00:...|   en|            Download|0.0| 0.0|              12.0|               9.0|              18.0|56.99999999999999|14.000000000000002|       0.0|      0.0|              0.0|             52.0| 56.00000000000001|             15.0|            100.0|              0.0|
|        ?????? ?????|Sun Mar 06 13:00:...|   en|RT @Adel__Almalki...|0.0| 0.0|              43.0|               6.0|              20.0|              3.0|               2.0|       0.0|      0.0|              0.0|             90.0| 56.00000000000001|             15.0|              1.0|             39.0|
|                 liv|Sun Mar 06 13:00:...|   en|RT @iamjojo: You ...|0.0| 0.0|               5.0|               2.0|               9.0|             89.0|               9.0|       0.0|      0.0|              0.0|              2.0|               2.0|            100.0|             85.0|              2.0|
|           LADY GAGA|Sun Mar 06 13:00:...|   en|Miek_tweet #TilIt...|0.0| 0.0|              16.0|              16.0|               8.0|             23.0|              21.0|       0.0|      0.0|              0.0|             80.0| 56.00000000000001|             15.0|              1.0|             39.0|
|        donatello ;)|Sun Mar 06 13:00:...|   en|RT @__trillgawdd:...|0.0| 0.0|14.000000000000002|               3.0|              13.0|             66.0|               9.0|       0.0|      0.0|              0.0|             30.0| 56.00000000000001|             53.0|             69.0|             20.0|
|                 Liz|Sun Mar 06 13:00:...|   en|RT @Samantha_Evel...|0.0| 0.0|              12.0|               8.0|              24.0|             10.0|              33.0|      43.0|     72.0|             91.0|              5.0|              12.0|             34.0|             61.0|             97.0|
|    Chrystal Johnson|Sun Mar 06 13:00:...|   en|Take Aromatherapy...|0.0| 0.0|              16.0|              12.0|              44.0|              8.0|               8.0|       0.0|      0.0|              0.0|             71.0|              96.0|             40.0|             60.0|              2.0|
+--------------------+--------------------+-----+--------------------+---+----+------------------+------------------+------------------+-----------------+------------------+----------+---------+-----------------+-----------------+------------------+-----------------+-----------------+-----------------+
only showing top 20 rows
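
Several scores in the table, such as 56.00000000000001 or 56.99999999999999, carry floating-point noise. This is consistent with a fractional score being scaled to a percentage in double precision, although nothing here says how the library computes these values. A minimal sketch for rounding such values to whole percentages for display, using a few scores copied from the table:

```scala
// A few raw scores copied from the table above.
val rawScores = Seq(56.00000000000001, 56.99999999999999, 7.000000000000001, 14.000000000000002)

// Round each score to the nearest whole percentage (math.round returns a Long).
val rounded = rawScores.map(s => math.round(s))

rounded.foreach(println)
```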

SparkSQL query example on the data.

Select all the tweets that have an Anger score greater than 60%


In [6]:
val set = sqlContext.sql("select text from tweets where Anger > 60")
println(set.count)
set.show


0
+----+
|text|
+----+
+----+
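
The same filter can also be written with the DataFrame API instead of a SQL string. This is a sketch that assumes `df` is the DataFrame returned by createTwitterDataFrames in the earlier cell; it should select the same rows as the SQL query above.

```scala
// Assumes `df` from the createTwitterDataFrames cell above.
// Keep only the rows whose Anger score exceeds 60, then project the text column.
val angry = df.filter(df("Anger") > 60).select("text")
println(angry.count)
angry.show
```

Referencing the column through `df("Anger")` means a misspelled column name fails when the expression is built rather than producing an error inside a SQL string.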

Persist the dataset into a parquet file on Object Storage service

The parquet file will be reloaded in the IPython Part 2 Notebook. Note: you can disregard the warning messages related to SLF4J.


In [7]:
fullSet.repartition(1).saveAsParquetFile("swift://notebooks.spark/tweetsFull.parquet")


SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
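
To check that the file was written, it can be read back from the same notebook. This is a sketch that assumes `sqlContext` from the earlier cells, Spark 1.4 or later (where `sqlContext.read` is available), and that the Object Storage path above is reachable from the kernel.

```scala
// Assumes `sqlContext` from the cells above and Spark 1.4 or later.
// Read the parquet file back from the path it was saved to.
val reloaded = sqlContext.read.parquet("swift://notebooks.spark/tweetsFull.parquet")

// The schema and row count should match the tweets table saved above.
reloaded.printSchema
println(reloaded.count)
```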
