Twitter + Watson Tone Analyzer sample Notebook Part 1: Loading the data

In this Notebook, we show how to load the custom library generated as part of the Twitter + Watson Tone Analyzer streaming application. The code can be found here: https://github.com/ibm-cds-labs/spark.samples/tree/master/streaming-twitter. The following code uses a pre-built jar that has been posted on the GitHub project, but you can replace it with your own URL if needed.


In [1]:
%AddJar https://github.com/DTAIEB/demos/raw/master/streaming-twitter-assembly-1.5.jar -f


Starting download from https://github.com/ibm-cds-labs/spark.samples/raw/master/dist/streaming-twitter-assembly-1.2.jar
Finished download of streaming-twitter-assembly-1.2.jar

Set up the Twitter and Watson credentials

Please refer to the tutorial for details on how to find the Twitter and Watson credentials, then replace the XXXXX placeholders in the code below with your own values.


In [2]:
val demo = com.ibm.cds.spark.samples.StreamingTwitter
demo.setConfig("twitter4j.oauth.consumerKey","XXXXX")
demo.setConfig("twitter4j.oauth.consumerSecret","XXXXX")
demo.setConfig("twitter4j.oauth.accessToken","XXXXX")
demo.setConfig("twitter4j.oauth.accessTokenSecret","XXXXX")
demo.setConfig("watson.tone.url","https://gateway.watsonplatform.net/tone-analyzer-experimental/api")
demo.setConfig("watson.tone.password","XXXXX")
demo.setConfig("watson.tone.username","XXXXX")
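
Hard-coding secrets in a notebook makes them easy to leak when the notebook is shared. As a minimal sketch, assuming the credentials have been exported as environment variables, the helper below maps them onto the configuration keys used in the cell above. The environment variable names and the resolveCredentials helper are hypothetical and not part of the sample library; only the configuration keys come from the cell above. The watson.tone.url key is left out because it is not a secret.

```scala
// Configuration key -> environment variable name.
// The variable names are assumptions for illustration; pick your own.
val envNames: Seq[(String, String)] = Seq(
  "twitter4j.oauth.consumerKey"       -> "TWITTER_CONSUMER_KEY",
  "twitter4j.oauth.consumerSecret"    -> "TWITTER_CONSUMER_SECRET",
  "twitter4j.oauth.accessToken"       -> "TWITTER_ACCESS_TOKEN",
  "twitter4j.oauth.accessTokenSecret" -> "TWITTER_ACCESS_TOKEN_SECRET",
  "watson.tone.username"              -> "WATSON_TONE_USERNAME",
  "watson.tone.password"              -> "WATSON_TONE_PASSWORD"
)

// Hypothetical helper: returns Left(names of missing variables) if any
// variable is absent or blank, otherwise Right(config key -> value).
def resolveCredentials(env: Map[String, String]): Either[Seq[String], Map[String, String]] = {
  val missing = envNames.collect {
    case (_, name) if !env.get(name).exists(_.trim.nonEmpty) => name
  }
  if (missing.nonEmpty) Left(missing)
  else Right(envNames.map { case (key, name) => key -> env(name) }.toMap)
}

// Usage in the notebook (requires the `demo` object defined above):
// resolveCredentials(sys.env) match {
//   case Right(config) => config.foreach { case (key, value) => demo.setConfig(key, value) }
//   case Left(missing) => println("Missing environment variables: " + missing.mkString(", "))
// }
```

Reporting every missing variable at once, rather than failing on the first one, saves a round trip when several credentials have not been set yet.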

Start the Spark Stream to collect live tweets

Start a new Twitter stream that collects live tweets and enriches them with Sentiment Analysis scores. The stream runs for the duration given as the second argument of the startTwitterStreaming method. Note: if no duration is specified, the stream runs until the stopTwitterStreaming method is called.


In [3]:
import org.apache.spark.streaming._
demo.startTwitterStreaming(sc, Seconds(40))


Twitter stream started
Tweets are collected real-time and analyzed
To stop the streaming and start interacting with the data use: StreamingTwitter.stopTwitterStreaming
Stopping Twitter stream. Please wait this may take a while
Twitter stream stopped
You can now create a sqlContext and DataFrame with 184 Tweets created. Sample usage: 
val (sqlContext, df) = com.ibm.cds.spark.samples.StreamingTwitter.createTwitterDataFrames(sc)
df.printSchema
sqlContext.sql("select author, text from tweets").show

Create a SQLContext and a dataframe with all the tweets

Note: this method registers a SparkSQL table named tweets.


In [4]:
val (sqlContext, df) = demo.createTwitterDataFrames(sc)


A new table named tweets with 184 records has been correctly created and can be accessed through the SQLContext variable
Here's the schema for tweets
root
 |-- author: string (nullable = true)
 |-- date: string (nullable = true)
 |-- lang: string (nullable = true)
 |-- text: string (nullable = true)
 |-- lat: double (nullable = true)
 |-- long: double (nullable = true)
 |-- Cheerfulness: double (nullable = true)
 |-- Negative: double (nullable = true)
 |-- Anger: double (nullable = true)
 |-- Analytical: double (nullable = true)
 |-- Confident: double (nullable = true)
 |-- Tentative: double (nullable = true)
 |-- Openness: double (nullable = true)
 |-- Agreeableness: double (nullable = true)
 |-- Conscientiousness: double (nullable = true)

Execute a SparkSQL query that returns all the data


In [5]:
val fullSet = sqlContext.sql("select * from tweets")  //Select all columns
fullSet.show


+-------------------+--------------------+-----+--------------------+---+----+------------+--------+-----+----------+---------+---------+------------------+-------------+-----------------+
|             author|                date| lang|                text|lat|long|Cheerfulness|Negative|Anger|Analytical|Confident|Tentative|          Openness|Agreeableness|Conscientiousness|
+-------------------+--------------------+-----+--------------------+---+----+------------+--------+-----+----------+---------+---------+------------------+-------------+-----------------+
|       Hanna Atwood|Wed Oct 21 22:25:...|   en|RT @NoHoesMo: TOD...|0.0| 0.0|         0.0|     0.0|  0.0|       0.0|      0.0|      0.0|               1.0|         88.0|            100.0|
|              Donut|Wed Oct 21 22:25:...|   en|RT @3lazed: Ball ...|0.0| 0.0|         0.0|     0.0|  0.0|       0.0|      0.0|      0.0|              54.0|          0.0|             68.0|
|               la?s|Wed Oct 21 22:25:...|   en|@deusmelirry hsis...|0.0| 0.0|         0.0|     0.0|  0.0|       0.0|      0.0|      0.0|              99.0|          0.0|             49.0|
|               Camo|Wed Oct 21 22:25:...|   en|RT @TrapDrugs: It...|0.0| 0.0|       100.0|     0.0|  0.0|     100.0|      0.0|      0.0|               0.0|         59.0|              1.0|
|               njit|Wed Oct 21 22:25:...|   en|October 21, 2015 ...|0.0| 0.0|       100.0|     0.0|  0.0|       0.0|      0.0|      0.0|               0.0|        100.0|            100.0|
|       Emelly Mejia|Wed Oct 21 22:25:...|   en|RT @GirIsWant: li...|0.0| 0.0|         0.0|     0.0|  0.0|       0.0|      0.0|      0.0|               0.0|         43.0|             68.0|
|       Kristin Wong|Wed Oct 21 22:25:...|   en|I'm entered to wi...|0.0| 0.0|       100.0|     0.0|  0.0|       0.0|      0.0|      0.0|               0.0|         99.0|            100.0|
|               demi|Wed Oct 21 22:25:...|   en|RT @cindasmommy: ...|0.0| 0.0|        92.0|     0.0|  0.0|       0.0|      0.0|      0.0|               4.0|         98.0|             34.0|
|Long Island is Home|Wed Oct 21 22:25:...|   en|Audible Wilmer Fl...|0.0| 0.0|         0.0|     0.0|  0.0|       0.0|      0.0|      0.0|             100.0|          0.0|             68.0|
|        Thomas Gase|Wed Oct 21 22:25:...|   en|Action between Va...|0.0| 0.0|         0.0|     0.0|  0.0|       0.0|      0.0|      0.0|              20.0|         99.0|             68.0|
|               Lori|Wed Oct 21 22:25:...|   en|Somehow Kaylee op...|0.0| 0.0|         0.0|     0.0|  0.0|       0.0|    100.0|     90.0|               1.0|         97.0|              6.0|
|           KN SOLID|Wed Oct 21 22:25:...|   en|#PushAwardsKathNi...|0.0| 0.0|         0.0|     0.0|  0.0|       0.0|      0.0|      0.0|              97.0|          0.0|             68.0|
|                 Mo|Wed Oct 21 22:25:...|   en|RT @Johnny_Piazza...|0.0| 0.0|         0.0|   100.0|  0.0|       0.0|      0.0|    100.0|28.999999999999996|          0.0|              0.0|
|          06488jcsb|Wed Oct 21 22:25:...|   en|Get Weather Updat...|0.0| 0.0|         0.0|     0.0|  0.0|       0.0|      0.0|      0.0|              74.0|         45.0|             96.0|
|          06021lkta|Wed Oct 21 22:25:...|   en|Get Weather Updat...|0.0| 0.0|         0.0|     0.0|  0.0|       0.0|      0.0|      0.0|              74.0|         45.0|             96.0|
|   WeLiveinaKNWorld|Wed Oct 21 22:25:...|en-gb|RT @Silentkathnie...|0.0| 0.0|         0.0|     0.0|  0.0|       0.0|      0.0|      0.0|              97.0|          0.0|            100.0|
|       Prettyelisha|Wed Oct 21 22:25:...|   en|RT @AyRealTalk: Y...|0.0| 0.0|         0.0|     0.0|  0.0|       0.0|      0.0|      0.0|              97.0|          0.0|             68.0|
|7|-|3 4R(|-|4|\|93L|Wed Oct 21 22:25:...|   en|@BemetOr22 This i...|0.0| 0.0|         0.0|     0.0|  0.0|       0.0|      0.0|      0.0|              51.0|         15.0|             98.0|
|              ?IGY?|Wed Oct 21 22:25:...|   en|(2/3) --become a ...|0.0| 0.0|       100.0|   100.0|  0.0|      47.0|      0.0|     83.0|               0.0|         98.0|7.000000000000001|
|                KKU|Wed Oct 21 22:25:...|   en|@offbeatorbit I k...|0.0| 0.0|         0.0|     0.0|  0.0|     100.0|      0.0|      0.0|               4.0|         54.0|              9.0|
+-------------------+--------------------+-----+--------------------+---+----+------------+--------+-----+----------+---------+---------+------------------+-------------+-----------------+

Persist the dataset as a Parquet file on the Object Storage service

The Parquet file will be reloaded in the IPython Part 2 Notebook. Note: you can disregard the warning messages related to SLF4J.


In [6]:
fullSet.repartition(1).saveAsParquetFile("swift://notebooks.spark/tweetsFull.parquet")


SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

SparkSQL query example on the data.

Select all the tweets that have an Anger score greater than 70%


In [7]:
val angerSet = sqlContext.sql("select author, text, Anger from tweets where Anger > 70")
println(angerSet.count)
angerSet.show


17
+----------------+--------------------+-----+
|          author|                text|Anger|
+----------------+--------------------+-----+
|         Freddie|RT @AHSHotel_: Wh...|100.0|
|   Monica Abrego|THE NEW PATD SONG...|100.0|
|  sp??ky pumpkin|rohi you've got m...|100.0|
|    Funny funny |My XXXXXXXXXX so ...|100.0|
|             Vai|XXXXXXX love the ...|100.0|
| Carter Pederson|RT @EvanMcSan: Bi...|100.0|
|Upside-down Joke|@CazCoyote @chipf...|100.0|
|            Kwin|"I go XXX first....|100.0|
|          Berlin|I hate when a guy...|100.0|
|         blayne?|If I don't find y...|100.0|
|         Patrick|RT @Pro_Jones_: D...|100.0|
|  Drizzy A Reyes|Dear diary, today...|100.0|
|      Chika MADU|RT @EmekaGift: @M...|100.0|
|        UrbanKid|@TheSlimJesus i k...|100.0|
|      jocelynnnn|@_baileelara figh...|100.0|
|      Tay_Baeee?| Hate sleeping alone|100.0|
|           Miraf|@Called_A_Legend ...|100.0|
+----------------+--------------------+-----+
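
The same query shape works for any of the tone columns in the schema. As a minimal sketch, the helper below builds the SQL text for a given tone and threshold, so the query above can be repeated for other tones without retyping it. The toneQuery helper and the toneColumns set are hypothetical names introduced here for illustration; the column names are copied from the schema printed earlier.

```scala
// Tone columns as listed in the schema printed by createTwitterDataFrames.
val toneColumns: Set[String] = Set(
  "Cheerfulness", "Negative", "Anger", "Analytical", "Confident",
  "Tentative", "Openness", "Agreeableness", "Conscientiousness"
)

// Hypothetical helper: builds the SQL text only, it does not run the query.
// Unknown column names are rejected with an IllegalArgumentException.
def toneQuery(tone: String, threshold: Double): String = {
  require(toneColumns.contains(tone), s"Unknown tone column: $tone")
  s"select author, text, $tone from tweets where $tone > $threshold"
}

// Usage in the notebook (requires the sqlContext created above):
// sqlContext.sql(toneQuery("Cheerfulness", 70)).show
```

Checking the tone name against a fixed set keeps arbitrary text out of the SQL string, which matters if the tone is ever supplied by user input.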

