In [1]:
from __future__ import print_function
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn # For plotting
import time
import json
from IPython import display # To work with graphs in Jupyter
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import desc
from collections import namedtuple # Function for creating tuple subclasses with named fields
In [2]:
# magic function to plot inline
%matplotlib inline
In [3]:
if __name__ == "__main__":
sc = SparkContext(appName="TwitterRetweet")
ssc = StreamingContext(sc, 60 * 60) # Setting 1hr interval
sqlContext = SQLContext(sc) # Sql context for running sql query
# Host port of server which is sending text stream
host = "localhost"
port = 8700
socketStream = ssc.socketTextStream(host, port) # Connecting to socket
dStream = socketStream.window(60 * 60) # Setting 1hr window
def parseTweet(dStream): # Data Manupulation
try:
data = json.loads(dStream) # Load the json data
return [( # Tuple of name and follower count
data.get("text", "undefined"),
int(data.get("retweetcount", 0))
)]
except:
return []
def displayTweet(time, rdd): # Print the data in readable format
try:
print(time)
print("Top 100 Popular Tweets: ")
print("Rank".center(6, "-") + "|" + "Tweet".center(40, "-") + "|" + "Retweet Count".center(20, "-"))
for rank, item in enumerate(rdd.distinct().takeOrdered(100, key=lambda x: -x[1])):
print(str(rank + 1).center(6, " ") +
"|" + item[0] +
"|" + str(item[1]).rjust(15, " ")
)
except ValueError:
pass
_influencial = dStream.flatMap(parseTweet)\
.transform( # Sorting the data
lambda rdd: rdd.sortBy(lambda x: x[1], ascending=False)
).foreachRDD(displayTweet)
fields = ("id", "count")
Tweet = namedtuple('Tweet', fields)
# DStream where all the computation is done
(dStream.flatMap(parseTweet)\
.transform( # Sorting the data
lambda rdd: rdd.sortBy(lambda x: x[1], ascending=False)
)\
.map(lambda rec: Tweet(rec[0], rec[1]))\
.foreachRDD(lambda rdd: rdd.toDF().sort(desc("count"))\
.limit(10).registerTempTable("tweets")))
ssc.start()
# ssc.awaitTermination()
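The socket at localhost:8700 is assumed to be fed by a separate process that writes one JSON tweet per line. As a rough stand-in for that feeder (the file name tweets.json and the throttling delay here are invented for illustration, not part of the job above), a minimal script that replays saved tweets over the socket could look like this:

import socket
import time

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("localhost", 8700))
server.listen(1)
conn, _ = server.accept()  # Block until the Spark receiver connects
with open("tweets.json") as f:  # One JSON-encoded tweet per line
    for line in f:
        conn.sendall(line.encode("utf-8"))
        time.sleep(0.01)  # Throttle to mimic a live stream
conn.close()
server.close()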
In [4]:
while True:  # Display the graph here
    try:
        time.sleep(60 * 60)  # Sleep 1 hour; plot the graph once per hour
        topics = sqlContext.sql('SELECT id, count FROM tweets')
        topics = topics.toPandas()
        display.clear_output(wait=True)
        sn.set_style("whitegrid")  # Styling of the plot
        plt.figure(figsize=(10, 8))  # Figure size of the plot
        ax = sn.barplot(x=(topics.index.values + 1), y=topics["count"], estimator=sum)
        ax.set(xlabel='Rank of Tweet', ylabel='Retweet Count')  # Labelling of the plot
        plt.show()
    except KeyboardInterrupt:  # User interrupt
        ssc.stop()
        print("Stopping the program")
        break
    # Continue even if there is an exception; stop only on KeyboardInterrupt
    except Exception as e:
        print(e)
        continue
The reason the retweet count is always 0 is that we receive tweets the moment they are posted on the Twitter platform, so by the time a tweet reaches us no other user has had a chance to retweet it yet. To obtain a meaningful retweet_count we would have to re-fetch each tweet some time later through the REST API; the retweet_count in that response would then reflect the number of retweets accumulated up to that point in time. This is not done here, because re-fetching such a large volume of tweets would quickly exhaust the REST API rate limit.
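If such re-fetching were needed, a minimal sketch using tweepy might look like the following. This is an illustration only: the credential names are placeholders, statuses_lookup is the tweepy 3.x name (tweepy 4.x renames it to lookup_statuses), and it assumes the tweet IDs were stored while streaming.

import tweepy

# Placeholder credentials -- substitute your own app keys
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)  # Sleep through rate limits

def refetch_retweet_counts(tweet_ids):
    counts = {}
    # statuses_lookup accepts at most 100 IDs per request
    for i in range(0, len(tweet_ids), 100):
        for status in api.statuses_lookup(tweet_ids[i:i + 100]):
            counts[status.id] = status.retweet_count
    return counts

Even with wait_on_rate_limit=True, re-fetching every streamed tweet this way is impractical at scale, which is why it is omitted here.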