Visualising tweets of IPL 2016 (without much programming)

Registering Twitter application.

First, you'll want to head over to https://dev.twitter.com/apps and register an application!

1) Fill Credentials:



In [3]:

    
Image(filename='/home/nipun/Pictures/Twitter_registering_application.png')









    Out[3]:

2) Get key for your apps:



In [5]:

    
Image('/home/nipun/Pictures/Twitter_OAuth.png')









    Out[5]:

Scraping Tweets

Necessary python packages:
Tweepy: https://github.com/tweepy/tweepy

$pip install tweepy



In [ ]:

    
#Import the necessary methods from tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

#Variables that contains the user credentials to access Twitter API 
consumer_key= "################"
consumer_secret = "################"
access_token = "################"
access_token_secret= "################"
tweet_file_path="/home/nipun/Downloads/Twitter_scrape/Twitter_scrape.txt"
tweet_file_handle=open(tweet_file_path,"a")

#This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):

    def on_data(self, data):
        print data
        tweet_file_handle.write(str(data)+"\n")
        return True

    def on_error(self, status):
        print status



#This handles Twitter authetification and the connection to Twitter Streaming API
l = StdOutListener()
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth, l)

#This line filter Twitter Streams to capture data by the keywords: 'python', 'javascript', 'ruby'
stream.filter(track=['IPL2016'])

Configure your python file and write that data inside text file

for storing data

$python twitter_streaming.py > twitter_data.txt

JSON Format

JSON is a syntax for storing and exchanging data.
JSON is "self-describing" and easy to understand ( I am not kidding! ) > Let's Visualise it.
Key values kind of structure.

Data preprocessing and cleaning

1. Extracting main data from JSON



In [33]:

    
%matplotlib inline
import json
import pandas as pd
import matplotlib.pyplot as plt

tweets_data_path = '/home/nipun/Downloads/Twitter_scrape/Twitter_scrape.txt'

tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
    try:
        tweet = json.loads(line)
        tweets_data.append(tweet)
    except:
        continue
        
print len(tweets_data)



In [13]:

    
tweets_df = pd.DataFrame()
tweets_df['Username'] = map(lambda tweet: tweet['user']['name'], tweets_data)
tweets_df['statuses_count'] = map(lambda tweet: tweet['user']['statuses_count'], tweets_data)
tweets_df['friends_count'] = map(lambda tweet: tweet['user']['friends_count'], tweets_data)
tweets_df['followers_count'] = map(lambda tweet: tweet['user']['followers_count'], tweets_data)
tweets_df['text'] = map(lambda tweet: tweet['text'], tweets_data)
tweets_df['lang'] = map(lambda tweet: tweet['lang'], tweets_data)
tweets_df['location'] = map(lambda tweet: tweet['user']['location'], tweets_data)
tweets_df['created_at'] = map(lambda tweet: tweet['created_at'], tweets_data)



In [21]:

    
tweets_df.dtypes









    Out[21]:





Username           object
statuses_count      int64
friends_count       int64
followers_count     int64
text               object
lang               object
location           object
created_at         object
dtype: object

Converting created_at datatype from Object to Datetime



In [26]:

    
tweets_df['created_at'] = pd.to_datetime(tweets_df['created_at'].astype(str))
tweets_df.dtypes









    Out[26]:





Username                   object
statuses_count              int64
friends_count               int64
followers_count             int64
text                       object
lang                       object
location                   object
created_at         datetime64[ns]
dtype: object



In [27]:

    
tweets_df.head()









    Out[27]:






  
    
      
      Username
      statuses_count
      friends_count
      followers_count
      text
      lang
      location
      created_at
    
  
  
    
      0
      priyanka
      36122
      360
      10925
      RT @HEScreamSquad: Glenn Maxwell has taken a b...
      en
      india
      2016-04-23 08:27:25
    
    
      1
      MIK
      17959
      12676
      12558
      RT @HEScreamSquad: Glenn Maxwell has taken a b...
      en
      New Delhi, India
      2016-04-23 08:27:29
    
    
      2
      Suri Bishnoi
      4245
      112
      510
      RT @rajnish4587: #DDvMI #MIvsDD #IPL2016 #IPL9...
      en
      Jaipur, India
      2016-04-23 08:27:34
    
    
      3
      Hetal Vin
      102131
      1430
      2606
      @INBakwas #INBakwasIPL #IPL2016  #SRHvsKXIP\n ...
      en
      Surat, Gujarat
      2016-04-23 08:27:55
    
    
      4
      Vishvatimes
      32961
      73
      4523
      #आईपीएल : बेंगलोर ने पुणे को 13 रनों से हराया\...
      hi
      भारत
      2016-04-23 08:27:56



In [34]:

    
tweets_df.to_csv('/home/nipun/Downloads/Test_Twitter/Twitter_scrape.csv',encoding='utf-8')



In [29]:

    
tweets_df['Username'].value_counts()









    Out[29]:





The Gujarat Lions FC    32
Pune Supergiants FC     25
#IPL09                  19
❎✖©Company_420™®✖❎      15
Nazim Inamdar           13
IPL 2016 Tweets         11
vimalasubramanian       10
#VIVOIPL #IPL2016       10
Sradha Jena              9
#IPLT20 #IPL2016         8
Amey Bhosale             8
girijakriz               7
Kaptaan Kohli            6
sumit kumar              6
S K Sangwan              6
Modestologie             6
Sir Rohit Sharma         6
Thank You Prince         6
Passion❤FruiT            6
#NeverGiveUp             5
toma roy                 5
News Hunter              5
Cricket Trolls           5
`NigHtinGale°~°          5
ICC Live Scores          5
Golden Arrow             5
Sachin Tater             4
Aparna ツ                 4
Kamlesh                  4
ThatCricketGuy           4
                        ..
Mahesh Tikky             1
ICA Edu Skills           1
CHIRU                    1
#HOConZCafe              1
ADITYA SUMAN             1
Uzma Khan                1
ahamed yasir             1
Babu No Fear...          1
Hetal Vin                1
BiLaL_JaNi               1
Sarath Babu S            1
आदित्य तिवारी            1
Sachin Popat             1
Captains                 1
DEV                      1
docshan                  1
KreativeCreators         1
Rahul⏪                   1
Suyash Arora             1
Karuppaswamy             1
chandMishrA              1
#LIVE FOR OTHERS         1
Cricket Tweetz           1
Hridoy Paul              1
SRK GOD FANMania ❤       1
Michael O'Dwyer          1
NDTV Sports              1
The Viewspaper           1
Adhir Amdavadi           1
Logical Indian           1
dtype: int64

Data cleaning



In [41]:

    
new_tweet_df = pd.read_csv('/home/nipun/Downloads/Test_Twitter/Twitter_IPL2016.csv',header=0)
new_tweet_df['location'].value_counts()[0:20]









    Out[41]:





India                      951
Mumbai                     139
New Delhi                  107
Bangalore                   84
Rajkot                      76
Pune                        70
Hyderabad                   43
Cricket✔                    38
Global                      33
Chennai                     25
Pakistan                    25
Nepal                       23
Ahmedabad                   10
samastipur ( Bihar )        10
Kolkata, India              10
Trivandrum, India            9
Hisar (Haryanna) India.      9
cricket heart ❤              9
200 country                  9
india                        8
dtype: int64



In [40]:

    
#for simplicity in visualisation we will take only those locations 
#where count is greater than or equal to 10
tweet_filter_df = new_tweet_df.groupby("location").filter(lambda x: len(x) >= 10)
tweet_filter_df['location'].value_counts()









    Out[40]:





India                    951
Mumbai                   139
New Delhi                107
Bangalore                 84
Rajkot                    76
Pune                      70
Hyderabad                 43
Cricket✔                  38
Global                    33
Pakistan                  25
Chennai                   25
Nepal                     23
Kolkata, India            10
samastipur ( Bihar )      10
Ahmedabad                 10
dtype: int64



In [43]:

    
tweet_filter_df['lang'].value_counts()









    Out[43]:





en     1426
in       55
hi       43
und      41
et       24
da       23
cy       10
fi        5
tl        5
fr        4
mr        2
es        1
ml        1
bn        1
ht        1
de        1
lv        1
dtype: int64



In [51]:

    
tweet_filter_df['created_at'] = pd.to_datetime(tweet_filter_df['created_at'].astype(str))
tweet_filter_df.dtypes
tweet_filter_df.to_csv('/home/nipun/Downloads/Test_Twitter/Finished_Twitter_IPL2016.csv',index=0,encoding='utf-8')

Ordinary way to visualise data



In [44]:

    
tweets_by_lang = tweet_filter_df['lang'].value_counts()

fig, ax = plt.subplots()
ax.tick_params(axis='x', labelsize=15)
ax.tick_params(axis='y', labelsize=10)
ax.set_xlabel('Languages', fontsize=15)
ax.set_ylabel('Number of tweets' , fontsize=15)
ax.set_title('Top 5 languages', fontsize=15, fontweight='bold')
tweets_by_lang[:5].plot(ax=ax, kind='bar', color='cyan')









    Out[44]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f4d0605ee10>



In [ ]:

	Username	statuses_count	friends_count	followers_count	text	lang	location	created_at
0	priyanka	36122	360	10925	RT @HEScreamSquad: Glenn Maxwell has taken a b...	en	india	2016-04-23 08:27:25
1	MIK	17959	12676	12558	RT @HEScreamSquad: Glenn Maxwell has taken a b...	en	New Delhi, India	2016-04-23 08:27:29
2	Suri Bishnoi	4245	112	510	RT @rajnish4587: #DDvMI #MIvsDD #IPL2016 #IPL9...	en	Jaipur, India	2016-04-23 08:27:34
3	Hetal Vin	102131	1430	2606	@INBakwas #INBakwasIPL #IPL2016 #SRHvsKXIP\n ...	en	Surat, Gujarat	2016-04-23 08:27:55
4	Vishvatimes	32961	73	4523	#आईपीएल : बेंगलोर ने पुणे को 13 रनों से हराया\...	hi	भारत	2016-04-23 08:27:56