Michaël Defferrard, PhD student, EPFL LTS2
Theme of the exercise: understand the impact of your communication on social networks. A real-life situation: the marketing team needs help identifying which of the posts they made on social platforms were the most engaging, in order to prepare their next AdWords campaign.
As you probably don't have a company (yet?), you can either use your own social network profile as if it were the company's or choose an established entity, e.g. EPFL. You will need to be registered on FB or Twitter to generate access tokens. If you're not, either ask a classmate to create a token for you or create a fake / temporary account for yourself (no need to follow other people, we can fetch public data).
At the end of the exercise, you should have two datasets (Facebook & Twitter) and have used them to answer the following questions for both platforms.
Tasks:
Note that some data cleaning is already necessary at the collection stage. E.g. there are some FB posts without a message, i.e. without text, and some tweets are just retweets that add no information of their own. Should they be collected?
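Whether to keep such items is up to you. One possible rule, sketched in plain Python under two assumptions: a Facebook post is a Graph API dict whose text lives under `message`, and a tweet is a Twitter API v1.1 dict in which plain retweets carry a `retweeted_status` field. Quote tweets, which add commentary, are kept. The helper names `has_text`, `is_plain_retweet` and `clean` are hypothetical.

```python
def has_text(post):
    """True if a Facebook post dict has a non-empty 'message'."""
    # `or ''` also covers an explicit None value.
    return bool((post.get('message') or '').strip())


def is_plain_retweet(tweet):
    """True if a tweet dict is a retweet of someone else's tweet."""
    return 'retweeted_status' in tweet


def clean(posts, tweets):
    """Drop text-less posts and plain retweets, keeping the original order."""
    kept_posts = [p for p in posts if has_text(p)]
    kept_tweets = [t for t in tweets if not is_plain_retweet(t)]
    return kept_posts, kept_tweets
```

Keeping the rules as small predicates makes it easy to count how many items each rule discards before deciding whether to drop them.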
In [ ]:
# Number of posts / tweets to retrieve.
# Small value for development, then increase to collect final data.
n = 20 # 4000
There are two ways to scrape data from Facebook: querying the Graph API directly over HTTP with requests, or using the facebook-sdk client. You can choose one or combine them.
You will need an access token, which can be created with the help of the Graph Explorer. That tool may also prove useful to test queries. Once you have your token, you may create a credentials.ini file with the following content:
[facebook]
token = YOUR-FB-ACCESS-TOKEN
In [ ]:
import configparser
credentials = configparser.ConfigParser()
credentials.read('credentials.ini')
token = credentials.get('facebook', 'token')
# Or token = 'YOUR-FB-ACCESS-TOKEN'
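One pitfall of the cell above: `ConfigParser.read()` silently ignores a file it cannot open and returns the list of files it actually parsed, so a missing credentials.ini only surfaces later as a confusing `NoSectionError`. A small sketch that fails early instead (the helper name `load_credentials` is ours, not part of any library):

```python
import configparser


def load_credentials(path='credentials.ini'):
    """Parse the INI file and raise immediately if it could not be read."""
    credentials = configparser.ConfigParser()
    # read() returns the names of the files it parsed; empty means nothing was read.
    if not credentials.read(path):
        raise FileNotFoundError(f'could not read {path}')
    return credentials
```

Usage in the notebook would be `credentials = load_credentials()` followed by `token = credentials.get('facebook', 'token')`.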
In [ ]:
import requests # pip install requests
import facebook # pip install facebook-sdk
In [ ]:
page = 'EPFL.ch'
In [ ]:
# Your code here.
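A minimal sketch that queries the Graph API over plain HTTP with requests and follows the `paging.next` link of each response until `n` posts are collected. Assumptions: the API version in `GRAPH_URL` and the field list are illustrative (the `likes` edge and the field-expansion syntax differ between Graph API versions, so check them in the Graph Explorer first), and the helpers `posts_url`, `fetch_posts` and `to_record` are hypothetical. `get_json` is injectable so the paging logic can be tried offline.

```python
from urllib.parse import urlencode

# Assumption: replace with a Graph API version your token supports.
GRAPH_URL = 'https://graph.facebook.com/v2.8'
# Assumption: illustrative fields; summary(true) asks for total counts.
FIELDS = ('id,created_time,message,shares,'
          'likes.limit(0).summary(true),comments.limit(0).summary(true)')


def posts_url(page, token, limit=100):
    """URL of the first page of a Facebook page's posts."""
    query = urlencode({'fields': FIELDS, 'limit': limit, 'access_token': token})
    return f'{GRAPH_URL}/{page}/posts?{query}'


def fetch_posts(page, token, n, get_json=None):
    """Yield up to n raw post dicts, following the 'paging.next' links."""
    if get_json is None:
        import requests  # pip install requests

        def get_json(url):
            response = requests.get(url)
            response.raise_for_status()
            return response.json()

    url = posts_url(page, token)
    count = 0
    while url and count < n:
        response = get_json(url)
        for post in response.get('data', []):
            yield post
            count += 1
            if count >= n:
                return
        # The last page has no 'next' link, which ends the loop.
        url = response.get('paging', {}).get('next')


def to_record(post):
    """Flatten a raw post into a flat dict; missing counts become 0."""
    return {
        'id': post['id'],
        'time': post.get('created_time'),
        'text': post.get('message', ''),
        'likes': post.get('likes', {}).get('summary', {}).get('total_count', 0),
        'comments': post.get('comments', {}).get('summary', {}).get('total_count', 0),
        'shares': post.get('shares', {}).get('count', 0),
    }
```

With the notebook's variables, `posts = [to_record(p) for p in fetch_posts(page, token, n)]` would collect the records.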
There are many Python clients for Twitter; Tweepy is a popular choice.
You will need to create a Twitter app and copy its four tokens and secrets into the credentials.ini file:
[twitter]
consumer_key = YOUR-CONSUMER-KEY
consumer_secret = YOUR-CONSUMER-SECRET
access_token = YOUR-ACCESS-TOKEN
access_secret = YOUR-ACCESS-SECRET
In [ ]:
import tweepy # pip install tweepy
# Read the confidential tokens and authenticate.
auth = tweepy.OAuthHandler(credentials.get('twitter', 'consumer_key'),
                           credentials.get('twitter', 'consumer_secret'))
auth.set_access_token(credentials.get('twitter', 'access_token'),
                      credentials.get('twitter', 'access_secret'))
api = tweepy.API(auth)
# Public account to analyze.
user = 'EPFL_en'
In [ ]:
# Your code here.
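A minimal sketch for the Twitter side, assuming the `api` and `user` objects from the cell above: `tweepy.Cursor` pages through `api.user_timeline` (at most 200 tweets per request), and each status is flattened into a plain dict so that both datasets can later share column names. The helpers `tweet_to_record` and `collect_tweets` are hypothetical. As far as we know, the standard v1.1 timeline does not return reply counts, so the record has no `comments` field.

```python
def tweet_to_record(status):
    """Flatten a tweepy Status (or any object with v1.1 tweet attributes)."""
    # With tweet_mode='extended', the untruncated text is in full_text, not text.
    text = getattr(status, 'full_text', None) or getattr(status, 'text', '')
    return {
        'id': status.id,
        'time': status.created_at,
        'text': text,
        'likes': status.favorite_count,
        'shares': status.retweet_count,
        # Plain retweets embed the original tweet under retweeted_status.
        'is_retweet': hasattr(status, 'retweeted_status'),
    }


def collect_tweets(api, user, n):
    """Collect up to n tweets of `user` through an authenticated tweepy.API."""
    import tweepy  # imported here so tweet_to_record works without tweepy

    cursor = tweepy.Cursor(api.user_timeline, screen_name=user,
                           count=200, tweet_mode='extended')
    return [tweet_to_record(status) for status in cursor.items(n)]
```

In the notebook, `tweets = collect_tweets(api, user, n)` would give a list of dicts ready for `pandas.DataFrame`; the `is_retweet` flag lets you decide later whether to drop retweets.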
Answer the questions using pandas, statsmodels, scipy.stats and bokeh.
In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
%matplotlib inline
In [ ]:
# Your code here.
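To start on the marketing team's question, a sketch that ranks posts by engagement with pandas. Assumption: "engagement" is the unweighted sum of likes, comments and shares, using whichever of those columns exist, so the same function works on a dataset lacking one of them; weight the terms differently if, say, shares matter more for the campaign. The helper `most_engaging` and the input format (a list of dicts with those keys) are hypothetical.

```python
import pandas as pd


def most_engaging(records, k=10, metrics=('likes', 'comments', 'shares')):
    """Return the k records with the highest engagement score."""
    df = pd.DataFrame(records)
    # Only sum the metric columns this dataset actually has.
    present = [m for m in metrics if m in df.columns]
    df['engagement'] = df[present].fillna(0).sum(axis=1)
    # nlargest keeps the first occurrence on ties.
    return df.nlargest(k, 'engagement')
```

For example, `most_engaging(records, k=10)[['text', 'engagement']]` would list the ten candidate posts for the campaign.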