In [122]:
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

Social media scraping 3/3

What have we achieved in the past 2 weeks?

1. Sanity checks

Do them

Srsly

E.g. a student email 💬

Message from Netvizz

Getting posts between 2017-09-11T00:00:00+0000 and 2017-09-18T23:59:59+0000. pid: 20446254070 / until:2017-06-19T01:15:00+0000 (100,1835008)

No posts were retrieved.

hmm... 🤔

Let's investigate

Read a Netvizz output file from scraping the page from the beginning of June to mid-November 2017.


In [47]:
biposts = pd.read_csv('page_20446254070_2017_11_14_15_20_00.tab',
                      sep='\t',
                      parse_dates=['post_published'])
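
A first sanity check on any scraped file is to compare the date range actually covered with the range you requested. A minimal sketch of the idea, using made-up stand-in data instead of the real Netvizz file:

```python
import pandas as pd

# Hypothetical stand-in for the Netvizz output (three posts in June).
posts = pd.DataFrame({
    'post_id': [1, 2, 3],
    'post_published': pd.to_datetime(['2017-06-01', '2017-06-10', '2017-06-18']),
})

# Actual coverage of the file vs. the start of the requested interval.
start, end = posts['post_published'].min(), posts['post_published'].max()
requested_start = pd.Timestamp('2017-09-11')

covered = start <= requested_start <= end
print(start.date(), end.date(), covered)  # the requested week is not covered
```

If `covered` is False, the "No posts were retrieved" message is no mystery: the file simply stops before the requested interval begins.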

Re-index by dates, resample weekly, and plot counts


In [158]:
biweeks = biposts.set_index('post_published')
ax = biweeks.resample('W')['post_id'].count().plot(title="Posts per week")


"Between 2017-09-11 and 2017-09-18", and "until 2017-06-19"


In [160]:
biweeks = biposts.set_index('post_published')
ax = biweeks.resample('W')['post_id'].count().plot(title="Posts per week")
ax.annotate('"until"', xy=('2017-06-19T01:15:00+0000', 100))
ax.annotate('Requested interval', xy=('2017-09-11T00:00:00+0000', 100))
ax.axvline('2017-06-19T01:15:00+0000', linestyle='dotted')
ax.axvspan('2017-09-11T00:00:00+0000', '2017-09-18T23:59:59+0000', alpha=0.3);


Or with Tableau

2. Making the new, value-added graphs with fb_scraper

See sections 9.1 Analysis: co-reaction graph and 9.2 Analysis: user co-interaction graph

Use write_graph in the social media scraping 2/3 notebook

write_graph(myjob1, 'CoReactionGraph')
write_graph(myjob2, 'UserCoInteractionGraph')
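
To see the idea behind a user co-interaction graph, here is a minimal sketch with made-up interaction data (not the fb_scraper implementation): users become linked when they interacted with the same post, and the edge weight counts how many posts they share.

```python
from itertools import combinations
from collections import Counter

import pandas as pd

# Made-up interactions: which user engaged with which post.
interactions = pd.DataFrame({
    'user': ['a', 'b', 'a', 'c', 'b'],
    'post': ['p1', 'p1', 'p2', 'p2', 'p2'],
})

# For each post, link every pair of users who interacted with it;
# repeated co-occurrence across posts increases the edge weight.
edges = Counter()
for _, users in interactions.groupby('post')['user']:
    for pair in combinations(sorted(users.unique()), 2):
        edges[pair] += 1

print(dict(edges))  # e.g. ('a', 'b') share both p1 and p2
```

The resulting weighted edge list can then be written out and explored in Gephi, which is the usual next step for these graph files.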

3. The other six Netvizz modules

  1. group data
  2. page data
  3. page like network
  4. page timeline images
  5. search
  6. link stats

4. Examples of social media scraping projects

5. A round of status updates

6. What is on your ignorance map?

Make one regarding your project, or an ignorance map of social media scraping in general