This notebook converts a mailing list (or a set of mailing lists) into a network of interactions.

What it does:
- it creates a network of interactions between senders and receivers of emails on one or more mailing lists
- it generates a .gexf file that can be imported into Gephi for visualization and analysis

Parameters to set:
- it can read one or more mailing lists, depending on how many URLs are set in the 'urls' variable; networks are aggregated across mailing lists
- it can filter the network by date; set the variables 'date_from' and 'date_to' to a date range consistent with the data


In [9]:
%matplotlib inline

In [1]:
from bigbang.archive import Archive
from bigbang.archive import load as load_archive
import bigbang.parse as parse
import bigbang.graph as graph
import bigbang.mailman as mailman
import bigbang.process as process
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
from pprint import pprint as pp
import pytz

In [13]:
#Insert the list of urls (one or more) from which to gather the data
#e.g. urls = ["http://mm.icann.org/pipermail/cc-humanrights/",
#             "http://mm.icann.org/pipermail/wp4/",
#             "http://mm.icann.org/pipermail/ge/"]

urls = ["http://mm.icann.org/pipermail/cc-humanrights/", 
                "http://mm.icann.org/pipermail/wp4/", 
                "http://mm.icann.org/pipermail/wp1/"]


#Build the local .csv path for each list; two path naming schemes are tried in turn
try:
    arch_paths = ['../archives/'+url[:-1].replace('://','_/')+'.csv' for url in urls]
    archives = [load_archive(arch_path).data for arch_path in arch_paths]
except Exception:
    #fall back to the second naming scheme if the first one cannot be loaded
    arch_paths = ['../archives/'+url[:-1].replace('//','/')+'.csv' for url in urls]
    archives = [load_archive(arch_path).data for arch_path in arch_paths]
#aggregate the messages of all the lists into a single table
archives_merged = pd.concat(archives)
archives_data = Archive(archives_merged).data
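
The path construction in the cell above can be checked in isolation. The sketch below is only an illustration: the helper names `archive_csv_path` and `archive_csv_path_alt` are hypothetical (they are not part of bigbang), and the code simply mirrors the two string replacements used in the notebook, assuming each URL ends with a trailing slash.

```python
def archive_csv_path(url, archive_dir='../archives/'):
    # Hypothetical helper: first naming scheme used in the notebook.
    # url[:-1] drops the last character, so a trailing '/' is assumed.
    return archive_dir + url[:-1].replace('://', '_/') + '.csv'


def archive_csv_path_alt(url, archive_dir='../archives/'):
    # Hypothetical helper: fallback naming scheme used in the notebook.
    return archive_dir + url[:-1].replace('//', '/') + '.csv'


url = "http://mm.icann.org/pipermail/wp4/"
primary = archive_csv_path(url)
fallback = archive_csv_path_alt(url)
print(primary)   # ../archives/http_/mm.icann.org/pipermail/wp4.csv
print(fallback)  # ../archives/http:/mm.icann.org/pipermail/wp4.csv
```

If neither path matches a file on disk, the archive has probably not been collected yet, or it is stored under a different folder.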

Set a valid date frame for building the network.


In [3]:
#The oldest and the most recent dates across all the mailing lists are displayed, so you won't set an invalid time frame
print(archives_data['Date'].min())
print(archives_data['Date'].max())


2014-12-02 01:06:24
2016-06-27 14:39:29

In [4]:
#set the date frame (the default values below are wide enough to include every message)
from datetime import datetime
date_from = datetime(2000,11,1,tzinfo=pytz.utc)
date_to = datetime(2111,12,1,tzinfo=pytz.utc)

Filter data according to date frame and export to .gexf file


In [5]:
def filter_by_date(df,d_from,d_to):
    return df[(df['Date'] > d_from) & (df['Date'] < d_to)]
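
To see how the filter behaves at the edges of the date frame, here is a minimal sketch on a toy table. The addresses and dates are invented for illustration, and the timestamps are timezone-naive on both sides; in recent pandas versions the bounds and the 'Date' column generally need to be either both timezone-aware or both naive for the comparison to work.

```python
import pandas as pd


def filter_by_date(df, d_from, d_to):
    # Same strict comparison as in the notebook: both bounds are excluded.
    return df[(df['Date'] > d_from) & (df['Date'] < d_to)]


# Toy data (invented), standing in for archives_data
toy = pd.DataFrame({
    'From': ['a@example.org', 'b@example.org', 'c@example.org'],
    'Date': pd.to_datetime(['2015-01-01', '2015-06-15', '2016-01-01']),
})

window = filter_by_date(toy, pd.Timestamp('2015-01-01'), pd.Timestamp('2016-01-01'))
senders = list(window['From'])
print(senders)  # ['b@example.org']
```

Because the comparison is strict, messages stamped exactly at 'date_from' or 'date_to' are left out; widen the frame slightly if they should be kept.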

In [6]:
#create filtered network
archives_data_filtered = filter_by_date(archives_data, date_from, date_to)
network = graph.messages_to_interaction_graph(archives_data_filtered)
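
As a rough mental model of what an interaction network is, the sketch below counts who replies to whom in a handful of invented messages. It is a conceptual illustration only, not bigbang's implementation of `messages_to_interaction_graph`; it assumes that a reply is linked to its parent through 'Message-ID' and 'In-Reply-To' fields.

```python
from collections import Counter

# Invented messages; field names are assumptions for this illustration
messages = [
    {'Message-ID': '<1>', 'From': 'alice@example.org', 'In-Reply-To': None},
    {'Message-ID': '<2>', 'From': 'bob@example.org', 'In-Reply-To': '<1>'},
    {'Message-ID': '<3>', 'From': 'alice@example.org', 'In-Reply-To': '<2>'},
    {'Message-ID': '<4>', 'From': 'bob@example.org', 'In-Reply-To': '<1>'},
]

# Map each message id to its sender
sender_of = {m['Message-ID']: m['From'] for m in messages}

# Count directed edges (replier, original sender)
edges = Counter()
for m in messages:
    parent = m['In-Reply-To']
    if parent in sender_of:
        edges[(m['From'], sender_of[parent])] += 1

for (replier, original), weight in sorted(edges.items()):
    print(replier, '->', original, weight)
```

Each pair becomes a directed edge whose weight is the number of replies, which is the kind of structure Gephi can lay out and analyze.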

In [7]:
#export the network in a format that you can open in Gephi. 

#insert a valid path and file name (e.g. path = 'c:/bigbang/network.gexf')
path = 'c:/users/davide/bigbang/network_for_gephi.gexf'

nx.write_gexf(network, path)
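
Before opening the file in Gephi, it can be useful to confirm that the export is readable. The sketch below writes a small invented graph to a temporary file and reads it back with networkx; the node names and the weight are made up, and the temporary path is only a stand-in for the 'path' variable set above.

```python
import os
import tempfile

import networkx as nx

# Small invented directed graph standing in for 'network'
toy_network = nx.DiGraph()
toy_network.add_edge('bob@example.org', 'alice@example.org', weight=2)

# Write to a temporary .gexf file, then read it back
tmp_dir = tempfile.mkdtemp()
tmp_path = os.path.join(tmp_dir, 'toy_network.gexf')
nx.write_gexf(toy_network, tmp_path)
reloaded = nx.read_gexf(tmp_path)

print(sorted(reloaded.nodes()))
print(reloaded.has_edge('bob@example.org', 'alice@example.org'))
```

The same read-back check can be run on the real export by passing the notebook's 'path' to `nx.read_gexf`.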
