Aims :
To do so, we need a few things :
01-ML
folder.Ideal visualisation : Whenever we enter the page about someone -> Some general information (from Gaël's team maybe), and then, a graph with the votes (Q: What to display ?)
-> Idea : a vertical bar, which height corresponds to the number of law that the deputy voted on. Inside the vertical bar, show 5 sectors : percentage of yes/no/abstention/absent/president for each period. Then, click on it and a detailled view on a single period appears -> Same kind of graph, except the guy voted yes or no for a law (color for yes/no/abstention, other color for absent) + Color for whether the law was passed or not.
Maybe use stacked bar chart for the overall view (shift it so the bars are horizontal) or alternatively grouped to stacked
When you click on a session, maybe display something like Table with bar chart with change to another page, with either again the same chart, or a grouped bar chart, or we could use a mix between collapsed tree and a dendrogram with bars
In [1]:
import pandas as pd
import glob
import os
import numpy as np
import matplotlib.pyplot as plt
from helpers import load_vote_df, load_voting,format_voting_session, split_df_dict
%matplotlib inline
%load_ext autoreload
%autoreload 2
# There's a lot of columns in the DF.
# Therefore, we add this option so that we can see more columns
pd.options.display.max_columns = 100
In [2]:
VOTE_DICT = {1:'Yes', 2:'No', 3:'Abstention', 5:'Absent', 6:'Excused', 7:'President'}
vote_df = load_vote_df()
N.B. SessionName
field is rubbish -> The correct session is not the one mentionned in it. Hence we need to merge the Session ID with the one from the scraped session.
Looking at the Swiss Parties, their real names and abbreviation are the following:
However, we talk about the Groupes Parlementaires, which are not exactly the same as the Parties. We will work with those here.
format_voting_session : Appends session related fields to the voting data, preparing for aggregation.
In [3]:
voting_df = format_voting_session(load_voting())
voting_df.head()
Out[3]:
Still has a lot of data, but we tried to keep a minimum of it to make it readable by a human as well as having all the information we need to display.
In [4]:
voting_dict = split_df_dict(voting_df, 'Name')
Summary of the different votes possible
1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|
Yes | No | Abstention | No entry | Did not participate | Excused | The president of the session does not vote |
Counting the percentage of abstention by person, in order to see whether there is a person that would abstain significantly more often than the rest. It does not appear to be the case.
In [5]:
#count_abst = lambda x:
voting_test = voting_df[['Name','Decision']].groupby('Name').apply(lambda x: np.sum(x.Decision==3)/len(x.Decision))
voting_test.sort_values(ascending=False).head()
Out[5]:
Simply trying the aggregation at a session level for a deputy in particular, it will have to be done for every single one after, then exporting the resulting DataFrame to a .csv
or .json
to make it readable with javascript
.
In [6]:
test = voting_dict['Filippo Leutenegger']
test.head()
Out[6]:
In [7]:
voting_dict['Filippo Leutenegger'].loc[voting_dict['Filippo Leutenegger'].IdSession==4913].SessionName.unique()[0]
Out[7]:
In [8]:
test = voting_dict['Filippo Leutenegger'].groupby(['IdSession','SessionName','Date'])
count_yes = lambda x: np.sum(x==VOTE_DICT[1])
count_no = lambda x: np.sum(x==VOTE_DICT[2])
count_abstention = lambda x: np.sum(x==VOTE_DICT[3])
count_absent = lambda x: np.sum(x==VOTE_DICT[5])
count_excused = lambda x: np.sum(x==VOTE_DICT[6])
count_president = lambda x: np.sum(x==VOTE_DICT[7])
name_session = lambda x: x.unique()[0]
year_session = lambda x: x.unique()[0]
test = test.agg({'Decision':{'Yes': count_yes, 'No': count_no,'Abstention': count_abstention,
'Excused':count_excused, 'Absent':count_absent, 'President':count_president}})
test.columns = test.columns.droplevel(0)
#test_grouped = test_grouped.join(test['SessionName','Year'])
test = test.reset_index().set_index('IdSession')
test = test[['Yes','No','Abstention','Absent','Excused','President','SessionName','Date']]
#test['Total'] = test.sum(axis=1)
test
Out[8]:
Aggregating further up at a yearly level and displaying the result.
In [9]:
test = test.drop('SessionName',1).groupby('Date').apply(sum).drop('Date',1)
test['Presence'] = test['Yes']+test['No']+test['Abstention']+test['President']
test['Absence'] = test['Excused']+test['Absent']
test.plot.bar(x=test.index, y=['Yes','No','Abstention','Excused','Absent','President'])
#test.plot.bar(x=test.index,y=['Presence','Absence'])
Out[9]:
N.B. We checked for consistency, and from the time the person is elected, he appears at all subsequent votes (the sum of all the fields will give the total number of votes from the session)
In [10]:
from helpers import export_session_vote_csv
directory = '../../datas/analysis/voting_agg/'
if not os.path.exists(directory):
os.makedirs(directory)
for deputee, df in voting_dict.items():
export_session_vote_csv(directory,deputee,df,'_vote_session.csv')
df_grouped = df.groupby(['IdSession','SessionName','Date'])
Before adding fields, let us first merge some parts of the vote DataFrame
with the voting DataFrame
In [11]:
def merge_vote_voting(vote_df, voting_df):
vote_info_df = vote_df[['Subject','MeaningNo','MeaningYes','ID']]
vote_info_df.columns = ['Subject','MeaningNo','MeaningYes','IdVote']
return pd.merge(voting_df, vote_info_df, on ='IdVote')
voting_df = merge_vote_voting(vote_df,voting_df)
voting_df.head()
Out[11]:
First of all, count the overall result of the vote for the whole councile considered
In [31]:
def format_result_vote(voting_df):
# Setting Name, Party and Topic as indices
result_df = voting_df.set_index('IdVote')[['Decision']]
result_df = result_df.groupby(level='IdVote')
# Functions to count number of yes/no/absentions, same principle as before
count_total = lambda x: sum(x==VOTE_DICT[1])+sum(x==VOTE_DICT[2])+sum(x==VOTE_DICT[3])
count_yes = lambda x: int(round(100*np.sum(x==VOTE_DICT[1])/count_total(x),0))
count_no = lambda x: int(round(100*np.sum(x==VOTE_DICT[2])/count_total(x),0))
count_abstention = lambda x: int(round(100*np.sum(x==VOTE_DICT[3])/count_total(x),0))
result_df = result_df.agg({'Decision':{'Total_Yes': count_yes, 'Total_No': count_no,'Total_Abstention': count_abstention,
'Total_Total': count_total}})
result_df.columns = result_df.columns.droplevel(0)
result_df.reset_index(inplace=True)
return pd.merge(voting_df, result_df, on= 'IdVote')
Secondly, compute the voting result aggregated to a party level, to be able to know whetether someone voted like his party
In [32]:
def format_party_voting(voting_df):
# Setting Name, Party and Topic as indices
party_voting_df = voting_df.set_index(['ParlGroupCode','IdVote'])[['Decision']]
party_voting_df = party_voting_df.groupby(level=['ParlGroupCode','IdVote'])
# Functions to count number of yes/no/absentions, same principle as before
count_total = lambda x: sum(x==VOTE_DICT[1])+sum(x==VOTE_DICT[2])+sum(x==VOTE_DICT[3]) if sum(x==VOTE_DICT[1])+sum(x==VOTE_DICT[2])+sum(x==VOTE_DICT[3]) != 0 else 1
count_yes = lambda x: int(round(100*np.sum(x==VOTE_DICT[1])/count_total(x),0))
count_no = lambda x: int(round(100*np.sum(x==VOTE_DICT[2])/count_total(x),0))
count_abstention = lambda x: int(round(100*np.sum(x==VOTE_DICT[3])/count_total(x),0))
party_voting_df = party_voting_df.agg({'Decision':{'Party_Yes': count_yes, 'Party_No': count_no,'Party_Abstention': count_abstention,
'Party_Total':count_total}})
party_voting_df.columns = party_voting_df.columns.droplevel(0)
party_voting_df.reset_index(inplace=True)
return pd.merge(voting_df, party_voting_df, on=['ParlGroupCode', 'IdVote'])
Applying the functions we previously defined to make the information about the vote of each deputee complete.
In [33]:
deputee_vote = format_result_vote(voting_df)
deputee_vote = format_party_voting(deputee_vote)
deputee_vote = deputee_vote.fillna('Not specified')
deputee_vote = deputee_vote.drop(['ParlGroupName','ParlGroupCode','IdSession'],1)
Exporting the result to the voting_deputee
folder
In [34]:
deputee_vote.head()
Out[34]:
In [35]:
deputee_dict = split_df_dict(deputee_vote, 'Name')
from helpers import export_session_vote_csv
directory = '../../datas/analysis/voting_deputee/'
if not os.path.exists(directory):
os.makedirs(directory)
for deputee, df in deputee_dict.items():
df.to_csv(directory+deputee+'_vote.csv',index=False)