Person Analysis

Aims :

We want to see how a person behaves at each vote -> Create a DataFrame that records whether the person voted like the average of the party (Could be a good thing to have what the party instructs to vote at this time), and whether the person was present.
We want to compute some global statistics about each person :
- Percentage of the time during which a person votes like his party
- Percentage of abstention
- Percentage of absence (Days or vote the person did not take ?)
- Time at which he entered the parliament.

To do so, we need a few things :

The Vote file that we parsed, that records all the votations taking place at the parliament. We will need to find the earliest law that each person votes, and then, from this, count all the laws that have been voted, and count the number of votes the person did not attend.
The Voting file will be our primary source of informations, recording all the votes of every person at the parliament. We will need to split it by each person, which was already done beforehand in the 01-ML folder.

Ideal visualisation : Whenever we enter the page about someone -> Some general information (from Gaël's team maybe), and then, a graph with the votes (Q: What to display ?)

-> Idea : a vertical bar, which height corresponds to the number of law that the deputy voted on. Inside the vertical bar, show 5 sectors : percentage of yes/no/abstention/absent/president for each period. Then, click on it and a detailled view on a single period appears -> Same kind of graph, except the guy voted yes or no for a law (color for yes/no/abstention, other color for absent) + Color for whether the law was passed or not.

Maybe use stacked bar chart for the overall view (shift it so the bars are horizontal) or alternatively grouped to stacked

When you click on a session, maybe display something like Table with bar chart with change to another page, with either again the same chart, or a grouped bar chart, or we could use a mix between collapsed tree and a dendrogram with bars



In [1]:

    
import pandas as pd
import glob
import os
import numpy as np
import matplotlib.pyplot as plt
from helpers import load_vote_df, load_voting,format_voting_session, split_df_dict
%matplotlib inline
%load_ext autoreload
%autoreload 2

# There's a lot of columns in the DF. 
# Therefore, we add this option so that we can see more columns
pd.options.display.max_columns = 100

1. Loading and formatting the Vote DataFrame

Convert date into datetime format with pd.to_datetime allows us to sort the dates the way we want.

Attributes of datetime object: year, month, day, hour, minute, second, microsecond, and tzinfo.

A few useful German words unknown to me.

Antrag : demande



In [2]:

    
VOTE_DICT  = {1:'Yes', 2:'No', 3:'Abstention', 5:'Absent', 6:'Excused', 7:'President'}

vote_df = load_vote_df()









    



Entries in the DataFrame (11182, 16)

N.B. SessionName field is rubbish -> The correct session is not the one mentionned in it. Hence we need to merge the Session ID with the one from the scraped session.

Looking at the Swiss Parties, their real names and abbreviation are the following:

PLR : Parti libéral-radical (PLR. Les Libéraux-Radicaux)
PDC : Parti démocrate-chrétien suisse
PS : Parti socialiste suisse
UDC : Union démocratique du centre
PES : Parti écologiste suisse (Les Verts)
PVL : Parti vert-libéral
PBD : Parti Bourgeois-Démocratique

However, we talk about the Groupes Parlementaires, which are not exactly the same as the Parties. We will work with those here.

2. Load the Voting DataFrame

format_voting_session : Appends session related fields to the voting data, preparing for aggregation.



In [3]:

    
voting_df = format_voting_session(load_voting())

voting_df.head()









    



Entries in the DataFrame (1713854, 26)






    Out[3]:






  
    
      
      Name
      BillTitle
      Decision
      BusinessShortNumber
      ParlGroupName
      ParlGroupCode
      IdVote
      IdSession
      VoteEnd
      SessionName
      Date
    
  
  
    
      0
      Adèle Thorens Goumaz
      Arrêté fédéral relatif à l’initiative populair...
      Yes
      13.093
      Groupe écologiste
      G
      13907
      4913
      2014-05-05
      Session spéciale mai 2014
      2014-05-05
    
    
      1
      Ada Marra
      Arrêté fédéral relatif à l’initiative populair...
      Yes
      13.093
      Groupe socialiste
      S
      13907
      4913
      2014-05-05
      Session spéciale mai 2014
      2014-05-05
    
    
      2
      Tiana Angelina Moser
      Arrêté fédéral relatif à l’initiative populair...
      Absent
      13.093
      Groupe vert'libéral
      GL
      13907
      4913
      2014-05-05
      Session spéciale mai 2014
      2014-05-05
    
    
      3
      Sylvie Perrinjaquet
      Arrêté fédéral relatif à l’initiative populair...
      Yes
      13.093
      Groupe radical-démocratique
      RL
      13907
      4913
      2014-05-05
      Session spéciale mai 2014
      2014-05-05
    
    
      4
      Natalie Rickli
      Arrêté fédéral relatif à l’initiative populair...
      Abstention
      13.093
      Groupe des Paysans, Artisans et Bourgeois
      V
      13907
      4913
      2014-05-05
      Session spéciale mai 2014
      2014-05-05

Still has a lot of data, but we tried to keep a minimum of it to make it readable by a human as well as having all the information we need to display.

3. Format the DataFrame

a) Session level statistics for each of the deputies.

Now, format the way we want to for exporting for each person

First of all : Global stats -> Yes/No/Abstention/Absence/Excused/President at a session level.



In [4]:

    
voting_dict = split_df_dict(voting_df, 'Name')









    



Number of unique entries in Name : 358

Summary of the different votes possible

1	2	3	4	5	6	7
Yes	No	Abstention	No entry	Did not participate	Excused	The president of the session does not vote

Counting the percentage of abstention by person, in order to see whether there is a person that would abstain significantly more often than the rest. It does not appear to be the case.



In [5]:

    
#count_abst = lambda x: 
voting_test = voting_df[['Name','Decision']].groupby('Name').apply(lambda x: np.sum(x.Decision==3)/len(x.Decision))
voting_test.sort_values(ascending=False).head()









    Out[5]:





Name
Yvonne Gilli          0.0
Hans Rutschmann       0.0
Franz Grüter          0.0
Franz Ruppen          0.0
Franziska Teuscher    0.0
dtype: float64

Simply trying the aggregation at a session level for a deputy in particular, it will have to be done for every single one after, then exporting the resulting DataFrame to a .csv or .json to make it readable with javascript.



In [6]:

    
test = voting_dict['Filippo Leutenegger']
test.head()









    Out[6]:






  
    
      
      BillTitle
      Decision
      BusinessShortNumber
      ParlGroupName
      ParlGroupCode
      IdVote
      IdSession
      VoteEnd
      SessionName
      Date
    
  
  
    
      77
      Arrêté fédéral relatif à l’initiative populair...
      Absent
      13.093
      Groupe radical-démocratique
      RL
      13907
      4913
      2014-05-05
      Session spéciale mai 2014
      2014-05-05
    
    
      277
      Arrêté fédéral relatif à l’initiative populair...
      Yes
      13.057
      Groupe radical-démocratique
      RL
      13909
      4913
      2014-05-06
      Session spéciale mai 2014
      2014-05-05
    
    
      477
      Loi fédérale sur les entreprises de transport ...
      Abstention
      13.072
      Groupe radical-démocratique
      RL
      13911
      4913
      2014-05-06
      Session spéciale mai 2014
      2014-05-05
    
    
      677
      Loi sur le transport de voyageurs (LTV)
      Yes
      13.072
      Groupe radical-démocratique
      RL
      13912
      4913
      2014-05-06
      Session spéciale mai 2014
      2014-05-05
    
    
      877
      Loi sur le transport de voyageurs (LTV)
      Abstention
      13.072
      Groupe radical-démocratique
      RL
      13913
      4913
      2014-05-06
      Session spéciale mai 2014
      2014-05-05



In [7]:

    
voting_dict['Filippo Leutenegger'].loc[voting_dict['Filippo Leutenegger'].IdSession==4913].SessionName.unique()[0]









    Out[7]:





'Session spéciale mai 2014'



In [8]:

    
test = voting_dict['Filippo Leutenegger'].groupby(['IdSession','SessionName','Date'])


count_yes = lambda x: np.sum(x==VOTE_DICT[1])
count_no = lambda x: np.sum(x==VOTE_DICT[2])
count_abstention = lambda x: np.sum(x==VOTE_DICT[3])
count_absent = lambda x: np.sum(x==VOTE_DICT[5])
count_excused = lambda x: np.sum(x==VOTE_DICT[6])
count_president = lambda x: np.sum(x==VOTE_DICT[7])
name_session = lambda x: x.unique()[0]
year_session = lambda x: x.unique()[0]

test = test.agg({'Decision':{'Yes': count_yes, 'No': count_no,'Abstention': count_abstention,
                'Excused':count_excused, 'Absent':count_absent, 'President':count_president}})

test.columns = test.columns.droplevel(0)
#test_grouped = test_grouped.join(test['SessionName','Year'])
test = test.reset_index().set_index('IdSession')
test = test[['Yes','No','Abstention','Absent','Excused','President','SessionName','Date']]
#test['Total'] = test.sum(axis=1)
test









    Out[8]:






  
    
      
      Yes
      No
      Abstention
      Absent
      Excused
      President
      SessionName
      Date
    
    
      IdSession
      
      
      
      
      
      
      
      
    
  
  
    
      4811
      107
      78
      27
      88
      0
      0
      Session d'automne 2009
      2009-09-07
    
    
      4812
      145
      86
      26
      47
      0
      0
      Session d'hiver 2009
      2009-11-23
    
    
      4813
      98
      51
      7
      142
      0
      0
      Session de printemps 2010
      2010-03-01
    
    
      4814
      159
      64
      35
      44
      0
      0
      Session d'été 2010
      2010-05-31
    
    
      4815
      129
      56
      18
      57
      0
      0
      Session d'automne 2010
      2010-09-13
    
    
      4816
      115
      40
      16
      71
      0
      0
      Session d'hiver 2010
      2010-11-29
    
    
      4817
      121
      62
      32
      55
      0
      0
      Session de printemps 2011
      2011-02-28
    
    
      4818
      20
      34
      5
      43
      0
      0
      Session spéciale avril 2011
      2011-04-11
    
    
      4819
      167
      105
      13
      119
      0
      0
      Session d'été 2011
      2011-05-30
    
    
      4820
      172
      92
      30
      105
      0
      0
      Session d'automne 2011
      2011-09-12
    
    
      4901
      67
      35
      8
      130
      0
      0
      Session d'hiver 2011
      2011-12-05
    
    
      4902
      128
      39
      15
      89
      0
      0
      Session de printemps 2012
      2012-02-27
    
    
      4903
      15
      36
      4
      69
      0
      0
      Session spéciale mai 2012
      2012-05-02
    
    
      4904
      123
      54
      9
      91
      0
      0
      Session d'été 2012
      2012-05-29
    
    
      4905
      139
      64
      11
      144
      0
      0
      Session d'automne 2012
      2012-09-10
    
    
      4906
      73
      51
      13
      43
      0
      0
      Session d'hiver 2012
      2012-11-26
    
    
      4907
      145
      73
      20
      18
      0
      0
      Session de printemps 2013
      2013-03-04
    
    
      4908
      29
      10
      6
      12
      0
      0
      Session spéciale avril 2013
      2013-04-15
    
    
      4909
      97
      75
      15
      25
      0
      0
      Session d'été 2013
      2013-06-03
    
    
      4910
      116
      70
      29
      70
      63
      0
      Session d'automne 2013
      2013-09-09
    
    
      4911
      88
      41
      18
      24
      0
      0
      Session d'hiver 2013
      2013-11-25
    
    
      4912
      95
      53
      18
      41
      18
      0
      Session de printemps 2014
      2014-03-03
    
    
      4913
      10
      8
      2
      1
      0
      0
      Session spéciale mai 2014
      2014-05-05

Aggregating further up at a yearly level and displaying the result.



In [9]:

    
test = test.drop('SessionName',1).groupby('Date').apply(sum).drop('Date',1)
test['Presence'] = test['Yes']+test['No']+test['Abstention']+test['President']
test['Absence'] = test['Excused']+test['Absent']

test.plot.bar(x=test.index, y=['Yes','No','Abstention','Excused','Absent','President'])
#test.plot.bar(x=test.index,y=['Presence','Absence'])









    Out[9]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f7633e80438>

N.B. We checked for consistency, and from the time the person is elected, he appears at all subsequent votes (the sum of all the fields will give the total number of votes from the session)

Exporting the results to the `analysis` folder in the datas



In [10]:

    
from helpers import export_session_vote_csv
directory = '../../datas/analysis/voting_agg/'
if not os.path.exists(directory):
    os.makedirs(directory)
    
for deputee, df in voting_dict.items():
    export_session_vote_csv(directory,deputee,df,'_vote_session.csv')
    
df_grouped = df.groupby(['IdSession','SessionName','Date'])

b) Detailled version for the votes of each deputee

Here, we store everything, sorted by session and time, that each deputee voted.

Before adding fields, let us first merge some parts of the vote DataFrame with the voting DataFrame



In [11]:

    
def merge_vote_voting(vote_df, voting_df):
    vote_info_df = vote_df[['Subject','MeaningNo','MeaningYes','ID']]
    vote_info_df.columns = ['Subject','MeaningNo','MeaningYes','IdVote']
    
    return pd.merge(voting_df, vote_info_df, on ='IdVote')

voting_df = merge_vote_voting(vote_df,voting_df)
voting_df.head()









    Out[11]:






  
    
      
      Name
      BillTitle
      Decision
      BusinessShortNumber
      ParlGroupName
      ParlGroupCode
      IdVote
      IdSession
      VoteEnd
      SessionName
      Date
      Subject
      MeaningNo
      MeaningYes
    
  
  
    
      0
      Adèle Thorens Goumaz
      Arrêté fédéral relatif à l’initiative populair...
      Yes
      13.093
      Groupe écologiste
      G
      13907
      4913
      2014-05-05
      Session spéciale mai 2014
      2014-05-05
      Art. 2
      Antrag der SVP-Fraktion (Empfehlung auf Annahm...
      Antrag der Kommission (Empfehlung auf Ablehung...
    
    
      1
      Ada Marra
      Arrêté fédéral relatif à l’initiative populair...
      Yes
      13.093
      Groupe socialiste
      S
      13907
      4913
      2014-05-05
      Session spéciale mai 2014
      2014-05-05
      Art. 2
      Antrag der SVP-Fraktion (Empfehlung auf Annahm...
      Antrag der Kommission (Empfehlung auf Ablehung...
    
    
      2
      Tiana Angelina Moser
      Arrêté fédéral relatif à l’initiative populair...
      Absent
      13.093
      Groupe vert'libéral
      GL
      13907
      4913
      2014-05-05
      Session spéciale mai 2014
      2014-05-05
      Art. 2
      Antrag der SVP-Fraktion (Empfehlung auf Annahm...
      Antrag der Kommission (Empfehlung auf Ablehung...
    
    
      3
      Sylvie Perrinjaquet
      Arrêté fédéral relatif à l’initiative populair...
      Yes
      13.093
      Groupe radical-démocratique
      RL
      13907
      4913
      2014-05-05
      Session spéciale mai 2014
      2014-05-05
      Art. 2
      Antrag der SVP-Fraktion (Empfehlung auf Annahm...
      Antrag der Kommission (Empfehlung auf Ablehung...
    
    
      4
      Natalie Rickli
      Arrêté fédéral relatif à l’initiative populair...
      Abstention
      13.093
      Groupe des Paysans, Artisans et Bourgeois
      V
      13907
      4913
      2014-05-05
      Session spéciale mai 2014
      2014-05-05
      Art. 2
      Antrag der SVP-Fraktion (Empfehlung auf Annahm...
      Antrag der Kommission (Empfehlung auf Ablehung...

First of all, count the overall result of the vote for the whole councile considered



In [31]:

    
def format_result_vote(voting_df):
    # Setting Name, Party and Topic as indices
    result_df = voting_df.set_index('IdVote')[['Decision']]

    result_df = result_df.groupby(level='IdVote')
    
    # Functions to count number of yes/no/absentions, same principle as before
    count_total = lambda x: sum(x==VOTE_DICT[1])+sum(x==VOTE_DICT[2])+sum(x==VOTE_DICT[3])
    count_yes = lambda x: int(round(100*np.sum(x==VOTE_DICT[1])/count_total(x),0))
    count_no = lambda x: int(round(100*np.sum(x==VOTE_DICT[2])/count_total(x),0))
    count_abstention = lambda x: int(round(100*np.sum(x==VOTE_DICT[3])/count_total(x),0))

    result_df = result_df.agg({'Decision':{'Total_Yes': count_yes, 'Total_No': count_no,'Total_Abstention': count_abstention,
                                          'Total_Total': count_total}})
    result_df.columns = result_df.columns.droplevel(0)
    result_df.reset_index(inplace=True)
    return pd.merge(voting_df, result_df, on= 'IdVote')

Secondly, compute the voting result aggregated to a party level, to be able to know whetether someone voted like his party



In [32]:

    
def format_party_voting(voting_df):
    # Setting Name, Party and Topic as indices
    party_voting_df = voting_df.set_index(['ParlGroupCode','IdVote'])[['Decision']]

    party_voting_df = party_voting_df.groupby(level=['ParlGroupCode','IdVote'])
    
    # Functions to count number of yes/no/absentions, same principle as before
    count_total = lambda x: sum(x==VOTE_DICT[1])+sum(x==VOTE_DICT[2])+sum(x==VOTE_DICT[3]) if sum(x==VOTE_DICT[1])+sum(x==VOTE_DICT[2])+sum(x==VOTE_DICT[3]) != 0 else 1
    count_yes = lambda x: int(round(100*np.sum(x==VOTE_DICT[1])/count_total(x),0))
    count_no = lambda x: int(round(100*np.sum(x==VOTE_DICT[2])/count_total(x),0))
    count_abstention = lambda x: int(round(100*np.sum(x==VOTE_DICT[3])/count_total(x),0))

    party_voting_df = party_voting_df.agg({'Decision':{'Party_Yes': count_yes, 'Party_No': count_no,'Party_Abstention': count_abstention,
                                                      'Party_Total':count_total}})
    party_voting_df.columns = party_voting_df.columns.droplevel(0)
    party_voting_df.reset_index(inplace=True)
    return pd.merge(voting_df, party_voting_df, on=['ParlGroupCode', 'IdVote'])

Applying the functions we previously defined to make the information about the vote of each deputee complete.



In [33]:

    
deputee_vote = format_result_vote(voting_df)
deputee_vote = format_party_voting(deputee_vote)
deputee_vote = deputee_vote.fillna('Not specified')
deputee_vote = deputee_vote.drop(['ParlGroupName','ParlGroupCode','IdSession'],1)

Exporting the result to the voting_deputee folder



In [34]:

    
deputee_vote.head()









    Out[34]:






  
    
      
      Name
      BillTitle
      Decision
      BusinessShortNumber
      IdVote
      VoteEnd
      SessionName
      Date
      Subject
      MeaningNo
      MeaningYes
      Total_Yes
      Total_Total
      Total_Abstention
      Total_No
      Party_No
      Party_Total
      Party_Abstention
      Party_Yes
    
  
  
    
      0
      Adèle Thorens Goumaz
      Arrêté fédéral relatif à l’initiative populair...
      Yes
      13.093
      13907
      2014-05-05
      Session spéciale mai 2014
      2014-05-05
      Art. 2
      Antrag der SVP-Fraktion (Empfehlung auf Annahm...
      Antrag der Kommission (Empfehlung auf Ablehung...
      74
      174
      14
      11
      0
      14
      0
      100
    
    
      1
      Yvonne Gilli
      Arrêté fédéral relatif à l’initiative populair...
      Yes
      13.093
      13907
      2014-05-05
      Session spéciale mai 2014
      2014-05-05
      Art. 2
      Antrag der SVP-Fraktion (Empfehlung auf Annahm...
      Antrag der Kommission (Empfehlung auf Ablehung...
      74
      174
      14
      11
      0
      14
      0
      100
    
    
      2
      Francine John-Calame
      Arrêté fédéral relatif à l’initiative populair...
      Yes
      13.093
      13907
      2014-05-05
      Session spéciale mai 2014
      2014-05-05
      Art. 2
      Antrag der SVP-Fraktion (Empfehlung auf Annahm...
      Antrag der Kommission (Empfehlung auf Ablehung...
      74
      174
      14
      11
      0
      14
      0
      100
    
    
      3
      Maya Graf
      Arrêté fédéral relatif à l’initiative populair...
      Yes
      13.093
      13907
      2014-05-05
      Session spéciale mai 2014
      2014-05-05
      Art. 2
      Antrag der SVP-Fraktion (Empfehlung auf Annahm...
      Antrag der Kommission (Empfehlung auf Ablehung...
      74
      174
      14
      11
      0
      14
      0
      100
    
    
      4
      Christian van Singer
      Arrêté fédéral relatif à l’initiative populair...
      Yes
      13.093
      13907
      2014-05-05
      Session spéciale mai 2014
      2014-05-05
      Art. 2
      Antrag der SVP-Fraktion (Empfehlung auf Annahm...
      Antrag der Kommission (Empfehlung auf Ablehung...
      74
      174
      14
      11
      0
      14
      0
      100



In [35]:

    
deputee_dict = split_df_dict(deputee_vote, 'Name')

from helpers import export_session_vote_csv
directory = '../../datas/analysis/voting_deputee/'
if not os.path.exists(directory):
    os.makedirs(directory)
    
for deputee, df in deputee_dict.items():
    df.to_csv(directory+deputee+'_vote.csv',index=False)









    



Number of unique entries in Name : 358

	Name	BillTitle	Decision	BusinessShortNumber	ParlGroupName	ParlGroupCode	IdVote	IdSession	VoteEnd	SessionName	Date
0	Adèle Thorens Goumaz	Arrêté fédéral relatif à l’initiative populair...	Yes	13.093	Groupe écologiste	G	13907	4913	2014-05-05	Session spéciale mai 2014	2014-05-05
1	Ada Marra	Arrêté fédéral relatif à l’initiative populair...	Yes	13.093	Groupe socialiste	S	13907	4913	2014-05-05	Session spéciale mai 2014	2014-05-05
2	Tiana Angelina Moser	Arrêté fédéral relatif à l’initiative populair...	Absent	13.093	Groupe vert'libéral	GL	13907	4913	2014-05-05	Session spéciale mai 2014	2014-05-05
3	Sylvie Perrinjaquet	Arrêté fédéral relatif à l’initiative populair...	Yes	13.093	Groupe radical-démocratique	RL	13907	4913	2014-05-05	Session spéciale mai 2014	2014-05-05
4	Natalie Rickli	Arrêté fédéral relatif à l’initiative populair...	Abstention	13.093	Groupe des Paysans, Artisans et Bourgeois	V	13907	4913	2014-05-05	Session spéciale mai 2014	2014-05-05

	Yes	No	Abstention	Absent	Excused	President	SessionName	Date
IdSession
4811	107	78	27	88	0	0	Session d'automne 2009	2009-09-07
4812	145	86	26	47	0	0	Session d'hiver 2009	2009-11-23
4813	98	51	7	142	0	0	Session de printemps 2010	2010-03-01
4814	159	64	35	44	0	0	Session d'été 2010	2010-05-31
4815	129	56	18	57	0	0	Session d'automne 2010	2010-09-13
4816	115	40	16	71	0	0	Session d'hiver 2010	2010-11-29
4817	121	62	32	55	0	0	Session de printemps 2011	2011-02-28
4818	20	34	5	43	0	0	Session spéciale avril 2011	2011-04-11
4819	167	105	13	119	0	0	Session d'été 2011	2011-05-30
4820	172	92	30	105	0	0	Session d'automne 2011	2011-09-12
4901	67	35	8	130	0	0	Session d'hiver 2011	2011-12-05
4902	128	39	15	89	0	0	Session de printemps 2012	2012-02-27
4903	15	36	4	69	0	0	Session spéciale mai 2012	2012-05-02
4904	123	54	9	91	0	0	Session d'été 2012	2012-05-29
4905	139	64	11	144	0	0	Session d'automne 2012	2012-09-10
4906	73	51	13	43	0	0	Session d'hiver 2012	2012-11-26
4907	145	73	20	18	0	0	Session de printemps 2013	2013-03-04
4908	29	10	6	12	0	0	Session spéciale avril 2013	2013-04-15
4909	97	75	15	25	0	0	Session d'été 2013	2013-06-03
4910	116	70	29	70	63	0	Session d'automne 2013	2013-09-09
4911	88	41	18	24	0	0	Session d'hiver 2013	2013-11-25
4912	95	53	18	41	18	0	Session de printemps 2014	2014-03-03
4913	10	8	2	1	0	0	Session spéciale mai 2014	2014-05-05