Split Vote and Voting

Splits the Voting DataFrame by each single vote that was taken in .csv files. Each file will have two different IDs together: the one from the law that is voted, and the other from the iteration of the law.

1. Loading the data we will need: vote and voting fields


In [14]:
import pandas as pd
import glob
import os
import numpy as np
import matplotlib.pyplot as plt
from helpers import load_vote_df, load_voting,format_voting_session, split_df_dict
%matplotlib inline
%load_ext autoreload
%autoreload 2

# There's a lot of columns in the DF. 
# Therefore, we add this option so that we can see more columns
pd.options.display.max_columns = 100


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

In [2]:
voting_df = format_voting_session(load_voting())


Entries in the DataFrame (1713854, 26)

2. Map name of the law to an ID

Allows to load a light file, which will serve as link between the initial visualisation and the law specific.

Aside: Removing the not relevant laws from the Vote file

We see here a disparity, as in our Voting file, the smallest IdVote is 6392, which removes a third of the laws we actually have. This is due to the scraping not retrieving everything, but this is not a bad thing, as it removes the oldest entries, which are not the most relevant ones.


In [3]:
voting_df.IdVote.sort_values().unique()


Out[3]:
array([ 6392,  6393,  6394, ..., 17979, 17981, 17983])

In [15]:
vote_df = load_vote_df()
vote_df.head()


Entries in the DataFrame (11182, 16)
Out[15]:
VoteEnd BillTitle BusinessTitle Subject MeaningNo MeaningYes BusinessShortNumber ID IdLegislativePeriod IdSession
0 2003-12-01 Loi fédérale sur l'octroi d'aides financières ... Fondation Bibliomedia. Aide financière 2004-2007 Gesamtabstimmung NaN NaN 3.054 1 47 4701
1 2003-12-01 Arrêté fédéral concernant le plafond de dépens... Fondation Bibliomedia. Aide financière 2004-2007 Gesamtabstimmung NaN NaN 3.054 2 47 4701
2 2003-12-01 Loi fédérale sur l'octroi d'aides financières ... Musée suisse des transports. Aide financière 2... Vote sur l'ensemble NaN NaN 3.055 4 47 4701
3 2003-12-01 Arrêté fédéral concernant le plafond de dépens... Musée suisse des transports. Aide financière 2... Vote sur l'ensemble NaN NaN 3.055 5 47 4701
4 2003-12-01 Arrêté fédéral relatif au mandat de prestation... EPF. Mandat de prestations pour les années 200... Vote sur l'ensemble NaN NaN 3.045 6 47 4701

Indeed, we see here that here, the ID, which corresponds to the IdVote on the Voting file, starts from 1, but the subject is about the oldest votes. We will now generate an "epurated" version of the Vote dataframe, with only the votes which have a counterpart in the Voting file.

Moreover, we know that there are some empty BillTitle entries, so we want to put the BusinessTitle as BillTitle to be able to handle the subject generally.


In [16]:
vote_df = vote_df.loc[vote_df.ID>=6392]

# Setting the entries with a null BillTitle to have BusinessTitle as their entry
vote_df.loc[vote_df.BillTitle.isnull(),'BillTitle'] =  vote_df.loc[vote_df.BillTitle.isnull(),'BusinessTitle']
    
directory = '../../datas/treated_data/Vote/'
if not os.path.exists(directory):
    os.makedirs(directory)
    
vote_df.to_csv(directory+'legiid_47-50.csv')

End of the aside


In [17]:
vote_df = format_voting_session(vote_df)
#Filling the NaN with some text so the javascript does not crash later on.
vote_df = vote_df.fillna('Not specified')
vote_df.head()


Out[17]:
VoteEnd BillTitle BusinessTitle Subject MeaningNo MeaningYes BusinessShortNumber ID IdLegislativePeriod IdSession SessionName Date
2609 2009-09-07 Arrêté fédéral concernant la contribution de l... Réduction des disparités économiques et social... Entrer en matière Proposition de la minorité Stamm (ne pas entre... Proposition de la majorité (entrer en matière) 9.055 6392 48 4811 Session d'automne 2009 2009-09-07
2610 2009-09-07 Arrêté fédéral concernant la contribution de l... Réduction des disparités économiques et social... Art. 1 (frein aux dépenses) Not specified Not specified 9.055 6393 48 4811 Session d'automne 2009 2009-09-07
2611 2009-09-07 Arrêté fédéral concernant la contribution de l... Réduction des disparités économiques et social... Art. 2bis Proposition de la minorité Stamm Proposition de la majorité 9.055 6394 48 4811 Session d'automne 2009 2009-09-07
2612 2009-09-07 Arrêté fédéral concernant la contribution de l... Réduction des disparités économiques et social... Art. 2ter Proposition de la minorité Schlüer Proposition de la majorité 9.055 6395 48 4811 Session d'automne 2009 2009-09-07
2613 2009-09-07 Arrêté fédéral concernant la contribution de l... Réduction des disparités économiques et social... Art. 2quater Proposition de la minorité Fehr Hans-Jürg Proposition de la majorité 9.055 6396 48 4811 Session d'automne 2009 2009-09-07

First of all, we associate a unique ID to each BillTitle. To do so, we take as ID the one that a given Bill has at the last time it appears in our DataFrame.


In [18]:
def map_BillTitle_Vote(vote_df):
    df_link = vote_df[['BillTitle','ID']]
    df_link.columns=['BillTitle','ID_Bill']
    df_link = df_link.drop_duplicates(['BillTitle'], keep = 'last')
    df_link.set_index('BillTitle',inplace=True)
    vote_df = vote_df.join(df_link, on='BillTitle')
    
    return df_link, vote_df

df_link,vote_df = map_BillTitle_Vote(vote_df)

directory = '../../datas/analysis/'
if not os.path.exists(directory):
    os.makedirs(directory)
    
df_link.to_csv(directory+'map_bill_ID.csv')

We must first have a table in which we can link the BillTitle to the corresponding file, which will contain all the informations regarding a certain BillTitle.


In [19]:
bills_dict = split_df_dict(vote_df, 'ID_Bill')


Number of unique entries in ID_Bill : 3415

In [20]:
directory = '../../datas/analysis/bill_link/'
if not os.path.exists(directory):
    os.makedirs(directory)

for ID_Bill, df in bills_dict.items():
    df.to_csv(directory+'bill_'+str(ID_Bill)+'.csv')

3. Map the ID of a given iteration of a law to the votes on the subject

The last task that we have to do is linking the ID of a given iteration of the law (in the file in the bill_link folder) to the votes that happened on the subject. We simply need to group each voting by its IdVote and export it to a single file.


In [21]:
voting_df.head()


Out[21]:
Name BillTitle Decision BusinessShortNumber ParlGroupName ParlGroupCode IdVote IdSession VoteEnd SessionName Date
0 Adèle Thorens Goumaz Loi fédérale concernant la redevance pour l’ut... 2 12.018 Groupe écologiste G 12188 4907 2013-03-04 Session de printemps 2013 2013-03-04
1 Ada Marra Loi fédérale concernant la redevance pour l’ut... 2 12.018 Groupe socialiste S 12188 4907 2013-03-04 Session de printemps 2013 2013-03-04
2 Tiana Angelina Moser Loi fédérale concernant la redevance pour l’ut... 2 12.018 Groupe vert'libéral GL 12188 4907 2013-03-04 Session de printemps 2013 2013-03-04
3 Natalie Rickli Loi fédérale concernant la redevance pour l’ut... 1 12.018 Groupe des Paysans, Artisans et Bourgeois V 12188 4907 2013-03-04 Session de printemps 2013 2013-03-04
4 Barbara Schmid-Federer Loi fédérale concernant la redevance pour l’ut... 3 12.018 Groupe conservateur-catholique CE 12188 4907 2013-03-04 Session de printemps 2013 2013-03-04

Splitting the voting DataFrame by the Idvote field in order to get many subdataframes.


In [22]:
voting_dict = split_df_dict(voting_df, 'IdVote')


Number of unique entries in IdVote : 8573

Before exporting it, we remove a lot of redundant columns. Indeed, as we will access this file through the .csv in the bill_link folder, we will already have access to all the information about the vote itself. We just need to have what each member voted.


In [24]:
directory = '../../datas/analysis/bill_voting/'
if not os.path.exists(directory):
    os.makedirs(directory)

for IdVote, df in voting_dict.items():
    df = df.drop(['BillTitle','BusinessShortNumber', 'IdSession', 'VoteEnd', 'SessionName', 'Date'],axis=1)
    df.to_csv(directory+'voting_'+str(IdVote)+'.csv')