In [1]:
import pandas as pd
import numpy as np
import glob
import os
# The DataFrame has many columns, so we raise the display limit
# in order to see more of them.
pd.options.display.max_columns = 100
We import the Voting data. It is heavy, but we need it to know which deputies require additional data.
In [2]:
dataset_tmp = []
path = '../datas/scrap/Voting'
allFiles = glob.glob(os.path.join(path, 'Session*.csv'))
for file_ in allFiles:
    data_tmp = pd.read_csv(file_,index_col=0)
    dataset_tmp += [data_tmp]
voting_df = pd.concat(dataset_tmp)
We create an array called names which contains all the unique name entries in the Voting dataframe.
In [3]:
voting_df['Name'] = voting_df['FirstName']+' '+voting_df['LastName']
names = voting_df['Name'].unique()
We now load the MemberCouncil dataframe, from which we'll extract the information we need on each deputy.
In [4]:
dataset_tmp = []
path = '../datas/scrap/MemberCouncil'
allFiles = glob.glob(os.path.join(path, 'MemberCouncilid*.csv'))
for file_ in allFiles:
    data_tmp = pd.read_csv(file_,index_col=0)
    dataset_tmp += [data_tmp]
member_df = pd.concat(dataset_tmp)
member_df['Name'] = member_df['FirstName']+' '+member_df['LastName']
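The Voting and MemberCouncil cells follow the same load-and-concatenate pattern. A minimal sketch of a shared helper is given here; `load_csv_folder` is a hypothetical name that does not appear in the original notebook, and the explicit error on an empty match is an added assumption about what is desirable.

```python
import glob
import os

import pandas as pd


def load_csv_folder(path, pattern):
    """Read every CSV in `path` matching `pattern` and stack them into one DataFrame."""
    # Sort the matches so that the row order is reproducible between runs
    files = sorted(glob.glob(os.path.join(path, pattern)))
    if not files:
        # pd.concat raises a ValueError on an empty list; fail with a clearer message
        raise FileNotFoundError(f'No file matching {pattern!r} in {path!r}')
    frames = [pd.read_csv(file_, index_col=0) for file_ in files]
    return pd.concat(frames)
```

With this helper, the two cells would reduce to calls such as `load_csv_folder('../datas/scrap/Voting', 'Session*.csv')` and `load_csv_folder('../datas/scrap/MemberCouncil', 'MemberCouncilid*.csv')`.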
We see below that quite a lot of information is available. We'll consider only a few fields, but it will be easy to add others when needed.
In [5]:
member_df.head()
Out[5]:
In [6]:
member_df.columns
Out[6]:
In [7]:
member_df = member_df.loc[member_df['Name'].isin(names)]
# Drop the columns we do not need; `axis` must be passed by keyword in recent pandas versions
member_df_final = member_df.drop(['BirthPlace_Canton','Canton','Council','FirstName','LastName','IdPredecessor','Language',
                                  'MaritalStatus','MilitaryRank','Modified','OfficialName','ParlGroupFunction',
                                  'ParlGroupNumber', 'Party'], axis=1)
member_df_final.loc[:,'DateOfBirth'] = member_df_final['DateOfBirth'].apply(pd.to_datetime).apply(lambda x: x.date())
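Because `member_df` is filtered with `isin(names)`, a voter whose name has no counterpart in the MemberCouncil data simply ends up without a row. A minimal sketch for listing such names follows; `unmatched_names` is a hypothetical helper, and the sample names are made up for illustration.

```python
def unmatched_names(voting_names, member_names):
    """Return the sorted names that appear in the votes but not in the member data."""
    # Ignore non-string entries (e.g. NaN produced by a missing first or last name)
    voting = {name for name in voting_names if isinstance(name, str)}
    members = {name for name in member_names if isinstance(name, str)}
    return sorted(voting - members)


# Toy data, not taken from the real files
print(unmatched_names(['Anna Muster', 'Hans Beispiel'], ['Anna Muster']))
# prints ['Hans Beispiel']
```

In the notebook, it could be called as `unmatched_names(names, member_df['Name'])` to check how many deputies will be missing from the export.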
We now match the party names to the ones in our database, to avoid clashes in the data as far as possible.
In [8]:
member_df_final.loc[:,'Active'] = member_df_final['Active'].astype(str)
member_df_final = member_df_final.fillna('Not specified')
member_df_final.head()
Out[8]:
The list of parties below shows all the parties we consider in a vote. This is more extensive than what we have in the voting fields.
In [9]:
member_df_final.PartyName.unique()
Out[9]:
We export the information on each deputy to its own JSON file. A little trick is needed to turn each row into the dict we want, which is then formatted properly for export to a .json file.
In [10]:
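import json
directory = '../datas/analysis/deputee_names/'
# Create the output directory if it does not exist yet
os.makedirs(directory, exist_ok=True)
for deputee in names:
    records = member_df_final.loc[member_df_final.Name==deputee].to_dict(orient='records')
    # Skip voters that have no entry in the MemberCouncil data
    if not records:
        continue
    deputee_info = records[0]
    # Dates are not JSON serializable; missing ones were already replaced by 'Not specified'
    if hasattr(deputee_info['DateOfBirth'], 'isoformat'):
        deputee_info['DateOfBirth'] = deputee_info['DateOfBirth'].isoformat()
    with open(os.path.join(directory, deputee+'_info.json'), 'w') as f:
        json.dump(deputee_info, f)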
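The date conversion in the export above can also be delegated to `json` itself through its `default` argument, which is called for any object the encoder cannot serialize natively. This is a minimal sketch of that alternative, not the method used in the notebook; `json_default` is a hypothetical name and the record is made up.

```python
import datetime
import json


def json_default(value):
    """Fallback called by json for objects it cannot serialize natively."""
    # datetime.datetime is a subclass of datetime.date, so both are covered
    if isinstance(value, datetime.date):
        return value.isoformat()
    raise TypeError(f'Object of type {type(value).__name__} is not JSON serializable')


# Made-up record, shaped like one row of the member data
record = {'Name': 'Anna Muster', 'DateOfBirth': datetime.date(1970, 1, 31)}
print(json.dumps(record, default=json_default))
# prints {"Name": "Anna Muster", "DateOfBirth": "1970-01-31"}
```

The advantage of this variant is that any other date column added later is handled without touching the export loop.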