Use pandas to generate a key;value file to submit to the leaderboard

First, use a purely gender-based approach: predict that every female passenger survives and every male passenger does not

Import the modules needed



In [52]:

    
from pandas import Series, DataFrame
import pandas as pd

Define the paths to the input and output CSV files



In [70]:

    
f = r'/home/hase/Documents/ZHAW/InfoEng/Lectures/Scripting/data/titanic3_test.csv'
fo = r'/home/hase/Documents/ZHAW/InfoEng/Lectures/Scripting/data/submit/titanic3_test_gender.csv'

Create a DataFrame from the CSV file

The delimiter is ';'. Set the index to 'id' and load only the columns 'id' and 'sex'.



In [54]:

    
df = pd.read_csv(f, sep=';', index_col='id', usecols=['id', 'sex'])
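
To try these `read_csv` arguments without the course file, the sketch below reads a small in-memory sample instead. The sample rows and the extra `age` column are made up for illustration; only the `id` and `sex` column names and the ';' delimiter come from the notebook.

```python
import io

import pandas as pd

# Hypothetical sample data; the real titanic3_test.csv has more rows and columns.
sample_csv = "id;sex;age\n1;female;29\n2;male;40\n3;female;2\n"

sample_df = pd.read_csv(
    io.StringIO(sample_csv),  # any file-like object works in place of a path
    sep=';',                  # fields are separated by semicolons
    index_col='id',           # use the 'id' column as the row index
    usecols=['id', 'sex'],    # load only these two columns, so 'age' is skipped
)

print(sample_df)
```

Because 'id' becomes the index, the resulting frame has a single data column, 'sex'.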



In [55]:

    
df.head()  # Get the first five rows of the dataframe

Add a new column named "survived"

Define a function that holds the logic for the new column of the DataFrame



In [56]:

    
def gender(row):
    if row['sex'] == 'female':
        return 1
    else:
        return 0

Then apply the function to each row to fill the new column "survived"



In [57]:

    
df['survived'] = df.apply(gender, axis=1)  # axis=1 applies the function to each row
# No lambda is needed: df.apply accepts the function object directly
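
`df.apply` accepts any callable, so a named function can be passed to it directly. For a rule as simple as this one, a vectorized comparison on the column should give the same result without a row-by-row call. A minimal sketch on made-up data (the `demo` frame and the names `by_apply` and `by_vector` are illustrative, not part of the notebook):

```python
import pandas as pd

# Hypothetical sample frame with the same 'sex' column as the notebook's df.
demo = pd.DataFrame(
    {'sex': ['female', 'male', 'female']},
    index=pd.Index([1, 2, 3], name='id'),
)


def gender(row):
    # Same rule as in the notebook: females are predicted to survive.
    return 1 if row['sex'] == 'female' else 0


# Row-wise: pass the function itself, no lambda wrapper required.
by_apply = demo.apply(gender, axis=1)

# Vectorized: compare the whole column at once, then cast True/False to 1/0.
by_vector = (demo['sex'] == 'female').astype(int)

# Check that both approaches agree on every row.
print((by_apply == by_vector).all())
```

The vectorized form is usually the more idiomatic choice in pandas, while `apply` stays useful when the per-row logic is harder to express as column operations.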



In [58]:

    
df.head()



In [63]:

    
df.drop('sex', axis=1, inplace=True)   # axis=1 drops a column rather than a row; inplace=True modifies df directly
# i.e. no need to write df = df.drop(...)



In [64]:

    
df.head()

Rename the index and the column to comply with the submission format



In [84]:

    
df.index.name = 'key'



In [85]:

    
df.index.name



Out[85]:

'key'



In [81]:

    
df.rename(columns={'survived':'value'}, inplace=True)
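
As an alternative to the two in-place steps, the index and column can also be renamed in one chained expression that returns a new DataFrame. This is only a sketch on made-up data; the `demo` and `renamed` names are illustrative.

```python
import pandas as pd

# Hypothetical frame shaped like df after the 'sex' column was dropped.
demo = pd.DataFrame({'survived': [1, 0]}, index=pd.Index([1, 2], name='id'))

# rename_axis sets the index name; rename maps old column names to new ones.
# Both return a new object, so demo itself is left unchanged.
renamed = demo.rename_axis('key').rename(columns={'survived': 'value'})

print(renamed)
```

Whether to chain or to modify in place is largely a matter of taste; chaining keeps the original frame available for inspection.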



In [86]:

    
df.head()

With the index and column renamed, write the gender-based predictions to a new CSV file and submit it



In [87]:

    
df.to_csv(fo, sep=';')
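
To check the key;value layout before submitting, the same `to_csv` call can write to an in-memory buffer instead of a file. The sketch below uses a made-up two-row frame; `demo` and `buffer` are illustrative names.

```python
import io

import pandas as pd

# Hypothetical frame in the final submission shape: index 'key', column 'value'.
demo = pd.DataFrame({'value': [1, 0]}, index=pd.Index([1, 2], name='key'))

buffer = io.StringIO()
demo.to_csv(buffer, sep=';')  # the index is written as the first field by default

# Split into lines so the header and rows are easy to inspect.
lines = buffer.getvalue().splitlines()
print(lines)
```

The first line should be the header `key;value`, followed by one `key;value` pair per passenger.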

Result on the leaderboard (https://openwhisk.ng.bluemix.net/api/v1/web/ZHAW%20ISPROT_ISPROT17/default/titanic.html)

test_gender_based_28102017 → 0.7777777777777778


