Use pandas to generate a key;value file to submit to the leaderboard

First, use a purely gender-based approach: predict that every female passenger survives and every male passenger does not.

Import the required modules


In [52]:
import pandas as pd  # Series and DataFrame are reachable as pd.Series / pd.DataFrame

Define the paths to the input and output CSV files


In [70]:
f = r'/home/hase/Documents/ZHAW/InfoEng/Lectures/Scripting/data/titanic3_test.csv'
fo = r'/home/hase/Documents/ZHAW/InfoEng/Lectures/Scripting/data/submit/titanic3_test_gender.csv'

Create a DataFrame from the CSV file

The delimiter is ';', the index is set to the 'id' column, and only the columns 'id' and 'sex' are read.


In [54]:
df = pd.read_csv(f, sep=';', index_col='id', usecols=['id', 'sex'])

In [55]:
df.head()  # Get the first five rows of the dataframe


Out[55]:
       sex
id
5   female
10    male
15    male
20    male
25  female
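To see what sep, index_col and usecols do without the course data file, the read can be reproduced on an in-memory string. This is a minimal sketch: the ids and sexes are the five rows shown above, while the extra pclass column and its values are made up purely to show that usecols discards any column not listed.

```python
import io

import pandas as pd

# Hypothetical sample in the same ';'-separated layout; pclass values are invented
sample = io.StringIO(
    "id;pclass;sex\n"
    "5;1;female\n"
    "10;3;male\n"
    "15;2;male\n"
    "20;3;male\n"
    "25;1;female\n"
)

# usecols keeps only 'id' and 'sex'; index_col turns 'id' into the row index
df = pd.read_csv(sample, sep=';', index_col='id', usecols=['id', 'sex'])
print(df)
print(df.columns.tolist())  # ['sex']
```

The pclass column never reaches the DataFrame, and 'id' becomes the index rather than a regular column.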

Add a new column named "survived"

First define a function that returns 1 if the passenger is female and 0 otherwise; it will be applied to every row of the DataFrame to fill the new column.


In [56]:
def gender(row):
    if row['sex'] == 'female':
        return 1
    else:
        return 0

Then apply the gender function to each row and store the result in the new column "survived"


In [57]:
df['survived'] = df.apply(gender, axis=1)  # axis=1 passes each row (as a Series) to gender()
# No lambda is needed: df.apply accepts any callable, so gender can be passed directly

In [58]:
df.head()


Out[58]:
       sex  survived
id
5   female         1
10    male         0
15    male         0
20    male         0
25  female         1
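df.apply(..., axis=1) calls a Python function once per row. As an alternative (not the approach used in this notebook), the same column can be computed in one vectorized step: compare the whole sex column with 'female' and cast the resulting booleans to integers. The sketch below rebuilds the five rows shown above and checks that both versions agree.

```python
import pandas as pd

# The five rows displayed above: index 'id', column 'sex'
df = pd.DataFrame({'sex': ['female', 'male', 'male', 'male', 'female']},
                  index=pd.Index([5, 10, 15, 20, 25], name='id'))

# Vectorized: a boolean Series (True for female) converted to 1/0
df['survived'] = (df['sex'] == 'female').astype(int)

# Row-wise version, equivalent to the gender() function, for comparison
row_wise = df.apply(lambda row: 1 if row['sex'] == 'female' else 0, axis=1)
print(row_wise.tolist() == df['survived'].tolist())  # True
```

For a small test file the difference is negligible, but the vectorized form avoids a Python function call per row.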

In [63]:
df.drop('sex', axis=1, inplace=True)  # axis=1: 'sex' is a column label (axis=0 would mean a row label)
# inplace=True modifies df directly, i.e. no need to write df = df.drop(...)

In [64]:
df.head()


Out[64]:
    survived
id
5          1
10         0
15         0
20         0
25         1
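inplace=True is optional. A common alternative is to reassign the returned DataFrame and name the column through the columns= keyword, which makes the intent explicit without relying on axis=1. A minimal sketch on two of the rows above:

```python
import pandas as pd

df = pd.DataFrame({'sex': ['female', 'male'], 'survived': [1, 0]},
                  index=pd.Index([5, 10], name='id'))

# drop() returns a new DataFrame; columns='sex' is equivalent to drop('sex', axis=1)
df = df.drop(columns='sex')
print(df.columns.tolist())  # ['survived']
```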

Rename the index and the column to match the required submission format (key;value)


In [84]:
df.index.name = 'key'

In [85]:
df.index.name


Out[85]:
'key'

In [81]:
df.rename(columns={'survived':'value'}, inplace=True)

In [86]:
df.head()


Out[86]:
     value
key
5        1
10       0
15       0
20       0
25       1
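The two renaming steps can also be chained without inplace: rename_axis sets the index name and rename maps old column labels to new ones. A minimal sketch on the five rows shown above:

```python
import pandas as pd

df = pd.DataFrame({'survived': [1, 0, 0, 0, 1]},
                  index=pd.Index([5, 10, 15, 20, 25], name='id'))

# rename_axis('key') renames the index; rename(columns=...) renames the column
df = df.rename_axis('key').rename(columns={'survived': 'value'})
print(df.index.name, df.columns.tolist())  # key ['value']
```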

The DataFrame now has the required layout, so write it to a new CSV file for the gender-based prediction and submit it


In [87]:
df.to_csv(fo, sep=';')
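Calling to_csv without a path returns the CSV text as a string, which is a quick way to check the submission layout before uploading. The sketch below condenses the notebook into one helper and runs it on the five rows shown earlier; the name gender_submission is hypothetical, and it uses a vectorized comparison in place of df.apply.

```python
import io

import pandas as pd


def gender_submission(csv_source):
    """Hypothetical helper: predict 1 for female, 0 otherwise; return key;value CSV text."""
    df = pd.read_csv(csv_source, sep=';', index_col='id', usecols=['id', 'sex'])
    out = (
        (df['sex'] == 'female')
        .astype(int)
        .rename('value')       # Series name becomes the column header
        .rename_axis('key')    # index name becomes the first header field
        .to_frame()
    )
    return out.to_csv(sep=';')  # no path given, so the CSV text is returned


sample = io.StringIO("id;sex\n5;female\n10;male\n15;male\n20;male\n25;female\n")
print(gender_submission(sample))
```

The printed text starts with the header line key;value followed by one id;prediction pair per line, matching the file written above.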

Result on the leaderboard (https://openwhisk.ng.bluemix.net/api/v1/web/ZHAW%20ISPROT_ISPROT17/default/titanic.html)

test_gender_based_28102017 → 0.7777777777777778

