(1) Average the scores between the _0, _1, _2, _3 directions to get an average score per image in each HOG configuration.
(2) In each HOG configuration, calculate the Precision and Recall values.
(3) "Bootstrap" or "jackknife" to get an error on the AUC for each HOG configuration; describe in words how you bootstrapped it.
(4) Output should look like:
HOG config | Precision | Recall | AUC | AUCerr
In [1]:
import glob
from functools import reduce

import numpy as np
import pandas as pd
from sklearn.metrics import precision_score, recall_score, roc_auc_score
In [ ]:
In [2]:
def get_data(datadir):
    """
    Read the data files from different subdirectories of datadir corresponding
    to different HOG configurations.

    Inputs
        datadir: top-level directory in which there are subdirectories
                 corresponding to different HOG configurations
    Output
        data: {hogname: list(pd.DataFrame)} where each key corresponds to a
              different subdirectory (HOG configuration) and the value is
              a list of dataframes read from each of the files in that
              subdirectory
    """
    # Each subdirectory name is taken as the HOG configuration name.
    hognames = [s.split('/')[-1] for s in glob.glob(datadir + '/*')]
    # sep=None lets pandas sniff the delimiter; this requires the python engine.
    return {hogname: [pd.read_csv(filename, sep=None, engine='python')
                      for filename in glob.glob('{}/{}/filenames_*.txt'.format(datadir, hogname))]
            for hogname in hognames}
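As a quick sanity check on the expected directory layout (one subdirectory per HOG configuration, each holding filenames_*.txt score tables), the sketch below builds a small synthetic dataset in a temporary directory; the configuration names, filename pattern, and score values are illustrative assumptions, not the real data.
In [ ]:
# Minimal sketch on synthetic data (assumed layout, not the real dataset):
# one subdirectory per HOG configuration, each with four rotations' worth of
# comma-separated 'filename,score,label' tables named filenames_<rot>.txt.
import os
import tempfile

rng = np.random.RandomState(0)
tmpdir = tempfile.mkdtemp()
for hog in ['hog_8x8', 'hog_16x16']:                # hypothetical configuration names
    os.makedirs(os.path.join(tmpdir, hog))
    for rot in range(4):                            # four rotations (_0 ... _3)
        rows = ['filename,score,label']
        for i in range(20):
            band = ['435', '814'][i % 2]            # band is the third '_' token
            label = (i // 2) % 2
            score = 0.6 * label + 0.3 * rng.rand()  # scores loosely separated by label
            rows.append('img_{:02d}_{}_{}.fits,{:.3f},{}'.format(i, band, rot, score, label))
        with open(os.path.join(tmpdir, hog, 'filenames_{}.txt'.format(rot)), 'w') as f:
            f.write('\n'.join(rows) + '\n')

mock_data = get_data(tmpdir)
{k: len(v) for k, v in mock_data.items()}           # -> 4 rotation DataFrames per configuration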
In [3]:
def get_average_scores(dataframes):
    """
    Average the scores from several different rotations.

    Inputs
        dataframes: list(pd.DataFrame['filename', 'score', 'label'])
    Output
        df_out: pd.DataFrame['filename', 'score', 'label'] where 'score'
                is the average over all of the input dataframes and
                'label' is taken arbitrarily from the first input dataframe
    """
    # Give each rotation its own column names so the merge keeps them all.
    dataframes = [df.rename(columns={'score': 'score_{}'.format(idx),
                                     'label': 'label_{}'.format(idx)})
                  for idx, df in enumerate(dataframes)]
    # Inner-join all rotations on the image filename.
    merged_df = reduce(lambda df1, df2: pd.merge(df1, df2, on='filename'), dataframes)
    assert all(df.shape[0] == merged_df.shape[0] for df in dataframes), \
        'Not all keys are the same in the data sets'
    # Average (not just sum) the per-rotation scores so the 0.5 threshold used later is meaningful.
    merged_df['score'] = sum(merged_df['score_{}'.format(idx)]
                             for idx in range(len(dataframes))) / len(dataframes)
    merged_df['label'] = merged_df['label_0']
    return merged_df[['filename', 'score', 'label']]
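A toy illustration of the averaging (made-up values): two rotations of the same two images, listed in different row orders to show that the merge aligns on filename.
In [ ]:
# Toy example with made-up values; row order differs between rotations on purpose.
rot0 = pd.DataFrame({'filename': ['img_a', 'img_b'], 'score': [0.8, 0.2], 'label': [1, 0]})
rot1 = pd.DataFrame({'filename': ['img_b', 'img_a'], 'score': [0.4, 0.6], 'label': [0, 1]})
get_average_scores([rot0, rot1])
# Expected: img_a -> score 0.7, img_b -> score 0.3; labels taken from the first rotation.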
In [4]:
def bootstrap(df, func, num_samples, sample_size_frac=1):
    """
    Return the bootstrap mean and standard deviation of func applied to df.
    It is assumed that applying func to df returns a scalar.

    In each iteration, sample_size_frac*N rows are drawn from df at random
    with replacement, where N is the number of rows in df. This gives a
    DataFrame df_sample of the same type as df, possibly with a different
    number of rows. func is applied to df_sample and the resulting number
    is collected into an array; the process is repeated for num_samples
    iterations. Finally, the mean and standard deviation of that array are
    returned. The standard deviation is an estimate of the error (due to
    finite sample size) on the value you would get by applying func to the
    full DataFrame df.

    Inputs
        df: pd.DataFrame of any type
        func: function that takes in df and returns a scalar
        num_samples: number of bootstrap samples/iterations, see description above
        sample_size_frac: in each bootstrap sample, the number of rows sampled
                          is this fraction of the actual number of rows in df
    Outputs
        mean: mean of the bootstrap values. Should be close to func(df)
              if num_samples is large enough.
        std: standard deviation of the bootstrap values. This is an estimate
             of the error (due to finite sample size) of func(df).
    """
    N = df.shape[0]
    sample_size = int(N * sample_size_frac)
    # Draw row indices with replacement and evaluate func on each resampled DataFrame.
    bootstrap_values = [func(df.iloc[np.random.randint(N, size=sample_size)])
                        for _ in range(num_samples)]
    return np.mean(bootstrap_values), np.std(bootstrap_values)
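As a sanity check on the bootstrap itself, the sketch below estimates the error on the mean of N standard-normal draws, which should come out near 1/sqrt(N); the toy DataFrame and sample counts are arbitrary choices.
In [ ]:
# Sanity check on synthetic data: the bootstrap error on the mean of N ~ Normal(0, 1)
# draws should be roughly 1/sqrt(N).
np.random.seed(0)
toy = pd.DataFrame({'x': np.random.randn(1000)})
boot_mean, boot_std = bootstrap(toy, lambda d: d['x'].mean(), num_samples=500)
boot_mean, boot_std, 1 / np.sqrt(1000)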
In [5]:
def main(datadir, num_boot_samples, bands=None):
    """
    For each HOG configuration, average the scores from different rotations and
    output metrics: precision, recall, AUC, and the bootstrap mean and standard
    deviation of the AUC. Details of the bootstrap analysis are described in
    the bootstrap function.

    Inputs
        datadir: directory name in which there are subdirectories corresponding
                 to different HOG configurations
        num_boot_samples: number of bootstrap samples to create in the bootstrap
                          analysis (see bootstrap function)
        bands: list of bands to analyze separately. If None, don't separate out bands.
    Output
        pd.DataFrame['HOG_config', 'Precision', 'Recall', 'AUC',
                     'AUC_boot_avg', 'AUC_boot_std']
        OR
        pd.DataFrame['HOG_config', 'Band', 'Precision', 'Recall', 'AUC',
                     'AUC_boot_avg', 'AUC_boot_std']
    """
    data = get_data(datadir)
    columns = ['HOG_config',
               'Precision',
               'Recall',
               'AUC',
               'AUC_boot_avg',
               'AUC_boot_std']
    if bands is not None:
        columns = columns[:1] + ['Band'] + columns[1:]
    output = {k: [] for k in columns}
    for hogname, dataframes in data.items():
        scores_all_bands = get_average_scores(dataframes)
        if bands is not None:
            # The band is assumed to be the third underscore-separated token of the filename.
            scores_all_bands['band'] = scores_all_bands['filename'].apply(lambda s: s.split('_')[2])
            # filter filenames further here if needed
        for band in (bands if bands is not None else ['']):
            if bands is not None:
                scores = scores_all_bands[scores_all_bands['band'] == band]
                output['Band'].append(band)
            else:
                scores = scores_all_bands
            output['HOG_config'].append(hogname)
            # Threshold the averaged score at 0.5 to get binary predictions for precision/recall.
            output['Precision'].append(precision_score(scores['label'], scores['score'] > 0.5))
            output['Recall'].append(recall_score(scores['label'], scores['score'] > 0.5))
            output['AUC'].append(roc_auc_score(scores['label'], scores['score']))
            boot_avg, boot_std = bootstrap(scores,
                                           lambda sc: roc_auc_score(sc['label'], sc['score']),
                                           num_boot_samples)
            output['AUC_boot_avg'].append(boot_avg)
            output['AUC_boot_std'].append(boot_std)
    return pd.DataFrame(output)[columns]
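Reusing the synthetic directory built in the get_data sketch above, a quick end-to-end run (small bootstrap count just to keep it fast; this is mock data, not a real result):
In [ ]:
# End-to-end check on the synthetic mock directory created earlier (not real data).
main(tmpdir, num_boot_samples=200)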
In [ ]:
Test on Mock
In [182]:
main('/path/to/data/directory', 10000)
Out[182]:
Test on SLACS
In [6]:
main('/path/to/data/directory', 10000)
Out[6]:
Test on SLACS separating out different bands
In [7]:
main('/path/to/data/directory', 10000, bands=['435', '814'])
Out[7]:
In [ ]: