In [2]:
%matplotlib inline
import pandas as pd
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import mia

Loading and Preprocessing

Load the Hologic and synthetic (phantom) datasets.


In [3]:
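# DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0 keeps the first column as the index.
hologic = pd.read_csv("/Volumes/Seagate/mmp_data/2015-04-01/hologic.csv", index_col=0)
phantom = pd.read_csv("/Volumes/Seagate/mmp_data/2015-04-16/synthetics1-blobs-upscale.csv", index_col=0)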

Load the metadata for the real and synthetic datasets.


In [4]:
hologic_meta = mia.analysis.create_hologic_meta_data(hologic, "/Volumes/Seagate/mmp_data/meta_data/BIRADS.csv")
phantom_meta = mia.analysis.create_synthetic_meta_data(phantom, "/Volumes/Seagate/mmp_data/meta_data/synthetic_meta_data_cleaned.csv")
phantom_meta.index.name = 'img_name'

Prepare the class labels for both datasets: BI-RADS for the Hologic images and VBD for the phantoms.


In [5]:
hologic_labels = hologic_meta.drop_duplicates().BIRADS
phantom_labels = phantom_meta['VBD.1']

class_labels = pd.concat([hologic_labels, phantom_labels])
class_labels.index.name = "img_name"
labels = mia.analysis.remove_duplicate_index(class_labels)[0]
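
The internals of `mia.analysis.remove_duplicate_index` are not shown in this notebook. As a rough sketch of the same idea in plain pandas, assuming the goal is to keep one label per image name, the example below concatenates two small, made-up label series and keeps the first row for each index value.

```python
import pandas as pd

# Hypothetical labels: "a.png" appears twice in the first series.
real = pd.Series([2, 2, 3], index=["a.png", "a.png", "b.png"], name="label")
synthetic = pd.Series([0.1, 0.4], index=["s1.png", "s2.png"], name="label")

combined = pd.concat([real, synthetic])
combined.index.name = "img_name"

# Keep only the first row for each index value (assumed behaviour, not mia's actual code).
deduplicated = combined[~combined.index.duplicated(keep="first")]
print(deduplicated)
```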

Create blob features from the distribution of blobs in each image.


In [6]:
hologic_blob_features = mia.analysis.features_from_blobs(hologic)
phantom_blob_features = mia.analysis.features_from_blobs(phantom)
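
How `mia.analysis.features_from_blobs` computes its columns is not shown here. The sketch below illustrates one way per-image summary statistics such as `blob_count`, `avg_radius` and the radius quartiles could be derived, assuming a table with one row per detected blob, indexed by image name, with a hypothetical `radius` column. The data is made up.

```python
import pandas as pd

# Hypothetical blob table: one row per detected blob, indexed by image name.
blobs = pd.DataFrame(
    {"radius": [8.0, 8.0, 16.0, 32.0, 8.0, 16.0]},
    index=["a.png"] * 4 + ["b.png"] * 2,
)

# Group the radii by image name and summarise each group.
grouped = blobs.groupby(level=0)["radius"]
blob_features = pd.DataFrame({
    "blob_count": grouped.count(),
    "avg_radius": grouped.mean(),
    "std_radius": grouped.std(),
    "min_radius": grouped.min(),
    "max_radius": grouped.max(),
    "25%": grouped.quantile(0.25),
    "50%": grouped.quantile(0.50),
    "75%": grouped.quantile(0.75),
})
print(blob_features)
```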

Take a random subset of the phantom mammograms so that no single phantom case is over-represented.


In [7]:
syn_feature_meta = mia.analysis.remove_duplicate_index(phantom_meta)
phantom_blob_features['phantom_name'] = syn_feature_meta.phantom_name.tolist()
phantom_blob_features_subset = mia.analysis.create_random_subset(phantom_blob_features, 'phantom_name')
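
To illustrate the subsampling idea, here is a minimal pandas sketch that draws one random row per phantom. Drawing exactly one image per `phantom_name` is an assumption; `mia.analysis.create_random_subset` may sample differently. The data is made up, and `GroupBy.sample` needs pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical feature table with several images per phantom.
features = pd.DataFrame(
    {
        "blob_count": [10, 12, 11, 30, 28],
        "phantom_name": ["p1", "p1", "p1", "p2", "p2"],
    },
    index=["p1-a.png", "p1-b.png", "p1-c.png", "p2-a.png", "p2-b.png"],
)

# Draw one row at random from each phantom so every phantom counts once.
subset = features.groupby("phantom_name").sample(n=1, random_state=0)
print(subset)
```

Fixing `random_state` makes the draw repeatable, which helps when a later cell asserts on the number of rows.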

Combine the features from both datasets.


In [8]:
features = pd.concat([hologic_blob_features, phantom_blob_features_subset])
assert features.shape[0] == 366
features.head()


Out[8]:
                       blob_count  avg_radius  std_radius  min_radius  max_radius  small_radius_count  med_radius_count  large_radius_count    density  upper_dist_count  25%        50%        75%
p214-010-60001-cl.png          56   22.121831   22.923389           8  128.000000                  52                 1                   3  52.940812                21    8  11.313708  22.627417
p214-010-60001-cr.png          78   19.054538   17.506086           8   90.509668                  68                 4                   6  40.749811                22    8  11.313708  22.627417
p214-010-60001-ml.png          98   20.011191   21.876304           8  128.000000                  90                 3                   5  42.644057                27    8  11.313708  22.627417
p214-010-60001-mr.png         139   15.309764   15.307860           8  128.000000                 136                 1                   2  38.287439                40    8  11.313708  16.000000
p214-010-60005-cl.png          97   20.132590   23.255605           8  181.019336                  94                 2                   1  41.456308                27    8  11.313708  22.627417

Drop features that carry little information, such as min_radius (8 in every row shown above), to remove noise.


In [9]:
selected_features = features.drop(['min_radius'], axis=1)
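
The notebook drops `min_radius` by hand. As a hedged sketch of how such columns could be found automatically, the example below flags columns that take a single value and drops them. The small table reuses a few values from the output above; whether `min_radius` is constant across all 366 rows is not confirmed by this notebook.

```python
import pandas as pd

# A few values copied from the feature table shown earlier.
features = pd.DataFrame({
    "blob_count": [56, 78, 98],
    "min_radius": [8, 8, 8],
    "max_radius": [128.000000, 90.509668, 128.000000],
})

# A column with at most one distinct value cannot separate the samples.
constant_columns = [c for c in features.columns if features[c].nunique() <= 1]
selected = features.drop(columns=constant_columns)
print(constant_columns)
```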

t-SNE

Run t-SNE to obtain a two-dimensional representation of the selected features.


In [10]:
kwargs = {
    'learning_rate': 300,
    'perplexity': 40,
    'verbose': 1
}

In [11]:
SNE_mapping_2d = mia.analysis.tSNE(selected_features, n_components=2, **kwargs)


[t-SNE] Computing pairwise distances...
[t-SNE] Computed conditional probabilities for sample 366 / 366
[t-SNE] Mean sigma: 1.097346
[t-SNE] Error after 83 iterations with early exaggeration: 12.547390
[t-SNE] Error after 275 iterations: 0.347549
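
The log format above matches scikit-learn's `TSNE`, so `mia.analysis.tSNE` is presumably a thin wrapper around it. That is an assumption, since the wrapper's source is not shown. The sketch below runs scikit-learn directly with the same `learning_rate` and `perplexity` on random stand-in data, not the blob features.

```python
import numpy as np
from sklearn.manifold import TSNE

# Random stand-in for the feature table (120 samples, 12 features).
rng = np.random.RandomState(0)
X = rng.normal(size=(120, 12))

# Same hyperparameters as the kwargs cell; perplexity must stay below the sample count.
tsne = TSNE(n_components=2, learning_rate=300.0, perplexity=40, random_state=0)
embedding = tsne.fit_transform(X)
print(embedding.shape)
```

t-SNE is stochastic, so fixing `random_state` is needed to reproduce a given layout between runs.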

In [12]:
mia.plotting.plot_mapping_2d(SNE_mapping_2d, hologic_blob_features.index, phantom_blob_features.index, labels)


Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x10f9fab90>

Run t-SNE to obtain a three-dimensional mapping.


In [13]:
SNE_mapping_3d = mia.analysis.tSNE(selected_features, n_components=3, **kwargs)


[t-SNE] Computing pairwise distances...
[t-SNE] Computed conditional probabilities for sample 366 / 366
[t-SNE] Mean sigma: 1.097346
[t-SNE] Error after 83 iterations with early exaggeration: 13.123299
[t-SNE] Error after 310 iterations: 0.551418

In [14]:
mia.plotting.plot_mapping_3d(SNE_mapping_3d, hologic_blob_features.index, phantom_blob_features.index, labels)


Out[14]:
<matplotlib.axes._subplots.Axes3DSubplot at 0x10f932650>
<matplotlib.figure.Figure at 0x10f332910>

Isomap

Run Isomap to obtain a two-dimensional mapping.


In [15]:
iso_kwargs = {
    'n_neighbors': 8,
}

In [16]:
iso_mapping_2d = mia.analysis.isomap(selected_features, n_components=2, **iso_kwargs)
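
For readers without the `mia` package, a comparable call can be sketched with scikit-learn's `Isomap`, assuming `mia.analysis.isomap` forwards `n_neighbors` and `n_components` to it (not confirmed by this notebook). The input here is a synthetic Swiss roll rather than the blob features.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Synthetic 3-D manifold used only as stand-in data.
X, _ = make_swiss_roll(n_samples=200, random_state=0)

# Build an 8-nearest-neighbour graph and embed geodesic distances in 2-D.
iso = Isomap(n_neighbors=8, n_components=2)
embedding = iso.fit_transform(X)
print(embedding.shape)
```

`n_neighbors` controls how local the neighbourhood graph is: too small and the graph can split into disconnected pieces, too large and it shortcuts across the manifold.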

In [17]:
mia.plotting.plot_mapping_2d(iso_mapping_2d, hologic_blob_features.index, phantom_blob_features.index, labels)


Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0x10f400250>

In [18]:
iso_mapping_3d = mia.analysis.isomap(selected_features, n_components=3, **iso_kwargs)

In [19]:
mia.plotting.plot_mapping_3d(iso_mapping_3d, hologic_blob_features.index, phantom_blob_features.index, labels)


Out[19]:
<matplotlib.axes._subplots.Axes3DSubplot at 0x10e929550>
<matplotlib.figure.Figure at 0x10e935a10>

Locally Linear Embedding

Run locally linear embedding (LLE) to obtain a two-dimensional mapping.


In [25]:
lle_kwargs = {
    'n_neighbors': 8,
}

In [26]:
lle_mapping_2d = mia.analysis.lle(selected_features, n_components=2, **lle_kwargs)
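
A standalone sketch of the same step with scikit-learn's `LocallyLinearEmbedding` follows. It assumes `mia.analysis.lle` wraps that class with its default (standard) method, which the notebook does not confirm, and it uses a synthetic Swiss roll as stand-in data.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Synthetic 3-D manifold used only as stand-in data.
X, _ = make_swiss_roll(n_samples=200, random_state=0)

# Reconstruct each point from its 8 nearest neighbours, then embed in 2-D.
lle = LocallyLinearEmbedding(n_neighbors=8, n_components=2, random_state=0)
embedding = lle.fit_transform(X)
print(embedding.shape)
```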

In [27]:
mia.plotting.plot_mapping_2d(lle_mapping_2d, hologic_blob_features.index, phantom_blob_features.index, labels)


Out[27]:
<matplotlib.axes._subplots.AxesSubplot at 0x110f61e90>

In [21]:
lle_mapping_3d = mia.analysis.lle(selected_features, n_components=3, **lle_kwargs)

In [29]:
%matplotlib qt
mia.plotting.plot_mapping_3d(lle_mapping_3d, hologic_blob_features.index, phantom_blob_features.index, labels)


Out[29]:
<matplotlib.axes._subplots.Axes3DSubplot at 0x119431810>