pyRadiomics features can be used to classify and group images. Three techniques will be demonstrated, multi-dimensional scaling for visual grouping of datasets, heatmaps to show correlations between features, and a clustermap to show hierarchial clustering of features.
For more detailed examples and links to source code, visit http://radiomics.io
In this example, we will download and process data provided by Brigham and Women's Surgical Planning Lab. The data has kindly been made available by the SPL to support reproducible science.
Kaus M.R., Warfield S.K., Nabavi A., Black P.M., Jolesz F.A., Kikinis R. Automated Segmentation of Brain Tumors. SPL 2007 Dec http://www.spl.harvard.edu/publications/item/view/1180
Michael Kaus, Simon K. Warfield, Arya Nabavi, Peter M. Black, Ferenc A. Jolesz, and Ron Kikinis. Automated Segmentation of MRI of Brain Tumors. Radiology. 2001 Feb;218(2):586-91. http://www.spl.harvard.edu/publications/item/view/169
Simon K. Warfield, Michael Kaus, Ferenc A. Jolesz, and Ron Kikinis. Adaptive, Template Moderated, Spatially Varying Statistical Classification. Med Image Anal. 2000 Mar;4(1):43-55. http://www.spl.harvard.edu/publications/item/view/209
In [12]:
# Download the zip file if it does not exist
import os, zipfile
import pandas as pd
import seaborn as sns
from six.moves import urllib
url = "http://www.spl.harvard.edu/publications/bitstream/download/5270"
filename = 'example_data/Tumorbase.zip'
if not os.path.isfile(filename):
if not os.path.isdir('example_data'):
os.mkdir('example_data')
print ("retrieving")
urllib.request.urlretrieve(url, filename)
else:
print ("file already downloaded")
extracted_path = 'example_data/tumorbase'
if not os.path.isdir(extracted_path):
print ("unzipping")
z = zipfile.ZipFile(filename)
z.extractall('example_data')
print ("done unzipping")
In [13]:
# Import some libraries
import SimpleITK as sitk
from radiomics import featureextractor
In [14]:
# Load up the segmentations, 1 to 10 and extract the features
params = os.path.join(os.getcwd(), '..', 'examples', 'exampleSettings', 'Params.yaml')
extractor = featureextractor.RadiomicsFeatureExtractor(params)
# hang on to all our features
features = {}
for case_id in range(1,11):
path = 'example_data/tumorbase/AutomatedSegmentation/case{}/'.format(case_id)
image = sitk.ReadImage(path + "grayscale.nrrd")
mask = sitk.ReadImage(path + "segmented.nrrd")
# Tumor is in label value 6
features[case_id] = extractor.execute ( image, mask, label=6 )
# A list of the valid features, sorted
feature_names = list(sorted(filter ( lambda k: k.startswith("original_"), features[1] )))
In [15]:
# Make a numpy array of all the values
import numpy as np
samples = np.zeros((10,len(feature_names)))
for case_id in range(1,11):
a = np.array([])
for feature_name in feature_names:
a = np.append(a, features[case_id][feature_name])
samples[case_id-1,:] = a
# May have NaNs
samples = np.nan_to_num(samples)
Multidimensional scaling or MDS is as way to visualize very high dimensional data in a lower dimensional space. In our case, the feature space is len(feature_names) (or 93) dimensional space. To help us understand the data, we project into 2d space. MDS preserves the relative distance between sample points during the projection, so two samples close together in 2d space would also be close together in the original 93-dimensional space (and vice versa).
We us non-metric algorithm, because our data are highly non uniform in the scale of each feature.
In [16]:
from sklearn import manifold
from sklearn.metrics import euclidean_distances
from sklearn.decomposition import PCA
similarities = euclidean_distances(samples)
seed = np.random.RandomState(seed=3)
mds = manifold.MDS(n_components=2, max_iter=5000, eps=1e-12, random_state=seed,
n_init=10,
dissimilarity="precomputed", n_jobs=1, metric=False)
pos = mds.fit_transform(similarities)
Here we use the results. NB: there are two points in the "green-ish" colored circle in the center of the plot.
Looking over our features, it's likely that meningioma and astrocytoma could be distinguisted by a classifier (based on our rather limited data set), but, in at least one case, glioma and astrocytoma features are relatively close together.
In [17]:
# Plot
from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection
import matplotlib.cm as cm
fig = plt.figure(1)
ax = plt.axes([0., 0., 1., 1.])
s = 100
# Type of tumor
meningioma = [0, 1, 2]
glioma = [3,5,9]
astrocytoma = [4, 6, 7, 8]
plt.scatter(pos[meningioma, 0], pos[meningioma, 1], color='navy', alpha=1.0, s=s, lw=1, label='meningioma')
plt.scatter(pos[glioma, 0], pos[glioma, 1], color='turquoise', alpha=1.0, s=s, lw=1, label='glioma')
plt.scatter(pos[astrocytoma, 0], pos[astrocytoma, 1], color='darkorange', alpha=0.5, s=s, lw=1, label='astrocytoma')
plt.legend(scatterpoints=1, loc=5, shadow=False)
similarities = similarities.max() / similarities * 100
similarities[np.isinf(similarities)] = 0
plt.show()
In [18]:
import pandas as pd
import seaborn as sns
# type of each tumor
types =['meningioma', 'meningioma', 'meningioma', 'glioma', 'astrocytoma', 'glioma', 'astrocytoma', 'astrocytoma', 'astrocytoma', 'glioma']
# Construct a pandas dataframe from the samples
d = pd.DataFrame(data=samples, columns=feature_names, index=types)
corr = d.corr()
# Set up the matplotlib figure, make it big!
f, ax = plt.subplots(figsize=(15, 10))
# Draw the heatmap using seaborn
sns.heatmap(corr, vmax=.8, square=True)
Out[18]:
Though useful, heatmaps tell a much better story if the features are clustered. Here we take a smaller subset of the features and cluster. In the dendrogram, there are 2 major groups, and many smaller groupings based on features.
In [19]:
# Choose a subset of features for clustering
dd = d.iloc[:,1:50]
# sns.clustermap(d, linewidths=.5, figsize=(13,13))
m = d.as_matrix()
from scipy.cluster.hierarchy import dendrogram, linkage
Z = linkage(m,'ward')
In [20]:
plt.figure(figsize=(25, 10))
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('sample index')
plt.ylabel('distance')
dendrogram(
Z,
leaf_rotation=90., # rotates the x axis labels
leaf_font_size=8., # font size for the x axis labels
)
plt.show()
In [21]:
d.head()
Out[21]:
In [22]:
pp = sns.clustermap(d, col_cluster=False, metric='chebyshev', z_score=1)
_ = plt.setp(pp.ax_heatmap.get_yticklabels(), rotation=0)
plt.show()