pyRadiomics feature visualization using multi-dimensional scaling and heatmaps

pyRadiomics features can be used to classify and group images. Three techniques will be demonstrated, multi-dimensional scaling for visual grouping of datasets, heatmaps to show correlations between features, and a clustermap to show hierarchial clustering of features.

For more detailed examples and links to source code, visit http://radiomics.io

In this example, we will download and process data provided by Brigham and Women's Surgical Planning Lab. The data has kindly been made available by the SPL to support reproducible science.

Kaus M.R., Warfield S.K., Nabavi A., Black P.M., Jolesz F.A., Kikinis R. Automated Segmentation of Brain Tumors. SPL 2007 Dec http://www.spl.harvard.edu/publications/item/view/1180

Michael Kaus, Simon K. Warfield, Arya Nabavi, Peter M. Black, Ferenc A. Jolesz, and Ron Kikinis. Automated Segmentation of MRI of Brain Tumors. Radiology. 2001 Feb;218(2):586-91. http://www.spl.harvard.edu/publications/item/view/169

Simon K. Warfield, Michael Kaus, Ferenc A. Jolesz, and Ron Kikinis. Adaptive, Template Moderated, Spatially Varying Statistical Classification. Med Image Anal. 2000 Mar;4(1):43-55. http://www.spl.harvard.edu/publications/item/view/209


In [12]:
# Download the zip file if it does not exist
import os, zipfile
import pandas as pd
import seaborn as sns

from six.moves import urllib

url = "http://www.spl.harvard.edu/publications/bitstream/download/5270"
filename = 'example_data/Tumorbase.zip'
if not os.path.isfile(filename):
    if not os.path.isdir('example_data'):
        os.mkdir('example_data')
    print ("retrieving")
    urllib.request.urlretrieve(url, filename)
else:
    print ("file already downloaded")
    
extracted_path = 'example_data/tumorbase'
if not os.path.isdir(extracted_path):
    print ("unzipping")
    z = zipfile.ZipFile(filename)
    z.extractall('example_data')
    print ("done unzipping")


file already downloaded

In [13]:
# Import some libraries
import SimpleITK as sitk
from radiomics import featureextractor

Extract features

Next we loop over all cases (case_id from 1 to 10), reading each image and label. The label map contains a 6 at each voxel containing tumor. The radiomics package expects 0 and 1, so we need to use the BinaryThreshold filter to extract the right labels.


In [14]:
# Load up the segmentations, 1 to 10 and extract the features
params = os.path.join(os.getcwd(), '..', 'examples', 'exampleSettings', 'Params.yaml')

extractor = featureextractor.RadiomicsFeatureExtractor(params)
# hang on to all our features
features = {}

for case_id in range(1,11):
    path = 'example_data/tumorbase/AutomatedSegmentation/case{}/'.format(case_id)
    image = sitk.ReadImage(path + "grayscale.nrrd")
    mask = sitk.ReadImage(path + "segmented.nrrd")
    # Tumor is in label value 6
    features[case_id] = extractor.execute ( image, mask, label=6 )
    

# A list of the valid features, sorted
feature_names = list(sorted(filter ( lambda k: k.startswith("original_"), features[1] )))

In [15]:
# Make a numpy array of all the values
import numpy as np

samples = np.zeros((10,len(feature_names)))
for case_id in range(1,11):
    a = np.array([])
    for feature_name in feature_names:
        a = np.append(a, features[case_id][feature_name])
    samples[case_id-1,:] = a
    
# May have NaNs
samples = np.nan_to_num(samples)

Multidimensional scaling

Multidimensional scaling or MDS is as way to visualize very high dimensional data in a lower dimensional space. In our case, the feature space is len(feature_names) (or 93) dimensional space. To help us understand the data, we project into 2d space. MDS preserves the relative distance between sample points during the projection, so two samples close together in 2d space would also be close together in the original 93-dimensional space (and vice versa).

We us non-metric algorithm, because our data are highly non uniform in the scale of each feature.


In [16]:
from sklearn import manifold
from sklearn.metrics import euclidean_distances
from sklearn.decomposition import PCA

similarities = euclidean_distances(samples)


seed = np.random.RandomState(seed=3)

mds = manifold.MDS(n_components=2, max_iter=5000, eps=1e-12, random_state=seed,
                   n_init=10,
                   dissimilarity="precomputed", n_jobs=1, metric=False)
pos = mds.fit_transform(similarities)

Plot

Here we use the results. NB: there are two points in the "green-ish" colored circle in the center of the plot.

Looking over our features, it's likely that meningioma and astrocytoma could be distinguisted by a classifier (based on our rather limited data set), but, in at least one case, glioma and astrocytoma features are relatively close together.


In [17]:
# Plot

from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection
import matplotlib.cm as cm


fig = plt.figure(1)
ax = plt.axes([0., 0., 1., 1.])

s = 100

# Type of tumor
meningioma = [0, 1, 2]
glioma = [3,5,9]
astrocytoma = [4, 6, 7, 8]

plt.scatter(pos[meningioma, 0], pos[meningioma, 1], color='navy', alpha=1.0, s=s, lw=1, label='meningioma')
plt.scatter(pos[glioma, 0], pos[glioma, 1], color='turquoise', alpha=1.0, s=s, lw=1, label='glioma')
plt.scatter(pos[astrocytoma, 0], pos[astrocytoma, 1], color='darkorange', alpha=0.5, s=s, lw=1, label='astrocytoma')

plt.legend(scatterpoints=1, loc=5, shadow=False)

similarities = similarities.max() / similarities * 100
similarities[np.isinf(similarities)] = 0
plt.show()


c:\python27\lib\site-packages\ipykernel\__main__.py:24: RuntimeWarning: divide by zero encountered in divide

Plot features as a heatmap

A heat map gives the correlation of features to each other, red indicating positive correlation, and blue negative. A heatmap may be clustered to show groupings of features that are similar.


In [18]:
import pandas as pd
import seaborn as sns

# type of each tumor
types =['meningioma', 'meningioma', 'meningioma', 'glioma', 'astrocytoma', 'glioma', 'astrocytoma', 'astrocytoma', 'astrocytoma', 'glioma']

# Construct a pandas dataframe from the samples
d = pd.DataFrame(data=samples, columns=feature_names, index=types)

corr = d.corr()

# Set up the matplotlib figure, make it big!
f, ax = plt.subplots(figsize=(15, 10))

# Draw the heatmap using seaborn
sns.heatmap(corr, vmax=.8, square=True)


Out[18]:
<matplotlib.axes._subplots.AxesSubplot at 0x236f9ba8>

Cluster the heatmap

Though useful, heatmaps tell a much better story if the features are clustered. Here we take a smaller subset of the features and cluster. In the dendrogram, there are 2 major groups, and many smaller groupings based on features.


In [19]:
# Choose a subset of features for clustering
dd = d.iloc[:,1:50]

# sns.clustermap(d, linewidths=.5, figsize=(13,13))
m = d.as_matrix()

from scipy.cluster.hierarchy import dendrogram, linkage

Z = linkage(m,'ward')

In [20]:
plt.figure(figsize=(25, 10))
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('sample index')
plt.ylabel('distance')
dendrogram(
    Z,
    leaf_rotation=90.,  # rotates the x axis labels
    leaf_font_size=8.,  # font size for the x axis labels
)
plt.show()



In [21]:
d.head()


Out[21]:
original_firstorder_10Percentile original_firstorder_90Percentile original_firstorder_Energy original_firstorder_Entropy original_firstorder_InterquartileRange original_firstorder_Kurtosis original_firstorder_Maximum original_firstorder_Mean original_firstorder_MeanAbsoluteDeviation original_firstorder_Median ... original_shape_Maximum2DDiameterColumn original_shape_Maximum2DDiameterRow original_shape_Maximum2DDiameterSlice original_shape_Maximum3DDiameter original_shape_MeshVolume original_shape_MinorAxisLength original_shape_Sphericity original_shape_SurfaceArea original_shape_SurfaceVolumeRatio original_shape_VoxelVolume
meningioma 140.0 175.0 703288387.0 1.407119 18.0 5.772689 291.0 156.770410 10.982513 155.0 ... 51.702424 45.621104 58.245842 58.735404 37283.367920 38.979460 0.659890 8178.794758 0.219368 37416.357422
meningioma 150.0 204.0 773687995.0 2.111630 20.0 9.286318 458.0 176.417046 18.328115 177.0 ... 53.521644 45.348649 52.908566 53.922470 31761.254883 34.207775 0.617228 7857.842034 0.247403 31880.566406
meningioma 47.0 67.0 78040812.0 0.842528 10.0 5.565444 124.0 57.087513 6.358937 57.0 ... 48.536431 41.708110 46.118902 50.355184 30754.193115 36.263599 0.705009 6733.261085 0.218938 30912.890625
glioma 39.0 57.0 47124084.0 0.974351 10.0 5.640700 104.0 47.435372 5.806231 47.0 ... 45.296290 46.084205 43.912644 46.723254 26829.327393 34.663747 0.658875 6577.901286 0.245176 26916.943359
astrocytoma 89.0 122.0 22154132.0 1.296074 16.0 7.692408 135.0 105.548354 11.158143 108.0 ... 24.030011 23.865590 23.437500 26.681718 2513.397217 19.495527 0.567745 1574.603483 0.626484 2562.890625

5 rows × 100 columns


In [22]:
pp = sns.clustermap(d, col_cluster=False, metric='chebyshev', z_score=1)
_ = plt.setp(pp.ax_heatmap.get_yticklabels(), rotation=0)
plt.show()