Example analysis

This notebook provides a generic example for some analysis that you might want to conduct with the data provided through this AFQ-Browser instance. Note that this is just an example, and not may be a good approach to the data in this particular instance of the AFQ-Browser, and this data-set. Ultimately, the limits of the analysis you could do are the limits of your imagination.



In [1]:

    
%matplotlib inline



In [2]:

    
import pandas as pd



In [3]:

    
subjects = pd.read_csv('./data/subjects.csv')



In [4]:

    
nodes = pd.read_csv('./data/nodes.csv')

Merging nodes and subjects

The data from nodes (referring to diffusion statistics along the length of the tracts) can be merged together with the data about subjects into one table:



In [5]:

    
merged = pd.merge(nodes, subjects, on="subjectID")



In [6]:

    
merged.head()









    Out[6]:







  
    
      
      subjectID
      tractID
      nodeID
      rd
      md
      cl
      torsion
      curvature
      fa
      ad
      volume
      Unnamed: 0
      patient
      score
      session
    
  
  
    
      0
      patient_01
      Left Thalamic Radiation
      0
      0.613340
      0.851832
      0.257362
      -0.065856
      0.137844
      0.452992
      1.328816
      70.0
      0
      1
      0.194764
      1
    
    
      1
      patient_01
      Left Thalamic Radiation
      1
      0.611204
      0.848955
      0.256652
      -0.087893
      0.153220
      0.453549
      1.324456
      60.0
      0
      1
      0.194764
      1
    
    
      2
      patient_01
      Left Thalamic Radiation
      2
      0.608178
      0.843154
      0.254787
      -0.244466
      0.092622
      0.452349
      1.313105
      76.0
      0
      1
      0.194764
      1
    
    
      3
      patient_01
      Left Thalamic Radiation
      3
      0.604815
      0.837168
      0.253062
      -0.533194
      0.027443
      0.451471
      1.301873
      55.0
      0
      1
      0.194764
      1
    
    
      4
      patient_01
      Left Thalamic Radiation
      4
      0.601006
      0.831444
      0.251759
      0.272117
      0.047876
      0.451726
      1.292319
      71.0
      0
      1
      0.194764
      1

Visualizing the data

You can use Matplotlib and Seaborn to visualize the data:



In [7]:

    
import matplotlib.pyplot as plt
import seaborn as sns

We focus on the calculated diffusion statistics that are included in the nodes table:



In [8]:

    
stats = nodes.columns.drop(["subjectID", "tractID", "nodeID"])

And specifically on the very first one



In [9]:

    
print(stats[0])

rd



In [10]:

    
stat = merged[["nodeID", "subjectID", "tractID", stats[0]]]

Select a single tract:



In [11]:

    
tract_stat = stat[stat["tractID"] == stat["tractID"].values[0]]



In [12]:

    
tract_stat.head()









    Out[12]:







  
    
      
      nodeID
      subjectID
      tractID
      rd
    
  
  
    
      0
      0
      patient_01
      Left Thalamic Radiation
      0.613340
    
    
      1
      1
      patient_01
      Left Thalamic Radiation
      0.611204
    
    
      2
      2
      patient_01
      Left Thalamic Radiation
      0.608178
    
    
      3
      3
      patient_01
      Left Thalamic Radiation
      0.604815
    
    
      4
      4
      patient_01
      Left Thalamic Radiation
      0.601006



In [13]:

    
tract_p = tract_stat.pivot(index='nodeID', columns='subjectID', values=stats[0])



In [14]:

    
import numpy as np



In [15]:

    
sns.tsplot(tract_p.values.T, err_style="unit_traces", estimator=np.nanmean)









    



/Users/arokem/anaconda3/lib/python3.5/site-packages/seaborn/timeseries.py:183: UserWarning: The tsplot function is deprecated and will be removed or replaced (in a substantially altered version) in a future release.
  warnings.warn(msg, UserWarning)
/Users/arokem/anaconda3/lib/python3.5/site-packages/seaborn/algorithms.py:76: RuntimeWarning: Mean of empty slice
  boot_dist.append(func(*sample, **func_kwargs))






    Out[15]:





<matplotlib.axes._subplots.AxesSubplot at 0x1a1692f2b0>

Analyzing data

As an example of one approach to AFQ data, we include here an example of how you might use Scikit Learn's implementation of the K-means algorithm to cluster the subjects in these data into two clusters, based on this statistic/tract combination.



In [16]:

    
from sklearn.cluster import KMeans 
from sklearn.preprocessing import Imputer
from sklearn.pipeline import Pipeline

We create a pipeline that imputes nan values (that sometimes occur in tract profiles), and clusters the results into two clusters:



In [17]:

    
estimator = Pipeline([("impute", Imputer()), ("cluster", KMeans(n_clusters=2))])

We compute the clusters and transform the data into cluster distance space



In [18]:

    
clusters = estimator.fit(tract_p.values.T).steps[1][1]



In [19]:

    
labels = clusters.labels_



In [20]:

    
x, y = estimator.fit_transform(tract_p.values.T).T

We plot the results in the latent cluster space



In [21]:

    
plt.scatter(x, y, c=labels.astype(np.float))









    Out[21]:





<matplotlib.collections.PathCollection at 0x1a173820f0>



In [ ]:

	subjectID	tractID	nodeID	rd	md	cl	torsion	curvature	fa	ad	volume	patient	score	session
0	patient_01	Left Thalamic Radiation	0	0.613340	0.851832	0.257362	-0.065856	0.137844	0.452992	1.328816	70.0	1	0.194764	1
1	patient_01	Left Thalamic Radiation	1	0.611204	0.848955	0.256652	-0.087893	0.153220	0.453549	1.324456	60.0	1	0.194764	1
2	patient_01	Left Thalamic Radiation	2	0.608178	0.843154	0.254787	-0.244466	0.092622	0.452349	1.313105	76.0	1	0.194764	1
3	patient_01	Left Thalamic Radiation	3	0.604815	0.837168	0.253062	-0.533194	0.027443	0.451471	1.301873	55.0	1	0.194764	1
4	patient_01	Left Thalamic Radiation	4	0.601006	0.831444	0.251759	0.272117	0.047876	0.451726	1.292319	71.0	1	0.194764	1