This notebook provides a generic example of an analysis that you might want to conduct with the data provided through this AFQ-Browser instance. Note that this is just an example, and it may not be a good approach for the data in this particular instance of AFQ-Browser. Ultimately, the limits of the analysis you could do are the limits of your imagination.
In [1]:
%matplotlib inline
In [2]:
import pandas as pd
In [3]:
subjects = pd.read_csv('./data/subjects.csv')
In [4]:
nodes = pd.read_csv('./data/nodes.csv')
In [5]:
merged = pd.merge(nodes, subjects, on="subjectID")
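By default, `pd.merge` performs an inner join, so a subject that appears in only one of the two tables is dropped without warning. The following is a minimal sketch of one way to check for that, using made-up data: the `age` and `fa` columns and all values are hypothetical and are not taken from this dataset.

```python
import pandas as pd

# Hypothetical stand-ins for subjects.csv and nodes.csv
toy_subjects = pd.DataFrame({"subjectID": ["s1", "s2"], "age": [25, 31]})
toy_nodes = pd.DataFrame({
    "subjectID": ["s1", "s1", "s2", "s3"],
    "tractID": ["t1"] * 4,
    "nodeID": [0, 1, 0, 0],
    "fa": [0.40, 0.50, 0.45, 0.60],
})

# validate raises if subjectID is not unique in the right-hand table;
# indicator adds a _merge column recording where each row came from
toy_merged = pd.merge(toy_nodes, toy_subjects, on="subjectID",
                      how="left", validate="many_to_one", indicator=True)

# Node rows that have no matching subject metadata
unmatched = toy_merged[toy_merged["_merge"] == "left_only"]
print(unmatched["subjectID"].unique())
```

A left join keeps every node row, so unmatched subjects show up as rows with missing metadata rather than disappearing.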
In [6]:
merged.head()
Out[6]:
You can use Matplotlib and Seaborn to visualize the data:
In [7]:
import matplotlib.pyplot as plt
import seaborn as sns
We focus on the calculated diffusion statistics that are included in the nodes table:
In [8]:
stats = nodes.columns.drop(["subjectID", "tractID", "nodeID"])
And specifically on the very first one:
In [9]:
print(stats[0])
In [10]:
stat = merged[["nodeID", "subjectID", "tractID", stats[0]]]
Select a single tract:
In [11]:
tract_stat = stat[stat["tractID"] == stat["tractID"].values[0]]
In [12]:
tract_stat.head()
Out[12]:
In [13]:
tract_p = tract_stat.pivot(index='nodeID', columns='subjectID', values=stats[0])
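The pivot reshapes the long table into a wide one, with one row per node and one column per subject. A small sketch with hypothetical values (the `fa` column name and the numbers are assumptions for illustration) shows the resulting shape, and how a node that is missing for one subject becomes a NaN:

```python
import numpy as np
import pandas as pd

# Long format: one row per (subject, node) pair; s2 has no node 2
toy_long = pd.DataFrame({
    "nodeID": [0, 1, 2, 0, 1],
    "subjectID": ["s1", "s1", "s1", "s2", "s2"],
    "fa": [0.40, 0.50, 0.45, 0.42, 0.52],
})

# Wide format: rows are nodes, columns are subjects
toy_wide = toy_long.pivot(index="nodeID", columns="subjectID", values="fa")

# Transposing gives one row per subject, the layout scikit-learn expects
toy_profiles = toy_wide.values.T
print(toy_wide.shape, toy_profiles.shape)
```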
In [14]:
import numpy as np
In [15]:
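# sns.tsplot was removed in seaborn 0.9; draw the per-subject traces and their mean directly
plt.plot(tract_p.values, color="C0", alpha=0.2)
plt.plot(np.nanmean(tract_p.values, axis=1), color="C0", linewidth=2)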
Out[15]:
As one possible approach to AFQ data, we show how you might use scikit-learn's implementation of the K-means algorithm to group the subjects in these data into two clusters, based on this statistic/tract combination.
In [16]:
from sklearn.cluster import KMeans
# Imputer was removed from sklearn.preprocessing in scikit-learn 0.22
from sklearn.impute import SimpleImputer as Imputer
from sklearn.pipeline import Pipeline
We create a pipeline that imputes NaN values (which sometimes occur in tract profiles) and groups the results into two clusters:
In [17]:
estimator = Pipeline([("impute", Imputer()), ("cluster", KMeans(n_clusters=2))])
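To illustrate what such a pipeline does, here is a self-contained sketch on synthetic profiles. The data, the group means, and the variable names are all made up for illustration. It also assumes a recent scikit-learn, where the imputer class is `SimpleImputer` in `sklearn.impute`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)

# Two hypothetical groups of 10 "subjects", each with a 20-node profile
profiles = np.vstack([rng.normal(0.4, 0.01, (10, 20)),
                      rng.normal(0.6, 0.01, (10, 20))])

# Knock out a couple of nodes to mimic missing values in tract profiles
profiles[0, 3] = np.nan
profiles[15, 7] = np.nan

toy_estimator = Pipeline([
    # Replace each NaN with the mean of that node across subjects
    ("impute", SimpleImputer(strategy="mean")),
    # Fix the seed so that the clustering is reproducible
    ("cluster", KMeans(n_clusters=2, n_init=10, random_state=0)),
])

# One cluster label per subject
toy_labels = toy_estimator.fit_predict(profiles)
print(toy_labels)
```

Setting `random_state` is a design choice for this sketch: K-means starts from random initial centers, so without a fixed seed the cluster numbering can change between runs.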
We compute the clusters and transform the data into cluster-distance space, in which each subject is described by its distance to each of the two cluster centers:
In [18]:
clusters = estimator.fit(tract_p.values.T).named_steps["cluster"]
In [19]:
labels = clusters.labels_
In [20]:
# Reuse the already-fitted pipeline so the distances match the labels computed above
x, y = estimator.transform(tract_p.values.T).T
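The transform of a fitted `KMeans` returns, for each sample, its Euclidean distance to every cluster center, so with two clusters each subject gets two coordinates. A brief sketch on synthetic points (all values hypothetical) shows that the nearest center in this space corresponds to the assigned label:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Two well-separated hypothetical groups of 10 points in 5 dimensions
points = np.vstack([rng.normal(0.0, 0.1, (10, 5)),
                    rng.normal(1.0, 0.1, (10, 5))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

# Shape (n_samples, n_clusters): column k holds the distance to center k
distances = km.transform(points)

# The closest center for each sample
nearest = distances.argmin(axis=1)
print(np.array_equal(nearest, km.labels_))
```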
We plot each subject in cluster-distance space, colored by its cluster label:
In [21]:
plt.scatter(x, y, c=labels.astype(float))
Out[21]: