The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. The Notebook has support for over 40 programming languages, including those popular in Data Science such as Julia, Python, R and Scala.
See the tranSMART API client Python module (transmart_api).
In [2]:
import getpass
from transmart_api import TransmartApi
api = TransmartApi(
    host = 'http://localhost:8080',
    user = raw_input('Username:'),
    password = getpass.getpass('Password:'))
api.access()
Out[2]:
In [3]:
obs = api.get_observations(study = 'GSE8581')
obs[0:3]
Out[3]:
The API returns this data serialized with Google Protocol Buffers. To deserialize the binary stream, you need the protobuf definition file that describes the data structure. See the definition file.
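Protobuf streams that carry several messages commonly prefix each message with its length, encoded as a varint (the framing written by `writeDelimitedTo` in the Java protobuf library). Whether this API uses that framing is an assumption here. The sketch below only shows how such a stream could be split into raw messages before the classes generated from the definition file parse them. The helper names `read_varint` and `split_delimited` are hypothetical and not part of the client module.

```python
import io

def read_varint(stream):
    """Read one base-128 varint from a binary stream; return None at a clean end of stream."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None  # nothing left to read
            raise EOFError("stream ended inside a varint")
        value = ord(byte)
        result |= (value & 0x7F) << shift  # the low 7 bits carry data
        if not value & 0x80:               # the high bit marks a continuation
            return result
        shift += 7

def split_delimited(stream):
    """Yield the raw bytes of each length-prefixed message in the stream."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside a message")
        yield payload

# Two made-up payloads, each preceded by its length (assumed framing).
framed = b"\x03abc" + b"\x02hi"
messages = list(split_delimited(io.BytesIO(framed)))
```

Each element of `messages` would then be handed to the matching generated message class for parsing.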
In [4]:
(hdHeader, hdRows) = api.get_hd_node_data(study = 'GSE8581',
    node_name = 'Lung',
    projection = 'default_real_projection',
    genes = ['TP53', 'AURKA'])
In [5]:
hdHeader
Out[5]:
In [6]:
hdRows[0:5]
Out[6]:
In [7]:
#type:
#1 - double
#2 - string
[(x.name, x.type) for x in hdHeader.columnSpec]
Out[7]:
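The numeric type codes are easier to read once mapped to names. Below is a minimal sketch, assuming only the two codes listed in the comments of the cell above. `ColumnSpec` is a hypothetical stand-in for an entry of `hdHeader.columnSpec`, and the column names are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for one entry of hdHeader.columnSpec.
ColumnSpec = namedtuple("ColumnSpec", ["name", "type"])

# Codes as listed in the comments of the cell above.
TYPE_NAMES = {1: "double", 2: "string"}

def describe_columns(column_specs):
    """Return (column name, type name) pairs; unknown codes are reported as-is."""
    return [(c.name, TYPE_NAMES.get(c.type, "unknown (%s)" % c.type))
            for c in column_specs]

# Made-up column specs, for illustration only.
specs = [ColumnSpec("value", 1), ColumnSpec("label", 2)]
described = describe_columns(specs)
```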
In [8]:
hdDataDic = {row.label: row.value[0].doubleValue for row in hdRows}
In [9]:
import pandas
from pandas.io.json import json_normalize
obs_df1 = json_normalize(obs)
obs_df1[0:3]
Out[9]:
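`json_normalize` turns nested records into flat columns with dotted names, which is where a column such as `subject.inTrialId` comes from. The plain-Python sketch below illustrates that flattening step. It is not the pandas implementation, and the sample observation (including its field values) is made up.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# Made-up observation shaped like the columns used in this notebook.
observation = {
    "label": "\\Public Studies\\GSE8581\\Subjects\\Age\\",
    "value": "65",
    "subject": {"inTrialId": "subject-1", "id": 1},
}
flat_observation = flatten(observation)
```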
In [10]:
obs_df2 = obs_df1.pivot(index = 'subject.inTrialId', columns = 'label', values = 'value')
obs_df2[0:3]
Out[10]:
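The pivot reshapes the long list of observations (one row per subject and concept) into a wide table (one row per subject, one column per concept). A rough sketch of the same reshaping with plain dictionaries, using made-up rows; the `pivot` helper here is hypothetical and only mimics the behaviour.

```python
def pivot(rows, index, columns, values):
    """Reshape long-format rows into {index value: {column value: cell value}}."""
    wide = {}
    for row in rows:
        cells = wide.setdefault(row[index], {})
        if row[columns] in cells:
            # pandas also refuses to pivot when an index/column pair repeats
            raise ValueError("duplicate entry for %r / %r" % (row[index], row[columns]))
        cells[row[columns]] = row[values]
    return wide

# Made-up long-format observations.
rows = [
    {"subject.inTrialId": "s1", "label": "Age", "value": "65"},
    {"subject.inTrialId": "s1", "label": "Sex", "value": "male"},
    {"subject.inTrialId": "s2", "label": "Age", "value": "48"},
]
wide = pivot(rows, "subject.inTrialId", "label", "value")
```

Combinations that are missing in the input (here the sex of `s2`) are simply absent from the result; pandas fills such cells with NaN.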
In [11]:
obs_df2.dtypes
Out[11]:
In [12]:
# let pandas infer the data type of each column
# (convert_objects() is deprecated in later pandas releases)
obs_df3 = obs_df2.convert_objects()
obs_df3.dtypes
Out[12]:
In [13]:
obs_df4 = obs_df3.rename(
    # strip the study prefix and the trailing backslash from each concept path
    columns = lambda c: c.replace('\\Public Studies\\GSE8581\\', '')[:-1],
    inplace = False)
obs_df4[0:3]
Out[13]:
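The renaming function can be checked on a single column name. The sample path below is an assumption, inferred from the column names used later in the notebook (such as the age column under `Subjects`); `shorten` is a hypothetical name for the lambda.

```python
PREFIX = "\\Public Studies\\GSE8581\\"

def shorten(label):
    """Drop the study prefix and the trailing backslash from a concept path."""
    return label.replace(PREFIX, "")[:-1]

# Assumed example of a full concept path.
short = shorten("\\Public Studies\\GSE8581\\Subjects\\Age\\")
```

Writing the backslashes as `\\` keeps the string independent of how Python treats unrecognized escape sequences such as `\P`.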
In [14]:
from pandas import DataFrame
hdDataDic['patientId'] = [assay.patientId for assay in hdHeader.assay]
assayIds = [assay.assayId for assay in hdHeader.assay]
hd_df = DataFrame(data=hdDataDic, index = assayIds)
hd_df[0:10]
Out[14]:
In [15]:
males_abv_50 = (obs_df4['Subjects\\Age'] > 50) & (obs_df4['Subjects\\Sex'] == 'male')
males_abv_50[0:10]
Out[15]:
In [16]:
obs_df4[males_abv_50][0:3]
Out[16]:
See the pandas documentation on how to merge and join data frames.
In [17]:
hd_df.join(obs_df4, on='patientId')[0:3]
Out[17]:
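`DataFrame.join` performs a left join by default: every row of the left frame is kept, and the columns of the right frame are attached where the key matches the right frame's index. A plain-Python sketch of that behaviour follows. The `left_join` helper, the assay rows and the clinical records are made up for illustration.

```python
def left_join(left_rows, right_by_key, on):
    """Attach the matching right-hand record to every left-hand row."""
    joined = []
    for row in left_rows:
        merged = dict(row)
        # rows without a match keep only their own columns
        merged.update(right_by_key.get(row[on], {}))
        joined.append(merged)
    return joined

# Made-up expression rows (left) and clinical records keyed by subject (right).
assays = [
    {"assayId": 1, "patientId": "s1", "probe_a": 7.1},
    {"assayId": 2, "patientId": "s3", "probe_a": 6.4},
]
clinical = {"s1": {"Subjects\\Age": 65}, "s2": {"Subjects\\Age": 48}}
joined = left_join(assays, clinical, "patientId")
```

In pandas the unmatched row would carry NaN in the clinical columns instead of omitting them.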
The Notebook makes it easy to interact with an R environment. You only need to mark a cell with the so-called magic notation.
In [19]:
%load_ext rpy2.ipython
In [20]:
%%R -i obs_df4
str(obs_df4)
In [21]:
%%R -i obs_df4
library(ggplot2)
ggplot(obs_df4, aes(x = Subjects.Sex, y = Endpoints.FEV1)) + geom_boxplot()
In [22]:
%%R -i obs_df4 -o tmodel
tmodel <- t.test(obs_df4$Endpoints.FEV1~obs_df4$Subjects.Sex)
tmodel
In [23]:
tmodel[3]
Out[23]:
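By default, R's `t.test` runs Welch's two-sample test (`var.equal = FALSE`). As a sketch of what that computes, the statistic and the degrees of freedom can be worked out directly. The two samples below are made up and are not FEV1 values from the study. The p-value is left out because it needs the t-distribution.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / float(len(xs))

def sample_variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def welch_t(a, b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    va = sample_variance(a) / len(a)  # squared standard error of group a
    vb = sample_variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up samples, for illustration only.
t_stat, dof = welch_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```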
In [30]:
%%R -i hd_df
library(dplyr)
pc <- princomp(select(hd_df, -patientId))$loadings[, 1:2]
plot(pc[, 1], pc[, 2], pch = 19, xlab = "Comp. 1", ylab = "Comp. 2", main = "", cex = 1.3)
In [31]:
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets
def f(x):
    return x
interact(f, x=widgets.IntSlider(min=-10,max=30,step=1,value=10));
In [26]:
concepts = api.get_concepts('GSE8581', hal = True)
In [27]:
import ipywidgets as widgets
from IPython.display import display
d = widgets.Dropdown()
d.options = [c['name'] for c in concepts['_embedded']['ontology_terms'] if c['type'] == 'NUMERIC']
def f(prop, val):
    # called with the trait name and its new value
    print(val)
d.on_trait_change(f, 'value')
display(d)