R analysis


Copyright (c) 2015 The Hyve B.V. This notebook is licensed under the GNU General Public License, version 3. Authors: Ruslan Forostianov, Pieter Lukasse. Parts of the R script: Copyright 2014 Janssen Research & Development, LLC, & Copyright (c) 2015 The Hyve B.V., based on demoTransmartRClientCommands.R also made available as GPL v3 in this Jupyter home dir.

Before running other code cells: Authenticate with tranSMART

You must authenticate with tranSMART before you can run, or re-run, any of the analysis cells below.

Step 1: Please open URL http://localhost:8080/transmart/oauth/authorize?response_type=code&client_id=api-client&client_secret=api-client&redirect_uri=http%3A%2F%2Flocalhost%3A8080%2Ftransmart%2Foauth%2Fverify

Step 2: Paste the resulting token into the prefetched.request.token argument below.


In [1]:
require("transmartRClient")
connectToTransmart("http://localhost:8080/transmart",
                   prefetched.request.token = "v6FxXD")


Loading required package: transmartRClient
Loading required package: RCurl
Loading required package: bitops
Loading required package: RJSONIO
Loading required package: plyr
Loading required package: RProtoBuf

Attaching package: ‘RProtoBuf’

The following object is masked from ‘package:RCurl’:

    clone

Loading required package: hash
hash-2.2.6 provided by Decision Patterns


Attaching package: ‘hash’

The following object is masked from ‘package:RProtoBuf’:

    clear

Loading required package: reshape

Attaching package: ‘reshape’

The following objects are masked from ‘package:plyr’:

    rename, round_any

Authentication completed.
Connection successful.

If the output above ends with "Authentication completed." followed by "Connection successful." (or TRUE), you can continue below.
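
The Step 1 address follows a regular pattern, so it can be rebuilt for a server at a different location. A minimal sketch, assuming the same OAuth endpoint paths and the api-client credentials that appear in the URL above; build_authorize_url is a hypothetical helper and is not part of transmartRClient:

```r
# Hypothetical helper: rebuilds the Step 1 authorization URL for a given
# tranSMART base URL. The endpoint paths and client credentials are copied
# from the URL above and may differ on other installations.
build_authorize_url <- function(base_url,
                                client_id = "api-client",
                                client_secret = "api-client") {
  # The redirect URI travels as a query parameter, so reserved characters
  # such as ":" and "/" have to be percent-encoded.
  redirect_uri <- utils::URLencode(paste0(base_url, "/oauth/verify"),
                                   reserved = TRUE)
  paste0(base_url, "/oauth/authorize",
         "?response_type=code",
         "&client_id=", client_id,
         "&client_secret=", client_secret,
         "&redirect_uri=", redirect_uri)
}

authorize_url <- build_authorize_url("http://localhost:8080/transmart")
print(authorize_url)
```

For the local server used in this notebook the result should match the Step 1 URL character for character.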

Part 1 - Retrieving metadata and clinical data


Get studies and observations data


In [12]:
# Get studies
studies <- getStudies()
studies


Out[12]:
         id       api.link.self.href  ontologyTerm.fullName
GSE8581  GSE8581  /studies/gse8581    \Public Studies\GSE8581\

In [8]:
study <- "GSE8581"  

# Retrieve the clinical data
allObservations <- getObservations(study, as.data.frame = TRUE)
# Show the first 3 rows to get an impression of the available fields
allObservations$observations[1:3,]


Out[8]:
  | subject.id | Biomarker Data_GPL570 | Endpoints_Diagnosis | Endpoints_FEV1 | Endpoints_Forced Expiratory Volume Ratio | Subjects_Age | Subjects_Height (inch) | Subjects_Lung Disease | Subjects_Organism | Subjects_Race | Subjects_Sex
1 | 1000384597 | NA | non-small cell adenocarcinoma | 1.41 | 51 | 65 | 66 | chronic obstructive pulmonary disease | Homo sapiens | Afro American | female
2 | 1000384598 | E | non-small cell squamous cell carcinoma | 1.29 | 53 | 77 | 67 | chronic obstructive pulmonary disease | Homo sapiens | Caucasian | female
3 | 1000384599 | E | inflammation | 4.04 | 79 | 55 | 69 | control | Homo sapiens | Caucasian | male
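
Later cells wrap the Age column in as.integer(), which suggests that numeric concepts may arrive as text. A small sketch of how to check and convert column types, using a mock data frame rather than a live connection; the column subset is illustrative and storing Age as character is an assumption:

```r
# Mock of a few columns of allObservations$observations. The values come
# from the three rows shown above; storing Age as text is an assumption.
obs <- data.frame(
  subject.id = c("1000384597", "1000384598", "1000384599"),
  Subjects_Age = c("65", "77", "55"),
  Subjects_Sex = c("female", "female", "male"),
  stringsAsFactors = FALSE,
  check.names = FALSE
)

# Inspect the storage type of every column.
print(sapply(obs, class))

# Convert the age column to integers before doing arithmetic on it.
obs$Subjects_Age <- as.integer(obs$Subjects_Age)
mean_age <- mean(obs$Subjects_Age)
print(mean_age)
```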

Making subsets based on attributes (aka "concepts")


In [9]:
# get the concepts for this study
concepts <- getConcepts(study)
concepts


Out[9]:
   | name | fullName | type | api.link.self.href | api.link.observations.href | api.link.parent.href | api.link.children.NA.href | api.link.children.NA.title | api.link.highdim.href
1 | Afro American | \Public Studies\GSE8581\Subjects\Race\Afro American\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Subjects/Race/Afro%20American | /studies/gse8581/concepts/Subjects/Race/Afro%20American/observations | /studies/gse8581/concepts/Subjects/Race | NA | NA | NA
2 | Age | \Public Studies\GSE8581\Subjects\Age\ | NUMERIC | /studies/gse8581/concepts/Subjects/Age | /studies/gse8581/concepts/Subjects/Age/observations | /studies/gse8581/concepts/Subjects | NA | NA | NA
3 | Biomarker Data | \Public Studies\GSE8581\Biomarker Data\ | UNKNOWN | /studies/gse8581/concepts/Biomarker%20Data | /studies/gse8581/concepts/Biomarker%20Data/observations | /studies/gse8581/concepts/ROOT | /studies/gse8581/concepts/Biomarker%20Data/GPL570 | GPL570 | NA
4 | carcinoid | \Public Studies\GSE8581\Endpoints\Diagnosis\carcinoid\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Endpoints/Diagnosis/carcinoid | /studies/gse8581/concepts/Endpoints/Diagnosis/carcinoid/observations | /studies/gse8581/concepts/Endpoints/Diagnosis | NA | NA | NA
5 | Caucasian | \Public Studies\GSE8581\Subjects\Race\Caucasian\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Subjects/Race/Caucasian | /studies/gse8581/concepts/Subjects/Race/Caucasian/observations | /studies/gse8581/concepts/Subjects/Race | NA | NA | NA
6 | chronic obstructive pulmonary disease | \Public Studies\GSE8581\Subjects\Lung Disease\chronic obstructive pulmonary disease\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Subjects/Lung%20Disease/chronic%20obstructive%20pulmonary%20disease | /studies/gse8581/concepts/Subjects/Lung%20Disease/chronic%20obstructive%20pulmonary%20disease/observations | /studies/gse8581/concepts/Subjects/Lung%20Disease | NA | NA | NA
7 | control | \Public Studies\GSE8581\Subjects\Lung Disease\control\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Subjects/Lung%20Disease/control | /studies/gse8581/concepts/Subjects/Lung%20Disease/control/observations | /studies/gse8581/concepts/Subjects/Lung%20Disease | NA | NA | NA
8 | Diagnosis | \Public Studies\GSE8581\Endpoints\Diagnosis\ | UNKNOWN | /studies/gse8581/concepts/Endpoints/Diagnosis | /studies/gse8581/concepts/Endpoints/Diagnosis/observations | /studies/gse8581/concepts/Endpoints | /studies/gse8581/concepts/Endpoints/Diagnosis/Unknown | Unknown | NA
9 | emphysema | \Public Studies\GSE8581\Endpoints\Diagnosis\emphysema\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Endpoints/Diagnosis/emphysema | /studies/gse8581/concepts/Endpoints/Diagnosis/emphysema/observations | /studies/gse8581/concepts/Endpoints/Diagnosis | NA | NA | NA
10 | Endpoints | \Public Studies\GSE8581\Endpoints\ | UNKNOWN | /studies/gse8581/concepts/Endpoints | /studies/gse8581/concepts/Endpoints/observations | /studies/gse8581/concepts/ROOT | /studies/gse8581/concepts/Endpoints/Forced%20Expiratory%20Volume%20Ratio | Forced Expiratory Volume Ratio | NA
11 | female | \Public Studies\GSE8581\Subjects\Sex\female\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Subjects/Sex/female | /studies/gse8581/concepts/Subjects/Sex/female/observations | /studies/gse8581/concepts/Subjects/Sex | NA | NA | NA
12 | FEV1 | \Public Studies\GSE8581\Endpoints\FEV1\ | NUMERIC | /studies/gse8581/concepts/Endpoints/FEV1 | /studies/gse8581/concepts/Endpoints/FEV1/observations | /studies/gse8581/concepts/Endpoints | NA | NA | NA
13 | Forced Expiratory Volume Ratio | \Public Studies\GSE8581\Endpoints\Forced Expiratory Volume Ratio\ | NUMERIC | /studies/gse8581/concepts/Endpoints/Forced%20Expiratory%20Volume%20Ratio | /studies/gse8581/concepts/Endpoints/Forced%20Expiratory%20Volume%20Ratio/observations | /studies/gse8581/concepts/Endpoints | NA | NA | NA
14 | giant bullae | \Public Studies\GSE8581\Endpoints\Diagnosis\giant bullae\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Endpoints/Diagnosis/giant%20bullae | /studies/gse8581/concepts/Endpoints/Diagnosis/giant%20bullae/observations | /studies/gse8581/concepts/Endpoints/Diagnosis | NA | NA | NA
15 | Giant Cell Tumor | \Public Studies\GSE8581\Endpoints\Diagnosis\Giant Cell Tumor\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Endpoints/Diagnosis/Giant%20Cell%20Tumor | /studies/gse8581/concepts/Endpoints/Diagnosis/Giant%20Cell%20Tumor/observations | /studies/gse8581/concepts/Endpoints/Diagnosis | NA | NA | NA
16 | GPL570 | \Public Studies\GSE8581\Biomarker Data\GPL570\ | UNKNOWN | /studies/gse8581/concepts/Biomarker%20Data/GPL570 | /studies/gse8581/concepts/Biomarker%20Data/GPL570/observations | /studies/gse8581/concepts/Biomarker%20Data | /studies/gse8581/concepts/Biomarker%20Data/GPL570/Lung | Lung | NA
17 | Height (inch) | \Public Studies\GSE8581\Subjects\Height (inch)\ | NUMERIC | /studies/gse8581/concepts/Subjects/Height%20%28inch%29 | /studies/gse8581/concepts/Subjects/Height%20%28inch%29/observations | /studies/gse8581/concepts/Subjects | NA | NA | NA
18 | hematoma | \Public Studies\GSE8581\Endpoints\Diagnosis\hematoma\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Endpoints/Diagnosis/hematoma | /studies/gse8581/concepts/Endpoints/Diagnosis/hematoma/observations | /studies/gse8581/concepts/Endpoints/Diagnosis | NA | NA | NA
19 | Homo sapiens | \Public Studies\GSE8581\Subjects\Organism\Homo sapiens\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Subjects/Organism/Homo%20sapiens | /studies/gse8581/concepts/Subjects/Organism/Homo%20sapiens/observations | /studies/gse8581/concepts/Subjects/Organism | NA | NA | NA
20 | inflammation | \Public Studies\GSE8581\Endpoints\Diagnosis\inflammation\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Endpoints/Diagnosis/inflammation | /studies/gse8581/concepts/Endpoints/Diagnosis/inflammation/observations | /studies/gse8581/concepts/Endpoints/Diagnosis | NA | NA | NA
21 | Lung | \Public Studies\GSE8581\Biomarker Data\GPL570\Lung\ | HIGH_DIMENSIONAL | /studies/gse8581/concepts/Biomarker%20Data/GPL570/Lung | NA | /studies/gse8581/concepts/Biomarker%20Data/GPL570 | NA | NA | /studies/gse8581/concepts/Biomarker%20Data/GPL570/Lung/highdim
22 | Lung Disease | \Public Studies\GSE8581\Subjects\Lung Disease\ | UNKNOWN | /studies/gse8581/concepts/Subjects/Lung%20Disease | /studies/gse8581/concepts/Subjects/Lung%20Disease/observations | /studies/gse8581/concepts/Subjects | /studies/gse8581/concepts/Subjects/Lung%20Disease/not%20specified | not specified | NA
23 | lymphoma | \Public Studies\GSE8581\Endpoints\Diagnosis\lymphoma\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Endpoints/Diagnosis/lymphoma | /studies/gse8581/concepts/Endpoints/Diagnosis/lymphoma/observations | /studies/gse8581/concepts/Endpoints/Diagnosis | NA | NA | NA
24 | male | \Public Studies\GSE8581\Subjects\Sex\male\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Subjects/Sex/male | /studies/gse8581/concepts/Subjects/Sex/male/observations | /studies/gse8581/concepts/Subjects/Sex | NA | NA | NA
25 | metastatic non-small cell adenocarcinoma | \Public Studies\GSE8581\Endpoints\Diagnosis\metastatic non-small cell adenocarcinoma\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Endpoints/Diagnosis/metastatic%20non-small%20cell%20adenocarcinoma | /studies/gse8581/concepts/Endpoints/Diagnosis/metastatic%20non-small%20cell%20adenocarcinoma/observations | /studies/gse8581/concepts/Endpoints/Diagnosis | NA | NA | NA
26 | metastatic renal cell carcinoma | \Public Studies\GSE8581\Endpoints\Diagnosis\metastatic renal cell carcinoma\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Endpoints/Diagnosis/metastatic%20renal%20cell%20carcinoma | /studies/gse8581/concepts/Endpoints/Diagnosis/metastatic%20renal%20cell%20carcinoma/observations | /studies/gse8581/concepts/Endpoints/Diagnosis | NA | NA | NA
27 | no malignancy | \Public Studies\GSE8581\Endpoints\Diagnosis\no malignancy\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Endpoints/Diagnosis/no%20malignancy | /studies/gse8581/concepts/Endpoints/Diagnosis/no%20malignancy/observations | /studies/gse8581/concepts/Endpoints/Diagnosis | NA | NA | NA
28 | non-small cell adenocarcinoma | \Public Studies\GSE8581\Endpoints\Diagnosis\non-small cell adenocarcinoma\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Endpoints/Diagnosis/non-small%20cell%20adenocarcinoma | /studies/gse8581/concepts/Endpoints/Diagnosis/non-small%20cell%20adenocarcinoma/observations | /studies/gse8581/concepts/Endpoints/Diagnosis | NA | NA | NA
29 | non-small cell squamous cell carcinoma | \Public Studies\GSE8581\Endpoints\Diagnosis\non-small cell squamous cell carcinoma\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Endpoints/Diagnosis/non-small%20cell%20squamous%20cell%20carcinoma | /studies/gse8581/concepts/Endpoints/Diagnosis/non-small%20cell%20squamous%20cell%20carcinoma/observations | /studies/gse8581/concepts/Endpoints/Diagnosis | NA | NA | NA
30 | not specified | \Public Studies\GSE8581\Subjects\Lung Disease\not specified\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Subjects/Lung%20Disease/not%20specified | /studies/gse8581/concepts/Subjects/Lung%20Disease/not%20specified/observations | /studies/gse8581/concepts/Subjects/Lung%20Disease | NA | NA | NA
31 | NSC-Mixed | \Public Studies\GSE8581\Endpoints\Diagnosis\NSC-Mixed\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Endpoints/Diagnosis/NSC-Mixed | /studies/gse8581/concepts/Endpoints/Diagnosis/NSC-Mixed/observations | /studies/gse8581/concepts/Endpoints/Diagnosis | NA | NA | NA
32 | Organism | \Public Studies\GSE8581\Subjects\Organism\ | UNKNOWN | /studies/gse8581/concepts/Subjects/Organism | /studies/gse8581/concepts/Subjects/Organism/observations | /studies/gse8581/concepts/Subjects | /studies/gse8581/concepts/Subjects/Organism/Homo%20sapiens | Homo sapiens | NA
33 | Race | \Public Studies\GSE8581\Subjects\Race\ | UNKNOWN | /studies/gse8581/concepts/Subjects/Race | /studies/gse8581/concepts/Subjects/Race/observations | /studies/gse8581/concepts/Subjects | /studies/gse8581/concepts/Subjects/Race/Caucasian | Caucasian | NA
34 | Sex | \Public Studies\GSE8581\Subjects\Sex\ | UNKNOWN | /studies/gse8581/concepts/Subjects/Sex | /studies/gse8581/concepts/Subjects/Sex/observations | /studies/gse8581/concepts/Subjects | /studies/gse8581/concepts/Subjects/Sex/male | male | NA
35 | Subjects | \Public Studies\GSE8581\Subjects\ | UNKNOWN | /studies/gse8581/concepts/Subjects | /studies/gse8581/concepts/Subjects/observations | /studies/gse8581/concepts/ROOT | /studies/gse8581/concepts/Subjects/Sex | Sex | NA
36 | Unknown | \Public Studies\GSE8581\Endpoints\Diagnosis\Unknown\ | CATEGORICAL_OPTION | /studies/gse8581/concepts/Endpoints/Diagnosis/Unknown | /studies/gse8581/concepts/Endpoints/Diagnosis/Unknown/observations | /studies/gse8581/concepts/Endpoints/Diagnosis | NA | NA | NA
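
Because the concepts table is an ordinary data frame, the links to pass as concept.links can be looked up by name or type instead of being copied by hand. A sketch on a four-row mock of the table above (concepts_mock is an illustrative stand-in, not the object returned by getConcepts()):

```r
# Mock of four rows of the concepts table, copied from the output above.
concepts_mock <- data.frame(
  name = c("Age", "FEV1", "Lung Disease", "Lung"),
  type = c("NUMERIC", "NUMERIC", "UNKNOWN", "HIGH_DIMENSIONAL"),
  api.link.self.href = c(
    "/studies/gse8581/concepts/Subjects/Age",
    "/studies/gse8581/concepts/Endpoints/FEV1",
    "/studies/gse8581/concepts/Subjects/Lung%20Disease",
    "/studies/gse8581/concepts/Biomarker%20Data/GPL570/Lung"
  ),
  stringsAsFactors = FALSE
)

# Links of all numeric concepts.
numeric_links <- concepts_mock$api.link.self.href[concepts_mock$type == "NUMERIC"]
print(numeric_links)

# Link of a single concept, looked up by its name.
lung_disease_link <-
  concepts_mock$api.link.self.href[concepts_mock$name == "Lung Disease"]
print(lung_disease_link)
```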

In [17]:
observations <- getObservations(study,
                                # concept links from the api.link.self.href column above:
                                concept.links =
                                  c("/studies/gse8581/concepts/Subjects/Age",
                                    "/studies/gse8581/concepts/Subjects/Lung%20Disease/")
                                )
observations$observations


Out[17]:
    subject.id  Age  Lung Disease
1   1000384597  65   chronic obstructive pulmonary disease
2   1000384598  77   chronic obstructive pulmonary disease
3   1000384599  55   control
4   1000384600  64   chronic obstructive pulmonary disease
5   1000384601  77   not specified
6   1000384602  61   chronic obstructive pulmonary disease
7   1000384603  61   not specified
8   1000384604  68   control
9   1000384605  77   not specified
10  1000384606  70   chronic obstructive pulmonary disease
11  1000384607  72   chronic obstructive pulmonary disease
12  1000384608  71   not specified
13  1000384609  39   not specified
14  1000384610  71   control
15  1000384611  56   chronic obstructive pulmonary disease
16  1000384612  74   not specified
17  1000384613  59   control
18  1000384614  71   control
19  1000384615  79   chronic obstructive pulmonary disease
20  1000384616  69   not specified
21  1000384617  64   not specified
22  1000384618  50   control
23  1000384619  68   control
24  1000384620  55   control
25  1000384621  61   chronic obstructive pulmonary disease
26  1000384622  59   chronic obstructive pulmonary disease
27  1000384623  78   control
28  1000384624  75   chronic obstructive pulmonary disease
29  1000384625  71   control
30  1000384626  77   control
31  1000384627  64   not specified
32  1000384628  73   not specified
33  1000384629  65   not specified
34  1000384630  52   chronic obstructive pulmonary disease
35  1000384631  53   control
36  1000384632  56   chronic obstructive pulmonary disease
37  1000384633  73   not specified
38  1000384634  40   control
39  1000384635  81   not specified
40  1000384636  82   not specified
41  1000384637  50   not specified
42  1000384638  67   control
43  1000384639  46   not specified
44  1000384640  62   control
45  1000384641  61   not specified
46  1000384642  70   not specified
47  1000384643  53   chronic obstructive pulmonary disease
48  1000384644  71   not specified
49  1000384645  75   chronic obstructive pulmonary disease
50  1000384646  68   chronic obstructive pulmonary disease
51  1000384647  84   control
52  1000384648  78   control
53  1000384649  79   not specified
54  1000384650  63   chronic obstructive pulmonary disease
55  1000384651  67   not specified
56  1000384652  61   not specified
57  1000384653  57   control
58  1000384654  54   control

In [18]:
observations_control <- subset(observations$observations, `Lung Disease` == 'control')
observations_case <- subset(observations$observations, `Lung Disease` == 'chronic obstructive pulmonary disease')
observations_control
observations_case


Out[18]:
    subject.id  Age  Lung Disease
3   1000384599  55   control
8   1000384604  68   control
14  1000384610  71   control
17  1000384613  59   control
18  1000384614  71   control
22  1000384618  50   control
23  1000384619  68   control
24  1000384620  55   control
27  1000384623  78   control
29  1000384625  71   control
30  1000384626  77   control
35  1000384631  53   control
38  1000384634  40   control
42  1000384638  67   control
44  1000384640  62   control
51  1000384647  84   control
52  1000384648  78   control
57  1000384653  57   control
58  1000384654  54   control
Out[18]:
    subject.id  Age  Lung Disease
1   1000384597  65   chronic obstructive pulmonary disease
2   1000384598  77   chronic obstructive pulmonary disease
4   1000384600  64   chronic obstructive pulmonary disease
6   1000384602  61   chronic obstructive pulmonary disease
10  1000384606  70   chronic obstructive pulmonary disease
11  1000384607  72   chronic obstructive pulmonary disease
15  1000384611  56   chronic obstructive pulmonary disease
19  1000384615  79   chronic obstructive pulmonary disease
25  1000384621  61   chronic obstructive pulmonary disease
26  1000384622  59   chronic obstructive pulmonary disease
28  1000384624  75   chronic obstructive pulmonary disease
34  1000384630  52   chronic obstructive pulmonary disease
36  1000384632  56   chronic obstructive pulmonary disease
47  1000384643  53   chronic obstructive pulmonary disease
49  1000384645  75   chronic obstructive pulmonary disease
50  1000384646  68   chronic obstructive pulmonary disease
54  1000384650  63   chronic obstructive pulmonary disease
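
When more than two groups are of interest, writing one subset() call per group becomes repetitive. As an alternative sketch (not the approach used in this notebook), base R's table() and split() produce the group sizes and one data frame per group in a single step. The mock below reuses six rows from the output above:

```r
# Mock of observations$observations with six subjects from the table above.
obs <- data.frame(
  subject.id = c("1000384597", "1000384598", "1000384599",
                 "1000384600", "1000384601", "1000384604"),
  Age = c("65", "77", "55", "64", "77", "68"),
  `Lung Disease` = c("chronic obstructive pulmonary disease",
                     "chronic obstructive pulmonary disease",
                     "control",
                     "chronic obstructive pulmonary disease",
                     "not specified",
                     "control"),
  stringsAsFactors = FALSE,
  check.names = FALSE  # keep the space in the "Lung Disease" column name
)

# Number of subjects per group.
group_sizes <- table(obs$`Lung Disease`)
print(group_sizes)

# One data frame per group, addressed by group name.
groups <- split(obs, obs$`Lung Disease`)
print(groups[["control"]])
```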

In [11]:
observations <- getObservations(study,
                                # concept links from the api.link.self.href column above:
                                concept.links =
                                  c("/studies/gse8581/concepts/Subjects/Age",
                                    "/studies/gse8581/concepts/Subjects/Sex")
                                )
# make two groups based on sex:
observations_female <- subset(observations$observations, `Sex` == 'female')
observations_male <- subset(observations$observations, `Sex` == 'male')
observations_male
# show the age distribution of both groups:
d <- density(as.integer(observations_male$Age)) # estimate the density for males
plot(d, col="blue", main="Male vs Female groups age distribution") # draw the male curve
legend("topright", c("Males","Females"), lty = 1, col=c("blue", "red")) # line legend to match the curves
d <- density(as.integer(observations_female$Age)) # estimate the density for females
lines(d, col="red") # add the female curve to the same plot


Out[11]:
    subject.id  Age  Sex
3   1000384599  55   male
4   1000384600  64   male
5   1000384601  77   male
6   1000384602  61   male
7   1000384603  61   male
10  1000384606  70   male
13  1000384609  39   male
14  1000384610  71   male
16  1000384612  74   male
20  1000384616  69   male
23  1000384619  68   male
25  1000384621  61   male
27  1000384623  78   male
28  1000384624  75   male
31  1000384627  64   male
32  1000384628  73   male
36  1000384632  56   male
38  1000384634  40   male
40  1000384636  82   male
41  1000384637  50   male
42  1000384638  67   male
43  1000384639  46   male
46  1000384642  70   male
48  1000384644  71   male
49  1000384645  75   male
50  1000384646  68   male
53  1000384649  79   male
54  1000384650  63   male

In [20]:
# show the age distribution of both groups:
d <- density(as.integer(observations_control$Age)) # estimate the density for controls
plot(d, col="blue", main="Case vs Control groups age distribution") # draw the control curve
legend("topright", c("Control","Case"), lty = 1, col=c("blue", "red")) # line legend to match the curves
d <- density(as.integer(observations_case$Age)) # estimate the density for cases
lines(d, col="red") # add the case curve to the same plot
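
When two density curves share one plot, the axes are fixed by the first curve, so a taller or wider second curve can be cut off. A sketch of one way around this, computing shared axis limits before plotting. The ages are the first six control and case values from the tables above, and the pdf(NULL) device is only there so the sketch runs without a display:

```r
# Ages copied from the first six control and case rows shown earlier.
ages_control <- as.integer(c("55", "68", "71", "59", "71", "50"))
ages_case    <- as.integer(c("65", "77", "64", "61", "70", "72"))

d_control <- density(ages_control)
d_case    <- density(ages_case)

# Axis limits that cover both curves, so neither one is clipped.
x_range <- range(d_control$x, d_case$x)
y_range <- range(0, d_control$y, d_case$y)

pdf(NULL)  # draw to a null device; drop this line (and dev.off) in a notebook
plot(d_control, col = "blue", xlim = x_range, ylim = y_range,
     main = "Case vs Control groups age distribution")
lines(d_case, col = "red")
legend("topright", c("Control", "Case"), lty = 1, col = c("blue", "red"))
invisible(dev.off())
```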


Exercise 1

Make a distribution plot similar to the plot above, but now comparing the ages of "control" vs "chronic obstructive pulmonary disease" (if you did the previous RStudio exercise, you can just paste your code in a code cell below).

NB: also run all the cells above that fetch the data your script requires.


In [ ]:
# write your code here:
observations_female <- subset(observations$observations, `Sex` == 'female')
observations_male <- subset(observations$observations, `Sex` == 'male')

Part 2 - Retrieving molecular (aka "high dimensional") data


Downloading the expression data for our chosen study

This can take a while (about 1 minute).


In [21]:
dataDownloaded <- getHighdimData(study.name = study, concept.match = "Lung", projection = "log_intensity")


Retrieving data from server. This can take some time, depending on your network connection speed. 2015-10-15 11:23:57
Retrieving data: 
 24.363 MiB downloaded.
Download complete.
Received data for 55 assays. Unpacking data. 2015-10-15 11:25:18
  |======================================================================| 100%
Data unpacked. Converting to data.frame. 2015-10-15 11:25:38
Additional biomarker information is available.
This function will return a list containing a dataframe containing the high dimensional data and a hash describing which (column) labels refer to which bioMarker

In [23]:
summary(dataDownloaded)


Out[23]:
                    Length Class      Mode
data                54680  data.frame list
labelToBioMarkerMap 54674  hash       S4  

In [24]:
# preview part of the data
data<-dataDownloaded[["data"]]
data[1:10,1:10]


Out[24]:
   | assayId | patientId | sampleTypeName | timepointName | tissueTypeName | platform | X235956_at | X226260_x_at | X232632_at | X214503_x_at
1 | 45741 | GSE8581GSM213034 | Human |  | Lung | GPL570 | 6.461792 | 4.969012 | 5.546975 | 0.3130368
2 | 45742 | GSE8581GSM212811 | Human |  | Lung | GPL570 | 7.461758 | 1.053695 | 5.507547 | -1.114216
3 | 45743 | GSE8581GSM213036 | Human |  | Lung | GPL570 | 6.907191 | 5.064348 | 7.637386 | 0.5699965
4 | 45744 | GSE8581GSM212075 | Human |  | Lung | GPL570 | 7.775558 | 4.766791 | 6.595917 | -0.1566947
5 | 45745 | GSE8581GSM211008 | Human |  | Lung | GPL570 | 7.308785 | 3.559235 | 8.249853 | 0.2687817
6 | 45746 | GSE8581GSM210090 | Human |  | Lung | GPL570 | 6.286654 | 4.967538 | 7.318986 | 0.1080092
7 | 45747 | GSE8581GSM212855 | Human |  | Lung | GPL570 | 4.924318 | 4.085595 | 4.552512 | 1.08789
8 | 45748 | GSE8581GSM212070 | Human |  | Lung | GPL570 | 8.0049 | 4.767745 | 6.762535 | 1.19427
9 | 45749 | GSE8581GSM212810 | Human |  | Lung | GPL570 | 2.411315 | -1.447194 | 0.2749713 | -1.321647
10 | 45750 | GSE8581GSM210193 | Human |  | Lung | GPL570 | 6.670685 | 4.025853 | 7.538934 | 0.9389105

Prepare the data for easy usage in different standard R functions

The steps below reduce the table above to a simple table that contains only the expression values, with the patient identifiers as row names.


In [25]:
# select gene expression data, which is the data *excluding* columns 1 to 6:
expression_data<-data[,-c(1:6)]
expression_data[1:3,1:3]
dim(expression_data)
# add patientId as the row name for the expression_data matrix:
rownames(expression_data)<-data$patientId
expression_data[1:3,1:3]


Out[25]:
   X235956_at  X226260_x_at  X232632_at
1  6.461792    4.969012      5.546975
2  7.461758    1.053695      5.507547
3  6.907191    5.064348      7.637386
Out[25]:
  1. 55
  2. 54674
Out[25]:
                  X235956_at  X226260_x_at  X232632_at
GSE8581GSM213034  6.461792    4.969012      5.546975
GSE8581GSM212811  7.461758    1.053695      5.507547
GSE8581GSM213036  6.907191    5.064348      7.637386
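
Dropping columns by position (1 to 6) works only as long as the annotation columns stay in that order. A slightly more defensive sketch names the annotation columns explicitly. The mock table uses values from the preview above, and its empty timepointName is an assumption:

```r
# Mock of the downloaded table: six annotation columns followed by probes.
# Values come from the preview above; the empty timepointName is an assumption.
data_mock <- data.frame(
  assayId = c(45741, 45742, 45743),
  patientId = c("GSE8581GSM213034", "GSE8581GSM212811", "GSE8581GSM213036"),
  sampleTypeName = "Human",
  timepointName = "",
  tissueTypeName = "Lung",
  platform = "GPL570",
  X235956_at = c(6.461792, 7.461758, 6.907191),
  X226260_x_at = c(4.969012, 1.053695, 5.064348),
  stringsAsFactors = FALSE
)

# Name the annotation columns instead of relying on positions 1:6.
annotation_cols <- c("assayId", "patientId", "sampleTypeName",
                     "timepointName", "tissueTypeName", "platform")
expression_mock <- data_mock[, setdiff(names(data_mock), annotation_cols),
                             drop = FALSE]

# Row names must be unique, so check the patient IDs before assigning them.
stopifnot(!anyDuplicated(data_mock$patientId))
rownames(expression_mock) <- data_mock$patientId
print(expression_mock)
```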

Generating the heatmap

If the expression_data table is large, you may want to create a subset of the data first. Here we restrict the probes to a probe list taken from: "Bhattacharya S., Srisuma S., Demeo D. L., et al., Molecular biomarkers for quantitative and discrete COPD phenotypes. American Journal of Respiratory Cell and Molecular Biology. 2009;40(3):359–367. doi: 10.1165/rcmb.2008-0114OC."


In [26]:
#Select a subset of the probes:
probeNames<- c("1552622_s_at","1555318_at","1557293_at","1558280_s_at","1558411_at","1558515_at","1559964_at","204284_at","205051_s_at","205528_s_at","208835_s_at","209377_s_at","209815_at","211548_s_at","212179_at","212263_at","213156_at","213269_at","213650_at","213878_at","215359_x_at","215933_s_at","218352_at","218490_s_at","220094_s_at","220906_at","220925_at","222108_at","224711_at","225318_at","225595_at","225835_at","225892_at","226316_at","226492_at","226534_at","226666_at","226800_at","227095_at","227105_at","227148_at","227812_at","227852_at","227930_at","227947_at","228157_at","228630_at","228665_at","228760_at","228850_s_at","228875_at","228963_at","229111_at","229572_at","230142_s_at","230986_at","232014_at","235423_at","235810_at","238712_at","238992_at","239842_x_at","239847_at","241936_x_at","242389_at")
# Note: R automatically prepends "X" to column names that start with a digit, so prepend "X" to the probe names as well
probeNames<- paste("X", probeNames, sep = "")
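
The "X" prefix comes from R's rule for syntactically valid names, which base R exposes as make.names(). A short sketch showing that it gives the same result as the paste() call for these probe IDs, plus a check for probes missing from a table. available_cols is a made-up stand-in for colnames(expression_data):

```r
# Probe IDs as they appear in the probe list (a short excerpt).
probe_ids <- c("1552622_s_at", "1555318_at", "204284_at")

# make.names() applies the rule R uses for syntactically valid names:
# a name that starts with a digit gets an "X" prefix.
column_names <- make.names(probe_ids)
print(column_names)

# Same result as prepending "X" with paste().
stopifnot(identical(column_names, paste("X", probe_ids, sep = "")))

# Before subsetting, report probes that are absent from a table.
# available_cols is a made-up stand-in for colnames(expression_data).
available_cols <- c("X1552622_s_at", "X204284_at", "X235956_at")
missing_probes <- setdiff(column_names, available_cols)
print(missing_probes)
```

Selecting a column that does not exist in a data frame raises an "undefined columns selected" error, so a check like this can save a confusing failure later on.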

In [27]:
# Select only the cases and controls (excluding patients whose lung disease is not specified).
# Note: the observation table identifies patients by database ID, not by the patient ID used in the gene expression dataset
cases <- allObservations$observations$subject.id[allObservations$observations$'Subjects_Lung Disease' == "chronic obstructive pulmonary disease"]
controls <- allObservations$observations$subject.id[allObservations$observations$'Subjects_Lung Disease' == "control"]
# visualize:
par(pin=c(5,2))
barplot(c(length(cases),length(controls)), main="Cases and controls", horiz=TRUE,
  names.arg=c("cases", "controls"))


Preparing the data

Some basic data preparation to separate cases and controls and make the data suitable for passing on to the prcomp() function.
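
The preparation boils down to three moves: translate database IDs into in-trial patient IDs, keep only patients that have expression data, and transpose so that probes become rows. A toy sketch of that pattern; all IDs and values below are invented for illustration and the object names are hypothetical:

```r
# Toy stand-ins; the IDs and values below are invented for illustration.
subject_info <- data.frame(
  subject.id = c(101, 102, 103, 104),
  subject.inTrialId = c("P_A", "P_B", "P_C", "P_D"),
  stringsAsFactors = FALSE
)
cases    <- c(101, 103)  # database IDs of the cases
controls <- c(104)       # database IDs of the controls

# Expression values: patients in rows, probes in columns; P_C has no assay.
expression_toy <- data.frame(
  X1_at = c(1.0, 2.0, 3.0),
  X2_at = c(4.0, 5.0, 6.0),
  row.names = c("P_A", "P_B", "P_D")
)

# Translate database IDs to in-trial IDs, then keep those with expression data.
ids_case    <- subject_info$subject.inTrialId[subject_info$subject.id %in% cases]
ids_control <- subject_info$subject.inTrialId[subject_info$subject.id %in% controls]
keep <- intersect(c(ids_case, ids_control), rownames(expression_toy))

# Transpose so probes are rows, and label each column with its group.
probe_by_patient <- t(expression_toy[keep, ])
colnames(probe_by_patient) <- paste(
  keep, ifelse(keep %in% ids_case, "Case", "Control"), sep = "_")
print(probe_by_patient)
```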


In [29]:
# now we have the *internal database* IDs for the patients, but we need to get the patient IDs because 
# this is the index of the expression_data matrix. 
# These can be retrieved from the subjectInfo table: 
subjectInfo <- allObservations$subjectInfo
patientIDsCase    <- subjectInfo$subject.inTrialId[ subjectInfo$subject.id %in% cases ] 
patientIDsControl <- subjectInfo$subject.inTrialId[ subjectInfo$subject.id %in% controls] 

# patient sets containing case and control patientIDs
patientSets <- c(patientIDsCase, patientIDsControl)
patientSets <- patientSets[which(patientSets %in% rownames(expression_data))]
# make a subset of the data based on the selected patientSets and the probe list, and transpose the 
# table so that the rows now contain probe names
# (the variable name "subset" shadows base::subset() as an object; calls to subset() still work)
subset<-t(expression_data[patientSets,probeNames]) 
# for ease of recognition: append "Case" and "Control" to the patient names
colnames(subset)[colnames(subset)%in% patientIDsCase] <- paste(colnames(subset)[colnames(subset)%in% patientIDsCase],"Case", sep="_" )
colnames(subset)[colnames(subset)%in% patientIDsControl] <- paste( colnames(subset)[colnames(subset)%in% patientIDsControl] , "Control",sep= "_")
subset


Out[29]:
GSE8581GSM212788_CaseGSE8581GSM212074_CaseGSE8581GSM210993_CaseGSE8581GSM210992_CaseGSE8581GSM212852_CaseGSE8581GSM210194_CaseGSE8581GSM212075_CaseGSE8581GSM211007_CaseGSE8581GSM212848_CaseGSE8581GSM213020_CaseGSE8581GSM212787_ControlGSE8581GSM212853_ControlGSE8581GSM212811_ControlGSE8581GSM210009_ControlGSE8581GSM212070_ControlGSE8581GSM213035_ControlGSE8581GSM212789_ControlGSE8581GSM212068_ControlGSE8581GSM210196_ControlGSE8581GSM212790_Control
X1552622_s_at5.710384926132476.763969209614837.847746416921396.483256322235317.323153864793686.742801199258077.418308030873767.004939527463845.804941542888647.821225494191637.551569848528228.5675416449886 6.564190654805167.939073741768448.379140064426368.845756611270497.928826275255887.901253084643167.308739470344597.87585991029486
X1555318_at4.828108550554354.258684742197944.670058513228155.406948168709216.340225767097255.053771696286965.273795599214264.7186739707379 5.380674044404356.245242743086565.736445228433 4.995669931716993.920140827684886.051837552472236.707662825386056.892998093055075.540644177216527.322676298891185.914210503161336.89704676223591
X1557293_at4.229734030511327.749380068044416.6285271963706 5.531674648742187.218112739881515.932602175050377.923155124325597.427522370682556.606314891053315.473342087823646.982685614823087.402790381315117.4979875197307 7.133183321557838.035519442511877.155779585289557.910366905779546.368857380040026.535872552088117.9142595449449
X1558280_s_at5.412392012832194.347020692662695.598237153309284.362729368168236.761976665022565.468013033503555.1624678676153 5.329970303077695.111628137391066.555093706597495.399666721426427.015035859513635.532039491298275.666097572787326.859957128272246.421866224214716.5257791790998 5.421953720648336.663999206160917.13959227933716
X1558411_at-0.2791314421354754.88666719694032 0.128979606885278 3.52669484554198 4.02934686633434 5.18158760921047 5.34186124894168 4.56246224671744 3.53365055512354 4.72039339463058 5.0223456175494 5.99999323735115 5.27488035900523 5.02172400545784 5.44176564335038 5.23926818705853 5.34584342235272 4.12964570610481 5.41312691801417 5.62579941955529
X1558515_at6.272971232932696.385329216606017.563447982901447.1957806568306 7.220300930411617.339297754654247.837993684891718.042009516993496.924337464075356.153146652184988.034578896979337.472877636359296.761431556924527.329267140895538.633707283916898.337653119167248.161237718738367.713166527986757.002454999998498.45349288659705
X1559964_at6.314667526469545.768282265886336.370983760846765.4811406859639 6.689830287675016.477368099238146.670004676513 6.911511832920865.742984388707975.514759809427886.767734239609636.985101903624266.5733548322638 5.814232523860587.6022757361326 6.474359785426137.500355921130117.060728351436576.405262321021317.77187897758957
X204284_at6.872496359759527.022467688458717.977411592038718.682618942642658.058825159411537.9483847540901 8.0507356702005 7.586126706541586.161686174584 7.108628996109467.462256860648238.893677705126148.479473102976886.018705381416328.557643668254217.3009931074202 9.451266757567898.734868782025588.316200059608218.076890401521
X205051_s_at6.121293134478517.718224611556287.593913946637086.993040269800148.6191562021118 8.591477466182478.584188347274918.748914016833115.224993367061188.593230503140887.215212634917648.017621315217578.522467758839048.352405433910639.999267195345127.618811273076049.607838866588518.708049251702229.013853029839578.63689283539135
X205528_s_at4.445521459555835.6917686396401 7.028591252135366.7568098142945 5.783613802123696.735616945817566.7362936828463 6.158761144310454.967740406560355.1003091114071 6.680619730230766.746527757118194.7302214865006 5.7124224485484 6.850474410161196.419163000266346.201782827423036.967998564113286.903484221540367.25714265799622
X208835_s_at6.6421990244728 8.517413498654118.0647050467398 7.428519300101077.975612991586438.798212562532 8.033158666781 8.7311900313484 7.044711040275138.111850318593597.336247677149188.304561235918587.5033487351675 8.434686601399618.882881441133389.3071757581807 8.810417396291819.645829528416188.5705464179067 7.86446474984065
X209377_s_at9.0708427824059310.94075404747489.9188215058227510.028969661053611.010276329087710.481466463620110.761717431977110.50028420848669.4249745595796 10.90876788284929.6212496924132210.944375319708 10.828612542618110.050324832393411.411055439689310.265322024234911.487669231124310.781408895471810.480093420139210.5816337020987
X209815_at6.384866619051478.195322988821926.300949193272068.235257954538798.195293456867437.324513610924898.757273322206657.680000783256117.263212853168187.3830908006872 7.681069706757198.283903257108727.537000911946077.612426224129738.094726950313267.392841163400028.117269068954558.727018488268818.013289170684268.19691681732122
X211548_s_at7.373543815731279.968637996599616.813768367204129.1953845117855910.116603882874210.048772932465810.24592105746719.501142534149818.5321182165844310.882620125 8.7578866576922710.69577637668268.9484197984624810.644162729902211.247767336928310.31897205211748.4582630543488210.916103382413710.868019977116410.1899604227346
X212179_at7.530078547643729.346531462028028.266870851577847.4712443735187 9.924989119039389.369804521166539.587916248362038.641733867775498.030176039827569.366870771239348.291456216454499.910494331347628.883333665984288.7413860853455310.07205219161289.019268240246619.620868617396849.689997971419449.788914049662939.23191955450771
X212263_at7.077178896721619.648033950563848.492462105676237.948700122800459.5188970262467 9.374289643487418.940228427901898.967335789965847.499566885927919.567085253588088.440790284286629.538782575443548.319459877896129.150991161447589.792058188882749.964733620184039.1813833677913710.082561561211 9.625944403149358.6396011843039
X213156_at6.062607451463548.315357911329987.959625275764797.170065256642338.3743526841367 8.281123282761388.5486562490502 8.760989336257346.573266962637078.339048282919077.692747996907118.305660505173927.235526033387128.441466879995248.981256162781248.654181311043438.437539865089289.045423857895948.100641315973478.20413582482097
X213269_at4.876384335309236.0411071536008 5.591499906551125.097159883967376.154822158716716.475498365996066.572468315697636.215622300896845.209886109232925.9534072662352 5.822131024735236.506024271260746.137222744695125.280009791376756.730789379868136.281143688749916.871486379581737.444203231022726.3606978699802 6.09803208296053
X213650_at4.759123868267996.1020323799582 6.292798310683 5.173723101958615.579490936246716.707856051482196.782395458989936.360429245524235.655371865904536.258332357255275.532491518612656.6069692248833 5.3602062316727 6.567616935695127.207258410363537.166705403902526.435021787858296.894696523336376.946485244260786.24699325579582
X213878_at6.1264566899848 7.642910914911627.024031501516757.069003933304148.188539395158387.162028721087218.303305859000387.792686195300136.372161602005077.861316477907426.509748089600498.231840374472037.913260438258457.664134355569618.703976225322417.311421373027628.300100838996627.862779915438727.6852533877236 7.63918874280131
X215359_x_at6.800537563291877.817290683090367.241544675327346.989411531067868.034325780035887.737051251641338.231787585367218.158321877820087.269790586967797.940049262389527.8269424735443 8.461057832240937.351698992293897.774800234838198.589935725384267.798718034603028.452801321637118.055591551320157.9683269567362 8.42947788063223
X215933_s_at4.861439314899986.110526113701676.210055662546846.698829660203856.706972520846976.7041249772765 6.726803975554747.103308792914554.835722001974356.2229726029782 6.626842100487677.509387067087486.1924565572307 5.555475683024057.330854150879636.592392749942756.892694591414977.243868976032647.645946582415346.91172752897535
X218352_at5.335658322692887.164454442645987.253497698334977.191750160445677.749742075872416.7952474508048 7.548290189883387.504032390252765.831517535721247.239818079033626.013473426133647.529867786516397.033720320258367.548382676757457.815107860480337.2687057342641 7.383436461948497.817891819355827.590804092764037.19750176311214
X218490_s_at5.499664143885157.822914933619667.423594977676086.545502011735758.179670279347437.8241823076216 7.720853020685287.612669491146736.229589845307397.718539725643666.586140220834278.188138783230477.719950045068567.459382435013458.221050541929677.393948120223388.117300242028148.182006934047497.925631579345557.57432861474722
X220094_s_at5.8945922488113 7.745136469051386.944788182974467.209463113535487.843625888705887.305952286604238.014907580541117.737139157972586.704138813887887.229837834936146.932061171343437.625964487888667.924052365258627.5521849162086 8.128283088937287.540011765659758.145779336841167.565642868596887.461307541465427.83416762547423
X220906_at5.414977761450546.747172537035465.651929960474345.115178937131456.480258658352526.3573954109264 6.862662151245486.238741010499235.856149102318417.034281755164326.287163820325966.406283251163586.326995667521435.981759057695747.517952781970376.667139934317986.382203800711426.506884345434377.037612841479477.18682736141406
X220925_at5.229180280735387.131167594025135.8843097439118 6.041012942362 6.699676552601176.905326828787517.008988783227266.382950784310095.536503672011076.827514197689215.409781259636796.859509938891336.168481584572646.658125775550127.345564921267787.188371250612197.327193328979387.640628157687256.973898224829426.55162521535875
X222108_at7.838605165977299.793749603330748.001211129904479.507477529516939.869436533869228.919858472725939.889265142252399.104349765633128.204228769210359.453550502049358.317525673209389.358143476292628.9245334368354 9.4960108774898410.02786453922029.171089207262749.175372399513159.616348506609469.726989547938548.9614149651126
X224711_at6.879570997373578.996244673137368.110519849797768.194126460861298.902158015512778.366829254300798.924560158331178.448801861544427.474598280535648.622721748133427.579813722922388.909326092196368.374322251063738.256996470368869.394908510666818.6112793680884 8.9429835981871 9.173549723842558.689512333733268.67465484526484
X225318_at6.349980896172778.295732206632957.783541778190277.145402338809388.277008366854877.792536539825528.864730922314758.476353332433226.946496941203297.980008157847916.984042426097028.172507517243727.618994756233848.059555373641288.564263833806327.710751341667398.306695151002838.5830714817088 8.029596441378188.24590917663782
X226534_at8.2995240026944610.25343387675718.139684360703699.3909363260455210.27806608251749.9641271193601610.16615052649899.953217214353018.7844581782053110.00936676909219.3412474932311610.91892285154059.648006977999399.2023515998024410.68166008803989.5147219430850210.113117036390710.40663020289949.996953601018449.9780955026221
X226666_at5.618317954649977.761790553706277.321711674399296.560473466238717.457101753173916.7363613390931 7.747152392024386.935506893931696.065162984453027.9358074142745 6.393306360352657.504302583163176.986024040860187.632748064484198.352273002454327.367030272574887.608920101252297.773290887824327.492846610742167.48688327815107
X226800_at4.235688771245036.783443557940595.690238284136845.348207628017616.698204584715296.052654990258566.468379670204285.095152244990354.711951871250716.213885761452485.440779489390966.828441179681586.356235384640825.921833797947346.689075469545316.338579515774396.879693515083526.899006403644196.698815772579346.2511737189701
X227095_at7.713812644219599.386918696320358.431886170720838.699065729369099.847250005854338.550292745904429.331389484941229.019604627087097.536713466008389.114437304645528.208839114697519.145412529240799.006255940940599.263722245818229.818655668379648.983623547020719.436395072153369.252163949648739.133702223943989.28908978410877
X227105_at5.110421711752318.189266147606416.497634717958016.624763294092358.107975500976367.128808910681248.0157551259158 7.5567052975987 6.489812645079787.727253394174126.448623483110647.747112101158327.389446352710987.847019750942458.685323480327647.368096075356697.372177276837618.424804545418657.884152199055098.12787570310306
X227148_at7.142015130442079.298463578910927.568191753377837.255293000321069.114851104521729.192988099896689.686206355486468.836751012678157.837905408483059.395274919748918.3830519086121710.02034106819218.5221617057453 8.3796339570644510.30761067215818.777386836949749.264041404851349.917182592090189.192381592972439.38479238249859
X227812_at5.294620748891638.222910796726776.432590337382597.774306081086677.5423664026216 7.087643166860347.969858468053877.316806464077256.043666030440857.748595405990966.6724253419715 7.526248824231828.555704973704457.234357611046088.765833066973547.087006622897628.398500016198098.113283764120127.7580299539483 7.43907454321691
X227852_at4.687855743666226.074471183938015.655371865904536.040508917294745.930609617922415.8664604656271 6.268481180094645.852303529303275.301082761322366.185803128750926.079337573421396.622412082065116.187601491879376.056444801019676.663188043728446.853172297886896.925453558955546.691994704178646.225222864176416.1587813372553
X227930_at6.494082740845298.087993146362327.438883317375957.552715201545888.420419239868057.753317048939678.249905117377248.174860784724316.880820190101638.3315529574236 7.116312767493958.180127965072427.508690101469537.899066844078768.728604211617598.042863176247128.303283023851188.355716679683838.153126380218478.01690854872284
X227947_at6.307045941692719.187161515988218.522675680952178.872425561025749.684562793540258.842516919498749.423630691251379.643592873901467.227278994468329.732046984013017.561700357594239.366256635708079.087656425206839.274184771391029.922368837118799.3392108899275 8.840139031674419.7006581866651110.15238625842849.0505044182782
X228157_at5.596100283910157.272256525366676.692594580285494.953176755972967.707407451201317.468819511290157.850074293664657.498322681312165.088200972794897.537257234337366.259750794868738.030987086664357.441026921321057.411036367087528.156790973771437.596309043825897.710434521650367.9873151806444 7.845772289756 7.59949633401661
X228630_at5.370401454619237.136837353083246.902375114486036.401417145458457.529094732185617.126467015120267.637233468853227.397366596973085.399577862796726.878578220276566.417059236398487.639810977716787.349090995082316.524770490895488.216348152985176.550502083553017.819981057044087.449825459373437.176791492583777.23848117147476
X228665_at6.884781054471299.034163431664687.702103406629398.419028970509158.492870638906538.6274572791625 9.407648041680649.0309291698658 7.201496641800619.133026586615297.444956888518999.089574948746888.6444728101115 8.421968022445399.142275574668458.820290671201818.630456437885079.169549250680749.531014891426268.69553802871125
X228760_at6.169017020146688.105306168946336.864384269424177.091382490943647.412535975357486.8149605130157 7.667622516977537.4532788661985 6.508718620305727.371698246703756.903580625348647.614606790788917.736138061436867.477960593776238.433781542549326.8746863255335 8.069702166630078.182444015029677.295796479647597.55746370159782
X228850_s_at6.123873084877298.256439878692547.386017659914578.339898982440388.156790973771438.205392512989688.184240615792318.278221768572996.850411899272948.060685124942657.467246869444018.445379095903637.091858479560528.035695382943228.635257399960237.9701173976055 8.379486673444268.594436581082918.438787694949677.92945371412982
X228875_at7.018500651020528.841237247361936.602568227938398.699409349258557.398812084748618.297503257424758.818460749532437.501980452330447.313989504366897.6825381845994 7.864495702651589.0478164576606 8.170060247762828.255094681988269.050537067855657.757550022126268.134939643492988.684857300523088.094595009612078.10571473978235
X228963_at4.694562653542316.920757765617266.852360964065056.703848217177267.684019200167436.816714892278697.255585706139187.100546708539445.503129260895297.606027601015415.236293612236997.042863176247126.975767736458837.094394436633568.130786757816087.1188372779797 7.309612772454127.237362947643387.444128672069137.4984423630095
X229111_at6.1465676575172 7.872120778980656.355002164316396.219706767991847.379967455570577.331364850495457.848034476249126.741493952411586.189779103388676.981338944144327.1523051261013 7.226489262919337.452908371232236.570359704203098.356465175607927.047102094254677.784994656590888.345533874822667.703308382228847.93243918655347
X229572_at4.744101879260186.658411446423675.716894894392685.4576730824433 6.838598863370566.5623324944331 7.183764387127616.426195973415235.627571820834766.281067628598366.479313029983416.968689830057846.338219370641176.178011444773347.4110109365596 6.509068722294597.469942940770817.212977811168796.783076809914236.69462950972053
X230142_s_at7.0770079435668 7.496174702279776.341678039068146.536898370236837.340019197936827.393047173175778.717097172681976.883767424185896.592515341899857.9905805277468 7.425509658562747.529118164161527.034050602545076.886696457860549.0511926238192 7.993912074399247.394977085526478.416392207400637.763311974080537.94921390932879
X230986_at4.427552539999096.357852827950395.238171596256074.420920250937325.981722530938675.802402625216885.785181722869955.242011741220244.226477702662225.427297757049115.378605256672166.554501366453515.364365817968965.7746974647984 6.334789925699745.2099017011742 6.378053772375966.580083661927515.841975634083715.98480806202707
X232014_at4.126212995334965.319401166858325.228514730722384.8167225710013 5.362669762160696.144115728287955.335779785124774.829301596918224.058030699359475.295686295583625.618041857630025.753615232377965.569743225603335.286016158409485.881106617139766.087100000255936.154514351804776.3046661815938 5.862639836946646.00355501552732
X235423_at5.441672701380856.302573127168946.5554066586264 6.444090562671457.419479901608696.3913535867025 7.009167988871976.140903373177455.770099587231976.626206943406466.400372269719127.183526213831746.877670641181376.113644246961767.452224764162116.5010617020533 7.265474403232117.001824756425876.812369865972227.06448410715948
X235810_at4.6221572714835 6.347213857623715.972736301760835.208283138412526.401744793725445.839546466218126.242202336913825.574216579930745.263875732577866.1567505318425 5.349669581678066.484347622643475.489870433399815.690945018609626.8257601918555 6.183817970700856.310089226927926.8516116353451 6.5632496690622 5.99279108597799
X238712_at5.9464665289555 6.855454067098536.377415313465745.371109257149817.386543692117317.667686376117247.620447119145657.4821222363974 6.424352756126196.818160331215957.114106699965127.3807207917533 7.1483443820384 5.586950846816348.064737376523427.379153063767467.802936349572857.718669861379547.740819650894828.07552728308864
X238992_at4.405999164825076.2021277428609 5.794790138673435.379586308126816.195170400549645.877525866314726.108616451789665.938189326661884.543706209566145.441835345898415.379752639394596.455928834318525.747030172927115.253414493348577.183049749248065.911473483579246.574822069970176.523966481208376.056345082498916.44396297170877
X239842_x_at6.126109722472226.876811594610327.180973306847867.264151686043917.284282771200146.784556329405187.267301696937086.709745542363126.599933664439526.004337338144497.279489867809187.518023621732236.825849214526136.202145378366157.384801014852177.069089887951928.253313317184087.373700406303716.6939747429164 7.09963185001446
X239847_at5.117890325152156.776525151421914.426734392917996.731658448751817.315946561307286.540239651772086.706862041462546.880012012444665.218699810506635.928242393648935.811804949068747.303999875085495.2310252873535 6.778826216279837.436236811124776.203447850173676.281126992962587.611747808506426.746232136655187.17646240727228
X241936_x_at5.880487177787326.172977478170946.792855352362497.471293154416117.161081482277186.593894530853317.0948694327187 6.709221697960076.596006332091855.911773060686736.796013643331037.294060102405526.979739567128755.715585914187747.163941455792796.712815854437377.633191419395677.386716119948556.297980060162837.46844484030874
X242389_at4.850594423474576.912134866890285.719035548444284.8011845291587 6.844121978499256.933549033764247.302428729626156.282525059388525.472653476916976.756649719411035.662988741064156.811098491727926.486291219102055.943684607655248.004462065899557.3865695576058 6.848485236283277.722780213608377.239827623918566.52111357068251

The heatmap


In [30]:
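# Make a heatmap of the selected probes (rows) across patients (columns).
# scale = "row" centers and scales each probe so that patterns, not absolute levels, are compared.
heatmap(as.matrix(subset), scale = "row")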
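To see what row scaling does to the values before they are colored, here is a minimal sketch on a toy matrix. The matrix `toy` and its probe and sample names are made up for illustration and are not part of the GSE8581 data. The sketch assumes that row scaling amounts to subtracting each row's mean and dividing by its standard deviation, which is how the `heatmap` documentation describes centering and scaling.

```r
# Toy expression matrix (hypothetical values): 3 probes x 4 samples
toy <- matrix(c( 5,  6,  7,  8,
                10, 10, 12, 12,
                 2,  4,  6,  8),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("probe_a", "probe_b", "probe_c"),
                              c("s1", "s2", "s3", "s4")))

# Center each row on its mean ...
centered <- sweep(toy, 1, rowMeans(toy))
# ... then divide each row by its standard deviation
row_z <- sweep(centered, 1, apply(centered, 1, sd), "/")

rowMeans(row_z)        # all (numerically) zero
apply(row_z, 1, sd)    # all one
row_z
```

Note that `probe_a` and `probe_c` end up with identical scaled values: they differ in absolute level and spread, but follow the same pattern across samples, and the pattern is all that remains after row scaling.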


One patient seems to be an outlier: GSE8581GSM212810_Case. We want to remove this outlier and plot the heatmap again.


In [31]:
# Keep every column (patient) except the outlier
subset_without_outlier <- subset[, colnames(subset) != "GSE8581GSM212810_Case"]
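In this notebook the outlier was spotted by eye in the heatmap. As a rough cross-check, one could also rank the samples by how well each correlates with the others. The sketch below illustrates that idea on simulated data. It is not the method used in this notebook, and `toy`, `sample_1` to `sample_6` and the noise levels are all hypothetical.

```r
set.seed(1)  # reproducible toy data
n_probes <- 50

# Five samples share one probe-level profile plus small noise (hypothetical data)
profile <- rnorm(n_probes, mean = 7, sd = 1)
toy <- sapply(1:6, function(i) profile + rnorm(n_probes, sd = 0.2))
colnames(toy) <- paste0("sample_", 1:6)

# Make one sample unrelated to the shared profile
toy[, "sample_4"] <- rnorm(n_probes, mean = 7, sd = 1)

# Mean correlation of each sample with all other samples
cors <- cor(toy)
diag(cors) <- NA                      # ignore self-correlation
mean_cor <- colMeans(cors, na.rm = TRUE)

# The sample with the lowest mean correlation is the outlier candidate
candidate <- names(which.min(mean_cor))
candidate

# Drop it by name; drop = FALSE keeps the result a matrix
toy_clean <- toy[, colnames(toy) != candidate, drop = FALSE]
dim(toy_clean)
```

A low mean correlation only flags a candidate. Whether the sample should really be removed remains a judgment call that depends on the study.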

Exercise 2

Generate the heatmap above again, now without the outlier case.


In [32]:
# add code here:
heatmap(as.matrix(subset_without_outlier), scale = "row")