Chemometrics

Julien Wist / 2017 / Universidad del Valle
Andrés Bernal / 2017 / ???

An up-to-date version of this notebook can be found here: https://github.com/jwist/chemometrics/



In [1]:

    
options(repr.plot.width=4, repr.plot.height=4)



In [169]:

    
# we load a few packages
library(ggplot2)
library(corrplot)
library(reshape2)
library(caret)
library(MASS) # for LDA
library(klaR) # for pls
library(pls)
library(e1071)# for pls
library(pROC) # for pls
library(vegan)
require(scales)
require(gridExtra);









    



corrplot 0.84 loaded
Loading required package: lattice

Attaching package: ‘pls’

The following object is masked from ‘package:caret’:

    R2

The following object is masked from ‘package:corrplot’:

    corrplot

The following object is masked from ‘package:stats’:

    loadings

Type 'citation("pROC")' for a citation.

Attaching package: ‘pROC’

The following objects are masked from ‘package:stats’:

    cov, smooth, var

Loading required package: permute
This is vegan 2.5-2

Attaching package: ‘vegan’

The following object is masked from ‘package:pls’:

    scores

The following object is masked from ‘package:klaR’:

    rda

The following object is masked from ‘package:caret’:

    tolerance

Loading required package: scales
Loading required package: gridExtra

Let's load some NMR data



In [4]:

    
#rm(list=ls(all=TRUE))

load(url('https://github.com/jwist/chemometrics/raw/master/datasets/coffeeMulti.rda'))
load(url('https://github.com/jwist/chemometrics/raw/master/datasets/ppm.binned.rda'))



In [5]:

    
t<-coffeeMulti$gc$country; #levels(t)<-c("o", "c","o" )
# we perform ANOVA on the nmr data. 
#d <- data.frame("x"=I(as.matrix(coffeeMulti$nmrBin)),"country"=coffeeMulti$gc$country)
d <- data.frame("x"=I(as.matrix(coffeeMulti$nmrBin)),"country"=t)
colnames(d$x) <- round(ppm.binned,3) # put ppm of bins as tag for data column. This will be helpfull for tracking variables.
fit <- aov(x ~ country, data=d)
resultsF <- unlist( lapply(summary(fit),function(x) x[[4]][1]) ) # we extract F-values
# extract the p-value of K-W test for all NMR variables
#resultsK_W.p <- apply(d$x, 2, function(x) kruskal.test(x ~ coffeeMulti$gc$country)$p.value)
resultsK_W.p <- apply(d$x, 2, function(x) kruskal.test(x ~ t)$p.value)


# we look at some results to have an idea
b <- head(order(resultsF, decreasing = TRUE), 200) # select the most discriminant variable for further analysis
best <- b[1]
#best = 1150
d <- data.frame("x"=coffeeMulti$nmrBin[,best],"country"=t)
#d <- data.frame("x"=coffeeMulti$nmrBin[,best],"country"=coffeeMulti$gc$country)



In [6]:

    
boxplot(tapply(d$x,d$country,function(x) x))



In [7]:

    
# here we plot the distribution for each class
x <- tapply(d$x,d$country,function(x) x)
xlims <- range(unlist(tapply(d$x,d$country,function(x) unlist(range(density(x)$x)))))
ylims <- range(unlist(tapply(d$x,d$country,function(x) unlist(range(density(x)$y)))))
plot(density(x[1][[1]]),type='l', xlim=xlims, ylim=ylims, col=2)
lines(density(x[2][[1]]), col=3)
lines(density(x[3][[1]]), col=4)
legend(min(xlims),max(ylims),levels(d$country), col=2:4, pch="*")



In [8]:

    
ukeyHSD(fit, "country", ordered = TRUE);
plot(TukeyHSD(fit, "country"));



In [9]:

    
F <- resultsF > 4
index <- 1:length(resultsF)
plot(ppm.binned[F],as.numeric(coffeeMulti$nmrBin[1,F]), type="n", ylim=c(-1e7,5e8), xlim=c(7.5,5.5))
lines(ppm.binned,as.numeric(coffeeMulti$nmrBin[1,]), col=1)
for (i in c(4,6,8,15,30)) {
  F <- resultsF > i
  index <- 1:length(resultsF)
  points(ppm.binned[F],resultsF[F]*5e6,cex=0.5, col=i)
}



In [10]:

    
options(repr.plot.width=8, repr.plot.height=3)



In [11]:

    
par(mfrow=c(1,2))

x <- runif(200,min=-1,max=1)
y <- x^2
plot(x,y,pch=16)
rug(x,side=1,col="grey")
rug(y,side=2,col="grey")
#cor(x,y);
#chisq.test(x,y) ;

x <- runif(200,min=-1,max=1)
y <- x
plot(x,y,pch=16)
rug(x,side=1,col="grey")
rug(y,side=2,col="grey")
#cor(x,y);
#chisq.test(x,y) ;

Despite in both cases $x$ and $y$ are dependent, in the left case they are completely uncorrelated. So being uncorrelated is not the same as being independent, but independant variables are uncorrelated. Two variables are considered independent when their joint probability distribution is equal to the product of their individual marginal probability distributions: $$P_{X,Y}(x,y)=P_X(x)P_Y(y)$$

Two variables are uncorrelated when their correlation coefficient is zero or when their covariance is equal to zero.

high number of variable

The number of available variables (predictors) is much higher that the number of observation
overfitting
reduce the number of variables a task that is achieved by step-wise regression, etc.

multivariate analysis

However there are others and more robust alternatives to these methods that aknowledge the redundancy of information spanned by collinear predictors and try to convert collinearity of predictors (an issue) into reliability (a plusvalue).



In [15]:

    
options(repr.plot.width=4, repr.plot.height=4)
load('/Users/jul/metabo/paper_IRMS/resultsF.rda')
load('/Users/jul/metabo/paper_IRMS/coffeeMulti.rda')
load('/Users/jul/metabo/paper_IRMS/ppm.binned.rda')
colnames(coffeeMulti$nmrBin) <- round(ppm.binned,3)
country <- coffeeMulti$gc$country 
irms <- abs(coffeeMulti$irms$mean) 
vite <- coffeeMulti$gc$vitE
caff <- coffeeMulti$gc$caffeine 
fumeth <- coffeeMulti$gc$furanmethanol
ffilter <- resultsF > 15
x <- cbind(irms,
           vite,
           caff,
           fumeth,
           as.matrix(coffeeMulti$nmrBin[,ffilter]))
is(x)









    



Warning message in readChar(con, 5L, useBytes = TRUE):
“cannot open compressed file '/home/jul/metabo/paper_IRMS/resultsF.rda', probable reason 'No such file or directory'”





    



Error in readChar(con, 5L, useBytes = TRUE): cannot open the connection
Traceback:

1. load("/home/jul/metabo/paper_IRMS/resultsF.rda")
2. readChar(con, 5L, useBytes = TRUE)



In [13]:

    
corrplot::corrplot(cor(cbind(irms, vite, caff, fumeth)),
                   method='number')









    



Error in cbind(irms, vite, caff, fumeth): object 'irms' not found
Traceback:

1. corrplot::corrplot(cor(cbind(irms, vite, caff, fumeth)), method = "number")
2. cor(cbind(irms, vite, caff, fumeth))
3. is.data.frame(x)
4. cbind(irms, vite, caff, fumeth)



In [13]:

    
options(repr.plot.width=5, repr.plot.height=4)
qplot(x=Var1, y=Var2, data=melt(cor(x)), fill=value, geom="tile")



In [14]:

    
M <- cor(x)
M[M > 0.9] <- NA
qplot(x=Var1, y=Var2, data=melt(M), fill=value, geom="tile")



In [44]:

    
C <- findCorrelation(cor(x), 
                     cutoff = .90, 
                     verbose = FALSE)
#x <- x[ , -C ]

fvalues <- c(8.879,4.14,31.87,4.76,resultsF[ffilter])
# to ensure that both x and f-value with the same number of Var
fvalues <- fvalues[ -C ] 
x <- scale(x, center=TRUE)

FF <- c(7,20,22,24)
d <- data.frame("x"=I(as.matrix(x[-FF,])), "country"=country[-FF])
#d <- d[d$country!="Colombia",]

pca <- prcomp(d$x)



In [45]:

    
options(repr.plot.width=8, repr.plot.height=4)
pc <- c(1,2)
COLOR <- c(2:4)
ev <- c(round(pca$sdev[pc[1]]/sum(pca$sdev)*100,0),
        round(pca$sdev[pc[2]]/sum(pca$sdev)*100,0),
        round(pca$sdev[pc[3]]/sum(pca$sdev)*100,0))
plot(pca$x[,pc[1]], pca$x[,pc[2]], col=COLOR[d$country], cex=0.7, 
     xlab=paste0("PC ", pc[1], " (", ev[pc[1]], "%)"), 
     ylab=paste0("PC ", pc[2], " (", ev[pc[2]], "%)"))
legend("topright", legend=levels(country), col=COLOR, pch=1)



In [43]:

    
options(repr.plot.width=8, repr.plot.height=4)
pc <- c(1,2)
COLOR <- c(2:4)
ev <- c(round(pca$sdev[pc[1]]/sum(pca$sdev)*100,0),
        round(pca$sdev[pc[2]]/sum(pca$sdev)*100,0),
        round(pca$sdev[pc[3]]/sum(pca$sdev)*100,0))
plot(pca$x[,pc[1]], pca$x[,pc[2]], col=COLOR[d$country], cex=0.7, 
     xlab=paste0("PC ", pc[1], " (", ev[pc[1]], "%)"), 
     ylab=paste0("PC ", pc[2], " (", ev[pc[2]], "%)"))
legend("topright", legend=levels(country), col=COLOR, pch=1)



In [38]:

    
options(repr.plot.width=8, repr.plot.height=4)
pc <- c(1,2)
COLOR <- c(2:4)
ev <- c(round(pca$sdev[pc[1]]/sum(pca$sdev)*100,0),
        round(pca$sdev[pc[2]]/sum(pca$sdev)*100,0),
        round(pca$sdev[pc[3]]/sum(pca$sdev)*100,0))
plot(pca$x[,pc[1]], pca$x[,pc[2]], col=COLOR[d$country], cex=0.7, 
     xlab=paste0("PC ", pc[1], " (", ev[pc[1]], "%)"), 
     ylab=paste0("PC ", pc[2], " (", ev[pc[2]], "%)"))
legend("topright", legend=levels(country), col=COLOR, pch=1)



In [36]:

    
options(repr.plot.width=8, repr.plot.height=4)
pc <- c(1,2)
COLOR <- c(2:4)
ev <- c(round(pca$sdev[pc[1]]/sum(pca$sdev)*100,0),
        round(pca$sdev[pc[2]]/sum(pca$sdev)*100,0),
        round(pca$sdev[pc[3]]/sum(pca$sdev)*100,0))
plot(pca$x[,pc[1]], pca$x[,pc[2]], col=COLOR[d$country], cex=0.7, 
     xlab=paste0("PC ", pc[1], " (", ev[pc[1]], "%)"), 
     ylab=paste0("PC ", pc[2], " (", ev[pc[2]], "%)"))
legend("topright", legend=levels(country), col=COLOR, pch=1)



In [23]:

    
options(repr.plot.width=8, repr.plot.height=4)
pc <- c(1,3)
COLOR <- c(2:4)
PCH <- c(1,16)
pca <- prcomp(d$x)
#pca <- prcomp(d$x,scale=FALSE,center=TRUE)
layout(matrix(c(1,2), 1, 2, byrow = TRUE), widths=c(3,3), heights=c(1)) 
op <- par(mar = c(4,4,1,1))
biplot(pca,cex=0.7,cex.lab=0.7,cex.axis=0.7)
plot(pca$x[,pc[1]], pca$x[,pc[2]], col=COLOR[d$country], cex=PCH[1], xlab=paste0("PC ", pc[1], " (", round(pca$sdev[pc[1]]/sum(pca$sdev)*100,0), "%)"), ylab=paste0("PC ", pc[2], " (", round(pca$sdev[pc[2]]/sum(pca$sdev)*100,0), "%)"))
legend("topright", legend=levels(country), col=COLOR, pch=1)
par(op)



In [24]:

    
pairs(pca$x[,1:3],col=d$country)



In [25]:

    
options(repr.plot.width=4, repr.plot.height=4)
prin_comp<-rda(d$x, scale=TRUE)
pca_scores<-scores(prin_comp)
plot(pca_scores$sites[,1],pca_scores$sites[,2], pch=21, bg=as.numeric(d$country),
    arrows(0,0,pca_scores$species[,1],pca_scores$species[,2],lwd=1,length=0.2),
    xlab=paste0("PC ", pc[1], " (", round(pca$sdev[pc[1]]/sum(pca$sdev)*100,0), "%)"), ylab=paste0("PC ", pc[2], " (", round(pca$sdev[pc[2]]/sum(pca$sdev)*100,0), "%)"))
ordiellipse(prin_comp,d$country,conf=0.99)
text(pca_scores$species[,1],pca_scores$species[,2],colnames(d$x),cex=0.7,pos=4,col="red")



In [26]:

    
op <- par(mar = c(4,4,1,1))
plot(pca$rotation[,1], type='l', ylim=range(pca$rotation[,1:3]))
lines(pca$rotation[,2], type='l', col=2)
lines(pca$rotation[,3], type='l', col=3)
par(op)



In [27]:

    
options(repr.plot.width=8, repr.plot.height=4)
layout(matrix(c(1,2,3), 1, 3, byrow = TRUE), 
       widths=c(3,3,3), 
       heights=c(1)) 
op <- par(mar = c(4,4,0,0))
plot(pca$rotation[,1],pca$rotation[,2]); 
plot(pca$rotation[,1],pca$rotation[,3]); 
plot(pca$rotation[,2],pca$rotation[,3])
par(op)



In [28]:

    
options(repr.plot.width=4, repr.plot.height=4)
aload <- abs(pca$rotation)
contrib <- sweep(aload, 2, colSums(aload), "/")

plot(fvalues[],contrib[,pc[1]],
     xlab="F-values calculated from ANOVA", 
     ylab=paste0("contribution to PC",pc[1]))
text(fvalues[],contrib[,pc[1]],colnames(d$x),cex=0.7,pos=4,col="red")
legend('topleft', 
       legend=paste("corr=",round(cor(fvalues,contrib[,pc[1]]),2)))









    



Error in xy.coords(x, y, xlabel, ylabel, log): 'x' and 'y' lengths differ






    



Error in text.default(fvalues[], contrib[, pc[1]], colnames(d$x), cex = 0.7, : plot.new has not been called yet






    



Error in cor(fvalues, contrib[, pc[1]]): incompatible dimensions



In [29]:

    
x <- cbind(irms,vite,caff,fumeth,as.matrix(coffeeMulti$nmrBin[,ffilter]))
C <- findCorrelation(cor(x), cutoff = .90, verbose = FALSE)
fvalues <- c(8.879,4.14,31.87,4.76,resultsF[ffilter]) # we apply the filter
x <- scale(x, center=TRUE)
d <- data.frame("x"=I(as.matrix(x)), "country"=country)
rownames(d$x)<-1:nrow(d) #to have the numbers according to the index in the matrix
pca <- prcomp(d$x)
aload <- abs(pca$rotation)
contrib <- sweep(aload, 2, colSums(aload), "/")
op <- par(mar = c(4,4,0,0))



In [30]:

    
plot(fvalues[],contrib[,pc[1]],xlab="F-values calculated from ANOVA", ylab=paste0("contribution to PC",pc[1]))
text(fvalues[],contrib[,pc[1]],colnames(d$x),cex=0.7,pos=4,col="red")
legend('topleft', legend=paste("corr=",round(cor(fvalues,contrib[,pc[1]]),2)))
par(op)

conclusions

irms, gc-ms and nmr are not complementary
the same molecules are detected and therefore no additional value is gained

PLS



In [31]:

    
library(caret)
options(repr.plot.width=4, repr.plot.height=4)



In [32]:

    
m<-matrix(sample(0:140,140), nrow= 20,ncol =7, byrow="TRUE")

species<-c("1", "1", "1","1", "1", "1","1", "1", "1","1",
           "-1","-1","-1","-1","-1","-1","-1","-1","-1","-1")

M<-data.frame(m,species)
X<-m
Y<-as.numeric(species)



In [33]:

    
M[c(1,2,19,20),]









    Out[33]:





X1 X2 X3 X4 X5 X6 X7 species

	1 54 63 17 108 51 120 88 1  
	2 66 105 96 2  121 32 90 1  
	19 127 74 114 79 45 130 113 -1 
	20 33 119 37 40 82 68 0  -1



In [34]:

    
nf = dim(X)[1]
T = c()
P = c()
C = c()
W = c()



In [35]:

    
for (j in 1:nf) {
  w = (t(X) %*% Y) %*% solve(t(Y) %*% Y)
  w1 = t(w) %*% w
  w2 = abs(sqrt(w1))
  w = w %*% solve(w2)
  t = (X %*% w)
  t1 = t(t) %*% t
  c = t(Y) %*% t %*% solve(t1)
  c1 = t(c) %*% c
  u = Y %*% c %*% solve(c1)
  u1 = t(u) %*% u
  u2 = abs(sqrt(u1))
  p = (t(X) %*% t) %*% solve(t1)
  X = X - (t)%*%t(p)
  T = matrix(c(T, t))
  P = matrix(c(P, p))
  C = matrix(c(C, c))
  W = matrix(c(W, w))
}



In [36]:

    
T = matrix(T, ncol = nf)
T = scale(T, scale = TRUE, center = TRUE)
P = matrix(P, ncol = nf)
C = matrix(C, ncol = nf)
W = matrix(W, ncol = nf)



In [37]:

    
plot(T[,1], T[,2], col = factor(species))
text(T[,1], T[,2], species, pos = 2)

o-PLS



In [38]:

    
m<-matrix(sample(6e6:7e6,140), nrow= 20,ncol =7, byrow="TRUE")



In [39]:

    
species<-c("1", "1", "1","1", "1", "1","1", "1", "1","1",
           "-1","-1","-1","-1","-1","-1","-1","-1","-1","-1")

M<-data.frame(m,species)
X<-m
Y<-as.numeric(species)
nf=15

T = c()
P = c()
C = c()
W = c()
Tortho = c()
Portho = c()
Wortho = c()
Cortho = c()



In [40]:

    
for (j in 1:nf) {
  w = (t(X) %*% Y) %*% solve(t(Y) %*% Y)
  w=w%*%solve(norm(w))
  t = (X %*% w) %*% solve(t(w) %*% w)
  c = t(Y) %*% t %*% solve(t(t) %*% t)
  u = Y %*% c %*% solve(t(c) %*% c)
  p = (t(X) %*% t) %*% solve(t(t) %*% t)
  wortho = p - w
  wortho= p - t(((t(w)%*%p)%*% solve(t(w)%*%w))%*% t(w))
  wortho= wortho%*%solve(norm(wortho))
  tortho = X %*% wortho %*% solve(t(wortho) %*% wortho)
  portho = t(X) %*% tortho %*% solve(t(tortho) %*% tortho)
  cortho = t(Y) %*% tortho %*% solve(t(tortho) %*% tortho)
  X = X - tortho %*% t(portho)
  T = matrix(c(T, t))
  P = matrix(c(P, p))
  C = matrix(c(C, c))
  W = matrix(c(W, w))
  Tortho = matrix(c(Tortho, tortho))
  Portho = matrix(c(Portho, portho))
  Wortho = matrix(c(Wortho, wortho))
  Cortho = matrix(c(Cortho, cortho))
}



In [41]:

    
T = matrix(T, ncol = nf)
T = scale(T, scale = TRUE, center = TRUE)
P = matrix(P, ncol = nf)
C = matrix(C, ncol = nf)
W = matrix(W, ncol = nf)
Tortho = matrix(Tortho, ncol = nf)
Tortho = scale(Tortho, scale = TRUE, center = TRUE)
Portho = matrix(Portho, ncol = nf)
Wortho = matrix(Wortho, ncol = nf)
Xortho = Tortho %*% t(Portho)



In [42]:

    
plot(T[,1], Tortho[, 1], pch = 19, col=factor(species))
text(T[, 1], Tortho[,1],species, cex=1, pos=2)



In [51]:

    
m<-matrix(sample(6e6:7e6,14000), nrow= 20,ncol =700, byrow="TRUE")



In [52]:

    
species<-c("1", "1", "1","1", "1", "1","1", "1", "1","1",
           "-1","-1","-1","-1","-1","-1","-1","-1","-1","-1")

M<-data.frame(m,species)
X<-m
Y<-as.numeric(species)
nf=15

T = c()
P = c()
C = c()
W = c()
Tortho = c()
Portho = c()
Wortho = c()
Cortho = c()
for (j in 1:nf) {
  w = (t(X) %*% Y) %*% solve(t(Y) %*% Y)
  w=w%*%solve(norm(w))
  t = (X %*% w) %*% solve(t(w) %*% w)
  c = t(Y) %*% t %*% solve(t(t) %*% t)
  u = Y %*% c %*% solve(t(c) %*% c)
  p = (t(X) %*% t) %*% solve(t(t) %*% t)
  wortho = p - w
  wortho= p - t(((t(w)%*%p)%*% solve(t(w)%*%w))%*% t(w))
  wortho= wortho%*%solve(norm(wortho))
  tortho = X %*% wortho %*% solve(t(wortho) %*% wortho)
  portho = t(X) %*% tortho %*% solve(t(tortho) %*% tortho)
  cortho = t(Y) %*% tortho %*% solve(t(tortho) %*% tortho)
  X = X - tortho %*% t(portho)
  T = matrix(c(T, t))
  P = matrix(c(P, p))
  C = matrix(c(C, c))
  W = matrix(c(W, w))
  Tortho = matrix(c(Tortho, tortho))
  Portho = matrix(c(Portho, portho))
  Wortho = matrix(c(Wortho, wortho))
  Cortho = matrix(c(Cortho, cortho))
}
T = matrix(T, ncol = nf)
T = scale(T, scale = TRUE, center = TRUE)
P = matrix(P, ncol = nf)
C = matrix(C, ncol = nf)
W = matrix(W, ncol = nf)
Tortho = matrix(Tortho, ncol = nf)
Tortho = scale(Tortho, scale = TRUE, center = TRUE)
Portho = matrix(Portho, ncol = nf)
Wortho = matrix(Wortho, ncol = nf)
Xortho = Tortho %*% t(Portho)



In [53]:

    
plot(T[,1], Tortho[, 1], pch = 19, col=factor(species))
text(T[, 1], Tortho[,1],species, cex=1, pos=2)

Conclusions

this is a code for teaching (not validated)
we can get inspiration for creating a new publication style



In [236]:

    
m<-matrix(sample(1e6:7e6,28800), nrow= 64,ncol =450, byrow="TRUE")



In [237]:

    
64*45



In [238]:

    
colnames(m) <- paste0("x",seq_along(1:450))



In [239]:

    
rownames(m) <- paste0("y",seq_along(1:64))



In [240]:

    
mData <- data.frame("x" = m)



In [247]:

    
s <- paste0("x.x",seq(2,63,1), "+", collapse = '')
s <- substr(s, 1, nchar(s)-1)
s









    




'x.x2+x.x3+x.x4+x.x5+x.x6+x.x7+x.x8+x.x9+x.x10+x.x11+x.x12+x.x13+x.x14+x.x15+x.x16+x.x17+x.x18+x.x19+x.x20+x.x21+x.x22+x.x23+x.x24+x.x25+x.x26+x.x27+x.x28+x.x29+x.x30+x.x31+x.x32+x.x33+x.x34+x.x35+x.x36+x.x37+x.x38+x.x39+x.x40+x.x41+x.x42+x.x43+x.x44+x.x45+x.x46+x.x47+x.x48+x.x49+x.x50+x.x51+x.x52+x.x53+x.x54+x.x55+x.x56+x.x57+x.x58+x.x59+x.x60+x.x61+x.x62+x.x63'



In [248]:

    
model <- lm(paste("x.x1 ~", s), data=mData)
summary(model)









    





Call:
lm(formula = paste("x.x1 ~", s), data = mData)

Residuals:
     y1      y2      y3      y4      y5      y6      y7      y8      y9     y10 
 228444  248729  557359 -763495  -92660  610082  365486 -587470  -59516 -273532 
    y11     y12     y13     y14     y15     y16     y17     y18     y19     y20 
-651450 -567236  685098  409284   92661 -279554  -41412  681456 -302780 -314626 
    y21     y22     y23     y24     y25     y26     y27     y28     y29     y30 
 729304 -572922 -411326 -202207  428707  174516  120759 1212497 -599863  154313 
    y31     y32     y33     y34     y35     y36     y37     y38     y39     y40 
 388374 -778428 -393978  133684  -18933 -890221   93630  463111 -194247 -305744 
    y41     y42     y43     y44     y45     y46     y47     y48     y49     y50 
-887493  141877 -842055  193078 -660011   48905 -313482  563133  101061   97946 
    y51     y52     y53     y54     y55     y56     y57     y58     y59     y60 
-388915  349020 -209368  533258   74180  554217  650975   69209 -110065  510699 
    y61     y62     y63     y64 
 410185 -117145   24239 -269341 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.821e+07  6.911e+07  -0.264    0.836
x.x2         3.789e-01  1.674e+00   0.226    0.858
x.x3         1.910e-01  1.520e+00   0.126    0.920
x.x4        -8.649e-01  1.427e+00  -0.606    0.653
x.x5        -1.709e-01  1.464e+00  -0.117    0.926
x.x6         3.168e-02  1.133e+00   0.028    0.982
x.x7        -2.802e-01  9.881e-01  -0.284    0.824
x.x8        -4.358e-01  2.026e+00  -0.215    0.865
x.x9        -1.007e+00  2.565e+00  -0.393    0.762
x.x10        1.637e-01  1.019e+00   0.161    0.899
x.x11       -7.309e-01  2.102e+00  -0.348    0.787
x.x12       -7.967e-01  1.219e+00  -0.653    0.632
x.x13       -7.707e-02  8.856e-01  -0.087    0.945
x.x14        1.914e-01  1.074e+00   0.178    0.888
x.x15        4.558e-01  1.146e+00   0.398    0.759
x.x16       -2.308e-01  1.200e+00  -0.192    0.879
x.x17        3.424e-01  2.052e+00   0.167    0.895
x.x18       -2.482e-01  9.426e-01  -0.263    0.836
x.x19       -6.005e-01  1.377e+00  -0.436    0.738
x.x20        6.038e-01  8.329e-01   0.725    0.601
x.x21        2.958e-01  1.488e+00   0.199    0.875
x.x22        7.384e-01  2.399e+00   0.308    0.810
x.x23       -9.216e-01  1.762e+00  -0.523    0.693
x.x24        2.608e-01  1.235e+00   0.211    0.867
x.x25        9.870e-01  2.377e+00   0.415    0.749
x.x26        4.835e-01  1.407e+00   0.344    0.789
x.x27        2.981e-01  2.573e+00   0.116    0.927
x.x28        1.006e+00  1.774e+00   0.567    0.672
x.x29        1.494e-01  8.847e-01   0.169    0.893
x.x30        3.485e-01  8.606e-01   0.405    0.755
x.x31        6.421e-02  1.231e+00   0.052    0.967
x.x32       -9.711e-02  1.696e+00  -0.057    0.964
x.x33        8.155e-01  2.013e+00   0.405    0.755
x.x34       -4.847e-01  1.225e+00  -0.396    0.760
x.x35        6.480e-02  7.780e-01   0.083    0.947
x.x36        4.807e-01  1.742e+00   0.276    0.829
x.x37        8.239e-01  2.380e+00   0.346    0.788
x.x38       -3.876e-01  1.289e+00  -0.301    0.814
x.x39       -4.342e-01  1.161e+00  -0.374    0.772
x.x40        1.845e-01  1.284e+00   0.144    0.909
x.x41        7.603e-01  2.027e+00   0.375    0.772
x.x42       -4.295e-01  6.610e-01  -0.650    0.633
x.x43        1.193e-01  8.105e-01   0.147    0.907
x.x44        7.352e-01  1.182e+00   0.622    0.646
x.x45       -7.436e-01  8.543e-01  -0.870    0.544
x.x46        1.869e-01  1.510e+00   0.124    0.922
x.x47        1.190e+00  3.689e+00   0.323    0.801
x.x48        4.283e-01  1.272e+00   0.337    0.793
x.x49       -8.510e-02  1.054e+00  -0.081    0.949
x.x50       -1.190e-01  7.982e-01  -0.149    0.906
x.x51        6.494e-02  1.157e+00   0.056    0.964
x.x52       -8.586e-02  6.975e-01  -0.123    0.922
x.x53        9.418e-02  1.574e+00   0.060    0.962
x.x54        1.913e-01  1.587e+00   0.121    0.924
x.x55       -5.927e-01  2.145e+00  -0.276    0.828
x.x56       -2.929e-02  1.784e+00  -0.016    0.990
x.x57        6.238e-01  1.370e+00   0.455    0.728
x.x58        1.346e-01  1.345e+00   0.100    0.937
x.x59        8.996e-01  9.900e-01   0.909    0.530
x.x60        5.015e-01  2.225e+00   0.225    0.859
x.x61       -2.985e-01  7.780e-01  -0.384    0.767
x.x62        4.703e-01  1.233e+00   0.381    0.768
x.x63       -3.954e-01  1.046e+00  -0.378    0.770

Residual standard error: 3688000 on 1 degrees of freedom
Multiple R-squared:  0.9341,	Adjusted R-squared:  -3.15 
F-statistic: 0.2287 on 62 and 1 DF,  p-value: 0.9594



In [216]:

    
pt(1.8,df=62)









    




0.961636965220729



In [217]:

    
cor.test(mData$x.x1, mData$x.x23)
plot(mData$x.x1, mData$x.x23)









    





	Pearson's product-moment correlation

data:  mData$x.x1 and mData$x.x23
t = -1.1521, df = 638, p-value = 0.2497
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.12263152  0.03205036
sample estimates:
        cor 
-0.04556369



In [ ]:



In [183]:

    
qplot(x=Var1, y=Var2, data=melt(cor(mData)), fill=value, geom="tile")



In [ ]:



In [46]:

    
#library(devtools)
#install_github("jwist/rLims")
library(rLims)
account <- list(server='mylims.univalle.edu.co',user='jessica.medina@correounivalle.edu.co',key='PDfEZzKo3K')
metaData <- lims.getUserSamplesMetaData(account=account)
metaData <- lims.selectSamples(metaData,c(7:8))
Filter <- list('experiment'=c('noesygpps1dcomp'))
data <- lims.getSpectra(metaData, Filter=Filter,OP='AND',dry=FALSE)
plot(data$spectra[[2]])
#dataSet <- lims.createDataSet(data)









    



Loading required package: rjson

Attaching package: 'rLims'

The following object is masked from 'package:ggplot2':

    lims

Warning message:
In lims.getParameters(entries): NA found in parameters, please check





    



[1] "number of spectra ordered by experiment type:"

                     cpmgpr1dcomp     hmbcgplpndqf  hsqcdietgpsisp2 
               3              104                2                2 
        mlevphpr  noesygpphwgx1jw      noesygppr1d noesygpps1d.comp 
               2                2               66                2 
 noesygpps1dcomp             zg30           zgpg30 
              96                1                2 
[1] "number of spectra ordered by solvents:"

                        CDCl3 COFFEEcalctemp     COFFEEmeoh            D2O 
             2              4              5             78             11 
        H2OD2O           MeOD 
           165             17 
[1] "number of spectra ordered by nucleus:"

13C  1H   C 
  2 278   2 
[1] "number of spectra ordered by temperature:"

  300 300.1 300.2 300.3 300.5 
  246    18     9     2     1 
[1] "total number of spectra:  282"






    



Warning message:
In file(file, "rt"): unable to connect to 'mylims.univalle.edu.co' on port 80.





    



Error in file(file, "rt"): cannot open the connection






    



Error in data$spectra: object of type 'closure' is not subsettable



In [ ]:

	X1	X2	X3	X4	X5	X6	X7	species
1	54	63	17	108	51	120	88	1
2	66	105	96	2	121	32	90	1
19	127	74	114	79	45	130	113	-1
20	33	119	37	40	82	68	0	-1

	X1	X2	X3	X4	X5	X6	X7	species
1	54	63	17	108	51	120	88	1
2	66	105	96	2	121	32	90	1
19	127	74	114	79	45	130	113	-1
20	33	119	37	40	82	68	0	-1

	X1	X2	X3	X4	X5	X6	X7	species
1	54	63	17	108	51	120	88	1
2	66	105	96	2	121	32	90	1
19	127	74	114	79	45	130	113	-1
20	33	119	37	40	82	68	0	-1