Multivariate evolution

Gabriel Marroig and Diogo Melo

14/09/2016

In this tutorial we explore the consequences of genetic covariation for multivariate evolution using a toy dataset measured in 5 related rodent species, named A to E. The dataset contains 4 quantitative traits (humerus, ulna, tibia and femur), and each species has a sample size of $N = 60$. We will use this dataset to illustrate several points related to multivariate evolution and data analysis.

As usual, we first need to install and load a few packages.



In [1]:

    
# MASS and scales are loaded below, so check that they are installed as well
list_pkgs <- c("evolqg", "ggplot2", "GGally", "MASS", "scales")
new_pkgs <- list_pkgs[!(list_pkgs %in% installed.packages()[,"Package"])]
if(length(new_pkgs) > 0){ install.packages(new_pkgs) }

library(evolqg)
library(ggplot2)
library(GGally)
library(MASS)
library(scales)









    



Loading required package: plyr

Fortunately, the dataset we are going to use is included in the evolqg package, so we can just load it into our workspace using the data function.



In [2]:

    
data(dentus)

This creates a data.frame in the workspace named dentus with all the data we need.



In [3]:

    
dentus









    





humerus ulna femur tibia species

	3.214619  8.036072 13.51779 19.01131 A        
	5.222875 10.838375 15.54714 21.01718 A        
	5.193021 11.911770 17.24719 22.43190 A        
	6.547400 11.293085 14.89040 20.17706 A        
	4.724383  9.897135 14.81104 19.59753 A        
	4.630147  9.441174 15.24754 19.25937 A        
	6.771560 11.905400 15.57164 22.06777 A        
	5.229635  9.881575 13.98772 19.05881 A        
	4.786389 10.020607 15.67659 19.62829 A        
	4.171654  8.998204 15.52306 20.06622 A        
	5.428925  9.645383 14.02324 21.07972 A        
	4.689755 10.729000 15.81852 19.60883 A        
	5.826453 10.690124 14.72827 19.57330 A        
	6.627715 10.615446 16.00286 21.46821 A        
	4.455203  9.690600 15.16681 20.01375 A        
	4.946527 11.137245 14.24756 19.52614 A        
	5.380983 10.633657 13.41578 18.87848 A        
	4.595959  8.723820 14.98695 20.80915 A        
	4.206078  8.771372 14.92444 19.57403 A        
	5.634718  9.509186 15.48804 20.39082 A        
	5.291808 10.421983 15.78412 20.54966 A        
	4.829492  9.824937 14.07292 19.55087 A        
	3.503918  9.074366 13.97675 18.90484 A        
	4.555670 10.169815 16.37251 21.33269 A        
	3.917700  8.991461 15.23725 20.41206 A        
	4.101148  8.796782 15.40484 20.73443 A        
	5.099787  9.902734 13.84876 20.25140 A        
	6.661751 11.990388 15.30423 19.88481 A        
	3.903197  9.747912 13.72621 19.37218 A        
	6.145626 10.246054 14.21860 18.78641 A        
	⋮ ⋮ ⋮ ⋮ ⋮
	5.785819 14.53255 20.62877 26.88133 E       
	7.153986 16.10437 19.66304 26.53573 E       
	7.462511 13.09950 22.07091 28.38975 E       
	7.961418 13.92199 20.43572 26.26523 E       
	6.876212 14.26987 20.77345 27.98202 E       
	8.192530 13.49930 20.49090 27.00043 E       
	9.156935 13.84123 20.64467 28.66664 E       
	7.319326 16.05932 20.88733 27.28251 E       
	6.706329 14.03159 21.65913 28.69696 E       
	6.400634 14.69913 20.26323 28.91090 E       
	7.221770 13.05084 20.42197 27.56913 E       
	7.107433 15.13142 21.18512 28.13049 E       
	6.012521 13.12610 18.66419 26.92798 E       
	6.713681 14.33916 22.46279 27.50479 E       
	7.563836 14.95400 20.13634 28.04306 E       
	7.099208 13.46466 19.84551 25.82650 E       
	6.765223 14.71544 20.62701 28.76826 E       
	7.391892 14.18998 20.26107 26.43137 E       
	7.337556 13.26781 19.33766 26.36457 E       
	5.844740 14.11085 20.51828 26.41921 E       
	6.807692 15.13887 20.07011 26.00979 E       
	9.015036 12.34000 21.57812 28.13771 E       
	5.594462 14.98144 21.27598 27.25072 E       
	7.259060 14.51132 21.94688 28.78272 E       
	7.930196 15.25946 21.86969 28.05573 E       
	8.695040 15.52299 20.88409 29.06015 E       
	7.342726 13.83276 21.42994 28.26561 E       
	7.911118 14.29955 20.21398 27.89367 E       
	7.450212 15.39719 19.91358 28.18544 E       
	7.871188 14.42503 21.13337 27.91589 E

First, let's visualize the data by making simple scatterplots of every pair of traits in the dataset, with a different color for each species. There should be a total of 6 graphs (Trait 1 x 2, 1 x 3, and so on). They can also be arranged as a scatterplot matrix:



In [4]:

    
ggpairs(dentus,  mapping = aes(color = species), columns = c("humerus", "ulna","femur", "tibia", "species"))









    



`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

There is a lot of information in these graphs. What can you see in terms of the association between traits within each species? Are the species similar? Are there differences? The graphs also give you information about the size differences between species.

Principal components and discriminant analysis

Let us now examine how the variation within groups (species) and between groups is distributed. One common method to summarize the variation in several traits using a smaller number of variables is principal component analysis (PCA). Usually, a system with 30-40 traits can be represented by a much smaller number of principal components, say 2 to 7. These principal components are composite traits built from all the original traits but, unlike the original traits, they are uncorrelated with each other.

If you know some linear algebra, the principal components are the eigenvectors of the covariance matrix between the traits. The eigenvalues are the variances in the dataset in the direction of the corresponding eigenvectors. If you don't know linear algebra, here is a figure to illustrate a principal component in two dimensions.

Also, take a look at this great tutorial on PCA
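
The link between eigenvectors, eigenvalues and variances can also be checked numerically. The following is a minimal sketch using two simulated traits rather than the dentus data; the names toy, toy_cov, toy_eigen and toy_scores are made up for this illustration.

```r
# Toy example: two correlated traits simulated with base R (not the dentus data)
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)
toy <- cbind(x1, x2)

# Eigen decomposition of the covariance matrix
toy_cov   <- cov(toy)
toy_eigen <- eigen(toy_cov)

# Project the centered data onto the eigenvectors to get the PC scores
toy_centered <- scale(toy, center = TRUE, scale = FALSE)
toy_scores   <- toy_centered %*% toy_eigen$vectors

# The variance of each score column equals the corresponding eigenvalue
apply(toy_scores, 2, var)
toy_eigen$values

# The scores are uncorrelated: the off-diagonal covariance is numerically zero
round(cov(toy_scores), 10)
```

The first column of toy_eigen$vectors points along the long axis of the cloud of points, and the first eigenvalue is the variance along that axis.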

Let's extract the first 2 principal components from this dataset and then plot the two new variables and examine the distribution of points and species on it. What can you see in this graph in regard to the distribution of variation within and between groups?



In [5]:

    
# The PCs are the eigenvectors of the covariance matrix, so let's calculate the covariance matrix for the full dataset
fullcov = cov(dentus[,1:4])
fullcov









    





humerus ulna femur tibia

	humerus  5.246708  9.336696  7.851466 10.44042 
	ulna  9.336696 19.102111 16.135152 21.28146 
	femur  7.851466 16.135152 20.106820 25.94645 
	tibia 10.440416 21.281457 25.946449 34.96317



In [6]:

    
#Now we take the eigen decomposition:
eigen_fullcov = eigen(fullcov)
eigen_fullcov$values









    





	72.626035563574
	5.70583545013819
	0.59169798120015
	0.495240904446538



In [7]:

    
# The columns of this matrix are the PCs
eigen_fullcov$vectors









    






	-0.2302141  0.4138996  0.6109147 -0.6344067
	-0.4663372  0.7473891 -0.3214898  0.3472514
	-0.5145071 -0.3053580 -0.5708807 -0.5622580
	-0.6817723 -0.4205392  0.4444353  0.4010120



In [8]:

    
# We can use these PCs to project the original dataset and get the scores for each individual in the PCs
dentus_fullPCscores = as.matrix(dentus[,1:4]) %*% eigen_fullcov$vectors

# These must be uncorrelated, so all correlation should be zero
round(cor(dentus_fullPCscores), 5)



In [9]:

    
# Now the plot of the first two principal components
ggplot(data.frame(PC1 = dentus_fullPCscores[,1], PC2 = dentus_fullPCscores[,2], species = dentus$species), 
       aes(-PC1, PC2, color = species)) + geom_point() +
 labs(x = paste("PC1 (", percent(eigen_fullcov$values[1]/sum(eigen_fullcov$values)), ")", sep=""),
       y = paste("PC2 (", percent(eigen_fullcov$values[2]/sum(eigen_fullcov$values)), ")", sep=""))

Now run a discriminant analysis with the 5 species as the grouping variable and the 4 quantitative traits as predictor variables, and plot the first discriminant function against the second one. Notice how both methods rotate the axes to find a solution to the problem. In the principal components you maximize the variance captured by each new variable, while in the discriminant functions (DF) you maximize the differences between groups while controlling for the variation within each group.
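
The two criteria can be written side by side. This is a sketch in notation that is not used elsewhere in this tutorial: $S$ is assumed to be the covariance matrix of the full dataset, $W$ the pooled within-species covariance matrix, $B$ the between-species covariance matrix, and $a$ a vector of trait weights.

```latex
% First principal component: direction of maximum total variance
\mathrm{PC1} = \arg\max_{a} \; a^{T} S a
\quad \text{subject to} \quad a^{T} a = 1

% First discriminant function: direction that maximizes
% between-group variance relative to within-group variance
\mathrm{LD1} = \arg\max_{a} \; \frac{a^{T} B a}{a^{T} W a}
```

The denominator in the second expression is what "controlling for the variation within each group" means: a direction only scores well if the groups are far apart relative to their own spread along it.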



In [10]:

    
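# Fit the LDA with equal priors; the name lda_fit avoids masking the lda function from MASS
lda_fit <- lda(species ~ ., 
               dentus, 
               prior = c(1,1,1,1,1)/5)

# Proportion of the between-group variance captured by each discriminant function
prop.lda = lda_fit$svd^2/sum(lda_fit$svd^2)

plda <- predict(object = lda_fit,
                newdata = dentus)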

dataset = data.frame(species = dentus[,"species"], lda = plda$x)

ggplot(dataset) + geom_point(aes(lda.LD1, lda.LD2, colour = species), size = 2.5) + 
  labs(x = paste("LD1 (", percent(prop.lda[1]), ")", sep=""),
       y = paste("LD2 (", percent(prop.lda[2]), ")", sep=""))

Response to selection

Now let’s turn our attention to the matrix of each species and how traits are associated within each one of them.

Compute the correlation matrix (all 6 pairwise correlations) for each species. For our discussion here, this matrix is very similar to the variance/covariance matrix, so for the sake of comparison we will use the correlations (all traits have very similar variances, except for one trait in one species).
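
To see how the two kinds of matrix relate, here is a minimal sketch with a made-up covariance matrix for two hypothetical traits (pair_cov and its values are illustrative, not taken from dentus). A correlation is a covariance divided by the product of the two standard deviations.

```r
# Toy covariance matrix for two hypothetical traits (values are illustrative)
pair_cov <- matrix(c(4.0, 1.2,
                     1.2, 1.0), nrow = 2, byrow = TRUE,
                   dimnames = list(c("t1", "t2"), c("t1", "t2")))

# Divide each covariance by the product of the two standard deviations
pair_sd  <- sqrt(diag(pair_cov))
pair_cor <- pair_cov / outer(pair_sd, pair_sd)
pair_cor

# Base R's cov2cor() does the same rescaling
all.equal(pair_cor, cov2cor(pair_cov))
```

When all traits have similar variances, this rescaling changes every entry by roughly the same factor, which is why the correlation and covariance matrices show similar patterns.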



In [11]:

    
# We can calculate the within-species matrices by subsetting the original data.frame for each species:
corA = cor(dentus[dentus$species == "A", 1:4])
corB = cor(dentus[dentus$species == "B", 1:4])
corC = cor(dentus[dentus$species == "C", 1:4])
corD = cor(dentus[dentus$species == "D", 1:4])
corE = cor(dentus[dentus$species == "E", 1:4])

#now let's group them in a list
cor_mats = list(A = corA, B = corB, C = corC, D = corD, E = corE)
cor_mats









    





	$A
		
humerus ulna femur tibia

	humerus 1.0000000 0.8042349 0.2332436 0.2880717
	ulna 0.8042349 1.0000000 0.3038286 0.3321251
	femur 0.2332436 0.3038286 1.0000000 0.7690596
	tibia 0.2880717 0.3321251 0.7690596 1.0000000



	$B
		
humerus ulna femur tibia

	humerus 1.0000000 0.43328569 0.17293372 0.1840682 
	ulna 0.4332857 1.00000000 0.03990535 0.1373924 
	femur 0.1729337 0.03990535 1.00000000 0.6896008 
	tibia 0.1840682 0.13739244 0.68960082 1.0000000 



	$C
		
humerus ulna femur tibia

	humerus 1.00000000 0.78877614 0.01499336 0.3469841 
	ulna 0.78877614 1.00000000 0.08068266 0.3640083 
	femur 0.01499336 0.08068266 1.00000000 0.1662252 
	tibia 0.34698412 0.36400826 0.16622521 1.0000000 



	$D
		
humerus ulna femur tibia

	humerus 1.0000000 0.7789553 0.5164091 0.5862964
	ulna 0.7789553 1.0000000 0.7241503 0.6674222
	femur 0.5164091 0.7241503 1.0000000 0.5806471
	tibia 0.5862964 0.6674222 0.5806471 1.0000000



	$E
		
humerus ulna femur tibia

	humerus  1.00000000 -0.12227727 0.02229512 0.3655136  
	ulna -0.12227727  1.00000000 0.01355941 0.1594792  
	femur  0.02229512  0.01355941 1.00000000 0.2598259  
	tibia  0.36551364  0.15947920 0.25982593 1.0000000

Do species have the same correlation structure (patterns and magnitudes of correlation)? How would that structure affect the evolutionary potential of each species? To answer these questions we will simulate 1000 random selection vectors and multiply each species matrix by these vectors. Doing this, we obtain 1000 response vectors produced by this simulated selection.
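
Multiplying a matrix by a selection vector is the multivariate breeder's equation, $\Delta z = G \beta$, with the correlation matrix standing in for $G$ in this tutorial. A minimal sketch with a made-up two-trait matrix (toy_mat, toy_beta and toy_dz are hypothetical names and values) shows how correlation deflects the response away from the direction of selection.

```r
# Hypothetical 2-trait matrix with a strong positive correlation (illustrative values)
toy_mat <- matrix(c(1.0, 0.8,
                    0.8, 1.0), nrow = 2, byrow = TRUE)

# Selection acts only on the first trait
toy_beta <- c(1, 0)

# Response to selection: matrix times selection vector
toy_dz <- as.vector(toy_mat %*% toy_beta)
toy_dz

# Cosine of the angle between selection and response (1 means perfectly aligned)
sum(toy_beta * toy_dz) / (sqrt(sum(toy_beta^2)) * sqrt(sum(toy_dz^2)))
```

The second trait responds even though it is not under selection, so the response vector is no longer aligned with the selection vector.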



In [12]:

    
random_betas = matrix(rnorm(1000*4), 4, 1000)

# These are the simulated responses
response_list = lapply(cor_mats, function(x) x %*% random_betas)

We can now relate the structure of the correlation matrix to those responses by calculating a series of statistics:

Correlation between selection vector x response vector
Correlation between response vector x PC1 (first principal component of each species)
Correlation between response vector x PC2 (second principal component of each species)



In [13]:

    
# First we need a vector correlation function:
corVector = function(x, y) x %*% y / (Norm(x) * Norm(y))

# And a function that calculates the vector correlation between the columns of two matrices:
corColumns = function(x, y){
    n = ncol(x)
    correlations = numeric(n)
    for(i in 1:n)
        correlations[i] = corVector(x[,i], y[,i])
    return(correlations)
}
    
# And a function that calculates the vector correlation between the columns of a matrix and a vector:
corColumnVector = function(x, vector){
    n = ncol(x)
    correlations = numeric(n)
    for(i in 1:n)
        correlations[i] = corVector(x[,i], vector)
    return(correlations)
}
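
The vector correlation used here is the cosine of the angle between two vectors. As an optional alternative to the loops, the column-by-column version can be sketched without explicit iteration. This assumes Norm is the Euclidean norm; col_cosines is a hypothetical helper written for this illustration and is not part of evolqg.

```r
# Vector correlation (cosine) for every pair of matching columns at once.
# col_cosines is a hypothetical helper, not part of evolqg.
col_cosines <- function(x, y) {
  colSums(x * y) / (sqrt(colSums(x^2)) * sqrt(colSums(y^2)))
}

# Toy check with two small random matrices
set.seed(1)
m1 <- matrix(rnorm(4 * 5), 4, 5)
m2 <- matrix(rnorm(4 * 5), 4, 5)
col_cosines(m1, m2)

# A column compared with itself always gives 1
col_cosines(m1, m1)
```

Values near 1 or -1 mean the two vectors point along the same line, and values near 0 mean they are close to orthogonal.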



In [14]:

    
# Correlation between selection vector (beta) x response vector (delta z)
cor_beta_dz = lapply(response_list, corColumns, random_betas)

# Correlation between response vector x PC1 (first principal component of each species)
species = list("A", "B", "C", "D", "E")
cor_dz_PC1 = lapply(species, function(sp) abs(corColumnVector(response_list[[sp]], eigen(cor_mats[[sp]])$vector[,1])))
names(cor_dz_PC1) = species
    
# Correlation between response vector x PC2 (second principal component of each species)
cor_dz_PC2 = lapply(species, function(sp) abs(corColumnVector(response_list[[sp]], eigen(cor_mats[[sp]])$vector[,2])))
names(cor_dz_PC2) = species
    
# We can join all these correlations in a data.frame
corelation_df = NULL
for(sp in species){
    corelation_df = rbind(corelation_df, 
                      data.frame(beta_dz = cor_beta_dz[[sp]], 
                                 dz_PC1 = cor_dz_PC1[[sp]], 
                                 dz_PC2 = cor_dz_PC2[[sp]], 
                                 species = sp))
}

Now make a graph for each species with the correlation between PC1 x response vector on the x-axis and response vector x PC2 on the y-axis. Do you see any differences in the potential responses of each species? Which species is most “constrained”, or limited in its potential responses? Does that make sense when you look at the association between the 4 traits?



In [15]:

    
ggplot(corelation_df, aes(dz_PC1, dz_PC2)) + geom_point(alpha = 0.1) + facet_wrap(~species, ncol = 2)

We can also look at the distribution of the alignment between the response and the selection vectors for each species:



In [16]:

    
ggplot(corelation_df, aes(species, beta_dz)) + geom_violin() + geom_jitter(alpha = 0.1)
