In this tutorial we explore the consequences of genetic covariation for multivariate evolution using a toy dataset measured in 5 related species rodent species, named A to E. In the dataset we have 4 quantitative trais (Humer, Ulna, Tibia and Femur). Each species have a sample size of $N = 60$. We will use this dataset to illustrate several points related to multivariate evolution and data analyses.

As usual, first we need to install a few packages

```
In [1]:
```list_pkgs <- c("evolqg", "ggplot2", "GGally")
new_pkgs <- list_pkgs[!(list_pkgs %in% installed.packages()[,"Package"])]
if(length(new_pkgs) > 0){ install.packages(new_pkgs) }
library(evolqg)
library(ggplot2)
library(GGally)
library(MASS)
library(scales)

```
```

```
In [2]:
```data(dentus)

This creates a data.frame in the workspace names dentus with all the data we need.

```
In [3]:
``````
dentus
```

```
```

```
In [4]:
```ggpairs(dentus, mapping = aes(color = species), columns = c("humerus", "ulna","femur", "tibia", "species"))

```
```

Let us examine now how the variation within-groups (species) and between-groups is distributed. One common method to summarize or transform variation in several traits into fewer number of variables is principal component analysis (PCA). Usually, a system with 30-40 traits can be represented by a much smaller number of principal components, say 2 to 7. These principal components are composite traits using all the original traits, but unlike the original traits, they are uncorrelated to each other.

If you know some linear algebra, the pincipal components are the eigenvectors of the covariance matrix between the traits. The eigenvalues are the variances in the dataset in the direction of the corresponding eigenvector. If you don't know linear algebra, here is a figure to ilustrate a principal componente in two dimensions.

Also, take a look at this great tutorial on PCA

Let's extract the first 2 principal components from this dataset and then plot the two new variables and examine the distribution of points and species on it. What can you see in this graph in regard to the distribution of variation within and between groups?

```
In [5]:
```# PCA is the eigenvalues of the covariance matrix, so let's calcualte the covariance matrix for the full dataset
fullcov = cov(dentus[,1:4])
fullcov

```
```

```
In [6]:
```#Now we take the eigen decomposition:
eigen_fullcov = eigen(fullcov)
eigen_fullcov$values

```
```

```
In [7]:
```# These coluns are all the PCs
eigen_fullcov$vectors

```
```

```
In [8]:
```# We can use these PCs to project the original dataset and get the scores for each individual in the PCs
dentus_fullPCscores = as.matrix(dentus[,1:4]) %*% eigen_fullcov$vectors
# These must be uncorrelated, so all correlation should be zero
round(cor(dentus_fullPCscores), 5)

```
```

```
In [9]:
```# Now the plot of the first two principal components
ggplot(data.frame(PC1 = dentus_fullPCscores[,1], PC2 = dentus_fullPCscores[,2], species = dentus$species),
aes(-PC1, PC2, color = species)) + geom_point() +
labs(x = paste("PC1 (", percent(eigen_fullcov$values[1]/sum(eigen_fullcov$values)), ")", sep=""),
y = paste("PC2 (", percent(eigen_fullcov$values[2]/sum(eigen_fullcov$values)), ")", sep=""))

```
```

```
In [10]:
```lda <- lda(species ~ .,
dentus,
prior = c(1,1,1,1,1)/5)
prop.lda = lda$svd^2/sum(lda$svd^2)
plda <- predict(object = lda,
newdata = dentus)
dataset = data.frame(species = dentus[,"species"], lda = plda$x)
ggplot(dataset) + geom_point(aes(lda.LD1, lda.LD2, colour = species), size = 2.5) +
labs(x = paste("LD1 (", percent(prop.lda[1]), ")", sep=""),
y = paste("LD2 (", percent(prop.lda[2]), ")", sep=""))

```
```

Now let’s turn our attention to the species Matrices and how traits are associated within each one of them.

Compute the correlation matrix (all 6 pairs of correlation) for each species. For our discussion here this matrix is very similar to the variance/covariance matrix so for sake of comparison we will use the correlation (all traits have very similar variance except for one in one species).

```
In [11]:
```# We can calculate the within species matrix by subsettting the original data.frame for each matrix:
corA = cor(dentus[dentus$species == "A", 1:4])
corB = cor(dentus[dentus$species == "B", 1:4])
corC = cor(dentus[dentus$species == "C", 1:4])
corD = cor(dentus[dentus$species == "D", 1:4])
corE = cor(dentus[dentus$species == "E", 1:4])
#now let's group them in a list
cor_mats = list(A = corA, B = corB, C = corC, D = corD, E = corE)
cor_mats

```
```

```
In [12]:
```random_betas = matrix(rnorm(1000*4), 4, 1000)
# These are the simulated responses
response_list = lapply(cor_mats, function(x) x %*% random_betas)

We can now relate the structure of the correlation matrix with those responses calculating a series of statistics:

- Correlation between selection vector x response vector
- Correlation between response vector x PC1 (first principal component of each species)
- Correlation between response vector x PC2 (second principal component of each species)

```
In [13]:
```# First we need a vector correlation funcion:
corVector = function(x, y) x %*% y / (Norm(x) * Norm(y))
# And a function that calculates the vector correlation between the columns of two matrices:
corColumns = function(x, y){
n = ncol(x)
correlations = numeric(n)
for(i in 1:n)
correlations[i] = corVector(x[,i], y[,i])
return(correlations)
}
# And a function that calculates the vector correlation between the columns of a matrix and a vector:
corColumnVector = function(x, vector){
n = ncol(x)
correlations = numeric(n)
for(i in 1:n)
correlations[i] = corVector(x[,i], vector)
return(correlations)
}

```
In [14]:
```# Correlation between selection vector ($\beta$) x response vector $\delta z$
cor_beta_dz = lapply(response_list, corColumns, random_betas)
# Correlation between response vector x PC1 (first principal component of each species)
species = list("A", "B", "C", "D", "E")
cor_dz_PC1 = lapply(species, function(sp) abs(corColumnVector(response_list[[sp]], eigen(cor_mats[[sp]])$vector[,1])))
names(cor_dz_PC1) = species
# Correlation between response vector x PC1 (first principal component of each species)
cor_dz_PC2 = lapply(species, function(sp) abs(corColumnVector(response_list[[sp]], eigen(cor_mats[[sp]])$vector[,2])))
names(cor_dz_PC2) = species
# We can joint all these correlation in a data.frame
corelation_df = NULL
for(sp in species){
corelation_df = rbind(corelation_df,
data.frame(beta_dz = cor_beta_dz[[sp]],
dz_PC1 = cor_dz_PC1[[sp]],
dz_PC2 = cor_dz_PC2[[sp]],
species = sp))
}

```
In [15]:
```ggplot(corelation_df, aes(dz_PC1, dz_PC2)) + geom_point(alpha = 0.1) + facet_wrap(~species, ncol = 2)

```
```

```
In [16]:
```ggplot(corelation_df, aes(species, beta_dz)) + geom_violin() + geom_jitter(alpha = 0.1)

```
```