Using R libraries

R can be extended with packages. Packages provide functionality beyond what the base R packages offer.

In this notebook, I will show you how to install packages and how to load them so that you can use them.

To install a package, call the following function:


In [ ]:
install.packages("ggplot2")
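Calling install.packages() every time the notebook runs downloads the package again. A minimal sketch of one way to avoid that, assuming you only want to install when the package is missing, is shown below. The helper name install_if_missing is hypothetical, not part of R; requireNamespace() and install.packages() are the real base R functions it relies on.

```r
# Hypothetical helper (not part of base R): install a package only when
# it is not already available in the current library paths.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package can be loaded now.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

install_if_missing("ggplot2")
```

If ggplot2 is already installed, the call does nothing beyond the check, so the cell is cheap to rerun.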

To use the functions provided by an installed package, load it with library():


In [7]:
library("ggplot2")

Now all the functions in the ggplot2 package are available. So, let's generate some data to play around with ggplot2. First, we will draw 50 values from each of two normal distributions, one with a mean of 10 and the other with a mean of 15, both with a standard deviation of 2. We will store the data in two objects named spp1_data and spp2_data. Then we will build a factor, spp_id, that labels each value with its species (A or B), and combine everything into a data frame.


In [1]:
spp1_data<-rnorm(50, 10, 2)

In [2]:
spp2_data<-rnorm(50, 15, 2)

In [3]:
spp_id<-rep(factor(LETTERS[1:2]),each=50, times=1)

In [4]:
spp_data<-c(spp1_data, spp2_data)

In [5]:
data_matrix<-data.frame(spp_data, spp_id)
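Because rnorm() draws random numbers, the values change every time the cells above are run. The sketch below builds the same kind of data frame in one step and fixes the random seed so the result is repeatable. The seed value 42 is an arbitrary choice of mine, and aggregate() is used only to check that the sample means and standard deviations are close to the values requested.

```r
set.seed(42)  # arbitrary seed, chosen only so the example is repeatable

n <- 50
data_matrix <- data.frame(
  spp_data = c(rnorm(n, mean = 10, sd = 2),   # species A
               rnorm(n, mean = 15, sd = 2)),  # species B
  spp_id   = factor(rep(c("A", "B"), each = n))
)

# Inspect the structure: one numeric column and one factor with two levels
str(data_matrix)

# Sample mean and standard deviation for each species
aggregate(spp_data ~ spp_id, data = data_matrix,
          FUN = function(x) c(mean = mean(x), sd = sd(x)))
```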

Now we can use the data in the data frame to plot something with ggplot2. Note that we will store the plot in an object named p.


In [8]:
p<-ggplot(data_matrix, aes(spp_id,spp_data))

In the previous line you created a plot object from data_matrix, with the column spp_id mapped to the x axis and the column spp_data mapped to the y axis. In ggplot2 you can now start modifying this plot by adding components to it.

If you print p, the axes are drawn but no data appear inside the plot's frame. For instance:


In [13]:
p


This is because ggplot2 doesn't yet know how to draw the data in these columns. You tell it by adding a "geometry" (a geom), which describes the way in which the data should be plotted. So, for instance:


In [14]:
p + geom_boxplot()
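One way to see why p alone draws an empty frame is to count the layers stored in the plot object. This is only a sketch: it assumes the plot object exposes its layers through p$layers, which holds for the ggplot2 versions I am aware of but is an internal detail that may change. The data are regenerated here so the block runs on its own.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed for a repeatable example
data_matrix <- data.frame(
  spp_data = c(rnorm(50, 10, 2), rnorm(50, 15, 2)),
  spp_id   = factor(rep(c("A", "B"), each = 50))
)

p <- ggplot(data_matrix, aes(spp_id, spp_data))

# Number of layers before and after adding a geom
# (assumes the layers are reachable as p$layers)
n_before <- length(p$layers)
n_after  <- length((p + geom_boxplot())$layers)

n_before
n_after
```

Adding a geom with + returns a new plot object; p itself is not changed unless you assign the result back to it.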


We can mix geometries. For instance, let's overlay on a boxplot the actual data points used to calculate its summary values:


In [17]:
p + geom_boxplot() + geom_jitter(width=0.1)


This produces a boxplot with the individual observations overlaid as jittered points.
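When the raw points are overlaid, any outlier is drawn twice: once by geom_boxplot() and once by geom_jitter(). A possible variation, assuming you prefer to show each observation only once, hides the boxplot's own outlier markers with outlier.shape = NA and restricts the jitter to the horizontal direction with height = 0, so the plotted y values stay exact.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed for a repeatable example
data_matrix <- data.frame(
  spp_data = c(rnorm(50, 10, 2), rnorm(50, 15, 2)),
  spp_id   = factor(rep(c("A", "B"), each = 50))
)

p_points <- ggplot(data_matrix, aes(spp_id, spp_data)) +
  geom_boxplot(outlier.shape = NA) +      # do not draw outliers in the boxplot layer
  geom_jitter(width = 0.1, height = 0)    # spread points sideways only

p_points
```

Note that geom_jitter() places the points randomly, so the horizontal positions differ between runs unless a seed is set.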

It is not necessary to store the plot in an object. You can do it like this:


In [18]:
ggplot(data_matrix, aes(spp_data)) + geom_histogram()


`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Note that there is only one variable in the aes() call now. With ggplot2 it is very easy to get a separate histogram for each of the two species in our dataset. For instance, now setting binwidth to make the histogram look better, as the message above recommends:


In [19]:
ggplot(data_matrix, aes(spp_data)) + geom_histogram(binwidth=1) + facet_grid(. ~ spp_id)
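The value binwidth=1 above was picked by eye. If you would rather derive the bin width from the data, one common option is the Freedman-Diaconis rule, which sets the width to 2 * IQR / n^(1/3). This is a technique I am swapping in for illustration, not something ggplot2 does by default, and here it is computed on the pooled values of both species for simplicity.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed for a repeatable example
data_matrix <- data.frame(
  spp_data = c(rnorm(50, 10, 2), rnorm(50, 15, 2)),
  spp_id   = factor(rep(c("A", "B"), each = 50))
)

# Freedman-Diaconis bin width, computed on the pooled data
x <- data_matrix$spp_data
fd_binwidth <- 2 * IQR(x) / length(x)^(1/3)

h <- ggplot(data_matrix, aes(spp_data)) +
  geom_histogram(binwidth = fd_binwidth) +
  facet_grid(. ~ spp_id)

h
```

Alternatively, geom_histogram() accepts a bins argument if you prefer to fix the number of bins rather than their width.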


One nice feature of ggplot2 is the ability to improve the plot by changing or adding aesthetics, themes, and labels.


In [21]:
ggplot(data_matrix, aes(spp_data)) + geom_histogram(binwidth=1, colour="red") + facet_grid(. ~ spp_id) + theme_bw() + labs(x="X label", y="Y label")
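In the cell above, colour="red" is a fixed setting applied to every bar. To have an aesthetic vary with the data, it has to be mapped inside aes(). The sketch below maps fill to spp_id so that each species gets its own fill colour. The label texts are placeholders of mine.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed for a repeatable example
data_matrix <- data.frame(
  spp_data = c(rnorm(50, 10, 2), rnorm(50, 15, 2)),
  spp_id   = factor(rep(c("A", "B"), each = 50))
)

q <- ggplot(data_matrix, aes(spp_data, fill = spp_id)) +  # fill depends on species
  geom_histogram(binwidth = 1, colour = "black") +        # fixed outline colour
  facet_grid(. ~ spp_id) +
  theme_bw() +
  labs(x = "Measurement", y = "Count", fill = "Species")  # placeholder labels

q
```

Because fill is mapped to a column, ggplot2 also adds a legend for it automatically.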