Introduction to R

Remember to set your kernel to R (SageMath).

Load the same dataset from previous exercises (genes.table) into a variable called "genes":
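
As a rough sketch of how loading might look, the example below writes a tiny made-up table to a temporary file and reads it back with read.table. The file contents, the tab separator, and the header row are assumptions for illustration only; the real genes.table may be laid out differently.

```r
# Hypothetical stand-in for genes.table: a small tab-separated file with a header row.
demo_file <- tempfile(fileext = ".table")
demo <- data.frame(
  geneA = c(1.2, 2.4, 3.1, 4.8),
  geneC = c(2.0, 4.1, 6.3, 9.7)
)
write.table(demo, demo_file, sep = "\t", quote = FALSE, row.names = FALSE)

# header = TRUE treats the first line as column names; sep = "\t" assumes tabs.
genes <- read.table(demo_file, header = TRUE, sep = "\t")

str(genes)   # column names and types
head(genes)  # first few rows
```

For the real file, a call along the lines of read.table("genes.table", header = TRUE) is a plausible starting point, with header and sep adjusted to match the file.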


In [ ]:

Linear Regression

Are the expression values for geneA and geneC correlated across samples? Plot the data with the "plot" function (put geneA on the X axis and geneC on the Y axis):
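
A minimal sketch of such a scatter plot, using synthetic data in place of the course dataset. The simulated values and the axis labels are assumptions made only so the example runs on its own.

```r
set.seed(1)
# Synthetic stand-in for the genes data frame (not the course data).
genes <- data.frame(geneA = rnorm(50, mean = 10, sd = 2))
genes$geneC <- 1.5 * genes$geneA + rnorm(50, sd = 1)

# The first argument goes on the X axis, the second on the Y axis.
plot(genes$geneA, genes$geneC,
     xlab = "geneA expression", ylab = "geneC expression",
     main = "geneC vs. geneA")

# Pearson correlation coefficient as a quick numeric check.
r <- cor(genes$geneA, genes$geneC)
print(r)
```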


In [ ]:

Do a simple linear regression using geneC as the response variable and geneA as the explanatory variable.
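
One way this could look, sketched on synthetic data rather than the course dataset. In an R model formula the response goes to the left of "~" and the explanatory variable to the right.

```r
set.seed(1)
# Synthetic stand-in for the genes data frame (not the course data).
genes <- data.frame(geneA = rnorm(50, mean = 10, sd = 2))
genes$geneC <- 1.5 * genes$geneA + rnorm(50, sd = 1)

# Formula syntax: response ~ explanatory
fit <- lm(geneC ~ geneA, data = genes)

# Estimated intercept and slope.
print(coef(fit))
```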


In [ ]:

Plot the regression result on the scatter plot using "abline":
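
A small sketch of overlaying a fitted line, again on synthetic data. abline draws onto the most recent plot, so the scatter plot has to be drawn first; the color and line width are arbitrary choices.

```r
set.seed(1)
# Synthetic stand-in for the genes data frame (not the course data).
genes <- data.frame(geneA = rnorm(50, mean = 10, sd = 2))
genes$geneC <- 1.5 * genes$geneA + rnorm(50, sd = 1)

fit <- lm(geneC ~ geneA, data = genes)

# Draw the scatter plot first, then add the fitted line on top of it.
plot(genes$geneA, genes$geneC, xlab = "geneA", ylab = "geneC")
abline(fit, col = "red", lwd = 2)
```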


In [ ]:

Use the "summary" function on the linear regression result to see if there is a significant correlation between geneA and geneC:


In [ ]:

What is the p-value for the regression? What is the R-squared value? Is there a correlation between geneA and geneC?
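
The printed summary contains both numbers, and they can also be pulled out programmatically. The sketch below uses synthetic data; the variable names slope_p, model_p and r_squared are made up for this example.

```r
set.seed(1)
# Synthetic stand-in for the genes data frame (not the course data).
genes <- data.frame(geneA = rnorm(50, mean = 10, sd = 2))
genes$geneC <- 1.5 * genes$geneA + rnorm(50, sd = 1)

fit <- lm(geneC ~ geneA, data = genes)
s <- summary(fit)
print(s)

# p-value of the t-test on the geneA slope.
slope_p <- s$coefficients["geneA", "Pr(>|t|)"]

# p-value of the overall F-test; with a single explanatory variable
# it matches the slope p-value.
f <- s$fstatistic
model_p <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# Fraction of the variance in geneC explained by the model.
r_squared <- s$r.squared

print(c(slope_p = slope_p, model_p = model_p, r_squared = r_squared))
```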


In [ ]:

Plotting With R

Use "boxplot" to plot the expression levels of geneA:
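
As an illustration on synthetic, deliberately skewed data (not the course dataset), boxplot also returns the numbers it draws. Comparing the stored values with mean() and median() is one way to check what the center line represents.

```r
set.seed(2)
# Skewed synthetic values, so the mean and the median differ noticeably.
geneA <- rexp(100, rate = 0.2)

b <- boxplot(geneA, ylab = "geneA expression", main = "geneA")

# b$stats holds the five values drawn for the box: lower whisker,
# lower hinge, center line, upper hinge, upper whisker.
print(b$stats)
print(c(mean = mean(geneA), median = median(geneA)))
```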


In [ ]:

Do boxplots show the mean or the median of geneA?

Now let’s plot the distributions of gene expression values with histograms. Plot the histogram for each gene (geneA, geneB, geneC, geneD). You can plot them in four separate plots.
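
A possible sketch using synthetic data with four made-up distributions. Here par(mfrow = c(2, 2)) is used to place the four histograms in one 2-by-2 grid, which is an optional alternative to four separate plots.

```r
set.seed(3)
# Synthetic stand-in for the genes data frame (not the course data).
genes <- data.frame(
  geneA = rnorm(200, mean = 10, sd = 2),
  geneB = rexp(200, rate = 0.5),
  geneC = runif(200, min = 0, max = 20),
  geneD = rnorm(200, mean = 5, sd = 1)
)

# Arrange the next four plots in a 2 x 2 grid and remember the old settings.
old_par <- par(mfrow = c(2, 2))
for (g in names(genes)) {
  hist(genes[[g]], main = g, xlab = "expression", breaks = 20)
}
# Restore the previous layout.
par(old_par)
```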


In [ ]:

From the above histograms, which genes have approximately normally distributed expression values?

Homework (10 points)

Load the dataset "single_cell_rnaseq_hw1.txt" into a variable named "scdata":


In [ ]:

Attach the dataset so that you can refer to the columns by column names.
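
A brief sketch of what attaching does, using a tiny made-up data frame in place of the real scdata. After attach, column names resolve directly; detach undoes this, and with() is shown as an alternative that needs no attach.

```r
# Tiny hypothetical stand-in for scdata (values are made up).
scdata <- data.frame(Sub1 = c(5.1, 3.2, 4.8), Scg2 = c(0.4, 2.2, 1.0))

attach(scdata)
# Columns can now be referenced by name, without the scdata$ prefix.
sub1_mean <- mean(Sub1)
print(sub1_mean)

# Remove the data frame from the search path when done.
detach(scdata)

# Alternative that avoids attach altogether.
print(with(scdata, mean(Sub1)))
```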


In [ ]:

Plot the expression levels of "Sub1" and "Scg2" using a scatter plot (put "Sub1" on the Y axis and "Scg2" on the X axis).


In [ ]:

Do a simple linear regression on "Sub1" and "Scg2" (Sub1 as the response variable and Scg2 as the explanatory variable).


In [ ]:

Plot the regression line on the previous scatter plot.


In [ ]:

Check the regression results. What is the p-value for the regression?


In [ ]:

For the same regression results, what is the R-squared value?


In [ ]:

Given these regression results, would you say there is an interesting correlation between Sub1 and Scg2? Why or why not?


In [ ]:

Use a boxplot to plot the expression levels of Sub1 and Scg2.
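
For side-by-side boxes, several vectors can be passed to boxplot in one call. The sketch below uses synthetic values rather than the homework data, and the names argument is used only to label the two boxes.

```r
set.seed(4)
# Synthetic stand-in for scdata (not the homework data).
scdata <- data.frame(
  Sub1 = rnorm(80, mean = 6, sd = 1),
  Scg2 = rexp(80, rate = 1)
)

# One box per vector, drawn on a shared Y axis.
b <- boxplot(scdata$Sub1, scdata$Scg2,
             names = c("Sub1", "Scg2"), ylab = "expression")

# Third row of b$stats: the center line of each box.
print(b$stats[3, ])
```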


In [ ]:

Which gene has the larger median?


In [ ]:

Plot the distribution of gene Sub1 with a histogram.


In [ ]:

Plot the distribution of gene Scg2 with a histogram.


In [ ]:

Do the expression values of Sub1 and Scg2 look normally distributed?


In [ ]: