A Demo of R's ggplot2 in the Jupyter Notebook

Basic Scatterplot

First, we will generate a basic scatterplot of two variables: X, drawn from a standard normal distribution, and Y, equal to X plus normally distributed noise.


In [3]:
x <- rnorm(50)
y <- x + rnorm(50, mean=0, sd=0.5)

In [4]:
data <- as.data.frame(cbind(x, y))
summary(data)


Out[4]:
       x                  y           
 Min.   :-1.90735   Min.   :-2.55561  
 1st Qu.:-0.65233   1st Qu.:-0.82265  
 Median :-0.09495   Median :-0.04525  
 Mean   :-0.05587   Mean   :-0.06356  
 3rd Qu.: 0.64674   3rd Qu.: 0.72672  
 Max.   : 2.10736   Max.   : 1.72585  
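
Because rnorm() draws fresh random numbers on every run, the summary above will differ from session to session. The following is a minimal sketch of a repeatable version; the seed value 42 is an arbitrary assumption and is not part of the original notebook, and data.frame() is used here in place of as.data.frame(cbind(...)) purely as a shorter equivalent.

```r
# Assumption: the seed value is arbitrary; any fixed integer gives repeatable draws.
set.seed(42)

x <- rnorm(50)                          # 50 draws from a standard normal
y <- x + rnorm(50, mean = 0, sd = 0.5)  # x plus independent normal noise
data <- data.frame(x, y)

# y is built from x, so the two columns should be strongly positively correlated.
cor(data$x, data$y)
```

With a fixed seed, rerunning the cell reproduces the same 50 points, which makes the plot and summary easier to compare across sessions.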

In [5]:
library(ggplot2)

In [6]:
ggplot(data, aes(x=x, y=y)) +
  geom_point(size=2) +
  ggtitle("Scatterplot of X and Y") + 
  theme(axis.text=element_text(size=12), 
        axis.title = element_text(size=14),
        plot.title = element_text(size=20, face="bold"))


Demo on Iris dataset


In [1]:
library(dplyr)


Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


In [2]:
iris


Warning message:
In `[<-.factor`(`*tmp*`, ri, value = "⋮"): invalid factor level, NA generated
Out[2]:
    Sepal.Length Sepal.Width Petal.Length Petal.Width  Species
  1          5.1         3.5          1.4         0.2  setosa
  2          4.9           3          1.4         0.2  setosa
  3          4.7         3.2          1.3         0.2  setosa
  4          4.6         3.1          1.5         0.2  setosa
  5            5         3.6          1.4         0.2  setosa
  6          5.4         3.9          1.7         0.4  setosa
  7          4.6         3.4          1.4         0.3  setosa
  8            5         3.4          1.5         0.2  setosa
  9          4.4         2.9          1.4         0.2  setosa
 10          4.9         3.1          1.5         0.1  setosa
 11          5.4         3.7          1.5         0.2  setosa
 12          4.8         3.4          1.6         0.2  setosa
 13          4.8           3          1.4         0.1  setosa
 14          4.3           3          1.1         0.1  setosa
 15          5.8           4          1.2         0.2  setosa
 16          5.7         4.4          1.5         0.4  setosa
 17          5.4         3.9          1.3         0.4  setosa
 18          5.1         3.5          1.4         0.3  setosa
 19          5.7         3.8          1.7         0.3  setosa
 20          5.1         3.8          1.5         0.3  setosa
 21          5.4         3.4          1.7         0.2  setosa
 22          5.1         3.7          1.5         0.4  setosa
 23          4.6         3.6            1         0.2  setosa
 24          5.1         3.3          1.7         0.5  setosa
 25          4.8         3.4          1.9         0.2  setosa
 26            5           3          1.6         0.2  setosa
 27            5         3.4          1.6         0.4  setosa
 28          5.2         3.5          1.5         0.2  setosa
 29          5.2         3.4          1.4         0.2  setosa
 30          4.7         3.2          1.6         0.2  setosa
  ⋮  (rows 31 to 120 are not shown in the truncated display)
121          6.9         3.2          5.7         2.3  virginica
122          5.6         2.8          4.9           2  virginica
123          7.7         2.8          6.7           2  virginica
124          6.3         2.7          4.9         1.8  virginica
125          6.7         3.3          5.7         2.1  virginica
126          7.2         3.2            6         1.8  virginica
127          6.2         2.8          4.8         1.8  virginica
128          6.1           3          4.9         1.8  virginica
129          6.4         2.8          5.6         2.1  virginica
130          7.2           3          5.8         1.6  virginica
131          7.4         2.8          6.1         1.9  virginica
132          7.9         3.8          6.4           2  virginica
133          6.4         2.8          5.6         2.2  virginica
134          6.3         2.8          5.1         1.5  virginica
135          6.1         2.6          5.6         1.4  virginica
136          7.7           3          6.1         2.3  virginica
137          6.3         3.4          5.6         2.4  virginica
138          6.4         3.1          5.5         1.8  virginica
139            6           3          4.8         1.8  virginica
140          6.9         3.1          5.4         2.1  virginica
141          6.7         3.1          5.6         2.4  virginica
142          6.9         3.1          5.1         2.3  virginica
143          5.8         2.7          5.1         1.9  virginica
144          6.8         3.2          5.9         2.3  virginica
145          6.7         3.3          5.7         2.5  virginica
146          6.7           3          5.2         2.3  virginica
147          6.3         2.5            5         1.9  virginica
148          6.5           3          5.2           2  virginica
149          6.2         3.4          5.4         2.3  virginica
150          5.9           3          5.1         1.8  virginica
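
Printing the whole data frame is what produces the long, truncated display above, and the factor warning appears to come from the notebook front end inserting a "⋮" placeholder row into the Species column. As a minimal sketch, the dataset can be inspected more compactly with base R functions only, which avoids printing all 150 rows:

```r
# iris ships with base R (the datasets package), so no extra library is needed.
head(iris, 5)        # first five rows only
str(iris)            # column types plus a short preview of each column
table(iris$Species)  # number of observations per species
```

This is one possible way to get an overview; dim(iris) or summary(iris) would serve equally well.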

In [1]:
library(ggplot2)

In [4]:
ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point(size=3)
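
The notebook loads dplyr but does not go on to use it. As a hedged sketch of how it could complement the scatterplot above, the per-species averages of the two plotted variables can be tabulated as follows; the object name species_means and the output column names are illustrative assumptions, not taken from the original notebook.

```r
library(dplyr)

# Mean sepal dimensions per species, i.e. the centre of each coloured cluster.
# Assumption: the names n, sepal_length_mean and sepal_width_mean are illustrative.
species_means <- iris %>%
  group_by(Species) %>%
  summarise(
    n = n(),
    sepal_length_mean = mean(Sepal.Length),
    sepal_width_mean = mean(Sepal.Width)
  )

species_means
```

The result has one row per species, which gives a numeric counterpart to the visual separation of the three groups in the plot.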


ggplot demo using the diamonds dataset


In [2]:
str(diamonds)


'data.frame':	53940 obs. of  10 variables:
 $ carat  : num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
 $ depth  : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
 $ table  : num  55 61 65 58 58 57 57 55 61 61 ...
 $ price  : int  326 326 327 334 335 336 336 337 337 338 ...
 $ x      : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
 $ y      : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
 $ z      : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...

In [5]:
head(diamonds)


Out[5]:
  carat       cut color clarity depth table price    x    y    z
1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
4  0.29   Premium     I     VS2  62.4    58   334  4.2 4.23 2.63
5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48

In [6]:
ggplot(diamonds, aes(x=carat, y=price, col=clarity)) + geom_point()



In [8]:
ggplot(diamonds, aes(x=price, fill=cut)) +
  geom_density(alpha = .3, color=NA)



In [9]:
ggplot(diamonds, aes(x=log10(price), fill=cut)) +
  geom_density(alpha = .3, color=NA)
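
The two density cells above differ only in the log10() applied to price inside aes(). A possible alternative, sketched here rather than taken from the original notebook, is to leave the aesthetic as plain price and let scale_x_log10() apply the transform, so that the axis ticks are labelled in price units instead of log units:

```r
library(ggplot2)

# Same density plot, but the log10 transform is applied by the x scale.
# Scale transforms happen before the density is computed, so the curves
# match the log10(price) version; only the axis labelling differs.
p <- ggplot(diamonds, aes(x = price, fill = cut)) +
  geom_density(alpha = 0.3, color = NA) +
  scale_x_log10()

p
```

Which form reads better depends on the audience: log units make the transform explicit, while price units are easier to interpret directly.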


