The grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (color, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system. Faceting can be used to generate the same plot for different subsets of the dataset. It is the combination of these independent components that makes up a graphic.
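The main components of the grammar are the data, aesthetic mappings, geometric objects, statistical transformations, the coordinate system, and faceting.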
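As a rough sketch of how these components can appear together in a single Gadfly call, the block below builds one plot from sample data. The variable names (`xs_demo`, `ys_demo`, `p_demo`) and the data values are assumptions made for illustration only; they are not part of the original notebook.

```julia
using Gadfly

# Data: an illustrative sample (values chosen only for this sketch)
xs_demo = collect(range(-5, stop=5, length=50))
ys_demo = 5 .* cos.(xs_demo) .+ xs_demo

p_demo = plot(
    x = xs_demo, y = ys_demo,               # aesthetics: map data to x and y
    Geom.point, Geom.line,                  # geometric objects
    Scale.x_continuous(format=:plain),      # scale for the x aesthetic
    Coord.cartesian(xmin=-5, xmax=5),       # coordinate system
    Guide.title("Components of one plot")   # guide (annotation)
)
```

Each argument after the data can be changed or dropped independently of the others, which is the sense in which the components are independent.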
In [1]:
using Gadfly
In [2]:
# plot arrays
x = collect(range(-5, stop=5, length=8))
y = 5 .* cos.(x) .+ x
plot(x=x, y=y)
Out[2]:
In [3]:
# geometry options include Geom.point, Geom.line, Geom.smooth and Geom.step; pass one to plot
x = collect(range(-5, stop=5, length=8))
y = 5 .* cos.(x) .+ x
plot(x=x, y=y, Geom.step()) # optional arguments: direction :vh :hv
Out[3]:
In [4]:
# plot functions (named or anonymous)
plot([sin, cos], -5, 5)
Out[4]:
In [5]:
plot([x->5*cos(x) + x, x->5*sin(x) + x], -5, 5)
Out[5]:
In [6]:
f(x) = 5*cos(x) + x
Out[6]:
In [7]:
# customize plots: title, axis, labels, ...
plot(f, -4, 4, Guide.xlabel("variable x"), Guide.ylabel("variable y=f(x)"), Guide.title("This is the title"))
Out[7]:
In [8]:
# more on Guide, xrug, yrug.
plot(x=x, y=y, Guide.xrug, Guide.yrug)
Out[8]:
In [9]:
# more on guide: ticks
xt = [-3, -2, 3, 4]
yt = [-1, 0, 2]
plot(x=x, y=y, Geom.line, Guide.xticks(ticks=xt, orientation=:vertical), Guide.yticks(ticks=yt))
# optional: label: true or false.
Out[9]:
In [10]:
# more than one plot at the same time: layers
y1 = y .+ 1
plot(layer(x=x, y=y, Geom.line), layer(x=x, y=y1, Geom.smooth, Theme(default_color=colorant"red")))
Out[10]:
In [11]:
# others: Geom: vline, hline.
plot(x=x, y=y, xintercept=[4], yintercept=[-2], Geom.line, Geom.hline(), Geom.vline(color=colorant"orange", size=1mm))
Out[11]:
In [12]:
# histograms
x0 = randn(10000)
plot(x=x0, Geom.histogram)
Out[12]:
In [13]:
# Scale. x_continuous, y_continuous, x_log, x_log10, etc. Same for y.
plot(x=x, y=y, Scale.x_continuous(format=:scientific), Geom.line)
Out[13]:
In [14]:
# Coord.cartesian xmin, xmax
plot(x=x, y=y, Coord.cartesian(xmin=-2,xmax=4))
Out[14]:
In [15]:
# Scale. x_continuous, y_continuous, x_log, x_log10, etc. Same for y.
x1 = collect(range(0, stop=1, length=10))
y1 = exp.(x1)
plot(x=x1, y=y1, Scale.y_log())
Out[15]:
In [16]:
# Other features: Geom.path
n = 10
xjumps = randn(n)
yjumps = randn(n)
plot(x=cumsum(xjumps),y=cumsum(yjumps),Geom.path())
Out[16]:
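The path above is a random walk: each call to `cumsum` turns a vector of jumps into a vector of positions. A minimal sketch with fixed, hand-picked jumps (an assumption for illustration, in place of `randn`) makes the accumulation visible:

```julia
# Fixed jumps, chosen only so the running sum is easy to check by hand
steps_demo = [1.0, -2.0, 0.5, 3.0]

# Each position is the sum of all jumps up to and including that step
positions_demo = cumsum(steps_demo)   # [1.0, -1.0, -0.5, 2.5]
```

Geom.path then connects the points in the order given, rather than sorting them by x as Geom.line does.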
In [17]:
# Other features: Geom.ribbon
ymin = y .- 1
ymax = y .+ 1
plot(x=x, y=y, ymax=ymax, ymin=ymin, Geom.line, Geom.ribbon)
Out[17]:
In [18]:
using RDatasets
using DataFrames
In [19]:
Data1 = dataset("car","Salaries")
# The 2008-09 nine-month academic salary for Assistant Professors, Associate Professors and
# Professors in a college in the U.S.
Out[19]:
In [20]:
# density
plot(Data1, x="Salary", Geom.density) # Geom.density = Geom.line, Stat.density
Out[20]:
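The comment in the cell above says that Geom.density is a line geometry combined with a density statistic. A sketch of the two spellings side by side, using synthetic data (`samples_demo` is an assumption for this example, not the Salaries dataset), might look like this:

```julia
using Gadfly

samples_demo = randn(1000)   # synthetic data, used only for this sketch

# Shorthand: geometry with its default density statistic
p_short = plot(x=samples_demo, Geom.density)

# Explicit form: a line geometry plus a kernel density statistic
p_long = plot(x=samples_demo, Geom.line, Stat.density())
```

Both plots should show the same estimated density curve; the explicit form is mainly useful when the statistic needs its own arguments.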
In [21]:
# histogram
plot(Data1, x="Salary", Geom.histogram, color="Discipline") # Geom.histogram = Geom.bar, Stat.histogram
Out[21]:
In [22]:
# 2D histogram
plot(Data1, x="Salary", y="YrsService", Geom.histogram2d(xbincount=40, ybincount=40))
Out[22]:
In [23]:
# error bars
using Distributions
sds = [1, 1/2, 1/4, 1/8, 1/16, 1/32]
n = 10
ys = [mean(rand(Normal(0, sd), n)) for sd in sds]
ymins = ys .- (1.96 * sds / sqrt(n))
ymaxs = ys .+ (1.96 * sds / sqrt(n))
plot(x=1:length(sds), y=ys, ymin=ymins, ymax=ymaxs, Geom.point, Geom.errorbar)
Out[23]:
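The error bars above use a normal-approximation 95% interval for the mean of `n` draws with known standard deviation: the half-width is 1.96 * sd / sqrt(n). A small sketch of that arithmetic follows; the helper name `ci_halfwidth` is hypothetical and introduced only for illustration.

```julia
# Half-width of a normal-approximation 95% interval for a sample mean,
# given a known standard deviation sd and a sample size n.
# (ci_halfwidth is a hypothetical helper, not part of any package.)
ci_halfwidth(sd, n; z=1.96) = z * sd / sqrt(n)

hw_demo = ci_halfwidth(1.0, 100)          # 1.96 * 1 / 10
interval_demo = (0.0 - hw_demo, 0.0 + hw_demo)   # interval around a mean of 0
```

Halving the standard deviation halves the bar length, which is why the bars in the plot shrink from left to right.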
In [24]:
Data2 = dataset("datasets","USArrests")
# This data set contains statistics, in arrests per 100,000 residents for assault, murder, and rape in
# each of the 50 US states in 1973. Also given is the percent of the population living in urban areas.
Out[24]:
In [25]:
# labels
plot(Data2, x="UrbanPop", y="Murder", label="State" , Geom.label, Geom.point)
Out[25]:
In [26]:
Data3 = dataset("datasets","chickwts")
# An experiment was conducted to measure and compare the effectiveness of various feed supplements
# on the growth rate of chickens.
Out[26]:
In [27]:
# boxplot
plot(Data3, x="Feed", y="Weight", Geom.boxplot)
Out[27]:
In [28]:
# Data by categories.
Data4 = dataset("Ecdat","Wages1")
# a cross-section of 3,294 U.S. observations from 1987: wage, experience, schooling and sex
Out[28]:
In [29]:
plot(Data4, x="Exper", y="Wage", color="Sex")
Out[29]:
In [30]:
Data5 = dataset("Zelig", "approval")
# The (approximately) quarterly approval rating for the President of the United States from the first month of 2001
# to the last month of 2005.
Out[30]:
In [31]:
plot(Data5, x="Month", y="Approve", color="Year", Geom.line)
Out[31]:
In [32]:
Data6 = dataset("Zelig","macro")
# Selected macroeconomic indicators for many countries.
Out[32]:
In [33]:
plot(Data6, x = "Year", y="GDP", color="Country", Geom.line)
Out[33]:
In [34]:
Data7 = dataset("vcd","Suicide")
# Data from Heuer (1979) on suicide rates in West Germany classified by age, sex, and method of suicide.
Out[34]:
In [35]:
# grouped data.
p = plot(Data7, xgroup="Sex", ygroup="Method", x="Age", y="Freq", Geom.subplot_grid(Geom.bar))
Out[35]:
In [36]:
draw(SVG("myplot.svg", 14cm, 25cm), p) # to save in other formats use pkg Cairo and Fontconfig.
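As a sketch of the other output formats mentioned in the comment above, the block below writes a PNG and a PDF. It assumes the Cairo and Fontconfig packages are installed; the plot `q_demo` and the file names are placeholders chosen for this example.

```julia
using Gadfly
import Cairo, Fontconfig   # required for the PNG and PDF backends

# A small placeholder plot, used only for this sketch
q_demo = plot(x=1:10, y=(1:10).^2, Geom.line)

draw(PNG("myplot.png", 14cm, 25cm), q_demo)
draw(PDF("myplot.pdf", 14cm, 25cm), q_demo)
```

The width and height arguments take the same measure units (`cm`, `mm`, `inch`) as the SVG backend.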
In [37]:
# contour
volcano = Matrix{Float64}(dataset("datasets","volcano"))
Out[37]:
In [38]:
plot(z=volcano, Geom.contour(levels=[110, 130, 150, 170, 190]))
# arguments(optional): levels: it could be either an array of contour levels, or the number of levels to plot.
# plot(z=volcano, Geom.contour(levels=5))
# plot(z=volcano, Geom.contour(levels=[110, 130, 150, 170, 190]))
Out[38]:
In [39]:
# contour also works for functions!!
plot(z=(x,y) -> x*exp(-(x-round(Int, x))^2-y^2), x=range(-8, stop=8, length=150), y=range(-2, stop=2, length=150), Geom.contour)
Out[39]:
[1] Gadfly Github page and this
[2] https://en.wikibooks.org/wiki/Introducing_Julia/Plotting
[3] The Grammar of Graphics (2005), Leland Wilkinson.
[4] ggplot2: Elegant Graphics for Data Analysis (2009), Hadley Wickham.
[5] ggplot2 Essentials (2015), Donato Teutonico.
[6] R Graphics Cookbook (2013), Winston Chang.