notebook.community




In [1]:

    
Pkg.add("Cairo")









    



INFO: No packages to install, update or remove
INFO: Package database updated



In [2]:

    
Pkg.update()









    



INFO: Updating METADATA...
INFO: Updating RCalling...
INFO: Computing changes...
INFO: No packages to install, update or remove



In [3]:

    
using Gadfly



In [5]:

    
# A simple scatter plot of random data
plot(x = rand(10), y = rand(10))

## Add Geom.point and Geom.line

plot(x = rand(10), y = rand(10), Geom.point, Geom.line)
# Note the capital G in Geom; this draws both points and lines









    Out[5]:



In [6]:

    
## Produce more complex plots
plot(x = 1:10, y = 2.^rand(10),
Scale.y_sqrt, Geom.point, Geom.smooth,
Guide.xlabel("Stimulus"), Guide.ylabel("Response"),
Guide.title("Some Training")
)
## This adds a square-root y scale, a smoothing line, a title, and x- and y-axis labels









    Out[6]:



In [7]:

    
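# We can take the previous plot and export it to PNG, PDF, and other formats
# (PNG and PDF output need the Cairo package; PDF(...) and SVG(...) take the same arguments as PNG(...))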
myplot = plot(x = 1:10, y = 2.^rand(10),
Scale.y_sqrt, Geom.point, Geom.smooth,
Guide.xlabel("Stimulus"), Guide.ylabel("Response"),
Guide.title("Some Training")
)

draw(PNG("myplot.png", 4inch, 3inch), myplot)



In [8]:

    
myplot
# You can display the plot by evaluating the name of the object









    Out[8]:



In [9]:

    
Pkg.add("RDatasets")









    



INFO: Installing RDatasets v0.1.2
INFO: Package database updated



In [10]:

    
## We can also plot data frames
using DataFrames
using RDatasets









    



WARNING: Base.String is deprecated, use AbstractString instead.
  likely near /Users/arindambose/.julia/v0.4/RDatasets/src/dataset.jl:1
WARNING: Base.String is deprecated, use AbstractString instead.
  likely near /Users/arindambose/.julia/v0.4/RDatasets/src/dataset.jl:1
WARNING: Base.String is deprecated, use AbstractString instead.
  likely near /Users/arindambose/.julia/v0.4/RDatasets/src/datasets.jl:1



In [12]:

    
plot(dataset("datasets", "iris"), x = "SepalLength",
y = "SepalWidth", Geom.point)

# Here we load the iris data set and plot sepal width against sepal length









    Out[12]:



In [13]:

    
plot(dataset("car", "SLID"), 
x = "Wages", color = "Language",
Geom.histogram)

# Here we plot a histogram of wages, stacked and colored by language









    Out[13]:



In [14]:

    
## Drawing kernel density
plot(dataset("ggplot2", "diamonds"), x="Price",
Geom.density)









    Out[14]:



In [15]:

    
## Using PyPlot (its plot and draw clash with the Gadfly functions already loaded, hence the warnings)

using PyPlot









    



WARNING: using PyPlot.plot in module Main conflicts with an existing identifier.
WARNING: using PyPlot.draw in module Main conflicts with an existing identifier.



In [16]:

    
p = scatter(x = rand(10),
y = rand(10))









    












    Out[16]:





PyObject <matplotlib.collections.PathCollection object at 0x316bbd350>



In [17]:

    
Pkg.add("Bokeh")









    



INFO: Cloning cache of Bokeh from git://github.com/bokeh/Bokeh.jl.git
INFO: Cloning cache of Mustache from git://github.com/jverzani/Mustache.jl.git
INFO: Installing Bokeh v0.2.0
INFO: Installing MacroTools v0.2.0
INFO: Installing Mustache v0.0.14
INFO: Installing Requires v0.2.1
INFO: Building Bokeh
INFO: Package database updated



In [18]:

    
Pkg.available()









    Out[18]:





755-element Array{AbstractString,1}:
 "AbstractDomains"       
 "Accumulo"              
 "ActiveAppearanceModels"
 "AffineTransforms"      
 "AmplNLWriter"          
 "AndorSIF"              
 "AnsiColor"             
 "AppConf"               
 "AppleAccelerate"       
 "ApproxFun"             
 "Arbiter"               
 "Arduino"               
 "ArgParse"              
 ⋮                       
 "XGBoost"               
 "XSim"                  
 "XSV"                   
 "YAML"                  
 "Yelp"                  
 "Yeppp"                 
 "YT"                    
 "ZChop"                 
 "ZipFile"               
 "Zlib"                  
 "ZMQ"                   
 "ZVSimulator"



In [19]:

    
Pkg.add("A*")









    



LoadError: unknown package A*
 in error at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib
 [inlined code] from pkg/entry.jl:49
 in anonymous at task.jl:447
while loading In[19], in expression starting on line 1

 in sync_end at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib
 [inlined code] from task.jl:422
 in add at pkg/entry.jl:46
 in add at pkg/entry.jl:73
 in anonymous at pkg/dir.jl:31
 in cd at file.jl:22
 in cd at pkg/dir.jl:31
 in add at pkg.jl:23
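`Pkg.add` expects an exact package name, so the `*` in `"A*"` is taken literally and no package of that name exists. One way to browse by prefix is to filter the vector of names that `Pkg.available()` returned in Out[18]. A minimal sketch, using a hand-copied sample of those names rather than the live package list (the names `sample_names` and `starts_with_a` are illustrative):

```julia
# Hand-copied sample of names from the Pkg.available() output above;
# in the notebook session, Pkg.available() itself could be passed instead.
sample_names = ["AbstractDomains", "Accumulo", "ArgParse", "XGBoost", "YAML", "ZMQ"]

# Keep only the names that begin with "A".
starts_with_a = filter(name -> startswith(name, "A"), sample_names)

println(starts_with_a)
```

Any name picked from the filtered list can then be passed to `Pkg.add` unchanged.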



In [20]:

    
Pkg.status()









    



21 required packages:
 - Bokeh                         0.2.0
 - Cairo                         0.2.31
 - DataFrames                    0.6.10
 - DataFramesMeta                0.1.0
 - Distances                     0.2.1
 - Distributions                 0.8.7
 - GLM                           0.4.8
 - Gadfly                        0.4.0
 - HypothesisTests               0.2.10
 - IJulia                        1.1.8
 - Interact                      0.2.1
 - MCMC                          0.3.0
 - NHST                          0.0.2
 - PyCall                        1.2.0
 - PyPlot                        2.1.1
 - RCall                         0.3.1
 - RDatasets                     0.1.2
 - Stats                         0.1.0
 - StatsBase                     0.7.4
 - TimeModels                    0.0.3
 - Winston                       0.11.13
54 additional packages:
 - ArrayViews                    0.6.4
 - BinDeps                       0.3.19
 - Calculus                      0.1.14
 - Codecs                        0.1.5
 - ColorTypes                    0.2.0
 - Colors                        0.6.0
 - Compat                        0.7.8
 - Compose                       0.4.0
 - Conda                         0.1.8
 - Contour                       0.0.8
 - DataArrays                    0.2.20
 - DataStructures                0.3.13
 - Dates                         0.4.4
 - Docile                        0.5.19
 - DualNumbers                   0.1.5
 - FactCheck                     0.4.1
 - FixedPointNumbers             0.1.1
 - ForwardDiff                   0.1.2
 - GZip                          0.2.18
 - Graphics                      0.1.3
 - Grid                          0.4.0
 - Hexagons                      0.0.4
 - Homebrew                      0.2.0
 - ImmutableArrays               0.0.11
 - IniFile                       0.2.4
 - Iterators                     0.1.9
 - JSON                          0.5.0
 - KernelDensity                 0.1.2
 - LaTeXStrings                  0.1.6
 - Loess                         0.0.5
 - MacroTools                    0.2.0
 - MathProgBase                  0.3.19
 - Measures                      0.0.1
 - Mustache                      0.0.14
 - NLopt                         0.2.3
 - NaNMath                       0.1.1
 - Nettle                        0.2.0
 - Optim                         0.4.4
 - PDMats                        0.3.6
 - Polynomials                   0.0.4
 - RCalling                      0.0.0-             master (unregistered)
 - Reactive                      0.2.4
 - Reexport                      0.0.3
 - Requires                      0.2.1
 - Roots                         0.1.20
 - SHA                           0.1.2
 - Showoff                       0.0.6
 - SortingAlgorithms             0.0.6
 - StatsFuns                     0.2.0
 - TimeSeries                    0.6.5
 - Tk                            0.3.6
 - URIParser                     0.1.1
 - WoodburyMatrices              0.1.2
 - ZMQ                           0.3.1



In [21]:

    
pwd()









    Out[21]:





"/Users/arindambose/Documents/julia"



In [23]:

    
n = 50
srand(1)
x = rand(n)
y = rand(n)

area = pi .* (15 .* rand(n)).^2
scatter(x, y, s = area, alpha = 0.5)









    












    Out[23]:





PyObject <matplotlib.collections.PathCollection object at 0x3183882d0>
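The `s` argument of matplotlib's `scatter` is the marker area in points squared, which is why the cell above squares a radius and multiplies by pi. A small worked example of the same formula, with fixed scale factors in place of `rand(n)` (the three values are made up for illustration):

```julia
# Fixed stand-ins for rand(n): 0, 0.5 and 1 scale the maximum radius of 15 points.
radii = 15 .* [0.0, 0.5, 1.0]

# Marker area in points^2, the same expression as in the cell above.
area = pi .* radii.^2

println(area)
```

The largest marker therefore has an area of 225π points squared, and halving the radius quarters the area.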



In [24]:

    
help(repmat)
# help() is not defined in this Julia version; use ?repmat instead









    



LoadError: UndefVarError: help not defined
while loading In[24], in expression starting on line 1



In [25]:

    
?repmat









    



search: repmat



    Out[25]:



repmat(A, n, m)
Construct a matrix by repeating the given matrix n times in dimension 1 and m times in dimension 2.



In [28]:

    
using DataFrames
df = DataFrame(A = [1,2],
B = [pi, e],
C = ["xx", "yy"])
show(df)









    



2x3 DataFrames.DataFrame
| Row | A | B       | C    |
|-----|---|---------|------|
| 1   | 1 | 3.14159 | "xx" |
| 2   | 2 | 2.71828 | "yy" |



In [29]:

    
iris = dataset("datasets", "iris")
show(names(iris))









    



[:SepalLength,:SepalWidth,:PetalLength,:PetalWidth,:Species]



In [30]:

    
describe(iris:SepalLength)









    



LoadError: UndefVarError: SepalLength not defined
while loading In[30], in expression starting on line 1
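The error comes from how the colon is parsed. Between two values, `a:b` builds a range, so Julia tries to evaluate a variable named `SepalLength`. A leading colon, as in `:SepalLength`, builds a `Symbol`, which is the type of the column names that `names(iris)` printed. A minimal sketch of the difference in plain Julia, with a `Dict` standing in for the data frame (the name `columns` is illustrative; the two values are the first two sepal lengths in iris):

```julia
# `a:b` between two values is a range.
r = 1:3
println(collect(r))

# A leading colon makes a Symbol.
s = :SepalLength
println(typeof(s))

# Stand-in for a data frame: look a column up by its Symbol with square brackets.
columns = Dict(:SepalLength => [5.1, 4.9])
println(columns[s])
```

The same square-bracket lookup, `iris[:SepalLength]`, is the form that works with `describe` in In [40].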



In [31]:

    
describe(:SepalLength)









    



LoadError: MethodError: `describe` has no method matching describe(::Symbol)
Closest candidates are:
  describe(::Any, !Matched::DataFrames.AbstractDataFrame)
  describe{T<:Number}(::Any, !Matched::AbstractArray{T<:Number,N})
  describe{T}(::Any, !Matched::AbstractArray{T,N})
while loading In[31], in expression starting on line 1



In [32]:

    
head(iris)









    Out[32]:




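| Row | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
|-----|-------------|------------|-------------|------------|---------|
| 1   | 5.1         | 3.5        | 1.4         | 0.2        | setosa  |
| 2   | 4.9         | 3.0        | 1.4         | 0.2        | setosa  |
| 3   | 4.7         | 3.2        | 1.3         | 0.2        | setosa  |
| 4   | 4.6         | 3.1        | 1.5         | 0.2        | setosa  |
| 5   | 5.0         | 3.6        | 1.4         | 0.2        | setosa  |
| 6   | 5.4         | 3.9        | 1.7         | 0.4        | setosa  |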



In [33]:

    
aggregate(iris, :Species, sum)









    Out[33]:




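| Row | Species    | SepalLength_sum | SepalWidth_sum     | PetalLength_sum    | PetalWidth_sum |
|-----|------------|-----------------|--------------------|--------------------|----------------|
| 1   | setosa     | 250.3           | 171.4              | 73.1               | 12.3           |
| 2   | versicolor | 296.8           | 138.5              | 213.00000000000003 | 66.3           |
| 3   | virginica  | 329.4           | 148.70000000000002 | 277.6              | 101.3          |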



In [34]:

    
aggregate(iris, :Species, [sum, mean])









    Out[34]:




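| Row | Species    | SepalLength_sum | SepalLength_mean  | SepalWidth_sum     | SepalWidth_mean | PetalLength_sum    | PetalLength_mean   | PetalWidth_sum | PetalWidth_mean     |
|-----|------------|-----------------|-------------------|--------------------|-----------------|--------------------|--------------------|----------------|---------------------|
| 1   | setosa     | 250.3           | 5.006             | 171.4              | 3.428           | 73.1               | 1.462              | 12.3           | 0.24600000000000002 |
| 2   | versicolor | 296.8           | 5.936             | 138.5              | 2.77            | 213.00000000000003 | 4.260000000000001  | 66.3           | 1.3259999999999998  |
| 3   | virginica  | 329.4           | 6.587999999999999 | 148.70000000000002 | 2.974           | 277.6              | 5.5520000000000005 | 101.3          | 2.026               |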



In [35]:

    
## reshape data using stack function
help("stack")









    



LoadError: UndefVarError: help not defined
while loading In[35], in expression starting on line 2



In [36]:

    
?stack









    



search: stack stackdf stackplot StackOverflowError vstack hstack unstack



    Out[36]:



Stacks a DataFrame; convert from a wide to long format

stack(df::AbstractDataFrame, measure_vars, id_vars)
stack(df::AbstractDataFrame, measure_vars)
stack(df::AbstractDataFrame)
melt(df::AbstractDataFrame, id_vars, measure_vars)
melt(df::AbstractDataFrame, id_vars)

Arguments

df : the AbstractDataFrame to be stacked

measure_vars : the columns to be stacked (the measurement variables), a normal column indexing type, like a Symbol, Vector{Symbol}, Int, etc.; for melt, defaults to all variables that are not id_vars

id_vars : the identifier columns that are repeated during stacking, a normal column indexing type; for stack defaults to all variables that are not measure_vars

If neither measure_vars or id_vars are given, measure_vars defaults to all floating point columns.

Result

::DataFrame : the long-format dataframe with column :value holding the values of the stacked columns (measure_vars), with column :variable a Vector of Symbols with the measure_vars name, and with columns for each of the id_vars.

See also stackdf and meltdf for stacking methods that return a view into the original DataFrame. See unstack for converting from long to wide format.

Examples

d1 = DataFrame(a = repeat([1:3;], inner = [4]),
               b = repeat([1:4;], inner = [3]),
               c = randn(12),
               d = randn(12),
               e = map(string, 'a':'l'))

d1s = stack(d1, [:c, :d])
d1s2 = stack(d1, [:c, :d], [:a])
d1m = melt(d1, [:a, :b, :e])



In [37]:

    
## To go from wide to long format, use melt (or stack)
using StatsBase
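Going from wide to long means turning each measurement column into rows that carry an identifier, the variable name, and the value, which is what the `stack` and `melt` documentation above describes. A hand-rolled sketch of that reshaping in plain Julia, not the DataFrames implementation (the names `ids`, `wide`, and `long` are illustrative; the numbers are the first two iris rows):

```julia
# Wide layout: one entry per measurement column, one value per flower.
ids = [1, 2]
wide = Dict(:SepalLength => [5.1, 4.9], :SepalWidth => [3.5, 3.0])

# Long layout: one (id, variable, value) tuple per measurement.
long = []
for variable in [:SepalLength, :SepalWidth]
    for (i, id) in enumerate(ids)
        push!(long, (id, variable, wide[variable][i]))
    end
end

for row in long
    println(row)
end
```

Two flowers with two measurements each give four long-format rows, one per (flower, variable) pair.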



In [38]:

    
names(iris)









    Out[38]:





5-element Array{Symbol,1}:
 :SepalLength
 :SepalWidth 
 :PetalLength
 :PetalWidth 
 :Species



In [39]:

    
summarystats(iris[:SepalLength])









    Out[39]:





Summary Stats:
Mean:         5.843333
Minimum:      4.300000
1st Quartile: 5.100000
Median:       5.800000
3rd Quartile: 6.400000
Maximum:      7.900000



In [40]:

    
describe(iris[:SepalLength])









    



Summary Stats:
Mean:         5.843333
Minimum:      4.300000
1st Quartile: 5.100000
Median:       5.800000
3rd Quartile: 6.400000
Maximum:      7.900000



In [41]:

    
table(iris[:SepalLength])









    



LoadError: PyError (:PyObject_Call) <type 'exceptions.TypeError'>
TypeError('table() takes exactly 0 arguments (1 given)',)

while loading In[41], in expression starting on line 1

 [inlined code] from /Users/arindambose/.julia/v0.4/PyCall/src/exception.jl:81
 in pycall at /Users/arindambose/.julia/v0.4/PyCall/src/PyCall.jl:361
 in table at /Users/arindambose/.julia/v0.4/PyPlot/src/PyPlot.jl:460



In [42]:

    
using StatsBase



In [43]:

    
mycount = counts(iris[:Species])









    



LoadError: MethodError: `counts` has no method matching counts(::DataArrays.PooledDataArray{ASCIIString,UInt8,1})
while loading In[43], in expression starting on line 1



In [44]:

    
?counts









    



search: counts addcounts! countlines count_ones count_zeros count countnz



    Out[44]:



No documentation found.
StatsBase.counts is a generic Function.
# 16 methods for generic function "counts":
counts(x::AbstractArray{T<:Integer,N}) at /Users/arindambose/.julia/v0.4/StatsBase/src/counts.jl:54
counts(x::AbstractArray{T<:Integer,N}, levels::UnitRange{T<:Integer}) at /Users/arindambose/.julia/v0.4/StatsBase/src/counts.jl:50
counts(x::AbstractArray{T<:Integer,N}, levels::UnitRange{T<:Integer}, wv::StatsBase.WeightVec{W,Vec<:AbstractArray{T<:Real,1}}) at /Users/arindambose/.julia/v0.4/StatsBase/src/counts.jl:51
counts(x::AbstractArray{T<:Integer,N}, k::Integer) at /Users/arindambose/.julia/v0.4/StatsBase/src/counts.jl:52
counts(x::AbstractArray{T<:Integer,N}, k::Integer, wv::StatsBase.WeightVec{W,Vec<:AbstractArray{T<:Real,1}}) at /Users/arindambose/.julia/v0.4/StatsBase/src/counts.jl:53
counts(x::AbstractArray{T<:Integer,N}, wv::StatsBase.WeightVec{W,Vec<:AbstractArray{T<:Real,1}}) at /Users/arindambose/.julia/v0.4/StatsBase/src/counts.jl:55
counts(x::AbstractArray{T<:Integer,N}, y::AbstractArray{T<:Integer,N}) at /Users/arindambose/.julia/v0.4/StatsBase/src/counts.jl:144
counts(x::AbstractArray{T<:Integer,N}, y::AbstractArray{T<:Integer,N}, levels::Tuple{UnitRange{T<:Integer},UnitRange{T<:Integer}}) at /Users/arindambose/.julia/v0.4/StatsBase/src/counts.jl:130
counts(x::AbstractArray{T<:Integer,N}, y::AbstractArray{T<:Integer,N}, levels::Tuple{UnitRange{T<:Integer},UnitRange{T<:Integer}}, wv::StatsBase.WeightVec{W,Vec<:AbstractArray{T<:Real,1}}) at /Users/arindambose/.julia/v0.4/StatsBase/src/counts.jl:134
counts(x::AbstractArray{T<:Integer,N}, y::AbstractArray{T<:Integer,N}, levels::UnitRange{T<:Integer}) at /Users/arindambose/.julia/v0.4/StatsBase/src/counts.jl:137
counts(x::AbstractArray{T<:Integer,N}, y::AbstractArray{T<:Integer,N}, levels::UnitRange{T<:Integer}, wv::StatsBase.WeightVec{W,Vec<:AbstractArray{T<:Real,1}}) at /Users/arindambose/.julia/v0.4/StatsBase/src/counts.jl:138
counts(x::AbstractArray{T<:Integer,N}, y::AbstractArray{T<:Integer,N}, ks::Tuple{Integer,Integer}) at /Users/arindambose/.julia/v0.4/StatsBase/src/counts.jl:140
counts(x::AbstractArray{T<:Integer,N}, y::AbstractArray{T<:Integer,N}, ks::Tuple{Integer,Integer}, wv::StatsBase.WeightVec{W,Vec<:AbstractArray{T<:Real,1}}) at /Users/arindambose/.julia/v0.4/StatsBase/src/counts.jl:141
counts(x::AbstractArray{T<:Integer,N}, y::AbstractArray{T<:Integer,N}, k::Integer) at /Users/arindambose/.julia/v0.4/StatsBase/src/counts.jl:142
counts(x::AbstractArray{T<:Integer,N}, y::AbstractArray{T<:Integer,N}, k::Integer, wv::StatsBase.WeightVec{W,Vec<:AbstractArray{T<:Real,1}}) at /Users/arindambose/.julia/v0.4/StatsBase/src/counts.jl:143
counts(x::AbstractArray{T<:Integer,N}, y::AbstractArray{T<:Integer,N}, wv::StatsBase.WeightVec{W,Vec<:AbstractArray{T<:Real,1}}) at /Users/arindambose/.julia/v0.4/StatsBase/src/counts.jl:145



In [45]:

    
mycount = countmap(iris[:Species])









    Out[45]:





Dict{Union{ASCIIString,DataArrays.NAtype},Int64} with 3 entries:
  "virginica"  => 50
  "setosa"     => 50
  "versicolor" => 50



In [46]:

    
proportionmap(iris[:Species])









    Out[46]:





Dict{Union{ASCIIString,DataArrays.NAtype},Float64} with 3 entries:
  "virginica"  => 0.3333333333333333
  "setosa"     => 0.3333333333333333
  "versicolor" => 0.3333333333333333
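`countmap` tallies how often each label occurs, and `proportionmap` divides each tally by the number of observations. A plain-Julia sketch of both steps, not the StatsBase implementation (the four-element `species` vector is a made-up sample, and `tally` and `proportions` are illustrative names):

```julia
species = ["setosa", "setosa", "versicolor", "virginica"]

# Tally: start each label at 0 and add 1 per occurrence.
tally = Dict()
for label in species
    tally[label] = get(tally, label, 0) + 1
end

# Proportions: divide each tally by the number of observations.
total = length(species)
proportions = Dict()
for (label, n) in tally
    proportions[label] = n / total
end

println(tally)
println(proportions)
```

Because the tally is keyed by the label itself, this works for string labels, whereas every `counts` method listed above requires an integer array.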



In [51]:

    
Pkg.clone("git://github.com/nalimilan/Tables.jl.git")









    



INFO: Cloning Tables from git://github.com/nalimilan/Tables.jl.git






    



LoadError: Tables already exists
while loading In[51], in expression starting on line 1



In [48]:

    
using Tables









    



LoadError: LoadError: ArgumentError: NamedArrays not found in path
while loading /Users/arindambose/.julia/v0.4/Tables/src/Tables.jl, in expression starting on line 3
while loading In[48], in expression starting on line 1

 in require at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib
 in include at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib
 in include_from_node1 at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib
 in require at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib



In [50]:

    
Pkg.add("NamedArrays")









    



INFO: Cloning cache of NamedArrays from git://github.com/davidavdav/NamedArrays.jl.git
INFO: Installing NamedArrays v0.4.6
INFO: Package database updated


