Author: Ryan B. Harvey
Created: 2014-07-18 / Last Updated: 2014-07-18
Contents:
sqldf
function in Julia (using the SQLite.jl
package) Package Dependencies:
sqldf
function, based on the package of the same name in RReferences:
sqldf
function in Julia (using the SQLite.jl
package)The SQLite.jl
package is a Julia package that provides an interface to the SQLite library and support for operations on DataFrames through the sqldf
function. The sqldf
function creates tables in a local SQLite database, runs the requested SQL query on those tables, and then drops the tables from the SQLite database. Although this can be slower than subsetting a data frame directly, in cases where the data frame is very complex and conditions get difficult to parse, SQL can provide a straightforward, declarative way of requesting a particular data subset.
If the SQLite.jl
package doesn't exist in your environment, add the package. Once you have it, use it.
In [ ]:
Pkg.update()
In [ ]:
Pkg.add("SQLite")
In [ ]:
Pkg.add("DataFrames")
In [ ]:
using SQLite
using DataFrames
For this example, we'll use the Basic Business Licenses data from the DC government open data site.
The data is available as a zip file, so we'll download it if it's not already available, and then unzip it if it hasn't been already.
In [ ]:
url = "http://data.octo.dc.gov/feeds/bbl/bbl_current_csv.zip"
data_dir = "../data/"
current_dir = pwd()
cd(data_dir)
local_zipped = "bbl_current_csv.zip"
local_data = "bbl_current_csv.csv"
if !(isfile(local_zipped) && isfile(local_data))
download(url, local_zipped)
fileDownloadedDate = strftime("%F", time())
fileDownloadedDate
end
In [ ]:
if !isfile(local_data)
run(`unzip $local_zipped`)
end
In [ ]:
bbl = readtable(local_data)
In [ ]:
summary(bbl)
In [ ]:
names(bbl)
In [ ]:
sqldf("select * from bbl where BUSINESSIMPROVEMENTDISTRICT = 'DOWNTOWN BID'")
In [ ]:
sqldf("select * from bbl where WARD = 6")
The sqldf()
function returns a data frame, so we can store it in another variable for use in a plot, for example.
In [ ]:
ward6 = sqldf("select LICENSECATEGORY, count(*) as LICENSECOUNT from bbl where WARD = 6 group by LICENSECATEGORY")
ward6
In [ ]:
Pkg.add("PyPlot")
using PyPlot
In [ ]:
barh(ward6["LICENSECOUNT"], ward6["LICENSECATEGORY"])
In [ ]: