Introduction to Databases and SQL

Plan

  • get a data set from R: mtcars
  • select from it using SQL (via sqldf)
  • export it as CSV file
  • in sqlite, create database and import CSV
  • use RSQLite connection to select direct from database

get a data set from R: mtcars

The data contain Motor Trends car testing results. There are 32 cars.


In [43]:
nrow(mtcars)
ncol(mtcars)


32
11

In [44]:
head(mtcars)


mpgcyldisphpdratwtqsecvsamgearcarb
Mazda RX421.0 6 160 110 3.90 2.62016.460 1 4 4
Mazda RX4 Wag21.0 6 160 110 3.90 2.87517.020 1 4 4
Datsun 71022.8 4 108 93 3.85 2.32018.611 1 4 1
Hornet 4 Drive21.4 6 258 110 3.08 3.21519.441 0 3 1
Hornet Sportabout18.7 8 360 175 3.15 3.44017.020 0 3 2
Valiant18.1 6 225 105 2.76 3.46020.221 0 3 1

In [45]:
summary(mtcars)


      mpg             cyl             disp             hp       
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
      drat             wt             qsec             vs        
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
       am              gear            carb      
 Min.   :0.0000   Min.   :3.000   Min.   :1.000  
 1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
 Median :0.0000   Median :4.000   Median :2.000  
 Mean   :0.4062   Mean   :3.688   Mean   :2.812  
 3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
 Max.   :1.0000   Max.   :5.000   Max.   :8.000  

use SQL to select cars with 6-cylinder engines


In [46]:
library(sqldf)
# help(sqldf)

In [47]:
sqldf('select * from mtcars where cyl=6')


mpgcyldisphpdratwtqsecvsamgearcarb
21.0 6 160.0110 3.90 2.62016.460 1 4 4
21.0 6 160.0110 3.90 2.87517.020 1 4 4
21.4 6 258.0110 3.08 3.21519.441 0 3 1
18.1 6 225.0105 2.76 3.46020.221 0 3 1
19.2 6 167.6123 3.92 3.44018.301 0 4 4
17.8 6 167.6123 3.92 3.44018.901 0 4 4
19.7 6 145.0175 3.62 2.77015.500 1 5 6

There are 7 cars matching the query.


In [48]:
nrow(sqldf('select * from mtcars where cyl=6'))


7

We can obtain the same result directly with SQL... but let's not get ahead of ourselves.


In [49]:
sqldf('select count(*) from mtcars where cyl=6')


count(*)
7

write mtcars dataset to the file system


In [50]:
df = mtcars
df$name = rownames(df) # move R's rownames into their own column
write.table(df, "mtcars.csv", quote=TRUE, row.names=FALSE, col.names=FALSE, sep=",")

create sqlite database in terminal

make mtcars-init

That will run the following:

sqlite3 mtcars.sqlite < mtcars-init.sql

That SQL file does the following:

  • drop a table called results
  • create a table called results with columns for the mtcars data
  • tell sqlite to load a CSV file
  • load the CSV file into the results table

access sqlite from R


In [51]:
library(RSQLite)

# connect to the database
db = dbConnect(SQLite(), dbname="mtcars.sqlite")

list the tables that are available


In [52]:
dbListTables(db)


'results'

Inspect columns in the results table


In [53]:
dbListFields(db, "results")


  1. 'mpg'
  2. 'cyl'
  3. 'disp'
  4. 'hp'
  5. 'drat'
  6. 'wt'
  7. 'qsec'
  8. 'vs'
  9. 'am'
  10. 'gear'
  11. 'carb'
  12. 'name'

use SQL to select cars with 8-cylinder engines - directly from database


In [54]:
dbGetQuery(conn = db, "select * from results where cyl=8")


mpgcyldisphpdratwtqsecvsamgearcarbname
18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Hornet Sportabout
14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 Duster 360
16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 Merc 450SE
17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 Merc 450SL
15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 Merc 450SLC
10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 Cadillac Fleetwood
10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 Lincoln Continental
14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Chrysler Imperial
15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 Dodge Challenger
15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 AMC Javelin
13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 Camaro Z28
19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Pontiac Firebird
15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 Ford Pantera L
15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 Maserati Bora

In [ ]: