Exercise 1 - Basic SQL DML and DDL

Part 1 - Data manipulation (DML)

Get the survey.db SQLite3 database file from the Software Carpentry lesson and connect to it.


In [ ]:
!wget http://files.software-carpentry.org/survey.db

In [ ]:
%load_ext sql

In [ ]:
%sql sqlite:///survey.db

Basic queries: SELECT, DISTINCT, FROM, ORDER BY


In [ ]:
%%sql
SELECT personal, family
FROM Person

In [ ]:
%%sql
SELECT family, personal
FROM Person

In [ ]:
%%sql
SELECT *
FROM Person
ORDER BY family, personal

In [ ]:
%%sql
SELECT quant
FROM survey

In [ ]:
%%sql
SELECT DISTINCT quant
FROM Survey

In [ ]:
%%sql
SELECT taken, person, quant
FROM Survey 
ORDER BY taken ASC, person DESC;

In [ ]:
%%sql
SELECT DISTINCT quant, person 
FROM Survey 
ORDER BY quant ASC;

Filtering: WHERE


In [ ]:
%%sql
SELECT * 
FROM Visited 
WHERE site='DR-1'

In [ ]:
%%sql
SELECT ident 
FROM Visited 
WHERE site='DR-1';

In [ ]:
from IPython.display import Image
Image(url="http://swcarpentry.github.io/sql-novice-survey/fig/sql-filter.svg")

In [ ]:
%%sql
SELECT * 
FROM Survey 
WHERE person='lake' 
  OR person='roe'

In [ ]:
%%sql
SELECT * 
FROM Survey 
WHERE person IN ('lake', 'roe')

In [ ]:
%%sql
SELECT * 
FROM Survey 
WHERE quant='sal' 
  AND (person='lake' OR person='roe')

In [ ]:
%%sql
SELECT * 
FROM Visited
WHERE site LIKE 'DR%';

In [ ]:
%%sql
SELECT DISTINCT person, quant 
FROM Survey 
WHERE person='lake' 
   OR person='roe';

In [ ]:
%%sql
SELECT reading, 1.05 * reading AS reading_multiplied
FROM Survey 
WHERE quant='rad'

In [ ]:
%%sql
SELECT taken, reading AS reading_fahrenheit, round(5*(reading-32)/9, 2) AS reading_celsius
FROM Survey 
WHERE quant='temp'

In [ ]:
%%sql
SELECT personal || ' ' || family AS full_name, personal, family
FROM Person
ORDER BY family, personal

In [ ]:
%%sql
SELECT * 
FROM Person 
WHERE ident='dyer' 

UNION 

SELECT * 
FROM Person 
WHERE ident='roe'

Use UNION to create a consolidated list of salinity measurements in which Roerich’s, and only Roerich’s, have been corrected as described in the previous challenge.


In [ ]:
%%sql
SELECT *
FROM Survey
WHERE quant = 'sal'

NULL values


In [ ]:
%%sql
SELECT * 
FROM Visited

In [ ]:
%%sql
SELECT * 
FROM Visited
WHERE dated < '1930-01-01'
  OR dated >= '1930-01-01'

In [ ]:
%%sql
SELECT * FROM Visited WHERE dated IS NULL;

In [ ]:
%%sql
SELECT * FROM Visited WHERE dated IS NOT NULL;

In [ ]:
%%sql
SELECT * 
FROM Survey 
WHERE quant = 'sal' 
  AND person != 'lake';

In [ ]:
%%sql
SELECT * 
FROM Survey
WHERE quant = 'sal'
  AND (person != 'lake' OR person IS NULL);

In [ ]:
%%sql
SELECT * 
FROM Visited 
WHERE dated IN ('1927-02-08', NULL)

Aggregation: min(), max(), count(), avg()


In [ ]:
%%sql
SELECT MIN(dated)
FROM Visited

In [ ]:
Image(url="http://swcarpentry.github.io/sql-novice-survey/fig/sql-aggregation.svg")

In [ ]:
%%sql
SELECT AVG(reading)
FROM Survey 
WHERE quant='sal'

In [ ]:
%%sql
SELECT COUNT(reading) 
FROM Survey 
WHERE quant='sal'

In [ ]:
%%sql
SELECT SUM(reading)
FROM Survey
WHERE quant='sal'

In [ ]:
%%sql
SELECT MIN(reading), MAX(reading)
FROM Survey 
WHERE quant = 'sal' 
  AND reading <= 1.0

In [ ]:
%%sql
SELECT person, COUNT(*) 
FROM Survey 
WHERE quant = 'sal' 
  AND reading <= 1.0

Aggregation: GROUP BY, HAVING


In [ ]:
%%sql
SELECT   person, COUNT(reading), ROUND(AVG(reading), 2)
FROM     Survey
WHERE    quant = 'rad'
GROUP BY person

In [ ]:
%%sql
SELECT   person, quant, COUNT(reading), ROUND(AVG(reading), 2)
FROM     Survey
WHERE    person IS NOT NULL
GROUP BY person, quant
HAVING   quant = 'rad'
ORDER BY person, quant

Combining tables with JOINs


In [ ]:
%%sql
SELECT * 
FROM Site 
JOIN Visited

In [ ]:
%%sql
SELECT * 
FROM Site 
JOIN Visited 
  ON Site.name = Visited.site

In [ ]:
%%sql
SELECT * 
FROM Site, Visited
WHERE Site.name = Visited.site

In [ ]:
%%sql
SELECT Site.lat, Site.long, Visited.dated
FROM   Site
JOIN   Visited
ON     Site.name = Visited.site

In [ ]:
%%sql
SELECT Site.lat, Site.long, Visited.dated, Survey.quant, Survey.reading
FROM   Site
JOIN   Visited 
  ON   Site.name = Visited.site
JOIN   Survey 
  ON   Visited.ident = Survey.taken
WHERE  Visited.dated IS NOT NULL

Part 2 - Data definition (DDL)

We're going to change the database, so copy the database file first so we have a local backup copy.


In [ ]:
!cp survey.db modified.db

In [ ]:
%sql sqlite:///modified.db

In [ ]:
%%sql
DROP TABLE Person;
CREATE TABLE Person(ident TEXT, personal TEXT, family TEXT);
DROP TABLE Site;
CREATE TABLE Site(name TEXT, lat REAL, long REAL);
DROP TABLE Visited;
CREATE TABLE Visited(ident INTEGER, site TEXT, dated TEXT);
DROP TABLE Survey;
CREATE TABLE Survey(taken INTEGER, person TEXT, quant REAL, reading REAL);

In [ ]:
%%sql
DROP TABLE Survey;
CREATE TABLE Survey(
    taken   INTEGER NOT NULL, -- where reading taken
    person  TEXT,             -- may not know who took it
    quant   REAL NOT NULL,    -- the quantity measured
    reading REAL NOT NULL,    -- the actual reading
    PRIMARY KEY (taken, quant),
    FOREIGN KEY (taken) REFERENCES Visited(ident),
    FOREIGN KEY (person) REFERENCES Person(ident)
);

In [ ]:
%%sql
SELECT * FROM Site;

In [ ]:
%%sql
INSERT INTO Site values('DR-1', -49.85, -128.57);
INSERT INTO Site values('DR-3', -47.15, -126.72);
INSERT INTO Site values('MSK-4', -48.87, -123.40);
SELECT * FROM Site;

In [ ]:
%%sql
CREATE TABLE JustLatLong(lat text, long text);
INSERT INTO JustLatLong SELECT lat, long FROM Site;
SELECT * FROM JustLatLong;

In [ ]:
%%sql
SELECT * 
FROM Site
WHERE name = 'MSK-4'

In [ ]:
%%sql
UPDATE Site 
SET lat = -48.87, long = -125.40 
WHERE name = 'MSK-4';

In [ ]:
%%sql
SELECT *
FROM Site
WHERE name = 'MSK-4'

In [ ]:
%%sql
SELECT *
FROM Site

In [ ]:
%%sql
DELETE FROM Site 
WHERE name = 'DR-3';

In [ ]:
%%sql
SELECT *
FROM Site

Part 3 - Using Python for queries

ipython-sql is a wonderful jupyter plugin. It's not only how we're talking with SQLite, it also can connect with other databases - we'll use the MySQL support next time.

Among other things, though, keep in mind that this works within a Python notebook, and the plugin allows you to pull data back and then work with straight Python.


In [ ]:
result = _
print result

In [ ]:
result.keys

In [ ]:
result[3]

Pandas integration


In [ ]:
df = result.DataFrame()
df

Matplotlib integration


In [ ]:
%sql sqlite:///survey.db

In [ ]:
%matplotlib inline

In [ ]:
%%sql
SELECT * 
FROM Survey
WHERE quant = 'rad'

In [ ]:
result = _
result.bar()