Lecture 18

Wednesday, November 8th, 2017

Databases with `SQlite`

`SQLite` Exercises

Today you will work with the candidates and contributors datasets to create a database in Python using SQLite.

The exercises will consist of a sequence of steps to help illustrate basic commands.

Exercise Deliverables

Create a Jupyter notebook called Exercises-Final.ipynb inside the L18 directory. This is the one we will grade.
For each step in this lecture, there were instructions labeled "Do the following:". Put all the code from those instructions in a single Jupyter notebook cell. It should look like a Python script. You must comment where appropriate to demonstrate that you understand what you are doing.
Save and close your database. Be sure to upload your database with the lecture exercises. You must name your database L18DB.sqlite.

Setting the Stage

Step 1

Interlude: Not required but highly recommended.

Setting the Stage

You should import sqlite3 again like last time.



In [1]:

    
import sqlite3

We will also use a basic a pandas feature to display tables in the database. Although this lecture isn't on pandas, I will still have you use it a little bit.



In [2]:

    
import pandas as pd
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)

Now we create the tables in the database (just like last time).



In [3]:

    
db = sqlite3.connect('L18DB_demo.sqlite')
cursor = db.cursor()
cursor.execute("DROP TABLE IF EXISTS candidates")
cursor.execute("DROP TABLE IF EXISTS contributors")
cursor.execute("PRAGMA foreign_keys=1")

cursor.execute('''CREATE TABLE candidates (
               id INTEGER PRIMARY KEY NOT NULL, 
               first_name TEXT, 
               last_name TEXT, 
               middle_init TEXT, 
               party TEXT NOT NULL)''')

db.commit() # Commit changes to the database

cursor.execute('''CREATE TABLE contributors (
          id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, 
          last_name TEXT, 
          first_name TEXT, 
          middle_name TEXT, 
          street_1 TEXT, 
          street_2 TEXT, 
          city TEXT, 
          state TEXT, 
          zip TEXT, 
          amount REAL, 
          date DATETIME, 
          candidate_id INTEGER NOT NULL, 
          FOREIGN KEY(candidate_id) REFERENCES candidates(id))''')

db.commit()

Step 1

Read candidates.txt and contributors.txt and insert their values into the respective tables.



In [4]:

    
with open ("candidates.txt") as candidates:
    next(candidates) # jump over the header
    for line in candidates.readlines():
        cid, first_name, last_name, middle_name, party = line.strip().split('|')
        vals_to_insert = (int(cid), first_name, last_name, middle_name, party)
        cursor.execute('''INSERT INTO candidates 
                  (id, first_name, last_name, middle_init, party)
                  VALUES (?, ?, ?, ?, ?)''', vals_to_insert)



In [5]:

    
with open ("contributors.txt") as contributors:
    next(contributors)
    for line in contributors.readlines():
        cid, last_name, first_name, middle_name, street_1, street_2, \
            city, state, zip_code, amount, date, candidate_id = line.strip().split('|')
        vals_to_insert = (last_name, first_name, middle_name, street_1, street_2, 
                          city, state, int(zip_code), amount, date, candidate_id)
        cursor.execute('''INSERT INTO contributors (last_name, first_name, middle_name, 
                           street_1, street_2, city, state, zip, amount, date, candidate_id) 
                           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)''', vals_to_insert)

Interlude

Now that you have values in the tables of the database, it would be convenient to be able to visualize those tables in some way. We'll write a little helper function to accomplish this.



In [6]:

    
def viz_tables(cols, query):
    q = cursor.execute(query).fetchall()
    framelist = []
    for i, col_name in enumerate(cols):
        framelist.append((col_name, [col[i] for col in q]))
    return pd.DataFrame.from_items(framelist)

Here's how we can use our helper function. It gives a pretty nice visualization of our table. You should do the same thing with the contributors table.



In [7]:

    
candidate_cols = [col[1] for col in cursor.execute("PRAGMA table_info(candidates)")]
query = '''SELECT * FROM candidates'''
viz_tables(candidate_cols, query)









    Out[7]:







  
    
      
      id
      first_name
      last_name
      middle_init
      party
    
  
  
    
      0
      16
      Mike
      Huckabee
      
      R
    
    
      1
      20
      Barack
      Obama
      
      D
    
    
      2
      22
      Rudolph
      Giuliani
      
      R
    
    
      3
      24
      Mike
      Gravel
      
      D
    
    
      4
      26
      John
      Edwards
      
      D
    
    
      5
      29
      Bill
      Richardson
      
      D
    
    
      6
      30
      Duncan
      Hunter
      
      R
    
    
      7
      31
      Dennis
      Kucinich
      
      D
    
    
      8
      32
      Ron
      Paul
      
      R
    
    
      9
      33
      Joseph
      Biden
      
      D
    
    
      10
      34
      Hillary
      Clinton
      R.
      D
    
    
      11
      35
      Mitt
      Romney
      
      R
    
    
      12
      36
      Samuel
      Brownback
      
      R
    
    
      13
      37
      John
      McCain
      
      R
    
    
      14
      38
      Tom
      Tancredo
      
      R
    
    
      15
      39
      Christopher
      Dodd
      J.
      D
    
    
      16
      41
      Fred
      Thompson
      D.
      R

Step 2: Various Queries

We can query our database for entries with certain characteristics. For example, we can query the candidates table for entries who's middle name fields are not empty.



In [8]:

    
query = '''SELECT * FROM candidates WHERE middle_init <> ""'''
viz_tables(candidate_cols, query)









    Out[8]:







  
    
      
      id
      first_name
      last_name
      middle_init
      party
    
  
  
    
      0
      34
      Hillary
      Clinton
      R.
      D
    
    
      1
      39
      Christopher
      Dodd
      J.
      D
    
    
      2
      41
      Fred
      Thompson
      D.
      R

We can also see how many entries satisfy the query:



In [9]:

    
print("{} candidates have a middle initial.".format(viz_tables(candidate_cols, query).shape[0]))









    



3 candidates have a middle initial.

Do the following queries:

Display the contributors where the state is "PA"
Display the contributors where the amount contributed is greater than $\$1000.00$.
Display the contributors from "UT" where the amount contributed is greater than $\$1000.00$.
Display the contributors who didn't list their state
- Hint: Match state to the empty string
Display the contributors from "WA" and "PA"
- Hint: You will need to use IN ("WA", "PA") in your SELECT statement.
Display the contributors who contributed between $\$100.00$ and $\$200.00$.
- Hint: You can use the BETWEEN 100.00 and 200.00 clause.

Step 3: Sorting

It could be beneficial to sort by one of the attributes in the database. The following cell contains a basic sorting demo.



In [10]:

    
query = '''SELECT * FROM candidates ORDER BY id DESC'''
viz_tables(candidate_cols, query)









    Out[10]:







  
    
      
      id
      first_name
      last_name
      middle_init
      party
    
  
  
    
      0
      41
      Fred
      Thompson
      D.
      R
    
    
      1
      39
      Christopher
      Dodd
      J.
      D
    
    
      2
      38
      Tom
      Tancredo
      
      R
    
    
      3
      37
      John
      McCain
      
      R
    
    
      4
      36
      Samuel
      Brownback
      
      R
    
    
      5
      35
      Mitt
      Romney
      
      R
    
    
      6
      34
      Hillary
      Clinton
      R.
      D
    
    
      7
      33
      Joseph
      Biden
      
      D
    
    
      8
      32
      Ron
      Paul
      
      R
    
    
      9
      31
      Dennis
      Kucinich
      
      D
    
    
      10
      30
      Duncan
      Hunter
      
      R
    
    
      11
      29
      Bill
      Richardson
      
      D
    
    
      12
      26
      John
      Edwards
      
      D
    
    
      13
      24
      Mike
      Gravel
      
      D
    
    
      14
      22
      Rudolph
      Giuliani
      
      R
    
    
      15
      20
      Barack
      Obama
      
      D
    
    
      16
      16
      Mike
      Huckabee
      
      R

Do the following sorts on the `contributors` table:

Sort the contributors table by last_name.
Sort by the amount in decending order where amount is restricted to be between $\$1000.00$ and $\$5000.00$.
Sort the contributors who donted between $\$1000.00$ and $\$5000.00$ by candidate_id and then by amount in descending order.
- Hint: Multiple orderings can be accomplished by separating requests after ORDER BY with commas.
- e.g. ORDER BY amount ASC, last_name DESC

Step 4: Selecting Columns

So far, we've been selecting all columns from a table (i.e. SELECT * FROM). Often, we just want to select specific columns (e.g. SELECT amount FROM).



In [11]:

    
query = '''SELECT last_name, party FROM candidates'''
viz_tables(['last_name', 'party'], query)









    Out[11]:







  
    
      
      last_name
      party
    
  
  
    
      0
      Huckabee
      R
    
    
      1
      Obama
      D
    
    
      2
      Giuliani
      R
    
    
      3
      Gravel
      D
    
    
      4
      Edwards
      D
    
    
      5
      Richardson
      D
    
    
      6
      Hunter
      R
    
    
      7
      Kucinich
      D
    
    
      8
      Paul
      R
    
    
      9
      Biden
      D
    
    
      10
      Clinton
      D
    
    
      11
      Romney
      R
    
    
      12
      Brownback
      R
    
    
      13
      McCain
      R
    
    
      14
      Tancredo
      R
    
    
      15
      Dodd
      D
    
    
      16
      Thompson
      R

Using the DISTINCT clause, you remove duplicate rows.



In [12]:

    
query = '''SELECT DISTINCT party FROM candidates'''
viz_tables(['party'], query)

Do the following:

Get the first and last name of contributors. Make sure each row has distinct values.

Step 5: Altering Tables

The ALTER clause allows us to modify tables in our database. Here, we had a new column to our candidates table called nick_name.



In [13]:

    
cursor.execute('''ALTER TABLE candidates ADD COLUMN full_name TEXT''')
candidate_cols = [col[1] for col in cursor.execute("PRAGMA table_info(candidates)")]
viz_tables(candidate_cols, '''SELECT * FROM candidates''')









    Out[13]:







  
    
      
      id
      first_name
      last_name
      middle_init
      party
      full_name
    
  
  
    
      0
      16
      Mike
      Huckabee
      
      R
      None
    
    
      1
      20
      Barack
      Obama
      
      D
      None
    
    
      2
      22
      Rudolph
      Giuliani
      
      R
      None
    
    
      3
      24
      Mike
      Gravel
      
      D
      None
    
    
      4
      26
      John
      Edwards
      
      D
      None
    
    
      5
      29
      Bill
      Richardson
      
      D
      None
    
    
      6
      30
      Duncan
      Hunter
      
      R
      None
    
    
      7
      31
      Dennis
      Kucinich
      
      D
      None
    
    
      8
      32
      Ron
      Paul
      
      R
      None
    
    
      9
      33
      Joseph
      Biden
      
      D
      None
    
    
      10
      34
      Hillary
      Clinton
      R.
      D
      None
    
    
      11
      35
      Mitt
      Romney
      
      R
      None
    
    
      12
      36
      Samuel
      Brownback
      
      R
      None
    
    
      13
      37
      John
      McCain
      
      R
      None
    
    
      14
      38
      Tom
      Tancredo
      
      R
      None
    
    
      15
      39
      Christopher
      Dodd
      J.
      D
      None
    
    
      16
      41
      Fred
      Thompson
      D.
      R
      None

What if we want to rename or delete a columm? It can't be done with SQLite with a single command. We need to follow some roundabout steps (see SQLite ALTER TABLE). We won't consider this case at the moment.

For now, let's put a few commands together to populate the full_name column.



In [14]:

    
candidate_cols = [col[1] for col in cursor.execute("PRAGMA table_info(candidates)")] # regenerate columns with full_name
query = '''SELECT id, last_name, first_name FROM candidates''' # Select a few columns
full_name_and_id = [(attr[1] + ", " + attr[2], attr[0]) for attr in cursor.execute(query).fetchall()] # List of tuples: (full_name, id)

update = '''UPDATE candidates SET full_name = ? WHERE id = ?''' # Update the table
for rows in full_name_and_id:
    cursor.execute(update, rows)

query = '''SELECT * FROM candidates'''
viz_tables(candidate_cols, query)









    Out[14]:







  
    
      
      id
      first_name
      last_name
      middle_init
      party
      full_name
    
  
  
    
      0
      16
      Mike
      Huckabee
      
      R
      Huckabee, Mike
    
    
      1
      20
      Barack
      Obama
      
      D
      Obama, Barack
    
    
      2
      22
      Rudolph
      Giuliani
      
      R
      Giuliani, Rudolph
    
    
      3
      24
      Mike
      Gravel
      
      D
      Gravel, Mike
    
    
      4
      26
      John
      Edwards
      
      D
      Edwards, John
    
    
      5
      29
      Bill
      Richardson
      
      D
      Richardson, Bill
    
    
      6
      30
      Duncan
      Hunter
      
      R
      Hunter, Duncan
    
    
      7
      31
      Dennis
      Kucinich
      
      D
      Kucinich, Dennis
    
    
      8
      32
      Ron
      Paul
      
      R
      Paul, Ron
    
    
      9
      33
      Joseph
      Biden
      
      D
      Biden, Joseph
    
    
      10
      34
      Hillary
      Clinton
      R.
      D
      Clinton, Hillary
    
    
      11
      35
      Mitt
      Romney
      
      R
      Romney, Mitt
    
    
      12
      36
      Samuel
      Brownback
      
      R
      Brownback, Samuel
    
    
      13
      37
      John
      McCain
      
      R
      McCain, John
    
    
      14
      38
      Tom
      Tancredo
      
      R
      Tancredo, Tom
    
    
      15
      39
      Christopher
      Dodd
      J.
      D
      Dodd, Christopher
    
    
      16
      41
      Fred
      Thompson
      D.
      R
      Thompson, Fred

Here's another update, this time on an existing column.



In [15]:

    
update = '''UPDATE candidates SET full_name = "Eventual Winner" WHERE last_name = "Obama"'''
cursor.execute(update)
update = '''UPDATE candidates SET full_name = "Eventual Loser" WHERE last_name = "McCain"'''
cursor.execute(update)
viz_tables(candidate_cols, query)









    Out[15]:







  
    
      
      id
      first_name
      last_name
      middle_init
      party
      full_name
    
  
  
    
      0
      16
      Mike
      Huckabee
      
      R
      Huckabee, Mike
    
    
      1
      20
      Barack
      Obama
      
      D
      Eventual Winner
    
    
      2
      22
      Rudolph
      Giuliani
      
      R
      Giuliani, Rudolph
    
    
      3
      24
      Mike
      Gravel
      
      D
      Gravel, Mike
    
    
      4
      26
      John
      Edwards
      
      D
      Edwards, John
    
    
      5
      29
      Bill
      Richardson
      
      D
      Richardson, Bill
    
    
      6
      30
      Duncan
      Hunter
      
      R
      Hunter, Duncan
    
    
      7
      31
      Dennis
      Kucinich
      
      D
      Kucinich, Dennis
    
    
      8
      32
      Ron
      Paul
      
      R
      Paul, Ron
    
    
      9
      33
      Joseph
      Biden
      
      D
      Biden, Joseph
    
    
      10
      34
      Hillary
      Clinton
      R.
      D
      Clinton, Hillary
    
    
      11
      35
      Mitt
      Romney
      
      R
      Romney, Mitt
    
    
      12
      36
      Samuel
      Brownback
      
      R
      Brownback, Samuel
    
    
      13
      37
      John
      McCain
      
      R
      Eventual Loser
    
    
      14
      38
      Tom
      Tancredo
      
      R
      Tancredo, Tom
    
    
      15
      39
      Christopher
      Dodd
      J.
      D
      Dodd, Christopher
    
    
      16
      41
      Fred
      Thompson
      D.
      R
      Thompson, Fred

Do the following:

Add a new column to the contributors table called full_name. The value in that column should be in the form last_name, first_name.
Change the value in the full_name column to the string "Too Much" if someone donated more than $\$1000.00$.

Step 6: Aggregation

You can perform some nice operations on the values in the database. For example, you can compute the maximum, minimum, and sum of a set. You can also count the number of items in a given set. Here's a little example. You can do the rest.



In [16]:

    
contributor_cols = [col[1] for col in cursor.execute("PRAGMA table_info(contributors)")] # You've already done this part.  I just need to do it here b/c I haven't yet.
function = '''SELECT *, MAX(amount) AS max_amount FROM contributors'''
viz_tables(contributor_cols, function)









    Out[16]:







  
    
      
      id
      last_name
      first_name
      middle_name
      street_1
      street_2
      city
      state
      zip
      amount
      date
      candidate_id
    
  
  
    
      0
      31
      Buckel
      Linda
      
      PO Box 683130
      
      Park City
      UT
      840683130
      4600.0
      2007-08-14
      20

Do the following:

Count how many donations there were above $\$1000.00$.
Calculate the average donation.
Calculate the average contribution from each state and display in a table.
- Hint: Use code that looks like:
```
"SELECT state,SUM(amount) FROM contributors GROUP BY state"
```

Step 7: DELETE

We have already noted that SQLite can't drop columns in a straightfoward manner. However, it can delete rows quite simply. Here's the syntax:

deletion = '''DELETE FROM table_name WHERE condition'''

Do the following:

Delete rows in the contributors table with last name "Ahrens".

Step 8: LIMIT

The LIMIT clause offers convenient functionality. It allows you to constrain the number of rows returned by your query. It shows up in many guises.



In [17]:

    
query = '''SELECT * FROM candidates LIMIT 3'''
viz_tables(candidate_cols, query)









    Out[17]:







  
    
      
      id
      first_name
      last_name
      middle_init
      party
      full_name
    
  
  
    
      0
      16
      Mike
      Huckabee
      
      R
      Huckabee, Mike
    
    
      1
      20
      Barack
      Obama
      
      D
      Eventual Winner
    
    
      2
      22
      Rudolph
      Giuliani
      
      R
      Giuliani, Rudolph



In [18]:

    
query = '''SELECT * FROM candidates LIMIT 4 OFFSET 5'''
viz_tables(candidate_cols, query)









    Out[18]:







  
    
      
      id
      first_name
      last_name
      middle_init
      party
      full_name
    
  
  
    
      0
      29
      Bill
      Richardson
      
      D
      Richardson, Bill
    
    
      1
      30
      Duncan
      Hunter
      
      R
      Hunter, Duncan
    
    
      2
      31
      Dennis
      Kucinich
      
      D
      Kucinich, Dennis
    
    
      3
      32
      Ron
      Paul
      
      R
      Paul, Ron



In [19]:

    
query = '''SELECT * FROM candidates ORDER BY last_name LIMIT 4 OFFSET 5'''
viz_tables(candidate_cols, query)









    Out[19]:







  
    
      
      id
      first_name
      last_name
      middle_init
      party
      full_name
    
  
  
    
      0
      22
      Rudolph
      Giuliani
      
      R
      Giuliani, Rudolph
    
    
      1
      24
      Mike
      Gravel
      
      D
      Gravel, Mike
    
    
      2
      16
      Mike
      Huckabee
      
      R
      Huckabee, Mike
    
    
      3
      30
      Duncan
      Hunter
      
      R
      Hunter, Duncan

Do the following:

Query and display the ten most generous donors.
Query and display the ten least generous donors who donated a positive amount of money (since the data we have has some negative numbers in it...).

Save

Don't forget to save all of these changes to your database using db.commit(). Before closing shop, be sure to close the database connection with db.close().

	id	first_name	last_name	middle_init	party
0	16	Mike	Huckabee		R
1	20	Barack	Obama		D
2	22	Rudolph	Giuliani		R
3	24	Mike	Gravel		D
4	26	John	Edwards		D
5	29	Bill	Richardson		D
6	30	Duncan	Hunter		R
7	31	Dennis	Kucinich		D
8	32	Ron	Paul		R
9	33	Joseph	Biden		D
10	34	Hillary	Clinton	R.	D
11	35	Mitt	Romney		R
12	36	Samuel	Brownback		R
13	37	John	McCain		R
14	38	Tom	Tancredo		R
15	39	Christopher	Dodd	J.	D
16	41	Fred	Thompson	D.	R

	id	first_name	last_name	middle_init	party	full_name
0	16	Mike	Huckabee		R	None
1	20	Barack	Obama		D	None
2	22	Rudolph	Giuliani		R	None
3	24	Mike	Gravel		D	None
4	26	John	Edwards		D	None
5	29	Bill	Richardson		D	None
6	30	Duncan	Hunter		R	None
7	31	Dennis	Kucinich		D	None
8	32	Ron	Paul		R	None
9	33	Joseph	Biden		D	None
10	34	Hillary	Clinton	R.	D	None
11	35	Mitt	Romney		R	None
12	36	Samuel	Brownback		R	None
13	37	John	McCain		R	None
14	38	Tom	Tancredo		R	None
15	39	Christopher	Dodd	J.	D	None
16	41	Fred	Thompson	D.	R	None

	id	first_name	last_name	middle_init	party	full_name
0	16	Mike	Huckabee		R	Huckabee, Mike
1	20	Barack	Obama		D	Obama, Barack
2	22	Rudolph	Giuliani		R	Giuliani, Rudolph
3	24	Mike	Gravel		D	Gravel, Mike
4	26	John	Edwards		D	Edwards, John
5	29	Bill	Richardson		D	Richardson, Bill
6	30	Duncan	Hunter		R	Hunter, Duncan
7	31	Dennis	Kucinich		D	Kucinich, Dennis
8	32	Ron	Paul		R	Paul, Ron
9	33	Joseph	Biden		D	Biden, Joseph
10	34	Hillary	Clinton	R.	D	Clinton, Hillary
11	35	Mitt	Romney		R	Romney, Mitt
12	36	Samuel	Brownback		R	Brownback, Samuel
13	37	John	McCain		R	McCain, John
14	38	Tom	Tancredo		R	Tancredo, Tom
15	39	Christopher	Dodd	J.	D	Dodd, Christopher
16	41	Fred	Thompson	D.	R	Thompson, Fred

Lecture 18

Wednesday, November 8th, 2017

Databases with SQlite

SQLite Exercises

Exercise Deliverables

Table of Contents

Setting the Stage

Step 1

Interlude

Step 2: Various Queries

Do the following queries:

Step 3: Sorting

Do the following sorts on the contributors table:

Step 4: Selecting Columns

Do the following:

Step 5: Altering Tables

Do the following:

Step 6: Aggregation

Do the following:

Step 7: DELETE

Do the following:

Step 8: LIMIT

Do the following:

Save

Databases with `SQlite`

`SQLite` Exercises

Do the following sorts on the `contributors` table: