Lecture 19

Monday, November 13th 2017

Joins with `SQLite`, `pandas`

Starting Up

You can connect to the saved database from last time if you want. Alternatively, for extra practice, you can just recreate it from the datasets provided in the .txt files. That's what I'll do.



In [1]:

    
import sqlite3
import numpy as np
import pandas as pd
import time
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)

db = sqlite3.connect('L19DB_demo.sqlite')
cursor = db.cursor()
cursor.execute("DROP TABLE IF EXISTS candidates")
cursor.execute("DROP TABLE IF EXISTS contributors")
cursor.execute("PRAGMA foreign_keys=1")

cursor.execute('''CREATE TABLE candidates (
               id INTEGER PRIMARY KEY NOT NULL, 
               first_name TEXT, 
               last_name TEXT, 
               middle_init TEXT, 
               party TEXT NOT NULL)''')

db.commit() # Commit changes to the database

cursor.execute('''CREATE TABLE contributors (
          id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, 
          last_name TEXT, 
          first_name TEXT, 
          middle_name TEXT, 
          street_1 TEXT, 
          street_2 TEXT, 
          city TEXT, 
          state TEXT, 
          zip TEXT, 
          amount REAL, 
          date DATETIME, 
          candidate_id INTEGER NOT NULL, 
          FOREIGN KEY(candidate_id) REFERENCES candidates(id))''')

db.commit()

with open ("candidates.txt") as candidates:
    next(candidates) # jump over the header
    for line in candidates.readlines():
        cid, first_name, last_name, middle_name, party = line.strip().split('|')
        vals_to_insert = (int(cid), first_name, last_name, middle_name, party)
        cursor.execute('''INSERT INTO candidates 
                  (id, first_name, last_name, middle_init, party)
                  VALUES (?, ?, ?, ?, ?)''', vals_to_insert)

with open ("contributors.txt") as contributors:
    next(contributors)
    for line in contributors.readlines():
        cid, last_name, first_name, middle_name, street_1, street_2, \
            city, state, zip_code, amount, date, candidate_id = line.strip().split('|')
        vals_to_insert = (last_name, first_name, middle_name, street_1, street_2, 
                          city, state, int(zip_code), amount, date, candidate_id)
        cursor.execute('''INSERT INTO contributors (last_name, first_name, middle_name, 
                           street_1, street_2, city, state, zip, amount, date, candidate_id) 
                           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)''', vals_to_insert)

candidate_cols = [col[1] for col in cursor.execute("PRAGMA table_info(candidates)")]
contributor_cols = [col[1] for col in cursor.execute("PRAGMA table_info(contributors)")]

def viz_tables(cols, query):
    q = cursor.execute(query).fetchall()
    framelist = []
    for i, col_name in enumerate(cols):
        framelist.append((col_name, [col[i] for col in q]))
    return pd.DataFrame.from_items(framelist)

Recap

Last time, you played with a bunch of SQLite commands to query and update the tables in the database.

One thing we didn't get to was how to query the contributors table based off of a query in the candidates table. For example, suppose you want to query which contributors donated to Obama. You could use a nested SELECT statement to accomplish that.



In [2]:

    
query = '''SELECT * FROM contributors WHERE candidate_id = (SELECT id from candidates WHERE last_name = "Obama")'''
viz_tables(contributor_cols, query)









    Out[2]:







  
    
      
      id
      last_name
      first_name
      middle_name
      street_1
      street_2
      city
      state
      zip
      amount
      date
      candidate_id
    
  
  
    
      0
      26
      Buckler
      Steve
      
      24351 Armada Dr
      
      Dana Point
      CA
      926291306
      50.00
      2007-07-30
      20
    
    
      1
      27
      Buckler
      Steve
      
      24351 Armada Dr
      
      Dana Point
      CA
      926291306
      25.00
      2007-08-16
      20
    
    
      2
      28
      Buckheit
      Bruce
      
      8904 KAREN DR
      
      FAIRFAX
      VA
      220312731
      100.00
      2007-09-19
      20
    
    
      3
      29
      Buckel
      Linda
      
      PO Box 683130
      
      Park City
      UT
      840683130
      2300.00
      2007-08-14
      20
    
    
      4
      30
      Buckel
      Linda
      
      PO Box 683130
      
      Park City
      UT
      840683130
      -2300.00
      2007-08-14
      20
    
    
      5
      31
      Buckel
      Linda
      
      PO Box 683130
      
      Park City
      UT
      840683130
      4600.00
      2007-08-14
      20
    
    
      6
      32
      Buck
      Thomas
      
      4206 Terrace Street
      
      Kansas City
      MO
      64111
      100.00
      2007-09-25
      20
    
    
      7
      33
      Buck
      Jay
      K.
      1855 Old Willow Rd Unit 322
      
      Northfield
      IL
      600932918
      200.00
      2007-09-12
      20
    
    
      8
      34
      Buck
      Blaine
      M
      45 Eaton Ave
      
      Camden
      ME
      48431752
      2300.00
      2007-09-30
      20
    
    
      9
      35
      Buck
      Barbara
      
      1780 NE 138th St
      
      North Miami
      FL
      331811316
      50.00
      2007-09-13
      20
    
    
      10
      36
      Buck
      Barbara
      
      1780 NE 138th St
      
      North Miami
      FL
      331811316
      50.00
      2007-07-19
      20
    
    
      11
      37
      Buchman
      Mark M
      
      2530 Lawton Ave
      
      San Luis Obispo
      CA
      934015622
      460.80
      2007-07-18
      20
    
    
      12
      38
      Bucher
      Ida
      M
      1400 Warnall Ave
      
      Los Angeles
      CA
      900245333
      100.00
      2007-07-10
      20
    
    
      13
      39
      Buchanek
      Elizabeth
      
      7917 Kentbury Dr
      
      Bethesda
      MD
      208144615
      50.00
      2007-09-30
      20
    
    
      14
      40
      Buchanan
      John
      
      2025 NW 29th Rd
      
      Boca Raton
      FL
      334316303
      500.00
      2007-09-24
      20
    
    
      15
      41
      Buchanan
      John
      
      2025 NW 29th Rd
      
      Boca Raton
      FL
      334316303
      -500.00
      2007-09-24
      20
    
    
      16
      42
      Buchanan
      John
      
      2025 NW 29th Rd
      
      Boca Raton
      FL
      334316303
      500.00
      2007-09-24
      20
    
    
      17
      43
      Buchanan
      John
      
      2025 NW 29th Rd
      
      Boca Raton
      FL
      334316303
      700.00
      2007-08-28
      20
    
    
      18
      44
      Buchanan
      John
      
      2025 NW 29th Rd
      
      Boca Raton
      FL
      334316303
      -700.00
      2007-08-28
      20
    
    
      19
      45
      Buchanan
      John
      
      2025 NW 29th Rd
      
      Boca Raton
      FL
      334316303
      1000.00
      2007-08-28
      20
    
    
      20
      46
      Buchanan
      John
      
      2025 NW 29th Rd
      
      Boca Raton
      FL
      334316303
      1300.00
      2007-08-09
      20
    
    
      21
      47
      Buchanan
      John
      
      2025 NW 29th Rd
      
      Boca Raton
      FL
      334316303
      200.00
      2007-08-14
      20
    
    
      22
      48
      Buchanan
      John
      
      2025 NW 29th Rd
      
      Boca Raton
      FL
      334316303
      500.00
      2007-07-25
      20
    
    
      23
      49
      Buchanan
      John
      
      4635 49th St NW
      
      Washington
      DC
      200164320
      200.09
      2007-09-23
      20
    
    
      24
      50
      Harrison
      Ryan
      
      2247 3rd St
      
      La Verne
      CA
      917504918
      25.00
      2007-07-26
      20

Joins

The last example involved querying data from multiple tables.

In particular, we combined columns from the two related tables (related through the FOREIGN KEY).

This leads to the idea of joining multiple tables together. SQL has a set of commands to handle different types of joins. SQLite does not support the full suite of join commands offered by SQL but you should still be able to get the main ideas from the limited command set.

We'll begin with the INNER JOIN.

INNER JOIN

The idea here is that you will combine the tables if the values of certain columns are the same between the two tables. In our example, we will join the two tables based on the candidate id. The result of the INNER JOIN will be a new table consisting of the columns we requested and containing the common data. Since we are joining based off of the candidate id, we will not be excluding any rows.

Example

Here are two tables. Table A has the form:

nA	attr	idA
s1	23	0
s2	7	2

and table B has the form:

nB	attr	idB
t1	60	0
t2	14	7
t3	22	2

Table A is associated with Table B through a foreign key on the id column.

If we join the two tables by comparing the id columns and selecting the nA, nB, and attr columns then we'll get

nA	A.attr	nB	B.attr
s1	23	t1	60
s2	7	t3	22

The SQLite code to do this join would be

SELECT nA, A.attr, nB, B.attr FROM A INNER JOIN B ON B.idB = A.idA

Notice that the second row in table B is gone because the id values are not the same.

Thoughts

What is SQL doing with this operation? It may help to visualize this with a Venn diagram. Table A has rows with values corresponding to the idA attribute. Column B has rows with values corresponding to the idB attribute. The INNER JOIN will combine the two tables such that rows with common entries in the id attributes are included. We essentially have the following Venn diagram.

Exercises

Using an INNER JOIN, join the candidates and contributors tables by comparing the candidate_id and candidates_id columns. Display your joined table with the columns contributors.last_name, contributors.first_name, and candidates.last_name.
Do the same inner join as in the last part, but this time append a WHERE clause to select a specific candidate's last name.

`LEFT JOIN` or `LEFT OUTER JOIN`

There are many ways to combine two tables. We just explored one possibility in which we combined the tables based upon the intersection of the two tables (the INNER JOIN).

Now we'll talk about the LEFT JOIN or LEFT OUTER JOIN.

In words, the LEFT JOIN is combining the tables based upon what is in the intersection of the two tables and what is in the "reference" table.

We can consider our toy example in two guises:

Example A

Let's do a LEFT JOIN of table B from table A. That is, we'd like to make a new table by putting table B into table A. In this case, we'll consider table A our "reference" table. We're comparing by the id column again. We know that these two tables share ids 0 and 2 and table A doesn't have anything else in it. The resulting table is:

nA	A.attr	nB	B.attr
s1	23	t1	60
s2	7	t3	22

That's not very exciting. It's the same result as from the INNER JOIN. We can do another example that may be more enlightening.

Example B

Let's do a LEFT JOIN of table A from table B. That is, we'd like to make a new table by putting table A into table B. In this case, we'll consider table B our "reference" table. Again, we use the id column from comparison. We know that these two tables share ids 0 and 2. This time, table B also contains the id 7, which is not shared by table A. The resulting table is:

nA	A.attr	nB	B.attr
s1	23	t1	60
None	NaN	t2	14
s2	7	t3	22

Notice that SQLite filed in the missing entries for us. This is necessary for completion of the requested join.

The SQLite commands to accomplish all of this are:

SELECT nA, A.attr, nB, B.attr FROM A LEFT JOIN B ON B.idB = A.idA

and

SELECT nA, A.attr, nB, B.attr FROM B LEFT JOIN A ON A.idA = B.idB

Here is a visualization using Venn diagrams of the LEFT JOIN.

Exercises

Use the following two tables to do the first two exercises in this section. Table A has the form:

nA	attr	idA
s1	23	0
s2	7	2
s3	15	2
s4	31	0

and table B has the form:

nB	attr	idB
t1	60	0
t2	14	7
t3	22	2

Draw the table that would result from a LEFT JOIN using table A as the reference and the id columns for comparison.
Draw the table that would result from a LEFT JOIN using table B as the reference and the id columns for comparison.
Create a new table with the following form:

average contribution	number of contributors	candidate last name
...	...	...

The table should be created using the LEFT JOIN clause on the contributors table by joining the candidates table by the id column. The average contribution column and number of contributors column should be obtained using the AVG and COUNT SQL functions. Finally, you should use the GROUP BY clause on the candidates last name.

`pandas`

We've been working with databases for the last few lectures and learning SQLite commands to work with and manipulate the databases. There is a Python package called pandas that provides broad support for data structures. It can be used to interact with relationsional databases through its own methods and even through SQL commands.

In the last part of this lecture, you will get to redo a bunch of the database exercises using pandas.

We won't be able to cover pandas from the ground up, but it's a well-documented library and is fairly easy to get up and running. Here's the website: pandas.

Reading a datafile into `pandas`



In [3]:

    
# Using pandas naming convention
dfcand = pd.read_csv("candidates.txt", sep="|")
dfcand









    Out[3]:







  
    
      
      id
      first_name
      last_name
      middle_name
      party
    
  
  
    
      0
      33
      Joseph
      Biden
      NaN
      D
    
    
      1
      36
      Samuel
      Brownback
      NaN
      R
    
    
      2
      34
      Hillary
      Clinton
      R.
      D
    
    
      3
      39
      Christopher
      Dodd
      J.
      D
    
    
      4
      26
      John
      Edwards
      NaN
      D
    
    
      5
      22
      Rudolph
      Giuliani
      NaN
      R
    
    
      6
      24
      Mike
      Gravel
      NaN
      D
    
    
      7
      16
      Mike
      Huckabee
      NaN
      R
    
    
      8
      30
      Duncan
      Hunter
      NaN
      R
    
    
      9
      31
      Dennis
      Kucinich
      NaN
      D
    
    
      10
      37
      John
      McCain
      NaN
      R
    
    
      11
      20
      Barack
      Obama
      NaN
      D
    
    
      12
      32
      Ron
      Paul
      NaN
      R
    
    
      13
      29
      Bill
      Richardson
      NaN
      D
    
    
      14
      35
      Mitt
      Romney
      NaN
      R
    
    
      15
      38
      Tom
      Tancredo
      NaN
      R
    
    
      16
      41
      Fred
      Thompson
      D.
      R



In [4]:

    
dfcontr = pd.read_csv("contributors.txt", sep="|")
dfcontr









    Out[4]:







  
    
      
      id
      last_name
      first_name
      middle_name
      street_1
      street_2
      city
      state
      zip
      amount
      date
      candidate_id
    
  
  
    
      0
      NaN
      Agee
      Steven
      NaN
      549 Laurel Branch Road
      NaN
      Floyd
      VA
      24091
      500.0
      2007-06-30
      16
    
    
      1
      NaN
      Ahrens
      Don
      NaN
      4034 Rennellwood Way
      NaN
      Pleasanton
      CA
      94566
      250.0
      2007-05-16
      16
    
    
      2
      NaN
      Ahrens
      Don
      NaN
      4034 Rennellwood Way
      NaN
      Pleasanton
      CA
      94566
      50.0
      2007-06-18
      16
    
    
      3
      NaN
      Ahrens
      Don
      NaN
      4034 Rennellwood Way
      NaN
      Pleasanton
      CA
      94566
      100.0
      2007-06-21
      16
    
    
      4
      NaN
      Akin
      Charles
      NaN
      10187 Sugar Creek Road
      NaN
      Bentonville
      AR
      72712
      100.0
      2007-06-16
      16
    
    
      5
      NaN
      Akin
      Mike
      NaN
      181 Baywood Lane
      NaN
      Monticello
      AR
      71655
      1500.0
      2007-05-18
      16
    
    
      6
      NaN
      Akin
      Rebecca
      NaN
      181 Baywood Lane
      NaN
      Monticello
      AR
      71655
      500.0
      2007-05-18
      16
    
    
      7
      NaN
      Aldridge
      Brittni
      NaN
      808 Capitol Square Place, SW
      NaN
      Washington
      DC
      20024
      250.0
      2007-06-06
      16
    
    
      8
      NaN
      Allen
      John D.
      NaN
      1052 Cannon Mill Drive
      NaN
      North Augusta
      SC
      29860
      1000.0
      2007-06-11
      16
    
    
      9
      NaN
      Allen
      John D.
      NaN
      1052 Cannon Mill Drive
      NaN
      North Augusta
      SC
      29860
      1300.0
      2007-06-29
      16
    
    
      10
      NaN
      Allison
      John W.
      NaN
      P.O. Box 1089
      NaN
      Conway
      AR
      72033
      1000.0
      2007-05-18
      16
    
    
      11
      NaN
      Allison
      Rebecca
      NaN
      3206 Summit Court
      NaN
      Little Rock
      AR
      72227
      1000.0
      2007-04-25
      16
    
    
      12
      NaN
      Allison
      Rebecca
      NaN
      3206 Summit Court
      NaN
      Little Rock
      AR
      72227
      200.0
      2007-06-12
      16
    
    
      13
      NaN
      Altes
      R.D.
      NaN
      8600 Moody Road
      NaN
      Fort Smith
      AR
      72903
      2300.0
      2007-06-21
      16
    
    
      14
      NaN
      Andres
      Dale
      NaN
      1160 Glen Oaks Drive
      NaN
      West Des Moines
      IA
      50266
      250.0
      2007-06-06
      16
    
    
      15
      NaN
      Anthony
      John
      NaN
      211 Long Island Drive
      NaN
      Hot Springs
      AR
      71913
      2300.0
      2007-06-12
      16
    
    
      16
      NaN
      Arbogast
      Robert
      NaN
      12900 State Route 56 SE
      NaN
      Mount Sterling
      OH
      43143
      500.0
      2007-04-08
      16
    
    
      17
      NaN
      Arbogast
      Robert
      NaN
      12900 State Route 56 SE
      NaN
      Mount Sterling
      OH
      43143
      100.0
      2007-06-22
      16
    
    
      18
      NaN
      Ardle
      William
      NaN
      412 Dakota Avenue
      NaN
      Springfield
      OH
      45504
      50.0
      2007-06-28
      16
    
    
      19
      NaN
      Atiq
      Omar
      NaN
      7200 S Hazel Street
      NaN
      Pine Bluff
      AR
      71603
      1000.0
      2007-05-18
      16
    
    
      20
      NaN
      Atiq
      Omar
      NaN
      7200 S Hazel Street
      NaN
      Pine Bluff
      AR
      71603
      1000.0
      2007-06-27
      16
    
    
      21
      NaN
      Baker
      David
      NaN
      2550 Adamsbrooke Drive
      NaN
      Conway
      AR
      72034
      2300.0
      2007-04-11
      16
    
    
      22
      NaN
      Bancroft
      David
      NaN
      2934 Broderick Street
      NaN
      San Francisco
      CA
      94123
      250.0
      2007-04-24
      16
    
    
      23
      NaN
      Banks
      Charles
      NaN
      P.O. Box 251310
      NaN
      Little Rock
      AR
      72225
      1000.0
      2007-05-14
      16
    
    
      24
      NaN
      Barbee
      John
      NaN
      516 Kellyridge Drive
      NaN
      Apex
      NC
      27502
      500.0
      2007-05-23
      16
    
    
      25
      NaN
      Buckler
      Steve
      NaN
      24351 Armada Dr
      NaN
      Dana Point
      CA
      926291306
      50.0
      2007-07-30
      20
    
    
      26
      NaN
      Buckler
      Steve
      NaN
      24351 Armada Dr
      NaN
      Dana Point
      CA
      926291306
      25.0
      2007-08-16
      20
    
    
      27
      NaN
      Buckheit
      Bruce
      NaN
      8904 KAREN DR
      NaN
      FAIRFAX
      VA
      220312731
      100.0
      2007-09-19
      20
    
    
      28
      NaN
      Buckel
      Linda
      NaN
      PO Box 683130
      NaN
      Park City
      UT
      840683130
      2300.0
      2007-08-14
      20
    
    
      29
      NaN
      Buckel
      Linda
      NaN
      PO Box 683130
      NaN
      Park City
      UT
      840683130
      -2300.0
      2007-08-14
      20
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      145
      NaN
      ABDELLA
      THOMAS
      M.
      4231 MONUMENT WALL WAY #340
      NaN
      FAIRFAX
      VA
      220308440
      50.0
      2007-09-30
      35
    
    
      146
      NaN
      ABBOTT
      WELDON
      S.
      777 EAST SOUTH TEMPLE  4E
      NaN
      SALT LAKE CITY
      UT
      841021269
      100.0
      2007-09-29
      35
    
    
      147
      NaN
      ABBOTT
      WELDON
      S.
      777 EAST SOUTH TEMPLE  4E
      NaN
      SALT LAKE CITY
      UT
      841021269
      50.0
      2007-08-09
      35
    
    
      148
      NaN
      ABBOTT
      GERALD
      F.
      389 BENEFIT STREET
      NaN
      PROVIDENCE
      RI
      29032946
      100.0
      2007-09-15
      35
    
    
      149
      NaN
      ABBOTT
      GERALD
      F.
      389 BENEFIT STREET
      NaN
      PROVIDENCE
      RI
      29032946
      100.0
      2007-08-15
      35
    
    
      150
      NaN
      ABEDIN
      ZAINUL
      NaN
      715 N. CENTRAL AVENUE
      SUITE 212
      GLENDALE
      CA
      912031164
      500.0
      2008-01-21
      37
    
    
      151
      NaN
      ABBOTT
      SYBIL
      F.
      446 GAMES DRIVE
      NaN
      RENO
      NV
      895093326
      75.0
      2008-01-08
      37
    
    
      152
      NaN
      ABBOTT
      SYBIL
      F.
      446 GAMES DRIVE
      NaN
      RENO
      NV
      895093326
      50.0
      2008-01-08
      37
    
    
      153
      NaN
      ABBOTT
      RONALD
      LEANDER
      5453 HAWTHORNE STREET
      NaN
      MONTCLAIR
      CA
      917632551
      200.0
      2008-01-31
      37
    
    
      154
      NaN
      ABBOTT
      RONALD
      LEANDER
      5453 HAWTHORNE STREET
      NaN
      MONTCLAIR
      CA
      917632551
      100.0
      2008-01-08
      37
    
    
      155
      NaN
      ABBOTT
      ROBERT
      A.
      3061 LOREE ROAD
      NaN
      DECKERVILLE
      MI
      484279763
      500.0
      2008-01-21
      37
    
    
      156
      NaN
      ABBOTT
      MIKE
      E.
      4516 OSPREY LNDG
      NaN
      NICEVILLE
      FL
      325786810
      1000.0
      2008-01-15
      37
    
    
      157
      NaN
      ABBOT
      DAVID
      M.
      56 SALEM STREET
      NaN
      ANDOVER
      MA
      18102114
      200.0
      2008-01-21
      37
    
    
      158
      NaN
      ABBO
      PAULINE
      MORENCY
      10720 JACOB LANE
      NaN
      WHITE LAKE
      MI
      483862274
      35.0
      2008-01-07
      37
    
    
      159
      NaN
      ABATE
      MARIA
      ELENA
      1291 NIGHTINGALE AVENUE
      NaN
      MIAMI SPRINGS
      FL
      331663832
      2600.0
      2008-01-25
      37
    
    
      160
      NaN
      ABAIR
      PETER
      NaN
      40 EVANS STREET
      NaN
      WATERTOWN
      MA
      24722150
      25.0
      2008-01-09
      37
    
    
      161
      NaN
      ABACHERLI
      SHIRLEY
      M.
      29875 NEWPORT ROAD
      NaN
      MENIFEE
      CA
      925849524
      150.0
      2008-01-28
      37
    
    
      162
      NaN
      AARONS
      CHARLES
      NaN
      1730 SHORE DRIVE
      NaN
      ANCHORAGE
      AK
      995153207
      300.0
      2008-01-30
      37
    
    
      163
      NaN
      AARONS
      CHARLES
      NaN
      1730 SHORE DRIVE
      NaN
      ANCHORAGE
      AK
      995153207
      410.0
      2008-01-15
      37
    
    
      164
      NaN
      AARONS
      CHARLES
      NaN
      1730 SHORE DRIVE
      NaN
      ANCHORAGE
      AK
      995153207
      500.0
      2008-01-09
      37
    
    
      165
      NaN
      ABEL
      JOHN
      H.
      422 THOMAS STREET
      NaN
      BETHLEHEM
      PA
      180153316
      200.0
      2008-01-22
      37
    
    
      166
      NaN
      ABEL
      MARLING
      L.
      14 HANGING MOSS LANE
      NaN
      GREENVILLE
      SC
      296155069
      100.0
      2008-01-22
      37
    
    
      167
      NaN
      ABEL
      RUDOLPH
      NaN
      4532 OCEAN BLVD.
      # 108
      SARASOTA
      FL
      342421337
      100.0
      2008-01-08
      37
    
    
      168
      NaN
      ABELE
      RODNEY
      NaN
      3620 METAIRIE HEIGHTS AVENUE
      NaN
      METAIRIE
      LA
      700021823
      500.0
      2008-01-15
      37
    
    
      169
      NaN
      ABERCROMBIE
      DENIS
      NaN
      11811 WATER OAK CT
      NaN
      MAGNOLIA
      TX
      773546270
      500.0
      2008-01-30
      37
    
    
      170
      NaN
      ABESHAUS
      MERRILL
      M.
      1801 N. HEREFORD DRIVE
      NaN
      FLAGSTAFF
      AZ
      860011121
      120.0
      2008-01-16
      37
    
    
      171
      NaN
      ABRAHAM
      GEORGE
      NaN
      P.O. BOX 1504
      NaN
      LAKE CHARLES
      LA
      706021504
      800.0
      2008-01-17
      37
    
    
      172
      NaN
      ABRAHAMSON
      PETER
      J.
      1030 W. ROSCOE STREET
      NaN
      CHICAGO
      IL
      606572207
      50.0
      2008-01-25
      37
    
    
      173
      NaN
      ABRAHAM
      SALEM
      A.
      P.O. BOX 7
      NaN
      CANADIAN
      TX
      790140007
      1000.0
      2008-01-17
      37
    
    
      174
      NaN
      ABRAHAM
      SALEM
      A.
      P.O. BOX 7
      NaN
      CANADIAN
      TX
      790140007
      1300.0
      2008-01-30
      37
    
  

175 rows × 12 columns

Reading things in is quite easy with pandas.

Notice that pandas populates empty fields with NaN values.

The id column in the contributors dataset is superfluous. Let's delete it.



In [5]:

    
del dfcontr['id']
dfcontr.head()









    Out[5]:







  
    
      
      last_name
      first_name
      middle_name
      street_1
      street_2
      city
      state
      zip
      amount
      date
      candidate_id
    
  
  
    
      0
      Agee
      Steven
      NaN
      549 Laurel Branch Road
      NaN
      Floyd
      VA
      24091
      500.0
      2007-06-30
      16
    
    
      1
      Ahrens
      Don
      NaN
      4034 Rennellwood Way
      NaN
      Pleasanton
      CA
      94566
      250.0
      2007-05-16
      16
    
    
      2
      Ahrens
      Don
      NaN
      4034 Rennellwood Way
      NaN
      Pleasanton
      CA
      94566
      50.0
      2007-06-18
      16
    
    
      3
      Ahrens
      Don
      NaN
      4034 Rennellwood Way
      NaN
      Pleasanton
      CA
      94566
      100.0
      2007-06-21
      16
    
    
      4
      Akin
      Charles
      NaN
      10187 Sugar Creek Road
      NaN
      Bentonville
      AR
      72712
      100.0
      2007-06-16
      16

Very nice! And we used the head method to print out the first five rows.

Creating a Table with `pandas`

We can use pandas to create tables in a database.

First, let's create a new database since we've already done a lot on our test database.



In [6]:

    
dbp = sqlite3.connect('L19_pandas_DB.sqlite')
csr = dbp.cursor()
csr.execute("DROP TABLE IF EXISTS candidates")
csr.execute("DROP TABLE IF EXISTS contributors")
csr.execute("PRAGMA foreign_keys=1")

csr.execute('''CREATE TABLE candidates (
               id INTEGER PRIMARY KEY NOT NULL, 
               first_name TEXT, 
               last_name TEXT, 
               middle_name TEXT, 
               party TEXT NOT NULL)''')

dbp.commit() # Commit changes to the database

csr.execute('''CREATE TABLE contributors (
          id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, 
          last_name TEXT, 
          first_name TEXT, 
          middle_name TEXT, 
          street_1 TEXT, 
          street_2 TEXT, 
          city TEXT, 
          state TEXT, 
          zip TEXT, 
          amount REAL, 
          date DATETIME, 
          candidate_id INTEGER NOT NULL, 
          FOREIGN KEY(candidate_id) REFERENCES candidates(id))''')

dbp.commit()

Last time, we opened the data files with Python and then manually used SQLite commands to populate the individual tables. We can use pandas instead like so.



In [7]:

    
dfcand.to_sql("candidates", dbp, if_exists="append", index=False)

How big is our table?



In [8]:

    
dfcand.shape









    Out[8]:





(17, 5)

We can visualize the data in our pandas-populated table. No surprises here except that pandas did everything for us.



In [9]:

    
query = '''SELECT * FROM candidates'''
csr.execute(query).fetchall()









    Out[9]:





[(16, 'Mike', 'Huckabee', None, 'R'),
 (20, 'Barack', 'Obama', None, 'D'),
 (22, 'Rudolph', 'Giuliani', None, 'R'),
 (24, 'Mike', 'Gravel', None, 'D'),
 (26, 'John', 'Edwards', None, 'D'),
 (29, 'Bill', 'Richardson', None, 'D'),
 (30, 'Duncan', 'Hunter', None, 'R'),
 (31, 'Dennis', 'Kucinich', None, 'D'),
 (32, 'Ron', 'Paul', None, 'R'),
 (33, 'Joseph', 'Biden', None, 'D'),
 (34, 'Hillary', 'Clinton', 'R.', 'D'),
 (35, 'Mitt', 'Romney', None, 'R'),
 (36, 'Samuel', 'Brownback', None, 'R'),
 (37, 'John', 'McCain', None, 'R'),
 (38, 'Tom', 'Tancredo', None, 'R'),
 (39, 'Christopher', 'Dodd', 'J.', 'D'),
 (41, 'Fred', 'Thompson', 'D.', 'R')]

Querying a table with `pandas`

One Way



In [10]:

    
dfcand.query("first_name=='Mike' & party=='D'")









    Out[10]:







  
    
      
      id
      first_name
      last_name
      middle_name
      party
    
  
  
    
      6
      24
      Mike
      Gravel
      NaN
      D

Another Way



In [11]:

    
dfcand[(dfcand.first_name=="Mike") & (dfcand.party=="D")]









    Out[11]:







  
    
      
      id
      first_name
      last_name
      middle_name
      party
    
  
  
    
      6
      24
      Mike
      Gravel
      NaN
      D

More Queries



In [12]:

    
dfcand[dfcand.middle_name.notnull()]









    Out[12]:







  
    
      
      id
      first_name
      last_name
      middle_name
      party
    
  
  
    
      2
      34
      Hillary
      Clinton
      R.
      D
    
    
      3
      39
      Christopher
      Dodd
      J.
      D
    
    
      16
      41
      Fred
      Thompson
      D.
      R



In [13]:

    
dfcand[dfcand.first_name.isin(['Mike', 'Hillary'])]









    Out[13]:







  
    
      
      id
      first_name
      last_name
      middle_name
      party
    
  
  
    
      2
      34
      Hillary
      Clinton
      R.
      D
    
    
      6
      24
      Mike
      Gravel
      NaN
      D
    
    
      7
      16
      Mike
      Huckabee
      NaN
      R

Exercises

Use pandas to populate the contributors table.
Query the contributors tables with the following:
1. List entries where the state is "VA" and the amount is less than $\$400.00$.
2. List entries where the state is "NULL".
3. List entries for the states of Texas and Pennsylvania.
4. List entries where the amount contributed is between $\$10.00$ and $\$50.00$.

Sorting



In [14]:

    
dfcand.sort_values(by='party')









    Out[14]:







  
    
      
      id
      first_name
      last_name
      middle_name
      party
    
  
  
    
      0
      33
      Joseph
      Biden
      NaN
      D
    
    
      2
      34
      Hillary
      Clinton
      R.
      D
    
    
      3
      39
      Christopher
      Dodd
      J.
      D
    
    
      4
      26
      John
      Edwards
      NaN
      D
    
    
      13
      29
      Bill
      Richardson
      NaN
      D
    
    
      6
      24
      Mike
      Gravel
      NaN
      D
    
    
      11
      20
      Barack
      Obama
      NaN
      D
    
    
      9
      31
      Dennis
      Kucinich
      NaN
      D
    
    
      14
      35
      Mitt
      Romney
      NaN
      R
    
    
      12
      32
      Ron
      Paul
      NaN
      R
    
    
      8
      30
      Duncan
      Hunter
      NaN
      R
    
    
      15
      38
      Tom
      Tancredo
      NaN
      R
    
    
      7
      16
      Mike
      Huckabee
      NaN
      R
    
    
      5
      22
      Rudolph
      Giuliani
      NaN
      R
    
    
      1
      36
      Samuel
      Brownback
      NaN
      R
    
    
      10
      37
      John
      McCain
      NaN
      R
    
    
      16
      41
      Fred
      Thompson
      D.
      R



In [15]:

    
dfcand.sort_values(by='party', ascending=False)









    Out[15]:







  
    
      
      id
      first_name
      last_name
      middle_name
      party
    
  
  
    
      8
      30
      Duncan
      Hunter
      NaN
      R
    
    
      7
      16
      Mike
      Huckabee
      NaN
      R
    
    
      15
      38
      Tom
      Tancredo
      NaN
      R
    
    
      14
      35
      Mitt
      Romney
      NaN
      R
    
    
      12
      32
      Ron
      Paul
      NaN
      R
    
    
      10
      37
      John
      McCain
      NaN
      R
    
    
      1
      36
      Samuel
      Brownback
      NaN
      R
    
    
      16
      41
      Fred
      Thompson
      D.
      R
    
    
      5
      22
      Rudolph
      Giuliani
      NaN
      R
    
    
      6
      24
      Mike
      Gravel
      NaN
      D
    
    
      9
      31
      Dennis
      Kucinich
      NaN
      D
    
    
      11
      20
      Barack
      Obama
      NaN
      D
    
    
      4
      26
      John
      Edwards
      NaN
      D
    
    
      13
      29
      Bill
      Richardson
      NaN
      D
    
    
      3
      39
      Christopher
      Dodd
      J.
      D
    
    
      2
      34
      Hillary
      Clinton
      R.
      D
    
    
      0
      33
      Joseph
      Biden
      NaN
      D

Selecting Columns



In [16]:

    
dfcand[['last_name', 'party']]









    Out[16]:







  
    
      
      last_name
      party
    
  
  
    
      0
      Biden
      D
    
    
      1
      Brownback
      R
    
    
      2
      Clinton
      D
    
    
      3
      Dodd
      D
    
    
      4
      Edwards
      D
    
    
      5
      Giuliani
      R
    
    
      6
      Gravel
      D
    
    
      7
      Huckabee
      R
    
    
      8
      Hunter
      R
    
    
      9
      Kucinich
      D
    
    
      10
      McCain
      R
    
    
      11
      Obama
      D
    
    
      12
      Paul
      R
    
    
      13
      Richardson
      D
    
    
      14
      Romney
      R
    
    
      15
      Tancredo
      R
    
    
      16
      Thompson
      R



In [17]:

    
dfcand[['last_name', 'party']].count()









    Out[17]:





last_name    17
party        17
dtype: int64



In [18]:

    
dfcand[['first_name']].drop_duplicates()









    Out[18]:







  
    
      
      first_name
    
  
  
    
      0
      Joseph
    
    
      1
      Samuel
    
    
      2
      Hillary
    
    
      3
      Christopher
    
    
      4
      John
    
    
      5
      Rudolph
    
    
      6
      Mike
    
    
      8
      Duncan
    
    
      9
      Dennis
    
    
      11
      Barack
    
    
      12
      Ron
    
    
      13
      Bill
    
    
      14
      Mitt
    
    
      15
      Tom
    
    
      16
      Fred



In [19]:

    
dfcand[['first_name']].drop_duplicates().count()









    Out[19]:





first_name    15
dtype: int64

Exercises

Sort the contributors table by amount and order in descending order.
Select the first_name and amount columns.
Select the last_name and first_name columns and drop duplicates.
Count how many there are after the duplicates have been dropped.

Altering Tables

Creating a new column is quite easy with pandas.



In [20]:

    
dfcand['name'] = dfcand['last_name'] + ", " + dfcand['first_name']
dfcand









    Out[20]:







  
    
      
      id
      first_name
      last_name
      middle_name
      party
      name
    
  
  
    
      0
      33
      Joseph
      Biden
      NaN
      D
      Biden, Joseph
    
    
      1
      36
      Samuel
      Brownback
      NaN
      R
      Brownback, Samuel
    
    
      2
      34
      Hillary
      Clinton
      R.
      D
      Clinton, Hillary
    
    
      3
      39
      Christopher
      Dodd
      J.
      D
      Dodd, Christopher
    
    
      4
      26
      John
      Edwards
      NaN
      D
      Edwards, John
    
    
      5
      22
      Rudolph
      Giuliani
      NaN
      R
      Giuliani, Rudolph
    
    
      6
      24
      Mike
      Gravel
      NaN
      D
      Gravel, Mike
    
    
      7
      16
      Mike
      Huckabee
      NaN
      R
      Huckabee, Mike
    
    
      8
      30
      Duncan
      Hunter
      NaN
      R
      Hunter, Duncan
    
    
      9
      31
      Dennis
      Kucinich
      NaN
      D
      Kucinich, Dennis
    
    
      10
      37
      John
      McCain
      NaN
      R
      McCain, John
    
    
      11
      20
      Barack
      Obama
      NaN
      D
      Obama, Barack
    
    
      12
      32
      Ron
      Paul
      NaN
      R
      Paul, Ron
    
    
      13
      29
      Bill
      Richardson
      NaN
      D
      Richardson, Bill
    
    
      14
      35
      Mitt
      Romney
      NaN
      R
      Romney, Mitt
    
    
      15
      38
      Tom
      Tancredo
      NaN
      R
      Tancredo, Tom
    
    
      16
      41
      Fred
      Thompson
      D.
      R
      Thompson, Fred

We can change an existing field as well.



In [21]:

    
dfcand.loc[dfcand.first_name == "Mike", "name"]









    Out[21]:





6      Gravel, Mike
7    Huckabee, Mike
Name: name, dtype: object



In [22]:

    
dfcand.loc[dfcand.first_name == "Mike", "name"] = "Mikey"



In [23]:

    
dfcand.query("first_name == 'Mike'")









    Out[23]:







  
    
      
      id
      first_name
      last_name
      middle_name
      party
      name
    
  
  
    
      6
      24
      Mike
      Gravel
      NaN
      D
      Mikey
    
    
      7
      16
      Mike
      Huckabee
      NaN
      R
      Mikey

You may recall that SQLite doesn't have the functionality to drop a column. It's a one-liner with pandas.



In [24]:

    
del dfcand['name']
dfcand









    Out[24]:







  
    
      
      id
      first_name
      last_name
      middle_name
      party
    
  
  
    
      0
      33
      Joseph
      Biden
      NaN
      D
    
    
      1
      36
      Samuel
      Brownback
      NaN
      R
    
    
      2
      34
      Hillary
      Clinton
      R.
      D
    
    
      3
      39
      Christopher
      Dodd
      J.
      D
    
    
      4
      26
      John
      Edwards
      NaN
      D
    
    
      5
      22
      Rudolph
      Giuliani
      NaN
      R
    
    
      6
      24
      Mike
      Gravel
      NaN
      D
    
    
      7
      16
      Mike
      Huckabee
      NaN
      R
    
    
      8
      30
      Duncan
      Hunter
      NaN
      R
    
    
      9
      31
      Dennis
      Kucinich
      NaN
      D
    
    
      10
      37
      John
      McCain
      NaN
      R
    
    
      11
      20
      Barack
      Obama
      NaN
      D
    
    
      12
      32
      Ron
      Paul
      NaN
      R
    
    
      13
      29
      Bill
      Richardson
      NaN
      D
    
    
      14
      35
      Mitt
      Romney
      NaN
      R
    
    
      15
      38
      Tom
      Tancredo
      NaN
      R
    
    
      16
      41
      Fred
      Thompson
      D.
      R

Exercises

Create a name column for the contributors table with field entries of the form "last name, first name"
For contributors from the state of "PA", change the name to "X".
Delete the newly created name column.

Aggregation

We'd like to get information about the tables such as the maximum amount contributed to the candidates. Here are a bunch of way to describe the tables.



In [25]:

    
dfcand.describe()

It's not very interesting with the candidates table because the candidates table only has one numeric column.

Exercise

Use the describe() method on the contributors table.

I'll use the contributors table to do some demos now.



In [26]:

    
dfcontr.amount.max()









    Out[26]:





4600.0



In [27]:

    
dfcontr[dfcontr.amount==dfcontr.amount.max()]









    Out[27]:







  
    
      
      last_name
      first_name
      middle_name
      street_1
      street_2
      city
      state
      zip
      amount
      date
      candidate_id
    
  
  
    
      30
      Buckel
      Linda
      NaN
      PO Box 683130
      NaN
      Park City
      UT
      840683130
      4600.0
      2007-08-14
      20



In [28]:

    
dfcontr.groupby("state").sum()









    Out[28]:







  
    
      
      zip
      amount
      candidate_id
    
    
      state
      
      
      
    
  
  
    
      AK
      2985459621
      1210.00
      111
    
    
      AR
      864790
      14200.00
      192
    
    
      AZ
      860011121
      120.00
      37
    
    
      CA
      14736360720
      -5013.73
      600
    
    
      CO
      2405477834
      -5823.00
      111
    
    
      CT
      68901376
      2300.00
      35
    
    
      DC
      800341853
      -1549.91
      102
    
    
      FL
      8970626520
      -4050.00
      803
    
    
      IA
      50266
      250.00
      16
    
    
      ID
      83648
      -261.00
      32
    
    
      IL
      3042068689
      -5586.80
      175
    
    
      KS
      66215
      -330.00
      32
    
    
      KY
      402597029
      -200.00
      22
    
    
      LA
      1406043327
      1300.00
      74
    
    
      MA
      123026638
      -83.00
      208
    
    
      MD
      416287617
      300.00
      55
    
    
      ME
      165647170
      2520.00
      122
    
    
      MI
      2426973485
      -1265.00
      164
    
    
      MN
      1102338918
      322.00
      100
    
    
      MO
      64111
      100.00
      20
    
    
      NC
      27502
      500.00
      16
    
    
      NH
      32564424
      -24.60
      32
    
    
      NJ
      70254993
      -817.45
      64
    
    
      NV
      3575889763
      725.00
      144
    
    
      NY
      606129991
      -6474.50
      233
    
    
      OH
      176071
      450.00
      80
    
    
      OK
      2202499044
      800.00
      102
    
    
      PA
      540499020
      -2146.00
      145
    
    
      RI
      58065892
      200.00
      70
    
    
      SC
      296214789
      2400.00
      69
    
    
      TN
      37188
      -25.00
      32
    
    
      TX
      6221452245
      1985.24
      302
    
    
      UT
      9251153394
      5050.00
      340
    
    
      VA
      440691831
      515.92
      135
    
    
      WA
      2941290251
      -500.00
      90



In [29]:

    
dfcontr.groupby("state")["amount"].sum()









    Out[29]:





state
AK     1210.00
AR    14200.00
AZ      120.00
CA    -5013.73
CO    -5823.00
CT     2300.00
DC    -1549.91
FL    -4050.00
IA      250.00
ID     -261.00
IL    -5586.80
KS     -330.00
KY     -200.00
LA     1300.00
MA      -83.00
MD      300.00
ME     2520.00
MI    -1265.00
MN      322.00
MO      100.00
NC      500.00
NH      -24.60
NJ     -817.45
NV      725.00
NY    -6474.50
OH      450.00
OK      800.00
PA    -2146.00
RI      200.00
SC     2400.00
TN      -25.00
TX     1985.24
UT     5050.00
VA      515.92
WA     -500.00
Name: amount, dtype: float64



In [30]:

    
dfcontr.state.unique()









    Out[30]:





array(['VA', 'CA', 'AR', 'DC', 'SC', 'IA', 'OH', 'NC', 'UT', 'MO', 'IL',
       'ME', 'FL', 'MD', 'MI', 'CO', 'WA', 'NY', 'TX', 'KY', 'PA', 'TN',
       'MA', 'MN', 'KS', 'NJ', 'NH', 'ID', 'OK', nan, 'NV', 'CT', 'RI',
       'AK', 'LA', 'AZ'], dtype=object)

There is also a version of the LIMIT clause. It's very intuitive with pandas.



In [31]:

    
dfcand[0:3]









    Out[31]:







  
    
      
      id
      first_name
      last_name
      middle_name
      party
    
  
  
    
      0
      33
      Joseph
      Biden
      NaN
      D
    
    
      1
      36
      Samuel
      Brownback
      NaN
      R
    
    
      2
      34
      Hillary
      Clinton
      R.
      D

The usual Python slicing works just fine!

Joins with `pandas`

pandas has some some documentation on joins: Merge, join, and concatenate. If you want some more reinforcement on the concepts from earlier regarding JOIN, then the pandas documentation may be a good place to get it.

You may also be interested in a comparison with SQL.

To do joins with pandas, we use the merge command.

Here's an example of an explicit inner join:



In [32]:

    
cols_wanted = ['last_name_x', 'first_name_x', 'candidate_id', 'id', 'last_name_y']
dfcontr.merge(dfcand, left_on="candidate_id", right_on="id")[cols_wanted]









    Out[32]:







  
    
      
      last_name_x
      first_name_x
      candidate_id
      id
      last_name_y
    
  
  
    
      0
      Agee
      Steven
      16
      16
      Huckabee
    
    
      1
      Ahrens
      Don
      16
      16
      Huckabee
    
    
      2
      Ahrens
      Don
      16
      16
      Huckabee
    
    
      3
      Ahrens
      Don
      16
      16
      Huckabee
    
    
      4
      Akin
      Charles
      16
      16
      Huckabee
    
    
      5
      Akin
      Mike
      16
      16
      Huckabee
    
    
      6
      Akin
      Rebecca
      16
      16
      Huckabee
    
    
      7
      Aldridge
      Brittni
      16
      16
      Huckabee
    
    
      8
      Allen
      John D.
      16
      16
      Huckabee
    
    
      9
      Allen
      John D.
      16
      16
      Huckabee
    
    
      10
      Allison
      John W.
      16
      16
      Huckabee
    
    
      11
      Allison
      Rebecca
      16
      16
      Huckabee
    
    
      12
      Allison
      Rebecca
      16
      16
      Huckabee
    
    
      13
      Altes
      R.D.
      16
      16
      Huckabee
    
    
      14
      Andres
      Dale
      16
      16
      Huckabee
    
    
      15
      Anthony
      John
      16
      16
      Huckabee
    
    
      16
      Arbogast
      Robert
      16
      16
      Huckabee
    
    
      17
      Arbogast
      Robert
      16
      16
      Huckabee
    
    
      18
      Ardle
      William
      16
      16
      Huckabee
    
    
      19
      Atiq
      Omar
      16
      16
      Huckabee
    
    
      20
      Atiq
      Omar
      16
      16
      Huckabee
    
    
      21
      Baker
      David
      16
      16
      Huckabee
    
    
      22
      Bancroft
      David
      16
      16
      Huckabee
    
    
      23
      Banks
      Charles
      16
      16
      Huckabee
    
    
      24
      Barbee
      John
      16
      16
      Huckabee
    
    
      25
      Buckler
      Steve
      20
      20
      Obama
    
    
      26
      Buckler
      Steve
      20
      20
      Obama
    
    
      27
      Buckheit
      Bruce
      20
      20
      Obama
    
    
      28
      Buckel
      Linda
      20
      20
      Obama
    
    
      29
      Buckel
      Linda
      20
      20
      Obama
    
    
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      145
      ABDELLA
      THOMAS
      35
      35
      Romney
    
    
      146
      ABBOTT
      WELDON
      35
      35
      Romney
    
    
      147
      ABBOTT
      WELDON
      35
      35
      Romney
    
    
      148
      ABBOTT
      GERALD
      35
      35
      Romney
    
    
      149
      ABBOTT
      GERALD
      35
      35
      Romney
    
    
      150
      ABEDIN
      ZAINUL
      37
      37
      McCain
    
    
      151
      ABBOTT
      SYBIL
      37
      37
      McCain
    
    
      152
      ABBOTT
      SYBIL
      37
      37
      McCain
    
    
      153
      ABBOTT
      RONALD
      37
      37
      McCain
    
    
      154
      ABBOTT
      RONALD
      37
      37
      McCain
    
    
      155
      ABBOTT
      ROBERT
      37
      37
      McCain
    
    
      156
      ABBOTT
      MIKE
      37
      37
      McCain
    
    
      157
      ABBOT
      DAVID
      37
      37
      McCain
    
    
      158
      ABBO
      PAULINE
      37
      37
      McCain
    
    
      159
      ABATE
      MARIA
      37
      37
      McCain
    
    
      160
      ABAIR
      PETER
      37
      37
      McCain
    
    
      161
      ABACHERLI
      SHIRLEY
      37
      37
      McCain
    
    
      162
      AARONS
      CHARLES
      37
      37
      McCain
    
    
      163
      AARONS
      CHARLES
      37
      37
      McCain
    
    
      164
      AARONS
      CHARLES
      37
      37
      McCain
    
    
      165
      ABEL
      JOHN
      37
      37
      McCain
    
    
      166
      ABEL
      MARLING
      37
      37
      McCain
    
    
      167
      ABEL
      RUDOLPH
      37
      37
      McCain
    
    
      168
      ABELE
      RODNEY
      37
      37
      McCain
    
    
      169
      ABERCROMBIE
      DENIS
      37
      37
      McCain
    
    
      170
      ABESHAUS
      MERRILL
      37
      37
      McCain
    
    
      171
      ABRAHAM
      GEORGE
      37
      37
      McCain
    
    
      172
      ABRAHAMSON
      PETER
      37
      37
      McCain
    
    
      173
      ABRAHAM
      SALEM
      37
      37
      McCain
    
    
      174
      ABRAHAM
      SALEM
      37
      37
      McCain
    
  

175 rows × 5 columns

Somewhat organized example



In [33]:

    
dfcontr.merge(dfcand, left_on="candidate_id", right_on="id")[cols_wanted].groupby('last_name_y').describe()









    Out[33]:







  
    
      
      candidate_id
      id
    
    
      
      count
      mean
      std
      min
      25%
      50%
      75%
      max
      count
      mean
      std
      min
      25%
      50%
      75%
      max
    
    
      last_name_y
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      Clinton
      25.0
      34.0
      0.0
      34.0
      34.0
      34.0
      34.0
      34.0
      25.0
      34.0
      0.0
      34.0
      34.0
      34.0
      34.0
      34.0
    
    
      Giuliani
      25.0
      22.0
      0.0
      22.0
      22.0
      22.0
      22.0
      22.0
      25.0
      22.0
      0.0
      22.0
      22.0
      22.0
      22.0
      22.0
    
    
      Huckabee
      25.0
      16.0
      0.0
      16.0
      16.0
      16.0
      16.0
      16.0
      25.0
      16.0
      0.0
      16.0
      16.0
      16.0
      16.0
      16.0
    
    
      McCain
      25.0
      37.0
      0.0
      37.0
      37.0
      37.0
      37.0
      37.0
      25.0
      37.0
      0.0
      37.0
      37.0
      37.0
      37.0
      37.0
    
    
      Obama
      25.0
      20.0
      0.0
      20.0
      20.0
      20.0
      20.0
      20.0
      25.0
      20.0
      0.0
      20.0
      20.0
      20.0
      20.0
      20.0
    
    
      Paul
      25.0
      32.0
      0.0
      32.0
      32.0
      32.0
      32.0
      32.0
      25.0
      32.0
      0.0
      32.0
      32.0
      32.0
      32.0
      32.0
    
    
      Romney
      25.0
      35.0
      0.0
      35.0
      35.0
      35.0
      35.0
      35.0
      25.0
      35.0
      0.0
      35.0
      35.0
      35.0
      35.0
      35.0

Other Joins with `pandas`

We didn't cover all possible joins because SQLite can only handle the few that we did discuss. As mentioned, there are workarounds for some things in SQLite, but not evertyhing. Fortunately, pandas can handle pretty much everything. Here are a few joins that pandas can handle:

LEFT OUTER (already discussed)
RIGHT OUTER - Think of the "opposite" of a LEFT OUTER join (shade the intersection and right set in the Venn diagram).
FULL OUTER - Combine everything from both tables (shade the entire Venn diagram)

Left Outer Join with `pandas`



In [34]:

    
dfcontr.merge(dfcand, left_on="candidate_id", right_on="id", how="left")[cols_wanted]









    Out[34]:







  
    
      
      last_name_x
      first_name_x
      candidate_id
      id
      last_name_y
    
  
  
    
      0
      Agee
      Steven
      16
      16
      Huckabee
    
    
      1
      Ahrens
      Don
      16
      16
      Huckabee
    
    
      2
      Ahrens
      Don
      16
      16
      Huckabee
    
    
      3
      Ahrens
      Don
      16
      16
      Huckabee
    
    
      4
      Akin
      Charles
      16
      16
      Huckabee
    
    
      5
      Akin
      Mike
      16
      16
      Huckabee
    
    
      6
      Akin
      Rebecca
      16
      16
      Huckabee
    
    
      7
      Aldridge
      Brittni
      16
      16
      Huckabee
    
    
      8
      Allen
      John D.
      16
      16
      Huckabee
    
    
      9
      Allen
      John D.
      16
      16
      Huckabee
    
    
      10
      Allison
      John W.
      16
      16
      Huckabee
    
    
      11
      Allison
      Rebecca
      16
      16
      Huckabee
    
    
      12
      Allison
      Rebecca
      16
      16
      Huckabee
    
    
      13
      Altes
      R.D.
      16
      16
      Huckabee
    
    
      14
      Andres
      Dale
      16
      16
      Huckabee
    
    
      15
      Anthony
      John
      16
      16
      Huckabee
    
    
      16
      Arbogast
      Robert
      16
      16
      Huckabee
    
    
      17
      Arbogast
      Robert
      16
      16
      Huckabee
    
    
      18
      Ardle
      William
      16
      16
      Huckabee
    
    
      19
      Atiq
      Omar
      16
      16
      Huckabee
    
    
      20
      Atiq
      Omar
      16
      16
      Huckabee
    
    
      21
      Baker
      David
      16
      16
      Huckabee
    
    
      22
      Bancroft
      David
      16
      16
      Huckabee
    
    
      23
      Banks
      Charles
      16
      16
      Huckabee
    
    
      24
      Barbee
      John
      16
      16
      Huckabee
    
    
      25
      Buckler
      Steve
      20
      20
      Obama
    
    
      26
      Buckler
      Steve
      20
      20
      Obama
    
    
      27
      Buckheit
      Bruce
      20
      20
      Obama
    
    
      28
      Buckel
      Linda
      20
      20
      Obama
    
    
      29
      Buckel
      Linda
      20
      20
      Obama
    
    
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      145
      ABDELLA
      THOMAS
      35
      35
      Romney
    
    
      146
      ABBOTT
      WELDON
      35
      35
      Romney
    
    
      147
      ABBOTT
      WELDON
      35
      35
      Romney
    
    
      148
      ABBOTT
      GERALD
      35
      35
      Romney
    
    
      149
      ABBOTT
      GERALD
      35
      35
      Romney
    
    
      150
      ABEDIN
      ZAINUL
      37
      37
      McCain
    
    
      151
      ABBOTT
      SYBIL
      37
      37
      McCain
    
    
      152
      ABBOTT
      SYBIL
      37
      37
      McCain
    
    
      153
      ABBOTT
      RONALD
      37
      37
      McCain
    
    
      154
      ABBOTT
      RONALD
      37
      37
      McCain
    
    
      155
      ABBOTT
      ROBERT
      37
      37
      McCain
    
    
      156
      ABBOTT
      MIKE
      37
      37
      McCain
    
    
      157
      ABBOT
      DAVID
      37
      37
      McCain
    
    
      158
      ABBO
      PAULINE
      37
      37
      McCain
    
    
      159
      ABATE
      MARIA
      37
      37
      McCain
    
    
      160
      ABAIR
      PETER
      37
      37
      McCain
    
    
      161
      ABACHERLI
      SHIRLEY
      37
      37
      McCain
    
    
      162
      AARONS
      CHARLES
      37
      37
      McCain
    
    
      163
      AARONS
      CHARLES
      37
      37
      McCain
    
    
      164
      AARONS
      CHARLES
      37
      37
      McCain
    
    
      165
      ABEL
      JOHN
      37
      37
      McCain
    
    
      166
      ABEL
      MARLING
      37
      37
      McCain
    
    
      167
      ABEL
      RUDOLPH
      37
      37
      McCain
    
    
      168
      ABELE
      RODNEY
      37
      37
      McCain
    
    
      169
      ABERCROMBIE
      DENIS
      37
      37
      McCain
    
    
      170
      ABESHAUS
      MERRILL
      37
      37
      McCain
    
    
      171
      ABRAHAM
      GEORGE
      37
      37
      McCain
    
    
      172
      ABRAHAMSON
      PETER
      37
      37
      McCain
    
    
      173
      ABRAHAM
      SALEM
      37
      37
      McCain
    
    
      174
      ABRAHAM
      SALEM
      37
      37
      McCain
    
  

175 rows × 5 columns

Right Outer Join with `pandas`



In [36]:

    
dfcontr.merge(dfcand, left_on="candidate_id", right_on="id", how="right")[cols_wanted]









    Out[36]:







  
    
      
      last_name_x
      first_name_x
      candidate_id
      id
      last_name_y
    
  
  
    
      0
      Agee
      Steven
      16.0
      16
      Huckabee
    
    
      1
      Ahrens
      Don
      16.0
      16
      Huckabee
    
    
      2
      Ahrens
      Don
      16.0
      16
      Huckabee
    
    
      3
      Ahrens
      Don
      16.0
      16
      Huckabee
    
    
      4
      Akin
      Charles
      16.0
      16
      Huckabee
    
    
      5
      Akin
      Mike
      16.0
      16
      Huckabee
    
    
      6
      Akin
      Rebecca
      16.0
      16
      Huckabee
    
    
      7
      Aldridge
      Brittni
      16.0
      16
      Huckabee
    
    
      8
      Allen
      John D.
      16.0
      16
      Huckabee
    
    
      9
      Allen
      John D.
      16.0
      16
      Huckabee
    
    
      10
      Allison
      John W.
      16.0
      16
      Huckabee
    
    
      11
      Allison
      Rebecca
      16.0
      16
      Huckabee
    
    
      12
      Allison
      Rebecca
      16.0
      16
      Huckabee
    
    
      13
      Altes
      R.D.
      16.0
      16
      Huckabee
    
    
      14
      Andres
      Dale
      16.0
      16
      Huckabee
    
    
      15
      Anthony
      John
      16.0
      16
      Huckabee
    
    
      16
      Arbogast
      Robert
      16.0
      16
      Huckabee
    
    
      17
      Arbogast
      Robert
      16.0
      16
      Huckabee
    
    
      18
      Ardle
      William
      16.0
      16
      Huckabee
    
    
      19
      Atiq
      Omar
      16.0
      16
      Huckabee
    
    
      20
      Atiq
      Omar
      16.0
      16
      Huckabee
    
    
      21
      Baker
      David
      16.0
      16
      Huckabee
    
    
      22
      Bancroft
      David
      16.0
      16
      Huckabee
    
    
      23
      Banks
      Charles
      16.0
      16
      Huckabee
    
    
      24
      Barbee
      John
      16.0
      16
      Huckabee
    
    
      25
      Buckler
      Steve
      20.0
      20
      Obama
    
    
      26
      Buckler
      Steve
      20.0
      20
      Obama
    
    
      27
      Buckheit
      Bruce
      20.0
      20
      Obama
    
    
      28
      Buckel
      Linda
      20.0
      20
      Obama
    
    
      29
      Buckel
      Linda
      20.0
      20
      Obama
    
    
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      155
      ABBOTT
      ROBERT
      37.0
      37
      McCain
    
    
      156
      ABBOTT
      MIKE
      37.0
      37
      McCain
    
    
      157
      ABBOT
      DAVID
      37.0
      37
      McCain
    
    
      158
      ABBO
      PAULINE
      37.0
      37
      McCain
    
    
      159
      ABATE
      MARIA
      37.0
      37
      McCain
    
    
      160
      ABAIR
      PETER
      37.0
      37
      McCain
    
    
      161
      ABACHERLI
      SHIRLEY
      37.0
      37
      McCain
    
    
      162
      AARONS
      CHARLES
      37.0
      37
      McCain
    
    
      163
      AARONS
      CHARLES
      37.0
      37
      McCain
    
    
      164
      AARONS
      CHARLES
      37.0
      37
      McCain
    
    
      165
      ABEL
      JOHN
      37.0
      37
      McCain
    
    
      166
      ABEL
      MARLING
      37.0
      37
      McCain
    
    
      167
      ABEL
      RUDOLPH
      37.0
      37
      McCain
    
    
      168
      ABELE
      RODNEY
      37.0
      37
      McCain
    
    
      169
      ABERCROMBIE
      DENIS
      37.0
      37
      McCain
    
    
      170
      ABESHAUS
      MERRILL
      37.0
      37
      McCain
    
    
      171
      ABRAHAM
      GEORGE
      37.0
      37
      McCain
    
    
      172
      ABRAHAMSON
      PETER
      37.0
      37
      McCain
    
    
      173
      ABRAHAM
      SALEM
      37.0
      37
      McCain
    
    
      174
      ABRAHAM
      SALEM
      37.0
      37
      McCain
    
    
      175
      NaN
      NaN
      NaN
      33
      Biden
    
    
      176
      NaN
      NaN
      NaN
      36
      Brownback
    
    
      177
      NaN
      NaN
      NaN
      39
      Dodd
    
    
      178
      NaN
      NaN
      NaN
      26
      Edwards
    
    
      179
      NaN
      NaN
      NaN
      24
      Gravel
    
    
      180
      NaN
      NaN
      NaN
      30
      Hunter
    
    
      181
      NaN
      NaN
      NaN
      31
      Kucinich
    
    
      182
      NaN
      NaN
      NaN
      29
      Richardson
    
    
      183
      NaN
      NaN
      NaN
      38
      Tancredo
    
    
      184
      NaN
      NaN
      NaN
      41
      Thompson
    
  

185 rows × 5 columns

Full Outer Join with `pandas`



In [37]:

    
dfcontr.merge(dfcand, left_on="candidate_id", right_on="id", how="outer")[cols_wanted]









    Out[37]:







  
    
      
      last_name_x
      first_name_x
      candidate_id
      id
      last_name_y
    
  
  
    
      0
      Agee
      Steven
      16.0
      16
      Huckabee
    
    
      1
      Ahrens
      Don
      16.0
      16
      Huckabee
    
    
      2
      Ahrens
      Don
      16.0
      16
      Huckabee
    
    
      3
      Ahrens
      Don
      16.0
      16
      Huckabee
    
    
      4
      Akin
      Charles
      16.0
      16
      Huckabee
    
    
      5
      Akin
      Mike
      16.0
      16
      Huckabee
    
    
      6
      Akin
      Rebecca
      16.0
      16
      Huckabee
    
    
      7
      Aldridge
      Brittni
      16.0
      16
      Huckabee
    
    
      8
      Allen
      John D.
      16.0
      16
      Huckabee
    
    
      9
      Allen
      John D.
      16.0
      16
      Huckabee
    
    
      10
      Allison
      John W.
      16.0
      16
      Huckabee
    
    
      11
      Allison
      Rebecca
      16.0
      16
      Huckabee
    
    
      12
      Allison
      Rebecca
      16.0
      16
      Huckabee
    
    
      13
      Altes
      R.D.
      16.0
      16
      Huckabee
    
    
      14
      Andres
      Dale
      16.0
      16
      Huckabee
    
    
      15
      Anthony
      John
      16.0
      16
      Huckabee
    
    
      16
      Arbogast
      Robert
      16.0
      16
      Huckabee
    
    
      17
      Arbogast
      Robert
      16.0
      16
      Huckabee
    
    
      18
      Ardle
      William
      16.0
      16
      Huckabee
    
    
      19
      Atiq
      Omar
      16.0
      16
      Huckabee
    
    
      20
      Atiq
      Omar
      16.0
      16
      Huckabee
    
    
      21
      Baker
      David
      16.0
      16
      Huckabee
    
    
      22
      Bancroft
      David
      16.0
      16
      Huckabee
    
    
      23
      Banks
      Charles
      16.0
      16
      Huckabee
    
    
      24
      Barbee
      John
      16.0
      16
      Huckabee
    
    
      25
      Buckler
      Steve
      20.0
      20
      Obama
    
    
      26
      Buckler
      Steve
      20.0
      20
      Obama
    
    
      27
      Buckheit
      Bruce
      20.0
      20
      Obama
    
    
      28
      Buckel
      Linda
      20.0
      20
      Obama
    
    
      29
      Buckel
      Linda
      20.0
      20
      Obama
    
    
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      155
      ABBOTT
      ROBERT
      37.0
      37
      McCain
    
    
      156
      ABBOTT
      MIKE
      37.0
      37
      McCain
    
    
      157
      ABBOT
      DAVID
      37.0
      37
      McCain
    
    
      158
      ABBO
      PAULINE
      37.0
      37
      McCain
    
    
      159
      ABATE
      MARIA
      37.0
      37
      McCain
    
    
      160
      ABAIR
      PETER
      37.0
      37
      McCain
    
    
      161
      ABACHERLI
      SHIRLEY
      37.0
      37
      McCain
    
    
      162
      AARONS
      CHARLES
      37.0
      37
      McCain
    
    
      163
      AARONS
      CHARLES
      37.0
      37
      McCain
    
    
      164
      AARONS
      CHARLES
      37.0
      37
      McCain
    
    
      165
      ABEL
      JOHN
      37.0
      37
      McCain
    
    
      166
      ABEL
      MARLING
      37.0
      37
      McCain
    
    
      167
      ABEL
      RUDOLPH
      37.0
      37
      McCain
    
    
      168
      ABELE
      RODNEY
      37.0
      37
      McCain
    
    
      169
      ABERCROMBIE
      DENIS
      37.0
      37
      McCain
    
    
      170
      ABESHAUS
      MERRILL
      37.0
      37
      McCain
    
    
      171
      ABRAHAM
      GEORGE
      37.0
      37
      McCain
    
    
      172
      ABRAHAMSON
      PETER
      37.0
      37
      McCain
    
    
      173
      ABRAHAM
      SALEM
      37.0
      37
      McCain
    
    
      174
      ABRAHAM
      SALEM
      37.0
      37
      McCain
    
    
      175
      NaN
      NaN
      NaN
      33
      Biden
    
    
      176
      NaN
      NaN
      NaN
      36
      Brownback
    
    
      177
      NaN
      NaN
      NaN
      39
      Dodd
    
    
      178
      NaN
      NaN
      NaN
      26
      Edwards
    
    
      179
      NaN
      NaN
      NaN
      24
      Gravel
    
    
      180
      NaN
      NaN
      NaN
      30
      Hunter
    
    
      181
      NaN
      NaN
      NaN
      31
      Kucinich
    
    
      182
      NaN
      NaN
      NaN
      29
      Richardson
    
    
      183
      NaN
      NaN
      NaN
      38
      Tancredo
    
    
      184
      NaN
      NaN
      NaN
      41
      Thompson
    
  

185 rows × 5 columns

	id	last_name	first_name	middle_name	street_1	city	state	zip	amount	date	candidate_id
0	26	Buckler	Steve		24351 Armada Dr	Dana Point	CA	926291306	50.00	2007-07-30	20
1	27	Buckler	Steve		24351 Armada Dr	Dana Point	CA	926291306	25.00	2007-08-16	20
2	28	Buckheit	Bruce		8904 KAREN DR	FAIRFAX	VA	220312731	100.00	2007-09-19	20
3	29	Buckel	Linda		PO Box 683130	Park City	UT	840683130	2300.00	2007-08-14	20
4	30	Buckel	Linda		PO Box 683130	Park City	UT	840683130	-2300.00	2007-08-14	20
5	31	Buckel	Linda		PO Box 683130	Park City	UT	840683130	4600.00	2007-08-14	20
6	32	Buck	Thomas		4206 Terrace Street	Kansas City	MO	64111	100.00	2007-09-25	20
7	33	Buck	Jay	K.	1855 Old Willow Rd Unit 322	Northfield	IL	600932918	200.00	2007-09-12	20
8	34	Buck	Blaine	M	45 Eaton Ave	Camden	ME	48431752	2300.00	2007-09-30	20
9	35	Buck	Barbara		1780 NE 138th St	North Miami	FL	331811316	50.00	2007-09-13	20
10	36	Buck	Barbara		1780 NE 138th St	North Miami	FL	331811316	50.00	2007-07-19	20
11	37	Buchman	Mark M		2530 Lawton Ave	San Luis Obispo	CA	934015622	460.80	2007-07-18	20
12	38	Bucher	Ida	M	1400 Warnall Ave	Los Angeles	CA	900245333	100.00	2007-07-10	20
13	39	Buchanek	Elizabeth		7917 Kentbury Dr	Bethesda	MD	208144615	50.00	2007-09-30	20
14	40	Buchanan	John		2025 NW 29th Rd	Boca Raton	FL	334316303	500.00	2007-09-24	20
15	41	Buchanan	John		2025 NW 29th Rd	Boca Raton	FL	334316303	-500.00	2007-09-24	20
16	42	Buchanan	John		2025 NW 29th Rd	Boca Raton	FL	334316303	500.00	2007-09-24	20
17	43	Buchanan	John		2025 NW 29th Rd	Boca Raton	FL	334316303	700.00	2007-08-28	20
18	44	Buchanan	John		2025 NW 29th Rd	Boca Raton	FL	334316303	-700.00	2007-08-28	20
19	45	Buchanan	John		2025 NW 29th Rd	Boca Raton	FL	334316303	1000.00	2007-08-28	20
20	46	Buchanan	John		2025 NW 29th Rd	Boca Raton	FL	334316303	1300.00	2007-08-09	20
21	47	Buchanan	John		2025 NW 29th Rd	Boca Raton	FL	334316303	200.00	2007-08-14	20
22	48	Buchanan	John		2025 NW 29th Rd	Boca Raton	FL	334316303	500.00	2007-07-25	20
23	49	Buchanan	John		4635 49th St NW	Washington	DC	200164320	200.09	2007-09-23	20
24	50	Harrison	Ryan		2247 3rd St	La Verne	CA	917504918	25.00	2007-07-26	20

	id	first_name	last_name	middle_name	party
0	33	Joseph	Biden	NaN	D
1	36	Samuel	Brownback	NaN	R
2	34	Hillary	Clinton	R.	D
3	39	Christopher	Dodd	J.	D
4	26	John	Edwards	NaN	D
5	22	Rudolph	Giuliani	NaN	R
6	24	Mike	Gravel	NaN	D
7	16	Mike	Huckabee	NaN	R
8	30	Duncan	Hunter	NaN	R
9	31	Dennis	Kucinich	NaN	D
10	37	John	McCain	NaN	R
11	20	Barack	Obama	NaN	D
12	32	Ron	Paul	NaN	R
13	29	Bill	Richardson	NaN	D
14	35	Mitt	Romney	NaN	R
15	38	Tom	Tancredo	NaN	R
16	41	Fred	Thompson	D.	R

	id	last_name	first_name	middle_name	street_1	street_2	city	state	zip	amount	date	candidate_id
0	NaN	Agee	Steven	NaN	549 Laurel Branch Road	NaN	Floyd	VA	24091	500.0	2007-06-30	16
1	NaN	Ahrens	Don	NaN	4034 Rennellwood Way	NaN	Pleasanton	CA	94566	250.0	2007-05-16	16
2	NaN	Ahrens	Don	NaN	4034 Rennellwood Way	NaN	Pleasanton	CA	94566	50.0	2007-06-18	16
3	NaN	Ahrens	Don	NaN	4034 Rennellwood Way	NaN	Pleasanton	CA	94566	100.0	2007-06-21	16
4	NaN	Akin	Charles	NaN	10187 Sugar Creek Road	NaN	Bentonville	AR	72712	100.0	2007-06-16	16
5	NaN	Akin	Mike	NaN	181 Baywood Lane	NaN	Monticello	AR	71655	1500.0	2007-05-18	16
6	NaN	Akin	Rebecca	NaN	181 Baywood Lane	NaN	Monticello	AR	71655	500.0	2007-05-18	16
7	NaN	Aldridge	Brittni	NaN	808 Capitol Square Place, SW	NaN	Washington	DC	20024	250.0	2007-06-06	16
8	NaN	Allen	John D.	NaN	1052 Cannon Mill Drive	NaN	North Augusta	SC	29860	1000.0	2007-06-11	16
9	NaN	Allen	John D.	NaN	1052 Cannon Mill Drive	NaN	North Augusta	SC	29860	1300.0	2007-06-29	16
10	NaN	Allison	John W.	NaN	P.O. Box 1089	NaN	Conway	AR	72033	1000.0	2007-05-18	16
11	NaN	Allison	Rebecca	NaN	3206 Summit Court	NaN	Little Rock	AR	72227	1000.0	2007-04-25	16
12	NaN	Allison	Rebecca	NaN	3206 Summit Court	NaN	Little Rock	AR	72227	200.0	2007-06-12	16
13	NaN	Altes	R.D.	NaN	8600 Moody Road	NaN	Fort Smith	AR	72903	2300.0	2007-06-21	16
14	NaN	Andres	Dale	NaN	1160 Glen Oaks Drive	NaN	West Des Moines	IA	50266	250.0	2007-06-06	16
15	NaN	Anthony	John	NaN	211 Long Island Drive	NaN	Hot Springs	AR	71913	2300.0	2007-06-12	16
16	NaN	Arbogast	Robert	NaN	12900 State Route 56 SE	NaN	Mount Sterling	OH	43143	500.0	2007-04-08	16
17	NaN	Arbogast	Robert	NaN	12900 State Route 56 SE	NaN	Mount Sterling	OH	43143	100.0	2007-06-22	16
18	NaN	Ardle	William	NaN	412 Dakota Avenue	NaN	Springfield	OH	45504	50.0	2007-06-28	16
19	NaN	Atiq	Omar	NaN	7200 S Hazel Street	NaN	Pine Bluff	AR	71603	1000.0	2007-05-18	16
20	NaN	Atiq	Omar	NaN	7200 S Hazel Street	NaN	Pine Bluff	AR	71603	1000.0	2007-06-27	16
21	NaN	Baker	David	NaN	2550 Adamsbrooke Drive	NaN	Conway	AR	72034	2300.0	2007-04-11	16
22	NaN	Bancroft	David	NaN	2934 Broderick Street	NaN	San Francisco	CA	94123	250.0	2007-04-24	16
23	NaN	Banks	Charles	NaN	P.O. Box 251310	NaN	Little Rock	AR	72225	1000.0	2007-05-14	16
24	NaN	Barbee	John	NaN	516 Kellyridge Drive	NaN	Apex	NC	27502	500.0	2007-05-23	16
25	NaN	Buckler	Steve	NaN	24351 Armada Dr	NaN	Dana Point	CA	926291306	50.0	2007-07-30	20
26	NaN	Buckler	Steve	NaN	24351 Armada Dr	NaN	Dana Point	CA	926291306	25.0	2007-08-16	20
27	NaN	Buckheit	Bruce	NaN	8904 KAREN DR	NaN	FAIRFAX	VA	220312731	100.0	2007-09-19	20
28	NaN	Buckel	Linda	NaN	PO Box 683130	NaN	Park City	UT	840683130	2300.0	2007-08-14	20
29	NaN	Buckel	Linda	NaN	PO Box 683130	NaN	Park City	UT	840683130	-2300.0	2007-08-14	20
...	...	...	...	...	...	...	...	...	...	...	...	...
145	NaN	ABDELLA	THOMAS	M.	4231 MONUMENT WALL WAY #340	NaN	FAIRFAX	VA	220308440	50.0	2007-09-30	35
146	NaN	ABBOTT	WELDON	S.	777 EAST SOUTH TEMPLE 4E	NaN	SALT LAKE CITY	UT	841021269	100.0	2007-09-29	35
147	NaN	ABBOTT	WELDON	S.	777 EAST SOUTH TEMPLE 4E	NaN	SALT LAKE CITY	UT	841021269	50.0	2007-08-09	35
148	NaN	ABBOTT	GERALD	F.	389 BENEFIT STREET	NaN	PROVIDENCE	RI	29032946	100.0	2007-09-15	35
149	NaN	ABBOTT	GERALD	F.	389 BENEFIT STREET	NaN	PROVIDENCE	RI	29032946	100.0	2007-08-15	35
150	NaN	ABEDIN	ZAINUL	NaN	715 N. CENTRAL AVENUE	SUITE 212	GLENDALE	CA	912031164	500.0	2008-01-21	37
151	NaN	ABBOTT	SYBIL	F.	446 GAMES DRIVE	NaN	RENO	NV	895093326	75.0	2008-01-08	37
152	NaN	ABBOTT	SYBIL	F.	446 GAMES DRIVE	NaN	RENO	NV	895093326	50.0	2008-01-08	37
153	NaN	ABBOTT	RONALD	LEANDER	5453 HAWTHORNE STREET	NaN	MONTCLAIR	CA	917632551	200.0	2008-01-31	37
154	NaN	ABBOTT	RONALD	LEANDER	5453 HAWTHORNE STREET	NaN	MONTCLAIR	CA	917632551	100.0	2008-01-08	37
155	NaN	ABBOTT	ROBERT	A.	3061 LOREE ROAD	NaN	DECKERVILLE	MI	484279763	500.0	2008-01-21	37
156	NaN	ABBOTT	MIKE	E.	4516 OSPREY LNDG	NaN	NICEVILLE	FL	325786810	1000.0	2008-01-15	37
157	NaN	ABBOT	DAVID	M.	56 SALEM STREET	NaN	ANDOVER	MA	18102114	200.0	2008-01-21	37
158	NaN	ABBO	PAULINE	MORENCY	10720 JACOB LANE	NaN	WHITE LAKE	MI	483862274	35.0	2008-01-07	37
159	NaN	ABATE	MARIA	ELENA	1291 NIGHTINGALE AVENUE	NaN	MIAMI SPRINGS	FL	331663832	2600.0	2008-01-25	37
160	NaN	ABAIR	PETER	NaN	40 EVANS STREET	NaN	WATERTOWN	MA	24722150	25.0	2008-01-09	37
161	NaN	ABACHERLI	SHIRLEY	M.	29875 NEWPORT ROAD	NaN	MENIFEE	CA	925849524	150.0	2008-01-28	37
162	NaN	AARONS	CHARLES	NaN	1730 SHORE DRIVE	NaN	ANCHORAGE	AK	995153207	300.0	2008-01-30	37
163	NaN	AARONS	CHARLES	NaN	1730 SHORE DRIVE	NaN	ANCHORAGE	AK	995153207	410.0	2008-01-15	37
164	NaN	AARONS	CHARLES	NaN	1730 SHORE DRIVE	NaN	ANCHORAGE	AK	995153207	500.0	2008-01-09	37
165	NaN	ABEL	JOHN	H.	422 THOMAS STREET	NaN	BETHLEHEM	PA	180153316	200.0	2008-01-22	37
166	NaN	ABEL	MARLING	L.	14 HANGING MOSS LANE	NaN	GREENVILLE	SC	296155069	100.0	2008-01-22	37
167	NaN	ABEL	RUDOLPH	NaN	4532 OCEAN BLVD.	# 108	SARASOTA	FL	342421337	100.0	2008-01-08	37
168	NaN	ABELE	RODNEY	NaN	3620 METAIRIE HEIGHTS AVENUE	NaN	METAIRIE	LA	700021823	500.0	2008-01-15	37
169	NaN	ABERCROMBIE	DENIS	NaN	11811 WATER OAK CT	NaN	MAGNOLIA	TX	773546270	500.0	2008-01-30	37
170	NaN	ABESHAUS	MERRILL	M.	1801 N. HEREFORD DRIVE	NaN	FLAGSTAFF	AZ	860011121	120.0	2008-01-16	37
171	NaN	ABRAHAM	GEORGE	NaN	P.O. BOX 1504	NaN	LAKE CHARLES	LA	706021504	800.0	2008-01-17	37
172	NaN	ABRAHAMSON	PETER	J.	1030 W. ROSCOE STREET	NaN	CHICAGO	IL	606572207	50.0	2008-01-25	37
173	NaN	ABRAHAM	SALEM	A.	P.O. BOX 7	NaN	CANADIAN	TX	790140007	1000.0	2008-01-17	37
174	NaN	ABRAHAM	SALEM	A.	P.O. BOX 7	NaN	CANADIAN	TX	790140007	1300.0	2008-01-30	37

	id	first_name	last_name	middle_name	party	name
0	33	Joseph	Biden	NaN	D	Biden, Joseph
1	36	Samuel	Brownback	NaN	R	Brownback, Samuel
2	34	Hillary	Clinton	R.	D	Clinton, Hillary
3	39	Christopher	Dodd	J.	D	Dodd, Christopher
4	26	John	Edwards	NaN	D	Edwards, John
5	22	Rudolph	Giuliani	NaN	R	Giuliani, Rudolph
6	24	Mike	Gravel	NaN	D	Gravel, Mike
7	16	Mike	Huckabee	NaN	R	Huckabee, Mike
8	30	Duncan	Hunter	NaN	R	Hunter, Duncan
9	31	Dennis	Kucinich	NaN	D	Kucinich, Dennis
10	37	John	McCain	NaN	R	McCain, John
11	20	Barack	Obama	NaN	D	Obama, Barack
12	32	Ron	Paul	NaN	R	Paul, Ron
13	29	Bill	Richardson	NaN	D	Richardson, Bill
14	35	Mitt	Romney	NaN	R	Romney, Mitt
15	38	Tom	Tancredo	NaN	R	Tancredo, Tom
16	41	Fred	Thompson	D.	R	Thompson, Fred

	id
count	17.000000
mean	30.764706
std	7.110990
min	16.000000
25%	26.000000
50%	32.000000
75%	36.000000
max	41.000000

	zip	amount	candidate_id
state
AK	2985459621	1210.00	111
AR	864790	14200.00	192
AZ	860011121	120.00	37
CA	14736360720	-5013.73	600
CO	2405477834	-5823.00	111
CT	68901376	2300.00	35
DC	800341853	-1549.91	102
FL	8970626520	-4050.00	803
IA	50266	250.00	16
ID	83648	-261.00	32
IL	3042068689	-5586.80	175
KS	66215	-330.00	32
KY	402597029	-200.00	22
LA	1406043327	1300.00	74
MA	123026638	-83.00	208
MD	416287617	300.00	55
ME	165647170	2520.00	122
MI	2426973485	-1265.00	164
MN	1102338918	322.00	100
MO	64111	100.00	20
NC	27502	500.00	16
NH	32564424	-24.60	32
NJ	70254993	-817.45	64
NV	3575889763	725.00	144
NY	606129991	-6474.50	233
OH	176071	450.00	80
OK	2202499044	800.00	102
PA	540499020	-2146.00	145
RI	58065892	200.00	70
SC	296214789	2400.00	69
TN	37188	-25.00	32
TX	6221452245	1985.24	302
UT	9251153394	5050.00	340
VA	440691831	515.92	135
WA	2941290251	-500.00	90

	last_name_x	first_name_x	candidate_id	id	last_name_y
0	Agee	Steven	16.0	16	Huckabee
1	Ahrens	Don	16.0	16	Huckabee
2	Ahrens	Don	16.0	16	Huckabee
3	Ahrens	Don	16.0	16	Huckabee
4	Akin	Charles	16.0	16	Huckabee
5	Akin	Mike	16.0	16	Huckabee
6	Akin	Rebecca	16.0	16	Huckabee
7	Aldridge	Brittni	16.0	16	Huckabee
8	Allen	John D.	16.0	16	Huckabee
9	Allen	John D.	16.0	16	Huckabee
10	Allison	John W.	16.0	16	Huckabee
11	Allison	Rebecca	16.0	16	Huckabee
12	Allison	Rebecca	16.0	16	Huckabee
13	Altes	R.D.	16.0	16	Huckabee
14	Andres	Dale	16.0	16	Huckabee
15	Anthony	John	16.0	16	Huckabee
16	Arbogast	Robert	16.0	16	Huckabee
17	Arbogast	Robert	16.0	16	Huckabee
18	Ardle	William	16.0	16	Huckabee
19	Atiq	Omar	16.0	16	Huckabee
20	Atiq	Omar	16.0	16	Huckabee
21	Baker	David	16.0	16	Huckabee
22	Bancroft	David	16.0	16	Huckabee
23	Banks	Charles	16.0	16	Huckabee
24	Barbee	John	16.0	16	Huckabee
25	Buckler	Steve	20.0	20	Obama
26	Buckler	Steve	20.0	20	Obama
27	Buckheit	Bruce	20.0	20	Obama
28	Buckel	Linda	20.0	20	Obama
29	Buckel	Linda	20.0	20	Obama
...	...	...	...	...	...
155	ABBOTT	ROBERT	37.0	37	McCain
156	ABBOTT	MIKE	37.0	37	McCain
157	ABBOT	DAVID	37.0	37	McCain
158	ABBO	PAULINE	37.0	37	McCain
159	ABATE	MARIA	37.0	37	McCain
160	ABAIR	PETER	37.0	37	McCain
161	ABACHERLI	SHIRLEY	37.0	37	McCain
162	AARONS	CHARLES	37.0	37	McCain
163	AARONS	CHARLES	37.0	37	McCain
164	AARONS	CHARLES	37.0	37	McCain
165	ABEL	JOHN	37.0	37	McCain
166	ABEL	MARLING	37.0	37	McCain
167	ABEL	RUDOLPH	37.0	37	McCain
168	ABELE	RODNEY	37.0	37	McCain
169	ABERCROMBIE	DENIS	37.0	37	McCain
170	ABESHAUS	MERRILL	37.0	37	McCain
171	ABRAHAM	GEORGE	37.0	37	McCain
172	ABRAHAMSON	PETER	37.0	37	McCain
173	ABRAHAM	SALEM	37.0	37	McCain
174	ABRAHAM	SALEM	37.0	37	McCain
175	NaN	NaN	NaN	33	Biden
176	NaN	NaN	NaN	36	Brownback
177	NaN	NaN	NaN	39	Dodd
178	NaN	NaN	NaN	26	Edwards
179	NaN	NaN	NaN	24	Gravel
180	NaN	NaN	NaN	30	Hunter
181	NaN	NaN	NaN	31	Kucinich
182	NaN	NaN	NaN	29	Richardson
183	NaN	NaN	NaN	38	Tancredo
184	NaN	NaN	NaN	41	Thompson

Lecture 19

Monday, November 13th 2017

Joins with SQLite, pandas

Starting Up

Recap

Joins

INNER JOIN

Example

Thoughts

Exercises

LEFT JOIN or LEFT OUTER JOIN

Example A

Example B

Exercises

pandas

Reading a datafile into pandas

Creating a Table with pandas

Querying a table with pandas

One Way

Another Way

More Queries

Exercises

Sorting

Selecting Columns

Exercises

Altering Tables

Exercises

Aggregation

Exercise

Joins with pandas

Somewhat organized example

Other Joins with pandas

Left Outer Join with pandas

Right Outer Join with pandas

Full Outer Join with pandas

Joins with `SQLite`, `pandas`

`LEFT JOIN` or `LEFT OUTER JOIN`

`pandas`

Reading a datafile into `pandas`

Creating a Table with `pandas`

Querying a table with `pandas`

Joins with `pandas`

Other Joins with `pandas`

Left Outer Join with `pandas`

Right Outer Join with `pandas`

Full Outer Join with `pandas`