Lecture 19

Monday, November 13th 2017

Joins with SQLite, pandas

Starting Up

You can connect to the saved database from last time if you want. Alternatively, for extra practice, you can just recreate it from the datasets provided in the .txt files. That's what I'll do.


In [1]:
import sqlite3
import numpy as np
import pandas as pd
import time
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)

db = sqlite3.connect('L19DB_demo.sqlite')
cursor = db.cursor()
cursor.execute("DROP TABLE IF EXISTS candidates")
cursor.execute("DROP TABLE IF EXISTS contributors")
cursor.execute("PRAGMA foreign_keys=1")

cursor.execute('''CREATE TABLE candidates (
               id INTEGER PRIMARY KEY NOT NULL, 
               first_name TEXT, 
               last_name TEXT, 
               middle_init TEXT, 
               party TEXT NOT NULL)''')

db.commit() # Commit changes to the database

cursor.execute('''CREATE TABLE contributors (
          id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, 
          last_name TEXT, 
          first_name TEXT, 
          middle_name TEXT, 
          street_1 TEXT, 
          street_2 TEXT, 
          city TEXT, 
          state TEXT, 
          zip TEXT, 
          amount REAL, 
          date DATETIME, 
          candidate_id INTEGER NOT NULL, 
          FOREIGN KEY(candidate_id) REFERENCES candidates(id))''')

db.commit()

with open ("candidates.txt") as candidates:
    next(candidates) # jump over the header
    for line in candidates.readlines():
        cid, first_name, last_name, middle_name, party = line.strip().split('|')
        vals_to_insert = (int(cid), first_name, last_name, middle_name, party)
        cursor.execute('''INSERT INTO candidates 
                  (id, first_name, last_name, middle_init, party)
                  VALUES (?, ?, ?, ?, ?)''', vals_to_insert)

with open ("contributors.txt") as contributors:
    next(contributors)
    for line in contributors.readlines():
        cid, last_name, first_name, middle_name, street_1, street_2, \
            city, state, zip_code, amount, date, candidate_id = line.strip().split('|')
        vals_to_insert = (last_name, first_name, middle_name, street_1, street_2, 
                          city, state, int(zip_code), amount, date, candidate_id)
        cursor.execute('''INSERT INTO contributors (last_name, first_name, middle_name, 
                           street_1, street_2, city, state, zip, amount, date, candidate_id) 
                           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)''', vals_to_insert)

candidate_cols = [col[1] for col in cursor.execute("PRAGMA table_info(candidates)")]
contributor_cols = [col[1] for col in cursor.execute("PRAGMA table_info(contributors)")]

def viz_tables(cols, query):
    q = cursor.execute(query).fetchall()
    framelist = []
    for i, col_name in enumerate(cols):
        framelist.append((col_name, [col[i] for col in q]))
    return pd.DataFrame.from_items(framelist)

Recap

Last time, you played with a bunch of SQLite commands to query and update the tables in the database.

One thing we didn't get to was how to query the contributors table based off of a query in the candidates table. For example, suppose you want to query which contributors donated to Obama. You could use a nested SELECT statement to accomplish that.


In [2]:
query = '''SELECT * FROM contributors WHERE candidate_id = (SELECT id from candidates WHERE last_name = "Obama")'''
viz_tables(contributor_cols, query)


Out[2]:
id last_name first_name middle_name street_1 street_2 city state zip amount date candidate_id
0 26 Buckler Steve 24351 Armada Dr Dana Point CA 926291306 50.00 2007-07-30 20
1 27 Buckler Steve 24351 Armada Dr Dana Point CA 926291306 25.00 2007-08-16 20
2 28 Buckheit Bruce 8904 KAREN DR FAIRFAX VA 220312731 100.00 2007-09-19 20
3 29 Buckel Linda PO Box 683130 Park City UT 840683130 2300.00 2007-08-14 20
4 30 Buckel Linda PO Box 683130 Park City UT 840683130 -2300.00 2007-08-14 20
5 31 Buckel Linda PO Box 683130 Park City UT 840683130 4600.00 2007-08-14 20
6 32 Buck Thomas 4206 Terrace Street Kansas City MO 64111 100.00 2007-09-25 20
7 33 Buck Jay K. 1855 Old Willow Rd Unit 322 Northfield IL 600932918 200.00 2007-09-12 20
8 34 Buck Blaine M 45 Eaton Ave Camden ME 48431752 2300.00 2007-09-30 20
9 35 Buck Barbara 1780 NE 138th St North Miami FL 331811316 50.00 2007-09-13 20
10 36 Buck Barbara 1780 NE 138th St North Miami FL 331811316 50.00 2007-07-19 20
11 37 Buchman Mark M 2530 Lawton Ave San Luis Obispo CA 934015622 460.80 2007-07-18 20
12 38 Bucher Ida M 1400 Warnall Ave Los Angeles CA 900245333 100.00 2007-07-10 20
13 39 Buchanek Elizabeth 7917 Kentbury Dr Bethesda MD 208144615 50.00 2007-09-30 20
14 40 Buchanan John 2025 NW 29th Rd Boca Raton FL 334316303 500.00 2007-09-24 20
15 41 Buchanan John 2025 NW 29th Rd Boca Raton FL 334316303 -500.00 2007-09-24 20
16 42 Buchanan John 2025 NW 29th Rd Boca Raton FL 334316303 500.00 2007-09-24 20
17 43 Buchanan John 2025 NW 29th Rd Boca Raton FL 334316303 700.00 2007-08-28 20
18 44 Buchanan John 2025 NW 29th Rd Boca Raton FL 334316303 -700.00 2007-08-28 20
19 45 Buchanan John 2025 NW 29th Rd Boca Raton FL 334316303 1000.00 2007-08-28 20
20 46 Buchanan John 2025 NW 29th Rd Boca Raton FL 334316303 1300.00 2007-08-09 20
21 47 Buchanan John 2025 NW 29th Rd Boca Raton FL 334316303 200.00 2007-08-14 20
22 48 Buchanan John 2025 NW 29th Rd Boca Raton FL 334316303 500.00 2007-07-25 20
23 49 Buchanan John 4635 49th St NW Washington DC 200164320 200.09 2007-09-23 20
24 50 Harrison Ryan 2247 3rd St La Verne CA 917504918 25.00 2007-07-26 20

Joins

The last example involved querying data from multiple tables.

In particular, we combined columns from the two related tables (related through the FOREIGN KEY).

This leads to the idea of joining multiple tables together. SQL has a set of commands to handle different types of joins. SQLite does not support the full suite of join commands offered by SQL but you should still be able to get the main ideas from the limited command set.

We'll begin with the INNER JOIN.

INNER JOIN

The idea here is that you will combine the tables if the values of certain columns are the same between the two tables. In our example, we will join the two tables based on the candidate id. The result of the INNER JOIN will be a new table consisting of the columns we requested and containing the common data. Since we are joining based off of the candidate id, we will not be excluding any rows.

Example

Here are two tables. Table A has the form:

nA attr idA
s1 23 0
s2 7 2

and table B has the form:

nB attr idB
t1 60 0
t2 14 7
t3 22 2

Table A is associated with Table B through a foreign key on the id column.

If we join the two tables by comparing the id columns and selecting the nA, nB, and attr columns then we'll get

nA A.attr nB B.attr
s1 23 t1 60
s2 7 t3 22

The SQLite code to do this join would be

SELECT nA, A.attr, nB, B.attr FROM A INNER JOIN B ON B.idB = A.idA

Notice that the second row in table B is gone because the id values are not the same.

Thoughts

What is SQL doing with this operation? It may help to visualize this with a Venn diagram. Table A has rows with values corresponding to the idA attribute. Column B has rows with values corresponding to the idB attribute. The INNER JOIN will combine the two tables such that rows with common entries in the id attributes are included. We essentially have the following Venn diagram.

Exercises

  1. Using an INNER JOIN, join the candidates and contributors tables by comparing the candidate_id and candidates_id columns. Display your joined table with the columns contributors.last_name, contributors.first_name, and candidates.last_name.
  2. Do the same inner join as in the last part, but this time append a WHERE clause to select a specific candidate's last name.

LEFT JOIN or LEFT OUTER JOIN

There are many ways to combine two tables. We just explored one possibility in which we combined the tables based upon the intersection of the two tables (the INNER JOIN).

Now we'll talk about the LEFT JOIN or LEFT OUTER JOIN.

In words, the LEFT JOIN is combining the tables based upon what is in the intersection of the two tables and what is in the "reference" table.

We can consider our toy example in two guises:

Example A

Let's do a LEFT JOIN of table B from table A. That is, we'd like to make a new table by putting table B into table A. In this case, we'll consider table A our "reference" table. We're comparing by the id column again. We know that these two tables share ids 0 and 2 and table A doesn't have anything else in it. The resulting table is:

nA A.attr nB B.attr
s1 23 t1 60
s2 7 t3 22

That's not very exciting. It's the same result as from the INNER JOIN. We can do another example that may be more enlightening.

Example B

Let's do a LEFT JOIN of table A from table B. That is, we'd like to make a new table by putting table A into table B. In this case, we'll consider table B our "reference" table. Again, we use the id column from comparison. We know that these two tables share ids 0 and 2. This time, table B also contains the id 7, which is not shared by table A. The resulting table is:

nA A.attr nB B.attr
s1 23 t1 60
None NaN t2 14
s2 7 t3 22

Notice that SQLite filed in the missing entries for us. This is necessary for completion of the requested join.

The SQLite commands to accomplish all of this are:

SELECT nA, A.attr, nB, B.attr FROM A LEFT JOIN B ON B.idB = A.idA

and

SELECT nA, A.attr, nB, B.attr FROM B LEFT JOIN A ON A.idA = B.idB

Here is a visualization using Venn diagrams of the LEFT JOIN.

Exercises

Use the following two tables to do the first two exercises in this section. Table A has the form:

nA attr idA
s1 23 0
s2 7 2
s3 15 2
s4 31 0

and table B has the form:

nB attr idB
t1 60 0
t2 14 7
t3 22 2
  1. Draw the table that would result from a LEFT JOIN using table A as the reference and the id columns for comparison.
  2. Draw the table that would result from a LEFT JOIN using table B as the reference and the id columns for comparison.
  3. Create a new table with the following form:
average contribution number of contributors candidate last name
... ... ...

The table should be created using the LEFT JOIN clause on the contributors table by joining the candidates table by the id column. The average contribution column and number of contributors column should be obtained using the AVG and COUNT SQL functions. Finally, you should use the GROUP BY clause on the candidates last name.


pandas

We've been working with databases for the last few lectures and learning SQLite commands to work with and manipulate the databases. There is a Python package called pandas that provides broad support for data structures. It can be used to interact with relationsional databases through its own methods and even through SQL commands.

In the last part of this lecture, you will get to redo a bunch of the database exercises using pandas.

We won't be able to cover pandas from the ground up, but it's a well-documented library and is fairly easy to get up and running. Here's the website: pandas.

Reading a datafile into pandas


In [3]:
# Using pandas naming convention
dfcand = pd.read_csv("candidates.txt", sep="|")
dfcand


Out[3]:
id first_name last_name middle_name party
0 33 Joseph Biden NaN D
1 36 Samuel Brownback NaN R
2 34 Hillary Clinton R. D
3 39 Christopher Dodd J. D
4 26 John Edwards NaN D
5 22 Rudolph Giuliani NaN R
6 24 Mike Gravel NaN D
7 16 Mike Huckabee NaN R
8 30 Duncan Hunter NaN R
9 31 Dennis Kucinich NaN D
10 37 John McCain NaN R
11 20 Barack Obama NaN D
12 32 Ron Paul NaN R
13 29 Bill Richardson NaN D
14 35 Mitt Romney NaN R
15 38 Tom Tancredo NaN R
16 41 Fred Thompson D. R

In [4]:
dfcontr = pd.read_csv("contributors.txt", sep="|")
dfcontr


Out[4]:
id last_name first_name middle_name street_1 street_2 city state zip amount date candidate_id
0 NaN Agee Steven NaN 549 Laurel Branch Road NaN Floyd VA 24091 500.0 2007-06-30 16
1 NaN Ahrens Don NaN 4034 Rennellwood Way NaN Pleasanton CA 94566 250.0 2007-05-16 16
2 NaN Ahrens Don NaN 4034 Rennellwood Way NaN Pleasanton CA 94566 50.0 2007-06-18 16
3 NaN Ahrens Don NaN 4034 Rennellwood Way NaN Pleasanton CA 94566 100.0 2007-06-21 16
4 NaN Akin Charles NaN 10187 Sugar Creek Road NaN Bentonville AR 72712 100.0 2007-06-16 16
5 NaN Akin Mike NaN 181 Baywood Lane NaN Monticello AR 71655 1500.0 2007-05-18 16
6 NaN Akin Rebecca NaN 181 Baywood Lane NaN Monticello AR 71655 500.0 2007-05-18 16
7 NaN Aldridge Brittni NaN 808 Capitol Square Place, SW NaN Washington DC 20024 250.0 2007-06-06 16
8 NaN Allen John D. NaN 1052 Cannon Mill Drive NaN North Augusta SC 29860 1000.0 2007-06-11 16
9 NaN Allen John D. NaN 1052 Cannon Mill Drive NaN North Augusta SC 29860 1300.0 2007-06-29 16
10 NaN Allison John W. NaN P.O. Box 1089 NaN Conway AR 72033 1000.0 2007-05-18 16
11 NaN Allison Rebecca NaN 3206 Summit Court NaN Little Rock AR 72227 1000.0 2007-04-25 16
12 NaN Allison Rebecca NaN 3206 Summit Court NaN Little Rock AR 72227 200.0 2007-06-12 16
13 NaN Altes R.D. NaN 8600 Moody Road NaN Fort Smith AR 72903 2300.0 2007-06-21 16
14 NaN Andres Dale NaN 1160 Glen Oaks Drive NaN West Des Moines IA 50266 250.0 2007-06-06 16
15 NaN Anthony John NaN 211 Long Island Drive NaN Hot Springs AR 71913 2300.0 2007-06-12 16
16 NaN Arbogast Robert NaN 12900 State Route 56 SE NaN Mount Sterling OH 43143 500.0 2007-04-08 16
17 NaN Arbogast Robert NaN 12900 State Route 56 SE NaN Mount Sterling OH 43143 100.0 2007-06-22 16
18 NaN Ardle William NaN 412 Dakota Avenue NaN Springfield OH 45504 50.0 2007-06-28 16
19 NaN Atiq Omar NaN 7200 S Hazel Street NaN Pine Bluff AR 71603 1000.0 2007-05-18 16
20 NaN Atiq Omar NaN 7200 S Hazel Street NaN Pine Bluff AR 71603 1000.0 2007-06-27 16
21 NaN Baker David NaN 2550 Adamsbrooke Drive NaN Conway AR 72034 2300.0 2007-04-11 16
22 NaN Bancroft David NaN 2934 Broderick Street NaN San Francisco CA 94123 250.0 2007-04-24 16
23 NaN Banks Charles NaN P.O. Box 251310 NaN Little Rock AR 72225 1000.0 2007-05-14 16
24 NaN Barbee John NaN 516 Kellyridge Drive NaN Apex NC 27502 500.0 2007-05-23 16
25 NaN Buckler Steve NaN 24351 Armada Dr NaN Dana Point CA 926291306 50.0 2007-07-30 20
26 NaN Buckler Steve NaN 24351 Armada Dr NaN Dana Point CA 926291306 25.0 2007-08-16 20
27 NaN Buckheit Bruce NaN 8904 KAREN DR NaN FAIRFAX VA 220312731 100.0 2007-09-19 20
28 NaN Buckel Linda NaN PO Box 683130 NaN Park City UT 840683130 2300.0 2007-08-14 20
29 NaN Buckel Linda NaN PO Box 683130 NaN Park City UT 840683130 -2300.0 2007-08-14 20
... ... ... ... ... ... ... ... ... ... ... ... ...
145 NaN ABDELLA THOMAS M. 4231 MONUMENT WALL WAY #340 NaN FAIRFAX VA 220308440 50.0 2007-09-30 35
146 NaN ABBOTT WELDON S. 777 EAST SOUTH TEMPLE 4E NaN SALT LAKE CITY UT 841021269 100.0 2007-09-29 35
147 NaN ABBOTT WELDON S. 777 EAST SOUTH TEMPLE 4E NaN SALT LAKE CITY UT 841021269 50.0 2007-08-09 35
148 NaN ABBOTT GERALD F. 389 BENEFIT STREET NaN PROVIDENCE RI 29032946 100.0 2007-09-15 35
149 NaN ABBOTT GERALD F. 389 BENEFIT STREET NaN PROVIDENCE RI 29032946 100.0 2007-08-15 35
150 NaN ABEDIN ZAINUL NaN 715 N. CENTRAL AVENUE SUITE 212 GLENDALE CA 912031164 500.0 2008-01-21 37
151 NaN ABBOTT SYBIL F. 446 GAMES DRIVE NaN RENO NV 895093326 75.0 2008-01-08 37
152 NaN ABBOTT SYBIL F. 446 GAMES DRIVE NaN RENO NV 895093326 50.0 2008-01-08 37
153 NaN ABBOTT RONALD LEANDER 5453 HAWTHORNE STREET NaN MONTCLAIR CA 917632551 200.0 2008-01-31 37
154 NaN ABBOTT RONALD LEANDER 5453 HAWTHORNE STREET NaN MONTCLAIR CA 917632551 100.0 2008-01-08 37
155 NaN ABBOTT ROBERT A. 3061 LOREE ROAD NaN DECKERVILLE MI 484279763 500.0 2008-01-21 37
156 NaN ABBOTT MIKE E. 4516 OSPREY LNDG NaN NICEVILLE FL 325786810 1000.0 2008-01-15 37
157 NaN ABBOT DAVID M. 56 SALEM STREET NaN ANDOVER MA 18102114 200.0 2008-01-21 37
158 NaN ABBO PAULINE MORENCY 10720 JACOB LANE NaN WHITE LAKE MI 483862274 35.0 2008-01-07 37
159 NaN ABATE MARIA ELENA 1291 NIGHTINGALE AVENUE NaN MIAMI SPRINGS FL 331663832 2600.0 2008-01-25 37
160 NaN ABAIR PETER NaN 40 EVANS STREET NaN WATERTOWN MA 24722150 25.0 2008-01-09 37
161 NaN ABACHERLI SHIRLEY M. 29875 NEWPORT ROAD NaN MENIFEE CA 925849524 150.0 2008-01-28 37
162 NaN AARONS CHARLES NaN 1730 SHORE DRIVE NaN ANCHORAGE AK 995153207 300.0 2008-01-30 37
163 NaN AARONS CHARLES NaN 1730 SHORE DRIVE NaN ANCHORAGE AK 995153207 410.0 2008-01-15 37
164 NaN AARONS CHARLES NaN 1730 SHORE DRIVE NaN ANCHORAGE AK 995153207 500.0 2008-01-09 37
165 NaN ABEL JOHN H. 422 THOMAS STREET NaN BETHLEHEM PA 180153316 200.0 2008-01-22 37
166 NaN ABEL MARLING L. 14 HANGING MOSS LANE NaN GREENVILLE SC 296155069 100.0 2008-01-22 37
167 NaN ABEL RUDOLPH NaN 4532 OCEAN BLVD. # 108 SARASOTA FL 342421337 100.0 2008-01-08 37
168 NaN ABELE RODNEY NaN 3620 METAIRIE HEIGHTS AVENUE NaN METAIRIE LA 700021823 500.0 2008-01-15 37
169 NaN ABERCROMBIE DENIS NaN 11811 WATER OAK CT NaN MAGNOLIA TX 773546270 500.0 2008-01-30 37
170 NaN ABESHAUS MERRILL M. 1801 N. HEREFORD DRIVE NaN FLAGSTAFF AZ 860011121 120.0 2008-01-16 37
171 NaN ABRAHAM GEORGE NaN P.O. BOX 1504 NaN LAKE CHARLES LA 706021504 800.0 2008-01-17 37
172 NaN ABRAHAMSON PETER J. 1030 W. ROSCOE STREET NaN CHICAGO IL 606572207 50.0 2008-01-25 37
173 NaN ABRAHAM SALEM A. P.O. BOX 7 NaN CANADIAN TX 790140007 1000.0 2008-01-17 37
174 NaN ABRAHAM SALEM A. P.O. BOX 7 NaN CANADIAN TX 790140007 1300.0 2008-01-30 37

175 rows × 12 columns

Reading things in is quite easy with pandas.

Notice that pandas populates empty fields with NaN values.

The id column in the contributors dataset is superfluous. Let's delete it.


In [5]:
del dfcontr['id']
dfcontr.head()


Out[5]:
last_name first_name middle_name street_1 street_2 city state zip amount date candidate_id
0 Agee Steven NaN 549 Laurel Branch Road NaN Floyd VA 24091 500.0 2007-06-30 16
1 Ahrens Don NaN 4034 Rennellwood Way NaN Pleasanton CA 94566 250.0 2007-05-16 16
2 Ahrens Don NaN 4034 Rennellwood Way NaN Pleasanton CA 94566 50.0 2007-06-18 16
3 Ahrens Don NaN 4034 Rennellwood Way NaN Pleasanton CA 94566 100.0 2007-06-21 16
4 Akin Charles NaN 10187 Sugar Creek Road NaN Bentonville AR 72712 100.0 2007-06-16 16

Very nice! And we used the head method to print out the first five rows.

Creating a Table with pandas

We can use pandas to create tables in a database.

First, let's create a new database since we've already done a lot on our test database.


In [6]:
dbp = sqlite3.connect('L19_pandas_DB.sqlite')
csr = dbp.cursor()
csr.execute("DROP TABLE IF EXISTS candidates")
csr.execute("DROP TABLE IF EXISTS contributors")
csr.execute("PRAGMA foreign_keys=1")

csr.execute('''CREATE TABLE candidates (
               id INTEGER PRIMARY KEY NOT NULL, 
               first_name TEXT, 
               last_name TEXT, 
               middle_name TEXT, 
               party TEXT NOT NULL)''')

dbp.commit() # Commit changes to the database

csr.execute('''CREATE TABLE contributors (
          id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, 
          last_name TEXT, 
          first_name TEXT, 
          middle_name TEXT, 
          street_1 TEXT, 
          street_2 TEXT, 
          city TEXT, 
          state TEXT, 
          zip TEXT, 
          amount REAL, 
          date DATETIME, 
          candidate_id INTEGER NOT NULL, 
          FOREIGN KEY(candidate_id) REFERENCES candidates(id))''')

dbp.commit()

Last time, we opened the data files with Python and then manually used SQLite commands to populate the individual tables. We can use pandas instead like so.


In [7]:
dfcand.to_sql("candidates", dbp, if_exists="append", index=False)

How big is our table?


In [8]:
dfcand.shape


Out[8]:
(17, 5)

We can visualize the data in our pandas-populated table. No surprises here except that pandas did everything for us.


In [9]:
query = '''SELECT * FROM candidates'''
csr.execute(query).fetchall()


Out[9]:
[(16, 'Mike', 'Huckabee', None, 'R'),
 (20, 'Barack', 'Obama', None, 'D'),
 (22, 'Rudolph', 'Giuliani', None, 'R'),
 (24, 'Mike', 'Gravel', None, 'D'),
 (26, 'John', 'Edwards', None, 'D'),
 (29, 'Bill', 'Richardson', None, 'D'),
 (30, 'Duncan', 'Hunter', None, 'R'),
 (31, 'Dennis', 'Kucinich', None, 'D'),
 (32, 'Ron', 'Paul', None, 'R'),
 (33, 'Joseph', 'Biden', None, 'D'),
 (34, 'Hillary', 'Clinton', 'R.', 'D'),
 (35, 'Mitt', 'Romney', None, 'R'),
 (36, 'Samuel', 'Brownback', None, 'R'),
 (37, 'John', 'McCain', None, 'R'),
 (38, 'Tom', 'Tancredo', None, 'R'),
 (39, 'Christopher', 'Dodd', 'J.', 'D'),
 (41, 'Fred', 'Thompson', 'D.', 'R')]

Querying a table with pandas

One Way


In [10]:
dfcand.query("first_name=='Mike' & party=='D'")


Out[10]:
id first_name last_name middle_name party
6 24 Mike Gravel NaN D

Another Way


In [11]:
dfcand[(dfcand.first_name=="Mike") & (dfcand.party=="D")]


Out[11]:
id first_name last_name middle_name party
6 24 Mike Gravel NaN D

More Queries


In [12]:
dfcand[dfcand.middle_name.notnull()]


Out[12]:
id first_name last_name middle_name party
2 34 Hillary Clinton R. D
3 39 Christopher Dodd J. D
16 41 Fred Thompson D. R

In [13]:
dfcand[dfcand.first_name.isin(['Mike', 'Hillary'])]


Out[13]:
id first_name last_name middle_name party
2 34 Hillary Clinton R. D
6 24 Mike Gravel NaN D
7 16 Mike Huckabee NaN R

Exercises

  1. Use pandas to populate the contributors table.
  2. Query the contributors tables with the following:
    1. List entries where the state is "VA" and the amount is less than $\$400.00$.
    2. List entries where the state is "NULL".
    3. List entries for the states of Texas and Pennsylvania.
    4. List entries where the amount contributed is between $\$10.00$ and $\$50.00$.

Sorting


In [14]:
dfcand.sort_values(by='party')


Out[14]:
id first_name last_name middle_name party
0 33 Joseph Biden NaN D
2 34 Hillary Clinton R. D
3 39 Christopher Dodd J. D
4 26 John Edwards NaN D
13 29 Bill Richardson NaN D
6 24 Mike Gravel NaN D
11 20 Barack Obama NaN D
9 31 Dennis Kucinich NaN D
14 35 Mitt Romney NaN R
12 32 Ron Paul NaN R
8 30 Duncan Hunter NaN R
15 38 Tom Tancredo NaN R
7 16 Mike Huckabee NaN R
5 22 Rudolph Giuliani NaN R
1 36 Samuel Brownback NaN R
10 37 John McCain NaN R
16 41 Fred Thompson D. R

In [15]:
dfcand.sort_values(by='party', ascending=False)


Out[15]:
id first_name last_name middle_name party
8 30 Duncan Hunter NaN R
7 16 Mike Huckabee NaN R
15 38 Tom Tancredo NaN R
14 35 Mitt Romney NaN R
12 32 Ron Paul NaN R
10 37 John McCain NaN R
1 36 Samuel Brownback NaN R
16 41 Fred Thompson D. R
5 22 Rudolph Giuliani NaN R
6 24 Mike Gravel NaN D
9 31 Dennis Kucinich NaN D
11 20 Barack Obama NaN D
4 26 John Edwards NaN D
13 29 Bill Richardson NaN D
3 39 Christopher Dodd J. D
2 34 Hillary Clinton R. D
0 33 Joseph Biden NaN D

Selecting Columns


In [16]:
dfcand[['last_name', 'party']]


Out[16]:
last_name party
0 Biden D
1 Brownback R
2 Clinton D
3 Dodd D
4 Edwards D
5 Giuliani R
6 Gravel D
7 Huckabee R
8 Hunter R
9 Kucinich D
10 McCain R
11 Obama D
12 Paul R
13 Richardson D
14 Romney R
15 Tancredo R
16 Thompson R

In [17]:
dfcand[['last_name', 'party']].count()


Out[17]:
last_name    17
party        17
dtype: int64

In [18]:
dfcand[['first_name']].drop_duplicates()


Out[18]:
first_name
0 Joseph
1 Samuel
2 Hillary
3 Christopher
4 John
5 Rudolph
6 Mike
8 Duncan
9 Dennis
11 Barack
12 Ron
13 Bill
14 Mitt
15 Tom
16 Fred

In [19]:
dfcand[['first_name']].drop_duplicates().count()


Out[19]:
first_name    15
dtype: int64

Exercises

  1. Sort the contributors table by amount and order in descending order.
  2. Select the first_name and amount columns.
  3. Select the last_name and first_name columns and drop duplicates.
  4. Count how many there are after the duplicates have been dropped.

Altering Tables

Creating a new column is quite easy with pandas.


In [20]:
dfcand['name'] = dfcand['last_name'] + ", " + dfcand['first_name']
dfcand


Out[20]:
id first_name last_name middle_name party name
0 33 Joseph Biden NaN D Biden, Joseph
1 36 Samuel Brownback NaN R Brownback, Samuel
2 34 Hillary Clinton R. D Clinton, Hillary
3 39 Christopher Dodd J. D Dodd, Christopher
4 26 John Edwards NaN D Edwards, John
5 22 Rudolph Giuliani NaN R Giuliani, Rudolph
6 24 Mike Gravel NaN D Gravel, Mike
7 16 Mike Huckabee NaN R Huckabee, Mike
8 30 Duncan Hunter NaN R Hunter, Duncan
9 31 Dennis Kucinich NaN D Kucinich, Dennis
10 37 John McCain NaN R McCain, John
11 20 Barack Obama NaN D Obama, Barack
12 32 Ron Paul NaN R Paul, Ron
13 29 Bill Richardson NaN D Richardson, Bill
14 35 Mitt Romney NaN R Romney, Mitt
15 38 Tom Tancredo NaN R Tancredo, Tom
16 41 Fred Thompson D. R Thompson, Fred

We can change an existing field as well.


In [21]:
dfcand.loc[dfcand.first_name == "Mike", "name"]


Out[21]:
6      Gravel, Mike
7    Huckabee, Mike
Name: name, dtype: object

In [22]:
dfcand.loc[dfcand.first_name == "Mike", "name"] = "Mikey"

In [23]:
dfcand.query("first_name == 'Mike'")


Out[23]:
id first_name last_name middle_name party name
6 24 Mike Gravel NaN D Mikey
7 16 Mike Huckabee NaN R Mikey

You may recall that SQLite doesn't have the functionality to drop a column. It's a one-liner with pandas.


In [24]:
del dfcand['name']
dfcand


Out[24]:
id first_name last_name middle_name party
0 33 Joseph Biden NaN D
1 36 Samuel Brownback NaN R
2 34 Hillary Clinton R. D
3 39 Christopher Dodd J. D
4 26 John Edwards NaN D
5 22 Rudolph Giuliani NaN R
6 24 Mike Gravel NaN D
7 16 Mike Huckabee NaN R
8 30 Duncan Hunter NaN R
9 31 Dennis Kucinich NaN D
10 37 John McCain NaN R
11 20 Barack Obama NaN D
12 32 Ron Paul NaN R
13 29 Bill Richardson NaN D
14 35 Mitt Romney NaN R
15 38 Tom Tancredo NaN R
16 41 Fred Thompson D. R

Exercises

  1. Create a name column for the contributors table with field entries of the form "last name, first name"
  2. For contributors from the state of "PA", change the name to "X".
  3. Delete the newly created name column.

Aggregation

We'd like to get information about the tables such as the maximum amount contributed to the candidates. Here are a bunch of way to describe the tables.


In [25]:
dfcand.describe()


Out[25]:
id
count 17.000000
mean 30.764706
std 7.110990
min 16.000000
25% 26.000000
50% 32.000000
75% 36.000000
max 41.000000

It's not very interesting with the candidates table because the candidates table only has one numeric column.

Exercise

Use the describe() method on the contributors table.

I'll use the contributors table to do some demos now.


In [26]:
dfcontr.amount.max()


Out[26]:
4600.0

In [27]:
dfcontr[dfcontr.amount==dfcontr.amount.max()]


Out[27]:
last_name first_name middle_name street_1 street_2 city state zip amount date candidate_id
30 Buckel Linda NaN PO Box 683130 NaN Park City UT 840683130 4600.0 2007-08-14 20

In [28]:
dfcontr.groupby("state").sum()


Out[28]:
zip amount candidate_id
state
AK 2985459621 1210.00 111
AR 864790 14200.00 192
AZ 860011121 120.00 37
CA 14736360720 -5013.73 600
CO 2405477834 -5823.00 111
CT 68901376 2300.00 35
DC 800341853 -1549.91 102
FL 8970626520 -4050.00 803
IA 50266 250.00 16
ID 83648 -261.00 32
IL 3042068689 -5586.80 175
KS 66215 -330.00 32
KY 402597029 -200.00 22
LA 1406043327 1300.00 74
MA 123026638 -83.00 208
MD 416287617 300.00 55
ME 165647170 2520.00 122
MI 2426973485 -1265.00 164
MN 1102338918 322.00 100
MO 64111 100.00 20
NC 27502 500.00 16
NH 32564424 -24.60 32
NJ 70254993 -817.45 64
NV 3575889763 725.00 144
NY 606129991 -6474.50 233
OH 176071 450.00 80
OK 2202499044 800.00 102
PA 540499020 -2146.00 145
RI 58065892 200.00 70
SC 296214789 2400.00 69
TN 37188 -25.00 32
TX 6221452245 1985.24 302
UT 9251153394 5050.00 340
VA 440691831 515.92 135
WA 2941290251 -500.00 90

In [29]:
dfcontr.groupby("state")["amount"].sum()


Out[29]:
state
AK     1210.00
AR    14200.00
AZ      120.00
CA    -5013.73
CO    -5823.00
CT     2300.00
DC    -1549.91
FL    -4050.00
IA      250.00
ID     -261.00
IL    -5586.80
KS     -330.00
KY     -200.00
LA     1300.00
MA      -83.00
MD      300.00
ME     2520.00
MI    -1265.00
MN      322.00
MO      100.00
NC      500.00
NH      -24.60
NJ     -817.45
NV      725.00
NY    -6474.50
OH      450.00
OK      800.00
PA    -2146.00
RI      200.00
SC     2400.00
TN      -25.00
TX     1985.24
UT     5050.00
VA      515.92
WA     -500.00
Name: amount, dtype: float64

In [30]:
dfcontr.state.unique()


Out[30]:
array(['VA', 'CA', 'AR', 'DC', 'SC', 'IA', 'OH', 'NC', 'UT', 'MO', 'IL',
       'ME', 'FL', 'MD', 'MI', 'CO', 'WA', 'NY', 'TX', 'KY', 'PA', 'TN',
       'MA', 'MN', 'KS', 'NJ', 'NH', 'ID', 'OK', nan, 'NV', 'CT', 'RI',
       'AK', 'LA', 'AZ'], dtype=object)

There is also a version of the LIMIT clause. It's very intuitive with pandas.


In [31]:
dfcand[0:3]


Out[31]:
id first_name last_name middle_name party
0 33 Joseph Biden NaN D
1 36 Samuel Brownback NaN R
2 34 Hillary Clinton R. D

The usual Python slicing works just fine!

Joins with pandas

pandas has some some documentation on joins: Merge, join, and concatenate. If you want some more reinforcement on the concepts from earlier regarding JOIN, then the pandas documentation may be a good place to get it.

You may also be interested in a comparison with SQL.

To do joins with pandas, we use the merge command.

Here's an example of an explicit inner join:


In [32]:
cols_wanted = ['last_name_x', 'first_name_x', 'candidate_id', 'id', 'last_name_y']
dfcontr.merge(dfcand, left_on="candidate_id", right_on="id")[cols_wanted]


Out[32]:
last_name_x first_name_x candidate_id id last_name_y
0 Agee Steven 16 16 Huckabee
1 Ahrens Don 16 16 Huckabee
2 Ahrens Don 16 16 Huckabee
3 Ahrens Don 16 16 Huckabee
4 Akin Charles 16 16 Huckabee
5 Akin Mike 16 16 Huckabee
6 Akin Rebecca 16 16 Huckabee
7 Aldridge Brittni 16 16 Huckabee
8 Allen John D. 16 16 Huckabee
9 Allen John D. 16 16 Huckabee
10 Allison John W. 16 16 Huckabee
11 Allison Rebecca 16 16 Huckabee
12 Allison Rebecca 16 16 Huckabee
13 Altes R.D. 16 16 Huckabee
14 Andres Dale 16 16 Huckabee
15 Anthony John 16 16 Huckabee
16 Arbogast Robert 16 16 Huckabee
17 Arbogast Robert 16 16 Huckabee
18 Ardle William 16 16 Huckabee
19 Atiq Omar 16 16 Huckabee
20 Atiq Omar 16 16 Huckabee
21 Baker David 16 16 Huckabee
22 Bancroft David 16 16 Huckabee
23 Banks Charles 16 16 Huckabee
24 Barbee John 16 16 Huckabee
25 Buckler Steve 20 20 Obama
26 Buckler Steve 20 20 Obama
27 Buckheit Bruce 20 20 Obama
28 Buckel Linda 20 20 Obama
29 Buckel Linda 20 20 Obama
... ... ... ... ... ...
145 ABDELLA THOMAS 35 35 Romney
146 ABBOTT WELDON 35 35 Romney
147 ABBOTT WELDON 35 35 Romney
148 ABBOTT GERALD 35 35 Romney
149 ABBOTT GERALD 35 35 Romney
150 ABEDIN ZAINUL 37 37 McCain
151 ABBOTT SYBIL 37 37 McCain
152 ABBOTT SYBIL 37 37 McCain
153 ABBOTT RONALD 37 37 McCain
154 ABBOTT RONALD 37 37 McCain
155 ABBOTT ROBERT 37 37 McCain
156 ABBOTT MIKE 37 37 McCain
157 ABBOT DAVID 37 37 McCain
158 ABBO PAULINE 37 37 McCain
159 ABATE MARIA 37 37 McCain
160 ABAIR PETER 37 37 McCain
161 ABACHERLI SHIRLEY 37 37 McCain
162 AARONS CHARLES 37 37 McCain
163 AARONS CHARLES 37 37 McCain
164 AARONS CHARLES 37 37 McCain
165 ABEL JOHN 37 37 McCain
166 ABEL MARLING 37 37 McCain
167 ABEL RUDOLPH 37 37 McCain
168 ABELE RODNEY 37 37 McCain
169 ABERCROMBIE DENIS 37 37 McCain
170 ABESHAUS MERRILL 37 37 McCain
171 ABRAHAM GEORGE 37 37 McCain
172 ABRAHAMSON PETER 37 37 McCain
173 ABRAHAM SALEM 37 37 McCain
174 ABRAHAM SALEM 37 37 McCain

175 rows × 5 columns

Somewhat organized example


In [33]:
dfcontr.merge(dfcand, left_on="candidate_id", right_on="id")[cols_wanted].groupby('last_name_y').describe()


Out[33]:
candidate_id id
count mean std min 25% 50% 75% max count mean std min 25% 50% 75% max
last_name_y
Clinton 25.0 34.0 0.0 34.0 34.0 34.0 34.0 34.0 25.0 34.0 0.0 34.0 34.0 34.0 34.0 34.0
Giuliani 25.0 22.0 0.0 22.0 22.0 22.0 22.0 22.0 25.0 22.0 0.0 22.0 22.0 22.0 22.0 22.0
Huckabee 25.0 16.0 0.0 16.0 16.0 16.0 16.0 16.0 25.0 16.0 0.0 16.0 16.0 16.0 16.0 16.0
McCain 25.0 37.0 0.0 37.0 37.0 37.0 37.0 37.0 25.0 37.0 0.0 37.0 37.0 37.0 37.0 37.0
Obama 25.0 20.0 0.0 20.0 20.0 20.0 20.0 20.0 25.0 20.0 0.0 20.0 20.0 20.0 20.0 20.0
Paul 25.0 32.0 0.0 32.0 32.0 32.0 32.0 32.0 25.0 32.0 0.0 32.0 32.0 32.0 32.0 32.0
Romney 25.0 35.0 0.0 35.0 35.0 35.0 35.0 35.0 25.0 35.0 0.0 35.0 35.0 35.0 35.0 35.0

Other Joins with pandas

We didn't cover all possible joins because SQLite can only handle the few that we did discuss. As mentioned, there are workarounds for some things in SQLite, but not evertyhing. Fortunately, pandas can handle pretty much everything. Here are a few joins that pandas can handle:

  • LEFT OUTER (already discussed)
  • RIGHT OUTER - Think of the "opposite" of a LEFT OUTER join (shade the intersection and right set in the Venn diagram).
  • FULL OUTER - Combine everything from both tables (shade the entire Venn diagram)

Left Outer Join with pandas


In [34]:
dfcontr.merge(dfcand, left_on="candidate_id", right_on="id", how="left")[cols_wanted]


Out[34]:
last_name_x first_name_x candidate_id id last_name_y
0 Agee Steven 16 16 Huckabee
1 Ahrens Don 16 16 Huckabee
2 Ahrens Don 16 16 Huckabee
3 Ahrens Don 16 16 Huckabee
4 Akin Charles 16 16 Huckabee
5 Akin Mike 16 16 Huckabee
6 Akin Rebecca 16 16 Huckabee
7 Aldridge Brittni 16 16 Huckabee
8 Allen John D. 16 16 Huckabee
9 Allen John D. 16 16 Huckabee
10 Allison John W. 16 16 Huckabee
11 Allison Rebecca 16 16 Huckabee
12 Allison Rebecca 16 16 Huckabee
13 Altes R.D. 16 16 Huckabee
14 Andres Dale 16 16 Huckabee
15 Anthony John 16 16 Huckabee
16 Arbogast Robert 16 16 Huckabee
17 Arbogast Robert 16 16 Huckabee
18 Ardle William 16 16 Huckabee
19 Atiq Omar 16 16 Huckabee
20 Atiq Omar 16 16 Huckabee
21 Baker David 16 16 Huckabee
22 Bancroft David 16 16 Huckabee
23 Banks Charles 16 16 Huckabee
24 Barbee John 16 16 Huckabee
25 Buckler Steve 20 20 Obama
26 Buckler Steve 20 20 Obama
27 Buckheit Bruce 20 20 Obama
28 Buckel Linda 20 20 Obama
29 Buckel Linda 20 20 Obama
... ... ... ... ... ...
145 ABDELLA THOMAS 35 35 Romney
146 ABBOTT WELDON 35 35 Romney
147 ABBOTT WELDON 35 35 Romney
148 ABBOTT GERALD 35 35 Romney
149 ABBOTT GERALD 35 35 Romney
150 ABEDIN ZAINUL 37 37 McCain
151 ABBOTT SYBIL 37 37 McCain
152 ABBOTT SYBIL 37 37 McCain
153 ABBOTT RONALD 37 37 McCain
154 ABBOTT RONALD 37 37 McCain
155 ABBOTT ROBERT 37 37 McCain
156 ABBOTT MIKE 37 37 McCain
157 ABBOT DAVID 37 37 McCain
158 ABBO PAULINE 37 37 McCain
159 ABATE MARIA 37 37 McCain
160 ABAIR PETER 37 37 McCain
161 ABACHERLI SHIRLEY 37 37 McCain
162 AARONS CHARLES 37 37 McCain
163 AARONS CHARLES 37 37 McCain
164 AARONS CHARLES 37 37 McCain
165 ABEL JOHN 37 37 McCain
166 ABEL MARLING 37 37 McCain
167 ABEL RUDOLPH 37 37 McCain
168 ABELE RODNEY 37 37 McCain
169 ABERCROMBIE DENIS 37 37 McCain
170 ABESHAUS MERRILL 37 37 McCain
171 ABRAHAM GEORGE 37 37 McCain
172 ABRAHAMSON PETER 37 37 McCain
173 ABRAHAM SALEM 37 37 McCain
174 ABRAHAM SALEM 37 37 McCain

175 rows × 5 columns

Right Outer Join with pandas


In [36]:
dfcontr.merge(dfcand, left_on="candidate_id", right_on="id", how="right")[cols_wanted]


Out[36]:
last_name_x first_name_x candidate_id id last_name_y
0 Agee Steven 16.0 16 Huckabee
1 Ahrens Don 16.0 16 Huckabee
2 Ahrens Don 16.0 16 Huckabee
3 Ahrens Don 16.0 16 Huckabee
4 Akin Charles 16.0 16 Huckabee
5 Akin Mike 16.0 16 Huckabee
6 Akin Rebecca 16.0 16 Huckabee
7 Aldridge Brittni 16.0 16 Huckabee
8 Allen John D. 16.0 16 Huckabee
9 Allen John D. 16.0 16 Huckabee
10 Allison John W. 16.0 16 Huckabee
11 Allison Rebecca 16.0 16 Huckabee
12 Allison Rebecca 16.0 16 Huckabee
13 Altes R.D. 16.0 16 Huckabee
14 Andres Dale 16.0 16 Huckabee
15 Anthony John 16.0 16 Huckabee
16 Arbogast Robert 16.0 16 Huckabee
17 Arbogast Robert 16.0 16 Huckabee
18 Ardle William 16.0 16 Huckabee
19 Atiq Omar 16.0 16 Huckabee
20 Atiq Omar 16.0 16 Huckabee
21 Baker David 16.0 16 Huckabee
22 Bancroft David 16.0 16 Huckabee
23 Banks Charles 16.0 16 Huckabee
24 Barbee John 16.0 16 Huckabee
25 Buckler Steve 20.0 20 Obama
26 Buckler Steve 20.0 20 Obama
27 Buckheit Bruce 20.0 20 Obama
28 Buckel Linda 20.0 20 Obama
29 Buckel Linda 20.0 20 Obama
... ... ... ... ... ...
155 ABBOTT ROBERT 37.0 37 McCain
156 ABBOTT MIKE 37.0 37 McCain
157 ABBOT DAVID 37.0 37 McCain
158 ABBO PAULINE 37.0 37 McCain
159 ABATE MARIA 37.0 37 McCain
160 ABAIR PETER 37.0 37 McCain
161 ABACHERLI SHIRLEY 37.0 37 McCain
162 AARONS CHARLES 37.0 37 McCain
163 AARONS CHARLES 37.0 37 McCain
164 AARONS CHARLES 37.0 37 McCain
165 ABEL JOHN 37.0 37 McCain
166 ABEL MARLING 37.0 37 McCain
167 ABEL RUDOLPH 37.0 37 McCain
168 ABELE RODNEY 37.0 37 McCain
169 ABERCROMBIE DENIS 37.0 37 McCain
170 ABESHAUS MERRILL 37.0 37 McCain
171 ABRAHAM GEORGE 37.0 37 McCain
172 ABRAHAMSON PETER 37.0 37 McCain
173 ABRAHAM SALEM 37.0 37 McCain
174 ABRAHAM SALEM 37.0 37 McCain
175 NaN NaN NaN 33 Biden
176 NaN NaN NaN 36 Brownback
177 NaN NaN NaN 39 Dodd
178 NaN NaN NaN 26 Edwards
179 NaN NaN NaN 24 Gravel
180 NaN NaN NaN 30 Hunter
181 NaN NaN NaN 31 Kucinich
182 NaN NaN NaN 29 Richardson
183 NaN NaN NaN 38 Tancredo
184 NaN NaN NaN 41 Thompson

185 rows × 5 columns

Full Outer Join with pandas


In [37]:
dfcontr.merge(dfcand, left_on="candidate_id", right_on="id", how="outer")[cols_wanted]


Out[37]:
last_name_x first_name_x candidate_id id last_name_y
0 Agee Steven 16.0 16 Huckabee
1 Ahrens Don 16.0 16 Huckabee
2 Ahrens Don 16.0 16 Huckabee
3 Ahrens Don 16.0 16 Huckabee
4 Akin Charles 16.0 16 Huckabee
5 Akin Mike 16.0 16 Huckabee
6 Akin Rebecca 16.0 16 Huckabee
7 Aldridge Brittni 16.0 16 Huckabee
8 Allen John D. 16.0 16 Huckabee
9 Allen John D. 16.0 16 Huckabee
10 Allison John W. 16.0 16 Huckabee
11 Allison Rebecca 16.0 16 Huckabee
12 Allison Rebecca 16.0 16 Huckabee
13 Altes R.D. 16.0 16 Huckabee
14 Andres Dale 16.0 16 Huckabee
15 Anthony John 16.0 16 Huckabee
16 Arbogast Robert 16.0 16 Huckabee
17 Arbogast Robert 16.0 16 Huckabee
18 Ardle William 16.0 16 Huckabee
19 Atiq Omar 16.0 16 Huckabee
20 Atiq Omar 16.0 16 Huckabee
21 Baker David 16.0 16 Huckabee
22 Bancroft David 16.0 16 Huckabee
23 Banks Charles 16.0 16 Huckabee
24 Barbee John 16.0 16 Huckabee
25 Buckler Steve 20.0 20 Obama
26 Buckler Steve 20.0 20 Obama
27 Buckheit Bruce 20.0 20 Obama
28 Buckel Linda 20.0 20 Obama
29 Buckel Linda 20.0 20 Obama
... ... ... ... ... ...
155 ABBOTT ROBERT 37.0 37 McCain
156 ABBOTT MIKE 37.0 37 McCain
157 ABBOT DAVID 37.0 37 McCain
158 ABBO PAULINE 37.0 37 McCain
159 ABATE MARIA 37.0 37 McCain
160 ABAIR PETER 37.0 37 McCain
161 ABACHERLI SHIRLEY 37.0 37 McCain
162 AARONS CHARLES 37.0 37 McCain
163 AARONS CHARLES 37.0 37 McCain
164 AARONS CHARLES 37.0 37 McCain
165 ABEL JOHN 37.0 37 McCain
166 ABEL MARLING 37.0 37 McCain
167 ABEL RUDOLPH 37.0 37 McCain
168 ABELE RODNEY 37.0 37 McCain
169 ABERCROMBIE DENIS 37.0 37 McCain
170 ABESHAUS MERRILL 37.0 37 McCain
171 ABRAHAM GEORGE 37.0 37 McCain
172 ABRAHAMSON PETER 37.0 37 McCain
173 ABRAHAM SALEM 37.0 37 McCain
174 ABRAHAM SALEM 37.0 37 McCain
175 NaN NaN NaN 33 Biden
176 NaN NaN NaN 36 Brownback
177 NaN NaN NaN 39 Dodd
178 NaN NaN NaN 26 Edwards
179 NaN NaN NaN 24 Gravel
180 NaN NaN NaN 30 Hunter
181 NaN NaN NaN 31 Kucinich
182 NaN NaN NaN 29 Richardson
183 NaN NaN NaN 38 Tancredo
184 NaN NaN NaN 41 Thompson

185 rows × 5 columns