Jupyter
notebook called Exercises-Final.ipynb
inside the L18
directory. This is the one we will grade.Jupyter
notebook cell. It should look like a Python
script. You must comment where appropriate to demonstrate that you understand what you are doing.L18DB.sqlite
.
In [1]:
import sqlite3
We will also use a basic a pandas
feature to display tables in the database. Although this lecture isn't on pandas
, I will still have you use it a little bit.
In [2]:
import pandas as pd
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
Now we create the tables in the database (just like last time).
In [3]:
db = sqlite3.connect('L18DB_demo.sqlite')
cursor = db.cursor()
cursor.execute("DROP TABLE IF EXISTS candidates")
cursor.execute("DROP TABLE IF EXISTS contributors")
cursor.execute("PRAGMA foreign_keys=1")
cursor.execute('''CREATE TABLE candidates (
id INTEGER PRIMARY KEY NOT NULL,
first_name TEXT,
last_name TEXT,
middle_init TEXT,
party TEXT NOT NULL)''')
db.commit() # Commit changes to the database
cursor.execute('''CREATE TABLE contributors (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
last_name TEXT,
first_name TEXT,
middle_name TEXT,
street_1 TEXT,
street_2 TEXT,
city TEXT,
state TEXT,
zip TEXT,
amount REAL,
date DATETIME,
candidate_id INTEGER NOT NULL,
FOREIGN KEY(candidate_id) REFERENCES candidates(id))''')
db.commit()
In [4]:
with open ("candidates.txt") as candidates:
next(candidates) # jump over the header
for line in candidates.readlines():
cid, first_name, last_name, middle_name, party = line.strip().split('|')
vals_to_insert = (int(cid), first_name, last_name, middle_name, party)
cursor.execute('''INSERT INTO candidates
(id, first_name, last_name, middle_init, party)
VALUES (?, ?, ?, ?, ?)''', vals_to_insert)
In [5]:
with open ("contributors.txt") as contributors:
next(contributors)
for line in contributors.readlines():
cid, last_name, first_name, middle_name, street_1, street_2, \
city, state, zip_code, amount, date, candidate_id = line.strip().split('|')
vals_to_insert = (last_name, first_name, middle_name, street_1, street_2,
city, state, int(zip_code), amount, date, candidate_id)
cursor.execute('''INSERT INTO contributors (last_name, first_name, middle_name,
street_1, street_2, city, state, zip, amount, date, candidate_id)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)''', vals_to_insert)
In [6]:
def viz_tables(cols, query):
q = cursor.execute(query).fetchall()
framelist = []
for i, col_name in enumerate(cols):
framelist.append((col_name, [col[i] for col in q]))
return pd.DataFrame.from_items(framelist)
Here's how we can use our helper function. It gives a pretty nice visualization of our table. You should do the same thing with the contributors
table.
In [7]:
candidate_cols = [col[1] for col in cursor.execute("PRAGMA table_info(candidates)")]
query = '''SELECT * FROM candidates'''
viz_tables(candidate_cols, query)
Out[7]:
In [8]:
query = '''SELECT * FROM candidates WHERE middle_init <> ""'''
viz_tables(candidate_cols, query)
Out[8]:
We can also see how many entries satisfy the query:
In [9]:
print("{} candidates have a middle initial.".format(viz_tables(candidate_cols, query).shape[0]))
state
to the empty stringIN ("WA", "PA")
in your SELECT
statement.BETWEEN 100.00 and 200.00
clause.
In [10]:
query = '''SELECT * FROM candidates ORDER BY id DESC'''
viz_tables(candidate_cols, query)
Out[10]:
contributors
table:contributors
table by last_name
.amount
in decending order where amount
is restricted to be between $\$1000.00$ and $\$5000.00$.candidate_id
and then by amount
in descending order.ORDER BY
with commas.ORDER BY amount ASC, last_name DESC
In [11]:
query = '''SELECT last_name, party FROM candidates'''
viz_tables(['last_name', 'party'], query)
Out[11]:
Using the DISTINCT
clause, you remove duplicate rows.
In [12]:
query = '''SELECT DISTINCT party FROM candidates'''
viz_tables(['party'], query)
Out[12]:
In [13]:
cursor.execute('''ALTER TABLE candidates ADD COLUMN full_name TEXT''')
candidate_cols = [col[1] for col in cursor.execute("PRAGMA table_info(candidates)")]
viz_tables(candidate_cols, '''SELECT * FROM candidates''')
Out[13]:
What if we want to rename or delete a columm? It can't be done with SQLite
with a single command. We need to follow some roundabout steps (see SQLite
ALTER TABLE). We won't consider this case at the moment.
For now, let's put a few commands together to populate the full_name
column.
In [14]:
candidate_cols = [col[1] for col in cursor.execute("PRAGMA table_info(candidates)")] # regenerate columns with full_name
query = '''SELECT id, last_name, first_name FROM candidates''' # Select a few columns
full_name_and_id = [(attr[1] + ", " + attr[2], attr[0]) for attr in cursor.execute(query).fetchall()] # List of tuples: (full_name, id)
update = '''UPDATE candidates SET full_name = ? WHERE id = ?''' # Update the table
for rows in full_name_and_id:
cursor.execute(update, rows)
query = '''SELECT * FROM candidates'''
viz_tables(candidate_cols, query)
Out[14]:
Here's another update, this time on an existing column.
In [15]:
update = '''UPDATE candidates SET full_name = "Eventual Winner" WHERE last_name = "Obama"'''
cursor.execute(update)
update = '''UPDATE candidates SET full_name = "Eventual Loser" WHERE last_name = "McCain"'''
cursor.execute(update)
viz_tables(candidate_cols, query)
Out[15]:
In [16]:
contributor_cols = [col[1] for col in cursor.execute("PRAGMA table_info(contributors)")] # You've already done this part. I just need to do it here b/c I haven't yet.
function = '''SELECT *, MAX(amount) AS max_amount FROM contributors'''
viz_tables(contributor_cols, function)
Out[16]:
In [17]:
query = '''SELECT * FROM candidates LIMIT 3'''
viz_tables(candidate_cols, query)
Out[17]:
In [18]:
query = '''SELECT * FROM candidates LIMIT 4 OFFSET 5'''
viz_tables(candidate_cols, query)
Out[18]:
In [19]:
query = '''SELECT * FROM candidates ORDER BY last_name LIMIT 4 OFFSET 5'''
viz_tables(candidate_cols, query)
Out[19]: