CSE 6040, Fall 2015 [09]: Relational Databases via SQL

Today's lab is a crash-course in relational databases, as well as SQL (Structured Query Language), which is the most popular language for managing relational databases.

There are many database management system ("DBMS") products that support SQL. The one we will consider in this class is the simplest, called sqlite3. It stores the database in a simple file and can be run in a "standalone" mode. However, we will consider invoking it from Python.

With a little luck, you might by the end of this class understand this xkcd comic on SQL injection attacks.

Getting started

In Python, you connect to an sqlite3 database by creating a connection object.


In [ ]:
import sqlite3 as db

# Connect to a database (or create one if it doesn't exist)
conn = db.connect ('example.db')

sqlite maintains databases as files; in this example, the name of that file is example.db.

If the named file does not yet exist, connecting to it in this way will create it.

To issue commands to the database, you also need to create a cursor.


In [ ]:
# Create a 'cursor' for executing commands
c = conn.cursor ()

A cursor tracks the current state of the database, and you will mostly be using the cursor to manipulate or query the database.

Tables and Basic Queries

The main object of a relational database is a table.

Conceptually, your data consists of items and attributes. In a database table, the items are rows and the attributes are columns.

For instance, suppose we wish to maintain a database of Georgia Tech students, whose attributes are their names and GT IDs. You might start by creating a table named Students to hold this data. You can create the table using the command, create table.


In [ ]:
c.execute ("create table Students (gtid integer, name text)")

Note: This command will fail if the table already exists. If you are trying to carry out these exercises from scratch, you may need to remove any existing example.db first.

To populate the table with items, you can use the command, insert into.


In [ ]:
c.execute ("insert into Students values (123, 'Vuduc')")
c.execute ("insert into Students values (456, 'Chau')")
c.execute ("insert into Students values (381, 'Bader')")
c.execute ("insert into Students values (991, 'Sokol')")

Given a table, the most common operation is a query. The simplest kind of query is called a select.

The following example selects all rows (items) from the Students table.


In [ ]:
c.execute ("select * from Students")

Conceptually, the database is now in a new state in which you can ask for results of the query. One way to do that is to call fetchone() on the cursor object, which will return a tuple corresponding to a row of the table.

This example calls fetchone() twice to get the first two query results.


In [ ]:
print (c.fetchone ())
print (c.fetchone ())

An alternative to fetchone() is fetchall(), which will return a list of tuples for all rows, starting at the cursor.

Since the preceding code has already fetched the first two results, calling fetchall() at this point will return all remaining results.


In [ ]:
print (c.fetchall ())

Question. What will calling fetchone() at this point return?


In [ ]:
print (c.fetchone ())

Here is an alternative, an arguably more natural, idiom for executing a query and iterating over its results.


In [ ]:
query = 'select * from Students'
for student in c.execute (query):
    print (student)

An insertion idiom

Another common operation is to perform a bunch of insertions into a table from a list of tuples. In this case, you can use executemany().


In [ ]:
# An important (and secure!) idiom
more_students = [(723, 'Rozga'),
                 (882, 'Zha'),
                 (401, 'Park'),
                 (377, 'Vetter'),
                 (904, 'Brown')]

c.executemany ('insert into Students values (?, ?)', more_students)

query = 'select * from Students'
for student in c.execute (query):
    print (student)

Exercise. Suppose we wish to maintain a second table, called Takes, which records classes that students have taken and the grades they earn.

In particular, each row of Takes stores a student by his/her GT ID, the course he/she took, and the grade he/she earned. More formally, suppose this table is defined as follows:


In [ ]:
c.execute ('create table Takes (gtid integer, course text, grade real)')

Write a command to insert the following records into the Takes table.

  • Vuduc: CSE 6040 - A (4.0), ISYE 6644 - B (3.0), MGMT 8803 - D (1.0)
  • Sokol: CSE 6040 - A (4.0), ISYE 6740 - A (4.0)
  • Chau: CSE 6040 - C (2.0), CSE 6740 - C (2.0), MGMT 8803 - B (3.0)

In [ ]:
# Insert your solution here; use the next cell to test the output
taken_spring2015 = [(991, "CSE 6040", 4.0),
                    (456, "CSE 6040", 4.0),
                    (123, "CSE 6040", 2.0),
                    (123, "ISYE 6644", 3.0),
                    (123, "MGMT 8803", 1.0),
                    (991, "ISYE 6740", 4.0),
                    (456, "CSE 6740", 2.0),
                    (456, "MGMT 8803", 3.0)]

c.executemany ('insert into Takes values (?, ?, ?)', taken_spring2015)

In [ ]:
# Displays the results of your code
for row in c.execute ('select * from Takes'):
    print (row)

Join queries

The "big idea" in a relational database is to build queries that combine information from multiple tables. A join query is one such operation.

There are many types of joins, but the simplest is one in which you use the where clause of a select statement to specify how to match rows from the tables being joined.

For example, recall that the Takes table stores classes taken by each student. However, these classes are recorded by a student's GT ID. Suppose we want a report where we want each student's name rather than his/her ID. We can get the matching name from the Students table. Here is a query to accomplish this matching:


In [ ]:
# See all (name, course, grade) tuples
query = '''
    select Students.name, Takes.course, Takes.grade
        from Students, Takes
        where Students.gtid=Takes.gtid
'''

for match in c.execute (query):
    print (match)

Exercise. Write a query to select only the names and grades of students who took CSE 6040.


In [ ]:

Aggregations

Another common style of query is an aggregation, which is a summary of information across multiple records, rather than the raw records themselves.

For instance, suppose we want to compute the GPA for each unique GT ID from the Takes table. Here is a query that does it:


In [ ]:
query = '''
    select Students.name, avg (Takes.grade)
        from Takes, Students
        where Students.gtid=Takes.gtid
        group by Takes.gtid
'''

for match in c.execute (query):
    print (match)

Exercise. Compute the GPA of every student, but report the name (rather than GT ID) and GPA.


In [ ]:

Cleanup

As one final bit of information, it's good practice to shutdown the cursor and connection, the same way you close files.


In [ ]:
c.close()
conn.close()

In [ ]: