PEP 249 - The Python Database API Specification gives very specific instructions for how Python developers should interact with databases. Although there are some notable differences in the database backends, what that means is that from the Python perspective, the use of different databases like SQLite, PostgreSQL, MySQL, etc. should all be very similar. In this tutorial, we will go over the use of Python with SQLite - a lightweight and simple database that will immediately make the data management in your apps more effective.
In [ ]:
## Imports
import sqlite3
The first thing to keep in mind is that we have to import driver code - that is the API for the specific database that we want to use. The most common are:
Though a host of other databases for vendors like IBM, Microsoft, and Oracle can be found at Python Database Interfaces.
In this tutorial we will be using sqlite3
because it ships with Python (the other drivers are third party) and because it is so simple to use. SQLite databases are the embedded backbone of many applications, though they should be kept small.
The first thing you have to do is make a connection to a database. Often times this means you'll need the following information to connect to a database server:
SQLite is an embedded database, however - which means it is stored in a file on disk, and operated on soley by a single program (not multiple programs at once). Therefore in order to create a connection to a SQLite database, we simply need to point it to a file on disk.
In [ ]:
DBPATH = 'people.db'
conn = sqlite3.connect(DBPATH)
At this point, you should notice that a file called people.db
has been created in your current working directory!
The connect method returned a connection object that we've called conn
. With conn you can manipulate your connection to the database including the following methods:
conn.commit()
- commit any changes back to the databaseconn.close()
- close our connection to the database and tidy upHowever, to execute SQL against the database to INSERT
or SELECT
rows, we'll need to create a cursor:
In [ ]:
cursor = conn.cursor()
A cursor is essentially a pointer into the database. Think of it like a mouse cursor that keeps track of where you are on in the database table or tables. Cursors have the following methods:
cursor.execute()
- executes a SQL string against the databasecursor.fetchone()
- fetch a single row back from the executed querycursor.fetchall()
- fetch all results back from the executed query. Together, connections and cursors are the basic way to interact with a SQL database.
The first thing we have to do is describe the type of data that we'll be putting in the database by creating a schema. For this workshop, we'll be creating a very simple contacts application, our schema is as follows:
Here we have two tables, contacts
which keeps track of people, their email, and who they are affiliated with, and companies
which keeps tracks of organizations. To create the companies table we would execute SQL as follows:
In [ ]:
sql = (
"CREATE TABLE IF NOT EXISTS companies ("
" id INTEGER PRIMARY KEY AUTOINCREMENT,"
" name TEXT NOT NULL"
")"
)
cursor.execute(sql)
A note on the syntax above - since I like to write clean, well-indented SQL; I used a string concatentation method in Python, by opening up a parentheses and adding strings without commas on new lines between them. If you print sql
you'll see it's just one long string with spaces inside of it. You could also use docstrings with the three quotes """
to write a multiline string, or even read in the SQL from a file.
Now write and execute the contacts table create statement.
In [ ]:
# Create the contacts table
In [ ]:
sql = "INSERT INTO companies (name) VALUES (?)"
cursor.execute(sql, ("Georgetown University",))
conn.commit()
Here we've created essentially a SQL template for inserting the names of companies into the table. Note that we don't have to assign an id, since it will be automatically assigned using the AUTOINCREMENT property of that field.
The ?
is a parameter to the query, and can be used as a placeholder for any user input. Values for the parameters are then passed to the second argument of the execute
method as a tuple. You should not use string formatting methods like:
sql = "INSERT INTO companies (name) VALUES ({})".format("Georgetown University")
This is potentially unsafe behavior, and the ?
parameters do a lot of work on your behalf to make sure things work correctly and securely.
Let's go ahead and insert another record using the same sql statement.
In [ ]:
cursor.execute(sql, ("US Department of Commerce",))
conn.commit()
The last thing we should mention is the commit
call. Nothing will be written to the database until commit is called. This gives us an interesting ability to do transactions - a series of SQL queries that when completed together succesfully, we commit them. However if something goes wrong during execution, we don't commit and therefore "rollback".
Before we go on to insert contact information, we need to know the id of the company of the contact we're inserting. However, because we inserted the data using the auto increment feature, we don't know what the company's ids are. To read them, we'll have to fetch them as follows:
In [ ]:
cursor.execute("SELECT id FROM companies WHERE name=?", ("Georgetown University",))
print cursor.fetchone()
Using the same parameter statement in our predicate clause and passing in the tuple containing "Georgetown University" as an argument to execute
, we can select the id that we need, which is returned as a Row
. Row
s present themselves as tuples, and since we only fetched the ID, it is the first element in the record. Note that we can use SELECT *
to select all fields in a record if so desired.
The fetchone
statement goes and gets the first record it finds. Note that the name of companies are not constrained uniqueness, therefore there could be multiple "Georgetown University" records fetched from this query. If you wanted all of them, you would use fetchall
.
Now to insert a person who works for Georgetown University you would write a statement similar to:
sql = "INSERT INTO contacts (name, email, company_id) VALUES (?,?,?)"
cursor.execute(sql, ("Benjamin Bengfort", "bb830@georgetown.edu", 1))
conn.commit()
And at this point you should start inserting some contacts and companies.
In [ ]:
# Insert some contacts and companies using the methods described above.
Consider the following questions while you're working with the database:
At this point, you should be able to create a small Python application that does the following:
Using the following incomplete code as a template to help you design your project:
In [ ]:
import os
import sqlite3
def create_tables(conn):
"""
Write your CREATE TABLE statements in this function and execute
them with the passed in connection.
"""
# TODO: fill in.
pass
def connect(path="people.db", syncdb=False):
"""
Connects to the database and ensures there are tables.
"""
# Check if the SQLite file exists, if not create it.
if not os.path.exists(path):
syncdb=True
# Connect to the sqlite database
conn = sqlite3.connect(path)
if syncdb:
create_tables(conn)
return conn
def insert(name, email, company, conn=None):
if not conn: conn = connect()
# Attempt to select company by name first.
# If not exists, insert and select new id.
# Insert contact
if __name__ == "__main__":
name = raw_input("Enter name: ")
email = raw_input("Enter email: ")
company = raw_input("Enter company: ")
conn = connect()
insert(name, email, company, conn)
# Change below to count contacts per company!
contacts = conn.execute("SELECT count(id) FROM contacts").fetchone()
print "There are now {} contacts".format(*contacts)
conn.close()