In the previous lesson, we introduced SQL and demonstrated how to use SQL to create schema, insert data into the new schema, how to query these data, and how to modify and delete this data and the schema itself. However, this was all done manually by using a database client tool. A more efficient mechanisms is to enable automatic database interactions from within a Python program.
In this lesson, we explore how to connect to a SQLite database from with a Python program. This will build on ideas presented in the previous two lessons. While some of what we do will be SQLite specific, most of what we do will be generic and easily applied to other databases for which a Python database interface library has been developed, which is most major SQL based databases.
To use the SQLite database from within a Python program, we simply need to import the sqlite3 module. Since SQLite is distributed with Python, there are no extra download or installation steps. If you wish to use a different database, you will first need to download the appropriate Python library for that database; and, second, install the library to the appropriate system path.
In the following code cells, we demonstrate how to use SQLite from within a Python program. The first step is to simply import the sqlite3 module, after this we can access the database. In the first example, we simply access several module atributes that specify the version information for both the SQLIte database and sqlite3 module we currently have installed.
In [1]:
import sqlite3 as sl
print("SQLite library version: {}".format(sl.version))
print("SQLite version: {}".format(sl.sqlite_version))
Since the sqlite3
module is part of the standard Python library, this
import process should be painless. We also can take advantage of the
fact that the SQLite database itself is embedded within the Python
interpreter. As a result, we can easily view the built-in help
information for either the sqlite3 module or for specific attributes,
functions, or types that are in the sqlite3 module by using the Python
help()
function.
In [2]:
# View built-in help for module
help(sl)
In [3]:
# We can also get useful information on specific functions/types
help(sl.connect)
To work with a database from within a Python program, we follow a specific set of operations:
Establish a Connection to the database. With most database systems, this involves a network connection to a (possibly) remote database server. SQLite is an embedded database, however, so we, by default, simply need the database file to opened by the SQLite library. Note that we can also work with a temporary, in memory database, which will be demonstrated in the next few code blocks.
Obtain a Cursor from the database connection. A database cursor is an object that enables us to programmatically move through a database table in order to insert, delete, update, or select data.
Execute SQL commands by using the database cursor. These commands can be SQL DDL commands where we create schema, or SQL DML commands where we insert, update, delete, or select data. The execution process can return information, that we can programmatically use.
In the following code cell, we first establish a connection to an in
memory SQLite database. We do this by using a runtime context via the
with
statement. This will ensure that if the operations contained
within the context code block are succesful that database transaction
will be committed. If there is a problem within the context, or an
exception is thrown, however, the transaction will instead be rolled
back. As a result, the database will be maintained in a consistent state.
We obtain a database connection by calling the connect
built-in method
in the sqlite3 library. The only required parameter is the name of the
database, which is generally the full pathname to the database file, for
example we could connect to our Docker container database by using the
path: /notebooks/i2ds/database/i2ds
. In this case we use the special
name ':memory:'
, which indicates that our database will be temporary
and maintained in the program's memory space. This method returns an
instance of the SQLite Connection object. Note that we can use
implicit cursor objects by calling execute methods directly on the
connection object, but the recommended approach is to always be
explicit, and in this case that means use an explicit cursor.
Once we have a cursor object, we can begin to execute SQL commands. There are several methods that we can use, depending on our specific needs:
execute()
: Will execute a singel SQL statement. The SQL statement
can be parameterized, in which case the replacement values also must be
passed to the execute method.
executemany()
: Executes a SQL statement multiple times. The SQL
statement, which is the first argument is parameterized and with each new
invocation the parameters are replaced by the values contained in the
second argument. This can be useful for inserting data into an existing
table.
executescript()
: execute multiple SQL statements simultaneously. A
transaction commit is first performed, and then the SQL statements
contained in the argument passed into the method are evaluated.
After we have executed one or more SQL query statements, our cursor object will enable us to fetch the results. There are three different fetch operations:
fetchone()
: returns the next row in the query result. The return
data type is a single sequence containing the values in the row. If no
data was returned, None
is instead returned.
fetchmany()
: returns the next set of rows in the query result. The
return data type is a list containing the individual rows (which are
sequences). A size parameter can be supplied to indicate how many rows should be returned.
fetchall()
: returns all (remaining) rows of a query result. The
return data type is a list, and if no more results remain, and empty
list is returned.
In [4]:
# Open a database connection, here we use an in memory DB
with sl.connect(":memory:") as con:
# Now we obtain our cursor and execute a simple query.
cur = con.cursor()
cur.execute('SELECT SQLITE_VERSION()')
# Our simple query has one return value, so we only need to fetch one
data = cur.fetchone()
# Output the information
print("SQLite version: {}".format(data[0]))
In the previous example, we
While trivial, this example does demonstrate how to perform the basic
steps of working with a SQLite database. In the next example, however, we
return to our Bigdog's Surf Shop example to programmatically create and
populate a schema before issuing a query. In this case, we use a
predefined SQL string to create our table, and use a tuple sequence to
pass our data into the executemany()
function. As shown in the
example, the parameters are indicated in the SQL INSERT statement as
question mark character format ?
. A second technique exists in which
the parameters are explicitly named; this will be demonstrated later.
The executemany()
method replaces the parameters by the corresponding
values from the items
sequence, continuing until the sequence of items
has been exhausted.
After the schema has been created and populated, we next issue a SQL
query. In this case, we use the fact that the execute()
method can be
used as an iterator to iteratively process each row returned from our
query. The row
object is a sequence, which easily allows us to extract
the desired columns.
In [5]:
# We define our Create Table SQL command
createSQL = '''
CREATE TABLE myProducts (
itemNumber INT NOT NULL,
price REAL,
stockDate TEXT,
description TEXT);
'''
# Tuple containing data values to insert into our database
items = ((1,19.95,'2015-03-31','Hooded sweatshirt'),
(2,99.99,'2015-03-29','Beach umbrella'),
(3,0.99,'2015-02-28', None),
(4,29.95,'2015-02-10','Male bathing suit, blue'),
(5,49.95,'2015-02-20','Female bathing suit, one piece, aqua'),
(6,9.95,'2015-01-15','Child sand toy set'),
(7,24.95,'2014-12-20','White beach towel'),
(8,32.95,'2014-12-22','Blue-striped beach towel'),
(9,12.95,'2015-03-12','Flip-flop'),
(10,34.95,'2015-01-24','Open-toed sandal'))
# Open a database connection, here we use an in memory DB
with sl.connect(":memory:") as con:
# Now we obtain our cursor
cur = con.cursor()
# First we create the table
cur.execute(createSQL)
# Now populate the table using all items
cur.executemany("INSERT INTO myProducts VALUES(?, ?, ?, ?)", items)
# We can now select rows and iterate through them
for row in cur.execute('SELECT * FROM myProducts'):
print("{0} costs ${1}; last stocked on {2}.".format(row[3], row[1], row[2]))
The second approach to using parameterized SQL statements is to
explicitly name the parameters. This is accomplished by using a colon
:
followed by the name in the SQL statement, and passing in a
dictionary where the name used in the SQL statement is the key and the
corresponding dictionary value holds the value to insert into the SQL
statement. For example, we could replace the previous SQL INSERT
statement with the equivalent named parameter form, using the
execute()
method:
for item in items:
cur.execute("INSERT INTO myProducts VALUES(:id, :price, :sdate, :desc)",
{"id" : item[0], "price" : item[1], "sdate" : item[2], "desc" : item[3]})
We can also create user-defined functions in SQLite, and use these new
functions in a subsequent SQL Statement. This is demonstrated in the
next code cell, where we recreate our temporary database in memory,
define three new functions to extract the year, month, and day
from the date text string in our database. To do this, we first create
the Python versions of these functions: myYear
, myMonth
, and
myDay
. Next, we call the create_function
method on our database
connection to create the SQLite functions.
This process essentially creates a map between the SQLite name of the function, which is used in SQL statements and the Python function. We also must pass in the expected number of arguments, which for all three of these demo-functions is simply the one data text string. Finally, we use these new functions in our SQL query, as we now explicitly return the year, month, and day separately for each item in the database. To be clear, these functions are simply for demonstration. To actually perform this operation, we should use regular expressions to build a more robust date component extraction.
In [6]:
# We define three simple functions
def myYear(date):
return int(date[:4])
def myMonth(date):
return int(date[5:7])
def myDay(date):
return (int(date[-2:]))
# Define our SQL query
query = 'SELECT description, fMonth(stockDate), fDay(stockDate), fYear(stockDate) FROM myProducts'
# Open a database connection, here we use an in memory DB
with sl.connect(":memory:") as con:
# Now we obtain our cursor
cur = con.cursor()
# First we create the table
cur.execute(createSQL)
# Now populate the table using all items
cur.executemany("INSERT INTO myProducts VALUES(?, ?, ?, ?)", items)
# Create our user-defined functions
con.create_function("fYear", 1, myYear)
con.create_function("fMonth", 1, myMonth)
con.create_function("fDay", 1, myDay)
# We can now select rows by using our functions and iterate through results
for row in cur.execute(query):
print("{0} last stocked on {1}/{2}/{3}.".format(row[0], row[1], row[2], row[3]))
Now we can turn to executing SQL queries against an actual SQLite database. In this case, we will use the airline database we built in the first lesson. As shown below, the program logic is identical to the in memory database, with the exception that we now explicitly list the full path to our database file.
One important change, however, will be in the execution speed since this database is large and in a file in our Docker container as opposed to an in memory database.
In [7]:
database = '/notebooks/i2ds/database/i2ds'
query = "SELECT COUNT(*), tailNumber FROM flights GROUP BY tailNumber LIMIT 10"
with sl.connect(database) as con:
cur = con.cursor()
for row in cur.execute(query):
print(row)
We can also unpack the row returned from our SQL query directly into Python variables. This can simplify their subsequent use as demonstrated below.
In [8]:
with sl.connect(database) as con:
total = 0
cur = con.cursor()
for row in cur.execute(query):
count, number = row
print("Flight# {} flew {} times in 2001".format(number, count))
total += count
print("\nWe found {} total flights for the selected period.".format(total))
We also can query a database, and write out results to a stream, such as a file. In the following code cell, we open a file for writing, establish a database connection, execute a query, and write a formatted version of each row returned from the query to the file.
In [9]:
fout = open('airports.txt', 'w')
with sl.connect(database) as con:
cur = con.cursor()
for row in cur.execute('SELECT * FROM airports LIMIT 100'):
print("Airport ID {0} is located in {1}, {2}.".format(row[0], row[2], row[3]),
file=fout)
fout.close()
In [10]:
!head airports.txt
Since SQLite supports a limited set of data types: NULL, INTEGER, REAL, TEXT, and BLOB, we can face difficulties in trying to effectively map complex Python data types into SQLite data types. By default, the following mappings exist:
Python | SQLite |
---|---|
None | NULL |
int | INTEGER |
float | REAL |
str | TEXT |
bytes | BLOB |
The only simply exception is that SQLite TEXT data can be converted to
different Python types by using a text_factory
. For example, the
TEXT data can be returned as a byte
string. To extend the SQLite
default type system we can either use object adaption to store more
Python data types in a SQLite database, or we can supply converters
to transform SQLite data types into different Python typres. Both of
these are fairly specific to SQLite, however, and thus we will not
discuss these in more detail.