Introduction to SQL

Professor Robert J. Brunner

</DIV>

Introduction

In this lesson, we focus on a basic component of relational database management systems, SQL. In this Notebook we will use the SQLite database to build a fictitious database. We will cover SQL data types and how to create SQL schemas before moving on to creating and executing queries and finishing with updating and deleting data.

We can now start using the SQLite database. First we will test SQLite from within the IPython Notebook, before switching to the command line to actually create and populate a database. Note that if we run the sqlite3 command line client from within an IPython Notebook cell, the process will continue to run in the background since we can not directly enter commands. Thus, you should either redirect STDIN for the sqlite3 client to be a file of commands, or work directly with this tool at the command line.

Working with SQLite

By default, the sqlite3 command line client will operate in interactive mode. However, this tool will also read and execute commands either in from a separate file by redirecting STDIN or by enclosing the commands in quotes. Since SQLite databases are files, unless explicitly created from within a program as in memory databases, we pass the name of the database as a command line argument. Thus, to connect to the i2ds database with the sqlite3 command line client in interactive mode, we simply enter the following at a command prompt in our Docker container:

/notebooks/i2ds/database# sqlite3 i2ds
SQLite version 3.8.2 2013-12-06 14:53:30
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite>

To exit from the sqlite3 client, simply enter either ctrl-d or use the .quit command. The sqlite3 client can either accept SQL commands, which we will discus in more detail in the next lesson, or the client can accept dot commands, which are instructions to the SQLite database engine that begin with a . character. These commands can be explicitly listed by entering .help at the sqlite3 client prompt. We can do this from within our Notebook by creating and using a file as shown below.



In [ ]:

    
# Run the SQLite command line client

!sqlite3 -help



In [1]:

    
%%writefile help.txt
.help









    



Writing help.txt



In [1]:

    
# List 'dot' commands

!sqlite3 < help.txt









    



.backup ?DB? FILE      Backup DB (default "main") to FILE
.bail on|off           Stop after hitting an error.  Default OFF
.clone NEWDB           Clone data into NEWDB from the existing database
.databases             List names and files of attached databases
.dump ?TABLE? ...      Dump the database in an SQL text format
                         If TABLE specified, only dump tables matching
                         LIKE pattern TABLE.
.echo on|off           Turn command echo on or off
.exit                  Exit this program
.explain ?on|off?      Turn output mode suitable for EXPLAIN on or off.
                         With no args, it turns EXPLAIN on.
.headers on|off        Turn display of headers on or off
.help                  Show this message
.import FILE TABLE     Import data from FILE into TABLE
.indices ?TABLE?       Show names of all indices
                         If TABLE specified, only show indices for tables
                         matching LIKE pattern TABLE.
.log FILE|off          Turn logging on or off.  FILE can be stderr/stdout
.mode MODE ?TABLE?     Set output mode where MODE is one of:
                         csv      Comma-separated values
                         column   Left-aligned columns.  (See .width)
                         html     HTML <table> code
                         insert   SQL insert statements for TABLE
                         line     One value per line
                         list     Values delimited by .separator string
                         tabs     Tab-separated values
                         tcl      TCL list elements
.nullvalue STRING      Use STRING in place of NULL values
.once FILENAME         Output for the next SQL command only to FILENAME
.open ?FILENAME?       Close existing database and reopen FILENAME
.output ?FILENAME?     Send output to FILENAME or stdout
.print STRING...       Print literal STRING
.prompt MAIN CONTINUE  Replace the standard prompts
.quit                  Exit this program
.read FILENAME         Execute SQL in FILENAME
.restore ?DB? FILE     Restore content of DB (default "main") from FILE
.save FILE             Write in-memory database into FILE
.schema ?TABLE?        Show the CREATE statements
                         If TABLE specified, only show tables matching
                         LIKE pattern TABLE.
.separator STRING      Change separator used by output mode and .import
.shell CMD ARGS...     Run CMD ARGS... in a system shell
.show                  Show the current values for various settings
.stats on|off          Turn stats on or off
.system CMD ARGS...    Run CMD ARGS... in a system shell
.tables ?TABLE?        List names of tables
                         If TABLE specified, only list tables matching
                         LIKE pattern TABLE.
.timeout MS            Try opening locked tables for MS milliseconds
.timer on|off          Turn SQL timer on or off
.trace FILE|off        Output each SQL statement as it is run
.vfsname ?AUX?         Print the name of the VFS stack
.width NUM1 NUM2 ...   Set column widths for "column" mode
                         Negative values right-justify

SQLite Client Tool Options

The previous code block listed the available dot commands that we can use within the sqlite3 client. We will use several of these, including

.header: If this is on, headers will be displayed. Generally this is used to see the names of columns in a table.
.separator: This specifies the separator character for fields, either in output displays or when importing data. By default, the separator character is the vertical bar, |. However, to import data from a comma-separated-value (CSV) file, you would change the separator to a comma, .separator ",".
.import: This command is used to read data from a file and insert this data into a specific table in the database.
.schema: This command will list the schema commands required to recreate the tables contained in the database.
.stats: This command lists statistics after each command entered at the sqlite prompt. This can be useful for profiling SQL commands.
.width: This command changes the default width for columns, which can improve the visual formatting of the results from database queries that are displayed to the screen.

Creating and Populating a Database

We can easily create and populate a database by using the sqlite3 client. While we could do this at the command line (and advanced users are encouraged to do so), we can also complete these tasks from within this IPython Notebook. The steps we must complete include

Create the new database. We do this by simply passing the name of our new database to the sqlite3 client. If the file does not exist, a new file will be created. This file will hold the entire contents of the new database.
Create the schema for our new database. A relational database is built on a tabular data model. Thus our schema consists of the table definitions as well as the relationships that might exist between tables. To accomplish this, we must execute SQL CREATE TABLE statements. For now, we simply create the schema, the next lesson explores SQL statements in more detail. A schema file for the airline data is included in this lesson, under the schema directory.
Populate the tables with data. For simplicity, we will use the .import command within the sqlite3 client to import data from a file directly into the relevant table in our database.



In [ ]:

    
# First we make a new directory to hold our database

!mkdir /notebooks/i2ds/database



In [ ]:

    
!ls -l /notebooks/i2ds/database



In [ ]:

    
# We could create a schema SQL file, then use Bulk imort to load the databases.

The basics of relational database systems

Before you can begin to develop database applications, you need to understand the basic concepts. Relational databases hold data. This data can be of different types, such as numbers, characters, or dates. Within the database, the data are organized into logical units called tables. A table is like a spreadsheet, because it contains rows of data. Each row is made up of a number of columns. The columns hold data of a specific data type, like integer values or strings of characters. In most cases, a database has more than one table. To relate the tables together, a database designer takes advantage of natural (or artificial) links between the tables. In a spreadsheet, you can link rows in different sheets by cell values. The same idea holds in a relational database, and the column used to make the link is referred to as a key column.

To make it easier to understand the purpose of a table or a particular column, you should select appropriate names. The naming conventions can vary between databases. For a SQLite database, individual names:

by default, case insensitive (although this can be changed),
are unlimited in length, but should be kept to a reasonable length (given readability constraints),
must begin with a letter or an underscore character, and
must only code alphanumeric characters or underscores.

You can escape these rules by placing the name in double quotation marks, which allows names to be case sensitive and to include additional characters (including spaces). Doing this, however, is generally a bad practice: It requires the name to always be enclosed in double quotation marks and can easily confuse someone else who may be maintaining your code. Finally, a name can not be one of the reserved keywords.

Class Style

For this class, we will follow a specific style: All SQL commands are presented entirely in uppercase, and item names use camelCase. In camelCase style, words are joined together, and the first letter of each word-following the first one-is capitalized, such as aLongIdentifier. Combining these two styles together, these articles write SQL commands using the following style: SELECT aLongIdentifier FROM dataTable ;.

Schema

Related tables are often grouped together into a schema. You can think of a schema as a container for all the related structure definitions within a particular database. A table name must be unique within a given schema. Thus, by using schemas, you can have identically named objects (such as tables) enclosed within different schemas. When you're using the Apache Derby database, a table is always in a schema. If you don't specify a schema explicitly, Derby implicitly uses the built-in apps schema. A second built-in schema called sys is used to isolate system tables.

You can use the schema name to qualify a name. By default, SQLite uses the database name as the schema, and you do not need to prefix names to indicate the correct schema. For other databases, however, this is not the case. For these databases, you specify the schema name followed by a period and then the table name. For example, bigdog.products references the products table in the bigdog schema. Without the relevant schema name, a table name is said to be unqualified, as in products. When the schema name and the table name are completely specified, as in bigdog.products, the name is said to be fully qualified.

In an abstract sense, these database concepts may seem confusing, but in practice they're fairly straightforward. For example, imagine you own a store called Bigdog's Surf Shop that sells a variety of items like sunglasses, shirts, and so on. If you want to be profitable, you must keep a close eye on your inventory so you can easily order additional inventory or change vendors to keep your overhead to a minimum. One simple method for tracking this information is to write entries in a table-like format:

Product Table

Item#	Price	Stock Date	Description
1	29.95	1/15/15	Basic Sunglasses
2	9.95	12/14/14	Generic Shirt
3	99.95	8/04/14	Boogie Board

Vendors Table

Item#	Vendor#	Vendor Name
1	101	Mikal Arroyo
2	102	Quiet Beach Industries
3	103	Vista Luna

From this simple visual design you can easily map the business logic straight into database tables. You have two database tables, Products and Vendors, which are naturally linked by the item number. The data types for the columns in each table are easy to determine. Later in this lesson we will actually create this sample schema for Bigdog's Surf Shop, which consists of these two tables, in a SQLite database. But first, we need to address how data is stored in a relational database table.

SQL: Structured Query Language

Database systems can be complex pieces of software, especially when they scale to support enterprise-level applications. As a result, you may expect that every database has its own application programming interface (API) and that these APIs may be different from one system to the next. When relational databases were first developed, this was the case; but, fortunately, a number of vendors agreed to develop a standard language for accessing and manipulating relational databases. This language is officially called Structured Query Language (or SQL, pronounced sea-quill). Several official standard versions have been produced, including one in 1992 that is referred to as SQL-92, and one in 1999 that is referred to as SQL-99. The Apache Derby database provides a nearly complete implementation of the SQL-92 standard, so applications developed with Derby can be easily transported to other database systems.

SQL has two main components: a Data Definition Language (DDL) and a Data Manipulation Language (DML). DDL commands are used to create, modify, or delete items (such as tables) in a database. DML commands are used to add, modify, delete, or select data from a table in the database. The rest of this article provides a basic introduction to the DDL components of SQL. Future articles will focus on the DML commands and more advanced DDL commands.

SQL data types

SQL, being a programming language in its own right, defines a rich data-type hierarchy. Persisting these data types is one of the most important responsibilities of the database. As databases have become more powerful, this type hierarchy has grown more complex. But most simple databases don't require the full range of allowed types, and often they need to store only numerical, character, and date or time data.

While the SQL standard defines basic data types, different database systems can support the standard to varying degrees. While this might seem odd, doing so provides more flexibility in allowing a particular implementation to achieve a market niche. In the case of SQLite, the design decisions support a compact, zero-configuration database file that is platform-independent. As a result, SQLite does not support a rich data type hierarchy, and instead focuses on ease-of-use.

SQLite supports five storage classes:

NULL: A null value.
INTEGER: A signed integer, the number of bytes (1, 2, 3, 4, 6, or 8) used depends on the magnitude of the value.
REAL: A floating-point value stored as an 8 byte IEEE floating-point value.
TEXT: A string of character values stored in the default database encoding (e.g., UTF-8).
BLOB: A blob of data stored exactly as is in the database.

Note that SQLite does not support Boolean or Date/Time valus directly. Instead, Boolean values are encoded as INTEGERs (0 = False, 1 = True). Likewise Date/Time values can be encoded either as TEXT, REAL, or INTEGER values. For full details, see the SQLite documentation. In addition, SQLite supports the concept of Type Affinity, whereby different data types can be easily mapped into each other. This simplifies moving schemas from other database systems to SQLite.

The SQL NULL type

Before you begin creating database tables, you must know what to do when no value is specified for a column. To illustrate this point, imagine that you've been asked to fill out a Web form. If you leave a particular column blank, what is inserted into the database? As you can imagine, this problem could be cumbersome if you had to track no value markers. Fortunately, SQL defines a special value, NULL, to indicate that a column has no value.

CREATE TABLE

So far, you've learned how to design a table, including mapping out the table columns and defining the data type for each column. After you've properly designed a table, the method for creating a table in SQL is straightforward. Listing 1 shows the formal syntax for creating a table in Derby.

-- Comment describing the purpose and layout of the table 

CREATE TABLE tableName ( { <columnDefinition> | <tableLevelConstraint> } 
    [, { <columnDefinition> | <tableLevelConstraint> } ]* ) ;

You may feel bewildered after looking at this syntax for the first time. But it's easy to follow once you have the basics down. The square brackets ([ and ]) enclose optional parameters. As you can see from the formal syntax, any column definitions or table-level constraints after the required initial one (it wouldn't make sense to create a table with no columns!) are optional.

You probably understand what is meant by a column definition, but you might not understand the idea of a constraint. Constraints come in two types: table-level constraints and column constraints. A constraint limits either a column or a table in some manner. For example, you can use a constraint to require that a column always be assigned an actual value (no NULL values), or that every entry in a column must be unique, or that a column is automatically assigned a default value.

The asterisk (*) after the last closing square bracket indicates that more than one of the enclosing items can be included. This implies that the table must have one or more columns or table-level constraints. The vertical line (|) indicates an either/or condition. In this syntax example, you must either define a new column or define a new table-level constraint. The curly brackets ({ and }) group related items together, and the parentheses (( and )) are required elements. Finally, the semicolon (;) indicates the end of a SQL statement.

SQLite Schema Creation

In the following code block, we create our schema for Bigdog's Surf Shop, which includes two new tables: myProducts and myVendors. The myProducts table has four columns: itemNumber, price, stockDate, and description. The itemNumber column provides a unique identity for each item (or row) and has an attached column-level constraint that enforces a valid value to always be supplied (NOT NULL). Without this requirement, the itemNumber column isn't guaranteed to be unique because multiple columns could be assigned a NULL value. The price column is created as a REAL data type. The last two columns are simple: The stockDate column is stored as a TEXT (we can use application logic to transform the date/time information into and out of the appropriate YYYY-MM-DD format), and description is also stored in a TEXT field.

The myVendors table has three columns: itemNumber, vendorNumber, and vendorName. In this case, both the itemNumber and vendorNumber columns have attached column-level constraints (NOT NULL). In addition, the vendorName column is stored as a TEXT field.

Drop Table

No one is perfect. What do you do when you incorrectly create a table or a table is no longer needed? The simple answer is to delete the table from the database and, if necessary, create a replacement table. Deleting a table is easy, which means, of course, that you should exercise great care when doing so -- no dialog box pops up and asks if you're sure you want to proceed!

The full syntax for deleting-or, more formally, dropping-a table from a database in SQLite is

DROP TABLE tableName ;

The syntax is simple: You append the fully qualified name and a semicolon to the DROP TABLE SQL command, and you're finished.

In the next few code cells, we first create our schema file, before executing the SQL commands on our new database by using the sqlite3 client tool.



In [1]:

    
%%writefile create.sql

-- First we drop any tables if they exist
-- Ignore the no such Table error if present

DROP TABLE myVendors ;
DROP TABLE myProducts ;

-- Vendor Table: Could contain full vendor contact information.
    
CREATE TABLE myVendors (
    itemNumber INT NOT NULL,
    vendornumber INT NOT NULL,
    vendorName TEXT
) ;

-- Product Table: Could include additional data like quantity
    
CREATE TABLE myProducts (
    itemNumber INT NOT NULL,
    price REAL,
    stockDate TEXT,
    description TEXT
) ;









    



Writing create.sql



In [3]:

    
# Now create the schema in a new test database

!sqlite3 test < create.sql



In [4]:

    
# We can test the results

!sqlite3 test ".schema"









    



CREATE TABLE myVendors (
    itemNumber INT NOT NULL,
    vendornumber INT NOT NULL,
    vendorName TEXT
);
CREATE TABLE myProducts (
    itemNumber INT NOT NULL,
    price REAL,
    stockDate TEXT,
    description TEXT
);

SQL scripts

As the previous code blocks demonstrated, we can write SQL commands into a script file that can be easily executed by the sqlite3 client. While this might seem like overkill given the simplicity of our current schema, it is actually a useful technique. Often, you'll need to execute multiple, complex commands. To simplify debugging a set of complex SQL commands, it's generally easier to write them in a text file and then execute the commands in the text file all at once. By placing SQL commands in a script file, you gain the additional benefit of being able to execute the commands as many times as necessary.

A script file is just a plain text file that contains a combination of SQL commands and SQLite commands that can be run directly from the sqlite3 tool. A script file simplifies the development and maintenance of relational databases and provides a self-documenting technique for building databases. You should store these files as ASCII text files, not as RTF files (or any other format), to prevent text-encoding errors. Some text applications may try to automatically save your file as a rich text file. Be careful to avoid doing so, or you may have problems when you try to execute your script file. For this reason, you probably will want to always create and edit your SQL scripts at the Unix command prompt.

Our previous script file, create.sql, includes several lines that start with two dashes (--). These lines are SQL comments; you should use them to provide a basic description of the purpose of each major component within the script file. The first actual commands ion the script file are SQL DROP statements that delete the myProducts and myVendors tables from the database. If the tables don't exist (which is the case if the database was just created), an error message is displayed; but as the preceding SQL comments indicate, you can safely ignore those messages.

You first drop the tables, if they exist, so that you can cleanly create new tables with the exact column definitions you need. The next two SQL statements do just that, creating the myProducts and myVendors tables. To run this script file, we can either have the sqlite3 client tool read the file form STDIN, as shown above, or we can start sqlite3 and use the .read command to read and execute SQL commands from our create.sql file.

What if something goes wrong?

Sometimes, no matter how hard you try, things don't work out quite right. If you can't safely execute the create.sql script, there are a number of possibilities to check:

Be sure the SQLite client tool starts up properly. You can do this at the Unix command prompt.
Be sure you have free disk space in which to create a new database.
Be sure you have proper permissions (to read the script file and to create the new database) in the directory where you try to execute the script file.
Be sure your script file is a simple ASCII text file (and not an RTF file).

If the output of the !sqlite3 test ".schema" matches the data in the create.sql file, congratulations are in order. You now have a new test database with two new tables ready to hold data.

Now we'll focus on the process of inserting data into a table by using SQLite. To follow along, you will need a SQLite database with the myProducts table available. If you haven't already done so, you should execute the create.sql script file.

Additional References

Several sites exist that allow you to try out SQL commands online.

W3 Schools SQL, a general SQL demo site.
SQLZoo, allows you to specify the Relational Database to target.