Introduction to SQL Data Manipulation Language

Professor Robert J. Brunner

</DIV>


Introduction

In this lesson, we focus on the basic task of manipulating data in relational database management systems by using SQL DML. In this Notebook we will expand on our use of the SQLite database to build and query a fictitious database. We will cover inserting data into tables, creating and executing queries, and finishing with updating and deleting data.


INSERT

One of the most important tasks when you're building a database application is inserting data into the database. It doesn't matter how good the database software is-if you put bad data in a database, nothing else matters. There are several different ways to insert data into a database, but the rest of this lesson focuses on inserting data into a SQLite database by using the SQL INSERT statement.

Before you can insert data into a SQLite database using the SQL INSERT statement, you must know how to properly use this statement. The full syntax for the SQL INSERT statement is

INSERT INTO table-Name
    [ (Simple-column-Name [ , Simple-column-Name]* ) ]
      Expression

which should seem familiar. As discussed previously, the square brackets ([]) enclose optional parameters. The only component whose purpose isn't immediately clear is Expression; but how complex can that simple phrase be? Of course, appearances can be deceiving; the Expression term can expand to one of four different structures:

  • a single-row VALUES list
  • a multiple-row VALUES list
  • a SELECT expression
  • a UNION expression

Of these, the last two are beyond the scope of this lesson. The first two are similar; the only difference is that the first form inserts one row into a table, whereas the latter form inserts multiple rows into a table.

You can use the optional part of the SQL INSERT statement to specify the column order of the values being inserted into the table. By default, data is inserted into a table's columns in the same order that the columns were listed when the table was created. Sometimes you may want to change this order or perhaps only specify values for columns that have NOT NULL constraints. By explicitly listing the columns in your SQL INSERT statement, you gain more control of the operation and can more easily handle these specific use cases.

The syntax for the SQL VALUES expression is fairly simple,

{
    VALUES ( Value {, Value }* )
        [ , ( Value {, Value }* ) ]* |
    VALUES Value [ , Value ]*
 }

This syntax displays the multiple-row format first, followed by the single-row format (remember that the vertical line character, |, means or and that the asterisk character, *, means one or more). The value term stands for a value that you want to insert into a specific column. To insert data into multiple columns, you must enclose the data for a row in parentheses separated by commas.

As shown below, to insert data into a table, you first need to to make sure that the table exists. If you haven't already done so, execute the table creation scripts discussed earlier.

INSERT INTO myProducts 
    VALUES(1, 19.95, '2015-03-31', 'Hooded sweatshirt') ;

INSERT INTO myProducts(itemNumber, price, stockDate, description) 
    VALUES(2, 99.99, '2015-03-29', 'Beach umbrella') ;

INSERT INTO myProducts(itemNumber, price, stockDate) 
    VALUES(3, 0.99, '2015-02-28') ;

This example presents three single-row inserts into the myProducts table. The first SQL INSERT statement doesn't provide a list of columns; it inserts an itemNumber, a price, a stockDate, and a description. Notice that the values inserted into both the stockDate and description columns are enclosed in single quote characters. The description column is a TEXT field, so it expects a string (which you indicate by enclosing the character data within single quotes). The stockDate column is also a TEXT field; as part of our application logic, we could pass in dates in the correct day, month, and year format. (For more guidance on the format of data types during a SQL INSERT operation, read the SQLite documentation).

The second SQL INSERT statement explicitly lists all four columns and inserts new values appropriately. The final SQL INSERT statement lists only three columns and inserts only three values. The description column is left empty, which means it will have a NULL value.

Although single-row SQL INSERT statements can be useful, when you need to insert multiple rows, it's more efficient to do so directly, as shown below:

INSERT INTO myProducts(itemNumber, price, stockDate, description)
VALUES (4, 29.95, '2015-02-10', 'Male bathing suit, blue'),
       (5, 49.95, '2015-02-20', 'Female bathing suit, one piece, aqua'),
       (6, 9.95, '2015-01-15', 'Child sand toy set'),
       (7, 24.95, '2014-12-20', 'White beach towel'),
       (8, 32.95, '2014-12-22', 'Blue-striped beach towel'),
       (9, 12.95, '2015-03-12', 'Flip-flop'),
       (10, 34.95, '2015-01-24', 'Open-toed sandal') ;

In this example, we insert seven rows into the database by explicitly listing all four columns and providing new values for each row. As discussed earlier, multiple-row inserts enclose the values for each new row within parentheses, and these values are separated by commas.

To actually execute these statements, we can place the SQL INSERT statements in a script file and run the script to insert the data. This approach lets you more easily fix errors or reinsert the data if necessary without recreating the requisite SQL INSERT statements. In the following code cells, we first create a SQL INSERT script file, before executing this script by using the sqlite3 client tool. After this, we use the .dump command in the sqlite3 client tool to display the full schema and contents of this SQLite database.



In [5]:
%%writefile insert.sql

-- Single unnamed INSERT

INSERT INTO myProducts 
VALUES(1, 19.95, '2015-03-31', 'Hooded sweatshirt') ;

-- Single named INSERT

INSERT INTO myProducts (itemNumber, price, stockDate, description) 
VALUES(2, 99.99, '2015-03-29', 'Beach umbrella') ;

-- Single named INSERT with missing data

INSERT INTO myProducts (itemNumber, price, stockDate) 
VALUES(3, 0.99, '2015-02-28') ;

-- Multiple named INSERT

INSERT INTO myProducts (itemNumber, price, stockDate, description)
VALUES (4, 29.95, '2015-02-10', 'Male bathing suit, blue'),
       (5, 49.95, '2015-02-20', 'Female bathing suit, one piece, aqua'),
       (6, 9.95, '2015-01-15', 'Child sand toy set'),
       (7, 24.95, '2014-12-20', 'White beach towel'),
       (8, 32.95, '2014-12-22', 'Blue-striped beach towel'),
       (9, 12.95, '2015-03-12', 'Flip-flop'),
       (10, 34.95, '2015-01-24', 'Open-toed sandal') ;
        
-- Insert into myVendors

INSERT INTO myVendors(itemNumber, vendorNumber, vendorName)
VALUES (1, 1, 'Luna Vista Limited'),
       (2, 1, 'Luna Vista Limited'),
       (3, 1, 'Luna Vista Limited'),
       (4, 2, 'Mikal Arroyo Incorporated'),
       (5, 2, 'Mikal Arroyo Incorporated'),
       (6, 1, 'Luna Vista Limited'),
       (7, 1, 'Luna Vista Limited'),
       (8, 1, 'Luna Vista Limited'),
       (9, 3, 'Quiet Beach Industries'),
       (10, 3, 'Quiet Beach Industries') ;


Writing insert.sql

In [6]:
!head -10 insert.sql


-- Single unnamed INSERT

INSERT INTO myProducts 
VALUES(1, 19.95, '2015-03-31', 'Hooded sweatshirt') ;

-- Single named INSERT

INSERT INTO myProducts (itemNumber, price, stockDate, description) 
VALUES(2, 99.99, '2015-03-29', 'Beach umbrella') ;

In [7]:
!sqlite3 test < insert.sql

In [8]:
!sqlite3 test ".dump"


PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE myVendors (
    itemNumber INT NOT NULL,
    vendornumber INT NOT NULL,
    vendorName TEXT
);
INSERT INTO "myVendors" VALUES(1,1,'Luna Vista Limited');
INSERT INTO "myVendors" VALUES(2,1,'Luna Vista Limited');
INSERT INTO "myVendors" VALUES(3,1,'Luna Vista Limited');
INSERT INTO "myVendors" VALUES(4,2,'Mikal Arroyo Incorporated');
INSERT INTO "myVendors" VALUES(5,2,'Mikal Arroyo Incorporated');
INSERT INTO "myVendors" VALUES(6,1,'Luna Vista Limited');
INSERT INTO "myVendors" VALUES(7,1,'Luna Vista Limited');
INSERT INTO "myVendors" VALUES(8,1,'Luna Vista Limited');
INSERT INTO "myVendors" VALUES(9,3,'Quiet Beach Industries');
INSERT INTO "myVendors" VALUES(10,3,'Quiet Beach Industries');
CREATE TABLE myProducts (
    itemNumber INT NOT NULL,
    price REAL,
    stockDate TEXT,
    description TEXT
);
INSERT INTO "myProducts" VALUES(1,19.95,'2015-03-31','Hooded sweatshirt');
INSERT INTO "myProducts" VALUES(2,99.99,'2015-03-29','Beach umbrella');
INSERT INTO "myProducts" VALUES(3,0.99,'2015-02-28',NULL);
INSERT INTO "myProducts" VALUES(4,29.95,'2015-02-10','Male bathing suit, blue');
INSERT INTO "myProducts" VALUES(5,49.95,'2015-02-20','Female bathing suit, one piece, aqua');
INSERT INTO "myProducts" VALUES(6,9.95,'2015-01-15','Child sand toy set');
INSERT INTO "myProducts" VALUES(7,24.95,'2014-12-20','White beach towel');
INSERT INTO "myProducts" VALUES(8,32.95,'2014-12-22','Blue-striped beach towel');
INSERT INTO "myProducts" VALUES(9,12.95,'2015-03-12','Flip-flop');
INSERT INTO "myProducts" VALUES(10,34.95,'2015-01-24','Open-toed sandal');
COMMIT;

Transactions

If you look carefully at the output of the .dump command, you see that near the top of the output is a BEGIN TRANSACTION; statement and at the end of the output is a COMMIT; statement. These two statements are explicit instructions, inserted by SQLite, to start a transaction, which is a logical unit of work, and save all operations in the transaction to the database. If a set of operations is not completed successfully, the transaction model requires that the commit does not occur, and instead a rollback is issued to return the database to the state that existed prior to the transaction commencing.


SELECT

In the SQL programming language, the task of performing a query falls to the SELECT statement. To provide all the query functionality required by database applications, the SELECT statement's capabilities are extensive. Before looking at example SELECT statements, lets first look at the formal syntax of SELECT, which, as shown below is actually simple. The basic format is SELECT ... FROM ... WHERE;, you select the columns of interest from rows in a table or tables where certain conditions are satisfied. Of course, things can become considerably more complex. This article covers the basic features of SELECT and defers the more advanced issues to subsequent articles.

SELECT [ DISTINCT | ALL ] SelectItem [ , SelectItem ]*
FROM clause
[ WHERE clause ]
[ GROUP BY clause ]
[ HAVING clause ]

From this you can see that a basic SELECT statement requires only a SELECT and a FROM; you must specify what data to select and indicate the location of the data of interest. Everything else is optional (as indicated by the square brackets). The DISTINCT and ALL keywords are optional qualifiers to indicate that either rows with unique values or all rows should be selected, respectively. By default, ALL is implicitly assumed, and you can use only one DISTINCT qualifier per SELECT statement.

A SELECT statement can have multiple columns listed following the SELECT keyword. Multiple elements (or, more generally, column names) are separated by commas. For example, SELECT a, b, c selects the three columns a, b, and c. To select all columns from a table, you can use the asterisk character (*) as a shorthand for all columns. An important point to remember is that the result of any SELECT statement is a transient SQLite table, and you can use it in many of the same ways you use a more permanent table.

The FROM component of a SELECT statement indicates from which table (or multiple tables) the data will be extracted. For now, we will focus on selecting data from a single table; latter we will cover table joins and selecting data from multiple tables. In this case, the fully qualified name of the table to query must follow the FROM keyword.

The rest of the SELECT statement is optional. Before you build your first query, however, lets review the order in which the SELECT statement components are evaluated:

  1. FROM clause
  2. WHERE clause
  3. GROUP BY clause
  4. HAVING clause
  5. SELECT clause

When you break down the process SQLite follows when processing a query, this order is intuitive. First you must locate the data to be analyzed, after which you filter out the rows of interest. The next steps are to group related rows and, finally, to select the actual columns of interest.

SQLite

To demonstrate a SELECT statement, we can extract all the columns from the myProducts table by using the sqlite3 tool and passing the SQL statement in as a command line argument.



In [9]:
!sqlite3 test "SELECT * FROM myProducts ;"


1|19.95|2015-03-31|Hooded sweatshirt
2|99.99|2015-03-29|Beach umbrella
3|0.99|2015-02-28|
4|29.95|2015-02-10|Male bathing suit, blue
5|49.95|2015-02-20|Female bathing suit, one piece, aqua
6|9.95|2015-01-15|Child sand toy set
7|24.95|2014-12-20|White beach towel
8|32.95|2014-12-22|Blue-striped beach towel
9|12.95|2015-03-12|Flip-flop
10|34.95|2015-01-24|Open-toed sandal

In the previous code cell, we used the asterisk character to select all columns from the myProducts table without listing them explicitly. This can be a useful shortcut, especially when you're developing database applications, but it isn't a recommended practice. By using the shortcut, you don't explicitly specify the database column names or their order. In a database application, if you always assume that the column names and their order in a table are fixed, you may end up with subtle bugs if someone else modifies the database tables on which your application depends. You should always explicitly name the database columns in your SELECT statements and list the order you require.

As a result, lets look at explicitly listing the columns to extract. This is a recommended practice that also allows us to control the order in which the columns are listed in the query output.



In [10]:
!sqlite3 test "SELECT price, itemNumber, description FROM myProducts ;"


19.95|1|Hooded sweatshirt
99.99|2|Beach umbrella
0.99|3|
29.95|4|Male bathing suit, blue
49.95|5|Female bathing suit, one piece, aqua
9.95|6|Child sand toy set
24.95|7|White beach towel
32.95|8|Blue-striped beach towel
12.95|9|Flip-flop
34.95|10|Open-toed sandal

WHERE clause

Up to this point, you have only selected columns for all rows in a single table. This can be expensive in terms of query performance, especially if you only want a subset of the rows from a large table. A more efficient technique is to filter database rows by placing conditions in the WHERE clause, which is evaluated immediately after the tables are specified within the FROM clause. The rest of this section discusses some of the basic features that are enabled by using the WHERE clause, including the ability to select rows that satisfy Boolean conditions as well as join multiple tables to perform more complex queries.

The simplest and most common use of the WHERE clause is to filter the rows from a table before selecting any columns, as shown in the next two code cells.



In [11]:
!sqlite3 test "SELECT p.itemNumber, p.price FROM myProducts AS p WHERE p.price > 30.00 ;"


2|99.99
5|49.95
8|32.95
10|34.95

In [12]:
!sqlite3 test "SELECT * FROM myProducts WHERE price > 30.00 AND stockDate < '2015-01-01' ;"


8|32.95|2014-12-22|Blue-striped beach towel

The first query shown in this example selects the itemNumber and price columns from the myProducts table for all rows where the price column has a value greater than $30.00. The second query extends this same query to select only those columns whose price column has a value more than $30.00 and whose stockDate column has a value less than January 1, 2006. These two query restrictions are combined in this query by using the Boolean AND operator.

You can perform a number of different Boolean operations within a WHERE clause. The following table lists and provides examples of the basic SQL Boolean operations that you can use in a query.

Operator Example Description
= p.price = 29.95 Test if any built-in type is equal to a specified value.
< p.price < 29.95 Test if any built-in type is less than a specified value.
> p.price > 29.95 Test if any built-in type is greater than a specified value.
<= p.price <= 29.95 Test if any built-in type is less than or equal to a specified value.
>= p.price >= 29.95 Test if any built-in type is greater than or equal to a specified value.
<> p.price <> 29.95 Test if any built-in type is not equal to a specified value.
IS NULL p.description IS NULL Test if an expression or value is null.
IS NOT NULL p.description IS NOT NULL Test if an expression or value is not null.
AND (p.price > 29.92) AND (p.itemNumber > 5) Test if two expressions are both true or evaluate as nonzero.
OR (p.price > 29.92) OR (p.itemNumber > 5) Test if one or both of two expressions are true or evaluate as nonzero.
NOT NOT v.vendorNumber = 1 Test if an expression is false or evaluates as zero.
BETWEEN p.price BETWEEN 29.95 AND 39.95 Test if a value lies inclusively between two other values (example is equivalent to 29.95 <= p.price <= 39.95).
LIKE v.vendorName LIKE 'Lun%' Test if a character expression matches a pattern, with the percent character (%) matching zero or more arbitrary characters and the underscore character (_) matching exactly one arbitrary character.

The first query above also introduces the AS clause, which you can use to create a table synonym. In these examples, you define a synonym p for the fully qualified table name myProducts. By defining a synonym, you can refer to table quantities by using a shorter notation. This may not seem important when only one table is being referenced in a query, but the next section shows how to join multiple tables together within a query; in that case, providing table synonyms is very useful. You can also use an AS clause to name the selected columns in a query. Doing so lets you control how the results are displayed, which is demonstrated in the next section.


Joins

The second major function performed by a WHERE clause is to join multiple tables together into a single table that can be queried more easily. Joining multiple tables is a powerful technique, and it can be complex when you're dealing with several large tables. Tables can be joined either explicitly, by using the JOIN keyword, or implicitly, by using a WHERE clause.

You join two tables by using an inner join or an outer join. An inner join is essentially the intersection of two tables, where the tables are matched by comparing the values of a key column, such as itemNumber. The resulting table is composed of only rows that were matched between the two tables. An outer join is more like a union of two tables, where the tables are matched by comparing the values of a key column, but non-matching rows are still included in the resulting table and filled with NULL values as appropriate. Writing SQL queries that use these more advanced table joins will be addressed in future articles.

In the current simple scheme, the process is simple; In the next code cell, we perform an implicit inner join of the myProducts table and the myVendors table.



In [13]:
%%writefile select.sql

SELECT p.price, p.description AS 'Item', v.vendorName AS 'Vendor' 
FROM myProducts AS p, myVendors AS v 
  WHERE p.itemNumber = v.itemNumber ;


Writing select.sql

In [14]:
# Execute SQL Script

!sqlite3 test < select.sql


19.95|Hooded sweatshirt|Luna Vista Limited
99.99|Beach umbrella|Luna Vista Limited
0.99||Luna Vista Limited
29.95|Male bathing suit, blue|Mikal Arroyo Incorporated
49.95|Female bathing suit, one piece, aqua|Mikal Arroyo Incorporated
9.95|Child sand toy set|Luna Vista Limited
24.95|White beach towel|Luna Vista Limited
32.95|Blue-striped beach towel|Luna Vista Limited
12.95|Flip-flop|Quiet Beach Industries
34.95|Open-toed sandal|Quiet Beach Industries

This query may seem complex, due primarily to its length. But by breaking it down line by line you can easily follow what's happening. First, you select two columns from the myProducts table and one column from the myVendors table and use an AS clause to name these columns for subsequent usage (in this case they are displayed by the sqlite3 client tool. Because the query joins these two tables (by using an implicit inner join), you can select columns from both tables. In the FROM clause, you list both tables and provide aliases for them to simplify the full SQL statement. In the WHERE clause, you provide the logic for joining the two tables, by explicitly instructing the SQLite database to only select rows from the two tables that have matching values in their respective itemNumber columns. In processing this query, SQLite first pulls all rows out of the first (left) table in the query (myProducts) and finds the row with a matching value in the itemNumber column in the second (right) table in the query (myVendors).


DISTINCT

By default, when data is selected by using an SQL query, all rows that satisfy the WHERE clause are extracted from the database. In some cases, this may result in rows that have identical column values being returned. If you need to restrict your query so that only unique row values are returned, you can use the DISTINCT qualifier, as shown in the following two code cells.



In [14]:
!sqlite3 test "SELECT DISTINCT vendorNumber AS 'Vendor #' FROM myVendors ;"


1
2
3

In [15]:
!sqlite3 test "SELECT DISTINCT vendorNumber AS 'Vendor #', itemNumber as 'Item #' FROM myVendors WHERE itemNumber > 5 ;"


1|6
1|7
1|8
3|9
3|10

If you want to use the DISTINCT qualifier, it must be the first item listed in the SELECT clause, as shown in Listing 1, and you can have only one DISTINCT qualifier per SELECT clause. If the selected rows contain a column with NULL values, multiple NULL values are considered duplicates when identifying unique rows.

The first query in this listing uses the DISTINCT qualifier to restrict the output of the query to only distinct, or unique, values of the vendorNumber column, which is the only column listed in the SELECT clause. In the example schema that these articles use, there are only three vendors (with vendorNumber being restricted to 1, 2, or 3). Thus, when the DISTINCT qualifier is used in the query, only three rows are selected.

The DISTINCT qualifier, however, applies to the entire list of selected columns, so if multiple columns are listed following a DISTINCT keyword, only unique combinations of all the columns are selected. This is demonstrated in the second example, where both vendorNumber and itemNumber are listed in the SELECT clause. Because every item has a unique itemNumber, every combination of these two columns is unique, and all rows that satisfy the WHERE clause are selected; in other words, the DISTINCT qualifier has no effect on the results.

One remaining point that you may have noticed from the two previous examples is that the selected rows were not in the same order. If the order of selected rows is important, you can easily control it by using an ORDER BY clause in your query.

ORDER BY

In general, you can't assume that SQLite, or any database, will return rows from a query in a specific order. If the order is important, you can use the ORDER BY clause to have SQLite order the data that are returned by your query in a particular manner. Generally, you do so by specifying a column that should be used to provide the ordinal values for comparison as shown in the next two code cells.



In [16]:
%%writefile orderby.sql

SELECT v.vendorNumber AS "Vendor #", vendorName as "Vendor",
    p.price AS "Price", p.itemNumber AS "Item #"
    FROM myProducts AS p, myVendors AS v
    WHERE p.itemNumber = v.itemNumber AND p.price > 20.0
    ORDER by v.vendorNumber ;
     
SELECT v.vendorNumber AS "Vendor #", vendorName as "Vendor",
    p.price AS "Price", p.itemNumber AS "Item #"
    FROM myProducts AS p, myVendors AS v
    WHERE p.itemNumber = v.itemNumber AND p.price > 20.0
    ORDER BY v.vendorNumber ASC, p.price DESC ;


Overwriting orderby.sql

In [17]:
!sqlite3 test < orderby.sql


1|Luna Vista Limited|99.99|2
1|Luna Vista Limited|24.95|7
1|Luna Vista Limited|32.95|8
2|Mikal Arroyo Incorporated|29.95|4
2|Mikal Arroyo Incorporated|49.95|5
3|Quiet Beach Industries|34.95|10
1|Luna Vista Limited|99.99|2
1|Luna Vista Limited|32.95|8
1|Luna Vista Limited|24.95|7
2|Mikal Arroyo Incorporated|49.95|5
2|Mikal Arroyo Incorporated|29.95|4
3|Quiet Beach Industries|34.95|10

In the previous example, the first query uses the ORDER BY clause to list a subset of all the rows in the table that results from joining the myVendors table to the myProducts table. The rows are ordered by vendorNumber (the subset is constructed by applying the WHERE clause). An ORDER BY clause can take either a column name, as in this example, or a column number, which is taken from the order in which the columns are listed after the SELECT keyword.

You can also specify multiple columns to use during the sorting process and even specify ASC for ascending order, which is the default, or DESC for descending order. For example, if you used the ORDER BY 1 DESC, 4 DESC clause in the first query, the query would return the same rows, but they would be ordered by using the vendorNumber column as the primary sort column in descending order followed by the itemNumber column as the secondary sort column in descending order.

Although using column numbers may seem like a handy shortcut, it generally isn't a good idea. To see why, consider what happens if you modify the columns listed in a SELECT clause or just modify their order. If you forget to modify the numbers used in the ORDER BY clause, the query will break-or worse, return bad data. In general, it's a best practice to always be explicit and specify the column names directly, even if doing so means more typing.

Query Math

Selecting columns from a database provides a number of useful benefits, but being able to compute and select quantities based on data in a table opens up even more possibilities. SQLite,as does any SQL database, provides several mathematical operators, the most common of which are listed in the following table, that you can use in either a SELECT clause or a WHERE clause.

Operator Example Description
unary + +1.0 A noop, or no operation, as +4 = 4
unary - -p.price Changes the sign of the value to which it's applied
+ p.itemNumber + 10 Adds the second value to the first value
- p.itemNumber - 10 Subtracts the second value from the first value
* p.price * 1.0825 Multiplies the first value by the second value
/ p.price / 100.0 Divides the first value by the second value

Using these operators is straightforward because they generally behave exactly as you expect. For example, if the sales tax is 8.25%, you can return the price for an item both before and after sales tax has been applied by using SELECT price, price * 1.0825 FROM myProducts ;. As another example, if you have a column called numberItems that tracks the number of items purchased and another column called price that contains the price at which they're purchased, you can return the total amount paid for those items at a given price by using numberItems * price. Several of the queries shown in this IPython Notebook provide additional examples of how to use these operators.

The only concern when using these operators arises from complications that result from using different data types, such as integer or floating-point, in a mathematical operation. If both operands are the same data type, the result type will be the same. If you're performing division, this can result in truncation (for example, if you're using two integer values), which might cause unexpected problems. On the other hand, if the two operands are different data types, the result type is promoted to the more complex type.

SQL Functions

SQL is a powerful and expressive language that can be used to perform a wide range of actions. Part of the SQL language's power comes from its ability to directly interact with a variety of data types. Some of the greatest power of a relational database arises from the inherent functions that it provides and from the extensibility enabled by allowing users to create new functions. SQLite provides functions (see the following SQLite tutorial for more information) in three different categories:

  1. Core functions
  2. Aggregate functions
  3. Date and Time functions

Of these built-in functions, we will most likely use the aggregate functions, which operate on multiple rows. Aggregate functions-also known as set functions in SQL-92 or, more informally, as column functions-return a computed quantity from a column over a number of rows. SQLite supports the following five aggregate functions (a sixth function, group_concat, is not listed).

Function Example Description
AVG AVG(p.price) Returns the average value of a column from all rows that satisfy an expression. Can only be used with built-in numeric data types. The precision of the returned value is defined by the precision of the column being evaluated.
COUNT COUNT(p.price) Returns the number of rows that satisfy an expression, such as a query. Can be used with any data type.
MAX MAX(p.price) Returns the maximum value of a column from all rows that satisfy an expression. Can only be used with built-in data types.
MIN MIN(p.price) Returns the minimum value of a column from all rows that satisfy an expression. Can only be used with built-in data types.
SUM SUM(p.price) Returns the sum of a column over all rows that satisfy an expression. Can only be used with built-in numeric data types.

These aggregate functions can often be used to quickly find useful information that might otherwise be difficult to identify, as shown in Listing 4.

SELECT COUNT(p.itemNumber) AS Number, 
        AVG(p.price) AS Average,
        MIN(p.stockDate) AS "First Date", MAX(p.stockDate) AS "Last Date"
    FROM myProducts AS p ;

Listing 4 uses four of the five aggregate functions to get summary information about the data in the myProducts table. The COUNT function indicates that the table includes ten rows (because the query didn't use a WHERE clause to restrict the rows selected from the table). The AVG function calculates the average price of all items in the myProducts table. Finally, the MIN and MAX functions extract the minimum and maximum dates from the myProducts table.


DELETE

To delete data in a SQLite database, you use the SQL DELETE statement, which can delete either all rows in a table or a specific subset of rows. The formal syntax for the SQL DELETE statement is remarkably simple:

DELETE FROM tableName
    [WHERE clause]

The DELETE statement deletes all rows from the specified table that satisfy an optional WHERE clause. If no WHERE clause is included, all rows in the table are deleted. To demonstrate this use of the DELETE statement, we can create a temporary table, insert several rows, and delete them all.



In [18]:
%%writefile delete.sql

-- First create the temporary table
CREATE TABLE temp (aValue INT) ;

-- Insert fake data
INSERT INTO temp VALUES(0), (1), (2), (3) ;

-- Count rows in the table
SELECT COUNT(*) AS COUNT FROM temp ; 

-- Delete all rows
DELETE FROM temp ;

-- Count all rows in the table
SELECT COUNT(*) AS COUNT FROM temp ; 

-- Now drop the temporary table

DROP TABLE temp ;


Writing delete.sql

In [19]:
!sqlite3 test < delete.sql


4
0

The previous example created a single-column temporary table to hold a single integer value. Next we inserted four rows into the database and issued a SELECT statement to verify that the new table contained four rows. By using an unconstrained DELETE statement, we delete all four rows from the temporary table, which is verified by the second SELECT statement, which indicates that the temporary table contains zero rows. Finally, the DROP TABLE statement deletes the empty table from the schema.

In general, however, you don't want to delete all rows from a table; instead, you'll selectively delete rows. To do this, you create an appropriate WHERE clause that identifies all rows of interest. The syntax for the WHERE clause that you can use with a DELETE statement is identical to that discussed previously when we presented the full SQL SELECT statement syntax. The basic building blocks for constructing a Boolean expression within a WHERE clause were presented in an earlier table. The following example demonstrates using a WHERE clause in a DELETE statement, where we delete all rows that satisfy at least one of two conditions.



In [18]:
%%writefile delete2.sql

-- First display data
SELECT itemNumber, description FROM myProducts ;

-- Selectively delete rows
DELETE FROM myProducts 
    WHERE description LIKE '%towel%' OR itemNumber <= 3 ;

-- Confirm the proper deletion
SELECT itemNumber, description FROM myProducts ;


Writing delete2.sql

In [19]:
!sqlite3 test < delete2.sql


1|Hooded sweatshirt
2|Beach umbrella
3|
4|Male bathing suit, blue
5|Female bathing suit, one piece, aqua
6|Child sand toy set
7|White beach towel
8|Blue-striped beach towel
9|Flip-flop
10|Open-toed sandal
4|Male bathing suit, blue
5|Female bathing suit, one piece, aqua
6|Child sand toy set
9|Flip-flop
10|Open-toed sandal

In this example, the DELETE statement includes a WHERE clause that identifies five rows. The WHERE clause contains two expressions that are joined by the OR operator, which means that if either expression evaluates as TRUE for a specific row, that row will be deleted.

The first expression finds all rows that contain the word "towel" in the product description. If you recall, there are two towels in the myProducts table, with itemNumber column values of 7 and 8. The other expression selects all rows with an itemNumber column value less than or equal to 3. The contents of the myProducts table are finally displayed with a simple SELECT statement, demonstrating that only five of the original ten rows remain in the table.

Although this example doesn't explicitly demonstrate their use, you can also include the SQL functions to gain more control over the selection of rows for deletion. These same functions and other operators that can be used in the WHERE clause of the DELETE statement also can be used with the UPDATE statement to selectively modify the values of rows in a table, as described in the next section.

UPDATE

The last SQL task for dealing with data that you need to address is updating specific column values for selected rows in a table. At some level, the SQL UPDATE statement is the union of the SQL INSERT and DELETE statements, because you must select rows to modify as well as specify how to modify them. Formally, the UPDATE statement syntax is straightforward, because you must specify the new column values for the set of rows to be updated:

UPDATE tableName
    SET columnName = Value
    [ , columnName = Value} ]*
    [WHERE clause]

As shown in this SQL syntax, an SQL UPDATE statement must have, at a minimum, one SET component to update one column, along with one or more SET components and a WHERE clause, both of which are optional. If the WHERE clause isn't included, the UPDATE statement modifies the indicated columns for all rows in the table.

Issuing an UPDATE statement is fairly easy, as shown in the following code example, where we modify two columns of a single row.



In [22]:
%%writefile update.sql

-- Extract the test row
SELECT itemNumber, price, stockDate FROM myProducts WHERE itemNumber = 6 ;

-- Update the row
UPDATE myProducts SET price = price * 1.25, stockDate = date('now')  WHERE itemNumber = 6 ;

-- Show the new result
SELECT itemNumber, price, stockDate FROM myProducts WHERE itemNumber = 6 ;


Writing update.sql

In [23]:
!sqlite3 test < update.sql


6|9.95|2015-01-15
6|12.4375|2015-02-24

This example wraps a single UPDATE statement with SELECT statements to demonstrate the change to the target row. The SELECT statements both select three columns from the myProducts table for a single row (the row with the value 6 in the itemNumber column). The UPDATE statement modifies both the price and the stockDate columns for this specific row. The value in the price column is increased by 25% (for example, perhaps due to the item's popularity), and the stockDate column is modified to hold the current date, which can be obtained easily with SQLite by using the built-in date function with an argument of now in an SQL query.

The previous example demonstrated how to modify multiple column values for a specific row in a single table. However, sometimes the logic to select rows to update is more complex. For example, suppose you need to modify the price of all objects in the myProducts table that you obtain from Quiet Beach Industries, which has a value of 3 in the vendorNumber column in the myVendors table. To do this, you need to use an embedded query:



In [24]:
%%writefile update2.sql

-- Update the table
UPDATE myProducts
    SET price = price * 1.10, description = 'NEW: ' || description
    WHERE itemNumber IN 
        ( SELECT v.itemNumber 
          FROM myProducts as p, myVendors as v 
          WHERE p.itemNumber = v.itemNumber AND v.vendorNumber = 3 ) ;

-- Show new results
SELECT * FROM myProducts ;


Writing update2.sql

In [25]:
!sqlite3 test < update2.sql


4|29.95|2015-02-10|Male bathing suit, blue
5|49.95|2015-02-20|Female bathing suit, one piece, aqua
6|12.4375|2015-02-24|Child sand toy set
9|14.245|2015-03-12|NEW: Flip-flop
10|38.445|2015-01-24|NEW: Open-toed sandal

In this example, the UPDATE statement modifies the price and description columns for all products that are obtained from the vendor with a value of 3 in the vendorNumber column in the myVendors table. Because you can't do a simple join within an UPDATE statement, you must include a subquery in the WHERE clause to extract the itemNumber rows that correspond to products from Quiet Beach Industries. The WHERE clause in the UPDATE statement uses the IN operator to select those rows that have an itemNumber column in the set of values selected by the embedded query.

Two types of queries can be used in the WHERE clause of an UPDATE statement: a scalar subquery and a table subquery. A scalar subquery is an embedded query that returns a single row that contains a single column-essentially, a single value, which is known as a scalar. You can use a scalar subquery to select a specific value that will be used in the expression of the WHERE clause. For example, itemNumber = (Scalar Subquery) updates any rows that have a value in the itemNumber column that matches the result of the scalar subquery.

A table subquery, on the other hand, can return multiple rows that generally only have one column. In certain instances, a table subquery can contain multiple columns. To use a table subquery, you need to use an SQL operator to combine the embedded query with a Boolean expression. For example, this was shown in the previous code listing, where the IN operator selected all rows from the myProducts table that were produced by Quiet Beach Industries.

ALTER TABLE

The previous section discussed modifying the data that already exists in a table. The other possibility is modifying the structure, or schema, of a database table. This can take the form of adding a column, changing the data type for a column, adding a constraint, or even deleting a column. This process isn't easy, which is one reason to be careful when you initially design your schema. If you do need to modify the structure of a table, you should use a temporary table.



In [26]:
%%writefile rename.sql

-- Create New table with extra column

CREATE TABLE newProducts (
    itemNumber INT NOT NULL,
    price REAL,
    stockDate TEXT,
    count INT NOT NULL DEFAULT 0,
    description TEXT
) ;

-- New copy old table into new table

INSERT INTO newProducts(itemNumber, price, stockDate, description) 
    SELECT itemNumber, price, stockDate, description FROM myProducts ;

-- Drop old table
DROP TABLE myProducts ;

-- Now Rename new table to old table name
ALTER TABLE newProducts RENAME TO myProducts ;

-- Show the results
SELECT * FROM myProducts ;


Writing rename.sql

In [27]:
# Execute SQL Script

!sqlite3 test < rename.sql


4|29.95|2015-02-10|0|Male bathing suit, blue
5|49.95|2015-02-20|0|Female bathing suit, one piece, aqua
6|12.4375|2015-02-24|0|Child sand toy set
9|14.245|2015-03-12|0|NEW: Flip-flop
10|38.445|2015-01-24|0|NEW: Open-toed sandal

As this example shows, to modify a table-in this case, to add a new count column to the myProducts table-you first create a table that has the exact schema you require. This example requires that it always have a valid value by including the column constraint NOT NULL and assigns a default value of 0 to the count column by using the column constraint DEFAULT 0. Notice how you can combine multiple column constraints by listing them sequentially.

The next step is to copy the existing data from the original table to the new table. You can do so by using an SQL INSERT statement that uses a subquery to get the values to insert. This is a powerful technique that lets you easily copy all or part of an existing table into a second table.

After you've created the new table and copied the appropriate data, you drop the old table by using an SQL DROP TABLE statement and rename the new table to the original name by using an SQL RENAME TABLE statement. The rename operation is straightforward: Rename the oldTableName to the newTableName, but don't supply a schema name for the new table name because the RENAME operation can't move a table between different database schemas. This example concludes by issuing a SELECT statement to display the schema and contents of the new myProducts table. As you can see, the new table has five columns, and the count column is always zero. At this point, a real application would modify the count column appropriately by issuing the necessary SQL UPDATE statements.


CREATE INDEX

Relational databases support the concept of an index to speed up queries. An index is often used when a particular column (or columns) is frequently involved in either a WHERE clause or a table join. The syntax for creating an index is rather simple:

CREATE [UNIQUE] INDEX idx_name ON table_name(column [column]*)
[WHERE]

In this format, we create an index on one or more columns in a given table. If UNIQUE is present in the index creation, only non-duplicate entries are allowed in the index. The columns can also be followed by either ASC or DESC to indicate the column sort order for the index. Finally, if a WHERE clause is included the index is only a partial index since it does not cover the entire table.

As an example, we can create an index on the myProducts table by using the itemNumber column:

CREATE INDEX itn ON myProducts(itemNumber) ;


Advanced Features

Of course we have only scratched the surface of SQL, despite the length of this IPython Notebook. We have not discussed views, which effectively turns a SQL query into a new, read-only table, or triggers, which are database operations that are automatically performed when a specific event occurs.

Another useful feature that we will use in the third lesson is the LIMIT clause. We can use this to restrict the number of rows returned by a particular query. For example, SELECT * FROM myProducts LIMIT 5 ; will only return five rows.


Additional References

Several sites exist that allow you to try out SQL commands online.

  1. W3 Schools SQL, a general SQL demo site.
  2. SQLZoo, allows you to specify the Relational Database to target.

Return to the Week 10 Index.