In this lesson, we focus on the basic task of manipulating data in relational database management systems by using SQL DML. In this Notebook we will expand on our use of the SQLite database to build and query a fictitious database. We will cover inserting data into tables, creating and executing queries, and finishing with updating and deleting data.
One of the most important tasks when you're building a database application is inserting data into the database. It doesn't matter how good the database software is-if you put bad data in a database, nothing else matters. There are several different ways to insert data into a database, but the rest of this lesson focuses on inserting data into a SQLite database by using the SQL INSERT statement.
Before you can insert data into a SQLite database using the SQL INSERT statement, you must know how to properly use this statement. The full syntax for the SQL INSERT statement is
INSERT INTO table-Name
[ (Simple-column-Name [ , Simple-column-Name]* ) ]
Expression
which should seem familiar. As discussed previously, the square brackets ([]) enclose optional parameters. The only component whose purpose isn't immediately clear is Expression; but how complex can that simple phrase be? Of course, appearances can be deceiving; the Expression term can expand to one of four different structures:
Of these, the last two are beyond the scope of this lesson. The first two are similar; the only difference is that the first form inserts one row into a table, whereas the latter form inserts multiple rows into a table.
You can use the optional part of the SQL INSERT statement to specify the column order of the values being inserted into the table. By default, data is inserted into a table's columns in the same order that the columns were listed when the table was created. Sometimes you may want to change this order or perhaps only specify values for columns that have NOT NULL constraints. By explicitly listing the columns in your SQL INSERT statement, you gain more control of the operation and can more easily handle these specific use cases.
The syntax for the SQL VALUES expression is fairly simple,
{
VALUES ( Value {, Value }* )
[ , ( Value {, Value }* ) ]* |
VALUES Value [ , Value ]*
}
This syntax displays the multiple-row format first, followed by the single-row format (remember that the vertical line character, |, means or and that the asterisk character, *, means one or more). The value term stands for a value that you want to insert into a specific column. To insert data into multiple columns, you must enclose the data for a row in parentheses separated by commas.
As shown below, to insert data into a table, you first need to to make sure that the table exists. If you haven't already done so, execute the table creation scripts discussed earlier.
INSERT INTO myProducts
VALUES(1, 19.95, '2015-03-31', 'Hooded sweatshirt') ;
INSERT INTO myProducts(itemNumber, price, stockDate, description)
VALUES(2, 99.99, '2015-03-29', 'Beach umbrella') ;
INSERT INTO myProducts(itemNumber, price, stockDate)
VALUES(3, 0.99, '2015-02-28') ;
This example presents three single-row inserts into the myProducts table. The first SQL INSERT statement doesn't provide a list of columns; it inserts an itemNumber, a price, a stockDate, and a description. Notice that the values inserted into both the stockDate and description columns are enclosed in single quote characters. The description column is a TEXT field, so it expects a string (which you indicate by enclosing the character data within single quotes). The stockDate column is also a TEXT field; as part of our application logic, we could pass in dates in the correct day, month, and year format. (For more guidance on the format of data types during a SQL INSERT operation, read the SQLite documentation).
The second SQL INSERT statement explicitly lists all four columns and inserts new values appropriately. The final SQL INSERT statement lists only three columns and inserts only three values. The description column is left empty, which means it will have a NULL value.
Although single-row SQL INSERT statements can be useful, when you need to insert multiple rows, it's more efficient to do so directly, as shown below:
INSERT INTO myProducts(itemNumber, price, stockDate, description)
VALUES (4, 29.95, '2015-02-10', 'Male bathing suit, blue'),
(5, 49.95, '2015-02-20', 'Female bathing suit, one piece, aqua'),
(6, 9.95, '2015-01-15', 'Child sand toy set'),
(7, 24.95, '2014-12-20', 'White beach towel'),
(8, 32.95, '2014-12-22', 'Blue-striped beach towel'),
(9, 12.95, '2015-03-12', 'Flip-flop'),
(10, 34.95, '2015-01-24', 'Open-toed sandal') ;
In this example, we insert seven rows into the database by explicitly listing all four columns and providing new values for each row. As discussed earlier, multiple-row inserts enclose the values for each new row within parentheses, and these values are separated by commas.
To actually execute these statements, we can place the SQL INSERT
statements in a script file and run the script to insert the data. This
approach lets you more easily fix errors or reinsert the data if
necessary without recreating the requisite SQL INSERT statements. In the
following code cells, we first create a SQL INSERT script file, before
executing this script by using the sqlite3
client tool. After this, we
use the .dump
command in the sqlite3
client tool to display the full
schema and contents of this SQLite database.
In [5]:
%%writefile insert.sql
-- Single unnamed INSERT
INSERT INTO myProducts
VALUES(1, 19.95, '2015-03-31', 'Hooded sweatshirt') ;
-- Single named INSERT
INSERT INTO myProducts (itemNumber, price, stockDate, description)
VALUES(2, 99.99, '2015-03-29', 'Beach umbrella') ;
-- Single named INSERT with missing data
INSERT INTO myProducts (itemNumber, price, stockDate)
VALUES(3, 0.99, '2015-02-28') ;
-- Multiple named INSERT
INSERT INTO myProducts (itemNumber, price, stockDate, description)
VALUES (4, 29.95, '2015-02-10', 'Male bathing suit, blue'),
(5, 49.95, '2015-02-20', 'Female bathing suit, one piece, aqua'),
(6, 9.95, '2015-01-15', 'Child sand toy set'),
(7, 24.95, '2014-12-20', 'White beach towel'),
(8, 32.95, '2014-12-22', 'Blue-striped beach towel'),
(9, 12.95, '2015-03-12', 'Flip-flop'),
(10, 34.95, '2015-01-24', 'Open-toed sandal') ;
-- Insert into myVendors
INSERT INTO myVendors(itemNumber, vendorNumber, vendorName)
VALUES (1, 1, 'Luna Vista Limited'),
(2, 1, 'Luna Vista Limited'),
(3, 1, 'Luna Vista Limited'),
(4, 2, 'Mikal Arroyo Incorporated'),
(5, 2, 'Mikal Arroyo Incorporated'),
(6, 1, 'Luna Vista Limited'),
(7, 1, 'Luna Vista Limited'),
(8, 1, 'Luna Vista Limited'),
(9, 3, 'Quiet Beach Industries'),
(10, 3, 'Quiet Beach Industries') ;
In [6]:
!head -10 insert.sql
In [7]:
!sqlite3 test < insert.sql
In [8]:
!sqlite3 test ".dump"
If you look carefully at the output of the .dump
command, you see that
near the top of the output is a BEGIN TRANSACTION;
statement and at
the end of the output is a COMMIT;
statement. These two statements are
explicit instructions, inserted by SQLite, to start a transaction, which
is a logical unit of work, and save all operations in the transaction to
the database. If a set of operations is not completed successfully, the
transaction model requires that the commit does not occur, and instead a
rollback is issued to return the database to the state that existed
prior to the transaction commencing.
In the SQL programming language, the task of performing a query falls to
the SELECT statement. To provide all the query functionality required by
database applications, the SELECT statement's capabilities are
extensive. Before looking at example SELECT statements, lets first look
at the formal syntax of SELECT, which, as shown below is actually
simple. The basic format is SELECT ... FROM ... WHERE;
, you select the
columns of interest from rows in a table or tables where certain
conditions are satisfied. Of course, things can become considerably more
complex. This article covers the basic features of SELECT and defers the
more advanced issues to subsequent articles.
SELECT [ DISTINCT | ALL ] SelectItem [ , SelectItem ]*
FROM clause
[ WHERE clause ]
[ GROUP BY clause ]
[ HAVING clause ]
From this you can see that a basic SELECT statement requires only a SELECT and a FROM; you must specify what data to select and indicate the location of the data of interest. Everything else is optional (as indicated by the square brackets). The DISTINCT and ALL keywords are optional qualifiers to indicate that either rows with unique values or all rows should be selected, respectively. By default, ALL is implicitly assumed, and you can use only one DISTINCT qualifier per SELECT statement.
A SELECT statement can have multiple columns listed following the SELECT
keyword. Multiple elements (or, more generally, column names) are
separated by commas. For example, SELECT a, b, c
selects the three
columns a, b, and c. To select all columns from a table, you can use the
asterisk character (*) as a shorthand for all columns. An important
point to remember is that the result of any SELECT statement is a
transient SQLite table, and you can use it in many of the same ways you
use a more permanent table.
The FROM component of a SELECT statement indicates from which table (or multiple tables) the data will be extracted. For now, we will focus on selecting data from a single table; latter we will cover table joins and selecting data from multiple tables. In this case, the fully qualified name of the table to query must follow the FROM keyword.
The rest of the SELECT statement is optional. Before you build your first query, however, lets review the order in which the SELECT statement components are evaluated:
When you break down the process SQLite follows when processing a query, this order is intuitive. First you must locate the data to be analyzed, after which you filter out the rows of interest. The next steps are to group related rows and, finally, to select the actual columns of interest.
To demonstrate a SELECT statement, we can extract all the columns from
the myProducts table by using the sqlite3
tool and passing the SQL
statement in as a command line argument.
In [9]:
!sqlite3 test "SELECT * FROM myProducts ;"
In the previous code cell, we used the asterisk character to select all columns from the myProducts table without listing them explicitly. This can be a useful shortcut, especially when you're developing database applications, but it isn't a recommended practice. By using the shortcut, you don't explicitly specify the database column names or their order. In a database application, if you always assume that the column names and their order in a table are fixed, you may end up with subtle bugs if someone else modifies the database tables on which your application depends. You should always explicitly name the database columns in your SELECT statements and list the order you require.
As a result, lets look at explicitly listing the columns to extract. This is a recommended practice that also allows us to control the order in which the columns are listed in the query output.
In [10]:
!sqlite3 test "SELECT price, itemNumber, description FROM myProducts ;"
Up to this point, you have only selected columns for all rows in a single table. This can be expensive in terms of query performance, especially if you only want a subset of the rows from a large table. A more efficient technique is to filter database rows by placing conditions in the WHERE clause, which is evaluated immediately after the tables are specified within the FROM clause. The rest of this section discusses some of the basic features that are enabled by using the WHERE clause, including the ability to select rows that satisfy Boolean conditions as well as join multiple tables to perform more complex queries.
The simplest and most common use of the WHERE clause is to filter the rows from a table before selecting any columns, as shown in the next two code cells.
In [11]:
!sqlite3 test "SELECT p.itemNumber, p.price FROM myProducts AS p WHERE p.price > 30.00 ;"
In [12]:
!sqlite3 test "SELECT * FROM myProducts WHERE price > 30.00 AND stockDate < '2015-01-01' ;"
The first query shown in this example selects the itemNumber
and
price
columns from the myProducts table for all rows where the price
column has a value greater than $30.00
. The second query extends this
same query to select only those columns whose price column has a value
more than $30.00
and whose stockDate column has a value less than
January 1, 2006. These two query restrictions are combined in this query
by using the Boolean AND operator.
You can perform a number of different Boolean operations within a WHERE clause. The following table lists and provides examples of the basic SQL Boolean operations that you can use in a query.
Operator | Example | Description |
---|---|---|
= | p.price = 29.95 |
Test if any built-in type is equal to a specified value. |
< | p.price < 29.95 |
Test if any built-in type is less than a specified value. |
> | p.price > 29.95 |
Test if any built-in type is greater than a specified value. |
<= | p.price <= 29.95 |
Test if any built-in type is less than or equal to a specified value. |
>= | p.price >= 29.95 |
Test if any built-in type is greater than or equal to a specified value. |
<> | p.price <> 29.95 |
Test if any built-in type is not equal to a specified value. |
IS NULL | p.description IS NULL |
Test if an expression or value is null. |
IS NOT NULL | p.description IS NOT NULL |
Test if an expression or value is not null. |
AND | (p.price > 29.92) AND (p.itemNumber > 5) |
Test if two expressions are both true or evaluate as nonzero. |
OR | (p.price > 29.92) OR (p.itemNumber > 5) |
Test if one or both of two expressions are true or evaluate as nonzero. |
NOT | NOT v.vendorNumber = 1 |
Test if an expression is false or evaluates as zero. |
BETWEEN | p.price BETWEEN 29.95 AND 39.95 |
Test if a value lies inclusively between two other values (example is equivalent to 29.95 <= p.price <= 39.95 ). |
LIKE | v.vendorName LIKE 'Lun%' |
Test if a character expression matches a pattern, with the percent character (%) matching zero or more arbitrary characters and the underscore character (_) matching exactly one arbitrary character. |
The first query above also introduces the AS clause, which you can use to create a table synonym. In these examples, you define a synonym p for the fully qualified table name myProducts. By defining a synonym, you can refer to table quantities by using a shorter notation. This may not seem important when only one table is being referenced in a query, but the next section shows how to join multiple tables together within a query; in that case, providing table synonyms is very useful. You can also use an AS clause to name the selected columns in a query. Doing so lets you control how the results are displayed, which is demonstrated in the next section.
The second major function performed by a WHERE clause is to join multiple tables together into a single table that can be queried more easily. Joining multiple tables is a powerful technique, and it can be complex when you're dealing with several large tables. Tables can be joined either explicitly, by using the JOIN keyword, or implicitly, by using a WHERE clause.
You join two tables by using an inner join or an outer join. An inner join is essentially the intersection of two tables, where the tables are matched by comparing the values of a key column, such as itemNumber. The resulting table is composed of only rows that were matched between the two tables. An outer join is more like a union of two tables, where the tables are matched by comparing the values of a key column, but non-matching rows are still included in the resulting table and filled with NULL values as appropriate. Writing SQL queries that use these more advanced table joins will be addressed in future articles.
In the current simple scheme, the process is simple; In the next code cell, we perform an implicit inner join of the myProducts table and the myVendors table.
In [13]:
%%writefile select.sql
SELECT p.price, p.description AS 'Item', v.vendorName AS 'Vendor'
FROM myProducts AS p, myVendors AS v
WHERE p.itemNumber = v.itemNumber ;
In [14]:
# Execute SQL Script
!sqlite3 test < select.sql
This query may seem complex, due primarily to its length. But by
breaking it down line by line you can easily follow what's happening.
First, you select two columns from the myProducts table and one column
from the myVendors table and use an AS clause to name these columns for
subsequent usage (in this case they are displayed by the sqlite3
client tool. Because the query joins these two tables (by using an
implicit inner join), you can select columns from both tables. In the
FROM clause, you list both tables and provide aliases for them to
simplify the full SQL statement. In the WHERE clause, you provide the
logic for joining the two tables, by explicitly instructing the SQLite
database to only select rows from the two tables that have matching
values in their respective itemNumber columns. In processing this query,
SQLite first pulls all rows out of the first (left) table in the query
(myProducts) and finds the row with a matching value in the itemNumber
column in the second (right) table in the query (myVendors).
By default, when data is selected by using an SQL query, all rows that satisfy the WHERE clause are extracted from the database. In some cases, this may result in rows that have identical column values being returned. If you need to restrict your query so that only unique row values are returned, you can use the DISTINCT qualifier, as shown in the following two code cells.
In [14]:
!sqlite3 test "SELECT DISTINCT vendorNumber AS 'Vendor #' FROM myVendors ;"
In [15]:
!sqlite3 test "SELECT DISTINCT vendorNumber AS 'Vendor #', itemNumber as 'Item #' FROM myVendors WHERE itemNumber > 5 ;"
If you want to use the DISTINCT qualifier, it must be the first item listed in the SELECT clause, as shown in Listing 1, and you can have only one DISTINCT qualifier per SELECT clause. If the selected rows contain a column with NULL values, multiple NULL values are considered duplicates when identifying unique rows.
The first query in this listing uses the DISTINCT qualifier to restrict the output of the query to only distinct, or unique, values of the vendorNumber column, which is the only column listed in the SELECT clause. In the example schema that these articles use, there are only three vendors (with vendorNumber being restricted to 1, 2, or 3). Thus, when the DISTINCT qualifier is used in the query, only three rows are selected.
The DISTINCT qualifier, however, applies to the entire list of selected columns, so if multiple columns are listed following a DISTINCT keyword, only unique combinations of all the columns are selected. This is demonstrated in the second example, where both vendorNumber and itemNumber are listed in the SELECT clause. Because every item has a unique itemNumber, every combination of these two columns is unique, and all rows that satisfy the WHERE clause are selected; in other words, the DISTINCT qualifier has no effect on the results.
One remaining point that you may have noticed from the two previous examples is that the selected rows were not in the same order. If the order of selected rows is important, you can easily control it by using an ORDER BY clause in your query.
In general, you can't assume that SQLite, or any database, will return rows from a query in a specific order. If the order is important, you can use the ORDER BY clause to have SQLite order the data that are returned by your query in a particular manner. Generally, you do so by specifying a column that should be used to provide the ordinal values for comparison as shown in the next two code cells.
In [16]:
%%writefile orderby.sql
SELECT v.vendorNumber AS "Vendor #", vendorName as "Vendor",
p.price AS "Price", p.itemNumber AS "Item #"
FROM myProducts AS p, myVendors AS v
WHERE p.itemNumber = v.itemNumber AND p.price > 20.0
ORDER by v.vendorNumber ;
SELECT v.vendorNumber AS "Vendor #", vendorName as "Vendor",
p.price AS "Price", p.itemNumber AS "Item #"
FROM myProducts AS p, myVendors AS v
WHERE p.itemNumber = v.itemNumber AND p.price > 20.0
ORDER BY v.vendorNumber ASC, p.price DESC ;
In [17]:
!sqlite3 test < orderby.sql
In the previous example, the first query uses the ORDER BY clause to list a subset of all the rows in the table that results from joining the myVendors table to the myProducts table. The rows are ordered by vendorNumber (the subset is constructed by applying the WHERE clause). An ORDER BY clause can take either a column name, as in this example, or a column number, which is taken from the order in which the columns are listed after the SELECT keyword.
You can also specify multiple columns to use during the sorting process and even specify ASC for ascending order, which is the default, or DESC for descending order. For example, if you used the ORDER BY 1 DESC, 4 DESC clause in the first query, the query would return the same rows, but they would be ordered by using the vendorNumber column as the primary sort column in descending order followed by the itemNumber column as the secondary sort column in descending order.
Although using column numbers may seem like a handy shortcut, it generally isn't a good idea. To see why, consider what happens if you modify the columns listed in a SELECT clause or just modify their order. If you forget to modify the numbers used in the ORDER BY clause, the query will break-or worse, return bad data. In general, it's a best practice to always be explicit and specify the column names directly, even if doing so means more typing.
Selecting columns from a database provides a number of useful benefits, but being able to compute and select quantities based on data in a table opens up even more possibilities. SQLite,as does any SQL database, provides several mathematical operators, the most common of which are listed in the following table, that you can use in either a SELECT clause or a WHERE clause.
Operator | Example | Description |
---|---|---|
unary + | +1.0 | A noop, or no operation, as +4 = 4 |
unary - | -p.price | Changes the sign of the value to which it's applied |
+ | p.itemNumber + 10 | Adds the second value to the first value |
- | p.itemNumber - 10 | Subtracts the second value from the first value |
* | p.price * 1.0825 | Multiplies the first value by the second value |
/ | p.price / 100.0 | Divides the first value by the second value |
Using these operators is straightforward because they generally behave
exactly as you expect. For example, if the sales tax is 8.25%, you can
return the price for an item both before and after sales tax has been
applied by using SELECT price, price * 1.0825 FROM myProducts ;
. As
another example, if you have a column called numberItems
that tracks
the number of items purchased and another column called price
that
contains the price at which they're purchased, you can return the total
amount paid for those items at a given price by using
numberItems * price
. Several of the queries shown in this
IPython Notebook provide additional examples of how to use these
operators.
The only concern when using these operators arises from complications that result from using different data types, such as integer or floating-point, in a mathematical operation. If both operands are the same data type, the result type will be the same. If you're performing division, this can result in truncation (for example, if you're using two integer values), which might cause unexpected problems. On the other hand, if the two operands are different data types, the result type is promoted to the more complex type.
SQL is a powerful and expressive language that can be used to perform a wide range of actions. Part of the SQL language's power comes from its ability to directly interact with a variety of data types. Some of the greatest power of a relational database arises from the inherent functions that it provides and from the extensibility enabled by allowing users to create new functions. SQLite provides functions (see the following SQLite tutorial for more information) in three different categories:
Of these built-in functions, we will most likely use the aggregate
functions, which operate on multiple rows. Aggregate functions-also
known as set functions in SQL-92 or, more informally, as column
functions-return a computed quantity from a column over a number of
rows. SQLite supports the following five aggregate functions (a sixth
function, group_concat
, is not listed).
Function | Example | Description |
---|---|---|
AVG | AVG(p.price) | Returns the average value of a column from all rows that satisfy an expression. Can only be used with built-in numeric data types. The precision of the returned value is defined by the precision of the column being evaluated. |
COUNT | COUNT(p.price) | Returns the number of rows that satisfy an expression, such as a query. Can be used with any data type. |
MAX | MAX(p.price) | Returns the maximum value of a column from all rows that satisfy an expression. Can only be used with built-in data types. |
MIN | MIN(p.price) | Returns the minimum value of a column from all rows that satisfy an expression. Can only be used with built-in data types. |
SUM | SUM(p.price) | Returns the sum of a column over all rows that satisfy an expression. Can only be used with built-in numeric data types. |
These aggregate functions can often be used to quickly find useful information that might otherwise be difficult to identify, as shown in Listing 4.
SELECT COUNT(p.itemNumber) AS Number,
AVG(p.price) AS Average,
MIN(p.stockDate) AS "First Date", MAX(p.stockDate) AS "Last Date"
FROM myProducts AS p ;
Listing 4 uses four of the five aggregate functions to get summary information about the data in the myProducts table. The COUNT function indicates that the table includes ten rows (because the query didn't use a WHERE clause to restrict the rows selected from the table). The AVG function calculates the average price of all items in the myProducts table. Finally, the MIN and MAX functions extract the minimum and maximum dates from the myProducts table.
To delete data in a SQLite database, you use the SQL DELETE statement, which can delete either all rows in a table or a specific subset of rows. The formal syntax for the SQL DELETE statement is remarkably simple:
DELETE FROM tableName
[WHERE clause]
The DELETE statement deletes all rows from the specified table that satisfy an optional WHERE clause. If no WHERE clause is included, all rows in the table are deleted. To demonstrate this use of the DELETE statement, we can create a temporary table, insert several rows, and delete them all.
In [18]:
%%writefile delete.sql
-- First create the temporary table
CREATE TABLE temp (aValue INT) ;
-- Insert fake data
INSERT INTO temp VALUES(0), (1), (2), (3) ;
-- Count rows in the table
SELECT COUNT(*) AS COUNT FROM temp ;
-- Delete all rows
DELETE FROM temp ;
-- Count all rows in the table
SELECT COUNT(*) AS COUNT FROM temp ;
-- Now drop the temporary table
DROP TABLE temp ;
In [19]:
!sqlite3 test < delete.sql
The previous example created a single-column temporary table to hold a single integer value. Next we inserted four rows into the database and issued a SELECT statement to verify that the new table contained four rows. By using an unconstrained DELETE statement, we delete all four rows from the temporary table, which is verified by the second SELECT statement, which indicates that the temporary table contains zero rows. Finally, the DROP TABLE statement deletes the empty table from the schema.
In general, however, you don't want to delete all rows from a table; instead, you'll selectively delete rows. To do this, you create an appropriate WHERE clause that identifies all rows of interest. The syntax for the WHERE clause that you can use with a DELETE statement is identical to that discussed previously when we presented the full SQL SELECT statement syntax. The basic building blocks for constructing a Boolean expression within a WHERE clause were presented in an earlier table. The following example demonstrates using a WHERE clause in a DELETE statement, where we delete all rows that satisfy at least one of two conditions.
In [18]:
%%writefile delete2.sql
-- First display data
SELECT itemNumber, description FROM myProducts ;
-- Selectively delete rows
DELETE FROM myProducts
WHERE description LIKE '%towel%' OR itemNumber <= 3 ;
-- Confirm the proper deletion
SELECT itemNumber, description FROM myProducts ;
In [19]:
!sqlite3 test < delete2.sql
In this example, the DELETE statement includes a WHERE clause that identifies five rows. The WHERE clause contains two expressions that are joined by the OR operator, which means that if either expression evaluates as TRUE for a specific row, that row will be deleted.
The first expression finds all rows that contain the word "towel" in the product description. If you recall, there are two towels in the myProducts table, with itemNumber column values of 7 and 8. The other expression selects all rows with an itemNumber column value less than or equal to 3. The contents of the myProducts table are finally displayed with a simple SELECT statement, demonstrating that only five of the original ten rows remain in the table.
Although this example doesn't explicitly demonstrate their use, you can also include the SQL functions to gain more control over the selection of rows for deletion. These same functions and other operators that can be used in the WHERE clause of the DELETE statement also can be used with the UPDATE statement to selectively modify the values of rows in a table, as described in the next section.
The last SQL task for dealing with data that you need to address is updating specific column values for selected rows in a table. At some level, the SQL UPDATE statement is the union of the SQL INSERT and DELETE statements, because you must select rows to modify as well as specify how to modify them. Formally, the UPDATE statement syntax is straightforward, because you must specify the new column values for the set of rows to be updated:
UPDATE tableName
SET columnName = Value
[ , columnName = Value} ]*
[WHERE clause]
As shown in this SQL syntax, an SQL UPDATE statement must have, at a minimum, one SET component to update one column, along with one or more SET components and a WHERE clause, both of which are optional. If the WHERE clause isn't included, the UPDATE statement modifies the indicated columns for all rows in the table.
Issuing an UPDATE statement is fairly easy, as shown in the following code example, where we modify two columns of a single row.
In [22]:
%%writefile update.sql
-- Extract the test row
SELECT itemNumber, price, stockDate FROM myProducts WHERE itemNumber = 6 ;
-- Update the row
UPDATE myProducts SET price = price * 1.25, stockDate = date('now') WHERE itemNumber = 6 ;
-- Show the new result
SELECT itemNumber, price, stockDate FROM myProducts WHERE itemNumber = 6 ;
In [23]:
!sqlite3 test < update.sql
This example wraps a single UPDATE statement with SELECT statements to
demonstrate the change to the target row. The SELECT statements both
select three columns from the myProducts table for a single row
(the row with the value 6 in the itemNumber column). The UPDATE
statement modifies both the price and the stockDate columns for this
specific row. The value in the price column is increased by 25% (for
example, perhaps due to the item's popularity), and the stockDate column
is modified to hold the current date, which can be obtained easily with
SQLite by using the built-in date
function with an argument of now
in an SQL query.
The previous example demonstrated how to modify multiple column values for a specific row in a single table. However, sometimes the logic to select rows to update is more complex. For example, suppose you need to modify the price of all objects in the myProducts table that you obtain from Quiet Beach Industries, which has a value of 3 in the vendorNumber column in the myVendors table. To do this, you need to use an embedded query:
In [24]:
%%writefile update2.sql
-- Update the table
UPDATE myProducts
SET price = price * 1.10, description = 'NEW: ' || description
WHERE itemNumber IN
( SELECT v.itemNumber
FROM myProducts as p, myVendors as v
WHERE p.itemNumber = v.itemNumber AND v.vendorNumber = 3 ) ;
-- Show new results
SELECT * FROM myProducts ;
In [25]:
!sqlite3 test < update2.sql
In this example, the UPDATE statement modifies the price and description columns for all products that are obtained from the vendor with a value of 3 in the vendorNumber column in the myVendors table. Because you can't do a simple join within an UPDATE statement, you must include a subquery in the WHERE clause to extract the itemNumber rows that correspond to products from Quiet Beach Industries. The WHERE clause in the UPDATE statement uses the IN operator to select those rows that have an itemNumber column in the set of values selected by the embedded query.
Two types of queries can be used in the WHERE clause of an UPDATE
statement: a scalar subquery and a table subquery. A scalar subquery is
an embedded query that returns a single row that contains a single
column-essentially, a single value, which is known as a scalar. You can
use a scalar subquery to select a specific value that will be used in
the expression of the WHERE clause. For example,
itemNumber = (Scalar Subquery)
updates any rows that have a
value in the itemNumber column that matches the result of the
scalar subquery.
A table subquery, on the other hand, can return multiple rows that generally only have one column. In certain instances, a table subquery can contain multiple columns. To use a table subquery, you need to use an SQL operator to combine the embedded query with a Boolean expression. For example, this was shown in the previous code listing, where the IN operator selected all rows from the myProducts table that were produced by Quiet Beach Industries.
The previous section discussed modifying the data that already exists in a table. The other possibility is modifying the structure, or schema, of a database table. This can take the form of adding a column, changing the data type for a column, adding a constraint, or even deleting a column. This process isn't easy, which is one reason to be careful when you initially design your schema. If you do need to modify the structure of a table, you should use a temporary table.
In [26]:
%%writefile rename.sql
-- Create New table with extra column
CREATE TABLE newProducts (
itemNumber INT NOT NULL,
price REAL,
stockDate TEXT,
count INT NOT NULL DEFAULT 0,
description TEXT
) ;
-- New copy old table into new table
INSERT INTO newProducts(itemNumber, price, stockDate, description)
SELECT itemNumber, price, stockDate, description FROM myProducts ;
-- Drop old table
DROP TABLE myProducts ;
-- Now Rename new table to old table name
ALTER TABLE newProducts RENAME TO myProducts ;
-- Show the results
SELECT * FROM myProducts ;
In [27]:
# Execute SQL Script
!sqlite3 test < rename.sql
As this example shows, to modify a table-in this case, to add a new
count
column to the myProducts
table-you first create a table that
has the exact schema you require. This example requires that it always
have a valid value by including the column constraint NOT NULL and
assigns a default value of 0 to the count column by using the column
constraint DEFAULT 0. Notice how you can combine multiple column
constraints by listing them sequentially.
The next step is to copy the existing data from the original table to the new table. You can do so by using an SQL INSERT statement that uses a subquery to get the values to insert. This is a powerful technique that lets you easily copy all or part of an existing table into a second table.
After you've created the new table and copied the appropriate data, you drop the old table by using an SQL DROP TABLE statement and rename the new table to the original name by using an SQL RENAME TABLE statement. The rename operation is straightforward: Rename the oldTableName to the newTableName, but don't supply a schema name for the new table name because the RENAME operation can't move a table between different database schemas. This example concludes by issuing a SELECT statement to display the schema and contents of the new myProducts table. As you can see, the new table has five columns, and the count column is always zero. At this point, a real application would modify the count column appropriately by issuing the necessary SQL UPDATE statements.
Relational databases support the concept of an index to speed up queries. An index is often used when a particular column (or columns) is frequently involved in either a WHERE clause or a table join. The syntax for creating an index is rather simple:
CREATE [UNIQUE] INDEX idx_name ON table_name(column [column]*)
[WHERE]
In this format, we create an index on one or more columns in a given table. If UNIQUE is present in the index creation, only non-duplicate entries are allowed in the index. The columns can also be followed by either ASC or DESC to indicate the column sort order for the index. Finally, if a WHERE clause is included the index is only a partial index since it does not cover the entire table.
As an example, we can create an index on the myProducts table by using the itemNumber column:
CREATE INDEX itn ON myProducts(itemNumber) ;
Of course we have only scratched the surface of SQL, despite the length of this IPython Notebook. We have not discussed views, which effectively turns a SQL query into a new, read-only table, or triggers, which are database operations that are automatically performed when a specific event occurs.
Another useful feature that we will use in the third lesson is the LIMIT
clause. We can use this to restrict the number of rows returned by a
particular query. For example, SELECT * FROM myProducts LIMIT 5 ;
will
only return five rows.
Several sites exist that allow you to try out SQL commands online.