Normal the %sql
magic command is used to execute SQL commands immediately to get a result. If this statement needs to be executed multiple times with different variables, the process is inefficient since the SQL statement must be recompiled every time.
The use of the PREPARE
and EXECUTE
command allow the user to optimize the statement once, and then re-execute the statement using different parameters.
In addition, the commit scope can be modified so that not every statement gets committed immediately. By managing the commit scope, overhead in the database engine can be avoided.
In [ ]:
%run db2.ipynb
By default, any SQL statements executed with the %sql
magic command are immediately commited. This means that the log file has the transaction details and the results are committed to disk. In other words, you can't change your mind after the statement finishes execution.
This behavior is often referred to as AUTOCOMMIT
and adds a level of overhead to statement execution because at the end of every statement the results must be "hardened". On the other hand, autocommit means you don't have to worry about explicitly committing work or causing potential locking issues because you are holding up resources. When a record is updated, no other user will be able to view it (unless using "dirty read") until you commit. Holding the resource in a lock means that other workloads may come to a halt while they wait for you to commit your work.
Here is a classic example of wanting a commit scope that is based on a series of statements:
withdrawal = 100
%sql update checking set balance = balance - withdrawal
%sql update savings set balance = balance + withdrawal
If autocommit is ON
, you could have a problem with the transaction if the system failed after the first update statement. You would have taken money out of the checking account, but have not updated the savings account. To make sure that this transaction is run successfully:
%sql autocommit off
withdrawal = 100
%sql update checking set balance = balance - withdrawal
%sql update savings set balance = balance + withdrawal
%sql commit work
If the transaction fails before the COMMIT WORK
, all changes to the database will be rolled back to its original state, thus protecting the integrity of the two tables.
To COMMIT
all changes to the database you must use the following syntax:
%sql COMMIT [WORK | HOLD]
The command COMMIT
or COMMIT WORK
are identical and will commit all work to the database. Issuing a COMMIT
command also closes all open cursors or statements that are open. If you had created a prepared statement (see section below) then the compiled statement will be no longer valid. By issuing a COMMIT
you are releasing all of the resources and locks that your application may be holding.
COMMIT HOLD
will allow you to commit your work to disk, but keeps all of the resources open for further execution. This is useful for situations where you are inserting or updating 1000's of records and do not want to tie up log space waiting for a commit to occur. The following pseudocode gives you an example how this would be used:
%sql autocommit off
for i = 1 to 1000
%sql insert into x values i
if (i / 100 == 0)
print i "Records inserted"
%sql commit work
end if
end for
%sql commit work
%sql autocommit on
You should always remember to turn AUTOCOMMIT ON
at the end of any code block or you will have to issue COMMIT
at the end of any SQL command to commit it to the database.
The PREPARE
and EXECUTE
commands are useful in situations where you want to repeat an SQL statement multiple times while just changing the parameter values. There isn't any benefit from using these statements for simple tasks that may only run occassionally. The benefit of PREPARE/EXECUTE
is more evident when dealing with a large number of transactions that are the same.
The PREPARE
statement can be used against many types of SQL, but in this implementation, only the following SQL statements are supported:
To prepare a statement, you must use the following syntax:
stmt = %sql PREPARE sql ....
The PREPARE
statement always returns a statement handle. You must assign the results of the PREPARE
statement to a variable since it will be required when you EXECUTE
the statement.
The SQL statement must have any variables replaced with a question mark ?
. For instance, if you wanted to insert a single value into a table you would use the following syntax:
stmt = %sql PREPARE insert into x values (?)
One important note with parameter markers. If you require the parameter to have a specific data type (say INTEGER) then you may want to place a CAST
statement around it to force the proper conversion. Usually strings, integers, decimals, etc... convert fine when using this syntax, but occasionally you may run across a data type issue. For the previous example we could modify it to:
stmt = %sql PREPARE insert into x values (CAST(? AS INTEGER))
Once you have prepared a statement, you can execute it using the following syntax:
%sql EXECUTE :stmt USING :v1,:v2,:v3,....
You must provide the variable names with a colon :
in front of them and separate each one with a comma. This allows the SQL parser to differentiate between a host variable and a column or SQL keyword. You can also use constants as part of the EXECUTE
statement:
%sql EXECUTE :stmt USING 3,'asdsa',24.5
Using variables are more convenient when dealing with strings that may contain single and double quotes.
When using the PREPARE
statement, it can become cumbersome when dealing with many parameter markers. For instance, in order to insert 10 columns into a table the code would look similar to this:
stmt = %sql PREPARE INSERT INTO X VALUES (?,?,?,?,?,?,?,?,?,?)
The %sql
command allows you to use the shortform ?*#
where #
is an integer representing the number of columns you want in the list. The above statement could be written as:
stmt = %sql PREPARE INSERT INTO X VALUES (?*10)
The syntax can also be used to create groups of parameter markers:
stmt = %sql PREPARE INSERT INTO X VALUES (?*3,?*7)
While this may seem a strange way of providing parameters, this becomes more useful when we use the EXECUTE
command.
The EXECUTE
command can use Python arrays (lists) as input arguments. For the previous example with 10 parameters you could issue the following command:
%sql EXECUTE :stmt USING :v1,:v2,:v3,:v4,:v5,:v6,:v7,:v8,:v9,:v10
If you placed all of these values into an array, you could also do the following:
%sql EXECUTE :stmt USING :v[0],:v[1],:v[2],:v[3],:v[4],:v[5],:v[6],:v[7],:v[8],:v[9]
That isn't much simpler but shows that you could use items within an array (one dimensional only). The easiest syntax is:
%sql EXECUTE :stmt USING :v
The only requirement is that the array v
has exactly the number of values required to satisfy the parameter list required by the prepared statement.
When you split the argument list into groups, you can use multiple arrays to contain the data. Given the following prepare statement:
stmt = %sql PREPARE INSERT INTO X VALUES (?*3,?*7)
You could execute the statement using two arrays:
%sql EXECUTE :stmt USING :name, :details
This would work as long as the total number of parameters supplied by the name
array and details
array is equal to 10.
The following examples will show the use of AUTOCOMMIT
and PREPARE/EXECUTE
when running SQL statements.
This first SQL statement will load the EMPLOYEE and DEPARTMENT tables (if they don't already exist) and then return an array of all of the employees in the company using a SELECT statement.
In [ ]:
%sql -sampledata
employees = %sql -r select * from employee
The employees
variable contains all of the employee data as a Python array. The next statement will retrieve the contents of the first row only (Remember that row 0 contains the name of the columns).
In [ ]:
print(employees[1])
We now will create another EMPLOYEE
table that is an exact duplicate of what we already have.
In [ ]:
%%sql -q
DROP TABLE EMPLOYEE2;
CREATE TABLE EMPLOYEE2 AS (SELECT * FROM EMPLOYEE) DEFINITION ONLY;
In [ ]:
%sql -q DELETE FROM EMPLOYEE2
print("Starting Insert")
start_time = time.time()
i = 0
for k in range(0,50):
for record in employees[1:]:
i += 1
empno,firstnme,midinit,lastname,workdept,phoneno,hiredate,job,edlevel,sex,birthdate,salary,bonus,comm = record
%sql -q INSERT INTO EMPLOYEE2 VALUES ( \
:empno,:firstnme,:midinit, \
:lastname,:workdept,:phoneno, \
:hiredate,:job,:edlevel, \
:sex,:birthdate,:salary, \
:bonus,:comm)
end_time = time.time()
print('Total load time for {:d} records is {:.2f} seconds'.format(i,end_time-start_time))
time_insert = end_time-start_time
In [ ]:
%sql -q DELETE FROM EMPLOYEE2
print("Starting Insert")
start_time = time.time()
i = 0
prep = %sql prepare INSERT INTO EMPLOYEE2 VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)
for k in range(0,50):
for record in employees[1:]:
i += 1
empno,firstnme,midinit,lastname,workdept,phoneno,hiredate,job,edlevel,sex,birthdate,salary,bonus,comm = record
%sql execute :prep using :empno,:firstnme,:midinit,:lastname,:workdept,:phoneno,:hiredate,:job,:edlevel,:sex,:birthdate,:salary,:bonus,:comm
end_time = time.time()
print('Total load time for {:d} records is {:.2f} seconds'.format(i,end_time-start_time))
time_prepare = end_time-start_time
You will notice that it is a bit tedious to write out all of the columns that are required as part of an INSERT
statement. A simpler option is to use a Python list or array to and assign it directly in the EXECUTE
statement. So rather than:
%sql execute :prep using :empno, :firstnme, ...
We can just use the array variable generated as part of the for loop:
%sql execute :prep using :record
The following SQL demonstrates this approach.
In [ ]:
%sql -q DELETE FROM EMPLOYEE2
print("Starting Insert")
start_time = time.time()
i = 0
prep = %sql prepare INSERT INTO EMPLOYEE2 VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)
for k in range(0,50):
for record in employees[1:]:
i += 1
%sql execute :prep using :record
end_time = time.time()
print('Total load time for {:d} records is {:.2f} seconds'.format(i,end_time-start_time))
time_array = end_time-start_time
In [ ]:
%sql -q DELETE FROM EMPLOYEE2
%sql autocommit off
print("Starting Insert")
start_time = time.time()
i = 0
prep = %sql prepare INSERT INTO EMPLOYEE2 VALUES (?*14)
for k in range(0,50):
for record in employees[1:]:
i += 1
%sql execute :prep using :record
%sql commit work
end_time = time.time()
print('Total load time for {:d} records is {:.2f} seconds'.format(i,end_time-start_time))
%sql autocommit on
time_commit = end_time-start_time
In [ ]:
%%sql -pb
WITH RESULT(RUN,ELAPSED) AS (
VALUES
('INSERT',CAST(:time_insert AS DEC(5,2))),
('PREPARE',CAST(:time_prepare AS DEC(5,2))),
('ARRAY ',CAST(:time_array AS DEC(5,2))),
('COMMIT ',CAST(:time_commit AS DEC(5,2)))
)
SELECT RUN, ELAPSED FROM RESULT
ORDER BY ELAPSED DESC