This notebook shows how to work with a real world dataset using SQL and Python. In this lab you will:
The city of Chicago released a dataset showing all school level performance data used to create School Report Cards for the 2011-2012 school year. The dataset is available from the Chicago Data Portal: https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t
This dataset includes a large number of metrics. Start by familiarizing yourself with the types of metrics in the database: https://data.cityofchicago.org/api/assets/AAD41A13-BE8A-4E67-B1F5-86E711E09D5F?download=true
NOTE: Do not download the dataset directly from City of Chicago portal. Instead download a more database friendly version from the link below. Now download a static copy of this database and review some of its contents: https://ibm.box.com/shared/static/0g7kbanvn5l2gt2qu38ukooatnjqyuys.csv
In many cases the dataset to be analyzed is available as a .CSV (comma separated values) file, perhaps on the internet. To analyze the data using SQL, it first needs to be stored in the database.
While it is easier to read the dataset into a Pandas dataframe and then PERSIST it into the database as we saw in the previous lab, it results in mapping to default datatypes which may not be optimal for SQL querying. For example a long textual field may map to a CLOB instead of a VARCHAR.
Therefore, it is highly recommended to manually load the table using the database console LOAD tool, as indicated in Week 2 Lab 1 Part II. The only difference with that lab is that in Step 5 of the instructions you will need to click on create "(+) New Table" and specify the name of the table you want to create and then click "Next".
In [ ]:
%load_ext sql
In [ ]:
# Enter the connection string for your Db2 on Cloud database instance below
# %sql ibm_db_sa://my-username:my-password@my-hostname:my-port/my-db-name
%sql ibm_db_sa://
In [ ]:
# type in your query to retrieve list of all tables in the database for your db2 schema (username)
Double-click here for a hint
Double-click here for the solution.
In [ ]:
# type in your query to retrieve the number of columns in the SCHOOLS table
Double-click here for a hint
Double-click here for the solution.
Now retrieve the the list of columns in SCHOOLS table and their column type (datatype) and length.
In [ ]:
# type in your query to retrieve all column names in the SCHOOLS table along with their datatypes and length
Double-click here for the solution.
In [ ]:
Double-click here for a hint
Double-click here for another hint
Double-click here for the solution.
In [ ]:
Double-click here for a hint
Double-click here for the solution.
In [ ]:
Double-click here for the solution.
In [ ]:
Double-click here for the solution.
In [ ]:
Double-click here for the solution.
In [ ]:
Double-click here for a hint
Double-click here for the solution.
In [ ]:
Double-click here for a hint
Double-click here for another hint
Double-click here for the solution.
In [ ]:
Double-click here for a hint
Double-click here for another hint
Double-click here for the solution.
In [ ]:
Double-click here for a hint
Double-click here for the solution.
Copyright © 2018 cognitiveclass.ai. This notebook and its source code are released under the terms of the MIT License.