Copyright Jana Schaich Borg/Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

MySQL Exercise 1: Welcome to your first notebook!

Database interfaces vary greatly across platforms and companies. The interface you will be using here, called Jupyter, is a web application designed for data science teams who need to create and share documents that share live code. We will be taking advantage of Jupyter's ability to integrate instructional text with areas that allow you to write and run SQL queries. In this course, you will practice writing queries about the Dognition data set in these Jupyter notebooks as I give you step-by-step instructions through written text. Then once you are comfortable with the query syntax, you will practice writing queries about the Dillard's data set in Teradata Viewpoint's more traditional SQL interface, called SQL scratchpad. Your assessments each week will be based on the exercises you complete using the Dillard's dataset.

Jupyter runs in a language called Python, so some of the commands you will run in these MySQL exercises will require incorporating small amounts of Python code, along with the MySQL query itself. Python is a very popular programming language with many statistical and visualization libraries, so you will likely encounter it in other business analysis settings. Since many data analysis projects do not use Python, however, I will point out to you what parts of the commands are specific to Python interfaces.

The first thing you should do every time you start working with a database: load the SQL library

Since Jupyter is run through Python, the first thing you need to do to start practicing SQL queries is load the SQL library. To do this, type the following line of code into the empty cell below:

%load_ext sql

In [2]:
%load_ext sql

The "%" in this line of code is syntax for Python, not SQL. The "cell" I am referring to is the empty box area beside the "In [ ]:" you see on the left side of the screen.

Once you've entered the line of code, press the "run" button on the Jupyter toolbar that looks like an arrow. It's located under “cell” in the drop-down menu bar, and next to the stop button (the solid square). The run button is outlined in red in this picture:

TIP: Whenever instructions say “run” or "execute" a command in future exercises, type the appropriate code into the empty cell and execute it by pressing this same button with the arrow.

When the library has loaded, you will see a number between the brackets that preceed the line of code you executed:

For example, it might look like this:

In [2]:

The second thing you must do every time you want to start working with a database: connect to the database you need to use.

Now that the SQL library is loaded, we need to connect to a database. The following command will log you into the MySQL server at mysqlserver as the user 'studentuser' and will select the database named 'dognitiondb' :

mysql://studentuser:studentpw@mysqlserver/dognitiondb

However, to execute this command using SQL language instead of Python, you will need to begin the line of code with:

%sql

Thus, the complete line of code is:

%sql mysql://studentuser:studentpw@mysqlserver/dognitiondb

Connect to the database by typing this command into the cell below, and running it (note: you can copy and paste the command from above rather than typing it, if you choose):


In [3]:
%sql mysql://studentuser:studentpw@172.17.0.4/dognitiondb


Out[3]:
'Connected: studentuser@dognitiondb'

In [4]:
import socket
socket.gethostbyname('mysqlserver')
#mysqlserver


Out[4]:
'172.17.0.4'

In [5]:
%config SqlMagic


SqlMagic options
--------------
SqlMagic.autolimit=<Int>
    Current: 100000
    Automatically limit the size of the returned result sets
SqlMagic.autopandas=<Bool>
    Current: False
    Return Pandas DataFrames instead of regular result sets
SqlMagic.column_local_vars=<Bool>
    Current: False
    Return data into local variables from column names
SqlMagic.displaylimit=<Int>
    Current: 1000
    Automatically limit the number of rows displayed (full result set is still
    stored)
SqlMagic.dsn_filename=<Unicode>
    Current: 'odbc.ini'
    Path to DSN file. When the first argument is of the form [section], a
    sqlalchemy connection string is formed from the matching section in the DSN
    file.
SqlMagic.feedback=<Bool>
    Current: True
    Print number of rows affected by DML
SqlMagic.short_errors=<Bool>
    Current: True
    Don't display the full traceback on SQL Programming Error
SqlMagic.style=<Unicode>
    Current: 'DEFAULT'
    Set the table printing style to any of prettytable's defined styles
    (currently DEFAULT, MSWORD_FRIENDLY, PLAIN_COLUMNS, RANDOM)

***Every time you run a line of SQL code in Jupyter, you will need to preface the line with "%sql". Remember to do this, even though I will not explicitly instruct you to do so for the rest of the exercises in this course.***

Once you are connected, the output cell (which reads "Out" followed by brackets) will read: "Connected:studentuser@dognitiondb". To make this the default database for our queries, run this "USE" command:

%sql USE dognitiondb

In [6]:
%sql USE dognitiondb


0 rows affected.
Out[6]:
[]

You are now ready to run queries in the Dognition database!

The third thing you should do every time you start working with a new database: get to know your data

The data sets you will be working with in business settings will be big. REALLY big. If you just start making queries without knowing what you are pulling out, you could hang up your servers or be staring at your computer for hours before you get an output. Therefore, even if you are given an ER diagram or relational schema like we learned about in the first week of the course, before you start querying I strongly recommend that you (1) confirm how many tables each database has, and (2) identify the fields contained in each table of the database. To determine how many tables each database has, use the SHOW command:

SHOW tables

Try it yourself (TIP: if you get an error message, it's probably because you forgot to start the query with "%sql"):


In [7]:
%sql SHOW tables


6 rows affected.
Out[7]:
Tables_in_dognitiondb
complete_tests
dogs
exam_answers
reviews
site_activities
users

The output that appears above should show you there are six tables in the Dognition database. To determine what columns or fields (we will use those terms interchangeably in this course) are in each table, you can use the SHOW command again, but this time (1) you have to clarify that you want to see columns instead of tables, and (2) you have to specify from which table you want to examine the columns.

The syntax, which sounds very similar to what you would actually say in the spoken English language, looks like this:

SHOW columns FROM (enter table name here)

or if you have multiple databases loaded:

SHOW columns FROM (enter table name here) FROM (enter database name here)

or

SHOW columns FROM databasename.tablename

Whenever you have multiple databases loaded, you will need to specify which database a table comes from using one of the syntax options described above.

As I said in the earlier "Introduction to Query Syntax" video, it makes it easier to read and troubleshoot your queries if you always write SQL keywords in UPPERCASE format and write your table and field names in their native format. We will only use the most important SQL keywords in this course, but a full list can be found here:

https://dev.mysql.com/doc/refman/5.5/en/keywords.html

Question 1: How many columns does the "dogs" table have? Enter the appropriate query below to find out:


In [8]:
%sql SHOW columns FROM dogs


21 rows affected.
Out[8]:
Field Type Null Key Default Extra
gender varchar(255) YES None
birthday varchar(255) YES None
breed varchar(255) YES None
weight int(11) YES None
dog_fixed tinyint(1) YES None
dna_tested tinyint(1) YES None
created_at datetime NO None
updated_at datetime NO None
dimension varchar(255) YES None
exclude tinyint(1) YES None
breed_type varchar(255) YES None
breed_group varchar(255) YES None
dog_guid varchar(60) YES MUL None
user_guid varchar(60) YES MUL None
total_tests_completed varchar(255) YES None
mean_iti_days varchar(255) YES None
mean_iti_minutes varchar(255) YES None
median_iti_days varchar(255) YES None
median_iti_minutes varchar(255) YES None
time_diff_between_first_and_last_game_days varchar(255) YES None
time_diff_between_first_and_last_game_minutes varchar(255) YES None

You should have determined that the "dogs" table has 21 columns.

An alternate way to learn the same information would be to use the DESCRIBE function. The syntax is:

DESCRIBE tablename

Question 2: Try using the DESCRIBE function to learn how many columns are in the "reviews" table:


In [9]:
%sql DESCRIBE reviews


7 rows affected.
Out[9]:
Field Type Null Key Default Extra
rating int(11) YES None
created_at datetime NO None
updated_at datetime NO None
user_guid varchar(60) YES MUL None
dog_guid varchar(60) YES MUL None
subcategory_name varchar(60) YES None
test_name varchar(60) YES None

You should have determined that there are 7 columns in the "reviews" table.

The SHOW and DESCRIBE functions give a lot more information about the table than just how many columns, or fields, there are. Indeed, the first column of the output shows the title of each field in the table. The column next to that describes what type of data are stored in that column. There are 3 main types of data in MySQL: text, number, and datetime. There are many subtypes of data within these three general categories, as described here:

http://support.hostgator.com/articles/specialized-help/technical/phpmyadmin/mysql-variable-types

The next column in the SHOW/DESCRIBE output indicates whether null values can be stored in the field in the table. The "Key" column of the output provides the following information about each field of data in the table being described (see https://dev.mysql.com/doc/refman/5.6/en/show-columns.html for more information):

  • Empty: the column either is not indexed or is indexed only as a secondary column in a multiple-column, nonunique index.
  • PRI: the column is a PRIMARY KEY or is one of the columns in a multiple-column PRIMARY KEY.
  • UNI: the column is the first column of a UNIQUE index.
  • MUL: the column is the first column of a nonunique index in which multiple occurrences of a given value are permitted within the column.

The "Default" field of the output indicates the default value that is assigned to the field. The "Extra" field contains any additional information that is available about a given field in that table.

Questions 3-6: In the cells below, examine the fields in the other 4 tables of the Dognition database:


In [10]:
%sql DESCRIBE complete_tests


6 rows affected.
Out[10]:
Field Type Null Key Default Extra
created_at datetime NO None
updated_at datetime NO None
user_guid varchar(60) YES MUL None
dog_guid varchar(60) YES MUL None
test_name varchar(60) YES None
subcategory_name varchar(60) YES None

In [11]:
%sql DESCRIBE exam_answers


8 rows affected.
Out[11]:
Field Type Null Key Default Extra
script_detail_id int(11) YES None
subcategory_name varchar(255) YES None
test_name varchar(255) YES None
step_type varchar(255) YES None
start_time datetime YES None
end_time datetime YES None
loop_number int(11) YES None
dog_guid varchar(60) YES None

In [12]:
%sql DESCRIBE site_activities


11 rows affected.
Out[12]:
Field Type Null Key Default Extra
activity_type varchar(150) YES MUL None
description text YES None
membership_id int(11) YES None
category_id int(11) YES None
script_id int(11) YES None
created_at datetime NO None
updated_at datetime NO None
user_guid varchar(255) YES MUL None
script_detail_id int(11) YES None
test_name varchar(255) YES None
dog_guid varchar(255) YES MUL None

In [13]:
%sql DESCRIBE users


16 rows affected.
Out[13]:
Field Type Null Key Default Extra
sign_in_count int(11) YES 0
created_at datetime NO None
updated_at datetime NO None
max_dogs int(11) YES 0
membership_id int(11) YES None
subscribed tinyint(1) YES 0
exclude tinyint(1) YES None
free_start_user tinyint(1) YES None
last_active_at datetime YES None
membership_type int(11) YES None
user_guid text YES MUL None
city varchar(255) YES None
state varchar(255) YES None
zip varchar(255) YES None
country varchar(255) YES None
utc_correction varchar(255) YES None

As you examine the fields in each table, you will notice that none of the Dognition tables have primary keys declared. However, take note of which fields say "MUL" in the "Key" column of the DESCRIBE output, because these columns can still be used to link tables together. An important thing to keep in mind, though, is that because these linking columns were not configured as primary keys, it is possible the linking fields contain NULL values or duplicate rows.

If you do not have a ER diagram or relational schema of the Dognition database yet, consider making one at this point, because you will need to refer back to table and column names constantly throughout designing your queries (if you don't remember what primary keys, secondary keys, ER diagrams, or relational schemas are, review the material discussed in the first week of this course). Some database interfaces do provide notes or visual representations about the information in each table for easy reference, but since the Jupyter interface does not provide this, take your own notes and make your own diagrams, and keep them handy. As you will see, these diagrams and notes will save you a lot of time when you design your queries to pull data.

Using SELECT to look at your raw data

Once you have an idea of what is in your tables look like and how they might interact, it's a good idea to look at some of the raw data itself so that you are aware of any anomalies that could pose problems for your analysis or interpretations. To do that, we will use arguably the most important SQL statement for analysts: the SELECT statement.

SELECT is used anytime you want to retrieve data from a table. In order to retrieve that data, you always have to provide at least two pieces of information:

(1) what you want to select, and      
(2) from where you want to select it.  

I recommend that you always format your SQL code to ensure that these two pieces of information are on separate lines, so they are easy to identify quickly by eye.

The skeleton of a SELECT statement looks like this:

SELECT
FROM

To fill in the statement, you indicate the column names you are interested in after "SELECT" and the table name (and database name, if you have multiple databases loaded) you are drawing the information from after "FROM." So in order to look at the breeds in the dogs table, you would execute the following command:

SELECT breed
FROM dogs;

Remember:

  • SQL syntax and keywords are case insensitive. I recommend that you always enter SQL keywords in upper case and table or column names in either lower case or their native format to make it easy to read and troubleshoot your code, but it is not a requirement to do so. Table or column names are often case insensitive as well, but defaults may vary across database platforms so it's always a good idea to check.
  • Table or column names with spaces in them need to be surrounded by quotation marks in SQL. MySQL accepts both double and single quotation marks, but some database systems only accept single quotation marks. In all database systems, if a table or column name contains an SQL keyword, the name must be enclosed in backticks instead of quotation marks.

    'the marks that surrounds this phrase are single quotation marks'
    "the marks that surrounds this phrase are double quotation marks"
    `the marks that surround this phrase are backticks`

  • The semi-colon at the end of a query is only required when you have multiple separate queries saved in the same text file or editor. That said, I recommend that you make it a habit to always include a semi-colon at the end of your queries.

An important note for executing queries in Jupyter: in order to tell Python that you want to execute SQL language on multiple lines, you must include two percent signs in front of the SQL prefix instead of one. Therefore, to execute the above query, you should enter:

%%sql
SELECT breed
FROM dogs;

When Jupyter is busy executing a query, you will see an asterisk in the brackets next to the output field:

Out [*]

When the query is completed, you will see a number in the brackets next to the output field.

Out [5]

Try it yourself:


In [14]:
%%sql
SELECT breed
FROM dogs;


35050 rows affected.
Out[14]:
breed
Labrador Retriever
Shetland Sheepdog
Golden Retriever
Golden Retriever
Shih Tzu
Siberian Husky
Shih Tzu
Mixed
Labrador Retriever
Shih Tzu-Poodle Mix
German Shepherd Dog-Pembroke Welsh Corgi Mix
Vizsla
Pug
Boxer
German Shepherd Dog-Nova Scotia Duck Tolling Retriever Mix
Beagle
Beagle
Beagle
Mixed
Chesapeake Bay Retriever
Border Collie
Belgian Malinois
Australian Shepherd-German Shepherd Dog Mix
Poodle
Poodle
Golden Doodle
Labrador Retriever
German Shepherd Dog
German Shepherd Dog
Weimaraner
Mixed
Bouvier des Flandres
Mixed
German Shepherd Dog
Beagle
Mudi
Parson Russell Terrier-Beagle Mix
Dalmatian
I Don't Know
Border Collie-Labrador Retriever Mix
Belgian Tervuren
Mixed
Labrador Retriever
Shih Tzu
Australian Terrier
Shih Tzu
Golden Retriever
Bernese Mountain Dog
Chihuahua- Mix
Shetland Sheepdog
Shetland Sheepdog
Mixed
Chihuahua-Dachshund Mix
Finnish Spitz
Siberian Husky
Rottweiler
Pembroke Welsh Corgi
Brussels Griffon
Mixed
French Bulldog
French Bulldog
Shih Tzu
Doberman Pinscher
German Shepherd Dog
Pembroke Welsh Corgi
English Cocker Spaniel-Cocker Spaniel Mix
Rottweiler
German Shepherd Dog
Cavalier King Charles Spaniel-Bichon Frise Mix
Shih Tzu
Bedlington Terrier
Labrador Retriever
Russell Terrier
Poodle
Irish Setter
German Shepherd Dog
Irish Setter
Poodle
Irish Red and White Setter
Poodle-Cocker Spaniel Mix
American Pit Bull Terrier
Golden Retriever
Beagle-Schipperke Mix
Greyhound
Labrador Retriever-Golden Retriever Mix
Labrador Retriever-Golden Retriever Mix
Labrador Retriever
Boston Terrier-Chihuahua Mix
American Pit Bull Terrier
Mixed
Beagle-Cavalier King Charles Spaniel Mix
Boxer
Pug
French Bulldog
Mixed
Mixed
Labradoodle
Mixed
Pembroke Welsh Corgi
Golden Retriever
Cocker Spaniel
Mixed
Rottweiler
Labrador Retriever-Border Collie Mix
Lhasa Apso-Poodle Mix
Labradoodle
English Springer Spaniel
English Springer Spaniel
Mixed
Shih Tzu
Neapolitan Mastiff
Rat Terrier
Border Terrier
Collie-Shetland Sheepdog Mix
German Shepherd Dog
Dachshund
Dachshund
Golden Retriever-Collie Mix
Beagle
Labrador Retriever
American Eskimo Dog-Papillon Mix
Papillon
Pomeranian
German Shepherd Dog-Belgian Tervuren Mix
German Shepherd Dog
Maltese-Yorkshire Terrier Mix
Australian Shepherd
Shiba Inu
Rat Terrier
Poodle
Border Collie-Greyhound Mix
Poodle
Labradoodle
Mixed
Maltese-Poodle Mix
Siberian Husky-German Shepherd Dog Mix
Chihuahua
Golden Retriever-German Shepherd Dog Mix
Parson Russell Terrier
Rhodesian Ridgeback
Mixed
Mixed
Chihuahua
West Highland White Terrier
Poodle-Miniature Schnauzer Mix
Maltese-Poodle Mix
Yorkshire Terrier-Poodle Mix
Vizsla
Golden Retriever
Labradoodle
Silky Terrier-Poodle Mix
Mixed
Russell Terrier-Miniature Pinscher Mix
Mixed
Bernese Mountain Dog
Bernese Mountain Dog
Bernese Mountain Dog
Yorkshire Terrier
Australian Cattle Dog- Mix
Mixed
Flat-Coated Retriever
Staffordshire Bull Terrier-Bulldog Mix
Mixed
Mixed
Kooikerhondje
German Shepherd Dog
Labrador Retriever-Golden Retriever Mix
German Shepherd Dog-Border Collie Mix
Mixed
Golden Retriever
Mixed
German Shepherd Dog
Mixed
Labradoodle
Shih Tzu
American Staffordshire Terrier
Bulldog
Mixed
Labrador Retriever
Beagle
Beagle
Cardigan Welsh Corgi-German Shepherd Dog Mix
German Shepherd Dog
Doberman Pinscher
Doberman Pinscher-German Shepherd Dog Mix
Mixed
Shetland Sheepdog
Canaan Dog- Mix
Beagle
Rhodesian Ridgeback
Miniature Schnauzer
Russell Terrier-Pug Mix
Mixed
Australian Cattle Dog-Border Collie Mix
Mixed
Shih Tzu
Shih Tzu
English Cocker Spaniel
Australian Shepherd
Parson Russell Terrier
Boxer
German Shepherd Dog
Chihuahua-Dachshund Mix
Australian Cattle Dog
Border Collie-Australian Shepherd Mix
Shih Tzu
Miniature Schnauzer
Australian Shepherd-Border Collie Mix
Mixed
Poodle
Labrador Retriever
Mixed
French Spaniel
Mixed
Italian Greyhound
Weimaraner
Leonberger
Poodle
German Shepherd Dog
Mixed
Portuguese Water Dog
Boykin Spaniel
Mixed
Labrador Retriever
Soft Coated Wheaten Terrier
Poodle
Mixed
Mixed
German Shepherd Dog
Bulldog
Golden Retriever
Boston Terrier
Shih Tzu
Brittany
Bernese Mountain Dog
Golden Retriever-Labrador Retriever Mix
Mixed
Golden Retriever-Labrador Retriever Mix
Mixed
Australian Cattle Dog
Labrador Retriever-Chinese Shar-Pei Mix
Labrador Retriever
Nova Scotia Duck Tolling Retriever
Nova Scotia Duck Tolling Retriever
Mixed
Labrador Retriever
Labrador Retriever
Australian Shepherd
Mixed
Siberian Husky
Mixed
Mixed
Lhasa Apso
Australian Shepherd
Mixed
Mixed
Mixed
Labrador Retriever
Mixed
American Eskimo Dog
Boxer
Keeshond
Bull Terrier
Soft Coated Wheaten Terrier
Boston Terrier
German Shepherd Dog
Poodle
German Shepherd Dog
Shih Tzu-Maltese Mix
Labrador Retriever
Chesapeake Bay Retriever
Mixed
Shih Tzu
Cocker Spaniel
Labrador Retriever
Maltese
Mixed
Cockapoo
Chihuahua-Miniature Pinscher Mix
German Shepherd Dog
Mixed
Shih Tzu
Shih Tzu
Australian Cattle Dog
Maltese-Poodle Mix
Maltese-Shih Tzu Mix
Mixed
Boxer
Dachshund-Miniature Pinscher Mix
Mixed
Miniature American Shepherd
Mixed
Mixed
Golden Retriever
Parson Russell Terrier
Border Collie
German Shepherd Dog
Labrador Retriever-Australian Cattle Dog Mix
Poodle
Shih Tzu
Shih Tzu
Mixed
Mixed
I Don't Know
Border Collie
Golden Retriever
Mixed
Mixed
Chihuahua
Australian Shepherd
Poodle
Affenpinscher
Keeshond
Chihuahua
English Setter
Miniature Schnauzer-Poodle Mix
Mixed
Havanese
German Shepherd Dog
Australian Shepherd
Mixed
Border Collie
Boxer
Labrador Retriever-Cane Corso Mix
Labrador Retriever
Miniature Schnauzer
Mixed
Mixed
Bichon Frise-Shih Tzu Mix
Pug
Pointer-Labrador Retriever Mix
Labrador Retriever-Basenji Mix
Poodle
Golden Retriever
Australian Cattle Dog
German Shepherd Dog
Mixed
Golden Doodle
Pembroke Welsh Corgi-Russell Terrier Mix
Mixed
Mixed
Labrador Retriever
Border Collie-Labrador Retriever Mix
Mixed
Bulldog-Beagle Mix
Border Collie
Bulldog
Mixed
Nova Scotia Duck Tolling Retriever
Mixed
Australian Shepherd
English Cocker Spaniel
Golden Retriever
Poodle-Maltese Mix
Great Dane-Labrador Retriever Mix
Mixed
Golden Retriever
Boxer
Bernese Mountain Dog
Havanese
Mixed
Boxer
Mixed
Brittany-Poodle Mix
Shih Tzu
Mixed
Scottish Terrier
Mixed
Mixed
Weimaraner
Shih Tzu
Weimaraner
Mixed
West Highland White Terrier
Labradoodle
Labrador Retriever
Mixed
Beagle
Labrador Retriever-Cardigan Welsh Corgi Mix
Mixed
Mixed
Mixed
Bearded Collie-Tibetan Terrier Mix
Shih Tzu
Australian Shepherd
Beagle
Beagle
German Shepherd Dog
I Don't Know
Mixed
Miniature American Shepherd
Great Dane
Golden Retriever
Golden Retriever-Labrador Retriever Mix
Labrador Retriever
Mixed
Icelandic Sheepdog
Golden Retriever
Dachshund
German Shorthaired Pointer-Labrador Retriever Mix
Miniature Schnauzer
Miniature Schnauzer
Pug-Chihuahua Mix
Cockapoo
Mixed
Weimaraner
Yorkshire Terrier-Bichon Frise Mix
Cockapoo
Border Collie
Golden Retriever
Cairn Terrier
Yorkshire Terrier- Mix
German Shepherd Dog
Belgian Sheepdog- Mix
Australian Cattle Dog-Border Collie Mix
Old English Sheepdog
West Highland White Terrier
Golden Retriever
Mixed
Mixed
Miniature Schnauzer
Mixed
Boston Terrier
Portuguese Water Dog
American Pit Bull Terrier-American Staffordshire Terrier Mix
American Pit Bull Terrier-American Staffordshire Terrier Mix
Labrador Retriever
Golden Retriever
Golden Retriever
Danish-Swedish Farmdog
Golden Retriever
Labrador Retriever
Mixed
Cardigan Welsh Corgi
Bernese Mountain Dog
Boxer
Boxer
Poodle
Mixed
Labradoodle
Mixed
Labradoodle
Mixed
Shetland Sheepdog
Shetland Sheepdog
Cavalier King Charles Spaniel
Beagle
Bichon Frise-Cavalier King Charles Spaniel Mix
Golden Doodle
Labrador Retriever
French Bulldog
Cockapoo
Rhodesian Ridgeback-Boxer Mix
Mixed
Poodle-Cavalier King Charles Spaniel Mix
Poodle-Cavalier King Charles Spaniel Mix
Mixed
Beagle-Australian Cattle Dog Mix
Mixed
Mixed
Shetland Sheepdog
Labrador Retriever-Rottweiler Mix
Pomeranian
Poodle
Mixed
Yorkshire Terrier
Shih Tzu
Mixed
Border Collie
Mixed
Dalmatian
Chow Chow-Golden Retriever Mix
German Shepherd Dog-Beagle Mix
Siberian Husky
Other
American Staffordshire Terrier-Labrador Retriever Mix
Yorkshire Terrier
Cockapoo
Affenpinscher
Maltese
Mixed
Border Terrier- Mix
Scottish Terrier
English Setter
Labrador Retriever
Mixed
Mixed
Other
Havanese
Havanese
Havanese
Border Terrier
Border Terrier
Border Terrier
Australian Shepherd
Bulldog
Coton de Tulear
Cardigan Welsh Corgi
Mixed
Mixed
Russell Terrier
Greyhound
American Water Spaniel
Mixed
German Shepherd Dog
Russell Terrier-Chihuahua Mix
Labrador Retriever
Poodle
Cavalier King Charles Spaniel
Mastiff
Cavalier King Charles Spaniel
Mixed
I Don't Know
Shiba Inu
Golden Retriever
Mixed
Mixed
Border Collie
Beagle
Mixed
German Shepherd Dog
Miniature Schnauzer
Labradoodle
Australian Shepherd-Golden Retriever Mix
German Shorthaired Pointer
German Shorthaired Pointer
German Shorthaired Pointer
German Shorthaired Pointer
Border Collie-Belgian Tervuren Mix
American Staffordshire Terrier-Australian Shepherd Mix
Pekingese-Dachshund Mix
German Shepherd Dog-Siberian Husky Mix
Golden Doodle
Rat Terrier
Chihuahua
Portuguese Water Dog
Rottweiler- Mix
Mixed
Labrador Retriever
Labrador Retriever
Labrador Retriever
Pug-Maltese Mix
Boston Terrier
Boston Terrier
Labradoodle
Border Collie
Cairn Terrier
Mixed
Shih Tzu
English Springer Spaniel
Mixed
Maltese
Golden Retriever
Mixed
Golden Retriever
Siberian Husky-Australian Shepherd Mix
Australian Cattle Dog
Chihuahua
Whippet
West Highland White Terrier
Boston Terrier-Bulldog Mix
Mixed
Mixed
Mixed
Mixed
Portuguese Water Dog
Border Collie-Dalmatian Mix
Other
Cocker Spaniel
Welsh Springer Spaniel
Poodle-Old English Sheepdog Mix
Mixed
Golden Retriever
Mixed
Mixed
German Shepherd Dog
Mixed
Labrador Retriever
Labrador Retriever
German Shepherd Dog
Labradoodle
Labrador Retriever
Mixed
Border Terrier
Golden Retriever
Golden Retriever
Staffordshire Bull Terrier
Australian Shepherd
Havanese
Labrador Retriever
Soft Coated Wheaten Terrier
Mixed
Golden Retriever
Mixed
Mixed
Poodle
Poodle
Shih Tzu
Border Collie
Border Collie-Staffordshire Bull Terrier Mix
Siberian Husky
German Shepherd Dog
Shih Tzu
Mixed
English Springer Spaniel
Australian Cattle Dog- Mix
Boxer
Other
I Don't Know
Golden Doodle
Bichon Frise-Poodle Mix
Mixed
Shetland Sheepdog
Shetland Sheepdog
Golden Retriever
Mixed
Labrador Retriever
German Shorthaired Pointer-Catahoula Leopard Dog Mix
Mixed
Alaskan Malamute-Collie Mix
Mixed
Doberman Pinscher
Collie
Boxer
Weimaraner
Boxer
Labrador Retriever
Poodle
Mixed
Coton de Tulear
Shih Tzu
Shih Tzu
Golden Retriever-Newfoundland Mix
Shih Tzu
Shetland Sheepdog
Mixed
Border Collie
Mixed
Border Collie
Shih Tzu
American Staffordshire Terrier
Shih Tzu
Mixed
Other
Greyhound
Cavalier King Charles Spaniel
Border Collie
Mixed
Mixed
Boxer-American Staffordshire Terrier Mix
Miniature Schnauzer
Cavalier King Charles Spaniel-Poodle Mix
Other
Shih Tzu
Boston Terrier
Poodle
German Shepherd Dog
Norfolk Terrier
Norfolk Terrier
Labrador Retriever
Labrador Retriever-Rottweiler Mix
Mixed
Nova Scotia Duck Tolling Retriever
German Shepherd Dog
Australian Shepherd
Greyhound
Staffordshire Bull Terrier-Great Dane Mix
Mixed
Wire Fox Terrier
Golden Retriever
French Spaniel
Samoyed
Coton de Tulear
Mixed
Labrador Retriever
Poodle-Maltese Mix
German Shepherd Dog
Golden Retriever
Border Collie
Maltese-Yorkshire Terrier Mix
Cockapoo
Collie
Labrador Retriever-German Shepherd Dog Mix
I Don't Know
Border Collie
Mixed
Pekingese-Shih Tzu Mix
Mixed
Siberian Husky
Shih Tzu
Rhodesian Ridgeback
Mixed
Dalmatian
Dalmatian
Labrador Retriever
Staffordshire Bull Terrier-Miniature Schnauzer Mix
Cocker Spaniel
Mixed
Greyhound
Shetland Sheepdog
Poodle
Puggle
Mixed
Cockapoo
Beagle- Mix
Rhodesian Ridgeback
Havanese
Golden Retriever
Mixed
Shih Tzu
Shih Tzu
Mixed
Shih Tzu
Havanese
Yorkshire Terrier
Dalmatian
Dalmatian
Belgian Malinois
Mixed
Rhodesian Ridgeback
Labradoodle
Australian Shepherd
Australian Shepherd
Dalmatian
Portuguese Water Dog
Australian Shepherd-Australian Cattle Dog Mix
I Don't Know
French Bulldog
Boston Terrier
Labrador Retriever
Mixed
Mixed
Pomeranian
Whippet
Golden Doodle
Border Collie
Border Collie
Border Collie
Poodle-Labrador Retriever Mix
Pug-Wire Fox Terrier Mix
Greyhound
Pug-Wire Fox Terrier Mix
Pug-Smooth Fox Terrier Mix
Pembroke Welsh Corgi
Australian Shepherd
German Shepherd Dog
Mixed
Shih Tzu
Wire Fox Terrier
Mixed
Labrador Retriever
Bearded Collie-Beagle Mix
Golden Retriever
Border Collie-Australian Cattle Dog Mix
Mixed
Great Dane
Brussels Griffon
Poodle
Irish Setter
Poodle
Labrador Retriever
Mixed
Puggle
Shih Tzu
Nova Scotia Duck Tolling Retriever
Poodle-Dachshund Mix
German Shepherd Dog
Golden Retriever
Mixed
Golden Retriever
Cocker Spaniel
Miniature Schnauzer
Shetland Sheepdog
Shetland Sheepdog
Mixed
Pug-Chihuahua Mix
Mixed
Coton de Tulear
Golden Retriever
Golden Retriever
Papillon
Mastiff
Shih Tzu
Polish Lowland Sheepdog
Miniature Pinscher
English Springer Spaniel
I Don't Know
American Eskimo Dog
German Shepherd Dog
Soft Coated Wheaten Terrier
Mixed
Chinese Crested-Poodle Mix
Labrador Retriever
Golden Retriever
American Pit Bull Terrier-Australian Cattle Dog Mix
Shih Tzu
Catahoula Leopard Dog-Rottweiler Mix
Shih Tzu
Belgian Tervuren
Belgian Tervuren
Greyhound
Nova Scotia Duck Tolling Retriever
Mixed
Australian Shepherd-Border Collie Mix
Bernese Mountain Dog
West Highland White Terrier
Poodle-Shih Tzu Mix
Boxer-Border Collie Mix
Shih Tzu
Other
Cavalier King Charles Spaniel
Mixed
Border Collie
I Don't Know
Golden Retriever
Great Dane-Irish Wolfhound Mix
Pyrenean Shepherd
Pyrenean Shepherd
Labrador Retriever
Golden Retriever
German Shepherd Dog
Other
Chinese Shar-Pei
Mastiff
French Bulldog
Australian Shepherd
Australian Shepherd
Australian Shepherd
Dachshund
I Don't Know
Golden Retriever
Maltese
Mixed
Mixed
Cavalier King Charles Spaniel
Labradoodle
Labradoodle
Border Collie
Lhasa Apso
Greyhound
Mixed
Shih Tzu
Dogue de Bordeaux-Boxer Mix
Golden Doodle
Belgian Malinois
Greyhound
Belgian Malinois
Akita
Golden Retriever
Mixed
Labrador Retriever
Golden Retriever
Golden Retriever
Australian Shepherd
Bulldog
Golden Doodle
American Staffordshire Terrier-Catahoula Leopard Dog Mix
Mixed
Golden Retriever
German Shepherd Dog
Scottish Terrier
Bulldog
Shih Tzu-Pekingese Mix
Labrador Retriever
Rat Terrier
Border Terrier
Poodle
Labrador Retriever
Boxer
Beagle-Labrador Retriever Mix
Australian Shepherd
Golden Retriever
Dachshund-Beagle Mix
Silky Terrier
Shetland Sheepdog
Mixed
Golden Doodle
Scottish Terrier
-German Shepherd Dog Mix
Boxer
Labrador Retriever
Keeshond
Mixed
Mixed
German Shorthaired Pointer
Coton de Tulear
Golden Retriever
Mixed
Belgian Malinois
Chihuahua
Poodle
American Pit Bull Terrier
Bernese Mountain Dog
Golden Retriever
Mixed
Mixed
German Shepherd Dog-Chow Chow Mix
Labrador Retriever-Golden Retriever Mix
Weimaraner
Mixed
German Shepherd Dog
Border Collie-Australian Shepherd Mix
Labrador Retriever- Mix
Border Collie
Labrador Retriever
Mixed
Beagle-Australian Cattle Dog Mix
Mixed
Pembroke Welsh Corgi-Great Pyrenees Mix
Mixed
Golden Retriever
Golden Retriever
Mixed
German Shepherd Dog
Australian Shepherd-Pembroke Welsh Corgi Mix
Mixed
German Shepherd Dog
Labrador Retriever
Cavalier King Charles Spaniel
Australian Shepherd
French Bulldog
Labrador Retriever
Golden Doodle
Chesapeake Bay Retriever
Poodle
Golden Doodle
Labrador Retriever
Maltese-Poodle Mix
Labrador Retriever
Labradoodle
Cavalier King Charles Spaniel
Mixed
Mixed
Dachshund
Havanese
Mixed
Labrador Retriever
Boxer
Mixed
Boxer
German Shepherd Dog
Border Collie
Mixed
Australian Shepherd
Afghan Hound-Golden Retriever Mix
Mixed
Dachshund
Shorkie
Field Spaniel
Mixed
Cavalier King Charles Spaniel
Poodle
Chihuahua-American Staffordshire Terrier Mix
Newfoundland
Mixed
Yorkshire Terrier
Australian Cattle Dog
Other
English Cocker Spaniel
Boxer
Portuguese Water Dog
Labradoodle
Mixed
Pug
Bluetick Coonhound
Pug-Miniature Pinscher Mix
Shetland Sheepdog
Whippet-Chinese Shar-Pei Mix
German Shepherd Dog
German Shepherd Dog
Boxer-Bulldog Mix
Portuguese Water Dog
Other
Portuguese Water Dog
Portuguese Water Dog
Portuguese Water Dog
Golden Retriever
Labrador Retriever
Labradoodle
Papillon
Poodle
Mixed
Poodle
Portuguese Water Dog
Vizsla- Mix
Boxer
Chow Chow-Labrador Retriever Mix
Rat Terrier
Rhodesian Ridgeback-Boxer Mix
Labrador Retriever
Mixed
Border Collie-Whippet Mix
Mixed
Russell Terrier
Other
Belgian Malinois
German Shepherd Dog
Border Collie-Pembroke Welsh Corgi Mix
Mixed
Pug
35050 rows, truncated to displaylimit of 1000

When you do so, you will see a line at the top of the output panel that says "35050 rows affected". This means that there are 35050 rows of data in the dogs table. Each row of the output lists the name of the breed of the dog represented by that entry. Notice that some breed names are listed multiple times, because several dogs of that breed have participated in the Dognition tests.

If you scroll all the way down to the bottom of the output, you will see a notification that says "35050 rows, truncated to displaylimit of 1000." We have set up a display limit of 1000 rows in these notebooks to ensure that our database servers are not overloaded, and to reduce the amount of time you have to wait for the query output. However, in a actual scenario these limits would not necessarily be in place for you. Therefore, before we go any further, I want to show you how you could restrict the number of rows outputted by a query.

Using LIMIT to restrict the number of rows in your output (and prevent system crashes)

The MySQL clause you should use is called LIMIT, and it is always placed at the very end of your query. The simplest version of a limit statement looks like this:

SELECT breed
FROM dogs LIMIT 5;

The "5" in this case indicates that you will only see the first 5 rows of data you select.

Question 7: In the next cell, try entering a query that will let you see the first 10 rows of the breed column in the dogs table.


In [19]:
%%sql
SELECT breed
FROM dogs 
LIMIT 10;


10 rows affected.
Out[19]:
breed
Labrador Retriever
Shetland Sheepdog
Golden Retriever
Golden Retriever
Shih Tzu
Siberian Husky
Shih Tzu
Mixed
Labrador Retriever
Shih Tzu-Poodle Mix

You can also select rows of data from different parts of the output table, rather than always just starting at the beginning. To do this, use the OFFSET clause after LIMIT. The number after the OFFSET clause indicates from which row the output will begin querying. Note that the offset of Row 1 of a table is actually 0. Therefore, in the following query:

SELECT breed
FROM dogs LIMIT 10 OFFSET 5;

10 rows of data will be returned, starting at Row 6.

An alternative way to write the OFFSET clause in the query is:

SELECT breed
FROM dogs LIMIT 5, 10;

In this notation, the offset is the number before the comma, and the number of rows returned is the number after the comma.

Try it yourself:


In [21]:
%%sql 
SELECT breed
FROM dogs LIMIT 10 OFFSET 5;


10 rows affected.
Out[21]:
breed
Siberian Husky
Shih Tzu
Mixed
Labrador Retriever
Shih Tzu-Poodle Mix
German Shepherd Dog-Pembroke Welsh Corgi Mix
Vizsla
Pug
Boxer
German Shepherd Dog-Nova Scotia Duck Tolling Retriever Mix

In [23]:
%%sql 
SELECT breed
FROM dogs LIMIT 5, 10;


10 rows affected.
Out[23]:
breed
Siberian Husky
Shih Tzu
Mixed
Labrador Retriever
Shih Tzu-Poodle Mix
German Shepherd Dog-Pembroke Welsh Corgi Mix
Vizsla
Pug
Boxer
German Shepherd Dog-Nova Scotia Duck Tolling Retriever Mix

The LIMIT command is one of the pieces of syntax that can vary across database platforms. MySQL uses LIMIT to restrict the output, but other databases including Teradata use a statement called "TOP" instead. Oracle has yet another syntax:

http://www.tutorialspoint.com/sql/sql-top-clause.htm

Make sure to look up the correct syntax for the database type you are using.

Using SELECT to query multiple columns

Now that we know how to limit our output, we are ready to make our SELECT statement work a little harder. The SELECT statement can be used to select multiple columns as well as a single column. The output of the query will depend on the order of the columns you enter after the SELECT statement in your query. When you enter column names, separate each name with a comma, but do NOT include a comma after the last column name.

Try the following query with different orders of the column names to observe the differences in output (I will include a LIMIT statement in many of the examples I use in the rest of the course, but feel free to change or remove them to explore different aspects of the data):

SELECT breed, breed_type, breed_group
FROM dogs LIMIT 5, 10;

In [24]:
%%sql
SELECT breed, breed_type, breed_group
FROM dogs LIMIT 0, 5;


5 rows affected.
Out[24]:
breed breed_type breed_group
Labrador Retriever Pure Breed Sporting
Shetland Sheepdog Pure Breed Herding
Golden Retriever Pure Breed Sporting
Golden Retriever Pure Breed Sporting
Shih Tzu Pure Breed Toy

Another trick to know about when using SELECT is that you can use an asterisk as a "wild card" to return all the data in a table. (A wild card is defined as a character that will represent or match any character or sequence of characters in a query.) Take note, this is very risky to do if you do not limit your output or if you don't know how many data are in your database, so use the wild card with caution. However, it is a handy tool to use when you don't have all the column names easily available or when you know you want to query an entire table.

The syntax is as follows:

SELECT *
FROM dogs LIMIT 5, 10;

Question 8: Try using the wild card to query the reviews table:


In [26]:
%%sql
SELECT *
FROM dogs LIMIT 0, 5;


5 rows affected.
Out[26]:
gender birthday breed weight dog_fixed dna_tested created_at updated_at dimension exclude breed_type breed_group dog_guid user_guid total_tests_completed mean_iti_days mean_iti_minutes median_iti_days median_iti_minutes time_diff_between_first_and_last_game_days time_diff_between_first_and_last_game_minutes
female 2011 Labrador Retriever 50 1 0 2013-02-05 03:57:19 2013-07-25 19:41:49 charmer None Pure Breed Sporting fd27b272-7144-11e5-ba71-058fbc01cf0b ce134e42-7144-11e5-ba71-058fbc01cf0b 21 28.20107581 40609.549167 0.004687499952 6.7499999308 564.0215162 812190.98333
male 2007 Shetland Sheepdog 20 1 0 2013-02-05 15:35:09 2013-07-25 19:41:49 protodog None Pure Breed Herding fd27b5ba-7144-11e5-ba71-058fbc01cf0b ce1353d8-7144-11e5-ba71-058fbc01cf0b 20 0.41330713937 595.1622807 0.0058449074865 8.4166667806 7.8528356481 11308.083333
male 2012 Golden Retriever 70 0 1 2013-02-05 17:22:56 2013-07-25 19:41:49 None None Pure Breed Sporting fd27b6b4-7144-11e5-ba71-058fbc01cf0b ce135ab8-7144-11e5-ba71-058fbc01cf0b 2 0.0091319445066 13.15000009 0.0091319445066 13.15000009 0.0091319445066 13.15000009
male 2011 Golden Retriever 70 0 0 2013-02-05 17:40:57 2014-05-30 15:52:53 None None Pure Breed Sporting fd27b79a-7144-11e5-ba71-058fbc01cf0b ce13507c-7144-11e5-ba71-058fbc01cf0b 11 56.115179398 80805.858333 0.0044328703647 6.3833333252 561.15179398 808058.58333
female 2010 Shih Tzu 190 0 0 2013-02-05 18:12:28 2014-05-30 15:52:54 einstein 1 Pure Breed Toy fd27b86c-7144-11e5-ba71-058fbc01cf0b ce135e14-7144-11e5-ba71-058fbc01cf0b 31 7.7342542438 11137.326111 0.00039930554425 0.57499998372 232.02762731 334119.78333

NOTE: if you do this for the dogs table, your output will be too wide to see at one time in the dialog box. Use the scroll bars at bottom and to the right of the dialog box to see the entire output table.

SELECT statements can also be used to make new derivations of individual columns using "+" for addition, "-" for subtraction, "*" for multiplication, or "/" for division. For example, if you wanted the median inter-test intervals in hours instead of minutes or days, you could query:

SELECT median_iti_minutes / 60
FROM dogs LIMIT 5, 10;

Question 9: Go ahead and try it, adding in a column to your output that shows you the original median_iti in minutes.


In [28]:
%%sql
SELECT median_iti_minutes / 60
FROM dogs LIMIT 0, 5;


5 rows affected.
Out[28]:
median_iti_minutes / 60
0.11249999884666666
0.14027777967666666
0.2191666681666667
0.10638888875333333
0.009583333062

Now it's time to practice writing your own SELECT statements.

Question 10: How would you retrieve the first 15 rows of data from the dog_guid, subcategory_name, and test_name fields of the Reviews table, in that order?


In [31]:
%%sql
SELECT dog_guid, subcategory_name, test_name
FROM reviews
Limit 0, 15;


15 rows affected.
Out[31]:
dog_guid subcategory_name test_name
ce3ac77e-7144-11e5-ba71-058fbc01cf0b Empathy Yawn Warm-up
ce2aedcc-7144-11e5-ba71-058fbc01cf0b Empathy Eye Contact Warm-up
ce2aedcc-7144-11e5-ba71-058fbc01cf0b Empathy Eye Contact Game
ce2aedcc-7144-11e5-ba71-058fbc01cf0b Communication Treat Warm-up
ce405c52-7144-11e5-ba71-058fbc01cf0b Empathy Yawn Warm-up
ce405c52-7144-11e5-ba71-058fbc01cf0b Empathy Yawn Game
ce405c52-7144-11e5-ba71-058fbc01cf0b Empathy Eye Contact Game
ce405e28-7144-11e5-ba71-058fbc01cf0b Communication Treat Warm-up
ce405e28-7144-11e5-ba71-058fbc01cf0b Cunning Turn Your Back
ce2609c4-7144-11e5-ba71-058fbc01cf0b Communication Treat Warm-up
ce2609c4-7144-11e5-ba71-058fbc01cf0b Communication Arm Pointing
ce2609c4-7144-11e5-ba71-058fbc01cf0b Communication Foot Pointing
ce260c1c-7144-11e5-ba71-058fbc01cf0b Shaker Game Shaker Warm-Up
ce3fd48a-7144-11e5-ba71-058fbc01cf0b Empathy Yawn Game
ce3fd48a-7144-11e5-ba71-058fbc01cf0b Empathy Eye Contact Warm-up

Question 11: How would you retrieve 10 rows of data from the activity_type, created_at, and updated_at fields of the site_activities table, starting at row 50? What do you notice about the created_at and updated_at fields?


In [33]:
%%sql
SELECT activity_type, created_at, updated_at 
FROM site_activities
Limit 49, 10;


10 rows affected.
Out[33]:
activity_type created_at updated_at
point_in_cat 2013-07-25 20:55:08 2013-07-25 20:55:08
point_in_cat 2013-07-25 20:55:07 2013-07-25 20:55:07
point_in_cat 2013-07-25 20:55:50 2013-07-25 20:55:50
point_in_cat 2013-07-25 20:56:09 2013-07-25 20:56:09
point_in_cat 2013-07-25 20:56:21 2013-07-25 20:56:21
point_in_cat 2013-07-25 21:00:31 2013-07-25 21:00:31
point_in_cat 2013-07-25 21:02:29 2013-07-25 21:02:29
point_in_cat 2013-07-25 21:02:31 2013-07-25 21:02:31
point_in_cat 2013-07-25 21:02:45 2013-07-25 21:02:45
point_in_cat 2013-07-25 21:05:41 2013-07-25 21:05:41

Question 12: How would you retrieve 20 rows of data from all the columns in the users table, starting from row 2000? What do you notice about the free_start_user field?


In [ ]:

You have already learned how to see all the data in your database! Congratulations!

Feel free to practice any other queries you are interested in below:


In [ ]: