Copyright Jana Schaich Borg/Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

MySQL Exercise 10: Useful Logical Operators

There are a few more logical operators we haven't covered yet that you might find useful when designing your queries. Expressions that use logical operators return a result of "true" or "false", depending on whether the conditions you specify are met. The "true" or "false" results are usually used to determine which, if any, subsequent parts of your query will be run. We will discuss the IF operator, the CASE operator, and the order of operations within logical expressions in this lesson.

Begin by loading the sql library and database, and making the Dognition database your default database:


In [8]:
%load_ext sql
%sql mysql://studentuser:studentpw@mysqlserver/dognitiondb
%sql USE dognitiondb

%config SqlMagic.displaylimit=25


The sql extension is already loaded. To reload it, use:
  %reload_ext sql
0 rows affected.

1. IF expressions

IF expressions are used to return one of two results based on whether inputs to the expressions meet the conditions you specify. They are frequently used in SELECT statements as a compact way to rename values in a column. The basic syntax is as follows:

IF([your conditions],[value outputted if conditions are met],[value outputted if conditions are NOT met])

So we could write:

SELECT created_at, IF(created_at<'2014-06-01','early_user','late_user') AS user_type
FROM users

to output one column that provided the time stamp of when a user account was created, and a second column called user_type that used that time stamp to determine whether the user was an early or late user. User_type could then be used in a GROUP BY statement to segment summary calculations (in database systems that support the use of aliases in GROUP BY statements).

For example, since we know there are duplicate user_guids in the user table, we could combine a subquery with an IF statement to retrieve a list of unique user_guids with their classification as either an early or late user (based on when their first user entry was created):

SELECT cleaned_users.user_guid as UserID,
       IF(cleaned_users.first_account<'2014-06-01','early_user','late_user') AS user_type
FROM (SELECT user_guid, MIN(created_at) AS first_account 
      FROM users
      GROUP BY user_guid) AS cleaned_users

We could then use a GROUP BY statement to count the number of unique early or late users:

SELECT IF(cleaned_users.first_account<'2014-06-01','early_user','late_user') AS user_type,
       COUNT(cleaned_users.first_account)
FROM (SELECT user_guid, MIN(created_at) AS first_account 
      FROM users
      GROUP BY user_guid) AS cleaned_users
GROUP BY user_type

Try it yourself:


In [3]:
%%sql
SELECT IF(cleaned_users.first_account<'2014-06-01','early_user','late_user') AS user_type,
       COUNT(cleaned_users.first_account)
FROM (SELECT user_guid, MIN(created_at) AS first_account 
      FROM users
      GROUP BY user_guid) AS cleaned_users
GROUP BY user_type


2 rows affected.
Out[3]:
user_type COUNT(cleaned_users.first_account)
early_user 14470
late_user 18723

Question 1: Write a query that will output distinct user_guids and their associated country of residence from the users table, excluding any user_guids that have NULL values. You should get 16,261 rows in your result.


In [7]:
%%sql
SELECT DISTINCT u.user_guid, u.country
FROM users u
WHERE u.user_guid IS NOT NULL AND u.country IS NOT NULL


16261 rows affected.
Out[7]:
user_guid country
ce134e42-7144-11e5-ba71-058fbc01cf0b US
ce1353d8-7144-11e5-ba71-058fbc01cf0b US
ce135ab8-7144-11e5-ba71-058fbc01cf0b US
ce13507c-7144-11e5-ba71-058fbc01cf0b US
ce135e14-7144-11e5-ba71-058fbc01cf0b US
ce13615c-7144-11e5-ba71-058fbc01cf0b US
ce135f2c-7144-11e5-ba71-058fbc01cf0b US
ce136a1c-7144-11e5-ba71-058fbc01cf0b US
ce136ac6-7144-11e5-ba71-058fbc01cf0b US
ce136c24-7144-11e5-ba71-058fbc01cf0b US
ce136e36-7144-11e5-ba71-058fbc01cf0b US
ce136ee0-7144-11e5-ba71-058fbc01cf0b US
ce136f94-7144-11e5-ba71-058fbc01cf0b US
ce134be0-7144-11e5-ba71-058fbc01cf0b US
ce1371a6-7144-11e5-ba71-058fbc01cf0b US
ce1373ae-7144-11e5-ba71-058fbc01cf0b US
ce13750c-7144-11e5-ba71-058fbc01cf0b US
ce1375b6-7144-11e5-ba71-058fbc01cf0b US
ce1377b4-7144-11e5-ba71-058fbc01cf0b US
ce137700-7144-11e5-ba71-058fbc01cf0b US
ce137868-7144-11e5-ba71-058fbc01cf0b US
ce137912-7144-11e5-ba71-058fbc01cf0b US
ce1379c6-7144-11e5-ba71-058fbc01cf0b US
ce137a7a-7144-11e5-ba71-058fbc01cf0b FR
ce137034-7144-11e5-ba71-058fbc01cf0b US
16261 rows, truncated to displaylimit of 25

Question 2: Use an IF expression and the query you wrote in Question 1 as a subquery to determine the number of unique user_guids who reside in the United States (abbreviated "US") and outside of the US.


In [8]:
%%sql
SELECT IF(cleanU.country='US', 'in US', 'out US') AS location, COUNT(cleanU.user_guid)
FROM (SELECT DISTINCT u.user_guid, u.country
        FROM users u
        WHERE u.user_guid IS NOT NULL AND u.country IS NOT NULL) AS cleanU
GROUP BY location


2 rows affected.
Out[8]:
location COUNT(cleanU.user_guid)
in US 9356
out US 6905

In [9]:
%%sql
SELECT IF(cleaned_users.country='US','In US','Outside US') AS US_user,
count(cleaned_users.user_guid)
FROM (SELECT DISTINCT user_guid, country
FROM users
WHERE country IS NOT NULL) AS cleaned_users
GROUP BY US_user;


2 rows affected.
Out[9]:
US_user count(cleaned_users.user_guid)
In US 9356
Outside US 6905

Single IF expressions can only result in one of two specified outputs, but multiple IF expressions can be nested to result in more than two possible outputs. When you nest IF expressions, it is important to encase each IF expression--as well as the entire IF expression put together--in parentheses.

For example, if you examine the entries contained in the non-US countries category, you will see that many users are associated with a country called "N/A." "N/A" is an abbreviation for "Not Applicable"; it is not a real country name. We should separate these entries from the "Outside of the US" category we made earlier. We could use a nested query to say whenever "country" does not equal "US", use the results of a second IF expression to determine whether the outputed value should be "Not Applicable" or "Outside US." The IF expression would look like this:

IF(cleaned_users.country='US','In US', IF(cleaned_users.country='N/A','Not Applicable','Outside US'))

Since the second IF expression is in the position within the IF expression where you specify "value outputted if conditions are not met," its two possible outputs will only be considered if cleaned_users.country='US' is evaluated as false.

The full query to output the number of unique users in each of the three groups would be:

SELECT IF(cleaned_users.country='US','In US', 
          IF(cleaned_users.country='N/A','Not Applicable','Outside US')) AS US_user, 
      count(cleaned_users.user_guid)   
FROM (SELECT DISTINCT user_guid, country 
      FROM users
      WHERE country IS NOT NULL) AS cleaned_users
GROUP BY US_user

Try it yourself. You should get 5,642 unique user_guids in the "Not Applicable" category, and 1,263 users in the "Outside US" category.


In [10]:
%%sql
SELECT IF(cleaned_users.country='US','In US', 
          IF(cleaned_users.country='N/A','Not Applicable','Outside US')) AS US_user, 
      count(cleaned_users.user_guid)   
FROM (SELECT DISTINCT user_guid, country 
      FROM users
      WHERE country IS NOT NULL) AS cleaned_users
GROUP BY US_user


3 rows affected.
Out[10]:
US_user count(cleaned_users.user_guid)
In US 9356
Not Applicable 5642
Outside US 1263

The IF function is not supported by all database platforms, and some spell the function as IIF rather than IF, so be sure to double-check how the function works in the platform you are using.

If nested IF expressions seem confusing or hard to read, don't worry, there is a better function available for situations when you want to use conditional logic to output more than two groups. That function is called CASE.

2. CASE expressions

The main purpose of CASE expressions is to return a singular value based on one or more conditional tests. You can think of CASE expressions as an efficient way to write a set of IF and ELSEIF statements. There are two viable syntaxes for CASE expressions. If you need to manipulate values in a current column of your data, you would use this syntax:

Using this syntax, our nested IF statement from above could be written as:

SELECT CASE WHEN cleaned_users.country="US" THEN "In US"
            WHEN cleaned_users.country="N/A" THEN "Not Applicable"
            ELSE "Outside US"
            END AS US_user, 
      count(cleaned_users.user_guid)   
FROM (SELECT DISTINCT user_guid, country 
      FROM users
      WHERE country IS NOT NULL) AS cleaned_users
GROUP BY US_user

Go ahead and try it:


In [11]:
%%sql
SELECT CASE WHEN cleaned_users.country="US" THEN "In US"
            WHEN cleaned_users.country="N/A" THEN "Not Applicable"
            ELSE "Outside US"
            END AS US_user, 
      COUNT(cleaned_users.user_guid)   
FROM (SELECT DISTINCT user_guid, country 
      FROM users
      WHERE country IS NOT NULL) AS cleaned_users
GROUP BY US_user


3 rows affected.
Out[11]:
US_user count(cleaned_users.user_guid)
In US 9356
Not Applicable 5642
Outside US 1263

Since our query does not require manipulation of any of the values in the country column, though, we could also take advantage of this syntax, which is slightly more compact:

Our query written in this syntax would look like this:

SELECT CASE cleaned_users.country
            WHEN "US" THEN "In US"
            WHEN "N/A" THEN "Not Applicable"
            ELSE "Outside US"
            END AS US_user, 
      count(cleaned_users.user_guid)   
FROM (SELECT DISTINCT user_guid, country 
      FROM users
      WHERE country IS NOT NULL) AS cleaned_users
GROUP BY US_user

Try this query as well:


In [12]:
%%sql
SELECT CASE cleaned_users.country
            WHEN "US" THEN "In US"
            WHEN "N/A" THEN "Not Applicable"
            ELSE "Outside US"
            END AS US_user, 
      count(cleaned_users.user_guid)   
FROM (SELECT DISTINCT user_guid, country 
      FROM users
      WHERE country IS NOT NULL) AS cleaned_users
GROUP BY US_user


3 rows affected.
Out[12]:
US_user count(cleaned_users.user_guid)
In US 9356
Not Applicable 5642
Outside US 1263

There are a couple of things to know about CASE expressions:

  • Make sure to include the word END at the end of the expression
  • CASE expressions do not require parentheses
  • ELSE expressions are optional
  • If an ELSE expression is omitted, NULL values will be outputted for all rows that do not meet any of the conditions stated explicitly in the expression
  • CASE expressions can be used anywhere in a SQL statement, including in GROUP BY, HAVING, and ORDER BY clauses or the SELECT column list.

You will find that CASE statements are useful in many contexts. For example, they can be used to rename or revise values in a column.

Question 3: Write a query using a CASE statement that outputs 3 columns: dog_guid, dog_fixed, and a third column that reads "neutered" every time there is a 1 in the "dog_fixed" column of dogs, "not neutered" every time there is a value of 0 in the "dog_fixed" column of dogs, and "NULL" every time there is a value of anything else in the "dog_fixed" column. Limit your results for troubleshooting purposes.


In [16]:
%%sql
SELECT d.dog_guid, d.dog_fixed, 
        CASE
            WHEN d.dog_fixed=1 THEN 'neutered'
            WHEN d.dog_fixed=0 THEN 'not neutered'
            ELSE NULL
        END AS label
FROM dogs d
LIMIT 10


10 rows affected.
Out[16]:
dog_guid dog_fixed label
fd27b272-7144-11e5-ba71-058fbc01cf0b 1 neutered
fd27b5ba-7144-11e5-ba71-058fbc01cf0b 1 neutered
fd27b6b4-7144-11e5-ba71-058fbc01cf0b 0 not neutered
fd27b79a-7144-11e5-ba71-058fbc01cf0b 0 not neutered
fd27b86c-7144-11e5-ba71-058fbc01cf0b 0 not neutered
fd27b948-7144-11e5-ba71-058fbc01cf0b 1 neutered
fd27ba1a-7144-11e5-ba71-058fbc01cf0b 1 neutered
fd27bbbe-7144-11e5-ba71-058fbc01cf0b 1 neutered
fd27c1c2-7144-11e5-ba71-058fbc01cf0b 1 neutered
fd27c5be-7144-11e5-ba71-058fbc01cf0b 1 neutered

You can also use CASE statements to standardize or combine several values into one.

Question 4: We learned that NULL values should be treated the same as "0" values in the exclude columns of the dogs and users tables. Write a query using a CASE statement that outputs 3 columns: dog_guid, exclude, and a third column that reads "exclude" every time there is a 1 in the "exclude" column of dogs and "keep" every time there is any other value in the exclude column. Limit your results for troubleshooting purposes.


In [20]:
%%sql
SELECT d.dog_guid, d.exclude, 
        CASE d.exclude
            WHEN 1 THEN 'exclude'
            ELSE 'keep'
        END AS label
FROM dogs d
LIMIT 10


10 rows affected.
Out[20]:
dog_guid exclude label
fd27b272-7144-11e5-ba71-058fbc01cf0b None keep
fd27b5ba-7144-11e5-ba71-058fbc01cf0b None keep
fd27b6b4-7144-11e5-ba71-058fbc01cf0b None keep
fd27b79a-7144-11e5-ba71-058fbc01cf0b None keep
fd27b86c-7144-11e5-ba71-058fbc01cf0b 1 exclude
fd27b948-7144-11e5-ba71-058fbc01cf0b None keep
fd27ba1a-7144-11e5-ba71-058fbc01cf0b 1 exclude
fd27bbbe-7144-11e5-ba71-058fbc01cf0b None keep
fd27c1c2-7144-11e5-ba71-058fbc01cf0b None keep
fd27c5be-7144-11e5-ba71-058fbc01cf0b None keep

Question 5: Re-write your query from Question 4 using an IF statement instead of a CASE statement.


In [21]:
%%sql
SELECT d.dog_guid, d.exclude, 
        IF(d.exclude=1, 'exclude', 'keep') AS label
FROM dogs d
LIMIT 10


10 rows affected.
Out[21]:
dog_guid exclude label
fd27b272-7144-11e5-ba71-058fbc01cf0b None keep
fd27b5ba-7144-11e5-ba71-058fbc01cf0b None keep
fd27b6b4-7144-11e5-ba71-058fbc01cf0b None keep
fd27b79a-7144-11e5-ba71-058fbc01cf0b None keep
fd27b86c-7144-11e5-ba71-058fbc01cf0b 1 exclude
fd27b948-7144-11e5-ba71-058fbc01cf0b None keep
fd27ba1a-7144-11e5-ba71-058fbc01cf0b 1 exclude
fd27bbbe-7144-11e5-ba71-058fbc01cf0b None keep
fd27c1c2-7144-11e5-ba71-058fbc01cf0b None keep
fd27c5be-7144-11e5-ba71-058fbc01cf0b None keep

Case expressions are also useful for breaking values in a column up into multiple groups that meet specific criteria or that have specific ranges of values.

Question 6: Write a query that uses a CASE expression to output 3 columns: dog_guid, weight, and a third column that reads...
"very small" when a dog's weight is 1-10 pounds
"small" when a dog's weight is greater than 10 pounds to 30 pounds
"medium" when a dog's weight is greater than 30 pounds to 50 pounds
"large" when a dog's weight is greater than 50 pounds to 85 pounds
"very large" when a dog's weight is greater than 85 pounds
Limit your results for troubleshooting purposes.


In [25]:
%%sql
SELECT d.dog_guid, d.weight,
    CASE 
        WHEN d.weight >=1 and d.weight <=10 THEN 'very small'
        WHEN d.weight >10 and d.weight <=30 THEN 'small'
        WHEN d.weight >30 and d.weight <=50 THEN 'medium'
        WHEN d.weight >50 and d.weight <=85 THEN 'large'
        WHEN d.weight >85 THEN 'very large'
    ELSE 'just weird'
    END AS label
FROM dogs d
LIMIT 10


10 rows affected.
Out[25]:
dog_guid weight label
fd27b272-7144-11e5-ba71-058fbc01cf0b 50 medium
fd27b5ba-7144-11e5-ba71-058fbc01cf0b 20 small
fd27b6b4-7144-11e5-ba71-058fbc01cf0b 70 large
fd27b79a-7144-11e5-ba71-058fbc01cf0b 70 large
fd27b86c-7144-11e5-ba71-058fbc01cf0b 190 very large
fd27b948-7144-11e5-ba71-058fbc01cf0b 60 large
fd27ba1a-7144-11e5-ba71-058fbc01cf0b 190 very large
fd27bbbe-7144-11e5-ba71-058fbc01cf0b 50 medium
fd27c1c2-7144-11e5-ba71-058fbc01cf0b 70 large
fd27c5be-7144-11e5-ba71-058fbc01cf0b 0 just weird

3. Pay attention to the order of operations within logical expressions

As you started to see with the query you wrote in Question 6, CASE expressions often end up needing multiple AND and OR operators to accurately describe the logical conditions you want to impose on the groups in your queries. You must pay attention to the order in which these operators are included in your logical expressions, because unless parentheses are included, the NOT operator is always evaluated before an AND operator, and an AND operator is always evaluated before the OR operator.

When parentheses are included, the expressions within the parenthese are evaluated first. That means this expression:

CASE WHEN "condition 1" OR "condition 2" AND "condition 3"...

will lead to different results than this expression:

CASE WHEN "condition 3" AND "condition 1" OR "condition 2"...

or this expression:

CASE WHEN ("condition 1" OR "condition 2") AND "condition 3"...

In the first case you will get rows that meet condition 2 and 3, or condition 1. In the second case you will get rows that meet condition 1 and 3, or condition 2. In the third case, you will get rows that meet condition 1 or 2, and condition 3.

Let's see a concrete example of how the order in which logical operators are evaluated affects query results.

Question 7: How many distinct dog_guids are found in group 1 using this query?

SELECT COUNT(DISTINCT dog_guid), 
CASE WHEN breed_group='Sporting' OR breed_group='Herding' AND exclude!='1' THEN "group 1"
     ELSE "everything else"
     END AS groups
FROM dogs
GROUP BY groups

In [30]:
%%sql
SELECT COUNT(DISTINCT dog_guid), 
CASE WHEN breed_group='Sporting' OR breed_group='Herding' AND exclude!='1' THEN "group 1"
     ELSE "everything else"
     END AS groups
FROM dogs
GROUP BY groups


2 rows affected.
Out[30]:
COUNT(DISTINCT dog_guid) groups
30179 everything else
4871 group 1

Question 8: How many distinct dog_guids are found in group 1 using this query?

SELECT COUNT(DISTINCT dog_guid), 
CASE WHEN exclude!='1' AND breed_group='Sporting' OR breed_group='Herding' THEN "group 1"
     ELSE "everything else"
     END AS group_name
FROM dogs
GROUP BY group_name

In [27]:
%%sql
SELECT COUNT(DISTINCT dog_guid), 
CASE WHEN exclude!='1' AND breed_group='Sporting' OR breed_group='Herding' THEN "group 1"
     ELSE "everything else"
     END AS group_name
FROM dogs
GROUP BY group_name


2 rows affected.
Out[27]:
COUNT(DISTINCT dog_guid) group_name
31589 everything else
3461 group 1

Question 9: How many distinct dog_guids are found in group 1 using this query?

SELECT COUNT(DISTINCT dog_guid), 
CASE WHEN exclude!='1' AND (breed_group='Sporting' OR breed_group='Herding') THEN "group 1"
     ELSE "everything else"
     END AS group_name
FROM dogs
GROUP BY group_name

In [28]:
%%sql
SELECT COUNT(DISTINCT dog_guid), 
CASE WHEN exclude!='1' AND (breed_group='Sporting' OR breed_group='Herding') THEN "group 1"
     ELSE "everything else"
     END AS group_name
FROM dogs
GROUP BY group_name


2 rows affected.
Out[28]:
COUNT(DISTINCT dog_guid) group_name
35004 everything else
46 group 1

**So make sure you always pay attention to the order in which your logical operators are listed in your expressions, and whenever possible, include parentheses to ensure that the expressions are evaluated in the way you intend!**

Let's practice some more IF and CASE statements

Question 10: For each dog_guid, output its dog_guid, breed_type, number of completed tests, and use an IF statement to include an extra column that reads "Pure_Breed" whenever breed_type equals 'Pure Breed" and "Not_Pure_Breed" whenever breed_type equals anything else. LIMIT your output to 50 rows for troubleshooting. HINT: you will need to use a join to complete this query.


In [5]:
%%sql
SELECT d.dog_guid, d.breed_type, COUNT(ct.dog_guid),
    IF(d.breed_type='Pure Breed', 'Pure_Breed', 'Not_Pure_Breed') AS 'Pure or not'
FROM dogs d JOIN complete_tests ct on d.dog_guid=ct.dog_guid
GROUP BY d.dog_guid


17986 rows affected.
Out[5]:
dog_guid breed_type COUNT(ct.dog_guid) Pure or not
fd27b272-7144-11e5-ba71-058fbc01cf0b Pure Breed 21 Pure_Breed
fd27b5ba-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 Pure_Breed
fd27b6b4-7144-11e5-ba71-058fbc01cf0b Pure Breed 2 Pure_Breed
fd27b79a-7144-11e5-ba71-058fbc01cf0b Pure Breed 11 Pure_Breed
fd27b86c-7144-11e5-ba71-058fbc01cf0b Pure Breed 31 Pure_Breed
fd27b948-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 Pure_Breed
fd27ba1a-7144-11e5-ba71-058fbc01cf0b Pure Breed 27 Pure_Breed
fd27bbbe-7144-11e5-ba71-058fbc01cf0b Mixed Breed/ Other/ I Don't Know 20 Not_Pure_Breed
fd27c1c2-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 Pure_Breed
fd27c5be-7144-11e5-ba71-058fbc01cf0b Cross Breed 20 Not_Pure_Breed
fd27c74e-7144-11e5-ba71-058fbc01cf0b Cross Breed 14 Not_Pure_Breed
fd27c7d0-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 Pure_Breed
fd27c852-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 Pure_Breed
fd27c8d4-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 Pure_Breed
fd27c956-7144-11e5-ba71-058fbc01cf0b Cross Breed 11 Not_Pure_Breed
fd27cb72-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 Pure_Breed
fd27cd98-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 Pure_Breed
fd27ce1a-7144-11e5-ba71-058fbc01cf0b Pure Breed 7 Pure_Breed
fd27cea6-7144-11e5-ba71-058fbc01cf0b Mixed Breed/ Other/ I Don't Know 2 Not_Pure_Breed
fd27cf28-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 Pure_Breed
fd27cfaa-7144-11e5-ba71-058fbc01cf0b Pure Breed 7 Pure_Breed
fd27d02c-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 Pure_Breed
fd27d0b8-7144-11e5-ba71-058fbc01cf0b Cross Breed 21 Not_Pure_Breed
fd27d144-7144-11e5-ba71-058fbc01cf0b Pure Breed 7 Pure_Breed
fd27d1c6-7144-11e5-ba71-058fbc01cf0b Pure Breed 7 Pure_Breed
17986 rows, truncated to displaylimit of 25

In [6]:
%%sql
SELECT d.dog_guid AS dogID, d.breed_type AS breed_type, count(c.created_at) AS
numtests,
IF(d.breed_type='Pure Breed','pure_breed', 'not_pure_breed') AS pure_breed
FROM dogs d, complete_tests c
WHERE d.dog_guid=c.dog_guid
GROUP BY dogID


17986 rows affected.
Out[6]:
dogID breed_type numtests pure_breed
fd27b272-7144-11e5-ba71-058fbc01cf0b Pure Breed 21 pure_breed
fd27b5ba-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 pure_breed
fd27b6b4-7144-11e5-ba71-058fbc01cf0b Pure Breed 2 pure_breed
fd27b79a-7144-11e5-ba71-058fbc01cf0b Pure Breed 11 pure_breed
fd27b86c-7144-11e5-ba71-058fbc01cf0b Pure Breed 31 pure_breed
fd27b948-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 pure_breed
fd27ba1a-7144-11e5-ba71-058fbc01cf0b Pure Breed 27 pure_breed
fd27bbbe-7144-11e5-ba71-058fbc01cf0b Mixed Breed/ Other/ I Don't Know 20 not_pure_breed
fd27c1c2-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 pure_breed
fd27c5be-7144-11e5-ba71-058fbc01cf0b Cross Breed 20 not_pure_breed
fd27c74e-7144-11e5-ba71-058fbc01cf0b Cross Breed 14 not_pure_breed
fd27c7d0-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 pure_breed
fd27c852-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 pure_breed
fd27c8d4-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 pure_breed
fd27c956-7144-11e5-ba71-058fbc01cf0b Cross Breed 11 not_pure_breed
fd27cb72-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 pure_breed
fd27cd98-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 pure_breed
fd27ce1a-7144-11e5-ba71-058fbc01cf0b Pure Breed 7 pure_breed
fd27cea6-7144-11e5-ba71-058fbc01cf0b Mixed Breed/ Other/ I Don't Know 2 not_pure_breed
fd27cf28-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 pure_breed
fd27cfaa-7144-11e5-ba71-058fbc01cf0b Pure Breed 7 pure_breed
fd27d02c-7144-11e5-ba71-058fbc01cf0b Pure Breed 20 pure_breed
fd27d0b8-7144-11e5-ba71-058fbc01cf0b Cross Breed 21 not_pure_breed
fd27d144-7144-11e5-ba71-058fbc01cf0b Pure Breed 7 pure_breed
fd27d1c6-7144-11e5-ba71-058fbc01cf0b Pure Breed 7 pure_breed
17986 rows, truncated to displaylimit of 25

Question 11: Write a query that uses a CASE statement to report the number of unique user_guids associated with customers who live in the United States and who are in the following groups of states:

Group 1: New York (abbreviated "NY") or New Jersey (abbreviated "NJ")
Group 2: North Carolina (abbreviated "NC") or South Carolina (abbreviated "SC")
Group 3: California (abbreviated "CA")
Group 4: All other states with non-null values

You should find 898 unique user_guids in Group1.


In [29]:
%%sql
SELECT CASE
        WHEN (u.state='NY' OR u.state='NJ') THEN 'New York'
        WHEN (u.state='CA') THEN 'California'
        WHEN (u.state='NC' OR u.state='SC') THEN 'Carlolina'
        ELSE NULL
        END AS states_group,
        COUNT(u.state)
FROM (SELECT DISTINCT uu.user_guid, uu.state
      FROM users uu
      WHERE uu.country='US'
        ) AS u
GROUP BY states_group


4 rows affected.
Out[29]:
states_group COUNT(u.state)
None 6388
California 1417
Carlolina 653
New York 898

Question 12: Write a query that allows you to determine how many unique dog_guids are associated with dogs who are DNA tested and have either stargazer or socialite personality dimensions. Your answer should be 70.


In [36]:
%%sql
SELECT DISTINCT d.dog_guid
FROM dogs d
WHERE (d.dimension='stargazer' OR d.dimension='socialite') AND d.dna_tested=1


70 rows affected.
Out[36]:
dog_guid
fd3d0938-7144-11e5-ba71-058fbc01cf0b
fd3d0f96-7144-11e5-ba71-058fbc01cf0b
fd3d28aa-7144-11e5-ba71-058fbc01cf0b
fd3d2f9e-7144-11e5-ba71-058fbc01cf0b
fd3d424a-7144-11e5-ba71-058fbc01cf0b
fd3d42ea-7144-11e5-ba71-058fbc01cf0b
fd3d4376-7144-11e5-ba71-058fbc01cf0b
fd3d4b8c-7144-11e5-ba71-058fbc01cf0b
fd3d587a-7144-11e5-ba71-058fbc01cf0b
fd3fc0ce-7144-11e5-ba71-058fbc01cf0b
fd3ff18e-7144-11e5-ba71-058fbc01cf0b
fd406fce-7144-11e5-ba71-058fbc01cf0b
fd410a7e-7144-11e5-ba71-058fbc01cf0b
fd41bca8-7144-11e5-ba71-058fbc01cf0b
fd422210-7144-11e5-ba71-058fbc01cf0b
fd422332-7144-11e5-ba71-058fbc01cf0b
fd4233fe-7144-11e5-ba71-058fbc01cf0b
fd42913c-7144-11e5-ba71-058fbc01cf0b
fd42a302-7144-11e5-ba71-058fbc01cf0b
fd42e33a-7144-11e5-ba71-058fbc01cf0b
fd434df2-7144-11e5-ba71-058fbc01cf0b
fd43633c-7144-11e5-ba71-058fbc01cf0b
fd4373e0-7144-11e5-ba71-058fbc01cf0b
fd438254-7144-11e5-ba71-058fbc01cf0b
fd43a202-7144-11e5-ba71-058fbc01cf0b
70 rows, truncated to displaylimit of 25

Feel free to practice any other queries you like here!


In [ ]: