SQL Bootcamp

Sarah Beckett-Hile | NYU Stern School of Business | March 2015

Today's plan

SQL, the tool of business
Relational Databases
Why can't I do this in Excel?
Setting up this course
Basic Clauses

About SQL

SQL = "Structured Query Language" (pronounced "S-Q-L" or "sequel")
Database language of choice for most businesses
The software optimized for storing relational databases that you access with SQL varies. Relational Database Management Systems (RDBMS) include MySQL, Microsoft SQL Server, Oracle, and SQLite. We will be working with SQLite.
Relational Databases have multiple tables. Visualize it like an Excel file:
- Database = a single Excel file/workbook
- Table = a single worksheet in the same Excel file
SQL lets you perform four basic functions: C.R.U.D. = Create, Read, Update, Delete
"Read" is all you'll need for business analytics
Additional reading: http://www.w3schools.com/sql/sql_intro.asp
Find examples of queries for business analysis at botton of this lesson page

About this file

We'll use SQL in Python, specifically an IPython Notebook
No need to know what that means, but be sure you have SQL_support_code.py saved in the same folder as this file.
- Download if you haven't already: https://www.dropbox.com/s/dacxdvkk11tyr4n/SQL_support_code.py?dl=0
All SQL queries are in red
If you get stumped on a challenge, there are cheats at the bottom of a challenge cell. You'll see something like "#print(cheat1)". Delete the hash and run the cell (SHIFT-RETURN). Once you've figured it out, replace the hash, and try again.



In [1]:

    
from SQL_support_code import *

Structure and Formatting Query Basics:

Indentations and Returns:
- Mostly arbitrary in SQL
- Usually for readability
Capitalization:
- Convention to put keywords (functions, clauses) in CAPS
- Consistency is best
Order of Clauses:
- Very strict
- Not all clauses need to be present in a query, but when they are present, then they must be in the correct order
- Below are the major clauses that we are going to cover. Use this list as reference if you are getting errors with your queries - there's a chance you just have the clauses in the wrong order:
```
  SELECT
  FROM
  JOIN...ON 
  WHERE
  GROUP BY
  UNION
  ORDER BY
  LIMIT
```

Reading a table's structure:

PRAGMA TABLE_INFO(table_name)

Running this will let you see the column heads and data types of any table.

The SQL query above only works for SQLite, which is what we're using here. If you're interested in knowing the equivalent versions for other RDBMS options, see the table below.



In [2]:

    
describe_differences









    Out[2]:






  
    
      
      Microsoft SQL Server
      MySQL
      Oracle
      SQLite
    
  
  
    
      Reading a table
       SP_HELP table_name
       DESCRIBE table_name
       DESCRIBE table_table
       PRAGMA TABLE_INFO(table_name)

These are the names of the tables in our mini SQLite database:

sales_table
car_table
salesman_table
cust_table

Start by looking at the columns and their data types in the sales_table.



In [3]:

    
run('''
    PRAGMA TABLE_INFO(sales_table)
    ''')









    Out[3]:






  
    
      
      cid
      name
      type
      notnull
      dflt_value
      pk
    
  
  
    
      0
       0
                 id
       INTEGER
       0
        
       0
    
    
      1
       1
           model_id
       INTEGER
       0
        
       0
    
    
      2
       2
        customer_id
       INTEGER
       0
        
       0
    
    
      3
       3
            revenue
       INTEGER
       0
        
       0
    
    
      4
       4
       payment_type
          TEXT
       0
        
       0
    
    
      5
       5
        salesman_id
       INTEGER
       0
        
       0
    
    
      6
       6
               date
          TEXT
       0
        
       0

Rewrite the query to look at the other tables:



In [4]:

    
run('''
    PRAGMA TABLE_INFO(sales_table)
    ''')
#print(describe_cheat)









    Out[4]:






  
    
      
      cid
      name
      type
      notnull
      dflt_value
      pk
    
  
  
    
      0
       0
                 id
       INTEGER
       0
        
       0
    
    
      1
       1
           model_id
       INTEGER
       0
        
       0
    
    
      2
       2
        customer_id
       INTEGER
       0
        
       0
    
    
      3
       3
            revenue
       INTEGER
       0
        
       0
    
    
      4
       4
       payment_type
          TEXT
       0
        
       0
    
    
      5
       5
        salesman_id
       INTEGER
       0
        
       0
    
    
      6
       6
               date
          TEXT
       0
        
       0

Different RDBMS have different datatypes available:

Oracle: http://docs.oracle.com/cd/B10501_01/appdev.920/a96624/03_types.htm
MySQL:
- Numeric: http://dev.mysql.com/doc/refman/5.0/en/numeric-type-overview.html
- Date/time: http://dev.mysql.com/doc/refman/5.0/en/date-and-time-type-overview.html
- String/text: http://dev.mysql.com/doc/refman/5.0/en/string-type-overview.html
SQLite: https://www.sqlite.org/datatype3.html
Microsoft: http://msdn.microsoft.com/en-us/library/ms187752.aspx

SELECT & FROM:

Basically every "read" query will contain a SELECT and FROM clause
In the SELECT clause, you tell SQL which columns you want to see
In the FROM clause, you tell SQL the table where those columns are located
More on SELECT: http://www.w3schools.com/sql/sql_select.asp

SELECT * (ALL COLUMNS)

SELECT          # specifies which columns you want to see 
    *           # asterisk returns all columns
FROM            # specifies the table or tables where these columns can be found
    table_name

Use an asterisk to tell SQL to return all columns from the table:



In [5]:

    
run('''
    SELECT
        *
    FROM
        sales_table
    ''')









    Out[5]:






  
    
      
      id
      model_id
      customer_id
      revenue
      payment_type
      salesman_id
      date
    
  
  
    
      0 
       54858
       36
       237906
       21222
       finance
       276
         1/7/2014
    
    
      1 
       43161
       20
       967016
       19140
       finance
       225
        1/26/2014
    
    
      2 
       40112
       46
       819010
       14720
          cash
       147
        1/17/2014
    
    
      3 
       92495
       31
       633030
       19010
       finance
       215
        1/13/2014
    
    
      4 
       78000
       51
       341877
       22022
       finance
       803
        1/12/2014
    
    
      5 
       13154
       75
       720210
       21409
          cash
       215
        1/20/2014
    
    
      6 
       36535
       31
       908558
       19894
       finance
       215
        1/30/2014
    
    
      7 
       22813
       46
       705508
       12960
       finance
       813
        1/29/2014
    
    
      8 
       56245
       36
       248621
       25938
          cash
       492
        1/19/2014
    
    
      9 
       88118
       51
       341344
       19844
       finance
       215
        1/23/2014
    
    
      10
       84469
       31
       733566
       21441
       finance
       492
        1/20/2014
    
    
      11
       37412
       31
       750195
       17462
          cash
       949
        1/11/2014
    
    
      12
       68513
       31
       461723
       17020
          cash
       147
        1/18/2014
    
    
      13
       74380
       20
       468665
       18040
       finance
       803
         1/5/2014
    
    
      14
       24047
       75
       556188
       18671
       finance
       225
        1/12/2014
    
    
      15
       40603
       51
       241759
       22506
       finance
       949
         1/4/2014
    
    
      16
       69883
       20
       161369
       20460
       finance
       215
        1/25/2014
    
    
      17
       43338
       46
       731692
       13440
       finance
       225
        1/29/2014
    
    
      18
       39727
       51
       656750
       18634
       finance
       215
         1/1/2014
    
    
      19
       21022
       22
       619020
       17312
          cash
       492
        1/14/2014
    
    
      20
       18303
       36
       413891
       23318
          cash
       215
        1/17/2014
    
    
      21
       28176
       20
       965672
       21780
          cash
       862
        1/11/2014
    
    
      22
       62604
       75
       217720
       19418
          cash
       680
        1/30/2014
    
    
      23
       83930
       22
       946265
       17756
          cash
       813
        1/13/2014
    
    
      24
       87626
       31
       140795
       21441
       finance
       225
        1/13/2014
    
    
      25
       10682
       31
       145479
       17905
          cash
       949
        1/20/2014
    
    
      26
       23444
       20
       961650
       16720
       finance
       215
         1/1/2014
    
    
      27
       87998
       31
       457742
       20778
          cash
       680
         1/3/2014
    
    
      28
       95912
       75
       978159
       19667
          cash
       949
        1/25/2014
    
    
      29
       74410
       20
       474218
       19360
       finance
       276
         1/6/2014
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      70
       61014
       36
       253239
       20174
       finance
       215
        1/21/2014
    
    
      71
       18946
       20
       614301
       18480
          cash
       680
        1/28/2014
    
    
      72
       75834
       51
       168495
       23474
          cash
       492
        1/10/2014
    
    
      73
       57366
       31
       699431
       19231
          cash
       813
        1/12/2014
    
    
      74
       57711
       22
       528354
       18865
          cash
       225
         1/4/2014
    
    
      75
       65870
       36
       751098
       24104
          cash
       492
        1/23/2014
    
    
      76
       22147
       36
       480153
       20436
       finance
       225
        1/23/2014
    
    
      77
       99759
       51
       208677
       22990
          cash
       492
        1/16/2014
    
    
      78
       64406
       20
       349494
       19580
          cash
       276
        1/16/2014
    
    
      79
       55585
       22
       206310
       17756
       finance
       276
         1/1/2014
    
    
      80
       24710
       75
       185223
       18920
          cash
       147
        1/16/2014
    
    
      81
       53831
       75
       656181
       24646
       finance
       803
        1/18/2014
    
    
      82
       97494
       19
       393613
       14006
          cash
       862
        1/14/2014
    
    
      83
       13004
       36
       488910
       24890
       finance
       813
        1/12/2014
    
    
      84
       30779
       75
       846630
       23401
       finance
       147
        1/25/2014
    
    
      85
       29708
       51
       534633
       20328
       finance
       215
        1/29/2014
    
    
      86
       14082
       46
       909110
       15200
          cash
       813
        1/28/2014
    
    
      87
       82119
       19
       628031
       16189
          cash
       862
        1/12/2014
    
    
      88
       31815
       46
       743097
       15040
          cash
       147
        1/25/2014
    
    
      89
       32345
       22
       476047
       20197
       finance
       862
        1/19/2014
    
    
      90
       63526
       51
       862765
       23232
       finance
       276
        1/23/2014
    
    
      91
       20256
       19
       597655
       16734
          cash
       215
        1/16/2014
    
    
      92
       15930
       36
       486534
       20960
          cash
       862
        1/26/2014
    
    
      93
       68446
       75
       339830
       24148
          cash
       225
        1/22/2014
    
    
      94
       68330
       20
       283797
       19140
          cash
       862
         1/9/2014
    
    
      95
       85407
       22
       635204
       21751
       finance
       803
        1/18/2014
    
    
      96
       42428
       75
       619016
       20413
          cash
       147
        1/11/2014
    
    
      97
       37620
       51
       183947
       20328
       finance
       949
         1/3/2014
    
    
      98
       96344
       20
       731677
       17600
       finance
       680
        1/15/2014
    
    
      99
       53062
       31
       907549
       19894
       finance
       225
        1/28/2014
    
  

100 rows × 7 columns

Write a query to select all columns from the car_table:



In [6]:

    
run('''
    SELECT NULL
    ''')
#print(select_cheat1)

SELECT COLUMN:

SELECT 
    column_a,   # comma-separate multiple columns
    column_b
FROM 
    table_name

Instead of using an asterisk for "all columns", you can specify a particular column or columns:



In [7]:

    
run('''
    SELECT
        model_id, 
        revenue
    FROM
        sales_table
    ''')









    Out[7]:






  
    
      
      model_id
      revenue
    
  
  
    
      0 
       36
       21222
    
    
      1 
       20
       19140
    
    
      2 
       46
       14720
    
    
      3 
       31
       19010
    
    
      4 
       51
       22022
    
    
      5 
       75
       21409
    
    
      6 
       31
       19894
    
    
      7 
       46
       12960
    
    
      8 
       36
       25938
    
    
      9 
       51
       19844
    
    
      10
       31
       21441
    
    
      11
       31
       17462
    
    
      12
       31
       17020
    
    
      13
       20
       18040
    
    
      14
       75
       18671
    
    
      15
       51
       22506
    
    
      16
       20
       20460
    
    
      17
       46
       13440
    
    
      18
       51
       18634
    
    
      19
       22
       17312
    
    
      20
       36
       23318
    
    
      21
       20
       21780
    
    
      22
       75
       19418
    
    
      23
       22
       17756
    
    
      24
       31
       21441
    
    
      25
       31
       17905
    
    
      26
       20
       16720
    
    
      27
       31
       20778
    
    
      28
       75
       19667
    
    
      29
       20
       19360
    
    
      ...
      ...
      ...
    
    
      70
       36
       20174
    
    
      71
       20
       18480
    
    
      72
       51
       23474
    
    
      73
       31
       19231
    
    
      74
       22
       18865
    
    
      75
       36
       24104
    
    
      76
       36
       20436
    
    
      77
       51
       22990
    
    
      78
       20
       19580
    
    
      79
       22
       17756
    
    
      80
       75
       18920
    
    
      81
       75
       24646
    
    
      82
       19
       14006
    
    
      83
       36
       24890
    
    
      84
       75
       23401
    
    
      85
       51
       20328
    
    
      86
       46
       15200
    
    
      87
       19
       16189
    
    
      88
       46
       15040
    
    
      89
       22
       20197
    
    
      90
       51
       23232
    
    
      91
       19
       16734
    
    
      92
       36
       20960
    
    
      93
       75
       24148
    
    
      94
       20
       19140
    
    
      95
       22
       21751
    
    
      96
       75
       20413
    
    
      97
       51
       20328
    
    
      98
       20
       17600
    
    
      99
       31
       19894
    
  

100 rows × 2 columns

Write a query to select model_id and model from the car_table:



In [8]:

    
run('''
    SELECT NULL
    ''')
#print(select_cheat2)

One more quick note on the basics of SELECT - technically you can SELECT a value without using FROM to specify a table. You could just tell the query exactly what you want to see in the result-set. If it's a number, you can write the exact number. If you are using various characters, put them in quotes.

See the query below as an example:



In [9]:

    
run('''
    SELECT 
    4, 
    5, 
    7, 
    'various characters or text'
    ''')









    Out[9]:






  
    
      
      4
      5
      7
      'various characters or text'
    
  
  
    
      0
       4
       5
       7
       various characters or text

SELECT DISTINCT VALUES IN COLUMNS:

SELECT
    DISTINCT column_a     # returns a list of each unique value in column_a
FROM
    table_name

Use DISTINCT to return unique values from a column
More on DISTINCT: http://www.w3schools.com/sql/sql_distinct.asp

The query below pulls each distinct value from the model_id column in the sales_table, so each value is only listed one time:



In [10]:

    
run('''
    SELECT
        DISTINCT model_id
    FROM
        sales_table
    ''')

Use DISTINCT to select unqiue values from the salesman_id column in sales_table. Delete DISTINCT and rerun to see the effect.



In [11]:

    
run('''
    SELECT NULL
    ''')
#print(select_cheat3)

WHERE

SELECT 
    column_a
FROM
    table_name
WHERE
    column_a = x      # filters the result-set to rows where column_a's value is exactly x

A few more options for the where clause:

WHERE column_a = 'some_text'      # put text in quotations. CAPITALIZATION IS IMPORTANT

WHERE column_a != x      # filters the result-set to rows where column_a's value DOES NOT EQUAL x

WHERE column_a < x       # filters the result-set to rows where column_a's value is less than x

WHERE columna_a <= x      # filters the result-set to rows where column_a's value is less than or equal to x

WHERE column_a IN (x, y)       # column_a's value can be EITHER x OR y 

WHERE column_a NOT IN (x, y)     #  column_a's value can be NEITHER x NOR y  

WHERE column_a BETWEEN x AND y      # BETWEEN lets you specify a range   

WHERE column_a = x AND column_b = y    # AND lets you add more filters

WHERE column_a = x OR column_b = y     # OR will include results that fulfill either criteria

WHERE (column_a = x AND column_b = y) OR (column_c = z)   # use parentheses to create complex AND/OR statements

WHERE allows you to filter the result-set to only include rows matching specific values/criteria. If the value/criteria is text, remember to put it in single or double quotation marks
More on WHERE: http://www.w3schools.com/sql/sql_where.asp

Below, WHERE filters out any rows that don't match the criteria. The result-set will only contain rows where the payment type is cash AND where the model_id is 46:



In [12]:

    
run('''
    SELECT
        *
    FROM
        sales_table
    WHERE
        payment_type = 'cash'
        AND model_id = 46 
    ''')









    Out[12]:






  
    
      
      id
      model_id
      customer_id
      revenue
      payment_type
      salesman_id
      date
    
  
  
    
      0
       40112
       46
       819010
       14720
       cash
       147
        1/17/2014
    
    
      1
       82630
       46
       618295
       15040
       cash
       949
        1/16/2014
    
    
      2
       70434
       46
       944843
       13600
       cash
       680
        1/25/2014
    
    
      3
       30046
       46
       982483
       12640
       cash
       215
        1/29/2014
    
    
      4
       14082
       46
       909110
       15200
       cash
       813
        1/28/2014
    
    
      5
       31815
       46
       743097
       15040
       cash
       147
        1/25/2014

Rewrite the query to return rows where payment_type is NOT cash, and the model_id is either 31 or 36

Extra: Try changing 'cash' to 'Cash' to see what happens.



In [13]:

    
run('''
    SELECT NULL
    ''')
#print(where_cheat1)

Using BETWEEN, rewrite the query to return rows where the revenue was between 24,000 and 25,000:



In [14]:

    
run('''
    SELECT NULL
    ''')
#print(where_cheat2)

WHERE column LIKE:

SELECT 
    column_a
FROM
    table_name
WHERE
    column_a LIKE '%text or number%'      # Filters the result_set to rows where that text or value can be found, with % standing in as a wildcard

LIKE lets you avoid issues with capitalization in quotes, and you can use % as a wildcard to stand in for any character
Useful if you have an idea of what text you're looking for, but you are not sure of the spelling or you want all results that contatin those letters
More on LIKE: http://www.w3schools.com/sql/sql_like.asp
More on wildcards: http://www.w3schools.com/sql/sql_wildcards.asp

Note that you don't have to use the whole word "cash" when you use LIKE, and that the capital "C" now doesn't cause a problem:



In [15]:

    
run('''
    SELECT
        *
    FROM
        sales_table
    WHERE
        payment_type LIKE 'Cas%'    
    ''').head()









    Out[15]:






  
    
      
      id
      model_id
      customer_id
      revenue
      payment_type
      salesman_id
      date
    
  
  
    
      0
       40112
       46
       819010
       14720
       cash
       147
        1/17/2014
    
    
      1
       13154
       75
       720210
       21409
       cash
       215
        1/20/2014
    
    
      2
       56245
       36
       248621
       25938
       cash
       492
        1/19/2014
    
    
      3
       37412
       31
       750195
       17462
       cash
       949
        1/11/2014
    
    
      4
       68513
       31
       461723
       17020
       cash
       147
        1/18/2014

Be careful with LIKE though - it can't deal with extra characters or mispellings:



In [16]:

    
run('''
    SELECT
        *
    FROM
        sales_table
    WHERE
        payment_type LIKE 'ces%'
    LIMIT 5
    ''')









    Out[16]:






  
    
      
      id
      model_id
      customer_id
      revenue
      payment_type
      salesman_id
      date

LIKE and % will also return too much if you're not specific enough. This returns both 'cash' and 'finance' because both have a 'c' with some letters before or after:



In [17]:

    
run('''
    SELECT
        *
    FROM
        sales_table
    WHERE
        payment_type LIKE '%c%'
    LIMIT 5
    ''')









    Out[17]:






  
    
      
      id
      model_id
      customer_id
      revenue
      payment_type
      salesman_id
      date
    
  
  
    
      0
       54858
       36
       237906
       21222
       finance
       276
         1/7/2014
    
    
      1
       43161
       20
       967016
       19140
       finance
       225
        1/26/2014
    
    
      2
       40112
       46
       819010
       14720
          cash
       147
        1/17/2014
    
    
      3
       92495
       31
       633030
       19010
       finance
       215
        1/13/2014
    
    
      4
       78000
       51
       341877
       22022
       finance
       803
        1/12/2014

You can use different wildcards besides % to get more specific. An underscore is a substitute for a single letter or character, rather than any number. The query below uses 3 underscores after c to get 'cash':



In [18]:

    
run('''
    SELECT
        *
    FROM
        sales_table
    WHERE
        payment_type LIKE 'c___'
    LIMIT 5
    ''')









    Out[18]:






  
    
      
      id
      model_id
      customer_id
      revenue
      payment_type
      salesman_id
      date
    
  
  
    
      0
       40112
       46
       819010
       14720
       cash
       147
        1/17/2014
    
    
      1
       13154
       75
       720210
       21409
       cash
       215
        1/20/2014
    
    
      2
       56245
       36
       248621
       25938
       cash
       492
        1/19/2014
    
    
      3
       37412
       31
       750195
       17462
       cash
       949
        1/11/2014
    
    
      4
       68513
       31
       461723
       17020
       cash
       147
        1/18/2014

Say you can't remember the model of the car you're trying to look up. You know it's "out"...something. Outcast? Outstanding? Write a query to return the model_id and model from the car_table and use LIKE to help you search:



In [19]:

    
run('''
    SELECT NULL
    ''')
#print(where_cheat3)

ORDER BY

SELECT 
    column_a
FROM
    table_name
WHERE                 # optional 
    column_a = x
ORDER BY              # sorts the result-set by column_a
    column_a DESC     # DESC is optional. It sorts results in descending order (100->1) instead of ascending (1->100)

Without an ORDER BY clause, the default result-set will be ordered by however it appears in the database
By default, ORDER BY will sort values in ascending order (A→Z, 1→100). Add DESC to order results in desceding order instead (Z→A, 100→1)
More on ORDER BY: http://www.w3schools.com/sql/sql_orderby.asp

The query below orders the result-set by revenue amount, starting with the smallest amount listed first:



In [20]:

    
run('''
    SELECT
        *
    FROM
        sales_table
    ORDER BY
        revenue DESC
    LIMIT 5
    ''')









    Out[20]:






  
    
      
      id
      model_id
      customer_id
      revenue
      payment_type
      salesman_id
      date
    
  
  
    
      0
       56245
       36
       248621
       25938
          cash
       492
        1/19/2014
    
    
      1
       37050
       36
       513614
       25937
          cash
       492
         1/2/2014
    
    
      2
       13004
       36
       488910
       24890
       finance
       813
        1/12/2014
    
    
      3
       53831
       75
       656181
       24646
       finance
       803
        1/18/2014
    
    
      4
       91723
       36
       490244
       24628
          cash
       225
         1/9/2014

Rewrite the query above to look at the sticker_price of cars from the car_table in descending order:



In [21]:

    
run('''
    SELECT NULL
    ''')
#print(order_cheat)

LIMIT

SELECT
    column_a
FROM
    table_name
WHERE
    columna_a = x      # optional
ORDER BY
    column_a           # optional
LIMIT                  # Limits the result-set to N rows
    N

LIMIT just limits the number of rows in your result set
More on LIMIT: http://www.w3schools.com/sql/sql_top.asp
The ability to limit results varies by RBDSM. Below you can see the different ways to do this:



In [22]:

    
limit_differences









    Out[22]:






  
    
      
      Microsoft SQL Server
      MySQL
      Oracle
      SQLite
    
  
  
    
      LIMITING
       SELECT TOP N column_a...
       LIMIT N
       WHERE ROWNUM <=N
       LIMIT N

The query below limits the number of rows to 5 results. Change it to 10 to get a quick sense of what we're doing here:



In [23]:

    
run('''
    SELECT
        *
    FROM
        sales_table
    LIMIT 5
    ''')









    Out[23]:






  
    
      
      id
      model_id
      customer_id
      revenue
      payment_type
      salesman_id
      date
    
  
  
    
      0
       54858
       36
       237906
       21222
       finance
       276
         1/7/2014
    
    
      1
       43161
       20
       967016
       19140
       finance
       225
        1/26/2014
    
    
      2
       40112
       46
       819010
       14720
          cash
       147
        1/17/2014
    
    
      3
       92495
       31
       633030
       19010
       finance
       215
        1/13/2014
    
    
      4
       78000
       51
       341877
       22022
       finance
       803
        1/12/2014

ALIASES

SELECT
    T.column_a AS alias_a    # creates a nickname for column_a, and states that it's from table_name (whose alias is T)
FROM
    table_name AS T          # creates a nickname for table_name
WHERE
    alias_a = z           # refer to an alias in the WHERE clause
ORDER BY
    alias_a                  # refer to an alias in the ORDER BY clause

Aliases are optional, but save you time and make column headers cleaner
AS isn't necessary to create and alias, but commonly used
The convention is to use "AS" in the "SELECT" clause, but not in the "FROM" clause.
More on Aliases: http://www.w3schools.com/sql/sql_alias.asp

Change the aiases for model_id and revenue, or add extra columns to see how they work:



In [24]:

    
run('''
    SELECT
        model_id AS Model_of_car,
        revenue AS Rev_per_car
    FROM 
        sales_table
    ''')









    Out[24]:






  
    
      
      Model_of_car
      Rev_per_car
    
  
  
    
      0 
       36
       21222
    
    
      1 
       20
       19140
    
    
      2 
       46
       14720
    
    
      3 
       31
       19010
    
    
      4 
       51
       22022
    
    
      5 
       75
       21409
    
    
      6 
       31
       19894
    
    
      7 
       46
       12960
    
    
      8 
       36
       25938
    
    
      9 
       51
       19844
    
    
      10
       31
       21441
    
    
      11
       31
       17462
    
    
      12
       31
       17020
    
    
      13
       20
       18040
    
    
      14
       75
       18671
    
    
      15
       51
       22506
    
    
      16
       20
       20460
    
    
      17
       46
       13440
    
    
      18
       51
       18634
    
    
      19
       22
       17312
    
    
      20
       36
       23318
    
    
      21
       20
       21780
    
    
      22
       75
       19418
    
    
      23
       22
       17756
    
    
      24
       31
       21441
    
    
      25
       31
       17905
    
    
      26
       20
       16720
    
    
      27
       31
       20778
    
    
      28
       75
       19667
    
    
      29
       20
       19360
    
    
      ...
      ...
      ...
    
    
      70
       36
       20174
    
    
      71
       20
       18480
    
    
      72
       51
       23474
    
    
      73
       31
       19231
    
    
      74
       22
       18865
    
    
      75
       36
       24104
    
    
      76
       36
       20436
    
    
      77
       51
       22990
    
    
      78
       20
       19580
    
    
      79
       22
       17756
    
    
      80
       75
       18920
    
    
      81
       75
       24646
    
    
      82
       19
       14006
    
    
      83
       36
       24890
    
    
      84
       75
       23401
    
    
      85
       51
       20328
    
    
      86
       46
       15200
    
    
      87
       19
       16189
    
    
      88
       46
       15040
    
    
      89
       22
       20197
    
    
      90
       51
       23232
    
    
      91
       19
       16734
    
    
      92
       36
       20960
    
    
      93
       75
       24148
    
    
      94
       20
       19140
    
    
      95
       22
       21751
    
    
      96
       75
       20413
    
    
      97
       51
       20328
    
    
      98
       20
       17600
    
    
      99
       31
       19894
    
  

100 rows × 2 columns

You can use an alias in the ORDER BY and WHERE clauses now. Write a query to:

pull the model_id and revenue for each transaction
give model_id the alias "Model"
give revenue the alias "Rev"
limit the results to only include rows where the model_id id 36, use the alias in the WHERE clause
order the results by revenue in descending order, use the alias in the ORDER BY clause
Run the query

THEN:

Try giving model_id the alias "ID" and use it in the WHERE clause, then rerun the query. What do you think is causing the error?



In [25]:

    
run('''
    SELECT NULL
    ''') 
#print(alias_cheat)

You can also assign an alias to a table, and use the alias to tell SQL which table the column is coming from. This isn't of much use when you're only using one table, but it will come in handy when you start using multiple tables.

Below,the sales_table has the alias "S". Read "S.model_id" as "the model_id column from S, which is the sales_table"

Change the S to another letter in the FROM clause and run. Why did you hit an error? What can you do to fix it?



In [26]:

    
run('''
    SELECT
        S.model_id,
        S.revenue
    FROM 
        sales_table AS S
    LIMIT 5
    ''')

JOINS

SELECT
    *
FROM
    table_x                                   
    JOIN table_y                                 # use JOIN to add the second table
        ON table_x.column_a = table_y.column_a   # use ON to specify which columns correspond on each table

Joining tables is the most fundamental and useful part about relational databases
Use columns on different tables with corresponding values to join the two tables
The format "table_x.column_a" can be read as "column_a from table_x"; it tells SQL the table where it can find that column
More on JOINS: http://www.w3schools.com/sql/sql_join.asp

Start by looking at the first few rows of sales_table again:



In [27]:

    
run('''
    SELECT
        *
    FROM
        sales_table
    LIMIT 5
    ''')









    Out[27]:






  
    
      
      id
      model_id
      customer_id
      revenue
      payment_type
      salesman_id
      date
    
  
  
    
      0
       54858
       36
       237906
       21222
       finance
       276
         1/7/2014
    
    
      1
       43161
       20
       967016
       19140
       finance
       225
        1/26/2014
    
    
      2
       40112
       46
       819010
       14720
          cash
       147
        1/17/2014
    
    
      3
       92495
       31
       633030
       19010
       finance
       215
        1/13/2014
    
    
      4
       78000
       51
       341877
       22022
       finance
       803
        1/12/2014

Now the first few rows of the car_table:



In [28]:

    
run('''
    SELECT
        *
    FROM
        car_table
    LIMIT 5
    ''')









    Out[28]:






  
    
      
      model_id
      make
      model
      sticker_price
      cogs
    
  
  
    
      0
       20
       Toyota
         Camry
       22000
       13200
    
    
      1
       46
       Toyota
       Corolla
       16000
        9600
    
    
      2
       51
       Toyota
         Prius
       24200
       14520
    
    
      3
       19
        Honda
         Civic
       18190
       10914
    
    
      4
       31
        Honda
        Accord
       22105
       13263

These tables are related. There's a column named "model_id" in the sales_table and a "model_id" in the car_table - but the column names don't need to be the same, what's important is that the values in the sales_table's model_id column correspond to the values in the car_table's model_id column.

You can join these tables by using these columns as keys.



In [29]:

    
run('''
    SELECT
        *
    FROM
        sales_table
        JOIN car_table ON sales_table.model_id = car_table.model_id
    LIMIT 10
    ''')









    Out[29]:






  
    
      
      id
      model_id
      customer_id
      revenue
      payment_type
      salesman_id
      date
      model_id
      make
      model
      sticker_price
      cogs
    
  
  
    
      0
       54858
       36
       237906
       21222
       finance
       276
         1/7/2014
       36
       Toyota
        Tundra
       26200
       15720
    
    
      1
       43161
       20
       967016
       19140
       finance
       225
        1/26/2014
       20
       Toyota
         Camry
       22000
       13200
    
    
      2
       40112
       46
       819010
       14720
          cash
       147
        1/17/2014
       46
       Toyota
       Corolla
       16000
        9600
    
    
      3
       92495
       31
       633030
       19010
       finance
       215
        1/13/2014
       31
        Honda
        Accord
       22105
       13263
    
    
      4
       78000
       51
       341877
       22022
       finance
       803
        1/12/2014
       51
       Toyota
         Prius
       24200
       14520
    
    
      5
       13154
       75
       720210
       21409
          cash
       215
        1/20/2014
       75
       Subaru
       Outback
       24895
       14937
    
    
      6
       36535
       31
       908558
       19894
       finance
       215
        1/30/2014
       31
        Honda
        Accord
       22105
       13263
    
    
      7
       22813
       46
       705508
       12960
       finance
       813
        1/29/2014
       46
       Toyota
       Corolla
       16000
        9600
    
    
      8
       56245
       36
       248621
       25938
          cash
       492
        1/19/2014
       36
       Toyota
        Tundra
       26200
       15720
    
    
      9
       88118
       51
       341344
       19844
       finance
       215
        1/23/2014
       51
       Toyota
         Prius
       24200
       14520

Write a query to join the cust_table to the sales_table, using the customer_id columns in both tables as the key:



In [30]:

    
run('''
    SELECT NULL
    ''') 
#print(join_cheat1)

Rewrite the query from above, but instead of selecting all columns, specify just the customer gender and the revenue:



In [31]:

    
run('''
    SELECT NULL
    ''')
#print(join_cheat2)

Rewrite the query from above, but this time select the customer_id, gender, and revenue:

You'll probably hit an error at first. Try to use what you've learned about this structure "table_x.column_a" to fix the issue. Why do you think you need to use this?



In [32]:

    
run('''
    SELECT NULL
    ''')
#print(join_cheat3)

A column with the name customer_id appears in both the cust_table and the sales_table. SQL doesn't know which one you want to see. You have to tell it from which table you want the customer_id.

This can be important when columns in different tables have the same names but totally unrelated values.

Look at the sales_table again:



In [33]:

    
run('''
    SELECT
        *
    FROM
        sales_table
    LIMIT 5
    ''')









    Out[33]:






  
    
      
      id
      model_id
      customer_id
      revenue
      payment_type
      salesman_id
      date
    
  
  
    
      0
       54858
       36
       237906
       21222
       finance
       276
         1/7/2014
    
    
      1
       43161
       20
       967016
       19140
       finance
       225
        1/26/2014
    
    
      2
       40112
       46
       819010
       14720
          cash
       147
        1/17/2014
    
    
      3
       92495
       31
       633030
       19010
       finance
       215
        1/13/2014
    
    
      4
       78000
       51
       341877
       22022
       finance
       803
        1/12/2014

Above, there's a column called "id".

Now look at the salesman_table again:



In [34]:

    
run('''
    SELECT
        *
    FROM
        salesman_table
    LIMIT 5
    ''')









    Out[34]:






  
    
      
      id
      first_name
      last_name
    
  
  
    
      0
       949
       Samantha
       Douglas
    
    
      1
       215
          Jared
          Case
    
    
      2
       813
        Michael
          Hill
    
    
      3
       680
       Claudine
         Hatch
    
    
      4
       276
         Joseph
         Seney

There's a column named "id" in the salesman_table too. However, it doesn't look like those IDs correspond to the sales_table IDs. In fact, it's the salesman_id column in the sales_table that corresponds to the id column in the salesman_table. More often than not, your tables will use different names for corresponding columns, and will have columns with identical names that don't correspond at all.

Write a query to join the salesman_table with the sales_table (select all columns using an asterisk)



In [35]:

    
run('''
    SELECT NULL
    ''') 
#print(join_cheat4)

Practice applying this "table_x.column_a" format to all columns in the SELECT clause when you are joining multiple tables, since multiple tables frequenty use the same column names even when they don't correspond.

It's common to use single-letter aliases for tables to make queries shorter. Take a look at the query below and make sure you understand what's going on with the table aliases. It's the same query that you wrote earlier, but with aliases to help identify the columns



In [36]:

    
run('''
    SELECT
        S.customer_id,
        C.gender,
        S.revenue
    FROM
        sales_table AS S
        JOIN cust_table AS C on S.customer_id = C.customer_id
    ''')









    Out[36]:






  
    
      
      customer_id
      gender
      revenue
    
  
  
    
      0 
       237906
       female
       21222
    
    
      1 
       967016
         male
       19140
    
    
      2 
       819010
         male
       14720
    
    
      3 
       633030
         male
       19010
    
    
      4 
       341877
       female
       22022
    
    
      5 
       720210
         male
       21409
    
    
      6 
       908558
         male
       19894
    
    
      7 
       705508
       female
       12960
    
    
      8 
       248621
         male
       25938
    
    
      9 
       341344
       female
       19844
    
    
      10
       733566
       female
       21441
    
    
      11
       750195
       female
       17462
    
    
      12
       461723
         male
       17020
    
    
      13
       468665
       female
       18040
    
    
      14
       556188
       female
       18671
    
    
      15
       241759
         male
       22506
    
    
      16
       161369
       female
       20460
    
    
      17
       731692
       female
       13440
    
    
      18
       656750
         male
       18634
    
    
      19
       619020
       female
       17312
    
    
      20
       413891
         male
       23318
    
    
      21
       965672
         male
       21780
    
    
      22
       217720
       female
       19418
    
    
      23
       946265
         male
       17756
    
    
      24
       140795
       female
       21441
    
    
      25
       145479
         male
       17905
    
    
      26
       961650
       female
       16720
    
    
      27
       457742
         male
       20778
    
    
      28
       978159
         male
       19667
    
    
      29
       474218
         male
       19360
    
    
      ...
      ...
      ...
      ...
    
    
      70
       253239
         male
       20174
    
    
      71
       614301
       female
       18480
    
    
      72
       168495
         male
       23474
    
    
      73
       699431
       female
       19231
    
    
      74
       528354
       female
       18865
    
    
      75
       751098
         male
       24104
    
    
      76
       480153
       female
       20436
    
    
      77
       208677
         male
       22990
    
    
      78
       349494
       female
       19580
    
    
      79
       206310
       female
       17756
    
    
      80
       185223
         male
       18920
    
    
      81
       656181
       female
       24646
    
    
      82
       393613
       female
       14006
    
    
      83
       488910
       female
       24890
    
    
      84
       846630
         male
       23401
    
    
      85
       534633
         male
       20328
    
    
      86
       909110
       female
       15200
    
    
      87
       628031
         male
       16189
    
    
      88
       743097
       female
       15040
    
    
      89
       476047
         male
       20197
    
    
      90
       862765
         male
       23232
    
    
      91
       597655
         male
       16734
    
    
      92
       486534
         male
       20960
    
    
      93
       339830
         male
       24148
    
    
      94
       283797
         male
       19140
    
    
      95
       635204
       female
       21751
    
    
      96
       619016
       female
       20413
    
    
      97
       183947
         male
       20328
    
    
      98
       731677
         male
       17600
    
    
      99
       907549
       female
       19894
    
  

100 rows × 3 columns

Join the sales_table (assign it the alias S) and salesman_table (alias SM) again.

Select the id and salesman_id column from the sales_table
Also, select the id column from the salesman_table
Optional: assign aliases to the columns in the SELECT clause to make the result-set easier to read



In [37]:

    
run('''
    SELECT NULL
    ''') 
#print(join_cheat5)

Different Types of Joins

There are different types of joins you can do according to your needs. Here's a helpful way to visualize your options: http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins

However, not all types of joins are compatible with SQLite and MySQL. The table below breaks down compatibility:



In [38]:

    
join_differences









    Out[38]:






  
    
      
      Microsoft SQL Server
      MySQL
      Oracle
      SQLite
    
  
  
    
      JOIN or INNER JOIN
       ✓
                   ✓
       ✓
                   ✓
    
    
      LEFT JOIN or LEFT OUTER JOIN
       ✓
                   ✓
       ✓
                   ✓
    
    
      RIGHT JOIN or RIGHT OUTER JOIN
       ✓
                   ✓
       ✓
       not supported
    
    
      OUTER JOIN or FULL OUTER JOIN
       ✓
       not supported
       ✓
       not supported

So far, we've just done a simple join, also called an "inner join". To illustrate different types of joins, we're going to use a different "database" for the following lesson. First, let's take a look at each one:



In [39]:

    
run('''
    SELECT 
        *
    FROM
        Dog_Table
    ''')









    Out[39]:






  
    
      
      Owner_Name
      Dog_Name
    
  
  
    
      0
       Michael Martinez
       Chewbacca
    
    
      1
         Gilbert Henkel
             Max
    
    
      2
             May Reeves
         Georgie
    
    
      3
       Elizabeth Minier
           Sunny
    
    
      4
              Donna Ona
           Daisy
    
    
      5
           Daniel Abner
            Jake
    
    
      6
           Keith Bursey
          Muffin
    
    
      7
            Jane Garcia
         Admiral
    
    
      8
           Sarah Flores
           Bogie
    
    
      9
            David Brace
           Fritz



In [40]:

    
run('''
    SELECT 
        *
    FROM
        Cat_Table
    ''')









    Out[40]:






  
    
      
      Owner_Name
      Cat_Name
    
  
  
    
      0
       Michael Martinez
        Mrs. Paws
    
    
      1
         Gilbert Henkel
          Mittens
    
    
      2
             May Reeves
           Clover
    
    
      3
       Elizabeth Minier
       Sweetcakes
    
    
      4
              Donna Ona
           Barnie
    
    
      5
          Monika Turner
             Cara
    
    
      6
        Adrian Hardiman
        Annabelle
    
    
      7
           Loyd Mossman
         Garfield
    
    
      8
          Heather Emery
           Engine
    
    
      9
          Lauren Larson
         Midnight

Notice that the Owner_Name columns on each table have some corresponding values (Michael, Gilbert, May and Elizabeth and Donna are in both tables), but they both also have values that don't overlap.

JOINS or INNER JOINS

SELECT
    *
FROM 
    table_x X
    JOIN table_y Y ON X.column_a = Y.column_a      # Returns rows when values match on both tables.

This is what we used in the initial example. Simple joins, (also called Inner Joins) will combine tables only where there are corresponding values on both tables.

Write a query below to join the Cat_Table and Dog_Table using the same method we've used before:



In [41]:

    
run('''
    SELECT NULL
    ''')
#print(inner_join_cheat)

Notice that the result-set only includes the names that are in both tables. Think of inner joins as being the overlapping parts of a Venn Diagram. So, essentially we're looking at results only where the pet owner has both a cat and a dog.

LEFT JOINS or LEFT OUTER JOINS

SELECT
    *
FROM 
    table_x X
    LEFT JOIN table_y Y ON X.column_a = Y.column_a   # Returns all rows from 1st table, rows that match from 2nd

LEFT JOINS will return all rows from the first table, but only rows from the second table if a value matches on the key column.

Rewrite your query from above, but instead of "JOIN", write "LEFT JOIN":



In [42]:

    
run('''
    SELECT NULL
    ''')
#print(left_join_cheat)

This time, you're seeing everything from the Dog_Table, but only results from the Cat_Table IF the owner also has a dog.

OUTER JOINS or FULL OUTER JOINS:

SELECT
    *
FROM 
    table_x X
    OUTER JOIN table_y Y ON X.column_a = Y.column_a  # Returns all rows, regardless of whether values match

Outer joins include ALL rows from both tables, even if the values on the key columns don't match up.
SQLite doesn't support this, so the query below is a workaround to show you the visual effect of an outer join
This provides a great workaround for MySQL: http://stackoverflow.com/questions/4796872/full-outer-join-in-mysql

For now, this query won't totally make sense, just pay attention to the results so you can visualize an outer join:



In [43]:

    
run('''
    SELECT
        C.Owner_Name, 
        Cat_Name, 
        Dog_Name
    FROM
        Cat_Table C 
        LEFT JOIN Dog_Table D ON D.Owner_Name = C.Owner_Name

    UNION ALL
    
    SELECT
      D.Owner_Name, 
      ' ', 
      Dog_Name  
    FROM
        Dog_Table D 
    WHERE 
        Owner_Name NOT IN (SELECT Owner_Name from Cat_Table)
    ''')









    Out[43]:






  
    
      
      Owner_Name
      Cat_Name
      Dog_Name
    
  
  
    
      0 
       Michael Martinez
        Mrs. Paws
       Chewbacca
    
    
      1 
         Gilbert Henkel
          Mittens
             Max
    
    
      2 
             May Reeves
           Clover
         Georgie
    
    
      3 
       Elizabeth Minier
       Sweetcakes
           Sunny
    
    
      4 
              Donna Ona
           Barnie
           Daisy
    
    
      5 
          Monika Turner
             Cara
                
    
    
      6 
        Adrian Hardiman
        Annabelle
                
    
    
      7 
           Loyd Mossman
         Garfield
                
    
    
      8 
          Heather Emery
           Engine
                
    
    
      9 
          Lauren Larson
         Midnight
                
    
    
      10
           Daniel Abner
                 
            Jake
    
    
      11
           Keith Bursey
                 
          Muffin
    
    
      12
            Jane Garcia
                 
         Admiral
    
    
      13
           Sarah Flores
                 
           Bogie
    
    
      14
            David Brace
                 
           Fritz

Essentially, in Venn Diagram terms, and outer join lets you see all contents of both circles. This join will let you see all pet owners, regardless of whether the own only a cat or only a dog

Using the "WHERE" Clause to Join Tables

SELECT
    *
FROM
    table_x X
    JOIN table y Y
WHERE
    X.column_a = Y.column_a         # tells SQL the key for the join

Some people prefer to use the WHERE clause to specify the key for a join
Fine if the query is short, but SUPER messy when the query is complex
We won't use this moving forward, but it's good to see it in case you run across someone else's code and you need to make sense of it

When it's simple, it's not so bad:



In [44]:

    
run('''
    SELECT
        C.model, 
        S.revenue
    FROM
        sales_table S, car_table C 
    WHERE
        S.model_id = C.model_id
    LIMIT 5
    ''')

When the query is longer, this method is messy. Suddenly it's harder to parse out which parts of the "WHERE" clause are actual filters, and which parts are just facilitating the join.

Note that we've covered all of these clauses and expressions by now, try to parse out what's going on:



In [45]:

    
run('''
    SELECT
        C.make,
        C.model, 
        S.revenue,
        CUST.gender,
        SM.first_name
    FROM
        sales_table S
        JOIN car_table C
        JOIN salesman_table SM
        JOIN cust_table CUST
    WHERE
        S.customer_id = CUST.customer_id
        AND S.model_id = C.model_id
        AND S.salesman_id = SM.id
        AND (C.model in ('Tundra', 'Camry', 'Corolla') OR C.make = 'Subaru')
        AND S.revenue between 17000 and 22000
        AND CUST.gender = 'female' 
        AND SM.first_name NOT IN ('Kathleen', 'Samantha')
    LIMIT 5
    ''')









    Out[45]:






  
    
      
      make
      model
      revenue
      gender
      first_name
    
  
  
    
      0
       Toyota
          Camry
       20460
       female
          Jared
    
    
      1
       Subaru
       Forester
       17756
       female
         Joseph
    
    
      2
       Subaru
       Forester
       17090
       female
       Claudine
    
    
      3
       Subaru
        Outback
       19418
       female
       Claudine
    
    
      4
       Toyota
         Tundra
       21222
       female
         Joseph

OPERATORS

ADDING / SUBSTRACTING / MULTIPLYING / DIVIDING

SELECT
    column_a + column_b     # adds the values in column_a to the values in columns_b
FROM
    table_name

Use the standard formats for add, subtract, mutiply, and divide: + - * /

The query below subtracts cogs (from the car_table) from revenue (from the sales_table) to show us the gross_profit per transaction



In [46]:

    
run('''
    SELECT
        S.id,
        C.model, 
        S.revenue,
        C.cogs,
        S.revenue - C.cogs AS gross_profit
    FROM
        sales_table S 
        JOIN car_table C on S.model_id = C.model_id
    LIMIT 5
    ''')









    Out[46]:






  
    
      
      id
      model
      revenue
      cogs
      gross_profit
    
  
  
    
      0
       54858
        Tundra
       21222
       15720
       5502
    
    
      1
       43161
         Camry
       19140
       13200
       5940
    
    
      2
       40112
       Corolla
       14720
        9600
       5120
    
    
      3
       92495
        Accord
       19010
       13263
       5747
    
    
      4
       78000
         Prius
       22022
       14520
       7502

Rewrite the query above to return gross margin instead of gross profit. Rename the alias as well. Limit it to 5 results



In [47]:

    
run('''
    SELECT NULL
    ''') 
#print(operator_cheat)

CONCATENATING:

Concatenating varies by RDBMS:



In [48]:

    
concat_differences









    Out[48]:






  
    
      
      Microsoft SQL Server
      MySQL
      Oracle
      SQLite
    
  
  
    
      Concatenating
       CONCAT(column_a, column_b) or +
       CONCAT(column_a, column_b)
       CONCAT(column_a, column_b) or ||
       ||

Here we'll use SQLite and use the concatenating operator || to combine words/values in different columns:



In [49]:

    
run('''
    SELECT
        last_name, 
        first_name,
        last_name || ', ' || first_name AS full_name
    FROM
        salesman_table
    ''')









    Out[49]:






  
    
      
      last_name
      first_name
      full_name
    
  
  
    
      0
          Douglas
        Samantha
        Douglas, Samantha
    
    
      1
             Case
           Jared
              Case, Jared
    
    
      2
             Hill
         Michael
            Hill, Michael
    
    
      3
            Hatch
        Claudine
          Hatch, Claudine
    
    
      4
            Seney
          Joseph
            Seney, Joseph
    
    
      5
            March
        Kathleen
          March, Kathleen
    
    
      6
             Self
       Rosemarie
          Self, Rosemarie
    
    
      7
       Avellaneda
          Justin
       Avellaneda, Justin
    
    
      8
             Elzy
           Elton
              Elzy, Elton
    
    
      9
             Luna
         Matthew
            Luna, Matthew

Use || to pull the make and model from the car_table and make it appear in this format: "Model (Make)"

give it an alias to clean up the column header, otherwise it'll look pretty messy



In [50]:

    
run('''
    SELECT NULL
    ''') 
#print(concat_cheat)

FUNCTIONS:

SELECT
    SUM(column_a),             # sums up the values in column_a
    AVG(column_a),             # averages the values in column_a
    ROUND(AVG(column_a), 2),   # rounds the averaged values in column_a to 2 digits
    COUNT(column_a),           # counts the number of rows in column_a  
    MAX(column_a),             # returns the maximum value in column_a
    MIN(column_a),             # returns the minimum value in column_a
    GROUP_CONCAT(column_a)     # returns a comma separated list of all values in column_a
FROM
    table_name

Functions can be applied to columns to help analyze data
You can find more than just these basic few in the link below, or just Google what you're looking to do - there's a lot of help available on forums
More on functions: http://www.w3schools.com/sql/sql_functions.asp

The function below will sum up everything in the revenue column. Note that now we only get one row:



In [51]:

    
run('''
    SELECT
        SUM(revenue) AS Total_Revenue
    FROM 
        sales_table
    ''')









    Out[51]:






  
    
      
      Total_Revenue
    
  
  
    
      0
       1941252

Rewrite the query to return the average cost of goods for a car in the car table. Try rounding it to cents.

If you can't remember the name of the column for cost of goods in the car_table, remember you can use "SELECT * FROM car_table LIMIT 1" to see the first row of all columns, or you can use "PRAGMA TABLE_INFO(car_table)"



In [52]:

    
run('''
    SELECT NULL
    ''')
#print(avg_cheat)

Using COUNT(*) will return the number of rows in any given table. Rewrite the query to return the number of rows in the car_table:

After you've run the query, try changing it by adding "WHERE make = 'Subaru'" and see what happens



In [53]:

    
run('''
    SELECT NULL
    ''')
#print(count_cheat)

You can apply functions on top of other operators. Below is the sum of gross profits:



In [54]:

    
run('''
    SELECT
        '$ ' || SUM(S.revenue - C.cogs) total_gross_profit
    FROM
        sales_table S
        JOIN car_table C on S.model_id = C.model_id 
    ''')









    Out[54]:






  
    
      
      total_gross_profit
    
  
  
    
      0
       $ 596814.0

Write a query to show the average difference between the sticker_price (in car_table) and the revenue.

If you want a challenge, try to join cust_table and limit the query to only look at transactions where the customer's age is over 35



In [55]:

    
run('''
    SELECT NULL
    ''') 
#print(avg_cheat2)

GROUP_CONCAT

SELECT
    GROUP_CONCAT(column_a, '[some character separating items]') 
FROM
    table_x

This function is useful to return comma-separated lists of the values in a column



In [56]:

    
run('''
    SELECT 
        GROUP_CONCAT(model, ', ') as Car_Models
    FROM
        car_table
    ''')









    Out[56]:






  
    
      
      Car_Models
    
  
  
    
      0
       Camry, Corolla, Prius, Civic, Accord, Outback,...

Use GROUP_CONCAT to return a comma-separated list of last names from the salesman_table:



In [57]:

    
run('''
    SELECT NULL
''')
#print(concat_cheat)

GROUP BY:

SELECT
    column_a,                      
    SUM(column_b)    # sums up the values in column_b
FROM
    table_name
GROUP BY             # creates one group for each unique value in column_a
    column_a

Creates a group for each unique value in the column you specify
Extremely helpful when you're using functions - it segments out results
More on GROUP BY: http://www.w3schools.com/sql/sql_groupby.asp

The query below creates a group for each unique value in the car_table's model column, then sums up the revenue for each group. Note that you can use an alias in the GROUP BY clause.



In [58]:

    
run('''
    SELECT
        C.model AS Car_Model, 
        SUM(revenue) AS Total_Revenue
    FROM
        sales_table S
        JOIN car_table C on S.model_id = C.model_id
    GROUP BY
        Car_Model
    ''')









    Out[58]:






  
    
      
      Car_Model
      Total_Revenue
    
  
  
    
      0
         Accord
       234748
    
    
      1
          Camry
       212080
    
    
      2
          Civic
       157886
    
    
      3
        Corolla
       154880
    
    
      4
       Forester
       201083
    
    
      5
        Outback
       317406
    
    
      6
          Prius
       297418
    
    
      7
         Tundra
       365751

Rewrite the query above to return the average gross profit (revenue - cogs) per make (remember that "make" is in the car_table)

Extra things to try:

Round average revenue to two decimal points
Order the results by gross profit in descending order
Rename the make column as "Car_Maker" and use the alias in the GROUP BY clause
Rename gross profit column as "Avg_Gross_Profit" and use the alias in the ORDER BY clause
Join the salesman_table and filter results to only look at revenue where first_name is Michael
- After you've gotten the query to run with all of these adjustments, think about the risks involved with adding something in the WHERE clause that doesn't show up in the SELECT clause. Think about a potential solution to these risks.



In [59]:

    
run('''
    SELECT NULL
    ''')
#print(group_cheat)

Write a query to make a comma-separated list of models for each car maker:



In [60]:

    
run('''
    SELECT NULL
    ''')
#print(group_cheat1)

GROUP BY, when used with joins and functions, can help you quickly see trends in your data. Parse out what's going on here:



In [61]:

    
run('''
    SELECT
        C.model AS Car_Model, 
        MIN(S.revenue) || ' - ' || MAX(S.revenue) AS Min_to_Max_Sale,
        MAX(S.revenue) - MIN(S.revenue) AS Range,
        ROUND(AVG(S.revenue), 2) AS Average_Sale
    FROM
        sales_table S
        JOIN car_table C on S.model_id = C.model_id 
    GROUP BY
        Car_Model
    ORDER BY
        Average_Sale DESC
    ''')









    Out[61]:






  
    
      
      Car_Model
      Min_to_Max_Sale
      Range
      Average_Sale
    
  
  
    
      0
         Tundra
       20174 - 25938
       5764
       22859.44
    
    
      1
          Prius
       18150 - 23958
       5808
       21244.14
    
    
      2
        Outback
       18671 - 24646
       5975
       21160.40
    
    
      3
         Accord
       17020 - 21441
       4421
       19562.33
    
    
      4
          Camry
       16720 - 21780
       5060
       19280.00
    
    
      5
       Forester
       16646 - 21751
       5105
       18280.27
    
    
      6
          Civic
       13642 - 18008
       4366
       15788.60
    
    
      7
        Corolla
       12640 - 15360
       2720
       14080.00

You can also use GROUP BY with multiple columns to segment out the results further:



In [62]:

    
run('''
    SELECT
        C.make AS car_caker, 
        payment_type,
        ROUND(AVG(revenue)) as avg_revenue
    FROM
        sales_table S
        JOIN car_table C on S.model_id = C.model_id 
    GROUP BY
        C.Make, 
        payment_type
    ''')









    Out[62]:






  
    
      
      car_caker
      payment_type
      avg_revenue
    
  
  
    
      0
        Honda
          cash
       17028
    
    
      1
        Honda
       finance
       19030
    
    
      2
       Subaru
          cash
       19581
    
    
      3
       Subaru
       finance
       20363
    
    
      4
       Toyota
          cash
       20437
    
    
      5
       Toyota
       finance
       19183

Rewrite the query to find the total revenue grouped by each salesperson's first_name and by the customer's gender (gender column in cust_table)

For an extra challenge, use the concatenating operator to use the salesperson's full name instead
Add COUNT(S.id) to the SELECT clause to see the number of transactions in each group



In [63]:

    
run('''
    SELECT NULL
    ''')
#print(group_cheat2)

"HAVING" in GROUP BY statements:

SELECT
    column_a,
    SUM(column_b) AS alias_b
FROM
    table_name
GROUP BY
    column_a HAVING alias_b > x   # only includes groups in column_a when the sum of column_b is greater than x

If you've applied a function to a column and want to filter to only show results meeting a particular criteria, use HAVING in your GROUP BY clause.
More on HAVING: http://www.w3schools.com/sql/sql_having.asp

The query below will sum up all the revenue for each car maker, but it will only show you results for car maker's whose total revenue is greater than 50,000:



In [64]:

    
run('''
    SELECT
        C.Make as Car_Maker, 
        SUM(revenue) as Total_Revenue
    FROM
        sales_table S
        JOIN car_table C on S.model_id = C.model_id
    GROUP BY
        Car_Maker HAVING Total_Revenue > 500000
    ''')









    Out[64]:






  
    
      
      Car_Maker
      Total_Revenue
    
  
  
    
      0
       Subaru
        518489
    
    
      1
       Toyota
       1030129

Rewrite the query above to look at average revenue per model, and using HAVING to filter your result-set to only include models whose average revenue is less than 18,000:



In [65]:

    
run('''
    SELECT NULL
    ''')
#print(having_cheat)

HAVING vs WHERE:

WHERE filters which rows will be included in the function, whereas HAVING filters what's returned after the function has been applied.

Take a look at the query below. It might look like the query you just wrote (above) if you'd tried to use WHERE instead of HAVING:

SELECT 
    C.model as Car_Model,
    AVG(S.revenue) as Avg_Revenue
FROM
    sales_table S
    JOIN car_table C on S.model_id = C.model_id      
WHERE 
    S.revenue < 18000
GROUP BY
    Car_Model

Find the sales_table and join it to the car_table
Pull the data from the 'model' column in car_table and 'revenue' column in sales_table
Filter out all rows where revenue is less than 18000
Average remaining rows for each Car_Model

Even though AVG( ) appears early in the query, it's not actually being applied until after the WHERE statement has filtered out rows with less than 18,000 in revenue.

This is the result:



In [66]:

    
run('''
    SELECT 
        C.model as Car_Model,
        AVG(S.revenue) as Avg_Revenue
    FROM
        sales_table S
        JOIN car_table C on S.model_id = C.model_id      
    WHERE 
        S.revenue < 18000
    GROUP BY
        Car_Model
    ''')









    Out[66]:






  
    
      
      Car_Model
      Avg_Revenue
    
  
  
    
      0
         Accord
       17462.333333
    
    
      1
          Camry
       17160.000000
    
    
      2
          Civic
       15542.000000
    
    
      3
        Corolla
       14080.000000
    
    
      4
       Forester
       17438.714286

All model_ids are returned, but the averages are all much lower than they should be. That's because the query first drops all rows that have revenue greater than 18000, and then averages the remaining rows.

When you use HAVING, SQL follows these steps instead (this query should look like the one you wrote in the last challenge):

SELECT
    C.model as Car_Model,
    AVG(S.revenue) as Avg_Revenue
FROM
    sales_table S
    JOIN car_table C on S.model_id = C.model_id
GROUP BY
    Car_Model HAVING Avg_Revenue < 18000

Find the sales_table and join it to the car_table (same as before)
Pull the data from the 'model' column in car_table and 'revenue' column in sales_table (same as before)
Average the rows for each Car_Model
Return only the Car_Models whose averages are less than 18,000

And as you can see, there's a big difference in these results and the results of the query that used "WHERE" instead of HAVING:



In [67]:

    
run('''
    SELECT
        C.model as Car_Model,
        AVG(S.revenue) as Avg_Revenue
    FROM
        sales_table S
        JOIN car_table C on S.model_id = C.model_id
    GROUP BY
        Car_Model HAVING Avg_Revenue < 18000
    ''')









    Out[67]:






  
    
      
      Car_Model
      Avg_Revenue
    
  
  
    
      0
         Civic
       15788.6
    
    
      1
       Corolla
       14080.0

HAVING & WHERE in the same query:

Sometimes, you will want to use WHERE and HAVING in the same query
Just be aware of the order of the steps that SQL takes
Rule of thumb: if you're applying a function to a column, you probably don't want that column in there WHERE clause

This query is only looking at Toyotas whose revenue is less than 18,000, using WHERE to limit the results to Toyotas, and HAVING to limit the results by revenue:



In [68]:

    
run('''
    SELECT
        C.model as Car_Model,
        AVG(S.revenue) as Avg_Revenue
    FROM
        sales_table S
        JOIN car_table C on S.model_id = C.model_id
    WHERE
        C.make = 'Toyota'
    GROUP BY
        Car_Model HAVING Avg_Revenue < 18000
    ''')









    Out[68]:






  
    
      
      Car_Model
      Avg_Revenue
    
  
  
    
      0
       Corolla
       14080

Write a query with the following criteria:

SELECT clause:
- salesman's last name and average revenue, rounded to the nearest cent
FROM clause:
- sales_table joined with the salesman_table and the cust_table
WHERE clause:
- only female customers
GROUP BY clause:
- only salespeople whose average revenue was greater than 20,000

So, in plain English, we want to see salespeople whose average revenue for female customers is greater than 20,000



In [69]:

    
run('''
    SELECT NULL
    ''')
#print(having_where_cheat)

ROLLUP

SELECT
    column_a,
    SUM(column_b)
FROM
    table_x
GROUP BY
    ROLLUP(column_a)     # adds up all groups' values in a single final row

Rollup, used with GROUP BY, provides subtotals and totals for your groups
Useful for quick analysis
Varies by RDBMS



In [70]:

    
rollup_differences









    Out[70]:






  
    
      
      Microsoft SQL Server
      MySQL
      Oracle
      SQLite
    
  
  
    
      ROLLUP
       GROUP BY column_a WITH ROLLUP
       GROUP BY column_a WITH ROLLUP
       GROUP BY ROLLUP (column_a)
       not supported

Because SQLite doesn't support ROLLUP, the query below is just intended to illustrate how ROLLUP would work. Don't worry about understanding the query itself, just get familiar with what's going on in the result-set:



In [71]:

    
run('''
    SELECT
        C.model AS Car_Model, 
        SUM(S.revenue) as Sum_Revenue
    FROM
        sales_table S
        JOIN car_table C on S.model_id = C.model_id 
    GROUP BY C.model
    
    UNION ALL
    
    SELECT 
        'NULL', 
        SUM(S.revenue)
    FROM 
        sales_table S
''')









    Out[71]:






  
    
      
      Car_Model
      Sum_Revenue
    
  
  
    
      0
         Accord
        234748
    
    
      1
          Camry
        212080
    
    
      2
          Civic
        157886
    
    
      3
        Corolla
        154880
    
    
      4
       Forester
        201083
    
    
      5
        Outback
        317406
    
    
      6
          Prius
        297418
    
    
      7
         Tundra
        365751
    
    
      8
           NULL
       1941252

Conditional Expressions: IF & CASE WHEN

SELECT
    CASE WHEN column_a = x THEN some_value
         WHEN column_a = y THEN some_value2
         ELSE some_other_value
         END some_alias                         # alias optional after END
FROM
    table_name

Conditional expressions let you use IF/THEN logic in SQL
In SQLite, you have to use CASE WHEN, but in other RDBMS you may prefer to use IF, depending on your needs
More on CASE WHEN: http://www.dotnet-tricks.com/Tutorial/sqlserver/1MS1120313-Understanding-Case-Expression-in-SQL-Server-with-Example.html



In [72]:

    
conditional_differences









    Out[72]:






  
    
      
      Microsoft SQL Server
      MySQL
      Oracle
      SQLite
    
  
  
    
      IF
       IF condition PRINT value_if_true...
       IF(condition, value_if_true, value_if_false)
       IF condition THEN value_if_true ELSIF...END IF
       not supported
    
    
      CASE WHEN
                                         ✓
                                                  ✓
                                                    ✓
                   ✓

Starting with a simple example, here we'll use CASE WHEN to create a new column on the sales_table:



In [73]:

    
run('''
    SELECT
        revenue, 
        CASE WHEN revenue > 20000 THEN 'Revenue is more than 20,000' 
            END Conditional_Column
    FROM
        sales_table
    LIMIT 10
    ''')









    Out[73]:






  
    
      
      revenue
      Conditional_Column
    
  
  
    
      0
       21222
       Revenue is more than 20,000
    
    
      1
       19140
                                  
    
    
      2
       14720
                                  
    
    
      3
       19010
                                  
    
    
      4
       22022
       Revenue is more than 20,000
    
    
      5
       21409
       Revenue is more than 20,000
    
    
      6
       19894
                                  
    
    
      7
       12960
                                  
    
    
      8
       25938
       Revenue is more than 20,000
    
    
      9
       19844

CASE WHEN gives you the value "Revenue is more MORE 20,000" when revenue in that same row is greater than 20,000. Otherwise, it has no value.

Now let's add a level:



In [74]:

    
run('''
    SELECT
        revenue, 
        CASE WHEN revenue > 20000 THEN 'Revenue is MORE than 20,000' 
             WHEN revenue < 15000 THEN 'Revenue is LESS than 15,000'
             END Conditional_Column
    FROM
        sales_table
    LIMIT 10
    ''')









    Out[74]:






  
    
      
      revenue
      Conditional_Column
    
  
  
    
      0
       21222
       Revenue is MORE than 20,000
    
    
      1
       19140
                                  
    
    
      2
       14720
       Revenue is LESS than 15,000
    
    
      3
       19010
                                  
    
    
      4
       22022
       Revenue is MORE than 20,000
    
    
      5
       21409
       Revenue is MORE than 20,000
    
    
      6
       19894
                                  
    
    
      7
       12960
       Revenue is LESS than 15,000
    
    
      8
       25938
       Revenue is MORE than 20,000
    
    
      9
       19844

Now to deal with the blank spaces. You can assign an "ELSE" value to catch anything that's not included in the prior expressions:



In [75]:

    
run('''
    SELECT
        revenue,
        CASE WHEN revenue > 20000 THEN 'Revenue is MORE than 20,000' 
             WHEN revenue < 15000 THEN 'Revenue is LESS than 15,000'
             ELSE 'NEITHER'
             END Conditional_Column
    FROM
        sales_table
    LIMIT 10
    ''')









    Out[75]:






  
    
      
      revenue
      Conditional_Column
    
  
  
    
      0
       21222
       Revenue is MORE than 20,000
    
    
      1
       19140
                           NEITHER
    
    
      2
       14720
       Revenue is LESS than 15,000
    
    
      3
       19010
                           NEITHER
    
    
      4
       22022
       Revenue is MORE than 20,000
    
    
      5
       21409
       Revenue is MORE than 20,000
    
    
      6
       19894
                           NEITHER
    
    
      7
       12960
       Revenue is LESS than 15,000
    
    
      8
       25938
       Revenue is MORE than 20,000
    
    
      9
       19844
                           NEITHER

You can use values from another column as well. Remember this query from the GROUP BY lesson? It's often helpful to look at information broken out by multiple groups, but it's not especially easy to digest:



In [76]:

    
run('''
    SELECT
        C.Make as car_maker, 
        payment_type,
        ROUND(AVG(S.revenue)) as avg_revenue
    FROM
        sales_table S
        JOIN car_table C on S.model_id = C.model_id 
    GROUP BY
        C.Make, 
        payment_type
    ''')









    Out[76]:






  
    
      
      car_maker
      payment_type
      avg_revenue
    
  
  
    
      0
        Honda
          cash
       17028
    
    
      1
        Honda
       finance
       19030
    
    
      2
       Subaru
          cash
       19581
    
    
      3
       Subaru
       finance
       20363
    
    
      4
       Toyota
          cash
       20437
    
    
      5
       Toyota
       finance
       19183

Look at what's going on in that query without the AVG( ) function and the GROUP BY clause:



In [77]:

    
run('''
    SELECT
        C.Make as Car_Maker, 
        payment_type,
        S.revenue
    FROM
        sales_table S
        JOIN car_table C on S.model_id = C.model_id 
    ''')









    Out[77]:






  
    
      
      Car_Maker
      payment_type
      revenue
    
  
  
    
      0 
       Toyota
       finance
       21222
    
    
      1 
       Toyota
       finance
       19140
    
    
      2 
       Toyota
          cash
       14720
    
    
      3 
        Honda
       finance
       19010
    
    
      4 
       Toyota
       finance
       22022
    
    
      5 
       Subaru
          cash
       21409
    
    
      6 
        Honda
       finance
       19894
    
    
      7 
       Toyota
       finance
       12960
    
    
      8 
       Toyota
          cash
       25938
    
    
      9 
       Toyota
       finance
       19844
    
    
      10
        Honda
       finance
       21441
    
    
      11
        Honda
          cash
       17462
    
    
      12
        Honda
          cash
       17020
    
    
      13
       Toyota
       finance
       18040
    
    
      14
       Subaru
       finance
       18671
    
    
      15
       Toyota
       finance
       22506
    
    
      16
       Toyota
       finance
       20460
    
    
      17
       Toyota
       finance
       13440
    
    
      18
       Toyota
       finance
       18634
    
    
      19
       Subaru
          cash
       17312
    
    
      20
       Toyota
          cash
       23318
    
    
      21
       Toyota
          cash
       21780
    
    
      22
       Subaru
          cash
       19418
    
    
      23
       Subaru
          cash
       17756
    
    
      24
        Honda
       finance
       21441
    
    
      25
        Honda
          cash
       17905
    
    
      26
       Toyota
       finance
       16720
    
    
      27
        Honda
          cash
       20778
    
    
      28
       Subaru
          cash
       19667
    
    
      29
       Toyota
       finance
       19360
    
    
      ...
      ...
      ...
      ...
    
    
      70
       Toyota
       finance
       20174
    
    
      71
       Toyota
          cash
       18480
    
    
      72
       Toyota
          cash
       23474
    
    
      73
        Honda
          cash
       19231
    
    
      74
       Subaru
          cash
       18865
    
    
      75
       Toyota
          cash
       24104
    
    
      76
       Toyota
       finance
       20436
    
    
      77
       Toyota
          cash
       22990
    
    
      78
       Toyota
          cash
       19580
    
    
      79
       Subaru
       finance
       17756
    
    
      80
       Subaru
          cash
       18920
    
    
      81
       Subaru
       finance
       24646
    
    
      82
        Honda
          cash
       14006
    
    
      83
       Toyota
       finance
       24890
    
    
      84
       Subaru
       finance
       23401
    
    
      85
       Toyota
       finance
       20328
    
    
      86
       Toyota
          cash
       15200
    
    
      87
        Honda
          cash
       16189
    
    
      88
       Toyota
          cash
       15040
    
    
      89
       Subaru
       finance
       20197
    
    
      90
       Toyota
       finance
       23232
    
    
      91
        Honda
          cash
       16734
    
    
      92
       Toyota
          cash
       20960
    
    
      93
       Subaru
          cash
       24148
    
    
      94
       Toyota
          cash
       19140
    
    
      95
       Subaru
       finance
       21751
    
    
      96
       Subaru
          cash
       20413
    
    
      97
       Toyota
       finance
       20328
    
    
      98
       Toyota
       finance
       17600
    
    
      99
        Honda
       finance
       19894
    
  

100 rows × 3 columns

The result-set above is essentially what SQL is working with right before it separates the rows into groups and averages the revenue within those groups.

Now, we're going to use some CASE WHEN statements to change this a little:



In [78]:

    
run('''
    SELECT
        C.Make as Car_Maker, 
        payment_type,
        CASE WHEN payment_type = 'cash' THEN S.revenue END Cash_Revenue,
        CASE WHEN payment_type = 'finance' THEN S.revenue END Finance_Revenue
    FROM
        sales_table S
        JOIN car_table C on S.model_id = C.model_id 
    ''')









    Out[78]:






  
    
      
      Car_Maker
      payment_type
      Cash_Revenue
      Finance_Revenue
    
  
  
    
      0 
       Toyota
       finance
            
       21222
    
    
      1 
       Toyota
       finance
            
       19140
    
    
      2 
       Toyota
          cash
       14720
            
    
    
      3 
        Honda
       finance
            
       19010
    
    
      4 
       Toyota
       finance
            
       22022
    
    
      5 
       Subaru
          cash
       21409
            
    
    
      6 
        Honda
       finance
            
       19894
    
    
      7 
       Toyota
       finance
            
       12960
    
    
      8 
       Toyota
          cash
       25938
            
    
    
      9 
       Toyota
       finance
            
       19844
    
    
      10
        Honda
       finance
            
       21441
    
    
      11
        Honda
          cash
       17462
            
    
    
      12
        Honda
          cash
       17020
            
    
    
      13
       Toyota
       finance
            
       18040
    
    
      14
       Subaru
       finance
            
       18671
    
    
      15
       Toyota
       finance
            
       22506
    
    
      16
       Toyota
       finance
            
       20460
    
    
      17
       Toyota
       finance
            
       13440
    
    
      18
       Toyota
       finance
            
       18634
    
    
      19
       Subaru
          cash
       17312
            
    
    
      20
       Toyota
          cash
       23318
            
    
    
      21
       Toyota
          cash
       21780
            
    
    
      22
       Subaru
          cash
       19418
            
    
    
      23
       Subaru
          cash
       17756
            
    
    
      24
        Honda
       finance
            
       21441
    
    
      25
        Honda
          cash
       17905
            
    
    
      26
       Toyota
       finance
            
       16720
    
    
      27
        Honda
          cash
       20778
            
    
    
      28
       Subaru
          cash
       19667
            
    
    
      29
       Toyota
       finance
            
       19360
    
    
      ...
      ...
      ...
      ...
      ...
    
    
      70
       Toyota
       finance
            
       20174
    
    
      71
       Toyota
          cash
       18480
            
    
    
      72
       Toyota
          cash
       23474
            
    
    
      73
        Honda
          cash
       19231
            
    
    
      74
       Subaru
          cash
       18865
            
    
    
      75
       Toyota
          cash
       24104
            
    
    
      76
       Toyota
       finance
            
       20436
    
    
      77
       Toyota
          cash
       22990
            
    
    
      78
       Toyota
          cash
       19580
            
    
    
      79
       Subaru
       finance
            
       17756
    
    
      80
       Subaru
          cash
       18920
            
    
    
      81
       Subaru
       finance
            
       24646
    
    
      82
        Honda
          cash
       14006
            
    
    
      83
       Toyota
       finance
            
       24890
    
    
      84
       Subaru
       finance
            
       23401
    
    
      85
       Toyota
       finance
            
       20328
    
    
      86
       Toyota
          cash
       15200
            
    
    
      87
        Honda
          cash
       16189
            
    
    
      88
       Toyota
          cash
       15040
            
    
    
      89
       Subaru
       finance
            
       20197
    
    
      90
       Toyota
       finance
            
       23232
    
    
      91
        Honda
          cash
       16734
            
    
    
      92
       Toyota
          cash
       20960
            
    
    
      93
       Subaru
          cash
       24148
            
    
    
      94
       Toyota
          cash
       19140
            
    
    
      95
       Subaru
       finance
            
       21751
    
    
      96
       Subaru
          cash
       20413
            
    
    
      97
       Toyota
       finance
            
       20328
    
    
      98
       Toyota
       finance
            
       17600
    
    
      99
        Honda
       finance
            
       19894
    
  

100 rows × 4 columns

Now let's add back the ROUND() and AVG() functions and the GROUP BY statement:



In [79]:

    
run('''
    SELECT
        C.Make as Car_Maker, 
        ROUND(AVG(CASE WHEN payment_type = 'cash' THEN S.revenue END)) AS Avg_Cash_Revenue,
        ROUND(AVG(CASE WHEN payment_type = 'finance' THEN S.revenue END)) AS Avg_Finance_Revenue
    FROM
        sales_table S
        JOIN car_table C on S.model_id = C.model_id 
    GROUP BY
        C.Make
    ''')









    Out[79]:






  
    
      
      Car_Maker
      Avg_Cash_Revenue
      Avg_Finance_Revenue
    
  
  
    
      0
        Honda
       17028
       19030
    
    
      1
       Subaru
       19581
       20363
    
    
      2
       Toyota
       20437
       19183

CASE WHEN makes this same information a lot easier to read by letting you pivot the result set a little.

Write a query using CASE WHEN to look at total revenue per gender, grouped by each car model



In [80]:

    
run('''
    SELECT NULL
    ''')
#print(case_cheat)

CASE WHEN also lets you create new groups. Start by looking at the cust_table grouped by age - remember that COUNT(***) tells you how many rows are in each group (which is the same as telling you the number of customers in each group):



In [81]:

    
run('''
    SELECT
        age,
        COUNT(*) customers
    FROM
        cust_table
    GROUP BY
        age
    ''')

When you want to segment your results, but there are too many different values for GROUP BY to be helpful, use CASE WHEN to make your own groups. GROUP BY the column you created with CASE WHEN to look at your newly created segments.



In [82]:

    
run('''
    SELECT
        CASE WHEN age BETWEEN 18 AND 24 THEN '18-24 years'
             WHEN age BETWEEN 25 AND 34 THEN '25-34 years'
             WHEN age BETWEEN 35 AND 44 THEN '35-45 years'
             WHEN age BETWEEN 45 AND 54 THEN '45-54 years'
             WHEN age BETWEEN 55 AND 64 THEN '55-64 years'
             END Age_Group,
        COUNT(*) as Customers
    FROM
        cust_table
    GROUP BY
        Age_Group
    ''')









    Out[82]:






  
    
      
      Age_Group
      Customers
    
  
  
    
      0
       18-24 years
       16
    
    
      1
       25-34 years
       29
    
    
      2
       35-45 years
       24
    
    
      3
       45-54 years
       19
    
    
      4
       55-64 years
       12

Ta-DA! Useful customer segments!

Try to break up the "Customers" column into 2 columns - one for male and one for female. Keep the age segments intact.

Note that COUNT(***) cannot be wrapped around a CASE WHEN expression the way that other functions can. Try to think of a different way to get a count.
Extra challenge: try to express male and female customers as a percentage of the total for each group, rounded to 2 decimal points



In [83]:

    
run('''
    SELECT NULL
    ''')
#print(case_cheat2)

NESTING

Nested queries allow you to put a query within a query
Depending on your needs, you might put a nested query in the SELECT clause, the FROM clause, or the WHERE clause

Consider the following query. We're using a nested query in the SELECT clause to see the sum of all revenue in the sales_table, and then using it again to what percentage of total revenue can be attributed to each Car_Model.



In [84]:

    
run('''
    SELECT
        C.model AS Car_Model,
        SUM(S.revenue) AS Revenue_Per_Model,
        (SELECT SUM(revenue) FROM sales_table) AS Total_Revenue,
        SUM(S.revenue) / (SELECT SUM(revenue) FROM sales_table) AS Contribution_to_Revenue
    FROM
        sales_table S
        JOIN car_table C ON C.model_id = S.model_id
    GROUP BY
        Car_Model
    ''')









    Out[84]:






  
    
      
      Car_Model
      Revenue_Per_Model
      Total_Revenue
      Contribution_to_Revenue
    
  
  
    
      0
         Accord
       234748
       1941252
       0
    
    
      1
          Camry
       212080
       1941252
       0
    
    
      2
          Civic
       157886
       1941252
       0
    
    
      3
        Corolla
       154880
       1941252
       0
    
    
      4
       Forester
       201083
       1941252
       0
    
    
      5
        Outback
       317406
       1941252
       0
    
    
      6
          Prius
       297418
       1941252
       0
    
    
      7
         Tundra
       365751
       1941252
       0

Write a query to look at the model name and COGs for each car in car_table, then use a nested query to also look at the average COGs off all car models in a third column

Extra Challenge: add a fourth colum using another nested query to return the difference between each car model's COGs and the average COGs



In [85]:

    
run('''
    SELECT NULL
    ''')
#print(nest_cheat1)

UNION & UNION ALL

SELECT
    column_a
FROM
    table_x

UNION               # or UNION ALL

SELECT
    column_b
FROM
    table_y

UNION allows you to run a 2nd query (or 3rd or 4th), the results will be ordered by default with the results of the first query
UNION ALL ensures that the results in the result set appear in order that the queries are written
The number of columns in each query must be the same in order for UNION & UNION ALL to work

Starting with something simple (and a little nonsensical), UNION basically lets you run two entirely separate queries. Technically, they could have nothing to do with each other:



In [86]:

    
run('''
    SELECT
        model
    FROM
        car_table
    WHERE
        model = 'Tundra'
    
    UNION
    
    SELECT 
        first_name
    FROM
        salesman_table
    WHERE first_name = 'Jared'
    ''')

Some things to note:

Although these queries and their results are unrelated, the column header is dictated by the query that appears first
Even though the query for "Tundra" is first, "Tundra" is second in the results. UNION will sort all results according to the normal default rules, acending order.
Replace UNION with UNION ALL and run the query again. What changes?

Use UNION to join two queries. The first should have two columns: car model and COGs per car. The second query should show you to average COGs for all the car models, rounded to cents. You want the the average COGs to appear in the last row.

Remember that united queries need to have the same number of columns.



In [87]:

    
run('''
    SELECT NULL
    ''')
#print(union_cheat1)

Consider the issue we had before, where SQLite didn't support WITH ROUNDUP. We used this query as a workaround. Does it make sense now?



In [88]:

    
run('''
    SELECT
        C.model AS Car_Model, 
        SUM(S.revenue) as Sum_Revenue
    FROM
        sales_table S
        JOIN car_table C on S.model_id = C.model_id 
    GROUP BY C.model
    
    UNION ALL
    
    SELECT 
        'NULL', 
        SUM(S.revenue)
    FROM 
        sales_table S
''')









    Out[88]:






  
    
      
      Car_Model
      Sum_Revenue
    
  
  
    
      0
         Accord
        234748
    
    
      1
          Camry
        212080
    
    
      2
          Civic
        157886
    
    
      3
        Corolla
        154880
    
    
      4
       Forester
        201083
    
    
      5
        Outback
        317406
    
    
      6
          Prius
        297418
    
    
      7
         Tundra
        365751
    
    
      8
           NULL
       1941252

Optimization:

Non-optimized queries can cause a lot of problems because tables frequently have thousands or millions of rows:

If you haven't optimized your query, it might:

Take several minutes (or even hours) to return the information you're requesting
Crash your computer
Muck up the server's processes, and you'll face the wrath of your company's system administrators once they figure out that you are the reason why the whole system has slowed down and everyone is sending them angry emails (this will probably happen to you no matter what. It's a rite of passage).

Find a few more useful optimization tips here: http://hungred.com/useful-information/ways-optimize-sql-queries/

Some of these seem strange, because we're going ling you NOT to do a bunch of things that you've learned how to do. Stick to this principal: if you're dealing with a small table, you can break a few of these rules. The larger the table, the fewer rules you can break.

DO name specific columns in the SELECT CLAUSE:



In [89]:

    
run(''' 
    SELECT
        date,
        revenue
    FROM 
        sales_table
    ''').head()

DON'T use an asterisk unless you absolutely have to:

This can put a lot of strain on servers. Only use if you know for certain that your using a small table



In [90]:

    
run(''' 
    SELECT
        *
    FROM 
        sales_table
    ''').head()









    Out[90]:






  
    
      
      id
      model_id
      customer_id
      revenue
      payment_type
      salesman_id
      date
    
  
  
    
      0
       54858
       36
       237906
       21222
       finance
       276
         1/7/2014
    
    
      1
       43161
       20
       967016
       19140
       finance
       225
        1/26/2014
    
    
      2
       40112
       46
       819010
       14720
          cash
       147
        1/17/2014
    
    
      3
       92495
       31
       633030
       19010
       finance
       215
        1/13/2014
    
    
      4
       78000
       51
       341877
       22022
       finance
       803
        1/12/2014

DO use LIKE on small tables and in simple queries:

LIKE is helpful if you know where to find something but you can't quite remember what it's called. Try to use a wildcard sparingly - don't use 2 when 1 will suffice:



In [91]:

    
run('''
    SELECT
        model_id, model
    FROM 
        car_table
    WHERE
        model LIKE '%undra'
    ''')









    Out[91]:






  
    
      
      model_id
      model
    
  
  
    
      0
       36
       Tundra

DON'T use LIKE on large tables or when using JOINs:



In [92]:

    
run('''
    SELECT 
        C.model, 
        AVG(revenue)
    FROM
        sales_table S
        JOIN car_table C on S.model_id = C.model_id 
    WHERE
        C.model LIKE '%undra'
''')









    Out[92]:






  
    
      
      model
      AVG(revenue)
    
  
  
    
      0
       Tundra
       22859.4375

If you want to look at average revenue for car models that are like "%undra", run the LIKE query on the small table (car_table) first to figure out exacly what you're looking for, then use that information to search for the data you need from the sales_table

DO dip your toe in by starting with a small data set

Use WHERE to only view a few days of data at first. If the query runs quickly, add a few days at a time. If it starts to run slowly, run just a few days at a time and paste results into excel to combine results (or use Python...ask me later!!!).

The query below won't work because SQLite doesn't recognize dates, but remember these concepts when working with other RDBMS



In [93]:

    
run('''
    SELECT
        revenue,
        date
    FROM
        sales_table
    WHERE
        date = '1/1/14'
    ''')









    Out[93]:






  
    
      
      revenue
      date

DO use a UNION to look at result-sets that aren't mutually exclusive

Let's say you were interested in seeing all Toyotas as well as cars with COGs of more than 13000. Write a query for the first group, then a query for the second group, and unite them with UNION. The result set won't show you repeats - if a row matches both result sets, it will only display once.



In [94]:

    
run('''
    SELECT
        make, model, cogs
    FROM
        car_table
    WHERE 
        make = 'Toyota'
    
    UNION
    
    SELECT 
        make, model, cogs
    FROM
        car_table
    WHERE
        cogs > 13000
    ''')









    Out[94]:






  
    
      
      make
      model
      cogs
    
  
  
    
      0
        Honda
         Accord
       13263
    
    
      1
       Subaru
       Forester
       13317
    
    
      2
       Subaru
        Outback
       14937
    
    
      3
       Toyota
          Camry
       13200
    
    
      4
       Toyota
        Corolla
        9600
    
    
      5
       Toyota
          Prius
       14520
    
    
      6
       Toyota
         Tundra
       15720

DON'T use OR when a UNION will generate the same results

Note that we'll get the same results as above, but this query could run MUCH slower on a large table. It's tempting to use OR because it's faster to write, but unless you're dealing with very small tables, avoid the temptation. In 5 years of doing business analytics with SQL, I never used OR once. It's slow. Use a UNION.



In [95]:

    
run('''
    SELECT
        make, model, cogs
    FROM
        car_table
    WHERE
        make = 'Toyota' OR cogs > 13000
    ''')









    Out[95]:






  
    
      
      make
      model
      cogs
    
  
  
    
      0
       Toyota
          Camry
       13200
    
    
      1
       Toyota
        Corolla
        9600
    
    
      2
       Toyota
          Prius
       14520
    
    
      3
        Honda
         Accord
       13263
    
    
      4
       Subaru
        Outback
       14937
    
    
      5
       Subaru
       Forester
       13317
    
    
      6
       Toyota
         Tundra
       15720

DON'T use negative filters when a positive filter is possible

Let's say you want to look at cars made by Toyato and Honda, but you don't care about Subaru. It might be tempting to use a negative filter:



In [96]:

    
run('''
    SELECT
        *
    FROM
        car_table
    WHERE
        make != 'Subaru'
    ''')









    Out[96]:






  
    
      
      model_id
      make
      model
      sticker_price
      cogs
    
  
  
    
      0
       20
       Toyota
         Camry
       22000
       13200
    
    
      1
       46
       Toyota
       Corolla
       16000
        9600
    
    
      2
       51
       Toyota
         Prius
       24200
       14520
    
    
      3
       19
        Honda
         Civic
       18190
       10914
    
    
      4
       31
        Honda
        Accord
       22105
       13263
    
    
      5
       36
       Toyota
        Tundra
       26200
       15720

On a big table, this will run much more slowly than if you use a positive filter. Try this instead - it might require a little extra typing, but it will run much faster:



In [97]:

    
run('''
    SELECT
        *
    FROM
        car_table
    WHERE
        make in ('Toyota', 'Honda')
    ''')









    Out[97]:






  
    
      
      model_id
      make
      model
      sticker_price
      cogs
    
  
  
    
      0
       20
       Toyota
         Camry
       22000
       13200
    
    
      1
       46
       Toyota
       Corolla
       16000
        9600
    
    
      2
       51
       Toyota
         Prius
       24200
       14520
    
    
      3
       19
        Honda
         Civic
       18190
       10914
    
    
      4
       31
        Honda
        Accord
       22105
       13263
    
    
      5
       36
       Toyota
        Tundra
       26200
       15720

Wrapping Up:

Debugging:

If you run into errors when you start writing your own queries, here are some things to make sure your query has:

The right names for columns in the SELECT clause
Columns that can be found in the tables in the FROM clause
Consistent use of aliases throughout (if using aliases)
Joined tables on the corresponding column and proper aliases to indicate each table

The correct order of clauses:

    SELECT
    FROM
    JOIN...ON 
    WHERE
    GROUP BY
    UNION
    ORDER BY
    LIMIT

Consistent use of capitalization for variables in quotes
Fuctions and opperators for real numbers, not integers
The same number of columns/expressions in their SELECT clauses of your queries when using UNION

Gain a deeper understanding:

http://tech.pro/tutorial/1555/10-easy-steps-to-a-complete-understanding-of-sql

Practice on other databases:

http://sqlzoo.net/wiki/SELECT_.._WHERE

Sample Queries for Business Analysis:

Let's say you recently opened a car dealership, and you now have one month's worth of sales data. You want to know how your sales team is doing.

Start by looking at the number of cars each person sold last month. The names of the sales team and the list of transactions are on different tables in your database, but SQL can help you with that:



In [98]:

    
run('''
    SELECT
        first_name ||  ' ' || last_name as Salesperson,
        COUNT(*) as Cars_Sold
    FROM
        sales_table S
        JOIN salesman_table M ON S.salesman_id = M.id
    GROUP BY 
        Salesperson
    ORDER BY 
        Cars_Sold DESC
    ''')









    Out[98]:






  
    
      
      Salesperson
      Cars_Sold
    
  
  
    
      0
              Jared Case
       14
    
    
      1
          Claudine Hatch
       11
    
    
      2
            Michael Hill
       11
    
    
      3
          Rosemarie Self
       11
    
    
      4
              Elton Elzy
       10
    
    
      5
            Matthew Luna
       10
    
    
      6
            Joseph Seney
        9
    
    
      7
       Justin Avellaneda
        9
    
    
      8
        Samantha Douglas
        8
    
    
      9
          Kathleen March
        7

Add on the average amount of revenue made per sale:



In [99]:

    
run('''
    SELECT
        first_name ||  ' ' || last_name as Salesperson,
        COUNT(*) as Cars_Sold, 
        ROUND(AVG(revenue)) as Revenue_per_Sale
    FROM
        sales_table S
        JOIN salesman_table M ON S.salesman_id = M.id
    GROUP BY 
        Salesperson
    ORDER BY 
        Cars_Sold DESC
    ''')









    Out[99]:






  
    
      
      Salesperson
      Cars_Sold
      Revenue_per_Sale
    
  
  
    
      0
              Jared Case
       14
       18952
    
    
      1
          Claudine Hatch
       11
       17890
    
    
      2
            Michael Hill
       11
       18323
    
    
      3
          Rosemarie Self
       11
       19462
    
    
      4
              Elton Elzy
       10
       19467
    
    
      5
            Matthew Luna
       10
       21349
    
    
      6
            Joseph Seney
        9
       20002
    
    
      7
       Justin Avellaneda
        9
       19242
    
    
      8
        Samantha Douglas
        8
       19149
    
    
      9
          Kathleen March
        7
       21277

Make it easier to compare the average revenue of Jared's sales to the average revenue of per sale overall by adding a column to see by what percent each salesperson's sales are more or less than average:



In [100]:

    
run('''
    SELECT
        first_name ||  ' ' || last_name as Salesperson,
        COUNT(*) as Cars_Sold, 
        ROUND(AVG(revenue), 2) as Rev_per_Sale,
        ROUND((((AVG(revenue) 
            - (SELECT AVG(revenue) from sales_table))
            /(SELECT AVG(revenue) from sales_table))*100), 1) || ' %'
            as RPS_Compared_to_Avg
    FROM
        sales_table S
        JOIN salesman_table M ON S.salesman_id = M.id
    GROUP BY 
        Salesperson
    ORDER BY 
        Cars_Sold DESC
    ''')









    Out[100]:






  
    
      
      Salesperson
      Cars_Sold
      Rev_per_Sale
      RPS_Compared_to_Avg
    
  
  
    
      0
              Jared Case
       14
       18951.64
       -2.4 %
    
    
      1
          Claudine Hatch
       11
       17890.00
       -7.8 %
    
    
      2
            Michael Hill
       11
       18323.45
       -5.6 %
    
    
      3
          Rosemarie Self
       11
       19462.27
        0.3 %
    
    
      4
              Elton Elzy
       10
       19466.90
        0.3 %
    
    
      5
            Matthew Luna
       10
       21349.20
       10.0 %
    
    
      6
            Joseph Seney
        9
       20002.22
        3.0 %
    
    
      7
       Justin Avellaneda
        9
       19242.44
       -0.9 %
    
    
      8
        Samantha Douglas
        8
       19149.13
       -1.4 %
    
    
      9
          Kathleen March
        7
       21277.14
        9.6 %

So maybe Jared is just selling cheaper cars.

Let's go further and compare the sale price of each car against the sticker price to see how low Jared was willing to negotiate with customers. Sticker price is in anther table, but again, that's no problem with SQL:



In [101]:

    
run('''
    SELECT
        first_name ||  ' ' || last_name as Salesperson,
        COUNT(*) as Cars_Sold, 
        '$ ' || ROUND(AVG(revenue), 2)  as Rev_per_Sale,
        ROUND((((AVG(revenue) 
            - (SELECT AVG(revenue) from sales_table where salesman_id != 215))
            /(SELECT AVG(revenue) from sales_table where salesman_id != 215))*100), 1) || ' %'
            AS RPS_Compared_to_Avg,
        ROUND((1-(SUM(revenue) / SUM(sticker_price)))*100, 1) || ' %' as Avg_Customer_Discount
    FROM
        sales_table S
        JOIN salesman_table M ON S.salesman_id = M.id
        JOIN car_table C ON S.model_id = C.model_id
    GROUP BY 
        Salesperson
    ORDER BY 
        Cars_Sold DESC
    ''')









    Out[101]:






  
    
      
      Salesperson
      Cars_Sold
      Rev_per_Sale
      RPS_Compared_to_Avg
      Avg_Customer_Discount
    
  
  
    
      0
              Jared Case
       14
       $ 18951.64
       -2.7 %
       15.7 %
    
    
      1
          Claudine Hatch
       11
        $ 17890.0
       -8.2 %
       13.9 %
    
    
      2
            Michael Hill
       11
       $ 18323.45
       -6.0 %
       14.7 %
    
    
      3
          Rosemarie Self
       11
       $ 19462.27
       -0.1 %
       12.8 %
    
    
      4
              Elton Elzy
       10
        $ 19466.9
       -0.1 %
       13.4 %
    
    
      5
            Matthew Luna
       10
        $ 21349.2
        9.6 %
       10.3 %
    
    
      6
            Joseph Seney
        9
       $ 20002.22
        2.6 %
       12.5 %
    
    
      7
       Justin Avellaneda
        9
       $ 19242.44
       -1.3 %
       13.4 %
    
    
      8
        Samantha Douglas
        8
       $ 19149.13
       -1.7 %
       16.1 %
    
    
      9
          Kathleen March
        7
       $ 21277.14
        9.2 %
        9.5 %

Looks like Jared is letting customers negotiate prices down much more than his peers.

But is this a real problem? How much is each salesperson contributing to our gross profits?



In [102]:

    
run('''
    SELECT
        first_name ||  ' ' || last_name as Salesperson,
        COUNT(*) as Cars_Sold, 
        '$ ' || ROUND(AVG(revenue), 2)  as Rev_per_Sale,
        ROUND((((AVG(revenue) 
            - (SELECT AVG(revenue) from sales_table where salesman_id != 215))
            /(SELECT AVG(revenue) from sales_table where salesman_id != 215))*100), 1) || ' %'
            AS RPS_Compared_to_Peers,
        ROUND((1-(SUM(revenue) / SUM(sticker_price)))*100, 1) || ' %' as Avg_Customer_Discount, 
        ROUND(((SUM(revenue)-sum(C.cogs))
            /(SELECT SUM(revenue)-sum(cogs) FROM sales_table S join car_table C on S.model_id = C.model_id))*100, 1) || ' %' as Gross_Profit_Contribution
    FROM
        sales_table S
        JOIN salesman_table M ON S.salesman_id = M.id
        JOIN car_table C ON S.model_id = C.model_id
    GROUP BY 
        Salesperson
    ORDER BY 
        Cars_Sold DESC
    ''')









    Out[102]:






  
    
      
      Salesperson
      Cars_Sold
      Rev_per_Sale
      RPS_Compared_to_Peers
      Avg_Customer_Discount
      Gross_Profit_Contribution
    
  
  
    
      0
              Jared Case
       14
       $ 18951.64
       -2.7 %
       15.7 %
       12.8 %
    
    
      1
          Claudine Hatch
       11
        $ 17890.0
       -8.2 %
       13.9 %
       10.0 %
    
    
      2
            Michael Hill
       11
       $ 18323.45
       -6.0 %
       14.7 %
       10.0 %
    
    
      3
          Rosemarie Self
       11
       $ 19462.27
       -0.1 %
       12.8 %
       11.2 %
    
    
      4
              Elton Elzy
       10
        $ 19466.9
       -0.1 %
       13.4 %
       10.0 %
    
    
      5
            Matthew Luna
       10
        $ 21349.2
        9.6 %
       10.3 %
       11.9 %
    
    
      6
            Joseph Seney
        9
       $ 20002.22
        2.6 %
       12.5 %
        9.5 %
    
    
      7
       Justin Avellaneda
        9
       $ 19242.44
       -1.3 %
       13.4 %
        8.9 %
    
    
      8
        Samantha Douglas
        8
       $ 19149.13
       -1.7 %
       16.1 %
        7.3 %
    
    
      9
          Kathleen March
        7
       $ 21277.14
        9.2 %
        9.5 %
        8.4 %

SQL really lets you dig.

Some other quick examples - we could do a gender breakdown of customers per car model and add a total at the bottom:



In [103]:

    
run('''
    SELECT
        C.model as Car_Model, 
        ROUND(SUM(CASE WHEN CUST.gender = 'female' THEN 1 END)/(COUNT(S.id)*1.0), 2) AS '% Female Customers',
        ROUND(SUM(CASE WHEN CUST.gender = 'male' THEN 1 END)/(COUNT(S.id)*1.0), 2) AS '% Male Customers'
    FROM
        sales_table S
        JOIN car_table C on S.model_id = C.model_id 
        JOIN cust_table CUST on S.customer_id = CUST.customer_id
    GROUP BY 
        Car_Model
    
    UNION ALL
    
    SELECT
        'Total:', 
        ROUND(SUM(CASE WHEN CUST.gender = 'female' THEN 1 END)/(COUNT(S.id)*1.0), 2) AS '% Female Customers',
        ROUND(SUM(CASE WHEN CUST.gender = 'male' THEN 1 END)/(COUNT(S.id)*1.0), 2) AS '% Male Customers'
    FROM
        sales_table S
        JOIN cust_table CUST on S.customer_id = CUST.customer_id
    ''')









    Out[103]:






  
    
      
      Car_Model
      % Female Customers
      % Male Customers
    
  
  
    
      0
         Accord
       0.50
       0.50
    
    
      1
          Camry
       0.45
       0.55
    
    
      2
          Civic
       0.30
       0.70
    
    
      3
        Corolla
       0.55
       0.45
    
    
      4
       Forester
       0.64
       0.36
    
    
      5
        Outback
       0.33
       0.67
    
    
      6
          Prius
       0.14
       0.86
    
    
      7
         Tundra
       0.50
       0.50
    
    
      8
         Total:
       0.42
       0.58

Easily create age groups and see how aggressively each group negotiates (judged by the difference between the actual sale amount and the sticker price):



In [104]:

    
run('''
    SELECT
        CASE WHEN age BETWEEN 18 AND 24 THEN '18-24 years'
             WHEN age BETWEEN 25 AND 34 THEN '25-34 years'
             WHEN age BETWEEN 35 AND 44 THEN '35-44 years'
             WHEN age BETWEEN 45 AND 54 THEN '45-54 years'
             WHEN age BETWEEN 55 AND 64 THEN '55-64 years'
             END Age_Group,
        ROUND((SUM(S.revenue)-SUM(C.sticker_price))/SUM(C.sticker_price), 2) as '% Paid Below Sticker Price'
    FROM
        sales_table S
        JOIN car_table C on S.model_id = C.model_id
        JOIN cust_table CUST on S.customer_id = CUST.customer_id
    GROUP BY
        Age_Group
    ''')









    Out[104]:






  
    
      
      Age_Group
      % Paid Below Sticker Price
    
  
  
    
      0
       18-24 years
      -0.14
    
    
      1
       25-34 years
      -0.12
    
    
      2
       35-44 years
      -0.11
    
    
      3
       45-54 years
      -0.16
    
    
      4
       55-64 years
      -0.14

	cid	name	type
0	0	id	INTEGER
1	1	model_id	INTEGER
2	2	customer_id	INTEGER
3	3	revenue	INTEGER
4	4	payment_type	TEXT
5	5	salesman_id	INTEGER
6	6	date	TEXT

	id	model_id	customer_id	revenue	payment_type	salesman_id	date
0	54858	36	237906	21222	finance	276	1/7/2014
1	43161	20	967016	19140	finance	225	1/26/2014
2	40112	46	819010	14720	cash	147	1/17/2014
3	92495	31	633030	19010	finance	215	1/13/2014
4	78000	51	341877	22022	finance	803	1/12/2014
5	13154	75	720210	21409	cash	215	1/20/2014
6	36535	31	908558	19894	finance	215	1/30/2014
7	22813	46	705508	12960	finance	813	1/29/2014
8	56245	36	248621	25938	cash	492	1/19/2014
9	88118	51	341344	19844	finance	215	1/23/2014
10	84469	31	733566	21441	finance	492	1/20/2014
11	37412	31	750195	17462	cash	949	1/11/2014
12	68513	31	461723	17020	cash	147	1/18/2014
13	74380	20	468665	18040	finance	803	1/5/2014
14	24047	75	556188	18671	finance	225	1/12/2014
15	40603	51	241759	22506	finance	949	1/4/2014
16	69883	20	161369	20460	finance	215	1/25/2014
17	43338	46	731692	13440	finance	225	1/29/2014
18	39727	51	656750	18634	finance	215	1/1/2014
19	21022	22	619020	17312	cash	492	1/14/2014
20	18303	36	413891	23318	cash	215	1/17/2014
21	28176	20	965672	21780	cash	862	1/11/2014
22	62604	75	217720	19418	cash	680	1/30/2014
23	83930	22	946265	17756	cash	813	1/13/2014
24	87626	31	140795	21441	finance	225	1/13/2014
25	10682	31	145479	17905	cash	949	1/20/2014
26	23444	20	961650	16720	finance	215	1/1/2014
27	87998	31	457742	20778	cash	680	1/3/2014
28	95912	75	978159	19667	cash	949	1/25/2014
29	74410	20	474218	19360	finance	276	1/6/2014
...	...	...	...	...	...	...	...
70	61014	36	253239	20174	finance	215	1/21/2014
71	18946	20	614301	18480	cash	680	1/28/2014
72	75834	51	168495	23474	cash	492	1/10/2014
73	57366	31	699431	19231	cash	813	1/12/2014
74	57711	22	528354	18865	cash	225	1/4/2014
75	65870	36	751098	24104	cash	492	1/23/2014
76	22147	36	480153	20436	finance	225	1/23/2014
77	99759	51	208677	22990	cash	492	1/16/2014
78	64406	20	349494	19580	cash	276	1/16/2014
79	55585	22	206310	17756	finance	276	1/1/2014
80	24710	75	185223	18920	cash	147	1/16/2014
81	53831	75	656181	24646	finance	803	1/18/2014
82	97494	19	393613	14006	cash	862	1/14/2014
83	13004	36	488910	24890	finance	813	1/12/2014
84	30779	75	846630	23401	finance	147	1/25/2014
85	29708	51	534633	20328	finance	215	1/29/2014
86	14082	46	909110	15200	cash	813	1/28/2014
87	82119	19	628031	16189	cash	862	1/12/2014
88	31815	46	743097	15040	cash	147	1/25/2014
89	32345	22	476047	20197	finance	862	1/19/2014
90	63526	51	862765	23232	finance	276	1/23/2014
91	20256	19	597655	16734	cash	215	1/16/2014
92	15930	36	486534	20960	cash	862	1/26/2014
93	68446	75	339830	24148	cash	225	1/22/2014
94	68330	20	283797	19140	cash	862	1/9/2014
95	85407	22	635204	21751	finance	803	1/18/2014
96	42428	75	619016	20413	cash	147	1/11/2014
97	37620	51	183947	20328	finance	949	1/3/2014
98	96344	20	731677	17600	finance	680	1/15/2014
99	53062	31	907549	19894	finance	225	1/28/2014

	model_id	make	model	sticker_price	cogs
0	20	Toyota	Camry	22000	13200
1	46	Toyota	Corolla	16000	9600
2	51	Toyota	Prius	24200	14520
3	19	Honda	Civic	18190	10914
4	31	Honda	Accord	22105	13263

	id	first_name	last_name
0	949	Samantha	Douglas
1	215	Jared	Case
2	813	Michael	Hill
3	680	Claudine	Hatch
4	276	Joseph	Seney

	customer_id	gender	revenue
0	237906	female	21222
1	967016	male	19140
2	819010	male	14720
3	633030	male	19010
4	341877	female	22022
5	720210	male	21409
6	908558	male	19894
7	705508	female	12960
8	248621	male	25938
9	341344	female	19844
10	733566	female	21441
11	750195	female	17462
12	461723	male	17020
13	468665	female	18040
14	556188	female	18671
15	241759	male	22506
16	161369	female	20460
17	731692	female	13440
18	656750	male	18634
19	619020	female	17312
20	413891	male	23318
21	965672	male	21780
22	217720	female	19418
23	946265	male	17756
24	140795	female	21441
25	145479	male	17905
26	961650	female	16720
27	457742	male	20778
28	978159	male	19667
29	474218	male	19360
...	...	...	...
70	253239	male	20174
71	614301	female	18480
72	168495	male	23474
73	699431	female	19231
74	528354	female	18865
75	751098	male	24104
76	480153	female	20436
77	208677	male	22990
78	349494	female	19580
79	206310	female	17756
80	185223	male	18920
81	656181	female	24646
82	393613	female	14006
83	488910	female	24890
84	846630	male	23401
85	534633	male	20328
86	909110	female	15200
87	628031	male	16189
88	743097	female	15040
89	476047	male	20197
90	862765	male	23232
91	597655	male	16734
92	486534	male	20960
93	339830	male	24148
94	283797	male	19140
95	635204	female	21751
96	619016	female	20413
97	183947	male	20328
98	731677	male	17600
99	907549	female	19894

	Microsoft SQL Server	MySQL	Oracle	SQLite
JOIN or INNER JOIN	✓	✓	✓	✓
LEFT JOIN or LEFT OUTER JOIN	✓	✓	✓	✓
RIGHT JOIN or RIGHT OUTER JOIN	✓	✓	✓	not supported
OUTER JOIN or FULL OUTER JOIN	✓	not supported	✓	not supported

	Owner_Name	Dog_Name
0	Michael Martinez	Chewbacca
1	Gilbert Henkel	Max
2	May Reeves	Georgie
3	Elizabeth Minier	Sunny
4	Donna Ona	Daisy
5	Daniel Abner	Jake
6	Keith Bursey	Muffin
7	Jane Garcia	Admiral
8	Sarah Flores	Bogie
9	David Brace	Fritz

	Owner_Name	Cat_Name
0	Michael Martinez	Mrs. Paws
1	Gilbert Henkel	Mittens
2	May Reeves	Clover
3	Elizabeth Minier	Sweetcakes
4	Donna Ona	Barnie
5	Monika Turner	Cara
6	Adrian Hardiman	Annabelle
7	Loyd Mossman	Garfield
8	Heather Emery	Engine
9	Lauren Larson	Midnight

	id	model	revenue	cogs	gross_profit
0	54858	Tundra	21222	15720	5502
1	43161	Camry	19140	13200	5940
2	40112	Corolla	14720	9600	5120
3	92495	Accord	19010	13263	5747
4	78000	Prius	22022	14520	7502

	last_name	first_name	full_name
0	Douglas	Samantha	Douglas, Samantha
1	Case	Jared	Case, Jared
2	Hill	Michael	Hill, Michael
3	Hatch	Claudine	Hatch, Claudine
4	Seney	Joseph	Seney, Joseph
5	March	Kathleen	March, Kathleen
6	Self	Rosemarie	Self, Rosemarie
7	Avellaneda	Justin	Avellaneda, Justin
8	Elzy	Elton	Elzy, Elton
9	Luna	Matthew	Luna, Matthew

	Car_Model	Total_Revenue
0	Accord	234748
1	Camry	212080
2	Civic	157886
3	Corolla	154880
4	Forester	201083
5	Outback	317406
6	Prius	297418
7	Tundra	365751

	Car_Model	Min_to_Max_Sale	Range	Average_Sale
0	Tundra	20174 - 25938	5764	22859.44
1	Prius	18150 - 23958	5808	21244.14
2	Outback	18671 - 24646	5975	21160.40
3	Accord	17020 - 21441	4421	19562.33
4	Camry	16720 - 21780	5060	19280.00
5	Forester	16646 - 21751	5105	18280.27
6	Civic	13642 - 18008	4366	15788.60
7	Corolla	12640 - 15360	2720	14080.00

	car_caker	payment_type	avg_revenue
0	Honda	cash	17028
1	Honda	finance	19030
2	Subaru	cash	19581
3	Subaru	finance	20363
4	Toyota	cash	20437
5	Toyota	finance	19183

	Car_Model	Avg_Revenue
0	Accord	17462.333333
1	Camry	17160.000000
2	Civic	15542.000000
3	Corolla	14080.000000
4	Forester	17438.714286

	Microsoft SQL Server	MySQL	Oracle	SQLite
IF	IF condition PRINT value_if_true...	IF(condition, value_if_true, value_if_false)	IF condition THEN value_if_true ELSIF...END IF	not supported
CASE WHEN	✓	✓	✓	✓

	revenue	Conditional_Column
0	21222	Revenue is more than 20,000
1	19140
2	14720
3	19010
4	22022	Revenue is more than 20,000
5	21409	Revenue is more than 20,000
6	19894
7	12960
8	25938	Revenue is more than 20,000
9	19844

	age	customers
0	18	4
1	19	2
2	20	2
3	21	2
4	22	2
5	23	1
6	24	3
7	25	1
8	26	1
9	27	3
10	28	3
11	29	4
12	30	1
13	31	5
14	32	3
15	33	6
16	34	2
17	36	3
18	38	4
19	39	1
20	40	3
21	41	4
22	42	5
23	43	2
24	44	2
25	45	3
26	46	4
27	47	1
28	48	2
29	49	1
30	50	1
31	51	1
32	52	3
33	53	1
34	54	2
35	55	1
36	56	3
37	57	3
38	58	4
39	59	1

	Age_Group	Customers
0	18-24 years	16
1	25-34 years	29
2	35-45 years	24
3	45-54 years	19
4	55-64 years	12

	Car_Model	Revenue_Per_Model	Total_Revenue
0	Accord	234748	1941252
1	Camry	212080	1941252
2	Civic	157886	1941252
3	Corolla	154880	1941252
4	Forester	201083	1941252
5	Outback	317406	1941252
6	Prius	297418	1941252
7	Tundra	365751	1941252

	Salesperson	Cars_Sold
0	Jared Case	14
1	Claudine Hatch	11
2	Michael Hill	11
3	Rosemarie Self	11
4	Elton Elzy	10
5	Matthew Luna	10
6	Joseph Seney	9
7	Justin Avellaneda	9
8	Samantha Douglas	8
9	Kathleen March	7