5T_데이터 분석을 위한 SQL 실습 (4) - SQL Advanced

특정 카테고리에 포함된 영화들의 렌탈 횟수

  • rental, inventory, film, film_category, category
  • "Comedy", "Sports", "Family" 카테고리에 포함되는 영화들의 렌탈 횟수

In [1]:
import pymysql

In [2]:
db = pymysql.connect(
    "db.fastcamp.us",
    "root",
    "dkstncks",
    "sakila",
    charset='utf8',
)

In [3]:
rental_df = pd.read_sql("SELECT * FROM rental;", db)
inventory_df = pd.read_sql("SELECT * FROM inventory;", db)
film_df = pd.read_sql("SELECT * FROM film;", db)
film_category_df = pd.read_sql("SELECT * FROM film_category;", db)
category_df = pd.read_sql("SELECT * FROM category;", db)

In [4]:
rental_df.head(1)


Out[4]:
rental_id rental_date inventory_id customer_id return_date staff_id last_update
0 1 2005-05-24 22:53:30 367 130 2005-05-26 22:04:30 1 2006-02-15 21:30:53

In [5]:
inventory_df.head(1)


Out[5]:
inventory_id film_id store_id last_update
0 1 1 1 2006-02-15 05:09:17

In [6]:
film_df.head(1)


Out[6]:
film_id title description release_year language_id original_language_id rental_duration rental_rate length replacement_cost rating special_features last_update
0 1 ACADEMY DINOSAUR A Epic Drama of a Feminist And a Mad Scientist... 2006 1 None 6 0.99 86 20.99 PG Deleted Scenes,Behind the Scenes 2006-02-15 05:03:42

In [7]:
film_category_df.head(1)


Out[7]:
film_id category_id last_update
0 1 6 2006-02-15 05:07:09

In [8]:
category_df.head(1)


Out[8]:
category_id name last_update
0 1 Action 2006-02-15 04:46:27

In [25]:
SQL_QUERY = """
    SELECT
        c.category_id category_id,
        c.name category_name,
        COUNT(*) rentals_per_category
    FROM
        rental r
            JOIN inventory i ON r.inventory_id = i.inventory_id
            JOIN film f ON f.film_id = i.film_id
            JOIN film_category fc ON fc.film_id = f.film_id
            JOIN category c ON fc.category_id = c.category_id
    WHERE
        c.name IN ("Family", "Sports", "Comedy")
    GROUP BY
        category_id
    ORDER BY rentals_per_category DESC
    ;
"""

pd.read_sql(SQL_QUERY, db)


Out[25]:
category_id category_name rentals_per_category
0 15 Sports 1179
1 8 Family 1096
2 5 Comedy 941

In [ ]:

Store 1의 등급별 매출 중 "R", "PG-13"의 매출

  • film, payment, inventory, rental

In [29]:
payment_df = pd.read_sql("SELECT * FROM payment;", db)

In [26]:
film_df.head(1)


Out[26]:
film_id title description release_year language_id original_language_id rental_duration rental_rate length replacement_cost rating special_features last_update
0 1 ACADEMY DINOSAUR A Epic Drama of a Feminist And a Mad Scientist... 2006 1 None 6 0.99 86 20.99 PG Deleted Scenes,Behind the Scenes 2006-02-15 05:03:42

In [30]:
payment_df.head(1)


Out[30]:
payment_id customer_id staff_id rental_id amount payment_date last_update
0 1 1 1 76.0 2.99 2005-05-25 11:30:37 2006-02-15 22:12:30

In [31]:
inventory_df.head(1)


Out[31]:
inventory_id film_id store_id last_update
0 1 1 1 2006-02-15 05:09:17

In [32]:
rental_df.head(1)


Out[32]:
rental_id rental_date inventory_id customer_id return_date staff_id last_update
0 1 2005-05-24 22:53:30 367 130 2005-05-26 22:04:30 1 2006-02-15 21:30:53

In [43]:
SQL_QUERY = """
    SELECT
        i.store_id store_id,
        f.rating rating,
        SUM(p.amount) total_revenue
    FROM
        payment p
            JOIN rental r ON p.rental_id = r.rental_id
            JOIN inventory i ON r.inventory_id = i.inventory_id
            JOIN film f ON i.film_id = f.film_id
    WHERE
        i.store_id = 1
        AND f.rating IN ("PG-13", "R")
    GROUP BY
        store_id,
        rating
    ;
"""
pd.read_sql(SQL_QUERY, db)


Out[43]:
store_id rating total_revenue
0 1 PG-13 8091.51
1 1 R 6514.62

In [ ]:

배우별 매출

  • 영화별 매출을 구하고 이 데이터를 바탕으로 배우별 매출을 구하세요

In [49]:
# 1. 영화별 매출 - rental, film, inventory
REVENUE_PER_FILM_SQL_QUERY = """
    SELECT
        f.film_id film_id,
        COUNT(*) * f.rental_rate revenue
    FROM
        rental r
            JOIN inventory i ON r.inventory_id = i.inventory_id
            JOIN film f ON i.film_id = f.film_id    
    GROUP BY
        film_id
    ;
"""
pd.read_sql(REVENUE_PER_FILM_SQL_QUERY, db)


# 2. 배우별 매출 - actor, film_actor

SQL_QUERY = """
    SELECT
        a.actor_id,
        a.last_name last_name,
        a.first_name first_name,
        SUM(rpf.revenue) revenue_per_actor
    FROM ({REVENUE_PER_FILM_SQL_QUERY}) AS rpf
        JOIN film_actor fa ON rpf.film_id = fa.film_id
        JOIN actor a ON fa.actor_id = a.actor_id
    GROUP BY
        actor_id
    ORDER BY revenue_per_actor DESC
    ;
""".format(
    REVENUE_PER_FILM_SQL_QUERY=REVENUE_PER_FILM_SQL_QUERY.replace(";",""),
)
pd.read_sql(SQL_QUERY, db)


Out[49]:
actor_id last_name first_name revenue_per_actor
0 107 DEGENERES GINA 2469.47
1 60 BERRY HENRY 1983.88
2 181 CARREY MATTHEW 1935.22
3 81 DAMON SCARLETT 1848.28
4 102 TORN WALTER 1829.60
5 58 AKROYD CHRISTIAN 1800.48
6 144 WITHERSPOON ANGELA 1799.46
7 17 VOIGHT HELEN 1789.43
8 198 KEITEL MARY 1767.26
9 111 ZELLWEGER CAMERON 1760.40
10 191 GOODING GREGORY 1759.77
11 136 MANSFIELD ED 1759.22
12 90 GUINESS SEAN 1759.01
13 23 KILMER SANDRA 1739.96
14 65 HUDSON ANGELA 1734.26
15 27 MCQUEEN JULIA 1727.51
16 185 BOLGER MICHAEL 1721.47
17 66 TANDY MARY 1709.65
18 83 WILLIS BEN 1709.49
19 117 TRACY RENEE 1700.54
20 28 HOFFMAN WOODY 1670.40
21 21 PALTROW KIRSTEN 1663.48
22 142 RYDER JADA 1660.40
23 147 WINSLET FAY 1655.57
24 13 WOOD UMA 1647.63
25 41 DEGENERES JODIE 1640.07
26 92 AKROYD KIRSTEN 1621.34
27 127 GARLAND KEVIN 1607.35
28 151 HESTON GEOFFREY 1603.38
29 105 CROWE SIDNEY 1579.43
... ... ... ... ...
170 32 HACKMAN TIM 1000.76
171 190 BAILEY AUDREY 987.67
172 38 MCKELLEN TOM 979.25
173 170 HOPPER MENA 976.44
174 99 MOSTEL JIM 975.15
175 77 MCCONAUGHEY CARY 973.57
176 69 PALTROW KENNETH 969.49
177 164 WILLIS HUMPHREY 968.06
178 63 WRAY CAMERON 967.97
179 91 BERRY CHRISTOPHER 967.29
180 174 BENING MICHAEL 950.08
181 79 HOFFMAN MAE 946.73
182 88 PESCI KENNETH 945.08
183 183 CLOSE RUSSELL 887.04
184 189 BIRCH CUBA 875.39
185 180 SILVERSTONE JEFF 873.75
186 110 DAVIS SUSAN 866.80
187 9 SWANK JOE 831.19
188 141 HARRIS CATE 825.38
189 3 CHASE ED 819.89
190 1 GUINESS PENELOPE 813.95
191 178 MONROE LISA 812.32
192 199 FAWCETT JULIA 808.45
193 186 ZELLWEGER JULIA 792.79
194 4 DAVIS JENNIFER 751.26
195 148 DEE EMILY 719.84
196 71 GRANT ADAM 706.19
197 30 PECK SANDRA 704.13
198 31 SOBIESKI SISSY 666.65
199 35 DEAN JUDY 640.45

200 rows × 4 columns