4T_데이터 분석을 위한 SQL 실습 (3) - GROUP BY, DISTINCT

월별 렌탈 지표

  • 월을 기준으로 그룹을 묶어서, 총 렌탈 수, 렌탈한 유저 수(중복 없어야 함), 유저 별 평균 렌탈 수를 출력

In [1]:
import pymysql

In [2]:
db = pymysql.connect(
    "db.fastcamp.us",
    "root",
    "dkstncks",
    "sakila",
    charset='utf8',
)

In [3]:
rental_df = pd.read_sql("SELECT * FROM rental;", db)

In [4]:
rental_df.head(2)


Out[4]:
rental_id rental_date inventory_id customer_id return_date staff_id last_update
0 1 2005-05-24 22:53:30 367 130 2005-05-26 22:04:30 1 2006-02-15 21:30:53
1 2 2005-05-24 22:54:33 1525 459 2005-05-28 19:40:33 1 2006-02-15 21:30:53

In [5]:
SQL_QUERY = """
    SELECT
        LEFT(rental_date, 7) month,
        COUNT(*) total_rentals_per_month,
        COUNT(DISTINCT customer_id) unique_customers,
        COUNT(*) / COUNT(DISTINCT customer_id) average_rentals_per_customer
    FROM rental
    GROUP BY month
    ORDER BY average_rentals_per_customer DESC
    ;
"""
pd.read_sql(SQL_QUERY, db)


Out[5]:
month total_rentals_per_month unique_customers average_rentals_per_customer
0 2005-07 6709 599 11.2003
1 2005-08 5686 599 9.4925
2 2005-06 2311 590 3.9169
3 2005-05 1156 520 2.2231
4 2006-02 182 158 1.1519

In [ ]:

월별 영화별 매출

  • 월별 렌탈 수 (=A)
  • 월별 렌탈된 영화의 수(unique한 영화의 수)(=B)
  • 영화별 평균 렌탈수(=A/B)

In [9]:
inventory_df = pd.read_sql("SELECT * FROM inventory;", db)
film_df = pd.read_sql("SELECT * FROM film;", db)

In [12]:
rental_df.head(2)


Out[12]:
rental_id rental_date inventory_id customer_id return_date staff_id last_update
0 1 2005-05-24 22:53:30 367 130 2005-05-26 22:04:30 1 2006-02-15 21:30:53
1 2 2005-05-24 22:54:33 1525 459 2005-05-28 19:40:33 1 2006-02-15 21:30:53

In [10]:
inventory_df.head(2)


Out[10]:
inventory_id film_id store_id last_update
0 1 1 1 2006-02-15 05:09:17
1 2 1 1 2006-02-15 05:09:17

In [11]:
film_df.head(2)


Out[11]:
film_id title description release_year language_id original_language_id rental_duration rental_rate length replacement_cost rating special_features last_update
0 1 ACADEMY DINOSAUR A Epic Drama of a Feminist And a Mad Scientist... 2006 1 None 6 0.99 86 20.99 PG Deleted Scenes,Behind the Scenes 2006-02-15 05:03:42
1 2 ACE GOLDFINGER A Astounding Epistle of a Database Administrat... 2006 1 None 3 4.99 48 12.99 G Trailers,Deleted Scenes 2006-02-15 05:03:42

In [21]:
SQL_QUERY = """
    SELECT
        LEFT(rental_date, 7) month,
        COUNT(*) MONTH_PER_RENTALS,
        COUNT(DISTINCT f.film_id) unique_films,
        COUNT(*) / COUNT(DISTINCT f.film_id) average_rentals_per_film
    FROM
        rental r
            JOIN inventory i ON r.inventory_id = i.inventory_id
            JOIN film f ON f.film_id = i.film_id
    GROUP BY month
    ;
"""
pd.read_sql(SQL_QUERY, db)


Out[21]:
month MONTH_PER_RENTALS unique_films average_rentals_per_film
0 2005-05 1156 686 1.6851
1 2005-06 2311 900 2.5678
2 2005-07 6709 958 7.0031
3 2005-08 5686 958 5.9353
4 2006-02 182 168 1.0833