Calculate the Box Office using ticket inflation price of corresponding year


In [10]:
import pandas as pd
import math
import csv

Read Ticket Inflation Price by Year


In [21]:
ticketPriceByYear = pd.read_csv('ticketPriceInflation.csv')

ticketPriceByYear.head(3)


Out[21]:
YEAR AVG. PRICE
0 2017 8.84
1 2016 8.65
2 2015 8.43

Read dataset with box office


In [22]:
movie = pd.read_csv('datasetWithBoxoffice.csv')

movie.head(3)


Out[22]:
TMDB ID IMDB ID TITLE YEAR GENRE RATING RELEASED ACTORS AWARDS COUNTRY LANGUAGE BOX OFFICE
0 862 tt0114709 Toy Story 1995 Animation, Adventure, Comedy 8.3 22 Nov 1995 Tom Hanks, Tim Allen, Don Rickles, Jim Varney Nominated for 3 Oscars. Another 23 wins & 18 n... USA English 191796233.0
1 8844 tt0113497 Jumanji 1995 Action, Adventure, Family 6.9 15 Dec 1995 Robin Williams, Jonathan Hyde, Kirsten Dunst, ... 4 wins & 9 nominations. USA English, French 100200000.0
2 15602 tt0113228 Grumpier Old Men 1995 Comedy, Romance 6.6 22 Dec 1995 Walter Matthau, Jack Lemmon, Sophia Loren, Ann... 2 wins & 2 nominations. USA English 69870000.0

Getting 2014 Ticket price


In [14]:
ticketPrice2014 = ticketPriceByYear[ticketPriceByYear['YEAR'] == 2014]['AVG. PRICE'].values[0]

Function to calculate Adjusted Box Office


In [15]:
def calculateAdjustedBoxOffice(year,boxOffice):
    
    if year == 2014 or math.isnan(boxOffice):
        return boxOffice
    
    else:
        ticketPrice = ticketPriceByYear[ticketPriceByYear['YEAR'] == year]['AVG. PRICE'].values[0] 
        estimateTicket = (boxOffice/ticketPrice)
        return math.floor(estimateTicket * ticketPrice2014)

List with

  • First Index - IMDB ID
  • Second Index - Adjusted Box Office

In [16]:
listAdjustTicketPriceInflation = [[imdbID for imdbID in movie['IMDB ID']], \
                                  [calculateAdjustedBoxOffice(i[3], i[11])for i in movie.values]]

Create a csv file


In [17]:
with open('adjustedBoxOffice.csv','w') as csvfile:
    fieldnames = ['IMDB ID','ADJ. BOX OFFICE']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    
    writer.writeheader()
    for i in range(len(listAdjustTicketPriceInflation[0])):
        writer.writerow({'IMDB ID': listAdjustTicketPriceInflation[0][i], 'ADJ. BOX OFFICE': listAdjustTicketPriceInflation[1][i]})

In [18]:
adjustedBoxOffice = pd.read_csv('adjustedBoxOffice.csv')

adjustedBoxOffice.head(3)


Out[18]:
IMDB ID ADJ. BOX OFFICE
0 tt0114709 360224189.0
1 tt0113497 188191724.0
2 tt0113228 131227103.0

Merging csv using IMDB ID as key


In [19]:
result = movie.merge(adjustedBoxOffice, left_on='IMDB ID', right_on='IMDB ID',how='inner')

result.head(3)


Out[19]:
TMDB ID IMDB ID TITLE YEAR GENRE RATING RELEASED ACTORS AWARDS COUNTRY LANGUAGE BOX OFFICE ADJ. BOX OFFICE
0 862 tt0114709 Toy Story 1995 Animation, Adventure, Comedy 8.3 22 Nov 1995 Tom Hanks, Tim Allen, Don Rickles, Jim Varney Nominated for 3 Oscars. Another 23 wins & 18 n... USA English 191796233.0 360224189.0
1 8844 tt0113497 Jumanji 1995 Action, Adventure, Family 6.9 15 Dec 1995 Robin Williams, Jonathan Hyde, Kirsten Dunst, ... 4 wins & 9 nominations. USA English, French 100200000.0 188191724.0
2 15602 tt0113228 Grumpier Old Men 1995 Comedy, Romance 6.6 22 Dec 1995 Walter Matthau, Jack Lemmon, Sophia Loren, Ann... 2 wins & 2 nominations. USA English 69870000.0 131227103.0

Converting Dataframe to csv


In [20]:
result.to_csv('finalDataset.csv', index=False)

In [ ]: