In [1]:
%matplotlib inline
import pandas as pd
In [2]:
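import sys
# Python 2 only: force UTF-8 as the default encoding. Not needed (and not available) on Python 3.
if sys.version_info[0] == 2:
    reload(sys)
    sys.setdefaultencoding('utf-8')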
In [ ]:
First step: load the data from storage into a DataFrame. The simplest method is pd.read_csv (older pandas versions also offered pd.DataFrame.from_csv, which has since been removed).
In [4]:
imdb = pd.read_csv('imdb.csv')
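To try the loading step without the imdb.csv file at hand, here is a minimal sketch that reads CSV text from memory. The column names match the ones used in this notebook; the rows themselves are made up for illustration.

```python
import io
import pandas as pd

# Made-up sample rows; only the column names come from this notebook.
csv_text = """title,year,rating
Movie A,1948,8.1
Movie B,1972,9.0
Movie C,1994,8.9
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)
```

The first line of the CSV becomes the column names, and pandas assigns a default integer index starting at 0.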
Now the data is stored in a DataFrame named "imdb". As a first step, we'd like to see the structure of this DataFrame. There are several ways to do this (in the text below, "df" stands for any DataFrame):
In [5]:
len(imdb)
Out[5]:
In [6]:
imdb.shape
Out[6]:
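A small sketch of how len and shape relate, using a hypothetical three-row frame in place of imdb (the data is invented):

```python
import pandas as pd

# Hypothetical three-row frame standing in for imdb.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "year": [1948, 1972, 1994],
})

n_rows, n_cols = df.shape    # shape is a (rows, columns) tuple
print(n_rows, n_cols)
print(len(df) == n_rows)     # len counts rows only
print(df.dtypes)             # data type of each column
```

len(df) is the same number as df.shape[0]; df.dtypes is another quick way to inspect structure.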
But I want to look at the Data...not just funny numbers!
In [7]:
imdb.head(5)
Out[7]:
In [13]:
#Enter your code here
Individual columns of the dataframe can be accessed by df.column_name or df['column_name']. However, the result is not a DataFrame, but a Series structure.
In [8]:
imdb_top5=imdb.head(5) #smaller dataframe with only top 5 rows.
imdb_top5['title'] #The 'title' column of this smaller dataframe
Out[8]:
In [9]:
a=imdb_top5.title
In [10]:
type(a)
Out[10]:
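To make the Series vs. DataFrame distinction concrete, here is a sketch on a hypothetical frame (invented rows). Selecting with a single name returns a Series, while selecting with a list of names returns a DataFrame, even if the list has one entry.

```python
import pandas as pd

# Hypothetical frame; the rows are invented for illustration.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "year": [1948, 1972, 1994],
})

as_series = df["title"]     # single name  -> Series
as_attr = df.title          # attribute access gives the same Series
as_frame = df[["title"]]    # list of names -> one-column DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_series.equals(as_attr))
```

Attribute access (df.title) only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method; df['title'] always works.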
In [11]:
top_years=imdb_top5.year
top_years
Out[11]:
In [12]:
top_years>1950 #get boolean vector. So, only 2 movies in the top 5 were made after 1950
#smh
Out[12]:
In [13]:
imdb_top5[imdb_top5.year>1950] #passing this boolean vector to the dataframe filters the data
#rows which meet the condition (have a "True") are retained.
#rows not meeting the condition (have a "False") are removed.
Out[13]:
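The two steps above (build a boolean vector, then pass it to the DataFrame) can be sketched on a hypothetical frame with invented rows:

```python
import pandas as pd

# Hypothetical frame; the rows are invented for illustration.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "year": [1948, 1972, 1994],
})

mask = df["year"] > 1950     # boolean Series, one value per row
print(mask.tolist())         # [False, True, True]

recent = df[mask]            # keeps only the rows where mask is True
print(recent)
print(mask.sum())            # True counts as 1, so this is the number of matches
```

Summing the mask is an alternative to len(df[mask]) when only the count is needed.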
In [14]:
#type your code here
len(imdb[imdb.year>1950])
Out[14]:
If using multiple conditions in the filter, wrap each condition in parentheses and combine them with the element-wise logical operators & (and), | (or) and ~ (not):
In [39]:
imdb[(imdb.year>1950) & (imdb.rating>8.8)]
#Filters all movies/shows made after 1950 AND having a rating of over 8.8
Out[39]:
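A sketch of the three element-wise operators on a hypothetical frame (invented rows). The parentheses matter because & and | bind more tightly than comparisons such as >.

```python
import pandas as pd

# Hypothetical frame; the rows are invented for illustration.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "year": [1948, 1972, 1994],
    "rating": [8.1, 9.0, 8.5],
})

both = df[(df["year"] > 1950) & (df["rating"] > 8.8)]     # AND
either = df[(df["year"] > 1950) | (df["rating"] > 8.8)]   # OR
not_recent = df[~(df["year"] > 1950)]                     # NOT

print(both["title"].tolist())        # ['Movie B']
print(either["title"].tolist())      # ['Movie B', 'Movie C']
print(not_recent["title"].tolist())  # ['Movie A']
```

Python's own and / or / not keywords do not work here, because they try to reduce a whole Series to a single True or False.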
This output can be sorted with the sort_values method: df.sort_values(column_name). (The older df.sort method has been removed from pandas.)
In [41]:
imdb[(imdb.year>1950) & (imdb.rating>8.8)].sort_values('year')
#Filters all movies/shows made after 1950 AND having a rating of over 8.8,
#then sort this Dataframe based on the year
Out[41]:
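A short sketch of sorting with sort_values, the method current pandas versions provide for this, on a hypothetical frame with invented rows. It also shows descending order and sorting by more than one column.

```python
import pandas as pd

# Hypothetical frame; the rows are invented for illustration.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "year": [1994, 1948, 1972],
    "rating": [8.9, 8.1, 9.0],
})

by_year = df.sort_values("year")                        # oldest first
newest_first = df.sort_values("year", ascending=False)  # newest first
# Highest rating first; ties broken by earliest year.
by_rating_then_year = df.sort_values(["rating", "year"], ascending=[False, True])

print(by_year["title"].tolist())        # ['Movie B', 'Movie C', 'Movie A']
print(newest_first["title"].tolist())   # ['Movie A', 'Movie C', 'Movie B']
```

sort_values returns a new DataFrame and leaves df in its original order.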
In [ ]:
#type your code here
In [ ]:
#type your code here
In [ ]:
#type your code here
In [46]:
imdb_top5=imdb.head(5)
a=imdb_top5.top_250_rank
a
Out[46]:
In [47]:
a.isnull()
Out[47]:
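isnull returns a boolean Series that is True wherever a value is missing, so it can be used as a filter like any other condition. A sketch on a hypothetical frame (invented rows, with one missing rank):

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the second row has no rank.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "top_250_rank": [1.0, np.nan, 3.0],
})

missing = df["top_250_rank"].isnull()
print(missing.tolist())   # [False, True, False]
print(missing.sum())      # number of missing values

# notnull is the inverse: keep only the rows that do have a rank.
ranked = df[df["top_250_rank"].notnull()]
print(ranked)
```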
In [ ]:
#type your code here
In [55]:
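temp=imdb.copy() #work on a copy; plain assignment (temp=imdb) would only create a second name for the same DataFrame
temp.fillna(0) #returns a new DataFrame with missing values replaced by 0; temp itself is not modified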
Out[55]:
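One point that is easy to miss: fillna returns a new DataFrame rather than changing the one it is called on, so the result has to be assigned to a name to be kept. A sketch on a hypothetical frame (invented rows):

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the second row has no rank.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "top_250_rank": [1.0, np.nan, 3.0],
})

filled = df.fillna(0)     # new DataFrame with NaN replaced by 0

print(df["top_250_rank"].isnull().sum())      # original still has its missing value
print(filled["top_250_rank"].tolist())        # [1.0, 0.0, 3.0]
```

Whether 0 is a sensible replacement depends on the column; for a rank, 0 is only a placeholder and not a real rank.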
In [67]:
imdb[imdb.title.str.contains("Files")]
Out[67]:
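A sketch of substring filtering with str.contains on a hypothetical frame (the titles are invented). If the column holds missing values, the resulting mask contains missing entries too and cannot be used as a filter directly; passing na=False treats those rows as non-matches.

```python
import pandas as pd

# Hypothetical titles, including one missing value.
df = pd.DataFrame({"title": ["Case Files", "lost files", None, "Movie A"]})

# Case-sensitive match; rows with a missing title count as False.
mask = df["title"].str.contains("Files", na=False)
print(mask.tolist())          # [True, False, False, False]

# Case-insensitive match.
mask_any_case = df["title"].str.contains("files", case=False, na=False)
print(mask_any_case.tolist()) # [True, True, False, False]

print(df[mask_any_case])
```

By default the pattern is interpreted as a regular expression; pass regex=False to match the text literally.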
In [ ]:
#Enter your code here