In [2]:
%matplotlib inline

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
sns.set_context('notebook', font_scale=1.5)

Data Science Stack

The following exercises give practice in the use of the Python data science stack, especially pandas for data manipulation and seaborn for quick statistical plotting.

1. (40 points) Read in the CSV file pokemon.csv in the local directory (Source: Kaggle). Do the following:

  • Calculate the number of rows and columns in the data frame (5 points)
  • Drop the column Type 2 in place, i.e. without creating a copy of the data frame (5 points)
  • Show a table with 5 rows sampled at random without replacement (5 points)
  • Sort the data frame in descending order of Speed in-place (5 points)
  • Create a column 'Value' where value = 3*HP + 2*Attack + 1*Defense (5 points)
  • Drop all rows that have the string Forme in the Name column in-place (5 points)
  • Find the mean and variance of Attack and Defense attributes of all the Type 1 AND Generation subgroups. For instance, one such group would be (Grass, 1). (10 points)

Note: If you change the data frame, print out the first 3 rows after each change with the head method.
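
As a rough orientation for the operations listed above, here is a minimal sketch on a toy data frame. The rows and values are made up for illustration and are not taken from pokemon.csv; only the column names follow the exercise text. It is one possible approach, not a reference solution.

```python
import pandas as pd

# Toy stand-in for pokemon.csv; the rows and values are made up for illustration.
df = pd.DataFrame({
    'Name': ['Bulbasaur', 'Charmander', 'Deoxys Attack Forme', 'Oddish', 'Vulpix'],
    'Type 1': ['Grass', 'Fire', 'Psychic', 'Grass', 'Fire'],
    'Type 2': ['Poison', None, None, 'Poison', None],
    'HP': [45, 39, 50, 45, 38],
    'Attack': [49, 52, 180, 50, 41],
    'Defense': [49, 43, 20, 55, 40],
    'Speed': [45, 65, 150, 30, 65],
    'Generation': [1, 1, 3, 1, 1],
})

# Number of rows and columns.
n_rows, n_cols = df.shape

# In-place operations modify df and return None.
df.drop(columns='Type 2', inplace=True)

# Random sample without replacement (random_state only makes it repeatable).
sample = df.sample(n=3, replace=False, random_state=0)

# Descending sort on Speed, in place.
df.sort_values('Speed', ascending=False, inplace=True)

# Derived column from existing ones.
df['Value'] = 3 * df['HP'] + 2 * df['Attack'] + df['Defense']

# Drop rows whose Name contains the string 'Forme', in place.
mask = df['Name'].str.contains('Forme')
df.drop(index=df[mask].index, inplace=True)

# Mean and variance of Attack and Defense per (Type 1, Generation) group.
stats = df.groupby(['Type 1', 'Generation'])[['Attack', 'Defense']].agg(['mean', 'var'])

print(df.head(3))
print(stats)
```

With the real file, the toy constructor would be replaced by pd.read_csv('pokemon.csv'); the remaining calls should carry over, assuming the columns are named as in the exercise.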


In [5]:
# Your answer here

2. (30 points) Using the same Pokemon data frame, do the following:

  • Create a new data frame with the columns Name, Type 1, Generation, Feature and Score, where Name, Type 1 and Generation have the same meaning as in the original data frame, Feature contains one of the strings HP, Attack, Defense, Sp. Atk, Sp. Def, Speed, and Score is the numerical value of that feature. This is known as going from wide to tall format. In R, this operation can be done using the gather function from the tidyr package. (10 points)
  • Using the new data frame and the seaborn package, create a grid of box plots where the x-axis shows the Feature, the y-axis shows the Score, the rows are the Type 1 values, and the columns are the Generation values. (10 points)
  • Using seaborn, make a cluster map showing the mean values of HP, Attack, Defense, Sp. Atk, Sp. Def and Speed for each Type 1 Pokemon. Rotate the Type 1 labels so they are readable. (10 points)
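
The wide-to-tall reshaping and the two plots described above could be sketched roughly as follows. The data frame is randomly generated (the names P0, P1, ... and all scores are made up), and the Agg backend is selected only so the sketch also runs outside a notebook; inside a notebook with %matplotlib inline that line can be dropped. This is one way to do it with pandas melt and seaborn catplot/clustermap, not the only one.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

features = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']

# Made-up wide table: 3 types x 2 generations, 2 rows per combination.
rng = np.random.default_rng(0)
wide = pd.DataFrame({
    'Name': [f'P{i}' for i in range(12)],
    'Type 1': ['Grass', 'Fire', 'Water'] * 4,
    'Generation': [1] * 6 + [2] * 6,
})
for col in features:
    wide[col] = rng.integers(20, 120, size=len(wide))

# Wide-to-tall: one row per (Pokemon, feature) pair.
tall = wide.melt(id_vars=['Name', 'Type 1', 'Generation'],
                 value_vars=features,
                 var_name='Feature', value_name='Score')

# Grid of box plots: rows are Type 1 values, columns are Generation values.
grid = sns.catplot(data=tall, x='Feature', y='Score',
                   row='Type 1', col='Generation', kind='box')

# Cluster map of per-type feature means, with the row labels kept horizontal.
means = wide.groupby('Type 1')[features].mean()
cm = sns.clustermap(means)
plt.setp(cm.ax_heatmap.get_yticklabels(), rotation=0)
```

The melt call plays the role that gather plays in tidyr: id_vars are the columns kept as identifiers, and every column in value_vars becomes a (Feature, Score) pair.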

In [14]:
# Your answer here

3. (30 points) Read in the CSV file pokemonGo.csv in the local directory (Source: Kaggle). Do the following:

  • Create a new data frame that combines columns from the pokemon.csv and pokemonGO.csv files. Drop any row that does not have Name, Type 1 and Type 2 values that are exactly the same in both data frames. (10 points)
  • Write a loop to download the images of Pokemon whose speed is greater than 120. (10 points)
  • Display these Pokemon images in the Jupyter notebook. (10 points)
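
A possible shape for the merge and the download loop is sketched below, under several assumptions: the two data frames are small made-up stand-ins for the CSV files, the 'Max CP' column is a hypothetical example of a column contributed by the second file, and image_url points at a placeholder host (example.com) because the exercise does not say where the images live. The download function takes the fetch routine as a parameter so that the loop can be tried without network access.

```python
from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd

# Made-up stand-ins for pokemon.csv and pokemonGO.csv.
pokemon = pd.DataFrame({
    'Name': ['Bulbasaur', 'Aerodactyl', 'Charizard'],
    'Type 1': ['Grass', 'Rock', 'Fire'],
    'Type 2': ['Poison', 'Flying', 'Flying'],
    'Speed': [45, 130, 100],
})
pokemon_go = pd.DataFrame({
    'Name': ['Bulbasaur', 'Aerodactyl', 'Charizard'],
    'Type 1': ['Grass', 'Rock', 'Fire'],
    'Type 2': ['Poison', 'Flying', 'Dragon'],  # deliberate mismatch for Charizard
    'Max CP': [1000, 2100, 2600],              # hypothetical column and values
})

# An inner merge on all three keys keeps only rows that agree in both frames.
merged = pokemon.merge(pokemon_go, on=['Name', 'Type 1', 'Type 2'], how='inner')

# Names of Pokemon with Speed greater than 120.
fast = merged.loc[merged['Speed'] > 120, 'Name']


def image_url(name):
    """Hypothetical URL scheme; replace with the real image source."""
    return f'https://example.com/pokemon/{name.lower()}.png'


def download_images(names, out_dir, fetch=urlretrieve):
    """Download one image per name into out_dir and return the local paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in names:
        path = out_dir / f'{name}.png'
        fetch(image_url(name), path)  # fetch(url, filename), like urlretrieve
        paths.append(path)
    return paths
```

Calling download_images(fast, 'images') would then perform the actual downloads once image_url points at a real source. In a notebook, each returned path could be shown with display(Image(filename=str(path))), using display and Image from IPython.display.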

In [10]:
# Your answer here