2: Life And Death Of Avengers

The Avengers are a well-known and widely loved team of superheroes in the Marvel universe, introduced in the 1960s in the original comic book series. The recent Disney movies, part of the new Marvel Cinematic Universe, have made them popular again.

The team at FiveThirtyEight wanted to dissect the deaths of the Avengers in the comics over the years. Because the writers were known to kill off and revive many of the superheroes, the team was curious what data they could pull from the Marvel Wikia site, a fan-driven community site, to explore further. To learn how they collected their data, which is available in their GitHub repo, read the write-up they published on their site.



In [1]:

    
# Shell cell used to download the dataset (left commented out):
# %sh
# wget https://raw.githubusercontent.com/fivethirtyeight/data/master/avengers/avengers.csv
# ls -l

3: Exploring The Data

While the FiveThirtyEight team has done a wonderful job acquiring this data, it still has some inconsistencies. Your mission, if you choose to accept it, is to clean up their dataset so it is more useful for analysis in pandas. Read the dataset into a pandas DataFrame and preview the first 5 rows to get a better sense of the data.



In [2]:

    
import pandas as pd

avengers = pd.read_csv("avengers.csv")
avengers.head(5)









    Out[2]:

	URL	Name/Alias	Appearances	Current?	Gender	Probationary Introl	Full/Reserve Avengers Intro	Year	Years since joining	Honorary	...	Return1	Death2	Return2	Death3	Return3	Death4	Return4	Death5	Return5	Notes
0	http://marvel.wikia.com/Henry_Pym_(Earth-616)	Henry Jonathan "Hank" Pym	1269	YES	MALE	NaN	Sep-63	1963	52	Full	...	NO	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Merged with Ultron in Rage of Ultron Vol. 1. A...
1	http://marvel.wikia.com/Janet_van_Dyne_(Earth-...	Janet van Dyne	1165	YES	FEMALE	NaN	Sep-63	1963	52	Full	...	YES	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Dies in Secret Invasion V1:I8. Actually was se...
2	http://marvel.wikia.com/Anthony_Stark_(Earth-616)	Anthony Edward "Tony" Stark	3068	YES	MALE	NaN	Sep-63	1963	52	Full	...	YES	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Death: "Later while under the influence of Imm...
3	http://marvel.wikia.com/Robert_Bruce_Banner_(E...	Robert Bruce Banner	2089	YES	MALE	NaN	Sep-63	1963	52	Full	...	YES	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Dies in Ghosts of the Future arc. However "he ...
4	http://marvel.wikia.com/Thor_Odinson_(Earth-616)	Thor Odinson	2402	YES	MALE	NaN	Sep-63	1963	52	Full	...	YES	YES	NO	NaN	NaN	NaN	NaN	NaN	NaN	Dies in Fear Itself brought back because that'...

5 rows × 21 columns
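
The preview abbreviates the middle columns with `...`, which hides some of the 21 columns. A minimal sketch of two ways to see every column follows. The `wide` DataFrame and its `col_N` names are hypothetical stand-ins, not the real dataset; `pd.option_context` and the `display.max_columns` option are standard pandas.

```python
import pandas as pd

# Hypothetical wide frame standing in for the 21-column avengers DataFrame.
wide = pd.DataFrame({'col_%d' % i: [i, i + 1] for i in range(25)})

# Option 1: list every column name without printing any data.
all_columns = wide.columns.tolist()
print(all_columns)

# Option 2: temporarily lift the column limit so the preview is not truncated.
with pd.option_context('display.max_columns', None):
    full_preview = repr(wide.head())
    print(full_preview)
```

On the real data, the same two calls would be applied to `avengers` instead of `wide`.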

4: Filter Out The Bad Years

Since the data was collected from a community site, where most of the contributions came from individual users, there's room for errors to surface in the dataset. If you plot a histogram of the values in the Year column, which describes the year each Avenger was introduced, you'll immediately notice some oddities. Quite a few Avengers look like they were introduced in 1900, which we know is a little fishy. The Avengers weren't introduced in the comic series until the 1960s!

This is obviously a mistake in the data, and you should remove all Avengers introduced before 1960 from the DataFrame.
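
As a rough illustration of how the odd years stand out, the sketch below counts rows per decade, a text stand-in for the histogram. The `sample` values are made up for the example and are not taken from `avengers.csv`; only the `Year` column name comes from the dataset.

```python
import pandas as pd

# Hypothetical sample standing in for the Year column of avengers.csv.
sample = pd.DataFrame({'Year': [1900, 1900, 1963, 1963, 1965, 1988, 2012]})

# Map each year to its decade, then count rows per decade.
decade = (sample['Year'] // 10) * 10
decade_counts = decade.value_counts().sort_index()
print(decade_counts)

# Rows dated before 1960 are the suspicious ones.
suspicious = sample[sample['Year'] < 1960]
print(len(suspicious))
```

With the real DataFrame, `avengers['Year'].hist()` would draw the actual histogram, assuming matplotlib is installed.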



In [3]:

    
# Keep only the rows whose Year is 1960 or later.
true_avengers = avengers[avengers['Year'] >= 1960]

print('All: ' + str(len(avengers.index)))
print('After 1960: ' + str(len(true_avengers.index)))









    



All: 173
After 1960: 159

5: Consolidating Deaths

We are interested in the total number of deaths each character experienced, and we'd like a single field containing that information. Right now, there are 5 fields (Death1 to Death5), each holding YES, NO, or a missing value (NaN) to indicate whether a superhero experienced that death. For example, a superhero can experience Death1, then Death2, and so on, until the writers no longer brought them back to life.

We'd like to coalesce that information into just one field so we can do numerical analysis more easily.



In [4]:

    
columns = ['Death1', 'Death2', 'Death3', 'Death4', 'Death5']
# Count how many of the five death columns are marked 'YES' for one row.
def death_count(row):
    death = 0
    for column in columns:
        if row[column] == 'YES':
            death += 1
    return death
true_avengers['Deaths'] = true_avengers[columns].apply(death_count, axis=1)
true_avengers['Deaths'].head()
# true_avengers[columns].head()









    



C:\Users\IBM_ADMIN\Anaconda2\lib\site-packages\ipykernel\__main__.py:9: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
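
The warning appears because a new column is being assigned to a DataFrame that was produced by filtering another one, so pandas cannot tell whether the original was meant to change too. One common way to avoid it, sketched here under the assumption that the original should stay untouched, is to take an explicit copy when filtering. The `toy` rows are hypothetical; only the column names match the dataset.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the dataset.
toy = pd.DataFrame({
    'Year': [1900, 1963, 1988],
    'Death1': ['NO', 'YES', 'YES'],
})

# .copy() makes the filtered result an independent DataFrame, so adding a
# column to it is unambiguous and leaves the original untouched.
recent = toy[toy['Year'] >= 1960].copy()
recent['Deaths'] = (recent['Death1'] == 'YES').astype(int)

print(recent)
print('Deaths' in toy.columns)
```

Applied to this notebook, that would mean appending `.copy()` to the line that creates `true_avengers`.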






    Out[4]:





0    1
1    1
2    1
3    1
4    2
Name: Deaths, dtype: int64
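
The same count can also be written without a row-by-row function. The sketch below is an alternative to the cell above, not a replacement for it: it compares all five columns to 'YES' at once and sums the matches per row. The `toy` rows are hypothetical and use `None` for missing values.

```python
import pandas as pd

death_columns = ['Death1', 'Death2', 'Death3', 'Death4', 'Death5']

# Hypothetical rows using the YES / NO / missing values seen in the preview.
toy = pd.DataFrame({
    'Death1': ['YES', 'YES', 'NO'],
    'Death2': [None, 'YES', None],
    'Death3': [None, 'NO', None],
    'Death4': [None, None, None],
    'Death5': [None, None, None],
})

# Comparing to 'YES' gives a True/False table (missing values compare as
# False); summing across each row counts the True values.
deaths = (toy[death_columns] == 'YES').sum(axis=1)
print(deaths.tolist())
```

The comparison avoids calling a Python function once per row, which tends to matter more as the DataFrame grows.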

6: Years Since Joining

For the final task, we want to know whether the Years since joining field accurately reflects the Year column, measured against 2015 (the preview rows pair Year 1963 with 52 years since joining). If an Avenger was introduced in 1960, is the Years since joining value for that Avenger 55?



In [5]:

    
# A row is accurate when Year + Years since joining equals 2015.
is_accurate = true_avengers['Year'] + true_avengers['Years since joining'] == 2015
joined_accuracy_count = is_accurate.sum()

print('Total number of rows: ' + str(len(true_avengers.index)))
print('Accurate rows: ' + str(joined_accuracy_count))









    



Total number of rows: 159
Accurate rows: 159
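
Every row passes here, but if some had failed, a count alone would not say which ones. A minimal sketch of how the same boolean test can list the mismatching rows follows. The `toy` rows are hypothetical, with the last one deliberately inconsistent, and `REFERENCE_YEAR` is a name introduced only for this example.

```python
import pandas as pd

# Hypothetical rows; the last one is deliberately inconsistent.
toy = pd.DataFrame({
    'Name/Alias': ['A', 'B', 'C'],
    'Year': [1963, 1988, 2010],
    'Years since joining': [52, 27, 4],
})

REFERENCE_YEAR = 2015  # the year the check above measures against

# True where the two columns agree with the reference year.
is_accurate = toy['Year'] + toy['Years since joining'] == REFERENCE_YEAR

# Invert the mask to keep only the rows that fail the check.
mismatches = toy[~is_accurate]
print(is_accurate.sum())
print(mismatches)
```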


