Part One

Use the csv I've attached to answer the following questions:

1) Import pandas with the right name


In [253]:
import pandas as pd

2) Set all graphics from matplotlib to display inline


In [254]:
!pip install matplotlib
import matplotlib.pyplot as plt
%matplotlib inline



3) Read the csv in (it should be UTF-8 already so you don't have to worry about encoding), save it with the proper boring name


In [255]:
df = pd.read_csv("07-hw-animals copy.csv")

4) Display the names of the columns in the csv


In [256]:
df.columns.values


Out[256]:
array(['animal', 'name', 'length'], dtype=object)

5) Display the first 3 animals.


In [257]:
df.head(3)


Out[257]:
  animal        name  length
0    cat        Anne      35
1    cat         Bob      45
2    dog  Egglesburg      65

6) Sort the animals to see the 3 longest animals.


In [258]:
df.sort_values(by='length', ascending = False).head(3)


Out[258]:
  animal        name  length
2    dog  Egglesburg      65
3    dog       Devon      50
1    cat         Bob      45

7) What are the counts of the different values of the "animal" column? a.k.a. how many cats and how many dogs.


In [259]:
df['animal'].value_counts()


Out[259]:
dog    3
cat    3
Name: animal, dtype: int64

8) Only select the dogs.


In [260]:
df[df['animal'] == 'dog']


Out[260]:
  animal        name  length
2    dog  Egglesburg      65
3    dog       Devon      50
5    dog    Fontaine      35

9) Display all of the animals that are greater than 40 cm.


In [261]:
df[df['length']>40]


Out[261]:
  animal        name  length
1    cat         Bob      45
2    dog  Egglesburg      65
3    dog       Devon      50

10) 'length' is the animal's length in cm. Create a new column called inches that is the length in inches.


In [262]:
df['inches'] = df['length']*0.393701

11) Save the cats to a separate variable called "cats." Save the dogs to a separate variable called "dogs."


In [263]:
cats = df[df['animal'] =='cat']
dogs = df[df['animal'] == 'dog']

12) Display all of the animals that are cats and above 12 inches long. First do it using the "cats" variable, then do it using your normal dataframe.


In [264]:
cats[cats['inches']>12]


Out[264]:
  animal     name  length     inches
0    cat     Anne      35  13.779535
1    cat      Bob      45  17.716545
4    cat  Charlie      32  12.598432

In [265]:
df[(df['animal']=='cat') & (df['inches']>12)]


Out[265]:
  animal     name  length     inches
0    cat     Anne      35  13.779535
1    cat      Bob      45  17.716545
4    cat  Charlie      32  12.598432

13) What's the mean length of a cat?


In [266]:
cats.describe()


Out[266]:
          length     inches
count   3.000000   3.000000
mean   37.333333  14.698171
std     6.806859   2.679867
min    32.000000  12.598432
25%    33.500000  13.188984
50%    35.000000  13.779535
75%    40.000000  15.748040
max    45.000000  17.716545

The mean length of a cat is 37.33 cm (14.698 inches).

14) What's the mean length of a dog?


In [267]:
dogs.describe()


Out[267]:
       length     inches
count     3.0   3.000000
mean     50.0  19.685050
std      15.0   5.905515
min      35.0  13.779535
25%      42.5  16.732292
50%      50.0  19.685050
75%      57.5  22.637808
max      65.0  25.590565

The mean length of a dog is 50.0 cm (19.685 inches).
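
The two means can also be read off directly rather than scanned out of the describe() tables. A minimal sketch, assuming nothing beyond the six rows displayed earlier in this notebook; the DataFrame is rebuilt by hand so the cell runs without the CSV, and the name `animals` is made up for the example:

```python
import pandas as pd

# Rebuilt from the rows shown in the outputs above, so no CSV file is needed.
animals = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "cat", "dog"],
    "name": ["Anne", "Bob", "Egglesburg", "Devon", "Charlie", "Fontaine"],
    "length": [35, 45, 65, 50, 32, 35],
})

# Filter to one kind of animal, pick the length column, and average it.
cat_mean_cm = animals.loc[animals["animal"] == "cat", "length"].mean()
dog_mean_cm = animals.loc[animals["animal"] == "dog", "length"].mean()

print(cat_mean_cm, dog_mean_cm)
```

The values agree with the `mean` rows of the describe() outputs above.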

15) Use groupby to accomplish both of the above tasks at once.


In [268]:
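# numeric_only=True skips the text column 'name', which newer pandas versions refuse to average
df.groupby('animal').mean(numeric_only=True)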


Out[268]:
           length     inches
animal
cat     37.333333  14.698171
dog     50.000000  19.685050

16) Make a histogram of the length of dogs. I apologize that it is so boring.


In [269]:
dogs.hist('length')


Out[269]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x10905e6a0>]], dtype=object)

17) Change your graphing style to be something else (anything else!)


In [270]:
# switch the matplotlib style first, then draw a chart to see the new look
plt.style.use('ggplot')
df.plot(kind='bar', x='name', y='length', legend=False)


Out[270]:
<matplotlib.axes._subplots.AxesSubplot at 0x107b226a0>

18) Make a horizontal bar graph of the length of the animals, with their name as the label (look at the billionaires notebook I put on Slack!)


In [271]:
# label each bar with the animal's name, as the question asks
df.plot(kind='barh', x='name', y='length', legend=False)


Out[271]:
<matplotlib.axes._subplots.AxesSubplot at 0x108b4b3c8>

19) Make a sorted horizontal bar graph of the cats, with the larger cats on top.


In [272]:
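# barh draws the first row at the bottom, so an ascending sort puts the longest cat on top
sortedcats = cats.sort_values(by='length', ascending=True)

# use 'name' for the labels; 'animal' would label every bar "cat"
sortedcats.plot(kind='barh', x='name', y='length', legend=False)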


Out[272]:
<matplotlib.axes._subplots.AxesSubplot at 0x1087214a8>

Part Two


In [279]:
df = pd.read_excel('billionaires copy.xlsx')
df.columns.values


Out[279]:
array(['year', 'name', 'rank', 'citizenship', 'countrycode',
       'networthusbillion', 'selfmade', 'typeofwealth', 'gender', 'age',
       'industry', 'IndustryAggregates', 'region', 'north',
       'politicalconnection', 'founder', 'generationofinheritance',
       'sector', 'company', 'companytype', 'relationshiptocompany',
       'foundingdate', 'gdpcurrentus', 'sourceofwealth', 'notes', 'notes2',
       'source', 'source_2', 'source_3', 'source_4'], dtype=object)

In [280]:
recent = df[df['year']==2014]
recent.head(5)


Out[280]:
year name rank citizenship countrycode networthusbillion selfmade typeofwealth gender age ... relationshiptocompany foundingdate gdpcurrentus sourceofwealth notes notes2 source source_2 source_3 source_4
1 2014 A. Jerrold Perenchio 663 United States USA 2.6 self-made executive male 83.0 ... former chairman and CEO 1955.0 NaN television, Univision represented Marlon Brando and Elizabeth Taylor NaN http://en.wikipedia.org/wiki/Jerry_Perenchio http://www.forbes.com/profile/a-jerrold-perenc... COLUMN ONE; A Hollywood Player Who Owns the Ga... NaN
5 2014 Abdulla Al Futtaim 687 United Arab Emirates ARE 2.5 inherited inherited male NaN ... relation 1930.0 NaN auto dealers, investments company split between him and cousin in 2000 NaN http://en.wikipedia.org/wiki/Al-Futtaim_Group http://www.al-futtaim.ae/content/groupProfile.asp NaN NaN
6 2014 Abdulla bin Ahmad Al Ghurair 305 United Arab Emirates ARE 4.8 inherited inherited male NaN ... relation 1960.0 NaN diversified inherited from father NaN http://en.wikipedia.org/wiki/Al-Ghurair_Group http://www.alghurair.com/about-us/our-history NaN NaN
8 2014 Abdullah Al Rajhi 731 Saudi Arabia SAU 2.4 self-made self-made finance male NaN ... founder 1957.0 NaN banking NaN NaN http://en.wikipedia.org/wiki/Al-Rajhi_Bank http://www.alrajhibank.com.sa/ar/investor-rela... http://www.alrajhibank.com.sa/ar/about-us/page... NaN
9 2014 Abdulsamad Rabiu 1372 Nigeria NGA 1.2 self-made founder non-finance male 54.0 ... founder 1988.0 NaN sugar, flour, cement NaN NaN http://www.forbes.com/profile/abdulsamad-rabiu/ http://www.bloomberg.com/research/stocks/priva... NaN NaN

5 rows × 30 columns

1) What country are most billionaires from? For the top ones, how many billionaires per billion people?


In [ ]:


In [ ]:
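
The cells for this question were left empty. One possible approach, as a sketch only: count rows per `citizenship` and divide by population. The rows and the population figures below are hypothetical placeholders (the spreadsheet has no population column, so real figures would have to come from another source), and the names `sample` and `population_billions` are invented for the example.

```python
import pandas as pd

# Hypothetical rows standing in for the 2014 slice of the billionaires data.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "citizenship": ["United States", "United States", "United States",
                    "China", "China", "Mexico"],
})

# Hypothetical populations, in billions of people (placeholders, not real data).
population_billions = pd.Series(
    {"United States": 0.32, "China": 1.36, "Mexico": 0.12}
)

# Billionaires per country, most common first.
counts = sample["citizenship"].value_counts()

# Division aligns on the country labels, giving billionaires per billion people.
per_billion = (counts / population_billions).sort_values(ascending=False)

print(counts)
print(per_billion)
```

On the real data, `recent['citizenship'].value_counts().head(10)` would give the first half of the answer.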

2) Who are the top 10 richest billionaires?


In [282]:
recent.sort_values(by='networthusbillion', ascending=False).head(10)


Out[282]:
year name rank citizenship countrycode networthusbillion selfmade typeofwealth gender age ... foundingdate gdpcurrentus sourceofwealth notes notes2 source source_2 source_3 source_4 countryfreq
284 2014 Bill Gates 1 United States USA 76.0 self-made founder non-finance male 58.0 ... 1975.0 NaN Microsoft NaN NaN http://www.forbes.com/profile/bill-gates/ NaN NaN NaN 499
348 2014 Carlos Slim Helu 2 Mexico MEX 72.0 self-made privatized and resources male 74.0 ... 1990.0 NaN telecom NaN NaN http://www.ozy.com/provocateurs/carlos-slims-w... NaN NaN NaN 16
124 2014 Amancio Ortega 3 Spain ESP 64.0 self-made founder non-finance male 77.0 ... 1975.0 NaN retail NaN NaN http://www.forbes.com/profile/amancio-ortega/ NaN NaN NaN 26
2491 2014 Warren Buffett 4 United States USA 58.2 self-made founder non-finance male 83.0 ... 1839.0 NaN Berkshire Hathaway NaN NaN http://www.forbes.com/lists/2009/10/billionair... http://www.forbes.com/companies/berkshire-hath... NaN NaN 499
1377 2014 Larry Ellison 5 United States USA 48.0 self-made founder non-finance male 69.0 ... 1977.0 NaN Oracle NaN NaN http://www.forbes.com/profile/larry-ellison/ http://www.businessinsider.com/how-larry-ellis... NaN NaN 499
509 2014 David Koch 6 United States USA 40.0 inherited inherited male 73.0 ... 1940.0 NaN diversified inherited from father NaN http://www.kochind.com/About_Koch/History_Time... NaN NaN NaN 499
381 2014 Charles Koch 6 United States USA 40.0 inherited inherited male 78.0 ... 1940.0 NaN diversified inherited from father NaN http://www.kochind.com/About_Koch/History_Time... NaN NaN NaN 499
2185 2014 Sheldon Adelson 8 United States USA 38.0 self-made self-made finance male 80.0 ... 1952.0 NaN casinos NaN NaN http://www.forbes.com/profile/sheldon-adelson/ http://lasvegassun.com/news/1996/nov/26/rat-pa... NaN NaN 499
429 2014 Christy Walton 9 United States USA 36.7 inherited inherited female 59.0 ... 1962.0 NaN Wal-Mart widow NaN http://www.forbes.com/profile/christy-walton/ NaN NaN NaN 499
1128 2014 Jim Walton 10 United States USA 34.7 inherited inherited male 66.0 ... 1962.0 NaN Wal-Mart inherited from father NaN http://www.forbes.com/profile/jim-walton/ NaN NaN NaN 499

10 rows × 31 columns

3) What's the average wealth of a billionaire? Male? Female?


In [283]:
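# numeric_only=True keeps the text columns out of the average; see the networthusbillion column for the answer
recent.groupby('gender').mean(numeric_only=True)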


Out[283]:
          year        rank  networthusbillion        age     north  politicalconnection   founder  foundingdate  gdpcurrentus  countryfreq
gender
female  2014.0  801.761111           3.920556  62.608434  0.711111                  1.0  0.161111   1939.450000           NaN   210.061111
male    2014.0  810.380855           3.902716  63.427669  0.556687                  1.0  0.556687   1966.109514           NaN   188.315003

4) Who is the poorest billionaire? Who are the top 10 poorest billionaires?


In [284]:
recent.sort_values('networthusbillion').head(10)


Out[284]:
year name rank citizenship countrycode networthusbillion selfmade typeofwealth gender age ... foundingdate gdpcurrentus sourceofwealth notes notes2 source source_2 source_3 source_4 countryfreq
234 2014 B.R. Shetty 1565 India IND 1.0 self-made founder non-finance male 72.0 ... 1975.0 NaN healthcare NaN NaN http://en.wikipedia.org/wiki/B._R._Shetty http://www.nmchealth.com/dr-br-shetty/ NaN NaN 56
2092 2014 Rostam Azizi 1565 Tanzania TZA 1.0 self-made executive male 49.0 ... 1999.0 NaN telecom, investments NaN NaN http://www.forbes.com/profile/rostam-azizi/ http://en.wikipedia.org/wiki/Vodacom_Tanzania http://www.thecitizen.co.tz/News/Rostam--Dewji... NaN 1
2401 2014 Tory Burch 1565 United States USA 1.0 self-made founder non-finance female 47.0 ... 2004.0 NaN fashion NaN NaN http://en.wikipedia.org/wiki/J._Christopher_Burch http://www.vanityfair.com/news/2007/02/tory-bu... NaN NaN 499
734 2014 Fred Chang 1565 United States USA 1.0 self-made founder non-finance male 57.0 ... 2001.0 NaN online retailing NaN NaN http://en.wikipedia.org/wiki/Newegg http://www.newegg.com/Info/FactSheet.aspx http://www.forbes.com/sites/andreanavarro/2014... NaN 499
171 2014 Angela Bennett 1565 Australia AUS 1.0 inherited inherited female 69.0 ... 1955.0 NaN mining inherited from father shared fortune with brother http://www.forbes.com/profile/angela-bennett/ NaN NaN NaN 29
748 2014 Fu Kwan 1565 China CHN 1.0 self-made self-made finance male 56.0 ... 1990.0 NaN diversified NaN NaN http://www.forbes.com/profile/fu-kwan/ http://www.macrolink.com.cn/en/AboutBig.aspx NaN NaN 152
2107 2014 Ryan Kavanaugh 1565 United States USA 1.0 self-made founder non-finance male 39.0 ... 2004.0 NaN Movies NaN NaN http://en.wikipedia.org/wiki/Ryan_Kavanaugh http://en.wikipedia.org/wiki/Relativity_Media http://www.vanityfair.com/news/2010/03/kavanau... NaN 499
1783 2014 O. Francis Biondi 1565 United States USA 1.0 self-made self-made finance male 49.0 ... 1995.0 NaN hedge fund NaN NaN http://www.forbes.com/profile/o-francis-biondi/ http://www.forbes.com/sites/nathanvardi/2014/0... NaN NaN 499
1371 2014 Lam Fong Ngo 1565 Macau MAC 1.0 self-made self-made finance female NaN ... 1997.0 NaN casinos NaN NaN http://www.forbes.com/profile/david-chow-1/ http://www.macaulegend.com/html/about_mileston... Macau Legend to roll the dice on HK IPO; But l... NaN 2
702 2014 Feng Hailiang 1565 China CHN 1.0 self-made founder non-finance male 53.0 ... 1989.0 NaN copper processing & real estate NaN NaN http://www.forbes.com/profile/feng-hailiang/ http://www.hailiang.com/en/about_int.php NaN NaN 152

10 rows × 31 columns

5) What is 'relationship to company'? And what are the most common relationships?


In [295]:
rel_counts = recent.groupby('relationshiptocompany').count()
rel_counts.sort_values('year', ascending=False).head(10)
# 'relationship to company' describes the role a person plays in a company
# the most common relationships are founder, relation, owner, chairman, and investor


Out[295]:
year name rank citizenship countrycode networthusbillion selfmade typeofwealth gender age ... foundingdate gdpcurrentus sourceofwealth notes notes2 source source_2 source_3 source_4 countryfreq
relationshiptocompany
founder 818 818 818 818 818 818 818 818 818 796 ... 811 0 805 99 3 818 745 279 5 818
relation 515 515 515 515 515 515 515 515 515 485 ... 515 0 515 514 104 515 380 124 6 515
owner 79 79 79 79 79 79 79 79 79 73 ... 79 0 75 17 0 79 70 29 1 79
chairman 64 64 64 64 64 64 64 64 64 63 ... 61 0 64 12 1 64 58 14 1 64
investor 30 30 30 30 30 30 30 30 30 30 ... 30 0 30 8 0 30 28 13 0 30
Chairman and Chief Executive Officer 15 15 15 15 15 15 15 15 15 15 ... 15 0 15 5 0 15 15 9 2 15
president 8 8 8 8 8 8 8 8 8 8 ... 8 0 8 0 0 8 8 5 0 8
ceo 8 8 8 8 8 8 8 8 8 7 ... 8 0 8 0 0 8 8 5 0 8
CEO 8 8 8 8 8 8 8 8 8 8 ... 8 0 8 1 0 8 8 2 0 8
Chairman 8 8 8 8 8 8 8 8 8 7 ... 8 0 8 1 0 8 7 5 0 8

10 rows × 30 columns

6) Most common source of wealth? Male vs. female?


In [298]:
source_counts = recent.groupby('sourceofwealth')


Out[298]:
<pandas.core.groupby.DataFrameGroupBy object at 0x1095bafd0>
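
The cell above stops at the groupby object, so no counts are shown. A sketch of how the question might be finished, using a few hypothetical rows in place of `recent` (the values in `sample` are invented; only the column names `sourceofwealth` and `gender` come from the spreadsheet):

```python
import pandas as pd

# Hypothetical rows; only the column names match the real data.
sample = pd.DataFrame({
    "sourceofwealth": ["real estate", "retail", "real estate",
                       "diversified", "retail", "retail"],
    "gender": ["male", "female", "male", "female", "male", "female"],
})

# Most common source of wealth overall.
overall = sample["sourceofwealth"].value_counts()

# Counts of each source within each gender.
by_gender = sample.groupby("gender")["sourceofwealth"].value_counts()

# The single most common source per gender.
top_by_gender = sample.groupby("gender")["sourceofwealth"].agg(
    lambda s: s.value_counts().idxmax()
)

print(overall.head(5))
print(top_by_gender)
```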

7) Given the richest person in a country, what % of the GDP is their wealth?


In [ ]:
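
This cell was left empty. A possible approach, sketched on hypothetical rows: find the row with the largest net worth in each country, then divide by GDP. It assumes `gdpcurrentus` is in plain US dollars and `networthusbillion` in billions of US dollars. Note that the 2014 rows displayed above show NaN for `gdpcurrentus`, so on the real data the GDP figures may have to come from another year or another source.

```python
import pandas as pd

# Hypothetical rows; country names and numbers are placeholders.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "citizenship": ["Xland", "Xland", "Yland", "Yland"],
    "networthusbillion": [10.0, 4.0, 2.0, 5.0],
    "gdpcurrentus": [1.0e12, 1.0e12, 2.5e11, 2.5e11],  # assumed to be in US dollars
})

# idxmax gives the row label of the richest person in each country.
richest_idx = sample.groupby("citizenship")["networthusbillion"].idxmax()
richest = sample.loc[richest_idx].copy()

# Convert billions to dollars, then express as a percentage of GDP.
richest["pct_of_gdp"] = (
    richest["networthusbillion"] * 1e9 / richest["gdpcurrentus"] * 100
)

print(richest[["citizenship", "name", "pct_of_gdp"]])
```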

8) Add up the wealth of all of the billionaires in a given country (or a few countries) and then compare it to the GDP of the country, or other billionaires, so like pit the US vs India
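
No answer was given here. A minimal sketch of the summing step, on hypothetical numbers (the net worths below are invented, not taken from the spreadsheet):

```python
import pandas as pd

# Hypothetical rows; the net worths are placeholders.
sample = pd.DataFrame({
    "citizenship": ["United States", "United States", "India", "India"],
    "networthusbillion": [20.0, 10.0, 5.0, 2.5],
})

# Total billionaire wealth per country, in billions of US dollars.
totals = sample.groupby("citizenship")["networthusbillion"].sum()

# How many times larger the US total is than India's.
ratio = totals["United States"] / totals["India"]

print(totals)
print(ratio)
```

Comparing the totals to GDP would follow the same pattern as any per-country division, provided a usable GDP column is available.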

9) What are the most common industries for billionaires to come from? What's the total amount of billionaire money from each industry?
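
No answer was given here. Both halves of the question can come from a single groupby with two aggregations; the sketch below uses hypothetical rows, with only the column names `industry` and `networthusbillion` taken from the spreadsheet.

```python
import pandas as pd

# Hypothetical rows; industry labels and amounts are placeholders.
sample = pd.DataFrame({
    "industry": ["Technology", "Retail", "Retail",
                 "Technology", "Retail", "Energy"],
    "networthusbillion": [3.0, 1.0, 2.0, 7.0, 1.5, 4.0],
})

# 'count' is the number of billionaires, 'sum' the total money, per industry.
by_industry = (
    sample.groupby("industry")["networthusbillion"]
    .agg(["count", "sum"])
    .sort_values("count", ascending=False)
)

print(by_industry)
```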

10) How many self made billionaires vs. others?
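
No answer was given here. A sketch using value_counts on the `selfmade` column, whose values in the rows displayed above are 'self-made' and 'inherited'; the rows in `sample` are hypothetical.

```python
import pandas as pd

# Hypothetical rows using the two labels seen in the real column.
sample = pd.DataFrame({
    "selfmade": ["self-made", "self-made", "inherited", "self-made"],
})

# Raw counts of each label.
counts = sample["selfmade"].value_counts()

# The same thing as a share of all rows.
shares = sample["selfmade"].value_counts(normalize=True)

print(counts)
print(shares)
```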

11) How old are billionaires? How old are billionaires self made vs. non self made? or different industries?
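
No answer was given here. A sketch on hypothetical rows; the real `age` column contains NaN values (visible in the head() output above), which mean() skips by default.

```python
import pandas as pd

# Hypothetical rows, including one missing age as in the real data.
sample = pd.DataFrame({
    "selfmade": ["self-made", "self-made", "self-made",
                 "inherited", "inherited"],
    "age": [58.0, 74.0, None, 73.0, 61.0],
})

# Mean age over everyone with a known age.
overall_age = sample["age"].mean()

# Mean age split by self-made vs. inherited.
age_by_origin = sample.groupby("selfmade")["age"].mean()

print(overall_age)
print(age_by_origin)
```

Swapping "selfmade" for "industry" in the groupby would give the per-industry version.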

12) Who are the youngest billionaires? The oldest? Age distribution - maybe make a graph about it?
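
No answer was given here. A sketch on hypothetical rows, dropping rows with no recorded age before ranking:

```python
import pandas as pd

# Hypothetical rows; names and ages are placeholders.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "age": [39.0, 83.0, None, 47.0, 58.0],
})

# Rows without an age cannot be ranked, so drop them first.
known = sample.dropna(subset=["age"])

youngest = known.nsmallest(2, "age")
oldest = known.nlargest(2, "age")

print(youngest)
print(oldest)
```

For the age distribution, `known["age"].hist()` would draw a histogram in the notebook.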

13) Maybe just make a graph about how wealthy they are in general?

14) Maybe plot their net worth vs age (scatterplot)

15) Make a bar graph of the top 10 or 20 richest


In [ ]:
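
The graphing questions (13 to 15) were left unanswered. A rough sketch of the three plots on hypothetical rows; the `matplotlib.use("Agg")` line is only there so the example also runs outside a notebook and can be dropped when `%matplotlib inline` is active.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd

# Hypothetical rows; names, ages, and net worths are placeholders.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "age": [58.0, 74.0, 77.0, 83.0, 39.0],
    "networthusbillion": [12.0, 9.5, 7.0, 3.2, 1.0],
})

# 13) Distribution of net worth.
ax_hist = sample["networthusbillion"].plot(kind="hist", bins=5)

# 14) Net worth against age.
ax_scatter = sample.plot(kind="scatter", x="age", y="networthusbillion")

# 15) The richest few as horizontal bars, largest on top.
top = sample.sort_values("networthusbillion", ascending=False).head(3)
ax_bar = top.sort_values("networthusbillion").plot(
    kind="barh", x="name", y="networthusbillion", legend=False
)
```

On the real data, `head(3)` would become `head(10)` or `head(20)` and `sample` would be `recent`.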