Part 1: Animals

1. Import pandas with the right name


In [1]:
import pandas as pd

2. Set all graphics from matplotlib to display inline


In [34]:
import matplotlib.pyplot as plt
%matplotlib inline

3. Read the CSV in (it should already be UTF-8, so you don't have to worry about encoding) and save it with the proper boring name.


In [3]:
df = pd.read_csv("07-hw-animals.csv")

In [4]:
df


Out[4]:
  animal        name  length
0    cat        Anne      35
1    cat         Bob      45
2    dog  Egglesburg      65
3    dog       Devon      50
4    cat     Charlie      32
5    dog    Fontaine      35

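If `07-hw-animals.csv` is not at hand, the same six rows can be rebuilt from an in-memory string. This is a minimal sketch that assumes the file has a header row `animal,name,length`, matching the columns displayed above; `io.StringIO` simply stands in for the file path.

```python
import io

import pandas as pd

# Assumed CSV contents, reconstructed from the Out[4] display above
csv_text = """animal,name,length
cat,Anne,35
cat,Bob,45
dog,Egglesburg,65
dog,Devon,50
cat,Charlie,32
dog,Fontaine,35
"""

# read_csv accepts any file-like object, so StringIO stands in for a path
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)
```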
4. Display the names of the columns in the csv


In [5]:
df.columns.values


Out[5]:
array(['animal', 'name', 'length'], dtype=object)

5. Display the first 3 animals.


In [6]:
df.head(3)


Out[6]:
  animal        name  length
0    cat        Anne      35
1    cat         Bob      45
2    dog  Egglesburg      65

6. Sort the animals to see the 3 longest animals.


In [7]:
df.sort_values(by='length', ascending=False).head(3)


Out[7]:
  animal        name  length
2    dog  Egglesburg      65
3    dog       Devon      50
1    cat         Bob      45

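`DataFrame.nlargest` is a shortcut for sorting in descending order and taking the head; the pandas documentation describes it as equivalent to `sort_values(..., ascending=False).head(n)` but more performant. A sketch on a small frame rebuilt from the table above:

```python
import pandas as pd

# Rebuilt from the animals table shown above
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})

# Same rows as df.sort_values(by='length', ascending=False).head(3)
longest = df.nlargest(3, 'length')
print(longest)
```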
7. What are the counts of the different values of the "animal" column?


In [8]:
df['animal'].value_counts()


Out[8]:
cat    3
dog    3
Name: animal, dtype: int64

8. Only select the dogs.


In [9]:
df['animal'] == 'dog'


Out[9]:
0    False
1    False
2     True
3     True
4    False
5     True
Name: animal, dtype: bool

In [10]:
df[df['animal'] == 'dog']


Out[10]:
  animal        name  length
2    dog  Egglesburg      65
3    dog       Devon      50
5    dog    Fontaine      35

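The comparison `df['animal'] == 'dog'` produces a boolean mask with one True/False per row, and indexing with that mask keeps the True rows. A sketch of three equivalent spellings, on a frame rebuilt from the data above:

```python
import pandas as pd

# Rebuilt from the animals table shown above
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})

# Boolean mask: True where the row is a dog
is_dog = df['animal'] == 'dog'

# Three equivalent ways to keep only the dog rows
dogs_mask = df[is_dog]
dogs_loc = df.loc[is_dog]
dogs_query = df.query("animal == 'dog'")
print(dogs_query)
```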
9. Display all of the animals that are greater than 40 cm.


In [11]:
df[df['length'] > 40]


Out[11]:
  animal        name  length
1    cat         Bob      45
2    dog  Egglesburg      65
3    dog       Devon      50

10. 'length' is the animal's length in cm. Create a new column called inches that is the length in inches.


In [12]:
inches_per_cm = 0.393701  # 1 cm is about 0.3937 inches
df['length_inches'] = df['length'] * inches_per_cm
df


Out[12]:
  animal        name  length  length_inches
0    cat        Anne      35      13.779535
1    cat         Bob      45      17.716545
2    dog  Egglesburg      65      25.590565
3    dog       Devon      50      19.685050
4    cat     Charlie      32      12.598432
5    dog    Fontaine      35      13.779535

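An inch is defined as exactly 2.54 cm, so dividing by 2.54 avoids the rounded factor 0.393701. A sketch comparing the two on the lengths above:

```python
import pandas as pd

lengths_cm = pd.Series([35, 45, 65, 50, 32, 35], name='length')

rounded = lengths_cm * 0.393701  # factor used in the notebook
exact = lengths_cm / 2.54        # 1 inch = 2.54 cm exactly

# The rounded factor is off by roughly 2e-7 inches per cm
print((rounded - exact).abs().max())
```

For animals measured in whole centimetres the difference is negligible; the exact form mainly avoids a magic number.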
11. Save the cats to a separate variable called "cats." Save the dogs to a separate variable called "dogs."


In [37]:
cats = df[df['animal'] == 'cat']
cats


Out[37]:
  animal     name  length  length_inches
0    cat     Anne      35      13.779535
1    cat      Bob      45      17.716545
4    cat  Charlie      32      12.598432

In [38]:
dogs = df[df['animal'] == 'dog']
dogs


Out[38]:
  animal        name  length  length_inches
2    dog  Egglesburg      65      25.590565
3    dog       Devon      50      19.685050
5    dog    Fontaine      35      13.779535

12. Display all of the animals that are cats and above 12 inches long. First do it using the "cats" variable, then do it using your normal dataframe.


In [44]:
cats[cats['length_inches'] > 12]


Out[44]:
  animal     name  length  length_inches
0    cat     Anne      35      13.779535
1    cat      Bob      45      17.716545
4    cat  Charlie      32      12.598432

In [16]:
# Using the full dataframe
df[(df['animal'] == 'cat') & (df['length_inches'] > 12)]


Out[16]:
  animal     name  length  length_inches
0    cat     Anne      35      13.779535
1    cat      Bob      45      17.716545
4    cat  Charlie      32      12.598432

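Each condition needs its own parentheses because `&` binds more tightly than `==` and `>` in Python; without them, Python would group `'cat' & df['length_inches']` first, which is not what is meant. A sketch of the parenthesized filter and the equivalent `query()` form, where plain `and` is allowed (this sketch converts with `/ 2.54` rather than the notebook's rounded factor):

```python
import pandas as pd

# Rebuilt from the animals table shown above
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})
df['length_inches'] = df['length'] / 2.54

# Parenthesize each comparison before combining with & (and) or | (or)
long_cats = df[(df['animal'] == 'cat') & (df['length_inches'] > 12)]

# The same filter written as a query string
long_cats_q = df.query("animal == 'cat' and length_inches > 12")
print(long_cats)
```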
13. What's the mean length of a cat?


In [42]:
cats['length'].mean()


Out[42]:
37.333333333333336

14. What's the mean length of a dog?


In [43]:
dogs['length'].mean()


Out[43]:
50.0

15. Use groupby to accomplish both of the above tasks at once.


In [19]:
animals = df.groupby(['animal'])
animals['length'].mean()


Out[19]:
animal
cat    37.333333
dog    50.000000
Name: length, dtype: float64
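`groupby` can also compute several statistics per group in one pass via `agg`. A sketch on a frame rebuilt from the data above; the choice of `count` and `max` alongside `mean` is just for illustration:

```python
import pandas as pd

# Rebuilt from the animals table shown above
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})

# One row per animal, one column per statistic
summary = df.groupby('animal')['length'].agg(['mean', 'count', 'max'])
print(summary)
```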

16. Make a histogram of the length of dogs.


In [53]:
dogs['length'].hist()


Out[53]:
<matplotlib.axes._subplots.AxesSubplot at 0x84bf250>

17. Change your graphing style to be something else (anything else!)


In [54]:
plt.style.use('ggplot')
dogs['length'].hist()


Out[54]:
<matplotlib.axes._subplots.AxesSubplot at 0x84d3bf0>

18. Make a horizontal bar graph of the length of the animals, with their name as the label


In [55]:
df.plot(kind='barh', x='name', y='length')


Out[55]:
<matplotlib.axes._subplots.AxesSubplot at 0x74b8870>

19. Make a sorted horizontal bar graph of the cats, with the larger cats on top.


In [56]:
cats.sort_values(by='length').plot(kind='barh', x='name', y='length')


Out[56]:
<matplotlib.axes._subplots.AxesSubplot at 0x7504330>
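`kind='barh'` draws the first row at the bottom of the axis, which is why an ascending sort ends with the longest cat on top. A sketch that checks this without a display, assuming matplotlib is installed and using its non-interactive `Agg` backend:

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; no notebook or window needed

import pandas as pd

# The three cats from the table above
cats = pd.DataFrame({
    'name': ['Anne', 'Bob', 'Charlie'],
    'length': [35, 45, 32],
})

ax = cats.sort_values(by='length').plot(kind='barh', x='name', y='length')

# Each bar is a Rectangle patch: width = length, y = vertical position
bars = sorted(ax.patches, key=lambda p: p.get_y())
print([bar.get_width() for bar in bars])  # listed bottom to top
```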

Part 2: Rich people

Answer a selection of the following questions, or any others you can think of.


In [105]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('richpeople.csv', encoding='latin-1')
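`encoding='latin-1'` tells pandas how to turn the file's bytes into text. A minimal sketch with a hypothetical one-row file (the name `José Example` is made up) whose non-ASCII character is encoded as Latin-1:

```python
import io

import pandas as pd

# Hypothetical CSV bytes; 'é' becomes the single byte 0xE9 in Latin-1
raw = "name,networthusbillion\nJosé Example,1.0\n".encode('latin-1')

# Decoding with the matching codec recovers the original text
people = pd.read_csv(io.BytesIO(raw), encoding='latin-1')
print(people.loc[0, 'name'])

# The same bytes are not valid UTF-8, so the default codec fails
utf8_failed = False
try:
    pd.read_csv(io.BytesIO(raw))
except UnicodeDecodeError as err:
    utf8_failed = True
    print('UTF-8 decoding failed:', err.reason)
```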

In [106]:
# Keep only the 2014 list
richpeople = df[df['year'] == 2014]
richpeople.columns


Out[106]:
Index(['year', 'name', 'rank', 'citizenship', 'countrycode',
       'networthusbillion', 'selfmade', 'typeofwealth', 'gender', 'age',
       'industry', 'IndustryAggregates', 'region', 'north',
       'politicalconnection', 'founder', 'generationofinheritance', 'sector',
       'company', 'companytype', 'relationshiptocompany', 'foundingdate',
       'gdpcurrentus', 'sourceofwealth', 'notes', 'notes2', 'source',
       'source_2', 'source_3', 'source_4'],
      dtype='object')

1) Who are the top 10 richest billionaires?


In [107]:
richpeople.sort_values(by='networthusbillion', ascending=False).head(10)


Out[107]:
year name rank citizenship countrycode networthusbillion selfmade typeofwealth gender age ... relationshiptocompany foundingdate gdpcurrentus sourceofwealth notes notes2 source source_2 source_3 source_4
284 2014 Bill Gates 1 United States USA 76.0 self-made founder non-finance male 58.0 ... founder 1975.0 NaN Microsoft NaN NaN http://www.forbes.com/profile/bill-gates/ NaN NaN NaN
348 2014 Carlos Slim Helu 2 Mexico MEX 72.0 self-made privatized and resources male 74.0 ... founder 1990.0 NaN telecom NaN NaN http://www.ozy.com/provocateurs/carlos-slims-w... NaN NaN NaN
124 2014 Amancio Ortega 3 Spain ESP 64.0 self-made founder non-finance male 77.0 ... founder 1975.0 NaN retail NaN NaN http://www.forbes.com/profile/amancio-ortega/ NaN NaN NaN
2491 2014 Warren Buffett 4 United States USA 58.2 self-made founder non-finance male 83.0 ... founder 1839.0 NaN Berkshire Hathaway NaN NaN http://www.forbes.com/lists/2009/10/billionair... http://www.forbes.com/companies/berkshire-hath... NaN NaN
1377 2014 Larry Ellison 5 United States USA 48.0 self-made founder non-finance male 69.0 ... founder 1977.0 NaN Oracle NaN NaN http://www.forbes.com/profile/larry-ellison/ http://www.businessinsider.com/how-larry-ellis... NaN NaN
509 2014 David Koch 6 United States USA 40.0 inherited inherited male 73.0 ... relation 1940.0 NaN diversified inherited from father NaN http://www.kochind.com/About_Koch/History_Time... NaN NaN NaN
381 2014 Charles Koch 6 United States USA 40.0 inherited inherited male 78.0 ... relation 1940.0 NaN diversified inherited from father NaN http://www.kochind.com/About_Koch/History_Time... NaN NaN NaN
2185 2014 Sheldon Adelson 8 United States USA 38.0 self-made self-made finance male 80.0 ... founder 1952.0 NaN casinos NaN NaN http://www.forbes.com/profile/sheldon-adelson/ http://lasvegassun.com/news/1996/nov/26/rat-pa... NaN NaN
429 2014 Christy Walton 9 United States USA 36.7 inherited inherited female 59.0 ... relation 1962.0 NaN Wal-Mart widow NaN http://www.forbes.com/profile/christy-walton/ NaN NaN NaN
1128 2014 Jim Walton 10 United States USA 34.7 inherited inherited male 66.0 ... relation 1962.0 NaN Wal-Mart inherited from father NaN http://www.forbes.com/profile/jim-walton/ NaN NaN NaN

10 rows × 30 columns

2) Who are the top 10 poorest billionaires?


In [108]:
richpeople.sort_values(by='networthusbillion').head(10)


Out[108]:
year name rank citizenship countrycode networthusbillion selfmade typeofwealth gender age ... relationshiptocompany foundingdate gdpcurrentus sourceofwealth notes notes2 source source_2 source_3 source_4
234 2014 B.R. Shetty 1565 India IND 1.0 self-made founder non-finance male 72.0 ... founder 1975.0 NaN healthcare NaN NaN http://en.wikipedia.org/wiki/B._R._Shetty http://www.nmchealth.com/dr-br-shetty/ NaN NaN
2092 2014 Rostam Azizi 1565 Tanzania TZA 1.0 self-made executive male 49.0 ... investor 1999.0 NaN telecom, investments NaN NaN http://www.forbes.com/profile/rostam-azizi/ http://en.wikipedia.org/wiki/Vodacom_Tanzania http://www.thecitizen.co.tz/News/Rostam--Dewji... NaN
2401 2014 Tory Burch 1565 United States USA 1.0 self-made founder non-finance female 47.0 ... founder 2004.0 NaN fashion NaN NaN http://en.wikipedia.org/wiki/J._Christopher_Burch http://www.vanityfair.com/news/2007/02/tory-bu... NaN NaN
734 2014 Fred Chang 1565 United States USA 1.0 self-made founder non-finance male 57.0 ... founder 2001.0 NaN online retailing NaN NaN http://en.wikipedia.org/wiki/Newegg http://www.newegg.com/Info/FactSheet.aspx http://www.forbes.com/sites/andreanavarro/2014... NaN
171 2014 Angela Bennett 1565 Australia AUS 1.0 inherited inherited female 69.0 ... relation 1955.0 NaN mining inherited from father shared fortune with brother http://www.forbes.com/profile/angela-bennett/ NaN NaN NaN
748 2014 Fu Kwan 1565 China CHN 1.0 self-made self-made finance male 56.0 ... chairman 1990.0 NaN diversified NaN NaN http://www.forbes.com/profile/fu-kwan/ http://www.macrolink.com.cn/en/AboutBig.aspx NaN NaN
2107 2014 Ryan Kavanaugh 1565 United States USA 1.0 self-made founder non-finance male 39.0 ... founder 2004.0 NaN Movies NaN NaN http://en.wikipedia.org/wiki/Ryan_Kavanaugh http://en.wikipedia.org/wiki/Relativity_Media http://www.vanityfair.com/news/2010/03/kavanau... NaN
1783 2014 O. Francis Biondi 1565 United States USA 1.0 self-made self-made finance male 49.0 ... founder 1995.0 NaN hedge fund NaN NaN http://www.forbes.com/profile/o-francis-biondi/ http://www.forbes.com/sites/nathanvardi/2014/0... NaN NaN
1371 2014 Lam Fong Ngo 1565 Macau MAC 1.0 self-made self-made finance female NaN ... Vice Chairman 1997.0 NaN casinos NaN NaN http://www.forbes.com/profile/david-chow-1/ http://www.macaulegend.com/html/about_mileston... Macau Legend to roll the dice on HK IPO; But l... NaN
702 2014 Feng Hailiang 1565 China CHN 1.0 self-made founder non-finance male 53.0 ... founder 1989.0 NaN copper processing & real estate NaN NaN http://www.forbes.com/profile/feng-hailiang/ http://www.hailiang.com/en/about_int.php NaN NaN

10 rows × 30 columns
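All ten rows above share the minimum net worth of 1.0 and rank 1565, so which ten appear depends on how the sort happens to break ties. `nsmallest(..., keep='all')` returns every tied row instead. A sketch on a few rows copied from the outputs in this notebook (not the full dataset):

```python
import pandas as pd

# A handful of 2014 rows taken from the tables in this notebook
sample = pd.DataFrame({
    'name': ['B.R. Shetty', 'Tory Burch', 'Fred Chang', 'Perenna Kei', 'Kirk Kerkorian'],
    'networthusbillion': [1.0, 1.0, 1.0, 1.3, 4.5],
})

# keep='first' (the default) cuts the tie at exactly n rows
first_two = sample.nsmallest(2, 'networthusbillion')

# keep='all' also keeps every row tied with the last value that makes the cut
all_ties = sample.nsmallest(2, 'networthusbillion', keep='all')
print(all_ties)
```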

3) What's the average wealth of a billionaire? Male? Female?


In [109]:
print("The average net worth of billionaires, in billions of US dollars, is", richpeople['networthusbillion'].mean())


The average net worth of billionaires, in billions of US dollars, is 3.90465819722

In [110]:
richpeople.groupby('gender')['networthusbillion'].mean()


Out[110]:
gender
female    3.920556
male      3.902716
Name: networthusbillion, dtype: float64
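Net worth is strongly right-skewed (from 1.0 at the bottom of the list to 76.0 at the top), so a few very large fortunes pull the mean up, and group sizes matter too. Reporting the median and count next to the mean gives a fuller picture. A sketch on a few rows copied from the outputs in this notebook (not the full dataset):

```python
import pandas as pd

# A handful of 2014 rows taken from the tables in this notebook
sample = pd.DataFrame({
    'name': ['Bill Gates', 'Jim Walton', 'B.R. Shetty', 'Fred Chang',
             'Christy Walton', 'Tory Burch', 'Angela Bennett'],
    'gender': ['male', 'male', 'male', 'male', 'female', 'female', 'female'],
    'networthusbillion': [76.0, 34.7, 1.0, 1.0, 36.7, 1.0, 1.0],
})

# Mean, median and group size side by side
summary = sample.groupby('gender')['networthusbillion'].agg(['mean', 'median', 'count'])
print(summary)
```

In this small sample the median sits well below the mean for both groups, which is what skewed data looks like.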

4) What country are most billionaires from?


In [111]:
richpeople['citizenship'].value_counts()


Out[111]:
United States           499
China                   152
Russia                  111
Germany                  85
Brazil                   65
India                    56
United Kingdom           47
Hong Kong                45
France                   43
Italy                    35
Canada                   32
Australia                29
Taiwan                   28
Japan                    27
South Korea              27
Spain                    26
Turkey                   24
Switzerland              22
Indonesia                19
Sweden                   19
Israel                   18
Singapore                16
Mexico                   16
Malaysia                 13
Chile                    12
Thailand                 11
Philippines              10
Austria                  10
Ukraine                   9
Norway                    9
                       ... 
Kuwait                    5
Poland                    5
Kazakhstan                5
Ireland                   5
Cyprus                    4
Morocco                   4
Colombia                  4
Finland                   4
United Arab Emirates      4
Nigeria                   4
Venezuela                 3
Greece                    3
Monaco                    3
Portugal                  3
Belgium                   3
Oman                      2
Macau                     2
New Zealand               2
Angola                    1
Guernsey                  1
Georgia                   1
Nepal                     1
Swaziland                 1
Algeria                   1
St. Kitts and Nevis       1
Tanzania                  1
Romania                   1
Lithuania                 1
Uganda                    1
Vietnam                   1
Name: citizenship, dtype: int64
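The full count list is long (the display above is truncated with `...`). `value_counts(normalize=True)` turns counts into shares, and `idxmax()` picks the top country directly. A sketch on a hypothetical citizenship column:

```python
import pandas as pd

# Hypothetical citizenship column, for illustration only
citizenship = pd.Series(['United States', 'China', 'United States',
                         'Russia', 'United States', 'China'])

counts = citizenship.value_counts()
shares = citizenship.value_counts(normalize=True)  # fractions that sum to 1

print(counts.idxmax())  # most common country
print(shares.head(3))
```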

4b) What are the most common industries for billionaires to come from?


In [112]:
richpeople['industry'].value_counts()


Out[112]:
Consumer                           291
Real Estate                        190
Retail, Restaurant                 174
Diversified financial              132
Technology-Computer                131
Money Management                   122
Media                              104
Energy                              87
Non-consumer industrial             83
Technology-Medical                  78
Mining and metals                   68
Constrution                         61
Other                               59
Hedge funds                         43
Private equity/leveraged buyout     18
0                                    6
Venture Capital                      5
Name: industry, dtype: int64
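The industry counts include a `0` category and the misspelled label `Constrution`, both of which come from the data itself. A sketch of cleaning those labels before counting, assuming the `0` is stored as the string `'0'` (in the real file it may be numeric instead):

```python
import numpy as np
import pandas as pd

# Hypothetical industry values mirroring the labels above
industry = pd.Series(['Consumer', 'Constrution', '0', 'Consumer', 'Media'])

cleaned = industry.replace({
    'Constrution': 'Construction',  # fix the typo in the source data
    '0': np.nan,                    # assumption: '0' is a placeholder for "unknown"
})

# value_counts() skips NaN by default
counts = cleaned.value_counts()
print(counts)
```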

5) How old are billionaires? How old are self-made billionaires compared with those who are not self-made?


In [113]:
print("On average billionaires are", richpeople['age'].mean(), "years old.")


On average billionaires are 63.3421383648 years old.

In [114]:
selfmade = richpeople[richpeople['selfmade'] == 'self-made']
print("Self-made billionaires are on average", selfmade['age'].mean(), "years old.")


Self-made billionaires are on average 62.6258992806 years old.

In [115]:
# Note: != also keeps any rows where 'selfmade' is missing (NaN)
non_selfmade = richpeople[richpeople['selfmade'] != 'self-made']
print("Non-self-made billionaires are on average", non_selfmade['age'].mean(), "years old.")


Non-self-made billionaires are on average 65.0083682008 years old.

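`groupby` computes both averages in one step. It also drops rows whose `selfmade` value is missing, whereas a `!= 'self-made'` filter counts them as not self-made. A sketch on hypothetical rows, one of which has no `selfmade` value:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the last one has no 'selfmade' value
people = pd.DataFrame({
    'selfmade': ['self-made', 'self-made', 'inherited', 'inherited', np.nan],
    'age': [60.0, 50.0, 70.0, 80.0, 40.0],
})

# One mean per distinct label; NaN labels are dropped by default
by_group = people.groupby('selfmade')['age'].mean()
print(by_group)

# The != filter puts the NaN row in the "not self-made" group
not_selfmade = people[people['selfmade'] != 'self-made']['age'].mean()
print(not_selfmade)
```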
6) Who are the youngest billionaires?


In [116]:
richpeople.sort_values(by='age', ascending=True).head(3)


Out[116]:
year name rank citizenship countrycode networthusbillion selfmade typeofwealth gender age ... relationshiptocompany foundingdate gdpcurrentus sourceofwealth notes notes2 source source_2 source_3 source_4
1838 2014 Perenna Kei 1284 Hong Kong HKG 1.3 inherited inherited female 24.0 ... relation 1996.0 NaN real estate inherited from father NaN http://en.wikipedia.org/wiki/Perenna_Kei http://www.loganestate.com/en/about.aspx?ftid=294 NaN NaN
605 2014 Dustin Moskovitz 202 United States USA 6.8 self-made founder non-finance male 29.0 ... founder 2004.0 NaN Facebook NaN NaN http://en.wikipedia.org/wiki/Dustin_Moskovitz http://www.forbes.com/profile/dustin-moskovitz/ https://www.facebook.com/facebook/info?tab=pag... NaN
1586 2014 Mark Zuckerberg 21 United States USA 28.5 self-made founder non-finance male 29.0 ... founder 2004.0 NaN Facebook NaN NaN http://www.forbes.com/profile/mark-zuckerberg/ NaN NaN NaN

3 rows × 30 columns

7) Who are the oldest?


In [117]:
richpeople.sort_values(by='age', ascending=False).head(3)


Out[117]:
year name rank citizenship countrycode networthusbillion selfmade typeofwealth gender age ... relationshiptocompany foundingdate gdpcurrentus sourceofwealth notes notes2 source source_2 source_3 source_4
516 2014 David Rockefeller, Sr. 580 United States USA 2.9 inherited inherited male 98.0 ... relation 1870.0 NaN oil, banking family made most of fortune in the late 19th a... NaN http://en.wikipedia.org/wiki/David_Rockefeller http://en.wikipedia.org/wiki/Standard_Oil http://en.wikipedia.org/wiki/Rockefeller_family NaN
1277 2014 Karl Wlaschek 305 Austria AUT 4.8 self-made founder non-finance male 96.0 ... founder 1953.0 NaN retail NaN NaN http://en.wikipedia.org/wiki/BILLA http://en.wikipedia.org/wiki/Karl_Wlaschek https://www.billa.at/Footer_Nav_Seiten/Geschic... NaN
1328 2014 Kirk Kerkorian 328 United States USA 4.5 self-made self-made finance male 96.0 ... investor 1924.0 NaN casinos, investments purchased in 1969 NaN http://en.wikipedia.org/wiki/Kirk_Kerkorian http://www.forbes.com/profile/kirk-kerkorian/ PROFILE: Las Vegas billionaire amassed his wea... NaN

3 rows × 30 columns
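`nsmallest` and `nlargest` return the youngest and oldest directly and skip missing ages (the data has some; Lam Fong Ngo's age shows as NaN in an earlier table). A sketch on a few rows copied from the outputs in this notebook:

```python
import numpy as np
import pandas as pd

# A handful of 2014 rows taken from the tables in this notebook
sample = pd.DataFrame({
    'name': ['Perenna Kei', 'Mark Zuckerberg', 'David Rockefeller, Sr.',
             'Karl Wlaschek', 'Lam Fong Ngo'],
    'age': [24.0, 29.0, 98.0, 96.0, np.nan],
})

youngest = sample.nsmallest(2, 'age')
oldest = sample.nlargest(2, 'age')  # the NaN age is never selected
print(youngest['name'].tolist())
print(oldest['name'].tolist())
```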

8) What does the age distribution look like? Plot a histogram.


In [119]:
plt.style.use('ggplot')
richpeople['age'].hist()


Out[119]:
<matplotlib.axes._subplots.AxesSubplot at 0x84fbb70>
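A histogram counts how many values fall into each bin; `hist()` accepts a `bins` argument to control them. `pd.cut` produces the same kind of counts as a table, which can be easier to quote than a picture. A sketch using decade-wide bins (an arbitrary choice) on hypothetical ages:

```python
import pandas as pd

# Hypothetical ages, for illustration only
ages = pd.Series([24, 29, 29, 45, 58, 63, 66, 74, 83, 98])

# Right-closed bins (20, 30], (30, 40], ..., (90, 100]
decades = pd.cut(ages, bins=list(range(20, 101, 10)))

# Empty bins are kept, so the table shows gaps in the distribution
counts = decades.value_counts().sort_index()
print(counts)
```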

9) Plot net worth against age as a scatter plot.


In [127]:
richpeople.plot(kind='scatter', x='age', y='networthusbillion', figsize=(10, 10), alpha=0.3)


Out[127]:
<matplotlib.axes._subplots.AxesSubplot at 0xaf4df70>
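The scatter plot shows the relationship visually; `Series.corr` puts a single number on it. Pearson correlation (the default) measures linear association and is sensitive to the few very large fortunes, so on this data it is only a rough summary. A sketch on hypothetical values that lie exactly on a line:

```python
import pandas as pd

# Hypothetical (age, net worth) pairs, for illustration only
sample = pd.DataFrame({
    'age': [30.0, 45.0, 60.0, 75.0],
    'networthusbillion': [1.0, 2.0, 3.0, 4.0],
})

# Pearson r ranges from -1 to 1; perfectly linear points give r = 1
r = sample['age'].corr(sample['networthusbillion'])
print(r)
```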
