Part One

Use the csv I've attached to answer the following questions:

1) Import pandas with the right name


In [3]:
import pandas as pd

2) Set all graphics from matplotlib to display inline


In [4]:
!pip install matplotlib
import matplotlib.pyplot as plt
%matplotlib inline


Requirement already satisfied (use --upgrade to upgrade): matplotlib in /Users/mercyemelike/.virtualenvs/data_analysis/lib/python3.5/site-packages
Requirement already satisfied (use --upgrade to upgrade): pytz in /Users/mercyemelike/.virtualenvs/data_analysis/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): cycler in /Users/mercyemelike/.virtualenvs/data_analysis/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): pyparsing!=2.0.0,!=2.0.4,>=1.5.6 in /Users/mercyemelike/.virtualenvs/data_analysis/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.6 in /Users/mercyemelike/.virtualenvs/data_analysis/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /Users/mercyemelike/.virtualenvs/data_analysis/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): six in /Users/mercyemelike/.virtualenvs/data_analysis/lib/python3.5/site-packages (from cycler->matplotlib)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-f508ce6dea2d> in <module>()
      1 get_ipython().system('pip install matplotlib')
----> 2 import matplotlib.pyplot as plt
      3 get_ipython().magic('matplotlib inline')

/Users/mercyemelike/.virtualenvs/data_analysis/lib/python3.5/site-packages/matplotlib/pyplot.py in <module>()
    112 
    113 from matplotlib.backends import pylab_setup
--> 114 _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
    115 
    116 _IP_REGISTERED = None

/Users/mercyemelike/.virtualenvs/data_analysis/lib/python3.5/site-packages/matplotlib/backends/__init__.py in pylab_setup()
     30     # imports. 0 means only perform absolute imports.
     31     backend_mod = __import__(backend_name,
---> 32                              globals(),locals(),[backend_name],0)
     33 
     34     # Things we pull in from all backends

/Users/mercyemelike/.virtualenvs/data_analysis/lib/python3.5/site-packages/matplotlib/backends/backend_macosx.py in <module>()
     22 
     23 import matplotlib
---> 24 from matplotlib.backends import _macosx
     25 
     26 

RuntimeError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends. If you are Working with Matplotlib in a virtual enviroment see 'Working with Matplotlib in Virtual environments' in the Matplotlib FAQ
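
The traceback above shows matplotlib trying to load the macosx backend, which needs a framework build of Python. One common workaround, offered here as an assumption rather than the fix the author used, is to choose a different backend before `pyplot` is imported for the first time. In a notebook, running `%matplotlib inline` before the import typically plays that role; the plain-Python sketch below uses the non-interactive `Agg` backend as a stand-in.

```python
import matplotlib

# Choose a backend that does not need a framework build of Python.
# This has to run before pyplot is imported for the first time.
matplotlib.use("Agg")

import matplotlib.pyplot as plt

# Draw a trivial figure to confirm that the import and backend work.
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])
backend_name = matplotlib.get_backend()
```

If the import still fails, the Matplotlib FAQ section named in the error message is the place to look next.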

3) Read the csv in (it should already be UTF-8, so you don't have to worry about encoding) and save it with the proper boring name


In [ ]:
df = pd.read_csv("07-hw-animals copy.csv")

4) Display the names of the columns in the csv


In [ ]:
df.columns.values

5) Display the first 3 animals.


In [ ]:
df.head(3)

6) Sort the animals to see the 3 longest animals.


In [ ]:
df.sort_values(by='length', ascending=False).head(3)

7) What are the counts of the different values of the "animal" column? a.k.a. how many cats and how many dogs.


In [ ]:
df['animal'].value_counts()

8) Only select the dogs.


In [ ]:
df[df['animal'] == 'dog']

9) Display all of the animals that are greater than 40 cm.


In [ ]:
df[df['length']>40]

10) 'length' is the animal's length in cm. Create a new column called inches that is the length in inches.


In [ ]:
df['inches'] = df['length']*0.393701

11) Save the cats to a separate variable called "cats." Save the dogs to a separate variable called "dogs."


In [ ]:
cats = df[df['animal'] =='cat']
dogs = df[df['animal'] == 'dog']

12) Display all of the animals that are cats and above 12 inches long. First do it using the "cats" variable, then do it using your normal dataframe.


In [ ]:
cats[cats['inches']>12]

In [ ]:
df[(df['animal']=='cat') & (df['inches']>12)]

13) What's the mean length of a cat?


In [ ]:
cats.describe()

The mean length of a cat is 14.698 inches.

14) What's the mean length of a dog?


In [ ]:
dogs.describe()

The mean length of a dog is 19.685 inches.

15) Use groupby to accomplish both of the above tasks at once.


In [ ]:
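# Select the numeric columns explicitly; recent pandas versions raise an
# error when mean() is asked to average text columns such as 'name'.
df.groupby('animal')[['length', 'inches']].mean()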

16) Make a histogram of the length of dogs. I apologize that it is so boring.


In [ ]:
dogs.hist('length')

17) Change your graphing style to be something else (anything else!)


In [ ]:
# Switch the plotting style first, then draw any chart to see the change.
plt.style.use('ggplot')
df.plot(kind='bar', x='name', y='length', legend=False)

18) Make a horizontal bar graph of the length of the animals, with their name as the label (look at the billionaires notebook I put on Slack!)


In [ ]:
# Label each bar with the animal's name, as the question asks.
df.plot(kind='barh', x='name', y='length', legend=False)

19) Make a sorted horizontal bar graph of the cats, with the larger cats on top.


In [ ]:
# barh draws the first row at the bottom, so ascending order puts the largest cat on top.
sortedcats = cats.sort_values(by='length', ascending=True)
sortedcats.plot(kind='barh', x='name', y='length', legend=False)

Part Two


In [6]:
df = pd.read_excel('billionaires copy.xlsx')
df.columns.values


Out[6]:
array(['year', 'name', 'rank', 'citizenship', 'countrycode',
       'networthusbillion', 'selfmade', 'typeofwealth', 'gender', 'age',
       'industry', 'IndustryAggregates', 'region', 'north',
       'politicalconnection', 'founder', 'generationofinheritance',
       'sector', 'company', 'companytype', 'relationshiptocompany',
       'foundingdate', 'gdpcurrentus', 'sourceofwealth', 'notes', 'notes2',
       'source', 'source_2', 'source_3', 'source_4'], dtype=object)

In [7]:
recent = df[df['year']==2014]
recent.head(5)


Out[7]:
(index) | year | name | rank | citizenship | countrycode | networthusbillion | selfmade | typeofwealth | gender | age | ... | relationshiptocompany | foundingdate | gdpcurrentus | sourceofwealth | notes | notes2 | source | source_2 | source_3 | source_4
1 | 2014 | A. Jerrold Perenchio | 663 | United States | USA | 2.6 | self-made | executive | male | 83.0 | ... | former chairman and CEO | 1955.0 | NaN | television, Univision | represented Marlon Brando and Elizabeth Taylor | NaN | http://en.wikipedia.org/wiki/Jerry_Perenchio | http://www.forbes.com/profile/a-jerrold-perenc... | COLUMN ONE; A Hollywood Player Who Owns the Ga... | NaN
5 | 2014 | Abdulla Al Futtaim | 687 | United Arab Emirates | ARE | 2.5 | inherited | inherited | male | NaN | ... | relation | 1930.0 | NaN | auto dealers, investments | company split between him and cousin in 2000 | NaN | http://en.wikipedia.org/wiki/Al-Futtaim_Group | http://www.al-futtaim.ae/content/groupProfile.asp | NaN | NaN
6 | 2014 | Abdulla bin Ahmad Al Ghurair | 305 | United Arab Emirates | ARE | 4.8 | inherited | inherited | male | NaN | ... | relation | 1960.0 | NaN | diversified | inherited from father | NaN | http://en.wikipedia.org/wiki/Al-Ghurair_Group | http://www.alghurair.com/about-us/our-history | NaN | NaN
8 | 2014 | Abdullah Al Rajhi | 731 | Saudi Arabia | SAU | 2.4 | self-made | self-made finance | male | NaN | ... | founder | 1957.0 | NaN | banking | NaN | NaN | http://en.wikipedia.org/wiki/Al-Rajhi_Bank | http://www.alrajhibank.com.sa/ar/investor-rela... | http://www.alrajhibank.com.sa/ar/about-us/page... | NaN
9 | 2014 | Abdulsamad Rabiu | 1372 | Nigeria | NGA | 1.2 | self-made | founder non-finance | male | 54.0 | ... | founder | 1988.0 | NaN | sugar, flour, cement | NaN | NaN | http://www.forbes.com/profile/abdulsamad-rabiu/ | http://www.bloomberg.com/research/stocks/priva... | NaN | NaN

5 rows × 30 columns

1) What country are most billionaires from? For the top ones, how many billionaires per billion people?


In [9]:
recent['countrycode'].value_counts()


Out[9]:
USA       499
CHN       152
RUS       111
DEU        85
BRA        65
IND        56
GBR        47
HKG        45
FRA        43
ITA        35
CAN        32
AUS        29
Taiwan     28
KOR        27
JPN        27
ESP        26
TUR        24
CHE        22
SWE        19
IDN        19
ISR        18
SGP        16
MEX        16
MYS        13
CHL        12
THA        11
PHL        10
AUT        10
UKR         9
NOR         9
         ... 
ARG         5
IRL         5
KWT         5
POL         5
KAZ         5
MAR         4
ARE         4
CYP         4
COL         4
NGA         4
PRT         3
GRC         3
MCO         3
BEL         3
VEN         3
NZL         2
MAC         2
OMN         2
LTU         1
KNA         1
GGY         1
TZA         1
VNM         1
UGA         1
NPL         1
ROU         1
GEO         1
SWZ         1
DZA         1
AGO         1
Name: countrycode, dtype: int64
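
The second half of the question (billionaires per billion people) needs population figures, which the dataset does not contain. A minimal sketch, assuming rough populations that are illustrative placeholders rather than sourced data:

```python
import pandas as pd

# Counts copied from the value_counts() output above.
billionaires = pd.Series({"USA": 499, "CHN": 152, "RUS": 111})

# ASSUMPTION: rough populations in billions, for illustration only.
# Replace them with figures from a source you trust.
population_bn = pd.Series({"USA": 0.32, "CHN": 1.36, "RUS": 0.14})

# Billionaires per billion people, highest first.
per_billion = (billionaires / population_bn).sort_values(ascending=False)
print(per_billion.round(1))
```

Dividing two Series aligns them on the country code, so the order in which the countries are listed does not matter.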

In [ ]:

2) Who are the top 10 richest billionaires?


In [10]:
recent.sort_values(by='networthusbillion', ascending=False).head(10)


Out[10]:
(index) | year | name | rank | citizenship | countrycode | networthusbillion | selfmade | typeofwealth | gender | age | ... | relationshiptocompany | foundingdate | gdpcurrentus | sourceofwealth | notes | notes2 | source | source_2 | source_3 | source_4
284 | 2014 | Bill Gates | 1 | United States | USA | 76.0 | self-made | founder non-finance | male | 58.0 | ... | founder | 1975.0 | NaN | Microsoft | NaN | NaN | http://www.forbes.com/profile/bill-gates/ | NaN | NaN | NaN
348 | 2014 | Carlos Slim Helu | 2 | Mexico | MEX | 72.0 | self-made | privatized and resources | male | 74.0 | ... | founder | 1990.0 | NaN | telecom | NaN | NaN | http://www.ozy.com/provocateurs/carlos-slims-w... | NaN | NaN | NaN
124 | 2014 | Amancio Ortega | 3 | Spain | ESP | 64.0 | self-made | founder non-finance | male | 77.0 | ... | founder | 1975.0 | NaN | retail | NaN | NaN | http://www.forbes.com/profile/amancio-ortega/ | NaN | NaN | NaN
2491 | 2014 | Warren Buffett | 4 | United States | USA | 58.2 | self-made | founder non-finance | male | 83.0 | ... | founder | 1839.0 | NaN | Berkshire Hathaway | NaN | NaN | http://www.forbes.com/lists/2009/10/billionair... | http://www.forbes.com/companies/berkshire-hath... | NaN | NaN
1377 | 2014 | Larry Ellison | 5 | United States | USA | 48.0 | self-made | founder non-finance | male | 69.0 | ... | founder | 1977.0 | NaN | Oracle | NaN | NaN | http://www.forbes.com/profile/larry-ellison/ | http://www.businessinsider.com/how-larry-ellis... | NaN | NaN
509 | 2014 | David Koch | 6 | United States | USA | 40.0 | inherited | inherited | male | 73.0 | ... | relation | 1940.0 | NaN | diversified | inherited from father | NaN | http://www.kochind.com/About_Koch/History_Time... | NaN | NaN | NaN
381 | 2014 | Charles Koch | 6 | United States | USA | 40.0 | inherited | inherited | male | 78.0 | ... | relation | 1940.0 | NaN | diversified | inherited from father | NaN | http://www.kochind.com/About_Koch/History_Time... | NaN | NaN | NaN
2185 | 2014 | Sheldon Adelson | 8 | United States | USA | 38.0 | self-made | self-made finance | male | 80.0 | ... | founder | 1952.0 | NaN | casinos | NaN | NaN | http://www.forbes.com/profile/sheldon-adelson/ | http://lasvegassun.com/news/1996/nov/26/rat-pa... | NaN | NaN
429 | 2014 | Christy Walton | 9 | United States | USA | 36.7 | inherited | inherited | female | 59.0 | ... | relation | 1962.0 | NaN | Wal-Mart | widow | NaN | http://www.forbes.com/profile/christy-walton/ | NaN | NaN | NaN
1128 | 2014 | Jim Walton | 10 | United States | USA | 34.7 | inherited | inherited | male | 66.0 | ... | relation | 1962.0 | NaN | Wal-Mart | inherited from father | NaN | http://www.forbes.com/profile/jim-walton/ | NaN | NaN | NaN

10 rows × 30 columns

3) What's the average wealth of a billionaire? Male? Female?


In [11]:
recent.groupby('gender')['networthusbillion'].mean()


Out[11]:
gender
female    3.920556
male      3.902716
Name: networthusbillion, dtype: float64
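
The groupby above covers the male and female averages but not the overall one. A short sketch of getting all three, using a small made-up stand-in for `recent` (the rows and values are hypothetical; only the column names come from the notebook):

```python
import pandas as pd

# Hypothetical stand-in for `recent`, using the notebook's column names.
recent = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "gender": ["male", "male", "female", "female"],
    "networthusbillion": [2.0, 4.0, 3.0, 5.0],
})

# Average wealth across every billionaire in the frame.
overall_mean = recent["networthusbillion"].mean()

# Mean, median and head count per gender in one table.
by_gender = recent.groupby("gender")["networthusbillion"].agg(["mean", "median", "count"])
print(overall_mean)
print(by_gender)
```

Showing the count next to the mean helps when one group is much smaller than the other, since a few very large fortunes can then move its average a lot.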

4) Who is the poorest billionaire? Who are the top 10 poorest billionaires?


In [12]:
recent.sort_values('networthusbillion').head(10)


Out[12]:
(index) | year | name | rank | citizenship | countrycode | networthusbillion | selfmade | typeofwealth | gender | age | ... | relationshiptocompany | foundingdate | gdpcurrentus | sourceofwealth | notes | notes2 | source | source_2 | source_3 | source_4
234 | 2014 | B.R. Shetty | 1565 | India | IND | 1.0 | self-made | founder non-finance | male | 72.0 | ... | founder | 1975.0 | NaN | healthcare | NaN | NaN | http://en.wikipedia.org/wiki/B._R._Shetty | http://www.nmchealth.com/dr-br-shetty/ | NaN | NaN
2092 | 2014 | Rostam Azizi | 1565 | Tanzania | TZA | 1.0 | self-made | executive | male | 49.0 | ... | investor | 1999.0 | NaN | telecom, investments | NaN | NaN | http://www.forbes.com/profile/rostam-azizi/ | http://en.wikipedia.org/wiki/Vodacom_Tanzania | http://www.thecitizen.co.tz/News/Rostam--Dewji... | NaN
2401 | 2014 | Tory Burch | 1565 | United States | USA | 1.0 | self-made | founder non-finance | female | 47.0 | ... | founder | 2004.0 | NaN | fashion | NaN | NaN | http://en.wikipedia.org/wiki/J._Christopher_Burch | http://www.vanityfair.com/news/2007/02/tory-bu... | NaN | NaN
734 | 2014 | Fred Chang | 1565 | United States | USA | 1.0 | self-made | founder non-finance | male | 57.0 | ... | founder | 2001.0 | NaN | online retailing | NaN | NaN | http://en.wikipedia.org/wiki/Newegg | http://www.newegg.com/Info/FactSheet.aspx | http://www.forbes.com/sites/andreanavarro/2014... | NaN
171 | 2014 | Angela Bennett | 1565 | Australia | AUS | 1.0 | inherited | inherited | female | 69.0 | ... | relation | 1955.0 | NaN | mining | inherited from father | shared fortune with brother | http://www.forbes.com/profile/angela-bennett/ | NaN | NaN | NaN
748 | 2014 | Fu Kwan | 1565 | China | CHN | 1.0 | self-made | self-made finance | male | 56.0 | ... | chairman | 1990.0 | NaN | diversified | NaN | NaN | http://www.forbes.com/profile/fu-kwan/ | http://www.macrolink.com.cn/en/AboutBig.aspx | NaN | NaN
2107 | 2014 | Ryan Kavanaugh | 1565 | United States | USA | 1.0 | self-made | founder non-finance | male | 39.0 | ... | founder | 2004.0 | NaN | Movies | NaN | NaN | http://en.wikipedia.org/wiki/Ryan_Kavanaugh | http://en.wikipedia.org/wiki/Relativity_Media | http://www.vanityfair.com/news/2010/03/kavanau... | NaN
1783 | 2014 | O. Francis Biondi | 1565 | United States | USA | 1.0 | self-made | self-made finance | male | 49.0 | ... | founder | 1995.0 | NaN | hedge fund | NaN | NaN | http://www.forbes.com/profile/o-francis-biondi/ | http://www.forbes.com/sites/nathanvardi/2014/0... | NaN | NaN
1371 | 2014 | Lam Fong Ngo | 1565 | Macau | MAC | 1.0 | self-made | self-made finance | female | NaN | ... | Vice Chairman | 1997.0 | NaN | casinos | NaN | NaN | http://www.forbes.com/profile/david-chow-1/ | http://www.macaulegend.com/html/about_mileston... | Macau Legend to roll the dice on HK IPO; But l... | NaN
702 | 2014 | Feng Hailiang | 1565 | China | CHN | 1.0 | self-made | founder non-finance | male | 53.0 | ... | founder | 1989.0 | NaN | copper processing & real estate | NaN | NaN | http://www.forbes.com/profile/feng-hailiang/ | http://www.hailiang.com/en/about_int.php | NaN | NaN

10 rows × 30 columns

5) What is 'relationship to company'? And what are the most common relationships?


In [18]:
recent.groupby('relationshiptocompany')['relationshiptocompany'].count()


Out[18]:
relationshiptocompany
CEO                                                 8
COO                                                 1
Chairman                                            8
Chairman and Chief Executive Officer               15
Chairman, CEO                                       1
Chairman/founder                                    1
Chairman/shareholder                                1
Chief Executive                                     2
Exectuitve Director                                 1
Global Head of Real Estate                          1
Head of Board of Directors                          1
Honorary President for Life                         1
Relation                                            3
Vice Chairman                                       3
Vice President                                      1
Vice President of Infrastructure Software           1
ceo                                                 8
chairman                                           64
chairman and ceo                                    1
chairman of management committee                    1
chairman of the board                               1
chairwomen                                          1
chariman                                            1
co-chairman                                         1
co-director of zinc, copper and lead                1
deputy chairman                                     1
director                                            1
employee                                            2
executive chairman                                  3
former CEO                                          5
                                                 ... 
founder/CEO                                         2
founder/chairman                                    3
founder/president                                   1
founder/relation                                    1
founder/vice chairman                               2
general director                                    2
head of Microsoft's application software group      1
head of high-yield bond trading dept                1
inherited                                           1
inventor                                            1
investor                                           30
investor and  CEO                                   1
investor/founder                                    1
lawyer                                              1
leadership                                          2
owner                                              79
owner and former CEO                                1
owner and vice chair                                1
partner                                             4
president                                           8
president and ceo                                   2
relation                                          515
relation and ceo                                    1
relation and chairman                               3
relation/vice chairman                              1
relative                                            2
shareholder                                         1
supervisory board or directors                      1
vice chairman                                       1
vice-chairman                                       1
Name: relationshiptocompany, dtype: int64
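
The counts above split the same relationship across several spellings ('CEO' and 'ceo', 'Relation' and 'relation'). A minimal sketch of normalizing the labels before counting, using a few hypothetical rows rather than the real data:

```python
import pandas as pd

# Hypothetical stand-in for `recent`; the trailing space and None are made up
# to show how messy labels behave.
recent = pd.DataFrame({
    "relationshiptocompany": [
        "CEO", "ceo", "Chairman", "chairman ",
        "relation", "Relation", "relation", None,
    ],
})

# Trim whitespace and lowercase so that variants collapse into one label.
cleaned = recent["relationshiptocompany"].str.strip().str.lower()

# value_counts() sorts by frequency and skips missing values by default.
counts = cleaned.value_counts()
print(counts)
```

This only merges case and whitespace variants. Misspellings such as 'chariman' would still need a manual mapping, for example with `Series.replace`.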

6) Most common source of wealth? Male vs. female?


In [31]:
recent.groupby('gender')['sourceofwealth'].sort_values()


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-31-c49f96013923> in <module>()
----> 1 recent.groupby('gender')['sourceofwealth'].sort_values()

/Users/mercyemelike/.virtualenvs/data_analysis/lib/python3.5/site-packages/pandas/core/groupby.py in __getattr__(self, attr)
    493             return self[attr]
    494         if hasattr(self.obj, attr):
--> 495             return self._make_wrapper(attr)
    496 
    497         raise AttributeError("%r object has no attribute %r" %

/Users/mercyemelike/.virtualenvs/data_analysis/lib/python3.5/site-packages/pandas/core/groupby.py in _make_wrapper(self, name)
    507                    "using the 'apply' method".format(kind, name,
    508                                                      type(self).__name__))
--> 509             raise AttributeError(msg)
    510 
    511         # need to setup the selection

AttributeError: Cannot access callable attribute 'sort_values' of 'SeriesGroupBy' objects, try using the 'apply' method
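
The error above comes from calling `sort_values()` on a grouped Series, which is not supported. One way to answer the question, sketched on hypothetical rows, is `value_counts()`, which works both on a plain column and on a grouped one:

```python
import pandas as pd

# Hypothetical stand-in for `recent`, using the notebook's column names.
recent = pd.DataFrame({
    "gender": ["male", "male", "male", "male", "female", "female", "female"],
    "sourceofwealth": [
        "real estate", "real estate", "real estate", "banking",
        "Wal-Mart", "Wal-Mart", "retail",
    ],
})

# Most common source of wealth overall.
overall_top = recent["sourceofwealth"].value_counts().head(1)

# Counts per gender, sorted from most to least common within each gender.
by_gender = recent.groupby("gender")["sourceofwealth"].value_counts()

# Keep only the most common source for each gender.
top_by_gender = by_gender.groupby(level="gender").head(1)
print(top_by_gender)
```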

7) Given the richest person in a country, what % of the GDP is their wealth?
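
A possible approach, sketched on made-up rows: find the richest person per country with `idxmax()`, then divide their wealth by the country's GDP. The country codes, names and GDP values below are hypothetical, and the sketch assumes `gdpcurrentus` is in plain US dollars. In the 2014 rows displayed earlier that column is NaN, so the real data may need GDP from another year or source.

```python
import pandas as pd

# Hypothetical stand-in for `recent`; every value here is made up.
recent = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "countrycode": ["XXA", "XXA", "XXB", "XXB"],
    "networthusbillion": [10.0, 2.0, 5.0, 1.0],
    "gdpcurrentus": [1.0e12, 1.0e12, 2.5e11, 2.5e11],
})

# Row label of the richest person in each country.
richest_idx = recent.groupby("countrycode")["networthusbillion"].idxmax()
richest = recent.loc[richest_idx].copy()

# ASSUMPTION: gdpcurrentus is in dollars, so convert billions to dollars first.
richest["pct_of_gdp"] = richest["networthusbillion"] * 1e9 / richest["gdpcurrentus"] * 100
print(richest[["countrycode", "name", "pct_of_gdp"]])
```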


In [ ]:

8) Add up the wealth of all of the billionaires in a given country (or a few countries) and then compare it to the GDP of the country, or to other countries' billionaires. For example, pit the US against India.
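
One way this could look, as a sketch on made-up numbers: sum wealth per country, take the GDP value for that country, and express the total as a share of GDP. The figures are hypothetical, and `gdpcurrentus` is assumed to be in plain US dollars.

```python
import pandas as pd

# Hypothetical stand-in for `recent`; wealth and GDP values are made up.
recent = pd.DataFrame({
    "countrycode": ["USA", "USA", "IND", "IND", "IND"],
    "networthusbillion": [10.0, 6.0, 3.0, 2.0, 1.0],
    "gdpcurrentus": [2.0e12, 2.0e12, 5.0e11, 5.0e11, 5.0e11],
})

# Total billionaire wealth and the (repeated) GDP value for each country.
by_country = recent.groupby("countrycode").agg(
    total_wealth_bn=("networthusbillion", "sum"),
    gdp_usd=("gdpcurrentus", "first"),
)

# ASSUMPTION: GDP is in dollars, so convert billions to dollars before dividing.
by_country["wealth_pct_of_gdp"] = by_country["total_wealth_bn"] * 1e9 / by_country["gdp_usd"] * 100

# Put the two countries of interest side by side.
comparison = by_country.loc[["USA", "IND"]]
print(comparison)
```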

9) What are the most common industries for billionaires to come from? What's the total amount of billionaire money from each industry?


In [24]:
recent['industry'].value_counts()


Out[24]:
Consumer                           291
Real Estate                        190
Retail, Restaurant                 174
Diversified financial              132
Technology-Computer                131
Money Management                   122
Media                              104
Energy                              87
Non-consumer industrial             83
Technology-Medical                  78
Mining and metals                   68
Constrution                         61
Other                               59
Hedge funds                         43
Private equity/leveraged buyout     18
0                                    6
Venture Capital                      5
Name: industry, dtype: int64

In [29]:
recent.groupby('industry')['networthusbillion'].sum().sort_values(ascending=False).head(10)


Out[29]:
industry
Consumer                   1177.8
Retail, Restaurant          820.9
Technology-Computer         684.6
Diversified financial       614.4
Real Estate                 573.8
Media                       490.5
Money Management            381.3
Energy                      340.5
Non-consumer industrial     298.4
Mining and metals           240.6
Name: networthusbillion, dtype: float64

10) How many self-made billionaires vs. others?
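
A minimal sketch for this one, on hypothetical rows: `value_counts()` on the `selfmade` column gives the raw counts, and `normalize=True` turns them into shares.

```python
import pandas as pd

# Hypothetical stand-in for `recent`; None marks a missing label.
recent = pd.DataFrame({
    "selfmade": ["self-made", "inherited", "self-made", "self-made", None],
})

# Raw counts, keeping missing values visible.
counts = recent["selfmade"].value_counts(dropna=False)

# Shares among the rows that have a label.
shares = recent["selfmade"].value_counts(normalize=True)
print(counts)
print(shares)
```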

11) How old are billionaires? How do the ages of self-made and non-self-made billionaires compare? What about different industries?
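
A sketch of the age questions on hypothetical rows: `describe()` summarizes ages overall, and a groupby on `selfmade` or `industry` splits the summary by group. Missing ages are skipped automatically.

```python
import pandas as pd

# Hypothetical stand-in for `recent`; ages are made up and one is missing.
recent = pd.DataFrame({
    "selfmade": ["self-made", "self-made", "inherited", "inherited", "inherited"],
    "industry": ["Media", "Energy", "Media", "Energy", "Energy"],
    "age": [50.0, 60.0, 70.0, None, 80.0],
})

# Overall summary: count, mean, min, quartiles, max.
age_summary = recent["age"].describe()

# Mean age for self-made vs. inherited wealth.
age_by_selfmade = recent.groupby("selfmade")["age"].mean()

# Median age per industry.
age_by_industry = recent.groupby("industry")["age"].median()
print(age_by_selfmade)
print(age_by_industry)
```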

12) Who are the youngest billionaires? The oldest? Age distribution - maybe make a graph about it?
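
For the youngest and oldest, `nsmallest()` and `nlargest()` are one option. The sketch below uses hypothetical names and ages and drops rows with no recorded age first.

```python
import pandas as pd

# Hypothetical stand-in for `recent`; one age is missing on purpose.
recent = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "age": [31.0, 85.0, None, 47.0, 92.0],
})

# Ignore billionaires whose age is not recorded.
known_age = recent.dropna(subset=["age"])

youngest = known_age.nsmallest(2, "age")
oldest = known_age.nlargest(2, "age")
print(youngest)
print(oldest)
```

For the distribution, `known_age["age"].hist()` would draw a histogram in the same way as the dog lengths in Part One.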

13) Maybe just make a graph about how wealthy they are in general?

14) Maybe plot their net worth vs age (scatterplot)
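
A scatterplot sketch on hypothetical rows. The `Agg` backend is chosen only so the snippet runs outside a notebook; inside a notebook, `%matplotlib inline` would be used instead.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import pandas as pd

# Hypothetical stand-in for `recent`; one age is missing on purpose.
recent = pd.DataFrame({
    "age": [31.0, 47.0, 58.0, None, 83.0],
    "networthusbillion": [1.2, 3.5, 76.0, 2.4, 58.2],
})

# Drop rows that cannot be placed on both axes.
plot_data = recent.dropna(subset=["age", "networthusbillion"])

ax = plot_data.plot(kind="scatter", x="age", y="networthusbillion")
ax.set_xlabel("Age")
ax.set_ylabel("Net worth (USD billions)")
```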

15) Make a bar graph of the top 10 or 20 richest
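
A bar-graph sketch on hypothetical rows. It keeps the top 3 so the toy data stays small; with the real data, the `3` would become 10 or 20.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import pandas as pd

# Hypothetical stand-in for `recent`; names and values are made up.
recent = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "networthusbillion": [3.0, 76.0, 12.5, 40.0, 1.0],
})

# Take the richest rows, then sort ascending so the richest bar ends up on top
# (barh draws the first row at the bottom).
top = recent.nlargest(3, "networthusbillion").sort_values("networthusbillion")

ax = top.plot(kind="barh", x="name", y="networthusbillion", legend=False)
ax.set_xlabel("Net worth (USD billions)")
```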


In [ ]: