01: Building a pandas Cheat Sheet, Part 1

Use the csv I've attached to answer the following questions:

Import pandas with the right name


In [2]:
import pandas as pd

Set all graphics from matplotlib to display inline


In [3]:
%matplotlib inline

Read the csv in (it should be UTF-8 already so you don't have to worry about encoding), save it with the proper boring name


In [ ]:
df = pd.read_csv('07-hw-animals.csv')

Display the names of the columns in the csv


In [5]:
df.columns


Out[5]:
Index(['animal', 'name', 'length'], dtype='object')

Display the first 3 animals.


In [6]:
df['animal'].head(3)


Out[6]:
0    cat
1    cat
2    dog
Name: animal, dtype: object

Sort the animals to see the 3 longest animals.


In [7]:
df.sort_values('length', ascending = False).head(3)


Out[7]:
  animal        name  length
2    dog  Egglesburg      65
3    dog       Devon      50
1    cat         Bob      45

In [ ]:
# or
df.sort_values('length').tail(3)
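
A third option, sketched here on a small frame rebuilt from the rows shown in the outputs of this notebook (so the CSV is not needed): `nlargest` picks the top rows directly, largest first. The variable names `animals` and `longest` are just for this example.

```python
import pandas as pd

# Rows copied from the outputs shown in this notebook
animals = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})

# nlargest returns the n rows with the largest values in the column, largest first
longest = animals.nlargest(3, 'length')
print(longest)
```

Note that `sort_values('length').tail(3)` returns the same three rows but in ascending order.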

What are the counts of the different values of the "animal" column? a.k.a. how many cats and how many dogs.


In [8]:
df['animal'].value_counts()


Out[8]:
dog    3
cat    3
Name: animal, dtype: int64

Only select the dogs.


In [9]:
df[df['animal'] == 'dog']


Out[9]:
  animal        name  length
2    dog  Egglesburg      65
3    dog       Devon      50
5    dog    Fontaine      35

Display all of the animals that are greater than 40 cm.


In [10]:
df[df['length'] > 40]


Out[10]:
  animal        name  length
1    cat         Bob      45
2    dog  Egglesburg      65
3    dog       Devon      50

'length' is the animal's length in cm. Create a new column called inches that is the length in inches.


In [11]:
inches = df['length'] * 0.393701
df['inches'] = inches

Save the cats to a separate variable called "cats." Save the dogs to a separate variable called "dogs."


In [12]:
cats = df[df['animal'] == 'cat']
dogs = df[df['animal'] == 'dog']

Display all of the animals that are cats and above 12 inches long. First do it using the "cats" variable, then do it using your normal dataframe.


In [13]:
cats[cats['inches'] > 12]


Out[13]:
  animal     name  length     inches
0    cat     Anne      35  13.779535
1    cat      Bob      45  17.716545
4    cat  Charlie      32  12.598432

In [14]:
df[(df['animal'] == 'cat') & (df['inches'] > 12)]


Out[14]:
  animal     name  length     inches
0    cat     Anne      35  13.779535
1    cat      Bob      45  17.716545
4    cat  Charlie      32  12.598432
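
When combining two conditions, each comparison needs its own parentheses and they are joined with `&`. A minimal sketch, again on a frame rebuilt from the rows shown above, that names each mask and checks the result against `DataFrame.query` (the names `is_cat`, `over_12_in`, `via_mask` and `via_query` are only for this example):

```python
import pandas as pd

# Rows copied from the outputs shown in this notebook
animals = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})
animals['inches'] = animals['length'] * 0.393701  # cm to inches

# Naming the masks makes it harder to filter on the wrong column by accident
is_cat = animals['animal'] == 'cat'
over_12_in = animals['inches'] > 12

via_mask = animals[is_cat & over_12_in]

# The same filter written as a query string
via_query = animals.query("animal == 'cat' and inches > 12")
print(via_mask.equals(via_query))
```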

What's the mean length of a cat?


In [15]:
cats['inches'].describe()


Out[15]:
count     3.000000
mean     14.698171
std       2.679867
min      12.598432
25%      13.188984
50%      13.779535
75%      15.748040
max      17.716545
Name: inches, dtype: float64

What's the mean length of a dog?


In [16]:
dogs['inches'].describe()


Out[16]:
count     3.000000
mean     19.685050
std       5.905515
min      13.779535
25%      16.732292
50%      19.685050
75%      22.637808
max      25.590565
Name: inches, dtype: float64

Use groupby to accomplish both of the above tasks at once.


In [17]:
df.groupby('animal').describe()


Out[17]:
                 inches     length
animal
cat    count   3.000000   3.000000
       mean   14.698171  37.333333
       std     2.679867   6.806859
       min    12.598432  32.000000
       25%    13.188984  33.500000
       50%    13.779535  35.000000
       75%    15.748040  40.000000
       max    17.716545  45.000000
dog    count   3.000000   3.000000
       mean   19.685050  50.000000
       std     5.905515  15.000000
       min    13.779535  35.000000
       25%    16.732292  42.500000
       50%    19.685050  50.000000
       75%    22.637808  57.500000
       max    25.590565  65.000000
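
If only the mean is wanted, `describe()` is more than is needed. A small sketch, on a frame rebuilt from the rows shown earlier, that asks the groupby for just the mean of one column (`animals` and `means` are names made up for this example):

```python
import pandas as pd

# Rows copied from the outputs shown in this notebook
animals = pd.DataFrame({
    'animal': ['cat', 'cat', 'dog', 'dog', 'cat', 'dog'],
    'name': ['Anne', 'Bob', 'Egglesburg', 'Devon', 'Charlie', 'Fontaine'],
    'length': [35, 45, 65, 50, 32, 35],
})
animals['inches'] = animals['length'] * 0.393701  # cm to inches

# One mean per animal type, returned as a Series indexed by 'animal'
means = animals.groupby('animal')['inches'].mean()
print(means)
```

The values should match the `mean` rows of the `describe()` tables above.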

Make a histogram of the length of dogs. I apologize that it is so boring.


In [18]:
dogs['inches'].hist()


Out[18]:
<matplotlib.axes._subplots.AxesSubplot at 0x110f70630>

Change your graphing style to be something else (anything else!)


In [19]:
import matplotlib.pyplot as plt
plt.style.use('seaborn-deep')  # named 'seaborn-v0_8-deep' in matplotlib 3.6 and later
dogs['inches'].hist()


Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x110ece7b8>

Make a horizontal bar graph of the length of the animals, with their name as the label (look at the billionaires notebook I put on Slack!)


In [20]:
df.plot(kind = 'barh', x = 'name', y = 'inches', legend = False)


Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x111015828>

Make a sorted horizontal bar graph of the cats, with the larger cats on top.


In [21]:
cats.sort_values('inches').plot(kind = 'barh', x = 'name', y = 'inches', legend=False)


Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x111192f98>

02: Doing some research

Answer your own selection out of the following questions, or any other questions you might be able to think of. Write the question down first in a markdown cell (use a # to make the question a nice header), THEN try to get an answer to it. A lot of these are remarkably similar, and some you'll need to do manual work for - the GDP ones, for example.

If you are trying to figure out some other question that we didn't cover in class and it does not have to do with joining to another data set, we're happy to help you figure it out during lab!

Take a peek at the billionaires notebook I uploaded into Slack; it should be helpful for the graphs (I added a few other styles and options, too). You'll probably also want to look at the "sum()" line I added.

What country are most billionaires from? For the top ones, how many billionaires per billion people?


In [22]:
df = pd.read_excel('richpeople.xlsx')
df = df[df['year'] == 2014]

My Second Try, Which Used a Spreadsheet of Populations


In [24]:
cp = pd.read_excel('API_SP_POP_TOTL_DS2_en_excel_v2_toprowsremoved.xls')

In [28]:
pop_df = pd.merge(df, cp[['Country Code', '2014']], how = 'left', left_on = 'countrycode', right_on = 'Country Code')

In [30]:
dict_freq_countries = pop_df['citizenship'].value_counts().head(10).to_dict()
dict_freq_countries


Out[30]:
{'Brazil': 65,
 'China': 152,
 'France': 43,
 'Germany': 85,
 'Hong Kong': 45,
 'India': 56,
 'Italy': 35,
 'Russia': 111,
 'United Kingdom': 47,
 'United States': 499}

In [33]:
for x in dict_freq_countries:
    country_pop = pop_df[pop_df['citizenship'] == x].head(1).to_dict()
    for key in country_pop['2014'].keys():
        print(x, 'has', dict_freq_countries[x] / (country_pop['2014'][key] / 1000000000), 'billionaires per billion people.')
        if country_pop['2014'][key] / 1000000000 < 1:
            print('Of course, this is a nonsense figure for a country with less than a billion people.')
    print('')


Brazil has 315.414707889 billionaires per billion people.
Of course, this is a nonsense figure for a country with less than a billion people.

Russia has 771.800393867 billionaires per billion people.
Of course, this is a nonsense figure for a country with less than a billion people.

China has 111.414895878 billionaires per billion people.

India has 43.2335100948 billionaires per billion people.

United States has 1564.964584 billionaires per billion people.
Of course, this is a nonsense figure for a country with less than a billion people.

Italy has 575.76073621 billionaires per billion people.
Of course, this is a nonsense figure for a country with less than a billion people.

France has 649.375076915 billionaires per billion people.
Of course, this is a nonsense figure for a country with less than a billion people.

United Kingdom has 728.014710854 billionaires per billion people.
Of course, this is a nonsense figure for a country with less than a billion people.

Germany has 1049.76203006 billionaires per billion people.
Of course, this is a nonsense figure for a country with less than a billion people.

Hong Kong has 6214.01052239 billionaires per billion people.
Of course, this is a nonsense figure for a country with less than a billion people.

My First Try, Where I Looked Up the Populations by Hand


In [34]:
df['citizenship'].value_counts().head(10)


Out[34]:
United States     499
China             152
Russia            111
Germany            85
Brazil             65
India              56
United Kingdom     47
Hong Kong          45
France             43
Italy              35
Name: citizenship, dtype: int64

In [35]:
populations = [
    {'country': 'United States', 'pop': 0.3214}, 
    {'country': 'Germany', 'pop': 0.0809}, 
    {'country': 'China' , 'pop': 1.3675}, 
    {'country': 'Russia', 'pop': 0.1424}, 
    {'country': 'Japan', 'pop': 0.1269}, 
    {'country': 'Brazil' , 'pop': 0.2043}, 
    {'country': 'Hong Kong' , 'pop': 0.0071}, 
    {'country': 'France', 'pop': 0.0666}, 
    {'country': 'United Kingdom', 'pop': 0.0641}, 
    {'country': 'India', 'pop': 1.2517}, ]

for item in list(range(9)):
    print(populations[item]['country'], 'has', df['citizenship'].value_counts()[item] / populations[item]['pop'], 'billionaires per billion people.')
    if populations[item]['pop'] < 1:
        print('Of course, this is a nonsense figure for a country with less than a billion people.')
    print('')
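# pop are in billions and based off of the CIA Factbook 2015 estimate
# caution: value_counts()[item] is looked up by position, and the populations list
# is not in value_counts() order, so every line after the first pairs one country's
# population with another country's count; range(9) also stops before India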


United States has 1552.58245177 billionaires per billion people.
Of course, this is a nonsense figure for a country with less than a billion people.

Germany has 1878.86279357 billionaires per billion people.
Of course, this is a nonsense figure for a country with less than a billion people.

China has 81.1700182815 billionaires per billion people.

Russia has 596.91011236 billionaires per billion people.
Of course, this is a nonsense figure for a country with less than a billion people.

Japan has 512.214342002 billionaires per billion people.
Of course, this is a nonsense figure for a country with less than a billion people.

Brazil has 274.106705825 billionaires per billion people.
Of course, this is a nonsense figure for a country with less than a billion people.

Hong Kong has 6619.71830986 billionaires per billion people.
Of course, this is a nonsense figure for a country with less than a billion people.

France has 675.675675676 billionaires per billion people.
Of course, this is a nonsense figure for a country with less than a billion people.

United Kingdom has 670.826833073 billionaires per billion people.
Of course, this is a nonsense figure for a country with less than a billion people.
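
The figures above do not line up with the right countries: the loop pairs `populations[item]` with `value_counts()[item]` by position, and the two are in different orders. The Germany line, for example, is China's 152 divided by Germany's population (152 / 0.0809 is about 1878.86). A sketch of one way around this, assuming the counts and hand-entered populations shown above, is to put both in Series and let pandas align them by country name (`counts`, `populations` and `per_billion` are names for this example only):

```python
import pandas as pd

# Counts copied from the value_counts() output above
counts = pd.Series({
    'United States': 499, 'China': 152, 'Russia': 111, 'Germany': 85,
    'Brazil': 65, 'India': 56, 'United Kingdom': 47, 'Hong Kong': 45,
    'France': 43, 'Italy': 35,
})

# Hand-entered populations in billions, copied from the list above
populations = pd.Series({
    'United States': 0.3214, 'Germany': 0.0809, 'China': 1.3675,
    'Russia': 0.1424, 'Japan': 0.1269, 'Brazil': 0.2043,
    'Hong Kong': 0.0071, 'France': 0.0666, 'United Kingdom': 0.0641,
    'India': 1.2517,
})

# Division aligns on the index labels; a country missing from either
# side (Italy has no population here, Japan has no count) becomes NaN
per_billion = (counts / populations).dropna().sort_values(ascending=False)
print(per_billion.round(1))
```

Because the match is by label, the order of either list no longer matters, and countries without a partner drop out instead of silently borrowing a neighbour's number.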

Who are the top 10 richest billionaires?


In [36]:
df[['name', 'rank', 'networthusbillion']].sort_values('networthusbillion', ascending = False).head(10)


Out[36]:
                  name  rank  networthusbillion
284         Bill Gates     1               76.0
348   Carlos Slim Helu     2               72.0
124     Amancio Ortega     3               64.0
2491    Warren Buffett     4               58.2
1377     Larry Ellison     5               48.0
509         David Koch     6               40.0
381       Charles Koch     6               40.0
2185   Sheldon Adelson     8               38.0
429     Christy Walton     9               36.7
1128        Jim Walton    10               34.7

What's the average wealth of a billionaire? Male? Female?


In [37]:
df[['gender', 'networthusbillion']].groupby('gender').describe()


Out[37]:
              networthusbillion
gender
female count         180.000000
       mean            3.920556
       std             5.312604
       min             1.000000
       25%             1.400000
       50%             2.300000
       75%             3.700000
       max            36.700000
male   count        1473.000000
       mean            3.902716
       std             5.801227
       min             1.000000
       25%             1.400000
       50%             2.100000
       75%             3.700000
       max            76.000000

Who is the poorest billionaire? Who are the top 10 poorest billionaires?

The 'Top 10' Poorest


In [38]:
df[['name', 'rank', 'networthusbillion']].sort_values('networthusbillion').head(10)


Out[38]:
                   name  rank  networthusbillion
234         B.R. Shetty  1565                1.0
2092       Rostam Azizi  1565                1.0
2401         Tory Burch  1565                1.0
734          Fred Chang  1565                1.0
171      Angela Bennett  1565                1.0
748             Fu Kwan  1565                1.0
2107     Ryan Kavanaugh  1565                1.0
1783  O. Francis Biondi  1565                1.0
1371       Lam Fong Ngo  1565                1.0
702       Feng Hailiang  1565                1.0

But There Are Many More People Who Make Just As Little Money


In [39]:
poorest_billionaires = df[df['networthusbillion'] == df['networthusbillion'].min()]
print('But there are', poorest_billionaires['name'].count(), 'billionaires making just as little money:')
print('')
print(poorest_billionaires[['name', 'rank', 'networthusbillion']])


But there are 81 billionaires making just as little money:

                          name  rank  networthusbillion
56             Alberto Alcocer  1565                1.0
81               Alexander Vik  1565                1.0
129                    An Kang  1565                1.0
145   Andrea Reimann-Ciardelli  1565                1.0
164            Andrew Gotianun  1565                1.0
171             Angela Bennett  1565                1.0
178              Anne Beaufour  1565                1.0
234                B.R. Shetty  1565                1.0
261                Bent Jensen  1565                1.0
296                Boris Mints  1565                1.0
302              Brian Higgins  1565                1.0
320              C. James Koch  1565                1.0
343             Carlos Martins  1565                1.0
358           Chang Pyung-Soon  1565                1.0
424          Christopher Burch  1565                1.0
482              Dariusz Milek  1565                1.0
559                Ding Shijia  1565                1.0
560              Ding Shizhong  1565                1.0
569             Dmitry Korzhev  1565                1.0
573            Dmitry Troitsky  1565                1.0
638             Elena Baturina  1565                1.0
660           Enrique Banuelos  1565                1.0
702              Feng Hailiang  1565                1.0
734                 Fred Chang  1565                1.0
748                    Fu Kwan  1565                1.0
843             Graham Kirkham  1565                1.0
886          Harindarpal Banga  1565                1.0
914             Henri Beaufour  1565                1.0
952       Horst Julius Pudwill  1565                1.0
976                Huo Qinghua  1565                1.0
...                        ...   ...                ...
1731               Murat Vargi  1565                1.0
1755        Nerijus Numavicius  1565                1.0
1783         O. Francis Biondi  1565                1.0
1834               Pavel Tykac  1565                1.0
1868            Philip Falcone  1565                1.0
1971             Richard Chang  1565                1.0
2024             Robert Ingham  1565                1.0
2092              Rostam Azizi  1565                1.0
2107            Ryan Kavanaugh  1565                1.0
2147              Sara Blakely  1565                1.0
2158              Seo Jung-Jin  1565                1.0
2165             Sergei Petrov  1565                1.0
2168          Sergei Tsikalyuk  1565                1.0
2174            Serhiy Tihipko  1565                1.0
2203              Shoji Uehara  1565                1.0
2247    Stefan von Holtzbrinck  1565                1.0
2316         T.S. Kalyanaraman  1565                1.0
2359             Thomas Bailey  1565                1.0
2365             Thomas Kaplan  1565                1.0
2401                Tory Burch  1565                1.0
2443       Vivek Chaand Sehgal  1565                1.0
2472             Wang Jianfeng  1565                1.0
2479               Wang Muqing  1565                1.0
2484                 Wang Yong  1565                1.0
2530     William Moncrief, Jr.  1565                1.0
2547               Wu Chung-Yi  1565                1.0
2549                  Wu Xiong  1565                1.0
2561                 Yang Keng  1565                1.0
2591             Zdenek Bakala  1565                1.0
2607               Zhu Wenchen  1565                1.0

[81 rows x 3 columns]
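
Since `head(10)` cuts a tie arbitrarily, it can help to ask for every row tied at the bottom. A minimal sketch on a made-up five-row table (the names and net worths are hypothetical, not from the spreadsheet) showing two ways to get all of the tied rows:

```python
import pandas as pd

# Hypothetical miniature of the billionaires table
people = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'networthusbillion': [1.0, 3.5, 1.0, 2.2, 1.0],
})

# Everyone whose net worth equals the minimum
poorest = people[people['networthusbillion'] == people['networthusbillion'].min()]

# nsmallest with keep='all' also keeps every row tied with the last one selected
tied = people.nsmallest(1, 'networthusbillion', keep='all')

print(len(poorest), len(tied))
```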

What is 'relationship to company'? And what are the most common relationships?

According to the PDF, relationship to company "describes the billionaire's relationship to the company primarily responsible for their wealth, such as founder, executive, relation, or shareholder."


In [40]:
df['relationshiptocompany'].value_counts()


Out[40]:
founder                                      818
relation                                     515
owner                                         79
chairman                                      64
investor                                      30
Chairman and Chief Executive Officer          15
Chairman                                       8
ceo                                            8
president                                      8
CEO                                            8
founder and chairman                           6
former CEO                                     5
partner                                        4
founder and CEO                                4
Relation                                       3
founder/chairman                               3
Vice Chairman                                  3
relation and chairman                          3
executive chairman                             3
founder/CEO                                    2
relative                                       2
Chief Executive                                2
leadership                                     2
former chairman and CEO                        2
founder, chairman, ceo                         2
founder/vice chairman                          2
president and ceo                              2
employee                                       2
general director                               2
founder, chairman                              2
                                            ... 
founder and ceo                                1
relation/vice chairman                         1
relation and ceo                               1
deputy chairman                                1
COO                                            1
Head of Board of Directors                     1
Exectuitve Director                            1
founder and executive vice chairman            1
founder and executive chairman                 1
founder and chairwoman                         1
Chairman, CEO                                  1
director                                       1
chairman of management committee               1
founder, chairwoman, ceo                       1
Honorary President for Life                    1
founder/relation                               1
head of high-yield bond trading dept           1
vice-chairman                                  1
chairman of the board                          1
Global Head of Real Estate                     1
chairman and ceo                               1
co-director of zinc, copper and lead           1
Chairman/shareholder                           1
Vice President of Infrastructure Software      1
founder/president                              1
owner and former CEO                           1
co-chairman                                    1
founder CEO owner                              1
supervisory board or directors                 1
inventor                                       1
Name: relationshiptocompany, dtype: int64
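
The counts above treat 'CEO' and 'ceo', 'Chairman' and 'chairman', and 'Relation' and 'relation' as different values. A sketch of a light clean-up before counting, using a handful of labels taken from the output above (the Series itself is made up for illustration, and `strip()` is only a defensive extra in case of stray spaces):

```python
import pandas as pd

# A few labels as they appear in the output above; this short list is illustrative
labels = pd.Series([
    'founder', 'CEO', 'ceo', 'Chairman', 'chairman',
    'Relation', 'relation', 'founder',
])

# Lower-case and trim so that case variants fall into the same bucket
normalised = labels.str.strip().str.lower()
counts = normalised.value_counts()
print(counts)
```

This does not merge variants such as 'founder/CEO' and 'founder and CEO'; those would need a hand-written mapping.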

Most common source of wealth? Male vs. female?


In [41]:
print('Most common source of wealth:')
df['sourceofwealth'].value_counts().head(1)


Most common source of wealth:
Out[41]:
real estate    107
Name: sourceofwealth, dtype: int64

In [42]:
print('The most common source of wealth for females and males:')


The most common source of wealth for females and males:

In [43]:
df[['gender', 'sourceofwealth']].groupby('gender').describe()


Out[43]:
               sourceofwealth
gender
female count              172
       unique             100
       top        diversified
       freq                 9
male   count             1464
       unique             578
       top        real estate
       freq               100
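
To pull out just the `top` value per gender rather than reading it off the `describe()` table, one option is to count within each group and take the most frequent label. A sketch on made-up rows (the frame below is hypothetical; only the column names match the spreadsheet):

```python
import pandas as pd

# Hypothetical rows with the same column names as the billionaires data
people = pd.DataFrame({
    'gender': ['female', 'female', 'female', 'male', 'male', 'male', 'male'],
    'sourceofwealth': ['diversified', 'diversified', 'retail',
                       'real estate', 'real estate', 'oil', 'real estate'],
})

# For each gender, count the sources and keep the label with the highest count
top_source = people.groupby('gender')['sourceofwealth'].agg(
    lambda s: s.value_counts().idxmax()
)
print(top_source)
```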

Given the richest person in a country, what % of the GDP is their wealth?


In [44]:
gdp = pd.read_excel('API_NY_GDP_MKTP_CD_DS2_en_excel_v2_rowsremoved.xls')

In [46]:
gdp.columns


Out[46]:
Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
       '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968',
       '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977',
       '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986',
       '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995',
       '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004',
       '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013',
       '2014', '2015'],
      dtype='object')

In [47]:
gdp_df = pd.merge(df, gdp[['Country Code', '2014']], how = 'left', left_on = 'countrycode', right_on = 'Country Code')

In [48]:
gdp_df.head(1)


Out[48]:
year name rank citizenship countrycode networthusbillion selfmade typeofwealth gender age ... gdpcurrentus sourceofwealth notes notes2 source source_2 source_3 source_4 Country Code 2014
0 2014 A. Jerrold Perenchio 663 United States USA 2.6 self-made executive male 83.0 ... NaN television, Univision represented Marlon Brando and Elizabeth Taylor NaN http://en.wikipedia.org/wiki/Jerry_Perenchio http://www.forbes.com/profile/a-jerrold-perenc... COLUMN ONE; A Hollywood Player Who Owns the Ga... NaN USA 1.741900e+13

1 rows × 32 columns


In [49]:
gdp_df[['name', 'citizenship', 'networthusbillion', '2014']].groupby('citizenship').max() # column-wise max per country: 'name' is the alphabetically last name, not necessarily the person with the largest net worth


Out[49]:
name networthusbillion 2014
citizenship
Algeria Issad Rebrab 3.2 2.135185e+11
Angola Isabel dos Santos 3.7 NaN
Argentina Maria Ines de Lafuente Lacroze 5.5 5.376600e+11
Australia Vivek Chaand Sehgal 17.7 1.454675e+12
Austria Wolfgang Leitner 9.2 4.368875e+11
Belgium Patokh Chodiev 4.9 5.315466e+11
Brazil Werner Voigt 19.7 2.416636e+12
Canada Stephen Jarislowsky 22.6 1.785387e+12
Chile Sebastian Pinera 15.5 2.580615e+11
China Zong Qinghou 15.1 1.035483e+13
Colombia Luis Carlos Sarmiento 14.2 3.777396e+11
Cyprus Suat Gunsel 13.6 2.322616e+10
Czech Republic Zdenek Bakala 11.0 2.052697e+11
Denmark Niels Peter Louis-Hansen 10.2 NaN
Egypt Youssef Mansour 6.7 3.014990e+11
Finland Niklas Herlin 3.3 2.722166e+11
France Xavier Niel 34.5 2.829192e+12
Georgia Bidzina Ivanishvili 5.2 1.652996e+10
Germany Yvonne Bauer 25.0 3.868291e+12
Greece Spiro Latsis 3.2 2.355741e+11
Guernsey Stephen Lansdown 2.4 NaN
Hong Kong Zhang Zhirong 31.0 2.908958e+11
India Yusuf Hamied 18.6 2.048517e+12
Indonesia Theodore Rachmat 7.6 8.885382e+11
Ireland Pallonji Mistry 12.8 2.508136e+11
Israel Zadik Bino 7.0 3.056748e+11
Italy Stefano Pessina 26.5 2.141161e+12
Japan Yusaku Maezawa 18.4 4.601461e+12
Kazakhstan Vladimir Kim 2.2 2.178723e+11
Kuwait Mohannad Al-Kharafi 1.3 1.636124e+11
... ... ... ...
New Zealand Richard Chandler 7.0 1.999699e+11
Nigeria Mike Adenuga 25.0 5.685083e+11
Norway Stein Erik Hagen 5.0 4.998171e+11
Oman PNC Menon 1.2 8.179662e+10
Peru Vito Rodriguez Rodriguez 2.5 2.025963e+11
Philippines Tony Tan Caktiong 11.4 2.847771e+11
Poland Zygmunt Solorz-Zak 3.6 5.449666e+11
Portugal Belmiro de Azevedo 5.3 2.301169e+11
Romania Ioan Niculae 1.2 1.990437e+11
Russia Ziyaudin Magomedov 18.6 1.860598e+12
Saudi Arabia Sulaiman Al Rajhi 20.4 7.538317e+11
Singapore Zhong Sheng Jian 11.0 3.078598e+11
South Africa Stephen Saad 7.6 3.501408e+11
South Korea Suh Kyung-Bae 11.1 1.410383e+12
Spain Sandra Ortega Mera 64.0 1.381342e+12
St. Kitts and Nevis Jacky Xu 1.2 8.522031e+08
Swaziland Nathan Kirsh 3.7 4.412892e+09
Sweden Torbjorn Tornqvist 34.4 5.710905e+11
Switzerland Walter Frey 12.0 7.010371e+11
Taiwan Wu Chung-Yi 9.5 NaN
Tanzania Rostam Azizi 1.0 4.805668e+10
Thailand Vichai Srivaddhanaprabha 11.4 4.048240e+11
Turkey Suna Kirac 3.7 7.984292e+11
Uganda Sudhir Ruparelia 1.1 2.699848e+10
Ukraine Yuriy Kosiuk 12.5 1.318051e+11
United Arab Emirates Saif Al Ghurair 4.8 3.994513e+11
United Kingdom Xiu Li Hawken 13.0 2.988893e+12
United States Winnie Johnson-Marquart 76.0 1.741900e+13
Venezuela Lorenzo Mendoza 4.0 NaN
Vietnam Pham Nhat Vuong 1.6 1.862047e+11

69 rows × 3 columns
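
Note that `max()` works on each column separately, so the name shown is not tied to the net worth next to it: the United States row lists Winnie Johnson-Marquart beside 76.0, while the top-10 table earlier gives Bill Gates as the person with 76.0. A sketch of one way to keep the whole row of the richest person per country, using `idxmax`, on a made-up four-row frame (names, countries and numbers are hypothetical; only the column names follow `gdp_df`):

```python
import pandas as pd

# Hypothetical miniature of gdp_df; '2014' stands for GDP in US dollars
people = pd.DataFrame({
    'name': ['Ann Big', 'Zed Small', 'Yan Large', 'Bo Modest'],
    'citizenship': ['Country A', 'Country A', 'Country B', 'Country B'],
    'networthusbillion': [50.0, 2.0, 30.0, 1.5],
    '2014': [1.0e12, 1.0e12, 2.0e11, 2.0e11],
})

# Column-wise max: the name is the alphabetically last one in each country
columnwise = people.groupby('citizenship').max()

# idxmax gives the row label of the largest net worth in each country,
# and .loc pulls those complete rows so name and net worth stay together
richest_rows = people.loc[people.groupby('citizenship')['networthusbillion'].idxmax()]
richest = richest_rows.set_index('citizenship')

# Net worth is in billions, GDP in dollars, so convert GDP to billions first
richest['percent_of_gdp'] = richest['networthusbillion'] / (richest['2014'] / 1e9) * 100
print(richest[['name', 'networthusbillion', 'percent_of_gdp']])
```

The percentage column replaces the printing loop with a single vectorised expression; countries with a missing GDP simply come out as NaN.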


In [50]:
gdp_dict = gdp_df[['name', 'citizenship', 'networthusbillion', '2014']].groupby('citizenship').max().to_dict()

In [51]:
for country in gdp_dict['2014']:
    print(country)
    gdp_bill = gdp_dict['2014'][country] / 1000000000
    print('gdp in billions:', gdp_bill)
    print('richest billionaire:', gdp_dict['name'][country])
    print('how many billions:', gdp_dict['networthusbillion'][country])
    print('percent of gdp:', gdp_dict['networthusbillion'][country] / gdp_bill * 100)
    print('')


Swaziland
gdp in billions: 4.41289183003
richest billionaire: Nathan Kirsh
how many billions: 3.7
percent of gdp: 83.8452457598

Switzerland
gdp in billions: 701.037135966
richest billionaire: Walter Frey
how many billions: 12.0
percent of gdp: 1.71174954711

Nepal
gdp in billions: 19.7696421226
richest billionaire: Binod Chaudhary
how many billions: 1.1
percent of gdp: 5.56408655847

Norway
gdp in billions: 499.817138323
richest billionaire: Stein Erik Hagen
how many billions: 5.0
percent of gdp: 1.00036585716

Portugal
gdp in billions: 230.116912514
richest billionaire: Belmiro de Azevedo
how many billions: 5.3
percent of gdp: 2.30317708599

Tanzania
gdp in billions: 48.0566809822
richest billionaire: Rostam Azizi
how many billions: 1.0
percent of gdp: 2.08087612287

Japan
gdp in billions: 4601.46120689
richest billionaire: Yusaku Maezawa
how many billions: 18.4
percent of gdp: 0.39987297888

United Kingdom
gdp in billions: 2988.89328357
richest billionaire: Xiu Li Hawken
how many billions: 13.0
percent of gdp: 0.434943598404

Nigeria
gdp in billions: 568.508262378
richest billionaire: Mike Adenuga
how many billions: 25.0
percent of gdp: 4.39747346774

China
gdp in billions: 10354.8317293
richest billionaire: Zong Qinghou
how many billions: 15.1
percent of gdp: 0.145825643474

Vietnam
gdp in billions: 186.204652922
richest billionaire: Pham Nhat Vuong
how many billions: 1.6
percent of gdp: 0.859269612703

South Africa
gdp in billions: 350.140810003
richest billionaire: Stephen Saad
how many billions: 7.6
percent of gdp: 2.17055532599

Finland
gdp in billions: 272.216575502
richest billionaire: Niklas Herlin
how many billions: 3.3
percent of gdp: 1.21227004414

Israel
gdp in billions: 305.674837195
richest billionaire: Zadik Bino
how many billions: 7.0
percent of gdp: 2.29001512334

St. Kitts and Nevis
gdp in billions: 0.852203083881
richest billionaire: Jacky Xu
how many billions: 1.2
percent of gdp: 140.8115064

Ireland
gdp in billions: 250.813607686
richest billionaire: Pallonji Mistry
how many billions: 12.8
percent of gdp: 5.10339136624

Denmark
gdp in billions: nan
richest billionaire: Niels Peter Louis-Hansen
how many billions: 10.2
percent of gdp: nan

South Korea
gdp in billions: 1410.38298862
richest billionaire: Suh Kyung-Bae
how many billions: 11.1
percent of gdp: 0.787020269642

Turkey
gdp in billions: 798.429233036
richest billionaire: Suna Kirac
how many billions: 3.7
percent of gdp: 0.463409886175

Monaco
gdp in billions: nan
richest billionaire: Lily Safra
how many billions: 1.8
percent of gdp: nan

Lithuania
gdp in billions: 48.3539371103
richest billionaire: Nerijus Numavicius
how many billions: 1.0
percent of gdp: 2.068083924

Canada
gdp in billions: 1785.3866496
richest billionaire: Stephen Jarislowsky
how many billions: 22.6
percent of gdp: 1.26583225012

Argentina
gdp in billions: 537.659972702
richest billionaire: Maria Ines de Lafuente Lacroze
how many billions: 5.5
percent of gdp: 1.02295135945

Poland
gdp in billions: 544.966555714
richest billionaire: Zygmunt Solorz-Zak
how many billions: 3.6
percent of gdp: 0.66059099632

Algeria
gdp in billions: 213.518488688
richest billionaire: Issad Rebrab
how many billions: 3.2
percent of gdp: 1.49869925535

Netherlands
gdp in billions: 879.319321495
richest billionaire: Ralph Sonnenberg
how many billions: 10.4
percent of gdp: 1.1827330238

Egypt
gdp in billions: 301.498960052
richest billionaire: Youssef Mansour
how many billions: 6.7
percent of gdp: 2.22222988724

Sweden
gdp in billions: 571.090480171
richest billionaire: Torbjorn Tornqvist
how many billions: 34.4
percent of gdp: 6.02356390001

Greece
gdp in billions: 235.574074998
richest billionaire: Spiro Latsis
how many billions: 3.2
percent of gdp: 1.35838376953

Austria
gdp in billions: 436.887543467
richest billionaire: Wolfgang Leitner
how many billions: 9.2
percent of gdp: 2.10580506072

Colombia
gdp in billions: 377.739622866
richest billionaire: Luis Carlos Sarmiento
how many billions: 14.2
percent of gdp: 3.75920320253

Lebanon
gdp in billions: 45.7309452736
richest billionaire: Taha Mikati
how many billions: 3.1
percent of gdp: 6.77877962384

Oman
gdp in billions: 81.7966189857
richest billionaire: PNC Menon
how many billions: 1.2
percent of gdp: 1.4670533023

France
gdp in billions: 2829.19203917
richest billionaire: Xavier Niel
how many billions: 34.5
percent of gdp: 1.21942941739

Philippines
gdp in billions: 284.777093019
richest billionaire: Tony Tan Caktiong
how many billions: 11.4
percent of gdp: 4.00313096785

Chile
gdp in billions: 258.061522887
richest billionaire: Sebastian Pinera
how many billions: 15.5
percent of gdp: 6.00631966619

Russia
gdp in billions: 1860.59792276
richest billionaire: Ziyaudin Magomedov
how many billions: 18.6
percent of gdp: 0.999678639454

Venezuela
gdp in billions: nan
richest billionaire: Lorenzo Mendoza
how many billions: 4.0
percent of gdp: nan

United States
gdp in billions: 17419.0
richest billionaire: Winnie Johnson-Marquart
how many billions: 76.0
percent of gdp: 0.436305183994

Angola
gdp in billions: nan
richest billionaire: Isabel dos Santos
how many billions: 3.7
percent of gdp: nan

Hong Kong
gdp in billions: 290.895784166
richest billionaire: Zhang Zhirong
how many billions: 31.0
percent of gdp: 10.6567374597

Czech Republic
gdp in billions: 205.269709743
richest billionaire: Zdenek Bakala
how many billions: 11.0
percent of gdp: 5.35880330992

Thailand
gdp in billions: 404.823952118
richest billionaire: Vichai Srivaddhanaprabha
how many billions: 11.4
percent of gdp: 2.81603890787

Uganda
gdp in billions: 26.9984772888
richest billionaire: Sudhir Ruparelia
how many billions: 1.1
percent of gdp: 4.0743038514

United Arab Emirates
gdp in billions: 399.451327434
richest billionaire: Saif Al Ghurair
how many billions: 4.8
percent of gdp: 1.20164827861

India
gdp in billions: 2048.51743887
richest billionaire: Yusuf Hamied
how many billions: 18.6
percent of gdp: 0.907973720264

Singapore
gdp in billions: 307.859758504
richest billionaire: Zhong Sheng Jian
how many billions: 11.0
percent of gdp: 3.57305548912

Cyprus
gdp in billions: 23.2261589862
richest billionaire: Suat Gunsel
how many billions: 13.6
percent of gdp: 58.5546667794

Saudi Arabia
gdp in billions: 753.831733333
richest billionaire: Sulaiman Al Rajhi
how many billions: 20.4
percent of gdp: 2.70617421607

Spain
gdp in billions: 1381.34210174
richest billionaire: Sandra Ortega Mera
how many billions: 64.0
percent of gdp: 4.63317522282

Kazakhstan
gdp in billions: 217.872250221
richest billionaire: Vladimir Kim
how many billions: 2.2
percent of gdp: 1.00976604307

Australia
gdp in billions: 1454.67547967
richest billionaire: Vivek Chaand Sehgal
how many billions: 17.7
percent of gdp: 1.2167662305

Malaysia
gdp in billions: 338.103822298
richest billionaire: Yeoh Tiong Lay
how many billions: 11.5
percent of gdp: 3.40132209149

Brazil
gdp in billions: 2416.63550608
richest billionaire: Werner Voigt
how many billions: 19.7
percent of gdp: 0.815182924792

Indonesia
gdp in billions: 888.538201025
richest billionaire: Theodore Rachmat
how many billions: 7.6
percent of gdp: 0.855337451021

New Zealand
gdp in billions: 199.969858904
richest billionaire: Richard Chandler
how many billions: 7.0
percent of gdp: 3.50052754869

Mexico
gdp in billions: 1294.68973323
richest billionaire: Rufino Vigil Gonzalez
how many billions: 72.0
percent of gdp: 5.56117795267

Guernsey
gdp in billions: nan
richest billionaire: Stephen Lansdown
how many billions: 2.4
percent of gdp: nan

Romania
gdp in billions: 199.043652215
richest billionaire: Ioan Niculae
how many billions: 1.2
percent of gdp: 0.602882828286

Georgia
gdp in billions: 16.5299631874
richest billionaire: Bidzina Ivanishvili
how many billions: 5.2
percent of gdp: 31.4580252905

Germany
gdp in billions: 3868.29123182
richest billionaire: Yvonne Bauer
how many billions: 25.0
percent of gdp: 0.646280191996

Taiwan
gdp in billions: nan
richest billionaire: Wu Chung-Yi
how many billions: 9.5
percent of gdp: nan

Ukraine
gdp in billions: 131.805126738
richest billionaire: Yuriy Kosiuk
how many billions: 12.5
percent of gdp: 9.48369787225

Peru
gdp in billions: 202.596307719
richest billionaire: Vito Rodriguez Rodriguez
how many billions: 2.5
percent of gdp: 1.23398102766

Belgium
gdp in billions: 531.546586179
richest billionaire: Patokh Chodiev
how many billions: 4.9
percent of gdp: 0.921838297416

Macau
gdp in billions: 55.5017340461
richest billionaire: Lam Fong Ngo
how many billions: 1.8
percent of gdp: 3.24314191427

Morocco
gdp in billions: 110.009040838
richest billionaire: Othman Benjelloun
how many billions: 2.8
percent of gdp: 2.54524535316

Kuwait
gdp in billions: 163.61243851
richest billionaire: Mohannad Al-Kharafi
how many billions: 1.3
percent of gdp: 0.794560616441

Italy
gdp in billions: 2141.16132537
richest billionaire: Stefano Pessina
how many billions: 26.5
percent of gdp: 1.2376461169

Add up the wealth of all of the billionaires in a given country (or a few countries) and then compare it to the GDP of the country, or to other billionaires. For example, pit the US against India.


In [52]:
gdp_df[['citizenship', 'networthusbillion', '2014']].groupby('citizenship').sum() #gives the sum for each country (this also adds up the '2014' GDP value repeated on each billionaire's row)


Out[52]:
networthusbillion 2014
citizenship
Algeria 3.2 2.135185e+11
Angola 3.7 NaN
Argentina 11.3 2.688300e+12
Australia 85.4 4.218559e+13
Austria 33.8 4.368875e+12
Belgium 8.0 1.594640e+12
Brazil 192.2 1.570813e+14
Canada 112.8 5.713237e+13
Chile 41.3 3.096738e+12
China 375.8 1.573934e+15
Colombia 30.6 1.510958e+12
Cyprus 19.7 9.290464e+10
Czech Republic 18.4 1.231618e+12
Denmark 26.9 NaN
Egypt 22.3 2.382709e+12
Finland 6.6 1.088866e+12
France 235.3 1.216553e+14
Georgia 5.2 1.652996e+10
Germany 401.4 3.288048e+14
Greece 8.2 7.067222e+11
Guernsey 2.4 NaN
Hong Kong 213.7 1.309031e+13
India 191.9 1.147170e+14
Indonesia 47.8 1.688223e+13
Ireland 25.5 1.254068e+12
Israel 51.8 5.502147e+12
Italy 158.1 7.494065e+13
Japan 101.0 1.242395e+14
Kazakhstan 9.2 1.089361e+12
Kuwait 6.5 8.180622e+11
... ... ...
New Zealand 9.8 3.999397e+11
Nigeria 33.3 2.274033e+12
Norway 21.8 4.498354e+12
Oman 2.3 1.635932e+11
Peru 11.9 1.620770e+12
Philippines 40.1 2.847771e+12
Poland 12.8 2.724833e+12
Portugal 10.6 6.903507e+11
Romania 1.2 1.990437e+11
Russia 422.5 2.065264e+14
Saudi Arabia 49.0 5.276822e+12
Singapore 45.1 4.925756e+12
South Africa 25.4 2.801126e+12
South Korea 60.7 3.808034e+13
Spain 122.6 3.591489e+13
St. Kitts and Nevis 1.2 8.522031e+08
Swaziland 3.7 4.412892e+09
Sweden 116.7 1.085072e+13
Switzerland 80.2 1.542282e+13
Taiwan 75.8 NaN
Tanzania 1.0 4.805668e+10
Thailand 36.8 4.453063e+12
Turkey 43.2 1.916230e+13
Uganda 1.1 2.699848e+10
Ukraine 26.6 1.186246e+12
United Arab Emirates 14.6 1.597805e+12
United Kingdom 152.0 1.404780e+14
United States 2322.4 8.692081e+15
Venezuela 9.0 NaN
Vietnam 1.6 1.862047e+11

69 rows × 2 columns
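One thing to watch in the table above: the '2014' GDP value sits on every billionaire's row, so summing it multiplies a country's GDP by its number of billionaires (Kuwait's 8.18e+11 is five times the 163.6 billion printed for it earlier). A minimal sketch of one way around that, summing net worth but taking the GDP only once per country. The tiny `gdp_df` here is made-up stand-in data with the same column names, not the real merged dataframe, and `by_country` is a name chosen for illustration:

```python
import pandas as pd

# Made-up stand-in for the merged dataframe: one row per billionaire,
# with the country's 2014 GDP (in US dollars) repeated on each row.
gdp_df = pd.DataFrame({
    'citizenship': ['Country A', 'Country A', 'Country B'],
    'networthusbillion': [2.0, 3.0, 1.5],
    '2014': [100e9, 100e9, 50e9],
})

# Sum the net worth, but keep just the first GDP value per country.
by_country = gdp_df.groupby('citizenship').agg(
    networthusbillion=('networthusbillion', 'sum'),
    gdp=('2014', 'first'),
)

# GDP in billions, and billionaire wealth as a percent of GDP.
by_country['gdp_billions'] = by_country['gdp'] / 1e9
by_country['percent_of_gdp'] = (
    by_country['networthusbillion'] / by_country['gdp_billions'] * 100
)
print(by_country)
```

Doing the percent calculation as a column also avoids the round trip through `to_dict()` and a loop.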


In [53]:
bill_df = gdp_df[['citizenship', 'networthusbillion', '2014']].groupby('citizenship').sum().to_dict()

In [54]:
for country in bill_df['2014']:
    print(country)
    gdp_bill = bill_df['2014'][country] / 1000000000
    print('gdp in billions:', gdp_bill)
    print('how many billions the billionaires there make:', bill_df['networthusbillion'][country])
    print('percent of gdp:', bill_df['networthusbillion'][country] / gdp_bill * 100)
    print('')


Swaziland
gdp in billions: 4.41289183003
how many billions the billionaires there make: 3.7
percent of gdp: 83.8452457598

Switzerland
gdp in billions: 15422.8169913
how many billions the billionaires there make: 80.2
percent of gdp: 0.520008763934

Nepal
gdp in billions: 19.7696421226
how many billions the billionaires there make: 1.1
percent of gdp: 5.56408655847

Norway
gdp in billions: 4498.35424491
how many billions the billionaires there make: 21.8
percent of gdp: 0.484621681911

Portugal
gdp in billions: 690.350737541
how many billions the billionaires there make: 10.6
percent of gdp: 1.53545139066

Tanzania
gdp in billions: 48.0566809822
how many billions the billionaires there make: 1.0
percent of gdp: 2.08087612287

Japan
gdp in billions: 124239.452586
how many billions the billionaires there make: 101.0
percent of gdp: 0.0812946273489

United Kingdom
gdp in billions: 140477.984328
how many billions the billionaires there make: 152.0
percent of gdp: 0.108202008114

Nigeria
gdp in billions: 2274.03304951
how many billions the billionaires there make: 33.3
percent of gdp: 1.46435866476

China
gdp in billions: 1573934.42286
how many billions the billionaires there make: 375.8
percent of gdp: 0.023876471252

Vietnam
gdp in billions: 186.204652922
how many billions the billionaires there make: 1.6
percent of gdp: 0.859269612703

South Africa
gdp in billions: 2801.12648003
how many billions the billionaires there make: 25.4
percent of gdp: 0.906778047372

Finland
gdp in billions: 1088.86630201
how many billions the billionaires there make: 6.6
percent of gdp: 0.60613502207

Israel
gdp in billions: 5502.14706951
how many billions the billionaires there make: 51.8
percent of gdp: 0.941450661816

St. Kitts and Nevis
gdp in billions: 0.852203083881
how many billions the billionaires there make: 1.2
percent of gdp: 140.8115064

Ireland
gdp in billions: 1254.06803843
how many billions the billionaires there make: 25.5
percent of gdp: 2.03338249748

Denmark
gdp in billions: nan
how many billions the billionaires there make: 26.9
percent of gdp: nan

South Korea
gdp in billions: 38080.3406926
how many billions the billionaires there make: 60.7
percent of gdp: 0.159399834392

Turkey
gdp in billions: 19162.3015929
how many billions the billionaires there make: 43.2
percent of gdp: 0.225442647328

Monaco
gdp in billions: nan
how many billions the billionaires there make: 4.6
percent of gdp: nan

Lithuania
gdp in billions: 48.3539371103
how many billions the billionaires there make: 1.0
percent of gdp: 2.068083924

Canada
gdp in billions: 57132.3727873
how many billions the billionaires there make: 112.8
percent of gdp: 0.197436224853

Argentina
gdp in billions: 2688.29986351
how many billions the billionaires there make: 11.3
percent of gdp: 0.420340013158

Poland
gdp in billions: 2724.83277857
how many billions the billionaires there make: 12.8
percent of gdp: 0.469753597383

Algeria
gdp in billions: 213.518488688
how many billions the billionaires there make: 3.2
percent of gdp: 1.49869925535

Netherlands
gdp in billions: 6155.23525046
how many billions the billionaires there make: 24.2
percent of gdp: 0.393161252418

Egypt
gdp in billions: 2382.70929586
how many billions the billionaires there make: 22.3
percent of gdp: 0.935909388473

Sweden
gdp in billions: 10850.7191232
how many billions the billionaires there make: 116.7
percent of gdp: 1.07550475387

Greece
gdp in billions: 706.722224995
how many billions the billionaires there make: 8.2
percent of gdp: 1.16028613647

Austria
gdp in billions: 4368.87543467
how many billions the billionaires there make: 33.8
percent of gdp: 0.773654467962

Colombia
gdp in billions: 1510.95849146
how many billions the billionaires there make: 30.6
percent of gdp: 2.02520454221

Lebanon
gdp in billions: 274.385671642
how many billions the billionaires there make: 12.3
percent of gdp: 4.48274136415

Oman
gdp in billions: 163.593237971
how many billions the billionaires there make: 2.3
percent of gdp: 1.40592608137

France
gdp in billions: 121655.257684
how many billions the billionaires there make: 235.3
percent of gdp: 0.193415397311

Philippines
gdp in billions: 2847.77093019
how many billions the billionaires there make: 40.1
percent of gdp: 1.40811887553

Chile
gdp in billions: 3096.73827464
how many billions the billionaires there make: 41.3
percent of gdp: 1.33366130222

Russia
gdp in billions: 206526.369427
how many billions the billionaires there make: 422.5
percent of gdp: 0.204574360733

Venezuela
gdp in billions: nan
how many billions the billionaires there make: 9.0
percent of gdp: nan

United States
gdp in billions: 8692081.0
how many billions the billionaires there make: 2322.4
percent of gdp: 0.0267185729171

Angola
gdp in billions: nan
how many billions the billionaires there make: 3.7
percent of gdp: nan

Hong Kong
gdp in billions: 13090.3102875
how many billions the billionaires there make: 213.7
percent of gdp: 1.6325052295

Czech Republic
gdp in billions: 1231.61825846
how many billions the billionaires there make: 18.4
percent of gdp: 1.49396940761

Thailand
gdp in billions: 4453.0634733
how many billions the billionaires there make: 36.8
percent of gdp: 0.82639738285

Uganda
gdp in billions: 26.9984772888
how many billions the billionaires there make: 1.1
percent of gdp: 4.0743038514

United Arab Emirates
gdp in billions: 1597.80530973
how many billions the billionaires there make: 14.6
percent of gdp: 0.913753378528

India
gdp in billions: 114716.976577
how many billions the billionaires there make: 191.9
percent of gdp: 0.167281256642

Singapore
gdp in billions: 4925.75613606
how many billions the billionaires there make: 45.1
percent of gdp: 0.915595469086

Cyprus
gdp in billions: 92.9046359447
how many billions the billionaires there make: 19.7
percent of gdp: 21.2045392565

Saudi Arabia
gdp in billions: 5276.82213333
how many billions the billionaires there make: 49.0
percent of gdp: 0.928589191788

Spain
gdp in billions: 35914.8946451
how many billions the billionaires there make: 122.6
percent of gdp: 0.34136254947

Kazakhstan
gdp in billions: 1089.36125111
how many billions the billionaires there make: 9.2
percent of gdp: 0.844531599655

Australia
gdp in billions: 42185.5889103
how many billions the billionaires there make: 85.4
percent of gdp: 0.202438800088

Malaysia
gdp in billions: 4395.34968988
how many billions the billionaires there make: 53.1
percent of gdp: 1.20809500373

Brazil
gdp in billions: 157081.307895
how many billions the billionaires there make: 192.2
percent of gdp: 0.122357015342

Indonesia
gdp in billions: 16882.2258195
how many billions the billionaires there make: 47.8
percent of gdp: 0.28313802049

New Zealand
gdp in billions: 399.939717807
how many billions the billionaires there make: 9.8
percent of gdp: 2.45036928408

Mexico
gdp in billions: 20715.0357317
how many billions the billionaires there make: 142.9
percent of gdp: 0.689837091524

Guernsey
gdp in billions: nan
how many billions the billionaires there make: 2.4
percent of gdp: nan

Romania
gdp in billions: 199.043652215
how many billions the billionaires there make: 1.2
percent of gdp: 0.602882828286

Georgia
gdp in billions: 16.5299631874
how many billions the billionaires there make: 5.2
percent of gdp: 31.4580252905

Germany
gdp in billions: 328804.754705
how many billions the billionaires there make: 401.4
percent of gdp: 0.12207852662

Taiwan
gdp in billions: nan
how many billions the billionaires there make: 75.8
percent of gdp: nan

Ukraine
gdp in billions: 1186.24614064
how many billions the billionaires there make: 26.6
percent of gdp: 2.24236767468

Peru
gdp in billions: 1620.77046175
how many billions the billionaires there make: 11.9
percent of gdp: 0.73421871146

Belgium
gdp in billions: 1594.63975854
how many billions the billionaires there make: 8.0
percent of gdp: 0.501680706077

Macau
gdp in billions: 111.003468092
how many billions the billionaires there make: 2.8
percent of gdp: 2.5224437111

Morocco
gdp in billions: 440.036163354
how many billions the billionaires there make: 7.4
percent of gdp: 1.68167996548

Kuwait
gdp in billions: 818.062192551
how many billions the billionaires there make: 6.5
percent of gdp: 0.794560616441

Italy
gdp in billions: 74940.6463879
how many billions the billionaires there make: 158.1
percent of gdp: 0.210966955345


In [55]:
for country in bill_df['2014']:
    if country == 'United States':
        country1 = country
        print(country)
        gdp_bill1 = bill_df['2014'][country] / 1000000000
        print('gdp in billions:', gdp_bill1)
        billions1 = bill_df['networthusbillion'][country]
        print('how many billions:', billions1)
        percent1 = bill_df['networthusbillion'][country] / gdp_bill1 * 100
        print('percent of gdp:', percent1)
        print('')
    elif country == 'India':
        country2 = country
        print(country)
        gdp_bill2 = bill_df['2014'][country] / 1000000000
        print('gdp in billions:', gdp_bill2)
        billions2 = bill_df['networthusbillion'][country]
        print('how many billions:', billions2)
        percent2 = bill_df['networthusbillion'][country] / gdp_bill2 * 100
        print('percent of gdp:', percent2)
        print('')
        
print(country1 + "'s GDP is", gdp_bill1 / gdp_bill2, 'times that of', country2)
print(country1, 'billionaires make', billions1 / billions2, 'times the money those in', country2, 'do')
print(country1, "billionaires' share of their country's GDP is", percent1 / percent2, 'times that of those living in', country2)


United States
gdp in billions: 8692081.0
how many billions: 2322.4
percent of gdp: 0.0267185729171

India
gdp in billions: 114716.976577
how many billions: 191.9
percent of gdp: 0.167281256642

United States's GDP is 75.7697880415 times that of India
United States billionaires make 12.1021365294 times the money those in India do
United States billionaires' share of their country's GDP is 0.159722454586 times that of those living in India
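The if/elif loop above can be shortened by looking the two countries up directly instead of scanning every key. A rough sketch, assuming a dictionary shaped like `bill_df`; the numbers below are placeholders rather than values from the dataset, and `summarize` and `compare` are hypothetical helpers:

```python
# Placeholder data shaped like bill_df: column name -> {country: value}.
bill = {
    'networthusbillion': {'Country A': 200.0, 'Country B': 50.0},
    '2014': {'Country A': 4000e9, 'Country B': 500e9},
}

def summarize(bill, country):
    """Return GDP in billions, billionaire wealth, and wealth as % of GDP."""
    gdp_billions = bill['2014'][country] / 1e9
    wealth = bill['networthusbillion'][country]
    return {
        'gdp_billions': gdp_billions,
        'wealth': wealth,
        'percent': wealth / gdp_billions * 100,
    }

def compare(bill, first, second):
    """Ratio of each measure for `first` relative to `second`."""
    a, b = summarize(bill, first), summarize(bill, second)
    return {key: a[key] / b[key] for key in a}

ratios = compare(bill, 'Country A', 'Country B')
print(ratios)
```

Any pair of countries can then be compared by changing the two names passed to `compare`.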

What are the most common industries for billionaires to come from? What's the total amount of billionaire money from each industry?


In [56]:
df.columns


Out[56]:
Index(['year', 'name', 'rank', 'citizenship', 'countrycode',
       'networthusbillion', 'selfmade', 'typeofwealth', 'gender', 'age',
       'industry', 'IndustryAggregates', 'region', 'north',
       'politicalconnection', 'founder', 'generationofinheritance', 'sector',
       'company', 'companytype', 'relationshiptocompany', 'foundingdate',
       'gdpcurrentus', 'sourceofwealth', 'notes', 'notes2', 'source',
       'source_2', 'source_3', 'source_4'],
      dtype='object')

In [57]:
df[['networthusbillion', 'industry', 'sector']].head()


Out[57]:
    networthusbillion  industry               sector
1   2.6                Media                  media
5   2.5                Retail, Restaurant     trading
6   4.8                Diversified financial  industrial goods
8   2.4                Money Management       Banking
9   1.2                Consumer               sugar, flour, cement

In [79]:
print('The most common industries for billionaires to come from:')
df['industry'].value_counts().head()


The most common industries for billionaires to come from:
Out[79]:
Consumer                 291
Real Estate              190
Retail, Restaurant       174
Diversified financial    132
Technology-Computer      131
Name: industry, dtype: int64

In [80]:
print('The total amount of billionaire money in each industry:')
df[['industry', 'networthusbillion']].groupby('industry').sum().sort_values('networthusbillion', ascending = False)


The total amount of billionaire money in each industry:
Out[80]:
industry                          networthusbillion
Consumer                          1177.8
Retail, Restaurant                820.9
Technology-Computer               684.6
Diversified financial             614.4
Real Estate                       573.8
Media                             490.5
Money Management                  381.3
Energy                            340.5
Non-consumer industrial           298.4
Mining and metals                 240.6
Technology-Medical                218.0
Other                             179.3
Constrution                       175.4
Hedge funds                       167.2
Private equity/leveraged buyout   71.9
Venture Capital                   11.1
0                                 7.6
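The count and the total per industry can also be pulled in a single pass, which puts the two answers next to each other. A small sketch on made-up rows (the column names match the dataset, the values are invented, and `sample` and `by_industry` are names chosen for illustration):

```python
import pandas as pd

# Made-up rows with the same column names as the billionaires data.
sample = pd.DataFrame({
    'industry': ['Media', 'Energy', 'Media', 'Consumer', 'Media'],
    'networthusbillion': [2.6, 1.0, 3.4, 5.0, 1.0],
})

# One row per industry: how many billionaires, and their combined net worth.
by_industry = (
    sample.groupby('industry')['networthusbillion']
          .agg(['count', 'sum'])
          .sort_values('sum', ascending=False)
)
print(by_industry)
```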

How many self-made billionaires are there vs. others?


In [82]:
df['selfmade'].value_counts()


Out[82]:
self-made    1146
inherited     505
Name: selfmade, dtype: int64

How old are billionaires? How do the ages of self-made billionaires compare with those of non-self-made ones? What about different industries?

Billionaire Ages


In [90]:
df['age'].hist()


Out[90]:
<matplotlib.axes._subplots.AxesSubplot at 0x111ccb668>

In [84]:
df['age'].describe()


/usr/local/lib/python3.5/site-packages/numpy/lib/function_base.py:3823: RuntimeWarning: Invalid value encountered in percentile
  RuntimeWarning)
Out[84]:
count    1590.000000
mean       63.342138
std        13.137743
min        24.000000
25%              NaN
50%              NaN
75%              NaN
max        98.000000
Name: age, dtype: float64
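The NaN quartiles above go with the RuntimeWarning: the count of 1590 is lower than the number of rows, so some ages are presumably missing, and the percentile calculation in this setup does not skip them. A minimal sketch of dropping the missing values before summarizing, using a made-up Series rather than the real column:

```python
import numpy as np
import pandas as pd

# Made-up ages with a couple of missing values, standing in for df['age'].
ages = pd.Series([24, 35, np.nan, 50, 63, np.nan, 98], dtype='float64')

# Drop the missing ages first so the quartiles are computed
# from real numbers only.
summary = ages.dropna().describe()
print(summary)
```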

Self-Made Billionaire Ages


In [87]:
df[df['selfmade'] == 'self-made']['age'].hist()


Out[87]:
<matplotlib.axes._subplots.AxesSubplot at 0x111c07400>

In [91]:
df[df['selfmade'] == 'self-made']['age'].describe()


/usr/local/lib/python3.5/site-packages/numpy/lib/function_base.py:3823: RuntimeWarning: Invalid value encountered in percentile
  RuntimeWarning)
Out[91]:
count    1112.000000
mean       62.625899
std        13.054631
min        29.000000
25%              NaN
50%              NaN
75%              NaN
max        96.000000
Name: age, dtype: float64

The Ages of People Who Have Inherited Billions


In [92]:
df[df['selfmade'] == 'inherited']['age'].hist()


Out[92]:
<matplotlib.axes._subplots.AxesSubplot at 0x111e18780>

In [93]:
df[df['selfmade'] == 'inherited']['age'].describe()


/usr/local/lib/python3.5/site-packages/numpy/lib/function_base.py:3823: RuntimeWarning: Invalid value encountered in percentile
  RuntimeWarning)
Out[93]:
count    476.000000
mean      64.962185
std       13.174403
min       24.000000
25%             NaN
50%             NaN
75%             NaN
max       98.000000
Name: age, dtype: float64
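The two filtered describe() calls above can also be run side by side with a groupby, which makes the self-made vs. inherited comparison easier to read. A sketch on made-up rows (same column names as the dataset, invented values), dropping missing ages first:

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the billionaires dataframe.
sample = pd.DataFrame({
    'selfmade': ['self-made', 'inherited', 'self-made', 'inherited', 'self-made'],
    'age': [30.0, 70.0, 50.0, np.nan, 40.0],
})

# One row of summary statistics per value of 'selfmade'.
age_by_type = (
    sample.dropna(subset=['age'])
          .groupby('selfmade')['age']
          .describe()
)
print(age_by_type)
```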

The Ages of Billionaires in Different Industries


In [96]:
df[['age', 'industry']].groupby('industry').mean().sort_values('age')


Out[96]:
industry                          age
Technology-Computer               54.496183
Hedge funds                       55.325581
Venture Capital                   58.200000
Non-consumer industrial           59.898734
Mining and metals                 60.161765
Energy                            61.651163
Private equity/leveraged buyout   62.277778
0                                 63.600000
Technology-Medical                63.870130
Real Estate                       64.118919
Media                             64.346535
Consumer                          64.735507
Constrution                       64.965517
Money Management                  66.016949
Diversified financial             66.344000
Other                             66.666667
Retail, Restaurant                66.734177

Who are the youngest billionaires? The oldest? Age distribution - maybe make a graph about it?

The Youngest Billionaires


In [97]:
df[['name', 'age']].sort_values('age').head()


Out[97]:
name age
1838 Perenna Kei 24.0
605 Dustin Moskovitz 29.0
1586 Mark Zuckerberg 29.0
189 Anton Kathrein, Jr. 29.0
602 Drew Houston 30.0

The Oldest Billionaires


In [98]:
df[['name', 'age']].sort_values('age', ascending=False).head()


Out[98]:
name age
516 David Rockefeller, Sr. 98.0
1277 Karl Wlaschek 96.0
1328 Kirk Kerkorian 96.0
921 Henry Hillman 95.0
666 Erika Pohl-Stroher 95.0
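As an alternative to sorting and taking head(), pandas has `nsmallest` and `nlargest`, which ask for the extremes directly. A short sketch on invented names and ages, not the real data:

```python
import pandas as pd

# Made-up names and ages standing in for df[['name', 'age']].
sample = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'age': [41.0, 29.0, 98.0, 63.0],
})

# The two youngest and the two oldest, without sorting the whole frame.
youngest = sample.nsmallest(2, 'age')[['name', 'age']]
oldest = sample.nlargest(2, 'age')[['name', 'age']]
print(youngest)
print(oldest)
```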

The Age Distribution of Billionaires


In [99]:
df['age'].hist()


Out[99]:
<matplotlib.axes._subplots.AxesSubplot at 0x111e1b048>

Maybe just make a graph about how wealthy they are in general?


In [100]:
df['networthusbillion'].hist()


Out[100]:
<matplotlib.axes._subplots.AxesSubplot at 0x11209a860>

In [107]:
df['networthusbillion'].sort_values(ascending = False).head(10)


Out[107]:
284     76.0
348     72.0
124     64.0
2491    58.2
1377    48.0
381     40.0
509     40.0
2185    38.0
429     36.7
1128    34.7
Name: networthusbillion, dtype: float64

Maybe plot their net worth vs age (scatterplot)


In [111]:
df[['networthusbillion', 'age']].plot(kind = 'scatter', x = 'networthusbillion', y = 'age')


Out[111]:
<matplotlib.axes._subplots.AxesSubplot at 0x111801a20>

Make a bar graph of the top 10 or 20 richest


In [119]:
df[['name', 'networthusbillion']].sort_values('networthusbillion', ascending = False).head(10).plot(kind = 'bar', x = 'name', y = 'networthusbillion')


Out[119]:
<matplotlib.axes._subplots.AxesSubplot at 0x112816eb8>

03: Finding your own dataset

On Thursday, bring a dataset with you that's a csv/tsv/whatever. Try to open it in pandas, and do df.head() to make sure it displays OK.


In [ ]:
df = pd.read_json('https://data.sfgov.org/api/views/gxxq-x39z/rows.json')
# Can't get this to work! And I can't save the source code for some reason.

In [ ]:
df.head()
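One possible reason `pd.read_json` struggles here: Socrata-style `rows.json` exports usually wrap the table in a nested object, with column descriptions under `meta` and the rows under `data`, rather than a flat list of records. A hedged sketch of unpacking that shape by hand. The layout is an assumption about this particular file, `rows_json_to_frame` is a hypothetical helper, and the payload below is a made-up miniature instead of the real download:

```python
import pandas as pd

def rows_json_to_frame(payload):
    """Build a DataFrame from a dict shaped like a Socrata rows.json export.

    Assumes payload['meta']['view']['columns'] is a list of dicts that each
    have a 'name', and payload['data'] is a list of rows (lists of values).
    """
    names = [col['name'] for col in payload['meta']['view']['columns']]
    return pd.DataFrame(payload['data'], columns=names)

# Made-up miniature payload in the assumed layout.
payload = {
    'meta': {'view': {'columns': [{'name': 'id'}, {'name': 'title'}]}},
    'data': [[1, 'first row'], [2, 'second row']],
}

frame = rows_json_to_frame(payload)
print(frame.head())
```

With the real file, the dict would come from parsing the downloaded text with the standard `json` module before passing it to the helper.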