Let us first import the dataset.
In [1]:
import pandas as pd
In [32]:
import matplotlib.pyplot as plt
In [33]:
%matplotlib inline
In [34]:
df = pd.read_excel("richpeople.xlsx")
In [35]:
rich = df[df['year'] == 2014] #Getting only the latest list.
In [36]:
rich.head(5)
Out[36]:
In [37]:
rich.columns
Out[37]:
Which country are most billionaires from? For the top ones, how many billionaires per billion people?
In [38]:
numberofbillionaires=rich['countrycode'].value_counts() #Clearly, Americans lead.
numberofbillionaires.head(10)
Out[38]:
In [39]:
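# NOTE: this divides each count by the constant 1e9, not by the country's population,
# so it is not a true per-capita rate. That would need a population figure per country.
numberofbillionairesperbillionpeople=numberofbillionaires/1000000000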
numberofbillionairesperbillionpeople.head(6)
Out[39]:
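A per-capita rate needs each country's population, which is not in the billionaire file. The sketch below shows the calculation under that assumption. The `population` Series is hypothetical, and both the counts and the populations are made-up toy values, not figures from the dataset.

```python
import pandas as pd

# Toy billionaire counts per country code (illustrative values only).
counts = pd.Series({"AAA": 50, "BBB": 10, "CCC": 4})

# Hypothetical populations indexed by the same codes (illustrative values only).
population = pd.Series({"AAA": 250_000_000, "BBB": 1_000_000_000, "CCC": 8_000_000})

# Billionaires per billion people: count * 1e9 / population.
# The division aligns on the index, so each code is matched to its own population.
per_billion = (counts * 1e9 / population).sort_values(ascending=False)
print(per_billion)
```

With real data, `counts` would be the `value_counts()` result on `countrycode`, and `population` would have to come from a separate source keyed by the same codes.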
Who are the top 10 richest billionaires?
In [40]:
sortedrich=rich.sort_values(by='networthusbillion',ascending=False)
sortedrich.head(10)
Out[40]:
Who is the poorest billionaire? Who are the top 10 poorest billionaires?
In [41]:
sortedpoor=rich.sort_values(by='networthusbillion')
sortedpoor.head(10)
Out[41]:
What is 'relationship to company'? And what are the most common relationships?
In [42]:
rich['relationshiptocompany'].head(5)
Out[42]:
It describes how the billionaire is related to the company or industry in which they made their billions.
In [43]:
relations=rich['relationshiptocompany'].value_counts()
relations.head(10) # A problem with the dataset becomes apparent here.
# 'CEO' and 'ceo' are counted as different values. A combined chairman-and-CEO title is its own category, separate from either title alone.
# One can replace 'ceo' with 'CEO' and see how things change.
Out[43]:
In [44]:
rich1=rich.replace(['ceo'],['CEO'])
In [45]:
relations=rich1['relationshiptocompany'].value_counts()
relations.head(10) # rich1 has 'ceo' merged into 'CEO', so CEO is now the sixth most common position, after investor.
Out[45]:
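Replacing 'ceo' with 'CEO' fixes one variant at a time. A more general way to collapse case and whitespace variants is to normalize the whole column before counting. This is only a sketch on made-up labels; the actual values in `relationshiptocompany` may need further cleanup.

```python
import pandas as pd

# Toy labels with the kind of inconsistencies described above (illustrative only).
labels = pd.Series(
    ["CEO", "ceo", " founder", "Founder", "Chairman and CEO", "chairman and ceo"]
)

# Strip surrounding whitespace and lowercase so variants collapse into one label.
normalized = labels.str.strip().str.lower()
print(normalized.value_counts())
```

Note that this does not merge a combined title with its parts; deciding whether a chairman-and-CEO entry should count as a CEO is a judgment call about the data, not a formatting fix.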
Most common source of wealth? Male vs. female?
In [92]:
source=rich1['sourceofwealth'].value_counts()
print(source.head(10)) #REAL ESTATE IS WHERE THE MONEY IS, GUYS! "Diversified", HAHA! (print is needed: only the last expression in a cell is displayed)
maleb=rich1[rich1['gender']=='male']
maleb['sourceofwealth'].value_counts()
Out[92]:
In [93]:
femaleb=rich1[rich1['gender']=='female']
femaleb['sourceofwealth'].value_counts()
Out[93]:
In [47]:
rich1['gender'].value_counts() #Inherent Patriarchy perhaps
Out[47]:
Given the richest person in a country, what % of the GDP is their wealth?
In [48]:
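gdp_csv=pd.read_csv('http://data.okfn.org/data/core/gdp/r/gdp.csv')
gdp_2014 = gdp_csv[gdp_csv['Year'] == 2014]
gdp_2014=gdp_2014.rename(columns={'Country Code': 'countrycode'})
# A left merge keeps every billionaire row and adds GDP where the country code matches.
rich2=pd.merge(rich1, gdp_2014, on='countrycode', how='left')
# networthusbillion is in billions of dollars; 'Value' is assumed to be GDP in US dollars.
# Convert both to the same unit, then multiply by 100 to get a percentage.
rich2['percentageofcountrygdptheirwealthis']=(rich2['networthusbillion']*1e9)/(rich2['Value'])*100
rich2['percentageofcountrygdptheirwealthis'].head()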
Out[48]:
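The question asks about the richest person in each country, so one extra step is to pick that person before computing the ratio. A minimal sketch on a toy frame with made-up values, assuming the GDP column `Value` is in US dollars and net worth is in billions of dollars:

```python
import pandas as pd

# Toy stand-in for the merged frame (made-up values, hypothetical country codes).
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "countrycode": ["XXX", "XXX", "YYY", "YYY"],
    "networthusbillion": [10.0, 40.0, 5.0, 2.0],
    "Value": [2e12, 2e12, 1e11, 1e11],  # assumed: GDP in US dollars
})

# Row label of the richest person in each country.
richest_idx = toy.groupby("countrycode")["networthusbillion"].idxmax()
richest = toy.loc[richest_idx].copy()

# Convert net worth from billions to dollars, then express it as a percentage of GDP.
richest["pct_of_gdp"] = richest["networthusbillion"] * 1e9 / richest["Value"] * 100
print(richest[["name", "countrycode", "pct_of_gdp"]])
```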
What's the average wealth of a billionaire? Male? Female?
In [94]:
rich1['networthusbillion'].describe() # Use the 2014 subset; df contains every year in the file.
Out[94]:
In [95]:
maleb['networthusbillion'].describe()
Out[95]:
In [96]:
femaleb['networthusbillion'].describe()
Out[96]:
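The male and female summaries can also be produced side by side with a single `groupby`. A sketch on made-up values; with the real data the frame would be the 2014 subset:

```python
import pandas as pd

# Toy data (illustrative values only).
toy = pd.DataFrame({
    "gender": ["male", "male", "male", "female", "female"],
    "networthusbillion": [2.0, 6.0, 10.0, 3.0, 5.0],
})

# One row per gender, with count, mean and median net worth.
by_gender = toy.groupby("gender")["networthusbillion"].agg(["count", "mean", "median"])
print(by_gender)
```

The median is worth reporting next to the mean, since a few very large fortunes can pull the mean upward.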
Add up the wealth of all the billionaires in a given country (or a few countries), then compare it to that country's GDP or to the billionaires of another country. For example, pit the US against India.
In [113]:
americanbillionaires=rich2[rich2['countrycode']=='USA']
indianbillionaires=rich2[rich2['countrycode']=='IND']
wealthofamerica=americanbillionaires['networthusbillion'].sum()
wealthofamerica
Out[113]:
In [112]:
wealthofindia=indianbillionaires['networthusbillion'].sum()
wealthofindia
Out[112]:
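The two sums above cover the billionaire side of the comparison. To set them against GDP for every country at once, one option is a grouped aggregation. This is a sketch on a toy frame with made-up values and hypothetical country codes, again assuming `Value` is GDP in US dollars:

```python
import pandas as pd

# Toy merged frame: GDP ('Value') repeats on every row of the same country.
toy = pd.DataFrame({
    "countrycode": ["XXX", "XXX", "YYY"],
    "networthusbillion": [30.0, 20.0, 4.0],
    "Value": [1e12, 1e12, 2e11],  # assumed: GDP in US dollars
})

# Total billionaire wealth per country, plus that country's GDP.
summary = toy.groupby("countrycode").agg(
    total_wealth_billion=("networthusbillion", "sum"),
    gdp_usd=("Value", "first"),
)

# Billionaire wealth as a percentage of GDP.
summary["wealth_pct_of_gdp"] = (
    summary["total_wealth_billion"] * 1e9 / summary["gdp_usd"] * 100
)
print(summary)
```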
What are the most common industries for billionaires to come from? What's the total amount of billionaire money from each industry?
In [51]:
rich2['industry'].value_counts()
Out[51]:
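`value_counts()` answers the first half of the question. For the total amount of money per industry, a grouped sum is one way to go. A sketch on made-up values with placeholder industry names:

```python
import pandas as pd

# Toy data (illustrative values and placeholder industry names).
toy = pd.DataFrame({
    "industry": ["Industry A", "Industry B", "Industry A"],
    "networthusbillion": [3.0, 4.0, 7.0],
})

# Total net worth per industry, largest first.
totals = (
    toy.groupby("industry")["networthusbillion"]
    .sum()
    .sort_values(ascending=False)
)
print(totals)
```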
How many self-made billionaires vs. others?
In [52]:
rich2['selfmade'].value_counts()
Out[52]:
How old are billionaires? How do the ages of self-made and non-self-made billionaires compare? What about different industries?
In [58]:
selfmade=rich2[rich2['selfmade']=='self-made']
nonselfmade=rich2[rich2['selfmade']=='inherited']
print("The values for self made billionaires")
selfmade['age'].value_counts()
selfmade['age'].hist()
Out[58]:
In [56]:
print("The values for non-self made billionaires")
nonselfmade['age'].value_counts()
nonselfmade['age'].hist(label='Non-Self-Made Billionaires Age vs Number')
Out[56]:
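The histograms show the shape of each distribution. For a direct numeric comparison, the two groups can be summarized in one table. A sketch on made-up ages, using the same `selfmade` labels as the filters above:

```python
import pandas as pd

# Toy data (illustrative ages only).
toy = pd.DataFrame({
    "selfmade": ["self-made", "self-made", "inherited", "inherited"],
    "age": [50, 60, 40, 80],
})

# Summary statistics of age for each group, one row per group.
age_summary = toy.groupby("selfmade")["age"].describe()
print(age_summary[["count", "mean", "min", "max"]])
```

Grouping by `industry` instead of `selfmade` would give the same table for the industry part of the question.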
In [ ]:
rich2['age'].value_counts()
Who are the youngest billionaires? The oldest?
In [61]:
rich2.sort_values(by='age').head(10)
Out[61]:
In [63]:
rich2.sort_values(by='age',ascending=False).head(10)
Out[63]:
Maybe plot their net worth vs age (scatterplot)
In [66]:
rich2.plot(kind='scatter',x='age',y='networthusbillion')
Out[66]: