December 2016
Author: George Qian
Contact: george.qian@stern.nyu.edu
Many factors go into transforming a nation from a third-world country into a first-world country. As the saying goes, "correlation does not equal causation", so in this project I will only look at the correlation between development variables, such as foreign direct investment, and a country's GDP/GNI growth. I will determine which of the factors I have chosen correlates most closely with a country's economic development, and I will also explore the correlation between those factors and the country's wealth distribution. Any strong correlation could then be explored further to determine whether there is causation.
In [1]:
import pandas as pd # data package
import matplotlib.pyplot as plt # graphics
import seaborn as sns # seaborn graphics package
import numpy as np # foundation for pandas
import sys # system module
import datetime as dt # date and time module
%matplotlib inline
Data on developing countries are usually difficult to gather, as most of those countries lack censuses and other forms of data collection. Of all the global agencies, the World Bank has one of the most extensive collections of data on developing countries. The data for this report were collected from the World Bank DataBank. I downloaded the datasets I needed and re-uploaded them to GitHub so that others can also access the data used.
Countries Chosen: Burundi, Djibouti, Ethiopia, Kenya, Madagascar, Tanzania, Uganda, Zambia, and Zimbabwe
Factors:
Economic Development Indicators:
Wealth Distribution
1. GINI Coefficient
In [2]:
#Importing the data from github
url = 'https://raw.githubusercontent.com/ghq201/databootcamp/master/Project%20Data.csv'
wb = pd.read_csv(url)
wb.head(153)
Out[2]:
As expected, a lot of data is missing for each country. Some countries have more data than others, and for a given variable most countries only have data for a couple of years.
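To put a number on how sparse the data is, the share of missing observations can be computed per series and country. The sketch below is a minimal illustration on made-up toy data, not the project's dataset. It assumes the export marks missing values with `..` (as the cleaning step in this notebook suggests) and uses the `na_values` option of `read_csv` so that the year columns parse as floats directly. The names `toy_csv`, `toy`, and `missing_share` are hypothetical.

```python
import io
import pandas as pd

# Toy stand-in for the World Bank export; all values are invented for illustration.
toy_csv = io.StringIO(
    "Series Name,Country Name,2001,2002,2003\n"
    "GDP per capita (constant LCU),Kenya,100.0,..,110.0\n"
    "GDP per capita (constant LCU),Uganda,..,..,95.0\n"
    "GINI index (World Bank estimate),Kenya,..,..,..\n"
)

# na_values treats '..' as missing, so the year columns are read as floats.
toy = pd.read_csv(toy_csv, na_values="..")
toy = toy.set_index(["Series Name", "Country Name"])

# Share of missing yearly observations for each (series, country) row.
missing_share = toy.isna().mean(axis=1)
print(missing_share)
```

Rows with a share close to 1 have too few observations to support a meaningful correlation later on.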
In [3]:
#Setting the index to Series and Country
wb=wb.set_index(['Series Name', 'Country Name'])
In [4]:
#Replacing the '..' placeholders for missing values with NaN
wb=wb.replace(to_replace=['..'], value=[np.nan]).head(152)
In [5]:
#Converting objects to float as for some reason the type still came out as object after .replace
for i in range(2001,2016):
    str_i=str(i)
    wb[str_i]=wb[str_i].apply(pd.to_numeric)
wb.dtypes
Out[5]:
In [6]:
#Transposing the dataset
wb=wb.T.head(152)
wb.head(152)
Out[6]:
In [7]:
#separating the datasets
FDI=wb['Foreign direct investment, net inflows (BoP, current US$)']
Arable=wb['Arable land (% of land area)']
ArablePP=wb['Arable land (hectares per person)']
CGDebt=wb['Central government debt, total (% of GDP)']
CooSchool=wb['Children out of school (% of primary school age)']
FemaleGM=wb['Female genital mutilation prevalence (%)']
FertCons=wb['Fertilizer consumption (kilograms per hectare of arable land)']
Firms=wb['Firms expected to give gifts in meetings with tax officials (% of firms)']
GDP=wb['GDP (constant LCU)']
GDPPC=wb['GDP per capita (constant LCU)']
GINI=wb['GINI index (World Bank estimate)']
GNI=wb['GNI (constant LCU)']
GNIPC=wb['GNI per capita (constant LCU)']
HealthEx=wb['Health expenditure per capita, PPP (constant 2011 international $)']
LiteracyM=wb['Literacy rate, adult male (% of males ages 15 and above)']
LiteracyF=wb['Literacy rate, adult female (% of females ages 15 and above)']
ODA=wb['Net ODA received per capita (current US$)']
Factors = [FDI,Arable,ArablePP,CGDebt,CooSchool,FemaleGM,FertCons,Firms,HealthEx,LiteracyM,LiteracyF,ODA]
Economic_Indicator = [GDP,GDPPC,GINI,GNI,GNIPC]
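The selections above work because, after transposing, the columns carry a two-level index (series name, then country name), so indexing by a series name returns one DataFrame with years as rows and countries as columns. A minimal sketch of that behaviour on invented toy data follows; the short labels `FDI` and `GDPPC` and the names `toy`, `toy_t`, and `fdi` are hypothetical stand-ins, not the notebook's actual labels.

```python
import pandas as pd

# Two-level row index mirroring ['Series Name', 'Country Name']; values are invented.
idx = pd.MultiIndex.from_tuples(
    [("FDI", "Kenya"), ("FDI", "Uganda"), ("GDPPC", "Kenya"), ("GDPPC", "Uganda")],
    names=["Series Name", "Country Name"],
)
toy = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
    index=idx,
    columns=["2001", "2002", "2003"],
)

# After transposing, years are rows and (series, country) pairs are columns.
toy_t = toy.T

# Selecting a top-level key drops that level and leaves one column per country.
fdi = toy_t["FDI"]
print(fdi)
```

Each per-series frame then shares the same year index and country columns, which is what lets the later plots and correlations line up country by country.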
Here are some important trends that I have decided to look at for each country's development.
In [8]:
fig, ax = plt.subplots(figsize=(12,7))
FDIGraph=FDI.plot(ax=ax,kind='line')
FDIGraph.set(ylabel="net inflows (BoP, current US$)", xlabel="Year")
FDIGraph.set_title('Foreign Direct Investment',fontsize= 30)
Out[8]:
There has been a general upward trend for foreign direct investment, which could prove beneficial to a country's growth.
In [9]:
fig, ax = plt.subplots(figsize=(12,7))
GDPGraph=GDPPC.plot(ax=ax,kind='line')
GDPGraph.set(ylabel="constant LCU", xlabel="Year")
GDPGraph.set_title('GDP Per Capita',fontsize= 30)
Out[9]:
Over the past 15 years, Tanzania and Burundi have experienced steady growth, while other East African countries have not been as successful.
In [10]:
fig, ax = plt.subplots(figsize=(12,7))
ODAGraph=ODA.plot(ax=ax,kind='line')
ODAGraph.set(ylabel="current US$", xlabel="Year")
ODAGraph.set_title('ODA Received Per Capita',fontsize= 30)
Out[10]:
ODA received per capita has been fairly constant over the past fifteen years.
Here we look at the correlation between the various factors and the economic development indicators. I will first do a country-by-country correlation analysis on two factors whose benefit to the recipient country is hotly debated: FDI and ODA received. Afterwards, I will look at the average of the correlations for each factor.
In [11]:
FDI.corrwith(GDPPC)
Out[11]:
In [12]:
ODA.corrwith(GDPPC)
Out[12]:
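For reference, `corrwith` pairs up columns with the same label in the two frames (here, the same country) and computes the Pearson correlation over the rows (years). The sketch below illustrates this on invented toy numbers and checks one country against `numpy.corrcoef`; the frames `factor` and `outcome` are hypothetical and are not the project's data.

```python
import numpy as np
import pandas as pd

years = ["2001", "2002", "2003", "2004"]

# Invented values: Kenya moves with the outcome, Uganda moves against it.
factor = pd.DataFrame(
    {"Kenya": [1.0, 2.0, 3.0, 4.0], "Uganda": [4.0, 3.0, 2.0, 1.0]}, index=years
)
outcome = pd.DataFrame(
    {"Kenya": [10.0, 20.0, 30.0, 40.0], "Uganda": [5.0, 6.0, 7.0, 8.0]}, index=years
)

# One Pearson correlation per country, computed across the years.
per_country = factor.corrwith(outcome)

# Cross-check a single country with numpy's correlation matrix.
manual_kenya = np.corrcoef(factor["Kenya"], outcome["Kenya"])[0, 1]

print(per_country)
print(manual_kenya)
```

In this toy case the mean across the two countries is about zero even though each country shows a perfect linear relationship, which is one reason to read the per-country values alongside any average.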
In [13]:
#Creating a dictionary of variables to correlation mean
Factorname = ['FDI','Arable Land','Arable Land per person','Central Government Debt','Children out of School',
              'Female Genital Mutilation','Fertilizer Consumption','Firms Expected to Give Gifts','Health Expenditure',
              'Male Literacy','Female Literacy','ODA']
#Mean across countries of each factor's correlation with GDP per capita
GDPPC_Correlation = {}
for i in range(0,12):
    corr=Factors[i].corrwith(GDPPC)
    s=float(corr.mean())
    GDPPC_Correlation[Factorname[i]]=s
#Note: despite the dictionary name, this correlates with total GNI, not GNIPC
GNIPC_Correlation = {}
for i in range(0,12):
    corr=Factors[i].corrwith(GNI)
    s=float(corr.mean())
    GNIPC_Correlation[Factorname[i]]=s
#Mean across countries of each factor's correlation with the GINI index
GINIPC_Correlation = {}
for i in range(0,12):
    corr=Factors[i].corrwith(GINI)
    s=float(corr.mean())
    GINIPC_Correlation[Factorname[i]]=s
In [14]:
#Converting the dictionaries to dataframe
GINIC=pd.DataFrame.from_dict(GINIPC_Correlation,orient='index')
GNIC=pd.DataFrame.from_dict(GNIPC_Correlation,orient='index')
GDPC=pd.DataFrame.from_dict(GDPPC_Correlation,orient='index')
Correlation=pd.concat([GDPC,GNIC,GINIC],axis=1)
Correlation.columns = ["GDP Correlation", "GNI Correlation", "GINI Correlation"]
print(Correlation)
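Because many series only have a few years of data, it can help to report how many overlapping observations sit behind each country's correlation before averaging. A correlation computed from only two points is always +1 or -1, and countries with no overlap yield NaN, which `mean()` skips by default. The sketch below illustrates this on invented toy data; `factor`, `outcome`, `overlap`, and `summary` are hypothetical names and the numbers are not from the project.

```python
import numpy as np
import pandas as pd

years = ["2001", "2002", "2003", "2004", "2005"]

# Invented values: Kenya is complete, Uganda has two observations, Zambia has none.
factor = pd.DataFrame(
    {
        "Kenya": [1.0, 2.0, 3.0, 5.0, 4.0],
        "Uganda": [np.nan, np.nan, 2.0, np.nan, 3.0],
        "Zambia": [np.nan] * 5,
    },
    index=years,
)
outcome = pd.DataFrame(
    {
        "Kenya": [10.0, 12.0, 15.0, 19.0, 18.0],
        "Uganda": [7.0, 8.0, 9.0, 10.0, 11.0],
        "Zambia": [1.0, 2.0, 3.0, 4.0, 5.0],
    },
    index=years,
)

# Per-country correlation; only years present in both frames contribute.
corr = factor.corrwith(outcome)

# Number of years in which both the factor and the outcome are observed.
overlap = (factor.notna() & outcome.notna()).sum()

summary = pd.DataFrame({"correlation": corr, "overlapping years": overlap})
print(summary)
print(corr.mean())  # NaN entries are skipped by default
```

One possible design choice, not used in the notebook, is to drop countries below a minimum overlap before taking the mean so that two-point correlations do not dominate the average.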
As I mentioned before, "correlation does not equal causation"; however, the analysis identified some factors that deserve a closer look and some that are most likely unrelated. The correlation is high enough for Male Literacy, Fertilizer Consumption, Health Expenditure, Children out of School, and Central Government Debt to warrant further exploration. Factors such as Female Genital Mutilation and ODA Received have a low correlation with economic development, so they might not be as important to the success of a country as the other factors. As for wealth distribution, it seems uncorrelated with any of the factors I have chosen.
World Bank DataBank