Demographics and War
Authored by: Jonathan Broch jb5365.
War has been around since the dawn of time. As long as humans have existed, they have been fighting each other over resources and even ideas. My project takes a first step toward answering whether there are demographic enablers for war. Traditionally, in order to fight civil wars, populations need a pool of young people to serve as fighters. No youth, no war? The other condition that must be satisfied for armed conflict to occur is that young people have to choose violence over the alternatives. If all the youth are already employed in careers of their choice, why would they fight? My project looks for patterns in population growth and youth unemployment data to see whether countries that are currently experiencing civil war can be grouped together by this data.
I got population data from the UN website and database. I averaged the population growth rates from 1985 to 2005 because people born in those years would be 25 or younger by 2010, and those born between 1985 and 1995 would be considered "youth", between 15 and 25 years old. I chose 2010 because it is the year before civil war broke out in Syria and other armed conflicts began in the Middle East.
I used youth unemployment data from the World Bank. I only used the 2010 figures because that is the year relevant to my analysis and to the average population growth rate calculated from the UN data.
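The averaging step could be sketched in pandas roughly as follows. This is only an illustration: the five-year column layout, the country names, and the numbers are all assumptions made up for the example, not the actual UN figures or the exact steps used in this project.

```python
import pandas as pd

# Hypothetical growth rates (percent per year) in five-year periods.
# The layout and the values are assumptions, not real UN data.
growth = pd.DataFrame(
    {
        "1985-1990": [2.0, 0.5],
        "1990-1995": [1.8, 0.4],
        "1995-2000": [1.6, 0.3],
        "2000-2005": [1.4, 0.2],
    },
    index=["Country X", "Country Y"],
)

# Periods that together cover 1985 to 2005
periods = ["1985-1990", "1990-1995", "1995-2000", "2000-2005"]

# Row-wise mean gives one average growth rate per country
avg_growth = growth[periods].mean(axis=1)
print(avg_growth)
```

A simple arithmetic mean treats every period equally, which matches the idea of an "average growth rate" used here but ignores compounding.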
In [3]:
# import packages
import pandas as pd # data management
import matplotlib.pyplot as plt # graphics
import urllib
import numpy as np
# IPython command, puts plots in notebook
%matplotlib inline
# check Python version
import datetime as dt
import sys
print('Today is', dt.date.today())
print('What version of Python are we running? \n', sys.version, sep='')
In [4]:
#I am downloading and saving the UN data on population growth by year
import requests
url1 = "http://esa.un.org/unpd/wpp/DVD/Files/"
url2 = "1_Indicators%20(Standard)/EXCEL_FILES/"
url3 = "1_Population/WPP2015_POP_F02_POPULATION_GROWTH_RATE.XLS"
UNdata = url1 + url2 + url3
resp = requests.get(UNdata)
with open('UNdata.xls', 'wb') as output:
    output.write(resp.content)
In [5]:
#I am downloading and saving the World Bank data on youth unemployment
url4 = "http://api.worldbank.org/v2/en/indicator/sl.uem.1524.zs?downloadformat=excel"
url5 = ''
url6 = ''
WBdata = url4 + url5 + url6
resp = requests.get(WBdata)
with open('WBdata.xls', 'wb') as output:
    output.write(resp.content)
DATA
After downloading both files, I created a consolidated sheet in Excel. This was difficult because the UN and the World Bank did not use the same names for the same country. For example, "PDR Korea" in the UN file was listed as "the people's republic of Korea" in the World Bank file. There were around 25 instances of this occurring. Next, I used formulas in Excel to extract data from the UN and WB files to populate a table that had
"Country name" in the first column,
"Country code" in the second,
"Population Growth" in the third,
"Youth Unemployment" in the 4th column, and
"Group" in the fifth. The group designation will be explained later.
In [6]:
#after combining both files, I uploaded the excel document to my dropbox at the link below
xls_file = pd.ExcelFile('https://dl.dropboxusercontent.com/u/16846867/UN-WB%202010%20PG%20vs%20YU.xlsx')
xls_file
Out[6]:
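The manual name matching described under DATA could also be done in pandas. The sketch below is not what I did in Excel; it is one possible alternative, and the country names, codes, values, and the `name_map` dictionary are all hypothetical.

```python
import pandas as pd

# Hypothetical miniature versions of the two sources; all values are made up.
un = pd.DataFrame({
    "Country name": ["Country X", "Dem. Rep. of Y"],
    "Population Growth": [1.7, 2.4],
})
wb = pd.DataFrame({
    "Country name": ["Country X", "Y, Dem. Rep."],
    "Country code": ["CTX", "YDR"],
    "Youth Unemployment": [12.0, 25.0],
})

# Hypothetical mapping from World Bank spellings to UN spellings
name_map = {"Y, Dem. Rep.": "Dem. Rep. of Y"}
wb["Country name"] = wb["Country name"].replace(name_map)

# An outer merge with indicator=True flags rows found in only one source
merged = un.merge(wb, on="Country name", how="outer", indicator=True)
unmatched = merged[merged["_merge"] != "both"]

# Keep only the matched rows, in the column order of the consolidated sheet
table = merged.loc[
    merged["_merge"] == "both",
    ["Country name", "Country code", "Population Growth", "Youth Unemployment"],
]
print(table)
```

Inspecting `unmatched` after each run shows which names still need an entry in the mapping.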
In [7]:
#Here is the data I will analyze. Population growth is an average from 1985-2005 and youth unemployment data is from 2010.
Dataurl = 'https://dl.dropboxusercontent.com/u/16846867/UN-WB%202010%20PG%20vs%20YU.xlsx'
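# sheet_name replaces the older sheetname keyword in current pandas
FP1 = pd.read_excel(Dataurl, sheet_name=1, skiprows=0, na_values=['…'])
# keep the first five columns by position
Data = FP1.iloc[:, :5]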
Data
Out[7]:
In [8]:
#here is all the data in a scatter plot
Data.plot.scatter(x="Population Growth", y="Youth Unemployment", figsize=(10, 5), alpha=1)
Out[8]:
In [149]:
#Here are the values I am using to split the data into 4 groups
Data.median(numeric_only=True)
Out[149]:
In [150]:
#Here is the same scatter plot with lines dividing the data into groups.
#Group A: Low pop growth, Low youth unemployment. Lower left
#Group B: High pop growth, Low youth unemployment. Lower right
#Group C: Low pop growth, High youth unemployment. Upper left
#Group D: High pop growth, High youth unemployment Upper right
Data.plot.scatter(x="Population Growth", y="Youth Unemployment", figsize=(20, 10), alpha=.9, s=50, color ="cyan")
#I used the numbers for median pop growth and youth unemployment to divide the data into 4 groups.
#Below is the code I used to input the dividing lines
PM1 = [1.770625, 1.770625]
PM2 = [0,70]
plt.plot(PM1, PM2)
YE1 = [-2, 7]
YE2 = [16.326897,16.326897]
plt.plot(YE1, YE2)
plt.show()
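The "Group" column was filled in by hand in the consolidated sheet, but the same A to D labels could be derived from the two medians. The following is a minimal sketch on made-up data; the sample values are hypothetical, and treating a value exactly equal to the median as "low" is an assumption of this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the values are made up for illustration.
sample = pd.DataFrame({
    "Country name": ["W", "X", "Y", "Z"],
    "Population Growth": [0.5, 3.0, 0.8, 2.9],
    "Youth Unemployment": [8.0, 9.0, 30.0, 28.0],
})

pg_median = sample["Population Growth"].median()
yu_median = sample["Youth Unemployment"].median()

# Values strictly above the median count as "high" (assumption)
high_pg = sample["Population Growth"] > pg_median
high_yu = sample["Youth Unemployment"] > yu_median

# A: low/low, B: high growth/low unemployment,
# C: low growth/high unemployment, D: high/high
sample["Group"] = np.select(
    [~high_pg & ~high_yu, high_pg & ~high_yu, ~high_pg & high_yu, high_pg & high_yu],
    ["A", "B", "C", "D"],
    default="",
)
print(sample)
```

Rows with missing values would fall into the "low" side of each comparison here, so real data would need those rows dropped or handled separately first.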
In [97]:
#I am most interested by countries with high population growth and high youth unemployment, Group D.
#My prediction is that countries in this group are more likely to be countries currently undergoing or about to undergo civil unrest.
#As you can see, some of the most obvious countries currently undergoing violent unrest are in the top 12 of Group D.
Data[(Data.Group == "D")].sort_values("Population Growth", axis=0, ascending=False, inplace=False, kind='quicksort', na_position='last').head(12)
Out[97]:
In [98]:
#This table shows countries in group A, sorted by lowest youth unemployment.
#Most of these countries were widely regarded as extremely stable in 2010.
Data[(Data.Group == "A")].sort_values("Youth Unemployment", axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last').head(12)
Out[98]:
In [99]:
#This table shows countries with the lowest Youth Unemployment. At a glance, several of them seem to be countries that have
#undergone violent civil strife in the past 50 years, such as Rwanda, Cambodia, Liberia, and Sierra Leone.
Data.sort_values("Youth Unemployment", axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last').head(10)
Out[99]:
In [9]:
#This table shows countries with the lowest population growth. Coincidentally, there are many countries in Eastern Europe.
Data.sort_values("Population Growth", axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last').head(12)
Out[9]:
In [10]:
#This table shows countries that have the highest population growth rates.
Data.sort_values("Population Growth", axis=0, ascending=False, inplace=False, kind='quicksort', na_position='last').head(12)
Out[10]:
The Grand Finale
And now back to what I thought was interesting. Here is the scatter plot with countries that have had thousands of deaths in recent years due to armed conflict, as well as some of the largest economies in the world, highlighted:
in RED: Afghanistan, Iraq, Yemen, Sudan, Syria, and Libya
in BLUE: The United States
in YELLOW: China
in GREEN: India
In [140]:
fig, ax = plt.subplots()
Data.plot.scatter(ax=ax, x="Population Growth", y="Youth Unemployment", figsize=(20,10), alpha=.5, color = 'white')
Data.iloc[199:201].plot.scatter(ax=ax, x="Population Growth", y="Youth Unemployment", figsize=(20,10), alpha=.9, color = 'red', s=100)
Data.iloc[172:173].plot.scatter(ax=ax, x="Population Growth", y="Youth Unemployment", figsize=(20,10), alpha=.9, color = 'red', s=100)
Data.iloc[164:165].plot.scatter(ax=ax, x="Population Growth", y="Youth Unemployment", figsize=(20,10), alpha=.9, color = 'red', s=100)
Data.iloc[188:189].plot.scatter(ax=ax, x="Population Growth", y="Youth Unemployment", figsize=(20,10), alpha=.9, color = 'red', s=100)
Data.iloc[129:130].plot.scatter(ax=ax, x="Population Growth", y="Youth Unemployment", figsize=(20,10), alpha=.9, color = 'red', s=100)
Data.iloc[64:65].plot.scatter(ax=ax, x="Population Growth", y="Youth Unemployment", figsize=(20,10), alpha=.9, color = 'blue', s=100)
Data.iloc[66:67].plot.scatter(ax=ax, x="Population Growth", y="Youth Unemployment", figsize=(20,10), alpha=.9, color = 'Yellow', s=100)
Data.iloc[109:110].plot.scatter(ax=ax, x="Population Growth", y="Youth Unemployment", figsize=(20,10), alpha=.9, color = 'green', s=100)
PM1 = [1.770625, 1.770625]
PM2 = [0,70]
plt.plot(PM1, PM2)
YE1 = [-2, 7]
YE2 = [16.326897,16.326897]
plt.plot(YE1, YE2)
Out[140]:
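The cell above picks countries by row position (for example `Data.iloc[64:65]`), which depends on the row order of the sheet. A possible alternative is to select by country name. The sketch below uses a made-up table and a hypothetical `highlights` dictionary; it is not the code that produced the figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs outside a notebook; drop in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample; names and values are made up for illustration.
sample = pd.DataFrame({
    "Country name": ["Country W", "Country X", "Country Y", "Country Z"],
    "Population Growth": [0.5, 3.0, 0.8, 2.9],
    "Youth Unemployment": [8.0, 9.0, 30.0, 28.0],
})

# Hypothetical mapping of highlight color to the countries drawn in that color
highlights = {"red": ["Country X", "Country Z"], "blue": ["Country W"]}

fig, ax = plt.subplots(figsize=(10, 5))

# All countries as a faint background layer
sample.plot.scatter(ax=ax, x="Population Growth", y="Youth Unemployment",
                    alpha=.5, color="lightgray")

# One extra layer per color, selected by name instead of row position
for color, names in highlights.items():
    subset = sample[sample["Country name"].isin(names)]
    subset.plot.scatter(ax=ax, x="Population Growth", y="Youth Unemployment",
                        alpha=.9, color=color, s=100)
```

Selecting by name keeps the highlighting correct even if the sheet is re-sorted or rows are added.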