Working with data 2017. Class 4

Contact

Javier Garcia-Bernardo garcia@uva.nl

0. Structure

  1. Stats
    • Definitions
    • What's a p-value?
    • One-tailed test vs two-tailed test
    • Count vs expected count (binomial test)
    • Independence between factors: ($\chi^2$ test)
  2. In-class exercises to melt, pivot, concat, merge, groupby and plot.
  3. Read data from websited
  4. Time series

In [25]:
import pandas as pd
import numpy as np
import pylab as plt
import seaborn as sns
from scipy.stats import chi2_contingency,ttest_ind

#This allows us to use R
%load_ext rpy2.ipython

#Visualize in line
%matplotlib inline


#Be able to plot images saved in the hard drive
from IPython.display import Image,display

#Make the notebook wider
from IPython.core.display import display, HTML 
display(HTML("<style>.container { width:90% !important; }</style>"))


The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython

3. Read tables from websites

pandas is cool

  • Use pd.read_html(url)
  • It returns a list of all tables in the website
  • It tries to guess the encoding of the website, but with no much success.

In [28]:
df = pd.read_html("https://piie.com/summary-economic-sanctions-episodes-1914-2006",encoding="UTF-8")
print(type(df),len(df))
df


<class 'list'> 1
Out[28]:
[     Year\n\t\t\timposed Year\n\t\t\tended  \
 0                   1914              1918   
 1                   1917              1918   
 2                   1918              1920   
 3                   1921              1921   
 4                   1925              1925   
 5                   1932              1935   
 6                   1932              1935   
 7                   1933              1933   
 8                   1935              1936   
 9                   1938              1947   
 10                  1939              1945   
 11                  1939              1945   
 12                  1940              1941   
 13                  1941              1945   
 14                  1941              1945   
 15                  1944              1947   
 16                  1946           Ongoing   
 17                  1948              1994   
 18                  1948              1955   
 19                  1948              1949   
 20                  1948              1949   
 21                  1948              1949   
 22                  1948              1949   
 23                  1948              1948   
 24                  1949              1970   
 25                  1950           Ongoing   
 26                  1950           Ongoing   
 27                  1950              1953   
 28                  1951              1953   
 29                  1954              1984   
 ..                   ...               ...   
 174                 1993              1996   
 175                 1993              1994   
 176                 1993              1993   
 177                 1994              1998   
 178                 1994              1995   
 179                 1994              1995   
 180                 1994              1995   
 181                 1995              1998   
 182                 1995              1998   
 183                 1995              1996   
 184                 1995              1995   
 185                 1996              2000   
 186                 1996              1999   
 187                 1996              1998   
 188                 1996              1998   
 189                 1996              1996   
 190                 1997           Ongoing   
 191                 1997              2003   
 192                 1998              2001   
 193                 1998              2001   
 194                 1998              2001   
 195                 1998              1999   
 196                 1998              1999   
 197                 1999              2002   
 198                 1999              2002   
 199                 1999              2002   
 200                 1991              2001   
 201                 2000              2006   
 202                 2000              2000   
 203                 2002              2006   
 
                              Principal \n\t\t\tsendera Target \n\t\t\tcountry  \
 0                                       United Kingdom                Germany   
 1                                        United States                  Japan   
 2                                       United Kingdom                 Russia   
 3                                    League of Nations             Yugoslavia   
 4                                    League of Nations                 Greece   
 5                                    League of Nations               Paraguay   
 6                                    League of Nations                Bolivia   
 7                                       United Kingdom                   USSR   
 8                                    League of Nations                  Italy   
 9                                        United States                 Mexico   
 10                                       United States                Germany   
 11                                       United States                Germany   
 12                                       United States                  Japan   
 13                                       United States                  Japan   
 14                                       United States                  Japan   
 15                                       United States              Argentina   
 16                                         Arab League                 Israel   
 17                                       United States          USSR, Comecom   
 18                                                USSR             Yugoslavia   
 19                                       United States            Netherlands   
 20                                                USSR                 France   
 21                                                USSR         United Kingdom   
 22                                                USSR          United States   
 23                                               India              Hyderabad   
 24                                       United States                  China   
 25                                       United States            North Korea   
 26                                       United States            North Korea   
 27                                       United States                  China   
 28                                       United States                   Iran   
 29                                               Spain         United Kingdom   
 ..                                                 ...                    ...   
 174                                        USSR/Russia             Kazakhstan   
 175                                      United States            North Korea   
 176                                      United States              Guatemala   
 177                                      United States                 Gambia   
 178                                             Greece              Macedonia   
 179                                             Greece                Albania   
 180                                     United Nations                 Rwanda   
 181                                      United States                   Peru   
 182                                      United States                Ecuador   
 183                                          Australia                 France   
 184                                     European Union                 Turkey   
 185                                      United States                Nigeria   
 186  East African members of Organization of Africa...                Burundi   
 187                                      United States                 Zambia   
 188                                      United States               Colombia   
 189                    United States, Mercosur members               Paraguay   
 190                                     United Nations  Cambodia, Khmer Rouge   
 191                                     United Nations           Sierra Leone   
 192                                      United States               Pakistan   
 193                                      United States                  India   
 194                                      United States     Yugoslavia, Serbia   
 195                                      United States     Yugoslavia, Serbia   
 196                                             Turkey                  Italy   
 197                                      United States              Indonesia   
 198                                     United Nations            Afghanistan   
 199                                      United States            Ivory Coast   
 200                                      United States               Pakistan   
 201          Economic Community of West African States                Liberia   
 202                                      United States                Ecuador   
 203                                      United States            North Korea   
 
                         Policy \n\t\t\tgoal  \
 0                          Military victory   
 1                       Shipping for Allies   
 2                    Destabilize Bolsheviks   
 3           Military disruption vs. Albania   
 4                    Withdraw from Bulgaria   
 5                      Settle the Chaco War   
 6                      Settle the Chaco War   
 7                  Release British citizens   
 8                   Withdraw from Abyssinia   
 9                     Expropriation dispute   
 10                            Regime change   
 11                         Military victory   
 12             Withdraw from Southeast Asia   
 13                            Regime change   
 14                         Military victory   
 15                        Destabilize Perón   
 16              Create Palestinian homeland   
 17                Impair military potential   
 18              Destabilize Tito government   
 19                      Recognize Indonesia   
 20                          Berlin blockade   
 21                          Berlin blockade   
 22                          Berlin blockade   
 23                     Assimilate Hyderabad   
 24                Impair military potential   
 25                            Regime change   
 26                      Military impairment   
 27               Military disruption, Korea   
 28                    Destabilize Mossadegh   
 29               Sovereignty over Gibraltar   
 ..                                      ...   
 174   Independence issues, energy resources   
 175                   Nuclear proliferation   
 176                                    Coup   
 177                               Democracy   
 178                       National identity   
 179  Release of jailed ethnic Greek leaders   
 180                          Civil violence   
 181                         Border conflict   
 182                         Border conflict   
 183                         Nuclear testing   
 184                            Human rights   
 185                               Democracy   
 186                               Democracy   
 187     Human rights, constitutional reform   
 188                 Narcotics, human rights   
 189                   Possible coup attempt   
 190                               Democracy   
 191                               Democracy   
 192                          Nuclear policy   
 193                   Nuclear proliferation   
 194                   Destabilize Milošević   
 195                                  Kosovo   
 196                          Kurdish leader   
 197             Independence for East Timor   
 198               Extradite Osama bin Laden   
 199                         Coup, democracy   
 200                         Coup, democracy   
 201                         Support for RUF   
 202                                    Coup   
 203                   Nuclear proliferation   
 
      \nSuccess\n\t\t\tscoreb\n\t\t\t(scale \n\t\t\t1 to 16)\n  \
 0                                                   12          
 1                                                    4          
 2                                                    2          
 3                                                   16          
 4                                                   16          
 5                                                    6          
 6                                                    6          
 7                                                   12          
 8                                                    2          
 9                                                    9          
 10                                                   8          
 11                                                  12          
 12                                                   1          
 13                                                   8          
 14                                                  12          
 15                                                   4          
 16                                                   4          
 17                                                   6          
 18                                                   1          
 19                                                  16          
 20                                                   2          
 21                                                   2          
 22                                                   2          
 23                                                   8          
 24                                                   2          
 25                                                   1          
 26                                                   4          
 27                                                   2          
 28                                                  12          
 29                                                   6          
 ..                                                 ...          
 174                                                  9          
 175                                                  9          
 176                                                 16          
 177                                                  6          
 178                                                  9          
 179                                                 16          
 180                                                  2          
 181                                                  8          
 182                                                  8          
 183                                                  4          
 184                                                  9          
 185                                                  9          
 186                                                  6          
 187                                                  4          
 188                                                  6          
 189                                                 12          
 190                                                  4          
 191                                                  8          
 192                                                  4          
 193                                                  2          
 194                                                 12          
 195                                                  6          
 196                                                  9          
 197                                                 12          
 198                                                  2          
 199                                                  6          
 200                                                  2          
 201                                                  8          
 202                                                 12          
 203                                                  1          
 
     \nCost\n\t\t\tto\n\t\t\ttarget\n\t\t\t(percent\n\t\t\tof GNP)\n  
 0                                                  7.1               
 1                                                  0.8               
 2                                                  4.1               
 3                                                    –               
 4                                                    –               
 5                                           negligible               
 6                                                  2.6               
 7                                                 0.02               
 8                                                  1.7               
 9                                                  0.2               
 10                                                 1.4               
 11                                                 1.4               
 12                                                 0.9               
 13                                                 1.9               
 14                                                 1.9               
 15                                                 0.8               
 16                                                 1.7               
 17                                                 0.2               
 18                                                –2.5               
 19                                                 0.2               
 20                                          negligible               
 21                                                 0.1               
 22                                                 0.1               
 23                                                   2               
 24                                                 0.5               
 25                                                 1.2               
 26                                                 1.2               
 27                                                 0.5               
 28                                                14.3               
 29                                               0.009               
 ..                                                 ...               
 174                                                4.6               
 175                                                  –               
 176                                                1.3               
 177                                              –0.02               
 178                                                3.2               
 179                                                2.9               
 180                                               –5.6               
 181                                         negligible               
 182                                         negligible               
 183                                         negligible               
 184                                         negligible               
 185                                                4.9               
 186                                               10.4               
 187                                                2.9               
 188                                                0.2               
 189                                                  –               
 190                                                2.6               
 191                                                5.8               
 192                                                1.0               
 193                                                0.2               
 194                                                8.3               
 195                                                8.3               
 196                                                  –               
 197                                                0.2               
 198                                                1.1               
 199                                                0.3               
 200                                         negligible               
 201                                               18.8               
 202                                                  –               
 203                                                0.6               
 
 [204 rows x 7 columns]]

In [32]:
df[0].head(10)


Out[32]:
Year imposed Year ended Principal  sendera Target  country Policy  goal Success scoreb (scale  1 to 16) Cost to target (percent of GNP)
0 1914 1918 United Kingdom Germany Military victory 12 7.1
1 1917 1918 United States Japan Shipping for Allies 4 0.8
2 1918 1920 United Kingdom Russia Destabilize Bolsheviks 2 4.1
3 1921 1921 League of Nations Yugoslavia Military disruption vs. Albania 16
4 1925 1925 League of Nations Greece Withdraw from Bulgaria 16
5 1932 1935 League of Nations Paraguay Settle the Chaco War 6 negligible
6 1932 1935 League of Nations Bolivia Settle the Chaco War 6 2.6
7 1933 1933 United Kingdom USSR Release British citizens 12 0.02
8 1935 1936 League of Nations Italy Withdraw from Abyssinia 2 1.7
9 1938 1947 United States Mexico Expropriation dispute 9 0.2

In [30]:
df[0].columns


Out[30]:
Index(['Year\n\t\t\timposed', 'Year\n\t\t\tended', 'Principal \n\t\t\tsendera',
       'Target \n\t\t\tcountry', 'Policy \n\t\t\tgoal',
       '\nSuccess\n\t\t\tscoreb\n\t\t\t(scale \n\t\t\t1 to 16)\n',
       '\nCost\n\t\t\tto\n\t\t\ttarget\n\t\t\t(percent\n\t\t\tof GNP)\n'],
      dtype='object')

In [5]:
df = pd.read_html("https://piie.com/summary-economic-sanctions-episodes-1914-2006",encoding="UTF-8")
df = df[0]
print(df.columns)
df.columns = ['Year imposed', 'Year ended', 'Principal sender',
       'Target country', 'Policy goal',
       'Success score (scale 1 to 16)',
       'Cost to target (percent of GNP)']

df = df.replace('negligible', 0) 
df = df.replace("–","-",regex=True) #the file uses long dashes
df.to_csv("data/economic_sanctions.csv",index=None,sep="\t")


Index(['Year\n\t\t\timposed', 'Year\n\t\t\tended', 'Principal \n\t\t\tsendera',
       'Target \n\t\t\tcountry', 'Policy \n\t\t\tgoal',
       '\nSuccess\n\t\t\tscoreb\n\t\t\t(scale \n\t\t\t1 to 16)\n',
       '\nCost\n\t\t\tto\n\t\t\ttarget\n\t\t\t(percent\n\t\t\tof GNP)\n'],
      dtype='object')

In [6]:
df = pd.read_csv("data/economic_sanctions.csv",sep="\t",na_values=["-","Ongoing"])
df["Duration"] = df["Year ended"] - df["Year imposed"]
df.head()


Out[6]:
Year imposed Year ended Principal sender Target country Policy goal Success score (scale 1 to 16) Cost to target (percent of GNP) Duration
0 1914 1918.0 United Kingdom Germany Military victory 12 7.1 4.0
1 1917 1918.0 United States Japan Shipping for Allies 4 0.8 1.0
2 1918 1920.0 United Kingdom Russia Destabilize Bolsheviks 2 4.1 2.0
3 1921 1921.0 League of Nations Yugoslavia Military disruption vs. Albania 16 NaN 0.0
4 1925 1925.0 League of Nations Greece Withdraw from Bulgaria 16 NaN 0.0

In [13]:
sns.lmplot(x="Duration",y="Cost to target (percent of GNP)",data=df,fit_reg=False,hue="Year imposed",legend=False,palette="YlOrBr")
plt.ylim((-2,10))
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5),ncol=4)


Out[13]:
<matplotlib.legend.Legend at 0x7fda70222240>

4. Parse dates

pandas is cool

  • Use parse_dates=[columns] when reading the file
  • It parses the date

In [177]:












4.1. Use parse_dates when reading the file


In [33]:
df = pd.read_csv("data/exchange-rate-twi-may-1970-aug-1.tsv",sep="\t",parse_dates=["Month"],skipfooter=2)
df.head()


/opt/anaconda/anaconda3/lib/python3.5/site-packages/ipykernel/__main__.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.
  if __name__ == '__main__':
Out[33]:
Month Exchange Rate TWI. May 1970 ? Aug 1995.
0 1970-05-01 100.0
1 1970-06-01 99.6
2 1970-07-01 99.4
3 1970-08-01 99.1
4 1970-09-01 99.2

4.2. You can now filter by date


In [37]:
#filter by time
df_after1980 = df.loc[df["Month"] > "1980-05-02"] #year-month-date
df_after1980.columns = ["Date","Rate"]
df_after1980.head()


Out[37]:
Date Rate
121 1980-06-01 85.0
122 1980-07-01 85.5
123 1980-08-01 85.8
124 1980-09-01 85.8
125 1980-10-01 86.2

4.3. And still extract columns of year and month


In [20]:
#make columns with year and month (useful for models)
df_after1980["Year"] = df_after1980["Date"].apply(lambda x: x.year)
df_after1980["Month"] = df_after1980["Date"].apply(lambda x: x.month)
df_after1980.head()


Out[20]:
Date Rate Year Month
0 1970-05-01 100.0 1970 5
1 1970-06-01 99.6 1970 6
2 1970-07-01 99.4 1970 7
3 1970-08-01 99.1 1970 8
4 1970-09-01 99.2 1970 9

4.4. You can resample the data with a specific frequency


In [38]:
#resample
df_after1980_resampled = df_after1980.resample("A",on="Date").mean()
display(df_after1980_resampled.head())


Rate
Date
1980-12-31 86.028571
1981-12-31 91.233333
1982-12-31 86.441667
1983-12-31 79.766667
1984-12-31 81.658333

In [23]:
df_after1980_resampled = df_after1980_resampled.reset_index()
df_after1980_resampled.head()


Out[23]:
Date Rate Year Month
0 1970-12-31 99.375000 1970.0 8.5
1 1971-12-31 99.741667 1971.0 6.5
2 1972-12-31 99.225000 1972.0 6.5
3 1973-12-31 111.208333 1973.0 6.5
4 1974-12-31 113.883333 1974.0 6.5

4.5 And of course plot it with a line plot


In [24]:
#Let's visualize it
plt.figure(figsize=(6,4))
plt.plot(df_after1980["Date"],df_after1980["Rate"],label="Before resampling")
plt.plot(df_after1980_resampled["Date"],df_after1980_resampled["Rate"],label="After resampling")
plt.xlabel("Time")
plt.ylabel("Rate")
plt.legend()
plt.show()