Working with data 2017. Class 4

Contact

Javier Garcia-Bernardo garcia@uva.nl

0. Structure

Stats
- Definitions
- What's a p-value?
- One-tailed test vs two-tailed test
- Count vs expected count (binomial test)
- Independence between factors: ($\chi^2$ test)
In-class exercises to melt, pivot, concat, merge, groupby and plot.
Read data from websited
Time series



In [25]:

    
import pandas as pd
import numpy as np
import pylab as plt
import seaborn as sns
from scipy.stats import chi2_contingency,ttest_ind

#This allows us to use R
%load_ext rpy2.ipython

#Visualize in line
%matplotlib inline


#Be able to plot images saved in the hard drive
from IPython.display import Image,display

#Make the notebook wider
from IPython.core.display import display, HTML 
display(HTML("<style>.container { width:90% !important; }</style>"))









    



The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython

3. Read tables from websites

pandas is cool

Use pd.read_html(url)
It returns a list of all tables in the website
It tries to guess the encoding of the website, but with no much success.



In [28]:

    
df = pd.read_html("https://piie.com/summary-economic-sanctions-episodes-1914-2006",encoding="UTF-8")
print(type(df),len(df))
df









    



<class 'list'> 1






    Out[28]:





[     Year\n\t\t\timposed Year\n\t\t\tended  \
 0                   1914              1918   
 1                   1917              1918   
 2                   1918              1920   
 3                   1921              1921   
 4                   1925              1925   
 5                   1932              1935   
 6                   1932              1935   
 7                   1933              1933   
 8                   1935              1936   
 9                   1938              1947   
 10                  1939              1945   
 11                  1939              1945   
 12                  1940              1941   
 13                  1941              1945   
 14                  1941              1945   
 15                  1944              1947   
 16                  1946           Ongoing   
 17                  1948              1994   
 18                  1948              1955   
 19                  1948              1949   
 20                  1948              1949   
 21                  1948              1949   
 22                  1948              1949   
 23                  1948              1948   
 24                  1949              1970   
 25                  1950           Ongoing   
 26                  1950           Ongoing   
 27                  1950              1953   
 28                  1951              1953   
 29                  1954              1984   
 ..                   ...               ...   
 174                 1993              1996   
 175                 1993              1994   
 176                 1993              1993   
 177                 1994              1998   
 178                 1994              1995   
 179                 1994              1995   
 180                 1994              1995   
 181                 1995              1998   
 182                 1995              1998   
 183                 1995              1996   
 184                 1995              1995   
 185                 1996              2000   
 186                 1996              1999   
 187                 1996              1998   
 188                 1996              1998   
 189                 1996              1996   
 190                 1997           Ongoing   
 191                 1997              2003   
 192                 1998              2001   
 193                 1998              2001   
 194                 1998              2001   
 195                 1998              1999   
 196                 1998              1999   
 197                 1999              2002   
 198                 1999              2002   
 199                 1999              2002   
 200                 1991              2001   
 201                 2000              2006   
 202                 2000              2000   
 203                 2002              2006   
 
                              Principal \n\t\t\tsendera Target \n\t\t\tcountry  \
 0                                       United Kingdom                Germany   
 1                                        United States                  Japan   
 2                                       United Kingdom                 Russia   
 3                                    League of Nations             Yugoslavia   
 4                                    League of Nations                 Greece   
 5                                    League of Nations               Paraguay   
 6                                    League of Nations                Bolivia   
 7                                       United Kingdom                   USSR   
 8                                    League of Nations                  Italy   
 9                                        United States                 Mexico   
 10                                       United States                Germany   
 11                                       United States                Germany   
 12                                       United States                  Japan   
 13                                       United States                  Japan   
 14                                       United States                  Japan   
 15                                       United States              Argentina   
 16                                         Arab League                 Israel   
 17                                       United States          USSR, Comecom   
 18                                                USSR             Yugoslavia   
 19                                       United States            Netherlands   
 20                                                USSR                 France   
 21                                                USSR         United Kingdom   
 22                                                USSR          United States   
 23                                               India              Hyderabad   
 24                                       United States                  China   
 25                                       United States            North Korea   
 26                                       United States            North Korea   
 27                                       United States                  China   
 28                                       United States                   Iran   
 29                                               Spain         United Kingdom   
 ..                                                 ...                    ...   
 174                                        USSR/Russia             Kazakhstan   
 175                                      United States            North Korea   
 176                                      United States              Guatemala   
 177                                      United States                 Gambia   
 178                                             Greece              Macedonia   
 179                                             Greece                Albania   
 180                                     United Nations                 Rwanda   
 181                                      United States                   Peru   
 182                                      United States                Ecuador   
 183                                          Australia                 France   
 184                                     European Union                 Turkey   
 185                                      United States                Nigeria   
 186  East African members of Organization of Africa...                Burundi   
 187                                      United States                 Zambia   
 188                                      United States               Colombia   
 189                    United States, Mercosur members               Paraguay   
 190                                     United Nations  Cambodia, Khmer Rouge   
 191                                     United Nations           Sierra Leone   
 192                                      United States               Pakistan   
 193                                      United States                  India   
 194                                      United States     Yugoslavia, Serbia   
 195                                      United States     Yugoslavia, Serbia   
 196                                             Turkey                  Italy   
 197                                      United States              Indonesia   
 198                                     United Nations            Afghanistan   
 199                                      United States            Ivory Coast   
 200                                      United States               Pakistan   
 201          Economic Community of West African States                Liberia   
 202                                      United States                Ecuador   
 203                                      United States            North Korea   
 
                         Policy \n\t\t\tgoal  \
 0                          Military victory   
 1                       Shipping for Allies   
 2                    Destabilize Bolsheviks   
 3           Military disruption vs. Albania   
 4                    Withdraw from Bulgaria   
 5                      Settle the Chaco War   
 6                      Settle the Chaco War   
 7                  Release British citizens   
 8                   Withdraw from Abyssinia   
 9                     Expropriation dispute   
 10                            Regime change   
 11                         Military victory   
 12             Withdraw from Southeast Asia   
 13                            Regime change   
 14                         Military victory   
 15                        Destabilize Perón   
 16              Create Palestinian homeland   
 17                Impair military potential   
 18              Destabilize Tito government   
 19                      Recognize Indonesia   
 20                          Berlin blockade   
 21                          Berlin blockade   
 22                          Berlin blockade   
 23                     Assimilate Hyderabad   
 24                Impair military potential   
 25                            Regime change   
 26                      Military impairment   
 27               Military disruption, Korea   
 28                    Destabilize Mossadegh   
 29               Sovereignty over Gibraltar   
 ..                                      ...   
 174   Independence issues, energy resources   
 175                   Nuclear proliferation   
 176                                    Coup   
 177                               Democracy   
 178                       National identity   
 179  Release of jailed ethnic Greek leaders   
 180                          Civil violence   
 181                         Border conflict   
 182                         Border conflict   
 183                         Nuclear testing   
 184                            Human rights   
 185                               Democracy   
 186                               Democracy   
 187     Human rights, constitutional reform   
 188                 Narcotics, human rights   
 189                   Possible coup attempt   
 190                               Democracy   
 191                               Democracy   
 192                          Nuclear policy   
 193                   Nuclear proliferation   
 194                   Destabilize Milošević   
 195                                  Kosovo   
 196                          Kurdish leader   
 197             Independence for East Timor   
 198               Extradite Osama bin Laden   
 199                         Coup, democracy   
 200                         Coup, democracy   
 201                         Support for RUF   
 202                                    Coup   
 203                   Nuclear proliferation   
 
      \nSuccess\n\t\t\tscoreb\n\t\t\t(scale \n\t\t\t1 to 16)\n  \
 0                                                   12          
 1                                                    4          
 2                                                    2          
 3                                                   16          
 4                                                   16          
 5                                                    6          
 6                                                    6          
 7                                                   12          
 8                                                    2          
 9                                                    9          
 10                                                   8          
 11                                                  12          
 12                                                   1          
 13                                                   8          
 14                                                  12          
 15                                                   4          
 16                                                   4          
 17                                                   6          
 18                                                   1          
 19                                                  16          
 20                                                   2          
 21                                                   2          
 22                                                   2          
 23                                                   8          
 24                                                   2          
 25                                                   1          
 26                                                   4          
 27                                                   2          
 28                                                  12          
 29                                                   6          
 ..                                                 ...          
 174                                                  9          
 175                                                  9          
 176                                                 16          
 177                                                  6          
 178                                                  9          
 179                                                 16          
 180                                                  2          
 181                                                  8          
 182                                                  8          
 183                                                  4          
 184                                                  9          
 185                                                  9          
 186                                                  6          
 187                                                  4          
 188                                                  6          
 189                                                 12          
 190                                                  4          
 191                                                  8          
 192                                                  4          
 193                                                  2          
 194                                                 12          
 195                                                  6          
 196                                                  9          
 197                                                 12          
 198                                                  2          
 199                                                  6          
 200                                                  2          
 201                                                  8          
 202                                                 12          
 203                                                  1          
 
     \nCost\n\t\t\tto\n\t\t\ttarget\n\t\t\t(percent\n\t\t\tof GNP)\n  
 0                                                  7.1               
 1                                                  0.8               
 2                                                  4.1               
 3                                                    –               
 4                                                    –               
 5                                           negligible               
 6                                                  2.6               
 7                                                 0.02               
 8                                                  1.7               
 9                                                  0.2               
 10                                                 1.4               
 11                                                 1.4               
 12                                                 0.9               
 13                                                 1.9               
 14                                                 1.9               
 15                                                 0.8               
 16                                                 1.7               
 17                                                 0.2               
 18                                                –2.5               
 19                                                 0.2               
 20                                          negligible               
 21                                                 0.1               
 22                                                 0.1               
 23                                                   2               
 24                                                 0.5               
 25                                                 1.2               
 26                                                 1.2               
 27                                                 0.5               
 28                                                14.3               
 29                                               0.009               
 ..                                                 ...               
 174                                                4.6               
 175                                                  –               
 176                                                1.3               
 177                                              –0.02               
 178                                                3.2               
 179                                                2.9               
 180                                               –5.6               
 181                                         negligible               
 182                                         negligible               
 183                                         negligible               
 184                                         negligible               
 185                                                4.9               
 186                                               10.4               
 187                                                2.9               
 188                                                0.2               
 189                                                  –               
 190                                                2.6               
 191                                                5.8               
 192                                                1.0               
 193                                                0.2               
 194                                                8.3               
 195                                                8.3               
 196                                                  –               
 197                                                0.2               
 198                                                1.1               
 199                                                0.3               
 200                                         negligible               
 201                                               18.8               
 202                                                  –               
 203                                                0.6               
 
 [204 rows x 7 columns]]



In [32]:

    
df[0].head(10)









    Out[32]:






  
    
      
      Year
			imposed
      Year
			ended
      Principal 
			sendera
      Target 
			country
      Policy 
			goal
      Success
			scoreb
			(scale 
			1 to 16)
      Cost
			to
			target
			(percent
			of GNP)
    
  
  
    
      0
      1914
      1918
      United Kingdom
      Germany
      Military victory
      12
      7.1
    
    
      1
      1917
      1918
      United States
      Japan
      Shipping for Allies
      4
      0.8
    
    
      2
      1918
      1920
      United Kingdom
      Russia
      Destabilize Bolsheviks
      2
      4.1
    
    
      3
      1921
      1921
      League of Nations
      Yugoslavia
      Military disruption vs. Albania
      16
      –
    
    
      4
      1925
      1925
      League of Nations
      Greece
      Withdraw from Bulgaria
      16
      –
    
    
      5
      1932
      1935
      League of Nations
      Paraguay
      Settle the Chaco War
      6
      negligible
    
    
      6
      1932
      1935
      League of Nations
      Bolivia
      Settle the Chaco War
      6
      2.6
    
    
      7
      1933
      1933
      United Kingdom
      USSR
      Release British citizens
      12
      0.02
    
    
      8
      1935
      1936
      League of Nations
      Italy
      Withdraw from Abyssinia
      2
      1.7
    
    
      9
      1938
      1947
      United States
      Mexico
      Expropriation dispute
      9
      0.2



In [30]:

    
df[0].columns









    Out[30]:





Index(['Year\n\t\t\timposed', 'Year\n\t\t\tended', 'Principal \n\t\t\tsendera',
       'Target \n\t\t\tcountry', 'Policy \n\t\t\tgoal',
       '\nSuccess\n\t\t\tscoreb\n\t\t\t(scale \n\t\t\t1 to 16)\n',
       '\nCost\n\t\t\tto\n\t\t\ttarget\n\t\t\t(percent\n\t\t\tof GNP)\n'],
      dtype='object')



In [5]:

    
df = pd.read_html("https://piie.com/summary-economic-sanctions-episodes-1914-2006",encoding="UTF-8")
df = df[0]
print(df.columns)
df.columns = ['Year imposed', 'Year ended', 'Principal sender',
       'Target country', 'Policy goal',
       'Success score (scale 1 to 16)',
       'Cost to target (percent of GNP)']

df = df.replace('negligible', 0) 
df = df.replace("–","-",regex=True) #the file uses long dashes
df.to_csv("data/economic_sanctions.csv",index=None,sep="\t")









    



Index(['Year\n\t\t\timposed', 'Year\n\t\t\tended', 'Principal \n\t\t\tsendera',
       'Target \n\t\t\tcountry', 'Policy \n\t\t\tgoal',
       '\nSuccess\n\t\t\tscoreb\n\t\t\t(scale \n\t\t\t1 to 16)\n',
       '\nCost\n\t\t\tto\n\t\t\ttarget\n\t\t\t(percent\n\t\t\tof GNP)\n'],
      dtype='object')



In [6]:

    
df = pd.read_csv("data/economic_sanctions.csv",sep="\t",na_values=["-","Ongoing"])
df["Duration"] = df["Year ended"] - df["Year imposed"]
df.head()









    Out[6]:






  
    
      
      Year imposed
      Year ended
      Principal sender
      Target country
      Policy goal
      Success score (scale 1 to 16)
      Cost to target (percent of GNP)
      Duration
    
  
  
    
      0
      1914
      1918.0
      United Kingdom
      Germany
      Military victory
      12
      7.1
      4.0
    
    
      1
      1917
      1918.0
      United States
      Japan
      Shipping for Allies
      4
      0.8
      1.0
    
    
      2
      1918
      1920.0
      United Kingdom
      Russia
      Destabilize Bolsheviks
      2
      4.1
      2.0
    
    
      3
      1921
      1921.0
      League of Nations
      Yugoslavia
      Military disruption vs. Albania
      16
      NaN
      0.0
    
    
      4
      1925
      1925.0
      League of Nations
      Greece
      Withdraw from Bulgaria
      16
      NaN
      0.0



In [13]:

    
sns.lmplot(x="Duration",y="Cost to target (percent of GNP)",data=df,fit_reg=False,hue="Year imposed",legend=False,palette="YlOrBr")
plt.ylim((-2,10))
plt.legend(loc="center left", bbox_to_anchor=(1, 0.5),ncol=4)









    Out[13]:





<matplotlib.legend.Legend at 0x7fda70222240>

4. Parse dates

pandas is cool

Use parse_dates=[columns] when reading the file
It parses the date



In [177]:

4.1. Use parse_dates when reading the file



In [33]:

    
df = pd.read_csv("data/exchange-rate-twi-may-1970-aug-1.tsv",sep="\t",parse_dates=["Month"],skipfooter=2)
df.head()









    



/opt/anaconda/anaconda3/lib/python3.5/site-packages/ipykernel/__main__.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.
  if __name__ == '__main__':






    Out[33]:






  
    
      
      Month
      Exchange Rate TWI. May 1970 ? Aug 1995.
    
  
  
    
      0
      1970-05-01
      100.0
    
    
      1
      1970-06-01
      99.6
    
    
      2
      1970-07-01
      99.4
    
    
      3
      1970-08-01
      99.1
    
    
      4
      1970-09-01
      99.2

4.2. You can now filter by date



In [37]:

    
#filter by time
df_after1980 = df.loc[df["Month"] > "1980-05-02"] #year-month-date
df_after1980.columns = ["Date","Rate"]
df_after1980.head()

4.3. And still extract columns of year and month



In [20]:

    
#make columns with year and month (useful for models)
df_after1980["Year"] = df_after1980["Date"].apply(lambda x: x.year)
df_after1980["Month"] = df_after1980["Date"].apply(lambda x: x.month)
df_after1980.head()

4.4. You can resample the data with a specific frequency

Very similar to groupby.
Groups the data with a specific frequency
- "A" = End of year
- "B" = Business day
- others: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
Then you tell pandas to apply a function to the group (mean/max/median...)



In [38]:

    
#resample
df_after1980_resampled = df_after1980.resample("A",on="Date").mean()
display(df_after1980_resampled.head())



In [23]:

    
df_after1980_resampled = df_after1980_resampled.reset_index()
df_after1980_resampled.head()

4.5 And of course plot it with a line plot



In [24]:

    
#Let's visualize it
plt.figure(figsize=(6,4))
plt.plot(df_after1980["Date"],df_after1980["Rate"],label="Before resampling")
plt.plot(df_after1980_resampled["Date"],df_after1980_resampled["Rate"],label="After resampling")
plt.xlabel("Time")
plt.ylabel("Rate")
plt.legend()
plt.show()

	Date	Rate
121	1980-06-01	85.0
122	1980-07-01	85.5
123	1980-08-01	85.8
124	1980-09-01	85.8
125	1980-10-01	86.2

	Rate
Date
1980-12-31	86.028571
1981-12-31	91.233333
1982-12-31	86.441667
1983-12-31	79.766667
1984-12-31	81.658333

	Date	Rate	Year	Month
0	1970-12-31	99.375000	1970.0	8.5
1	1971-12-31	99.741667	1971.0	6.5
2	1972-12-31	99.225000	1972.0	6.5
3	1973-12-31	111.208333	1973.0	6.5
4	1974-12-31	113.883333	1974.0	6.5

	Year imposed	Year ended	Principal sendera	Target country	Policy goal	Success scoreb (scale 1 to 16)	Cost to target (percent of GNP)
0	1914	1918	United Kingdom	Germany	Military victory	12	7.1
1	1917	1918	United States	Japan	Shipping for Allies	4	0.8
2	1918	1920	United Kingdom	Russia	Destabilize Bolsheviks	2	4.1
3	1921	1921	League of Nations	Yugoslavia	Military disruption vs. Albania	16	–
4	1925	1925	League of Nations	Greece	Withdraw from Bulgaria	16	–
5	1932	1935	League of Nations	Paraguay	Settle the Chaco War	6	negligible
6	1932	1935	League of Nations	Bolivia	Settle the Chaco War	6	2.6
7	1933	1933	United Kingdom	USSR	Release British citizens	12	0.02
8	1935	1936	League of Nations	Italy	Withdraw from Abyssinia	2	1.7
9	1938	1947	United States	Mexico	Expropriation dispute	9	0.2

	Year imposed	Year ended	Principal sender	Target country	Policy goal	Success score (scale 1 to 16)	Cost to target (percent of GNP)	Duration
0	1914	1918.0	United Kingdom	Germany	Military victory	12	7.1	4.0
1	1917	1918.0	United States	Japan	Shipping for Allies	4	0.8	1.0
2	1918	1920.0	United Kingdom	Russia	Destabilize Bolsheviks	2	4.1	2.0
3	1921	1921.0	League of Nations	Yugoslavia	Military disruption vs. Albania	16	NaN	0.0
4	1925	1925.0	League of Nations	Greece	Withdraw from Bulgaria	16	NaN	0.0

	Month	Exchange Rate TWI. May 1970 ? Aug 1995.
0	1970-05-01	100.0
1	1970-06-01	99.6
2	1970-07-01	99.4
3	1970-08-01	99.1
4	1970-09-01	99.2

	Date	Rate	Year	Month
0	1970-05-01	100.0	1970	5
1	1970-06-01	99.6	1970	6
2	1970-07-01	99.4	1970	7
3	1970-08-01	99.1	1970	8
4	1970-09-01	99.2	1970	9