Homework 8: Dataset 1: Water


In [12]:
import pandas as pd
import matplotlib.pyplot as plt
% matplotlib inline

In [13]:
df = pd.read_csv('water_data_class.csv', encoding='latin-1')

In [14]:
df


Out[14]:
country countrycode population total renewable freshwater resources [billion cubic meters] 2002 total renewable freshwater resources [billion cubic meters] 2014 annual freshwater withdrawl [%] 2002 annual freshwater withdrawl [%] 2014
0 Aruba ABW 103441 NaN NaN NaN 0.000000
1 Andorra AND 72786 0.3156 0.3156 NaN 0.000000
2 Afghanistan AFG 31627506 47.1500 47.1500 43.011665 43.011665
3 Angola AGO 24227524 148.0000 148.0000 0.432770 0.476892
4 Albania ALB 2894475 26.9000 26.9000 6.832714 4.873606
5 Arab World ARB 385272539 109.5810 113.5810 NaN 215.861192
6 United Arab Emirates ARE 9086139 0.1500 0.1500 1936.000000 2665.333333
7 Argentina ARG 42980026 292.0000 292.0000 NaN 12.938356
8 Armenia ARM 3006154 6.8590 6.8590 25.266074 42.877971
9 American Samoa ASM 55434 NaN NaN NaN 0.000000
10 Antigua and Barbuda ATG 90900 0.0520 0.0520 NaN 22.115385
11 Australia AUS 23470118 492.0000 492.0000 4.410569 4.014228
12 Austria AUT 8545908 55.0000 55.0000 6.649091 6.649091
13 Azerbaijan AZE 9535079 8.1150 8.1150 124.091189 147.504621
14 Burundi BDI 10816860 10.0600 10.0600 2.862823 2.862823
15 Belgium BEL 11231213 12.0000 12.0000 56.150000 51.466667
16 Benin BEN 10598482 10.3000 10.3000 1.262136 1.262136
17 Burkina Faso BFA 17589198 12.5000 12.5000 5.725600 6.544000
18 Bangladesh BGD 159077513 105.0000 105.0000 NaN 34.161905
19 Bulgaria BGR 7223938 21.0000 21.0000 27.095238 29.138095
20 Bahrain BHR 1361930 0.0040 0.0040 NaN 8935.000000
21 Bahamas, The BHS 383054 0.7000 0.7000 NaN NaN
22 Bosnia and Herzegovina BIH 3817554 35.5000 35.5000 NaN 0.926761
23 Belarus BLR 9470000 34.0000 34.0000 5.402941 4.452941
24 Belize BLZ 351706 15.2600 15.2600 0.661861 0.661861
25 Bermuda BMU 65181 NaN NaN NaN NaN
26 Bolivia BOL 10561887 303.5000 303.5000 NaN 0.687974
27 Brazil BRA 206077898 5661.0000 5661.0000 NaN 1.321851
28 Barbados BRB 283380 0.0800 0.0800 NaN 101.250000
29 Brunei Darussalam BRN 417394 8.5000 8.5000 NaN 1.082353
... ... ... ... ... ... ... ...
218 Togo TGO 7115163 11.5000 11.5000 1.469565 1.469565
219 Thailand THA 67725979 224.5000 224.5000 0.000000 25.527840
220 Tajikistan TJK 8295840 63.4600 63.4600 18.846517 18.105893
221 Turkmenistan TKM 5307188 1.4050 1.4050 1772.953737 1989.323843
222 Timor-Leste TLS 1212107 8.2150 8.2150 0.000000 14.266586
223 Tonga TON 105586 NaN NaN 0.000000 0.000000
224 Trinidad and Tobago TTO 1354483 3.8400 3.8400 8.177083 9.979167
225 Tunisia TUN 10996600 4.1950 4.1950 67.938021 78.784267
226 Turkey TUR 75932348 227.0000 227.0000 18.502203 17.665198
227 Tuvalu TUV 9893 NaN NaN NaN NaN
228 Tanzania TZA 51822621 84.0000 84.0000 6.171429 6.171429
229 Uganda UGA 37782971 39.0000 39.0000 0.813846 1.633333
230 Ukraine UKR 45362900 55.1000 55.1000 33.176044 26.950998
231 Upper middle income UMC 2360817817 15552.8680 15561.2750 NaN 7.389323
232 Uruguay URY 3419516 92.2000 92.2000 3.969631 3.969631
233 United States USA 318857056 2818.0000 2818.0000 16.799148 16.976579
234 Uzbekistan UZB 30757700 16.3400 16.3400 370.746634 342.717258
235 St. Vincent and the Grenadines VCT 109360 0.1000 0.1000 NaN 8.500000
236 Venezuela, RB VEN 30693827 805.0000 805.0000 1.125963 2.811180
237 Virgin Islands (U.S.) VIR 104170 NaN NaN NaN NaN
238 Vietnam VNM 90728900 359.4000 359.4000 NaN 22.824151
239 Vanuatu VUT 258883 NaN NaN NaN NaN
240 West Bank and Gaza PSE 4294682 0.8120 0.8120 34.359606 51.477833
241 World WLD 7259691769 43378.3481 42800.7551 NaN 9.139196
242 Samoa WSM 191845 NaN NaN NaN NaN
243 Yemen, Rep. YEM 26183676 2.1000 2.1000 161.904762 169.761905
244 South Africa ZAF 54001953 44.8000 44.8000 27.901786 27.901786
245 Congo, Dem. Rep. COD 74877030 900.0000 900.0000 0.064900 0.075956
246 Zambia ZMB 15721343 80.2000 80.2000 1.960100 1.960100
247 Zimbabwe ZWE 15245855 12.2600 12.2600 34.298532 34.298532

248 rows × 7 columns


In [15]:
# only countries with more than 25 million inhabitants and those who have values in every column (2002 that one with most non-values)
big_ones = df[df['population'] > 25000000]
notnull = df[df['annual freshwater withdrawl [%] 2002'].notnull()]

selection = df[(df['population'] > 25000000) & (df['annual freshwater withdrawl [%] 2002'].notnull()) & (df['annual freshwater withdrawl [%] 2002'] != 0)]
selection


Out[15]:
country countrycode population total renewable freshwater resources [billion cubic meters] 2002 total renewable freshwater resources [billion cubic meters] 2014 annual freshwater withdrawl [%] 2002 annual freshwater withdrawl [%] 2014
2 Afghanistan AFG 31627506 47.15 47.15 43.011665 43.011665
42 Colombia COL 47791393 2145.00 2145.00 0.361119 0.548718
52 Germany DEU 80970732 107.00 107.00 36.588785 30.878505
57 Algeria DZA 38934334 11.25 11.25 50.871111 74.888889
63 Egypt, Arab Rep. EGY 89579670 1.80 1.80 3794.444444 3794.444444
66 Spain ESP 46476032 111.20 111.20 32.320144 30.161871
68 Ethiopia ETH 96958732 122.00 122.00 4.555738 4.555738
73 France FRA 66217509 200.00 200.00 16.190000 16.555000
77 United Kingdom GBR 64559135 145.00 145.00 10.751724 7.468966
79 Ghana GHA 26786598 30.30 30.30 3.240924 3.240924
97 Indonesia IDN 254454778 2019.00 2019.00 5.611689 5.611689
99 India IND 1295291543 1446.00 1446.00 42.213001 52.627939
102 Iran, Islamic Rep. IRN 78143644 128.50 128.50 69.805447 72.607004
103 Iraq IRQ 34812326 35.20 35.20 187.500000 187.500000
106 Italy ITA 60789140 182.50 182.50 24.882192 29.452055
109 Japan JPN 127131800 430.00 430.00 19.686047 18.941860
116 Korea, Rep. KOR 50423955 64.85 64.85 44.965305 45.026985
138 Morocco MAR 33921203 29.00 29.00 51.103448 35.965517
144 Mexico MEX 125385833 409.00 409.00 17.750611 19.633252
150 Myanmar MMR 53437159 1003.00 1003.00 3.313061 3.313061
155 Mozambique MOZ 27216276 100.30 100.30 0.881555 0.881555
159 Malaysia MYS 29901997 580.00 580.00 1.604310 1.931034
164 Nigeria NGA 177475986 221.00 221.00 4.665158 5.932127
169 Nepal NPL 28174724 198.20 198.20 4.824420 4.791625
175 Pakistan PAK 185044286 55.00 55.00 313.818182 333.636364
181 Poland POL 38011735 53.60 53.60 21.884328 21.417910
190 Russian Federation RUS 143819569 4313.00 4313.00 1.534895 1.534895
226 Turkey TUR 75932348 227.00 227.00 18.502203 17.665198
228 Tanzania TZA 51822621 84.00 84.00 6.171429 6.171429
229 Uganda UGA 37782971 39.00 39.00 0.813846 1.633333
230 Ukraine UKR 45362900 55.10 55.10 33.176044 26.950998
233 United States USA 318857056 2818.00 2818.00 16.799148 16.976579
234 Uzbekistan UZB 30757700 16.34 16.34 370.746634 342.717258
236 Venezuela, RB VEN 30693827 805.00 805.00 1.125963 2.811180
243 Yemen, Rep. YEM 26183676 2.10 2.10 161.904762 169.761905
244 South Africa ZAF 54001953 44.80 44.80 27.901786 27.901786
245 Congo, Dem. Rep. COD 74877030 900.00 900.00 0.064900 0.075956

1) What is the average renewable freshwater resource?


In [16]:
selection['total renewable freshwater resources [billion cubic meters] 2014'].describe()


Out[16]:
count      37.000000
mean      518.383514
std       925.711693
min         1.800000
25%        47.150000
50%       122.000000
75%       430.000000
max      4313.000000
Name: total renewable freshwater resources [billion cubic meters] 2014, dtype: float64

2) What is the average withdrawl rate in 2014?


In [17]:
selection['annual freshwater withdrawl [%] 2014'].describe()
# @ TA values over 100% indicate that on top of renewable resources, water of non-renewable sources is withdrawn, 
# as described in the WorldBank metadata for the original datasets


Out[17]:
count      37.000000
mean      147.546627
std       621.611686
min         0.075956
25%         4.555738
50%        18.941860
75%        43.011665
max      3794.444444
Name: annual freshwater withdrawl [%] 2014, dtype: float64

3) Which are the 5 countries with most/least renewable freshwater (total, 2014)?


In [18]:
selection.sort_values('total renewable freshwater resources [billion cubic meters] 2014', ascending = True).head(5)


Out[18]:
country countrycode population total renewable freshwater resources [billion cubic meters] 2002 total renewable freshwater resources [billion cubic meters] 2014 annual freshwater withdrawl [%] 2002 annual freshwater withdrawl [%] 2014
63 Egypt, Arab Rep. EGY 89579670 1.80 1.80 3794.444444 3794.444444
243 Yemen, Rep. YEM 26183676 2.10 2.10 161.904762 169.761905
57 Algeria DZA 38934334 11.25 11.25 50.871111 74.888889
234 Uzbekistan UZB 30757700 16.34 16.34 370.746634 342.717258
138 Morocco MAR 33921203 29.00 29.00 51.103448 35.965517

In [19]:
selection.sort_values('total renewable freshwater resources [billion cubic meters] 2014', ascending = False).head(5)


Out[19]:
country countrycode population total renewable freshwater resources [billion cubic meters] 2002 total renewable freshwater resources [billion cubic meters] 2014 annual freshwater withdrawl [%] 2002 annual freshwater withdrawl [%] 2014
190 Russian Federation RUS 143819569 4313.0 4313.0 1.534895 1.534895
233 United States USA 318857056 2818.0 2818.0 16.799148 16.976579
42 Colombia COL 47791393 2145.0 2145.0 0.361119 0.548718
97 Indonesia IDN 254454778 2019.0 2019.0 5.611689 5.611689
99 India IND 1295291543 1446.0 1446.0 42.213001 52.627939

4) How much freshwater is there per 100,000 inhabitants for each country? TOP5 with most, TOP5 with least.


In [20]:
#inserting column with population converted to population in 100,000 inhabitants
population_hundret = selection['population']/100000
selection.insert(3,"population in 100,000 inhabitants", population_hundret)
selection


Out[20]:
country countrycode population population in 100,000 inhabitants total renewable freshwater resources [billion cubic meters] 2002 total renewable freshwater resources [billion cubic meters] 2014 annual freshwater withdrawl [%] 2002 annual freshwater withdrawl [%] 2014
2 Afghanistan AFG 31627506 316.27506 47.15 47.15 43.011665 43.011665
42 Colombia COL 47791393 477.91393 2145.00 2145.00 0.361119 0.548718
52 Germany DEU 80970732 809.70732 107.00 107.00 36.588785 30.878505
57 Algeria DZA 38934334 389.34334 11.25 11.25 50.871111 74.888889
63 Egypt, Arab Rep. EGY 89579670 895.79670 1.80 1.80 3794.444444 3794.444444
66 Spain ESP 46476032 464.76032 111.20 111.20 32.320144 30.161871
68 Ethiopia ETH 96958732 969.58732 122.00 122.00 4.555738 4.555738
73 France FRA 66217509 662.17509 200.00 200.00 16.190000 16.555000
77 United Kingdom GBR 64559135 645.59135 145.00 145.00 10.751724 7.468966
79 Ghana GHA 26786598 267.86598 30.30 30.30 3.240924 3.240924
97 Indonesia IDN 254454778 2544.54778 2019.00 2019.00 5.611689 5.611689
99 India IND 1295291543 12952.91543 1446.00 1446.00 42.213001 52.627939
102 Iran, Islamic Rep. IRN 78143644 781.43644 128.50 128.50 69.805447 72.607004
103 Iraq IRQ 34812326 348.12326 35.20 35.20 187.500000 187.500000
106 Italy ITA 60789140 607.89140 182.50 182.50 24.882192 29.452055
109 Japan JPN 127131800 1271.31800 430.00 430.00 19.686047 18.941860
116 Korea, Rep. KOR 50423955 504.23955 64.85 64.85 44.965305 45.026985
138 Morocco MAR 33921203 339.21203 29.00 29.00 51.103448 35.965517
144 Mexico MEX 125385833 1253.85833 409.00 409.00 17.750611 19.633252
150 Myanmar MMR 53437159 534.37159 1003.00 1003.00 3.313061 3.313061
155 Mozambique MOZ 27216276 272.16276 100.30 100.30 0.881555 0.881555
159 Malaysia MYS 29901997 299.01997 580.00 580.00 1.604310 1.931034
164 Nigeria NGA 177475986 1774.75986 221.00 221.00 4.665158 5.932127
169 Nepal NPL 28174724 281.74724 198.20 198.20 4.824420 4.791625
175 Pakistan PAK 185044286 1850.44286 55.00 55.00 313.818182 333.636364
181 Poland POL 38011735 380.11735 53.60 53.60 21.884328 21.417910
190 Russian Federation RUS 143819569 1438.19569 4313.00 4313.00 1.534895 1.534895
226 Turkey TUR 75932348 759.32348 227.00 227.00 18.502203 17.665198
228 Tanzania TZA 51822621 518.22621 84.00 84.00 6.171429 6.171429
229 Uganda UGA 37782971 377.82971 39.00 39.00 0.813846 1.633333
230 Ukraine UKR 45362900 453.62900 55.10 55.10 33.176044 26.950998
233 United States USA 318857056 3188.57056 2818.00 2818.00 16.799148 16.976579
234 Uzbekistan UZB 30757700 307.57700 16.34 16.34 370.746634 342.717258
236 Venezuela, RB VEN 30693827 306.93827 805.00 805.00 1.125963 2.811180
243 Yemen, Rep. YEM 26183676 261.83676 2.10 2.10 161.904762 169.761905
244 South Africa ZAF 54001953 540.01953 44.80 44.80 27.901786 27.901786
245 Congo, Dem. Rep. COD 74877030 748.77030 900.00 900.00 0.064900 0.075956

In [60]:
# putting water and population of 100,000 inhabitants into relation
water_per_hundret = (selection['total renewable freshwater resources [billion cubic meters] 2014']/population_hundret)*1000
selection.insert(6, "Renewable freshwater resource per 100,000 inhabitants [million cubic meters]", water_per_hundret)

In [61]:
# sorting for countries with most water
selection.sort_values(by='Renewable freshwater resource per 100,000 inhabitants [million cubic meters]', ascending = False).head(5)


Out[61]:
country countrycode population population in 100,000 inhabitants total renewable freshwater resources [billion cubic meters] 2002 total renewable freshwater resources [billion cubic meters] 2014 Renewable freshwater resource per 100,000 inhabitants [million cubic meters] annual freshwater withdrawl [%] 2002 annual freshwater withdrawl [%] 2014
42 Colombia COL 47791393 477.91393 2145.0 2145.0 4488.255866 0.361119 0.548718
190 Russian Federation RUS 143819569 1438.19569 4313.0 4313.0 2998.896485 1.534895 1.534895
236 Venezuela, RB VEN 30693827 306.93827 805.0 805.0 2622.677192 1.125963 2.811180
159 Malaysia MYS 29901997 299.01997 580.0 580.0 1939.669782 1.604310 1.931034
150 Myanmar MMR 53437159 534.37159 1003.0 1003.0 1876.971042 3.313061 3.313061

In [62]:
#plotting those with most
plt.style.use('ggplot')

#does not print sorted bars
#selection.sort_values(by='Renewable freshwater resource per 100,000 inhabitants [million cubic meters]', ascending = False).head(5).plot(kind='barh', x='country', y='total renewable freshwater resources [billion cubic meters] 2014', legend=False)

In [63]:
# sorting for countries with least water
selection.sort_values(by='Renewable freshwater resource per 100,000 inhabitants [million cubic meters]').head(5)


Out[63]:
country countrycode population population in 100,000 inhabitants total renewable freshwater resources [billion cubic meters] 2002 total renewable freshwater resources [billion cubic meters] 2014 Renewable freshwater resource per 100,000 inhabitants [million cubic meters] annual freshwater withdrawl [%] 2002 annual freshwater withdrawl [%] 2014
63 Egypt, Arab Rep. EGY 89579670 895.79670 1.80 1.80 2.009384 3794.444444 3794.444444
243 Yemen, Rep. YEM 26183676 261.83676 2.10 2.10 8.020264 161.904762 169.761905
57 Algeria DZA 38934334 389.34334 11.25 11.25 28.894805 50.871111 74.888889
175 Pakistan PAK 185044286 1850.44286 55.00 55.00 29.722615 313.818182 333.636364
234 Uzbekistan UZB 30757700 307.57700 16.34 16.34 53.124909 370.746634 342.717258

In [64]:
#plotting those with least
selection.sort_values(by='Renewable freshwater resource per 100,000 inhabitants [million cubic meters]').head(5).plot(kind="barh", x ="country", y ="Renewable freshwater resource per 100,000 inhabitants [million cubic meters]", legend=False)


Out[64]:
<matplotlib.axes._subplots.AxesSubplot at 0x81451b0>

5) Compared between 2002 and 2014, how did water withdrawl change?


In [65]:
selection.columns


Out[65]:
Index(['country', 'countrycode', 'population',
       'population in 100,000 inhabitants',
       'total renewable freshwater resources [billion cubic meters] 2002',
       'total renewable freshwater resources [billion cubic meters] 2014',
       'Renewable freshwater resource per 100,000 inhabitants [million cubic meters]',
       'annual freshwater withdrawl [%] 2002',
       'annual freshwater withdrawl [%] 2014'],
      dtype='object')

In [67]:
withdrawl = selection[['country','annual freshwater withdrawl [%] 2002','annual freshwater withdrawl [%] 2014']]
withdrawl_diff=selection['annual freshwater withdrawl [%] 2014'] - selection['annual freshwater withdrawl [%] 2002']
withdrawl.insert(3, 'Change in withdrawl rate', withdrawl_diff)
withdrawl.head()


Out[67]:
country annual freshwater withdrawl [%] 2002 annual freshwater withdrawl [%] 2014 Change in withdrawl rate
2 Afghanistan 43.011665 43.011665 0.000000
42 Colombia 0.361119 0.548718 0.187599
52 Germany 36.588785 30.878505 -5.710280
57 Algeria 50.871111 74.888889 24.017778
63 Egypt, Arab Rep. 3794.444444 3794.444444 0.000000

6) 5 with biggest increase in withdrawl, 5 with biggest decrease in withdrawl


In [70]:
withdrawl.sort_values(by='Change in withdrawl rate').head(5)


Out[70]:
country annual freshwater withdrawl [%] 2002 annual freshwater withdrawl [%] 2014 Change in withdrawl rate
234 Uzbekistan 370.746634 342.717258 -28.029376
138 Morocco 51.103448 35.965517 -15.137931
230 Ukraine 33.176044 26.950998 -6.225045
52 Germany 36.588785 30.878505 -5.710280
77 United Kingdom 10.751724 7.468966 -3.282759

In [71]:
withdrawl.sort_values(by='Change in withdrawl rate').tail(5)


Out[71]:
country annual freshwater withdrawl [%] 2002 annual freshwater withdrawl [%] 2014 Change in withdrawl rate
106 Italy 24.882192 29.452055 4.569863
243 Yemen, Rep. 161.904762 169.761905 7.857143
99 India 42.213001 52.627939 10.414938
175 Pakistan 313.818182 333.636364 19.818182
57 Algeria 50.871111 74.888889 24.017778

7) Scatter plot: freshwater vs population for 10 most populated vs 10 least populated


In [92]:
selection.sort_values(by='population in 100,000 inhabitants', ascending= False).head(10).plot(kind='scatter', x='population in 100,000 inhabitants', y='annual freshwater withdrawl [%] 2014')
#would be nice to put the country's name next to each dot, no idea how to do that unfortunately. Couldn't google useful things


Out[92]:
<matplotlib.axes._subplots.AxesSubplot at 0x953c030>

In [93]:
selection.sort_values(by='population in 100,000 inhabitants').head(10).plot(kind='scatter', x='population in 100,000 inhabitants', y='annual freshwater withdrawl [%] 2014')
#would be nice to put the country's name next to each dot, no idea how to do that unfortunately. Couldn't google useful things


Out[93]:
<matplotlib.axes._subplots.AxesSubplot at 0x82e7050>

8) Scatter plot: freshwater vs population for 10 with most water vs 10 with least water


In [99]:
#those with most water
selection.sort_values(by='total renewable freshwater resources [billion cubic meters] 2014', ascending=False).head(10).plot(kind='scatter', x='population in 100,000 inhabitants', y='annual freshwater withdrawl [%] 2014')


Out[99]:
<matplotlib.axes._subplots.AxesSubplot at 0x9cea8f0>

In [101]:
#those with least water
selection.sort_values(by='total renewable freshwater resources [billion cubic meters] 2014').head(10).plot(kind='scatter', x='population in 100,000 inhabitants', y='annual freshwater withdrawl [%] 2014')


Out[101]:
<matplotlib.axes._subplots.AxesSubplot at 0x8209c90>

9) Scatter plot: freshwater per capita vs freshwater withdrawl


In [108]:
selection.plot(kind='scatter', x='annual freshwater withdrawl [%] 2014', y='Renewable freshwater resource per 100,000 inhabitants [million cubic meters]', xlim=(0,80), ylim=(0,3500))


Out[108]:
<matplotlib.axes._subplots.AxesSubplot at 0xa2a79d0>

In [ ]: