Data exploration

We use the dataset found at https://github.com/mkcor/data-wrangling/blob/master/data/tidy_who.csv (see the notebook at the root of that repo for the generation of this dataset).



In [1]:

    
import pandas as pd



In [2]:

    
df = pd.read_csv('../data/tidy_who.csv')

Loading data

... just works. read_csv() comes with many convenient arguments, such as skiprows, nrows, na_values, etc. Note that, alternatively, we could have run df = pd.read_csv('https://raw.githubusercontent.com/mkcor/data-wrangling/master/data/tidy_who.csv').



In [3]:

    
df.head()









    Out[3]:







  
    
      
      country
      g_whoregion
      year
      cases
      type
      sex
      age_range
    
  
  
    
      0
      Afghanistan
      EMR
      1980
      NaN
      sp
      m
      14
    
    
      1
      Afghanistan
      EMR
      1981
      NaN
      sp
      m
      14
    
    
      2
      Afghanistan
      EMR
      1982
      NaN
      sp
      m
      14
    
    
      3
      Afghanistan
      EMR
      1983
      NaN
      sp
      m
      14
    
    
      4
      Afghanistan
      EMR
      1984
      NaN
      sp
      m
      14



In [4]:

    
df.shape









    Out[4]:





(429744, 7)



In [5]:

    
df.sample(10)









    Out[5]:







  
    
      
      country
      g_whoregion
      year
      cases
      type
      sex
      age_range
    
  
  
    
      179685
      India
      SEA
      1983
      NaN
      sn
      f
      2534
    
    
      291809
      Angola
      AFR
      1997
      NaN
      ep
      f
      3544
    
    
      5802
      Saint Vincent and the Grenadines
      AMR
      2005
      0.0
      sp
      m
      14
    
    
      66582
      Palau
      WPR
      2005
      NaN
      sp
      f
      1524
    
    
      71576
      France
      EUR
      1994
      NaN
      sp
      f
      2534
    
    
      122880
      Algeria
      AFR
      2004
      NaN
      sn
      m
      2534
    
    
      174358
      Republic of Korea
      WPR
      1985
      NaN
      sn
      f
      1524
    
    
      92852
      Benin
      AFR
      1988
      NaN
      sp
      f
      5564
    
    
      343542
      Sao Tome and Principe
      AFR
      1981
      NaN
      rel
      m
      2534
    
    
      214090
      Tunisia
      EMR
      1990
      NaN
      sn
      f
      65



In [6]:

    
df.describe()









    Out[6]:







  
    
      
      year
      cases
      age_range
    
  
  
    
      count
      429744.000000
      81381.000000
      429744.000000
    
    
      mean
      1997.571540
      667.482496
      2542.714286
    
    
      std
      10.407887
      4490.566875
      1990.957917
    
    
      min
      1980.000000
      0.000000
      14.000000
    
    
      25%
      1989.000000
      3.000000
      65.000000
    
    
      50%
      1998.000000
      28.000000
      2534.000000
    
    
      75%
      2007.000000
      200.000000
      4554.000000
    
    
      max
      2015.000000
      250051.000000
      5564.000000



In [7]:

    
df['g_whoregion'].unique()









    Out[7]:





array(['EMR', 'EUR', 'AFR', 'WPR', 'AMR', 'SEA'], dtype=object)



In [8]:

    
df['country'].nunique()









    Out[8]:





219

Selecting data

We are already familiar with column selection. The [ ] syntax is the most basic way of indexing.



In [9]:

    
df['country'].head(3)









    Out[9]:





0    Afghanistan
1    Afghanistan
2    Afghanistan
Name: country, dtype: object

Columns can also be accessed as attributes (as long as they have a valid Python name).



In [10]:

    
df.country[1000:1003]









    Out[10]:





1000    Brazil
1001    Brazil
1002    Brazil
Name: country, dtype: object

We can select elements of a DataFrame either by label (with the .loc attribute) or by position (with the .iloc attribute). Row and column indices take the usual order (first and second place, respectively).



In [ ]:

    
df.loc[0, 'country']



In [11]:

    
df.loc[df.shape[0] - 1, 'country']









    Out[11]:





'Zimbabwe'



In [12]:

    
df.iloc[0, 0]









    Out[12]:





'Afghanistan'



In [13]:

    
df.iloc[df.shape[0] - 1, 0]









    Out[13]:





'Zimbabwe'

Slicing works too.



In [14]:

    
df.loc[:5, 'country']









    Out[14]:





0    Afghanistan
1    Afghanistan
2    Afghanistan
3    Afghanistan
4    Afghanistan
5    Afghanistan
Name: country, dtype: object

Label indexing is more natural than positional indexing (think of a function call, where keyword arguments are easier to work with than positional arguments).



In [15]:

    
df.loc[:5, 'country':'type']









    Out[15]:







  
    
      
      country
      g_whoregion
      year
      cases
      type
    
  
  
    
      0
      Afghanistan
      EMR
      1980
      NaN
      sp
    
    
      1
      Afghanistan
      EMR
      1981
      NaN
      sp
    
    
      2
      Afghanistan
      EMR
      1982
      NaN
      sp
    
    
      3
      Afghanistan
      EMR
      1983
      NaN
      sp
    
    
      4
      Afghanistan
      EMR
      1984
      NaN
      sp
    
    
      5
      Afghanistan
      EMR
      1985
      NaN
      sp



In [16]:

    
df.iloc[:5, :5]









    Out[16]:







  
    
      
      country
      g_whoregion
      year
      cases
      type
    
  
  
    
      0
      Afghanistan
      EMR
      1980
      NaN
      sp
    
    
      1
      Afghanistan
      EMR
      1981
      NaN
      sp
    
    
      2
      Afghanistan
      EMR
      1982
      NaN
      sp
    
    
      3
      Afghanistan
      EMR
      1983
      NaN
      sp
    
    
      4
      Afghanistan
      EMR
      1984
      NaN
      sp

Often we want to select data based on certain conditions.



In [17]:

    
cond = df.year < 1981



In [18]:

    
df[cond].shape









    Out[18]:





(11872, 7)



In [19]:

    
df[cond & (df.country == 'Argentina') & (df.type == 'rel') & (df.sex == 'm')]









    Out[19]:







  
    
      
      country
      g_whoregion
      year
      cases
      type
      sex
      age_range
    
  
  
    
      322596
      Argentina
      AMR
      1980
      NaN
      rel
      m
      14
    
    
      330270
      Argentina
      AMR
      1980
      NaN
      rel
      m
      1524
    
    
      337944
      Argentina
      AMR
      1980
      NaN
      rel
      m
      2534
    
    
      345618
      Argentina
      AMR
      1980
      NaN
      rel
      m
      3544
    
    
      353292
      Argentina
      AMR
      1980
      NaN
      rel
      m
      4554
    
    
      360966
      Argentina
      AMR
      1980
      NaN
      rel
      m
      5564
    
    
      368640
      Argentina
      AMR
      1980
      NaN
      rel
      m
      65



In [20]:

    
gr_and_it = df.country.isin(['Greece', 'Italy'])



In [21]:

    
df[gr_and_it].tail()









    Out[21]:







  
    
      
      country
      g_whoregion
      year
      cases
      type
      sex
      age_range
    
  
  
    
      425497
      Italy
      EUR
      2011
      NaN
      rel
      f
      65
    
    
      425498
      Italy
      EUR
      2012
      NaN
      rel
      f
      65
    
    
      425499
      Italy
      EUR
      2013
      249.0
      rel
      f
      65
    
    
      425500
      Italy
      EUR
      2014
      NaN
      rel
      f
      65
    
    
      425501
      Italy
      EUR
      2015
      280.0
      rel
      f
      65

Subsets can be selected by callable functions (returning valid indexers). The following function performs a selection by label (along country and g_whoregion).



In [22]:

    
lambda x: ['country', 'g_whoregion']









    Out[22]:





<function __main__.<lambda>>

So it can serve as a column indexer.



In [23]:

    
df.loc[:3, lambda x: ['country', 'g_whoregion']]









    Out[23]:







  
    
      
      country
      g_whoregion
    
  
  
    
      0
      Afghanistan
      EMR
    
    
      1
      Afghanistan
      EMR
    
    
      2
      Afghanistan
      EMR
    
    
      3
      Afghanistan
      EMR

The following function filters for data where the number of cases is greater than 100,000.



In [24]:

    
lambda x: x.cases > 100000









    Out[24]:





<function __main__.<lambda>>

So it can serve as a row indexer.



In [25]:

    
great = df.loc[lambda x: x.cases > 100000, :]
great









    Out[25]:







  
    
      
      country
      g_whoregion
      year
      cases
      type
      sex
      age_range
    
  
  
    
      133665
      India
      SEA
      2007
      250051.0
      sn
      m
      3544
    
    
      187383
      India
      SEA
      2007
      148811.0
      sn
      f
      3544
    
    
      241101
      India
      SEA
      2007
      105825.0
      ep
      m
      3544
    
    
      294819
      India
      SEA
      2007
      101015.0
      ep
      f
      3544
    
    
      333196
      India
      SEA
      2014
      180319.0
      rel
      m
      1524
    
    
      333197
      India
      SEA
      2015
      186771.0
      rel
      m
      1524
    
    
      340870
      India
      SEA
      2014
      190483.0
      rel
      m
      2534
    
    
      340871
      India
      SEA
      2015
      197298.0
      rel
      m
      2534
    
    
      348544
      India
      SEA
      2014
      199850.0
      rel
      m
      3544
    
    
      348545
      India
      SEA
      2015
      207000.0
      rel
      m
      3544
    
    
      354519
      China
      WPR
      2013
      100297.0
      rel
      m
      4554
    
    
      354520
      China
      WPR
      2014
      102352.0
      rel
      m
      4554
    
    
      354521
      China
      WPR
      2015
      103685.0
      rel
      m
      4554
    
    
      356218
      India
      SEA
      2014
      188106.0
      rel
      m
      4554
    
    
      356219
      India
      SEA
      2015
      194837.0
      rel
      m
      4554
    
    
      362193
      China
      WPR
      2013
      112558.0
      rel
      m
      5564
    
    
      362194
      China
      WPR
      2014
      108902.0
      rel
      m
      5564
    
    
      362195
      China
      WPR
      2015
      105403.0
      rel
      m
      5564
    
    
      363892
      India
      SEA
      2014
      148933.0
      rel
      m
      5564
    
    
      363893
      India
      SEA
      2015
      154262.0
      rel
      m
      5564
    
    
      369867
      China
      WPR
      2013
      124476.0
      rel
      m
      65
    
    
      369868
      China
      WPR
      2014
      123436.0
      rel
      m
      65
    
    
      369869
      China
      WPR
      2015
      125699.0
      rel
      m
      65
    
    
      371566
      India
      SEA
      2014
      102929.0
      rel
      m
      65
    
    
      371567
      India
      SEA
      2015
      106612.0
      rel
      m
      65
    
    
      386914
      India
      SEA
      2014
      149671.0
      rel
      f
      1524
    
    
      386915
      India
      SEA
      2015
      155026.0
      rel
      f
      1524
    
    
      394588
      India
      SEA
      2014
      127605.0
      rel
      f
      2534
    
    
      394589
      India
      SEA
      2015
      132171.0
      rel
      f
      2534



In [26]:

    
df.cases.loc[lambda x: x > 100000]









    Out[26]:





133665    250051.0
187383    148811.0
241101    105825.0
294819    101015.0
333196    180319.0
333197    186771.0
340870    190483.0
340871    197298.0
348544    199850.0
348545    207000.0
354519    100297.0
354520    102352.0
354521    103685.0
356218    188106.0
356219    194837.0
362193    112558.0
362194    108902.0
362195    105403.0
363892    148933.0
363893    154262.0
369867    124476.0
369868    123436.0
369869    125699.0
371566    102929.0
371567    106612.0
386914    149671.0
386915    155026.0
394588    127605.0
394589    132171.0
Name: cases, dtype: float64

We may want to select or mask data while preserving the original shape.



In [27]:

    
great.where(great.country == 'India')









    Out[27]:







  
    
      
      country
      g_whoregion
      year
      cases
      type
      sex
      age_range
    
  
  
    
      133665
      India
      SEA
      2007.0
      250051.0
      sn
      m
      3544.0
    
    
      187383
      India
      SEA
      2007.0
      148811.0
      sn
      f
      3544.0
    
    
      241101
      India
      SEA
      2007.0
      105825.0
      ep
      m
      3544.0
    
    
      294819
      India
      SEA
      2007.0
      101015.0
      ep
      f
      3544.0
    
    
      333196
      India
      SEA
      2014.0
      180319.0
      rel
      m
      1524.0
    
    
      333197
      India
      SEA
      2015.0
      186771.0
      rel
      m
      1524.0
    
    
      340870
      India
      SEA
      2014.0
      190483.0
      rel
      m
      2534.0
    
    
      340871
      India
      SEA
      2015.0
      197298.0
      rel
      m
      2534.0
    
    
      348544
      India
      SEA
      2014.0
      199850.0
      rel
      m
      3544.0
    
    
      348545
      India
      SEA
      2015.0
      207000.0
      rel
      m
      3544.0
    
    
      354519
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      354520
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      354521
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      356218
      India
      SEA
      2014.0
      188106.0
      rel
      m
      4554.0
    
    
      356219
      India
      SEA
      2015.0
      194837.0
      rel
      m
      4554.0
    
    
      362193
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      362194
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      362195
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      363892
      India
      SEA
      2014.0
      148933.0
      rel
      m
      5564.0
    
    
      363893
      India
      SEA
      2015.0
      154262.0
      rel
      m
      5564.0
    
    
      369867
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      369868
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      369869
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      371566
      India
      SEA
      2014.0
      102929.0
      rel
      m
      65.0
    
    
      371567
      India
      SEA
      2015.0
      106612.0
      rel
      m
      65.0
    
    
      386914
      India
      SEA
      2014.0
      149671.0
      rel
      f
      1524.0
    
    
      386915
      India
      SEA
      2015.0
      155026.0
      rel
      f
      1524.0
    
    
      394588
      India
      SEA
      2014.0
      127605.0
      rel
      f
      2534.0
    
    
      394589
      India
      SEA
      2015.0
      132171.0
      rel
      f
      2534.0



In [28]:

    
great.mask(great.country == 'India')









    Out[28]:







  
    
      
      country
      g_whoregion
      year
      cases
      type
      sex
      age_range
    
  
  
    
      133665
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      187383
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      241101
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      294819
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      333196
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      333197
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      340870
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      340871
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      348544
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      348545
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      354519
      China
      WPR
      2013.0
      100297.0
      rel
      m
      4554.0
    
    
      354520
      China
      WPR
      2014.0
      102352.0
      rel
      m
      4554.0
    
    
      354521
      China
      WPR
      2015.0
      103685.0
      rel
      m
      4554.0
    
    
      356218
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      356219
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      362193
      China
      WPR
      2013.0
      112558.0
      rel
      m
      5564.0
    
    
      362194
      China
      WPR
      2014.0
      108902.0
      rel
      m
      5564.0
    
    
      362195
      China
      WPR
      2015.0
      105403.0
      rel
      m
      5564.0
    
    
      363892
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      363893
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      369867
      China
      WPR
      2013.0
      124476.0
      rel
      m
      65.0
    
    
      369868
      China
      WPR
      2014.0
      123436.0
      rel
      m
      65.0
    
    
      369869
      China
      WPR
      2015.0
      125699.0
      rel
      m
      65.0
    
    
      371566
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      371567
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      386914
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      386915
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      394588
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      394589
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN

Hands-on exercises

Select the rows of df where country is Greece, age is at most 24, and year is 2000. Name it df1.
Write the df1 DataFrame to a CSV file located in the data/ subdirectory. (Hint: The method is named to_csv.)
Read this CSV file into a DataFrame named df2. What do you notice about the index? (Feel free to fire up a Terminal and look at the CSV file.)



In [29]:

    
df1 = df[(df.country == 'Greece') & (df.year == 2000) & (df.age_range.isin([14, 1524]))]



In [30]:

    
df1.to_csv('../data/df1.csv')



In [31]:

    
df2 = pd.read_csv('../data/df1.csv')

Indexing



In [32]:

    
df1.index









    Out[32]:





Int64Index([  2768,  10442,  56486,  64160, 110204, 117878, 163922, 171596,
            217640, 225314, 271358, 279032, 325076, 332750, 378794, 386468],
           dtype='int64')



In [33]:

    
df2.index









    Out[33]:





RangeIndex(start=0, stop=16, step=1)

We could specify that the first (unnamed) column should be used as the index (row labels).



In [34]:

    
pd.read_csv('../data/df1.csv', index_col=0)









    Out[34]:







  
    
      
      country
      g_whoregion
      year
      cases
      type
      sex
      age_range
    
  
  
    
      2768
      Greece
      EUR
      2000
      1.0
      sp
      m
      14
    
    
      10442
      Greece
      EUR
      2000
      10.0
      sp
      m
      1524
    
    
      56486
      Greece
      EUR
      2000
      0.0
      sp
      f
      14
    
    
      64160
      Greece
      EUR
      2000
      2.0
      sp
      f
      1524
    
    
      110204
      Greece
      EUR
      2000
      NaN
      sn
      m
      14
    
    
      117878
      Greece
      EUR
      2000
      NaN
      sn
      m
      1524
    
    
      163922
      Greece
      EUR
      2000
      NaN
      sn
      f
      14
    
    
      171596
      Greece
      EUR
      2000
      NaN
      sn
      f
      1524
    
    
      217640
      Greece
      EUR
      2000
      NaN
      ep
      m
      14
    
    
      225314
      Greece
      EUR
      2000
      NaN
      ep
      m
      1524
    
    
      271358
      Greece
      EUR
      2000
      NaN
      ep
      f
      14
    
    
      279032
      Greece
      EUR
      2000
      NaN
      ep
      f
      1524
    
    
      325076
      Greece
      EUR
      2000
      NaN
      rel
      m
      14
    
    
      332750
      Greece
      EUR
      2000
      NaN
      rel
      m
      1524
    
    
      378794
      Greece
      EUR
      2000
      NaN
      rel
      f
      14
    
    
      386468
      Greece
      EUR
      2000
      NaN
      rel
      f
      1524

Remember we learnt set_index() in the previous section? We also have reset_index() at our disposal.



In [35]:

    
df1.reset_index()









    Out[35]:







  
    
      
      index
      country
      g_whoregion
      year
      cases
      type
      sex
      age_range
    
  
  
    
      0
      2768
      Greece
      EUR
      2000
      1.0
      sp
      m
      14
    
    
      1
      10442
      Greece
      EUR
      2000
      10.0
      sp
      m
      1524
    
    
      2
      56486
      Greece
      EUR
      2000
      0.0
      sp
      f
      14
    
    
      3
      64160
      Greece
      EUR
      2000
      2.0
      sp
      f
      1524
    
    
      4
      110204
      Greece
      EUR
      2000
      NaN
      sn
      m
      14
    
    
      5
      117878
      Greece
      EUR
      2000
      NaN
      sn
      m
      1524
    
    
      6
      163922
      Greece
      EUR
      2000
      NaN
      sn
      f
      14
    
    
      7
      171596
      Greece
      EUR
      2000
      NaN
      sn
      f
      1524
    
    
      8
      217640
      Greece
      EUR
      2000
      NaN
      ep
      m
      14
    
    
      9
      225314
      Greece
      EUR
      2000
      NaN
      ep
      m
      1524
    
    
      10
      271358
      Greece
      EUR
      2000
      NaN
      ep
      f
      14
    
    
      11
      279032
      Greece
      EUR
      2000
      NaN
      ep
      f
      1524
    
    
      12
      325076
      Greece
      EUR
      2000
      NaN
      rel
      m
      14
    
    
      13
      332750
      Greece
      EUR
      2000
      NaN
      rel
      m
      1524
    
    
      14
      378794
      Greece
      EUR
      2000
      NaN
      rel
      f
      14
    
    
      15
      386468
      Greece
      EUR
      2000
      NaN
      rel
      f
      1524

And we are back to a default index for this DataFrame. The original index is stored in its own column.



In [36]:

    
df1.reset_index().index









    Out[36]:





RangeIndex(start=0, stop=16, step=1)

Hands-on exercises

Write the df1 DataFrame with a default index to another CSV file.
Read this other CSV file into a DataFrame, setting its index to be the original index.

Reference

Current limitations with the feather format: https://github.com/wesm/feather/tree/master/python#limitations

	country	g_whoregion	year	cases	type	sex	age_range
0	Afghanistan	EMR	1980	NaN	sp	m	14
1	Afghanistan	EMR	1981	NaN	sp	m	14
2	Afghanistan	EMR	1982	NaN	sp	m	14
3	Afghanistan	EMR	1983	NaN	sp	m	14
4	Afghanistan	EMR	1984	NaN	sp	m	14

	country	g_whoregion	year	cases	type	sex	age_range
179685	India	SEA	1983	NaN	sn	f	2534
291809	Angola	AFR	1997	NaN	ep	f	3544
5802	Saint Vincent and the Grenadines	AMR	2005	0.0	sp	m	14
66582	Palau	WPR	2005	NaN	sp	f	1524
71576	France	EUR	1994	NaN	sp	f	2534
122880	Algeria	AFR	2004	NaN	sn	m	2534
174358	Republic of Korea	WPR	1985	NaN	sn	f	1524
92852	Benin	AFR	1988	NaN	sp	f	5564
343542	Sao Tome and Principe	AFR	1981	NaN	rel	m	2534
214090	Tunisia	EMR	1990	NaN	sn	f	65

	year	cases	age_range
count	429744.000000	81381.000000	429744.000000
mean	1997.571540	667.482496	2542.714286
std	10.407887	4490.566875	1990.957917
min	1980.000000	0.000000	14.000000
25%	1989.000000	3.000000	65.000000
50%	1998.000000	28.000000	2534.000000
75%	2007.000000	200.000000	4554.000000
max	2015.000000	250051.000000	5564.000000

	country	g_whoregion	year	cases	type	sex	age_range
322596	Argentina	AMR	1980	NaN	rel	m	14
330270	Argentina	AMR	1980	NaN	rel	m	1524
337944	Argentina	AMR	1980	NaN	rel	m	2534
345618	Argentina	AMR	1980	NaN	rel	m	3544
353292	Argentina	AMR	1980	NaN	rel	m	4554
360966	Argentina	AMR	1980	NaN	rel	m	5564
368640	Argentina	AMR	1980	NaN	rel	m	65

	country	g_whoregion	year	cases	type	sex	age_range
425497	Italy	EUR	2011	NaN	rel	f	65
425498	Italy	EUR	2012	NaN	rel	f	65
425499	Italy	EUR	2013	249.0	rel	f	65
425500	Italy	EUR	2014	NaN	rel	f	65
425501	Italy	EUR	2015	280.0	rel	f	65

	country	g_whoregion	year	cases	type	sex	age_range
133665	India	SEA	2007	250051.0	sn	m	3544
187383	India	SEA	2007	148811.0	sn	f	3544
241101	India	SEA	2007	105825.0	ep	m	3544
294819	India	SEA	2007	101015.0	ep	f	3544
333196	India	SEA	2014	180319.0	rel	m	1524
333197	India	SEA	2015	186771.0	rel	m	1524
340870	India	SEA	2014	190483.0	rel	m	2534
340871	India	SEA	2015	197298.0	rel	m	2534
348544	India	SEA	2014	199850.0	rel	m	3544
348545	India	SEA	2015	207000.0	rel	m	3544
354519	China	WPR	2013	100297.0	rel	m	4554
354520	China	WPR	2014	102352.0	rel	m	4554
354521	China	WPR	2015	103685.0	rel	m	4554
356218	India	SEA	2014	188106.0	rel	m	4554
356219	India	SEA	2015	194837.0	rel	m	4554
362193	China	WPR	2013	112558.0	rel	m	5564
362194	China	WPR	2014	108902.0	rel	m	5564
362195	China	WPR	2015	105403.0	rel	m	5564
363892	India	SEA	2014	148933.0	rel	m	5564
363893	India	SEA	2015	154262.0	rel	m	5564
369867	China	WPR	2013	124476.0	rel	m	65
369868	China	WPR	2014	123436.0	rel	m	65
369869	China	WPR	2015	125699.0	rel	m	65
371566	India	SEA	2014	102929.0	rel	m	65
371567	India	SEA	2015	106612.0	rel	m	65
386914	India	SEA	2014	149671.0	rel	f	1524
386915	India	SEA	2015	155026.0	rel	f	1524
394588	India	SEA	2014	127605.0	rel	f	2534
394589	India	SEA	2015	132171.0	rel	f	2534

	country	g_whoregion	year	cases	type	sex	age_range
133665	India	SEA	2007.0	250051.0	sn	m	3544.0
187383	India	SEA	2007.0	148811.0	sn	f	3544.0
241101	India	SEA	2007.0	105825.0	ep	m	3544.0
294819	India	SEA	2007.0	101015.0	ep	f	3544.0
333196	India	SEA	2014.0	180319.0	rel	m	1524.0
333197	India	SEA	2015.0	186771.0	rel	m	1524.0
340870	India	SEA	2014.0	190483.0	rel	m	2534.0
340871	India	SEA	2015.0	197298.0	rel	m	2534.0
348544	India	SEA	2014.0	199850.0	rel	m	3544.0
348545	India	SEA	2015.0	207000.0	rel	m	3544.0
354519	NaN	NaN	NaN	NaN	NaN	NaN	NaN
354520	NaN	NaN	NaN	NaN	NaN	NaN	NaN
354521	NaN	NaN	NaN	NaN	NaN	NaN	NaN
356218	India	SEA	2014.0	188106.0	rel	m	4554.0
356219	India	SEA	2015.0	194837.0	rel	m	4554.0
362193	NaN	NaN	NaN	NaN	NaN	NaN	NaN
362194	NaN	NaN	NaN	NaN	NaN	NaN	NaN
362195	NaN	NaN	NaN	NaN	NaN	NaN	NaN
363892	India	SEA	2014.0	148933.0	rel	m	5564.0
363893	India	SEA	2015.0	154262.0	rel	m	5564.0
369867	NaN	NaN	NaN	NaN	NaN	NaN	NaN
369868	NaN	NaN	NaN	NaN	NaN	NaN	NaN
369869	NaN	NaN	NaN	NaN	NaN	NaN	NaN
371566	India	SEA	2014.0	102929.0	rel	m	65.0
371567	India	SEA	2015.0	106612.0	rel	m	65.0
386914	India	SEA	2014.0	149671.0	rel	f	1524.0
386915	India	SEA	2015.0	155026.0	rel	f	1524.0
394588	India	SEA	2014.0	127605.0	rel	f	2534.0
394589	India	SEA	2015.0	132171.0	rel	f	2534.0

	country	g_whoregion	year	cases	type	sex	age_range
2768	Greece	EUR	2000	1.0	sp	m	14
10442	Greece	EUR	2000	10.0	sp	m	1524
56486	Greece	EUR	2000	0.0	sp	f	14
64160	Greece	EUR	2000	2.0	sp	f	1524
110204	Greece	EUR	2000	NaN	sn	m	14
117878	Greece	EUR	2000	NaN	sn	m	1524
163922	Greece	EUR	2000	NaN	sn	f	14
171596	Greece	EUR	2000	NaN	sn	f	1524
217640	Greece	EUR	2000	NaN	ep	m	14
225314	Greece	EUR	2000	NaN	ep	m	1524
271358	Greece	EUR	2000	NaN	ep	f	14
279032	Greece	EUR	2000	NaN	ep	f	1524
325076	Greece	EUR	2000	NaN	rel	m	14
332750	Greece	EUR	2000	NaN	rel	m	1524
378794	Greece	EUR	2000	NaN	rel	f	14
386468	Greece	EUR	2000	NaN	rel	f	1524