Regression Modeling in Practice

Assignment: Writing about your data

Here is my first assignment for the Regression Modeling in Practice online course.

I decided to use a Jupyter Notebook, as it is a convenient way to write code and present results together.

Research question

Using the Gapminder database, I would like to see whether an increase in Internet usage is associated with an increase in the suicide rate. A study shows that other factors, such as unemployment, could also have a strong impact.

For this assignment, the following three variables will be analyzed:

  • Internet Usage Rate (per 100 people)
  • Suicide Rate (per 100 000 people)
  • Employment Rate (% of the population of age 15+)
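As a minimal sketch of how these three variables might be pulled out in the notebook, the snippet below assumes a CSV laid out with one row per country and columns named `country`, `internetuserate`, `suicideper100th` and `employrate` (these column names are an assumption, not something stated above). The rows are made-up placeholder values, not real Gapminder figures; an empty cell stands for "no data".

```python
import io

import pandas as pd

# Hypothetical CSV excerpt: column names are assumed, values are placeholders only.
CSV_TEXT = """country,internetuserate,suicideper100th,employrate
Country A,81.0,12.5,58.2
Country B,,7.1,63.4
Country C,25.3,,49.9
"""

# Read everything as text first: in some raw exports, missing values are
# stored as blanks or spaces, which would otherwise spoil the numeric columns.
data = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)

VARIABLES = ["internetuserate", "suicideper100th", "employrate"]

# Convert the three variables to floats; anything that cannot be parsed
# as a number becomes NaN.
for column in VARIABLES:
    data[column] = pd.to_numeric(data[column], errors="coerce")

subset = data[["country"] + VARIABLES]
print(subset.dtypes)
```

Using `errors="coerce"` means that unparseable entries silently become NaN, so it is worth counting missing values afterwards rather than assuming every country has data.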

As the Gapminder database aggregates data from different sources, I will describe the data for each variable separately.

About my data

Sample

The sample comes from the Gapminder database (http://www.gapminder.org/). Gapminder is a non-profit organization founded in Stockholm to promote sustainable global development and the achievement of the United Nations Millennium Development Goals.

The data are gathered for 215 areas: the 192 UN member states, with Serbia and Montenegro aggregated into a single area, plus 24 other areas. However, not all indicators are available for all countries.

Although the database provides yearly time series for all indicators, the sample used for this course contains a single year for each indicator:

  • Internet Usage Rate (per 100 people): data used are for 2010
  • Suicide per 100 000 people: data used are for 2008
  • Employment rate for people of age 15+: data used are for 2007

Procedure

The three indicators are collected by different organizations:

  • Internet Usage Rate (per 100 people):
    • Data from the World Bank (http://databank.worldbank.org/data/home.aspx)
    • The data are computed as the weighted average of different sources: International Telecommunication Union, World Telecommunication/ICT Development Report and database, and World Bank estimates.
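  • Suicide per 100 000 people:
    • Data from the World Health Organization (WHO), combining data from WHO Violence and Injury Prevention (VIP) and from the WHO Global Burden of Disease.
  • Employment rate for people of age 15+: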
    • Data from the International Labour Organization (ILO) (http://www.ilo.org/emppolicy/lang--en/index.htm)
    • Since 1999, the ILO has published 18 key indicators of the labour market every two years, including the employment rate.
    • The data were collected by different methods depending on the country; the method used for each country is listed at http://www.ilo.org/ilostat. These methods are: population census, official estimate, administrative records, population register, household surveys, labour force survey, household income/expenditure survey, or other household survey.
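The Internet usage figure is described above as a weighted average of several sources, but the weights themselves are not given. Purely to illustrate what such a combination means, the sketch below combines hypothetical estimates from three unnamed sources using hypothetical weights; none of these numbers come from the World Bank.

```python
def weighted_average(estimates, weights):
    """Return sum(w_i * x_i) / sum(w_i) for matching lists of estimates and weights."""
    if len(estimates) != len(weights):
        raise ValueError("estimates and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(x * w for x, w in zip(estimates, weights)) / total_weight


# Hypothetical Internet users per 100 people reported by three sources for
# one country, with hypothetical weights expressing how much each is trusted.
estimates = [40.0, 44.0, 42.0]
weights = [0.5, 0.25, 0.25]

combined = weighted_average(estimates, weights)
print(f"Combined estimate: {combined:.1f} per 100 people")
```

Dividing by the sum of the weights means the weights do not have to add up to 1; only their relative sizes matter.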

Measures

All three indicators are built mainly from reports by the member states.

  • The explanatory variable Internet Usage Rate (per 100 people):
    • Definition: Internet users are defined as individuals who used the Internet in the last 12 months.
    • Scale: 0 to 100
    • Management: countries with no data provided are removed.
  • The response variable Suicide, age adjusted, per 100 000 people:
    • Definition: mortality due to self-inflicted injury per 100 000 standard population, age adjusted. The data combine WHO Violence and Injury Prevention (VIP) data and WHO Global Burden of Disease data.
    • Scale: 0 to 100 000
    • Management: countries with no data provided are removed.
  • Another explanatory variable Employment rate for people of age 15+:
    • Definition: percentage of the total population aged 15 and over that was employed during the given year.
    • Scale: 0% to 100%
    • Management: countries with no data provided are removed.
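The "Management" step, removing countries with no data, could look like the pandas sketch below. It assumes columns named `internetuserate`, `suicideper100th` and `employrate` (an assumption about the file layout) and uses made-up values, with NaN marking "no data". It shows two options: counting how many countries report each indicator separately, and listwise deletion, which keeps only countries that report all three.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only (not Gapminder data); NaN marks "no data".
data = pd.DataFrame(
    {
        "country": ["Country A", "Country B", "Country C", "Country D"],
        "internetuserate": [81.0, np.nan, 25.3, 60.2],
        "suicideper100th": [12.5, 7.1, np.nan, 9.8],
        "employrate": [58.2, 63.4, 49.9, 55.0],
    }
)

VARIABLES = ["internetuserate", "suicideper100th", "employrate"]

# Per-variable view: how many countries report each indicator.
available_per_variable = data[VARIABLES].notna().sum()
print(available_per_variable)

# Listwise deletion: keep only countries that report all three indicators,
# so that every model is fitted on the same set of countries.
complete = data.dropna(subset=VARIABLES).reset_index(drop=True)
print(complete["country"].tolist())
```

Listwise deletion gives one consistent sample for a regression that uses all three variables, at the cost of discarding more countries than removing missing values one variable at a time.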