Using internet speed and access to measure GDP per capita

Horace Fung and Shivam Saraogi

This notebook analyzes the relationship between internet speed and access to GDP per capita. The goal is to use internet speed and access as a proxy to measure GDP. In less developed countries, GDP is often poorly measured due to a lack of proper infrastructure or improper methodology. This is problematic when we compare GDP between countries. Therefore, a model that can circumvent the typical output accounting process may provide GDP comparisons that are more meaningful. The goal is not to get accurate GDP data; the goal is to have standardized data. We chose internet speed and access because of two reasons. Firstly, with technology becoming a key driver of economic growth, we believe that developed countries (with higher GDPs) can afford to and may want to spend more on network infrastructure, whether it is supplied by government or private firms. Secondly, the process of measuring internet speed/access is relatively easier and less prone to error than GDP accounting.

The notebook is broken into 3 parts:

1) Cleaning the data: We will clean up the data and merge into one dataframe for easier use
2) Graphing the data: We will produce some graphs to show the relationships between the variables
3) Creating the regression: Will will create a multiple variable regression to estimate GDP



In [11]:

    
import sys                            
import pandas as pd                    
import matplotlib.pyplot as plt          
import numpy as np                     
import seaborn.apionly as sns          

# plotly imports (Had to change from go since the graph was not showing up for others)
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot  
from plotly.graph_objs import *
init_notebook_mode()             
import plotly                                
import cufflinks as cf                       
from plotly import tools
cf.set_config_file(offline=True, offline_show_link=False)

# For regression 
import statsmodels.api as sm 

# For 3D plot
from numpy import exp,arange
from pylab import meshgrid,cm,imshow,contour,clabel,colorbar,axis,title,show 
from mpl_toolkits.mplot3d import Axes3D

# To show graphs on notebook
%matplotlib inline             
plotly.offline.init_notebook_mode(connected=True)

Part I: Cleaning and examining the data

There are three data sources. Internet speed comes from an Akamai research and internet access comes from Internet Live Stats. The data could be scrapped, but for simplicity, we created excel files on the computer instead (the webpage data does not update, so there is not much benefit to scraping). The data for GDP per capita comes from the World Economic Outlook database.



In [16]:

    
# Importing internet speed/access and WEO data

path = '/Users/horacefung/Desktop/1 Uni/Sophomore/Spring_2016/Data_Bootcamp/Project Data' # Internet speed data
file = '/InternetSpeed.xlsx'
speed = pd.read_excel(path+file)

path = '/Users/horacefung/Desktop/1 Uni/Sophomore/Spring_2016/Data_Bootcamp/Project Data' # Internet access data
file = '/InternetAccess.xlsx'
access = pd.read_excel(path+file)

url = 'https://www.imf.org/external/pubs/ft/weo/2016/01/weodata/WEOApr2016all.xls'       # World Economic Outlook data
weo = pd.read_csv(url, 
                  sep = '\t',
                  skipfooter=1, engine='python',
                  na_values=['n/a', '--'],
                  thousands =',')

Quick look at the data

Looking at the data, we are interested in the Avg. connection speed (Mb/s) column in speed and the Penetration as %Pop column in access. I want to merge the relevant columns into one dataframe to make graphing simplier.



In [13]:

    
speed.head()









    Out[13]:






  
    
      
      Country/Territory
      Avg. connection speed (Mb/s)
      Avg. Peak Connection Speed (Mb/s)
      Above 4 Mbit/s
      Above 10 Mbit/s
      Above 15 Mbit/s
    
  
  
    
      0
      Global
      5.1
      32.2
      0.650
      0.270
      0.150
    
    
      1
      Argentina
      4.2
      26.9
      0.390
      0.031
      0.005
    
    
      2
      Bolivia
      1.8
      13.9
      0.028
      0.002
      0.001
    
    
      3
      Brazil
      3.6
      29.0
      0.320
      0.022
      0.006
    
    
      4
      Canada
      11.9
      52.4
      0.870
      0.430
      0.210



In [14]:

    
access.head()









    Out[14]:






  
    
      
      Country
      Internet Users
      Penetration as %Pop
      Population
      Non-Users
      Users 1 Year Change (%)
      Internet Users 1 YR Change
      Population 1 YR Change
    
  
  
    
      1
      China
      721434547
      0.522
      1382323332
      660888785
      0.022
      15520515
      0.0046
    
    
      2
      India
      462124989
      0.348
      1326801576
      864676587
      0.305
      108010242
      0.0120
    
    
      3
      U.S.
      286942362
      0.885
      324118787
      37176425
      0.011
      3229955
      0.0073
    
    
      4
      Brazil
      139111185
      0.664
      209567920
      70456735
      0.051
      6753879
      0.0083
    
    
      5
      Japan
      115111595
      0.911
      126323715
      11212120
      0.001
      117385
      -0.0020



In [17]:

    
weo.head()









    Out[17]:






  
    
      
      WEO Country Code
      ISO
      WEO Subject Code
      Country
      Subject Descriptor
      Subject Notes
      Units
      Scale
      Country/Series-specific Notes
      1980
      ...
      2013
      2014
      2015
      2016
      2017
      2018
      2019
      2020
      2021
      Estimates Start After
    
  
  
    
      0
      512
      AFG
      NGDP_R
      Afghanistan
      Gross domestic product, constant prices
      Expressed in billions of national currency uni...
      National currency
      Billions
      Source: National Statistics Office Latest actu...
      NaN
      ...
      457.654
      463.520
      470.313
      479.800
      494.431
      511.815
      532.369
      553.637
      575.622
      2014.0
    
    
      1
      512
      AFG
      NGDP_RPCH
      Afghanistan
      Gross domestic product, constant prices
      Annual percentages of constant price GDP are y...
      Percent change
      NaN
      See notes for:  Gross domestic product, consta...
      NaN
      ...
      3.933
      1.282
      1.466
      2.017
      3.049
      3.516
      4.016
      3.995
      3.971
      2014.0
    
    
      2
      512
      AFG
      NGDP
      Afghanistan
      Gross domestic product, current prices
      Expressed in billions of national currency uni...
      National currency
      Billions
      Source: National Statistics Office Latest actu...
      NaN
      ...
      1116.827
      1173.182
      1174.129
      1234.006
      1328.672
      1457.509
      1606.898
      1771.165
      1952.397
      2014.0
    
    
      3
      512
      AFG
      NGDPD
      Afghanistan
      Gross domestic product, current prices
      Values are based upon GDP in national currency...
      U.S. dollars
      Billions
      See notes for:  Gross domestic product, curren...
      NaN
      ...
      20.170
      20.444
      19.204
      17.275
      17.449
      18.468
      19.648
      20.905
      22.217
      2014.0
    
    
      4
      512
      AFG
      NGDP_D
      Afghanistan
      Gross domestic product, deflator
      The GDP deflator is derived by dividing curren...
      Index
      NaN
      See notes for:  Gross domestic product, consta...
      NaN
      ...
      244.033
      253.103
      249.648
      257.192
      268.727
      284.772
      301.839
      319.915
      339.180
      2014.0
    
  

5 rows × 52 columns

Cleaning up the data

There are a few things we want to clean up here. Firstly, there should be clearer names for the columns. Secondly, we want to have the access data expressed as a percentage. Thirdly, just by testing with pd.shape, the number of countries are not the same. After sorting these problems out, we can merge everything into a new dataframe.



In [18]:

    
# Setting speed dataframe to only country and internet speed columns

speed = speed[['Country/Territory', 'Avg. connection speed (Mb/s)']]  # Selecting the two needed columns
Newnames_1 = [['Country', 'Average Connection Speed (Mb/s)']] # Better names
speed.columns = Newnames_1 # Renaming columns
speed = speed.drop(speed.index[[0]])  # Dropping the index 0 row which is global averages.
speed.head()









    Out[18]:






  
    
      
      Country
      Average Connection Speed (Mb/s)
    
  
  
    
      1
      Argentina
      4.2
    
    
      2
      Bolivia
      1.8
    
    
      3
      Brazil
      3.6
    
    
      4
      Canada
      11.9
    
    
      5
      Chile
      5.7



In [19]:

    
# Setting access dataframe to only country and access columns

access = access[['Country', 'Penetration as %Pop']]
Newnames_2 = ['Country', '% Internet Penetration'] # Better names
access.columns = Newnames_2
access.loc[:,'% Internet Penetration'] *= 100  # multiply the observations to get the floats in percentage

# Selecting only observations with the same countries as speed

countries_list = speed['Country'].tolist() # create a list of the country names avaliable in speed                         
access_1 = access[access['Country'].isin(countries_list)] # use countries_list to select the same countries for access  
print('Internet Speed observations:', speed.shape, '| Internet Access observations:', access_1.shape)









    



Internet Speed observations: (55, 2) | Internet Access observations: (51, 2)

We are missing 4 observations (55 vs 51 countries). This could be due to i) Internet Access does not have data for those countries, ii) the country names are different or iii) a mix of both. We can check with merge.



In [20]:

    
# Figuring out which countries are missing/named incorrectly

check = pd.merge(speed, access_1, how='outer', indicator=True)
access_missing = check[check['_merge']=='left_only'][speed.columns]
access_missing









    Out[20]:






  
    
      
      Country
      Average Connection Speed (Mb/s)
    
  
  
    
      12
      United States
      12.6
    
    
      27
      Taiwan
      10.1
    
    
      29
      Vietnam
      3.4
    
    
      54
      United Kingdom
      13.0

Output shows that these 4 countries were missing from the access dataframe. Upon inspection, United States, Vietnam and United Kingdom have different names. Taiwan is not on the access dataframe and should be left out.



In [21]:

    
# Correcting the country names

access['Country'] = access['Country'].replace(to_replace=['U.S.', 'Viet Nam', 'U.K.'], 
                                              value=['United States', 'Vietnam', 'United Kingdom'])



In [22]:

    
# Cleaning up WEO data

weo = weo[['Country', 'WEO Subject Code', '2016']]
subject_variable = ['NGDPDPC'] # GDP per capita, current prices, US dollars currency
weo_2 = weo[weo['WEO Subject Code'].isin(subject_variable) & weo['Country'].isin(countries_list)] 
weo_2 = weo_2.set_index(['Country'])
weo_cleaned = weo_2[['2016']] # Now that we got the correct subject code, we can just drop other years
weo_cleaned.columns = ['GDP per capita in dollars']
weo_cleaned.head()









    Out[22]:






  
    
      
      GDP per capita in dollars
    
    
      Country
      
    
  
  
    
      Argentina
      10050.973
    
    
      Australia
      49144.526
    
    
      Austria
      44777.821
    
    
      Belgium
      40688.479
    
    
      Bolivia
      2898.338



In [23]:

    
# Merging the datasets into one

speed = speed.set_index(['Country'])
access = access.set_index(['Country'])
speed_access = speed.merge(access, left_index=True, right_index=True)
internet = weo_cleaned.merge(speed_access, left_index=True, right_index=True) # Run the line twice to merge both
internet.head()









    Out[23]:






  
    
      
      GDP per capita in dollars
      Average Connection Speed (Mb/s)
      % Internet Penetration
    
    
      Country
      
      
      
    
  
  
    
      China
      8239.894
      3.7
      52.2
    
    
      India
      1747.493
      2.5
      34.8
    
    
      United States
      57220.196
      12.6
      88.5
    
    
      Brazil
      7447.443
      3.6
      66.4
    
    
      Japan
      34870.931
      15.0
      91.1



In [24]:

    
internet.shape # Double-checking we have the correct number of countries here









    Out[24]:





(51, 3)



In [25]:

    
internet.dtypes









    Out[25]:





GDP per capita in dollars          float64
Average Connection Speed (Mb/s)    float64
% Internet Penetration             float64
dtype: object

Part II: Graphing the variables

We begin by plotting two horizontal bar charts for speed and access. The charts are designed to give a general overview of how internet speed/access differs across the countries.



In [26]:

    
# Sorting the data in ascending order

internet_speed_sorted = internet.sort_values('Average Connection Speed (Mb/s)', ascending = True)
internet_access_sorted = internet.sort_values('% Internet Penetration', ascending = True)



In [27]:

    
# Finding the max/min speed and access values for the graph
# We need this to plot the mean average lines

print(internet_speed_sorted.index[-1], internet_speed_sorted.index[0],
      internet_access_sorted.index[-1], internet_access_sorted.index[0])









    



Sweden Paraguay Norway Indonesia

Plotting bar charts



In [28]:

    
# Using plotly to create two descending horizontal barcharts for speed and access

# Creating a trace for speed
speed = Bar(
    x=internet_speed_sorted['Average Connection Speed (Mb/s)'],
    y=internet_speed_sorted.index, 
    marker=dict(color='rgb(0, 163, 209)'), # We will be using a blue (speed) and orange (access) color scheme
    name='Internet Speed in the world', # Label
    orientation='h',) # Horizontal bar-chart
    
# Creating a trace for an average speed line 
speed_mean = Scatter(
    x= [internet['Average Connection Speed (Mb/s)'].mean(), # setting the vertical line to the mean x-value
        internet['Average Connection Speed (Mb/s)'].mean()],
    y= ['Paraguay', 'Sweden'], # Using the results above to determine the starting/end points of the line
    mode= 'lines',
    line = dict(
        color = ('rgb(96, 96, 96)'),
        width = 3,
        dash = 'dot'), # dotted line 
    name= 'Global Average Speed')

# Creating a trace for access
access = Bar(
    x=internet_access_sorted['% Internet Penetration'],
    y=internet_access_sorted.index,
    marker=dict(color='rgb(255, 174, 15)'),
    name='Internet Access in the world',
    orientation='h')

# Creating a trace for an average access line 
access_mean = Scatter(
    x= [internet['% Internet Penetration'].mean(), 
        internet['% Internet Penetration'].mean()],
    y= ['Indonesia', 'Norway'],
    mode= 'lines',
    line = dict(
        color = ('rgb(96, 96, 96)'),
        width = 3,
        dash = 'dot'),
    name= 'Global Average Penetration')

# Specifying the layout 
layout = dict(
    width = 1000, height = 1100, # We want a nice large graph that shows every country
    title='<b>Internet speed and access around the world</b>', # bold title
    font=dict(family='PT Sans Narrow',size=13), # changing font to PT Sans Narrow + size 13
    yaxis1=dict(
        showgrid=False,
        showline=False,
        showticklabels=True, # Make sure we get the country names on y-axis
        domain=[4, 40]),
    yaxis2=dict(
        showgrid=False,
        showline=False,
        showticklabels=True,
        domain=[4, 40]), # shifting the domain to make sure the y-axis does not overlap
    xaxis1=dict(
        title='Internet speed measured in Mb/s',
        titlefont=dict(family='PT Sans Narrow'),
        zeroline=False,
        showline=False,
        showticklabels=True,
        showgrid=True,
        domain=[0.06, 0.5]),
    xaxis2=dict(
        title='Access to internet as % of total population',
        titlefont=dict(family='PT Sans Narrow'),
        zeroline=False,
        showline=False,
        showticklabels=True,
        showgrid=True,
        domain=[0.6, 1.04]), # shifting the domain to make sure the x-axis does not overlap
    
    legend=dict(
        x=1,
        y=0.5,
        font=dict(
            size=13)),
    margin=dict(
        l=100,
        r=90,
        t=70,
        b=70),
    paper_bgcolor='rgb(255, 255, 255)',
    plot_bgcolor='rgb(255, 255, 255)')

# Creating the subplots 
fig = tools.make_subplots(rows=1, cols=2, specs=[[{}, {}]], shared_xaxes=True,
                          shared_yaxes=False, vertical_spacing=0.001, horizontal_spacing=0.3)

fig.append_trace(speed, 1, 1)
fig.append_trace(speed_mean, 1, 1)
fig.append_trace(access, 1, 2)
fig.append_trace(access_mean, 1, 2)

fig['layout'].update(layout)
iplot(fig, filename='Internet speed and access')









    



This is the format of your plot grid:
[ (1,1) x1,y1 ]  [ (1,2) x2,y2 ]

The scatterplots support our hypothesis that there is some positive relationship between speed/access and GDP per capita. The speed data points are a bit spread out, but there seems to be a general positive relationship. The access data points appear to have a stronger positive relationship with GDP per capita. Moreover, the slope of internet access is relatively steep at lower GDP per capita levels, suggesting that internet access increases GDP per capita more rapidly for poorer countries while richer countries reach a plateau. We do have to take into account that these scatterplots are designed to give a quick peek into the relationships between speed/access and GDP; they are not great for comparison since the y-axis are different measurements and scale. Instead of scatterplots, regression jointplots can help us gain a more detailed understanding of the relationships at work.

Simple linear regressions



In [29]:

    
# Speed vs GDP per capita linear regression

sns.set(style="darkgrid", color_codes=True)

g = sns.jointplot("GDP per capita in dollars", "Average Connection Speed (Mb/s)", data=internet, kind="reg", order = 1, 
                  xlim = (0, 100000), ylim = (0, 20), size=6)

As expected, the regression shows a positive relationship between speed and GDP per capita. In addition, Pearson's R is 0.79, which suggests a strong-medium correlation between the two variables. Since p-value is approximately 0, we can also reject the hypothesis that β1=0, and conclude that the speed variable is statistically significant and this model has some use. Both variables, however, do not appear to be normally distributed. In addition, the 95% confidence interval is much wider at greater internet speeds, meaning this relationship may not be strong for countries with very high speeds.



In [30]:

    
# Access vs GDP per capita linear regression

sns.set(style="darkgrid", color_codes=True)

g = sns.jointplot("GDP per capita in dollars", "% Internet Penetration", data=internet, kind="reg", order = 1, 
                  xlim = (0, 100000), ylim = (0, 150), color="orangered", size=6)

Similarly, the regression for access and GDP per capita shows a positive relationship. The Pearson's R is higher at 0.81, which suggests a stronger correlation between the two variables. Since p-value is also approximately 0, we can reject the hypothesis that β1=0, and conclude that the access variable is statistically significant. Moreover, the 95% confidence interval is also quite wide at higher % penetration levels, implying the relationship may not be as meaningful with countries that have high accessibility. The internet access data does not seem to be normally distributed either. But both datasets seem close enough though, so we will not do any additional transformation to normalize the data.



In [255]:

    
# Access vs speed linear regression

sns.set(style="darkgrid", color_codes=True)

g = sns.jointplot("Average Connection Speed (Mb/s)", "% Internet Penetration", data=internet, kind="reg", order = 1, 
                xlim = (0, 20), ylim = (0,150), color="purple", size=6)

Unfortuntely, it does appear that there will be significant multicollinearity in our final regression. There is a strong-medium positive relationship between internet speed and access, which is not surprising as countries with greater usuage of the internet will also likely demand faster internet speed (or the other way around, higher internet speed may encourage greater usage). This means that the R-squared value for our final regression could be inflated.

KDE plots

A KDE plot is a good way to get a clearer look at how the data is distributed. This is important since our jointplots showed that our datasets are not normally distributed.



In [256]:

    
# KDE plot for speed vs gdp per capita
sns.jointplot(x="GDP per capita in dollars", y="Average Connection Speed (Mb/s)", data=internet, kind="kde")









    Out[256]:





<seaborn.axisgrid.JointGrid at 0x11f8a0e80>



In [257]:

    
# KDE plot for access vs gdp per capita
sns.jointplot(x="GDP per capita in dollars", y="% Internet Penetration", data=internet, kind="kde", color='orangered')









    Out[257]:





<seaborn.axisgrid.JointGrid at 0x1169fb8d0>

Both KDE plots showed a similar pattern; our data points have two clusters near 10,000 and 60,000 GDP per capita. This caused the two dark areas in our KDE plots. Perhaps, there is a significant barrier that countries have to jump from low speed/access to high speed/access, and that only richer countries can get above the 10 (Mb/s) and 80% access range. It could also be the other way; perhaps countries that have the technology to support a fast and widespread network are also the countries with enough resources to achieve a higher GDP per capita. In addition, all of our variables seem to have bimodal distributions.

Part III: Estimating GDP per Capita with a multiple variable regression



In [31]:

    
# Defining X and Y for the regression
X = internet[['Average Connection Speed (Mb/s)', '% Internet Penetration']]
Y = internet['GDP per capita in dollars']

# Fitting a OLS model with intercept on Speed and Access
X = sm.add_constant(X)
reg = sm.OLS(Y, X).fit()

reg.summary()









    Out[31]:





OLS Regression Results

  Dep. Variable:     GDP per capita in dollars    R-squared:             0.723


  Model:                        OLS               Adj. R-squared:        0.711


  Method:                  Least Squares          F-statistic:           62.54


  Date:                  Fri, 12 May 2017         Prob (F-statistic):  4.29e-14


  Time:                      00:07:01             Log-Likelihood:      -545.51


  No. Observations:               51              AIC:                   1097.


  Df Residuals:                   48              BIC:                   1103.


  Df Model:                        2                                          


  Covariance Type:           nonrobust                                        




                                     coef      std err       t       P>|t|   [0.025     0.975]  


  const                            -2.549e+04   5770.191     -4.417   0.000  -3.71e+04  -1.39e+04


  Average Connection Speed (Mb/s)   1831.6659    525.465      3.486   0.001    775.147   2888.185


  % Internet Penetration             495.4706    118.470      4.182   0.000    257.270    733.672




  Omnibus:         2.146    Durbin-Watson:         2.056


  Prob(Omnibus):   0.342    Jarque-Bera (JB):      1.384


  Skew:           -0.005    Prob(JB):              0.501


  Kurtosis:        3.807    Cond. No.               271.

$GDP per capita = -2.549e^4 + 1831.6659(Speed) + 495.4706(Access)$

The regression has a decent R-squared, at 72.3% (but a significant amount of correlation likely came from the relationship between speed and access). In addition, the p-value for the F-test is near 0, which means that this model does have some use. Lastly, the standard errors are quite large, which means that our model may not be very accurate.

Using + visualizing the regression

We want to test out the model on United States and two countries with very little/inaccurate GDP data (Georgia and North Korea). We also want to plot out the regression just to make it easier to visualize.

Georgia Speed and Access
North Korea Speed
North Korea Access



In [32]:

    
# Using the regression to predict GDP per capita of Georgia
# Georgia speed is 1.6875 Mb/s and access is 48.9%. WEO data for gdp is missing
# North Korea speed is 0.2500 Mb/s and access is 0.1%

# Defining the regression function
def z_func(x,y):
    return (1831.6659*x + 495.4706*y - 2.549*10**4)   

print('Georgia //', 'Regression GDP per capita: $', z_func(1.6875, 48.9), '| Older IMF GDP per capita: $', 3720)
print('North Korea //', 'Regression GDP per capita: $', z_func(0.2500, 0.1), '| Predicted GDP per capita: $', 583)
print('US //', 'Regression GDP per capita: $', z_func(12.6, 88.5), '| WEO GDP per capita: $', 57220.196)









    



Georgia // Regression GDP per capita: $ 1829.448546249998 | Older IMF GDP per capita: $ 3720
North Korea // Regression GDP per capita: $ -24982.536465 | Predicted GDP per capita: $ 583
US // Regression GDP per capita: $ 41438.138439999995 | WEO GDP per capita: $ 57220.196

Plotting the regression



In [33]:

    
# Defining the variables for plotting 
x = arange(0,20,1) # Setting the input X variables/axis
y = arange(0,100,1) # Setting the input Y variables/axis
X,Y = meshgrid(x, y) # grid of point
Z = z_func(X, Y) # Setting Z as a function of X and Y using the regression



In [39]:

    
# Plotting the 3d figure 
sns.set_style("whitegrid")
fig = plt.figure(figsize=(12,9))
ax = fig.gca(projection='3d')
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, 
                      cmap=cm.RdBu,linewidth=0, antialiased=False)

ax.set_xlabel('Internet Speed', fontsize=10)
ax.set_ylabel('Internet Access', fontsize=10)

# Creating and customizing the colour bar
cbar= fig.colorbar(surf, shrink=0.5, aspect=10)
cbar.set_label('GDP per capita', rotation=270)

# Plotting the points 
ax.plot([12.6], [88.5], [41438], 'o', color='black', markersize=10)
ax.text(13, 90, 42000, "US", color='k', fontsize=15)
ax.plot([2.305], [48.9], [2961], 'o', color='black', markersize=10)
ax.text(3, 54, 3000, "Georgia", color='k', fontsize=15)
ax.plot([0.2500], [0.1], [-24982], 'o', color='black', markersize=10)
ax.text(2, 2, -15000, "N. Korea", color='k', fontsize=15)

plt.show()

Conclusion

By looking at the relationships between internet speed/access to GDP per capita, we were able to determine a statistically significant positive relationship between the variables. We saw from the scatterplots that internet access seems to play a bigger role in accelerating GDP per capita for poorer countries, but reaches a plateau at around 80% access. Speed, on the other hand, seems to contribute to a more steady increase in GDP per capita. Both of those observations are reasonable; the avaliability of internet could be a huge initital driver for foundations like education, hence, it forms a base for greater GDP per capita. Speed, however, may continue to be relevant at higher levels since more developed countries like the US could be making use of greater internet speed to advance their growing technology sector. Our final regression placed a large emphasis on internet access as the contributing factor to high GDP per capita. From the graph above, we can see that the regression does a decent job when internet access is about 40% and higher. It, however, over "punishes" low internet access such as with North Korea, giving it a negative GDP per capita that is far below the estimate number. In fact, the model underestimates GDP per capita for all 3 of our test countries.

Overall, we believe this notebook shows that there are ways to measure GDP outside of the typical output processes. It may not be as accurate, but could be useful in some situations such as comparing US GDP with Georgia, where there is no 2016 data for Georgia. And on the regression above, we expect a rather large gap between the GDP per capita of US and Georgia.

	Country/Territory	Avg. connection speed (Mb/s)	Avg. Peak Connection Speed (Mb/s)	Above 4 Mbit/s	Above 10 Mbit/s	Above 15 Mbit/s
0	Global	5.1	32.2	0.650	0.270	0.150
1	Argentina	4.2	26.9	0.390	0.031	0.005
2	Bolivia	1.8	13.9	0.028	0.002	0.001
3	Brazil	3.6	29.0	0.320	0.022	0.006
4	Canada	11.9	52.4	0.870	0.430	0.210

	Country	Internet Users	Penetration as %Pop	Population	Non-Users	Users 1 Year Change (%)	Internet Users 1 YR Change	Population 1 YR Change
1	China	721434547	0.522	1382323332	660888785	0.022	15520515	0.0046
2	India	462124989	0.348	1326801576	864676587	0.305	108010242	0.0120
3	U.S.	286942362	0.885	324118787	37176425	0.011	3229955	0.0073
4	Brazil	139111185	0.664	209567920	70456735	0.051	6753879	0.0083
5	Japan	115111595	0.911	126323715	11212120	0.001	117385	-0.0020

	WEO Country Code	ISO	WEO Subject Code	Country	Subject Descriptor	Subject Notes	Units	Scale	Country/Series-specific Notes	1980	...	2013	2014	2015	2016	2017	2018	2019	2020	2021	Estimates Start After
0	512	AFG	NGDP_R	Afghanistan	Gross domestic product, constant prices	Expressed in billions of national currency uni...	National currency	Billions	Source: National Statistics Office Latest actu...	NaN	...	457.654	463.520	470.313	479.800	494.431	511.815	532.369	553.637	575.622	2014.0
1	512	AFG	NGDP_RPCH	Afghanistan	Gross domestic product, constant prices	Annual percentages of constant price GDP are y...	Percent change	NaN	See notes for: Gross domestic product, consta...	NaN	...	3.933	1.282	1.466	2.017	3.049	3.516	4.016	3.995	3.971	2014.0
2	512	AFG	NGDP	Afghanistan	Gross domestic product, current prices	Expressed in billions of national currency uni...	National currency	Billions	Source: National Statistics Office Latest actu...	NaN	...	1116.827	1173.182	1174.129	1234.006	1328.672	1457.509	1606.898	1771.165	1952.397	2014.0
3	512	AFG	NGDPD	Afghanistan	Gross domestic product, current prices	Values are based upon GDP in national currency...	U.S. dollars	Billions	See notes for: Gross domestic product, curren...	NaN	...	20.170	20.444	19.204	17.275	17.449	18.468	19.648	20.905	22.217	2014.0
4	512	AFG	NGDP_D	Afghanistan	Gross domestic product, deflator	The GDP deflator is derived by dividing curren...	Index	NaN	See notes for: Gross domestic product, consta...	NaN	...	244.033	253.103	249.648	257.192	268.727	284.772	301.839	319.915	339.180	2014.0

	Country	Average Connection Speed (Mb/s)
12	United States	12.6
27	Taiwan	10.1
29	Vietnam	3.4
54	United Kingdom	13.0

	GDP per capita in dollars
Country
Argentina	10050.973
Australia	49144.526
Austria	44777.821
Belgium	40688.479
Bolivia	2898.338

	GDP per capita in dollars	Average Connection Speed (Mb/s)	% Internet Penetration
Country
China	8239.894	3.7	52.2
India	1747.493	2.5	34.8
United States	57220.196	12.6	88.5
Brazil	7447.443	3.6	66.4
Japan	34870.931	15.0	91.1

Dep. Variable:	GDP per capita in dollars	R-squared:	0.723
Model:	OLS	Adj. R-squared:	0.711
Method:	Least Squares	F-statistic:	62.54
Date:	Fri, 12 May 2017	Prob (F-statistic):	4.29e-14
Time:	00:07:01	Log-Likelihood:	-545.51
No. Observations:	51	AIC:	1097.
Df Residuals:	48	BIC:	1103.
Df Model:	2
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[0.025	0.975]
const	-2.549e+04	5770.191	-4.417	0.000	-3.71e+04	-1.39e+04
Average Connection Speed (Mb/s)	1831.6659	525.465	3.486	0.001	775.147	2888.185
% Internet Penetration	495.4706	118.470	4.182	0.000	257.270	733.672

Omnibus:	2.146	Durbin-Watson:	2.056
Prob(Omnibus):	0.342	Jarque-Bera (JB):	1.384
Skew:	-0.005	Prob(JB):	0.501
Kurtosis:	3.807	Cond. No.	271.