I use a service called "Libsyn" to host Data Skeptic. This is not a formal endorsement, just a fact. They provide a certain amount of data about downloads, including a summary of downloads by country.

It came as no surprise to me that Data Skeptic is most downloaded in my home country of the United States of America. But I get a non-trivial amount of downloads from our neighboring nation to the north, which I know has about 10% of our population. Do I get the most downloads per capita in the US?



In [10]:

    
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt



In [56]:

    
df = pd.read_csv('show_countries_dataskeptic_2016-01-08.csv')
df.sort('total_downloads', inplace=True, ascending=False)



In [57]:

    
tail = df[11:].sum()
df = df[0:10]
df = df.append({'country_name': 'Other', 'total_downloads': tail['total_downloads']}, ignore_index=True)
df['downloads_percent'] = df['total_downloads'] / df['total_downloads'].sum() * 100
df['downloads_percent'] = df['downloads_percent'].apply(lambda x: round(x, 2))



In [58]:

    
df[['country_name', 'downloads_percent']]









    Out[58]:






  
    
      
      country_name
      downloads_percent
    
  
  
    
      0
      United States
      59.02
    
    
      1
      Australia
      6.97
    
    
      2
      United Kingdom
      6.60
    
    
      3
      Canada
      4.41
    
    
      4
      Germany
      2.53
    
    
      5
      Sweden
      1.84
    
    
      6
      South Africa
      1.48
    
    
      7
      India
      1.27
    
    
      8
      Netherlands
      1.09
    
    
      9
      France
      0.84
    
    
      10
      Other
      13.95



In [69]:

    
df.sort('total_downloads', inplace=True)
df.index = np.arange(df.shape[0])
plt.barh(df.index, df['downloads_percent'])
plt.gca().yaxis.grid(False)
plt.yticks(df.index + 0.4, df['country_name'])
plt.xlabel('% of downloads')
plt.ylim(-.25, df.shape[0])
plt.show()



In [81]:

    
# I wish I knew a clean API for this...

# These are the 2013 estimates as provided by Google searc hof "population of ____" on 1/8/2016

populations = []
populations.append({'country_name': 'France', 'population': 66030000})
populations.append({'country_name': 'Netherlands', 'population': 16800000})
populations.append({'country_name': 'India', 'population': 1252000000})
populations.append({'country_name': 'South Africa', 'population': 52980000})
populations.append({'country_name': 'Sweden', 'population': 9593000})
populations.append({'country_name': 'Germany', 'population': 80620000})
populations.append({'country_name': 'Canada', 'population': 35160000})
populations.append({'country_name': 'United Kingdom', 'population': 64100000})
populations.append({'country_name': 'Australia', 'population': 23130000})
populations.append({'country_name': 'United States', 'population': 316500000})
world_population = 7162119434
s = 0
for pop in populations:
    s += pop['population']
other = world_population - s
populations.append({'country_name': 'Other', 'population': other})
df2 = pd.DataFrame(populations)
df = df.merge(df2)
df['per_capita'] = df['total_downloads'] / df['population']



In [89]:

    
df.sort('per_capita', inplace=True)
df.index = np.arange(df.shape[0])
plt.barh(df.index, df['per_capita'])
plt.yticks(df.index+0.4, df['country_name'])
plt.gca().yaxis.grid(False)
plt.ylim(-.25, df.shape[0])
plt.xlabel('Per capital downloads')
plt.show()

How interesting! I could write a few pages about my thoughts on why this is, but for now, I'll let the data speak for itself.

And a special thank you to my Australian and Swedish listeners!

	country_name	downloads_percent
0	United States	59.02
1	Australia	6.97
2	United Kingdom	6.60
3	Canada	4.41
4	Germany	2.53
5	Sweden	1.84
6	South Africa	1.48
7	India	1.27
8	Netherlands	1.09
9	France	0.84
10	Other	13.95