# One million is a lot

This notebook presents an analysis of data from the first million page views on my blog, Probably Overthinking It.

``````

In [1]:

%matplotlib inline

``````
``````

In [7]:

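import pandas as pd

# Assumed reconstruction: the def line and the read_html call are missing
# from the original cell; `read_table` is a hypothetical name.
def read_table(filename):
    fp = open(filename)
    t = pd.read_html(fp)  # list of DataFrames, one per HTML <table>
    table = t[5]          # keep the sixth table on the page
    return table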

``````
``````

In [13]:

table1.shape

``````
``````

Out[13]:

(100, 8)

``````
``````

In [14]:

table2.shape

``````
``````

Out[14]:

(20, 8)

``````
``````

In [18]:

table = pd.concat([table1, table2], ignore_index=True)
table.shape

``````
``````

Out[18]:

(120, 9)

``````
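
The concatenated table has 9 columns even though each input has 8. One possible explanation, which the notebook does not confirm, is that the two scraped tables do not share identical column labels: `pd.concat` aligns on the union of labels and fills the gaps with `NaN`. A minimal sketch with made-up frames (`a` and `b` are hypothetical stand-ins for `table1` and `table2`):

```python
import pandas as pd

# Hypothetical stand-ins for table1 and table2; one column label differs.
a = pd.DataFrame({0: [1, 2, 3], 1: ['x', 'y', 'z']})
b = pd.DataFrame({0: [4, 5], 2: ['u', 'v']})

combined = pd.concat([a, b], ignore_index=True)

# Rows add up (3 + 2); columns are the union of the labels {0, 1, 2}.
print(combined.shape)

# A column missing from one input is NaN for that input's rows.
print(combined.isna().sum())
```

If this is what happened here, comparing `table1.columns` with `table2.columns` would show which label appears in only one of the tables.
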
``````

In [19]:

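import string
chars = string.ascii_letters + ' '

def convert(s):
    # Strip trailing letters and spaces, then parse the remaining digits.
    return int(s.rstrip(chars))

def clean(s):
    # Keep the text before 'Edit'; if 'Edit' is absent, find() returns -1
    # and the last character is dropped.
    i = s.find('Edit')
    return s[:i]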

``````
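
To see what the two helpers do, here is a small self-contained sketch. The sample strings are assumptions about what the scraped cells look like, and `clean_safe` is a hypothetical variant that is not part of the notebook; it only illustrates the case where the `'Edit'` marker is absent.

```python
import string

chars = string.ascii_letters + ' '

def convert(s):
    # Strip trailing letters and spaces, then parse the remaining digits.
    return int(s.rstrip(chars))

def clean(s):
    # Keep everything before the first 'Edit'.
    i = s.find('Edit')
    return s[:i]

def clean_safe(s):
    # Hypothetical variant: partition returns the whole string
    # as its first element when 'Edit' is not found.
    return s.partition('Edit')[0]

# Sample inputs are assumptions, not taken from the scraped page.
print(convert('723 views'))
print(repr(clean('Bayes meets FourierEdit | View')))
print(repr(clean('No marker here')))       # find() returns -1, so one character is lost
print(repr(clean_safe('No marker here')))
```
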
``````

In [20]:

table['title'] = table[1].apply(clean)
table.title

``````
``````

Out[20]:

0                                   One million is a lot
1                    When will I win the Great Bear Run?
2                                    Bayes meets Fourier
3                First babies are more likely to be late
4                Bayesian analysis of gluten sensitivity
5                             Bayes theorem in real life
6                   The Inspection Paradox is Everywhere
7                                 Orange is the new stat
8                     Will Millennials Ever Get Married?
9                                     Bayesian Billiards
10                           The Sleeping Beauty Problem
11             Hypothesis testing is only mostly useless
12                 Two hour marathon by 2041 -- probably
13      Bayesian survival analysis for "Game of Thrones"
14            Statistical inference is only mostly wrong
15          Upcoming talk on survival analysis in Python
16            Bayesian analysis of match rates on Tinder
17       Godless freshmen: now more Nones than Catholics
18              Bayesian predictions for Super Bowl XLIX
19                    Statistics tutorials at PyCon 2015
20                                The Rock Hyrax Problem
21     The World Cup Problem Part 2: Germany v. Argen...
22              The World Cup Problem: Germany v. Brazil
23     On efficient algorithms for finding the goddam...
24                             Two hour marathon in 2041
25                         Bayesian election forecasting
26        Regression with Python, pandas and StatsModels
27         New study: vaccines prevent disease and death
28                     An exercise in hypothesis testing
29               More likely to be killed by a terrorist
...
90                          Girl Named Florida solutions
91                     The red-haired girl named Florida
92                             Somebody bet on the Bayes
93                      All your Bayes are belong to us!
94                  My favorite Bayes's Theorem problems
96                    Repeated tests: how bad can it be?
97                         The Jimmy Nut Company problem
98                       Upcoming webcast: Only One Test
99                                News flash: OJ did it.
100                        Postcard from NKS Summer Camp
101           A hierarchical Bayesian model of pond scum
102                         More hypotheses, less trivia
103                              There is only one test!
104                                  Statistics Workshop
106                            Two Hour Marathon in 2045
107                    Bayesianness is next to Godliness
108                                    Survival analysis
109              Freshman hordes more godless than ever!
110                            Predicting marathon times
111                                BQ is unfair to women
112                                 Moving the goalposts
113                                        The BQ Effect
114             Are first babies more likely to be late?
115     Yet another reason SAT scores are non-predictive
116                           Are you popular? Hint: no.
117                              Obesity epidemic cured!
118                       Observer effect in relay races
119                             Proofiness and elections
Name: title, dtype: object

``````
``````

In [21]:

table['plusses'] = table[4].fillna(0)
table.plusses.head()

``````
``````

Out[21]:

0    0
1    1
2    7
3    2
4    9
Name: plusses, dtype: float64

``````
``````

In [24]:

``````
``````

Out[24]:

0    0
1    1
2    1
3    3
4    1

``````
``````

In [25]:

table['views'] = table[6].apply(convert)
table.views

``````
``````

Out[25]:

0           0
1         723
2        2363
3         944
4        3110
5        2514
6       30484
7        2131
8         589
9        1273
10       2348
11       1816
12       2891
13      32406
14       4666
15       1242
16       7602
17       1491
18       2254
19       1193
20        648
21       1789
22       3040
23        819
24       3090
25       1621
26       6456
27       1834
28       1057
29       1536
...
90       9454
91       1153
92       2332
93      48836
94      34384
95       3367
96       3797
97       1929
98        885
99          0
100         0
101      2162
102      1520
103      4246
104       203
105      1445
106      1745
107      1083
108      2849
109      1379
110      3847
111       815
112       513
113      3066
114    130722
115     17876
116      1468
117       289
118       725
119       396
Name: views, dtype: int64

``````
``````

In [26]:

table['date'] = pd.to_datetime(table[7])
table.date.head()

``````
``````

Out[26]:

0   2015-11-01
1   2015-10-26
2   2015-10-23
3   2015-09-23
4   2015-09-01
Name: date, dtype: datetime64[ns]

``````
``````

In [34]:

table = table[table.views > 0]
table.shape

``````
``````

Out[34]:

(115, 13)

``````
``````

In [38]:

table.index = range(115, 0, -1)
table.title

``````
``````

Out[38]:

115                  When will I win the Great Bear Run?
114                                  Bayes meets Fourier
113              First babies are more likely to be late
112              Bayesian analysis of gluten sensitivity
111                           Bayes theorem in real life
110                 The Inspection Paradox is Everywhere
109                               Orange is the new stat
108                   Will Millennials Ever Get Married?
107                                   Bayesian Billiards
106                          The Sleeping Beauty Problem
105            Hypothesis testing is only mostly useless
104                Two hour marathon by 2041 -- probably
103     Bayesian survival analysis for "Game of Thrones"
102           Statistical inference is only mostly wrong
101         Upcoming talk on survival analysis in Python
100           Bayesian analysis of match rates on Tinder
99       Godless freshmen: now more Nones than Catholics
98              Bayesian predictions for Super Bowl XLIX
97                    Statistics tutorials at PyCon 2015
96                                The Rock Hyrax Problem
95     The World Cup Problem Part 2: Germany v. Argen...
94              The World Cup Problem: Germany v. Brazil
93     On efficient algorithms for finding the goddam...
92                             Two hour marathon in 2041
91                         Bayesian election forecasting
90        Regression with Python, pandas and StatsModels
89         New study: vaccines prevent disease and death
88                     An exercise in hypothesis testing
87               More likely to be killed by a terrorist
86        Bayesian solution to the Lincoln index problem
...
30                    Estimating the age of renal tumors
29                   Comment on "Racism and Meritocracy"
28                          Girl Named Florida solutions
27                     The red-haired girl named Florida
26                             Somebody bet on the Bayes
25                      All your Bayes are belong to us!
24                  My favorite Bayes's Theorem problems
22                    Repeated tests: how bad can it be?
21                         The Jimmy Nut Company problem
20                       Upcoming webcast: Only One Test
19            A hierarchical Bayesian model of pond scum
18                          More hypotheses, less trivia
17                               There is only one test!
16                                   Statistics Workshop
14                             Two Hour Marathon in 2045
13                     Bayesianness is next to Godliness
12                                     Survival analysis
11               Freshman hordes more godless than ever!
10                             Predicting marathon times
9                                  BQ is unfair to women
8                                   Moving the goalposts
7                                          The BQ Effect
6               Are first babies more likely to be late?
5       Yet another reason SAT scores are non-predictive
4                             Are you popular? Hint: no.
3                                Obesity epidemic cured!
2                         Observer effect in relay races
1                               Proofiness and elections
Name: title, dtype: object

``````
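
The reindexing step hard-codes 115, the number of rows left after filtering. A small sketch of the same two steps on a toy frame, deriving the range from `len(...)` instead, so the numbering still works if the row count changes (the frame `posts` and its values are made up):

```python
import pandas as pd

# Toy data, newest post first, as in the scraped table.
posts = pd.DataFrame({
    'title': ['newest', 'draft', 'middle', 'oldest'],
    'views': [10, 0, 250, 40],
})

# Drop rows with no views, as in table[table.views > 0].
posts = posts[posts.views > 0]

# Number the rows so the oldest post (last row) gets 1
# and the newest gets len(posts).
posts.index = range(len(posts), 0, -1)
print(posts)
```
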
``````

In [39]:

dates = table.date.sort_values()
diffs = dates.diff()
diffs.head()

``````
``````

Out[39]:

1      NaT
2   6 days
3   7 days
4   7 days
5   9 days
Name: date, dtype: timedelta64[ns]

``````
``````

In [40]:

diffs.dropna().describe()

``````
``````

Out[40]:

count                        114
mean     15 days 09:41:03.157894
std      20 days 04:36:55.930513
min              1 days 00:00:00
25%              5 days 00:00:00
50%             10 days 00:00:00
75%             17 days 18:00:00
max            180 days 00:00:00
Name: date, dtype: object

``````
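
The interval computation relies on sorting before differencing: `diff()` subtracts each value from the one before it, so the dates have to be in chronological order first. A minimal sketch with three made-up dates:

```python
import pandas as pd

# Made-up publication dates, deliberately out of order.
dates = pd.Series(pd.to_datetime(['2011-01-17', '2011-01-04', '2011-01-10']))

# Sort first so each difference is the gap since the previous post.
diffs = dates.sort_values().diff()
print(diffs)

# The earliest date has no predecessor, so its entry is NaT;
# drop it before summarizing.
gaps = diffs.dropna()
print(gaps.dt.days.tolist())
print(gaps.mean())
```
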
``````

In [41]:

``````
``````

Out[41]:

     title                                                views  date
6    Are first babies more likely to be late?            130722  2011-02-07
25   All your Bayes are belong to us!                     48836  2011-10-27
24   My favorite Bayes's Theorem problems                 34384  2011-10-20
103  Bayesian survival analysis for "Game of Thrones"     32406  2015-03-25
110  The Inspection Paradox is Everywhere                 30484  2015-08-18
41                                                        23892  2012-03-14
5    Yet another reason SAT scores are non-predictive     17876  2011-02-02
72   Are your data normal? Hint: no.                      16152  2013-08-07
36   Freshman hordes even more godless!                   10826  2012-01-29
34   Think Complexity                                     10670  2012-01-23
54   Secularization in America: part six                   9773  2012-07-10
28   Girl Named Florida solutions                          9454  2011-11-10
55   Secularization in America: part seven                 7705  2012-07-11
100  Bayesian analysis of match rates on Tinder            7602  2015-02-10
90   Regression with Python, pandas and StatsModels        6456  2014-09-14
57   Are first babies more likely to be late, revis...     5776  2013-01-08
78   Correlation is evidence of causation                  4911  2014-02-20
102  Statistical inference is only mostly wrong            4666  2015-03-02
17   There is only one test!                               4246  2011-05-31
65   The Price is Right Problem                            4062  2013-04-22

``````
``````

In [56]:

``````
``````

Out[56]:

     title                                                views  date
16   Statistics Workshop                                    203  2011-05-17
3    Obesity epidemic cured!                                289  2011-01-17
1    Proofiness and elections                               396  2011-01-04
45   Fog warning system: part two                           504  2012-04-20
8    Moving the goalposts                                   513  2011-02-24
108  Will Millennials Ever Get Married?                     589  2015-07-13
96   The Rock Hyrax Problem                                 648  2014-12-04
62   Belly Button Biodiversity: Part Four                   675  2013-03-22
46   Fog warning system: part three                         704  2012-04-25
115  When will I win the Great Bear Run?                    723  2015-10-26
2    Observer effect in relay races                         725  2011-01-10
60   Belly Button Biodiversity: Part Two                    783  2013-02-08
9    BQ is unfair to women                                  815  2011-03-02
93   On efficient algorithms for finding the goddam...      819  2014-10-04
70   Belly Button Biodiversity: The End Game                839  2013-05-30
20   Upcoming webcast: Only One Test                        885  2011-08-16
50   Secularization in America: part three                  927  2012-06-22
61   Belly Button Biodiversity: Part Three                  932  2013-02-18
113  First babies are more likely to be late                944  2015-09-23
32   Frank is a scoundrel, probably                         947  2012-01-05

``````
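
The cells that produced the two listings above (most viewed and least viewed posts) are empty in this export. One way such listings could be produced, assuming `table` has `title`, `views`, and `date` columns, is to sort by `views` and keep the first rows. This is a sketch, not necessarily the code that was run; the frame below is a toy stand-in built from four rows of the output above.

```python
import pandas as pd

# Toy stand-in for `table`, using a few rows from the listings above.
table = pd.DataFrame(
    {
        'title': ['Proofiness and elections', 'Obesity epidemic cured!',
                  'Are first babies more likely to be late?', 'Statistics Workshop'],
        'views': [396, 289, 130722, 203],
        'date': pd.to_datetime(['2011-01-04', '2011-01-17',
                                '2011-02-07', '2011-05-17']),
    },
    index=[1, 3, 6, 16],
)

columns = ['title', 'views', 'date']

# Most viewed first; head(n) keeps the top n rows.
most_viewed = table.sort_values(by='views', ascending=False)[columns].head(2)

# Least viewed first (ascending order is the default).
least_viewed = table.sort_values(by='views')[columns].head(2)

print(most_viewed)
print(least_viewed)
```
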
``````

In [55]:

import thinkstats2
import thinkplot

cdf = thinkstats2.Cdf(table.views)

thinkplot.PrePlot(1)
thinkplot.Cdf(cdf, complement=True)
thinkplot.Config(xlabel='Number of page views', xscale='log',
                 ylabel='CCDF', yscale='log',
                 legend=False)

``````
``````

[Figure: CCDF of page views; x-axis "Number of page views" (log scale), y-axis "CCDF" (log scale)]

``````
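
The plot shows the complementary CDF (CCDF) of page views on log-log axes. As a rough sketch of the quantity being plotted, the function below uses NumPy only and one common convention: for each value x, the fraction of observations strictly greater than x. Whether `thinkstats2` uses exactly this convention is not checked here, and `ccdf` is a hypothetical helper name. The sample values are a handful of view counts from the output above.

```python
import numpy as np

def ccdf(values):
    """Return sorted unique values and, for each, the fraction of
    observations strictly greater than it."""
    values = np.sort(np.asarray(values))
    xs = np.unique(values)
    # side='right' counts how many observations are <= each x.
    counts_le = np.searchsorted(values, xs, side='right')
    ps = 1.0 - counts_le / len(values)
    return xs, ps

views = [203, 289, 396, 723, 130722]
xs, ps = ccdf(views)
for x, p in zip(xs, ps):
    print(x, round(p, 2))
```

Under this convention the largest value has a CCDF of zero, which cannot be drawn on a log-scaled y-axis, so the last point drops off the plot.
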
``````

In [45]:

``````
``````

Out[45]:

     title                                                       date
25   All your Bayes are belong to us!                        56  2011-10-27
106  The Sleeping Beauty Problem                             53  2015-06-12
28   Girl Named Florida solutions                            25  2011-11-10
110  The Inspection Paradox is Everywhere                    23  2015-08-18
54   Secularization in America: part six                     14  2012-07-10

``````
``````

In [44]:

``````
``````

Out[44]:

     title                                               plusses  date
110  The Inspection Paradox is Everywhere                    909  2015-08-18
25   All your Bayes are belong to us!                         59  2011-10-27
103  Bayesian survival analysis for "Game of Thrones"         54  2015-03-25
67   Software engineering practices for graduate st...        34  2013-05-06
102  Statistical inference is only mostly wrong               31  2015-03-02

``````