School Populations in Edinburgh/Scotland

The data set used in this notebook comes from http://www.opendatascotland.org/. It lists how many pupils attend each school, along with the geographical location of the school. Let's mine this data and see if we can discover any interesting trends.

Import Statements

Import statements are a bit like citations in a textbook, which let us quote and use the work of another author. They let us bring extra tools into the notebook that we may need when analysing a data set.



In [2]:

    
# This is a comment. The notebook does not execute comments; they only provide information to the reader.
# We use pandas to manage tabular data such as spreadsheets, comma-separated values
# and tab-separated values.
import pandas as pd  # This is an import statement.
# Bokeh is a very cool library which we will use for visualizing the data in interesting ways.
# Only the two names used in this notebook are imported, instead of a wildcard import.
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource
# NumPy is a numerical package which we will use to deal with all the maths and metrics
# that we may want to compute from this data set.
import numpy as np

Reading and Visualizing Tabular Data

As mentioned in the comments of the previous code snippet, pandas is a great library for dealing with tabular data when programming in Python.
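For readers who do not have the CSV file at hand, the loading step can be tried on a small inline sample. This is only a minimal sketch: the three rows are copied from the table shown in this notebook, the `school` URL column is left out, and the names `sample_csv` and `sample_df` are illustrative rather than part of the original analysis.

```python
import io
import pandas as pd

# Three rows copied from the schools table; the `school` URL column is omitted.
sample_csv = io.StringIO(
    "school_label,latitude,longitude,pupils\n"
    "Linlithgow Academy,55.97160,-3.61259,1231\n"
    "Broxburn Academy,55.93694,-3.48778,903\n"
    "Dechmont,55.91956,-3.54255,14\n"
)

# read_csv accepts a file path or any file-like object.
sample_df = pd.read_csv(sample_csv)
print(sample_df.shape)               # (rows, columns)
print(sample_df["pupils"].tolist())  # the pupils column as a plain list
```

The same call with a path argument, as in the next cell, loads the full file.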



In [3]:

    
# Load the data in to the notebook:
gov_csv = pd.read_csv("../../data/opendata/tutorial_1/schools.csv")
# Display the tabular data 
gov_csv









    Out[3]:

	school	school_label	latitude	longitude	pupils
0	http://data.opendatascotland.org/id/educationa...	Linlithgow Academy	55.97160	-3.61259	1231
1	http://data.opendatascotland.org/id/educationa...	St Kentigern's Academy	55.87101	-3.63367	1215
2	http://data.opendatascotland.org/id/educationa...	James Young High,The	55.88093	-3.51523	1135
3	http://data.opendatascotland.org/id/educationa...	St Margaret's Academy	55.88937	-3.52213	1094
4	http://data.opendatascotland.org/id/educationa...	Inveralmond Community High	55.90146	-3.51932	1090
5	http://data.opendatascotland.org/id/educationa...	West Calder High	55.86291	-3.54044	950
6	http://data.opendatascotland.org/id/educationa...	Deans Community High	55.90581	-3.54977	941
7	http://data.opendatascotland.org/id/educationa...	Broxburn Academy	55.93694	-3.48778	903
8	http://data.opendatascotland.org/id/educationa...	Bathgate Academy	55.89838	-3.61313	899
9	http://data.opendatascotland.org/id/educationa...	Whitburn Academy	55.86804	-3.67964	822
10	http://data.opendatascotland.org/id/educationa...	Armadale Academy	55.89481	-3.71436	780
11	http://data.opendatascotland.org/id/educationa...	Armadale	55.89717	-3.70321	440
12	http://data.opendatascotland.org/id/educationa...	Balbardie	55.90518	-3.63735	423
13	http://data.opendatascotland.org/id/educationa...	Linlithgow	55.97165	-3.60945	417
14	http://data.opendatascotland.org/id/educationa...	Peel	55.89497	-3.53573	407
15	http://data.opendatascotland.org/id/educationa...	Williamston	55.87527	-3.50414	404
16	http://data.opendatascotland.org/id/educationa...	Carmondean	55.90662	-3.54190	402
17	http://data.opendatascotland.org/id/educationa...	St Mary's, Bathgate	55.90016	-3.64731	401
18	http://data.opendatascotland.org/id/educationa...	Harrysmuir	55.90146	-3.51932	401
19	http://data.opendatascotland.org/id/educationa...	St John Ogilvie	55.90437	-3.55298	368
20	http://data.opendatascotland.org/id/educationa...	St Nicholas	55.93284	-3.48396	364
21	http://data.opendatascotland.org/id/educationa...	St Nicholas	55.93461	-3.47285	364
22	http://data.opendatascotland.org/id/educationa...	Broxburn	55.93545	-3.47440	361
23	http://data.opendatascotland.org/id/educationa...	Windyknowe	55.89849	-3.66481	359
24	http://data.opendatascotland.org/id/educationa...	Parkhead	55.85232	-3.56651	354
25	http://data.opendatascotland.org/id/educationa...	Whitdale Primary	55.86560	-3.67475	345
26	http://data.opendatascotland.org/id/educationa...	Simpson Primary	55.89066	-3.62362	336
27	http://data.opendatascotland.org/id/educationa...	Bankton	55.88157	-3.50703	331
28	http://data.opendatascotland.org/id/educationa...	Eastertoun	55.89812	-3.71076	329
29	http://data.opendatascotland.org/id/educationa...	Howden St Andrew's	55.89300	-3.50936	322
...	...	...	...	...	...
54	http://data.opendatascotland.org/id/educationa...	St John The Baptist	55.83026	-3.70456	176
55	http://data.opendatascotland.org/id/educationa...	Polkemmet	55.86204	-3.69082	169
56	http://data.opendatascotland.org/id/educationa...	Falla Hill	55.82792	-3.71016	168
57	http://data.opendatascotland.org/id/educationa...	Pumpherston and Uphall Station	55.90655	-3.49183	168
58	http://data.opendatascotland.org/id/educationa...	Pumpherston and Uphall Station	55.90885	-3.49211	168
59	http://data.opendatascotland.org/id/educationa...	Our Lady of Lourdes	55.87509	-3.62618	148
60	http://data.opendatascotland.org/id/educationa...	Blackridge	55.88429	-3.77368	143
61	http://data.opendatascotland.org/id/educationa...	St Columba's	55.90156	-3.60997	126
62	http://data.opendatascotland.org/id/educationa...	St Joseph's, Linlithgow	55.97165	-3.60945	124
63	http://data.opendatascotland.org/id/educationa...	St Paul's	55.89645	-3.45757	117
64	http://data.opendatascotland.org/id/educationa...	St Mary's, Polbeth	55.85996	-3.55515	115
65	http://data.opendatascotland.org/id/educationa...	Pinewood	55.87249	-3.61281	115
66	http://data.opendatascotland.org/id/educationa...	Seafield	55.87864	-3.58856	115
67	http://data.opendatascotland.org/id/educationa...	Winchburgh	55.95631	-3.46832	102
68	http://data.opendatascotland.org/id/educationa...	Stoneyburn	55.84932	-3.62739	96
69	http://data.opendatascotland.org/id/educationa...	Addiewell	55.84520	-3.61746	95
70	http://data.opendatascotland.org/id/educationa...	Longridge	55.84493	-3.67811	92
71	http://data.opendatascotland.org/id/educationa...	Bridgend	55.96256	-3.53472	78
72	http://data.opendatascotland.org/id/educationa...	Cedarbank	55.89696	-3.51670	75
73	http://data.opendatascotland.org/id/educationa...	Torphichen	55.93437	-3.65513	71
74	http://data.opendatascotland.org/id/educationa...	Holy Family	55.95631	-3.46832	60
75	http://data.opendatascotland.org/id/educationa...	Blackburn	55.87750	-3.63012	59
76	http://data.opendatascotland.org/id/educationa...	Ogilvie School Campus	55.90653	-3.52634	56
77	http://data.opendatascotland.org/id/educationa...	Our Lady's Primary	55.84601	-3.63337	53
78	http://data.opendatascotland.org/id/educationa...	Westfield	55.92798	-3.70243	38
79	http://data.opendatascotland.org/id/educationa...	St Thomas'	55.84520	-3.61746	36
80	http://data.opendatascotland.org/id/educationa...	Beatlie School	55.89686	-3.49517	34
81	http://data.opendatascotland.org/id/educationa...	Woodmuir	55.82852	-3.65678	27
82	http://data.opendatascotland.org/id/educationa...	Burnhouse	55.86204	-3.69082	16
83	http://data.opendatascotland.org/id/educationa...	Dechmont	55.91956	-3.54255	14

84 rows × 5 columns

Looking at the table, we can see that not all schools have the same number of pupils. But how unevenly are pupils actually spread across schools in Scotland? Can we display the table in a better way that lets us draw a conclusion about the spread of pupils in Scottish schools?
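Before plotting anything, a few summary numbers can give a first feel for the spread. The sketch below is an illustration only: it uses the first five and last five pupil counts from the table above, so the figures it prints describe that hand-picked subset and not the full 84 rows. The variable names are assumptions made for this example; on the real data one would pass `gov_csv["pupils"]` instead.

```python
import numpy as np

# First five and last five pupil counts from the table (illustrative subset only).
pupils = np.array([1231, 1215, 1135, 1094, 1090, 36, 34, 27, 16, 14])

mean = pupils.mean()          # arithmetic mean
median = np.median(pupils)    # middle value of the sorted counts
spread = pupils.std()         # standard deviation (population form)

print("smallest school:", pupils.min())
print("largest school: ", pupils.max())
print("mean:  ", mean)
print("median:", median)
print("std:   ", spread)
```

A large gap between the smallest and largest values, or a standard deviation comparable to the mean, suggests that a plot is worth making.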



In [4]:

    
# Create a histogram to visualize the tabular data:
pupils_histogram = gov_csv["pupils"]
number_of_schools = len(pupils_histogram)
# one bin per school; list() so the edges can be sliced and passed to Bokeh
bins = list(range(number_of_schools + 1))
# plot a bokeh figure
hist_fig = figure(title="Pupil Population Within Scottish Schools")
hist_fig.quad(top=pupils_histogram, bottom=0, left=bins[:-1], right=bins[1:],
              fill_color="#036564", line_color="#033649")
hist_fig.xaxis.axis_label = 'schools'
hist_fig.yaxis.axis_label = 'number of pupils'
# Using school names as labels sadly looks very ugly. A hover tool could overcome this.
# x = list(gov_csv["school_label"])
# b = Bar(pupils_histogram, title="Sorted by School Name Length",cat=x,);



In [5]:

    
output_notebook()

    BokehJS successfully loaded.



In [6]:

    
show(hist_fig)

Much better! We can see that the number of pupils decreases almost linearly, except for one sudden drop. Can we use the histogram to find out why?
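One way to locate the sudden drop numerically is to look at the differences between consecutive pupil counts. The following is a minimal sketch, using six consecutive rows (8 to 13) copied from the table above; the variable names are illustrative and the approach is only one possible way to find the jump.

```python
import numpy as np

# Rows 8 to 13 of the table, already sorted by number of pupils.
labels = ["Bathgate Academy", "Whitburn Academy", "Armadale Academy",
          "Armadale", "Balbardie", "Linlithgow"]
pupils = np.array([899, 822, 780, 440, 423, 417])

# Difference between each school and the one before it (negative = decrease).
drops = np.diff(pupils)

# Position of the steepest decrease.
biggest = int(np.argmin(drops))
print("largest drop:", drops[biggest])
print("between", labels[biggest], "and", labels[biggest + 1])
```

In this sample the largest drop falls between Armadale Academy (780 pupils) and Armadale (440 pupils).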

Let's say I have a hypothesis: parents like sending their kids to schools that have long names because it sounds pretentious in their social circles.

To motivate this (NOTE: motivate only, not prove in any formal way!) I will generate the exact same histogram, except that the bins will be sorted by school name length as opposed to number of pupils.



In [7]:

    
# obtain the length of each school name
int_index = gov_csv["school_label"].str.len()
# replace the index of the table with int_index and sort the rows by it
gov_csv = gov_csv.set_index(int_index).sort_index()



In [8]:

    
# Create a histogram to visualize the sorted tabular data.
# bokeh.charts (and its Bar class) is not available in recent Bokeh releases,
# so the bars are drawn with figure.vbar from bokeh.plotting instead.
# Group names with the same length and sum up their populations
pupils_histogram = gov_csv.groupby(gov_csv.index)["pupils"].sum()
number_of_schools = len(pupils_histogram)
# use the name lengths as categories so the x axis is labelled with relevant values
x = [str(length) for length in pupils_histogram.index]
b = figure(x_range=x, title="Sorted by School Name Length")
b.vbar(x=x, top=list(pupils_histogram), width=0.9)
b.xaxis.axis_label = 'school name lengths'
b.yaxis.axis_label = 'number of pupils'



In [9]:

    
show(b)

It's not a very strong motivation; maybe a cumulative version of this would help in making a stronger conclusion. ~~This could be partially set up and left as an exercise?~~
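A cumulative version could be set up along the following lines. This is a hedged sketch, not a finished analysis: it uses six rows copied from the table, and the names `sample`, `per_length` and `cumulative` are made up for the example. On the full data set the same steps would apply to `gov_csv`.

```python
import pandas as pd

# Six rows copied from the table (illustrative subset only).
sample = pd.DataFrame({
    "school_label": ["Peel", "Dechmont", "Broxburn",
                     "Bathgate Academy", "Broxburn Academy",
                     "Linlithgow Academy"],
    "pupils": [407, 14, 361, 899, 903, 1231],
})

# Length of each school name, used as the grouping key.
name_length = sample["school_label"].str.len()

# Total pupils per name length, ordered from shortest to longest name.
per_length = sample.groupby(name_length)["pupils"].sum()

# Running total: pupils in schools whose name is at most this long.
cumulative = per_length.cumsum()
print(cumulative)
```

If long names really attracted more pupils, the running total would be expected to grow slowly for short names and steeply towards the long ones; a sample this small cannot show that either way.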


