Author: Barnabas Makonda

  • Data Scource Tanzania OpenData

    This dataset contains ranking information of primary schools according to performance in primary school leaving certificate examinations.


In [65]:
%matplotlib inline
from collections import defaultdict
import json
from __future__ import division

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from matplotlib import rcParams
import matplotlib.cm as cm
import matplotlib as mpl

import seaborn as sns
sns.set_context("talk")
sns.set_style("white")

Load Data


In [3]:
df = pd.read_csv('PrimarySchoolsPerfomanceAndLocation-2014.csv')
df.shape


Out[3]:
(15866, 13)

In [8]:
df.columns


Out[8]:
Index([u'NAME', u'REGION', u'DISTRICT', u'OWNERSHIP', u'PASS_RATE',
       u'AVG_MARK', u'CHANGE_PREVIOUS_YEAR', u'RANK'],
      dtype='object')

In [9]:
col =['NAME','REGION','DISTRICT','OWNERSHIP','PASS_RATE','AVG_MARK','CHANGE_PREVIOUS_YEAR','RANK']
for c in df.columns:
    if c not in col:
        df=df.drop(c,axis=1)

In [10]:
df.shape


Out[10]:
(15866, 8)

In [11]:
df.OWNERSHIP.unique()


Out[11]:
array(['GOVERNMENT', nan, 'NON GOVERNMENT'], dtype=object)

In [12]:
df.head(10)


Out[12]:
NAME REGION DISTRICT OWNERSHIP PASS_RATE AVG_MARK CHANGE_PREVIOUS_YEAR RANK
0 BANGATA PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 51.52 99.29 13.05 9426
1 BWAWANI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 26.83 85.78 8.42 13056
2 EKENYWA PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 65.31 108.33 7.18 6842
3 EMAOI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 100.00 135.09 25.83 2296
4 ENDONYOESOIYE PR. SCHOOL ARUSHA ARUSHA NaN 43.84 96.10 11.51 10363
5 ENGALAONI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 88.10 120.17 30.46 4220
6 ENGORIKA PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 42.86 100.10 5.98 9195
7 ENGUTUKOITI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 4.76 70.10 -25.99 15425
8 ENYUATA PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 86.55 131.43 11.22 2666
9 GREEN ACRES PR. SCHOOL ARUSHA ARUSHA NON GOVERNMENT 100.00 200.03 8.15 84

We need to know number of primary school which did standard seven exam year 2014


In [13]:
df.shape[0]


Out[13]:
15866
There about 15866 schools

Ownership of the School

can be seen below schools are grouped either Government or Non Government schools and there are about 2833 schools which have no type


In [14]:
print df.OWNERSHIP.unique()


['GOVERNMENT' nan 'NON GOVERNMENT']

how many do not have type or they are empty?


In [15]:
df[df.OWNERSHIP.isnull()].shape[0]


Out[15]:
2833

Sample of school which ownership is empty


In [16]:
df[df.OWNERSHIP.isnull()].head(10)


Out[16]:
NAME REGION DISTRICT OWNERSHIP PASS_RATE AVG_MARK CHANGE_PREVIOUS_YEAR RANK
4 ENDONYOESOIYE PR. SCHOOL ARUSHA ARUSHA NaN 43.84 96.10 11.51 10363
18 KAMBI YA MAZIWA PR. SCHOOL ARUSHA ARUSHA NaN 86.67 122.29 -0.13 3875
67 OLDONYOSAPUK PR. SCHOOL ARUSHA ARUSHA NaN 98.73 160.06 66.59 744
86 THEMI YA SIMBA PR. SCHOOL ARUSHA ARUSHA NaN 44.44 95.61 26.11 10496
94 SOTWA WILSON PR. SCHOOL ARUSHA ARUSHA NaN 100.00 120.75 3.32 4118
98 ASSALAF ISLAMIC PR. SCHOOL ARUSHA ARUSHA NaN 100.00 162.83 0.94 673
104 ARUSHA HILL SIDE ARUSHA ARUSHA NaN 28.57 93.57 0.00 11085
105 EBENEZER ARUSHA ARUSHA NaN 100.00 159.29 0.00 774
106 EUNOTO ARUSHA ARUSHA NaN 72.41 118.93 0.00 4490
107 ILKONERE ARUSHA ARUSHA NaN 68.75 107.90 0.00 6953

Those which ownership is not null, either Government or Non Government


In [17]:
df[df.OWNERSHIP.notnull()].head(10)


Out[17]:
NAME REGION DISTRICT OWNERSHIP PASS_RATE AVG_MARK CHANGE_PREVIOUS_YEAR RANK
0 BANGATA PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 51.52 99.29 13.05 9426
1 BWAWANI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 26.83 85.78 8.42 13056
2 EKENYWA PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 65.31 108.33 7.18 6842
3 EMAOI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 100.00 135.09 25.83 2296
5 ENGALAONI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 88.10 120.17 30.46 4220
6 ENGORIKA PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 42.86 100.10 5.98 9195
7 ENGUTUKOITI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 4.76 70.10 -25.99 15425
8 ENYUATA PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 86.55 131.43 11.22 2666
9 GREEN ACRES PR. SCHOOL ARUSHA ARUSHA NON GOVERNMENT 100.00 200.03 8.15 84
10 ILBORU PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 93.94 131.69 6.17 2641

and they are 13033


In [18]:
df[df.OWNERSHIP.notnull()].shape[0]


Out[18]:
13033

Lets plot a pie chat to visualize the data


In [19]:
government_schools =sum(df.OWNERSHIP=='GOVERNMENT') #Government schools
nongovernment_schools =sum(df.OWNERSHIP=='NON GOVERNMENT') #nongovernment schools
unknown = sum(df.OWNERSHIP.isnull()) #number of shools with unknown ownership
schl=df.shape[0] #number of schools

In [20]:
Labels =['Govenment', 'Non Government','Unknown']
fractions =[float(government_schools)/schl, float(nongovernment_schools)/schl, float(unknown)/schl] #percentage
colors = ['yellowgreen', 'gold', 'lightskyblue'] #colors for pie chart
explode = (0, 0, 0) #only explode the first slice

plt.pie(fractions, explode=explode, labels=Labels, colors=colors,
        autopct='%1.1f%%', shadow=True, startangle=90)
# Set aspect ratio to be equal so that pie is drawn as a circle.
plt.axis('equal')
plt.show()


Lets play with the numbers now

Lets look and see summary of the Pass Rate, Avarage Mark and Change compared to previous year


In [22]:
df[['PASS_RATE','AVG_MARK','CHANGE_PREVIOUS_YEAR']].describe()


Out[22]:
PASS_RATE AVG_MARK CHANGE_PREVIOUS_YEAR
count 15866.000000 15866.000000 15866.000000
mean 55.935164 108.653736 4.821223
std 28.020727 26.142989 17.869999
min 0.000000 46.620000 -89.220000
25% 33.330000 90.440000 -3.960000
50% 55.320000 104.500000 0.470000
75% 80.000000 121.760000 12.385000
max 100.000000 234.700000 118.980000

The avarage Passing rate was 55.93% and Maximum passing rate was 100%(All students passed examination) while lowest was 0(Nobody passed)

Avarage mark was 108.7 maximum avarage mark per school was 234.7 and minimum was 46.62

NB:These marks are for 5 subjects hence total of 250 marks

Lets play a little more

  • How many schools did have 100% pass rate?
  • Is there a government school which had 100% pass rate?
  • Which school did best of all schools?
  • Which did bad?

In [41]:
df[df.PASS_RATE == 100 ]


Out[41]:
NAME REGION DISTRICT OWNERSHIP PASS_RATE AVG_MARK CHANGE_PREVIOUS_YEAR RANK
3 EMAOI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 100 135.09 25.83 2296
9 GREEN ACRES PR. SCHOOL ARUSHA ARUSHA NON GOVERNMENT 100 200.03 8.15 84
19 KIGONGONI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 100 140.12 52.26 1824
20 KILIMANJARO PR. SCHOOL ARUSHA ARUSHA NON GOVERNMENT 100 194.59 21.03 120
60 NURU PR. SCHOOL ARUSHA ARUSHA NON GOVERNMENT 100 139.81 -11.19 1847
65 OLDENDERETI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 100 140.04 3.99 1829
69 OLGILAI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 100 151.46 45.51 1101
74 OLORIEN VALLEY PR. SCHOOL ARUSHA ARUSHA NON GOVERNMENT 100 158.91 -15.03 788
83 SITI MWINYI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 100 177.69 13.69 359
85 ST MAGRET PR. SCHOOL ARUSHA ARUSHA NON GOVERNMENT 100 184.62 -41.24 235
88 TRUST ST PATRICK PR. SCHOOL ARUSHA ARUSHA NON GOVERNMENT 100 181.72 -1.28 285
91 NEW LIFE PR. SCHOOL ARUSHA ARUSHA NON GOVERNMENT 100 173.53 26.65 438
93 YAKINI PR. SCHOOL ARUSHA ARUSHA NON GOVERNMENT 100 196.94 6.16 98
94 SOTWA WILSON PR. SCHOOL ARUSHA ARUSHA NaN 100 120.75 3.32 4118
95 WILKABAA PR. SCHOOL ARUSHA ARUSHA NON GOVERNMENT 100 147.90 -12.58 1277
96 ALBEHIJE PR. SCHOOL ARUSHA ARUSHA NON GOVERNMENT 100 190.11 54.25 151
97 NEW VISION PR. SCHOOL ARUSHA ARUSHA NON GOVERNMENT 100 163.33 1.51 665
98 ASSALAF ISLAMIC PR. SCHOOL ARUSHA ARUSHA NaN 100 162.83 0.94 673
102 ST PAULO PR. SCHOOL ARUSHA ARUSHA NON GOVERNMENT 100 135.63 -18.25 2237
103 GOODWILL PR. SCHOOL ARUSHA ARUSHA NON GOVERNMENT 100 142.06 7.20 1664
105 EBENEZER ARUSHA ARUSHA NaN 100 159.29 0.00 774
108 MAASAI JOY ARUSHA ARUSHA NaN 100 133.82 0.00 2421
109 SONGAMBELE ARUSHA ARUSHA NaN 100 148.17 0.00 1257
129 THEMI PR. SCHOOL ARUSHA ARUSHA(M) GOVERNMENT 100 148.05 6.66 1266
133 JAFFERY PR. SCHOOL ARUSHA ARUSHA(M) NaN 100 152.36 -2.52 1052
135 ARUSHA MODERN PR. SCHOOL ARUSHA ARUSHA(M) NON GOVERNMENT 100 165.54 -10.59 610
136 ARUSHA INTERGRATED PR. SCHOOL ARUSHA ARUSHA(M) NON GOVERNMENT 100 141.63 -9.69 1698
138 PARENTS PR. SCHOOL ARUSHA ARUSHA(M) NON GOVERNMENT 100 169.04 15.11 539
140 IMANI PR. SCHOOL ARUSHA ARUSHA(M) NON GOVERNMENT 100 191.87 -17.57 138
142 SWIFTS PR. SCHOOL ARUSHA ARUSHA(M) NON GOVERNMENT 100 139.58 0.00 1868
... ... ... ... ... ... ... ... ...
15148 ITULIKE PR. SCHOOL NJOMBE MJI NJOMBE GOVERNMENT 100 174.42 32.24 422
15154 KILIMANI PR. SCHOOL NJOMBE MJI NJOMBE GOVERNMENT 100 146.26 35.08 1388
15158 KKKT VIZIWI PR. SCHOOL NJOMBE MJI NJOMBE NON GOVERNMENT 100 156.33 19.22 874
15160 LIHOGOSA PR. SCHOOL NJOMBE MJI NJOMBE GOVERNMENT 100 183.57 27.12 246
15163 LIVINGSTONE PR. SCHOOL NJOMBE MJI NJOMBE NON GOVERNMENT 100 181.90 -7.83 281
15176 MATOLA PR. SCHOOL NJOMBE MJI NJOMBE GOVERNMENT 100 130.52 27.72 2754
15180 MGALA PR. SCHOOL NJOMBE MJI NJOMBE GOVERNMENT 100 124.33 44.41 3550
15185 MKELA PR. SCHOOL NJOMBE MJI NJOMBE GOVERNMENT 100 132.43 -8.78 2568
15186 MPECHI PR. SCHOOL NJOMBE MJI NJOMBE GOVERNMENT 100 160.10 12.05 743
15195 NYAMYUYA PR. SCHOOL NJOMBE MJI NJOMBE NON GOVERNMENT 100 146.18 2.07 1395
15198 RUHUJI PR. SCHOOL NJOMBE MJI NJOMBE GOVERNMENT 100 176.14 7.17 386
15199 SABASABA PR. SCHOOL NJOMBE MJI NJOMBE NaN 100 157.54 22.50 840
15200 SENGA PR. SCHOOL NJOMBE MJI NJOMBE GOVERNMENT 100 153.62 -6.76 1007
15201 ST BENEDICT PR. SCHOOL NJOMBE MJI NJOMBE NON GOVERNMENT 100 196.92 -1.70 99
15208 COMPASSION PR. SCHOOL NJOMBE MJI NJOMBE NON GOVERNMENT 100 140.33 -24.21 1803
15209 MT BAKITA PR. SCHOOL NJOMBE MJI NJOMBE NaN 100 189.13 -22.01 166
15359 BARIADI ALLIANCE ENG MED SCHOOL SIMIYU BARIADI MJI NaN 100 215.94 -5.69 23
15397 HERBERT GAPPA ENGMEDIUM SIMIYU BARIADI MJI NaN 100 187.59 0.00 186
15433 MWAMIJONDO PR. SCHOOL SIMIYU BARIADI VIJIJINI GOVERNMENT 100 190.00 23.14 152
15522 MWAMAGULU PR. SCHOOL SIMIYU BUSEGA GOVERNMENT 100 158.00 28.20 824
15552 SIMBA WA YUDA ENG MED PR. SCHOOL SIMIYU BUSEGA NaN 100 169.03 -2.79 541
15554 YORDAN ENG MED PR. SCHOOL SIMIYU BUSEGA NaN 100 189.38 24.53 161
15635 MWAOGAMA PR. SCHOOL SIMIYU ITILIMA-DC GOVERNMENT 100 141.72 -5.85 1689
15644 BUKIGI PR. SCHOOL SIMIYU MASWA GOVERNMENT 100 124.00 37.79 3605
15649 DEKAPOLI ENGLISH MEDIUM SCHOOL SIMIYU MASWA NaN 100 172.94 -10.42 451
15722 MWANHEGELE PR. SCHOOL SIMIYU MASWA GOVERNMENT 100 166.67 2.10 584
15737 NYANGUGANWA PR. SCHOOL SIMIYU MASWA GOVERNMENT 100 143.55 0.00 1574
15748 ST JOSEPHINE BAKHITA ENGLISH MEDIUM PR. SCHOOL SIMIYU MASWA NaN 100 182.61 0.00 265
15753 ZEBEYA PR. SCHOOL SIMIYU MASWA GOVERNMENT 100 137.34 13.78 2062
15811 MWAGAYI PR. SCHOOL SIMIYU MEATU GOVERNMENT 100 169.86 109.89 519

1106 rows × 8 columns


In [42]:
print "There were %s schools which had 100 pass rate"%sum(df.PASS_RATE == 100 )


There were 1106 schools which had 100 pass rate

In [50]:
df[(df.PASS_RATE == 100) & (df.OWNERSHIP=="GOVERNMENT")].describe()


Out[50]:
PASS_RATE AVG_MARK CHANGE_PREVIOUS_YEAR RANK
count 532 532.000000 532.00000 532.000000
mean 100 154.375583 27.88265 1306.710526
std 0 18.891587 25.02545 986.919266
min 100 114.670000 -37.93000 25.000000
25% 100 140.210000 10.17000 573.000000
50% 100 152.150000 26.03000 1065.000000
75% 100 167.410000 44.23500 1813.500000
max 100 215.110000 117.02000 5398.000000

In [52]:
print "Sample of Government school which have 100% passing rate"
df[(df.PASS_RATE == 100) & (df.OWNERSHIP=="GOVERNMENT")].head(10)


Sample of Government school which have 100% passing rate
Out[52]:
NAME REGION DISTRICT OWNERSHIP PASS_RATE AVG_MARK CHANGE_PREVIOUS_YEAR RANK
3 EMAOI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 100 135.09 25.83 2296
19 KIGONGONI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 100 140.12 52.26 1824
65 OLDENDERETI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 100 140.04 3.99 1829
69 OLGILAI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 100 151.46 45.51 1101
83 SITI MWINYI PR. SCHOOL ARUSHA ARUSHA GOVERNMENT 100 177.69 13.69 359
129 THEMI PR. SCHOOL ARUSHA ARUSHA(M) GOVERNMENT 100 148.05 6.66 1266
145 ENGIRA PR. SCHOOL ARUSHA ARUSHA(M) GOVERNMENT 100 132.71 11.76 2534
230 GONGALI PR. SCHOOL ARUSHA KARATU GOVERNMENT 100 138.86 -13.37 1936
231 GYETIGHI PR. SCHOOL ARUSHA KARATU GOVERNMENT 100 182.59 11.14 266
304 MARMO PR. SCHOOL ARUSHA KARATU GOVERNMENT 100 159.19 -0.05 776

Lets group 100% pass rate by region


In [56]:
grouped = df[df.PASS_RATE == 100 ].groupby(df['REGION'])
passed_per_region = grouped.count()
passed_per_region.NAME


Out[56]:
REGION
ARUSHA           109
DAR ES SALAAM    121
DODOMA            23
GEITA             36
IRINGA            41
KAGERA            96
KATAVI             7
KIGOMA             9
KILIMANJARO       92
LINDI             14
MANYARA           28
MARA              33
MBEYA             16
MOROGORO          50
MTWARA            40
MWANZA            87
NJOMBE            36
PWANI             42
RUKWA              8
RUVUMA            39
SHINYANGA         21
SIMIYU            14
SINGIDA           18
TABORA            21
TANGA            105
Name: NAME, dtype: int64

Dar es salaam leads with 121 schools and Katavi is the last with 7


In [63]:
df.groupby(df.REGION).count().NAME


Out[63]:
REGION
ARUSHA            589
DAR ES SALAAM     506
DODOMA            727
GEITA             556
IRINGA            458
KAGERA            909
KATAVI            159
KIGOMA            627
KILIMANJARO       927
LINDI             462
MANYARA           542
MARA              722
MBEYA            1046
MOROGORO          844
MTWARA            609
MWANZA            891
NJOMBE            455
PWANI             520
RUKWA             348
RUVUMA            729
SHINYANGA         559
SIMIYU            508
SINGIDA           491
TABORA            723
TANGA             959
Name: NAME, dtype: int64

MBEYA is leading by having 1046

We can visulize our data more by using seaborn


In [76]:
sns.set_context("notebook")
#lets get mean Pass Rate
mean_pass = df.PASS_RATE.mean()
print mean_pass, df.PASS_RATE.median()

with sns.axes_style("whitegrid"):
    df.PASS_RATE.hist(bins=30, alpha=0.4);
    plt.axvline(mean_pass, 0, 0.75, color='r', label='Mean')
    plt.xlabel("Pass Rate")
    plt.ylabel("Counts")
    plt.title("Passing Rate Hisyogram")
    plt.legend()
    sns.despine()


55.9351638724 55.32

In [81]:
with sns.axes_style("whitegrid"):
    df.CHANGE_PREVIOUS_YEAR.hist(bins=15, alpha=0.6, color='r');
    plt.xlabel("change of passing rate comapred to 2013")
    plt.ylabel("school number")
    plt.title("Change of passing rate Hisyogram")
    plt.legend()



In [84]:
with sns.axes_style("whitegrid"):
    df.AVG_MARK.hist(bins=40,alpha=0.6, color='g')
    plt.xlabel("Avarage mark per school")
    plt.ylabel("school number")
    plt.title("Avarage Marks Hisyogram")
    plt.legend()


Thank You