University of Zagreb
Faculty of Electrical Engineering and Computing
http://www.fer.unizg.hr/predmet/su
Academic year 2015/2016
(c) 2015 Jan Šnajder
Version: 0.8 (2015-10-15)
In [81]:
# Load the basic libraries...
import scipy as sp
import sklearn
import pandas as pd
%pylab inline
Populating the interactive namespace from numpy and matplotlib
VARIABLE DESCRIPTIONS:
survival  Survival (0 = No; 1 = Yes)
pclass    Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
name      Name
sex       Sex
age       Age
sibsp     Number of Siblings/Spouses Aboard
parch     Number of Parents/Children Aboard
ticket    Ticket Number
fare      Passenger Fare
cabin     Cabin
embarked  Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)

SPECIAL NOTES:
Pclass is a proxy for socio-economic status (SES)
1st ~ Upper; 2nd ~ Middle; 3rd ~ Lower

Age is in Years; Fractional if Age less than One (1)
If the Age is Estimated, it is in the form xx.5

With respect to the family relation variables (i.e. sibsp and parch) some relations were ignored. The following are the definitions used for sibsp and parch.

Sibling: Brother, Sister, Stepbrother, or Stepsister of Passenger Aboard Titanic
Spouse: Husband or Wife of Passenger Aboard Titanic (Mistresses and Fiances Ignored)
Parent: Mother or Father of Passenger Aboard Titanic
Child: Son, Daughter, Stepson, or Stepdaughter of Passenger Aboard Titanic

Other family relatives excluded from this study include cousins, nephews/nieces, aunts/uncles, and in-laws. Some children travelled only with a nanny, therefore parch=0 for them. As well, some travelled with very close friends or neighbors in a village, however, the definitions do not support such relations.
In [32]:
titanic_df = pd.read_csv("../data/titanic-train.csv")
titanic_df
Out[32]:
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2 | 3 | 1 | 349909 | 21.0750 | NaN | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14 | 1 | 0 | 237736 | 30.0708 | NaN | C |
| 10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4 | 1 | 1 | PP 9549 | 16.7000 | G6 | S |
| 11 | 12 | 1 | 1 | Bonnell, Miss. Elizabeth | female | 58 | 0 | 0 | 113783 | 26.5500 | C103 | S |
| 12 | 13 | 0 | 3 | Saundercock, Mr. William Henry | male | 20 | 0 | 0 | A/5. 2151 | 8.0500 | NaN | S |
| 13 | 14 | 0 | 3 | Andersson, Mr. Anders Johan | male | 39 | 1 | 5 | 347082 | 31.2750 | NaN | S |
| 14 | 15 | 0 | 3 | Vestrom, Miss. Hulda Amanda Adolfina | female | 14 | 0 | 0 | 350406 | 7.8542 | NaN | S |
| 15 | 16 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55 | 0 | 0 | 248706 | 16.0000 | NaN | S |
| 16 | 17 | 0 | 3 | Rice, Master. Eugene | male | 2 | 4 | 1 | 382652 | 29.1250 | NaN | Q |
| 17 | 18 | 1 | 2 | Williams, Mr. Charles Eugene | male | NaN | 0 | 0 | 244373 | 13.0000 | NaN | S |
| 18 | 19 | 0 | 3 | Vander Planke, Mrs. Julius (Emelia Maria Vande... | female | 31 | 1 | 0 | 345763 | 18.0000 | NaN | S |
| 19 | 20 | 1 | 3 | Masselmani, Mrs. Fatima | female | NaN | 0 | 0 | 2649 | 7.2250 | NaN | C |
| 20 | 21 | 0 | 2 | Fynney, Mr. Joseph J | male | 35 | 0 | 0 | 239865 | 26.0000 | NaN | S |
| 21 | 22 | 1 | 2 | Beesley, Mr. Lawrence | male | 34 | 0 | 0 | 248698 | 13.0000 | D56 | S |
| 22 | 23 | 1 | 3 | McGowan, Miss. Anna "Annie" | female | 15 | 0 | 0 | 330923 | 8.0292 | NaN | Q |
| 23 | 24 | 1 | 1 | Sloper, Mr. William Thompson | male | 28 | 0 | 0 | 113788 | 35.5000 | A6 | S |
| 24 | 25 | 0 | 3 | Palsson, Miss. Torborg Danira | female | 8 | 3 | 1 | 349909 | 21.0750 | NaN | S |
| 25 | 26 | 1 | 3 | Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... | female | 38 | 1 | 5 | 347077 | 31.3875 | NaN | S |
| 26 | 27 | 0 | 3 | Emir, Mr. Farred Chehab | male | NaN | 0 | 0 | 2631 | 7.2250 | NaN | C |
| 27 | 28 | 0 | 1 | Fortune, Mr. Charles Alexander | male | 19 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S |
| 28 | 29 | 1 | 3 | O'Dwyer, Miss. Ellen "Nellie" | female | NaN | 0 | 0 | 330959 | 7.8792 | NaN | Q |
| 29 | 30 | 0 | 3 | Todoroff, Mr. Lalio | male | NaN | 0 | 0 | 349216 | 7.8958 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 861 | 862 | 0 | 2 | Giles, Mr. Frederick Edward | male | 21 | 1 | 0 | 28134 | 11.5000 | NaN | S |
| 862 | 863 | 1 | 1 | Swift, Mrs. Frederick Joel (Margaret Welles Ba... | female | 48 | 0 | 0 | 17466 | 25.9292 | D17 | S |
| 863 | 864 | 0 | 3 | Sage, Miss. Dorothy Edith "Dolly" | female | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S |
| 864 | 865 | 0 | 2 | Gill, Mr. John William | male | 24 | 0 | 0 | 233866 | 13.0000 | NaN | S |
| 865 | 866 | 1 | 2 | Bystrom, Mrs. (Karolina) | female | 42 | 0 | 0 | 236852 | 13.0000 | NaN | S |
| 866 | 867 | 1 | 2 | Duran y More, Miss. Asuncion | female | 27 | 1 | 0 | SC/PARIS 2149 | 13.8583 | NaN | C |
| 867 | 868 | 0 | 1 | Roebling, Mr. Washington Augustus II | male | 31 | 0 | 0 | PC 17590 | 50.4958 | A24 | S |
| 868 | 869 | 0 | 3 | van Melkebeke, Mr. Philemon | male | NaN | 0 | 0 | 345777 | 9.5000 | NaN | S |
| 869 | 870 | 1 | 3 | Johnson, Master. Harold Theodor | male | 4 | 1 | 1 | 347742 | 11.1333 | NaN | S |
| 870 | 871 | 0 | 3 | Balkic, Mr. Cerin | male | 26 | 0 | 0 | 349248 | 7.8958 | NaN | S |
| 871 | 872 | 1 | 1 | Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | female | 47 | 1 | 1 | 11751 | 52.5542 | D35 | S |
| 872 | 873 | 0 | 1 | Carlsson, Mr. Frans Olof | male | 33 | 0 | 0 | 695 | 5.0000 | B51 B53 B55 | S |
| 873 | 874 | 0 | 3 | Vander Cruyssen, Mr. Victor | male | 47 | 0 | 0 | 345765 | 9.0000 | NaN | S |
| 874 | 875 | 1 | 2 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28 | 1 | 0 | P/PP 3381 | 24.0000 | NaN | C |
| 875 | 876 | 1 | 3 | Najib, Miss. Adele Kiamie "Jane" | female | 15 | 0 | 0 | 2667 | 7.2250 | NaN | C |
| 876 | 877 | 0 | 3 | Gustafsson, Mr. Alfred Ossian | male | 20 | 0 | 0 | 7534 | 9.8458 | NaN | S |
| 877 | 878 | 0 | 3 | Petroff, Mr. Nedelio | male | 19 | 0 | 0 | 349212 | 7.8958 | NaN | S |
| 878 | 879 | 0 | 3 | Laleff, Mr. Kristo | male | NaN | 0 | 0 | 349217 | 7.8958 | NaN | S |
| 879 | 880 | 1 | 1 | Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | female | 56 | 0 | 1 | 11767 | 83.1583 | C50 | C |
| 880 | 881 | 1 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | female | 25 | 0 | 1 | 230433 | 26.0000 | NaN | S |
| 881 | 882 | 0 | 3 | Markun, Mr. Johann | male | 33 | 0 | 0 | 349257 | 7.8958 | NaN | S |
| 882 | 883 | 0 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22 | 0 | 0 | 7552 | 10.5167 | NaN | S |
| 883 | 884 | 0 | 2 | Banfield, Mr. Frederick James | male | 28 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | NaN | S |
| 884 | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NaN | S |
| 885 | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39 | 0 | 5 | 382652 | 29.1250 | NaN | Q |
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26 | 0 | 0 | 111369 | 30.0000 | C148 | C |
| 890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32 | 0 | 0 | 370376 | 7.7500 | NaN | Q |

891 rows × 12 columns
In [33]:
titanic_df.drop(['PassengerId'], axis=1, inplace=True)
In [34]:
titanic_df.describe()
Out[34]:
|  | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|
| count | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
| mean | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
| 50% | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
| 75% | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
| max | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
In [35]:
titanic_df1 = titanic_df[['Pclass', 'Sex', 'Age','Survived']]
titanic_df1
Out[35]:
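|  | Pclass | Sex | Age | Survived |
|---|---|---|---|---|
| 0 | 3 | male | 22 | 0 |
| 1 | 1 | female | 38 | 1 |
| 2 | 3 | female | 26 | 1 |
| 3 | 1 | female | 35 | 1 |
| 4 | 3 | male | 35 | 0 |
| 5 | 3 | male | NaN | 0 |
| 6 | 1 | male | 54 | 0 |
| 7 | 3 | male | 2 | 0 |
| 8 | 3 | female | 27 | 1 |
| 9 | 2 | female | 14 | 1 |
| 10 | 3 | female | 4 | 1 |
| 11 | 1 | female | 58 | 1 |
| 12 | 3 | male | 20 | 0 |
| 13 | 3 | male | 39 | 0 |
| 14 | 3 | female | 14 | 0 |
| 15 | 2 | female | 55 | 1 |
| 16 | 3 | male | 2 | 0 |
| 17 | 2 | male | NaN | 1 |
| 18 | 3 | female | 31 | 0 |
| 19 | 3 | female | NaN | 1 |
| 20 | 2 | male | 35 | 0 |
| 21 | 2 | male | 34 | 1 |
| 22 | 3 | female | 15 | 1 |
| 23 | 1 | male | 28 | 1 |
| 24 | 3 | female | 8 | 0 |
| 25 | 3 | female | 38 | 1 |
| 26 | 3 | male | NaN | 0 |
| 27 | 1 | male | 19 | 0 |
| 28 | 3 | female | NaN | 1 |
| 29 | 3 | male | NaN | 0 |
| ... | ... | ... | ... | ... |
| 861 | 2 | male | 21 | 0 |
| 862 | 1 | female | 48 | 1 |
| 863 | 3 | female | NaN | 0 |
| 864 | 2 | male | 24 | 0 |
| 865 | 2 | female | 42 | 1 |
| 866 | 2 | female | 27 | 1 |
| 867 | 1 | male | 31 | 0 |
| 868 | 3 | male | NaN | 0 |
| 869 | 3 | male | 4 | 1 |
| 870 | 3 | male | 26 | 0 |
| 871 | 1 | female | 47 | 1 |
| 872 | 1 | male | 33 | 0 |
| 873 | 3 | male | 47 | 0 |
| 874 | 2 | female | 28 | 1 |
| 875 | 3 | female | 15 | 1 |
| 876 | 3 | male | 20 | 0 |
| 877 | 3 | male | 19 | 0 |
| 878 | 3 | male | NaN | 0 |
| 879 | 1 | female | 56 | 1 |
| 880 | 2 | female | 25 | 1 |
| 881 | 3 | male | 33 | 0 |
| 882 | 3 | female | 22 | 0 |
| 883 | 2 | male | 28 | 0 |
| 884 | 3 | male | 25 | 0 |
| 885 | 3 | female | 39 | 0 |
| 886 | 2 | male | 27 | 0 |
| 887 | 1 | female | 19 | 1 |
| 888 | 3 | female | NaN | 0 |
| 889 | 1 | male | 26 | 1 |
| 890 | 3 | male | 32 | 0 |

891 rows × 4 columns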
In [36]:
survivors = titanic_df1[titanic_df1['Survived']==1]
victims = titanic_df1[titanic_df1['Survived']==0]
In [37]:
scatter(titanic_df1['Age'], titanic_df1['Pclass'],
c=titanic_df1['Survived'], cmap='prism', marker='o', s=100, alpha=0.5);
In [38]:
# .values replaces as_matrix(), which was removed in pandas 1.0
titanic_X = titanic_df[['Pclass', 'Sex', 'Age']].values
titanic_y = titanic_df['Survived'].values
In [39]:
shape(titanic_X), shape(titanic_y)
Out[39]:
((891, 3), (891,))
In [40]:
titanic_X
Out[40]:
array([[3, 'male', 22.0],
[1, 'female', 38.0],
[3, 'female', 26.0],
...,
[3, 'female', nan],
[1, 'male', 26.0],
[3, 'male', 32.0]], dtype=object)
In [41]:
titanic_y
Out[41]:
array([0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1,
1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0,
0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1,
0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0,
0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0,
1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0,
1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1,
0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1,
1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0,
1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0,
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1,
1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1,
1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0,
1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0,
1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0,
1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0,
0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0,
0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0,
0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0,
0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0,
1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1,
1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,
1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1,
0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0,
0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0,
1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1,
0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1,
1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0,
1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0])
In [42]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
In [43]:
titanic_X[:,1] = le.fit_transform(titanic_X[:,1])
print(titanic_X)
[[3 1 22.0]
[1 0 38.0]
[3 0 26.0]
...,
[3 0 nan]
[1 1 26.0]
[3 1 32.0]]
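To make the mapping in the output above explicit (female became 0, male became 1), here is a minimal pure-Python sketch of what label encoding does conceptually: distinct labels are sorted and each is replaced by its position. The helper name `label_encode` is hypothetical and is not part of scikit-learn.

```python
def label_encode(values):
    """Map each distinct label to an integer, assigned in sorted order."""
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    return classes, [index[v] for v in values]


classes, codes = label_encode(["male", "female", "female", "male"])
print(classes)  # ['female', 'male']
print(codes)    # [1, 0, 0, 1]
```

Because the codes follow sorted order, 'female' sorts before 'male', which is why the Sex column above reads 1 for men and 0 for women.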
In [44]:
# Imputer was removed in scikit-learn 0.22; SimpleImputer is its replacement
from sklearn.impute import SimpleImputer
imp = SimpleImputer(missing_values=np.nan, strategy='mean')  # column-wise mean
titanic_X = imp.fit_transform(titanic_X)
print(titanic_X)
[[ 3. 1. 22. ]
[ 1. 0. 38. ]
[ 3. 0. 26. ]
...,
[ 3. 0. 29.69911765]
[ 1. 1. 26. ]
[ 3. 1. 32. ]]
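In the output above the missing age was replaced by 29.69911765, the mean of the observed ages. A rough pure-Python sketch of column-wise mean imputation follows; the helper `impute_column_means` and the three toy rows are illustrative assumptions, not the notebook's data.

```python
import math


def impute_column_means(rows):
    """Replace NaN entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if math.isnan(r[j]) else r[j] for j in range(n_cols)]
        for r in rows
    ]


# Toy rows in the same (Pclass, Sex, Age) layout; the last age is missing
toy = [[3.0, 1.0, 22.0], [1.0, 0.0, 38.0], [3.0, 0.0, float("nan")]]
filled = impute_column_means(toy)
print(filled[2])  # [3.0, 0.0, 30.0]
```

Only the missing entry changes; the observed values pass through untouched.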
In [45]:
from sklearn import tree
clf = tree.DecisionTreeClassifier(max_depth=3)
clf = clf.fit(titanic_X, titanic_y)
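The classifier above is created without a `criterion` argument, so scikit-learn falls back to its default, Gini impurity, when scoring candidate splits. As a hedged illustration of that idea only (the function names and toy labels below are made up, and this is not scikit-learn's actual implementation), the impurity of a node and of a split can be sketched as:

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted average impurity of the two children of a split."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


parent = [0, 0, 0, 1, 1, 1]
print(gini(parent))                          # 0.5
print(split_impurity([0, 0, 0], [1, 1, 1]))  # 0.0, a perfect split
print(split_impurity([0, 0, 1], [0, 1, 1]))  # about 0.444, a weak split
```

A tree learner prefers the split with the lowest weighted impurity, and `max_depth=3` limits how many such splits can be stacked.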
In [46]:
titanic_y_predicted = clf.predict(titanic_X)
In [47]:
titanic_df.insert(1,'Survior pred', titanic_y_predicted)
titanic_df
Out[47]:
|  | Survived | Survior pred | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 1 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 1 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 1 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 0 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 5 | 0 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 6 | 0 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 0 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2 | 3 | 1 | 349909 | 21.0750 | NaN | S |
| 8 | 1 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 1 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14 | 1 | 0 | 237736 | 30.0708 | NaN | C |
| 10 | 1 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4 | 1 | 1 | PP 9549 | 16.7000 | G6 | S |
| 11 | 1 | 1 | 1 | Bonnell, Miss. Elizabeth | female | 58 | 0 | 0 | 113783 | 26.5500 | C103 | S |
| 12 | 0 | 0 | 3 | Saundercock, Mr. William Henry | male | 20 | 0 | 0 | A/5. 2151 | 8.0500 | NaN | S |
| 13 | 0 | 0 | 3 | Andersson, Mr. Anders Johan | male | 39 | 1 | 5 | 347082 | 31.2750 | NaN | S |
| 14 | 0 | 1 | 3 | Vestrom, Miss. Hulda Amanda Adolfina | female | 14 | 0 | 0 | 350406 | 7.8542 | NaN | S |
| 15 | 1 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55 | 0 | 0 | 248706 | 16.0000 | NaN | S |
| 16 | 0 | 0 | 3 | Rice, Master. Eugene | male | 2 | 4 | 1 | 382652 | 29.1250 | NaN | Q |
| 17 | 1 | 0 | 2 | Williams, Mr. Charles Eugene | male | NaN | 0 | 0 | 244373 | 13.0000 | NaN | S |
| 18 | 0 | 1 | 3 | Vander Planke, Mrs. Julius (Emelia Maria Vande... | female | 31 | 1 | 0 | 345763 | 18.0000 | NaN | S |
| 19 | 1 | 1 | 3 | Masselmani, Mrs. Fatima | female | NaN | 0 | 0 | 2649 | 7.2250 | NaN | C |
| 20 | 0 | 0 | 2 | Fynney, Mr. Joseph J | male | 35 | 0 | 0 | 239865 | 26.0000 | NaN | S |
| 21 | 1 | 0 | 2 | Beesley, Mr. Lawrence | male | 34 | 0 | 0 | 248698 | 13.0000 | D56 | S |
| 22 | 1 | 1 | 3 | McGowan, Miss. Anna "Annie" | female | 15 | 0 | 0 | 330923 | 8.0292 | NaN | Q |
| 23 | 1 | 0 | 1 | Sloper, Mr. William Thompson | male | 28 | 0 | 0 | 113788 | 35.5000 | A6 | S |
| 24 | 0 | 1 | 3 | Palsson, Miss. Torborg Danira | female | 8 | 3 | 1 | 349909 | 21.0750 | NaN | S |
| 25 | 1 | 1 | 3 | Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... | female | 38 | 1 | 5 | 347077 | 31.3875 | NaN | S |
| 26 | 0 | 0 | 3 | Emir, Mr. Farred Chehab | male | NaN | 0 | 0 | 2631 | 7.2250 | NaN | C |
| 27 | 0 | 0 | 1 | Fortune, Mr. Charles Alexander | male | 19 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S |
| 28 | 1 | 1 | 3 | O'Dwyer, Miss. Ellen "Nellie" | female | NaN | 0 | 0 | 330959 | 7.8792 | NaN | Q |
| 29 | 0 | 0 | 3 | Todoroff, Mr. Lalio | male | NaN | 0 | 0 | 349216 | 7.8958 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 861 | 0 | 0 | 2 | Giles, Mr. Frederick Edward | male | 21 | 1 | 0 | 28134 | 11.5000 | NaN | S |
| 862 | 1 | 1 | 1 | Swift, Mrs. Frederick Joel (Margaret Welles Ba... | female | 48 | 0 | 0 | 17466 | 25.9292 | D17 | S |
| 863 | 0 | 1 | 3 | Sage, Miss. Dorothy Edith "Dolly" | female | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S |
| 864 | 0 | 0 | 2 | Gill, Mr. John William | male | 24 | 0 | 0 | 233866 | 13.0000 | NaN | S |
| 865 | 1 | 1 | 2 | Bystrom, Mrs. (Karolina) | female | 42 | 0 | 0 | 236852 | 13.0000 | NaN | S |
| 866 | 1 | 1 | 2 | Duran y More, Miss. Asuncion | female | 27 | 1 | 0 | SC/PARIS 2149 | 13.8583 | NaN | C |
| 867 | 0 | 0 | 1 | Roebling, Mr. Washington Augustus II | male | 31 | 0 | 0 | PC 17590 | 50.4958 | A24 | S |
| 868 | 0 | 0 | 3 | van Melkebeke, Mr. Philemon | male | NaN | 0 | 0 | 345777 | 9.5000 | NaN | S |
| 869 | 1 | 0 | 3 | Johnson, Master. Harold Theodor | male | 4 | 1 | 1 | 347742 | 11.1333 | NaN | S |
| 870 | 0 | 0 | 3 | Balkic, Mr. Cerin | male | 26 | 0 | 0 | 349248 | 7.8958 | NaN | S |
| 871 | 1 | 1 | 1 | Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | female | 47 | 1 | 1 | 11751 | 52.5542 | D35 | S |
| 872 | 0 | 0 | 1 | Carlsson, Mr. Frans Olof | male | 33 | 0 | 0 | 695 | 5.0000 | B51 B53 B55 | S |
| 873 | 0 | 0 | 3 | Vander Cruyssen, Mr. Victor | male | 47 | 0 | 0 | 345765 | 9.0000 | NaN | S |
| 874 | 1 | 1 | 2 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28 | 1 | 0 | P/PP 3381 | 24.0000 | NaN | C |
| 875 | 1 | 1 | 3 | Najib, Miss. Adele Kiamie "Jane" | female | 15 | 0 | 0 | 2667 | 7.2250 | NaN | C |
| 876 | 0 | 0 | 3 | Gustafsson, Mr. Alfred Ossian | male | 20 | 0 | 0 | 7534 | 9.8458 | NaN | S |
| 877 | 0 | 0 | 3 | Petroff, Mr. Nedelio | male | 19 | 0 | 0 | 349212 | 7.8958 | NaN | S |
| 878 | 0 | 0 | 3 | Laleff, Mr. Kristo | male | NaN | 0 | 0 | 349217 | 7.8958 | NaN | S |
| 879 | 1 | 1 | 1 | Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | female | 56 | 0 | 1 | 11767 | 83.1583 | C50 | C |
| 880 | 1 | 1 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | female | 25 | 0 | 1 | 230433 | 26.0000 | NaN | S |
| 881 | 0 | 0 | 3 | Markun, Mr. Johann | male | 33 | 0 | 0 | 349257 | 7.8958 | NaN | S |
| 882 | 0 | 1 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22 | 0 | 0 | 7552 | 10.5167 | NaN | S |
| 883 | 0 | 0 | 2 | Banfield, Mr. Frederick James | male | 28 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | NaN | S |
| 884 | 0 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NaN | S |
| 885 | 0 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39 | 0 | 5 | 382652 | 29.1250 | NaN | Q |
| 886 | 0 | 0 | 2 | Montvila, Rev. Juozas | male | 27 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 887 | 1 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 888 | 0 | 1 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 889 | 1 | 0 | 1 | Behr, Mr. Karl Howell | male | 26 | 0 | 0 | 111369 | 30.0000 | C148 | C |
| 890 | 0 | 0 | 3 | Dooley, Mr. Patrick | male | 32 | 0 | 0 | 370376 | 7.7500 | NaN | Q |

891 rows × 12 columns
In [48]:
from sklearn.metrics import accuracy_score
accuracy_score(titanic_y, titanic_y_predicted)
Out[48]:
0.80920314253647585
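The score above is the fraction of passengers whose predicted label equals the true one. Note that it is computed on the same data the tree was trained on, so it probably overstates how well the model would do on unseen passengers. A minimal pure-Python sketch of the same metric, with made-up labels and a hypothetical helper name:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(accuracy(y_true, y_pred))  # 0.8
```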
In [49]:
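from io import StringIO  # sklearn.externals.six was removed in scikit-learn 0.23
import pydot
from IPython.display import Image
dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data, feature_names=['Pclass', 'Sex', 'Age'])
# Recent pydot versions return a list of graphs; older ones return a single graph
graphs = pydot.graph_from_dot_data(dot_data.getvalue())
graph = graphs[0] if isinstance(graphs, list) else graphs
img = Image(graph.create_png())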
In [50]:
img.width=800; img
Out[50]:
In [51]:
titanic_y_predicted_proba = clf.predict_proba(titanic_X)
In [52]:
titanic_df.insert(2,'Survior prob', titanic_y_predicted_proba[:,1])
titanic_df
Out[52]:
|  | Survived | Survior pred | Survior prob | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0.115473 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 1 | 1 | 0.952381 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 1 | 1 | 0.537879 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 1 | 1 | 0.952381 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 0 | 0 | 0.115473 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 5 | 0 | 0 | 0.115473 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 6 | 0 | 0 | 0.358333 | 1 | McCarthy, Mr. Timothy J | male | 54 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 0 | 0 | 0.428571 | 3 | Palsson, Master. Gosta Leonard | male | 2 | 3 | 1 | 349909 | 21.0750 | NaN | S |
| 8 | 1 | 1 | 0.537879 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 1 | 1 | 0.952381 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14 | 1 | 0 | 237736 | 30.0708 | NaN | C |
| 10 | 1 | 1 | 0.537879 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4 | 1 | 1 | PP 9549 | 16.7000 | G6 | S |
| 11 | 1 | 1 | 0.952381 | 1 | Bonnell, Miss. Elizabeth | female | 58 | 0 | 0 | 113783 | 26.5500 | C103 | S |
| 12 | 0 | 0 | 0.115473 | 3 | Saundercock, Mr. William Henry | male | 20 | 0 | 0 | A/5. 2151 | 8.0500 | NaN | S |
| 13 | 0 | 0 | 0.115473 | 3 | Andersson, Mr. Anders Johan | male | 39 | 1 | 5 | 347082 | 31.2750 | NaN | S |
| 14 | 0 | 1 | 0.537879 | 3 | Vestrom, Miss. Hulda Amanda Adolfina | female | 14 | 0 | 0 | 350406 | 7.8542 | NaN | S |
| 15 | 1 | 1 | 0.952381 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55 | 0 | 0 | 248706 | 16.0000 | NaN | S |
| 16 | 0 | 0 | 0.428571 | 3 | Rice, Master. Eugene | male | 2 | 4 | 1 | 382652 | 29.1250 | NaN | Q |
| 17 | 1 | 0 | 0.115473 | 2 | Williams, Mr. Charles Eugene | male | NaN | 0 | 0 | 244373 | 13.0000 | NaN | S |
| 18 | 0 | 1 | 0.537879 | 3 | Vander Planke, Mrs. Julius (Emelia Maria Vande... | female | 31 | 1 | 0 | 345763 | 18.0000 | NaN | S |
| 19 | 1 | 1 | 0.537879 | 3 | Masselmani, Mrs. Fatima | female | NaN | 0 | 0 | 2649 | 7.2250 | NaN | C |
| 20 | 0 | 0 | 0.115473 | 2 | Fynney, Mr. Joseph J | male | 35 | 0 | 0 | 239865 | 26.0000 | NaN | S |
| 21 | 1 | 0 | 0.115473 | 2 | Beesley, Mr. Lawrence | male | 34 | 0 | 0 | 248698 | 13.0000 | D56 | S |
| 22 | 1 | 1 | 0.537879 | 3 | McGowan, Miss. Anna "Annie" | female | 15 | 0 | 0 | 330923 | 8.0292 | NaN | Q |
| 23 | 1 | 0 | 0.358333 | 1 | Sloper, Mr. William Thompson | male | 28 | 0 | 0 | 113788 | 35.5000 | A6 | S |
| 24 | 0 | 1 | 0.537879 | 3 | Palsson, Miss. Torborg Danira | female | 8 | 3 | 1 | 349909 | 21.0750 | NaN | S |
| 25 | 1 | 1 | 0.537879 | 3 | Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... | female | 38 | 1 | 5 | 347077 | 31.3875 | NaN | S |
| 26 | 0 | 0 | 0.115473 | 3 | Emir, Mr. Farred Chehab | male | NaN | 0 | 0 | 2631 | 7.2250 | NaN | C |
| 27 | 0 | 0 | 0.358333 | 1 | Fortune, Mr. Charles Alexander | male | 19 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S |
| 28 | 1 | 1 | 0.537879 | 3 | O'Dwyer, Miss. Ellen "Nellie" | female | NaN | 0 | 0 | 330959 | 7.8792 | NaN | Q |
| 29 | 0 | 0 | 0.115473 | 3 | Todoroff, Mr. Lalio | male | NaN | 0 | 0 | 349216 | 7.8958 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 861 | 0 | 0 | 0.115473 | 2 | Giles, Mr. Frederick Edward | male | 21 | 1 | 0 | 28134 | 11.5000 | NaN | S |
| 862 | 1 | 1 | 0.952381 | 1 | Swift, Mrs. Frederick Joel (Margaret Welles Ba... | female | 48 | 0 | 0 | 17466 | 25.9292 | D17 | S |
| 863 | 0 | 1 | 0.537879 | 3 | Sage, Miss. Dorothy Edith "Dolly" | female | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S |
| 864 | 0 | 0 | 0.115473 | 2 | Gill, Mr. John William | male | 24 | 0 | 0 | 233866 | 13.0000 | NaN | S |
| 865 | 1 | 1 | 0.952381 | 2 | Bystrom, Mrs. (Karolina) | female | 42 | 0 | 0 | 236852 | 13.0000 | NaN | S |
| 866 | 1 | 1 | 0.952381 | 2 | Duran y More, Miss. Asuncion | female | 27 | 1 | 0 | SC/PARIS 2149 | 13.8583 | NaN | C |
| 867 | 0 | 0 | 0.358333 | 1 | Roebling, Mr. Washington Augustus II | male | 31 | 0 | 0 | PC 17590 | 50.4958 | A24 | S |
| 868 | 0 | 0 | 0.115473 | 3 | van Melkebeke, Mr. Philemon | male | NaN | 0 | 0 | 345777 | 9.5000 | NaN | S |
| 869 | 1 | 0 | 0.428571 | 3 | Johnson, Master. Harold Theodor | male | 4 | 1 | 1 | 347742 | 11.1333 | NaN | S |
| 870 | 0 | 0 | 0.115473 | 3 | Balkic, Mr. Cerin | male | 26 | 0 | 0 | 349248 | 7.8958 | NaN | S |
| 871 | 1 | 1 | 0.952381 | 1 | Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | female | 47 | 1 | 1 | 11751 | 52.5542 | D35 | S |
| 872 | 0 | 0 | 0.358333 | 1 | Carlsson, Mr. Frans Olof | male | 33 | 0 | 0 | 695 | 5.0000 | B51 B53 B55 | S |
| 873 | 0 | 0 | 0.115473 | 3 | Vander Cruyssen, Mr. Victor | male | 47 | 0 | 0 | 345765 | 9.0000 | NaN | S |
| 874 | 1 | 1 | 0.952381 | 2 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28 | 1 | 0 | P/PP 3381 | 24.0000 | NaN | C |
| 875 | 1 | 1 | 0.537879 | 3 | Najib, Miss. Adele Kiamie "Jane" | female | 15 | 0 | 0 | 2667 | 7.2250 | NaN | C |
| 876 | 0 | 0 | 0.115473 | 3 | Gustafsson, Mr. Alfred Ossian | male | 20 | 0 | 0 | 7534 | 9.8458 | NaN | S |
| 877 | 0 | 0 | 0.115473 | 3 | Petroff, Mr. Nedelio | male | 19 | 0 | 0 | 349212 | 7.8958 | NaN | S |
| 878 | 0 | 0 | 0.115473 | 3 | Laleff, Mr. Kristo | male | NaN | 0 | 0 | 349217 | 7.8958 | NaN | S |
| 879 | 1 | 1 | 0.952381 | 1 | Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | female | 56 | 0 | 1 | 11767 | 83.1583 | C50 | C |
| 880 | 1 | 1 | 0.952381 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | female | 25 | 0 | 1 | 230433 | 26.0000 | NaN | S |
| 881 | 0 | 0 | 0.115473 | 3 | Markun, Mr. Johann | male | 33 | 0 | 0 | 349257 | 7.8958 | NaN | S |
| 882 | 0 | 1 | 0.537879 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22 | 0 | 0 | 7552 | 10.5167 | NaN | S |
| 883 | 0 | 0 | 0.115473 | 2 | Banfield, Mr. Frederick James | male | 28 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | NaN | S |
| 884 | 0 | 0 | 0.115473 | 3 | Sutehall, Mr. Henry Jr | male | 25 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NaN | S |
| 885 | 0 | 0 | 0.083333 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39 | 0 | 5 | 382652 | 29.1250 | NaN | Q |
| 886 | 0 | 0 | 0.115473 | 2 | Montvila, Rev. Juozas | male | 27 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 887 | 1 | 1 | 0.952381 | 1 | Graham, Miss. Margaret Edith | female | 19 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 888 | 0 | 1 | 0.537879 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 889 | 1 | 0 | 0.358333 | 1 | Behr, Mr. Karl Howell | male | 26 | 0 | 0 | 111369 | 30.0000 | C148 | C |
| 890 | 0 | 0 | 0.115473 | 3 | Dooley, Mr. Patrick | male | 32 | 0 | 0 | 370376 | 7.7500 | NaN | Q |

891 rows × 13 columns
In [53]:
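# Pclass, Sex, Age
# Each example is a single-row 2-D array, since predict_proba expects shape (n_samples, n_features)
x_male_student = np.array([[3, 1, 21]])
x_rich_countess = np.array([[1, 0, 65]])
x_midleclass_mother = np.array([[2, 0, 40]])
x_baby = np.array([[1, 0, 1]])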
In [54]:
clf.predict_proba(x_male_student)
Out[54]:
array([[ 0.88452656, 0.11547344]])
In [55]:
clf.predict_proba(x_rich_countess)
Out[55]:
array([[ 0.04761905, 0.95238095]])
In [56]:
clf.predict_proba(x_midleclass_mother)
Out[56]:
array([[ 0.04761905, 0.95238095]])
In [57]:
clf.predict_proba(x_baby)
Out[57]:
array([[ 0.5, 0.5]])
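Only a handful of distinct probabilities appear above because a decision tree reports, for every sample that lands in a leaf, the class fractions of the training samples in that leaf. The sketch below illustrates this with a hypothetical leaf; the counts are made up to mirror the ratio seen above and are not read from the fitted tree.

```python
from collections import Counter


def leaf_probabilities(leaf_labels, classes=(0, 1)):
    """Class probabilities for a leaf: the fraction of its training samples per class."""
    counts = Counter(leaf_labels)
    total = len(leaf_labels)
    return [counts[c] / total for c in classes]


# Hypothetical leaf holding 1 victim (label 0) and 20 survivors (label 1)
hypothetical_leaf = [0] * 1 + [1] * 20
probs = leaf_probabilities(hypothetical_leaf)
print([round(p, 8) for p in probs])  # [0.04761905, 0.95238095]
```

With `max_depth=3` the tree has at most eight leaves, so at most eight distinct probability pairs can ever be returned.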
In [58]:
from sklearn import datasets
boston = datasets.load_boston()  # note: load_boston was removed in scikit-learn 1.2
print(boston.DESCR)
Boston House Prices dataset
Notes
------
Data Set Characteristics:
:Number of Instances: 506
:Number of Attributes: 13 numeric/categorical predictive
:Median Value (attribute 14) is usually the target
:Attribute Information (in order):
- CRIM per capita crime rate by town
- ZN proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS proportion of non-retail business acres per town
- CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX nitric oxides concentration (parts per 10 million)
- RM average number of rooms per dwelling
- AGE proportion of owner-occupied units built prior to 1940
- DIS weighted distances to five Boston employment centres
- RAD index of accessibility to radial highways
- TAX full-value property-tax rate per $10,000
- PTRATIO pupil-teacher ratio by town
- B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT % lower status of the population
- MEDV Median value of owner-occupied homes in $1000's
:Missing Attribute Values: None
:Creator: Harrison, D. and Rubinfeld, D.L.
This is a copy of UCI ML housing dataset.
http://archive.ics.uci.edu/ml/datasets/Housing
This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.
The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
prices and the demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics
...', Wiley, 1980. N.B. Various transformations are used in the table on
pages 244-261 of the latter.
The Boston house-price data has been used in many machine learning papers that address regression
problems.
**References**
- Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
- Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
- many more! (see http://archive.ics.uci.edu/ml/datasets/Housing)
In [59]:
boston_df = pd.DataFrame(boston.data,
columns=['CRIM','ZN','IDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT'])
boston_df.insert(13, 'Price', boston.target)
boston_df
Out[59]:
|  | CRIM | ZN | IDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296 | 15.3 | 396.90 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.90 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.90 | 5.33 | 36.2 |
| 5 | 0.02985 | 0.0 | 2.18 | 0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3 | 222 | 18.7 | 394.12 | 5.21 | 28.7 |
| 6 | 0.08829 | 12.5 | 7.87 | 0 | 0.524 | 6.012 | 66.6 | 5.5605 | 5 | 311 | 15.2 | 395.60 | 12.43 | 22.9 |
| 7 | 0.14455 | 12.5 | 7.87 | 0 | 0.524 | 6.172 | 96.1 | 5.9505 | 5 | 311 | 15.2 | 396.90 | 19.15 | 27.1 |
| 8 | 0.21124 | 12.5 | 7.87 | 0 | 0.524 | 5.631 | 100.0 | 6.0821 | 5 | 311 | 15.2 | 386.63 | 29.93 | 16.5 |
| 9 | 0.17004 | 12.5 | 7.87 | 0 | 0.524 | 6.004 | 85.9 | 6.5921 | 5 | 311 | 15.2 | 386.71 | 17.10 | 18.9 |
| 10 | 0.22489 | 12.5 | 7.87 | 0 | 0.524 | 6.377 | 94.3 | 6.3467 | 5 | 311 | 15.2 | 392.52 | 20.45 | 15.0 |
| 11 | 0.11747 | 12.5 | 7.87 | 0 | 0.524 | 6.009 | 82.9 | 6.2267 | 5 | 311 | 15.2 | 396.90 | 13.27 | 18.9 |
| 12 | 0.09378 | 12.5 | 7.87 | 0 | 0.524 | 5.889 | 39.0 | 5.4509 | 5 | 311 | 15.2 | 390.50 | 15.71 | 21.7 |
| 13 | 0.62976 | 0.0 | 8.14 | 0 | 0.538 | 5.949 | 61.8 | 4.7075 | 4 | 307 | 21.0 | 396.90 | 8.26 | 20.4 |
| 14 | 0.63796 | 0.0 | 8.14 | 0 | 0.538 | 6.096 | 84.5 | 4.4619 | 4 | 307 | 21.0 | 380.02 | 10.26 | 18.2 |
| 15 | 0.62739 | 0.0 | 8.14 | 0 | 0.538 | 5.834 | 56.5 | 4.4986 | 4 | 307 | 21.0 | 395.62 | 8.47 | 19.9 |
| 16 | 1.05393 | 0.0 | 8.14 | 0 | 0.538 | 5.935 | 29.3 | 4.4986 | 4 | 307 | 21.0 | 386.85 | 6.58 | 23.1 |
| 17 | 0.78420 | 0.0 | 8.14 | 0 | 0.538 | 5.990 | 81.7 | 4.2579 | 4 | 307 | 21.0 | 386.75 | 14.67 | 17.5 |
| 18 | 0.80271 | 0.0 | 8.14 | 0 | 0.538 | 5.456 | 36.6 | 3.7965 | 4 | 307 | 21.0 | 288.99 | 11.69 | 20.2 |
| 19 | 0.72580 | 0.0 | 8.14 | 0 | 0.538 | 5.727 | 69.5 | 3.7965 | 4 | 307 | 21.0 | 390.95 | 11.28 | 18.2 |
| 20 | 1.25179 | 0.0 | 8.14 | 0 | 0.538 | 5.570 | 98.1 | 3.7979 | 4 | 307 | 21.0 | 376.57 | 21.02 | 13.6 |
| 21 | 0.85204 | 0.0 | 8.14 | 0 | 0.538 | 5.965 | 89.2 | 4.0123 | 4 | 307 | 21.0 | 392.53 | 13.83 | 19.6 |
| 22 | 1.23247 | 0.0 | 8.14 | 0 | 0.538 | 6.142 | 91.7 | 3.9769 | 4 | 307 | 21.0 | 396.90 | 18.72 | 15.2 |
| 23 | 0.98843 | 0.0 | 8.14 | 0 | 0.538 | 5.813 | 100.0 | 4.0952 | 4 | 307 | 21.0 | 394.54 | 19.88 | 14.5 |
| 24 | 0.75026 | 0.0 | 8.14 | 0 | 0.538 | 5.924 | 94.1 | 4.3996 | 4 | 307 | 21.0 | 394.33 | 16.30 | 15.6 |
| 25 | 0.84054 | 0.0 | 8.14 | 0 | 0.538 | 5.599 | 85.7 | 4.4546 | 4 | 307 | 21.0 | 303.42 | 16.51 | 13.9 |
| 26 | 0.67191 | 0.0 | 8.14 | 0 | 0.538 | 5.813 | 90.3 | 4.6820 | 4 | 307 | 21.0 | 376.88 | 14.81 | 16.6 |
| 27 | 0.95577 | 0.0 | 8.14 | 0 | 0.538 | 6.047 | 88.8 | 4.4534 | 4 | 307 | 21.0 | 306.38 | 17.28 | 14.8 |
| 28 | 0.77299 | 0.0 | 8.14 | 0 | 0.538 | 6.495 | 94.4 | 4.4547 | 4 | 307 | 21.0 | 387.94 | 12.80 | 18.4 |
| 29 | 1.00245 | 0.0 | 8.14 | 0 | 0.538 | 6.674 | 87.3 | 4.2390 | 4 | 307 | 21.0 | 380.23 | 11.98 | 21.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 476 | 4.87141 | 0.0 | 18.10 | 0 | 0.614 | 6.484 | 93.6 | 2.3053 | 24 | 666 | 20.2 | 396.21 | 18.68 | 16.7 |
| 477 | 15.02340 | 0.0 | 18.10 | 0 | 0.614 | 5.304 | 97.3 | 2.1007 | 24 | 666 | 20.2 | 349.48 | 24.91 | 12.0 |
| 478 | 10.23300 | 0.0 | 18.10 | 0 | 0.614 | 6.185 | 96.7 | 2.1705 | 24 | 666 | 20.2 | 379.70 | 18.03 | 14.6 |
| 479 | 14.33370 | 0.0 | 18.10 | 0 | 0.614 | 6.229 | 88.0 | 1.9512 | 24 | 666 | 20.2 | 383.32 | 13.11 | 21.4 |
| 480 | 5.82401 | 0.0 | 18.10 | 0 | 0.532 | 6.242 | 64.7 | 3.4242 | 24 | 666 | 20.2 | 396.90 | 10.74 | 23.0 |
| 481 | 5.70818 | 0.0 | 18.10 | 0 | 0.532 | 6.750 | 74.9 | 3.3317 | 24 | 666 | 20.2 | 393.07 | 7.74 | 23.7 |
| 482 | 5.73116 | 0.0 | 18.10 | 0 | 0.532 | 7.061 | 77.0 | 3.4106 | 24 | 666 | 20.2 | 395.28 | 7.01 | 25.0 |
| 483 | 2.81838 | 0.0 | 18.10 | 0 | 0.532 | 5.762 | 40.3 | 4.0983 | 24 | 666 | 20.2 | 392.92 | 10.42 | 21.8 |
| 484 | 2.37857 | 0.0 | 18.10 | 0 | 0.583 | 5.871 | 41.9 | 3.7240 | 24 | 666 | 20.2 | 370.73 | 13.34 | 20.6 |
| 485 | 3.67367 | 0.0 | 18.10 | 0 | 0.583 | 6.312 | 51.9 | 3.9917 | 24 | 666 | 20.2 | 388.62 | 10.58 | 21.2 |
| 486 | 5.69175 | 0.0 | 18.10 | 0 | 0.583 | 6.114 | 79.8 | 3.5459 | 24 | 666 | 20.2 | 392.68 | 14.98 | 19.1 |
| 487 | 4.83567 | 0.0 | 18.10 | 0 | 0.583 | 5.905 | 53.2 | 3.1523 | 24 | 666 | 20.2 | 388.22 | 11.45 | 20.6 |
| 488 | 0.15086 | 0.0 | 27.74 | 0 | 0.609 | 5.454 | 92.7 | 1.8209 | 4 | 711 | 20.1 | 395.09 | 18.06 | 15.2 |
| 489 | 0.18337 | 0.0 | 27.74 | 0 | 0.609 | 5.414 | 98.3 | 1.7554 | 4 | 711 | 20.1 | 344.05 | 23.97 | 7.0 |
| 490 | 0.20746 | 0.0 | 27.74 | 0 | 0.609 | 5.093 | 98.0 | 1.8226 | 4 | 711 | 20.1 | 318.43 | 29.68 | 8.1 |
| 491 | 0.10574 | 0.0 | 27.74 | 0 | 0.609 | 5.983 | 98.8 | 1.8681 | 4 | 711 | 20.1 | 390.11 | 18.07 | 13.6 |
| 492 | 0.11132 | 0.0 | 27.74 | 0 | 0.609 | 5.983 | 83.5 | 2.1099 | 4 | 711 | 20.1 | 396.90 | 13.35 | 20.1 |
| 493 | 0.17331 | 0.0 | 9.69 | 0 | 0.585 | 5.707 | 54.0 | 2.3817 | 6 | 391 | 19.2 | 396.90 | 12.01 | 21.8 |
| 494 | 0.27957 | 0.0 | 9.69 | 0 | 0.585 | 5.926 | 42.6 | 2.3817 | 6 | 391 | 19.2 | 396.90 | 13.59 | 24.5 |
| 495 | 0.17899 | 0.0 | 9.69 | 0 | 0.585 | 5.670 | 28.8 | 2.7986 | 6 | 391 | 19.2 | 393.29 | 17.60 | 23.1 |
| 496 | 0.28960 | 0.0 | 9.69 | 0 | 0.585 | 5.390 | 72.9 | 2.7986 | 6 | 391 | 19.2 | 396.90 | 21.14 | 19.7 |
| 497 | 0.26838 | 0.0 | 9.69 | 0 | 0.585 | 5.794 | 70.6 | 2.8927 | 6 | 391 | 19.2 | 396.90 | 14.10 | 18.3 |
| 498 | 0.23912 | 0.0 | 9.69 | 0 | 0.585 | 6.019 | 65.3 | 2.4091 | 6 | 391 | 19.2 | 396.90 | 12.92 | 21.2 |
| 499 | 0.17783 | 0.0 | 9.69 | 0 | 0.585 | 5.569 | 73.5 | 2.3999 | 6 | 391 | 19.2 | 395.77 | 15.10 | 17.5 |
| 500 | 0.22438 | 0.0 | 9.69 | 0 | 0.585 | 6.027 | 79.7 | 2.4982 | 6 | 391 | 19.2 | 396.90 | 14.33 | 16.8 |
| 501 | 0.06263 | 0.0 | 11.93 | 0 | 0.573 | 6.593 | 69.1 | 2.4786 | 1 | 273 | 21.0 | 391.99 | 9.67 | 22.4 |
| 502 | 0.04527 | 0.0 | 11.93 | 0 | 0.573 | 6.120 | 76.7 | 2.2875 | 1 | 273 | 21.0 | 396.90 | 9.08 | 20.6 |
| 503 | 0.06076 | 0.0 | 11.93 | 0 | 0.573 | 6.976 | 91.0 | 2.1675 | 1 | 273 | 21.0 | 396.90 | 5.64 | 23.9 |
| 504 | 0.10959 | 0.0 | 11.93 | 0 | 0.573 | 6.794 | 89.3 | 2.3889 | 1 | 273 | 21.0 | 393.45 | 6.48 | 22.0 |
| 505 | 0.04741 | 0.0 | 11.93 | 0 | 0.573 | 6.030 | 80.8 | 2.5050 | 1 | 273 | 21.0 | 396.90 | 7.88 | 11.9 |

506 rows × 14 columns
In [60]:
scatter(boston_df['RM'], boston_df['Price']);
In [61]:
boston_X = boston.data
boston_y = boston.target
shape(boston_X)
Out[61]:
(506, 13)
In [62]:
from sklearn.linear_model import Ridge
h = Ridge(alpha=1.0)
h.fit(boston_X, boston_y)
Out[62]:
Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
normalize=False, solver='auto', tol=0.001)
In [63]:
boston_y_predicted = h.predict(boston_X)
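`Ridge(alpha=1.0)` fits a linear model by minimizing the squared error plus `alpha` times the squared norm of the weights, leaving the intercept unpenalized. For a single feature this has a simple closed form, sketched below in pure Python; the helper `ridge_1d` and the toy data are illustrative assumptions and are unrelated to the Boston data.

```python
def ridge_1d(xs, ys, alpha):
    """Fit y ~ w*x + b by minimizing squared error + alpha * w**2 (intercept unpenalized)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centering the data removes the intercept from the penalized problem
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / (sxx + alpha)
    b = y_mean - w * x_mean
    return w, b


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(ridge_1d(xs, ys, alpha=0.0))  # (2.0, 0.0): ordinary least squares
print(ridge_1d(xs, ys, alpha=1.0))  # slope shrinks to 10/6, about 1.667
```

A larger `alpha` pulls the slope further toward zero, trading some fit on the training data for a smoother model.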
In [64]:
boston_df.insert(14, 'Price predicted', boston_y_predicted)
boston_df
Out[64]:
|  | CRIM | ZN | IDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | Price | Price predicted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296 | 15.3 | 396.90 | 4.98 | 24.0 | 30.258005 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.90 | 9.14 | 21.6 | 24.809545 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 | 30.535338 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 | 28.913066 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.90 | 5.33 | 36.2 | 28.183422 |
| 5 | 0.02985 | 0.0 | 2.18 | 0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3 | 222 | 18.7 | 394.12 | 5.21 | 28.7 | 25.440387 |
| 6 | 0.08829 | 12.5 | 7.87 | 0 | 0.524 | 6.012 | 66.6 | 5.5605 | 5 | 311 | 15.2 | 395.60 | 12.43 | 22.9 | 22.960449 |
| 7 | 0.14455 | 12.5 | 7.87 | 0 | 0.524 | 6.172 | 96.1 | 5.9505 | 5 | 311 | 15.2 | 396.90 | 19.15 | 27.1 | 19.300562 |
| 8 | 0.21124 | 12.5 | 7.87 | 0 | 0.524 | 5.631 | 100.0 | 6.0821 | 5 | 311 | 15.2 | 386.63 | 29.93 | 16.5 | 11.152173 |
| 9 | 0.17004 | 12.5 | 7.87 | 0 | 0.524 | 6.004 | 85.9 | 6.5921 | 5 | 311 | 15.2 | 386.71 | 17.10 | 18.9 | 18.820816 |
| 10 | 0.22489 | 12.5 | 7.87 | 0 | 0.524 | 6.377 | 94.3 | 6.3467 | 5 | 311 | 15.2 | 392.52 | 20.45 | 15.0 | 18.810170 |
| 11 | 0.11747 | 12.5 | 7.87 | 0 | 0.524 | 6.009 | 82.9 | 6.2267 | 5 | 311 | 15.2 | 396.90 | 13.27 | 18.9 | 21.508035 |
| 12 | 0.09378 | 12.5 | 7.87 | 0 | 0.524 | 5.889 | 39.0 | 5.4509 | 5 | 311 | 15.2 | 390.50 | 15.71 | 21.7 | 20.983508 |
| 13 | 0.62976 | 0.0 | 8.14 | 0 | 0.538 | 5.949 | 61.8 | 4.7075 | 4 | 307 | 21.0 | 396.90 | 8.26 | 20.4 | 20.029581 |
| 14 | 0.63796 | 0.0 | 8.14 | 0 | 0.538 | 6.096 | 84.5 | 4.4619 | 4 | 307 | 21.0 | 380.02 | 10.26 | 18.2 | 19.577042 |
| 15 | 0.62739 | 0.0 | 8.14 | 0 | 0.538 | 5.834 | 56.5 | 4.4986 | 4 | 307 | 21.0 | 395.62 | 8.47 | 19.9 | 19.777702 |
| 16 | 1.05393 | 0.0 | 8.14 | 0 | 0.538 | 5.935 | 29.3 | 4.4986 | 4 | 307 | 21.0 | 386.85 | 6.58 | 23.1 | 21.192141 |
| 17 | 0.78420 | 0.0 | 8.14 | 0 | 0.538 | 5.990 | 81.7 | 4.2579 | 4 | 307 | 21.0 | 386.75 | 14.67 | 17.5 | 17.159285 |
| 18 | 0.80271 | 0.0 | 8.14 | 0 | 0.538 | 5.456 | 36.6 | 3.7965 | 4 | 307 | 21.0 | 288.99 | 11.69 | 20.2 | 16.615288 |
| 19 | 0.72580 | 0.0 | 8.14 | 0 | 0.538 | 5.727 | 69.5 | 3.7965 | 4 | 307 | 21.0 | 390.95 | 11.28 | 18.2 | 18.703243 |
| 20 | 1.25179 | 0.0 | 8.14 | 0 | 0.538 | 5.570 | 98.1 | 3.7979 | 4 | 307 | 21.0 | 376.57 | 21.02 | 13.6 | 12.546846 |
| 21 | 0.85204 | 0.0 | 8.14 | 0 | 0.538 | 5.965 | 89.2 | 4.0123 | 4 | 307 | 21.0 | 392.53 | 13.83 | 19.6 | 17.857864 |
| 22 | 1.23247 | 0.0 | 8.14 | 0 | 0.538 | 6.142 | 91.7 | 3.9769 | 4 | 307 | 21.0 | 396.90 | 18.72 | 15.2 | 15.965941 |
| 23 | 0.98843 | 0.0 | 8.14 | 0 | 0.538 | 5.813 | 100.0 | 4.0952 | 4 | 307 | 21.0 | 394.54 | 19.88 | 14.5 | 13.875354 |
| 24 | 0.75026 | 0.0 | 8.14 | 0 | 0.538 | 5.924 | 94.1 | 4.3996 | 4 | 307 | 21.0 | 394.33 | 16.30 | 15.6 | 15.851091 |
| 25 | 0.84054 | 0.0 | 8.14 | 0 | 0.538 | 5.599 | 85.7 | 4.4546 | 4 | 307 | 21.0 | 303.42 | 16.51 | 13.9 | 13.561478 |
| 26 | 0.67191 | 0.0 | 8.14 | 0 | 0.538 | 5.813 | 90.3 | 4.6820 | 4 | 307 | 21.0 | 376.88 | 14.81 | 16.6 | 15.690294 |
| 27 | 0.95577 | 0.0 | 8.14 | 0 | 0.538 | 6.047 | 88.8 | 4.4534 | 4 | 307 | 21.0 | 306.38 | 17.28 | 14.8 | 14.876646 |
| 28 | 0.77299 | 0.0 | 8.14 | 0 | 0.538 | 6.495 | 94.4 | 4.4547 | 4 | 307 | 21.0 | 387.94 | 12.80 | 18.4 | 19.776349 |
| 29 | 1.00245 | 0.0 | 8.14 | 0 | 0.538 | 6.674 | 87.3 | 4.2390 | 4 | 307 | 21.0 | 380.23 | 11.98 | 21.0 | 21.138503 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 476 | 4.87141 | 0.0 | 18.10 | 0 | 0.614 | 6.484 | 93.6 | 2.3053 | 24 | 666 | 20.2 | 396.21 | 18.68 | 16.7 | 20.159345 |
| 477 | 15.02340 | 0.0 | 18.10 | 0 | 0.614 | 5.304 | 97.3 | 2.1007 | 24 | 666 | 20.2 | 349.48 | 24.91 | 12.0 | 11.043175 |
| 478 | 10.23300 | 0.0 | 18.10 | 0 | 0.614 | 6.185 | 96.7 | 2.1705 | 24 | 666 | 20.2 | 379.70 | 18.03 | 14.6 | 18.807340 |
| 479 | 14.33370 | 0.0 | 18.10 | 0 | 0.614 | 6.229 | 88.0 | 1.9512 | 24 | 666 | 20.2 | 383.32 | 13.11 | 21.4 | 21.562434 |
| 480 | 5.82401 | 0.0 | 18.10 | 0 | 0.532 | 6.242 | 64.7 | 3.4242 | 24 | 666 | 20.2 | 396.90 | 10.74 | 23.0 | 22.880801 |
| 481 | 5.70818 | 0.0 | 18.10 | 0 | 0.532 | 6.750 | 74.9 | 3.3317 | 24 | 666 | 20.2 | 393.07 | 7.74 | 23.7 | 26.485297 |
482
5.73116
0.0
18.10
0
0.532
7.061
77.0
3.4106
24
666
20.2
395.28
7.01
25.0
27.971894
483
2.81838
0.0
18.10
0
0.532
5.762
40.3
4.0983
24
666
20.2
392.92
10.42
21.8
20.682415
484
2.37857
0.0
18.10
0
0.583
5.871
41.9
3.7240
24
666
20.2
370.73
13.34
20.6
19.326336
485
3.67367
0.0
18.10
0
0.583
6.312
51.9
3.9917
24
666
20.2
388.62
10.58
21.2
22.117072
486
5.69175
0.0
18.10
0
0.583
6.114
79.8
3.5459
24
666
20.2
392.68
14.98
19.1
19.297614
487
4.83567
0.0
18.10
0
0.583
5.905
53.2
3.1523
24
666
20.2
388.22
11.45
20.6
21.106630
488
0.15086
0.0
27.74
0
0.609
5.454
92.7
1.8209
4
711
20.1
395.09
18.06
15.2
11.359091
489
0.18337
0.0
27.74
0
0.609
5.414
98.3
1.7554
4
711
20.1
344.05
23.97
7.0
7.607410
490
0.20746
0.0
27.74
0
0.609
5.093
98.0
1.8226
4
711
20.1
318.43
29.68
8.1
2.979239
491
0.10574
0.0
27.74
0
0.609
5.983
98.8
1.8681
4
711
20.1
390.11
18.07
13.6
13.248580
492
0.11132
0.0
27.74
0
0.609
5.983
83.5
2.1099
4
711
20.1
396.90
13.35
20.1
15.585289
493
0.17331
0.0
9.69
0
0.585
5.707
54.0
2.3817
6
391
19.2
396.90
12.01
21.8
20.929280
494
0.27957
0.0
9.69
0
0.585
5.926
42.6
2.3817
6
391
19.2
396.90
13.59
24.5
20.978616
495
0.17899
0.0
9.69
0
0.585
5.670
28.8
2.7986
6
391
19.2
393.29
17.60
23.1
17.328641
496
0.28960
0.0
9.69
0
0.585
5.390
72.9
2.7986
6
391
19.2
396.90
21.14
19.7
14.147261
497
0.26838
0.0
9.69
0
0.585
5.794
70.6
2.8927
6
391
19.2
396.90
14.10
18.3
19.347614
498
0.23912
0.0
9.69
0
0.585
6.019
65.3
2.4091
6
391
19.2
396.90
12.92
21.2
21.539158
499
0.17783
0.0
9.69
0
0.585
5.569
73.5
2.3999
6
391
19.2
395.77
15.10
17.5
18.606657
500
0.22438
0.0
9.69
0
0.585
6.027
79.7
2.4982
6
391
19.2
396.90
14.33
16.8
20.618846
501
0.06263
0.0
11.93
0
0.573
6.593
69.1
2.4786
1
273
21.0
391.99
9.67
22.4
23.946206
502
0.04527
0.0
11.93
0
0.573
6.120
76.7
2.2875
1
273
21.0
396.90
9.08
20.6
22.711802
503
0.06076
0.0
11.93
0
0.573
6.976
91.0
2.1675
1
273
21.0
396.90
5.64
23.9
27.930317
504
0.10959
0.0
11.93
0
0.573
6.794
89.3
2.3889
1
273
21.0
393.45
6.48
22.0
26.447660
505
0.04741
0.0
11.93
0
0.573
6.030
80.8
2.5050
1
273
21.0
396.90
7.88
11.9
22.685492
506 rows × 15 columns
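Placing `Price` and `Price predicted` side by side invites a numeric comparison. As a rough sketch, the snippet below computes the residuals and the mean squared error for the first five rows of the table above, in plain Python. The helper `mean_squared_error` here is a local illustration, not the scikit-learn function of the same name. Because the model was fitted on these same rows, any error measured this way is a training error and tends to understate the error on unseen data.

```python
# First five rows of the table: actual price and the ridge prediction.
price = [24.0, 21.6, 34.7, 33.4, 36.2]
predicted = [30.258005, 24.809545, 30.535338, 28.913066, 28.183422]

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    squared = [(p - y) ** 2 for y, p in zip(y_true, y_pred)]
    return sum(squared) / float(len(squared))

# Positive residual: the model overestimates; negative: it underestimates.
residuals = [p - y for y, p in zip(price, predicted)]
mse = mean_squared_error(price, predicted)
print("residuals: %s" % residuals)
print("MSE on these five rows: %.4f" % mse)
```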
In [65]:
digits = sklearn.datasets.load_digits()
In [66]:
print(digits.DESCR)
Optical Recognition of Handwritten Digits Data Set
Notes
-----
Data Set Characteristics:
:Number of Instances: 5620
:Number of Attributes: 64
:Attribute Information: 8x8 image of integer pixels in the range 0..16.
:Missing Attribute Values: None
:Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)
:Date: July; 1998
This is a copy of the test set of the UCI ML hand-written digits datasets
http://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits
The data set contains images of hand-written digits: 10 classes where
each class refers to a digit.
Preprocessing programs made available by NIST were used to extract
normalized bitmaps of handwritten digits from a preprinted form. From a
total of 43 people, 30 contributed to the training set and different 13
to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of
4x4 and the number of on pixels are counted in each block. This generates
an input matrix of 8x8 where each element is an integer in the range
0..16. This reduces dimensionality and gives invariance to small
distortions.
For info on NIST preprocessing routines, see M. D. Garris, J. L. Blue, G.
T. Candela, D. L. Dimmick, J. Geist, P. J. Grother, S. A. Janet, and C.
L. Wilson, NIST Form-Based Handprint Recognition System, NISTIR 5469,
1994.
References
----------
- C. Kaynak (1995) Methods of Combining Multiple Classifiers and Their
Applications to Handwritten Digit Recognition, MSc Thesis, Institute of
Graduate Studies in Science and Engineering, Bogazici University.
- E. Alpaydin, C. Kaynak (1998) Cascading Classifiers, Kybernetika.
- Ken Tang and Ponnuthurai N. Suganthan and Xi Yao and A. Kai Qin.
Linear dimensionalityreduction using relevance weighted LDA. School of
Electrical and Electronic Engineering Nanyang Technological University.
2005.
- Claudio Gentile. A New Approximate Maximal Margin Classification
Algorithm. NIPS. 2000.
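The description above says that each 32x32 bitmap is divided into non-overlapping 4x4 blocks and the "on" pixels in each block are counted, which yields an 8x8 matrix of integers in the range 0..16. The following plain-Python sketch illustrates that counting step only. The function name `downsample_bitmap` and the synthetic bitmap are assumptions for illustration; this is not the NIST preprocessing code.

```python
def downsample_bitmap(bitmap, block=4):
    """Count 'on' pixels in each non-overlapping block x block tile."""
    size = len(bitmap)
    counts = []
    for r in range(0, size, block):
        row = []
        for c in range(0, size, block):
            row.append(sum(bitmap[r + i][c + j]
                           for i in range(block)
                           for j in range(block)))
        counts.append(row)
    return counts

# Hypothetical 32x32 binary image: a filled 16x16 square in the top-left corner.
bitmap = [[1 if (r < 16 and c < 16) else 0 for c in range(32)]
          for r in range(32)]

features = downsample_bitmap(bitmap)          # 8x8 matrix of counts
flat = [v for row in features for v in row]   # 64 attributes, like one row of digits.data
print("%d x %d -> %d attributes" % (len(features), len(features[0]), len(flat)))
```

A fully "on" block gives 16 and an empty block gives 0, which is why the pixel values in `digits.data` never leave the range 0..16.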
In [67]:
digits_X = digits.data
digits_y = digits.target
In [68]:
digits_X
Out[68]:
array([[ 0., 0., 5., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 10., 0., 0.],
[ 0., 0., 0., ..., 16., 9., 0.],
...,
[ 0., 0., 1., ..., 6., 0., 0.],
[ 0., 0., 2., ..., 12., 0., 0.],
[ 0., 0., 10., ..., 12., 1., 0.]])
In [69]:
shape(digits_X)
Out[69]:
(1797, 64)
In [70]:
x = digits_X[0]; x
Out[70]:
array([ 0., 0., 5., 13., 9., 1., 0., 0., 0., 0., 13.,
15., 10., 15., 5., 0., 0., 3., 15., 2., 0., 11.,
8., 0., 0., 4., 12., 0., 0., 8., 8., 0., 0.,
5., 8., 0., 0., 9., 8., 0., 0., 4., 11., 0.,
1., 12., 7., 0., 0., 2., 14., 5., 10., 12., 0.,
0., 0., 0., 6., 13., 10., 0., 0., 0.])
In [71]:
gray()
def show_digit(x):
    # Display a flattened 64-pixel vector as an 8x8 grayscale image.
    matshow(x.reshape(8, 8))
<matplotlib.figure.Figure at 0x7f7720927fd0>
In [72]:
show_digit(x)
In [73]:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=10)
kmeans.fit(digits_X)
Out[73]:
KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=10, n_init=10,
n_jobs=1, precompute_distances=True, random_state=None, tol=0.0001,
verbose=0)
In [74]:
# Predict the cluster of every sample in one call; predict expects a 2-D array.
digits_y_predicted = kmeans.predict(digits_X).tolist()
print(digits_y_predicted)
[5, 1, 1, 3, 8, 2, 7, 4, 2, 2, 5, 0, 9, 3, 8, 6, 7, 4, 1, 2, 5, 0, 9, 3, 8, 6, 7, 4, 1, 2, 5, 2, 6, 6, 7, 6, 5, 2, 1, 2, 1, 8, 0, 4, 4, 3, 6, 0, 5, 5, 1, 1, 4, 1, 4, 5, 0, 1, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 8, 1, 0, 6, 5, 2, 6, 0, 1, 0, 5, 5, 0, 4, 7, 3, 9, 0, 4, 8, 7, 3, 0, 3, 3, 1, 4, 1, 1, 8, 3, 1, 8, 5, 6, 4, 7, 2, 7, 0, 4, 6, 8, 8, 4, 9, 1, 1, 1, 6, 4, 2, 6, 8, 1, 1, 8, 2, 5, 1, 2, 1, 5, 0, 9, 3, 8, 6, 7, 4, 1, 2, 5, 0, 9, 3, 8, 6, 7, 4, 1, 2, 5, 0, 9, 3, 8, 6, 7, 4, 1, 2, 5, 3, 6, 6, 7, 6, 5, 2, 1, 2, 1, 8, 0, 4, 4, 3, 6, 0, 5, 5, 9, 9, 4, 1, 9, 5, 0, 9, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 1, 2, 0, 6, 5, 2, 6, 9, 1, 9, 5, 5, 0, 4, 7, 3, 9, 0, 4, 3, 0, 3, 2, 0, 4, 7, 1, 8, 3, 0, 8, 5, 6, 4, 7, 2, 7, 0, 4, 6, 1, 8, 4, 9, 1, 9, 9, 6, 6, 0, 1, 1, 8, 2, 5, 1, 2, 1, 5, 1, 9, 3, 8, 2, 7, 4, 2, 6, 5, 1, 9, 3, 8, 6, 7, 4, 1, 4, 5, 1, 9, 3, 8, 6, 7, 4, 1, 2, 5, 2, 6, 2, 7, 2, 5, 2, 1, 2, 1, 8, 1, 4, 4, 3, 6, 1, 5, 5, 9, 9, 4, 2, 9, 5, 0, 9, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 8, 4, 1, 6, 5, 4, 6, 9, 2, 9, 5, 5, 1, 4, 7, 3, 9, 1, 4, 8, 7, 3, 1, 3, 4, 1, 4, 7, 1, 8, 3, 1, 8, 5, 2, 3, 7, 4, 7, 1, 4, 6, 8, 8, 4, 9, 2, 9, 9, 2, 4, 0, 6, 8, 2, 2, 8, 0, 5, 1, 4, 3, 5, 0, 9, 3, 8, 6, 7, 4, 2, 2, 5, 0, 9, 3, 8, 6, 7, 4, 1, 2, 5, 0, 9, 3, 8, 6, 7, 4, 2, 2, 5, 2, 3, 6, 7, 3, 5, 2, 2, 2, 2, 8, 0, 4, 4, 3, 6, 0, 5, 5, 9, 9, 4, 2, 9, 5, 9, 9, 7, 3, 2, 4, 2, 2, 8, 7, 7, 7, 8, 2, 0, 2, 5, 2, 6, 9, 1, 9, 5, 5, 0, 4, 7, 3, 9, 0, 4, 8, 7, 3, 0, 3, 2, 0, 4, 7, 2, 8, 3, 0, 8, 5, 6, 3, 7, 2, 1, 0, 4, 6, 0, 8, 4, 9, 2, 9, 1, 6, 4, 2, 6, 8, 1, 1, 8, 2, 5, 2, 2, 2, 5, 9, 9, 2, 8, 6, 7, 4, 1, 2, 5, 9, 9, 2, 4, 2, 7, 4, 1, 2, 5, 9, 9, 2, 8, 2, 7, 4, 1, 2, 5, 1, 6, 2, 7, 2, 5, 2, 1, 2, 1, 8, 9, 4, 4, 3, 2, 7, 5, 5, 9, 9, 4, 7, 9, 5, 9, 9, 7, 2, 3, 4, 2, 2, 8, 7, 7, 7, 8, 2, 7, 6, 5, 2, 2, 9, 1, 9, 5, 5, 9, 4, 7, 2, 9, 9, 4, 8, 7, 2, 9, 2, 2, 9, 4, 7, 1, 8, 3, 1, 8, 5, 2, 3, 7, 2, 7, 9, 4, 6, 8, 8, 4, 9, 1, 9, 9, 6, 4, 2, 2, 4, 0, 1, 4, 2, 5, 1, 2, 1, 5, 1, 9, 3, 8, 6, 7, 4, 2, 0, 5, 1, 9, 3, 8, 6, 7, 4, 1, 0, 
5, 1, 9, 3, 0, 6, 7, 4, 2, 0, 5, 0, 8, 6, 7, 6, 5, 0, 1, 0, 2, 8, 1, 4, 4, 3, 6, 1, 5, 5, 3, 9, 4, 1, 1, 5, 1, 2, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 8, 0, 1, 6, 5, 0, 6, 3, 2, 9, 5, 5, 1, 4, 7, 3, 9, 1, 4, 8, 7, 3, 1, 3, 0, 1, 4, 7, 1, 8, 3, 1, 4, 5, 6, 3, 7, 0, 7, 1, 4, 6, 8, 4, 4, 9, 1, 9, 9, 6, 4, 0, 6, 8, 2, 1, 4, 0, 5, 2, 0, 1, 5, 1, 9, 3, 8, 6, 7, 4, 7, 2, 5, 1, 9, 3, 8, 6, 5, 4, 0, 2, 5, 1, 9, 3, 8, 6, 7, 4, 1, 2, 5, 2, 6, 6, 7, 6, 5, 2, 2, 2, 2, 8, 1, 4, 4, 3, 6, 1, 5, 5, 9, 9, 4, 0, 9, 5, 9, 9, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 8, 2, 1, 6, 5, 2, 6, 9, 1, 9, 5, 5, 1, 4, 7, 3, 4, 1, 4, 8, 7, 3, 1, 3, 2, 1, 4, 7, 9, 8, 3, 1, 8, 5, 6, 3, 7, 2, 7, 1, 4, 6, 8, 8, 4, 9, 0, 4, 9, 6, 4, 2, 6, 8, 0, 2, 8, 2, 5, 1, 2, 1, 9, 9, 3, 8, 6, 7, 4, 2, 2, 5, 9, 9, 3, 8, 2, 7, 4, 2, 2, 5, 9, 9, 3, 8, 2, 7, 4, 2, 2, 5, 2, 2, 2, 7, 2, 5, 2, 2, 2, 2, 8, 3, 4, 4, 3, 2, 9, 9, 4, 2, 9, 5, 9, 9, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 8, 2, 9, 2, 5, 2, 2, 9, 2, 9, 5, 5, 9, 4, 7, 3, 9, 9, 8, 7, 3, 9, 3, 2, 9, 4, 7, 2, 8, 3, 9, 8, 5, 6, 3, 7, 2, 7, 9, 4, 2, 8, 8, 4, 9, 2, 9, 9, 2, 4, 2, 2, 8, 8, 2, 5, 2, 2, 2, 5, 1, 9, 3, 8, 6, 7, 4, 4, 3, 5, 1, 9, 3, 8, 6, 7, 4, 1, 2, 5, 1, 9, 3, 8, 6, 7, 4, 2, 2, 5, 2, 6, 6, 7, 6, 5, 2, 2, 2, 1, 8, 1, 4, 4, 3, 6, 1, 5, 5, 4, 1, 9, 5, 1, 9, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 4, 2, 1, 6, 5, 6, 6, 9, 2, 3, 5, 5, 1, 4, 7, 3, 9, 1, 4, 8, 7, 3, 1, 4, 2, 1, 4, 7, 1, 8, 3, 1, 8, 5, 6, 3, 7, 2, 7, 1, 4, 6, 8, 8, 4, 9, 1, 9, 9, 6, 4, 2, 6, 8, 9, 3, 8, 2, 5, 1, 2, 1, 5, 0, 9, 3, 8, 6, 7, 4, 2, 2, 5, 0, 9, 3, 8, 6, 7, 4, 1, 2, 5, 1, 9, 3, 8, 6, 7, 4, 1, 2, 5, 2, 6, 6, 7, 6, 5, 2, 6, 2, 6, 8, 1, 4, 4, 4, 6, 1, 5, 5, 9, 9, 4, 4, 9, 5, 1, 9, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 8, 2, 1, 6, 5, 2, 6, 9, 6, 9, 5, 5, 1, 4, 7, 3, 9, 1, 4, 8, 7, 3, 1, 3, 2, 1, 4, 7, 1, 8, 3, 1, 8, 5, 6, 3, 7, 2, 7, 6, 4, 6, 8, 8, 4, 9, 9, 9, 9, 6, 4, 2, 6, 8, 1, 1, 8, 2, 5, 1, 2, 1, 5, 0, 9, 3, 8, 2, 7, 4, 1, 2, 5, 1, 9, 3, 8, 6, 7, 4, 1, 2, 5, 0, 3, 3, 8, 6, 7, 4, 1, 2, 5, 2, 6, 6, 7, 6, 5, 2, 1, 2, 1, 8, 1, 4, 4, 
3, 6, 1, 5, 5, 3, 3, 4, 1, 3, 5, 0, 3, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 8, 2, 1, 6, 5, 2, 7, 9, 1, 3, 5, 5, 0, 4, 7, 3, 9, 1, 4, 8, 7, 3, 1, 3, 2, 1, 4, 7, 1, 8, 3, 1, 8, 5, 6, 3, 7, 2, 7, 1, 4, 6, 8, 8, 4, 2, 1, 9, 9, 2, 4, 2, 6, 8, 1, 1, 8, 2, 5, 1, 5, 0, 9, 3, 8, 6, 7, 4, 2, 2, 5, 0, 9, 3, 8, 2, 7, 4, 2, 2, 5, 0, 9, 3, 8, 2, 7, 4, 2, 2, 5, 2, 2, 2, 7, 2, 5, 2, 2, 2, 2, 8, 0, 4, 4, 3, 2, 0, 5, 5, 9, 9, 4, 0, 9, 5, 0, 9, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 8, 2, 0, 2, 5, 2, 6, 9, 2, 9, 5, 5, 0, 4, 7, 3, 9, 0, 4, 8, 7, 3, 0, 3, 2, 0, 4, 7, 2, 8, 3, 0, 8, 5, 2, 3, 7, 2, 7, 0, 4, 2, 8, 8, 4, 9, 2, 9, 9, 6, 4, 2, 6, 8, 2, 4, 8, 3, 5, 4, 2, 2, 5, 1, 9, 3, 8, 6, 1, 1, 1, 0, 5, 1, 3, 3, 8, 6, 7, 0, 5, 1, 9, 3, 8, 6, 7, 4, 6, 0, 8, 0, 6, 6, 7, 6, 5, 6, 1, 6, 1, 8, 1, 4, 4, 1, 6, 1, 5, 5, 5, 9, 1, 1, 3, 5, 1, 9, 7, 1, 4, 4, 4, 1, 8, 7, 7, 7, 1, 0, 1, 6, 5, 2, 6, 3, 1, 5, 1, 4, 7, 3, 9, 1, 4, 1, 7, 3, 1, 3, 4, 1, 4, 7, 1, 8, 3, 1, 8, 5, 6, 3, 7, 3, 7, 1, 4, 6, 8, 8, 4, 9, 3, 6, 4, 3, 6, 4, 8, 6, 5, 1, 6, 4, 5, 1, 9, 3, 8, 6, 7, 4, 3, 2, 5, 1, 9, 1, 8, 6, 7, 4, 1, 2, 5, 1, 9, 6, 8, 8, 7, 4, 1, 2, 5, 2, 6, 6, 7, 6, 5, 2, 2, 2, 1, 8, 1, 4, 4, 4, 6, 1, 5, 5, 9, 9, 4, 1, 9, 5, 1, 9, 7, 1, 1, 4, 2, 1, 8, 7, 7, 7, 8, 2, 1, 6, 5, 2, 2, 9, 1, 9, 5, 5, 1, 4, 7, 3, 9, 1, 4, 8, 7, 3, 1, 3, 2, 1, 4, 7, 1, 8, 6, 1, 8, 5, 2, 3, 7, 2, 7, 1, 4, 6, 8, 8, 4, 9, 1, 9, 9, 2, 4, 3, 2, 8, 1, 1, 8, 2, 5, 1, 2, 2]
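Note that the numbers printed above are arbitrary cluster identifiers, not digit values: k-means never sees `digits_y`, so cluster 5 need not contain the digit 5. One possible way to compare a clustering with the true labels is to give each cluster the most frequent label among its members. The sketch below does this on small made-up lists rather than on the notebook's arrays; `majority_label_map` is a hypothetical helper introduced only for this illustration.

```python
from collections import Counter, defaultdict

def majority_label_map(cluster_ids, true_labels):
    """Map each cluster id to the most frequent true label among its members."""
    members = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster][label] += 1
    return {cluster: counts.most_common(1)[0][0]
            for cluster, counts in members.items()}

# Made-up toy data: cluster ids from a clustering and the corresponding true labels.
cluster_ids = [5, 1, 1, 5, 1, 3, 3, 5]
true_labels = [0, 8, 8, 0, 1, 3, 3, 0]

mapping = majority_label_map(cluster_ids, true_labels)
relabeled = [mapping[c] for c in cluster_ids]
accuracy = sum(p == y for p, y in zip(relabeled, true_labels)) / float(len(true_labels))
print("mapping: %s, accuracy after relabeling: %.3f" % (mapping, accuracy))
```

The resulting accuracy is only a rough indicator, since the mapping itself is chosen using the true labels.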
In [75]:
from sklearn.decomposition import PCA
digits_X_reduced = PCA(n_components=2).fit_transform(digits_X)
In [76]:
shape(digits_X_reduced)
Out[76]:
(1797, 2)
In [77]:
scatter(digits_X_reduced[:,0], digits_X_reduced[:,1], c=digits_y_predicted, cmap='prism');
In [78]:
kmeans.cluster_centers_
Out[78]:
array([[ 0.00000000e+00, 2.77555756e-16, 3.48837209e-02,
1.86046512e+00, 1.10581395e+01, 1.29418605e+01,
4.34883721e+00, 2.55813953e-01, 2.60208521e-18,
5.81395349e-02, 2.06976744e+00, 9.09302326e+00,
1.38139535e+01, 1.29534884e+01, 5.29069767e+00,
2.55813953e-01, 1.30104261e-18, 1.69767442e+00,
9.25581395e+00, 1.26162791e+01, 1.24069767e+01,
1.33720930e+01, 3.83720930e+00, 9.30232558e-02,
-1.08420217e-18, 3.80232558e+00, 1.23488372e+01,
1.18604651e+01, 1.33720930e+01, 1.35232558e+01,
2.25581395e+00, -2.16840434e-18, 0.00000000e+00,
1.74418605e+00, 6.43023256e+00, 6.94186047e+00,
1.17674419e+01, 1.24302326e+01, 1.58139535e+00,
0.00000000e+00, -8.67361738e-18, 6.74418605e-01,
1.61627907e+00, 3.40697674e+00, 1.18023256e+01,
1.18720930e+01, 9.88372093e-01, -6.93889390e-18,
-2.60208521e-18, 3.48837209e-02, 3.13953488e-01,
3.05813953e+00, 1.26627907e+01, 1.16627907e+01,
1.62790698e+00, 1.38777878e-16, -5.42101086e-19,
5.55111512e-16, 5.32907052e-15, 1.94186047e+00,
1.13720930e+01, 1.08604651e+01, 1.70930233e+00,
-3.33066907e-16],
[ 0.00000000e+00, 1.11111111e-01, 3.97333333e+00,
1.18311111e+01, 1.23244444e+01, 5.34222222e+00,
4.31111111e-01, -3.33066907e-16, 8.88888889e-03,
8.53333333e-01, 8.20888889e+00, 1.35155556e+01,
1.25733333e+01, 9.84444444e+00, 1.56000000e+00,
4.44089210e-16, -1.51788304e-17, 1.20888889e+00,
8.33333333e+00, 1.19022222e+01, 1.23288889e+01,
9.43111111e+00, 1.02222222e+00, -9.71445147e-17,
-3.25260652e-18, 9.37777778e-01, 7.24888889e+00,
1.40933333e+01, 1.41866667e+01, 4.99555556e+00,
2.04444444e-01, -6.50521303e-18, 0.00000000e+00,
7.64444444e-01, 7.99111111e+00, 1.47866667e+01,
1.28888889e+01, 2.21333333e+00, 6.22222222e-02,
0.00000000e+00, -2.60208521e-17, 1.22222222e+00,
1.04488889e+01, 1.20355556e+01, 1.20933333e+01,
4.00888889e+00, 2.66666667e-01, 3.46944695e-17,
1.33333333e-02, 8.71111111e-01, 9.53777778e+00,
1.15600000e+01, 1.20711111e+01, 5.60888889e+00,
6.75555556e-01, 4.44444444e-03, 4.44444444e-03,
1.11111111e-01, 4.18222222e+00, 1.19555556e+01,
1.25866667e+01, 4.90666667e+00, 8.48888889e-01,
8.88888889e-03],
[ 0.00000000e+00, 1.95121951e-01, 6.46341463e+00,
1.24959350e+01, 1.18373984e+01, 5.63414634e+00,
6.26016260e-01, 8.13008130e-03, 4.06504065e-03,
2.59756098e+00, 1.39593496e+01, 9.17886179e+00,
9.39837398e+00, 1.03739837e+01, 1.28048780e+00,
4.06504065e-03, -1.60461922e-17, 4.29268293e+00,
1.28048780e+01, 4.36991870e+00, 6.82113821e+00,
1.11544715e+01, 1.90650407e+00, -1.45716772e-16,
-8.67361738e-19, 2.32113821e+00, 1.04390244e+01,
1.18699187e+01, 1.32073171e+01, 1.20406504e+01,
2.47154472e+00, -1.73472348e-18, 0.00000000e+00,
3.04878049e-01, 3.21138211e+00, 6.23577236e+00,
6.81300813e+00, 1.12154472e+01, 4.27642276e+00,
0.00000000e+00, -6.93889390e-18, 2.19512195e-01,
2.35365854e+00, 2.00000000e+00, 1.67886179e+00,
1.09471545e+01, 6.43902439e+00, 1.62601626e-02,
-2.94902991e-17, 7.47967480e-01, 8.10569106e+00,
5.62601626e+00, 4.65447154e+00, 1.22317073e+01,
6.03658537e+00, 1.13821138e-01, -4.33680869e-19,
1.70731707e-01, 6.39024390e+00, 1.34796748e+01,
1.45203252e+01, 1.00000000e+01, 2.33739837e+00,
1.13821138e-01],
[ 0.00000000e+00, 5.92178771e-01, 8.73184358e+00,
1.45921788e+01, 1.40279330e+01, 7.04469274e+00,
6.25698324e-01, -2.77555756e-16, 1.11731844e-02,
4.18435754e+00, 1.26592179e+01, 9.16201117e+00,
1.12960894e+01, 1.20223464e+01, 1.89385475e+00,
1.11731844e-02, 5.58659218e-03, 1.90502793e+00,
3.74860335e+00, 3.68156425e+00, 1.18435754e+01,
9.92178771e+00, 8.60335196e-01, 5.55111512e-17,
-2.81892565e-18, 6.14525140e-02, 9.83240223e-01,
8.26256983e+00, 1.38156425e+01, 6.86592179e+00,
3.29608939e-01, -5.63785130e-18, 0.00000000e+00,
6.14525140e-02, 6.75977654e-01, 4.52513966e+00,
1.16648045e+01, 1.23351955e+01, 2.32402235e+00,
0.00000000e+00, -2.25514052e-17, 4.63687151e-01,
1.49720670e+00, 6.81564246e-01, 4.17877095e+00,
1.24022346e+01, 6.30167598e+00, 5.58659218e-03,
-2.42861287e-17, 9.44134078e-01, 7.34078212e+00,
6.60335196e+00, 8.66480447e+00, 1.37039106e+01,
6.02234637e+00, 1.73184358e-01, -1.40946282e-18,
4.69273743e-01, 9.50837989e+00, 1.49608939e+01,
1.41061453e+01, 8.81564246e+00, 1.82122905e+00,
4.13407821e-01],
[ 0.00000000e+00, 1.59420290e-01, 4.87922705e+00,
1.28792271e+01, 1.40241546e+01, 1.09275362e+01,
4.96135266e+00, 9.37198068e-01, -2.86229374e-17,
1.11594203e+00, 1.06473430e+01, 1.15217391e+01,
1.03864734e+01, 1.25507246e+01, 5.54106280e+00,
5.36231884e-01, -1.43114687e-17, 1.18840580e+00,
5.50241546e+00, 2.30434783e+00, 6.78260870e+00,
1.15507246e+01, 3.42995169e+00, 1.11111111e-01,
-3.25260652e-18, 1.00483092e+00, 5.09178744e+00,
6.47342995e+00, 1.21642512e+01, 1.20917874e+01,
4.79710145e+00, 4.83091787e-03, 0.00000000e+00,
1.48792271e+00, 8.65700483e+00, 1.30483092e+01,
1.46859903e+01, 1.07101449e+01, 3.94685990e+00,
0.00000000e+00, -2.60208521e-17, 1.11111111e+00,
5.15458937e+00, 1.14830918e+01, 1.10386473e+01,
3.73429952e+00, 5.36231884e-01, 1.73472348e-17,
-2.68882139e-17, 1.01449275e-01, 2.98067633e+00,
1.22753623e+01, 6.39130435e+00, 4.54106280e-01,
9.66183575e-03, -5.82867088e-16, -1.62630326e-18,
1.25603865e-01, 6.08212560e+00, 1.19855072e+01,
2.70531401e+00, 2.85024155e-01, 3.38164251e-02,
-7.21644966e-16],
[ 0.00000000e+00, 2.23463687e-02, 4.22905028e+00,
1.31396648e+01, 1.12681564e+01, 2.93854749e+00,
3.35195531e-02, -2.77555756e-16, -2.51534904e-17,
8.82681564e-01, 1.26201117e+01, 1.33687151e+01,
1.14078212e+01, 1.13687151e+01, 9.60893855e-01,
3.60822483e-16, -1.25767452e-17, 3.72625698e+00,
1.42122905e+01, 5.25139665e+00, 2.10614525e+00,
1.21173184e+01, 3.53072626e+00, 5.55111512e-17,
-2.81892565e-18, 5.29608939e+00, 1.26424581e+01,
2.03351955e+00, 2.29050279e-01, 9.07821229e+00,
6.47486034e+00, -5.63785130e-18, 0.00000000e+00,
5.88268156e+00, 1.14916201e+01, 8.65921788e-01,
3.35195531e-02, 8.81005587e+00, 7.15083799e+00,
0.00000000e+00, -2.25514052e-17, 3.51396648e+00,
1.32849162e+01, 1.65921788e+00, 1.49162011e+00,
1.13519553e+01, 5.84357542e+00, -2.08166817e-17,
-2.42861287e-17, 8.04469274e-01, 1.31117318e+01,
9.96089385e+00, 1.03519553e+01, 1.32960894e+01,
2.47486034e+00, 2.23463687e-02, -1.40946282e-18,
5.58659218e-03, 4.19553073e+00, 1.35865922e+01,
1.33407821e+01, 5.48044693e+00, 3.18435754e-01,
1.67597765e-02],
[ 0.00000000e+00, 1.10738255e+00, 1.00268456e+01,
1.34093960e+01, 1.41610738e+01, 1.25369128e+01,
4.37583893e+00, 4.02684564e-02, 6.71140940e-03,
4.55704698e+00, 1.49328859e+01, 1.25637584e+01,
8.70469799e+00, 7.03355705e+00, 2.47651007e+00,
3.35570470e-02, 1.34228188e-02, 6.07382550e+00,
1.45302013e+01, 5.95302013e+00, 1.97315436e+00,
1.02684564e+00, 2.01342282e-01, 1.38777878e-16,
6.71140940e-03, 5.30201342e+00, 1.43355705e+01,
1.23624161e+01, 7.85906040e+00, 2.26174497e+00,
1.47651007e-01, -5.20417043e-18, 0.00000000e+00,
1.94630872e+00, 8.15436242e+00, 1.00939597e+01,
1.02684564e+01, 5.51006711e+00, 6.37583893e-01,
0.00000000e+00, -2.08166817e-17, 3.02013423e-01,
1.39597315e+00, 4.87248322e+00, 9.87248322e+00,
7.02013423e+00, 7.78523490e-01, -7.63278329e-17,
-1.99493200e-17, 8.05369128e-01, 5.06040268e+00,
9.47651007e+00, 1.21275168e+01, 5.27516779e+00,
4.42953020e-01, -3.88578059e-16, -1.30104261e-18,
1.05369128e+00, 1.08926174e+01, 1.45369128e+01,
7.83892617e+00, 1.08724832e+00, 2.01342282e-02,
-6.10622664e-16],
[ 0.00000000e+00, -1.16573418e-15, 1.15934066e+00,
1.12252747e+01, 9.53296703e+00, 1.41758242e+00,
5.49450549e-03, -3.05311332e-16, -2.51534904e-17,
6.04395604e-02, 7.18131868e+00, 1.45604396e+01,
6.19230769e+00, 8.29670330e-01, 2.74725275e-02,
3.74700271e-16, -1.25767452e-17, 7.69230769e-01,
1.24560440e+01, 9.47252747e+00, 9.34065934e-01,
1.09890110e-01, 0.00000000e+00, 4.16333634e-17,
-3.03576608e-18, 2.29670330e+00, 1.36208791e+01,
8.09340659e+00, 3.87362637e+00, 1.92857143e+00,
1.04395604e-01, -6.07153217e-18, 0.00000000e+00,
3.52747253e+00, 1.46758242e+01, 1.29175824e+01,
1.22527473e+01, 1.02857143e+01, 2.71978022e+00,
0.00000000e+00, -2.42861287e-17, 1.86813187e+00,
1.45164835e+01, 1.06538462e+01, 5.57692308e+00,
1.01923077e+01, 9.13186813e+00, 2.30769231e-01,
-2.42861287e-17, 1.75824176e-01, 1.02857143e+01,
1.26263736e+01, 5.41758242e+00, 1.13241758e+01,
1.08956044e+01, 6.26373626e-01, -1.51788304e-18,
-6.10622664e-16, 1.44505495e+00, 1.07362637e+01,
1.50989011e+01, 1.31318681e+01, 4.62087912e+00,
1.70329670e-01],
[ 0.00000000e+00, -1.05471187e-15, 2.84023669e-01,
6.95266272e+00, 1.19467456e+01, 2.02366864e+00,
1.47928994e-01, 5.32544379e-02, -2.34187669e-17,
1.18343195e-02, 3.17159763e+00, 1.35917160e+01,
8.63313609e+00, 1.54437870e+00, 9.58579882e-01,
3.13609467e-01, -1.17093835e-17, 6.15384615e-01,
1.04674556e+01, 1.16390533e+01, 4.39644970e+00,
5.20710059e+00, 3.88165680e+00, 3.49112426e-01,
5.91715976e-03, 4.64497041e+00, 1.46449704e+01,
6.01775148e+00, 6.78106509e+00, 1.08165680e+01,
6.26035503e+00, 1.77514793e-02, 0.00000000e+00,
8.84615385e+00, 1.48284024e+01, 9.41420118e+00,
1.27692308e+01, 1.44260355e+01, 5.46153846e+00,
0.00000000e+00, 9.46745562e-02, 6.43786982e+00,
1.15621302e+01, 1.22307692e+01, 1.47751479e+01,
1.09112426e+01, 1.60355030e+00, -3.81639165e-17,
5.91715976e-02, 1.11242604e+00, 2.92899408e+00,
7.54437870e+00, 1.39881657e+01, 4.36686391e+00,
1.77514793e-02, -4.71844785e-16, -1.40946282e-18,
2.36686391e-02, 3.43195266e-01, 7.73964497e+00,
1.23491124e+01, 1.92307692e+00, -5.77315973e-15,
-6.66133815e-16],
[ 0.00000000e+00, 9.42857143e-01, 1.01885714e+01,
1.44400000e+01, 7.77142857e+00, 9.82857143e-01,
-1.33226763e-15, -2.77555756e-16, 2.28571429e-02,
5.24000000e+00, 1.37200000e+01, 1.26228571e+01,
1.16914286e+01, 3.23428571e+00, 1.71428571e-02,
3.60822483e-16, 1.14285714e-02, 4.56000000e+00,
8.11428571e+00, 6.13714286e+00, 1.21600000e+01,
3.56000000e+00, 1.71428571e-02, 7.63278329e-17,
-2.81892565e-18, 9.65714286e-01, 2.81714286e+00,
7.00571429e+00, 1.25371429e+01, 2.56000000e+00,
4.00000000e-02, -5.63785130e-18, 0.00000000e+00,
4.57142857e-02, 1.57142857e+00, 9.89714286e+00,
1.06971429e+01, 1.45142857e+00, -7.10542736e-15,
0.00000000e+00, -2.25514052e-17, 2.51428571e-01,
4.45714286e+00, 1.12457143e+01, 7.74285714e+00,
2.37142857e+00, 8.45714286e-01, 1.14285714e-02,
-2.34187669e-17, 1.19428571e+00, 1.09942857e+01,
1.37314286e+01, 1.19257143e+01, 1.11600000e+01,
7.66857143e+00, 1.10285714e+00, -1.40946282e-18,
9.31428571e-01, 1.03885714e+01, 1.44685714e+01,
1.35028571e+01, 1.23542857e+01, 8.96571429e+00,
2.95428571e+00]])
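Each row of `cluster_centers_` is a 64-dimensional centroid, which at convergence is essentially the mean of the samples assigned to that cluster. Entries such as `2.77555756e-16` or `-1.08420217e-18` are floating-point round-off around zero, since pixel intensities are never negative. The plain-Python sketch below shows the two ingredients on toy 2-D points: computing a centroid as a mean, and assigning a point to the nearest centroid by Euclidean distance, which is what `predict` does. The names `centroid` and `nearest` and the data are illustrative assumptions.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = float(len(points))
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))

# Toy 2-D clusters standing in for groups of 64-pixel digit vectors.
cluster_a = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
cluster_b = [[10.0, 10.0], [12.0, 10.0]]

centers = [centroid(cluster_a), centroid(cluster_b)]
assignment_near_a = nearest([2.0, 2.0], centers)
assignment_near_b = nearest([9.0, 9.0], centers)
print("centers: %s" % centers)
print("assignments: %d, %d" % (assignment_near_a, assignment_near_b))
```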
In [79]:
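# Show each centroid as an 8x8 image; an explicit loop also works where map() is lazy.
for center in kmeans.cluster_centers_:
    show_digit(center)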
Content source: jsnajder/StrojnoUcenje