University of Zagreb
Faculty of Electrical Engineering and Computing

Machine Learning

http://www.fer.unizg.hr/predmet/su

Academic year 2015/2016

Notebook 1: Introductory Examples

(c) 2015 Jan Šnajder

Version: 0.8 (2015-10-15)


In [81]:
# Load the core libraries...
import scipy as sp
import sklearn
import pandas as pd
%pylab inline


Populating the interactive namespace from numpy and matplotlib

Contents:

  • Classification: Titanic Survivors

  • Regression: Boston House Prices

  • Clustering: Handwritten Digits

Classification: Titanic Survivors

Data

VARIABLE DESCRIPTIONS:
survival        Survival
                (0 = No; 1 = Yes)
pclass          Passenger Class
                (1 = 1st; 2 = 2nd; 3 = 3rd)
name            Name
sex             Sex
age             Age
sibsp           Number of Siblings/Spouses Aboard
parch           Number of Parents/Children Aboard
ticket          Ticket Number
fare            Passenger Fare
cabin           Cabin
embarked        Port of Embarkation
                (C = Cherbourg; Q = Queenstown; S = Southampton)

SPECIAL NOTES:
Pclass is a proxy for socio-economic status (SES)
 1st ~ Upper; 2nd ~ Middle; 3rd ~ Lower

Age is in Years; Fractional if Age less than One (1)
 If the Age is Estimated, it is in the form xx.5

With respect to the family relation variables (i.e. sibsp and parch)
some relations were ignored.  The following are the definitions used
for sibsp and parch.

Sibling:  Brother, Sister, Stepbrother, or Stepsister of Passenger Aboard Titanic
Spouse:   Husband or Wife of Passenger Aboard Titanic (Mistresses and Fiances Ignored)
Parent:   Mother or Father of Passenger Aboard Titanic
Child:    Son, Daughter, Stepson, or Stepdaughter of Passenger Aboard Titanic

Other family relatives excluded from this study include cousins,
nephews/nieces, aunts/uncles, and in-laws.  Some children travelled
only with a nanny, therefore parch=0 for them.  As well, some
travelled with very close friends or neighbors in a village, however,
the definitions do not support such relations.

In [32]:
titanic_df = pd.read_csv("../data/titanic-train.csv")
titanic_df


Out[32]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35 0 0 373450 8.0500 NaN S
5 6 0 3 Moran, Mr. James male NaN 0 0 330877 8.4583 NaN Q
6 7 0 1 McCarthy, Mr. Timothy J male 54 0 0 17463 51.8625 E46 S
7 8 0 3 Palsson, Master. Gosta Leonard male 2 3 1 349909 21.0750 NaN S
8 9 1 3 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27 0 2 347742 11.1333 NaN S
9 10 1 2 Nasser, Mrs. Nicholas (Adele Achem) female 14 1 0 237736 30.0708 NaN C
10 11 1 3 Sandstrom, Miss. Marguerite Rut female 4 1 1 PP 9549 16.7000 G6 S
11 12 1 1 Bonnell, Miss. Elizabeth female 58 0 0 113783 26.5500 C103 S
12 13 0 3 Saundercock, Mr. William Henry male 20 0 0 A/5. 2151 8.0500 NaN S
13 14 0 3 Andersson, Mr. Anders Johan male 39 1 5 347082 31.2750 NaN S
14 15 0 3 Vestrom, Miss. Hulda Amanda Adolfina female 14 0 0 350406 7.8542 NaN S
15 16 1 2 Hewlett, Mrs. (Mary D Kingcome) female 55 0 0 248706 16.0000 NaN S
16 17 0 3 Rice, Master. Eugene male 2 4 1 382652 29.1250 NaN Q
17 18 1 2 Williams, Mr. Charles Eugene male NaN 0 0 244373 13.0000 NaN S
18 19 0 3 Vander Planke, Mrs. Julius (Emelia Maria Vande... female 31 1 0 345763 18.0000 NaN S
19 20 1 3 Masselmani, Mrs. Fatima female NaN 0 0 2649 7.2250 NaN C
20 21 0 2 Fynney, Mr. Joseph J male 35 0 0 239865 26.0000 NaN S
21 22 1 2 Beesley, Mr. Lawrence male 34 0 0 248698 13.0000 D56 S
22 23 1 3 McGowan, Miss. Anna "Annie" female 15 0 0 330923 8.0292 NaN Q
23 24 1 1 Sloper, Mr. William Thompson male 28 0 0 113788 35.5000 A6 S
24 25 0 3 Palsson, Miss. Torborg Danira female 8 3 1 349909 21.0750 NaN S
25 26 1 3 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... female 38 1 5 347077 31.3875 NaN S
26 27 0 3 Emir, Mr. Farred Chehab male NaN 0 0 2631 7.2250 NaN C
27 28 0 1 Fortune, Mr. Charles Alexander male 19 3 2 19950 263.0000 C23 C25 C27 S
28 29 1 3 O'Dwyer, Miss. Ellen "Nellie" female NaN 0 0 330959 7.8792 NaN Q
29 30 0 3 Todoroff, Mr. Lalio male NaN 0 0 349216 7.8958 NaN S
... ... ... ... ... ... ... ... ... ... ... ... ...
861 862 0 2 Giles, Mr. Frederick Edward male 21 1 0 28134 11.5000 NaN S
862 863 1 1 Swift, Mrs. Frederick Joel (Margaret Welles Ba... female 48 0 0 17466 25.9292 D17 S
863 864 0 3 Sage, Miss. Dorothy Edith "Dolly" female NaN 8 2 CA. 2343 69.5500 NaN S
864 865 0 2 Gill, Mr. John William male 24 0 0 233866 13.0000 NaN S
865 866 1 2 Bystrom, Mrs. (Karolina) female 42 0 0 236852 13.0000 NaN S
866 867 1 2 Duran y More, Miss. Asuncion female 27 1 0 SC/PARIS 2149 13.8583 NaN C
867 868 0 1 Roebling, Mr. Washington Augustus II male 31 0 0 PC 17590 50.4958 A24 S
868 869 0 3 van Melkebeke, Mr. Philemon male NaN 0 0 345777 9.5000 NaN S
869 870 1 3 Johnson, Master. Harold Theodor male 4 1 1 347742 11.1333 NaN S
870 871 0 3 Balkic, Mr. Cerin male 26 0 0 349248 7.8958 NaN S
871 872 1 1 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) female 47 1 1 11751 52.5542 D35 S
872 873 0 1 Carlsson, Mr. Frans Olof male 33 0 0 695 5.0000 B51 B53 B55 S
873 874 0 3 Vander Cruyssen, Mr. Victor male 47 0 0 345765 9.0000 NaN S
874 875 1 2 Abelson, Mrs. Samuel (Hannah Wizosky) female 28 1 0 P/PP 3381 24.0000 NaN C
875 876 1 3 Najib, Miss. Adele Kiamie "Jane" female 15 0 0 2667 7.2250 NaN C
876 877 0 3 Gustafsson, Mr. Alfred Ossian male 20 0 0 7534 9.8458 NaN S
877 878 0 3 Petroff, Mr. Nedelio male 19 0 0 349212 7.8958 NaN S
878 879 0 3 Laleff, Mr. Kristo male NaN 0 0 349217 7.8958 NaN S
879 880 1 1 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56 0 1 11767 83.1583 C50 C
880 881 1 2 Shelley, Mrs. William (Imanita Parrish Hall) female 25 0 1 230433 26.0000 NaN S
881 882 0 3 Markun, Mr. Johann male 33 0 0 349257 7.8958 NaN S
882 883 0 3 Dahlberg, Miss. Gerda Ulrika female 22 0 0 7552 10.5167 NaN S
883 884 0 2 Banfield, Mr. Frederick James male 28 0 0 C.A./SOTON 34068 10.5000 NaN S
884 885 0 3 Sutehall, Mr. Henry Jr male 25 0 0 SOTON/OQ 392076 7.0500 NaN S
885 886 0 3 Rice, Mrs. William (Margaret Norton) female 39 0 5 382652 29.1250 NaN Q
886 887 0 2 Montvila, Rev. Juozas male 27 0 0 211536 13.0000 NaN S
887 888 1 1 Graham, Miss. Margaret Edith female 19 0 0 112053 30.0000 B42 S
888 889 0 3 Johnston, Miss. Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.4500 NaN S
889 890 1 1 Behr, Mr. Karl Howell male 26 0 0 111369 30.0000 C148 C
890 891 0 3 Dooley, Mr. Patrick male 32 0 0 370376 7.7500 NaN Q

891 rows × 12 columns


In [33]:
titanic_df.drop(['PassengerId'], axis=1, inplace=True)

In [34]:
titanic_df.describe()


Out[34]:
         Survived      Pclass         Age       SibSp       Parch        Fare
count  891.000000  891.000000  714.000000  891.000000  891.000000  891.000000
mean     0.383838    2.308642   29.699118    0.523008    0.381594   32.204208
std      0.486592    0.836071   14.526497    1.102743    0.806057   49.693429
min      0.000000    1.000000    0.420000    0.000000    0.000000    0.000000
25%      0.000000    2.000000   20.125000    0.000000    0.000000    7.910400
50%      0.000000    3.000000   28.000000    0.000000    0.000000   14.454200
75%      1.000000    3.000000   38.000000    1.000000    0.000000   31.000000
max      1.000000    3.000000   80.000000    8.000000    6.000000  512.329200
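
The `count` row reports 714 values for Age against 891 for the other columns, so some ages are missing. A minimal sketch of how one might count missing entries per column follows. To stay self-contained it does not read the CSV; the frame `sample_df` is a hypothetical name holding rows 0-5 of the table shown earlier, re-typed by hand.

```python
import numpy as np
import pandas as pd

# Rows 0-5 of the Titanic table shown earlier, re-typed by hand (illustration only)
sample_df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 3],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Age": [22, 38, 26, 35, 35, np.nan],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# Number of missing entries in each column
missing_per_column = sample_df.isna().sum()
print(missing_per_column)

# Fraction of rows for which Age is known
known_age_share = sample_df["Age"].notna().mean()
print(known_age_share)
```

On the full data set the same two calls would presumably be applied to `titanic_df` directly.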

In [35]:
titanic_df1 = titanic_df[['Pclass', 'Sex', 'Age','Survived']]
titanic_df1


Out[35]:
Pclass Sex Age Survived
0 3 male 22 0
1 1 female 38 1
2 3 female 26 1
3 1 female 35 1
4 3 male 35 0
5 3 male NaN 0
6 1 male 54 0
7 3 male 2 0
8 3 female 27 1
9 2 female 14 1
10 3 female 4 1
11 1 female 58 1
12 3 male 20 0
13 3 male 39 0
14 3 female 14 0
15 2 female 55 1
16 3 male 2 0
17 2 male NaN 1
18 3 female 31 0
19 3 female NaN 1
20 2 male 35 0
21 2 male 34 1
22 3 female 15 1
23 1 male 28 1
24 3 female 8 0
25 3 female 38 1
26 3 male NaN 0
27 1 male 19 0
28 3 female NaN 1
29 3 male NaN 0
... ... ... ... ...
861 2 male 21 0
862 1 female 48 1
863 3 female NaN 0
864 2 male 24 0
865 2 female 42 1
866 2 female 27 1
867 1 male 31 0
868 3 male NaN 0
869 3 male 4 1
870 3 male 26 0
871 1 female 47 1
872 1 male 33 0
873 3 male 47 0
874 2 female 28 1
875 3 female 15 1
876 3 male 20 0
877 3 male 19 0
878 3 male NaN 0
879 1 female 56 1
880 2 female 25 1
881 3 male 33 0
882 3 female 22 0
883 2 male 28 0
884 3 male 25 0
885 3 female 39 0
886 2 male 27 0
887 1 female 19 1
888 3 female NaN 0
889 1 male 26 1
890 3 male 32 0

891 rows × 4 columns


In [36]:
survivors = titanic_df1[titanic_df1['Survived']==1]
victims = titanic_df1[titanic_df1['Survived']==0]

In [37]:
scatter(titanic_df1['Age'], titanic_df1['Pclass'], 
        c=titanic_df1['Survived'], cmap='prism', marker='o', s=100, alpha=0.5);


Training the Classifier


In [38]:
# as_matrix() was removed from newer pandas releases; to_numpy() returns the same array
titanic_X = titanic_df[['Pclass', 'Sex', 'Age']].to_numpy()
titanic_y = titanic_df['Survived'].to_numpy()

In [39]:
shape(titanic_X), shape(titanic_y)


Out[39]:
((891, 3), (891,))

In [40]:
titanic_X


Out[40]:
array([[3, 'male', 22.0],
       [1, 'female', 38.0],
       [3, 'female', 26.0],
       ..., 
       [3, 'female', nan],
       [1, 'male', 26.0],
       [3, 'male', 32.0]], dtype=object)

In [41]:
titanic_y


Out[41]:
array([0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1,
       1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0,
       0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1,
       0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0,
       0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
       1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
       1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0,
       1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0,
       1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0,
       0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1,
       0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1,
       1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0,
       1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0,
       1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1,
       1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1,
       1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0,
       1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0,
       1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
       1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0,
       0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0,
       0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0,
       0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0,
       0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0,
       1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1,
       1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
       0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,
       1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1,
       0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0,
       1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1,
       0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1,
       0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1,
       1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0,
       1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0])

In [42]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

In [43]:
titanic_X[:,1] = le.fit_transform(titanic_X[:,1])
print(titanic_X)


[[3 1 22.0]
 [1 0 38.0]
 [3 0 26.0]
 ..., 
 [3 0 nan]
 [1 1 26.0]
 [3 1 32.0]]
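
The output above shows 'male' encoded as 1 and 'female' as 0. A small sketch of how that mapping can be inspected and reversed, using a hand-made four-element array (the variable names are illustrative, not part of the notebook):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hand-made example column with the two labels that occur in 'Sex'
sex = np.array(["male", "female", "female", "male"], dtype=object)

encoder = LabelEncoder()
codes = encoder.fit_transform(sex)

# classes_ holds the original labels in sorted order; the position is the integer code
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)
print(codes)

# inverse_transform turns the integer codes back into the original labels
restored = encoder.inverse_transform(codes)
print(restored)
```

Because the labels are sorted before codes are assigned, the integers carry no meaning beyond identity; a tree can still split on them since there are only two values.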

In [44]:
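# Imputer was removed from newer scikit-learn releases; SimpleImputer replaces it
import numpy as np
from sklearn.impute import SimpleImputer
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
# Cast the object array to float so that the column means can be computed
titanic_X = imp.fit_transform(titanic_X.astype(float))
print(titanic_X)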


[[  3.           1.          22.        ]
 [  1.           0.          38.        ]
 [  3.           0.          26.        ]
 ..., 
 [  3.           0.          29.69911765]
 [  1.           1.          26.        ]
 [  3.           1.          32.        ]]
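
In the output above the missing age was replaced by 29.69911765, the mean of the known ages. A hand-rolled sketch of that mean-imputation step in plain NumPy follows. It uses only four rows taken from the printed matrix, so the fill value differs from the one computed on all 891 rows.

```python
import numpy as np

# Four encoded rows (Pclass, Sex, Age) from the printed matrix; the last age is missing
X = np.array([
    [3, 1, 22.0],
    [1, 0, 38.0],
    [3, 0, 26.0],
    [3, 0, np.nan],
])

# Column means computed over the known (non-NaN) entries only
col_means = np.nanmean(X, axis=0)

# Replace every NaN with the mean of its column; other entries stay unchanged
X_filled = np.where(np.isnan(X), col_means, X)
print(X_filled)
```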

In [45]:
from sklearn import tree
clf = tree.DecisionTreeClassifier(max_depth=3)
clf = clf.fit(titanic_X, titanic_y)

In [46]:
titanic_y_predicted = clf.predict(titanic_X)

In [47]:
titanic_df.insert(1,'Survior pred', titanic_y_predicted)
titanic_df


Out[47]:
Survived Survior pred Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 0 0 3 Braund, Mr. Owen Harris male 22 1 0 A/5 21171 7.2500 NaN S
1 1 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38 1 0 PC 17599 71.2833 C85 C
2 1 1 3 Heikkinen, Miss. Laina female 26 0 0 STON/O2. 3101282 7.9250 NaN S
3 1 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 0 113803 53.1000 C123 S
4 0 0 3 Allen, Mr. William Henry male 35 0 0 373450 8.0500 NaN S
5 0 0 3 Moran, Mr. James male NaN 0 0 330877 8.4583 NaN Q
6 0 0 1 McCarthy, Mr. Timothy J male 54 0 0 17463 51.8625 E46 S
7 0 0 3 Palsson, Master. Gosta Leonard male 2 3 1 349909 21.0750 NaN S
8 1 1 3 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27 0 2 347742 11.1333 NaN S
9 1 1 2 Nasser, Mrs. Nicholas (Adele Achem) female 14 1 0 237736 30.0708 NaN C
10 1 1 3 Sandstrom, Miss. Marguerite Rut female 4 1 1 PP 9549 16.7000 G6 S
11 1 1 1 Bonnell, Miss. Elizabeth female 58 0 0 113783 26.5500 C103 S
12 0 0 3 Saundercock, Mr. William Henry male 20 0 0 A/5. 2151 8.0500 NaN S
13 0 0 3 Andersson, Mr. Anders Johan male 39 1 5 347082 31.2750 NaN S
14 0 1 3 Vestrom, Miss. Hulda Amanda Adolfina female 14 0 0 350406 7.8542 NaN S
15 1 1 2 Hewlett, Mrs. (Mary D Kingcome) female 55 0 0 248706 16.0000 NaN S
16 0 0 3 Rice, Master. Eugene male 2 4 1 382652 29.1250 NaN Q
17 1 0 2 Williams, Mr. Charles Eugene male NaN 0 0 244373 13.0000 NaN S
18 0 1 3 Vander Planke, Mrs. Julius (Emelia Maria Vande... female 31 1 0 345763 18.0000 NaN S
19 1 1 3 Masselmani, Mrs. Fatima female NaN 0 0 2649 7.2250 NaN C
20 0 0 2 Fynney, Mr. Joseph J male 35 0 0 239865 26.0000 NaN S
21 1 0 2 Beesley, Mr. Lawrence male 34 0 0 248698 13.0000 D56 S
22 1 1 3 McGowan, Miss. Anna "Annie" female 15 0 0 330923 8.0292 NaN Q
23 1 0 1 Sloper, Mr. William Thompson male 28 0 0 113788 35.5000 A6 S
24 0 1 3 Palsson, Miss. Torborg Danira female 8 3 1 349909 21.0750 NaN S
25 1 1 3 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... female 38 1 5 347077 31.3875 NaN S
26 0 0 3 Emir, Mr. Farred Chehab male NaN 0 0 2631 7.2250 NaN C
27 0 0 1 Fortune, Mr. Charles Alexander male 19 3 2 19950 263.0000 C23 C25 C27 S
28 1 1 3 O'Dwyer, Miss. Ellen "Nellie" female NaN 0 0 330959 7.8792 NaN Q
29 0 0 3 Todoroff, Mr. Lalio male NaN 0 0 349216 7.8958 NaN S
... ... ... ... ... ... ... ... ... ... ... ... ...
861 0 0 2 Giles, Mr. Frederick Edward male 21 1 0 28134 11.5000 NaN S
862 1 1 1 Swift, Mrs. Frederick Joel (Margaret Welles Ba... female 48 0 0 17466 25.9292 D17 S
863 0 1 3 Sage, Miss. Dorothy Edith "Dolly" female NaN 8 2 CA. 2343 69.5500 NaN S
864 0 0 2 Gill, Mr. John William male 24 0 0 233866 13.0000 NaN S
865 1 1 2 Bystrom, Mrs. (Karolina) female 42 0 0 236852 13.0000 NaN S
866 1 1 2 Duran y More, Miss. Asuncion female 27 1 0 SC/PARIS 2149 13.8583 NaN C
867 0 0 1 Roebling, Mr. Washington Augustus II male 31 0 0 PC 17590 50.4958 A24 S
868 0 0 3 van Melkebeke, Mr. Philemon male NaN 0 0 345777 9.5000 NaN S
869 1 0 3 Johnson, Master. Harold Theodor male 4 1 1 347742 11.1333 NaN S
870 0 0 3 Balkic, Mr. Cerin male 26 0 0 349248 7.8958 NaN S
871 1 1 1 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) female 47 1 1 11751 52.5542 D35 S
872 0 0 1 Carlsson, Mr. Frans Olof male 33 0 0 695 5.0000 B51 B53 B55 S
873 0 0 3 Vander Cruyssen, Mr. Victor male 47 0 0 345765 9.0000 NaN S
874 1 1 2 Abelson, Mrs. Samuel (Hannah Wizosky) female 28 1 0 P/PP 3381 24.0000 NaN C
875 1 1 3 Najib, Miss. Adele Kiamie "Jane" female 15 0 0 2667 7.2250 NaN C
876 0 0 3 Gustafsson, Mr. Alfred Ossian male 20 0 0 7534 9.8458 NaN S
877 0 0 3 Petroff, Mr. Nedelio male 19 0 0 349212 7.8958 NaN S
878 0 0 3 Laleff, Mr. Kristo male NaN 0 0 349217 7.8958 NaN S
879 1 1 1 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56 0 1 11767 83.1583 C50 C
880 1 1 2 Shelley, Mrs. William (Imanita Parrish Hall) female 25 0 1 230433 26.0000 NaN S
881 0 0 3 Markun, Mr. Johann male 33 0 0 349257 7.8958 NaN S
882 0 1 3 Dahlberg, Miss. Gerda Ulrika female 22 0 0 7552 10.5167 NaN S
883 0 0 2 Banfield, Mr. Frederick James male 28 0 0 C.A./SOTON 34068 10.5000 NaN S
884 0 0 3 Sutehall, Mr. Henry Jr male 25 0 0 SOTON/OQ 392076 7.0500 NaN S
885 0 0 3 Rice, Mrs. William (Margaret Norton) female 39 0 5 382652 29.1250 NaN Q
886 0 0 2 Montvila, Rev. Juozas male 27 0 0 211536 13.0000 NaN S
887 1 1 1 Graham, Miss. Margaret Edith female 19 0 0 112053 30.0000 B42 S
888 0 1 3 Johnston, Miss. Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.4500 NaN S
889 1 0 1 Behr, Mr. Karl Howell male 26 0 0 111369 30.0000 C148 C
890 0 0 3 Dooley, Mr. Patrick male 32 0 0 370376 7.7500 NaN Q

891 rows × 12 columns


In [48]:
from sklearn.metrics import accuracy_score
accuracy_score(titanic_y, titanic_y_predicted)


Out[48]:
0.80920314253647585
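
The accuracy above is measured on the same examples the tree was trained on, so it probably overstates how well the model would do on unseen passengers. A minimal sketch of a held-out evaluation follows. The data is a synthetic stand-in generated with `make_classification`, not the Titanic set, and the 70/30 split is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the Titanic set): 891 examples, 3 features
X, y = make_classification(n_samples=891, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

# Hold out 30% of the examples; stratify keeps the class ratio similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

# Accuracy on the training part versus the held-out part
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(train_acc, test_acc)
```

With the notebook's variables one would presumably pass `titanic_X` and `titanic_y` to `train_test_split` instead.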

In [49]:
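from io import StringIO  # sklearn.externals.six was removed from newer scikit-learn releases
import pydot
from IPython.display import Image
dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data, feature_names=['Pclass', 'Sex', 'Age'])
# Newer pydot releases return a list of graphs, older ones a single graph
graphs = pydot.graph_from_dot_data(dot_data.getvalue())
graph = graphs[0] if isinstance(graphs, list) else graphs
img = Image(graph.create_png())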
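
When pydot or Graphviz is not installed, the learned rules can also be printed as text. A small sketch using `sklearn.tree.export_text` follows. The tree here is a toy model (`toy_clf`, a hypothetical name) fitted on six encoded rows re-typed from the tables above, so its rules are not those of the notebook's `clf`.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Six encoded rows (Pclass, Sex, Age) re-typed from the tables above; toy subset only
X = [[3, 1, 22.0], [1, 0, 38.0], [3, 0, 26.0],
     [1, 0, 35.0], [3, 1, 35.0], [1, 1, 54.0]]
y = [0, 1, 1, 1, 0, 0]  # matching 'Survived' values

toy_clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Render the decision rules as indented plain text
rules = export_text(toy_clf, feature_names=["Pclass", "Sex", "Age"])
print(rules)
```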

In [50]:
img.width=800; img


Out[50]:

In [51]:
titanic_y_predicted_proba = clf.predict_proba(titanic_X)

In [52]:
titanic_df.insert(2,'Survior prob', titanic_y_predicted_proba[:,1])
titanic_df


Out[52]:
Survived Survior pred Survior prob Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 0 0 0.115473 3 Braund, Mr. Owen Harris male 22 1 0 A/5 21171 7.2500 NaN S
1 1 1 0.952381 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38 1 0 PC 17599 71.2833 C85 C
2 1 1 0.537879 3 Heikkinen, Miss. Laina female 26 0 0 STON/O2. 3101282 7.9250 NaN S
3 1 1 0.952381 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 0 113803 53.1000 C123 S
4 0 0 0.115473 3 Allen, Mr. William Henry male 35 0 0 373450 8.0500 NaN S
5 0 0 0.115473 3 Moran, Mr. James male NaN 0 0 330877 8.4583 NaN Q
6 0 0 0.358333 1 McCarthy, Mr. Timothy J male 54 0 0 17463 51.8625 E46 S
7 0 0 0.428571 3 Palsson, Master. Gosta Leonard male 2 3 1 349909 21.0750 NaN S
8 1 1 0.537879 3 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27 0 2 347742 11.1333 NaN S
9 1 1 0.952381 2 Nasser, Mrs. Nicholas (Adele Achem) female 14 1 0 237736 30.0708 NaN C
10 1 1 0.537879 3 Sandstrom, Miss. Marguerite Rut female 4 1 1 PP 9549 16.7000 G6 S
11 1 1 0.952381 1 Bonnell, Miss. Elizabeth female 58 0 0 113783 26.5500 C103 S
12 0 0 0.115473 3 Saundercock, Mr. William Henry male 20 0 0 A/5. 2151 8.0500 NaN S
13 0 0 0.115473 3 Andersson, Mr. Anders Johan male 39 1 5 347082 31.2750 NaN S
14 0 1 0.537879 3 Vestrom, Miss. Hulda Amanda Adolfina female 14 0 0 350406 7.8542 NaN S
15 1 1 0.952381 2 Hewlett, Mrs. (Mary D Kingcome) female 55 0 0 248706 16.0000 NaN S
16 0 0 0.428571 3 Rice, Master. Eugene male 2 4 1 382652 29.1250 NaN Q
17 1 0 0.115473 2 Williams, Mr. Charles Eugene male NaN 0 0 244373 13.0000 NaN S
18 0 1 0.537879 3 Vander Planke, Mrs. Julius (Emelia Maria Vande... female 31 1 0 345763 18.0000 NaN S
19 1 1 0.537879 3 Masselmani, Mrs. Fatima female NaN 0 0 2649 7.2250 NaN C
20 0 0 0.115473 2 Fynney, Mr. Joseph J male 35 0 0 239865 26.0000 NaN S
21 1 0 0.115473 2 Beesley, Mr. Lawrence male 34 0 0 248698 13.0000 D56 S
22 1 1 0.537879 3 McGowan, Miss. Anna "Annie" female 15 0 0 330923 8.0292 NaN Q
23 1 0 0.358333 1 Sloper, Mr. William Thompson male 28 0 0 113788 35.5000 A6 S
24 0 1 0.537879 3 Palsson, Miss. Torborg Danira female 8 3 1 349909 21.0750 NaN S
25 1 1 0.537879 3 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... female 38 1 5 347077 31.3875 NaN S
26 0 0 0.115473 3 Emir, Mr. Farred Chehab male NaN 0 0 2631 7.2250 NaN C
27 0 0 0.358333 1 Fortune, Mr. Charles Alexander male 19 3 2 19950 263.0000 C23 C25 C27 S
28 1 1 0.537879 3 O'Dwyer, Miss. Ellen "Nellie" female NaN 0 0 330959 7.8792 NaN Q
29 0 0 0.115473 3 Todoroff, Mr. Lalio male NaN 0 0 349216 7.8958 NaN S
... ... ... ... ... ... ... ... ... ... ... ... ... ...
861 0 0 0.115473 2 Giles, Mr. Frederick Edward male 21 1 0 28134 11.5000 NaN S
862 1 1 0.952381 1 Swift, Mrs. Frederick Joel (Margaret Welles Ba... female 48 0 0 17466 25.9292 D17 S
863 0 1 0.537879 3 Sage, Miss. Dorothy Edith "Dolly" female NaN 8 2 CA. 2343 69.5500 NaN S
864 0 0 0.115473 2 Gill, Mr. John William male 24 0 0 233866 13.0000 NaN S
865 1 1 0.952381 2 Bystrom, Mrs. (Karolina) female 42 0 0 236852 13.0000 NaN S
866 1 1 0.952381 2 Duran y More, Miss. Asuncion female 27 1 0 SC/PARIS 2149 13.8583 NaN C
867 0 0 0.358333 1 Roebling, Mr. Washington Augustus II male 31 0 0 PC 17590 50.4958 A24 S
868 0 0 0.115473 3 van Melkebeke, Mr. Philemon male NaN 0 0 345777 9.5000 NaN S
869 1 0 0.428571 3 Johnson, Master. Harold Theodor male 4 1 1 347742 11.1333 NaN S
870 0 0 0.115473 3 Balkic, Mr. Cerin male 26 0 0 349248 7.8958 NaN S
871 1 1 0.952381 1 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) female 47 1 1 11751 52.5542 D35 S
872 0 0 0.358333 1 Carlsson, Mr. Frans Olof male 33 0 0 695 5.0000 B51 B53 B55 S
873 0 0 0.115473 3 Vander Cruyssen, Mr. Victor male 47 0 0 345765 9.0000 NaN S
874 1 1 0.952381 2 Abelson, Mrs. Samuel (Hannah Wizosky) female 28 1 0 P/PP 3381 24.0000 NaN C
875 1 1 0.537879 3 Najib, Miss. Adele Kiamie "Jane" female 15 0 0 2667 7.2250 NaN C
876 0 0 0.115473 3 Gustafsson, Mr. Alfred Ossian male 20 0 0 7534 9.8458 NaN S
877 0 0 0.115473 3 Petroff, Mr. Nedelio male 19 0 0 349212 7.8958 NaN S
878 0 0 0.115473 3 Laleff, Mr. Kristo male NaN 0 0 349217 7.8958 NaN S
879 1 1 0.952381 1 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56 0 1 11767 83.1583 C50 C
880 1 1 0.952381 2 Shelley, Mrs. William (Imanita Parrish Hall) female 25 0 1 230433 26.0000 NaN S
881 0 0 0.115473 3 Markun, Mr. Johann male 33 0 0 349257 7.8958 NaN S
882 0 1 0.537879 3 Dahlberg, Miss. Gerda Ulrika female 22 0 0 7552 10.5167 NaN S
883 0 0 0.115473 2 Banfield, Mr. Frederick James male 28 0 0 C.A./SOTON 34068 10.5000 NaN S
884 0 0 0.115473 3 Sutehall, Mr. Henry Jr male 25 0 0 SOTON/OQ 392076 7.0500 NaN S
885 0 0 0.083333 3 Rice, Mrs. William (Margaret Norton) female 39 0 5 382652 29.1250 NaN Q
886 0 0 0.115473 2 Montvila, Rev. Juozas male 27 0 0 211536 13.0000 NaN S
887 1 1 0.952381 1 Graham, Miss. Margaret Edith female 19 0 0 112053 30.0000 B42 S
888 0 1 0.537879 3 Johnston, Miss. Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.4500 NaN S
889 1 0 0.358333 1 Behr, Mr. Karl Howell male 26 0 0 111369 30.0000 C148 C
890 0 0 0.115473 3 Dooley, Mr. Patrick male 32 0 0 370376 7.7500 NaN Q

891 rows × 13 columns


In [53]:
# Pclass, Sex, Age
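# Sex is encoded as 0 = female, 1 = male.
# Each example is a 1 x 3 array, since predict_proba expects 2-D input.
x_male_student = np.array([[3, 1, 21]])
x_rich_countess = np.array([[1, 0, 65]])
x_midleclass_mother = np.array([[2, 0, 40]])
x_baby = np.array([[1, 0, 1]])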

In [54]:
clf.predict_proba(x_male_student)


Out[54]:
array([[ 0.88452656,  0.11547344]])

In [55]:
clf.predict_proba(x_rich_countess)


Out[55]:
array([[ 0.04761905,  0.95238095]])

In [56]:
clf.predict_proba(x_midleclass_mother)


Out[56]:
array([[ 0.04761905,  0.95238095]])

In [57]:
clf.predict_proba(x_baby)


Out[57]:
array([[ 0.5,  0.5]])

Regression: Boston House Prices

Data


In [58]:
from sklearn import datasets
boston = datasets.load_boston()
print(boston.DESCR)


Boston House Prices dataset

Notes
------
Data Set Characteristics:  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's

    :Missing Attribute Values: None

    :Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.
http://archive.ics.uci.edu/ml/datasets/Housing


This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
prices and the demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
...', Wiley, 1980.   N.B. Various transformations are used in the table on
pages 244-261 of the latter.

The Boston house-price data has been used in many machine learning papers that address regression
problems.   
     
**References**

   - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
   - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
   - many more! (see http://archive.ics.uci.edu/ml/datasets/Housing)


In [59]:
boston_df = pd.DataFrame(boston.data, 
    columns=['CRIM','ZN','IDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT'])
boston_df.insert(13, 'Price', boston.target)
boston_df


Out[59]:
CRIM ZN IDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT Price
0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33 36.2
5 0.02985 0.0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21 28.7
6 0.08829 12.5 7.87 0 0.524 6.012 66.6 5.5605 5 311 15.2 395.60 12.43 22.9
7 0.14455 12.5 7.87 0 0.524 6.172 96.1 5.9505 5 311 15.2 396.90 19.15 27.1
8 0.21124 12.5 7.87 0 0.524 5.631 100.0 6.0821 5 311 15.2 386.63 29.93 16.5
9 0.17004 12.5 7.87 0 0.524 6.004 85.9 6.5921 5 311 15.2 386.71 17.10 18.9
10 0.22489 12.5 7.87 0 0.524 6.377 94.3 6.3467 5 311 15.2 392.52 20.45 15.0
11 0.11747 12.5 7.87 0 0.524 6.009 82.9 6.2267 5 311 15.2 396.90 13.27 18.9
12 0.09378 12.5 7.87 0 0.524 5.889 39.0 5.4509 5 311 15.2 390.50 15.71 21.7
13 0.62976 0.0 8.14 0 0.538 5.949 61.8 4.7075 4 307 21.0 396.90 8.26 20.4
14 0.63796 0.0 8.14 0 0.538 6.096 84.5 4.4619 4 307 21.0 380.02 10.26 18.2
15 0.62739 0.0 8.14 0 0.538 5.834 56.5 4.4986 4 307 21.0 395.62 8.47 19.9
16 1.05393 0.0 8.14 0 0.538 5.935 29.3 4.4986 4 307 21.0 386.85 6.58 23.1
17 0.78420 0.0 8.14 0 0.538 5.990 81.7 4.2579 4 307 21.0 386.75 14.67 17.5
18 0.80271 0.0 8.14 0 0.538 5.456 36.6 3.7965 4 307 21.0 288.99 11.69 20.2
19 0.72580 0.0 8.14 0 0.538 5.727 69.5 3.7965 4 307 21.0 390.95 11.28 18.2
20 1.25179 0.0 8.14 0 0.538 5.570 98.1 3.7979 4 307 21.0 376.57 21.02 13.6
21 0.85204 0.0 8.14 0 0.538 5.965 89.2 4.0123 4 307 21.0 392.53 13.83 19.6
22 1.23247 0.0 8.14 0 0.538 6.142 91.7 3.9769 4 307 21.0 396.90 18.72 15.2
23 0.98843 0.0 8.14 0 0.538 5.813 100.0 4.0952 4 307 21.0 394.54 19.88 14.5
24 0.75026 0.0 8.14 0 0.538 5.924 94.1 4.3996 4 307 21.0 394.33 16.30 15.6
25 0.84054 0.0 8.14 0 0.538 5.599 85.7 4.4546 4 307 21.0 303.42 16.51 13.9
26 0.67191 0.0 8.14 0 0.538 5.813 90.3 4.6820 4 307 21.0 376.88 14.81 16.6
27 0.95577 0.0 8.14 0 0.538 6.047 88.8 4.4534 4 307 21.0 306.38 17.28 14.8
28 0.77299 0.0 8.14 0 0.538 6.495 94.4 4.4547 4 307 21.0 387.94 12.80 18.4
29 1.00245 0.0 8.14 0 0.538 6.674 87.3 4.2390 4 307 21.0 380.23 11.98 21.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
476 4.87141 0.0 18.10 0 0.614 6.484 93.6 2.3053 24 666 20.2 396.21 18.68 16.7
477 15.02340 0.0 18.10 0 0.614 5.304 97.3 2.1007 24 666 20.2 349.48 24.91 12.0
478 10.23300 0.0 18.10 0 0.614 6.185 96.7 2.1705 24 666 20.2 379.70 18.03 14.6
479 14.33370 0.0 18.10 0 0.614 6.229 88.0 1.9512 24 666 20.2 383.32 13.11 21.4
480 5.82401 0.0 18.10 0 0.532 6.242 64.7 3.4242 24 666 20.2 396.90 10.74 23.0
481 5.70818 0.0 18.10 0 0.532 6.750 74.9 3.3317 24 666 20.2 393.07 7.74 23.7
482 5.73116 0.0 18.10 0 0.532 7.061 77.0 3.4106 24 666 20.2 395.28 7.01 25.0
483 2.81838 0.0 18.10 0 0.532 5.762 40.3 4.0983 24 666 20.2 392.92 10.42 21.8
484 2.37857 0.0 18.10 0 0.583 5.871 41.9 3.7240 24 666 20.2 370.73 13.34 20.6
485 3.67367 0.0 18.10 0 0.583 6.312 51.9 3.9917 24 666 20.2 388.62 10.58 21.2
486 5.69175 0.0 18.10 0 0.583 6.114 79.8 3.5459 24 666 20.2 392.68 14.98 19.1
487 4.83567 0.0 18.10 0 0.583 5.905 53.2 3.1523 24 666 20.2 388.22 11.45 20.6
488 0.15086 0.0 27.74 0 0.609 5.454 92.7 1.8209 4 711 20.1 395.09 18.06 15.2
489 0.18337 0.0 27.74 0 0.609 5.414 98.3 1.7554 4 711 20.1 344.05 23.97 7.0
490 0.20746 0.0 27.74 0 0.609 5.093 98.0 1.8226 4 711 20.1 318.43 29.68 8.1
491 0.10574 0.0 27.74 0 0.609 5.983 98.8 1.8681 4 711 20.1 390.11 18.07 13.6
492 0.11132 0.0 27.74 0 0.609 5.983 83.5 2.1099 4 711 20.1 396.90 13.35 20.1
493 0.17331 0.0 9.69 0 0.585 5.707 54.0 2.3817 6 391 19.2 396.90 12.01 21.8
494 0.27957 0.0 9.69 0 0.585 5.926 42.6 2.3817 6 391 19.2 396.90 13.59 24.5
495 0.17899 0.0 9.69 0 0.585 5.670 28.8 2.7986 6 391 19.2 393.29 17.60 23.1
496 0.28960 0.0 9.69 0 0.585 5.390 72.9 2.7986 6 391 19.2 396.90 21.14 19.7
497 0.26838 0.0 9.69 0 0.585 5.794 70.6 2.8927 6 391 19.2 396.90 14.10 18.3
498 0.23912 0.0 9.69 0 0.585 6.019 65.3 2.4091 6 391 19.2 396.90 12.92 21.2
499 0.17783 0.0 9.69 0 0.585 5.569 73.5 2.3999 6 391 19.2 395.77 15.10 17.5
500 0.22438 0.0 9.69 0 0.585 6.027 79.7 2.4982 6 391 19.2 396.90 14.33 16.8
501 0.06263 0.0 11.93 0 0.573 6.593 69.1 2.4786 1 273 21.0 391.99 9.67 22.4
502 0.04527 0.0 11.93 0 0.573 6.120 76.7 2.2875 1 273 21.0 396.90 9.08 20.6
503 0.06076 0.0 11.93 0 0.573 6.976 91.0 2.1675 1 273 21.0 396.90 5.64 23.9
504 0.10959 0.0 11.93 0 0.573 6.794 89.3 2.3889 1 273 21.0 393.45 6.48 22.0
505 0.04741 0.0 11.93 0 0.573 6.030 80.8 2.5050 1 273 21.0 396.90 7.88 11.9

506 rows × 14 columns


In [60]:
scatter(boston_df['RM'], boston_df['Price']);


Training the Regression Model


In [61]:
boston_X = boston.data
boston_y = boston.target
shape(boston_X)


Out[61]:
(506, 13)

In [62]:
from sklearn.linear_model import Ridge
h = Ridge(alpha=1.0)
h.fit(boston_X, boston_y)


Out[62]:
Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, solver='auto', tol=0.001)
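
Ridge regression minimizes the sum of squared errors plus `alpha` times the squared L2 norm of the weights, leaving the intercept unpenalized. The sketch below compares the closed-form solution on centered data with what `Ridge` returns. The data is synthetic (the weights `true_w` and the offset 4.0 are made up for illustration), not the Boston set.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression data (NOT the Boston set); true_w and the offset are made up
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 4.0 + rng.normal(scale=0.1, size=100)

alpha = 1.0

# Center the data so that the intercept is not penalized
x_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - x_mean, y - y_mean

# Closed form: w = (Xc^T Xc + alpha * I)^(-1) Xc^T yc
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
b = y_mean - x_mean @ w

# Compare against scikit-learn's estimate
model = Ridge(alpha=alpha).fit(X, y)
print(np.allclose(w, model.coef_), np.isclose(b, model.intercept_))
```

Larger values of `alpha` shrink the weights toward zero; `alpha=0` reduces the formula to ordinary least squares.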

In [63]:
boston_y_predicted = h.predict(boston_X)

In [64]:
boston_df.insert(14, 'Price predicted', boston_y_predicted)
boston_df


Out[64]:
CRIM ZN IDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT Price Price predicted
0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98 24.0 30.258005
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14 21.6 24.809545
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03 34.7 30.535338
3 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94 33.4 28.913066
4 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33 36.2 28.183422
5 0.02985 0.0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21 28.7 25.440387
6 0.08829 12.5 7.87 0 0.524 6.012 66.6 5.5605 5 311 15.2 395.60 12.43 22.9 22.960449
7 0.14455 12.5 7.87 0 0.524 6.172 96.1 5.9505 5 311 15.2 396.90 19.15 27.1 19.300562
8 0.21124 12.5 7.87 0 0.524 5.631 100.0 6.0821 5 311 15.2 386.63 29.93 16.5 11.152173
9 0.17004 12.5 7.87 0 0.524 6.004 85.9 6.5921 5 311 15.2 386.71 17.10 18.9 18.820816
10 0.22489 12.5 7.87 0 0.524 6.377 94.3 6.3467 5 311 15.2 392.52 20.45 15.0 18.810170
11 0.11747 12.5 7.87 0 0.524 6.009 82.9 6.2267 5 311 15.2 396.90 13.27 18.9 21.508035
12 0.09378 12.5 7.87 0 0.524 5.889 39.0 5.4509 5 311 15.2 390.50 15.71 21.7 20.983508
13 0.62976 0.0 8.14 0 0.538 5.949 61.8 4.7075 4 307 21.0 396.90 8.26 20.4 20.029581
14 0.63796 0.0 8.14 0 0.538 6.096 84.5 4.4619 4 307 21.0 380.02 10.26 18.2 19.577042
15 0.62739 0.0 8.14 0 0.538 5.834 56.5 4.4986 4 307 21.0 395.62 8.47 19.9 19.777702
16 1.05393 0.0 8.14 0 0.538 5.935 29.3 4.4986 4 307 21.0 386.85 6.58 23.1 21.192141
17 0.78420 0.0 8.14 0 0.538 5.990 81.7 4.2579 4 307 21.0 386.75 14.67 17.5 17.159285
18 0.80271 0.0 8.14 0 0.538 5.456 36.6 3.7965 4 307 21.0 288.99 11.69 20.2 16.615288
19 0.72580 0.0 8.14 0 0.538 5.727 69.5 3.7965 4 307 21.0 390.95 11.28 18.2 18.703243
20 1.25179 0.0 8.14 0 0.538 5.570 98.1 3.7979 4 307 21.0 376.57 21.02 13.6 12.546846
21 0.85204 0.0 8.14 0 0.538 5.965 89.2 4.0123 4 307 21.0 392.53 13.83 19.6 17.857864
22 1.23247 0.0 8.14 0 0.538 6.142 91.7 3.9769 4 307 21.0 396.90 18.72 15.2 15.965941
23 0.98843 0.0 8.14 0 0.538 5.813 100.0 4.0952 4 307 21.0 394.54 19.88 14.5 13.875354
24 0.75026 0.0 8.14 0 0.538 5.924 94.1 4.3996 4 307 21.0 394.33 16.30 15.6 15.851091
25 0.84054 0.0 8.14 0 0.538 5.599 85.7 4.4546 4 307 21.0 303.42 16.51 13.9 13.561478
26 0.67191 0.0 8.14 0 0.538 5.813 90.3 4.6820 4 307 21.0 376.88 14.81 16.6 15.690294
27 0.95577 0.0 8.14 0 0.538 6.047 88.8 4.4534 4 307 21.0 306.38 17.28 14.8 14.876646
28 0.77299 0.0 8.14 0 0.538 6.495 94.4 4.4547 4 307 21.0 387.94 12.80 18.4 19.776349
29 1.00245 0.0 8.14 0 0.538 6.674 87.3 4.2390 4 307 21.0 380.23 11.98 21.0 21.138503
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
476 4.87141 0.0 18.10 0 0.614 6.484 93.6 2.3053 24 666 20.2 396.21 18.68 16.7 20.159345
477 15.02340 0.0 18.10 0 0.614 5.304 97.3 2.1007 24 666 20.2 349.48 24.91 12.0 11.043175
478 10.23300 0.0 18.10 0 0.614 6.185 96.7 2.1705 24 666 20.2 379.70 18.03 14.6 18.807340
479 14.33370 0.0 18.10 0 0.614 6.229 88.0 1.9512 24 666 20.2 383.32 13.11 21.4 21.562434
480 5.82401 0.0 18.10 0 0.532 6.242 64.7 3.4242 24 666 20.2 396.90 10.74 23.0 22.880801
481 5.70818 0.0 18.10 0 0.532 6.750 74.9 3.3317 24 666 20.2 393.07 7.74 23.7 26.485297
482 5.73116 0.0 18.10 0 0.532 7.061 77.0 3.4106 24 666 20.2 395.28 7.01 25.0 27.971894
483 2.81838 0.0 18.10 0 0.532 5.762 40.3 4.0983 24 666 20.2 392.92 10.42 21.8 20.682415
484 2.37857 0.0 18.10 0 0.583 5.871 41.9 3.7240 24 666 20.2 370.73 13.34 20.6 19.326336
485 3.67367 0.0 18.10 0 0.583 6.312 51.9 3.9917 24 666 20.2 388.62 10.58 21.2 22.117072
486 5.69175 0.0 18.10 0 0.583 6.114 79.8 3.5459 24 666 20.2 392.68 14.98 19.1 19.297614
487 4.83567 0.0 18.10 0 0.583 5.905 53.2 3.1523 24 666 20.2 388.22 11.45 20.6 21.106630
488 0.15086 0.0 27.74 0 0.609 5.454 92.7 1.8209 4 711 20.1 395.09 18.06 15.2 11.359091
489 0.18337 0.0 27.74 0 0.609 5.414 98.3 1.7554 4 711 20.1 344.05 23.97 7.0 7.607410
490 0.20746 0.0 27.74 0 0.609 5.093 98.0 1.8226 4 711 20.1 318.43 29.68 8.1 2.979239
491 0.10574 0.0 27.74 0 0.609 5.983 98.8 1.8681 4 711 20.1 390.11 18.07 13.6 13.248580
492 0.11132 0.0 27.74 0 0.609 5.983 83.5 2.1099 4 711 20.1 396.90 13.35 20.1 15.585289
493 0.17331 0.0 9.69 0 0.585 5.707 54.0 2.3817 6 391 19.2 396.90 12.01 21.8 20.929280
494 0.27957 0.0 9.69 0 0.585 5.926 42.6 2.3817 6 391 19.2 396.90 13.59 24.5 20.978616
495 0.17899 0.0 9.69 0 0.585 5.670 28.8 2.7986 6 391 19.2 393.29 17.60 23.1 17.328641
496 0.28960 0.0 9.69 0 0.585 5.390 72.9 2.7986 6 391 19.2 396.90 21.14 19.7 14.147261
497 0.26838 0.0 9.69 0 0.585 5.794 70.6 2.8927 6 391 19.2 396.90 14.10 18.3 19.347614
498 0.23912 0.0 9.69 0 0.585 6.019 65.3 2.4091 6 391 19.2 396.90 12.92 21.2 21.539158
499 0.17783 0.0 9.69 0 0.585 5.569 73.5 2.3999 6 391 19.2 395.77 15.10 17.5 18.606657
500 0.22438 0.0 9.69 0 0.585 6.027 79.7 2.4982 6 391 19.2 396.90 14.33 16.8 20.618846
501 0.06263 0.0 11.93 0 0.573 6.593 69.1 2.4786 1 273 21.0 391.99 9.67 22.4 23.946206
502 0.04527 0.0 11.93 0 0.573 6.120 76.7 2.2875 1 273 21.0 396.90 9.08 20.6 22.711802
503 0.06076 0.0 11.93 0 0.573 6.976 91.0 2.1675 1 273 21.0 396.90 5.64 23.9 27.930317
504 0.10959 0.0 11.93 0 0.573 6.794 89.3 2.3889 1 273 21.0 393.45 6.48 22.0 26.447660
505 0.04741 0.0 11.93 0 0.573 6.030 80.8 2.5050 1 273 21.0 396.90 7.88 11.9 22.685492

506 rows × 15 columns

Clustering: Handwritten digits

Data


In [65]:
digits = sklearn.datasets.load_digits()

In [66]:
print(digits.DESCR)


 Optical Recognition of Handwritten Digits Data Set

Notes
-----
Data Set Characteristics:
    :Number of Instances: 5620
    :Number of Attributes: 64
    :Attribute Information: 8x8 image of integer pixels in the range 0..16.
    :Missing Attribute Values: None
    :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)
    :Date: July; 1998

This is a copy of the test set of the UCI ML hand-written digits datasets
http://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits

The data set contains images of hand-written digits: 10 classes where
each class refers to a digit.

Preprocessing programs made available by NIST were used to extract
normalized bitmaps of handwritten digits from a preprinted form. From a
total of 43 people, 30 contributed to the training set and different 13
to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of
4x4 and the number of on pixels are counted in each block. This generates
an input matrix of 8x8 where each element is an integer in the range
0..16. This reduces dimensionality and gives invariance to small
distortions.

For info on NIST preprocessing routines, see M. D. Garris, J. L. Blue, G.
T. Candela, D. L. Dimmick, J. Geist, P. J. Grother, S. A. Janet, and C.
L. Wilson, NIST Form-Based Handprint Recognition System, NISTIR 5469,
1994.

References
----------
  - C. Kaynak (1995) Methods of Combining Multiple Classifiers and Their
    Applications to Handwritten Digit Recognition, MSc Thesis, Institute of
    Graduate Studies in Science and Engineering, Bogazici University.
  - E. Alpaydin, C. Kaynak (1998) Cascading Classifiers, Kybernetika.
  - Ken Tang and Ponnuthurai N. Suganthan and Xi Yao and A. Kai Qin.
    Linear dimensionality reduction using relevance weighted LDA. School of
    Electrical and Electronic Engineering Nanyang Technological University.
    2005.
  - Claudio Gentile. A New Approximate Maximal Margin Classification
    Algorithm. NIPS. 2000.
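
The preprocessing described in the dataset notes above (a 32x32 bitmap split into non-overlapping 4x4 blocks, with the "on" pixels counted per block) can be sketched in a few lines of NumPy. This is only an illustration of the idea, not the NIST routine; the function name `downsample_bitmap` and the rectangular test bitmap are made up for the example.

```python
import numpy as np

def downsample_bitmap(bitmap, block=4):
    """Count 'on' pixels in each non-overlapping block x block tile (sketch)."""
    bitmap = np.asarray(bitmap)
    rows, cols = bitmap.shape
    # View the image as a grid of tiles: (tile_row, row_in_tile, tile_col, col_in_tile)
    tiles = bitmap.reshape(rows // block, block, cols // block, block)
    # Sum over the two within-tile axes to get one count per tile
    return tiles.sum(axis=(1, 3))

# Hypothetical 32x32 binary bitmap containing a filled rectangle
bitmap = np.zeros((32, 32), dtype=int)
bitmap[4:28, 10:22] = 1

features = downsample_bitmap(bitmap)
print(features.shape)
print(features)
```

Each entry of the resulting 8x8 matrix lies in the range 0..16, which matches the attribute range stated in the description.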


In [67]:
digits_X = digits.data
digits_y = digits.target

In [68]:
digits_X


Out[68]:
array([[  0.,   0.,   5., ...,   0.,   0.,   0.],
       [  0.,   0.,   0., ...,  10.,   0.,   0.],
       [  0.,   0.,   0., ...,  16.,   9.,   0.],
       ..., 
       [  0.,   0.,   1., ...,   6.,   0.,   0.],
       [  0.,   0.,   2., ...,  12.,   0.,   0.],
       [  0.,   0.,  10., ...,  12.,   1.,   0.]])

In [69]:
shape(digits_X)


Out[69]:
(1797, 64)

In [70]:
x = digits_X[0]; x


Out[70]:
array([  0.,   0.,   5.,  13.,   9.,   1.,   0.,   0.,   0.,   0.,  13.,
        15.,  10.,  15.,   5.,   0.,   0.,   3.,  15.,   2.,   0.,  11.,
         8.,   0.,   0.,   4.,  12.,   0.,   0.,   8.,   8.,   0.,   0.,
         5.,   8.,   0.,   0.,   9.,   8.,   0.,   0.,   4.,  11.,   0.,
         1.,  12.,   7.,   0.,   0.,   2.,  14.,   5.,  10.,  12.,   0.,
         0.,   0.,   0.,   6.,  13.,  10.,   0.,   0.,   0.])

In [71]:
gray()
def show_digit(x):
    # Reshape the 64-element vector into an 8x8 image and display it
    matshow(x.reshape(8, 8))



In [72]:
show_digit(x)


Clustering with the k-means algorithm


In [73]:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=10)
kmeans.fit(digits_X)


Out[73]:
KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=10, n_init=10,
    n_jobs=1, precompute_distances=True, random_state=None, tol=0.0001,
    verbose=0)
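
For intuition about what the fit above does, here is a minimal sketch of the basic k-means (Lloyd's) iteration in NumPy. It is a simplification: it uses plain random initialization rather than the `k-means++` scheme shown in the output above, runs a single restart instead of `n_init=10`, and the names `assign_labels` and `kmeans_lloyd` as well as the two-blob data are invented for the example.

```python
import numpy as np

def assign_labels(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each point."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Basic Lloyd iteration: alternate assignment and mean-update steps (sketch)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.RandomState(seed)
    # Plain random initialization: pick k distinct data points as initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center
        labels = assign_labels(X, centers)
        # Update step: move each center to the mean of its members
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        # Stop once the centers no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign_labels(X, centers)

# Synthetic stand-in data: two blobs in 2D (not the digits dataset)
rng = np.random.RandomState(1)
blob_a = rng.randn(50, 2)
blob_b = rng.randn(50, 2) + np.array([10.0, 10.0])
X_demo = np.vstack([blob_a, blob_b])

centers, labels = kmeans_lloyd(X_demo, k=2)
print(centers)
```

Because the result depends on the initial centers, library implementations typically rerun the procedure several times and keep the best solution.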

In [74]:
# Predict cluster indices for all examples in one vectorized call
digits_y_predicted = kmeans.predict(digits_X).tolist()
print(digits_y_predicted)


[5, 1, 1, 3, 8, 2, 7, 4, 2, 2, 5, 0, 9, 3, 8, 6, 7, 4, 1, 2, 5, 0, 9, 3, 8, 6, 7, 4, 1, 2, 5, 2, 6, 6, 7, 6, 5, 2, 1, 2, 1, 8, 0, 4, 4, 3, 6, 0, 5, 5, 1, 1, 4, 1, 4, 5, 0, 1, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 8, 1, 0, 6, 5, 2, 6, 0, 1, 0, 5, 5, 0, 4, 7, 3, 9, 0, 4, 8, 7, 3, 0, 3, 3, 1, 4, 1, 1, 8, 3, 1, 8, 5, 6, 4, 7, 2, 7, 0, 4, 6, 8, 8, 4, 9, 1, 1, 1, 6, 4, 2, 6, 8, 1, 1, 8, 2, 5, 1, 2, 1, 5, 0, 9, 3, 8, 6, 7, 4, 1, 2, 5, 0, 9, 3, 8, 6, 7, 4, 1, 2, 5, 0, 9, 3, 8, 6, 7, 4, 1, 2, 5, 3, 6, 6, 7, 6, 5, 2, 1, 2, 1, 8, 0, 4, 4, 3, 6, 0, 5, 5, 9, 9, 4, 1, 9, 5, 0, 9, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 1, 2, 0, 6, 5, 2, 6, 9, 1, 9, 5, 5, 0, 4, 7, 3, 9, 0, 4, 3, 0, 3, 2, 0, 4, 7, 1, 8, 3, 0, 8, 5, 6, 4, 7, 2, 7, 0, 4, 6, 1, 8, 4, 9, 1, 9, 9, 6, 6, 0, 1, 1, 8, 2, 5, 1, 2, 1, 5, 1, 9, 3, 8, 2, 7, 4, 2, 6, 5, 1, 9, 3, 8, 6, 7, 4, 1, 4, 5, 1, 9, 3, 8, 6, 7, 4, 1, 2, 5, 2, 6, 2, 7, 2, 5, 2, 1, 2, 1, 8, 1, 4, 4, 3, 6, 1, 5, 5, 9, 9, 4, 2, 9, 5, 0, 9, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 8, 4, 1, 6, 5, 4, 6, 9, 2, 9, 5, 5, 1, 4, 7, 3, 9, 1, 4, 8, 7, 3, 1, 3, 4, 1, 4, 7, 1, 8, 3, 1, 8, 5, 2, 3, 7, 4, 7, 1, 4, 6, 8, 8, 4, 9, 2, 9, 9, 2, 4, 0, 6, 8, 2, 2, 8, 0, 5, 1, 4, 3, 5, 0, 9, 3, 8, 6, 7, 4, 2, 2, 5, 0, 9, 3, 8, 6, 7, 4, 1, 2, 5, 0, 9, 3, 8, 6, 7, 4, 2, 2, 5, 2, 3, 6, 7, 3, 5, 2, 2, 2, 2, 8, 0, 4, 4, 3, 6, 0, 5, 5, 9, 9, 4, 2, 9, 5, 9, 9, 7, 3, 2, 4, 2, 2, 8, 7, 7, 7, 8, 2, 0, 2, 5, 2, 6, 9, 1, 9, 5, 5, 0, 4, 7, 3, 9, 0, 4, 8, 7, 3, 0, 3, 2, 0, 4, 7, 2, 8, 3, 0, 8, 5, 6, 3, 7, 2, 1, 0, 4, 6, 0, 8, 4, 9, 2, 9, 1, 6, 4, 2, 6, 8, 1, 1, 8, 2, 5, 2, 2, 2, 5, 9, 9, 2, 8, 6, 7, 4, 1, 2, 5, 9, 9, 2, 4, 2, 7, 4, 1, 2, 5, 9, 9, 2, 8, 2, 7, 4, 1, 2, 5, 1, 6, 2, 7, 2, 5, 2, 1, 2, 1, 8, 9, 4, 4, 3, 2, 7, 5, 5, 9, 9, 4, 7, 9, 5, 9, 9, 7, 2, 3, 4, 2, 2, 8, 7, 7, 7, 8, 2, 7, 6, 5, 2, 2, 9, 1, 9, 5, 5, 9, 4, 7, 2, 9, 9, 4, 8, 7, 2, 9, 2, 2, 9, 4, 7, 1, 8, 3, 1, 8, 5, 2, 3, 7, 2, 7, 9, 4, 6, 8, 8, 4, 9, 1, 9, 9, 6, 4, 2, 2, 4, 0, 1, 4, 2, 5, 1, 2, 1, 5, 1, 9, 3, 8, 6, 7, 4, 2, 0, 5, 1, 9, 3, 8, 6, 7, 4, 1, 0, 
5, 1, 9, 3, 0, 6, 7, 4, 2, 0, 5, 0, 8, 6, 7, 6, 5, 0, 1, 0, 2, 8, 1, 4, 4, 3, 6, 1, 5, 5, 3, 9, 4, 1, 1, 5, 1, 2, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 8, 0, 1, 6, 5, 0, 6, 3, 2, 9, 5, 5, 1, 4, 7, 3, 9, 1, 4, 8, 7, 3, 1, 3, 0, 1, 4, 7, 1, 8, 3, 1, 4, 5, 6, 3, 7, 0, 7, 1, 4, 6, 8, 4, 4, 9, 1, 9, 9, 6, 4, 0, 6, 8, 2, 1, 4, 0, 5, 2, 0, 1, 5, 1, 9, 3, 8, 6, 7, 4, 7, 2, 5, 1, 9, 3, 8, 6, 5, 4, 0, 2, 5, 1, 9, 3, 8, 6, 7, 4, 1, 2, 5, 2, 6, 6, 7, 6, 5, 2, 2, 2, 2, 8, 1, 4, 4, 3, 6, 1, 5, 5, 9, 9, 4, 0, 9, 5, 9, 9, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 8, 2, 1, 6, 5, 2, 6, 9, 1, 9, 5, 5, 1, 4, 7, 3, 4, 1, 4, 8, 7, 3, 1, 3, 2, 1, 4, 7, 9, 8, 3, 1, 8, 5, 6, 3, 7, 2, 7, 1, 4, 6, 8, 8, 4, 9, 0, 4, 9, 6, 4, 2, 6, 8, 0, 2, 8, 2, 5, 1, 2, 1, 9, 9, 3, 8, 6, 7, 4, 2, 2, 5, 9, 9, 3, 8, 2, 7, 4, 2, 2, 5, 9, 9, 3, 8, 2, 7, 4, 2, 2, 5, 2, 2, 2, 7, 2, 5, 2, 2, 2, 2, 8, 3, 4, 4, 3, 2, 9, 9, 4, 2, 9, 5, 9, 9, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 8, 2, 9, 2, 5, 2, 2, 9, 2, 9, 5, 5, 9, 4, 7, 3, 9, 9, 8, 7, 3, 9, 3, 2, 9, 4, 7, 2, 8, 3, 9, 8, 5, 6, 3, 7, 2, 7, 9, 4, 2, 8, 8, 4, 9, 2, 9, 9, 2, 4, 2, 2, 8, 8, 2, 5, 2, 2, 2, 5, 1, 9, 3, 8, 6, 7, 4, 4, 3, 5, 1, 9, 3, 8, 6, 7, 4, 1, 2, 5, 1, 9, 3, 8, 6, 7, 4, 2, 2, 5, 2, 6, 6, 7, 6, 5, 2, 2, 2, 1, 8, 1, 4, 4, 3, 6, 1, 5, 5, 4, 1, 9, 5, 1, 9, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 4, 2, 1, 6, 5, 6, 6, 9, 2, 3, 5, 5, 1, 4, 7, 3, 9, 1, 4, 8, 7, 3, 1, 4, 2, 1, 4, 7, 1, 8, 3, 1, 8, 5, 6, 3, 7, 2, 7, 1, 4, 6, 8, 8, 4, 9, 1, 9, 9, 6, 4, 2, 6, 8, 9, 3, 8, 2, 5, 1, 2, 1, 5, 0, 9, 3, 8, 6, 7, 4, 2, 2, 5, 0, 9, 3, 8, 6, 7, 4, 1, 2, 5, 1, 9, 3, 8, 6, 7, 4, 1, 2, 5, 2, 6, 6, 7, 6, 5, 2, 6, 2, 6, 8, 1, 4, 4, 4, 6, 1, 5, 5, 9, 9, 4, 4, 9, 5, 1, 9, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 8, 2, 1, 6, 5, 2, 6, 9, 6, 9, 5, 5, 1, 4, 7, 3, 9, 1, 4, 8, 7, 3, 1, 3, 2, 1, 4, 7, 1, 8, 3, 1, 8, 5, 6, 3, 7, 2, 7, 6, 4, 6, 8, 8, 4, 9, 9, 9, 9, 6, 4, 2, 6, 8, 1, 1, 8, 2, 5, 1, 2, 1, 5, 0, 9, 3, 8, 2, 7, 4, 1, 2, 5, 1, 9, 3, 8, 6, 7, 4, 1, 2, 5, 0, 3, 3, 8, 6, 7, 4, 1, 2, 5, 2, 6, 6, 7, 6, 5, 2, 1, 2, 1, 8, 1, 4, 4, 
3, 6, 1, 5, 5, 3, 3, 4, 1, 3, 5, 0, 3, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 8, 2, 1, 6, 5, 2, 7, 9, 1, 3, 5, 5, 0, 4, 7, 3, 9, 1, 4, 8, 7, 3, 1, 3, 2, 1, 4, 7, 1, 8, 3, 1, 8, 5, 6, 3, 7, 2, 7, 1, 4, 6, 8, 8, 4, 2, 1, 9, 9, 2, 4, 2, 6, 8, 1, 1, 8, 2, 5, 1, 5, 0, 9, 3, 8, 6, 7, 4, 2, 2, 5, 0, 9, 3, 8, 2, 7, 4, 2, 2, 5, 0, 9, 3, 8, 2, 7, 4, 2, 2, 5, 2, 2, 2, 7, 2, 5, 2, 2, 2, 2, 8, 0, 4, 4, 3, 2, 0, 5, 5, 9, 9, 4, 0, 9, 5, 0, 9, 7, 3, 3, 4, 3, 3, 8, 7, 7, 7, 8, 2, 0, 2, 5, 2, 6, 9, 2, 9, 5, 5, 0, 4, 7, 3, 9, 0, 4, 8, 7, 3, 0, 3, 2, 0, 4, 7, 2, 8, 3, 0, 8, 5, 2, 3, 7, 2, 7, 0, 4, 2, 8, 8, 4, 9, 2, 9, 9, 6, 4, 2, 6, 8, 2, 4, 8, 3, 5, 4, 2, 2, 5, 1, 9, 3, 8, 6, 1, 1, 1, 0, 5, 1, 3, 3, 8, 6, 7, 0, 5, 1, 9, 3, 8, 6, 7, 4, 6, 0, 8, 0, 6, 6, 7, 6, 5, 6, 1, 6, 1, 8, 1, 4, 4, 1, 6, 1, 5, 5, 5, 9, 1, 1, 3, 5, 1, 9, 7, 1, 4, 4, 4, 1, 8, 7, 7, 7, 1, 0, 1, 6, 5, 2, 6, 3, 1, 5, 1, 4, 7, 3, 9, 1, 4, 1, 7, 3, 1, 3, 4, 1, 4, 7, 1, 8, 3, 1, 8, 5, 6, 3, 7, 3, 7, 1, 4, 6, 8, 8, 4, 9, 3, 6, 4, 3, 6, 4, 8, 6, 5, 1, 6, 4, 5, 1, 9, 3, 8, 6, 7, 4, 3, 2, 5, 1, 9, 1, 8, 6, 7, 4, 1, 2, 5, 1, 9, 6, 8, 8, 7, 4, 1, 2, 5, 2, 6, 6, 7, 6, 5, 2, 2, 2, 1, 8, 1, 4, 4, 4, 6, 1, 5, 5, 9, 9, 4, 1, 9, 5, 1, 9, 7, 1, 1, 4, 2, 1, 8, 7, 7, 7, 8, 2, 1, 6, 5, 2, 2, 9, 1, 9, 5, 5, 1, 4, 7, 3, 9, 1, 4, 8, 7, 3, 1, 3, 2, 1, 4, 7, 1, 8, 6, 1, 8, 5, 2, 3, 7, 2, 7, 1, 4, 6, 8, 8, 4, 9, 1, 9, 9, 2, 4, 3, 2, 8, 1, 1, 8, 2, 5, 1, 2, 2]

In [75]:
from sklearn.decomposition import PCA
digits_X_reduced = PCA(n_components=2).fit_transform(digits_X)

In [76]:
shape(digits_X_reduced)


Out[76]:
(1797, 2)
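
The reduction from 64 to 2 dimensions can be sketched with a singular value decomposition of the centered data. This is an illustrative reimplementation under the usual textbook definition of PCA, not scikit-learn's code. The signs of the components may differ from what `PCA` returns, and `pca_project` and the random data are hypothetical names and inputs.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project X onto its top principal components via SVD (sketch)."""
    X = np.asarray(X, dtype=float)
    # Center each feature at zero mean
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of every example in the reduced space
    return Xc.dot(components.T), components

# Synthetic stand-in data with very different variances per feature
rng = np.random.RandomState(0)
X_demo = rng.randn(100, 5) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

X_2d, components = pca_project(X_demo, n_components=2)
print(X_2d.shape)
```

The two retained directions are the ones along which the data vary the most, which is why a 2D scatter plot of the projection is a reasonable first look at cluster structure.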

In [77]:
scatter(digits_X_reduced[:,0], digits_X_reduced[:,1], c=digits_y_predicted, cmap='prism');



In [78]:
kmeans.cluster_centers_


Out[78]:
array([[  0.00000000e+00,   2.77555756e-16,   3.48837209e-02,
          1.86046512e+00,   1.10581395e+01,   1.29418605e+01,
          4.34883721e+00,   2.55813953e-01,   2.60208521e-18,
          5.81395349e-02,   2.06976744e+00,   9.09302326e+00,
          1.38139535e+01,   1.29534884e+01,   5.29069767e+00,
          2.55813953e-01,   1.30104261e-18,   1.69767442e+00,
          9.25581395e+00,   1.26162791e+01,   1.24069767e+01,
          1.33720930e+01,   3.83720930e+00,   9.30232558e-02,
         -1.08420217e-18,   3.80232558e+00,   1.23488372e+01,
          1.18604651e+01,   1.33720930e+01,   1.35232558e+01,
          2.25581395e+00,  -2.16840434e-18,   0.00000000e+00,
          1.74418605e+00,   6.43023256e+00,   6.94186047e+00,
          1.17674419e+01,   1.24302326e+01,   1.58139535e+00,
          0.00000000e+00,  -8.67361738e-18,   6.74418605e-01,
          1.61627907e+00,   3.40697674e+00,   1.18023256e+01,
          1.18720930e+01,   9.88372093e-01,  -6.93889390e-18,
         -2.60208521e-18,   3.48837209e-02,   3.13953488e-01,
          3.05813953e+00,   1.26627907e+01,   1.16627907e+01,
          1.62790698e+00,   1.38777878e-16,  -5.42101086e-19,
          5.55111512e-16,   5.32907052e-15,   1.94186047e+00,
          1.13720930e+01,   1.08604651e+01,   1.70930233e+00,
         -3.33066907e-16],
       [  0.00000000e+00,   1.11111111e-01,   3.97333333e+00,
          1.18311111e+01,   1.23244444e+01,   5.34222222e+00,
          4.31111111e-01,  -3.33066907e-16,   8.88888889e-03,
          8.53333333e-01,   8.20888889e+00,   1.35155556e+01,
          1.25733333e+01,   9.84444444e+00,   1.56000000e+00,
          4.44089210e-16,  -1.51788304e-17,   1.20888889e+00,
          8.33333333e+00,   1.19022222e+01,   1.23288889e+01,
          9.43111111e+00,   1.02222222e+00,  -9.71445147e-17,
         -3.25260652e-18,   9.37777778e-01,   7.24888889e+00,
          1.40933333e+01,   1.41866667e+01,   4.99555556e+00,
          2.04444444e-01,  -6.50521303e-18,   0.00000000e+00,
          7.64444444e-01,   7.99111111e+00,   1.47866667e+01,
          1.28888889e+01,   2.21333333e+00,   6.22222222e-02,
          0.00000000e+00,  -2.60208521e-17,   1.22222222e+00,
          1.04488889e+01,   1.20355556e+01,   1.20933333e+01,
          4.00888889e+00,   2.66666667e-01,   3.46944695e-17,
          1.33333333e-02,   8.71111111e-01,   9.53777778e+00,
          1.15600000e+01,   1.20711111e+01,   5.60888889e+00,
          6.75555556e-01,   4.44444444e-03,   4.44444444e-03,
          1.11111111e-01,   4.18222222e+00,   1.19555556e+01,
          1.25866667e+01,   4.90666667e+00,   8.48888889e-01,
          8.88888889e-03],
       [  0.00000000e+00,   1.95121951e-01,   6.46341463e+00,
          1.24959350e+01,   1.18373984e+01,   5.63414634e+00,
          6.26016260e-01,   8.13008130e-03,   4.06504065e-03,
          2.59756098e+00,   1.39593496e+01,   9.17886179e+00,
          9.39837398e+00,   1.03739837e+01,   1.28048780e+00,
          4.06504065e-03,  -1.60461922e-17,   4.29268293e+00,
          1.28048780e+01,   4.36991870e+00,   6.82113821e+00,
          1.11544715e+01,   1.90650407e+00,  -1.45716772e-16,
         -8.67361738e-19,   2.32113821e+00,   1.04390244e+01,
          1.18699187e+01,   1.32073171e+01,   1.20406504e+01,
          2.47154472e+00,  -1.73472348e-18,   0.00000000e+00,
          3.04878049e-01,   3.21138211e+00,   6.23577236e+00,
          6.81300813e+00,   1.12154472e+01,   4.27642276e+00,
          0.00000000e+00,  -6.93889390e-18,   2.19512195e-01,
          2.35365854e+00,   2.00000000e+00,   1.67886179e+00,
          1.09471545e+01,   6.43902439e+00,   1.62601626e-02,
         -2.94902991e-17,   7.47967480e-01,   8.10569106e+00,
          5.62601626e+00,   4.65447154e+00,   1.22317073e+01,
          6.03658537e+00,   1.13821138e-01,  -4.33680869e-19,
          1.70731707e-01,   6.39024390e+00,   1.34796748e+01,
          1.45203252e+01,   1.00000000e+01,   2.33739837e+00,
          1.13821138e-01],
       [  0.00000000e+00,   5.92178771e-01,   8.73184358e+00,
          1.45921788e+01,   1.40279330e+01,   7.04469274e+00,
          6.25698324e-01,  -2.77555756e-16,   1.11731844e-02,
          4.18435754e+00,   1.26592179e+01,   9.16201117e+00,
          1.12960894e+01,   1.20223464e+01,   1.89385475e+00,
          1.11731844e-02,   5.58659218e-03,   1.90502793e+00,
          3.74860335e+00,   3.68156425e+00,   1.18435754e+01,
          9.92178771e+00,   8.60335196e-01,   5.55111512e-17,
         -2.81892565e-18,   6.14525140e-02,   9.83240223e-01,
          8.26256983e+00,   1.38156425e+01,   6.86592179e+00,
          3.29608939e-01,  -5.63785130e-18,   0.00000000e+00,
          6.14525140e-02,   6.75977654e-01,   4.52513966e+00,
          1.16648045e+01,   1.23351955e+01,   2.32402235e+00,
          0.00000000e+00,  -2.25514052e-17,   4.63687151e-01,
          1.49720670e+00,   6.81564246e-01,   4.17877095e+00,
          1.24022346e+01,   6.30167598e+00,   5.58659218e-03,
         -2.42861287e-17,   9.44134078e-01,   7.34078212e+00,
          6.60335196e+00,   8.66480447e+00,   1.37039106e+01,
          6.02234637e+00,   1.73184358e-01,  -1.40946282e-18,
          4.69273743e-01,   9.50837989e+00,   1.49608939e+01,
          1.41061453e+01,   8.81564246e+00,   1.82122905e+00,
          4.13407821e-01],
       [  0.00000000e+00,   1.59420290e-01,   4.87922705e+00,
          1.28792271e+01,   1.40241546e+01,   1.09275362e+01,
          4.96135266e+00,   9.37198068e-01,  -2.86229374e-17,
          1.11594203e+00,   1.06473430e+01,   1.15217391e+01,
          1.03864734e+01,   1.25507246e+01,   5.54106280e+00,
          5.36231884e-01,  -1.43114687e-17,   1.18840580e+00,
          5.50241546e+00,   2.30434783e+00,   6.78260870e+00,
          1.15507246e+01,   3.42995169e+00,   1.11111111e-01,
         -3.25260652e-18,   1.00483092e+00,   5.09178744e+00,
          6.47342995e+00,   1.21642512e+01,   1.20917874e+01,
          4.79710145e+00,   4.83091787e-03,   0.00000000e+00,
          1.48792271e+00,   8.65700483e+00,   1.30483092e+01,
          1.46859903e+01,   1.07101449e+01,   3.94685990e+00,
          0.00000000e+00,  -2.60208521e-17,   1.11111111e+00,
          5.15458937e+00,   1.14830918e+01,   1.10386473e+01,
          3.73429952e+00,   5.36231884e-01,   1.73472348e-17,
         -2.68882139e-17,   1.01449275e-01,   2.98067633e+00,
          1.22753623e+01,   6.39130435e+00,   4.54106280e-01,
          9.66183575e-03,  -5.82867088e-16,  -1.62630326e-18,
          1.25603865e-01,   6.08212560e+00,   1.19855072e+01,
          2.70531401e+00,   2.85024155e-01,   3.38164251e-02,
         -7.21644966e-16],
       [  0.00000000e+00,   2.23463687e-02,   4.22905028e+00,
          1.31396648e+01,   1.12681564e+01,   2.93854749e+00,
          3.35195531e-02,  -2.77555756e-16,  -2.51534904e-17,
          8.82681564e-01,   1.26201117e+01,   1.33687151e+01,
          1.14078212e+01,   1.13687151e+01,   9.60893855e-01,
          3.60822483e-16,  -1.25767452e-17,   3.72625698e+00,
          1.42122905e+01,   5.25139665e+00,   2.10614525e+00,
          1.21173184e+01,   3.53072626e+00,   5.55111512e-17,
         -2.81892565e-18,   5.29608939e+00,   1.26424581e+01,
          2.03351955e+00,   2.29050279e-01,   9.07821229e+00,
          6.47486034e+00,  -5.63785130e-18,   0.00000000e+00,
          5.88268156e+00,   1.14916201e+01,   8.65921788e-01,
          3.35195531e-02,   8.81005587e+00,   7.15083799e+00,
          0.00000000e+00,  -2.25514052e-17,   3.51396648e+00,
          1.32849162e+01,   1.65921788e+00,   1.49162011e+00,
          1.13519553e+01,   5.84357542e+00,  -2.08166817e-17,
         -2.42861287e-17,   8.04469274e-01,   1.31117318e+01,
          9.96089385e+00,   1.03519553e+01,   1.32960894e+01,
          2.47486034e+00,   2.23463687e-02,  -1.40946282e-18,
          5.58659218e-03,   4.19553073e+00,   1.35865922e+01,
          1.33407821e+01,   5.48044693e+00,   3.18435754e-01,
          1.67597765e-02],
       [  0.00000000e+00,   1.10738255e+00,   1.00268456e+01,
          1.34093960e+01,   1.41610738e+01,   1.25369128e+01,
          4.37583893e+00,   4.02684564e-02,   6.71140940e-03,
          4.55704698e+00,   1.49328859e+01,   1.25637584e+01,
          8.70469799e+00,   7.03355705e+00,   2.47651007e+00,
          3.35570470e-02,   1.34228188e-02,   6.07382550e+00,
          1.45302013e+01,   5.95302013e+00,   1.97315436e+00,
          1.02684564e+00,   2.01342282e-01,   1.38777878e-16,
          6.71140940e-03,   5.30201342e+00,   1.43355705e+01,
          1.23624161e+01,   7.85906040e+00,   2.26174497e+00,
          1.47651007e-01,  -5.20417043e-18,   0.00000000e+00,
          1.94630872e+00,   8.15436242e+00,   1.00939597e+01,
          1.02684564e+01,   5.51006711e+00,   6.37583893e-01,
          0.00000000e+00,  -2.08166817e-17,   3.02013423e-01,
          1.39597315e+00,   4.87248322e+00,   9.87248322e+00,
          7.02013423e+00,   7.78523490e-01,  -7.63278329e-17,
         -1.99493200e-17,   8.05369128e-01,   5.06040268e+00,
          9.47651007e+00,   1.21275168e+01,   5.27516779e+00,
          4.42953020e-01,  -3.88578059e-16,  -1.30104261e-18,
          1.05369128e+00,   1.08926174e+01,   1.45369128e+01,
          7.83892617e+00,   1.08724832e+00,   2.01342282e-02,
         -6.10622664e-16],
       [  0.00000000e+00,  -1.16573418e-15,   1.15934066e+00,
          1.12252747e+01,   9.53296703e+00,   1.41758242e+00,
          5.49450549e-03,  -3.05311332e-16,  -2.51534904e-17,
          6.04395604e-02,   7.18131868e+00,   1.45604396e+01,
          6.19230769e+00,   8.29670330e-01,   2.74725275e-02,
          3.74700271e-16,  -1.25767452e-17,   7.69230769e-01,
          1.24560440e+01,   9.47252747e+00,   9.34065934e-01,
          1.09890110e-01,   0.00000000e+00,   4.16333634e-17,
         -3.03576608e-18,   2.29670330e+00,   1.36208791e+01,
          8.09340659e+00,   3.87362637e+00,   1.92857143e+00,
          1.04395604e-01,  -6.07153217e-18,   0.00000000e+00,
          3.52747253e+00,   1.46758242e+01,   1.29175824e+01,
          1.22527473e+01,   1.02857143e+01,   2.71978022e+00,
          0.00000000e+00,  -2.42861287e-17,   1.86813187e+00,
          1.45164835e+01,   1.06538462e+01,   5.57692308e+00,
          1.01923077e+01,   9.13186813e+00,   2.30769231e-01,
         -2.42861287e-17,   1.75824176e-01,   1.02857143e+01,
          1.26263736e+01,   5.41758242e+00,   1.13241758e+01,
          1.08956044e+01,   6.26373626e-01,  -1.51788304e-18,
         -6.10622664e-16,   1.44505495e+00,   1.07362637e+01,
          1.50989011e+01,   1.31318681e+01,   4.62087912e+00,
          1.70329670e-01],
       [  0.00000000e+00,  -1.05471187e-15,   2.84023669e-01,
          6.95266272e+00,   1.19467456e+01,   2.02366864e+00,
          1.47928994e-01,   5.32544379e-02,  -2.34187669e-17,
          1.18343195e-02,   3.17159763e+00,   1.35917160e+01,
          8.63313609e+00,   1.54437870e+00,   9.58579882e-01,
          3.13609467e-01,  -1.17093835e-17,   6.15384615e-01,
          1.04674556e+01,   1.16390533e+01,   4.39644970e+00,
          5.20710059e+00,   3.88165680e+00,   3.49112426e-01,
          5.91715976e-03,   4.64497041e+00,   1.46449704e+01,
          6.01775148e+00,   6.78106509e+00,   1.08165680e+01,
          6.26035503e+00,   1.77514793e-02,   0.00000000e+00,
          8.84615385e+00,   1.48284024e+01,   9.41420118e+00,
          1.27692308e+01,   1.44260355e+01,   5.46153846e+00,
          0.00000000e+00,   9.46745562e-02,   6.43786982e+00,
          1.15621302e+01,   1.22307692e+01,   1.47751479e+01,
          1.09112426e+01,   1.60355030e+00,  -3.81639165e-17,
          5.91715976e-02,   1.11242604e+00,   2.92899408e+00,
          7.54437870e+00,   1.39881657e+01,   4.36686391e+00,
          1.77514793e-02,  -4.71844785e-16,  -1.40946282e-18,
          2.36686391e-02,   3.43195266e-01,   7.73964497e+00,
          1.23491124e+01,   1.92307692e+00,  -5.77315973e-15,
         -6.66133815e-16],
       [  0.00000000e+00,   9.42857143e-01,   1.01885714e+01,
          1.44400000e+01,   7.77142857e+00,   9.82857143e-01,
         -1.33226763e-15,  -2.77555756e-16,   2.28571429e-02,
          5.24000000e+00,   1.37200000e+01,   1.26228571e+01,
          1.16914286e+01,   3.23428571e+00,   1.71428571e-02,
          3.60822483e-16,   1.14285714e-02,   4.56000000e+00,
          8.11428571e+00,   6.13714286e+00,   1.21600000e+01,
          3.56000000e+00,   1.71428571e-02,   7.63278329e-17,
         -2.81892565e-18,   9.65714286e-01,   2.81714286e+00,
          7.00571429e+00,   1.25371429e+01,   2.56000000e+00,
          4.00000000e-02,  -5.63785130e-18,   0.00000000e+00,
          4.57142857e-02,   1.57142857e+00,   9.89714286e+00,
          1.06971429e+01,   1.45142857e+00,  -7.10542736e-15,
          0.00000000e+00,  -2.25514052e-17,   2.51428571e-01,
          4.45714286e+00,   1.12457143e+01,   7.74285714e+00,
          2.37142857e+00,   8.45714286e-01,   1.14285714e-02,
         -2.34187669e-17,   1.19428571e+00,   1.09942857e+01,
          1.37314286e+01,   1.19257143e+01,   1.11600000e+01,
          7.66857143e+00,   1.10285714e+00,  -1.40946282e-18,
          9.31428571e-01,   1.03885714e+01,   1.44685714e+01,
          1.35028571e+01,   1.23542857e+01,   8.96571429e+00,
          2.95428571e+00]])

In [79]:
# Display each cluster center as an 8x8 image
for center in kmeans.cluster_centers_:
    show_digit(center)