```
In [7]:
```from sklearn import datasets
from sklearn import tree
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn import metrics
import numpy as np

```
In [13]:
```def measure_performance(X, y, clf, show_accuracy=True, show_classification_report=True, show_confusion_matrix=True):
    # Predict once, then print whichever reports are switched on
    y_pred = clf.predict(X)
    if show_accuracy:
        print("Accuracy:{0:.3f}".format(metrics.accuracy_score(y, y_pred)), "\n")
    if show_classification_report:
        print("Classification report")
        print(metrics.classification_report(y, y_pred), "\n")
    if show_confusion_matrix:
        print("Confusion matrix")
        print(metrics.confusion_matrix(y, y_pred), "\n")

```
In [9]:
```iris = datasets.load_iris()
x = iris.data[:,2:]  # keep only petal length and petal width
y = iris.target
dt = tree.DecisionTreeClassifier()
dt = dt.fit(x,y)  # note: fit on every row, before any train/test split

```
In [ ]:
```x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.50,train_size=0.50)

```
In [15]:
```measure_performance(x_test,y_test,dt)

```
```

```
In [14]:
```measure_performance(x_train,y_train,dt)

```
```

```
In [16]:
```x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.25,train_size=0.75)

```
In [17]:
```measure_performance(x_train,y_train,dt)

```
```

```
In [28]:
```x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.95,train_size=0.05)
measure_performance(x_test,y_test, dt)

```
```

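Weird. Small dataset, I guess? If the surprise is that the scores barely change from split to split, a likelier cause is that `dt` was fit on all of `x` and `y` before splitting, so it has already seen every test row.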

Load the breast cancer dataset (`datasets.load_breast_cancer()`) and perform basic exploratory analysis. What attributes do we have? What are we trying to predict? For context on the data, see the documentation here: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

Attribute Information:

1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32) Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)

```
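The two questions above (which attributes, and what to predict) can be answered from the object that `datasets.load_breast_cancer()` returns. This is a minimal sketch, assuming a reasonably recent scikit-learn; the names `n_samples`, `n_features`, and `class_counts` are illustrative and not part of the original notebook. Note that scikit-learn's copy of the data keeps only the feature columns in `bc.data`; the ID is not included and the diagnosis lives in `bc.target`.

```python
from sklearn import datasets
import numpy as np

bc = datasets.load_breast_cancer()

# Attributes: one row per sample, one column per real-valued feature
n_samples, n_features = bc.data.shape
print(n_samples, n_features)
print(bc.feature_names)

# Target: the diagnosis, stored as integers that index into target_names
print(bc.target_names)

# Class balance: number of rows for each diagnosis
class_counts = {str(name): int(count)
                for name, count in zip(bc.target_names, np.bincount(bc.target))}
print(class_counts)
```

Printing `bc.DESCR` gives the full dataset description if more context is needed.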
In [44]:
```bc = datasets.load_breast_cancer()

```
In [47]:
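```x = bc.data[:,2:]  # bc.data holds only feature columns (no ID or diagnosis), so this slice drops mean radius and mean texture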
y = bc.target
dt = tree.DecisionTreeClassifier()
dt = dt.fit(x,y)

```
In [50]:
```dt

```
Out[50]:
```

```
In [51]:
```x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.50,train_size=0.50)

```
In [52]:
```measure_performance(x_train,y_train,dt)

```
```

```
In [53]:
```measure_performance(x_test,y_test,dt)

```
```

```
In [54]:
```x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.25,train_size=0.75)

```
In [55]:
```measure_performance(x_train,y_train,dt)

```
```

```
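The cells above fit `dt` on all of `x` and `y` and only then split the data, so both the train and the test scores come from rows the tree has already seen. A hedged sketch of the alternative, splitting first and fitting on the training rows only, is shown here. It keeps the notebook's `[:, 2:]` slice; the names `held_out_tree`, `train_acc`, and `test_acc`, and the choice `random_state=0`, are assumptions made for illustration.

```python
from sklearn import datasets, metrics, tree
from sklearn.model_selection import train_test_split

bc = datasets.load_breast_cancer()
x = bc.data[:, 2:]  # same column slice as the notebook
y = bc.target

# Split first so the held-out rows stay unseen during training.
# random_state=0 is an arbitrary value that makes the split repeatable.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0)

# Fit on the training rows only (hypothetical name, not from the notebook)
held_out_tree = tree.DecisionTreeClassifier(random_state=0)
held_out_tree.fit(x_train, y_train)

train_acc = metrics.accuracy_score(y_train, held_out_tree.predict(x_train))
test_acc = metrics.accuracy_score(y_test, held_out_tree.predict(x_test))
print("Train accuracy: {0:.3f}".format(train_acc))
print("Test accuracy: {0:.3f}".format(test_acc))
```

With this ordering, a gap between the two numbers says something about how well the tree generalizes, which the fit-then-split version cannot show.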
In [57]:
```from sklearn import tree
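from io import StringIO  # sklearn.externals.six is no longer shipped with recent scikit-learn
import pydotplus
dt = tree.DecisionTreeClassifier()
dt = dt.fit(x,y)
# Write the fitted tree to disk in Graphviz DOT format
with open("bc.dot", 'w') as f:
    tree.export_graphviz(dt, out_file=f)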

```
In [59]:
```import os
os.unlink('bc.dot')

```
In [60]:
```dot_data = StringIO()
tree.export_graphviz(dt, out_file=dot_data)  # rendering needs Graphviz installed, e.g. brew install graphviz
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("bc.pdf")

```
Out[60]:
```

```
In [61]:
```from IPython.display import IFrame
IFrame("bc.pdf", width=800, height=800)

```
Out[61]:
```

```
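When Graphviz or `pydotplus` is not available, the same tree can be inspected as plain text. This is a sketch, assuming scikit-learn 0.21 or later, where `tree.export_text` is available; `max_depth=3` is an arbitrary limit chosen here only to keep the printout short, and `rules` is an illustrative name.

```python
from sklearn import datasets, tree

bc = datasets.load_breast_cancer()
x = bc.data[:, 2:]  # same column slice as the notebook
y = bc.target

# A shallow tree (assumed depth limit) so the text dump stays readable
dt = tree.DecisionTreeClassifier(max_depth=3, random_state=0)
dt.fit(x, y)

# One line per split or leaf; feature names are sliced to match x
rules = tree.export_text(dt, feature_names=list(bc.feature_names[2:]))
print(rules)
```

Each leaf line reports the predicted class index, which maps to `bc.target_names`.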
In [ ]:
```