Introduction

Why do we want to use machine learning?

We already know how to program, don't we?

It is not only about flowers!

Where are the problems that we can solve with machine learning?



In [1]:

    
print('Always important, in machine learning too:' + '\n' + 'Get to know the task and the data thoroughly')

Always important, in machine learning too:
Get to know the task and the data thoroughly

So display the data first (print() etc.).

In which format does the data arrive? (CSV, stream, etc.)

Plot the data (with matplotlib etc.).

Look at summary statistics: variance, mean, standard deviation, maximum, minimum, range.

The pandas method describe() reports most of these (count, mean, standard deviation, minimum, quartiles, maximum); we will get to know it later.

Display a scatter_matrix!
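The checklist above can be sketched with pandas on a tiny table. The column names and values here are invented purely for illustration and are not course data; the point is only to show which statistics describe() covers and which have to be derived separately.

```python
import pandas as pd

# Hypothetical measurements, invented only for illustration.
df = pd.DataFrame({
    "length_cm": [5.1, 4.9, 6.3, 5.8],
    "width_cm": [3.5, 3.0, 3.3, 2.7],
})

# Look at the raw values first.
print(df.head())

# describe() reports count, mean, std, min, quartiles and max per column.
summary = df.describe()
print(summary)

# The range is not part of describe(); derive it from max and min.
value_range = summary.loc["max"] - summary.loc["min"]
print(value_range)

# Variance has its own method (sample variance, ddof=1 by default).
variance = df.var()
print(variance)
```

Variance and range are the two items from the list that describe() does not print directly, which is why they are computed separately here.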

Why do we want to use Python and not Java, C++, R, or another language?

What answer do you come up with?

Why do you want to use Python?

Essential libraries and tools

![Python libraries](bilder/python-bibliotheken-10.png)

We already know how to program, don't we?

Where do I find the machine learning methods?

The scikit-learn machine learning map: http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html

Getting to know Jupyter notebooks



In [3]:

    
%matplotlib inline

Why do we use the scikit-learn library?

Installing scikit-learn

We prefer an Anaconda installation!

Why do you want to use scikit-learn?

Getting to know NumPy



In [4]:

    
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
print("x:\n{}".format(x))









    



x:
[[1 2 3]
 [4 5 6]]
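As a small follow-up to the array above, here is a sketch of the NumPy operations that come up most often later on: shape and dtype, element-wise arithmetic, reductions along an axis, and slicing. The variable names other than x are chosen here for illustration.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

# shape is (rows, columns); dtype is the common element type.
print("shape: {}, dtype: {}".format(x.shape, x.dtype))

# Arithmetic is applied element by element, without an explicit loop.
doubled = x * 2
print("doubled:\n{}".format(doubled))

# Reductions along an axis: axis=0 collapses the rows (column sums),
# axis=1 collapses the columns (one mean per row).
col_sums = x.sum(axis=0)
row_means = x.mean(axis=1)
print("column sums: {}".format(col_sums))
print("row means: {}".format(row_means))

# Slicing: the first row, and the last column of every row.
first_row = x[0]
last_col = x[:, -1]
print("first row: {}, last column: {}".format(first_row, last_col))
```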

Working with SciPy



In [5]:

    
from scipy import sparse

# create a 2d NumPy array with a diagonal of ones, and zeros everywhere else
eye = np.eye(4)
print("NumPy array:\n{}".format(eye))









    



NumPy array:
[[ 1.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  1.  0.]
 [ 0.  0.  0.  1.]]



In [6]:

    
# convert the NumPy array to a SciPy sparse matrix in CSR format
# only the non-zero entries are stored
sparse_matrix = sparse.csr_matrix(eye)
print("\nSciPy sparse CSR matrix:\n{}".format(sparse_matrix))









    



SciPy sparse CSR matrix:
  (0, 0)	1.0
  (1, 1)	1.0
  (2, 2)	1.0
  (3, 3)	1.0



In [7]:

    
data = np.ones(4)
row_indices = np.arange(4)
col_indices = np.arange(4)
eye_coo = sparse.coo_matrix((data, (row_indices, col_indices)))
print("COO representation:\n{}".format(eye_coo))









    



COO representation:
  (0, 0)	1.0
  (1, 1)	1.0
  (2, 2)	1.0
  (3, 3)	1.0
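To make the benefit of the sparse formats concrete, the following sketch compares the number of stored entries with the size of the dense array, converts back to a dense array, and multiplies the sparse matrix with a vector. The names eye_csr, dense_again, v and product are chosen here for illustration.

```python
import numpy as np
from scipy import sparse

eye = np.eye(4)
eye_csr = sparse.csr_matrix(eye)

# nnz counts the stored (non-zero) entries; the dense array holds all 16 values.
print("stored entries: {}, dense size: {}".format(eye_csr.nnz, eye.size))

# toarray() converts the sparse matrix back to a dense NumPy array.
dense_again = eye_csr.toarray()
print("round trip equal: {}".format(np.array_equal(dense_again, eye)))

# Sparse matrices support matrix-vector products directly.
v = np.array([1.0, 2.0, 3.0, 4.0])
product = eye_csr.dot(v)
print("product: {}".format(product))
```

For a 4 x 4 identity matrix the saving is small, but the same ratio of stored to total entries becomes significant for large, mostly empty matrices.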

matplotlib: plotting made easy



In [8]:

    
%matplotlib inline
import matplotlib.pyplot as plt

# Generate 100 evenly spaced values from -10 to 10
x = np.linspace(-10, 10, 100)
# Create a second NumPy array by applying sin() to x
y = np.sin(x)
# The plot function makes a line chart of one array against another
plt.plot(x, y, marker="o", color='brown')
plt.title('Sine curve')
plt.xlabel('x')
plt.ylabel('y')
#plt.legend('sin x','upper left')









    Out[8]:





<matplotlib.text.Text at 0x7ab7668>



In [9]:

    
%matplotlib inline
import matplotlib.pyplot as plt

# Generate 100 evenly spaced values from -20 to 20
x = np.linspace(-20, 20, 100)
# Create a second NumPy array by applying exp() to x
y = np.exp(x)
# The plot function makes a line chart of one array against another
plt.plot(x, y, marker="o", color='green')
plt.title('Exponential curve')
plt.xlabel('x')
plt.ylabel('y')









    Out[9]:





<matplotlib.text.Text at 0x7864128>



In [10]:

    
%matplotlib inline
import matplotlib.pyplot as plt
from numpy.random import randn

z = randn(100)

red_dot, = plt.plot(z, "ro", markersize=15)
# Draw a white cross over some of the data points.
white_cross, = plt.plot(z[:50], "w+", markeredgewidth=3, markersize=15)

plt.legend([red_dot, (red_dot, white_cross)], ["Attr A", "Attr A+B"])









    Out[10]:





<matplotlib.legend.Legend at 0x7eddac8>

Working with pandas



In [11]:

    
import pandas as pd
from IPython.display import display

# create a simple dataset of people
data = {'Name': ["John", "Anna", "Peter", "Linda"],
        'Location' : ["New York", "Paris", "Berlin", "London"],
        'Age' : [24, 13, 53, 33]
       }

data_pandas = pd.DataFrame(data)
# IPython.display allows "pretty printing" of dataframes
# in the Jupyter notebook
display(data_pandas)









    






  
    
      
   Age  Location   Name
0   24  New York   John
1   13     Paris   Anna
2   53    Berlin  Peter
3   33    London  Linda



In [14]:

    
# One of many possible ways to query the table:
# selecting all rows whose Age column is greater than 30
display(data_pandas[data_pandas.Age > 30])









    






  
    
      
   Age Location   Name
2   53   Berlin  Peter
3   33   London  Linda
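The boolean-mask query shown above extends naturally to combined conditions, to selecting a single column, and to sorting. The following sketch rebuilds the same small table so that it runs on its own; the result names (adults_under_50, names_over_30, by_age) are chosen here for illustration.

```python
import pandas as pd

data_pandas = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Location": ["New York", "Paris", "Berlin", "London"],
    "Age": [24, 13, 53, 33],
})

# Conditions are combined with & (and) or | (or);
# each condition needs its own parentheses.
adults_under_50 = data_pandas[(data_pandas["Age"] > 18) & (data_pandas["Age"] < 50)]
print(adults_under_50)

# .loc selects rows by a mask and columns by label in one step.
names_over_30 = data_pandas.loc[data_pandas["Age"] > 30, "Name"]
print(names_over_30)

# sort_values orders the rows by a column (ascending by default).
by_age = data_pandas.sort_values("Age")
print(by_age)
```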

Python 2 versus Python 3

Versions Used in this Notebook



In [12]:

    
import sys
print("Python version: {}".format(sys.version))

import pandas as pd
print("pandas version: {}".format(pd.__version__))

import matplotlib
print("matplotlib version: {}".format(matplotlib.__version__))

import numpy as np
print("NumPy version: {}".format(np.__version__))

import scipy as sp
print("SciPy version: {}".format(sp.__version__))

import IPython
print("IPython version: {}".format(IPython.__version__))

import sklearn
print("scikit-learn version: {}".format(sklearn.__version__))









    



Python version: 2.7.12 |Anaconda 4.2.0 (64-bit)| (default, Jun 29 2016, 11:07:13) [MSC v.1500 64 bit (AMD64)]
pandas version: 0.18.1
matplotlib version: 1.5.3
NumPy version: 1.11.1
SciPy version: 0.18.1
IPython version: 5.1.0
scikit-learn version: 0.18.1

A first application: classifying iris species

Let's look at the data



In [13]:

    
from sklearn.datasets import load_iris
iris_dataset = load_iris()



In [14]:

    
print("Keys of iris_dataset: {}".format(iris_dataset.keys()))









    



Keys of iris_dataset: ['target_names', 'data', 'target', 'DESCR', 'feature_names']



In [15]:

    
print(iris_dataset['DESCR'][:193] + "\n...")









    



Iris Plants Database
====================

Notes
-----
Data Set Characteristics:
    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive att
...



In [16]:

    
print("Target names: {}".format(iris_dataset['target_names']))









    



Target names: ['setosa' 'versicolor' 'virginica']



In [17]:

    
print("Feature names: {}".format(iris_dataset['feature_names']))









    



Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']



In [18]:

    
print("Type of data: {}".format(type(iris_dataset['data'])))









    



Type of data: <type 'numpy.ndarray'>



In [19]:

    
print("Shape of data: {}".format(iris_dataset['data'].shape))









    



Shape of data: (150L, 4L)



In [20]:

    
print("First five rows of data:\n{}".format(iris_dataset['data'][:5]))









    



First five rows of data:
[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]
 [ 5.   3.6  1.4  0.2]]



In [21]:

    
print("Type of target: {}".format(type(iris_dataset['target'])))









    



Type of target: <type 'numpy.ndarray'>



In [22]:

    
print("Shape of target: {}".format(iris_dataset['target'].shape))









    



Shape of target: (150L,)



In [23]:

    
print("Shape of target: ")
print(iris_dataset['target'].shape)









    



Shape of target: 
(150L,)



In [24]:

    
print("Target:\n{}".format(iris_dataset['target']))









    



Target:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
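The target array is easier to read once the integer labels are counted and paired with their names. A minimal sketch using NumPy's bincount (the variable name counts is chosen here for illustration):

```python
import numpy as np
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# bincount counts how often each integer label 0, 1, 2 occurs.
counts = np.bincount(iris_dataset["target"])

# Pair each class name with its count.
for name, count in zip(iris_dataset["target_names"], counts):
    print("{}: {}".format(name, count))
```

This confirms what the dataset description states: 50 samples in each of the three classes, stored in class order. That ordering is the reason the data has to be shuffled before it is split.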

How success is measured: training and test data



In [25]:

    
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=1)



In [26]:

    
print("X_train shape: {}".format(X_train.shape))
print("y_train shape: {}".format(y_train.shape))









    



X_train shape: (112L, 4L)
y_train shape: (112L,)



In [27]:

    
print("X_test shape: {}".format(X_test.shape))
print("y_test shape: {}".format(y_test.shape))









    



X_test shape: (38L, 4L)
y_test shape: (38L,)
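The shapes above follow from the defaults of train_test_split: the data is shuffled and, unless test_size is given, 25% of the rows go to the test set. The sketch below illustrates this, the effect of random_state, and an explicit test_size; the names X_test_b and X_test_30 are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris["data"], iris["target"]

# By default 25% of the rows go to the test set (rounded up: 38 of 150).
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
print("train: {}, test: {}".format(X_train.shape, X_test.shape))

# The same random_state yields the same shuffle, so results are repeatable.
X_train_b, X_test_b, _, _ = train_test_split(X, y, random_state=1)
same_split = np.array_equal(X_test, X_test_b)
print("same split: {}".format(same_split))

# test_size sets the fraction explicitly; 30% of 150 rows is 45.
_, X_test_30, _, _ = train_test_split(X, y, test_size=0.3, random_state=1)
print("test set with test_size=0.3: {}".format(X_test_30.shape))
```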

Always first: look at your data



In [29]:

    
import mglearn
# create dataframe from data in X_train
# label the columns using the strings in iris_dataset.feature_names
iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset.feature_names)
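# create a scatter matrix from the dataframe, color by y_train
# (pd.scatter_matrix matches the pandas 0.18.1 used in this notebook;
# newer pandas versions provide it as pd.plotting.scatter_matrix)
pd.scatter_matrix(iris_dataframe, c=y_train, figsize=(15, 15), marker='o',
                  hist_kwds={'bins': 20}, s=60, alpha=.8, cmap=mglearn.cm3)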









    Out[29]:





[4 x 4 array of matplotlib AxesSubplot objects, one per panel of the scatter matrix]

Your first machine learning model: k-nearest neighbors

n_neighbors was changed from 1 to 3



In [30]:

    
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)



In [31]:

    
knn.fit(X_train, y_train)









    Out[31]:





KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=3, p=2,
           weights='uniform')
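To show what the classifier does conceptually, here is a minimal NumPy sketch of k-nearest neighbors with Euclidean distance and uniform majority voting. It is an illustration of the idea only and not scikit-learn's implementation; the function knn_predict and the toy data are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training point.
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels (ties go to the smallest label).
    votes = np.bincount(y_train[nearest])
    return int(np.argmax(votes))

# Hypothetical toy data: two well-separated clusters with labels 0 and 1.
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

pred_a = knn_predict(X_toy, y_toy, np.array([1.1, 1.0]))
pred_b = knn_predict(X_toy, y_toy, np.array([4.9, 5.0]))
print("prediction near first cluster: {}".format(pred_a))
print("prediction near second cluster: {}".format(pred_b))
```

Note that "fitting" such a model amounts to storing the training data; all the work happens at prediction time.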

Making predictions



In [32]:

    
X_new = np.array([[5, 2.9, 1, 0.2]])
print("X_new.shape: {}".format(X_new.shape))









    



X_new.shape: (1L, 4L)



In [33]:

    
prediction = knn.predict(X_new)
print("Prediction: {}".format(prediction))
print("Predicted target name: {}".format(
       iris_dataset['target_names'][prediction]))









    



Prediction: [0]
Predicted target name: ['setosa']

Evaluating the model



In [34]:

    
y_pred = knn.predict(X_test)
print("Test set predictions:\n {}".format(y_pred))









    



Test set predictions:
 [0 1 1 0 2 1 2 0 0 2 1 0 2 1 1 0 1 1 0 0 1 1 1 0 2 1 0 0 1 2 1 2 1 2 2 0 1
 0]



In [35]:

    
print("Test set score: {:.2f}".format(np.mean(y_pred == y_test)))









    



Test set score: 1.00



In [36]:

    
print("Test set score: {:.2f}".format(knn.score(X_test, y_test)))









    



Test set score: 1.00
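Both cells above compute the same quantity: accuracy, the fraction of test samples whose predicted label equals the true label. The sketch below spells this out by counting the correct predictions explicitly. It repeats the split and model from this notebook so that it runs on its own; n_correct and manual_accuracy are names chosen here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris["data"], iris["target"], random_state=1)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Accuracy = number of correct predictions / number of test samples.
n_correct = int((y_pred == y_test).sum())
manual_accuracy = n_correct / float(len(y_test))
print("{} of {} correct: {:.2f}".format(n_correct, len(y_test), manual_accuracy))

# score() computes the same quantity from X_test and y_test.
print("knn.score: {:.2f}".format(knn.score(X_test, y_test)))
```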

Summary and outlook



In [37]:

    
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

print("Test set score: {:.2f}".format(knn.score(X_test, y_test)))









    



Test set score: 0.97
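As an outlook, the summary cell can be turned into a small loop that compares several values of n_neighbors on the same split. This is only a sketch: the candidate values (1, 3, 5, 7) are an arbitrary choice made here, and in practice the value of k would be selected with a separate validation set or cross-validation rather than the test set.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris["data"], iris["target"], random_state=0)

# Fit one model per candidate k and record its test accuracy.
scores = {}
for k in (1, 3, 5, 7):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores[k] = model.score(X_test, y_test)
    print("k={}: test accuracy {:.2f}".format(k, scores[k]))
```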


