Data analysis and machine learning in Python

Notebook author: Jakub Nowacki.

Linear regression

Linear regression is one of the most basic, yet still widely used, types of regression. We will practice it on a sample dataset related to diabetes.


In [15]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
%matplotlib inline

plt.rcParams['figure.figsize'] = (10, 8)

# The dataset
diabetes = datasets.load_diabetes()
print(diabetes.DESCR)


Diabetes dataset
================

Notes
-----

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

Data Set Characteristics:

  :Number of Instances: 442

  :Number of Attributes: First 10 columns are numeric predictive values

  :Target: Column 11 is a quantitative measure of disease progression one year after baseline

  :Attributes:
    :Age:
    :Sex:
    :Body mass index:
    :Average blood pressure:
    :S1:
    :S2:
    :S3:
    :S4:
    :S5:
    :S6:

Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times `n_samples` (i.e. the sum of squares of each column totals 1).

Source URL:
http://www4.stat.ncsu.edu/~boos/var.select/diabetes.html

For more information see:
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499.
(http://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)


In [16]:
diabetes.keys()


Out[16]:
dict_keys(['data', 'target', 'DESCR', 'feature_names'])

In [17]:
diabetes.data


Out[17]:
array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226,
         0.01990842, -0.01764613],
       [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
        -0.06832974, -0.09220405],
       [ 0.08529891,  0.05068012,  0.04445121, ..., -0.00259226,
         0.00286377, -0.02593034],
       ...,
       [ 0.04170844,  0.05068012, -0.01590626, ..., -0.01107952,
        -0.04687948,  0.01549073],
       [-0.04547248, -0.04464164,  0.03906215, ...,  0.02655962,
         0.04452837, -0.02593034],
       [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
        -0.00421986,  0.00306441]])

In [18]:
diabetes.data.shape


Out[18]:
(442, 10)

In [19]:
diabetes.feature_names


Out[19]:
['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

In [20]:
diabetes.target


Out[20]:
array([151.,  75., 141., 206., 135.,  97., 138.,  63., 110., 310., 101.,
        69., 179., 185., 118., 171., 166., 144.,  97., 168.,  68.,  49.,
        68., 245., 184., 202., 137.,  85., 131., 283., 129.,  59., 341.,
        87.,  65., 102., 265., 276., 252.,  90., 100.,  55.,  61.,  92.,
       259.,  53., 190., 142.,  75., 142., 155., 225.,  59., 104., 182.,
       128.,  52.,  37., 170., 170.,  61., 144.,  52., 128.,  71., 163.,
       150.,  97., 160., 178.,  48., 270., 202., 111.,  85.,  42., 170.,
       200., 252., 113., 143.,  51.,  52., 210.,  65., 141.,  55., 134.,
        42., 111.,  98., 164.,  48.,  96.,  90., 162., 150., 279.,  92.,
        83., 128., 102., 302., 198.,  95.,  53., 134., 144., 232.,  81.,
       104.,  59., 246., 297., 258., 229., 275., 281., 179., 200., 200.,
       173., 180.,  84., 121., 161.,  99., 109., 115., 268., 274., 158.,
       107.,  83., 103., 272.,  85., 280., 336., 281., 118., 317., 235.,
        60., 174., 259., 178., 128.,  96., 126., 288.,  88., 292.,  71.,
       197., 186.,  25.,  84.,  96., 195.,  53., 217., 172., 131., 214.,
        59.,  70., 220., 268., 152.,  47.,  74., 295., 101., 151., 127.,
       237., 225.,  81., 151., 107.,  64., 138., 185., 265., 101., 137.,
       143., 141.,  79., 292., 178.,  91., 116.,  86., 122.,  72., 129.,
       142.,  90., 158.,  39., 196., 222., 277.,  99., 196., 202., 155.,
        77., 191.,  70.,  73.,  49.,  65., 263., 248., 296., 214., 185.,
        78.,  93., 252., 150.,  77., 208.,  77., 108., 160.,  53., 220.,
       154., 259.,  90., 246., 124.,  67.,  72., 257., 262., 275., 177.,
        71.,  47., 187., 125.,  78.,  51., 258., 215., 303., 243.,  91.,
       150., 310., 153., 346.,  63.,  89.,  50.,  39., 103., 308., 116.,
       145.,  74.,  45., 115., 264.,  87., 202., 127., 182., 241.,  66.,
        94., 283.,  64., 102., 200., 265.,  94., 230., 181., 156., 233.,
        60., 219.,  80.,  68., 332., 248.,  84., 200.,  55.,  85.,  89.,
        31., 129.,  83., 275.,  65., 198., 236., 253., 124.,  44., 172.,
       114., 142., 109., 180., 144., 163., 147.,  97., 220., 190., 109.,
       191., 122., 230., 242., 248., 249., 192., 131., 237.,  78., 135.,
       244., 199., 270., 164.,  72.,  96., 306.,  91., 214.,  95., 216.,
       263., 178., 113., 200., 139., 139.,  88., 148.,  88., 243.,  71.,
        77., 109., 272.,  60.,  54., 221.,  90., 311., 281., 182., 321.,
        58., 262., 206., 233., 242., 123., 167.,  63., 197.,  71., 168.,
       140., 217., 121., 235., 245.,  40.,  52., 104., 132.,  88.,  69.,
       219.,  72., 201., 110.,  51., 277.,  63., 118.,  69., 273., 258.,
        43., 198., 242., 232., 175.,  93., 168., 275., 293., 281.,  72.,
       140., 189., 181., 209., 136., 261., 113., 131., 174., 257.,  55.,
        84.,  42., 146., 212., 233.,  91., 111., 152., 120.,  67., 310.,
        94., 183.,  66., 173.,  72.,  49.,  64.,  48., 178., 104., 132.,
       220.,  57.])

To keep the example clear, let us use a single attribute for the regression.


In [21]:
diabetes_X = diabetes.data[:, np.newaxis, 2]  # extract as a column vector (not needed when there is more than one column)

# Split the data into training and test sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

In [22]:
diabetes_X_train[:5], diabetes_y_train[:5]


Out[22]:
(array([[ 0.06169621],
        [-0.05147406],
        [ 0.04445121],
        [-0.01159501],
        [-0.03638469]]), array([151.,  75., 141., 206., 135.]))

In [23]:
# Create the model object and train it
regr = linear_model.LinearRegression()

regr.fit(diabetes_X_train, diabetes_y_train)


Out[23]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

Now we can check whether the model learns well and how it predicts on the test data.


In [24]:
diabetes_y_pred = regr.predict(diabetes_X_test)
diabetes_y_pred


Out[24]:
array([225.9732401 , 115.74763374, 163.27610621, 114.73638965,
       120.80385422, 158.21988574, 236.08568105, 121.81509832,
        99.56772822, 123.83758651, 204.73711411,  96.53399594,
       154.17490936, 130.91629517,  83.3878227 , 171.36605897,
       137.99500384, 137.99500384, 189.56845268,  84.3990668 ])

We can now evaluate the quality of the model.


In [25]:
print('Coefficients: \n', regr.coef_)
print("Mean squared error: %.2f"
      % mean_squared_error(diabetes_y_test, diabetes_y_pred))
print('R2 (variance) score: %.2f' % r2_score(diabetes_y_test, diabetes_y_pred))


Coefficients: 
 [938.23786125]
Mean squared error: 2548.07
R2 (variance) score: 0.47

Let us also plot our model's predictions.


In [26]:
plt.scatter(diabetes_X_test, diabetes_y_test,  color='black')
plt.scatter(diabetes_X_train, diabetes_y_train,  color='red')
plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3)
plt.show()



In [27]:
diabetes_X = diabetes.data  # this time we take all the features
diabetes_X


Out[27]:
array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226,
         0.01990842, -0.01764613],
       [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
        -0.06832974, -0.09220405],
       [ 0.08529891,  0.05068012,  0.04445121, ..., -0.00259226,
         0.00286377, -0.02593034],
       ...,
       [ 0.04170844,  0.05068012, -0.01590626, ..., -0.01107952,
        -0.04687948,  0.01549073],
       [-0.04547248, -0.04464164,  0.03906215, ...,  0.02655962,
         0.04452837, -0.02593034],
       [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
        -0.00421986,  0.00306441]])

Exercise

  1. Use more variables to train the model; compare the regression quality metrics.
  2. Draw the regression line against other variables.
  3. ★ Which features influence the result the most? How can this be checked?
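One way to approach point 3 (a sketch, not the only method): since the features in this dataset are standardized (see the DESCR note above), the absolute magnitudes of the coefficients of a model fit on all columns give a rough ranking of feature influence.

```python
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()

# Fit a model on all 10 standardized features
reg = linear_model.LinearRegression()
reg.fit(diabetes.data, diabetes.target)

# Rank features by absolute coefficient size; on standardized inputs
# this is a rough proxy for each feature's influence on the prediction
ranking = sorted(zip(diabetes.feature_names, reg.coef_),
                 key=lambda p: abs(p[1]), reverse=True)
for name, coef in ranking:
    print('%4s % .1f' % (name, coef))
```

Note this is only a heuristic; correlated features can split or swap their coefficient weight.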

In [29]:
# Select the sex, bmi and bp columns; for a single column we would use
# diabetes_X = diabetes.data[:, np.newaxis, 2] to keep it a column vector
diabetes_X = diabetes.data[:, [1, 2, 3]]


# Split the data into training and test sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

# Create the model object and train it
regr = linear_model.LinearRegression()

regr.fit(diabetes_X_train, diabetes_y_train)
diabetes_y_pred = regr.predict(diabetes_X_test)


print('Coefficients: \n', regr.coef_)
print("Mean squared error: %.2f"
      % mean_squared_error(diabetes_y_test, diabetes_y_pred))
print('R2 (variance) score: %.2f' % r2_score(diabetes_y_test, diabetes_y_pred))


plt.scatter(diabetes_X_test[:, 2], diabetes_y_test, color='black')
plt.scatter(diabetes_X_train[:, 2], diabetes_y_train, color='red')
plt.plot(diabetes_X_test[:, 2], diabetes_y_pred, color='blue', linewidth=3)
plt.show()


Coefficients: 
 [-96.87616507 780.09757364 432.26095788]
Mean squared error: 2510.21
R2 (variance) score: 0.48

Pandas

Let us return to Pandas and build the same model.


In [30]:
import pandas as pd

dia_df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)\
    .assign(target=diabetes.target)
    
dia_df.head()


Out[30]:
age sex bmi bp s1 s2 s3 s4 s5 s6 target
0 0.038076 0.050680 0.061696 0.021872 -0.044223 -0.034821 -0.043401 -0.002592 0.019908 -0.017646 151.0
1 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163 0.074412 -0.039493 -0.068330 -0.092204 75.0
2 0.085299 0.050680 0.044451 -0.005671 -0.045599 -0.034194 -0.032356 -0.002592 0.002864 -0.025930 141.0
3 -0.089063 -0.044642 -0.011595 -0.036656 0.012191 0.024991 -0.036038 0.034309 0.022692 -0.009362 206.0
4 0.005383 -0.044642 -0.036385 0.021872 0.003935 0.015596 0.008142 -0.002592 -0.031991 -0.046641 135.0

In [31]:
dia_train = dia_df.iloc[:-20, :]
dia_train.head(20)


Out[31]:
age sex bmi bp s1 s2 s3 s4 s5 s6 target
0 0.038076 0.050680 0.061696 0.021872 -0.044223 -0.034821 -0.043401 -0.002592 0.019908 -0.017646 151.0
1 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163 0.074412 -0.039493 -0.068330 -0.092204 75.0
2 0.085299 0.050680 0.044451 -0.005671 -0.045599 -0.034194 -0.032356 -0.002592 0.002864 -0.025930 141.0
3 -0.089063 -0.044642 -0.011595 -0.036656 0.012191 0.024991 -0.036038 0.034309 0.022692 -0.009362 206.0
4 0.005383 -0.044642 -0.036385 0.021872 0.003935 0.015596 0.008142 -0.002592 -0.031991 -0.046641 135.0
5 -0.092695 -0.044642 -0.040696 -0.019442 -0.068991 -0.079288 0.041277 -0.076395 -0.041180 -0.096346 97.0
6 -0.045472 0.050680 -0.047163 -0.015999 -0.040096 -0.024800 0.000779 -0.039493 -0.062913 -0.038357 138.0
7 0.063504 0.050680 -0.001895 0.066630 0.090620 0.108914 0.022869 0.017703 -0.035817 0.003064 63.0
8 0.041708 0.050680 0.061696 -0.040099 -0.013953 0.006202 -0.028674 -0.002592 -0.014956 0.011349 110.0
9 -0.070900 -0.044642 0.039062 -0.033214 -0.012577 -0.034508 -0.024993 -0.002592 0.067736 -0.013504 310.0
10 -0.096328 -0.044642 -0.083808 0.008101 -0.103389 -0.090561 -0.013948 -0.076395 -0.062913 -0.034215 101.0
11 0.027178 0.050680 0.017506 -0.033214 -0.007073 0.045972 -0.065491 0.071210 -0.096433 -0.059067 69.0
12 0.016281 -0.044642 -0.028840 -0.009113 -0.004321 -0.009769 0.044958 -0.039493 -0.030751 -0.042499 179.0
13 0.005383 0.050680 -0.001895 0.008101 -0.004321 -0.015719 -0.002903 -0.002592 0.038393 -0.013504 185.0
14 0.045341 -0.044642 -0.025607 -0.012556 0.017694 -0.000061 0.081775 -0.039493 -0.031991 -0.075636 118.0
15 -0.052738 0.050680 -0.018062 0.080401 0.089244 0.107662 -0.039719 0.108111 0.036056 -0.042499 171.0
16 -0.005515 -0.044642 0.042296 0.049415 0.024574 -0.023861 0.074412 -0.039493 0.052280 0.027917 166.0
17 0.070769 0.050680 0.012117 0.056301 0.034206 0.049416 -0.039719 0.034309 0.027368 -0.001078 144.0
18 -0.038207 -0.044642 -0.010517 -0.036656 -0.037344 -0.019476 -0.028674 -0.002592 -0.018118 -0.017646 97.0
19 -0.027310 -0.044642 -0.018062 -0.040099 -0.002945 -0.011335 0.037595 -0.039493 -0.008944 -0.054925 168.0

In [32]:
dia_test = dia_df.iloc[-20:, :]
dia_test


Out[32]:
age sex bmi bp s1 s2 s3 s4 s5 s6 target
422 -0.078165 0.050680 0.077863 0.052858 0.078236 0.064447 0.026550 -0.002592 0.040672 -0.009362 233.0
423 0.009016 0.050680 -0.039618 0.028758 0.038334 0.073529 -0.072854 0.108111 0.015567 -0.046641 91.0
424 0.001751 0.050680 0.011039 -0.019442 -0.016704 -0.003819 -0.047082 0.034309 0.024053 0.023775 111.0
425 -0.078165 -0.044642 -0.040696 -0.081414 -0.100638 -0.112795 0.022869 -0.076395 -0.020289 -0.050783 152.0
426 0.030811 0.050680 -0.034229 0.043677 0.057597 0.068831 -0.032356 0.057557 0.035462 0.085907 120.0
427 -0.034575 0.050680 0.005650 -0.005671 -0.073119 -0.062691 -0.006584 -0.039493 -0.045421 0.032059 67.0
428 0.048974 0.050680 0.088642 0.087287 0.035582 0.021546 -0.024993 0.034309 0.066048 0.131470 310.0
429 -0.041840 -0.044642 -0.033151 -0.022885 0.046589 0.041587 0.056003 -0.024733 -0.025952 -0.038357 94.0
430 -0.009147 -0.044642 -0.056863 -0.050428 0.021822 0.045345 -0.028674 0.034309 -0.009919 -0.017646 183.0
431 0.070769 0.050680 -0.030996 0.021872 -0.037344 -0.047034 0.033914 -0.039493 -0.014956 -0.001078 66.0
432 0.009016 -0.044642 0.055229 -0.005671 0.057597 0.044719 -0.002903 0.023239 0.055684 0.106617 173.0
433 -0.027310 -0.044642 -0.060097 -0.029771 0.046589 0.019980 0.122273 -0.039493 -0.051401 -0.009362 72.0
434 0.016281 -0.044642 0.001339 0.008101 0.005311 0.010899 0.030232 -0.039493 -0.045421 0.032059 49.0
435 -0.012780 -0.044642 -0.023451 -0.040099 -0.016704 0.004636 -0.017629 -0.002592 -0.038459 -0.038357 64.0
436 -0.056370 -0.044642 -0.074108 -0.050428 -0.024960 -0.047034 0.092820 -0.076395 -0.061177 -0.046641 48.0
437 0.041708 0.050680 0.019662 0.059744 -0.005697 -0.002566 -0.028674 -0.002592 0.031193 0.007207 178.0
438 -0.005515 0.050680 -0.015906 -0.067642 0.049341 0.079165 -0.028674 0.034309 -0.018118 0.044485 104.0
439 0.041708 0.050680 -0.015906 0.017282 -0.037344 -0.013840 -0.024993 -0.011080 -0.046879 0.015491 132.0
440 -0.045472 -0.044642 0.039062 0.001215 0.016318 0.015283 -0.028674 0.026560 0.044528 -0.025930 220.0
441 -0.045472 -0.044642 -0.073030 -0.081414 0.083740 0.027809 0.173816 -0.039493 -0.004220 0.003064 57.0

In [33]:
lr = linear_model.LinearRegression()
lr.fit(dia_train[['age', 'sex', 'bmi']], dia_train['target'])


Out[33]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [34]:
dia_test = dia_test.assign(predict=lambda x: lr.predict(x[['age', 'sex', 'bmi']]))
dia_test


Out[34]:
age sex bmi bp s1 s2 s3 s4 s5 s6 target predict
422 -0.078165 0.050680 0.077863 0.052858 0.078236 0.064447 0.026550 -0.002592 0.040672 -0.009362 233.0 211.071181
423 0.009016 0.050680 -0.039618 0.028758 0.038334 0.073529 -0.072854 0.108111 0.015567 -0.046641 91.0 116.261552
424 0.001751 0.050680 0.011039 -0.019442 -0.016704 -0.003819 -0.047082 0.034309 0.024053 0.023775 111.0 161.517691
425 -0.078165 -0.044642 -0.040696 -0.081414 -0.100638 -0.112795 0.022869 -0.076395 -0.020289 -0.050783 152.0 105.886702
426 0.030811 0.050680 -0.034229 0.043677 0.057597 0.068831 -0.032356 0.057557 0.035462 0.085907 120.0 124.331705
427 -0.034575 0.050680 0.005650 -0.005671 -0.073119 -0.062691 -0.006584 -0.039493 -0.045421 0.032059 67.0 151.351420
428 0.048974 0.050680 0.088642 0.087287 0.035582 0.021546 -0.024993 0.034309 0.066048 0.131470 310.0 239.264161
429 -0.041840 -0.044642 -0.033151 -0.022885 0.046589 0.041587 0.056003 -0.024733 -0.025952 -0.038357 94.0 118.023364
430 -0.009147 -0.044642 -0.056863 -0.050428 0.021822 0.045345 -0.028674 0.034309 -0.009919 -0.017646 183.0 101.065322
431 0.070769 0.050680 -0.030996 0.021872 -0.037344 -0.047034 0.033914 -0.039493 -0.014956 -0.001078 66.0 133.051614
432 0.009016 -0.044642 0.055229 -0.005671 0.057597 0.044719 -0.002903 0.023239 0.055684 0.106617 173.0 206.145820
433 -0.027310 -0.044642 -0.060097 -0.029771 0.046589 0.019980 0.122273 -0.039493 -0.051401 -0.009362 72.0 95.489589
434 0.016281 -0.044642 0.001339 0.008101 0.005311 0.010899 0.030232 -0.039493 -0.045421 0.032059 49.0 157.934094
435 -0.012780 -0.044642 -0.023451 -0.040099 -0.016704 0.004636 -0.017629 -0.002592 -0.038459 -0.038357 64.0 131.082359
436 -0.056370 -0.044642 -0.074108 -0.050428 -0.024960 -0.047034 0.092820 -0.076395 -0.061177 -0.046641 48.0 78.489811
437 0.041708 0.050680 0.019662 0.059744 -0.005697 -0.002566 -0.028674 -0.002592 0.031193 0.007207 178.0 175.163578
438 -0.005515 0.050680 -0.015906 -0.067642 0.049341 0.079165 -0.028674 0.034309 -0.018118 0.044485 104.0 135.839740
439 0.041708 0.050680 -0.015906 0.017282 -0.037344 -0.013840 -0.024993 -0.011080 -0.046879 0.015491 132.0 142.652120
440 -0.045472 -0.044642 0.039062 0.001215 0.016318 0.015283 -0.028674 0.026560 0.044528 -0.025930 220.0 183.507446
441 -0.045472 -0.044642 -0.073030 -0.081414 0.083740 0.027809 0.173816 -0.039493 -0.004220 0.003064 57.0 81.047094

In [35]:
print('Coefficients: \n', lr.coef_)
print("Mean squared error: %.2f"
      % mean_squared_error(dia_test['target'], lr.predict(dia_test[['age', 'sex', 'bmi']])))
print('R2 (variance) score: %.2f' % r2_score(dia_test['target'], dia_test['predict']))


Coefficients: 
 [144.25978848 -33.43463042 914.07000914]
Mean squared error: 2585.66
R2 (variance) score: 0.46

In [37]:
import pandas as pd


def model(dataframe, features, target, test_size=20):
    """Train a linear regression on the given features and report test-set metrics."""

    # Split into training and test sets (the last `test_size` rows form the test set)
    train = dataframe.iloc[:-test_size, :]
    test = dataframe.iloc[-test_size:, :]

    lr = linear_model.LinearRegression()
    lr.fit(train[features], train[target])

    test = test.assign(predict=lambda x: lr.predict(x[features]))

    print('Coefficients: \n', lr.coef_)
    print("Mean squared error: %.2f" % mean_squared_error(test[target], test['predict']))
    print('R2 (variance) score: %.2f' % r2_score(test[target], test['predict']))

Exercise

  1. As above, experiment with the features.
  2. Automate the above experiment.
  3. ★ Are there any other parameters that could be tuned?
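Exercise 2 can be approached by wrapping the pipeline in a helper and looping over feature subsets; one possible sketch (the chosen subsets below are arbitrary):

```python
import pandas as pd
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

diabetes = datasets.load_diabetes()
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names) \
    .assign(target=diabetes.target)

def evaluate(df, features, target='target', test_size=20):
    """Fit a linear regression on all but the last `test_size` rows
    and return the test-set MSE and R2."""
    train, test = df.iloc[:-test_size], df.iloc[-test_size:]
    lr = linear_model.LinearRegression()
    lr.fit(train[features], train[target])
    pred = lr.predict(test[features])
    return (mean_squared_error(test[target], pred),
            r2_score(test[target], pred))

for features in [['bmi'], ['age', 'sex', 'bmi'], ['bmi', 'bp', 's5']]:
    mse, r2 = evaluate(df, features)
    print('%-20s MSE=%.2f R2=%.2f' % (','.join(features), mse, r2))
```

With `['bmi']` alone this reproduces the single-feature result from earlier in the notebook (MSE 2548.07, R2 0.47), so the helper can be trusted as a baseline for further experiments.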

Linear regression with regularization

To select a model that generalizes well, regularization techniques are used. The two best-known techniques are Lasso, i.e. L1 regularization, and Ridge, i.e. L2 regularization. Below are examples of using these algorithms.
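The practical difference between the two penalties shows up directly in the coefficients: L1 (Lasso) tends to set some of them to exactly zero, while L2 (Ridge) only shrinks them towards zero. A minimal sketch on the full dataset, using the default alpha=1.0 for both:

```python
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

# Same default regularization strength (alpha=1.0) for both penalties
lasso = linear_model.Lasso().fit(X, y)
ridge = linear_model.Ridge().fit(X, y)

# L1 typically produces a sparse coefficient vector, L2 does not
print('Lasso zero coefficients:', sum(c == 0 for c in lasso.coef_))
print('Ridge zero coefficients:', sum(c == 0 for c in ridge.coef_))
```

This sparsity is why Lasso is also used as a feature-selection tool.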


In [38]:
ridge = linear_model.Ridge()
ridge.fit(dia_train[['age', 'sex', 'bmi']], dia_train['target'])


Out[38]:
Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [39]:
dia_test = dia_test.assign(predict=lambda x: ridge.predict(x[['age', 'sex', 'bmi']]))
dia_test


Out[39]:
age sex bmi bp s1 s2 s3 s4 s5 s6 target predict
422 -0.078165 0.050680 0.077863 0.052858 0.078236 0.064447 0.026550 -0.002592 0.040672 -0.009362 233.0 179.626754
423 0.009016 0.050680 -0.039618 0.028758 0.038334 0.073529 -0.072854 0.108111 0.015567 -0.046641 91.0 136.525047
424 0.001751 0.050680 0.011039 -0.019442 -0.016704 -0.003819 -0.047082 0.034309 0.024053 0.023775 111.0 158.441800
425 -0.078165 -0.044642 -0.040696 -0.081414 -0.100638 -0.112795 0.022869 -0.076395 -0.020289 -0.050783 152.0 126.104300
426 0.030811 0.050680 -0.034229 0.043677 0.057597 0.068831 -0.032356 0.057557 0.035462 0.085907 120.0 141.335891
427 -0.034575 0.050680 0.005650 -0.005671 -0.073119 -0.062691 -0.006584 -0.039493 -0.045421 0.032059 67.0 152.034710
428 0.048974 0.050680 0.088642 0.087287 0.035582 0.021546 -0.024993 0.034309 0.066048 0.131470 310.0 198.426854
429 -0.041840 -0.044642 -0.033151 -0.022885 0.046589 0.041587 0.056003 -0.024733 -0.025952 -0.038357 94.0 133.477980
430 -0.009147 -0.044642 -0.056863 -0.050428 0.021822 0.045345 -0.028674 0.034309 -0.009919 -0.017646 183.0 126.437037
431 0.070769 0.050680 -0.030996 0.021872 -0.037344 -0.047034 0.033914 -0.039493 -0.014956 -0.001078 66.0 147.175451
432 0.009016 -0.044642 0.055229 -0.005671 0.057597 0.044719 -0.002903 0.023239 0.055684 0.106617 173.0 178.695047
433 -0.027310 -0.044642 -0.060097 -0.029771 0.046589 0.019980 0.122273 -0.039493 -0.051401 -0.009362 72.0 122.991844
434 0.016281 -0.044642 0.001339 0.008101 0.005311 0.010899 0.030232 -0.039493 -0.045421 0.032059 49.0 155.328408
435 -0.012780 -0.044642 -0.023451 -0.040099 -0.016704 0.004636 -0.017629 -0.002592 -0.038459 -0.038357 64.0 141.020127
436 -0.056370 -0.044642 -0.074108 -0.050428 -0.024960 -0.047034 0.092820 -0.076395 -0.061177 -0.046641 48.0 113.516516
437 0.041708 0.050680 0.019662 0.059744 -0.005697 -0.002566 -0.028674 -0.002592 0.031193 0.007207 178.0 166.697836
438 -0.005515 0.050680 -0.015906 -0.067642 0.049341 0.079165 -0.028674 0.034309 -0.018118 0.044485 104.0 145.561296
439 0.041708 0.050680 -0.015906 0.017282 -0.037344 -0.013840 -0.024993 -0.011080 -0.046879 0.015491 132.0 150.749094
440 -0.045472 -0.044642 0.039062 0.001215 0.016318 0.015283 -0.028674 0.026560 0.044528 -0.025930 220.0 165.459699
441 -0.045472 -0.044642 -0.073030 -0.081414 0.083740 0.027809 0.173816 -0.039493 -0.004220 0.003064 57.0 115.196995

In [40]:
print('Coefficients: \n', ridge.coef_)
print("Mean squared error: %.2f"
      % mean_squared_error(dia_test['target'], ridge.predict(dia_test[['age', 'sex', 'bmi']])))
print('R2 (variance) score: %.2f' % r2_score(dia_test['target'], dia_test['predict']))


Coefficients: 
 [109.85742979   3.77646864 448.40398428]
Mean squared error: 3602.78
R2 (variance) score: 0.25

In [41]:
lasso = linear_model.Lasso()
lasso.fit(dia_train[['age', 'sex', 'bmi']], dia_train['target'])


Out[41]:
Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [42]:
dia_test = dia_test.assign(predict=lambda x: lasso.predict(x[['age', 'sex', 'bmi']]))
dia_test


Out[42]:
age sex bmi bp s1 s2 s3 s4 s5 s6 target predict
422 -0.078165 0.050680 0.077863 0.052858 0.078236 0.064447 0.026550 -0.002592 0.040672 -0.009362 233.0 191.800029
423 0.009016 0.050680 -0.039618 0.028758 0.038334 0.073529 -0.072854 0.108111 0.015567 -0.046641 91.0 133.450577
424 0.001751 0.050680 0.011039 -0.019442 -0.016704 -0.003819 -0.047082 0.034309 0.024053 0.023775 111.0 158.610433
425 -0.078165 -0.044642 -0.040696 -0.081414 -0.100638 -0.112795 0.022869 -0.076395 -0.020289 -0.050783 152.0 132.915261
426 0.030811 0.050680 -0.034229 0.043677 0.057597 0.068831 -0.032356 0.057557 0.035462 0.085907 120.0 136.127158
427 -0.034575 0.050680 0.005650 -0.005671 -0.073119 -0.062691 -0.006584 -0.039493 -0.045421 0.032059 67.0 155.933852
428 0.048974 0.050680 0.088642 0.087287 0.035582 0.021546 -0.024993 0.034309 0.066048 0.131470 310.0 197.153190
429 -0.041840 -0.044642 -0.033151 -0.022885 0.046589 0.041587 0.056003 -0.024733 -0.025952 -0.038357 94.0 136.662474
430 -0.009147 -0.044642 -0.056863 -0.050428 0.021822 0.045345 -0.028674 0.034309 -0.009919 -0.017646 183.0 124.885520
431 0.070769 0.050680 -0.030996 0.021872 -0.037344 -0.047034 0.033914 -0.039493 -0.014956 -0.001078 66.0 137.733106
432 0.009016 -0.044642 0.055229 -0.005671 0.057597 0.044719 -0.002903 0.023239 0.055684 0.106617 173.0 180.558392
433 -0.027310 -0.044642 -0.060097 -0.029771 0.046589 0.019980 0.122273 -0.039493 -0.051401 -0.009362 72.0 123.279572
434 0.016281 -0.044642 0.001339 0.008101 0.005311 0.010899 0.030232 -0.039493 -0.045421 0.032059 49.0 153.792588
435 -0.012780 -0.044642 -0.023451 -0.040099 -0.016704 0.004636 -0.017629 -0.002592 -0.038459 -0.038357 64.0 141.480318
436 -0.056370 -0.044642 -0.074108 -0.050428 -0.024960 -0.047034 0.092820 -0.076395 -0.061177 -0.046641 48.0 116.320463
437 0.041708 0.050680 0.019662 0.059744 -0.005697 -0.002566 -0.028674 -0.002592 0.031193 0.007207 178.0 162.892961
438 -0.005515 0.050680 -0.015906 -0.067642 0.049341 0.079165 -0.028674 0.034309 -0.018118 0.044485 104.0 145.227531
439 0.041708 0.050680 -0.015906 0.017282 -0.037344 -0.013840 -0.024993 -0.011080 -0.046879 0.015491 132.0 145.227531
440 -0.045472 -0.044642 0.039062 0.001215 0.016318 0.015283 -0.028674 0.026560 0.044528 -0.025930 220.0 172.528651
441 -0.045472 -0.044642 -0.073030 -0.081414 0.083740 0.027809 0.173816 -0.039493 -0.004220 0.003064 57.0 116.855779

In [ ]:
print('Coefficients: \n', lasso.coef_)
print("Mean squared error: %.2f"
      % mean_squared_error(dia_test['target'], lasso.predict(dia_test[['age', 'sex', 'bmi']])))
print('R2 (variance) score: %.2f' % r2_score(dia_test['target'], dia_test['predict']))

As can be seen, the metrics came out worse than for plain linear regression. This is because regularization has hyperparameters that need to be tuned to the problem at hand. For this purpose, variants with built-in cross-validation were created, which also select the hyperparameters.
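What LassoCV does internally can be roughly approximated by hand: cross-validate a Lasso model for each candidate alpha and keep the one with the lowest error. A minimal sketch (the alpha grid below is arbitrary, for illustration only; LassoCV derives its own grid):

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

best_alpha, best_mse = None, np.inf
for alpha in [0.001, 0.01, 0.1, 1.0]:   # hypothetical grid
    lasso = linear_model.Lasso(alpha=alpha)
    # 5-fold cross-validated MSE (sklearn reports it negated)
    mse = -cross_val_score(lasso, X, y, cv=5,
                           scoring='neg_mean_squared_error').mean()
    print('alpha=%.3f  CV MSE=%.2f' % (alpha, mse))
    if mse < best_mse:
        best_alpha, best_mse = alpha, mse

print('best alpha:', best_alpha)
```

LassoCV does essentially this, but shares computation across the regularization path, so it is the preferred tool in practice.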


In [55]:
lasso = linear_model.LassoCV(cv=5)
lasso.fit(dia_train[['age', 'sex', 'bmi']], dia_train['target'])


Out[55]:
LassoCV(alphas=None, copy_X=True, cv=5, eps=0.001, fit_intercept=True,
    max_iter=1000, n_alphas=100, n_jobs=1, normalize=False, positive=False,
    precompute='auto', random_state=None, selection='cyclic', tol=0.0001,
    verbose=False)

In [52]:
dia_test = dia_test.assign(predict=lambda x: lasso.predict(x[['age', 'sex', 'bmi']]))
dia_test


Out[52]:
age sex bmi bp s1 s2 s3 s4 s5 s6 target predict
422 -0.078165 0.050680 0.077863 0.052858 0.078236 0.064447 0.026550 -0.002592 0.040672 -0.009362 233.0 213.025719
423 0.009016 0.050680 -0.039618 0.028758 0.038334 0.073529 -0.072854 0.108111 0.015567 -0.046641 91.0 118.673749
424 0.001751 0.050680 0.011039 -0.019442 -0.016704 -0.003819 -0.047082 0.034309 0.024053 0.023775 111.0 162.903582
425 -0.078165 -0.044642 -0.040696 -0.081414 -0.100638 -0.112795 0.022869 -0.076395 -0.020289 -0.050783 152.0 107.559784
426 0.030811 0.050680 -0.034229 0.043677 0.057597 0.068831 -0.032356 0.057557 0.035462 0.085907 120.0 126.017832
427 -0.034575 0.050680 0.005650 -0.005671 -0.073119 -0.062691 -0.006584 -0.039493 -0.045421 0.032059 67.0 153.860557
428 0.048974 0.050680 0.088642 0.087287 0.035582 0.021546 -0.024993 0.034309 0.066048 0.131470 310.0 237.482801
429 -0.041840 -0.044642 -0.033151 -0.022885 0.046589 0.041587 0.056003 -0.024733 -0.025952 -0.038357 94.0 118.521077
430 -0.009147 -0.044642 -0.056863 -0.050428 0.021822 0.045345 -0.028674 0.034309 -0.009919 -0.017646 183.0 101.242745
431 0.070769 0.050680 -0.030996 0.021872 -0.037344 -0.047034 0.033914 -0.039493 -0.014956 -0.001078 66.0 133.567324
432 0.009016 -0.044642 0.055229 -0.005671 0.057597 0.044719 -0.002903 0.023239 0.055684 0.106617 173.0 203.116372
433 -0.027310 -0.044642 -0.060097 -0.029771 0.046589 0.019980 0.122273 -0.039493 -0.051401 -0.009362 72.0 96.241665
434 0.016281 -0.044642 0.001339 0.008101 0.005311 0.010899 0.030232 -0.039493 -0.045421 0.032059 49.0 156.009136
435 -0.012780 -0.044642 -0.023451 -0.040099 -0.016704 0.004636 -0.017629 -0.002592 -0.038459 -0.038357 64.0 130.551168
436 -0.056370 -0.044642 -0.074108 -0.050428 -0.024960 -0.047034 0.092820 -0.076395 -0.061177 -0.046641 48.0 80.375038
437 0.041708 0.050680 0.019662 0.059744 -0.005697 -0.002566 -0.028674 -0.002592 0.031193 0.007207 178.0 175.248745
438 -0.005515 0.050680 -0.015906 -0.067642 0.049341 0.079165 -0.028674 0.034309 -0.018118 0.044485 104.0 138.075758
439 0.041708 0.050680 -0.015906 0.017282 -0.037344 -0.013840 -0.024993 -0.011080 -0.046879 0.015491 132.0 143.597318
440 -0.045472 -0.044642 0.039062 0.001215 0.016318 0.015283 -0.028674 0.026560 0.044528 -0.025930 220.0 182.358329
441 -0.045472 -0.044642 -0.073030 -0.081414 0.083740 0.027809 0.173816 -0.039493 -0.004220 0.003064 57.0 82.608379

In [59]:
columns = ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']


lasso = linear_model.LassoCV(cv=5)
lasso.fit(dia_train[columns], dia_train['target'])
dia_test = dia_test.assign(predict=lambda x: lasso.predict(x[columns]))


print('Coefficients: \n', lasso.coef_)
print("Mean squared error: %.2f"
      % mean_squared_error(dia_test['target'], lasso.predict(dia_test[columns])))
print('R2 (variance) score: %.2f' % r2_score(dia_test['target'], dia_test['predict']))


Coefficients: 
 [   0.         -227.7909445   515.87541365  322.65588548 -410.39307487
  172.22093534  -69.15134823  134.40868857  594.35952058   75.48272645]
Mean squared error: 1990.67
R2 (variance) score: 0.59

In [53]:
print('Coefficients: \n', lasso.coef_)
print("Mean squared error: %.2f"
      % mean_squared_error(dia_test['target'], lasso.predict(dia_test[['age', 'sex', 'bmi']])))
print('R2 (variance) score: %.2f' % r2_score(dia_test['target'], dia_test['predict']))


Coefficients: 
 [ 1.16925238e+02 -4.07249379e-01  8.89889954e+02]
Mean squared error: 2616.15
R2 (variance) score: 0.46

Let us see what happens during the cross-validation process. For each data split, the algorithm computes an MSE curve as a function of the alpha parameter, as shown below.


In [54]:
plt.plot(-np.log10(lasso.alphas_), lasso.mse_path_, linestyle='--');
plt.plot(-np.log10(lasso.alphas_), lasso.mse_path_.mean(axis=1), 'k', linewidth=3);

plt.xlabel('$-\\log_{10}(\\alpha)$');
plt.ylabel('Mean Squared Error (MSE)');



In [63]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
import pandas as pd
# %matplotlib inline


diabetes = datasets.load_diabetes()
dataframe = pd.DataFrame(diabetes.data, columns=diabetes.feature_names).assign(target=diabetes.target)

dane_treningowe = dataframe.iloc[:-20, :]
dane_testowe = dataframe.iloc[-20:, :]

columns = ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

model = linear_model.LassoCV(cv=5)
model.fit(dane_treningowe[columns], dane_treningowe['target'])
dane_testowe = dane_testowe.assign(predict=lambda x: model.predict(x[columns]))


print('Coefficients: \n', model.coef_)
print("Mean squared error: %.2f" % mean_squared_error(dane_testowe['target'], model.predict(dane_testowe[columns])))
print('R2 (variance) score: %.2f' % r2_score(dane_testowe['target'], dane_testowe['predict']))

Exercise

  1. Try other columns with LassoCV.
  2. Try different model parameters.
  3. Try using RidgeCV.
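For point 3, RidgeCV works analogously to LassoCV, except that it selects alpha from an explicitly supplied grid. A minimal sketch (the alpha grid here is arbitrary):

```python
import pandas as pd
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

diabetes = datasets.load_diabetes()
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names) \
    .assign(target=diabetes.target)
train, test = df.iloc[:-20], df.iloc[-20:]
columns = diabetes.feature_names

# RidgeCV picks the best alpha from the given grid by cross-validation
ridge = linear_model.RidgeCV(alphas=[0.001, 0.01, 0.1, 1.0], cv=5)
ridge.fit(train[columns], train['target'])
pred = ridge.predict(test[columns])

print('chosen alpha:', ridge.alpha_)
print('Mean squared error: %.2f' % mean_squared_error(test['target'], pred))
print('R2 (variance) score: %.2f' % r2_score(test['target'], pred))
```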