In [1]:
### Example taken from: https://machinelearningmastery.com/handle-missing-data-python/
The Pima Indians Diabetes Dataset involves predicting the onset of diabetes within 5 years in Pima Indians given medical details.
It is a binary (2-class) classification problem. The number of observations for each class is not balanced. There are 768 observations with 8 input variables and 1 output variable. The variable names are as follows:
0: Number of times pregnant
1: Plasma glucose concentration
2: Diastolic blood pressure
3: Triceps skinfold thickness
4: 2-Hour serum insulin
5: Body mass index
6: Diabetes pedigree function
7: Age
8: Class variable (0 or 1)
In [2]:
import pandas as pd
dataset = pd.read_csv('pima-indians-diabetes.data.txt', header=None)
dataset.head()
Out[2]:
In [3]:
print(dataset.describe())
This summary is useful: we can see that several columns have a minimum value of zero (0). For some columns, a value of zero does not make sense and indicates an invalid or missing value.
Specifically, the following columns have an invalid zero minimum value:
1: Plasma glucose concentration
2: Diastolic blood pressure
3: Triceps skinfold thickness
4: 2-Hour serum insulin
5: Body mass index
We can get a count of the number of missing values in each of these columns. We can do this by marking all of the values in the subset of the DataFrame we are interested in that are zero as True, and then counting the number of True values in each column.
In [4]:
print((dataset[[1, 2, 3, 4, 5]] == 0).sum())
In Python, specifically with Pandas, NumPy, and scikit-learn, we mark missing values as NaN.
NaN values are ignored by operations such as sum and count.
We can mark values as NaN easily with the Pandas DataFrame by using the replace() function on the subset of columns we are interested in.
After we have marked the missing values, we can use the isnull() function to mark all of the NaN values in the dataset as True and get a count of the missing values for each column.
In [6]:
import numpy as np
# mark zero values as missing or NaN
dataset[[1, 2, 3, 4, 5]] = dataset[[1, 2, 3, 4, 5]].replace(0, np.nan)
# count the number of NaN values in each column
print(dataset.isnull().sum())
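To see this in action, we can compare len(), count(), and sum() on one of the marked columns, for example the serum insulin column (index 4); a minimal check:
In [ ]:
# NaN-aware behaviour: count() and sum() skip NaN entries
print(len(dataset[4]))     # total number of rows, NaN included
print(dataset[4].count())  # number of non-NaN entries only
print(dataset[4].sum())    # sum computed over the non-NaN entries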
Having missing values in a dataset can cause errors with some machine learning algorithms.
In this section, we will try to evaluate the Linear Discriminant Analysis (LDA) algorithm on the dataset with missing values.
This is an algorithm that does not work when there are missing values in the dataset.
The example below marks the missing values in the dataset, as we did in the previous section, then attempts to evaluate LDA using 3-fold cross-validation and print the mean accuracy.
In [8]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
# split dataset into inputs and outputs
values = dataset.values  # convert the DataFrame to a NumPy matrix
X = values[:, :8]
y = values[:, 8]
# evaluate an LDA model on the dataset using k-fold cross-validation
model = LinearDiscriminantAnalysis()
kfold = KFold(n_splits=3, shuffle=True, random_state=7)
result = cross_val_score(model, X, y, cv=kfold, scoring='accuracy')
print('result.mean():', result.mean())
This fails with an error, as we expect: we are prevented from evaluating an LDA algorithm (and many other algorithms) on a dataset with missing values.
Now, we can look at methods to handle the missing values.
The simplest strategy for handling missing data is to remove records that contain a missing value.
We can do this by creating a new Pandas DataFrame with the rows containing missing values removed.
Pandas provides the dropna() function that can be used to drop either columns or rows with missing data. We can use dropna() to remove all rows with missing data, as follows:
In [10]:
# drop rows with missing values
dataset.dropna(inplace=True)
# summarize the number of rows and columns in the dataset
print(dataset.shape)
We now have a dataset that we could use to evaluate an algorithm sensitive to missing values like LDA.
In [11]:
values = dataset.values  # convert the DataFrame to a NumPy matrix
X = values[:, :8]
y = values[:, 8]
# evaluate an LDA model on the dataset using k-fold cross-validation
model = LinearDiscriminantAnalysis()
kfold = KFold(n_splits=3, shuffle=True, random_state=7)
result = cross_val_score(model, X, y, cv=kfold, scoring='accuracy')
print('result.mean():', result.mean())
Removing rows with missing values can be too limiting for some predictive modeling problems; an alternative is to impute the missing values.
In [13]:
dataset = pd.read_csv('pima-indians-diabetes.data.txt', header=None)
# mark zero values as missing or NaN
dataset[[1, 2, 3, 4, 5]] = dataset[[1, 2, 3, 4, 5]].replace(0, np.nan)
# fill missing values with mean column values
dataset.fillna(dataset.mean(), inplace=True)
# count the number of NaN values in each column
print(dataset.isnull().sum())
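With the missing values imputed by the column means, we can evaluate LDA again, mirroring the earlier cross-validation cells; a minimal sketch (the mean accuracy will generally differ from the row-dropping result):
In [ ]:
# evaluate LDA on the mean-imputed dataset (imports as in the earlier cells)
values = dataset.values
X = values[:, :8]
y = values[:, 8]
model = LinearDiscriminantAnalysis()
kfold = KFold(n_splits=3, shuffle=True, random_state=7)
result = cross_val_score(model, X, y, cv=kfold, scoring='accuracy')
print('result.mean():', result.mean())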
The scikit-learn library provides the Imputer() pre-processing class that can be used to replace missing values.
It is a flexible class that allows you to specify the value to replace (it can be something other than NaN) and the strategy used to replace it (such as the mean, median, or most frequent value). The Imputer class operates directly on a NumPy array rather than the DataFrame.
The example below uses the Imputer class to replace missing values with the mean of each column, then prints the number of NaN values in the transformed matrix.
In [18]:
from sklearn.preprocessing import Imputer
#values = dataset.values ## generated matrix out of the dataframe
#X = values[:,:8]
#y = values[:,8]
imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
transformed_values = imp.fit_transform(dataset)  # returns an imputed NumPy array
# count the number of NaN values in the transformed matrix
print(np.isnan(transformed_values).sum())
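Note that the Imputer class was deprecated and then removed in scikit-learn 0.22; on newer versions the equivalent is SimpleImputer. A minimal sketch, assuming the zeros have already been marked as NaN as above:
In [ ]:
from sklearn.impute import SimpleImputer
# SimpleImputer replaces Imputer in scikit-learn 0.22+
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
transformed_values = imp.fit_transform(dataset)
# count the number of NaN values in the transformed matrix
print(np.isnan(transformed_values).sum())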
In [27]:
import pandas as pd
df = pd.read_csv('pima-indians-diabetes.data.txt', header=None)
y = df[8]
X = df.drop(8, axis=1)
# print((X[[1, 2, 3, 4, 5]] == 0).sum())  # count zeros only in the marked columns
print((X == 0).sum())  # count the zeros in every column
imp = Imputer(missing_values=0, strategy='mean', axis=0)
imp.fit(X)  # fit() only computes the column means; X itself is not changed
print((X == 0).sum())  # zero counts are unchanged: transform() was never called
X.head()
Out[27]:
In [34]:
import pandas as pd
df = pd.read_csv('pima-indians-diabetes.data.txt', header=None)
y = df[8]
X = df.drop(8, axis=1)
print((X == 0).sum())  # count the zeros in each column
imp = Imputer(missing_values=0, strategy='mean', axis=0)
X_imp = imp.fit_transform(X)  # fit_transform() returns an imputed NumPy array
print((X_imp == 0).sum())
print(X_imp[:5])  # X_imp is a NumPy array, so it has no head() method
In [ ]:
'''
https://stackoverflow.com/questions/33660836/impute-entire-dataframe-all-columns-using-scikit-learn-sklearn-without-itera
If you want the mean or median you could do something like:
fill_NaN = Imputer(missing_values=np.nan, strategy='mean', axis=1)
imputed_DF = pd.DataFrame(fill_NaN.fit_transform(DF))
imputed_DF.columns = DF.columns
imputed_DF.index = DF.index
If you want to fill them with 0s or something you could always just do:
DF[DF.isnull()] = 0
'''
In [40]:
import pandas as pd
df = pd.read_csv('pima-indians-diabetes.data.txt', header=None)
#values = df.values
#X = values[:,0:8]
#y = values[:,8]
y = df[8]
X = df.drop(8, axis=1)
print(X)
In [42]:
imp = Imputer(missing_values=0, strategy='mean', axis=0)
X_imp = pd.DataFrame(imp.fit_transform(X))  # wrap the imputed array back into a DataFrame
X_imp.columns = X.columns
X_imp.index = X.index
print((X_imp == 0).sum())
X_imp
Out[42]:
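Since X_imp is now a clean DataFrame, it can be fed straight into an algorithm that is sensitive to missing values, such as LDA; a minimal sketch reusing the earlier cross-validation setup:
In [ ]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import KFold, cross_val_score
model = LinearDiscriminantAnalysis()
kfold = KFold(n_splits=3, shuffle=True, random_state=7)
result = cross_val_score(model, X_imp, y, cv=kfold, scoring='accuracy')
print('result.mean():', result.mean())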
In [ ]: