In [11]:

    
import pandas as pd
import tensorflow as tf 
import numpy as np 
import matplotlib.pyplot as plt
from scipy import stats



In [3]:

    
from sklearn.preprocessing import normalize, minmax_scale



In [4]:

    
df = pd.read_csv('datasets/dataset2.csv')



In [5]:

    
df['average_montly_hours'][:10]









    Out[5]:





0    157
1    262
2    272
3    223
4    159
5    153
6    247
7    259
8    224
9    142
Name: average_montly_hours, dtype: int64



In [6]:

    
hours = df['average_montly_hours'].values

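Normalization using scikit-learn's minmax_scale:

Min-max scaling (also called min-max normalization) linearly rescales a feature so that its smallest value maps to 0 and its largest value maps to 1. Each value x becomes (x - min) / (max - min), where min and max are taken over the whole column.

Put simply: subtract the column's minimum from every value, then divide by the column's range (max - min). This is different from L1 normalization, which divides each value by the sum of the absolute values in the array; scikit-learn exposes that as normalize with norm='l1'.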
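As a quick sanity check of the formula, here is a minimal sketch that applies min-max scaling by hand and compares it with minmax_scale. It uses only the ten values printed in Out[5], so the minimum and maximum are those of this small sample, and the numbers will not match the full-column result. The names sample, manual, and scaled are illustrative.

```python
import numpy as np
from sklearn.preprocessing import minmax_scale

# First ten values of average_montly_hours, copied from Out[5].
sample = np.array([157, 262, 272, 223, 159, 153, 247, 259, 224, 142], dtype=float)

# Min-max scaling by hand: shift by the minimum, divide by the range.
manual = (sample - sample.min()) / (sample.max() - sample.min())

# The same operation through scikit-learn (default feature_range is (0, 1)).
scaled = minmax_scale(sample)

# Both approaches should agree up to floating-point error.
print(np.allclose(manual, scaled))

# The smallest sample value maps to 0 and the largest to 1.
print(manual.min(), manual.max())
```

Within this sample, 142 maps to 0 and 272 maps to 1, and every other value lands proportionally between them.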



In [22]:

    
# Scale the hours to [0, 1]; minmax_scale accepts a 1-D array directly.
# The final reshape keeps the result as a single column, shape (n_samples, 1).
result = minmax_scale(hours.astype(float)).reshape(-1, 1)



In [26]:

    
result









    Out[26]:





array([[ 0.28504673],
       [ 0.77570093],
       [ 0.82242991],
       ..., 
       [ 0.21962617],
       [ 0.85981308],
       [ 0.28971963]])



In [25]:

    
stats.describe(result)









    Out[25]:





DescribeResult(nobs=14999, minmax=(array([ 0.]), array([ 1.])), mean=array([ 0.49088942]), variance=array([ 0.05446574]), skewness=array([ 0.0528367]), kurtosis=array([-1.13500325]))
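
The summary above can be related back to the raw data, because min-max scaling is a linear map. The following is a rough sketch, again assuming only the ten values shown in Out[5] rather than the full column: the mean is shifted and divided by the range, the variance is divided by the squared range, and skewness and kurtosis are unchanged. The names sample, lo, hi, raw, and out are illustrative.

```python
import numpy as np
from scipy import stats

# First ten values of average_montly_hours, copied from Out[5].
sample = np.array([157, 262, 272, 223, 159, 153, 247, 259, 224, 142], dtype=float)

lo, hi = sample.min(), sample.max()
scaled = (sample - lo) / (hi - lo)

raw = stats.describe(sample)   # statistics of the original values
out = stats.describe(scaled)   # statistics after min-max scaling

# The mean is shifted by the minimum and divided by the range.
print(np.isclose(out.mean, (raw.mean - lo) / (hi - lo)))

# The variance is divided by the squared range.
print(np.isclose(out.variance, raw.variance / (hi - lo) ** 2))

# Shape statistics are unaffected by a positive linear rescaling.
print(np.isclose(out.skewness, raw.skewness))
print(np.isclose(out.kurtosis, raw.kurtosis))
```

This is why the skewness and kurtosis reported for result describe the shape of the original hours distribution as well, while the mean and variance are only meaningful on the 0 to 1 scale.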