In [11]:
import pandas as pd
import tensorflow as tf 
import numpy as np 
import matplotlib.pyplot as plt
from scipy import stats

In [3]:
from sklearn.preprocessing import normalize, minmax_scale

In [4]:
df = pd.read_csv('datasets/dataset2.csv')

In [5]:
df['average_montly_hours'][:10]


Out[5]:
0    157
1    262
2    272
3    223
4    159
5    153
6    247
7    259
8    224
9    142
Name: average_montly_hours, dtype: int64

In [6]:
hours = df['average_montly_hours'].values

Normalization using scikit-learn's minmax_scale:

Min-max scaling rescales a feature so that it lies in a fixed range, by default [0, 1]. Each value is shifted by the minimum and divided by the range: x_scaled = (x - min(x)) / (max(x) - min(x)).

To understand it easily: subtract the smallest value in the array from every element, then divide each result by the difference between the largest and smallest values.
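
As a quick sanity check, here is a minimal sketch (assuming the hours array defined above and the numpy / sklearn imports at the top of the notebook) that applies the min-max formula by hand and compares it against minmax_scale:

# Hand-rolled min-max scaling: (x - min) / (max - min)
manual = (hours.astype(float) - hours.min()) / (hours.max() - hours.min())
# scikit-learn equivalent: a 1-D array is scaled as a single feature
scaled = minmax_scale(hours.astype(float))
print(np.allclose(manual, scaled))  # expected: True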


In [22]:
# Scale the 1-D hours array to [0, 1] and keep it as a column vector
result = minmax_scale(hours.astype(float)).reshape(-1, 1)

In [26]:
result


Out[26]:
array([[ 0.28504673],
       [ 0.77570093],
       [ 0.82242991],
       ..., 
       [ 0.21962617],
       [ 0.85981308],
       [ 0.28971963]])

In [25]:
stats.describe(result)


Out[25]:
DescribeResult(nobs=14999, minmax=(array([ 0.]), array([ 1.])), mean=array([ 0.49088942]), variance=array([ 0.05446574]), skewness=array([ 0.0528367]), kurtosis=array([-1.13500325]))
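
Because min-max scaling is a linear transformation, it only shifts and rescales the values; the shape of the distribution, and therefore the skewness and kurtosis reported above, is the same as for the raw column. A small sketch to visualise this, assuming the hours and result arrays from above and the matplotlib import at the top:

# Raw vs. min-max scaled hours: identical histogram shape, different x-axis range
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(hours, bins=50)
axes[0].set_title('average_montly_hours (raw)')
axes[1].hist(result.ravel(), bins=50)
axes[1].set_title('average_montly_hours (min-max scaled)')
plt.show()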