In [1]:
%load_ext watermark
%watermark -a 'Sebastian Raschka' -d -v


Sebastian Raschka 07/02/2015 

CPython 3.4.3
IPython 3.1.0

In [2]:
import sys
sys.path = ['/Users/sebastian/github/mlxtend/'] + sys.path

In [7]:
import pandas as pd
from mlxtend.preprocessing import minmax_scaling
from mlxtend.preprocessing import standardizing

Feature Scaling

Feature scaling is a crucial step in our preprocessing pipeline that can easily be forgotten. Decision trees and random forests are among the very few machine learning algorithms where we don't need to worry about feature scaling. However, the majority of machine learning and optimization algorithms behave much better if features are on the same scale.

The importance of feature scaling can be illustrated by a simple example. Let's assume that we have two features, where one feature is measured on a scale from 1 to 10 and the second feature is measured on a scale from 1 to 100,000. When we think of the squared error function in Adaline from Chapter 2, it is intuitive to say that the algorithm will mostly be busy optimizing the weights according to the larger errors in the second feature. Another example is the k-nearest neighbors (KNN) algorithm with a Euclidean distance measure, where the computed distances between samples will be dominated by the second feature axis.
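We can make the KNN example concrete with a quick sketch (the two sample points below are hypothetical, chosen only to match the two scales mentioned above) that measures how much each feature contributes to the squared Euclidean distance:

```python
import numpy as np

# Two hypothetical samples: feature 1 on a 1-10 scale,
# feature 2 on a 1-100,000 scale
a = np.array([2.0, 20000.0])
b = np.array([9.0, 70000.0])

# Squared per-feature contributions to the Euclidean distance
contrib = (a - b) ** 2
print(contrib / contrib.sum())
```

The second feature accounts for essentially 100% of the distance; the first feature's contribution is on the order of 10^-8, so without scaling, KNN would effectively ignore it.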

Now, there are two common approaches to bring different features onto the same scale, normalization and standardization. Those terms are often used quite loosely in different fields and the meaning has to be derived from the context, but typically, normalization refers to the rescaling of the features to a range of [0, 1], which is a special case of min-max scaling. To normalize our data, we can simply apply the min-max scaling to each feature column, where the new value $x_{norm}^{(i)}$ of a sample $x^{(i)}$ can be calculated as

$x_{norm}^{(i)} = \frac{x^{(i)} - \mathbf{x}_{min}}{\mathbf{x}_{max} - \mathbf{x}_{min}}$

where $x^{(i)}$ is a particular sample, $\mathbf{x}_{min}$ is the smallest value in the feature column, and $\mathbf{x}_{max}$ is the largest value.
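The formula can be verified by hand with plain pandas arithmetic, without any helper function (using the same values as the `s1` column defined in the Example Data section below):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5, 6])

# min-max scaling by hand: (x - min) / (max - min)
s1_norm = (s1 - s1.min()) / (s1.max() - s1.min())
print(s1_norm.tolist())  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```

This matches the `s1` column that `minmax_scaling` produces below.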

Although normalization via min-max scaling is a commonly used technique, which is useful when we need values in a bounded interval, standardization can be more practical for many machine learning algorithms. The reason is that many linear models, such as logistic regression and SVMs, initialize the weights to 0 or small random values close to 0. Using standardization, we center the feature columns at mean 0 with standard deviation 1 so that the feature columns have the parameters of a standard normal distribution, which makes it easier to learn the weights. The procedure of standardization can be expressed by the equation

$x_{std}^{(i)} = \frac{x^{(i)} - \mu_x}{\sigma_x}$

where $\mu_x$ is the sample mean of a particular feature column and $\sigma_x$ is the corresponding standard deviation.
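As with min-max scaling, we can reproduce this by hand in pandas (again using the `s1` values from the Example Data section below). Note the `ddof=0` argument: this computes the population standard deviation, whereas pandas' `Series.std` defaults to the sample standard deviation (`ddof=1`):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5, 6])

# standardization by hand: (x - mean) / std
# ddof=0 -> population standard deviation
s1_std = (s1 - s1.mean()) / s1.std(ddof=0)
print(s1_std.round(5).tolist())
# [-1.46385, -0.87831, -0.29277, 0.29277, 0.87831, 1.46385]
```

These values agree with the `standardizing` output shown in the Standardizing section below.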

Example Data


In [8]:
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=range(6))
s2 = pd.Series([10, 9, 8, 7, 6, 5], index=range(6))
df = pd.DataFrame(s1, columns=['s1'])
df['s2'] = s2
df


Out[8]:
s1 s2
0 1 10
1 2 9
2 3 8
3 4 7
4 5 6
5 6 5

Min-Max Scaling


In [9]:
minmax_scaling(df, columns=['s1', 's2'])


Out[9]:
s1 s2
0 0.0 1.0
1 0.2 0.8
2 0.4 0.6
3 0.6 0.4
4 0.8 0.2
5 1.0 0.0

In [10]:
minmax_scaling(df, columns=['s1', 's2'], min_val=50, max_val=100)


Out[10]:
s1 s2
0 50 100
1 60 90
2 70 80
3 80 70
4 90 60
5 100 50

Standardizing


In [11]:
standardizing(df, columns=['s1', 's2'])


Out[11]:
s1 s2
0 -1.46385 1.46385
1 -0.87831 0.87831
2 -0.29277 0.29277
3 0.29277 -0.29277
4 0.87831 -0.87831
5 1.46385 -1.46385

In [ ]: