Data pre-processing in Python, using the Pima Indians diabetes dataset from the National Institute of Diabetes and Digestive and Kidney Diseases
Citation: Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
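Feature Information:
preg: number of times pregnant
plas: plasma glucose concentration at 2 hours in an oral glucose tolerance test
pres: diastolic blood pressure (mm Hg)
skin: triceps skin fold thickness (mm)
test: 2-hour serum insulin (mu U/ml)
mass: body mass index (weight in kg / (height in m)^2)
pedi: diabetes pedigree function
age: age (years)
class: class variable (0 or 1)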
In [1]:
import pandas as pd
from pandas import read_csv
pd.set_option('display.precision', 3) # show 3 decimal places in displayed output
filename = 'C:/Users/craigrshenton/Desktop/Dropbox/python/python_pro/machine_learning_mastery_with_python/machine_learning_mastery_with_python_code/chapter_07/pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
df = read_csv(filename, names=names)
df.head()
Out[1]:
In [2]:
feature_cols = df.columns[0:8]
X = df[feature_cols] # first 8 cols are features
y = df['class'] # last col is target data
In [3]:
from sklearn.preprocessing import MinMaxScaler
# rescale each feature column to the range [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
rescaledX = scaler.fit_transform(X)
df_scaled = pd.DataFrame(data=rescaledX, columns=feature_cols)
df_scaled.head()
Out[3]:
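The rescaling in the cell above can be sketched by hand to show what happens to each column. This is a minimal illustration on hypothetical toy data (the array `toy` is not part of the Pima dataset), assuming the default behaviour of MinMaxScaler with feature_range=(0, 1): subtract the column minimum and divide by the column range.

```python
import numpy as np

# Hypothetical toy data: 3 rows, 2 feature columns (not from the Pima dataset).
toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [4.0, 40.0]])

# Column-wise min-max rescaling to [0, 1]:
# x_scaled = (x - min) / (max - min), computed separately per column.
col_min = toy.min(axis=0)
col_max = toy.max(axis=0)
toy_minmax = (toy - col_min) / (col_max - col_min)

print(toy_minmax)
```

After rescaling, the smallest value in each column maps to 0 and the largest to 1, so features measured on very different scales become directly comparable.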
In [4]:
from sklearn.preprocessing import StandardScaler
# standardize each feature column to zero mean and unit variance
scaler = StandardScaler().fit(X)
standardX = scaler.transform(X)
df_standard = pd.DataFrame(data=standardX, columns=feature_cols)
df_standard.head()
Out[4]:
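Standardization as used in the cell above can also be written out directly. The sketch below uses hypothetical toy data (the array `toy` is illustrative only) and assumes the population standard deviation (ddof=0), which is what StandardScaler uses by default.

```python
import numpy as np

# Hypothetical toy data: 3 rows, 2 feature columns (not from the Pima dataset).
toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0]])

# Column-wise standardization: z = (x - mean) / std.
# np.std defaults to ddof=0 (population standard deviation).
col_mean = toy.mean(axis=0)
col_std = toy.std(axis=0)
toy_standard = (toy - col_mean) / col_std

print(toy_standard)
print(toy_standard.mean(axis=0), toy_standard.std(axis=0))
```

Each column of the result has mean 0 and standard deviation 1, up to floating-point error.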
In [5]:
from sklearn.preprocessing import Normalizer
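# rescale each row (sample) to unit L2 norm; unlike the scalers above, this works per row, not per column
scaler = Normalizer().fit(X)
normalizedX = scaler.transform(X)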
df_norm = pd.DataFrame(data=normalizedX, columns=feature_cols)
df_norm.head()
Out[5]:
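Unlike the two scalers used earlier, Normalizer works on rows rather than columns. The sketch below shows the idea on hypothetical toy data (the array `toy` is illustrative only), assuming the default L2 norm: each row is divided by its own Euclidean length.

```python
import numpy as np

# Hypothetical toy data: 3 rows (samples), 2 feature columns.
toy = np.array([[3.0, 4.0],
                [6.0, 8.0],
                [1.0, 0.0]])

# Row-wise L2 normalization: divide each row by its Euclidean length.
# keepdims=True keeps the norms as a column so broadcasting divides row by row.
row_norms = np.linalg.norm(toy, axis=1, keepdims=True)
toy_norm = toy / row_norms

print(toy_norm)
```

The first two rows point in the same direction, so they map to the same unit vector. Normalization keeps only the relative proportions of the features within each sample and discards its overall magnitude.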