Feature Transformations In The Iris Dataset

This notebook illustrates transforming features in the Iris dataset. It is a version of the Scikit-Learn example "Concatenating multiple feature extraction methods".

The main point it shows is how Ibex creates pandas.DataFrame structures with appropriate column headings.

Loading The Data

First we load the dataset into a pandas.DataFrame.


In [1]:
import pandas as pd                                                          
import numpy as np                                                           
from ibex.sklearn import datasets                                            
from ibex.sklearn.decomposition import PCA as PdPCA                          
from ibex.sklearn.feature_selection import SelectKBest as PdSelectKBest



In [2]:
iris = datasets.load_iris()                                                  
features = iris['feature_names']                                             
iris = pd.DataFrame(                                                         
    np.c_[iris['data'], iris['target']],                                     
    columns=features+['class'])
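
As a quick sanity check on the loading step, the following minimal sketch builds the same frame using plain scikit-learn (sklearn.datasets.load_iris rather than ibex.sklearn.datasets, on the assumption that both return the same data) and inspects the result. The name raw is introduced here only so that iris is not rebound halfway through the cell.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: plain scikit-learn loader, used here in place of ibex.sklearn.datasets.
raw = load_iris()
features = list(raw["feature_names"])

# np.c_ stacks the 4 feature columns and the target into one float array,
# so the integer class labels end up stored as floats.
iris = pd.DataFrame(
    np.c_[raw["data"], raw["target"]],
    columns=features + ["class"])

print(iris.shape)
print(iris["class"].dtype)
```

Because np.c_ produces a single floating-point array, the class column holds 0.0, 1.0, and 2.0 rather than integers. That is harmless for the steps that follow, but worth knowing if the labels are later compared against integer values.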

Transforming Features

Now that the data is in a DataFrame, we can transform the features. Notice how, in the resulting DataFrame, the column headers indicate the meaning of the (numerical) features.


In [3]:
trn = PdPCA(n_components=2) + PdSelectKBest(k=1)                             
trn.fit(iris[features], iris['class']).transform(iris[features]).head()


Out[3]:
        pca                 selectkbest
     comp_0    comp_1 petal length (cm)
0 -2.684207  0.326607               1.4
1 -2.715391 -0.169557               1.4
2 -2.889820 -0.137346               1.3
3 -2.746437 -0.311124               1.5
4 -2.728593  0.333925               1.4
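
The two header rows above suggest a two-level column index, with the transformer name on top and the feature name below. The following sketch assumes that this is a regular pandas.MultiIndex and rebuilds the five displayed rows by hand (it does not call Ibex) to show how such labels can be used to pick out one transformer's output.

```python
import pandas as pd

# Hand-built copy of the five rows shown above; the two-level column
# layout is an assumption read off the printed header.
columns = pd.MultiIndex.from_tuples([
    ("pca", "comp_0"),
    ("pca", "comp_1"),
    ("selectkbest", "petal length (cm)"),
])
rows = [
    [-2.684207, 0.326607, 1.4],
    [-2.715391, -0.169557, 1.4],
    [-2.889820, -0.137346, 1.3],
    [-2.746437, -0.311124, 1.5],
    [-2.728593, 0.333925, 1.4],
]
out = pd.DataFrame(rows, columns=columns)

# Selecting by the top level returns only that transformer's columns.
pca_part = out["pca"]
print(list(pca_part.columns))

# A full (transformer, feature) tuple selects a single column.
petal = out[("selectkbest", "petal length (cm)")]
print(petal.tolist())
```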

Comparison To The Original Example

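For comparison, here is the example in its original Scikit-Learn form, which combines the two transformers with FeatureUnion and operates on plain NumPy arrays.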


In [4]:
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

iris = load_iris()
X, y = iris.data, iris.target
pca = PCA(n_components=2)
selection = SelectKBest(k=1)
combined_features = FeatureUnion([("pca", pca), ("univ_select", selection)])
combined_features.fit(X, y).transform(X)


Out[4]:
array([[-2.68420713,  0.32660731,  1.4       ],
       [-2.71539062, -0.16955685,  1.4       ],
       [-2.88981954, -0.13734561,  1.3       ],
       [-2.7464372 , -0.31112432,  1.5       ],
       [-2.72859298,  0.33392456,  1.4       ],
       [-2.27989736,  0.74778271,  1.7       ],
       [-2.82089068, -0.08210451,  1.4       ],
       [-2.62648199,  0.17040535,  1.5       ],
       [-2.88795857, -0.57079803,  1.4       ],
       [-2.67384469, -0.1066917 ,  1.5       ],
       [-2.50652679,  0.65193501,  1.5       ],
       [-2.61314272,  0.02152063,  1.6       ],
       [-2.78743398, -0.22774019,  1.4       ],
       [-3.22520045, -0.50327991,  1.1       ],
       [-2.64354322,  1.1861949 ,  1.2       ],
       [-2.38386932,  1.34475434,  1.5       ],
       [-2.6225262 ,  0.81808967,  1.3       ],
       [-2.64832273,  0.31913667,  1.4       ],
       [-2.19907796,  0.87924409,  1.7       ],
       [-2.58734619,  0.52047364,  1.5       ],
       [-2.3105317 ,  0.39786782,  1.7       ],
       [-2.54323491,  0.44003175,  1.5       ],
       [-3.21585769,  0.14161557,  1.        ],
       [-2.30312854,  0.10552268,  1.7       ],
       [-2.35617109, -0.03120959,  1.9       ],
       [-2.50791723, -0.13905634,  1.6       ],
       [-2.469056  ,  0.13788731,  1.6       ],
       [-2.56239095,  0.37468456,  1.5       ],
       [-2.63982127,  0.31929007,  1.4       ],
       [-2.63284791, -0.19007583,  1.6       ],
       [-2.58846205, -0.19739308,  1.6       ],
       [-2.41007734,  0.41808001,  1.5       ],
       [-2.64763667,  0.81998263,  1.5       ],
       [-2.59715948,  1.10002193,  1.4       ],
       [-2.67384469, -0.1066917 ,  1.5       ],
       [-2.86699985,  0.0771931 ,  1.2       ],
       [-2.62522846,  0.60680001,  1.3       ],
       [-2.67384469, -0.1066917 ,  1.5       ],
       [-2.98184266, -0.48025005,  1.3       ],
       [-2.59032303,  0.23605934,  1.5       ],
       [-2.77013891,  0.27105942,  1.3       ],
       [-2.85221108, -0.93286537,  1.3       ],
       [-2.99829644, -0.33430757,  1.3       ],
       [-2.4055141 ,  0.19591726,  1.6       ],
       [-2.20883295,  0.44269603,  1.9       ],
       [-2.71566519, -0.24268148,  1.4       ],
       [-2.53757337,  0.51036755,  1.6       ],
       [-2.8403213 , -0.22057634,  1.4       ],
       [-2.54268576,  0.58628103,  1.5       ],
       [-2.70391231,  0.11501085,  1.4       ],
       [ 1.28479459,  0.68543919,  4.7       ],
       [ 0.93241075,  0.31919809,  4.5       ],
       [ 1.46406132,  0.50418983,  4.9       ],
       [ 0.18096721, -0.82560394,  4.        ],
       [ 1.08713449,  0.07539039,  4.6       ],
       [ 0.64043675, -0.41732348,  4.5       ],
       [ 1.09522371,  0.28389121,  4.7       ],
       [-0.75146714, -1.00110751,  3.3       ],
       [ 1.04329778,  0.22895691,  4.6       ],
       [-0.01019007, -0.72057487,  3.9       ],
       [-0.5110862 , -1.26249195,  3.5       ],
       [ 0.51109806, -0.10228411,  4.2       ],
       [ 0.26233576, -0.5478933 ,  4.        ],
       [ 0.98404455, -0.12436042,  4.7       ],
       [-0.174864  , -0.25181557,  3.6       ],
       [ 0.92757294,  0.46823621,  4.4       ],
       [ 0.65959279, -0.35197629,  4.5       ],
       [ 0.23454059, -0.33192183,  4.1       ],
       [ 0.94236171, -0.54182226,  4.5       ],
       [ 0.0432464 , -0.58148945,  3.9       ],
       [ 1.11624072, -0.08421401,  4.8       ],
       [ 0.35678657, -0.06682383,  4.        ],
       [ 1.29646885, -0.32756152,  4.9       ],
       [ 0.92050265, -0.18239036,  4.7       ],
       [ 0.71400821,  0.15037915,  4.3       ],
       [ 0.89964086,  0.32961098,  4.4       ],
       [ 1.33104142,  0.24466952,  4.8       ],
       [ 1.55739627,  0.26739258,  5.        ],
       [ 0.81245555, -0.16233157,  4.5       ],
       [-0.30733476, -0.36508661,  3.5       ],
       [-0.07034289, -0.70253793,  3.8       ],
       [-0.19188449, -0.67749054,  3.7       ],
       [ 0.13499495, -0.31170964,  3.9       ],
       [ 1.37873698, -0.42120514,  5.1       ],
       [ 0.58727485, -0.48328427,  4.5       ],
       [ 0.8072055 ,  0.19505396,  4.5       ],
       [ 1.22042897,  0.40803534,  4.7       ],
       [ 0.81286779, -0.370679  ,  4.4       ],
       [ 0.24519516, -0.26672804,  4.1       ],
       [ 0.16451343, -0.67966147,  4.        ],
       [ 0.46303099, -0.66952655,  4.4       ],
       [ 0.89016045, -0.03381244,  4.6       ],
       [ 0.22887905, -0.40225762,  4.        ],
       [-0.70708128, -1.00842476,  3.3       ],
       [ 0.35553304, -0.50321849,  4.2       ],
       [ 0.33112695, -0.21118014,  4.2       ],
       [ 0.37523823, -0.29162202,  4.2       ],
       [ 0.64169028,  0.01907118,  4.3       ],
       [-0.90846333, -0.75156873,  3.        ],
       [ 0.29780791, -0.34701652,  4.1       ],
       [ 2.53172698, -0.01184224,  6.        ],
       [ 1.41407223, -0.57492506,  5.1       ],
       [ 2.61648461,  0.34193529,  5.9       ],
       [ 1.97081495, -0.18112569,  5.6       ],
       [ 2.34975798, -0.04188255,  5.8       ],
       [ 3.39687992,  0.54716805,  6.6       ],
       [ 0.51938325, -1.19135169,  4.5       ],
       [ 2.9320051 ,  0.35237701,  6.3       ],
       [ 2.31967279, -0.24554817,  5.8       ],
       [ 2.91813423,  0.78038063,  6.1       ],
       [ 1.66193495,  0.2420384 ,  5.1       ],
       [ 1.80234045, -0.21615461,  5.3       ],
       [ 2.16537886,  0.21528028,  5.5       ],
       [ 1.34459422, -0.77641543,  5.        ],
       [ 1.5852673 , -0.53930705,  5.1       ],
       [ 1.90474358,  0.11881899,  5.3       ],
       [ 1.94924878,  0.04073026,  5.5       ],
       [ 3.48876538,  1.17154454,  6.7       ],
       [ 3.79468686,  0.25326557,  6.9       ],
       [ 1.29832982, -0.76101394,  5.        ],
       [ 2.42816726,  0.37678197,  5.7       ],
       [ 1.19809737, -0.60557896,  4.9       ],
       [ 3.49926548,  0.45677347,  6.7       ],
       [ 1.38766825, -0.20403099,  4.9       ],
       [ 2.27585365,  0.33338653,  5.7       ],
       [ 2.61419383,  0.55836695,  6.        ],
       [ 1.25762518, -0.179137  ,  4.8       ],
       [ 1.29066965, -0.11642525,  4.9       ],
       [ 2.12285398, -0.21085488,  5.6       ],
       [ 2.3875644 ,  0.46251925,  5.8       ],
       [ 2.84096093,  0.37274259,  6.1       ],
       [ 3.2323429 ,  1.37052404,  6.4       ],
       [ 2.15873837, -0.21832553,  5.6       ],
       [ 1.4431026 , -0.14380129,  5.1       ],
       [ 1.77964011, -0.50146479,  5.6       ],
       [ 3.07652162,  0.68576444,  6.1       ],
       [ 2.14498686,  0.13890661,  5.6       ],
       [ 1.90486293,  0.04804751,  5.5       ],
       [ 1.16885347, -0.1645025 ,  4.8       ],
       [ 2.10765373,  0.37148225,  5.4       ],
       [ 2.31430339,  0.18260885,  5.6       ],
       [ 1.92245088,  0.40927118,  5.1       ],
       [ 1.41407223, -0.57492506,  5.1       ],
       [ 2.56332271,  0.2759745 ,  5.9       ],
       [ 2.41939122,  0.30350394,  5.7       ],
       [ 1.94401705,  0.18741522,  5.2       ],
       [ 1.52566363, -0.37502085,  5.        ],
       [ 1.76404594,  0.07851919,  5.2       ],
       [ 1.90162908,  0.11587675,  5.4       ],
       [ 1.38966613, -0.28288671,  5.1       ]])

Note that, as is usual with numpy arrays, the result carries no column labels, so the meaning of each column is lost.
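
To make the contrast concrete, here is a minimal sketch of what recovering those labels by hand might look like with plain scikit-learn and pandas. The comp_0, comp_1 naming simply imitates the Ibex output shown earlier and is an assumption of this sketch; it is not something scikit-learn produces.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion

iris = load_iris()
X, y = iris.data, iris.target
feature_names = list(iris.feature_names)

combined_features = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("univ_select", SelectKBest(k=1)),
])
transformed = combined_features.fit(X, y).transform(X)

# Read the fitted transformers back from the union itself.
fitted = dict(combined_features.transformer_list)

# PCA components have no inherent names; "comp_<i>" is a made-up label
# chosen to mirror the Ibex output above.
pca_labels = [("pca", "comp_%d" % i)
              for i in range(fitted["pca"].n_components_)]

# get_support() returns a boolean mask over the input features.
mask = fitted["univ_select"].get_support()
select_labels = [("univ_select", name)
                 for name, keep in zip(feature_names, mask) if keep]

labeled = pd.DataFrame(
    transformed,
    columns=pd.MultiIndex.from_tuples(pca_labels + select_labels))
print(labeled.head())
```

This bookkeeping has to be repeated for every transformer in the union, and it depends on knowing how each one maps input features to output columns, which is the work that the Ibex version earlier in this notebook avoids.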

