This notebook shows how datapot works with the Mushroom Data Set. The important detail about this dataset is that all its features are categorical.
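The data below is read from a bz2-compressed jsonlines file, where each line holds one JSON object. The sketch shows that format with a tiny in-memory file and checks that every value is a string label, i.e. categorical. The field names and values are hypothetical and are not taken from the real mushrooms file.

```python
import bz2
import io
import json

# Hypothetical sample records (made-up fields, not the real mushrooms data).
records = [
    {"cap-shape": "x", "odor": "p", "class": "p"},
    {"cap-shape": "b", "odor": "a", "class": "e"},
]

# Serialize as jsonlines (one JSON object per line) and compress with bz2.
raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")
buffer = io.BytesIO(bz2.compress(raw))

# bz2.BZ2File accepts a file object as well as a path; iterating yields byte lines.
with bz2.BZ2File(buffer) as f:
    parsed = [json.loads(line) for line in f]

# Every value is a string label, so every field is categorical.
all_categorical = all(isinstance(v, str) for r in parsed for v in r.values())
print(parsed)
print(all_categorical)
```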
In [1]:
    
import datapot as dp
import pandas as pd
import time
    
Creating the DataPot object.
In [2]:
    
datapot = dp.DataPot()
    
In [3]:
    
import bz2
# Open the bz2-compressed jsonlines file (one JSON object per line)
ftr = bz2.BZ2File('../data/mushrooms.jsonlines.bz2')
    
Let's call the detect method. It automatically finds appropriate transformers for the fields of the jsonlines file. The 'limit' parameter sets how many objects are used to detect the right transformers.
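To illustrate what a sample-based detection step can look like, here is a minimal hypothetical detector: it inspects at most `limit` records and marks a field as categorical when every sampled value is a string and there are few distinct values. The function name `detect_categorical` and the `max_unique` threshold are assumptions for illustration; this is not datapot's actual algorithm.

```python
from itertools import islice


def detect_categorical(records, limit=1000, max_unique=50):
    """Hypothetical detector, NOT datapot's algorithm.

    Looks at up to `limit` records and labels a field 'categorical' if all
    sampled values are strings with at most `max_unique` distinct values.
    """
    seen = {}
    # islice stops after `limit` records, so the rest is never inspected.
    for record in islice(records, limit):
        for field, value in record.items():
            seen.setdefault(field, []).append(value)

    result = {}
    for field, values in seen.items():
        is_cat = all(isinstance(v, str) for v in values) and len(set(values)) <= max_unique
        result[field] = "categorical" if is_cat else "other"
    return result


sample = [
    {"odor": "p", "size": 3.5},
    {"odor": "a", "size": 1.2},
    {"odor": 7, "size": 2.0},
]
print(detect_categorical(sample, limit=2))  # {'odor': 'categorical', 'size': 'other'}
print(detect_categorical(sample, limit=3))  # {'odor': 'other', 'size': 'other'}
```

The second call shows why the sample size matters: a value that breaks the pattern is only noticed if it falls within the first `limit` records.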
In [4]:
    
t0 = time.time()
# Use up to 1000 objects to choose a transformer for each field
datapot.detect(ftr, limit=1000)
print('detect time:', time.time() - t0)
datapot
    
    
In [5]:
    
# Fit the detected transformers to the data
datapot.fit(ftr)
    
In [6]:
    
datapot
    
As a result, only categorical transformers were chosen.
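A categorical transformer typically learns the set of values during fit and emits one indicator column per value during transform. The class below is a minimal hypothetical sketch of that idea for a single field; the name `OneHotField` is made up, and it only mimics the `<field>_<value>` column naming suggested by the 'e_e' and 'e_t' columns used later in this notebook. It is not datapot's implementation.

```python
class OneHotField:
    """Hypothetical one-hot transformer for one categorical field."""

    def __init__(self, field):
        self.field = field
        self.categories = []

    def fit(self, records):
        # Learn the sorted set of values observed for this field.
        self.categories = sorted({r[self.field] for r in records})
        return self

    def transform(self, records):
        # One column per learned value, named "<field>_<value>".
        # A value never seen during fit maps to an all-zero row.
        names = [f"{self.field}_{c}" for c in self.categories]
        rows = [[int(r[self.field] == c) for c in self.categories] for r in records]
        return names, rows


train = [{"e": "e"}, {"e": "t"}, {"e": "e"}]
enc = OneHotField("e").fit(train)
names, rows = enc.transform(train + [{"e": "x"}])
print(names)  # ['e_e', 'e_t']
print(rows)   # [[1, 0], [0, 1], [1, 0], [0, 0]]
```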
In [7]:
    
# Apply the fitted transformers; the result is a pandas DataFrame
data = datapot.transform(ftr)
    
    
In [8]:
    
data.head()
    
In [9]:
    
data.columns
    
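Let's test the new features. The 'e' field is chosen as the prediction target, so both columns derived from it, 'e_e' and 'e_t', are dropped from the features; keeping either one would leak the target.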
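Why drop both columns instead of just 'e_e'? For a two-valued field, the two indicator columns are complementary, so either one determines the other. The sketch uses `pandas.get_dummies` on a hypothetical toy frame to show this; it illustrates the leakage argument and is not the datapot pipeline.

```python
import pandas as pd

# Hypothetical toy frame: 'e' plays the role of the two-valued target field.
df = pd.DataFrame({"e": ["e", "t", "t", "e"], "odor": ["a", "p", "p", "n"]})

# One-hot encode all string columns; columns are named "<field>_<value>".
encoded = pd.get_dummies(df).astype(int)

# The two target columns are complementary: e_t == 1 - e_e on every row.
complementary = (encoded["e_t"] == 1 - encoded["e_e"]).all()
print(complementary)  # True

# Hence both are removed from the features, and one of them becomes the label.
X = encoded.drop(["e_e", "e_t"], axis=1)
y = encoded["e_e"]
print(list(X.columns))  # ['odor_a', 'odor_n', 'odor_p']
```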
In [10]:
    
# Drop both columns derived from the target field 'e'
X = data.drop(['e_e', 'e_t'], axis=1)
y = data['e_e']
    
In [11]:
    
from sklearn.model_selection import cross_val_score
    
In [12]:
    
from xgboost import XGBClassifier
clf = XGBClassifier(n_estimators=100)
# 5-fold cross-validation: one accuracy score per fold
cross_val_score(clf, X, y, cv=5)
    
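For a classifier and an integer `cv`, `cross_val_score` splits the data with unshuffled `StratifiedKFold`, fits a fresh clone of the estimator on each training part, and scores it with the estimator's default score (accuracy). The sketch reproduces that loop by hand and checks it against `cross_val_score`. It swaps in synthetic data from `make_classification` and a `DecisionTreeClassifier` (both assumptions, chosen so the example runs without xgboost or the mushroom file).

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the mushroom features (an assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
clf = DecisionTreeClassifier(random_state=0)

# Manual 5-fold loop: stratified, unshuffled splits, fresh clone per fold,
# accuracy via the classifier's default score method.
manual = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    model = clone(clf).fit(X[train_idx], y[train_idx])
    manual.append(model.score(X[test_idx], y[test_idx]))
manual = np.array(manual)

auto = cross_val_score(clf, X, y, cv=5)
print(manual)
print(np.allclose(manual, auto))  # True
```

Because the splits are unshuffled and the tree has a fixed `random_state`, the manual scores match `cross_val_score` fold by fold.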