Dataset with timestamp features.

Creating the DataPot object.


In [1]:
import datapot as dp

In [2]:
datapot = dp.DataPot()

In [3]:
from datapot.utils import csv_to_jsonlines

csv_to_jsonlines('../data/transactions.csv', '../data/transactions.jsonlines')
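
For readers unfamiliar with the JSON Lines format: each line of the file is one standalone JSON object. The conversion step above can be pictured with the minimal sketch below, which uses only the standard library. The helper name csv_to_jsonlines_sketch is hypothetical and this is not datapot's implementation; in particular it keeps every value as a string, which the real utility may or may not do.

```python
import csv
import json


def csv_to_jsonlines_sketch(csv_path, jsonlines_path):
    """Write each CSV row as one JSON object per line (hypothetical helper)."""
    with open(csv_path, newline='') as src, open(jsonlines_path, 'w') as dst:
        for row in csv.DictReader(src):
            # Assumption: values are kept as strings, exactly as read from the CSV.
            dst.write(json.dumps(row) + '\n')
```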

In [4]:
ftr = open('../data/transactions.jsonlines')

Let's call the detect method. It automatically finds appropriate transformers for the fields of the jsonlines file. The 'limit' parameter sets how many objects are used to detect the right transformers.


In [6]:
datapot.detect(ftr, limit=100)


Out[6]:
DataPot class instance
 - number of features without transformation: 5
 - number of new features: 13
features to transform: 
	('merchant_id', [SVDOneHotTransformer, NumericTransformer])
	('latitude', [NumericTransformer])
	('longitude', [NumericTransformer])
	('real_transaction_dttm', [TimestampTransformer])
	('record_date', [TimestampTransformer])
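
To give an intuition for what sampling with 'limit' means, here is a rough sketch that reads only the first few objects of a JSON Lines stream and records which kind of value each field holds. The functions guess_kind and detect_sketch and the detection rules are assumptions made for illustration (including the ISO-style timestamp format); datapot's actual detection logic may differ.

```python
import io
import itertools
import json
from datetime import datetime


def guess_kind(value):
    """Rough guess: 'numeric', 'timestamp' or 'other' (hypothetical rules)."""
    if isinstance(value, bool):
        return 'other'
    if isinstance(value, (int, float)):
        return 'numeric'
    if isinstance(value, str):
        try:
            float(value)
            return 'numeric'
        except ValueError:
            pass
        try:
            # Assumption: timestamps are ISO-formatted strings.
            datetime.fromisoformat(value)
            return 'timestamp'
        except ValueError:
            pass
    return 'other'


def detect_sketch(lines, limit=100):
    """Collect the kinds seen for each field in the first `limit` objects."""
    kinds = {}
    for line in itertools.islice(lines, limit):
        for field, value in json.loads(line).items():
            kinds.setdefault(field, set()).add(guess_kind(value))
    return kinds


# Illustrative input, not taken from transactions.jsonlines.
sample = io.StringIO(
    '{"merchant_id": 178, "record_date": "2017-02-27 09:30:00"}\n'
    '{"merchant_id": 179, "record_date": "2017-02-27 17:54:00"}\n'
)
kinds = detect_sketch(sample, limit=1)
```

A small limit keeps detection fast on large files, at the cost of possibly missing values that only appear later in the data.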

In [7]:
datapot.fit(ftr)


Out[7]:
DataPot class instance
 - number of features without transformation: 5
 - number of new features: 23
features to transform: 
	('merchant_id', [SVDOneHotTransformer, NumericTransformer])
	('latitude', [NumericTransformer])
	('longitude', [NumericTransformer])
	('real_transaction_dttm', [TimestampTransformer])
	('record_date', [TimestampTransformer])

In [8]:
datapot


Out[8]:
DataPot class instance
 - number of features without transformation: 5
 - number of new features: 23
features to transform: 
	('merchant_id', [SVDOneHotTransformer, NumericTransformer])
	('latitude', [NumericTransformer])
	('longitude', [NumericTransformer])
	('real_transaction_dttm', [TimestampTransformer])
	('record_date', [TimestampTransformer])

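Let's remove the SVDOneHotTransformer from the 'merchant_id' field. It is the first transformer in that field's list, so we pass index 0.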


In [9]:
datapot.remove_transformer('merchant_id', 0)


Out[9]:
DataPot class instance
 - number of features without transformation: 5
 - number of new features: 23
features to transform: 
	('merchant_id', [NumericTransformer])
	('latitude', [NumericTransformer])
	('longitude', [NumericTransformer])
	('real_transaction_dttm', [TimestampTransformer])
	('record_date', [TimestampTransformer])

In [ ]:
data = datapot.transform(ftr)

In [9]:
data.head()


Out[9]:
   merchant_id_  record_date_timestamp_unixtime  record_date_timestamp_week_day  \
0           178                    1.488177e+09                               0
1           178                    1.488207e+09                               0
2           178                    1.488177e+09                               0
3           178                    1.488207e+09                               0
4           178                    1.488207e+09                               0

   record_date_timestamp_month_day  record_date_timestamp_hour  record_date_timestamp_minute  \
0                               27                           9                            30
1                               27                          17                            54
2                               27                           9                            31
3                               27                          17                            43
4                               27                          17                            45

   latitude_  longitude_  real_transaction_dttm_timestamp_unixtime  \
0   0.000000    0.000000                              1.488177e+09
1  55.055996   82.912991                              1.488207e+09
2   0.000000    0.000000                              1.488177e+09
3  55.056034   82.912734                              1.488207e+09
4  55.056034   82.912734                              1.488207e+09

   real_transaction_dttm_timestamp_week_day  real_transaction_dttm_timestamp_month_day  \
0                                         0                                         27
1                                         0                                         27
2                                         0                                         27
3                                         0                                         27
4                                         0                                         27

   real_transaction_dttm_timestamp_hour  real_transaction_dttm_timestamp_minute
0                                     9                                      34
1                                    17                                      49
2                                     9                                      34
3                                    17                                      49
4                                    17                                      49

In [10]:
data.columns


Out[10]:
Index(['merchant_id_', 'record_date_timestamp_unixtime',
       'record_date_timestamp_week_day', 'record_date_timestamp_month_day',
       'record_date_timestamp_hour', 'record_date_timestamp_minute',
       'latitude_', 'longitude_', 'real_transaction_dttm_timestamp_unixtime',
       'real_transaction_dttm_timestamp_week_day',
       'real_transaction_dttm_timestamp_month_day',
       'real_transaction_dttm_timestamp_hour',
       'real_transaction_dttm_timestamp_minute'],
      dtype='object')
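
Each timestamp field is expanded into five columns: unixtime, week_day, month_day, hour and minute. The sketch below shows how such features can be derived from a single datetime with the standard library. The function timestamp_features is hypothetical and only mirrors the column naming seen above; the input date and the choice of UTC are illustrative assumptions, and TimestampTransformer may treat time zones differently.

```python
from datetime import datetime, timezone


def timestamp_features(name, dt):
    """Build unixtime/week_day/month_day/hour/minute features for one datetime."""
    prefix = name + '_timestamp_'
    return {
        prefix + 'unixtime': dt.timestamp(),
        prefix + 'week_day': dt.weekday(),  # Monday is 0, Sunday is 6
        prefix + 'month_day': dt.day,
        prefix + 'hour': dt.hour,
        prefix + 'minute': dt.minute,
    }


# Illustrative value; assumed to be in UTC.
features = timestamp_features(
    'record_date', datetime(2017, 2, 27, 9, 30, tzinfo=timezone.utc))
```

Splitting a timestamp this way lets a model pick up weekly and daily patterns that are hard to learn from the raw unixtime value alone.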