In [1]:
import featuretools as ft
In this example, we will use a dataset of customer retail transactions from a UK website, covering December 2010 to December 2011.
In [2]:
es = ft.demo.load_retail(nrows=10000)
Let's use a simple feature for this example.
In [3]:
region = ft.Feature(es["customers"]["Country"])
We can supply “cutoff times” to specify that we want to calculate features one year after a customer’s first invoice.
In [4]:
import pandas as pd
cutoff_times = es["customers"].df[["CustomerID", "first_invoices_time"]].rename(
    columns={"CustomerID": "instance_id", "first_invoices_time": "time"})
cutoff_times["time"] = cutoff_times["time"] + pd.Timedelta("365 days")
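To make the cutoff-time construction easier to inspect, here is a minimal sketch that applies the same rename-and-shift steps to a small hand-made table. The `toy_customers` frame and its values are hypothetical stand-ins for `es["customers"].df`; only the column names are taken from the cell above.

```python
import pandas as pd

# Hypothetical stand-in for es["customers"].df; the IDs and timestamps are made up.
toy_customers = pd.DataFrame({
    "CustomerID": [101, 102, 103],
    "first_invoices_time": pd.to_datetime(
        ["2010-12-01 08:26:00", "2011-01-15 10:00:00", "2011-03-02 14:30:00"]),
})

# Same two steps as above: rename to the expected column names,
# then push each time forward by 365 days.
toy_cutoffs = toy_customers[["CustomerID", "first_invoices_time"]].rename(
    columns={"CustomerID": "instance_id", "first_invoices_time": "time"})
toy_cutoffs["time"] = toy_cutoffs["time"] + pd.Timedelta("365 days")

print(toy_cutoffs)
```

Each row pairs one instance with the point in time at which its features should be computed, so different customers can have different cutoffs.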
Here is what some of the cutoff times look like.
In [5]:
cutoff_times.head(10)
Out[5]:
To save intermediate computations as CSV files, pass the path of the directory where they should be written. For example, if you pass a directory called "ft_temp", CSV files are written to that directory, each named according to the timestamp it represents.
In [6]:
import os
save_progress = os.path.join(os.getcwd(), 'ft_temp')
if not os.path.exists(save_progress):
    os.makedirs(save_progress)
In [7]:
fm_save = ft.calculate_feature_matrix([region],
                                      entityset=es,
                                      cutoff_time=cutoff_times.sample(10),
                                      save_progress=save_progress)
As seen below, there are now files in the directory, named by timestamp.
In [8]:
%ls ft_temp/
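If the `ls` magic is not available (for example, outside IPython or on Windows), the directory can also be inspected from plain Python. The following is a sketch, not part of Featuretools: `list_saved_csvs` is a hypothetical helper, and the demonstration file name and columns are invented so the snippet runs on its own. The real files in `ft_temp` are named by Featuretools.

```python
import csv
import os
import shutil
import tempfile


def list_saved_csvs(directory):
    """Return the sorted paths of all .csv files directly inside `directory`."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".csv")
    )


# Demonstration on a throwaway directory with a made-up file,
# so the sketch does not depend on the contents of ft_temp.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "example_chunk.csv"), "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["instance_id", "time", "Country"])  # hypothetical columns
    writer.writerow([101, "2011-12-01 08:26:00", "United Kingdom"])

saved = list_saved_csvs(demo_dir)

# Read the first saved file back to check its contents.
with open(saved[0], newline="") as f:
    rows = list(csv.reader(f))

print(saved)
print(rows)

# Remove the throwaway directory again.
shutil.rmtree(demo_dir)
```

Calling `list_saved_csvs(save_progress)` on the directory used above would return the paths of the files written by `calculate_feature_matrix`, assuming they carry a `.csv` extension.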
In [9]:
import shutil
shutil.rmtree(save_progress)