In [1]:
import warnings
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(24)
import seaborn as sns
sns.set()
plt.rcParams["figure.figsize"] = 9, 4.51
import expectexception
import feets
warnings.simplefilter("ignore", feets.extractors.ExtractorWarning)
This tutorial is a guide to creating your own feature-extraction routine and adding that extractor to feets.
Every custom extractor must inherit from feets.Extractor, declare the data it needs and the features it computes, and implement a fit() method that returns a dictionary whose keys match the features list.
MaxMagMinTime extractor
Let's say we need to create a feature extractor called MaxMagMinTime that must return 2 features: magmax (the maximum magnitude) and mintime (the minimum time).
In [2]:
import feets
class MaxMagMinTime(feets.Extractor):   # must inherit from Extractor

    data = ['magnitude', 'time']        # which data are needed
                                        # to calculate the features

    features = ["magmax", "mintime"]    # the names of the expected
                                        # features

    # the values of data are passed as parameters to fit()
    def fit(self, magnitude, time):
        # the return value must be a dict with the same keys
        # defined in features
        return {"magmax": magnitude.max(), "mintime": time.min()}
Finally, to make the extractor available to the FeatureSpace class, you need to register it with the command:
In [3]:
feets.register_extractor(MaxMagMinTime)
Out[3]:
Now the extractor is available like any other provided by feets:
In [4]:
# let's create the feature-space
fs = feets.FeatureSpace(only=["magmax", "mintime"])
fs
Out[4]:
In [5]:
# extract the features
fs.extract(time=[1,2,3], magnitude=[100, 200, 300])
Out[5]:
WeightedMean extractor - Optional data
Any extractor can mark some of the light-curve data it uses as optional. For this, extractors provide the optional attribute, a list that may contain any component already defined in data.
For example, suppose we want to write a WeightedMean extractor that calculates the average of the magnitudes, optionally weighted by the errors:
In [6]:
class WeightedMean(feets.Extractor):

    data = ['magnitude', 'error']
    optional = ['error']
    features = ["weighted_mean"]

    # if error is not provided to the FeatureSpace,
    # its value will be None
    def fit(self, magnitude, error):
        if error is None:
            weighted_mean = np.average(magnitude)
        else:
            weighted_mean = np.average(magnitude, weights=error)
        return {"weighted_mean": weighted_mean}
In [7]:
feets.register_extractor(WeightedMean)
Out[7]:
The interesting thing about this extractor is that the FeatureSpace selects it whether or not we request the error (in both cases the new extractor ends up at the end of the set):
In [8]:
fs = feets.FeatureSpace(data=["magnitude", "error"])
fs.features_extractors_
Out[8]:
In [9]:
fs = feets.FeatureSpace(data=["magnitude"])
fs.features_extractors_
Out[9]:
Now let's use this extractor to average 100 random magnitudes uniformly distributed between 12 and 14, with normally distributed errors.
In [10]:
import numpy as np
magnitude = np.random.uniform(12, 14, size=100)
error = np.random.normal(size=100)
We can see that the average magnitude is
In [11]:
np.average(magnitude)
Out[11]:
And the average using the errors as weights is
In [12]:
np.average(magnitude, weights=error)
Out[12]:
Now we create the FeatureSpace with only the extractor of interest:
In [13]:
fs = feets.FeatureSpace(only=["weighted_mean"])
And we can verify the difference when providing the error:
In [14]:
fs.extract(magnitude=magnitude, error=error).as_dict()
Out[14]:
Or when using only the magnitude:
In [15]:
fs.extract(magnitude=magnitude).as_dict()
Out[15]:
RobustMean extractor - Configuration parameters
Let's assume that we want an extractor that calculates an average over the magnitudes but omits the extreme values, for example discarding the upper and lower $5\%$ of the quantities. For that, we can use the numpy percentile function:
In [16]:
class RobustMean(feets.Extractor):

    data = ["magnitude"]
    features = ["robust_mean"]

    def fit(self, magnitude):
        # extract the percentiles
        llimit, ulimit = np.percentile(magnitude, (5, 95))

        # remove the values beyond the two limits
        crop = magnitude[
            (magnitude > llimit) & (magnitude < ulimit)]

        # calculate the mean
        robust_mean = np.mean(crop)
        return {"robust_mean": robust_mean}

feets.register_extractor(RobustMean)
Out[16]:
Now we can create the FeatureSpace instance:
In [17]:
fs = feets.FeatureSpace(only=["robust_mean"])
fs
Out[17]:
And finally, we can extract robust_mean from random uniform magnitudes.
In [18]:
magnitude = np.random.uniform(12, 14, size=100)
fs.extract(magnitude=magnitude)["robust_mean"]
Out[18]:
Now let's assume we want a configurable extractor that lets the user decide which percentiles to remove before calculating the average. This is possible thanks to the params attribute.
params must be a dict that maps each parameter name to its default value. All the keys of params are passed as parameters to the fit() method, together with the required time-series data.
So if we want to write the same RobustMean with a configurable parameter, instead of leaving the $5\%$ fixed, we can do it as follows.
In [19]:
class RobustMean(feets.Extractor):

    data = ["magnitude"]
    features = ["robust_mean"]

    # by default the percentile to crop is still 5
    params = {"percentile": 5}

    # now magnitude (from data) and percentile (from params)
    # are given as keyword arguments
    def fit(self, magnitude, percentile):
        # first calculate the lower and upper percentile
        lp, up = percentile, 100 - percentile

        # extract the percentile limits
        llimit, ulimit = np.percentile(magnitude, (lp, up))

        # remove the values beyond the two limits
        crop = magnitude[
            (magnitude > llimit) & (magnitude < ulimit)]

        # calculate the mean
        robust_mean = np.mean(crop)
        return {"robust_mean": robust_mean}

feets.register_extractor(RobustMean)
Out[19]:
The percentile parameter of our extractor is now configurable when we create our FeatureSpace. To change its value, pass the extractor's class name as a keyword argument whose value is a dictionary with the params you want to override:
In [20]:
fs = feets.FeatureSpace(only=["robust_mean"], RobustMean={"percentile": 6})
fs
Out[20]:
In [21]:
fs.extract(magnitude=magnitude).as_dict()
Out[21]:
MagErrHistogram extractor - Multidimensional features
An extractor can also return features that are not scalars, for example a 2D histogram of the magnitudes and their errors:
In [22]:
@feets.register_extractor
class MagErrHistogram(feets.Extractor):

    data = ["magnitude", "error"]
    features = ["Mag_Err_Histogram"]

    def fit(self, magnitude, error):
        histogram2d = np.histogram2d(magnitude, error)[0]
        return {"Mag_Err_Histogram": histogram2d}
In [23]:
fs = feets.FeatureSpace(only=["Mag_Err_Histogram"])
In [24]:
magnitude = np.random.uniform(12, 14, size=100)
error = np.random.normal(size=100)
rs = fs.extract(magnitude=magnitude, error=error)
rs.as_dict()
Out[24]:
The interesting thing is that if we call the as_arrays() or as_dataframe() methods, each value of the matrix is split into an independent element.
In [32]:
rs.as_dataframe()
Out[32]:
The conversion of a 2D matrix to scalars is performed by the flatten_feature() method of the extractor, whose default behavior works for arrays of arbitrary dimension by adding one index suffix per dimension. Thus the matrix value Mag_Err_Histogram[i][j] becomes a feature Mag_Err_Histogram_i_j (if there were more dimensions, more sub-indices would simply be added). If needed, a custom behavior can be provided.
For example, if we wanted to return only the average value of the histogram, we could redefine this method as follows:
In [45]:
@feets.register_extractor
class MagErrHistogram(feets.Extractor):

    data = ["magnitude", "error"]
    features = ["Mag_Err_Histogram"]

    def flatten_feature(self, feature, value, extractor_features, **kwargs):
        return {"Mag_Err_Histogram": np.mean(value)}

    def fit(self, magnitude, error):
        histogram2d = np.histogram2d(magnitude, error)[0]
        return {"Mag_Err_Histogram": histogram2d}
In [46]:
fs = feets.FeatureSpace(only=["Mag_Err_Histogram"])
rs = fs.extract(magnitude=magnitude, error=error)
rs.as_dataframe()
Out[46]:
The flatten_feature method receives several parameters:
- feature: the name of the feature being flattened. It is useful if the extractor generates several features and we want to provide different logic for each one.
- value: the value of the feature to flatten, as it was generated.
- extractor_features: all the features generated by this extractor. It is useful if the flattening logic depends on the values of other features.
- **kwargs: all the parameters received by the fit() method, that is, all the data plus all the configurable params. It is useful if the flattening logic depends on the values of the time series or the configuration of the extractor.