In [1]:
import warnings

%matplotlib inline
import matplotlib.pyplot as plt

import numpy as np
np.random.seed(24)

import seaborn as sns
sns.set()

plt.rcParams["figure.figsize"] = 9, 4.51

import expectexception

import feets

warnings.simplefilter("ignore", feets.extractors.ExtractorWarning)

Extractors Tutorial

Introduction

This tutorial is a guide to creating your own custom feature-extraction routine and adding it to feets.

Fundamentals

  • The feature extraction is modeled as a class.
  • The class must inherit from feets.Extractor.
  • The extractor class must define at least three elements:
    1. data: a list with at least one valid feets data name (time, magnitude, error, magnitude2, aligned_time, aligned_magnitude, aligned_magnitude2, aligned_error or aligned_error2)
    2. features: a list with the names of the features that this extractor generates.
    3. fit: a method with the same parameters defined in the data list. fit() must return a dictionary whose keys are the names in the features list.

Example 1: MaxMagMinTime extractor

Let's say we need to create a feature extractor called MaxMagMinTime that must return 2 features:

  1. magmax: the maximum magnitude
  2. mintime: the minimum time

In [2]:
import feets

class MaxMagMinTime(feets.Extractor):  # must inherit from Extractor

    data = ['magnitude', 'time']  # Which data is needed 
                                  # to calculate this feature
    
    features = ["magmax", "mintime"] # The names of the expected 
                                     # feature
    
    # The values of data are the params
    def fit(self, magnitude, time):        
        # The return value must be a dict with the same values 
        # defined in  features
        return {"magmax": magnitude.max(), "mintime": time.min()}

Finally, to make the extractor available to the FeatureSpace class, you need to register it with:


In [3]:
feets.register_extractor(MaxMagMinTime)


Out[3]:
__main__.MaxMagMinTime

Now the extractor is available like any other one provided by feets:


In [4]:
# let's create the feature-space
fs = feets.FeatureSpace(only=["magmax", "mintime"])
fs


Out[4]:
<FeatureSpace: MaxMagMinTime()>

In [5]:
# extract the features
fs.extract(time=[1,2,3], magnitude=[100, 200, 300])


Out[5]:
FeatureSet(features=<magmax, mintime>, timeserie=<time, magnitude>)
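
The returned FeatureSet can be queried directly; both as_dict() and item access by feature name are used later in this tutorial. A minimal sketch of what we expect for this toy light curve (the expected values in the comments are just the obvious max/min):


In [ ]:
rs = fs.extract(time=[1, 2, 3], magnitude=[100, 200, 300])

# the whole result as a plain dictionary
rs.as_dict()    # expected: {'magmax': 300, 'mintime': 1}

# or a single feature by name
rs["magmax"]    # expected: 300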

Example 2: WeightedMean extractor - Optional data.

Any extractor can mark some of its light-curve data as optional. For this, extractors provide the optional attribute, a list whose elements must also be present in data.

For example, if we wanted to make a WeightedMean extractor that calculates the average of the magnitudes, optionally weighted by the error:


In [6]:
class WeightedMean(feets.Extractor):

    data = ['magnitude', 'error']
    optional = ['error']
    
    features = ["weighted_mean"] 
    
    # if error is not provided to the FeatureSpace,
    # the value of error will be None.
    def fit(self, magnitude, error):
        if error is None:
            weighted_mean = np.average(magnitude) 
        else:
            weighted_mean = np.average(magnitude, weights=error)
        return {"weighted_mean": weighted_mean}

In [7]:
feets.register_extractor(WeightedMean)


Out[7]:
__main__.WeightedMean

The interesting thing about this extractor is that it is selected by the FeatureSpace regardless of whether we include the error in the requested data or not (in both cases the new extractor appears at the end of the set):


In [8]:
fs = feets.FeatureSpace(data=["magnitude", "error"])
fs.features_extractors_


Out[8]:
frozenset({Amplitude(),
           AndersonDarling(),
           AutocorLength(nlags=100),
           Beyond1Std(),
           Con(consecutiveStar=3),
           FluxPercentileRatioMid20(),
           FluxPercentileRatioMid35(),
           FluxPercentileRatioMid50(),
           FluxPercentileRatioMid65(),
           FluxPercentileRatioMid80(),
           Gskew(interpolation=nearest),
           Mean(),
           MeanVariance(),
           MedianAbsDev(),
           MedianBRP(),
           PairSlopeTrend(),
           PercentAmplitude(),
           PercentDifferenceFluxPercentile(),
           Q31(),
           RCS(),
           Skew(),
           SmallKurtosis(),
           Std(),
           StetsonK(),
           WeightedMean()})

In [9]:
fs = feets.FeatureSpace(data=["magnitude"])
fs.features_extractors_


Out[9]:
frozenset({Amplitude(),
           AndersonDarling(),
           AutocorLength(nlags=100),
           Con(consecutiveStar=3),
           FluxPercentileRatioMid20(),
           FluxPercentileRatioMid35(),
           FluxPercentileRatioMid50(),
           FluxPercentileRatioMid65(),
           FluxPercentileRatioMid80(),
           Gskew(interpolation=nearest),
           Mean(),
           MeanVariance(),
           MedianAbsDev(),
           MedianBRP(),
           PairSlopeTrend(),
           PercentAmplitude(),
           PercentDifferenceFluxPercentile(),
           Q31(),
           RCS(),
           Skew(),
           SmallKurtosis(),
           Std(),
           WeightedMean()})

Now let's try to use this extractor to calculate the average of 100 random magnitudes between 12 and 14, with a normal distribution of errors.


In [10]:
import numpy as np

magnitude = np.random.uniform(12, 14, size=100)
error = np.random.normal(size=100)

We can see that the plain average of the magnitudes is:


In [11]:
np.average(magnitude)


Out[11]:
13.100839148324201

And the average using the error as weights is:


In [12]:
np.average(magnitude, weights=error)


Out[12]:
13.132964316832446

Now we create a FeatureSpace with only the extractor of interest:


In [13]:
fs = feets.FeatureSpace(only=["weighted_mean"])

And we can verify the difference when providing the error:


In [14]:
fs.extract(magnitude=magnitude, error=error).as_dict()


Out[14]:
{'weighted_mean': 13.132964316832446}

Or when using only the magnitude:


In [15]:
fs.extract(magnitude=magnitude).as_dict()


Out[15]:
{'weighted_mean': 13.100839148324201}

Example 3: RobustMean extractor - Configuration parameters

Let's assume that we want to make an extractor that calculates an average over the magnitudes but, this time, omitting the extreme values; for example, discarding everything in the lower and upper $5\%$ of the magnitudes. For that, we can use the NumPy percentile function.
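
As a quick sanity check (the toy array here is just for illustration), np.percentile returns both requested limits in a single call:


In [ ]:
np.percentile(np.arange(1, 101), (5, 95))   # expected: array([ 5.95, 95.05])

With that, the extractor itself: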


In [16]:
class RobustMean(feets.Extractor):
    data = ["magnitude"]
    features = ["robust_mean"]
    
    def fit(self, magnitude):
        # extract the percentiles
        llimit, ulimit = np.percentile(magnitude, (5, 95))
        
        # remove the two limits
        crop = magnitude[
            (magnitude > llimit) & (magnitude < ulimit)]
        
        # calculate the mean
        robust_mean = np.mean(crop)
        
        return{"robust_mean": robust_mean}
    
feets.register_extractor(RobustMean)


Out[16]:
__main__.RobustMean

Now we can create the FeatureSpace instance:


In [17]:
fs = feets.FeatureSpace(only=["robust_mean"])
fs


Out[17]:
<FeatureSpace: RobustMean()>

And finally, we can extract robust_mean from random uniform magnitudes.


In [18]:
magnitude = np.random.uniform(12, 14, size=100)
fs.extract(magnitude=magnitude)["robust_mean"]


Out[18]:
12.958129090838725

Now let's assume that we want a configurable extractor that allows the user to determine which percentiles to remove before calculating the average. This is possible thanks to the params attribute.

params must be a dict that maps each parameter name to its default value. All the keys of params are sent as parameters to the fit() method, together with the required time-series data.

Now, if we want to write the same RobustMean but with a configurable percentile instead of leaving the $5\%$ fixed, we can do it as follows:


In [19]:
class RobustMean(feets.Extractor):
    data = ["magnitude"]
    features = ["robust_mean"]
        
    # by default the percentile to crop is still 5.
    params = {"percentile": 5}
    
    # now magnitude (from data) and percentile (from params)
    # are given as keyword arguments.
    def fit(self, magnitude, percentile):
        # first calculate the lower and upper percentile
        lp, up = percentile, 100 - percentile
        
        # extract the percentiles
        llimit, ulimit = np.percentile(magnitude, (lp, up))
        
        # remove the two limits
        crop = magnitude[
            (magnitude > llimit) & (magnitude < ulimit)]
        
        # calculate the mean
        robust_mean = np.mean(crop)
        
        return{"robust_mean": robust_mean}
    
feets.register_extractor(RobustMean)


Out[19]:
__main__.RobustMean

The percentile parameter of our extractor is configurable at the time of creating our FeatureSpace.

The way to change the value is to pass the class name as a keyword argument whose value is a dictionary with all the params you want to change:


In [20]:
fs = feets.FeatureSpace(only=["robust_mean"], RobustMean={"percentile": 6})
fs


Out[20]:
<FeatureSpace: RobustMean(percentile=6)>

In [21]:
fs.extract(magnitude=magnitude).as_dict()


Out[21]:
{'robust_mean': 12.957153715793863}

Example 4: MagErrHistogram extractor - Flatten API.

The features generated by the extractors can be any arbitrary Python object.

For example, the following extractor bins the values of magnitude and error into a 2D histogram and returns them as a two-dimensional array:


In [22]:
@feets.register_extractor
class MagErrHistogram(feets.Extractor):
    data = ["magnitude", "error"]
    features = ["Mag_Err_Histogram"]
    
    def fit(self, magnitude, error):
        histogram2d = np.histogram2d(magnitude, error)[0]
        return {"Mag_Err_Histogram": histogram2d}

In [23]:
fs = feets.FeatureSpace(only=["Mag_Err_Histogram"])

In [24]:
magnitude = np.random.uniform(12, 14, size=100)
error = np.random.normal(size=100)
rs = fs.extract(magnitude=magnitude, error=error)
rs.as_dict()


Out[24]:
{'Mag_Err_Histogram': array([[0., 0., 1., 1., 1., 0., 1., 0., 6., 1.],
        [0., 0., 1., 1., 2., 2., 2., 1., 2., 1.],
        [0., 2., 0., 0., 3., 2., 1., 1., 1., 0.],
        [0., 1., 1., 1., 1., 1., 1., 3., 1., 0.],
        [1., 0., 1., 0., 2., 2., 1., 3., 0., 0.],
        [0., 1., 0., 1., 1., 4., 1., 0., 0., 0.],
        [0., 0., 1., 0., 2., 2., 1., 1., 2., 3.],
        [2., 1., 1., 2., 1., 1., 1., 2., 1., 0.],
        [0., 0., 0., 0., 0., 2., 0., 1., 1., 2.],
        [0., 0., 1., 4., 2., 1., 1., 0., 0., 0.]])}

The interesting thing is that if we call the method as_arrays() or as_dataframe(), each value of the matrix is split into an independent feature.


In [32]:
rs.as_dataframe()


Out[32]:
Mag_Err_Histogram_0_0 Mag_Err_Histogram_0_1 Mag_Err_Histogram_0_2 Mag_Err_Histogram_0_3 Mag_Err_Histogram_0_4 Mag_Err_Histogram_0_5 Mag_Err_Histogram_0_6 Mag_Err_Histogram_0_7 Mag_Err_Histogram_0_8 Mag_Err_Histogram_0_9 ... Mag_Err_Histogram_9_0 Mag_Err_Histogram_9_1 Mag_Err_Histogram_9_2 Mag_Err_Histogram_9_3 Mag_Err_Histogram_9_4 Mag_Err_Histogram_9_5 Mag_Err_Histogram_9_6 Mag_Err_Histogram_9_7 Mag_Err_Histogram_9_8 Mag_Err_Histogram_9_9
0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 6.0 1.0 ... 0.0 0.0 1.0 4.0 2.0 1.0 1.0 0.0 0.0 0.0

1 rows × 100 columns

The conversion of a 2D matrix to scalars is performed in the flatten_feature() method of the extractor, whose default behavior works for arrays of arbitrary dimension by adding suffixes for each dimension.

Thus the value of the matrix Mag_Err_Histogram[i][j] becomes a feature Mag_Err_Histogram_i_j (if there were more dimensions, more sub-indices would be added). If needed, a custom behavior can be provided.

For example, if we wanted to return only an average value of the histogram, we could redefine this method as follows:


In [45]:
@feets.register_extractor
class MagErrHistogram(feets.Extractor):
    data = ["magnitude", "error"]
    features = ["Mag_Err_Histogram"]
    
    def flatten_feature(self, feature, value, extractor_features, **kwargs):
        return {"Mag_Err_Histogram": np.mean(value)}
    
    def fit(self, magnitude, error):
        histogram2d = np.histogram2d(magnitude, error)[0]
        return {"Mag_Err_Histogram": histogram2d}

In [46]:
fs = feets.FeatureSpace(only=["Mag_Err_Histogram"])
rs = fs.extract(magnitude=magnitude, error=error)
rs.as_dataframe()


Out[46]:
Mag_Err_Histogram
0 1.0

The flatten_feature method receives several parameters (see the sketch after this list):

  • feature: the name of the feature you are trying to flatten. It is useful if the extractor generates several features and we want to provide a different logic for each one.
  • value: the value of the feature to flatten as it was generated.
  • extractor_features: All the features generated by this extractor. It is useful if the flattening logic depends on the values of other features.
  • **kwargs: All parameters received by the fit() method. This is all data plus all configurable params. It is useful if the flattening logic depends on the values of the time series or the configuration of the extractor.
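
As an illustration, here is a hedged sketch of a hypothetical MagStats extractor that uses the feature argument to apply a different flattening logic to each of its features (the extractor, its feature names, and the flattened key names are invented for this example; only the flatten_feature() signature comes from the tutorial above):


In [ ]:
@feets.register_extractor
class MagStats(feets.Extractor):
    data = ["magnitude"]
    features = ["Mag_Quartiles", "Mag_Mean"]

    def fit(self, magnitude):
        return {
            "Mag_Quartiles": np.percentile(magnitude, (25, 50, 75)),
            "Mag_Mean": np.mean(magnitude),
        }

    def flatten_feature(self, feature, value, extractor_features, **kwargs):
        # dispatch on the feature name
        if feature == "Mag_Quartiles":
            # give the quartile array explicit suffixes instead of
            # relying on the default _0, _1, _2 naming
            return {
                "Mag_Quartiles_q1": value[0],
                "Mag_Quartiles_median": value[1],
                "Mag_Quartiles_q3": value[2],
            }
        # Mag_Mean is already a scalar, so it is returned unchanged
        return {feature: value}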
