PoC Autotune Techday

The main objective of this PoC is to assess Autotune, the new FastText feature for hyperparameter optimization at training time.

What is Autotune?

From the press release, Autotune is described as:

[...]This feature automatically determines the best hyperparameters for your data set in order to build an efficient text classifier[...].

[...]FastText then uses the allotted time to search for the hyperparameters that give the best performance on the validation set.[...].

[...]Our strategy to explore various hyperparameters is inspired by existing tools, such as Nevergrad, but tailored to fastText by leveraging the specific structure of models. Our autotune explores hyperparameters by sampling, initially in a large domain that shrinks around the best combinations found over time[...]

The boilerplate code for data preparation can be found in the Tutorials section of the FastText documentation.
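
For reference, a minimal Python sketch of that preparation, assuming the cooking StackExchange dataset and the 12,404/3,000 train/validation split used in the tutorial:

In [ ]:
# Sketch of the data preparation from the tutorial; the URL and the
# split sizes are taken from the FastText docs.
import tarfile
import urllib.request

url = "https://dl.fbaipublicfiles.com/fasttext/data/cooking.stackexchange.tar.gz"
urllib.request.urlretrieve(url, "cooking.stackexchange.tar.gz")
with tarfile.open("cooking.stackexchange.tar.gz") as tar:
    tar.extractall()

# First 12,404 lines for training, last 3,000 for validation
with open("cooking.stackexchange.txt", encoding="utf-8") as f:
    lines = f.readlines()
with open("cooking.train", "w", encoding="utf-8") as f:
    f.writelines(lines[:12404])
with open("cooking.valid", "w", encoding="utf-8") as f:
    f.writelines(lines[-3000:])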


In [1]:
import fasttext

In [2]:
def get_model_stats(model):
    """Print the main stats for a model and run a single
    prediction. For the prediction we'll use the sample
    text from the docs.
    """
    text = "Which baking dish is best to bake a banana bread ?"
    pred_class, pred_proba = model.predict(text)
    _, precision, recall = model.test("cooking.valid")

    print(f'Predicted Class: {pred_class[0]}\nPredicted Probability: {round(pred_proba[0] * 100, 2)} %')
    print(f'Precision: {round(precision * 100, 2)} %\nRecall: {round(recall * 100, 2)} %')

Vanilla Model

The idea here is to train a plain vanilla model as a baseline, so we can compare it against the autotuned runs and see how precision and recall evolve over time.


In [3]:
model = fasttext.train_supervised(input="cooking.train",
                                  epoch=25,
                                 )

In [4]:
get_model_stats(model)


Predicted Class: __label__baking
Predicted Probability: 63.14 %
Precision: 51.9 %
Recall: 22.44 %

Autotune Options

We'll go through all the Autotune options and describe them. Due to the lack of documentation for this feature, most of what follows is based on inspecting the source code in the main repository.

autotuneValidationFile

autotuneValidationFile: Validation file that needs to be passed to Autotune. It should be in the same format as the training set.


In [5]:
model = fasttext.train_supervised(input='cooking.train',
                                  autotuneValidationFile='cooking.valid',
                                 )

get_model_stats(model)


Predicted Class: __label__bread
Predicted Probability: 34.16 %
Precision: 56.5 %
Recall: 24.43 %

autotuneMetric

autotuneMetric: Metric used by Autotune for model assessment. Defaults to the F1 score. Available options:

  • f1 (overall F1 score)
  • f1:labelname (F1 score for a specific label, e.g. f1:__label__baking)

In [6]:
model = fasttext.train_supervised(input='cooking.train',
                                  autotuneValidationFile='cooking.valid',
                                  autotuneMetric="f1",
                                 )

get_model_stats(model)


Predicted Class: __label__bread
Predicted Probability: 34.53 %
Precision: 56.43 %
Recall: 24.41 %

In [7]:
model = fasttext.train_supervised(input='cooking.train',
                                  autotuneValidationFile='cooking.valid',
                                  autotuneMetric="f1:__label__baking",
                                 )

get_model_stats(model)


Predicted Class: __label__bread
Predicted Probability: 40.55 %
Precision: 51.3 %
Recall: 22.19 %

autotuneModelSize

autotuneModelSize: Desired size of the final, quantized model. If the value is empty, the model is not quantized. Available byte units:

  • k or K: 1,000 bytes
  • m or M: 1,000,000 bytes
  • g or G: 1,000,000,000 bytes

In [8]:
model = fasttext.train_supervised(input='cooking.train',
                                  autotuneValidationFile='cooking.valid',
                                  autotuneModelSize="2M",
                                 )

get_model_stats(model)


Predicted Class: __label__baking
Predicted Probability: 63.47 %
Precision: 54.9 %
Recall: 23.74 %

autotuneDuration

autotuneDuration: Time budget in seconds for Autotune to search across hyperparameter combinations. Defaults to 60 * 5 (5 minutes).


In [9]:
model = fasttext.train_supervised(input='cooking.train',
                                  autotuneValidationFile='cooking.valid',
                                  autotuneDuration=60,
                                 )

get_model_stats(model)


Predicted Class: __label__baking
Predicted Probability: 32.4 %
Precision: 22.17 %
Recall: 9.59 %

autotunePredictions

autotunePredictions: Number of predictions used for evaluation. Defaults to 1.


In [ ]:
model = fasttext.train_supervised(input='cooking.train',
                                  autotuneValidationFile='cooking.valid',
                                  autotunePredictions=200,
                                 )

get_model_stats(model)

Combining all Autotune parameters


In [ ]:
model = fasttext.train_supervised(input='cooking.train',
                                  autotuneValidationFile='cooking.valid',
                                  autotuneMetric="f1",
                                  autotuneModelSize="200M",
                                  autotuneDuration=1200,
                                  autotunePredictions=200
                                 )
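
When the run finishes, the returned model is already trained with the best combination found, so it can be saved as usual (the file name below is just illustrative):

In [ ]:
# Persist the autotuned model for later use
model.save_model("cooking_autotuned.bin")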

Autotune Strategy

Checking the code, we can see that the Autotune search strategy works as follows:

For each parameter, the autotuner has an updater (the updateArgGauss() method) that draws a random coefficient (coeff) from a Gaussian distribution and uses it to update the parameter value. The standard deviation of that distribution is interpolated between the startSigma and endSigma parameters, so updates stay within roughly a single standard deviation.

Each parameter has its own fixed startSigma and endSigma range, hard-coded in the calls to updateArgGauss.

The update applied to a parameter can be linear (i.e. updateCoeff + val) or power-based (i.e. updateCoeff = pow(2.0, coeff), then updateCoeff * val), depending on the parameter being updated.

After each validation run (each using a different combination of parameters), the score (F1 only) is stored, and the best combination found is then used to train the final model on the full training data.
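
Put together, the search amounts to a time-boxed loop. The sketch below is a hypothetical simplification rather than the actual C++ code; sample_params and trial are stand-in names for the internal sampling and train-plus-validate steps:

In [ ]:
import random
import time

def autotune_search(sample_params, trial, duration=300):
    """Time-boxed search: keep sampling hyperparameter combinations
    until the budget runs out, remembering the best one found."""
    best_score, best_params = float("-inf"), None
    start = time.time()
    while time.time() - start < duration:
        t = (time.time() - start) / duration    # elapsed fraction drives the sigma decay
        params = sample_params(best_params, t)  # domain shrinks around the best so far
        score = trial(params)                   # train a candidate, return validation F1
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

# Toy usage: "trial" just rewards epoch values close to 40
print(autotune_search(
    sample_params=lambda best, t: {"epoch": random.randint(1, 100)},
    trial=lambda p: -abs(p["epoch"] - 40),
    duration=0.1,
))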

Arguments Range

  • epoch: 1 to 100
  • learning rate: 0.01 to 5.00
  • dimensions: 1 to 1000
  • wordNgrams: 1 to 5
  • loss: Only softmax
  • bucket size: 10000 to 10000000
  • minn (min length of char ngram): 1 to 3
  • maxn (max length of char ngram): 1 to minn + 3
  • dsub (size of each sub-vector): 1 to 4
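
For illustration, these ranges could be encoded as a lookup table, assuming each Gaussian update is clamped back into the allowed interval (a hypothetical sketch; loss and maxn are omitted since they are not simple fixed intervals):

In [ ]:
# Hypothetical encoding of the ranges above; values come from the list
ARG_RANGES = {
    "epoch": (1, 100),
    "lr": (0.01, 5.0),
    "dim": (1, 1000),
    "wordNgrams": (1, 5),
    "bucket": (10_000, 10_000_000),
    "minn": (1, 3),
    "dsub": (1, 4),
}

def clamp(name, value):
    """Clamp an updated value back into the allowed range."""
    lo, hi = ARG_RANGES[name]
    return min(max(value, lo), hi)

print(clamp("epoch", 150))  # 100: an update can never leave the range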

A clarification was posted in the issues of the FastText project.

In terms of optimization metrics, only the f1score and labelf1score metrics are available.

A doubt about having only two metrics was also raised in the project issues.

A simple Python representation of the C++ implementation looks like this (to be reviewed):


In [ ]:
import numpy as np

# Fixed sigma range for the parameter being updated
# (values are hard-coded per parameter in updateArgGauss)
startSigma = 2.8
endSigma = 2.5

# Fraction of the autotune time budget already elapsed (0.0 to 1.0)
t = 0.5

# Current value of the parameter being updated (epochs here)
epoch = 35

# The standard deviation shrinks from startSigma to endSigma as time
# passes, narrowing the search around the best combinations found so far
stddev = startSigma - ((startSigma - endSigma) / 0.5) * min(0.5, max((t - 0.25), 0.0))

# Random coefficient drawn from the Gaussian
mu = 0.0
sigma = stddev
coeff = np.random.normal(mu, sigma)
print(f'Coefficient value: {coeff}')

# Power update: the coefficient becomes a multiplicative factor 2^coeff
updateCoeff = np.power(2.0, coeff)
print(f'Coefficient update value: {updateCoeff}')

# Updated epoch count (power update: updateCoeff * val)
print(f'Updated epochs: {updateCoeff * epoch}')