The main objective of this PoC is to assess Autotune, the new FastText feature for hyperparameter optimization at training time.
From the press release the description of Autotune is:
[...]This feature automatically determines the best hyperparameters for your data set in order to build an efficient text classifier[...].
[...]FastText then uses the allotted time to search for the hyperparameters that give the best performance on the validation set.[...].
[...]Our strategy to explore various hyperparameters is inspired by existing tools, such as Nevergrad, but tailored to fastText by leveraging the specific structure of models. Our autotune explores hyperparameters by sampling, initially in a large domain that shrinks around the best combinations found over time[...]
The boilerplate code for data generation is in the Tutorials section of the FastText documentation; a minimal sketch is reproduced below.
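The following sketch of that data preparation assumes the cooking.stackexchange dataset URL and the 12404/3000 train/validation split from the tutorial (double-check both against the current docs):
In [ ]:
import tarfile
import urllib.request

# Download and unpack the cooking.stackexchange dataset
# (URL assumed from the FastText tutorial)
url = "https://dl.fbaipublicfiles.com/fasttext/data/cooking.stackexchange.tar.gz"
urllib.request.urlretrieve(url, "cooking.stackexchange.tar.gz")
with tarfile.open("cooking.stackexchange.tar.gz") as tar:
    tar.extractall()

# First 12404 lines for training, last 3000 for validation,
# mirroring the head/tail split used in the tutorial
with open("cooking.stackexchange.txt") as f:
    lines = f.readlines()
with open("cooking.train", "w") as f:
    f.writelines(lines[:12404])
with open("cooking.valid", "w") as f:
    f.writelines(lines[-3000:])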
In [1]:
import fasttext
In [2]:
def get_model_stats(model):
    """Get the main stats for the model and perform a
    single test prediction. For the prediction we use
    the sample text from the FastText docs.
    """
    # A single predict call returns both the labels and their probabilities
    pred_class, pred_proba = model.predict("Which baking dish is best to bake a banana bread ?")
    # test() returns (number of samples, precision, recall)
    _, precision, recall = model.test("cooking.valid")
    print(f'Predicted Class: {pred_class[0]}\nPredicted Probability: {round(pred_proba[0] * 100, 2)} %')
    print(f'Precision: {round(precision * 100, 2)} %\nRecall: {round(recall * 100, 2)} %')
In [3]:
model = fasttext.train_supervised(input="cooking.train",
epoch=25,
)
In [4]:
get_model_stats(model)
In [5]:
# Autotune with defaults: search hyperparameters against the validation
# file (the default search budget is 5 minutes, optimizing f1)
model = fasttext.train_supervised(input='cooking.train',
                                  autotuneValidationFile='cooking.valid',
                                  )
get_model_stats(model)
In [6]:
# Explicitly optimize the overall f1 score (this is the default metric)
model = fasttext.train_supervised(input='cooking.train',
                                  autotuneValidationFile='cooking.valid',
                                  autotuneMetric="f1",
                                  )
get_model_stats(model)
In [7]:
# Optimize the f1 score of a single label (__label__baking)
model = fasttext.train_supervised(input='cooking.train',
                                  autotuneValidationFile='cooking.valid',
                                  autotuneMetric="f1:__label__baking",
                                  )
get_model_stats(model)
In [8]:
# Constrain the final (quantized) model to at most 2 MB
model = fasttext.train_supervised(input='cooking.train',
                                  autotuneValidationFile='cooking.valid',
                                  autotuneModelSize="2M",
                                  )
get_model_stats(model)
In [9]:
# Limit the search budget to 60 seconds
model = fasttext.train_supervised(input='cooking.train',
                                  autotuneValidationFile='cooking.valid',
                                  autotuneDuration=60,
                                  )
get_model_stats(model)
In [ ]:
# autotunePredictions: the number of predictions considered
# when scoring each trial during the search
model = fasttext.train_supervised(input='cooking.train',
                                  autotuneValidationFile='cooking.valid',
                                  autotunePredictions=200,
                                  )
get_model_stats(model)
In [ ]:
# Combine several autotune options: metric, a 200 MB model-size budget,
# a 20-minute search, and 200 predictions per trial
model = fasttext.train_supervised(input='cooking.train',
                                  autotuneValidationFile='cooking.valid',
                                  autotuneMetric="f1",
                                  autotuneModelSize="200M",
                                  autotuneDuration=1200,
                                  autotunePredictions=200
                                  )
Checking the code, we can see that the Autotune search strategy works as follows:
For every parameter, the autotuner has an updater (the updateArgGauss() method) that draws a random number (coeff) from a Gaussian distribution whose standard deviation is interpolated between the startSigma and endSigma parameters; based on this value, the argument receives an update.
Each parameter has its own startSigma and endSigma values, fixed in the calls to updateArgGauss().
The update of an argument can be linear (i.e. val + coeff) or power-based (i.e. updateCoeff = pow(2.0, coeff); val * updateCoeff), depending on the argument.
After each validation run (each using a different combination of parameters), a score (f1 only) is stored, and the best combination found is used to train the final full model. A simple Python sketch of this loop is shown below.
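To make the loop concrete, here is a hedged Python sketch of the search; train_and_score and update_args are hypothetical stand-ins for the C++ internals, which train real fastText models and score f1 on the validation file:
In [ ]:
import random
import time

# Hypothetical stand-in: the real code trains a fastText model with the
# trial arguments and returns its f1 score on the validation file
def train_and_score(args):
    return random.random()

# Hypothetical stand-in for the Gaussian perturbation described above;
# the stddev shrinks as the elapsed fraction t grows
def update_args(args, t):
    stddev = 2.8 - ((2.8 - 2.5) / 0.5) * min(0.5, max(t - 0.25, 0.0))
    new_args = dict(args)
    new_args['epoch'] = int(max(1, min(100, args['epoch'] + random.gauss(0.0, stddev))))
    return new_args

def autotune_loop(base_args, budget_seconds):
    """Sketch of the search loop: perturb the best arguments found so
    far, evaluate on the validation set, and keep the best combination.
    """
    best_args, best_score = base_args, float('-inf')
    start = time.time()
    while (elapsed := time.time() - start) < budget_seconds:
        t = elapsed / budget_seconds       # elapsed fraction of the budget
        trial = update_args(best_args, t)
        score = train_and_score(trial)
        if score > best_score:
            best_args, best_score = trial, score
    return best_args                       # used to train the final model

print(autotune_loop({'epoch': 25}, budget_seconds=1))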
Arguments Range
epoch: 1 to 100
learning rate: 0.01 to 5.00
dimensions: 1 to 1000
wordNgrams: 1 to 5
loss: only softmax
bucket size: 10000 to 10000000
minn (min length of char ngram): 1 to 3
maxn (max length of char ngram): 1 to minn + 3
dsub (size of each sub-vector): 1 to 4
These ranges were clarified in the issues of the FastText project; a small clamping sketch follows this list.
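As a small illustration, a hypothetical clamp helper shows how an updated value is kept inside its allowed range (the C++ code applies an equivalent bounds check after each Gaussian update):
In [ ]:
# Hypothetical helper: keep an updated argument inside its valid range
def clamp(value, lo, hi):
    return max(lo, min(hi, value))

# An epoch update that overshoots is pulled back into [1, 100]
print(clamp(140, 1, 100))  # 100
print(clamp(37, 1, 100))   # 37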
In terms of metrics for optimization, only two are available: f1score and labelf1score.
The C++ implementation, rendered as a simple Python sketch, looks like this (to be reviewed):
In [ ]:
import numpy as np

# Fixed sigma range for the argument being updated
startSigma = 2.8
endSigma = 2.5

# Time parameter: the elapsed fraction of the autotune budget.
# For t >= 0.75 the expression below clamps stddev down to endSigma,
# which is how the search domain shrinks over time.
t = 10

# Current epoch value to update
epoch = 35

# Standard deviation interpolated between startSigma and endSigma
stddev = startSigma - ((startSigma - endSigma) / 0.5) * min(0.5, max((t - 0.25), 0.0))

# Random coefficient drawn from a zero-mean Gaussian
mu = 0.0
sigma = stddev
coeff = np.random.normal(mu, sigma)
print(f'Coefficient value: {coeff}')

# Power update coefficient: pow(2.0, coeff)
print(f'Coefficient update value: {np.power(2.0, coeff)}')

# A linear update adds the coefficient; a power update multiplies by it
print(f'Updated epochs (linear): {epoch + coeff}')
print(f'Updated epochs (power): {epoch * np.power(2.0, coeff)}')