This release adds several major new features: (1) faster and more memory-efficient transformer training, (2) a new plugin system for custom logging and training, (3) new API docs (still in beta) for better documentation, and (4) various new models, datasets, bug fixes and enhancements. This release also raises the minimum required Python version to 3.8!
## New Feature: Faster and more memory-efficient transformer training

This release integrates @helpmefindaname's transformer-smaller-training-vocab into the `ModelTrainer`. This temporarily reduces a transformer's vocabulary to only the tokens in the training dataset, and restores the full vocabulary after training. Depending on the dataset, this can yield huge savings in GPU memory and training speed.
To use this feature, simply add the flag `reduce_transformer_vocab=True` to the `fine_tune` method. For example, to fine-tune a distilbert model on TREC_6, run this code (step 7 has the flag to reduce the vocabulary):
```python
from flair.data import Corpus
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# 1. get the corpus
corpus: Corpus = TREC_6()

# 2. what label do we want to predict?
label_type = "question_class"

# 3. create the label dictionary
label_dict = corpus.make_label_dictionary(label_type=label_type)

# 4. initialize transformer document embeddings (many models are available)
document_embeddings = TransformerDocumentEmbeddings("distilbert-base-uncased", fine_tune=True)

# 5. create the text classifier
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, label_type=label_type)

# 6. initialize trainer
trainer = ModelTrainer(classifier, corpus)

# 7. fine-tune the model, but **reduce the vocabulary** for faster training
trainer.fine_tune(
    "resources/taggers/question-classification-with-transformer",
    reduce_transformer_vocab=True,  # set this to False for slow version
)
```
Involved PR: add reduce transformer vocab plugin by @helpmefindaname in #3217
## New Feature: Trainer Plugins

A new "Plugin" system was added to the `ModelTrainer`, allowing far greater options to customize the training cycle (and slimming down the code of the `ModelTrainer` somewhat). It is now possible, for instance, to customize logging to a far greater degree and integrate third-party logging tools.

For example, if you want to integrate ClearML logging into the above script, simply instantiate the plugin and attach it to the trainer:
```python
[...]

# 6. initialize trainer
trainer = ModelTrainer(classifier, corpus)

# NEW: instantiate a special logger and attach it to the trainer before the training run
ClearmlLoggerPlugin(clearml.Task.init(project_name="test", task_name="test")).attach_to(trainer)

# 7. fine-tune the model, but **reduce the vocabulary** for faster training
trainer.fine_tune(
    "resources/taggers/question-classification-with-transformer",
    reduce_transformer_vocab=True,  # set this to False for slow version
)
```
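Plugins can also be written from scratch by subclassing the plugin base class. Below is a minimal sketch; it assumes the `TrainerPlugin` base class and `@TrainerPlugin.hook` decorator that Flair's built-in plugins use, and the hook name `after_training_epoch` is an assumption that may differ in your Flair version:

```python
from flair.trainers.plugins import TrainerPlugin


class EpochPrinterPlugin(TrainerPlugin):
    """Toy plugin that prints a message after every training epoch.

    A sketch only: the hook name and signature are assumptions modeled
    on Flair's built-in plugins, not a documented contract.
    """

    @TrainerPlugin.hook
    def after_training_epoch(self, epoch, **kwargs):
        print(f"finished epoch {epoch}")


# attach it like the ClearML logger above:
# EpochPrinterPlugin().attach_to(trainer)
```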
Involved PRs:

- `ModelTrainer` `train` function by @plonerma in #3084

## New Feature: API docs (beta)

We are working towards improving our documentation. A first step was the release of our tutorial page. Now, we are adding (in beta) online API docs to make navigating the code and the options offered by Flair easier. To enable this, we converted all docstrings to Google-style docstrings. However, this process is still ongoing, so expect the API docs to improve in coming versions of Flair.
You can find the API docs here: https://flairnlp.github.io/flair/master/api/index.html
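For reference, Google-style docstrings organize parameters, return values and exceptions into named sections, which is what the API docs are generated from. A generic (hypothetical, non-Flair) example:

```python
def split_corpus(data: list, test_fraction: float = 0.1) -> tuple:
    """Splits a dataset into a training and a test portion.

    Args:
        data: The full list of data points to split.
        test_fraction: Fraction of data points placed in the test split.

    Returns:
        A (train, test) tuple of lists.

    Raises:
        ValueError: If test_fraction is not between 0 and 1.
    """
    if not 0 <= test_fraction <= 1:
        raise ValueError("test_fraction must be between 0 and 1")
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]
```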
Involved PRs:
## Model Refactorings

In an effort to unify class names, we now offer models that inherit from `DefaultClassifier` for each label type we predict, i.e.:

- `TokenClassifier` for predicting `Token` labels
- `TextPairClassifier` for predicting `TextPair` labels
- `RelationClassifier` for predicting `Relation` labels
- `SpanClassifier` for predicting `Span` labels
- `TextClassifier` for predicting `Sentence` labels

An advantage of this structure is that most functionality (such as new decoders) needs to be implemented only once in `DefaultClassifier`, and is then immediately usable for all model classes.
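As an illustration of this reuse, the sketch below swaps in a custom decoder. Note that the `decoder` keyword argument is an assumption based on how `DefaultClassifier` subclasses accept alternative decoders (as for the Label Encoder feature mentioned below), not guaranteed API:

```python
import torch
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier

embeddings = TransformerDocumentEmbeddings("distilbert-base-uncased", fine_tune=True)

# any torch.nn.Module mapping embeddings to label scores can serve as a decoder
custom_decoder = torch.nn.Sequential(
    torch.nn.Linear(embeddings.embedding_length, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, len(label_dict)),  # label_dict as in the example above
)

# assumption: the `decoder` keyword is accepted by DefaultClassifier subclasses
classifier = TextClassifier(
    embeddings,
    label_dictionary=label_dict,
    label_type=label_type,
    decoder=custom_decoder,
)
```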
To enable this unified structure, we renamed and extended `WordTagger` as `TokenClassifier`, and renamed `EntityLinker` to `SpanClassifier`. This is not yet a breaking change, as the old names are still available. But in the future, `WordTagger` and `EntityLinker` will be removed.
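For code that used the old classes, switching is just a rename; a minimal sketch (treat the exact constructor arguments as an assumption, since they follow `WordTagger`'s):

```python
from flair.embeddings import TransformerWordEmbeddings
from flair.models import TokenClassifier  # formerly WordTagger

# corpus: a token-labeled Corpus, assumed to be loaded beforehand;
# assumption: TokenClassifier keeps WordTagger's constructor arguments
tagger = TokenClassifier(
    embeddings=TransformerWordEmbeddings("distilbert-base-uncased"),
    label_dictionary=corpus.make_label_dictionary(label_type="pos"),
    label_type="pos",
)
```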
Involved PRs:

- `TokenClassifier` model by @alanakbik in #3203

We also add two new model classes: (1) a `TextPairRegressor` for regression tasks on pairs of sentences (such as STS-B), and (2) an experimental Label Encoder method for few-shot classification.
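For the new pair regressor, a hedged sketch of a training setup; `TextPairRegressor` is the class named above, but its constructor arguments and the label type name here are assumptions:

```python
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextPairRegressor
from flair.trainers import ModelTrainer

# corpus: a Corpus of sentence pairs with numeric labels (e.g. STS-B-style
# similarity scores), assumed to be loaded beforehand
regressor = TextPairRegressor(
    embeddings=TransformerDocumentEmbeddings("distilbert-base-uncased", fine_tune=True),
    label_type="similarity",  # hypothetical label type name
)

trainer = ModelTrainer(regressor, corpus)
trainer.fine_tune("resources/regressors/sts-b")
```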
Involved PRs:

- `TextPair` regression model by @plonerma in #3202
- `LabelVerbalizer` so that it also works for non-BIOES span labels by @alanakbik in #3231
- `flair/py.typed` and `requirements.txt` in source distribution by @dobbersc in #3206
- `to_dict` and add relations by @helpmefindaname in #3271
- `save_final_model` is True (even if the training is interrupted) by @plonerma in #3251

## Breaking Changes

- The following embedding classes have been removed: `XLNetEmbeddings`, `XLMEmbeddings`, `OpenAIGPTEmbeddings`, `OpenAIGPT2Embeddings`, `RoBERTaEmbeddings`, `CamembertEmbeddings`, `XLMRobertaEmbeddings` and `BertEmbeddings`. Use `TransformerWordEmbeddings` or `TransformerDocumentEmbeddings` instead.
- Removed `ELMoTransformerEmbeddings`, as allennlp is no longer maintained.
as allennlp is no longer maintained.flair.hyperparameter
module: We recommend using the hyperparameter optimzier of your choice as external module, for example see here how to fine tune flair models with the hugginface AutoTrain SpaceRunnertrainer.resume(...)
functionality. Similary to the flair.hyperparameter
module, this functionality was dropped due to the trainer rework.trainer.train(...)
and trainer.fine_tune(...)
parameters:
  - `monitor_train: bool` was replaced by `monitor_train_sample: float`: this allows you to specify the percentage of training data points used for monitoring (setting `monitor_train_sample=1.0` is equivalent to the previous behaviour of `monitor_train=True`). See the migration sketch after this list.
  - `eval_on_train_fraction` is removed in favour of `monitor_train_sample` (see `monitor_train`).
  - `eval_on_train_shuffle` is removed.
  - `anneal_with_prestarts` and `batch_growth_annealing` have been removed.
  - `num_workers` has been removed; a single worker is now always used for data loading, as it is the fastest for in-memory datasets.
  - `checkpoint` has been removed as a parameter. You can use the `CheckpointPlugin` for the same behaviour.
  - `cycle_momentum` has been removed, as schedulers have been moved to plugins.
  - `param_selection_mode` has been removed, similar to the hyperparameter optimization.
  - `optimizer_state_dict` and `scheduler_state_dict` were removed as part of the resume functionality.
  - `anneal_against_dev_loss` has been dropped, as annealing always goes against the metric specified by `main_evaluation_metric`.
  - `use_swa` has been removed.
  - `use_tensorboard`, `tensorboard_comment`, `tensorboard_log_dir` & `metrics_for_tensorboard` are removed in favour of the `TensorboardLogger` plugin.
  - `amp_opt_level` is removed, as we moved to the torch integration.
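To migrate training scripts that used the removed parameters, attach the corresponding plugins instead. A sketch, assuming both plugins are importable from `flair.trainers.plugins`; their constructor arguments here are assumptions and may differ in your Flair version:

```python
from flair.trainers import ModelTrainer
from flair.trainers.plugins import CheckpointPlugin, TensorboardLogger

trainer = ModelTrainer(classifier, corpus)  # classifier/corpus as set up earlier

# formerly: trainer.train(..., checkpoint=True, use_tensorboard=True)
# now: attach plugins before training (argument names below are assumptions)
CheckpointPlugin(save_model_each_k_epochs=1, base_path="resources/taggers/demo").attach_to(trainer)
TensorboardLogger(log_dir="resources/taggers/demo/tensorboard").attach_to(trainer)

trainer.train(
    "resources/taggers/demo",
    monitor_train_sample=0.1,  # replaces monitor_train; 1.0 matches old monitor_train=True
)
```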
- `WordTagger` has been deprecated, as it was renamed to `TokenClassifier`.
- `EntityLinker` has been deprecated, as it was renamed to `SpanClassifier`.
Full Changelog: v0.12.2...v0.13.0