This release fixes compatibility bugs with the newest PyTorch and SciPy versions, and adds a number of small improvements and new features.
Improvements and new features

SegtokTokenizer: Add option to customize SegtokTokenizer, by @alanakbik in #3592
RegexpTagger: Add option to define matching groups to RegexpTagger, by @alanakbik in #3598
RelationClassifier: Optimize RelationClassifier by adding the option to filter long sentences and truncate context, by @alanakbik in #3593
RelationClassifier: Modify printouts in RelationClassifier evaluation to remove clutter, by @alanakbik in #3591

Adds a new Nearest Class Mean classification approach to Flair that classifies data points to the class with the closest class data mean. This approach can be used as an alternative to fitting a softmax classifier. It is now available for any class in Flair that implements DefaultClassifier. For instance, to train a TextClassifier with DeepNCMs, you can use the following code:
from flair.data import Corpus
from flair.datasets import TREC_50
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.nn import DeepNCMDecoder
from flair.trainers import ModelTrainer
from flair.trainers.plugins import DeepNCMPlugin

# load the TREC dataset
corpus: Corpus = TREC_50()
label_type = "class"

# make a transformer document embedding
document_embeddings = TransformerDocumentEmbeddings("distilbert-base-uncased", fine_tune=True)

# create the label_dictionary
label_dictionary = corpus.make_label_dictionary(label_type=label_type)

# create a text classifier with a special DeepNCM decoder
classifier = TextClassifier(
    document_embeddings,
    label_type=label_type,
    label_dictionary=label_dictionary,
    decoder=DeepNCMDecoder(
        mean_update_method="condensation",
        embeddings_size=document_embeddings.embedding_length,
        label_dictionary=label_dictionary,
    ),
)

# initialize the trainer
trainer = ModelTrainer(classifier, corpus)

# train the model using the DeepNCM plugin
trainer.fine_tune(
    "resources/taggers/deepncm_baseline",
    plugins=[DeepNCMPlugin()],
)
Contributed by @sheldon-roberts in #3532
Datasets

Full Changelog: v0.15.0...v0.15.1
Release 0.15.0

Release 0.14.0

This release adds major new support for biomedical text analytics! It adds improved biomedical NER and a state-of-the-art model for biomedical entity linking. Other new features include (1) support for parameter-efficient fine-tuning and (2) various new datasets, bug fixes and enhancements! We also removed a few dependencies, so Flair should install faster and take up less space!
Biomedical NER and Entity Linking

With Flair 0.14.0, you can now detect and normalize biomedical entities in text.
For example, to analyze the sentence "We correlate genetic variants in IFNAR2 and POLG with long-COVID syndrome", use this code snippet:
from flair.models import EntityMentionLinker
from flair.nn import Classifier
from flair.data import Sentence

# A sentence from biomedical literature
sentence = Sentence("We correlate genetic variants in IFNAR2 and POLG with long-COVID syndrome.")

# Tag named entities in the text
ner_tagger = Classifier.load("hunflair2")
ner_tagger.predict(sentence)

# Normalize gene names
gene_linker = EntityMentionLinker.load("gene-linker")
gene_linker.predict(sentence)

# Normalize disease names
disease_linker = EntityMentionLinker.load("disease-linker")
disease_linker.predict(sentence)

# Iterate over predicted entities and print
for label in sentence.get_labels():
    print(label)
This should print out:
Span[5:6]: "IFNAR2" → Gene (1.0)
Span[5:6]: "IFNAR2" → 3455/name=IFNAR2
Span[7:8]: "POLG" → Gene (1.0)
Span[7:8]: "POLG" → 5428/name=POLG
Span[9:11]: "long-COVID syndrome" → Disease (1.0)
Span[9:11]: "long-COVID syndrome" → MESH:D000094024/name=Post-Acute COVID-19 Syndrome
The printout shows that:
"IFNAR2" is a gene. Further, it is recognized as gene 3455 ("interferon alpha and beta receptor subunit 2") in the NCBI database.
"POLG" is a gene. Further, it is recognized as gene 5428 ("DNA polymerase gamma, catalytic subunit") in the NCBI database.
"long-COVID syndrome" is a disease. Further, it is uniquely linked to "Post-Acute COVID-19 Syndrome" in the MESH database.
Big thanks to @sg-wbi, @WangXII, @mariosaenger and @helpmefindaname for all their work!

Flair 0.14.0 also adds support for parameter-efficient fine-tuning (PEFT).
For instance, to fine-tune a BERT model on the TREC question classification task using LoRA, use the following snippet:
from flair.data import Corpus
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# Note: you need to install peft to use this feature!
from peft import LoraConfig, TaskType

# Get corpus and make label dictionary
corpus: Corpus = TREC_6()
label_type = "question_class"
label_dict = corpus.make_label_dictionary(label_type=label_type)

# Define embeddings with LoRA fine-tuning
document_embeddings = TransformerDocumentEmbeddings(
    "bert-base-uncased",
    fine_tune=True,
    # set LoRA config
    peft_config=LoraConfig(
        task_type=TaskType.FEATURE_EXTRACTION,
        inference_mode=False,
    ),
)

# define model
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, label_type=label_type)

# train model
trainer = ModelTrainer(classifier, corpus)
trainer.fine_tune(
    "resources/taggers/question-classification-with-transformer",
    learning_rate=5.0e-4,
    mini_batch_size=4,
    max_epochs=1,
)
Big thanks to @janpf for this new feature!
Smaller Library

We've removed dependencies such as gensim from the core package, since they increased the size of the Flair library and caused some compatibility/maintenance issues. This means the core package is now smaller and faster to install.
Install as always with:
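pip install flair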
For certain features, you still need gensim, such as training a model that uses classic word embeddings. For this use case, install with:

pip install flair[word-embeddings]

Or just install gensim separately.
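For example, classic word embeddings are used like this (a minimal sketch; 'glove' is just one of the available embeddings):

from flair.data import Sentence
from flair.embeddings import WordEmbeddings

# classic (gensim-backed) word embeddings require the word-embeddings extra
embedding = WordEmbeddings("glove")

sentence = Sentence("The grass is green .")
embedding.embed(sentence)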
Big thanks to @helpmefindaname for this new feature!
TextPairRegressor: Fix data point iteration by @ya0guang in #3413
TextPairRegressor: Fix GPU memory leak by @MattGPT-ai in #3490
TextRegressor: Fix label_name bug by @sheldon-roberts in #3491
SequenceTagger: Fix _all_scores_for_token in ViterbiDecoder by @mauryaland in #3455
SentenceSplitter: Fix linking of sentences by @mariosaenger in #3397
SentenceSplitter: Fix case where split was performed on special characters by @helpmefindaname in #3404
Classifier: Fix loading by moving error message to main load function by @alanakbik in #3504
Trainer: Fix edge case by loading best model at end, even when there is no final evaluation by @helpmefindaname in #3470
TransformerEmbeddings: Fix special tokens by not replacing replace_additional_special_tokens by @helpmefindaname in #3451
data_folder in unit test by @ya0guang in #3412
NER_ESTONIAN_NOISY: Support for Estonian NER dataset with noise by @teresaloeffelhardt in #3463
MASAKHA_POS: Support for two new languages by @stefan-it in #3421
UD_BAVARIAN_MAIBAAM: Add support for new Bavarian MaiBaam UD by @stefan-it in #3426
tests package being incorrectly included in builds by @asumagic in #3440

Full Changelog: v0.13.1...v0.14.0
Release 0.13.1

Release 0.13.0

This release adds several major new features such as (1) faster and more memory-efficient transformer training, (2) a new plugin system for custom logging and training, (3) new API docs for better documentation - still in beta, and (4) various new models, datasets, bug fixes and enhancements. This release also increases the minimum requirement to Python 3.8!
New Feature: Faster and more memory-efficient transformer training

This release integrates @helpmefindaname's transformer-smaller-training-vocab into the ModelTrainer. This temporarily reduces a transformer's vocabulary to only the tokens in the training dataset, and restores the full vocabulary after training. Depending on the dataset, this can yield huge savings in GPU memory and tuning speed.
To use this feature, simply add the flag reduce_transformer_vocab=True to the fine_tune method. For example, to fine-tune a distilbert model on TREC_6, run this code (step 7 has the flag to reduce the vocabulary):
from flair.data import Corpus
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# 1. get the corpus
corpus: Corpus = TREC_6()

# 2. what label do we want to predict?
label_type = "question_class"

# 3. create the label dictionary
label_dict = corpus.make_label_dictionary(label_type=label_type)

# 4. initialize transformer document embeddings (many models are available)
document_embeddings = TransformerDocumentEmbeddings("distilbert-base-uncased", fine_tune=True)

# 5. create the text classifier
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, label_type=label_type)

# 6. initialize trainer
trainer = ModelTrainer(classifier, corpus)

# 7. fine-tune the model, but **reduce the vocabulary** for faster training
trainer.fine_tune(
    "resources/taggers/question-classification-with-transformer",
    reduce_transformer_vocab=True,  # set this to False for slow version
)
Involved PR: add reduce transformer vocab plugin by @helpmefindaname in #3217
New Feature: Trainer Plugins

A new "Plugin" system was added to the ModelTrainer, allowing far greater options to customize the training cycle (and slimming down the code of the ModelTrainer somewhat). For instance, it is now possible to customize logging to a far greater degree and integrate third-party logging tools.

For instance, if you want to integrate ClearML logging into the above script, simply instantiate the plugin and attach it to the trainer:
[...]

# import the ClearML logger plugin (requires the clearml package)
import clearml
from flair.trainers.plugins import ClearmlLoggerPlugin

# 6. initialize trainer
trainer = ModelTrainer(classifier, corpus)

# NEW: instantiate a special logger and attach it to the trainer before the training run
ClearmlLoggerPlugin(clearml.Task.init(project_name="test", task_name="test")).attach_to(trainer)

# 7. fine-tune the model, but **reduce the vocabulary** for faster training
trainer.fine_tune(
    "resources/taggers/question-classification-with-transformer",
    reduce_transformer_vocab=True,  # set this to False for slow version
)
Involved PRs:

ModelTrainer train function by @plonerma in #3084

We are working towards improving our documentation. A first step was the release of our tutorial page. Now, we are adding (in beta) online API docs to make navigating the code and the options offered by Flair easier. To enable this, we changed all docstrings to Google-style docstrings. However, this process is still ongoing, so expect the API docs to improve in coming versions of Flair.
You can find the API docs here: https://flairnlp.github.io/flair/master/api/index.html
Involved PRs:
In an effort to unify class names, we now offer models that inherit from DefaultClassifier for each label type we predict, i.e.:

TokenClassifier for predicting Token labels
TextPairClassifier for predicting TextPair labels
RelationClassifier for predicting Relation labels
SpanClassifier for predicting Span labels
TextClassifier for predicting Sentence labels

An advantage of such a structure is that most functionality (such as new decoders) needs to be implemented only once in DefaultClassifier and is then immediately usable for all model classes.
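For illustration, here is a minimal sketch of plugging a custom decoder into one of these models, using the PrototypicalDecoder described further below in these notes; the corpus and label type here are only placeholders:

from flair.datasets import UD_ENGLISH
from flair.embeddings import TransformerWordEmbeddings
from flair.models import TokenClassifier
from flair.nn import PrototypicalDecoder

# a small corpus and label dictionary, just for illustration
corpus = UD_ENGLISH().downsample(0.01)
label_dictionary = corpus.make_label_dictionary(label_type="pos")

embeddings = TransformerWordEmbeddings("distilbert-base-uncased", fine_tune=True)

# any model inheriting from DefaultClassifier accepts a custom decoder via the same argument
tagger = TokenClassifier(
    embeddings,
    label_dictionary=label_dictionary,
    label_type="pos",
    decoder=PrototypicalDecoder(
        num_prototypes=len(label_dictionary),
        embeddings_size=embeddings.embedding_length,
    ),
)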
To enable this, we renamed and extended WordTagger as TokenClassifier, and renamed EntityLinker to SpanClassifier. This is not a breaking change yet, as the old names are still available. But in the future, WordTagger and EntityLinker will be removed.
Involved PRs:
TokenClassifier model by @alanakbik in #3203

We also add two new model classes: (1) a TextPairRegressor for regression tasks on pairs of sentences (such as STS-B), and (2) an experimental Label Encoder method for few-shot classification.
Involved PRs:
TextPair regression model by @plonerma in #3202
LabelVerbalizer so that it also works for non-BIOES span labels by @alanakbik in #3231
flair/py.typed and requirements.txt in source distribution by @dobbersc in #3206
to_dict and add relations by @helpmefindaname in https...

Another follow-up release to 0.12 that fixes several bugs and adds a new multilingual frame tagger. Further, our new documentation website at https://flairnlp.github.io/docs/intro is now online!
New frame tagging model #3172

Adds a new model for detecting PropBank frames. The model is trained using the "FLERT" approach, so it is much stronger than the previous 'frame' model. We also added some training data from the Universal Proposition Bank to improve multilingual frame detection.
Use it like this:
from flair.data import Sentence
from flair.nn import Classifier

# load the large frame model
model = Classifier.load('frame-large')

# English sentence with the verb "return" in two different senses
sentence = Sentence("Dirk returned to Berlin to return his hat.")
model.predict(sentence)
print(sentence)

# German sentence with the verb "trug" in two different senses
sentence_de = Sentence("Dirk trug einen Koffer und trug einen Hut.")
model.predict(sentence_de)
print(sentence_de)
This should print:
Sentence[9]: "Dirk returned to Berlin to return his hat." → ["returned"/return.01, "return"/return.02]
Sentence[9]: "Dirk trug einen Koffer und trug einen Hut." → ["trug"/carry.01, "trug"/wear.01]
The printout tells us that the verbs in both sentences are correctly disambiguated.
Documentation

This is a quick follow-up release to 0.12 that fixes a few small bugs and includes an improved version of our Zelda entity linker.

New Entity Linking model

We include a new version of our Zelda entity linker with improved predictions. Try it as follows:
from flair.nn import Classifier
from flair.data import Sentence

# load the model
tagger = Classifier.load('linker')

# make a sentence
sentence = Sentence('Kirk and Spock met on the Enterprise.')

# predict NER tags
tagger.predict(sentence)

# print predicted entities
for label in sentence.get_labels():
    print(label)
This should print:
Span[0:1]: "Kirk" → James_T._Kirk (0.9969)
Span[2:3]: "Spock" → Spock (0.9971)
Span[6:7]: "Enterprise" → USS_Enterprise_(NCC-1701-D) (0.975)
This correctly indicates that the span "Kirk" points to "James_T._Kirk". As the prediction for the string "Enterprise" shows, the model is still in beta and will be further improved in future releases.
Bug fixes

Release 0.12 is out! This release greatly simplifies model usage for our users, includes our first entity linking model, adds support for the Ukrainian language, adds easy-to-use multitask learning, and brings many more features, improvements and bug fixes!
New Features

Simplify Flair model usage #3067

You can now load any Flair model through its parent class. Since most models inherit from Classifier, you can load and run multiple different models with exactly the same code. So, to run three different taggers for sentiment, entities and frames, do:
from flair.data import Sentence
from flair.nn import Classifier

# load three taggers to tag entities, frames and sentiment
tagger_1 = Classifier.load('ner')
tagger_2 = Classifier.load('frame')
tagger_3 = Classifier.load('sentiment')

# example sentence
sentence = Sentence('Dirk celebrated in Essen')

# predict with all three models
tagger_1.predict(sentence)
tagger_2.predict(sentence)
tagger_3.predict(sentence)

# print all predictions
for label in sentence.get_labels():
    print(label)
With this change, users no longer need to know which model classes implement which model. For more advanced users who do know this, the regular way for loading a model still works:
sentiment_tagger = TextClassifier.load('sentiment')

Entity Linking (BETA)
As of Flair 0.12 we ship an experimental entity linker trained on the Zelda dataset. The linker not only tags entities, but also attempts to link each entity to the corresponding Wikipedia URL if one exists.
To illustrate, let's use a short example text with two mentions of "Barcelona". The first refers to the football club "FC Barcelona", the second to the city "Barcelona".
from flair.nn import Classifier
from flair.data import Sentence

# load the model
tagger = Classifier.load('linker')

# make a sentence
sentence = Sentence('Bayern played against Barcelona. The match took place in Barcelona.')

# predict NER tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence)
This should print:
Sentence[12]: "Bayern played against Barcelona. The match took place in Barcelona." → ["Bayern"/FC_Bayern_Munich, "Barcelona"/FC_Barcelona, "Barcelona"/Barcelona]
As we can see, the linker can resolve what the two mentions of "Barcelona" refer to.
Additionally, the mention "Bayern" is linked to "FC_Bayern_Munich", telling us that here the football club is meant.
Entity linking support includes:
This version adds support for Ukrainian taggers, embeddings and datasets. For instance, to do NER and POS tagging of a Ukrainian sentence, do:
# Load Ukrainian NER and POS taggers
from flair.models import SequenceTagger
ner_tagger = SequenceTagger.load('ner-ukrainian')
pos_tagger = SequenceTagger.load('pos-ukrainian')

# Tag a sentence
from flair.data import Sentence
sentence = Sentence("Сьогодні в Знам’янці проживають нащадки поета — родина Шкоди.")
ner_tagger.predict(sentence)
pos_tagger.predict(sentence)
print(sentence)

# "Сьогодні в Знам’янці проживають нащадки поета — родина Шкоди." →
# ["Сьогодні"/ADV, "в"/ADP, "Знам’янці"/LOC, "Знам’янці"/PROPN, "проживають"/VERB, "нащадки"/NOUN, "поета"/NOUN, "—"/PUNCT, "родина"/NOUN, "Шкоди"/PERS, "Шкоди"/PROPN, "."/PUNCT]

Multitask Learning (#2910 #3085 #3101)
We add support for multitask learning in Flair (closes #2508 and closes #1260) with hopefully a simple syntax to define multiple tasks that share parts of the model.
The most common part to share is the transformer, which you might want to fine-tune across several tasks. Instantiate a transformer embedding and pass it to two separate models that you instantiate as before:
from flair.datasets import SENTEVAL_SST_GRANULAR, SENTEVAL_CR
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.nn.multitask import make_multitask_model_and_corpus  # note: import path may differ across Flair versions
from flair.trainers import ModelTrainer

# --- Embeddings that are shared by both models --- #
shared_embedding = TransformerDocumentEmbeddings("distilbert-base-uncased", fine_tune=True)

# --- Task 1: Sentiment Analysis (5-class) --- #
corpus_1 = SENTEVAL_SST_GRANULAR()

model_1 = TextClassifier(shared_embedding,
                         label_dictionary=corpus_1.make_label_dictionary("class"),
                         label_type="class")

# -- Task 2: Binary Sentiment Analysis on Customer Reviews -- #
corpus_2 = SENTEVAL_CR()

model_2 = TextClassifier(shared_embedding,
                         label_dictionary=corpus_2.make_label_dictionary("sentiment"),
                         label_type="sentiment",
                         )

# -- Define mapping (which tagger should train on which corpus) -- #
multitask_model, multicorpus = make_multitask_model_and_corpus(
    [
        (model_1, corpus_1),
        (model_2, corpus_2),
    ]
)

# -- Create model trainer and train -- #
trainer = ModelTrainer(multitask_model, multicorpus)
trainer.fine_tune("resources/taggers/multitask_test")
The mapping part here defines which tagger should be trained on which corpus. By calling make_multitask_model_and_corpus with a mapping, you get a corpus and model object that you can train as before.
We improve our FLERT model by now explicitly marking up context boundaries using a new [FLERT] special token in our transformer embeddings. Our experiments show that the context marker leads to improved NER results:
[SEP]              91.38 +- 0.18
[FLERT]            91.56 +- 0.17

xlm-roberta-large
none               93.73 +- 0.2
[SEP]              93.76 +- 0.13
[FLERT]            93.92 +- 0.14
In the table, "none" is the approach used in previous Flair versions. [SEP] means using the standard separator symbol as context delimiter. [FLERT] means using a new dedicated special token. As [FLERT] performs best in our experiments, the [FLERT] context marker is now activated by default.
More details: Assume the current sentence is "Peter Blackburn", the previous sentence ends with "to boycott British lamb .", and the next sentence starts with "BRUSSELS 1996-08-22 The European Commission".

In this case,

with use_context_separator=False, the embedding is produced from this string: "to boycott British lamb . Peter Blackburn BRUSSELS 1996-08-22 The European Commission"
with use_context_separator=True, the embedding is produced from this string: "to boycott British lamb . [FLERT] Peter Blackburn [FLERT] BRUSSELS 1996-08-22 The European Commission"
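In code, this is controlled on the transformer embeddings; a minimal sketch (the model name is only an example):

from flair.embeddings import TransformerWordEmbeddings

# FLERT-style embeddings: use surrounding sentence context and mark its
# boundaries with the dedicated special token (the new default behavior)
embeddings = TransformerWordEmbeddings(
    "xlm-roberta-large",
    fine_tune=True,
    use_context=True,
    use_context_separator=True,  # set to False to embed context without boundary markers
)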
We integrate the transformer-smaller-training-vocab library into the ModelTrainer. With it, you can reduce the size of transformer models when training and evaluating models on specific datasets. This leads to faster training times and a smaller memory footprint. Documentation on this new feature will be added soon!
We now include BETA support for a new type of relation extraction model that leads to much higher accuracies than our vanilla relation extraction, but increases computational costs. Documentation for this will be added as we iterate on the model.
ONNX compatible models #2640 #2643 #3041 #3075

This release continues the journey of making our models more ONNX compatible.
Other features

exclude_labels parameter to trainer.train #2724 ...

Release 0.11 is taking us ever closer to that 1.0 release! This release makes large internal refactorings and code quality / efficiency improvements to prepare Flair 1.0. We also add new features such as text clustering, a regular expression tagger, more dataset manipulation options, and some preview features like a prototype decoder.
New Features

Regular Expression Tagger (#2533)

You can now do sequence labeling in Flair with regular expressions! Simply define a RegexpTagger and add some regular expressions, like in the example below:
from flair.data import Sentence
from flair.models import RegexpTagger

# sentence with a number and two quotes
sentence = Sentence('Figure 11 is both "too colorful" and "not informative enough".')

# instantiate regex tagger with a quote matching pattern
tagger = RegexpTagger(mapping=(r'(["\'])(?:(?=(\\?))\2.)*?\1', 'QUOTE'))

# also add a number mapping
tagger.register_labels(mapping=(r'\b\d+\b', 'NUMBER'))

# tag sentence
tagger.predict(sentence)

# check out matches
for entity in sentence.get_labels():
    print(entity)

Clustering with Flair (#2573 #2619)
Flair now supports clustering by way of sklearn. Embed your sentences with a pre-trained embedding like below, then cluster them with any algorithm. Check the example below, where we use sentence transformers and k-means clustering. A 'trained' clustering model can be saved and loaded for prediction, just like any other Flair classifier:
from sklearn.cluster import KMeans

from flair.data import Sentence
from flair.datasets import TREC_6
from flair.embeddings import SentenceTransformerDocumentEmbeddings
from flair.models import ClusteringModel

embeddings = SentenceTransformerDocumentEmbeddings()

# store all embeddings in memory which is required to perform clustering
corpus = TREC_6(memory_mode='full').downsample(0.05)

clustering_model = ClusteringModel(model=KMeans(n_clusters=6), embeddings=embeddings)

# fit the model on a corpus
clustering_model.fit(corpus)

# save the model
clustering_model.save(model_file="clustering_model.pt")

# load saved clustering model
model = ClusteringModel.load(model_file="clustering_model.pt")

# make example sentence
sentence = Sentence('Getting error in manage categories - not found for attribute "navigation _ column"')

# predict for sentence
model.predict(sentence)

# print sentence with prediction
print(sentence)

Dataset Manipulations
You can now change label names, ignore labels and add custom preprocessing when loading a dataset.
For instance, the standard WNUT_17 dataset comes with 7 NER labels:
from flair.datasets import WNUT_17

corpus = WNUT_17(in_memory=False)
print(corpus.make_label_dictionary('ner'))
which prints:
Dictionary with 7 tags: <unk>, person, location, group, corporation, product, creative-work
With the following code, you rename some labels ('person' is renamed to 'PER'), merge 2 labels into 1 ('group' and 'corporation' are merged into 'ORG'), and ignore 2 other labels ('creative-work' and 'product' are ignored):
corpus = WNUT_17(
    in_memory=False,
    label_name_map={
        'person': 'PER',
        'location': 'LOC',
        'group': 'ORG',
        'corporation': 'ORG',
        'product': 'O',
        'creative-work': 'O',  # by renaming to 'O' this tag gets ignored
    }
)
which prints:
Dictionary with 4 tags: <unk>, PER, LOC, ORG
You can manipulate the data even more with custom preprocessing functions. See the example in #2708.
Other New Features and Data Sets

WordTagger class for simple word-level predictions (#2607)
WordEmbeddings can now be fine-tuned in Flair (#2491) by setting fine_tune=True. Also adds the fine-tuning mode of https://arxiv.org/abs/2110.02861, which seems to "reduce gradient variance that comes from the highly non-uniform distribution of input tokens"
NER_MULTI_CONER Dataset (#2507)

Some preview features in beta stage, use at your own risk.
Prototypical networks in Flair (#2627)

Prototypical networks learn prototypes for each target class. For each data point to be classified, the network predicts a vector in class-prototype space, which is then compared to all class prototypes. The prediction is the closest class prototype. See the paper Prototypical Networks for Few-shot Learning for more info.
@plonerma implemented a custom decoder that can be added to any Flair model that inherits from DefaultClassifier (i.e. nearly all Flair models). For instance, use this script:
from flair.data import Corpus
from flair.datasets import UP_ENGLISH
from flair.embeddings import TransformerWordEmbeddings
from flair.models import WordTagger
from flair.nn import PrototypicalDecoder
from flair.trainers import ModelTrainer

# what tag do we want to predict?
tag_type = 'frame'

# get a corpus
corpus: Corpus = UP_ENGLISH().downsample(0.1)

# make the tag dictionary from the corpus
tag_dictionary = corpus.make_label_dictionary(label_type=tag_type)

# initialize simple embeddings
embeddings = TransformerWordEmbeddings(model="distilbert-base-uncased",
                                       fine_tune=True,
                                       layers='-1')

# initialize prototype decoder
decoder = PrototypicalDecoder(num_prototypes=len(tag_dictionary),
                              embeddings_size=embeddings.embedding_length,
                              distance_function='euclidean',
                              normal_distributed_initial_prototypes=True,
                              )

# initialize the WordTagger, but pass the prototype decoder
tagger = WordTagger(embeddings, tag_dictionary, tag_type, decoder=decoder)

# initialize trainer
trainer = ModelTrainer(tagger, corpus)

# run training
trainer.fine_tune('resources/taggers/prototypical_decoder')

Other Beta features
With Flair expanding to many new NLP tasks (relation extraction, entity linking, etc.) and model types, we made a number of refactorings to reduce redundancy and make it easier to extend Flair.
Major refactoring of Label Logic in Flair (#2607 #2609 #2645)

The labeling logic was growing too complex to accommodate new tasks. With this release, we refactored this logic such that complex label classes like SpanLabel, RelationLabel etc. are removed in favor of a single Label class for all types of label. The Sentence object will now be automatically aware of all labels added to it.
To illustrate the difference, consider a before-and-after of how to add an entity label to a sentence.
Before:
# example sentence
sentence = Sentence("Humboldt Universität zu Berlin is located in Berlin .")

# create span for "Humboldt Universität zu Berlin"
span = Span(sentence[0:4])

# make a Span-label
span_label = SpanLabel(span=span, value='University')

# add Span-label to sentence
sentence.add_complex_label(typename='ner', label=span_label)
Now:
# example sentence
sentence = Sentence("Humboldt Universität zu Berlin is located in Berlin .")

# directly add a label to the span "Humboldt Universität zu Berlin"
sentence[0:4].add_label("ner", "Organization")
So you can now just get a span from the sentence and add a label to it directly. It will get registered on the sentence as well.
Refactoring of printouts (#2704)

We changed and unified printouts across all Flair data points and labels, and updated the documentation to reflect this. Printouts should hopefully now be more concise. Let us know what you think.
Unified classes to reduce redundancy

In addition to too many Label classes (see above), we also had too many corpora that essentially do the same thing, two partially overlapping transformer embedding classes, and too much redundancy in our tokenization classes. This release makes many refactorings to make the code more maintainable:

We previously had ColumnCorpus, UniversalDependenciesCorpus, CoNNLuCorpus, and EntityLinkingCorpus, which resulted in too much redundancy. Now, there is only the ColumnCorpus for all such datasets.
TransformerWordEmbedding and TransformerDocumentEmbedding: thanks to @helpmefindaname, they now both inherit from the same base object and share all features.
Tokenizer classes no longer return lists of Token, but rather lists of strings that the Sentence object converts to tokens, centralizing the offset and whitespace_after detection in one place.
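To illustrate the new tokenizer contract, here is a minimal sketch of a custom tokenizer (the class name WhitespaceSplitter is made up for this example):

from typing import List

from flair.data import Sentence
from flair.tokenization import Tokenizer

class WhitespaceSplitter(Tokenizer):
    """Toy tokenizer: returns plain strings; the Sentence object builds the Token objects itself."""

    def tokenize(self, text: str) -> List[str]:
        return text.split()

# offsets and whitespace_after are now computed centrally inside Sentence
sentence = Sentence("Flair now centralizes tokenization .", use_tokenizer=WhitespaceSplitter())
print(sentence)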
The DefaultClassifier is the base class for nearly all models in Flair. With this release, we make a number of simplifications to reduce redundancy across classes and make it more modular:

forward_pass simplified to return 3 instead of 4 arguments
forward_pass returns embeddings instead of logits, allowing us to easily switch out the decoder (see the Beta feature on Prototype Networks below)
removed the spawn logic we no longer need due to the Label refactoring

Major refactoring of SequenceTagger for better modularity and cod...
This release adds several new features such as in-built "model cards" for all Flair models, the first pre-trained models for Relation Extraction, better support for fine-tuning and a refactoring of the model training methods for more flexibility. It also fixes a number of critical bugs that were introduced by the refactorings in Flair 0.9.
Model Trainer Enhancements

Breaking change: We changed the ModelTrainer such that you now no longer pass the optimizer during initialization. Rather, it is now passed as a parameter of the train or fine_tune method.
Old syntax:
# 1. initialize trainer with AdamW optimizer
trainer = ModelTrainer(classifier, corpus, optimizer=torch.optim.AdamW)

# 2. run training with small learning rate and mini-batch size
trainer.train('resources/taggers/question-classification-with-transformer',
              learning_rate=5.0e-5,
              mini_batch_size=4,
              )
New syntax (optimizer is parameter of train method):
# 1. initialize trainer
trainer = ModelTrainer(classifier, corpus)

# 2. run training with AdamW, small learning rate and mini-batch size
trainer.train('resources/taggers/question-classification-with-transformer',
              learning_rate=5.0e-5,
              mini_batch_size=4,
              optimizer=torch.optim.AdamW,
              )

Convenience function for fine-tuning (#2439)
Adds a fine_tune routine that sets default parameters used for fine-tuning (AdamW optimizer, small learning rate, few epochs, cyclic learning rate scheduling, etc.). Uses the new linear scheduler with warmup (#2415).

New syntax with the fine_tune method:
from flair.data import Corpus
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# 1. get the corpus
corpus: Corpus = TREC_6()

# 2. what label do we want to predict?
label_type = 'question_class'

# 3. create the label dictionary
label_dict = corpus.make_label_dictionary(label_type=label_type)

# 4. initialize transformer document embeddings (many models are available)
document_embeddings = TransformerDocumentEmbeddings('distilbert-base-uncased', fine_tune=True)

# 5. create the text classifier
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, label_type=label_type)

# 6. initialize trainer
trainer = ModelTrainer(classifier, corpus)

# 7. run training with fine-tuning
trainer.fine_tune('resources/taggers/question-classification-with-transformer',
                  learning_rate=5.0e-5,
                  mini_batch_size=4,
                  )

Model Cards (#2457)
When you train any Flair model, a "model card" will now automatically be saved that stores all training parameters and versions used to train this model. Later when you load a Flair model, you can print the model card and understand how the model was trained.
The following example trains a small POS-tagger and prints the model card in the end:
from flair.datasets import UD_ENGLISH
from flair.embeddings import WordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# initialize corpus and make label dictionary for POS tags
corpus = UD_ENGLISH().downsample(0.01)
tag_type = "pos"
tag_dictionary = corpus.make_label_dictionary(tag_type)

# simple sequence tagger
tagger = SequenceTagger(hidden_size=256,
                        embeddings=WordEmbeddings("glove"),
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type)

# initialize model trainer and experiment path
trainer = ModelTrainer(tagger, corpus)
path = 'resources/taggers/model-card'

# train for a few epochs
trainer.train(path,
              max_epochs=20,
              )

# load best model and print "model card"
trained_model = SequenceTagger.load(path + '/best-model.pt')
trained_model.print_model_card()
This should print a model card like:
------------------------------------
--------- Flair Model Card ---------
------------------------------------
- this Flair model was trained with:
-- Flair version 0.9
-- PyTorch version 1.7.1
-- Transformers version 4.8.1
------------------------------------
------- Training Parameters: -------
------------------------------------
-- base_path = resources/taggers/model-card
-- learning_rate = 0.1
-- mini_batch_size = 32
-- mini_batch_chunk_size = None
-- max_epochs = 20
-- train_with_dev = False
-- train_with_test = False
[... shortened ...]
------------------------------------
Resume training any model (#2457)
Previously, we distinguished between checkpoints and model files. Now all models can function as checkpoints, meaning you can load them and continue training them. Say you want to load the model above (trained to epoch 20) and continue training it to epoch 25. Do it like this:
# resume training best model, but this time until epoch 25
trainer.resume(trained_model,
               base_path=path + '-resume',
               max_epochs=25,
               )

Pass optimizer and scheduler instance
You can also now pass an initialized optimizer and scheduler to the train and fine_tune methods.
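A minimal sketch (assuming the classifier and corpus from the examples above; the parameter values are only illustrative):

import torch
from flair.trainers import ModelTrainer

# instantiate the optimizer yourself (the learning rate is set on the instance) ...
optimizer = torch.optim.AdamW(classifier.parameters(), lr=5.0e-5)

# ... and pass the ready-made instance to the training method
trainer = ModelTrainer(classifier, corpus)
trainer.train('resources/taggers/question-classification-with-transformer',
              mini_batch_size=4,
              optimizer=optimizer,
              )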
Multi-Label Predictions and Confidence Threshold in TARS models (#2430)

Adds the possibility to set confidence thresholds on multi-label prediction in TARS, and to set whether a problem is single-label or multi-label:
from flair.models import TARSClassifier
from flair.data import Sentence

# 1. Load our pre-trained TARS model for English
tars: TARSClassifier = TARSClassifier.load('tars-base')

# switch to a multi-label task (emotion detection)
tars.switch_to_task('GO_EMOTIONS')

# sentence with two emotions
sentence = Sentence("I am happy and sad")

# predict normally
tars.predict(sentence)
print(sentence)

# predict with lower label threshold (you can set this to 0. to get all labels)
tars.predict(sentence, label_threshold=0.01)
print(sentence)

# predict and enforce a single-label prediction
tars.predict(sentence, label_threshold=0.01, multi_label=False)
print(sentence)

Relation Extraction (#2471 #2492)
We refactored the RelationExtractor for more options, hopefully better code clarity and small speed improvements.
We also added two new relation extraction models, trained over a modified version of TACRED: relations and relations-fast. To use these models, you also need an entity tagger. The tagger identifies entities, then the relation extractor predicts relations between the identified entities.
For instance use this code:
from flair.data import Sentence
from flair.models import RelationExtractor, SequenceTagger

# 1. make example sentence
sentence = Sentence("George was born in Washington")

# 2. load entity tagger and predict entities
tagger = SequenceTagger.load('ner-fast')
tagger.predict(sentence)

# check which entities have been found in the sentence
entities = sentence.get_labels('ner')
for entity in entities:
    print(entity)

# 3. load relation extractor
extractor: RelationExtractor = RelationExtractor.load('relations-fast')

# predict relations
extractor.predict(sentence)

# check which relations have been found
relations = sentence.get_labels('relation')
for relation in relations:
    print(relation)

Embeddings