Release 0.6.1 is a bugfix release that fixes issues caused by the move of the server that originally hosted the Flair models. Additionally, this release adds a ton of new NER datasets, including the XTREME corpus for 40 languages, and a new model for NER on German-language legal text.
New Model: Legal NER (#1872)

Adds a legal NER model for German, trained on the German legal NER dataset (available here), which can be loaded in Flair with the LER_GERMAN corpus object.
It uses German Flair and FastText embeddings and achieves an F1 score of 96.35.
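If you want to inspect the training data yourself, a minimal sketch of loading the corpus looks like this (assuming the usual flair.datasets import path for Flair corpora):

```python
from flair.datasets import LER_GERMAN

# load the German legal NER corpus the model was trained on
corpus = LER_GERMAN()
print(corpus)
```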
Use the tagger like this:
```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load German LER tagger
tagger = SequenceTagger.load('de-ler')

# example text
text = "vom 6. August 2020. Alle Beschwerdeführer befinden sich derzeit gemeinsam im Urlaub auf der Insel Mallorca , die vom Robert-Koch-Institut als Risikogebiet eingestuft wird. Sie wollen am 29. August 2020 wieder nach Deutschland einreisen, ohne sich gemäß § 1 Abs. 1 bis Abs. 3 der Verordnung zur Testpflicht von Einreisenden aus Risikogebieten auf das SARS-CoV-2-Virus testen zu lassen. Die Verordnung sei wegen eines Verstoßes der ihr zugrunde liegenden gesetzlichen Ermächtigungsgrundlage, des § 36 Abs. 7 IfSG , gegen Art. 80 Abs. 1 Satz 1 GG verfassungswidrig."
sentence = Sentence(text)

# predict and print entities
tagger.predict(sentence)
for entity in sentence.get_spans('ner'):
    print(entity)
```

New Datasets

Add XTREME and WikiANN corpora for multilingual NER (#1862)
These huge corpora provide training data for NER in 176 languages. You can either load language-specific parts by supplying a language code:
```python
from flair.datasets import XTREME

# load German Xtreme
german_corpus = XTREME('de')
print(german_corpus)

# load French Xtreme
french_corpus = XTREME('fr')
print(french_corpus)
```
Or you can load the default 40 languages at once into one huge MultiCorpus by not providing a language ID:
```python
from flair.datasets import XTREME

# load Xtreme MultiCorpus for all default languages
multi_corpus = XTREME()
print(multi_corpus)
```
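Since these corpora are meant as training data, here is a minimal sketch of training an NER tagger on one language part. The embedding choice and hyperparameters below are illustrative assumptions only, not settings from this release:

```python
from flair.datasets import XTREME
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# load the German part of XTREME as training data
corpus = XTREME('de')

# build the tag dictionary for the 'ner' tag type
tag_dictionary = corpus.make_tag_dictionary(tag_type='ner')

# embeddings and tagger (hyperparameters chosen for illustration)
embeddings = TransformerWordEmbeddings('bert-base-multilingual-cased')
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type='ner')

# train the tagger
trainer = ModelTrainer(tagger, corpus)
trainer.train('resources/taggers/xtreme-german-ner', max_epochs=10)
```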
Add Twitter NER Dataset (#1850)

Dataset of tweets annotated with NER tags. Load with:
```python
from flair.datasets import TWITTER_NER

# load twitter dataset
corpus = TWITTER_NER()

# print example tweet
print(corpus.test[0])
```

Add German Europarl NER Dataset (#1849)
Dataset of German-language speeches in the European Parliament annotated with standard NER tags like person and location. Load with:
```python
from flair.datasets import EUROPARL_NER_GERMAN

# load corpus
corpus = EUROPARL_NER_GERMAN()
print(corpus)

# print example test sentence
print(corpus.test[1])
```

Add MIT Restaurant NER Dataset (#1177)
Dataset of English restaurant reviews annotated with entities like "dish", "location" and "rating". Load with:
```python
from flair.datasets import MIT_RESTAURANTS

# load restaurant dataset
corpus = MIT_RESTAURANTS()

# print example sentence
print(corpus.test[0])
```

Add Universal Propositions Banks for French and German (#1866)
As a kickoff for supporting the Universal Proposition Banks, this release adds the first two UP datasets to Flair. Load the German dataset with:
```python
from flair.datasets import UP_GERMAN

# load German UP
corpus = UP_GERMAN()
print(corpus)

# print example sentence
print(corpus.dev[1])
```
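The French counterpart can be loaded the same way; this sketch assumes the corpus class is named UP_FRENCH, analogously to UP_GERMAN:

```python
from flair.datasets import UP_FRENCH

# load French UP
corpus = UP_FRENCH()
print(corpus)
```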
Add Universal Dependencies Dataset for Chinese (#1880)

Adds the Kyoto dataset for Chinese. Load with:
```python
from flair.datasets import UD_CHINESE_KYOTO

# load Chinese UD dataset
corpus = UD_CHINESE_KYOTO()

# print example sentence
print(corpus.test[0])
```

Bug fixes