Deep Keyphrase extraction using SciBERT.
To set up and run the model:

1. Install pytorch-pretrained-BERT.
2. From the scibert repo, untar the weights (rename their weight dump file to pytorch_model.bin) and the vocab file into a new folder named model.
3. Change the parameters accordingly in experiments/base_model/params.json. We recommend a batch size of 4 and a sequence length of 512, with 6 epochs, if the GPU's VRAM is around 11 GB. A sketch of such a params file follows these steps.
4. For training, run: python train.py --data_dir data/task1/ --bert_model_dir model/ --model_dir experiments/base_model
5. For evaluation, run: python evaluate.py --data_dir data/task1/ --bert_model_dir model/ --model_dir experiments/base_model --restore_file best
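As a reference for step 3, here is a minimal sketch of what experiments/base_model/params.json could contain with the recommended settings. The key names are an assumption for illustration and may not match the ones the repo's train.py actually reads.

import json

# Hypothetical params.json reflecting the recommended settings
# (batch size 4, sequence length 512, 6 epochs). Key names are
# illustrative and may differ from the repo's actual params.json.
params = {
    "batch_size": 4,
    "max_len": 512,
    "num_epochs": 6,
}

with open("experiments/base_model/params.json", "w") as f:
    json.dump(params, f, indent=4)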
We used the IO format here. Unlike the original SciBERT repo, we use only a simple linear layer on top of the token embeddings.
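A minimal sketch of such a tagging head is shown below, assuming the pytorch-pretrained-BERT BertModel API; the class and attribute names are illustrative and are not taken from the repo.

import torch.nn as nn
from pytorch_pretrained_bert import BertModel

class TokenTagger(nn.Module):
    """Sketch: a single linear layer over the final (Sci)BERT token
    embeddings, producing one tag score per wordpiece token."""

    def __init__(self, bert_model_dir, num_tags):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_model_dir)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_tags)

    def forward(self, input_ids, attention_mask=None):
        # Keep only the last encoder layer: shape [batch, seq_len, hidden].
        encoded_layer, _ = self.bert(
            input_ids,
            attention_mask=attention_mask,
            output_all_encoded_layers=False,
        )
        # One score per token per tag (e.g. 2 tags for IO, more for BIO).
        return self.classifier(encoded_layer)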
On the test set, we got:
We used the BIO format here (see the tagging sketch after the table below). The overall F1 score on the test set was 0.4981.
          Precision   Recall   F1-score   Support
Process   0.4734      0.5207   0.4959     870
Material  0.4958      0.6617   0.5669     807
Task      0.2125      0.2537   0.2313     201
Avg       0.4551      0.5527   0.4981     1878
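To make the IO/BIO distinction concrete, here is a small illustrative example; the sentence, span, and type assignment are made up for illustration and are not from the dataset.

# Toy example contrasting the IO scheme (boundary identification only)
# with the BIO scheme used for keyphrase classification.
tokens = ["We", "train", "a", "convolutional", "neural", "network", "."]

# IO: each token is either Inside a keyphrase (I) or Outside (O).
io_tags = ["O", "O", "O", "I", "I", "I", "O"]

# BIO: the first token of a span is B(egin), later tokens are I(nside),
# and the keyphrase type (Process / Material / Task) is appended.
bio_tags = ["O", "O", "O", "B-Process", "I-Process", "I-Process", "O"]

for tok, io, bio in zip(tokens, io_tags, bio_tags):
    print(f"{tok:15s} {io:3s} {bio}")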