PyKEEN (Python KnowlEdge EmbeddiNgs) is a Python package designed to train and evaluate knowledge graph embedding models (incorporating multi-modal information).
Installation • Quickstart • Datasets (37) • Inductive Datasets (5) • Models (40) • Support • Citation
The latest stable version of PyKEEN requires Python 3.9+. It can be downloaded and installed from PyPI with:
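```shell
pip install pykeen
```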
The latest version of PyKEEN can be installed directly from the source code on GitHub with:
```shell
pip install git+https://github.com/pykeen/pykeen.git
```
More information about installation (e.g., development mode, Windows installation, Colab, Kaggle, extras) can be found in the installation documentation.
Quickstart

This example shows how to train a model on a dataset and evaluate it on a held-out test set.
The fastest way to get up and running is to use the pipeline function. It provides a high-level entry into the extensible functionality of this package. The following example shows how to train and evaluate the TransE model on the Nations dataset. By default, the training loop uses the stochastic local closed world assumption (sLCWA) training approach and evaluates with rank-based evaluation.
```python
from pykeen.pipeline import pipeline

result = pipeline(
    model='TransE',
    dataset='nations',
)
```
The results are returned in an instance of the PipelineResult dataclass that has attributes for the trained model, the training loop, the evaluation, and more. See the tutorials on using your own dataset, understanding the evaluation, and making novel link predictions.
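For example, a minimal sketch of inspecting the metrics and saving the results (the output directory name is arbitrary):

```python
from pykeen.pipeline import pipeline

result = pipeline(model='TransE', dataset='nations')

# The trained model instance
model = result.model

# Rank-based evaluation results, e.g., Hits@10 on the test split
print(result.metric_results.get_metric('hits@10'))

# Persist the trained model, metadata, and metrics (directory name is arbitrary)
result.save_to_directory('nations_transe')
```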
PyKEEN is extensible such that:

- Each model has the same API, so anything from pykeen.models can be dropped in
- Each training loop has the same API, so pykeen.training.LCWATrainingLoop can be dropped in place of pykeen.training.SLCWATrainingLoop
- Triples factories can be generated by the user with pykeen.triples.TriplesFactory
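As an illustration, the sketch below passes a model class instead of a string and swaps the default sLCWA training loop for LCWA, assuming the training loop can be selected with the string 'LCWA':

```python
from pykeen.models import TransE
from pykeen.pipeline import pipeline

# Pass the model class directly instead of its name; any other model from
# pykeen.models could be dropped in here, and the training loop is swapped
# by name in the same call.
result = pipeline(
    model=TransE,
    dataset='nations',
    training_loop='LCWA',
)
```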
The full documentation can be found at https://pykeen.readthedocs.io.
Below are the models, datasets, training modes, evaluators, and metrics implemented in pykeen.
The following 37 datasets are built into PyKEEN. The citation for each dataset corresponds to either the paper describing the dataset, the first paper published using the dataset with knowledge graph embedding models, or the URL for the dataset if neither of the first two is available. If you want to use a custom dataset, see the Bring Your Own Dataset tutorial. If you have a suggestion for another dataset to include in PyKEEN, please let us know here.
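As a preview of that tutorial, here is a minimal sketch of loading a custom TSV file of (head, relation, tail) triples; the file path and split ratios are illustrative only:

```python
from pykeen.pipeline import pipeline
from pykeen.triples import TriplesFactory

# Load labeled triples from a TSV file with one head/relation/tail triple per line
# (the path is a placeholder for your own file).
tf = TriplesFactory.from_path('my_triples.tsv')

# Randomly split into training/testing/validation factories.
training, testing, validation = tf.split([0.8, 0.1, 0.1])

result = pipeline(
    training=training,
    testing=testing,
    validation=validation,
    model='TransE',
)
```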
| Name | Documentation | Citation | Entities | Relations | Triples |
|------|---------------|----------|----------|-----------|---------|
| Aristo-v4 | pykeen.datasets.AristoV4 | Chen et al., 2021 | 42016 | 1593 | 279425 |
| BioKG | pykeen.datasets.BioKG | Walsh et al., 2019 | 105524 | 17 | 2067997 |
| Clinical Knowledge Graph | pykeen.datasets.CKG | Santos et al., 2020 | 7617419 | 11 | 26691525 |
| CN3l Family | pykeen.datasets.CN3l | Chen et al., 2017 | 3206 | 42 | 21777 |
| CoDEx (large) | pykeen.datasets.CoDExLarge | Safavi et al., 2020 | 77951 | 69 | 612437 |
| CoDEx (medium) | pykeen.datasets.CoDExMedium | Safavi et al., 2020 | 17050 | 51 | 206205 |
| CoDEx (small) | pykeen.datasets.CoDExSmall | Safavi et al., 2020 | 2034 | 42 | 36543 |
| ConceptNet | pykeen.datasets.ConceptNet | Speer et al., 2017 | 28370083 | 50 | 34074917 |
| Countries | pykeen.datasets.Countries | Bouchard et al., 2015 | 271 | 2 | 1158 |
| Commonsense Knowledge Graph | pykeen.datasets.CSKG | Ilievski et al., 2020 | 2087833 | 58 | 4598728 |
| DB100K | pykeen.datasets.DB100K | Ding et al., 2018 | 99604 | 470 | 697479 |
| DBpedia50 | pykeen.datasets.DBpedia50 | Shi et al., 2017 | 24624 | 351 | 34421 |
| Drug Repositioning Knowledge Graph | pykeen.datasets.DRKG | gnn4dr/DRKG | 97238 | 107 | 5874257 |
| FB15k | pykeen.datasets.FB15k | Bordes et al., 2013 | 14951 | 1345 | 592213 |
| FB15k-237 | pykeen.datasets.FB15k237 | Toutanova et al., 2015 | 14505 | 237 | 310079 |
| Global Biotic Interactions | pykeen.datasets.Globi | Poelen et al., 2014 | 404207 | 39 | 1966385 |
| Hetionet | pykeen.datasets.Hetionet | Himmelstein et al., 2017 | 45158 | 24 | 2250197 |
| Kinships | pykeen.datasets.Kinships | Kemp et al., 2006 | 104 | 25 | 10686 |
| Nations | pykeen.datasets.Nations | ZhenfengLei/KGDatasets | 14 | 55 | 1992 |
| NationsL | pykeen.datasets.NationsLiteral | pykeen/pykeen | 14 | 55 | 1992 |
| OGB BioKG | pykeen.datasets.OGBBioKG | Hu et al., 2020 | 93773 | 51 | 5088434 |
| OGB WikiKG2 | pykeen.datasets.OGBWikiKG2 | Hu et al., 2020 | 2500604 | 535 | 17137181 |
| OpenBioLink | pykeen.datasets.OpenBioLink | Breit et al., 2020 | 180992 | 28 | 4563407 |
| OpenBioLink LQ | pykeen.datasets.OpenBioLinkLQ | Breit et al., 2020 | 480876 | 32 | 27320889 |
| OpenEA Family | pykeen.datasets.OpenEA | Sun et al., 2020 | 15000 | 248 | 38265 |
| PharMeBINet | pykeen.datasets.PharMeBINet | Königs et al., 2022 | 2869407 | 208 | 15883653 |
| PharmKG | pykeen.datasets.PharmKG | Zheng et al., 2020 | 188296 | 39 | 1093236 |
| PharmKG8k | pykeen.datasets.PharmKG8k | Zheng et al., 2020 | 7247 | 28 | 485787 |
| PrimeKG | pykeen.datasets.PrimeKG | Chandak et al., 2022 | 129375 | 30 | 8100498 |
| Unified Medical Language System | pykeen.datasets.UMLS | ZhenfengLei/KGDatasets | 135 | 46 | 6529 |
| WD50K (triples) | pykeen.datasets.WD50KT | Galkin et al., 2020 | 40107 | 473 | 232344 |
| Wikidata5M | pykeen.datasets.Wikidata5M | Wang et al., 2019 | 4594149 | 822 | 20624239 |
| WK3l-120k Family | pykeen.datasets.WK3l120k | Chen et al., 2017 | 119748 | 3109 | 1375406 |
| WK3l-15k Family | pykeen.datasets.WK3l15k | Chen et al., 2017 | 15126 | 1841 | 209041 |
| WordNet-18 | pykeen.datasets.WN18 | Bordes et al., 2014 | 40943 | 18 | 151442 |
| WordNet-18 (RR) | pykeen.datasets.WN18RR | Toutanova et al., 2015 | 40559 | 11 | 92583 |
| YAGO3-10 | pykeen.datasets.YAGO310 | Mahdisoltani et al., 2015 | 123143 | 37 | 1089000 |
The following 5 inductive datasets are built into PyKEEN.
The following 22 representations are implemented by PyKEEN.
The following 34 interactions are implemented by PyKEEN.
| Name | Reference | Citation |
|------|-----------|----------|
| AutoSF | pykeen.nn.AutoSFInteraction | Zhang et al., 2020 |
| BoxE | pykeen.nn.BoxEInteraction | Abboud et al., 2020 |
| ComplEx | pykeen.nn.ComplExInteraction | Trouillon et al., 2016 |
| ConvE | pykeen.nn.ConvEInteraction | Dettmers et al., 2018 |
| ConvKB | pykeen.nn.ConvKBInteraction | Nguyen et al., 2018 |
| Canonical Tensor Decomposition | pykeen.nn.CPInteraction | Lacroix et al., 2018 |
| CrossE | pykeen.nn.CrossEInteraction | Zhang et al., 2019 |
| DistMA | pykeen.nn.DistMAInteraction | Shi et al., 2019 |
| DistMult | pykeen.nn.DistMultInteraction | Yang et al., 2014 |
| ER-MLP | pykeen.nn.ERMLPInteraction | Dong et al., 2014 |
| ER-MLP (E) | pykeen.nn.ERMLPEInteraction | Sharifzadeh et al., 2019 |
| HolE | pykeen.nn.HolEInteraction | Nickel et al., 2016 |
| KG2E | pykeen.nn.KG2EInteraction | He et al., 2015 |
| LineaRE | pykeen.nn.LineaREInteraction | Peng et al., 2020 |
| MultiLinearTucker | pykeen.nn.MultiLinearTuckerInteraction | Tucker et al., 1966 |
| MuRE | pykeen.nn.MuREInteraction | Balažević et al., 2019 |
| NTN | pykeen.nn.NTNInteraction | Socher et al., 2013 |
| PairRE | pykeen.nn.PairREInteraction | Chao et al., 2020 |
| ProjE | pykeen.nn.ProjEInteraction | Shi et al., 2017 |
| QuatE | pykeen.nn.QuatEInteraction | Zhang et al., 2019 |
| RESCAL | pykeen.nn.RESCALInteraction | Nickel et al., 2011 |
| RotatE | pykeen.nn.RotatEInteraction | Sun et al., 2019 |
| Structured Embedding | pykeen.nn.SEInteraction | Bordes et al., 2011 |
| SimplE | pykeen.nn.SimplEInteraction | Kazemi et al., 2018 |
| TorusE | pykeen.nn.TorusEInteraction | Ebisu et al., 2018 |
| TransD | pykeen.nn.TransDInteraction | Ji et al., 2015 |
| TransE | pykeen.nn.TransEInteraction | Bordes et al., 2013 |
| TransF | pykeen.nn.TransFInteraction | Feng et al., 2016 |
| Transformer | pykeen.nn.TransformerInteraction | Galkin et al., 2020 |
| TransH | pykeen.nn.TransHInteraction | Wang et al., 2014 |
| TransR | pykeen.nn.TransRInteraction | Lin et al., 2015 |
| TripleRE | pykeen.nn.TripleREInteraction | Yu et al., 2021 |
| TuckER | pykeen.nn.TuckERInteraction | Balažević et al., 2019 |
| Unstructured Model | pykeen.nn.UMInteraction | Bordes et al., 2014 |
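These interaction functions can also be used on their own as scoring modules over plain tensors. A minimal sketch, assuming the interaction can be called directly on head/relation/tail representations whose last dimension is the embedding dimension:

```python
import torch
from pykeen.nn import TransEInteraction

# L2-norm TransE interaction: -||h + r - t||_2
interaction = TransEInteraction(p=2)

# A batch of 5 random head/relation/tail representations of dimension 64
h, r, t = torch.rand(3, 5, 64)

scores = interaction(h, r, t)  # one score per (h, r, t) triple in the batch
print(scores.shape)
```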
The following 40 models are implemented by PyKEEN.
| Name | Model | Citation |
|------|-------|----------|
| AutoSF | pykeen.models.AutoSF | Zhang et al., 2020 |
| BoxE | pykeen.models.BoxE | Abboud et al., 2020 |
| Canonical Tensor Decomposition | pykeen.models.CP | Lacroix et al., 2018 |
| CompGCN | pykeen.models.CompGCN | Vashishth et al., 2020 |
| ComplEx | pykeen.models.ComplEx | Trouillon et al., 2016 |
| ComplEx Literal | pykeen.models.ComplExLiteral | Kristiadi et al., 2018 |
| ConvE | pykeen.models.ConvE | Dettmers et al., 2018 |
| ConvKB | pykeen.models.ConvKB | Nguyen et al., 2018 |
| CooccurrenceFiltered | pykeen.models.CooccurrenceFilteredModel | Berrendorf et al., 2022 |
| CrossE | pykeen.models.CrossE | Zhang et al., 2019 |
| DistMA | pykeen.models.DistMA | Shi et al., 2019 |
| DistMult | pykeen.models.DistMult | Yang et al., 2014 |
| DistMult Literal | pykeen.models.DistMultLiteral | Kristiadi et al., 2018 |
| DistMult Literal (Gated) | pykeen.models.DistMultLiteralGated | Kristiadi et al., 2018 |
| ER-MLP | pykeen.models.ERMLP | Dong et al., 2014 |
| ER-MLP (E) | pykeen.models.ERMLPE | Sharifzadeh et al., 2019 |
| Fixed Model | pykeen.models.FixedModel | Berrendorf et al., 2021 |
| HolE | pykeen.models.HolE | Nickel et al., 2016 |
| InductiveNodePiece | pykeen.models.InductiveNodePiece | Galkin et al., 2021 |
| InductiveNodePieceGNN | pykeen.models.InductiveNodePieceGNN | Galkin et al., 2021 |
| KG2E | pykeen.models.KG2E | He et al., 2015 |
| MuRE | pykeen.models.MuRE | Balažević et al., 2019 |
| NTN | pykeen.models.NTN | Socher et al., 2013 |
| NodePiece | pykeen.models.NodePiece | Galkin et al., 2021 |
| PairRE | pykeen.models.PairRE | Chao et al., 2020 |
| ProjE | pykeen.models.ProjE | Shi et al., 2017 |
| QuatE | pykeen.models.QuatE | Zhang et al., 2019 |
| R-GCN | pykeen.models.RGCN | Schlichtkrull et al., 2018 |
| RESCAL | pykeen.models.RESCAL | Nickel et al., 2011 |
| RotatE | pykeen.models.RotatE | Sun et al., 2019 |
| SimplE | pykeen.models.SimplE | Kazemi et al., 2018 |
| Structured Embedding | pykeen.models.SE | Bordes et al., 2011 |
| TorusE | pykeen.models.TorusE | Ebisu et al., 2018 |
| TransD | pykeen.models.TransD | Ji et al., 2015 |
| TransE | pykeen.models.TransE | Bordes et al., 2013 |
| TransF | pykeen.models.TransF | Feng et al., 2016 |
| TransH | pykeen.models.TransH | Wang et al., 2014 |
| TransR | pykeen.models.TransR | Lin et al., 2015 |
| TuckER | pykeen.models.TuckER | Balažević et al., 2019 |
| Unstructured Model | pykeen.models.UM | Bordes et al., 2014 |
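Each model in the table can be selected by name in the pipeline, with hyper-parameters passed through model_kwargs; a minimal sketch (the embedding dimension is an arbitrary choice):

```python
from pykeen.pipeline import pipeline

# Select a model from the table by name and override one of its hyper-parameters.
result = pipeline(
    model='RotatE',
    model_kwargs=dict(embedding_dim=128),
    dataset='nations',
)
```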
The following 15 losses are implemented by PyKEEN.
The following 6 regularizers are implemented by PyKEEN.
The following 3 training loops are implemented in PyKEEN.
The following 3 negative samplers are implemented in PyKEEN.
The following 2 stoppers are implemented in PyKEEN.
The following 5 evaluators are implemented in PyKEEN.
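Losses, regularizers, training loops, negative samplers, stoppers, and evaluators are all selected the same way, by name or by class, through the corresponding pipeline arguments; a minimal sketch with commonly used settings (the specific choices here are illustrative):

```python
from pykeen.pipeline import pipeline

result = pipeline(
    model='TransE',
    dataset='nations',
    loss='marginranking',
    training_loop='slcwa',
    negative_sampler='basic',
    stopper='early',                           # early stopping on the validation set
    stopper_kwargs=dict(frequency=5, patience=2),
)
```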
The following 44 metrics are implemented in PyKEEN.
| Name | Interval | Direction | Description | Type |
|------|----------|-----------|-------------|------|
| Accuracy | $[0, 1]$ | 📈 | The ratio of the number of correct classifications to the total number. | Classification |
| Area Under The Receiver Operating Characteristic Curve | $[0, 1]$ | 📈 | The area under the receiver operating characteristic curve. | Classification |
| Average Precision Score | $[0, 1]$ | 📈 | The average precision across different thresholds. | Classification |
| Balanced Accuracy Score | $[0, 1]$ | 📈 | The average of recall obtained on each class. | Classification |
| Diagnostic Odds Ratio | $[0, ∞)$ | 📈 | The ratio of positive and negative likelihood ratio. | Classification |
| F1 Score | $[0, 1]$ | 📈 | The harmonic mean of precision and recall. | Classification |
| False Discovery Rate | $[0, 1]$ | 📉 | The proportion of predicted positives which are true negative. | Classification |
| False Negative Rate | $[0, 1]$ | 📉 | The probability that a truly positive triple is predicted negative. | Classification |
| False Omission Rate | $[0, 1]$ | 📉 | The proportion of predicted negatives which are true positive. | Classification |
| False Positive Rate | $[0, 1]$ | 📉 | The probability that a truly negative triple is predicted positive. | Classification |
| Fowlkes Mallows Index | $[0, 1]$ | 📈 | The Fowlkes Mallows index. | Classification |
| Informedness | $[-1, 1]$ | 📈 | The informedness metric. | Classification |
| Matthews Correlation Coefficient | $[-1, 1]$ | 📈 | The Matthews Correlation Coefficient (MCC). | Classification |
| Negative Likelihood Ratio | $[0, ∞)$ | 📉 | The ratio of false negative rate to true negative rate. | Classification |
| Negative Predictive Value | $[0, 1]$ | 📈 | The proportion of predicted negatives which are true negatives. | Classification |
| Number of Scores | $[0, ∞)$ | 📈 | The number of scores. | Classification |
| Positive Likelihood Ratio | $[0, ∞)$ | 📈 | The ratio of true positive rate to false positive rate. | Classification |
| Positive Predictive Value | $[0, 1]$ | 📈 | The proportion of predicted positives which are true positive. | Classification |
| Prevalence Threshold | $[0, ∞)$ | 📉 | The prevalence threshold. | Classification |
| Threat Score | $[0, 1]$ | 📈 | The ratio of true positives to the sum of true positives, false negatives, and false positives. | Classification |
| True Negative Rate | $[0, 1]$ | 📈 | The probability that a truly negative triple is predicted negative. | Classification |
| True Positive Rate | $[0, 1]$ | 📈 | The probability that a truly positive triple is predicted positive. | Classification |
| Adjusted Arithmetic Mean Rank (AAMR) | $[0, 2)$ | 📉 | The mean over all ranks divided by its expected value. | Ranking |
| Adjusted Arithmetic Mean Rank Index (AAMRI) | $[-1, 1]$ | 📈 | The re-indexed adjusted mean rank (AAMR). | Ranking |
| Adjusted Geometric Mean Rank Index (AGMRI) | $(\frac{-E[f]}{1-E[f]}, 1]$ | 📈 | The re-indexed adjusted geometric mean rank (AGMRI). | Ranking |
| Adjusted Hits at K | $(\frac{-E[f]}{1-E[f]}, 1]$ | 📈 | The re-indexed adjusted hits at K. | Ranking |
| Adjusted Inverse Harmonic Mean Rank | $(\frac{-E[f]}{1-E[f]}, 1]$ | 📈 | The re-indexed adjusted MRR. | Ranking |
| Geometric Mean Rank (GMR) | $[1, ∞)$ | 📉 | The geometric mean over all ranks. | Ranking |
| Harmonic Mean Rank (HMR) | $[1, ∞)$ | 📉 | The harmonic mean over all ranks. | Ranking |
| Hits @ K | $[0, 1]$ | 📈 | The relative frequency of ranks not larger than a given k. | Ranking |
| Inverse Arithmetic Mean Rank (IAMR) | $(0, 1]$ | 📈 | The inverse of the arithmetic mean over all ranks. | Ranking |
| Inverse Geometric Mean Rank (IGMR) | $(0, 1]$ | 📈 | The inverse of the geometric mean over all ranks. | Ranking |
| Inverse Median Rank | $(0, 1]$ | 📈 | The inverse of the median over all ranks. | Ranking |
| Mean Rank (MR) | $[1, ∞)$ | 📉 | The arithmetic mean over all ranks. | Ranking |
| Mean Reciprocal Rank (MRR) | $(0, 1]$ | 📈 | The inverse of the harmonic mean over all ranks. | Ranking |
| Median Rank | $[1, ∞)$ | 📉 | The median over all ranks. | Ranking |
| z-Geometric Mean Rank (zGMR) | $(-∞, ∞)$ | 📈 | The z-scored geometric mean rank. | Ranking |
| z-Hits at K | $(-∞, ∞)$ | 📈 | The z-scored hits at K. | Ranking |
| z-Mean Rank (zMR) | $(-∞, ∞)$ | 📈 | The z-scored mean rank. | Ranking |
| z-Mean Reciprocal Rank (zMRR) | $(-∞, ∞)$ | 📈 | The z-scored mean reciprocal rank. | Ranking |

The following 8 trackers are implemented in PyKEEN.
PyKEEN includes a set of curated experimental settings for reproducing past landmark experiments. They can be accessed and run like:
```shell
pykeen experiments reproduce tucker balazevic2019 fb15k
```
The three arguments are the model name, the reference, and the dataset. The output directory can optionally be set with -d.
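For example, with an arbitrary output directory:

```shell
pykeen experiments reproduce tucker balazevic2019 fb15k -d ~/pykeen_experiments
```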
PyKEEN includes the ability to specify ablation studies using the hyper-parameter optimization module. They can be run like:
```shell
pykeen experiments ablation ~/path/to/config.json
```
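The ablation configuration drives the same hyper-parameter optimization machinery that is also available from Python through pykeen.hpo.hpo_pipeline; a minimal sketch (the trial budget and output directory are arbitrary):

```python
from pykeen.hpo import hpo_pipeline

# Run a small hyper-parameter optimization study over TransE on Nations.
hpo_result = hpo_pipeline(
    n_trials=30,
    dataset='nations',
    model='TransE',
)
hpo_result.save_to_directory('hpo_transe_nations')  # arbitrary output directory
```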
Large-scale Reproducibility and Benchmarking Study

We used PyKEEN to perform a large-scale reproducibility and benchmarking study, which is described in our article:
```bibtex
@article{ali2020benchmarking,
  author  = {Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Galkin, Mikhail and Sharifzadeh, Sahand and Fischer, Asja and Tresp, Volker and Lehmann, Jens},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title   = {Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models under a Unified Framework},
  year    = {2021},
  pages   = {1--1},
  doi     = {10.1109/TPAMI.2021.3124805},
}
```
We have made all code, experimental configurations, results, and analyses that led to our interpretations available at https://github.com/pykeen/benchmarking.
Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.
If you have questions, please use the GitHub discussions feature at https://github.com/pykeen/pykeen/discussions/new.
This project has been supported by several organizations (in alphabetical order):
The development of PyKEEN has been funded by the following grants:
The PyKEEN logo was designed by Carina Steinborn.
If you have found PyKEEN useful in your work, please consider citing our article:
```bibtex
@article{ali2021pykeen,
  author  = {Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Sharifzadeh, Sahand and Tresp, Volker and Lehmann, Jens},
  journal = {Journal of Machine Learning Research},
  number  = {82},
  pages   = {1--6},
  title   = {{PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings}},
  url     = {http://jmlr.org/papers/v22/20-825.html},
  volume  = {22},
  year    = {2021},
}
```