Python library for knowledge graph embedding and representation learning.

Pykg2vec: Python Library for KGE Methods

Pykg2vec is a library for learning the representations of entities and relations in Knowledge Graphs, built on top of TensorFlow 2.1. We have attempted to bring state-of-the-art Knowledge Graph Embedding (KGE) algorithms and the necessary building blocks of the knowledge graph embedding pipeline into a single library. We hope Pykg2vec is both practical and educational for people who want to explore the related fields. For beginners, the papers A Review of Relational Machine Learning for Knowledge Graphs, Knowledge Graph Embedding: A Survey of Approaches and Applications, and An overview of embedding models of entities and relationships for knowledge base completion can be good starting points! Pykg2vec has the following features:

  • Supports state-of-the-art KGE model implementations and benchmark datasets (custom datasets are also supported).
  • Supports automatic hyperparameter discovery.
  • Tools for inspecting the learned embeddings:
    • Export of the learned embeddings in TSV or Pandas-supported formats.
    • An interactive result inspector.
    • t-SNE-based visualization and KPI summary visualization (mean rank, hit ratio) in various formats (CSV files, figures, LaTeX tables).

The documentation is here.

We welcome any form of contribution! Please check here for more details.

Repository Structure

  • pykg2vec/config: This folder contains the configuration module. It provides the configuration needed to parse the datasets and also contains the baseline hyperparameters for the knowledge graph embedding algorithms.
  • pykg2vec/core: This folder contains the core code of the knowledge graph embedding algorithms. Each algorithm is implemented as a separate Python module inside this folder.
  • pykg2vec/utils: This folder contains modules providing various utilities, such as data preparation, data generators, data visualization, evaluation of the algorithms, and the Bayesian optimizer.
  • pykg2vec/example: This folder contains example code that can be used to run individual modules, run all the modules at once, or tune a model.

To Get Started

Pykg2vec aims to minimize its dependencies on other libraries as far as possible so that the algorithms can be tested rapidly against different datasets. For now, we are not focusing on run-time performance in pykg2vec. However, TensorFlow 2 natively supports using the GPUs available on your device! Please see the guide here to install TensorFlow through pip. In the future, we may provide faster implementations of each algorithm (C++ implementations to come!).

Before using pykg2vec, we strongly recommend setting up a virtual environment (venv or Anaconda) with the following packages installed:

  • Python >= 3.6
  • tensorflow==2.1.0

Three ways to install pykg2vec are described as follows.

# Install pykg2vec from PyPI:
$ pip install pykg2vec

# (Suggested!) Install the stable version directly from the GitHub repo:
$ git clone
$ cd pykg2vec
$ python install

# Install the development version directly from the GitHub repo:
$ git clone
$ cd pykg2vec
$ git checkout development
$ python install

Usage Examples

1. Running a single algorithm:

import sys

from pykg2vec.utils.kgcontroller import KnowledgeGraph
from pykg2vec.config.config import Importer, KGEArgParser
from pykg2vec.utils.trainer import Trainer

def main():
    # Get the customized configurations from the command-line arguments.
    args = KGEArgParser().get_args(sys.argv[1:])

    # Prepare the data and cache it for later use.
    knowledge_graph = KnowledgeGraph(dataset=args.dataset_name, negative_sample=args.sampling, custom_dataset_path=args.dataset_path)
    knowledge_graph.prepare_data()

    # Extract the corresponding model config and definition from Importer().
    config_def, model_def = Importer().import_model_config(args.model_name.lower())
    config = config_def(args=args)
    model = model_def(config)

    # Create, compile, and train the model. Several evaluations are performed during training.
    trainer = Trainer(model=model)
    trainer.build_model()
    trainer.train_model()

if __name__ == "__main__":
    main()

With this script, you can try KGE methods using the following commands:

# Check all tunable parameters.
$ python -h 

# Train TransE on FB15k benchmark dataset.
$ python -mn TransE

# Train using different KGE methods.
$ python -mn [TransE|TransD|TransH|TransG|TransM|TransR|Complex|Complexn3|CP|RotatE|Analogy|

# For KGE methods using a projection-based loss function, use more processes for batch generation.
$ python -mn [ConvE|ConvKB|Proje_pointwise] -npg [the number of processes, 4 or 6]

# Train TransE model using different benchmark datasets.
$ python -mn TransE -ds [fb15k|wn18|wn18_rr|yago3_10|fb15k_237|

Pykg2vec aims to include most of the state-of-the-art KGE methods. You can check Implemented Algorithms for more details. Some models (Conv2D, TuckER) are still under development. To verify the correctness of the included KGE methods, we also train them with the hyperparameter settings from the original papers and check whether the results are consistent with those reported.

# Train KGE methods with the hyperparameters used in the original papers (only FB15k is supported).
$ python -mn [TransE|TransD|TransH|TransG|TransM|TransR|Complex|Complexn3|CP|RotatE|Analogy|
                       distmult|KG2E|KG2E_EL|NTN|Rescal|SLM|SME|SME_BL|HoLE|ConvE|ConvKB|Proje_pointwise] -exp true -ds fb15k

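Some metrics on the benchmark dataset (FB15k) are shown below (all filtered). MR is the mean rank, MRR the mean reciprocal rank, and Hit1 to Hit10 the fraction of test triples whose correct entity is ranked in the top 1, 3, 5, or 10. We are still working on this table, so it will be updated.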

Model      MR      MRR   Hit1  Hit3  Hit5  Hit10
TransE     69.52   0.38  0.23  0.46  0.56  0.66
TransH     77.60   0.32  0.16  0.41  0.51  0.62
TransR     128.31  0.30  0.18  0.36  0.43  0.54
TransD     57.73   0.33  0.19  0.39  0.48  0.60
KG2E_EL    64.76   0.31  0.16  0.39  0.49  0.61
Complex    96.74   0.65  0.54  0.74  0.78  0.82
DistMult   128.78  0.45  0.32  0.53  0.61  0.70
RotatE     48.69   0.74  0.67  0.80  0.82  0.86
SME_L      86.3    0.32  0.20  0.35  0.43  0.54
SLM_BL     112.65  0.29  0.18  0.32  0.39  0.50
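
For readers new to these metrics, the sketch below shows how MR, MRR, and Hit@k are commonly computed from the rank of the correct entity for each test triple. It assumes the ranks are already filtered (other known true triples removed before ranking) and uses hypothetical rank values; the function name `ranking_metrics` is illustrative and this is not pykg2vec's own evaluator.

```python
from typing import Dict, List, Sequence


def ranking_metrics(ranks: List[int], ks: Sequence[int] = (1, 3, 5, 10)) -> Dict[str, float]:
    """Compute MR, MRR and Hit@k from 1-based (filtered) ranks."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                    # mean rank: lower is better
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank: higher is better
    }
    for k in ks:
        # Fraction of test triples whose correct entity is ranked within the top k.
        metrics[f"Hit{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


if __name__ == "__main__":
    # Hypothetical filtered ranks of the correct tail for four test triples.
    example_ranks = [1, 2, 4, 20]
    for name, value in ranking_metrics(example_ranks).items():
        print(f"{name}: {value:.4f}")
```

For the ranks [1, 2, 4, 20], this gives MR = 6.75, MRR = 0.45, Hit1 = 0.25, Hit3 = 0.5, and Hit5 = Hit10 = 0.75. Note that a single badly ranked triple inflates MR much more than MRR, which is one reason both are reported.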

To use your own dataset, follow these steps:

  1. Store all triples in a text file, one triple per line, with the head entity, relation, and tail entity separated by tabs ("\t").
  2. Split the triples into three files named [name]-train.txt, [name]-valid.txt, and [name]-test.txt.
  3. Create a folder, [path_storing_text_files], to hold those three files.
  4. Once finished, you can train a specific model on the custom dataset using the command:
$ python -mn TransE -ds [name] -dsp [path_storing_text_files]
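
The preparation steps above can be sketched in Python. This assumes your triples are already in memory as (head, relation, tail) string tuples and that a random 80/10/10 split is acceptable; the helper name `write_custom_dataset`, the split fractions, and the seed are hypothetical choices, not part of pykg2vec.

```python
import os
import random
import tempfile
from typing import List, Tuple

Triple = Tuple[str, str, str]


def write_custom_dataset(triples: List[Triple], name: str, folder: str,
                         valid_frac: float = 0.1, test_frac: float = 0.1,
                         seed: int = 0) -> List[str]:
    """Hypothetical helper: shuffle the triples and write
    [name]-train.txt, [name]-valid.txt and [name]-test.txt into `folder`,
    one head<TAB>relation<TAB>tail triple per line."""
    for h, r, t in triples:
        # A tab or newline inside a field would break the one-triple-per-line format.
        if any(ch in field for field in (h, r, t) for ch in "\t\n"):
            raise ValueError(f"field contains a tab or newline: {(h, r, t)!r}")
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_frac)
    n_test = int(len(shuffled) * test_frac)
    splits = {
        "valid": shuffled[:n_valid],
        "test": shuffled[n_valid:n_valid + n_test],
        "train": shuffled[n_valid + n_test:],
    }
    os.makedirs(folder, exist_ok=True)
    paths = []
    for split, rows in splits.items():
        path = os.path.join(folder, f"{name}-{split}.txt")
        with open(path, "w", encoding="utf-8") as f:
            for h, r, t in rows:
                f.write(f"{h}\t{r}\t{t}\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Twenty toy triples -> 16 train, 2 valid, 2 test lines.
    toy = [(f"e{i}", "related_to", f"e{i + 1}") for i in range(20)]
    out_dir = tempfile.mkdtemp()
    for p in write_custom_dataset(toy, "mykg", out_dir):
        print(p)
```

A purely random split can leave some entities in the validation or test file that never appear in the training file, so you may want to check for that before training.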

2. Tuning a single algorithm:

import sys

from pykg2vec.config.hyperparams import KGETuneArgParser
from pykg2vec.utils.bayesian_optimizer import BaysOptimizer

def main():
    # Get the customized configurations from the command-line arguments.
    args = KGETuneArgParser().get_args(sys.argv[1:])

    # Initialize the Bayesian optimizer and prepare the data.
    bays_opt = BaysOptimizer(args=args)

    # Perform the golden hyperparameter tuning.
    bays_opt.optimize()

if __name__ == "__main__":
    main()

With this script, we can then tune an existing model using the following commands:

# Check all tunable parameters.
$ python -h 

# Tune [TransE model] using the [benchmark dataset].
$ python -mn [TransE] -ds [dataset name] 

We are still working on more convenient interfaces for this functionality. For now, if you want to search ranges other than the default ones, please look over the hyperparameter definitions in the configuration module (pykg2vec/config) and adjust them there. You can also tune hyperparameters on your own dataset by following the custom dataset instructions above.
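
To illustrate what adjusting a search range means, here is a minimal sketch of a search space and a search loop. It deliberately uses plain random search from the Python standard library, not pykg2vec's Bayesian optimizer, and the parameter names, ranges, and `toy_objective` are hypothetical stand-ins for a real train-and-evaluate run.

```python
import math
import random
from typing import Callable, Dict, Tuple

SpaceSpec = Dict[str, Tuple[str, float, float]]

# Hypothetical search space: parameter name -> (sampling kind, low, high).
SEARCH_SPACE: SpaceSpec = {
    "learning_rate": ("log", 1e-4, 1e-1),  # sampled log-uniformly
    "hidden_size": ("int", 32, 256),       # uniform integer, inclusive bounds
    "margin": ("float", 0.5, 5.0),         # uniform float
}


def sample(space: SpaceSpec, rng: random.Random) -> Dict[str, float]:
    """Draw one configuration from the search space."""
    params: Dict[str, float] = {}
    for name, (kind, low, high) in space.items():
        if kind == "log":
            params[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        elif kind == "int":
            params[name] = rng.randint(int(low), int(high))
        else:
            params[name] = rng.uniform(low, high)
    return params


def random_search(objective: Callable[[Dict[str, float]], float], space: SpaceSpec,
                  n_trials: int = 50, seed: int = 0):
    """Return (best_params, best_loss); lower loss is better."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = sample(space, rng)
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_objective(p: Dict[str, float]) -> float:
    # Stand-in for "train a model and return, e.g., the filtered mean rank".
    return (math.log10(p["learning_rate"]) + 2) ** 2 + abs(p["margin"] - 1.0)


if __name__ == "__main__":
    best, loss = random_search(toy_objective, SEARCH_SPACE, n_trials=100)
    print(best, loss)
```

Narrowing or widening the (low, high) pairs changes where the search looks; a Bayesian optimizer uses the same kind of space definition but picks the next configuration based on previous results instead of sampling blindly.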

3. Perform Inference Tasks (advanced):

import code
import sys

from pykg2vec.utils.kgcontroller import KnowledgeGraph
from pykg2vec.config.config import Importer, KGEArgParser
from pykg2vec.utils.trainer import Trainer

def main():
    # Get the customized configurations from the command-line arguments.
    args = KGEArgParser().get_args(sys.argv[1:])

    # Prepare the data and cache it for later use.
    knowledge_graph = KnowledgeGraph(dataset=args.dataset_name, negative_sample=args.sampling, custom_dataset_path=args.dataset_path)
    knowledge_graph.prepare_data()

    # Extract the corresponding model config and definition from Importer().
    config_def, model_def = Importer().import_model_config(args.model_name.lower())
    config = config_def(args=args)
    model = model_def(config)

    # Create, compile, and train the model. Several evaluations are performed during training.
    trainer = Trainer(model=model)
    trainer.build_model()
    trainer.train_model()

    # After training, open an interactive Python shell in which `trainer`
    # is available for manual inference (e.g. trainer.infer_tails).
    code.interact(local=locals())

if __name__ == "__main__":
    main()

For inference tasks, you can use the following commands:

$ python -mn TransE # train a model on the FB15k dataset and enter the interactive CMD for manual inference tasks.
$ python -mn TransE -ld true # pykg2vec will look for cached pretrained parameters on your local machine.

# Once interactive mode is reached, you can execute instructions manually, for example:
# Example 1: trainer.infer_tails(1,10,topk=5) => give the list of top-5 predicted tails.
# Example 2: trainer.infer_heads(10,20,topk=5) => give the list of top-5 predicted heads.
# Example 3: trainer.infer_rels(1,20,topk=5) => give the list of top-5 predicted relations.

You can use this script to inspect the training results and to perform manual inference tasks. For more details, please refer to the documentation.
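
Conceptually, a tail query such as trainer.infer_tails(1, 10, topk=5) scores every candidate tail for a given head and relation and returns the best-scoring ones. The sketch below illustrates that idea with the TransE distance ||h + r - t|| (L1 norm) on tiny hand-made embeddings. The embeddings and the function name `infer_tails_sketch` are hypothetical, and pykg2vec's own implementation (batching, other score functions, mapping IDs back to names) may differ.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def transe_distance(h: Vector, r: Vector, t: Vector) -> float:
    """L1 distance ||h + r - t||_1; a smaller value means a more plausible triple."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def infer_tails_sketch(head: int, rel: int,
                       entity_emb: Dict[int, Vector],
                       relation_emb: Dict[int, Vector],
                       topk: int = 5) -> List[Tuple[int, float]]:
    """Rank every entity as a candidate tail for the query (head, rel, ?)."""
    h, r = entity_emb[head], relation_emb[rel]
    scored = [(e, transe_distance(h, r, t)) for e, t in entity_emb.items()]
    scored.sort(key=lambda pair: pair[1])  # ascending distance: best candidates first
    return scored[:topk]


if __name__ == "__main__":
    # Hypothetical 2-D embeddings for four entities and one relation.
    entities = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (3.0, 3.0)}
    relations = {0: (1.0, 0.0)}
    print(infer_tails_sketch(0, 0, entities, relations, topk=2))
    # prints [(1, 0.0), (0, 1.0)]
```

Head and relation queries follow the same pattern: fix the other two elements of the triple, score every candidate head (or relation), and keep the top k. This toy version does not exclude the query entity itself or known true triples from the candidates.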

Common Installation Problems


Please kindly consider citing our paper if you find pykg2vec useful for your research.

Shih Yuan Yu, Sujit Rokka Chhetri, Arquimedes Canedo, Palash Goyal, and Mohammad Abdullah Al Faruque. Pykg2vec: A Python Library for Knowledge Graph Embedding. arXiv preprint arXiv:1906.04239, 2019.