pyKE¶
Welcome to pyKE’s documentation. In this documentation you can find all information about the project.
README¶
An open-source library for knowledge embedding, forked from github.com/thunlp/OpenKE. The original API was changed substantially to be more Pythonic.
Overview¶
pyKE is a [TensorFlow](http://www.tensorflow.org)-based implementation of knowledge representation learning (KRL). Underlying operations such as data preprocessing and negative sampling are implemented natively in C++. Each model is implemented in TensorFlow and exposed through a Python interface, which makes it convenient to run models on GPUs.
Installation¶
Clone repository and enter directory
git clone https://github.com/ifis-tu-bs/pyKE.git
cd pyKE
Install package
python setup.py install
Quickstart¶
To compute a knowledge graph embedding, first load a dataset and configure the training parameters, then train the model and export the results. The following example trains the TransE model on the FB15K dataset.
from pyke.dataset import Dataset
from pyke.embedding import Embedding
from pyke.models import TransE
# Read the dataset
dataset = Dataset("./benchmarks/fb15k.nt")
embedding = Embedding(
    dataset,
    TransE,
    folds=20,
    epochs=20,
    neg_ent=1,
    neg_rel=0,
    bern=False,
    workers=4,
    dimension=50,  # TransE-specific
    margin=1.0,  # TransE-specific
)
# Train the model. It is saved in the process.
embedding.train(prefix="./TransE", post_epoch=print)
# Save the embedding to a JSON file
embedding.save_to_json("TransE.json")
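The neg_ent argument in the example above relates to negative sampling, which pyKE performs in its C++ library. As a rough illustration only, and assuming that neg_ent is the number of entity-corrupted triples generated per training triple (the documentation does not spell this out), uniform sampling could be sketched in plain Python as follows. The helper corrupt_triple is hypothetical and not part of pyKE.

```python
import random


def corrupt_triple(triple, entity_count, known_triples, neg_ent=1, rng=random):
    """Return neg_ent corrupted copies of a (head, relation, tail) triple.

    Hypothetical helper, not part of pyKE: each copy replaces either the
    head or the tail with a uniformly drawn entity id, and candidates that
    are known true triples are rejected.
    """
    head, relation, tail = triple
    negatives = []
    while len(negatives) < neg_ent:
        candidate_entity = rng.randrange(entity_count)
        if rng.random() < 0.5:
            candidate = (candidate_entity, relation, tail)  # corrupt the head
        else:
            candidate = (head, relation, candidate_entity)  # corrupt the tail
        if candidate not in known_triples:
            negatives.append(candidate)
    return negatives


# Entities and relations are represented by integer ids.
known = {(0, 0, 1), (1, 0, 2)}
negatives = corrupt_triple(
    (0, 0, 1), entity_count=5, known_triples=known, neg_ent=3,
    rng=random.Random(0),
)
```

The relation is left unchanged in every corrupted triple, which would correspond to neg_rel=0 under the same assumption.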
Interfaces¶
The class pyke.embedding.Embedding represents an embedding, which requires a dataset and a model class. Load your dataset from an N-Triples file with the class pyke.dataset.Dataset.
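For readers unfamiliar with N-Triples, each line of such a file holds one statement of the form `<subject> <predicate> <object> .`. The sketch below shows how such lines could be turned into integer-indexed triples. It is not pyKE's parser: the function index_triples and the example.org IRIs are made up for illustration, and the pattern handles only IRI terms, not literals or blank nodes.

```python
import re

# One N-Triples statement per line: <subject> <predicate> <object> .
# This simplified pattern only accepts IRI terms.
TRIPLE_PATTERN = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def index_triples(lines):
    """Map IRIs to integer ids and return (triples, entity_ids, relation_ids)."""
    entity_ids, relation_ids, triples = {}, {}, []
    for line in lines:
        match = TRIPLE_PATTERN.match(line)
        if match is None:
            continue  # skip comments, blank lines and unsupported terms
        subject, predicate, obj = match.groups()
        # setdefault assigns the next free id the first time an IRI is seen.
        head = entity_ids.setdefault(subject, len(entity_ids))
        relation = relation_ids.setdefault(predicate, len(relation_ids))
        tail = entity_ids.setdefault(obj, len(entity_ids))
        triples.append((head, relation, tail))
    return triples, entity_ids, relation_ids


sample = [
    "<http://example.org/Berlin> <http://example.org/capitalOf> <http://example.org/Germany> .",
    "<http://example.org/Paris> <http://example.org/capitalOf> <http://example.org/France> .",
    "# a comment line",
]
triples, entities, relations = index_triples(sample)
```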
Models¶
The class pyke.models.base.BaseModel declares the methods that all implemented model classes share, including the loss function necessary for training (inserting information into the model) and prediction (retrieving information from the model). This project implements the following model classes:
- RESCAL
- TransE
- TransH
- TransR
- TransD
- HolE
- ComplEx
- DistMult
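To give an idea of what such a loss function looks like, here is a minimal plain-Python sketch of TransE as it is commonly formulated: a triple is scored by the distance between head + relation and tail, and training uses a margin-based ranking loss against a corrupted triple. This is not pyKE's TensorFlow code; the function names are illustrative, and the L1 norm and margin of 1.0 are chosen to mirror the defaults shown elsewhere in this documentation.

```python
def transe_score(head, relation, tail):
    """L1 distance between head + relation and tail (lower is more plausible)."""
    return sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def margin_loss(positive_score, negative_score, margin=1.0):
    """Ranking loss: zero once the negative scores worse by at least margin."""
    return max(0.0, margin + positive_score - negative_score)


# Toy two-dimensional embeddings.
head = [0.0, 1.0]
relation = [1.0, 0.0]
true_tail = [1.0, 1.0]
wrong_tail = [3.0, 1.0]

positive = transe_score(head, relation, true_tail)
negative = transe_score(head, relation, wrong_tail)
loss = margin_loss(positive, negative)
```

In this toy case the true triple is matched exactly and the corrupted one is farther away than the margin, so the loss is zero and the pair would contribute no gradient.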
Notes¶
The code inherited from the original project includes a C++ library, which is compiled when you use the project. Note that compilation is only supported on UNIX-based systems. In the future, the C++ library should be replaced by a Python library.
API reference¶
pyke package¶
Subpackages¶
pyke.models package¶
Submodules¶
pyke.models.ComplEx module¶
pyke.models.DistMult module¶
pyke.models.HolE module¶
pyke.models.RESCAL module¶
pyke.models.TransD module¶
pyke.models.TransE module¶
pyke.models.TransH module¶
pyke.models.TransR module¶
pyke.models.base module¶
class pyke.models.base.BaseModel(ent_count=None, rel_count=None, batch_size=0, variants=0, optimizer=None, norm_func=<function l1>, per_process_gpu_memory_fraction=0.5)
    Bases: object
    Properties and behaviour that different embedding models share.

    restore(prefix: str)
        Reads a model from the filesystem.
        Parameters: prefix – Prefix of the model to load
Module contents¶
Submodules¶
pyke.dataset module¶
pyke.embedding module¶
pyke.library module¶
class pyke.library.Library
    Bases: object
    Manages the connection to the library.

    CPP_BASE = 'cpp_library/Base.cpp'

    MAKE_SCRIPT = 'cpp_library/make.sh'

    static compile_library(destination: str)
        Compiles the library to the path destination.
        Parameters: destination – path for the library

    static get_library(temp_dir: str = None, library_name: str = None)
        Returns the C++ library. The function compiles the library if it does not exist and then loads it.
        Parameters:
        - temp_dir – directory where the library is saved (optional)
        - library_name – filename of the library
        Returns: C++ library

    library = None

    library_name = 'pyke.so'

    static load_library(path: str)
        Loads the library from path.
        Parameters: path – path to the library (.so)

    temp_dir = '.pyke'
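The compile-if-missing behaviour described for get_library can be sketched as follows. This is an assumption about the general pattern, not pyKE's actual implementation: get_or_build_library and its build argument are hypothetical, and ctypes.CDLL is simply the standard-library way to open a shared library.

```python
import ctypes
import os


def get_or_build_library(path, build, load=ctypes.CDLL):
    """Build the shared library at path if it is missing, then load it.

    build is a hypothetical callable that takes the destination path and
    produces the library there (for example by running a make script).
    load defaults to ctypes.CDLL.
    """
    if not os.path.exists(path):
        directory = os.path.dirname(path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        build(path)
    return load(path)
```

Passing the build and load steps as arguments keeps the caching logic separate from the compiler invocation, so the function can be exercised without a C++ toolchain.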
pyke.norm module¶
pyke.parser module¶
pyke.utils module¶
pyke.utils.get_array_pointer(a)
    Returns the address of the numpy array.
    Parameters: a – Numpy array
    Returns: Memory address of the array
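Passing a raw memory address is how Python hands a buffer to C code without copying it. The sketch below illustrates the idea with the standard-library array module instead of NumPy, so that it runs without third-party packages; it is an analogy, not the implementation of get_array_pointer. For a NumPy array, the corresponding address is available as a.ctypes.data.

```python
import array
import ctypes

# A contiguous buffer of C floats, comparable to a one-dimensional float32 array.
values = array.array("f", [1.0, 2.0, 3.0])

# buffer_info() returns the memory address and the number of elements.
address, length = values.buffer_info()

# Reinterpret the address as a float pointer, as C code receiving it would.
pointer = ctypes.cast(address, ctypes.POINTER(ctypes.c_float))
read_back = [pointer[i] for i in range(length)]
```

The address is only valid while the array is alive and not resized, so the Python object must be kept referenced for as long as native code uses the pointer.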