The (Canonical) CitationExtractor

Todo

Insert one liner description.

About

TODO.

Installation

Installing TreeTagger

With pip

Overview

Named Entity Disambiguation

Todo

Brief explanation of what it means to disambiguate a reference.

Citation Matchers

Candidate Generation

Feature Extraction

Running the extraction pipeline

There are two ways of running the citation extraction pipeline:

  1. by using the command line interface
  2. by calling its API directly

Command Line Interface

API

Input/Output

IOB/CONLL

Functions to deal with input/output of data in CONLL/IOB format.

citation_extractor.io.iob.count_tokens(instances)

Short summary.

Parameters:instances (type) – Description of parameter instances.
Returns:Description of returned object.
Return type:type
citation_extractor.io.iob.file_to_instances(inp_file)

Reads an IOB file and converts it into a list of instances.

Parameters:inp_file (type) – Path to the input IOB file.
Returns:A list of tuples, where tuple[0] is the token and tuple[1] contains its assigned label.
Return type:list
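
To make the shape of the returned data concrete, here is a minimal, self-contained sketch that does not call the library. It assumes tab- or space-separated columns with the token first and the IOB label last, and blank lines between instances; it also assumes each instance is itself a list of (token, label) tuples. `parse_iob_lines` and the label names in the sample are hypothetical and only illustrate the idea; the actual file layout handled by `file_to_instances` may differ.

```python
def parse_iob_lines(lines):
    """Group IOB lines into instances (hypothetical helper, not the library's code)."""
    instances, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current instance.
            if current:
                instances.append(current)
                current = []
            continue
        columns = line.split()
        # Assumption: first column is the token, last column is the IOB label.
        current.append((columns[0], columns[-1]))
    if current:
        # Flush the last instance when the input does not end with a blank line.
        instances.append(current)
    return instances


# Illustrative input; the label names are made up for this example.
sample = [
    "Hom.\tB-AAUTHOR",
    "Il.\tB-AWORK",
    "1.1\tB-REFSCOPE",
    "",
    "is\tO",
]
instances = parse_iob_lines(sample)
```

In this sketch, `instances[0][0]` is the tuple for the first token of the first instance, so `instances[0][0][0]` is the token and `instances[0][0][1]` its label.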
citation_extractor.io.iob.filter_IOB(instances, tag_name)

docstring for filter_IOB

citation_extractor.io.iob.instance_contains_label(instance, labels=['O'])

TODO:

citation_extractor.io.iob.instance_to_string(instance)

Converts a feature dictionary into a string representation.

Parameters:instance (type) – Description of parameter instance.
Returns:Description of returned object.
Return type:type
citation_extractor.io.iob.write_iob_file(instances, dest_file)

Write a set of instances to an IOB file.

Parameters:
  • instances (list) – Description of parameter instances.
  • dest_file (str) – Description of parameter dest_file.
Returns:Description of returned object.
Return type:boolean
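
As a rough illustration of what writing instances to an IOB file can look like, the sketch below is a self-contained stand-in that does not call the library. It assumes each instance is a list of (token, label) tuples, writes one tab-separated token/label pair per line, and separates instances with a blank line. `dump_iob`, the sample labels, and the choice to return `False` on an I/O error are all assumptions made for this example, not the documented behaviour of `write_iob_file`.

```python
import os
import tempfile


def dump_iob(instances, dest_path):
    """Write instances as tab-separated token/label lines (hypothetical stand-in)."""
    try:
        with open(dest_path, "w", encoding="utf-8") as handle:
            for instance in instances:
                for token, label in instance:
                    handle.write(f"{token}\t{label}\n")
                # A blank line marks the end of an instance.
                handle.write("\n")
        return True
    except OSError:
        # Assumption: report failure with a boolean instead of raising.
        return False


# Illustrative data; the label names are made up for this example.
instances = [[("Hom.", "B-AAUTHOR"), ("Il.", "B-AWORK")], [("is", "O")]]
dest_path = os.path.join(tempfile.mkdtemp(), "sample.iob")
written = dump_iob(instances, dest_path)

# Read the file back to inspect what was written.
with open(dest_path, encoding="utf-8") as handle:
    content = handle.read()
```

Returning a boolean mirrors the documented return type; whether the real function signals errors this way is not stated in the documentation.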

Brat standoff format

Apache Uima XMI

Indices and tables