Welcome to TERNIP’s documentation!

Contents:

ternip Package

ternip Package

ternip.__init__.normaliser()[source]

Returns the default normaliser, already configured.

ternip.__init__.recogniser()[source]

Returns the default recogniser, already configured.

timex Module

class ternip.timex.Timex(type=None, value=None, id=None)[source]

Bases: object

A temporal expression

ternip.timex.add_timex_ids(ts)[source]

Goes through all the timexes in the set and assigns IDs to those that don't already have one. Each ID is an integer and is guaranteed to be unique within this set of timexes.
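For illustration, the assignment logic can be sketched as follows; this is a stand-alone sketch with a stand-in Timex class, not the library's actual implementation:

```python
class Timex:
    """Minimal stand-in for ternip.timex.Timex (illustrative only)."""
    def __init__(self, id=None):
        self.id = id

def add_timex_ids(ts):
    """Assign a unique integer ID to each timex that lacks one."""
    used = {t.id for t in ts if t.id is not None}
    next_id = 1
    for t in ts:
        if t.id is None:
            # Skip IDs already taken by pre-assigned timexes
            while next_id in used:
                next_id += 1
            t.id = next_id
            used.add(next_id)

timexes = [Timex(), Timex(id=2), Timex()]
add_timex_ids(timexes)
# IDs are now [1, 2, 3]: the pre-assigned 2 is kept, the rest filled in
```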

Subpackages

formats Package

formats Package
gate Module
class ternip.formats.gate.GateDocument(file)[source]

Bases: object

A class to facilitate communication with GATE

get_dct_sents()[source]

Returns the creation time sents for this document.

get_sents()[source]

Returns a representation of this document in the [[(word, pos, timexes), ...], ...] format.
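For illustration, a one-sentence document in this internal representation might look like the following; the POS tags and the timex placeholder string are illustrative only (in practice the third element holds Timex objects):

```python
# One sentence, tokenised and POS-tagged; the third element of each
# tuple is the set of timexes covering that token (empty if none).
sents = [
    [("We", "PRP", set()),
     ("met", "VBD", set()),
     ("yesterday", "NN", {"t1"}),   # "t1" stands in for a Timex object
     (".", ".", set())],
]
```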

reconcile(sents)[source]

Update this document with the newly annotated tokens.

reconcile_dct(dct)[source]

Adds a TIMEX to the DCT tag and returns the DCT.

tempeval2 Module
class ternip.formats.tempeval2.TempEval2Document(file, docid='', dct='XXXXXXXX')[source]

Bases: object

A class which handles the stand-off format of TempEval-2.

static create(sents, docid='')[source]

Creates a TempEval-2 document from the internal representation

sents is the [[(word, pos, timexes), ...], ...] format.

get_attrs()[source]

Outputs the timex attributes in the format suitable for timex-attributes.tab.

get_dct_sents()[source]

Returns the creation time sents for this document.

get_extents()[source]

Outputs the timex extents in the format suitable for timex-extents.tab.

get_sents()[source]

Returns a representation of this document in the [[(word, pos, timexes), ...], ...] format.

static load_multi(file, dct_file)[source]

Load multiple documents from a single base-segmentation.tab

reconcile(sents)[source]

Update this document with the newly annotated tokens.

reconcile_dct(dct)[source]

Adds a TIMEX to the DCT tag and returns the DCT.

tern Module
class ternip.formats.tern.TernDocument(file, nodename='TEXT', has_S=False, has_LEX=False, pos_attr=False)[source]

Bases: ternip.formats.timex2.Timex2XmlDocument

A class which can handle TERN documents

static create(sents, docid, tok_offsets=None, add_S=False, add_LEX=False, pos_attr=False, dct='')[source]

Creates a TERN document from the internal representation

sents is the [[(word, pos, timexes), ...], ...] format.

tok_offsets is used to correctly reinsert whitespace lost in tokenisation. It’s in the format of a list of lists of integers, where each integer is the offset from the start of the sentence of that token. If set to None (the default), then a single space is assumed between all tokens.

If add_S is set to anything other than False, tags indicating sentence boundaries are added, with the value of add_S used as the tag name.

add_LEX is similar, but for token boundaries

pos_attr is similar, but gives the name of the attribute on the LEX (or equivalent) tag that holds the POS tag.

dct is the document creation time string
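For illustration, tok_offsets for a sentence can be derived from the original text. A minimal sketch (compute_tok_offsets is a hypothetical helper, assuming each token appears left to right without overlap):

```python
def compute_tok_offsets(sentence, tokens):
    """Offset of each token from the start of the sentence string."""
    offsets = []
    pos = 0
    for tok in tokens:
        # Find each token at or after the end of the previous one,
        # so repeated tokens resolve to the correct occurrence
        pos = sentence.find(tok, pos)
        offsets.append(pos)
        pos += len(tok)
    return offsets

compute_tok_offsets("The  cat sat.", ["The", "cat", "sat", "."])
# → [0, 5, 9, 12]  (the double space after "The" is preserved)
```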

get_dct_sents()[source]

Returns the creation time sents for this document.

reconcile_dct(dct, add_S=False, add_LEX=False, pos_attr=False)[source]

Adds a TIMEX to the DCT tag and returns the DCT.

timeml Module
class ternip.formats.timeml.TimeMlDocument(file, nodename=None, has_S=False, has_LEX=False, pos_attr=False)[source]

Bases: ternip.formats.timex3.Timex3XmlDocument

A class which holds a TimeML representation of a document.

Suitable for use with the AQUAINT dataset.

static create(sents, tok_offsets=None, add_S=False, add_LEX=False, pos_attr=False)[source]

Creates a TimeML document from the internal representation

sents is the [[(word, pos, timexes), ...], ...] format.

tok_offsets is used to correctly reinsert whitespace lost in tokenisation. It’s in the format of a list of lists of integers, where each integer is the offset from the start of the sentence of that token. If set to None (the default), then a single space is assumed between all tokens.

If add_S is set to anything other than False, tags indicating sentence boundaries are added, with the value of add_S used as the tag name.

add_LEX is similar, but for token boundaries

pos_attr is similar, but gives the name of the attribute on the LEX (or equivalent) tag that holds the POS tag.

timex2 Module
class ternip.formats.timex2.Timex2XmlDocument(file, nodename=None, has_S=False, has_LEX=False, pos_attr=False)[source]

Bases: ternip.formats.xml_doc.XmlDocument

A class which takes an arbitrary XML document and adds TIMEX2 tags to it.

timex3 Module
class ternip.formats.timex3.Timex3XmlDocument(file, nodename=None, has_S=False, has_LEX=False, pos_attr=False)[source]

Bases: ternip.formats.xml_doc.XmlDocument

A class which takes an arbitrary XML document and adds TIMEX3 tags to it.

Suitable for use with Timebank, which contains many superfluous tags that aren’t in the TimeML spec, even though it claims to be TimeML.

xml_doc Module
exception ternip.formats.xml_doc.BadNodeNameError[source]

Bases: exceptions.Exception

exception ternip.formats.xml_doc.NestingError(s)[source]

Bases: exceptions.Exception

exception ternip.formats.xml_doc.TokeniseError(s)[source]

Bases: exceptions.Exception

class ternip.formats.xml_doc.XmlDocument(file, nodename=None, has_S=False, has_LEX=False, pos_attr=False)[source]

Bases: object

An abstract base class which all XML document types inherit from. It implements almost everything apart from the conversion of timex objects to and from timex tags in the XML, which is left to the child classes.

static create(sents, tok_offsets=None, add_S=False, add_LEX=False, pos_attr=False)[source]

This is an abstract function for building XML documents from the internal representation only. You are not guaranteed to get out of get_sents what you put in here. Sentences and words will be retokenised and retagged unless you explicitly add S and LEX tags and the POS attribute to the document using the optional arguments.

sents is the [[(word, pos, timexes), ...], ...] format.

tok_offsets is used to correctly reinsert whitespace lost in tokenisation. It’s in the format of a list of lists of integers, where each integer is the offset from the start of the sentence of that token. If set to None (the default), then a single space is assumed between all tokens.

If add_S is set to anything other than False, tags indicating sentence boundaries are added, with the value of add_S used as the tag name.

add_LEX is similar, but for token boundaries

pos_attr is similar, but gives the name of the attribute on the LEX (or equivalent) tag that holds the POS tag.

get_dct_sents()[source]

Returns the creation time sents for this document.

get_sents()[source]

Returns a representation of this document in the [[(word, pos, timexes), ...], ...] format.

If there are any TIMEXes in the input document that cross sentence boundaries (and the input is not already broken up into sentences with the S tag), then those TIMEXes are disregarded.

reconcile(sents, add_S=False, add_LEX=False, pos_attr=False)[source]

Reconciles this document against the new internal representation. If add_S is set to anything other than False, tags are added to indicate sentence boundaries, with the value of add_S used as the tag name. add_LEX is the same, but for marking token boundaries, and pos_attr is the name of the attribute which holds the POS tag for each token. This is mainly useful for transforming TERN documents into something that GUTime can parse.

If your document already contains S and LEX tags, and add_S/add_LEX is set to add them, old S/LEX tags will be stripped first. If pos_attr is set and the attribute name differs from the old POS attribute name on the lex tag, then the old attribute will be removed.

Sentence/token boundaries will not be altered in the final document unless add_S/add_LEX is set. If you have changed the token boundaries in the internal representation from the original form, but are not then adding them back in, reconciliation may give undefined results.

Some inputs would produce invalid XML. For example, if this document has elements which span parts of multiple sentences rather than whole sentences, then sentence tags cannot be added while keeping the XML valid, and failure will occur in unexpected ways.

If you are adding LEX tags and your XML document contains tags internal to tokens, then reconciliation will fail, as it expects each token to be a contiguous run of text.

reconcile_dct(dct, add_S=False, add_LEX=False, pos_attr=False)[source]

Adds a TIMEX to the DCT tag and returns the DCT.

strip_tag(tagname)[source]

Remove this tag from the document.

strip_timexes()[source]

Strips all timexes from this document. Useful if we’re evaluating the software - we can just feed in the gold standard directly and compare the output then.

rule_engine Package

expressions Module
normalisation_rule Module
class ternip.rule_engine.normalisation_rule.NormalisationRule(match, type=None, id='', value=None, change_type=None, freq=None, quant=None, mod=None, guards=None, after_guards=None, before_guards=None, sent_guards=None, after=None, tokenise=True, deliminate_numbers=False)[source]

Bases: ternip.rule_engine.rule.Rule

A class that represents normalisation rules

apply(timex, cur_context, dct, body, before, after)[source]

Applies this rule to this timex, where body is the full extent covered by the timex, before is the preceding text in the sentence, and after is the following text in the sentence, all in the [(token, POS), ...] form.

Returns a boolean indicating whether or not application was successful. The timex may also be modified in place.

normalisation_rule_block Module
class ternip.rule_engine.normalisation_rule_block.NormalisationRuleBlock(id, after, type, rules)[source]

Bases: ternip.rule_engine.rule_block.RuleBlock

A block of normalisation rules

apply(timex, cur_context, dct, body, before, after)[source]

Apply rules in this block, in order, to this sentence, either until one rule is successful, or all rules have been applied.
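The control flow described here, applying rules in order until one succeeds, can be sketched generically; the rule classes below are hypothetical stand-ins whose apply returns a (result, success) pair, not actual TERNIP rules:

```python
def apply_block(rules, sent):
    """Apply rules in order; stop after the first successful one."""
    for rule in rules:
        sent, success = rule.apply(sent)
        if success:
            break
    return sent

class FailRule:
    """Hypothetical rule that never matches."""
    def apply(self, sent):
        return sent, False

class UpperRule:
    """Hypothetical rule: uppercase the sentence if it mentions 'cat'."""
    def apply(self, sent):
        if "cat" in sent:
            return sent.upper(), True
        return sent, False

result = apply_block([FailRule(), UpperRule()], "the cat sat")
# → "THE CAT SAT"
```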

normalisation_rule_engine Module
class ternip.rule_engine.normalisation_rule_engine.NormalisationRuleEngine[source]

Bases: ternip.rule_engine.rule_engine.RuleEngine

A class which does normalisation using a rule engine

Complex rules must have a string member called 'id' (used for 'after' ordering), a list of strings called 'after' (which may be empty) giving the IDs of rules that must run before this one, and a function called 'apply' which takes a list of (token, pos, timexes) tuples and returns them in the same form with potentially modified timexes.
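A minimal complex rule satisfying this contract might look like the following sketch; per load_rules the class must be named 'rule', and the no-op apply body is illustrative only:

```python
class rule:
    # String ID used for 'after' ordering
    id = 'example-noop'
    # IDs of rules that must run before this one (may be empty)
    after = []

    def apply(self, sent):
        # sent is [(token, pos, timexes), ...]; return it in the same
        # form, with the timex sets potentially modified. This sketch
        # changes nothing.
        return sent
```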

annotate(sents, dct)[source]

This annotates all the timexes in sents. dct is the document creation time (in the TIDES-modified ISO 8601 format), which some rules may use to establish a context.

recognition_rule Module
class ternip.rule_engine.recognition_rule.RecognitionRule(match, type, id, guards=None, after_guards=None, before_guards=None, after=None, squelch=False, case_sensitive=False, deliminate_numbers=False)[source]

Bases: ternip.rule_engine.rule.Rule

A class that represents identification rules

apply(sent)[source]

Applies this rule to the tokenised sentence. The ‘after’ ordering must be checked by the caller to ensure correct rule application.

sent is a list of tuples (token, POS, [timexes])

Returns a tuple: the first element is a list in the same form as sent, with additional timexes added to the third element of each tuple where needed; the second element indicates whether or not this rule matched anything.

recognition_rule_block Module
class ternip.rule_engine.recognition_rule_block.RecognitionRuleBlock(id, after, type, rules)[source]

Bases: ternip.rule_engine.rule_block.RuleBlock

A block of recognition rules

apply(sent)[source]

Apply rules in this block, in order, to this sentence, either until one rule is successful, or all rules have been applied.

recognition_rule_engine Module
class ternip.rule_engine.recognition_rule_engine.RecognitionRuleEngine[source]

Bases: ternip.rule_engine.rule_engine.RuleEngine

A class which does recognition using a rule engine

Complex rules must have a string member called 'id' (used for 'after' ordering), a list of strings called 'after' (which may be empty) giving the IDs of rules that must run before this one, and a function called 'apply' which takes a list of (token, pos, timexes) tuples and returns them in the same form with potentially modified timexes.

tag(sents)[source]

This function does the actual recognition. It expects content split into tokenised, POS-tagged sentences, i.e., a list of lists of tuples ([[(token, pos-tag, timexes), ...], ...]). Rules are applied one at a time.

The return value is in the same form, with the third element of each token tuple holding the set of timexes now associated with that token.

rule Module
class ternip.rule_engine.rule.Rule[source]

Bases: object

Base class for recognition and normalisation rules

rule_block Module
class ternip.rule_engine.rule_block.RuleBlock(id, after, type, rules)[source]

Bases: object

rule_engine Module
class ternip.rule_engine.rule_engine.RuleEngine[source]

Bases: object

A base class for rule engines to use

load_block(filename)[source]

Load a block of rules, then check for consistency. Throws RuleLoadErrors if a rule fails to load.

load_rule(filename)[source]

Load a rule, then check for consistency

Throws RuleLoadError if the rule fails to load.

load_rules(path)[source]

Loads all rules from the given path. Files ending in .pyrule are loaded as 'complex' rules (direct Python code), .rule files use the documented rule format, and .ruleblock files are blocks containing sequences of rules. For direct Python code, the rule must be a class called 'rule'.

Throws RuleLoadErrors containing the errors for all rules that failed to load.

exception ternip.rule_engine.rule_engine.RuleLoadError(filename, errorstr)[source]

Bases: exceptions.Exception

Error raised when a rule fails to load.

exception ternip.rule_engine.rule_engine.RuleLoadErrors(errors)[source]

Bases: exceptions.Exception

Error which bundles multiple RuleLoadError instances together, allowing delayed exit when there are multiple load errors.

Subpackages
normalisation_functions Package
date_functions Module
ternip.rule_engine.normalisation_functions.date_functions.convert_to_24_hours(time, ap)[source]

Given an hour and an a/p specifier, converts the hour to the 24-hour clock if need be.
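A sketch of the standard conversion; the library's handling of edge cases such as 12 a.m. may differ:

```python
def convert_to_24_hours(hour, ap):
    """Convert a 12-hour clock hour to 24-hour, given 'a' or 'p'."""
    hour = int(hour)
    if ap.lower().startswith('p') and hour != 12:
        hour += 12        # 1pm-11pm -> 13-23
    elif ap.lower().startswith('a') and hour == 12:
        hour = 0          # 12am -> 00
    return hour

convert_to_24_hours(9, 'p')   # → 21
convert_to_24_hours(12, 'a')  # → 0
```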

ternip.rule_engine.normalisation_functions.date_functions.date_to_dow(y, m, d)[source]

Gets the integer day of week for a date. Sunday is 0.
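This mapping can be sketched with the standard library; datetime.date.weekday has Monday as 0, hence the shift:

```python
import datetime

def date_to_dow(y, m, d):
    """Integer day of week for a date; Sunday is 0."""
    # weekday(): Monday=0..Sunday=6; shift so Sunday=0..Saturday=6
    return (datetime.date(y, m, d).weekday() + 1) % 7

date_to_dow(2010, 7, 14)  # a Wednesday → 3
```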

ternip.rule_engine.normalisation_functions.date_functions.date_to_iso(string)[source]

A translation of GUTime's Date2ISO function. Given some date/time string representing an absolute date, returns a date string in the basic ISO format.

ternip.rule_engine.normalisation_functions.date_functions.date_to_week(y, m, d)[source]

Convert a date into a week number string, with year
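A sketch using ISO week numbering; the exact output format of the library's function, such as the 'W' separator, is an assumption here:

```python
import datetime

def date_to_week(y, m, d):
    """Week number string, with year, e.g. '2010W28'."""
    # isocalendar() gives (ISO year, ISO week, ISO weekday)
    iso_year, iso_week, _ = datetime.date(y, m, d).isocalendar()
    return '%dW%02d' % (iso_year, iso_week)

date_to_week(2010, 7, 14)  # → '2010W28'
```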

ternip.rule_engine.normalisation_functions.date_functions.easter_date(y)[source]

Return the date of Easter for that year as a string

ternip.rule_engine.normalisation_functions.date_functions.extract_timezone(string)[source]

Given some string, attempts to extract the timezone it refers to. Returns a string.

ternip.rule_engine.normalisation_functions.date_functions.normalise_two_digit_year(y)[source]

Given a year string, which may be only 2 digits, attempts to produce a 4-digit year string from it.
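A common approach is a pivot heuristic, sketched below; the pivot value TERNIP itself uses is an assumption:

```python
def normalise_two_digit_year(y):
    """Expand a possibly 2-digit year string to 4 digits."""
    y = y.lstrip("'")  # tolerate forms like '97
    if len(y) == 2:
        # Hypothetical pivot: 00-29 -> 2000s, 30-99 -> 1900s
        return ('20' if int(y) < 30 else '19') + y
    return y

normalise_two_digit_year('97')  # → '1997'
normalise_two_digit_year('08')  # → '2008'
```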

ternip.rule_engine.normalisation_functions.date_functions.nth_dow_to_day()[source]

Figures out the day of the month on which the nth day-of-week falls in month m and year y, as an integer.

e.g., the 2nd Wednesday in July 2010: nth_dow_to_day((7, 3, 2), 2010)

Conversion from GUTime

relative_date_functions Module
ternip.rule_engine.normalisation_functions.relative_date_functions.compute_offset_base(ref_date, expression, current_direction)[source]

Given a reference date, a simple expression (yesterday/tomorrow or a day of week) and the direction of the relative expression, returns the base date from which to compute the offset, as a date string.

ternip.rule_engine.normalisation_functions.relative_date_functions.offset_from_date(v, offset, gran='D', exact=False)[source]

Given a date string, a numeric offset and a unit, computes the date offset from that value by offset units of gran. gran defaults to 'D'. If exact is set to True, the exact date is figured out; otherwise the level of granularity given by gran is used. Returns a date string.
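For day granularity the computation reduces to date arithmetic; a sketch for gran='D' only (offset_days is a hypothetical helper, and the library also handles weeks, months, years and inexact granularities):

```python
import datetime

def offset_days(value, offset):
    """Offset an ISO basic date string (YYYYMMDD) by a number of days."""
    d = datetime.datetime.strptime(value, '%Y%m%d').date()
    d += datetime.timedelta(days=offset)
    return d.strftime('%Y%m%d')

offset_days('20100714', -1)  # → '20100713'
```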

ternip.rule_engine.normalisation_functions.relative_date_functions.relative_direction_heuristic(before, after)[source]

Given what precedes and follows a TIMEX, uses tense heuristics to compute which direction a relative expression points in. Converted from GUTime.

string_conversions Module

Functions which convert strings to some number index

ternip.rule_engine.normalisation_functions.string_conversions.build_duration_value(num, unit)[source]
ternip.rule_engine.normalisation_functions.string_conversions.day_to_num(day)[source]

Given the name of a day, returns the number of that day as an integer. Sunday is 0; invalid data gets 7.

ternip.rule_engine.normalisation_functions.string_conversions.decade_nums(dec)[source]

Given the decade component of a year (less the 'ty' suffix), returns the corresponding number as an integer.

ternip.rule_engine.normalisation_functions.string_conversions.fixed_holiday_date(hol)[source]

Get the date string MMDD of a holiday

ternip.rule_engine.normalisation_functions.string_conversions.month_to_num(m)[source]

Given the name of a month, returns the number of that month as an integer. Invalid data gets 0.

ternip.rule_engine.normalisation_functions.string_conversions.nth_dow_holiday_date(hol)[source]

Given the name of a holiday which always occurs on the nth X of some month, where X is a day of week, returns a tuple of the form (month, dow, n) representing the information about that holiday.

ternip.rule_engine.normalisation_functions.string_conversions.season(s)[source]

Transforms a season name into an identifier from TIDES. Invalid data is returned as-is.

ternip.rule_engine.normalisation_functions.string_conversions.season_to_month(s)[source]

Converts a season to a month (roughly); returns an integer.

ternip.rule_engine.normalisation_functions.string_conversions.units_to_gran(unit)[source]

Given a word, or part of a word, that represents a unit of time, returns the single character representing the granularity of that unit of time.

words_to_num Module
ternip.rule_engine.normalisation_functions.words_to_num.ordinal_to_num(o)[source]

Given an ordinal (e.g., thirty-first or second) in the range 1st-31st (both numbers and words are accepted), returns the numeric value of that ordinal as an integer. Unrecognised data gets 1.

ternip.rule_engine.normalisation_functions.words_to_num.words_to_num(words)[source]

Converted from GUTime. Given a string of number words, attempts to derive the numeric value of those words. Returns an integer.
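The standard words-to-number algorithm accumulates small values and multiplies at 'hundred' and 'thousand'; a simplified sketch (TERNIP's GUTime port handles more forms than this):

```python
SMALL = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
         'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10,
         'eleven': 11, 'twelve': 12, 'thirteen': 13, 'fourteen': 14,
         'fifteen': 15, 'sixteen': 16, 'seventeen': 17,
         'eighteen': 18, 'nineteen': 19, 'twenty': 20, 'thirty': 30,
         'forty': 40, 'fifty': 50, 'sixty': 60, 'seventy': 70,
         'eighty': 80, 'ninety': 90}

def words_to_num(words):
    """Derive the numeric value of a string of number words."""
    total, current = 0, 0
    for w in words.replace('-', ' ').lower().split():
        if w in SMALL:
            current += SMALL[w]
        elif w == 'hundred':
            current *= 100
        elif w == 'thousand':
            total += current * 1000
            current = 0
        # 'and' and unknown words are ignored in this sketch
    return total + current

words_to_num('three hundred and twenty-one')  # → 321
```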
