Welcome to evalmate’s documentation!¶
Evalmate is a set of tools for evaluating audio-related machine learning tasks.
Installation¶
Install the latest stable version:
pip install evalmate
Install the latest development version:
pip install git+https://github.com/ynop/evalmate.git
Changelog¶
Next Version¶
Breaking changes
New Features
- evalmate.evaluator.Evaluation.write_report() and evalmate.evaluator.Evaluation.get_report() have an argument to pass parameters to templates.
v0.3.0¶
Breaking changes
- Refactoring of all elements, so that it is more obvious which aligner is used for which evaluator and confusion.
New Features
- Introduced False Rejection Rate, False Alarm Rate and Term-Weighted Value for the Keyword Spotting task.
- Evaluator for the Automatic Speech Recognition task: evalmate.evaluator.ASREvaluator.
v0.2.0¶
New Features
- Introduced evalmate.evaluator.Outcome to have a common input structure for reference and hypothesis.
- With evalmate.evaluator.LabelSet more statistics on reference and hypothesis can be computed. Label-sets are created via the evalmate.evaluator.Outcome class.
v0.1.0¶
Initial release
evalmate.evaluator¶
This module implements the top-level functionality for performing the evaluation for the different tasks.
For every task there is an Evaluator (extends Evaluator) and an Evaluation (extends Evaluation).
The Evaluator is the class responsible for performing the evaluation, and the Evaluation is the output,
which contains the aligned labels/segments and, depending on the task, further data such as word confusions.
Base¶
-
class
evalmate.evaluator.
Evaluation
(ref_outcome, hyp_outcome)[source]¶ Base class for evaluation results.
Variables: -
get_report
(template=None, template_param=None)[source]¶ Generate and return a report.
Parameters: template (str) – Name of the Jinja2 template to use. If None, the default_template() is used. All available templates are in the report_templates folder.
Returns: The rendered report.
Return type: str
-
template_data
¶ Return a dictionary that contains objects/values to use in the rendering template.
-
write_report
(path, template=None, template_param=None)[source]¶ Write the report to the given path.
Parameters: - path (str) – Path to write the report to.
- template (str) – Name of the Jinja2 template to use. If None, the default_template() is used. All available templates are in the report_templates folder.
-
-
class
evalmate.evaluator.
Evaluator
[source]¶ Base class for an evaluator.
Provides methods for reading outcomes in different ways. The evaluator for a specific task then has to implement do_evaluate, which performs the evaluation on the ref and hyp outcomes.
-
classmethod
default_label_list_idx
()[source]¶ Define the default label-list that is used when reading a corpus.
-
do_evaluate
(ref, hyp)[source]¶ Create the evaluation result of the given hypothesis compared to the given reference (ground truth).
Parameters: Returns: The evaluation results.
Return type:
-
evaluate
(ref, hyp, label_list_idx=None)[source]¶ Create the evaluation result of the given hypothesis compared to the given reference (ground truth). There are different possibilities of input:
- ref = Outcome / hyp = Outcome: Both ref and hyp are Outcome instances. See do_evaluate.
- ref = Corpus / hyp = dict: The dict contains label-lists which are compared against the corpus. See evaluate_label_lists_against_corpus.
- ref = LabelList / hyp = LabelList: The ref label-list is compared against the hyp label-list. See evaluate_label_lists.
Parameters: - ref (Outcome, LabelList, Corpus) – An outcome, a label-list or a corpus.
- hyp (Outcome, LabelList, dict) – An outcome, a label-list or a dict of label-lists.
- label_list_idx (str) – The label-list to use when reading from a corpus.
Returns: The evaluation results.
Return type:
-
evaluate_label_lists
(ll_ref, ll_hyp, duration=None)[source]¶ Create Evaluation for ref and hyp label-list. If the duration is not provided, some metrics cannot be used.
Parameters: - ll_ref (LabelList) – The reference label-list.
- ll_hyp (LabelList) – The hypothesis label-list.
- duration (float) – The duration of the utterance that the label-lists belong to.
Returns: The evaluation results.
Return type:
-
evaluate_label_lists_against_corpus
(corpus, label_lists, label_list_idx=None)[source]¶ Create Evaluation for the given corpus.
Parameters: - corpus (Corpus) – A corpus containing the reference label-lists.
- label_lists (Dict) – A dictionary containing label-lists with the utterance-idx as key. The utterance-idx is used to find the corresponding reference label-list in the corpus.
- label_list_idx (str) – The idx of the label-lists to use as reference from the corpus. If None, cls.default_label_list_idx is used.
Returns: The evaluation results.
Return type:
Outcome¶
-
class
evalmate.evaluator.
Outcome
(label_lists=None, utterance_durations=None)[source]¶ An outcome represents the annotation/labels/transcriptions of a dataset/corpus for a given task. This can be either the ground truth/reference or the system output/hypothesis.
If no durations are provided, or the durations of some utterances are missing, some methods may not work or may raise exceptions.
Variables: - label_lists (dict) – Dictionary containing all label-lists with the utterance-idx/sample-idx as key.
- utterance_durations (dict) – Dictionary (utterance-idx/duration) containing the durations of all utterances.
-
all_values
¶ Return a set of all values, occurring in the outcome.
-
label_set_for_value
(value)[source]¶ Return a label-set containing all labels, where the value is value.
Parameters: value (str) – The value to filter. Returns: Label-set containing all labels with the given value. Return type: LabelSet
-
total_duration
¶ Return the duration of all utterances together.
Notes
Only works if durations are provided for all utterances.
-
class
evalmate.evaluator.
LabelSet
(labels=None)[source]¶ Class to collect a bunch of labels. This is used to compute statistics over a defined set of labels.
For example we want to compute the average length of all labels with the value ‘music’. We can then collect all these in a label-set and perform the computation.
-
count
¶ Return the number of labels.
-
label_lengths
¶ Return a list containing all label lengths.
-
length_max
¶ Return the length of the longest label.
-
length_mean
¶ Return the mean length of all labels.
-
length_median
¶ Return the median of all label lengths.
-
length_min
¶ Return the length of the shortest label.
-
length_variance
¶ Return the variance of all label lengths.
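The statistics listed for a label-set can be illustrated with a small standalone sketch. It does not use evalmate itself: labels are modelled as hypothetical (value, start, end) tuples, and the choice of the population variance is an assumption, since the estimator used by the library is not stated here.

```python
import statistics

# Hypothetical label representation: (value, start, end), times in seconds.
labels = [
    ("music", 0.0, 4.0),
    ("speech", 4.0, 5.5),
    ("music", 6.0, 8.0),
    ("music", 10.0, 16.0),
]


def label_lengths(labels, value):
    """Return the lengths (end - start) of all labels with the given value."""
    return [end - start for v, start, end in labels if v == value]


# Collect all 'music' labels, as in the example from the class description.
lengths = label_lengths(labels, "music")

stats = {
    "count": len(lengths),
    "length_min": min(lengths),
    "length_max": max(lengths),
    "length_mean": statistics.mean(lengths),
    "length_median": statistics.median(lengths),
    # Assumption: population variance; the library may use another estimator.
    "length_variance": statistics.pvariance(lengths),
}
```

With the data above, the three 'music' labels have lengths of 4, 2 and 6 seconds, so the mean and the median are both 4 seconds.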
-
Segment¶
-
class
evalmate.evaluator.
SegmentEvaluation
(ref_outcome, hyp_outcome, utt_to_segments)[source]¶ Result of an evaluation of a segment-based alignment.
Parameters: utt_to_segments (dict) – Dict of lists of evalmate.alignment.Segment. Key is the utterance-idx.
Variables: - ref_outcome (Outcome) – The outcome of the ground-truth/reference.
- hyp_outcome (Outcome) – The outcome of the system-output/hypothesis.
- confusion (AggregatedConfusion) – Confusion result
-
segments
¶ Return a list of all segments (from all utterances together).
-
template_data
¶ Return a dictionary that contains objects/values to use in the rendering template.
-
class
evalmate.evaluator.
SegmentEvaluator
(aligner=None)[source]¶ Evaluation of an alignment based on segments.
Parameters: aligner (SegmentAligner) – An instance of a segment-aligner to use. If not given, the alignment.InvariantSegmentAligner is used.
-
classmethod
default_label_list_idx
()[source]¶ Define the default label-list that is used when reading a corpus.
-
do_evaluate
(ref, hyp)[source]¶ Create the evaluation result of the given hypothesis compared to the given reference (ground truth).
Parameters: Returns: The evaluation results.
Return type:
-
static
flatten_overlapping_labels
(aligned_segments)[source]¶ Check all segments for overlapping labels. Overlapping means there are multiple reference or multiple hypothesis labels in a segment.
Parameters: aligned_segments (List) – List of segments. Returns: List of segments where ref and hyp is a single label. Return type: list Raises: ValueError
– A segment contains overlapping labels.
Event¶
-
class
evalmate.evaluator.
EventEvaluation
(ref_outcome, hyp_outcome, utt_to_label_pairs)[source]¶ Result of an evaluation of any event-based alignment.
Parameters: utt_to_label_pairs (dict) – Key is the utterance-id, value is a list of evalmate.alignment.LabelPair.
Variables: - ref_outcome (Outcome) – The outcome of the ground-truth/reference.
- hyp_outcome (Outcome) – The outcome of the system-output/hypothesis.
- confusion (AggregatedConfusion) – Confusion statistics
-
correct_utterances
¶ Return list of utterance-ids that are correct.
-
failing_utterances
¶ Return list of utterance-ids that are not correct.
-
label_pairs
¶ Return a list of all label-pairs (from all utterances together).
-
template_data
¶ Return a dictionary that contains objects/values to use in the rendering template.
-
class
evalmate.evaluator.
EventEvaluator
(aligner)[source]¶ Class to compute evaluation results for any event-based alignment.
Parameters: aligner (EventAligner) – An instance of an event-aligner to use. -
classmethod
default_label_list_idx
()[source]¶ Define the default label-list that is used when reading a corpus.
KWS¶
-
class
evalmate.evaluator.
KWSEvaluation
(ref_outcome, hyp_outcome, utt_to_label_pairs)[source]¶ Result of an evaluation of a keyword spotting task.
Parameters: utt_to_label_pairs (dict) – Key is the utterance-id, value is a list of evalmate.alignment.LabelPair.
Variables: - ref_outcome (Outcome) – The outcome of the ground-truth/reference.
- hyp_outcome (Outcome) – The outcome of the system-output/hypothesis.
- confusion (AggregatedConfusion) – Confusion statistics
-
false_alarm_rate
(keywords=None)[source]¶ The False Alarm Rate (FAR) is the rate of detections where no keyword is present according to the ground truth. If no keywords are given, the mean FAR over all keywords is calculated. This rate is relative to the duration of all utterances.
To calculate this, we need to know the number of times a keyword could be wrongly inserted. To approximate this value, we assume that every keyword takes one second.
Parameters: keywords (list) – Only the FAR for the given keywords is returned. If None or the list is empty, all keywords are considered. Returns: A rate between 0 and 1. Return type: float
-
false_rejection_rate
(keywords=None)[source]¶ The False Rejection Rate (FRR) is the proportion of occurrences in the ground truth that are missed. If no keywords are given, the mean FRR over all keywords is calculated.
Parameters: keywords (list) – Only the FRR for the given keywords is returned. If None or the list is empty, all keywords are considered. Returns: A rate between 0 and 1. Return type: float
-
term_weighted_value
(keywords=None)[source]¶ Computes the Term-Weighted Value (TWV).
Note
The TWV is implemented according to the OpenKWS 2016 Evaluation Plan.
Parameters: keywords (list) – Only the TWV for the given keywords is returned. If None or the list is empty, all keywords are considered. Returns: The TWV in the range 1 to -inf. Return type: float
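The three keyword spotting metrics can be sketched from plain counts. This is a standalone illustration and not the evalmate implementation. The function names and the count-based inputs are hypothetical. The number of possible false insertions is approximated as the total duration in seconds minus the number of true occurrences, following the one-second-per-keyword assumption above. The weighting factor beta of 999.9 is the value commonly cited for NIST keyword search evaluations and is an assumption that should be checked against the evaluation plan.

```python
def false_rejection_rate(n_misses, n_ref):
    """Fraction of ground-truth occurrences that were not detected."""
    return n_misses / n_ref


def false_alarm_rate(n_false_alarms, n_ref, total_duration):
    """False alarms relative to the number of possible wrong insertions.

    Assumption: a keyword takes one second, so the number of non-target
    trials is approximated by total_duration (seconds) minus n_ref.
    """
    non_target_trials = total_duration - n_ref
    return n_false_alarms / non_target_trials


def term_weighted_value(per_keyword, total_duration, beta=999.9):
    """Return 1 minus the mean of (P_miss + beta * P_fa) over all keywords.

    per_keyword maps a keyword to (n_ref, n_misses, n_false_alarms).
    beta=999.9 is an assumed default, not taken from this documentation.
    """
    penalties = []
    for n_ref, n_misses, n_false_alarms in per_keyword.values():
        p_miss = false_rejection_rate(n_misses, n_ref)
        p_fa = false_alarm_rate(n_false_alarms, n_ref, total_duration)
        penalties.append(p_miss + beta * p_fa)
    return 1 - sum(penalties) / len(penalties)


# One keyword with 10 true occurrences, 2 misses and 1 false alarm
# in 1010 seconds of audio.
counts = {"hello": (10, 2, 1)}
frr = false_rejection_rate(2, 10)
far = false_alarm_rate(1, 10, 1010.0)
twv = term_weighted_value(counts, 1010.0)
```

Because false alarms are weighted heavily, a single false alarm in this example already pushes the TWV below zero, which is consistent with the documented range of 1 to -inf.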
-
class
evalmate.evaluator.
KWSEvaluator
(aligner=None)[source]¶ Class to retrieve evaluation results for a keyword spotting task.
Parameters: aligner (EventAligner) – An instance of an event-aligner to use. If not given, the evalmate.alignment.BipartiteMatchingAligner is used.
-
classmethod
default_label_list_idx
()[source]¶ Define the default label-list that is used when reading a corpus.
ASR¶
-
class
evalmate.evaluator.
ASREvaluation
(ref_outcome, hyp_outcome, utt_to_label_pairs)[source]¶ Result of an evaluation of an automatic speech recognition task.
Parameters: utt_to_label_pairs (dict) – Key is the utterance-id, value is a list of evalmate.alignment.LabelPair.
Variables: - ref_outcome (Outcome) – The outcome of the ground-truth/reference.
- hyp_outcome (Outcome) – The outcome of the system-output/hypothesis.
- confusion (AggregatedConfusion) – Confusion statistics
-
class
evalmate.evaluator.
ASREvaluator
(aligner=None)[source]¶ Class to retrieve evaluation results for an automatic speech recognition task.
Parameters: aligner (EventAligner) – An instance of an event-aligner to use. If not given, the alignment.LevenshteinAligner is used.
-
classmethod
default_label_list_idx
()[source]¶ Define the default label-list that is used when reading a corpus.
evalmate.confusion¶
This module contains classes for computing confusion statistics.
Confusion¶
-
class
evalmate.confusion.
Confusion
[source]¶ Base class that provides methods for computing common metrics.
-
accuracy
¶ Accuracy = correct / (total + insertions)
-
correct
¶ Amount that is correct.
Example
>>> ref = 'xxx'
>>> hyp = 'xxx'
-
deletions
¶ Amount that is deleted.
Example
>>> ref = 'xxx'
>>> hyp = None
-
error_rate
¶ ErrorRate = (substitutions + deletions + insertions) / total
-
f_measure
(beta=1)[source]¶ F-Measure see https://en.wikipedia.org/wiki/Precision_and_recall
-
false_negatives
¶ Amount of false negatives (no indication of presence when it should be present).
Note
Equal to ‘self.total - self.correct’
-
false_positives
¶ Amount of false positives (Indications of presence, when it is not present).
Note
Equal to self.insertions + self.substitutions_out
-
insertions
¶ Amount that is inserted.
Example
>>> ref = None
>>> hyp = 'xxx'
-
precision
¶ Precision = tp / (fp + tp)
-
recall
¶ Recall = tp / (fn + tp)
-
substitutions
¶ Amount that is substituted.
If these stats represent a specific instance (e.g. occurrences of the word ‘hello’), substitutions is the amount where the specific instance was substituted with some other instance/event. Otherwise, it is not necessary to designate which event/instance substitutes which event/instance.
Example
>>> ref = 'xxx'
>>> hyp = 'yyy'
-
substitutions_out
¶ Amount that is substituted.
If these stats represent a specific instance (e.g. occurrences of the word ‘hello’), substitutions_out is the amount where the specific instance was output when some other event/instance was expected (reference). Otherwise, it is equal to substitutions.
Example
>>> ref = 'yyy'
>>> hyp = 'xxx'
-
total
¶ Return the total amount based on the reference system.
Note
Equal to ‘self.correct + self.deletions + self.substitutions’
-
true_positives
¶ Amount of true positives (Correct indications).
Note
Equal to self.correct
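The formulas listed for this class can be combined into a small standalone sketch. It is not the evalmate class: the Counts dataclass is a hypothetical container that only mirrors the documented property names and formulas. The F-measure uses the standard F-beta definition from the linked Wikipedia article, and division by zero is not handled.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical container mirroring the documented Confusion properties."""
    correct: float
    insertions: float
    deletions: float
    substitutions: float
    substitutions_out: float

    @property
    def total(self):
        # Total amount based on the reference.
        return self.correct + self.deletions + self.substitutions

    @property
    def true_positives(self):
        return self.correct

    @property
    def false_negatives(self):
        return self.total - self.correct

    @property
    def false_positives(self):
        return self.insertions + self.substitutions_out

    @property
    def precision(self):
        return self.true_positives / (self.false_positives + self.true_positives)

    @property
    def recall(self):
        return self.true_positives / (self.false_negatives + self.true_positives)

    @property
    def accuracy(self):
        return self.correct / (self.total + self.insertions)

    @property
    def error_rate(self):
        return (self.substitutions + self.deletions + self.insertions) / self.total

    def f_measure(self, beta=1):
        # Standard F-beta: weighted harmonic mean of precision and recall.
        p, r = self.precision, self.recall
        return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


counts = Counts(correct=8, insertions=1, deletions=1,
                substitutions=1, substitutions_out=1)
```

For these counts the reference total is 10, so the error rate is 3 / 10, while the accuracy is 8 / 11 because insertions enlarge the denominator.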
-
SegmentConfusion¶
-
class
evalmate.confusion.
SegmentConfusion
(value)[source]¶ Class to represent confusions of a specific instance (e.g. some class) based on segments. The insertions, deletions and so on represent the time in seconds the instance was confused (or not).
- Argument:
- value (str): The value of the instance (e.g. the class “speech”)
Variables: - correct_segments (list) – (List of Segment) Segments that are correct (ref == hyp).
- insertion_segments (list) – (List of Segment) Segments that are insertions (ref = None, hyp = ‘value’).
- deletion_segments (list) – (List of Segment) Segments that are deletions (ref = ‘value’, hyp = None)
- substitution_segments (Dict) – Segments that are substitutions with other values (ref = ‘value’, hyp = ‘other-value’). Dict holding a list for every other-value.
- substitution_out_segments (Dict) – Segments that are substitutions of other values (ref = ‘other-value’, hyp = ‘value’). Dict holding a list for every other-value.
-
correct
¶ Amount that is correct.
Example
>>> ref = 'xxx'
>>> hyp = 'xxx'
-
deletions
¶ Amount that is deleted.
Example
>>> ref = 'xxx'
>>> hyp = None
-
insertions
¶ Amount that is inserted.
Example
>>> ref = None
>>> hyp = 'xxx'
-
substitutions
¶ Amount that is substituted.
If these stats represent a specific instance (e.g. occurrences of the word ‘hello’), substitutions is the amount where the specific instance was substituted with some other instance/event. Otherwise, it is not necessary to designate which event/instance substitutes which event/instance.
Example
>>> ref = 'xxx'
>>> hyp = 'yyy'
-
substitutions_out
¶ Amount that is substituted.
If these stats represent a specific instance (e.g. occurrences of the word ‘hello’), substitutions_out is the amount where the specific instance was output when some other event/instance was expected (reference). Otherwise, it is equal to substitutions.
Example
>>> ref = 'yyy'
>>> hyp = 'xxx'
EventConfusion¶
-
class
evalmate.confusion.
EventConfusion
(value)[source]¶ Class to represent confusions of a specific instance (e.g. some class) based on label-to-label alignment. The insertions, deletions and so on represent the number of times a label was confused (or not).
- Argument:
- value (str): The value of the instance (e.g. the class “speech”)
Variables: - correct_pairs (list) – (List of LabelPair) Correct matches.
- insertion_pairs (list) – (List of LabelPair) Insertions (ref = None, hyp = value)
- deletion_pairs (list) – (List of LabelPair) Deletions (ref = value, hyp = None)
- substitution_pairs (Dict) – Substitutions with other values (ref = value, hyp = other-value). Dict holding a list for every other-value.
- substitution_out_pairs (Dict) – Substitutions from other values (ref = other-value, hyp = value) Dict holding a list for every other-value.
-
correct
¶ Amount that is correct.
Example
>>> ref = 'xxx'
>>> hyp = 'xxx'
-
deletions
¶ Amount that is deleted.
Example
>>> ref = 'xxx'
>>> hyp = None
-
insertions
¶ Amount that is inserted.
Example
>>> ref = None
>>> hyp = 'xxx'
-
substitutions
¶ Amount that is substituted.
If these stats represent a specific instance (e.g. occurrences of the word ‘hello’), substitutions is the amount where the specific instance was substituted with some other instance/event. Otherwise, it is not necessary to designate which event/instance substitutes which event/instance.
Example
>>> ref = 'xxx'
>>> hyp = 'yyy'
-
substitutions_by_count
()[source]¶ Return a list of tuples (Substituted-value, Number-of-substitutions) ordered by number of substitutions descending.
Returns: List of tuples. Return type: list
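As an illustration of the documented return value, the following standalone sketch orders substituted values by how often they occur. It assumes a dictionary shaped like substitution_pairs above (one list of pairs per other-value). The pairs are plain hypothetical tuples rather than LabelPair objects, and the function is not the library code.

```python
# Hypothetical stand-in for substitution_pairs of the reference value 'hello':
# every key is the value that was output instead, every item a (ref, hyp) pair.
substitution_pairs = {
    "hallo": [("hello", "hallo"), ("hello", "hallo"), ("hello", "hallo")],
    "yellow": [("hello", "yellow")],
    "fellow": [("hello", "fellow"), ("hello", "fellow")],
}


def substitutions_by_count(pairs_by_value):
    """Return (substituted-value, count) tuples, most frequent first."""
    counts = [(value, len(pairs)) for value, pairs in pairs_by_value.items()]
    return sorted(counts, key=lambda item: item[1], reverse=True)


ranking = substitutions_by_count(substitution_pairs)
```

For the data above, 'hallo' comes first with three substitutions, followed by 'fellow' with two and 'yellow' with one.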
-
substitutions_out
¶ Amount that is substituted.
If these stats represent a specific instance (e.g. occurrences of the word ‘hello’), substitutions_out is the amount where the specific instance was output when some other event/instance was expected (reference). Otherwise, it is equal to substitutions.
Example
>>> ref = 'yyy'
>>> hyp = 'xxx'
AggregatedConfusion¶
-
class
evalmate.confusion.
AggregatedConfusion
[source]¶ Class to aggregate multiple confusions.
Variables: instances (dict) – Dictionary containing the aggregated confusions. -
correct
¶ Amount that is correct.
Example
>>> ref = 'xxx'
>>> hyp = 'xxx'
-
deletions
¶ Amount that is deleted.
Example
>>> ref = 'xxx'
>>> hyp = None
-
get_confusion_with_instances
(instances)[source]¶ Return a new AggregatedConfusion with only the given instances.
Parameters: instances (list) – A list of strings containing the keys of the instances to include in the new confusion. Returns: A confusion with only the given instances. Return type: AggregatedConfusion
-
insertions
¶ Amount that is inserted.
Example
>>> ref = None
>>> hyp = 'xxx'
-
precision_mean
¶ Calculate mean precision of all instances.
-
recall_mean
¶ Calculate mean recall of all instances.
-
substitutions
¶ Amount that is substituted.
If these stats represent a specific instance (e.g. occurrences of the word ‘hello’), substitutions is the amount where the specific instance was substituted with some other instance/event. Otherwise, it is not necessary to designate which event/instance substitutes which event/instance.
Example
>>> ref = 'xxx'
>>> hyp = 'yyy'
-
substitutions_out
¶ Amount that is substituted.
If these stats represent a specific instance (e.g. occurrences of the word ‘hello’), substitutions_out is the amount where the specific instance was output when some other event/instance was expected (reference). Otherwise, it is equal to substitutions.
Example
>>> ref = 'yyy'
>>> hyp = 'xxx'
-
evalmate.alignment¶
This module contains functionality for aligning labels of a ground truth with the labels of a system output.
Base classes¶
All aligners are based either on EventAligner
or SegmentAligner
.
The base classes are mainly distinguished by the type of the alignment they return.
While the EventAligner
returns a mapping between complete labels,
the SegmentAligner
returns segments, that can span over parts of labels.
-
class
evalmate.alignment.
EventAligner
[source]¶ Abstract class for aligner classes that return a mapping between labels (events).
An alignment is a mapping between labels from the ground truth (ref) and the system output (hyp). If there is no matching label in the system output for a label in the ground truth, it has to be aligned to None, and vice versa. A single label can be aligned to multiple other labels.
-
align
(ref_labels, hyp_labels)[source]¶ Return an alignment between the labels of the two label-lists.
Parameters: - ref_labels (list) – The list containing labels of the ground truth.
- hyp_labels (list) – The list containing labels of the system output.
Returns: A list of evalmate.alignment.LabelPair. Every pair contains one label from the ground truth and one from the system output that are aligned. One of them can also be None.
Return type: list
-
-
class
evalmate.alignment.
SegmentAligner
[source]¶ Abstract class for aligner classes that align labels in segments.
An alignment is represented as a list of Segments with start/end-time and the labels from the ground truth and the system output, that are within this segment.
-
align
(ref_labels, hyp_labels)[source]¶ Return an alignment of segments.
Parameters: - ref_labels (list) – The list containing labels of the ground truth.
- hyp_labels (list) – The list containing labels of the system output.
Returns: A list of evalmate.utils.structure.Segment. Every segment has a start/end-time and two lists of labels that are contained in the segment (one for the ground truth and one for the system output).
Return type: list
-
Time-Based¶
Align labels using a distance metric based on their start/end times.
-
class
evalmate.alignment.
BipartiteMatchingAligner
(candidate_finder=None, non_overlap_penalty_weight=1, substitution_penalty=2, insertion_penalty=10, deletion_penalty=10)[source]¶ Create event-based alignment, based on bipartite matching.
1. In a first step, for every possible label-pair between ref and hyp, it is decided whether a mapping of the pair is possible. A CandidateFinder is used for this.
2. Using the penalty and weight parameters, a penalty for aligning the pair is computed for every pair.
3. From all the pairs and the computed penalties, the best alignment is computed using bipartite matching, so that every label occurs only once in the final alignment.
Parameters: - candidate_finder (CandidateFinder) – CandidateFinder to use for finding potential labels for alignment.
- non_overlap_penalty_weight (float) – Weight-factor of penalty for the non-overlapping ratio between two labels.
- substitution_penalty (float) – Penalty for aligning two labels with different values.
- deletion_penalty (float) – Penalty for aligning a reference-label with no hypothesis-label.
- insertion_penalty (float) – Penalty for aligning a hypothesis-label with no reference-label.
-
align
(ref_labels, hyp_labels)[source]¶ Return an alignment between the events of the given label-lists.
Parameters: - ref_labels (list) – The list containing labels of the ground truth.
- hyp_labels (list) – The list containing labels of the system output.
Returns: A list of evalmate.alignment.LabelPair. Every pair contains one label (event) from the ground truth and one from the system output that are aligned. One of them can also be None.
Return type: list
-
class
evalmate.alignment.
FullMatchingAligner
(min_overlap=0)[source]¶ Event-based alignment, where all possible matches are returned. So a single label can occur multiple times, but with a different counterpart.
Parameters: min_overlap (float) – Minimum duration of the overlap, in seconds, required to align two labels. If 0, any overlap is accepted.
-
align
(ref_labels, hyp_labels)[source]¶ Return an alignment between the labels of the two label-lists.
Parameters: - ref_labels (list) – The list containing labels of the ground truth.
- hyp_labels (list) – The list containing labels of the system output.
Returns: A list of evalmate.alignment.LabelPair. Every pair contains one label (event) from the ground truth and one from the system output that are aligned. One of them can also be None.
Return type: list
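The idea of returning every possible match can be sketched without the library. The following standalone example models labels as hypothetical (value, start, end) tuples and pairs as plain tuples. It assumes that labels without any overlapping counterpart are paired with None, as described for event alignments in general. It is an illustration, not the FullMatchingAligner implementation.

```python
def overlap(a, b):
    """Duration in seconds shared by two (value, start, end) labels."""
    return min(a[2], b[2]) - max(a[1], b[1])


def full_matching(ref_labels, hyp_labels, min_overlap=0):
    """Return all (ref, hyp) pairs that overlap by at least min_overlap."""
    pairs = []
    matched_hyp = set()

    for ref in ref_labels:
        found = False
        for j, hyp in enumerate(hyp_labels):
            shared = overlap(ref, hyp)
            # With min_overlap == 0 any real (positive) overlap is accepted.
            if shared > 0 and shared >= min_overlap:
                pairs.append((ref, hyp))
                matched_hyp.add(j)
                found = True
        if not found:
            pairs.append((ref, None))  # no counterpart in the hypothesis

    for j, hyp in enumerate(hyp_labels):
        if j not in matched_hyp:
            pairs.append((None, hyp))  # no counterpart in the reference

    return pairs


ref = [("a", 0, 5), ("b", 6, 8)]
hyp = [("a", 0, 2), ("x", 2, 5), ("y", 9, 10)]
pairs = full_matching(ref, hyp)
```

Here the reference label 'a' appears in two pairs because it overlaps two hypothesis labels, which is the behaviour that distinguishes this approach from a one-to-one matching.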
-
Sequence-Based¶
Align labels only considering the ordering of the sequence.
-
class
evalmate.alignment.
LevenshteinAligner
(deletion_cost=3, insertion_cost=3, substitution_cost=4, custom_substitution_cost_function=None)[source]¶ Alignment of labels of two label-lists based on the Levenshtein distance (https://en.wikipedia.org/wiki/Levenshtein_distance).
This only takes the order of the labels into account, not the start and end-times.
Parameters: - deletion_cost (float) – Cost for a deletion in the alignment.
- insertion_cost (float) – Cost for an insertion in the alignment.
- substitution_cost (float) – Cost for a substitution in the alignment.
- custom_substitution_cost_function (func) – Function to calculate the substitution cost depending on the elements. The function has to take two parameters (ref-label, hyp-label).
-
align
(ref_labels, hyp_labels)[source]¶ Return an alignment between the labels of the given label-lists.
Parameters: - ref_labels (list) – The list containing labels of the ground truth.
- hyp_labels (list) – The list containing labels of the system output.
Returns: A list of evalmate.alignment.LabelPair. Every pair contains one label from the ground truth and one from the system output that are aligned. One of them can also be None.
Return type: list
Example
>>> from audiomate.corpus import assets
>>> from evalmate.alignment import LevenshteinAligner
>>>
>>> reference = [
>>>     assets.Label('a'),
>>>     assets.Label('b'),
>>>     assets.Label('c')
>>> ]
>>> hypothesis = [
>>>     assets.Label('a'),
>>>     assets.Label('c')
>>> ]
>>>
>>> LevenshteinAligner().align(reference, hypothesis)
[
    LabelPair(Label('a'), Label('a')),
    LabelPair(Label('b'), None),
    LabelPair(Label('c'), Label('c'))
]
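How such an alignment can be derived from the cost parameters is sketched below in plain Python. This is a generic Levenshtein alignment with backtracking over label values (strings), using the default costs from the signature above. The function name and the tuple-based output are hypothetical, it does not model custom_substitution_cost_function, and it is not the library's implementation.

```python
def levenshtein_align(ref, hyp, deletion_cost=3, insertion_cost=3,
                      substitution_cost=4):
    """Align two sequences and return a list of (ref, hyp) pairs."""
    n, m = len(ref), len(hyp)

    # cost[i][j] is the minimal cost of aligning ref[:i] with hyp[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * deletion_cost
    for j in range(1, m + 1):
        cost[0][j] = j * insertion_cost

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0 if ref[i - 1] == hyp[j - 1] else substitution_cost
            cost[i][j] = min(
                cost[i - 1][j - 1] + step,        # match or substitution
                cost[i - 1][j] + deletion_cost,   # ref item has no partner
                cost[i][j - 1] + insertion_cost,  # hyp item has no partner
            )

    # Walk back from the end to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = 0 if ref[i - 1] == hyp[j - 1] else substitution_cost
            if cost[i][j] == cost[i - 1][j - 1] + step:
                pairs.append((ref[i - 1], hyp[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + deletion_cost:
            pairs.append((ref[i - 1], None))
            i -= 1
            continue
        pairs.append((None, hyp[j - 1]))
        j -= 1

    pairs.reverse()
    return pairs


aligned = levenshtein_align(["a", "b", "c"], ["a", "c"])
```

With the default costs a substitution (4) is cheaper than a deletion plus an insertion (6), so differing labels at the same position are paired rather than split into two pairs with None.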
Segment-Based¶
Align labels based on segments defined by start/end-time.
-
class
evalmate.alignment.
InvariantSegmentAligner
[source]¶ Create a segment-based alignment so that within every segment the same labels are active. For example, assume the reference is the following label-list:
>>> [ A ] [ B ] [ A ]
>>> [ E ]
The output of some system (hypothesis) may be as follows:
>>> [ Ax ] [ Ex ] [ Ax ]
The returned segments are created so that every segment represents a time range in which the active labels stay the same.
>>>       S1   S2   S3   S4   S5   S6   S7   S8
>>>
>>> REF | A  |    | B  | B  |    | A  |    |    |
>>> REF |    |    |    |    |    | E  |    |    |
>>> HYP | Ax |    | Ex |    |    |    |    | Ax |
-
align
(ref_labels, hyp_labels)[source]¶ Create segment based alignment.
Parameters: - ref_labels (list) – The list with reference labels.
- hyp_labels (list) – The list with hypothesis labels.
Returns: A list of Segments.
Return type: list
Example
>>> from audiomate.corpus import assets
>>> from evalmate.alignment import InvariantSegmentAligner
>>>
>>> ref = [
>>>     assets.Label('a', 0, 3),
>>>     assets.Label('b', 3, 6),
>>>     assets.Label('c', 7, 10)
>>> ]
>>>
>>> hyp = [
>>>     assets.Label('a', 0, 3),
>>>     assets.Label('b', 4, 8),
>>>     assets.Label('c', 8, 10)
>>> ]
>>>
>>> InvariantSegmentAligner().align(ref, hyp)
[
    0 - 3   REF: [Label(a, 0, 3)]   HYP: [Label(a, 0, 3)]
    3 - 4   REF: [Label(b, 3, 6)]   HYP: []
    4 - 6   REF: [Label(b, 3, 6)]   HYP: [Label(b, 4, 8)]
    6 - 7   REF: []                 HYP: [Label(b, 4, 8)]
    7 - 8   REF: [Label(c, 7, 10)]  HYP: [Label(b, 4, 8)]
    8 - 10  REF: [Label(c, 7, 10)]  HYP: [Label(c, 8, 10)]
]
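The segmentation in the example can be reproduced with a short standalone sketch. It cuts the timeline at every label start and end and collects the labels active between two neighbouring cut points. Labels are hypothetical (value, start, end) tuples. Gaps where no label is active are kept as empty segments, which is an assumption, and the time_threshold handling of create_event_list is not modelled.

```python
def invariant_segments(ref_labels, hyp_labels):
    """Return (start, end, ref, hyp) tuples with constant active labels."""
    # Every start or end time of any label is a potential segment boundary.
    times = sorted({t for _, s, e in ref_labels + hyp_labels for t in (s, e)})

    segments = []
    for start, end in zip(times, times[1:]):
        # A label is active in the segment if it spans the whole segment.
        ref = [l for l in ref_labels if l[1] <= start and l[2] >= end]
        hyp = [l for l in hyp_labels if l[1] <= start and l[2] >= end]
        segments.append((start, end, ref, hyp))
    return segments


ref = [("a", 0, 3), ("b", 3, 6), ("c", 7, 10)]
hyp = [("a", 0, 3), ("b", 4, 8), ("c", 8, 10)]
segments = invariant_segments(ref, hyp)
```

For the labels from the example above, this yields the same six time ranges: 0 - 3, 3 - 4, 4 - 6, 6 - 7, 7 - 8 and 8 - 10.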
-
static
create_event_list
(ref_labels, hyp_labels, time_threshold=0.01)[source]¶ Create an event list of all labels.
Parameters: - ref_labels (list) – Reference labels.
- hyp_labels (list) – Hypothesis labels.
- time_threshold (float) – If two event times are closer than this threshold the time of the earlier event is used for both events.
Returns: List of list of tuples. Every tuple contains a time, type (start or end), ll_index (ref/hyp) and the label which is responsible for the event. It is sorted ascending by time.
Return type: list
-
Candidates¶
Classes to find possible pairs of labels for alignment.
-
class
evalmate.alignment.
CandidateFinder
[source]¶ Class to find possible pairs of labels for further alignment. This is used for preprocessing and finding pairs of labels that may be aligned together. A label can be a candidate in multiple pairs.
-
find
(ref_labels, hyp_labels)[source]¶ Return candidates as pairs of labels, as well as labels that have no possible counterparts.
Parameters: - ref_labels (list) – List with reference labels (ground truth).
- hyp_labels (list) – List with hypothesis labels (system output).
Returns: A tuple (candidates, single-ref, single-hyp) containing the candidates in pairs, the ref-labels and the hyp-labels that have no possible counterpart.
Return type: tuple
-
-
class
evalmate.alignment.
StartEndCandidateFinder
(start_delta_threshold, end_delta_threshold=-1)[source]¶ Finds candidate pairs based on the difference between the start (and end) times of two labels.
Parameters: - start_delta_threshold (float) – Temporal tolerance of the start time in seconds. If the delta between the starts of the two labels is greater it is not a matching pair.
- end_delta_threshold (float) – Temporal tolerance of the end time in seconds. If the delta between the ends of the two labels is greater it is not a matching pair. If < 0 the end time is not checked at all.
-
find
(ref_labels, hyp_labels)[source]¶ Return candidates as pairs of labels, as well as labels that have no possible counterparts.
Parameters: - ref_labels (list) – List with reference labels (ground truth).
- hyp_labels (list) – List with hypothesis labels (system output).
Returns: A tuple (candidates, single-ref, single-hyp) containing the candidates in pairs, the ref-labels and the hyp-labels that have no possible counterpart.
Return type: tuple
-
class
evalmate.alignment.
OverlapCandidateFinder
(min_overlap=0.05)[source]¶ Finds candidates based on amount of overlapping between two labels.
Parameters: min_overlap (float) – Number of seconds the segment of overlap has to be, to include the combination of labels. (default 0.05 seconds) -
find
(ref_labels, hyp_labels)[source]¶ Return candidates as pairs of labels, as well as labels that have no possible counterparts.
Parameters: - ref_labels (list) – List with reference labels (ground truth).
- hyp_labels (list) – List with hypothesis labels (system output).
Returns: A tuple (candidates, single-ref, single-hyp) containing the candidates in pairs, the ref-labels and the hyp-labels that have no possible counterpart.
Return type: tuple
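Both candidate finders return the same kind of result and differ only in the rule that decides whether two labels may be paired. The standalone sketch below separates the two concerns. Labels are hypothetical (value, start, end) tuples, the helper names are made up for illustration, and none of this is the library code.

```python
def find_candidates(ref_labels, hyp_labels, is_candidate):
    """Return (candidates, single_ref, single_hyp) for a pairing rule."""
    candidates = []
    used_ref, used_hyp = set(), set()

    for i, ref in enumerate(ref_labels):
        for j, hyp in enumerate(hyp_labels):
            if is_candidate(ref, hyp):
                candidates.append((ref, hyp))
                used_ref.add(i)
                used_hyp.add(j)

    # Labels that never appeared in a candidate pair have no counterpart.
    single_ref = [l for i, l in enumerate(ref_labels) if i not in used_ref]
    single_hyp = [l for j, l in enumerate(hyp_labels) if j not in used_hyp]
    return candidates, single_ref, single_hyp


def start_end_rule(start_delta_threshold, end_delta_threshold=-1):
    """Pair labels whose start (and optionally end) times are close."""
    def rule(ref, hyp):
        if abs(ref[1] - hyp[1]) > start_delta_threshold:
            return False
        if end_delta_threshold < 0:
            return True  # end time is not checked at all
        return abs(ref[2] - hyp[2]) <= end_delta_threshold
    return rule


def overlap_rule(min_overlap=0.05):
    """Pair labels that overlap by at least min_overlap seconds."""
    def rule(ref, hyp):
        return min(ref[2], hyp[2]) - max(ref[1], hyp[1]) >= min_overlap
    return rule


ref = [("a", 0.0, 2.0), ("b", 5.0, 6.0)]
hyp = [("a", 0.25, 2.5), ("c", 9.0, 10.0)]

by_start = find_candidates(ref, hyp, start_end_rule(0.5))
by_start_end = find_candidates(ref, hyp, start_end_rule(0.5, 0.25))
by_overlap = find_candidates(ref, hyp, overlap_rule())
```

In this example the start-only rule and the overlap rule both pair the two 'a' labels, while additionally checking the end time with a tolerance of 0.25 seconds rejects the pair because the ends differ by 0.5 seconds.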
-
Utils¶
-
class
evalmate.alignment.
Segment
(start, end, ref=None, hyp=None)[source]¶ A class representing a segment within an alignment.
Parameters: - start (float) – The start time in seconds.
- end (float) – The end time in seconds.
Variables: - ref (Label, list) – List of or single reference label in the segment.
- hyp (Label, list) – List of or single hypothesis label in the segment.