Welcome to DyNN’s documentation!¶
DyNN¶
dynn package¶
DyNN¶
Subpackages¶
dynn.data package¶
Data¶
This module contains helper functions and classes to manage data. This includes code for minibatching as well as functions for downloading common datasets.
Supported datasets are: Amazon Reviews, CIFAR-10, MNIST, IWSLT, PTB, SNLI, SST and WikiText (2 and 103).
Subpackages¶
Iterators implementing common batching strategies.
-
class
dynn.data.batching.
NumpyBatches
(data, targets, batch_size=32, shuffle=True)¶ Bases:
object
Wraps a list of numpy arrays and a list of targets as a batch iterator.
You can then iterate over this object and get tuples of
batch_data, batch_targets
ready for use in your computation graph. Example for classification:
# 1000 10-dimensional inputs
data = np.random.uniform(size=(1000, 10))
# Class labels
labels = np.random.randint(10, size=1000)
# Iterator
batched_dataset = NumpyBatches(data, labels, batch_size=20)
# Training loop
for x, y in batched_dataset:
    # x has shape (10, 20) while y has shape (20,)
    # Do something with x and y
Example for multidimensional regression:
# 1000 10-dimensional inputs
data = np.random.uniform(size=(1000, 10))
# 5-dimensional outputs
labels = np.random.uniform(size=(1000, 5))
# Iterator
batched_dataset = NumpyBatches(data, labels, batch_size=20)
# Training loop
for x, y in batched_dataset:
    # x has shape (10, 20) while y has shape (5, 20)
    # Do something with x and y
Parameters: - data (list) – List of numpy arrays containing the data
- targets (list) – List of targets
- batch_size (int, optional) – Batch size (default: 32)
- shuffle (bool, optional) – Shuffle the dataset whenever starting a new iteration (default: True)
-
__getitem__
(index)¶ Returns the
index
-th sample. This returns something different every time the data is shuffled.
If index is a list or a slice this will return a batch.
The result is a tuple
batch_data, batch_target
where each of those is a numpy array in Fortran layout (for more efficient input in dynet). The batch size is always the last dimension.Parameters: index (int, slice) – Index or slice Returns: batch_data, batch_target
Return type: tuple
-
__init__
(data, targets, batch_size=32, shuffle=True)¶ Initialize self. See help(type(self)) for accurate signature.
-
__len__
()¶ This returns the number of batches in the dataset (not the total number of samples)
Returns: - Number of batches in the dataset
ceil(len(data)/batch_size)
Return type: int
-
__weakref__
¶ list of weak references to the object (if defined)
-
just_passed_multiple
(batch_number)¶ Checks whether the current number of batches processed has just passed a multiple of
batch_number
. For example you can use this to report at regular intervals (e.g. every 10 batches).
Parameters: batch_number (int) – Multiple to check against Returns: True
if the number of batches processed has just passed a multiple of batch_number Return type: bool
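Example (a minimal sketch of periodic progress reporting; the loss computation is omitted and the print format is only illustrative):
import numpy as np
from dynn.data.batching import NumpyBatches

data = np.random.uniform(size=(1000, 10))
labels = np.random.randint(10, size=1000)
batched_dataset = NumpyBatches(data, labels, batch_size=20)

for x, y in batched_dataset:
    # ... compute the loss on x, y and update the model here ...
    if batched_dataset.just_passed_multiple(10):
        # Report progress every 10 batches
        print("epoch progress (%):", batched_dataset.percentage_done())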
-
percentage_done
()¶ What percent of the data has been covered in the current epoch
-
reset
()¶ Reset the iterator and shuffle the dataset if applicable
-
-
class
dynn.data.batching.
SequenceBatch
(sequences, original_idxs=None, pad_idx=None, left_aligned=True)¶ Bases:
object
Batched sequence object with padding
This wraps a list of integer sequences into a nice array padded to the longest sequence. The batch dimension (number of sequences) is the last dimension.
By default the sequences are padded to the right which means that they are aligned to the left (they all start at index 0)
Parameters: - sequences (list) – List of list of integers
- original_idxs (list) – This list should point to the original position
of each sequence in the data (before shuffling/reordering). This is
useful when you want to access information that has been discarded
during preprocessing (e.g. the original sentence before numberizing and
<unk>
-ing in MT). - pad_idx (int) – Default index for padding
- left_aligned (bool, optional) – Align to the left (all sequences start at the same position).
-
__init__
(sequences, original_idxs=None, pad_idx=None, left_aligned=True)¶ Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
¶ list of weak references to the object (if defined)
-
collate
(sequences)¶ Pad and concatenate sequences to an array
Args: sequences (list): List of list of integers pad_idx (int): Default index for padding
Returns: max_len x batch_size
array Return type: np.ndarray
-
get_mask
(base_val=1, mask_val=0)¶ Return a mask expression with specific values for padding tokens.
This will return an expression of the same shape as self.sequences
where the i-th element of batch b is base_val iff i <= lengths[b] (and mask_val otherwise).
For example, if size is 4 and lengths is [1, 2, 4] then the returned mask will be:
1 0 0 0
1 1 0 0
1 1 1 1
(here each row is a batch element)
Parameters: - base_val (number, optional) – Value for non-padded positions (default: 1)
- mask_val (number, optional) – Value for padded positions (default: 0)
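Example (a minimal sketch of building a SequenceBatch and requesting its mask; the sequences are made up and the exact shape of the returned expression follows the description above):
import dynet as dy
from dynn.data.batching import SequenceBatch

# Three integer sequences of lengths 1, 2 and 4, padded with index 0
batch = SequenceBatch([[3], [4, 5], [6, 7, 8, 9]], pad_idx=0)

dy.renew_cg()
# 1 for real tokens, 0 for padding positions
mask = batch.get_mask(base_val=1, mask_val=0)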
-
class
dynn.data.batching.
PaddedSequenceBatches
(data, targets=None, max_samples=32, pad_idx=0, max_tokens=inf, strict_token_limit=False, shuffle=True, group_by_length=True, left_aligned=True)¶ Bases:
object
Wraps a list of sequences and a list of targets as a batch iterator.
You can then iterate over this object and get tuples of
batch_data, batch_targets
ready for use in your computation graph. Example:
# Dictionary
dic = dynn.data.dictionary.Dictionary(symbols="abcde".split())
# 1000 sequences of various lengths up to 10
data = [np.random.randint(len(dic), size=np.random.randint(10))
        for _ in range(1000)]
# Class labels
labels = np.random.randint(10, size=1000)
# Iterator with at most 20 samples per batch
batched_dataset = PaddedSequenceBatches(
    data,
    targets=labels,
    max_samples=20,
    pad_idx=dic.pad_idx,
)
# Training loop
for x, y in batched_dataset:
    # x is a SequenceBatch object
    # and y has shape (batch_size,)
    # Do something with x and y

# Without labels
batched_dataset = PaddedSequenceBatches(
    data,
    max_samples=20,
    pad_idx=dic.pad_idx,
)
for x in batched_dataset:
    # x is a SequenceBatch object
    # Do something with x
Parameters: - data (list) – List of numpy arrays containing the data
- targets (list) – List of targets
- pad_idx (int) – Index used at padded positions
- max_samples (int, optional) – Maximum number of samples per batch
- max_tokens (int, optional) – Maximum number of tokens per batch. This count doesn’t include padding tokens
- strict_token_limit (bool, optional) – Padding tokens will count towards
the
max_tokens
limit - shuffle (bool, optional) – Shuffle the dataset whenever starting a new
iteration (default:
True
) - group_by_length (bool, optional) – Group sequences by length. This minimizes the number of padding tokens. The batches are not strictly IID though.
- left_aligned (bool, optional) – Align the sequences to the left
-
__getitem__
(index)¶ Returns the
index
-th sample. The result is a tuple
batch_data, batch_target
where the first is a batch of sequences and the other is a numpy array in Fortran layout (for more efficient input in dynet). batch_data
is aSequenceBatch
objectParameters: index (int, slice) – Index or slice Returns: batch_data, batch_target
Return type: tuple
-
__init__
(data, targets=None, max_samples=32, pad_idx=0, max_tokens=inf, strict_token_limit=False, shuffle=True, group_by_length=True, left_aligned=True)¶ Initialize self. See help(type(self)) for accurate signature.
-
__len__
()¶ This returns the number of batches in the dataset (not the total number of samples)
Returns: - Number of batches in the dataset
ceil(len(data)/batch_size)
Return type: int
-
__weakref__
¶ list of weak references to the object (if defined)
-
just_passed_multiple
(batch_number)¶ Checks whether the current number of batches processed has just passed a multiple of
batch_number
. For example you can use this to report at regular intervals (e.g. every 10 batches).
Parameters: batch_number (int) – Multiple to check against Returns: True
if the number of batches processed has just passed a multiple of batch_number Return type: bool
-
percentage_done
()¶ What percent of the data has been covered in the current epoch
-
reset
()¶ Reset the iterator and shuffle the dataset if applicable
-
class
dynn.data.batching.
BPTTBatches
(data, batch_size=32, seq_length=30)¶ Bases:
object
Wraps a list of sequences as a contiguous batch iterator.
This will iterate over batches of contiguous subsequences of size
seq_length
. The data is laid out as batch_size contiguous streams, and each batch contains the next seq_length steps of every stream (truncated backpropagation through time). Example:
# Sequence of length 1000
data = np.random.randint(10, size=1000)
# Iterator over subsequences of length 20 with batch size 5
batched_dataset = BPTTBatches(data, batch_size=5, seq_length=20)
# Training loop
for x, y in batched_dataset:
    # x and y have shape (seq_length, batch_size)
    # y[t] == x[t+1]
    # Do something with x and y
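The contiguous-stream layout can be pictured with plain numpy; the snippet below is only an illustration of the batching scheme described above, not the library's actual implementation:
import numpy as np

data = np.arange(1000)            # one long token stream
batch_size, seq_length = 5, 20

# Split the stream into batch_size contiguous chunks
n_steps = len(data) // batch_size
streams = data[:n_steps * batch_size].reshape(batch_size, n_steps).T
# streams has shape (n_steps, batch_size); column b is the b-th chunk

# One BPTT batch: seq_length consecutive steps, plus the same steps shifted by one
x = streams[0:seq_length]           # inputs,  shape (seq_length, batch_size)
y = streams[1:seq_length + 1]       # targets, shape (seq_length, batch_size)
assert (y[:-1] == x[1:]).all()      # y[t] == x[t+1]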
Parameters: - data (list) – List of sequences (treated as one contiguous stream)
- batch_size (int, optional) – Batch size (default: 32)
- seq_length (int, optional) – Length of the subsequences (default: 30)
-
__getitem__
(index)¶ Returns the
index
-th sample. The result is a tuple
x, next_x
of numpy arrays of shape seq_len x batch_size.
seq_length is determined by the range specified by index, and next_x[t] = x[t+1] for all t.
Parameters: index (int, slice) – Index or slice Returns: x, next_x
Return type: tuple
-
__init__
(data, batch_size=32, seq_length=30)¶ Initialize self. See help(type(self)) for accurate signature.
-
__len__
()¶ This returns the number of batches in the dataset (not the total number of samples)
Returns: - Number of batches in the dataset
ceil(len(data)/batch_size)
Return type: int
-
__weakref__
¶ list of weak references to the object (if defined)
-
just_passed_multiple
(batch_number)¶ Checks whether the current number of batches processed has just passed a multiple of
batch_number
. For example you can use this to report at regular intervals (e.g. every 10 batches).
Parameters: batch_number (int) – Multiple to check against Returns: True
if the number of batches processed has just passed a multiple of batch_number Return type: bool
-
percentage_done
()¶ What percent of the data has been covered in the current epoch
-
reset
()¶ Reset the iterator and shuffle the dataset if applicable
-
-
class
dynn.data.batching.
SequencePairsBatches
(src_data, tgt_data, src_dictionary, tgt_dictionary=None, labels=None, max_samples=32, max_tokens=99999999, strict_token_limit=False, shuffle=True, group_by_length='source', src_left_aligned=True, tgt_left_aligned=True)¶ Bases:
object
Wraps two lists of sequences as a batch iterator.
This is useful for sequence-to-sequence problems or sentence pairs classification (entailment, paraphrase detection…). Following seq2seq conventions the first sequence is referred to as the “source” and the second as the “target”.
You can then iterate over this object and get tuples of
src_batch, tgt_batch
ready for use in your computation graph. Example:
# Dictionary
dic = dynn.data.dictionary.Dictionary(symbols="abcde".split())
# 1000 source sequences of various lengths up to 10
src_data = [np.random.randint(len(dic), size=np.random.randint(10))
            for _ in range(1000)]
# 1000 target sequences of various lengths up to 10
tgt_data = [np.random.randint(len(dic), size=np.random.randint(10))
            for _ in range(1000)]
# Iterator with at most 20 samples per batch
batched_dataset = SequencePairsBatches(
    src_data, tgt_data, dic, max_samples=20
)
# Training loop
for x, y in batched_dataset:
    # x and y are SequenceBatch objects
Parameters: - src_data (list) – List of source sequences (list of int iterables)
- tgt_data (list) – List of target sequences (list of int iterables)
- src_dictionary (Dictionary) – Source dictionary
- tgt_dictionary (Dictionary) – Target dictionary
- max_samples (int, optional) – Maximum number of samples per batch (one sample is a pair of sentences)
- max_tokens (int, optional) – Maximum number of total tokens per batch (source + target tokens)
- strict_token_limit (bool, optional) – Padding tokens will count towards
the
max_tokens
limit - shuffle (bool, optional) – Shuffle the dataset whenever starting a new
iteration (default:
True
) - group_by_length (str, optional) – Group sequences by length. One of
"source"
or"target"
. This minimizes the number of padding tokens. The batches are not strictly IID though. - src_left_aligned (bool, optional) – Align the source sequences to the left
- tgt_left_aligned (bool, optional) – Align the target sequences to the left
-
__getitem__
(index)¶ Returns the
index
-th sample. The result is a tuple
src_batch, tgt_batch
where each is a SequenceBatch
object. Parameters: index (int, slice) – Index or slice Returns: src_batch, tgt_batch
Return type: tuple
-
__init__
(src_data, tgt_data, src_dictionary, tgt_dictionary=None, labels=None, max_samples=32, max_tokens=99999999, strict_token_limit=False, shuffle=True, group_by_length='source', src_left_aligned=True, tgt_left_aligned=True)¶ Initialize self. See help(type(self)) for accurate signature.
-
__len__
()¶ This returns the number of batches in the dataset (not the total number of samples)
Returns: - Number of batches in the dataset
ceil(len(data)/batch_size)
Return type: int
-
__weakref__
¶ list of weak references to the object (if defined)
-
just_passed_multiple
(batch_number)¶ Checks whether the current number of batches processed has just passed a multiple of
batch_number
. For example you can use this to report at regular intervals (e.g. every 10 batches).
Parameters: batch_number (int) – Multiple to check against Returns: True
if the number of batches processed has just passed a multiple of batch_number Return type: bool
-
percentage_done
()¶ What percent of the data has been covered in the current epoch
-
reset
()¶ Reset the iterator and shuffle the dataset if applicable
-
class
dynn.data.batching.bptt_batching.
BPTTBatches
(data, batch_size=32, seq_length=30)¶ Bases:
object
Wraps a list of sequences as a contiguous batch iterator.
This will iterate over batches of contiguous subsequences of size
seq_length
. The data is laid out as batch_size contiguous streams, and each batch contains the next seq_length steps of every stream (truncated backpropagation through time). Example:
# Sequence of length 1000
data = np.random.randint(10, size=1000)
# Iterator over subsequences of length 20 with batch size 5
batched_dataset = BPTTBatches(data, batch_size=5, seq_length=20)
# Training loop
for x, y in batched_dataset:
    # x and y have shape (seq_length, batch_size)
    # y[t] == x[t+1]
    # Do something with x and y
Parameters: - data (list) – List of sequences (treated as one contiguous stream)
- batch_size (int, optional) – Batch size (default: 32)
- seq_length (int, optional) – Length of the subsequences (default: 30)
-
__getitem__
(index)¶ Returns the
index
-th sample. The result is a tuple
x, next_x
of numpy arrays of shape seq_len x batch_size.
seq_length is determined by the range specified by index, and next_x[t] = x[t+1] for all t.
Parameters: index (int, slice) – Index or slice Returns: x, next_x
Return type: tuple
-
__init__
(data, batch_size=32, seq_length=30)¶ Initialize self. See help(type(self)) for accurate signature.
-
__len__
()¶ This returns the number of batches in the dataset (not the total number of samples)
Returns: - Number of batches in the dataset
ceil(len(data)/batch_size)
Return type: int
-
__weakref__
¶ list of weak references to the object (if defined)
-
just_passed_multiple
(batch_number)¶ Checks whether the current number of batches processed has just passed a multiple of
batch_number
. For example you can use this to report at regular intervals (e.g. every 10 batches).
Parameters: batch_number (int) – Multiple to check against Returns: True
if the number of batches processed has just passed a multiple of batch_number Return type: bool
-
percentage_done
()¶ What percent of the data has been covered in the current epoch
-
reset
()¶ Reset the iterator and shuffle the dataset if applicable
-
-
class
dynn.data.batching.numpy_batching.
NumpyBatches
(data, targets, batch_size=32, shuffle=True)¶ Bases:
object
Wraps a list of numpy arrays and a list of targets as a batch iterator.
You can then iterate over this object and get tuples of
batch_data, batch_targets
ready for use in your computation graph. Example for classification:
# 1000 10-dimensional inputs
data = np.random.uniform(size=(1000, 10))
# Class labels
labels = np.random.randint(10, size=1000)
# Iterator
batched_dataset = NumpyBatches(data, labels, batch_size=20)
# Training loop
for x, y in batched_dataset:
    # x has shape (10, 20) while y has shape (20,)
    # Do something with x and y
Example for multidimensional regression:
# 1000 10-dimensional inputs
data = np.random.uniform(size=(1000, 10))
# 5-dimensional outputs
labels = np.random.uniform(size=(1000, 5))
# Iterator
batched_dataset = NumpyBatches(data, labels, batch_size=20)
# Training loop
for x, y in batched_dataset:
    # x has shape (10, 20) while y has shape (5, 20)
    # Do something with x and y
Parameters: - data (list) – List of numpy arrays containing the data
- targets (list) – List of targets
- batch_size (int, optional) – Batch size (default: 32)
- shuffle (bool, optional) – Shuffle the dataset whenever starting a new iteration (default: True)
-
__getitem__
(index)¶ Returns the
index
-th sample. This returns something different every time the data is shuffled.
If index is a list or a slice this will return a batch.
The result is a tuple
batch_data, batch_target
where each of those is a numpy array in Fortran layout (for more efficient input in dynet). The batch size is always the last dimension.Parameters: index (int, slice) – Index or slice Returns: batch_data, batch_target
Return type: tuple
-
__init__
(data, targets, batch_size=32, shuffle=True)¶ Initialize self. See help(type(self)) for accurate signature.
-
__len__
()¶ This returns the number of batches in the dataset (not the total number of samples)
Returns: - Number of batches in the dataset
ceil(len(data)/batch_size)
Return type: int
-
__weakref__
¶ list of weak references to the object (if defined)
-
just_passed_multiple
(batch_number)¶ Checks whether the current number of batches processed has just passed a multiple of
batch_number
. For example you can use this to report at regular intervals (e.g. every 10 batches).
Parameters: batch_number (int) – Multiple to check against Returns: True
if the number of batches processed has just passed a multiple of batch_number Return type: bool
-
percentage_done
()¶ What percent of the data has been covered in the current epoch
-
reset
()¶ Reset the iterator and shuffle the dataset if applicable
-
-
class
dynn.data.batching.padded_sequence_batching.
PaddedSequenceBatches
(data, targets=None, max_samples=32, pad_idx=0, max_tokens=inf, strict_token_limit=False, shuffle=True, group_by_length=True, left_aligned=True)¶ Bases:
object
Wraps a list of sequences and a list of targets as a batch iterator.
You can then iterate over this object and get tuples of
batch_data, batch_targets
ready for use in your computation graph. Example:
# Dictionary
dic = dynn.data.dictionary.Dictionary(symbols="abcde".split())
# 1000 sequences of various lengths up to 10
data = [np.random.randint(len(dic), size=np.random.randint(10))
        for _ in range(1000)]
# Class labels
labels = np.random.randint(10, size=1000)
# Iterator with at most 20 samples per batch
batched_dataset = PaddedSequenceBatches(
    data,
    targets=labels,
    max_samples=20,
    pad_idx=dic.pad_idx,
)
# Training loop
for x, y in batched_dataset:
    # x is a SequenceBatch object
    # and y has shape (batch_size,)
    # Do something with x and y

# Without labels
batched_dataset = PaddedSequenceBatches(
    data,
    max_samples=20,
    pad_idx=dic.pad_idx,
)
for x in batched_dataset:
    # x is a SequenceBatch object
    # Do something with x
Parameters: - data (list) – List of numpy arrays containing the data
- targets (list) – List of targets
- pad_idx (int) – Index used at padded positions
- max_samples (int, optional) – Maximum number of samples per batch
- max_tokens (int, optional) – Maximum number of tokens per batch. This count doesn’t include padding tokens
- strict_token_limit (bool, optional) – Padding tokens will count towards
the
max_tokens
limit - shuffle (bool, optional) – Shuffle the dataset whenever starting a new
iteration (default:
True
) - group_by_length (bool, optional) – Group sequences by length. This minimizes the number of padding tokens. The batches are not strictly IID though.
- left_aligned (bool, optional) – Align the sequences to the left
-
__getitem__
(index)¶ Returns the
index
-th sample. The result is a tuple
batch_data, batch_target
where the first is a batch of sequences and the other is a numpy array in Fortran layout (for more efficient input in dynet). batch_data
is aSequenceBatch
objectParameters: index (int, slice) – Index or slice Returns: batch_data, batch_target
Return type: tuple
-
__init__
(data, targets=None, max_samples=32, pad_idx=0, max_tokens=inf, strict_token_limit=False, shuffle=True, group_by_length=True, left_aligned=True)¶ Initialize self. See help(type(self)) for accurate signature.
-
__len__
()¶ This returns the number of batches in the dataset (not the total number of samples)
Returns: - Number of batches in the dataset
ceil(len(data)/batch_size)
Return type: int
-
__weakref__
¶ list of weak references to the object (if defined)
-
just_passed_multiple
(batch_number)¶ Checks whether the current number of batches processed has just passed a multiple of
batch_number
. For example you can use this to report at regular intervals (e.g. every 10 batches).
Parameters: batch_number (int) – Multiple to check against Returns: True
if the number of batches processed has just passed a multiple of batch_number Return type: bool
-
percentage_done
()¶ What percent of the data has been covered in the current epoch
-
reset
()¶ Reset the iterator and shuffle the dataset if applicable
-
class
dynn.data.batching.parallel_sequences_batching.
SequencePairsBatches
(src_data, tgt_data, src_dictionary, tgt_dictionary=None, labels=None, max_samples=32, max_tokens=99999999, strict_token_limit=False, shuffle=True, group_by_length='source', src_left_aligned=True, tgt_left_aligned=True)¶ Bases:
object
Wraps two lists of sequences as a batch iterator.
This is useful for sequence-to-sequence problems or sentence pairs classification (entailment, paraphrase detection…). Following seq2seq conventions the first sequence is referred to as the “source” and the second as the “target”.
You can then iterate over this object and get tuples of
src_batch, tgt_batch
ready for use in your computation graph. Example:
# Dictionary
dic = dynn.data.dictionary.Dictionary(symbols="abcde".split())
# 1000 source sequences of various lengths up to 10
src_data = [np.random.randint(len(dic), size=np.random.randint(10))
            for _ in range(1000)]
# 1000 target sequences of various lengths up to 10
tgt_data = [np.random.randint(len(dic), size=np.random.randint(10))
            for _ in range(1000)]
# Iterator with at most 20 samples per batch
batched_dataset = SequencePairsBatches(
    src_data, tgt_data, dic, max_samples=20
)
# Training loop
for x, y in batched_dataset:
    # x and y are SequenceBatch objects
Parameters: - src_data (list) – List of source sequences (list of int iterables)
- tgt_data (list) – List of target sequences (list of int iterables)
- src_dictionary (Dictionary) – Source dictionary
- tgt_dictionary (Dictionary) – Target dictionary
- max_samples (int, optional) – Maximum number of samples per batch (one sample is a pair of sentences)
- max_tokens (int, optional) – Maximum number of total tokens per batch (source + target tokens)
- strict_token_limit (bool, optional) – Padding tokens will count towards
the
max_tokens
limit - shuffle (bool, optional) – Shuffle the dataset whenever starting a new
iteration (default:
True
) - group_by_length (str, optional) – Group sequences by length. One of
"source"
or"target"
. This minimizes the number of padding tokens. The batches are not strictly IID though. - src_left_aligned (bool, optional) – Align the source sequences to the left
- tgt_left_aligned (bool, optional) – Align the target sequences to the left
-
__getitem__
(index)¶ Returns the
index
-th sample. The result is a tuple
src_batch, tgt_batch
where each is a SequenceBatch
object. Parameters: index (int, slice) – Index or slice Returns: src_batch, tgt_batch
Return type: tuple
-
__init__
(src_data, tgt_data, src_dictionary, tgt_dictionary=None, labels=None, max_samples=32, max_tokens=99999999, strict_token_limit=False, shuffle=True, group_by_length='source', src_left_aligned=True, tgt_left_aligned=True)¶ Initialize self. See help(type(self)) for accurate signature.
-
__len__
()¶ This returns the number of batches in the dataset (not the total number of samples)
Returns: - Number of batches in the dataset
ceil(len(data)/batch_size)
Return type: int
-
__weakref__
¶ list of weak references to the object (if defined)
-
just_passed_multiple
(batch_number)¶ Checks whether the current number of batches processed has just passed a multiple of
batch_number
. For example you can use this to report at regular intervals (e.g. every 10 batches).
Parameters: batch_number (int) – Multiple to check against Returns: True
if the number of batches processed has just passed a multiple of batch_number Return type: bool
-
percentage_done
()¶ What percent of the data has been covered in the current epoch
-
reset
()¶ Reset the iterator and shuffle the dataset if applicable
-
class
dynn.data.batching.sequence_batch.
SequenceBatch
(sequences, original_idxs=None, pad_idx=None, left_aligned=True)¶ Bases:
object
Batched sequence object with padding
This wraps a list of integer sequences into a nice array padded to the longest sequence. The batch dimension (number of sequences) is the last dimension.
By default the sequences are padded to the right which means that they are aligned to the left (they all start at index 0)
Parameters: - sequences (list) – List of list of integers
- original_idxs (list) – This list should point to the original position
of each sequence in the data (before shuffling/reordering). This is
useful when you want to access information that has been discarded
during preprocessing (e.g. the original sentence before numberizing and
<unk>
-ing in MT). - pad_idx (int) – Default index for padding
- left_aligned (bool, optional) – Align to the left (all sequences start at the same position).
-
__init__
(sequences, original_idxs=None, pad_idx=None, left_aligned=True)¶ Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
¶ list of weak references to the object (if defined)
-
collate
(sequences)¶ Pad and concatenate sequences to an array
Args: sequences (list): List of list of integers pad_idx (int): Default index for padding
Returns: max_len x batch_size
array Return type: np.ndarray
-
get_mask
(base_val=1, mask_val=0)¶ Return a mask expression with specific values for padding tokens.
This will return an expression of the same shape as self.sequences
where the i-th element of batch b is base_val iff i <= lengths[b] (and mask_val otherwise).
For example, if size is 4 and lengths is [1, 2, 4] then the returned mask will be:
1 0 0 0
1 1 0 0
1 1 1 1
(here each row is a batch element)
Parameters: - base_val (number, optional) – Value for non-padded positions (default: 1)
- mask_val (number, optional) – Value for padded positions (default: 0)
Submodules¶
Various functions for accessing the Amazon Reviews dataset.
-
dynn.data.amazon.
download_amazon
(path='.', force=False)¶ Downloads the Amazon dataset from “http://riejohnson.com/software/”
Parameters:
-
dynn.data.amazon.
load_amazon
(path, tok=True, size='200k')¶ Loads the Amazon dataset
Returns the train, dev and test sets in a dictionary, each as a tuple containing the reviews and the labels.
Parameters: path (str) – Path to the folder containing the elec2.tar.gz
fileReturns: - Dictionary containing the train and test sets
- (dictionary of review/labels tuples)
Return type: dict
-
dynn.data.amazon.
read_amazon
(split, path, tok=True, size='200k')¶ Iterates over the Amazon dataset
Example:
for review, label in read_amazon("train", "/path/to/amazon"):
    train(review, label)
Parameters: Returns: review, label
Return type:
-
dynn.data.caching.
cached_to_file
(filename)¶ Decorator to cache the output of a function to a file
Sometimes your workflow will contain functions that are executed once but take a lot of time (typically data preprocessing). This can be annoying when e.g. running multiple experiments with different parameters. This decorator provides a solution by running the function once, then saving its output to a file. Next time you call this function, and unless the file in question has been deleted, the function will just read its result from the file instead of recomputing everything.
Caveats: - By default if you call the decorated function with different arguments,
this will still load the cached output from the first function call with the original arguments. You need to add the update_cache=True keyword argument to force the function to be rerun. Incidentally, the decorated function should not have an argument named update_cache. - The serialization is done with pickle, so:
- it isn’t super secure (if you care about these things)
- it only handles functions where the outputs can be pickled (for now). Typically this wouldn’t work for dynet objects.
Example usage:
@cached_to_file("preprocessed_data.bin")
def preprocess(raw_data):
    # do a lot of preprocessing

# [...] do something else

# This first call will run the function and pickle its output to
# "preprocessed_data.bin" (and return the output)
data = preprocess(raw_data)

# [...] do something else, or maybe rerun the program

# This will just load the output from "preprocessed_data.bin"
data = preprocess(raw_data)

# [...] do something else, or maybe rerun the program

# This will force the function to be rerun and the cached output to be
# updated. You should do that if for example the arguments of
# `preprocess` are expected to change
data = preprocess(raw_data, update_cache=True)
Parameters: filename (str) – Name of the file where the cached output should be saved to.
Various functions for accessing the CIFAR10 dataset.
-
dynn.data.cifar10.
download_cifar10
(path='.', force=False)¶ Downloads CIFAR10 from “https://www.cs.toronto.edu/~kriz/cifar.html”
Parameters:
-
dynn.data.cifar10.
load_cifar10
(path)¶ Loads the CIFAR10 dataset
Returns the train and test set, each as a list of images and a list of labels. The images are represented as numpy arrays and the labels as integers.
Parameters: path (str) – Path to the folder containing the *-ubyte.gz
filesReturns: train and test sets Return type: tuple
-
dynn.data.cifar10.
read_cifar10
(split, path)¶ Iterates over the CIFAR10 dataset
Example:
for image, label in read_cifar10("train", "/path/to/cifar10"):
    train(image, label)
Parameters: Returns: image, label
Return type:
Helper functions to download and manage datasets.
-
dynn.data.data_util.
download_if_not_there
(file, url, path, force=False, local_file=None)¶ Downloads a file from the given url if and only if the file doesn’t already exist in the provided path or
force=True
Parameters: - file (str) – File name
- url (str) – Url where the file can be found (without the filename)
- path (str) – Path to the local folder where the file should be stored
- force (bool, optional) – Force the file download (useful if you suspect that the file might have changed)
- local_file (str, optional) – File name for the local file (defaults to
file
)
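Example (a minimal usage sketch; the file name and URL are only an illustration, borrowed from the MNIST section below):
from dynn.data.data_util import download_if_not_there

# Download the file into "data/" only if it isn't already there
download_if_not_there(
    "t10k-images-idx3-ubyte.gz",
    "http://yann.lecun.com/exdb/mnist/",
    "data/",
)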
Dictionary object for holding string to index mappings
Various functions for accessing the IWSLT translation datasets
-
dynn.data.iwslt.
download_iwslt
(path='.', year='2016', langpair='de-en', force=False)¶ Downloads the IWSLT from “https://wit3.fbk.eu/archive/”
Parameters:
-
dynn.data.iwslt.
load_iwslt
(path, year='2016', langpair='de-en', src_eos=None, tgt_eos='<eos>')¶ Loads the IWSLT dataset
Returns the train, dev and test set, each as lists of source and target sentences.
Parameters: - path (str) – Path to the folder containing the
.tgz
file - year (str, optional) – IWSLT year (for now only 2016 is supported)
- langpair (str, optional) –
src-tgt
language pair (for now only{de,fr}-en
are supported) - src_eos (str, optional) – Optionally append an end of sentence token to each source line.
- tgt_eos (str, optional) – Optionally append an end of sentence token to each target line.
Returns: train, dev and test sets
Return type: tuple
-
dynn.data.iwslt.
read_iwslt
(split, path, year='2016', langpair='de-en', src_eos=None, tgt_eos='<eos>')¶ Iterates over the IWSLT dataset
Example:
for src, tgt in read_iwslt("train", "/path/to/iwslt"):
    train(src, tgt)
Parameters: - split (str) – Either
"train"
,"dev"
or"test"
- path (str) – Path to the folder containing the
.tgz
file - year (str, optional) – IWSLT year (for now only 2016 is supported)
- langpair (str, optional) –
src-tgt
language pair (for now only{de,fr}-en
are supported) - src_eos (str, optional) – Optionally append an end of sentence token to each source line.
- tgt_eos (str, optional) – Optionally append an end of sentence token to each target line.
Returns: Source sentence, Target sentence
Return type: tuple
Various functions for accessing the MNIST dataset.
-
dynn.data.mnist.
download_mnist
(path='.', force=False)¶ Downloads MNIST from “http://yann.lecun.com/exdb/mnist/”
Parameters:
-
dynn.data.mnist.
load_mnist
(path)¶ Loads the MNIST dataset
Returns MNIST as a dictionary.
Example:
mnist = load_mnist(".")
# Train images and labels
train_imgs, train_labels = mnist["train"]
# Test images and labels
test_imgs, test_labels = mnist["test"]
The images are represented as numpy arrays and the labels as integers.
Parameters: path (str) – Path to the folder containing the *-ubyte.gz
filesReturns: MNIST dataset Return type: dict
-
dynn.data.mnist.
read_mnist
(split, path)¶ Iterates over the MNIST dataset
Example:
for image, label in read_mnist("train", "/path/to/mnist"):
    train(image, label)
Parameters: Returns: image, label
Return type:
Useful functions for preprocessing data
-
dynn.data.preprocess.
lowercase
(data)¶ Lowercase text
Parameters: data (list,str) – Data to lowercase (either a string or a list [of lists..] of strings) Returns: Lowercased data Return type: list, str
-
dynn.data.preprocess.
normalize
(data)¶ Normalize the data to mean 0 std 1
Parameters: data (list,np.ndarray) – data to normalize Returns: Normalized data Return type: list,np.array
-
dynn.data.preprocess.
tokenize
(data, tok='space', lang='en')¶ Tokenize text data.
There are 5 tokenizers supported:
- “space”: split along whitespaces
- “char”: split in characters
- “13a”: Official WMT tokenization
- “zh”: Chinese tokenization (see the sacrebleu documentation)
- “moses”: Moses tokenizer (you can specify the language). Uses the sacremoses package.
Parameters: Returns: Tokenized data
Return type:
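Example (a minimal sketch combining the preprocessing helpers; the sentences are made up):
from dynn.data.preprocess import lowercase, tokenize

sentences = ["Hello there !", "General Kenobi ."]
# Tokenize on whitespace, then lowercase
# (both helpers accept strings or [nested] lists of strings)
tokens = tokenize(sentences, tok="space")
tokens = lowercase(tokens)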
Various functions for accessing the PTB dataset used by Mikolov et al., 2010.
-
dynn.data.ptb.
download_ptb
(path='.', force=False)¶ Downloads the PTB from “http://www.fit.vutbr.cz/~imikolov/rnnlm”
Parameters:
-
dynn.data.ptb.
load_ptb
(path, eos=None)¶ Loads the PTB dataset
Returns the train, validation and test sets as a dictionary mapping each split name to a list of sentences (strings).
Parameters: Returns: dictionary mapping the split name to a list of strings
Return type:
-
dynn.data.ptb.
read_ptb
(split, path, eos=None)¶ Iterates over the PTB dataset
Example:
for sent in read_ptb("train", "/path/to/ptb"):
    train(sent)
Parameters: Returns: sentence
Return type:
Various functions for accessing the SNLI dataset.
-
dynn.data.snli.
download_snli
(path='.', force=False)¶ Downloads the SNLI from “https://nlp.stanford.edu/projects/snli/”
Parameters:
-
dynn.data.snli.
load_snli
(path, terminals_only=True, binary=False)¶ Loads the SNLI dataset
Returns the train, dev and test sets in a dictionary, each as a tuple containing the trees and the labels.
Parameters: Returns: - Dictionary containing the train, dev and test sets
(tuple of tree/labels tuples)
Return type:
-
dynn.data.snli.
read_snli
(split, path, terminals_only=True, binary=False)¶ Iterates over the SNLI dataset
Example:
for tree, label in read_snli("train", "/path/to/snli"):
    train(tree, label)
Parameters: Returns: tree, label
Return type:
Various functions for accessing the SST dataset.
-
dynn.data.sst.
download_sst
(path='.', force=False)¶ Downloads the SST from “https://nlp.stanford.edu/sentiment/”
Parameters:
-
dynn.data.sst.
load_sst
(path, terminals_only=True, binary=False)¶ Loads the SST dataset
Returns the train, dev and test sets in a dictionary, each as a tuple containing the trees and the labels.
Parameters: Returns: - Dictionary containing the train, dev and test sets
(tuple of tree/labels tuples)
Return type:
-
dynn.data.sst.
read_sst
(split, path, terminals_only=True, binary=False)¶ Iterates over the SST dataset
Example:
for tree, label in read_sst("train", "/path/to/sst"):
    train(tree, label)
Parameters: Returns: tree, label
Return type:
Helper functions to handle tree-structured data
Various functions for accessing the WikiText datasets (WikiText-2 and WikiText-103).
-
dynn.data.wikitext.
download_wikitext
(path='.', name='2', force=False)¶ Downloads the WikiText from “http://www.fit.vutbr.cz/~imikolov/rnnlm”
Parameters:
-
dynn.data.wikitext.
load_wikitext
(path, name='2', eos=None)¶ Loads the WikiText dataset
Returns the train, validation and test sets, each as a list of sentences (each sentence is a list of words).
Parameters: Returns: dictionary mapping the split name to a list of strings
Return type:
-
dynn.data.wikitext.
read_wikitext
(split, path, name='2', eos=None)¶ Iterates over the WikiText dataset
Example:
for sent in read_wikitext("train", "/path/to/wikitext"):
    train(sent)
Parameters: Returns: list of words
Return type:
dynn.layers package¶
Layers¶
Layers are the standard unit of neural models in DyNN. Layers are typically used like this:
# Instantiate layer
layer = Layer(parameter_collection, *args, **kwargs)
# [...]
# Renew computation graph
dy.renew_cg()
# Initialize layer
layer.init(*args, **kwargs)
# Apply layer forward pass
y = layer(x)
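As a concrete sketch of this pattern (using the Affine layer documented below; the dimensions are arbitrary):
import dynet as dy
from dynn.layers import Affine

# Instantiate layer
pc = dy.ParameterCollection()
layer = Affine(pc, 10, 5)
# Renew computation graph
dy.renew_cg()
# Initialize layer
layer.init(test=True)
# Apply layer forward pass
x = dy.inputVector([0.0] * 10)
y = layer(x)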
-
class
dynn.layers.
BaseLayer
(name)¶ Bases:
object
Base layer interface
-
__call__
(*args, **kwargs)¶ Execute forward pass
-
__init__
(name)¶ Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
¶ list of weak references to the object (if defined)
-
init
(test=True, update=False)¶ Initialize the layer before performing computation
For example setup dropout, freeze some parameters, etc…
-
init_layer
(test=True, update=False)¶ Initializes only this layer’s parameters (not recursive) This needs to be implemented for each layer
-
sublayers
¶ Returns all attributes of the layer which are layers themselves
-
-
class
dynn.layers.
ParametrizedLayer
(pc, name)¶ Bases:
dynn.layers.base_layers.BaseLayer
This is the base class for layers with trainable parameters
When implementing a ParametrizedLayer, use
self.add_parameters
/self.add_lookup_parameters
to add parameters to the layer.-
__init__
(pc, name)¶ Creates a subcollection for this layer with a custom name
-
add_lookup_parameters
(name, dim, lookup_param=None, init=None, device='', scale=1.0, mean=0.0, std=1.0)¶ This adds a lookup parameter to this layer’s ParameterCollection.
The layer will have 1 new attribute:
self.[name]
which will contain the lookup parameter object (which you should use in __call__
). You can provide an existing lookup parameter with the
lookup_param
argument, in which case this parameter will be reused.The other arguments are the same as
dynet.ParameterCollection.add_lookup_parameters
-
add_parameters
(name, dim, param=None, init=None, device='', scale=1.0, mean=0.0, std=1.0)¶ This adds a parameter to this layer’s ParameterCollection.
The layer will have 1 new attribute:
self.[name]
which will contain the expression for this parameter (which you should use in __call__
). You can provide an existing parameter with the param argument, in which case this parameter will be reused.
The other arguments are the same as
dynet.ParameterCollection.add_parameters
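Example (a minimal sketch of a custom parametrized layer; the Scale layer is made up for illustration and assumes that, as described above, self.[name] holds the parameter expression after initialization):
import dynet as dy
from dynn.layers import ParametrizedLayer

class Scale(ParametrizedLayer):
    """Toy layer computing an elementwise w * x."""

    def __init__(self, pc, dim):
        super(Scale, self).__init__(pc, "scale")
        # Registers self.w, a parameter of shape (dim,)
        self.add_parameters("w", (dim,))

    def __call__(self, x):
        # self.w is the expression for the registered parameter
        return dy.cmult(self.w, x)

# Usage sketch
pc = dy.ParameterCollection()
scale = Scale(pc, 10)
dy.renew_cg()
scale.init(test=False)
y = scale(dy.inputVector([1.0] * 10))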
-
init_layer
(test=True, update=False)¶ Initializes only this layer’s parameters (not recursive) This needs to be implemented for each layer
-
lookup_parameters
¶ Return all lookup parameters specific to this layer
-
parameters
¶ Return all parameters specific to this layer
-
-
class
dynn.layers.
Lambda
(function)¶ Bases:
dynn.layers.base_layers.BaseLayer
This layer applies an arbitrary function to its input.
Lambda(f)(x) == f(x)
This is useful if you want to wrap activation functions as layers. The unary operation should be a function taking dynet.Expression to dynet.Expression.
You shouldn’t use this to stack layers though; the wrapped function oughtn’t be a layer. If you want to stack layers, use combination_layers.Sequential.
Parameters: function (function) – A unary operation on dynet.Expression objects
-
__call__
(*args, **kwargs)¶ Returns
function(*args, **kwargs)
-
__init__
(function)¶ Initialize self. See help(type(self)) for accurate signature.
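Example (a minimal sketch wrapping a dynet activation as a layer):
import dynet as dy
from dynn.layers import Lambda

# ReLU as a layer
relu_layer = Lambda(dy.rectify)

dy.renew_cg()
relu_layer.init()
y = relu_layer(dy.inputVector([-1.0, 2.0, -3.0]))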
-
class
dynn.layers.
Affine
(pc, input_dim, output_dim, activation=<function identity>, dropout=0.0, nobias=False, W=None, b=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
Densely connected layer
\(y=f(Wx+b)\)
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - input_dim (int) – Input dimension
- output_dim (int) – Output dimension
- activation (function, optional) – activation function (default: identity)
- dropout (float, optional) – Dropout rate (default 0)
- nobias (bool, optional) – Omit the bias (default
False
)
-
__call__
(x)¶ Forward pass.
Parameters: x ( dynet.Expression
) – Input expression (a vector)Returns: \(y=f(Wx+b)\) Return type: dynet.Expression
-
__init__
(pc, input_dim, output_dim, activation=<function identity>, dropout=0.0, nobias=False, W=None, b=None)¶ Creates a subcollection for this layer with a custom name
-
class
dynn.layers.
Embeddings
(pc, dictionary, embed_dim, init=None, pad_mask=None, E=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
Layer for embedding elements of a dictionary
Example:
# Dictionary
dic = dynn.data.dictionary.Dictionary(symbols=["a", "b"])
# Parameter collection
pc = dy.ParameterCollection()
# Embedding layer of dimension 10
embed = Embeddings(pc, dic, 10)
# Initialize
dy.renew_cg()
embed.init()
# Return a batch of 2 10-dimensional vectors
vectors = embed([dic.index("b"), dic.index("a")])
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - dictionary (
dynn.data.dictionary.Dictionary
) – Mapping from symbols to indices - embed_dim (int) – Embedding dimension
- init (
dynet.PyInitializer
, optional) – How to initialize the parameters. By default this will initialize to \(\mathcal N(0, \frac{1}{\sqrt{\text{embed\_dim}}})\) - pad_mask (float, optional) – If provided, embeddings of the
dictionary.pad_idx
index will be masked with this value
-
__call__
(idxs, length_dim=0)¶ Returns the input’s embedding
If
idxs
is a list this returns a batch of embeddings. If it’s a numpy array of shape N x b
it returns a batch of b
N x embed_dim
matrices. Parameters: idxs (list, int) – Index or list of indices to embed Returns: Batch of embeddings Return type: dynet.Expression
-
__init__
(pc, dictionary, embed_dim, init=None, pad_mask=None, E=None)¶ Creates a subcollection for this layer with a custom name
-
weights
¶ Numpy array containing the embeddings
The first dimension is the lookup dimension
-
class
dynn.layers.
Residual
(layer, shortcut_transform=None, layer_weight=1.0, shortcut_weight=1.0)¶ Bases:
dynn.layers.base_layers.BaseLayer
Adds residual connections to a layer
-
__call__
(*args, **kwargs)¶ Execute forward pass
-
__init__
(layer, shortcut_transform=None, layer_weight=1.0, shortcut_weight=1.0)¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
dynn.layers.
RecurrentCell
(*args, **kwargs)¶ Bases:
object
Base recurrent cell interface
Recurrent cells must provide a default initial value for their recurrent state (eg. all zeros)
-
__init__
(*args, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
¶ list of weak references to the object (if defined)
-
get_output
(state)¶ Get the cell’s output from the list of states.
For example this would return
h
from h, c
in the case of the LSTM
-
initial_value
(batch_size=1)¶ Initial value of the recurrent state. Should return a list.
-
-
class
dynn.layers.
StackedRecurrentCells
(*cells)¶ Bases:
dynn.layers.base_layers.BaseLayer
,dynn.layers.recurrent_layers.RecurrentCell
This implements a stack of recurrent layers
The recurrent state of the resulting cell is the list of the states of all the sub-cells. For example for a stack of 2 LSTM cells the resulting state will be
[h_1, c_1, h_2, c_2]
Example:
# Parameter collection
pc = dy.ParameterCollection()
# Stacked recurrent cell
stacked_cell = StackedRecurrentCells(
    LSTM(pc, 10, 15),
    LSTM(pc, 15, 5),
    ElmanRNN(pc, 5, 20),
)
# Inputs
dy.renew_cg()
x = dy.random_uniform(10, batch_size=5)
# Initialize layer
stacked_cell.init(test=False)
# Initial state: [h_1, c_1, h_2, c_2, h_3] of sizes [15, 15, 5, 5, 20]
init_state = stacked_cell.initial_value()
# Run the cell on the input.
new_state = stacked_cell(x, *init_state)
# Get the final output (h_3 of size 20)
h = stacked_cell.get_output(new_state)
-
__call__
(x, *state)¶ Compute the cell’s output from the list of states and an input expression
Parameters: x ( dynet.Expression
) – Input vectorReturns: new recurrent state Return type: list
-
__init__
(*cells)¶ Initialize self. See help(type(self)) for accurate signature.
-
get_output
(state)¶ Get the output of the last cell
-
initial_value
(batch_size=1)¶ Initial value of the recurrent state.
-
-
class
dynn.layers.
ElmanRNN
(pc, input_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Whx=None, Whh=None, b=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
,dynn.layers.recurrent_layers.RecurrentCell
The standard Elman RNN cell:
\(h_{t}=\sigma(W_{hh}h_{t-1} + W_{hx}x_{t} + b)\)
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - input_dim (int) – Input dimension
- output_dim (int) – Output (hidden) dimension
- activation (function, optional) – Activation function \(\sigma\)
(default:
dynn.activations.tanh()
) - dropout (float, optional) – Dropout rate (default 0)
-
__call__
(x, h)¶ Perform the recurrent update.
Parameters: - x (
dynet.Expression
) – Input vector - h (
dynet.Expression
) – Previous recurrent vector
Returns: - Next recurrent state
\(h_{t}=\sigma(W_{hh}h_{t-1} + W_{hx}x_{t} + b)\)
Return type: dynet.Expression
-
__init__
(pc, input_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Whx=None, Whh=None, b=None)¶ Creates a subcollection for this layer with a custom name
-
get_output
(state)¶ Get the cell’s output from the list of states.
For example this would return
h
from h, c
in the case of the LSTM
-
init_layer
(test=True, update=False)¶ Initializes only this layer’s parameters (not recursive) This needs to be implemented for each layer
-
initial_value
(batch_size=1)¶ Return a vector of dimension hidden_dim filled with zeros
Returns: Zero vector Return type: dynet.Expression
-
class
dynn.layers.
LSTM
(pc, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0, Whx=None, Whh=None, b=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
,dynn.layers.recurrent_layers.RecurrentCell
Standard LSTM
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - input_dim (int) – Input dimension
- output_dim (int) – Output (hidden) dimension
- dropout_x (float, optional) – Input dropout rate (default 0)
- dropout_h (float, optional) – Recurrent dropout rate (default 0)
-
__call__
(x, h, c)¶ Perform the recurrent update.
Parameters: - x (
dynet.Expression
) – Input vector - h (
dynet.Expression
) – Previous recurrent vector - c (
dynet.Expression
) – Previous cell state vector
Returns: dynet.Expression
for the next recurrent states h
and c
Return type:
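Example (a minimal sketch of a single LSTM step, following the recurrent-cell interface shown in the stacked-cell example above; the dimensions are arbitrary):
import dynet as dy
from dynn.layers import LSTM

pc = dy.ParameterCollection()
lstm = LSTM(pc, 10, 15)

dy.renew_cg()
lstm.init(test=False)

x = dy.inputVector([0.0] * 10)
state = lstm.initial_value(batch_size=1)   # default [h, c] state
state = lstm(x, *state)                    # one recurrent update
h = lstm.get_output(state)                 # hidden vector of size 15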
-
__init__
(pc, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0, Whx=None, Whh=None, b=None)¶ Creates a subcollection for this layer with a custom name
-
get_output
(state)¶ Get the cell’s output from the list of states.
For example this would return
h
from h, c
in the case of the LSTM
-
init_layer
(test=True, update=False)¶ Initializes only this layer’s parameters (not recursive) This needs to be implemented for each layer
-
class
dynn.layers.
StackedLSTM
(pc, num_layers, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0)¶ Bases:
dynn.layers.recurrent_layers.StackedRecurrentCells
Stacked LSTMs
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - num_layers (int) – Number of layers
- input_dim (int) – Input dimension
- output_dim (int) – Output (hidden) dimension
- dropout_x (float, optional) – Input dropout rate (default 0)
- dropout_h (float, optional) – Recurrent dropout rate (default 0)
-
__init__
(pc, num_layers, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
dynn.layers.
Transduction
(layer)¶ Bases:
dynn.layers.base_layers.BaseLayer
Feed forward transduction layer
This layer runs one cell on a sequence of inputs and returns the list of outputs. Calling it is equivalent to calling:
[layer(x) for x in input_sequence]
Parameters: cell ( base_layers.BaseLayer
) – The recurrent cell to use for transduction-
__call__
(input_sequence)¶ Runs the layer over the input
The output is a list of the output of the layer at each step
Parameters: input_sequence (list) – Input as a list of dynet.Expression
objects Returns: List of recurrent states (depends on the recurrent layer) Return type: list
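Example (a minimal sketch applying the same Affine layer to every element of a sequence; dimensions are arbitrary):
import dynet as dy
from dynn.layers import Affine, Transduction

pc = dy.ParameterCollection()
transduce = Transduction(Affine(pc, 10, 5))

dy.renew_cg()
transduce.init(test=True)
xs = [dy.inputVector([0.0] * 10) for _ in range(7)]
ys = transduce(xs)  # list of 7 output expressions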
-
__init__
(layer)¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
dynn.layers.
Unidirectional
(cell, output_only=False)¶ Bases:
dynn.layers.base_layers.BaseLayer
Unidirectional transduction layer
This layer runs a recurrent cell on a sequence of inputs and produces the resulting sequence of recurrent states.
Example:
# LSTM cell
lstm_cell = dynn.layers.LSTM(dy.ParameterCollection(), 10, 10)
# Transduction layer
lstm = dynn.layers.Unidirectional(lstm_cell)
# Inputs
dy.renew_cg()
xs = [dy.random_uniform(10, batch_size=5) for _ in range(20)]
# Initialize layer
lstm.init(test=False)
# Transduce forward
states = lstm(xs)
# Retrieve last h
h_final = states[-1][0]
Parameters: - cell (
recurrent_layers.RecurrentCell
) – The recurrent cell to use for transduction - output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states.
-
__call__
(input_sequence, backward=False, lengths=None, left_padded=True, output_only=None, initial_state=None)¶ Transduces the sequence using the recurrent cell.
The output is a list of the output states at each step. For instance in an LSTM the output is
(h1, c1), (h2, c2), ...
This assumes that all the input expression have the same batch size. If you batch sentences of the same length together you should pad to the longest sequence.
Parameters: - input_sequence (list) – Input as a list of
dynet.Expression
objects - backward (bool, optional) – If this is
True
the sequence will be processed from right to left. The output sequence will still be in the same order as the input sequence though. - lengths (list, optional) – If the expressions in the sequence are
batched, but have different lengths, this should contain a list
of the sequence lengths (default:
None
) - left_padded (bool, optional) – If the input sequences have different lengths they must be padded to the length of longest sequence. Use this to specify whether the sequence is left or right padded.
- output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states. Overwrites the value given in the constructor.
- initial_state (
dy.Expression
, optional) – Overrides the default initial state of the recurrent cell
Returns: List of recurrent states (depends on the recurrent layer)
Return type: list
-
__init__
(cell, output_only=False)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
dynn.layers.
Bidirectional
(forward_cell, backward_cell, output_only=False)¶ Bases:
dynn.layers.base_layers.BaseLayer
Bidirectional transduction layer
This layer runs a recurrent cell in each direction on a sequence of inputs and produces the resulting sequences of recurrent states.
Example:
# Parameter collection
pc = dy.ParameterCollection()
# LSTM cells
fwd_lstm_cell = dynn.layers.LSTM(pc, 10, 10)
bwd_lstm_cell = dynn.layers.LSTM(pc, 10, 10)
# Transduction layer
bilstm = dynn.layers.Bidirectional(fwd_lstm_cell, bwd_lstm_cell)
# Inputs
dy.renew_cg()
xs = [dy.random_uniform(10, batch_size=5) for _ in range(20)]
# Initialize layer
bilstm.init(test=False)
# Transduce in both directions
fwd_states, bwd_states = bilstm(xs)
# Retrieve last h of the forward LSTM
fwd_h_final = fwd_states[-1][0]
# For the backward LSTM the final state is at
# the beginning of the sequence (assuming left padding)
bwd_h_final = bwd_states[0][0]
Parameters: - forward_cell (
recurrent_layers.RecurrentCell
) – The recurrent cell to use for forward transduction - backward_cell (
recurrent_layers.RecurrentCell
) – The recurrent cell to use for backward transduction - output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states.
-
__call__
(input_sequence, lengths=None, left_padded=True, output_only=None, fwd_initial_state=None, bwd_initial_state=None)¶ Transduces the sequence in both directions
The output is a tuple
forward_states, backward_states
where eachforward_states
is a list of the output states of the forward recurrent cell at each step (andbackward_states
for the backward cell). For instance in a BiLSTM the output is[(fwd_h1, fwd_c1), ...], [(bwd_h1, bwd_c1), ...]
This assumes that all the input expression have the same batch size. If you batch sentences of the same length together you should pad to the longest sequence.
Parameters: - input_sequence (list) – Input as a list of
dynet.Expression
objects - lengths (list, optional) – If the expressions in the sequence are
batched, but have different lengths, this should contain a list
of the sequence lengths (default:
None
) - left_padded (bool, optional) – If the input sequences have different lengths they must be padded to the length of longest sequence. Use this to specify whether the sequence is left or right padded.
- output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states. Overwrites the value given in the constructor.
- fwd_initial_state (
dy.Expression
, optional) – Overrides the default initial state of the forward recurrent cell. - bwd_initial_state (
dy.Expression
, optional) – Overrides the default initial state of the backward recurrent cell.
Returns: - List of forward and backward recurrent states
(depends on the recurrent layer)
Return type: tuple
-
__init__
(forward_cell, backward_cell, output_only=False)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
dynn.layers.
MaxPool1D
(kernel_size=None, stride=1)¶ Bases:
dynn.layers.base_layers.BaseLayer
1D max pooling
Parameters: -
__call__
(x, kernel_size=None, stride=None)¶ Max pooling over the first dimension.
This takes either a list of
N
d
-dimensional vectors or aN x d
matrix.The output will be a matrix of dimension
(N - kernel_size + 1) // stride x d
Parameters: - x (
dynet.Expression
) – Input matrix or list of vectors - dim (int, optional) – The reduction dimension (default:
0
) - kernel_size (int, optional) – Kernel size. If this is not specified, the default size specified in the constructor is used.
- stride (int, optional) – Temporal stride. If this is not specified, the default stride specified in the constructor is used.
Returns: Pooled sequence.
Return type: dynet.Expression
-
__init__
(kernel_size=None, stride=1)¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
dynn.layers.
MaxPool2D
(kernel_size=None, strides=None)¶ Bases:
dynn.layers.base_layers.BaseLayer
2D max pooling.
Parameters: -
__call__
(x, kernel_size=None, strides=None)¶ Max pooling over the first two dimensions.
If either of the
kernel_size
elements is not specified, the pooling will be done over the full dimension (and the stride is ignored)Parameters: - x (
dynet.Expression
) – Input image (3-d tensor) or matrix. - kernel_size (list, optional) – Size of the pooling kernel. If this is not specified, the default specified in the constructor is used.
- strides (list, optional) – Stride along width/height. If this is not specified, the default specified in the constructor is used.
Returns: Pooled sequence.
Return type: dynet.Expression
-
__init__
(kernel_size=None, strides=None)¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
dynn.layers.
MeanPool1D
(kernel_size=None, stride=1)¶ Bases:
dynn.layers.base_layers.BaseLayer
1D mean pooling.
The stride and kernel size arguments are here for consistency with
MaxPooling1D
but they are unsupported for now.Parameters: -
__call__
(x, kernel_size=None, stride=None, lengths=None)¶ Mean pooling over the first dimension.
This takes either a list of
N
d
-dimensional vectors or aN x d
matrix.The output will be a matrix of dimension
(N - kernel_size + 1) // stride x d
Parameters: - x (
dynet.Expression
) – Input matrix or list of vectors - dim (int, optional) – The reduction dimension (default:
0
) - kernel_size (int, optional) – Kernel size. If this is not specified, the default size specified in the constructor is used.
- stride (int, optional) – Temporal stride. If this is not specified, the default stride specified in the constructor is used.
Returns: Pooled sequence.
Return type: dynet.Expression
-
__init__
(kernel_size=None, stride=1)¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
dynn.layers.
MLPAttention
(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
Multilayer Perceptron based attention
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - query_dim (int) – Queries dimension
- key_dim (int) – Keys dimension
- hidden_dim (int) – Hidden dimension of the MLP
- activation (function, optional) – MLP activation (defaults to tanh).
- dropout (float, optional) – Attention dropout (defaults to 0)
-
__call__
(query, keys, values, mask=None)¶ Compute attention scores and return the pooled value
This returns both the pooled value and the attention score. You can specify an additive mask when some values are not to be attended to (eg padding).
Parameters: - query (
dynet.Expression
) – Query vector of size(dq,), B
- keys (
dynet.Expression
) – Key vectors of size(dk, L), B
- values (
dynet.Expression
) – Value vectors of size(dv, L), B
- mask (
dynet.Expression
, optional) – Additive mask expression
Returns: pooled_value, scores
, of size(dv,), B
and(L,), B
respectively
Return type: - query (
-
__init__
(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)¶ Creates a subcollection for this layer with a custom name
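Example (a minimal sketch of attending to a sequence of keys/values with a single query; all dimensions are illustrative):
import dynet as dy
from dynn.layers import MLPAttention

pc = dy.ParameterCollection()
# 20-dim queries, 30-dim keys, 50-dim MLP hidden layer
attend = MLPAttention(pc, 20, 30, 50)
dy.renew_cg()
attend.init(test=True)
L = 7                       # number of attended positions
query = dy.zeros((20,))     # (dq,)
keys = dy.zeros((30, L))    # (dk, L)
values = dy.zeros((40, L))  # (dv, L)
pooled, scores = attend(query, keys, values)
# pooled has size (dv,) = (40,) and scores has size (L,) = (7,)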
- pc (
-
class
dynn.layers.
BilinearAttention
(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
Bilinear attention layer.
Here the scores are computed according to
\[\alpha_{ij}=q_i^\intercal A k_j\]Where \(q_i,k_j\) are the ith query and jth key respectively. If
dot_product
is set toTrue
this is replaced by:\[\alpha_{ij}=\frac 1 {\sqrt{d}} q_i^\intercal k_j\]Where \(d\) is the dimension of the keys and queries.
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - query_dim (int) – Queries dimension
- key_dim (int) – Keys dimension
- dot_product (bool, optional) – Compute attention with the dot product
only (no weight matrix). This requires that
query_dim==key_dim
. - dropout (float, optional) – Attention dropout (defaults to 0)
- A_p (
dynet.Parameters
, optional) – Specify the weight matrix directly.
-
__call__
(query, keys, values, mask=None)¶ Compute attention scores and return the pooled value.
This returns both the pooled value and the attention score. You can specify an additive mask when some values are not to be attended to (eg padding).
Parameters: - query (
dynet.Expression
) – Query vector of size(dq, l), B
- keys (
dynet.Expression
) – Key vectors of size(dk, L), B
- values (
dynet.Expression
) – Value vectors of size(dv, L), B
- mask (
dynet.Expression
, optional) – Additive mask expression for the source side (size(L,), B
)
Returns: pooled_value, scores
, of size(dv,), B
and(L,), B
respectively
Return type: - query (
-
__init__
(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)¶ Creates a subcollection for this layer with a custom name
- pc (
-
class
dynn.layers.
MultiHeadAttention
(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
Multi headed attention layer.
This functions like dot product attention
\[\alpha_{ij}=\frac 1 {\sqrt{d}} q_i^\intercal k_j\]Except the key, query and values are split into multiple
heads
.Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - n_heads (int) – Number of heads
- query_dim (int) – Dimension of queries
- key_dim (int) – Dimension of keys
- values_dim (int) – Dimension of values
- hidden_dim (int) – Hidden dimension (must be a multiple of
n_heads
) - out_dim (bool, optional) – Output dimension
- dropout (float, optional) – Attention dropout (defaults to 0)
- Wq_p (
dynet.Parameters
, optional) – Specify the queries projection matrix directly. - Wk_p (
dynet.Parameters
, optional) – Specify the keys projection matrix directly. - Wv_p (
dynet.Parameters
, optional) – Specify the values projection matrix directly. - Wo_p (
dynet.Parameters
, optional) – Specify the output projection matrix directly.
-
__call__
(queries, keys, values, mask=None)¶ Compute attention weights and return the pooled value.
This expects the queries, keys and values to have dimensions
dq x l
,dk x L
,dv x L
respectively.Returns both the pooled value and the attention weights (list of weights, one per head). You can specify an additive mask when some values are not to be attended to (eg padding).
Parameters: - queries (
dynet.Expression
) – Query vector of size(dq, l), B
- keys (
dynet.Expression
) – Key vectors of size(dk, L), B
- values (
dynet.Expression
) – Value vectors of size(dv, L), B
- mask (
dynet.Expression
, optional) – Additive mask expression for the source side (size(L,), B
)
Returns: pooled_value, scores
, of size(dv,), B
and(L,), B
respectively
Return type: - queries (
-
__init__
(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)¶ Creates a subcollection for this layer with a custom name
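Example (a minimal sketch with 4 heads; the hidden dimension must be a multiple of the number of heads and all dimensions are illustrative):
import dynet as dy
from dynn.layers import MultiHeadAttention

pc = dy.ParameterCollection()
# 4 heads, 20-dim queries/keys/values, 32-dim hidden (a multiple of 4), 20-dim output
mha = MultiHeadAttention(pc, 4, 20, 20, 20, 32, 20)
dy.renew_cg()
mha.init(test=True)
l, L = 5, 8                   # query and key/value lengths
queries = dy.zeros((20, l))   # dq x l
keys = dy.zeros((20, L))      # dk x L
values = dy.zeros((20, L))    # dv x L
pooled, weights = mha(queries, keys, values)
# weights is a list with one set of attention weights per head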
- pc (
-
class
dynn.layers.
Conv1D
(pc, input_dim, num_kernels, kernel_width, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, stride=1, K=None, b=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
1D convolution along the first dimension
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - input_dim (int) – Input dimension
- num_kernels (int) – Number of kernels (essentially the output dimension)
- kernel_width (int) – Width of the kernels
- activation (function, optional) – activation function
(default:
identity
) - dropout (float, optional) – Dropout rate (default 0)
- nobias (bool, optional) – Omit the bias (default
False
) - zero_padded (bool, optional) – Default padding behaviour. Pad
the input with zeros so that the output has the same length
(default
True
) - stride (list, optional) – Default stride along the length
(defaults to
1
).
-
__call__
(x, stride=None, zero_padded=None)¶ Forward pass
Parameters: - x (
dynet.Expression
) – Input expression with the shape (length, input_dim) - stride (int, optional) – Stride along the temporal dimension
- zero_padded (bool, optional) – Pad the image with zeros so that the
output has the same length (default
True
)
Returns: Convolved sequence.
Return type: - x (
-
__init__
(pc, input_dim, num_kernels, kernel_width, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, stride=1, K=None, b=None)¶ Creates a subcollection for this layer with a custom name
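Example (a minimal sketch over a sequence of 50-dimensional vectors; the output lengths assume the default zero padding):
import dynet as dy
from dynn.layers import Conv1D

pc = dy.ParameterCollection()
# 50-dim inputs, 64 kernels of width 3
conv = Conv1D(pc, 50, 64, 3)
dy.renew_cg()
conv.init(test=True)
x = dy.zeros((20, 50))   # (length, input_dim)
h = conv(x)              # zero padded: shape (20, 64)
h2 = conv(x, stride=2)   # roughly halves the length: shape (10, 64)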
- pc (
-
class
dynn.layers.
Conv2D
(pc, num_channels, num_kernels, kernel_size, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, strides=None, K=None, b=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
2D convolution
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - num_channels (int) – Number of channels in the input image
- num_kernels (int) – Number of kernels (essentially the output dimension)
- kernel_size (list, optional) – Default kernel size. This is a list of two elements, one per dimension.
- activation (function, optional) – activation function
(default:
identity
) - dropout (float, optional) – Dropout rate (default 0)
- nobias (bool, optional) – Omit the bias (default
False
) - zero_padded (bool, optional) – Default padding behaviour. Pad
the image with zeros so that the output has the same width/height
(default
True
) - strides (list, optional) – Default stride along each dimension
(list of size 2, defaults to
[1, 1]
).
-
__call__
(x, strides=None, zero_padded=None)¶ Forward pass
Parameters: - x (
dynet.Expression
) – Input image (3-d tensor) or matrix. - zero_padded (bool, optional) – Pad the image with zeros so that the output has the same width/height. If this is not specified, the default specified in the constructor is used.
- strides (list, optional) – Stride along width/height. If this is not specified, the default specified in the constructor is used.
Returns: Convolved image.
Return type: - x (
-
__init__
(pc, num_channels, num_kernels, kernel_size, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, strides=None, K=None, b=None)¶ Creates a subcollection for this layer with a custom name
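Example (a minimal sketch on a 3-channel image; this assumes the usual DyNet height x width x channels layout and the default zero padding):
import dynet as dy
from dynn.layers import Conv2D

pc = dy.ParameterCollection()
# 3 input channels, 16 kernels of size 3 x 3
conv = Conv2D(pc, 3, 16, [3, 3])
dy.renew_cg()
conv.init(test=True)
img = dy.zeros((28, 28, 3))      # height x width x channels
h = conv(img)                    # zero padded: 28 x 28 x 16
h2 = conv(img, strides=[2, 2])   # downsampled by 2 along each spatial dimension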
- pc (
-
class
dynn.layers.
Flatten
¶ Bases:
dynn.layers.base_layers.BaseLayer
Flattens the output such that there is only one dimension left (batch dimension notwithstanding)
Example:
# Create the layer flatten = Flatten() # Dummy batched 2d input x = dy.zeros((3, 4), batch_size=7) # x.dim() -> (3, 4), 7 y = flatten(x) # y.dim() -> (12,), 7
-
__call__
(x)¶ Flattens the output such that there is only one dimension left (batch dimension notwithstanding)
Parameters: x (dynet.Expression) – Input expression Returns: The flattened expression Return type: dynet.Expression
-
__init__
()¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
dynn.layers.
LayerNorm
(pc, input_dim, gain=None, bias=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
Layer normalization layer:
\(y=\frac{g}{\sigma(x)}\cdot(x-\mu(x)+b)\)
Parameters: - input_dim (int, tuple) – Input dimension
- pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters
-
__call__
(x, d=None)¶ Layer-normalize the input.
Parameters: x ( dynet.Expression
) – Input expressionReturns: \(y=\frac{g}{\sigma(x)}\cdot(x-\mu(x)+b)\) Return type: dynet.Expression
-
__init__
(pc, input_dim, gain=None, bias=None)¶ Creates a subcollection for this layer with a custom name
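Example (a minimal sketch normalizing a 100-dimensional vector):
import dynet as dy
from dynn.layers import LayerNorm

pc = dy.ParameterCollection()
layer_norm = LayerNorm(pc, 100)
dy.renew_cg()
layer_norm.init(test=True)
x = dy.zeros((100,))
# Same shape as the input: centered, scaled by 1/sigma(x),
# then rescaled by the gain g and shifted by the bias b
y = layer_norm(x)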
-
class
dynn.layers.
Sequential
(*layers, default_return_last_only=True)¶ Bases:
dynn.layers.base_layers.BaseLayer
A helper class to stack layers into deep networks.
Parameters: - layers (list) – A list of
dynn.layers.BaseLayer
objects. The first layer is the first one applied to the input. It is the programmer’s responsibility to make sure that the layers are compatible (eg. the output of each layer can be fed into the next one) - default_return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
-
__call__
(x, return_last_only=None)¶ Calls all the layers in succession.
Computes
layers[n-1](layers[n-2](...layers[0](x)))
Parameters: - x (
dynet.Expression
) – Input expression - return_last_only (bool, optional) – Overrides the default
Returns: - Depending on
return_last_only
, returns either the last expression or a list of all the layer’s outputs (first to last)
Return type: dynet.Expression
, list- x (
-
__init__
(*layers, default_return_last_only=True)¶ Initialize self. See help(type(self)) for accurate signature.
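Example (a minimal sketch of a 2-layer MLP built by stacking two Affine layers, see dynn.layers.dense_layers.Affine below; dimensions are illustrative):
import dynet as dy
from dynn.layers import Sequential
from dynn.layers.dense_layers import Affine

pc = dy.ParameterCollection()
# A small 10 -> 50 -> 2 MLP
network = Sequential(
    Affine(pc, 10, 50, activation=dy.tanh),
    Affine(pc, 50, 2),
)
dy.renew_cg()
network.init(test=True)
x = dy.zeros((10,))
logits = network(x)                               # output of the last layer only
all_outputs = network(x, return_last_only=False)  # list with both layers' outputs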
- layers (list) – A list of
-
class
dynn.layers.
Parallel
(*layers, dim=0, default_insert_dim=False)¶ Bases:
dynn.layers.base_layers.BaseLayer
A helper class to run layers on the same input and concatenate their outputs
This can be used to create 2d conv layers with multiple kernel sizes by concatenating multiple
dynn.layers.Conv2D
.Parameters: - layers (list) – A list of
dynn.layers.BaseLayer
objects. The first layer is the first one applied to the input. It is the programmer’s responsibility to make sure that the layers are compatible (eg. each layer takes the same input and the outputs have the same shape everywhere except along the concatenation dimension) - dim (int) – The concatenation dimension
- default_insert_dim (bool, optional) – Instead of concatenating along an
existing dimension, insert a new dimension at
dim
and concatenate.
-
__call__
(x, insert_dim=None, **kwargs)¶ Calls all the layers in succession.
Computes
dy.concatenate([layers[0](x)...layers[n-1](x)], d=dim)
Parameters: - x (
dynet.Expression
) – Input expression - insert_dim (bool, optional) – Override the default
Returns: The concatenation of the outputs of all the layers along dimension dim
Return type: dynet.Expression
-
__init__
(*layers, dim=0, default_insert_dim=False)¶ Initialize self. See help(type(self)) for accurate signature.
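Example (a minimal sketch of the multi-kernel-size convolution use case mentioned above; it assumes the channel dimension of the Conv2D outputs is dimension 2):
import dynet as dy
from dynn.layers import Parallel, Conv2D

pc = dy.ParameterCollection()
# Two 2D convolutions with different kernel sizes,
# concatenated along the channel dimension (dim=2)
multi_cnn = Parallel(
    Conv2D(pc, 3, 16, [3, 3]),
    Conv2D(pc, 3, 16, [5, 5]),
    dim=2,
)
dy.renew_cg()
multi_cnn.init(test=True)
img = dy.zeros((28, 28, 3))
h = multi_cnn(img)   # 28 x 28 x 32 (16 + 16 channels, assuming zero padding)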
- layers (list) – A list of
-
class
dynn.layers.
Transformer
(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
Transformer layer.
As described in Vaswani et al. (2017). This is the “encoder” side of the transformer, i.e. self attention only.
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - input_dim (int) – Hidden dimension (used everywhere)
- n_heads (int) – Number of heads for self attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)
-
__call__
(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False)¶ Run the transformer layer.
The input is expected to have dimensions
d x L
whereL
is the length dimension.Parameters: - x (
dynet.Expression
) – Input (dimensionsinput_dim x L
) - lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking
- triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
- mask (
dynet.Expression
, optional) – Defaults to None. As an alternative tolength
, you can pass a mask expression directly (useful to reuse masks across layers) - return_att (bool, optional) – Defaults to False. Return the self attention weights
Returns: - The output expression (+ the
attention weights if
return_att
isTrue
)
Return type: tuple,
dynet.Expression
- x (
-
__init__
(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)¶ Creates a subcollection for this layer with a custom name
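Example (a minimal sketch of running one encoder layer over a length-10 sequence; dimensions are illustrative):
import dynet as dy
from dynn.layers import Transformer

pc = dy.ParameterCollection()
# Model dimension 64, feed-forward hidden dimension 128, 4 heads
encoder = Transformer(pc, 64, 128, 4)
dy.renew_cg()
encoder.init(test=True)
L = 10
x = dy.zeros((64, L))   # input_dim x L
h = encoder(x)          # same dimensions: 64 x L
# With a batched, padded input you would additionally pass the lengths
# for masking, e.g. encoder(x, lengths=[10, 7, 9], left_aligned=True)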
- pc (
-
class
dynn.layers.
StackedTransformers
(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)¶ Bases:
dynn.layers.combination_layers.Sequential
Multilayer transformer.
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - n_layers (int) – Number of layers
- input_dim (int) – Hidden dimension (used everywhere)
- n_heads (int) – Number of heads for self attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)
-
__call__
(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False, return_last_only=True)¶ Run the multilayer transformer.
The input is expected to have dimensions
d x L
whereL
is the length dimension.Parameters: - x (
dynet.Expression
) – Input (dimensionsinput_dim x L
) - lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking
- triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
- mask (
dynet.Expression
, optional) – Defaults to None. As an alternative tolength
, you can pass a mask expression directly (useful to reuse masks across layers) - return_att (bool, optional) – Defaults to False. Return the self attention weights
- return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
Returns: - The output expression (+ the
attention weights if
return_att
isTrue
)
Return type: tuple,
dynet.Expression
- x (
-
__init__
(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)¶ Initialize self. See help(type(self)) for accurate signature.
- pc (
-
class
dynn.layers.
CondTransformer
(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
Conditional transformer layer.
As described in Vaswani et al. (2017). This is the “decoder” side of the transformer, i.e. self attention + attention to context.
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - input_dim (int) – Hidden dimension (used everywhere)
- cond_dim (int) – Conditional dimension (dimension of the “encoder” side, used for attention)
- n_heads (int) – Number of heads for attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)
-
__call__
(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)¶ Run the transformer layer.
The input is expected to have dimensions
d x L
whereL
is the length dimension.Parameters: - x (
dynet.Expression
) – Input (dimensionsinput_dim x L
) - c (
dynet.Expression
) – Context (dimensionscond_dim x l
) - lengths (list, optional) – Defaults to None. List of lengths for masking (used for self attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking in self attention.
- mask (
dynet.Expression
, optional) – Defaults to None. As an alternative tolength
, you can pass a mask expression directly (useful to reuse masks across layers). - triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
- lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (
dynet.Expression
, optional) – Defaults to None. As an alternative tolength_c
, you can pass a mask expression directly (useful to reuse masks across layers). - return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns: - The output expression (+ the
attention weights if
return_att
isTrue
)
Return type: tuple,
dynet.Expression
- x (
-
__init__
(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)¶ Creates a subcollection for this layer with a custom name
-
step
(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)¶ Runs the transformer for one step. Useful for decoding.
The “state” of the transformer is the list of
L-1
inputs and its output is theL
th output. This returns a tuple of both the new state (L-1
previous inputs +L
th input concatenated) and theL
th outputParameters: - x (
dynet.Expression
) – Input (dimensioninput_dim
) - state (
dynet.Expression
, optional) – Previous “state” (dimensionsinput_dim x (L-1)
) - c (
dynet.Expression
) – Context (dimensionscond_dim x l
) - lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (
dynet.Expression
, optional) – Defaults to None. As an alternative tolength_c
, you can pass a mask expression directly (useful to reuse masks across layers). - return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns: The new state and the output at the current position (+ the attention weights if return_att is True)
Return type: tuple
- x (
- pc (
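Example (a minimal sketch of running the decoder layer over a full target sequence with causal self attention and attention to an encoder context; dimensions are illustrative):
import dynet as dy
from dynn.layers import CondTransformer

pc = dy.ParameterCollection()
# input_dim=64, hidden_dim=128, cond_dim=32, 4 heads
decoder = CondTransformer(pc, 64, 128, 32, 4)
dy.renew_cg()
decoder.init(test=True)
x = dy.zeros((64, 10))   # target side input, input_dim x L
c = dy.zeros((32, 9))    # encoder side context, cond_dim x l
# Upper triangular (causal) self attention + attention to the context
h = decoder(x, c, triu=True)   # 64 x 10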
-
class
dynn.layers.
StackedCondTransformers
(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)¶ Bases:
dynn.layers.combination_layers.Sequential
Multilayer transformer.
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - n_layers (int) – Number of layers
- input_dim (int) – Hidden dimension (used everywhere)
- cond_dim (int) – Conditional dimension (dimension of the “encoder” side, used for attention)
- n_heads (int) – Number of heads for self attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)
-
__call__
(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)¶ Run the multilayer transformer.
The input is expected to have dimensions
d x L
whereL
is the length dimension.Parameters: - x (
dynet.Expression
) – Input (dimensionsinput_dim x L
) - c (list) – list of contexts (one per layer, each of dim
cond_dim x L
). If this is not a list (but an expression), the same context will be used for each layer. - lengths (list, optional) – Defaults to None. List of lengths for masking (used for self attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking in self attention.
- mask (
dynet.Expression
, optional) – Defaults to None. As an alternative tolength
, you can pass a mask expression directly (useful to reuse masks across layers). - triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
- lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (
dynet.Expression
, optional) – Defaults to None. As an alternative tolength_c
, you can pass a mask expression directly (useful to reuse masks across layers). - return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
Returns: - The output expression (+ the
attention weights if
return_att
isTrue
)
Return type: tuple,
dynet.Expression
- x (
-
__init__
(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)¶ Initialize self. See help(type(self)) for accurate signature.
-
step
(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)¶ Runs the transformer for one step. Useful for decoding.
The “state” of the multilayered transformer is the list of
n_layers
L-1
sized inputs and its output is the output of the last layer. This returns a tuple of both the new state (list ofn_layers
L
sized inputs) and theL
th output.Parameters: - x (
dynet.Expression
) – Input (dimensioninput_dim
) - state (
dynet.Expression
) – Previous “state” (list ofn_layers
expressions of dimensionsinput_dim x (L-1)
) - c (
dynet.Expression
) – Context (dimensionscond_dim x l
) - lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (
dynet.Expression
, optional) – Defaults to None. As an alternative tolength_c
, you can pass a mask expression directly (useful to reuse masks across layers). - return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns: - The output expression (+ the
attention weights if
return_att
isTrue
)
Return type: - x (
- pc (
Submodules¶
-
class
dynn.layers.attention_layers.
BilinearAttention
(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
Bilinear attention layer.
Here the scores are computed according to
\[\alpha_{ij}=q_i^\intercal A k_j\]Where \(q_i,k_j\) are the ith query and jth key respectively. If
dot_product
is set toTrue
this is replaced by:\[\alpha_{ij}=\frac 1 {\sqrt{d}} q_i^\intercal k_j\]Where \(d\) is the dimension of the keys and queries.
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - query_dim (int) – Queries dimension
- key_dim (int) – Keys dimension
- dot_product (bool, optional) – Compute attention with the dot product
only (no weight matrix). This requires that
query_dim==key_dim
. - dropout (float, optional) – Attention dropout (defaults to 0)
- A_p (
dynet.Parameters
, optional) – Specify the weight matrix directly.
-
__call__
(query, keys, values, mask=None)¶ Compute attention scores and return the pooled value.
This returns both the pooled value and the attention score. You can specify an additive mask when some values are not to be attended to (eg padding).
Parameters: - query (
dynet.Expression
) – Query vector of size(dq, l), B
- keys (
dynet.Expression
) – Key vectors of size(dk, L), B
- values (
dynet.Expression
) – Value vectors of size(dv, L), B
- mask (
dynet.Expression
, optional) – Additive mask expression for the source side (size(L,), B
)
Returns: pooled_value, scores
, of size(dv,), B
and(L,), B
respectively
Return type: - query (
-
__init__
(pc, query_dim, key_dim, dot_product=False, dropout=0.0, A=None)¶ Creates a subcollection for this layer with a custom name
- pc (
-
class
dynn.layers.attention_layers.
MLPAttention
(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
Multilayer Perceptron based attention
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - query_dim (int) – Queries dimension
- key_dim (int) – Keys dimension
- hidden_dim (int) – Hidden dimension of the MLP
- activation (function, optional) – MLP activation (defaults to tanh).
- dropout (float, optional) – Attention dropout (defaults to 0)
-
__call__
(query, keys, values, mask=None)¶ Compute attention scores and return the pooled value
This returns both the pooled value and the attention score. You can specify an additive mask when some values are not to be attended to (eg padding).
Parameters: - query (
dynet.Expression
) – Query vector of size(dq,), B
- keys (
dynet.Expression
) – Key vectors of size(dk, L), B
- values (
dynet.Expression
) – Value vectors of size(dv, L), B
- mask (
dynet.Expression
, optional) – Additive mask expression
Returns: pooled_value, scores
, of size(dv,), B
and(L,), B
respectively
Return type: - query (
-
__init__
(pc, query_dim, key_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Wq=None, Wk=None, b=None, V=None)¶ Creates a subcollection for this layer with a custom name
- pc (
-
class
dynn.layers.attention_layers.
MultiHeadAttention
(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
Multi headed attention layer.
This functions like dot product attention
\[\alpha_{ij}=\frac 1 {\sqrt{d}} q_i^\intercal k_j\]Except the key, query and values are split into multiple
heads
.Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - n_heads (int) – Number of heads
- query_dim (int) – Dimension of queries
- key_dim (int) – Dimension of keys
- values_dim (int) – Dimension of values
- hidden_dim (int) – Hidden dimension (must be a multiple of
n_heads
) - out_dim (bool, optional) – Output dimension
- dropout (float, optional) – Attention dropout (defaults to 0)
- Wq_p (
dynet.Parameters
, optional) – Specify the queries projection matrix directly. - Wk_p (
dynet.Parameters
, optional) – Specify the keys projection matrix directly. - Wv_p (
dynet.Parameters
, optional) – Specify the values projection matrix directly. - Wo_p (
dynet.Parameters
, optional) – Specify the output projection matrix directly.
-
__call__
(queries, keys, values, mask=None)¶ Compute attention weights and return the pooled value.
This expects the queries, keys and values to have dimensions
dq x l
,dk x L
,dv x L
respectively.Returns both the pooled value and the attention weights (list of weights, one per head). You can specify an additive mask when some values are not to be attended to (eg padding).
Parameters: - queries (
dynet.Expression
) – Query vector of size(dq, l), B
- keys (
dynet.Expression
) – Key vectors of size(dk, L), B
- values (
dynet.Expression
) – Value vectors of size(dv, L), B
- mask (
dynet.Expression
, optional) – Additive mask expression for the source side (size(L,), B
)
Returns: pooled_value, scores
, of size(dv,), B
and(L,), B
respectively
Return type: - queries (
-
__init__
(pc, n_heads, query_dim, key_dim, value_dim, hidden_dim, out_dim, dropout=0.0, Wq=None, Wk=None, Wv=None, Wo=None)¶ Creates a subcollection for this layer with a custom name
- pc (
-
class
dynn.layers.base_layers.
BaseLayer
(name)¶ Bases:
object
Base layer interface
-
__call__
(*args, **kwargs)¶ Execute forward pass
-
__init__
(name)¶ Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
¶ list of weak references to the object (if defined)
-
init
(test=True, update=False)¶ Initialize the layer before performing computation
For example setup dropout, freeze some parameters, etc…
-
init_layer
(test=True, update=False)¶ Initializes only this layer’s parameters (not recursive) This needs to be implemented for each layer
-
sublayers
¶ Returns all attributes of the layer which are layers themselves
-
-
class
dynn.layers.base_layers.
ParametrizedLayer
(pc, name)¶ Bases:
dynn.layers.base_layers.BaseLayer
This is the base class for layers with trainable parameters
When implementing a ParametrizedLayer, use
self.add_parameters
/self.add_lookup_parameters
to add parameters to the layer.-
__init__
(pc, name)¶ Creates a subcollection for this layer with a custom name
-
add_lookup_parameters
(name, dim, lookup_param=None, init=None, device='', scale=1.0, mean=0.0, std=1.0)¶ This adds a lookup parameter to this layer’s ParameterCollection
The layer will have 1 new attribute:
self.[name]
which will contain the lookup parameter object (which you should use in__call__
).You can provide an existing lookup parameter with the
lookup_param
argument, in which case this parameter will be reused.The other arguments are the same as
dynet.ParameterCollection.add_lookup_parameters
-
add_parameters
(name, dim, param=None, init=None, device='', scale=1.0, mean=0.0, std=1.0)¶ This adds a parameter to this layer’s ParameterCollection.
The layer will have 1 new attribute:
self.[name]
which will contain the expression for this parameter (which you should use in__call__
).You can provide an existing parameter with the param argument, in which case this parameter will be reused.
The other arguments are the same as
dynet.ParameterCollection.add_parameters
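Example (a minimal sketch of a custom parametrized layer; the MyAffine class and its attribute names are hypothetical and only illustrate add_parameters and the resulting self.[name] attributes):
import dynet as dy
from dynn.layers.base_layers import ParametrizedLayer

class MyAffine(ParametrizedLayer):
    # Hypothetical affine layer, for illustration only
    def __init__(self, pc, input_dim, output_dim):
        super(MyAffine, self).__init__(pc, "my-affine")
        # Creates self.W and self.b, to be used in __call__
        self.add_parameters("W", (output_dim, input_dim))
        self.add_parameters("b", (output_dim,))

    def __call__(self, x):
        return dy.affine_transform([self.b, self.W, x])

# Usage (assuming init_layer loads the parameter expressions as described above)
pc = dy.ParameterCollection()
layer = MyAffine(pc, 10, 5)
dy.renew_cg()
layer.init(test=True)
y = layer(dy.zeros((10,)))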
-
init_layer
(test=True, update=False)¶ Initializes only this layer’s parameters (not recursive) This needs to be implemented for each layer
-
lookup_parameters
¶ Return all lookup parameters specific to this layer
-
parameters
¶ Return all parameters specific to this layer
-
Perhaps unsurprisingly, combination layers are layers that combine other layers within one layer.
-
class
dynn.layers.combination_layers.
Parallel
(*layers, dim=0, default_insert_dim=False)¶ Bases:
dynn.layers.base_layers.BaseLayer
A helper class to run layers on the same input and concatenate their outputs
This can be used to create 2d conv layers with multiple kernel sizes by concatenating multiple
dynn.layers.Conv2D
.Parameters: - layers (list) – A list of
dynn.layers.BaseLayer
objects. The first layer is the first one applied to the input. It is the programmer’s responsibility to make sure that the layers are compatible (eg. each layer takes the same input and the outputs have the same shape everywhere except along the concatenation dimension) - dim (int) – The concatenation dimension
- default_insert_dim (bool, optional) – Instead of concatenating along an
existing dimension, insert a new dimension at
dim
and concatenate.
-
__call__
(x, insert_dim=None, **kwargs)¶ Calls all the layers in succession.
Computes
dy.concatenate([layers[0](x)...layers[n-1](x)], d=dim)
Parameters: - x (
dynet.Expression
) – Input expression - insert_dim (bool, optional) – Override the default
Returns: The concatenation of the outputs of all the layers along dimension dim
Return type: dynet.Expression
-
__init__
(*layers, dim=0, default_insert_dim=False)¶ Initialize self. See help(type(self)) for accurate signature.
- layers (list) – A list of
-
class
dynn.layers.combination_layers.
Sequential
(*layers, default_return_last_only=True)¶ Bases:
dynn.layers.base_layers.BaseLayer
A helper class to stack layers into deep networks.
Parameters: - layers (list) – A list of
dynn.layers.BaseLayer
objects. The first layer is the first one applied to the input. It is the programmer’s responsibility to make sure that the layers are compatible (eg. the output of each layer can be fed into the next one) - default_return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
-
__call__
(x, return_last_only=None)¶ Calls all the layers in succession.
Computes
layers[n-1](layers[n-2](...layers[0](x)))
Parameters: - x (
dynet.Expression
) – Input expression - return_last_only (bool, optional) – Overrides the default
Returns: - Depending on
return_last_only
, returns either the last expression or a list of all the layer’s outputs (first to last)
Return type: dynet.Expression
, list- x (
-
__init__
(*layers, default_return_last_only=True)¶ Initialize self. See help(type(self)) for accurate signature.
- layers (list) – A list of
-
class
dynn.layers.convolution_layers.
Conv1D
(pc, input_dim, num_kernels, kernel_width, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, stride=1, K=None, b=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
1D convolution along the first dimension
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - input_dim (int) – Input dimension
- num_kernels (int) – Number of kernels (essentially the output dimension)
- kernel_width (int) – Width of the kernels
- activation (function, optional) – activation function
(default:
identity
) - dropout (float, optional) – Dropout rate (default 0)
- nobias (bool, optional) – Omit the bias (default
False
) - zero_padded (bool, optional) – Default padding behaviour. Pad
the input with zeros so that the output has the same length
(default
True
) - stride (list, optional) – Default stride along the length
(defaults to
1
).
-
__call__
(x, stride=None, zero_padded=None)¶ Forward pass
Parameters: - x (
dynet.Expression
) – Input expression with the shape (length, input_dim) - stride (int, optional) – Stride along the temporal dimension
- zero_padded (bool, optional) – Pad the image with zeros so that the
output has the same length (default
True
)
Returns: Convolved sequence.
Return type: - x (
-
__init__
(pc, input_dim, num_kernels, kernel_width, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, stride=1, K=None, b=None)¶ Creates a subcollection for this layer with a custom name
- pc (
-
class
dynn.layers.convolution_layers.
Conv2D
(pc, num_channels, num_kernels, kernel_size, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, strides=None, K=None, b=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
2D convolution
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - num_channels (int) – Number of channels in the input image
- num_kernels (int) – Number of kernels (essentially the output dimension)
- kernel_size (list, optional) – Default kernel size. This is a list of two elements, one per dimension.
- activation (function, optional) – activation function
(default:
identity
) - dropout (float, optional) – Dropout rate (default 0)
- nobias (bool, optional) – Omit the bias (default
False
) - zero_padded (bool, optional) – Default padding behaviour. Pad
the image with zeros so that the output has the same width/height
(default
True
) - strides (list, optional) – Default stride along each dimension
(list of size 2, defaults to
[1, 1]
).
-
__call__
(x, strides=None, zero_padded=None)¶ Forward pass
Parameters: - x (
dynet.Expression
) – Input image (3-d tensor) or matrix. - zero_padded (bool, optional) – Pad the image with zeros so that the output has the same width/height. If this is not specified, the default specified in the constructor is used.
- strides (list, optional) – Stride along width/height. If this is not specified, the default specified in the constructor is used.
Returns: Convolved image.
Return type: - x (
-
__init__
(pc, num_channels, num_kernels, kernel_size, activation=<function identity>, dropout_rate=0.0, nobias=False, zero_padded=True, strides=None, K=None, b=None)¶ Creates a subcollection for this layer with a custom name
- pc (
-
class
dynn.layers.dense_layers.
Affine
(pc, input_dim, output_dim, activation=<function identity>, dropout=0.0, nobias=False, W=None, b=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
Densely connected layer
\(y=f(Wx+b)\)
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - input_dim (int) – Input dimension
- output_dim (int) – Output dimension
- activation (function, optional) – activation function (default: identity)
- dropout (float, optional) – Dropout rate (default 0)
- nobias (bool, optional) – Omit the bias (default
False
)
-
__call__
(x)¶ Forward pass.
Parameters: x ( dynet.Expression
) – Input expression (a vector)Returns: \(y=f(Wx+b)\) Return type: dynet.Expression
-
__init__
(pc, input_dim, output_dim, activation=<function identity>, dropout=0.0, nobias=False, W=None, b=None)¶ Creates a subcollection for this layer with a custom name
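Example (a minimal sketch of a 10 -> 5 layer with a tanh nonlinearity and dropout):
import dynet as dy
from dynn.layers.dense_layers import Affine

pc = dy.ParameterCollection()
layer = Affine(pc, 10, 5, activation=dy.tanh, dropout=0.1)
dy.renew_cg()
layer.init(test=False)   # test=False so that dropout is applied
x = dy.zeros((10,))
y = layer(x)             # 5-dimensional output y = f(Wx + b)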
- pc (
-
class
dynn.layers.dense_layers.
GatedLayer
(pc, input_dim, output_dim, activation=<built-in function tanh>, dropout=0.0, Wo=None, bo=None, Wg=None, bg=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
Gated linear layer:
\(y=(W_ox+b_o)\circ \sigma(W_gx+b_g)\)
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - input_dim (int) – Input dimension
- output_dim (int) – Output dimension
- activation (function, optional) – activation function
(default:
dynet.tanh
) - dropout (float, optional) – Dropout rate (default 0)
-
__call__
(x)¶ Forward pass
Parameters: x ( dynet.Expression
) – Input expression (a vector)Returns: \(y=(W_ox+b_o)\circ \sigma(W_gx+b_g)\) Return type: dynet.Expression
-
__init__
(pc, input_dim, output_dim, activation=<built-in function tanh>, dropout=0.0, Wo=None, bo=None, Wg=None, bg=None)¶ Creates a subcollection for this layer with a custom name
- pc (
For embedding discrete inputs (such as words, characters).
-
class
dynn.layers.embedding_layers.
Embeddings
(pc, dictionary, embed_dim, init=None, pad_mask=None, E=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
Layer for embedding elements of a dictionary
Example:
# Dictionary dic = dynn.data.dictionary.Dictionary(symbols=["a", "b"]) # Parameter collection pc = dy.ParameterCollection() # Embedding layer of dimension 10 embed = Embeddings(pc,dic, 10) # Initialize dy.renew_cg() embed.init() # Return a batch of 2 10-dimensional vectors vectors = embed([dic.index("b"), dic.index("a")])
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - dictionary (
dynn.data.dictionary.Dictionary
) – Mapping from symbols to indices - embed_dim (int) – Embedding dimension
- init (
dynet.PyInitializer
, optional) – How to initialize the parameters. By default this will initialize to \(\mathcal N(0, \frac{1}{\sqrt{\texttt{embed_dim}}})\) - pad_mask (float, optional) – If provided, embeddings of the
dictionary.pad_idx
index will be masked with this value
-
__call__
(idxs, length_dim=0)¶ Returns the input’s embedding
If
idxs
is a list this returns a batch of embeddings. If it’s a numpy array of shapeN x b
it returns a batch ofb
N x embed_dim
matricesParameters: idxs (list,int) – Index or list of indices to embed Returns: Batch of embeddings Return type: dynet.Expression
-
__init__
(pc, dictionary, embed_dim, init=None, pad_mask=None, E=None)¶ Creates a subcollection for this layer with a custom name
-
weights
¶ Numpy array containing the embeddings
The first dimension is the lookup dimension
- pc (
-
class
dynn.layers.functional_layers.
AdditionLayer
(layer1, layer2)¶ Bases:
dynn.layers.functional_layers.BinaryOpLayer
Addition of two layers.
This is the layer returned by the addition syntax:
AdditionLayer(layer1, layer2)(x) == layer1(x) + layer2(x) # is the same thing as add_1_2 = layer1 + layer2 add_1_2(x) == layer1(x) + layer2(x)
Parameters: - layer1 (
base_layers.BaseLayer
) – First layer - layer2 (
base_layers.BaseLayer
) – Second layer
-
__init__
(layer1, layer2)¶ Initialize self. See help(type(self)) for accurate signature.
- layer1 (
-
class
dynn.layers.functional_layers.
BinaryOpLayer
(layer1, layer2, binary_operation)¶ Bases:
dynn.layers.base_layers.BaseLayer
This layer wraps two layers with a binary operation.
BinaryOpLayer(layer1, layer2, op)(x) == op(layer1(x), layer2(x))
This is useful to express the addition of two layers as another layer.
Parameters: - layer1 (
base_layers.BaseLayer
) – First layer - layer2 (
base_layers.BaseLayer
) – Second layer - binary_operation (function) – A binary operation on
dynet.Expression
objects
-
__call__
(*args, **kwargs)¶ Execute forward pass
-
__init__
(layer1, layer2, binary_operation)¶ Initialize self. See help(type(self)) for accurate signature.
- layer1 (
-
class
dynn.layers.functional_layers.
CmultLayer
(layer1, layer2)¶ Bases:
dynn.layers.functional_layers.BinaryOpLayer
Coordinate-wise multiplication of two layers.
CmultLayer(layer1, layer2)(x) == dy.cmult(layer1(x), layer2(x))
Parameters: - layer1 (
base_layers.BaseLayer
) – First layer - layer2 (
base_layers.BaseLayer
) – Second layer
-
__init__
(layer1, layer2)¶ Initialize self. See help(type(self)) for accurate signature.
- layer1 (
-
class
dynn.layers.functional_layers.
ConstantLayer
(constant)¶ Bases:
dynn.layers.base_layers.BaseLayer
This is the “zero”-ary layer.
# Takes in numbers ConstantLayer(5)() == dy.inputTensor([5]) # Or lists ConstantLayer([5, 6])() == dy.inputTensor([5, 6]) # Or numpy arrays ConstantLayer(np.ones((10, 12)))() == dy.inputTensor(np.ones((10, 12)))
Parameters: constant (number, np.ndarray) – The constant. It must be a type that can be turned into a dynet.Expression
-
__call__
(*args, **kwargs)¶ Execute forward pass
-
__init__
(constant)¶ Initialize self. See help(type(self)) for accurate signature.
-
init_layer
(test=True, update=False)¶ Initializes only this layer’s parameters (not recursive) This needs to be implemented for each layer
-
-
class
dynn.layers.functional_layers.
IdentityLayer
¶ Bases:
dynn.layers.functional_layers.Lambda
The identity layer does literally nothing
IdentityLayer()(x) == x
It passes its input directly as the output. Still, it can be useful to express more complicated layers like residual connections.
-
__init__
()¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
dynn.layers.functional_layers.
Lambda
(function)¶ Bases:
dynn.layers.base_layers.BaseLayer
This layer applies an arbitrary function to its input.
Lambda(f)(x) == f(x)
This is useful if you want to wrap activation functions as layers. The unary operation should be a function taking
dynet.Expression
todynet.Expression
.You shouldn’t use this to stack layers though,
op
oughtn’t be a layer. If you want to stack layers, usecombination_layers.Sequential
.Parameters: - layer (
base_layers.BaseLayer
) – The layer to which output you want to apply the unary operation. - binary_operation (function) – A unary operation on
dynet.Expression
objects
-
__call__
(*args, **kwargs)¶ Returns
function(*args, **kwargs)
-
__init__
(function)¶ Initialize self. See help(type(self)) for accurate signature.
- layer (
-
class
dynn.layers.functional_layers.
NegationLayer
(layer)¶ Bases:
dynn.layers.functional_layers.UnaryOpLayer
Negates the output of another layer:
NegationLayer(layer)(x) == - layer(x)
It can also be used with the - syntax directly:
negated_layer = - layer # is the same as negated_layer = NegationLayer(layer)
Parameters: layer ( base_layers.BaseLayer
) – The layer to which output you want to apply the negation.-
__init__
(layer)¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
dynn.layers.functional_layers.
SubstractionLayer
(layer1, layer2)¶ Bases:
dynn.layers.functional_layers.BinaryOpLayer
Subtraction of two layers.
This is the layer returned by the subtraction syntax:
SubstractionLayer(layer1, layer2)(x) == layer1(x) - layer2(x) # is the same thing as sub_1_2 = layer1 - layer2 sub_1_2(x) == layer1(x) - layer2(x)
Parameters: - layer1 (
base_layers.BaseLayer
) – First layer - layer2 (
base_layers.BaseLayer
) – Second layer
-
__init__
(layer1, layer2)¶ Initialize self. See help(type(self)) for accurate signature.
- layer1 (
-
class
dynn.layers.functional_layers.
UnaryOpLayer
(layer, unary_operation)¶ Bases:
dynn.layers.base_layers.BaseLayer
This layer wraps a unary operation on another layer.
UnaryOpLayer(layer, op)(x) == op(layer(x))
This is a shorter way of writing:
UnaryOpLayer(layer, op)(x) == Sequential(layer, Lambda(op))
You shouldn’t use this to stack layers though,
op
oughtn’t be a layer. If you want to stack layers, usecombination_layers.Sequential
.Parameters: - layer (
base_layers.BaseLayer
) – The layer to which output you want to apply the unary operation. - unary_operation (function) – A unary operation on
dynet.Expression
objects
-
__call__
(*args, **kwargs)¶ Returns
unary_operation(layer(*args, **kwargs))
-
__init__
(layer, unary_operation)¶ Initialize self. See help(type(self)) for accurate signature.
- layer (
-
class
dynn.layers.normalization_layers.
LayerNorm
(pc, input_dim, gain=None, bias=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
Layer normalization layer:
\(y=\frac{g}{\sigma(x)}\cdot(x-\mu(x)+b)\)
Parameters: - input_dim (int, tuple) – Input dimension
- pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters
-
__call__
(x, d=None)¶ Layer-normalize the input.
Parameters: x ( dynet.Expression
) – Input expressionReturns: \(y=\frac{g}{\sigma(x)}\cdot(x-\mu(x)+b)\) Return type: dynet.Expression
-
__init__
(pc, input_dim, gain=None, bias=None)¶ Creates a subcollection for this layer with a custom name
-
class
dynn.layers.pooling_layers.
MaxPool1D
(kernel_size=None, stride=1)¶ Bases:
dynn.layers.base_layers.BaseLayer
1D max pooling
Parameters: -
__call__
(x, kernel_size=None, stride=None)¶ Max pooling over the first dimension.
This takes either a list of
N
d
-dimensional vectors or aN x d
matrix.The output will be a matrix of dimension
(N - kernel_size + 1) // stride x d
Parameters: - x (
dynet.Expression
) – Input matrix or list of vectors - dim (int, optional) – The reduction dimension (default:
0
) - kernel_size (int, optional) – Kernel size. If this is not specified, the default size specified in the constructor is used.
- stride (int, optional) – Temporal stride. If this is not specified, the default stride specified in the constructor is used.
Returns: Pooled sequence.
Return type: - x (
-
__init__
(kernel_size=None, stride=1)¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
dynn.layers.pooling_layers.
MaxPool2D
(kernel_size=None, strides=None)¶ Bases:
dynn.layers.base_layers.BaseLayer
2D max pooling.
Parameters: -
__call__
(x, kernel_size=None, strides=None)¶ Max pooling over the first dimension.
If either of the
kernel_size
elements is not specified, the pooling will be done over the full dimension (and the stride is ignored)Parameters: - x (
dynet.Expression
) – Input image (3-d tensor) or matrix. - kernel_size (list, optional) – Size of the pooling kernel. If this is not specified, the default specified in the constructor is used.
- strides (list, optional) – Stride along width/height. If this is not specified, the default specified in the constructor is used.
Returns: Pooled sequence.
Return type: - x (
-
__init__
(kernel_size=None, strides=None)¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
dynn.layers.pooling_layers.
MeanPool1D
(kernel_size=None, stride=1)¶ Bases:
dynn.layers.base_layers.BaseLayer
1D mean pooling.
The stride and kernel size arguments are here for consistency with
MaxPool1D
but they are unsupported for now.Parameters: -
__call__
(x, kernel_size=None, stride=None, lengths=None)¶ Mean pooling over the first dimension.
This takes either a list of
N
d
-dimensional vectors or aN x d
matrix.The output will be a matrix of dimension
(N - kernel_size + 1) // stride x d
Parameters: - x (
dynet.Expression
) – Input matrix or list of vectors - dim (int, optional) – The reduction dimension (default:
0
) - kernel_size (int, optional) – Kernel size. If this is not specified, the default size specified in the constructor is used.
- stride (int, optional) – Temporal stride. If this is not specified, the default stride specified in the constructor is used.
Returns: Pooled sequence.
Return type: - x (
-
__init__
(kernel_size=None, stride=1)¶ Initialize self. See help(type(self)) for accurate signature.
-
-
dynn.layers.pooling_layers.
max_pool_dim
(x, d=0, kernel_width=None, stride=1)¶ Efficient max pooling on GPU, assuming x is a matrix or a list of vectors
The particularity of recurrent layers is that their output can be fed back as input. This includes common recurrent cells like the Elman RNN or the LSTM.
-
class
dynn.layers.recurrent_layers.
ElmanRNN
(pc, input_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Whx=None, Whh=None, b=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
,dynn.layers.recurrent_layers.RecurrentCell
The standard Elman RNN cell:
\(h_{t}=\sigma(W_{hh}h_{t-1} + W_{hx}x_{t} + b)\)
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - input_dim (int) – Input dimension
- output_dim (int) – Output (hidden) dimension
- activation (function, optional) – Activation function \(\sigma\)
(default:
dynn.activations.tanh()
) - dropout (float, optional) – Dropout rate (default 0)
-
__call__
(x, h)¶ Perform the recurrent update.
Parameters: - x (
dynet.Expression
) – Input vector - h (
dynet.Expression
) – Previous recurrent vector
Returns: - Next recurrent state
\(h_{t}=\sigma(W_{hh}h_{t-1} + W_{hx}x_{t} + b)\)
Return type: - x (
-
__init__
(pc, input_dim, hidden_dim, activation=<function tanh>, dropout=0.0, Whx=None, Whh=None, b=None)¶ Creates a subcollection for this layer with a custom name
-
get_output
(state)¶ Get the cell’s output from the list of states.
For example this would return
h
fromh,c
in the case of the LSTM
-
init_layer
(test=True, update=False)¶ Initializes only this layer’s parameters (not recursive) This needs to be implemented for each layer
-
initial_value
(batch_size=1)¶ Return a vector of dimension hidden_dim filled with zeros
Returns: Zero vector Return type: dynet.Expression
- pc (
-
class
dynn.layers.recurrent_layers.
LSTM
(pc, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0, Whx=None, Whh=None, b=None)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
,dynn.layers.recurrent_layers.RecurrentCell
Standard LSTM
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - input_dim (int) – Input dimension
- output_dim (int) – Output (hidden) dimension
- dropout_x (float, optional) – Input dropout rate (default 0)
- dropout_h (float, optional) – Recurrent dropout rate (default 0)
-
__call__
(x, h, c)¶ Perform the recurrent update.
Parameters: - x (
dynet.Expression
) – Input vector - h (
dynet.Expression
) – Previous recurrent vector - c (
dynet.Expression
) – Previous cell state vector
Returns: dynet.Expression
for the next recurrent states h
andc
Return type: - x (
-
__init__
(pc, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0, Whx=None, Whh=None, b=None)¶ Creates a subcollection for this layer with a custom name
-
get_output
(state)¶ Get the cell’s output from the list of states.
For example this would return
h
fromh,c
in the case of the LSTM
-
init_layer
(test=True, update=False)¶ Initializes only this layer’s parameters (not recursive) This needs to be implemented for each layer
- pc (
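Example (a minimal sketch unrolling the cell manually over a short sequence, following the same pattern as the StackedRecurrentCells example below; it assumes the recurrent state is the [h, c] pair returned by initial_value() and __call__):
import dynet as dy
from dynn.layers import LSTM

pc = dy.ParameterCollection()
# 10-dim inputs, 20-dim hidden state
lstm = LSTM(pc, 10, 20)
dy.renew_cg()
lstm.init(test=True)
xs = [dy.zeros((10,)) for _ in range(5)]
# Default all-zeros recurrent state [h_0, c_0]
state = lstm.initial_value()
for x in xs:
    state = lstm(x, *state)
# Retrieve the output h from the final state
h_final = lstm.get_output(state)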
-
class
dynn.layers.recurrent_layers.
RecurrentCell
(*args, **kwargs)¶ Bases:
object
Base recurrent cell interface
Recurrent cells must provide a default initial value for their recurrent state (eg. all zeros)
-
__init__
(*args, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
¶ list of weak references to the object (if defined)
-
get_output
(state)¶ Get the cell’s output from the list of states.
For example this would return
h
fromh,c
in the case of the LSTM
-
initial_value
(batch_size=1)¶ Initial value of the recurrent state. Should return a list.
-
-
class
dynn.layers.recurrent_layers.
StackedLSTM
(pc, num_layers, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0)¶ Bases:
dynn.layers.recurrent_layers.StackedRecurrentCells
Stacked LSTMs
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - num_layers (int) – Number of layers
- input_dim (int) – Input dimension
- output_dim (int) – Output (hidden) dimension
- dropout_x (float, optional) – Input dropout rate (default 0)
- dropout_h (float, optional) – Recurrent dropout rate (default 0)
-
__init__
(pc, num_layers, input_dim, hidden_dim, dropout_x=0.0, dropout_h=0.0)¶ Initialize self. See help(type(self)) for accurate signature.
- pc (
-
class
dynn.layers.recurrent_layers.
StackedRecurrentCells
(*cells)¶ Bases:
dynn.layers.base_layers.BaseLayer
,dynn.layers.recurrent_layers.RecurrentCell
This implements a stack of recurrent layers
The recurrent state of the resulting cell is the list of the states of all the sub-cells. For example for a stack of 2 LSTM cells the resulting state will be
[h_1, c_1, h_2, c_2]
Example:
# Parameter collection pc = dy.ParameterCollection() # Stacked recurrent cell stacked_cell = StackedRecurrentCells( LSTM(pc, 10, 15), LSTM(pc, 15, 5), ElmanRNN(pc, 5, 20), ) # Inputs dy.renew_cg() x = dy.random_uniform(10, batch_size=5) # Initialize layer stacked_cell.init(test=False) # Initial state: [h_1, c_1, h_2, c_2, h_3] of sizes [15, 15, 5, 5, 20] init_state = stacked_cell.initial_value() # Run the cell on the input. new_state = stacked_cell(x, *init_state) # Get the final output (h_3 of size 20) h = stacked_cell.get_output(new_state)
-
__call__
(x, *state)¶ Compute the cell’s output from the list of states and an input expression
Parameters: x ( dynet.Expression
) – Input vectorReturns: new recurrent state Return type: list
-
__init__
(*cells)¶ Initialize self. See help(type(self)) for accurate signature.
-
get_output
(state)¶ Get the output of the last cell
-
initial_value
(batch_size=1)¶ Initial value of the recurrent state.
-
-
class
dynn.layers.residual_layers.
Residual
(layer, shortcut_transform=None, layer_weight=1.0, shortcut_weight=1.0)¶ Bases:
dynn.layers.base_layers.BaseLayer
Adds residual connections to a layer
-
__call__
(*args, **kwargs)¶ Execute forward pass
-
__init__
(layer, shortcut_transform=None, layer_weight=1.0, shortcut_weight=1.0)¶ Initialize self. See help(type(self)) for accurate signature.
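Example (a minimal sketch wrapping an Affine layer; this assumes the default behaviour computes layer_weight * layer(x) + shortcut_weight * shortcut(x), with the identity shortcut when no shortcut_transform is given):
import dynet as dy
from dynn.layers.residual_layers import Residual
from dynn.layers.dense_layers import Affine

pc = dy.ParameterCollection()
# 64 -> 64 layer with a residual connection around it
res_block = Residual(Affine(pc, 64, 64, activation=dy.rectify))
dy.renew_cg()
res_block.init(test=True)
x = dy.zeros((64,))
y = res_block(x)   # assumed: Affine(x) + x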
-
Sequence transduction layers take in a sequence of expression and runs one
layer over each input. They can be feed-forward (each input is treated
independently, eg. Transduction
) or recurrent
(the output at one step depends on the output at the previous step, eg.
Unidirectional
).
-
class
dynn.layers.transduction_layers.
Bidirectional
(forward_cell, backward_cell, output_only=False)¶ Bases:
dynn.layers.base_layers.BaseLayer
Bidirectional transduction layer
This layer runs a recurrent cell in each direction over a sequence of inputs and produces the resulting sequences of recurrent states.
Example:
# Parameter collection pc = dy.ParameterCollection() # LSTM cell fwd_lstm_cell = dynn.layers.LSTM(pc, 10, 10) bwd_lstm_cell = dynn.layers.LSTM(pc, 10, 10) # Transduction layer bilstm = dynn.layers.Bidirectional(fwd_lstm_cell, bwd_lstm_cell) # Inputs dy.renew_cg() xs = [dy.random_uniform(10, batch_size=5) for _ in range(20)] # Initialize layer bilstm.init(test=False) # Transduce forward fwd_states, bwd_states = bilstm(xs) # Retrieve last h fwd_h_final = fwd_states[-1][0] # For the backward LSTM the final state is at # the beginning of the sequence (assuming left padding) bwd_h_final = bwd_states[0][0]
Parameters: - forward_cell (
recurrent_layers.RecurrentCell
) – The recurrent cell to use for forward transduction - backward_cell (
recurrent_layers.RecurrentCell
) – The recurrent cell to use for backward transduction - output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states.
-
__call__
(input_sequence, lengths=None, left_padded=True, output_only=None, fwd_initial_state=None, bwd_initial_state=None)¶ Transduces the sequence in both directions
The output is a tuple
forward_states, backward_states
where eachforward_states
is a list of the output states of the forward recurrent cell at each step (andbackward_states
for the backward cell). For instance in a BiLSTM the output is[(fwd_h1, fwd_c1), ...], [(bwd_h1, bwd_c1), ...]
This assumes that all the input expressions have the same batch size. If you batch sentences of different lengths together, you should pad them to the length of the longest sequence.
Parameters: - input_sequence (list) – Input as a list of
dynet.Expression
objects - lengths (list, optional) – If the expressions in the sequence are
batched, but have different lengths, this should contain a list
of the sequence lengths (default:
None
) - left_padded (bool, optional) – If the input sequences have different lengths they must be padded to the length of the longest sequence. Use this to specify whether the sequence is left or right padded.
- output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states. Overwrites the value given in the constructor.
- fwd_initial_state (
dy.Expression
, optional) – Overrides the default initial state of the forward recurrent cell. - bwd_initial_state (
dy.Expression
, optional) – Overrides the default initial state of the backward recurrent cell.
Returns: - List of forward and backward recurrent states
(depends on the recurrent layer)
Return type: - input_sequence (list) – Input as a list of
-
__init__
(forward_cell, backward_cell, output_only=False)¶ Initialize self. See help(type(self)) for accurate signature.
- forward_cell (
-
class
dynn.layers.transduction_layers.
SequenceMaskingLayer
(mask_value=0.0, left_padded=True)¶ Bases:
dynn.layers.base_layers.BaseLayer
Masks a sequence of batched expressions according to each batch element’s length
This layer applies a mask value to the elements of a sequence of batched expressions which correspond to padding tokens. Typically if you batch a sequence of size 2 and a sequence of size 3 you will pad the first sequence to obtain a list of 3 expressions of batch size 2. This layer will mask the batch element of the last expression corresponding to the padding token in the 1st sequence.
This is useful when doing attention or max-pooling on padded sequences when you want to mask padding tokens with \(-\infty\) to ensure that they are ignored.
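For illustration, here is a minimal sketch of the max-pooling use case mentioned above (it assumes the layer is exposed as dynn.layers.SequenceMaskingLayer like the other layers in this document):
import dynet as dy
import dynn

# Mask padded positions with -inf
masking_layer = dynn.layers.SequenceMaskingLayer(mask_value=float("-inf"))
# 3 steps, batch of 2 sequences with lengths 2 and 3
dy.renew_cg()
xs = [dy.random_uniform(4, -1, 1, batch_size=2) for _ in range(3)]
# Initialize layer
masking_layer.init(test=False)
masked = masking_layer(xs, lengths=[2, 3])
# Max-pooling over time now ignores the padded positions
pooled = dy.emax(masked)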
Parameters: - mask_value (float, optional) – Value to replace the masked elements with (default: 0.0) - left_padded (bool, optional) – Default value for whether the input sequences are left or right padded (default: True)
-
__call__
(input_sequence, lengths, left_padded=None)¶ Runs the layer over the input
The output is a list of the output of the layer at each step
Parameters: - input_sequence (list) – Input as a list of
dynet.Expression
objects - lengths (list, optional) – If the expressions in the sequence are
batched, but have different lengths, this should contain a list
of the sequence lengths (default:
None
) - left_padded (bool, optional) – If the input sequences have different lengths they must be padded to the length of the longest sequence. Use this to specify whether the sequence is left or right padded. Overwrites the value given in the constructor.
Returns: List of masked expressions
Return type: - input_sequence (list) – Input as a list of
-
__init__
(mask_value=0.0, left_padded=True)¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
dynn.layers.transduction_layers.
Transduction
(layer)¶ Bases:
dynn.layers.base_layers.BaseLayer
Feed forward transduction layer
This layer runs one cell on a sequence of inputs and returns the list of outputs. Calling it is equivalent to calling:
[layer(x) for x in input_sequence]
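For illustration, here is a minimal sketch that applies the same affine layer to every element of a sequence. It assumes the Affine dense layer and that both classes are exposed under dynn.layers like the layers in the examples above:
import dynet as dy
import dynn

# Parameter collection
pc = dy.ParameterCollection()
# Feed-forward transduction with a single affine layer
transduce = dynn.layers.Transduction(dynn.layers.Affine(pc, 10, 5))
# Inputs
dy.renew_cg()
xs = [dy.random_uniform(10, -1, 1) for _ in range(20)]
# Initialize layer
transduce.init(test=False)
# One 5-dimensional output per input
ys = transduce(xs)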
Parameters: layer ( base_layers.BaseLayer
) – The layer to apply to each input-
__call__
(input_sequence)¶ Runs the layer over the input
The output is a list of the output of the layer at each step
Parameters: input_sequence (list) – Input as a list of dynet.Expression
objectsReturns: List of outputs (one per input) Return type: list
-
__init__
(layer)¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
dynn.layers.transduction_layers.
Unidirectional
(cell, output_only=False)¶ Bases:
dynn.layers.base_layers.BaseLayer
Unidirectional transduction layer
This layer runs a recurrent cell on a sequence of inputs and produces the resulting sequence of recurrent states.
Example:
# LSTM cell
lstm_cell = dynn.layers.LSTM(dy.ParameterCollection(), 10, 10)
# Transduction layer
lstm = dynn.layers.Unidirectional(lstm_cell)
# Inputs
dy.renew_cg()
xs = [dy.random_uniform(10, -1, 1, batch_size=5) for _ in range(20)]
# Initialize layer
lstm.init(test=False)
# Transduce forward
states = lstm(xs)
# Retrieve last h
h_final = states[-1][0]
Parameters: - cell (
recurrent_layers.RecurrentCell
) – The recurrent cell to use for transduction - output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states.
-
__call__
(input_sequence, backward=False, lengths=None, left_padded=True, output_only=None, initial_state=None)¶ Transduces the sequence using the recurrent cell.
The output is a list of the output states at each step. For instance in an LSTM the output is
(h1, c1), (h2, c2), ...
This assumes that all the input expressions have the same batch size. If you batch sequences of different lengths together you should pad them to the length of the longest sequence.
Parameters: - input_sequence (list) – Input as a list of
dynet.Expression
objects - backward (bool, optional) – If this is
True
the sequence will be processed from right to left. The output sequence will still be in the same order as the input sequence though.
batched, but have different lengths, this should contain a list
of the sequence lengths (default:
None
) - left_padded (bool, optional) – If the input sequences have different lengths they must be padded to the length of the longest sequence. Use this to specify whether the sequence is left or right padded.
- output_only (bool, optional) – Only return the sequence of outputs instead of the sequence of states. Overwrites the value given in the constructor.
- initial_state (
dy.Expression
, optional) – Overrides the default initial state of the recurrent cell
Returns: List of recurrent states (depends on the recurrent layer)
Return type: - input_sequence (list) – Input as a list of
-
__init__
(cell, output_only=False)¶ Initialize self. See help(type(self)) for accurate signature.
- cell (
-
class
dynn.layers.transformer_layers.
CondTransformer
(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
Conditional transformer layer.
As described in Vaswani et al. (2017), this is the “decoder” side of the transformer, i.e. self attention + attention to the context.
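For illustration, here is a minimal sketch with arbitrary dimensions (importing the class from its documented module path):
import dynet as dy
from dynn.layers.transformer_layers import CondTransformer

# Parameter collection
pc = dy.ParameterCollection()
decoder_layer = CondTransformer(pc, input_dim=16, hidden_dim=64, cond_dim=16, n_heads=4)
# Target-side inputs (length 5) and encoder-side context (length 7)
dy.renew_cg()
x = dy.random_uniform((16, 5), -1, 1)
c = dy.random_uniform((16, 7), -1, 1)
# Initialize layer
decoder_layer.init(test=False)
# triu=True restricts self attention to previous positions (causal decoding)
h = decoder_layer(x, c, triu=True)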
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - input_dim (int) – Hidden dimension (used everywhere)
- cond_dim (int) – Conditional dimension (dimension of the “encoder” side, used for attention)
- n_heads (int) – Number of heads for attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)
-
__call__
(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)¶ Run the transformer layer.
The input is expected to have dimensions
d x L
whereL
is the length dimension.Parameters: - x (
dynet.Expression
) – Input (dimensionsinput_dim x L
) - c (
dynet.Expression
) – Context (dimensionscond_dim x l
) - lengths (list, optional) – Defaults to None. List of lengths for masking (used for self attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking in self attention.
- mask (
dynet.Expression
, optional) – Defaults to None. As an alternative tolength
, you can pass a mask expression directly (useful to reuse masks accross layers). - triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
- lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (
dynet.Expression
, optional) – Defaults to None. As an alternative tolength_c
, you can pass a mask expression directly (useful to reuse masks accross layers). - return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns: - The output expression (+ the
attention weights if
return_att
isTrue
)
Return type: tuple,
dynet.Expression
- x (
-
__init__
(pc, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)¶ Creates a subcollection for this layer with a custom name
-
step
(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False)¶ Runs the transformer for one step. Useful for decoding.
The “state” of the transformer is the list of
L-1
inputs and its output is theL
th output. This returns a tuple of both the new state (L-1
previous inputs +L
th input concatenated) and theL
th outputParameters: - x (
dynet.Expression
) – Input (dimensioninput_dim
) - state (
dynet.Expression
, optional) – Previous “state” (dimensionsinput_dim x (L-1)
) - c (
dynet.Expression
) – Context (dimensionscond_dim x l
) - lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (
dynet.Expression
, optional) – Defaults to None. As an alternative tolength_c
, you can pass a mask expression directly (useful to reuse masks accross layers). - return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns: The new state and the output (+ the attention weights if return_att is True)
Return type: tuple
- x (
- pc (
-
class
dynn.layers.transformer_layers.
StackedCondTransformers
(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)¶ Bases:
dynn.layers.combination_layers.Sequential
Multilayer transformer.
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - n_layers (int) – Number of layers
- input_dim (int) – Hidden dimension (used everywhere)
- cond_dim (int) – Conditional dimension (dimension of the “encoder” side, used for attention)
- n_heads (int) – Number of heads for self attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)
-
__call__
(x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)¶ Run the multilayer transformer.
The input is expected to have dimensions
d x L
whereL
is the length dimension.Parameters: - x (
dynet.Expression
) – Input (dimensionsinput_dim x L
) - c (list) – list of contexts (one per layer, each of dim
cond_dim x L
). If this is not a list (but an expression), the same context will be used for each layer. - lengths (list, optional) – Defaults to None. List of lengths for masking (used for self attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking in self attention.
- mask (
dynet.Expression
, optional) – Defaults to None. As an alternative tolength
, you can pass a mask expression directly (useful to reuse masks accross layers). - triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
- lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (
dynet.Expression
, optional) – Defaults to None. As an alternative tolength_c
, you can pass a mask expression directly (useful to reuse masks accross layers). - return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
Returns: - The output expression (+ the
attention weights if
return_att
isTrue
)
Return type: tuple,
dynet.Expression
- x (
-
__init__
(pc, n_layers, input_dim, hidden_dim, cond_dim, n_heads, activation=<function relu>, dropout=0.0)¶ Initialize self. See help(type(self)) for accurate signature.
-
step
(state, x, c, lengths=None, left_aligned=True, mask=None, triu=False, lengths_c=None, left_aligned_c=True, mask_c=None, return_att=False, return_last_only=True)¶ Runs the transformer for one step. Useful for decoding.
The “state” of the multilayered transformer is the list of
n_layers
L-1
sized inputs and its output is the output of the last layer. This returns a tuple of both the new state (list ofn_layers
L
sized inputs) and theL
th output.Parameters: - x (
dynet.Expression
) – Input (dimensioninput_dim
) - state (
dynet.Expression
) – Previous “state” (list ofn_layers
expressions of dimensionsinput_dim x (L-1)
) - c (
dynet.Expression
) – Context (dimensionscond_dim x l
) - lengths_c (list, optional) – Defaults to None. List of lengths for masking (used for conditional attention)
- left_aligned_c (bool, optional) – Defaults to True. Used for masking in conditional attention.
- mask_c (
dynet.Expression
, optional) – Defaults to None. As an alternative tolength_c
, you can pass a mask expression directly (useful to reuse masks accross layers). - return_att (bool, optional) – Defaults to False. Return the self and conditional attention weights
Returns: - The output expression (+ the
attention weights if
return_att
isTrue
)
Return type: - x (
- pc (
-
class
dynn.layers.transformer_layers.
StackedTransformers
(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)¶ Bases:
dynn.layers.combination_layers.Sequential
Multilayer transformer.
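For illustration, here is a minimal sketch of a small self-attention encoder with arbitrary dimensions (importing the class from its documented module path):
import dynet as dy
from dynn.layers.transformer_layers import StackedTransformers

# Parameter collection
pc = dy.ParameterCollection()
# 4-layer self-attention encoder
encoder = StackedTransformers(pc, n_layers=4, input_dim=16, hidden_dim=64, n_heads=4)
# Batch of 3 sequences padded to length 10
dy.renew_cg()
x = dy.random_uniform((16, 10), -1, 1, batch_size=3)
# Initialize layer
encoder.init(test=False)
# lengths masks the padded positions in self attention
h = encoder(x, lengths=[10, 7, 4])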
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - n_layers (int) – Number of layers
- input_dim (int) – Hidden dimension (used everywhere)
- n_heads (int) – Number of heads for self attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)
-
__call__
(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False, return_last_only=True)¶ Run the multilayer transformer.
The input is expected to have dimensions
d x L
whereL
is the length dimension.Parameters: - x (
dynet.Expression
) – Input (dimensionsinput_dim x L
) - lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking
- triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
- mask (
dynet.Expression
, optional) – Defaults to None. As an alternative tolength
, you can pass a mask expression directly (useful to reuse masks accross layers) - return_att (bool, optional) – Defaults to False. Return the self attention weights
- return_last_only (bool, optional) – Return only the output of the last layer (as opposed to the output of all layers).
Returns: - The output expression (+ the
attention weights if
return_att
isTrue
)
Return type: tuple,
dynet.Expression
- x (
-
__init__
(pc, n_layers, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)¶ Initialize self. See help(type(self)) for accurate signature.
- pc (
-
class
dynn.layers.transformer_layers.
Transformer
(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)¶ Bases:
dynn.layers.base_layers.ParametrizedLayer
Transformer layer.
As described in Vaswani et al. (2017), this is the “encoder” side of the transformer, i.e. self attention only.
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to hold the parameters - input_dim (int) – Hidden dimension (used everywhere)
- n_heads (int) – Number of heads for self attention.
- activation (function, optional) – MLP activation (defaults to relu).
- dropout (float, optional) – Dropout rate (defaults to 0)
-
__call__
(x, lengths=None, left_aligned=True, triu=False, mask=None, return_att=False)¶ Run the transformer layer.
The input is expected to have dimensions
d x L
whereL
is the length dimension.Parameters: - x (
dynet.Expression
) – Input (dimensionsinput_dim x L
) - lengths (list, optional) – Defaults to None. List of lengths for masking (used for attention)
- left_aligned (bool, optional) – Defaults to True. Used for masking
- triu (bool, optional) – Upper triangular self attention. Mask such that each position can only attend to the previous positions.
- mask (
dynet.Expression
, optional) – Defaults to None. As an alternative tolength
, you can pass a mask expression directly (useful to reuse masks accross layers) - return_att (bool, optional) – Defaults to False. Return the self attention weights
Returns: - The output expression (+ the
attention weights if
return_att
isTrue
)
Return type: tuple,
dynet.Expression
- x (
-
__init__
(pc, input_dim, hidden_dim, n_heads, activation=<function relu>, dropout=0.0)¶ Creates a subcollection for this layer with a custom name
- pc (
Submodules¶
Activation functions¶
Common activation functions for neural networks.
Most of those are wrappers around standard dynet operations
(eg. rectify
-> relu
)
-
dynn.activations.
identity
(x)¶ The identity function
\(y=x\)
Parameters: x ( dynet.Expression
) – Input expressionReturns: \(x\) Return type: dynet.Expression
-
dynn.activations.
relu
(x)¶ The REctified Linear Unit
\(y=\max(0,x)\)
Parameters: x ( dynet.Expression
) – Input expressionReturns: \(\max(0,x)\) Return type: dynet.Expression
-
dynn.activations.
sigmoid
(x)¶ The sigmoid function
\(y=\frac{1}{1+e^{-x}}\)
Parameters: x ( dynet.Expression
) – Input expressionReturns: \(\frac{1}{1+e^{-x}}\) Return type: dynet.Expression
-
dynn.activations.
tanh
(x)¶ The hyperbolic tangent function
\(y=\tanh(x)\)
Parameters: x ( dynet.Expression
) – Input expressionReturns: \(\tanh(x)\) Return type: dynet.Expression
Command line utilities¶
-
dynn.command_line.
add_dynet_args
(parser, new_group=True)¶ Adds dynet command line arguments to an
argparse.ArgumentParser
You can apply this to your argument parser so that it doesn’t throw an error when you add command line arguments for dynet. For a description of the arguments available for dynet, see the official documentation
Parameters: - parser (
argparse.ArgumentParser
) – Your argument parser. - new_group (bool, optional) – Add the arguments in a specific argument group (default: True)
- parser (
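For illustration, a minimal sketch:
import argparse
from dynn.command_line import add_dynet_args

parser = argparse.ArgumentParser()
parser.add_argument("--n-epochs", type=int, default=10)
# dynet's own flags (eg. --dynet-mem, --dynet-seed) are now accepted
# instead of triggering an "unrecognized arguments" error
add_dynet_args(parser)
args = parser.parse_args()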
Input/output functions¶
These functions help writing to and reading from files
-
dynn.io.
load
(filename, ignore_invalid_names=False)¶ Load a ParameterCollection from a
.npz
file.This will recover the subcollection structure.
Parameters: - filename (str) – File to load from. - ignore_invalid_names (bool, optional) – Silently ignore parameters with invalid names (instead of raising an error) Returns: Loaded ParameterCollection
Return type: dynet.ParameterCollection
-
dynn.io.
loadtxt
(filename, encoding='utf-8')¶ Read text from a file
-
dynn.io.
populate
(pc, filename, ignore_shape_mismatch=False)¶ Populate a ParameterCollection from a
.npz
fileParameters: - pc (
dynet.ParameterCollection
) – Parameter collection to populate. - filename (str) – File to populate from.
- ignore_shape_mismatch (bool, optional) – Silently ignore shape mismatch
between the parameter and the value in the
.npz
file (just don’t load the parameter and move on)
- pc (
-
dynn.io.
save
(pc, filename, compressed=True)¶ Save a ParameterCollection as a .npz archive.
Each parameter is an entry in the archive and its name describes the subcollection it lives in.
Parameters: - pc (
dynet.ParameterCollection
) – Parameter collection to save. - filename (str) – Target filename. The
.npz
extension will be appended to the file name if it is not already there. - compressed (bool, optional) – Compressed
.npz
(slower but smaller on disk)
- pc (
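For illustration, a minimal save/load round trip sketch:
import dynet as dy
from dynn import io

pc = dy.ParameterCollection()
W = pc.add_parameters((10, 10))
# Writes model.npz (the extension is appended if missing)
io.save(pc, "model")
# Recover the collection (and its subcollection structure) later on
pc2 = io.load("model.npz")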
-
dynn.io.
savetxt
(filename, txt, encoding='utf-8')¶ Save text to a file
Operations¶
This extends the base dynet
library with useful operations.
-
dynn.operations.
nll_softmax
(logit, y)¶ This is the same as
dy.pickneglogsoftmax
.The main difference is the shorter name and transparent handling of batches. It computes:
\[-\texttt{logit}[y]+\log\left(\sum_{c'}e^{\texttt{logit}[c']}\right)\](softmax then negative log likelihood of
y
)Parameters: - logit (
dynet.Expression
) – Logit - y (int,list) – Either a class or a list of classes
(if
logit
is batched)
- logit (
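For illustration, a minimal sketch on a batch of two examples:
import dynet as dy
from dynn.operations import nll_softmax

dy.renew_cg()
# Batch of 2 examples with 3 classes each
logits = dy.random_uniform(3, -1, 1, batch_size=2)
labels = [0, 2]
# Per-example negative log likelihood, then average over the batch
nll = nll_softmax(logits, labels)
loss = dy.mean_batches(nll)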
-
dynn.operations.
seq_mask
(size, lengths, base_val=1, mask_val=0, left_aligned=True)¶ Returns a mask for a batch of sequences of different lengths.
This will return a (size,) shaped expression with batch dimension len(lengths) where the i th element of batch b is base_val iff i<=lengths[b] (and mask_val otherwise).
For example, if size is 4 and lengths is [1,2,4] then the returned mask will be:
1 0 0 0
1 1 0 0
1 1 1 1
(here each row is a batch element)
Parameters: - size (int) – Max size of the sequence (must be
>=max(lengths)
) - lengths (list) – List of lengths
- base_val (int, optional) – Value of the mask for non-masked indices (typically 1 for multiplicative masks and 0 for additive masks). Defaults to 1.
- mask_val (int, optional) – Value of the mask for masked indices (typically 0 for multiplicative masks and -inf for additive masks). Defaults to 0.
- left_aligned (bool, optional) – Defaults to True.
Returns: dynet.Expression
: Mask expression- size (int) – Max size of the sequence (must be
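For illustration, a minimal sketch of an additive attention mask:
import dynet as dy
from dynn.operations import seq_mask

dy.renew_cg()
# 0 on real positions, -inf on padding (additive mask)
mask = seq_mask(4, [1, 2, 4], base_val=0, mask_val=float("-inf"))
# mask has shape (4,) with batch size 3; add it to the attention scores
# before the softmax so that padded positions are ignored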
-
dynn.operations.
squeeze
(x, d=0)¶ Removes a dimension of size 1 at the given position.
Example:
# (1, 20)
x = dy.zeros((1, 20))
# (20,)
squeeze(x, 0)
# (20, 1)
x = dy.zeros((20, 1))
# (20,)
squeeze(x, 1)
# (20,)
squeeze(x, -1)
-
dynn.operations.
stack
(xs, d=0)¶ Like concatenate, but inserts a new dimension d along which the expressions are stacked. Use d=-1 to insert the dimension at the last position.Parameters: - xs (list) – List of expressions to stack - d (int, optional) – Position of the inserted dimension (defaults to 0)
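For illustration, a minimal sketch (the stated shapes assume numpy-style stacking of five 10-dimensional vectors):
import dynet as dy
from dynn.operations import stack

dy.renew_cg()
xs = [dy.zeros(10) for _ in range(5)]
# New leading dimension: shape (5, 10) (assuming numpy-style stacking)
y = stack(xs, d=0)
# d=-1 inserts the new dimension at the last position: shape (10, 5)
z = stack(xs, d=-1)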
-
dynn.operations.
unsqueeze
(x, d=0)¶ Insert a dimension of size 1 at the given position
Example:
# (10, 20)
x = dy.zeros((10, 20))
# (1, 10, 20)
unsqueeze(x, 0)
# (10, 20, 1)
unsqueeze(x, -1)
Parameter initialization¶
Some of those are just less verbose versions of
dynet’s PyInitializer
s
-
dynn.parameter_initialization.
NormalInit
(mean=0, std=1)¶ Gaussian initialization
Parameters: - mean (number, optional) – Mean of the distribution (defaults to 0) - std (number, optional) – Spread of the distribution (defaults to 1) Returns: dy.NormalInitializer(mean, sqrt(std))
Return type: dynet.PyInitializer
-
dynn.parameter_initialization.
OneInit
()¶ Initialize with \(1\)
Returns: dy.ConstInitializer(1)
Return type: dynet.PyInitializer
-
dynn.parameter_initialization.
UniformInit
(scale=1.0)¶ Uniform initialization between
-scale
andscale
Parameters: scale (float) – Scale of the distribution Returns: dy.UniformInitializer(scale)
Return type: dynet.PyInitializer
-
dynn.parameter_initialization.
ZeroInit
()¶ Initialize with \(0\)
Returns: dy.ConstInitializer(0) Return type: dynet.PyInitializer
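For illustration, a minimal sketch of passing these initializers to dynet parameters:
import dynet as dy
from dynn.parameter_initialization import UniformInit, ZeroInit

pc = dy.ParameterCollection()
# Weights uniform in [-0.1, 0.1], bias initialized to zero
W = pc.add_parameters((20, 10), init=UniformInit(0.1))
b = pc.add_parameters(20, init=ZeroInit())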
Training helper functions and classes¶
Adds new optimizers and LR schedules to dynet.
-
dynn.training.
inverse_sqrt_schedule
(warmup, lr0)¶ Inverse square root learning rate schedule
At step \(t\), the learning rate has value
\[\texttt{lr0}\times \min\left(\frac{1}{\sqrt{t}}, \sqrt{\frac{t}{\texttt{warmup}^{3}}}\right)\]Parameters: - warmup (int) – Number of warmup steps - lr0 (float) – Base learning rate
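For illustration, a plain-Python sketch of the value at step \(t\) (this only illustrates the formula above, it is not the dynn scheduler itself):
import math

def inverse_sqrt_lr(t, warmup, lr0):
    """Learning rate at step t (t >= 1) under the schedule above."""
    return lr0 * min(1.0 / math.sqrt(t), math.sqrt(t / warmup ** 3))

# The smaller of the warmup term and the 1/sqrt(t) term is used at every step
print(inverse_sqrt_lr(100, warmup=4000, lr0=1.0))
print(inverse_sqrt_lr(10 ** 6, warmup=4000, lr0=1.0))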
Utility functions¶
-
dynn.util.
conditional_dropout
(x, dropout_rate, flag)¶ This helper function applies dropout only if the flag is set to
True
and thedropout_rate
is positive.Parameters: - x (
dynet.Expression
) – Input expression - dropout_rate (float) – Dropout rate
- flag (bool) – Setting this to false ensures that dropout is never applied (for testing for example)
- x (
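For illustration, a minimal sketch:
import dynet as dy
from dynn.util import conditional_dropout

dy.renew_cg()
h = dy.random_uniform(10, -1, 1)
# Dropout is applied only when the flag is True and the rate is > 0
h_train = conditional_dropout(h, 0.5, flag=True)
h_test = conditional_dropout(h, 0.5, flag=False)  # dropout disabled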
-
dynn.util.
image_to_matrix
(M)¶ Transforms an ‘image’ with one channel (d1, d2, 1) into a matrix (d1, d2)
-
dynn.util.
list_to_matrix
(l)¶ Transforms a list of N vectors of dimension d into a (N, d) matrix
-
dynn.util.
mask_batches
(x, mask, value=0.0)¶ Apply a mask to the batch dimension
Parameters: - x (list,
dynet.Expression
) – The expression we want to mask. Either adynet.Expression
or a list thereof with the same batch dimension. - mask (np.array, list,
dynet.Expression
) – The mask. Either a list, 1d numpy array ordynet.Expression
. - value (float) – Mask value
- x (list,
-
dynn.util.
matrix_to_image
(M)¶ Transforms a matrix (d1, d2) into an ‘image’ with one channel (d1, d2, 1)
-
dynn.util.
num_params
(pc, params=True, lookup_params=True)¶ Number of parameters in a given ParameterCollection
-
dynn.util.
sin_embeddings
(length, dim, transposed=False)¶ Returns sinusoidal position encodings.
As described in Vaswani et al. (2017)
Specifically this returns a
length x dim
matrix \(PE\) such that \(PE[p, 2i]=\sin\left(\frac{p}{1000^{\frac{2i}{dim}}}\right)\) and \(PE[p, 2i+1]=\cos\left(\frac{p}{1000^{\frac{2i}{dim}}}\right)\)Parameters: - length (int) – Length of the encoding (number of positions) - dim (int) – Dimension of the encoding - transposed (bool, optional) – Defaults to False.