Welcome to MatchZoo’s documentation!

MatchZoo is a toolkit for text matching. It was developed to facilitate the design, comparison, and sharing of deep text matching models. It implements a number of deep matching methods, such as DRMM, MatchPyramid, MV-LSTM, aNMM, DUET, ARC-I, ARC-II, DSSM, and CDSSM, behind a unified interface. Potential tasks for MatchZoo include document retrieval, question answering, conversational response ranking, and paraphrase identification. We are always happy to receive code contributions, suggestions, and comments from MatchZoo users.
matchzoo
matchzoo package
Subpackages
matchzoo.auto package
Subpackages
-
matchzoo.auto.preparer.prepare.prepare(task, model_class, data_pack, preprocessor=None, embedding=None, config=None)
A simple shorthand for using matchzoo.Preparer.
config is used to control specific behaviors. If a config dictionary is passed, the default config is updated with it; e.g., to override the default bin_size, pass config={'bin_size': 15}.
Parameters:
- task (BaseTask) – Task.
- model_class (Type[BaseModel]) – Model class.
- data_pack (DataPack) – DataPack used to fit the preprocessor.
- preprocessor (Optional[BasePreprocessor]) – Preprocessor used to fit the data_pack. (default: the default preprocessor of model_class)
- embedding (Optional[Embedding]) – Embedding used to build an embedding matrix. If not set, a correctly shaped randomized matrix is built.
- config (Optional[dict]) – Configuration of specific behaviors. (default: return value of mz.Preparer.get_default_config())
Returns: A tuple of (model, preprocessor, data_generator_builder, embedding_matrix).
-
class matchzoo.auto.preparer.preparer.Preparer(task, config=None)
Bases: object
Unified setup processes of all MatchZoo models.
config is used to control specific behaviors. If a config dictionary is passed, the default config is updated with it; e.g., to override the default bin_size, pass config={'bin_size': 15}.
See tutorials/automation.ipynb for a detailed walkthrough on usage.
Default config:
{
    # pair generator builder kwargs
    'num_dup': 1,
    # histogram unit of DRMM
    'bin_size': 30,
    'hist_mode': 'LCH',
    # dynamic Pooling of MatchPyramid
    'compress_ratio_left': 1.0,
    'compress_ratio_right': 1.0,
    # if no matchzoo.Embedding is passed to tune
    'embedding_output_dim': 50
}
Parameters:
- task (BaseTask) – Task.
- config (Optional[dict]) – Configuration of specific behaviors.
Example
>>> import matchzoo as mz
>>> task = mz.tasks.Ranking(loss=mz.losses.RankCrossEntropyLoss())
>>> preparer = mz.auto.Preparer(task)
>>> model_class = mz.models.DenseBaseline
>>> train_raw = mz.datasets.toy.load_data('train', 'ranking')
>>> model, prpr, gen_builder, matrix = preparer.prepare(model_class,
...                                                     train_raw)
>>> model.params.completed()
True
-
classmethod get_default_config()
Default config getter.
Return type: dict
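Passing config={'bin_size': 15}, as described above, amounts to updating a copy of the default config with the user's keys. The following is a minimal sketch of that merge, assuming a plain dict update; merge_config is a hypothetical helper, not part of MatchZoo, and the default values are copied from the default config listed for Preparer.

```python
from typing import Optional

# Default values as listed in the Preparer documentation.
DEFAULT_CONFIG = {
    'num_dup': 1,                 # pair generator builder kwargs
    'bin_size': 30,               # histogram unit of DRMM
    'hist_mode': 'LCH',
    'compress_ratio_left': 1.0,   # dynamic pooling of MatchPyramid
    'compress_ratio_right': 1.0,
    'embedding_output_dim': 50,   # used if no matchzoo.Embedding is passed
}


def merge_config(config: Optional[dict] = None) -> dict:
    """Hypothetical helper: return the defaults updated with user overrides."""
    merged = dict(DEFAULT_CONFIG)  # copy so the defaults are never mutated
    if config:
        merged.update(config)
    return merged


print(merge_config({'bin_size': 15})['bin_size'])  # 15
print(merge_config()['bin_size'])                  # 30
```

Keys that are not overridden keep their default values, so merge_config({'bin_size': 15}) still has hist_mode set to 'LCH'.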
-
prepare(model_class, data_pack, preprocessor=None, embedding=None)
Prepare a model, a preprocessor, a data generator builder, and an embedding matrix for model_class.
Parameters:
- model_class (Type[BaseModel]) – Model class.
- data_pack (DataPack) – DataPack used to fit the preprocessor.
- preprocessor (Optional[BasePreprocessor]) – Preprocessor used to fit the data_pack. (default: the default preprocessor of model_class)
- embedding (Optional[Embedding]) – Embedding used to build an embedding matrix. If not set, a correctly shaped randomized matrix is built.
Return type: Tuple[BaseModel, BasePreprocessor, DataGeneratorBuilder, ndarray]
Returns: A tuple of (model, preprocessor, data_generator_builder, embedding_matrix).
-
class matchzoo.auto.tuner.callbacks.callback.Callback
Bases: object
Tuner callback base class.
To build your own callbacks, inherit mz.auto.tuner.callbacks.Callback and override the corresponding methods.
A run proceeds in the following way:
- run start (callback)
- build model
- build end (callback)
- fit and evaluate model
- collect result
- run end (callback)
This process is repeated num_runs times in a tuner.
-
on_build_end(tuner, model)
Callback on build end stage.
-
on_run_end(tuner, model, result)
Callback on run end stage.
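The run sequence listed for the tuner callback base class can be sketched as a loop that invokes every callback at each hook point, in list order. This is a framework-free sketch: build_model, fit_and_evaluate, run_tuner, and RecordingCallback are hypothetical stand-ins, not MatchZoo APIs; only the hook names and argument order follow the documented signatures (on_run_start(tuner, sample), on_build_end(tuner, model), on_run_end(tuner, model, result)).

```python
class Callback:
    """Base class with no-op hooks, mirroring the documented hook names."""

    def on_run_start(self, tuner, sample):
        pass

    def on_build_end(self, tuner, model):
        pass

    def on_run_end(self, tuner, model, result):
        pass


class RecordingCallback(Callback):
    """Hypothetical callback that records the order in which hooks fire."""

    def __init__(self):
        self.events = []

    def on_run_start(self, tuner, sample):
        self.events.append('run_start')

    def on_build_end(self, tuner, model):
        self.events.append('build_end')

    def on_run_end(self, tuner, model, result):
        self.events.append('run_end')


def run_tuner(samples, callbacks, build_model, fit_and_evaluate, tuner=None):
    """Sketch of the run loop: one iteration per run (num_runs = len(samples))."""
    results = []
    for sample in samples:
        for cb in callbacks:                          # run start
            cb.on_run_start(tuner, sample)
        model = build_model(sample)                   # build model
        for cb in callbacks:                          # build end
            cb.on_build_end(tuner, model)
        score = fit_and_evaluate(model)               # fit and evaluate model
        result = {'sample': sample, 'score': score}   # collect result
        for cb in callbacks:                          # run end
            cb.on_run_end(tuner, model, result)
        results.append(result)
    return results


rec = RecordingCallback()
out = run_tuner(
    samples=[{'lr': 0.1}, {'lr': 0.01}],
    callbacks=[rec],
    build_model=lambda sample: ('model', sample['lr']),
    fit_and_evaluate=lambda model: 1.0 - model[1],
)
print(rec.events)  # hooks fire in the same order for each of the two runs
```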
-
class matchzoo.auto.tuner.callbacks.lambda_callback.LambdaCallback(on_run_start=None, on_build_end=None, on_run_end=None)
Bases: matchzoo.auto.tuner.callbacks.callback.Callback
LambdaCallback. Just a shorthand for creating a callback class.
See matchzoo.auto.tuner.callbacks.Callback for more details.
Example
>>> import matchzoo as mz
>>> model = mz.models.Naive()
>>> model.guess_and_fill_missing_params(verbose=0)
>>> data = mz.datasets.toy.load_data()
>>> data = model.get_default_preprocessor().fit_transform(
...     data, verbose=0)
>>> def show_inputs(*args):
...     print(' '.join(map(str, map(type, args))))
>>> callback = mz.auto.tuner.callbacks.LambdaCallback(
...     on_run_start=show_inputs,
...     on_build_end=show_inputs,
...     on_run_end=show_inputs
... )
>>> _ = mz.auto.tune(
...     params=model.params,
...     train_data=data,
...     test_data=data,
...     num_runs=1,
...     callbacks=[callback],
...     verbose=0,
... ) # noqa: E501
<class 'matchzoo.auto.tuner.tuner.Tuner'> <class 'dict'>
<class 'matchzoo.auto.tuner.tuner.Tuner'> <class 'matchzoo.models.naive.Naive'>
<class 'matchzoo.auto.tuner.tuner.Tuner'> <class 'matchzoo.models.naive.Naive'> <class 'dict'>
-
on_build_end(tuner, model)
on_build_end.
-
on_run_end(tuner, model, result)
on_run_end.
-
on_run_start(tuner, sample)
on_run_start.
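The shorthand idea behind LambdaCallback is that each hook forwards its arguments to a user-supplied function when one was given and does nothing otherwise. The class below is a standalone, hypothetical sketch of that idea (LambdaCallbackSketch is not a MatchZoo name and does not subclass the real Callback); the hook signatures follow the ones documented above.

```python
from typing import Callable, Optional


class LambdaCallbackSketch:
    """Hypothetical stand-in for a lambda-style tuner callback.

    Each hook forwards its arguments to the user-supplied function, if any,
    and is a no-op otherwise.
    """

    def __init__(self,
                 on_run_start: Optional[Callable] = None,
                 on_build_end: Optional[Callable] = None,
                 on_run_end: Optional[Callable] = None):
        self._on_run_start = on_run_start
        self._on_build_end = on_build_end
        self._on_run_end = on_run_end

    def on_run_start(self, tuner, sample):
        if self._on_run_start:
            self._on_run_start(tuner, sample)

    def on_build_end(self, tuner, model):
        if self._on_build_end:
            self._on_build_end(tuner, model)

    def on_run_end(self, tuner, model, result):
        if self._on_run_end:
            self._on_run_end(tuner, model, result)


seen = []
cb = LambdaCallbackSketch(on_run_end=lambda *args: seen.append(len(args)))
cb.on_run_start('tuner', {})                     # no function supplied: no-op
cb.on_run_end('tuner', 'model', {'score': 0.5})  # forwards all three arguments
print(seen)  # [3]
```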
-
-
class matchzoo.auto.tuner.callbacks.load_embedding_matrix.LoadEmbeddingMatrix(embedding_matrix)
Bases: matchzoo.auto.tuner.callbacks.callback.Callback
Load a pre-trained embedding after the model is built.
Used with tuner to load a pre-trained embedding matrix for each newly built model instance.
Parameters: embedding_matrix – Embedding matrix to load.
Example
>>> import matchzoo as mz
>>> model = mz.models.ArcI()
>>> prpr = model.get_default_preprocessor()
>>> data = mz.datasets.toy.load_data()
>>> data = prpr.fit_transform(data, verbose=0)
>>> embed = mz.datasets.toy.load_embedding()
>>> term_index = prpr.context['vocab_unit'].state['term_index']
>>> matrix = embed.build_matrix(term_index)
>>> callback = mz.auto.tuner.callbacks.LoadEmbeddingMatrix(matrix)
>>> model.params.update(prpr.context)
>>> model.params['task'] = mz.tasks.Ranking()
>>> model.params['embedding_output_dim'] = embed.output_dim
>>> result = mz.auto.tune(
...     params=model.params,
...     train_data=data,
...     test_data=data,
...     num_runs=1,
...     callbacks=[callback],
...     verbose=0
... )
-
on_build_end(tuner, model)
on_build_end.
-
-
class matchzoo.auto.tuner.callbacks.save_model.SaveModel(dir_path=PosixPath('/home/docs/.matchzoo/tuned_models'))
Bases: matchzoo.auto.tuner.callbacks.callback.Callback
Save trained model.
For each trained model, a UUID is generated as the model_id, and the model is saved under dir_path/model_id. A model_id key is also inserted into the result, which makes it visible in the return value of the tune method.
Parameters: dir_path (Union[str, Path]) – Path to save the models to. (default: matchzoo.USER_TUNED_MODELS_DIR)
-
on_run_end(tuner, model, result)
Save model on run end.
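The SaveModel behavior described above (a UUID per trained model, storage under dir_path/model_id, and a model_id key added to the result) can be sketched with the standard library. This is an illustrative sketch only: save_on_run_end and write_text_model are hypothetical names, and write_text_model merely stands in for however the real callback persists a model.

```python
import tempfile
import uuid
from pathlib import Path


def save_on_run_end(dir_path, model, result, save_fn):
    """Hypothetical sketch of a save-on-run-end hook.

    Generates a UUID as the model id, saves the model under
    dir_path/model_id via save_fn, and records the id in the result dict.
    """
    model_id = str(uuid.uuid4())
    target = Path(dir_path) / model_id
    save_fn(model, target)
    result['model_id'] = model_id  # visible later in the tune return value
    return target


def write_text_model(model, target):
    # Stand-in persistence: write the model's repr into a file in target.
    target.mkdir(parents=True)
    (target / 'model.txt').write_text(repr(model))


with tempfile.TemporaryDirectory() as tmp:
    result = {'score': 0.7}
    path = save_on_run_end(tmp, {'layers': 2}, result, write_text_model)
    print(result['model_id'] == path.name)  # True
    print((path / 'model.txt').exists())    # True
```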
-
-
matchzoo.auto.tuner.tune.tune(params, train_data, test_data, fit_kwargs=None, evaluate_kwargs=None, metric=None, mode='maximize', num_runs=10, callbacks=None, verbose=1)
Tune model hyper-parameters.
A simple shorthand for using matchzoo.auto.Tuner.
model.params.hyper_space represents the model's hyper-parameter search space, which is the cross-product of the individual hyper-parameters' hyper-spaces. When a Tuner builds a model, for each hyper-parameter in model.params, a sample is taken from its hyper-space if it has one; otherwise, the hyper-parameter's default value is used.
See tutorials/model_tuning.ipynb for a detailed walkthrough on usage.
Parameters:
- params (ParamTable) – A completed parameter table to tune. Usually model.params of the desired model to tune. params.completed() should be True.
- train_data (Union[DataPack, DataGenerator]) – Training data to use. Either a preprocessed DataPack, or a DataGenerator.
- test_data (Union[DataPack, DataGenerator]) – Testing data to use. A preprocessed DataPack.
- fit_kwargs (Optional[dict]) – Extra keyword arguments to pass to fit. (default: dict(epochs=10, verbose=0))
- evaluate_kwargs (Optional[dict]) – Extra keyword arguments to pass to evaluate.
- metric (Union[str, BaseMetric, None]) – Metric to tune upon. Must be one of the metrics in model.params['task'].metrics. (default: the first metric in params['task'].metrics)
- mode (str) – Whether to maximize or minimize the metric, either 'maximize' or 'minimize'. (default: 'maximize')
- num_runs (int) – Number of runs. Each run takes a sample in params.hyper_space and builds a model based on the sample. (default: 10)
- callbacks (Optional[List[Callback]]) – A list of callbacks to handle. Handled sequentially at every callback point.
- verbose – Verbosity. (default: 1)
Example
>>> import matchzoo as mz
>>> train = mz.datasets.toy.load_data('train')
>>> dev = mz.datasets.toy.load_data('dev')
>>> prpr = mz.models.DenseBaseline.get_default_preprocessor()
>>> train = prpr.fit_transform(train, verbose=0)
>>> dev = prpr.transform(dev, verbose=0)
>>> model = mz.models.DenseBaseline()
>>> model.params['input_shapes'] = prpr.context['input_shapes']
>>> model.params['task'] = mz.tasks.Ranking()
>>> results = mz.auto.tune(
...     params=model.params,
...     train_data=train,
...     test_data=dev,
...     num_runs=1,
...     verbose=0
... )
>>> sorted(results['best'].keys())
['#', 'params', 'sample', 'score']
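The sampling rule described above (draw a value from a hyper-parameter's hyper-space when it has one, otherwise keep its default) can be sketched with plain dictionaries. This is a simplified sketch under stated assumptions: MatchZoo's real hyper-spaces are richer objects than lists, so here a hyper-space is assumed to be a list of candidate values, and the parameter table is a dict mapping each name to a (default, hyper_space_or_None) pair; sample_params is a hypothetical name.

```python
import random
from typing import Any, Dict, List, Optional, Tuple

# name -> (default value, hyper-space or None); a simplified stand-in for a
# parameter table, where a hyper-space is just a list of candidate values.
ParamSpec = Dict[str, Tuple[Any, Optional[List[Any]]]]


def sample_params(params: ParamSpec, rng: random.Random) -> Dict[str, Any]:
    """Draw one configuration: sample when a hyper-space exists, else default."""
    chosen = {}
    for name, (default, space) in params.items():
        chosen[name] = rng.choice(space) if space else default
    return chosen


params = {
    'num_layers': (2, [1, 2, 3]),   # has a hyper-space: sampled each run
    'dropout_rate': (0.5, None),    # no hyper-space: the default is used
}
rng = random.Random(0)
samples = [sample_params(params, rng) for _ in range(5)]  # e.g. num_runs=5
print(samples[0])
```

Each run draws independently, which is why a larger num_runs explores more of the cross-product.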
-
class matchzoo.auto.tuner.tuner.Tuner(params, train_data, test_data, fit_kwargs=None, evaluate_kwargs=None, metric=None, mode='maximize', num_runs=10, callbacks=None, verbose=1)
Bases: object
Model hyper-parameters tuner.
model.params.hyper_space represents the model's hyper-parameter search space, which is the cross-product of the individual hyper-parameters' hyper-spaces. When a Tuner builds a model, for each hyper-parameter in model.params, a sample is taken from its hyper-space if it has one; otherwise, the hyper-parameter's default value is used.
See tutorials/model_tuning.ipynb for a detailed walkthrough on usage.
Parameters:
- params – A completed parameter table to tune. Usually model.params of the desired model to tune. params.completed() should be True.
- train_data – Training data to use. Either a preprocessed DataPack, or a DataGenerator.
- test_data – Testing data to use. A preprocessed DataPack.
- fit_kwargs – Extra keyword arguments to pass to fit. (default: dict(epochs=10, verbose=0))
- evaluate_kwargs – Extra keyword arguments to pass to evaluate.
- metric – Metric to tune upon. Must be one of the metrics in model.params['task'].metrics. (default: the first metric in params['task'].metrics)
- mode – Whether to maximize or minimize the metric, either 'maximize' or 'minimize'. (default: 'maximize')
- num_runs – Number of runs. Each run takes a sample in params.hyper_space and builds a model based on the sample. (default: 10)
- callbacks – A list of callbacks to handle. Handled sequentially at every callback point.
- verbose – Verbosity. (default: 1)
Example:
>>> import matchzoo as mz
>>> train = mz.datasets.toy.load_data('train')
>>> dev = mz.datasets.toy.load_data('dev')
>>> prpr = mz.models.DenseBaseline.get_default_preprocessor()
>>> train = prpr.fit_transform(train, verbose=0)
>>> dev = prpr.transform(dev, verbose=0)
>>> model = mz.models.DenseBaseline()
>>> model.params['input_shapes'] = prpr.context['input_shapes']
>>> model.params['task'] = mz.tasks.Ranking()
>>> tuner = mz.auto.Tuner(
...     params=model.params,
...     train_data=train,
...     test_data=dev,
...     num_runs=1,
...     verbose=0
... )
>>> results = tuner.tune()
>>> sorted(results['best'].keys())
['#', 'params', 'sample', 'score']
-
callbacks
callbacks getter.
-
evaluate_kwargs
evaluate_kwargs getter.
-
fit_kwargs
fit_kwargs getter.
-
metric
metric getter.
-
mode
mode getter.
-
num_runs
num_runs getter.
-
params
params getter.
-
test_data
test_data getter.
-
train_data
train_data getter.
-
tune()
Start tuning.
Note that tune does not affect the tuner's inner state, so each new call to tune starts fresh. In other words, hyper-spaces are suggestive only within the same tune call.
-
verbose
verbose getter.
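Choosing the best run according to mode, as reflected in the results['best'] entry of the tuning examples above, can be sketched as a max or min over trial scores. The trial dictionaries mirror the documented keys ('#', 'params', 'sample', 'score'), but pick_best and the sample values are hypothetical; this is not MatchZoo's implementation.

```python
from operator import itemgetter
from typing import Dict, List


def pick_best(trials: List[Dict], mode: str = 'maximize') -> Dict:
    """Hypothetical helper: return the trial with the best 'score' for mode."""
    if mode == 'maximize':
        return max(trials, key=itemgetter('score'))
    if mode == 'minimize':
        return min(trials, key=itemgetter('score'))
    raise ValueError("mode must be 'maximize' or 'minimize'")


# Trial records shaped like the documented result entries (values made up).
trials = [
    {'#': 1, 'sample': {'lr': 0.1}, 'params': None, 'score': 0.61},
    {'#': 2, 'sample': {'lr': 0.01}, 'params': None, 'score': 0.72},
    {'#': 3, 'sample': {'lr': 0.001}, 'params': None, 'score': 0.55},
]
print(pick_best(trials)['#'])              # 2
print(pick_best(trials, 'minimize')['#'])  # 3
```

Because the trials list is built inside each call, nothing carries over between calls, which matches the note that each call to tune starts fresh.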
Module contents
matchzoo.data_generator package
Subpackages
-
class matchzoo.data_generator.callbacks.callback.Callback
Bases: object
DataGenerator callback base class.
To build your own callbacks, inherit mz.data_generator.callbacks.Callback and override the corresponding methods.
A batch is processed in the following way:
- slice data pack based on batch index
- handle on_batch_data_pack callbacks
- unpack data pack into x, y
- handle on_batch_unpacked callbacks
- return x, y
-
on_batch_data_pack(data_pack)
on_batch_data_pack.
Parameters: data_pack (DataPack) – a sliced DataPack before unpacking.
-
on_batch_unpacked(x, y)
on_batch_unpacked.
Parameters:
- x (dict) – unpacked x.
- y (ndarray) – unpacked y.
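The per-batch flow listed above (slice, on_batch_data_pack callbacks, unpack into x and y, on_batch_unpacked callbacks, return) can be sketched without MatchZoo. In this sketch a "data pack" is simply a list of row dicts and unpacking splits off the 'label' key into y; those representations, and the names get_batch, unpack_rows, and AddLength, are assumptions for illustration.

```python
from typing import Dict, List, Tuple


class AddLength:
    """Hypothetical callback: adds a 'length_left' feature after unpacking."""

    def on_batch_data_pack(self, rows: List[Dict]) -> None:
        pass  # nothing to do before unpacking

    def on_batch_unpacked(self, x: Dict[str, list], y: list) -> None:
        x['length_left'] = [len(text) for text in x['text_left']]


def unpack_rows(rows: List[Dict]) -> Tuple[Dict[str, list], list]:
    """Split rows into a column-wise feature dict x and a label list y."""
    x = {key: [row[key] for row in rows] for key in rows[0] if key != 'label'}
    y = [row['label'] for row in rows]
    return x, y


def get_batch(rows, batch_index, batch_size, callbacks):
    start = batch_index * batch_size
    batch = rows[start:start + batch_size]   # 1. slice by batch index
    for cb in callbacks:
        cb.on_batch_data_pack(batch)         # 2. data-pack callbacks
    x, y = unpack_rows(batch)                # 3. unpack into x, y
    for cb in callbacks:
        cb.on_batch_unpacked(x, y)           # 4. unpacked callbacks
    return x, y                              # 5. return x, y


rows = [{'text_left': t, 'label': i % 2} for i, t in enumerate(['ab', 'cde', 'f'])]
x, y = get_batch(rows, batch_index=0, batch_size=2, callbacks=[AddLength()])
print(x['length_left'], y)  # [2, 3] [0, 1]
```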
-
class matchzoo.data_generator.callbacks.dynamic_pooling.DynamicPooling(fixed_length_left, fixed_length_right, compress_ratio_left=1, compress_ratio_right=1)
Bases: matchzoo.data_generator.callbacks.callback.Callback
DPoolPairDataGenerator constructor.
Parameters:
- fixed_length_left (int) – max length of left text.
- fixed_length_right (int) – max length of right text.
- compress_ratio_left (float) – the length change ratio, especially after normal pooling layers.
- compress_ratio_right (float) – the length change ratio, especially after normal pooling layers.
-
on_batch_unpacked(x, y)
Insert dpool_index into x.
Parameters:
- x – unpacked x.
- y – unpacked y.
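Dynamic pooling (used by MatchPyramid) needs, for every cell of a fixed-size pooling grid, the position of the real token that cell should read from, so that texts of different lengths fill the same grid. The sketch below assumes a simple proportional mapping (grid position i reads token (i * length) // fixed_length) and ignores the compress ratios; it illustrates the idea and is not necessarily identical to the dpool_index that DynamicPooling inserts. Both function names are hypothetical.

```python
from typing import List, Tuple


def dpool_index_1d(length: int, fixed_length: int) -> List[int]:
    """Map each of fixed_length grid positions to a real token position.

    Position i reads token (i * length) // fixed_length, so a short text is
    stretched by repeating its tokens. A zero-length text maps everything to 0.
    """
    if length <= 0:
        return [0] * fixed_length
    return [(i * length) // fixed_length for i in range(fixed_length)]


def dpool_index_2d(length_left: int, length_right: int,
                   fixed_left: int, fixed_right: int) -> List[List[Tuple[int, int]]]:
    """Pair the left and right indices into a fixed_left x fixed_right grid."""
    left = dpool_index_1d(length_left, fixed_left)
    right = dpool_index_1d(length_right, fixed_right)
    return [[(i, j) for j in right] for i in left]


print(dpool_index_1d(3, 6))  # [0, 0, 1, 1, 2, 2]
grid = dpool_index_2d(2, 3, fixed_left=4, fixed_right=3)
print(grid[3][2])            # (1, 2)
```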
-
class matchzoo.data_generator.callbacks.histogram.Histogram(embedding_matrix, bin_size=30, hist_mode='CH')
Bases: matchzoo.data_generator.callbacks.callback.Callback
Generate data with matching histogram.
Parameters:
- embedding_matrix (ndarray) – The embedding matrix used to generate the matching histogram.
- bin_size (int) – The number of bins in the histogram.
- hist_mode (str) – The mode of the MatchingHistogramUnit, one of CH, NH, and LCH.
-
on_batch_unpacked(x, y)
Insert match_histogram into x.
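The matching histogram used by DRMM can be sketched as follows: for each left (query) term, compute its cosine similarity with every right (document) term, map each similarity from [-1, 1] into one of bin_size bins, and count. The sketch assumes that bin mapping and the usual reading of the three modes (CH: raw counts, NH: counts normalized per query term, LCH: log of counts); counts start at one so that LCH's logarithm is always defined, which is an assumption of this sketch. It is pure Python for illustration, not the library's code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def matching_histogram(left: List[Vector], right: List[Vector],
                       bin_size: int = 30, mode: str = 'CH') -> List[List[float]]:
    """One histogram row per left term; columns are similarity bins."""
    hist = [[1.0] * bin_size for _ in left]  # start at 1 so log() is defined
    for i, u in enumerate(left):
        for v in right:
            sim = cosine(u, v)               # in [-1, 1]
            bin_index = int((sim + 1.0) / 2.0 * (bin_size - 1))
            hist[i][bin_index] += 1.0
    if mode == 'NH':                         # normalize each row to sum to 1
        hist = [[c / sum(row) for c in row] for row in hist]
    elif mode == 'LCH':                      # log-count histogram
        hist = [[math.log(c) for c in row] for row in hist]
    elif mode != 'CH':
        raise ValueError("mode must be one of 'CH', 'NH', 'LCH'")
    return hist


left = [[1.0, 0.0]]                            # one query term
right = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]  # similarities: 1, -1, 0
print(matching_histogram(left, right, bin_size=3))  # [[2.0, 2.0, 2.0]]
```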
-
class matchzoo.data_generator.callbacks.lambda_callback.LambdaCallback(on_batch_data_pack=None, on_batch_unpacked=None)
Bases: matchzoo.data_generator.callbacks.callback.Callback
LambdaCallback. Just a shorthand for creating a callback class.
See matchzoo.data_generator.callbacks.Callback for more details.
Example
>>> import matchzoo as mz
>>> from matchzoo.data_generator.callbacks import LambdaCallback
>>> data = mz.datasets.toy.load_data()
>>> batch_func = lambda x: print(type(x))
>>> unpack_func = lambda x, y: print(type(x), type(y))
>>> callback = LambdaCallback(on_batch_data_pack=batch_func,
...                           on_batch_unpacked=unpack_func)
>>> data_gen = mz.DataGenerator(
...     data, batch_size=len(data), callbacks=[callback])
>>> _ = data_gen[0]
<class 'matchzoo.data_pack.data_pack.DataPack'>
<class 'dict'> <class 'numpy.ndarray'>
-
on_batch_data_pack(data_pack)
on_batch_data_pack.
-
on_batch_unpacked(x, y)
on_batch_unpacked.
-
Submodules
matchzoo.data_generator.data_generator module
Base generator.
-
class matchzoo.data_generator.data_generator.DataGenerator(data_pack, mode='point', num_dup=1, num_neg=1, resample=True, batch_size=128, shuffle=True, callbacks=None)
Bases: keras.utils.data_utils.Sequence
Data Generator.
Used to divide a matchzoo.DataPack into batches. This is helpful for generating batch-wise features and delaying data preprocessing to the fit time.
See tutorials/data_handling.ipynb for a walkthrough.
Parameters:
- data_pack (DataPack) – DataPack to generate data from.
- mode – One of "point", "pair", and "list". (default: "point")
- num_dup (int) – Number of duplications per instance, only effective when mode is "pair". (default: 1)
- num_neg (int) – Number of negative samples per instance, only effective when mode is "pair". (default: 1)
- resample (bool) – Whether to resample for each epoch, only effective when mode is "pair". (default: True)
- batch_size (int) – Batch size. (default: 128)
- shuffle (bool) – Whether to shuffle the samples/instances. (default: True)
- callbacks (Optional[List[Callback]]) – Callbacks. See matchzoo.data_generator.callbacks for more details.
Examples:
>>> import numpy as np
>>> import matchzoo as mz
>>> np.random.seed(0)
>>> data_pack = mz.datasets.toy.load_data()
>>> batch_size = 8
To generate data points:
>>> point_gen = mz.DataGenerator(
...     data_pack=data_pack,
...     batch_size=batch_size
... )
>>> len(point_gen)
13
>>> x, y = point_gen[0]
>>> for key, value in sorted(x.items()):
...     print(key, str(value)[:30])
id_left ['Q6' 'Q17' 'Q1' 'Q13' 'Q16' '
id_right ['D6-6' 'D17-1' 'D1-2' 'D13-3'
text_left ['how long is the term for fed
text_right ['See Article I and Article II
To generate data pairs:
>>> pair_gen = mz.DataGenerator(
...     data_pack=data_pack,
...     mode='pair',
...     num_dup=4,
...     num_neg=4,
...     batch_size=batch_size,
...     shuffle=False
... )
>>> len(pair_gen)
3
>>> x, y = pair_gen[0]
>>> for key, value in sorted(x.items()):
...     print(key, str(value)[:30])
id_left ['Q1' 'Q1' 'Q1' 'Q1' 'Q1' 'Q1'
id_right ['D1-3' 'D1-4' 'D1-0' 'D1-1' '
text_left ['how are glacier caves formed
text_right ['A glacier cave is a cave for
To generate data lists:
# TODO:
-
batch_indices
batch_indices getter.
-
batch_size
batch_size getter.
-
callbacks
callbacks getter.
-
mode
mode getter.
-
num_dup
num_dup getter.
-
num_neg
num_neg getter.
-
on_epoch_end()
Reorganize the index array at the end of each epoch.
-
reset_index()
Set the index_array. Here the index_array records the indices of all the instances.
-
shuffle
shuffle getter.
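In point mode, batching reduces to an index array over all instances that is optionally shuffled and then sliced into consecutive batches, so the number of batches is a ceiling division of the instance count by batch_size. The class below is a hypothetical sketch of that idea (PointBatcher and indices_for_batch are not MatchZoo names); reset_index and on_epoch_end mirror the documented method names, but the pair and list modes are not modeled.

```python
import math
import random
from typing import List, Optional


class PointBatcher:
    """Hypothetical sketch of point-mode batching, not MatchZoo's DataGenerator.

    Keeps an index array over all instances, optionally shuffles it, and
    slices it into consecutive batches; the last batch may be smaller.
    """

    def __init__(self, num_instances: int, batch_size: int = 128,
                 shuffle: bool = True, seed: Optional[int] = None):
        self.num_instances = num_instances
        self.batch_size = batch_size
        self.shuffle = shuffle
        self._rng = random.Random(seed)
        self.reset_index()

    def reset_index(self) -> None:
        """Rebuild the index array (shuffled if requested)."""
        self.index_array = list(range(self.num_instances))
        if self.shuffle:
            self._rng.shuffle(self.index_array)

    def on_epoch_end(self) -> None:
        self.reset_index()

    def __len__(self) -> int:
        return math.ceil(self.num_instances / self.batch_size)

    def indices_for_batch(self, i: int) -> List[int]:
        start = i * self.batch_size
        return self.index_array[start:start + self.batch_size]


batcher = PointBatcher(num_instances=100, batch_size=8, seed=0)
print(len(batcher))                       # 13
print(len(batcher.indices_for_batch(12)))  # 4, the leftover instances
```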
matchzoo.data_generator.data_generator_builder module
-
class matchzoo.data_generator.data_generator_builder.DataGeneratorBuilder(**kwargs)
Bases: object
Data Generator Builder. In essence a wrapped partial function.
Example
>>> import matchzoo as mz
>>> builder = mz.DataGeneratorBuilder(mode='pair', batch_size=32)
>>> data = mz.datasets.toy.load_data()
>>> gen = builder.build(data)
>>> type(gen)
<class 'matchzoo.data_generator.data_generator.DataGenerator'>
>>> gen.batch_size
32
>>> gen_64 = builder.build(data, batch_size=64)
>>> gen_64.batch_size
64
-
build(data_pack, **kwargs)
Build a DataGenerator.
Parameters:
- data_pack – DataPack to build upon.
- kwargs – Additional keyword arguments to override the keyword arguments passed in __init__.
Return type: DataGenerator
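Since the builder is described as "in essence a wrapped partial function", its behavior can be sketched with functools.partial semantics: keyword arguments given at construction act as defaults, and keyword arguments passed to build override them. GeneratorBuilder and make_generator are hypothetical; make_generator only stands in for the DataGenerator constructor.

```python
from functools import partial
from typing import Any, Callable, Dict


def make_generator(data_pack, mode='point', batch_size=128):
    """Stand-in for a generator constructor: just records its settings."""
    return {'data': data_pack, 'mode': mode, 'batch_size': batch_size}


class GeneratorBuilder:
    """Hypothetical builder that stores default keyword arguments."""

    def __init__(self, factory: Callable[..., Any], **kwargs: Any):
        self._factory = factory
        self._kwargs = kwargs

    def build(self, data_pack: Any, **kwargs: Any) -> Any:
        # Keyword arguments given here override those given to __init__.
        merged: Dict[str, Any] = {**self._kwargs, **kwargs}
        return self._factory(data_pack, **merged)


builder = GeneratorBuilder(make_generator, mode='pair', batch_size=32)
print(builder.build('toy')['batch_size'])                 # 32
print(builder.build('toy', batch_size=64)['batch_size'])  # 64

# functools.partial gives the same override behavior for keyword arguments.
build_partial = partial(make_generator, mode='pair', batch_size=32)
print(build_partial('toy', batch_size=64)['batch_size'])  # 64
```

Wrapping the partial in a class mainly gives the operation a descriptive name (build) and a single place to document the overridable keyword arguments.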
-
Module contents
matchzoo.data_pack package
Submodules
matchzoo.data_pack.data_pack module
Matchzoo DataPack, pair-wise tuple (feature) and context as input.
-
class matchzoo.data_pack.data_pack.DataPack(relation, left, right)
Bases: object
Matchzoo DataPack data structure, storing dataframes and context.
DataPack is a MatchZoo native data structure that most MatchZoo data handling processes build upon. A DataPack consists of three parts: left, right, and relation, each of which is a pandas.DataFrame.
Parameters:
- relation (DataFrame) – Stores the relation between left and right documents using their ids.
- left (DataFrame) – Stores the content or features for id_left.
- right (DataFrame) – Stores the content or features for id_right.
Example
>>> import pandas as pd
>>> from matchzoo.data_pack.data_pack import DataPack
>>> left = [
...     ['qid1', 'query 1'],
...     ['qid2', 'query 2']
... ]
>>> right = [
...     ['did1', 'document 1'],
...     ['did2', 'document 2']
... ]
>>> relation = [['qid1', 'did1', 1], ['qid2', 'did2', 1]]
>>> relation_df = pd.DataFrame(relation)
>>> left = pd.DataFrame(left)
>>> right = pd.DataFrame(right)
>>> dp = DataPack(
...     relation=relation_df,
...     left=left,
...     right=right,
... )
>>> len(dp)
2
-
DATA_FILENAME = 'data.dill'
-
class FrameView(data_pack)
Bases: object
FrameView.
-
append_text_length(verbose=1)
Append length_left and length_right columns.
Parameters:
- inplace – True to modify inplace, False to return a modified copy. (default: False)
- verbose – Verbosity.
Example
>>> import matchzoo as mz
>>> data_pack = mz.datasets.toy.load_data()
>>> 'length_left' in data_pack.frame[0].columns
False
>>> new_data_pack = data_pack.append_text_length(verbose=0)
>>> 'length_left' in new_data_pack.frame[0].columns
True
>>> 'length_left' in data_pack.frame[0].columns
False
>>> data_pack.append_text_length(inplace=True, verbose=0)
>>> 'length_left' in data_pack.frame[0].columns
True
-
apply_on_text(func, mode='both', rename=None, verbose=1)
Apply func to text columns based on mode.
Parameters:
- func (Callable) – The function to apply.
- mode (str) – One of "both", "left" and "right".
- rename (Optional[str]) – If set, use new names for results instead of replacing the original columns. To set rename in "both" mode, use a tuple of str, e.g. ("text_left_new_name", "text_right_new_name").
- inplace – True to modify inplace, False to return a modified copy. (default: False)
- verbose (int) – Verbosity.
Examples:
>>> import matchzoo as mz
>>> data_pack = mz.datasets.toy.load_data()
>>> frame = data_pack.frame
To apply len on the left text and add the result as 'length_left':
>>> data_pack.apply_on_text(len, mode='left',
...                         rename='length_left',
...                         inplace=True,
...                         verbose=0)
>>> list(frame[0].columns) # noqa: E501
['id_left', 'text_left', 'length_left', 'id_right', 'text_right', 'label']
To do the same to the right text:
>>> data_pack.apply_on_text(len, mode='right',
...                         rename='length_right',
...                         inplace=True,
...                         verbose=0)
>>> list(frame[0].columns) # noqa: E501
['id_left', 'text_left', 'length_left', 'id_right', 'text_right', 'length_right', 'label']
To do the same to both texts at the same time:
>>> data_pack.apply_on_text(len, mode='both',
...                         rename=('extra_left', 'extra_right'),
...                         inplace=True,
...                         verbose=0)
>>> list(frame[0].columns) # noqa: E501
['id_left', 'text_left', 'length_left', 'extra_left', 'id_right', 'text_right', 'length_right', 'extra_right', 'label']
To suppress outputs:
>>> data_pack.apply_on_text(len, mode='both', verbose=0,
...                         inplace=True)
-
drop_label()
Remove label column from the data pack.
Parameters: inplace – True to modify inplace, False to return a modified copy. (default: False)
Example
>>> import matchzoo as mz
>>> data_pack = mz.datasets.toy.load_data()
>>> data_pack.has_label
True
>>> data_pack.drop_label(inplace=True)
>>> data_pack.has_label
False
-
frame
View the data pack as a pandas.DataFrame.
The returned data frame is created by merging the left data frame, the right data frame, and the relation data frame. Use [] to access an item or a slice of items.
Return type: FrameView
Returns: A matchzoo.DataPack.FrameView instance.
Example
>>> import matchzoo as mz
>>> data_pack = mz.datasets.toy.load_data()
>>> type(data_pack.frame)
<class 'matchzoo.data_pack.data_pack.DataPack.FrameView'>
>>> frame_slice = data_pack.frame[0:5]
>>> type(frame_slice)
<class 'pandas.core.frame.DataFrame'>
>>> list(frame_slice.columns)
['id_left', 'text_left', 'id_right', 'text_right', 'label']
>>> full_frame = data_pack.frame()
>>> len(full_frame) == len(data_pack)
True
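The merge that frame performs, joining each relation row with the left and right content its ids point to, can be sketched with plain Python lookups. In this sketch the relation is a list of (id_left, id_right, label) tuples and left and right are dicts from id to text; those representations and the name merge_frame are assumptions, whereas the real DataPack stores pandas DataFrames. The output column order follows the frame example, and the data reuses the DataPack class example.

```python
from typing import Dict, List, Tuple


def merge_frame(relation: List[Tuple[str, str, int]],
                left: Dict[str, str],
                right: Dict[str, str]) -> List[Dict[str, object]]:
    """Join each relation row with the texts its ids refer to."""
    rows = []
    for id_left, id_right, label in relation:
        rows.append({
            'id_left': id_left,
            'text_left': left[id_left],
            'id_right': id_right,
            'text_right': right[id_right],
            'label': label,
        })
    return rows


left = {'qid1': 'query 1', 'qid2': 'query 2'}
right = {'did1': 'document 1', 'did2': 'document 2'}
relation = [('qid1', 'did1', 1), ('qid2', 'did2', 1)]
frame = merge_frame(relation, left, right)
print(len(frame), list(frame[0]))
```

Keeping texts in left and right and only ids in relation means a query paired with many documents is stored once, which is one motivation for the three-part layout.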
-
has_label
True if the label column exists, False otherwise.
Return type: bool
-
one_hot_encode_label(num_classes=2)
One-hot encode the label column of relation.
Parameters:
- num_classes – Number of classes.
- inplace – True to modify inplace, False to return a modified copy. (default: False)
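The per-label transformation behind one-hot encoding can be sketched as follows, assuming integer labels in range(num_classes). The real method operates on the relation's label column; one_hot is a hypothetical helper that shows what each encoded label looks like.

```python
from typing import List


def one_hot(label: int, num_classes: int = 2) -> List[int]:
    """Return a one-hot vector of length num_classes for an integer label."""
    if not 0 <= label < num_classes:
        raise ValueError(f'label {label} out of range for {num_classes} classes')
    vector = [0] * num_classes
    vector[label] = 1
    return vector


labels = [0, 1, 1]
print([one_hot(label) for label in labels])  # [[1, 0], [0, 1], [0, 1]]
```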
-
relation
relation getter.
-
save(dirpath)
Save the DataPack object.
A saved DataPack is represented as a directory containing a DataPack object (transformed user input as features and context), which is saved by pickle.
Parameters: dirpath (Union[str, Path]) – directory path of the saved DataPack.
-
shuffle()
Shuffle the data pack by shuffling the rows of relation.
Parameters: inplace – True to modify inplace, False to return a modified copy. (default: False)
Example
>>> import matchzoo as mz
>>> import numpy.random
>>> numpy.random.seed(0)
>>> data_pack = mz.datasets.toy.load_data()
>>> orig_ids = data_pack.relation['id_left']
>>> shuffled = data_pack.shuffle()
>>> (shuffled.relation['id_left'] != orig_ids).any()
True
-
unpack()
Unpack the data for training.
The return value can be fed directly to model.fit or model.fit_generator.
Return type: Tuple[Dict[str, <built-in function array>], Optional[<built-in function array>]]
Returns: A tuple of (X, y). y is None if self has no label.
Example
>>> import matchzoo as mz
>>> data_pack = mz.datasets.toy.load_data()
>>> X, y = data_pack.unpack()
>>> type(X)
<class 'dict'>
>>> sorted(X.keys())
['id_left', 'id_right', 'text_left', 'text_right']
>>> type(y)
<class 'numpy.ndarray'>
>>> X, y = data_pack.drop_label().unpack()
>>> type(y)
<class 'NoneType'>
matchzoo.data_pack.pack module
Convert a list of inputs into the format expected by DataPack.
-
matchzoo.data_pack.pack.pack(df)
Pack a DataPack using df.
The df must have text_left and text_right columns. Optionally, the df can have id_left and id_right columns to index text_left and text_right respectively; id_left and id_right are generated automatically if not specified.
Parameters: df (DataFrame) – Input pandas.DataFrame to use.
Examples:
>>> import matchzoo as mz
>>> import pandas as pd
>>> df = pd.DataFrame(data={'text_left': list('AABC'),
...                         'text_right': list('abbc'),
...                         'label': [0, 1, 1, 0]})
>>> mz.pack(df).frame()
   id_left  text_left  id_right  text_right  label
0      L-0          A       R-0           a      0
1      L-0          A       R-1           b      1
2      L-1          B       R-1           b      1
3      L-2          C       R-2           c      0
Return type: DataPack
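The ids in the pack example are consistent with assigning one id per distinct text, in order of first appearance, prefixed with L- for left texts and R- for right texts. The helper below sketches that rule as inferred from the example output rather than from the library source; assign_ids is a hypothetical name.

```python
from typing import Dict, List


def assign_ids(texts: List[str], prefix: str) -> List[str]:
    """Give each distinct text an id '<prefix>-<n>' in order of first appearance."""
    ids: Dict[str, str] = {}
    for text in texts:
        if text not in ids:
            ids[text] = f'{prefix}-{len(ids)}'
    return [ids[text] for text in texts]


print(assign_ids(list('AABC'), 'L'))  # ['L-0', 'L-0', 'L-1', 'L-2']
print(assign_ids(list('abbc'), 'R'))  # ['R-0', 'R-1', 'R-1', 'R-2']
```

Deduplicating by text means a repeated query such as 'A' is stored once in left and referenced twice from relation.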
Module contents
matchzoo.datasets package
Subpackages
Embedding data loader.
Quora Question Pairs data loader.
-
matchzoo.datasets.quora_qp.load_data.load_data(stage='train', task='classification', return_classes=False)
Load QuoraQP data.
Parameters:
- path – None to download from Quora, or a specific path for already downloaded data.
- stage (str) – One of train, dev, and test.
- task (str) – Could be one of ranking, classification, or a matchzoo.engine.BaseTask instance.
- return_classes (bool) – Whether to return classes for classification task.
Return type: Union[DataPack, tuple]
Returns: A DataPack if ranking, a tuple of (DataPack, classes) if classification.
SNLI data loader.
-
matchzoo.datasets.snli.load_data.load_data(stage='train', task='classification', target_label='entailment', return_classes=False)
Load SNLI data.
Parameters:
- stage (str) – One of train, dev, and test. (default: train)
- task (str) – Could be one of ranking, classification, or a matchzoo.engine.BaseTask instance. (default: classification)
- target_label (str) – If ranking, choose one of entailment, contradiction, neutral, and - as the positive label. (default: entailment)
- return_classes (bool) – True to return classes for classification task, False otherwise.
Return type: Union[DataPack, tuple]
Returns: A DataPack unless task is classification and return_classes is True: a tuple of (DataPack, classes) in that case.
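For the ranking task, target_label chooses which SNLI label counts as the positive one. A hedged sketch of that binarization, assuming the positive label maps to 1.0 and every other label to 0.0; binarize_labels is a hypothetical helper, and the label set is the one listed for target_label.

```python
from typing import List

SNLI_LABELS = ('entailment', 'contradiction', 'neutral', '-')


def binarize_labels(labels: List[str], target_label: str = 'entailment') -> List[float]:
    """Map target_label to 1.0 and every other label to 0.0."""
    if target_label not in SNLI_LABELS:
        raise ValueError(f'target_label must be one of {SNLI_LABELS}')
    return [1.0 if label == target_label else 0.0 for label in labels]


print(binarize_labels(['entailment', 'neutral', 'contradiction']))  # [1.0, 0.0, 0.0]
```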
-
matchzoo.datasets.toy.load_data(stage='train', task='ranking', return_classes=False)
Load WikiQA data.
Parameters:
- stage (str) – One of train, dev, and test.
- task (str) – Could be one of ranking, classification, or a matchzoo.engine.BaseTask instance.
- return_classes (bool) – True to return classes for classification task, False otherwise.
Returns: A DataPack unless task is classification and return_classes is True: a tuple of (DataPack, classes) in that case.
Example
>>> import matchzoo as mz
>>> stages = 'train', 'dev', 'test'
>>> tasks = 'ranking', 'classification'
>>> for stage in stages:
...     for task in tasks:
...         _ = mz.datasets.toy.load_data(stage, task)
-
matchzoo.datasets.toy.load_embedding()
WikiQA data loader.
-
matchzoo.datasets.wiki_qa.load_data.load_data(stage='train', task='ranking', filtered=False, return_classes=False)
Load WikiQA data.
Parameters:
- stage (str) – One of train, dev, and test.
- task (str) – Could be one of ranking, classification, or a matchzoo.engine.BaseTask instance.
- filtered (bool) – Whether to remove the questions without correct answers.
- return_classes (bool) – True to return classes for classification task, False otherwise.
Return type: Union[DataPack, tuple]
Returns: A DataPack unless task is classification and return_classes is True: a tuple of (DataPack, classes) in that case.
matchzoo.embedding package
Submodules
matchzoo.embedding.embedding module
Matchzoo toolkit for token embedding.
-
class matchzoo.embedding.embedding.Embedding(data)
Bases: object
Embedding class.
Examples:
>>> import pandas as pd
>>> import matchzoo as mz
>>> train_raw = mz.datasets.toy.load_data()
>>> pp = mz.preprocessors.NaivePreprocessor()
>>> train = pp.fit_transform(train_raw, verbose=0)
>>> vocab_unit = mz.build_vocab_unit(train, verbose=0)
>>> term_index = vocab_unit.state['term_index']
>>> embed_path = mz.datasets.embeddings.EMBED_RANK
To load from a file:
>>> embedding = mz.embedding.load_from_file(embed_path)
>>> matrix = embedding.build_matrix(term_index)
>>> matrix.shape[0] == len(term_index)
True
To build your own:
>>> data = pd.DataFrame(data=[[0, 1], [2, 3]], index=['A', 'B'])
>>> embedding = mz.Embedding(data)
>>> matrix = embedding.build_matrix({'A': 2, 'B': 1, '_PAD': 0})
>>> matrix.shape == (3, 2)
True
-
build_matrix(term_index, initializer=<function Embedding.<lambda>>)
Build a matrix using term_index.
Parameters:
- term_index (dict) – A dict or TermIndex to build with.
- initializer – A callable that returns a default value for missing terms in data. (default: a random uniform distribution in the range (-0.2, 0.2))
Return type: ndarray
Returns: A matrix.
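What build_matrix is documented to do can be sketched in pure Python: one row per term_index entry, copied from the embedding data when the term is known and filled by the initializer otherwise. The sketch assumes the embedding data is a dict from term to vector and that the indices run contiguously from 0 to len(term_index) - 1; build_matrix_sketch is a hypothetical name. The data reproduces the "To build your own" example, which yields a 3 x 2 matrix.

```python
import random
from typing import Callable, Dict, List, Sequence


def _uniform() -> float:
    # Default initializer: uniform in [-0.2, 0.2], echoing the documented range.
    return random.uniform(-0.2, 0.2)


def build_matrix_sketch(data: Dict[str, Sequence[float]],
                        term_index: Dict[str, int],
                        initializer: Callable[[], float] = _uniform) -> List[List[float]]:
    """One row per term_index entry; known terms copy their vector from data."""
    output_dim = len(next(iter(data.values())))
    # Assumes indices run contiguously from 0 to len(term_index) - 1.
    matrix = [[initializer() for _ in range(output_dim)]
              for _ in range(len(term_index))]
    for term, index in term_index.items():
        if term in data:
            matrix[index] = [float(v) for v in data[term]]
    return matrix


data = {'A': [0, 1], 'B': [2, 3]}
matrix = build_matrix_sketch(data, {'A': 2, 'B': 1, '_PAD': 0})
print(len(matrix), len(matrix[0]))  # 3 2
print(matrix[1], matrix[2])         # [2.0, 3.0] [0.0, 1.0]
```

Row 0 (_PAD) has no vector in data, so it keeps its randomly initialized values.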
-
input_dim
Embedding input dimension.
Return type: int
-
output_dim
Embedding output dimension.
Return type: int
-
matchzoo.embedding.embedding.load_from_file(file_path, mode='word2vec')
Load embedding from file_path.
Parameters:
- file_path (str) – Path to file.
- mode (str) – Embedding file format mode, one of 'word2vec' or 'glove'. (default: 'word2vec')
Return type: Embedding
Returns: A matchzoo.embedding.Embedding instance.
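The practical difference between the two modes is usually the header: word2vec text files conventionally start with a "<vocabulary size> <dimension>" line, while GloVe files go straight to "term v1 v2 ..." lines. The parser below is a sketch under that assumption; parse_embedding_lines is hypothetical, reads from any iterable of lines rather than a path, and returns a plain dict rather than a matchzoo.embedding.Embedding.

```python
import io
from typing import Dict, Iterable, List


def parse_embedding_lines(lines: Iterable[str],
                          mode: str = 'word2vec') -> Dict[str, List[float]]:
    """Parse space-separated 'term v1 v2 ...' lines into a term -> vector dict.

    In 'word2vec' mode the first line is assumed to be a '<count> <dim>'
    header and is skipped; 'glove' input is assumed to have no header.
    """
    if mode not in ('word2vec', 'glove'):
        raise ValueError("mode must be 'word2vec' or 'glove'")
    iterator = iter(lines)
    if mode == 'word2vec':
        next(iterator, None)  # drop the header line
    vectors: Dict[str, List[float]] = {}
    for line in iterator:
        parts = line.rstrip().split(' ')
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


a = parse_embedding_lines(io.StringIO('2 3\nA 0.1 0.2 0.3\nB 0.4 0.5 0.6\n'))
b = parse_embedding_lines(io.StringIO('A 0.1 0.2 0.3\nB 0.4 0.5 0.6\n'),
                          mode='glove')
print(a == b)  # True
```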
Module contents
matchzoo.engine package
Submodules
matchzoo.engine.base_metric module
Metric base class and some related utilities.
-
class matchzoo.engine.base_metric.BaseMetric
Bases: abc.ABC
Metric base class.
-
ALIAS = 'base_metric'
-
-
matchzoo.engine.base_metric.sort_and_couple(labels, scores)
Zip the labels with scores into a single list.
Return type: <built-in function array>
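The function name suggests that the coupled (label, score) pairs are also sorted. Ranking metrics conventionally order pairs by score in descending order so that the highest-scored documents come first, and the sketch below assumes exactly that. sort_and_couple_sketch and precision_at_k are hypothetical helpers that return plain lists, whereas the documented function returns an array.

```python
from typing import List, Sequence, Tuple


def sort_and_couple_sketch(labels: Sequence[float],
                           scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair each label with its score, highest score first."""
    couple = list(zip(labels, scores))
    return sorted(couple, key=lambda pair: pair[1], reverse=True)


def precision_at_k(labels, scores, k: int) -> float:
    """Example consumer: fraction of relevant (label > 0) items in the top k."""
    ranked = sort_and_couple_sketch(labels, scores)[:k]
    return sum(1 for label, _ in ranked if label > 0) / k


labels = [0, 1, 0, 1]
scores = [0.9, 0.8, 0.1, 0.4]
print(sort_and_couple_sketch(labels, scores))  # [(0, 0.9), (1, 0.8), (1, 0.4), (0, 0.1)]
print(precision_at_k(labels, scores, k=2))     # 0.5
```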
matchzoo.engine.base_model module
Base Model.
-
class matchzoo.engine.base_model.BaseModel(params=None, backend=None)
Bases: abc.ABC
Abstract base class of all MatchZoo models.
MatchZoo models are wrappers around keras models, and the actual keras model built can be accessed by model.backend. params is a set of model hyper-parameters that deterministically builds a model. In other words, params['model_class'](params=params) with the same params always creates models with the same structure.
Parameters:
- params (Optional[ParamTable]) – Model hyper-parameters. (default: return value from get_default_params())
- backend (Optional[Model]) – A keras model as the model backend. Usually not passed as an argument.
Example
>>> from matchzoo.engine.base_model import BaseModel
>>> BaseModel() # doctest: +ELLIPSIS
Traceback (most recent call last):
...
TypeError: Can't instantiate abstract class BaseModel ...
>>> class MyModel(BaseModel):
...     def build(self):
...         pass
>>> isinstance(MyModel(), BaseModel)
True
-
BACKEND_WEIGHTS_FILENAME
= 'backend_weights.h5'¶
-
PARAMS_FILENAME
= 'params.dill'¶
-
backend
¶ return model backend, a keras model instance.
Return type: Model
-
build
()¶ Build the model; each subclass needs to implement this method.
-
compile
()¶ Compile model for training.
Only keras native metrics are compiled together with backend. MatchZoo metrics are evaluated only through
evaluate()
. Note that Keras counts the loss as one of the metrics, while MatchZoo's matchzoo.engine.BaseTask does not.
Examples
>>> from matchzoo import models >>> model = models.Naive() >>> model.guess_and_fill_missing_params(verbose=0) >>> model.params['task'].metrics = ['mse', 'map'] >>> model.params['task'].metrics ['mse', mean_average_precision(0.0)] >>> model.build() >>> model.compile()
-
evaluate
(x, y, batch_size=128)¶ Evaluate the model.
Parameters: - x (
Dict
[str
,ndarray
]) – Input data. - y (
ndarray
) – Labels. - batch_size (
int
) – Number of samples per batch when predicting for evaluation. (default: 128)
- Examples::
>>> import matchzoo as mz >>> data_pack = mz.datasets.toy.load_data() >>> preprocessor = mz.preprocessors.NaivePreprocessor() >>> data_pack = preprocessor.fit_transform(data_pack, verbose=0) >>> m = mz.models.DenseBaseline() >>> m.params['task'] = mz.tasks.Ranking() >>> m.params['task'].metrics = [ ... 'acc', 'mse', 'mae', 'ce', ... 'average_precision', 'precision', 'dcg', 'ndcg', ... 'mean_reciprocal_rank', 'mean_average_precision', 'mrr', ... 'map', 'MAP', ... mz.metrics.AveragePrecision(threshold=1), ... mz.metrics.Precision(k=2, threshold=2), ... mz.metrics.DiscountedCumulativeGain(k=2), ... mz.metrics.NormalizedDiscountedCumulativeGain( ... k=3, threshold=-1), ... mz.metrics.MeanReciprocalRank(threshold=2), ... mz.metrics.MeanAveragePrecision(threshold=3) ... ] >>> m.guess_and_fill_missing_params(verbose=0) >>> m.build() >>> m.compile() >>> x, y = data_pack.unpack() >>> evals = m.evaluate(x, y) >>> type(evals) <class 'dict'>
Return type: Dict
[BaseMetric
,float
]
-
evaluate_generator
(generator, batch_size=128)¶ Evaluate the model.
Parameters: - generator (
DataGenerator
) – DataGenerator to evaluate. - batch_size (
int
) – Batch size. (default: 128)
Return type: Dict
[BaseMetric
,float
]
-
fit
(x, y, batch_size=128, epochs=1, verbose=1, **kwargs)¶ Fit the model.
See
keras.models.Model.fit()
for more details. Parameters: - x (
Union
[ndarray
,List
[ndarray
],dict
]) – input data. - y (
ndarray
) – labels. - batch_size (
int
) – number of samples per gradient update. - epochs (
int
) – number of epochs to train the model. - verbose (
int
) – 0, 1, or 2. Verbosity mode. 0 = silent, 1 = verbose, 2 = one log line per epoch.
Keyword arguments not listed above will be propagated to Keras’s fit.
Return type: History
Returns: A keras.callbacks.History instance. Its history attribute contains all information collected during training.
-
fit_generator
(generator, epochs=1, verbose=1, **kwargs)¶ Fit the model with matchzoo generator.
See
keras.models.Model.fit_generator()
for more details. Parameters: - generator (
DataGenerator
) – A generator, an instance of engine.DataGenerator
. - epochs (
int
) – Number of epochs to train the model. - verbose (
int
) – 0, 1, or 2. Verbosity mode. 0 = silent, 1 = verbose, 2 = one log line per epoch.
Return type: History
Returns: A keras.callbacks.History instance. Its history attribute contains all information collected during training.
-
classmethod
get_default_params
(with_embedding=False, with_multi_layer_perceptron=False)¶ Model default parameters.
The common usage is to instantiate a matchzoo.engine.ParamTable first, then set the model-specific parameters.
Examples
>>> class MyModel(BaseModel): ... def build(self): ... print(self._params['num_eggs'], 'eggs') ... print('and', self._params['ham_type']) ... ... @classmethod ... def get_default_params(cls): ... params = ParamTable() ... params.add(Param('num_eggs', 512)) ... params.add(Param('ham_type', 'Parma Ham')) ... return params >>> my_model = MyModel() >>> my_model.build() 512 eggs and Parma Ham
Notice that all parameters must be serialisable for the entire model to be serialisable. Therefore, it’s strongly recommended to use python native data types to store parameters.
Return type: ParamTable
Returns: model parameters
-
classmethod
get_default_preprocessor
()¶ Model default preprocessor.
The preprocessor’s transform should produce a correctly shaped data pack that can be used for training. Some extra configuration (e.g. setting input_shapes in
matchzoo.models.DSSM
) may be required on the user’s end. Return type: BasePreprocessor
Returns: Default preprocessor.
-
get_embedding_layer
(name='embedding')¶ Get the embedding layer.
All MatchZoo models with a single embedding layer set the embedding layer name to embedding, and this method should return that layer.
Parameters: name ( str
) – Name of the embedding layer. (default: embedding) Return type: Layer
-
guess_and_fill_missing_params
(verbose=1)¶ Guess and fill missing parameters in
params
. Use this method to automatically fill in other hyper-parameters. This involves some guessing, so the parameters it fills could be wrong. For example, the default task is Ranking; if we do not manually set it to Classification for data packs prepared for classification, the shapes of the model output and the data will mismatch.
Parameters: verbose – Verbosity.
-
load_embedding_matrix
(embedding_matrix, name='embedding')¶ Load an embedding matrix.
Load an embedding matrix into the model’s embedding layer. The name of the embedding layer is specified by name. For models with only one embedding layer, set name=’embedding’ when creating the Keras layer, and use the default name when loading the matrix. For models with more than one embedding layer, initialize the Keras layers with different names, and set name accordingly to load a matrix into the chosen layer.
Parameters: - embedding_matrix (
ndarray
) – Embedding matrix to be loaded. - name (
str
) – Name of the layer. (default: ‘embedding’)
-
params
¶ model parameters.
Return type: ParamTable
-
predict
(x, batch_size=128)¶ Generate output predictions for the input samples.
See
keras.models.Model.predict()
for more details.Parameters: - x (
Dict
[str
,ndarray
]) – input data - batch_size – number of samples per batch
Return type: ndarray
Returns: numpy array(s) of predictions
-
save
(dirpath)¶ Save the model.
A saved model is represented as a directory with two files: a model parameters file (params.dill) saved by pickle, and a model weights file (backend_weights.h5) saved by Keras.
Parameters: dirpath ( Union
[str
,Path
]) – directory path of the saved model
Example
>>> import matchzoo as mz >>> model = mz.models.Naive() >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build() >>> model.save('temp-model') >>> import shutil >>> shutil.rmtree('temp-model')
-
matchzoo.engine.base_model.
load_model
(dirpath)¶ Load a model. The reverse function of
BaseModel.save()
. Parameters: dirpath ( Union
[str
,Path
]) – directory path of the saved model. Return type: BaseModel
Returns: a BaseModel
instance.
Example
>>> import matchzoo as mz >>> model = mz.models.Naive() >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build() >>> model.save('my-model') >>> model.params.keys() == mz.load_model('my-model').params.keys() True >>> import shutil >>> shutil.rmtree('my-model')
matchzoo.engine.base_preprocessor module¶
BasePreprocessor
defines input and output for processors.
-
class
matchzoo.engine.base_preprocessor.
BasePreprocessor
¶ Bases:
object
BasePreprocessor
to handle input data. A preprocessor should be used in two steps: first fit, then transform. fit collects information into context, which includes everything the preprocessor needs to transform, together with other useful information for later use. fit only changes the preprocessor’s inner state, not the input data. In contrast, transform returns a modified copy of the input data without changing the preprocessor’s inner state.
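The fit/transform contract can be illustrated without MatchZoo. The class below is a hypothetical toy preprocessor (it works on plain strings, not DataPack objects): fit only fills self.context, and transform returns new data while leaving the context untouched.

```python
from typing import Dict, List


class VocabPreprocessorSketch:
    """Hypothetical preprocessor illustrating the fit/transform contract."""

    def __init__(self) -> None:
        self.context: Dict[str, Dict[str, int]] = {}

    def fit(self, texts: List[str]) -> "VocabPreprocessorSketch":
        # fit only updates inner state (the context); the input is untouched.
        vocab: Dict[str, int] = {}
        for text in texts:
            for token in text.split():
                vocab.setdefault(token, len(vocab) + 1)  # index 0 is reserved for unknown tokens
        self.context["vocab"] = vocab
        return self

    def transform(self, texts: List[str]) -> List[List[int]]:
        # transform builds new output and leaves the context unchanged.
        vocab = self.context["vocab"]
        return [[vocab.get(token, 0) for token in text.split()] for text in texts]

    def fit_transform(self, texts: List[str]) -> List[List[int]]:
        return self.fit(texts).transform(texts)


train = ["how are you", "are you ok"]
pre = VocabPreprocessorSketch().fit(train)
print(pre.transform(["you are new"]))  # [[3, 2, 0]]
print(train)  # ['how are you', 'are you ok'] (input unchanged)
```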
-
DATA_FILENAME
= 'preprocessor.dill'¶
-
context
¶ Return context.
-
fit
(data_pack, verbose=1)¶ Fit parameters on input data.
This is an abstract method that must be implemented in child classes.
This method is expected to return the preprocessor itself.
Parameters: - data_pack (
DataPack
) – DataPack
object to be fitted. - verbose (
int
) – Verbosity.
Return type: BasePreprocessor
-
fit_transform
(data_pack, verbose=1)¶ Call fit-transform.
Parameters: - data_pack (
DataPack
) – DataPack
object to be processed. - verbose (
int
) – Verbosity.
Return type: DataPack
-
save
(dirpath)¶ Save the preprocessor object.
A saved preprocessor is represented as a directory containing the context object (parameters fitted on training data), which is saved by pickle.
Parameters: dirpath ( Union
[str
,Path
]) – directory path of the saved preprocessor.
-
transform
(data_pack, verbose=1)¶ Transform input data into the expected format.
This is an abstract method that must be implemented in child classes.
Parameters: - data_pack (
DataPack
) – DataPack
object to be transformed. - verbose (
int
) – Verbosity.
Return type: DataPack
-
-
matchzoo.engine.base_preprocessor.
load_preprocessor
(dirpath)¶ Load the fitted context. The reverse function of
save()
. Parameters: dirpath ( Union
[str
,Path
]) – directory path of the saved preprocessor. Return type: BasePreprocessor
Returns: a BasePreprocessor
instance.
-
matchzoo.engine.base_preprocessor.
validate_context
(func)¶ Validate context in the preprocessor.
matchzoo.engine.base_task module¶
Base task.
-
class
matchzoo.engine.base_task.
BaseTask
(loss=None, metrics=None)¶ Bases:
abc.ABC
Base Task, shouldn’t be used directly.
-
classmethod
list_available_losses
()¶ Return type: list
Returns: a list of available losses.
-
classmethod
list_available_metrics
()¶ Return type: list
Returns: a list of available metrics.
-
loss
¶ Loss used in the task.
-
metrics
¶ Metrics used in the task.
-
output_dtype
¶ output data type for specific task.
-
output_shape
¶ output shape of a single sample of the task.
Return type: tuple
-
classmethod
matchzoo.engine.callbacks module¶
Callbacks.
-
class
matchzoo.engine.callbacks.
EvaluateAllMetrics
(model, x, y, once_every=1, batch_size=128, model_save_path=None, verbose=1)¶ Bases:
keras.callbacks.Callback
Callback to evaluate all metrics.
MatchZoo metrics cannot be evaluated batch-wise since they require dataset-level information. As a result, MatchZoo metrics are not evaluated automatically when a model is fitted. When this callback is used, all metrics, including MatchZoo metrics and Keras metrics, are evaluated once every once_every epochs.
Parameters: - model (
BaseModel
) – Model to evaluate. - x (
Union
[ndarray
,List
[ndarray
]]) – Input data. - y (
ndarray
) – Labels. - once_every (
int
) – Evaluation only triggers when epoch % once_every == 0. (default: 1, i.e. evaluate on every epoch’s end) - batch_size (
int
) – Number of samples per evaluation. This only affects the evaluation of Keras metrics, since MatchZoo metrics are always evaluated using the full data. - model_save_path (
Optional
[str
]) – Directory path to save the model after each evaluate callback, (default: None, i.e., no saving.) - verbose – Verbosity.
-
on_epoch_end
(epoch, logs=None)¶ Called at the end of an epoch.
Parameters: - epoch (
int
) – integer, index of epoch. - logs (
Optional
[dict
]) – dictionary of logs.
Returns: dictionary of logs.
matchzoo.engine.hyper_spaces module¶
Hyper parameter search spaces wrapping hyperopt.
-
class
matchzoo.engine.hyper_spaces.
HyperoptProxy
(hyperopt_func, **kwargs)¶ Bases:
object
Hyperopt proxy class.
See hyperopt’s documentation for more details: https://github.com/hyperopt/hyperopt/wiki/FMin
Reason for these wrappers:
A hyper space in hyperopt requires a label to instantiate. This label is used later as a reference to the original hyper space that is sampled. In matchzoo, hyper spaces are used in matchzoo.engine.Param. Only if a hyper space’s label matches its parent matchzoo.engine.Param’s name can matchzoo correctly back-reference the parameter that got sampled. This could be done by asking users to always use the same name for a parameter and its hyper space, but typos can occur. As a result, these wrappers hide the hyper spaces’ labels and always bind them correctly to their parameter’s name.
- Examples::
>>> import matchzoo as mz >>> from hyperopt.pyll.stochastic import sample
- Basic Usage:
>>> model = mz.models.DenseBaseline() >>> sample(model.params.hyper_space) # doctest: +SKIP {'mlp_num_layers': 1.0, 'mlp_num_units': 274.0}
- Arithmetic Operations:
>>> new_space = 2 ** mz.hyper_spaces.quniform(2, 6) >>> model.params.get('mlp_num_layers').hyper_space = new_space >>> sample(model.params.hyper_space) # doctest: +SKIP {'mlp_num_layers': 8.0, 'mlp_num_units': 292.0}
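The label-binding idea can be sketched without hyperopt. In this hypothetical example, fake_quniform stands in for hyperopt.hp.quniform(label, low, high, q) and simply records its arguments; the proxy stores everything except the label and supplies the owning parameter's name only when convert() is called. Arithmetic support is omitted.

```python
from typing import Any, Callable, Dict


def fake_quniform(label: str, low: float, high: float, q: float) -> Dict[str, Any]:
    """Stand-in for hyperopt.hp.quniform: just records its arguments."""
    return {"type": "quniform", "label": label, "low": low, "high": high, "q": q}


class ProxySketch:
    """Hypothetical proxy that defers choosing the label until convert()."""

    def __init__(self, hyperopt_func: Callable[..., Any], **kwargs: Any) -> None:
        self._func = hyperopt_func
        self._kwargs = kwargs

    def convert(self, name: str) -> Any:
        # The label is always the owning parameter's name, so it cannot drift.
        return self._func(name, **self._kwargs)


space = ProxySketch(fake_quniform, low=2, high=6, q=1)
print(space.convert("mlp_num_layers")["label"])  # mlp_num_layers
```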
-
convert
(name)¶ Attach name as hyperopt.hp’s label.
Parameters: name ( str
) – Name of the parameter that owns this hyper space. Return type: Apply
Returns: a hyperopt ready search space
-
class
matchzoo.engine.hyper_spaces.
choice
(options)¶ Bases:
matchzoo.engine.hyper_spaces.HyperoptProxy
hyperopt.hp.choice()
proxy.
-
class
matchzoo.engine.hyper_spaces.
quniform
(low, high, q=1)¶ Bases:
matchzoo.engine.hyper_spaces.HyperoptProxy
hyperopt.hp.quniform()
proxy.
-
matchzoo.engine.hyper_spaces.
sample
(space)¶ Take a sample in the hyper space.
This method is stateless, so the distribution of the samples differs from that of a tune call. This function just gives a general idea of what a sample from the space looks like.
Example
>>> import matchzoo as mz >>> space = mz.models.Naive.get_default_params().hyper_space >>> mz.hyper_spaces.sample(space) # doctest: +ELLIPSIS {'optimizer': ...}
-
class
matchzoo.engine.hyper_spaces.
uniform
(low, high)¶ Bases:
matchzoo.engine.hyper_spaces.HyperoptProxy
hyperopt.hp.uniform()
proxy.
matchzoo.engine.param module¶
Parameter class.
-
class
matchzoo.engine.param.
Param
(name, value=None, hyper_space=None, validator=None, desc=None)¶ Bases:
object
Parameter class.
Basic usages with a name and value:
>>> param = Param('my_param', 10) >>> param.name 'my_param' >>> param.value 10
Use with a validator to make sure the parameter always keeps a valid value.
>>> param = Param( ... name='my_param', ... value=5, ... validator=lambda x: 0 < x < 20 ... ) >>> param.validator # doctest: +ELLIPSIS <function <lambda> at 0x...> >>> param.value 5 >>> param.value = 10 >>> param.value 10 >>> param.value = -1 Traceback (most recent call last): ... ValueError: Validator not satifised. The validator's definition is as follows: validator=lambda x: 0 < x < 20
Use with a hyper space. Setting up a hyper space for a parameter makes the parameter tunable in a
matchzoo.engine.Tuner
.
>>> from matchzoo.engine.hyper_spaces import quniform >>> param = Param( ... name='positive_num', ... value=1, ... hyper_space=quniform(low=1, high=5) ... ) >>> param.hyper_space # doctest: +ELLIPSIS <matchzoo.engine.hyper_spaces.quniform object at ...> >>> from hyperopt.pyll.stochastic import sample >>> hyperopt_space = param.hyper_space.convert(param.name) >>> samples = [sample(hyperopt_space) for _ in range(64)] >>> set(samples) == {1, 2, 3, 4, 5} True
The boolean value of a
Param
instance is only True when the value is not None. This is because some falsy values, such as zero or an empty list, are valid parameter values. In other words, the boolean value means “the parameter value is filled”.
>>> param = Param('dropout') >>> if param: ... print('OK') >>> param = Param('dropout', 0) >>> if param: ... print('OK') OK
If the value is set as a number, a _pre_assignment_hook is initialized as a data type converter to keep the parameter’s data type consistent. This conversion supports Python built-in numbers, numpy numbers, and any number that inherits from
numbers.Number
.
>>> param = Param('float_param', 0.5) >>> param.value = 10 >>> param.value 10.0 >>> type(param.value) <class 'float'>
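The three behaviors described above (validation, "filled" truthiness, and numeric type preservation) can be combined in a small stand-in. ParamSketch is hypothetical and much simpler than matchzoo.engine.Param; it assumes the numeric type is taken from the initial value and reapplied on every later assignment.

```python
import numbers
from typing import Any, Callable, Optional


class ParamSketch:
    """Hypothetical, simplified stand-in for matchzoo.engine.Param."""

    def __init__(self, name: str, value: Any = None,
                 validator: Optional[Callable[[Any], bool]] = None) -> None:
        self.name = name
        self._validator = validator
        # Remember the numeric type of the initial value, if it is a number.
        self._num_type = type(value) if isinstance(value, numbers.Number) else None
        self._value = None
        if value is not None:
            self.value = value

    @property
    def value(self) -> Any:
        return self._value

    @value.setter
    def value(self, new_value: Any) -> None:
        if self._num_type is not None and isinstance(new_value, numbers.Number):
            new_value = self._num_type(new_value)  # keep the data type consistent
        if self._validator is not None and not self._validator(new_value):
            raise ValueError(f"Validator not satisfied for {self.name!r}")
        self._value = new_value

    def __bool__(self) -> bool:
        # True means "filled", so 0 or [] still count as set.
        return self._value is not None


p = ParamSketch("float_param", 0.5)
p.value = 10
print(p.value, type(p.value).__name__)  # 10.0 float
print(bool(ParamSketch("dropout")), bool(ParamSketch("dropout", 0)))  # False True
```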
-
desc
¶ Parameter description.
Return type: str
-
hyper_space
¶ Hyper space of the parameter.
Return type: Union
[Apply
,HyperoptProxy
]
-
name
¶ Name of the parameter.
Return type: str
-
reset
()¶ Set the parameter’s value to None, which means “not set”.
This method bypasses the validator.
Example
>>> import matchzoo as mz >>> param = mz.Param( ... name='str', validator=lambda x: isinstance(x, str)) >>> param.value = 'hello' >>> param.value = None Traceback (most recent call last): ... ValueError: Validator not satifised. The validator's definition is as follows: name='str', validator=lambda x: isinstance(x, str)) >>> param.reset() >>> param.value is None True
-
set_default
(val, verbose=1)¶ Set the default value; has no effect if the parameter already has a value.
Parameters: - val – Default value to set.
- verbose – Verbosity.
-
validator
¶ Validator of the parameter.
Return type: Callable
[[Any
],bool
]
-
value
¶ Value of the parameter.
Return type: Any
-
matchzoo.engine.param_table module¶
Parameters table class.
-
class
matchzoo.engine.param_table.
ParamTable
¶ Bases:
object
Parameter table class.
Example
>>> params = ParamTable() >>> params.add(Param('ham', 'Parma Ham')) >>> params.add(Param('egg', 'Over Easy')) >>> params['ham'] 'Parma Ham' >>> params['egg'] 'Over Easy' >>> print(params) ham Parma Ham egg Over Easy >>> params.add(Param('egg', 'Sunny side Up')) Traceback (most recent call last): ... ValueError: Parameter named egg already exists. To re-assign parameter egg value, use `params["egg"] = value` instead.
-
completed
()¶ Return type: bool
Returns: True if all params are filled, False otherwise. Example
>>> import matchzoo >>> model = matchzoo.models.Naive() >>> model.params.completed() False >>> model.guess_and_fill_missing_params(verbose=0) >>> model.params.completed() True
-
hyper_space
¶ Hyper space of the table, a valid hyperopt graph.
Return type: dict
-
keys
()¶ Return type: KeysView
Returns: Parameter table keys.
-
set
(key, param)¶ Set key to parameter param.
-
to_frame
()¶ Convert the parameter table into a pandas data frame.
Return type: DataFrame
Returns: A pandas.DataFrame. Example
>>> import matchzoo as mz >>> table = mz.ParamTable() >>> table.add(mz.Param(name='x', value=10, desc='my x')) >>> table.add(mz.Param(name='y', value=20, desc='my y')) >>> table.to_frame() Name Description Value Hyper-Space 0 x my x 10 None 1 y my y 20 None
-
update
(other)¶ Update self.
Update self with the key/value pairs from other, overwriting existing keys. Notice that this does not add new keys to self.
This method is usually used by models to obtain useful information from a preprocessor’s context.
Parameters: other ( dict
) – The dictionary used to update. Example
>>> import matchzoo as mz >>> model = mz.models.DenseBaseline() >>> model.params['input_shapes'] is None True >>> prpr = model.get_default_preprocessor() >>> _ = prpr.fit(mz.datasets.toy.load_data(), verbose=0) >>> model.params.update(prpr.context) >>> model.params['input_shapes'] [(30,), (30,)]
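The "overwrite existing keys, never add new ones" rule can be shown with plain dicts. update_existing_only is a hypothetical helper, and the dicts stand in for a ParamTable and a preprocessor context:

```python
from typing import Any, Dict


def update_existing_only(table: Dict[str, Any], other: Dict[str, Any]) -> None:
    """Overwrite keys already in table with values from other; ignore new keys."""
    for key in table:
        if key in other:
            table[key] = other[key]


params = {"input_shapes": None, "mlp_num_units": 300}
context = {"input_shapes": [(30,), (30,)], "vocab_size": 5000}
update_existing_only(params, context)
print(params)  # {'input_shapes': [(30,), (30,)], 'mlp_num_units': 300}
```

Because vocab_size is not a key of params, it is silently dropped rather than added.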
-
matchzoo.engine.parse_metric module¶
-
matchzoo.engine.parse_metric.
parse_metric
(metric, task=None)¶ Parse input metric in any form into a
BaseMetric
instance. Parameters: - metric (
Union
[str
,Type
[BaseMetric
],BaseMetric
]) – Input metric in any form. - task (
Optional
[BaseTask
]) – Task type for determining specific metric.
Return type: Union
[BaseMetric
,str
] Returns: A
BaseMetric
instance, or the input string unchanged for Keras native metrics.
- Examples::
>>> from matchzoo import metrics >>> from matchzoo.engine.parse_metric import parse_metric
- Use str as keras native metrics:
>>> parse_metric('mse') 'mse'
- Use str as MatchZoo metrics:
>>> mz_metric = parse_metric('map') >>> type(mz_metric) <class 'matchzoo.metrics.mean_average_precision.MeanAveragePrecision'>
- Use
matchzoo.engine.BaseMetric
subclasses as MatchZoo metrics: >>> type(parse_metric(metrics.AveragePrecision)) <class 'matchzoo.metrics.average_precision.AveragePrecision'>
- Use
matchzoo.engine.BaseMetric
instances as MatchZoo metrics: >>> type(parse_metric(metrics.AveragePrecision())) <class 'matchzoo.metrics.average_precision.AveragePrecision'>
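The string branch of this dispatch can be sketched as an alias lookup. The alias lists below are copied from the ALIAS attributes documented in matchzoo.metrics; case-insensitive matching is an assumption suggested by 'MAP' appearing in the evaluate() example, and resolve_metric_name is a hypothetical helper that returns class names rather than metric instances.

```python
from typing import Dict, List

# Aliases copied from the ALIAS attributes documented for each metric class.
METRIC_ALIASES: Dict[str, List[str]] = {
    "AveragePrecision": ["average_precision", "ap"],
    "DiscountedCumulativeGain": ["discounted_cumulative_gain", "dcg"],
    "MeanAveragePrecision": ["mean_average_precision", "map"],
    "MeanReciprocalRank": ["mean_reciprocal_rank", "mrr"],
    "NormalizedDiscountedCumulativeGain": ["normalized_discounted_cumulative_gain", "ndcg"],
    "Precision": ["precision"],
}

# Invert into a lookup table: alias -> class name.
ALIAS_TO_CLASS: Dict[str, str] = {
    alias: cls_name for cls_name, aliases in METRIC_ALIASES.items() for alias in aliases
}


def resolve_metric_name(metric: str) -> str:
    """Return the MatchZoo class name for a known alias, else the input string."""
    return ALIAS_TO_CLASS.get(metric.lower(), metric)


print(resolve_metric_name("map"))  # MeanAveragePrecision
print(resolve_metric_name("mse"))  # mse (left for Keras to interpret)
```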
Module contents¶
matchzoo.layers package¶
Submodules¶
matchzoo.layers.dynamic_pooling_layer module¶
An implementation of Dynamic Pooling Layer.
-
class
matchzoo.layers.dynamic_pooling_layer.
DynamicPoolingLayer
(psize1, psize2, **kwargs)¶ Bases:
keras.engine.base_layer.Layer
Layer that computes dynamic pooling of one tensor.
Parameters: - psize1 (
int
) – pooling size of dimension 1 - psize2 (
int
) – pooling size of dimension 2 - kwargs – Standard layer keyword arguments.
Examples
>>> import matchzoo as mz >>> layer = mz.layers.DynamicPoolingLayer(3, 2) >>> num_batch, left_len, right_len, num_dim = 5, 3, 2, 10 >>> layer.build([[num_batch, left_len, right_len, num_dim], ... [num_batch, left_len, right_len, 3]])
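Conceptually, dynamic pooling maps a variable-sized matching matrix to a fixed psize1 x psize2 grid. The sketch below is a generic dynamic max pooling in plain Python, not MatchZoo's layer: the real layer receives a precomputed pooling index as its second input (the second shape in the example above), whereas this hypothetical function computes the bins directly, assuming roughly equal, possibly overlapping spans.

```python
from typing import List

Matrix = List[List[float]]


def _bins(length: int, pool_size: int) -> List[range]:
    """Split range(length) into pool_size non-empty, roughly equal spans."""
    spans = []
    for i in range(pool_size):
        start = (i * length) // pool_size
        end = -((-(i + 1) * length) // pool_size)  # ceiling division
        spans.append(range(start, end))
    return spans


def dynamic_max_pool(matrix: Matrix, psize1: int, psize2: int) -> Matrix:
    """Max-pool a variable-sized matrix down to a fixed psize1 x psize2 grid."""
    rows = _bins(len(matrix), psize1)
    cols = _bins(len(matrix[0]), psize2)
    return [[max(matrix[r][c] for r in rs for c in cs) for cs in cols] for rs in rows]


m = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]  # 3 x 4 input
print(dynamic_max_pool(m, 2, 2))  # [[6, 8], [10, 12]]
```

When the input is smaller than the grid, cells are repeated, so the output shape is always fixed.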
-
build
(input_shape)¶ Build the layer.
Parameters: input_shape ( List
[int
]) – the shapes of the input tensors; DynamicPoolingLayer needs two input tensors.
-
call
(inputs, **kwargs)¶ The computation logic of DynamicPoolingLayer.
Parameters: inputs ( list
) – two input tensors. Return type: Any
-
compute_output_shape
(input_shape)¶ Calculate the layer output shape.
Parameters: input_shape ( list
) – the shapes of the input tensors; DynamicPoolingLayer needs two input tensors. Return type: tuple
-
get_config
()¶ Get the config dict of DynamicPoolingLayer.
Return type: dict
matchzoo.layers.matching_layer module¶
An implementation of Matching Layer.
-
class
matchzoo.layers.matching_layer.
MatchingLayer
(normalize=False, matching_type='dot', **kwargs)¶ Bases:
keras.engine.base_layer.Layer
Layer that computes a matching matrix between samples in two tensors.
Parameters: - normalize (
bool
) – Whether to L2-normalize samples along the dot product axis before taking the dot product. If set to True, then the output of the dot product is the cosine proximity between the two samples. - matching_type (
str
) – the similarity function for matching - kwargs – Standard layer keyword arguments.
Examples
>>> import matchzoo as mz >>> layer = mz.layers.MatchingLayer(matching_type='dot', ... normalize=True) >>> num_batch, left_len, right_len, num_dim = 5, 3, 2, 10 >>> layer.build([[num_batch, left_len, num_dim], ... [num_batch, right_len, num_dim]])
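For matching_type='dot', the layer output for one sample is the matrix of dot products between every left and right position. A single-sample, plain-Python sketch (hypothetical helper names, no batch dimension) showing how normalize=True turns those dot products into cosine similarities:

```python
import math
from typing import List

Matrix = List[List[float]]


def _l2_normalize(vec: List[float]) -> List[float]:
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def matching_matrix(left: Matrix, right: Matrix, normalize: bool = False) -> Matrix:
    """Dot-product matching matrix of shape [len(left), len(right)]."""
    if normalize:
        # After L2 normalization each dot product is a cosine similarity.
        left = [_l2_normalize(v) for v in left]
        right = [_l2_normalize(v) for v in right]
    return [[sum(a * b for a, b in zip(u, v)) for v in right] for u in left]


left = [[1.0, 0.0], [0.0, 2.0]]  # left_len=2, num_dim=2
right = [[3.0, 4.0]]             # right_len=1, num_dim=2
print(matching_matrix(left, right))  # [[3.0], [8.0]]
print(matching_matrix(left, right, normalize=True))  # approximately [[0.6], [0.8]]
```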
-
build
(input_shape)¶ Build the layer.
Parameters: input_shape ( list
) – the shapes of the input tensors; MatchingLayer needs two input tensors.
-
call
(inputs, **kwargs)¶ The computation logic of MatchingLayer.
Parameters: inputs ( list
) – two input tensors. Return type: Any
-
compute_output_shape
(input_shape)¶ Calculate the layer output shape.
Parameters: input_shape ( list
) – the shapes of the input tensors; MatchingLayer needs two input tensors. Return type: tuple
-
get_config
()¶ Get the config dict of MatchingLayer.
Return type: dict
Module contents¶
matchzoo.losses package¶
Submodules¶
matchzoo.losses.rank_cross_entropy_loss module¶
The rank cross entropy loss.
-
class
matchzoo.losses.rank_cross_entropy_loss.
RankCrossEntropyLoss
(num_neg=1)¶ Bases:
object
Rank cross entropy loss.
Examples
>>> from keras import backend as K >>> softmax = lambda x: np.exp(x)/np.sum(np.exp(x), axis=0) >>> x_pred = K.variable(np.array([[1.0], [1.2], [0.8]])) >>> x_true = K.variable(np.array([[1], [0], [0]])) >>> expect = -np.log(softmax(np.array([[1.0], [1.2], [0.8]]))) >>> loss = K.eval(RankCrossEntropyLoss(num_neg=2)(x_true, x_pred)) >>> np.isclose(loss, expect[0]).all() True
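The doctest above treats each group of 1 + num_neg consecutive scores (positive first) as a softmax over candidates. A plain-Python sketch of that computation, assuming this layout and that labels weight the log-softmax terms (with a one-hot group this reduces to -log softmax(positive)); rank_cross_entropy is a hypothetical name:

```python
import math
from typing import List, Sequence


def rank_cross_entropy(y_true: Sequence[float], y_pred: Sequence[float], num_neg: int = 1) -> float:
    """Mean softmax cross entropy over groups of 1 + num_neg scores."""
    group = num_neg + 1
    if len(y_true) != len(y_pred) or len(y_pred) % group != 0:
        raise ValueError("inputs must hold whole groups of 1 + num_neg scores")
    losses: List[float] = []
    for start in range(0, len(y_pred), group):
        logits = y_pred[start:start + group]
        labels = y_true[start:start + group]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(-sum(lab * (x - log_z) for lab, x in zip(labels, logits)))
    return sum(losses) / len(losses)


# Same numbers as the doctest: one positive (1.0) and two negatives.
print(rank_cross_entropy([1, 0, 0], [1.0, 1.2, 0.8], num_neg=2))
```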
-
num_neg
¶ num_neg getter.
-
matchzoo.losses.rank_hinge_loss module¶
The rank hinge loss.
-
class
matchzoo.losses.rank_hinge_loss.
RankHingeLoss
(num_neg=1, margin=1.0)¶ Bases:
object
Rank hinge loss.
Examples
>>> from keras import backend as K >>> x_pred = K.variable(np.array([[1.0], [1.2], [0.8], [1.4]])) >>> x_true = K.variable(np.array([[1], [0], [1], [0]])) >>> expect = ((1.0 + 1.2 - 1.0) + (1.0 + 1.4 - 0.8)) / 2 >>> expect 1.4 >>> loss = K.eval(RankHingeLoss(num_neg=1, margin=1.0)(x_true, x_pred)) >>> np.isclose(loss, expect) True
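The expected value in the doctest follows from max(0, margin + neg - pos) averaged over groups. A plain-Python sketch under stated assumptions: scores come in groups of 1 + num_neg with the positive first, negatives in a group are averaged before the hinge, and labels are not used, since the layout already identifies the positive. rank_hinge is a hypothetical name.

```python
from typing import List, Sequence


def rank_hinge(y_pred: Sequence[float], num_neg: int = 1, margin: float = 1.0) -> float:
    """Mean pairwise hinge loss over groups of 1 + num_neg scores."""
    group = num_neg + 1
    if len(y_pred) % group != 0:
        raise ValueError("y_pred must hold whole groups of 1 + num_neg scores")
    losses: List[float] = []
    for start in range(0, len(y_pred), group):
        pos = y_pred[start]
        negs = y_pred[start + 1:start + group]
        neg = sum(negs) / len(negs)  # assumption: negatives are averaged
        losses.append(max(0.0, margin + neg - pos))
    return sum(losses) / len(losses)


# Same numbers as the doctest: ((1 + 1.2 - 1.0) + (1 + 1.4 - 0.8)) / 2
print(rank_hinge([1.0, 1.2, 0.8, 1.4], num_neg=1, margin=1.0))  # approximately 1.4
```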
-
margin
¶ margin getter.
-
num_neg
¶ num_neg getter.
-
Module contents¶
matchzoo.metrics package¶
Submodules¶
matchzoo.metrics.average_precision module¶
Average precision metric for ranking.
-
class
matchzoo.metrics.average_precision.
AveragePrecision
(threshold=0.0)¶ Bases:
matchzoo.engine.base_metric.BaseMetric
Average precision metric.
-
ALIAS
= ['average_precision', 'ap']¶
-
matchzoo.metrics.discounted_cumulative_gain module¶
Discounted cumulative gain metric for ranking.
-
class
matchzoo.metrics.discounted_cumulative_gain.
DiscountedCumulativeGain
(k=1, threshold=0.0)¶ Bases:
matchzoo.engine.base_metric.BaseMetric
Discounted cumulative gain metric.
-
ALIAS
= ['discounted_cumulative_gain', 'dcg']¶
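For reference, a textbook DCG@k in plain Python. Assumptions: items are ranked by descending score, a label counts only when it is greater than threshold, and the gain is 2**label - 1 with a log2(rank + 1) discount. MatchZoo's own logarithm base may differ, so treat this as a sketch of the formula rather than the library's exact numbers.

```python
import math
from typing import Sequence


def dcg_at_k(y_true: Sequence[float], y_pred: Sequence[float], k: int = 1, threshold: float = 0.0) -> float:
    """Textbook DCG@k: sum of (2**label - 1) / log2(i + 2) over the top k items."""
    ranked = sorted(zip(y_true, y_pred), key=lambda pair: pair[1], reverse=True)
    result = 0.0
    for i, (label, _) in enumerate(ranked[:k]):
        if label > threshold:
            result += (2.0 ** label - 1.0) / math.log2(i + 2.0)
    return result


# Ranked labels are [2, 1, 0]; k=2 gives 3/log2(2) + 1/log2(3).
print(dcg_at_k([0, 1, 2], [0.1, 0.5, 0.9], k=2))  # about 3.631
```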
-
matchzoo.metrics.mean_average_precision module¶
Mean average precision metric for ranking.
-
class
matchzoo.metrics.mean_average_precision.
MeanAveragePrecision
(threshold=0.0)¶ Bases:
matchzoo.engine.base_metric.BaseMetric
Mean average precision metric.
-
ALIAS
= ['mean_average_precision', 'map']¶
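A textbook mean average precision sketch: compute average precision per query from the score ranking, then average over queries. Treating label > threshold as relevant is an assumption, and how MatchZoo groups samples by query during evaluation is not shown here; both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def query_average_precision(y_true: Sequence[float], y_pred: Sequence[float], threshold: float = 0.0) -> float:
    """Average of precision@rank taken at each relevant position."""
    ranked = sorted(zip(y_true, y_pred), key=lambda pair: pair[1], reverse=True)
    hits, total = 0, 0.0
    for idx, (label, _) in enumerate(ranked):
        if label > threshold:
            hits += 1
            total += hits / (idx + 1)
    return total / hits if hits else 0.0


def mean_average_precision_sketch(queries: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Mean of per-query average precision."""
    return sum(query_average_precision(t, p) for t, p in queries) / len(queries)


queries = [([1, 0, 1], [0.9, 0.8, 0.7]),  # AP = (1/1 + 2/3) / 2 = 5/6
           ([0, 1], [0.9, 0.1])]          # AP = 1/2
print(mean_average_precision_sketch(queries))  # about 0.667
```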
-
matchzoo.metrics.mean_reciprocal_rank module¶
Mean reciprocal rank metric.
-
class
matchzoo.metrics.mean_reciprocal_rank.
MeanReciprocalRank
(threshold=0.0)¶ Bases:
matchzoo.engine.base_metric.BaseMetric
Mean reciprocal rank metric.
-
ALIAS
= ['mean_reciprocal_rank', 'mrr']¶
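Reciprocal rank is 1 / (position of the first relevant item), and MRR averages it over queries. A plain-Python sketch, assuming label > threshold marks an item as relevant; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def reciprocal_rank(y_true: Sequence[float], y_pred: Sequence[float], threshold: float = 0.0) -> float:
    """1 / rank of the first relevant item by descending score, or 0 if none."""
    ranked = sorted(zip(y_true, y_pred), key=lambda pair: pair[1], reverse=True)
    for idx, (label, _) in enumerate(ranked):
        if label > threshold:
            return 1.0 / (idx + 1)
    return 0.0


def mean_reciprocal_rank_sketch(queries: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    return sum(reciprocal_rank(t, p) for t, p in queries) / len(queries)


queries = [([0, 0, 1], [0.9, 0.5, 0.1]),  # first relevant item at rank 3
           ([1, 0], [0.2, 0.8])]          # first relevant item at rank 2
print(mean_reciprocal_rank_sketch(queries))  # (1/3 + 1/2) / 2, about 0.417
```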
-
matchzoo.metrics.normalized_discounted_cumulative_gain module¶
Normalized discounted cumulative gain metric for ranking.
-
class
matchzoo.metrics.normalized_discounted_cumulative_gain.
NormalizedDiscountedCumulativeGain
(k=1, threshold=0.0)¶ Bases:
matchzoo.engine.base_metric.BaseMetric
Normalized discounted cumulative gain metric.
-
ALIAS
= ['normalized_discounted_cumulative_gain', 'ndcg']¶
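NDCG@k divides the DCG of the predicted ranking by the DCG of the ideal ranking, which can be obtained by scoring each item with its own label. A self-contained sketch that reuses the textbook DCG assumptions (gain 2**label - 1, log2 discount, label > threshold counts); the library's exact constants may differ.

```python
import math
from typing import Sequence


def _dcg(y_true: Sequence[float], y_pred: Sequence[float], k: int, threshold: float) -> float:
    ranked = sorted(zip(y_true, y_pred), key=lambda pair: pair[1], reverse=True)
    return sum((2.0 ** label - 1.0) / math.log2(i + 2.0)
               for i, (label, _) in enumerate(ranked[:k]) if label > threshold)


def ndcg_at_k(y_true: Sequence[float], y_pred: Sequence[float], k: int = 1, threshold: float = 0.0) -> float:
    """DCG of the predicted order divided by DCG of the ideal (label-sorted) order."""
    ideal = _dcg(y_true, y_true, k, threshold)  # sorting by labels gives the ideal order
    return _dcg(y_true, y_pred, k, threshold) / ideal if ideal > 0 else 0.0


print(ndcg_at_k([0, 1, 2], [0.1, 0.5, 0.9], k=3))  # perfect ranking: 1.0
print(ndcg_at_k([0, 1, 2], [0.9, 0.5, 0.1], k=3))  # reversed ranking: about 0.587
```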
-
matchzoo.metrics.precision module¶
Precision for ranking.
-
class
matchzoo.metrics.precision.
Precision
(k=1, threshold=0.0)¶ Bases:
matchzoo.engine.base_metric.BaseMetric
Precision metric.
-
ALIAS
= 'precision'¶
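Precision@k counts relevant items among the k highest-scored ones and divides by k. A plain-Python sketch, assuming label > threshold marks relevance and that the denominator is always k (even when fewer than k items exist); precision_at_k is a hypothetical name.

```python
from typing import Sequence


def precision_at_k(y_true: Sequence[float], y_pred: Sequence[float], k: int = 1, threshold: float = 0.0) -> float:
    """Fraction of the top-k items (by descending score) whose label exceeds threshold."""
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(zip(y_true, y_pred), key=lambda pair: pair[1], reverse=True)
    hits = sum(1 for label, _ in ranked[:k] if label > threshold)
    return hits / k


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(precision_at_k(labels, scores, k=2))  # 0.5
print(precision_at_k(labels, scores, k=3))  # about 0.667
```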
-
matchzoo.models package¶
Submodules¶
matchzoo.models.anmm module¶
An implementation of aNMM Model.
-
class
matchzoo.models.anmm.
ANMM
(params=None, backend=None)¶ Bases:
matchzoo.engine.base_model.BaseModel
ANMM Model.
Examples
>>> model = ANMM() >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
-
build
()¶ Build model structure.
aNMM model based on bin weighting and query term attentions
-
classmethod
get_default_params
()¶ Return type: ParamTable
Returns: model default parameters.
-
matchzoo.models.arci module¶
An implementation of ArcI Model.
-
class
matchzoo.models.arci.
ArcI
(params=None, backend=None)¶ Bases:
matchzoo.engine.base_model.BaseModel
ArcI Model.
Examples
>>> model = ArcI() >>> model.params['num_blocks'] = 1 >>> model.params['left_filters'] = [32] >>> model.params['right_filters'] = [32] >>> model.params['left_kernel_sizes'] = [3] >>> model.params['right_kernel_sizes'] = [3] >>> model.params['left_pool_sizes'] = [2] >>> model.params['right_pool_sizes'] = [4] >>> model.params['conv_activation_func'] = 'relu' >>> model.params['mlp_num_layers'] = 1 >>> model.params['mlp_num_units'] = 64 >>> model.params['mlp_num_fan_out'] = 32 >>> model.params['mlp_activation_func'] = 'relu' >>> model.params['dropout_rate'] = 0.5 >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
-
build
()¶ Build model structure.
ArcI uses a Siamese architecture.
-
classmethod
get_default_params
()¶ Return type: ParamTable
Returns: model default parameters.
-
matchzoo.models.arcii module¶
An implementation of ArcII Model.
-
class
matchzoo.models.arcii.
ArcII
(params=None, backend=None)¶ Bases:
matchzoo.engine.base_model.BaseModel
ArcII Model.
Examples
>>> model = ArcII() >>> model.params['embedding_output_dim'] = 300 >>> model.params['num_blocks'] = 2 >>> model.params['kernel_1d_count'] = 32 >>> model.params['kernel_1d_size'] = 3 >>> model.params['kernel_2d_count'] = [16, 32] >>> model.params['kernel_2d_size'] = [[3, 3], [3, 3]] >>> model.params['pool_2d_size'] = [[2, 2], [2, 2]] >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
-
build
()¶ Build model structure.
ArcII has the desirable property of letting two sentences meet before their own high-level representations mature.
-
classmethod
get_default_params
()¶ Return type: ParamTable
Returns: model default parameters.
-
matchzoo.models.cdssm module¶
An implementation of CDSSM (CLSM) model.
-
class
matchzoo.models.cdssm.
CDSSM
(params=None, backend=None)¶ Bases:
matchzoo.engine.base_model.BaseModel
CDSSM Model implementation.
Learning Semantic Representations Using Convolutional Neural Networks for Web Search (2014a).
A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval (2014b).
Examples
>>> model = CDSSM() >>> model.params['optimizer'] = 'adam' >>> model.params['filters'] = 32 >>> model.params['kernel_size'] = 3 >>> model.params['conv_activation_func'] = 'relu' >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
-
build
()¶ Build model structure.
CDSSM uses a Siamese architecture.
-
classmethod
get_default_params
()¶ Return type: ParamTable
Returns: model default parameters.
-
classmethod
get_default_preprocessor
()¶ Returns: Default preprocessor.
-
guess_and_fill_missing_params
(verbose=1)¶ Guess and fill missing parameters in
params
. Use this method to automatically fill in hyper-parameters. This involves some guessing, so the parameters it fills could be wrong. For example, the default task is Ranking; if we do not manually set it to Classification for data packs prepared for classification, the shapes of the model output and the data will mismatch.
Parameters: verbose ( int
) – Verbosity.
-
matchzoo.models.conv_knrm module¶
ConvKNRM model.
-
class
matchzoo.models.conv_knrm.
ConvKNRM
(params=None, backend=None)¶ Bases:
matchzoo.models.knrm.KNRM
ConvKNRM model.
Examples
>>> model = ConvKNRM() >>> model.params['embedding_input_dim'] = 10000 >>> model.params['embedding_output_dim'] = 300 >>> model.params['embedding_trainable'] = True >>> model.params['filters'] = 128 >>> model.params['conv_activation_func'] = 'tanh' >>> model.params['max_ngram'] = 3 >>> model.params['use_crossmatch'] = True >>> model.params['kernel_num'] = 11 >>> model.params['sigma'] = 0.1 >>> model.params['exact_sigma'] = 0.001 >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build()
-
build
()¶ Build model.
-
get_default_params
()¶ Get default parameters.
-
matchzoo.models.dense_baseline module¶
A simple densely connected baseline model.
-
class
matchzoo.models.dense_baseline.
DenseBaseline
(params=None, backend=None)¶ Bases:
matchzoo.engine.base_model.BaseModel
A simple densely connected baseline model.
Examples
>>> model = DenseBaseline() >>> model.params['mlp_num_layers'] = 2 >>> model.params['mlp_num_units'] = 300 >>> model.params['mlp_num_fan_out'] = 128 >>> model.params['mlp_activation_func'] = 'relu' >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build() >>> model.compile()
-
build
()¶ Model structure.
-
classmethod
get_default_params
()¶ Return type: ParamTable
Returns: model default parameters.
-
matchzoo.models.drmm module¶
An implementation of DRMM Model.
-
class
matchzoo.models.drmm.
DRMM
(params=None, backend=None)¶ Bases:
matchzoo.engine.base_model.BaseModel
DRMM Model.
Examples
>>> model = DRMM() >>> model.params['mlp_num_layers'] = 1 >>> model.params['mlp_num_units'] = 5 >>> model.params['mlp_num_fan_out'] = 1 >>> model.params['mlp_activation_func'] = 'tanh' >>> model.guess_and_fill_missing_params(verbose=0) >>> model.build() >>> model.compile()
-
classmethod
attention_layer
(attention_input, attention_mask=None)¶ Performs attention on the input.
Parameters: - attention_input (
Any
) – The input tensor for attention layer. - attention_mask (
Optional
[Any
]) – A tensor to mask the invalid values.
Return type: Layer
Returns: The masked output tensor.
-
build
()¶ Build model structure.
-
classmethod
get_default_params
()¶ Return type: ParamTable
Returns: model default parameters.
matchzoo.models.drmmtks module¶
An implementation of DRMMTKS Model.
-
class
matchzoo.models.drmmtks.
DRMMTKS
(params=None, backend=None)¶ Bases:
matchzoo.engine.base_model.BaseModel
DRMMTKS Model.
Examples
>>> model = DRMMTKS()
>>> model.params['embedding_input_dim'] = 10000
>>> model.params['embedding_output_dim'] = 100
>>> model.params['top_k'] = 20
>>> model.params['mlp_num_layers'] = 1
>>> model.params['mlp_num_units'] = 5
>>> model.params['mlp_num_fan_out'] = 1
>>> model.params['mlp_activation_func'] = 'tanh'
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
-
classmethod
attention_layer
(attention_input, attention_mask=None)¶ Performs attention on the input.
Parameters: - attention_input (
Any
) – The input tensor for attention layer. - attention_mask (
Optional
[Any
]) – A tensor to mask the invalid values.
Return type: Layer
Returns: The masked output tensor.
-
build
()¶ Build model structure.
-
classmethod
get_default_params
()¶ Return type: ParamTable
Returns: model default parameters.
matchzoo.models.dssm module¶
An implementation of DSSM, Deep Structured Semantic Model.
-
class
matchzoo.models.dssm.
DSSM
(params=None, backend=None)¶ Bases:
matchzoo.engine.base_model.BaseModel
Deep structured semantic model.
Examples
>>> model = DSSM()
>>> model.params['mlp_num_layers'] = 3
>>> model.params['mlp_num_units'] = 300
>>> model.params['mlp_num_fan_out'] = 128
>>> model.params['mlp_activation_func'] = 'relu'
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
-
build
()¶ Build model structure.
DSSM uses a Siamese architecture.
-
classmethod
get_default_params
()¶ Return type: ParamTable
Returns: model default parameters.
-
classmethod
get_default_preprocessor
()¶ Returns: Default preprocessor.
-
matchzoo.models.duet module¶
DUET Model.
-
class
matchzoo.models.duet.
DUET
(params=None, backend=None)¶ Bases:
matchzoo.engine.base_model.BaseModel
DUET Model.
Examples
>>> model = DUET()
>>> model.params['embedding_input_dim'] = 1000
>>> model.params['embedding_output_dim'] = 300
>>> model.params['lm_filters'] = 32
>>> model.params['lm_hidden_sizes'] = [64, 32]
>>> model.params['dropout_rate'] = 0.5
>>> model.params['dm_filters'] = 32
>>> model.params['dm_kernel_size'] = 3
>>> model.params['dm_d_mpool'] = 4
>>> model.params['dm_hidden_sizes'] = [64, 32]
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
-
build
()¶ Build model.
-
classmethod
get_default_params
()¶ Get default parameters.
-
matchzoo.models.knrm module¶
KNRM model.
-
class
matchzoo.models.knrm.
KNRM
(params=None, backend=None)¶ Bases:
matchzoo.engine.base_model.BaseModel
KNRM model.
Examples
>>> model = KNRM()
>>> model.params['embedding_input_dim'] = 10000
>>> model.params['embedding_output_dim'] = 10
>>> model.params['embedding_trainable'] = True
>>> model.params['kernel_num'] = 11
>>> model.params['sigma'] = 0.1
>>> model.params['exact_sigma'] = 0.001
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
-
build
()¶ Build model.
-
classmethod
get_default_params
()¶ Get default parameters.
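KNRM's kernel_num, sigma and exact_sigma parameters drive a kernel-pooling step: RBF kernels with evenly spaced means soft-count how many query/document similarities fall near each mean, and one very narrow kernel at 1.0 captures exact matches. The NumPy sketch below illustrates that computation on a precomputed cosine-similarity matrix. The centre formula, the log1p soft-count aggregation and the helper names kernel_mus and kernel_pooling are assumptions for illustration, not the library's Keras code.

```python
import numpy as np


def kernel_mus(kernel_num):
    """Evenly spaced RBF centres; the last one is clamped to exactly 1.0."""
    mus = []
    for i in range(kernel_num):
        mu = 1.0 / (kernel_num - 1) + (2.0 * i) / (kernel_num - 1) - 1.0
        mus.append(min(mu, 1.0))
    return mus


def kernel_pooling(sim_matrix, kernel_num=11, sigma=0.1, exact_sigma=0.001):
    """Turn a [query_len, doc_len] similarity matrix into kernel_num features."""
    sim = np.asarray(sim_matrix, dtype=float)
    features = []
    for mu in kernel_mus(kernel_num):
        # The exact-match kernel at mu == 1.0 uses the much narrower width.
        width = exact_sigma if mu == 1.0 else sigma
        k = np.exp(-0.5 * (sim - mu) ** 2 / width ** 2)
        # Soft count per query term over the document, log1p, then sum
        # over query terms.
        features.append(np.log1p(k.sum(axis=1)).sum())
    return np.array(features)


feats = kernel_pooling([[1.0, 0.2, -0.3], [0.5, 1.0, 0.0]])
print(feats.shape)  # (11,)
```

With kernel_num=11 the centres come out as -0.9, -0.7, ..., 0.9 plus the exact-match centre 1.0, so each feature summarises one similarity band.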
-
matchzoo.models.match_pyramid module¶
An implementation of MatchPyramid Model.
-
class
matchzoo.models.match_pyramid.
MatchPyramid
(params=None, backend=None)¶ Bases:
matchzoo.engine.base_model.BaseModel
MatchPyramid Model.
Examples
>>> model = MatchPyramid()
>>> model.params['embedding_output_dim'] = 300
>>> model.params['num_blocks'] = 2
>>> model.params['kernel_count'] = [16, 32]
>>> model.params['kernel_size'] = [[3, 3], [3, 3]]
>>> model.params['dpool_size'] = [3, 10]
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
-
build
()¶ Build model structure.
MatchPyramid treats text matching as image recognition.
-
classmethod
get_default_params
()¶ Return type: ParamTable
Returns: model default parameters.
-
matchzoo.models.mvlstm module¶
An implementation of MVLSTM Model.
-
class
matchzoo.models.mvlstm.
MVLSTM
(params=None, backend=None)¶ Bases:
matchzoo.engine.base_model.BaseModel
MVLSTM Model.
Examples
>>> model = MVLSTM()
>>> model.params['lstm_units'] = 32
>>> model.params['top_k'] = 50
>>> model.params['mlp_num_layers'] = 2
>>> model.params['mlp_num_units'] = 20
>>> model.params['mlp_num_fan_out'] = 10
>>> model.params['mlp_activation_func'] = 'relu'
>>> model.params['dropout_rate'] = 0.5
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
-
build
()¶ Build model structure.
-
classmethod
get_default_params
()¶ Return type: ParamTable
Returns: model default parameters.
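The top_k parameter of MVLSTM keeps only the strongest interactions between the two texts before they reach the MLP. A minimal NumPy sketch of that pooling step, assuming a plain dot-product interaction between per-position hidden states; the real model's matching function and the helper name top_k_interactions are assumptions here.

```python
import numpy as np


def top_k_interactions(left_states, right_states, top_k):
    """Return the top_k largest left/right interaction scores, descending.

    left_states:  [left_len, hidden], e.g. BiLSTM outputs for the left text.
    right_states: [right_len, hidden], e.g. BiLSTM outputs for the right text.
    """
    left = np.asarray(left_states, dtype=float)
    right = np.asarray(right_states, dtype=float)
    # Dot-product interaction matrix of shape [left_len, right_len].
    interactions = left @ right.T
    # Flatten, sort descending, keep the k strongest signals.
    return np.sort(interactions.ravel())[::-1][:top_k]


print(top_k_interactions([[1, 0], [0, 1]], [[2, 0], [0, 3], [1, 1]], top_k=3))
# [3. 2. 1.]
```

Because the output always has exactly top_k values, the following dense layers see a fixed-size input regardless of text lengths.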
-
matchzoo.models.naive module¶
Naive model with the simplest structure, for testing purposes.
-
class
matchzoo.models.naive.
Naive
(params=None, backend=None)¶ Bases:
matchzoo.engine.base_model.BaseModel
Naive model with the simplest structure, for testing purposes.
Bare minimum functioning model. The best choice to get things rolling. The worst choice to fit and evaluate performance.
-
build
()¶ Build.
-
classmethod
get_default_params
()¶ Default parameters.
-
matchzoo.models.parameter_readme_generator module¶
matchzoo/models/README.md generator.
matchzoo.preprocessors package¶
Subpackages¶
-
class
matchzoo.preprocessors.units.digit_removal.
DigitRemoval
¶ Bases:
matchzoo.preprocessors.units.unit.Unit
Process unit to remove digits.
-
transform
(input_)¶ Remove digits from list of tokens.
Parameters: input – list of tokens to be filtered. Return tokens: list of tokens without digits. Return type: list
-
-
class
matchzoo.preprocessors.units.fixed_length.
FixedLength
(text_length, pad_value=0, pad_mode='pre', truncate_mode='pre')¶ Bases:
matchzoo.preprocessors.units.unit.Unit
FixedLength unit class.
Process unit that pads or truncates text to a fixed length.
Examples
>>> from matchzoo.preprocessors.units import FixedLength
>>> fixedlen = FixedLength(3)
>>> fixedlen.transform(list(range(1, 6))) == [3, 4, 5]
True
>>> fixedlen.transform(list(range(1, 3))) == [0, 1, 2]
True
-
transform
(input_)¶ Transform list of tokenized tokens into the fixed length text.
Parameters: input – list of tokenized tokens. Return tokens: list of tokenized tokens in fixed length. Return type: list
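The padding and truncation behaviour shown in the example above can be sketched in pure Python. This is an illustration rather than the unit's source: fixed_length is a hypothetical helper, and it assumes that any value other than 'pre' for pad_mode or truncate_mode means 'post'.

```python
def fixed_length(tokens, text_length, pad_value=0, pad_mode='pre',
                 truncate_mode='pre'):
    """Pad or truncate `tokens` to exactly `text_length` items (sketch)."""
    tokens = list(tokens)
    if len(tokens) > text_length:
        if truncate_mode == 'pre':
            # Drop surplus items from the front, keeping the tail.
            tokens = tokens[len(tokens) - text_length:]
        else:
            # Drop surplus items from the end, keeping the head.
            tokens = tokens[:text_length]
    padding = [pad_value] * (text_length - len(tokens))
    # 'pre' puts the padding in front of the tokens, otherwise after them.
    return padding + tokens if pad_mode == 'pre' else tokens + padding


print(fixed_length(range(1, 6), 3))  # [3, 4, 5]
print(fixed_length(range(1, 3), 3))  # [0, 1, 2]
```

Slicing with len(tokens) - text_length rather than a negative index keeps the sketch correct even when text_length is 0.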
-
-
class
matchzoo.preprocessors.units.frequency_filter.
FrequencyFilter
(low=0, high=inf, mode='df')¶ Bases:
matchzoo.preprocessors.units.stateful_unit.StatefulUnit
Frequency filter unit.
Parameters: - low (
float
) – Lower bound, inclusive. - high (
float
) – Upper bound, exclusive. - mode (
str
) – One of tf (term frequency), df (document frequency), and idf (inverse document frequency).
- Examples:
>>> import matchzoo as mz
- To filter based on term frequency (tf):
>>> tf_filter = mz.preprocessors.units.FrequencyFilter(
...     low=2, mode='tf')
>>> tf_filter.fit([['A', 'B', 'B'], ['C', 'C', 'C']])
>>> tf_filter.transform(['A', 'B', 'C'])
['B', 'C']
- To filter based on document frequency (df):
>>> df_filter = mz.preprocessors.units.FrequencyFilter(
...     low=2, mode='df')
>>> df_filter.fit([['A', 'B'], ['B', 'C']])
>>> df_filter.transform(['A', 'B', 'C'])
['B']
- To filter based on inverse document frequency (idf):
>>> idf_filter = mz.preprocessors.units.FrequencyFilter(
...     low=1.2, mode='idf')
>>> idf_filter.fit([['A', 'B'], ['B', 'C', 'D']])
>>> idf_filter.transform(['A', 'B', 'C'])
['A', 'C']
-
fit
(list_of_tokens)¶ Fit list_of_tokens by computing the term statistics required by mode.
-
transform
(input_)¶ Transform a list of tokens by filtering out unwanted words.
Return type: list
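The tf and df modes can be sketched in a few lines of pure Python. This is an illustration, not the unit's source: fit_frequencies and frequency_filter are hypothetical helpers, idf is omitted because its exact formula is not given here, and unseen tokens are assumed to count as 0. The bounds follow the documented semantics (low inclusive, high exclusive), and on the tf and df inputs from the examples above the sketch returns the same results.

```python
from collections import Counter


def fit_frequencies(list_of_tokens, mode='df'):
    """Corpus statistics for the 'tf' or 'df' mode (hypothetical helper)."""
    counts = Counter()
    for doc in list_of_tokens:
        # tf counts every occurrence; df counts a term once per document.
        counts.update(doc if mode == 'tf' else set(doc))
    return counts


def frequency_filter(tokens, stats, low=0, high=float('inf')):
    """Keep tokens whose statistic lies in [low, high); unseen tokens count 0."""
    return [t for t in tokens if low <= stats.get(t, 0) < high]


tf_stats = fit_frequencies([['A', 'B', 'B'], ['C', 'C', 'C']], mode='tf')
print(frequency_filter(['A', 'B', 'C'], tf_stats, low=2))  # ['B', 'C']
df_stats = fit_frequencies([['A', 'B'], ['B', 'C']], mode='df')
print(frequency_filter(['A', 'B', 'C'], df_stats, low=2))  # ['B']
```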
-
class
matchzoo.preprocessors.units.lemmatization.
Lemmatization
¶ Bases:
matchzoo.preprocessors.units.unit.Unit
Process unit for token lemmatization.
-
transform
(input_)¶ Lemmatize a sequence of tokens.
Parameters: input – list of tokens to be lemmatized. Return tokens: list of lemmatized tokens. Return type: list
-
-
class
matchzoo.preprocessors.units.lowercase.
Lowercase
¶ Bases:
matchzoo.preprocessors.units.unit.Unit
Process unit for lowercasing text.
-
transform
(input_)¶ Convert list of tokens to lower case.
Parameters: input – list of tokens. Return tokens: lower-cased list of tokens. Return type: list
-
-
class
matchzoo.preprocessors.units.matching_histogram.
MatchingHistogram
(bin_size=30, embedding_matrix=None, normalize=True, mode='LCH')¶ Bases:
matchzoo.preprocessors.units.unit.Unit
MatchingHistogram unit class.
Parameters: - bin_size (
int
) – The number of bins of the matching histogram. - embedding_matrix – The word embedding matrix applied to calculate the matching histogram.
- normalize – Boolean, whether to normalize the embeddings.
- mode (
str
) – The type of the histogram; it should be one of ‘CH’, ‘NG’, or ‘LCH’.
Examples
>>> import numpy as np
>>> from matchzoo.preprocessors.units import MatchingHistogram
>>> embedding_matrix = np.array([[1.0, -1.0], [1.0, 2.0], [1.0, 3.0]])
>>> text_left = [0, 1]
>>> text_right = [1, 2]
>>> histogram = MatchingHistogram(3, embedding_matrix, True, 'CH')
>>> histogram.transform([text_left, text_right])
[[3.0, 1.0, 1.0], [1.0, 2.0, 2.0]]
-
transform
(input_)¶ Transform the input text.
Return type: list
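A matching histogram can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the unit's source: each bin starts at 1 (so the 'LCH' logarithm is always defined), a similarity s maps to bin int((s + 1) / 2 * (bin_size - 1)), and 'LCH' is the log of the counts. Only 'CH' and 'LCH' are sketched; any other mode falls back to raw counts here. The helper name matching_histogram is hypothetical.

```python
import numpy as np


def matching_histogram(text_left, text_right, embedding_matrix, bin_size=30,
                       normalize=True, mode='LCH'):
    """Per left term, histogram its cosine similarities to the right terms."""
    emb = np.asarray(embedding_matrix, dtype=float)
    if normalize:
        # Unit-normalise rows so dot products become cosine similarities.
        emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb[text_left] @ emb[text_right].T  # [len(left), len(right)]
    # Start every bin at 1 so the log in 'LCH' mode is always defined.
    hist = np.ones((len(text_left), bin_size))
    for (i, _), value in np.ndenumerate(sim):
        # Map a similarity in [-1, 1] to a bin index in [0, bin_size - 1].
        hist[i, int((value + 1.0) / 2.0 * (bin_size - 1.0))] += 1
    if mode == 'LCH':
        hist = np.log(hist)  # logarithm of the count histogram
    return hist.tolist()


emb = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(matching_histogram([0, 1], [0, 0, 2], emb, bin_size=3, mode='CH'))
# [[2.0, 1.0, 3.0], [1.0, 4.0, 1.0]]
```

The toy embeddings are exactly unit length, so every similarity is exactly -1, 0 or 1 and lands in a predictable bin; with arbitrary vectors, a similarity that should equal 1.0 can round slightly below it in floating point and fall into the neighbouring bin.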
-
class
matchzoo.preprocessors.units.ngram_letter.
NgramLetter
(ngram=3, reduce_dim=True)¶ Bases:
matchzoo.preprocessors.units.unit.Unit
Process unit for n-letter generation.
Tri-letters are used by the DSSM model. This unit is expected to run before the Vocabulary is built.
Examples
>>> triletter = NgramLetter()
>>> rv = triletter.transform(['hello', 'word'])
>>> len(rv)
9
>>> rv
['#he', 'hel', 'ell', 'llo', 'lo#', '#wo', 'wor', 'ord', 'rd#']
>>> triletter = NgramLetter(reduce_dim=False)
>>> rv = triletter.transform(['hello', 'word'])
>>> len(rv)
2
>>> rv
[['#he', 'hel', 'ell', 'llo', 'lo#'], ['#wo', 'wor', 'ord', 'rd#']]
-
transform
(input_)¶ Transform tokens into letter n-grams (tri-letters by default).
For example, word should be represented as #wo, wor, ord and rd#.
Parameters: input – list of tokens to be transformed. Return n_letters: generated n_letters. Return type: list
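A minimal pure-Python sketch of the letter n-gram split, assuming '#' marks word boundaries as in the examples above. ngram_letters is a hypothetical helper, not the unit's implementation; on ['hello', 'word'] it yields the same nine tri-letters shown in the NgramLetter example.

```python
def ngram_letters(tokens, ngram=3, reduce_dim=True):
    """Split each token, wrapped in '#' boundaries, into letter n-grams."""
    result = []
    for token in tokens:
        padded = '#' + token + '#'
        grams = [padded[i:i + ngram] for i in range(len(padded) - ngram + 1)]
        if reduce_dim:
            # Flatten all n-grams of all tokens into one list.
            result.extend(grams)
        else:
            # Keep one list of n-grams per token.
            result.append(grams)
    return result


print(ngram_letters(['hello', 'word']))
# ['#he', 'hel', 'ell', 'llo', 'lo#', '#wo', 'wor', 'ord', 'rd#']
```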
-
-
class
matchzoo.preprocessors.units.punc_removal.
PuncRemoval
¶ Bases:
matchzoo.preprocessors.units.unit.Unit
Process unit for removing punctuation.
-
transform
(input_)¶ Remove punctuation from a list of tokens.
Parameters: input – list of tokens. Return rv: tokens without punctuation. Return type: list
-
-
class
matchzoo.preprocessors.units.stateful_unit.
StatefulUnit
¶ Bases:
matchzoo.preprocessors.units.unit.Unit
Unit with inner state.
It usually needs to be fit before transforming. All information gathered in the fit phase is stored in its context.
-
context
¶ Get current context. Same as unit.state.
-
fit
(input_)¶ Abstract base method; needs to be implemented in subclasses.
-
state
¶ Get current context. Same as unit.context.
Deprecated since v2.2.0 and will be removed in the future. Use unit.context instead.
-
-
class
matchzoo.preprocessors.units.stemming.
Stemming
(stemmer='porter')¶ Bases:
matchzoo.preprocessors.units.unit.Unit
Process unit for token stemming.
Parameters: stemmer – stemmer to use, ‘porter’ or ‘lancaster’.
-
transform
(input_)¶ Reduce inflected words to their word stem, base or root form.
Parameters: input – list of strings to be stemmed. Return type: list
-
-
class
matchzoo.preprocessors.units.stop_removal.
StopRemoval
(lang='english')¶ Bases:
matchzoo.preprocessors.units.unit.Unit
Process unit to remove stop words.
Example
>>> unit = StopRemoval()
>>> unit.transform(['a', 'the', 'test'])
['test']
>>> type(unit.stopwords)
<class 'list'>
-
stopwords
¶ Get stopwords based on language.
Params lang: language code. Return type: list
Returns: list of stop words.
-
transform
(input_)¶ Remove stopwords from list of tokenized tokens.
Parameters: - input – list of tokenized tokens.
- lang – language code for stopwords.
Return tokens: list of tokenized tokens without stopwords.
Return type: list
-
-
class
matchzoo.preprocessors.units.tokenize.
Tokenize
¶ Bases:
matchzoo.preprocessors.units.unit.Unit
Process unit for text tokenization.
-
transform
(input_)¶ Process input data from raw terms to list of tokens.
Parameters: input – raw textual input. Return tokens: tokenized tokens as a list. Return type: list
-
-
class
matchzoo.preprocessors.units.vocabulary.
Vocabulary
(pad_value='<PAD>', oov_value='<OOV>')¶ Bases:
matchzoo.preprocessors.units.stateful_unit.StatefulUnit
Vocabulary class.
Parameters: - pad_value (
str
) – The string value for the padding position. - oov_value (
str
) – The string value for the out-of-vocabulary terms.
Examples
>>> vocab = Vocabulary(pad_value='[PAD]', oov_value='[OOV]')
>>> vocab.fit(['A', 'B', 'C', 'D', 'E'])
>>> term_index = vocab.context['term_index']
>>> term_index  # doctest: +SKIP
{'[PAD]': 0, '[OOV]': 1, 'D': 2, 'A': 3, 'B': 4, 'C': 5, 'E': 6}
>>> index_term = vocab.context['index_term']
>>> index_term  # doctest: +SKIP
{0: '[PAD]', 1: '[OOV]', 2: 'D', 3: 'A', 4: 'B', 5: 'C', 6: 'E'}
>>> term_index['out-of-vocabulary-term']
1
>>> index_term[0]
'[PAD]'
>>> index_term[42]
Traceback (most recent call last):
    ...
KeyError: 42
>>> a_index = term_index['A']
>>> c_index = term_index['C']
>>> vocab.transform(['C', 'A', 'C']) == [c_index, a_index, c_index]
True
>>> vocab.transform(['C', 'A', '[OOV]']) == [c_index, a_index, 1]
True
>>> indices = vocab.transform(list('ABCDDZZZ'))
>>> ' '.join(vocab.context['index_term'][i] for i in indices)
'A B C D D [OOV] [OOV] [OOV]'
-
class
TermIndex
¶ Bases:
dict
Map term to index.
-
transform
(input_)¶ Transform a list of tokens to corresponding indices.
Return type: list
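The index layout described above (padding at 0, out-of-vocabulary at 1, unknown lookups falling back to 1) can be sketched with a dict subclass. This is a hypothetical stand-in, not the library's TermIndex: OOVDefaultIndex and fit_vocabulary are illustrative names, and terms are sorted only to make the order deterministic, whereas the real unit's order is unspecified (hence the +SKIP directives above).

```python
class OOVDefaultIndex(dict):
    """Hypothetical term -> index map: unknown terms map to OOV index 1."""

    def __missing__(self, key):
        return 1


def fit_vocabulary(tokens, pad_value='<PAD>', oov_value='<OOV>'):
    """Reserve 0 for padding and 1 for OOV, then index the remaining terms."""
    # Sorting is only for a deterministic order in this sketch.
    terms = sorted(set(tokens) - {pad_value, oov_value})
    term_index = OOVDefaultIndex({pad_value: 0, oov_value: 1})
    for index, term in enumerate(terms, start=2):
        term_index[term] = index
    index_term = {index: term for term, index in term_index.items()}
    return term_index, index_term


term_index, index_term = fit_vocabulary(list('ABCDE'), '[PAD]', '[OOV]')
indices = [term_index[t] for t in 'ABCDDZZZ']
print(' '.join(index_term[i] for i in indices))  # A B C D D [OOV] [OOV] [OOV]
```

index_term is a plain dict, so an unknown index still raises KeyError, matching the documented behaviour.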
-
class
matchzoo.preprocessors.units.word_hashing.
WordHashing
(term_index)¶ Bases:
matchzoo.preprocessors.units.unit.Unit
Word-hashing layer for DSSM-based models.
The input of WordHashing should be a list of per-word letter n-gram lists extracted from one document. The output is the word-hashing representation of this document. NgramLetter and Vocabulary are two essential prerequisites of WordHashing.
Examples
>>> letters = [['#te', 'tes','est', 'st#'], ['oov']]
>>> word_hashing = WordHashing(
...     term_index={
...      '_PAD': 0, 'OOV': 1, 'st#': 2, '#te': 3, 'est': 4, 'tes': 5
...     })
>>> hashing = word_hashing.transform(letters)
>>> hashing[0]
[0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
>>> hashing[1]
[0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
-
transform
(input_)¶ Transform a list of letters into its word-hashing representation.
Parameters: input – list of tri-letters generated by NgramLetter.
Return type: list
Returns: Word hashing representation of tri-letters.
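Word hashing is a bag-of-letter-n-grams vector over the vocabulary. The pure-Python sketch below reproduces the output of the WordHashing example above, but it is an illustration under assumptions, not the unit's code: word_hashing is a hypothetical helper, term_index values are assumed to be the contiguous ids 0..len-1, unknown n-grams are counted in slot 1, and repeated n-grams accumulate counts.

```python
from collections import Counter


def word_hashing(letters, term_index, oov_index=1):
    """Bag-of-letter-n-gram vector(s) over `term_index` (hypothetical helper).

    A flat list of n-grams gives one vector; a list of per-word lists
    gives one vector per word.
    """
    def hash_one(grams):
        vector = [0.0] * len(term_index)
        for gram, count in Counter(grams).items():
            # Unknown n-grams are counted in the OOV slot.
            vector[term_index.get(gram, oov_index)] += count
        return vector

    if letters and isinstance(letters[0], list):
        return [hash_one(word) for word in letters]
    return hash_one(letters)


term_index = {'_PAD': 0, 'OOV': 1, 'st#': 2, '#te': 3, 'est': 4, 'tes': 5}
print(word_hashing([['#te', 'tes', 'est', 'st#'], ['oov']], term_index))
# [[0.0, 0.0, 1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]]
```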
-
Submodules¶
matchzoo.preprocessors.basic_preprocessor module¶
Basic Preprocessor.
-
class
matchzoo.preprocessors.basic_preprocessor.
BasicPreprocessor
(fixed_length_left=30, fixed_length_right=30, filter_mode='df', filter_low_freq=2, filter_high_freq=inf, remove_stop_words=False)¶ Bases:
matchzoo.engine.base_preprocessor.BasePreprocessor
Basic preprocessor helper.
Parameters:
- fixed_length_left (int) – Maximum length of the left text in the data_pack.
- fixed_length_right (int) – Maximum length of the right text in the data_pack.
- filter_mode (str) – Mode used by FrequencyFilter; one of ‘tf’, ‘df’, and ‘idf’.
- filter_low_freq (float) – Lower bound used by FrequencyFilter.
- filter_high_freq (float) – Upper bound used by FrequencyFilter.
- remove_stop_words (bool) – Whether to apply the StopRemoval unit.
Example
>>> import matchzoo as mz
>>> train_data = mz.datasets.toy.load_data('train')
>>> test_data = mz.datasets.toy.load_data('test')
>>> preprocessor = mz.preprocessors.BasicPreprocessor(
...     fixed_length_left=10,
...     fixed_length_right=20,
...     filter_mode='df',
...     filter_low_freq=2,
...     filter_high_freq=1000,
...     remove_stop_words=True
... )
>>> preprocessor = preprocessor.fit(train_data, verbose=0)
>>> preprocessor.context['input_shapes']
[(10,), (20,)]
>>> preprocessor.context['vocab_size']
226
>>> processed_train_data = preprocessor.transform(train_data,
...                                               verbose=0)
>>> type(processed_train_data)
<class 'matchzoo.data_pack.data_pack.DataPack'>
>>> test_data_transformed = preprocessor.transform(test_data,
...                                                verbose=0)
>>> type(test_data_transformed)
<class 'matchzoo.data_pack.data_pack.DataPack'>
matchzoo.preprocessors.build_unit_from_data_pack module¶
Build unit from data pack.
-
matchzoo.preprocessors.build_unit_from_data_pack.
build_unit_from_data_pack
(unit, data_pack, mode='both', flatten=True, verbose=1)¶ Build a StatefulUnit from a DataPack object.
Parameters:
- unit (StatefulUnit) – StatefulUnit object to be built.
- data_pack (DataPack) – The input DataPack object.
- mode (str) – One of ‘left’, ‘right’, and ‘both’, to determine the source data for building the unit.
- flatten (bool) – Whether to flatten the data pack. True organizes the DataPack text as a single list; False organizes it as a list of lists.
- verbose (int) – Verbosity.
Return type: StatefulUnit
Returns: A built StatefulUnit object.
matchzoo.preprocessors.build_vocab_unit module¶
-
matchzoo.preprocessors.build_vocab_unit.
build_vocab_unit
(data_pack, mode='both', verbose=1)¶ Build a preprocessor.units.Vocabulary given data_pack. The data_pack should be preprocessed beforehand, and each item in the text_left and text_right columns of the data_pack should be a list of tokens.
Parameters:
- data_pack (DataPack) – The DataPack to build the vocabulary upon.
- mode (str) – One of ‘left’, ‘right’, and ‘both’, to determine the source data for building the Vocabulary.
- verbose (int) – Verbosity.
Return type: Vocabulary
Returns: A built vocabulary unit.
matchzoo.preprocessors.cdssm_preprocessor module¶
CDSSM Preprocessor.
-
class
matchzoo.preprocessors.cdssm_preprocessor.
CDSSMPreprocessor
(fixed_length_left=10, fixed_length_right=40, with_word_hashing=True)¶ Bases:
matchzoo.engine.base_preprocessor.BasePreprocessor
CDSSM Model preprocessor.
-
fit
(data_pack, verbose=1)¶ Fit pre-processing context for transformation.
Parameters: - verbose (
int
) – Verbosity. - data_pack (
DataPack
) – DataPack to be preprocessed.
Returns: CDSSMPreprocessor instance.
-
transform
(data_pack, verbose=1)¶ Apply transformation on data, create letter-ngram representation.
Parameters: - data_pack (
DataPack
) – Inputs to be preprocessed. - verbose (
int
) – Verbosity.
Return type: DataPack
Returns: Transformed data as a DataPack object.
-
with_word_hashing
¶ with_word_hashing getter.
-
matchzoo.preprocessors.chain_transform module¶
Wrapper function that chains a number of transform functions.
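Chaining transforms means feeding each unit's output into the next unit. A minimal sketch of that idea, assuming each unit exposes a transform method as the units in this package do; the chain_transform below is an illustration rather than the module's source, and WhitespaceTokenize and LowercaseTokens are hypothetical stand-in units.

```python
from functools import reduce


def chain_transform(units):
    """Return a function that feeds its input through each unit in order."""
    def wrapper(arg):
        # Each unit's output becomes the next unit's input.
        return reduce(lambda acc, unit: unit.transform(acc), units, arg)
    return wrapper


class WhitespaceTokenize:
    """Hypothetical stand-in unit: split raw text on whitespace."""

    def transform(self, text):
        return text.split()


class LowercaseTokens:
    """Hypothetical stand-in unit: lowercase every token."""

    def transform(self, tokens):
        return [token.lower() for token in tokens]


pipeline = chain_transform([WhitespaceTokenize(), LowercaseTokens()])
print(pipeline('Hello MatchZoo World'))  # ['hello', 'matchzoo', 'world']
```

Wrapping the chain in a single callable lets it be applied per text, for example to every entry of a text column.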
matchzoo.preprocessors.dssm_preprocessor module¶
DSSM Preprocessor.
-
class
matchzoo.preprocessors.dssm_preprocessor.
DSSMPreprocessor
(with_word_hashing=True)¶ Bases:
matchzoo.engine.base_preprocessor.BasePreprocessor
DSSM Model preprocessor.
-
fit
(data_pack, verbose=1)¶ Fit pre-processing context for transformation.
Parameters: - verbose (
int
) – Verbosity. - data_pack (
DataPack
) – data_pack to be preprocessed.
Returns: DSSMPreprocessor instance.
-
transform
(data_pack, verbose=1)¶ Apply transformation on data, create tri-letter representation.
Parameters: - data_pack (
DataPack
) – Inputs to be preprocessed. - verbose (
int
) – Verbosity.
Return type: DataPack
Returns: Transformed data as a DataPack object.
-
with_word_hashing
¶ with_word_hashing getter.
-
matchzoo.preprocessors.naive_preprocessor module¶
Naive Preprocessor.
-
class
matchzoo.preprocessors.naive_preprocessor.
NaivePreprocessor
¶ Bases:
matchzoo.engine.base_preprocessor.BasePreprocessor
Naive preprocessor.
Example
>>> import matchzoo as mz
>>> train_data = mz.datasets.toy.load_data()
>>> test_data = mz.datasets.toy.load_data(stage='test')
>>> preprocessor = mz.preprocessors.NaivePreprocessor()
>>> train_data_processed = preprocessor.fit_transform(train_data,
...                                                   verbose=0)
>>> type(train_data_processed)
<class 'matchzoo.data_pack.data_pack.DataPack'>
>>> test_data_transformed = preprocessor.transform(test_data,
...                                                verbose=0)
>>> type(test_data_transformed)
<class 'matchzoo.data_pack.data_pack.DataPack'>
matchzoo.tasks package¶
Submodules¶
matchzoo.tasks.classification module¶
Classification task.
-
class
matchzoo.tasks.classification.
Classification
(num_classes=2, **kwargs)¶ Bases:
matchzoo.engine.base_task.BaseTask
Classification task.
Examples
>>> classification_task = Classification(num_classes=2)
>>> classification_task.metrics = ['precision']
>>> classification_task.num_classes
2
>>> classification_task.output_shape
(2,)
>>> classification_task.output_dtype
<class 'int'>
>>> print(classification_task)
Classification Task with 2 classes
-
classmethod
list_available_losses
()¶ Return type: list
Returns: a list of available losses.
-
classmethod
list_available_metrics
()¶ Return type: list
Returns: a list of available metrics.
-
num_classes
¶ number of classes to classify.
Return type: int
-
output_dtype
¶ Target data type; the output is expected to be int.
-
output_shape
¶ output shape of a single sample of the task.
Return type: tuple
matchzoo.tasks.ranking module¶
Ranking task.
-
class
matchzoo.tasks.ranking.
Ranking
(loss=None, metrics=None)¶ Bases:
matchzoo.engine.base_task.BaseTask
Ranking Task.
Examples
>>> ranking_task = Ranking()
>>> ranking_task.metrics = ['map', 'ndcg']
>>> ranking_task.output_shape
(1,)
>>> ranking_task.output_dtype
<class 'float'>
>>> print(ranking_task)
Ranking Task
-
classmethod
list_available_losses
()¶ Return type: list
Returns: a list of available losses.
-
classmethod
list_available_metrics
()¶ Return type: list
Returns: a list of available metrics.
-
output_dtype
¶ Target data type; the output is expected to be float.
-
output_shape
¶ output shape of a single sample of the task.
Return type: tuple
Module contents¶
matchzoo.utils package¶
Submodules¶
matchzoo.utils.list_recursive_subclasses module¶
-
matchzoo.utils.list_recursive_subclasses.
list_recursive_concrete_subclasses
(base)¶ List all concrete subclasses of base recursively.
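One way to list concrete subclasses recursively is to walk __subclasses__() and drop classes that inspect.isabstract reports as abstract. The sketch below illustrates that approach; it is not necessarily how the MatchZoo helper is written, and the ToyBase hierarchy is hypothetical.

```python
import abc
import inspect


def _all_subclasses(base):
    """Yield every direct and indirect subclass of `base`."""
    for sub in base.__subclasses__():
        yield sub
        yield from _all_subclasses(sub)


def list_recursive_concrete_subclasses(base):
    """List all non-abstract subclasses of `base`, at any depth."""
    # dict.fromkeys drops duplicates (e.g. from diamond inheritance) in order.
    return [cls for cls in dict.fromkeys(_all_subclasses(base))
            if not inspect.isabstract(cls)]


class ToyBase(abc.ABC):
    @abc.abstractmethod
    def build(self):
        ...


class AbstractVariant(ToyBase):  # still abstract: build() not implemented
    pass


class ModelA(ToyBase):
    def build(self):
        return 'A'


class ModelB(AbstractVariant):
    def build(self):
        return 'B'


print(sorted(cls.__name__ for cls in list_recursive_concrete_subclasses(ToyBase)))
# ['ModelA', 'ModelB']
```

Recursing matters here: ModelB is only reachable through the abstract AbstractVariant, so a one-level __subclasses__() call would miss it.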
matchzoo.utils.make_keras_optimizer_picklable module¶
-
matchzoo.utils.make_keras_optimizer_picklable.
make_keras_optimizer_picklable
()¶ Fix https://github.com/NTMC-Community/MatchZoo/issues/726.
This function changes how Keras behaves; use it with caution.
matchzoo.utils.one_hot module¶
One hot vectors.
-
matchzoo.utils.one_hot.
one_hot
(indices, num_classes)¶ Return type: ndarray
Returns: A one-hot encoded vector.
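A common way to build one-hot vectors in NumPy is to index rows of an identity matrix. The sketch below uses that approach for illustration; whether MatchZoo's one_hot is written exactly this way is an assumption.

```python
import numpy as np


def one_hot(indices, num_classes):
    """One-hot encode an int (or a sequence of ints) via identity rows."""
    return np.eye(num_classes)[indices]


print(one_hot(2, 4))       # [0. 0. 1. 0.]
print(one_hot([0, 3], 4))  # one row per index, shape (2, 4)
```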
matchzoo.utils.tensor_type module¶
Define Keras tensor type.
Module contents¶
Submodules¶
matchzoo.version module¶
MatchZoo version file.
Module contents¶
MatchZoo Model Reference¶
Naive¶
Model Documentation¶
Naive model with the simplest structure, for testing purposes.
Bare minimum functioning model. The best choice to get things rolling. The worst choice to fit and evaluate performance.
Model Hyper Parameters¶
| | Name | Description | Default Value | Default Hyper-Space |
|---|---|---|---|---|
| 0 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class ‘matchzoo.models.naive.Naive’> | |
| 1 | input_shapes | Dependent on the model and data. Should be set manually. | | |
| 2 | task | Decides model output shape, loss, and metrics. | | |
| 3 | optimizer | | adam | choice in [‘adam’, ‘adagrad’, ‘rmsprop’] |
DSSM¶
Model Documentation¶
Deep structured semantic model.
- Examples:
>>> model = DSSM()
>>> model.params['mlp_num_layers'] = 3
>>> model.params['mlp_num_units'] = 300
>>> model.params['mlp_num_fan_out'] = 128
>>> model.params['mlp_activation_func'] = 'relu'
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
Model Hyper Parameters¶
| | Name | Description | Default Value | Default Hyper-Space |
|---|---|---|---|---|
| 0 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class ‘matchzoo.models.dssm.DSSM’> | |
| 1 | input_shapes | Dependent on the model and data. Should be set manually. | | |
| 2 | task | Decides model output shape, loss, and metrics. | | |
| 3 | optimizer | | adam | |
| 4 | with_multi_layer_perceptron | A flag of whether a multiple layer perceptron is used. Shouldn’t be changed. | True | |
| 5 | mlp_num_units | Number of units in first mlp_num_layers layers. | 128 | quantitative uniform distribution in [8, 256), with a step size of 8 |
| 6 | mlp_num_layers | Number of layers of the multiple layer perceptron. | 3 | quantitative uniform distribution in [1, 6), with a step size of 1 |
| 7 | mlp_num_fan_out | Number of units of the layer that connects the multiple layer perceptron and the output. | 64 | quantitative uniform distribution in [4, 128), with a step size of 4 |
| 8 | mlp_activation_func | Activation function used in the multiple layer perceptron. | relu | |
CDSSM¶
Model Documentation¶
CDSSM Model implementation.
Learning Semantic Representations Using Convolutional Neural Networks for Web Search. (2014a) A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval. (2014b)
- Examples:
>>> model = CDSSM()
>>> model.params['optimizer'] = 'adam'
>>> model.params['filters'] = 32
>>> model.params['kernel_size'] = 3
>>> model.params['conv_activation_func'] = 'relu'
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
Model Hyper Parameters¶
| | Name | Description | Default Value | Default Hyper-Space |
|---|---|---|---|---|
| 0 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class ‘matchzoo.models.cdssm.CDSSM’> | |
| 1 | input_shapes | Dependent on the model and data. Should be set manually. | | |
| 2 | task | Decides model output shape, loss, and metrics. | | |
| 3 | optimizer | | adam | |
| 4 | with_multi_layer_perceptron | A flag of whether a multiple layer perceptron is used. Shouldn’t be changed. | True | |
| 5 | mlp_num_units | Number of units in first mlp_num_layers layers. | 128 | quantitative uniform distribution in [8, 256), with a step size of 8 |
| 6 | mlp_num_layers | Number of layers of the multiple layer perceptron. | 3 | quantitative uniform distribution in [1, 6), with a step size of 1 |
| 7 | mlp_num_fan_out | Number of units of the layer that connects the multiple layer perceptron and the output. | 64 | quantitative uniform distribution in [4, 128), with a step size of 4 |
| 8 | mlp_activation_func | Activation function used in the multiple layer perceptron. | relu | |
| 9 | filters | Number of filters in the 1D convolution layer. | 32 | |
| 10 | kernel_size | Kernel size of the 1D convolution layer. | 3 | |
| 11 | strides | Strides in the 1D convolution layer. | 1 | |
| 12 | padding | The padding mode in the convolution layer. It should be one of same, valid, and causal. | same | |
| 13 | conv_activation_func | Activation function in the convolution layer. | relu | |
| 14 | w_initializer | | glorot_normal | |
| 15 | b_initializer | | zeros | |
| 16 | dropout_rate | The dropout rate. | 0.3 | |
DenseBaseline¶
Model Documentation¶
A simple densely connected baseline model.
- Examples:
>>> model = DenseBaseline()
>>> model.params['mlp_num_layers'] = 2
>>> model.params['mlp_num_units'] = 300
>>> model.params['mlp_num_fan_out'] = 128
>>> model.params['mlp_activation_func'] = 'relu'
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
>>> model.compile()
Model Hyper Parameters¶
| | Name | Description | Default Value | Default Hyper-Space |
|---|---|---|---|---|
| 0 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class ‘matchzoo.models.dense_baseline.DenseBaseline’> | |
| 1 | input_shapes | Dependent on the model and data. Should be set manually. | | |
| 2 | task | Decides model output shape, loss, and metrics. | | |
| 3 | optimizer | | adam | |
| 4 | with_multi_layer_perceptron | A flag of whether a multiple layer perceptron is used. Shouldn’t be changed. | True | |
| 5 | mlp_num_units | Number of units in first mlp_num_layers layers. | 256 | quantitative uniform distribution in [16, 512), with a step size of 1 |
| 6 | mlp_num_layers | Number of layers of the multiple layer perceptron. | 3 | quantitative uniform distribution in [1, 5), with a step size of 1 |
| 7 | mlp_num_fan_out | Number of units of the layer that connects the multiple layer perceptron and the output. | 64 | quantitative uniform distribution in [4, 128), with a step size of 4 |
| 8 | mlp_activation_func | Activation function used in the multiple layer perceptron. | relu | |
ArcI¶
Model Documentation¶
ArcI Model.
- Examples:
>>> model = ArcI()
>>> model.params['num_blocks'] = 1
>>> model.params['left_filters'] = [32]
>>> model.params['right_filters'] = [32]
>>> model.params['left_kernel_sizes'] = [3]
>>> model.params['right_kernel_sizes'] = [3]
>>> model.params['left_pool_sizes'] = [2]
>>> model.params['right_pool_sizes'] = [4]
>>> model.params['conv_activation_func'] = 'relu'
>>> model.params['mlp_num_layers'] = 1
>>> model.params['mlp_num_units'] = 64
>>> model.params['mlp_num_fan_out'] = 32
>>> model.params['mlp_activation_func'] = 'relu'
>>> model.params['dropout_rate'] = 0.5
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
Model Hyper Parameters¶
| | Name | Description | Default Value | Default Hyper-Space |
|---|---|---|---|---|
| 0 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class ‘matchzoo.models.arci.ArcI’> | |
| 1 | input_shapes | Dependent on the model and data. Should be set manually. | | |
| 2 | task | Decides model output shape, loss, and metrics. | | |
| 3 | optimizer | | adam | |
| 4 | with_embedding | A flag used to help the auto module. Shouldn’t be changed. | True | |
| 5 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | | |
| 6 | embedding_output_dim | Should be set manually. | | |
| 7 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | True | |
| 8 | with_multi_layer_perceptron | A flag of whether a multiple layer perceptron is used. Shouldn’t be changed. | True | |
| 9 | mlp_num_units | Number of units in first mlp_num_layers layers. | 128 | quantitative uniform distribution in [8, 256), with a step size of 8 |
| 10 | mlp_num_layers | Number of layers of the multiple layer perceptron. | 3 | quantitative uniform distribution in [1, 6), with a step size of 1 |
| 11 | mlp_num_fan_out | Number of units of the layer that connects the multiple layer perceptron and the output. | 64 | quantitative uniform distribution in [4, 128), with a step size of 4 |
| 12 | mlp_activation_func | Activation function used in the multiple layer perceptron. | relu | |
| 13 | num_blocks | Number of convolution blocks. | 1 | |
| 14 | left_filters | Number of filters in each convolution block for the left input. | [32] | |
| 15 | left_kernel_sizes | Kernel size of each convolution block for the left input. | [3] | |
| 16 | right_filters | Number of filters in each convolution block for the right input. | [32] | |
| 17 | right_kernel_sizes | Kernel size of each convolution block for the right input. | [3] | |
| 18 | conv_activation_func | The activation function in the convolution layer. | relu | |
| 19 | left_pool_sizes | Pooling size of each convolution block for the left input. | [2] | |
| 20 | right_pool_sizes | Pooling size of each convolution block for the right input. | [2] | |
| 21 | padding | The padding mode in the convolution layer. It should be one of same, valid, and causal. | same | choice in [‘same’, ‘valid’, ‘causal’] |
| 22 | dropout_rate | The dropout rate. | 0.0 | quantitative uniform distribution in [0.0, 0.8), with a step size of 0.01 |
ArcII¶
Model Documentation¶
ArcII Model.
- Examples:
>>> model = ArcII()
>>> model.params['embedding_output_dim'] = 300
>>> model.params['num_blocks'] = 2
>>> model.params['kernel_1d_count'] = 32
>>> model.params['kernel_1d_size'] = 3
>>> model.params['kernel_2d_count'] = [16, 32]
>>> model.params['kernel_2d_size'] = [[3, 3], [3, 3]]
>>> model.params['pool_2d_size'] = [[2, 2], [2, 2]]
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
Model Hyper Parameters
| | Name | Description | Default Value | Default Hyper-Space |
---|---|---|---|---|
0 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class ‘matchzoo.models.arcii.ArcII’> | |
1 | input_shapes | Dependent on the model and data. Should be set manually. | ||
2 | task | Decides model output shape, loss, and metrics. | ||
3 | optimizer | | adam | choice in [‘adam’, ‘rmsprop’, ‘adagrad’] |
4 | with_embedding | A flag used to help the auto module. Shouldn’t be changed. | True | |
5 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
6 | embedding_output_dim | Should be set manually. | ||
7 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | True | |
8 | num_blocks | Number of 2D convolution blocks. | 1 | |
9 | kernel_1d_count | Kernel count of 1D convolution layer. | 32 | |
10 | kernel_1d_size | Kernel size of 1D convolution layer. | 3 | |
11 | kernel_2d_count | Kernel count of the 2D convolution layer in each block. | [32] | |
12 | kernel_2d_size | Kernel size of 2D convolution layer in each block. | [[3, 3]] | |
13 | activation | Activation function. | relu | |
14 | pool_2d_size | Size of pooling layer in each block. | [[2, 2]] | |
15 | padding | The padding mode in the convolution layer. It should be one of same or valid. | same | choice in [‘same’, ‘valid’] |
16 | dropout_rate | The dropout rate. | 0.0 | quantitative uniform distribution in [0.0, 0.8), with a step size of 0.01 |
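`pool_2d_size` gives the pooling window applied to the 2D feature map in each block. The pure-Python sketch below shows what one max-pooling step does to a small map, assuming the stride equals the window size and that edge rows or columns which do not fill a whole window are dropped (Keras' default `'valid'` pooling). `max_pool_2d` is a hypothetical helper, not a MatchZoo function.

```python
def max_pool_2d(matrix, pool_size):
    """Non-overlapping 2D max pooling; incomplete edge windows are dropped."""
    pool_rows, pool_cols = pool_size
    n_rows, n_cols = len(matrix), len(matrix[0])
    pooled = []
    for i in range(0, n_rows - pool_rows + 1, pool_rows):
        pooled_row = []
        for j in range(0, n_cols - pool_cols + 1, pool_cols):
            window = (matrix[i + di][j + dj]
                      for di in range(pool_rows)
                      for dj in range(pool_cols))
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# A 4 x 4 feature map holding 0..15, pooled with the default [2, 2] window.
feature_map = [[4 * r + c for c in range(4)] for r in range(4)]
print(max_pool_2d(feature_map, (2, 2)))  # [[5, 7], [13, 15]]
```

With the default `'same'` convolution padding the convolutions keep the map size, so each [2, 2] block halves both dimensions (rounding down): two such blocks shrink a 20 x 40 map to 5 x 10.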
MatchPyramid
Model Documentation
MatchPyramid Model.
- Examples:
>>> model = MatchPyramid()
>>> model.params['embedding_output_dim'] = 300
>>> model.params['num_blocks'] = 2
>>> model.params['kernel_count'] = [16, 32]
>>> model.params['kernel_size'] = [[3, 3], [3, 3]]
>>> model.params['dpool_size'] = [3, 10]
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
Model Hyper Parameters
| | Name | Description | Default Value | Default Hyper-Space |
---|---|---|---|---|
0 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class ‘matchzoo.models.match_pyramid.MatchPyramid’> | |
1 | input_shapes | Dependent on the model and data. Should be set manually. | ||
2 | task | Decides model output shape, loss, and metrics. | ||
3 | optimizer | | adam | |
4 | with_embedding | A flag used to help the auto module. Shouldn’t be changed. | True | |
5 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
6 | embedding_output_dim | Should be set manually. | ||
7 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | True | |
8 | num_blocks | Number of convolution blocks. | 1 | |
9 | kernel_count | The kernel count of the 2D convolution of each block. | [32] | |
10 | kernel_size | The kernel size of the 2D convolution of each block. | [[3, 3]] | |
11 | activation | The activation function. | relu | |
12 | dpool_size | The max-pooling size of each block. | [3, 10] | |
13 | padding | The padding mode in the convolution layer. | same | |
14 | dropout_rate | The dropout rate. | 0.0 | quantitative uniform distribution in [0.0, 0.8), with a step size of 0.01 |
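MatchPyramid compares every left token with every right token to form a matching matrix, then pools that matrix down to the fixed size given by `dpool_size`, so texts of different lengths yield features of the same shape. The sketch below illustrates the idea with dot-product similarities and an adaptive max pooling that splits each axis into nearly equal segments. The segmenting rule is an assumption made for illustration; MatchZoo's own dynamic pooling layer may compute its pooling indices differently, and the helper names are hypothetical.

```python
def matching_matrix(left_vectors, right_vectors):
    """Dot-product similarity between every left and every right token vector."""
    return [[sum(a * b for a, b in zip(u, v)) for v in right_vectors]
            for u in left_vectors]


def segment_bounds(length, n_segments):
    """Split range(length) into n_segments nearly equal, non-empty slices."""
    # Start is floor(i * length / n); end uses ceiling division.
    return [(i * length // n_segments, -(-(i + 1) * length // n_segments))
            for i in range(n_segments)]


def adaptive_max_pool(matrix, out_rows, out_cols):
    """Pool a matrix of any size down to an out_rows x out_cols grid of maxima."""
    row_bounds = segment_bounds(len(matrix), out_rows)
    col_bounds = segment_bounds(len(matrix[0]), out_cols)
    return [[max(matrix[i][j] for i in range(r0, r1) for j in range(c0, c1))
             for c0, c1 in col_bounds]
            for r0, r1 in row_bounds]


left = [[1.0, 0.0], [0.0, 1.0]]
right = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(matching_matrix(left, right))  # [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]

grid = [[6 * r + c for c in range(6)] for r in range(4)]
print(adaptive_max_pool(grid, 2, 3))  # [[7, 9, 11], [19, 21, 23]]
```

With the default `dpool_size` of [3, 10], every left/right pair would be reduced to a 3 x 10 grid regardless of the text lengths; short texts simply reuse some cells across segments.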
KNRM
Model Documentation
KNRM model.
- Examples:
>>> model = KNRM()
>>> model.params['embedding_input_dim'] = 10000
>>> model.params['embedding_output_dim'] = 10
>>> model.params['embedding_trainable'] = True
>>> model.params['kernel_num'] = 11
>>> model.params['sigma'] = 0.1
>>> model.params['exact_sigma'] = 0.001
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
Model Hyper Parameters
| | Name | Description | Default Value | Default Hyper-Space |
---|---|---|---|---|
0 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class ‘matchzoo.models.knrm.KNRM’> | |
1 | input_shapes | Dependent on the model and data. Should be set manually. | ||
2 | task | Decides model output shape, loss, and metrics. | ||
3 | optimizer | | adam | |
4 | with_embedding | A flag used to help the auto module. Shouldn’t be changed. | True | |
5 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
6 | embedding_output_dim | Should be set manually. | ||
7 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | True | |
8 | kernel_num | The number of RBF kernels. | 11 | quantitative uniform distribution in [5, 20), with a step size of 1 |
9 | sigma | The sigma defines the kernel width. | 0.1 | quantitative uniform distribution in [0.01, 0.2), with a step size of 0.01 |
10 | exact_sigma | The exact_sigma denotes the sigma for exact match. | 0.001 |
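K-NRM turns the left/right similarity matrix into features with RBF kernels: `kernel_num` sets how many kernels there are, `sigma` their width, and `exact_sigma` the width of the kernel reserved for exact matches. The sketch below follows the kernel-pooling formulation of the K-NRM paper, assuming cosine similarities in [-1, 1], evenly spaced kernel means, and `log1p` in place of `log` so that rows with no kernel response stay finite. It is an illustration, not MatchZoo's Keras implementation.

```python
import math


def kernel_means(kernel_num):
    """kernel_num - 1 soft kernels spread over (-1, 1), plus an exact-match kernel at 1.0."""
    step = 2.0 / (kernel_num - 1)
    means = [-1.0 + step / 2 + step * i for i in range(kernel_num - 1)]
    means.append(1.0)
    return means


def kernel_pooling(similarity, kernel_num=11, sigma=0.1, exact_sigma=0.001):
    """Soft-TF features: one sum of log kernel responses per kernel."""
    features = []
    for k, mu in enumerate(kernel_means(kernel_num)):
        # The last kernel (mu = 1.0) only fires for (near-)exact matches.
        width = exact_sigma if k == kernel_num - 1 else sigma
        total = 0.0
        for row in similarity:  # one row per left-text token
            soft_tf = sum(math.exp(-((x - mu) ** 2) / (2 * width ** 2)) for x in row)
            total += math.log1p(soft_tf)  # log1p keeps empty responses finite
        features.append(total)
    return features


# One left token that exactly matches the first of two right tokens.
features = kernel_pooling([[1.0, 0.0]])
print(len(features))           # 11
print(round(features[-1], 4))  # 0.6931, i.e. log 2 for a single exact match
```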
DUET
Model Documentation
DUET Model.
- Examples:
>>> model = DUET()
>>> model.params['embedding_input_dim'] = 1000
>>> model.params['embedding_output_dim'] = 300
>>> model.params['lm_filters'] = 32
>>> model.params['lm_hidden_sizes'] = [64, 32]
>>> model.params['dropout_rate'] = 0.5
>>> model.params['dm_filters'] = 32
>>> model.params['dm_kernel_size'] = 3
>>> model.params['dm_d_mpool'] = 4
>>> model.params['dm_hidden_sizes'] = [64, 32]
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
Model Hyper Parameters
| | Name | Description | Default Value | Default Hyper-Space |
---|---|---|---|---|
0 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class ‘matchzoo.models.duet.DUET’> | |
1 | input_shapes | Dependent on the model and data. Should be set manually. | ||
2 | task | Decides model output shape, loss, and metrics. | ||
3 | optimizer | | adam | |
4 | with_embedding | A flag used to help the auto module. Shouldn’t be changed. | True | |
5 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
6 | embedding_output_dim | Should be set manually. | ||
7 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | True | |
8 | lm_filters | Number of filters of the 1D convolution layer in the local model. | 32 | |
9 | lm_hidden_sizes | A list of hidden sizes of the MLP layers in the local model. | [32] | |
10 | dm_filters | Number of filters of the 1D convolution layer in the distributed model. | 32 | |
11 | dm_kernel_size | Kernel size of the 1D convolution layer in the distributed model. | 3 | |
12 | dm_q_hidden_size | Hidden size of the MLP layer for the left text in the distributed model. | 32 | |
13 | dm_d_mpool | Max pooling size for the right text in the distributed model. | 3 | |
14 | dm_hidden_sizes | A list of hidden sizes of the MLP layers in the distributed model. | [32] | |
15 | padding | The padding mode in the convolution layer. It should be one of same, valid, or causal. | same | |
16 | activation_func | Activation function in the convolution layer. | relu | |
17 | dropout_rate | The dropout rate. | 0.5 | quantitative uniform distribution in [0.0, 0.8), with a step size of 0.02 |
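In the DUET paper, the local model (the `lm_*` parameters) works on exact term matches between the two texts, while the distributed model (the `dm_*` parameters) works on learned representations. A minimal sketch of the binary exact-match matrix that a local model of this kind consumes is shown below. Treating a padding id as a non-match is an assumption for illustration, and `exact_match_matrix` is a hypothetical helper.

```python
def exact_match_matrix(left_ids, right_ids, pad_id=None):
    """1 where a left and a right token id are equal (ignoring pad_id), else 0."""
    return [[1 if a == b and a != pad_id else 0 for b in right_ids]
            for a in left_ids]


# Token ids; 0 is used as padding here.
print(exact_match_matrix([1, 2, 0], [2, 1, 0], pad_id=0))
# [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```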
DRMMTKS
Model Documentation
DRMMTKS Model.
- Examples:
>>> model = DRMMTKS()
>>> model.params['embedding_input_dim'] = 10000
>>> model.params['embedding_output_dim'] = 100
>>> model.params['top_k'] = 20
>>> model.params['mlp_num_layers'] = 1
>>> model.params['mlp_num_units'] = 5
>>> model.params['mlp_num_fan_out'] = 1
>>> model.params['mlp_activation_func'] = 'tanh'
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
Model Hyper Parameters
| | Name | Description | Default Value | Default Hyper-Space |
---|---|---|---|---|
0 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class ‘matchzoo.models.drmmtks.DRMMTKS’> | |
1 | input_shapes | Dependent on the model and data. Should be set manually. | [(5,), (300,)] | |
2 | task | Decides model output shape, loss, and metrics. | ||
3 | optimizer | | adam | |
4 | with_embedding | A flag used to help the auto module. Shouldn’t be changed. | True | |
5 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
6 | embedding_output_dim | Should be set manually. | ||
7 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | True | |
8 | with_multi_layer_perceptron | A flag of whether a multiple layer perceptron is used. Shouldn’t be changed. | True | |
9 | mlp_num_units | Number of units in each of the first mlp_num_layers layers. | 128 | quantitative uniform distribution in [8, 256), with a step size of 8 |
10 | mlp_num_layers | Number of layers of the multiple layer perceptron. | 3 | quantitative uniform distribution in [1, 6), with a step size of 1 |
11 | mlp_num_fan_out | Number of units of the layer that connects the multiple layer perceptron and the output. | 64 | quantitative uniform distribution in [4, 128), with a step size of 4 |
12 | mlp_activation_func | Activation function used in the multiple layer perceptron. | relu | |
13 | mask_value | The value to be masked from inputs. | -1 | |
14 | top_k | Size of top-k pooling layer. | 10 | quantitative uniform distribution in [2, 100), with a step size of 1 |
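DRMMTKS replaces DRMM's matching histogram with top-k pooling: for each left-text token, only the `top_k` strongest similarity scores against the right text are kept and passed on to the MLP. The sketch below shows that per-row selection. Padding rows shorter than `top_k` with a fill value is an assumption made so that every row has the same width; `top_k_per_row` is a hypothetical helper.

```python
import heapq


def top_k_per_row(similarity, k, fill=0.0):
    """Keep the k largest scores of each row, sorted in descending order."""
    pooled = []
    for row in similarity:  # one row per left-text token
        best = heapq.nlargest(k, row)
        best += [fill] * (k - len(best))  # pad short rows to a fixed width
        pooled.append(best)
    return pooled


similarity = [[0.1, 0.9, 0.5],
              [0.3]]
print(top_k_per_row(similarity, k=2))  # [[0.9, 0.5], [0.3, 0.0]]
```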
DRMM
Model Documentation
DRMM Model.
- Examples:
>>> model = DRMM()
>>> model.params['mlp_num_layers'] = 1
>>> model.params['mlp_num_units'] = 5
>>> model.params['mlp_num_fan_out'] = 1
>>> model.params['mlp_activation_func'] = 'tanh'
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
>>> model.compile()
Model Hyper Parameters
| | Name | Description | Default Value | Default Hyper-Space |
---|---|---|---|---|
0 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class ‘matchzoo.models.drmm.DRMM’> | |
1 | input_shapes | Dependent on the model and data. Should be set manually. | [(5,), (5, 30)] | |
2 | task | Decides model output shape, loss, and metrics. | ||
3 | optimizer | | adam | |
4 | with_embedding | A flag used to help the auto module. Shouldn’t be changed. | True | |
5 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
6 | embedding_output_dim | Should be set manually. | ||
7 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | True | |
8 | with_multi_layer_perceptron | A flag of whether a multiple layer perceptron is used. Shouldn’t be changed. | True | |
9 | mlp_num_units | Number of units in each of the first mlp_num_layers layers. | 128 | quantitative uniform distribution in [8, 256), with a step size of 8 |
10 | mlp_num_layers | Number of layers of the multiple layer perceptron. | 3 | quantitative uniform distribution in [1, 6), with a step size of 1 |
11 | mlp_num_fan_out | Number of units of the layer that connects the multiple layer perceptron and the output. | 64 | quantitative uniform distribution in [4, 128), with a step size of 4 |
12 | mlp_activation_func | Activation function used in the multiple layer perceptron. | relu | |
13 | mask_value | The value to be masked from inputs. | -1 |
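DRMM does not read the right text directly. For every left-text token it receives a fixed-length histogram of that token's similarities to all right-text tokens, which appears to be why the default `input_shapes` pairs 5 left tokens with a 5 x 30 input (30 bins per token). The sketch below builds such a log-count histogram from a similarity matrix. The binning rule (mapping [-1, 1] linearly onto the bins, so a similarity of exactly 1.0 gets the last bin) and the use of `log1p` are assumptions for illustration, not a description of MatchZoo's preprocessing; `matching_histogram` is a hypothetical helper.

```python
import math


def matching_histogram(similarity, bin_size=30, log_count=True):
    """Per-row histogram of similarities in [-1, 1]; optionally log(1 + count)."""
    histograms = []
    for row in similarity:  # one row per left-text token
        counts = [0] * bin_size
        for s in row:
            # Map [-1, 1] linearly onto bin indices 0 .. bin_size - 1.
            index = int((s + 1.0) / 2.0 * (bin_size - 1))
            # Clamp values that fall slightly outside [-1, 1].
            index = min(max(index, 0), bin_size - 1)
            counts[index] += 1
        histograms.append([math.log1p(c) for c in counts] if log_count else counts)
    return histograms


similarity = [[1.0, 0.0, -1.0, 0.6]]
print(matching_histogram(similarity, bin_size=5, log_count=False))
# [[1, 0, 1, 1, 1]]
```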
ANMM
Model Documentation
ANMM Model.
- Examples:
>>> model = ANMM()
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
Model Hyper Parameters
| | Name | Description | Default Value | Default Hyper-Space |
---|---|---|---|---|
0 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class ‘matchzoo.models.anmm.ANMM’> | |
1 | input_shapes | Dependent on the model and data. Should be set manually. | ||
2 | task | Decides model output shape, loss, and metrics. | ||
3 | optimizer | | adam | |
4 | with_embedding | A flag used to help the auto module. Shouldn’t be changed. | True | |
5 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
6 | embedding_output_dim | Should be set manually. | ||
7 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | True | |
8 | dropout_rate | The dropout rate. | 0.1 | quantitative uniform distribution in [0, 1), with a step size of 0.05 |
9 | num_layers | Number of hidden layers in the MLP layer. | 2 | |
10 | hidden_sizes | Hidden size of each hidden layer. | [30, 30] |
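The original aNMM paper centres on value-shared weighting: a weight is attached to a range of similarity values rather than to a position in the text, and per-token scores are then combined with a softmax gate over the left-text tokens. The sketch below illustrates both steps in plain Python. The number of bins, the binning rule, and the helper names are assumptions for illustration; this is not MatchZoo's ANMM implementation, which additionally uses the MLP configured by `num_layers` and `hidden_sizes`.

```python
import math


def value_shared_score(similarities, bin_weights):
    """Weight each similarity by the weight of the value bin it falls in, then sum."""
    n_bins = len(bin_weights)
    score = 0.0
    for s in similarities:
        # Bins split [-1, 1] evenly; s == 1.0 falls into the last bin.
        index = min(max(int((s + 1.0) / 2.0 * n_bins), 0), n_bins - 1)
        score += bin_weights[index] * s
    return score


def gated_sum(token_scores, gate_logits):
    """Combine per-token scores with softmax weights over the left-text tokens."""
    peak = max(gate_logits)  # subtract the max for numerical stability
    exps = [math.exp(g - peak) for g in gate_logits]
    total = sum(exps)
    return sum(e / total * s for e, s in zip(exps, token_scores))


# Two bins: weight 0.5 for negative similarities, 2.0 for non-negative ones.
print(value_shared_score([-0.5, 0.5, 1.0], bin_weights=[0.5, 2.0]))  # 2.75
print(gated_sum([2.0, 4.0], gate_logits=[0.0, 0.0]))                 # 3.0
```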
MVLSTM
Model Documentation
MVLSTM Model.
- Examples:
>>> model = MVLSTM()
>>> model.params['lstm_units'] = 32
>>> model.params['top_k'] = 50
>>> model.params['mlp_num_layers'] = 2
>>> model.params['mlp_num_units'] = 20
>>> model.params['mlp_num_fan_out'] = 10
>>> model.params['mlp_activation_func'] = 'relu'
>>> model.params['dropout_rate'] = 0.5
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
Model Hyper Parameters
| | Name | Description | Default Value | Default Hyper-Space |
---|---|---|---|---|
0 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class ‘matchzoo.models.mvlstm.MVLSTM’> | |
1 | input_shapes | Dependent on the model and data. Should be set manually. | ||
2 | task | Decides model output shape, loss, and metrics. | ||
3 | optimizer | | adam | |
4 | with_embedding | A flag used to help the auto module. Shouldn’t be changed. | True | |
5 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
6 | embedding_output_dim | Should be set manually. | ||
7 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | True | |
8 | with_multi_layer_perceptron | A flag of whether a multiple layer perceptron is used. Shouldn’t be changed. | True | |
9 | mlp_num_units | Number of units in each of the first mlp_num_layers layers. | 128 | quantitative uniform distribution in [8, 256), with a step size of 8 |
10 | mlp_num_layers | Number of layers of the multiple layer perceptron. | 3 | quantitative uniform distribution in [1, 6), with a step size of 1 |
11 | mlp_num_fan_out | Number of units of the layer that connects the multiple layer perceptron and the output. | 64 | quantitative uniform distribution in [4, 128), with a step size of 4 |
12 | mlp_activation_func | Activation function used in the multiple layer perceptron. | relu | |
13 | lstm_units | Integer, the hidden size in the bi-directional LSTM layer. | 32 | |
14 | dropout_rate | Float, the dropout rate. | 0.0 | |
15 | top_k | Integer, the size of top-k pooling layer. | 10 | quantitative uniform distribution in [2, 100), with a step size of 1 |
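As described in the MV-LSTM paper, every pair of bidirectional-LSTM positions (their size is set by `lstm_units`) is scored, and k-max pooling then keeps the `top_k` strongest interactions from the whole matrix, regardless of which row they come from, before the MLP. A minimal sketch of that pooling step, assuming the interaction scores have already been computed; `k_max_pooling` is a hypothetical name.

```python
import heapq


def k_max_pooling(interactions, k):
    """Return the k largest entries of a 2D score matrix, in descending order."""
    flat = [score for row in interactions for score in row]
    return heapq.nlargest(k, flat)


interactions = [[0.1, 0.7],
                [0.9, 0.3]]
print(k_max_pooling(interactions, k=3))  # [0.9, 0.7, 0.3]
```

Because the pooled vector always has `top_k` entries (provided the matrix has at least that many), the MLP input size stays fixed even though the interaction matrix grows with text length.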
MatchLSTM
Model Documentation
Match LSTM model.
- Examples:
>>> model = MatchLSTM()
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.params['embedding_input_dim'] = 10000
>>> model.params['embedding_output_dim'] = 100
>>> model.params['embedding_trainable'] = True
>>> model.params['fc_num_units'] = 200
>>> model.params['lstm_num_units'] = 256
>>> model.params['dropout_rate'] = 0.5
>>> model.build()
Model Hyper Parameters
| | Name | Description | Default Value | Default Hyper-Space |
---|---|---|---|---|
0 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class ‘matchzoo.contrib.models.match_lstm.MatchLSTM’> | |
1 | input_shapes | Dependent on the model and data. Should be set manually. | ||
2 | task | Decides model output shape, loss, and metrics. | ||
3 | optimizer | | adam | |
4 | with_embedding | A flag used to help the auto module. Shouldn’t be changed. | True | |
5 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
6 | embedding_output_dim | Should be set manually. | ||
7 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | True | |
8 | lstm_num_units | The hidden size in the LSTM layer. | 256 | quantitative uniform distribution in [128, 384), with a step size of 32 |
9 | fc_num_units | The hidden size in the fully connected layer. | 200 | quantitative uniform distribution in [100, 300), with a step size of 20 |
10 | dropout_rate | The dropout rate. | 0.0 | quantitative uniform distribution in [0.0, 0.9), with a step size of 0.01 |
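In the original Match-LSTM, each position of one text attends over the LSTM hidden states of the other text (of size `lstm_num_units` here) and combines them into a single context vector. The sketch below shows one such soft-attention step. It uses dot-product scores for brevity, which is an assumption (the original Match-LSTM learns an additive score), and the helper names are hypothetical.

```python
import math


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Dot-product attention of one query vector over a list of key vectors."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    # Context vector: attention-weighted average of the key vectors.
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return weights, context


left_states = [[1.0, 0.0], [0.0, 1.0]]
weights, context = attend([0.0, 0.0], left_states)
print(weights)  # [0.5, 0.5]
print(context)  # [0.5, 0.5]
```

In Match-LSTM, this context vector is then combined with the attending position's own hidden state and fed to a further LSTM.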
ConvKNRM
Model Documentation
ConvKNRM model.
- Examples:
>>> model = ConvKNRM()
>>> model.params['embedding_input_dim'] = 10000
>>> model.params['embedding_output_dim'] = 300
>>> model.params['embedding_trainable'] = True
>>> model.params['filters'] = 128
>>> model.params['conv_activation_func'] = 'tanh'
>>> model.params['max_ngram'] = 3
>>> model.params['use_crossmatch'] = True
>>> model.params['kernel_num'] = 11
>>> model.params['sigma'] = 0.1
>>> model.params['exact_sigma'] = 0.001
>>> model.guess_and_fill_missing_params(verbose=0)
>>> model.build()
Model Hyper Parameters
| | Name | Description | Default Value | Default Hyper-Space |
---|---|---|---|---|
0 | model_class | Model class. Used internally for save/load. Changing this may cause unexpected behaviors. | <class ‘matchzoo.models.conv_knrm.ConvKNRM’> | |
1 | input_shapes | Dependent on the model and data. Should be set manually. | ||
2 | task | Decides model output shape, loss, and metrics. | ||
3 | optimizer | | adam | |
4 | with_embedding | A flag used to help the auto module. Shouldn’t be changed. | True | |
5 | embedding_input_dim | Usually equals vocab size + 1. Should be set manually. | ||
6 | embedding_output_dim | Should be set manually. | ||
7 | embedding_trainable | True to enable embedding layer training, False to freeze embedding parameters. | True | |
8 | kernel_num | The number of RBF kernels. | 11 | quantitative uniform distribution in [5, 20), with a step size of 1 |
9 | sigma | The sigma defines the kernel width. | 0.1 | quantitative uniform distribution in [0.01, 0.2), with a step size of 0.01 |
10 | exact_sigma | The exact_sigma denotes the sigma for exact match. | 0.001 | |
11 | filters | The number of filters in the convolution layer. | 128 | |
12 | conv_activation_func | The activation function in the convolution layer. | relu | |
13 | max_ngram | The maximum length of n-grams for the convolution layer. | 3 | |
14 | use_crossmatch | Whether to match left n-grams and right n-grams of different lengths. | True |
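Conv-KNRM builds n-gram representations of lengths 1 to `max_ngram` for both texts, compares them pairwise, and applies K-NRM kernel pooling with `kernel_num` kernels to each resulting similarity matrix; `use_crossmatch` decides whether n-grams of different lengths are compared. The sketch below counts the kernel features this produces, assuming one similarity matrix per compared (left, right) n-gram-length pair, as in the Conv-KNRM paper. The helper names are hypothetical.

```python
from itertools import product


def ngram_pairs(max_ngram, use_crossmatch=True):
    """(left n-gram length, right n-gram length) pairs that get a similarity matrix."""
    lengths = range(1, max_ngram + 1)
    if use_crossmatch:
        return list(product(lengths, lengths))
    return [(n, n) for n in lengths]


def kernel_feature_count(max_ngram, kernel_num, use_crossmatch=True):
    """Each compared pair contributes kernel_num soft-TF features."""
    return len(ngram_pairs(max_ngram, use_crossmatch)) * kernel_num


print(ngram_pairs(2))                                     # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(kernel_feature_count(3, 11))                        # 99 with the table defaults
print(kernel_feature_count(3, 11, use_crossmatch=False))  # 33
```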