Fast Python Vowpal Wabbit wrapper¶
For Kaggle experimentation, use the official vowpalwabbit package; for production, use subwabbit.
subwabbit is a Python wrapper around the great Vowpal Wabbit tool that aims to be as fast as Vowpal itself. It is ideal for real-time use, when many lines need to be scored in just a few milliseconds or when high throughput is required.
Advantages:
- more than 4x faster than the official Python wrapper
- good latency guarantees: give it 10 ms for a prediction and it will finish in 10 ms
- explainability: an API for explaining prediction values
- uses just the vw CLI, no compiling; proven by reliably running in production at Seznam.cz, where it makes hundreds of thousands of predictions per second per machine
Documentation¶
Full documentation can be found on Read the Docs.
Requirements¶
- Python 3.4+
- Vowpal Wabbit
You can install Vowpal Wabbit by running:
sudo apt-get install vowpal-wabbit
on Debian-based systems or by using Homebrew:
brew install vowpal-wabbit
You can also build Vowpal Wabbit from source, see instructions.
subwabbit will probably work on Python versions other than 3.4+, but this is not tested (contributions welcome).
Installation¶
pip install subwabbit
Example use¶
from subwabbit import VowpalWabbitProcess, VowpalWabbitDummyFormatter
vw = VowpalWabbitProcess(VowpalWabbitDummyFormatter(), ['-q', 'ab'])
common_features = '|a common_feature1:1.5 common_feature2:-0.3'
items_features = [
    '|b item123',
    '|b item456',
    '|b item789'
]

for prediction in vw.predict(common_features, items_features, timeout=0.001):
    print(prediction)
0.4
0.5
0.6
This is the simplest use of the subwabbit library. You have some common features that describe the context, for example the user's location or the time of day. Then there is a collection of items to score, each with its own specific features. The timeout=0.001 argument means “compute as many predictions as you can in 1 ms, then stop”.
More advanced use¶
The simple implementation above does not use the key feature of subwabbit: you can format your VW lines while Vowpal is busy computing predictions. This trick brings a great speedup, with a VW line formatting abstraction as a bonus.
Suppose we have features as dicts:
common_features = {
'common_feature1': 1.5,
'common_feature2': -0.3
}
items_features = [
{'id': 'item123'},
{'id': 'item456'},
{'id': 'item789'}
]
An implementation using a formatter can then look like this:
from subwabbit import VowpalWabbitBaseFormatter, VowpalWabbitProcess
class MyVowpalWabbitFormatter(VowpalWabbitBaseFormatter):

    def format_common_features(self, common_features, debug_info=None):
        return '|a common_feature1:{:.2f} common_feature2:{:.2f}'.format(
            common_features['common_feature1'],
            common_features['common_feature2']
        )

    def format_item_features(self, common_features, item_features, debug_info=None):
        return '|b {}'.format(item_features['id'])
vw = VowpalWabbitProcess(MyVowpalWabbitFormatter(), ['-q', 'ab'])
for prediction in vw.predict(common_features, items_features, timeout=0.001):
    print(prediction)
0.4
0.5
0.6
Benchmarks¶
Benchmarks were run on a logistic regression model with L2 regularization and many quadratic feature combinations, to mimic a real-world use case. A real dataset containing 1000 contexts and 3000 items was used; the model was pretrained on this dataset with randomly generated labels. You can see the features used in:
- tests/benchmarks/requests.json
- tests/benchmarks/items.json
# Prepare environment
pip install pandas vowpalwabbit
cd tests/benchmarks
# benchmark results depend a lot on whether Vowpal is trained or just initialized
python pretrain_model.py
# Benchmark official Python client
python benchmark_pyvw.py
# Benchmark blocking implementation
python benchmark_blocking_implementation.py
# Benchmark nonblocking implementation
python benchmark_nonblocking_implementation.py
Benchmark results¶
Results on Dell Latitude E7470 with Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz.
The table shows how many lines each implementation can predict in 10 ms:
|      | pyvw       | subwabbit  |
|------|------------|------------|
| mean | 239.461000 | 1033.70000 |
| min  | 83.000000  | 100.00000  |
| 25%  | 192.750000 | 650.00000  |
| 50%  | 240.000000 | 1000.00000 |
| 75%  | 288.000000 | 1350.00000 |
| 90%  | 316.000000 | 1600.00000 |
| 99%  | 349.000000 | 1900.00000 |
| max  | 362.000000 | 2050.00000 |
On average, subwabbit is more than 4x faster than the official Python wrapper.
License¶
Copyright (c) 2016 - 2018, Seznam.cz, a.s. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Throughput vs. latency¶
There are two implementations of subwabbit.base.VowpalWabbitBaseModel. Both run a vw subprocess and communicate with it through pipes, but they differ in whether the pipe is blocking or nonblocking.
Blocking¶
subwabbit.blocking.VowpalWabbitProcess
The blocking implementation uses buffered binary IO. When the predict() method is called, a loop:
- creates a batch of VW lines
- sends the batch to Vowpal and flushes the Python-side buffer into the system pipe buffer
- waits for predictions from the last-but-one batch (writing stays one batch ahead, so Vowpal should always be busy processing lines)
There is also a train() method that looks very similar, but you usually run training on an instance with write_only=True, so there is no need to wait for predictions.
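To make the line format concrete, here is a minimal sketch (not subwabbit's actual implementation) of how a training line could be composed from a label, an optional weight, and the common and item line parts, following VW's input format:

```python
def compose_train_line(label, weight, common_line_part, item_line_part):
    """Compose one VW training line: 'label [weight] |ns features...'.

    Simplified sketch of what a formatter's get_formatted_example roughly
    produces; the real implementation handles more edge cases.
    """
    prefix = str(label) if weight is None else '{} {}'.format(label, weight)
    return '{} {} {}'.format(prefix, common_line_part, item_line_part)

line = compose_train_line(1.0, None, '|a common_feature1:1.5', '|b item123')
# -> '1.0 |a common_feature1:1.5 |b item123'
```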
Nonblocking¶
Warning
The nonblocking implementation is only available on Linux-based systems.
Warning
Training is not implemented for nonblocking variant.
The blocking implementation has great throughput; depending on your features and the arguments of the vw process, it can even be optimal, with Vowpal itself as the bottleneck. However, due to blocking system calls, it can miss the timeout. That is unacceptable when there is an SLO with low-latency requirements.
The nonblocking implementation works similarly to the blocking one, but it does not block on system calls when there are no predictions to read or when the system-level buffer for VW lines is full, which helps keep latencies very stable.
Here is a comparison of the running time of the predict() method with timeout set to 10 ms:
|      | pyvw     | blocking | nonblocking |
|------|----------|----------|-------------|
| mean | 0.010039 | 0.010929 | 0.009473    |
| min  | 0.010012 | 0.010054 | 0.009049    |
| 25%  | 0.010025 | 0.010130 | 0.009142    |
| 50%  | 0.010036 | 0.010312 | 0.009355    |
| 75%  | 0.010048 | 0.010630 | 0.009804    |
| 90%  | 0.010063 | 0.010950 | 0.010024    |
| 99%  | 0.010091 | 0.013289 | 0.010140    |
| max  | 0.010138 | 0.468903 | 0.010999    |
The nonblocking implementation reduces latency peaks significantly: in the worst case it overshoots the 10 ms timeout by about 1 ms, instead of by almost 460 ms. It makes more system calls with smaller batches than the blocking implementation, which comes at the price of slightly lower throughput.
Predicted lines per request:
|      | pyvw       | blocking   | nonblocking |
|------|------------|------------|-------------|
| mean | 239.461000 | 1033.70000 | 911.890000  |
| min  | 83.000000  | 100.00000  | 0.000000    |
| 25%  | 192.750000 | 650.00000  | 552.000000  |
| 50%  | 240.000000 | 1000.00000 | 841.500000  |
| 75%  | 288.000000 | 1350.00000 | 1271.750000 |
| 90%  | 316.000000 | 1600.00000 | 1574.000000 |
| 99%  | 349.000000 | 1900.00000 | 1900.130000 |
| max  | 362.000000 | 2050.00000 | 2022.000000 |
Note
The nonblocking implementation may even return zero predictions for a call. This can happen when the previous call did not have enough time to clean its buffers before the timeout, so the next call has to clean them, which can take all of its time. See the metrics argument of predict() for details on how to monitor this behavior.
Monitoring and debugging¶
This section gives an overview of subwabbit's monitoring and debugging capabilities.
Monitoring¶
It is good practice to monitor your system’s behavior and fire an alert when system behavior changes.
Both the blocking and nonblocking implementations of predict() can collect metrics that can be helpful. There are two kinds of metrics:
- metrics - one numeric measurement per call of the predict() method. They are relatively cheap to collect and should be monitored in production.
- detailed_metrics - more measurements per call of predict(). Each metric value is a list of (time, numeric value) tuples. Collecting them brings some overhead (e.g. reallocation of memory for growing lists of measurements). They are useful for profiling and can answer questions like “What is the bottleneck, formatting the Vowpal lines or Vowpal itself?” or “Could a change in some parameter bring additional performance?”
See the API documentation for more details about the metrics collected by each implementation.
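As a sketch of working with detailed_metrics, you can aggregate the per-batch measurements to see where time goes. The metric names follow the predict() documentation; the numbers below are made up for illustration:

```python
# detailed_metrics as populated by predict(): for each key, a list of
# (time, value) tuples; the values here are illustrative only
detailed_metrics = {
    'generating_lines_time': [(0.001, 0.0004), (0.004, 0.0005)],
    'sending_lines_time': [(0.002, 0.0001), (0.005, 0.0002)],
    'receiving_lines_time': [(0.003, 0.0008), (0.006, 0.0009)],
}

# Sum the measured values per metric to find the bottleneck
totals = {
    name: sum(value for _timestamp, value in measurements)
    for name, measurements in detailed_metrics.items()
}
bottleneck = max(totals, key=totals.get)  # here: 'receiving_lines_time'
```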
See an example of visualizing detailed_metrics:
pip install jupyter pandas matplotlib
jupyter notebook examples/Detailed-metrics.ipynb
Debugging¶
Sometimes it is useful to save some internal state, like the final formatted VW line. For these cases you can use the debug_info parameter, which can be passed to both the predict() and train() methods and is then passed through to all subsequent subwabbit.base.VowpalWabbitBaseFormatter calls and to private method calls. For example, you can pass a dict and fill it with useful information.
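For example, a formatter method can record every formatted line part into the debug_info dict. This standalone sketch mirrors the format_item_features signature from the example above (recording into debug_info is your own choice, not built-in subwabbit behavior):

```python
def format_item_features(common_features, item_features, debug_info=None):
    line_part = '|b {}'.format(item_features['id'])
    if debug_info is not None:
        # keep every formatted line part for later inspection
        debug_info.setdefault('formatted_lines', []).append(line_part)
    return line_part

debug_info = {}
format_item_features({}, {'id': 'item123'}, debug_info=debug_info)
format_item_features({}, {'id': 'item456'}, debug_info=debug_info)
# debug_info['formatted_lines'] == ['|b item123', '|b item456']
```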
Explaining predictions¶
It is practical to understand your model. There are various ways to gain insight into a model's behavior; see for example Dan Becker's excellent tutorial on Kaggle: https://www.kaggle.com/learn/machine-learning-explainability .
Vowpal Wabbit offers various options for inspecting learned weights; subwabbit helps with the use of audit mode. It makes it easy to compute which features contribute the most to a particular line's prediction.
How to explain prediction¶
First, turn on audit mode by passing the audit_mode=True argument to the subwabbit.base.VowpalWabbitBaseModel constructor.
Then use explain_vw_line() to retrieve the explanation string. It will look like this: c^c8*f^f10237121819548268936:23365229:1:0.0220863@0 a^a3426538138935958091*e^e115:1296634:0.2:0.0987504@0
Features used for the prediction are separated by tabs, and each feature is a string in the format: namespace^feature:hashindex:value:weight[@ssgrad]
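To make that format concrete, here is an illustrative helper (not part of subwabbit's API) that splits an explanation string into per-feature records, assuming the tab-separated layout described above:

```python
def parse_explanation(explanation_string):
    """Split a VW audit explanation string into per-feature dicts.

    Illustrative sketch only; assumes the tab-separated
    'namespace^feature:hashindex:value:weight[@ssgrad]' layout.
    """
    features = []
    for element in explanation_string.split('\t'):
        name, hashindex, value, weight = element.split(':')
        weight = weight.split('@')[0]  # drop the optional @ssgrad suffix
        features.append({
            'original_feature_name': name,
            'hashindex': int(hashindex),
            'value': float(value),
            'weight': float(weight),
            'potential': float(value) * float(weight),
        })
    return features

explained = parse_explanation(
    'c^c8*f^f10237121819548268936:23365229:1:0.0220863@0\t'
    'a^a3426538138935958091*e^e115:1296634:0.2:0.0987504@0'
)
```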
Then we can use the get_human_readable_explanation() function to transform the explanation string into a more interpretable structure:
subwabbit.base.VowpalWabbitBaseFormatter.get_human_readable_explanation(self, explanation_string: str, feature_translator: Any = None) → List[Dict[KT, VT]]
Transform an explanation string into a more readable form. Every feature used for the prediction is translated into this structure:
{
    # For each feature used in a higher interaction there is a 2-tuple
    'names': [('Human readable namespace name 1', 'Human readable feature name 1'), ...],
    'original_feature_name': 'c^c8*f^f102',  # feature name as Vowpal sees it
    'hashindex': 123,  # Vowpal's internal hash of the feature name
    'value': 0.123,  # value of the feature in the input line
    'weight': -0.534,  # weight learned by VW for this feature
    'potential': value * weight,
    'relative_potential': abs(potential) / sum_of_abs_potentials_for_all_features
}
Parameters:
- explanation_string – Explanation string from explain_vw_line()
- feature_translator – Any object that can help with translating feature names into human readable form, for example a database connection. See parse_element()
Returns: List of dicts, sorted by contribution to the final score
You may also want to override the parse_element() method on your formatter to translate Vowpal feature names into human readable form, for example translating IDs to names, potentially using some mapping in a database.
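A minimal sketch of such an override, using in-memory dicts in place of a database (the lookup tables and naming scheme are illustrative assumptions; on a real formatter this would be a method taking self):

```python
# Hypothetical lookup tables; in practice feature_translator might be
# a database connection instead of a dict
NAMESPACE_NAMES = {'a_item_id': 'Item ID'}
ITEM_NAMES = {'i123': 'News of the day'}

def parse_element(element, feature_translator=None):
    """Translate e.g. 'a_item_id^i123' into a human readable 2-tuple."""
    namespace, feature = element.split('^', 1)
    readable_namespace = NAMESPACE_NAMES.get(namespace, namespace)
    translator = feature_translator if feature_translator is not None else ITEM_NAMES
    readable_feature = translator.get(feature, feature)
    return readable_namespace, '{} ({})'.format(readable_feature, feature)

parse_element('a_item_id^i123')
# -> ('Item ID', 'News of the day (i123)')
```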
Example¶
Feature importances can also be visualized in a Jupyter notebook; see the complete example of how to use subwabbit for explaining predictions:
pip install jupyter
jupyter notebook examples/Explaining-prediction.ipynb
Notes¶
Note
This explanation is valid if you use sparse features, since the expected value of every feature is close to zero. When you use dense features, you should normalize them. If you do not normalize to zero mean, explaining features by their absolute contribution is not informative, because you also need to consider how the feature value differs from its expected value. In that case you should use SHAP values for better interpretability; see https://www.kaggle.com/learn/machine-learning-explainability for more details. You may still find subwabbit's explaining functionality useful, but interpreting the results won't be straightforward.
Note
In case you have correlated features, it is better to sum their potentials and consider them as a single feature; otherwise you may underestimate the influence of those features.
API¶
Base classes¶
class subwabbit.base.VowpalWabbitBaseFormatter[source]¶
A formatter translates structured information about context and items into Vowpal Wabbit's input format: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Input-format
It can also implement the reverse translation, from Vowpal Wabbit's feature names into human readable feature names.
format_common_features(common_features: Any, debug_info: Any = None) → str[source]¶
Return the part of the VW line with features that are common to one call of predict/train. This method runs just once per call of subwabbit.base.VowpalWabbitBaseModel's predict() or train() method.
Parameters:
- common_features – Features common to all items
- debug_info – Optional dict that can be filled with information useful for debugging
Returns: Part of the line that is common to each item in one call. The returned string has to start with the '|' symbol.
format_item_features(common_features: Any, item_features: Any, debug_info: Any = None) → str[source]¶
Return the part of the VW line with features specific to each item. This method runs once per item per call of subwabbit.base.VowpalWabbitBaseModel's predict() or train() method.
Note: It is a good idea to cache the results of this method.
Parameters:
- common_features – Features common to all items
- item_features – Features for the item
- debug_info – Optional dict that can be filled with information useful for debugging
Returns: Part of the line that is specific to the item. The format depends on whether namespaces are used in the format_common_features method:
- namespaces are used: the returned string has to start with '|NAMESPACE_NAME', where NAMESPACE_NAME is the name of some namespace
- namespaces are not used: the returned string should not contain the '|' symbol
get_formatted_example(common_line_part: str, item_line_part: str, label: Optional[float] = None, weight: Optional[float] = None, debug_info: Optional[Dict[Any, Any]] = None)[source]¶
Compose a valid VW line from its common and item-dependent parts.
Parameters:
- common_line_part – Part of the line that is common to each item in one call
- item_line_part – Part of the line specific to the item
- label – Label of this row
- weight – Optional weight of the row
- debug_info – Optional dict that can be filled with information useful for debugging
Returns: One VW line in input format: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Input-format
get_human_readable_explanation(explanation_string: str, feature_translator: Any = None) → List[Dict[KT, VT]][source]¶
Transform an explanation string into a more readable form. Every feature used for the prediction is translated into this structure:
{
    # For each feature used in a higher interaction there is a 2-tuple
    'names': [('Human readable namespace name 1', 'Human readable feature name 1'), ...],
    'original_feature_name': 'c^c8*f^f102',  # feature name as Vowpal sees it
    'hashindex': 123,  # Vowpal's internal hash of the feature name
    'value': 0.123,  # value of the feature in the input line
    'weight': -0.534,  # weight learned by VW for this feature
    'potential': value * weight,
    'relative_potential': abs(potential) / sum_of_abs_potentials_for_all_features
}
Parameters:
- explanation_string – Explanation string from explain_vw_line()
- feature_translator – Any object that can help with translating feature names into human readable form, for example a database connection. See parse_element()
Returns: List of dicts, sorted by contribution to the final score
get_human_readable_explanation_html(explanation_string: str, feature_translator: Any = None, max_rows: Optional[int] = None)[source]¶
Visualize the importance of features in a Jupyter notebook.
Parameters:
- explanation_string – Explanation string from explain_vw_line()
- feature_translator – Any object that can help with translation, e.g. a database connection
- max_rows – Maximum number of most important features to show; None returns all used features
Returns: IPython.core.display.HTML
parse_element(element: str, feature_translator: Any = None) → Tuple[str, str][source]¶
This method is supposed to translate a namespace name and feature name into human readable form.
For example, the element can be 'a_item_id^i123' and the result can be ('Item ID', 'News of the day: ID of item is 123').
Parameters:
- element – Namespace name and feature name, e.g. a_item_id^i123
- feature_translator – Any object that can help with translation, e.g. a database connection
Returns: tuple(human understandable namespace name, human understandable feature name)
class subwabbit.base.VowpalWabbitDummyFormatter[source]¶
Formatter that assumes that both common features and item features are already formatted VW input format strings.
class subwabbit.base.VowpalWabbitBaseModel(formatter: subwabbit.base.VowpalWabbitBaseFormatter)[source]¶
Declaration of the Vowpal Wabbit model interface.

explain_vw_line(vw_line: str, link_function: bool = False)[source]¶
Uses VW audit mode to inspect the weights used for a prediction. Audit mode has to be turned on by passing audit_mode=True to the constructor.
Parameters:
- vw_line – String in VW line format
- link_function – If your model uses a link function, pass True
Returns: (raw prediction without the link function applied, explanation string)
predict(common_features: Any, items_features: Iterable[Any], timeout: Optional[float] = None, debug_info: Any = None, metrics: Optional[Dict[KT, VT]] = None, detailed_metrics: Optional[Dict[KT, VT]] = None) → Iterable[float][source]¶
Transforms an iterable of item features into an iterator of predictions.
Parameters:
- common_features – Features common to all items
- items_features – Iterable with features for each item
- timeout – Optionally specify how much time in seconds is available for computing predictions. When a timeout is passed, the returned iterator can have fewer items than the items_features iterable.
- debug_info – Some object that can be filled with information useful for debugging
- metrics – Optional dict that is populated with metrics that are good to monitor
- detailed_metrics – Optional dict with more detailed (and more time-consuming) metrics that are good for debugging and profiling
Returns: Iterable with a prediction for each item from items_features
train(common_features: Any, items_features: Iterable[Any], labels: Iterable[float], weights: Iterable[Optional[float]], debug_info: Any = None) → None[source]¶
Transforms features, labels and weights into VW line format and sends them to Vowpal.
Parameters:
- common_features – Features common to all items
- items_features – Iterable with features for each item
- labels – Iterable of the same length as items_features with a label for each item
- weights – Iterable of the same length as items_features with an optional weight for each item
- debug_info – Some object that can be filled with information useful for debugging
Blocking implementation¶
class subwabbit.blocking.VowpalWabbitProcess(formatter: subwabbit.base.VowpalWabbitBaseFormatter, vw_args: List[T], batch_size: int = 20, write_only: bool = False, audit_mode: bool = False)[source]¶
Class representing a Vowpal Wabbit model. It runs the vw command through the subprocess library and communicates through pipes.

__init__(formatter: subwabbit.base.VowpalWabbitBaseFormatter, vw_args: List[T], batch_size: int = 20, write_only: bool = False, audit_mode: bool = False)[source]¶
Parameters:
- formatter – Instance of subwabbit.base.VowpalWabbitBaseFormatter
- vw_args – List of command line arguments for the vw command, e.g. ['-q', '::']. This list MUST NOT specify the -p argument for the vw command.
- batch_size – Number of lines communicated to Vowpal in one system call; this influences performance. Smaller batches slightly reduce both latency and throughput.
- write_only – Whether we expect to get predictions or will just train. This can greatly improve training performance, but disables predicting.
- audit_mode – When set to True, VW is launched in audit mode with the -a argument (overriding the -t argument). This allows running the explain_vw_line and get_human_readable_explanation methods.
Warning: When audit_mode is turned on, it is not possible to call methods other than explain_vw_line.
explain_vw_line(vw_line: str, link_function=False)[source]¶
Uses VW audit mode to inspect the weights used for a prediction. Audit mode has to be turned on by passing audit_mode=True to the constructor.
Parameters:
- vw_line – String in VW line format
- link_function – If your model uses a link function, pass True
Returns: (raw prediction without the link function applied, explanation string)
predict(common_features: Any, items_features: Iterable[Any], timeout: Optional[float] = None, debug_info: Any = None, metrics: Optional[Dict[KT, VT]] = None, detailed_metrics: Optional[Dict[KT, VT]] = None) → Iterable[float][source]¶
Transforms an iterable of item features into an iterator of predictions.
Parameters:
- common_features – Features common to all items
- items_features – Iterable with features for each item
- timeout – Optionally specify how much time in seconds is available for computing predictions. When a timeout is passed, the returned iterator can have fewer items than the items_features iterable.
- debug_info – Some object that can be filled with information useful for debugging
- metrics – Optional dict populated with metrics that are good to monitor:
  - prepare_time - time from the start of the call to the start of the prediction loop, including the format_common_features call
  - total_time - total time spent in the predict call
  - num_lines - count of predictions performed
- detailed_metrics – Optional dict with more detailed (and more time-consuming) metrics that are good for debugging and profiling:
  - generating_lines_time - time spent generating VW lines
  - sending_lines_time - time spent sending VW lines to the OS pipe buffer
  - receiving_lines_time - time spent reading predictions from the OS pipe buffer
  For each key, there will be a list of (time, metric value) tuples.
Returns: Iterable with a prediction for each item from items_features
train(common_features: Any, items_features: Iterable[Any], labels: Iterable[float], weights: Iterable[Optional[float]], debug_info: Any = None) → None[source]¶
Transforms features, labels and weights into VW line format and sends them to Vowpal.
Parameters:
- common_features – Features common to all items
- items_features – Iterable with features for each item
- labels – Iterable of the same length as items_features with a label for each item
- weights – Iterable of the same length as items_features with an optional weight for each item
- debug_info – Some object that can be filled with information useful for debugging
Nonblocking implementation¶
class subwabbit.nonblocking.VowpalWabbitNonBlockingProcess(formatter: subwabbit.base.VowpalWabbitBaseFormatter, vw_args: List[T], batch_size: int = 20, audit_mode: bool = False, max_pending_lines: int = 20, write_timeout_ms: float = 0.001, pipe_buffer_size_bytes: Optional[int] = None)[source]¶
Class representing a Vowpal Wabbit model. It runs the vw command through the subprocess library and communicates through non-blocking pipes.
Warning: Available on Linux only.

__init__(formatter: subwabbit.base.VowpalWabbitBaseFormatter, vw_args: List[T], batch_size: int = 20, audit_mode: bool = False, max_pending_lines: int = 20, write_timeout_ms: float = 0.001, pipe_buffer_size_bytes: Optional[int] = None)[source]¶
Parameters:
- formatter – Instance of subwabbit.base.VowpalWabbitBaseFormatter
- vw_args – List of command line arguments for the vw command, e.g. ['-q', '::']. This list MUST NOT specify the -p argument for the vw command.
- batch_size – Maximum number of lines communicated to Vowpal in one system call. Larger batches mean less system-call overhead, but also a higher risk of leaving unprocessed lines for subsequent calls to clean up.
- audit_mode – When turned on, VW is launched in audit mode with the -a argument (overriding the -t argument). This allows running the explain_vw_line and get_human_readable_explanation methods.
- max_pending_lines – How many lines can wait for prediction in the buffers. It is recommended to set this to the same value as batch_size, but it can be higher.
- write_timeout_ms – When predict is called with a timeout, sending lines to Vowpal stops write_timeout_ms before the timeout. This provides time to finish work without leaving a mess that the next call would have to clean up.
- pipe_buffer_size_bytes – Optionally set the size of the system buffer for sending lines to Vowpal. None means the default buffer size; for more details see http://man7.org/linux/man-pages/man7/pipe.7.html and the detailed_metrics argument of the predict() method.
Warning: When audit_mode is turned on, it is not possible to call methods other than explain_vw_line.
cleanup(deadline: Optional[float] = None, debug_info: Any = None)[source]¶
Cleans buffers left over from previous calls.
Parameters:
- deadline – Optional unix timestamp by which cleanup should end
explain_vw_line(vw_line: str, link_function=False)[source]¶
Uses VW audit mode to inspect the weights used for a prediction. Audit mode has to be turned on by passing audit_mode=True to the constructor.
Parameters:
- vw_line – String in VW line format
- link_function – If your model uses a link function, pass True
Returns: (raw prediction without the link function applied, explanation string)
predict(common_features: Any, items_features: Iterable[Any], timeout: Optional[float] = None, debug_info: Any = None, metrics: Optional[Dict[KT, VT]] = None, detailed_metrics: Optional[Dict[KT, VT]] = None) → Iterable[float][source]¶
Transforms an iterable of item features into an iterator of predictions.
Parameters:
- common_features – Features common to all items
- items_features – Iterable with features for each item
- timeout – Optionally specify how much time in seconds is available for computing predictions. When a timeout is passed, the returned iterator can have fewer items than the items_features iterable.
- debug_info – Some object that can be filled with information useful for debugging
- metrics – Optional dict populated with metrics that are good to monitor:
  - cleanup_time - time spent cleaning buffers after previous calls
  - before_cleanup_pending_lines - count of lines pending in buffers before cleaning
  - after_cleanup_pending_lines - count of lines pending in buffers after cleaning
  - prepare_time - time from the start of the call to the start of the prediction loop, including the format_common_features call
  - total_time - total time spent in the predict call
  - num_lines - count of predictions performed
- detailed_metrics – Optional dict with more detailed (and more time-consuming) metrics that are good for debugging and profiling:
  - sending_bytes - number of bytes (VW lines) sent to the OS pipe buffer
  - receiving_bytes - number of bytes (predictions) received from the OS pipe buffer
  - pending_lines - number of lines pending in VW at the time
  - generating_lines_time - time spent generating a batch of VW lines
  - sending_lines_time - time spent sending lines to the OS pipe buffer
  - receiving_lines_time - time spent receiving predictions from the OS pipe buffer
  For each key, there will be a list of (time, metric value) tuples.
Returns: Iterable with a prediction for each item from items_features
train(common_features: Any, items_features: Iterable[Any], labels: Iterable[float], weights: Iterable[Optional[float]], debug_info: Any = None) → None[source]¶
Transforms features, labels and weights into VW line format and sends them to Vowpal.
Parameters:
- common_features – Features common to all items
- items_features – Iterable with features for each item
- labels – Iterable of the same length as items_features with a label for each item
- weights – Iterable of the same length as items_features with an optional weight for each item
- debug_info – Some object that can be filled with information useful for debugging