Welcome to tf-mdp’s documentation!
tfmdp package
Subpackages
tfmdp.model package
Subpackages
tfmdp.model.cell package
class tfmdp.model.cell.basic_cell.BasicMarkovCell(compiler: rddl2tf.compiler.Compiler, policy: tfmdp.policy.drp.DeepReactivePolicy, config: Optional[Dict] = None)

Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell

BasicMarkovCell implements a 1-step MDP transition function as an RNNCell whose hidden state is the current MDP state and whose output is a tuple with the next state, action, intermediate fluents, and reward.

Parameters:
- compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
- policy (tfmdp.policy.drp.DeepReactivePolicy) – The deep reactive policy mapping states to actions.
- config (Dict) – The cell configuration parameters.
__call__(inputs: tensorflow.python.framework.ops.Tensor, state: Sequence[tensorflow.python.framework.ops.Tensor], scope: Optional[str] = None) → Tuple[Tuple[Sequence[tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor], Sequence[tensorflow.python.framework.ops.Tensor]]

Returns the cell’s output tuple and next state tensors. The output tuple packs together the next state, action, interms, and reward tensors, in that order.

Parameters:
- inputs (tf.Tensor) – The timestep input tensor.
- state (Sequence[tf.Tensor]) – The current state tensors.
- scope (Optional[str]) – The cell name scope.

Returns: A pair with the cell’s output tuple and next state.
Return type: (CellOutput, CellState)
action_size
Returns the MDP action size.

graph
Returns the cell’s computation graph.

interm_size
Returns the MDP intermediate state size.

output_size
Returns the simulation cell output size.

state_size
Returns the MDP state size.
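The sketch below shows one plausible way to unroll the cell over a fixed horizon; it is illustrative only. The names compiler, policy, and initial_state stand for objects built by a prior rddl2tf/tfmdp setup and are not defined on this page.

    # Hedged sketch: unrolling a BasicMarkovCell with tf.nn.dynamic_rnn (TF1).
    # `compiler`, `policy`, and `initial_state` are placeholders assumed to
    # come from a compiled RDDL model and a built DeepReactivePolicy.
    import tensorflow as tf
    from tfmdp.model.cell.basic_cell import BasicMarkovCell

    batch_size, horizon = 64, 40
    cell = BasicMarkovCell(compiler, policy)

    with cell.graph.as_default():
        # One dummy input per timestep; the cell reads the timestep input tensor.
        inputs = tf.zeros([batch_size, horizon, 1])
        outputs, final_state = tf.nn.dynamic_rnn(
            cell, inputs, initial_state=initial_state, dtype=tf.float32)
        # `outputs` packs the (state, action, interms, reward) trajectories.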
class tfmdp.model.cell.basic_cell.OutputTuple(state, action, interms, reward)

Bases: tuple

__getnewargs__()
Return self as a plain tuple. Used by copy and pickle.

static __new__(_cls, state, action, interms, reward)
Create new instance of OutputTuple(state, action, interms, reward).

__repr__()
Return a nicely formatted representation string.

_asdict()
Return a new OrderedDict which maps field names to their values.

classmethod _make(iterable, new=<built-in method __new__ of type object>, len=<built-in function len>)
Make a new OutputTuple object from a sequence or iterable.

_replace(**kwds)
Return a new OutputTuple object replacing specified fields with new values.

action
Alias for field number 1

interms
Alias for field number 2

reward
Alias for field number 3

state
Alias for field number 0
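Since OutputTuple is a plain namedtuple, fields can be read by name or by position; in this small illustration, dummy Python values stand in for the tensors the cell actually produces.

    from tfmdp.model.cell.basic_cell import OutputTuple

    out = OutputTuple(state=(1.0,), action=(0.5,), interms=(), reward=-1.0)
    assert out.state == out[0] and out.reward == out[3]
    zeroed = out._replace(reward=0.0)  # copy with the reward field swapped
    print(out._asdict())               # mapping from field names to values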
tfmdp.model.cell.basic_cell.cell_size(sizes: Sequence[Sequence[int]]) → Sequence[Union[Sequence[int], int]]

tfmdp.model.cell.basic_cell.to_tensor(fluents)
Submodules
tfmdp.model.sequential.montecarlo module
tfmdp.model.sequential.mrm module
Module contents
tfmdp.planning package
Submodules
tfmdp.planning.pdplanner module
tfmdp.planning.planner module
Module contents
tfmdp.policy package
Subpackages
tfmdp.policy.layers package
class tfmdp.policy.layers.action_layer.ActionLayer(action_size: int)

Bases: tensorflow.python.layers.base.Layer

ActionLayer should be used as the output layer in a DRP. It generates multi-head dense output layers with the same shape as the action fluents. Optionally, it restricts the output tensors based on action bounds.

Parameters: action_size (Sequence[Sequence[int]]) – The list of action fluent sizes.
_get_output_tensor(tensor: tensorflow.python.framework.ops.Tensor, bounds: Tuple[Optional[tensorflow.python.framework.ops.Tensor], Optional[tensorflow.python.framework.ops.Tensor]]) → tensorflow.python.framework.ops.Tensor

Returns the value-constrained output tensor.

Parameters:
- tensor (tf.Tensor) – The layer’s output tensor corresponding to an action fluent.
- bounds (Tuple[Optional[tf.Tensor], Optional[tf.Tensor]]) – The action fluent bounds.

Returns: the constrained output tensor.
Return type: (tf.Tensor)

call(inputs: tensorflow.python.framework.ops.Tensor, action_bounds: Optional[Sequence[Tuple[Optional[tensorflow.python.framework.ops.Tensor], Optional[tensorflow.python.framework.ops.Tensor]]]] = None) → Sequence[tensorflow.python.framework.ops.Tensor]

Returns the tensors of the multi-head layer’s output.

Parameters:
- inputs (tf.Tensor) – A hidden layer’s output.
- action_bounds (Optional[Sequence[Tuple[Optional[tf.Tensor], Optional[tf.Tensor]]]]) – The action bounds.

Returns: A tuple of action tensors.
Return type: Sequence[tf.Tensor]

trainable_variables
Returns the list of all layer variables/weights.
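A hypothetical wiring sketch for the output layer; the action sizes, hidden tensor, and bounds below are illustrative placeholders rather than values from this page.

    import tensorflow as tf
    from tfmdp.policy.layers.action_layer import ActionLayer

    hidden = tf.placeholder(tf.float32, shape=[None, 64])  # a hidden layer's output
    layer = ActionLayer(action_size=[[8], [1]])            # two action fluents
    bounds = [(tf.constant(0.0), tf.constant(1.0)),        # first fluent bounded
              (None, None)]                                # second unbounded
    action_fluents = layer(hidden, action_bounds=bounds)   # Sequence[tf.Tensor]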
class tfmdp.policy.layers.state_layer.StateLayer(input_layer_norm: bool = False)

Bases: tensorflow.python.layers.base.Layer

StateLayer should be used as an input layer in a DRP. It flattens each state fluent and returns a single concatenated tensor.

Parameters: input_layer_norm (bool) – The boolean flag for enabling layer normalization.

call(inputs: Sequence[tensorflow.python.framework.ops.Tensor]) → tensorflow.python.framework.ops.Tensor

Returns the concatenation of all state fluent tensors, each previously flattened.

Parameters: inputs (Sequence[tf.Tensor]) – A tuple of state fluent tensors.
Returns: A single output tensor.
Return type: tf.Tensor

trainable_variables
Returns the list of all layer variables/weights.
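A hypothetical input-layer sketch; the state fluent shapes are made up for illustration.

    import tensorflow as tf
    from tfmdp.policy.layers.state_layer import StateLayer

    layer = StateLayer(input_layer_norm=False)
    state = (tf.zeros([32, 8]), tf.zeros([32, 4, 2]))  # two state fluent tensors
    inputs = layer(state)  # each fluent flattened, then concatenated: shape (32, 16)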
Submodules
tfmdp.policy.drp module
class tfmdp.policy.drp.DeepReactivePolicy(compiler: rddl2tf.compiler.Compiler, config: Dict)

Bases: object

DeepReactivePolicy is an abstract base class. It defines the basic API for building, saving and restoring reactive policies implemented as deep neural nets. A reactive policy defines a mapping from current state fluents to action fluents.

Parameters:
- compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
- config (Dict) – The reactive policy configuration parameters.
__call__(state: Sequence[tensorflow.python.framework.ops.Tensor], timestep: tensorflow.python.framework.ops.Tensor) → Sequence[tensorflow.python.framework.ops.Tensor]

Returns action fluents for the current state and timestep.

Parameters:
- state (Sequence[tf.Tensor]) – A tuple of state fluents.
- timestep (tf.Tensor) – The current timestep.

Returns: A tuple of action fluents.
Return type: Sequence[tf.Tensor]

build() → None
Create the DRP layers and trainable weights.
classmethod from_json(compiler: rddl2tf.compiler.Compiler, json_config: str) → tfmdp.policy.drp.DeepReactivePolicy

Instantiates a DRP from a json_config string.

Parameters:
- compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
- json_config (str) – A DRP configuration encoded in JSON format.

Returns: A DRP object.
Return type: tfmdp.policy.drp.DeepReactivePolicy
graph
Returns the DRP’s computation graph.

name
Returns the canonical DRP name.
restore(sess: tensorflow.python.client.session.Session, path: Optional[str] = None) → None

Restores previously saved DRP trainable variables. If path is not provided, restores from the last saved checkpoint.

Parameters:
- sess (tf.Session) – A running session.
- path (Optional[str]) – An optional path to a checkpoint directory.

save(sess: tensorflow.python.client.session.Session, path: str) → str

Serializes all DRP trainable variables into a checkpoint file.

Parameters:
- sess (tf.Session) – A running session.
- path (str) – The path to a checkpoint directory.

Returns: The path prefix of the newly created checkpoint file.
Return type: str
size
Returns the number of trainable parameters.

summary() → None
Prints a string summary of the DRP.

to_json() → str
Returns the policy configuration parameters serialized in JSON format.

vars
Returns a list of the trainable variables.
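A hedged round-trip sketch of the build/save/restore API, using the FeedforwardPolicy subclass documented below. The name compiler is a placeholder for a rddl2tf.compiler.Compiler instance, the config values follow the feedforward example further down, and passing save’s returned prefix back to restore is an assumption.

    import tensorflow as tf
    from tfmdp.policy.feedforward import FeedforwardPolicy

    policy = FeedforwardPolicy(compiler, {"layers": [256, 128], "activation": "relu"})
    policy.build()

    with tf.Session(graph=policy.graph) as sess:
        sess.run(tf.global_variables_initializer())
        checkpoint = policy.save(sess, "/tmp/drp")  # path prefix of the checkpoint
        policy.restore(sess, checkpoint)            # assumed to accept the prefix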
tfmdp.policy.feedforward module
class tfmdp.policy.feedforward.FeedforwardPolicy(compiler: rddl2tf.compiler.Compiler, config: dict)

Bases: tfmdp.policy.drp.DeepReactivePolicy

FeedforwardPolicy implements a DRP as a multi-layer perceptron. It is parameterized by the following configuration params (see the example dict after the parameter list below):
- config[‘layers’]: a list with the number of units in each hidden layer; and
- config[‘activation’]: an activation function.

Parameters:
- compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
- config (Dict) – The policy configuration parameters.
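For concreteness, a configuration dict of the documented shape might look like this; the exact representation of the activation value (here a function name string) is an assumption.

    # Hypothetical config following the documented keys.
    config = {
        "layers": [256, 128, 64],  # three hidden layers with these unit counts
        "activation": "relu",      # activation applied in every hidden layer
    }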
__call__(state: Sequence[tensorflow.python.framework.ops.Tensor], timestep: tensorflow.python.framework.ops.Tensor) → Sequence[tensorflow.python.framework.ops.Tensor]

Returns action fluents for the current state and timestep.

Parameters:
- state (Sequence[tf.Tensor]) – A tuple of state fluents.
- timestep (tf.Tensor) – The current timestep.

Returns: A tuple of action fluents.
Return type: Sequence[tf.Tensor]

_build_hidden_layers() → None
Builds all hidden layers as tf.layers.Dense layers.
_build_input_layer() → None
Builds the DRP input layer using a tfmdp.policy.layers.state_layer.StateLayer.

_build_output_layer() → None
Builds the DRP output layer using a tfmdp.policy.layers.action_layer.ActionLayer.

build() → None
Create the DRP layers and trainable weights.

name
Returns the canonical DRP name.

size
Returns the number of trainable parameters.

vars
Returns a list of the trainable variables.
Module contents
Submodules
tfmdp.utils module
tfmdp.utils.get_params_string(config: Dict) → str

Returns a canonical configuration string by concatenating its parameters.
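A small illustrative call; the exact canonical format of the returned string is not specified on this page.

    from tfmdp.utils import get_params_string

    params = {"layers": [256, 128], "activation": "relu"}
    print(get_params_string(params))  # one canonical string encoding the config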