Welcome to vinci’s documentation!
Vinci is a generic Deep Reinforcement Learning framework.
Contents:
agent

ddpg
class rl.agents.ddpg.DDPGAgent(actor, critic, memory, gamma=0.99, batch_size=32, train_interval=1, memory_interval=1, critic_gradient_clip=100, random_process=None, custom_model_objects=None, warmup_actor_steps=200, warmup_critic_steps=200, invert_gradients=False, gradient_inverter_min=-1.0, gradient_inverter_max=1.0, actor_reset_threshold=0.3, reset_controlers=False, actor_learning_rate=0.001, critic_learning_rate=0.0001, target_critic_update=0.01, target_actor_update=0.01, critic_regularization=0.01, **kwargs)

Deep Deterministic Policy Gradient agent, as defined in https://arxiv.org/abs/1509.02971.
Parameters:
- actor (keras.model) – The actor network
- critic (keras.model) – The critic network
- env (gym.env) – The gym environment
- memory (rl.memory.Memory) – The memory object
- gamma (float) – Discount factor
- batch_size (int) – Size of the minibatches
- train_interval (int) – Train only at multiples of this number
- memory_interval (int) – Add experiences to memory only at multiples of this number
- critic_gradient_clip – Delta at which the critic error is clipped (via a Huber loss, see https://github.com/devsisters/DQN-tensorflow/issues/16)
- random_process – The noise process used for exploration
- custom_model_objects –
- target_critic_update (float) – Target critic update factor
- target_actor_update (float) – Target actor update factor
- invert_gradients (bool) – Use gradient inverting as defined in https://arxiv.org/abs/1511.04143
backward_offline(train_actor=True, train_critic=True)

Offline backward method of the DDPG agent.

Parameters:
- train_actor (bool) – Whether to train the actor
- train_critic (bool) – Whether to train the critic
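
Putting these parameters together, constructing a DDPGAgent might look like the sketch below. Only the DDPGAgent and SimpleMemory signatures come from this page; the environment, the layer sizes, and the way the critic consumes its action input are assumptions made for illustration.

    # Hypothetical construction of a DDPGAgent. Network architectures, the
    # environment, and the critic's input layout are illustrative assumptions.
    import gym
    from keras.layers import Concatenate, Dense, Input
    from keras.models import Model

    from rl.agents.ddpg import DDPGAgent
    from rl.memory import SimpleMemory

    env = gym.make('Pendulum-v0')
    nb_obs = env.observation_space.shape[0]
    nb_actions = env.action_space.shape[0]

    # Actor: maps an observation to an action in [-1, 1]
    obs_in = Input(shape=(nb_obs,))
    h = Dense(32, activation='relu')(obs_in)
    actor = Model(inputs=obs_in, outputs=Dense(nb_actions, activation='tanh')(h))

    # Critic: maps an (observation, action) pair to a Q-value estimate
    action_in = Input(shape=(nb_actions,))
    q = Dense(32, activation='relu')(Concatenate()([obs_in, action_in]))
    critic = Model(inputs=[obs_in, action_in],
                   outputs=Dense(1, activation='linear')(q))

    memory = SimpleMemory(env=env, limit=100000)
    agent = DDPGAgent(actor=actor, critic=critic, memory=memory,
                      gamma=0.99, batch_size=32)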
hooks
class rl.hooks.hook.Hook(agent_id='default', experiment_id='default')

The abstract Hook class. A hook is designed to be a callable running on an agent object. It shouldn’t return anything; instead, it exports the data itself (e.g. as a pickle or an image). It is run at the end of each step.
The hook API relies on the following agent attributes, always available:
- agent.training: boolean: Whether the agent is in training mode
- agent.step: int: The step number. Begins at 1.
- agent.reward: The reward of the current step
- agent.episode: int: The current episode number. Begins at 1.
- agent.episode_step: int: The step count within the current episode. Begins at 1.
- agent.done: Whether the episode is terminated
- agent.step_summaries: A list of summaries of the current step
These attributes may also be available:
- agent.episode_reward: The cumulative reward of the current episode
- agent.observation: The observation at the beginning of the step
- agent.observation_1: The observation at the end of the step
- agent.action: The action taken during the step
- agent.policy
- agent.goal
- agent.achievement
- agent.error
Parameters:
- agent – The RL agent
- episodic – Whether the hook will use episode information
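
For illustration, a custom hook that logs rewards could look like the sketch below. The page only states that a hook is a callable run on the agent at the end of each step, so the assumption that subclasses implement __call__ and receive the agent as an argument is exactly that: an assumption.

    # Minimal sketch of a custom hook; the __call__(agent) calling convention
    # is an assumption, not documented on this page.
    from rl.hooks.hook import Hook


    class RewardLoggingHook(Hook):
        """Print the reward after every step and a summary when an episode ends."""

        def __call__(self, agent):
            # Attributes documented as always available
            print("step {}: reward {}".format(agent.step, agent.reward))
            if agent.done:
                # episode_reward is only sometimes available, so guard for it
                total = getattr(agent, 'episode_reward', None)
                print("episode {} done, cumulative reward: {}".format(
                    agent.episode, total))


    hook = RewardLoggingHook(agent_id='default', experiment_id='default')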
memory
class rl.memory.Batch(state0, action, reward, state1, terminal1)

- action – Alias for field number 1
- reward – Alias for field number 2
- state0 – Alias for field number 0
- state1 – Alias for field number 3
- terminal1 – Alias for field number 4
class rl.memory.Experience(state0, action, reward, state1, terminal1)

- action – Alias for field number 1
- reward – Alias for field number 2
- state0 – Alias for field number 0
- state1 – Alias for field number 3
- terminal1 – Alias for field number 4
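
Since each field is an alias for a positional index, Batch and Experience behave like named tuples: values can be read by field name or by position. A small illustration with made-up values:

    from rl.memory import Experience

    # Field order is (state0, action, reward, state1, terminal1)
    exp = Experience(state0=[0.0, 1.0], action=1, reward=-0.5,
                     state1=[0.1, 0.9], terminal1=False)

    print(exp.reward)      # -0.5, the same value as exp[2] (field number 2)
    print(exp.terminal1)   # False, the same value as exp[4] (field number 4)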
class rl.memory.SimpleMemory(env, limit)

A simple memory that stores experiences directly in a circular buffer. Data is stored as an array of Experience.
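
The circular-buffer behaviour can be pictured with the standalone sketch below. It is an illustration only: SimpleMemory’s own methods are not documented on this page, and the append/sample names used here are hypothetical.

    # Illustrative circular buffer of Experience objects; not vinci's implementation.
    import collections
    import random

    from rl.memory import Experience


    class CircularExperienceBuffer(object):
        """Once `limit` entries are stored, the oldest Experience is dropped
        to make room for the newest one."""

        def __init__(self, limit):
            self.buffer = collections.deque(maxlen=limit)

        def append(self, state0, action, reward, state1, terminal1):
            # Store the transition as an Experience tuple
            self.buffer.append(Experience(state0, action, reward, state1, terminal1))

        def sample(self, batch_size):
            # Draw a uniform random minibatch of stored experiences
            return random.sample(list(self.buffer), batch_size)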