lagom

inte för mycket och inte för lite, enkelhet är bäst
not too much and not too little; simplicity is often best
lagom is a light PyTorch infrastructure to quickly prototype reinforcement learning algorithms.

lagom balances flexibility and usability when developing reinforcement learning (RL) algorithms. The library is built on top of PyTorch and provides modular tools to quickly prototype RL algorithms. However, we do not go overboard: going too low-level is time consuming and prone to bugs, while going too high-level degrades flexibility, making it difficult to try out crazy ideas.

We are continuously making lagom more ‘self-contained’ so that experiments can be run quickly. It now internally supports base classes for multiprocessing (a master-worker framework) to parallelize, e.g., experiments and evolution strategies. It also supports hyperparameter search, with configurations defined either as grid search or random search.

One of the main pipelines for using lagom is as follows:

  1. Define the environment and RL agent
  2. Use a runner to collect data for the agent
  3. Define the algorithm to train the agent
  4. Define the experiment and configurations.

A graphical illustration is coming soon.
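
A minimal sketch of this pipeline using the built-in RandomAgent (the environment and horizon are illustrative; depending on the version, the runner may additionally require wrapping the environment with VecStepInfo):

import gym

from lagom import RandomAgent
from lagom.runner import EpisodeRunner
from lagom.envs import make_vec_env

# 1. Define the environment and the RL agent
env = make_vec_env(lambda: gym.make('CartPole-v1'), 3, 0)
agent = RandomAgent(None, env, None)  # config and device are unused by the random agent

# 2. Use a runner to collect data for the agent
runner = EpisodeRunner()
D = runner(agent, env, T=100)

# 3. A learning agent would update itself from the collected data here
agent.learn(D)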

lagom

Agent

class lagom.BaseAgent(config, env, device, **kwargs)[source]

Base class for all agents.

An agent selects an action given an observation and updates itself through a defined learning mechanism.

Any agent should subclass this class, e.g. policy-based or value-based.

Note

All agents should by default handle batched data, e.g. batched observations returned from VecEnv and a batch of actions, one for each sub-environment of the VecEnv.

Parameters:
  • config (dict) – a dictionary of configurations
  • env (VecEnv) – environment object.
  • device (Device) – a PyTorch device
  • **kwargs – keyword arguments used to specify the agent
choose_action(obs, **kwargs)[source]

Returns a (batched) action selected by the agent from the received (batched) observation.

Note

Tensor conversion should be handled here instead of in policy or network forward pass.

The output is a dictionary containing useful items, e.g. action, action_logprob, state_value

Parameters:
  • obs (object) – batched observation returned from the environment. First dimension is treated as batch dimension.
  • **kwargs – keyword arguments to specify action selection.
Returns:

out – a dictionary of action selection output. It should also contain all useful information to be stored during interaction with BaseRunner. This allows a generic API of the runner classes for all kinds of agents. Note that everything should be batched even if for scalar loss, i.e. scalar_loss -> [scalar_loss]

Return type:

dict

learn(D, **kwargs)[source]

Defines learning mechanism to update the agent from a batched data.

Parameters:
  • D (list) – a list of batched data to train the agent e.g. in policy gradient, this can be a list of Trajectory or Segment
  • **kwargs – keyword arguments to specify learning mechanism
Returns:

out – a dictionary of learning output. This could contain the loss.

Return type:

dict
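
As an illustration only, a minimal policy-gradient-style subclass could look like the following sketch. The network, the output dictionary keys, and the structure assumed for D are stand-ins of this sketch, not fixed lagom API:

import torch
import torch.nn as nn
from torch.distributions import Categorical

from lagom import BaseAgent
from lagom.utils import tensorify, numpify

class SketchAgent(BaseAgent):
    def __init__(self, config, env, device, **kwargs):
        super().__init__(config, env, device, **kwargs)
        # stand-in policy network for a flat observation and discrete action space
        self.policy = nn.Linear(env.observation_space.shape[0], env.action_space.n).to(device)
        self.optimizer = torch.optim.Adam(self.policy.parameters(), lr=1e-3)

    def choose_action(self, obs, **kwargs):
        obs = tensorify(obs, self.device)  # tensor conversion handled here, not in the network
        action_dist = Categorical(logits=self.policy(obs))
        action = action_dist.sample()
        # everything in the output dictionary stays batched
        return {'raw_action': numpify(action, 'int'),
                'action_logprob': action_dist.log_prob(action)}

    def learn(self, D, **kwargs):
        # assumption of this sketch: each element of D carries batched
        # 'action_logprob' and 'return' tensors; the actual runner returns
        # Trajectory/Segment objects
        logprobs = torch.cat([d['action_logprob'] for d in D])
        returns = torch.cat([d['return'] for d in D])
        loss = -(logprobs * returns).mean()
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return {'loss': [loss.item()]}  # batched, even for a scalar loss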

class lagom.RandomAgent(config, env, device, **kwargs)[source]

A random agent samples action uniformly from action space.

choose_action(obs, **kwargs)[source]

Returns a (batched) action selected by the agent from the received (batched) observation.

Note

Tensor conversion should be handled here instead of in policy or network forward pass.

The output is a dictionary containing useful items, e.g. action, action_logprob, state_value

Parameters:
  • obs (object) – batched observation returned from the environment. First dimension is treated as batch dimension.
  • **kwargs – keyword arguments to specify action selection.
Returns:

out – a dictionary of action selection output. It should also contain all useful information to be stored during interaction with BaseRunner. This allows a generic API of the runner classes for all kinds of agents. Note that everything should be batched even if for scalar loss, i.e. scalar_loss -> [scalar_loss]

Return type:

dict

learn(D, **kwargs)[source]

Defines learning mechanism to update the agent from a batched data.

Parameters:
  • D (list) – a list of batched data to train the agent e.g. in policy gradient, this can be a list of Trajectory or Segment
  • **kwargs – keyword arguments to specify learning mechanism
Returns:

out – a dictionary of learning output. This could contain the loss.

Return type:

dict
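
For example (a sketch; the environment is illustrative and the exact keys of the output dictionary may vary across versions):

import gym
from lagom import RandomAgent
from lagom.envs import make_vec_env

env = make_vec_env(lambda: gym.make('CartPole-v1'), 2, 0)
agent = RandomAgent(None, env, None)  # config and device are unused here
out = agent.choose_action(env.reset())  # one uniformly sampled action per sub-environment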

Logger

class lagom.Logger[source]

Log the information in a dictionary.

If a key is logged more than once, then the new value will be appended to a list.

Note

It uses pickle to serialize the data. Empirically, pickle is 2x faster than numpy.save; other alternatives like YAML are too slow, and JSON does not support numpy arrays.

Warning

It is discouraged to store hierarchical structures, e.g. a list of dicts of lists of ndarrays, because pickling such complex and large data structures is extremely slow. Put dictionaries only at the topmost level. Large numpy arrays should be saved separately.

Example:

  • Default:

    >>> logger = Logger()
    >>> logger('iteration', 1)
    >>> logger('train_loss', 0.12)
    >>> logger('iteration', 2)
    >>> logger('train_loss', 0.11)
    >>> logger('iteration', 3)
    >>> logger('train_loss', 0.09)
    
    >>> logger
    OrderedDict([('iteration', [1, 2, 3]), ('train_loss', [0.12, 0.11, 0.09])])
    
    >>> logger.dump()
    Iteration: [1, 2, 3]
    Train Loss: [0.12, 0.11, 0.09]
    
  • With indentation:

    >>> logger.dump(indent=1)
        Iteration: [1, 2, 3]
        Train Loss: [0.12, 0.11, 0.09]
    
  • With specific keys:

    >>> logger.dump(keys=['iteration'])
    Iteration: [1, 2, 3]
    
  • With specific index:

    >>> logger.dump(index=0)
    Iteration: 1
    Train Loss: 0.12
    
  • With specific list of indices:

    >>> logger.dump(index=[0, 2])
    Iteration: [1, 3]
    Train Loss: [0.12, 0.09]
    
__call__(key, value)[source]

Log the information with given key and value.

Note

The key should be semantic and each word is separated by _.

Parameters:
  • key (str) – key of the information
  • value (object) – value to be logged
clear()[source]

Remove all loggings in the dictionary.

dump(keys=None, index=None, indent=0, border='')[source]

Dump the loggings to the screen.

Parameters:
  • keys (list, optional) – a list of selected keys. If None, then use all keys. Default: None
  • index (int/list, optional) –

    the index of logged values. It has following use cases:

    • scalar: a specific index. If -1, then use the last element.
    • list: a list of indices.
    • None: all indices.
  • indent (int, optional) – the number of tab indentation. Default: 0
  • border (str, optional) – the string to print as header and footer
save(f)[source]

Save loggings to a file.

Parameters:f (str) – file path
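
Saved loggings can be read back with pickle_load() from lagom.utils, e.g. (the file name is illustrative):

from lagom.utils import pickle_load

logger.save('train_logs.pkl')
logs = pickle_load('train_logs.pkl')  # the logged dictionary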

Engine

class lagom.BaseEngine(config, **kwargs)[source]

Base class for all engines.

It defines the training and evaluation process.

eval(n=None, **kwargs)[source]

Evaluation process for one iteration.

Note

It is recommended to use Logger to store loggings.

Note

All parameterized modules should be called .eval() to specify evaluation mode.

Parameters:
  • n (int, optional) – n-th iteration for evaluation.
  • **kwargs – keyword arguments used for logging.
Returns:

out – evaluation output

Return type:

dict

train(n=None, **kwargs)[source]

Training process for one iteration.

Note

It is recommended to use Logger to store loggings.

Note

All parameterized modules should be called .train() to specify training mode.

Parameters:
  • n (int, optional) – n-th iteration for training.
  • **kwargs – keyword arguments used for logging.
Returns:

out – training output

Return type:

dict
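
A minimal sketch of a subclass is shown below. It assumes the agent, runner and environment were passed through **kwargs and stored as attributes (otherwise store them explicitly in __init__); the horizon T=100 and the agent's policy attribute are assumptions of this sketch:

from lagom import BaseEngine, Logger

class Engine(BaseEngine):
    def train(self, n=None, **kwargs):
        self.agent.policy.train()  # parameterized modules in training mode
        D = self.runner(self.agent, self.env, T=100)
        out = self.agent.learn(D)
        logger = Logger()
        logger('train_iteration', n)
        logger('loss', out['loss'])
        return {'logger': logger}

    def eval(self, n=None, **kwargs):
        self.agent.policy.eval()  # parameterized modules in evaluation mode
        D = self.runner(self.agent, self.env, T=100)
        logger = Logger()
        logger('eval_iteration', n)
        return {'logger': logger}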

Evolution Strategies

class lagom.BaseES[source]

Base class for all evolution strategies.

Note

The optimization is treated as minimization, e.g. maximizing reward is equivalent to minimizing negative reward.

Note

For painless parallelization, we highly recommend using concurrent.futures.ProcessPoolExecutor, with a few practical tips:

  • Set the max_workers argument to control the maximum parallelization capacity.
  • When execution gets stuck, try wrapping the objective function with CloudpickleWrapper, particularly for lambdas and class methods.
  • Use with ProcessPoolExecutor(...) once to wrap the entire iterative loop over ES generations; creating the executor internally for each generation can slow down parallelization dramatically due to overheads.
  • To reduce overheads further (e.g. for PyTorch models and gym environments):
    • Recreating such models for each generation is very expensive.
    • Use the initializer function of ProcessPoolExecutor.
    • Within the initializer function, define PyTorch models and gym environments as global variables. Note that the global variables are defined independently in each worker.
    • Don’t forget to use with torch.no_grad() to speed up the forward pass.
ask()[source]

Sample a set of new candidate solutions.

Returns:solutions – sampled candidate solutions
Return type:list
result

Return a namedtuple of all results for the optimization.

It contains:

  • xbest: best solution evaluated
  • fbest: objective function value of the best solution
  • evals_best: evaluation count when xbest was evaluated
  • evaluations: evaluations overall done
  • iterations: number of iterations
  • xfavorite: distribution mean in “phenotype” space, to be considered as the current best estimate of the optimum
  • stds: effective standard deviations

tell(solutions, function_values)[source]

Update the parameters of the population for a new generation based on the values of the objective function evaluated for sampled solutions.

Parameters:
  • solutions (list/ndarray) – candidate solutions returned from ask()
  • function_values (list) – a list of objective function values evaluated for the sampled solutions.
class lagom.CMAES(x0, sigma0, opts=None)[source]

Implements CMA-ES algorithm.

Note

It is a wrapper of the original CMA-ES implementation.

Parameters:
  • x0 (list) – initial solution
  • sigma0 (list) – initial standard deviation
  • opts (dict) – a dictionary of options, e.g. [‘popsize’, ‘seed’]
ask()[source]

Sample a set of new candidate solutions.

Returns:solutions – sampled candidate solutions
Return type:list
result

Return a namedtuple of all results for the optimization.

It contains:

  • xbest: best solution evaluated
  • fbest: objective function value of the best solution
  • evals_best: evaluation count when xbest was evaluated
  • evaluations: evaluations overall done
  • iterations: number of iterations
  • xfavorite: distribution mean in “phenotype” space, to be considered as the current best estimate of the optimum
  • stds: effective standard deviations

tell(solutions, function_values)[source]

Update the parameters of the population for a new generation based on the values of the objective function evaluated for sampled solutions.

Parameters:
  • solutions (list/ndarray) – candidate solutions returned from ask()
  • function_values (list) – a list of objective function values evaluated for the sampled solutions.
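
Example of a minimal serial ask/tell loop, minimizing a simple sphere function (for expensive objective functions, evaluate the sampled solutions with ProcessPoolExecutor as suggested in the tips above):

import numpy as np
from lagom import CMAES

def sphere(x):
    return float(np.sum(np.square(x)))  # minimum at the origin

es = CMAES(x0=[5.0] * 10, sigma0=0.5, opts={'popsize': 16, 'seed': 0})
for generation in range(100):
    solutions = es.ask()                              # sample candidate solutions
    function_values = [sphere(x) for x in solutions]  # evaluate them
    es.tell(solutions, function_values)               # update the search distribution
print(es.result.xbest, es.result.fbest)
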
class lagom.CEM(x0, sigma0, opts=None)[source]
ask()[source]

Sample a set of new candidate solutions.

Returns:solutions – sampled candidate solutions
Return type:list
result

Return a namedtuple of all results for the optimization.

It contains:

  • xbest: best solution evaluated
  • fbest: objective function value of the best solution
  • evals_best: evaluation count when xbest was evaluated
  • evaluations: evaluations overall done
  • iterations: number of iterations
  • xfavorite: distribution mean in “phenotype” space, to be considered as the current best estimate of the optimum
  • stds: effective standard deviations

tell(solutions, function_values)[source]

Update the parameters of the population for a new generation based on the values of the objective function evaluated for sampled solutions.

Parameters:
  • solutions (list/ndarray) – candidate solutions returned from ask()
  • function_values (list) – a list of objective function values evaluated for the sampled solutions.

lagom.envs

lagom.envs.flatdim(space)[source]
lagom.envs.flatten(space, x)[source]
lagom.envs.unflatten(space, x)[source]
class lagom.envs.VecEnv(list_make_env)[source]

A vectorized environment that steps each sub-environment serially.

Each observation returned from the vectorized environment is a batch of observations, one for each sub-environment, and step() expects to receive a batch of actions, one for each sub-environment.

Note

All sub-environments should share the identical observation and action spaces. In other words, a vector of multiple different environments is not supported.

Parameters:
  • list_make_env (list) – a list of functions, each returning an instantiated environment.
  • observation_space (Space) – observation space of the environment
  • action_space (Space) – action space of the environment
close()[source]

Close all environments.

It closes all existing image viewers, then calls close_extras() and sets closed to True.

Warning

This function itself does not close the environments; that should be handled in close_extras(). This is useful for parallelized environments.

Note

This is called automatically on garbage collection or program exit.

close_extras()[source]

Clean up the extra resources e.g. beyond what’s in this base class.

get_images()[source]

Returns a batched RGB array with shape [N, H, W, C] from all environments.

Returns:imgs – a batched RGB array with shape [N, H, W, C]
Return type:ndarray
get_viewer()[source]

Returns an instantiated ImageViewer.

Returns:viewer – an image viewer
Return type:ImageViewer
render(mode='human')[source]

Render all the environments.

It first retrieves RGB images from all environments and uses GridImage to tile them into a single image. Then it either returns the image array or displays the image on the screen using ImageViewer.

See the docstring in Env for more details about rendering.

reset()[source]

Reset all the environments and return a list of initial observations from each environment.

Warning

If step_async() is still working, then it will be aborted.

Returns:observations – a list of initial observations from all environments.
Return type:list
step(actions)[source]

Ask all the environments to take a step with a list of actions, each for one environment.

Parameters:actions (list) – a list of actions, each for one environment.
Returns:
  • observations (list) – a list of observations, each returned from one environment after executing the given action.
  • rewards (list) – a list of scalar rewards, each returned from one environment.
  • dones (list) – a list of booleans indicating whether the episode terminates, each returned from one environment.
  • infos (list) – a list of dictionaries of additional information, each returned from one environment.
unwrapped

Unwrap this vectorized environment.

Useful when sequential wrappers have been applied; it provides access to the original vectorized environment.

class lagom.envs.VecEnvWrapper(env)[source]

Wraps the vectorized environment to allow a modular transformation.

This class is the base class for all wrappers for vectorized environments. The subclass could override some methods to change the behavior of the original vectorized environment without touching the original code.

Note

Don’t forget to call super().__init__(env) if the subclass overrides __init__().

close_extras()[source]

Clean up the extra resources e.g. beyond what’s in this base class.

get_images()[source]

Returns a batched RGB array with shape [N, H, W, C] from all environments.

Returns:imgs – a batched RGB array with shape [N, H, W, C]
Return type:ndarray
reset()[source]

Reset all the environments and return a list of initial observations from each environment.

Warning

If step_async() is still working, then it will be aborted.

Returns:observations – a list of initial observations from all environments.
Return type:list
step(actions)[source]

Ask all the environments to take a step with a list of actions, each for one environment.

Parameters:actions (list) – a list of actions, each for one environment.
Returns:
  • observations (list) – a list of observations, each returned from one environment after executing the given action.
  • rewards (list) – a list of scalar rewards, each returned from one environment.
  • dones (list) – a list of booleans indicating whether the episode terminates, each returned from one environment.
  • infos (list) – a list of dictionaries of additional information, each returned from one environment.
unwrapped

Unwrap this vectorized environment.

Useful when sequential wrappers have been applied; it provides access to the original vectorized environment.

lagom.envs.make_vec_env(make_env, num_env, init_seed)[source]

Create a vectorized environment, each associated with a different random seed.

Example:

>>> import gym
>>> make_vec_env(lambda: gym.make('CartPole-v1'), 3, 0)
<VecEnv: 3, CartPole-v1>
Parameters:
  • make_env (function) – a function to create an environment
  • num_env (int) – number of environments to create.
  • init_seed (int) – initial seed for Seeder to sample random seeds.
Returns:

env – created vectorized environment

Return type:

VecEnv

Wrappers

lagom.envs.wrappers.get_wrapper(env, name)[source]

Return a wrapped environment of a specific wrapper.

Note

If no such wrapper is found, then None is returned.

Parameters:
  • env (Env) – environment
  • name (str) – name of the wrapper
Returns:

out – wrapped environment

Return type:

env
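
For example (a sketch; the wrapper chain is illustrative):

import gym
from lagom.envs.wrappers import ClipReward, ScaleReward, get_wrapper

env = ScaleReward(ClipReward(gym.make('CartPole-v1'), -1.0, 1.0), scale=0.1)
clip_layer = get_wrapper(env, 'ClipReward')  # the ClipReward layer, or None if absent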

lagom.envs.wrappers.get_all_wrappers(env)[source]

Returns a list of wrapper names of a wrapped environment.

Parameters:env (Env) – wrapped environment
Returns:out – list of string names of wrappers
Return type:list
class lagom.envs.wrappers.ClipAction(env)[source]

Clip the continuous action within the valid bound.

class lagom.envs.wrappers.ClipReward(env, min_r, max_r)[source]

Clip the reward to [min_r, max_r].

class lagom.envs.wrappers.SignClipReward(env)[source]

Bin the reward to {-1, 0, +1} by its sign.

class lagom.envs.wrappers.FlattenObservation(env)[source]

Observation wrapper that flattens the observation.

class lagom.envs.wrappers.NormalizeAction(env)[source]

Rescale continuous actions from [-1, 1] to the environment’s native action bounds.

class lagom.envs.wrappers.LazyFrames(frames, lz4_compress=False)[source]

Ensures common frames are only stored once to optimize memory use.

To further reduce memory use, lz4 compression of the observations can optionally be enabled.

Note

This object should only be converted to a numpy array just before the forward pass.

class lagom.envs.wrappers.FrameStack(env, num_stack, lz4_compress=False)[source]

Observation wrapper that stacks the observations in a rolling manner.

For example, if the number of stacks is 4, then the returned observation contains the most recent 4 observations. For the environment ‘Pendulum-v0’, the original observation is an array with shape [3], so if we stack 4 observations, the processed observation has shape [3, 4].

Note

To be memory efficient, the stacked observations are wrapped by LazyFrame.

Note

The observation space must be Box type. If one uses Dict as observation space, it should apply FlattenDictWrapper at first.

Example:

>>> import gym
>>> env = gym.make('PongNoFrameskip-v0')
>>> env = FrameStack(env, 4)
>>> env.observation_space
Box(4, 210, 160, 3)
Parameters:
  • env (Env) – environment object
  • num_stack (int) – number of stacks
reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:the initial observation.
Return type:observation (object)
step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters:action (object) – an action provided by the agent
Returns:
  • observation (object) – agent’s observation of the current environment
  • reward (float) – amount of reward returned after previous action
  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
  • info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class lagom.envs.wrappers.GrayScaleObservation(env, keep_dim=False)[source]

Convert the image observation from RGB to gray scale.

class lagom.envs.wrappers.ResizeObservation(env, size)[source]

Downsample the image observation to a square image.

class lagom.envs.wrappers.ScaleReward(env, scale=0.01)[source]

Scale the reward.

Note

This is incredibly important and drastically impacts performance, e.g. for PPO.

Example:

>>> from lagom.envs import make_gym_env
>>> env = make_gym_env(env_id='CartPole-v1', seed=0)
>>> env = ScaleReward(env, scale=0.1)
>>> env.reset()
>>> observation, reward, done, info = env.step(env.action_space.sample())
>>> reward
0.1
Parameters:
  • env (Env) – environment
  • scale (float) – reward scaling factor
class lagom.envs.wrappers.ScaledFloatFrame(env)[source]

Convert the image frame to the float range [0, 1] by dividing by 255.

Warning

Do NOT use this wrapper for DQN! It will break the memory optimization.

class lagom.envs.wrappers.TimeAwareObservation(env)[source]

Augment the observation with current time step in the trajectory.

Note

Currently it only works with one-dimensional observation space. It doesn’t support pixel observation space yet.

reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:the initial observation.
Return type:observation (object)
step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters:action (object) – an action provided by the agent
Returns:
  • observation (object) – agent’s observation of the current environment
  • reward (float) – amount of reward returned after previous action
  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
  • info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class lagom.envs.wrappers.VecMonitor(env, deque_size=100)[source]

Record the episode reward, horizon and time, and report them when an episode terminates.

reset()[source]

Reset all the environments and return a list of initial observations from each environment.

Warning

If step_async() is still working, then it will be aborted.

Returns:observations – a list of initial observations from all environments.
Return type:list
step(actions)[source]

Ask all the environments to take a step with a list of actions, each for one environment.

Parameters:actions (list) – a list of actions, each for one environment.
Returns:
  • observations (list) – a list of observations, each returned from one environment after executing the given action.
  • rewards (list) – a list of scalar rewards, each returned from one environment.
  • dones (list) – a list of booleans indicating whether the episode terminates, each returned from one environment.
  • infos (list) – a list of dictionaries of additional information, each returned from one environment.
class lagom.envs.wrappers.VecStandardizeObservation(env, clip=10.0, constant_moments=None)[source]

Standardize the observations with running estimates of the mean and variance.

Warning

To evaluate an agent trained on standardized observations, remember to save and load the observation scalings; otherwise the measured performance will be incorrect.

Parameters:
  • env (VecEnv) – a vectorized environment
  • clip (float) – clipping range of standardized observation, i.e. [-clip, clip]
  • constant_moments (tuple) – a tuple of constant mean and variance to standardize observation. Note that if it is provided, then running average will be ignored.
reset()[source]

Reset all the environments and return a list of initial observations from each environment.

Warning

If step_async() is still working, then it will be aborted.

Returns:observations – a list of initial observations from all environments.
Return type:list
step(actions)[source]

Ask all the environments to take a step with a list of actions, each for one environment.

Parameters:actions (list) – a list of actions, each for one environment.
Returns:
  • observations (list) – a list of observations, each returned from one environment after executing the given action.
  • rewards (list) – a list of scalar rewards, each returned from one environment.
  • dones (list) – a list of booleans indicating whether the episode terminates, each returned from one environment.
  • infos (list) – a list of dictionaries of additional information, each returned from one environment.
class lagom.envs.wrappers.VecStandardizeReward(env, clip=10.0, gamma=0.99, constant_var=None)[source]

Standardize the reward with a running estimate of the variance.

Warning

We do not subtract the running mean from the reward; we only divide it by the running standard deviation, because subtracting the mean would alter the reward shape and might degrade performance. Note that we apply this transformation from the second incoming reward onwards, keeping the first reward unchanged; otherwise it would have too large a magnitude (and simply be clipped) since the mean is not subtracted.

Note

On each reset(), we do not clear the self.all_returns buffer. Because the discount factor is \(< 1\), the running averages converge after some iterations. Therefore, a discount factor of \(1.0\) is not allowed, since it would lead to unbounded growth of the reward running averages.

Parameters:
  • env (VecEnv) – a vectorized environment
  • clip (float) – clipping range of standardized reward, i.e. [-clip, clip]
  • gamma (float) – discount factor. Note that the value 1.0 should not be used.
  • constant_var (ndarray) – Constant variance to standardize reward. Note that when it is provided, then running average will be ignored.
reset()[source]

Reset all the environments and return a list of initial observations from each environment.

Warning

If step_async() is still working, then it will be aborted.

Returns:observations – a list of initial observations from all environments.
Return type:list
step(actions)[source]

Ask all the environments to take a step with a list of actions, each for one environment.

Parameters:actions (list) – a list of actions, each for one environment.
Returns:
  • observations (list) – a list of observations, each returned from one environment after executing the given action.
  • rewards (list) – a list of scalar rewards, each returned from one environment.
  • dones (list) – a list of booleans indicating whether the episode terminates, each returned from one environment.
  • infos (list) – a list of dictionaries of additional information, each returned from one environment.
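
For example, the two standardization wrappers are typically composed on top of a vectorized environment (hyperparameter values are illustrative):

import gym
from lagom.envs import make_vec_env
from lagom.envs.wrappers import VecStandardizeObservation, VecStandardizeReward

env = make_vec_env(lambda: gym.make('Pendulum-v0'), 4, 0)
env = VecStandardizeObservation(env, clip=10.0)
env = VecStandardizeReward(env, clip=10.0, gamma=0.99)
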
class lagom.envs.wrappers.StepInfo(done: bool, info: dict)[source]

Defines a set of information for each time step.

A StepInfo is returned from each step and reset of an environment. It contains properties of the transition and additional information.

class lagom.envs.wrappers.VecStepInfo(env)[source]
reset()[source]

Reset all the environments and return a list of initial observations from each environment.

Warning

If step_async() is still working, then it will be aborted.

Returns:observations – a list of initial observations from all environments.
Return type:list
step(actions)[source]

Ask all the environments to take a step with a list of actions, each for one environment.

Parameters:actions (list) – a list of actions, each for one environment.
Returns:
  • observations (list) – a list of observations, each returned from one environment after executing the given action.
  • rewards (list) – a list of scalar rewards, each returned from one environment.
  • dones (list) – a list of booleans indicating whether the episode terminates, each returned from one environment.
  • infos (list) – a list of dictionaries of additional informations, each returned from one environment.

lagom.experiment

Config

class lagom.experiment.Grid(values)[source]

A grid search over a list of values.

class lagom.experiment.Sample(f)[source]
class lagom.experiment.Condition(f)[source]
class lagom.experiment.Config(items, num_sample=1, keep_dict_order=False)[source]

Defines a set of configurations for the experiment.

The configuration includes the following possible items:

  • Hyperparameters: learning rate, batch size etc.
  • Experiment settings: training iterations, logging directory, environment name etc.

All items are stored in a dictionary. It is a good practice to semantically name each item e.g. network.lr indicates the learning rate of the neural network.

For hyperparameter search, we support both grid search (Grid) and random search (Sample).

Call make_configs() to generate a list of all configurations, each is assigned with a unique ID.

Note

For random search over a small positive float, e.g. the learning rate, it is recommended to use a log-uniform distribution:

\[\text{logU}(a, b) \sim \exp(U(\log(a), \log(b)))\]

An example: np.exp(np.random.uniform(low=np.log(low), high=np.log(high))). Direct uniform sampling of such values is numerically unstable.

Warning

The random seeds should not be set here. Instead, it should be handled by BaseExperimentMaster and BaseExperimentWorker.

Example:

>>> config = Config({'log.dir': 'some path', 'network.lr': Grid([1e-3, 5e-3]), 'env.id': Grid(['CartPole-v1', 'Ant-v2'])}, num_sample=1, keep_dict_order=False)
>>> import pandas as pd
>>> print(pd.DataFrame(config.make_configs()))
       ID       env.id    log.dir  network.lr
    0   0  CartPole-v1  some path       0.001
    1   1       Ant-v2  some path       0.001
    2   2  CartPole-v1  some path       0.005
    3   3       Ant-v2  some path       0.005
Parameters:
  • items (dict) – a dictionary of all configuration items.
  • num_sample (int) – number of samples for random configuration items. If grid search is also provided, then the grid will be repeated num_sample times.
  • keep_dict_order (bool) – if True, then each generated configuration has the same key ordering as items.
make_configs()[source]

Generate a list of all possible combinations of configurations, including grid search and random search.

Returns:list_config – a list of all possible configurations
Return type:list

Run experiment

lagom.experiment.run_experiment(run, config, seeds, log_dir, max_workers, chunksize=1, use_gpu=False, gpu_ids=None)[source]

A convenient function to parallelize the experiment (master-worker pipeline).

It is implemented using concurrent.futures.ProcessPoolExecutor.

It automatically creates a subfolder for each pair of configuration and random seed to store the loggings of the experiment. The root folder is given by the user. A subfolder is created for each configuration, named by its job ID; under each of these, a subfolder is created for each random seed (with the seed as the folder name). Intuitively, an experiment could have the following directory structure:

- logs
    - 0  # ID number
        - 123  # random seed
        - 345
        - 567
    - 1
        - 123
        - 345
        - 567
    - 2
        - 123
        - 345
        - 567
    - 3
        - 123
        - 345
        - 567
    - 4
        - 123
        - 345
        - 567
Parameters:
  • run (function) – a function that defines an algorithm, it must take the arguments (config, seed, device, logdir)
  • config (Config) – a Config object defining all configuration settings
  • seeds (list) – a list of random seeds
  • log_dir (str) – a string to indicate the path to store loggings.
  • max_workers (int) – argument for ProcessPoolExecutor. if None, then all experiments run serially.
  • chunksize (int) – argument for Executor.map()
  • use_gpu (bool) – if True, then use CUDA. Otherwise, use CPU.
  • gpu_ids (list) – if None, then use all available GPUs. Otherwise, only use the GPU device defined in the list.
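
A minimal sketch (the configuration items and seeds are illustrative; the body of run is a stand-in for an actual algorithm):

from lagom.experiment import Config, Grid, run_experiment

config = Config({'log.freq': 10, 'network.lr': Grid([1e-3, 5e-3])})

def run(config, seed, device, logdir):
    # a stand-in algorithm: build the agent here from config and train it
    print(config, seed, device, logdir)

if __name__ == '__main__':
    run_experiment(run, config, seeds=[123, 345, 567], log_dir='logs', max_workers=4)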

lagom.metric: Metrics

lagom.metric.returns(gamma, traj)[source]
lagom.metric.bootstrapped_returns(gamma, traj, last_V)[source]

Return (discounted) accumulated returns with bootstrapping for a batch of episodic transitions.

Formally, suppose we have all rewards \((r_1, \dots, r_T)\), it computes

\[Q_t = r_t + \gamma r_{t+1} + \dots + \gamma^{T - t} r_T + \gamma^{T - t + 1} V(s_{T+1})\]

Note

The state values for terminal states are masked out as zero!

lagom.metric.td0_target(gamma, traj, Vs, last_V)[source]

Calculate TD(0) targets of a batch of episodic transitions.

Let \(r_1, r_2, \dots, r_T\) be a list of rewards and let \(V(s_0), V(s_1), \dots, V(s_{T-1}), V(s_{T})\) be a list of state values, including a last state value. Let \(\gamma\) be the discount factor; the TD(0) targets are calculated as follows

\[r_t + \gamma V(s_t), \forall t = 1, 2, \dots, T\]

Note

The state values for terminal states are masked out as zero!

lagom.metric.td0_error(gamma, traj, Vs, last_V)[source]

Calculate TD(0) errors of a batch of episodic transitions.

Let \(r_1, r_2, \dots, r_T\) be a list of rewards and let \(V(s_0), V(s_1), \dots, V(s_{T-1}), V(s_{T})\) be a list of state values, including a last state value. Let \(\gamma\) be the discount factor; the TD(0) errors are calculated as follows

\[\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)\]

Note

The state values for terminal states are masked out as zero!

lagom.metric.gae(gamma, lam, traj, Vs, last_V)[source]

Calculate the Generalized Advantage Estimation (GAE) of a batch of episodic transitions.

Let \(\delta_t\) be the TD(0) error at time step \(t\), the GAE at time step \(t\) is calculated as follows

\[A_t^{\mathrm{GAE}(\gamma, \lambda)} = \sum_{k=0}^{\infty}(\gamma\lambda)^k \delta_{t + k}\]
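
Since GAE is a geometrically discounted sum of TD(0) errors, its core computation can be sketched with geometric_cumsum() from lagom.transform. The values below are illustrative, and the sketch assumes a single trajectory without terminal states; the actual gae() also handles batching and terminal masking:

import numpy as np
from lagom.transform import geometric_cumsum

gamma, lam = 0.99, 0.95
rewards = np.array([1.0, 1.0, 1.0])
Vs = np.array([0.5, 0.6, 0.7])  # V(s_0), V(s_1), V(s_2)
last_V = 0.8                    # bootstrap value V(s_3)
deltas = rewards + gamma * np.append(Vs[1:], last_V) - Vs  # TD(0) errors
advantages = geometric_cumsum(gamma * lam, deltas)[0]      # discounted sums of deltas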

lagom.multiprocessing

Use Python multiprocessing library

class lagom.multiprocessing.ProcessMaster(worker_class, num_worker)[source]

Base class for all masters implemented with Python multiprocessing.Process.

It creates a number of workers, each with an individual Process. The communication between the master and each worker is via an independent Pipe connection. The master assigns tasks to the workers. When all tasks are done, it stops all workers and terminates all processes.

Note

If there are more tasks than workers, the tasks will be split into chunks. If there are fewer tasks than workers, the number of workers is reduced to the number of tasks.

assign_tasks(tasks)[source]

Assign a given list of tasks to the workers and return the received results.

Parameters:tasks (list) – a list of tasks
Returns:results – received results
Return type:object
close()[source]

Defines everything required after finishing all the work, e.g. stopping all workers and cleaning up.

make_tasks()[source]

Returns a list of tasks.

Returns:tasks – a list of tasks
Return type:list
class lagom.multiprocessing.ProcessWorker(master_conn, worker_conn)[source]

Base class for all workers implemented with Python multiprocessing.Process.

It communicates with the master via a Pipe connection. The worker stands by indefinitely, waiting for a task from the master, working on it and sending back the result. When it receives a close command, it breaks the infinite loop and closes the connection.

work(task_id, task)[source]

Work on the given task and return the result.

Parameters:
  • task_id (int) – the task ID.
  • task (object) – a given task.
Returns:

result – working result.

Return type:

object
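
A minimal sketch of the master-worker pattern (the task definitions are illustrative):

from lagom.multiprocessing import ProcessMaster, ProcessWorker

class SquareWorker(ProcessWorker):
    def work(self, task_id, task):
        return task ** 2  # work on a single task and return the result

class SquareMaster(ProcessMaster):
    def make_tasks(self):
        return [1, 2, 3, 4, 5]

if __name__ == '__main__':
    master = SquareMaster(SquareWorker, num_worker=3)
    results = master.assign_tasks(master.make_tasks())
    master.close()  # stop all workers and terminate the processes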

lagom.networks: Networks

class lagom.networks.Module(**kwargs)[source]

Wrap PyTorch nn.Module to provide more helper functions.

from_vec(x)[source]

Set the network parameters from a single flattened vector.

Parameters:x (Tensor) – A single flattened vector of the network parameters with consistent size.
load(f)[source]

Load the network parameters from a file.

It complies with the recommended approach for saving a model in PyTorch documentation.

Parameters:f (str) – file path.
num_params

Returns the total number of parameters in the neural network.

num_trainable_params

Returns the total number of trainable parameters in the neural network.

num_untrainable_params

Returns the total number of untrainable parameters in the neural network.

save(f)[source]

Save the network parameters to a file.

It complies with the recommended approach for saving a model in PyTorch documentation.

Note

It uses the highest pickle protocol to serialize the network parameters.

Parameters:f (str) – file path.
to_vec()[source]

Returns the network parameters as a single flattened vector.
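
These flat-vector helpers pair naturally with evolution strategies, e.g. (a sketch; the network is illustrative):

import torch.nn as nn
from lagom.networks import Module

class TinyNet(Module):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.fc = nn.Linear(3, 4)

net = TinyNet()
x = net.to_vec()       # all parameters flattened into a single vector
net.from_vec(x * 0.5)  # load a (e.g. ES-perturbed) parameter vector back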

lagom.networks.ortho_init(module, nonlinearity=None, weight_scale=1.0, constant_bias=0.0)[source]

Applies orthogonal initialization for the parameters of a given module.

Parameters:
  • module (nn.Module) – A module to apply orthogonal initialization over its parameters.
  • nonlinearity (str, optional) – Nonlinearity followed by forward pass of the module. When nonlinearity is not None, the gain will be calculated and weight_scale will be ignored. Default: None
  • weight_scale (float, optional) – Scaling factor to initialize the weight. Ignored when nonlinearity is not None. Default: 1.0
  • constant_bias (float, optional) – Constant value to initialize the bias. Default: 0.0

Note

Currently, the only supported modules are elementary neural network layers, e.g. nn.Linear, nn.Conv2d, nn.LSTM. Submodules are not supported.

Example:

>>> a = nn.Linear(2, 3)
>>> ortho_init(a)
lagom.networks.linear_lr_scheduler(optimizer, N, min_lr)[source]

Defines a linear learning rate scheduler.

Parameters:
  • optimizer (Optimizer) – optimizer
  • N (int) – maximum bounds for the scheduling iteration e.g. total number of epochs, iterations or time steps.
  • min_lr (float) – lower bound of learning rate
lagom.networks.make_fc(input_dim, hidden_sizes)[source]

Returns a ModuleList of fully connected layers.

Note

All submodules can be automatically tracked because it uses nn.ModuleList. One can use this function to generate parameters in BaseNetwork.

Example:

>>> make_fc(3, [4, 5, 6])
ModuleList(
  (0): Linear(in_features=3, out_features=4, bias=True)
  (1): Linear(in_features=4, out_features=5, bias=True)
  (2): Linear(in_features=5, out_features=6, bias=True)
)
Parameters:
  • input_dim (int) – input dimension in the first fully connected layer.
  • hidden_sizes (list) – a list of hidden sizes, each for one fully connected layer.
Returns:

fc – A ModuleList of fully connected layers.

Return type:

nn.ModuleList

lagom.networks.make_cnn(input_channel, channels, kernels, strides, paddings)[source]

Returns a ModuleList of 2D convolution layers.

Note

All submodules can be automatically tracked because it uses nn.ModuleList. One can use this function to generate parameters in BaseNetwork.

Example:

>>> make_cnn(input_channel=3, channels=[16, 32], kernels=[4, 3], strides=[2, 1], paddings=[1, 0])
ModuleList(
  (0): Conv2d(3, 16, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
  (1): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
)
Parameters:
  • input_channel (int) – input channel in the first convolution layer.
  • channels (list) – a list of channels, each for one convolution layer.
  • kernels (list) – a list of kernels, each for one convolution layer.
  • strides (list) – a list of strides, each for one convolution layer.
  • paddings (list) – a list of paddings, each for one convolution layer.
Returns:

cnn – A ModuleList of 2D convolution layers.

Return type:

nn.ModuleList

lagom.networks.make_transposed_cnn(input_channel, channels, kernels, strides, paddings, output_paddings)[source]

Returns a ModuleList of 2D transposed convolution layers.

Note

All submodules can be automatically tracked because it uses nn.ModuleList. One can use this function to generate parameters in BaseNetwork.

Example:

make_transposed_cnn(input_channel=3,
                    channels=[16, 32],
                    kernels=[4, 3],
                    strides=[2, 1],
                    paddings=[1, 0],
                    output_paddings=[1, 0])
ModuleList(
  (0): ConvTranspose2d(3, 16, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
  (1): ConvTranspose2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
)
Parameters:
  • input_channel (int) – input channel in the first transposed convolution layer.
  • channels (list) – a list of channels, each for one transposed convolution layer.
  • kernels (list) – a list of kernels, each for one transposed convolution layer.
  • strides (list) – a list of strides, each for one transposed convolution layer.
  • paddings (list) – a list of paddings, each for one transposed convolution layer.
  • output_paddings (list) – a list of output paddings, each for one transposed convolution layer.
Returns:

transposed_cnn – A ModuleList of 2D transposed convolution layers.

Return type:

nn.ModuleList

class lagom.networks.MDNHead(in_features, out_features, num_density, device, **kwargs)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

loss(logit_pi, mean, std, target)[source]

Calculate the MDN loss function.

The loss function (negative log-likelihood) is defined by:

\[L = -\frac{1}{N}\sum_{n=1}^{N}\ln \left( \sum_{k=1}^{K}\prod_{d=1}^{D} \pi_{k}(x_{n, d}) \mathcal{N}\left( \mu_k(x_{n, d}), \sigma_k(x_{n,d}) \right) \right)\]

For better numerical stability, we could use log-scale:

\[L = -\frac{1}{N}\sum_{n=1}^{N}\ln \left( \sum_{k=1}^{K}\exp \left\{ \sum_{d=1}^{D} \ln\pi_{k}(x_{n, d}) + \ln\mathcal{N}\left( \mu_k(x_{n, d}), \sigma_k(x_{n,d}) \right) \right\} \right)\]

Note

One should always use the second formula via log-sum-exp trick. The first formula is numerically unstable resulting in +/- Inf and NaN error.

The log-sum-exp trick is defined by

\[\log\sum_{i=1}^{N}\exp(x_i) = a + \log\sum_{i=1}^{N}\exp(x_i - a)\]

where \(a = \max_i(x_i)\)

Parameters:
  • logit_pi (Tensor) – the logit of mixing coefficients, shape [N, K, D]
  • mean (Tensor) – mean of Gaussian mixtures, shape [N, K, D]
  • std (Tensor) – standard deviation of Gaussian mixtures, shape [N, K, D]
  • target (Tensor) – target tensor, shape [N, D]
Returns:

loss – calculated loss

Return type:

Tensor

sample(logit_pi, mean, std, tau=1.0)[source]

Sample from Gaussian mixtures using reparameterization trick.

  • First, sample categorically over the mixing coefficients to determine a specific Gaussian.
  • Then, sample from the selected Gaussian distribution.
Parameters:
  • logit_pi (Tensor) – the logit of mixing coefficients, shape [N, K, D]
  • mean (Tensor) – mean of Gaussian mixtures, shape [N, K, D]
  • std (Tensor) – standard deviation of Gaussian mixtures, shape [N, K, D]
  • tau (float) – temperature during sampling; it controls uncertainty: if \(\tau > 1\), uncertainty increases; if \(\tau < 1\), uncertainty decreases.
Returns:

x – sampled data with shape [N, D]

Return type:

Tensor

Recurrent Neural Networks

class lagom.networks.LayerNormLSTMCell(input_size, hidden_size)[source]
class lagom.networks.LSTMLayer(cell, *cell_args)[source]
class lagom.networks.StackedLSTM(num_layers, layer, first_layer_args, other_layer_args)[source]
lagom.networks.make_lnlstm(input_size, hidden_size, num_layers=1)[source]

RL components

class lagom.networks.CategoricalHead(feature_dim, num_action, device, **kwargs)[source]

Defines a module for a Categorical (discrete) action distribution.

Example

>>> import torch
>>> action_head = CategoricalHead(30, 4, 'cpu')
>>> action_head(torch.randn(2, 30))
Categorical(probs: torch.Size([2, 4]))
Parameters:
  • feature_dim (int) – number of input features
  • num_action (int) – number of discrete actions
  • device (torch.device) – PyTorch device
  • **kwargs – keyword arguments for more specifications.
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class lagom.networks.DiagGaussianHead(feature_dim, action_dim, device, std0, **kwargs)[source]

Defines a module for a diagonal Gaussian (continuous) action distribution whose standard deviation is state-independent.

The network outputs the mean \(\mu(x)\) and the state-independent logarithm of the standard deviation \(\log\sigma\) (optimizing in log-space allows both negative and positive values).

The standard deviation is then recovered by applying the exponential function \(\exp(x)\).

Example

>>> import torch
>>> action_head = DiagGaussianHead(10, 4, 'cpu', 0.45)
>>> action_dist = action_head(torch.randn(2, 10))
>>> action_dist.base_dist
Normal(loc: torch.Size([2, 4]), scale: torch.Size([2, 4]))
>>> action_dist.base_dist.stddev
tensor([[0.4500, 0.4500, 0.4500, 0.4500],
        [0.4500, 0.4500, 0.4500, 0.4500]], grad_fn=<ExpBackward>)
Parameters:
  • feature_dim (int) – number of input features
  • action_dim (int) – flat dimension of actions
  • device (torch.device) – PyTorch device
  • std0 (float) – initial standard deviation
  • **kwargs – keyword arguments for more specifications.
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

lagom.runner: Runner

class lagom.runner.BaseRunner[source]

Base class for all runners.

A runner is a data collection interface between the agent and the environment. On each call, the runner lets the agent take actions in, and receive observations from, the environment for a certain number of trajectories/segments and a certain number of time steps.

Note

By default, the agent handles batched data returned from VecEnv type of environment.

__call__(agent, env, T, **kwargs)[source]

Run the agent in the environment for a number of time steps and collect all necessary interaction data.

Parameters:
  • agent (BaseAgent) – agent
  • env (VecEnv) – VecEnv type of environment
  • T (int) – number of time steps
  • **kwargs – keyword arguments for more specifications.
class lagom.runner.Trajectory[source]
class lagom.runner.EpisodeRunner(reset_on_call=True)[source]
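
For example (a sketch; whether the environment must be wrapped with VecStepInfo first depends on the version):

import gym
from lagom import RandomAgent
from lagom.runner import EpisodeRunner
from lagom.envs import make_vec_env
from lagom.envs.wrappers import VecStepInfo

env = VecStepInfo(make_vec_env(lambda: gym.make('CartPole-v1'), 1, 0))
agent = RandomAgent(None, env, None)
runner = EpisodeRunner(reset_on_call=True)
D = runner(agent, env, T=200)  # a list of collected Trajectory objects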

lagom.transform: Transformations

class lagom.transform.Describe(count: int, mean: float, std: float, min: float, max: float, repr_indent: int = 0, repr_prefix: str = None)[source]
lagom.transform.describe(x, axis=-1, repr_indent=0, repr_prefix=None)[source]
lagom.transform.interp_curves(x, y)[source]

Piecewise linear interpolation of a discrete set of data points, generating new \(x-y\) values from the interpolated line.

It receives a batch of curves with \(x-y\) values; a global min and max of the x-axis are calculated over the entire batch, and new x-axis values are generated and fed to the interpolation function. All interpolated curves then share the same x-axis values.

Note

This is useful for plotting a set of curves with uncertainty bands where each curve has data points at different \(x\) values. To generate such plot, we need the set of \(y\) values with consistent \(x\) values.

Warning

Piecewise linear interpolation often leads to more realistic uncertainty bands. Do not use polynomial interpolation, whose resulting curve can be extremely misleading.

Example:

>>> import matplotlib.pyplot as plt

>>> x1 = [4, 5, 7, 13, 20]
>>> y1 = [0.25, 0.22, 0.53, 0.37, 0.55]
>>> x2 = [2, 4, 6, 7, 9, 11, 15]
>>> y2 = [0.03, 0.12, 0.4, 0.2, 0.18, 0.32, 0.39]

>>> plt.scatter(x1, y1, c='blue')
>>> plt.scatter(x2, y2, c='red')

>>> new_x, new_y = interp_curves([x1, x2], [y1, y2], num_point=100)
>>> plt.plot(new_x[0], new_y[0], 'blue')
>>> plt.plot(new_x[1], new_y[1], 'red')
Parameters:
  • x (list) – a batch of x values.
  • y (list) – a batch of y values.
  • num_point (int) – number of points to generate from the interpolated line.
Returns:

  • out_x (list) – interpolated x values (shared for the batch of curves)
  • out_y (list) – interpolated y values

lagom.transform.geometric_cumsum(alpha, x)[source]

Calculate future accumulated sums for each element in a list with an exponential factor.

Given input data \(x_1, \dots, x_n\) and exponential factor \(\alpha\in [0, 1]\), it returns an array \(y\) with the same length and each element is calculated as following

\[y_i = x_i + \alpha x_{i+1} + \alpha^2 x_{i+2} + \dots + \alpha^{n-i-1}x_{n-1} + \alpha^{n-i}x_{n}\]

Note

To gain the optimal runtime speed, we use scipy.signal.lfilter

Example

>>> geometric_cumsum(0.1, [1, 2, 3, 4])
array([[1.234, 2.34 , 3.4  , 4.   ]])
Parameters:
  • alpha (float) – exponential factor between zero and one.
  • x (list) – input data
Returns:

out – calculated data

Return type:

ndarray

lagom.transform.explained_variance(y_true, y_pred, **kwargs)[source]

Computes the explained variance regression score.

It measures the fraction of the variance in the ground truth that the prediction explains.

Let \(\hat{y}\) be the predicted output and let \(y\) be the ground truth output. Then the explained variance is estimated as follows:

\[\text{EV}(y, \hat{y}) = 1 - \frac{\text{Var}(y - \hat{y})}{\text{Var}(y)}\]

The best score is \(1.0\), and lower values are worse. A detailed interpretation is as follows:

  • \(\text{EV} = 1\): perfect prediction
  • \(\text{EV} = 0\): might as well have predicted zero
  • \(\text{EV} < 0\): worse than just predicting zero

Note

It calls the function from scikit-learn which handles exceptions better e.g. zero division, batch size.

Example

>>> explained_variance(y_true=[3, -0.5, 2, 7], y_pred=[2.5, 0.0, 2, 8])
0.9571734475374732
>>> explained_variance(y_true=[[3, -0.5, 2, 7]], y_pred=[[2.5, 0.0, 2, 8]])
0.9571734475374732
>>> explained_variance(y_true=[[0.5, 1], [-1, 1], [7, -6]], y_pred=[[0, 2], [-1, 2], [8, -5]])
0.9838709677419355
>>> explained_variance(y_true=[[0.5, 1], [-1, 10], [7, -6]], y_pred=[[0, 2], [-1, 0.00005], [8, -5]])
0.6704023148857179
Parameters:
  • y_true (list) – ground truth output
  • y_pred (list) – predicted output
  • **kwargs – keyword arguments to specify the estimation of the explained variance.
Returns:

out – estimated explained variance

Return type:

float

class lagom.transform.LinearSchedule(initial, final, N, start=0)[source]

A linear schedule from an initial to a final value over a certain number of timesteps; afterwards, the final value is held constant.

Note

This could be useful for following use cases:

  • Decay of epsilon-greedy: initialized at \(1.0\) and held for start time steps, then linearly decayed to final over N time steps, and held constant at final afterwards.
  • Beta parameter in prioritized experience replay.

Note that for learning rate decay, one should use PyTorch optim.lr_scheduler instead.

Example

>>> scheduler = LinearSchedule(initial=1.0, final=0.1, N=3, start=0)
>>> [scheduler(i) for i in range(6)]
[1.0, 0.7, 0.4, 0.1, 0.1, 0.1]
Parameters:
  • initial (float) – initial value
  • final (float) – final value
  • N (int) – number of scheduling timesteps
  • start (int, optional) – the timestep to start the scheduling. Default: 0
__call__(x)[source]

Returns the current value of the scheduling.

Parameters:x (int) – the current timestep.
Returns:out – current value of the scheduling.
Return type:float
lagom.transform.rank_transform(x, centered=True)[source]

Rank transformation of a vector of values. The rank has the same dimensionality as the vector, and each element in the rank indicates the index in the ascendingly sorted input, i.e. ranks[i] = k means the i-th element of the input is the \(k\)-th smallest value.

Rank transformation reduces sensitivity to outliers. E.g. in OpenAI ES, the gradient computation involves the fitness values of the population; if there are outliers (excessively large fitness values), they affect the gradient too much.

Note that a centered rank transformation to the range [-0.5, 0.5] is supported by an option.

Example

>>> rank_transform([3, 14, 1], centered=True)
array([ 0. ,  0.5, -0.5])
>>> rank_transform([3, 14, 1], centered=False)
array([1, 2, 0])
Parameters:
  • x (list/ndarray) – a vector of values.
  • centered (bool, optional) – if True, then center the rank transformation to \([-0.5, 0.5]\). Default: True
Returns:

ranks – ranks of input data

Return type:

ndarray

class lagom.transform.PolyakAverage(alpha)[source]

Keep a running average of a quantity via Polyak averaging.

Compared with estimating the mean, it is more sensitive to recent changes.

Parameters:alpha (float) – factor to control the sensitivity to recent changes, in the range [0, 1]. Zero is most sensitive to recent change.
__call__(x)[source]

Update the estimate.

Parameters:x (object) – additional data to update the estimation of running average.
get_current()[source]

Return the current running average.

class lagom.transform.RunningMeanVar(shape)[source]

Estimates sample mean and variance by using Chan’s method.

It supports both scalar and multi-dimensional data; however, the input is expected to be batched. The first dimension is always treated as the batch dimension.

Note

For better precision, we handle the data with np.float64.

Warning

To use the estimated moments for standardization, remember to keep the np.float64 precision and calculate \(\frac{x - \mu}{\sqrt{\sigma^2 + 10^{-8}}}\).

Example

>>> f = RunningMeanVar(shape=())
>>> f([1, 2])
>>> f([3])
>>> f([4])
>>> f.mean
2.499937501562461
>>> f.var
1.2501499923440393
__call__(x)[source]

Update the mean and variance given an additional batched data.

Parameters:x (object) – additional batched data.
n

Returns the total number of samples so far.

lagom.transform.smooth_filter(x, window_length, polyorder, **kwargs)[source]

Smooth a sequence of noisy data points by applying a Savitzky–Golay filter. It uses least squares to fit a polynomial within a small sliding window and uses this polynomial to estimate the point at the center of the window.

This is useful when a curve is highly noisy, smoothing it out leads to better visualization quality.

Example

>>> import matplotlib.pyplot as plt
>>> x = np.linspace(0, 4*2*np.pi, num=100)
>>> y = x*(np.sin(x) + np.random.random(100)*4)
>>> y2 = smooth_filter(y, window_length=31, polyorder=10)
>>> plt.plot(x, y)
>>> plt.plot(x, y2, 'red')
Parameters:
  • x (list) – one-dimensional vector of scalar data points of a curve.
  • window_length (int) – the length of the filter window
  • polyorder (int) – the order of the polynomial used to fit the samples
Returns:

out – smoothed curve data

Return type:

ndarray

class lagom.transform.SegmentTree(capacity, operation, identity_element)[source]

Defines a segment tree data structure.

It can be regarded as a regular array, but with two major differences:

  • Value modification is slower: O(ln(capacity)) instead of O(1)
  • Efficient reduce operation over contiguous subarray: O(ln(segment size))
Parameters:
  • capacity (int) – total number of elements, it must be a power of two.
  • operation (lambda) – binary operation forming a group, e.g. sum, min
  • identity_element (object) – identity element in the group, e.g. 0 for sum
reduce(start=0, end=None)[source]

Returns result of operation(A[start], operation(A[start+1], operation(… A[end - 1]))).

Parameters:
  • start (int) – start of segment
  • end (int) – end of segment
Returns:

out – result of reduce operation

Return type:

object

class lagom.transform.SumTree(capacity)[source]

Defines the sum tree for storing replay priorities.

Each leaf node contains priority value. Internal nodes maintain the sum of the priorities of all leaf nodes in their subtrees.

find_prefixsum_index(prefixsum)[source]

Find the highest index i in the array such that A[0] + A[1] + … + A[i - 1] <= prefixsum.

If the array values are probabilities, this function efficiently samples indices according to the discrete probability distribution.

Parameters:prefixsum (float) – prefix sum.
Returns:index – highest index satisfying the prefixsum constraint
Return type:int
sum(start=0, end=None)[source]

Return A[start] + … + A[end - 1]
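
For example (a sketch; item assignment tree[i] = value is assumed to work as for a regular array, as described above):

from lagom.transform import SumTree

tree = SumTree(capacity=4)  # capacity must be a power of two
for i, priority in enumerate([0.1, 0.5, 0.3, 0.1]):
    tree[i] = priority       # assumed array-style assignment
total = tree.sum()                     # total priority (approximately 1.0)
idx = tree.find_prefixsum_index(0.55)  # index 1: sampled in proportion to priority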

class lagom.transform.MinTree(capacity)[source]
min(start=0, end=None)[source]

Returns min(A[start], …, A[end])

lagom.vis: Visualization

class lagom.vis.ImageViewer(max_width=500)[source]

Display an image from an RGB array in an OpenGL window.

Example:

imageviewer = ImageViewer(max_width=500)
image = np.asarray(Image.open('x.jpg'))
imageviewer(image)
__call__(x)[source]

Create an image from the given RGB array and display to the window.

Parameters:x (ndarray) – RGB array
__init__(max_width=500)[source]

Initialize the OpenGL window.

Parameters:max_width (int) – maximum width of the window.
close()[source]

Close the Window.

class lagom.vis.GridImage(ncol=8, padding=2, pad_value=0)[source]

Generate a grid of images. The images can be iteratively added.

Example:

grid = GridImage(ncol=8, padding=5, pad_value=0)

a = np.random.randint(0, 255+1, size=[10, 3, 64, 64])
grid.add(a)
grid()

Parameters:
  • ncol (int, optional) – Number of images to show in each row of the grid. Final grid size is [N/ncol, ncol]. Default: 8.
  • padding (int, optional) – Number of paddings. Default: 2.
  • pad_value (float, optional) – Padding value in the range [0, 255]. Black is 0 and white 255. Default: 0
__call__(**kwargs)[source]

Make grid of images.

Parameters:**kwargs – keyword arguments used to specify the grid of images.
Returns:img – a grid of image with shape [H, W, C] and dtype np.uint8
Return type:Image
add(x)[source]

Add a new data for making grid images.

Parameters:x (list/ndarray) – a list or ndarray of images, with shape either [H, W], [C, H, W] or [N, C, H, W]
lagom.vis.set_ticker(ax, axis='x', num=None, KM_format=False, integer=False)[source]
lagom.vis.read_xy(log_folder, file_name, get_x, get_y, smooth_out=False)[source]

lagom.utils: Utils

lagom.utils.set_global_seeds(seed)[source]

Set the seed for generating random numbers.

It sets the following dependencies with the given random seed:

  1. PyTorch
  2. Numpy
  3. Python random
Parameters:seed (int) – a given seed.
class lagom.utils.Seeder(init_seed=0)[source]

A random seed generator.

Given an initial seed, the seeder can be called continuously to sample a single or a batch of random seeds.

Note

The seeder creates an independent RandomState to generate random numbers. It does not affect the RandomState in np.random.

Example:

>>> seeder = Seeder(init_seed=0)
>>> seeder(size=5)
[209652396, 398764591, 924231285, 1478610112, 441365315]
__call__(size=1)[source]

Return the sampled random seeds according to the given size.

Parameters:size (int or list) – The size of random seeds to sample.
Returns:seeds – a list of sampled random seeds.
Return type:list
__init__(init_seed=0)[source]

Initialize the seeder.

Parameters:init_seed (int, optional) – Initial seed for generating random seeds. Default: 0.
lagom.utils.tensorify(x, device)[source]
lagom.utils.numpify(x, dtype)[source]
lagom.utils.pickle_dump(obj, f, ext='.pkl')[source]

Serialize an object using pickling and save in a file.

Note

It uses cloudpickle instead of pickle to support lambda function and multiprocessing. By default, the highest protocol is used.

Note

Except for pure array objects, it is not recommended to use np.save because it is often much slower.

Parameters:
  • obj (object) – a serializable object
  • f (str/Path) – file path
  • ext (str, optional) – file extension. Default: .pkl
lagom.utils.pickle_load(f)[source]

Read a pickled data from a file.

Parameters:f (str/Path) – file path
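
A round trip, e.g. (whether the extension is appended automatically when missing is an assumption of this sketch):

from lagom.utils import pickle_dump, pickle_load

data = {'iteration': [1, 2, 3], 'train_loss': [0.12, 0.11, 0.09]}
pickle_dump(data, 'metrics')          # assumed to write metrics.pkl
metrics = pickle_load('metrics.pkl')  # read the pickled dictionary back
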
lagom.utils.yaml_dump(obj, f, ext='.yml')[source]

Serialize a Python object using YAML and save in a file.

Note

YAML is recommended to use for a small dictionary and it is super human-readable. e.g. configuration settings. For saving experiment metrics, it is better to use pickle_dump().

Note

Except for pure array objects, it is not recommended to use np.load because it is often much slower.

Parameters:
  • obj (object) – a serializable object
  • f (str/Path) – file path
  • ext (str, optional) – file extension. Default: .yml
lagom.utils.yaml_load(f)[source]

Read the data from a YAML file.

Parameters:f (str/Path) – file path
lagom.utils.color_str(string, color, attribute=None)[source]

Returns stylized string with color and attribute for printing.

Example:

>>> print(color_str('lagom', 'green', attribute='bold'))

See colored documentation for more details.

Parameters:
  • string (str) – input string
  • color (str) – color name
  • attribute (str, optional) – attribute. Default: None
Returns:

out – stylized string

Return type:

str

lagom.utils.timed(color='green', attribute='bold')[source]

A decorator to print the total time of executing a body function.

Parameters:
  • color (str, optional) – color name. Default: ‘green’
  • attribute (str, optional) – attribute. Default: ‘bold’
lagom.utils.timeit(_func=None, *, color='green', attribute='bold')[source]
lagom.utils.ask_yes_or_no(msg)[source]

Ask user to enter yes or no to a given message.

Parameters:msg (str) – a message
class lagom.utils.CloudpickleWrapper(x)[source]

Uses cloudpickle to serialize contents (multiprocessing uses pickle by default)

This is useful when passing lambda definition through Process arguments.
