lagom

inte för mycket och inte för lite, enkelhet är bäst ("not too much and not too little, simplicity is often the best")
lagom is a light PyTorch infrastructure to quickly prototype reinforcement learning algorithms.
lagom balances flexibility and usability when developing reinforcement learning (RL) algorithms. The library is built on top of PyTorch and provides modular tools to quickly prototype RL algorithms. However, we do not go overboard: going too low-level is time consuming and prone to bugs, while going too high-level degrades flexibility and makes it difficult to try out crazy ideas.
We are continuously making lagom more 'self-contained' for running experiments quickly. It now provides base classes for multiprocessing (a master-worker framework) to parallelize workloads such as experiments and evolution strategies. It also supports hyperparameter search, with configurations defined as either grid search or random search.
A typical pipeline for using lagom is as follows (see the sketch after this list):
- Define the environment and the RL agent
- Use a runner to collect data for the agent
- Define an algorithm to train the agent
- Define the experiment and configurations
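A minimal sketch of this pipeline, using only documented pieces (make_vec_env, RandomAgent); the runner step is left as a commented placeholder since a concrete BaseRunner subclass is an assumption here:

    import gym
    import torch

    from lagom import RandomAgent
    from lagom.envs import make_vec_env

    config = {'env.id': 'CartPole-v1'}
    device = torch.device('cpu')

    # 1. Define environment and RL agent
    env = make_vec_env(lambda: gym.make(config['env.id']), 3, 0)
    agent = RandomAgent(config, env, device)

    # 2. Use a runner to collect data (a concrete BaseRunner subclass is assumed)
    # D = runner(agent, env, T)

    # 3. Train the agent on the collected data
    # out = agent.learn(D)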
A graphical illustration is coming soon.
lagom
Agent

class lagom.BaseAgent(config, env, device, **kwargs)
    Base class for all agents.
    The agent selects an action from a given observation and updates itself through a defined learning mechanism.
    Any agent should subclass this class, e.g. policy-based or value-based.

    Note: All agents should by default handle batched data, e.g. batched observations returned from VecEnv and batched actions for each sub-environment of a VecEnv.

    Parameters:
    - config (dict) – a dictionary of configurations
    - env (VecEnv) – environment object
    - device (Device) – a PyTorch device
    - **kwargs – keyword arguments used to specify the agent
    choose_action(obs, **kwargs)
        Returns a (batched) action selected by the agent from the received (batched) observation.

        Note: Tensor conversion should be handled here instead of in the policy or network forward pass.

        The output is a dictionary containing useful items, e.g. action, action_logprob, state_value.

        Parameters:
        - obs (object) – batched observation returned from the environment. The first dimension is treated as the batch dimension.
        - **kwargs – keyword arguments to specify action selection.
        Returns: out – a dictionary of action selection output. It should also contain all useful information to be stored during interaction with BaseRunner. This allows a generic runner API for all kinds of agents. Note that everything should be batched, even a scalar loss, i.e. scalar_loss -> [scalar_loss].
        Return type: dict
    learn(D, **kwargs)
        Defines the learning mechanism to update the agent from batched data.

        Parameters:
        - D (list) – a list of batched data to train the agent, e.g. in policy gradient this can be a list of Trajectory or Segment
        - **kwargs – keyword arguments to specify the learning mechanism
        Returns: out – a dictionary of learning output. This could contain the loss.
        Return type: dict
class lagom.RandomAgent(config, env, device, **kwargs)
    A random agent samples actions uniformly from the action space.

    choose_action(obs, **kwargs)
        Returns a (batched) action sampled uniformly from the action space. See BaseAgent.choose_action() for the output convention.

    learn(D, **kwargs)
        See BaseAgent.learn().
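As a concrete illustration, here is a minimal sketch of a custom agent. The linear Q-network, the dictionary keys, and the learning bookkeeping are assumptions for illustration, not documented lagom API:

    import torch
    import torch.nn as nn

    from lagom import BaseAgent

    class SketchAgent(BaseAgent):
        def __init__(self, config, env, device, **kwargs):
            super().__init__(config, env, device, **kwargs)
            self.device = device  # stored explicitly; BaseAgent may already do this
            self.Q = nn.Linear(env.observation_space.shape[0], env.action_space.n).to(device)

        def choose_action(self, obs, **kwargs):
            # Tensor conversion happens here, as the BaseAgent docstring recommends
            obs = torch.as_tensor(obs, dtype=torch.float32, device=self.device)
            action = self.Q(obs).argmax(dim=-1)  # batched greedy actions
            return {'raw_action': action.cpu().numpy()}  # key name is illustrative

        def learn(self, D, **kwargs):
            loss = torch.tensor(0.0)  # placeholder: compute a real loss from D
            return {'loss': [loss.item()]}  # batched output, even for a scalar loss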
Logger

class lagom.Logger
    Logs information in a dictionary.
    If a key is logged more than once, the new value is appended to a list.

    Note: It uses pickle to serialize the data. Empirically, pickle is 2x faster than numpy.save; other alternatives like yaml are too slow, and JSON does not support numpy arrays.

    Warning: It is discouraged to store hierarchical structures, e.g. a list of dicts of lists of ndarrays, because pickling such complex and large data structures is extremely slow. Put dictionaries only at the topmost level. Large numpy arrays should be saved separately.

    Example:

    Default:

    >>> logger = Logger()
    >>> logger('iteration', 1)
    >>> logger('train_loss', 0.12)
    >>> logger('iteration', 2)
    >>> logger('train_loss', 0.11)
    >>> logger('iteration', 3)
    >>> logger('train_loss', 0.09)
    >>> logger
    OrderedDict([('iteration', [1, 2, 3]), ('train_loss', [0.12, 0.11, 0.09])])
    >>> logger.dump()
    Iteration: [1, 2, 3]
    Train Loss: [0.12, 0.11, 0.09]

    With indentation:

    >>> logger.dump(indent=1)
        Iteration: [1, 2, 3]
        Train Loss: [0.12, 0.11, 0.09]

    With specific keys:

    >>> logger.dump(keys=['iteration'])
    Iteration: [1, 2, 3]

    With a specific index:

    >>> logger.dump(index=0)
    Iteration: 1
    Train Loss: 0.12

    With a list of indices:

    >>> logger.dump(index=[0, 2])
    Iteration: [1, 3]
    Train Loss: [0.12, 0.09]
    __call__(key, value)
        Log the information with the given key and value.

        Note: The key should be semantic, with words separated by underscores.

        Parameters:
        - key (str) – key of the information
        - value (object) – value to be logged
    dump(keys=None, index=None, indent=0, border='')
        Dump the loggings to the screen.

        Parameters:
        - keys (list, optional) – a list of selected keys. If None, then use all keys. Default: None
        - index (int/list, optional) – the index of logged values. It has the following use cases:
          - scalar: a specific index. If -1, then use the last element.
          - list: a list of indices.
          - None: all indices.
        - indent (int, optional) – the number of tab indentations. Default: 0
        - border (str, optional) – the string to print as header and footer
Engine

class lagom.BaseEngine(config, **kwargs)
    Base class for all engines.
    An engine defines the training and evaluation process.

    eval(n=None, **kwargs)
        Evaluation process for one iteration.

        Note: It is recommended to use Logger to store loggings.

        Note: All parameterized modules should have .eval() called on them to specify evaluation mode.

        Parameters:
        - n (int, optional) – n-th iteration of evaluation.
        - **kwargs – keyword arguments used for logging.
        Returns: out – evaluation output
        Return type: dict

    train(n=None, **kwargs)
        Training process for one iteration.

        Note: It is recommended to use Logger to store loggings.

        Note: All parameterized modules should have .train() called on them to specify training mode.

        Parameters:
        - n (int, optional) – n-th iteration of training.
        - **kwargs – keyword arguments used for logging.
        Returns: out – training output
        Return type: dict
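A minimal engine sketch; how an engine receives its agent and runner is not fixed by the base class, so passing the agent through **kwargs is an assumption here:

    from lagom import BaseEngine, Logger

    class SketchEngine(BaseEngine):
        def __init__(self, config, **kwargs):
            super().__init__(config, **kwargs)
            self.agent = kwargs['agent']  # assumption: agent handed over via kwargs

        def train(self, n=None, **kwargs):
            self.agent.train()  # switch parameterized modules to training mode
            logger = Logger()
            # D = runner(self.agent, env, T); out = self.agent.learn(D)
            logger('train_iteration', n)
            return {'logger': logger}

        def eval(self, n=None, **kwargs):
            self.agent.eval()  # switch parameterized modules to evaluation mode
            return {}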
Evolution Strategies

class lagom.BaseES
    Base class for all evolution strategies.

    Note: The optimization is treated as minimization, e.g. maximizing rewards is equivalent to minimizing negative rewards.

    Note: For painless parallelization, we highly recommend using concurrent.futures.ProcessPoolExecutor, with a few practical tips:
    - Set the max_workers argument to control the maximum parallelization capacity.
    - When execution gets stuck, try wrapping the objective function with CloudpickleWrapper, particularly for lambdas and class methods.
    - Enter the ProcessPoolExecutor context once, wrapping the entire iterative loop over ES generations. Entering the pool inside each generation can slow down the parallelization dramatically due to overheads.
    - To reduce overheads further (e.g. for PyTorch models, gym environments):
      - Recreating such models for each generation is very expensive.
      - Use the initializer function of ProcessPoolExecutor.
      - Within the initializer function, define PyTorch models and gym environments as global variables. Note that the global variables are defined independently in each worker.
    - Don't forget to use torch.no_grad() to increase forward-pass speed.

    ask()
        Sample a set of new candidate solutions.
        Returns: solutions – sampled candidate solutions
        Return type: list

    result
        Return a namedtuple of all results of the optimization. It contains:
        - xbest: best solution evaluated
        - fbest: objective function value of the best solution
        - evals_best: evaluation count when xbest was evaluated
        - evaluations: evaluations done overall
        - iterations: number of iterations
        - xfavorite: distribution mean in "phenotype" space, to be considered as the current best estimate of the optimum
        - stds: effective standard deviations

    tell(solutions, function_values)
        Update the parameters of the population for a new generation, based on the objective function values evaluated for the sampled solutions.

        Parameters:
        - solutions (list/ndarray) – candidate solutions returned from ask()
        - function_values (list) – a list of objective function values evaluated for the sampled solutions.
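A sketch of the standard ask/tell loop, using the CMAES wrapper documented below; the sphere objective and the hyperparameters are illustrative:

    from lagom import CMAES

    def sphere(x):
        return sum(xi**2 for xi in x)  # minimization objective

    es = CMAES(x0=[5.0] * 10, sigma0=0.5, opts={'popsize': 16, 'seed': 0})
    for generation in range(100):
        solutions = es.ask()                              # sample candidate solutions
        function_values = [sphere(s) for s in solutions]  # evaluate them
        es.tell(solutions, function_values)               # update the search distribution
    print(es.result.xbest, es.result.fbest)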
class lagom.CMAES(x0, sigma0, opts=None)
    Implements the CMA-ES algorithm.

    Note: It is a wrapper of the original CMA-ES implementation.

    Parameters:
    - x0 (list) – initial solution
    - sigma0 (list) – initial standard deviation
    - opts (dict) – a dictionary of options, e.g. ['popsize', 'seed']

    ask(), result, tell(solutions, function_values)
        See BaseES for the descriptions of these members.
class lagom.CEM(x0, sigma0, opts=None)
    Implements the cross-entropy method (CEM).

    ask(), result, tell(solutions, function_values)
        See BaseES for the descriptions of these members.
lagom.envs

class lagom.envs.VecEnv(list_make_env)
    A vectorized environment that runs each sub-environment serially.
    Each observation returned from the vectorized environment is a batch of observations, one per sub-environment. Likewise, step() is expected to receive a batch of actions, one per sub-environment.

    Note: All sub-environments should share identical observation and action spaces. In other words, a vector of multiple different environments is not supported.

    Parameters:
    - list_make_env (list) – a list of functions, each returning an instantiated environment.
    - observation_space (Space) – observation space of the environment
    - action_space (Space) – action space of the environment
    close()
        Close all environments.
        It closes all existing image viewers, then calls close_extras() and sets closed to True.

        Warning: This function itself does not close the environments; that should be handled in close_extras(). This is useful for parallelized environments.

        Note: This is called automatically on garbage collection or program exit.

    get_images()
        Returns a batched RGB array with shape [N, H, W, C] from all environments.
        Returns: imgs – a batched RGB array with shape [N, H, W, C]
        Return type: ndarray

    get_viewer()
        Returns an instantiated ImageViewer.
        Returns: viewer – an image viewer
        Return type: ImageViewer

    render(mode='human')
        Render all the environments.
        It first retrieves RGB images from all environments and uses GridImage to assemble them into a single grid image. It then either returns the image array or displays the image on screen using ImageViewer.
        See the docstring in Env for more details about rendering.

    reset()
        Reset all the environments and return a list of initial observations, one from each environment.

        Warning: If step_async() is still working, it will be aborted.

        Returns: observations – a list of initial observations from all environments.
        Return type: list
    step(actions)
        Ask all the environments to take a step with a list of actions, one for each environment.

        Parameters: actions (list) – a list of actions, one for each environment.
        Returns:
        - observations (list) – a list of observations, each returned from one environment after executing the given action.
        - rewards (list) – a list of scalar rewards, each returned from one environment.
        - dones (list) – a list of booleans indicating whether the episode terminates, each returned from one environment.
        - infos (list) – a list of dictionaries of additional information, each returned from one environment.

    unwrapped
        Unwrap this vectorized environment.
        When sequential wrappers are applied, this gives access to the original vectorized environment.
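For instance, a minimal interaction loop with a VecEnv might look like this (built with the documented make_vec_env below; the random action sampling is illustrative):

    import gym
    from lagom.envs import make_vec_env

    env = make_vec_env(lambda: gym.make('CartPole-v1'), 3, 0)
    observations = env.reset()  # list of 3 initial observations
    for _ in range(10):
        actions = [env.action_space.sample() for _ in range(3)]  # one action per sub-environment
        observations, rewards, dones, infos = env.step(actions)
    env.close()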
class lagom.envs.VecEnvWrapper(env)
    Wraps a vectorized environment to allow a modular transformation.
    This class is the base class for all wrappers of vectorized environments. A subclass can override some methods to change the behavior of the original vectorized environment without touching the original code.

    Note: Don't forget to call super().__init__(env) if the subclass overrides __init__().

    get_images(), reset(), step(actions), unwrapped
        See VecEnv for the descriptions of these members.
lagom.envs.make_vec_env(make_env, num_env, init_seed)
    Create a vectorized environment, each sub-environment associated with a different random seed.

    Example:

    >>> import gym
    >>> make_vec_env(lambda: gym.make('CartPole-v1'), 3, 0)
    <VecEnv: 3, CartPole-v1>

    Parameters:
    - make_env (function) – a function that creates an environment
    - num_env (int) – number of environments to create.
    - init_seed (int) – initial seed for Seeder to sample random seeds.
    Returns: env – created vectorized environment
    Return type: VecEnv
Wrappers

lagom.envs.wrappers.get_wrapper(env, name)
    Return the layer of a wrapped environment corresponding to a specific wrapper.

    Note: If no such wrapper is found, then None is returned.

    Parameters:
    - env (Env) – environment
    - name (str) – name of the wrapper
    Returns: out – wrapped environment
    Return type: env
lagom.envs.wrappers.get_all_wrappers(env)
    Returns a list of wrapper names of a wrapped environment.
    Parameters: env (Env) – wrapped environment
    Returns: out – a list of string names of the wrappers
    Return type: list
class lagom.envs.wrappers.ClipAction(env)
    Clip continuous actions to the valid bounds.

class lagom.envs.wrappers.FlattenObservation(env)
    Observation wrapper that flattens the observation.
class lagom.envs.wrappers.NormalizeAction(env)
    Rescale continuous actions from [-1, 1] to the environment's native action bounds.
class lagom.envs.wrappers.LazyFrames(frames, lz4_compress=False)
    Ensures common frames are only stored once, to optimize memory use.
    To reduce memory use further, lz4 compression of the observations can optionally be enabled.

    Note: This object should only be converted to a numpy array just before the forward pass.
class lagom.envs.wrappers.FrameStack(env, num_stack, lz4_compress=False)
    Observation wrapper that stacks observations in a rolling manner.
    For example, if the number of stacks is 4, the returned observation contains the most recent 4 observations. For the environment 'Pendulum-v0', the original observation is an array of shape [3], so stacking 4 observations gives a processed observation of shape [3, 4].

    Note: To be memory efficient, the stacked observations are wrapped by LazyFrames.

    Note: The observation space must be of Box type. If one uses Dict as the observation space, FlattenDictWrapper should be applied first.

    Example:

    >>> import gym
    >>> env = gym.make('PongNoFrameskip-v0')
    >>> env = FrameStack(env, 4)
    >>> env.observation_space
    Box(4, 210, 160, 3)

    Parameters:
    - env (Env) – environment object
    - num_stack (int) – number of stacks
    reset(**kwargs)
        Resets the state of the environment and returns an initial observation.
        Returns: observation (object) – the initial observation.

    step(action)
        Run one timestep of the environment's dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment's state.
        Accepts an action and returns a tuple (observation, reward, done, info).
        Parameters: action (object) – an action provided by the agent
        Returns:
        - observation (object) – agent's observation of the current environment
        - reward (float) – amount of reward returned after the previous action
        - done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
        - info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class lagom.envs.wrappers.GrayScaleObservation(env, keep_dim=False)
    Convert the image observation from RGB to gray scale.

class lagom.envs.wrappers.ResizeObservation(env, size)
    Downsample the image observation to a square image.
class lagom.envs.wrappers.ScaleReward(env, scale=0.01)
    Scale the reward.

    Note: This is incredibly important and drastically impacts performance, e.g. for PPO.

    Example:

    >>> from lagom.envs import make_gym_env
    >>> env = make_gym_env(env_id='CartPole-v1', seed=0)
    >>> env = ScaleReward(env, scale=0.1)
    >>> env.reset()
    >>> observation, reward, done, info = env.step(env.action_space.sample())
    >>> reward
    0.1

    Parameters:
    - env (Env) – environment
    - scale (float) – reward scaling factor
class lagom.envs.wrappers.ScaledFloatFrame(env)
    Convert image frames to floats in the range [0, 1] by dividing by 255.

    Warning: Do NOT use this wrapper for DQN! It will break the memory optimization.
class lagom.envs.wrappers.TimeAwareObservation(env)
    Augment the observation with the current time step in the trajectory.

    Note: Currently it only works with one-dimensional observation spaces. It does not support pixel observation spaces yet.

    reset(**kwargs), step(action)
        Standard gym reset/step semantics; see the descriptions under FrameStack.
class lagom.envs.wrappers.VecMonitor(env, deque_size=100)
    Record episode rewards, horizons and times, and report them when an episode terminates.

    reset(), step(actions)
        See VecEnv for the descriptions of these members.
class lagom.envs.wrappers.VecStandardizeObservation(env, clip=10.0, constant_moments=None)
    Standardizes the observations with running estimates of mean and variance.

    Warning: To evaluate an agent trained on standardized observations, remember to save and load the observation scalings; otherwise the performance will be incorrect.

    Parameters:
    - env (VecEnv) – a vectorized environment
    - clip (float) – clipping range of the standardized observation, i.e. [-clip, clip]
    - constant_moments (tuple) – a tuple of constant mean and variance to standardize the observation. Note that if it is provided, the running average is ignored.

    reset(), step(actions)
        See VecEnv for the descriptions of these members.
class lagom.envs.wrappers.VecStandardizeReward(env, clip=10.0, gamma=0.99, constant_var=None)
    Standardize the rewards with a running estimate of the variance.

    Warning: We do not subtract the running mean from the reward; we only divide it by the running standard deviation, because subtracting the mean would alter the reward shape and might degrade performance. Note that we apply this transformation from the second incoming reward onwards, keeping the first reward unchanged; otherwise the first reward would have too large a magnitude (and simply be clipped), since the mean is not subtracted.

    Note: We do not clean up the self.all_returns buffer on each reset(). Because of the discount factor (\(< 1\)), the running averages converge after some iterations. For this reason a discount factor of \(1.0\) is not allowed, since it would lead to unbounded explosion of the reward running averages.

    Parameters:
    - env (VecEnv) – a vectorized environment
    - clip (float) – clipping range of the standardized reward, i.e. [-clip, clip]
    - gamma (float) – discount factor. Note that the value 1.0 should not be used.
    - constant_var (ndarray) – constant variance to standardize the reward. Note that if it is provided, the running average is ignored.

    reset(), step(actions)
        See VecEnv for the descriptions of these members.
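A typical composition of these vectorized wrappers, using only the documented constructors (the environment and hyperparameters are illustrative):

    import gym
    from lagom.envs import make_vec_env
    from lagom.envs.wrappers import (VecMonitor, VecStandardizeObservation,
                                     VecStandardizeReward)

    env = make_vec_env(lambda: gym.make('CartPole-v1'), 4, 0)
    env = VecMonitor(env, deque_size=100)                   # episode statistics
    env = VecStandardizeObservation(env, clip=10.0)         # running observation scaling
    env = VecStandardizeReward(env, clip=10.0, gamma=0.99)  # running reward scaling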
class lagom.envs.wrappers.StepInfo(done: bool, info: dict)
    Defines a set of information for each time step.
    A StepInfo is returned from each step and reset of an environment. It contains properties of the transition and additional information.
class lagom.envs.wrappers.VecStepInfo(env)

    reset(), step(actions)
        See VecEnv for the descriptions of these members.
lagom.experiment

Config

class lagom.experiment.Config(items, num_sample=1, keep_dict_order=False)
    Defines a set of configurations for the experiment.
    The configuration includes the following possible items:
    - Hyperparameters: learning rate, batch size, etc.
    - Experiment settings: training iterations, logging directory, environment name, etc.
    All items are stored in a dictionary. It is good practice to name each item semantically, e.g. network.lr indicates the learning rate of the neural network.
    For hyperparameter search, we support both grid search (Grid) and random search (Sample).
    Call make_configs() to generate a list of all configurations, each assigned a unique ID.

    Note: For random search over small positive floats, e.g. the learning rate, it is recommended to use a log-uniform distribution,
    \[\text{logU}(a, b) \sim \exp(U(\log(a), \log(b)))\]
    for example np.exp(np.random.uniform(low=np.log(low), high=np.log(high))), because direct uniform sampling is numerically unstable.

    Warning: Random seeds should not be set here. Instead, they are handled by BaseExperimentMaster and BaseExperimentWorker.

    Example:

    >>> config = Config({'log.dir': 'some path',
    ...                  'network.lr': Grid([1e-3, 5e-3]),
    ...                  'env.id': Grid(['CartPole-v1', 'Ant-v2'])},
    ...                 num_sample=1, keep_dict_order=False)
    >>> import pandas as pd
    >>> print(pd.DataFrame(config.make_configs()))
       ID       env.id    log.dir  network.lr
    0   0  CartPole-v1  some path       0.001
    1   1       Ant-v2  some path       0.001
    2   2  CartPole-v1  some path       0.005
    3   3       Ant-v2  some path       0.005

    Parameters:
    - items (dict) – a dictionary of all configuration items.
    - num_sample (int) – number of samples for random configuration items. If grid search is also provided, the grid will be repeated num_sample times.
    - keep_dict_order (bool) – if True, each generated configuration keeps the same key ordering as items.
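To mix grid search with random search, a sketch like the following can be used; it assumes Sample is importable from lagom.experiment alongside Grid and accepts a sampling function, and the log-uniform lambda follows the Note above:

    import numpy as np
    from lagom.experiment import Config, Grid, Sample

    config = Config({
        'log.dir': 'logs',
        'env.id': Grid(['CartPole-v1', 'Ant-v2']),              # grid search
        'network.lr': Sample(lambda: np.exp(np.random.uniform(  # random search, log-uniform
            low=np.log(1e-4), high=np.log(1e-2)))),
    }, num_sample=2)
    configs = config.make_configs()  # list of configuration dicts, each with a unique ID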
Run experiment

lagom.experiment.run_experiment(run, config, seeds, log_dir, max_workers, chunksize=1, use_gpu=False, gpu_ids=None)
    A convenience function to parallelize an experiment (master-worker pipeline).
    It is implemented using concurrent.futures.ProcessPoolExecutor.
    It automatically creates all subfolders for each pair of configuration and random seed to store the loggings of the experiment. The root folder is given by the user. Subfolders for each configuration are created with the name of their job IDs. Under each configuration subfolder, a set of subfolders is created, one per random seed (with the random seed as folder name). Intuitively, an experiment could have the following directory structure:

    - logs
      - 0        # ID number
        - 123    # random seed
        - 345
        - 567
      - 1
        - 123
        - 345
        - 567
      - 2
        - 123
        - 345
        - 567
      - 3
        - 123
        - 345
        - 567
      - 4
        - 123
        - 345
        - 567

    Parameters:
    - run (function) – a function that defines an algorithm; it must take the arguments (config, seed, device, logdir)
    - config (Config) – a Config object defining all configuration settings
    - seeds (list) – a list of random seeds
    - log_dir (str) – path under which to store the loggings.
    - max_workers (int) – argument for ProcessPoolExecutor. If None, all experiments run serially.
    - chunksize (int) – argument for Executor.map()
    - use_gpu (bool) – if True, use CUDA; otherwise use CPU.
    - gpu_ids (list) – if None, use all available GPUs; otherwise use only the GPU devices listed.
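Putting it together with the Config sketch above (the body of run is a placeholder; only its required signature (config, seed, device, logdir) is documented):

    from lagom.experiment import run_experiment

    def run(config, seed, device, logdir):
        ...  # build the environment/agent/engine from config and train

    run_experiment(run=run, config=config, seeds=[123, 345, 567],
                   log_dir='logs', max_workers=4, chunksize=1,
                   use_gpu=False, gpu_ids=None)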
lagom.metric: Metrics

lagom.metric.bootstrapped_returns(gamma, traj, last_V)
    Return (discounted) accumulated returns with bootstrapping for a batch of episodic transitions.
    Formally, given all rewards \((r_1, \dots, r_T)\), it computes
    \[Q_t = r_t + \gamma r_{t+1} + \dots + \gamma^{T - t} r_T + \gamma^{T - t + 1} V(s_{T+1})\]

    Note: The state values for terminal states are masked out as zero!
lagom.metric.td0_target(gamma, traj, Vs, last_V)
    Calculate TD(0) targets for a batch of episodic transitions.
    Let \(r_1, r_2, \dots, r_T\) be a list of rewards and let \(V(s_0), V(s_1), \dots, V(s_{T-1}), V(s_T)\) be a list of state values including the last state value. Let \(\gamma\) be the discount factor; the TD(0) targets are calculated as follows:
    \[r_t + \gamma V(s_t), \quad \forall t = 1, 2, \dots, T\]

    Note: The state values for terminal states are masked out as zero!
lagom.metric.td0_error(gamma, traj, Vs, last_V)
    Calculate TD(0) errors for a batch of episodic transitions.
    Let \(r_1, r_2, \dots, r_T\) be a list of rewards and let \(V(s_0), V(s_1), \dots, V(s_{T-1}), V(s_T)\) be a list of state values including the last state value. Let \(\gamma\) be the discount factor; the TD(0) errors are calculated as follows:
    \[\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)\]

    Note: The state values for terminal states are masked out as zero!
lagom.metric.gae(gamma, lam, traj, Vs, last_V)
    Calculate the Generalized Advantage Estimation (GAE) for a batch of episodic transitions.
    Let \(\delta_t\) be the TD(0) error at time step \(t\); the GAE at time step \(t\) is calculated as follows:
    \[A_t^{\mathrm{GAE}(\gamma, \lambda)} = \sum_{k=0}^{\infty}(\gamma\lambda)^k \delta_{t + k}\]
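The documented functions operate on trajectory objects; the following standalone sketch computes the same recursion on raw numpy arrays, purely to illustrate the formulas above (the function name and array interface are illustrative, not lagom API):

    import numpy as np

    def gae_from_arrays(gamma, lam, rewards, Vs, last_V):
        Vs = np.append(Vs, last_V)
        deltas = rewards + gamma * Vs[1:] - Vs[:-1]  # TD(0) errors
        advantages = np.zeros_like(deltas)
        running = 0.0
        for t in reversed(range(len(deltas))):       # backward sum of (gamma * lam)^k * delta
            running = deltas[t] + gamma * lam * running
            advantages[t] = running
        return advantages

    # e.g. gae_from_arrays(0.99, 0.95, np.array([1., 1., 1.]), np.array([0.5, 0.4, 0.3]), 0.2)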
lagom.multiprocessing

Use Python multiprocessing library

class lagom.multiprocessing.ProcessMaster(worker_class, num_worker)
    Base class for all masters implemented with Python multiprocessing.Process.
    It creates a number of workers, each with an individual Process. The communication between the master and each worker goes through an independent Pipe connection. The master assigns tasks to the workers. When all tasks are done, it stops all workers and terminates all processes.

    Note: If there are more tasks than workers, the tasks are split into chunks. If there are fewer tasks than workers, the number of workers is reduced to the number of tasks.

    assign_tasks(tasks)
        Assign a given list of tasks to the workers and return the received results.
        Parameters: tasks (list) – a list of tasks
        Returns: results – received results
        Return type: object
class lagom.multiprocessing.ProcessWorker(master_conn, worker_conn)
    Base class for all workers implemented with Python multiprocessing.Process.
    It communicates with the master via a Pipe connection. The worker stands by indefinitely, waiting for a task from the master, working on it and sending back the result. When it receives a close command, it breaks the infinite loop and closes the connection.
lagom.networks: Networks

class lagom.networks.Module(**kwargs)
    Wraps PyTorch nn.Module to provide more helper functions.

    from_vec(x)
        Set the network parameters from a single flattened vector.
        Parameters: x (Tensor) – a single flattened vector of the network parameters, with consistent size.

    load(f)
        Load the network parameters from a file.
        It complies with the recommended approach for saving a model in the PyTorch documentation.
        Parameters: f (str) – file path.

    num_params
        Returns the total number of parameters in the neural network.

    num_trainable_params
        Returns the total number of trainable parameters in the neural network.

    num_untrainable_params
        Returns the total number of untrainable parameters in the neural network.

    save(f)
        Save the network parameters to a file.
        It complies with the recommended approach for saving a model in the PyTorch documentation.

        Note: It uses the highest pickle protocol to serialize the network parameters.

        Parameters: f (str) – file path.
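A small sketch of these helpers together (the layer sizes and the file path are illustrative):

    import torch.nn as nn
    from lagom.networks import Module, ortho_init

    class MLP(Module):
        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            self.fc = nn.Linear(4, 2)
            ortho_init(self.fc, nonlinearity='relu')  # documented below

    net = MLP()
    print(net.num_params)  # total number of parameters
    net.save('net.pth')    # recommended PyTorch saving approach
    net.load('net.pth')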
lagom.networks.ortho_init(module, nonlinearity=None, weight_scale=1.0, constant_bias=0.0)
    Applies orthogonal initialization to the parameters of a given module.

    Parameters:
    - module (nn.Module) – a module whose parameters to initialize orthogonally.
    - nonlinearity (str, optional) – the nonlinearity following the forward pass of the module. When nonlinearity is not None, the gain will be calculated and weight_scale will be ignored. Default: None
    - weight_scale (float, optional) – scaling factor for initializing the weight. Ignored when nonlinearity is not None. Default: 1.0
    - constant_bias (float, optional) – constant value for initializing the bias. Default: 0.0

    Note: Currently, the only supported modules are elementary neural network layers, e.g. nn.Linear, nn.Conv2d, nn.LSTM. Submodules are not supported.

    Example:

    >>> a = nn.Linear(2, 3)
    >>> ortho_init(a)
lagom.networks.linear_lr_scheduler(optimizer, N, min_lr)
    Defines a linear learning rate scheduler.

    Parameters:
    - optimizer (Optimizer) – optimizer
    - N (int) – maximum bound of the scheduling iterations, e.g. total number of epochs, iterations or time steps.
    - min_lr (float) – lower bound of the learning rate
lagom.networks.make_fc(input_dim, hidden_sizes)
    Returns a ModuleList of fully connected layers.

    Note: All submodules can be automatically tracked because it uses nn.ModuleList. One can use this function to generate parameters in BaseNetwork.

    Example:

    >>> make_fc(3, [4, 5, 6])
    ModuleList(
      (0): Linear(in_features=3, out_features=4, bias=True)
      (1): Linear(in_features=4, out_features=5, bias=True)
      (2): Linear(in_features=5, out_features=6, bias=True)
    )

    Parameters:
    - input_dim (int) – input dimension of the first fully connected layer.
    - hidden_sizes (list) – a list of hidden sizes, one for each fully connected layer.
    Returns: fc – a ModuleList of fully connected layers.
    Return type: nn.ModuleList
lagom.networks.make_cnn(input_channel, channels, kernels, strides, paddings)
    Returns a ModuleList of 2D convolution layers.

    Note: All submodules can be automatically tracked because it uses nn.ModuleList. One can use this function to generate parameters in BaseNetwork.

    Example:

    >>> make_cnn(input_channel=3, channels=[16, 32], kernels=[4, 3], strides=[2, 1], paddings=[1, 0])
    ModuleList(
      (0): Conv2d(3, 16, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
      (1): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
    )

    Parameters:
    - input_channel (int) – input channel of the first convolution layer.
    - channels (list) – a list of channels, one for each convolution layer.
    - kernels (list) – a list of kernels, one for each convolution layer.
    - strides (list) – a list of strides, one for each convolution layer.
    - paddings (list) – a list of paddings, one for each convolution layer.
    Returns: cnn – a ModuleList of 2D convolution layers.
    Return type: nn.ModuleList
lagom.networks.make_transposed_cnn(input_channel, channels, kernels, strides, paddings, output_paddings)
    Returns a ModuleList of 2D transposed convolution layers.

    Note: All submodules can be automatically tracked because it uses nn.ModuleList. One can use this function to generate parameters in BaseNetwork.

    Example:

    >>> make_transposed_cnn(input_channel=3, channels=[16, 32], kernels=[4, 3], strides=[2, 1], paddings=[1, 0], output_paddings=[1, 0])
    ModuleList(
      (0): ConvTranspose2d(3, 16, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
      (1): ConvTranspose2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
    )

    Parameters:
    - input_channel (int) – input channel of the first transposed convolution layer.
    - channels (list) – a list of channels, one for each transposed convolution layer.
    - kernels (list) – a list of kernels, one for each transposed convolution layer.
    - strides (list) – a list of strides, one for each transposed convolution layer.
    - paddings (list) – a list of paddings, one for each transposed convolution layer.
    - output_paddings (list) – a list of output paddings, one for each transposed convolution layer.
    Returns: transposed_cnn – a ModuleList of 2D transposed convolution layers.
    Return type: nn.ModuleList
class lagom.networks.MDNHead(in_features, out_features, num_density, device, **kwargs)

    forward(x)
        Defines the computation performed at every call. Should be overridden by all subclasses.

        Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

    loss(logit_pi, mean, std, target)
        Calculate the MDN loss function.
        The loss function (negative log-likelihood) is defined by:
        \[L = -\frac{1}{N}\sum_{n=1}^{N}\ln \left( \sum_{k=1}^{K}\prod_{d=1}^{D} \pi_{k}(x_{n, d}) \mathcal{N}\left( \mu_k(x_{n, d}), \sigma_k(x_{n,d}) \right) \right)\]
        For better numerical stability, we can use the log-scale form:
        \[L = -\frac{1}{N}\sum_{n=1}^{N}\ln \left( \sum_{k=1}^{K}\exp \left\{ \sum_{d=1}^{D} \ln\pi_{k}(x_{n, d}) + \ln\mathcal{N}\left( \mu_k(x_{n, d}), \sigma_k(x_{n,d}) \right) \right\} \right)\]

        Note: One should always use the second formula via the log-sum-exp trick. The first formula is numerically unstable, resulting in +/- Inf and NaN errors.

        The log-sum-exp trick is defined by
        \[\log\sum_{i=1}^{N}\exp(x_i) = a + \log\sum_{i=1}^{N}\exp(x_i - a)\]
        where \(a = \max_i(x_i)\).

        Parameters:
        - logit_pi (Tensor) – the logits of the mixing coefficients, shape [N, K, D]
        - mean (Tensor) – means of the Gaussian mixtures, shape [N, K, D]
        - std (Tensor) – standard deviations of the Gaussian mixtures, shape [N, K, D]
        - target (Tensor) – target tensor, shape [N, D]
        Returns: loss – calculated loss
        Return type: Tensor

    sample(logit_pi, mean, std, tau=1.0)
        Sample from the Gaussian mixtures using the reparameterization trick.
        - First, sample categorically over the mixing coefficients to select a specific Gaussian.
        - Then, sample from the selected Gaussian distribution.

        Parameters:
        - logit_pi (Tensor) – the logits of the mixing coefficients, shape [N, K, D]
        - mean (Tensor) – means of the Gaussian mixtures, shape [N, K, D]
        - std (Tensor) – standard deviations of the Gaussian mixtures, shape [N, K, D]
        - tau (float) – sampling temperature, controlling the uncertainty:
          - if \(\tau > 1\): increase uncertainty
          - if \(\tau < 1\): decrease uncertainty
        Returns: x – sampled data, with shape [N, D]
        Return type: Tensor
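A usage sketch for a 1-D regression target; the sizes are illustrative, and the assumption that forward() returns the (logit_pi, mean, std) triple follows from the signatures of loss() and sample() above:

    import torch
    from lagom.networks import MDNHead

    head = MDNHead(in_features=32, out_features=1, num_density=5, device=torch.device('cpu'))
    x = torch.randn(8, 32)
    logit_pi, mean, std = head(x)   # assumed output convention, each of shape [8, 5, 1]
    target = torch.randn(8, 1)
    loss = head.loss(logit_pi, mean, std, target)          # negative log-likelihood
    samples = head.sample(logit_pi, mean, std, tau=1.0)    # shape [8, 1]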
Recurrent Neural Networks

RL components
class lagom.networks.CategoricalHead(feature_dim, num_action, device, **kwargs)
    Defines a module for a Categorical (discrete) action distribution.

    Example:

    >>> import torch
    >>> action_head = CategoricalHead(30, 4, 'cpu')
    >>> action_head(torch.randn(2, 30))
    Categorical(probs: torch.Size([2, 4]))

    Parameters:
    - feature_dim (int) – number of input features
    - num_action (int) – number of discrete actions
    - device (torch.device) – PyTorch device
    - **kwargs – keyword arguments for further specification.

    forward(x)
        Defines the computation performed at every call. Should be overridden by all subclasses.

        Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
class lagom.networks.DiagGaussianHead(feature_dim, action_dim, device, std0, **kwargs)
    Defines a module for a diagonal Gaussian (continuous) action distribution whose standard deviation is state-independent.
    The network outputs the mean \(\mu(x)\) and the state-independent logarithm of the standard deviation \(\log\sigma\) (allowing optimization in log-space, i.e. over both negative and positive values).
    The standard deviation is obtained by applying the exponential function \(\exp(x)\).

    Example:

    >>> import torch
    >>> action_head = DiagGaussianHead(10, 4, 'cpu', 0.45)
    >>> action_dist = action_head(torch.randn(2, 10))
    >>> action_dist.base_dist
    Normal(loc: torch.Size([2, 4]), scale: torch.Size([2, 4]))
    >>> action_dist.base_dist.stddev
    tensor([[0.4500, 0.4500, 0.4500, 0.4500],
            [0.4500, 0.4500, 0.4500, 0.4500]], grad_fn=<ExpBackward>)

    Parameters:
    - feature_dim (int) – number of input features
    - action_dim (int) – flat dimension of actions
    - device (torch.device) – PyTorch device
    - std0 (float) – initial standard deviation
    - **kwargs – keyword arguments for further specification.

    forward(x)
        Defines the computation performed at every call. Should be overridden by all subclasses.

        Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
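As an example of these heads in practice, here is a sketch of a tiny categorical policy built from make_fc and CategoricalHead (the sizes and ReLU activations are illustrative):

    import torch
    import torch.nn.functional as F
    from lagom.networks import make_fc, CategoricalHead

    feature_net = make_fc(input_dim=4, hidden_sizes=[64, 30])
    head = CategoricalHead(30, 2, 'cpu')

    obs = torch.randn(16, 4)       # batch of 16 observations
    x = obs
    for layer in feature_net:      # ModuleList is iterated manually
        x = F.relu(layer(x))
    dist = head(x)                 # Categorical distribution over 2 actions
    action = dist.sample()         # batched actions, shape [16]
    logprob = dist.log_prob(action)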
lagom.runner: Runner

class lagom.runner.BaseRunner
    Base class for all runners.
    A runner is a data-collection interface between the agent and the environment. On each call, the runner has the agent take actions in the environment and receive observations from it, for a certain number of trajectories/segments and a certain number of time steps.

    Note: By default, the agent handles batched data returned from VecEnv-type environments.
lagom.transform: Transformations

class lagom.transform.Describe(count: int, mean: float, std: float, min: float, max: float, repr_indent: int = 0, repr_prefix: str = None)
lagom.transform.interp_curves(x, y)
    Piecewise linear interpolation of a discrete set of data points, generating new \(x\)-\(y\) values from the interpolated line.
    It receives a batch of curves as \(x\)-\(y\) values; a global min and max of the x-axis is calculated over the entire batch, and new x-axis values are generated and fed to the interpolation function. All interpolated curves share the same x-axis values.

    Note: This is useful for plotting a set of curves with uncertainty bands, where each curve has data points at different \(x\) values. To generate such a plot, we need the sets of \(y\) values to share consistent \(x\) values.

    Warning: Piecewise linear interpolation often leads to more realistic uncertainty bands. Do not use polynomial interpolation, whose resulting curves can be extremely misleading.

    Example:

    >>> import matplotlib.pyplot as plt
    >>> x1 = [4, 5, 7, 13, 20]
    >>> y1 = [0.25, 0.22, 0.53, 0.37, 0.55]
    >>> x2 = [2, 4, 6, 7, 9, 11, 15]
    >>> y2 = [0.03, 0.12, 0.4, 0.2, 0.18, 0.32, 0.39]
    >>> plt.scatter(x1, y1, c='blue')
    >>> plt.scatter(x2, y2, c='red')
    >>> new_x, new_y = interp_curves([x1, x2], [y1, y2], num_point=100)
    >>> plt.plot(new_x[0], new_y[0], 'blue')
    >>> plt.plot(new_x[1], new_y[1], 'red')

    Parameters:
    - x (list) – a batch of x values.
    - y (list) – a batch of y values.
    - num_point (int) – number of points to generate from the interpolated line.
    Returns:
    - out_x (list) – interpolated x values (shared across the batch of curves)
    - out_y (list) – interpolated y values
lagom.transform.geometric_cumsum(alpha, x)
    Calculate the future accumulated sum for each element in a list, with an exponential factor.
    Given input data \(x_1, \dots, x_n\) and exponential factor \(\alpha\in [0, 1]\), it returns an array \(y\) of the same length, where each element is calculated as
    \[y_i = x_i + \alpha x_{i+1} + \alpha^2 x_{i+2} + \dots + \alpha^{n-i-1}x_{n-1} + \alpha^{n-i}x_{n}\]

    Note: For optimal runtime speed, we use scipy.signal.lfilter.

    Example:

    >>> geometric_cumsum(0.1, [1, 2, 3, 4])
    array([[1.234, 2.34 , 3.4  , 4.   ]])

    Parameters:
    - alpha (float) – exponential factor between zero and one.
    - x (list) – input data
    Returns: out – calculated data
    Return type: ndarray
lagom.transform.explained_variance(y_true, y_pred, **kwargs)
    Computes the explained variance regression score.
    It measures the fraction of the variance of the ground truth that the prediction explains.
    Let \(\hat{y}\) be the predicted output and let \(y\) be the ground truth output. Then the explained variance is estimated as follows:
    \[\text{EV}(y, \hat{y}) = 1 - \frac{\text{Var}(y - \hat{y})}{\text{Var}(y)}\]
    The best score is \(1.0\), and lower values are worse. A detailed interpretation is as follows:
    - \(\text{EV} = 1\): perfect prediction
    - \(\text{EV} = 0\): might as well have predicted zero
    - \(\text{EV} < 0\): worse than just predicting zero

    Note: It calls the function from scikit-learn, which handles exceptions better, e.g. zero division, batch size.

    Example:

    >>> explained_variance(y_true=[3, -0.5, 2, 7], y_pred=[2.5, 0.0, 2, 8])
    0.9571734475374732
    >>> explained_variance(y_true=[[3, -0.5, 2, 7]], y_pred=[[2.5, 0.0, 2, 8]])
    0.9571734475374732
    >>> explained_variance(y_true=[[0.5, 1], [-1, 1], [7, -6]], y_pred=[[0, 2], [-1, 2], [8, -5]])
    0.9838709677419355
    >>> explained_variance(y_true=[[0.5, 1], [-1, 10], [7, -6]], y_pred=[[0, 2], [-1, 0.00005], [8, -5]])
    0.6704023148857179

    Parameters:
    - y_true (list) – ground truth output
    - y_pred (list) – predicted output
    - **kwargs – keyword arguments to specify the estimation of the explained variance.
    Returns: out – estimated explained variance
    Return type: float
class lagom.transform.LinearSchedule(initial, final, N, start=0)
    A linear schedule from an initial to a final value over a certain number of timesteps, after which the final value is held constant.

    Note: This can be useful for the following use cases:
    - Decay of epsilon-greedy: initialize with \(1.0\), hold for `start` time steps, then linearly decay to `final` over `N` time steps, and stay at `final` afterwards.
    - The beta parameter in prioritized experience replay.
    Note that for learning rate decay, one should use PyTorch optim.lr_scheduler instead.

    Example:

    >>> scheduler = LinearSchedule(initial=1.0, final=0.1, N=3, start=0)
    >>> [scheduler(i) for i in range(6)]
    [1.0, 0.7, 0.4, 0.1, 0.1, 0.1]

    Parameters:
    - initial (float) – initial value
    - final (float) – final value
    - N (int) – number of scheduling timesteps
    - start (int, optional) – the timestep at which to start the scheduling. Default: 0
lagom.transform.rank_transform(x, centered=True)
    Rank transformation of a vector of values. The rank has the same dimensionality as the vector; each element in the rank indicates the index of the element in the ascendingly sorted input, i.e. ranks[i] = k means the i-th element of the input is the \(k\)-th smallest value.
    Rank transformation reduces sensitivity to outliers. For example, in OpenAI ES the gradient computation involves the fitness values of the population; outliers (overly large fitness values) would otherwise affect the gradient too much.
    A centered rank transformation to the range [-0.5, 0.5] is supported via an option.

    Example:

    >>> rank_transform([3, 14, 1], centered=True)
    array([ 0. ,  0.5, -0.5])
    >>> rank_transform([3, 14, 1], centered=False)
    array([1, 2, 0])

    Parameters:
    - x (list/ndarray) – a vector of values.
    - centered (bool, optional) – if True, center the rank transformation to \([-0.5, 0.5]\). Default: True
    Returns: ranks – ranks of the input data
    Return type: ndarray
class lagom.transform.PolyakAverage(alpha)
    Keep a running average of a quantity via Polyak averaging.
    Compared with estimating the mean, it is more sensitive to recent changes.

    Parameters: alpha (float) – factor controlling the sensitivity to recent changes, in the range [0, 1]. Zero is most sensitive to recent changes.
class lagom.transform.RunningMeanVar(shape)
    Estimates the sample mean and variance using Chan's method.
    It supports both scalar and multi-dimensional data; however, the input is expected to be batched. The first dimension is always treated as the batch dimension.

    Note: For better precision, the data is handled as np.float64.

    Warning: To use the estimated moments for standardization, remember to keep the np.float64 precision and compute \(\frac{x - \mu}{\sqrt{\sigma^2 + 10^{-8}}}\).

    Example:

    >>> f = RunningMeanVar(shape=())
    >>> f([1, 2])
    >>> f([3])
    >>> f([4])
    >>> f.mean
    2.499937501562461
    >>> f.var
    1.2501499923440393

    __call__(x)
        Update the mean and variance given additional batched data.
        Parameters: x (object) – additional batched data.

    n
        Returns the total number of samples so far.
lagom.transform.smooth_filter(x, window_length, polyorder, **kwargs)
    Smooth a sequence of noisy data points by applying a Savitzky-Golay filter. It uses least squares to fit a polynomial within a small sliding window and uses this polynomial to estimate the point at the center of the window.
    This is useful when a curve is highly noisy; smoothing it out leads to better visualization quality.

    Example:

    >>> import matplotlib.pyplot as plt
    >>> x = np.linspace(0, 4*2*np.pi, num=100)
    >>> y = x*(np.sin(x) + np.random.random(100)*4)
    >>> y2 = smooth_filter(y, window_length=31, polyorder=10)
    >>> plt.plot(x, y)
    >>> plt.plot(x, y2, 'red')

    Parameters:
    - x (list) – one-dimensional vector of scalar data points of a curve.
    - window_length (int) – the length of the filter window
    - polyorder (int) – the order of the polynomial used to fit the samples
    Returns: out – smoothed curve data
    Return type: ndarray
class lagom.transform.SegmentTree(capacity, operation, identity_element)
    Defines a segment tree data structure.
    It can be regarded as a regular array, with two major differences:
    - Value modification is slower: O(ln(capacity)) instead of O(1)
    - Efficient reduce operations over a contiguous subarray: O(ln(segment size))

    Parameters:
    - capacity (int) – total number of elements; it must be a power of two.
    - operation (lambda) – binary operation forming a group, e.g. sum, min
    - identity_element (object) – identity element of the group, e.g. 0 for sum
class lagom.transform.SumTree(capacity)
    Defines the sum tree for storing replay priorities.
    Each leaf node contains a priority value, and internal nodes maintain the sum of the priorities of all leaf nodes in their subtrees.

    find_prefixsum_index(prefixsum)
        Find the highest index i in the array such that sum(A[0] + A[1] + … + A[i - 1]) <= prefixsum.
        If the array values are probabilities, this function efficiently samples indices according to the discrete probability distribution.
        Parameters: prefixsum (float) – prefix sum.
        Returns: index – highest index satisfying the prefix-sum constraint
        Return type: int
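A sketch of proportional sampling with the sum tree; the item-assignment interface is an assumption based on the "regarded as a regular array" note above:

    import random
    from lagom.transform import SumTree

    tree = SumTree(capacity=4)  # capacity must be a power of two
    for i, priority in enumerate([0.1, 0.4, 0.3, 0.2]):
        tree[i] = priority      # assumed __setitem__, per the regular-array interface
    idx = tree.find_prefixsum_index(random.uniform(0.0, 1.0))  # index sampled proportionally to priority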
lagom.vis: Visualization

class lagom.vis.ImageViewer(max_width=500)
    Display an image from an RGB array in an OpenGL window.

    Example:

    imageviewer = ImageViewer(max_width=500)
    image = np.asarray(Image.open('x.jpg'))
    imageviewer(image)

    __call__(x)
        Create an image from the given RGB array and display it in the window.
        Parameters: x (ndarray) – RGB array
class lagom.vis.GridImage(ncol=8, padding=2, pad_value=0)
    Generate a grid of images. Images can be added iteratively.

    Example:

    grid = GridImage(ncol=8, padding=5, pad_value=0)
    a = np.random.randint(0, 255+1, size=[10, 3, 64, 64])
    grid.add(a)
    grid()

    Parameters:
    - ncol (int, optional) – number of images to show in each row of the grid. The final grid size is [N/ncol, ncol]. Default: 8
    - padding (int, optional) – number of padding pixels. Default: 2
    - pad_value (float, optional) – padding value in the range [0, 255]; black is 0 and white is 255. Default: 0
lagom.utils: Utils

lagom.utils.set_global_seeds(seed)
    Set the seed for generating random numbers.
    It seeds the following dependencies with the given random seed:
    - PyTorch
    - NumPy
    - Python random
    Parameters: seed (int) – a given seed.
class lagom.utils.Seeder(init_seed=0)
    A random seed generator.
    Given an initial seed, the seeder can be called continuously to sample a single random seed or a batch of them.

    Note: The seeder creates an independent RandomState to generate random numbers. It does not affect the RandomState in np.random.

    Example:

    >>> seeder = Seeder(init_seed=0)
    >>> seeder(size=5)
    [209652396, 398764591, 924231285, 1478610112, 441365315]
lagom.utils.pickle_dump(obj, f, ext='.pkl')
    Serialize an object using pickling and save it to a file.

    Note: It uses cloudpickle instead of pickle to support lambda functions and multiprocessing. By default, the highest protocol is used.

    Note: Except for pure array objects, it is not recommended to use np.save, because it is often much slower.

    Parameters:
    - obj (object) – a serializable object
    - f (str/Path) – file path
    - ext (str, optional) – file extension. Default: .pkl
lagom.utils.pickle_load(f)
    Read pickled data from a file.
    Parameters: f (str/Path) – file path
lagom.utils.yaml_dump(obj, f, ext='.yml')
    Serialize a Python object using YAML and save it to a file.

    Note: YAML is recommended for small dictionaries and is very human-readable, e.g. configuration settings. For saving experiment metrics, it is better to use pickle_dump().

    Note: Except for pure array objects, it is not recommended to use np.load, because it is often much slower.

    Parameters:
    - obj (object) – a serializable object
    - f (str/Path) – file path
    - ext (str, optional) – file extension. Default: .yml
lagom.utils.yaml_load(f)
    Read data from a YAML file.
    Parameters: f (str/Path) – file path
lagom.utils.color_str(string, color, attribute=None)
    Returns a stylized string with color and attribute, for printing.

    Example:

    >>> print(color_str('lagom', 'green', attribute='bold'))

    See the colored documentation for more details.

    Parameters:
    - string (str) – input string
    - color (str) – color name
    - attribute (str, optional) – attribute. Default: None
    Returns: out – stylized string
    Return type: str
lagom.utils.timed(color='green', attribute='bold')
    A decorator that prints the total execution time of the decorated function.

    Parameters:
    - color (str, optional) – color name. Default: 'green'
    - attribute (str, optional) – attribute. Default: 'bold'
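For example (the function body is a placeholder):

    from lagom.utils import timed

    @timed(color='green', attribute='bold')
    def train_one_iteration():
        ...  # placeholder body

    train_one_iteration()  # prints the total execution time on return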