Nestly is a small package designed to ease running software with combinatorial choices of parameters. It can easily do so for “cartesian products” of parameter choices, but it can do much more: arbitrary “backwards-looking” dependencies can be used.
To find out more, check out the Examples.
Contents:
This is a realistic example of using nestly to examine the performance of two algorithms. Source code to run it is available in examples/adcl/.
We will use the min_adcl_tree subcommand of the rppr tool from the pplacer suite, available from http://matsen.fhcrc.org/pplacer.
This tool chooses k representative leaves from a phylogenetic tree. There are two implementations: the Full algorithm solves the problem exactly, while the PAM algorithm uses a variation on the partitioning among medoids heuristic to find a solution.
We’d like to compare the two algorithms on a variety of trees, using different values for k.
Setting up the comparison is demonstrated in 00make_nest.py, which builds up combinations of (algorithm, tree, k):
#!/usr/bin/env python
# This example compares runtimes of two implementations of
# an algorithm to minimize the average distance to the closest leaf
# (Matsen et al., accepted to Systematic Biology).
#
# To run it, you'll need the `rppr` binary on your path, distributed as part of
# the pplacer suite. Source code, or binaries for OS X and 64-bit Linux are
# available from http://matsen.fhcrc.org/pplacer/.
#
# The `rppr min_adcl_tree` subcommand takes a tree, an algorithm name, and
# the number of leaves to keep.
#
# We wish to explore the runtime, over each tree, for various leaf counts.
import glob
from os.path import abspath
from nestly import Nest, stripext
# The `trees` directory contains 5 trees, each with 1000 leaves.
# We want to run each algorithm on all of them.
trees = [abspath(f) for f in glob.glob('trees/*.tre')]
n = Nest()
# We'll try both algorithms
n.add('algorithm', ['full', 'pam'])
# For every tree
n.add('tree', trees, label_func=stripext)
# Store the number of leaves - always 1000 here
n.add('n_leaves', [1000], create_dir=False)
# Now we vary the number of leaves to keep (k)
# Sample between 1 and the total number of leaves.
def k(c):
    n_leaves = c['n_leaves']
    return range(1, n_leaves, n_leaves // 10)
# Add `k` to the nest.
# This will call k with each combination of (algorithm, tree, n_leaves).
# Each value returned will be used as a possible value for `k`
n.add('k', k)
# Build the nest:
n.build('runs')
Running that:
$ ./00make_nest.py
Creates a new directory, runs.
Within this directory are subdirectories for each algorithm:
runs/full
runs/pam
Each of these contains a directory for each tree used:
$ ls runs/pam
random001 random002 random003 random004 random005
Within each of these subdirectories are directories for each choice of k.
$ ls runs/pam/random001
1 101 201 301 401 501 601 701 801 901
These directories are leaves. There is a JSON file in each, containing the choices made. For example, runs/full/random003/401/control.json contains:
{
"algorithm": "full",
"tree": "/home/cmccoy/development/nestly/examples/adcl/trees/random003.tre",
"n_leaves": 1000,
"k": 401
}
The nestrun command-line tool allows you to run a command for each combination of parameters in a nest. It allows you to substitute parameters chosen by surrounding them in curly brackets, e.g. {algorithm}.
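This substitution is plain Python string formatting over the control dictionary; a minimal sketch of the idea (the helper name is illustrative, not part of nestly's API):

```python
import json

def render_command(template, control_path):
    """Fill {key} placeholders in a command template from a control.json file."""
    with open(control_path) as fp:
        control = json.load(fp)
    # str.format substitutes each {key} with the corresponding control value.
    return template.format(**control)
```

Given the control.json shown above, rendering the template 'echo {algorithm}' would yield 'echo full'.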
To see how long each run takes and how much memory it uses, we’ll use the short shell script time_rppr.sh:
#!/bin/sh
export TIME='elapsed,maxmem,exitstatus\n%e,%M,%x'
/usr/bin/time -o time.csv \
rppr min_adcl_tree --algorithm {algorithm} --leaves {k} {tree}
Note the placeholders for the parameters to be provided at runtime: k, tree, and algorithm.
Running a script like time_rppr.sh on every experiment within a nest in parallel is facilitated by the nestrun script distributed with nestly:
$ nestrun -j 4 --template-file time_rppr.sh -d runs
(this will take a while)
This command runs the shell script time_rppr.sh for each parameter choice, substituting the appropriate parameters. The -j 4 flag indicates that 4 processors should be used.
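The parallel scheduling itself is straightforward; a rough sketch of running commands N at a time with the standard library (an illustration only, not nestrun's actual implementation):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_all(commands, max_workers=4):
    """Run shell commands with at most max_workers in flight; return exit codes."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Threads suffice here: each one just blocks on its subprocess.
        return list(pool.map(lambda cmd: subprocess.call(cmd, shell=True), commands))
```

`pool.map` preserves the input order, so the returned exit codes line up with the commands passed in.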
Now we have a little CSV file in each leaf directory, containing the running time:
|----------+--------+-------------|
| elapsed | maxmem | exitstatus |
|----------+--------+-------------|
| 17.78 | 471648 | 0 |
|----------+--------+-------------|
To analyze these en masse, we need to combine them and add information about the parameters used to generate them. The nestagg script does just this.
$ nestagg delim -d runs -o results.csv time.csv -k algorithm,k,tree
Here, -d runs indicates the directory containing program runs; -o results.csv specifies where to write the output; time.csv gives the name of the file in each leaf directory; and -k algorithm,k,tree lists the parameters to add to each row of the CSV files.
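The aggregation amounts to reading the per-leaf CSV, annotating each row with keys from the sibling control.json, and concatenating everything; a rough stdlib sketch of that idea (not nestagg's actual code):

```python
import csv
import json
import os

def aggregate(leaf_dirs, csv_name, keys):
    """Combine csv_name from each leaf directory, tagging rows with control keys."""
    rows = []
    for d in leaf_dirs:
        with open(os.path.join(d, 'control.json')) as fp:
            control = json.load(fp)
        with open(os.path.join(d, csv_name)) as fp:
            for row in csv.DictReader(fp):
                # Copy the requested parameter choices onto each data row.
                row.update((k, control[k]) for k in keys)
                rows.append(row)
    return rows
```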
Looking at results.csv:
|----------+---------+------------+-----------+---------------------------------------+------|
| elapsed | maxmem | exitstatus | algorithm | tree | k |
|----------+---------+------------+-----------+---------------------------------------+------|
| 17.04 | 941328 | 0 | full | .../examples/adcl/trees/random001.tre | 1 |
| 20.86 | 944336 | 0 | full | .../examples/adcl/trees/random001.tre | 101 |
| 31.75 | 944320 | 0 | full | .../examples/adcl/trees/random001.tre | 201 |
| 39.34 | 980048 | 0 | full | .../examples/adcl/trees/random001.tre | 301 |
| 37.84 | 1118960 | 0 | full | .../examples/adcl/trees/random001.tre | 401 |
| 42.15 | 1382000 | 0 | full | .../examples/adcl/trees/random001.tre | 501 |
etc
Now we have something we can look at!
So: PAM is faster for large k, and always has lower maximum memory use.
(generated by examples/adcl/03analyze.R)
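The same kind of summary can be sketched in Python instead of R; this illustrative helper averages the elapsed column per algorithm over parsed rows of results.csv:

```python
import csv
from collections import defaultdict

def mean_elapsed(rows):
    """Average the 'elapsed' column per 'algorithm' over csv.DictReader rows."""
    samples = defaultdict(list)
    for row in rows:
        samples[row['algorithm']].append(float(row['elapsed']))
    return {alg: sum(v) / len(v) for alg, v in samples.items()}

# Usage, once results.csv exists:
# with open('results.csv') as fp:
#     print(mean_elapsed(csv.DictReader(fp)))
```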
From examples/basic_nest/make_nest.py, this is a simple, combinatorial example.
#!/usr/bin/env python
import glob
import math
import os
import os.path
from nestly import Nest
wd = os.getcwd()
input_dir = os.path.join(wd, 'inputs')
nest = Nest()
# Simplest case: Levels are added with a name and an iterable
nest.add('strategy', ('exhaustive', 'approximate'))
# Sometimes it's useful to add multiple keys to the nest in one operation, e.g.
# for grouping related data.
# This can be done by passing an iterable of dictionaries to the `Nest.add` call,
# each containing at least the named key, along with the `update=True` flag.
#
# Here, 'run_count' is the named key, and will be used to create a directory in the nest,
# and the value of 'power' will be added to each control dictionary as well.
nest.add('run_count', [{'run_count': 10**i, 'power': i}
                       for i in range(3)], update=True)
# label_func can be used to generate a meaningful name. Here, it strips all
# but the file name from the file path.
nest.add('input_file', glob.glob(os.path.join(input_dir, 'file*')),
label_func=os.path.basename)
# Items can be added that don't generate directories
nest.add('base_dir', [os.getcwd()], create_dir=False)
# Any function taking one argument (control dictionary) and returning an
# iterable may also be used.
# This one just takes the logarithm of 'run_count'.
# Since the function only returns a single result, we don't create a new directory.
def log_run_count(c):
    run_count = c['run_count']
    return [math.log(run_count, 10)]
nest.add('run_count_log', log_run_count, create_dir=False)
nest.build('runs')
This example is then run with the ../examples/basic_nest/run_example.sh script.
#!/bin/sh
set -e
set -u
set -x
# Build a nested directory structure
./make_nest.py
# Let's look at a sample control file:
cat runs/approximate/1/file1/control.json
# Run `echo.sh` using every control.json under the `runs` directory, 2
# processes at a time
nestrun --processes 2 --template-file echo.sh -d runs
# Merge the CSV files named '{strategy}.csv' (where strategy value is taken
# from the control file)
nestagg delim '{strategy}.csv' -d runs -o aggregated.csv
echo.sh is just a simple template script that writes two fake output variables into a {strategy}.csv file in each leaf directory:
#!/bin/sh
#
# Echo the value of two fake output variables: var1, which is always 13, and
# var2, which is 10 times the run_count.
echo "var1,var2
13,{run_count}0" > "{strategy}.csv"
This is a bit more complicated, with lookups on previous values of the control dictionary:
#!/usr/bin/env python
import glob
import os
import os.path
from nestly import Nest, stripext
wd = os.getcwd()
startersdir = os.path.join(wd, "starters")
winedir = os.path.join(wd, "wine")
mainsdir = os.path.join(wd, "mains")
nest = Nest()
bn = os.path.basename
# Start by mirroring the two directory levels in startersdir, and name those
# directories "ethnicity" and "dietary".
nest.add('ethnicity', glob.glob(os.path.join(startersdir, '*')),
label_func=bn)
# In the `dietary` key, the anonymous function `lambda ...` chooses as values
# the names of directories within the current `ethnicity` directory.
nest.add('dietary', lambda c: glob.glob(os.path.join(c['ethnicity'], '*')),
label_func=bn)
## Now get all of the starters.
nest.add('starter', lambda c: glob.glob(os.path.join(c['dietary'], '*')),
label_func=stripext)
## Then get the corresponding mains.
nest.add('main', lambda c: [os.path.join(mainsdir, bn(c['ethnicity']) + "_stirfry.txt")],
label_func=stripext)
## Take only the tasty wines.
nest.add('wine', glob.glob(os.path.join(winedir, '*.tasty')),
label_func=stripext)
## The wineglasses should be chosen by the wine choice, but we don't want to
## make a directory for those.
nest.add('wineglass', lambda c: [stripext(c['wine']) + ' wine glasses'],
create_dir=False)
nest.build('runs')
This SConstruct file is an example of using nestly with the SCons build system:
# -*- python -*-
#
# This example takes every file in the inputs directory and performs the
# following operations. First, it cuts out a column range from every line in
# the file, in this case either 1-5 or 3-40. After it does this it optionally
# filters out every line that has an "o" or "O". Then it runs wc on every such
# file. These word counts then get aggregated together by the prep_tab.sh
# script.
#
# Assuming that SCons is installed, you should be able to run this example by
# simply typing `scons` in this directory. That should build a series of things
# in the `build` directory. Because this is a build system, deleting a file or
# directory in the build directory and then running scons will simply rerun the
# needed parts.
from os.path import join
import os
from nestly.scons import SConsWrap
from nestly import Nest
nest = Nest()
w = SConsWrap(nest, 'build')
# Initialize an aggregator that runs prep_tab.sh on all of its arguments. The
# `list` argument specifies that this aggregator will be a list of functions.
# This aggregator is then populated below with `append`.
@w.add_aggregate(list)
def all_counts(outdir, c, inputs):
    return Command(join(outdir, 'all_counts.tab'),
                   inputs,
                   './prep_tab.sh $SOURCES | column -t >$TARGET')
# Initialize an aggregator that concatenates all of the cut files into one.
@w.add_aggregate(list)
def all_cut(outdir, c, inputs):
    return Command(join(outdir, 'all_cut.txt'),
                   inputs,
                   'cat $SOURCES >$TARGET')
# Add a nest with the name 'input_file' that has the files in the inputs
# directory as its nestable list. Make its label function just the basename.
w.add('input_file', [join('inputs', f) for f in os.listdir('inputs')],
label_func=os.path.basename)
# This determines the column range we will cut out of the file.
w.add('cut_range', ['1-5', '3-40'])
# This adds a nest with the name 'cut', but also makes an SCons target out of
# the result.
@w.add_target()
def cut(outdir, c):
    cut, = Command(join(outdir, 'cut'),
                   c['input_file'],
                   'cut -c {0[cut_range]} <$SOURCE >$TARGET'.format(c))
    # Here we add this cut file to the all_cut aggregator.
    c['all_cut'].append(cut)
    return cut
# This determines if we remove the lines with o's.
w.add('o_choice', ['remove_o', 'leave_o'])
@w.add_target()
def o_choice(outdir, c):
    # If we leave the o lines, then we don't have to do anything.
    if c['o_choice'] == 'leave_o':
        return c['cut']
    # If we want to remove the o lines, then we have to make an SCons Command
    # that does so with sed.
    return Command(join(outdir, 'o_removed'),
                   c['cut'],
                   'sed "/[oO]/d" <$SOURCE >$TARGET')[0]
# Add a target for the word counts.
@w.add_target()
def counts(outdir, c):
counts, = Command(join(outdir, 'counts'),
c['o_choice'],
'wc <$SOURCE >$TARGET')
c['all_counts'].append(counts)
return counts
# When we finalize all of the aggregates, all of the aggregate functions are
# called, creating the corresponding dependencies.
w.finalize_all_aggregates()
nestly is a collection of functions designed to make running software with combinatorial choices of parameters easier.
Core functions for building nests.
Bases: object
Nests are used to build nested parameter selections, culminating in a directory structure representing choices made, and a JSON dictionary with all selections.
Build parameter combinations with Nest.add(), then create a nested directory structure with Nest.build().
Add a level to the nest
Generate the names of all control files under base_dir
Apply map_fn to the directories defined by control_iter
For each control file in control_iter, map_fn is called with the directory and control file contents as arguments.
Example:
>>> list(nest_map(['run1/control.json', 'run2/control.json'],
... lambda d, c: c['run_id']))
[1, 2]
Returns: A generator of the results of applying map_fn to elements in control_iter
SCons integration for nestly.
Bases: object
A Nest wrapper to add SCons integration.
This class wraps a Nest in order to provide methods which are useful for using nestly with SCons.
A Nest passed to SConsWrap must have been created with include_outdir=True, which is the default.
Add an aggregate target to this nest.
The first argument is a nullary factory function which will be called immediately for each of the current control dictionaries and stored in each dictionary with the given name like in add_target. After finalize_aggregate or finalize_all_aggregates are called, the decorated function will then be called in the same way as add_target, except with an additional argument: the value which was returned by the factory function.
Since nests added after the aggregate can access the factory function’s value, it can be mutated to provide additional values for use when the decorated function is called.
Add an SCons target to this nest.
The function decorated will be immediately called with each of the output directories and current control dictionaries. Each result will be added to the respective control dictionary for later nests to access.
Call the finalizers for one particular aggregate.
Finalizing an aggregate this way means that it will not be finalized by any future calls to finalize_all_aggregates.
Call the finalizers for all defined aggregates.
If any aggregates have been specifically finalized by finalize_aggregate, they will not be finalized again. This function itself calls finalize_aggregate; if finalize_all_aggregates is called twice, aggregates will not be finalized twice.
Aggregates will be finalized in the same order in which they were defined.
nestrun.py - run commands based on control dictionaries.
Bases: object
Metadata about a process run
‘Type’ for argparse - checks that file exists but does not open.
Substitute template arguments in in_file from variables in d, write the result to out_fobj.
Aggregate results of nestly runs.
Execute delim action.
Parameters: arguments – Parsed command line arguments from main()
nestrun takes a command template and a list of control.json files with variables to substitute. Substitution is performed using the Python built-in str.format method. See the Python Formatter documentation for details on syntax, and examples/jsonrun/do_nestrun.sh for an example.
nestrun also handles some signals by default.
This tells nestrun to stop spawning jobs. All jobs that were already spawned will continue running.
This tells nestrun to terminate if received twice. On the first SIGTERM, nestrun will emit a warning message; on the second, it will terminate all jobs and then itself.
This tells nestrun to immediately write a list of all currently-running processes and their working directories to stderr, then flush stderr.
usage: nestrun.py [-h] [-j N] [--template 'template text'] [--stop-on-error]
[--template-file FILE] [--save-cmd-file SAVECMD_FILE]
[--log-file LOG_FILE | --no-log] [--dry-run]
[--summary-file SUMMARY_FILE] [-d DIR]
[control_files [control_files ...]]
nestrun - substitute values into a template and run commands in parallel.
optional arguments:
-h, --help show this help message and exit
-j N, --processes N, --local N
Run a maximum of N processes in parallel locally
(default: 2)
--template 'template text'
Command-execution template, e.g. bash {infile}. By
default, nestrun executes the templatefile.
--stop-on-error Terminate remaining processes if any process returns
non-zero exit status (default: False)
--template-file FILE Command-execution template file path.
--save-cmd-file SAVECMD_FILE
Name of the file that will contain the command that
was executed.
--log-file LOG_FILE Name of the file that will contain output of the
executed command.
--no-log Don't create a log file
--dry-run Dry run mode, does not execute commands.
--summary-file SUMMARY_FILE
Write a summary of the run to the specified file
Control files:
control_files Nestly control dictionaries
-d DIR, --directory DIR
Run on all control files under DIR. May be used in
place of specifying control files.
The nestagg command provides a mechanism for combining results of multiple runs, via a subcommand interface. Currently, the only supported action is merging delimited files from a set of leaves, adding values from the control dictionary on each. This is performed via nestagg delim.
usage: nestagg.py delim [-h] [-k KEYS | -x EXCLUDE_KEYS] [-m {fail,warn}]
[-d DIR] [-s SEPARATOR] [-t] [-o OUTPUT]
file_template [control.json [control.json ...]]
positional arguments:
file_template Template for the delimited file to read in each
directory [e.g. '{run_id}.csv']
control.json Control files
optional arguments:
-h, --help show this help message and exit
-k KEYS, --keys KEYS Comma separated list of keys from the JSON file to
include [default: all keys]
-x EXCLUDE_KEYS, --exclude-keys EXCLUDE_KEYS
Comma separated list of keys from the JSON file not to
include [default: None]
-m {fail,warn}, --missing-action {fail,warn}
Action to take when a file is missing [default: fail]
-d DIR, --directory DIR
Run on all control files under DIR. May be used in
place of specifying control files.
-s SEPARATOR, --separator SEPARATOR
Separator [default: ,]
-t, --tab Files are tab-separated
-o OUTPUT, --output OUTPUT
Output file [default: stdout]
SCons is an excellent build tool (analogous to make). The nestly.scons module is provided to make integrating nestly with SCons easier. SConsWrap wraps a Nest object to provide additional methods for adding nests. SCons is complex and is fully documented on their website, so we do not describe it here. However, for the purposes of this document, it suffices to know that dependencies are created when a target function is called.
The basic idea is that when writing an SConstruct file (analogous to a Makefile), these SConsWrap objects extend the usual nestly functionality with build dependencies. Specifically, there are functions that add targets to the nest. When SCons is invoked, these targets are identified as dependencies and the needed code is run. There are also aggregate functions (this is aggregate with a hard second “a”; rhymes with “Watergate”) that don’t get immediately called, but rather when the finalize_aggregate() method is called.
SConsWrap objects wrap and modify a Nest object. Each Nest object needs to have been created with include_outdir=True, which is the default.
Optionally, a destination directory can be given to the SConsWrap which will be passed to Nest.iter():
>>> nest = Nest()
>>> wrap = SConsWrap(nest, dest_dir='build')
In this example, all the nests created by wrap will go under the build directory. Throughout the rest of this document, nest will refer to this same Nest instance and wrap will refer to this same SConsWrap instance.
Nests can still be added to the nest object:
>>> nest.add('nest1', ['spam', 'eggs'])
SConsWrap also provides a convenience decorator SConsWrap.add_nest() for adding nests which use a function as their nestable. The following examples are exactly equivalent:
@wrap.add_nest('nest2', label_func=str.strip)
def nest2(c):
    return [' __' + c['nest1'], c['nest1'] + '__ ']

def nest2(c):
    return [' __' + c['nest1'], c['nest1'] + '__ ']
nest.add('nest2', nest2, label_func=str.strip)
Another advantage to using the decorator is that the name parameter is optional; if it’s omitted, the name of the nest is taken from the name of the function. As a result, the following example is also equivalent:
@wrap.add_nest(label_func=str.strip)
def nest2(c):
    return [' __' + c['nest1'], c['nest1'] + '__ ']
Note
add_nest() must always be called before being applied as a decorator. @wrap.add_nest is not valid; the correct usage is @wrap.add_nest() if no other parameters are specified.
The fundamental action of SCons integration is in adding a target to a nest. Adding a target is very much like adding a nest in that it will add a key to the control dictionary, except that it will not add any branching to a nest. For example, successive calls to Nest.add() produces results like the following:
>>> nest.add('nest1', ['A', 'B'])
>>> nest.add('nest2', ['C', 'D'])
>>> pprint.pprint([c.items() for outdir, c in nest])
[[('nest1', 'A'), ('nest2', 'C')],
[('nest1', 'A'), ('nest2', 'D')],
[('nest1', 'B'), ('nest2', 'C')],
[('nest1', 'B'), ('nest2', 'D')]]
A crude illustration of how nest1 and nest2 relate:
# C .---- - -
# A .----------o nest2
# | D '---- - -
# o----o nest1
# | C .---- - -
# B '----------o nest2
# D '---- - -
Calling add_target(), however, produces slightly different results:
>>> nest.add('nest1', ['A', 'B'])
>>> @wrap.add_target()
... def target1(outdir, c):
...     return 't-{0[nest1]}'.format(c)
...
>>> pprint.pprint([c.items() for outdir, c in nest])
[[('nest1', 'A'), ('target1', 't-A')],
[('nest1', 'B'), ('target1', 't-B')]]
And a similar illustration of how nest1 and target1 relate:
# t-A
# A .----------o------ - -
# o----o nest1 target1
# B '----------o------ - -
# t-B
add_target() does not increase the total number of control dictionaries from 2; it only updates each existing control dictionary to add the target1 key. This is effectively the same as calling add() (or add_nest()) with a function and returning an iterable of one item:
>>> nest.add('nest1', ['A', 'B'])
>>> @wrap.add_nest()
... def target1(c):
...     return ['t-{0[nest1]}'.format(c)]
...
>>> pprint.pprint([c.items() for outdir, c in nest])
[[('nest1', 'A'), ('target1', 't-A')],
[('nest1', 'B'), ('target1', 't-B')]]
Astute readers might have noticed the key difference between the two: functions decorated with add_target() have an additional parameter, outdir. This allows targets to be built into the correct place in the directory hierarchy.
The other notable difference is that the function decorated by add_target() will be called exactly once with each control dictionary. A function added with add() may be called more than once with equal control dictionaries.
Like add_nest(), add_target() must always be called, and optionally takes the name of the target as the first parameter. No other parameters are accepted.
Aggregate functions are a special case of targets. Instead of the decorated function being called immediately, it will be called at some other specified moment. An example:
>>> nest.add('nest1', ['A', 'B'])
>>> @wrap.add_aggregate(list)
... def aggregate1(outdir, c, inputs):
...     print 'agg', c['nest1'], inputs
...
>>> nest.add('nest2', ['C', 'D'])
>>> nest.add('nest3', ['E', 'F'])
>>> @wrap.add_target()
... def add_target(outdir, c):
...     c['aggregate1'].append((c['nest2'], c['nest3']))
...
>>> wrap.finalize_aggregate('aggregate1')
agg A [('C', 'E'), ('C', 'F'), ('D', 'E'), ('D', 'F')]
agg B [('C', 'E'), ('C', 'F'), ('D', 'E'), ('D', 'F')]
The first argument to add_aggregate() is a factory function which will be called with no arguments and added to each control dictionary as the name of the aggregate. Targets added after the aggregate are able to access and modify the value added.
When the aggregate is finalized, it will be called with output directory and control dictionary like a target, but also with the value which was added to the control dictionary. This allows aggregates to use values from later targets.
Aggregates can either be finalized by calling finalize_aggregate() or finalize_all_aggregates(). The former will finalize a particular aggregate by name, while the latter finalizes all aggregates in the same order they were added.
The second parameter to add_aggregate() is the same as the first parameter to add_target(): the name of the aggregate, which will default to the name of the function if none is specified.
While the previous examples demonstrate how to use the various methods of SConsWrap, they do not demonstrate how to actually call commands using SCons. The easiest way is to define the various targets from within the SConstruct file:
from nestly.scons import SConsWrap
from nestly import Nest
from os.path import join
import os
nest = Nest()
wrap = SConsWrap(nest, 'build')
# Add a nest for each of our input files.
nest.add('input_file', [join('inputs', f) for f in os.listdir('inputs')],
         label_func=os.path.basename)
# Each input will get transformed each of these different ways.
nest.add('transformation', ['log', 'unit', 'asinh'])
@wrap.add_target()
def transformed(outdir, c):
    # The template for the command to run.
    action = 'guppy mft --transform {0[transformation]} $SOURCE -o $TARGET'
    # Command will return a tuple of the targets; we want the only item.
    outfile, = Command(
        source=c['input_file'],
        target=os.path.join(outdir, 'transformed.jplace'),
        action=action.format(c))
    return outfile
A function name_targets() is also provided for more easily naming the targets of an SCons command:
@wrap.add_target('target1')
@name_targets
def target1(outdir, c):
    return 'outfile1', 'outfile2', Command(
        source=c['input_file'],
        target=[os.path.join(outdir, 'outfile1'),
                os.path.join(outdir, 'outfile2')],
        action="transform $SOURCE $TARGETS")
In this case, target1 will be a dict resembling {'outfile1': 'build/outdir/outfile1', 'outfile2': 'build/outdir/outfile2'}.
Note
name_targets() does not preserve the name of the decorated function, so the name of the target must be provided as a parameter to add_target().
A more involved, runnable example is in the examples/scons directory.