Welcome to StructOpt’s documentation!¶
StructOpt is a reverse structure determination toolkit.
What is reverse structure determination?¶
Reverse structure determination is a type of structural refinement that iteratively modifies an atomic structure, for example by moving atoms, with the goal of minimizing a function (such as the system energy). In atomistic simulations, the positions of the atoms are adjusted within the model at every step. After the atoms have moved, the structure is evaluated to see how “good” it is. If the structure is “better” than it was in the previous step, the moved atoms are more likely to persist into the next generation. This process is repeated many times until acceptable structure(s) have been generated.
Many different metrics can be used to determine how “good” a structure is, and this is often material-dependent. The average energy per atom is one commonly used metric, and others include fits to experimental data (e.g. S(q) or g(r) data), medium-range order information available via FEM measurements, average coordination number, bond angle constraints, etc.
Different optimization algorithms can be used to minimize these metrics, including Monte Carlo, genetic algorithm, and particle swarm methods.
Overview of StructOpt¶
User Documentation¶
StructOpt is a structure optimization framework that incorporates multiple forward simulation techniques into its optimization scheme with the goal of identifying stable and realistic atomic structures. It is designed with modularity in mind. Nearly any forward simulation technique that takes an atomic model as input and outputs a fitness value can be integrated into this framework.
This documentation serves as both a user and developer guide for StructOpt. However, parts of this documentation are likely lacking. If you have questions, please post them as an issue on github.
StructOpt performs structure refinement for many different materials, including nanoparticles and metallic glasses, and is highly customizable and extendable to new structure types. Many different types of simulations can be set up, which requires getting to know the relevant parameters. Examples are included in the github repository, and comments via issues on our github page are welcome.
Contents¶
Core Concepts¶
Detailed information about StructOpt can be found in our paper: [in the submission process]
Overview and General Workflow¶
StructOpt uses a Genetic Algorithm to optimize a set of atomic structures according to a customizable objective function (aka cost function).
Genetic Algorithm¶
A genetic algorithm, or evolutionary algorithm, utilizes a population of structures rather than a single individual and is conceptually similar to Darwinian evolution, with animals replaced by “individuals”. In StructOpt, an “individual” is an atomic model. A population of atomic models is first generated. Given this population, pairs of individuals are mated (crossed over) by selecting different volumes of different models and joining them. Crossovers always produce two children, one for each combination of the two model sections. The offspring are added to the population. After the mating scheme has finished, single individuals can “mutate” (i.e. have their atoms moved in a unique way) to add new “genes” to the population’s gene pool. After the atoms have been moved via crossovers and mutations, the structures are relaxed. Finally, each structure is run through a series of “fitness” evaluations to determine how “fit to survive” it is, and the population is then reduced to its original size based on a number of optional selection criteria. This process is repeated many times.
In summary:
- Generate initial structures
- Locally relax structures
- Calculate fitness values (e.g. energies) of each structure
- Remove some individuals from the population based on their fitness value
- Perform crossovers on selected individuals to generate offspring for the next generation
- Perform mutations on the selected individuals in the current population and offspring for the next generation
- Repeat steps 2-6 until the convergence criteria are met
The relaxation and fitness evaluations are only performed on individuals that have been modified via crossovers and mutations. This avoids recomputing these expensive calculations for individuals that were unchanged during the generation’s crossover/mutation scheme.
During crossovers, the offspring are collected into a list. After all crossovers have been completed, these offspring are added to the entire population. Each individual in the entire population then has a chance to be mutated. There will therefore be a variable number of modified individuals that need to be relaxed and evaluated during each generation; this number can only be predicted statistically from the crossover and mutation probabilities.
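The generation-level bookkeeping described above can be sketched as a toy Python loop. This is an illustrative outline only, not StructOpt's actual implementation; the crossover, mutate, and fitness callables are placeholders supplied by the caller.

```python
import random

def run_generation(population, fitness, crossover, mutate,
                   p_cross=0.7, p_mut=0.1, rng=random):
    """One illustrative GA generation (a sketch, not StructOpt's actual
    code): crossovers produce offspring, every individual in the enlarged
    population may mutate, and only modified individuals are re-evaluated."""
    modified = []

    # Crossovers: pair individuals and, with probability p_cross, mate them.
    # Offspring are collected first, then added to the whole population.
    offspring = []
    shuffled = population[:]
    rng.shuffle(shuffled)
    for mom, dad in zip(shuffled[::2], shuffled[1::2]):
        if rng.random() < p_cross:
            offspring.extend(crossover(mom, dad))  # always two children
    population = population + offspring
    modified.extend(offspring)

    # Mutations: every individual, including new offspring, may mutate.
    for ind in population:
        if rng.random() < p_mut:
            mutate(ind)
            if not any(m is ind for m in modified):
                modified.append(ind)

    # Relaxation and fitness evaluation run only on modified individuals.
    fitnesses = [fitness(ind) for ind in modified]
    return population, modified, fitnesses
```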
Installation and Setup¶
StructOpt is written in Python 3 and as such requires a working Python 3 installation. We recommend setting up an Anaconda virtual environment exclusively for StructOpt.
Python Libraries¶
conda install numpy
conda install scipy
pip install ase
pip install natsort
# Install mpi4py from source (below)
# Install LAMMPS (if needed)
mpi4py¶
On Madison’s ACI cluster:
module load compile/intel
module load mpi/intel/openmpi-1.10.2
Follow these instructions:
wget https://bitbucket.org/mpi4py/mpi4py/downloads/mpi4py-X.Y.tar.gz
tar -zxf mpi4py-X.Y.tar.gz
cd mpi4py-X.Y
python setup.py build
python setup.py install --user
You can test your installation by following these instructions.
Installing StructOpt¶
To get the code, fork and clone the StructOpt repository or download the zip here. Add the location of the StructOpt folder (e.g. $HOME/repos/StructOpt/v2-experiments-and-energy/structopt) to your PYTHONPATH environment variable.
Create an environment variable called STRUCTOPT_HOME with the same folder location as you added to your path.
Input Parameters¶
The input parameters are defined in a JSON file as a single dictionary. Due to the modular nature of StructOpt, the input file is a dictionary of dictionaries where keys and values often relate directly to function names and kwargs.
The parameters for a simulation can be defined in the optimizer file using structopt.setup(parameters), where parameters is either a filename or a dictionary.
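Because the input file is plain JSON, the same parameters can be prepared as a Python dictionary and written to disk, after which either call form works. A minimal sketch (the parameter values are illustrative, and the setup calls are commented out since they require a full StructOpt installation):

```python
import json

# Build the parameters as a Python dictionary (values illustrative).
parameters = {
    "seed": 0,
    "structure_type": "aperiodic",
    "convergence": {"max_generations": 10},
}

# Write the dictionary out as a JSON input file...
with open("structopt.in.json", "w") as f:
    json.dump(parameters, f, indent=4)

# ...after which either form can be passed to StructOpt:
# structopt.setup("structopt.in.json")
# structopt.setup(parameters)
```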
The parameters for a simple Au nanoparticle example, which finds the lowest energy configuration using an EAM potential in LAMMPS, are given below. Each part of the parameters will be discussed in the following sections.
Example:
{
"seed": 0,
"structure_type": "aperiodic",
"generators": {
"fcc": {
"number_of_individuals": 5,
"kwargs": {
"atomlist": [["Au", 55]],
"orientation": "100",
"cell": [20, 20, 20],
"a": 4.08
}
}
},
"fitnesses": {
"LAMMPS": {
"weight": 1.0,
"use_mpi4py": false,
"kwargs": {
"MPMD": 0,
"keep_files": true,
"min_style": "cg",
"min_modify": "line quadratic",
"minimize": "1e-8 1e-8 5000 10000",
"pair_style": "eam",
"potential_file": "$STRUCTOPT_HOME/potentials/Au_u3.eam",
"thermo_steps": 0,
"reference": {"Au": -3.930}
}
}
},
"relaxations": {
"LAMMPS": {
"order": 0,
"use_mpi4py": false,
"kwargs": {
"MPMD": 0,
"keep_files": true,
"min_style": "cg",
"min_modify": "line quadratic",
"minimize": "1e-8 1e-8 5000 10000",
"pair_style": "eam",
"potential_file": "$STRUCTOPT_HOME/potentials/Au_u3.eam",
"thermo_steps": 0
}
}
},
"convergence": {
"max_generations": 10
},
"mutations": {
"move_surface_atoms": {"probability": 0.0},
"move_atoms": {"probability": 0.0},
"move_atoms_group": {"probability": 0.0},
"rotate_atoms": {"probability": 0.0},
"rotate_cluster": {"probability": 0.0},
"rotate_all": {"probability": 0.0},
"move_surface_defects": {"probability": 1.0}
},
"crossovers": {
"rotate": {"probability": 0.7, "kwargs": {"repair_composition": true}}
},
"predators": {
"fuss": {"probability": 1.0}
},
"selections": {
"rank": {"probability": 1.0}
}
}
Global Parameters¶
Global parameters pertain to the entire simulation.
structure_type¶
structure_type (str): Defines the type of material being optimized. StructOpt currently supports the periodic and aperiodic structure types. The periodic structure type applies periodic boundary conditions in all three directions, and the aperiodic structure type has no periodic boundary conditions.
seed¶
seed (int): The seed for the pseudo-random number generator. Two runs with identical input and seed should run identically (however, in rare cases machine rounding errors may cause some selection schemes to choose different individuals in runs that should be identical).
convergence¶
convergence (dict): A dictionary that defines when to stop the calculation. Currently, the only convergence criterion is max_generations, which is set to an int. For example, the parameters below will run the optimizer for 200 generations.
Example:
"convergence": {"max_generations": 200}
post_processing¶
post_processing (dict): Determines the outputs of the simulation. Currently, the only option is XYZs, which determines how frequently the xyz files of each generation should be printed. The rules are as follows:
- XYZs = 0: all generations are kept
- XYZs > 0: every XYZs-th generation is kept
- XYZs < 0: the last |XYZs| generations are kept
The example below keeps only the last generation (this is the default behavior, because saving every individual is both time and disk-space intensive).
Example:
"post_processing": {"XYZs": -1}
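The retention rules above can be expressed as a small helper. This is an illustrative sketch, not StructOpt's actual code; generations is assumed to be a list of generation indices:

```python
def generations_to_keep(xyzs, generations):
    """Illustrative sketch of the XYZs retention rules (not StructOpt's
    actual implementation)."""
    if xyzs == 0:
        return list(generations)        # keep every generation
    elif xyzs > 0:
        return generations[::xyzs]      # keep every XYZs-th generation
    else:
        return generations[xyzs:]       # keep the last |XYZs| generations
```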
Generators¶
Generators are functions for initializing the population. They are pseudo-random generators that depend on the seed global parameter.
Generators are given as a dictionary entry defined by the generators key in the input file. The structure of the generators dictionary with N desired generators is given below.
Example:
"generators": {
generator_1: {"number_of_individuals": n_1,
"kwargs": kwargs_1}
generator_2: {"number_of_individuals": n_2,
"kwargs": kwargs_2}
generator_3: {"number_of_individuals": n_3,
"kwargs": kwargs_3}
...
generator_N: {"number_of_individuals": n_N,
"kwargs": kwargs_N}
}
The string generator_i is the name of the generator one wants to use. The number of individuals that generator should generate is given by the integer n_i. The sum of all n_i values determines the total size of the population, which is fixed throughout the run unless code is added to the optimizer to change the population size. kwargs_i are dictionaries that define the kwargs of the generator function being used. The kwargs are specific to each function and can be found in its help function.
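For example, the fixed population size implied by a generators dictionary is simply the sum over its entries (a sketch; the "sphere" generator name below is hypothetical):

```python
def total_population_size(generators):
    """Sum number_of_individuals over all generators; this fixed total is
    the population size for the whole run (sketch of the rule above)."""
    return sum(g["number_of_individuals"] for g in generators.values())
```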
Crossovers¶
Crossovers are operations for combining two individuals chosen by a selection algorithm. The purpose of a crossover is to intelligently combine (mate) different individuals (parents) in a way that creates new individuals (children) that share the features of the parents. Often the parents are chosen to be the best individuals in the population.
Crossovers are given as a dictionary entry defined by the crossovers key in the input file. The structure of the crossovers dictionary with N desired crossovers is given below.
Example:
"crossovers": {
crossover_1: {"probability": p_1,
"kwargs": kwargs_1}
crossover_2: {"probability": p_2,
"kwargs": kwargs_2}
crossover_3: {"probability": p_3,
"kwargs": kwargs_3}
...
crossover_N: {"probability": p_N,
"kwargs": kwargs_N}
}
The string crossover_i is the name of the crossover that will be used. The probability p_i is the probability of that crossover occurring when a mating event happens in the population; the p_i values should sum to 1. kwargs_i are dictionaries that supply the kwargs to the crossover function being used. These kwargs are specific to each function and can be found in its help function.
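Drawing one crossover according to these probabilities can be sketched with random.choices (illustrative only, not StructOpt's internal selection code):

```python
import random

def choose_operator(operators, rng=random):
    """Pick one operator name according to its probability entry.
    Assumes the p_i values sum to 1, as the input-file convention
    requires (a sketch, not StructOpt's actual code)."""
    names = list(operators)
    weights = [operators[name]["probability"] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]
```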
Currently the only crossover in use in the algorithm is the cut-and-splice operator introduced by Deaven and Ho.
Selections¶
Selections are operations for choosing which individuals to “mate” when producing new individuals. Individuals are chosen based on their fitness, and different selection functions determine which individuals will be mated. The selection scheme impacts the diversity of subsequent populations.
Selections are given as a dictionary entry defined by the selections
key in the input file. The structure of the selections dictionary with N desired selections is given below.
Example:
"selections": {
selection_1: {"probability": p_1,
"kwargs": kwargs_1}
selection_2: {"probability": p_2,
"kwargs": kwargs_2}
selection_3: {"probability": p_3,
"kwargs": kwargs_3}
...
selection_N: {"probability": p_N,
"kwargs": kwargs_N}
}
The string selection_i is the name of the selection one wants to use. The probability p_i is the probability of that selection scheme being used when a mating event happens in the population; the p_i values should sum to 1. kwargs_i are dictionaries that supply the kwargs to the selection function being used. These are specific to each function and can be found in its help function.
Predators¶
Similar to selections, predators are selection processes that select individuals based on their fitness. The distinction is that while selections select individuals with positive features to duplicate in children, predators select which individuals to keep in the next generation. Note that this step is necessary because crossovers and mutations increase the population every generation; hence each generation requires a predator step.
Predators are given as a dictionary entry defined by the predators
key in the input file. The structure of the predators dictionary with N desired predators is given below.
Example:
"predators": {
predator_1: {"probability": p_1,
"kwargs": kwargs_1}
predator_2: {"probability": p_2,
"kwargs": kwargs_2}
predator_3: {"probability": p_3,
"kwargs": kwargs_3}
...
predator_N: {"probability": p_N,
"kwargs": kwargs_N}
}
The string predator_i is the name of the predator one wants to use. The probability p_i is the probability of that predator being applied; the p_i values should sum to 1. kwargs_i are dictionaries that supply the kwargs to the predator function being used. These are specific to each function and can be found in its help function.
Mutations¶
Mutations are operations applied to an individual that change its structure or composition. A mutation is a local search operation, although individual mutations can be written to perform small or large changes.
Mutations are given as a dictionary entry defined by the mutations
key in the input file. The structure of the mutations dictionary with N desired mutations is given below.
Example:
"mutations": {
"preserve_best": "true" or "false",
"keep_original": "true" or "false",
"keep_original_best": "true" or "false",
mutation_1: {"probability": p_1,
"kwargs": kwargs_1}
mutation_2: {"probability": p_2,
"kwargs": kwargs_2}
mutation_3: {"probability": p_3,
"kwargs": kwargs_3}
...
mutation_N: {"probability": p_N,
"kwargs": kwargs_N}
}
The string mutation_i is the name of the mutation being used. The probability p_i is the probability of that mutation occurring on each individual in the population; the p_i values should sum to a value between 0 and 1 (any remaining probability corresponds to no mutation). kwargs_i are dictionaries that supply the kwargs to the mutation function being used. These are specific to each function and can be found in its help function.
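Because the mutation probabilities may sum to less than 1, a draw can also result in no mutation at all. One way such a draw could work is sketched below (illustrative only, not StructOpt's actual code):

```python
import random

def choose_mutation(mutations, rng=random):
    """Pick a mutation by its probability; because the probabilities may
    sum to less than 1, an individual can also receive no mutation
    (returned as None).  Illustrative sketch only."""
    r = rng.random()
    cumulative = 0.0
    for name, params in mutations.items():
        cumulative += params["probability"]
        if r < cumulative:
            return name
    return None  # probabilities summed to < 1: no mutation this round
```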
In addition to specifying the mutations to use, the mutations dictionary takes three special kwargs: preserve_best, keep_original, and keep_original_best. Setting preserve_best to true means the highest-fitness individual will never be mutated. Setting keep_original to true means mutations are applied to copies of individuals, not the individuals themselves, so the original individual is not changed during a mutation. keep_original_best applies keep_original to only the best individual.
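As a concrete illustration, a mutations entry that protects the best individual and mutates copies of the others might look like the following (the mutation names come from the full example above; the probability values are illustrative, and the booleans are written as bare JSON values as in that example):

```json
"mutations": {
    "preserve_best": true,
    "keep_original": true,
    "keep_original_best": false,
    "move_atoms": {"probability": 0.1},
    "move_surface_atoms": {"probability": 0.2}
}
```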
The currently implemented mutations can be found in the structopt/*/individual/mutations folders, depending on the structure type being used. Note that in all of these functions the first argument is the atomic structure, which is inserted by the optimizer; the user defines all of the other kwargs after the first input.
Relaxations¶
Relaxations perform a local relaxation on the atomic structure before its fitness is evaluated. This is typically done after crossover and mutation operators are applied.
Relaxations differ from the previous operations in that they require varying amounts of resources. Hence, a subsequent section, Parallelization, will introduce ways to run your job with varying levels of parallel performance.
Relaxations are given as a dictionary entry defined by the relaxations
key in the input file. The structure of these dictionaries is shown below.
Example:
"relaxations": {
relaxation_1: {"order": o_1,
"kwargs": kwargs_1}
relaxation_2: {"order": o_2,
"kwargs": kwargs_2}
relaxation_3: {"order": o_3,
"kwargs": kwargs_3}
...
relaxation_N: {"order": o_N,
"kwargs": kwargs_N}
}
The string relaxation_i is the name of the relaxation being used. The order o_i determines the order in which the relaxations are applied to each individual in the population. kwargs_i are dictionaries that supply the kwargs to the relaxation function being used. These are specific to each function. More details of each relaxation module are given in the following subsections.
LAMMPS¶
The LAMMPS relaxation module calls LAMMPS to relax the structure using a potential. Most of the kwargs can be found from the LAMMPS documentation.
The potential files available for use are listed below and come from the default potentials included with LAMMPS. Given a potential, enter the potential_file kwarg as "$STRUCTOPT_HOME/potentials/<name>". Note also that different potentials require different pair_style kwarg lines. If you would like to use a potential file that is not available, please submit a pull request to this repository and the potential will be added. Currently available potentials can be found in the potentials/ directory.
AlCu.eam.alloy: Aluminum and copper alloy EAM (Cai and Ye, Phys Rev B, 54, 8398-8410 (1996))
Au_u3.eam: Gold EAM (SM Foiles et al, PRB, 33, 7983 (1986))
ZrCuAl2011.eam.alloy: Zirconium, copper, and aluminum glass (Howard Sheng at GMU. (hsheng@gmu.edu))
Fitnesses¶
Fitnesses evaluate the “goodness” of the individual, for example the simulated energy of the structure. Lower fitness values are better.
Fitnesses differ from the previous operations in that they require varying amounts of resources. Hence, a subsequent section, Parallelization, will introduce ways to run your job with varying levels of parallel performance.
Fitnesses are given as a dictionary entry defined by the fitnesses key in the input file. The structure of these dictionaries is shown below.
Example:
"fitnesses": {
fitness_1: {"weight": w_1,
"kwargs": kwargs_1}
fitness_2: {"weight": w_2,
"kwargs": kwargs_2}
fitness_3: {"weight": w_3,
"kwargs": kwargs_3}
...
fitness_N: {"weight": w_N,
"kwargs": kwargs_N}
}
The string fitness_i is the name of the fitness module one wants to use. The weight w_i is a constant that multiplies the fitness value returned by the fitness_i module. Note that all selections and predators operate on the total fitness, which is the weighted sum of the individual module fitnesses. kwargs_i are dictionaries that supply the kwargs to the fitness function being used. These are specific to each function. More details of each fitness module are given in the following subsections.
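The total fitness is a weighted sum over the fitness modules, which can be sketched as follows (illustrative; the module names and values in the test are hypothetical):

```python
def total_fitness(fitness_modules, values):
    """Total fitness = sum over modules of weight * module value, which
    is the quantity selections and predators act on.  A sketch: `values`
    maps each module name to the raw fitness returned by that module."""
    return sum(params["weight"] * values[name]
               for name, params in fitness_modules.items())
```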
LAMMPS¶
The LAMMPS fitness module calls LAMMPS to calculate the potential energy of the structure. Most of the kwargs can be found from the LAMMPS documentation. In addition, most of the kwargs are the same as relaxations, except the fitness module of LAMMPS has a number of normalization options for returning the potential energy. These are described below.
The potential files available for use are listed below and come from the default potentials included with LAMMPS. Given a potential, enter the potential_file kwarg as "$STRUCTOPT_HOME/potentials/<name>". Note also that different potentials require different pair_style kwarg lines. If you would like to use a potential file that is not available, please submit a pull request to this repository and the potential will be added.
AlCu.eam.alloy: Aluminum and copper alloy EAM (Cai and Ye, Phys Rev B, 54, 8398-8410 (1996))
Au_u3.eam: Gold EAM (SM Foiles et al, PRB, 33, 7983 (1986))
ZrCuAl2011.eam.alloy: Zirconium, copper, and aluminum glass (Howard Sheng at GMU. (hsheng@gmu.edu))
Parallelization¶
In addition to the module-specific parameters, each module requires two parallelization entries: use_mpi4py and MPMD_cores_per_structure. These two entries are mutually exclusive, meaning that only one can be turned on at a time. use_mpi4py can take two values, true or false, depending on whether the module should use the one-structure-per-core parallelization. MPMD_cores_per_structure can be disabled (if use_mpi4py is true) by setting it to 0, but otherwise specifies the number of cores that each process/structure should be allocated within the MPI_Comm_spawn_multiple command. There are two types of valid values for this parameter: 1) an integer specifying the number of cores per structure, or 2) a string of two integers separated by a dash specifying the minimum and maximum number of cores allowed (e.g. "4-16"). MPMD_cores_per_structure can also take the value "any", in which case StructOpt will use as many cores as it can to run each individual.
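A sketch of how these MPMD_cores_per_structure values could be interpreted (illustrative only, not StructOpt's actual parser):

```python
def parse_mpmd_cores(value):
    """Interpret MPMD_cores_per_structure as described above (a sketch,
    not StructOpt's actual code).  Returns None when MPMD is disabled,
    otherwise a (min_cores, max_cores) tuple; max_cores is None for
    "any"."""
    if value == 0:
        return None                  # disabled; use_mpi4py applies instead
    if isinstance(value, int):
        return (value, value)        # fixed core count per structure
    if value == "any":
        return (1, None)             # as many cores as available
    low, high = value.split("-")     # e.g. "4-16"
    return (int(low), int(high))
```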
Example:
"relaxations": {
"LAMMPS": {
"order": 0,
"use_mpi4py": true,
"kwargs": {
"MPMD_cores_per_structure": 0,
"keep_files": true,
"min_style": "cg\nmin_modify line quadratic",
"minimize": "1e-8 1e-8 5000 10000",
"pair_style": "eam/alloy",
"potential_file": "$STRUCTOPT_HOME/potentials/ZrCuAl2011.eam.alloy",
"thermo_steps": 1000
}
}
}
Outputs¶
All outputs are contained in a log folder with the timestamp when the simulation started (with 1-second resolution).
Examples¶
Examples can be found in the examples/
directory on github.
Running StructOpt¶
StructOpt can be run on a single processor or in parallel using MPI. Depending on the cluster/environment you are using, you may need to load the following modules:
module load lammps-31Jan14
module load compile/intel
module load mpi/intel/openmpi-1.10.2
StructOpt can be run serially using the following command:
python $STRUCTOPT_HOME/structopt/optimizers/genetic.py structopt.in.json
In a parallel environment with N processors, StructOpt can be run with the following command:
mpirun -n N python $STRUCTOPT_HOME/structopt/optimizers/genetic.py structopt.in.json
The output will exist in the folder the command was run from.
Parallelism¶
In general, the parallelized parts of StructOpt are the fitness and relaxation modules.
StructOpt’s fitness and relaxation modules allow two parallelization mechanisms. The first is the simplest case where each structure is assigned to a single core. The core does the significant processing for one structure by running the module’s code. This is optimal when the module does not implement MPI, or the code is relatively fast.
The second parallelization method, called MPMD (see the documentation online for MPI_Comm_spawn_multiple), is a type of advanced dynamic process management but remains relatively easy to use within StructOpt. It allows MPI code to be used within modules and lets those modules be processed on an arbitrary number of cores.
For functions that are only run on the root core (e.g. crossovers and mutations), the root decorator is used on the main fitness or relaxation function to broadcast the return value of the function to all cores.
StructOpt acts as a master process (“master program” may be a better term) that runs in Python and uses MPI (via mpi4py) to communicate between cores. This master program makes MPI_Comm_spawn_multiple calls to C and Fortran programs (which also use MPI). While the C and Fortran processes run, the master Python program waits until they are finished. As an example in this section, we will assume StructOpt is using 16 cores to do calculations on 4 structures.
In terms of MPMD parallelization, StructOpt does two primary things:
1. Uses MPI to do preprocessing for the spawning in step (2). MPI_Barrier is called after this preprocessing to ensure that all ranks have finished their preprocessing before step (2) begins. Note that the preprocessing is distributed across all 16 cores (via the one-core-per-structure parallelism using mpi4py), and at the end of the preprocessing the resulting information is passed back to the root rank (rank == 0).
2. After the preprocessing, the root rank spawns 4 workers, each of which uses 4 cores (i.e. all 16 cores are needed to run all 4 processes at the same time). These workers are spawned through either a relaxation or fitness evaluation module via MPI_Comm_spawn_multiple. The workers can use MPI to communicate within their 4 cores. In the master StructOpt program, only the root rank spawns the C or Fortran subprocesses, and the modules wait until the spawned processes finish before continuing execution.
Cores per Structure Use Cases¶
- ncores == len(population): one core per structure
- ncores < len(population): one core per structure, but all the structures cannot be run at once
- ncores > len(population): multiple cores per structure
Unfortunately, it is impossible to predict the number of structures that will need to be relaxed and evaluated after crossovers and mutations have been performed on the population. As a result, all of the above cases are possible (and probable) for any given simulation.
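The three cases above can be summarized in a small helper that returns how many cores each structure gets and how many sequential batches are needed (an illustrative sketch, not StructOpt's scheduler):

```python
def plan_evaluation(ncores, nstructures):
    """Sketch of the three cases above.  Returns (cores_per_structure,
    n_batches): with fewer cores than structures, one-core-per-structure
    evaluations must run in several sequential batches."""
    if ncores >= nstructures:
        return ncores // nstructures, 1
    # ncores < nstructures: one core each, ceil(nstructures/ncores) batches
    return 1, -(-nstructures // ncores)
```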
mpi4py: One structure per core¶
Main idea: One structure per core, or multiple structures per core that execute serially in a for-loop. The module must be written in python (or callable from python like LAMMPS through ASE) and implemented directly into StructOpt.
mpi4py allows MPI commands to be run within python.
Installation¶
mpi4py needs to be installed from source against OpenMPI 1.10.2 because (at the time of developing this package) newer versions of OpenMPI had bugs in their implementation of MPI_Comm_spawn_multiple
. Follow the instructions here under “3.3: Using distutils”. In short:
# Setup modules so that `mpi/intel/openmpi` is loaded and `mpirun` finds that executable
wget https://bitbucket.org/mpi4py/mpi4py/downloads/mpi4py-X.Y.tar.gz
tar -zxf mpi4py-X.Y.tar.gz
cd mpi4py-X.Y
python setup.py build
python setup.py install --user
Depending on the installation and hardware, the following parameter may need to be added when running a StructOpt simulation: -mca btl tcp,sm,self forces the ethernet interfaces to use TCP rather than InfiniBand.
MPMD: Multiple cores per structure¶
Multiple program, multiple data (MPMD) is a form of MPI parallelization in which multiple MPI communicators are used simultaneously to run multiple MPI programs at the same time. MPMD can be used within mpirun by separating each command with colons. Each command is preceded by the -n option, which specifies the number of cores to be used for that executable. MPMD can also be used from another MPI master process which calls MPI_Comm_spawn_multiple. This is how StructOpt implements its advanced parallelization techniques to integrate MPI relaxation and fitness programs into its framework. The executable needs to support MPMD by disconnecting from a parent process if one exists (see here and here for an example parent/child implementation).
JobManager¶
Introduction¶
The purpose of the JobManager module is to provide a Python wrapper for submitting and tracking jobs in a queue environment.
Configuration¶
The JobManager is initially built for a PBS queue environment, so many of the commands will have to be modified for use in a different queue environment. These customizations will likely take place in the following files:
- The submit and write_submit functions in the structopt/utilities/job_manager.py file will likely need to be updated to reflect your specific queue environment.
- The dictionaries held in structopt/utilities/rc.py are a first attempt to store settings specific to the queue environment. Many queue-specific variables are drawn from here.
Submitting jobs¶
Single job¶
The script below is an example of submitting a single job to a queue using the JobManager. The optimization run is a short run of an Au55 nanoparticle using only LAMMPS. A large part of the script defines the input that goes into the JobManager class. These inputs are described below.
- calcdir: A string that tells where the calculation is run. Note that the calculation itself is run within the calcdir/logs{time} directory, which is created when the job starts to run on the queue. Unless it is an absolute path, the calcdir directory is always given with respect to the directory that the job script is run from.
- optimizer: A string naming the optimizer file used for the calculation. These files can be found in the structopt/optimizers folder. Upon running, a copy of this script is placed inside the calcdir directory and accessed from there.
- structopt_parameters: A dictionary object that should mirror the input file you are trying to submit.
- submit_parameters: A dictionary holding the submit parameters. These will be specific to the queue system in use. In this example, we specify the submission system, queue, number of nodes, number of cores, and walltime.
from structopt.utilities.job_manager import JobManager
from structopt.utilities.exceptions import Running, Submitted, Queued
calcdir = 'job_manager_examples/Au55-example'
structopt_parameters = {
"seed": 0,
"structure_type": "cluster",
...
}
submit_parameters = {'system': 'PBS',
'queue': 'morgan2',
'nodes': 1,
'cores': 12,
'walltime': 12}
optimizer = 'genetic.py'
job = JobManager(calcdir, optimizer, structopt_parameters, submit_parameters)
job.optimize()
Upon running this script, the user should get back an exception called structopt.utilities.exceptions.Submitted along with the jobid. This is normal behavior and communicates that the job has been successfully submitted.
Multiple jobs¶
One advantage of the job manager is that it allows one to submit multiple jobs to the queue. This is often useful for tuning the optimizer against different inputs. The script below is an example of submitting the same job with different seeds.
In the previous script, successfully submitting a single job with the JobManager.optimize method resulted in an exception. We can catch this exception with a try/except statement. This is shown in the script below, which prints the jobid to the user upon a successful submission.
from structopt.utilities.job_manager import JobManager
from structopt.utilities.exceptions import Running, Submitted, Queued
structopt_parameters = {
"seed": 0,
"structure_type": "cluster",
...
}
submit_parameters = {'system': 'PBS',
'queue': 'morgan2',
'nodes': 1,
'cores': 12,
'walltime': 12}
optimizer = 'genetic.py'
seeds = [0, 1, 2, 3, 4]
for seed in seeds:
structopt_parameters['seed'] = seed
calcdir = 'job_manager_examples/Au55-seed-{}'.format(seed)
job = JobManager(calcdir, optimizer, structopt_parameters, submit_parameters)
try:
job.optimize()
except Submitted:
print(calcdir, job.get_jobid(), 'submitted')
job_manager_examples/Au55-seed-0 936454.bardeen.msae.wisc.edu submitted
job_manager_examples/Au55-seed-1 936455.bardeen.msae.wisc.edu submitted
job_manager_examples/Au55-seed-2 936456.bardeen.msae.wisc.edu submitted
job_manager_examples/Au55-seed-3 936457.bardeen.msae.wisc.edu submitted
job_manager_examples/Au55-seed-4 936458.bardeen.msae.wisc.edu submitted
Tracking jobs¶
In the previous section, we covered how to submit a new job from an empty directory: first initialize an instance of the structopt.utilities.job_manager.JobManager class with a calculation directory and some input files, then submit the job with the JobManager.optimize method. The JobManager.optimize method knows what to do because, upon initialization, it detected an empty directory. If the directory is not empty and contains a StructOpt job, the JobManager also knows what to do with it when optimize is run again. This is all done with exceptions.
The four primary exceptions that can be raised upon executing the optimize
method are listed below along with their explanations.
Submitted: This exception is raised if a job is submitted from the directory. This happens when JobManager.optimize is called in an empty directory, or when JobManager.optimize is called with the kwarg restart=True in a directory where a job is not queued or running.
Queued: The job is queued and has not started running. There should be no output files to be analyzed.
Running: The job is running, and its output files should be continuously updated. These output files can be used for analysis before the job has finished running.
UnknownState: This is raised if the calcdir is not empty and StructOpt does not detect it as a StructOpt run. A StructOpt run is detected when a structopt.in.json file is found in the calcdir.
Note that if no exception is raised, it means the job is done and is ready to be analyzed. JobManager.optimize
does nothing in this case.
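The done case can be made explicit with Python's try/except/else construct, since the else branch runs only when no exception is raised. The sketch below shows the control flow only: DoneJob is a hypothetical stand-in for a JobManager whose job has finished, and the exception classes are redefined locally so the snippet is self-contained.

```python
# Control-flow sketch only; DoneJob stands in for a real JobManager
# whose optimize() raises nothing because the job has finished.
class Submitted(Exception): pass
class Queued(Exception): pass
class Running(Exception): pass

class DoneJob:
    def optimize(self):
        return None  # finished job: no exception is raised

job = DoneJob()
try:
    job.optimize()
except (Submitted, Queued):
    status = 'waiting'
except Running:
    status = 'running'
else:
    status = 'done'  # no exception: output files are ready to analyze

print(status)  # prints "done"
```

In a real script, the else branch is where post-run analysis of the output files would go.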
One way of using these exceptions is shown below. If the job has just been submitted or is queued, we want the script to stop and not resubmit the job. If it is running, additional commands can be used to track the progress of the job.
from structopt.utilities.job_manager import JobManager
from structopt.utilities.exceptions import Running, Submitted, Queued

calcdir = 'job_manager_examples/Au55-example'

structopt_parameters = {
    "seed": 0,
    "structure_type": "cluster",
    ...
}

submit_parameters = {'system': 'PBS',
                     'queue': 'morgan2',
                     'nodes': 1,
                     'cores': 12,
                     'walltime': 12}

optimizer = 'genetic.py'

job = JobManager(calcdir, optimizer, structopt_parameters, submit_parameters)
try:
    job.optimize()
except (Submitted, Queued):
    print(calcdir, job.get_jobid(), 'submitted or queued')
except Running:
    pass
job_manager_examples/Au55-example 936453.bardeen.msae.wisc.edu submitted or queued
Restarting jobs¶
Sometimes jobs need to be restarted or continued from the last generation. The JobManager
does this by submitting a new job from the same calcdir
folder the previous job was run in. Because calculations take place in unique log{time}
directories, the restarted job will run in a new log{time}
directory. Furthermore, the JobManager
modifies the structopt.in.json
file so that the initial population of the new job consists of the XYZ files from the last generation of the previous run. The code below is an example of restarting the first run of this example. The only difference between this code and the one presented in the previous section is that a restart=True
kwarg has been added to the JobManager.optimize
call.
from structopt.utilities.job_manager import JobManager
from structopt.utilities.exceptions import Running, Submitted, Queued

calcdir = 'job_manager_examples/Au55-example'

structopt_parameters = {
    "seed": 0,
    "structure_type": "aperiodic",
    ...
}

submit_parameters = {'system': 'PBS',
                     'queue': 'morgan2',
                     'nodes': 1,
                     'cores': 12,
                     'walltime': 12}

optimizer = 'genetic.py'

job = JobManager(calcdir, optimizer, structopt_parameters, submit_parameters)
job.optimize(restart=True)
Relaxation and Fitness Modules¶
LAMMPS¶
Installation¶
Follow the standard installation instructions.
After installation, create an environment variable called LAMMPS_COMMAND
that points to the serial LAMMPS executable.
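Because StructOpt reads this variable from the process environment, a quick Python check can confirm it is set before launching a run. The path below is hypothetical; substitute the location of your own serial build.

```python
import os

# Hypothetical default path; replace with the location of your serial build.
os.environ.setdefault("LAMMPS_COMMAND", "/opt/lammps/src/lmp_serial")

print("LAMMPS executable:", os.environ["LAMMPS_COMMAND"])
```

Setting the variable in your shell profile (so it persists across sessions) is the more common approach; the snippet above only affects the current process.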
VASP¶
VASP currently cannot be run within StructOpt. We have done quite a bit of work to get it running, but it isn’t working yet. If you’d like to work on this, please post an issue on github.
FEMSIM¶
Installation¶
Fork and clone the repository from github.
Using OpenMPI 1.10.2 compilers, follow the instructions to compile femsim.
Create an environment variable called FEMSIM_COMMAND
pointing to the newly created femsim
executable.
STEM¶
The STEM software is included in the StructOpt repository.
References: http://pubs.acs.org/doi/abs/10.1021/acsnano.5b05722
Creating Your Own Module¶
Any forward simulation that takes an atomic model as input and outputs a “fitness” value that can be interpreted as a measure of “goodness” of a structure can be integrated into StructOpt. Contact the developers by opening an issue on github.
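As a sketch of what such a module might look like, the toy class below maps a list of atomic positions to a scalar fitness. The class name, method name, and metric are all illustrative; they are not part of StructOpt's actual module API.

```python
import math

# Illustrative sketch only -- not StructOpt's actual module API.
# A fitness module is anything that maps an atomic model to a scalar
# "fitness", where a lower value here means a better structure.
class MeanBondLengthFitness:
    """Toy fitness: penalize deviation of the mean pairwise distance
    from a target bond length (a stand-in for a real forward simulation)."""

    def __init__(self, target=2.88):
        self.target = target  # e.g. Au nearest-neighbor distance in angstroms

    def fitness(self, positions):
        # positions: list of (x, y, z) tuples, one per atom
        dists = []
        for i in range(len(positions)):
            for j in range(i + 1, len(positions)):
                dists.append(math.dist(positions[i], positions[j]))
        mean = sum(dists) / len(dists)
        return abs(mean - self.target)

module = MeanBondLengthFitness(target=1.0)
# Two atoms exactly 1.0 apart -> perfect fitness of 0.0
print(module.fitness([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))
```

Any real module would replace the toy metric with its own forward simulation (e.g. a computed S(q) or energy), but the contract is the same: model in, scalar out.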
Why Python?¶
Python has been widely accepted by the scientific community. From invaluable scientific software packages such as numpy, scipy, mpi4py, dask, and pandas to thousands of specialized packages, the scientific support available through Python is enormous.
StructOpt is meant to solve new problems rather than to be a better tool for solving well-understood problems. Machine learning techniques are being developed for optimization problems at an extremely fast rate, which requires research efforts to evolve equally quickly. As a result, many users of StructOpt will be exploring new scientific territory and will be creating and iterating on their own tools. Python is a forerunner for such development due to its ability to scale from one-off scripts to large packages and applications.
Via Jupyter notebooks, Python code is on its way to becoming readable for the general community. This, combined with the drive toward more accessible and better documented scientific code, may provide a powerful combination to encourage scientific reproducibility and archival. To this end, StructOpt’s data explorer is meant to ease the process of analyzing and displaying useful information.
Troubleshooting¶
For now, please see any issues on github.
StructOpt Package¶
The Optimizer¶
structopt.Optimizer¶
structopt.common.population.Population¶
structopt.common.population.crossovers¶
structopt.common.population.fitnesses¶
structopt.common.population.relaxations¶
structopt.common.population.mutations¶
structopt.common.population.predators¶
structopt.common.population.selections¶
structopt.common.individual.Individual¶
structopt.common.individual.mutations¶
structopt.common.individual.fitnesses¶
structopt.common.individual.relaxations¶
structopt.common.individual.generators¶
structopt.common.individual.fingerprinters¶
Contributing¶
Bug fixes and error reports are always welcome. We accept PRs and will try to fix issues that have detailed descriptions and are reproducible.
If you have a forward simulation module that you wish to contribute, please make an issue and the correct people will get email notifications so we can respond.
License Agreement¶
StructOpt is distributed under the MIT license, reproduced below:
Copyright (c) 2016 University of Wisconsin-Madison Computational Materials Group
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.