QCFractal¶
A platform to compute, store, organize, and share large-scale quantum chemistry data.
QCFractal emphasizes the following virtues:
Organize: Large sets of computations are organized into Collections for easy reference and manipulation.
Reproducibility: All steps of commonly used pipelines are elucidated in the input without additional human intervention.
History: Organize all computations ever executed in a research group in an easily indexable manner, never lose data again!
Accessibility: Easily share quantum chemistry data with colleagues or the community through accessibility settings.
Elasticity: Scale from a single laptop to a dozen simultaneous supercomputers.
Transferability: Run many different quantum chemistry, semiempirical, or force field programs with a single unified interface.
QCFractal within the QCArchive stack¶
Within the QCArchive stack, QCFractal is responsible for:
Accepting requests for calculations, submitted from a client such as QCPortal.
Running these calculations on disparate compute resources through Quantum Chemistry programs and procedures supported by QCEngine.
Storing and organizing the results of these calculations in a database.
Providing access to stored results to clients such as QCPortal.
The diagram below illustrates a representative data flow:

Pipelines¶
QCFractal supports several high-throughput pipelines:
Ensembles of single point quantum chemistry computations.
Procedures such as geometry optimization, finite difference gradients and Hessians, and complete basis set extrapolations.
Complex scenarios such as the OpenFF torsion scan workflow.
User-defined procedures.
Data Sharing¶
QCFractal allows for the creation of a single data repository for a group with varying permission levels to allow control of who can access your data or submit new tasks to your compute network. Hook into the central MolSSI repository or create your own!
Scales from laptops to supercomputers¶
QCFractal is built to be elastic, scaling from a single researcher on a laptop to large multi-PI groups on dozens of different supercomputers. QCFractal provides a central location to marshal and distribute data or computation. QCFractal can switch between a variety of computational queue backends such as:
Parsl - High-performance workflows with backend support for common schedulers, supercomputers, and cloud compute.
Dask - A graph-based workflow engine for laptops and small clusters.
Fireworks - An asynchronous Mongo-based distributed queuing system.
Additional backends such as BOINC, RADICAL Pilot, Kubernetes, and Balsam are under active consideration. Contact us if you are interested in one of these use-cases.
Index¶
Getting Started
Install QCFractal¶
You can install QCFractal with conda
(recommended) or with pip
(with some caveats).
The commands below install QCFractal and its required dependencies, but not any of the quantum chemistry codes or the software needed to run Queue Managers. This avoids pulling in every package that can interface with QCFractal; instead, users install only the software they actually need.
Conda¶
You can install QCFractal using conda:
>>> conda install qcfractal -c conda-forge
This installs QCFractal and its dependencies. The QCFractal package is maintained on the conda-forge channel.
Conda Pre-Created Environments¶
QCFractal can also be installed through pre-configured environments you can pull through our Conda Channel:
>>> conda env create qcarchive/{environment name}
>>> conda activate {environment name}
The environments are created from YAML files hosted on the Anaconda Cloud and must be activated before use. You can find all of the environments here.
If you want to use a different name than the environment file, you can add a -n {custom name} flag to the conda env command.
The environments must be installed as new environments and cannot be installed into existing ones.
These pre-built environments include additional programs beyond QCFractal itself and are intended for production use or practical experimentation. For instance, the qcf-manager-openff environment also installs a couple of quantum chemistry programs, a distributed compute Queue Adapter, and a service which QCFractal can run. This environment can be deployed for immediate use on a remote compute site (e.g. a cluster) and connected to a QCFractal instance to consume compute tasks.
Pip¶
Warning
Installing QCFractal from PyPI/pip requires an existing PostgreSQL installation on your computer, whether through a native install on your device (e.g. managed clusters), a direct installer, a yum install, a conda install, or otherwise; it must be installed first or the Psycopg2 package will complain about the missing pg_config. Manual installation of PostgreSQL is beyond the scope of these instructions, so we recommend either using a Conda install of QCFractal or contacting your systems administrator.
If you have PostgreSQL installed already, you can also install QCFractal using pip:
>>> pip install qcfractal
Test the Installation¶
Note
There are several optional packages QCFractal can interface with for additional features such as visualization, Queue Adapters, and services. These are not installed by default, so you can expect many of the tests to be marked with skip or s.
You can test that Fractal is installed correctly by first installing pytest.
From conda:
>>> conda install pytest -c conda-forge
From pip:
>>> pip install pytest
Then, run the following command:
>>> pytest --pyargs qcfractal
QCFractal ships with a small testing plugin which should be automatically detected and gives you access to the --runslow and --runexamples pytest CLI flags. The --runslow flag tells the testing suite to run any tests the developers think are a bit more time consuming than the others. Without this flag, you will see many tests (such as those for the CLI) skipped.
Example¶
This tutorial will go over general QCFractal usage to give a feel for the ecosystem. In this tutorial, we employ Snowflake, a simple QCFractal stack which runs on a local machine for demonstration and exploration purposes.
Installation¶
To begin this quickstart tutorial, first install the QCArchive Snowflake environment from conda:
conda env create qcarchive/qcfractal-snowflake -n snowflake
conda activate snowflake
If you have a pre-existing environment with qcfractal, ensure that rdkit and geometric are installed from the conda-forge channel, and psi4 and dftd3 from the psi4 channel.
Importing QCFractal¶
First let us import two items from the ecosystem:
FractalSnowflakeHandler - This is a FractalServer that is temporary and is used for trying out new things.
qcfractal.interface is the QCPortal module, but when using QCFractal it is best to import it locally. Typically we alias qcportal as ptl. We will do the same for qcfractal.interface so that the code can be used anywhere.
[1]:
from qcfractal import FractalSnowflakeHandler
import qcfractal.interface as ptl
We can now build a temporary server which acts just like a normal server, but we have a bit more direct control of it.
Warning! All data is lost when this notebook shuts down! This is for demonstration purposes only! For information about how to setup a permanent QCFractal server, see the Setup Quickstart Guide.
[2]:
server = FractalSnowflakeHandler()
server
[2]:
FractalSnowflakeHandler
- Server: db_4dd4a305_1692_4f29_ae5a_ac4c8bcb1002
- Address: https://localhost:60885
We can then build a typical FractalClient to automatically connect to this server using the client() helper command. Note that the server names and addresses are identical in both the server and client.
[3]:
client = server.client()
client
[3]:
FractalClient
- Server: FractalSnowFlake_db_4dd4a
- Address: https://localhost:60885/
- Username: None
Adding and Querying data¶
A server starts with no data, so let’s add some! We can do this by adding a water molecule at a poor geometry from XYZ coordinates. Note that all internal QCFractal values are stored and used in atomic units, whereas the standard Molecule.from_data() assumes an input in Angstroms. We can switch this to Bohr by adding a units directive to the text string.
[4]:
mol = ptl.Molecule.from_data("""
O 0 0 0
H 0 0 2
H 0 2 0
units bohr
""")
mol
We can then measure various aspects of this molecule to determine its shape. Note that the measure command will provide a distance, angle, or dihedral depending on whether 2, 3, or 4 indices are passed in.
This molecule is quite far from optimal, so let’s run a geometry optimization!
[5]:
print(mol.measure([0, 1]))
print(mol.measure([1, 0, 2]))
2.0
90.0
Evaluating a Geometry Optimization¶
We originally installed psi4 and geometric, so we can use these programs to perform a geometry optimization. In QCFractal, we call a geometry optimization a procedure, where procedure is a generic term for a higher-level operation that runs multiple individual quantum chemistry energy, gradient, or Hessian evaluations. Other procedure examples are finite-difference computations, n-body computations, and torsiondrives.
We provide a JSON-like input to the client.add_procedure() command to specify the method, basis, and program to be used. The qc_spec field is used in all procedures to determine the underlying quantum chemistry method behind the individual procedure. In this way, we can use any program or method that returns an energy or gradient quantity to run our geometry optimization! (See also add_compute().)
[6]:
spec = {
"keywords": None,
"qc_spec": {
"driver": "gradient",
"method": "b3lyp",
"basis": "6-31g",
"program": "psi4"
},
}
# Ask the server to compute a new computation
r = client.add_procedure("optimization", "geometric", spec, [mol])
print(r)
print(r.ids)
ComputeResponse(nsubmitted=1 nexisting=0)
['1']
We can see that we submitted a single task to be evaluated and that the server had not seen this particular procedure before. The ids field returns the unique id of the procedure. Different procedures will always have a unique id, while identical procedures will always return the same id. We can submit the same procedure again to see this effect:
[7]:
r2 = client.add_procedure("optimization", "geometric", spec, [mol])
print(r2)
print(r2.ids)
ComputeResponse(nsubmitted=0 nexisting=1)
['1']
Querying Procedures¶
Once a task is submitted, it will be placed in the compute queue and evaluated. In this particular case the FractalSnowflakeHandler uses your local hardware to evaluate these jobs. We recommend avoiding large tasks!
In general, the server can handle anything from laptop-scale resources to many hundreds of thousands of concurrent cores at many physical locations. The amount of resources to connect is up to you and depends on how much compute you require.
Since we submitted a very small job, it is likely complete by now. Let us query this procedure from the server using its id like so:
[15]:
proc = client.query_procedures(id=r.ids)[0]
proc
[15]:
<OptimizationRecord(id='1' status='COMPLETE')>
This OptimizationRecord object has many different fields attached to it so that all quantities involved in the computation can be explored. For this example, let us pull the final molecule (optimized structure) and inspect the physical dimensions.
Note: if the status does not say COMPLETE, these fields will not be available. Try querying the procedure again in a few seconds to see if the task completed in the background.
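If you are following along interactively and the record is still running, a small polling loop can be used to wait for completion. This is a minimal sketch, not part of the original tutorial; it assumes the record's status field compares equal to the string "COMPLETE" once finished, as shown in the output above.
import time

proc = client.query_procedures(id=r.ids)[0]
while proc.status != "COMPLETE":
    time.sleep(2)  # wait a moment before asking the server again
    proc = client.query_procedures(id=r.ids)[0]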
[16]:
final_mol = proc.get_final_molecule()
[17]:
print(final_mol.measure([0, 1]))
print(final_mol.measure([1, 0, 2]))
final_mol
1.844111236246719
108.30246679333517
This water molecule has bond length and angle dimensions much closer to expected values. We can also plot the optimization history to see how each step in the geometry optimization affected the results. Though the chart is not too impressive for this simple molecule, it is hopefully illuminating and is available for any geometry optimization ever completed.
[18]:
proc.show_history()
Collections¶
Submitting individual procedures or single quantum chemistry tasks is not typically done, as it becomes hard to track individual tasks. To help resolve this, Collections are different ways of organizing standard computations so that many tasks can be referenced in a more human-friendly way. In this particular case, we will be exploring an intermolecular potential dataset.
To begin, we will create a new dataset and add a few intermolecular interactions to it.
[19]:
ds = ptl.collections.ReactionDataset("My IE Dataset", ds_type="ie", client=client, default_program="psi4")
We can construct a water dimer whose fragments (used in the intermolecular computation) are separated by the -- divider. A single water molecule with ghost atoms can be extracted like so:
[20]:
water_dimer = ptl.Molecule.from_data("""
O 0.000000 0.000000 0.000000
H 0.758602 0.000000 0.504284
H 0.260455 0.000000 -0.872893
--
O 3.000000 0.500000 0.000000
H 3.758602 0.500000 0.504284
H 3.260455 0.500000 -0.872893
""")
water_dimer.get_fragment(0, 1)
Many molecular entries can be added to this dataset, where each entry is an intermolecular complex given a unique name. In addition, the add_ie_rxn method can automatically fragment molecules.
[21]:
ds.add_ie_rxn("water dimer", water_dimer)
ds.add_ie_rxn("helium dimer", """
He 0 0 -3
--
He 0 0 3
""")
[21]:
ReactionEntry(attributes={}, reaction_results={'default': {}}, name='helium dimer', stoichiometry={'default1': {'b3855c64e9f61158f5e449e2df7b79bf1fa599d7': 2.0}, 'cp1': {'e925ae0ef797c65c359e9c8029f0342df495d116': 2.0}, 'default': {'a0d757c65af76f309b2685f31f63b3da781d0ba7': 1.0}, 'cp': {'a0d757c65af76f309b2685f31f63b3da781d0ba7': 1.0}}, extras={})
Once the Collection is created it can be saved to the server so that it can always be retrieved at a future date.
[22]:
ds.save()
[22]:
'1'
The client can list all Collections currently on the server and retrieve collections to be manipulated:
[23]:
client.list_collections()
[23]:
collection | name | tagline
---|---|---
ReactionDataset | My IE Dataset | None
[24]:
ds = client.get_collection("ReactionDataset", "My IE Dataset")
ds
[24]:
<ReactionDataset(name=`My IE Dataset`, id='1', client='https://localhost:60885/') >
Computing with collections¶
Computational methods can be applied to all of the reactions in the dataset with just a few simple lines:
[25]:
ds.compute("B3LYP-D3", "def2-SVP")
[25]:
<ComputeResponse(nsubmitted=10 nexisting=0)>
By default this collection evaluates the non-counterpoise-corrected interaction energy, which typically requires three computations per entry (the complex and each monomer). In this case we compute the B3LYP and -D3 additive correction separately, nominally 12 total computations. However, the collection is smart enough to understand that the two helium monomers are identical and do not need to be computed twice, reducing the total number of computations to 10 as shown here. We can continue to compute additional methods. Again, this is being evaluated on your computer! Be careful of the compute requirements.
[26]:
ds.compute("PBE-D3", "def2-SVP")
[26]:
<ComputeResponse(nsubmitted=10 nexisting=0)>
A list of all methods that have been computed for this dataset can also be shown:
[28]:
ds.list_values()
[28]:
native | driver | program | method | basis | keywords | stoichiometry | name
---|---|---|---|---|---|---|---
True | energy | psi4 | b3lyp | def2-svp | None | default | B3LYP/def2-svp
True | energy | psi4 | b3lyp-d3 | def2-svp | None | default | B3LYP-D3/def2-svp
True | energy | psi4 | pbe | def2-svp | None | default | PBE/def2-svp
True | energy | psi4 | pbe-d3 | def2-svp | None | default | PBE-D3/def2-svp
The above only shows what has been computed and does not pull this data from the server to your computer. To do so, the get_values command can be used:
[32]:
print(f"DataFrame units: {ds.units}")
ds.get_values()
DataFrame units: kcal / mol
[32]:
 | PBE/def2-svp | B3LYP/def2-svp | PBE-D3/def2-svp | B3LYP-D3/def2-svp
---|---|---|---|---
water dimer | -5.11552 | -4.75187 | -5.63188 | -5.57767
helium dimer | 0.000684115 | 0.000674748 | 0.000207208 | 0.00017274
You can also visualize results and more!
[33]:
ds.visualize(["B3LYP-D3", "PBE-D3"], "def2-SVP", bench="B3LYP/def2-svp", kind="violin")
This is only the very beginning of what you can do with QCFractal! Explore the documentation to learn more capabilities. In particular, see the next section for a quickstart guide on how to set up QCFractal in production.
If you like the project, consider starring us on GitHub. If you have any questions, join our Slack channel.
Setup Overview¶
QCFractal comprises two components:
The Server (qcfractal-server), which accepts compute and data queries and maintains a database of tasks and results. The Server should be run continuously on a persistent machine.
One or more Managers (qcfractal-manager). The Managers pull work from the Server, use attached compute resources to complete the work, and report results back to the server. Managers may be turned off and on at any time. Managers connect to compute resources through Adapters.
In the Quickstart Tutorial, the above components were combined within a Python environment using FractalSnowflake.
In general, the Server and Manager(s) are run separately in different processes, often on different machines.
For detailed information about the relationship between Server and Manager, see Fractal Queue Managers.
Common Use Cases¶
The table below lists some common use cases for QCFractal:
Use case | Server | Manager | Recommended manager
---|---|---|---
Demonstration / exploration | Snowflake | Snowflake | Snowflake
Single workstation | Local | Local | Pool
Private cluster | Head node | Head node | Parsl
Shared cluster(s) or supercomputer | Personal server, head node (if permitted) | Head node | Parsl
Multiple clusters | Personal server | Head node of each cluster | Parsl
Cloud compute | Personal server or cloud instance | Docker container | Pool
QCFractal is highly adaptable and is not limited to the above use cases.
For example, it is possible to mix local, cluster, supercomputer, and cloud Managers simultaneously.
In addition, a cloud instance may provide a good option for running qcfractal-server when a persistent web-exposed server is not otherwise available.
Quickstart Setups¶
This section presents quickstart setup guides for the above common use cases. The guides assume that QCFractal has been installed (see Install QCFractal). General guides are also available:
Single Workstation¶
This quickstart guide addresses QCFractal setup on a single computer which will be used for the Server, Manager, user client, and compute. On the workstation, initialize the Server:
qcfractal-server init
Next, start the Server and ProcessPoolExecutor Manager:
nohup qcfractal-server start --local-manager 1 &
The second command starts qcfractal-server in the background. It also starts one Worker, which will pull tasks from the Server and run them.
Test that everything is set up by running a Hartree-Fock calculation on a single hydrogen molecule, as in the Example (note this requires psi4):
python
>>> import qcfractal.interface as ptl
# Note that server TLS verification is turned off (verify=False) since all components are run locally.
>>> client = ptl.FractalClient(address="localhost:7777", verify=False)
>>> mol = ptl.Molecule(symbols=["H", "H"], geometry=[0, 0, 0, 0, 5, 0])
>>> mol_id = client.add_molecules([mol])[0]
>>> r = client.add_compute("psi4", "HF", "STO-3G", "energy", None, [mol_id])
# Wait a minute for the job to complete
>>> proc = client.query_procedures(id=r.ids)[0]
>>> print(proc)
<ResultRecord(id='0' status='COMPLETE')>
>>> print(proc.properties.scf_total_energy)
-0.6865598095254312
Private Cluster¶
This quickstart guide addresses QCFractal setup on a private cluster comprising a head node and compute nodes, with a Scheduler such as SLURM, PBS, or Torque.
This guide requires Parsl, which may be installed with pip or conda.
Begin by initializing the Server on the cluster head node:
qcfractal-server init
Next, start the Server in the background:
nohup qcfractal-server start &
The Manager must be configured before use. Create a configuration file (e.g. in ~/.qca/qcfractal/my_manager.yaml) based on the following template:
common:
adapter: parsl
tasks_per_worker: 1
cores_per_worker: 6
memory_per_worker: 64
max_workers: 5
scratch_directory: "$TMPDIR"
cluster:
node_exclusivity: True
scheduler: slurm
parsl:
provider:
partition: CLUSTER
cmd_timeout: 30
You may need to modify these values to match the particulars of your cluster. In particular:
The scheduler and partition options should be set to match the details of your Scheduler (e.g. SLURM, PBS, Torque).
Options related to Workers should be set appropriately for the compute nodes on your cluster. Note that Parsl requires that full nodes be allocated to each Worker (i.e. node_exclusivity: True).
For more information on Manager configuration, see Fractal Queue Managers and Queue Manager Example YAML Files.
Finally, start the Manager in the background on the cluster head node:
nohup qcfractal-manager --config-file <path to config YAML> --verify=False &
Note that TLS certificate verification is disabled (--verify=False) because the Manager and Server are both run on the head node.
Test that everything is set up by running a Hartree-Fock calculation on a single hydrogen molecule, as in the Example (note this requires psi4):
python
>>> import qcfractal.interface as ptl
# Note that server TLS verification is turned off (verify=False) since all components are run locally.
>>> client = ptl.FractalClient(address="localhost:7777", verify=False)
>>> mol = ptl.Molecule(symbols=["H", "H"], geometry=[0, 0, 0, 0, 5, 0])
>>> mol_id = client.add_molecules([mol])[0]
>>> r = client.add_compute("psi4", "HF", "STO-3G", "energy", None, [mol_id])
# Wait a minute for the job to complete
>>> proc = client.query_procedures(id=r.ids)[0]
>>> print(proc)
<ResultRecord(id='0' status='COMPLETE')>
>>> print(proc.properties.scf_total_energy)
-0.6865598095254312
Cloud Compute¶
This quickstart guide addresses QCFractal setup using cloud resources for computation. The Server should be set up on a persistent server for which you have permission to expose ports. For example, this may be a dedicated webserver, the head node of a private cluster, or a cloud instance. The Manager will be set up on a Kubernetes cluster as a Deployment.
Begin by initializing the Server on your persistent server:
qcfractal-server init
The QCFractal server receives connections from Managers and clients on TCP port 7777.
You may optionally specify the --port option to choose a custom port.
You may need to configure your firewall to allow access to this port.
Because the Server will be exposed to the internet, security should be enabled to control access.
Enable security by changing the fractal.security option in the YAML configuration file (default: ~/.qca/qcfractal/qcfractal_config.yaml) to local:
- security: null
+ security: local
Start the Server:
nohup qcfractal-server start &
Note
You may optionally provide a TLS certificate to enable host verification for the Server using the --tls-cert and --tls-key options. If a TLS certificate is not provided, communications with the server will still be encrypted, but host verification will be unavailable (and Managers and clients will need to specify verify=False).
Next, add users for admin, the Manager, and a user (you may choose whatever usernames you like):
qcfractal-server user add admin --permissions admin
qcfractal-server user add manager --permissions queue
qcfractal-server user add user --permissions read write compute
Passwords will be automatically generated and printed. You may instead specify a password with the --password option. See Fractal Server User for more information.
The Manager will be set up on a Kubernetes cluster as a Deployment, running Docker images which each contain QCEngine, QCFractal, and relevant programs. In this guide, we use the molssi/qcarchive_worker_openff Docker image, which bundles QCFractal, QCEngine, and several quantum chemistry programs (see the Dockerfile below).
Note
You may wish to set up a custom Docker image for your specific use case. The Dockerfile corresponding to the molssi/qcarchive_worker_openff image is included below as an example.
FROM continuumio/miniconda3
RUN conda install -c psi4/label/dev -c conda-forge psi4 dftd3 mp2d qcengine qcfractal rdkit geometric
RUN groupadd -g 999 qcfractal && \
useradd -m -r -u 999 -g qcfractal qcfractal
USER qcfractal
ENV PATH /opt/local/conda/envs/base/bin/:$PATH
ENTRYPOINT qcfractal-manager --config-file /etc/qcfractal-manager/manager.yaml
Create a manager configuration file (e.g. manager.yaml) following the template below.
common:
adapter: pool
tasks_per_worker: 1
cores_per_worker: 4 # CHANGEME number of cores/worker
memory_per_worker: 16 # CHANGEME memory/worker in Gb
max_workers: 1
scratch_directory: "$TMPDIR"
server:
fractal_uri: api.qcarchive.molssi.org:443 # CHANGEME URI of your server goes here
username: manager
password: foo # CHANGEME manager password goes here
verify: True # False if TLS was skipped earlier
manager:
manager_name: MyManager # CHANGEME name your manager
queue_tag: null
log_file_prefix: null
update_frequency: 30
test: False
Add the manager configuration as a secret in Kubernetes:
kubectl create secret generic manager-config-yaml --from-file=manager.yaml
This allows us to pass the manager configuration into the Docker container securely.
Next, create a Kubernetes deployment configuration file (e.g. deployment.yaml) following the template below. The cpu and memory fields of the deployment configuration should match the cores_per_worker and memory_per_worker fields of the manager configuration. In this setup, replicas determines the number of workers; the max_workers and tasks_per_worker fields in the manager configuration should be set to 1.
apiVersion: apps/v1
kind: Deployment
metadata:
name: qcfractal-manager
labels:
k8s-app: qcfractal-manager
spec:
replicas: 4 # CHANGEME: number of images here
selector:
matchLabels:
k8s-app: qcfractal-manager
template:
metadata:
labels:
k8s-app: qcfractal-manager
spec:
containers:
- image: molssi/qcarchive_worker_openff # you may wish to specify your own Docker image here
name: qcfractal-manager-pod
resources:
limits:
cpu: 4 # CHANGEME number of cores/worker
memory: 16Gi # CHANGEME memory/worker
volumeMounts:
- name: manager-config-secret
mountPath: "/etc/qcfractal-manager"
readOnly: true
volumes:
- name: manager-config-secret
secret:
secretName: manager-config-yaml
Start the deployment:
kubectl apply -f deployment.yaml
Note
You can view the status of your deployment with:
kubectl get deployments
You can view the status of individual “Pods” (Docker containers) with:
kubectl get pods --show-labels
To get the output of individual Managers:
kubectl logs <pod name>
To get Kubernetes metadata and status information about a Pod:
kubectl describe pod <pod name>
See the Kubernetes Deployment documentation for more information.
Test that everything is set up by running a Hartree-Fock calculation on a single hydrogen molecule, as in the Example (note this requires psi4 to be installed on at least one compute resource). This test may be run from any machine.
python
>>> import qcfractal.interface as ptl
# Note that server TLS verification may need to be turned off (verify=False) if a TLS certificate was not provided.
# Note that the Server URL and the password for user will need to be filled in.
>>> client = ptl.FractalClient(address="URL:Port", username="user", password="***")
>>> mol = ptl.Molecule(symbols=["H", "H"], geometry=[0, 0, 0, 0, 5, 0])
>>> mol_id = client.add_molecules([mol])[0]
>>> r = client.add_compute("psi4", "HF", "STO-3G", "energy", None, [mol_id])
# Wait a minute for the job to complete
>>> proc = client.query_procedures(id=r.ids)[0]
>>> print(proc)
<ResultRecord(id='0' status='COMPLETE')>
>>> print(proc.properties.scf_total_energy)
-0.6865598095254312
Server Setup¶
A qcfractal-server instance contains a record of all results, the task queue, and collection information, and provides an interface to all FractalClients and qcfractal-managers. All data is stored in a PostgreSQL database which is often handled transparently. A server instance should be run on hardware that is stable over long periods (not shut down often), accessible from both compute resources and users via HTTP, and has access to permanent storage. This location is often a research group's local computers, a supercomputer with appropriately allocated resources for this task, or the cloud.
Using the Command Line¶
The command line is used for qcfractal-server instances that act as long-term data storage and task distribution engines. To begin, a qcfractal-server is first initialized using the command line:
>>> qcfractal-server init
This initialization will create a ~/.qca/qcfractal folder (whose location can be altered) containing default specifications for the qcfractal-server and for the underlying PostgreSQL database. The qcfractal-server init --help CLI command will describe all parameterizations of this folder. In addition to the specification information, a new PostgreSQL database will be initialized and started in the background. The background PostgreSQL database consumes virtually no resources when not in use and should not interfere with your system.
Once a qcfractal-server instance is initialized, the server can then be run with the start command:
>>> qcfractal-server start
The QCFractal server is now ready to accept new connections.
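Once the server is running, a FractalClient can connect to it from Python using the same pattern as in the quickstart guides above. This is a minimal sketch; verify=False is an assumption that applies only when no TLS certificate has been configured:
>>> import qcfractal.interface as ptl
>>> client = ptl.FractalClient(address="localhost:7777", verify=False)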
Within a Python Script¶
Canonical workflows can be run from a Python script using the FractalSnowflake instance. With default options, a FractalSnowflake will spin up a fresh database which will be removed after shutdown.
Warning
All data inside a FractalSnowflake is temporary and will be deleted when the FractalSnowflake shuts down.
>>> from qcfractal import FractalSnowflake
>>> server = FractalSnowflake()
# Obtain a FractalClient to the server
>>> client = server.client()
A standard FractalServer cannot be started in a Python script and then interacted with, as a FractalServer uses asynchronous programming by default. FractalServer.stop will stop the script.
Within a Jupyter Notebook¶
Due to the way Jupyter Notebooks work, an interactive server needs to take a different approach than the canonical Python script. To manipulate a server in a Jupyter Notebook, a FractalSnowflakeHandler can be used in much the same way as a FractalSnowflake.
Warning
All data inside a FractalSnowflakeHandler is temporary and will be deleted when the FractalSnowflakeHandler shuts down.
>>> from qcfractal import FractalSnowflakeHandler
>>> server = FractalSnowflakeHandler()
# Obtain a FractalClient to the server
>>> client = server.client()
Full Server Config Settings¶
The full CLI and configs for the Fractal Server can be found on the following pages:
Fractal Server Config file: Fractal Server Config
qcfractal-server init: Fractal Server Init
qcfractal-server start: Fractal Server Start
qcfractal-server upgrade: Fractal Server Upgrade
Manager Setup¶
Once a QCFractal server is running, compute can be attached to it by spinning up qcfractal-manager instances. These qcfractal-manager instances connect to your FractalServer instance, add tasks to a distributed workflow manager, and push completed tasks back to the qcfractal-server instance. These qcfractal-manager instances should be run either on the machine that is executing the computations or on the head nodes of supercomputers and local clusters.
Distributed Workflow Engines¶
QCFractal supports a number of distributed workflow engines to execute computations. Each of these has strengths and weaknesses depending on the workload, task specifications, and resources that the compute will be executed on. In general, we recommend the following:
For laptops and single nodes: ProcessPoolExecutor
For local clusters: Dask or Parsl
The ProcessPoolExecutor uses built-in Python types and requires no additional libraries, while Dask requires dask, dask.distributed, and dask_jobqueue (Dask Distributed Docs, Dask Jobqueue Docs), and Parsl requires parsl (Parsl Docs).
Using the Command Line¶
Note
The CLI plus a YAML config file is the currently recommended way to start and run Fractal Queue Managers.
At the moment, only a ProcessPoolExecutor-based qcfractal-manager can be spun up purely from the command line, as other distributed workflow managers require additional setup through a YAML config file.
For the full docs for setting up a Manager, please see the Manager documentation pages.
Launching a qcfractal-manager using a ProcessPoolExecutor:
$ qcfractal-manager
[I 190301 10:45:50 managers:118] QueueManager:
[I 190301 10:45:50 managers:119] Version: v0.5.0
[I 190301 10:45:50 managers:122] Name Information:
[I 190301 10:45:50 managers:123] Cluster: unknown
[I 190301 10:45:50 managers:124] Hostname: qcfractal.local
[I 190301 10:45:50 managers:125] UUID: 0d2b7704-6ac0-4ef7-b831-00aa6afa8c1c
[I 190301 10:45:50 managers:127] Queue Adapter:
[I 190301 10:45:50 managers:128] <ExecutorAdapter client=<ProcessPoolExecutor max_workers=8>>
[I 190301 10:45:50 managers:131] QCEngine:
[I 190301 10:45:50 managers:132] Version: v0.6.1
[I 190301 10:45:50 managers:150] Connected:
[I 190301 10:45:50 managers:151] Version: v0.5.0
[I 190301 10:45:50 managers:152] Address: https://localhost:7777/
[I 190301 10:45:50 managers:153] Name: QCFractal Server
[I 190301 10:45:50 managers:154] Queue tag: None
[I 190301 10:45:50 managers:155] Username: None
[I 190301 10:45:50 managers:194] QueueManager successfully started. Starting IOLoop.
The qcfractal-server instance that the manager connects to can be controlled with:
$ qcfractal-manager --fractal-uri=api.qcfractal.molssi.org:443
Only basic settings can be configured through the CLI, and most of the options require a YAML config file to get up and going. You can check all valid YAML options in the Manager documentation pages, or you can always check the current schema from the CLI with:
$ qcfractal-manager --schema
The CLI has several options which can be examined with:
qcfractal-manager --help
Every option specified in the CLI has an equivalent option in the YAML config file (except for --help and --schema), but many YAML options are not present in the CLI due to their complex nature. Any option set in both places will defer to the CLI's setting, allowing you to overwrite some of the common YAML config options on invocation.
Using the Python API¶
qcfractal-manager instances can also be created using the Python API.
Warning
This is for advanced users, and special care needs to be taken to ensure that both the manager and the workflow tool understand the number of cores and memory available, in order to prevent over-subscription of compute.
from qcfractal.interface import FractalClient
from qcfractal import QueueManager
from dask import distributed

fractal_client = FractalClient("localhost:7777")
workflow_client = distributed.Client("tcp://10.0.1.40:8786")

ncores = 4
mem = 2

# Build a manager
manager = QueueManager(fractal_client, workflow_client, cores_per_task=ncores, memory_per_task=mem)

# Important for a calm shutdown
from qcfractal.cli.cli_utils import install_signal_handlers
install_signal_handlers(manager.loop, manager.stop)

# Start the loop; swap .start() with .test() to only test the setup
manager.start()
Testing¶
A qcfractal-manager can be tested using the --test argument; this does not require an active qcfractal-server and is very useful for checking whether the distributed workflow manager is set up correctly and the correct computational engines are found.
$ qcfractal-manager --test
[I 190301 10:55:57 managers:118] QueueManager:
[I 190301 10:55:57 managers:119] Version: v0.5.0+52.g6eab46f
[I 190301 10:55:57 managers:122] Name Information:
[I 190301 10:55:57 managers:123] Cluster: unknown
[I 190301 10:55:57 managers:124] Hostname: Daniels-MacBook-Pro.local
[I 190301 10:55:57 managers:125] UUID: 0cd257a6-c839-4743-bb33-fa55bebac1e1
[I 190301 10:55:57 managers:127] Queue Adapter:
[I 190301 10:55:57 managers:128] <ExecutorAdapter client=<ProcessPoolExecutor max_workers=8>>
[I 190301 10:55:57 managers:131] QCEngine:
[I 190301 10:55:57 managers:132] Version: v0.6.1
[I 190301 10:55:57 managers:158] QCFractal server information:
[I 190301 10:55:57 managers:159] Not connected, some actions will not be available
[I 190301 10:55:57 managers:389] Testing requested, generating tasks
[I 190301 10:55:57 managers:425] Found program rdkit, adding to testing queue.
[I 190301 10:55:57 managers:425] Found program torchani, adding to testing queue.
[I 190301 10:55:57 managers:425] Found program psi4, adding to testing queue.
[I 190301 10:55:57 base_adapter:124] Adapter: Task submitted rdkit
[I 190301 10:55:57 base_adapter:124] Adapter: Task submitted torchani
[I 190301 10:55:57 base_adapter:124] Adapter: Task submitted psi4
[I 190301 10:55:57 managers:440] Testing tasks submitting, awaiting results.
[I 190301 10:56:04 managers:444] Testing results acquired.
[I 190301 10:56:04 managers:451] All tasks retrieved successfully.
[I 190301 10:56:04 managers:456] rdkit - PASSED
[I 190301 10:56:04 managers:456] torchani - PASSED
[I 190301 10:56:04 managers:456] psi4 - PASSED
[I 190301 10:56:04 managers:465] All tasks completed successfully!
Records Documentation
The records created from adding additional compute.
Results¶
A result is a single quantum chemistry method evaluation, which might be an energy, an analytic gradient or Hessian, or a property evaluation. Collections of evaluations such as finite-difference gradients, complete basis set extrapolation, or geometry optimizations belong under the “Procedures” heading.
Indices¶
A result can be found based off a unique tuple of (driver, program, molecule_id, keywords_set, method, basis); a query sketch using these fields is shown after this list.
driver - The type of calculation being evaluated (i.e. energy, gradient, hessian, properties)
program - A lowercase string representation of the quantum chemistry program used (gamess, nwchem, psi4, etc.)
molecule_id - The ObjectId of the molecule used in the computation.
keywords_set - The key to the options set stored in the database (e.g. default -> {"e_convergence": 1.e-7, "scf_type": "df", ...})
method - A lowercase string representation of the method used in the computation (e.g. b3lyp, mp2, ccsd(t)).
basis - A lowercase string representation of the basis used in the computation (e.g. 6-31g, cc-pvdz, def2-svp)
Schema¶
All results are stored using the QCSchema so that the storage is quantum chemistry program agnostic. An example of the QCSchema input is shown below:
{
"schema_name": "qc_json_input",
"schema_version": 1,
"molecule": {
"geometry": [
0.0, 0.0, -0.1294,
0.0, -1.4941, 1.0274,
0.0, 1.4941, 1.0274
],
"symbols": ["O", "H", "H"]
},
"driver": "energy",
"model": {
"method": "MP2",
"basis": "cc-pVDZ"
},
"keywords": {},
}
This input would correspond to the following output:
{
"schema_name": "qc_json_output",
"schema_version": 1,
"molecule": {
"geometry": [
0.0, 0.0, -0.1294,
0.0, -1.4941, 1.0274,
0.0, 1.4941, 1.0274
],
"symbols": ["O", "H", "H"]
},
"driver": "energy",
"model": {
"method": "MP2",
"basis": "cc-pVDZ"
},
"keywords": {},
"provenance": {
"creator": "QM Program",
"version": "1.1",
"routine": "module.json.run_json"
},
"return_result": -76.22836742810021,
"success": true,
"properties": {
"calcinfo_nbasis": 24,
"calcinfo_nmo": 24,
"calcinfo_nalpha": 5,
"calcinfo_nbeta": 5,
"calcinfo_natom": 3,
"return_energy": -76.22836742810021,
"scf_one_electron_energy": -122.44534536383037,
"scf_two_electron_energy": 37.62246494040059,
"nuclear_repulsion_energy": 8.80146205625184,
"scf_dipole_moment": [0.0, 0.0, 2.0954],
"scf_iterations": 10,
"scf_total_energy": -76.02141836717794,
"mp2_same_spin_correlation_energy": -0.051980792916251864,
"mp2_opposite_spin_correlation_energy": -0.15496826800602342,
"mp2_singles_energy": 0.0,
"mp2_doubles_energy": -0.20694906092226972,
"mp2_total_correlation_energy": -0.20694906092226972,
"mp2_total_energy": -76.22836742810021
}
}
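For orientation, both ends of this schema map onto client calls shown earlier in these docs: the input model corresponds to an add_compute submission, and the output properties are exposed on the stored result record. A hedged sketch follows; the program choice, the molecule id, and the query filters are assumptions, since QCSchema itself is program-agnostic:
>>> r = client.add_compute("psi4", "mp2", "cc-pvdz", "energy", None, [mol_id])
# ... once the task has completed ...
>>> record = client.query_results(driver="energy", method="mp2", basis="cc-pvdz")[0]
>>> print(record.properties.mp2_total_energy)
>>> print(record.properties.scf_total_energy)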
Procedures¶
Pictorial Procedure Flowchart¶
See the flowchart showing the pseudo-calls made when the Client adds a procedure to Fractal.
Services¶
Services are unique workflows where there is an iterative component on the server. A typical service workflow looks like the following:
A client submits a new service request to the server.
A service is created on the server and placed in the service queue.
A service iteration is called that will spawn new tasks.
The service waits until all generated tasks are complete.
The service repeats steps 3 and 4 until all service iterations are complete.
The service cleans intermediate data, finalizes the data representation, and marks itself complete.
The TorsionDrive service will be used as an example to illuminate the above steps. The TorsionDrive service optimizes the geometry of a biomolecule at a number of frozen dihedral angles to provide an energy profile of the rotation of this dihedral bond.
Consider the service using a concrete example of scanning the hydrogen peroxide dihedral:
A client submits a task to scan the HOOH molecule dihedral every 90 degrees as a service.
The service is received by the server, and the first 0-degree dihedral geometry optimization Task is spawned.
The service waits until the 0-degree Task is complete, and then generates 90 and -90-degree tasks based off this 0-degree geometry.
The service waits for the two new tasks to complete and spawns 0 and 180-degree tasks based on the 90 and -90-degree geometries.
The service waits for these final tasks to complete. Then it builds its final data structure for user querying and marks itself complete.
The service technology allows the FractalServer to complete very complex workflows of arbitrary design. To see a pictorial representation of this process, please see the flowchart showing the pseudo-calls made when a service is added to the FractalServer.
TorsionDrive¶
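The exact call signature may differ between releases, but a hedged sketch of submitting the hydrogen peroxide scan described above might look like the following. The qcportal TorsionDriveInput model, the rough HOOH geometry, the atom indices, and the quantum chemistry specification are all illustrative assumptions rather than prescribed values; client is a connected FractalClient.
import qcfractal.interface as ptl

# A rough hydrogen peroxide geometry (bohr); the optimizer relaxes it at each grid point
hooh = ptl.Molecule.from_data("""
H  1.8 -1.3  0.8
O  1.3  0.1  0.0
O -1.3 -0.1  0.0
H -1.8  1.3  0.8
units bohr
""")

td_input = ptl.models.TorsionDriveInput(
    initial_molecule=[hooh],
    # Scan the H-O-O-H dihedral (atoms 0-1-2-3) every 90 degrees
    keywords={"dihedrals": [[0, 1, 2, 3]], "grid_spacing": [90]},
    optimization_spec={"program": "geometric", "keywords": {}},
    qc_spec={"driver": "gradient", "method": "b3lyp", "basis": "6-31g", "program": "psi4"},
)

r = client.add_service([td_input])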
Fractal Call Flowcharts¶
The interface between the Portal client, the Fractal server, and the distributed compute resources is not something easily conveyed by text. We have created flowchart diagrams to help explain what happens from the time Portal invokes a call to Fractal, to the time that Fractal finishes handling the request. These diagrams are simplified to not show every routine and middleware call, but instead to provide a visual aid to what is happening to help understanding.
add_compute¶
This flowchart follows the logic behind a user’s call to add a compute action to fractal and any attached distributed compute system.

add_procedure or add_service¶
When a user calls add_procedure or add_service, much of the same logic is called. The major difference is which side of the distributed compute engine handles the logic of the subsequent procedural calls: the compute side or the Fractal side.
This flowchart shows both ends and provides a different path for each call, shown by the different colored connecting arrows.

Manager Documentation
Setting up and running Fractal’s Queue Managers on your system.
Fractal Queue Managers¶
Queue Managers are the processes which interface with the Fractal Server and clusters, supercomputers, and cloud resources to execute the tasks in the Fractal Server. These managers pull compute tasks from the Server and then pass them to various distributed backends for computation, serving a variety of different needs. The architecture of the Fractal Server allows many managers to be created in multiple physical locations. Currently, Fractal supports the following:
Pool - A python ProcessPoolExecutor for computing tasks on a single machine (or node).
Dask - A graph-based workflow engine for laptops and small clusters.
Parsl - High-performance workflows.
Fireworks - An asynchronous Mongo-based distributed queuing system.
These backends allow Fractal to be incredibly elastic in utilizing computational resources, scaling from a single laptop to thousands of nodes on physically separate hardware. Our end goal is to be able to set up a manager at a physical site and allow it to scale up and down as its task queue requires, executing compute over long periods of time (months) without intervention.
The basic setup of the Queue Managers and how they interact with the Server is as follows:

In this, multiple managers talk to a central Fractal Server and deploy tasks to different compute resources based on availability, physical location, and tags.
The main goals of the Queue Manager are to reduce the level of expertise needed to start compute with Fractal and, more importantly, to require as little manual intervention as possible to maintain persistent compute. Ideally, you start the Manager in a background process and leave it be while it checks in with the Fractal Server from time to time to get tasks and pushes/pulls tasks from the distributed Adapter as need be.
The communication between each of the layers involved, and the mechanism by which they communicate is summarized in this image:

The different levels of communication are all established automatically once the user configures the manager, and this image shows how information flows from point to point.
The manager itself is a fairly lightweight process and consumes very little CPU power on its own. You should talk with your sysadmin before running this on a head node, but we have found that the Queue Manager itself consumes less than 1% CPU and virtually no RAM.
If you are interested in the more detailed workings of the Manager, please see the Detailed Manager Information page. However, the information on that page is not required to set up and run a Manager.
Queue Manager Quick Starts¶
For those who just want to get up and going, consider the following examples.
Laptop/Desktop Quick Boot¶
To get a Manager set up with defaults, running on local hardware, consuming local CPU and RAM, targeting a Fractal Server running locally, run the following:
$ qcfractal-manager
SLURM Cluster, Dask Adapter¶
To start a manager with a Dask Adapter, on a SLURM cluster, consuming 1 CPU and 8 GB of RAM, targeting a Fractal Server running on that cluster, and using the SLURM partition default, save the following YAML config file:
common:
adapter: dask
tasks_per_worker: 1
cores_per_worker: 1
memory_per_worker: 8
cluster:
scheduler: slurm
dask:
queue: default
and then run the following command:
$ qcfractal-manager --config-file="path/to/config.yaml"
replacing the config-file arg with the path to the file you saved. You will need dask and dask-jobqueue (Dask Distributed Docs, Dask Jobqueue Docs) to run this example; these packages are not required by Fractal unless you are running compute managers. If you use a different Adapter, you will need a different set of packages.
Queue Manager CLI¶
The CLI for the Fractal Queue Manager acts as an option-specific overwrite of the YAML file, and therefore its flags can be set in tandem with the YAML. However, it does not have as extensive control as the YAML file, so complex Managers (like those running Dask and Parsl) need to be set up in YAML.
In case this ever falls out of date, you can always run qcfractal-manager --help to get the most up-to-date help block.
$ qcfractal-manager --help
usage: qcfractal-manager [-h] [--config-file CONFIG_FILE] [--adapter ADAPTER]
[--tasks_per_worker TASKS_PER_WORKER]
[--cores-per-worker CORES_PER_WORKER]
[--memory-per-worker MEMORY_PER_WORKER]
[--scratch-directory SCRATCH_DIRECTORY] [-v]
[--fractal-uri FRACTAL_URI] [-u USERNAME]
[-p PASSWORD] [--verify VERIFY]
[--manager-name MANAGER_NAME] [--queue-tag QUEUE_TAG]
[--log-file-prefix LOG_FILE_PREFIX]
[--update-frequency UPDATE_FREQUENCY]
[--max-queued-tasks MAX_QUEUED_TASKS] [--test]
[--ntests NTESTS] [--schema]
A CLI for a QCFractal QueueManager with a ProcessPoolExecutor, Dask, or Parsl
backend. The Dask and Parsl backends *requires* a config file due to the
complexity of its setup. If a config file is specified, the remaining options
serve as CLI overwrites of the config.
optional arguments:
-h, --help show this help message and exit
--config-file CONFIG_FILE
Common Adapter Settings:
--adapter ADAPTER The backend adapter to use, currently only {'dask',
'parsl', 'pool'} are valid.
--tasks_per_worker TASKS_PER_WORKER
The number of simultaneous tasks for the executor to
run, resources will be divided evenly.
--cores-per-worker CORES_PER_WORKER
The number of process for each executor's Workers
--memory-per-worker MEMORY_PER_WORKER
The total amount of memory on the system in GB
--scratch-directory SCRATCH_DIRECTORY
Scratch directory location
-v, --verbose Increase verbosity of the logger.
FractalServer connection settings:
--fractal-uri FRACTAL_URI
FractalServer location to pull from
-u USERNAME, --username USERNAME
FractalServer username
-p PASSWORD, --password PASSWORD
FractalServer password
--verify VERIFY Do verify the SSL certificate, leave off (unset) for
servers with custom SSL certificates.
QueueManager settings:
--manager-name MANAGER_NAME
The name of the manager to start
--queue-tag QUEUE_TAG
The queue tag to pull from
--log-file-prefix LOG_FILE_PREFIX
The path prefix of the logfile to write to.
--update-frequency UPDATE_FREQUENCY
The frequency in seconds to check for complete tasks.
--max-queued-tasks MAX_QUEUED_TASKS
Maximum number of tasks to hold at any given time.
Generally should not be set.
Optional Settings:
--test Boot and run a short test suite to validate setup
--ntests NTESTS How many tests per found program to run, does nothing
without --test set
--schema Display the current Schema (Pydantic) for the YAML
config file and exit. This will always show the most
up-to-date schema. It will be presented in a JSON-like
format.
Terminology¶
There are a number of terms which can overlap due to the layers of abstraction and the type of software and hardware the Queue Manager interacts with. To help with that, the pages in this section will use the terminology defined below. Several pieces of software we interface with may have their own terms or the same term with different meaning, but because one goal of the Manager is to abstract those concepts away as best it can, we choose the following set. If you find something inappropriately labeled, unclear, or overloaded in any way, please raise an issue on GitHub and help us make it better!
An important note: Not all the concepts/mechanics of the Manager and Adapter are covered in the glossary by design! There are several abstraction layers and mechanics which the user should never have to interact with or even be aware of. However, if you feel something is missing, let us know!
- Adapter¶
The specific piece of software which accepts tasks from the Manager and sends them to the physical hardware. It is also the software which typically interacts with a cluster's Scheduler to allocate said hardware and start Jobs.
- Distributed Compute Engine¶
A more precise, although longer-winded, term for the Adapter.
- Job¶
The specific allocation of resources (CPU, memory, wall clock, etc.) provided by the Scheduler to the Adapter. This is identical to requesting a batch-like job on a cluster (e.g. through qsub or sbatch); however, it is more apt to think of the resources allocated in this way as "resources to be distributed to the Tasks by the Adapter". Although a user running a Manager will likely not directly interact with these, it's important to track them, as they are what your Scheduler is actually running and what your allocations will be charged by. At least (and usually only) one Worker will be deployed to a Job from the Adapter to handle incoming Tasks. Once the Worker lands, it will report back to the Adapter and all communications happen between those two objects; the Job simply runs until either the Adapter stops it or the Scheduler ends it.
- Manager¶
The Fractal Queue Manager. The term “Manager” presented by itself refers to this object.
- Scheduler¶
The software running on a cluster from which users request hardware to run computational tasks, e.g. PBS, SLURM, LSF, SGE, etc. This, by itself, does not have any concept of the Manager or even the Adapter, as both interface with it, not the other way around. Individual users' clusters may, and almost always do, have different configurations, even amongst the same governing software. Therefore, no two Schedulers should be treated the same. In many cases, the Adapter submits a Job to the Scheduler with instructions on how the Job should start a Worker once it is allocated and booted.
- Server¶
The Fractal Server that the Manager connects to. This is the source of the Tasks which are pulled from and pushed to. Only the Manager has any notion of the Server; all of the other software involved with the Manager does not.
- Tag¶
Arbitrary categorization labels that different tasks can be assigned when submitted to the Server. Managers can be configured to pull specific tags and, if so, will exclusively pull tasks with those tags. Similarly, tasks set with a given tag can only be pulled by Managers configured to do so.
- Task¶
A single unit of compute as defined by the Fractal Server (i.e. the item which comes from the Task Queue). These tasks are preserved as they pass to the distributed compute engine and are what are presented to each distributed compute engine's Workers to compute.
- Worker¶
The process executed from the Adapter on the allocated hardware inside a Job. This process receives the tasks tracked by the Adapter and is responsible for their execution. The Worker itself is responsible for consuming the resources of the Job and distributing them to handle concurrent tasks. In most cases, there will be 1 Worker per Job, but there are some uncommon instances where this isn't true. You can safely assume the 1 Worker/Job case for Fractal usage. Resources allocated for the Worker will be distributed according to the Adapter's configuration, but they are usually uniform.
Queue Manager API¶
This page documents all valid options for the YAML file input to the Queue Manager's config. This first section outlines each of the headers (top-level objects) and gives a description of each one. The final file will look like the following:
common:
option_1: value_for1
another_opt: 42
server:
option_for_server: "some string"
This is the complete set of options, auto-generated from the parser itself, so it should be accurate for the given release. If you are using a developmental build or want to see the schema yourself, you can run the qcfractal-manager --schema command and it will display the whole schema for the YAML input.
Each section below is summarized the same way, showing all the options for that YAML header in the form of the pydantic API into which the YAML is fed, with a one-to-one match of options.
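Because these settings are plain pydantic models, a YAML config can be validated locally before a Manager is ever launched. This is a small illustrative sketch, not an official workflow; the file name is a placeholder:
import yaml
from qcfractal.cli.qcfractal_manager import ManagerSettings

# Load the YAML config and validate it against the Manager's settings schema
with open("my_manager.yaml") as handle:
    settings = ManagerSettings(**yaml.safe_load(handle))

print(settings.common.adapter)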
class qcfractal.cli.qcfractal_manager.ManagerSettings(*, common: qcfractal.cli.qcfractal_manager.CommonManagerSettings = CommonManagerSettings(adapter=<AdapterEnum.pool: 'pool'>, tasks_per_worker=1, cores_per_worker=2, memory_per_worker=6.707, max_workers=1, retries=2, scratch_directory=None, verbose=False, nodes_per_job=1, nodes_per_task=1, cores_per_rank=1), server: qcfractal.cli.qcfractal_manager.FractalServerSettings = FractalServerSettings(fractal_uri='localhost:7777', username=None, password=None, verify=None), manager: qcfractal.cli.qcfractal_manager.QueueManagerSettings = QueueManagerSettings(manager_name='unlabeled', queue_tag=None, log_file_prefix=None, update_frequency=30.0, test=False, ntests=5, max_queued_tasks=None), cluster: qcfractal.cli.qcfractal_manager.ClusterSettings = ClusterSettings(node_exclusivity=False, scheduler=None, scheduler_options=[], task_startup_commands=[], walltime='06:00:00', adaptive=<AdaptiveCluster.adaptive: 'adaptive'>), dask: qcfractal.cli.qcfractal_manager.DaskQueueSettings = DaskQueueSettings(interface=None, extra=None, lsf_units=None), parsl: qcfractal.cli.qcfractal_manager.ParslQueueSettings = ParslQueueSettings(executor=ParslExecutorSettings(address=None), provider=ParslProviderSettings(partition=None, launcher=None)))[source]¶
The config file for setting up a QCFractal Manager; all sub-fields of this model are at the top level of the YAML file. No additional top-level fields are permitted, but sub-fields may have their own additions.
Not all fields are required, and many will depend on the cluster you are running on and the adapter you choose.
- Parameters
common (CommonManagerSettings, Optional)
server (FractalServerSettings, Optional)
manager (QueueManagerSettings, Optional)
cluster (ClusterSettings, Optional)
dask (DaskQueueSettings, Optional)
parsl (ParslQueueSettings, Optional)
common¶
class qcfractal.cli.qcfractal_manager.CommonManagerSettings(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, adapter: qcfractal.cli.qcfractal_manager.AdapterEnum = <AdapterEnum.pool: 'pool'>, tasks_per_worker: int = 1, cores_per_worker: qcfractal.cli.qcfractal_manager.ConstrainedIntValue = 2, memory_per_worker: qcfractal.cli.qcfractal_manager.ConstrainedFloatValue = 6.707, max_workers: qcfractal.cli.qcfractal_manager.ConstrainedIntValue = 1, retries: qcfractal.cli.qcfractal_manager.ConstrainedIntValue = 2, scratch_directory: str = None, verbose: bool = False, nodes_per_job: qcfractal.cli.qcfractal_manager.ConstrainedIntValue = 1, nodes_per_task: qcfractal.cli.qcfractal_manager.ConstrainedIntValue = 1, cores_per_rank: int = 1)[source]¶
The Common settings are the settings most users will need to adjust regularly to control the nature of task execution and the hardware on which tasks are executed. This block is often unique to each deployment, user, and manager, and will contain the most commonly updated options, even as config files are copied and reused, and even on the same platform/cluster.
- Parameters
adapter ({dask,pool,parsl}, Default: pool) – Which type of Distributed adapter to run tasks through.
tasks_per_worker (int, Default: 1) – Number of concurrent tasks to run per Worker which is executed. Total number of concurrent tasks is this value times max_workers, assuming the hardware is available. With the pool adapter, and/or if max_workers=1, tasks_per_worker is the number of concurrent tasks.
cores_per_worker (ConstrainedInt, Default: 2) – Number of cores to be consumed by the Worker and distributed over the tasks_per_worker. These cores are divided evenly, so it is recommended that the quotient of cores_per_worker/tasks_per_worker be a whole number; otherwise the core distribution is left up to the logic of the adapter. The default value is read from the number of detected cores on the system you are executing on.
In the case of node-parallel tasks, this number means the number of cores per node.
memory_per_worker (ConstrainedFloat, Default: 6.707) – Amount of memory (in GB) to be consumed and distributed over the tasks_per_worker. This memory is divided evenly, but is ultimately at the control of the adapter. Engine will only allow each of its calls to consume memory_per_worker/tasks_per_worker of memory. Total memory consumed by this manager at any one time is this value times max_workers. The default value is read from the amount of memory detected on the system you are executing on.
max_workers (ConstrainedInt, Default: 1) – The maximum number of Workers which are allowed to be run at the same time. The total number of concurrent tasks will maximize at this quantity times tasks_per_worker. The total number of Jobs started on a cluster is equal to this parameter in most cases, and it should be assumed that there is 1 Worker per Job. Any exceptions to this will be documented. In node exclusive mode this is equivalent to the maximum number of nodes which you will consume. This must be a positive, non-zero integer.
retries (ConstrainedInt, Default: 2) – Number of retries that QCEngine will attempt for RandomErrors detected when running its computations. After this many attempts (or on any other type of error), the error will be raised.
scratch_directory (str, Optional) – Scratch directory for Engine execution jobs.
verbose (bool, Default: False) – Whether to turn on verbose mode. In verbose mode, all messages from DEBUG level and up are shown; otherwise, logger defaults are used.
nodes_per_job (ConstrainedInt, Default: 1) – The number of nodes to request per job. Only used by the Parsl adapter at present
nodes_per_task (ConstrainedInt, Default: 1) – The number of nodes to use for each task. Only relevant for node-parallel executables.
cores_per_rank (int, Default: 1) – The number of cores per MPI rank for MPI-parallel applications. Only relevant for node-parallel codes and most relevant to codes with hybrid MPI+OpenMP parallelism (e.g., NWChem).
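As a worked illustration of how the Worker hardware is divided (the numbers below are assumptions chosen only for the arithmetic, not recommendations):
common:
  adapter: pool
  tasks_per_worker: 3
  cores_per_worker: 12   # each concurrent task receives 12/3 = 4 cores
  memory_per_worker: 48  # each concurrent task receives 48/3 = 16 GB
  max_workers: 2         # at most 2 * 3 = 6 tasks run at once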
server¶
class qcfractal.cli.qcfractal_manager.FractalServerSettings
(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, fractal_uri: str = 'localhost:7777', username: str = None, password: str = None, verify: bool = None)[source]¶ Settings pertaining to the Fractal Server you wish to pull tasks from and push completed tasks to. Each manager supports exactly 1 Fractal Server to be in communication with, and exactly 1 user on that Fractal Server. These can be changed, but only once the Manager is shutdown and the settings changed. Multiple Managers however can be started in parallel with each other, but must be done as separate calls to the CLI.
Caution: The password here is written in plain text, so it is up to the owner/writer of the configuration file to ensure its security.
- Parameters
fractal_uri (str, Default: localhost:7777) – Full URI to the Fractal Server you want to connect to
username (str, Optional) – Username to connect to the Fractal Server with. When not provided, a connection is attempted as a guest user, which in most default Servers will be unable to return results.
password (str, Optional) – Password to authenticate to the Fractal Server with (alongside the username)
verify (bool, Optional) – Use Server-side generated SSL certification or not.
manager¶
class qcfractal.cli.qcfractal_manager.QueueManagerSettings
(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, manager_name: str = 'unlabeled', queue_tag: Optional[Union[str, List[str]]] = None, log_file_prefix: str = None, update_frequency: qcfractal.cli.qcfractal_manager.ConstrainedFloatValue = 30, test: bool = False, ntests: qcfractal.cli.qcfractal_manager.ConstrainedIntValue = 5, max_queued_tasks: qcfractal.cli.qcfractal_manager.ConstrainedIntValue = None)[source]¶ Fractal Queue Manager settings. These are options which control the setup and execution of the Fractal Manager itself.
- Parameters
manager_name (str, Default: unlabeled) – Name of this scheduler to present to the Fractal Server. Descriptive names help the server identify the manager resource and assists with debugging.
queue_tag (Union[str, List[str]], Optional) – Only pull tasks from the Fractal Server with this tag. If not set (None/null), then pull untagged tasks, which should be the majority of tasks. This option should only be used when you want to pull very specific tasks which you know have been tagged as such on the server. If the server has no tasks with this tag, no tasks will be pulled (and no error is raised because this is intended behavior). If multiple tags are provided, tasks will be pulled (but not necessarily executed) in order of the tags.
log_file_prefix (str, Optional) – Full path to save a log file to, including the filename. If not provided, information will still be reported to terminal, but not saved. When set, logger information is sent both to this file and the terminal.
update_frequency (ConstrainedFloat, Default: 30) – Time between heartbeats/update checks between this Manager and the Fractal Server. The lower this value, the shorter the intervals. If you have an unreliable network connection, consider increasing this time as repeated, consecutive network failures will cause the Manager to shut itself down to maintain integrity between it and the Fractal Server. Units of seconds
test (bool, Default: False) – Turn on testing mode for this Manager. The Manager will not connect to any Fractal Server, and instead submits ntests worth of trial tasks per quantum chemistry program it finds. These tasks are generated locally and do not need a running Fractal Server to work. Helpful for ensuring the Manager is configured correctly and the quantum chemistry codes are operating as expected.
ntests (ConstrainedInt, Default: 5) – Number of tests to run if the test flag is set to True. Total number of tests will be this number times the number of found quantum chemistry programs. Does nothing if test is False. If set to 0, then this submits no tests, but it will still run through the setup and client initialization.
max_queued_tasks (ConstrainedInt, Optional) – Generally should not be set. Number of tasks to pull from the Fractal Server to keep locally at all times. If None, this is automatically computed as ceil(common.tasks_per_worker*common.max_workers*2.0) + 1. As tasks are completed, the local pool is filled back up to this value. These tasks will all attempt to be run concurrently, but concurrent tasks are limited by the number of cluster jobs and tasks per job. Pulling too many of these can result in under-utilized managers at other sites and less FIFO-ordered returns. As such, it is recommended not to touch this setting in general, as you will be given enough tasks to fill your maximum throughput with a buffer (assuming the queue has them).
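For example (an illustrative calculation only): a manager running with common.tasks_per_worker set to 4 and common.max_workers set to 5 would, by the default formula above, keep ceil(4 * 5 * 2.0) + 1 = 41 tasks queued locally.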
cluster¶
class qcfractal.cli.qcfractal_manager.ClusterSettings
(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, node_exclusivity: bool = False, scheduler: qcfractal.cli.qcfractal_manager.SchedulerEnum = None, scheduler_options: List[str] = [], task_startup_commands: List[str] = [], walltime: str = '06:00:00', adaptive: qcfractal.cli.qcfractal_manager.AdaptiveCluster = <AdaptiveCluster.adaptive: 'adaptive'>)[source]¶ Settings tied to the cluster you are running on. These settings are mostly tied to the nature of the cluster jobs you are submitting, separate from the nature of the compute tasks you will be running within them. As such, the options here are things like wall time (per job), which Scheduler your cluster has (like PBS or SLURM), etc. No additional options are allowed here.
- Parameters
node_exclusivity (bool, Default: False) – Run your cluster jobs in node-exclusivity mode. This option may not be available to all scheduler types and thus may not do anything. Related to this, the flags we have found for this option may not be correct for your scheduler and thus might throw an error. You can always add the correct flag/parameters to the scheduler_options parameter and leave this as False if you find it gives you problems.
scheduler ({slurm,pbs,sge,moab,lsf,cobalt}, Optional) – Option of which Scheduler/Queuing system your cluster uses. Note: not all scheduler options are available with every adapter.
scheduler_options (List[str], Default: []) – Additional options which are fed into the header files for your submitted jobs to your cluster’s Scheduler/Queuing system. The directives are automatically filled in, so if you want to set something like ‘#PBS -n something’, you would instead just do ‘-n something’. Each directive should be a separate string entry in the list. No validation is done on this with respect to valid directives so it is on the user to know what they need to set.
task_startup_commands (List[str], Default: []) – Additional commands to be run before starting the Workers and the task distribution. This can include commands needed to start things like conda environments or setting environment variables before executing the Workers. These commands are executed first before any of the distributed commands run and are added to the batch scripts as individual commands per entry, verbatim.
walltime (str, Default: 06:00:00) – Wall clock time of each cluster job started. Presented as a string in HH:MM:SS form, but your cluster may have a different structural syntax. This number should be set high, as there should be a number of Fractal tasks which are run for each submitted cluster job. Ideally, the job will start, the Worker will land, and the Worker will crunch through as many tasks as it can; meaning the job which has a Worker in it must continue existing to minimize time spent redeploying Workers.
adaptive ({static,adaptive}, Default: adaptive) – Whether or not to use adaptive scaling of Workers. If set to ‘static’, a fixed number of Workers will be started (and likely NOT restarted when the wall clock is reached). When set to ‘adaptive’ (the default), the distributed engine will try to adaptively scale the number of Workers based on tasks in the queue. This is a str instead of a bool in case more complex adaptivity options are added in the future.
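A hedged sketch of a cluster block tying these options together; the scheduler, allocation flag, and environment name are placeholders for whatever your site actually uses, and the scheduler's directive prefix (e.g. #SBATCH) is filled in for you:
cluster:
  scheduler: slurm
  scheduler_options:
    - "-A my_allocation"          # placeholder allocation flag
  task_startup_commands:
    - conda activate qcfractal    # placeholder environment name
  walltime: "24:00:00"
  adaptive: adaptive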
dask¶
class qcfractal.cli.qcfractal_manager.DaskQueueSettings
(*, interface: str = None, extra: List[str] = None, lsf_units: str = None, **kwargs)[source]¶ Settings for the Dask Cluster class. Values set here are passed directly into the Cluster objects based on the cluster.scheduler settings. Although many values are set automatically from other settings, there are some additional values such as interface and extra which are passed through to the constructor.
Valid values for this field are functions of your cluster.scheduler and no linting is done ahead of trying to pass these to Dask.
NOTE: The parameters listed here are a special exception for additional features Fractal has engineered or options which should be considered for some of the edge cases we have discovered. If you try to set a value which is derived from other options in the YAML file, an error is raised and you are told exactly which one is forbidden.
Please see the docs for the provider for more information.
- Parameters
interface (str, Optional) – Name of the network adapter to use for communication between the head node and the compute node. There are oddities of this when the head node and compute node use different ethernet adapter names, and we have not figured out exactly which combination is needed between this and the poorly documented ip keyword, which appears to be for Workers, but not the Client.
extra (List[str], Optional) – Additional flags which are fed into the Dask Worker CLI startup, can be used to overwrite pre-configured options. Do not use unless you know exactly which flags to use.
lsf_units (str, Optional) – Unit system for an LSF cluster's limits (e.g. MB, GB, TB). If not set, an attempt is made to read the units from the lsf.conf file in the default locations. This does nothing if the cluster is not LSF.
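A small sketch of a dask block; the adapter name ib0 is an assumption for a cluster whose head and compute nodes share an InfiniBand interface, and lsf_units only matters on LSF clusters:
dask:
  interface: ib0     # hypothetical shared network adapter name
  lsf_units: GB      # ignored unless cluster.scheduler is lsf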
parsl¶
class qcfractal.cli.qcfractal_manager.ParslQueueSettings
(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, executor: qcfractal.cli.qcfractal_manager.ParslExecutorSettings = ParslExecutorSettings(address=None), provider: qcfractal.cli.qcfractal_manager.ParslProviderSettings = ParslProviderSettings(partition=None, launcher=None), **values: Any)[source]¶ The Parsl-specific configurations used with the common.adapter = parsl setting. The parsl config is broken up into a top level Config class, an Executor sub-class, and a Provider sub-class of the Executor. Config -> Executor -> Provider. Each of these have their own options, and extra values fed into the ParslQueueSettings are fed to the Config level.
Both executor and provider settings are required, but they are filled in with defaults and often need no further configuration, as most values are handled by other settings in the config file.
- Parameters
executor (ParslExecutorSettings, Optional)
provider (ParslProviderSettings, Optional)
executor¶
class qcfractal.cli.qcfractal_manager.ParslExecutorSettings
(*, address: str = None, **kwargs)[source]¶ Settings for the Parsl Executor class. This serves as the primary mechanism for distributing Workers to jobs. In most cases, you will not need to set any of these options, as several options are automatically inferred from other settings. Any option set here is passed through to the HighThroughputExecutor class of Parsl.
https://parsl.readthedocs.io/en/latest/stubs/parsl.executors.HighThroughputExecutor.html
NOTE: The parameters listed here are a special exception for additional features Fractal has engineered or options which should be considered for some of the edge cases we have discovered. If you try to set a value which is derived from other options in the YAML file, an error is raised and you are told exactly which one is forbidden.
- Parameters
address (str, Optional) – This only needs to be set in conditional cases when the head node and compute nodes use a differently named ethernet adapter.
An address to connect to the main Parsl process which is reachable from the network in which Workers will be running. This can be either a hostname as returned by hostname or an IP address. Most login nodes on clusters have several network interfaces available, only some of which can be reached from the compute nodes. Some trial and error might be necessary to identify what addresses are reachable from compute nodes.
provider¶
class qcfractal.cli.qcfractal_manager.ParslProviderSettings
(*, partition: str = None, launcher: qcfractal.cli.qcfractal_manager.ParslLauncherSettings = None, **kwargs)[source]¶ Settings for the Parsl Provider class. Valid values for this field depend on your choice of cluster.scheduler and are defined in the Parsl docs for the providers with some minor exceptions. The initializer function for the Parsl settings will indicate which options are not permitted.
NOTE: The parameters listed here are a special exception for additional features Fractal has engineered or options which should be considered for some of the edge cases we have discovered. If you try to set a value which is derived from other options in the YAML file, an error is raised and you are told exactly which one is forbidden.
SLURM: https://parsl.readthedocs.io/en/latest/stubs/parsl.providers.SlurmProvider.html PBS/Torque/Moab: https://parsl.readthedocs.io/en/latest/stubs/parsl.providers.TorqueProvider.html SGE (Sun GridEngine): https://parsl.readthedocs.io/en/latest/stubs/parsl.providers.GridEngineProvider.html
- Parameters
partition (str, Optional) – The name of the cluster.scheduler partition being submitted to. Behavior, valid values, and even its validity as a set variable are a function of what type of queue scheduler your specific cluster has (e.g. this variable should NOT be present for PBS clusters). Check with your Sys. Admins and/or your cluster documentation.
launcher (ParslLauncherSettings, Optional) – The Parsl Launcher to use with your Provider. If left to None, defaults are assumed (check the Provider’s defaults); otherwise this should be a dictionary requiring the option launcher_class as a str to specify which Launcher class to load, and the remaining settings will be passed on to the Launcher’s constructor.
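A brief sketch of how the launcher dictionary described above looks in YAML; the partition name and Launcher class are assumptions, so pick whatever matches your scheduler per the Parsl provider docs:
parsl:
  provider:
    partition: normal_q              # placeholder partition; omit on schedulers without partitions
    launcher:
      launcher_class: SrunLauncher   # any Parsl Launcher class name; other keys go to its constructor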
Configuration for High-Performance Computing¶
High-performance computing (HPC) clusters are designed to complete highly-parallel tasks in a short time. Properly leveraging such clusters requires utilizing large numbers of compute nodes at the same time, which requires special configurations for the QCFractal manager. This part of the guide details several routes for configuring HPC clusters to run either large numbers of tasks that each use only a single node, or a smaller number of tasks that each use multiple nodes.
Note: This guide is currently limited to using the Parsl adapter and contains some configuration options which do not work with other adapters.
Many Nodes per Job, Single Node per Application¶
The recommended configuration for a QCFractal manager to use multi-node Jobs with tasks limited to a single node is to launch many workers for a single Job.
The Parsl adapter deploys a single "manager" per Job and uses the HPC system's
MPI task launcher to deploy the Parsl executors onto the compute nodes.
Each "executor" will run a single Python process per QCEngine worker and can run
more than one worker per node.
The manager
will run on the login or batch node (depending on the cluster’s configuration)
once the Job is started and will communicate to the workers using Parsl’s ØMQ messaging protocol.
The QCFractal QueueManager will connect to the Parsl manager for each Job.
See the example page for details on how to configure Parsl for your system.
The configuration setting common.nodes_per_job
defines the ability to make multi-node allocation
requests to a scheduler via an Adapter.
Many Nodes per Job, More than One Node per Application¶
The recommended configuration for using node-parallel tasks is to have a single QCFractal worker running on the batch node, and using that worker to launch MPI tasks on the compute nodes. The differentiating aspect of deploying multi-node tasks is that the QCFractal Worker and QCEngine Python process will run on different nodes than the quantum chemistry code.
The Parsl implementation for multi-node jobs will place a single Parsl executor and interchange on the login/batch node. The Parsl executor will launch a number of workers (as separate Python processes) equal to the number of nodes per Job divided by the number of nodes per Task. The worker will call the MPI launch system to place quantum-chemistry calculations on the compute nodes of the cluster.
See the example page for details on how to configure Parsl for your system.
Queue Manager Example YAML Files¶
The primary way to set up a Manager is to set up a YAML config file. This page provides helpful config files which mostly can be copied and used in place (filling in things like **username** and **password** as needed).
The full documentation of every option and how it can be used can be found in the Queue Manager’s API.
For these examples, the username
will always be “Foo” and the password
will always be “b4R”
(which are just placeholders and not valid). The manager_name
variable can be any string and these examples provide
some descriptive samples. The more distinct the name, the easier it is to see its status on the Server.
SLURM Cluster, Dask Adapter with additional options¶
This example is similar to the example on the start page for Managers, but with some
additional options such as connecting back to a central Fractal instance and setting more cluster-specific options.
Again, to start a manager with a Dask Adapter on a SLURM cluster, consuming 1 CPU and 8 GB of RAM, targeting
a Fractal Server running on that cluster, and using the SLURM partition default
, save the following YAML config
file:
common:
adapter: dask
tasks_per_worker: 1
cores_per_worker: 1
memory_per_worker: 8
server:
fractal_uri: "localhost:7777"
username: Foo
password: b4R
manager:
manager_name: "SlurmCluster_OneDaskTask"
cluster:
scheduler: slurm
walltime: "72:00:00"
dask:
queue: default
Multiple Tasks, 1 Cluster Job¶
This example starts a max of 1 cluster Job, but multiple tasks. The hardware will be
consumed uniformly by the Worker. With 8 cores, 20 GB of memory, and 4 tasks, the Worker will provide
2 cores and 5 GB of memory to compute each Task. We set common.max_workers
to 1 to limit the number
of Workers and Jobs which can be started. Since this is SLURM, the squeue
information
will show this user has run 1 sbatch
job which requested 8 cores and 20 GB of memory.
common:
adapter: dask
tasks_per_worker: 4
cores_per_worker: 8
memory_per_worker: 20
max_workers: 1
server:
fractal_uri: "localhost:7777"
username: Foo
password: b4R
manager:
manager_name: "SlurmCluster_MultiDask"
cluster:
scheduler: slurm
walltime: "72:00:00"
dask:
queue: default
Testing the Manager Setup¶
This will test the Manager to make sure it’s setup correctly, and does not need to
connect to the Server, and therefore does not need a server
block. It will still however submit
jobs.
common:
adapter: dask
tasks_per_worker: 2
cores_per_worker: 4
memory_per_worker: 10
manager:
manager_name: "TestBox_NeverSeen_OnServer"
test: True
ntests: 5
cluster:
scheduler: slurm
walltime: "01:00:00"
dask:
queue: default
Running commands before work¶
Suppose there are some commands you want to run before starting the Worker, such as starting a Conda environment, or setting some environment variables. This lets you specify that. For this, we will run on a Sun Grid Engine (SGE) cluster, start a conda environment, and load a module.
An important note about this one: we have now set max_workers
to something larger than 1.
Each Job will still request 16 cores and 256 GB of memory to be evenly distributed between the
4 tasks, however, the Adapter will attempt to start 5 independent jobs, for a
total of 80 cores, 1.280 TB of memory, distributed over 5 Workers collectively running 20 concurrent
tasks. If the Scheduler does not
allow all of those jobs to start, whether due to lack of resources or user limits, the
Adapter can still start fewer jobs, each with 16 cores and 256 GB of memory, but Task
concurrency will change by blocks of 4 since the Worker in each Job is configured to handle 4
tasks each.
common:
adapter: dask
tasks_per_worker: 4
cores_per_worker: 16
memory_per_worker: 256
max_workers: 5
server:
fractal_uri: localhost:7777
username: Foo
password: b4R
manager:
manager_name: "GridEngine_OpenMPI_DaskWorker"
test: False
cluster:
scheduler: sge
task_startup_commands:
- module load mpi/gcc/openmpi-1.6.4
- conda activate qcfmanager
walltime: "71:00:00"
dask:
queue: free64
Additional Scheduler Flags¶
A Scheduler may ask you to set additional flags (or you might want to) when submitting a Job.
Maybe it’s a Sys. Admin enforced rule, maybe you want to pull from a specific account, or set something not
interpreted for you in the Manager or Adapter (do tell us though if this is the case). This
example sets additional flags on a PBS cluster such that the final Job launch file will have
#PBS {my headers}
.
This example also uses Parsl and sets a scratch directory.
common:
adapter: parsl
tasks_per_worker: 1
cores_per_worker: 6
memory_per_worker: 64
max_workers: 5
scratch_directory: "$TMPDIR"
server:
fractal_uri: localhost:7777
username: Foo
password: b4R
verify: False
manager:
manager_name: "PBS_Parsl_MyPIGroupAccount_Manger"
cluster:
node_exclusivity: True
scheduler: pbs
scheduler_options:
- "-A MyPIsGroupAccount"
task_startup_commands:
- conda activate qca
- cd $WORK
walltime: "06:00:00"
parsl:
provider:
partition: normal_q
cmd_timeout: 30
Single Job with Multiple Nodes and Single-Node Tasks with Parsl Adapter¶
Leadership platforms prefer or require more than one node per Job request. The following configuration will request a Job with 256 nodes and place one Worker on each node.
common:
adapter: parsl
tasks_per_worker: 1
cores_per_worker: 64 # Number of cores per compute node
max_workers: 256 # Maximum number of workers deployed to compute nodes
nodes_per_job: 256
cluster:
node_exclusivity: true
task_startup_commands:
- module load miniconda-3/latest # You will need to load the Python environment on startup
- source activate qcfractal
- export KMP_AFFINITY=disable # KNL-related issue. Needed for multithreaded apps
- export PATH=~/software/psi4/bin:$PATH # Points to psi4 compiled for compute nodes
scheduler: cobalt # Varies depending on supercomputing center
parsl:
provider:
queue: default
launcher: # Defines the MPI launching function
launcher_class: AprunLauncher
overrides: -d 64 # Option for XC40 machines, allows workers to access 64 threads
init_blocks: 0
min_blocks: 0
account: CSC249ADCD08
cmd_timeout: 60
walltime: "3:00:00"
Consult the Parsl configuration docs for information on how to configure the Launcher and Provider classes for your cluster.
Single Job with Multiple, Node-Parallel Tasks with Parsl Adapter¶
Running MPI-parallel tasks requires a configuration similar to the multiple-nodes-per-Job case
for the manager, and also some extra work in defining the qcengine environment.
The key difference that sets apart managers for node-parallel applications is that
nodes_per_task
is set to more than one and
Parsl uses SimpleLauncher
to deploy a Parsl executor onto
the batch/login node once a job is allocated.
common:
adapter: parsl
tasks_per_worker: 1
cores_per_worker: 16 # Number of cores used on each compute node
max_workers: 128
memory_per_worker: 180 # Summary for the amount per compute node
nodes_per_job: 128
nodes_per_task: 2 # Number of nodes to use for each task
  cores_per_rank: 1 # Number of cores given to each MPI rank
cluster:
node_exclusivity: true
task_startup_commands:
- module load miniconda-3/latest
- source activate qcfractal
- export PATH="/soft/applications/nwchem/6.8/bin/:$PATH"
- which nwchem
scheduler: cobalt
parsl:
provider:
queue: default
launcher:
launcher_class: SimpleLauncher
init_blocks: 0
min_blocks: 0
account: CSC249ADCD08
cmd_timeout: 60
walltime: "0:30:00"
The configuration that describes how to launch the tasks must be written in a qcengine.yaml
file. See the QCEngine docs
for possible locations to place the qcengine.yaml
file and full descriptions of the
configuration options.
One key option for the qcengine.yaml
file is the description of how to launch MPI
tasks, mpiexec_command
. For example, many systems use mpirun
(e.g., OpenMPI).
An example configuration for a Cray supercomputer is:
all:
hostname_pattern: "*"
scratch_directory: ./scratch # Must be on the global filesystem
is_batch_node: True # Indicates that `aprun` must be used for all QC code invocations
mpiexec_command: "aprun -n {total_ranks} -N {ranks_per_node} -C -cc depth --env CRAY_OMP_CHECK_AFFINITY=TRUE --env OMP_NUM_THREADS={cores_per_rank} --env MKL_NUM_THREADS={cores_per_rank}
-d {cores_per_rank} -j 1"
jobs_per_node: 1
ncores: 64
Note that there are several variables in the mpiexec_command
that describe how to insert parallel configurations into the
command: total_ranks
, ranks_per_node
, and cores_per_rank
.
Each of these values is computed based on the number of cores per node, the number of nodes per application,
and the number of cores per MPI rank, which are all defined in the Manager settings file.
Queue Manager Frequent Questions and Issues¶
This page documents some of the frequent questions and issues we see with the Queue Managers. If this page and none of the other documentation pages have answered your question, please ask on GitHub or join our Slack group to get assistance.
Common Questions¶
How do I get more information from the Manager?¶
Turn on verbose
mode: either add the -v
flag to the CLI, or set
common.verbose
to True
in the YAML file. Setting this flag will produce
much more detailed information. This sets the loggers to DEBUG
level.
In the future, we may allow for different levels of increased verbosity, but for now there is only the one level.
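For example, the YAML form is simply (a minimal fragment; the rest of your config is unchanged):
common:
  verbose: True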
Can I start more than one Manager at a time?¶
Yes. This is often done if you would like to create multiple task tags that have different resource requirements or spin up managers that can access different resources. Check with your cluster administrators, though, to find out their policy on multiple processes running on the cluster's head node.
You can reuse the same config file, just invoke the CLI again.
Can I connect to a Fractal Server besides MolSSI’s?¶
Yes! Just change the server.fractal_uri
argument.
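For example (the address below is a placeholder for your own server):
server:
  fractal_uri: "myserver.example.edu:7777"
  username: Foo
  password: b4R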
Can I connect to more than one Fractal Server?¶
Yes and No. Each Manager can only connect to a single Fractal Server, but you can start multiple managers with different config files pointing to different Fractal Servers.
How do I help contribute compute time to the MolSSI database?¶
Join our Slack group! We would love to talk to you and help get you contributing as well!
I have this issue, here is my config file…¶
Happy to look at it! We only ask that you please remove the password from the config file before posting it. If we see a password, we’ll do our best to delete it, but that does not ensure someone did not see it.
Common Issues¶
This documents some of the common issues we see.
Jobs are quickly started and die without error¶
We see this problem with Dask often, and the most common case is that the head node (landing node, login node, etc.)
has an ethernet adapter with a different name than the compute nodes. You can check this by running the command
ip addr
on both the head node and a compute node (either through an interactive job or a job which writes
the output of that command to a file).
You will see many lines of output, but there should be a block that looks like the following:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
3: eno49.4010@eno49: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
inet 10.XX.Y.Z/16 brd 10.XX.255.255 scope global eno49.4010
4: eno49.4049@eno49: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
inet 198.XX.YYY.ZZZ/24 brd 198.XX.252.255 scope global eno49.4049
The XX, YYY, and ZZZ will have values corresponding to your cluster's configuration.
There are a few critical items:
The headers (lo, eno.49..., yours will be different) and the addresses where the XX placeholders are.
Ignore the lo adapter; every machine should have one.
The head node should have an inet that looks like a normal IP address, and another one which looks like it has a 10.something IP address.
The compute node will likely have an adapter which is only the 10.something.
These 10.something IP addresses are often for intranet communication only, meaning the compute nodes cannot reach the broader internet.
The name of the ethernet adapter housing the 10.something will be different on the head node and the compute node.
In this case, in your YAML file, add a line in dask
called interface
and set it to the name of the
adapter which is shared. So for this it would be:
dask:
interface: "eno49.4049"
plus all the rest of your YAML file. You can safely ignore the bit after the @
sign.
If there isn’t a shared adapter name, try this instead:
dask:
ip: "10.XX.Y.Z"
Replace the .XX.Y.Z
with the values which give the intranet IP of the head node. This option
acts as a pass through to the Dask Worker call and tells the worker to try and connect to the
head node at that IP address.
If that still doesn’t work, contact us. We’re working to make this less manual and difficult in the future.
Other variants:
“My jobs start and stop instantly”
“My jobs restart forever”
My Conda Environments are not Activating¶
You likely have to source
the Conda profile.d
again first. See also
https://github.com/conda/conda/issues/8072
This can also happen during testing, where you will see command-line based binaries (like Psi4) pass, but Python-based
codes (like RDKit) fail complaining about an import error. On cluster compute nodes, this often manifests as
the $PATH
variable being passed from your head node correctly to the compute node, but then the Python imports
cannot be found because the Conda environment is not set up correctly.
This problem is obfuscated by the fact that
workers such as Dask Workers can still start initially despite being a Python code themselves. Many
adapters will start their programs using the absolute Python binary path which gets around the
incomplete Conda configuration. We strongly recommend you do not try setting the absolute Python path in your
scripts to get around this, and instead try to source
the Conda profile.d
first. For example, you might
need to add something like this to your YAML file (change paths/environment names as needed):
cluster:
task_startup_commands:
- source ~/miniconda3/etc/profile.d/conda.sh
- conda activate qcfractal
Other variants:
“Tests from one program pass, but others don’t”
“I get errors about unable to find program, but its installed”
“I get path and/or import errors when testing”
My jobs appear to be running, but only one (or a few) workers are starting¶
If the jobs appear to be running (and the Manager is reporting they return successfully), a few things may be happening.
If jobs are completing very fast, the Adapter may not feel like it needs to start more workers, which is fine.
(Not recommended, use for debug only) Check your
manager.max_queued_tasks
arg to pull more tasks from the Server to fill the jobs you have started. This option is usually automatically calculated based on your common.tasks_per_worker
and common.max_workers
to keep all workers busy and still have a buffer.
Detailed Manager Information¶
This page documents all the internals of the Managers in depth and is not intended for the general user, nor should it be required for setting up and running them. This page is for those who are interested in the inner workings and detailed flow of how the Manager interacts with the Server, the Adapter, the Scheduler, and what it abstracts away from the user.
Since this is intended to be a deep dive into mechanics, please let us know if something is missing or unclear and we can expand this page as needed.
Manager Flowchart¶
The Queue Manager’s interactions with the Fractal Server, the Distributed Compute Engine, the physical Compute Hardware, and the user are shown in the following diagram.

Server Documentation
Configuring and running the Server from the CLI and Config Files
Fractal Server Init¶
The sub-command for the qcfractal-server
CLI which initializes a new server instance, including configuring
the PostgreSQL database if it is not set up yet.
Command Invocation¶
qcfractal-server init [<options>]
Command Description¶
This command will attempt to do the following actions for the user in default mode (no args):
Create the QCFractal Config directory
Create a blank Fractal Config file (assumes defaults)
Create the folders for housing the PostgreSQL database file, which will be the home of Fractal’s data.
Initialize PostgreSQL’s service at the database location from above
Start the PostgreSQL server
Populate the database tables and finalize everything for Fractal’s operation
In most cases, the user should not have to change any configurations if they are the system owners or admins. However, if users want to do something different, they can write their own Config File and change the settings through the CLI to start the server.
Options¶
This is a set of global-level options which impact where the init command looks and how it interacts with the config file.
--overwrite
Control whether the rest of the settings overwrite an existing config file in the QCFractal Config directory
--base-folder [<folder>]
The QCFractal base directory to attach to. Default:
~/.qca/qcfractal
This set of options pertains to the PostgreSQL database itself and translates to the database
header in the
Fractal Server Config.
--db-port [<port>]
The PostgreSQL default port, Default
5432
--db-host [<host>]
Default location for the Postgres server. If not
localhost
, Fractal command lines cannot manage the instance, and it will have to be configured in the Config File. Default: localhost
--db-username [<user>]
The postgres username to default to. Planned Feature - Currently inactive.
--db-password [<password>]
The postgres password for the given user. Planned Feature - Currently inactive.
--db-directory [<dir_path>]
The physical location of the QCFractal instance data, defaults to the root Config directory.
--db-default-database [<db_name>]
The default database to connect to. Typically used if you already have a Fractal Database set up or you want to use a different name for the database besides the default. Default: qcfractal_default.
--db-logfile [<logfile>]
The logfile to write postgres logs to. Default: qcfractal_postgres.log.
--db-own (True|False)
If own is True, Fractal will control the database instance. If False, Postgres will expect a booted server at the database specification. Default: True
The settings below pertain to the Fractal Server and translate to the fractal
header in the
Fractal Server Config.
--name [<name>]
The Fractal server default name. Controls how the server presents itself to connected clients. Default
QCFractal Server
--port [<port>]
The Fractal default port. This is the port which Fractal listens to for client connections (and for the URI). This is separate from the --db-port, which is the port that the PostgreSQL database is listening on. In general, these should be separate. Default: 7777.
--compress-response (True|False)
Compress REST responses or not; should be True unless behind a proxy. Default: True.
--allow-read (True|False)
Always allows read access to record tables. Default: True
--security [<security_string>]
Optional security features. Not set by default.
--query-limit [<int_limit>]
The maximum number of records to return per query. Default
1000
--logfile [<log>]
The logfile the Fractal Server writes to. Default
qcfractal_server.log
--service-frequency [<frequency>]
The frequency to update the Fractal services. Default
60
--max-active-services [<max-services>]
The maximum number of concurrent active services. Default
20
--heartbeat-frequency [<heartbeat>]
The frequency (in seconds) to check the heartbeat of Managers. Default
1800
Fractal Server Config¶
This page documents the valid options for the YAML file inputs to the Config File. This first section outlines each of the headers (top level objects) and a description for each one. The final file will look like the following:
common:
option_1: value_for1
another_opt: 42
server:
option_for_server: "some string"
Command Invocation¶
qcfractal-server config [<options>]
Command Description¶
Show the current config file at an optional location.
Looks in the default location if no arg is provided.
Options¶
--base-folder [<folder>]
The QCFractal base directory to attach to. Default:
~/.qca/qcfractal
Config File Complete Options¶
The valid top-level YAML headers are the parameters of the FractalConfig
class.
class qcfractal.config.FractalConfig
(*, base_folder: str = '/home/docs/.qca/qcfractal', database: qcfractal.config.DatabaseSettings = DatabaseSettings(port=5432, host='localhost', username=None, password=None, directory=None, database_name='qcfractal_default', logfile='qcfractal_postgres.log', own=True), view: qcfractal.config.ViewSettings = ViewSettings(enable=True, directory=None), fractal: qcfractal.config.FractalServerSettings = FractalServerSettings(name='QCFractal Server', port=7777, compress_response=True, allow_read=True, security=None, query_limit=1000, logfile='qcfractal_server.log', loglevel='info', cprofile=None, service_frequency=60, max_active_services=20, heartbeat_frequency=1800, log_apis=False, geo_file_path=None))[source]¶ Top level configuration headers and options for a QCFractal Configuration File
- Parameters
base_folder (str, Default: /home/docs/.qca/qcfractal) – The QCFractal base instance to attach to. Default will be your home directory
database (DatabaseSettings, Optional)
view (ViewSettings, Optional)
fractal (FractalServerSettings, Optional)
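Putting the valid headers together, a server config file therefore has the following shape; the values shown are just the documented defaults, included for illustration:
database:
  port: 5432
  host: localhost
view:
  enable: True
fractal:
  name: "QCFractal Server"
  port: 7777
  query_limit: 1000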
database¶
class qcfractal.config.DatabaseSettings
(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, port: int = 5432, host: str = 'localhost', username: str = None, password: str = None, directory: str = None, database_name: str = 'qcfractal_default', logfile: str = 'qcfractal_postgres.log', own: bool = True)[source]¶ Postgres Database settings
- Parameters
port (int, Default: 5432) – The postgresql default port
host (str, Default: localhost) – Default location for the postgres server. If not localhost, qcfractal command lines cannot manage the instance.
username (str, Optional) – The postgres username to default to.
password (str, Optional) – The postgres password for the given user.
directory (str, Optional) – The physical location of the QCFractal instance data, defaults to the root folder.
database_name (str, Default: qcfractal_default) – The database name to connect to.
logfile (str, Default: qcfractal_postgres.log) – The logfile to write postgres logs.
own (bool, Default: True) – If own is True, QCFractal will control the database instance. If False Postgres will expect a booted server at the database specification.
fractal¶
class qcfractal.config.FractalServerSettings
(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, name: str = 'QCFractal Server', port: int = 7777, compress_response: bool = True, allow_read: bool = True, security: str = None, query_limit: int = 1000, logfile: str = 'qcfractal_server.log', loglevel: str = 'info', cprofile: str = None, service_frequency: int = 60, max_active_services: int = 20, heartbeat_frequency: int = 1800, log_apis: bool = False, geo_file_path: str = None)[source]¶ Fractal Server settings
- Parameters
name (str, Default: QCFractal Server) – The QCFractal server default name.
port (int, Default: 7777) – The QCFractal default port.
compress_response (bool, Default: True) – Compress REST responses or not, should be True unless behind a proxy.
allow_read (bool, Default: True) – Always allows read access to record tables.
security (str, Optional) – Optional user authentication. Specify ‘local’ to enable authentication through locally stored usernames. User permissions may be manipulated through the
qcfractal-server user
CLI.query_limit (int, Default: 1000) – The maximum number of records to return per query.
logfile (str, Default: qcfractal_server.log) – The logfile to write server logs.
loglevel (str, Default: info) – Level of logging to enable (debug, info, warning, error, critical)
cprofile (str, Optional) – Enable profiling via cProfile, and output cprofile data to this path
service_frequency (int, Default: 60) – The frequency to update the QCFractal services.
max_active_services (int, Default: 20) – The maximum number of concurrent active services.
heartbeat_frequency (int, Default: 1800) – The frequency (in seconds) to check the heartbeat of workers.
log_apis (bool, Default: False) – True or False. Store API access in the Database. This is an advanced option for servers accessed by external users through QCPortal.
geo_file_path (str, Optional) – Geoip2 cites file path (.mmdb) for resolving IP addresses. Defaults to [base_folder]/GeoLite2-City.mmdb
view¶
class qcfractal.config.ViewSettings
(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, enable: bool = True, directory: str = None)[source]¶ HDF5 view settings
- Parameters
enable (bool, Default: True) – Enable frozen-views.
directory (str, Optional) – Location of frozen-view data. If None, defaults to base_folder/views.
Fractal Server Start¶
The sub-command for the qcfractal-server
CLI which starts the Fractal server instance.
Command Invocation¶
qcfractal-server start [<options>]
Command Description¶
This command will attempt to do the following actions for the user in default mode (no args):
Read the QCFractal Config directory
Read the config file in that directory
Connect to the previously created Fractal database created in the PostgreSQL service (see Fractal Server Init).
Start Fractal’s periodic services.
Create and provide SSL certificates.
The options for the database and starting local compute on the same resources as the server can be controlled through the flags below. Also see all the config file options in Config File.
Options¶
--base-folder [<folder>]
The QCFractal base directory to attach to. Default:
~/.qca/qcfractal
--port [<port>]
The Fractal default port. This is the port which Fractal listens to for client connections (and for the URI). This is separate from the port that the PostgreSQL database is listening on. In general, these should be separate. Default: 7777.
--logfile [<log>]
The logfile the Fractal Server writes to. Default
qcfractal_server.log
--database-name [<db_name>]
The database to connect to, defaults to the default database name. Default
qcfractal_default
--server-name [<server_name>]
The Fractal server default name. Controls how the server presents itself to connected clients. Default
QCFractal Server
--start-periodics (True|False)
Expert Level Flag Only Warning! Can disable periodic update (services, heartbeats) if False. Useful when running behind a proxy. Default
True
--disable-ssl (False|True)
Disables SSL if present; if False, an SSL cert will be created for you. Default: False
--tls-cert [<tls_cert_str>]
Certificate file for TLS (in PEM format)
--tls-key [<tls_key_str>]
Private key file for TLS (in PEM format)
--local-manager [<int>]
Creates a local pool QueueManager attached to the server using the number of threads specified by the arg. If this flag is set and no number is provided, 1 (one) thread will be spun up and running locally. If you expect Fractal Managers to connect to this server, then it is unlikely you need this. Related, if no compute is expected to be done on this server, then it is unlikely this will be needed.
Fractal Server User¶
The sub-command for the qcfractal-server
CLI which manages user permissions and passwords.
Command Invocation¶
qcfractal-server user [<options>]
Top-level Options¶
--base-folder [<folder>]
The QCFractal base directory to attach to. Default:
~/.qca/qcfractal
.
Subcommand Summary¶
The qcfractal-server user
CLI allows for manipulation of users through four subcommands:
add: Add a new user.
show: Display a user's permissions.
modify: Change a user's permissions or password.
remove: Delete a user.
Add Subcommand¶
Command Invocation¶
qcfractal-server user add [<options>] <username>
Command Description¶
This command adds a new user, setting the user’s password and permissions. The user must not already exist.
Arguments¶
<username>
The username to add.
--password [<password>]
The password for the user. If this option is not provided, a password will be generated and printed.
--permissions [<permissions>]
Permissions for the user. Allowed values: read, write, queue, compute, admin. Multiple values are allowed. At least one value must be specified.
Show Subcommand¶
Command Invocation¶
qcfractal-server user show <username>
Command Description¶
This command prints the permissions for a given user.
Arguments¶
<username>
The username for which to show permissions.
Modify Subcommand¶
Command Invocation¶
qcfractal-server user modify [<options>] <username>
Command Description¶
This command modifies a user's permissions or password.
Arguments¶
<username>
The username to modify.
--password [<password>]
Change the user's password to a given string. This option excludes --reset-password.
--reset-password
Change the user's password to an auto-generated value. The new password will be printed. This option excludes --password.
--permissions [<permissions>]
Change the user's permissions to the given set. Allowed values: read, write, queue, compute, admin. Multiple values are allowed. See User Permissions for more information.
Remove Subcommand¶
Command Invocation¶
qcfractal-server user remove <username>
Command Description¶
This command removes a user.
Arguments¶
<username>
The username to remove.
User Permissions¶
Five permission types are available: read, write, queue, compute, and admin.
Fractal Server Upgrade¶
The sub-command for the qcfractal-server
CLI which allows in-place upgrade of Fractal Databases to newer versions
through SQLAlchemy Alembic.
Command Invocation¶
qcfractal-server upgrade [<options>]
Command Description¶
This command will attempt to upgrade an existing Fractal Database (stored in PostgreSQL) to a new version based on the currently installed Fractal software version. Not every version of Fractal updates the database, so this command will only need to be run when you know the database has changed (or attempting to start it tells you to).
This command will attempt to do the following actions for the user in default mode (no args):
Read the database location from your Config File in the default location (can be controlled)
Determine the upgrade paths from your existing version to the version known by Alembic (update information is shipped with the Fractal software)
Stage update
Commit update if no errors found
You will then need to start the server again through Fractal Server Start to bring the server back online.
Caveat: This command will not initialize the Fractal Database for you from nothing. The database must exist for this command to run.
Options¶
--base-folder [<folder>]
The QCFractal base directory to attach to. Default:
~/.qca/qcfractal
Server-side Dataset Views¶
Note
This is an experimental feature.
HDF5 views of Datasets may be stored on the server to improve query performance.
To use views, first specify a path to store views in the qcfractal.config.ViewSettings.
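In the server Config File this looks like the following; the directory path is an assumption, so point it wherever you want views stored:
view:
  enable: True
  directory: /path/to/views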
Next, generate a view for the collection(s) of interest:
from pathlib import Path

import qcfractal.interface as ptl

# Directory configured in ViewSettings (view.directory); adjust to your server's setting
viewpath = Path("/path/to/views")

ds = ptl.get_collection("ReactionDataset", "S22")
# Note the server will look for views in the directory specified above,
# named {collection_id}.hdf5
view = ptl.collections.HDF5View(viewpath / f"{ds.data.id}.hdf5")
view.write(ds)
Finally, mark the collection as supporting views:
# Update the dataset to indicate a view is available
ds.__dict__["view_available"] = True
ds.save()
# Optionally, you may add a download URL for the view
ds.__dict__["view_url"] = "https://someserver.com/view.hdf5"
ds.save()
Developer Documentation
Contains in-depth developer documentation.
QCArchive Design¶
The QCArchive software ecosystem consists of a series of Python modules that can either be used together or are useful standalone pieces to the computational molecular sciences community. This ecosystem is constructed to be used at single-user, small group, and multi-PI levels while retaining the ability to scale up to the needs of an entire community of scientists.
In each case, it is expected that only a small number of users are required to understand the entire software stack, and the primary interaction with the QCArchive ecosystem will be through the user front-end (QCPortal). After the persistence server instance (QCFractal) is instantiated with a distributed workflow system and compute, the server should be able to maintain itself without user intervention. A diagram of how the ecosystem works in concert can be seen below:

1) QCPortal¶
Hardware: Laptop
Actor: User
Primary Developer: MolSSI
QCPortal provides a Python-based user front-end experience for users who are interested in exploring data and executing new tasks. Exploration of computed data is augmented by the ability to quickly generate graphs and other representations to view the data in Jupyter notebooks, and high-level abstractions are used to view and manipulate many individual tasks simultaneously. Querying of data and submission of new tasks occurs over the QCFractal REST API.
2) QCFractal¶
Hardware: Persistent Server
Actor: Power User
Primary Developer: MolSSI
QCFractal is the primary persistent server that QCPortal communicates with and has several main duties:
Maintain a database of all completed quantum chemistry results along with metadata that forms higher-level collections of results.
Maintain a compute queue of all requested and completed tasks, where each task is a single quantum chemistry result.
Submit new tasks to distributed workflow engines and insert complete results into the database.
Maintain high level compute workflows via the “Services” feature.
3) Distributed Compute¶
Hardware: Persistent Server/Supercomputer
Actor: Power User (can be separate from the Fractal Power Users)
Primary Developer: Scientific and HPC Communities
The QCArchive project relies on a number of distributed compute workflow engines to enable a large variety of compute workloads. QCFractal will interact with each of these projects by submitting a series of tasks that do not have data or execution order dependence. The communication interfaces vary from Python-based API calls to REST API interfaces depending on the implementation details of the individual tools.
Current distributed compute backends are:
Dask Distributed - Multi-node task graph scheduler built in Python.
Parsl - High-performance workflows.
Fireworks - Multi-site task scheduler built in Python with a central MongoDB server.
Pending backend implementations include:
RADICAL Cybertools - Distributed task scheduler built for DOE and NSF compute resources.
BOINC - High throughput volunteer computing task manager.
Balsam - Task manager for a single compute resource (supercomputer) with tasks pulled from a central server.
The compute workers of each of these tools are executed in different ways. However, in each case the compute workers will distribute QCSchema inputs, call QCEngine, and receive a QCSchema output.
4) QCEngine¶
Hardware: Local Cluster, Supercomputer, or Cloud Compute
Actor: Power User
QCEngine is a stateless, lightweight wrapper around Quantum Chemistry programs so that these programs consistently accept and emit QCSchema. Depending on the underlying program QCEngine provides this uniform interface by either:
Calling the QCSchema IO functions that individual programs have implemented.
Calling the Python API of the program and modifying the input/output according to the QCSchema.
Writing an ASCII input file based on the input QCSchema, running the program, and parsing an ASCII output file into the QCSchema.
QCEngine also keeps track of the provenance of each task. This includes:
A description of the hardware used (CPU, GPU, memory, etc).
The total compute time and resources allocated to the run.
The function and version of the program called.
5) 3rd Party Services¶
Hardware: Laptop
Actor: User/Power User
Primary Developer: Computational Molecular Sciences Community
The QCFractal API is expected to have additional services attached by 3rd parties. These services can range from cross-reference data services to user websites that visualize and interact with the data in specific ways.
QCFractal API¶
qcfractal Package¶
Main init function for qcfractal
Functions¶
Factory for generating storage sockets.
Classes¶
This object maintains a computational queue and watches for finished tasks for different queue backends.
Class Inheritance Diagram¶

qcfractal.queue Package¶
Initializer for the queue_handler folder
Functions¶
Constructs a queue manager based off the incoming queue socket type.
Classes¶
Handles management/status querying of managers.
This object maintains a computational queue and watches for finished tasks for different queue backends.
Manages the task queue.
Handles service management (querying/adding/modifying).
Handles task management (querying/adding/modifying tasks).
Class Inheritance Diagram¶
qcfractal.services Package¶
Base import for services
Functions¶
Initializes a service from a JSON blob.
Initializes a service from an API call.
Database Design¶
Warning
Final MongoDB Supported Version: 0.7.0
0.7.0 is the last major release which supports MongoDB. Fractal is moving towards a PostgreSQL database to make upgrades more stable and because it is better suited to the nature of QCArchive data. The upgrade path from MongoDB to PostgreSQL will be provided by the Fractal developers in the next release. Due to the complex nature of the upgrade, the move to PostgreSQL will be carried out through scripts which will be provided. After the PostgreSQL upgrade, there will be built-in utilities to upgrade the Database.
QCArchive stores all of its data and computations in a database backing QCFractal. The DB is designed with extensibility in mind, allowing flexibility and easy accommodation of future features. The current DB storage backend is built on top of a non-relational DB, MongoDB, but it could also be implemented in a relational DB such as MySQL or PostgreSQL. In addition, Object Relational Mapping (ORM) is used to add structure and validation on top of MongoDB, which is schemaless by design. The ORM used is the most popular general MongoDB Python ORM, MongoEngine.
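As a loose illustration of what an ORM layer provides on top of a schemaless store (this is not QCFractal's actual schema; the document class, fields, and database name below are hypothetical), a MongoEngine sketch might look like:

import mongoengine as me

me.connect("qcarchive_example")  # hypothetical database name

class KeywordsDoc(me.Document):
    # The ORM enforces types and constraints that raw MongoDB would not
    hash_index = me.StringField(required=True, unique=True)
    values = me.DictField()

# Validation happens on save; a duplicate hash_index or wrong type raises an error
KeywordsDoc(hash_index="abc123", values={"scf_type": "df"}).save()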
The main idea behind the QCArchive DB design is to be able to store and retrieve a wide range of quantum chemistry computations run with different programs and a variety of configurations. The DB also stores information about the jobs submitted to request computations and all their related data, along with registered users and computational managers.
The QCArchive DB is organized into a set of tables (or document collections), each of which is detailed below.
1) Molecule¶
The molecule table stores molecules used in any computation in the system. The molecule structure is based on the standard QCSchema. It stores entries like geometry, masses, and fragment charges. Please refer to the QCSchema for a complete description of all the possible fields.
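For example, a minimal QCSchema-style molecule built with QCElemental carries exactly the kind of fields stored in this table (the hydrogen fluoride geometry below is illustrative; coordinates are in bohr):

import qcelemental as qcel

# Symbols, flat Cartesian geometry (bohr), charge, and multiplicity,
# mirroring the fields persisted in the Molecule table
hf = qcel.models.Molecule(
    symbols=["H", "F"],
    geometry=[0.0, 0.0, 0.0,
              0.0, 0.0, 1.73],
    molecular_charge=0,
    molecular_multiplicity=1,
)
print(hf.dict())  # the JSON-serializable form that gets stored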
2) Keyword¶
Keywords are a store of key-value pairs that provide configuration for some QC program. This store is flexible, and there is no restriction on what configuration can be kept here. This table is referenced by the Result table.
3) Result¶
This table stores the actual computation results along with the attributes used to calculate them. Each entry is a single unit of computation. The following is the unique set of keys (or indices) that defines a result:
driver: the type of calculation being evaluated (i.e. energy, gradient, hessian, properties)
program: such as gamess or psi4 (lower case)
molecule: the ID of the molecule in the Molecule table
method: the method used in the computation (b3lyp, mp2, ccsd(t))
keywords: the ID of the keywords in the Keywords table
basis: the name of the basis used in the computation (6-31g, cc-pvdz, def2-svp)
For more information see: Results.
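These indexing fields are also the natural handles for querying stored results from a client. A hedged sketch (placeholder server address; the keyword arguments follow the QCPortal client API of this documentation era):

import qcportal as ptl

client = ptl.FractalClient("localhost:7777", verify=False)  # placeholder address

# Query single-point results by the unique indexing fields described above
records = client.query_results(
    program="psi4",
    driver="energy",
    method="b3lyp",
    basis="6-31g",
)
for rec in records:
    print(rec.id, rec.return_result)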
4) Procedure¶
Procedures are also computational results, but of a more complex kind. They represent aggregate computations such as geometry optimizations, torsion drives, and grid optimizations. The DB can support new types of procedures by inheriting from the base procedure table. Each procedure usually references several other results from the Results table, and possibly other procedures (self-reference).
5) Services¶
Services are more flexible workflows that eventually produce results to be stored in the Result and/or Procedure tables when they are done. From the DB point of view, this is an intermediate table for ongoing iterative computations.
More about services in QCArchive can be found here: Services.
6) TaskQueue¶
This table is the main task queue of the system. Tasks are submitted to this table by QCFractal and wait for a manager to pull them for computation. Each task in the queue references a Result or a Procedure, meaning it corresponds to a specific quantum computation. The table stores the status of the task (WAITING, RUNNING, COMPLETE, or ERROR) and also keeps track of the executing manager and the modification dates.
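A hedged example of inspecting this queue from a client (placeholder server address; the query_tasks method and its status filter follow the client API of this documentation era):

import qcportal as ptl

client = ptl.FractalClient("localhost:7777", verify=False)  # placeholder address

# Pull tasks currently in the ERROR state so they can be inspected or restarted
errored = client.query_tasks(status="ERROR")
for task in errored:
    print(task.id, task.tag, task.base_result)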
7) QueueManagers¶
Managers are the registered servers for computing tasks from the TaskQueue. This table keeps information about each manager, such as its host, cluster, and number of completed tasks, submissions, and failures.
The database only keeps track of what Tasks have been handed out to each Manager and maintains a heartbeat to ensure the Manager is still connected. More information about the configuration and execution of managers can be found here: Fractal Queue Managers.
Glossary¶
This glossary contains the common terms which appear over the entire Fractal project. There are other, specialized glossaries for components of Fractal which are linked below to help group terms together with their contextual docs. Some terms may appear in multiple glossaries, but will always have the same meaning, e.g. Queue Adapter and Adapter.
- DB Index¶
A DB Index (or Database Index) is a commonly queried field used to speed up searches in a DB Table.
- DB Socket¶
A DB Socket (or Database Socket) is the interface layer between standard Python queries and raw SQL or MongoDB query language.
- DB Table¶
A set of data inside the Database which has a common ObjectId. The table name follows SQL conventions; a table is also known as a collection in MongoDB.
- Fractal Config Directory¶
The directory where QCFractal Server and Database configuration files live. This is also the home of the Database itself in the default configuration. The default path is ~/.qca/qcfractal.
- Hash Index¶
An index that hashes the information contained in the object in a reproducible manner. This hash index is only used to find duplicates and should not be relied upon as it may change in the future.
- Molecule¶
A unique 3D representation of a molecule. Any changes to the protonation state, multiplicity, charge, fragments, coordinates, connectivity, isotope, or ghost atoms represent a change in the molecule.
- ObjectId¶
An ObjectId (or Database ID) is a unique ID for a given row (a document or entry) in the database that uniquely identifies that row in a DB Table. These IDs are automatically generated and will differ between databases, but they provide a fast way to reference other rows in the database. An ObjectId is unique to a DB Table.
- Procedures¶
On-node computations, these can either be a single computation (energy, gradient, property, etc.) or a series of calculations such as a geometry optimization.
- Queue Adapter¶
The interface between QCFractal’s internal queue representation and other queueing systems such as Dask or Fireworks. Also see the Adapter in the Manager glossary.
- Services¶
Iterative workflows where the required computations are distributed via the queue and then are processed on the server to acquire the next iteration of calculations.
Contextually Organized Glossaries¶
Development Guidelines¶
QCArchive developers adhere to a set of guidelines, both in software style and in outward conduct. We are working on codifying these into a clean list here, but the early guides are as follows.
Software Development Guides¶
We openly encourage development of the QCArchive and all of its projects in public discourse through GitHub and the QCArchive Slack (Join our Slack group).
All changes should be proposed through a PR to the main projects from Forks of said projects.
For more details about the development cycle and guidelines, please see the DevTools Readme on GitHub for the project.
Personal Conduct Guides¶
Basic rule of thumb: Be respectful, welcome people, keep all interactions harassment-free.
Please see the full Code of Conduct on the project’s GitHub page for more information.
Changelog¶
0.15.3 / 2021-03-15¶
This is a small release focused on some database migrations to improve performance. This should greatly improve performance of certain actions (particularly task submission) with large databases.
This release also drops support for python < 3.7
Client and managers should not need to be upgraded.
0.15.0 / 2020-11-11¶
This release is focused on bugfixes, and laying some foundation for larger changes to come.
New features¶
Enhancements¶
Bug Fixes¶
(GH#624) Lock task queue rows to prevent multiple managers requesting the same task
(GH#626) Fix printing of client version during version check failure
(GH#632) Fix ordering of initial/final molecule in torsion drives
(GH#637) Fix inability to shutdown ProcessPoolExecutor workers
(GH#638) Fix incorrect error in datasets
(GH#641) Fix exception in web handler that was polluting log files
0.13.0 / 2020-01-15¶
New Features¶
Enhancements¶
(GH#507) Automatically adds collection molecules in chunks if more than the current limit needs to be submitted.
(GH#515) Conda environments now correspond to docker images in all deployed cases.
(GH#524) The delete_collection function was added to qcportal.FractalClient.
(GH#530) Adds the ability to specify cores per rank for node-parallel tasks in QCEngine.
(GH#507) Adds a formatting and lint check to CI during pull requests.
(GH#535) Allows dftd3 to be computed for all stoichiometries rather than just defaults.
(GH#542) Important: TaskRecord.base_result is now an ObjectId (int or str), and no longer a DBRef. So, code that uses my_task.base_result.id should change to simply use my_task.base_result.
Bug Fixes¶
(GH#506) Fixes repeated visualize calls where previously the visualize call would corrupt local state.
(GH#521) Fixes an issue where ProcessPoolExecutor returned the incorrect number of currently running tasks.
(GH#522) Fixes a bug where ProcedureDataset.status() failed for specifications where only a subset was computed.
(GH#525) This PR fixes the ENTRYPOINT of the qcarchive_worker_openff worker. (Conda and Docker are not friends.)
(GH#532) Fixes a testing subprocess routine when coverage is enabled, for coverage 5.0 breaking changes.
(GH#543) Fixes a bug where running qcfractal-server “start” before an “upgrade” prevented the “upgrade” command from correctly running.
(GH#545) Fixed an issue in Dataset.get_records() that could occur when the optional arguments keywords and basis were not provided.
0.12.2 / 2019-12-07¶
Enhancements¶
(GH#477) Removes 0.12.x xfails when connecting to the server.
(GH#481) Expands Parsl Manager Adapter to include ALCF requirements.
(GH#483) Dataset Views are now much faster to load in HDF5.
(GH#488) Allows gzipped dataset views.
(GH#490) Computes checksums on gzipped dataset views.
(GH#542) TaskRecord.base_result is now an ObjectId, and no longer a DBRef. So, code that uses my_task.base_result.id should change to simply be my_task.base_result.
Bug Fixes¶
(GH#486) Fixes pydantic __repr__ issues after update.
(GH#492) Fixes an error where ReactionDataset didn’t allow a minimum number of n-body expansions to be added.
(GH#493) Fixes an issue with ReactionDataset.get_molecules when a subset is present.
(GH#494) Fixes an issue where queries with limit=0 erroneously returned all results.
(GH#496) TorsionDrive tests now avoid 90 degree angles with RDKit to avoid some linear issues in the forcefield and make them more stable.
(GH#497) TorsionDrive.get_history now works for extremely large (1000+) optimizations in the procedure.
0.12.0 / 2019-11-06¶
Highlights¶
The ability to handle very large datasets (1M+ entries) quickly and efficiently.
Store and compute Wavefunction information.
Build, serve, and export views for Datasets that can be stored in journal supplementary information or services like Zenodo.
A new GUI dashboard to observe the current state of the server, see statistics, and fix issues.
New Features¶
(GH#433 and GH#462) Dataset and ReactionDataset (interface.collections) now have a download method which downloads a frozen view of the dataset. This view is used to speed up calls to get_values, get_molecules, get_entries, and list_values.
(GH#440) Wavefunctions can now be stored in the database using Result protocols.
(GH#453) The server now periodically logs manager and current state to provide data over time.
(GH#460) Contributed values are now in their own table to speed up access of Collections.
(GH#461) Services now update their corresponding record every iteration. An example is a torsiondrive, which now updates the optimization_history field each iteration.
Enhancements¶
(GH#429) Enables protocols for OptimizationDataset collections.
(GH#430) Adds additional QCPortal type hints.
(GH#433, GH#443) Dataset and ReactionDataset (interface.collections) are now faster for calls to get_values, get_molecules, get_entries, and list_values for large datasets if the server is configured to use frozen views. See the “Server-side Dataset Views” documentation. Subsets may be passed to get_values, get_molecules, and get_entries.
(GH#447) Enables the creation of plaintext (xyz and csv) output from Dataset Collections.
(GH#455) Projection queries should now be much faster as excluded results are not pulled to the server.
(GH#458) Collections now have a metadata field.
(GH#463) FractalClient.list_collections by default only returns collections whose visibility flag is set to true and whose group is “default”. This change was made to filter out in-progress, intermediate, and specialized collections.
(GH#464) Molecule insert speeds are now 4-16x faster.
Bug Fixes¶
0.11.0 / 2019-10-01¶
New Features¶
(GH#420) Pre-storage data handling through Elemental’s Protocols feature is now present in Fractal. Although only optimization protocols are implemented functionally, the database side has been upgraded to store protocol settings.
Enhancements¶
(GH#385, GH#404, GH#411) Dataset and ReactionDataset have five new functions for accessing data. get_values returns the canonical headline value for a dataset (e.g. the interaction energy for S22) in data columns with caching, both for result-backed values and contributed values; it replaces the now-deprecated get_history and get_contributed_values. list_values returns the list of data columns available from get_values; it replaces the now-deprecated list_history and list_contributed_values. get_records either returns ResultRecord or a projection; for ReactionDataset, the results are broken down into component calculations; it replaces the now-deprecated query. list_records returns the list of data columns available from get_records. get_molecules returns the Molecule associated with a dataset.
(GH#393) A new feature added to Client to enable more custom and fast queries: the custom_query method. These fast queries are now used in torsiondrive.get_final_molecules and torsiondrive.get_final_results. More advanced queries will be added.
(GH#394) Adds tag and manager selector fields to client.query_tasks. This is helpful for managing jobs in the queue and detecting failures.
(GH#400, GH#401, GH#410) Adds Dockerfiles corresponding to builds on Docker Hub.
(GH#406) The Dataset collection’s primary indices (database level) have been updated to reflect its new understanding.
Bug Fixes¶
(GH#396) Fixed a bug in an internal Dataset function which caused ComputeResponse to be truncated when the number of calculations is larger than the query_limit.
(GH#403) Fixed Dataset.get_values for any method which involved DFTD3.
(GH#409) Fixed a compatibility bug in a specific version of Intel-OpenMP by skipping version 2019.5-281.
0.10.0 / 2019-08-26¶
Note
Stable Beta Release
This release marks Fractal’s official Stable Beta Release. This means that future, non-backwards compatible changes to the API will result in deprecation warnings.
Enhancements¶
(GH#356) Collections’ database representations have been improved to better support future upgrade paths.
(GH#375) Dataset Records are now copied alongside the Collections.
(GH#377) The testing suite from Fractal is now exposed as a PyTest entry point when Fractal is installed, so that tests can be run from anywhere with the --pyargs qcfractal flag of pytest.
(GH#384) “Dataset Records” and “Reaction Dataset Records” have been renamed to “Dataset Entry” and “Reaction Dataset Entry” respectively.
(GH#387) The auto-documentation tech introduced in GH#321 has been replaced by the improved implementation in Elemental.
0.9.0 / 2019-08-16¶
New Features¶
(GH#354) Fractal now takes advantage of Elemental’s new Msgpack serialization option for Models. Serialization defaults to msgpack when available (conda install msgpack-python [-c conda-forge]), falling back to JSON otherwise. This results in substantial speedups for both serialization and deserialization actions and should be a transparent replacement for users within Fractal, Engine, and Elemental themselves.
(GH#358) Fractal Server now exposes a CLI for user/permissions management through the qcfractal-server user command. See the full documentation for details.
(GH#358) Fractal Server’s CLI now supports user manipulations through the qcfractal-server user subcommand. This allows server administrators to control users and their access without directly interacting with the storage socket.
Enhancements¶
(GH#330, GH#340, GH#348, GH#349) Many Pydantic-based Model attributes are now documented, derived on the fly from the Pydantic Schema of those attributes.
(GH#335) Dataset’s get_history function is fixed by allowing the ability to force a new query even if one has already been cached.
(GH#338) The Queue Manager which generated a Result is now stored in the Result records themselves.
(GH#341) Skeletal Queue Manager YAML files can now be generated through the --skel or --skeleton CLI flag on qcfractal-manager.
(GH#361) Staged DB’s in Fractal copy Alembic alongside them.
(GH#363) A new REST API hook for services has been added so Clients can manage Services.
0.8.0 / 2019-07-25¶
Breaking Changes¶
Warning
PostgreSQL is now the only supported database backend.
Fractal has officially dropped support for MongoDB in favor of PostgreSQL as our database backend. Although MongoDB served the start of Fractal well, our database design has evolved since then and will be better served by PostgreSQL.
New Features¶
(GH#307, GH#319, GH#321) Fractal’s Server CLI has been overhauled to more intuitively and intelligently control Server creation, startup, configuration, and upgrade paths. This is mainly reflected in a Fractal Server config file, a config folder (default location ~/.qca), and the sub-commands init, start, config, and upgrade of the qcfractal-server CLI. See the full documentation for details.
(GH#323) First implementation of the GridOptimizationDataset for collecting Grid Optimization calculations. Not yet fully featured, but operational for users to start working with.
Enhancements¶
(GH#291) Tests have been formally added for the Queue Manager to reduce bugs in the future. They cannot test on actual Schedulers yet, but it’s a step in the right direction.
(GH#295) Quality of life improvement for Managers, which by default will be less noisy about heartbeats and trigger a heartbeat less frequently. Both options can still be controlled through verbosity and a config setting.
(GH#296) Services are now prioritized by the date they are created to properly order the compute queue.
(GH#301) TorsionDriveDataset status can now be checked through the .status() method, which shows the current progress of the computed data.
(GH#310) The Client can now modify tasks and restart them if need be in the event of random failures.
(GH#313) Queue Managers now have more detailed statistics about failure rates and core-hours consumed (estimated).
(GH#314) The PostgresHarness has been improved to include better error handling if Postgres is not found, and will not try to stop/start if the target data directory is already configured and running.
(GH#318) Large collections are now automatically paginated to improve Server/Client response time and reduce query sizes. See also GH#322 for the Client-side requested pagination.
(GH#322) Client’s can request paginated queries for quicker responses. See also GH#318 for the Server-side auto-pagination.
(GH#322) Record models and their derivatives now have a get_molecule() method for fetching the molecule directly.
(GH#324) Optimization queries for a trajectory pull the entire trajectory in one go and keep the correct order. get_trajectory also pulls the correct order.
(GH#325) Collections have been improved to be more efficient. Previous queries are cached locally, and the compute call is now a single function, removing the need to make a separate call to form the submission.
(GH#326) ReactionDataset now explicitly groups the fragments to future-proof this method against upstream changes to Molecule fragmentation.
(GH#329) All API requests are now logged server-side anonymously.
(GH#331) Queue Manager jobs can now auto-retry failed jobs a finite number of times through QCEngine’s retry capabilities. This will only catch RandomErrors and all other errors are raised normally.
(GH#332) The SQLAlchemy layer on the PostgreSQL database has received significant polish.
Bug Fixes¶
(GH#291) Queue Manager documentation generation works on Pydantic 0.28+. A number of as-of-yet uncaught/unseen bugs were revealed in tests and have been fixed as well.
(GH#300) Errors thrown in the layer between Managers and their Adapters now correctly return a FailedOperation instead of a dict, to be consistent with all other errors and not crash the Manager.
(GH#301) Invalid passwords now present a helpful error message instead of raising an Internal Server Error to the user.
(GH#306) The Manager CLI option tasks-per-worker now correctly uses hyphens instead of underscores, to be consistent with all other flags.
(GH#316) Queue Manager workarounds for older versions of Dask-Jobqueue and Parsl have been removed, and an implicit dependency on the newer versions of those Adapters is enforced on CLI usage of qcfractal-manager. These packages are not required for Fractal, so their versions are only checked when specifically used in the Managers.
(GH#320) Duplicated initial_molecules in the TorsionDriveDataset will no longer cause a failure when adding them to the database, while still preserving de-duplication.
(GH#331) The BaseModel/Settings auto-documentation function can no longer throw an error which prevents using the code.
0.7.2 / 2019-05-31¶
New Features¶
Enhancements¶
(GH#275) Documentation has been further updated to be more contiguous between pages.
(GH#276) Imports and type hints in Database objects have been improved to remove ambiguity and make imports easier to follow.
(GH#280) Optimizations queried in the database are done with a more efficient lazy selectin. This should make queries much faster.
(GH#281) Database migration tech has been moved to its own folder to keep it isolated from normal production code. This PR also renamed the testing database to test_qcarchivedb to avoid clashes with production DBs. Finally, a new keyword for testing geometry optimizations has been added.
0.7.1 / 2019-05-28¶
0.7.0 / 2019-05-27¶
Warning
Final MongoDB Supported Release
This is the last major release which supports MongoDB. Fractal is moving towards a PostgreSQL database to make upgrades more stable and because it is better suited to the nature of QCArchive data. The upgrade path from MongoDB to PostgreSQL will be provided by the Fractal developers in the next release. Due to the complex nature of the upgrade, the move to PostgreSQL will be carried out through scripts which will be provided. After the PostgreSQL upgrade, there will be built-in utilities to upgrade the Database.
New Features¶
(GH#206, GH#249, GH#264, GH#267) SQL Database is now feature complete and implemented. As final testing in production is continued, MongoDB will be phased out in the future.
(GH#242) Parsl can now be used as an Adapter in the Queue Managers.
(GH#247) The new OptimizationDataset collection has been added! This collection returns a set of optimized molecular structures given an initial input.
(GH#254) The QCFractal Server Dashboard is now available through a Dash interface. Although not fully featured yet, future updates will improve this as features are requested.
(GH#260) It’s now even easier to install Fractal/Portal through conda with pre-built environments on the qcarchive conda channel. This channel only provides environment files, no packages (and there are no plans to do so).
(GH#269) The Fractal Snowflake project has been extended to work in Jupyter Notebooks. A Fractal Snowflake can be created with the FractalSnowflakeHandler inside of a Jupyter session.
Database Compatibility Updates¶
(GH#256) API calls to Elemental 0.4 have been updated. This changes the hashing system and so upgrading your Fractal Server instance to this (or higher) will require an upgrade path to the indices.
Enhancements¶
(GH#238) GridOptimizationRecord supports the helper function get_final_molecules, which returns the set of molecules at each final, optimized grid point.
(GH#259) Both GridOptimizationRecord and TorsionDriveRecord support the helper function get_final_results, which is like get_final_molecules, but for x.
(GH#241) The visualization suite with Plotly has been made more general so it can be invoked in different classes. This particular PR updates the TorsionDriveDataSet objects.
(GH#243) TorsionDrives in Fractal now support the updated Torsion Drive API from the underlying package. This includes both the new arguments and the “extra constraints” features.
(GH#244) Tasks which fail are now more verbose in the log as to why they failed. This is additional information on top of the number of pass/fail.
(GH#246) Queue Manager verbosity level is now passed down into the adapter programs as well, and the log file (if set) will continue to print to the terminal as well as the physical file.
(GH#248) Jobs which fail, or cannot be returned correctly, from Queue Managers are now better handled in the Manager and don’t sit in the Manager’s internal buffer. They will attempt to be returned to the Server on later updates. If too many jobs become stale, the Manager will shut itself down for safety.
(GH#258 and GH#268) Fractal Queue Managers are now fully documented, both from the CLI and through the doc pages themselves. A few variables have also been renamed and moved to make their purpose clearer. See the PR for the renamed variables.
(GH#251) The Fractal Server now reports valid minimum/maximum allowed client versions. The Portal Client will check these numbers against itself and fail to connect if it is not within the Server’s allowed range. Clients started from Fractal’s interface do not make this check.
Bug Fixes¶
(GH#248) Fixed a bug in Queue Managers where the extra worker startup commands for the Dask Adapter were not being parsed correctly.
(GH#250) Record objects now correctly set their provenance time on object creation, not module import.
(GH#253) A spelling bug was fixed in GridOptimization which caused hashing to not be processed correctly.
(GH#270) LSF clusters whose memory units are not configured in MB are now auto-detected (or can be manually set) without large workarounds in the YAML file and the CLI file itself. Supports documented settings of LSF 9.1.3.
0.6.0 / 2019-03-30¶
0.5.5 / 2019-03-26¶
New Features¶
(GH#228) ReactionDatasets visualization statistics plots can now be generated through Plotly! This feature includes bar plots and violin plots and is designed for interactive use through websites, Jupyter notebooks, and more.
(GH#233) TorsionDrive Datasets have custom visualization statistics through Plotly! This allows plotting 1-D torsion scans against other ones.
Enhancements¶
(GH#226) LSF can now be specified for the Queue Managers for Dask Managers.
(GH#228) Plotly is an optional dependency overall; it is not required to run QCFractal or QCPortal but will be downloaded in some situations. If you don’t have Plotly installed, more graceful errors beyond just raw ImportErrors are given.
(GH#234) Queue Managers now report the number of passed and failed jobs they return to the server and can also have verbose (debug-level) outputs to the log.
(GH#234) Dask-driven Queue Managers can now be set to simply scale up to a fixed number of workers instead of trying to adapt the number of workers on the fly.
Bug Fixes¶
(GH#227) SGE clusters specified in the Queue Manager under Dask correctly process job_extra for additional scheduler headers. This is implemented in a stable way such that if the upstream Dask Jobqueue implements a fix, the Manager will keep working without needing a new release.
(GH#234) Fireworks managers now return the same pydantic models as every other manager instead of raw dictionaries.
0.5.4 / 2019-03-21¶
New Features¶
(GH#216) Jobs submitted to the queue can now be assigned a priority to be served out to the Managers.
(GH#219) Temporary, pop-up, local instances of FractalServer can now be created through the FractalSnowflake. This creates an instance of FractalServer, with its database structure, which is entirely held in temporary storage and memory, all of which is deleted upon exit/stop. This feature is designed for those who want to tinker with Fractal without needing to create their own database or connect to a production FractalServer.
(GH#220) Queue Managers can now set the scratch_directory variable that is passed to QCEngine and its workers.
Enhancements¶
(GH#216) Queue Managers now report what programs and procedures they have access to and will only pull jobs they think they can execute.
(GH#222) All of FractalClient’s methods now have full docstrings and type annotations for clarity.
(GH#222) Massive overhaul of the REST interface to simplify internal calls from the client and server side.
(GH#223) TorsionDriveDataset objects are modeled through pydantic objects to allow easier interfacing with the database back end and data validation.
Bug Fixes¶
(GH#215) Dask Jobqueue for the qcfractal-manager is now tested and working. This resolves the outstanding issue introduced in GH#211 and pushed in v0.5.3.
(GH#216) Tasks are now stored as TaskRecord pydantic objects, which preempts a bug introduced from providing the wrong schema.
(GH#217) Standalone QCPortal installs now report the correct version.
(GH#221) Fixed a bug in ReactionDataset.query where passing in None was treated as a string.
0.5.3 / 2019-03-13¶
New Features¶
(GH#207) All compute operations can now be augmented with a tag, which can later be consumed by different QueueManagers to only carry out computations with specified tags.
(GH#210) Passwords in the database can now be generated for new users, and user information can be updated (server-side only).
(GH#210) Collections can now be updated automatically from the defaults.
(GH#211) The qcfractal-manager CLI command now accepts a config file for more complex managers through Dask JobQueue. As such, many of the command line flags have been altered and can be used to either spin up a PoolExecutor or overwrite the config file on the fly. As of this PR, the Dask Jobqueue component has been untested. Future updates will indicate when this has been tested.
Enhancements¶
(GH#203) FractalClient’s get_X methods have been renamed to query_X to better reflect what they actually do. An exception to this is the get_collections method, which is still a true get.
(GH#207) FractalClient.list_collections now shows case-sensitive results, and queries are case-insensitive.
(GH#207) FractalServer can now compress responses to reduce the amount of data transmitted over the serialization. The main benefactor here is the OpenFFWorkflow collection, which has significant transfer speed improvements due to compression.
(GH#207) The OpenFFWorkflow collection now has better validation on input and output data.
(GH#210) The OpenFFWorkflow collection only stores database id to reduce duplication and data transfer quantities. This results in about a 50x duplication reduction.
(GH#211) The qcfractal-template command now has fields for Fractal username and password.
(GH#212) The docs for QCFractal and QCPortal have been split into separate structures. They will be hosted on separate (although linked) pages, but their content will all be kept in the QCFractal source code. QCPortal’s docs are for most users, whereas QCFractal docs will be for those creating their own Managers, Fractal instances, and developers.
Bug Fixes¶
(GH#207) FractalClient.get_collections is now correctly case insensitive.
(GH#210) Fixed a bug in the iterate method of services which returned the wrong status if everything completed right away.
(GH#210) The repr of the MongoEngine Socket now displays correctly instead of crashing the socket due to a missing attribute.
0.5.2 / 2019-03-08¶
New Features¶
(GH#197) New FractalClient instances will automatically connect to the central MolSSI Fractal Server.
Enhancements¶
(GH#195) Read-only access has been granted to many objects separate from their write access. This is in contrast to the previous model where either there was no access security, or everything was access secure.
(GH#197) Unknown stoichiometries are no longer allowed in the ReactionDataset.
(GH#197) The CLI for FractalServer uses Executor only, to encourage using the Template Generator introduced in GH#177.
(GH#197) Dataset objects can now query keywords from aliases as well.
Bug Fixes¶
(GH#195) Managers can no longer pull too many tasks and potentially lose data due to query limits.
(GH#195) Records now correctly add Provenance information.
(GH#196) The compute_torsion example was updated to reflect API changes.
(GH#197) Fixed an issue where CLI input flags were not correctly overwriting default values.
(GH#197) Fixed an issue where Collections were not correctly updating when the save function was called on existing objects in the database.
(GH#197) _qcfractal_tags are no longer errantly carried through the Records objects.
(GH#197) Stoichiometry information is no longer accepted in the Dataset object since it is not used in this class of object anymore (see ReactionDataset).
0.5.1 / 2019-03-04¶
New Features¶
(GH#177) Adds a new qcfractal-template command to generate qcfractal-manager scripts.
(GH#181) Pagination is added to queries, defaults to 1000 matches.
(GH#185) Begins setup documentation.
(GH#186) Begins database design documentation.
(GH#187) Results add/update is now simplified to always store entire objects rather than update partials.
(GH#189) All database compute records now go through a single BaseRecord class that validates and hashes the objects.
Enhancements¶
(GH#175) Refactors query massaging logic to a single function, ensures all program queries are lowercase, etc.
(GH#175) Keywords are now lazy reference fields.
(GH#182) Reworks models to have strict fields, and centralizes object hashing with many tests.
(GH#183) Centralizes duplicate checking so that accidental mixed-case duplicate results can no longer slip through.
(GH#190) Adds QCArchive sphinx theme to the documentation.
0.5.0 / 2019-02-20¶
New Features¶
(GH#165) Separates datasets into a Dataset, ReactionDataset, and OptimizationDataset for future flexibility.
(GH#168) Services now save their Procedure stubs automatically, the same as normal Procedures.
(GH#169) setup.py now uses the README.md and conveys Markdown to PyPI.
(GH#171) Molecule addition now takes in a flat list and returns a flat list of IDs rather than using a dictionary.
(GH#173) Services now return their corresponding Procedure ID fields.
Enhancements¶
0.5.0rc1 / 2019-02-15¶
New Features¶
(GH#114) A new Collection, Generic, has been added to allow semi-structured user-defined data to be built without relying only on implemented collections.
(GH#125) QCElemental common pydantic models have been integrated throughout the QCFractal code base, making a common model repository so that the prevalent Molecule object (and others) come from a single source. Also converted QCFractal to pass serialized pydantic objects between QCFractal and QCEngine to allow validation and (de)serialization of objects automatically.
(GH#130, GH#142, and GH#145) Pydantic serialization has been added to all REST calls leaving and entering both QCFractal Servers and QCFractal Portals. This allows automatic REST call validation and formatting on both server and client sides.
(GH#141 and GH#152) A new GridOptimizationRecord service has been added to QCFractal. This feature supports relative starting positions from the input molecule.
Enhancements¶
General note: Options objects have been renamed to KeywordSet to better match their goal (see GH#155).
(GH#110) QCFractal now depends on QCElemental and QCEngine to improve consistent imports.
(GH#116) Queue Manager Adapters are now more generalized and inherit more from the base classes.
(GH#118) Single and Optimization procedures have been streamlined to have simpler submission specifications and less redundancy.
(GH#133) Fractal Server and Queue Manager startups are much more verbose and include version information.
(GH#135) The TorsionDriveService has a much more regular structure based on pydantic models and a new TorsionDrive model has been created to enforce both validation and regularity.
(GH#143) Tasks in the Mongo database can now be referenced by multiple Results and Procedures (i.e. a single Result or Procedure does not have ownership of a Task).
(GH#149) Package import logic has been reworked to reduce the boot-up time of QCFractal from 3000ms at the worst to about 600ms.
(GH#150) KeywordSet objects are now modeled much more consistently through pydantic models and are consistently hashed to survive round-trip serialization.
(GH#153) Datasets now support option aliases which map to the consistent KeywordSet models from GH#150.
(GH#155) Adding multiple Molecule or Result objects to the database at the same time now always returns their Database IDs if added, and the order of the returned list of IDs matches the input order. This PR also renamed Options to KeywordSet to properly reflect the goal of the object.
(GH#162) Services can now be saved and fetched from the database through MongoEngine with document validation on both actions.
Bug Fixes¶
(GH#132) Fixed MongoEngine Socket bug where calling some functions before others resulted in an error due to lack of initialized variables.
(GH#133) Molecule objects cannot be oriented once they enter the QCFractal ecosystem (after optional initial orientation). Molecule objects also cannot be oriented by programs invoked by the QCFractal ecosystem, so orientation is preserved post-calculation.
(GH#146) CI environments have been simplified to make maintaining them easier, improve test coverage, and find more bugs.
(GH#158) Database addition documents in general will strip IDs from the input dictionary, which previously caused issues because MongoEngine gives special treatment to the dictionary key “id”.
0.4.0a / 2019-01-15¶
This is the fourth alpha release of QCFractal focusing on the database backend and compute manager enhancements.
New Features¶
(GH#78) Migrates Mongo backend to MongoEngine.
(GH#78) Overhauls tasks so that results or procedures own a task and ID.
(GH#78) Results and procedures are now inserted upon creation, not just completion. Added a status field to results and procedures.
(GH#78) Overhauls storage API to no longer accept arbitrary JSON queries, but now pinned kwargs.
(GH#106) Compute managers now have heartbeats and tasks are recycled after a manager has not been heard from after a preset interval.
(GH#106) Managers now also quietly shutdown on SIGTERM as well as SIGINT.
0.3.0a / 2018-11-02¶
This is the third alpha release of QCFractal focusing on a command line interface and the ability to have multiple queues interacting with a central server.
New Features¶
(GH#72) Queues are no longer required of FractalServer instances; now separate QueueManager instances can be created that push and pull tasks to the server.
(GH#75) CLIs have been added for the qcfractal-server and qcfractal-manager instances.
(GH#83) The status of server tasks and services can now be queried from a FractalClient.
(GH#82) OpenFF Workflows can now add single optimizations for fragments.
Enhancements¶
(GH#74) The documentation now has flowcharts showing task and service pathways through the code.
(GH#73) Collection .data attributes are now typed and validated with pydantic.
(GH#85) The CLI has been enhanced to cover additional features such as queue-manager ping time.
(GH#84) QCEngine 0.4.0 and geomeTRIC 0.9.1 versions are now compatible with QCFractal.
0.2.0a / 2018-10-02¶
This is the second alpha release of QCFractal containing architectural changes to the relational pieces of the database. Base functionality has been expanded to generalize the collection idea with BioFragment and OpenFFWorkflow collections.
Documentation¶
(GH#58) An overview of the QCArchive project was added to demonstrate how all modules connect together.
New Features¶
(GH#57) OpenFFWorkflow and BioFragment collections to support OpenFF use cases.
(GH#57) Requested compute will now return the id of the new submissions or the id of the completed results if duplicates are submitted.
(GH#67) The OpenFFWorkflow collection now supports querying of individual geometry optimization trajectories and associated data for each torsiondrive.
Enhancements¶
(GH#43) Services and Procedures now exist in the same unified table when complete as a single procedure can be completed in either capacity.
(GH#44) The backend database was renamed to storage to prevent misunderstanding of the Database collection.
(GH#47) Tests that require an active Mongo instance are now correctly skipped.
(GH#51) The queue now uses a fast hash index to determine uniqueness and prevent duplicate tasks.
(GH#52) QCFractal examples are now tested via CI.
(GH#53) The MongoSocket get_generic_by_id was deprecated in favor of get_generic, where an ID can be a search field.
(GH#61, GH#64) TorsionDrive now tracks tasks via ID rather than hash to ensure integrity.
(GH#63) The Database collection was renamed Dataset to more correctly illuminate its purpose.
(GH#65) Collections can now be acquired directly from a client via the client.get_collection function.
0.1.0a / 2018-09-04¶
This is the first alpha release of QCFractal containing the primary structure of the project and base functionality.
New Features¶
(GH#41) Molecules can now be queried by molecule formula
(GH#39) The server can now use SSL protection and auto-generates SSL certificates if no certificates are provided.
(GH#31) Adds authentication to the FractalServer instance.
(GH#26) Adds TorsionDrive (formerly Crank) as the first service.
(GH#26) Adds a “services” feature which can create large-scale iterative workflows.
(GH#21) QCFractal now maintains its own internal queue and uses queuing services such as Fireworks or Dask only for the currently running tasks
Enhancements¶
(GH#40) Examples can now be tested through PyTest.
(GH#38) First major documentation pass.
(GH#37) Canonicalizes string formatting to the "{}".format usage.
(GH#36) Fireworks workflows are now cleared once complete to keep the active entries small.
(GH#35) The “database” table can now be updated so that database entries can evolve over time.
(GH#32) TorsionDrive services now track all computations that are completed rather than just the last iteration.
(GH#30) Creates a Slack Community and auto-invite badge on the main readme.
(GH#24) Remove conda-forge from conda-envs so that more base libraries can be used.
Bug Fixes¶
Innumerable bug fixes and improvements in this alpha release.