Welcome to HDNNP’s documentation!

What is HDNNP?

This program is an implementation of HDNNP, which was proposed by Behler et al. [Ref]
HDNNP stands for High Dimensional Neural Network Potential.
HDNNP is a machine learning potential used to reduce the computational cost of DFT (Density Functional Theory) calculations.
Currently, energy and force prediction using symmetry functions has been implemented.
[Ref] https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
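
For orientation, the standard Behler-Parrinello form (as given in the cited literature, not read from this code) writes the total energy as a sum of atomic energies. Each atomic energy is predicted by the network of that atom's element \(s_i\) from its descriptor vector \(\boldsymbol{G}_i\), and forces follow from the chain rule, which is why force training needs derivatives of the descriptors with respect to atomic positions \(\boldsymbol{r}_j\):

\[E = \sum_{i=1}^{N} E_{s_i}(\boldsymbol{G}_i)\]

\[\boldsymbol{F}_j = -\frac{\partial E}{\partial \boldsymbol{r}_j} = -\sum_{i=1}^{N} \sum_{k} \frac{\partial E_{s_i}}{\partial G_{i,k}} \frac{\partial G_{i,k}}{\partial \boldsymbol{r}_j}\]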

How to install HDNNP

Python installation

We recommend installing Python with pyenv, because it lets users without sudo privileges install any Python version on any computer.
We have confirmed that this program works only with Python 3.6.7.
(on Linux)
$ git clone https://github.com/yyuu/pyenv.git ~/.pyenv
(on MacOS)
$ brew install pyenv

$ echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bash_profile
$ echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bash_profile
$ echo 'eval "$(pyenv init -)"' >> ~/.bash_profile
$ source ~/.bash_profile

$ pyenv install 3.6.7

Get source code

Note

This program is still under development and has not been uploaded to PyPI.
You have to get the source code and install it manually.
$ git clone https://github.com/ogura-edu/HDNNP.git

Install dependencies and this program

Via pipenv

$ cd HDNNP/
$ pyenv local 3.6.7
$ pip install pipenv
$ pipenv install --dev

(activate)
$ pipenv shell

(for example:)
(HDNNP) $ hdnnpy train

(deactivate)
(HDNNP) $ exit

Via anaconda

Anaconda can also be installed with pyenv.

$ cd HDNNP/
$ pyenv install anaconda3-xxx
$ pyenv local anaconda3-xxx
$ conda env create -n HDNNP --file condaenv.yaml

(activate)
$ conda activate HDNNP

(for example:)
(HDNNP) $ hdnnpy train

(deactivate)
(HDNNP) $ conda deactivate

Via raw pip

You can also install all dependencies manually. They are listed in any of Pipfile, condaenv.yaml, or requirements.txt.

$ cd HDNNP/
$ pip install PKG1 PKG2 ...
$ pip install --editable .

How to use HDNNP

Data generation

Usually, HDNNP is used to reduce computational cost by learning the results of DFT (Density Functional Theory) calculations, which are highly accurate but expensive.
Therefore, the first step is to generate a training dataset with DFT calculations such as ab initio MD.

Pre-processing

The HDNNP training application supports only the .xyz file format.
We provide a Python script to convert VASP output files such as OUTCAR to .xyz format; output from other DFT programs can be converted to .xyz in the same way.
Internally, the conversion is performed with the ASE package.
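
Because the training application reads only .xyz files, it may help to see exactly what one frame looks like. The repository's converter relies on ASE; the function below is instead a hypothetical, dependency-free sketch (the name format_extxyz_frame is made up) that emits one structure in the same layout as the GaN.xyz example further down: the atom count, a comment line with Lattice, Properties, energy, tag and pbc, then one line per atom with species, position and force.

```python
def format_extxyz_frame(symbols, positions, forces, energy, lattice, tag,
                        pbc=(True, True, True)):
    """Format one structure as an extended-XYZ frame (hypothetical helper)."""
    if not len(symbols) == len(positions) == len(forces):
        raise ValueError("symbols, positions and forces must have equal length")
    # 3x3 cell flattened row by row, as in the Lattice="..." field
    lattice_str = " ".join(repr(float(v)) for row in lattice for v in row)
    pbc_str = " ".join("T" if p else "F" for p in pbc)
    comment = ('Lattice="{}" Properties=species:S:1:pos:R:3:forces:R:3 '
               'energy={} tag={} pbc="{}"').format(lattice_str, energy, tag, pbc_str)
    lines = [str(len(symbols)), comment]
    for symbol, pos, force in zip(symbols, positions, forces):
        values = list(pos) + list(force)  # x y z fx fy fz
        lines.append("{:<4}".format(symbol)
                     + "".join("{:16.8f}".format(v) for v in values))
    return "\n".join(lines) + "\n"
```

Concatenating the returned strings for every structure gives a multi-frame file. Units and sign conventions are whatever your DFT code reports; this sketch does not convert them.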

Training

Configuration

A default configuration file for training is located in examples/training_config.py.

training_config.py sets options for the following subclasses of traitlets.config.Configurable:

  • c.Application.xxx
  • c.TrainingApplication.xxx
  • c.DatasetConfig.xxx
  • c.ModelConfig.xxx
  • c.TrainingConfig.xxx

The following settings are required; all others are optional.

  • c.DatasetConfig.parameters
  • c.ModelConfig.layers
  • c.TrainingConfig.data_file
  • c.TrainingConfig.batch_size
  • c.TrainingConfig.epoch
  • c.TrainingConfig.order
  • c.TrainingConfig.loss_function
  • c.TrainingConfig.interval
  • c.TrainingConfig.patients

For details of each setting, see training_config.py.

Command line interface

Execute the following command in the directory where training_config.py is located.

$ hdnnpy train

Note

Currently, if the output directory set by c.TrainingConfig.out_dir already exists, existing files in it are overwritten.
To avoid this, change c.TrainingConfig.out_dir for each run.
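
Since training_config.py is an ordinary Python file executed by traitlets, one way to follow this advice is to compute a fresh directory name inside the config itself. The helper below is hypothetical and not part of hdnnpy:

```python
import os


def unique_out_dir(base="output"):
    """Return `base`, or the first of `base_1`, `base_2`, ... that does not exist yet."""
    if not os.path.exists(base):
        return base
    n = 1
    while os.path.exists("{}_{}".format(base, n)):
        n += 1
    return "{}_{}".format(base, n)


# Inside training_config.py (hypothetical usage):
# c.TrainingConfig.out_dir = unique_out_dir('output')
```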

Prediction

Configuration

A default configuration file for prediction is located in examples/prediction_config.py.

prediction_config.py sets options for the following subclasses of traitlets.config.Configurable:

  • c.Application.xxx
  • c.PredictionApplication.xxx
  • c.PredictionConfig.xxx

The following settings are required; all others are optional.

  • c.PredictionConfig.data_file
  • c.PredictionConfig.order

For details of each setting, see prediction_config.py.

Command line interface

Execute the following command in the directory where prediction_config.py is located.

$ hdnnpy predict

Post-processing

A trained HDNNP can be used to run MD simulations with LAMMPS.
However, this feature is also under development.
We welcome your comments and suggestions.

HDNNP-LAMMPS interface program

Command line interface

Execute the following command.

$ hdnnpy convert
Two command-line options are available; this command does not use a config file.
To see details of these options, use
$ hdnnpy convert -h

Execution example

GaN interatomic potential

This section shows an example of HDNNP training for GaN using the 1st-order derivative of the interatomic potential (i.e. interatomic forces).

Data file

Prepare a .xyz format file containing structures with energy and force data.

GaN.xyz

32
Lattice="6.46474316 0.0 0.0 -3.23237159 5.5986318 0.0 0.0 0.0 10.53232454" Properties=species:S:1:pos:R:3:forces:R:3 energy=-194.5164333 tag=CrystalGa16N16 pbc="T T T"
Ga       1.61619000       0.93311000       2.62845000       0.00000300       0.00001200      -0.00570900
Ga       3.23237000       3.73242000       2.62845000       0.00003900      -0.00004700      -0.00571500
Ga       4.84856000       0.93311000       2.62845000       0.00000400      -0.00001100      -0.00563600
Ga      -0.00000000       3.73242000       7.89461000      -0.00003800       0.00003200      -0.00564200
Ga       1.61619000       0.93311000       7.89461000       0.00006100      -0.00001800      -0.00571100
Ga       3.23237000       3.73242000       7.89461000       0.00002100      -0.00006400      -0.00572000
Ga       4.84856000       0.93311000       7.89461000      -0.00003200      -0.00002300      -0.00565600
Ga      -0.00000000       3.73242000       2.62845000       0.00002100      -0.00002000      -0.00565100
Ga      -0.00000000       1.86621000       5.26153000      -0.00006900       0.00005900      -0.00572300
Ga       1.61619000       4.66553000       5.26153000      -0.00002700       0.00008200      -0.00571900
Ga       3.23237000       1.86621000       5.26153000       0.00001800      -0.00001400      -0.00566500
Ga      -1.61619000       4.66553000      10.52769000      -0.00002700      -0.00002600      -0.00566900
Ga      -0.00000000       1.86621000      10.52769000      -0.00002200       0.00008500      -0.00568700
Ga       1.61619000       4.66553000      10.52769000       0.00000600      -0.00002400      -0.00574300
Ga       3.23237000       1.86621000      10.52769000       0.00000100       0.00007600      -0.00564000
Ga      -1.61619000       4.66553000       5.26153000       0.00002200      -0.00000200      -0.00568800
N       1.61619000       0.93311000       4.61253000       0.00005500      -0.00002000      -0.00041000
N       3.23237000       3.73242000       4.61253000       0.00003600      -0.00000900      -0.00037900
N       4.84856000       0.93311000       4.61253000      -0.00004100       0.00000700      -0.00041100
N      -0.00000000       3.73242000       9.87869000      -0.00001300      -0.00003500      -0.00042500
N       1.61619000       0.93311000       9.87869000       0.00001200       0.00002900      -0.00040900
N       3.23237000       3.73242000       9.87869000       0.00002700      -0.00006200      -0.00041700
N       4.84856000       0.93311000       9.87869000      -0.00000400       0.00002500      -0.00041500
N      -0.00000000       3.73242000       4.61253000      -0.00004500      -0.00000400      -0.00041800
N      -0.00000000       1.86621000       1.97945000       0.00000000      -0.00000800      -0.00034400
N       1.61619000       4.66553000       1.97945000      -0.00000200       0.00000500      -0.00033700
N       3.23237000       1.86621000       1.97945000       0.00001700       0.00001600      -0.00036100
N      -1.61619000       4.66553000       7.24561000       0.00002800      -0.00002300      -0.00036000
N      -0.00000000       1.86621000       7.24561000      -0.00008200       0.00001500      -0.00043200
N       1.61619000       4.66553000       7.24561000      -0.00002200       0.00004200      -0.00040100
N       3.23237000       1.86621000       7.24561000       0.00001900      -0.00001200      -0.00039500
N      -1.61619000       4.66553000       1.97945000       0.00000400      -0.00001800      -0.00046000
32
Lattice="6.46474316 0.0 0.0 -3.23237159 5.5986318 0.0 0.0 0.0 10.53232454" Properties=species:S:1:pos:R:3:forces:R:3 energy=-169.96635976 tag=CrystalGa16N16 pbc="T T T"
Ga       1.44265000       1.46790000       2.04947000      -0.95595000      -3.56110800       2.54045000
Ga       2.88538000       4.34404000       2.89380000       4.75932000      -2.04809500      -1.43108200
Ga       4.38372000       0.68215000       2.61606000       0.15090500       6.97113700       2.40537400
Ga       0.47836000       3.95213000       7.90284000      -3.31821700      -0.13409600      -0.21437100
Ga       1.82415000       1.43420000       8.18380000      -0.78327100      -2.70531000      -3.50469000
Ga       3.49351000       3.96284000       7.92622000       1.84595600      -0.42627100      -0.16593100
Ga       5.17229000       0.83662000       7.71745000      -0.46937900       1.21688400       1.11923500
Ga      -0.04508000       3.95689000       2.71946000      -3.88117900      -1.84159800       0.64959300
Ga      -0.96518000       1.98086000       5.22137000       1.12890800      -1.31857500      -0.37168600
Ga       1.18573000       3.20454000       5.22045000       1.58317800       1.58466500       0.77557000
Ga       2.91073000       1.45415000       5.60119000      -0.29420600      -1.79185700      -2.55652100
Ga      -0.99634000       4.45389000       0.07004000      -2.39983600       3.43545000       1.27018200
Ga       0.17764000       1.60544000      10.36435000       6.30208700       4.30252400       2.73199900
Ga       2.35420000       4.13573000       0.39168000      -1.28509600      -0.64262000      -3.92936300
...
4
Lattice="3.21629013 0.0 0.0 -1.60814507 2.78538896 0.0 0.0 0.0 5.23996246" Properties=species:S:1:pos:R:3:forces:R:3 energy=-24.3605335 tag=CrystalGa2N2 pbc="T T T"
Ga       1.60815000       0.92846000       2.61537000       0.00057000      -0.00032400      -0.00131800
Ga       0.00000000       1.85693000       5.23535000      -0.00055000       0.00030900      -0.00128000
N       1.60815000       0.92846000       4.58958000       0.00038300      -0.00020300       0.00049500
N       0.00000000       1.85693000       1.96960000      -0.00030900       0.00021200       0.00050600
4
Lattice="3.21629013 0.0 0.0 -1.60814507 2.78538896 0.0 0.0 0.0 5.23996246" Properties=species:S:1:pos:R:3:forces:R:3 energy=-24.04284841 tag=CrystalGa2N2 pbc="T T T"
Ga       1.56998000       1.01961000       2.64712000       0.37879200      -0.65345000      -0.84588100
Ga       0.00233000       1.78610000       5.21359000       1.53422400       0.01126800       0.83092200
N       1.80998000       0.78162000       4.55671000      -1.91098000       0.49960800      -0.07141600
N      -0.02338000       1.90257000       1.95274000       0.00855700       0.14604000       0.09234500
4
Lattice="3.21629013 0.0 0.0 -1.60814507 2.78538896 0.0 0.0 0.0 5.23996246" Properties=species:S:1:pos:R:3:forces:R:3 energy=-24.07370026 tag=CrystalGa2N2 pbc="T T T"
Ga       1.68022000       0.78468000       2.59601000      -0.77026300       1.15126700       0.71828100
Ga      -0.04831000       1.97869000       0.01593000      -1.05203000       0.42443800      -0.31339000
N       1.47544000       1.12447000       4.57171000       1.50854300      -1.32922700      -0.04524600
N       0.01431000       1.77059000       1.98155000       0.31937700      -0.24596800      -0.35639000
4
Lattice="3.21629013 0.0 0.0 -1.60814507 2.78538896 0.0 0.0 0.0 5.23996246" Properties=species:S:1:pos:R:3:forces:R:3 energy=-24.06789171 tag=CrystalGa2N2 pbc="T T T"
Ga       1.55216000       1.03346000       2.59780000       1.76477100      -1.33788800       0.62275500
Ga       0.04645000       1.78043000       0.02483000      -0.39888700      -0.84820500      -0.84426800
N       1.59299000       0.75442000       4.54056000       0.36047300       1.45854900       0.51138400
N       0.06265000       1.88907000       1.95951000      -1.73396900       0.72932900      -0.27762300
4
Lattice="3.21629013 0.0 0.0 -1.60814507 2.78538896 0.0 0.0 0.0 5.23996246" Properties=species:S:1:pos:R:3:forces:R:3 energy=-24.10933618 tag=CrystalGa2N2 pbc="T T T"
Ga       1.62285000       0.92354000       2.56898000      -0.87387700       0.84344000       1.29437700
Ga      -0.00655000       1.82730000       0.04373000       0.63633100       1.10065300      -1.07564600
N       1.65007000       1.03662000       4.56438000      -0.83168500      -1.16592600       0.26072300
N      -0.08253000       1.92082000       1.98507000       1.07124400      -0.78418500      -0.47994500
4
Lattice="3.21629013 0.0 0.0 -1.60814507 2.78538896 0.0 0.0 0.0 5.23996246" Properties=species:S:1:pos:R:3:forces:R:3 energy=-24.15961153 tag=CrystalGa2N2 pbc="T T T"
Ga       1.61929000       0.86275000       2.60668000       0.91655600       0.12884500       0.02524600
Ga      -0.02746000       1.90759000       0.02534000      -0.00425900       0.48361500      -1.32527900
N       1.57325000       1.05930000       4.54898000       0.29235100      -0.94998800       0.25695700
N       0.11613000       1.80106000       1.90435000      -1.21017800       0.33509300       1.05032200
4
Lattice="3.21629013 0.0 0.0 -1.60814507 2.78538896 0.0 0.0 0.0 5.23996246" Properties=species:S:1:pos:R:3:forces:R:3 energy=-23.90497111 tag=CrystalGa2N2 pbc="T T T"
Ga       1.57753000       1.01962000       2.53889000      -0.58498700       0.38561600       1.95812800
Ga       0.05221000       1.77667000       0.06084000      -0.50913400      -1.39207300      -1.16507600
N       1.60109000       0.71987000       4.62834000       0.25821000       2.35785600      -0.69708500
N      -0.10050000       2.01120000       1.98576000       0.83273600      -1.35617800      -0.10520400
4
Lattice="3.21629013 0.0 0.0 -1.60814507 2.78538896 0.0 0.0 0.0 5.23996246" Properties=species:S:1:pos:R:3:forces:R:3 energy=-24.17936965 tag=CrystalGa2N2 pbc="T T T"
Ga       1.65588000       0.84325000       2.61391000      -0.48280700       0.58352400      -0.06140200
Ga      -0.05236000       1.91994000       0.00989000       1.13163900       0.73695700      -0.46324400
N       1.63413000       1.09260000       4.55873000      -1.08709100      -1.30806300       0.05205700
N      -0.00295000       1.80336000       1.93549000       0.44154800      -0.01662100       0.47920500
4
Lattice="3.21629013 0.0 0.0 -1.60814507 2.78538896 0.0 0.0 0.0 5.23996246" Properties=species:S:1:pos:R:3:forces:R:3 energy=-23.82707164 tag=CrystalGa2N2 pbc="T T T"
...

Config file

training_config.py (relevant parts only)

c.TrainingApplication.verbose = True

c.DatasetConfig.parameters = {
   'type1': [
       (5.0,),
       ],
   'type2': [
       (5.0, 0.01, 2.0),
       (5.0, 0.01, 3.2),
       (5.0, 0.01, 3.8),
       (5.0, 0.1, 2.0),
       (5.0, 0.1, 3.2),
       (5.0, 0.1, 3.8),
       (5.0, 1.0, 2.0),
       (5.0, 1.0, 3.2),
       (5.0, 1.0, 3.8),
       ],
   'type4': [
       (5.0, 0.01, -1, 1),
       (5.0, 0.01, -1, 2),
       (5.0, 0.01, -1, 4),
       (5.0, 0.01, 1, 1),
       (5.0, 0.01, 1, 2),
       (5.0, 0.01, 1, 4),
       (5.0, 0.1, -1, 1),
       (5.0, 0.1, -1, 2),
       (5.0, 0.1, -1, 4),
       (5.0, 0.1, 1, 1),
       (5.0, 0.1, 1, 2),
       (5.0, 0.1, 1, 4),
       (5.0, 1.0, -1, 1),
       (5.0, 1.0, -1, 2),
       (5.0, 1.0, -1, 4),
       (5.0, 1.0, 1, 1),
       (5.0, 1.0, 1, 2),
       (5.0, 1.0, 1, 4),
       ],
   }

c.DatasetConfig.preprocesses = [
   ('pca', (), {}),
   ]

c.ModelConfig.layers = [
   (90, 'tanh'),
   (90, 'tanh'),
   (1, 'identity'),
   ]

c.TrainingConfig.batch_size = 100

c.TrainingConfig.data_file = 'data/GaN.xyz'

c.TrainingConfig.epoch = 1000

c.TrainingConfig.interval = 10

c.TrainingConfig.loss_function = (
   'first_only',
   {}
   )

c.TrainingConfig.lr_decay = 1.0e-6

c.TrainingConfig.order = 1

c.TrainingConfig.out_dir = 'output'

c.TrainingConfig.patients = 5

c.TrainingConfig.scatter_plot = True
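
The sizes implied by this configuration can be worked out by hand. The sketch below assumes (an inference, not taken from the hdnnpy source) that the radial types type1 and type2 produce one feature per neighbouring element and type4 one feature per unordered pair of neighbouring elements. Under that assumption two elements give (1 + 9) × 2 + 18 × 3 = 74 features, matching the "Feature dimension: 74" reported in the log below, and each element's network, assuming dense layers with biases, has 15031 trainable parameters.

```python
from itertools import combinations_with_replacement

# Parameter sets of the example config (only their number matters here).
parameters = {
    "type1": [(5.0,)],
    "type2": [(5.0, eta, rs) for eta in (0.01, 0.1, 1.0) for rs in (2.0, 3.2, 3.8)],
    "type4": [(5.0, eta, lam, zeta)
              for eta in (0.01, 0.1, 1.0) for lam in (-1, 1) for zeta in (1, 2, 4)],
}
elements = ["Ga", "N"]
layers = [(90, "tanh"), (90, "tanh"), (1, "identity")]


def feature_dimension(parameters, elements):
    # Assumption: radial types count once per neighbouring element,
    # type4 once per unordered pair of neighbouring elements.
    n_pairs = len(list(combinations_with_replacement(elements, 2)))
    radial = len(parameters.get("type1", [])) + len(parameters.get("type2", []))
    return radial * len(elements) + len(parameters.get("type4", [])) * n_pairs


def count_weights(n_inputs, layers):
    # Dense layers with bias: n_in * n_out weights + n_out biases each.
    total = 0
    for n_out, _activation in layers:
        total += n_inputs * n_out + n_out
        n_inputs = n_out
    return total


dim = feature_dimension(parameters, elements)
print(dim)                         # 74
print(count_weights(dim, layers))  # 15031
```

Since PCA keeps all 74 components here (74 => 74 in the log), the network input size is unchanged by preprocessing.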

Command line log

Once you have edited the configuration file training_config.py, just run the single command hdnnpy train.

$ hdnnpy train

Construct sub dataset tagged as "CrystalGa16N16"
Successfully loaded & made needed symmetry_function dataset from <workdir>/data/CrystalGa16N16/symmetry_function.npz
Successfully loaded & made needed interatomic_potential dataset from <workdir>/data/CrystalGa16N16/interatomic_potential.npz

Initialized PCA parameters for Ga
    Feature dimension: 74 => 74
    Cumulative contribution rate = 0.9999999403953552


Initialized PCA parameters for N
    Feature dimension: 74 => 74
    Cumulative contribution rate = 1.0000001192092896

Construct sub dataset tagged as "CrystalGa2N2"
Successfully loaded & made needed symmetry_function dataset from <workdir>/data/CrystalGa2N2/symmetry_function.npz
Successfully loaded & made needed interatomic_potential dataset from <workdir>/data/CrystalGa2N2/interatomic_potential.npz
Saved PCA parameters to <workdir>/output/preprocess/pca.npz.
early stopping: operator is less
epoch       iteration   main/RMSE/force  main/RMSE/total  val/main/RMSE/force  val/main/RMSE/total
1           14          1.20575          1.20575          1.21576              1.21576
2           28          1.08758          1.08758          1.06121              1.06121
3           42          0.895798         0.895798         0.865482             0.865482
4           55          0.685623         0.685623         0.694789             0.694789
5           69          0.560702         0.560702         0.603832             0.603832
6           83          0.509542         0.509542         0.570984             0.570984
7           97          0.486743         0.486743         0.552533             0.552533
8           110         0.468966         0.468966         0.540375             0.540375
9           124         0.458917         0.458917         0.531327             0.531327
10          138         0.448132         0.448132         0.524466             0.524466
...
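
The line "early stopping: operator is less" and the patients setting suggest patience-based early stopping on a monitored validation value, where a smaller value counts as an improvement. The function below is a hypothetical sketch of that rule, not hdnnpy's or Chainer's code: it assumes one check per epoch and that training stops after patients consecutive checks without a new minimum.

```python
def early_stopping_epoch(val_losses, patients):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    bad_checks = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # "operator is less": a smaller value is an improvement
            best = loss
            bad_checks = 0
        else:
            bad_checks += 1
            if bad_checks >= patients:
                return epoch
    return None
```

Fed the val/main/RMSE/total column of the first ten epochs above with patients = 5, it does not stop, because the value improves at every epoch.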

Directory tree

After training, the directory tree looks like this:

workdir
├── data/
│   ├── GaN.xyz
│   ...
├── output/
│   ├── CrystalGa16N16/
│   │   ├── energy.png
│   │   ├── force.png
│   │   └── training.log
│   ├── CrystalGa2N2/
│   │   ├── energy.png
│   │   ├── force.png
│   │   └── training.log
│   ├── master_nnp.npz
│   ├── preprocess/
│   │   └── pca.npz
│   ├── training_config.py
│   └── training_result.yaml
└── training_config.py

Modules

Dataset tools

DatasetGenerator Deal out datasets as needed.
HDNNPDataset Combine and preprocess descriptor and property datasets.

Descriptor datasets

SymmetryFunctionDataset Symmetry function dataset for descriptor of HDNNP.

Property datasets

InteratomicPotentialDataset Interatomic potential dataset for property of HDNNP.

Dataset base classes

DescriptorDatasetBase Base class of atomic structure based descriptor dataset.
PropertyDatasetBase Base class of atomic structure based property dataset.

Atomic structure

AtomicStructure Wrapper class of ase.Atoms.

File parsing tools

parse_xyz Parse an xyz format file and group structures by tag.
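
As an illustration of grouping structures by tag, the following hypothetical stdlib sketch reads the layout of GaN.xyz and groups frames by their tag= value. It is not the package's parse_xyz (which builds on ASE); it also assumes the fixed column order species, position, forces and ignores the Properties field.

```python
import shlex
from collections import defaultdict


def parse_comment(line):
    """Split an extended-XYZ comment line into a {key: value} dict."""
    info = {}
    for token in shlex.split(line):  # handles quoted values like Lattice="..."
        key, _, value = token.partition("=")
        info[key] = value
    return info


def group_frames_by_tag(text):
    """Group frames by their tag= value (hypothetical sketch)."""
    lines = text.splitlines()
    groups = defaultdict(list)
    i = 0
    while i < len(lines):
        if not lines[i].strip():  # tolerate blank lines between frames
            i += 1
            continue
        n_atoms = int(lines[i])
        info = parse_comment(lines[i + 1])
        atoms = []
        # Assumed column order: species, x, y, z, fx, fy, fz
        for row in lines[i + 2:i + 2 + n_atoms]:
            fields = row.split()
            position = tuple(float(v) for v in fields[1:4])
            force = tuple(float(v) for v in fields[4:7])
            atoms.append((fields[0], position, force))
        groups[info.get("tag", "")].append(
            {"energy": float(info["energy"]), "atoms": atoms})
        i += 2 + n_atoms
    return dict(groups)
```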

Neural network potential models

HighDimensionalNNP High dimensional neural network potential.
MasterNNP Responsible for managing the parameters of each element.
SubNNP Feed-forward neural network representing one element or atom.
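
The division of labour above (one set of weights per element, applied to every atom of that element, with atomic energies summed) can be sketched in plain Python. The names SubNet and total_energy are hypothetical and this is not the Chainer implementation; weights are random, so only the structure matters.

```python
import math
import random


class SubNet:
    """Tiny dense network; one instance per element (hypothetical stand-in)."""

    def __init__(self, n_inputs, layers, seed=0):
        rng = random.Random(seed)
        self.layers = []
        for n_out, activation in layers:
            if activation not in ("tanh", "identity"):
                raise ValueError("unsupported activation: " + activation)
            weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_inputs))
                        for _ in range(n_inputs)] for _ in range(n_out)]
            biases = [0.0] * n_out
            self.layers.append((weights, biases, activation))
            n_inputs = n_out

    def __call__(self, x):
        for weights, biases, activation in self.layers:
            x = [sum(w * v for w, v in zip(row, x)) + b
                 for row, b in zip(weights, biases)]
            if activation == "tanh":
                x = [math.tanh(v) for v in x]
        return x[0]  # last layer has a single output: the atomic energy


def total_energy(species, descriptors, nets):
    """Sum atomic energies; atoms of the same element share one network."""
    return sum(nets[s](g) for s, g in zip(species, descriptors))
```

Sharing one network per element is what makes the total energy invariant to swapping two atoms of the same element.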

Pre-processing of dataset

PCA Principal component analysis (PCA).
Scaling Scale all feature values into a certain range.
Standardization Scale all feature values to be zero-mean and unit-variance.
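
Standardization and Scaling are column-wise transforms over the feature dimension. A minimal sketch, assuming the population standard deviation and a target range of [-1, 1] for scaling (both assumptions; the package's defaults may differ). In practice the statistics would be computed once on the training data and stored, as the log above does for PCA in pca.npz.

```python
import math


def standardize(features):
    """Column-wise zero mean / unit variance (population std)."""
    n = len(features)
    columns = list(zip(*features))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n)
            for col, m in zip(columns, means)]
    # Constant columns (std == 0) are mapped to 0 instead of dividing by zero.
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in features]


def min_max_scale(features, low=-1.0, high=1.0):
    """Column-wise linear scaling into [low, high]."""
    columns = list(zip(*features))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    # Constant columns are mapped to `low`.
    return [[low + (high - low) * (v - lo) / (hi - lo) if hi > lo else low
             for v, lo, hi in zip(row, mins, maxs)]
            for row in features]
```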

Pre-processing base class

PreprocessBase Base class of pre-processing.

Chainer-based training tools

Custom training extensions

ScatterPlot Trainer extension to output predictions/labels scatter plots.
set_log_scale Change the y-axis scale to logarithmic.

Loss functions

Zeroth Loss function to optimize the 0th-order property.
First Loss function to optimize 0th- and 1st-order properties.
Potential Loss function to optimize the 0th-order property as a scalar potential.

Loss function base class

loss_function.loss_function_base.LossFunctionBase

Training manager

Manager Context manager to take a trainer snapshot and decide whether to train or not.

Updater

Updater Updater for HDNNP training using HighDimensionalNNP and MasterNNP.

Utilities

MPI MPI world communicator and aliases.
pprint Pretty print function.

How to extend HDNNP

Dataset

An HDNNP dataset consists of a descriptor dataset and a property dataset.

Descriptor dataset

Currently, only the symmetry function dataset is implemented.
To use another descriptor dataset, define a class that inherits from
hdnnpy.dataset.descriptor.descriptor_dataset_base.DescriptorDatasetBase.
This base class defines several instance variables, properties, and instance methods for creating an HDNNP dataset.

In addition, override the following abstract methods.

  • generate_feature_keys
    Returns a list of unique keys along the feature dimension.
    Besides its internal use, it is also used by hdnnpy.dataset.HDNNPDataset to expand the feature dimension and zero-fill missing features.
  • calculate_descriptors
    The main function for calculating descriptors from an atomic structure (a wrapper of an ase.Atoms object).
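
For reference, the three symmetry function types used in the example configuration are, in Behler's standard formulation, a cutoff sum (type1), a radial Gaussian (type2) and an angular term (type4). The sketch below reads the config tuples as (Rc,), (Rc, eta, Rs) and (Rc, eta, lambda, zeta), which is an inference from the example values, and uses Behler's cosine cutoff. hdnnpy's actual cutoff, element resolution and periodic-image handling are not reproduced, so treat it as a hypothetical illustration of what calculate_descriptors computes for one atom.

```python
import math
from itertools import combinations


def cutoff(r, rc):
    """Behler's cosine cutoff (an assumption; hdnnpy may use another form)."""
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0) if r <= rc else 0.0


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def g1(center, neighbors, rc):
    """type1: sum of cutoff values over neighbours."""
    return sum(cutoff(distance(center, n), rc) for n in neighbors)


def g2(center, neighbors, rc, eta, rs):
    """type2: radial Gaussians centred at rs."""
    total = 0.0
    for n in neighbors:
        r = distance(center, n)
        total += math.exp(-eta * (r - rs) ** 2) * cutoff(r, rc)
    return total


def g4(center, neighbors, rc, eta, lam, zeta):
    """type4: angular term over unordered neighbour pairs (j < k)."""
    total = 0.0
    for a, b in combinations(neighbors, 2):
        rij, rik, rjk = distance(center, a), distance(center, b), distance(a, b)
        # cosine of the angle a-center-b
        dot = sum((p - c) * (q - c) for p, q, c in zip(a, b, center))
        cos_theta = dot / (rij * rik)
        total += ((1.0 + lam * cos_theta) ** zeta
                  * math.exp(-eta * (rij ** 2 + rik ** 2 + rjk ** 2))
                  * cutoff(rij, rc) * cutoff(rik, rc) * cutoff(rjk, rc))
    return 2.0 ** (1 - zeta) * total
```

Under this reading, the config entry (5.0, 0.01, 2.0) in 'type2' would correspond to g2(center, neighbors, 5.0, 0.01, 2.0).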

Property dataset

Currently, only the interatomic potential dataset is implemented.
To use another property dataset, define a class that inherits from
hdnnpy.dataset.property.property_dataset_base.PropertyDatasetBase.
This base class defines several instance variables, properties, and instance methods for creating an HDNNP dataset.

In addition, override the following abstract method.

  • calculate_properties
    The main function for getting properties from an atomic structure (a wrapper of an ase.Atoms object).

Preprocess

  • PCA
  • Scaling
  • Standardization

Loss function

Currently, the following loss functions are implemented for HDNNP training.

  • Zeroth
  • First

Zeroth uses the 0th-order error of the property to optimize HDNNP, while First uses both the 0th- and 1st-order errors, weighted by the parameter mixing_beta.
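
The text does not give the exact formula, so the following is only a hypothetical sketch of a mixed 0th/1st-order objective. It assumes RMSE errors and that mixing_beta weights the energy term while 1 - mixing_beta weights the force term; the package may use the opposite convention or a different error measure. With mixing_beta = 0 only the force term remains, similar to the first_only run in the example above, where RMSE/total equals RMSE/force.

```python
import math


def rmse(pred, label):
    """Root-mean-square error over flattened components."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, label)) / len(pred))


def mixed_loss(energy_pred, energy_label, force_pred, force_label, mixing_beta):
    """Hypothetical objective: beta * energy error + (1 - beta) * force error."""
    if not 0.0 <= mixing_beta <= 1.0:
        raise ValueError("mixing_beta must be in [0, 1]")
    energy_error = rmse(energy_pred, energy_label)
    force_error = rmse(force_pred, force_label)
    return mixing_beta * energy_error + (1.0 - mixing_beta) * force_error
```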

  • Potential

Potential uses the 2nd-order derivative of the descriptor dataset to train HDNNP so that the predicted forces satisfy the following condition:

\[\mathrm{rot}\, \boldsymbol{F} = \boldsymbol{0}\]

Then there exists a scalar potential \(\varphi\) such that

\[\boldsymbol{F} = \mathrm{grad}\, \varphi\]
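
The condition can be checked numerically: for a force field derived from a potential, the antisymmetric part of its Jacobian vanishes. The hypothetical helper below estimates the squared norm of rot F by central finite differences, which is one way to express the condition as a penalty; according to the text, the Potential loss uses analytic 2nd-order derivatives of the descriptors instead.

```python
def curl_penalty(force, x, h=1e-5):
    """Squared norm of rot F at point x, by central differences (hypothetical helper).

    `force` maps a 3-vector (list of floats) to a 3-vector.
    """
    def partial(i, j):
        # dF_i/dx_j
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        return (force(xp)[i] - force(xm)[i]) / (2.0 * h)

    cx = partial(2, 1) - partial(1, 2)
    cy = partial(0, 2) - partial(2, 0)
    cz = partial(1, 0) - partial(0, 1)
    return cx * cx + cy * cy + cz * cz
```

A gradient field such as F = -grad(x^2 + y^2 + z^2 + xy) gives a penalty of (numerically) zero, while a rotational field such as F = (-y, x, 0) gives 4.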

If you want to use another loss function, define a class that inherits from
hdnnpy.training.loss_function.loss_function_base.LossFunctionBase.
This base class defines several instance variables, properties, and instance methods.
