Welcome to BioShell package documentation!¶
What is BioShell¶
BioShell is a general bioinformatics toolkit, focused on biomolecular structures. It provides:
- Command line applications
- that have been distributed since the original 1.0 version of the package. Some of them have changed their names (e.g. HCPM has been renamed to clust)
- Many (currently over a hundred) small applications
- that also serve as integration tests. They come with example input data and expected output
- Python library
- the majority of BioShell classes can be used directly in Python
- C++ library
- which offers highly optimized implementations of frequently used bioinformatics algorithms and protocols.
BioShell functionality¶
BioShell functionality covers file processing such as data filtering and file format conversion. It handles protein sequences, sequence profiles and alignments. Structure calculation capabilities include superimpositions, crmsd calculations, alignments, Phi/Psi angles and many more.
Since its first publication, BioShell has provided a small set of command-line programs for easy data manipulation from a UNIX-like terminal or a shell script. The newest release extends this set by over a hundred simple command-line utilities. See the examples page to find which program can help you solve a particular problem.
BioShell command-line utilities¶
The original BioShell command line utilities are still maintained, although their functionality partly overlaps with the applications released with BioShell version 3.0. See the Programs page for details.
BioShell tests & examples¶
Since the most recently published version 3.0, the BioShell package comes with an extensive set of example applications, which have been created to reach three goals simultaneously:
- to extend the set of BioShell command line tools. Programs with names starting with ap_ are in fact additional applications. They differ from the standard apps in that each of them performs only a single action and has a simplified command line. These programs also serve as integration tests.
- to provide high quality code snippets that help BioShell users write their own programs. Small programs that show how to use a particular class or function are named ex_*. They also serve as unit tests.
- to test the code. Both ex_* and ap_* tests are automatically executed by a test server to ensure the quality and integrity of the package. Input data as well as the curated output of these tests are versioned in the git repository along with the source code.
All the examples are included in the respective API documentation pages. Since the tests are run continuously, they serve as a source of validated snippets for creating future programs.
BioShell library for Python (aka PyBioShell)¶
The BioShell distribution also provides bindings to the Python scripting language; that is, BioShell is also a versatile library for Python scripting. BioShell objects can be imported like any other Python modules. Example scripts are also included in the repository.
A precompiled library (a single .so file) for Unix distributions can be downloaded for the following Python versions. Click on an appropriate link below:
or type this command in your terminal:
curl -O http://bioshell.pl/downloads/bioshell/Python37/pybioshell.so
Remember to add the path to pybioshell.so in your PyBioShell script, e.g.
sys.path.append('/home/username/src.git/bioshell/bin/')
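The path set-up above can be wrapped in a small guard, so that a script reports a missing library instead of failing with a traceback. This is only a sketch: the helper names are made up for illustration, and the module name pybioshell is an assumption derived from the pybioshell.so file name.

```python
import importlib
import sys


def add_library_dir(path):
    """Append `path` to sys.path unless it is already listed there."""
    if path not in sys.path:
        sys.path.append(path)
    return sys.path


def try_import(module_name):
    """Return the imported module, or None when it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Example path copied from the text above; adjust it to your installation.
add_library_dir('/home/username/src.git/bioshell/bin/')

# Module name assumed from the file name pybioshell.so.
pybioshell = try_import('pybioshell')
print('pybioshell available:', pybioshell is not None)
```

Calling add_library_dir() repeatedly is harmless, because the path is appended only once.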
If you really need to compile your own version follow the instructions here
Previous versions¶
BioShell versions 1.x¶
The original BioShell package was designed as a suite of programs for pre- and post-processing in protein structure modeling protocols. The package provided a convenient set of tools for conversion between various sequence and structure formats. It was also possible to calculate simple properties of protein conformations. The very first commands (e.g. HCPM for clustering protein structures) were implemented in C; later the development switched to C++.
BioShell versions 2.x¶
Around 2006/07 BioShell was reimplemented in Java, designed as a library for scripting languages running on the Java Virtual Machine, most notably Python, but also Scala, Ruby, Groovy and many others. Currently the most recent stable release is 2.2. API docs as well as example scripts may be found in the documentation. All programs from the 1.x versions were also ported to Java.
Citations¶
- BioShell - the third version:
- Joanna M. Macnar, Natalia A. Szulc, Justyna D. Kryś, Aleksandra E. Badaczewska-Dawid and Dominik Gront “BioShell 3.0: Library for processing structural biology data.” Biomolecules 2020, 10, 461; https://doi.org/10.3390/biom10030461
- Three-dimensional protein threading:
- D. Gront, M. Blaszczyk, P. Wojciechowski, A. Kolinski “Bioshell Threader: protein homology detection based on sequence profiles and secondary structure profiles.” Nucleic Acids Research 2012 doi:10.1093/nar/gks555
- One-dimensional protein threading:
- P. Gniewek, A. Kolinski, D. Gront “Optimization of profile-to-profile alignment parameters for one-dimensional threading.” J. Computational Biology 2012 Jul;19(7):879-86
- BioShell - the second version:
- D. Gront and A. Kolinski “Utility library for structural bioinformatics” Bioinformatics 2008 24(4):584-585
- BBQ - program for backbone reconstruction:
- D. Gront, S. Kmiecik, A. Kolinski “Backbone Building from Quadrilaterals. A fast and accurate algorithm for protein backbone reconstruction from alpha carbon coordinates.” J. Comput. Chemistry 2007 28(9):1593-1597
- BioShell - the first version:
- D. Gront and A. Kolinski “BioShell - a package of tools for structural biology computations” Bioinformatics 2006 22(5):621-622
- Program for clustering protein structures (currently named clust):
- D. Gront and A. Kolinski “HCPM - program for hierarchical clustering of protein models” Bioinformatics 2005 21(14):3179-3180
Installation¶
This document describes how to install the binary programs of the BioShell toolkit. See the PyBioShell Installation page for instructions regarding the Python bindings.
The BioShell package is written in C++11 and must be built before use. This is a fairly easy process, which requires CMake (https://cmake.org) and a relatively modern C++ compiler such as gcc 5.0 or clang 10.0.
Just follow the steps below to compile the package:
The two additional sections below provide more information on customization of the building process:
1. Install zlib
BioShell requires the zlib library so that it can handle compressed files. You must install the developer version of the library to be able to compile BioShell. On Ubuntu Linux it can be installed with the command:
sudo apt-get install zlib1g-dev
2. Clone BioShell
If you haven’t done it yet, clone bioshell repository (https://bitbucket.org/dgront/bioshell/src/master/) from Bitbucket:
git clone https://bitbucket.org/dgront/bioshell.git
cd bioshell
This should create the bioshell directory in your current location. The second line steps into this new directory.
2.1 Clone submodules for Bioshell
The BioShell package now contains submodules needed for machine learning. Update the necessary submodules with this command:
git submodule update --init
Submodules will be downloaded to the external/ directory in the bioshell repository.
3. Run CMake:
cd build
cmake ..
The build directory will contain intermediate compilation files and may be deleted once BioShell is compiled.
The first line enters that directory; the second command calls cmake to set up the compilation process. CMake attempts to set up everything automatically, but it sometimes requires some guidance, e.g. to find the right compiler (see below).
4. Run Make:
make -j 4
where -j 4 allows make to use 4 cores to run compilations in parallel. This command will attempt to compile all targets; the list of all targets can be printed by make help. As one can see, each executable is a separate target. There are also predefined group targets:
- bioshell
- compiles only bioshell library
- bioshell-apps
- compiles bioshell library and bioshell toolkit applications, such as seqc and strc
- examples
- compiles all examples, i.e. all ap_ and ex_ applications
5. Set BIOSHELL_DATA_DIR path
The last step is to add the path to the data/ directory to your shell variables, e.g.
export BIOSHELL_DATA_DIR="/Users/username/bioshell/data"
or add this variable to your ~/.bashrc:
echo 'export BIOSHELL_DATA_DIR="/Users/username/bioshell/data" ' >> ~/.bashrc
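Before running any program it may be worth checking that the variable is really visible and points to an existing directory. The snippet below is a minimal sketch of such a check; the function name and the three status strings are hypothetical and are not part of BioShell.

```python
import os


def data_dir_status(environ=os.environ):
    """Classify the BIOSHELL_DATA_DIR setting found in `environ`.

    Returns "unset" when the variable is absent or empty, "missing" when it
    does not point to an existing directory, and "ok" otherwise.
    """
    value = environ.get("BIOSHELL_DATA_DIR")
    if not value:
        return "unset"
    if not os.path.isdir(value):
        return "missing"
    return "ok"


print("BIOSHELL_DATA_DIR:", data_dir_status())
```

Remember that a variable added to ~/.bashrc becomes visible only in shells started afterwards.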
6. Additional parameters for compilation
The procedure described above compiles the package with the default settings: a Release build with no profiling. To change this, remove everything from the ./build directory and generate new makefiles with new settings:
- to use a compiler other than the default one (e.g. gcc version 4.9), say:
cmake -DCMAKE_CXX_COMPILER=g++-4.9 -DCMAKE_C_COMPILER=gcc-4.9 -DCMAKE_BUILD_TYPE=Release ..
or, to use icc for instance:
cmake -DCMAKE_CXX_COMPILER=icc -DCMAKE_C_COMPILER=icc -DCMAKE_BUILD_TYPE=Release ..
- to select a different compiler and make a profiling build:
cmake -DCMAKE_CXX_COMPILER=icc -DCMAKE_C_COMPILER=icc -D PROFILE=ON -DCMAKE_BUILD_TYPE=Release ..
- to make a debug build, turn -DCMAKE_BUILD_TYPE=Release into -DCMAKE_BUILD_TYPE=Debug. So to make a debug build without changing the compiler, just say:
cmake -DCMAKE_BUILD_TYPE=Debug ..
- to make a profiling build (-pg option) for gcc or Xcode Instruments, add -D PROFILE=ON to the cmake command (the custom PROFILE variable test is implemented in the main CMakeLists.txt):
cmake -D PROFILE=ON ..
7. Using IDE
In the above examples, cmake was used to produce makefiles to compile BioShell. The cmake command may also be used to generate project files for other environments, in particular:
to produce an *.xcodeproj file for Xcode:
cmake -DCMAKE_BUILD_TYPE=Release -G Xcode
or to prepare solution files for Microsoft Visual Studio (must be run on a Windows machine):
cmake -DCMAKE_BUILD_TYPE=Release -G "Visual Studio 2013"
PyBioShell Installation¶
PyBioShell is a set of Python bindings to the BioShell library. It allows BioShell classes to be used like any other Python modules. The closest tool in terms of functionality is Biopython, which, however, is partially written in Python.
The easiest way to get PyBioShell on your machine is to download the precompiled library, available for the following Python versions. Click on an appropriate link below:
or type this command in your terminal:
curl -O http://bioshell.pl/~jkrys/pybioshell/pybioshell37/pybioshell.so
You also need the data/ directory, which contains files necessary to run BioShell. Download data.tar.gz, uncompress it and put it somewhere BioShell will be able to find it; see here for details.
Remember to add the path to pybioshell.so to your shell variables, e.g.
export PYTHONPATH="$PYTHONPATH:$HOME/bioshell/bin"
or add this variable to your ~/.bashrc:
echo 'export PYTHONPATH="$PYTHONPATH:$HOME/bioshell/bin" ' >> ~/.bashrc
Remember also to add the data/ directory to your shell variables. Look here for details.
Another way is to compile it from sources, following the steps given below. The procedure assumes your bioshell repository is located in src.git/bioshell/ and binder in src.git/binder/; these paths are arbitrary but the commands must be adjusted accordingly.
0. Prerequisites¶
In order to compile binder, you need the Ninja build tool (website) and cmake. You will also need Python headers, available from the python-dev package or similar (e.g. python3.5-dev). On Ubuntu Linux you can install them with apt-get:
sudo apt-get install ninja-build cmake python-dev
The use of the clang compiler is advised. Try to get clang-6.0 or newer (see this link).
1. Clone and compile binder¶
To clone binder from its GitHub repository:
git clone https://github.com/RosettaCommons/binder
cd binder
python3 ./build.py -j 4
where the last command actually builds binder, using four CPU cores. Note that binder uses more than 1 GB of disk space and its compilation may take a few hours.
2. Build PyBioShell¶
Open the scripts/build_pybioshell.py file and edit the variables to adapt them to your system. In particular, you will most likely have to fix the clang++ version (LINKER_CMD variable) as well as the path where the binder executable is located (BINDER_PATH variable).
Make a directory build_bindings in the main BioShell directory, i.e. in the directory where pybioshell.config is located.
Choose your Python version and run the compilation as follows:
python3 ./scripts/build_pybioshell.py -v 3.5
You should find your compiled version in bin/pybioshell.so. If you have any problems with compilation, please do not hesitate to contact us.
BioShell programs¶
Currently, BioShell distribution provides the following programs:
- seqc (cookbook recipes):
- sequence converter : a utility to convert between sequence data formats
- strc (cookbook recipes):
- structure converter : a utility to work with PDB files
- str_calc (cookbook recipes):
- structure calculator: performs various calculations on a PDB file
- clust (cookbook recipes):
- calculates hierarchical clustering of arbitrary objects based on a map of pairwise distances between them
- hist (cookbook recipes):
- simple utility to make 1D and 2D histograms
Now you can browse the BioShell cookbook or read the tutorials listed below.
clust tutorial: clustering sequences and structures¶
A clustering procedure allows one to divide an arbitrary number of objects into groups according to their mutual (dis)similarity. This method is widely used in bioinformatics and molecular modeling to deal with data sets that are too large to be inspected manually. Here we give two examples of Hierarchical Agglomerative Clustering with the BioShell package:
- to cluster a pool of protein sequences
- to cluster results of protein-peptide docking
The BioShell procedure for clustering divides the task into three steps:
1. Calculate a matrix of distances between the elements subjected to the clustering analysis. As a result, a flat text file should be produced. The three columns of that file must provide the i-th element ID, the j-th element ID and the respective distance value.
2. Run the actual clustering procedure. Although the procedure can be stopped at a particular cutoff distance, we advise running the calculations to completion, i.e. until all the objects are merged into a single cluster. The clustering tree will be stored in an output file.
3. Analyse the clustering tree to retrieve clusters at a desired cutoff level.
Below we show how to perform these three steps for two different clustering applications.
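To make the three steps more tangible, here is a toy pure-Python illustration of complete-link agglomerative clustering on a handful of elements. It is a naive sketch for illustration only and not the algorithm implemented in clust; the function names and the in-memory tree representation are made up for this example.

```python
from itertools import combinations


def complete_link_tree(ids, triples, missing_distance):
    """Merge clusters until one is left; return merges as (distance, c1, c2)."""
    dist = {frozenset((a, b)): d for a, b, d in triples}

    def pair(a, b):
        # Pairs absent from the input get the default (missing) distance.
        return dist.get(frozenset((a, b)), missing_distance)

    clusters = [frozenset([i]) for i in ids]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Complete link: cluster distance is the largest pairwise distance.
            d = max(pair(a, b) for a in c1 for b in c2)
            if best is None or d < best[0]:
                best = (d, c1, c2)
        d, c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 | c2)
        merges.append((d, c1, c2))
    return merges


def clusters_at(ids, merges, cutoff):
    """Replay merges up to `cutoff` and return the resulting clusters."""
    member_of = {i: frozenset([i]) for i in ids}
    for d, c1, c2 in merges:
        if d > cutoff:
            break  # complete-link merge distances never decrease
        merged = c1 | c2
        for i in merged:
            member_of[i] = merged
    return sorted(set(member_of.values()), key=sorted)


# Step 1: three-column distance data (element i, element j, distance).
ids = ["a", "b", "c", "d"]
triples = [("a", "b", 0.1), ("c", "d", 0.2), ("a", "c", 0.9)]
# Step 2: build the whole tree. Step 3: cut it at a chosen level.
tree = complete_link_tree(ids, triples, missing_distance=1.1)
print([sorted(c) for c in clusters_at(ids, tree, cutoff=0.4)])
```

Because the whole tree is built once, clusters for any other cutoff can be retrieved without repeating the expensive step, which is the reason for keeping steps 2 and 3 separate.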
Example 1. Clustering protein sequences by their mutual sequence identity¶
Step 1: Calculating the distances¶
The clustering procedure should merge close sequences (i.e. those with a small mutual distance) into a single cluster, while dissimilar sequences should be placed in different clusters. The sequence identity value (seq_id) cannot be used directly as a distance, because its largest value (1.0) denotes identical sequences. Here we propose to use 1.0 - seq_id as the distance function.
First we use the ap_PairwiseSequenceIdentityProtocol program to evaluate all pairwise distances:
ap_PairwiseSequenceIdentityProtocol inp.fasta 8 0.4 > seq_id.out 2>LOG
where inp.fasta is the input file (FASTA format), 8 is the number of cores (threads run in parallel) and 0.4 is the smallest seq_id value to be written to the output.
Then the seq_id values are converted into distances with the awk command line tool:
awk '{print $1,$2,1.0-$3}' seq_id.out > distances.out
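The same conversion can be done in Python, which may be convenient when the distances are processed further in a script. The sketch below assumes, as the awk command does, that the first three whitespace-separated columns hold the two sequence IDs and the seq_id value; unlike awk, it skips lines with fewer than three columns. The function name is hypothetical.

```python
def identity_to_distance(lines):
    """Turn 'id_i id_j seq_id' records into (id_i, id_j, 1.0 - seq_id) tuples."""
    converted = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        converted.append((fields[0], fields[1], 1.0 - float(fields[2])))
    return converted


example = ["seqA seqB 0.75", "seqA seqC 1.0"]
for id_i, id_j, distance in identity_to_distance(example):
    print(id_i, id_j, distance)
```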
Step 2: Clustering the data¶
Then we run the clust tool:
clust -in::file=distances.out \
    -n=46621 \
    -complete \
    -clustering:missing_distance=1.1 \
    -clustering:out:tree=tree-complete >clust_out 2>clust_log
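The -n option is necessary to provide the number of objects subjected to clustering (not the number of distance values!). The -clustering:missing_distance option provides the default distance value for pairs whose distance is undefined. Here 1.1 exceeds the largest possible distance (1.0), so pairs missing from the file are treated as more distant than any listed pair. The clustering tree will be stored in the file specified by the -clustering:out:tree option.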
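Since only pairs above the seq_id threshold are written to the distance file, some sequences may not appear there at all, so the value for -n is more safely taken from the FASTA input. A minimal sketch (both helper names are hypothetical) that counts the sequences and reports how many of them have no listed distance:

```python
def count_fasta_records(lines):
    """Number of sequences, i.e. of header lines starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def ids_in_distance_file(lines):
    """Set of element IDs occurring in a three-column distance file."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            ids.update(fields[:2])
    return ids


# Tiny in-memory stand-ins for inp.fasta and distances.out
fasta = [">s1", "MKV", ">s2", "MKI", ">s3", "GGG"]
distances = ["s1 s2 0.25"]

n = count_fasta_records(fasta)
print("value for -n:", n)
print("elements without any listed distance:",
      n - len(ids_in_distance_file(distances)))
```

For real files, pass an open file object (e.g. open("inp.fasta")) instead of the lists used here.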
Step 3: Analysis¶
clust -in::file=distances.out \
    -n=46621 \
    -clustering:in:tree=tree-complete \
    -clustering:out:clusters \
    -clustering:out:distance=0.4 \
    -clustering:out:min_size=1
Example 2. Clustering results of protein-peptide docking¶
The input data set contains 12500 conformations of a protein receptor (1jd4) with a short peptide bound to its surface. The conformations were calculated with FlexPepDocking program from Rosetta modelling suite.
Step 1: Calculating the distances¶
Step 2: Clustering the data¶
We run the clust program as above; just remember to put the correct input file name and to change the number of data elements (i.e. protein conformations):
clust -in::file=1jd4-pep-crsmd \
    -n=12500 \
    -complete \
    -clustering:missing_distance=15.1 \
    -clustering:out:tree=tree-complete >clust_out 2>clust_log
Step 3: Analysis¶
clust -in::file=all_vs_all_crmsd_15 \
    -n=12500 -clustering:out:clusters \
    -clustering:out:distance=2.5 \
    -clustering:out:min_size=10 \
    -clustering:in:tree=tree-complete
BioShell cookbook¶
This cookbook provides a bunch of handy one-liners that simplify daily tasks in structural bioinformatics.
bash-only recipes¶
Combine a bunch of .pdb files into a single multimodel PDB:
k=0; for i in *.pdb; do k=$(($k+1)); echo "MODEL "$k; cat $i; echo "ENDMDL"; done > all-pdb
mv all-pdb all.pdb
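For use inside a Python script, the same wrapping can be sketched as below. The function works on the text of each file rather than on file names, and it mirrors the bash one-liner, including the plain "MODEL k" record; the function name is made up for this example.

```python
def combine_models(pdb_texts):
    """Wrap each PDB text in MODEL/ENDMDL records and concatenate them."""
    parts = []
    for k, text in enumerate(pdb_texts, start=1):
        parts.append(f"MODEL {k}\n")
        # Make sure every model ends with a newline before ENDMDL.
        parts.append(text if text.endswith("\n") else text + "\n")
        parts.append("ENDMDL\n")
    return "".join(parts)


models = ["ATOM      1  CA  GLY A   1\n", "ATOM      1  CA  ALA A   1\n"]
print(combine_models(models), end="")
```

To process files, read them in sorted order, e.g. with sorted(glob.glob("*.pdb")), and write the result under a name that does not match the pattern, as the bash version does with all-pdb.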
1. seqc recipes¶
1.1 Create FASTA from PDB (prints FASTA on a screen):
seqc -in:pdb=2gb1.pdb -out:fasta
1.2 Create FASTA from PDB, including secondary structure:
seqc -in:pdb=2gb1.pdb -out:fasta -in::pdb::header -out:fasta:secondary
Secondary structure annotation is extracted from the PDB file header (the -in::pdb::header option is necessary to parse it).
1.3 Create SS2 file from PDB:
seqc -in:pdb=2gb1.pdb -out:ss2 -in::pdb::header
As above, the secondary structure is extracted from the PDB file header; all the probability values (last three columns in an SS2 file) are set either to \(1.0\) or \(0.0\).
1.4 Count secondary structure elements in a bunch of PDB files, create a nice table:
for i in 2gb1.pdb 2fdo.pdb 1rrx.pdb
do
    ss=`seqc -in:pdb=$i -out:ss2 -in::pdb::header -of -out::sequence::width=0 \
        | tail -1 | fold -w1 | uniq | sort | uniq -c | tr '\n' ' '`
    echo $i $ss
done 2>/dev/null
As in recipe 1.2, but this time a combination of a few bash commands is used to parse the output and count the number of secondary structure elements: coil (C), strands (E) and helices (H). Example output looks as below:
2gb1.pdb 6 C 4 E 1 H
2fdo.pdb 7 C 6 E 3 H
1rrx.pdb 16 C 11 E 5 H
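The fold | uniq | sort | uniq -c part of the pipeline collapses each run of identical letters into one element and then counts the elements of each type. The same logic can be sketched in Python with itertools.groupby; the function name is hypothetical and the input is a plain secondary structure string.

```python
from collections import Counter
from itertools import groupby


def count_ss_elements(ss_string):
    """Count runs of identical characters, one run per structural element."""
    return Counter(key for key, _run in groupby(ss_string))


# Two coil-separated strands followed by a helix
counts = count_ss_elements("CEEEECCEEEECHHHHHHC")
print(" ".join(f"{n} {code}" for code, n in sorted(counts.items())))
```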
1.5 Write FASTA file with only one line per sequence (un-wrap sequences)
seqc -in:fasta=in.fasta -out:sequence:width=0 -out:fasta
1.6 Convert ASN.1 sequence profile (psiblast output) into a text format
seqc -in:profile:asn1=d1or4A_.asn1 -out:profile:txt
1.7 As in recipe 1.6 (i.e. .asn1 -> .txt), but this time reorder profile columns
seqc -in:profile:asn1=d1or4A_.asn1 -out:profile:txt \
    -out:profile:columns=GAPVILMCHWFYKRQDNQST
1.8 Sort sequences from the longest to the shortest
seqc -in:fasta=in.fasta -seqc:sort -out:fasta
This recipe can be combined with recipe 1.5 (every FASTA sequence in a single line).
1.9 Basic sequence filtering
seqc -in:fasta=in.fasta -seqc:sort -select::sequence::protein -out:fasta \
    -select::sequence::long_at_least=30
Print only amino acid sequences (due to the -select::sequence::protein filter) that are at least 30 residues long.
1.10 Basic sequence filtering: keep nucleotide sequences
seqc -in:fasta=in.fasta -seqc:sort -select::sequence::nucleic -out:fasta \
    -select::sequence::long_at_least=30
Print only nucleic acid sequences (due to the -select::sequence::nucleic filter) that are at least 30 residues long.
2. strc recipes¶
2.1 Write only chain A of the given input PDB file
strc -in:pdb=5edw.pdb -select::chains=A -out:pdb=5edwA.pdb
2.2 Write only amino acids of chain A (ligands, water etc. will be removed)
strc -in:pdb=5edw.pdb -select::chains=A -out:pdb=5edwA.pdb -select::aa
2.3 Write only selected fragment of a given protein (residues from 1 to 83 of chain A)
strc -in:pdb=1PQX.pdb -select::substructure=A:1-83 -op=out.pdb
3. str_calc recipes¶
3.1 Find all pairwise all-atom crmsd distances between all the models in a given PDB
str_calc -in:pdb=2kmk-1.pdb -calc::crmsd -in:pdb::all_models -in:pdb:native=2KMK.pdb.gz
3.2 Read in only CA atoms; find all pairwise crmsd distances between all the models in a given PDB
str_calc -select::ca -in:pdb=2kmk-1.pdb -calc::crmsd -in:pdb::all_models \
    -in:pdb:native=2KMK.pdb.gz
3.3 Generate theoretical NOE restraints for a protein backbone
str_calc -in::pdb=2kwi.pdb -in:pdb:with_hydrogens \
    -calc::distmap::describe -calc::distmap::allatom
This command lists all distances between any two backbone atoms; the -in:pdb:with_hydrogens option forces BioShell to read hydrogen atoms, which are skipped by default, and -calc::distmap::describe turns on longer atom descriptions. The output may look as below:
A GLN 9 N 10 A GLY 8 N 1 3.602
A GLN 9 N 10 A GLY 8 CA 2 2.418
A GLN 9 N 10 A GLY 8 C 3 1.326
A GLN 9 N 10 A GLY 8 O 4 2.245
A GLN 9 N 10 A GLY 8 HA2 8 2.506
A GLN 9 N 10 A GLY 8 HA3 9 2.959
A GLN 9 CA 11 A GLY 8 N 1 4.834
A GLN 9 CA 11 A GLY 8 CA 2 3.788
A GLN 9 CA 11 A GLY 8 C 3 2.425
A GLN 9 CA 11 A GLY 8 O 4 2.756
str_calc -in::pdb=2kwi.pdb -in:pdb:with_hydrogens -calc::distmap::describe \
    -calc::distmap::allatom | awk '{if(($11<2.5) && ($3-$8>4)) print $0}'
This output is then filtered with awk. The output lines must satisfy two criteria: distance below 2.5 Angstroms and sequence separation greater than 4 residues.
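The awk filter can also be written in Python, e.g. when the restraints are to be processed further. The sketch below assumes the 11-column layout of the example output (residue numbers in columns 3 and 8, distance in column 11) and mirrors the awk conditions exactly; the function and parameter names are hypothetical.

```python
def filter_contacts(lines, max_distance=2.5, separation_above=4):
    """Keep lines with distance < max_distance and
    (residue number in column 3) - (residue number in column 8) > separation_above."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 11:
            continue  # not a distance record
        separation = int(fields[2]) - int(fields[7])
        if float(fields[10]) < max_distance and separation > separation_above:
            kept.append(line)
    return kept


rows = [
    "A GLN 9 N 10 A GLY 8 N 1 3.602",    # neighbours: rejected
    "A LEU 20 H 200 A GLY 8 O 4 2.100",  # close in space, far in sequence: kept
    "A LEU 20 H 200 A GLY 8 C 3 2.900",  # too distant: rejected
]
for row in filter_contacts(rows):
    print(row)
```

The second and third rows are invented for the demonstration; only the first one comes from the output shown above.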
3.4 Find all-atom crmsd distances between all models in a single PDB and the reference native structure
str_calc -in:pdb=2kmk-1.pdb -calc::crmsd -in:pdb::all_models -in:pdb:native=2KMK.pdb.gz
3.5 As in the above example, but after superimposing alpha-carbons, calculate crmsd on all the atoms:
str_calc -in:pdb=2kmk-1.pdb -calc::crmsd -in:pdb::all_models -in:pdb:native=2KMK.pdb.gz \
    -calc::crmsd::matching_atoms=A:*:_CA_ -calc::crmsd::rotated_atoms=A:*:*
3.6 Check peptide docking results: superimpose two structures using alpha carbons of chain A (i.e. the receptor) and calculate crmsd of CA atoms of chain B (i.e. the ligand)
str_calc -in:pdb=models-1.pdb -calc::crmsd -in:pdb::all_models -in:pdb:native=native.pdb \
    -calc::crmsd::matching_atoms=A:*:_CA_ -calc::crmsd::rotated_atoms=B:*:_CA_
Note that this recipe loads all models from the models-1.pdb file. For instance, if that file contains 10 structures, one can expect the following output:
# name crmsd len crmsd len
models-1-1.pdb 0.000 96 0.000 4
models-1-2.pdb 0.262 96 22.598 4
models-1-3.pdb 0.274 96 16.670 4
models-1-4.pdb 0.260 96 16.123 4
models-1-5.pdb 0.292 96 24.524 4
models-1-6.pdb 0.320 96 27.575 4
models-1-7.pdb 0.351 96 24.200 4
models-1-8.pdb 0.385 96 24.613 4
models-1-9.pdb 0.297 96 22.778 4
models-1-10.pdb 0.325 96 25.136 4
The first column identifies a model structure (name-of-input-file + dash + model number); the second and third provide the crmsd on the atoms used for superposition (CA atoms of chain A in this case) and the number of these atoms (here 96), respectively. Finally, the last two columns provide the crmsd and atom count for the rotated atom set. The results come from a tetrapeptide docking experiment, hence only 4 CA atoms were rotated.
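Such a table is easy to post-process, for example to pick the model whose ligand is closest to the native. The parser below is a minimal sketch that assumes the five-column layout shown above; the function name is made up for this example.

```python
def parse_crmsd_table(lines):
    """Parse 'name crmsd len crmsd len' rows, skipping comments and blanks.

    Returns tuples: (name, fit_crmsd, fit_atoms, rotated_crmsd, rotated_atoms).
    """
    rows = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        name, fit_rms, fit_n, rot_rms, rot_n = line.split()
        rows.append((name, float(fit_rms), int(fit_n), float(rot_rms), int(rot_n)))
    return rows


table = [
    "# name crmsd len crmsd len",
    "models-1-2.pdb 0.262 96 22.598 4",
    "models-1-3.pdb 0.274 96 16.670 4",
    "models-1-4.pdb 0.260 96 16.123 4",
]
best = min(parse_crmsd_table(table), key=lambda row: row[3])
print("lowest ligand crmsd:", best[0], best[3])
```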
4. clust recipes¶
4.1 Calculate hierarchical clustering of 140 elements; distances are stored in the tm_dist file.
clust -i=tm_dist -n=140 -clustering:out:distance=0.4
Prints clusters for critical distance 0.4. By default the single-link clustering strategy is used.
5. hist recipes¶
5.1 Calculate a histogram from the 14th column of a given input file:
hist -in:file=default.fsc -in:column=14 -hist:x_max=10 -hist:x_min=0
The command reads a score file produced by Rosetta and makes a histogram of crmsd, assuming it’s in the 14th column.
BioShell examples¶
There are three groups of examples for the BioShell library: ap_* programs are functional applications, helpful for every user, while ex_* programs show how to use a particular BioShell class or function in your own C++ program. Finally, Python scripts (*.py files) show how to solve bioinformatics problems using PyBioShell. You can automatically run these tests on your local machine. Use
python3 call_all_tests.py
in your bioshell/doc_examples/cc-examples directory to run ap_* and ex_*, or in the bioshell/doc_examples/py-examples directory to run *.py scripts. You will find test_results.html in either the cc-examples or the py-examples directory; open it with your web browser to see the test results summary.
Overall there are more than 200 examples that can be accessed from the index pages below:
BioShell examples list¶
The latest BioShell 3.0 distribution provides an extensive set of examples. They were created for three purposes:
- to facilitate continuous testing of the package (unit and integration tests)
- to provide additional functionality to the package, and
- to serve as coding examples and provide ready-to-use snippets
All the tests, which in practice are small C++ applications, are divided into two broad groups, with names starting with ap_ and ex_. In addition, we also provide example Python scripts which use the PyBioShell package.
ap_* programs¶
These are integration tests that, besides testing whether the package is bug-free, should also do something useful.
ap_BackboneHBondMap¶
Reads a PDB file and calculates a map of backbone hydrogen bonds, providing also the geometry of each bond. The resulting table, printed on the screen, provides:
- H donor residue name and id (columns 1 and 2)
- H acceptor residue name and id (columns 4 and 5)
- two distances: r(O..H) and r(N..O) (columns 7 and 8)
- planar (C-O..H) and dihedral (C-O..H-N) angles (columns 9 and 10)
- DSSP energy for this bond (column 11)
- X, Y, Z coordinates of the H atom in the local coordinate system (columns 12, 13 and 14)
- theta, phi spherical coordinates of the H atom (columns 15 and 16)
EXAMPLE OUTPUT:
# 42 hydrogen bonds found in backbone:
TYR 3 -> THR 18 : 2.620 3.346 165.58 94.25 -1.170 -0.702 0.211 2.515 16.25 163.29
LYS 4 -> LYS 50 : 2.259 3.156 120.71 159.88 -1.310 1.445 1.249 1.205 57.75 40.83
LEU 5 -> THR 16 : 1.838 2.802 143.84 -115.27 -2.834 0.102 -1.075 1.488 35.98 -84.57
USAGE:
./ap_BackboneHBondMap input.pdb
EXAMPLE:
./ap_BackboneHBondMap 5edw.pdb
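The table printed by this program can be loaded into Python for further analysis. The parser below is a sketch based solely on the column description and the example output given above; the function name and the dictionary keys are made up for this example.

```python
def parse_hbond_line(line):
    """Parse one row of the hydrogen bond table into a dictionary.

    Returns None for comment lines and for rows that do not have 16 columns.
    """
    fields = line.split()
    if line.startswith("#") or len(fields) != 16:
        return None
    return {
        "donor": (fields[0], int(fields[1])),        # columns 1 and 2
        "acceptor": (fields[3], int(fields[4])),     # columns 4 and 5
        "r_OH": float(fields[6]),                    # column 7
        "r_NO": float(fields[7]),                    # column 8
        "planar": float(fields[8]),                  # column 9
        "dihedral": float(fields[9]),                # column 10
        "dssp_energy": float(fields[10]),            # column 11
        "h_xyz": tuple(float(x) for x in fields[11:14]),     # columns 12-14
        "theta_phi": (float(fields[14]), float(fields[15])), # columns 15-16
    }


row = "TYR 3 -> THR 18 : 2.620 3.346 165.58 94.25 -1.170 -0.702 0.211 2.515 16.25 163.29"
bond = parse_hbond_line(row)
print(bond["donor"], "->", bond["acceptor"], "energy:", bond["dssp_energy"])
```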
Keywords:
- PDB input
- Hydrogen bonds
- data_structures
- Protein structure features
Categories:
- core/calc/structural/BackboneHBondMap
Input files:
Output files:
Program source:
#include <iostream>
#include <string>

#include <core/data/io/Pdb.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <core/protocols/selection_protocols.hh>
#include <core/calc/structural/interactions/BackboneHBondMap.hh>
#include <utils/exit.hh>

using namespace core::data::structural;
using namespace core::data::io;
using namespace core::data::basic;

std::string program_info = R"(

Reads a PDB file and calculates a map of backbone hydrogen bonds, providing also the geometry of each bond.
The resulting table, printed on the screen provides:
  - H donor residue name and id (columns 1 and 2)
  - H acceptor residue name and id (columns 4 and 5)
  - two distances: r(O..H) and r(N..O) (columns 7 and 8)
  - planar (C-O..H) and dihedral (C-O..H-N) (columns 9 and 10)
  - DSSP energy for this bond (column 11)
  - X,Y, Z coordinates of H atom in the local coordinates system (columns 12, 13 and 14)
  - theta, phi spherical coordinates of H atom (columns 15 and 16)

EXAMPLE OUTPUT:
# 42 hydrogen bonds found in backbone:
TYR 3 -> THR 18 : 2.620 3.346 165.58 94.25 -1.170 -0.702 0.211 2.515 16.25 163.29
LYS 4 -> LYS 50 : 2.259 3.156 120.71 159.88 -1.310 1.445 1.249 1.205 57.75 40.83
LEU 5 -> THR 16 : 1.838 2.802 143.84 -115.27 -2.834 0.102 -1.075 1.488 35.98 -84.57

USAGE:
    ./ap_BackboneHBondMap input.pdb

EXAMPLE:
    ./ap_BackboneHBondMap 5edw.pdb

)";

/** @brief Calculates a map of backbone hydrogen bonds.
 *
 * BackboneHBondMap is derived from PairwiseResidueMap class
 *
 * CATEGORIES: core/calc/structural/BackboneHBondMap;
 * KEYWORDS: PDB input; Hydrogen bonds; data_structures; Protein structure features
 * GROUP: Structure calculations;
 * IMG: ap_BackboneHBondMap-2gb1.png
 * IMG_ALT: Map of backbone hydrogen bonds for 2GB1 protein
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  core::data::io::Pdb reader(argv[1], is_not_alternative, only_ss_from_header, true); // --- Read in a PDB file
  core::data::structural::Structure_SP strctr = reader.create_structure(0); // --- create a Structure object from the first model

  // ---------- Remove everything but amino acids from an input structure; BackboneHBondMap currently can process only AA
  core::protocols::keep_selected_atoms(core::data::structural::selectors::IsAA{}, *strctr);

  // --- The line below creates a map of backbone hydrogen bonds; -0.2 is the energy cutoff (in kcal/mol) to recognize the bond
  core::calc::structural::interactions::BackboneHBondMap hb_map(*strctr, -0.2);
  std::cout << "# " << hb_map.count_bonds() << " hydrogen bonds found in backbone:\n";

  for (auto h_it = hb_map.cbegin(); h_it != hb_map.cend(); ++h_it) // --- iterate over the bonds and print each of them
    std::cout << *((*h_it).second->donor_residue()) << " -> " << *((*h_it).second->acceptor_residue()) << " : "
              << *(*h_it).second << "\n";

  // --- Here we test some other BackboneHBondMap methods
  const auto hbond = (*hb_map.cbegin()).second; // --- here we make a local copy of the first h-bond to be used in tests
  std::cerr << "# Is residue 0 h-bonded to residue 3? " << ((hb_map.are_H_bonded(0, 3)) ? "yes\n" : "no\n");
  std::cerr << "# Is residue 0 h-bonded to residue 5? " << ((hb_map.at(0, 5) != nullptr) ? "yes\n" : "no\n");
  std::cerr << "# Are " << *hbond->donor_residue() << " and " << *hbond->acceptor_residue() << " really bonded? "
            << ((hb_map.at(*hbond->donor_residue(), *hbond->acceptor_residue()) != nullptr) ? "yes\n" : "no\n");
}

ap_Crmsd¶
Calculates the cRMSD (coordinate Root-Mean-Square Deviation) value on C-alpha coordinates and prints it. If only one input PDB file is given, cRMSD is computed for every pair of models found in the input file (each-vs-each). If exactly two structures are provided, the program calculates cRMSD between the first model of structure A and the first model of structure B. Finally, when more than two input files are specified, each-vs-each calculations are performed for every pair of given structures. Note that all the structures must contain the same number of C-alpha atoms.
USAGE:
./ap_Crmsd file1.pdb [file2.pdb ..]
EXAMPLEs:
./ap_Crmsd 1cey.pdb
./ap_Crmsd 2gb1.pdb 2gb1-model1.pdb
./ap_Crmsd 2gb1-model1.pdb 2gb1-model2.pdb 2gb1-model3.pdb 2gb1-model4.pdb
REFERENCE: Kabsch, W. “A Solution for the Best Rotation to Relate Two Sets of Vectors.” Acta Cryst (1976) 32 922-923
Keywords:
Categories:
- core/calc/structural/transformations/Crmsd
Input files:
Output files:
Program source:
#include <iostream>
#include <string>
#include <vector>

#include <core/data/io/Pdb.hh>
#include <core/calc/structural/transformations/Crmsd.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Calculates cRMSD (coordinate Root-Mean-Square Deviation) value on C-alpha coordinates and prints it.
If only one input PDB file is given, cRMSD is computed for every pair of models found in the input file
(each-vs-each). If exactly two structures are provided, the program calculates cRMSD between
the first model of structure A and the first model of structure B. Finally, when more than two input files
are specified, each-vs-each calculations are performed for every pair of given structures.
Note that all the structures must contain the same number of C-alpha atoms.

USAGE:
    ./ap_Crmsd file1.pdb [file2.pdb ..]

EXAMPLES:
    ./ap_Crmsd 1cey.pdb
    ./ap_Crmsd 2gb1.pdb 2gb1-model1.pdb
    ./ap_Crmsd 2gb1-model1.pdb 2gb1-model2.pdb 2gb1-model3.pdb 2gb1-model4.pdb

REFERENCE:
    Kabsch, W. "A Solution for the Best Rotation to Relate Two Sets of Vectors." Acta Cryst (1976) 32 922-923

)";

/** @brief Calculates crmsd value on C-alpha coordinates.
 *
 * The program prints just the crmsd value.
 *
 * CATEGORIES: core/calc/structural/transformations/Crmsd
 * KEYWORDS: PDB input; crmsd
 * GROUP: Structure calculations;
 * IMG: ap_Crmsd_deepteal_brown_1.png
 * IMG_ALT: 2GB1 model structure superimposed on the native, crmsd = 4.93952
 */
int main(const int argc, const char *argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using core::data::basic::Vec3;
  using namespace core::calc::structural::transformations;
  Crmsd<std::vector<Vec3>,std::vector<Vec3>> rms;

  if(argc==2) { // --- The case of each-vs-each calculations between models of a single PDB file
    core::data::io::Pdb q_reader(argv[1], core::data::io::is_ca, core::data::io::keep_all, false);
    core::data::structural::Structure_SP q_strctr = q_reader.create_structure(0); // --- create a structure object
    std::vector<std::vector<core::data::basic::Vec3>> models(q_reader.count_models());
    for(int i=0;i<q_reader.count_models();++i) {
      models[i].resize(q_strctr->count_atoms());
      q_reader.fill_structure(i,models[i]);
      for (int j = 0; j < i; ++j)
        std::cout << i<<" "<<j << " "<<rms.crmsd(models[i], models[j],models[j].size()) << "\n";
    }
  } else { // --- The case when two PDB files are given (this listing reads only argv[1] and argv[2])
    core::data::io::Pdb q_reader(argv[1], core::data::io::is_ca, core::data::io::keep_all, false);
    core::data::structural::Structure_SP q_strctr = q_reader.create_structure(0); // --- create a structure object
    core::data::io::Pdb t_reader(argv[2], core::data::io::is_ca, core::data::io::keep_all, false);
    core::data::structural::Structure_SP t_strctr = t_reader.create_structure(0); // --- create a structure object

    if (q_strctr->count_atoms() != t_strctr->count_atoms())
      utils::exit_OK_with_message("The two structures have different number of CA atoms!\n");

    // --- copy the coordinates of every (C-alpha) atom into plain vectors
    std::vector<Vec3> q, t;
    for (auto atom_it = q_strctr->first_atom(); atom_it != q_strctr->last_atom(); ++atom_it)
      q.push_back(**atom_it);
    for (auto atom_it = t_strctr->first_atom(); atom_it != t_strctr->last_atom(); ++atom_it)
      t.push_back(**atom_it);

    std::cout << "crmsd: " << rms.crmsd(q, t, q_strctr->count_atoms()) << "\n";
  }
}

ap_Hexbins¶
Reads a file with 2D observations (two columns with real values) and makes a hexbin histogram.
USAGE:
ap_Hexbins input.dat [bin_side_width]
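The binning step behind a hexbin histogram can be sketched as follows. This is an illustration only: `hexbin_index` and `hexbin_center` are hypothetical helpers, and the pointy-top axial (q, r) indexing used here is an assumption that need not match the convention of `core::calc::statistics::Hexbins`.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical helper (not BioShell API): axial index (q, r) of the pointy-top hexagon
// with side length `side` that contains point (x, y); hexagon (0, 0) is centred at the origin.
std::pair<int, int> hexbin_index(double x, double y, double side) {
    assert(side > 0.0);
    // fractional axial coordinates of the point
    const double qf = (std::sqrt(3.0) / 3.0 * x - y / 3.0) / side;
    const double rf = (2.0 / 3.0 * y) / side;
    const double sf = -qf - rf;  // third cube coordinate; q + r + s == 0
    // round each cube coordinate, then repair the one with the largest rounding error
    double q = std::round(qf), r = std::round(rf), s = std::round(sf);
    const double dq = std::fabs(q - qf), dr = std::fabs(r - rf), ds = std::fabs(s - sf);
    if (dq > dr && dq > ds) q = -r - s;
    else if (dr > ds) r = -q - s;
    return {static_cast<int>(q), static_cast<int>(r)};
}

// Centre of hexagon (q, r) in the same convention.
std::pair<double, double> hexbin_center(int q, int r, double side) {
    return {side * std::sqrt(3.0) * (q + r / 2.0), side * 1.5 * r};
}
```

A histogram is then just a map from the (q, r) pair to a counter that is incremented for every observation.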
Keywords:
- histogram
- statistics
Categories:
- core::calc::statistics::Hexbins
Input files:
Output files:
Program source:
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <random>
#include <string>
#include <utility>
#include <vector>

#include <core/index.hh>
#include <core/calc/statistics/Hexbins.hh>
#include <core/calc/statistics/Random.hh>
#include <core/data/io/DataTable.hh>

std::string program_info = R"(

Reads a file with 2D observations (two columns with real values) and makes hexbin histogram.

USAGE:
    ap_Hexbins input.dat [bin_side_width]

)";

/** @brief Reads a file with 2D observations (two columns) and makes hexbin histogram.
 *
 * CATEGORIES: core::calc::statistics::Hexbins
 * KEYWORDS: histogram; statistics
 * GROUP: Statistics;
 * IMG: ramachandran_map_all.png
 * IMG_ALT: hexabin representation of Ramachandran map (histogram made from non-redundant subset of PDB)
 */
int main(const int argc, const char *argv[]) {

  using namespace core::calc::statistics;

  Hexbins<double, core::index4> hist(0.05);
  if (argc > 1) { // --- If an input file was given, make histogram using this data
    if (argc > 2) hist.bin_side(atof(argv[2]));
    float x,y;
    std::ifstream in(argv[1]);
    std::string line;
    // --- here we read the input file using pure C API since it's faster than C++ fancy streams
    while (std::getline(in, line)) {
      // --- insert a point only when both values were parsed; malformed lines are skipped
      if (sscanf(line.c_str(),"%f %f",&x,&y) == 2) hist.insert(x,y);
    }
  } else { // --- otherwise generate some random data
    std::cerr << program_info <<"\n";
    Random r = Random::get();
    r.seed(9876543);
    NormalRandomDistribution<double> dist_x(1.0, 0.25);
    NormalRandomDistribution<double> dist_y(3.0, 0.5);
    for (size_t i = 0; i < 100000; ++i) {
      hist.insert(dist_x(r), dist_y(r));
    }
  }

  std::cout << "# Created histogram of " << hist.count_entries() << " observations, "
            << hist.count_outside() << " were outside\n";
  std::vector<std::pair<double,double>> coordinates; // --- a vector used to retrieve coordinates of each hexagon
  for (auto it = hist.cbegin(); it != hist.cend(); ++it) {
    coordinates.clear();
    auto bin = (*it).first;
    hist.bin_vertices(bin,coordinates);
    std::cout << utils::string_format("%4d %4d %4d ",bin.first, bin.second,(*it).second);
    // --- uncomment the lines below to print coordinates of hexbin vertexes in every line
    // --- Note: this is a lot of (redundant) output; make_plots.py script may generate these coordinates for you based on bin indexes
    //    std::cout <<" : ";
    //    for(const auto & xy : coordinates) std::cout << utils::string_format("%8.3f %8.3f ",xy.first,xy.second);
    std::cout <<"\n";
  }
}

ap_LigandTossingMover¶
The program creates a multimodel PDB file with random orientations of a ligand with respect to the protein.
USAGE:
ap_LigandTossingMover 2kwi.pdb GNP [30]
where 2kwi.pdb is the name of the input PDB file, GNP is the code of the ligand (which must be present in the same PDB file) and 30 is the number of random conformations to generate (optional argument; the listing below uses 10 when it is omitted).
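How `LigandTossingMover` samples a new ligand pose is not shown on this page. As a generic illustration of the idea of a random rigid-body reorientation, the sketch below rotates a set of points about their centroid with a uniformly distributed random rotation (Shoemake's quaternion method). All names here (`Point`, `centroid`, `random_rotation`, `toss`) are hypothetical and unrelated to the BioShell API, and the sketch does not translate the ligand over the protein surface.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

using Point = std::array<double, 3>;

// Geometric centre of a non-empty set of points.
Point centroid(const std::vector<Point>& pts) {
    assert(!pts.empty());
    Point c{0.0, 0.0, 0.0};
    for (const Point& p : pts)
        for (int k = 0; k < 3; ++k) c[k] += p[k];
    for (int k = 0; k < 3; ++k) c[k] /= static_cast<double>(pts.size());
    return c;
}

// Row-major 3x3 rotation matrix built from a uniformly random unit quaternion (x, y, z, w).
std::array<double, 9> random_rotation(std::mt19937& gen) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    const double two_pi = 2.0 * std::acos(-1.0);
    const double u1 = u(gen), u2 = u(gen), u3 = u(gen);
    const double a = std::sqrt(1.0 - u1), b = std::sqrt(u1);
    const double x = a * std::sin(two_pi * u2), y = a * std::cos(two_pi * u2);
    const double z = b * std::sin(two_pi * u3), w = b * std::cos(two_pi * u3);
    return {1.0 - 2.0 * (y * y + z * z), 2.0 * (x * y - z * w), 2.0 * (x * z + y * w),
            2.0 * (x * y + z * w), 1.0 - 2.0 * (x * x + z * z), 2.0 * (y * z - x * w),
            2.0 * (x * z - y * w), 2.0 * (y * z + x * w), 1.0 - 2.0 * (x * x + y * y)};
}

// Returns a copy of `ligand` rotated randomly about its own centroid.
std::vector<Point> toss(const std::vector<Point>& ligand, std::mt19937& gen) {
    const Point c = centroid(ligand);
    const std::array<double, 9> R = random_rotation(gen);
    std::vector<Point> out;
    out.reserve(ligand.size());
    for (const Point& p : ligand) {
        const double dx = p[0] - c[0], dy = p[1] - c[1], dz = p[2] - c[2];
        out.push_back(Point{c[0] + R[0] * dx + R[1] * dy + R[2] * dz,
                            c[1] + R[3] * dx + R[4] * dy + R[5] * dz,
                            c[2] + R[6] * dx + R[7] * dy + R[8] * dz});
    }
    return out;
}
```

Because the transformation is a pure rotation, all intramolecular distances and the centroid of the ligand are preserved; only its orientation changes.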
Keywords:
Categories:
- simulations::movers::LigandTossingMover
Input files:
Output files:
Program source:
#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>

#include <core/data/basic/Vec3.hh>
#include <core/data/io/Pdb.hh>
#include <simulations/movers/LigandTossingMover.hh>
#include <simulations/forcefields/ConstEnergy.hh>
#include <simulations/systems/PdbAtomTyping.hh>
#include <simulations/observers/cartesian/PdbObserver_OBSOLETE.hh>
#include <simulations/observers/cartesian/ExplicitPdbFormatter_OBSOLETE.hh>
#include <simulations/sampling/AlwaysAccept.hh>
#include <utils/exit.hh>

using namespace core::data::basic;
using namespace simulations;

std::string program_info = R"(

The program creates a multimodel PDB with random orientations of a ligand with respect to the protein.

USAGE:
    ap_LigandTossingMover 2kwi.pdb GNP [30]

where 2kwi.pdb is the name of the input PDB file, GNP is the code of the ligand (must be in the same PDB file)
and 30 is the number of random conformations generated (optional argument)
)";

/** @brief To test LigandTossingMover mover, tosses a ligand on a proteins surface
 *
 * The program creates a multimodel PDB with random orientations of a ligand in respect to the protein
 *
 * CATEGORIES: simulations::movers::LigandTossingMover
 * KEYWORDS: docking; Mover
 * IMG: ap_LigandTossingMover.png
 * IMG_ALT: 25 conformations of the same ligand randomly placed on the surface of a protein
 */
int main(int argc, char *argv[]) {

  using namespace simulations::systems; // for AtomTypingInterface and ResidueChain
  using namespace simulations::observers::cartesian;
  using namespace core::data::io; // for Pdb reader
  using core::data::basic::Vec3;

  if(argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  // --- Read the input PDB and create a structure object
  // Pdb reader(argv[1], all_true(is_not_hydrogen, is_not_water, is_not_alternative), true);
  Pdb reader(argv[1]);
  core::data::structural::Structure_SP strctr = reader.create_structure(0);

  // --- Prepare a modeled system from a given PDB file
  ResidueChain_OBSOLETE<Vec3> system(*strctr);

  // --- Find the index of the first (from) and the last (to) atom of the moving part
  std::string ligand_code(argv[2]);
  int from = -1, to = -1, i = 0;
  for (auto atom_it = strctr->first_atom(); atom_it != strctr->last_atom(); ++atom_it) {
    if (ligand_code.size()==3) { // truly it's a ligand code
      if (((*atom_it)->owner()->residue_type().code3 == ligand_code) && (from == -1)) from = i;
      if (((*atom_it)->owner()->residue_type().code3 != ligand_code) && (to == -1) && (from != -1)) to = i - 1;
    } else { // it should be a chain code then
      if (((*atom_it)->owner()->owner()->id() == ligand_code) && (from == -1)) from = i;
      if (((*atom_it)->owner()->owner()->id() != ligand_code) && (to == -1) && (from != -1)) to = i - 1;
    }
    ++i;
  }
  if (from == -1) { // --- the requested ligand (or chain) was not found; there is nothing to move
    std::cerr << "Can't find " << ligand_code << " in " << argv[1] << "\n";
    return 1;
  }
  if (to == -1) to = i - 1; // --- assign the last atom if nothing has been assigned yet
  AtomRange moving(from,to);
  std::cout << "Moving atoms: " << moving << "\n";

  core::index4 n_moves = (argc==4) ? atoi(argv[3]) : 10;
  simulations::movers::LigandTossingMover<Vec3> mover(system, moving, 5.0);
  simulations::sampling::AlwaysAccept alwaysMove;

  std::shared_ptr<AbstractPdbFormatter_OBSOLETE<Vec3>> fmt =
      std::make_shared<ExplicitPdbFormatter_OBSOLETE<Vec3>>(*strctr);
  simulations::observers::cartesian::PdbObserver_OBSOLETE<Vec3> trajectory(system, fmt, "out.pdb");
  // --- every accepted move becomes one model of the output file
  for (size_t i = 0; i < n_moves; i++) {
    mover.move(alwaysMove);
    trajectory.observe();
  }
}

ap_aligned_pdb¶
Reads an alignment between two proteins (PIR format) and the two respective protein structures (PDB format) and writes the aligned parts of the two structures. The program considers only the first two sequences found in the PIR file; they must be given in the same order as the input PDB files. Only the first chain of either structure is used; if you want to use chain ‘B’ from a structure, use the strc command to extract it prior to using ap_aligned_pdb. The program writes ‘query’ and ‘tmplt’ files which contain the respective structure fragments, already superimposed (the template on the query). One of the two structures (either the query or the template) may be missing, e.g. in the case of gene duplication; a dash ‘-’ should then be used instead of the respective file name, as in the examples below.
USAGE:
ap_aligned_pdb alignment.pir prot1.pdb prot2.pdb
EXAMPLE:
ap_aligned_pdb 1uox_1uox_1.pir 1uox.pdb -
ap_aligned_pdb 1uox_1uox_1.pir - 1uox.pdb
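The core bookkeeping step, turning two gapped alignment rows into pairs of aligned residue positions, can be sketched independently of BioShell. `aligned_positions` below is a hypothetical helper written for this note; it assumes both rows have the same length and that gaps are marked with '-'.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// For two gapped alignment rows of equal length, returns (query_index, template_index)
// for every column where both sequences have a residue. Indexes count residues
// (gaps excluded) from 0 within each sequence.
std::vector<std::pair<int, int>> aligned_positions(const std::string& query_row,
                                                   const std::string& tmplt_row,
                                                   char gap = '-') {
    assert(query_row.size() == tmplt_row.size());
    std::vector<std::pair<int, int>> pairs;
    int i_query = 0, i_tmplt = 0;
    for (std::size_t k = 0; k < query_row.size(); ++k) {
        const bool q_res = (query_row[k] != gap);
        const bool t_res = (tmplt_row[k] != gap);
        if (q_res && t_res) pairs.emplace_back(i_query, i_tmplt);
        if (q_res) ++i_query;  // advance only over real residues
        if (t_res) ++i_tmplt;
    }
    return pairs;
}
```

Only the residues listed in these pairs contribute coordinates to the superposition; residues facing a gap are left out of the output fragments.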
Keywords:
Categories:
- core/data/io/pir_io
Input files:
Output files:
Program source:
#include <cstring>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

#include <core/data/io/pir_io.hh>
#include <core/data/io/Pdb.hh>
#include <core/data/io/alignment_io.hh>
#include <core/data/io/fasta_io.hh>
#include <utils/exit.hh>
#include <core/alignment/PairwiseSequenceAlignment.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <core/calc/structural/transformations/Crmsd.hh>
#include <utils/Logger.hh>

std::string program_info = R"(

Reads an alignment between two proteins (PIR format) and the two respective protein structures (PDB format)
and writes the aligned parts of the two structures. The program considers only the first two sequences
found in the PIR file; they must be given in the same order as the input PDB files. Only the first chain
will be used from either structure; if you want to use chain 'B' from a structure, use strc command
to extract it prior to using ap_aligned_pdb.
The program writes 'query' and 'tmplt' files which contain the respective structure fragments,
already superimposed (the template on the query).
One of the two structures (either the query or the template) may be missing, e.g. in a case of gene duplication
dash '-' should be used instead of the respective file name, as in the examples below.

USAGE:
    ap_aligned_pdb alignment.pir prot1.pdb prot2.pdb

EXAMPLE:
    ap_aligned_pdb 1uox_1uox_1.pir 1uox.pdb -
    ap_aligned_pdb 1uox_1uox_1.pir - 1uox.pdb

)";

using namespace core::data::structural;
utils::Logger logs("ap_aligned_pdb");

Structure_SP process_input_pdb(const std::string &pdb_fname, std::vector<Residue_SP> &residues,
                               const selectors::ResidueSelector & which_part) {

  selectors::IsAA aa_only;
  // --- Read a structure and repack its amino acid residues (the other cannot be aligned)
  core::data::io::Pdb reader(pdb_fname, core::data::io::is_not_alternative, core::data::io::only_ss_from_header, true);
  Structure_SP strctr = reader.create_structure(0);
  logs << utils::LogLevel::INFO << "Selecting " << utils::to_string(which_part)<<" from "<<strctr->code()<<"\n";
  for(auto i_chain : *strctr) {
    for(auto i_resid : *i_chain)
      if (which_part(*i_resid) && aa_only(*i_resid)) {
        residues.push_back(i_resid);
      }
  }

  return strctr;
}

/** @brief Reads an alignment between two proteins (PIR format) and the two structures and writes PDB for the aligned parts
 *
 * CATEGORIES: core/data/io/pir_io;
 * KEYWORDS: PDB input; PIR; PDB output
 * GROUP: Alignments
 * IMG: ap_aligned-1k6m-1bif.png
 * IMG_ALT: 1K6M and 1BIF structures aligned according to HOMSTRAD database
 */
int main(const int argc, const char *argv[]) {

  if (argc < 4) utils::exit_OK_with_message(program_info); // --- complain about missing program parameters

  using core::data::sequence::Sequence_SP; // --- Sequence_SP is just a std::shared_ptr to core::data::sequence::Sequence type
  using namespace core::alignment;
  using namespace core::data::structural;

  // --- Create a container where the sequences will be stored
  std::vector<Sequence_SP> sequences;
  // --- Read a file with PIR sequences and create an alignment object
  core::data::io::read_pir_file(argv[1], sequences);
  PairwiseSequenceAlignment alignment("query", sequences[0]->sequence, 0, "tmplt", sequences[1]->sequence, 0, 0.0);
  const auto select_query =
      core::data::structural::selectors::select_by_pir_header(*std::dynamic_pointer_cast<PirEntry>(sequences[0]));
  const auto select_tmplt =
      core::data::structural::selectors::select_by_pir_header(*std::dynamic_pointer_cast<PirEntry>(sequences[1]));

  // --- Read each structure (unless '-' was given) and repack its amino acid residues
  Structure_SP query_structure, tmplt_structure;
  std::vector<Residue_SP> query_residues, tmplt_residues;
  if (strncmp(argv[2], "-", 1) != 0) query_structure = process_input_pdb(argv[2], query_residues,*select_query);
  if (strncmp(argv[3], "-", 1) != 0) tmplt_structure = process_input_pdb(argv[3], tmplt_residues,*select_tmplt);
  std::stringstream ss;
  core::data::io::write_edinburgh(alignment,ss,65535);
  logs << utils::LogLevel::INFO << "Input alignment\n" << ss.str() << "\n";

  // --- Retrieve aligned residues from the two structures according to the alignment object
  std::vector<Residue_SP> tmplt_residues_aligned, query_residues_aligned; // --- container for the residues
  if (query_residues.size() == 0) alignment.alignment->get_aligned_template(tmplt_residues, tmplt_residues_aligned);
  if (tmplt_residues.size() == 0) alignment.alignment->get_aligned_query(query_residues, query_residues_aligned);
  // --- If both sets of coordinates are present - retrieve both and superimpose
  // --- Also, when both structures are given - calculate crmsd and roto-translation transformation
  if ((query_residues.size() > 0) && (tmplt_residues.size() > 0)) {
    alignment.alignment->get_aligned_query_template(query_residues, tmplt_residues,
        query_residues_aligned, tmplt_residues_aligned);
    std::vector<Vec3> query_xyz, tmplt_xyz;
    for (auto res:query_residues_aligned)
      query_xyz.push_back(*res->find_atom(" CA "));
    for (auto res:tmplt_residues_aligned) {
      tmplt_xyz.push_back(*res->find_atom(" CA "));
    }
    core::calc::structural::transformations::Crmsd<std::vector<Vec3>, std::vector<Vec3>> rms;
    std::cout << "crmsd between coordinates of " << query_xyz.size() << " CA atoms: "
              << rms.crmsd(tmplt_xyz, query_xyz, query_xyz.size(), true) << "\n";
    // --- apply the transformation to every atom of the aligned template residues
    for (auto res:tmplt_residues_aligned)
      for (auto atom:*res) rms.apply(*atom);
  }

  // --- Print the aligned query coordinates in PDB format (the query itself is not transformed)
  if (query_residues_aligned.size() > 0) {
    std::ofstream query_file("query.pdb");
    for (auto res:query_residues_aligned)
      for (auto atom:*res)
        query_file << atom->to_pdb_line() << "\n";
    query_file.close();
  }

  // --- Print template coordinates in PDB format
  if (tmplt_residues_aligned.size() > 0) {
    std::ofstream tmplt_file("tmplt.pdb");
    for (auto res:tmplt_residues_aligned)
      for (auto atom:*res)
        tmplt_file << atom->to_pdb_line() << "\n";
    tmplt_file.close();
  }
}

ap_chi1_rotamers_estimation¶
ap_chi1_rotamers_estimation reads a text file with Chi_1 angles (a single column of real values) and fits a mixture of von Mises distributions to the data. The program may thus be used for deriving a rotamer library for VAL, THR, SER and CYS.
USAGE:
ap_chi1_rotamers_estimation input-data
EXAMPLE:
ap_chi1_rotamers_estimation THR_chi1.dat
REFERENCE: Mardia, Kanti V., and Peter E. Jupp. Directional statistics. Vol. 494. John Wiley & Sons, 2009
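For readers unfamiliar with the von Mises distribution, a self-contained sketch of its density is given below. The functions `bessel_i0` and `von_mises_pdf` are written for this note and are not the BioShell `VonMisesDistribution` class; angles are assumed to be in radians, and the normalising Bessel function I0 is evaluated with a plain power series.

```cpp
#include <cassert>
#include <cmath>

// Modified Bessel function of the first kind, order 0, from its power series:
// I0(x) = sum_{k>=0} (x^2/4)^k / (k!)^2
double bessel_i0(double x) {
    const double q = 0.25 * x * x;
    double term = 1.0, sum = 1.0;
    for (int k = 1; k < 500; ++k) {
        term *= q / (static_cast<double>(k) * static_cast<double>(k));
        sum += term;
        if (term < 1e-16 * sum) break;  // remaining terms are negligible
    }
    return sum;
}

// von Mises density with mean direction mu (radians) and concentration kappa >= 0:
// f(x) = exp(kappa * cos(x - mu)) / (2 * pi * I0(kappa))
double von_mises_pdf(double x, double mu, double kappa) {
    assert(kappa >= 0.0);
    const double two_pi = 2.0 * std::acos(-1.0);
    return std::exp(kappa * std::cos(x - mu)) / (two_pi * bessel_i0(kappa));
}
```

With kappa = 0 the density is uniform on the circle; larger kappa concentrates the distribution around mu, which is why a starting value of 10 with means at -60, 60 and 180 degrees is a natural initial guess for the three Chi_1 rotamers.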
Keywords:
- von Mises distribution
- estimation
- expectation-maximization
- statistics
Categories:
- core::calc::statistics::VonMissesDistribution
Input files:
Output files:
Program source:
#include <math.h>
#include <algorithm>
#include <iostream>
#include <random>
#include <string>
#include <vector>

#include <core/calc/statistics/VonMisesDistribution.hh>
#include <core/calc/statistics/expectation_maximization.hh>
#include <core/data/io/DataTable.hh>
#include <core/calc/numeric/numerical_integration.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ap_chi1_rotamers_estimation reads a text file with Chi_1 angles (single column of real values)
and fits a mixture of von Mises distributions to the data. The program may be thus used for deriving
rotamer library for VAL, THR, SER and CYS

USAGE:
    ap_chi1_rotamers_estimation input-data

EXAMPLE:
    ap_chi1_rotamers_estimation THR_chi1.dat

REFERENCE:
    Mardia, Kanti V., and Peter E. Jupp. Directional statistics. Vol. 494. John Wiley & Sons, 2009

)";

/** @brief Reads a file with 1D data and estimates a mixture of von Mises distributions based on these observations.
 *
 * This example may be used to approximate a $\Chi_1$ rotamer (such as VAL, THR or SER) with a mixture of
 * von Mises distributions.
 *
 * CATEGORIES: core::calc::statistics::VonMissesDistribution
 * KEYWORDS: von Mises distribution; estimation; expectation-maximization; statistics
 * GROUP: Statistics;
 * IMG: ap_chi1_rotamers_estimation.png
 * IMG_ALT: Distribution of Chi1 angles of THR side chains approximated with a mixture of three von Mises distribution
 */
int main(const int argc, const char *argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  const double deg_to_rad = M_PI/180.0;
  using namespace core::calc::statistics;

  // --- Read the observations; each data point is stored as a one-element vector
  std::vector<std::vector<double>> chi_angle;
  if (argc == 2) {
    core::data::io::DataTable in(argv[1]);
    for(const auto & row : in) {
      std::vector<double> chi;
      chi.push_back( row.get<double>(0) );
      chi_angle.push_back(chi);
    }
  }

  // ---------- Three distributions - one for each rotamer
  VonMisesDistribution m(-60 * deg_to_rad, 10.0), p(60 * deg_to_rad, 10.0),
      t(-180 * deg_to_rad, 10.0); // gauche-minus, gauche-plus and trans (not used below)
  std::vector<VonMisesDistribution> distributions_1D({{-60 * deg_to_rad, 10.0},   // --- gauche minus
                                                      {60 * deg_to_rad, 10.0},    // --- gauche plus
                                                      {-180 * deg_to_rad, 10.0}}); // --- trans

  // --- index_1D receives, for every data point, the index of the distribution it was assigned to
  std::vector<core::index1> index_1D;
  double score = expectation_maximization(chi_angle, distributions_1D, index_1D, 0.000001);
  core::index4 cnt0 = std::count(index_1D.cbegin(), index_1D.cend(), 0);
  core::index4 cnt1 = std::count(index_1D.cbegin(), index_1D.cend(), 1);
  core::index4 cnt2 = std::count(index_1D.cbegin(), index_1D.cend(), 2);
  std::cout << "# log-likelihood: " << score << "\n";
  std::cout << "# " << cnt0 << " " << distributions_1D[0] << " " << cnt1 << " " << distributions_1D[1] << " "
            << cnt2 << " " << distributions_1D[2] << "\n";

  // --- Tabulate each component, weighted by the fraction of points assigned to it
  for (double x = -M_PI; x < M_PI; x += 0.01)
    std::cout << utils::string_format("%6.3f %8.5f %8.5f %8.5f\n", x,
        cnt0 * distributions_1D[0].evaluate(x) / chi_angle.size(),
        cnt1 * distributions_1D[1].evaluate(x) / chi_angle.size(),
        cnt2 * distributions_1D[2].evaluate(x) / chi_angle.size());
}

ap_contact_map¶
ap_contact_map calculates a contact map for a given protein structure. If a multi-model PDB file is given, the program prints, for every contact, the number of models in which that contact was observed. The program can calculate the contacts either on side chains, on alpha carbon atoms or on beta carbon atoms.
USAGE:
ap_contact_map atom-filter input.pdb cutoff
EXAMPLE:
ap_contact_map CA 2kwi.pdb 4.5
where 2kwi.pdb is the input file and 4.5 is the contact distance in Angstroms. CA defines the contact map type; allowed options are CA, CB and SC for C-alpha, C-beta and all-atom side chain contacts, respectively.
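The counting performed over a multi-model file can be illustrated with a small standalone sketch. It assumes one representative point per residue (for example the C-alpha atom) and treats two residues as being in contact when their distance does not exceed the cutoff; `count_contacts`, `Point` and `Model` are hypothetical names, not part of `core::calc::structural::interactions::ContactMap`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Point = std::array<double, 3>;
using Model = std::vector<Point>;  // one representative point per residue

// For every residue pair (i < j) returns the number of models in which the two
// points are not farther apart than `cutoff` (same length units as the coordinates).
std::map<std::pair<std::size_t, std::size_t>, int>
count_contacts(const std::vector<Model>& models, double cutoff) {
    std::map<std::pair<std::size_t, std::size_t>, int> counts;
    const double cutoff2 = cutoff * cutoff;  // compare squared distances, no sqrt needed
    for (const Model& m : models) {
        assert(m.size() == models.front().size());  // all models describe the same residues
        for (std::size_t i = 0; i < m.size(); ++i)
            for (std::size_t j = i + 1; j < m.size(); ++j) {
                double d2 = 0.0;
                for (int k = 0; k < 3; ++k) d2 += (m[i][k] - m[j][k]) * (m[i][k] - m[j][k]);
                if (d2 <= cutoff2) ++counts[std::make_pair(i, j)];
            }
    }
    return counts;
}
```

Pairs that are never in contact do not appear in the map at all, which mirrors the program printing only non-empty entries.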
Keywords:
Categories:
- core::calc::structural::ContactMap
Input files:
Output files:
Program source:
#include <cstring>
#include <iostream>
#include <memory>
#include <string>
#include <utility>
#include <vector>

#include <core/index.hh>
#include <core/data/io/Pdb.hh>
#include <core/calc/structural/interactions/ContactMap.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ap_contact_map calculates a contact map for a given protein structure.
If a multi-model PDB file was given, the program prints for every contact in how many models
the contact was observed. The program can calculate the contacts either on side chains,
on alpha carbon or on beta carbon atoms.

USAGE:
    ap_contact_map atom-filter input.pdb cutoff

EXAMPLE:
    ap_contact_map CA 2kwi.pdb 4.5

where 2kwi.pdb is the input file and 4.5 the contact distance in Angstroms. CA defines the contact map type;
allowed options are: CA CB and SC for C-alpha, C-beta and all atom side chain, respectively

)";

/** @brief Calculates a contact map for a given protein structure
 *
 * CATEGORIES: core::calc::structural::ContactMap
 * KEYWORDS: PDB input; contact map
 * GROUP: Structure calculations;
 * IMG: ap_contact_map.png
 * IMG_ALT: Contact map calculated for 2KWI protein structure solved by NMR. The 2KWI deposit holds 51 models, the color scale shows how popular is a given contact among the models
 */
int main(const int argc, const char* argv[]) {

  if (argc < 4) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  // --- Side chain (SC) contacts are the default; CA or CB switches both the atom selector and the PDB line filter
  core::data::structural::selectors::AtomSelector_SP selector =
      std::make_shared<core::data::structural::selectors::IsSC>();
  core::data::io::PdbLineFilter filter = core::data::io::is_not_water;
  if (std::strcmp(argv[1],"CA")==0 ) {
    selector = std::make_shared<core::data::structural::selectors::IsNamedAtom>(" CA ");
    filter = core::data::io::is_ca; // --- assign to the outer variable; re-declaring it here would leave the default in use
  }
  if (std::strcmp(argv[1],"CB")==0) {
    filter = core::data::io::is_cb;
    selector = std::make_shared<core::data::structural::selectors::IsNamedAtom>(" CB ");
  }
  double cutoff = utils::from_string<double>(argv[3]); // The third parameter is the contact distance (in Angstroms)

  core::data::io::Pdb reader(argv[2],filter); // --- file name (PDB format, may be gzip-ped)
  core::data::structural::Structure_SP structure = reader.create_structure(0);

  // --- The map is created from the first model; the remaining models are added one by one
  core::calc::structural::interactions::ContactMap cmap(*structure, cutoff, selector);
  for (int i_model = 1; i_model < reader.count_models(); ++i_model) {
    reader.fill_structure(i_model, *structure);
    cmap.add(*structure);
  }

  std::vector<std::pair<core::index2, core::index2>> contacts;
  cmap.nonempty_indexes(contacts);
  for(const auto & ij : contacts) {
    core::index2 i_res = ij.first;
    core::index2 j_res = ij.second;
    std::cout << utils::string_format("%4d %4s %4d%c %4d %4s %4d%c %d\n",
        i_res, cmap.residue_index(i_res).chain_id.c_str(), cmap.residue_index(i_res).residue_id,
        cmap.residue_index(i_res).i_code,
        j_res, cmap.residue_index(j_res).chain_id.c_str(), cmap.residue_index(j_res).residue_id,
        cmap.residue_index(j_res).i_code,
        cmap.at(i_res, j_res, 0));
  }
}

ap_fit_VonMises_mixture¶
ap_fit_VonMises_mixture reads a text file with arbitrary 1D angular observations and fits a mixture of von Mises distributions to the data. In the listing below, the input is treated as degrees (and converted to radians) when any value falls outside the [-π, π] range; otherwise it is taken as radians. The number of distributions to fit is determined by the starting parameters: μ (mu) and κ (kappa) for each distribution. Alternatively, the program can scan the parameter space automatically when only the number of distributions is given at the input.
USAGE:
ap_fit_VonMises_mixture chi_angles.dat mu kappa [mu2 kappa2 ...]
ap_fit_VonMises_mixture chi_angles.dat n_dist
EXAMPLES:
ap_fit_VonMises_mixture chi_angles.dat -1.05 30 -3.0 30 1.05 30
ap_fit_VonMises_mixture chi_angles.dat 3
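As a reminder of what is being fitted, the standard formulation of a K-component von Mises mixture is summarised below. The weights w_k and responsibilities γ_ik are notation introduced for this note; the exact variant implemented by `expectation_maximization()` (for example, whether data points are assigned softly or to a single component) is not shown on this page.

```latex
f(x \mid \mu, \kappa) = \frac{\exp\bigl(\kappa \cos(x - \mu)\bigr)}{2\pi\, I_0(\kappa)},
\qquad
p(x) = \sum_{k=1}^{K} w_k\, f(x \mid \mu_k, \kappa_k),
\qquad
\sum_{k=1}^{K} w_k = 1

% E-step: responsibility of component k for observation x_i
\gamma_{ik} = \frac{w_k\, f(x_i \mid \mu_k, \kappa_k)}{\sum_{j=1}^{K} w_j\, f(x_i \mid \mu_j, \kappa_j)}

% quantity that the fit increases from one iteration to the next
\log L = \sum_{i=1}^{n} \log \sum_{k=1}^{K} w_k\, f(x_i \mid \mu_k, \kappa_k)
```

Because the likelihood surface has many local maxima, the result depends on the starting values of μ and κ, which is why the program either takes them from the command line or scans a grid of starting points and keeps the best-scoring fit.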
Keywords:
Categories:
- core::calc::statistics::VonMissesDistribution
Input files:
Output files:
Program source:
#include <math.h>
#include <cstdlib>
#include <iostream>
#include <limits>
#include <map>
#include <numeric>
#include <random>
#include <string>
#include <vector>

#include <core/algorithms/basic_algorithms.hh>
#include <core/algorithms/Combinations.hh>
#include <core/calc/statistics/NormalDistribution.hh>
#include <core/calc/statistics/expectation_maximization.hh>
#include <core/calc/numeric/numerical_integration.hh>
#include <core/calc/statistics/VonMisesDistribution.hh>
#include <core/calc/statistics/RobustDistributionDecorator.hh>
#include <core/data/io/DataTable.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ap_fit_VonMises_mixture reads a text file with 1D arbitrary observations in degrees
and fits a mixture of von Mises distributions to the data. The number of distributions to fit is determined
by the starting parameters: \f$\mu\f$ and \f$\kappa\f$ for each distribution. Alternatively, the program
can scan the parameter space automatically, when only the number of distributions is given at the input.

USAGE:
    ap_fit_VonMises_mixture chi_angles.dat mu kappa [mu2 kappa2 ...]
    ap_fit_VonMises_mixture chi_angles.dat n_dist

EXAMPLES:
    ap_fit_VonMises_mixture chi_angles.dat -1.05 30 -3.0 30 1.05 30
    ap_fit_VonMises_mixture chi_angles.dat 3

)";

/** @brief Reads a file with 1D data and estimates a mixture of VonMisesDistribution based on these observations.
 *
 * CATEGORIES: core::calc::statistics::VonMissesDistribution
 * KEYWORDS: statistics; estimation; expectation-maximization
 * GROUP: Statistics;
 * IMG: ap_fit_VonMises_mixture.png
 * IMG_ALT: Distribution of Chi1 angles of THR side chains approximated with a mixture of three von Mises distribution
 */
int main(const int argc, const char *argv[]) {

  if (argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::calc::statistics;

  double deg_to_rad = M_PI / 180.0;

  // ---------- Read in data points (your observations) from a file
  std::vector<std::vector<double>> data_points;
  core::data::io::DataTable in(argv[1]);
  double min = 180, max = -180; // --- used to detect whether angle values are in radians or in degrees
  for (const auto &row : in) {
    std::vector<double> d;
    double v = row.get<double>(0);
    if (v < min) min = v;
    if (v > max) max = v;
    d.push_back(v);
    data_points.push_back(d);
  }
  if (((min < -M_PI) || (max > M_PI))) std::cerr << "Converting from degrees to radians!\n";
  else deg_to_rad = 1.0;
  for (std::vector<double> &d : data_points) d[0] *= deg_to_rad;

  std::vector<double> default_params{0.0, 100.0};
  // ---------- RobustDistributionDecorator object for each distribution
  core::index1 n_distributions = atoi(argv[2]);
  std::vector<RobustDistributionDecorator<VonMisesDistribution>> r_distributions_1D;
  std::vector<RobustDistributionDecorator<VonMisesDistribution>> r_best_distributions;
  std::vector<std::vector<std::vector<double>>> initial_parameters; // --- Initial parameters for fitting
  std::vector<core::index1> distribution_index; // --- Resulting assignment of every data point to a distribution
  std::vector<core::index1> best_assignment;    // --- The assignment that gave the best likelihood so far

  // --- This is the case when user provided initial parameters for fitting Von Mises distribution
  if (argc >= 4) {
    // ---------- Read (mu, kappa) pairs from cmdline and create distribution objects
    std::vector<std::vector<double>> params;
    for (int i = 2; i + 1 < argc; i += 2) { // --- a trailing unpaired value is ignored
      params.push_back(std::vector<double>{atof(argv[i]) * deg_to_rad, atof(argv[i + 1])});
      r_distributions_1D.emplace_back(RobustDistributionDecorator<VonMisesDistribution>(params.back()));
      r_best_distributions.emplace_back(RobustDistributionDecorator<VonMisesDistribution>(default_params));
    }
    // --- a single starting point that holds the parameters of all distributions
    initial_parameters.push_back(params);
    // --- here argv[2] is a mu value, so the number of distributions comes from the number of pairs
    n_distributions = static_cast<core::index1>(r_distributions_1D.size());
  } else { // --- This is the case when user provided the number of distributions to be fit automatically
    n_distributions = atoi(argv[2]);
    for (size_t i = 0; i < n_distributions; ++i) {
      r_distributions_1D.emplace_back(RobustDistributionDecorator<VonMisesDistribution>(default_params));
      r_best_distributions.emplace_back(RobustDistributionDecorator<VonMisesDistribution>(default_params));
    }
    std::vector<double> random_starts{-1.0, -0.8, -0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0}; // multiplicity of PI
    core::algorithms::Combinations<double> generator(n_distributions, random_starts);
    std::vector<double> combination(n_distributions);
    std::vector<std::vector<double>> params;
    while (generator.next(combination)) {
      params.clear();
      for (size_t i_distr = 0; i_distr < n_distributions; ++i_distr)
        params.push_back(std::vector<double>{combination[i_distr] * M_PI, 100.0});
      initial_parameters.push_back(params);
    }
  }

  double best_likelihood = -std::numeric_limits<double>::max();
  // ---------- Run Expectation-Maximization algorithm
  for (size_t i_start = 0; i_start < initial_parameters.size(); ++i_start) { // --- iterate over starting points
    for (size_t i_distr = 0; i_distr < r_distributions_1D.size(); ++i_distr) // --- loop over distributions to set each starting point
      r_distributions_1D[i_distr].copy_parameters_from(initial_parameters[i_start][i_distr]);

    double score = expectation_maximization(data_points, r_distributions_1D, distribution_index, 0.1, 100);
    if (score > best_likelihood) {
      best_likelihood = score;
      for (size_t i = 0; i < n_distributions; ++i)
        r_best_distributions[i].copy_parameters_from(r_distributions_1D[i].parameters());
      best_assignment.swap(distribution_index);
      std::cerr << "# Best likelihood so far: " << best_likelihood << "\n";
    }
  }

  // --- Count how many data points were assigned to each distribution
  std::map<core::index1, core::index4> counts;
  core::algorithms::count_distinct(best_assignment, counts);
  const double total = std::accumulate(std::begin(counts), std::end(counts), 0,
      [](const size_t previous, decltype(*counts.begin()) p) { return previous + p.second; });

  std::cout << "# Best likelihood " << best_likelihood << "\n# Estimated distributions:\n";
  for (size_t i_distr = 0; i_distr < r_distributions_1D.size(); ++i_distr) {
    std::cout << counts[i_distr] / total << " " << r_best_distributions[i_distr] << "\n";
  }
}

ap_hbonds¶
ap_hbonds finds all hydrogen bonds in a given protein structure, including side chain interactions. For each bond the program lists the residues involved and describes its geometry (bond length and respective angles). Backbone hydrogen bonds are reported separately from those involving side chains. Detection of hydrogen bond donors and acceptors in a given PDB deposit is based on the definition of the respective monomers. The most popular monomers, including amino acids and nucleotides, are provided with the BioShell distribution. Others must be provided by the user, either in CIF or in PDB format. The input protein must include hydrogen atoms. Crystal structures should be protonated before using this app.
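The kind of geometry reported for each bond can be illustrated with a short standalone sketch that computes the hydrogen-acceptor distance and the donor-hydrogen-acceptor angle from three atomic positions. `HBondGeometry`, `dist3`, `planar_angle_deg` and `describe_hbond` are hypothetical names chosen for this note; the exact set of distances and angles printed by ap_hbonds, and the thresholds it uses to accept a bond, are not specified on this page.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Point = std::array<double, 3>;

// Euclidean distance between two points.
double dist3(const Point& a, const Point& b) {
    double d2 = 0.0;
    for (int k = 0; k < 3; ++k) d2 += (a[k] - b[k]) * (a[k] - b[k]);
    return std::sqrt(d2);
}

// Angle a-b-c (vertex at b) in degrees; the three points must be distinct.
double planar_angle_deg(const Point& a, const Point& b, const Point& c) {
    const double la = dist3(a, b), lc = dist3(c, b);
    assert(la > 0.0 && lc > 0.0);
    double dot = 0.0;
    for (int k = 0; k < 3; ++k) dot += (a[k] - b[k]) * (c[k] - b[k]);
    const double cosine = std::max(-1.0, std::min(1.0, dot / (la * lc)));  // guard against round-off
    return std::acos(cosine) * 180.0 / std::acos(-1.0);
}

struct HBondGeometry {
    double h_a_distance;  // hydrogen ... acceptor distance
    double d_h_a_angle;   // donor - hydrogen ... acceptor angle, in degrees
};

HBondGeometry describe_hbond(const Point& donor, const Point& hydrogen, const Point& acceptor) {
    return {dist3(hydrogen, acceptor), planar_angle_deg(donor, hydrogen, acceptor)};
}
```

Since the calculation needs an explicit hydrogen position, it also shows why the input structure has to be protonated.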
USAGE:
ap_hbonds input.pdb [ligand_1.cif [ ligand_2.pdb ...] ]
EXAMPLE:
ap_hbonds 2gb1.pdb
OUTPUT (fragment):
Keywords:
Categories:
- core::calc::structural::interactions::HydrogenBondInteraction
Input files:
Output files:
Program source:
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

#include <core/data/io/Pdb.hh>
#include <core/calc/structural/interactions/HydrogenBondInteraction.hh>
#include <core/calc/structural/interactions/BackboneHBondInteraction.hh>
#include <core/calc/structural/interactions/HydrogenBondCollector.hh>
#include <core/chemical/MonomerStructureFactory.hh>
#include <utils/string_utils.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ap_hbonds finds all hydrogen bonds in a given protein structure, including side chain interactions.
For each bond the program lists residues involved and describes its geometry (bond length and respective angles).
Backbone hydrogen bonds are reported separately from those involving side chains.

Detection of hydrogen bond donors and acceptors in a given PDB deposit is based on the definition of respective monomers.
The most popular monomers including amino acids and nucleotides are provided with the BioShell distribution.
Others must be provided by a user, either in CIF or in PDB format.

The input protein must include hydrogen atoms. Crystal structures should be protonated before using this app.

USAGE:
    ap_hbonds input.pdb [ligand_1.cif [ ligand_2.pdb ...] ]

EXAMPLE:
    ap_hbonds 2gb1.pdb

OUTPUT (fragment):

)";

/** @brief Finds all hydrogen bonds in a given protein structure
 *
 * CATEGORIES: core::calc::structural::interactions::HydrogenBondInteraction
 * KEYWORDS:   PDB input; interactions
 * GROUP:      Structure calculations;
 * IMG: ap_hbonds_sq.png
 * IMG_ALT: Hydrogen bonds
 */
int main(const int argc, const char *argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::io; // --- Pdb reader and PdbLineFilter live there
  using namespace core::chemical;
  using namespace core::calc::structural::interactions;

  // ---------- Register additional monomers, provided by a user from a command line, either .pdb or .cif
  for (int i = 2; i < argc; ++i)
    MonomerStructureFactory::get_instance().register_monomer(argv[i]);

  // ---------- Read a PDB file given as an argument to this program
  Pdb reader(argv[1]);

  HydrogenBondCollector collector;
  std::vector<ResiduePair_SP> sink;

  // ---------- Iterate over all models in the input file
  for (size_t i_protein = 0; i_protein < reader.count_models(); ++i_protein) {
    sink.clear();
    // --- create the structure for the current model (the index must follow the loop counter)
    core::data::structural::Structure_SP s = reader.create_structure(i_protein);
    collector.collect(*s, sink);

    // ---------- The first loop prints backbone hydrogen bonds
    std::cout << BackboneHBondInteraction::output_header() << "\n";
    for (const ResiduePair_SP ri:sink) {
      BackboneHBondInteraction_SP bi = std::dynamic_pointer_cast<BackboneHBondInteraction>(ri);
      if (bi) std::cout << *bi << "\n";
    }

    // ---------- The second loop prints hydrogen bonds involving side chain atoms
    std::cout << HydrogenBondInteraction::output_header() << "\n";
    for (const ResiduePair_SP ri:sink) {
      BackboneHBondInteraction_SP bi = std::dynamic_pointer_cast<BackboneHBondInteraction>(ri);
      if (!bi) std::cout << *std::dynamic_pointer_cast<HydrogenBondInteraction>(ri) << "\n";
    }
  }
}

ap_interdigitated_strands¶
Reads a PDB file, creates a BetaStructuresGraph for it and finds all interdigitated strands. A strand is interdigitated when its hydrogen-bonded neighbors within a beta sheet come from different protein chains than that strand.
EXAMPLE:
ap_interdigitated_strands 2fdo.pdb
REFERENCE: Wang S. et al. “Crystal Structure of the Conserved Protein of Unknown Function AF2331 from Archaeoglobus fulgidus DSM 4304 Reveals a New Type of Alpha/Beta Fold” Protein Sci. (2009) 18 2410–2419.
Keywords:
Categories:
- core::data::structural::BetaStructuresGraph
Input files:
Output files:
Program source:
#include <algorithm>
#include <iostream>
#include <set>
#include <string>
#include <vector>

#include <core/data/io/Pdb.hh>
#include <core/algorithms/graph_algorithms.hh>
#include <core/data/structural/BetaStructuresGraph.hh>
#include <core/calc/structural/ProteinArchitecture.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Reads a PDB file, creates a BetaStructuresGraph for it and finds all interdigitated strands.

A strand is interdigitated when its hydrogen-bonded neighbors within a beta sheet come from different
protein chains than that strand.

EXAMPLE:
    ap_interdigitated_strands 2fdo.pdb

REFERENCE:
Wang S. et al. "Crystal Structure of the Conserved Protein of Unknown Function AF2331 from
Archaeoglobus fulgidus DSM 4304 Reveals a New Type of Alpha/Beta Fold"
Protein Sci. (2009) 18 2410–2419.

)";

void index_strands(core::data::structural::BetaStructuresGraph_SP g) {

  using core::data::structural::Strand_SP;

  // ---------- Firstly, let's find the first strand on the path: the one with just one partner
  Strand_SP first_strand = nullptr;
  for (auto it = g->begin_strand(); it != g->end_strand(); ++it) {
    if (g->count_partners(*it) == 1) {
      if (first_strand == nullptr) first_strand = *it;
      else if (first_strand->length() > (*it)->length()) // there are two edge strands, take the shorter one
        first_strand = *it;
    }
  }
  // ---------- If it's a barrel, take the shortest one
  if (first_strand == nullptr) {
    first_strand = *g->begin_strand(); // --- assign to the outer variable; re-declaring it here would leave it null
    for (auto it = g->begin_strand(); it != g->end_strand(); ++it) {
      if (first_strand->length() > (*it)->length()) first_strand = *it;
    }
  }

  std::set<Strand_SP> visited;
  std::vector<Strand_SP> stack;
  std::vector<Strand_SP> scratch;
  stack.push_back(first_strand);
  core::index2 idx = 0;
  while(stack.size()>0) {
    // --- pop a strand from stack, mark as visited
    Strand_SP s = stack.back();
    s->strand_index_in_sheet = (++idx);
    visited.insert(s);
    stack.pop_back();
    // --- get its neighbors, push to scratch if not visited yet
    scratch.clear();
    for (auto it = g->begin_strand(s); it != g->end_strand(s); ++it)
      if(visited.find(*it)==visited.cend()) scratch.push_back(*it);
    // --- sort neighbors from the longest to the shortest
    std::sort(scratch.begin(), scratch.end(), [](Strand_SP lhs, Strand_SP rhs) { return rhs->length() < lhs->length(); });
    // --- push them in that order, so the shortest neighbor ends up on top of the stack
    for(Strand_SP si:scratch) stack.push_back(si);
  }
}

struct OrderStandsInSheet {
  bool operator()(core::data::structural::Strand_SP lhs, core::data::structural::Strand_SP rhs) {
    return lhs->strand_index_in_sheet < rhs->strand_index_in_sheet;
  }
};

/** @brief Creates a BetaStructuresGraph and finds interdigitated sheets
 *
 * CATEGORIES: core::data::structural::BetaStructuresGraph
 * KEYWORDS:   PDB input
 * GROUP:      Structure calculations;
 * IMG: 2fdo-7-sq.png
 * IMG_ALT: Interdigitated beta-sheet of 2FDO deposit; the two chains A and B shown with different colors
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::structural;
  using namespace core::data::io;

  // core::data::io::Pdb reader(argv[1], (is_not_alternative), true);
  core::data::io::Pdb reader(argv[1], core::data::io::all_true(is_not_alternative, is_not_water), core::data::io::keep_all, true);
  core::data::structural::Structure_SP strctr = reader.create_structure(0);
  std::string sec_str;
  for (auto chain : *strctr) sec_str += chain->create_sequence()->str();

  core::calc::structural::ProteinArchitecture a(*strctr, false);
  BetaStructuresGraph_SP g = a.create_strand_graph();
  g->print_adjacency_matrix(std::cerr);

  for (auto s_it = g->begin(); s_it != g->end(); ++s_it) {
    auto s = *s_it;
    for (auto nbr = g->begin_strand(s); nbr != g->end_strand(s); ++nbr) {
      core::data::structural::Strand_SP strnd = *nbr;
    }
  }

  // ---------- Copy all pairings first, because the graph is modified in the next loop
  std::vector<StrandPairing_SP> edges;
  for (auto it = g->cbegin_pairings(); it != g->cend_pairings(); ++it)
    edges.push_back((*it).second);

  // ---------- Remove every pairing between two strands of the same chain
  for (StrandPairing_SP sp:edges) {
    Strand_SP first = sp->first_strand;
    Strand_SP second = sp->second_strand;
    if ((*first)[0]->owner()->id() == (*second)[0]->owner()->id()) {
      g->remove_strand_pairing(first, second);
    }
  }

  // ---------- What is left are sheets connected only by inter-chain pairings
  auto sheets = core::algorithms::connected_components<BetaStructuresGraph, Strand_SP, StrandPairing_SP>(*g, 2);
  int cnt = 0;
  for(const auto & sheet: sheets) {
    std::cout << utils::string_format("-------------- Sheet %d -------------------\n",++cnt);
    auto strnd = sheet->cbegin_strand();
    index_strands(g); // --- index strands in a current sheet
    std::vector<Strand_SP> strands;
    for(auto it=sheet->cbegin_strand();it!=sheet->cend_strand();++it) strands.push_back(*it);
    std::sort(strands.begin(), strands.end(),OrderStandsInSheet{});
    for (auto s:strands)
      std::cout << *s << ", has " << g->count_partners(s) << " edges\n";
  }
}

ap_ligand_clustering¶
ap_ligand_clustering performs clustering analysis on small molecule docking poses. The default settings for this program are clustering_cutoff: 5.0 Angstroms and min_cluster_size: 5. Every line of the output describes a single cluster: the first field is the cluster size, followed by the names of the PDB files that belong to that cluster.
SEE: the pdb_from_clustering.py example script creates PDB files from the output of ap_ligand_clustering and the input PDB files.
USAGE:
ap_ligand_clustering code list_of_files.txt [ min_cluster_size clustering_cutoff1 .. ]
SEE ALSO:
ap_docking_crmsd - for flexible docking crmsd calculations
ap_stiff_docking_crmsd - for rigid docking crmsd calculations
ap_LigandsOnGridProtocol - simple clustering by projecting ligands on a 3D grid; crude but fast; can handle very large pools of models
EXAMPLE:
ap_ligand_clustering CLO list.txt 10 2.0 5.0
Keywords:
- PDB input
- clustering
Categories:
- core::calc::clustering::HierarchicalClustering1B
Input files:
Output files:
Program source:
#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

#include <core/data/io/Pdb.hh>
#include <utils/exit.hh>
#include <utils/io_utils.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <core/protocols/PairwiseLigandCrmsd.hh>
#include <core/calc/clustering/DistanceByValues1B.hh>
#include <core/calc/clustering/HierarchicalClustering1B.hh>

std::string program_info = R"(

ap_ligand_clustering performs clustering analysis on small molecule docking poses.
The default settings for this program are: clustering_cutoff: 5.0 Angstroms and min_cluster_size: 5

Every line of the output contains a single cluster: the first field is the cluster size,
followed by PDB file names that belong to that cluster

SEE: pdb_from_clustering.py example script is a tool to create PDB files based on output from ap_ligand_clustering
and input PDB files

USAGE:
    ap_ligand_clustering code list_of_files.txt [ min_cluster_size clustering_cutoff1 .. ]

SEE ALSO:
    ap_docking_crmsd - for flexible docking crmsd calculations
    ap_stiff_docking_crmsd - for rigid docking crmsd calculations
    ap_LigandsOnGridProtocol - simple clustering by projecting ligands on a 3D grid; crude but fast;
        can handle very large pools of models

EXAMPLE:
    ap_ligand_clustering CLO list.txt 10 2.0 5.0

)";

/** @brief Performs clustering analysis on small molecule docking poses
 *
 * CATEGORIES: core::calc::clustering::HierarchicalClustering1B
 * KEYWORDS:   PDB input; clustering;
 * GROUP:      Structure calculations; Docking;
 * IMG:
 * IMG_ALT:
 */
int main(const int argc, const char* argv[]) {

  if (argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  utils::Logger l("ap_ligand_clustering");
  using namespace core::data::structural::selectors;
  AtomSelector_SP select_ligand = std::static_pointer_cast<AtomSelector>(std::make_shared<SelectResidueByName>(argv[1]));
  AtomSelector_SP select_ca = std::static_pointer_cast<AtomSelector>(std::make_shared<IsCA>());
  core::protocols::PairwiseLigandCrmsd crmsd_calculator(select_ligand, select_ca);

  // ---------- Read all the poses listed in the input file
  std::vector<std::string> fnames = utils::read_listfile(argv[2]);
  for(const std::string & f:fnames) {
    core::data::io::Pdb reader(f, "is_not_hydrogen", false, false);
    crmsd_calculator.add_input_structure(reader.create_structure(0), f);
  }

  // --- the cluster size is an integer, so parse it with atoi() rather than atof()
  core::index2 min_cluster_size = (argc < 4) ? 5 : atoi(argv[3]);
  std::vector<double> clustering_cutoffs;
  double max_clustering_cutoff = 0.0;
  if (argc < 5) { // --- no cutoff given (testing argc < 4 here would leave the list empty when only min_cluster_size is given)
    max_clustering_cutoff = 15.0;
    clustering_cutoffs.push_back(15.0);
  } else {
    for (int i = 4; i < argc; ++i) {
      clustering_cutoffs.push_back(atof(argv[i]));
      max_clustering_cutoff = std::max(max_clustering_cutoff, clustering_cutoffs.back());
    }
  }

  // ---------- Distances are evaluated up to 1.5 of the largest cutoff and rescaled to fit a single byte (0-255)
  double evaluate_cutoff = max_clustering_cutoff * 1.5;
  double conversion_factor = 255 / evaluate_cutoff;

  crmsd_calculator.crmsd_cutoff(evaluate_cutoff);
  crmsd_calculator.set_out_matrix();
  crmsd_calculator.calculate();
  auto out = crmsd_calculator.out_matrix();

  core::calc::clustering::DistanceByValues1B distances(crmsd_calculator.tags());
  for (core::index4 i = 1; i < fnames.size(); ++i) {
    for (core::index4 j = 0; j < i; ++j) {
      if (out->has_element(i, j)) {
        double v = out->at(i, j) * conversion_factor;
        distances.set(i, j, core::index1(v));
        distances.set(j, i, core::index1(v));
        // --- uncomment the line below to see the actual distance values together with their converted counterparts
        // std::cerr << i << " " << j << " " << out->at(i, j) << " " << int(v) << "\n";
      }
    }
  }

  core::calc::clustering::HierarchicalClustering1B hac(distances.labels(), "");
  hac.run_clustering(distances, "COMPLETE_LINK");

  // ---------- Write a separate output file for every requested cutoff
  for (double clustering_cutoff:clustering_cutoffs) {
    std::ofstream out(utils::string_format("clusters-%.2f.txt", clustering_cutoff));
    auto clusters = hac.get_clusters(clustering_cutoff * conversion_factor, min_cluster_size);
    l << utils::LogLevel::INFO << clusters.size() << " clusters created for cutoff " << clustering_cutoff << "\n";
    for (const auto &c : clusters) {
      std::vector<std::string> el = c->cluster_items(c);
      out << el.size() << " ";
      for (const std::string &s:el) out << s << " ";
      out << "\n";
    }
    out.close();
  }
}
ap_ligand_contacts¶
ap_ligand_contacts finds contacts between a ligand molecule and a protein. It reads a multi-model PDB file and for each of the models detects contacts between a particular ligand and the rest of the complex. The ligand must be identified by its three-letter code. The output lists the interacting residues (name and residue id) along with the contacting atoms and their distance, separately for each model.
USAGE:
ap_ligand_contacts input.pdb ligand-code cutoff-distance
EXAMPLE:
ap_ligand_contacts 5edw.pdb TTP 7.0
where 5edw.pdb is the input file, TTP is the ligand code and 7.0 is the contact distance in Angstroms
OUTPUT (fragment):
 ---- ligand ----   | --------- partner --------- | distance
   c res   id atname |   c res   id    type atname | in Angstrom
   A TTP  404  C5'    A ASP  105 protein  OD1  3.371
   A TTP  404  C5'    A ASP  105 protein  OD2  3.149
   A TTP  404  O2G    A LYS  159 protein   CE  2.958
   A TTP  404  O2G    A LYS  159 protein   NZ  2.936
   A TTP  404  O3G    A LYS  159 protein   NZ  3.455
   A TTP  404  O2A    A  CA  401 unknown   CA  2.316
   A TTP  404  O2B    A  CA  401 unknown   CA  2.375
   A TTP  404  O2G    A  CA  401 unknown   CA  2.325
   A TTP  404  O2A    A  CA  402 unknown   CA  2.356
   A TTP  404  O1A    A HOH  510 unknown    O  3.150
   A TTP  404  O3A    A HOH  510 unknown    O  2.782
   A TTP  404  O1G    A HOH  510 unknown    O  3.373
   A TTP  404   O2    T  DG    6 nucleic   N1  3.048
   A TTP  404   O2    T  DG    6 nucleic   C2  3.467
   A TTP  404   O2    T  DG    6 nucleic   N2  2.931
   A TTP  404   N3    T  DG    6 nucleic   O6  3.129
Keywords:
Categories:
- core::data::io::Pdb
Input files:
Output files:
Program source:
#include <algorithm>
#include <iostream>
#include <map>
#include <string>

#include <core/data/io/Pdb.hh>
#include <utils/string_utils.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ap_ligand_contacts finds contacts between a ligand molecule and a protein.

It reads a multi-model PDB file and for each of the models detects contacts between a particular ligand
and the rest of the complex. The ligand must be identified by its three-letter code.
The output lists the interacting residues (name and residueId), separately for each model

USAGE:
    ap_ligand_contacts input.pdb ligand-code cutoff-distance

EXAMPLE:
    ap_ligand_contacts 5edw.pdb TTP 7.0

where 5edw.pdb is the input file, TTP is the ligand code and 7.0 is the contact distance in Angstroms

OUTPUT (fragment):
 ---- ligand ----   | --------- partner --------- | distance
   c res   id atname |   c res   id    type atname | in Angstrom
   A TTP  404  C5'    A ASP  105 protein  OD1  3.371
   A TTP  404  C5'    A ASP  105 protein  OD2  3.149
   A TTP  404  O2G    A LYS  159 protein   CE  2.958
   A TTP  404  O2G    A LYS  159 protein   NZ  2.936
   A TTP  404  O3G    A LYS  159 protein   NZ  3.455
   A TTP  404  O2A    A  CA  401 unknown   CA  2.316
   A TTP  404  O2B    A  CA  401 unknown   CA  2.375
   A TTP  404  O2G    A  CA  401 unknown   CA  2.325
   A TTP  404  O2A    A  CA  402 unknown   CA  2.356
   A TTP  404  O1A    A HOH  510 unknown    O  3.150
   A TTP  404  O3A    A HOH  510 unknown    O  2.782
   A TTP  404  O1G    A HOH  510 unknown    O  3.373
   A TTP  404   O2    T  DG    6 nucleic   N1  3.048
   A TTP  404   O2    T  DG    6 nucleic   C2  3.467
   A TTP  404   O2    T  DG    6 nucleic   N2  2.931
   A TTP  404   N3    T  DG    6 nucleic   O6  3.129

)";

/** @brief Finds contacts between a ligand molecule and a protein.
 *
 * CATEGORIES: core::data::io::Pdb
 * KEYWORDS:   PDB input; contact map; ligand
 * GROUP:      Structure calculations;
 * IMG: ap_ligand_contacts.png
 * IMG_ALT: Contacts found between 5EDW protein and its ligand TTP
 */
int main(const int argc, const char* argv[]) {

  if (argc < 4) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  core::data::io::Pdb reader(argv[1]); // --- file name (PDB format, may be gzip-ped)
  const std::string code(argv[2]); // --- The ligand code is the second parameter of the program
  double cutoff = utils::from_string<double>(argv[3]); // --- The third parameter is the contact distance (in Angstroms)

  std::cout << " ---- ligand ---- | --------- partner -------- | distance\n";
  std::cout << "c res id atname | c res id type atname | in Angstrom\n";

  for (size_t i = 0; i < reader.count_models(); ++i) { // --- Iterate over all models in the input file
    core::data::structural::Structure_SP strctr = reader.create_structure(i);

    // --- Here we use the standard std::find_if algorithm to find the ligand residue by its 3-letter code
    auto ligand = std::find_if(strctr->first_residue(), strctr->last_residue(),
      [&code](core::data::structural::Residue_SP res) {return (res->residue_type().code3==code);});

    if(ligand== strctr->last_residue()) { // --- If no ligand - print a message and take next structure
      std::cerr << "Model " << i << " of " << argv[1] << " has no " << argv[2] << " residue\n";
      continue;
    }
    if (reader.count_models() > 1) std::cout << "# Model " << i + 1 << "\n";

    for (auto it_resid = strctr->first_residue(); it_resid != strctr->last_residue(); ++it_resid) {
      if(*it_resid == *ligand) continue; // --- skip the ligand itself
      double d = (*it_resid)->min_distance(*ligand);
      if (d < cutoff) { // --- if this residue is close enough, list all its atom pairs within the cutoff
        for(auto const & ligand_atom : **ligand) {
          for(auto const & other_atom : **it_resid) {
            if(ligand_atom->distance_to(*other_atom) <= cutoff) {
              std::cout << utils::string_format("%4s %3s %4d %4s %4s %3s %4d %6s %4s %6.3f\n",
                (**ligand).owner()->id().c_str(), (**ligand).residue_type().code3.c_str(), (**ligand).id(),
                ligand_atom->atom_name().c_str(),
                (**it_resid).owner()->id().c_str(), (**it_resid).residue_type().code3.c_str(), (**it_resid).id(),
                core::chemical::monomer_type_name((**it_resid).residue_type()).c_str(),
                other_atom->atom_name().c_str(), ligand_atom->distance_to(*other_atom));
            }
          }
        }
      }
    }
  }
}

ap_orient_pdb¶
ap_orient_pdb reads a PDB file and orients the atoms along the axes so that the longest protein dimension is along X and the second longest along Y. This example also creates a second transformation that repeatedly rotates a structure fragment around the Z axis by 45 degrees. The first (mandatory) argument is a PDB file name. The user can also specify a structural fragment by providing a chain ID and a residue range.
USAGE:
ap_orient_pdb input.pdb [chain-id first-resid last-resid]
EXAMPLE:
ap_orient_pdb 2kwi.pdb B 419 446
where 2kwi.pdb is the name of the input file, and 419 and 446 are the first and the last of the reoriented residues of chain B, respectively
Keywords:
- PDB input
- structural fragment
- structure selectors
- PCA
- transformations
Categories:
- core/calc/numeric/PCA.hh
Input files:
Output files:
Program source:
#include <cstdlib>
#include <iostream>
#include <random>
#include <string>
#include <vector>

#include <core/index.hh>
#include <core/data/io/Pdb.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <core/calc/structural/transformations/Rototranslation.hh>
#include <core/calc/numeric/Pca3.hh>
#include <utils/exit.hh>
#include <core/calc/structural/angles.hh>

std::string program_info = R"(

ap_orient_pdb reads a PDB file and orients the atoms along the axes so the longest protein dimension is along X
and the second longest along Y. This example also creates a second transformation, that repeatedly rotates
a structure fragment around Z axis by 45 degrees.

The first (mandatory) argument is a PDB file name. User can also specify a structural fragment
by providing a respective chain-ID and a residue range.

USAGE:
    ap_orient_pdb input.pdb [chain-id first-resid last-resid]

EXAMPLE:
    ap_orient_pdb 2kwi.pdb B 419 446

where 2kwi.pdb is the name of an input file and 419 446 are the first and last of the reoriented residues
of chain B, respectively

)";

/** @brief Shows how to rotate a piece of a protein structure
 *
 * CATEGORIES: core/calc/numeric/PCA.hh
 * KEYWORDS:   PDB input; structural fragment; structure selectors; PCA; transformations
 * GROUP:      Structure calculations;
 * IMG: helices.png
 * IMG_ALT: Alpha helix rotated a few times by a fixed angle
 */
int main(const int argc, const char *argv[]) {

  using namespace core::data::basic; // --- for Vec3 and Array2D

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  core::data::io::Pdb reader(argv[1]); // --- read the input PDB file
  core::data::structural::Structure_SP structure_sp = reader.create_structure(0); // --- create a structure corresponding to the first model

  std::vector<core::data::structural::PdbAtom_SP> points3d;
  // --- if a chain-ID and residue range were also given, create a selector to extract the relevant part of the input
  if (argc > 4) {
    std::string selection_string = utils::string_format("%c:%d-%d", argv[2][0], atoi(argv[3]), atoi(argv[4]));
    core::data::structural::selectors::SelectChainResidues selector(selection_string);
    // --- if selector selects (returns true), copy the atoms
    for (auto atom_it = structure_sp->first_atom(); atom_it != structure_sp->last_atom(); ++atom_it)
      if (selector((*atom_it)->owner())) points3d.push_back(*atom_it);
  } else {
    // --- If there is no selection, copy all the atoms from the given structure
    for (auto atom_it = structure_sp->first_atom(); atom_it != structure_sp->last_atom(); ++atom_it)
      points3d.push_back(*atom_it);
  }

  // ---------- The first model: atoms oriented along their principal axes
  core::calc::numeric::Pca3 pca3(points3d);
  auto rt = pca3.create_transformation();
  std::cout << "MODEL 1\n";
  for (auto atom : points3d) {
    rt.apply(*atom);
    std::cout << (atom)->to_pdb_line() << "\n";
  }
  std::cout << "ENDMDL\n";

  // ---------- Models 2, 3 and 4: each rotated by a further 45 degrees around the Z axis
  auto rt2 = core::calc::structural::transformations::Rototranslation::around_axis(
      Vec3(0,0,1),core::calc::structural::to_radians(45.0),Vec3(0,0,0));
  for(int i=2;i<5;i++) {
    std::cout << "MODEL " << i << "\n";
    for (auto atom : points3d) {
      rt2.apply(*atom);
      std::cout << atom->to_pdb_line() << "\n";
    }
    std::cout << "ENDMDL\n";
  }
}

ap_shuffled_sequence_alignment¶
Reads a FASTA file with two sequences and calculates global sequence alignment scores with one of the two sequences randomly shuffled N_shuffles times (1000 by default). Each time, the reshuffled sequence is aligned to the other one. The statistics of scores from the randomised alignments are then used to estimate the p-value of the global alignment. The default substitution matrix is BLOSUM62. The program prints all the randomized alignment scores and the estimated p-value of the alignment.
USAGE:
ap_shuffled_sequence_alignment input.fasta [[substitution_matrix] N_shuffles]
EXAMPLE:
ap_shuffled_sequence_alignment input2.fasta BLOSUM80 10000
Keywords:
Categories:
- core::alignment::NWAligner
Input files:
Output files:
Program source:
#include <algorithm>
#include <chrono>
#include <cmath>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include <core/data/io/fasta_io.hh>
#include <core/alignment/NWAligner.hh>
#include <core/alignment/scoring/SimilarityMatrix.hh>
#include <core/alignment/scoring/SimilarityMatrixScore.hh>
#include <core/alignment/scoring/NcbiSimilarityMatrixFactory.hh>
#include <core/calc/statistics/OnlineStatistics.hh>
#include <core/calc/statistics/NormalDistribution.hh>
#include <core/calc/statistics/Random.hh>
#include <core/protocols/PairwiseSequenceIdentityProtocol.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Reads a FASTA file with two sequences and calculates global sequence alignment scores with one of the two
sequences randomly shuffled N_shuffles times (1000 by default). Each time the reshuffled sequence is aligned
to the other one. The statistics of scores from randomised alignments is then used to estimate p-value
of the global alignment.

The default substitution-matrix is BLOSUM62.
The program prints all the randomized alignment scores and estimated p-value of the alignment

USAGE:
    ap_shuffled_sequence_alignment input.fasta [[substitution_matrix] N_shuffles]

EXAMPLE:
    ap_shuffled_sequence_alignment input2.fasta BLOSUM80 10000

)";

/** @brief Calculate global sequence alignment scores with one sequence randomly shuffled and estimates alignment p-value
 *
 * CATEGORIES: core::alignment::NWAligner
 * KEYWORDS:   FASTA input; Needleman-Wunsch; sequence alignment; statistics
 * GROUP:      Alignments
 * IMG: ap_shuffled_sequence_alignment.png
 * IMG_ALT: Statistics of random sequence alignment between 1BC6 and SFL95851.1
 */
int main(const int argc, const char *argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::io;
  using namespace core::alignment::scoring;

  core::index2 n_shuffles = (argc > 3) ? atoi(argv[3]) : 1000; // --- The number of random shuffles

  // --- Read the two input sequences
  std::vector<std::shared_ptr<Sequence>> input_sequences;
  read_fasta_file(argv[1], input_sequences);
  if (input_sequences.size() < 2) utils::exit_OK_with_message(program_info); // --- two sequences are required

  // --- find longest sequence to initialize aligner object large enough
  unsigned max_len = std::max(input_sequences[0]->length(), input_sequences[1]->length());

  // --- create aligner object
  core::alignment::NWAligner<short, SimilarityMatrixScore<short>> aligner(max_len);

  // --- get the similarity matrix requested by the user (BLOSUM62 by default)
  std::string substitution_matrix_name = (argc > 2) ? argv[2] : "BLOSUM62";
  NcbiSimilarityMatrix_SP sim_m = NcbiSimilarityMatrixFactory::get().get_matrix(substitution_matrix_name);

  auto start = std::chrono::high_resolution_clock::now(); // --- timer starts!

  // --- score of the alignment between the two original sequences
  std::string j_seq_copy = input_sequences[1]->sequence;
  SimilarityMatrixScore<short> score(input_sequences[0]->sequence, j_seq_copy, *sim_m);
  // --- find score of the alignment; just the score - this is faster than aligning and keeping backtracking info
  short result = aligner.align_for_score(-10, -1, score);

  core::calc::statistics::Random & r = core::calc::statistics::Random::get();
  r.seed(12345); // --- seed the generator for repeatable results
  core::calc::statistics::OnlineStatistics stats; // --- online (on-the fly) statistics calculator

  // --- shuffle the second sequence and align it with the first one, n_shuffles times
  for (size_t i = 0; i < n_shuffles; ++i) {
    std::shuffle(j_seq_copy.begin(), j_seq_copy.end(), r);
    SimilarityMatrixScore<short> score(input_sequences[0]->sequence, j_seq_copy, *sim_m);
    short res = aligner.align_for_score(-10, -1, score);
    stats(res);
    std::cout << res << "\n";
  }
  auto end = std::chrono::high_resolution_clock::now();
  std::chrono::duration<double> time_span = std::chrono::duration_cast<std::chrono::duration<double>>(end - start);
  std::cerr << "# " << n_shuffles << " alignment shuffled scores computed within " << time_span.count() << " [s]\n";

  std::cout << "# alignment score: " << result << "\n";
  std::cout << "# normal p-value, avg, sdev: "
            << 1 - core::calc::statistics::NormalDistribution::cdf(result, stats.avg(), sqrt(stats.var()))
            << " " << stats.avg() << " " << sqrt(stats.var()) << "\n";

  core::protocols::PairwiseSequenceIdentityProtocol protocol;
  protocol.substitution_matrix(substitution_matrix_name).gap_open(-10).gap_extend(-1);
  protocol.add_input_sequence(input_sequences[0]);
  protocol.add_input_sequence(input_sequences[1]);
  protocol.run();
  std::cout << "# same value calculated by a library function: " << protocol.count_identical(0, 1) << "\n";
}

ap_stacking_interactions¶
Finds stacking interactions in a given PDB file. The program reports all stacking interactions detected in a given PDB file. A plausible stacking interaction is detected when two aromatic rings are found to be close in space. Detection of aromatic rings in a given PDB deposit is based on the definition of respective monomers. The most popular monomers including amino acids and nucleotides are provided with the BioShell distribution. Others must be provided by a user, either in CIF or in PDB format.
USAGE:
ap_stacking_interactions input.pdb [ligand1.cif [ligand2.pdb ...] ]
EXAMPLE:
ap_stacking_interactions 5edw.pdb
Keywords:
- PDB input
- PDB line filter
- stacking interactions
Categories:
- core::calc::structural::interactions::StackingInteraction
Input files:
Output files:
Program source:
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include <core/data/io/Pdb.hh>
#include <core/calc/structural/interactions/StackingInteraction.hh>
#include <core/calc/structural/interactions/StackingInteractionCollector.hh>
#include <utils/exit.hh>
#include <utils/LogManager.hh>

std::string program_info = R"(

Finds stacking interactions in a given PDB file.

The program reports all stacking interactions detected in a given PDB file.
A plausible stacking interaction is detected when two aromatic rings are found to be close in space.
Detection of aromatic rings in a given PDB deposit is based on the definition of respective monomers.
The most popular monomers including amino acids and nucleotides are provided with the BioShell distribution.
Others must be provided by a user, either in CIF or in PDB format.

USAGE:
    ap_stacking_interactions input.pdb [ligand1.cif [ligand2.pdb ...] ]

EXAMPLE:
    ap_stacking_interactions 5edw.pdb

)";

using namespace core::data::io; // --- Pdb reader and PdbLineFilter live there
using namespace core::chemical;
using namespace core::calc::structural::interactions;

/** @brief Finds stacking interactions in a given PDB file.
 *
 * CATEGORIES: core::calc::structural::interactions::StackingInteraction
 * KEYWORDS:   PDB input; PDB line filter; stacking interactions
 * GROUP:      Structure calculations;
 * IMG: ap_stacking_interactions_sq.png
 * IMG_ALT: Two tyrosine residues in stacking interaction
 */
int main(const int argc, const char *argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  utils::LogManager::INFO(); // --- INFO is the default logging level; set it to FINE to see more

  // ---------- Read a PDB file given as an argument to this program
  Pdb reader(argv[1], // --- input PDB file
    all_true(is_not_water, is_not_alternative, is_not_hydrogen, invert_filter(is_bb)), // --- Inverted backbone selector reads only side chains
    core::data::io::only_ss_from_header, true); // --- yes, read header
  core::data::structural::Structure_SP s = reader.create_structure(0);

  // ---------- Register additional monomers, provided by a user from a command line, either .pdb or .cif
  for (int i = 2; i < argc; ++i)
    MonomerStructureFactory::get_instance().register_monomer(argv[i]);

  StackingInteractionCollector collector=StackingInteractionCollector();
  std::vector<ResiduePair_SP> sink;
  collector.collect(*s,sink);

  std::cout << StackingInteraction::output_header()<<"\n";
  for (const ResiduePair_SP ri:sink) {
    StackingInteraction_SP bi = std::dynamic_pointer_cast<StackingInteraction>(ri);
    if (bi) std::cout << *bi << "\n";
  }
}

ap_vdw_interactions¶
ap_vdw_interactions finds all van der Waals interactions in a given protein structure.
USAGE:
ap_vdw_interactions input.pdb [input2.pdb ...]
EXAMPLE:
ap_vdw_interactions 2gb1.pdb
OUTPUT (fragment):
Keywords:
- PDB input
- interactions
Categories:
- core::calc::structural::interactions::VdWInteraction
Input files:
Output files:
Program source:
#include <iostream>
#include <map>

#include <core/data/io/Pdb.hh>
#include <core/calc/structural/interactions/VdWInteraction.hh>
#include <core/calc/structural/interactions/VdWInteractionCollector.hh>
#include <utils/string_utils.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ap_vdw_interactions finds all van der Waals interactions in a given protein structure.

USAGE:
    ap_vdw_interactions input.pdb [input2.pdb ...]

EXAMPLE:
    ap_vdw_interactions 2gb1.pdb

OUTPUT (fragment):

)";

/** @brief Finds all van der Waals interactions in a given protein structure.
 *
 * CATEGORIES: core::calc::structural::interactions::VdWInteraction
 * KEYWORDS: PDB input; interactions
 * GROUP: Structure calculations;
 * IMG: ap_ligand_contacts.png
 * IMG_ALT: Contacts found between 5EDW protein and its ligand TTP
 */
int main(const int argc, const char* argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::calc::structural::interactions;

  VdWInteractionCollector collector;
  for (size_t i_protein = 1; i_protein < argc; ++i_protein) { // --- Iterate over all input files
    core::data::io::Pdb reader(argv[i_protein]); // --- file name (PDB format, may be gzip-ped)
    for (size_t i_model = 0; i_model < reader.count_models(); ++i_model) { // --- Iterate over all models in the input file
      std::vector<ResiduePair_SP> sink;
      core::data::structural::Structure_SP strctr = reader.create_structure(i_model);
      collector.collect(*strctr, sink);
      std::cout << VdWInteraction::output_header()<<"\n";
      for (const ResiduePair_SP ri:sink) {
        VdWInteraction_SP bi = std::dynamic_pointer_cast<VdWInteraction>(ri);
        if (bi) std::cout << *bi << "\n";
      }
    }
  }
}

ap_AAHydrophobicity¶
Reads a PDB file and replaces the B-factor column with hydrophobicity values on the Kyte-Doolittle (KD) scale. If only a PDB file is given, every B-factor is replaced by the KD hydrophobicity of the respective residue. The user can also provide a Multiple Sequence Alignment (MSA) in ClustalO format (.aln); hydrophobicity values are then averaged over the corresponding column of the MSA. In that case the sequence from the given PDB file must also be included in the alignment, and its name is the third argument of the program. The modified structure is written to out.pdb; 5.0 is added to each value because the KD scale spans -4.5 to 4.5 and a B-factor cannot be negative.
USAGE:
ap_AAHydrophobicity input.pdb
ap_AAHydrophobicity input.pdb input.aln sequence-id
EXAMPLE:
ap_AAHydrophobicity 2gb1.pdb
ap_AAHydrophobicity 2gb1.pdb 2gb1.aln 2GB1
REFERENCE: Kyte, Jack, and Russell F. Doolittle. “A simple method for displaying the hydropathic character of a protein.” Journal of molecular biology 157.1 (1982): 105-132. doi: 10.1016/0022-2836(82)90515-0
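The mapping from residues to B-factor values can be sketched without BioShell as follows. The table holds the published Kyte-Doolittle hydropathy indices; the function names `kyte_doolittle`, `column_average_kd` and `kd_to_b_factor` are illustrative only, and returning 0.0 for an all-gap column is an arbitrary choice of this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Kyte-Doolittle hydropathy index keyed by one-letter amino acid code.
const std::map<char, double>& kyte_doolittle() {
  static const std::map<char, double> scale = {
      {'A', 1.8},  {'R', -4.5}, {'N', -3.5}, {'D', -3.5}, {'C', 2.5},
      {'Q', -3.5}, {'E', -3.5}, {'G', -0.4}, {'H', -3.2}, {'I', 4.5},
      {'L', 3.8},  {'K', -3.9}, {'M', 1.9},  {'F', 2.8},  {'P', -1.6},
      {'S', -0.8}, {'T', -0.7}, {'W', -0.9}, {'Y', -1.3}, {'V', 4.2}};
  return scale;
}

// Average KD value over one MSA column; gaps and unknown symbols are skipped.
// Returns 0.0 when the column holds no scorable residue.
double column_average_kd(const std::string& column) {
  double sum = 0.0;
  int n = 0;
  for (char aa : column) {
    auto it = kyte_doolittle().find(aa);
    if (it == kyte_doolittle().end()) continue;
    sum += it->second;
    ++n;
  }
  return (n > 0) ? sum / n : 0.0;
}

// Shift a KD value by 5.0 so that it becomes a non-negative B-factor.
double kd_to_b_factor(double kd) {
  assert(kd >= -4.5 && kd <= 4.5);
  return kd + 5.0;
}
```

With this shift the most hydrophilic residue (Arg, -4.5) maps to 0.5 and the most hydrophobic one (Ile, 4.5) to 9.5, so the result can be used directly for B-factor coloring in a molecular viewer.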
Keywords:
- PDB input
- hydrophobicity
- structure selectors
- PDB line filter
- sequence alignment
- MSA input
Categories:
- core/chemical/AAHydrophobicity
Input files:
Output files:
Program source:
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iomanip>

#include <core/algorithms/predicates.hh>
#include <core/alignment/NWAligner.hh>
#include <core/alignment/scoring/SimilarityMatrix.hh>
#include <core/alignment/scoring/SimilarityMatrixScore.hh>
#include <core/alignment/scoring/NcbiSimilarityMatrixFactory.hh>
#include <core/chemical/AAHydrophobicity.hh>
#include <core/data/io/Pdb.hh>
#include <core/data/io/clustalw_io.hh>
#include <utils/exit.hh>
#include <core/data/structural/selectors/structure_selectors.hh>

std::string program_info = R"(

Reads a PDB file and substitutes b-factor column with hydrophobicity values according to Kyte-Doolittle scale.
If just a PDB file is given as an input, all b-factors will be replaced by respective KD hydrophobicity values.
User can also provide a Multiple Sequence Alignment (MSA) in ClustalO format (.aln); hydrophobicity values
will be averaged over a corresponding column of the MSA. In that case the sequence from the given PDB file
must also be included in the alignment; its name is third argument of the program.

USAGE:
    ap_AAHydrophobicity input.pdb
    ap_AAHydrophobicity input.pdb input.aln sequence-id

EXAMPLE
    ap_AAHydrophobicity 2gb1.pdb
    ap_AAHydrophobicity 2gb1.pdb 2gb1.aln 2GB1

REFERENCE:
Kyte, Jack, and Russell F. Doolittle. "A simple method for displaying the hydropathic character of a protein."
Journal of molecular biology 157.1 (1982): 105-132. doi: 10.1016/0022-2836(82)90515-0

)";

/** @brief Reads a PDB file and substitutes b-factor column with hydrophobicity values according to Kyte-Doolittle scale.
 *
 * This example prints atoms for each side chain in a protein
 *
 * CATEGORIES: core/chemical/AAHydrophobicity;
 * KEYWORDS: PDB input; hydrophobicity; structure selectors; PDB line filter; sequence alignment; MSA input
 * GROUP: Sequence calculations
 */
int main(const int argc, const char *argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::io; // is_not_alternative, Pdb and read_clustalw_file() are from this namespace
  using namespace core::data::sequence;
  using namespace core::data::structural; // Structure and Residue come from here
  using namespace core::alignment::scoring;

  Pdb reader(argv[1], all_true(is_not_alternative, is_not_water));
  Structure_SP strctr = reader.create_structure(0); // create a Structure object from the first model found in the input file
  Chain & first_chain = *(*strctr)[0]; // --- We assume the first chain is the one used in MSA
  first_chain.erase(std::remove_if(first_chain.begin(), first_chain.end(),
      core::algorithms::Not<selectors::IsAA>(selectors::IsAA())), first_chain.end());

  std::vector<double> kd_values;
  const core::chemical::AAHydrophobicity &kd_scale = core::chemical::AAHydrophobicity::KyteDoolittle;
  std::ofstream out("out.pdb");

  // ---------- The case when we have both a PDB file and a multiple sequence alignment (.aln file)
  if (argc ==4) {
    std::vector<Sequence_SP> msa; // --- placeholder for aligned sequences
    core::data::io::read_clustalw_file(argv[2],msa); // --- read the MSA and store sequences in a vector

    // ---------- Find the reference sequence in the alignment
    std::string ref_sequence_name(argv[3]); // --- the name of the sequence
    auto s = std::find_if(msa.begin(), msa.end(),
        [&ref_sequence_name](Sequence_SP s) { return s->header().find(ref_sequence_name) != std::string::npos; });
    if (s == msa.end())
      utils::exit_OK_with_message(
          "Can't find the reference sequence in the given MSA. Is the name correct: " + ref_sequence_name);
    Sequence_SP ref_sequence = *s;

    // --- Create a sequence object for the first chain of the PDB deposit
    core::data::sequence::SecondaryStructure_SP pdb_seq = first_chain.create_sequence();

    // ---------- we have to align the reference sequence with the sequence found in the given PDB file
    // ---------- as they might differ; we set PDB sequence to be a query and the reference - as a template
    unsigned max_len = std::max(pdb_seq->length(), ref_sequence->length());
    core::alignment::NWAligner<short, SimilarityMatrixScore<short>> aligner(max_len);
    NcbiSimilarityMatrix_SP sim_m = NcbiSimilarityMatrixFactory::get().get_matrix("BLOSUM62");
    SimilarityMatrixScore<short> score(pdb_seq->sequence, ref_sequence->sequence, *sim_m);
    aligner.align(-10, -1, score);
    auto alignment = aligner.backtrace();

    std::cout << "#msa_col aa_col aa res_id : avg_KD n_aa\n";
    // ---------- Iterate over all columns of the MSA
    for (core::index2 i_res = 0; i_res < ref_sequence->length(); ++i_res) {
      if (ref_sequence->get_monomer(i_res).is_gap()) continue;
      int j = alignment->which_query_for_template(i_res); // --- -1 denotes a gap, otherwise the index is non-negative
      if (j < 0) continue;
      double avg_kd = 0;
      double n = 0;
      for (Sequence_SP si:msa) {
        if (!si->get_monomer(i_res).is_gap()) {
          avg_kd += kd_scale.hydrophobicity(si->get_monomer(i_res));
          ++n;
        }
      }
      std::cout << utils::string_format("%4d %4d %c %4d : %5.2f %3d\n", i_res, j,
          first_chain[j]->residue_type().code1, first_chain[j]->id(), avg_kd / n, int(n));
      avg_kd = avg_kd / n + 5.0; // --- we add 5.0 because KD scale is from -4.5 to 4.5 and b-factor can't be negative
      for (const PdbAtom_SP &a : *(first_chain[j])) {
        a->b_factor(avg_kd);
        out << a->to_pdb_line() << "\n";
      }
    }
    return 0;
  }

  // ---------- The case when we have only a PDB file: iterate over all residues in the structure
  for (auto res_it = strctr->first_residue(); res_it != strctr->last_residue(); ++res_it) {
    double val = kd_scale.hydrophobicity((*res_it)->residue_type()) + 5.0;
    for (const PdbAtom_SP &a : **res_it) {
      a->b_factor(val);
      out << a->to_pdb_line() << "\n";
    }
  }
  out.close();
}

ap_AlignmentPValuesProtocol¶
ap_AlignmentPValuesProtocol calculates all-vs-all pairwise semiglobal alignments between protein sequences read from a given input file. The p-value of every alignment is estimated from the statistics of reshuffled sequences (30 randomly shuffled alignments are calculated).
USAGE:
ap_AlignmentPValuesProtocol input.fasta
EXAMPLE:
ap_AlignmentPValuesProtocol small500_95identical.fasta
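The idea of a shuffling-based p-value can be sketched as follows. This is a generic empirical estimator with a +1 pseudocount, shown only for illustration; the source does not state which estimator AlignmentPValuesProtocol uses internally, and the names `shuffled_copy` and `empirical_p_value` are hypothetical. The background scores are assumed to come from aligning one sequence against shuffled copies of the other.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <string>
#include <vector>

// Randomly permuted copy of a sequence: composition is preserved, order is destroyed.
std::string shuffled_copy(std::string seq, std::mt19937& rng) {
  std::shuffle(seq.begin(), seq.end(), rng);
  return seq;
}

// Empirical p-value: the fraction of background (shuffled) scores that are
// at least as high as the observed score, with a +1 pseudocount.
double empirical_p_value(double observed, const std::vector<double>& shuffled_scores) {
  assert(!shuffled_scores.empty());
  const auto n_as_good = std::count_if(shuffled_scores.begin(), shuffled_scores.end(),
                                       [observed](double s) { return s >= observed; });
  return (n_as_good + 1.0) / (shuffled_scores.size() + 1.0);
}
```

Note that with 30 shuffles this particular estimator cannot return a value below 1/31, so resolving smaller p-values requires either more shuffles or a fit of the background scores to a parametric distribution.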
Keywords:
- FASTA input
- sequence alignment
- statistics
Categories:
- core/protocols/AlignmentPValuesProtocol.hh
Input files:
Output files:
Program source:
#include <chrono>
#include <iostream>

#include <utils/exit.hh>
#include <core/alignment/aligner_factory.hh>
#include <core/data/basic/Array2D.hh>
#include <core/data/sequence/Sequence.hh>
#include <core/protocols/AlignmentPValuesProtocol.hh>
#include <core/data/io/fasta_io.hh>
#include <core/data/basic/SparseMap2D.hh>

std::string program_info = R"(

ap_AlignmentPValuesProtocol calculates each-vs-each pairwise semiglobal alignments between protein sequences
read from a given input file. p-value for every alignment is estimated based on re-shuffled statistics
(30 randomly shuffled alignments are calculated)

USAGE:
    ap_AlignmentPValuesProtocol input.fasta

EXAMPLE:
    ap_AlignmentPValuesProtocol small500_95identical.fasta

)";

/** @brief Uses AlignmentPValuesProtocol protocol to calculate all pairwise p-values for a given set of sequences
 *
 * CATEGORIES: core/protocols/AlignmentPValuesProtocol.hh
 * KEYWORDS: FASTA input; sequence alignment; statistics
 * GROUP: Alignments
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::sequence;
  using namespace core::protocols;
  using namespace core::alignment;

  core::protocols::AlignmentPValuesProtocol protocol;
  protocol.gap_open(-10).gap_extend(-1).substitution_matrix("BLOSUM62").keep_alignments(true).
      alignment_method(AlignmentType::SEMIGLOBAL_ALIGNMENT).keep_alignments(true).n_threads(4);
  protocol.n_shuffles(30).p_value_cutoff(0.01);

  std::vector<Sequence_SP> input_sequences;
  core::data::io::read_fasta_file(argv[1], input_sequences);
  for(Sequence_SP si: input_sequences) protocol.add_input_sequence(si);

  auto start = std::chrono::high_resolution_clock::now(); // --- timer starts!
  protocol.run();
  auto end = std::chrono::high_resolution_clock::now();
  std::chrono::duration<double> time_span = std::chrono::duration_cast<std::chrono::duration<double>>(end - start);
  std::cerr << input_sequences.size() * (input_sequences.size() - 1) / 2.0
            << " global alignment sequence similarities calculated within " << time_span.count() << " [s]\n";

  protocol.print_p_values(std::cout);
}

ap_LigandsOnGridProtocol¶
ap_LigandsOnGridProtocol reads a list of PDB files and creates a grid with the ligands placed in it.
USAGE:
ap_LigandsOnGridProtocol box_grid_width models-list
Keywords:
- PDB input
Categories:
- core::protocols::LigandsOnGridProtocol
Input files:
Output files:
Program source:
#include <vector>
#include <string>
#include <fstream>
#include <iostream>
#include <cstring>

#include <core/data/io/Pdb.hh>
#include <utils/io_utils.hh>
#include <utils/options/output_options.hh>
#include <utils/options/input_options.hh>
#include <core/protocols/LigandsOnGridProtocol.hh>
#include <utils/exit.hh>

using namespace core::data::io; // PDB is from this namespace
using namespace core::data::structural;
using namespace core::data::structural::selectors;
using namespace core::calc::structural;
using namespace utils;
using namespace std;

std::string program_info = R"(

ap_LigandsOnGridProtocol reads a list of pdb files and creates grid with ligands in it.

USAGE:
    ap_LigandsOnGridProtocol box_grid_width models-list

)";

/** @brief Reads list of pdb files and creates grid with ligands in it.
 *
 * The first model on list (index = 0 ) is the representative one.
 *
 * CATEGORIES: core::protocols::LigandsOnGridProtocol
 * GROUP: Structure calculations; Docking;
 * KEYWORDS: PDB input;
 */
int main(const int argc, const char *argv[]) {

  if (argc < 3) utils::exit_OK_with_message(program_info);

  std::vector<std::string> pdb_files; // vector of string file names
  utils::read_listfile(argv[2], pdb_files);

  // assumes that ligand is in B chain
  AtomSelector_SP select_ligand = std::static_pointer_cast<AtomSelector>(std::make_shared<ChainSelector>("B"));
  // assumes that receptor is in A chain
  AtomSelector_SP select_receptor = std::static_pointer_cast<AtomSelector>(std::make_shared<ChainSelector>("A"));
  // creating LigandsOnGridProtocol object
  core::protocols::LigandsOnGridProtocol ligpro = core::protocols::LigandsOnGridProtocol(select_ligand, select_receptor);
  // setting box_grid_size
  ligpro.box_grid_size(atof(argv[1]));

  PdbLineFilter filter = core::data::io::is_ca;
  Pdb pdb = Pdb(pdb_files[0], filter);
  // creating structure from the first file on the list
  Structure_SP strctr = pdb.create_structure(0);
  // adding structure to LigandsOnGridProtocol object
  ligpro.add_input_structure(strctr);

  // loading and adding the rest of the structures from the list
  for (int i = 1; i < pdb_files.size(); i++) {
    Pdb pdb = Pdb(pdb_files[i], filter);
    pdb.fill_structure(0, *strctr);
    // std::cout << pdb_files[i] << "\n";
    ligpro.add_input_structure(strctr);
  }

  // running the calculation to put ligands into grid
  ligpro.calculate();

  // creating a copy of a vector with hashes from filled grid cells
  std::vector<core::index4> grid_cells = ligpro.grid()->filled_cells();
  core::index4 index = 0; // hash of the biggest cell found so far
  int size = 100;         // size of the biggest cell found so far

  while (grid_cells.size() > 0 and size >= 10) { // until the vector is empty or no cell holds at least 10 atoms
    size = 0;
    for (core::index4 i = 0; i < grid_cells.size(); i++) { // iterating over cells
      // std::cout<<grid_cells[i]<<" "<<ligpro.grid()->get_cell(grid_cells[i]).size()<<"\n";
      if (ligpro.grid()->get_cell(grid_cells[i]).size() > size) { // checking if the current cell is bigger than size
        size = ligpro.grid()->get_cell(grid_cells[i]).size(); // if yes, changing size and index values
        index = grid_cells[i];
      }
    }
    if (size >= 10) {
      // std::cout<<index<<" "<<ligpro.grid()->get_cell(index).size()<<"\n";
      std::vector<core::index4> hashes; // vector to store neighbor cells
      ligpro.grid()->get_neighbor_cells(index, hashes); // getting all hashes for neighbor cells
      for (core::index4 ind = 0; ind < hashes.size(); ind++) {
        grid_cells.erase(std::remove(grid_cells.begin(), grid_cells.end(), hashes[ind]),
                         grid_cells.end()); // attempt to erase the biggest cell and its neighbors from the vector
      }
      std::ofstream of(utils::to_string(index) + ".out");
      std::vector<core::data::structural::PdbAtom_SP> sink;
      ligpro.grid()->get_neighbors(index, sink);
      for (core::index4 a = 0; a < sink.size(); a++) { // iterating over atoms in sink
        of << sink[a]->id() << "\n"; // writing to file
      }
      of.close();
    }
  }
}

ap_LocalStructureMatch¶
Finds contiguous structural segments that are similar between two structures. The program creates contiguous structural segments of 5 or 7 CA atoms based on C-alpha coordinates from file1 and file2 (PDB format). The segment size must be given as the first input parameter. Then it looks for segments that are structurally similar by computing LocalStructureMatch distance between them. This value is defined as a squared difference between local inter-atomic distances. A small value means local structural similarity between respective segments. The last (optional) parameter is the maximum value of a LocalStructureMatch distance to be printed.
USAGE:
./ap_LocalStructureMatch (5 or 7) file1.pdb file2.pdb [max_distance]
EXAMPLE:
./ap_LocalStructureMatch 7 4rm4A.pdb 5ofqA.pdb 9.0
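The distance described above can be sketched as a sum of squared differences between corresponding intra-segment C-alpha distances. This is one plausible reading of "squared difference between local inter-atomic distances"; which atom pairs BioShell actually compares, and whether the value is normalized, is not stated in the source. The names `CaAtom`, `ca_distance` and `segment_mismatch` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using CaAtom = std::array<double, 3>;

// Euclidean distance between two C-alpha positions.
double ca_distance(const CaAtom& a, const CaAtom& b) {
  double d2 = 0.0;
  for (int k = 0; k < 3; ++k) d2 += (a[k] - b[k]) * (a[k] - b[k]);
  return std::sqrt(d2);
}

// Sum over all atom pairs (i, j) within a segment of the squared difference
// between the i-j distance in the query and in the template segment.
// The value is 0 for segments of identical shape, regardless of their placement.
double segment_mismatch(const std::vector<CaAtom>& query, std::size_t q_start,
                        const std::vector<CaAtom>& tmplt, std::size_t t_start,
                        std::size_t segment_length) {
  assert(q_start + segment_length <= query.size());
  assert(t_start + segment_length <= tmplt.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < segment_length; ++i) {
    for (std::size_t j = i + 1; j < segment_length; ++j) {
      const double dq = ca_distance(query[q_start + i], query[q_start + j]);
      const double dt = ca_distance(tmplt[t_start + i], tmplt[t_start + j]);
      sum += (dq - dt) * (dq - dt);
    }
  }
  return sum;
}
```

Because only internal distances are compared, no superposition of the two segments is needed, which makes an all-vs-all scan over segment pairs cheap.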
Keywords:
- PDB input
- structure match
Categories:
- core/alignment/scoring/LocalStructureMatch
Input files:
Output files:
Program source:
#include <ctime>
#include <iostream>
#include <sstream>

#include <utils/Logger.hh>
#include <utils/io_utils.hh>
#include <utils/options/Option.hh>
#include <utils/options/OptionParser.hh>
#include <utils/options/input_options.hh>
#include <core/data/basic/Vec3.hh>
#include <core/data/io/Pdb.hh>
#include <core/alignment/scoring/LocalStructure7.hh>
#include <core/alignment/scoring/LocalStructure5.hh>
#include <core/alignment/scoring/LocalStructureMatch.hh>
#include <utils/exit.hh>

utils::Logger l("ap_LocalStructureMatch");

std::string program_info = R"(

Finds contiguous structural segments that are similar between two structures.

The program creates contiguous structural segments of 5 or 7 CA atoms based on C-alpha coordinates
from file1 and file2 (PDB format). The segment size must be given as the first input parameter.
Then it looks for segments that are structurally similar by computing LocalStructureMatch distance
between them. This value is defined as a squared difference between local inter-atomic distances.
A small value means local structural similarity between respective segments.
The last (optional) parameter is the maximum value of a LocalStructureMatch distance to be printed.

USAGE:
    ./ap_LocalStructureMatch (5 or 7) file1.pdb file2.pdb [max_distance]

EXAMPLE:
    ./ap_LocalStructureMatch 7 4rm4A.pdb 5ofqA.pdb 9.0

)";

/** @brief Finds contiguous structural segments that are similar between two structures
 *
 * CATEGORIES: core/alignment/scoring/LocalStructureMatch;
 * KEYWORDS: PDB input; structure match
 * GROUP: Alignments
 */
int main(const int argc, const char *argv[]) {

  if(argc < 4) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  int match_size = atoi(argv[1]);
  double max_print_distance = (argc==5) ? atof(argv[4]) : 100000.0;

  using namespace core::alignment::scoring; // --- for LocalStructure7, LocalStructure5 and LocalStructureMatch
  using namespace core::data::basic; // --- for Coordinates_SP and Vec3

  Coordinates_SP xyz_q = std::make_shared<std::vector<Vec3>>();
  Coordinates_SP xyz_t = std::make_shared<std::vector<Vec3>>();
  core::data::io::Pdb::read_coordinates(argv[2], *xyz_q, true, core::data::io::is_ca);

  if (match_size == 7) {
    LocalStructure7 local_query(xyz_q);
    core::data::io::Pdb::read_coordinates(argv[3], *xyz_t, true, core::data::io::is_ca);
    LocalStructure7 local_tmplt(xyz_t);
    LocalStructureMatch<LocalStructure7, 8> lm(local_query, local_tmplt);
    lm.print(std::cout, max_print_distance);
  } else if (match_size == 5) {
    LocalStructure5 local_query(xyz_q);
    core::data::io::Pdb::read_coordinates(argv[3], *xyz_t, true, core::data::io::is_ca);
    LocalStructure5 local_tmplt(xyz_t);
    LocalStructureMatch<LocalStructure5, 8> lm(local_query, local_tmplt);
    lm.print(std::cout, max_print_distance);
  }
}

ap_MC_water¶
The program runs an isothermal Monte Carlo (MC) simulation of water. By default it starts from a regular lattice conformation, unless an input PDB file with an initial conformation is provided.
USAGE:
ap_MC_water n_molecules temperature small_cycles big_cycles
ap_MC_water starting.pdb temperature small_cycles big_cycles
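The program source derives the edge of the periodic simulation box from the number of molecules and the density of water. The arithmetic can be sketched as a standalone function; `cubic_box_length` is a hypothetical name, while the constants (molar mass 18.01528 g/mol, density 0.99823 g/cm^3) are the ones that appear in the program source.

```cpp
#include <cassert>
#include <cmath>

// Edge length (in Angstroms) of a cubic box that holds n_molecules
// of a fluid with the given molar mass [g/mol] and density [g/cm^3].
double cubic_box_length(unsigned n_molecules, double molar_mass, double density) {
  assert(n_molecules > 0 && molar_mass > 0.0 && density > 0.0);
  const double avogadro_e23 = 6.02214;  // Avogadro constant divided by 1e23
  // Mass of one molecule is M / N_A [g]; dividing by density gives cm^3;
  // 1 cm^3 = 1e24 A^3, and 1e24 / 1e23 leaves the factor of 10.
  const double volume = n_molecules * 10.0 * molar_mass / (avogadro_e23 * density);
  return std::cbrt(volume);
}
```

For the default of 216 molecules, `cubic_box_length(216, 18.01528, 0.99823)` gives a box edge of roughly 18.6 Å, which corresponds to about 30 Å^3 per water molecule.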
Keywords:
- no_keywords
Categories:
- no_categories
Output files:
Program source:
#include <iostream>
#include <vector>
#include <string>

#include <core/data/basic/Vec3I.hh>
#include <core/BioShellVersion.hh>
#include <utils/string_utils.hh>
#include <utils/options/Option.hh>
#include <utils/options/OptionParser.hh>
#include <utils/options/output_options.hh>
#include <utils/options/sampling_options.hh>
#include <simulations/systems/CartesianChains.hh>
#include <simulations/systems/BuildFluidSystem.hh>
#include <simulations/movers/RotateRigidMolecule.hh>
#include <simulations/movers/TranslateMolecule.hh>
#include <simulations/movers/MoversSetSweep.hh>
#include <simulations/forcefields/mm/Water3PointEnergy.hh>
#include <simulations/forcefields/mm/WaterModelParameters.hh>
#include <simulations/sampling/IsothermalMC.hh>
#include <simulations/observers/ObserveEvaluators.hh>
#include <simulations/observers/cartesian/PdbObserver.hh>
#include <simulations/observers/ObserveMoversAcceptance.hh>
#include <simulations/observers/TriggerEveryN.hh>
#include <simulations/observers/AdjustMoversAcceptance.hh>
#include <simulations/observers/cartesian/ExplicitPdbFormatter.hh>
#include <simulations/evaluators/CallEvaluator.hh>
#include <simulations/systems/SimpleAtomTyping.hh>

using namespace core::data::basic;

utils::Logger logs("ap_MC_water");

std::string program_info = R"(

The program runs an isothermal MC simulation of water. By default it starts from a regular lattice conformation
unless an input file (PDB) with initial conformation is provided

USAGE:
    ap_MC_water n_molecules temperature small_cycles big_cycles
    ap_MC_water starting.pdb temperature small_cycles big_cycles

)";

/** @brief Isothermal Monte Carlo simulation of water.
 *
 */
int main(const int argc,const char* argv[]) {

  using core::data::basic::Vec3Cubic;
  using namespace simulations::systems;
  using namespace simulations::movers; // for MoversSet
  using namespace simulations::observers::cartesian; // for all observers
  using simulations::forcefields::WaterModelParameters;

  logs << utils::LogLevel::INFO << "BioShell version:\n" << core::BioShellVersion().to_string() << "\n";

  core::index4 n_outer_cycles = 1000;
  core::index4 n_inner_cycles = 10;
  double temperature = 298; // in Kelvins
  core::index4 n_molecules = 216; // 216
  core::calc::statistics::Random::seed(1234);
  double water_density = 0.99823;
  double water_mass = 18.01528;
  core::data::structural::Structure_SP water_structure = nullptr;

  if (argc < 5) std::cerr << program_info;
  else {
    if (utils::is_integer(argv[1])) n_molecules = atoi(argv[1]);
    else { // --- read an input file if given
      core::data::io::Pdb reader(argv[1]);
      water_structure = reader.create_structure(0);
      n_molecules = water_structure->count_residues();
    }
    temperature = atof(argv[2]);
    n_inner_cycles = atoi(argv[3]);
    n_outer_cycles = atoi(argv[4]);
  }

  double water_volume = n_molecules * 10 * water_mass/6.02214;
  double box_len = pow(water_volume / water_density, 0.33333333333333);

  // --- Initialize periodic boundary conditions
  core::data::basic::Vec3I::set_box_len(box_len);
  logs << utils::LogLevel::INFO << "box width for " << int(n_molecules) << " molecules : " << box_len << "\n";

  WaterModelParameters::load_models();
  WaterModelParameters tip3p = WaterModelParameters::get_model("TIP3P");

  // --- Create water structure if not loaded from PDB
  if (water_structure == nullptr) {
    core::data::structural::Residue_SP hoh = tip3p.create_residue();
    core::data::structural::Residue_SP water_molecule = std::make_shared<core::data::structural::Residue>(1,"HOH");
    PointGridGenerator_SP grid = std::make_shared<SimpleCubicGrid>(box_len, n_molecules);
    water_structure = BuildFluidSystem::build_structure(*hoh, grid);
  }

  // SimpleAtomTyping tip3p_typing({"HOH"}, {"O", "H"}, {" O ", " H "});
  // --- Create the system to be sampled
  std::vector<std::string> res_types {"HOH"};
  std::vector<std::string> atom_types {"O", "H"};
  std::vector<std::string> pdb_types {" O ", " H "};
  AtomTypingInterface_SP tip3p_typing = std::make_shared<SimpleAtomTyping>(res_types, atom_types, pdb_types);
  CartesianChains system(tip3p_typing, *water_structure);
  CartesianChains backup(system);

  // --- Create energy function - TIP3P potential
  simulations::forcefields::mm::Water3PointEnergy en(tip3p);

  // --- Create a mover, which is a random perturbation of an atom in this case, and place it in a movers' set
  std::shared_ptr<RotateRigidMolecule> rot = std::make_shared<RotateRigidMolecule>(system, backup, en, 0);
  std::shared_ptr<TranslateMolecule> trs = std::make_shared<TranslateMolecule>(system, backup, en);
  MoversSet_SP movers = std::make_shared<simulations::movers::MoversSetSweep>();
  movers->add_mover(rot, n_molecules);
  movers->add_mover(trs, n_molecules);

  // --- create an isothermal Monte Carlo sampler
  simulations::sampling::IsothermalMC mc(movers,temperature);

  // ---------- Create an observer which calls energy calculation and prints it on the screen
  std::shared_ptr<simulations::observers::ObserveEvaluators> obs =
      std::make_shared<simulations::observers::ObserveEvaluators>("");
  // ---------- Create an observer which calls energy calculation and prints it to a file
  // std::shared_ptr<simulations::observers::ObserveEvaluators> obs = std::make_shared<simulations::observers::ObserveEvaluators>("energy.dat");
  std::function<double(void)> recent_energy = [&en, &system]() { return en.energy(system) / (system.count_residues() * 1000); };
  obs->add_evaluator(
      std::make_shared<simulations::evaluators::CallEvaluator<std::function<double(void)>>>(recent_energy, "energy", 8));

  std::shared_ptr<simulations::observers::AdjustMoversAcceptance> observe_moves =
      std::make_shared<simulations::observers::AdjustMoversAcceptance>(*movers,"movers.dat", 0.4);
  observe_moves->observe_header();

  std::shared_ptr<AbstractPdbFormatter> fmt = std::make_shared<ExplicitPdbFormatter>(*water_structure);
  auto observe_trajectory = std::make_shared<simulations::observers::cartesian::PdbObserver>(system, fmt, "water_tra.pdb");
  // observe_trajectory->trigger(std::make_shared<simulations::observers::TriggerEveryN>(10));
  mc.outer_cycle_observer(observe_trajectory); // --- commented out to save disk space
  mc.outer_cycle_observer(observe_moves);
  mc.outer_cycle_observer(obs);

  mc.cycles(n_inner_cycles,n_outer_cycles,1);
  mc.run();

  simulations::observers::cartesian::PdbObserver final(system, fmt, "final.pdb");
  final.observe();
  // logs << utils::LogLevel::INFO << "Final energy " << lj_energy.calculate() << "\n";
}

ap_MSAColumnConservation¶
Reads a Multiple Sequence Alignment (MSA) in ClustalW or FASTA format and evaluates sequence conservation for every column.
USAGE:
./ap_MSAColumnConservation msa-file [sequence-id]
EXAMPLE:
./ap_MSAColumnConservation cyped.CYP109.aln M5R670_9BACI
where cyped.CYP109.aln is the name of the input MSA file (.aln or .fasta format). If a sequence identifier is given as the second, optional argument (here: M5R670_9BACI), the program attempts to find the sequence annotated with this name. When such a sequence is found, an additional column is added to show the residue at every position of that sequence (gaps are also shown).
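Two of the per-column scores reported by the program, the gap content and the Shannon entropy, can be sketched in a few lines of standalone C++. The logarithm base (2), the exclusion of gaps from the entropy and the fraction-versus-percent convention are assumptions of this sketch and may differ from what MSAColumnConservation does; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Collect the symbols found at position pos of every aligned sequence.
std::string msa_column(const std::vector<std::string>& msa, std::size_t pos) {
  std::string column;
  for (const std::string& row : msa) {
    assert(pos < row.size());
    column += row[pos];
  }
  return column;
}

// Fraction of gap symbols ('-') in a column.
double gap_fraction(const std::string& column) {
  assert(!column.empty());
  std::size_t gaps = 0;
  for (char c : column)
    if (c == '-') ++gaps;
  return static_cast<double>(gaps) / column.size();
}

// Shannon entropy (in bits) of the non-gap residues of a column;
// 0 means a fully conserved column, higher values mean more variability.
double shannon_entropy(const std::string& column) {
  std::map<char, int> counts;
  int n = 0;
  for (char c : column)
    if (c != '-') { ++counts[c]; ++n; }
  double h = 0.0;
  for (const auto& kv : counts) {
    const double p = static_cast<double>(kv.second) / n;
    h -= p * std::log2(p);
  }
  return h;
}
```

A column such as `AAAA` scores 0 bits, while a column split evenly between two residue types scores 1 bit.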
Keywords:
- clustal input
- MSA
- FASTA input
Categories:
- core::alignment::MSAColumnConservation
Input files:
Output files:
Program source:
#include <iostream>

#include <core/alignment/MSAColumnConservation.hh>
#include <core/data/io/clustalw_io.hh>
#include <utils/exit.hh>
#include <utils/io_utils.hh>
#include <core/data/io/fasta_io.hh>

std::string program_info = R"(

Reads a Multiple Sequence Alignment (MSA) in ClustalW or FASTA format and evaluates sequence conservation
for every column.

USAGE:
    ./ap_MSAColumnConservation msa-file [sequence-id]

EXAMPLE:
    ./ap_MSAColumnConservation cyped.CYP109.aln M5R670_9BACI

where cyped.CYP109.aln is the name of input MSA file (.aln or .fasta format). If the sequence identifier
is given as a second optional argument (here: M5R670_9BACI), program will attempt to find the sequence
annotated with this name. When such a sequence is found, additional column will be added to provide
residue for every position in that sequence (gaps are also shown).

)";

/** @brief Reads a MSA in ClustalW format and evaluates sequence conservation for every column
 *
 * CATEGORIES: core::alignment::MSAColumnConservation
 * KEYWORDS: clustal input; MSA; FASTA input
 * GROUP: Alignments
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::io;
  using namespace core::data::sequence;

  std::vector<Sequence_SP> msa; // --- Sequence_SP is just a shorter name for std::shared_ptr<Sequence>
  const std::pair<std::string, std::string> name_ext = utils::root_extension(argv[1]);
  if((name_ext.second=="fasta")||(name_ext.second=="FASTA")||(name_ext.second=="fast"))
    core::data::io::read_fasta_file(argv[1], msa, true);
  else
    core::data::io::read_clustalw_file(argv[1],msa);

  std::string seq_str( msa[0]->length(),' ');
  std::string seq_name = (argc > 2) ? argv[2] : "";
  bool sequence_found = false;
  if (seq_name.size() > 0) {
    for (const auto &seq:msa)
      if (seq->header().find(argv[2]) != std::string::npos) {
        seq_str = seq->sequence;
        sequence_found = true;
      }
    if (!sequence_found) std::cerr << "Warning: the sequence >" << seq_name << "< can't be located!\n";
  }

  core::alignment::MSAColumnConservation consrv(msa);
  if (sequence_found)
    std::cout << "#pos a gaps Shanon Relative Variation SumOfPairs JensenShannon\n";
  else
    std::cout << "#pos gaps Shanon Relative Variation SumOfPairs JensenShannon\n";

  for (size_t ipos = 0; ipos < msa[0]->length(); ++ipos)
    std::cout << utils::string_format("%4d %c %7.3f %7.3f %7.3f %7.3f %7.3f %7.3f\n", ipos, seq_str[ipos],
        consrv.evaluate(core::alignment::ColumnConservationScores::GapPercent, ipos),
        consrv.evaluate(core::alignment::ColumnConservationScores::ShannonEntropy, ipos),
        consrv.evaluate(core::alignment::ColumnConservationScores::RelativeEntropy, ipos),
        consrv.evaluate(core::alignment::ColumnConservationScores::Variation, ipos),
        consrv.evaluate(core::alignment::ColumnConservationScores::SumOfPairs, ipos),
        consrv.evaluate(core::alignment::ColumnConservationScores::JensenShannonDivergence, ipos));
}

ap_NWAligner¶
Calculates global sequence alignments (Needleman–Wunsch algorithm) between sequences read from two FASTA files: a query file and a database file. For every query-subject pair of sequences, the program prints the alignment in the Edinburgh format. The default substitution matrix is BLOSUM62.
USAGE:
ap_NWAligner query.fasta database.fasta [substitution-matrix]
EXAMPLE:
ap_NWAligner 5fd1.fasta ferrodoxins.fasta
REFERENCE: Needleman, Saul B., and Christian D. Wunsch. “A general method applicable to the search for similarities in the amino acid sequence of two proteins.” JMB 48.3 (1970): 443-453. https://doi.org/10.1016/0022-2836(70)90057-4
Keywords:
- FASTA input; Needleman-Wunsch; sequence alignment
Categories:
- core/alignment/NWAligner
Input files:
Output files:
Program source:
#include <iostream>
#include <chrono>
#include <algorithm>

#include <core/data/io/fasta_io.hh>
#include <core/alignment/NWAligner.hh>
#include <core/alignment/scoring/SimilarityMatrix.hh>
#include <core/alignment/scoring/SimilarityMatrixScore.hh>
#include <core/alignment/scoring/NcbiSimilarityMatrixFactory.hh>
#include <core/alignment/on_alignment_computations.hh>
#include <core/alignment/PairwiseSequenceAlignment.hh>
#include <core/data/io/alignment_io.hh>
#include <core/data/sequence/Sequence.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Calculates global sequence alignments (Needleman–Wunsch algorithm) between sequences read from two FASTA files:
a query file and a database file. For every query-subject pair of sequences, the program prints the alignment
in the Edinburgh format. The default substitution matrix is BLOSUM62.

USAGE:
    ap_NWAligner query.fasta database.fasta [substitution-matrix]

EXAMPLE:
    ap_NWAligner 5fd1.fasta ferrodoxins.fasta

REFERENCE:
Needleman, Saul B., and Christian D. Wunsch.
"A general method applicable to the search for similarities in the amino acid sequence of two proteins."
JMB 48.3 (1970): 443-453. https://doi.org/10.1016/0022-2836(70)90057-4

)";

/** @brief Calculate all pairwise sequence alignments between sequences read from two FASTA files : query and database
 *
 * CATEGORIES: core/alignment/NWAligner
 * KEYWORDS:   FASTA input; Needleman-Wunsch; sequence alignment
 * GROUP:      Alignments
 */
int main(const int argc, const char* argv[]) {

  if(argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::io;
  using namespace core::data::sequence;
  using namespace core::alignment::scoring;

  // --- the query sequences
  std::vector<std::shared_ptr<Sequence>> query_sequences;
  read_fasta_file(argv[1], query_sequences);

  // --- container for the sequence database
  std::vector<std::shared_ptr<Sequence>> db_sequences;
  read_fasta_file(argv[2], db_sequences);

  // --- find the longest sequence to initialize an aligner object that is large enough
  unsigned max_len = 0;
  std::for_each(query_sequences.begin(), query_sequences.end(),
      [&max_len](const Sequence_SP s) { max_len = std::max(max_len, unsigned(s->length())); });
  std::for_each(db_sequences.begin(), db_sequences.end(),
      [&max_len](const Sequence_SP s) { max_len = std::max(max_len, unsigned(s->length())); });

  // --- create aligner object
  core::alignment::NWAligner<short, SimilarityMatrixScore<short>> aligner(max_len);

  // --- read similarity matrix from a file (e.g. BLOSUM62)
  NcbiSimilarityMatrix_SP sim_m = NcbiSimilarityMatrixFactory::get().get_matrix((argc > 3) ? argv[3] : "BLOSUM62");

  // --- go through all db sequences and align them with the given query
  auto start = std::chrono::high_resolution_clock::now(); // --- timer starts!
  for (size_t i = 0; i < query_sequences.size(); ++i) {
    for (size_t j = 0; j < db_sequences.size(); ++j) {
      // --- Here we create a sequence similarity object that will score a match
      // --- between individual positions from the two sequences being aligned
      SimilarityMatrixScore<short> score(query_sequences[i]->sequence, db_sequences[j]->sequence, *sim_m);

      // ---------- calculate global alignment (gap opening: -14, gap extension: -2)
      aligner.align(-14, -2, score);

      // ---------- Convert the abstract alignment to a pairwise sequence alignment object
      const core::alignment::PairwiseAlignment_SP ali = aligner.backtrace();
      core::alignment::PairwiseSequenceAlignment seq_ali(ali, query_sequences[i], db_sequences[j]);

      // ---------- check basic statistics of the alignment; both values are relative to the query length
      core::index2 identical = core::alignment::sum_identical(seq_ali);
      core::index2 n_aligned = seq_ali.alignment->n_aligned();
      std::cout << utils::string_format("# %s %s id: %6.3f cov: %6.3f\n",
          utils::split(query_sequences[i]->header())[0].c_str(),
          utils::split(db_sequences[j]->header())[0].c_str(),
          identical / double(query_sequences[i]->length()),
          n_aligned / double(query_sequences[i]->length()) );

      // ---------- Print the alignment in Edinburgh format
      core::data::io::write_edinburgh(seq_ali, std::cout, 80);

      // --- Alternatively, one can compute only the score of the alignment;
      // --- this is faster than aligning and keeping backtracking info
      short result = aligner.align_for_score(-10, -1, score);
    }
  }
  auto end = std::chrono::high_resolution_clock::now();
  std::chrono::duration<double> time_span = std::chrono::duration_cast<std::chrono::duration<double>>(end - start);
  std::cerr << db_sequences.size() * query_sequences.size() << " global alignment scores computed within "
            << time_span.count() << " [s]\n";
}
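To make the dynamic programming behind a global alignment more tangible, here is a minimal, library-independent sketch of the Needleman-Wunsch score calculation. It is a simplification made for illustration: it assumes a linear gap penalty and a plain match/mismatch score, while the program above uses a substitution matrix with separate gap opening and extension penalties. The function nw_score() and its default parameters are hypothetical and are not BioShell code.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Global alignment score with a linear gap penalty; keeps only two rows of the DP matrix.
// Hypothetical helper for illustration only, not part of BioShell.
int nw_score(const std::string &a, const std::string &b,
             int match = 1, int mismatch = -1, int gap = -2) {
  const std::size_t n = a.size(), m = b.size();
  std::vector<int> prev(m + 1), curr(m + 1);
  for (std::size_t j = 0; j <= m; ++j) prev[j] = int(j) * gap;   // first row: gaps only
  for (std::size_t i = 1; i <= n; ++i) {
    curr[0] = int(i) * gap;                                      // first column: gaps only
    for (std::size_t j = 1; j <= m; ++j) {
      const int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
      curr[j] = std::max({diag, prev[j] + gap, curr[j - 1] + gap});
    }
    std::swap(prev, curr);
  }
  return prev[m];   // score of aligning both sequences end to end
}
```

Because every residue of both sequences must be aligned or paired with a gap, the score can be negative, which is the main difference from a local alignment.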

ap_OnlineStatistics¶
ap_OnlineStatistics reads a file with real values and calculates simple statistics: min, mean, stdev, max. The program uses the method of Knuth and Welford to compute the average and standard deviation in a single pass through the data. If no input file is provided, the program calculates the statistics from a random sample.
USAGE:
ap_OnlineStatistics infile
EXAMPLE:
ap_OnlineStatistics random_normal.txt
REFERENCE: https://www.johndcook.com/blog/skewness_kurtosis/ https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford%27s_online_algorithm
Keywords:
- random numbers; statistics
Categories:
- core::calc::statistics::OnlineStatistics; core::calc::statistics::Random
Input files:
Output files:
Program source:
#include <cmath>
#include <fstream>
#include <iostream>
#include <string>

#include <core/index.hh>
#include <core/calc/statistics/Random.hh>
#include <core/calc/statistics/OnlineStatistics.hh>

std::string program_info = R"(

ap_OnlineStatistics reads a file with real values and calculates simple statistics: min, mean, stdev, max.
The program uses the method of Knuth and Welford to compute the average and standard deviation
in a single pass through the data. If no input file is provided, the program calculates the statistics
from a random sample.

USAGE:
    ap_OnlineStatistics infile

EXAMPLE:
    ap_OnlineStatistics random_normal.txt

REFERENCE:
https://www.johndcook.com/blog/skewness_kurtosis/
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford%27s_online_algorithm

)";

/** @brief Reads a file with real values and calculates simple statistics: min, mean, stdev, max.
 * If no input file is provided, the program calculates the statistics from a random sample
 *
 * CATEGORIES: core::calc::statistics::OnlineStatistics; core::calc::statistics::Random
 * KEYWORDS:   random numbers; statistics
 * GROUP:      Statistics;
 */
int main(const int argc, const char *argv[]) {

  core::calc::statistics::OnlineStatistics stats;

  if(argc < 2) {
    std::cerr << program_info; // --- complain about missing program parameter

    // ---------- Use the random engine if no data is provided
    core::calc::statistics::Random r = core::calc::statistics::Random::get();
    r.seed(12345);  // --- seed the generator for repeatable results
    core::calc::statistics::UniformRealRandomDistribution<double> uniform_random;
    for (core::index4 n = 0; n < 100000; ++n) stats(uniform_random(r));
  } else {
    std::ifstream in(argv[1]);
    double r;
    // --- extract inside the loop condition, so a failed read never feeds a stale value into the statistics
    while (in >> r) stats(r);
  }
  std::cout << "#cnt min avg sdev skewness kurtosis max bimodalitycoefficient\n";
  std::cout << utils::string_format("%d %f %f %f %f %f %f %f\n", stats.cnt(), stats.min(), stats.avg(),
      std::sqrt(stats.var()), stats.skewness(), stats.kurtosis(), stats.max(), stats.bimodality_coefficient());
}
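The one-pass update referred to above (Welford's method) can be sketched in a few lines. The class RunningStats below is a hypothetical, stripped-down illustration that tracks only the mean and the sample variance; the program's output shows that BioShell's OnlineStatistics also reports min, max, skewness, kurtosis and a bimodality coefficient, which are left out here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// One-pass accumulator for mean and variance using Welford's update.
// Hypothetical class for illustration only, not part of BioShell.
class RunningStats {
public:
  void push(double x) {
    ++n_;
    const double delta = x - mean_;   // deviation from the old mean
    mean_ += delta / double(n_);
    m2_ += delta * (x - mean_);       // second factor uses the updated mean
  }
  std::size_t count() const { return n_; }
  double mean() const { return mean_; }
  // Sample variance (divides by n - 1); needs at least two observations.
  double variance() const {
    assert(n_ > 1);
    return m2_ / double(n_ - 1);
  }
  double stdev() const { return std::sqrt(variance()); }

private:
  std::size_t n_ = 0;
  double mean_ = 0.0;
  double m2_ = 0.0;   // running sum of squared deviations from the mean
};
```

Each value is seen exactly once and nothing is stored, so the same accumulator works for files of any size.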

ap_PairwiseCrmsd¶
ap_PairwiseCrmsd calculates the crmsd value between every pair of protein structures given as input (at least two structures must be provided). Only values smaller than 20 Angstroms are printed. This example evaluates crmsd for each pair of proteins twice: on C-alpha atoms and on all heavy backbone atoms.
USAGE:
ap_PairwiseCrmsd structureA.pdb structureB.pdb [structureC.pdb ... ]
EXAMPLE:
ap_PairwiseCrmsd 2gb1.pdb 2gb1-model1.pdb 2gb1-model2.pdb
Keywords:
- PDB input; crmsd; structure selectors
Categories:
- core::protocols::PairwiseCrmsd
Input files:
Output files:
Program source:
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include <core/protocols/PairwiseCrmsd.hh>
#include <core/data/io/Pdb.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ap_PairwiseCrmsd calculates the crmsd value between every pair of protein structures given as input
(at least two structures must be provided). Only values smaller than 20 Angstroms are printed.
This example evaluates crmsd for each pair of proteins twice: on C-alpha atoms and on all heavy backbone atoms

USAGE:
    ap_PairwiseCrmsd structureA.pdb structureB.pdb [structureC.pdb ... ]

EXAMPLE:
    ap_PairwiseCrmsd 2gb1.pdb 2gb1-model1.pdb 2gb1-model2.pdb

)";

/** @brief Calculates crmsd value for a set of protein structures (at least two)
 *
 * CATEGORIES: core::protocols::PairwiseCrmsd
 * KEYWORDS:   PDB input; crmsd; structure selectors
 * GROUP:      Structure calculations;
 */
int main(const int argc, const char *argv[]) {

  if(argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using core::data::basic::Vec3;
  using namespace core::data::structural::selectors; // --- for all AtomSelector types
  using namespace core::data::io;
  using namespace core::protocols;

  std::vector<core::data::structural::Structure_SP> structures;
  std::vector<std::string> tags;
  for (int i = 1; i < argc; ++i) {
    // --- note we read all atoms but skip alternate locators and waters
    core::data::io::Pdb reader(argv[i], all_true(is_not_alternative, is_not_water), keep_all, false);
    structures.push_back(reader.create_structure(0));
    tags.push_back(structures.back()->code());
  }

  // ---------- crmsd on C-alpha atoms
  std::cout << "# crmsd on alpha carbons:\n";
  std::shared_ptr<AtomSelector> is_CA = std::make_shared<IsCA>();
  PairwiseCrmsd rmsd_ca(structures, is_CA, tags);
  rmsd_ca.crmsd_cutoff(20.0); // crmsd cutoff large enough to get some output
  rmsd_ca.calculate();

  // ---------- crmsd on backbone: combine two selectors to keep heavy backbone atoms only
  std::cout << "# crmsd on heavy backbone atoms:\n";
  std::shared_ptr<AtomSelector> is_bb = std::make_shared<IsBB>();
  std::shared_ptr<AtomSelector> not_h = std::make_shared<NotHydrogen>();
  std::shared_ptr<LogicalANDSelector> heavy_bb = std::make_shared<LogicalANDSelector>();
  heavy_bb->add_selector(is_bb);
  heavy_bb->add_selector(not_h);
  PairwiseCrmsd rmsd_bb(structures, heavy_bb, tags);
  rmsd_bb.crmsd_cutoff(20.0); // crmsd cutoff large enough to get some output
  rmsd_bb.calculate();
}
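For reference, the quantity evaluated above can be written down explicitly. Assuming that crmsd denotes the coordinate root-mean-square deviation after optimal superposition (the usual meaning of the term; the symbols below are illustrative and are not taken from BioShell), for two sets of N matching atoms with positions x_i and y_i:

```latex
\mathrm{crmsd}(X, Y) \;=\; \min_{R,\,\mathbf{t}}
  \sqrt{\frac{1}{N}\sum_{i=1}^{N}
  \left\lVert R\,\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \right\rVert^{2}}
```

Here R is a rotation matrix and t a translation vector. The atom selector passed to the protocol decides which atoms enter the sum, which is why the C-alpha and backbone values generally differ for the same pair of structures.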

ap_PairwiseSequenceIdentityProtocol¶
Evaluates pairwise sequence identity between sequences found in a given FASTA file. The calculations may be performed for a single sequence (against all the other sequences) or for a range of sequences. Calculations may be executed in several parallel threads; calculated values are printed on the screen if they are greater than the given cutoff. In addition, the query sequence or a range of query sequences may be provided as the fourth parameter, or as the fourth and fifth parameters, respectively. By default, the program runs on 4 threads with a cutoff of 0.28, i.e. it prints only those pairs whose sequence identity is higher than 28%.
USAGE:
./ap_PairwiseSequenceIdentityProtocol in.fasta [n_threads [cutoff] ]
./ap_PairwiseSequenceIdentityProtocol in.fasta n_threads cutoff query-sequence-index
./ap_PairwiseSequenceIdentityProtocol in.fasta n_threads cutoff first-sequence-index last-sequence-index
EXAMPLES:
./ap_PairwiseSequenceIdentityProtocol small50_95identical.fasta 4 0.28
./ap_PairwiseSequenceIdentityProtocol small50_95identical.fasta 4 0.28 0
./ap_PairwiseSequenceIdentityProtocol small50_95identical.fasta 4 0.28 0 5
The first example calculates identity for every pair of sequences. The second one compares the first sequence (index 0) with all the other sequences. The third one uses sequences from 0 to 5 (both inclusive) as queries against all the other sequences.
Keywords:
- FASTA input; sequence alignment
Categories:
- core/protocols/PairwiseSequenceIdentityProtocol.hh
Input files:
Output files:
Program source:
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>

#include <core/index.hh>
#include <utils/exit.hh>
#include <utils/Logger.hh>
#include <core/data/io/fasta_io.hh>
#include <core/protocols/PairwiseSequenceIdentityProtocol.hh>

std::string program_info = R"(

Evaluates pairwise sequence identity between sequences found in a given FASTA file. The calculations
may be performed for a single sequence (against all the other sequences) or for a range of sequences.
Calculations may be executed in several parallel threads; calculated values are printed on the screen
if they are greater than the given cutoff. In addition, the query sequence or a range of query sequences
may be provided as the fourth parameter, or as the fourth and fifth parameters, respectively.

By default, the program runs on 4 threads with a cutoff of 0.28, i.e. it prints only those pairs
whose sequence identity is higher than 28%

USAGE:
    ./ap_PairwiseSequenceIdentityProtocol in.fasta [n_threads [cutoff] ]
    ./ap_PairwiseSequenceIdentityProtocol in.fasta n_threads cutoff query-sequence-index
    ./ap_PairwiseSequenceIdentityProtocol in.fasta n_threads cutoff first-sequence-index last-sequence-index

EXAMPLES:
    ./ap_PairwiseSequenceIdentityProtocol small50_95identical.fasta 4 0.28
    ./ap_PairwiseSequenceIdentityProtocol small50_95identical.fasta 4 0.28 0
    ./ap_PairwiseSequenceIdentityProtocol small50_95identical.fasta 4 0.28 0 5

The first example calculates identity for every pair of sequences. The second one compares the first sequence
(index 0) with all the other sequences. The third one uses sequences from 0 to 5 (both inclusive)
as queries against all the other sequences.

)";

/** @brief Uses PairwiseSequenceIdentityProtocol protocol to calculate all pairwise sequence identity values for a set of sequences
 *
 * CATEGORIES: core/protocols/PairwiseSequenceIdentityProtocol.hh
 * KEYWORDS:   FASTA input; sequence alignment
 * GROUP:      Alignments
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::sequence;
  using namespace core::protocols;
  using namespace core::alignment;

  utils::Logger logs("ap_PairwiseSequenceIdentityProtocol");

  int my_argc = argc;
  bool if_use_fasta = false;
//  if (strstr(argv[my_argc-1],"fasta")!=NULL) {
//    if_use_fasta = true;
//    --my_argc;
//  }
  core::index2 n_threads = (my_argc > 2) ? atoi(argv[2]) : 4;
  float cutoff = (my_argc > 3) ? atof(argv[3]) : 0.28; // --- default matches the documented 0.28 (the listing previously used 0.25)
  logs << utils::LogLevel::INFO << "number of threads used : " << n_threads << "\n";
  logs << utils::LogLevel::INFO << "seq. identity cutoff : " << cutoff << "\n";

  core::protocols::PairwiseSequenceIdentityProtocol protocol;
  protocol.printed_seqname_length(20).gap_open(-10).gap_extend(-1).substitution_matrix("BLOSUM62").
      keep_alignments(false).alignment_method(AlignmentType::SEMIGLOBAL_ALIGNMENT).n_threads(n_threads);
  protocol.if_use_fasta_filter(if_use_fasta).seq_identity_cutoff(cutoff).batch_size(10000);
  protocol.printed_seqname_length(10);

  if (my_argc == 5) { // --- a single query sequence
    protocol.select_query(atoi(argv[4]));
    logs << utils::LogLevel::INFO << "Using sequence at index " << atoi(argv[4]) << " as a query\n";
  }
  if (my_argc == 6) { // --- a range of query sequences, both ends inclusive
    for (core::index4 i = atoi(argv[4]); i <= atoi(argv[5]); ++i) protocol.add_query(i);
    logs << utils::LogLevel::INFO << "Using " << atoi(argv[5]) - atoi(argv[4]) + 1 << " query sequences\n";
  }

  std::vector<Sequence_SP> input_sequences;
  core::data::io::read_fasta_file(argv[1], input_sequences);
  for(Sequence_SP si: input_sequences) protocol.add_input_sequence(si);

  auto start = std::chrono::high_resolution_clock::now(); // --- timer starts!
  protocol.run();
  auto end = std::chrono::high_resolution_clock::now();
  std::chrono::duration<double> time_span = std::chrono::duration_cast<std::chrono::duration<double>>(end - start);
  logs << utils::LogLevel::INFO << (size_t ) protocol.n_jobs_completed()
       << " semiglobal alignment sequence identities calculated within " << time_span.count() << " [s]\n";

  protocol.print_header(std::cout);
  protocol.print_sequence_identity(std::cout);
}
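The value compared with the cutoff is a sequence identity derived from a pairwise alignment. As a rough, library-independent sketch, identity can be computed from two aligned strings as shown below. Note the assumption: identical positions are divided by the number of gap-free aligned columns. Other conventions exist (for example dividing by the query length, as ap_NWAligner does), and the source does not state which one PairwiseSequenceIdentityProtocol uses. The function sequence_identity() is hypothetical.

```cpp
#include <cassert>
#include <string>

// Fraction of identical residues among the gap-free columns of a pairwise alignment.
// Hypothetical helper for illustration only; the choice of denominator is an assumption.
double sequence_identity(const std::string &aligned_a, const std::string &aligned_b) {
  assert(aligned_a.size() == aligned_b.size());   // both rows of an alignment have equal length
  std::size_t n_aligned = 0, n_identical = 0;
  for (std::size_t i = 0; i < aligned_a.size(); ++i) {
    if (aligned_a[i] == '-' || aligned_b[i] == '-') continue;   // skip columns with a gap
    ++n_aligned;
    if (aligned_a[i] == aligned_b[i]) ++n_identical;
  }
  return (n_aligned == 0) ? 0.0 : double(n_identical) / double(n_aligned);
}
```

With a cutoff of 0.28, a pair would be reported only when this fraction exceeds 0.28.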

ap_ProteinArchitecture¶
ap_ProteinArchitecture reads a PDB file and describes its architecture in terms of secondary structure elements (SSEs) and their connectivity (i.e. how strands are connected in sheets). By default, the SSEs are defined based on data from the PDB file header. If the DSSP flag is given, the app detects secondary structure elements using BioShell’s implementation of the DSSP algorithm.
USAGE:
ap_ProteinArchitecture input.pdb [DSSP]
EXAMPLE:
ap_ProteinArchitecture 5edw.pdb DSSP
REFERENCE: Kabsch, Wolfgang, and Christian Sander. “Dictionary of protein secondary structure: pattern recognition of hydrogen‐bonded and geometrical features.” Biopolymers 22 (1983): 2577-2637. doi:10.1002/bip.360221211
Keywords:
- PDB input; Hydrogen bonds; Protein structure features
Categories:
- core/calc/structural/ProteinArchitecture
Input files:
Output files:
Program source:
#include <cstring>
#include <iostream>

#include <core/data/io/Pdb.hh>
#include <core/calc/structural/ProteinArchitecture.hh>
#include <utils/exit.hh>
#include <utils/LogManager.hh>

using namespace core::data::structural;
using namespace core::data::io;
using namespace core::data::basic;

std::string program_info = R"(

ap_ProteinArchitecture reads a PDB file and describes its architecture in terms of secondary structure elements (SSEs)
and their connectivity (i.e. how strands are connected in sheets). By default, the SSEs are defined
based on data from the PDB file header. If the DSSP flag is given, the app detects secondary structure elements
using BioShell's implementation of the DSSP algorithm.

USAGE:
    ap_ProteinArchitecture input.pdb [DSSP]

EXAMPLE:
    ap_ProteinArchitecture 5edw.pdb DSSP

REFERENCE:
Kabsch, Wolfgang, and Christian Sander.
"Dictionary of protein secondary structure: pattern recognition of hydrogen‐bonded and geometrical features."
Biopolymers 22 (1983): 2577-2637. doi:10.1002/bip.360221211

)";

/** @brief Describes protein architecture: secondary structure elements and beta strand connectivity.
 *
 * CATEGORIES: core/calc/structural/ProteinArchitecture;
 * KEYWORDS:   PDB input; Hydrogen bonds; Protein structure features
 * GROUP:      Structure calculations;
 */
int main(const int argc, const char* argv[]) {

  using namespace core::calc::structural;

  utils::LogManager::INFO(); // --- Turn it to FINE to see a lot more messages, e.g. about missed h-bonds

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  core::data::io::Pdb reader(argv[1],
      core::data::io::all_true(is_not_alternative, is_not_water),
      core::data::io::keep_all, true); // --- Read in a PDB file
  Structure_SP strctr = reader.create_structure(0); // --- create a Structure object from the first model

  // --- the optional second argument "DSSP" switches from PDB header annotations to DSSP detection
  bool if_dssp = (argc > 2) && (strcmp(argv[2], "DSSP") == 0);
  core::calc::structural::ProteinArchitecture pa(*strctr, if_dssp);

  std::cout << "# ---------- Secondary structure elements ----------\n";
  for (const auto &sse : pa.sse_vector()) std::cout << *sse << "\n";

  std::cout << "# ---------- Beta strand connectivity ----------\n";
  auto sse_graph = pa.create_strand_graph();
  sse_graph->print_adjacency_matrix(std::cerr);
  // --- for every strand, list its partners together with the pairing type and the number of hydrogen bonds
  for(auto e_it = sse_graph->cbegin_strand(); e_it != sse_graph->cend_strand(); ++e_it) {
    std::cout << (*e_it)->info() << " paired with:\n";
    for(auto partner_it = sse_graph->cbegin_strand(*e_it); partner_it != sse_graph->cend_strand(*e_it); ++partner_it) {
      auto pairing_sp = sse_graph->get_strand_pairing(*e_it, *partner_it);
      std::cout << "\t" << (*partner_it)->name() << " "
                << Strand::strand_type_name(pairing_sp->pairing_type) << " by "
                << pairing_sp->hydrogen_bonds().size() << " hbonds\n";
    }
  }
}
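The DSSP algorithm cited above detects backbone hydrogen bonds with a simple electrostatic model, and strand pairings are then derived from those bonds. The sketch below shows that energy term as defined by Kabsch and Sander: partial charges of 0.42e and 0.20e, a factor of 332 to obtain kcal/mol, and a cutoff of -0.5 kcal/mol. The Point3 type and the function names are hypothetical, written only for this illustration; BioShell's own implementation may be organised differently.

```cpp
#include <cmath>

// Minimal 3D point; a hypothetical stand-in for a real coordinate class.
struct Point3 { double x, y, z; };

double dist(const Point3 &a, const Point3 &b) {
  const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// DSSP electrostatic energy (kcal/mol) between a backbone C=O group and a backbone N-H group.
double dssp_hbond_energy(const Point3 &C, const Point3 &O, const Point3 &N, const Point3 &H) {
  const double q1q2 = 0.42 * 0.20;   // product of partial charges, in units of e^2
  const double f = 332.0;            // converts e^2 / Angstrom to kcal/mol
  return q1q2 * f * (1.0 / dist(O, N) + 1.0 / dist(C, H) - 1.0 / dist(O, H) - 1.0 / dist(C, N));
}

// DSSP accepts a hydrogen bond when the energy is below -0.5 kcal/mol.
bool is_dssp_hbond(const Point3 &C, const Point3 &O, const Point3 &N, const Point3 &H) {
  return dssp_hbond_energy(C, O, N, H) < -0.5;
}
```

For a near-ideal linear arrangement with an O...N distance of about 2.9 Angstroms the energy is close to -3 kcal/mol, well below the cutoff.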

ap_Rubik_simulation¶
The program runs a Replica Exchange Monte Carlo simulation of a Rubik’s cube system. When the -s wl option is given, a Wang-Landau sampler is used instead.
Keywords:
- Monte Carlo; sampling; observer; simulation
Categories:
- simulations/sampling/ReplicaExchangeMC
Program source:
#include <iostream>
#include <string>
#include <stdexcept>
#include <stdlib.h>
#include <fstream>
#include <memory>
#include <vector>

#include <simulations/evaluators/CallEvaluator.hh>
#include <simulations/forcefields/CalculateEnergyBase.hh>
#include <simulations/evaluators/Evaluator.hh>
#include <simulations/forcefields/TotalEnergy_OBSOLETE.hh>
#include <simulations/observers/ObserveEvaluators.hh>
#include <simulations/observers/ObserveMoversAcceptance.hh>
#include <simulations/observers/TriggerEveryN.hh>
#include <simulations/observers/ObserveReplicaFlow.hh>
#include <simulations/observers/ObserveWLSampling.hh>
#include <simulations/sampling/IsothermalMC.hh>
#include <simulations/sampling/ReplicaExchangeMC.hh>
#include <simulations/systems/ising/RubikCube.hh>
#include <utils/options/sampling_options.hh>
#include <utils/options/sampling_from_cmdline.hh>
#include <simulations/sampling/WangLandauSampler.hh>

using namespace simulations;
using namespace simulations::systems::ising;

utils::Logger logs("ap_Rubik_simulation");

std::string program_info = R"(

The program runs a Replica Exchange Monte Carlo simulation of a Rubik's cube system

)";

/** @brief Turns energy of a system into an energy bin index (integer)
 * @param energy - system's energy
 * @return integer assigned to a bin; may be negative
 */
inline int bfe(double energy) { return (int) energy; }

std::shared_ptr<simulations::sampling::WangLandauSampler> prepare_wl_simulation(const simulations::SimulationSettings &settings) {

  using namespace utils::options;

  int system_size = settings.get<int>("cube_size");
  Rubik_SP system = std::make_shared<Rubik>(system_size);
  logs << "Minimum energy: " << system->calculate() << "\n";
  system->scramble(); // Set the cube to a random conformation
  logs << "starting energy: " << system->calculate() << "\n";

  // ---------- Movers definition ----------
  simulations::movers::MoversSet_SP movers = std::make_shared<simulations::movers::MoversSetSweep>();
  movers->add_mover(std::static_pointer_cast<simulations::movers::Mover>(system), system_size * system_size);

  // ---------- Create the sampler ----------
  std::shared_ptr<simulations::sampling::WangLandauSampler> sampler =
      std::make_shared<simulations::sampling::WangLandauSampler>(
          movers, system->calculate(), bfe, system_size * system_size * 6);
  sampler->reset(settings);

  simulations::forcefields::CalculateEnergyBase_SP energies;

  // ---------- Observers: energy, Wang-Landau sampling state and movers acceptance ----------
  simulations::observers::ObserveEvaluators_SP observations
      = std::make_shared<simulations::observers::ObserveEvaluators>(utils::string_format("energy-wl.dat"));
  observations->add_evaluator(system);
  sampler->outer_cycle_observer(observations);
  sampler->outer_cycle_observer(std::make_shared<simulations::observers::ObserveWLSampling>(*sampler, "wl.dat"));
  simulations::observers::ObserveMoversAcceptance_SP obs_ms
      = std::make_shared<simulations::observers::ObserveMoversAcceptance>(*movers, utils::string_format("movers-wl.dat"));
  obs_ms->observe_header();
  sampler->outer_cycle_observer(obs_ms);

  return sampler;
}

std::shared_ptr<simulations::sampling::ReplicaExchangeMC> prepare_replica_simulation(const simulations::SimulationSettings& settings) {

  using namespace utils::options;

  std::vector<Rubik_SP> systems;
  std::vector<simulations::sampling::IsothermalMC_SP> replica_samplers;
  std::vector<simulations::forcefields::CalculateEnergyBase_SP> energies;

  // --- one replica is created for every temperature in the comma-separated list
  std::vector<double> temperatures;
  utils::split(settings.get<std::string>(replicas), temperatures, ',');

  core::index4 n_outer_cycles = settings.get<core::index4>(mc_outer_cycles);
  core::index4 n_inner_cycles = settings.get<core::index4>(mc_inner_cycles);
  core::index4 n_exchanges = settings.get<core::index4>(replica_exchanges);
  int system_size = settings.get<int>("cube_size");

  for (core::index2 irepl = 0; irepl < temperatures.size(); ++irepl) {

    // ---------- Create the systems to be sampled ----------
    Rubik_SP system = std::make_shared<Rubik>(system_size);
    system->scramble(); // Set the cube to a random conformation
    systems.push_back(system);
    energies.push_back(system);

    // ---------- Movers definition ----------
    simulations::movers::MoversSet_SP movers = std::make_shared<simulations::movers::MoversSetSweep>();
    movers->add_mover(std::static_pointer_cast<simulations::movers::Mover>(system), system_size * system_size);

    // ---------- Create the sampler ----------
    auto sampler = std::make_shared<simulations::sampling::IsothermalMC>(movers, temperatures[irepl]);
    replica_samplers.push_back(sampler);
    sampler->cycles(n_inner_cycles, n_outer_cycles);

    // ---------- Observers: energy and movers acceptance, one output file per temperature ----------
    simulations::observers::ObserveEvaluators_SP observations
        = std::make_shared<simulations::observers::ObserveEvaluators>(utils::string_format("energy-%.3f.dat", temperatures[irepl]));
    observations->add_evaluator(system);
    sampler->outer_cycle_observer(observations);
    simulations::observers::ObserveMoversAcceptance_SP obs_ms
        = std::make_shared<simulations::observers::ObserveMoversAcceptance>(*movers, utils::string_format("movers-%.3f.dat", temperatures[irepl]));
    obs_ms->observe_header();
    sampler->outer_cycle_observer(obs_ms);
  }

  auto remc = std::make_shared<simulations::sampling::ReplicaExchangeMC>(replica_samplers, energies, true);
  auto remc_flow = std::make_shared<simulations::observers::ObserveReplicaFlow>(*remc, "replica_flow.dat");
  remc->exchange_observer(remc_flow);
  remc->replica_exchanges(n_exchanges);

  return remc;
}

/** @brief The program runs a Replica Exchange Monte Carlo simulation of a Rubik's cube system.
 *
 * This example shows how to simulate a system using BioShell library
 *
 * CATEGORIES: simulations/sampling/ReplicaExchangeMC;
 * KEYWORDS:   Monte Carlo; sampling; observer; simulation
 * IMG_ALT: Example results from a Rubik's cube simulations
 */
int main(const int argc, const char *argv[]) {

  using namespace utils::options; // --- All the options are in this namespace

  static Option cube_size("-c", "-cube_size", "size of the Rubik's cube");
  static Option sampler("-s", "-sampler", "MC sampler: 'remc' or 'wl'");

  utils::options::OptionParser &cmd = utils::options::OptionParser::get();
  cmd.register_option(utils::options::help, verbose, rnd_seed, cube_size(3), sampler);
  cmd.register_option(mc_outer_cycles(10000), mc_inner_cycles(10), mc_cycle_factor(1), replica_exchanges(10));
  cmd.register_option(begin_temperature(2.0), end_temperature(0.5), temp_steps(0.1),
      replicas("2,1.75,1.5,1.25,1.0,0.8,0.7,0.6,0.5"));

  if (!cmd.parse_cmdline(argc, argv)) return 1;

  if (rnd_seed.was_used()) {
    auto rnd = option_value<core::calc::statistics::Random::result_type>(rnd_seed);
    core::calc::statistics::Random::seed(rnd);
    logs << utils::LogLevel::SEVERE << "Pseudorandom start: " << rnd << "\n";
  } else {
    core::calc::statistics::Random::get().seed(12345); // --- seed the generator for repeatable results
    logs << utils::LogLevel::SEVERE << "Pseudorandom start with seed: 12345\n";
//    core::calc::statistics::Random::seed(time(0));
//    logs << utils::LogLevel::SEVERE << "Pseudorandom start with time(0) seed: \n";
  }

  simulations::SimulationSettings settings;
  settings.insert_or_assign(cmd, true);

  // --- Wang-Landau sampling on request, Replica Exchange MC otherwise
  if ((option_value<std::string>(sampler) == "wl") || (option_value<std::string>(sampler) == "WL")) {
    auto wl = prepare_wl_simulation(settings);
    wl->run();
  } else {
    auto remc = prepare_replica_simulation(settings);
    remc->run();
  }
}
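At the heart of a Replica Exchange simulation is the decision whether two replicas running at different temperatures should swap their configurations. The sketch below shows the standard Metropolis criterion for such a swap, assuming the Boltzmann constant equals 1 so that temperatures and energies share the same units. The function swap_acceptance() is a hypothetical illustration and does not reproduce the internals of BioShell's ReplicaExchangeMC class.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Probability of accepting a configuration swap between replicas i and j:
//   P = min(1, exp[(1/T_i - 1/T_j) * (E_i - E_j)])   with k_B = 1 (assumption).
// Hypothetical helper for illustration only, not part of BioShell.
double swap_acceptance(double energy_i, double temperature_i,
                       double energy_j, double temperature_j) {
  assert(temperature_i > 0.0 && temperature_j > 0.0);
  const double delta = (1.0 / temperature_i - 1.0 / temperature_j) * (energy_i - energy_j);
  return std::min(1.0, std::exp(delta));
}
```

A swap that moves the lower-energy configuration to the colder replica is always accepted; the opposite swap is accepted with an exponentially small probability, which is why neighbouring temperatures are usually chosen close to each other.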

ap_SWAligner¶
Calculates local sequence alignments (Smith-Waterman algorithm) between sequences read from two FASTA files: a query file and a database file. For every query-subject pair of sequences, the program prints the alignment in the Edinburgh format. The default substitution matrix is BLOSUM62.
USAGE:
ap_SWAligner query.fasta database.fasta [substitution-matrix]
EXAMPLE:
ap_SWAligner 5fd1.fasta test_inputs/ferrodoxins.fasta
REFERENCE: Smith, Temple F., and Michael S. Waterman. “Identification of common molecular subsequences.” JMB 147.1 (1981): 195-197. doi:10.1016/0022-2836(81)90087-5
Keywords:
- FASTA input; sequence alignment
Categories:
- core/alignment/SWAligner
Input files:
Output files:
Program source:
#include <iostream>
#include <chrono>
#include <algorithm>

#include <core/data/io/fasta_io.hh>
#include <core/data/sequence/Sequence.hh>
#include <core/alignment/SWAligner.hh>
#include <core/alignment/scoring/SimilarityMatrix.hh>
#include <core/alignment/scoring/SimilarityMatrixScore.hh>
#include <core/alignment/scoring/NcbiSimilarityMatrixFactory.hh>
#include <core/alignment/PairwiseSequenceAlignment.hh>
#include <core/data/io/alignment_io.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Calculates local sequence alignments (Smith-Waterman algorithm) between sequences read from two FASTA files:
a query file and a database file. For every query-subject pair of sequences, the program prints the alignment
in the Edinburgh format. The default substitution matrix is BLOSUM62.

USAGE:
    ap_SWAligner query.fasta database.fasta [substitution-matrix]

EXAMPLE:
    ap_SWAligner 5fd1.fasta test_inputs/ferrodoxins.fasta

REFERENCE:
Smith, Temple F., and Michael S. Waterman.
"Identification of common molecular subsequences."
JMB 147.1 (1981): 195-197. doi:10.1016/0022-2836(81)90087-5

)";

/** @brief Calculate all pairwise sequence alignments between sequences read from two FASTA files : query and database
 *
 * CATEGORIES: core/alignment/SWAligner
 * KEYWORDS:   FASTA input; sequence alignment
 * GROUP:      Alignments
 */
int main(const int argc, const char* argv[]) {

  if(argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::io;
  using namespace core::data::sequence;
  using namespace core::alignment::scoring;

  // --- the query sequences
  std::vector<std::shared_ptr<Sequence>> query_sequences;
  read_fasta_file(argv[1], query_sequences);

  // --- container for the sequence database
  std::vector<std::shared_ptr<Sequence>> db_sequences;
  read_fasta_file(argv[2], db_sequences);

  // --- find the longest sequence to initialize an aligner object that is large enough
  unsigned max_len = 0;
  std::for_each(query_sequences.begin(), query_sequences.end(),
      [&max_len](const Sequence_SP s) { max_len = std::max(max_len, unsigned(s->length())); });
  std::for_each(db_sequences.begin(), db_sequences.end(),
      [&max_len](const Sequence_SP s) { max_len = std::max(max_len, unsigned(s->length())); });

  // ---------- Create aligner object
  core::alignment::SWAligner<short, SimilarityMatrixScore<short>> aligner(max_len);

  // ---------- read similarity matrix from a file (e.g. BLOSUM62)
  NcbiSimilarityMatrix_SP sim_m = NcbiSimilarityMatrixFactory::get().get_matrix((argc > 3) ? argv[3] : "BLOSUM62");

  // ---------- Go through all db sequences and align them with the given query
  auto start = std::chrono::high_resolution_clock::now(); // --- timer starts!
  for (size_t i = 0; i < query_sequences.size(); ++i) {
    for (size_t j = 0; j < db_sequences.size(); ++j) {
      // ---------- Here we create a sequence similarity object that will score a match
      // ---------- between individual positions from the two sequences being aligned
      SimilarityMatrixScore<short> score(query_sequences[i]->sequence, db_sequences[j]->sequence, *sim_m);

      // ---------- calculate local alignment
      aligner.align(-10, -1, score);

      // ---------- Convert the abstract alignment to a pairwise sequence alignment object
      const core::alignment::PairwiseAlignment_SP ali = aligner.backtrace();
      core::alignment::PairwiseSequenceAlignment seq_ali(ali, query_sequences[i], db_sequences[j]);

      // ---------- Print the alignment in Edinburgh format
      core::data::io::write_edinburgh(seq_ali, std::cout, 80);
    }
  }
  auto end = std::chrono::high_resolution_clock::now();
  std::chrono::duration<double> time_span = std::chrono::duration_cast<std::chrono::duration<double>>(end - start);
  std::cerr << db_sequences.size() * query_sequences.size() << " local alignment scores computed within "
            << time_span.count() << " [s]\n";
}

ap_SequenceProfile¶
ap_SequenceProfile reads a Multiple Sequence Alignment (MSA) in ClustalO or FASTA format and prints a sequence profile made from it. The program detects the format of an input file by its extension: use either .fasta or .aln, for FASTA and ClustalO, respectively. If the optional argument -w is used, sequences will be weighted before the profile is calculated. The profile probabilities will then be based on weighted counts rather than on raw observations.
USAGE:
./ap_SequenceProfile infile.aln [-w]
EXAMPLE:
./ap_SequenceProfile cyped.CYP109.aln
./ap_SequenceProfile cyped.CYP109.fasta -w
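The difference between raw and weighted counts can be illustrated for a single alignment column. The following is a minimal sketch using only the C++ standard library; the function `profile_column` is hypothetical and is not part of the BioShell API. It assumes that all sequences are already aligned (equal length) and that one weight is given per sequence.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not a BioShell function): weighted residue
// probabilities observed in one column of an aligned set of sequences.
// weights[i] is the weight of msa[i]; with all weights equal to 1.0
// the result reduces to plain observed frequencies.
std::map<char, double> profile_column(const std::vector<std::string> &msa,
                                      const std::vector<double> &weights,
                                      std::size_t column) {
  std::map<char, double> prob;
  double total = 0.0;
  for (std::size_t i = 0; i < msa.size(); ++i) {
    prob[msa[i][column]] += weights[i];  // add the weight instead of 1
    total += weights[i];
  }
  for (auto &kv : prob) kv.second /= total;  // normalize so the column sums to 1
  return prob;
}
```

Calling `profile_column(msa, weights, c)` for every column `c` yields a table comparable in spirit to the one the program prints; the actual column order and gap handling in BioShell may differ.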
Keywords:
Categories:
- core/data/sequence/SequenceProfile; core/protocols/SequenceWeightingProtocol
Input files:
Output files:
Program source:
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

#include <core/data/io/hssp_io.hh>
#include <core/data/io/fasta_io.hh>
#include <core/data/io/clustalw_io.hh>
#include <core/data/sequence/SequenceProfile.hh>
#include <core/protocols/SequenceWeightingProtocol.hh>
#include <utils/exit.hh>
#include <utils/io_utils.hh>

std::string program_info = R"(

ap_SequenceProfile reads a Multiple Sequence Alignment (MSA) in ClustalO or FASTA format
and prints a sequence profile made from it. The program detects the format of an input file
by its extension: use either .fasta or .aln, for FASTA and ClustalO, respectively.
If the optional argument -w is used, sequences will be weighted before the profile is calculated.
The profile probabilities will then be based on weighted counts rather than on raw observations.

USAGE:
    ./ap_SequenceProfile infile.aln [-w]

EXAMPLE:
    ./ap_SequenceProfile cyped.CYP109.aln
    ./ap_SequenceProfile cyped.CYP109.fasta -w

)";

/** @brief Reads a MSA in ClustalW format and prints a sequence profile
 *
 * CATEGORIES: core/data/sequence/SequenceProfile; core/protocols/SequenceWeightingProtocol
 * KEYWORDS:   sequence profile; Clustal input; MSA
 * GROUP: Sequence calculations;
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::io;
  using namespace core::data::sequence;

  // ---------- Load all sequences into a vector
  std::vector<Sequence_SP> msa; // --- Sequence_SP is just a shorter name for std::shared_ptr<Sequence>
  auto root_extn = utils::root_extension(argv[1]);
  if ((root_extn.second == "aln") || (root_extn.second == "clustalw")) {
    core::data::io::read_clustalw_file(argv[1], msa, true);
  } else if (root_extn.second == "hssp") {
    core::data::io::read_hssp_file(argv[1], msa, true, true);
  } else
    core::data::io::read_fasta_file(argv[1], msa);

  // --- just one weight of value 1.0; parentheses are needed here,
  // --- because {1,1.0} would create two elements instead
  std::vector<double> seq_weights(1, 1.0);

  // ---------- Set up and run sequence weighting protocol if needed
  if ((argc == 3) && (std::strcmp(argv[2], "-w") == 0)) {
    core::protocols::HenikoffSequenceWeights protocol;
    protocol.n_threads(4).add_input_sequences(msa);
    protocol.run();
    seq_weights.clear();
    for (core::index2 i = 0; i < msa.size(); ++i) seq_weights.push_back(protocol.get_weight(i));
  }

  // ---------- Create a sequence profile and print it on the screen
  SequenceProfile profile(*msa[0], SequenceProfile::aaOrderByPropertiesGapped(), msa, seq_weights);
  profile.write_table(std::cout);
}

ap_SequenceWeightingProtocol¶
ap_SequenceWeightingProtocol reads a set of protein sequences and computes a real-valued weight for each of them. If the input is a FASTA file, every pair of sequences will be aligned and sequence identity values will be evaluated based on these alignments. If the input is an .aln file (i.e. ClustalO MSA file format), the sequences are assumed to be already aligned and sequence identity values will be computed from the MSA. Sequence identity values are then transformed into real weights. These weights may be used later, e.g. in sequence profile construction.
USAGE:
ap_SequenceWeightingProtocol input-file
EXAMPLES:
ap_SequenceWeightingProtocol input.fasta
ap_SequenceWeightingProtocol input.aln
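The program source uses a class named HenikoffSequenceWeights. As an illustration only, the sketch below implements the classic position-based weighting scheme of Henikoff and Henikoff for sequences that are already aligned. It is an assumption that this resembles what BioShell computes; the function name `henikoff_weights` is hypothetical, gaps are treated as an ordinary symbol for simplicity, and the weights are normalized to sum to 1.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of position-based sequence weights.
// In every column a sequence receives 1 / (k * n), where k is the number
// of distinct symbols in that column and n is the number of sequences
// sharing this sequence's symbol. Weights are normalized at the end.
std::vector<double> henikoff_weights(const std::vector<std::string> &msa) {
  std::vector<double> w(msa.size(), 0.0);
  if (msa.empty()) return w;
  for (std::size_t c = 0; c < msa[0].size(); ++c) {
    std::map<char, int> counts;                  // symbol -> occurrences in column c
    for (const auto &s : msa) ++counts[s[c]];
    const double k = static_cast<double>(counts.size());
    for (std::size_t i = 0; i < msa.size(); ++i)
      w[i] += 1.0 / (k * counts[msa[i][c]]);
  }
  double sum = 0.0;
  for (double x : w) sum += x;
  if (sum > 0.0)
    for (double &x : w) x /= sum;                // weights sum to 1
  return w;
}
```

With this scheme, sequences that repeat what many others already show receive small weights, while divergent sequences receive larger ones.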
Keywords:
- FASTA input
- sequence alignment
- sequence identity
- sequence weighting
Categories:
- core/protocols/SequenceWeightingProtocol
Input files:
Output files:
Program source:
#include <chrono>
#include <iostream>
#include <string>
#include <vector>

#include <utils/exit.hh>
#include <core/data/basic/Array2D.hh>
#include <core/data/sequence/Sequence.hh>
#include <core/data/io/fasta_io.hh>
#include <core/data/io/clustalw_io.hh>
#include <core/protocols/SequenceWeightingProtocol.hh>
#include <utils/io_utils.hh>

std::string program_info = R"(

ap_SequenceWeightingProtocol reads a set of protein sequences and computes a real-valued weight for each of them.
If the input is a FASTA file, every pair of sequences will be aligned and sequence identity values
will be evaluated based on these alignments. If the input is an .aln file (i.e. ClustalO MSA file format),
the sequences are assumed to be already aligned and sequence identity values will be computed from the MSA.
Sequence identity values are then transformed into real weights. These weights may be used later,
e.g. in sequence profile construction.

USAGE:
    ap_SequenceWeightingProtocol input-file

EXAMPLES:
    ap_SequenceWeightingProtocol input.fasta
    ap_SequenceWeightingProtocol input.aln

)";

/** @brief Shows how to use SequenceWeightingProtocol class
 *
 * CATEGORIES: core/protocols/SequenceWeightingProtocol
 * KEYWORDS:   FASTA input; sequence alignment; sequence identity; sequence weighting
 * GROUP: Sequence calculations;
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::sequence;
  using namespace core::protocols;

  bool if_align = true; // --- set to false for pre-aligned input; not used further in this listing
  std::vector<Sequence_SP> input_sequences;
  auto root_extn = utils::root_extension(argv[1]);
  if ((root_extn.second == "aln") || (root_extn.second == "clustalw")) {
    core::data::io::read_clustalw_file(argv[1], input_sequences);
    if_align = false;
  } else
    core::data::io::read_fasta_file(argv[1], input_sequences);

  core::protocols::HenikoffSequenceWeights protocol;
  protocol.n_threads(1);
  protocol.add_input_sequences(input_sequences);

  auto start = std::chrono::high_resolution_clock::now(); // --- timer starts!
  protocol.run();
  auto end = std::chrono::high_resolution_clock::now();
  std::chrono::duration<double> time_span = std::chrono::duration_cast<std::chrono::duration<double>>(end - start);
  std::cerr << input_sequences.size() * (input_sequences.size() - 1) / 2.0
            << " sequence similarities calculated within " << time_span.count() << " [s]\n";

  protocol.print_weights(std::cout);
}

ap_WeightedOnlineStatistics¶
ap_WeightedOnlineStatistics reads a file with two columns: real values and their weights. It calculates the weighted average and standard deviation of the data using an online algorithm (Welford's method). If no input file is provided, the program prints its help message and calculates the statistics from a random sample.
USAGE:
ap_WeightedOnlineStatistics infile
EXAMPLE:
ap_WeightedOnlineStatistics random_normal_weighted.txt
REFERENCE: https://www.johndcook.com/blog/skewness_kurtosis/ https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford%27s_online_algorithm
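The weighted variant of Welford's online update can be sketched in a few lines. The class below is a minimal illustration written with the C++ standard library only; the name `WeightedWelford` and its methods are hypothetical and do not reproduce the interface of BioShell's WeightedOnlineStatistics. The variance returned here is the weighted population variance (sum of weighted squared deviations divided by the sum of weights), which is an assumption about the normalization.

```cpp
#include <cmath>

// Hypothetical sketch of a weighted online (single-pass) mean and variance.
class WeightedWelford {
 public:
  // Incorporate one observation x with a positive weight w.
  void add(double x, double w) {
    sum_w_ += w;
    const double delta = x - mean_;     // deviation from the old mean
    mean_ += (w / sum_w_) * delta;      // updated weighted mean
    m2_ += w * delta * (x - mean_);     // uses both old and new mean
    ++count_;
  }
  unsigned long count() const { return count_; }
  double sum_of_weights() const { return sum_w_; }
  double mean() const { return mean_; }
  double variance() const { return (sum_w_ > 0.0) ? m2_ / sum_w_ : 0.0; }
  double stdev() const { return std::sqrt(variance()); }

 private:
  unsigned long count_ = 0;
  double sum_w_ = 0.0;
  double mean_ = 0.0;
  double m2_ = 0.0;  // running sum of weighted squared deviations
};
```

Because each observation updates the running values in place, the data never has to be stored, which is what makes the approach suitable for long input streams.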
Keywords:
Categories:
- core/calc/statistics/ap_WeightedOnlineStatistics; core/calc/statistics/Random
Input files:
Output files:
Program source:
#include <cmath>
#include <fstream>
#include <iostream>
#include <random>
#include <string>

#include <core/index.hh>
#include <core/calc/statistics/Random.hh>
#include <core/calc/statistics/WeightedOnlineStatistics.hh>
#include <utils/string_utils.hh>

std::string program_info = R"(

ap_WeightedOnlineStatistics reads a file with two columns: real values and their weights.
It calculates average value and standard deviation of the data using an online algorithm (Welford method).
If no input file is provided, the program calculates the statistics from a random sample.

USAGE:
    ap_WeightedOnlineStatistics infile

EXAMPLE:
    ap_WeightedOnlineStatistics random_normal_weighted.txt

REFERENCE:
https://www.johndcook.com/blog/skewness_kurtosis/
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford%27s_online_algorithm

)";

/** @brief Reads a file with two columns: real values and their weights, and calculates their mean and stdev.
 *
 * If no input file is provided, the program calculates the statistics from a random sample
 *
 * CATEGORIES: core/calc/statistics/ap_WeightedOnlineStatistics; core/calc/statistics/Random
 * KEYWORDS:   statistics
 * GROUP: Statistics;
 */
int main(const int argc, const char *argv[]) {

  core::calc::statistics::WeightedOnlineStatistics stats;
  if (argc < 2) {
    std::cerr << program_info; // --- complain about missing program parameter
    // ---------- Use the random engine if no data is provided
    core::calc::statistics::Random r = core::calc::statistics::Random::get();
    r.seed(12345);  // --- seed the generator for repeatable results
    std::normal_distribution<double> normal_random;
    for (core::index4 n = 0; n < 10000; ++n) {
      double x = normal_random(r);
      if (x <= 2.0) stats(x, 0.1); // --- insert the random point with an arbitrary weight = 0.1
      else for (int i = 0; i < 10; ++i) stats(x, 0.01); // --- in the tail insert each point ten times with a tenth of that weight
    }
  } else {
    std::ifstream in(argv[1]);
    double x, w;
    // --- test the extraction itself, so the last row is not counted twice at end of file
    while (in >> x >> w) stats(x, w);
  }
  std::cout << "#count sum_wghts avg sdev\n";
  std::cout << utils::string_format("%d %lf %f %f \n", stats.cnt(), double(stats.sum_of_weights()),
      stats.avg(), sqrt(stats.var()));
}

ap_align_profiles¶
Reads two files with sequence profiles (BioShell’s tabular format) and calculates a global alignment between them. The gap penalty function depends on the observed gap probabilities. The program prints the resulting sequence alignment. The default base gap penalties are -10 for gap_open and -1 for gap_extend.
USAGE:
ap_align_profiles <file1.profile> <file2.profile> [gap_open gap_extend]
EXAMPLE:
ap_align_profiles d4proc1-A1.profile d4proc1-A2.profile -11 -2
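To show how gap_open and gap_extend enter a global alignment, here is a minimal score-only sketch of Needleman-Wunsch with affine gaps (Gotoh's three-matrix recursion). It is not the BioShell implementation: it aligns plain strings with a hypothetical match/mismatch score instead of the profile-profile scoring, uses constant penalties instead of frequency-scaled ones, and assumes that a gap of length k costs gap_open + (k - 1) * gap_extend.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: optimal global alignment score with affine gap penalties.
// gap_open and gap_extend are negative numbers, as in the program's arguments.
double nw_affine_score(const std::string &a, const std::string &b,
                       double match, double mismatch,
                       double gap_open, double gap_extend) {
  const double NEG = -1e18;  // stands for "state not reachable"
  const std::size_t n = a.size(), m = b.size();
  using Matrix = std::vector<std::vector<double>>;
  // M: last column aligns a[i-1] with b[j-1]
  // X: last column aligns a[i-1] with a gap; Y: b[j-1] with a gap
  Matrix M(n + 1, std::vector<double>(m + 1, NEG)), X = M, Y = M;
  M[0][0] = 0.0;
  for (std::size_t i = 1; i <= n; ++i) X[i][0] = gap_open + (i - 1) * gap_extend;
  for (std::size_t j = 1; j <= m; ++j) Y[0][j] = gap_open + (j - 1) * gap_extend;

  for (std::size_t i = 1; i <= n; ++i) {
    for (std::size_t j = 1; j <= m; ++j) {
      const double s = (a[i - 1] == b[j - 1]) ? match : mismatch;
      M[i][j] = s + std::max({M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]});
      X[i][j] = std::max({M[i - 1][j] + gap_open,      // open a new gap
                          X[i - 1][j] + gap_extend,    // extend the current gap
                          Y[i - 1][j] + gap_open});
      Y[i][j] = std::max({M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open});
    }
  }
  return std::max({M[n][m], X[n][m], Y[n][m]});
}
```

For example, `nw_affine_score("ACGT", "AGT", 1, -1, -10, -1)` pays for one opened gap and three matches. A backtrace step, as performed by the program, would additionally store which of the three states produced each maximum.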
Keywords:
Categories:
- core/alignment/NWAlignerAnyGap
Input files:
Output files:
Program source:
#include <iostream>
#include <chrono>
#include <algorithm>
#include <cstdlib>
#include <string>
#include <vector>

#include <core/data/io/fasta_io.hh>
#include <core/alignment/NWAlignerAnyGap.hh>
#include <core/alignment/PairwiseSequenceAlignment.hh>
#include <core/alignment/scoring/Picasso3.hh>
#include <core/alignment/scoring/FrequencyScaledGapPenalty.hh>
#include <core/data/io/alignment_io.hh>
#include <core/data/sequence/Sequence.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Reads two files with sequence profiles (BioShell's tabular format) and calculates a global alignment
between them. The gap penalty function depends on the observed gap probabilities.
The program prints the resulting sequence alignment. The default base gap penalties
are -10 for gap_open and -1 for gap_extend.

USAGE:
    ap_align_profiles <file1.profile> <file2.profile> [gap_open gap_extend]

EXAMPLE:
    ap_align_profiles d4proc1-A1.profile d4proc1-A2.profile -11 -2

)";

/** @brief Calculate all pairwise sequence alignments between sequence profiles
 *
 * CATEGORIES: core/alignment/NWAlignerAnyGap
 * KEYWORDS:   FASTA input; Needleman-Wunsch; sequence alignment
 * GROUP: Alignments
 */
int main(const int argc, const char* argv[]) {

  if(argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::io;
  using namespace core::data::sequence;
  using namespace core::alignment::scoring;

  float gap_open = -10;
  float gap_extend = -1;
  if(argc == 5) {
    gap_open = atof(argv[3]);
    gap_extend = atof(argv[4]);
  }
  utils::Logger logs("ap_align_profiles");

  // --- the query profile
  SequenceProfile_SP query = read_profile_table(argv[1]);
  std::vector<float> query_gap_open, query_gap_extend;
  query->get_probabilities(core::chemical::Monomer::GAP,query_gap_open);
  query->get_probabilities(core::chemical::Monomer::GPE,query_gap_extend);
  logs << utils::LogLevel::INFO << "Query sequence is: " << query->sequence<<"\n";

  // --- the template profile
  SequenceProfile_SP tmplt = read_profile_table(argv[2]);
  std::vector<float> tmplt_gap_open, tmplt_gap_extend;
  tmplt->get_probabilities(core::chemical::Monomer::GAP,tmplt_gap_open);
  tmplt->get_probabilities(core::chemical::Monomer::GPE,tmplt_gap_extend);
  logs << utils::LogLevel::INFO << "Template sequence is: " << tmplt->sequence<<"\n";

  // --- scoring system
  const Picasso3 scoring(query,tmplt);
  const FrequencyScaledGapPenalty gaps(gap_open,gap_extend,query_gap_open,query_gap_extend,tmplt_gap_open,tmplt_gap_extend);

  // --- create aligner object
  core::alignment::NWAlignerAnyGap<Picasso3,FrequencyScaledGapPenalty> aligner(std::max(query->length(),tmplt->length()));

  auto start = std::chrono::high_resolution_clock::now(); // --- timer starts!
  float score = aligner.align(scoring,gaps);
  auto ali = aligner.backtrace();
  core::alignment::PairwiseSequenceAlignment seq_ali(ali, query, tmplt);
  std::cout << seq_ali.get_aligned_query('*') << "\n";
  std::cout << seq_ali.get_aligned_template('*') << "\n";
  auto end = std::chrono::high_resolution_clock::now();
  // --- the alignment score and the elapsed time are computed here but not printed
  std::chrono::duration<double> time_span = std::chrono::duration_cast<std::chrono::duration<double>>(end - start);
}

ap_atom_correlations¶
ap_atom_correlations reads a multimodel PDB trajectory and calculates correlations between atomic coordinates
USAGE:
ap_atom_correlations 2kwi.pdb
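where 2kwi.pdb is the input file. The output, printed on the screen, provides nine columns: the residue number, residue name and coordinate label (e.g. CA-X) of atom i, the same three fields for atom j, the indices i and j of the two coordinates, and covariance(i,j), i.e. the covariance between the two coordinates computed over all models in the file. Only alpha carbons are taken into account.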
Keywords:
Categories:
- core::data::io::Pdb::fill_structure; core::calc::statistics::OnlineMultivariateStatistics
Input files:
Output files:
Program source:
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>

#include <core/data/io/Pdb.hh>
#include <core/calc/statistics/OnlineMultivariateStatistics.hh>
#include <utils/string_utils.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ap_atom_correlations reads a multimodel PDB trajectory and calculates correlations between atomic coordinates.

USAGE:
    ap_atom_correlations 2kwi.pdb

where 2kwi.pdb is the input file. The output, printed on the screen, provides nine columns:
residue number, residue name and coordinate label of atom i, the same three fields for atom j,
the indices i and j of the two coordinates, and covariance(i,j) computed over all models.

)";

/** @brief Reads a multimodel PDB trajectory and calculates correlation between atomic coordinates
 *
 * CATEGORIES: core::data::io::Pdb::fill_structure; core::calc::statistics::OnlineMultivariateStatistics
 * KEYWORDS:   PDB input; statistics
 * GROUP: Structure calculations;
 */
int main(const int argc, const char *argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  // --- Read PDB file, may be gzip-ped; take only the lines with C-alphas
  core::data::io::Pdb reader(argv[1], core::data::io::is_ca);

  std::vector<core::data::basic::Vec3> atoms(reader.count_atoms(0));
  std::vector<double> xyz(atoms.size() * 3);
  core::calc::statistics::OnlineMultivariateStatistics stats(xyz.size());

  // --- Read all models from the deposit, store alpha carbons from each model as a separate vector of double values
  for (size_t i = 0; i < reader.count_models(); ++i) { // --- Iterate over all models in the input file
    reader.fill_structure(i, atoms); // --- utilize coordinates of the new pose
    for (size_t j = 0; j < atoms.size(); ++j) {
      xyz[j * 3] = atoms[j].x;
      xyz[j * 3 + 1] = atoms[j].y;
      xyz[j * 3 + 2] = atoms[j].z;
    }
    stats(xyz);
  }

  // --- One label per residue: residue number, residue name and atom name
  std::vector<std::string> labels;
  const auto structure = reader.create_structure(0);
  for(auto it = structure->first_const_residue(); it!=structure->last_const_residue();++it)
    labels.push_back(utils::string_format("%4d %3s CA", (**it).id(), (**it).residue_type().code3.c_str()));

  std::cout << "# i-resid coord j-resid coord i j correlation\n";
  std::string xyz_chars = "XYZ";
  std::cout << "#ipos j-pos correlation\n";
  for (size_t i = 0; i < xyz.size(); ++i) {
    for (size_t j = 0; j < xyz.size(); ++j) {
      std::cout << labels[int(i / 3)] << "-" << xyz_chars[i % 3] << " " << labels[int(j / 3)] << "-" << xyz_chars[j % 3];
      std::cout << " " << std::setw(4) << i << " " << std::setw(4) << j << " " << stats.covar(i, j) << "\n";
    }
  }
}

ap_blast_nonredundant¶
ap_blast_nonredundant reads the output of a BLAST search (XML format) and selects a non-redundant subset of sequences. The subset is selected by hierarchical clustering (complete-linkage approach) of the sequences extracted from the input file; when the file was generated by psiblast, only the hits from the last iteration are used. The distance between any two sequences is defined as (1 - sequence identity fraction), calculated over the alignment extracted from the BLAST results.
USAGE:
ap_blast_nonredundant blast-out.xml identity_ratio
EXAMPLE:
ap_blast_nonredundant 1K25_01+PBP_C2.psi 0.5
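The distance definition can be made concrete with a short sketch. In the program source the distance is stored in a single byte, scaled so that 0 means identical and 250 means no identical positions; the identity is divided by the length of the shorter sequence, and the cutoff is derived from identity_ratio in the same way. The functions below follow that convention but are hypothetical stand-ins written with the standard library; in particular, ignoring positions where both sequences have a gap is an assumption, since the behavior of BioShell's `sum_identical` is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sketch: distance between two aligned sequences on a 0..250 byte scale.
std::uint8_t identity_distance(const std::string &a, const std::string &b) {
  const std::size_t n = std::min(a.size(), b.size());
  if (n == 0) return 250;                       // nothing to compare: maximal distance
  std::size_t same = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (a[i] == b[i] && a[i] != '-') ++same;    // assumption: gap-gap is not a match
  const double identity = static_cast<double>(same) / static_cast<double>(n);
  return static_cast<std::uint8_t>(250 * (1.0 - identity));
}

// Clustering cutoff on the same scale, for a requested identity ratio in [0, 1].
std::uint8_t identity_cutoff(double identity_ratio) {
  return static_cast<std::uint8_t>((1.0 - identity_ratio) * 250);
}
```

With identity_ratio = 0.5 the cutoff is 125, so under complete linkage every pair of sequences within one cluster is at least 50% identical.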
Keywords:
- hierarchical clustering
- blast
Categories:
- core::calc::clustering::HierarchicalClustering; core::data::io::BlastXMLReader
Input files:
Output files:
Program source:
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <map>
#include <string>
#include <vector>

#include <core/data/io/BlastXMLReader.hh>
#include <core/alignment/on_alignment_computations.hh>
#include <core/calc/clustering/DistanceByValues1B.hh>
#include <core/calc/clustering/HierarchicalCluster.hh>
#include <core/calc/clustering/HierarchicalClustering1B.hh>
#include <utils/exit.hh>
#include <utils/LogManager.hh>

std::string program_info = R"(

ap_blast_nonredundant reads output from blast search (XML format) and selects a non-redundant subset of sequences.
The subset is selected by hierarchical clustering (complete-linkage approach) of the sequences
extracted from the given input file generated by psiblast - last iteration only.
Distance between any two sequences is defined as (1 - sequence identity fraction) calculated
over alignment extracted from blast results.

USAGE:
    ap_blast_nonredundant blast-out.xml identity_ratio

EXAMPLE:
    ap_blast_nonredundant 1K25_01+PBP_C2.psi 0.5

)";

/** @brief Reads output from blast search (XML format) and selects a non-redundant subset of sequences
 *
 * CATEGORIES: core::calc::clustering::HierarchicalClustering; core::data::io::BlastXMLReader
 * KEYWORDS:   hierarchical clustering; blast
 * GROUP: File processing;Data filtering
 */
int main(const int argc, const char *argv[]) {

  utils::LogManager::INFO();
  if(argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::io;
  using namespace core::calc::clustering;

  BlastXMLReader blast_reader;
  auto hits = blast_reader.parse(argv[1]);

  std::vector<std::string> sequences; // --- vector containing all sequences from the last iteration of psiblast
  std::vector<std::string> seq_ids; // --- vector containing identifiers of these sequences; sequences.size() equals to seq_ids.size()
  std::map<std::string,std::string> seq_by_id; // --- maps sequence IDs (keys) to the respective sequences (values)
  for (const auto &hsp: hits.back()) { // --- iterate by reference to avoid copying every HSP
    seq_ids.push_back(hsp.hit_accession());
    // --- pad the subject with gaps so that it starts at the right query position
    sequences.push_back(std::string(hsp.query_start() - 1, '-') + hsp.sbjct());
    seq_by_id[seq_ids.back()] = sequences.back();
  }

  // --- distances are stored as single bytes: 0 means identical, 250 means no identical positions
  DistanceByValues1B dist(seq_ids, 254,255);
  for(core::index4 i=1;i<sequences.size();++i)
    for(core::index4 j=0;j<i;++j) {
      double val = core::alignment::sum_identical(sequences[i],sequences[j]);
      val /= std::min(sequences[i].length(), sequences[j].length());
      core::index1 d = core::index1(250 * (1-val));
//      std::cout << i << " " << j << " " << core::alignment::sum_identical(sequences[i], sequences[j])
//                << " " << val << " " << int(d) << "\n";
      dist.set(i, j, d);
      dist.set(j, i, d);
    }

  HierarchicalClustering1B hac(dist.labels(), "");
  CompleteLink1B merge;
  hac.run_clustering(dist, merge);
  // --- Uncomment the line below to print the clustering tree (may be a lot of output)
  // hac.write_merging_steps(std::cerr);

  std::vector<std::string> elements; // --- vector used to store elements of each cluster
  core::index1 cutoff = core::index1((1.0 - atof(argv[2])) * 250);
  std::cerr << "# clustering cutoff set to " << int(cutoff) << "\n";
  auto clusters = hac.get_clusters(cutoff, 1);
  std::cerr << "# " << sequences.size() << " hits' set reduced to " << clusters.size() << " representatives\n";
  for (core::index2 i = 0; i < clusters.size(); i++) {
    const auto & c = clusters[i];
    std::string medoid_id = medoid_by_average_distance<core::index1, std::string, DistanceByValues1B >(c, dist).medoid;
    elements.clear();
    collect_leaf_elements(std::static_pointer_cast<BinaryTreeNode<std::string>>(c), elements);
    std::cout << "> " << medoid_id;
    if(elements.size() > 1) {
      std::cout << " represents also:";
      for(const std::string & e: elements) if(e!=medoid_id) std::cout << " "<<e;
    }
    core::data::sequence::remove_gaps(seq_by_id[medoid_id]);
    std::cout << "\n" << seq_by_id[medoid_id] << "\n\n";
  }
}

ap_blastxml_to_fasta¶
ap_blastxml_to_fasta reads an XML file produced by PsiBlast and extracts the sequences of all hits. The list of hits is divided into sections, according to the psiblast iteration in which a given subject sequence was detected. The sequences are written on the screen in FASTA format.
USAGE:
ap_blastxml_to_fasta blastout.xml
EXAMPLE:
ap_blastxml_to_fasta "1K25_01+PBP_C2.psi"
Keywords:
Categories:
- core::data::io::XML; core::algorithms::trees::TreeNode
Input files:
Output files:
Program source:
#include <iostream>
#include <fstream>
#include <stdexcept>

#include <core/data/io/BlastXMLReader.hh>
#include <core/algorithms/trees/algorithms.hh>
#include <core/index.hh>
#include <core/data/io/XML.hh>
#include <core/data/io/XMLElement.hh>
#include <core/data/io/Hsp.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ap_blastxml_to_fasta reads a XML file produced by PsiBlast and extracts sequences of all hits.
The list of hits is divided into sections, according to the psiblast iteration
when a given subject sequence was detected. The sequences are written on the screen in FASTA format

USAGE:
    ap_blastxml_to_fasta blastout.xml

EXAMPLE:
    ap_blastxml_to_fasta "1K25_01+PBP_C2.psi"

)";

using namespace core::data::io;

struct BlastXMLVisitor {

  void operator()(std::shared_ptr<core::algorithms::trees::TreeNode<XMLElementData>> n) {

    if (n->element.name() == "Hsp") {
      auto xmlel = std::static_pointer_cast<XMLElement>(n);
      auto xmlel_root = std::static_pointer_cast<XMLElement>(n->get_root()->get_root());
      const std::string &sequence = xmlel->find_value("Hsp_hseq");
      const std::string &seq_name = xmlel_root->find_value("Hit_accession");
      std::cout << "> " << seq_name << "\n" << sequence << "\n";
    }
  }
};

/** @brief Reads XML produced by psiblast and creates FASTA file containing all hits
 *
 * CATEGORIES: core::data::io::XML; core::algorithms::trees::TreeNode
 * KEYWORDS:   XML; data structures
 * GROUP: File processing;Format conversion
 */
int main(int argc, char *argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  XML xxx;
  std::shared_ptr<XMLElement> root = xxx.load_data(argv[1]);

  auto it = root->begin();
  while ((*it)->element.name() != "BlastOutput_iterations") ++it; // --- Visit branches until you find BlastOutput_iterations

  core::index2 iteration_counter = 0;
  for (const auto &v : **it) {
    if (v->element.name() == "Iteration") {
      std::cout << "\n# ------ iteration " << ++iteration_counter << " --------\n";
      core::algorithms::trees::depth_first_preorder((*it)->get_right(), BlastXMLVisitor());
    }
  }

  return 0;
}

ap_blastxml_to_hsp¶
Reads an XML file produced by PsiBlast and extracts High Scoring Pairs (HSP). The program prints a table where each row corresponds to a single HSP found in the input file. The table’s columns provide:
- hit sequence ID
- hit length
- alignment score
- number of gaps
- gap percentage
- number of identical positions
- identity percentage
- e-value
- query start position
- subject start position
- subject sequence
USAGE:
ap_blastxml_to_hsp blastout.xml
EXAMPLE:
ap_blastxml_to_hsp "1K25_01+PBP_C2.psi"
OUTPUT:
hit sequence ID len score gaps gap% ident ident% evalue qpos tpos sequence
[ UniRef50_A0A0E9GHR2] 139 220 0 ( 0%) 28 ( 50%) 3.56e-22 3 84 --ELPDMYGWTKENVQVFGKWTGIEVTYQGNGSHVTAQSSDTGTALKKLKKLTITLGE
[ UniRef50_A0A111B192] 151 221 0 ( 0%) 48 ( 82%) 4.26e-22 1 94 AEEVPDMYGWTKETAETLAKWLNIELEFQGSGSTVQKQDVRANTAIKDIKKITLTLGD
[ UniRef50_P59676] 750 229 0 ( 0%) 48 ( 82%) 2.43e-21 1 693 AEEVPDMYGWTKETAETLAKWLNIELEFQGSGSTVQKQDVRANTAIKDIKKITLTLGD
[ UniRef50_T0UT66] 466 227 0 ( 0%) 29 ( 50%) 3.87e-21 2 410 -DAVPDMYGWTKKNADIFGEWTGIEITYKGSGKKVTKQSVKMNTSLNKTKKITLTLGD
[ UniRef50_A0A0T8ADZ4] 322 223 0 ( 0%) 58 (100%) 5.32e-21 1 265 VEEIPDMYGWKKETAETFAKWLDIELEFEGSGSVVQKQDVRTNTAIKNIKKIKLTLGD
[ UniRef50_A0A139PMG7] 412 222 0 ( 0%) 48 ( 82%) 1.37e-20 1 355 AEEVPDMYGWTKATAETLAKWLNIELEFEGSGSTVQKQDVRANTAIKDIKKITLTLGD
[ UniRef50_A0A0E9EQ17] 236 212 0 ( 0%) 29 ( 51%) 5.33e-20 3 181 --EMPDMYGWTKKNVETFGEWLGIKVHVKSKGSKVVAQSVKTNASLKKIKEITITLGD
Keywords:
- XML
- data structures
- HSP
Categories:
- core::data::io::XML; core::algorithms::trees::TreeNode
Input files:
Output files:
Program source:
#include <iostream>
#include <fstream>
#include <stdexcept>

#include <core/data/io/BlastXMLReader.hh>
#include <core/algorithms/trees/algorithms.hh>
#include <core/index.hh>
#include <core/data/io/XML.hh>
#include <core/data/io/XMLElement.hh>
#include <core/data/io/Hsp.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Reads a XML file produced by PsiBlast and extracts High Scoring Pairs (HSP).
Program prints a table where each row corresponds to a single HSP found in the input file.
The table's columns provide:
    - hit sequence ID
    - hit length
    - alignment score
    - number of gaps
    - gap percentage
    - number of identical positions
    - identity percentage
    - e-value
    - query start position
    - subject start position
    - subject sequence

USAGE:
    ap_blastxml_to_hsp blastout.xml

EXAMPLE:
    ap_blastxml_to_hsp "1K25_01+PBP_C2.psi"

OUTPUT:
hit sequence ID len score gaps gap% ident ident% evalue qpos tpos sequence
[ UniRef50_A0A0E9GHR2] 139 220 0 ( 0%) 28 ( 50%) 3.56e-22 3 84 --ELPDMYGWTKENVQVFGKWTGIEVTYQGNGSHVTAQSSDTGTALKKLKKLTITLGE
[ UniRef50_A0A111B192] 151 221 0 ( 0%) 48 ( 82%) 4.26e-22 1 94 AEEVPDMYGWTKETAETLAKWLNIELEFQGSGSTVQKQDVRANTAIKDIKKITLTLGD
[ UniRef50_P59676] 750 229 0 ( 0%) 48 ( 82%) 2.43e-21 1 693 AEEVPDMYGWTKETAETLAKWLNIELEFQGSGSTVQKQDVRANTAIKDIKKITLTLGD
[ UniRef50_T0UT66] 466 227 0 ( 0%) 29 ( 50%) 3.87e-21 2 410 -DAVPDMYGWTKKNADIFGEWTGIEITYKGSGKKVTKQSVKMNTSLNKTKKITLTLGD
[ UniRef50_A0A0T8ADZ4] 322 223 0 ( 0%) 58 (100%) 5.32e-21 1 265 VEEIPDMYGWKKETAETFAKWLDIELEFEGSGSVVQKQDVRTNTAIKNIKKIKLTLGD
[ UniRef50_A0A139PMG7] 412 222 0 ( 0%) 48 ( 82%) 1.37e-20 1 355 AEEVPDMYGWTKATAETLAKWLNIELEFEGSGSTVQKQDVRANTAIKDIKKITLTLGD
[ UniRef50_A0A0E9EQ17] 236 212 0 ( 0%) 29 ( 51%) 5.33e-20 3 181 --EMPDMYGWTKKNVETFGEWLGIKVHVKSKGSKVVAQSVKTNASLKKIKEITITLGD

)";

using namespace core::data::io;

/** @brief Reads XML produced by psiblast and creates High Scoring FASTA Pair
 *
 * CATEGORIES: core::data::io::XML; core::algorithms::trees::TreeNode
 * KEYWORDS:   XML; data structures; HSP
 * GROUP: File processing;Format conversion
 */
int main(int argc, char *argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  // ---------- creates BlastXMLReader object
  BlastXMLReader p;
  // ---------- parse data from XML file and returns it to variable
  auto iterations = p.parse(argv[1]);
  std::cout << Hsp::output_header << "\n";
  // print Hsp line for every hit for every iteration
  for (core::index2 i = 0; i < iterations.size(); i++) {
    for (core::index2 j = 0; j < iterations[i].size(); j++)
      std::cout << iterations[i][j] << "\n";
  }

  return 0;
}

ap_bootstrap_quantile¶
ap_bootstrap_quantile reads a file with real values and calculates statistics for a given quantile. The statistics, i.e. the expected quantile value and its standard deviation, are computed by a 100-fold bootstrap procedure. If no input file is provided, the program calculates the statistics of a random sample drawn from a normal distribution (mean = 0.0, variance = 1.0).
USAGE:
ap_bootstrap_quantile quantile_value infile
ap_bootstrap_quantile quantile_value
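The bootstrap procedure itself is short: resample the data with replacement, evaluate the quantile of each resample, and report the mean and standard deviation of those values. The sketch below uses the C++ standard library only. The function names, the simple index-based quantile definition and the fixed seed are assumptions made for illustration; BioShell's own `bootstrap_quantile` may differ in all of these details.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: quantile q (0..1) of a non-empty sample, taken as the
// element at index floor(q * (n - 1)) of the sorted copy.
double simple_quantile(std::vector<double> v, double q) {
  std::sort(v.begin(), v.end());
  return v[static_cast<std::size_t>(q * (v.size() - 1))];
}

// Hypothetical sketch: returns {mean, standard deviation} of the quantile
// estimated from n_rounds bootstrap resamples of a non-empty data set.
std::pair<double, double> bootstrap_quantile_sketch(const std::vector<double> &data, double q,
                                                    unsigned n_rounds, unsigned seed = 12345) {
  std::mt19937 gen(seed);
  std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
  std::vector<double> sample(data.size());
  double sum = 0.0, sum2 = 0.0;
  for (unsigned r = 0; r < n_rounds; ++r) {
    for (double &x : sample) x = data[pick(gen)];  // draw with replacement
    const double v = simple_quantile(sample, q);
    sum += v;
    sum2 += v * v;
  }
  const double mean = sum / n_rounds;
  const double var = std::max(0.0, sum2 / n_rounds - mean * mean);
  return {mean, std::sqrt(var)};
}
```

Calling `bootstrap_quantile_sketch(data, 0.95, 100)` mirrors the 100-fold procedure described above; the second element of the result estimates how much the 0.95 quantile would vary between samples of the same size.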
Keywords:
Categories:
- core::calc::statistics::simple_statistics.hh; core::calc::statistics::Random
Input files:
Output files:
Program source:
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

#include <core/index.hh>
#include <core/calc/statistics/Random.hh>
#include <core/calc/statistics/OnlineStatistics.hh>
#include <core/calc/statistics/simple_statistics.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ap_bootstrap_quantile reads a file with real values and calculates statistics for a given quantile.
The statistics: expected quantile value and its standard deviation are computed by 100-fold bootstrap procedure.

If no input file is provided, the program calculates the statistics of a random sample
drawn from a normal distribution (mean=0.0, variance = 1.0)

USAGE:
    ap_bootstrap_quantile quantile_value infile
    ap_bootstrap_quantile quantile_value

)";

/** @brief Reads a file with real values and calculates statistics for a given quantile
 *
 * If no input file is provided, the program calculates the statistics from a random sample
 *
 * CATEGORIES: core::calc::statistics::simple_statistics.hh; core::calc::statistics::Random
 * KEYWORDS:   random numbers; statistics
 * GROUP:      Statistics;
 */
int main(const int argc, const char *argv[]) {

  if (argc == 1) utils::exit_OK_with_message(program_info);

  double quantile_level = atof(argv[1]);
  core::calc::statistics::OnlineStatistics stats;

  if (argc < 3) {
    // ---------- Use the random engine if no data is provided - for testing purposes
    size_t n_data = 10000;
    std::vector<double> data(n_data);
    core::calc::statistics::Random r = core::calc::statistics::Random::get();
    r.seed(12345);  // --- seed the generator for repeatable results
    core::calc::statistics::NormalRandomDistribution<double> normal_random;
    for (core::index4 n = 0; n < n_data; ++n) data[n] = normal_random(r);
    const auto out = core::calc::statistics::bootstrap_quantile(data, quantile_level, 100);
    std::cout << "q-level value stdev\n" << quantile_level << " " << out.first << " " << out.second << "\n";
  } else {
    std::vector<double> data;
    for (int i_file = 2; i_file < argc; ++i_file) {
      data.clear();
      std::ifstream in(argv[i_file]);
      double r;
      // --- stop at the first failed read, so the last value is not stored twice
      while (in >> r) data.push_back(r);
      in.close();
      const auto out = core::calc::statistics::bootstrap_quantile(data, quantile_level, 100);
      std::cout << "fname q-level value stdev\n" << argv[i_file] << " " << quantile_level << " "
                << out.first << " " << out.second << "\n";
    }
  }
}
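The bootstrap step itself is hidden inside the library call in the listing above. As a rough illustration of the idea, here is a minimal sketch in standard C++ only. The names `quantile` and `bootstrap_quantile_sketch` are hypothetical, and the quantile rule (the order statistic at index floor(level*(n-1)) of the sorted sample) is an assumption of this sketch; BioShell's own implementation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Empirical quantile: the order statistic at index floor(level * (n - 1)) of the sorted sample.
double quantile(std::vector<double> sample, double level) {
  assert(!sample.empty() && level >= 0.0 && level <= 1.0);
  std::sort(sample.begin(), sample.end());
  const std::size_t idx = static_cast<std::size_t>(level * (sample.size() - 1));
  return sample[idx];
}

// Returns {mean, standard deviation} of the quantile over n_rounds bootstrap resamples.
std::pair<double, double> bootstrap_quantile_sketch(const std::vector<double> &data, double level,
                                                    std::size_t n_rounds, unsigned seed) {
  assert(!data.empty() && n_rounds > 1);
  std::mt19937 gen(seed);
  std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
  std::vector<double> resample(data.size());
  double sum = 0.0, sum2 = 0.0;
  for (std::size_t r = 0; r < n_rounds; ++r) {
    for (double &x : resample) x = data[pick(gen)];  // draw with replacement
    const double q = quantile(resample, level);
    sum += q;
    sum2 += q * q;
  }
  const double mean = sum / n_rounds;
  // Sample variance of the per-round quantiles; clamped at zero against round-off.
  const double var = std::max(0.0, (sum2 - n_rounds * mean * mean) / (n_rounds - 1));
  return {mean, std::sqrt(var)};
}
```

Calling `bootstrap_quantile_sketch(data, 0.95, 100, 12345)` mirrors the 100 rounds and the fixed seed used by the program, although the random streams are not the same.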

ap_build_crystal¶
ap_build_crystal reads a given PDB file and prints all atoms in a unit cell. Each symmetry operator found in the file header produces one model in the output.
USAGE:
ap_build_crystal 5edw.pdb
Keywords:
- PDB input
- PDB line filter
- Structure
Categories:
- core/data/io/Pdb
Input files:
Output files:
Program source:
#include <iostream>
#include <iomanip>
#include <memory>

#include <core/data/io/Pdb.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ap_build_crystal reads a given PDB file and prints all atoms in a unit cell.

USAGE:
    ap_build_crystal 5edw.pdb

)";

/** @brief ap_build_crystal reads a given PDB file and prints all atoms in a unit cell.
 *
 * CATEGORIES: core/data/io/Pdb;
 * KEYWORDS:   PDB input; PDB line filter; Structure
 * GROUP:      Structure calculations;
 */
int main(const int argc, const char *argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  core::data::io::Pdb reader(argv[1],   // file name (PDB format, may be gzip-ped)
      core::data::io::keep_all,         // a predicate - now read all atoms
      core::data::io::keep_all, true);  // parse PDB header
  std::shared_ptr<core::data::io::Remark290> r290 = reader.symmetry_operators();
  core::data::structural::Structure_SP s = reader.create_structure(0);
  std::cout << "# Symmetry operators found: " << r290->count_operators() << "\n";

  core::data::basic::Vec3 tmp;
  core::index2 im = 0;
  for (const auto &rt: *r290) {         // --- one output model per symmetry operator
    std::cout << "MODEL " << std::setw(6) << ++im << "\n";
    for (auto a_it = s->first_atom(); a_it != s->last_atom(); ++a_it) {
      tmp.set((**a_it));                // --- back up the original coordinates
      rt.apply(**a_it);                 // --- transform the atom in place
      std::cout << (*a_it)->to_pdb_line() << "\n";
      (*a_it)->set(tmp);                // --- restore the original coordinates
    }
    std::cout << "ENDMDL\n";
  }
}
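Each symmetry operator applied in the listing above is a rotation matrix combined with a translation, r' = R r + t. The sketch below shows that single step in standard C++; the `Point` and `SymmetryOperator` types are hypothetical stand-ins invented for illustration and are not BioShell's `Vec3` or `Remark290` classes.

```cpp
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };

// One crystallographic symmetry operator: r' = R r + t (illustrative type only).
struct SymmetryOperator {
  double rot[3][3];  // rotation part R
  double shift[3];   // translation part t, in Angstroms

  Point apply(const Point &p) const {
    return Point{rot[0][0] * p.x + rot[0][1] * p.y + rot[0][2] * p.z + shift[0],
                 rot[1][0] * p.x + rot[1][1] * p.y + rot[1][2] * p.z + shift[1],
                 rot[2][0] * p.x + rot[2][1] * p.y + rot[2][2] * p.z + shift[2]};
  }
};

// Builds one symmetry copy of a set of atoms, leaving the input coordinates untouched.
std::vector<Point> symmetry_copy(const std::vector<Point> &atoms, const SymmetryOperator &op) {
  std::vector<Point> copy;
  copy.reserve(atoms.size());
  for (const Point &a : atoms) copy.push_back(op.apply(a));
  return copy;
}
```

Returning a new vector avoids the back-up and restore of coordinates that the program performs because it transforms atoms in place.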

ap_calc_rdf¶
ap_calc_rdf calculates the Radial Distribution Function (RDF) over a trajectory. If a multi-model PDB file is given, the program combines the data from all models.
USAGE:
ap_calc_rdf trajectory.pdb HOH O box_side
where trajectory.pdb is the input multi-model PDB file, HOH and O define the residue and the atom for which the RDF will be evaluated, and box_side is the box width in Angstroms
Keywords:
- PDB input
- simulation
Categories:
- core::data::basic::Vec3
Program source:
#include <cmath>
#include <iostream>
#include <vector>

#include <core/index.hh>
#include <core/data/io/Pdb.hh>
#include <core/data/basic/Vec3I.hh>
#include <core/calc/statistics/Histogram.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ap_calc_rdf calculates Radial Distribution Function (RDF) over a trajectory.
If a multi-model PDB file was given, the program combines the data from all models.

USAGE:
    ap_calc_rdf trajectory.pdb HOH O box_side

where trajectory.pdb is the input multi-model PDB file, HOH and O define the residue and the atom
for which the RDF will be evaluated, and box_side is the box width in Angstroms

)";

/** @brief Calculates Radial Distribution Function
 *
 * CATEGORIES: core::data::basic::Vec3
 * KEYWORDS:   PDB input; simulation
 * GROUP:      Structure calculations;
 */
int main(const int argc, const char *argv[]) {

  // --- four parameters are required, because argv[4] is read below
  if (argc < 5) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  double L = utils::from_string<double>(argv[4]); // The fourth parameter is the box width (in Angstroms)
  core::data::basic::Vec3I::set_box_len(L);

  // --- NOTE: argv[2] and argv[3] are not used in this listing; every atom read from the file contributes
  core::data::io::Pdb reader(argv[1]); // --- file name (PDB format, may be gzip-ped)
  core::index4 n_atoms = reader.count_atoms(0);
  core::index4 n_frames = reader.count_models();
  std::vector<core::data::basic::Vec3I> frame_i(n_atoms);
  core::calc::statistics::Histogram<double, core::index4> h(0.01, 0, L / 4.0);

  // ---------- Load coordinates to memory and accumulate RDF ----------
  for (core::index4 i_start = 0; i_start < n_frames; ++i_start) {
    reader.fill_structure(i_start, frame_i);
    for (core::index4 i = 1; i < n_atoms; ++i) {
      for (core::index4 j = 0; j < i; ++j) {
        double d = frame_i[i].distance_to(frame_i[j]);
        h.insert(d);
      }
    }
  }

  // ---------- Normalise the counts and print: distance, RDF value, raw count ----------
  double norm = 4 * M_PI * n_atoms / pow(L, 3.0) * n_frames;
  for (core::index4 i = 0; i < h.count_bins(); ++i) {
    std::cout << h.bin_middle_val(i) << " "
              << h.get_bin(i) / (norm * h.bin_middle_val(i) * h.bin_middle_val(i)) << " "
              << h.get_bin(i) << "\n";
  }
}
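For readers who want to see the pair-counting and normalisation steps without the library classes, the following is a minimal sketch in standard C++. It uses the minimum-image convention for a cubic periodic box and the textbook normalisation (counts divided by the number of pairs an ideal gas would place in each spherical shell), which is not necessarily the constant factor used in the listing above. `Point`, `pbc_distance` and `rdf_sketch` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };

// Minimum-image distance between two points in a cubic periodic box of side box_len.
double pbc_distance(const Point &a, const Point &b, double box_len) {
  auto wrap = [box_len](double d) { return d - box_len * std::round(d / box_len); };
  const double dx = wrap(a.x - b.x), dy = wrap(a.y - b.y), dz = wrap(a.z - b.z);
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Returns g(r) at bin centres; all frames are assumed to hold the same number of atoms.
std::vector<double> rdf_sketch(const std::vector<std::vector<Point>> &frames, double box_len,
                               double bin_width, double r_max) {
  assert(!frames.empty() && frames[0].size() >= 2);
  assert(bin_width > 0.0 && r_max <= box_len / 2.0);
  const std::size_t n_bins = static_cast<std::size_t>(r_max / bin_width);
  std::vector<double> g(n_bins, 0.0);

  // Count every unique pair (j < i) in every frame.
  for (const auto &frame : frames)
    for (std::size_t i = 1; i < frame.size(); ++i)
      for (std::size_t j = 0; j < i; ++j) {
        const double d = pbc_distance(frame[i], frame[j], box_len);
        const std::size_t bin = static_cast<std::size_t>(d / bin_width);
        if (bin < n_bins) g[bin] += 1.0;
      }

  // Divide by the pair count expected for an ideal gas in the shell [r - dr/2, r + dr/2).
  const double pi = std::acos(-1.0);
  const double n = static_cast<double>(frames[0].size());
  const double pair_density = 0.5 * n * (n - 1.0) / (box_len * box_len * box_len);
  for (std::size_t b = 0; b < n_bins; ++b) {
    const double r = (b + 0.5) * bin_width;
    g[b] /= pair_density * 4.0 * pi * r * r * bin_width * frames.size();
  }
  return g;
}
```

Limiting `r_max` to half the box side keeps the minimum-image distances unambiguous; the program above is more conservative and histograms only up to a quarter of the box width.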

ap_caonly_multimodel¶
Reads a file with names of PDB files and creates a single multimodel PDB file. Each input structure is stored as a separate model within that file. Only C-alpha atoms are written to the output file, all.pdb.
EXAMPLE:
ap_caonly_multimodel cat_list
where cat_list is a file with content such as:
2gb1-model1.pdb
2gb1-model2.pdb
2gb1-model3.pdb
2gb1-model4.pdb
Keywords:
- PDB input
- CA only
- structure selectors
- PDB output
- PDB line filter
Categories:
- core::data::io::is_ca
Input files:
Output files:
Program source:
#include <vector>
#include <string>
#include <fstream>
#include <iostream>
#include <cstring>

#include <core/data/io/Pdb.hh>
#include <utils/options/output_options.hh>
#include <utils/options/input_options.hh>
#include <utils/exit.hh>

using namespace core::data::io; // PDB is from this namespace
using namespace core::data::structural;
using namespace core::calc::structural;
using namespace utils;

std::string program_info = R"(

Reads a file with names of PDB files and creates a single multimodel PDB file.
Each input structure is stored as a separate model within that file.
Only C-alpha atoms are written to the output PDB

EXAMPLE:
    ap_caonly_multimodel cat_list

where cat_list is a file with content such as:

2gb1-model1.pdb
2gb1-model2.pdb
2gb1-model3.pdb
2gb1-model4.pdb

)";

/** @brief Reads cat_list of pdb files and creates multimodel pdb with CA only
 *
 * CATEGORIES: core::data::io::is_ca
 * KEYWORDS:   PDB input; CA only; structure selectors;PDB output; PDB line filter
 * GROUP:      File processing;Format conversion
 */
int main(const int argc, const char *argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info);

  PdbLineFilter filter = core::data::io::is_ca;   // --- keep only C-alpha atoms while reading
  std::ofstream out;
  out.open("all.pdb");
  std::ifstream pdb_list(argv[1]);
  std::string pdb_file;

  // --- The first file creates the structure object; the following files only refill its coordinates
  std::getline(pdb_list, pdb_file);
  Pdb pdb = Pdb(pdb_file, filter);
  Structure_SP strctr = pdb.create_structure(0);
  out << "MODEL 1\n";
  for (auto it = strctr->first_atom(); it != strctr->last_atom(); it++)
    out << (*it)->to_pdb_line() << "\n";
  out << "ENDMDL\n";

  core::index4 i = 2;
  while (std::getline(pdb_list, pdb_file)) {
    Pdb pdb = Pdb(pdb_file, filter);
    pdb.fill_structure(0, *strctr);
    out << utils::string_format("MODEL %6d\n", i);
    for (auto it = strctr->first_atom(); it != strctr->last_atom(); it++)
      out << (*it)->to_pdb_line() << "\n";
    out << "ENDMDL\n";
    i++;
  }
  out.close();
}

ap_contact_map_overlap¶
ap_contact_map_overlap calculates the overlap between contact maps calculated for two (or more) structures. The overlap, defined as the Jaccard coefficient, is computed between the native structure and every model found in models.pdb; map-type can take one of the following values: CA, CB and SC for C-alpha, C-beta and all-atom side chain, respectively. A contact is recorded when any selected atoms from two different residues are closer to each other than the given cutoff. If only one PDB file is given, the program calculates the overlap for every pair of models found in that file.
USAGE:
ap_contact_map_overlap map-type native.pdb models.pdb cutoff
EXAMPLE:
ap_contact_map_overlap SC 2gb1.pdb 2gb1-model1.pdb 4.5
REFERENCE: https://en.wikipedia.org/wiki/Jaccard_index
Keywords:
- PDB input
- contact map
Categories:
- core::calc::structural::ContactMap
Input files:
Output files:
Program source:
#include <cstring>
#include <iostream>
#include <vector>
#include <algorithm>

#include <core/data/io/Pdb.hh>
#include <core/calc/structural/interactions/ContactMap.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ap_contact_map_overlap calculates overlap between contact maps calculated for two (or more) structures.
The overlap, defined as Jaccard coefficient, is computed between the native structure and every model
found in models.pdb; map-type can take one of the following values: CA, CB and SC
for C-alpha, C-beta and all atom side chain, respectively. Contact is recorded when any selected atoms
from two different residues are closer to each other than the given cutoff.

If only one PDB file is given, the program calculates overlap for every pair of models found in that file

USAGE:
    ap_contact_map_overlap map-type native.pdb models.pdb cutoff

EXAMPLE:
    ap_contact_map_overlap SC 2gb1.pdb 2gb1-model1.pdb 4.5

REFERENCE:
    https://en.wikipedia.org/wiki/Jaccard_index

)";

/** @brief Calculates overlap between contact maps calculated for two (or more) structures
 *
 * CATEGORIES: core::calc::structural::ContactMap
 * KEYWORDS:   PDB input; contact map
 * GROUP:      Structure calculations;
 */
int main(const int argc, const char *argv[]) {

  if (argc < 4) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::io; // PDB is from this namespace
  using namespace core::data::structural;
  using namespace core::data::structural::selectors; // --- for all structural selectors
  using namespace core::calc::structural::interactions;

  // --- Side-chain contacts are the default; CA and CB switch both the selector and the line filter
  AtomSelector_SP selector = std::make_shared<IsSC>();
  core::data::io::PdbLineFilter filter = core::data::io::is_not_water;
  if (std::strcmp(argv[1], "CA") == 0) {
    selector = std::make_shared<IsNamedAtom>(" CA ");
    filter = core::data::io::is_ca;   // --- assign to the outer variable instead of shadowing it
  }
  if (std::strcmp(argv[1], "CB") == 0) {
    selector = std::make_shared<IsNamedAtom>(" CB ");
    filter = core::data::io::is_cb;   // --- assign to the outer variable instead of shadowing it
  }

  // The third/fourth parameter is the contact distance (in Angstroms)
  double cutoff = utils::from_string<double>(argv[(argc == 5) ? 4 : 3]);

  if (argc == 5) {
    // --- This is the case when user gave a reference structure (e.g. the native one)
    Pdb pdb_native = Pdb(argv[2], filter);
    core::data::structural::Structure_SP reference_structure = pdb_native.create_structure(0);
    ContactMap_SP reference_map = std::make_shared<ContactMap>(*reference_structure, cutoff, selector);
    Pdb models_pdb = Pdb(argv[3], filter);
    for (core::index4 i = 0; i < models_pdb.count_models(); ++i) {
      models_pdb.fill_structure(i, *reference_structure);
      ContactMap cmap(*reference_structure, cutoff, selector);
      std::cout << i << " " << reference_map->jaccard_overlap_coefficient(cmap) << "\n";
    }
  } else {
    // --- Only one file was given: compare every pair of its models
    Pdb models_pdb = Pdb(argv[2], filter);
    core::data::structural::Structure_SP structure = models_pdb.create_structure(0);
    std::vector<std::vector<std::pair<core::index2, core::index2>>> models(models_pdb.count_models());
    for (core::index4 i = 0; i < models_pdb.count_models(); ++i) {
      models_pdb.fill_structure(i, *structure);
      ContactMap cmap(*structure, cutoff, selector);
      cmap.nonempty_indexes(models[i]);
    }
    for (core::index4 i = 1; i < models.size(); ++i) {
      for (core::index4 j = 0; j < i; ++j)
        std::cout << i << " " << j << " "
                  << core::calc::structural::interactions::jaccard_overlap_coefficient(models[i], models[j]) << "\n";
    }
  }
}
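The Jaccard coefficient used as the overlap measure is the size of the intersection of two contact sets divided by the size of their union. A minimal sketch in standard C++ follows; `Contact` and `jaccard_overlap` are hypothetical names, and returning 1.0 for two empty maps is a convention chosen for this sketch, not something the source specifies.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

// A contact is a pair of residue indexes (i, j); the sets are kept sorted by std::set.
using Contact = std::pair<unsigned, unsigned>;

// Jaccard coefficient |A intersect B| / |A union B| of two contact sets.
double jaccard_overlap(const std::set<Contact> &a, const std::set<Contact> &b) {
  if (a.empty() && b.empty()) return 1.0;  // convention of this sketch: two empty maps are identical
  std::vector<Contact> common;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(common));
  const std::size_t n_union = a.size() + b.size() - common.size();
  return static_cast<double>(common.size()) / static_cast<double>(n_union);
}
```

The value ranges from 0 (no shared contacts) to 1 (identical contact maps), so it can be compared across structures of different sizes.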

ap_crmsd_on_common_subset¶
Calculates the cRMSD (coordinate Root-Mean-Square Deviation) value on a common subset of atoms. The program matches atoms by chain, residue and atom name, which ensures that both structures contribute the same number of atoms.
USAGE:
./ap_crmsd_on_common_subset -in:pdb:native=file.pdb -select:chains=A -select:bb -in:pdb=rebuilt.pdb
EXAMPLES:
./ap_crmsd_on_common_subset -in:pdb:native=6h60.pdb -select:chains=A -select:bb -in:pdb=6h60_A_rebuilt.pdb
REFERENCE: Kabsch, W. “A Solution for the Best Rotation to Relate Two Sets of Vectors.” Acta Cryst (1976) 32 922-923
Keywords:
- PDB input
- crmsd
Categories:
- core/calc/structural/transformations/Crmsd
Input files:
Program source:
#include <algorithm>
#include <fstream>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

#include <core/data/io/Pdb.hh>
#include <core/calc/structural/transformations/Crmsd.hh>
#include <utils/options/Option.hh>
#include <utils/options/OptionParser.hh>
#include <utils/options/input_options.hh>
#include <utils/options/structures_from_cmdline.hh>
#include <utils/options/select_options.hh>
#include <utils/options/selector_from_cmdline.hh>
#include <utils/exit.hh>
#include <core/algorithms/basic_algorithms.hh>
#include <utils/options/output_options.hh>
#include <core/data/structural/selectors/SelectChainBreaks.hh>

std::string program_info = R"(

Calculates cRMSD (coordinate Root-Mean-Square Deviation) value on common subset of atoms.
Program ensures the same number of atoms.

USAGE:
    ./ap_crmsd_on_common_subset -in:pdb:native=file.pdb -select:chains=A -select:bb -in:pdb=rebuilt.pdb

EXAMPLES:
    ./ap_crmsd_on_common_subset -in:pdb:native=6h60.pdb -select:chains=A -select:bb -in:pdb=6h60_A_rebuilt.pdb

REFERENCE:
Kabsch, W. "A Solution for the Best Rotation to Relate Two Sets of Vectors."
Acta Cryst (1976) 32 922-923

)";

/// Stores every atom of a structure under a key built from chain ID, residue name, residue ID and atom name
void extract_atom_by_name(const core::data::structural::Structure_SP s, std::vector<std::string> & atom_names,
    std::map<std::string, core::data::structural::PdbAtom_SP> & atoms_by_name,
    bool aa_only = true, bool skip_chainbreaks = true) {

  using namespace core::data::structural;

  // --- select only amino acids
  selectors::IsAA is_aa;
  // --- remove atoms at chain breaks
  selectors::ProperlyConnectedCA at_gap;

  for (auto c: *s) {
    for (auto r: *c) {
      if (r == nullptr) continue;
      if (aa_only && !is_aa(*r)) continue;
      if (skip_chainbreaks && !at_gap(*r)) continue;
      for (auto a: (*r)) {
        std::string code = c->id() + (*r).residue_type().code3 + (*r).residue_id() + a->atom_name();
        atom_names.push_back(code);
        atoms_by_name[code] = a;
      }
    }
  }
}

/** @brief Calculates crmsd value on the subset of atoms that a native and a model structure have in common.
 *
 * CATEGORIES: core/calc/structural/transformations/Crmsd
 * KEYWORDS:   PDB input; crmsd
 * GROUP:      Structure calculations;
 */
int main(const int argc, const char *argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using core::data::basic::Vec3;
  using namespace core::calc::structural::transformations;
  using namespace core::data::io;
  using namespace utils::options;
  using namespace core::data::basic;
  using namespace core::data::structural;
  using namespace core::data::structural::selectors;
  using namespace core::calc::structural;

  Crmsd<std::vector<Vec3>, std::vector<Vec3>> rms;

  utils::options::OptionParser &cmd = OptionParser::get("ap_crmsd_on_common_subset");
  // ---------- input PDB structures
  cmd.register_option(input_pdb, input_pdb_native, input_pdb_list, input_pdb_path, input_pdb_header);
  // ---------- selecting options
  cmd.register_option(select_ca, select_bb, select_bb_cb, select_cb, select_atoms_by_name, select_aa,
      select_chains, all_models);
  // ---------- output PDB structures
  cmd.register_option(output_pdb, output_name_prefix);
  cmd.register_option(verbose, db_path);

  if (!cmd.parse_cmdline(argc, argv)) return 1;

  Structure_SP native = native_from_cmdline();
  std::vector<core::data::structural::Structure_SP> structures;
  std::vector<std::string> structure_ids;
  utils::options::structures_from_cmdline(structure_ids, structures);

  // ---------- Load atoms from the native structure
  std::vector<std::string> native_names;
  std::map<std::string, PdbAtom_SP> native_name_to_atom;
  extract_atom_by_name(native, native_names, native_name_to_atom, select_aa.was_used());
  std::sort(native_names.begin(), native_names.end());

  std::vector<Vec3> q, t;
  std::cout << "native nres, atoms " << native->count_residues() << " " << native->count_atoms() << "\n";
  std::cout << "model nres, atoms " << structures[0]->count_residues() << " " << structures[0]->count_atoms() << "\n";

  for (size_t i = 0; i < structures.size(); ++i) {

    // ---------- Load atoms from a query structure
    std::vector<std::string> q_names;
    std::map<std::string, PdbAtom_SP> q_name_to_atom;
    extract_atom_by_name(structures[i], q_names, q_name_to_atom, select_aa.was_used());
    std::sort(q_names.begin(), q_names.end());

    // ---------- find the common subset
    std::vector<std::string> common_atom_names;
    core::algorithms::intersect_sorted(native_names.begin(), native_names.end(),
        q_names.begin(), q_names.end(), common_atom_names);

    // ---------- get the two subsets of atoms
    q.clear();
    t.clear();
    std::shared_ptr<std::vector<double>> errors = std::make_shared<std::vector<double>>();
    for (const std::string &name: common_atom_names) {
      t.push_back(*q_name_to_atom[name]);
      q.push_back(*native_name_to_atom[name]);
    }
    double rms_val = rms.crmsd(q, t, q.size(), true);
    rms.calculate_crmsd_value(q, t, q.size(), errors);

    // ---------- This is the moment when we can dump the transformed structure into a PDB file
    if (output_pdb.was_used()) {
      int iatm = -1;
      std::string fname;
      if (input_pdb_list.was_used())
        fname = option_value<std::string>(output_name_prefix) + utils::split(structure_ids[i], {'/'}).back() + ".pdb";
      else
        fname = option_value<std::string>(output_pdb);
      std::ofstream out(fname);
      for (const std::string &name: common_atom_names) {
        q_name_to_atom[name]->b_factor((*errors)[++iatm]);   // --- per-atom error goes to the B-factor column
        out << q_name_to_atom[name]->to_pdb_line() << "\n";
      }
    }
    std::cout << native->code() << " " << i << " crmsd: " << rms_val << "\n";
  }
}
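The key idea in the listing above is to label every atom with a unique name, sort the labels, and intersect the two sorted lists so that both structures contribute exactly the same atoms. The sketch below shows that step with standard containers only. It deliberately leaves out the optimal superposition (the Kabsch fit) that a cRMSD calculation performs, so `rmsd_no_fit` is a plain RMSD over already aligned coordinates; all names and the key format are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <iterator>
#include <map>
#include <string>
#include <vector>

struct Point { double x, y, z; };

// Atoms indexed by a unique label, e.g. chain + residue + atom name (the format is illustrative).
using AtomMap = std::map<std::string, Point>;

// Labels present in both structures; std::map iterates in sorted key order,
// which is the precondition of std::set_intersection.
std::vector<std::string> common_keys(const AtomMap &a, const AtomMap &b) {
  std::vector<std::string> ka, kb, out;
  for (const auto &kv : a) ka.push_back(kv.first);
  for (const auto &kv : b) kb.push_back(kv.first);
  std::set_intersection(ka.begin(), ka.end(), kb.begin(), kb.end(), std::back_inserter(out));
  return out;
}

// RMSD over the common atoms WITHOUT superposition (no Kabsch fit is done here).
double rmsd_no_fit(const AtomMap &a, const AtomMap &b) {
  const std::vector<std::string> keys = common_keys(a, b);
  assert(!keys.empty());
  double sum = 0.0;
  for (const std::string &k : keys) {
    const Point &p = a.at(k);
    const Point &q = b.at(k);
    sum += (p.x - q.x) * (p.x - q.x) + (p.y - q.y) * (p.y - q.y) + (p.z - q.z) * (p.z - q.z);
  }
  return std::sqrt(sum / keys.size());
}
```

Atoms missing from either structure simply drop out of the intersection, which is why a rebuilt model with a few absent side-chain atoms can still be compared against its native structure.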

ap_docking_crmsd¶
ap_docking_crmsd calculates crmsd between ligand positions after flexible docking to a receptor. The program reads in a native pose and at least one PDB file with a computed pose (i.e. a model); each of them must contain a ligand molecule bound to a protein receptor. The ligand can be a small molecule, a peptide or even a protein. The program finds a small-molecule ligand by residue ID (a three-letter code, such as CAM). Peptide ligands (or proteins) are identified by a chain ID (a single-letter code, such as X). If the reference structure is not given and a dash ‘-’ character is used instead (as in the last example), the program evaluates pairwise all-vs-all cRMSD calculations.
The output provides:
- ligand name (and possibly model ID)
- crmsd on receptor
- no. of atoms of a receptor
- crmsd on a ligand
- no. of atoms of a ligand
USAGE:
./ap_docking_crmsd reference.pdb ligand_def model.pdb [model2.pdb ...]
SEE ALSO:
ap_ligand_clustering - for clustering of ligand docking poses
ap_stiff_docking_crmsd - for rigid docking crmsd calculations
EXAMPLES:
./ap_docking_crmsd 2m56-ref.pdb CAM 00199.pdb 00963.pdb 04473.pdb
./ap_docking_crmsd 2kwi-1.pdb B 2kwi.pdb
./ap_docking_crmsd - B 2kwi.pdb
where 2m56-ref.pdb is the native structure, CAM is the three-letter PDB code of the ligand for which crmsd will be evaluated, and 00199.pdb and the two other files are conformations after docking. In the second and third examples, B is the ID of the chain containing a ligand peptide.
Keywords:
- PDB input
- crmsd
- docking
- structure selectors
Categories:
- core::protocols::PairwiseCrmsd
Input files:
Output files:
Program source:
#include <iostream>
#include <map>
#include <memory>
#include <string>

#include <core/data/io/Pdb.hh>
#include <core/protocols/PairwiseCrmsd.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ap_docking_crmsd calculates crmsd between ligand positions after flexible docking to a receptor.

The program reads in a native pose and at least one PDB file with a computed pose (i.e. a model),
each of them must contain a ligand molecule bound to a protein receptor. The ligand can be a small molecule,
peptide or even a protein. The program finds a small-molecule ligand by residue ID (a three-letter code, such as CAM)
Peptide ligands (or proteins) are identified by a chain ID (a single letter code, such as X).

If the reference structure is not given and dash '-' character is used instead (as in the last example),
the program evaluates pairwise all-vs-all cRMSD calculations.

The output provides:
    - ligand name (and possibly model ID)
    - crmsd on receptor
    - no. of atoms of a receptor
    - crmsd on a ligand
    - no. of atoms of a ligand

USAGE:
    ./ap_docking_crmsd reference.pdb ligand_def model.pdb [model2.pdb ...]

SEE ALSO:
    ap_ligand_clustering - for clustering of ligand docking poses
    ap_stiff_docking_crmsd - for a rigid docking crmsd calculations

EXAMPLES:
    ./ap_docking_crmsd 2m56-ref.pdb CAM 00199.pdb 00963.pdb 04473.pdb
    ./ap_docking_crmsd 2kwi-1.pdb B 2kwi.pdb
    ./ap_docking_crmsd - B 2kwi.pdb

where 2m56-ref.pdb is the native and CAM is the three-letter PDB code of the ligand for which crmsd will be evaluated
and 00199.pdb and the two other files are conformations after docking. In the second and third examples,
B is the ID of the chain containing a ligand peptide.

)";

using namespace core::data::structural;

/** @brief ap_docking_crmsd calculates crmsd between two ligand positions after docking to a receptor.
 *
 * CATEGORIES: core::protocols::PairwiseCrmsd
 * KEYWORDS:   PDB input; crmsd; docking; structure selectors
 * GROUP:      Structure calculations; Docking;
 */
int main(const int argc, const char* argv[]) {

  using namespace core::data::structural::selectors;

  if (argc < 4) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  const std::string code(argv[2]);            // --- The ligand code is the second parameter of the program
  AtomSelector_SP select_ligand = nullptr;    // --- Ligand selector object
  if (code.size() == 3)                       // --- If the code is 3 characters long, it's a residue code
    select_ligand = std::static_pointer_cast<AtomSelector>(std::make_shared<SelectResidueByName>(code));
  else {                                      // --- otherwise it is a chain ID; use C-alpha atoms of that chain
    AtomSelector_SP select_chain = std::static_pointer_cast<AtomSelector>(std::make_shared<ChainSelector>(code[0]));
    std::shared_ptr<LogicalANDSelector> select_chain_ca = std::make_shared<LogicalANDSelector>();
    select_chain_ca->add_selector(std::make_shared<IsCA>());
    select_chain_ca->add_selector(select_chain);
    select_ligand = select_chain_ca;
  }

  // --- Receptor selector object: C-alpha atoms that do not belong to the ligand
  std::shared_ptr<LogicalANDSelector> select_receptor = std::make_shared<LogicalANDSelector>();
  AtomSelector_SP select_not_ligand =
      std::static_pointer_cast<AtomSelector>(std::make_shared<InverseAtomSelector>(*select_ligand));
  select_receptor->add_selector(std::make_shared<IsCA>());
  select_receptor->add_selector(select_not_ligand);

  core::protocols::PairwiseCrmsd crmsd_protocol(select_receptor, select_ligand);

  // --- Read all models; every model of a multi-model file becomes a separate input structure
  for (core::index2 i = 3; i < argc; ++i) {
    core::data::io::Pdb reader(argv[i], core::data::io::is_not_hydrogen);
    if (reader.count_models() > 1) {
      for (core::index2 j = 0; j < reader.count_models(); ++j)
        crmsd_protocol.add_input_structure(reader.create_structure(j), utils::string_format("%s:%4d", argv[i], j));
    } else
      crmsd_protocol.add_input_structure(reader.create_structure(0), argv[i]);
  }

  crmsd_protocol.crmsd_cutoff(50.0); // crmsd cutoff large enough to get some output
  // --- std::cout wrapped in a shared pointer with a no-op deleter, so it is never deleted
  crmsd_protocol.output_stream(std::shared_ptr<std::ostream>(&std::cout, [](void*) {}));

  if (std::string(argv[1]) != "-") {
    core::data::io::Pdb reader(argv[1], core::data::io::is_not_hydrogen);
    Structure_SP native = reader.create_structure(0);
    crmsd_protocol.calculate(native);
  } else
    crmsd_protocol.calculate();
}

ap_download_pdb¶
A simple app that downloads a requested PDB file from the RCSB website; it expects the four-character PDB code of the deposit. The file is saved in the current directory as PDB_code.pdb.
USAGE:
ap_download_pdb PDB_code
EXAMPLE:
ap_download_pdb 2gb1
Keywords:
- PDB file
- download
Categories:
- utils/read_properties_file
Output files:
Program source:
#include <fstream>
#include <iostream>
#include <string>

#include <utils/exit.hh>
#include <utils/io_utils.hh>

std::string program_info = R"(

Simple app downloads a requested pdb file from RCSB website; it expects a four-letter PDB code of the deposit

USAGE:
    ap_download_pdb PDB_code

EXAMPLE:
    ap_download_pdb 2gb1

)";

/** @brief Simple app downloads a pdb file from RCSB website
 *
 * CATEGORIES: utils/read_properties_file
 * KEYWORDS:   PDB file; download
 * GROUP:      File processing
 */
int main(const int argc, const char* argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  std::ofstream out(std::string(argv[1]) + ".pdb");   // --- output file name: <PDB_code>.pdb
  out << utils::download_pdb(argv[1]);
  out.close();
}

ap_dssp¶
Detects secondary structure using BioShell’s implementation of the DSSP algorithm.
USAGE:
ap_dssp input.pdb
EXAMPLE:
ap_dssp 5edw.pdb
REFERENCE: Kabsch, Wolfgang, and Christian Sander. “Dictionary of protein secondary structure: pattern recognition of hydrogen‐bonded and geometrical features.” Biopolymers 22 (1983): 2577-2637. doi:10.1002/bip.360221211
Keywords:
- PDB input
- Hydrogen bonds
- secondary structure
- DSSP
- Protein structure features
Categories:
- core::calc::structural::ProteinArchitecture
Input files:
Output files:
Program source:
#include <iostream>

#include <core/data/io/Pdb.hh>
#include <core/calc/structural/ProteinArchitecture.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Detects secondary structure using BioShell's implementation of the DSSP algorithm.

USAGE:
    ap_dssp input.pdb

EXAMPLE:
    ap_dssp 5edw.pdb

REFERENCE:
Kabsch, Wolfgang, and Christian Sander.
"Dictionary of protein secondary structure: pattern recognition of hydrogen‐bonded and geometrical features."
Biopolymers 22 (1983): 2577-2637. doi:10.1002/bip.360221211

)";

/** @brief DSSP implementation
 *
 * CATEGORIES: core::calc::structural::ProteinArchitecture;
 * KEYWORDS:   PDB input; Hydrogen bonds; secondary structure; DSSP; Protein structure features
 * GROUP:      Structure calculations;
 */
int main(const int argc, const char* argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  core::data::io::Pdb reader(argv[1]); // file name (PDB format, may be gzip-ped)
  core::data::structural::Structure_SP strctr = reader.create_structure(0);

  core::calc::structural::ProteinArchitecture pa(*strctr);
  std::cout << pa.hec_string() << "\n";
}

ap_dssp_to_ss2¶
Reads a DSSP file and writes secondary structure in SS2 format. To convert DSSP to FASTA format use ap_DsspData
EXAMPLE:
ap_dssp_to_ss2 5edw.dssp
Keywords:
- DSSP
- Structure
- secondary structure
- Format conversion
Categories:
- core::data::io::DsspData
Input files:
Output files:
Program source:
#include <iostream>

#include <core/data/io/ss2_io.hh>
#include <core/data/io/DsspData.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Reads a DSSP file and writes secondary structure in SS2 format.
To convert DSSP to FASTA format use ap_DsspData

EXAMPLE:
    ap_dssp_to_ss2 5edw.dssp

)";

/** @brief Reads a DSSP file and prints the secondary structure of each chain in SS2 format.
 *
 * @see ap_DsspData.cc converts DSSP to FASTA format
 * CATEGORIES: core::data::io::DsspData
 * KEYWORDS:   DSSP; Structure; secondary structure; Format conversion
 * GROUP:      File processing; Format conversion
 */
int main(const int argc, const char* argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  // --- read a DSSP file - the first command line argument of the program
  core::data::io::DsspData dssp(argv[1], true);
  for (const auto & ss2 : dssp.sequences())       // --- for each protein sequence found in the DSSP data ...
    core::data::io::write_ss2(*ss2, std::cout);   // --- print it as SS2!
}

ap_filter_fasta¶
ap_filter_fasta reads a file in FASTA format and prints only those sequences which satisfy the following filters:
- the sequence must be a protein
- the sequence must not be shorter than 20 aa
- the sequence must contain at most 10 UNK (X) residues
The output sequences are sorted by length.
USAGE:
ap_filter_fasta input.fasta [input2.fasta ...]
EXAMPLE:
ap_filter_fasta ferrodoxins.fasta
Keywords:
- FASTA input
- FASTA output
- sequence
- FASTA
- pre-processing
- sequence filters
Categories:
- core/data/io/fasta_io
Input files:
Output files:
Program source:
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

#include <core/algorithms/UnionFind.hh>
#include <core/data/io/fasta_io.hh>
#include <utils/exit.hh>
#include <utils/io_utils.hh>

std::string program_info = R"(

ap_filter_fasta reads a file in FASTA format and prints only these sequences
which satisfy the following filters:

    - sequence must be a protein
    - sequence must not be shorter than 20 aa
    - sequence must contain at most 10 UNK residues

The output sequences are sorted.

USAGE:
    ap_filter_fasta input.fasta [input2.fasta ...]

EXAMPLE:
    ap_filter_fasta ferrodoxins.fasta

)";

/** @brief This program reads a file with sequences in FASTA format and sorts them by length.
 * DNA sequences, sequences that are shorter than 20 residues and those having more than 10 Xs are removed
 *
 * CATEGORIES: core/data/io/fasta_io;
 * KEYWORDS:   FASTA input; FASTA output; sequence; FASTA; pre-processing; sequence filters
 * GROUP:      File processing;Data filtering
 */
int main(const int argc, const char* argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  // --- Sequence_SP is just a std::shared_ptr to core::data::sequence::Sequence type
  using core::data::sequence::Sequence_SP;
  // --- Create a container where the sequences will be stored
  std::vector<Sequence_SP> sequences;

  // --- Read a file (or files) with FASTA sequences; sequences are appended to the given vector
  for (int i = 1; i < argc; ++i)
    core::data::io::read_fasta_file(argv[i], sequences);

  // --- Remove sequences that do not come from proteins
  sequences.erase(std::remove_if(sequences.begin(), sequences.end(),
      [](const Sequence_SP s) { return !s->is_protein_sequence; }), sequences.end());

  // --- Remove sequences that are too short
  sequences.erase(std::remove_if(sequences.begin(), sequences.end(),
      [](const Sequence_SP s) { return s->length() < 20; }), sequences.end());

  // --- Remove sequences that contain more than 10 'X' characters (i.e. unknown amino acids)
  sequences.erase(std::remove_if(sequences.begin(), sequences.end(),
      [](const Sequence_SP s) { return std::count(s->sequence.begin(), s->sequence.end(), 'X') > 10; }),
      sequences.end());

  // --- Nothing to print if every sequence has been filtered out
  if (sequences.empty()) return 0;

  // --- Now, sort the sequences by length
  std::sort(sequences.begin(), sequences.end(),
      [](const Sequence_SP si, const Sequence_SP sj) { return si->length() < sj->length(); });

  // --- Remove duplicates: a sequence contained in another one lands in the same set
  core::algorithms::UnionFind<Sequence_SP, core::index4> uf;
  uf.add_element(sequences[0]);
  for (size_t i = 1; i < sequences.size(); ++i) {
    uf.add_element(sequences[i]);
    for (int j = i - 1; j >= 0; --j) {
      if (sequences[i]->length() - sequences[j]->length() > 20) break;
      if (sequences[i]->sequence.find(sequences[j]->sequence) != std::string::npos)
        uf.union_set(i, j);
    }
  }

  // --- The representative of each set collects the headers of all its members
  for (size_t i = 0; i < sequences.size(); ++i) {
    core::index4 set_id = uf.find_set(i);
    if (set_id != i) {
      std::string new_header = sequences[set_id]->header() + " " + sequences[i]->header();
      sequences[set_id]->header(new_header);
    }
  }

  // --- Print sequences to stdout
  for (size_t i = 0; i < sequences.size(); ++i) {
    if (uf.find_set(i) == i)
      std::cout << core::data::io::create_fasta_string(*sequences[i]) << "\n";
  }
}
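The length and unknown-residue filters in the listing rely on the erase-remove idiom followed by a sort. A minimal sketch of the same pattern on plain strings is shown here; `FastaEntry` and `filter_entries` are hypothetical names, and the protein check and the duplicate merging done by the program are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct FastaEntry {
  std::string header;
  std::string sequence;
};

// Keeps entries with at least min_len residues and at most max_unknown 'X' characters,
// then orders the survivors from the shortest to the longest.
std::vector<FastaEntry> filter_entries(std::vector<FastaEntry> entries, std::size_t min_len,
                                       std::size_t max_unknown) {
  entries.erase(std::remove_if(entries.begin(), entries.end(),
                               [=](const FastaEntry &e) {
                                 const auto n_x = std::count(e.sequence.begin(), e.sequence.end(), 'X');
                                 return e.sequence.size() < min_len ||
                                        static_cast<std::size_t>(n_x) > max_unknown;
                               }),
                entries.end());
  // stable_sort keeps the input order among sequences of equal length
  std::stable_sort(entries.begin(), entries.end(), [](const FastaEntry &a, const FastaEntry &b) {
    return a.sequence.size() < b.sequence.size();
  });
  return entries;
}
```

With the thresholds quoted in the program description this would be called as `filter_entries(entries, 20, 10)`.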

ap_filter_msa¶
Reads a Multiple Sequence Alignment (MSA) in ClustalW or FASTA format and removes those sequences that produce highly gapped columns. The filter first identifies Highly Gapped Columns (HGCs): MSA columns that contain at most HG-fraction*N_SEQ letters, with gaps at all remaining positions. Then every sequence that has a letter in at least sum_gap highly gapped columns is removed.
USAGE:
./ap_filter_msa msa-file HG-fraction sum_gap
EXAMPLE:
./ap_filter_msa cyped.CYP109.aln 0.01 10
where cyped.CYP109.aln is the name of the input MSA file and 0.01 means that a column counts as highly gapped when at least 99% of the sequences have a gap there and at most 1% have a letter. Finally, 10 means that sequences that participate in at least 10 HGCs are removed from the input MSA.
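The two-step filter described above can be sketched without any BioShell classes. The function below is only an illustration under stated assumptions: the name `filter_by_gapped_columns` is hypothetical, the alignment is a plain vector of equal-length strings, and `-` is assumed to be the only gap character. It is not the code of `core::alignment::FilterByHighlyGappedColumns`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of BioShell): returns the indices of the
// sequences that survive the highly-gapped-column filter.
//   msa         - aligned sequences of equal length, '-' marks a gap
//   hg_fraction - a column is highly gapped when it holds at most
//                 hg_fraction * N_SEQ letters
//   sum_gap     - a sequence is removed when it has a letter in at least
//                 sum_gap highly gapped columns
std::vector<std::size_t> filter_by_gapped_columns(const std::vector<std::string>& msa,
                                                  double hg_fraction,
                                                  std::size_t sum_gap) {
  std::vector<std::size_t> kept;
  if (msa.empty()) return kept;
  const std::size_t n_seq = msa.size();
  const std::size_t n_cols = msa[0].size();
  for (const std::string& s : msa) assert(s.size() == n_cols);  // rows must be aligned

  // hits[i] counts the highly gapped columns in which sequence i has a letter
  std::vector<std::size_t> hits(n_seq, 0);
  for (std::size_t c = 0; c < n_cols; ++c) {
    std::size_t n_letters = 0;
    for (const std::string& s : msa)
      if (s[c] != '-') ++n_letters;
    if (n_letters > hg_fraction * n_seq) continue;  // column is not highly gapped
    for (std::size_t i = 0; i < n_seq; ++i)
      if (msa[i][c] != '-') ++hits[i];
  }
  for (std::size_t i = 0; i < n_seq; ++i)
    if (hits[i] < sum_gap) kept.push_back(i);
  return kept;
}
```

With four sequences and a fraction of 0.25, a column is highly gapped when only one sequence has a letter there, so a sequence carrying two such private insertions is dropped at `sum_gap = 2`.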
Keywords:
Categories:
- core::alignment::MSAColumnConservation
Input files:
Output files:
Program source:
    #include <cstdlib>
    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>

    #include <core/alignment/FilterByHighlyGappedColumns.hh>
    #include <core/data/io/clustalw_io.hh>
    #include <core/data/io/fasta_io.hh>
    #include <utils/exit.hh>
    #include <utils/io_utils.hh>

    std::string program_info = R"(

    Reads a Multiple Sequence Alignment (MSA) in ClustalW or FASTA format and removes those sequences
    that produce highly gapped columns. The filter first identifies Highly Gapped Columns (HGCs):
    MSA columns that contain at most HG-fraction*N_SEQ letters, with gaps at all remaining positions.
    Then every sequence that has a letter in at least sum_gap highly gapped columns is removed.

    USAGE:
        ./ap_filter_msa msa-file HG-fraction sum_gap

    EXAMPLE:
        ./ap_filter_msa cyped.CYP109.aln 0.01 10

    where cyped.CYP109.aln is the name of the input MSA file and 0.01 means that a column counts as
    highly gapped when at least 99% of the sequences have a gap there and at most 1% have a letter.
    Finally, 10 means that sequences that participate in at least 10 HGCs are removed from the input MSA.

    )";

    /** @brief Reads a MSA in ClustalW or FASTA format and removes those sequences that produce
     * highly gapped columns
     *
     * CATEGORIES: core::alignment::MSAColumnConservation
     * KEYWORDS:   clustal input; MSA; FASTA input
     * GROUP: Alignments
     */
    int main(const int argc, const char* argv[]) {

      if(argc < 4) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

      using namespace core::data::io;
      using namespace core::data::sequence;

      double f_nnc = atof(argv[2]);           // --- HG-fraction
      core::index2 sum_gap = atoi(argv[3]);   // --- minimum number of HGCs that removes a sequence
      std::vector<Sequence_SP> msa; // --- Sequence_SP is just a shorter name for std::shared_ptr<Sequence>

      // --- Pick the reader by file extension; anything that is not FASTA is read as ClustalW
      const std::pair<std::string, std::string> name_ext = utils::root_extension(argv[1]);
      if((name_ext.second=="fasta")||(name_ext.second=="FASTA")||(name_ext.second=="fast"))
        core::data::io::read_fasta_file(argv[1], msa, true);
      else
        core::data::io::read_clustalw_file(argv[1],msa);

      core::alignment::FilterByHighlyGappedColumns filter{msa};
      filter.f_non_gapped(f_nnc);
      core::index2 n_seq = msa.size();
      filter.run(sum_gap);
      std::cout << "# " << filter.msa().size() << " sequences remained, " << (n_seq - filter.msa().size())
                << " removed\n";
      std::cout << "# " << filter.highly_gapped_positions().size() << " highly gapped positions found\n";
      for (const core::data::sequence::Sequence_SP &seq:filter.msa())
        std::cout << core::data::io::create_fasta_string(*seq);

      // --- For every input sequence print the number of highly gapped columns it participates in
      int i_seq = -1;
      for (core::index2 cnt:filter.highly_gapped_for_sequence())
        std::cout << "# " << (++i_seq) << " " << cnt << "\n";
    }

ap_filter_pdb¶
Shows the concept of PDB line filters in BioShell: creates a PDB reader which accepts only desired atoms/groups. The filter used by this example to read PDB file is created based on filter names (space separated). For each string a distinct filter will be created; all filters will be joined with logical AND operation i.e. all must be true to read in a PDB line. Therefore the last example below will return an empty set of atoms because the two filters it uses are contradictory.
USAGE:
ex_filter_pdb 5edw.pdb filter-names
EXAMPLES:
ex_filter_pdb 5edw.pdb is_standard_atom
ex_filter_pdb 5edw.pdb is_bb is_cb
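The AND-combination of named filters can be illustrated with a small standalone sketch. Everything in it is hypothetical: `AtomRecord`, `known_filters` and `combine_with_and` are illustration names, and the atom-name tests only approximate what filters such as `is_bb` or `is_cb` might check. It shows why two mutually exclusive filters select nothing.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one ATOM line; only the atom name is used here.
struct AtomRecord { std::string name; };

using AtomFilter = std::function<bool(const AtomRecord&)>;

// Registry of filters addressed by name (names borrowed from the examples above;
// the definitions are assumptions made for this sketch).
const std::map<std::string, AtomFilter>& known_filters() {
  static const std::map<std::string, AtomFilter> filters = {
    {"is_bb", [](const AtomRecord& a) {
       return a.name == "N" || a.name == "CA" || a.name == "C" || a.name == "O"; }},
    {"is_ca", [](const AtomRecord& a) { return a.name == "CA"; }},
    {"is_cb", [](const AtomRecord& a) { return a.name == "CB"; }},
  };
  return filters;
}

// Builds one filter from space-separated names; an atom passes only if
// every named filter accepts it (logical AND).
AtomFilter combine_with_and(const std::string& names) {
  std::vector<AtomFilter> selected;
  std::istringstream in(names);
  std::string name;
  while (in >> name) {
    auto it = known_filters().find(name);
    assert(it != known_filters().end());  // unknown filter name
    selected.push_back(it->second);
  }
  return [selected](const AtomRecord& a) {
    for (const AtomFilter& f : selected)
      if (!f(a)) return false;
    return true;
  };
}
```

In this sketch `combine_with_and("is_bb is_cb")` rejects every atom, because no atom is both a backbone atom and a C-beta, which mirrors the empty result of the last example above.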
Keywords:
Categories:
- core::data::io::Pdb
Input files:
Output files:
Program source:
    #include <iostream>
    #include <string>

    #include <core/data/io/Pdb.hh>
    #include <utils/exit.hh>

    std::string program_info = R"(

    Shows the concept of PDB line filters in BioShell: creates a PDB reader which accepts
    only desired atoms/groups.

    The filter used by this example to read PDB file is created based on filter names (space separated).
    For each string a distinct filter will be created; all filters will be joined with logical AND operation
    i.e. all must be true to read in a PDB line. Therefore the last example below will return an empty set of atoms
    because the two filters it uses are contradictory.

    USAGE:
        ex_filter_pdb 5edw.pdb filter-names

    EXAMPLES:
        ex_filter_pdb 5edw.pdb is_standard_atom
        ex_filter_pdb 5edw.pdb is_bb is_cb

    )";

    /** @brief Filters a PDB file by a given filter
     *
     * CATEGORIES: core::data::io::Pdb;
     * KEYWORDS:   PDB input; PDB line filter
     */
    int main(const int argc, const char* argv[]) {

      if(argc < 3) {
        // --- append the list of known filter names to the help message
        program_info += "\nKnown filters:\n";
        for (const std::string &name: core::data::io::Pdb::pdb_filter_names) program_info += "\t" + name;
        utils::exit_OK_with_message(program_info); // --- complain about missing program parameter
      }

      // --- join all filter names given at the command line into a single space-separated string
      std::string filter_names(argv[2]);
      for (int i = 3; i < argc; ++i) filter_names += " " + std::string(argv[i]);

      core::data::io::Pdb reader(argv[1], // file name (PDB format, may be gzip-ped)
          filter_names,                   // filter names combined into a single string
          false);                         // don't parse header to achieve highest speed

      for (int im = 0; im < reader.count_models(); ++im)
        for (const core::data::io::Atom &pdb_line : (*reader.atoms[im]))
          std::cout << pdb_line.to_pdb_line() << "\n";
    }

ap_find_in_fasta¶
Program reads a sequence database in FASTA format and a text file with sequence identifiers, and prints the requested sequences on the screen.
USAGE:
ap_find_in_fasta input.fasta seq_id_list.txt
EXAMPLE:
ap_find_in_fasta uniref90.fasta seq_id_list.txt
ap_find_in_fasta ferrodoxins.fasta selected_list.txt
Keywords:
Categories:
- core/data/io/fasta_io.hh
Input files:
Output files:
Program source:
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    #include <core/data/io/fasta_io.hh>
    #include <utils/exit.hh>
    #include <utils/io_utils.hh>

    std::string program_info = R"(

    Program reads a sequence database in FASTA format and a text file with sequence identifiers,
    and prints the requested sequences on the screen.

    USAGE:
        ap_find_in_fasta input.fasta seq_id_list.txt

    EXAMPLE:
        ap_find_in_fasta uniref90.fasta seq_id_list.txt
        ap_find_in_fasta ferrodoxins.fasta selected_list.txt

    )";

    /// Returns true if the header of a sequence contains any of the requested identifiers
    bool is_good_sequence(const core::data::sequence::Sequence_SP seq, const std::vector<std::string> & wanted_seq_id) {

      for(const std::string & s : wanted_seq_id)
        if(seq->header().find(s)!=std::string::npos) return true;

      return false;
    }

    /** @brief ap_find_in_fasta reads a sequence database in FASTA format and looks for sequences by given IDs
     *
     * CATEGORIES: core/data/io/fasta_io.hh;
     * KEYWORDS:   FASTA input; FASTA output; sequence; FASTA; pre-processing
     * GROUP: File processing;Data filtering
     */
    int main(const int argc, const char *argv[]) {

      if (argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

      using core::data::sequence::Sequence_SP; // --- Sequence_SP is just a std::shared_ptr to core::data::sequence::Sequence type
      using namespace core::data::io; // --- for FASTA I/O

      std::vector<std::string> wanted_seq_id;
      utils::read_listfile(argv[2], wanted_seq_id);
      std::vector<core::data::sequence::Sequence_SP> sink;

      // --- Read a file with FASTA sequences, one sequence at a time
      core::data::sequence::Sequence_SP seq = nullptr;
      std::ifstream infile;
      utils::in_stream(argv[1], infile);
      size_t n = 0;
      infile >> seq;
      while (seq != nullptr) {
        if (is_good_sequence(seq, wanted_seq_id))
          std::cout << create_fasta_string(*seq) << '\n';
        // --- report progress every 10000 sequences
        if ((++n) % 10000 == 0) std::cerr << n << " sequences tested\n";
        infile >> seq;
      }
    }

ap_hhpred_converter¶
Reads alignments from HHPred output and prints them in Edinburgh, FASTA or PIR format, according to the given flag. The list of available flags:
- -e for Edinburgh output format
- -f for FASTA output format
- -p for PIR output format
USAGE:
ap_hhpred_converter hhpred-file flag
EXAMPLE:
ap_hhpred_converter hhpred.out -p
REFERENCE: Soding, J and Biegert, A and Lupas, A. N., “The HHpred interactive server for protein homology detection and structure prediction.” Nucleic acids research (2005) 33 W244–W248
Keywords:
- sequence alignment
- FASTA
- PIR
- Edinburgh
Categories:
- core::data::io::read_hhpred
Input files:
Output files:
Program source:
    #include <iostream>
    #include <vector>

    #include <core/data/io/alignment_io.hh>
    #include <core/data/io/fasta_io.hh>
    #include <core/data/io/pir_io.hh>
    #include <utils/exit.hh>

    std::string program_info = R"(

    Reads alignments from HHPred output and prints them in Edinburgh, FASTA or PIR format, according to the given flag.
    The list of available flags:
        -e for Edinburgh output format
        -f for FASTA output format
        -p for PIR output format

    USAGE:
        ap_hhpred_converter hhpred-file flag

    EXAMPLE:
        ap_hhpred_converter hhpred.out -p

    REFERENCE:
    Soding, J and Biegert, A and Lupas, A. N.,
    "The HHpred interactive server for protein homology detection and structure prediction."
    Nucleic acids research (2005) 33 W244--W248

    )";

    /** @brief Extract alignments from HHPred output
     *
     * CATEGORIES: core::data::io::read_hhpred;
     * KEYWORDS:   sequence alignment; FASTA; PIR; Edinburgh
     * GROUP: File processing; Format conversion
     */
    int main(const int argc, const char* argv[]) {

      if(argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

      std::vector<core::alignment::PairwiseSequenceAlignment_SP> alignments;
      core::data::io::read_hhpred(argv[1], alignments);

      char flag = argv[2][1]; // --- the letter that follows the dash: E, e, F, f, P, p
      switch(flag) {
        case 'E' :
        case 'e' :
          for(const auto & seq_ali : alignments)
            core::data::io::write_edinburgh(*seq_ali, std::cout, 80);
          break;
        case 'F' :
        case 'f' :
          for(const auto & seq_ali : alignments)
            std::cout << core::data::io::create_fasta_string(*seq_ali, 80) << "\n";
          break;
        case 'P' :
        case 'p' :
          for(const auto & seq_ali : alignments)
            std::cout << core::data::io::create_pir_string(*seq_ali, 80) << "\n";
          break;
        default:
          std::cerr << "Incorrect output format requested!\n";
      }
    }

ap_ligand_interactions¶
Finds ligand-protein interactions in a given PDB file. The ligand code must also be provided. The program prints interactions between the protein and the ligand, including stacking, hydrogen bonds and van der Waals interactions.
USAGE:
ap_ligand_interactions input.pdb ligand_code
EXAMPLE:
ap_ligand_interactions 5ldk.pdb ATP
Keywords:
Categories:
- core::data::io::Pdb; core::calc::structural::interactions
Input files:
Output files:
Program source:
    #include <iostream>
    #include <numeric>
    #include <string>
    #include <vector>

    #include <core/data/structural/selectors/structure_selectors.hh>
    #include <core/data/io/Pdb.hh>
    #include <core/calc/structural/interactions/StackingInteraction.hh>
    #include <core/calc/structural/interactions/StackingInteractionCollector.hh>
    #include <core/calc/structural/interactions/VdWInteractionCollector.hh>
    #include <core/calc/structural/interactions/VdWInteraction.hh>
    #include <core/calc/structural/interactions/HydrogenBondInteraction.hh>
    #include <core/calc/structural/interactions/HydrogenBondCollector.hh>
    #include <utils/LogManager.hh>
    #include <utils/exit.hh>

    std::string program_info = R"(

    Finds ligand-protein interactions in a given PDB file. The ligand code must also be provided.
    The program prints interactions between the protein and the ligand, including stacking,
    hydrogen bonds and van der Waals interactions.

    USAGE:
        ap_ligand_interactions input.pdb ligand_code

    EXAMPLE:
        ap_ligand_interactions 5ldk.pdb ATP

    )";

    using namespace core::data::io; // --- Pdb reader and PdbLineFilter live there
    using namespace core::calc::structural::interactions;

    /** @brief Finds stacking, hbonds and van der Waals interactions for ligand in a given PDB file.
     *
     * CATEGORIES: core::data::io::Pdb; core::calc::structural::interactions
     * KEYWORDS:   PDB input; PDB line filter; interactions
     * GROUP: Structure calculations;
     */
    int main(const int argc, const char *argv[]) {

      // --- both the PDB file and the ligand code are required (argv[2] is used below)
      if (argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter
      utils::LogManager::FINER(); // --- INFO is the default logging level; set it to FINER to see more

      // --- Read a PDB file given as an argument to this program
      Pdb reader(argv[1], // --- input PDB file
          all_true(is_not_water, is_not_alternative, is_not_hydrogen, invert_filter(is_bb)), // --- Inverted backbone selector reads only side chains
          keep_all, true); // --- yes, read header
      core::data::structural::Structure_SP s = reader.create_structure(0);
      std::string code_3 = argv[2];

      VdWInteractionCollector vdw_collector = VdWInteractionCollector();
      HydrogenBondCollector hb_collector = HydrogenBondCollector();
      StackingInteractionCollector stack_collector = StackingInteractionCollector();

      std::vector<ResiduePair_SP> v_sink;
      std::vector<ResiduePair_SP> hb_sink;
      std::vector<ResiduePair_SP> s_sink;

      hb_collector.collect(*s, hb_sink);
      vdw_collector.collect(*s, v_sink);
      stack_collector.collect(*s, s_sink);

      // --- For each interaction type print only these pairs whose first residue is the ligand
      std::cout << VdWInteraction::output_header() << "\n";
      for (const ResiduePair_SP ri:v_sink) {
        VdWInteraction_SP bi = std::dynamic_pointer_cast<VdWInteraction>(ri);
        if (bi && bi->first_residue()->residue_type().code3.c_str() == code_3) std::cout << *bi << "\n";
      }

      std::cout << HydrogenBondInteraction::output_header() << "\n";
      for (const ResiduePair_SP ri:hb_sink) {
        HydrogenBondInteraction_SP bi = std::dynamic_pointer_cast<HydrogenBondInteraction>(ri);
        if (bi && bi->first_residue()->residue_type().code3.c_str() == code_3) std::cout << *bi << "\n";
      }

      std::cout << StackingInteraction::output_header() << "\n";
      for (const ResiduePair_SP ri:s_sink) {
        StackingInteraction_SP bi = std::dynamic_pointer_cast<StackingInteraction>(ri);
        if (bi && bi->first_residue()->residue_type().code3.c_str() == code_3) std::cout << *bi << "\n";
      }
    }

ap_ligand_trajectory¶
Finds contacts between a ligand molecule and a protein. Reads a multi-model PDB file and detects contacts in every model (e.g. a frame of an MD simulation). The output provides the interacting residues (name and residue ID) along with the number of observations for this contact. Requires PDB input file, three-letter ligand code and contact distance in Angstroms.
USAGE:
./ap_ligand_trajectory input.pdb ligand-code contact-distance
EXAMPLE:
./ap_ligand_trajectory test_inputs/2kwi.pdb GNP 3.5
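The counting scheme (find the ligand in every model, then count each residue whose closest atom lies within the cutoff) can be sketched with plain structs. The types `Point`, `Res` and `Model` and the function names are hypothetical stand-ins, not BioShell classes. One deliberate difference, made here as an assumption: the sketch skips the ligand itself rather than reporting it as its own contact.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal data model used only for this sketch
struct Point { double x, y, z; };
struct Res { std::string code3; int id; std::vector<Point> atoms; };
using Model = std::vector<Res>;

// Shortest atom-atom distance between two residues
double min_distance(const Res& a, const Res& b) {
  double best = std::numeric_limits<double>::max();
  for (const Point& p : a.atoms)
    for (const Point& q : b.atoms) {
      const double dx = p.x - q.x, dy = p.y - q.y, dz = p.z - q.z;
      const double d = std::sqrt(dx * dx + dy * dy + dz * dz);
      if (d < best) best = d;
    }
  return best;
}

// For every residue, counts the models in which it is closer to the ligand than cutoff
std::map<std::string, int> count_ligand_contacts(const std::vector<Model>& models,
                                                 const std::string& ligand_code,
                                                 double cutoff) {
  assert(cutoff > 0.0);
  std::map<std::string, int> counts;
  for (const Model& model : models) {
    const Res* ligand = nullptr;
    for (const Res& r : model)
      if (r.code3 == ligand_code) { ligand = &r; break; }
    if (ligand == nullptr) continue;   // model without the ligand is skipped
    for (const Res& r : model) {
      if (&r == ligand) continue;      // sketch-only choice: do not count the ligand itself
      if (min_distance(r, *ligand) < cutoff)
        ++counts[r.code3 + " " + std::to_string(r.id)];
    }
  }
  return counts;
}
```

A `std::map` keyed by a residue label keeps the output sorted and lets `operator[]` create a zero-initialised counter the first time a residue is seen.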
Keywords:
Categories:
- core::data::io::Pdb
Input files:
Output files:
Program source:
    #include <algorithm>
    #include <iostream>
    #include <map>
    #include <string>

    #include <core/data/io/Pdb.hh>
    #include <utils/string_utils.hh>
    #include <utils/exit.hh>

    std::string program_info = R"(

    Finds contacts between a ligand molecule and a protein.

    Reads a multi-model PDB file and detects contacts in every model (e.g. a frame of an MD simulation).
    The output provides the interacting residues (name and residue ID) along with the number of observations
    for this contact. Requires PDB input file, three-letter ligand code and contact distance in Angstroms.

    USAGE:
        ./ap_ligand_trajectory input.pdb ligand-code contact-distance

    EXAMPLE:
        ./ap_ligand_trajectory test_inputs/2kwi.pdb GNP 3.5

    )";

    /** @brief Finds contacts between a ligand molecule and a multimodel-protein.
     *
     * CATEGORIES: core::data::io::Pdb
     * KEYWORDS:   PDB input; contact map; ligand
     * GROUP: Structure calculations;
     */
    int main(const int argc, const char* argv[]) {

      if (argc < 4) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

      core::data::io::Pdb reader(argv[1]); // --- file name (PDB format, may be gzip-ped)
      const std::string code(argv[2]); // --- The ligand code is the second parameter of the program
      double cutoff = utils::from_string<double>(argv[3]); // --- The third parameter is the contact distance (in Angstroms)
      std::map<std::string, int> results; // --- Map used to store results

      for (size_t i = 0; i < reader.count_models(); ++i) { // --- Iterate over all models in the input file
        core::data::structural::Structure_SP strctr = reader.create_structure(i);
        // --- Here we use a standard find_if algorithm to find the ligand residue by its 3-letter code
        auto ligand = std::find_if(strctr->first_residue(), strctr->last_residue(),
            [&code](core::data::structural::Residue_SP res) { return (res->residue_type().code3==code); });
        if(ligand== strctr->last_residue()) { // --- If no ligand - print a message and take next structure
          std::cerr << "Model " << i << " has no " << argv[2] << " residue\n";
          continue;
        }
        for (auto it_resid = strctr->first_residue(); it_resid != strctr->last_residue(); ++it_resid) {
          double d = (*it_resid)->min_distance(*ligand);
          if (d < cutoff) { // --- if this is close enough, count the contact
            std::string key = utils::string_format("%3s %4s %4d",(*it_resid)->residue_type().code3.c_str(),
                (*it_resid)->owner()->id().c_str(), (*it_resid)->id());
            if (results.find(key) == results.end()) results[key] = 1;
            else results[key]++;
          }
        }
      }

      // --- print results
      std::cout <<"#resn chain resid counts\n";
      for(const auto & p:results)
        std::cout << p.first<<" "<<utils::string_format("%5d",p.second)<<"\n";
    }

ap_molecule_diffusion¶
ap_molecule_diffusion calculates the average displacement of a small molecule as a function of time over a trajectory. If a multi-model PDB file was given, the program prints contact count observed in all models.
USAGE:
ap_molecule_diffusion trajectory.pdb HOH box_side
where trajectory.pdb is the input multi-model PDB file, HOH is the PDB-id of the molecules for which the displacement will be evaluated, and box_side is the width of the simulation box (in Angstroms).
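The averaging over time origins can be sketched as follows. This is an illustration under assumptions, not the BioShell implementation: `P3`, `wrap`, `pbc_distance` and `mean_displacement` are hypothetical names, the box is assumed cubic, and distances use the minimum-image convention, so the reported displacement cannot exceed half of the box diagonal.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct P3 { double x, y, z; };  // hypothetical point type for this sketch

// Minimum-image difference along one axis of a cubic box
double wrap(double d, double box) { return d - box * std::round(d / box); }

// Distance between two points under periodic boundary conditions
double pbc_distance(const P3& a, const P3& b, double box) {
  const double dx = wrap(a.x - b.x, box);
  const double dy = wrap(a.y - b.y, box);
  const double dz = wrap(a.z - b.z, box);
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// frames[t][i] is the position of atom i in frame t.
// Returns the displacement averaged over atoms and time origins
// for every lag 1..t_max (element lag-1 of the result).
std::vector<double> mean_displacement(const std::vector<std::vector<P3>>& frames,
                                      std::size_t t_max, double box) {
  assert(t_max < frames.size());
  std::vector<double> sum(t_max, 0.0);
  std::vector<std::size_t> n(t_max, 0);
  for (std::size_t start = 0; start + t_max < frames.size(); ++start)
    for (std::size_t lag = 1; lag <= t_max; ++lag)
      for (std::size_t i = 0; i < frames[start].size(); ++i) {
        sum[lag - 1] += pbc_distance(frames[start][i], frames[start + lag][i], box);
        ++n[lag - 1];
      }
  for (std::size_t k = 0; k < t_max; ++k)
    if (n[k] > 0) sum[k] /= n[k];
  return sum;
}
```

Using every frame as a time origin, rather than only the first one, gives many more samples per lag, which is why the maximum lag is kept well below the trajectory length.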
Keywords:
Categories:
- core::data::basic::Vec3Cubic
Input files:
Output files:
Program source:
    #include <cmath>
    #include <iostream>
    #include <vector>

    #include <core/index.hh>
    #include <core/data/io/Pdb.hh>
    #include <core/data/basic/Vec3I.hh>
    #include <core/calc/statistics/OnlineStatistics.hh>
    #include <utils/string_utils.hh>
    #include <utils/exit.hh>

    std::string program_info = R"(

    ap_molecule_diffusion calculates the average displacement of a small molecule as a function of time
    over a trajectory. If a multi-model PDB file was given, the program prints contact count observed in all models.

    USAGE:
        ap_molecule_diffusion trajectory.pdb HOH box_side

    where trajectory.pdb is the input multi-model PDB file, HOH is the PDB-id of the molecules
    for which the displacement will be evaluated, and box_side is the width of the simulation box (in Angstroms).

    )";

    /** @brief Calculates average displacement of a small molecule as a function of time over a trajectory
     *
     * CATEGORIES: core::data::basic::Vec3Cubic
     * KEYWORDS:   PDB input; simulation
     * GROUP: Structure calculations;
     */
    int main(const int argc, const char *argv[]) {

      if (argc < 4) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

      double L = utils::from_string<double>(argv[3]); // --- The third parameter is the box width (in Angstroms)
      core::data::basic::Vec3I::set_box_len(L);

      core::data::io::Pdb reader(argv[1]); // --- file name (PDB format, may be gzip-ped)
      core::index4 n_atoms = reader.count_atoms(0);
      core::index4 n_frmes = reader.count_models();
      core::index4 t_max = n_frmes / 5; // --- the longest time lag is one fifth of the trajectory

      std::vector<std::vector<core::data::basic::Vec3I>> v;

      // ---------- Load coordinates to memory ----------
      for (int i_start = 0; i_start < n_frmes; ++i_start) {
        std::vector<core::data::basic::Vec3I> vi(n_atoms);
        reader.fill_structure(i_start, vi);
        v.push_back(vi);
      }

      // ---------- Calculate displacement ----------
      std::vector<core::calc::statistics::OnlineStatistics> avg(t_max);
      for (size_t i_start = 0; i_start < n_frmes - t_max - 1; ++i_start) { // --- every frame serves as a time origin
        const std::vector<core::data::basic::Vec3I> &v0 = v[i_start];
        for (size_t i_t = 1; i_t <= t_max; ++i_t) {
          const std::vector<core::data::basic::Vec3I> &vi = v[i_start + i_t];
          for (size_t i_atom = 0; i_atom < n_atoms; ++i_atom)
            avg[i_t - 1](sqrt(v0[i_atom].distance_square_to(vi[i_atom])));
        }
      }

      // ---------- Print time lag, average displacement and its standard deviation ----------
      for (size_t i_t = 1; i_t <= t_max; ++i_t) {
        std::cout << utils::string_format("%5d %f %f\n", i_t, avg[i_t - 1].avg(), sqrt(avg[i_t - 1].var()));
      }
    }

ap_pdb_to_fasta_ss¶
Reads a PDB file and writes protein sequence(s) in FASTA format. The program also writes secondary structure in FASTA format, if this data is available from PDB headers. The sequence comprises only those amino acid residues which have a C-alpha atom. The user can select a chain by providing its code as the second argument of the program. The program also writes a PDB file that corresponds to the sequence.
USAGE:
ap_pdb_to_fasta_ss input.pdb chain-code
EXAMPLE:
ap_pdb_to_fasta_ss 5edw.pdb A
OUTPUT:
>2GB1 A
MTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE
>2GB1 A - secondary structure
CEEEEEECCCCCCEEEEEECCHHHHHHHHHHHHHHHCCCCCEEEEECCCCEEEEEC
Keywords:
- PDB input
- FASTA output
- secondary structure
- predicates
Categories:
- core::data::io::Pdb; core::algorithms::Not; core::data::sequence::SecondaryStructure
Input files:
Output files:
Program source:
    #include <algorithm>
    #include <iostream>

    #include <core/algorithms/predicates.hh>
    #include <core/data/io/Pdb.hh>
    #include <core/data/io/fasta_io.hh>
    #include <core/data/structural/selectors/structure_selectors.hh>
    #include <utils/exit.hh>

    std::string program_info = R"(

    Reads a PDB file and writes protein sequence(s) in FASTA format. The program also writes secondary structure
    in FASTA format, if this data is available from PDB headers. The sequence comprises only those amino acid
    residues which have a C-alpha atom.

    The user can select a chain by providing its code as the second argument of the program. The program also
    writes a PDB file that corresponds to the sequence.

    USAGE:
        ap_pdb_to_fasta_ss input.pdb chain-code

    EXAMPLE:
        ap_pdb_to_fasta_ss 5edw.pdb A

    OUTPUT:
    >2GB1 A
    MTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE
    >2GB1 A - secondary structure
    CEEEEEECCCCCCEEEEEECCHHHHHHHHHHHHHHHCCCCCEEEEECCCCEEEEEC

    )";

    /** @brief Reads a PDB file and writes protein sequence(s) in FASTA format.
     *
     * The program also writes secondary structure in FASTA format, if this data is available from PDB headers.
     * User can select a chain by providing its code as the second argument of the program
     * USAGE:
     *     ap_pdb_to_fasta_ss 5edw.pdb A
     *
     * CATEGORIES: core::data::io::Pdb; core::algorithms::Not; core::data::sequence::SecondaryStructure
     * KEYWORDS:   PDB input; FASTA output; secondary structure; predicates
     * GROUP: File processing; Format conversion
     */
    int main(const int argc, const char* argv[]) {

      if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

      using namespace core::data::io; // --- Pdb and create_fasta_string live there
      using namespace core::data::structural; // --- Chain, Structure_SP and selectors live there

      Pdb reader(argv[1],is_not_alternative,core::data::io::keep_all,true);
      Structure_SP strctr = reader.create_structure(0);

      // --- Iterate over all chains
      for (auto it_chain = strctr->begin(); it_chain!=strctr->end(); ++it_chain) {
        Chain & c = **it_chain; // --- dereference iterator for easier access
        if ((argc > 2) && ((*it_chain)->id() != argv[2])) continue; // --- skip chains other than the requested one
        // --- The line below uses STL algorithm with BioShell predicate to remove all the residues lacking c-alpha
        c.erase(std::remove_if(c.begin(), c.end(),
            core::algorithms::Not<selectors::ResidueHasCA>(selectors::ResidueHasCA())), c.end());
        if(c.size()>0) {
          // --- Create a sequence object (including secondary structure information)
          core::data::sequence::SecondaryStructure_SP s = (*it_chain)->create_sequence();
          // --- Write sequence as FASTA
          std::cout << create_fasta_string(*s) << "\n";
          // --- Write secondary structure as FASTA
          std::cout << create_fasta_secondary_string(*s) << "\n";
        }
      }
    }

ap_pdb_to_pir¶
Reads a PDB file and writes protein sequence(s) in PIR format.
USAGE:
ap_pdb_to_pir 5edw.pdb
Keywords:
Categories:
- core::data::io::PirEntry
Input files:
Output files:
Program source:
    #include <iostream>

    #include <core/data/io/Pdb.hh>
    #include <core/data/io/pir_io.hh>
    #include <utils/exit.hh>

    std::string program_info = R"(

    Reads a PDB file and writes protein sequence(s) in PIR format

    USAGE:
        ap_pdb_to_pir 5edw.pdb

    )";

    /** @brief Reads a PDB file and writes protein sequence(s) in PIR format
     *
     * CATEGORIES: core::data::io::PirEntry
     * KEYWORDS:   PDB input; PIR
     * GROUP: File processing;Format conversion
     */
    int main(const int argc, const char* argv[]) {

      if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

      using namespace core::data::io;
      using namespace core::data::sequence;

      core::data::io::Pdb reader(argv[1],is_not_alternative, only_ss_from_header, true);
      core::data::structural::Structure_SP strctr = reader.create_structure(0);

      // --- Iterate over all chains; each chain becomes a separate PIR entry
      for (auto it_chain = strctr->begin(); it_chain!=strctr->end(); ++it_chain) {
        PirEntry e("",(*it_chain)->create_sequence()->sequence);
        e.type(PirEntryType::STRUCTURE_X);
        e.code(strctr->code());
        e.first_residue_id((*it_chain)->front()->id());
        e.first_chain_id((*it_chain)->char_id());
        e.last_residue_id((*it_chain)->back()->id());
        e.last_chain_id((*it_chain)->char_id());
        std::cout << e<<"\n";
      }
    }

ap_pir_to_fasta¶
Reads a file with sequences in PIR format and converts them to FASTA.
USAGE:
ap_pir_to_fasta example.pir
REFERENCE: https://salilab.org/modeller/9v8/manual/node454.html
Keywords:
Categories:
- core/data/io/pir_io
Input files:
Output files:
Program source:
    #include <iostream>
    #include <memory>
    #include <vector>

    #include <core/data/io/pir_io.hh>
    #include <core/data/io/fasta_io.hh>
    #include <utils/exit.hh>

    std::string program_info = R"(

    Reads a file with sequences in PIR format and converts them to FASTA.

    USAGE:
        ap_pir_to_fasta example.pir

    REFERENCE:
    https://salilab.org/modeller/9v8/manual/node454.html

    )";

    /** @brief Reads a file with sequences in PIR format and converts them to FASTA.
     *
     * CATEGORIES: core/data/io/pir_io;
     * KEYWORDS:   PIR; FASTA output
     * GROUP: File processing;Format conversion
     */
    int main(const int argc, const char* argv[]) {

      if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

      using core::data::sequence::Sequence_SP; // --- Sequence_SP is just a std::shared_ptr to core::data::sequence::Sequence type
      using namespace core::data::io; // --- This is required so PirEntry can be printed with << operator

      // --- Create a container where the sequences will be stored
      std::vector<Sequence_SP> sequences;
      // --- Read a file with PIR sequences
      core::data::io::read_pir_file(argv[1], sequences);

      // --- Write them in FASTA
      for (const Sequence_SP s : sequences)
        std::cout << core::data::io::create_fasta_string(*s) << "\n";

      // --- The sequences are actually stored as PirEntry objects, just up-cast to Sequence_SP
      // --- Here we down-cast them back to the derived type
      std::cout << "The source PIR data was:\n";
      for (const Sequence_SP s : sequences)
        std::cout << *std::dynamic_pointer_cast<core::data::sequence::PirEntry>(s) << "\n";
    }

ap_reorder_profile_columns¶
Reads a sequence profile (ASN.1 file format) and reorders the profile's columns as requested. The resulting profile is written in a tabular text format. If the new column order is not specified, amino acids will appear in the order: GAP VILMC HWFY KR QD NQST, i.e. small, aromatic, positive, negative and other-polar.
USAGE:
./ap_reorder_profile_columns input.asn1 [column-order]
EXAMPLE:
./ap_reorder_profile_columns d1or4A_.asn1
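Reordering profile columns amounts to a column permutation driven by a string of one-letter codes. The sketch below assumes a hypothetical `Profile` struct (a column-order string plus rows of probabilities) and a hypothetical `reorder_columns` function; the actual `SequenceProfile::create_reordered` method may behave differently, for instance when a letter is missing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical profile: rows[i][k] is the probability of amino acid
// column_order[k] at sequence position i.
struct Profile {
  std::string column_order;
  std::vector<std::vector<double>> rows;
};

// Returns a copy of the profile with columns arranged as in new_order.
// Every letter of new_order must be present in the input profile.
Profile reorder_columns(const Profile& in, const std::string& new_order) {
  // source[k] is the input column that becomes output column k
  std::vector<std::size_t> source;
  for (char aa : new_order) {
    const std::size_t pos = in.column_order.find(aa);
    assert(pos != std::string::npos);  // letter unknown to the input profile
    source.push_back(pos);
  }
  Profile out;
  out.column_order = new_order;
  for (const std::vector<double>& row : in.rows) {
    assert(row.size() == in.column_order.size());
    std::vector<double> new_row;
    for (std::size_t s : source) new_row.push_back(row[s]);
    out.rows.push_back(new_row);
  }
  return out;
}
```

Computing the permutation once and applying it to every row keeps the cost linear in the profile size, regardless of how the requested order was written.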
Keywords:
- output file
- sequence profile
Categories:
- core::data::sequence::SequenceProfile
Input files:
- d1or4A_.asn1_
Output files:
Program source:
    #include <iostream>
    #include <string>

    #include <core/chemical/Monomer.hh>
    #include <core/data/sequence/SequenceProfile.hh>
    #include <utils/exit.hh>
    #include <utils/LogManager.hh>
    #include <utils/io_utils.hh>

    std::string program_info = R"(

    Reads a sequence profile (ASN.1 file format) and reorders the profile's columns as requested.
    The resulting profile is written in a tabular text format. If the new column order is not specified,
    amino acids will appear in the order: GAP VILMC HWFY KR QD NQST, i.e. small, aromatic, positive,
    negative and other-polar

    USAGE:
        ./ap_reorder_profile_columns input.asn1 [column-order]

    EXAMPLE:
        ./ap_reorder_profile_columns d1or4A_.asn1

    )";

    //                              small           aromatic positive negative other-polar
    const std::string nice_order = "GAP" "VILMC" "HWFY"   "KR"     "QD"     "NQST";

    /** @brief Reads a sequence profile (ASN.1 file format) and shuffles profile's columns
     *
     * CATEGORIES: core::data::sequence::SequenceProfile
     * KEYWORDS:   output file; sequence profile
     * GROUP: File processing
     */
    int main(const int argc, const char* argv[]) {

      if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

      using namespace core::data::io;
      using namespace core::data::sequence;
      utils::Logger logs("ap_reorder_profile_columns");

      // --- use the order given at the command line or fall back to the default one
      std::string order_string = (argc>2) ? argv[2] : nice_order;
      logs << utils::LogLevel::INFO << "new aa order is: " << order_string << "\n";

      SequenceProfile_SP profile_in = core::data::sequence::read_ASN1_checkpoint(argv[1]);
      SequenceProfile_SP profile_out = profile_in->create_reordered(order_string);
      profile_out->write_table(std::cout);
    }

ap_rescore_alignment¶
Reads sequence alignment(s) in the FASTA format and recalculates scores. The input file may contain more than two sequences, i.e. it may provide a Multiple Sequence Alignment; in that case every pair of aligned sequences is rescored. Output values are printed on the screen. The default scoring parameters are: BLOSUM62 substitution matrix, gap opening penalty -10 and gap extension penalty -1.
USAGE:
./ap_rescore_alignment input.fasta [substitution-matrix [gap_open [gap_extend] ] ]
EXAMPLE:
./ap_rescore_alignment test_inputs/2azaA_2pcyA-ali.fasta
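Rescoring an existing alignment needs no dynamic programming: the score is a single pass over the aligned columns. The sketch below makes several assumptions that may differ from `core::alignment::calculate_score`: the function name `score_alignment` is hypothetical, the substitution score is passed in as a callable instead of a real BLOSUM62 table, the first gapped column of a run costs `gap_open` and each following one `gap_extend`, and columns gapped in both sequences (as happens for pairs taken from an MSA) are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Scores two already-aligned sequences ('-' marks a gap) with affine gap penalties.
int score_alignment(const std::string& a, const std::string& b,
                    const std::function<int(char, char)>& substitution,
                    int gap_open, int gap_extend) {
  assert(a.size() == b.size());  // aligned rows must have equal length
  int total = 0;
  bool gap_in_a = false, gap_in_b = false;  // is a gap currently open in a / in b?
  for (std::size_t i = 0; i < a.size(); ++i) {
    const bool ga = (a[i] == '-'), gb = (b[i] == '-');
    if (ga && gb) continue;  // column gapped in both rows carries no information
    if (ga) {
      total += gap_in_a ? gap_extend : gap_open;
      gap_in_a = true;
      gap_in_b = false;
    } else if (gb) {
      total += gap_in_b ? gap_extend : gap_open;
      gap_in_b = true;
      gap_in_a = false;
    } else {
      total += substitution(a[i], b[i]);
      gap_in_a = gap_in_b = false;
    }
  }
  return total;
}
```

For example, with a toy scheme of +2 for a match and -1 for a mismatch, the pair `AC--GT` / `ACTTGA` scores 2 + 2 - 10 - 1 + 2 - 1 = -6 at the default penalties of -10 and -1.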
Keywords:
- sequence alignment
- FASTA input
- sequence alignment score
Categories:
- core/alignment/on_alignment_computations.cc
Input files:
Output files:
Program source:
    #include <cstdlib>
    #include <iomanip>
    #include <iostream>
    #include <string>
    #include <vector>

    #include <utils/exit.hh>
    #include <core/data/io/fasta_io.hh>
    #include <core/alignment/PairwiseSequenceAlignment.hh>
    #include <core/alignment/on_alignment_computations.hh>
    #include <core/alignment/scoring/NcbiSimilarityMatrixFactory.hh>

    using namespace core::data::io;
    using namespace core::alignment::scoring;
    using namespace core::data::sequence;

    std::string program_info = R"(

    Reads sequence alignment(s) in the FASTA format and recalculates scores.
    The input file may contain more than two sequences, i.e. may provide a Multiple Sequence Alignment.
    Every pair of aligned sequences is rescored in this case. Output values are printed on the screen.
    The default scoring parameters are: BLOSUM62, -10, -1

    USAGE:
        ./ap_rescore_alignment input.fasta [substitution-matrix [gap_open [gap_extend] ] ]

    EXAMPLE:
        ./ap_rescore_alignment test_inputs/2azaA_2pcyA-ali.fasta

    )";

    /** @brief Estimates pairwise sequence similarity for a set of sequences given in a FASTA format
     *
     * CATEGORIES: core/alignment/on_alignment_computations.cc;
     * KEYWORDS: sequence alignment; FASTA input; sequence alignment score
     * GROUP: Alignments
     */
    int main(const int argc, const char *argv[]) {

      if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

      // --- read input database fasta file
      std::vector<Sequence_SP> ali_fasta; // --- stores sequences (should be already aligned)
      core::data::io::read_fasta_file(argv[1], ali_fasta);

      // --- scoring parameters: taken from the command line or set to the defaults
      std::string matrix_name = (argc > 2) ? argv[2] : "BLOSUM62";
      short gap_open = (argc > 3) ? atoi(argv[3]) : -10;
      short gap_extend = (argc > 4) ? atoi(argv[4]) : -1;
      NcbiSimilarityMatrix_SP sim_m = NcbiSimilarityMatrixFactory::get().get_matrix(matrix_name);

      // --- prints both fasta sequences names and their recalculated score
      for (size_t i = 1; i < ali_fasta.size(); ++i)
        for (size_t j = 0; j < i; ++j)
          std::cout << std::setw(10) << ali_fasta[i]->header().substr(0, 10) << " "
                    << std::setw(10) << ali_fasta[j]->header().substr(0, 10) << " "
                    << core::alignment::calculate_score(*ali_fasta[i], *ali_fasta[j], *sim_m, gap_open, gap_extend)
                    << "\n";
    }
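To make the rescoring step concrete, here is a minimal pure-Python sketch of scoring one pair of already-aligned sequences with affine gap penalties. It does not call BioShell; `rescore_alignment` and the `TOY` matrix are hypothetical names and values invented for illustration, and the gap convention (the first column of a gap run costs `gap_open`, each following column costs `gap_extend`) is an assumption that may differ from what `calculate_score` actually does.

```python
def rescore_alignment(seq_a, seq_b, subst, gap_open=-10, gap_extend=-1):
    """Score two already-aligned sequences of equal length.

    Assumed convention: the first column of a gap run costs gap_open,
    every following column of the same run costs gap_extend. Columns
    where both sequences hold a gap (possible in an MSA) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    gap_side = None  # 'a' or 'b' while inside a gap run, otherwise None
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue
        if a == "-" or b == "-":
            side = "a" if a == "-" else "b"
            score += gap_extend if gap_side == side else gap_open
            gap_side = side
        else:
            # the matrix is symmetric, so only one ordering is stored
            score += subst[(a, b)] if (a, b) in subst else subst[(b, a)]
            gap_side = None
    return score


# Toy substitution scores (hypothetical values, NOT BLOSUM62)
TOY = {("A", "A"): 4, ("C", "C"): 9, ("A", "C"): 0}

print(rescore_alignment("A--A", "ACCA", TOY))  # prints -3
```

For a multiple sequence alignment the same function would simply be called for every pair of rows, which is what the double loop in the program above does.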

ap_sasa¶
Calculates the Solvent Accessible Surface Area (SASA) for every atom in the input structure, using a probe sphere of the given radius (in Angstroms) and the given number of dots (n_dots) to approximate the surface area. The resulting values are stored as B-factor values in the PDB-formatted output. The default probe radius is 1.6 Angstroms and the program uses 960 dots by default.
USAGE:
./ap_sasa input.pdb probe-radius [n-dots]
EXAMPLE:
./ap_sasa 2gb1.pdb 1.6
REFERENCES:
Lee, Byungkook, Frederic M. Richards. “The interpretation of protein structures: estimation of static accessibility.” JMB 55 (1971): 379-IN4. doi:10.1016/0022-2836(71)90324-X
Shrake, A., J. A. Rupley. “Environment and exposure to solvent of protein atoms. Lysozyme and insulin.” JMB 79 (1973): 351-371. doi:10.1016/0022-2836(73)90011-9
Keywords:
- PDB input
- structural properties
Categories:
- core::calc::structural::shrake_rupley_sasa
Input files:
Output files:
Program source:
    #include <cstdlib>
    #include <iostream>
    #include <algorithm>
    #include <set>
    #include <string>
    #include <vector>

    #include <core/data/io/Pdb.hh>
    #include <core/data/structural/selectors/structure_selectors.hh>
    #include <core/calc/structural/sasa.hh>
    #include <utils/exit.hh>

    std::string program_info = R"(

    Calculates Solvent Accessible Surface Area (SASA) for every atom in the input structure
    for the probe sphere with the given radius (in Angstroms) and number of dots (n_dots)
    used to approximate surface area. Resulting values will be stored as B-factor values in PDB file.
    Default probe radius is 1.6 Angstroms, the program uses 960 dots by default

    USAGE:
        ./ap_sasa input.pdb probe-radius [n-dots]

    EXAMPLE:
        ./ap_sasa 2gb1.pdb 1.6

    REFERENCE:
    Lee, Byungkook, Frederic M. Richards. "The interpretation of protein structures: estimation of static accessibility."
    JMB 55 (1971): 379-IN4. doi:10.1016/0022-2836(71)90324-X

    Shrake, A., J. A. Rupley. "Environment and exposure to solvent of protein atoms. Lysozyme and insulin."
    JMB 79 (1973): 351-371. doi:10.1016/0022-2836(73)90011-9.

    )";

    /** @brief Calculates Solvent Accessible Surface Area for every atom in the input structure
     *
     * CATEGORIES: core::calc::structural::shrake_rupley_sasa
     * KEYWORDS: PDB input; structural properties
     * GROUP: Structure calculations;
     */
    int main(const int argc, const char* argv[]) {

      if (argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

      using namespace core::data::io;
      Pdb reader(argv[1], all_true(is_not_water, is_not_alternative)); // --- file name (PDB format, may be gzip-ped)
      double probe_radius = (argc > 2) ? atof(argv[2]) : 1.6;
      int n_dots = (argc > 3) ? atoi(argv[3]) : 960;

      using namespace core::data::structural;
      Structure_SP strctr = reader.create_structure(0);
      std::vector<double> sasa;   // --- receives one SASA value per atom
      core::calc::structural::shrake_rupley_sasa(*strctr, sasa, probe_radius, n_dots);

      // --- store each SASA value in the B-factor field and print the atom as a PDB line
      int i = -1;
      for (auto it_atom_i = strctr->first_const_atom(); it_atom_i != strctr->last_const_atom(); ++it_atom_i) {
        (**it_atom_i).b_factor(sasa[++i]);
        std::cout << (**it_atom_i).to_pdb_line() << "\n";
      }
    }
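The dot-counting idea behind the Shrake and Rupley method can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not BioShell's implementation: the function names are hypothetical, atoms are passed as `(x, y, z, vdw_radius)` tuples, the dots are laid out with a golden-spiral scheme, and neighbours are searched by brute force.

```python
import math


def sphere_dots(n_dots):
    """Spread n_dots roughly evenly over a unit sphere (golden-spiral layout)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dots = []
    for k in range(n_dots):
        z = 1.0 - 2.0 * (k + 0.5) / n_dots
        r = math.sqrt(1.0 - z * z)
        phi = k * golden_angle
        dots.append((r * math.cos(phi), r * math.sin(phi), z))
    return dots


def shrake_rupley(atoms, probe_radius=1.6, n_dots=960):
    """atoms: list of (x, y, z, vdw_radius); returns per-atom SASA in A^2."""
    dots = sphere_dots(n_dots)
    sasa = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe_radius          # radius of the expanded sphere
        exposed = 0
        for dx, dy, dz in dots:
            px, py, pz = xi + big_r * dx, yi + big_r * dy, zi + big_r * dz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                cut = rj + probe_radius
                # a dot is buried when it falls inside a neighbour's expanded sphere
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < cut * cut:
                    buried = True
                    break
            if not buried:
                exposed += 1
        # exposed fraction of dots times the area of the expanded sphere
        sasa.append(4.0 * math.pi * big_r * big_r * exposed / n_dots)
    return sasa
```

An isolated atom keeps all of its dots, so its SASA equals the full area of the expanded sphere; more dots give a smoother estimate at a proportionally higher cost.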

ap_scorefile_columns¶
Reads a score file or a silent file (produced by Rosetta) and extracts the requested score columns. When no column names are given, the score and rms columns are printed.
USAGE:
ap_scorefile_columns default.out
ap_scorefile_columns score.fsc
ap_scorefile_columns 1pgxA-abinitio.fsc rms ss_pair rsigma
Keywords:
- Rosetta scorefile
Categories:
- core::data::io::scorefile_io
Input files:
Output files:
Program source:
    #include <iostream>
    #include <memory>
    #include <string>
    #include <vector>

    #include <core/data/io/scorefile_io.hh>
    #include <utils/exit.hh>
    #include <utils/Logger.hh>

    std::string program_info = R"(

    Reads a score file or a silent file (produced by Rosetta) and extracts requested columns of scores

    USAGE:
        ap_scorefile_columns default.out
        ap_scorefile_columns score.fsc
        ap_scorefile_columns 1pgxA-abinitio.fsc rms ss_pair rsigma

    )";

    /** @brief Reads a score-file or a silent file (produced by Rosetta) and extracts requested columns of scores
     *
     * CATEGORIES: core::data::io::scorefile_io
     * KEYWORDS: Rosetta scorefile;
     * GROUP: File processing;Data filtering
     */
    int main(const int argc, const char* argv[]) {

      if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

      using namespace core::data::io;
      utils::Logger logs("ap_scorefile_columns");

      std::shared_ptr<NamedDataTable> fsc = read_scorefile(argv[1]);

      // --- collect indexes of the requested columns; unknown names are reported and skipped
      std::vector<core::index1> columns;
      if (argc > 2) {
        for (int i = 2; i < argc; ++i)
          if (fsc->has_column(argv[i])) columns.push_back(fsc->column_index(argv[i]));
          else logs << utils::LogLevel::WARNING << "Unknown column ID: " << argv[i] << "\n";
      } else {  // --- no columns requested: print "score" and "rms" by default
        columns.push_back(fsc->column_index("score"));
        columns.push_back(fsc->column_index("rms"));
      }

      // --- header line with column names, then one line per row of the score file
      std::cout << "#";
      for (core::index1 icol : columns) std::cout << " " << fsc->column_name(icol);
      std::cout << "\n";
      for (const auto &row: *fsc) {
        for (core::index1 icol : columns) std::cout << row[icol] << " ";
        std::cout << "\n";
      }
    }
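For readers without BioShell at hand, the column extraction can be approximated in plain Python. The sketch below assumes the usual Rosetta layout in which every score record starts with `SCORE:` and the first such record holds the column names; `read_score_columns` is a hypothetical helper, not part of BioShell, and it returns the values as strings without any validation of short rows.

```python
import sys


def read_score_columns(lines, wanted=("score", "rms")):
    """Return (column_names, rows) for the requested columns of a Rosetta score file.

    Assumptions: score records start with 'SCORE:', the first one is the
    header, and repeated header lines (as found in concatenated files) are
    skipped. All other records of a silent file are ignored.
    """
    header, picked, rows = None, [], []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue
        tokens = line.split()[1:]
        if header is None:
            header = tokens
            for name in wanted:
                if name in header:
                    picked.append(header.index(name))
                else:
                    print("Unknown column ID:", name, file=sys.stderr)
            continue
        if tokens == header:
            continue
        rows.append([tokens[k] for k in picked])
    names = [header[k] for k in picked] if header else []
    return names, rows


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        names, rows = read_score_columns(handle, sys.argv[2:] or ("score", "rms"))
    print("#", " ".join(names))
    for row in rows:
        print(" ".join(row))
```

Like the program above, the sketch falls back to the `score` and `rms` columns when no names are given and reports unknown column names instead of failing.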

ap_stiff_docking_crmsd¶
Reads a PDB file with a ligand docked to a protein receptor and a native (reference) protein-ligand complex and calculates cRMSD (coordinate Root-Mean-Square Deviation) on a ligand molecule between the two conformations. The file with structural models may contain more than one conformation (multi-model PDB file). The program assumes that the receptor structure doesn’t change significantly during docking (stiff or semi-flexible docking scenario) and superimposes all models on the first one, which significantly reduces calculation time.
The ligand may be a small-molecule compound, a peptide or even a protein. The program evaluates cRMSD based solely on ligand coordinates. A small-molecule ligand is found by its residue ID (a three-letter code, such as CAM). Peptide ligands (or proteins) are identified by a chain ID (a single-letter code, such as X).
If the reference structure is not given and a dash ‘-’ character is used instead (as in the last example), the program evaluates pairwise all-vs-all cRMSD values.
The output provides:
- ligand name (and possibly model ID)
- crmsd on the receptor (to confirm that it is rigid)
- no. of atoms of the receptor
- crmsd on the ligand
- no. of atoms of the ligand
USAGE:
./ap_stiff_docking_crmsd reference.pdb ligand_def model.pdb [model2.pdb ...]
SEE ALSO:
- ap_docking_crmsd - for a flexible docking analysis
- ap_ligand_clustering - for clustering of ligand docking poses
EXAMPLES:
./ap_stiff_docking_crmsd 2m56-ref.pdb CAM 00199.pdb 00963.pdb 04473.pdb
./ap_stiff_docking_crmsd 2kwi-1.pdb B 2kwi.pdb
./ap_stiff_docking_crmsd - B 2kwi.pdb
where 2m56-ref.pdb is the native structure, CAM is the three-letter PDB code of the ligand for which crmsd will be evaluated, and 00199.pdb and the two other files are conformations obtained from docking. In the second and third examples, B is the ID of the chain containing a ligand peptide.
Keywords:
Categories:
- core::protocols::PairwiseLigandCrmsd
Input files:
Output files:
Program source:
    #include <iostream>
    #include <map>
    #include <memory>
    #include <string>

    #include <core/data/io/Pdb.hh>
    #include <core/calc/structural/transformations/Crmsd.hh>
    #include <utils/exit.hh>
    #include <core/data/structural/selectors/structure_selectors.hh>
    #include <core/protocols/PairwiseLigandCrmsd.hh>

    std::string program_info = R"(

    Reads a PDB file with a ligand docked to a protein receptor and a native (reference) protein-ligand complex
    and calculates cRMSD (coordinate Root-Mean-Square Deviation) on a ligand molecule between the two conformations.
    The file with structural models may contain more than one conformation (multi-model PDB file).
    The program assumes that the receptor structure doesn't change significantly during docking
    (stiff or semi-flexible docking scenario) and superimposes all models on the first one,
    which significantly reduces calculation time.

    The ligand may be a small molecule compound, peptide or even a protein. The program evaluates cRMSD
    based solely on ligand coordinates. The program finds a small-molecule ligand by residue ID
    (a three-letter code, such as CAM). Peptide ligands (or proteins) are identified by a chain ID
    (a single letter code, such as X).

    If the reference structure is not given and dash '-' character is used instead (as in the last example),
    the program evaluates pairwise all-vs-all cRMSD calculations.

    The output provides:
        - ligand name (and possibly model ID)
        - crmsd on receptor (to confirm that is rigid)
        - no. of atoms of a receptor
        - crmsd on a ligand
        - no. of atoms of a ligand

    USAGE:
        ./ap_stiff_docking_crmsd reference.pdb ligand_def model.pdb [model2.pdb ...]

    SEE ALSO:
        ap_docking_crmsd - for a flexible docking analysis
        ap_ligand_clustering - for clustering of ligand docking poses

    EXAMPLES:
        ./ap_stiff_docking_crmsd 2m56-ref.pdb CAM 00199.pdb 00963.pdb 04473.pdb
        ./ap_stiff_docking_crmsd 2kwi-1.pdb B 2kwi.pdb
        ./ap_stiff_docking_crmsd - B 2kwi.pdb

    where 2m56-ref.pdb is the native and CAM is the three-letter PDB code of the ligand for which crmsd
    will be evaluated and 00199.pdb and the two other files are conformations after docking.
    In the second and third examples, B is the ID of the chain containing a ligand peptide.

    )";

    using namespace core::data::structural;

    /** @brief ap_stiff_docking_crmsd calculates crmsd of a ligand (small molecule, peptide or protein) bound to a receptor
     *
     * CATEGORIES: core::protocols::PairwiseLigandCrmsd
     * KEYWORDS: PDB input; crmsd; docking; structure selectors
     * GROUP: Structure calculations; Docking;
     */
    int main(const int argc, const char* argv[]) {

      using namespace core::data::structural::selectors;

      // --- the reference (or '-'), the ligand definition and at least one model file are required
      if (argc < 4) utils::exit_OK_with_message(program_info);

      const std::string code(argv[2]);          // --- The ligand code is the second parameter of the program
      AtomSelector_SP select_ligand = nullptr;  // --- Ligand selector object
      if (code.size() == 3)                     // --- If the code is 3 characters long, it's a residue code
        select_ligand = std::static_pointer_cast<AtomSelector>(std::make_shared<SelectResidueByName>(code));
      else {                                    // --- Otherwise it's a chain ID; only C-alpha atoms of that chain are used
        AtomSelector_SP select_chain = std::static_pointer_cast<AtomSelector>(std::make_shared<ChainSelector>(code[0]));
        std::shared_ptr<LogicalANDSelector> select_chain_ca = std::make_shared<LogicalANDSelector>();
        select_chain_ca->add_selector(std::make_shared<IsCA>());
        select_chain_ca->add_selector(select_chain);
        select_ligand = select_chain_ca;
      }

      // --- Receptor selector object: C-alpha atoms that do not belong to the ligand
      std::shared_ptr<LogicalANDSelector> select_receptor = std::make_shared<LogicalANDSelector>();
      AtomSelector_SP select_not_ligand =
          std::static_pointer_cast<AtomSelector>(std::make_shared<InverseAtomSelector>(*select_ligand));
      select_receptor->add_selector(std::make_shared<IsCA>());
      select_receptor->add_selector(select_not_ligand);

      core::protocols::PairwiseLigandCrmsd crmsd_protocol(select_ligand, select_receptor);

      // --- load every model; models from a multi-model file are tagged with "file_name:model_index"
      for (core::index2 i = 3; i < argc; ++i) {
        core::data::io::Pdb reader(argv[i], core::data::io::is_not_hydrogen);
        if (reader.count_models() > 1) {
          for (core::index2 j = 0; j < reader.count_models(); ++j)
            crmsd_protocol.add_input_structure(reader.create_structure(j), utils::string_format("%s:%4d", argv[i], j));
        } else
          crmsd_protocol.add_input_structure(reader.create_structure(0), argv[i]);
      }

      crmsd_protocol.crmsd_cutoff(20.0); // crmsd cutoff large enough to get some output
      // --- print to std::cout; the no-op deleter prevents the shared pointer from deleting it
      crmsd_protocol.output_stream(std::shared_ptr<std::ostream>(&std::cout, [](void*) {}));

      if (std::string(argv[1]) != "-") {  // --- a reference was given: compare every model to it
        core::data::io::Pdb reader(argv[1], core::data::io::is_not_hydrogen);
        Structure_SP native = reader.create_structure(0);
        crmsd_protocol.calculate(native);
      } else                              // --- no reference: all-vs-all comparison
        crmsd_protocol.calculate();
    }
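The key point of the stiff-docking scenario is that the ligand coordinates are compared in place, without a ligand-on-ligand fit, once the receptors share a common frame. A minimal sketch of that final step is given below; `crmsd_in_place` is a hypothetical helper, it assumes both ligands list the same atoms in the same order, and it does not reproduce the receptor superposition performed by `PairwiseLigandCrmsd`.

```python
import math


def crmsd_in_place(ligand_a, ligand_b):
    """cRMSD between two equally ordered lists of (x, y, z) ligand coordinates.

    No superposition is applied: the caller is assumed to have already
    placed both complexes in the same receptor frame.
    """
    if len(ligand_a) != len(ligand_b) or not ligand_a:
        raise ValueError("ligands must contain the same, non-zero number of atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(ligand_a, ligand_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(ligand_a))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
docked = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0), (3.0, 1.0, 0.0)]
print(crmsd_in_place(reference, docked))  # prints 3.0
```

Because the ligand itself is never refitted, a pose that is merely translated inside the binding site is penalized, which is the behaviour one wants when ranking docking poses.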

ap_superimpose_pdb_by_ca¶
Superimposes protein structures by matching C-alphas. All atoms of the second (and each subsequent) protein structure are superimposed on the first protein based on the CA positions. All structures must contain the same number of C-alpha atoms. The result is written to the “out.pdb” file.
USAGE:
./ap_superimpose_pdb_by_ca reference pdb_file_1 [pdb_file_2 ...]
EXAMPLE:
./ap_superimpose_pdb_by_ca 4rm4A.pdb model.pdb
Keywords:
- PDB input
- rototranslation
- superimposition
- crmsd
- docking
Categories:
- core/calc/structural/transformations/PairwiseCrmsd
Program source:
    #include <fstream>
    #include <memory>
    #include <iostream>
    #include <string>
    #include <vector>

    #include <core/data/io/Pdb.hh>
    #include <core/data/structural/selectors/structure_selectors.hh>
    #include <core/protocols/PairwiseCrmsd.hh>
    #include <utils/exit.hh>
    #include <utils/LogManager.hh>

    using namespace core::data::structural;

    std::string program_info = R"(

    Superimposes protein structures by matching C-alphas. All atoms of the second (and subsequent) protein structures
    will be superimposed on the first protein based on the CA positions.
    All structures must contain the same number of C-alpha atoms.

    USAGE:
        ./ap_superimpose_pdb_by_ca reference pdb_file_1 [pdb_file_2 ...]

    EXAMPLE:
        ./ap_superimpose_pdb_by_ca 4rm4A.pdb model.pdb

    )";

    /** @brief Superimposes protein structures by matching C-alpha atoms.
     *
     * CATEGORIES: core/calc/structural/transformations/PairwiseCrmsd
     * KEYWORDS: PDB input; rototranslation; superimposition; crmsd; docking
     * GROUP: Structure calculations;
     */
    int main(const int argc, const char *argv[]) {

      if(argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

      utils::LogManager::get().FINE();
      selectors::AtomSelector_SP selector = std::make_shared<selectors::IsCA>(); // --- select C-alpha atoms

      core::data::io::Pdb read_native(argv[1], core::data::io::keep_all); // --- Read the native (reference) structure, keep all atoms
      Structure_SP native = read_native.create_structure(0);

      std::vector<Structure_SP> models; // --- Container for targets to be superimposed
      for (int i = 2; i < argc; ++i) {
        core::data::io::Pdb reader(argv[i], core::data::io::keep_all);
        for (int j = 0; j < reader.count_models(); ++j) // --- Read all models from each target PDB file
          models.push_back(reader.create_structure(j));
      }

      // --- fit on the selected atoms, write the superimposed structures to out.pdb
      core::protocols::PairwiseCrmsd rms_calc(models, selector);
      std::shared_ptr<std::ostream> out = std::make_shared<std::ofstream>("out.pdb");
      rms_calc.calculate(native, out);
    }

ap_superimpose_pdb_by_ligand¶
Superimposes protein structures by matching ligand molecules. All the given protein structures must contain the same ligand molecule, every time in the same conformation. The program calculates a transformation (rotation-translation) that superimposes the ligand from each input structure on the same ligand molecule found in the native PDB. The transformation is then used to rototranslate the whole protein structure. The result is written to the “out.pdb” file.
USAGE:
./ap_superimpose_pdb_by_ligand native_pdb ligand_name pdb_file_1 [pdb_file_2 ...]
EXAMPLE:
./ap_superimpose_pdb_by_ligand 4rm4A.pdb HEM 5ofqA.pdb
Keywords:
- PDB input
- rototranslation
- superimposition
- crmsd
- docking
Categories:
- core/calc/structural/transformations/PairwiseCrmsd
Input files:
Output files:
Program source:
    #include <fstream>
    #include <memory>
    #include <iostream>
    #include <string>
    #include <vector>

    #include <core/data/io/Pdb.hh>
    #include <core/data/structural/selectors/structure_selectors.hh>
    #include <core/protocols/PairwiseCrmsd.hh>
    #include <utils/exit.hh>
    #include <utils/LogManager.hh>

    using namespace core::data::structural;

    std::string program_info = R"(

    Superimposes protein structures by matching ligand molecules. All the given protein structures
    must contain the same ligand molecule, every time in the same conformation. The program calculates
    a transformation (rotation-translation) that superimposes that ligand from input structures on the same
    ligand molecule found in the native PDB. The transformation is then used to rototranslate whole protein structures.
    The result is written to the "out.pdb" file.

    USAGE:
        ./ap_superimpose_pdb_by_ligand native_pdb ligand_name pdb_file_1 [pdb_file_2 ...]

    EXAMPLE:
        ./ap_superimpose_pdb_by_ligand 4rm4A.pdb HEM 5ofqA.pdb

    )";

    /** @brief Superimposes protein structures by matching ligand molecules.
     *
     * CATEGORIES: core/calc/structural/transformations/PairwiseCrmsd
     * KEYWORDS: PDB input; rototranslation; superimposition; crmsd; docking
     * GROUP: Structure calculations;
     */
    int main(const int argc, const char *argv[]) {

      if(argc < 4) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

      utils::LogManager::get().FINE();
      // --- select a ligand residue by its 3-letter code
      selectors::AtomSelector_SP selector = std::make_shared<selectors::SelectResidueByName>(argv[2]);

      core::data::io::Pdb read_native(argv[1], core::data::io::keep_all); // --- Read the native (reference) structure, keep all atoms
      Structure_SP native = read_native.create_structure(0);

      std::vector<Structure_SP> models; // --- Container for targets to be superimposed
      for (int i = 3; i < argc; ++i) {
        core::data::io::Pdb reader(argv[i], core::data::io::keep_all);
        for (int j = 0; j < reader.count_models(); ++j) // --- Read all models from each target PDB file
          models.push_back(reader.create_structure(j));
      }

      // --- fit on the ligand atoms, write the superimposed structures to out.pdb
      core::protocols::PairwiseCrmsd rms_calc(models, selector);
      std::shared_ptr<std::ostream> out = std::make_shared<std::ofstream>("out.pdb");
      rms_calc.calculate(native, out);
    }
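The "fit on a subset, move everything" idea used by both superimposition programs can be sketched in pure Python. The example below is not BioShell code: `fit_rototranslation` and `apply_rototranslation` are hypothetical helpers, the optimal rotation is obtained with Horn's unit-quaternion method, and the dominant eigenvector is found by simple power iteration on a shifted matrix (a real implementation would rather use a proper eigen-solver; the fixed start vector can fail for rare rotations whose quaternion is orthogonal to it, and degenerate input with all points identical is not handled).

```python
import math


def centroid(points):
    n = float(len(points))
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def fit_rototranslation(moving, target):
    """Return (R, t) minimising sum |R*m + t - g|^2 over the paired points."""
    cm, ct = centroid(moving), centroid(target)
    s = [[0.0] * 3 for _ in range(3)]          # cross-covariance of centred points
    for m, g in zip(moving, target):
        for a in range(3):
            for b in range(3):
                s[a][b] += (m[a] - cm[a]) * (g[b] - ct[b])
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    n = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    # shifting by a Gershgorin bound makes the largest eigenvalue the dominant one
    shift = max(sum(abs(v) for v in row) for row in n)
    q = [1.0, 0.5, 0.25, 0.125]
    for _ in range(500):
        q = [sum(n[i][j] * q[j] for j in range(4)) + shift * q[i] for i in range(4)]
        norm = math.sqrt(sum(v * v for v in q))
        q = [v / norm for v in q]
    w, x, y, z = q                              # unit quaternion -> rotation matrix
    r = [[w*w + x*x - y*y - z*z, 2*(x*y - w*z), 2*(x*z + w*y)],
         [2*(x*y + w*z), w*w - x*x + y*y - z*z, 2*(y*z - w*x)],
         [2*(x*z - w*y), 2*(y*z + w*x), w*w - x*x - y*y + z*z]]
    t = tuple(ct[a] - sum(r[a][b] * cm[b] for b in range(3)) for a in range(3))
    return r, t


def apply_rototranslation(r, t, points):
    """Apply x -> R*x + t to every point (e.g. to all atoms of a structure)."""
    return [tuple(sum(r[a][b] * p[b] for b in range(3)) + t[a] for a in range(3))
            for p in points]
```

In use, `fit_rototranslation` would receive only the matching atoms (the ligand atoms or the C-alphas) of a model and of the native structure, and the returned pair would then be passed to `apply_rototranslation` together with all atoms of that model.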

ap_symmetry_in_pdb¶
This example shows how to access symmetry operators stored in a PDB file header. For every operation found, it creates a Rototranslation object and prints it on the screen.
USAGE:
./ap_symmetry_in_pdb input.pdb
EXAMPLE:
./ap_symmetry_in_pdb 5edw.pdb
Keywords:
- PDB input
- PDB line filter
- Structure
Categories:
- core/data/io/Pdb
Input files:
Output files:
Program source:
    #include <iostream>
    #include <memory>
    #include <string>

    #include <core/data/io/Pdb.hh>
    #include <utils/options/OptionParser.hh>
    #include <utils/exit.hh>

    std::string program_info = R"(

    Example shows how to access symmetry operators stored in a PDB file header. For every operation found,
    it creates a Rototranslation object and prints it on the screen

    USAGE:
        ./ap_symmetry_in_pdb input.pdb

    EXAMPLE:
        ./ap_symmetry_in_pdb 5edw.pdb

    )";

    /** @brief ap_symmetry_in_pdb demo shows how to access symmetry operators stored in a PDB file header.
     *
     * CATEGORIES: core/data/io/Pdb;
     * KEYWORDS: PDB input; PDB line filter; Structure
     */
    int main(const int argc, const char* argv[]) {

      if ((argc > 1) && utils::options::call_for_help(argv[1])) utils::exit_OK_with_message(program_info);

      if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

      for (int i = 1; i < argc; ++i) {
        core::data::io::Pdb reader(argv[i],   // file name (PDB format, may be gzip-ped)
            core::data::io::is_bb,            // a predicate to read only the ATOM lines corresponding to backbone atoms
            core::data::io::keep_all,         // keep all header lines
            true);                            // parse PDB header

        std::shared_ptr<core::data::io::Remark290> r290 = reader.symmetry_operators();
        std::shared_ptr<core::data::io::Remark350> r350 = reader.biomolecule_symmetry();

        // --- crystallographic symmetry operators (REMARK 290)
        std::cout << "# Symmetry operators found: " << r290->count_operators() << "\n";
        for (const auto &rt: *r290) {
          std::cout << rt << "\n";
        }

        // --- biological assemblies (REMARK 350): a list of chains and the operators applied to them
        std::cout << "# Biological symmetry biomolecules found: " << r350->count_biomolecules() << "\n";
        for (const auto &sym: *r350) {
          std::cout << "For chains: ";
          for (auto c: sym.first) std::cout << c << " ";
          std::cout << "with size " << sym.second.size() << "\n";
          for (auto rt: sym.second) std::cout << rt << " ";
          std::cout << "\n";
        }
      }
    }
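To show what such a symmetry operator contains, the sketch below parses the `SMTRYn` records of REMARK 290 directly from PDB header lines in plain Python. It is independent of BioShell: `parse_remark_290` and `apply_operator` are hypothetical helpers, and the parser relies on whitespace splitting, which assumes the numeric fields of a record are separated by at least one blank.

```python
def parse_remark_290(lines):
    """Map operator id -> (3x3 rotation rows, translation vector).

    Each operator is stored in three records (SMTRY1..SMTRY3), one matrix
    row plus one translation component per record.
    """
    operators = {}
    for line in lines:
        tok = line.split()
        if len(tok) != 8 or tok[0] != "REMARK" or tok[1] != "290":
            continue
        if not tok[2].startswith("SMTRY"):
            continue
        row = int(tok[2][5]) - 1            # SMTRY1 -> row 0, etc.
        op_id = int(tok[3])
        rot, shift = operators.setdefault(
            op_id, ([[0.0] * 3 for _ in range(3)], [0.0] * 3))
        rot[row] = [float(v) for v in tok[4:7]]
        shift[row] = float(tok[7])
    return operators


def apply_operator(operator, point):
    """Return rot * point + shift for a single (x, y, z) point."""
    rot, shift = operator
    return tuple(sum(rot[a][b] * point[b] for b in range(3)) + shift[a]
                 for a in range(3))
```

Applying every parsed operator to the atoms of the asymmetric unit generates its symmetry-related copies in the same crystal frame.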

.py scripts¶
This group contains simple Python examples that show how to use the PyBioShell package.
contact_map.py¶
Calculates contact map for a number of models from a single PDB file and prints how often any two residues are in contact
USAGE:
python3 contact_map.py input.pdb cutoff
EXAMPLE:
python3 contact_map.py 2kwi.pdb 4.5
Keywords:
- PDB input
- contact map
Categories:
- core/calc/structural/ContactMap
Input files:
Output files:
Program source:
    import sys
    from pybioshell.core.data.io import Pdb
    from pybioshell.core.calc.structural import ContactMap

    if len(sys.argv) < 3 :
        print("""

    Calculates contact map for a number of models from a single PDB file and prints how often
    any two residues are in contact

    USAGE:
        python3 contact_map.py input.pdb cutoff

    EXAMPLE:
        python3 contact_map.py 2kwi.pdb 4.5

    CATEGORIES: core/calc/structural/ContactMap
    KEYWORDS: PDB input; contact map
    GROUP: Structure calculations;
    IMG: ap_contact_map.png

        """)
        sys.exit()

    pdb = Pdb(sys.argv[1],"")
    cutoff = float(sys.argv[2])

    # --- the map is created from the first model; every other model is added to it
    contact_map = ContactMap(pdb.create_structure(0), cutoff)
    for i_model in range(1,pdb.count_models()) :
        strctr = pdb.create_structure(i_model)
        contact_map.add(strctr)
        print("model",i_model,"added", file=sys.stderr)

    res_max = contact_map.max_row_index()
    print("   i ci res_i   j cj  resj n_cont")
    for i in range(res_max) :
        # --- Get residue index for residue indexed by i; residue index is a structure
        # --- holding chain ID, residue ID and an insertion code
        ri = contact_map.residue_index(i)
        for j in range(i-1) :
            rj = contact_map.residue_index(j)
            if contact_map.has_element(i,j) :
                print("%4d %c %4d%c %4d %c %4d%c %4d" %(i,ri.chain_id, ri.residue_id, ri.i_code,
                    j,rj.chain_id, rj.residue_id, rj.i_code,contact_map.at(i,j)))
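The quantity printed in the last column, i.e. the number of models in which two residues are in contact, can be illustrated without PyBioShell. The sketch below is a simplification under stated assumptions: `contact_counts` is a hypothetical helper, every residue is represented by a single point (for instance its C-alpha atom), and a contact means that two such points lie no further apart than the cutoff; the exact contact definition used by `ContactMap` may differ.

```python
import math
from collections import Counter


def contact_counts(models, cutoff):
    """Count, for every residue pair (i, j) with j < i, in how many models it is in contact.

    models: list of models, each a list of (x, y, z) points, one per residue,
    with the same residue order in every model.
    """
    counts = Counter()
    for coords in models:
        for i in range(len(coords)):
            for j in range(i):
                if math.dist(coords[i], coords[j]) <= cutoff:
                    counts[(i, j)] += 1
    return counts


model_1 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
counts = contact_counts([model_1, model_2], 4.5)
print(counts[(1, 0)], counts[(2, 1)], counts[(2, 0)])  # prints 1 1 0
```

Dividing each count by the number of models turns it into a contact frequency, which is convenient when ensembles of different sizes are compared.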

crmsd_on_c-alpha.py¶
Calculates the cRMSD (coordinate Root-Mean-Square Deviation) value on C-alpha coordinates. If one file is given, each-vs-each cRMSD between its models is calculated. If two or more files are given, cRMSD between the first PDB file and each of the remaining ones is calculated.
USAGE:
python3 crmsd_on_c-alpha.py file1.pdb [file2.pdb...]
EXAMPLE:
python3 crmsd_on_c-alpha.py 1cey.pdb
Keywords:
- PDB input
- crmsd
Categories:
- core/calc/structural/transformations/Crmsd
Input files:
Output files:
Program source:
    import sys
    from pybioshell.core.data.io import Pdb
    from pybioshell.core.data.basic import Vec3
    from pybioshell.std import vector_core_data_basic_Vec3
    from pybioshell.core.calc.structural.transformations import *
    from pybioshell.utils import LogManager

    LogManager.INFO()

    if len(sys.argv) < 2:
        print("""

    Calculates cRMSD (coordinate Root-Mean-Square Deviation) value on C-alpha coordinates.
    If one file is given, each-vs-each cRMSD between models is calculated.
    If two or more files are given, crmsd for first pdb vs. the rest is calculated.

    USAGE:
        python3 crmsd_on_c-alpha.py file1.pdb [file2.pdb...]

    EXAMPLE:
        python3 crmsd_on_c-alpha.py 1cey.pdb

    CATEGORIES: core/calc/structural/transformations/Crmsd
    KEYWORDS: PDB input; crmsd
    GROUP: Structure calculations
    IMG: ap_Crmsd_deepteal_brown_1.png

        """)
        sys.exit()

    rms = CrmsdOnVec3()
    if len(sys.argv) == 2:  # --- The case of each-vs-each calculations between models of a single PDB file
        pdb = Pdb(sys.argv[1], "is_ca", False)
        n_atoms = pdb.count_atoms(0)
        structure = pdb.create_structure(0)
        # --- allocate one coordinate vector per model
        models = []
        for i_model in range(0, pdb.count_models()):
            xyz = vector_core_data_basic_Vec3()
            for i in range(n_atoms):
                xyz.append(Vec3())
            models.append(xyz)
        # --- fill coordinates of each model and compare it with all the models loaded before it
        for i_model in range(0, pdb.count_models()):
            pdb.fill_structure(i_model, models[i_model])
            for j in range(i_model):
                try:
                    print("%2d %2d %6.3f" % (i_model, j, rms.crmsd(models[i_model], models[j], len(models[j]))))
                except Exception as e:
                    print(type(e), e, file=sys.stderr)
    else:  # --- The case when two or more PDB files are given, calculates first vs. the rest
        pdb = Pdb(sys.argv[1], "is_ca", False)
        n_atoms = pdb.count_atoms(0)
        structure = pdb.create_structure(0)
        xyz = vector_core_data_basic_Vec3()
        for i in range(n_atoms):
            xyz.append(Vec3())
        pdb.fill_structure(0, xyz)
        for pdb_fname in sys.argv[2:]:
            other_pdb = Pdb(pdb_fname, "is_ca", False)
            other_structure = other_pdb.create_structure(0)
            if n_atoms != other_pdb.count_atoms(0):
                print("The two structures have different number of CA atoms!\n")
                continue  # --- crmsd is undefined for a different number of atoms, skip this file
            other_xyz = vector_core_data_basic_Vec3()
            for i in range(n_atoms):
                other_xyz.append(Vec3())
            other_pdb.fill_structure(0, other_xyz)
            try:
                print("%s: %6.3f" % (pdb_fname.split("/")[-1].split(".")[0], rms.crmsd(xyz, other_xyz, n_atoms)))
            except Exception as e:
                print(type(e), e, file=sys.stderr)

hist_from_scorefile.py¶
Reads Rosetta scorefiles and plots a histogram of the given column (the default is rms) and an energy plot.
EXAMPLE:
python3 hist_from_scorefile.py -f 1pgx-abinitio.fsc 1pgx-abinitio-bis.fsc -c score -min -50.0 -max 40.0
Call python3 hist_from_scorefile.py -h for full help
Keywords:
- Rosetta scorefile
- histogram
- energy plot
Categories:
- core/calc/statistics/HistogramDD; core/data/io/read_scorefile
Input files:
Output files:
Program source:
    import sys,argparse
    from pybioshell.core.data.io import read_scorefile
    from os.path import expanduser, join
    home_dir = expanduser("~")
    # It is assumed VisuaLife library is installed in your $HOME/src.git/visualife/src/ directory
    sys.path.append(join(home_dir, "src.git/visualife/src/"))
    from core.Plot import Plot
    from core.SvgViewport import SvgViewport
    from core.styles import color_by_name
    from pybioshell.core.calc.statistics import HistogramDD

    if len(sys.argv) < 2 :
        print("""

    Reads Rosetta scorefile and plot histogram of given column name (default is rms) and energy plot.

    EXAMPLE:
        python3 hist_from_scorefile.py -f 1pgx-abinitio.fsc 1pgx-abinitio-bis.fsc -c score -min -50.0 -max 40.0

    Call python3 hist_from_scorefile.py -h for full help

    CATEGORIES: core/calc/statistics/HistogramDD; core/data/io/read_scorefile
    KEYWORDS: Rosetta scorefile; histogram; energy plot
    GROUP: Statistics;
    IMG: rms_hist.svg

        """)
        sys.exit()

    # ----------- argument parsing
    parser = argparse.ArgumentParser(description='Reads Rosetta scorefiles and plot histogram of given column name (default is rms) for all files in one plot')
    parser.add_argument('-f', '--file', help="input .fsc file", nargs='+', required=True)
    parser.add_argument('-c', '--column', help="column name for histogram", nargs=1, required=False, default=["rms"])
    parser.add_argument('-min', '--min', help="minimum data value", nargs=1, required=False)
    parser.add_argument('-max', '--max', help="maximum data value", nargs=1, required=False)
    parser.add_argument('-b', '--bin_width', help="bin width", nargs=1, required=False, default=[1.0])
    args = parser.parse_args()

    step = float(args.bin_width[0])
    data = []
    alldata = []

    # ----------- filling lists with data from file
    for i in range(len(args.file)):
        p = read_scorefile(args.file[i])
        indx = p.column_index(args.column[0])
        print("#Making histogram for ",args.column[0]," column at index ",indx)
        lhist = []
        for row in p:
            lhist.append(float(row[indx]))
        data.append(lhist)
        alldata.extend(lhist)

    # --- histogram range: taken from the command line or, by default, from the data;
    # --- it must be evaluated after the files are read, because the defaults depend on alldata
    xmin = float(args.min[0]) if args.min else min(alldata)
    xmax = float(args.max[0]) if args.max else max(alldata)

    # ------------ plotting -----------
    drawing = SvgViewport("%s_hist.svg"%(args.column[0]), 0, 0, 800, 650,color="white")
    pl = Plot(drawing,100,700,100,550,xmin, xmax,0,0.5,axes_definition="UBLR")

    stroke_color = color_by_name("SteelBlue").create_darker(0.3)
    pl.axes["B"].label = args.column[0]
    for key,ax in pl.axes.items() :
        ax.fill, ax.stroke, ax.stroke_width = stroke_color, stroke_color, 2.0
        ax.tics(0,5)
    pl.axes["U"].tics(5,0)
    pl.axes["R"].tics(5,0)
    pl.draw_axes()
    pl.plot_label = "%s histogram" %(args.column[0])
    pl.draw_plot_label()

    box_width = 0.9 * step if len(data) == 1 else step/(len(data)+1.0)
    for i in range(len(data)):
        # ------------------ here we actually make a histogram -----------------
        h = HistogramDD(step,xmin,xmax)
        for j in data[i]:
            h.insert(j)
        h.normalize()
        x_data = []
        y_data = []
        for b in range(h.count_bins()):
            x_data.append(h.bin_min_val(b)+box_width*i)   # --- bars of the i-th file are shifted within a bin
            y_data.append(h.get_bin(b))
        print(h)
        pl.bars(x_data, y_data, width=step, title="hist%s"%(i))

    drawing.close()
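The binning and normalization performed before plotting can be sketched in plain Python. This is only an illustration: `normalized_histogram` is a hypothetical function, values outside the `[x_min, x_max)` range are ignored, and the bins are normalized so that their heights sum to 1, which is an assumption about what `HistogramDD.normalize()` does.

```python
import math


def normalized_histogram(values, bin_width, x_min, x_max):
    """Return the fraction of in-range values that falls into each bin."""
    n_bins = int(math.ceil((x_max - x_min) / bin_width))
    counts = [0] * n_bins
    for v in values:
        if x_min <= v < x_max:
            k = int((v - x_min) // bin_width)
            counts[min(k, n_bins - 1)] += 1   # guard against rounding at the upper edge
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


print(normalized_histogram([0.1, 0.2, 1.5, 2.9, 5.0], 1.0, 0.0, 3.0))
# prints [0.5, 0.25, 0.25]
```

Normalizing each series separately is what makes histograms from score files with different numbers of decoys comparable on a single plot.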
phi_psi.py¶
Calculates Phi, Psi dihedral angles of a given input PDB structure.
USAGE:
python3 phi_psi.py input.pdb
EXAMPLE:
python3 phi_psi.py 2gb1.pdb
Keywords:
- PDB input
- structural properties
Categories:
- core/calc/structural/evaluate_phi; core/calc/structural/evaluate_psi
Input files:
Output files:
Program source:
    import sys, math
    from pybioshell.core.data.io import Pdb
    from pybioshell.core.calc.structural import evaluate_phi, evaluate_psi

    if len(sys.argv) < 2 :
        print("""

    Calculates Phi, Psi dihedral angles of a given input PDB structure.

    USAGE:
        python3 phi_psi.py input.pdb

    EXAMPLE:
        python3 phi_psi.py 2gb1.pdb

    CATEGORIES: core/calc/structural/evaluate_phi; core/calc/structural/evaluate_psi
    KEYWORDS: PDB input; structural properties
    GROUP: Structure calculations
    IMG: Toluen_dihedral_flat_angle.png

        """)
        sys.exit()

    factor = 180.0 / math.pi    # --- converts radians to degrees
    structure = Pdb(sys.argv[1],"",False).create_structure(0)
    for code in structure.chain_codes() :
        chain = structure.get_chain(code)
        n_res = chain.count_aa_residues()
        # --- the first and the last residue are skipped: Phi needs the preceding residue, Psi the following one
        for i_res in range(1,n_res-1) :
            try :
                r = chain[i_res]
                r_prev = chain[i_res-1]
                r_next = chain[i_res+1]
                phi = evaluate_phi(r_prev,r)
                psi = evaluate_psi(r,r_next)
                print("%d %s %c %7.2f %7.2f" % (r.id(), r.residue_type().code3, r.owner().id(),phi*factor, psi*factor))
            except Exception :
                print("can't evaluate Phi/Psi at position",i_res, file=sys.stderr)
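Both Phi and Psi are ordinary dihedral angles over four consecutive backbone atoms: C(i-1), N(i), CA(i), C(i) for Phi and N(i), CA(i), C(i), N(i+1) for Psi. The sketch below shows one common way to compute a signed dihedral from four points in pure Python; `dihedral` is a hypothetical helper and is not claimed to match the internals of `evaluate_phi` or `evaluate_psi`.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) around the p1-p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # components of the outer bonds perpendicular to the central axis
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # prints 90.0
```

Using `atan2` rather than `acos` keeps the sign of the angle and stays numerically stable near 0 and 180 degrees.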

score_rms_plot.py¶
Reads Rosetta scorefiles and makes an energy vs. rms plot.
USAGE:
python3 score_rms_plot.py -f file1 file2 ... [-x from to -y from to]
EXAMPLE:
python3 score_rms_plot.py -f 1pgx-abinitio.fsc 1pgx-abinitio-bis.fsc
Keywords:
- Rosetta scorefile
- energy vs. rms plot
Categories:
- core/data/io/read_scorefile
Input files:
Output files:
Program source:
    import sys,argparse
    from pybioshell.core.data.io import read_scorefile
    from os.path import expanduser, join
    home_dir = expanduser("~")
    # It is assumed VisuaLife library is installed in your $HOME/src.git/visualife/src/ directory
    sys.path.append(join(home_dir, "src.git/visualife/src/"))
    from core.Plot import Plot
    from core.SvgViewport import SvgViewport
    from core.styles import color_by_name

    if len(sys.argv) < 2 :
        print("""

    Reads Rosetta scorefile and make an energy to rms plot.

    USAGE:
        python3 score_rms_plot.py -f file1 file2 ... [-x from to -y from to]

    EXAMPLE:
        python3 score_rms_plot.py -f 1pgx-abinitio.fsc 1pgx-abinitio-bis.fsc

    CATEGORIES: core/data/io/read_scorefile
    KEYWORDS: Rosetta scorefile; energy vs. rms plot
    GROUP: Statistics
    IMG: score_to_rms.svg

        """)
        sys.exit()

    #----------- argument parsing
    parser = argparse.ArgumentParser(description='Reads scorefile from Rosetta and prepares score to rms plot for all given files as a series in one picture')
    parser.add_argument('-f', '--file', help="input .fsc file", nargs='+', required=True)
    parser.add_argument('-x', '--x_range', help="range for X axis of a plot (two values: min max)", nargs=2, required=False)
    parser.add_argument('-y', '--y_range', help="range for Y axis of a plot (two values: min max)", nargs=2, required=False)
    args = parser.parse_args()

    energy = []
    rms = []
    allenergy = []
    allrms = []
    s=""

    #----------- filling lists with data from file
    print("Plotting score vs. rms for:",end="")
    for i in range(len(args.file)):
        s+=" "+args.file[i]
        p = read_scorefile(args.file[i])
        en = p.column_index("score")
        xrms = p.column_index("rms")
        e = []
        r = []
        for row in p:
            e.append(float(row[en]))
            r.append(float(row[xrms]))
        energy.append(e)
        rms.append(r)
        allenergy.extend(e)
        allrms.extend(r)

    #----------- plotting energy plot
    # --- axis ranges: taken from the command line or, by default, from the data with a small margin
    xfrom = float(args.x_range[0]) if args.x_range else min(allrms)-1
    xto = float(args.x_range[1]) if args.x_range else max(allrms)+1
    yfrom = float(args.y_range[0]) if args.y_range else min(allenergy)-10
    yto = float(args.y_range[1]) if args.y_range else max(allenergy)+10

    drawing = SvgViewport("outputs_from_test/score_to_rms.svg", 0, 0, 800, 650,color="white")
    pl = Plot(drawing,100,700,100,600,xfrom,xto,yfrom,yto,axes_definition="UBLR")

    stroke_color = color_by_name("SteelBlue").create_darker(0.3)
    pl.axes["B"].label = "rmsd"
    pl.axes["L"].label = "score"
    for key,ax in pl.axes.items() :
        ax.fill, ax.stroke, ax.stroke_width = stroke_color, stroke_color, 2.0
        ax.tics(0,5)
    pl.axes["U"].tics(5,0)
    pl.axes["R"].tics(5,0)
    pl.draw_axes()
    pl.plot_label = "score_to_rms"
    pl.draw_plot_label()
    print(s)

    # --- one scatter series per input file
    for i in range(len(energy)):
        pl.scatter(rms[i],energy[i], markersize=2, markerstyle='c', title="serie-%s"%(i))

    drawing.close()
seq_identity.py¶
Reads a .fasta file with a set of amino acid sequences and calculates all-vs-all pairwise alignments using a semi-global aligner. Prints only those pairs whose sequence identity is higher than the given cutoff.
USAGE:
python3 seq_identity.py input.fasta cutoff
EXAMPLE:
python3 seq_identity.py cyped.CYP109.fasta 0.3
REFERENCES:
Smith, Temple F., and Michael S. Waterman. “Identification of common molecular subsequences.” Journal of Molecular Biology 147.1 (1981): 195-197. https://doi.org/10.1016/0022-2836(81)90087-5
Needleman, Saul B., and Christian D. Wunsch. “A general method applicable to the search for similarities in the amino acid sequence of two proteins.” Journal of Molecular Biology 48.3 (1970): 443-453. https://doi.org/10.1016/0022-2836(70)90057-4
Keywords:
Categories:
- core/protocols/PairwiseSequenceIdentityProtocol
Input files:
Output files:
Program source:
import sys
from pybioshell.std import vector_std_shared_ptr_core_data_sequence_Sequence_t
from pybioshell.core.data.io import read_fasta_file
from pybioshell.core.alignment import AlignmentType
from pybioshell.core.protocols import PairwiseSequenceIdentityProtocol

if len(sys.argv) < 3 :
    print("""

Reads a .fasta file with a set of amino acid sequences and calculates all-vs-all pairwise alignments
using a semi-global aligner. Prints only those pairs whose sequence identity is higher than the given cutoff.

USAGE:
    python3 seq_identity.py input.fasta cutoff

EXAMPLE:
    python3 seq_identity.py cyped.CYP109.fasta 0.3

REFERENCES:
Smith, Temple F., and Michael S. Waterman. "Identification of common molecular subsequences."
Journal of Molecular Biology 147.1 (1981): 195-197. https://doi.org/10.1016/0022-2836(81)90087-5

Needleman, Saul B., and Christian D. Wunsch. "A general method applicable to the search for similarities
in the amino acid sequence of two proteins." Journal of Molecular Biology 48.3 (1970): 443-453. https://doi.org/10.1016/0022-2836(70)90057-4

CATEGORIES: core/protocols/PairwiseSequenceIdentityProtocol
KEYWORDS:   FASTA input; sequence alignment
GROUP:      Alignments
IMG: ap_aligned-1k6m-1bif.png
    """)
    sys.exit()

cutoff = float(sys.argv[2])
sequences = vector_std_shared_ptr_core_data_sequence_Sequence_t()
read_fasta_file(sys.argv[1], sequences, True)

align_protocol = PairwiseSequenceIdentityProtocol()
n_seq = align_protocol.add_input_sequences(sequences)
align_protocol.n_threads(4).alignment_method(AlignmentType.SEMIGLOBAL_ALIGNMENT)
align_protocol.run()

# --- visit every unordered pair (i, j) with j < i exactly once
for i in range(1,n_seq) :
    for j in range(i) :
        seq_id = align_protocol.get_sequence_identity(i,j)
        if seq_id > cutoff :
            print( i, j, seq_id)
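To make the reported quantity concrete, here is a minimal pure-Python sketch of how sequence identity can be computed from one pair of already aligned sequences. It does not use BioShell; the function name `sequence_identity` is hypothetical, and normalising by the number of columns in which both sequences have a residue is an assumption (the BioShell protocol may normalise differently).

```python
def sequence_identity(aligned_a, aligned_b, gap="-"):
    """Fraction of identical residues among columns where neither sequence has a gap.

    Assumption: identity is normalised by the number of aligned (gap-free) columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # a gapped column is not counted at all
        compared += 1
        if a == b:
            identical += 1
    return identical / compared if compared else 0.0


if __name__ == "__main__":
    # five gap-free columns, four of them identical
    print(sequence_identity("ACDE-F", "ACDQGF"))
```

A cutoff such as 0.3 in the example above would then keep only pairs for which this fraction exceeds 0.3.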

unwrap_pdb.py¶
Reads a PDB file with wrapped coordinates (from a simulation with periodic boundary conditions), unwraps them and prints the unwrapped structure in PDB format. The second argument is used as the length of the cubic simulation box.
USAGE:
python3 unwrap_pdb.py input_pbc.pdb cutoff
EXAMPLE:
python3 unwrap_pdb.py out_pbc.pdb 40
Keywords:
- PDB input
- PBC
- SURPASS
- Vec3Cubic
Categories:
- core/data/basic/Vec3Cubic
Input files:
Output files:
Program source:
import sys, math,copy
from pybioshell.core.data.io import find_pdb
from pybioshell.core.data.basic import Vec3Cubic
from pybioshell.std import vector_core_data_basic_Vec3Cubic

if len(sys.argv) < 3 :
    print("""

Reads a PDB file with wrapped coordinates (from a simulation with periodic boundary conditions),
unwraps them and prints the unwrapped structure in PDB format.

USAGE:
    python3 unwrap_pdb.py input_pbc.pdb cutoff

EXAMPLE:
    python3 unwrap_pdb.py out_pbc.pdb 40

CATEGORIES: core/data/basic/Vec3Cubic
KEYWORDS:   PDB input; PBC; SURPASS; Vec3Cubic
IMG: unwrapped.gif
    """)
    sys.exit()

pdb = find_pdb(sys.argv[1], "./")
n_atoms = pdb.count_atoms(0)
cutoff = float(sys.argv[2])       # --- used as the length of the cubic box
Vec3Cubic.set_box_len(cutoff)

for i_model in range(0, pdb.count_models()) :
    # --- a fresh vector of coordinates for every model
    xyz = vector_core_data_basic_Vec3Cubic()
    for i in range(n_atoms):
        xyz.append(Vec3Cubic())
    pdb.fill_structure(i_model, xyz)
    structure = pdb.create_structure(i_model)
    n_res = 0
    print("MODEL ",i_model+1 )
    for ia in range(structure.count_chains()):
        chain = structure[ia]
        #wrapping first atom of every chain to the first box
        if xyz[n_res ].x > cutoff: xyz[n_res ].x -= cutoff
        if xyz[n_res ].x < 0: xyz[n_res ].x += cutoff
        if xyz[n_res ].y > cutoff: xyz[n_res ].y -= cutoff
        if xyz[n_res ].y < 0: xyz[n_res ].y += cutoff
        if xyz[n_res ].z > cutoff: xyz[n_res ].z -= cutoff
        if xyz[n_res ].z < 0: xyz[n_res ].z += cutoff
        chain[0][0].set(xyz[n_res])
        #calculating unwrapped coordinates: each atom is placed next to the previous one
        for ir in range(chain.count_residues()-1):
            ax = xyz[n_res+ir+1].closest_delta_x(xyz[n_res+ir])
            ay = xyz[n_res+ir+1].closest_delta_y(xyz[n_res+ir])
            az = xyz[n_res+ir+1].closest_delta_z(xyz[n_res+ir])
            xyz[n_res+ir+1].x = xyz[n_res+ir].x + ax
            xyz[n_res+ir+1].y = xyz[n_res+ir].y + ay
            xyz[n_res+ir+1].z = xyz[n_res+ir].z + az
            chain[ir+1][0].set(xyz[n_res+ir+1])
        # --- move the offset to the first residue of the next chain
        n_res += chain.count_residues()
        #writing to PDB file
        for ir in range(chain.count_residues()) :
            resid = chain[ir]
            print(resid[0].to_pdb_line())
    print("ENDMDL")
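The unwrapping step above can be illustrated without BioShell. The sketch below is a plain-Python version of the same idea under stated assumptions: the box is cubic, consecutive atoms of a chain are never more than half a box length apart, and the helper names `closest_delta` and `unwrap_chain` are hypothetical (they only mimic what `Vec3Cubic.closest_delta_x` and friends appear to do in the script).

```python
def closest_delta(a, b, box_len):
    """Shortest signed displacement from b to a along one axis of a periodic box."""
    d = (a - b) % box_len          # folded into [0, box_len)
    if d > box_len / 2:
        d -= box_len               # the periodic image on the other side is closer
    return d


def unwrap_chain(coords, box_len):
    """Rebuild a connected chain from wrapped (x, y, z) coordinates.

    Every point is placed at the periodic image closest to the previously
    unwrapped point, so the chain no longer jumps across box boundaries.
    """
    unwrapped = [tuple(coords[0])]
    for point in coords[1:]:
        prev = unwrapped[-1]
        unwrapped.append(tuple(p + closest_delta(c, p, box_len)
                               for c, p in zip(point, prev)))
    return unwrapped


if __name__ == "__main__":
    wrapped = [(9.0, 5.0, 5.0), (1.0, 5.0, 5.0), (3.0, 5.0, 5.0)]
    for xyz in unwrap_chain(wrapped, 10.0):
        print(xyz)
```

In the usage part the second atom, wrapped to x = 1.0, is moved to its image just beyond the box edge, so the chain stays contiguous.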

align_sequences.py¶
Computes pairwise sequence alignment
USAGE:
python3 align_sequences.py input-1.fasta input-2.fasta [gap_open gap_cont]
EXAMPLE:
python3 align_sequences.py 2azaA.pdb 2pcyA.pdb
Keywords:
- alignment
Categories:
- core/alignment/SequenceNWAligner
Input files:
Output files:
Program source:
import sys
from pybioshell.core.alignment import SequenceNWAligner, SequenceSWAligner
from pybioshell.core.data.io import read_fasta_file

if len(sys.argv) < 3 :
    print("""

Computes pairwise sequence alignment

USAGE:
    python3 align_sequences.py input-1.fasta input-2.fasta [gap_open gap_cont]

EXAMPLE:
    python3 align_sequences.py 2azaA.pdb 2pcyA.pdb

CATEGORIES: core/alignment/SequenceNWAligner
KEYWORDS:   alignment
GROUP:      Sequence calculations
    """)
    sys.exit()

q = read_fasta_file(sys.argv[1])[0]
t = read_fasta_file(sys.argv[2])[0]

# ---------- calculate a global alignment
aligner = SequenceNWAligner(max(q.length(), t.length()))
# ---------- calculate a local alignment
#aligner = SequenceSWAligner(max(q.length(), t.length()))

# ---------- default gap penalties are used unless both values are given
if len(sys.argv) < 5 :
    score = aligner.align(q, t, -10, -1, "BLOSUM62")
else:
    score = aligner.align(q, t, int(sys.argv[3]), int(sys.argv[4]), "BLOSUM62")
alignment = aligner.backtrace_sequence_alignment()
print("# score:", score)
print("> query\n" + alignment.get_aligned_query())
print("> template\n" + alignment.get_aligned_template())
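For readers who want to see what a global aligner does internally, the following is a small Needleman-Wunsch sketch in plain Python. It is a simplification and not the BioShell implementation: it assumes a match/mismatch score instead of BLOSUM62 and a single linear gap penalty instead of separate gap-opening and gap-extension penalties. The function name `needleman_wunsch` and its parameters are hypothetical.

```python
def needleman_wunsch(query, template, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_query, aligned_template).
    """
    n, m = len(query), len(template)
    # score[i][j] holds the best score for query[:i] aligned with template[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if query[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # align two residues
                              score[i - 1][j] + gap,     # gap in template
                              score[i][j - 1] + gap)     # gap in query

    # backtrace from the bottom-right corner to recover one optimal alignment
    aligned_q, aligned_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if query[i - 1] == template[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                aligned_q.append(query[i - 1])
                aligned_t.append(template[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(template[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(aligned_q)), "".join(reversed(aligned_t))


if __name__ == "__main__":
    total, aligned_query, aligned_template = needleman_wunsch("GATTACA", "GATACA")
    print("# score:", total)
    print("> query\n" + aligned_query)
    print("> template\n" + aligned_template)
```

Swapping in a substitution matrix would only change how `s` is looked up; affine gaps require additional matrices and are left out to keep the sketch short.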

asn1_to_profile.py¶
Converts a sequence profile (in ASN.1 format) produced by psiblast to a flat tabular format
USAGE:
python asn1_to_profile.py input.asn1
EXAMPLE:
python asn1_to_profile.py d1or4A_.asn1
Keywords:
Categories:
- core/data/sequence/read_ASN1_checkpoint
Input files:
- d1or4A_.asn1
Output files:
Program source:
import sys
from pybioshell.core.data.sequence import read_ASN1_checkpoint

if len(sys.argv) < 2 :
    print("""

Converts a sequence profile (in ASN.1 format) produced by psiblast to a flat tabular format

USAGE:
    python asn1_to_profile.py input.asn1

EXAMPLE:
    python asn1_to_profile.py d1or4A_.asn1

CATEGORIES: core/data/sequence/read_ASN1_checkpoint
KEYWORDS:   sequence profile; Format conversion
GROUP:      File processing; Format conversion
    """)
    sys.exit()

profile = read_ASN1_checkpoint(sys.argv[1])
profile.write_table_header()
profile.write_table()

betastructures_graph.py¶
Shows how to use BetaStructuresGraph class from Python
USAGE:
python3 betastructures_graph.py input.pdb
EXAMPLE:
python3 betastructures_graph.py 5edw.pdb
Keywords:
Categories:
- core/calc/structural/ProteinArchitecture
Input files:
Output files:
Program source:
import sys
from pybioshell.core.data.io import Pdb
from pybioshell.core.calc.structural import ProteinArchitecture

if len(sys.argv) < 2 :
    print("""

Shows how to use BetaStructuresGraph class from Python

USAGE:
    python3 betastructures_graph.py input.pdb

EXAMPLE:
    python3 betastructures_graph.py 5edw.pdb

CATEGORIES: core/calc/structural/ProteinArchitecture
KEYWORDS:   PDB input; graphs
    """)
    sys.exit()

pdb_fname = sys.argv[1]
structure = Pdb(pdb_fname, "").create_structure(0)
architecture = ProteinArchitecture(structure)
beta_graph = architecture.create_strand_graph()
strands = beta_graph.get_strands_copy()

# --- header row of the strand pairing matrix
print("     ",end ="")
for i in range(len(strands)) :
    print(" S%2d " % (i),end ="")
print()

# --- one row per strand; X marks a strand it is paired with
for i in range(len(strands)) :
    pairings = []
    print("S%2d: " % (i),end ="")
    for j in range(len(strands)) :
        are_paired = False
        try :
            # the call raises an exception when the two strands are not paired
            p = beta_graph.get_strand_pairing(strands[i],strands[j])
            are_paired = True
        except Exception:
            pass
        if are_paired :
            print("  X  ",end ="")
            pairings.append(j)
        else :
            print("     ",end ="")
    # --- a strand may have no partners at all, so never index pairings[0] blindly
    partners = ", ".join(str(pj) for pj in pairings) if pairings else "none"
    print("| %-45s paired with %s" % (str(strands[i]), partners))

caonly_multimodel.py¶
Reads multiple PDB files and writes the C-alpha atoms of all structures into a single multimodel PDB file. The input file is a simple text file providing PDB file names (one file name per line).
USAGE:
python3 caonly_multimodel.py input_structures_list [output_fname.pdb]
EXAMPLE:
python3 caonly_multimodel.py cat_lits o.pdb
Keywords:
Categories:
- core/data/io/Pdb
Input files:
Output files:
Program source:
import sys
from pybioshell.core.data.io import Pdb, write_pdb

if len(sys.argv) < 2 :
    print("""

Reads multiple PDB files and writes the C-alpha atoms of all structures into a single multimodel PDB file.
The input file is a simple text file providing PDB file names (one file name per line).

USAGE:
    python3 caonly_multimodel.py input_structures_list [output_fname.pdb]

EXAMPLE:
    python3 caonly_multimodel.py cat_lits o.pdb

CATEGORIES: core/data/io/Pdb
KEYWORDS:   PDB input; structure selectors; PDB output
GROUP:      File processing
    """)
    sys.exit()

input_fnames = open(sys.argv[1])
# --- the first file on the list defines the structure; "is_ca" keeps C-alpha atoms only
reader = Pdb(input_fnames.readline().strip(),"is_ca",False)
structure = reader.create_structure(0)
out_fname = "out.pdb" if len(sys.argv) == 2 else sys.argv[2]
write_pdb(structure, out_fname, 1)
i_model = 2
print("Reading PDB files: ",end="")
# --- every following file only overwrites coordinates of that structure
for pdb_fname in input_fnames :
    print(pdb_fname.strip().split("/")[-1],end=" ")
    reader = Pdb(pdb_fname.strip(),"is_ca",False)
    reader.fill_structure(0,structure)
    write_pdb(structure, out_fname, i_model)
    i_model += 1

center_protein.py¶
Moves a given protein structure so its geometric center is located at (0,0,0).
USAGE:
python3 center_protein.py input.pdb [which_model]
EXAMPLE:
python3 center_protein.py 2kwi.pdb 51
Keywords:
- PDB input
- center protein
- internal coordinates
Categories:
- core/data/io/find_pdb
Input files:
Output files:
Program source:
import sys
from pybioshell.core.data.io import find_pdb

if len(sys.argv) < 2 :
    print("""

Moves a given protein structure so its geometric center is located at (0,0,0).

USAGE:
    python3 center_protein.py input.pdb [which_model]

EXAMPLE:
    python3 center_protein.py 2kwi.pdb 51

CATEGORIES: core/data/io/find_pdb
KEYWORDS:   PDB input; center protein; internal coordinates
GROUP:      Structure calculations;
    """)
    sys.exit()

which_model = 0 if len(sys.argv) == 2 else (int(sys.argv[2])-1)
pdb_reader = find_pdb(sys.argv[1], "./")
structure = pdb_reader.create_structure(which_model)

# --- first pass: accumulate coordinates of all atoms
cx, cy, cz, n = 0, 0, 0, 0
for ic in range(structure.count_chains()) :
    chain = structure[ic]
    for ir in range(chain.count_residues()) :
        resid = chain[ir]
        for ai in range(resid.count_atoms()) :
            cx += resid[ai].x
            cy += resid[ai].y
            cz += resid[ai].z
            n+=1.0
cx /= n
cy /= n
cz /= n
print("# Center was:",cx,cy,cz)

# --- second pass: shift every atom and print it
for ic in range(structure.count_chains()) :
    chain = structure[ic]
    for ir in range(chain.count_residues()) :
        resid = chain[ir]
        for ai in range(resid.count_atoms()) :
            resid[ai].x -= cx
            resid[ai].y -= cy
            resid[ai].z -= cz
            print(resid[ai].to_pdb_line())
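The same two-pass idea (average all coordinates, then subtract the average) can be shown on bare coordinate tuples. This is only an illustrative sketch: `geometric_center` and `center_at_origin` are hypothetical helper names, and atoms are assumed to be unweighted, as in the script above.

```python
def geometric_center(coords):
    """Unweighted mean of a non-empty list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))


def center_at_origin(coords):
    """Return a shifted copy of coords whose geometric center is (0, 0, 0)."""
    cx, cy, cz = geometric_center(coords)
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


if __name__ == "__main__":
    atoms = [(1.0, 2.0, 3.0), (3.0, 4.0, 5.0)]
    print("# Center was:", *geometric_center(atoms))
    for xyz in center_at_origin(atoms):
        print(xyz)
```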

check_structure.py¶
Checks if a given structure has chain breaks
USAGE:
python3 check_structure.py input.pdb
EXAMPLE:
python3 check_structure.py 2gb1.pdb
Keywords:
Categories:
- core/calc/structural/
Input files:
Output files:
Program source:
import sys
sys.path.append("/Users/dgront/src.git/bioshell/bin")
from pybioshell.core.data.io import Pdb

if len(sys.argv) < 2 :
    print("""

Checks if a given structure has chain breaks

USAGE:
    python3 check_structure.py input.pdb

EXAMPLE:
    python3 check_structure.py 2gb1.pdb

CATEGORIES: core/calc/structural/
KEYWORDS:   PDB input; structural properties
GROUP:      Structure calculations
    """)
    sys.exit()

N_gaps = 0
structure = Pdb(sys.argv[1],"",False).create_structure(0)
for code in structure.chain_codes() :
    chain = structure.get_chain(code)
    r = chain[0]
    the_ca = r.find_atom(" CA ")
    for ires in range(1,chain.size()):
        # --- always advance to the next residue, even if a CA atom is missing
        r_prev, prev_ca = r, the_ca
        r = chain[ires]
        the_ca = r.find_atom(" CA ")
        if not prev_ca or not the_ca: continue
        d = the_ca.distance_to(prev_ca)
        if d > 4.0:
            N_gaps += 1
            print("chain %c: too long distance between CA of %s%d and %s%d residue: %6.3f" %
                (r.owner().id(), r_prev.residue_type().code3, r_prev.id(), r.residue_type().code3, r.id(), d))

print("# Summary for %s: n_gaps - %d" % (sys.argv[1], N_gaps))
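The chain-break test itself is a simple distance check between consecutive C-alpha atoms. Below is a minimal stand-alone sketch of that check; it assumes the input is a list of `(residue_id, (x, y, z))` pairs for the CA atoms of one chain, and the function name `find_chain_breaks` is hypothetical. The 4.0 Angstrom threshold is the one used by the script.

```python
import math


def find_chain_breaks(ca_atoms, max_distance=4.0):
    """Report consecutive residue pairs whose CA-CA distance exceeds max_distance.

    ca_atoms: list of (residue_id, (x, y, z)) in chain order.
    Returns a list of (previous_id, next_id, distance) tuples.
    """
    breaks = []
    for (id_prev, xyz_prev), (id_next, xyz_next) in zip(ca_atoms, ca_atoms[1:]):
        d = math.dist(xyz_prev, xyz_next)   # Euclidean distance, Python 3.8+
        if d > max_distance:
            breaks.append((id_prev, id_next, d))
    return breaks


if __name__ == "__main__":
    chain = [(1, (0.0, 0.0, 0.0)), (2, (3.8, 0.0, 0.0)), (3, (10.0, 0.0, 0.0))]
    for id_prev, id_next, d in find_chain_breaks(chain):
        print("too long distance between CA of %d and %d: %6.3f" % (id_prev, id_next, d))
```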

cif_to_mol2.py¶
Converts a small molecule structure from CIF to MOL2 file format. The last, optional parameter of the script provides the name of the molecule that will be stored in the MOL2 file.
USAGE:
python3 cif_to_mol2.py input.cif [molecule_name]
EXAMPLE:
python3 cif_to_mol2.py HEM.cif
python3 cif_to_mol2.py HEM.cif HAEM
Keywords:
- CIF input
- MOL2 output
- Format conversion
Categories:
- core/data/io/write_mol2
Input files:
Output files:
Program source:
import sys
from pybioshell.core.data.io import Cif, write_mol2
from pybioshell.core.chemical import MonomerStructure

if len(sys.argv) < 2 :
    print("""

Converts a small molecule structure from CIF to MOL2 file format. The last, optional parameter of the script
provides the name of the molecule that will be stored in the MOL2 file.

USAGE:
    python3 cif_to_mol2.py input.cif [molecule_name]

EXAMPLE:
    python3 cif_to_mol2.py HEM.cif
    python3 cif_to_mol2.py HEM.cif HAEM

CATEGORIES: core/data/io/write_mol2
KEYWORDS:   CIF input; MOL2 output; Format conversion
GROUP:      File processing; Format conversion
    """)
    sys.exit()

mm = MonomerStructure.from_cif(sys.argv[1])
if len(sys.argv) > 2:   # --- set molecule name if provided from command line
    mm.molecule_name = sys.argv[2]
write_mol2(mm, "stdout")

convert_msa.py¶
Converts multiple sequence alignment (MSA) data from one format to another. Known input formats: FASTA (.fasta), HSSP (.hssp) and ClustalW/ClustalO (.aln). Known output formats: FASTA (.fasta). Input and output file formats are detected by file extension.
USAGE:
python3 convert_msa.py input_file output_file
EXAMPLE:
python3 convert_msa.py cyped.CYP109.aln cyped.CYP109.fasta
Keywords:
Categories:
- core/data/io/read_hssp_file
Input files:
Output files:
Program source:
import sys
from pybioshell.std import vector_std_shared_ptr_core_data_sequence_Sequence_t
from pybioshell.core.data.io import read_hssp_file, write_clustalo_file, read_fasta_file, read_clustalw_file
from pybioshell.utils import LogManager

LogManager.FINEST()

if len(sys.argv) < 3 :
    print("""

Converts multiple sequence alignment (MSA) data from one format to another.
Known input formats: FASTA (.fasta), HSSP (.hssp) and ClustalW/ClustalO (.aln).
Known output formats: FASTA (.fasta).
Input and output file formats are detected by file extension.

USAGE:
    python3 convert_msa.py input_file output_file

EXAMPLE:
    python3 convert_msa.py cyped.CYP109.aln cyped.CYP109.fasta

CATEGORIES: core/data/io/read_hssp_file
KEYWORDS:   MSA; Format conversion
GROUP:      File processing;
    """)
    sys.exit()

msa = vector_std_shared_ptr_core_data_sequence_Sequence_t()  # --- vector to hold sequences obtained from an input file
extension_in = sys.argv[1].split('.')[-1]                    # --- detect the input format
if extension_in == 'aln':
    read_clustalw_file(sys.argv[1], msa)
elif extension_in == 'hssp':
    read_hssp_file(sys.argv[1], msa)
elif extension_in == 'fasta':
    read_fasta_file(sys.argv[1], msa)

# --- write every sequence as a FASTA record
f = open(sys.argv[2],"w")
for seq in msa:
    print(">", seq.header(), file=f)
    print(seq.sequence, file=f)
f.close()

crmsd_on_ligands.py¶
Calculates the cRMSD (coordinate Root-Mean-Square Deviation) value on all atoms of ligand conformations. This script reads one or more PDB files, extracts all ligands that match a given three-letter code and calculates the cRMSD between them on all atoms.
USAGE:
python3 crmsd_on_ligands.py ligand input1.pdb [input2.pdb]
EXAMPLE:
python3 crmsd_on_ligands.py HEM 5ofq.pdb 4rm4.pdb
Keywords:
Categories:
- core/calc/structural/transformations/CrmsdOnVec3
Input files:
Output files:
Program source:
import sys, math
from pybioshell.core.data.io import Pdb
from pybioshell.std import vector_core_data_basic_Vec3
from pybioshell.core.calc.structural.transformations import *
from pybioshell.utils import LogManager

LogManager.INFO()

if len(sys.argv) < 3 :
    print("""

Calculates the cRMSD (coordinate Root-Mean-Square Deviation) value on all atoms of ligand conformations.
This script reads one or more PDB files, extracts all ligands that match a given three-letter code
and calculates the cRMSD between them on all atoms.

USAGE:
    python3 crmsd_on_ligands.py ligand input1.pdb [input2.pdb]

EXAMPLE:
    python3 crmsd_on_ligands.py HEM 5ofq.pdb 4rm4.pdb

CATEGORIES: core/calc/structural/transformations/CrmsdOnVec3
KEYWORDS:   PDB input; ligand; crmsd
GROUP:      Structure calculations
    """)
    sys.exit()

rms = CrmsdOnVec3()
pdb_codes = []          # --- PDB code for every input structure: for informative output
structures = []         # --- contains all input structures
residues = []           # --- ligand residue objects (to keep information about residue number and chain)
atoms_by_ligand = []    # --- a list of atoms for every ligand

for pdb_code in sys.argv[2:] :
    pdb = Pdb(pdb_code,"",False)
    structure = pdb.create_structure(0)
    structures.append(structure)
    for ic in range(structure.count_chains()) :
        chain = structure[ic]
        # --- ligands are expected after the terminal residue of a chain
        for ir in range(chain.terminal_residue_index() + 1,chain.size()) :
            resid = chain[ir]
            code3 = resid.residue_type().code3
            if code3 != sys.argv[1] : continue      # Skip other ligands
            residues.append(resid)
            pdb_codes.append(pdb_code)
            atoms = vector_core_data_basic_Vec3()
            for ia in range(resid.count_atoms()):
                atoms.append(resid[ia])
            atoms_by_ligand.append(atoms)

# --- every pair of ligands is compared once
for i_ligand in range(0, len(atoms_by_ligand)) :
    ir = residues[i_ligand]
    for j_ligand in range(i_ligand):
        jr = residues[j_ligand]
        crmsd_val = rms.crmsd(atoms_by_ligand[i_ligand], atoms_by_ligand[j_ligand], len(atoms_by_ligand[j_ligand]))
        print("%s %4d %3s %c - %s %4d %3s %c : %7.3f" % (pdb_codes[i_ligand], ir.id(), ir.residue_type().code3, ir.owner().id(),
            pdb_codes[j_ligand], jr.id(), jr.residue_type().code3, jr.owner().id(), crmsd_val))

fasta_subset.py¶
Reads one or more FASTA files and prints a randomly selected fraction of the sequences.
USAGE:
python3 fasta_subset.py fraction input.fasta [input2.fasta ...]
EXAMPLE:
python3 fasta_subset.py 0.01 small500_95identical.fasta
Keywords:
Categories:
- core/data/io/read_fasta_file
Input files:
Output files:
Program source:
import sys
from random import random, seed
from pybioshell.core.data.io import read_fasta_file, create_fasta_string

if len(sys.argv) < 3 :
    print("""

Reads one or more FASTA files and prints a randomly selected fraction of the sequences.

USAGE:
    python3 fasta_subset.py fraction input.fasta [input2.fasta ...]

EXAMPLE:
    python3 fasta_subset.py 0.01 small500_95identical.fasta

CATEGORIES: core/data/io/read_fasta_file
KEYWORDS:   FASTA input; sequence
GROUP:      File processing; Data filtering
    """)
    sys.exit()

seed(0)     # --- fixed seed: the same subset is printed on every run
fasta = read_fasta_file(sys.argv[2])
for fname in sys.argv[3:] :
    read_fasta_file(fname,fasta)    # --- append sequences from the remaining files

fraction = float(sys.argv[1])
for seq in fasta:
    if random() < fraction :
        print(create_fasta_string(seq))
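The selection rule used above keeps each sequence independently with probability `fraction`, so the size of the output is only approximately `fraction` times the input size. A small stand-alone sketch of that rule, using only the Python standard library, is given below; `random_subset` is a hypothetical helper and the records are plain strings rather than BioShell sequence objects.

```python
import random


def random_subset(records, fraction, seed=0):
    """Keep each record independently with probability `fraction`.

    A private, seeded generator makes the selection reproducible
    without touching the global random state.
    """
    rng = random.Random(seed)
    return [rec for rec in records if rng.random() < fraction]


if __name__ == "__main__":
    names = ["seq%03d" % i for i in range(500)]
    picked = random_subset(names, 0.01)
    print(len(picked), "of", len(names), "records selected")
```

If an exact number of sequences is needed instead, `random.sample` from the standard library draws a fixed-size subset.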

filter_scorefile.py¶
Reads a Rosetta scorefile and prints only the requested columns
EXAMPLE:
python3 filter_scorefile.py -f 1pgx-abinitio.fsc 1pgx-abinitio-bis.fsc -c score rms
Call python3 filter_scorefile.py -h for full help
Keywords:
- Rosetta scorefile
Categories:
- core/data/io/read_scorefile
Input files:
Output files:
Program source:
import sys, argparse
from pybioshell.core.data.io import read_scorefile

if len(sys.argv) < 2 :
    print("""

Reads a Rosetta scorefile and prints only the requested columns

EXAMPLE:
    python3 filter_scorefile.py -f 1pgx-abinitio.fsc 1pgx-abinitio-bis.fsc -c score rms

Call python3 filter_scorefile.py -h for full help

CATEGORIES: core/data/io/read_scorefile
KEYWORDS:   Rosetta scorefile;
GROUP: Statistics;
    """)
    sys.exit()

# -----------argument parsing
parser = argparse.ArgumentParser(description="Reads a Rosetta scorefile and prints only the requested columns")
parser.add_argument('-f', '--file', help="input .fsc file", nargs='+', required=True)
parser.add_argument('-c', '--column', help="column name(s) to keep", nargs='+', required=False, default=["score", "rms"])
parser = parser.parse_args()

columns = [col_name for col_name in parser.column]

# ---------- Print scorefile header
print("SCORE: ", end="")
for col_name in columns:
    print(col_name, end=" ")
print()

# ---------- Print scorefile data
for file_name in parser.file:
    sf = read_scorefile(file_name)
    for i_row in range(len(sf)) :
        row = sf[i_row]
        print("SCORE: ",end="")
        for col_name in columns:
            print(row[sf.column_index(col_name)],end=" ")
        print()
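The column selection can also be sketched without BioShell. The example below assumes the layout that the script itself prints: whitespace-separated lines prefixed with `SCORE:`, where the first such line holds the column names. The function name `select_columns` is hypothetical, and real scorefiles may contain details this sketch ignores.

```python
def select_columns(lines, columns):
    """Extract the requested columns from scorefile-like lines.

    Only lines starting with the 'SCORE:' tag are used; the first of them
    is treated as the header that names the columns.
    Returns a list of rows, each a list of strings in the requested order.
    """
    indices = None
    rows = []
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0] != "SCORE:":
            continue                      # skip SEQUENCE: lines, blanks, etc.
        values = tokens[1:]
        if indices is None:
            # raises ValueError if a requested column does not exist
            indices = [values.index(name) for name in columns]
            continue
        rows.append([values[i] for i in indices])
    return rows


if __name__ == "__main__":
    scorefile = [
        "SEQUENCE: ",
        "SCORE: score rms description",
        "SCORE: -45.2 3.1 S_0001",
        "SCORE: -50.7 1.9 S_0002",
    ]
    print("SCORE:", "rms", "score")
    for row in select_columns(scorefile, ["rms", "score"]):
        print("SCORE:", *row)
```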

find_rings.py¶
Reads in a small molecule (PDB file format) and prints all cycles (i.e. rings) that can be found. Note that rings may be nested, e.g. a naphthalene molecule actually has three rings!
USAGE:
python3 find_rings.py molecule.pdb
EXAMPLE:
python3 find_rings.py 9ZB_ideal.pdb
Keywords:
- PDB input
- small molecules
Categories:
- core/chemical/
Input files:
Output files:
Program source:
import sys
from pybioshell.core.chemical import PdbMolecule
from pybioshell.core.chemical import find_rings

if len(sys.argv) < 2 :
    print("""

Reads in a small molecule (PDB file format) and prints all cycles (i.e. rings) that can be found.
Note that rings may be nested, e.g. a naphthalene molecule actually has three rings!

USAGE:
    python3 find_rings.py molecule.pdb

EXAMPLE:
    python3 find_rings.py 9ZB_ideal.pdb

CATEGORIES: core/chemical/
KEYWORDS:   PDB input; small molecules
GROUP: small molecules;
    """)
    sys.exit()

mol = PdbMolecule.from_pdb(sys.argv[1])
rings = find_rings(mol)
# --- print atoms of every ring as PDB lines
for ring in rings:
    print("# ------------")
    for atom in ring:
        print(mol.get_atom(atom).to_pdb_line())
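The remark about naphthalene can be checked with a small graph search. The sketch below enumerates all simple cycles of a molecular graph given as a list of bonds; it is a brute-force illustration suitable for small molecules only and is not the algorithm used by BioShell's `find_rings`. The function name `find_cycles` and the integer atom numbering are hypothetical.

```python
def find_cycles(bonds):
    """Enumerate all simple cycles of an undirected graph.

    bonds: iterable of (a, b) pairs of comparable atom identifiers.
    Returns a list of cycles, each given as a list of atoms in ring order.
    """
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    found = {}  # set of bonds in a cycle -> one path realising that cycle

    def extend(path, visited):
        start, last = path[0], path[-1]
        for nxt in neighbors[last]:
            if nxt == start and len(path) >= 3:
                # closed a ring; the bond set removes the duplicate found
                # when the same ring is walked in the opposite direction
                key = frozenset(frozenset((path[k], path[(k + 1) % len(path)]))
                                for k in range(len(path)))
                found.setdefault(key, list(path))
            elif nxt not in visited and nxt > start:
                # only atoms larger than the start are allowed, so every
                # ring is searched from its smallest atom only
                extend(path + [nxt], visited | {nxt})

    for atom in sorted(neighbors):
        extend([atom], {atom})
    return list(found.values())


if __name__ == "__main__":
    # naphthalene carbon skeleton: two six-membered rings fused at bond 4-5
    naphthalene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
                   (4, 6), (6, 7), (7, 8), (8, 9), (9, 5)]
    for ring in find_cycles(naphthalene):
        print("# ------------", len(ring), "atoms:", ring)
```

For naphthalene the search reports the two six-membered rings and the ten-membered outer ring, which is the "three rings" case mentioned above.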

hhpred_to_modeller.py¶
Reads an output file produced by HHPred that contains alignments between a query protein and template protein structures. Writes the PIR input files that Modeller needs to build structural models of the query based on a given alignment (by default the first alignment is used).
USAGE:
python3 hhpred_to_modeller.py hhpred_output [which-alignment [other alignments ... ] ]
EXAMPLE:
python3 hhpred_to_modeller.py CYP51F.hhpred 1 2
Keywords:
- HHPred
- comparative modelling
Categories:
- core/data/io/read_hhpred
Input files:
Output files:
Program source:
import sys
from pybioshell.core.data.io import read_hhpred, create_pir_string

if len(sys.argv) < 2 :
    print("""

Reads an output file produced by HHPred that contains alignments between a query protein and template
protein structures. Writes the PIR input files that Modeller needs to build structural models of the query
based on a given alignment (by default the first alignment is used).

USAGE:
    python3 hhpred_to_modeller.py hhpred_output [which-alignment [other alignments ... ] ]

EXAMPLE:
    python3 hhpred_to_modeller.py CYP51F.hhpred 1 2

CATEGORIES: core/data/io/read_hhpred
KEYWORDS:   HHPred; comparative modelling
GROUP:      File processing; Format conversion
    """)
    sys.exit()

alignments = read_hhpred(sys.argv[1])
which_ali = sys.argv[2:] if len(sys.argv) > 2 else [1]
print(len(alignments),"alignments found in",sys.argv[1])
for i in which_ali :
    i = int(i)      # --- alignments are numbered from 1 on the command line
    print("retrieving alignment:",i,"as %d.pir" % (i))
    f = open("%d.pir" % (i), "w")
    f.write(create_pir_string(alignments[i-1], 80))
    f.close()

ligand_contacts.py¶
Finds contacts between a ligand molecule and a protein. This script reads one or more PDB files, extracts all ligands that match a given three-letter code and finds contacts between each ligand molecule and the protein within a given distance cutoff. The script can also detect plausible hydrogen bonds between a ligand and a protein, but the user must provide two JSON dictionaries: one of hydrogen bond donors and one of acceptors. Use ‘-’ (dash character) to omit either of the two files and provide just one of them. Note that both JSON files must have the .json extension; otherwise the script will attempt to load them as PDB files.
USAGE:
python3 ligand_contacts.py ligand distance [donors.json acceptors.json] input.pdb [input2.pdb]
EXAMPLE:
python3 ligand_contacts.py HEM 3.5 5ofq.pdb 4rm4.pdb
python3 ligand_contacts.py TDZ 3.5 donors.json acceptors.json 2vn0.pdb
Keywords:
Categories:
- core/data/io/Pdb
Input files:
Output files:
Program source:
import sys, math, json
from pybioshell.core.data.io import Pdb
from pybioshell.utils import LogManager
from pybioshell.core.chemical import monomer_type_name, HydrogenBondFilter

LogManager.INFO()

if len(sys.argv) < 4:
    print("""

Finds contacts between a ligand molecule and a protein. This script reads one or more PDB files,
extracts all ligands that match a given three-letter code and finds contacts between each ligand molecule
and the protein within a given distance cutoff.

The script can also detect plausible hydrogen bonds between a ligand and a protein, but the user must provide
two JSON dictionaries: one of hydrogen bond donors and one of acceptors. Use '-' (dash character) to omit
either of the two files and provide just one of them.

Note that both JSON files must have the .json extension; otherwise the script will attempt
to load them as PDB files.

USAGE:
    python3 ligand_contacts.py ligand distance [donors.json acceptors.json] input.pdb [input2.pdb]

EXAMPLE:
    python3 ligand_contacts.py HEM 3.5 5ofq.pdb 4rm4.pdb
    python3 ligand_contacts.py TDZ 3.5 donors.json acceptors.json 2vn0.pdb

CATEGORIES: core/data/io/Pdb
KEYWORDS:   PDB input; ligand; structural properties
GROUP:      Structure calculations
    """)
    sys.exit()

cutoff = float(sys.argv[2])
first_pdb = 3
extra_acceptors, extra_donors = {}, {}
if sys.argv[3] == '-':
    first_pdb = 5
elif sys.argv[3].endswith(".json"):
    extra_donors = json.loads(open(sys.argv[3]).read())
    first_pdb = 5
# --- the fifth argument exists only when more than one file was given
if len(sys.argv) > 4 and sys.argv[4].endswith(".json"):
    extra_acceptors = json.loads(open(sys.argv[4]).read())

hb_filter = HydrogenBondFilter()
for code3 in extra_acceptors.keys():
    for atom_name in extra_acceptors[code3]:
        hb_filter.add_acceptor_definition(code3, atom_name)
for code3 in extra_donors.keys():
    for atom_name in extra_donors[code3]:
        hb_filter.add_donor_definition(code3, atom_name)

for pdb_code in sys.argv[first_pdb:]:  # --- Iterate over PDB input files
    if len(sys.argv[first_pdb:]) > 1:
        print("# Pdb file %s" % (pdb_code.split("/")[-1].split(".")[0]))
    pdb = Pdb(pdb_code, "", False)
    print(" ---- ligand ---- | --------- partner -------- | distance")
    print("c res id atname | c res id type atname | in Angstrom")
    for m in range(pdb.count_models()):  # --- Iterate over all models in the input file
        if pdb.count_models() > 1:
            print("# Model %d" % (m + 1))
        structure = pdb.create_structure(m)
        for ic in range(structure.count_chains()):
            lig_chain = structure[ic]
            for ir in range(lig_chain.count_residues()):
                ligand = lig_chain[ir]
                code3 = ligand.residue_type().code3
                if code3 != sys.argv[1]: continue  # Skip other ligands
                for iic in range(structure.count_chains()):
                    other_chain = structure[iic]
                    for r in range(other_chain.count_residues()):  # ----Iterate over residues
                        res = other_chain[r]
                        if res == ligand: continue
                        d = res.min_distance(ligand)
                        # ---- If residue is close enough to ligand
                        if d < cutoff:
                            for ilig in range(ligand.count_atoms()):
                                for ioth in range(res.count_atoms()):
                                    ligand_atom = ligand[ilig]
                                    other_atom = res[ioth]
                                    if ligand_atom.distance_to(other_atom) <= cutoff:
                                        extras = ""
                                        if hb_filter(ligand_atom, other_atom,ligand_atom.distance_to(other_atom)):
                                            extras = "HYDROGEN_BOND"
                                        print("%s %3s %4d %4s %s %3s %4d %6s %4s %6.3f %s" % (ligand.owner().id(),
                                            ligand.residue_type().code3, ligand.id(), ligand_atom.atom_name(),
                                            res.owner().id(), res.residue_type().code3, res.id(),
                                            monomer_type_name(res.residue_type()), other_atom.atom_name(),
                                            ligand_atom.distance_to(other_atom), extras))
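At its core, the contact search is an all-pairs distance test between ligand atoms and atoms of the surrounding residues. The sketch below shows only that test on plain tuples; it assumes atoms are given as `(atom_name, (x, y, z))` pairs, the function name `find_contacts` is hypothetical, and the hydrogen bond filtering done by the script is deliberately left out.

```python
import math


def find_contacts(ligand_atoms, partner_atoms, cutoff):
    """List atom pairs that are no farther apart than `cutoff` (in Angstroms).

    Both inputs are lists of (atom_name, (x, y, z)) pairs.
    Returns a list of (ligand_atom_name, partner_atom_name, distance) tuples.
    """
    contacts = []
    for lig_name, lig_xyz in ligand_atoms:
        for other_name, other_xyz in partner_atoms:
            d = math.dist(lig_xyz, other_xyz)
            if d <= cutoff:
                contacts.append((lig_name, other_name, d))
    return contacts


if __name__ == "__main__":
    ligand = [("FE", (0.0, 0.0, 0.0))]
    residue = [("SG", (2.3, 0.0, 0.0)), ("CA", (6.0, 0.0, 0.0))]
    for lig_name, other_name, d in find_contacts(ligand, residue, 3.5):
        print("%4s %4s %6.3f" % (lig_name, other_name, d))
```

The script above first compares whole residues with `min_distance` and only then loops over atoms, which avoids the atom-level loop for residues that are far from the ligand.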

ligand_crmsd_on_cofactor.py¶
Calculates the cRMSD (coordinate Root-Mean-Square Deviation) value on all atoms of ligand conformations after superimposition based on another group, e.g. a cofactor. This script has been used in a P450 analysis project: all PDB deposits with a drug (e.g. itraconazole) were pulled from the PDB. For each pair of structures, the optimal superimposition of the haem groups is found. The same transformation is then used to transform the coordinates of the drug molecule and to compute the cRMSD between the two itraconazole molecules.
USAGE:
python3 ligand_crmsd_on_cofactor.py cofactor-code3 ligand-code3 input1.pdb [input2.pdb]
EXAMPLE:
python3 ligand_crmsd_on_cofactor.py HEM 1YN 5ofq.pdb 4rm4.pdb
Keywords:
Categories:
- core/calc/structural/transformations/CrmsdOnVec3
Input files:
Output files:
Program source:
import sys, math

sys.path.append("../../../../../bin/")
from pybioshell.core.data.basic import Vec3
from pybioshell.core.data.io import Pdb
from pybioshell.std import vector_core_data_basic_Vec3
from pybioshell.core.calc.structural.transformations import *
from pybioshell.utils import LogManager

LogManager.INFO()

if len(sys.argv) < 3:
    print("""

Calculates the cRMSD (coordinate Root-Mean-Square Deviation) value on all atoms of ligand conformations
after superimposition based on another group, e.g. a cofactor. This script was used
in a P450 analysis project: all PDB deposits with a drug (e.g. itraconazole) were pulled from the PDB.
For each pair of structures, the optimal superimposition of the heme groups is found.
The same transformation is then used to transform the coordinates of the drug molecule
and to compute the cRMSD on the two itraconazole molecules.

USAGE:
    python3 ligand_crmsd_on_cofactor.py cofactor-code3 ligand-code3 input1.pdb [input2.pdb]

EXAMPLE:
    python3 ligand_crmsd_on_cofactor.py HEM 1YN 5ofq.pdb 4rm4.pdb

CATEGORIES: core/calc/structural/transformations/CrmsdOnVec3
KEYWORDS:   PDB input; ligand; crmsd
GROUP:      Structure calculations
    """)
    sys.exit()

rms = CrmsdOnVec3()


class Entry:

    def __init__(self, code, structure):
        self.pdb_code = code            # --- PDB code for every input structure: for informative output
        self.structure = structure      # --- contains all input structures
        self.ligand = None
        self.cofactor = None
        self.atoms_superimposed = vector_core_data_basic_Vec3()  # --- a list of atoms to define rototranslation
        self.atoms_crmsd = vector_core_data_basic_Vec3()         # --- a list of atoms to compute crmsd

    def is_OK(self):
        return self.ligand and self.cofactor

    def add_ligand(self, ligand):
        # --- the original referred to the global variable "e" here; "self" is what was meant
        for ia in range(ligand.count_atoms()):
            self.atoms_crmsd.append(ligand[ia])
        self.ligand = ligand

    def add_cofactor(self, cofactor):
        for ia in range(cofactor.count_atoms()):
            self.atoms_superimposed.append(cofactor[ia])
        self.cofactor = cofactor


rototranslation_code3 = sys.argv[1]
crmsd_code3 = sys.argv[2]
entries = []
for pdb_code in sys.argv[3:]:
    pdb = Pdb(pdb_code, "is_not_alternative is_not_water", False)
    structure = pdb.create_structure(0)
    for ic in range(structure.count_chains()):
        chain = structure[ic]
        # start from chain.terminal_residue_index() + 1 if you are sure the PDB file has TER lines
        e = Entry(pdb_code.split("/")[-1], structure)
        for ir in range(0, chain.size()):
            code3 = chain[ir].residue_type().code3
            if code3 == rototranslation_code3:
                e.add_cofactor(chain[ir])
            elif code3 == crmsd_code3:
                e.add_ligand(chain[ir])
        if e.is_OK():
            entries.append(e)

tmp_vec = Vec3()
output = open("%s-by-%s.pdb" % (crmsd_code3, rototranslation_code3), "w")
n_superimposed = len(entries[0].atoms_superimposed)
n_crmsd = len(entries[0].atoms_crmsd)
for ei in entries:
    for ej in entries:
        if ei == ej: continue
        try:
            if len(ei.atoms_superimposed) != len(ej.atoms_superimposed):
                print("superimposed sets differ in size")
                continue
            crmsd_val_1 = rms.crmsd(ei.atoms_superimposed, ej.atoms_superimposed, n_superimposed, True)
            if len(ei.atoms_crmsd) != len(ej.atoms_crmsd):
                print("crmsd sets differ in size")
                continue
            crmsd_val_2 = rms.calculate_crmsd_value(ei.atoms_crmsd, ej.atoms_crmsd, n_crmsd)
            print("%s %4d %3s %c - %s %4d %3s %c : %7.3f %7.3f" % (
                ei.pdb_code, ei.ligand.id(), ei.ligand.residue_type().code3, ei.ligand.owner().id(),
                ej.pdb_code, ej.ligand.id(), ej.ligand.residue_type().code3, ej.ligand.owner().id(),
                crmsd_val_1, crmsd_val_2))
            if ej == entries[0]:
                # NOTE: this writes the literal text "MODEL %1"; a model number was presumably intended
                output.write("MODEL %1\n")
                for ai in range(ei.ligand.count_atoms()):
                    rms.apply(ei.ligand[ai])
                    output.write(ei.ligand[ai].to_pdb_line() + "\n")
                output.write("ENDMDL\n")
        except:
            pass

output.write("MODEL %d\n" % (1))
for ai in range(entries[0].ligand.count_atoms()):
    output.write(entries[0].ligand[ai].to_pdb_line() + "\n")
output.write("ENDMDL\n")
output.write("MODEL %d\n" % (2))
for ai in range(entries[0].cofactor.count_atoms()):
    output.write(entries[0].cofactor[ai].to_pdb_line() + "\n")
output.write("ENDMDL\n")
output.close()
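The second number printed by the script is a cRMSD on the ligand atoms, evaluated in the frame defined by the cofactor rather than by a fit on the ligand itself. The bare formula can be sketched in plain Python as below. This is only an illustration: `crmsd_no_fit` is a hypothetical helper, not a BioShell call, and it assumes both coordinate sets are already expressed in a common frame.

```python
import math


def crmsd_no_fit(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) tuples, without any superposition."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets differ in size")
    if not coords_a:
        raise ValueError("coordinate sets are empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two toy "ligands": the second is the first one shifted by 2 Angstroms along z
ligand_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ligand_2 = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print("%.3f" % crmsd_no_fit(ligand_1, ligand_2))
```

Because no fitting is done here, a rigid shift of the ligand relative to the cofactor shows up directly in the value, which is the point of superimposing on the cofactor in the first place.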

ligand_rototranslation.py¶
Calculates the rototranslation that superimposes a ligand molecule from one reference frame onto another. As output, the script prints the rototranslation.
USAGE:
python3 ligand_rototranslation.py ligand-code3 reference.pdb input1.pdb [input2.pdb ...]
EXAMPLE:
python3 ligand_rototranslation.py CAM 2m56-ref.pdb 00199.pdb 00963.pdb 04473.pdb
Keywords:
Categories:
- core/calc/structural/transformations/CrmsdOnVec3
Input files:
Output files:
Program source:
import sys, math

sys.path.append("../../../../../bin/")
from pybioshell.core.data.basic import Vec3
from pybioshell.core.data.io import Pdb
from pybioshell.core.data.structural.selectors import SelectResidueByName
from pybioshell.core.protocols import copy_selected_atoms, copy_selected_coordinates
from pybioshell.core.calc.structural.transformations import *
from pybioshell.utils import LogManager

LogManager.INFO()

IF_ROW_OUTPUT = False

if len(sys.argv) < 3:
    print("""

Calculates rototranslation transformation that superimposes a ligand molecule from one reference frame
to another. As an output, prints the rototranslation

USAGE:
    python3 ligand_rototranslation.py ligand-code3 reference.pdb input1.pdb [input2.pdb ...]

EXAMPLE:
    python3 ligand_rototranslation.py CAM 2m56-ref.pdb 00199.pdb 00963.pdb 04473.pdb

CATEGORIES: core/calc/structural/transformations/CrmsdOnVec3
KEYWORDS:   PDB input; ligand; crmsd
GROUP:      Structure calculations
    """)
    sys.exit()

rms = CrmsdOnVec3()
select_ligand = SelectResidueByName(sys.argv[1])
ref_strctr = Pdb(sys.argv[2], "").create_structure(0)
ref_atoms = copy_selected_coordinates(ref_strctr, select_ligand)

for f_model in sys.argv[3:]:
    strctr = Pdb(f_model, "").create_structure(0)
    ligand_atoms = copy_selected_coordinates(strctr, select_ligand)
    crsmd_val = rms.crmsd(ligand_atoms, ref_atoms, len(ref_atoms), True)    # superimpose a reference onto a model
    # crsmd_val = rms.crmsd(ref_atoms, ligand_atoms, len(ref_atoms), True)  # superimpose a model onto a reference
    if IF_ROW_OUTPUT:
        print("%7.4f %7.4f %7.4f %7.4f %7.4f %7.4f %7.4f %7.4f %7.4f %8.3f %8.3f %8.3f %8.3f %8.3f %8.3f" % (
            rms.rot_x().x, rms.rot_x().y, rms.rot_x().z,
            rms.rot_y().x, rms.rot_y().y, rms.rot_y().z,
            rms.rot_z().x, rms.rot_z().y, rms.rot_z().z,
            rms.tr_before().x, rms.tr_before().y, rms.tr_before().z,
            rms.tr_after().x, rms.tr_after().y, rms.tr_after().z))
    else:
        print(rms)

list_pdb_ligands.py¶
Prints names of ligand molecules found in a given PDB file.
A ligand is defined as a residue located after the TER record of a PDB chain.
USAGE:
python3 list_pdb_ligands.py input.pdb
EXAMPLE:
python3 list_pdb_ligands.py 5edw.pdb
Keywords:
Categories:
- core/data/io/find_pdb
Input files:
Output files:
Program source:
import sys

from pybioshell.core.data.io import find_pdb

if len(sys.argv) < 2 :
    print("""

Prints names of ligand molecules found in a given PDB file.

A ligand is defined as a residue located after TER field in a PDB chain

USAGE:
    python3 list_pdb_ligands.py input.pdb

EXAMPLE:
    python3 list_pdb_ligands.py 5edw.pdb

CATEGORIES: core/data/io/find_pdb
KEYWORDS:   PDB input; ligand
GROUP:      File processing; Data filtering
    """)
    sys.exit()

for pdb_fname in sys.argv[1:] :
    structure = find_pdb(pdb_fname, "./").create_structure(0)
    for ic in range(structure.count_chains()) :
        chain = structure[ic]
        #print(chain.terminal_residue_index())
        for ir in range(chain.terminal_residue_index() + 1,chain.size()) :
            resid = chain[ir]
            code3 = resid.residue_type().code3
            if resid.residue_type().code3 == "HOH" : continue  # Skip water molecules, they are so obvious and abundant
            formula = structure.formula(code3)
            hetname = structure.hetname(code3)
            print("%3s %c %4d %s %s" %(code3, chain.id(), resid.id(), formula.strip(), hetname.strip()))
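The "residue after TER" rule can also be illustrated without BioShell, directly on the fixed-column PDB text. The sketch below is an assumption-laden stand-in for what the script does through `terminal_residue_index()`: `ligands_after_ter` and `pdb_line` are hypothetical helpers, and only the standard column positions of the record name, residue name, chain ID and residue number are used.

```python
def pdb_line(record, serial, name, resname, chain, resid):
    """Format the leading columns of an ATOM/HETATM/TER record (coordinates omitted)."""
    return "%-6s%5d %-4s %3s %1s%4d" % (record, serial, name, resname, chain, resid)


def ligands_after_ter(pdb_lines):
    """Return (resname, chain, resid) for residues that follow the TER record of their chain."""
    terminated = set()      # chains for which a TER record has been seen
    last_chain = ""
    found = []
    for line in pdb_lines:
        record = line[0:6].strip()
        if record == "TER":
            # a bare "TER" line carries no chain ID, so fall back to the last chain seen
            terminated.add(line[21:22] or last_chain)
        elif record in ("ATOM", "HETATM"):
            last_chain = line[21]
            key = (line[17:20].strip(), last_chain, int(line[22:26]))
            if last_chain in terminated and key[0] != "HOH" and key not in found:
                found.append(key)
    return found


lines = [
    pdb_line("ATOM", 1, " CA ", "ALA", "A", 1),
    pdb_line("ATOM", 2, " CA ", "GLY", "A", 2),
    pdb_line("TER", 3, "", "GLY", "A", 2),
    pdb_line("HETATM", 4, "FE  ", "HEM", "A", 501),
    pdb_line("HETATM", 5, " NA ", "HEM", "A", 501),
    pdb_line("HETATM", 6, " O  ", "HOH", "A", 601),
]
for resname, chain, resid in ligands_after_ter(lines):
    print("%3s %c %4d" % (resname, chain, resid))
```

Files without TER records yield nothing under this rule, which is the same caveat the comment in ligand_crmsd_on_cofactor.py raises.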

msa_to_profile.py¶
Reads a multiple sequence alignment (MSA) in the .aln format produced by ClustalO, calculates a sequence profile and prints it in a flat tabular format.
USAGE:
python3 msa_to_profile.py input.aln
EXAMPLE:
python3 msa_to_profile.py cyped.CYP109.aln
Keywords:
Categories:
- core/data/io/read_clustalw_file
Input files:
Output files:
Program source:
import sys

from pybioshell.std import vector_std_shared_ptr_core_data_sequence_Sequence_t
from pybioshell.core.data.io import read_clustalw_file
from pybioshell.core.data.sequence import SequenceProfile

if len(sys.argv) < 2 :
    print("""

Reads a multiple sequence alignment (MSA) (in .aln format) produced by ClustalO,
calculates a sequence profile and prints in a flat tabular format

USAGE:
    python3 msa_to_profile.py input.aln

EXAMPLE:
    python3 msa_to_profile.py cyped.CYP109.aln

CATEGORIES: core/data/io/read_clustalw_file
KEYWORDS:   MSA; sequence profile; Format conversion
GROUP:      Sequence calculations;
    """)
    sys.exit()

msa = vector_std_shared_ptr_core_data_sequence_Sequence_t()
read_clustalw_file(sys.argv[1], msa)
profile = SequenceProfile(msa[0], SequenceProfile.aaOrderByPropertiesGapped(), msa)
profile.write_table()
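For readers unfamiliar with the term, a sequence profile is essentially a per-column table of residue frequencies. A minimal plain-Python sketch is given below; `column_frequencies` is a hypothetical helper, and it makes no claim about how BioShell's SequenceProfile weights sequences, treats gaps or orders its columns.

```python
from collections import Counter


def column_frequencies(msa):
    """For each alignment column, return a dict mapping a symbol to its observed frequency."""
    if len(set(len(s) for s in msa)) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_seq = len(msa)
    profile = []
    for column in zip(*msa):            # iterate over alignment columns
        counts = Counter(column)
        profile.append({symbol: count / n_seq for symbol, count in counts.items()})
    return profile


msa = ["MKV-",
       "MRVA",
       "MKIA"]
for i, column in enumerate(column_frequencies(msa)):
    print(i, " ".join("%s:%.2f" % (aa, f) for aa, f in sorted(column.items())))
```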

partial_thread.py¶
Reads a FASTA file with two aligned sequences: a query and a template, and a template structure. Prints a partial thread of the template, i.e. the fragment of a template structure that is aligned with a query
USAGE:
python3 partial_thread.py ali.fasta template.pdb [chain-id]
EXAMPLE:
python3 partial_thread.py 2azaA_2pcyA-ali.fasta 2aza.pdb A
Keywords:
Categories:
- core/alignment
Input files:
Output files:
Program source:
import sys

from pybioshell.core.data.io import read_fasta_file, Pdb

if len(sys.argv) < 3 :
    print("""

Reads a FASTA file with two aligned sequences: a query and a template, and a template structure.
Prints a partial thread of the template, i.e. the fragment of a template structure
that is aligned with a query

USAGE:
    python3 partial_thread.py ali.fasta template.pdb [chain-id]

EXAMPLE:
    python3 partial_thread.py 2azaA_2pcyA-ali.fasta 2aza.pdb A

CATEGORIES: core/alignment
KEYWORDS:   FASTA input; sequence
GROUP:      File processing; Data filtering
    """)
    sys.exit()


def select_aligned_residues(query_seq, template_seq, template_chain):
    j = 0
    out = []
    for i in range(len(query_seq)):
        if template_seq[i] != '-':
            if query_seq[i] != '-':
                out.append(template_chain[j])
            j += 1
    return out


def print_atoms(residues):
    for r in residues:
        for i in range(r.count_atoms()):
            print(r[i].to_pdb_line())


fasta = read_fasta_file(sys.argv[1])
seq1 = fasta[0].sequence
seq1_gapless = fasta[0].create_ungapped_sequence().sequence
seq2 = fasta[1].sequence
seq2_gapless = fasta[1].create_ungapped_sequence().sequence
print(fasta[0].sequence,fasta[1].sequence)

strctr = Pdb(sys.argv[2],"").create_structure(0)
if len(sys.argv) > 3:
    chain = strctr.get_chain(sys.argv[3])
else:
    chain = strctr[0]
seq_pdb = chain.create_sequence().sequence

if seq_pdb == seq1_gapless:
    atoms = select_aligned_residues(seq2, seq1, chain)
    print_atoms(atoms)
elif seq_pdb == seq2_gapless:
    atoms = select_aligned_residues(seq1, seq2, chain)
    print_atoms(atoms)
else:
    print("template sequence can't be identified in the given alignment")
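The core of the script is the walk over alignment columns in select_aligned_residues(). The same logic can be tried on plain strings, with no structure involved; in the sketch below the list of labels stands in for the residues of the template chain, and all names are illustrative only.

```python
def select_aligned(query_seq, template_seq, template_items):
    """Keep the template items whose alignment column is not a gap in the query."""
    j = 0                                   # index into the ungapped template
    out = []
    for i in range(len(query_seq)):
        if template_seq[i] != '-':
            if query_seq[i] != '-':
                out.append(template_items[j])
            j += 1                          # advance for every template residue, aligned or not
    return out


query    = "MK-LV"
template = "M-ALV"
template_residues = ["M1", "A2", "L3", "V4"]    # the ungapped template is "MALV"
print(select_aligned(query, template, template_residues))
```

Residue A2 is dropped because it faces a gap in the query, while the query's K has no template counterpart and therefore contributes nothing to the thread.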

pdb_from_clustering.py¶
Extracts PDB clusters from the clustering results produced by ap_cluster_ligands. See the ap_cluster_ligands program to learn how to run the clustering.
USAGE:
python3 pdb_from_clustering.py clustering_output.txt ligand_code
EXAMPLE:
python3 pdb_from_clustering.py clustering_output.txt Clo
Keywords:
Categories:
- core/data/io/Pdb
Input files:
Output files:
Program source:
import sys

from pybioshell.core.data.io import Pdb

if len(sys.argv) < 2 :
    print("""

Extracts PDB clusters from clustering results produced by ap_cluster_ligands.

SEE:
    ap_cluster_ligands program to see how to run clustering

USAGE:
    python3 pdb_from_clustering.py clustering_output.txt ligand_code

EXAMPLE:
    python3 pdb_from_clustering.py clustering_output.txt Clo

CATEGORIES: core/data/io/Pdb
KEYWORDS:   PDB output; clustering
GROUP:      File processing; Structure calculations
    """)
    sys.exit()

chain_ids = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890abcdefghijklmnopqrstuvw"
clusters_file = open(sys.argv[1])
ligand_code = sys.argv[2]
iline = 0
for line in clusters_file:
    iline += 1
    tokens = line.strip().split()
    outp = open("c"+str(iline)+tokens[0]+".pdb","w")
    # --- the first file of a cluster is written as a whole
    p = Pdb(tokens[1],"", False)
    s = p.create_structure(0)
    i_chain = 0
    for ic in range(s.count_chains()):
        c = s[ic]
        for ir in range(c.count_residues()):
            r = c[ir]
            for ia in range(r.count_atoms()):
                a = r[ia]
                outp.write(a.to_pdb_line() + "\n")
    # --- from the remaining files only the ligand residues are written
    for fname in tokens[2:]:
        p = Pdb(fname,"", False)
        s = p.create_structure(0)
        for ic in range(s.count_chains()):
            c = s[ic]
            for ir in range(c.count_residues()):
                r = c[ir]
                if r.residue_type().code3 != ligand_code: continue
                r.owner().id(chain_ids[i_chain])
                for ia in range(r.count_atoms()):
                    a = r[ia]
                    outp.write(a.to_pdb_line() + "\n")
    outp.close()

pdb_info.py¶
Reads a PDB file and extracts some basic information from its header
USAGE:
python3 pdb_info.py input.pdb [input2.pdb]
EXAMPLE:
python3 pdb_info.py 2kwi.pdb
Keywords:
- PDB input
Categories:
- core/data/io/Pdb
Input files:
Output files:
Program source:
import sys

from pybioshell.core.data.io import Pdb

if len(sys.argv) < 2 :
    print("""

Reads a PDB file and extracts some basic information from its header

USAGE:
    python3 pdb_info.py input.pdb [input2.pdb]

EXAMPLE:
    python3 pdb_info.py 2kwi.pdb

CATEGORIES: core/data/io/Pdb
KEYWORDS:   PDB input;
GROUP:      File processing;
    """)
    sys.exit()

for pdb_fname in sys.argv[1:] :
    s = Pdb(pdb_fname, "", True).create_structure(0)
    print(s.classification())
    print("protein", s.code(), "has", s.count_chains(), "chain(s),", s.count_residues(),
          "residues and", s.count_atoms(), "atoms\n")
    print("deposited : ", s.deposition_date())
    print("Is XRAY? : ", (s.is_xray()))
    print("Is NMR? : ", (s.is_nmr()))
    print("Is EM? : ", (s.is_em()))
    print("resolution: ", s.resolution())
    print("R-value : ", s.r_value())
    print("R-free : ", s.r_free())
    if len(s.keywords()) > 0:
        print("Keywords : ", s.keywords()[0], end="")
        for k in s.keywords()[1:]:
            print(", ", k, end="")
        print()

pdb_to_fasta.py¶
Extracts the amino acid (or nucleotide) sequence from a PDB file. Note that by default ligands are not included in the output sequence, even if they are amino acids (e.g. the 7dk3 deposit).
USAGE:
python3 pdb_to_fasta.py input.pdb [input2.pdb]
EXAMPLE:
python3 pdb_to_fasta.py 2kwi.pdb
Keywords:
Categories:
- core/data/io/find_pdb
Input files:
Output files:
Program source:
import sys

from pybioshell.core.data.io import find_pdb

# change that setting to False to include ligands in the output sequence
IF_EXCLUDE_LIGANDS = True

if len(sys.argv) < 2 :
    print("""

Extracts amino acid (or nucleotide) sequence from a PDB file. Note, that by default ligands
are not included in the output sequence even if they are amino acids (e.g. 7dk3 deposit)

USAGE:
    python3 pdb_to_fasta.py input.pdb [input2.pdb]

EXAMPLE:
    python3 pdb_to_fasta.py 2kwi.pdb

CATEGORIES: core/data/io/find_pdb
KEYWORDS:   PDB input; FASTA; Format conversion
GROUP:      File processing; Format conversion
    """)
    sys.exit()

for pdb_fname in sys.argv[1:] :
    structure = find_pdb(pdb_fname, "./").create_structure(0)
    for ic in range(structure.count_chains()) :
        chain = structure[ic]
        print(">",structure.code(), chain.id())
        print(chain.create_sequence(IF_EXCLUDE_LIGANDS).sequence)

pdb_to_seq.py¶
Converts the sequence of each chain of a PDB file to the SEQ format and prints it on the standard output.
USAGE:
python3 pdb_to_seq.py input.pdb
EXAMPLE:
python3 pdb_to_seq.py 2gb1.pdb
python3 pdb_to_seq.py 2kwi.pdb
Keywords:
Categories:
- core/data/io/write_seq
Input files:
Output files:
Program source:
import sys

from pybioshell.core.data.io import Pdb,create_seq_string

if len(sys.argv) < 2 :
    print("""

Converts sequence in a PDB format to SEQ format.

USAGE:
    python3 pdb_to_seq.py input.pdb

EXAMPLE:
    python3 pdb_to_seq.py 2gb1.pdb
    python3 pdb_to_seq.py 2kwi.pdb

CATEGORIES: core/data/io/write_seq
KEYWORDS:   PDB input; secondary structure; Format conversion
GROUP:      File processing; Format conversion
    """)
    sys.exit()

structure = Pdb(sys.argv[1],"").create_structure(0)
# --- the output always goes to stdout; a second command-line argument, if given, is ignored
for ic in range(structure.count_chains()) :
    chain = structure[ic]
    ss = chain.create_sequence()
    a = create_seq_string(ss)
    print(a)

pdb_to_ss2.py¶
Extracts the sequence of each chain, together with its secondary structure, from a PDB file and writes it in the SS2 format.
USAGE:
python3 pdb_to_ss2.py input.pdb [output.ss2]
EXAMPLE:
python3 pdb_to_ss2.py 2gb1.pdb 2gb1.out
python3 pdb_to_ss2.py 2kwi.pdb
Keywords:
Categories:
- core/data/io/write_ss2
Input files:
Output files:
Program source:
import sys

from pybioshell.core.data.io import find_pdb, write_ss2

if len(sys.argv) < 2 :
    print("""

Extracts amino acid (or nucleotide) sequence from a PDB file.

USAGE:
    python3 pdb_to_ss2.py input.pdb [input.ss2]

EXAMPLE:
    python3 pdb_to_ss2.py 2gb1.pdb 2gb1.out
    python3 pdb_to_ss2.py 2kwi.pdb

CATEGORIES: core/data/io/write_ss2
KEYWORDS:   PDB input; secondary structure; Format conversion
GROUP:      File processing; Format conversion
    """)
    sys.exit()

structure = find_pdb(sys.argv[1], "./").create_structure(0)
outname = sys.argv[2] if len(sys.argv) > 2 else "stdout"
for ic in range(structure.count_chains()) :
    chain = structure[ic]
    ss = chain.create_sequence()
    write_ss2(ss,outname)
    print()  # Print empty line to separate chain: note that it works only when printed to stdout

radial_distribution_function.py¶
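Calculates a radial distribution function for a trajectory from a molecular simulation. Note that the script passes the second argument to Vec3Cubic.set_box_len(), i.e. it is used as the length of the cubic periodic box; pair distances are histogrammed up to 12 Angstroms in bins of 0.1 Angstrom.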
USAGE:
python3 radial_distribution_function.py input_tra.pdb cutoff
EXAMPLE:
python3 radial_distribution_function.py ar_tra.pdb 27.6214
Keywords:
Categories:
- core/data/basic/Vec3Cubic
Input files:
Output files:
Program source:
import sys, math

from pybioshell.core.data.io import find_pdb
from pybioshell.core.data.basic import Vec3Cubic
from pybioshell.std import vector_core_data_basic_Vec3Cubic

if len(sys.argv) < 3 :
    print("""

Calculates radial distribution function for a trajectory from a molecular simulation.

USAGE:
    python3 radial_distribution_function.py input_tra.pdb cutoff

EXAMPLE:
    python3 radial_distribution_function.py ar_tra.pdb 27.6214

CATEGORIES: core/data/basic/Vec3Cubic
KEYWORDS:   PDB input; structural properties
GROUP:      Structure calculations;
    """)
    sys.exit()

pdb = find_pdb(sys.argv[1], "./")
n_atoms = pdb.count_atoms(0)
cutoff = float(sys.argv[2])
Vec3Cubic.set_box_len(cutoff)
xyz = vector_core_data_basic_Vec3Cubic()
for i in range(n_atoms) : xyz.append( Vec3Cubic() )

histogram = [0 for i in range(121)]
for i_model in range(0, pdb.count_models()) :
    # print i_model
    pdb.fill_structure(i_model, xyz)
    for i_atom in range(n_atoms) :
        for j_atom in range(i_atom) :
            d = xyz[i_atom].closest_distance_square_to(xyz[j_atom],12*12)
            if d < 144 : histogram[ int(math.sqrt(d)*10) ] += 1

for i in range(1,120) :
    print("%5f %7.2f" % (i*0.1, histogram[i]/(i*i*0.01)))
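The two ingredients of the listing above, the minimum-image distance in a cubic box and the division of each bin by r squared, can be sketched in plain Python as follows. The function names are hypothetical and the result is, like the script's output, an unnormalised curve: no density, shell-volume constant or frame count is divided out.

```python
import math


def pair_distance_histogram(frames, box_len, r_max=12.0, bin_width=0.1):
    """Histogram of pair distances under the minimum-image convention in a cubic box."""
    n_bins = int(round(r_max / bin_width))
    histogram = [0] * n_bins
    for xyz in frames:                          # one frame = list of (x, y, z) tuples
        for i in range(len(xyz)):
            for j in range(i):
                d2 = 0.0
                for a, b in zip(xyz[i], xyz[j]):
                    delta = a - b
                    delta -= box_len * round(delta / box_len)   # wrap to the nearest image
                    d2 += delta * delta
                d = math.sqrt(d2)
                if d < r_max:
                    histogram[int(d / bin_width)] += 1
    return histogram


def divide_by_r_squared(histogram, bin_width=0.1):
    """Return (r, count / r^2) pairs, skipping the first bin where r = 0."""
    return [(i * bin_width, histogram[i] / (i * bin_width) ** 2) for i in range(1, len(histogram))]


# Two atoms near opposite faces of a 10 A box: the nearest image is 1.25 A away, not 8.75 A
frames = [[(0.25, 0.0, 0.0), (9.0, 0.0, 0.0)]]
hist = pair_distance_histogram(frames, box_len=10.0)
print([i for i, count in enumerate(hist) if count > 0])
```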

read_scorefile.py¶
Simple example that parses a score file (Rosetta output)
USAGE:
python3 read_scorefile.py score-file
EXAMPLE:
python3 read_scorefile.py scores.sf
Keywords:
- scorefile input
Categories:
- core/data/io/read_scorefile
Input files:
Output files:
Program source:
import sys

from pybioshell.core.data.io import read_scorefile

if len(sys.argv) < 2 :
    print("""

Simple example that parses a score file (Rosetta output)

USAGE:
    python3 read_scorefile.py score-file

EXAMPLE:
    python3 read_scorefile.py scores.sf

CATEGORIES: core/data/io/read_scorefile
KEYWORDS:   scorefile input;
GROUP:      File processing; Format conversion
    """)
    sys.exit()

sf = read_scorefile(sys.argv[1])
print("Number of rows: %d" % len(sf))
print("Number of columns: %d" % sf[0].size())
print("Known columns:")
for i in range(sf[0].size()) :
    print(sf.column_name(i))
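As a rough picture of the input, a Rosetta score file is a whitespace-separated table in which every table line starts with the `SCORE:` tag and the first such line holds the column names. The plain-Python sketch below parses that layout; `parse_scorefile` is a hypothetical helper written for illustration and is not how BioShell's read_scorefile is implemented. The sample rows are made up.

```python
def parse_scorefile(lines):
    """Return (column_names, rows) where each row is a dict keyed by column name."""
    header = None
    rows = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue                        # skip SEQUENCE: and any other lines
        tokens = line.split()[1:]           # drop the leading "SCORE:" tag
        if header is None:
            header = tokens                 # the first SCORE: line names the columns
        else:
            rows.append(dict(zip(header, tokens)))
    return header, rows


sample = [
    "SEQUENCE:",
    "SCORE: total_score fa_atr description",
    "SCORE:    -123.400 -300.100 model_0001",
    "SCORE:    -120.000 -290.500 model_0002",
]
columns, rows = parse_scorefile(sample)
print("Number of rows: %d" % len(rows))
print("Number of columns: %d" % len(columns))
```

Values are kept as strings here; converting the numeric columns with float() is left to the caller, since the last column usually holds a text tag.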

rg.py¶
Calculates the radius of gyration from the coordinates stored in a given PDB file.
USAGE:
python3 rg.py input.pdb
EXAMPLE:
python3 rg.py 1cey.pdb
Keywords:
Categories:
- core/calc/structural/calculate_Rg_square
Input files:
Output files:
Program source:
import sys, math

from pybioshell.core.data.io import find_pdb
from pybioshell.core.data.basic import Vec3
from pybioshell.std import vector_core_data_basic_Vec3
from pybioshell.core.calc.structural import *
from pybioshell.utils import LogManager

LogManager.INFO()

if len(sys.argv) < 2 :
    print("""

Calculates the radius of gyration from given pdb file coordinates.

USAGE:
    python3 rg.py input.pdb

EXAMPLE:
    python3 rg.py 1cey.pdb

CATEGORIES: core/calc/structural/calculate_Rg_square
KEYWORDS:   PDB input; structural properties
GROUP:      Structure calculations;
    """)
    sys.exit()

for pdb_fname in sys.argv[1:] :
    pdb=find_pdb(pdb_fname, "./")
    n_atoms = pdb.count_atoms(0)
    structure = pdb.create_structure(0)
    models=[]
    for i_model in range(0, pdb.count_models()) :
        xyz=vector_core_data_basic_Vec3()
        for i in range(n_atoms) : xyz.append( Vec3() )
        models.append(xyz)

    for i_model in range(0, pdb.count_models()) :
        pdb.fill_structure(i_model, models[i_model])
        try:
            print("Rg for %s, model # %5d : %7.3f" % (pdb_fname.split("/")[-1].split(".")[0], i_model,
                  math.sqrt(calculate_Rg_square(models[i_model][0], models[i_model][n_atoms-1]))))
        except:
            sys.stderr.write(str(sys.exc_info()[0])+" "+str(sys.exc_info()[1]))
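The quantity itself is easy to state: the radius of gyration is the root-mean-square distance of the atoms from their centroid. A minimal plain-Python sketch is shown below, under the assumption that all atoms carry equal weight (no masses); `radius_of_gyration` is an illustrative name, not a BioShell function.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) tuples."""
    n = len(coords)
    if n == 0:
        raise ValueError("no coordinates given")
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    rg_square = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords) / n
    return math.sqrt(rg_square)


# Four points at unit distance from the origin, which is also their centroid
square = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
print("%7.3f" % radius_of_gyration(square))
```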

superimpose_by_fragment.py¶
Superimposes protein structures based on a structural fragment.
This script superimposes all models given on the command line (at least one) onto the reference structure. The superimposition is based on the C-alpha atoms of residues from 23 to 32 (the REFERENCE_FROM and REFERENCE_TO values set in the script). If you need another fragment, change these values in the script!
USAGE:
python3 superimpose_by_fragment.py reference.pdb model1.pdb [model2.pdb...]
EXAMPLE:
python3 superimpose_by_fragment.py 2gb1.pdb 2gb1-model1.pdb 2gb1-model2.pdb
Keywords:
Categories:
- core/calc/structural/transformations/Crmsd
Input files:
Output files:
Program source:
import sys, math, os

from pybioshell.core.data.io import Pdb, write_pdb
from pybioshell.core.data.basic import Vec3
from pybioshell.std import vector_core_data_basic_Vec3
from pybioshell.core.calc.structural.transformations import *
from pybioshell.utils import LogManager

REFERENCE_FROM, REFERENCE_TO = 23, 32  # 25, 390

LogManager.INFO()

if len(sys.argv) < 3:
    print("""

Superimposes protein structures based on a structural fragment.

This script superimposes all models given at command line (at least one) on the reference structure.
The superimposition is based on C-alpha atoms of residues from %d to %d. If you need another fragment,
change these values in the script!

USAGE:
    python3 superimpose_by_fragment.py reference.pdb model1.pdb [model2.pdb...]

EXAMPLE:
    python3 superimpose_by_fragment.py 2gb1.pdb 2gb1-model1.pdb 2gb1-model2.pdb

CATEGORIES: core/calc/structural/transformations/Crmsd
KEYWORDS:   PDB input; crmsd
GROUP:      Structure calculations
    """ % (REFERENCE_FROM, REFERENCE_TO))
    sys.exit()

rms = CrmsdOnVec3()

pdb = Pdb(sys.argv[1], "", False)  # --- read the reference PDB file - only C-alfas
structure = pdb.create_structure(0)
n_atoms = REFERENCE_TO - REFERENCE_FROM + 1
xyz = vector_core_data_basic_Vec3()  # --- std::vector of Vec3 object is required to calculate superimposition
for i in range(REFERENCE_FROM, REFERENCE_TO+1):  # --- fill the vector with the selected reference coordinates
    r = structure[0][i]  # --- i-th residue of the first chain
    xyz.append(r.find_atom(" CA "))

#out_fname = "rot.pdb"
for pdb_fname in sys.argv[2:]:  # --- iterate over all models
    out_fname = "rot-" + pdb_fname.split(os.path.sep)[-1]
    other_pdb = Pdb(pdb_fname, "", False)
    other_structure = other_pdb.create_structure(0)
    other_xyz = vector_core_data_basic_Vec3()  # --- container for coordinates of a model
    try:
        for i in range(REFERENCE_FROM, REFERENCE_TO+1):  # --- fill the vector with the selected coordinates
            r = other_structure[0][i]  # --- i-th residue of the first chain
            other_xyz.append(r.find_atom(" CA "))
        rms_val = rms.crmsd(xyz, other_xyz, n_atoms, True)
        rms.apply_inverse(other_structure)
        write_pdb(other_structure, out_fname, 0)
    except:
        sys.stderr.write(str(sys.exc_info()[0]) + " " + str(sys.exc_info()[1]))

tmscore.py¶
Calculates the TMScore value for two or more structures. The first file is a reference structure and the second can be a multi-model PDB file. The calculation is run between the reference structure and every model from the second file.
USAGE:
python3 tmscore.py file1.pdb [file2.pdb...]
EXAMPLE:
python3 tmscore.py 2gb1-model1.pdb 2gb1-model2.pdb
Keywords:
- PDB input
- TMScore
Categories:
- core/calc/structural/transformations/TMScore
Input files:
Output files:
Program source:
import sys, math

from pybioshell.core.data.io import Pdb
from pybioshell.core.data.basic import Vec3
from pybioshell.std import vector_core_data_basic_Vec3
from pybioshell.core.calc.structural.transformations import *
from pybioshell.utils import LogManager

LogManager.INFO()

if len(sys.argv) < 2:
    print("""

Calculates the TMScore value for two or more structures. The first file is a reference structure
and the second can be a multi-model PDB file. The calculation is run between the reference structure
and every model from the second file.

USAGE:
    python3 tmscore.py file1.pdb [file2.pdb...]

EXAMPLE:
    python3 tmscore.py 2gb1-model1.pdb 2gb1-model2.pdb

CATEGORIES: core/calc/structural/transformations/TMScore
KEYWORDS:   PDB input; TMScore
GROUP:      Structure calculations
    """)
    sys.exit()

if len(sys.argv) == 3:
    # --- read the reference structure
    pdb = Pdb(sys.argv[1], "is_not_water", False)
    n_atoms = pdb.count_atoms(0)
    ref_structure = pdb.create_structure(0)
    ref_xyz = vector_core_data_basic_Vec3()
    for i in range(n_atoms):
        ref_xyz.append(Vec3())
    pdb.fill_structure(0, ref_xyz)

    # --- read all models from the second file
    pdb = Pdb(sys.argv[2], "is_not_water", False)
    n_atoms = pdb.count_atoms(0)
    structure = pdb.create_structure(0)
    models = []
    for i_model in range(0, pdb.count_models()):
        xyz = vector_core_data_basic_Vec3()
        for i in range(n_atoms):
            xyz.append(Vec3())
        models.append(xyz)

    for i_model in range(0, pdb.count_models()):
        pdb.fill_structure(i_model, models[i_model])
        tmscore = TMScore(models[i_model],ref_xyz)
        try:
            print("%2d %6.3f" % (i_model,tmscore.tmscore()))
        except:
            sys.stderr.write(str(sys.exc_info()[0]) + " " + str(sys.exc_info()[1]))
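For orientation, the TM-score sums a distance-dependent term over aligned residue pairs and divides by the length of the reference. The sketch below evaluates that sum for one fixed superposition, given the pair distances; the actual measure takes the maximum over superpositions, which is not attempted here. `tm_score_fixed` is a hypothetical helper, and the 0.5 Angstrom floor on d0 for very short chains is an assumption of this sketch.

```python
def tm_score_fixed(distances, target_length):
    """TM-score-like sum for already superimposed structures.

    distances     -- distances (in Angstroms) between aligned residue pairs
    target_length -- number of residues in the reference structure
    """
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5                      # assumed floor; the formula is undefined for short chains
    d0 = max(d0, 0.5)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# A perfect match over all 100 residues gives the maximum value of 1
print("%6.3f" % tm_score_fixed([0.0] * 100, 100))
# A perfect match over only half of the reference gives 0.5
print("%6.3f" % tm_score_fixed([0.0] * 50, 100))
```

Dividing by the reference length rather than by the number of aligned pairs is what penalises partial coverage in the second call.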

validate_saturated_ring6.py¶
Validates a hexagonal saturated ring, defined by 6 atoms.
USAGE:
python3 validate_saturated_ring6.py input.pdb ligand _atom1_ _atom2_ _atom3_ _atom4_ _atom5_ _atom6_
EXAMPLE:
python3 validate_saturated_ring6.py 4jm3.pdb EPE _N1_ _C2_ _C3_ _N4_ _C5_ _C6_
Keywords:
Categories:
- core/calc/structural/SaturatedRing6Geometry
Input files:
Output files:
Program source:
import sys, math

from pybioshell.core.data.io import find_pdb
from pybioshell.core.calc.structural import SaturatedRing6Geometry

if len(sys.argv) < 3 :
    print("""

Validates a hexagonal saturated ring, defined by 6 atoms.

USAGE:
    python3 validate_saturated_ring6.py input.pdb ligand _atom1_ _atom2_ _atom3_ _atom4_ _atom5_ _atom6_

EXAMPLE:
    python3 validate_saturated_ring6.py 4jm3.pdb EPE _N1_ _C2_ _C3_ _N4_ _C5_ _C6_

CATEGORIES: core/calc/structural/SaturatedRing6Geometry
KEYWORDS:   PDB input; structural properties
GROUP:      Structure calculations
    """)
    sys.exit()

pdb = find_pdb(sys.argv[1], "./")
strctr = pdb.create_structure(0)

for i_chain in range(strctr.count_chains()) :
    chain = strctr[i_chain]
    for i_res in range(chain.count_residues()) :
        if chain[i_res].residue_type().code3 == sys.argv[2] :
            atoms = []
            for at_name in sys.argv[3:] :
                try :
                    at_name_fixed = at_name.replace("_"," " )
                    atoms.append( chain[i_res].find_atom(at_name_fixed))
                    if not atoms[-1] :
                        sys.stderr.write("Can't find atom "+at_name_fixed+" in "+sys.argv[2]+" residue\n")
                except :
                    sys.stderr.write("Can't find atom "+at_name+" in "+sys.argv[2]+" residue\n")
            s = SaturatedRing6Geometry(atoms[0],atoms[1],atoms[2],atoms[3],atoms[4],atoms[5])
            print(s.first_wing_angle(),s.second_wing_angle())

ex_* programs¶
This group contains unit tests, i.e. programs that test a single class or function.
ex_BinaryTreeNode¶
Simple demo for BinaryTreeNode class
Keywords:
Categories:
- core/algorithms/trees/BinaryTreeNode
Output files:
Program source:
#include <memory>
#include <iostream>

#include <core/algorithms/trees/TreeNode.hh>
#include <core/algorithms/trees/algorithms.hh>
#include <core/algorithms/trees/trees_io.hh>

/** @brief Simple demo for BinaryTreeNode class
 *
 * This program creates a small tree with 6 nodes and performs various operations on it
 *
 * CATEGORIES: core/algorithms/trees/BinaryTreeNode
 * KEYWORDS:   algorithms; data structures; graphs
 * IMG: ex_BinaryTreeNode_1.png
 * IMG_ALT: Example tree node
 */
int main(const int argc, const char* argv[]) {

  using namespace core::algorithms::trees;

  typedef std::shared_ptr<BinaryTreeNode<char>> Node_SP; // --- Let's make the typename shorter

  Node_SP p1(new BinaryTreeNode<char>(0, 'A'));
  Node_SP p2(new BinaryTreeNode<char>(1, 'B'));
  Node_SP p3(new BinaryTreeNode<char>(2, 'C'));
  Node_SP p4(new BinaryTreeNode<char>(3, 'D'));
  Node_SP p5(new BinaryTreeNode<char>(4, 'E'));
  Node_SP p6(new BinaryTreeNode<char>(5, 'F'));
  p1->set_left_right(p2, p3);
  p3->set_left_right(p4, p5);
  p2->set_left(p6);

  std::cout << "Size of the whole tree and its right branch: " << size(p1) << " " << size(p3)
            << " (should be 6 and 3)\n";

  std::vector<char> elements;
  collect_leaf_elements(p1, elements);
  std::cout << "Leaf-only elements (E D F):\n";
  for (std::vector<char>::const_iterator i = elements.begin(); i != elements.end(); i++)
    std::cout << *i << ' ';
  std::cout << "\n";

  std::cout << "All elements stored on the tree (A C E D B F):\n";
  elements.clear();
  collect_elements(p1, elements);
  for (std::vector<char>::const_iterator i = elements.begin(); i != elements.end(); i++)
    std::cout << *i << ' ';
  std::cout << "\n";

  std::cout << "Leaf-only nodes (E D F):\n";
  elements.clear();
  std::vector<Node_SP> nodes;
  collect_leaf_nodes(p1, nodes);
  for (Node_SP & i : nodes) std::cout << i->element << ' ';
  std::cout << "\n";

  std::cout << "The tree was:\n";
  XMLFormatters<Node_SP> xml(std::cout);
  write_tree(p1, xml.start, xml.leaf, xml.stop);

  return 0;
}

ex_Molecule¶
Demonstrates how to create a Molecule object based on PdbAtom data type (as nodes of the graph)
Keywords:
Categories:
- core::chemical::Molecule
Output files:
Program source:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 | #include <iostream> #include <memory> #include <core/index.hh> #include <core/algorithms/graph_algorithms.hh> #include <core/chemical/Molecule.hh> #include <core/chemical/molecule_utils.hh> #include <core/calc/structural/angles.hh> #include <core/data/structural/PdbAtom.hh> #include <core/chemical/PdbMolecule.hh> #include <core/chemical/Bond.hh> core::chemical::PdbMolecule_SP create_toluene_molecule() { using namespace core::chemical; using namespace core::data::structural; // --- Define atoms that we use to build a molecule PdbAtom_SP atoms[] = {std::make_shared<PdbAtom>(1, " C1 ", 0, 0, 0), std::make_shared<PdbAtom>(2, " C2 ", 1.24, 0.72, 0), std::make_shared<PdbAtom>(3, " C3 ", 1.24, 2.16, 0), std::make_shared<PdbAtom>(4, " C4 ", 0, 2.88, 0), std::make_shared<PdbAtom>(5, " C5 ", -1.24, 2.16, 0), std::make_shared<PdbAtom>(6, " C6 ", -1.24, 0.72, 0), std::make_shared<PdbAtom>(7, " C7 ", 0, -1.52, 0)}; PdbMolecule_SP toluene = std::make_shared<PdbMolecule>(); // --- Insert atoms into the molecule for (PdbAtom_SP ai : atoms) toluene->add_atom(ai); // --- Create bonds between them toluene->bind_atoms(0, 1, BondType::AROMATIC); toluene->bind_atoms(1, 2, BondType::AROMATIC); toluene->bind_atoms(2, 3, BondType::AROMATIC); toluene->bind_atoms(3, 4, BondType::AROMATIC); toluene->bind_atoms(4, 5, BondType::AROMATIC); toluene->bind_atoms(0, 5, BondType::AROMATIC); toluene->bind_atoms(0, 6, BondType::SINGLE); return toluene; } /** @brief Demonstrates how to create a Molecule object based on PdbAtom data type (as nodes of the graph) * * This demo is similar to ex_Molecule_vec3, the difference is that here PdbAtom instances 
are used as graph nodes * rather than Vec3 instance. It creates a toluene molecule and detects planar and dihedral angles. * * CATEGORIES: core::chemical::Molecule * KEYWORDS: molecule * IMG: Toluen_dihedral_flat_angle.png * IMG_ALT: Planar angles in a toluen molecule */ int main(const int argc, const char *argv[]) { using namespace core::chemical; using namespace core::data::structural; PdbMolecule_SP molecule; if (argc == 1) molecule = create_toluene_molecule(); else { // --- Read structure that we use to build a molecule core::data::io::Pdb reader(argv[1]); // file name (PDB format, may be gzip-ped) core::data::structural::Structure_SP strctr = reader.create_structure(0); // --- Create molecule object molecule = structure_to_molecule(*strctr); //molecule = create_molecule<Structure::atom_iterator>(strctr->first_atom(), strctr->last_atom(), 0.1); } // --- Print some info about the molecule std::cout << molecule->count_atoms() << " atoms, " << molecule->count_bonds() << " bonds\n"; for (auto atom_it=molecule->begin_atom();atom_it!=molecule->end_atom();++atom_it) { PdbAtom_SP ai = *atom_it; std::cout << "atom " << ai->id() << " bonded to " << molecule->count_bonds(ai) << " atoms:"; for (auto n_it = molecule->begin_atom(ai); n_it != molecule->end_atom(ai); ++n_it) std::cout << " " << (*n_it)->id(); std::cout << "\n"; } // --- Find all planar angles in the molecule std::vector<std::tuple<PdbAtom_SP, PdbAtom_SP, PdbAtom_SP>> planars; find_planar_angles(*molecule, planars); // --- Sort the angles just to make the output stable i.e. 
every time in the same order so it can be used for benchmarking std::sort(planars.begin(), planars.end(), ComparePlanarAngles()); std::cout << "Detected planar angles:\n"; // --- Evaluate and print all the planars for (auto pi : planars) { PdbAtom &a1 = *std::get<0>(pi); PdbAtom &a2 = *std::get<1>(pi); PdbAtom &a3 = *std::get<2>(pi); std::cout << a1.id() << " -- " << a2.id() << " -- " << a3.id() << " " << core::calc::structural::evaluate_planar_angle(a1, a2, a3) * 180.0 / 3.14159 << "\n"; } // --- Find all torsion angles in the molecule std::vector<std::tuple<PdbAtom_SP, PdbAtom_SP, PdbAtom_SP, PdbAtom_SP>> torsions; find_torsion_angles(*molecule, torsions); // --- Sort also dihedral angles std::sort(torsions.begin(), torsions.end(), CompareDihedralAngles()); std::cout << "Detected dihedral angles:\n"; // --- Evaluate and print all the planars for (auto ti : torsions) { PdbAtom &a1 = *std::get<0>(ti); PdbAtom &a2 = *std::get<1>(ti); PdbAtom &a3 = *std::get<2>(ti); PdbAtom &a4 = *std::get<3>(ti); std::cout << a1.id() << " -- " << a2.id() << " -- " << a3.id() << " -- " << a4.id() << " " << core::calc::structural::evaluate_dihedral_angle(a1, a2, a3, a4) * 180.0 / 3.14159 << "\n"; } // --- Here we find the benzene ring in the molecule - a cycle in a graph std::vector<std::vector<core::index4>> cycles = core::algorithms::find_cycles<PdbMolecule, PdbAtom_SP, std::shared_ptr<BondType >>(*molecule); std::cout << "Atoms in a cycle:"; for (core::index4 i:cycles[0]) std::cout << " " << molecule->get_atom(i)->atom_name(); std::cout << "\n"; } |

ex_Molecule_Vec3¶
Unit test which shows how to create a Molecule object based on Vec3 data type (Vec3 objects are nodes of the graph).
USAGE:
./ex_Molecule_Vec3
Keywords:
Categories:
- core::chemical::Molecule
Output files:
Program source:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 | #include <iostream> #include <memory> #include <core/chemical/Molecule.hh> #include <core/chemical/molecule_utils.hh> #include <core/calc/structural/angles.hh> #include <core/data/basic/Vec3.hh> #include <utils/options/OptionParser.hh> #include <utils/exit.hh> std::string program_info = R"( Unit test which shows how to create a Molecule object based on Vec3 data type (Vec3 objects are nodes of the graph). USAGE: ./ex_Molecule_Vec3 )"; /** @brief Demonstrates how to create a Molecule object based on Vec3 data type (Vec3 are nodes of the graph) * * This demo is similar to ex_Molecule, the difference is that here Vec3 instances are used as graph nodes * rather than PdbAtom instance. It creates a toluene molecule and detects planar angles. * * CATEGORIES: core::chemical::Molecule * KEYWORDS: molecule * IMG: Toluen_dihedral_flat_angle.png * IMG_ALT: Planar angles in a toluen molecule */ int main(const int argc, const char *argv[]) { if ((argc > 1) && utils::options::call_for_help(argv[1])) utils::exit_OK_with_message(program_info); using namespace core::chemical; using namespace core::data::structural; Molecule<Vec3> toluene; Vec3 atoms[] = {Vec3(0, 0, 0), Vec3(1.24, 0.72, 0), Vec3(1.24, 2.16, 0), Vec3(0, 2.88, 0), Vec3(-1.24, 2.16, 0), Vec3(-1.24, 0.72, 0), Vec3(0, -1.52, 0)}; // --- Mark atom numbers to check if the molecule is correct for (core::index2 i = 0; i < 7; ++i) atoms[i].register_ = i; // --- Insert atoms into the molecule for (Vec3 & ai : atoms) toluene.add_atom(ai); // --- Create bonds between them toluene.bind_atoms(0, 1, BondType::AROMATIC); toluene.bind_atoms(1, 2, BondType::AROMATIC); toluene.bind_atoms(2, 3, BondType::AROMATIC); toluene.bind_atoms(3, 4, BondType::AROMATIC); toluene.bind_atoms(4, 5, 
BondType::AROMATIC); toluene.bind_atoms(0, 5, BondType::AROMATIC); toluene.bind_atoms(0, 6, BondType::SINGLE); std::cout << "Connectivity (bonds):\n"; for(auto atom_it=toluene.cbegin_atom();atom_it!=toluene.cend_atom();++atom_it) { std::cout << (*atom_it).register_ << " : "; for(auto atom_it2=toluene.cbegin_atom(*atom_it);atom_it2!=toluene.cend_atom(*atom_it);++atom_it2) std::cout << " "<<(*atom_it2).register_; std::cout << "\n"; } // --- Find all planar angles in the molecule std::vector<std::tuple<Vec3, Vec3, Vec3>> planars; find_planar_angles(toluene, planars); std::vector<double> planar_values; std::cout << "Detected planar angles:\n"; // --- Evaluate and print all the planars for (auto pi : planars) { Vec3 &a1 = std::get<0>(pi); Vec3 &a2 = std::get<1>(pi); Vec3 &a3 = std::get<2>(pi); planar_values.push_back(core::calc::structural::evaluate_planar_angle(a1, a2, a3) * 180.0 / 3.14159); } // --- Sort the values before printing them to make the output stable std::sort(planar_values.begin(),planar_values.end()); for (double value:planar_values) std::cout << value << "\n"; } |

ex_NcbiSimilarityMatrixFactory¶
Test for loading substitution matrices available in BioShell. The program reads a substitution matrix from a given file (NCBI file format) and prints it back on the screen. The program can load the input matrix either from the BioShell database (data/alignments directory) or from a file specified by the user. Custom matrices can be installed manually by copying them to data/alignments/
USAGE:
./ex_NcbiSimilarityMatrixFactory subst-matrix-name
EXAMPLES:
./ex_NcbiSimilarityMatrixFactory BLOSUM45
./ex_NcbiSimilarityMatrixFactory ./BLOSUM45.txt
Keywords:
- sequence alignment
- substitution matrix
Categories:
- core::alignment::scoring::NcbiSimilarityMatrixFactory
Output files:
Program source:
#include <iostream>

#include <core/alignment/scoring/SimilarityMatrix.hh>
#include <core/alignment/scoring/NcbiSimilarityMatrixFactory.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Test for loading substitution matrices available in BioShell. The program reads a substitution matrix
from a given file (NCBI file format) and prints it back on the screen.

The program can either load the input matrix from BioShell database (data/alignments directory)
or from a file specified by a user. One can manually install custom matrices just by copying them
to data/alignments/

USAGE:
    ./ex_NcbiSimilarityMatrixFactory subst-matrix-name

EXAMPLES:
    ./ex_NcbiSimilarityMatrixFactory BLOSUM45
    ./ex_NcbiSimilarityMatrixFactory ./BLOSUM45.txt

)";

/** @brief Test for loading substitution matrices available in BioShell
 *
 * CATEGORIES: core::alignment::scoring::NcbiSimilarityMatrixFactory
 * KEYWORDS: sequence alignment; substitution matrix
 * IMG: heatmap_1.png
 * IMG_ALT: BLOSUM62 matrix plotted
 */
int main(const int argc, const char *argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::alignment::scoring;

  NcbiSimilarityMatrixFactory sim_factory = NcbiSimilarityMatrixFactory::get();
  if (argc == 1) {
    std::vector<std::string> names;
    sim_factory.get().matrix_names(names);
    std::cout << "\nMatrices defined in BioShell:\n";
    for (const auto &n : names) std::cout << "\t" << n;
    std::cout << "\n";
  } else {
    NcbiSimilarityMatrix_SP m = sim_factory.load_matrix(argv[1]);
    std::stringstream out;
    out << "\n";
    m->print("%4d", 4, out);
    std::cout << out.str() << "\n";
  }
}

ex_REMC_Ising¶
The program runs a Replica Exchange Monte Carlo simulation of a simple 2D Ising system (a spin glass). The simulation performs N_INNER x N_OUTER MC cycles and then a replica exchange is attempted.
USAGE:
ex_REMC_Ising [system_size inner_cycles outer_cycles n_exchanges]
EXAMPLE:
ex_REMC_Ising 32 50 100 100
Keywords:
- REMC
- Ising2D
- observer
- simulation
Categories:
- simulations/sampling/ReplicaExchangeMC; simulations/systems/ising/Ising2D
Output files:
Program source:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 | #include <iostream> #include <simulations/evaluators/CallEvaluator.hh> #include <simulations/forcefields/CalculateEnergyBase.hh> #include <simulations/movers/ising/SingleFlip2D.hh> #include <simulations/movers/ising/WolffMove2D.hh> #include <simulations/observers/ObserveEvaluators.hh> #include <simulations/observers/ObserveMoversAcceptance.hh> #include <simulations/observers/TriggerEveryN.hh> #include <simulations/observers/ObserveReplicaFlow.hh> #include <simulations/sampling/IsothermalMC.hh> #include <simulations/sampling/ReplicaExchangeMC.hh> #include <simulations/systems/ising/Ising2D.hh> using namespace core::data::basic; utils::Logger logs("ex_REMC_Ising"); std::string program_info = R"( The program runs a Replica Exchange Monte Carlo simulation of a simple 2D Ising system (a spin glass). The simulation performs N_INNER x N_OUTER MC cycles and then a replica exchange is attempted. USAGE: ex_REMC_Ising [system_size inner_cycles outer_cycles n_exchanges] EXAMPLE: ex_REMC_Ising 32 50 100 100 )"; /** @brief The program runs a Replica Exchange Monte Carlo simulation of a simple 2D Ising system (spin glass). 
* * This example shows how to set up a REMC simulation * * CATEGORIES: simulations/sampling/ReplicaExchangeMC; simulations/systems/ising/Ising2D * KEYWORDS: REMC; Ising2D; observer; simulation * IMG: Energy_plot.png * IMG_ALT: Energy over time in Ising model */ int main(const int argc,const char* argv[]) { using namespace simulations::systems::ising; using namespace simulations::movers::ising; core::index4 n_outer_cycles = 10; core::index4 n_inner_cycles = 10; core::index4 n_exchanges = 10; std::vector<double> temperatures = {100.0, 7.5, 5, 4, 3, 2.5, 2.25, 2, 1.75, 1.5, 1}; core::index2 system_size = 32; if (argc < 2) std::cerr << program_info; else { system_size = atoi(argv[1]); n_inner_cycles = atoi(argv[2]); n_outer_cycles = atoi(argv[3]); n_exchanges = atoi(argv[4]); } core::calc::statistics::Random::get().seed(12345); // --- seed the generator for repeatable results std::vector<std::shared_ptr<Ising2D<core::index1,core::index2>>> systems; std::vector<simulations::sampling::IsothermalMC_SP> replica_samplers; std::vector<simulations::forcefields::TotalEnergy_SP> energies; for (core::index2 irepl = 0; irepl < temperatures.size(); ++irepl) { // ---------- Create the systems to be sampled ---------- std::shared_ptr<Ising2D<core::index1,core::index2>> system = std::make_shared<Ising2D<core::index1,core::index2>>(system_size, system_size); system->initialize(); // Populate system with random spins systems.push_back(system); energies.push_back(system); // ---------- Movers definition ---------- simulations::movers::MoversSet_SP movers = std::make_shared<simulations::movers::MoversSetSweep>(); movers->add_mover( std::make_shared<SingleFlip2D<core::index1,core::index2>>(*system),system->count_spins()); movers->add_mover( std::make_shared<WolffMove2D<core::index1,core::index2>>(*system),system->count_spins()*0.2); // ---------- Create the sampler ---------- auto sampler = std::make_shared<simulations::sampling::IsothermalMC>(movers, temperatures[irepl]); 
replica_samplers.push_back(sampler); sampler->cycles(n_inner_cycles,n_outer_cycles); simulations::observers::ObserveEvaluators_SP observations = std::make_shared<simulations::observers::ObserveEvaluators>(utils::string_format("energy-%.3f.dat",temperatures[irepl])); observations->add_evaluator(system); sampler->outer_cycle_observer(observations); simulations::observers::ObserveMoversAcceptance_SP obs_ms = std::make_shared<simulations::observers::ObserveMoversAcceptance>(*movers, utils::string_format("movers-%.3f.dat",temperatures[irepl])); obs_ms->observe_header(); sampler->outer_cycle_observer(obs_ms); } bool replica_isothermal_observation_mode = true; auto remc = std::make_shared<simulations::sampling::ReplicaExchangeMC>(replica_samplers, energies, replica_isothermal_observation_mode); auto remc_flow = std::make_shared<simulations::observers::ObserveReplicaFlow>(*remc,"replica_flow.dat"); remc->exchange_observer(remc_flow); remc->replica_exchanges(n_exchanges); remc->run(); } |

ex_SelectChainResidueAtom¶
Extracts a fragment of a PDB file by applying a SelectChainResidueAtom selector. The selection string consists of a chain code and a residue range, separated by a colon, e.g.:
- A:-1-10
- AB:
USAGE:
ex_SelectChainResidueAtom input.pdb selector-string
EXAMPLES:
ex_SelectChainResidueAtom 2gb1.pdb A:23-32
ex_SelectChainResidueAtom 1ofz.pdb A:aa
ex_SelectChainResidueAtom 2gb1.pdb A:1-20:_CA_+_N__+_O__+_C__
ex_SelectChainResidueAtom 1ofz.pdb *:*:_CA_
Keywords:
Categories:
- core::data::structural::StructureSelector
Input files:
Output files:
Program source:
#include <iostream>

#include <core/data/io/Pdb.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Extracts a fragment of a PDB file by applying a SelectChainResidueAtom selector.
The selection string consists of chain code and residue range, separated by a colon, e.g.:
    - A:-1-10
    - AB:

USAGE:
    ex_SelectChainResidueAtom input.pdb selector-string

EXAMPLES:
    ex_SelectChainResidueAtom 2gb1.pdb A:23-32
    ex_SelectChainResidueAtom 1ofz.pdb A:aa
    ex_SelectChainResidueAtom 2gb1.pdb A:1-20:_CA_+_N__+_O__+_C__
    ex_SelectChainResidueAtom 1ofz.pdb *:*:_CA_

)";

/** @brief Extracts a fragment of a PDB file.
 *
 * CATEGORIES: core::data::structural::StructureSelector
 * KEYWORDS: structure selectors; PDB input; PDB output
 * IMG: ex_SelectChainResidueAtoms_1.png
 * IMG_ALT: Proline residue selected from 1OFZ deposit
 */
int main(const int argc, const char* argv[]) {

  using namespace core::data::structural;

  if(argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  // ---------- Read a PDB file and create a Structure object
  core::data::io::Pdb reader(argv[1],  // --- data file
      core::data::io::keep_all);       // --- a predicate to read ALL the ATOM lines (by default hydrogens are excluded)
  Structure_SP strctr = reader.create_structure(0);

  // --- Create a selector object from a selector string
  selectors::SelectChainResidueAtom sel(argv[2]);
  Structure_SP full_copy = strctr->clone(sel); // --- cloning with this selector makes a deep copy of everything

  for(auto atom_it=full_copy->first_atom();atom_it!=full_copy->last_atom();++atom_it)
    std::cout << (*atom_it)->to_pdb_line() << "\n";
}

ex_SelectPlanarCAGeometry¶
Reads a PDB file and tests whether geometry at CA atom is tetrahedral or not. The program also prints the actual values of the N-CA-C-CB dihedral angle.
USAGE:
./ex_SelectPlanarCAGeometry input.pdb
EXAMPLE:
./ex_SelectPlanarCAGeometry 5edw.pdb
OUTPUT (fragment):
 112 CYS D OK -2.22 3dcg
 140 ASN E WRONG -2.42 3dcg
 141 LYS E OK -2.23 3dcg
 142 VAL E OK -2.17 3dcg
 144 SER E OK -2.16 3dcg
 145 LEU E OK -2.19 3dcg
Keywords:
- residue geometry
- structure selectors
- PDB input
- structure validation
Categories:
- core::data::structural::ResidueHasBBCB; core::data::structural::SelectResidueByName; core::data::structural::SelectPlanarCAGeometry
Input files:
Output files:
Program source:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 | #include <core/data/io/Pdb.hh> #include <core/calc/structural/angles.hh> #include <core/data/structural/selectors/structure_selectors.hh> #include <core/data/structural/selectors/SelectPlanarCAGeometry.hh> #include <utils/exit.hh> #include <utils/io_utils.hh> std::string program_info = R"( Reads a PDB file and tests whether geometry at CA atom is tetrahedral or not. The program also prints the actual values of the N-CA-C-CB dihedral angle. USAGE: ./ex_SelectPlanarCAGeometry input.pdb EXAMPLE: ./ex_SelectPlanarCAGeometry 5edw.pdb OUTPUT (fragment): 112 CYS D OK -2.22 3dcg 140 ASN E WRONG -2.42 3dcg 141 LYS E OK -2.23 3dcg 142 VAL E OK -2.17 3dcg 144 SER E OK -2.16 3dcg 145 LEU E OK -2.19 3dcg )"; /** @brief Tests whether alpha-carbons actually have tetrahedral geometry as they should. * * CATEGORIES: core::data::structural::ResidueHasBBCB; core::data::structural::SelectResidueByName; core::data::structural::SelectPlanarCAGeometry * KEYWORDS: residue geometry; structure selectors; PDB input; structure validation * IMG: 1NXB_A_56_Glu_pymol_5.png * IMG_ALT: GLU56 of 1NXB deposit has planar geometry of alpha-carbon (witch contradicts basic chemical knowledge) */ int main(const int argc, const char *argv[]) { if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter using namespace core::data::io; using core::calc::structural::to_degrees; core::data::io::Pdb reader(argv[1], is_not_alternative); // file name (PDB format, may be gzip-ped) core::data::structural::Structure_SP strctr = reader.create_structure(0); // --- Selector that returns true if a residue has beta-carbon core::data::structural::selectors::ResidueHasBBCB has_bb_cb; // --- Selector that test the geometry on alpha carbon core::data::structural::selectors::SelectPlanarCAGeometry tester; // --- 
Selector that selects GLY residues core::data::structural::selectors::SelectResidueByName is_gly("GLY"); for (auto res_it = strctr->first_residue(); res_it != strctr->last_residue(); ++res_it) { // --- If a residue is not GLY and has C-beta ... if ((has_bb_cb(**res_it)) && (!is_gly(**res_it))) std::cout << utils::string_format("%4d %3s %4s %s %7.2f %s\n", (*res_it)->id(), (*res_it)->residue_type().code3.c_str(), (*res_it)->owner()->id().c_str(), (tester(**res_it)) ? "WRONG" : " OK ", to_degrees(tester.evaluate_angle(**res_it)), utils::basename(strctr->code()).c_str()); } } |

ex_VonMisesDistribution¶
ex_VonMisesDistribution draws N random values (by default N = 1000) from a normal distribution and fits a von Mises distribution to the data. If exactly two arguments are provided (mu and kappa, respectively), the program tabulates the von Mises distribution for those parameters.
USAGE:
ex_VonMisesDistribution N
ex_VonMisesDistribution mu kappa
EXAMPLES:
ex_VonMisesDistribution 10000
ex_VonMisesDistribution 1.5708 100.0
Keywords:
Categories:
- core/calc/statistics/VonMisesDistribution
Output files:
Program source:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 | #include <iostream> #include <random> #include <core/calc/statistics/VonMisesDistribution.hh> #include <core/calc/statistics/Random.hh> std::string program_info = R"( ex_VonMisesDistribution withdraws N random values (by default N = 1000) from a Normal distribution and fits Von Mises distribution to the data. If exactly two arguments are provided (mu and kappa, respectively) the program tabulates Von Mises distribution for that parameters. USAGE: ex_VonMisesDistribution N ex_VonMisesDistribution mu kappa EXAMPLES: ex_VonMisesDistribution 10000 ex_VonMisesDistribution 1.5708 100.0 )"; /** @brief Example which estimates parameters of von Mises distribution and tabulates its values * CATEGORIES: core/calc/statistics/VonMisesDistribution * KEYWORDS: statistics * IMG: von_mises.png * IMG_ALT: Example von Mises distribution: histogram of a sample, pdf(x) and cdf(x) */ int main(const int argc, const char *argv[]) { using namespace core::calc::statistics; core::calc::statistics::VonMisesDistribution f(std::vector<double>{0.0, 1.0}); // --- initial distribution if (argc == 3) { double mu = atof(argv[1]); double kappa = atof(argv[2]); f.copy_parameters_from(std::vector<double>{mu, kappa}); std::cout << "# tabulating VonMisesDistribution: " << f << "\n"; for (double x = -M_PI; x <= M_PI; x += M_PI / 25.0) std::cout << utils::string_format("%6.3f %9f %5.3f\n", x, f.evaluate(x), VonMisesDistribution::cdf(x, f.mu(), f.kappa())); return 0; } core::index4 N = 1000; if (argc < 2) std::cerr << program_info; else N = atoi(argv[1]); std::vector<std::vector<double>> input_data; Random r = Random::get(); r.seed(9876543); std::normal_distribution<double> dist(M_PI / 2.0, 0.1); // ---------- prepare data that will be use to estimate the distribution for 
(core::index4 i = 0; i < N; ++i) { std::vector<double> v({dist(r)}); input_data.push_back(v); } f.estimate(input_data); // --- run the estimation std::cout << f << "\n"; // ---------- now prepare weighted data: just copy some points ten times and insert them with weight 0.1 input_data.clear(); std::vector<double> weights; for (core::index4 i = 0; i < N; ++i) { double x = dist(r); std::vector<double> v({x}); if (x < 2) { input_data.push_back(v); weights.push_back(1.0); } else { for (int j = 0; j < 10; ++j) { input_data.push_back(v); weights.push_back(0.1); } } } f.estimate(input_data, weights); // --- run the estimation based on weighted observations std::cout << f << "\n"; } |

ex_bf_by_residue¶
ex_bf_by_residue reads a PDB file and prints per-residue statistics of B-factors. The output provides the amino acid type (1-letter code), the residue ID, and the minimum, average and maximum B-factor for that residue.
USAGE:
ex_bf_by_residue input.pdb
EXAMPLE:
ex_bf_by_residue 2gb1.pdb
Keywords:
- PDB input
- B-factors
- structure selectors
Categories:
- core::data::io::Pdb
Input files:
Output files:
Program source:
#include <core/data/io/Pdb.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <utils/string_utils.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ex_bf_by_residue reads a PDB file and prints per-residue statistics of B-factors. The output provides:
amino acid type (1-letter code), residue ID, and minimum, average and maximum b-factors for that residue

USAGE:
    ex_bf_by_residue input.pdb

EXAMPLE:
    ex_bf_by_residue 2gb1.pdb

)";

/** @brief Reads a PDB file and per-residue statistics of B-factors
 *
 * CATEGORIES: core::data::io::Pdb;
 * KEYWORDS: PDB input; B-factors; structure selectors
 * IMG: Bfactor_plot.png
 * IMG_ALT: B-factors of 2GB1 PDB deposit
 */
int main(const int argc, const char *argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  core::data::io::Pdb reader(argv[1]); // file name (PDB format, may be gzip-ped)
  core::data::structural::Structure_SP strctr = reader.create_structure(0);
  core::data::structural::selectors::IsBB is_bb;
  core::data::structural::selectors::IsAA is_aa;

  for (auto res_it = strctr->first_residue(); res_it != strctr->last_residue(); ++res_it) {
    if (!is_aa(**res_it)) continue;
    double min = 9999.0, max = -999.0, avg = 0.0, n = 0.0;
    for (auto atom : **res_it) {
      // if (is_bb(*atom)) continue; // --- uncomment that line to compute statistics for side chain only
      double bf = atom->b_factor();
      if (min > bf) min = bf;
      if (max < bf) max = bf;
      avg += bf;
      n += 1.0;
    }
    std::cout << (*res_it)->residue_type().code1 << " "
              << utils::string_format("%4d %5.2f %5.2f %5.2f\n", (*res_it)->id(), min, avg / n, max);
  }
}

ex_chi_correlation¶
Unit test which calculates Chi dihedral angles for every pair of amino acid side chains measured in two different homologous protein structures which are assumed to be aligned.
USAGE:
ex_chi_correlation file-1.pdb file-2.pdb
EXAMPLE:
ex_chi_correlation 1bgx_aligned.pdb 1xo1_aligned.pdb
Keywords:
- PDB input
- structural properties
- rotamers
Categories:
- core/calc/structural/evaluate_chi
Input files:
Output files:
Program source:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 | #include <iostream> #include <iomanip> #include <core/data/io/Pdb.hh> #include <core/calc/structural/protein_angles.hh> #include <core/chemical/ChiAnglesDefinition.hh> #include <utils/exit.hh> #include <core/calc/structural/angles.hh> std::string program_info = R"( Unit test which calculates Chi dihedral angles for every pair of amino acid side chains measured in two different homologous protein structures which are assumed to be aligned. USAGE: ex_chi_correlation file-1.pdb file-2.pdb EXAMPLE: ex_chi_correlation 1bgx_aligned.pdb 1xo1_aligned.pdb )"; /** @brief Calculates correlation between Chi dihedral angles measured in two different protein structures. * * CATEGORIES: core/calc/structural/evaluate_chi; * KEYWORDS: PDB input; structural properties; rotamers * IMG: ex_chi_correlation_plot.png * IMG_ALT: Example correlation of chi angles between two homologus structures */ int main(const int argc, const char* argv[]) { if(argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter using namespace core::data::structural; core::data::io::Pdb readerA(argv[1]); // file name (PDB format, may be gzip-ped) Structure_SP proteinA = readerA.create_structure(0); core::data::io::Pdb readerB(argv[2]); Structure_SP proteinB = readerB.create_structure(0); std::vector<std::vector<double>> chiA, chiB; std::vector<std::string> labels; // for nice output on a screen std::vector<std::string> mpt_annotationsA; // MPT string - one per residue only (for other lines of a given residue empty strings are inserted) std::vector<std::string> mpt_annotationsB; if (proteinA->count_residues() != proteinB->count_residues()) { std::cerr << "The two input PDB files 
should contain only the aligned parts of input proteins and be equal in length!\n"; return 0; } auto a_res_it = proteinA->first_const_residue(); auto b_res_it = proteinB->first_const_residue(); while(a_res_it!=proteinA->last_const_residue()) { if ((*a_res_it)->residue_type() == (*b_res_it)->residue_type()) { // check chi angles std::vector<double> ca, cb; for (core::index2 k = 1; k <= core::chemical::ChiAnglesDefinition::count_chi_angles((*a_res_it)->residue_type()); ++k) { try { double aA = core::calc::structural::evaluate_chi((**a_res_it), k); double aB = core::calc::structural::evaluate_chi((**b_res_it), k); ca.push_back(aA); cb.push_back(aB); labels.push_back(utils::string_format("%4d%c %4d%c %c %1d", (*a_res_it)->id(), (*a_res_it)->icode(), (*b_res_it)->id(), (*b_res_it)->icode(), (*a_res_it)->residue_type().code1, k)); } catch (utils::exceptions::AtomNotFound e) { std::cerr << e.what() << "\n"; std::cerr << "Can't define chi angle for residue " << (**a_res_it) << "\n"; } } if (ca.size() == cb.size() && ca.size() > 0) { chiA.push_back(ca); chiB.push_back(cb); mpt_annotationsA.push_back(core::calc::structural::define_rotamer(ca)); mpt_annotationsB.push_back(core::calc::structural::define_rotamer(cb)); } } ++a_res_it; ++b_res_it; } // end of while loop over residues std::cout << "#ires jres aa k ichi_k jchi_k delta(chi) irot jrot\n"; double err = 0.0; core::index2 n_reoriented = 0; size_t ilabel=0; for(size_t ires=0;ires<chiA.size();++ires) { for (size_t i = 0; i < chiA[ires].size(); ++i) { double e = fabs(chiA[ires][i] - chiB[ires][i]); if (e > M_PI) e = 2.0 * M_PI - e; if (e > 0.523) ++n_reoriented; // --- i.e. 
30 degrees of error std::cout << labels[ilabel] << utils::string_format("%8.2f %8.2f %8.2f", core::calc::structural::to_degrees(chiA[ires][i]), core::calc::structural::to_degrees(chiB[ires][i]), core::calc::structural::to_degrees(e)); if (i == 0) std::cout << std::setw(5) << mpt_annotationsA[ires] << std::setw(5) << mpt_annotationsB[ires] << "\n"; else std::cout << "\n"; err += e; ++ilabel; } } std::cout <<"# avg_err, n_diff, n_res: " << err / double(chiA.size()) << " "<<n_reoriented<<" "<<chiA.size()<<"\n"; } |

ex_evaluate_chi¶
Calculates all side chain Chi dihedral angles for the input protein structure
USAGE:
ex_evaluate_chi input.pdb
EXAMPLE:
ex_evaluate_chi 2kwi.pdb
Keywords:
Categories:
- core::chemical::ChiAnglesDefinition; core::calc::structural::evaluate_chi()
Input files:
Output files:
Program source:
#include <cmath>
#include <iostream>

#include <core/data/io/Pdb.hh>
#include <core/chemical/ChiAnglesDefinition.hh>
#include <core/calc/structural/protein_angles.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Calculates all side chain Chi dihedral angles for the input protein structure

USAGE:
    ex_evaluate_chi input.pdb

EXAMPLE:
    ex_evaluate_chi 2kwi.pdb

)";

/** @brief Calculates all side chain Chi dihedral angles for the input protein structure
 *
 * CATEGORIES: core::chemical::ChiAnglesDefinition; core::calc::structural::evaluate_chi()
 * KEYWORDS:   PDB input; structural properties; structure validation
 * IMG: ex_evaluate_chi.png
 * IMG_ALT: Chi1-Chi2 statistics for ILE residue
 */
int main(const int argc, const char *argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  core::data::io::Pdb reader(argv[1]); // Create a PDB reader for a given file
  core::data::structural::Structure_SP str = reader.create_structure(0); // Create a structure object from the first model

  for (auto ires = str->first_residue(); ires != str->last_residue(); ++ires) { // iterate over all residues
    std::string line = utils::string_format("%4d %3s %4s", (*ires)->id(),
        (*ires)->residue_type().code3.c_str(), (*ires)->owner()->id().c_str());
    try {
      for (unsigned short i = 1; i <= core::chemical::ChiAnglesDefinition::count_chi_angles((*ires)->residue_type()); ++i)
        line += utils::string_format(" %6.1f", core::calc::structural::evaluate_chi(**ires, i) * 180.0 / M_PI); // radians to degrees
      std::cout << line << "\n";
    } catch (const std::exception &e) {
      std::cerr << "Skipping incomplete residue: " << line << "\n";
    }
  }
}
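A Chi angle is a dihedral angle defined by four consecutive side chain atoms. The sketch below shows how such an angle can be computed from four points in space. It is a stand-alone illustration and not the BioShell implementation: `Point3` and `dihedral()` are hypothetical names introduced here, and the sign convention may differ from the one used by `evaluate_chi()`.

```cpp
#include <cmath>

// Hypothetical minimal 3D point type (not a BioShell class)
struct Point3 { double x, y, z; };

static Point3 sub(const Point3 &a, const Point3 &b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

static Point3 cross(const Point3 &a, const Point3 &b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

static double dot(const Point3 &a, const Point3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Signed dihedral angle in radians defined by the four points a-b-c-d
double dihedral(const Point3 &a, const Point3 &b, const Point3 &c, const Point3 &d) {
  const Point3 b1 = sub(b, a), b2 = sub(c, b), b3 = sub(d, c); // three consecutive bond vectors
  const Point3 n1 = cross(b1, b2), n2 = cross(b2, b3);         // normals of the two planes
  const double y = std::sqrt(dot(b2, b2)) * dot(b1, n2);
  const double x = dot(n1, n2);
  return std::atan2(y, x);
}
```

For Chi1 of most residues the four points would be the N, CA, CB and CG atoms; multiplying the result by 180/pi gives the value in degrees, as printed by the example above.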

ex_evaluate_phi_psi¶
Calculates Phi,Psi angles (Ramachandran map) for every model found in the input protein structure
USAGE:
ex_evaluate_phi_psi input.pdb [chain-id]
EXAMPLE:
ex_evaluate_phi_psi 2kwi.pdb B
Keywords:
Categories:
- core::calc::structural::LocalBackboneProperties
Input files:
Output files:
Program source:
#include <iostream>

#include <core/data/io/Pdb.hh>
#include <core/calc/structural/local_backbone_geometry.hh>
#include <utils/exit.hh>
#include <core/data/structural/ResidueSegmentProvider.hh>
#include <core/data/structural/selectors/ResidueSegmentSelector.hh>
#include <core/data/structural/selectors/SelectPlanarCAGeometry.hh>
#include <core/protocols/selection_protocols.hh>

std::string program_info = R"(

Calculates Phi,Psi angles (Ramachandran map) for every model found in the input protein structure

USAGE:
    ex_evaluate_phi_psi input.pdb [chain-id]

EXAMPLE:
    ex_evaluate_phi_psi 2kwi.pdb B

)";

/** @brief Calculates Phi,Psi angles (Ramachandran map) for the input protein structure
 *
 * CATEGORIES: core::calc::structural::LocalBackboneProperties
 * KEYWORDS:   PDB input; structural properties; structure validation
 * IMG: phi_psi_scatterplot.png
 * IMG_ALT: Phi,Psi values plotted for every residue of 2KWI PDB deposit; radius of a circle denotes standard deviation calculated from the NMR ensemble
 */
int main(const int argc, const char *argv[]) {

  using namespace core::calc::structural;
  using namespace core::data::structural;
  using namespace core::data::io;

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  core::data::io::Pdb reader(argv[1], keep_all, keep_all, true); // Create a PDB reader for a given file
  selectors::HasProperlyConnectedCA is_connected;
  core::data::structural::selectors::ResidueHasAllHeavyAtoms check_atoms;
  core::data::structural::selectors::SelectPlanarCAGeometry if_flat;
  Phi phi(1);
  Psi psi(1);
  for (int i = 0; i < reader.count_models(); ++i) { // --- iterate over all models
    core::data::structural::Structure_SP str = reader.create_structure(i);
    if (argc == 3) { // --- keep only the requested chain
      core::data::structural::selectors::ChainSelector pick_chain(argv[2]);
      core::protocols::keep_selected_chains(pick_chain, *str);
    }
    ResidueSegmentProvider rsp(str, 3); // --- provides segments of three residues
    while (rsp.has_next()) {
      const ResidueSegment_SP seg = rsp.next();
      if (is_connected(*seg)) {
        // NOTE: each 'break' below leaves only this inner loop,
        // so the segment is printed regardless of the outcome of these checks
        for (int k = 0; k < 3; ++k) {
          if (if_flat((*seg)[k])) break;
          if (!check_atoms((*seg)[k])) break;
        }
        const auto &res = *(*seg)[1]; // --- the central residue of the segment
        std::cout << utils::string_format("%4s %s %4d %3s ", str->code().c_str(), res.owner()->id().c_str(),
            res.id(), res.residue_type().code3.c_str());
        std::cout << seg->sequence()->sequence << " " << seg->sequence()->str() << " ";
        std::cout << utils::string_format(phi.format(), phi(*seg)) << " ";
        std::cout << utils::string_format(psi.format(), psi(*seg)) << "\n";
      }
    }
  }
}
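The example walks over segments of three residues because Phi of residue i needs the C atom of residue i-1, and Psi needs the N atom of residue i+1. The sketch below illustrates such a sliding window on plain residue numbers. It is an assumption-laden simplification: `three_residue_windows()` is a hypothetical helper, and consecutive numbering is used here as a stand-in for the geometric connectivity test that the example delegates to `HasProperlyConnectedCA`.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Returns every window of three residues whose numbers are consecutive.
// (Hypothetical helper; BioShell checks connectivity on atoms, not on numbering.)
std::vector<std::array<int, 3>> three_residue_windows(const std::vector<int> &residue_ids) {
  std::vector<std::array<int, 3>> windows;
  for (std::size_t i = 0; i + 2 < residue_ids.size(); ++i) {
    const bool connected = residue_ids[i + 1] == residue_ids[i] + 1 &&
                           residue_ids[i + 2] == residue_ids[i + 1] + 1;
    if (connected) windows.push_back({residue_ids[i], residue_ids[i + 1], residue_ids[i + 2]});
  }
  return windows;
}
```

With residue numbers 1 2 3 5 6 7 8, the chain break between 3 and 5 removes two windows, so Phi and Psi would be reported only for residues 2, 6 and 7.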

ex_plot_VonMises_mixture¶
ex_plot_VonMises_mixture evaluates a mixture of von Mises distributions so it can be plotted nicely
USAGE:
ex_plot_VonMises_mixture scaling mu kappa [scaling2 mu2 kappa2 ...]
EXAMPLE:
ex_plot_VonMises_mixture 0.487862 -3.00582 17.4059 0.0794212 -1.02886 112.164
where the six numbers are the scaling factor, mean (mu) and concentration (kappa) of each of the two von Mises distributions
REFERENCE: http://mathworld.wolfram.com/vonMisesDistribution.html https://en.wikipedia.org/wiki/Von_Mises_distribution
Keywords:
Categories:
- core/calc/statistics/VonMisesDistribution
Output files:
Program source:
#include <cmath>
#include <cstdlib>
#include <iostream>
#include <random>

#include <core/calc/statistics/VonMisesDistribution.hh>
#include <core/calc/statistics/Random.hh>

std::string program_info = R"(

ex_plot_VonMises_mixture evaluates a mixture of von Mises distributions so it can be plotted nicely

USAGE:
    ex_plot_VonMises_mixture scaling mu kappa [scaling2 mu2 kappa2 ...]

EXAMPLE:
    ex_plot_VonMises_mixture 0.487862 -3.00582 17.4059 0.0794212 -1.02886 112.164

where the six numbers are the scaling factor, mean (mu) and concentration (kappa) of each of the two von Mises distributions

REFERENCE:
http://mathworld.wolfram.com/vonMisesDistribution.html
https://en.wikipedia.org/wiki/Von_Mises_distribution

)";

/** @brief Example which evaluates a mixture of von Mises distributions so it can be plotted nicely
 *
 * CATEGORIES: core/calc/statistics/VonMisesDistribution
 * KEYWORDS:   statistics
 * IMG: ex_plot_VonMises_mixture.png
 * IMG_ALT: Mixture of von Mises functions plotted
 */
int main(const int argc, const char *argv[]) {

  using namespace core::calc::statistics;

  if (argc < 4) { // --- at least one (scaling, mu, kappa) triplet is required
    std::cerr << program_info;
    return 0;
  }

  std::vector<double> factors;
  std::vector<VonMisesDistribution> components;
  for (int i = 1; i + 2 < argc; i += 3) { // --- read complete triplets only
    factors.push_back(atof(argv[i]));
    components.push_back(VonMisesDistribution{atof(argv[i + 1]), atof(argv[i + 2])});
  }

  for (double x = -M_PI; x <= M_PI; x += M_PI / 62.8) { // --- tabulate the mixture on the [-pi, pi] range
    double val = 0;
    for (size_t i = 0; i < components.size(); ++i) val += factors[i] * components[i].evaluate(x);
    std::cout << utils::string_format("%6.3f %9f\n", x, val);
  }
}
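For reference, the von Mises density is f(x | mu, kappa) = exp(kappa cos(x - mu)) / (2 pi I0(kappa)), where I0 is the modified Bessel function of the first kind of order zero. The sketch below evaluates a mixture of such densities without BioShell. All names in it (`bessel_i0`, `von_mises_pdf`, `Component`, `mixture_value`) are hypothetical, and the power series used for I0 is just one simple way to compute it; it is not claimed to be what `VonMisesDistribution::evaluate()` does internally.

```cpp
#include <cmath>
#include <vector>

const double kPi = 3.14159265358979323846;

// Modified Bessel function I0(x) summed from its power series: sum_k ((x/2)^2)^k / (k!)^2
double bessel_i0(double x) {
  double term = 1.0, sum = 1.0;
  const double q = 0.25 * x * x;
  for (int k = 1; k < 500; ++k) {
    term *= q / (double(k) * k);
    sum += term;
    if (term < 1e-16 * sum) break; // remaining terms no longer change the sum
  }
  return sum;
}

// Density of a single von Mises distribution
double von_mises_pdf(double x, double mu, double kappa) {
  return std::exp(kappa * std::cos(x - mu)) / (2.0 * kPi * bessel_i0(kappa));
}

// One mixture component: scaling factor, mean and concentration (hypothetical type)
struct Component { double scaling, mu, kappa; };

// Weighted sum of the component densities at point x
double mixture_value(const std::vector<Component> &components, double x) {
  double val = 0.0;
  for (const Component &c : components) val += c.scaling * von_mises_pdf(x, c.mu, c.kappa);
  return val;
}
```

A larger kappa gives a narrower peak; for kappa equal to zero the density becomes uniform on the circle.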

ex_Array2DSymmetric¶
Unit test which demonstrates how to use the Array2DSymmetric class. The test fills a matrix with random data and prints it on the screen.
USAGE:
./ex_Array2DSymmetric
Keywords:
Categories:
- core::data::basic::Array2DSymmetric
Output files:
Program source:
#include <iostream>
#include <random>

#include <core/data/basic/Array2DSymmetric.hh>
#include <core/calc/statistics/Random.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which demonstrates how to use the Array2DSymmetric class.
The test fills a matrix with random data and prints it on the screen.

USAGE:
    ./ex_Array2DSymmetric

)";

/** @brief Simple test for Array2DSymmetric class.
 *
 * The test fills a matrix with random data and prints it on the screen.
 *
 * CATEGORIES: core::data::basic::Array2DSymmetric
 * KEYWORDS:   data structures; random numbers
 */
int main(const int argc, const char* argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  core::calc::statistics::Random r = core::calc::statistics::Random::get();
  std::uniform_int_distribution<core::index1> uniform_bytes;
  r.seed(12345); // --- seed the generator for repeatable results

  core::data::basic::Array2DSymmetric<core::index1> m(10); // --- a symmetric 10 x 10 matrix
  for (core::index4 n = 0; n < 1000; ++n) {
    core::index1 i = uniform_bytes(r) % 10; // --- random row ...
    core::index1 j = uniform_bytes(r) % 10; // --- ... and column
    m.set(i, j, uniform_bytes(r));
  }
  m.print("%4d", std::cout);
}
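A symmetric matrix needs only one triangle to be stored, since element (i, j) equals element (j, i). The sketch below shows one common way to do that with a flat array; `SymmetricMatrix` is a hypothetical class written for this illustration, and the actual storage layout of `Array2DSymmetric` is an assumption not confirmed by this page.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical symmetric matrix that stores only the lower triangle (with the diagonal)
template <typename T>
class SymmetricMatrix {
public:
  explicit SymmetricMatrix(std::size_t n) : n_(n), data_(n * (n + 1) / 2, T()) {}

  void set(std::size_t i, std::size_t j, const T &v) { data_[index(i, j)] = v; }
  const T &get(std::size_t i, std::size_t j) const { return data_[index(i, j)]; }

  std::size_t size() const { return n_; }              // number of rows (and columns)
  std::size_t stored() const { return data_.size(); }  // number of elements actually kept

private:
  // Maps (i, j) and (j, i) to the same slot: rows 0..i-1 hold i*(i+1)/2 elements
  static std::size_t index(std::size_t i, std::size_t j) {
    if (i < j) std::swap(i, j);
    return i * (i + 1) / 2 + j;
  }

  std::size_t n_;
  std::vector<T> data_;
};
```

A 10 x 10 matrix then needs 55 stored values instead of 100, and writing to (2, 7) is immediately visible at (7, 2).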

ex_AtomSelector¶
Simple example showing how to use atom selectors. Each selector returns true or false. This example uses selectors to check if:
- an atom is an alpha-carbon (core::data::structural::IsCA)
- an atom is a beta-carbon (core::data::structural::IsCB)
- an atom is in backbone (core::data::structural::IsBB)
- an atom is of the specified element (core::data::structural::IsElement)
- an atom is either beta-carbon or a backbone atom (core::data::structural::IsBBCB)
- an atom is of the specified name (core::data::structural::IsNamedAtom)
- an atom is neither beta-carbon nor a backbone atom (core::data::structural::InverseAtomSelector of core::data::structural::IsBBCB)
USAGE:
./ex_AtomSelector
Keywords:
Categories:
- core/data/structural/structure_selectors.hh
Output files:
Program source:
#include <iostream>
#include <iomanip> // for std::setw()
#include <ios>     // for std::boolalpha
#include <sstream>

#include <core/data/io/Pdb.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Simple example showing how to use atom selectors. Each selector returns true or false.
This example uses selectors to check if:
  - an atom is an alpha-carbon (core::data::structural::IsCA)
  - an atom is a beta-carbon (core::data::structural::IsCB)
  - an atom is in backbone (core::data::structural::IsBB)
  - an atom is of the specified element (core::data::structural::IsElement)
  - an atom is either beta-carbon or a backbone atom (core::data::structural::IsBBCB)
  - an atom is of the specified name (core::data::structural::IsNamedAtom)
  - an atom is neither beta-carbon nor a backbone atom (core::data::structural::InverseAtomSelector
    of core::data::structural::IsBBCB)

USAGE:
    ./ex_AtomSelector

)";

std::string thr = R"(ATOM    726  N   THR A  49      16.822  -5.118  -7.249  1.00  0.00           N
ATOM    727  CA  THR A  49      18.249  -4.825  -7.180  1.00  0.00           C
ATOM    728  C   THR A  49      18.495  -3.354  -6.872  1.00  0.00           C
ATOM    729  O   THR A  49      19.599  -2.845  -7.066  1.00  0.00           O
ATOM    730  CB  THR A  49      18.965  -5.191  -8.493  1.00  0.00           C
ATOM    731  OG1 THR A  49      18.016  -5.723  -9.426  1.00  0.00           O
ATOM    732  CG2 THR A  49      20.053  -6.223  -8.238  1.00  0.00           C
ATOM    733  H   THR A  49      16.231  -4.547  -7.836  1.00  0.00           H
ATOM    734  HA  THR A  49      18.702  -5.391  -6.366  1.00  0.00           H
ATOM    735  HB  THR A  49      19.411  -4.291  -8.916  1.00  0.00           H
ATOM    736  HG1 THR A  49      17.144  -5.733  -9.024  1.00  0.00           H
ATOM    737 1HG2 THR A  49      20.548  -6.468  -9.177  1.00  0.00           H
ATOM    738 2HG2 THR A  49      20.782  -5.816  -7.538  1.00  0.00           H
ATOM    739 3HG2 THR A  49      19.607  -7.123  -7.817  1.00  0.00           H
)";

/** @brief Demonstrates how to use atom selectors.
 *
 * CATEGORIES: core/data/structural/structure_selectors.hh
 * KEYWORDS:   PDB input; structure selectors
 */
int main(const int argc, const char* argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  std::stringstream in(thr); // Create an input stream that will provide data from a string
  core::data::io::Pdb reader(in,  // data stream
      core::data::io::keep_all);  // a predicate to read ALL the ATOM lines (hydrogens are excluded by default)
  core::data::structural::Structure_SP strctr = reader.create_structure(0);

  core::data::structural::selectors::IsCA ca_test;
  core::data::structural::selectors::IsCB cb_test;
  core::data::structural::selectors::IsBB bb_test;
  core::data::structural::selectors::IsElement is_H("H");
  core::data::structural::selectors::IsBBCB bb_cb_test;
  core::data::structural::selectors::InverseAtomSelector not_bb_cb(bb_cb_test);
  core::data::structural::selectors::IsNamedAtom is_og1(" OG1"); // note the padding for four characters!

  std::cout << "atom is_CA is_CB is_BB is_H is_OG1 is_bb_CB !is_bb_CB\n";
  for (auto ai = strctr->first_atom(); ai != strctr->last_atom(); ++ai)
    std::cout << (*ai)->atom_name() << " "
              << std::setw(5) << std::boolalpha << ca_test(**ai) << " "
              << std::setw(5) << std::boolalpha << cb_test(**ai) << " "
              << std::setw(5) << std::boolalpha << bb_test(**ai) << " "
              << std::setw(5) << std::boolalpha << is_H(**ai) << " "
              << std::setw(5) << std::boolalpha << is_og1(**ai) << " "
              << std::setw(5) << std::boolalpha << bb_cb_test(**ai) << " "
              << std::setw(5) << std::boolalpha << not_bb_cb(**ai) << "\n";
}
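The selectors used above are predicates: callable objects that take an atom and return true or false, and that can be wrapped to build new predicates. The sketch below reproduces this idea with the standard library only. The `Atom` struct and the functions `is_named`, `is_element`, `either` and `inverse` are hypothetical stand-ins, not the BioShell selector classes.

```cpp
#include <functional>
#include <string>

// Hypothetical minimal atom record (not a BioShell class)
struct Atom {
  std::string name;     // four-character PDB atom name, e.g. " CA "
  std::string element;  // element symbol, e.g. "C"
};

using AtomSelector = std::function<bool(const Atom &)>;

// Selects atoms by their exact (padded) name
AtomSelector is_named(const std::string &name) {
  return [name](const Atom &a) { return a.name == name; };
}

// Selects atoms by their element symbol
AtomSelector is_element(const std::string &element) {
  return [element](const Atom &a) { return a.element == element; };
}

// True when at least one of the two selectors accepts the atom
AtomSelector either(AtomSelector first, AtomSelector second) {
  return [first, second](const Atom &a) { return first(a) || second(a); };
}

// Negates any selector, in the spirit of InverseAtomSelector
AtomSelector inverse(AtomSelector selector) {
  return [selector](const Atom &a) { return !selector(a); };
}
```

For example, `inverse(either(is_named(" CA "), is_named(" CB ")))` accepts every atom that is neither an alpha-carbon nor a beta-carbon.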

ex_AtomicElement¶
Unit test which shows how to use AtomicElement class. It prints all the element names
USAGE:
./ex_AtomicElement
Keywords:
- chemical elements
Categories:
- core::chemical::AtomicElement
Output files:
Program source:
#include <iostream>
#include <iomanip>

#include <core/chemical/AtomicElement.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which shows how to use AtomicElement class. It prints all the element names

USAGE:
    ./ex_AtomicElement

)";

/** @brief Example showing how to use AtomicElement class
 *
 * CATEGORIES: core::chemical::AtomicElement
 * KEYWORDS:   chemical elements
 */
int main(int argc, char *argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  using namespace core::chemical;

  if (argc < 2) // --- no arguments: print all known elements
    for (const std::pair<std::string, AtomicElement> &e : AtomicElement::elements_by_symbol)
      std::cout << e.second << std::endl;
  else          // --- otherwise print only the elements requested by their symbols
    for (int i = 1; i < argc; i++)
      std::cout << AtomicElement::by_symbol(argv[i]) << "\n";
}

ex_BetaStructuresGraph¶
Reads a PDB file, creates a BetaStructuresGraph for it and finds all strands as connected components of that graph
USAGE:
ex_BetaStructuresGraph 5edw.pdb
Keywords:
Categories:
- core::data::structural::BetaStructuresGraph
Input files:
Output files:
Program source:
#include <iostream>

#include <core/data/io/Pdb.hh>
#include <utils/exit.hh>
#include <core/algorithms/graph_algorithms.hh>
#include <core/data/structural/BetaStructuresGraph.hh>
#include <core/calc/structural/ProteinArchitecture.hh>

std::string program_info = R"(

Reads a PDB file, creates a BetaStructuresGraph for it and finds all strands as connected components of that graph

USAGE:
    ex_BetaStructuresGraph 5edw.pdb

)";

/** @brief Creates a BetaStructuresGraph and finds all strands
 *
 * CATEGORIES: core::data::structural::BetaStructuresGraph
 * KEYWORDS:   PDB input
 */
int main(const int argc, const char* argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::structural;
  using namespace core::data::io;

  core::data::io::Pdb reader(argv[1], is_not_alternative, only_ss_from_header, true);
  core::data::structural::Structure_SP strctr = reader.create_structure(0);
  core::calc::structural::ProteinArchitecture a(*strctr);

  BetaStructuresGraph_SP g = a.create_strand_graph(); // --- strands are nodes, strand pairings are edges
  auto sheets = core::algorithms::connected_components<BetaStructuresGraph, Strand_SP, StrandPairing_SP>(*g);
  int cnt = 0;
  for (const auto &sheet : sheets) { // --- each connected component is printed as a separate sheet
    std::cout << utils::string_format("-------------- Sheet %d -------------------\n", ++cnt);
    for (auto it = sheet->cbegin_strand(); it != sheet->cend_strand(); ++it)
      std::cout << **it << "\n";
  }
}
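In the example above strands are nodes of a graph and strand pairings are its edges, so each connected component groups the strands that belong together. The sketch below shows a generic connected-components search on integer node indexes. It is a plain depth-first search written for illustration; the function is hypothetical and is not the BioShell `core::algorithms::connected_components` template.

```cpp
#include <utility>
#include <vector>

// Finds connected components of an undirected graph with nodes 0 .. n_nodes-1
std::vector<std::vector<int>> connected_components(int n_nodes,
                                                   const std::vector<std::pair<int, int>> &edges) {
  std::vector<std::vector<int>> adjacency(n_nodes);
  for (const auto &e : edges) {  // build the neighbor lists
    adjacency[e.first].push_back(e.second);
    adjacency[e.second].push_back(e.first);
  }
  std::vector<bool> visited(n_nodes, false);
  std::vector<std::vector<int>> components;
  for (int start = 0; start < n_nodes; ++start) {
    if (visited[start]) continue;  // already assigned to some component
    std::vector<int> component, stack{start};
    visited[start] = true;
    while (!stack.empty()) {       // iterative depth-first search
      const int node = stack.back();
      stack.pop_back();
      component.push_back(node);
      for (int next : adjacency[node])
        if (!visited[next]) { visited[next] = true; stack.push_back(next); }
    }
    components.push_back(component);
  }
  return components;
}
```

With six strands and pairings 0-1, 1-2 and 3-4, the search returns three groups: strands 0, 1, 2; strands 3, 4; and the unpaired strand 5.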

ex_BioShellVersion¶
Unit test for BioShellVersion class which prints the BioShell version info - a string that unambiguously describes code version (Git SHA and branch) and compilation time (Git timestamp). Note that the output changes with every git / cmake operation.
USAGE:
./ex_BioShellVersion
Keywords:
Categories:
- core/BioShellVersion
Program source:
#include <iostream>

#include <core/BioShellVersion.hh>
#include <utils/exit.hh>
#include <utils/options/OptionParser.hh>

std::string program_info = R"(

Unit test for BioShellVersion class which prints the BioShell version info - a string
that unambiguously describes code version (Git SHA and branch) and compilation time (Git timestamp).
Note that the output changes with every git / cmake operation.

USAGE:
    ./ex_BioShellVersion

)";

/** @brief Test for BioShellVersion class prints the BioShell version info
 *
 * CATEGORIES: core/BioShellVersion
 * KEYWORDS:   bioshell
 */
int main(const int argc, const char* argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  std::cout << "BioShell boilerplate:\n";
  std::cout << core::BioShellVersion() << "\n";
}

ex_BivariateNormal¶
Estimates parameters of a two-dimensional Gaussian distribution. The program expects a file with columns of real values, from which the parameters of the distribution are estimated. If no input file is given, the example draws 100000 random points from a normal distribution and then estimates a normal distribution from that sample.
USAGE:
ex_BivariateNormal infile [x_column y_column]
EXAMPLE:
./ex_BivariateNormal bivariate_normal.dat 0 1
where x_column y_column are optional parameters that indicate which columns should be used for estimation; by default columns 0 and 1 are used.
Keywords:
Categories:
- core::calc::statistics::BivariateNormal; core::calc::statistics::Random
Input files:
Output files:
Program source:
#include <math.h>
#include <iostream>
#include <random>

#include <core/data/io/DataTable.hh>
#include <core/calc/statistics/Random.hh>
#include <core/calc/statistics/BivariateNormal.hh>
#include <core/calc/statistics/RobustDistributionDecorator.hh>

std::string program_info = R"(

Estimates parameters of a two-dimensional Gaussian distribution.
The program expects a file with columns of real values, from which the parameters of the distribution
are estimated. If no input file is given, the example draws 100000 random points from a normal
distribution and then estimates a normal distribution from that sample.

USAGE:
    ex_BivariateNormal infile [x_column y_column]

EXAMPLE:
    ./ex_BivariateNormal bivariate_normal.dat 0 1

where x_column y_column are optional parameters that indicate which columns should be used for estimation;
by default columns 0 and 1 are used.

)";

/** @brief Estimates parameters of a two-dimensional Gaussian distribution
 *
 * CATEGORIES: core::calc::statistics::BivariateNormal; core::calc::statistics::Random
 * KEYWORDS:   statistics; random numbers; estimation
 */
int main(const int argc, const char* argv[]) {

  using namespace core::calc::statistics;

  std::vector<std::vector<double> > data_2D;
  std::vector<double> row(2);
  if (argc == 1) { // --- No input file? Generate random data for the test
    std::cerr << program_info;
    Random rd = core::calc::statistics::Random::get();
    rd.seed(12345);      // --- seed the generator for repeatable results
    unsigned N = 100000; // --- the number of random points to use in tests
    core::calc::statistics::NormalRandomDistribution<double> nX(1.0, 2.5);
    core::calc::statistics::NormalRandomDistribution<double> nY(2.0, 0.7);
    for (unsigned i = 0; i < N; ++i) { // --- get a random sample in 2D
      double x = nX(rd);
      double y = nY(rd);
      row[0] = x - y; // --- make X variable correlated with Y
      row[1] = x + y;
      data_2D.push_back(row);
    }
  } else { // --- otherwise read the sample from the selected columns of the input file
    core::data::io::DataTable in_data(argv[1]);
    int column_x_id = 0, column_y_id = 1;
    if (argc > 3) {
      column_x_id = utils::from_string<int>(argv[2]);
      column_y_id = utils::from_string<int>(argv[3]);
    }
    for (const auto &data_row : in_data) {
      row[0] = data_row.get<double>(column_x_id);
      row[1] = data_row.get<double>(column_y_id);
      data_2D.push_back(row);
    }
  }

  std::vector<double> initial_parameters{0.0, 0.0, 1.0, 1.0, 1.0};
  // --- Here we declare a 2D normal distribution ...
  core::calc::statistics::BivariateNormal n(initial_parameters);
  // ... and estimate its parameters
  const std::vector<double> &params = n.estimate(data_2D);
  // show the estimated parameters of the distribution
  std::cout << " estimated parameters: " << params[0] << " " << params[1] << " "
            << params[2] << " " << params[3] << " " << params[4] << "\n";

  // --- Repeat the estimation with the robust variant of the distribution
  core::calc::statistics::RobustDistributionDecorator<BivariateNormal> rn(initial_parameters, 0.05);
  const std::vector<double> &params_r = rn.estimate(data_2D);
  // show the estimated parameters of the distribution
  std::cout << " estimated parameters (robust): " << params_r[0] << " " << params_r[1] << " "
            << params_r[2] << " " << params_r[3] << " " << params_r[4] << "\n";
}
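A bivariate normal distribution is described by five numbers, which matches the five values printed by the example. The sketch below computes maximum-likelihood estimates of two means, two standard deviations and the correlation coefficient from rows of (x, y) values. It is only an illustration: `BivariateEstimate` and `estimate_bivariate()` are hypothetical names, and the order and exact meaning of the parameters returned by `BivariateNormal::estimate()` are not stated on this page, so they should not be inferred from this code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the five estimated parameters
struct BivariateEstimate { double mean_x, mean_y, sigma_x, sigma_y, rho; };

// Maximum-likelihood estimates from rows holding (x, y) pairs
BivariateEstimate estimate_bivariate(const std::vector<std::vector<double>> &rows) {
  BivariateEstimate e{0.0, 0.0, 0.0, 0.0, 0.0};
  if (rows.empty()) return e;
  const double n = static_cast<double>(rows.size());
  for (const auto &r : rows) {  // first pass: the means
    e.mean_x += r[0] / n;
    e.mean_y += r[1] / n;
  }
  double sxx = 0.0, syy = 0.0, sxy = 0.0;
  for (const auto &r : rows) {  // second pass: sums of squared deviations
    const double dx = r[0] - e.mean_x, dy = r[1] - e.mean_y;
    sxx += dx * dx;
    syy += dy * dy;
    sxy += dx * dy;
  }
  e.sigma_x = std::sqrt(sxx / n);
  e.sigma_y = std::sqrt(syy / n);
  if (sxx > 0.0 && syy > 0.0) e.rho = sxy / std::sqrt(sxx * syy); // Pearson correlation
  return e;
}
```

For the synthetic data generated by the example (x - y and x + y of two independent normal variables), such an estimate would show a clearly non-zero correlation because the two variances differ.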

ex_BoundedPriorityQueue¶
Unit test which shows how to use the BoundedPriorityQueue data structure. BoundedPriorityQueue is a sorted queue with a pre-defined maximum capacity. Its purpose is to keep the N best elements of what was inserted into the queue. Overflow elements are removed from the queue.
USAGE:
./ex_BoundedPriorityQueue
Keywords:
- algorithms
- data structures
- BoundedPriorityQueue
Categories:
- core::data::basic::BoundedPriorityQueue
Output files:
Program source:
#include <functional>
#include <iostream>
#include <limits>
#include <random>

#include <core/index.hh>
#include <core/calc/statistics/Random.hh>
#include <core/data/basic/BoundedPriorityQueue.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which shows how to use the BoundedPriorityQueue data structure.
BoundedPriorityQueue is a sorted queue with a pre-defined maximum capacity. Its purpose is to keep
the N best elements of what was inserted into the queue. Overflow elements are removed from the queue.

USAGE:
    ./ex_BoundedPriorityQueue

)";

using namespace core::data::basic;

/** @brief Simple demo for BoundedPriorityQueue class
 *
 * This program creates a BoundedPriorityQueue and fills it with random numbers.
 * When printed, they should be ordered descending
 *
 * CATEGORIES: core::data::basic::BoundedPriorityQueue
 * KEYWORDS:   algorithms; data structures; BoundedPriorityQueue
 */
int main(int cnt, char *argv[]) {

  if ((cnt > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  // ---------- test on real values
  typedef std::function<bool(const double, const double)> ComparatorType;
  core::data::basic::BoundedPriorityQueue<double, ComparatorType, ComparatorType> q(
      [&](double x, double y) { return x > y; },
      [&](double x, double y) { return x == y; },
      10, 20, -std::numeric_limits<double>::max());

  core::calc::statistics::Random r = core::calc::statistics::Random::get();
  std::uniform_real_distribution<float> flat_f(0, 10.0);
  for (core::index4 i = 0; i < 30; ++i) q.push(flat_f(r));

  for (core::index4 i = 0; i < q.size(); ++i) { // --- print all elements, starting from the first one
    std::cout << q[i] << " ";
    if (i > 0 && q[i - 1] < q[i])
      std::cerr << utils::string_format("Incorrect ordering in a bounded priority queue, %f before %f\n",
          q[i - 1], q[i]);
  }
  std::cout << "\n";

  // ---------- test on integers
  typedef std::function<bool(int, int)> ComparatorTypeI;
  core::data::basic::BoundedPriorityQueue<int, ComparatorTypeI, ComparatorTypeI> q_i(
      [&](int x, int y) { return x > y; },
      [&](int x, int y) { return x == y; },
      10, 20, -std::numeric_limits<int>::max());

  std::uniform_int_distribution<int> flat(0, 20);
  for (core::index4 i = 0; i < 30; ++i) q_i.push(flat(r));
  for (core::index4 i = 0; i < q_i.size(); ++i) {
    std::cout << q_i[i] << " ";
    if (i > 0 && q_i[i - 1] < q_i[i]) // --- integers must be formatted with %d, not %f
      std::cerr << utils::string_format("Incorrect ordering in a bounded priority queue, %d before %d\n",
          q_i[i - 1], q_i[i]);
  }
  std::cout << "\n";
}
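The idea of keeping only the N best elements can be sketched with a sorted vector that is trimmed after every insertion. The `KeepBest` class below is a hypothetical, simplified illustration: it takes a single capacity, whereas the BioShell constructor in the example takes two size arguments (10 and 20) whose exact meaning is not described on this page.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical container that keeps at most `capacity` best elements, best first
template <typename T, typename Better = std::greater<T>>
class KeepBest {
public:
  explicit KeepBest(std::size_t capacity, Better better = Better())
      : capacity_(capacity), better_(better) {}

  void push(const T &value) {
    // find the first element that `value` is better than and insert before it
    auto pos = std::upper_bound(items_.begin(), items_.end(), value, better_);
    items_.insert(pos, value);
    if (items_.size() > capacity_) items_.pop_back(); // drop the worst element on overflow
  }

  std::size_t size() const { return items_.size(); }
  const T &operator[](std::size_t i) const { return items_[i]; }

private:
  std::size_t capacity_;
  Better better_;
  std::vector<T> items_;
};
```

With the default comparator larger values are better, so the stored elements are always in descending order, which is the property the example above verifies.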

ex_BuildPolymerChain¶
Creates a mixture of simple polymer chains in a periodic box
USAGE:
./ex_BuildPolymerChain box_width n_chains n_atoms_in_chain
Keywords:
- Chain
- ResidueChain
- Polymer
- Vec3Cubic
Categories:
- simulations::systems::BuildPolymerChain
Output files:
Program source:
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <memory>

#include <core/data/basic/Vec3Cubic.hh>
#include <simulations/systems/BuildPolymerChain.hh>
#include <simulations/systems/SingleAtomType.hh>
#include <simulations/systems/CartesianChains.hh>
#include <simulations/observers/cartesian/PdbObserver.hh>
#include <simulations/observers/cartesian/ExplicitPdbFormatter.hh>
#include <utils/exit.hh>

using namespace simulations::systems;
using namespace core::data::structural;
using core::data::basic::Vec3Cubic;

std::string program_info = R"(

Creates a mixture of simple polymer chains in a periodic box

USAGE:
    ./ex_BuildPolymerChain box_width n_chains n_atoms_in_chain

)";

/** @brief Creates a mixture of simple polymer chains in a periodic box
 *
 * CATEGORIES: simulations::systems::BuildPolymerChain
 * KEYWORDS:   Chain; ResidueChain; Polymer; Vec3Cubic
 */
int main(const int argc, const char *argv[]) {

  if (argc < 4) utils::exit_OK_with_message(program_info);

  double box = atof(argv[1]);
  core::index2 n_chains = atoi(argv[2]);
  core::index2 n_res_each = atoi(argv[3]);

  // --- here we create a Structure object of n_chains polyalanine chains
  Structure_SP starting_structure = std::make_shared<Structure>("");
  std::string sequence(n_res_each, 'A'); // --- We create a polyalanine chain, the sequence is made by many 'A's
  for (core::index2 i = 0; i < n_chains; ++i)
    starting_structure->push_back(
        Chain::create_ca_chain(sequence, std::string{utils::letters[i]})); // --- A + 1 makes the chain code

  // --- we have to renumber atoms so the indexes are consistent in the whole structure
  core::index4 i_atom = 0;
  for (Chain_SP m : *starting_structure)
    std::for_each(m->first_atom(), m->last_atom(), [&](PdbAtom_SP e) { (e)->id(++i_atom); });

  // Vec3Cubic::set_box_len(box); // --- set periodic box width
  std::shared_ptr<AtomTypingInterface> atom_typing = std::make_shared<SingleAtomType>(); // --- simplest atom typing possible
  CartesianChains chains(atom_typing, *starting_structure);

  BuildPolymerChain chain_builder(chains);
  chain_builder.generate(3.8, 5.5);

  // --- write the generated conformation in the PDB format
  std::shared_ptr<simulations::observers::cartesian::AbstractPdbFormatter> fmt =
      std::make_shared<simulations::observers::cartesian::ExplicitPdbFormatter>(*starting_structure);
  simulations::observers::cartesian::PdbObserver start(chains, fmt, "");
  start.observe();
}
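In a periodic box, coordinates are wrapped back into the box and distances are measured to the nearest periodic image. The sketch below shows both operations for a single coordinate of a cubic box. The two functions are hypothetical helpers written for this illustration; how `Vec3Cubic` implements periodicity internally is not shown on this page.

```cpp
#include <cmath>

// Wraps a coordinate into the [0, box_len) range of a cubic periodic box
double wrap_into_box(double x, double box_len) {
  return x - box_len * std::floor(x / box_len);
}

// Minimum image convention: the shortest signed separation between two periodic images
double minimum_image(double dx, double box_len) {
  return dx - box_len * std::round(dx / box_len);
}
```

In a box of width 10, a coordinate of -1 wraps to 9, and two atoms whose x coordinates differ by 9 are in fact only 1 apart through the box boundary.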

ex_Cart¶
Shows how to use the CART classification model
Keywords:
- CART
- observer
Categories:
- core::calc::statistics::Cart
Input files:
Output files:
Program source:
#include <iostream>
#include <vector>
#include <map>

#include <core/data/io/DataTable.hh>
#include <core/calc/statistics/Cart.hh>
#include <core/algorithms/basic_algorithms.hh>

using namespace core::calc::statistics;
using namespace core::data::io;
using namespace core::data::basic;

utils::Logger logs("ex_Cart");

/** @brief Shows how to use CART classification model
 *
 * CATEGORIES: core::calc::statistics::Cart
 * KEYWORDS:   CART; observer
 */
int main(const int argc, const char *argv[]) {

  if (argc < 2) { // --- the input table is mandatory; its last column holds class labels
    std::cerr << "USAGE: ex_Cart input_data_table\n";
    return 0;
  }

  DataTable dt;
  dt.load(argv[1]);
  std::vector<LabelledObservationVector_SP> observations;
  std::vector<std::string> class_names;
  std::vector<core::index2> class_ids;
  std::map<std::string, core::index2> class_to_id;

  // --- First find distinct labels
  core::index2 i_class = 0;
  for (const TableRow &tr : dt) {
    if (class_to_id.find(tr.back()) == class_to_id.end()) {
      class_ids.push_back(i_class);
      class_to_id[tr.back()] = i_class;
      ++i_class;
    }
  }
  for (const TableRow &tr : dt)
    observations.push_back(std::make_shared<LabelledObservationVector>(tr, class_to_id[tr.back()], 0, tr.size() - 2));

  // --- print some debug info : known classes etc.
  logs << utils::LogLevel::INFO << "classification into " << class_ids.size() << " classes\n";
  if (logs.is_logable(utils::LogLevel::INFO)) {
    logs << utils::LogLevel::INFO << "Known classes:\n";
    core::index1 icol = 0;
    for (auto c : class_to_id) {
      logs << c.first << " ";
      ++icol;
      if (icol % 10 == 0) logs << "\n";
    }
    logs << "\n";
  }

  // --- create the CART classifier and train it
  Cart cart(class_ids);
  cart.train(observations);
  std::cout << cart;

  // --- test the classifier for the training data set
  core::index4 n_ok = 0;
  for (const auto &o : observations) {
    n_ok += (cart.classify(o) == o->label());
  }
  std::cout << utils::string_format("# classification test:\n# success rate: %d of %d (%6.2f%%)\n",
      n_ok, observations.size(), 100.0 * n_ok / float(observations.size()));
}
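CART builds a decision tree by repeatedly choosing the split that makes the resulting groups as pure as possible; for classification, purity is commonly measured with the Gini impurity. The sketch below finds the best threshold on a single feature. It is a generic textbook illustration with hypothetical names (`gini`, `Split`, `best_split`); whether the BioShell `Cart` class uses exactly this criterion is an assumption not confirmed by this page.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <map>
#include <numeric>
#include <vector>

// Gini impurity of a set of class labels: 1 - sum over classes of p^2
double gini(const std::vector<int> &labels) {
  if (labels.empty()) return 0.0;
  std::map<int, std::size_t> counts;
  for (int l : labels) ++counts[l];
  double g = 1.0;
  for (const auto &c : counts) {
    const double p = double(c.second) / labels.size();
    g -= p * p;
  }
  return g;
}

// Result of a split search: threshold (NaN when no split helps) and weighted impurity
struct Split { double threshold; double impurity; };

// Tries midpoints between consecutive distinct feature values and keeps the purest split
Split best_split(const std::vector<double> &x, const std::vector<int> &y) {
  std::vector<std::size_t> order(x.size());
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) { return x[a] < x[b]; });
  Split best{std::numeric_limits<double>::quiet_NaN(), gini(y)};
  for (std::size_t i = 1; i < order.size(); ++i) {
    if (x[order[i - 1]] == x[order[i]]) continue; // no threshold between equal values
    std::vector<int> left, right;
    for (std::size_t k = 0; k < order.size(); ++k) (k < i ? left : right).push_back(y[order[k]]);
    const double w = (left.size() * gini(left) + right.size() * gini(right)) / order.size();
    if (w < best.impurity) best = {0.5 * (x[order[i - 1]] + x[order[i]]), w};
  }
  return best;
}
```

A full tree would apply the same search to every feature and then recurse into the two resulting subsets until they are pure or too small to split.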

ex_CartesianToSpherical¶
Unit test that calculates spherical coordinates from a few points in the Cartesian space using BioShell
USAGE:
./ex_CartesianToSpherical
Keywords:
Categories:
- core/calc/structural/CartesianToSpherical
Output files:
Program source:
#include <cmath>
#include <iostream>
#include <sstream>

#include <core/data/io/Pdb.hh>
#include <core/data/structural/PdbAtom.hh>
#include <core/calc/structural/transformations/Rototranslation.hh>
#include <core/calc/structural/transformations/transformation_utils.hh>
#include <core/calc/structural/transformations/CartesianToSpherical.hh>
#include <utils/options/OptionParser.hh>

std::string program_info = R"(

Unit test that calculates spherical coordinates from a few points in the Cartesian space using BioShell

USAGE:
    ./ex_CartesianToSpherical

)";

std::string input_pdb = R"(ATOM    201  N   SER A  12      25.081  -7.330 -14.416  1.00  0.00           N
ATOM    202  CA  SER A  12      25.875  -6.648 -15.435  1.00  0.00           C
ATOM    203  C   SER A  12      25.030  -6.429 -16.700  1.00  0.00           C
ATOM    204  O   SER A  12      25.187  -5.429 -17.400  1.00  0.00           O
ATOM    205  CB  SER A  12      27.126  -7.492 -15.717  1.00  0.00           C
ATOM    206  OG  SER A  12      27.645  -8.029 -14.500  1.00  0.00           O
ATOM    207  H   SER A  12      25.486  -8.177 -14.049  1.00  0.00           H)";

/** @brief Calculates spherical coordinates using BioShell and 'by hand' to check if it works
 *
 * CATEGORIES: core/calc/structural/CartesianToSpherical;
 * KEYWORDS: internal coordinates
 */
int main(const int argc, const char *argv[]) {

  using namespace core::data::structural;
  using namespace core::calc::structural::transformations;

  std::stringstream in_stream(input_pdb);                                    // --- Create an input stream from the text
  core::data::io::Pdb reader(in_stream, core::data::io::keep_all);           // --- read from this stream
  core::data::structural::Structure_SP strctr = reader.create_structure(0);  // --- create a structure object
  auto residue = *(strctr->first_residue());                                 // --- get the residue ...
  PdbAtom_SP n = residue->find_atom(" N  ");                                 // --- and extract three atoms
  PdbAtom_SP ca = residue->find_atom(" CA ");                                // --- to form a local coordinate system (LCS)
  PdbAtom_SP c = residue->find_atom(" C  ");
  PdbAtom_SP cb = residue->find_atom(" CB ");                                // --- CB will be transformed

  Rototranslation_SP rt = local_coordinates_three_atoms(*n, *ca, *c);
  CartesianToSpherical to_spherical;
  Vec3 cb_local, cb_spherical;
  rt->apply(*cb, cb_local);
  to_spherical.apply(cb_local, cb_spherical);

  // --- the same conversion done 'by hand'
  double r = cb_local.length();
  double theta = acos(cb_local.z / r);
  double phi = atan2(cb_local.y, cb_local.x);
  std::cout << "local computed by BioShell: " << cb_local << "\n";
  std::cout << "spherical by BioShell: " << cb_spherical << "\n";
  std::cout << "spherical computed here: " << utils::string_format("%8.3f %8.3f %8.3f\n", r, theta, phi);

  // --- and back from spherical to Cartesian
  double x = r * sin(theta) * cos(phi);
  double y = r * sin(theta) * sin(phi);
  double z = r * cos(theta);
  std::cout << "local from inversion: " << utils::string_format("%8.3f %8.3f %8.3f\n", x, y, z);
  to_spherical.apply_inverse(cb_spherical);
  std::cout << "local by BioShell : " << cb_spherical << "\n";
}
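
The 'by hand' part of the example above relies on the usual conversion r = |v|, theta = acos(z / r), phi = atan2(y, x) and on its inverse. The sketch below isolates that arithmetic in plain C++ so it can be tried without BioShell. The `Cartesian` and `Spherical` structs and both function names are made up for this illustration; they are not part of the BioShell API.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper types, used only in this sketch
struct Cartesian { double x, y, z; };
struct Spherical { double r, theta, phi; };

// Cartesian -> spherical: theta is the polar angle measured from the Z axis,
// phi is the azimuthal angle in the XY plane
Spherical to_spherical(const Cartesian &p) {
  const double r = std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
  assert(r > 0.0);  // the angles are undefined at the origin
  return {r, std::acos(p.z / r), std::atan2(p.y, p.x)};
}

// Spherical -> Cartesian: the inverse of the transformation above
Cartesian to_cartesian(const Spherical &s) {
  return {s.r * std::sin(s.theta) * std::cos(s.phi),
          s.r * std::sin(s.theta) * std::sin(s.phi),
          s.r * std::cos(s.theta)};
}
```

Converting a point with `to_spherical` and back with `to_cartesian` should reproduce the input up to rounding error, which is the same check the example performs on the CB atom.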

ex_ChiAnglesDefinition¶
Unit test that shows how to look up information on Chi angle definitions. It prints how many Chi angles are defined for ARG and which atoms define Chi2 of TRP
USAGE:
./ex_ChiAnglesDefinition
Keywords:
Categories:
- core::chemical::ChiAnglesDefinition
Output files:
Program source:
#include <iostream>
#include <string>

#include <core/chemical/ChiAnglesDefinition.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test that shows how to look up information on Chi angle definitions.
It prints how many Chi angles are defined for ARG and which atoms define Chi2 of TRP

USAGE:
    ./ex_ChiAnglesDefinition

)";

/** @brief Shows how to look up information on Chi angle definitions
 *
 * This example prints how many Chi angles are defined for ARG and which atoms define Chi2 of TRP
 *
 * CATEGORIES: core::chemical::ChiAnglesDefinition
 * KEYWORDS: structural properties
 */
int main(const int argc, const char *argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  // List atoms that define the second Chi angle in TRP residue
  std::cout << "Chi_2 in TRP:";
  // "TRP" defines a residue, "2" stands for Chi_2
  for (const std::string &a : core::chemical::ChiAnglesDefinition::chi_angle_atoms("TRP", 2))
    std::cout << a << " ";
  std::cout << "\n";

  // Create a local reference to ARG monomer (just to make the following lines shorter)
  const core::chemical::Monomer &m = core::chemical::Monomer::ARG;
  std::cout << "\nAll Chi angles in " << m.code3 << " :\n";
  // Loop over all Chi angles defined for ARG
  for (unsigned short i = 1; i <= core::chemical::ChiAnglesDefinition::count_chi_angles(m); ++i) {
    std::cout << "Chi" << i << " ";
    // List atoms for each of them
    for (const std::string &a : core::chemical::ChiAnglesDefinition::chi_angle_atoms(m, i))
      std::cout << a << " ";
    std::cout << "\n";
  }
}

ex_Cif¶
Unit test which shows how to read CIF files.
USAGE:
ex_Cif file.cif
EXAMPLE:
ex_Cif AA3.cif
Keywords:
Categories:
- core/data/io/Cif
Input files:
Output files:
Program source:
#include <iostream>

#include <core/data/io/Cif.hh>
#include <utils/Logger.hh>
#include <utils/LogManager.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which shows how to read CIF files.

USAGE:
    ex_Cif file.cif

EXAMPLE:
    ex_Cif AA3.cif

)";

/** @brief ex_Cif tests reading CIF files
 *
 * CATEGORIES: core/data/io/Cif
 * KEYWORDS: CIF input
 */
int main(const int argc, const char *argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  utils::LogManager::FINEST(); // --- INFO is the default logging level; here it is set to FINEST to see more

  core::data::io::Cif reader(argv[1]);
  std::cout << reader;
}

ex_Combinations¶
Unit test for BioShell’s combination generator. It prints every 3-element combination of the 20-element set of amino acid types (1140 combinations), i.e. all tripeptide compositions built from three distinct residue types.
USAGE:
./ex_Combinations
Keywords:
- algorithms
- random
Categories:
- core::algorithms::Combination
Output files:
Program source:
#include <memory>
#include <iostream>
#include <random>
#include <string>
#include <vector>

#include <core/algorithms/Combinations.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test for BioShell's combination generator. It prints every 3-element combination
of the 20-element set of amino acid types (1140 combinations), i.e. all tripeptide
compositions built from three distinct residue types.

USAGE:
    ./ex_Combinations

)";

/** @brief A simple example shows how to generate Combination
 *
 * The program generates all tripeptide compositions as ${20}\choose {3}$ combinations
 *
 * CATEGORIES: core::algorithms::Combination;
 * KEYWORDS: algorithms; random
 */
int main(const int argc, const char *argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  std::vector<std::string> amino_acids{"ALA", "ARG", "ASP", "ASN", "CYS", "PHE", "GLU", "GLN", "GLY", "HIS",
                                       "ILE", "LEU", "LYS", "MET", "PRO", "SER", "THR", "TYR", "TRP", "VAL"};

  std::vector<std::string> a_combination(3);
  core::algorithms::Combinations<std::string> generator(3, amino_acids);

  int cnt = 0;
  while (generator.next(a_combination)) {
    std::cout << a_combination[0] << " " << a_combination[1] << " " << a_combination[2] << "\n";
    ++cnt;
  }
  std::cout << "# " << cnt << " combinations generated\n";
}
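
To make the idea of a combination generator concrete, here is a minimal sketch in plain C++ that enumerates k-element index combinations in lexicographic order. It does not reproduce how `core::algorithms::Combinations` works internally; `next_combination` and `count_combinations` are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Advances idx (strictly increasing indices into an n-element set) to the next
// k-combination in lexicographic order; returns false if idx was the last one.
bool next_combination(std::vector<std::size_t> &idx, std::size_t n) {
  const std::size_t k = idx.size();
  assert(k <= n);
  for (std::size_t i = k; i-- > 0;) {
    if (idx[i] < n - k + i) {                    // position i can still be advanced
      ++idx[i];
      for (std::size_t j = i + 1; j < k; ++j)    // reset the tail to the smallest valid values
        idx[j] = idx[j - 1] + 1;
      return true;
    }
  }
  return false;
}

// Counts all k-combinations of an n-element set by enumerating them
std::size_t count_combinations(std::size_t n, std::size_t k) {
  std::vector<std::size_t> idx(k);
  for (std::size_t i = 0; i < k; ++i) idx[i] = i;   // first combination: 0, 1, ..., k-1
  std::size_t cnt = 1;
  while (next_combination(idx, n)) ++cnt;
  return cnt;
}
```

Each index vector can be mapped onto the `amino_acids` vector of the example to obtain one residue triplet; for n = 20 and k = 3 the enumeration visits 1140 combinations.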

ex_DsspData¶
ex_DsspData reads a DSSP file and writes secondary structure in FASTA format
USAGE:
ex_DsspData input.dssp
EXAMPLE:
ex_DsspData 5edw.dssp
Keywords:
Categories:
- core/data/io/DsspData
Input files:
Output files:
Program source:
#include <iostream>

#include <core/data/io/fasta_io.hh>
#include <core/data/io/DsspData.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ex_DsspData reads a DSSP file and writes secondary structure in FASTA format

USAGE:
    ex_DsspData input.dssp

EXAMPLE:
    ex_DsspData 5edw.dssp

)";

/** @brief Reads a DSSP file and prints the sequence and the secondary structure of each chain in FASTA format.
 *
 * @see ex_dssp_to_ss2.cc converts DSSP to SS2 format
 *
 * CATEGORIES: core/data/io/DsspData
 * KEYWORDS: DSSP; FASTA output; Structure; secondary structure; Format conversion
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  core::data::io::DsspData dssp(argv[1], true); // --- read a DSSP file - the first command line argument of the program
  for (const auto & ss2 : dssp.sequences())     // --- for each protein sequence found in the DSSP data ...
    std::cout << core::data::io::create_fasta_string(*ss2, 80) << "\n"            // --- print the sequence as FASTA
              << core::data::io::create_fasta_secondary_string(*ss2, 80) << "\n"; // --- print the secondary structure as FASTA
}

ex_FastaMatchProtocol¶
ex_FastaMatchProtocol finds similar substrings between two amino acid sequences. FastaMatchProtocol implements the FASTA algorithm to detect similar subsequences. This example just prints the list of FASTA matches found between any two sequences from the input set.
USAGE:
ex_FastaMatchProtocol input.fasta [n_threads]
EXAMPLES:
ex_FastaMatchProtocol small500_95identical.fasta 4
REFERENCE: Pearson, William R., and David J. Lipman. “Improved tools for biological sequence comparison.” PNAS 85 (1988): 2444-8 doi:10.1073/pnas.85.8.2444
Keywords:
Categories:
- core/protocols/PairwiseSequenceIdentityProtocol.hh
Input files:
Output files:
Program source:
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>

#include <core/index.hh>
#include <utils/exit.hh>
#include <utils/Logger.hh>
#include <core/data/io/fasta_io.hh>
#include <core/protocols/PairwiseSequenceIdentityProtocol.hh>
#include <core/protocols/FastaMatchProtocol.hh>

std::string program_info = R"(

ex_FastaMatchProtocol finds similar substrings between two amino acid sequences.
FastaMatchProtocol implements the FASTA algorithm to detect similar subsequences. This example just prints
the list of FASTA matches found between any two sequences from the input set.

USAGE:
    ex_FastaMatchProtocol input.fasta [n_threads]

EXAMPLES:
    ex_FastaMatchProtocol small500_95identical.fasta 4

REFERENCE:
Pearson, William R., and David J. Lipman. "Improved tools for biological sequence comparison."
PNAS 85 (1988): 2444-8 doi:10.1073/pnas.85.8.2444

)";

/** @brief Uses FastaMatchProtocol protocol to find similar substrings between two amino acid sequences
 *
 * CATEGORIES: core/protocols/PairwiseSequenceIdentityProtocol.hh
 * KEYWORDS: FASTA input; sequence alignment; statistics
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::sequence;
  using namespace core::protocols;
  using namespace core::alignment;

  utils::Logger logs("ex_FastaMatchProtocol");

  core::index2 n_threads = (argc > 2) ? atoi(argv[2]) : 4;
  bool if_store_diagonals = false;
  logs << utils::LogLevel::INFO << "number of threads used : " << n_threads << "\n";

  core::protocols::FastaMatchProtocol protocol;
  protocol.minimum_diagonal_coverage(0.9).shortest_match_recorded(20).minimum_identity(0.9).longest_gap(8);
  protocol.batch_size(10000).n_threads(n_threads).keep_alignments(if_store_diagonals).printed_seqname_length(5);

  std::vector<Sequence_SP> input_sequences;
  core::data::io::read_fasta_file(argv[1], input_sequences);
  for(Sequence_SP si: input_sequences) protocol.add_input_sequence(si);

  auto start = std::chrono::high_resolution_clock::now(); // --- timer starts!
  protocol.run();
  auto end = std::chrono::high_resolution_clock::now();
  std::chrono::duration<double> time_span = std::chrono::duration_cast<std::chrono::duration<double>>(end - start);
  logs << utils::LogLevel::INFO << (size_t ) protocol.n_jobs_completed() << " FASTA matches calculated within "
       << time_span.count() << " [s]\n";

  if(if_store_diagonals) {
    std::cout << "Diagonals:\n";
    protocol.print_header(std::cout);
    protocol.print_diagonals(std::cout);
  }

  std::cout << "Hits:\n";
  std::vector<core::index4> hits;
  for(core::index4 i_seq=0;i_seq< input_sequences.size();++i_seq) {
    if (protocol.matches(i_seq, hits) > 0) {
      std::cout << i_seq << " :";
      for (core::index4 j:hits) std::cout << " " << j;
      std::cout << "\n";
    }
  }
}

ex_GraphWithData¶
A unit test for SimpleGraph and GraphWithData classes
Keywords:
Categories:
- core/algorithms/GraphWithData; core/algorithms/SimpleGraph
Program source:
#include <memory>
#include <iostream>
#include <iomanip>

#include <core/algorithms/GraphWithData.hh>
#include <core/algorithms/SimpleGraph.hh>

/** @brief A unit test for SimpleGraph and GraphWithData classes
 *
 * This program creates small graph data structures and test their methods
 *
 * CATEGORIES: core/algorithms/GraphWithData; core/algorithms/SimpleGraph
 * KEYWORDS: algorithms; data structures
 * IMG_ALT: Example tree node
 */
int main(const int argc, const char* argv[]) {

  using namespace core::algorithms;

  GraphWithData<SimpleGraph,int,std::string> g;
  g.add_vertex(0);
  g.add_vertex(1);
  g.add_vertex(2);
  g.add_vertex(3);
  g.add_edge(0,2,"0-2");
  g.add_edge(1,2,"1-2");
  g.add_edge(3,2,"3-2");
  g.add_edge(core::index4(3),core::index4(0),"0-3");

  std::cout << "# adjacency matrix\n";
  g.print_adjacency_matrix(std::cout);

  std::cout << "# are 0 and 3 connected?\n" << std::boolalpha << g.are_connected(0,3)<<"\n";
  g.remove_edge(0,3);
  std::cout << "# are 0 and 3 still connected?\n" << std::boolalpha << g.are_connected(0,3)<<"\n";

  std::cout << "# adjacency matrix\n";
  g.print_adjacency_matrix(std::cout);

  std::cout << "# neighbors of 3\n";
  for(auto iter = g.begin(3);iter!=g.end(3);++iter) std::cout << *iter<<" ";
  std::cout << "\n";

  return 0;
}

ex_HierarchicalClustering¶
Example showing how to use hierarchical clustering method. The program uses Single Link method to cluster letters. Once clustering is done, it prints the clustering tree.
USAGE:
./ex_HierarchicalClustering
Keywords:
Categories:
- core::calc::clustering::HierarchicalClustering
Output files:
Program source:
#include <algorithm> // for std::remove and std::sort
#include <cmath>
#include <iostream>
#include <memory>
#include <sstream>
#include <string>
#include <vector>
#include <numeric> // for std::accumulate

#include <core/algorithms/trees/algorithms.hh>
#include <core/calc/clustering/DistanceByValues.hh>
#include <core/calc/clustering/HierarchicalCluster.hh>
#include <core/calc/clustering/HierarchicalClustering.hh>
#include <utils/options/OptionParser.hh>

std::string program_info = R"(

Example showing how to use hierarchical clustering method. The program uses Single Link method to cluster letters.
Once clustering is done, it prints the clustering tree.

USAGE:
    ./ex_HierarchicalClustering

)";

using namespace core::calc::clustering;

static utils::Logger l("ex_HierarchicalClustering");

/// Data points to be clustered
std::vector<std::string> points = {"A", "B", "C", "E", "G", "L", "M", "Q", "R", "T", "X", "Y", "Z"};

/// Distance function defined for the data above; here the alphabetic distance is used
DistanceByValues<float> calc_distance_matrix(std::vector<std::string> points) {

  DistanceByValues<float> d(points, 99.0, 99.0);

  for (size_t i = 1; i < d.n_data(); i++)
    for (size_t j = 0; j < i; j++) {
      float v = std::sqrt((points[j][0] - points[i][0]) * (points[j][0] - points[i][0]));
      d.set(i, j, v);
      d.set(j, i, v);
    }

  return d;
}

/** @brief Example showing how to use hierarchical clustering method.
 *
 * CATEGORIES: core::calc::clustering::HierarchicalClustering
 * KEYWORDS: clustering; hierarchical clustering
 */
int main(int cnt, char *argv[]) {

  if ((cnt > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  DistanceByValues<float> d = calc_distance_matrix(points);
  HierarchicalClustering<float, std::string> hac(d.labels(), "");
  hac.run_clustering(d, "");

  // --- print the content of the cluster created at each merging step
  std::vector<std::string> elements;
  for (size_t i = 0; i < hac.count_steps(); i++) {
    elements.clear();
    std::shared_ptr<BinaryTreeNode<std::string> > c_node(
        std::static_pointer_cast<BinaryTreeNode<std::string>>(hac.clustering_step(i)));
    core::algorithms::trees::collect_leaf_elements(c_node, elements);
    std::string a = std::accumulate(elements.begin(), elements.end(), std::string(""));
    a.erase(std::remove(a.begin(), a.end(), ' '), a.end());
    std::sort(a.begin(), a.end());
    std::cout << "Clustering step: "<<i<<" : ";
    std::cout << a << "\n";
  }

  // --- write the clustering steps to a stream
  std::ostringstream sso;
  hac.write_merging_steps(sso);
  std::cout <<sso.str();

  // --- get the medoid element of each of the clusters created at distance d = 1.0
  auto clusters = hac.get_clusters(1.0, 2);
  for (core::index2 i = 0; i < 4; i++)
    std::cout << medoid_by_average_distance<float, std::string, DistanceByValues<float> >(clusters[i], d).medoid << "\n";
}
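
For readers unfamiliar with the single-link strategy, the following plain C++ sketch clusters the same letters without BioShell. It is a naive implementation written only to show the idea: at every step the two clusters with the smallest distance between any pair of their members are merged. The names `single_link` and `cluster_letters` are hypothetical, and the real `HierarchicalClustering` class is organised differently (it records the whole merging tree rather than stopping at a cutoff).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Single-link distance between two clusters of letters:
// the smallest alphabetic distance between any pair of their members
double single_link(const std::string &a, const std::string &b) {
  assert(!a.empty() && !b.empty());
  double best = std::numeric_limits<double>::max();
  for (char x : a)
    for (char y : b)
      best = std::min(best, std::fabs(double(x) - double(y)));
  return best;
}

// Keeps merging the two closest clusters until the closest pair is farther apart than cutoff
std::vector<std::string> cluster_letters(const std::string &letters, double cutoff) {
  std::vector<std::string> clusters;
  for (char c : letters) clusters.push_back(std::string(1, c));  // every letter starts as its own cluster

  while (clusters.size() > 1) {
    std::size_t bi = 0, bj = 1;
    double best = single_link(clusters[0], clusters[1]);
    for (std::size_t i = 0; i < clusters.size(); ++i)
      for (std::size_t j = i + 1; j < clusters.size(); ++j) {
        const double d = single_link(clusters[i], clusters[j]);
        if (d < best) { best = d; bi = i; bj = j; }
      }
    if (best > cutoff) break;            // nothing left to merge at this distance
    clusters[bi] += clusters[bj];        // merge cluster bj into cluster bi
    clusters.erase(clusters.begin() + bj);
  }
  return clusters;
}
```

Calling `cluster_letters("ABCEGLMQRTXYZ", 1.0)` groups only letters that are adjacent in the alphabet, while a cutoff of 2.0 also joins letters separated by one missing letter.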

ex_HierarchicalClustering1B¶
Example showing how to use hierarchical clustering method - 1 byte version. HierarchicalClustering1B is a specialized version of HierarchicalClustering which uses as little memory as possible. Distance values must be integers in the range 0-255 (both inclusive); the user is responsible for an appropriate and relevant conversion. The program uses the Complete Link strategy. Once clustering is done, it prints medoids - elements located in the centers of their clusters, corresponding to the given distance cutoff. The default cutoff value is set to 195. The clustering tree is printed on stderr.
USAGE:
ex_HierarchicalClustering1B input.txt
EXAMPLE:
ex_HierarchicalClustering1B fasta_distances
REFERENCE: Dominik Gront, Andrzej Koliński. “HCPM–program for hierarchical clustering of protein models.” Bioinformatics, 21 (2005):3179–80 doi:10.1093/bioinformatics/bti450
Keywords:
Categories:
- core::calc::clustering::HierarchicalClustering
Input files:
Output files:
Program source:
#include <fstream>
#include <iostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>
#include <numeric> // for std::accumulate

#include <core/algorithms/trees/algorithms.hh>
#include <core/algorithms/UnionFind.hh>
#include <core/calc/clustering/DistanceByValues1B.hh>
#include <core/calc/clustering/HierarchicalCluster.hh>
#include <core/calc/clustering/HierarchicalClustering1B.hh>
#include <core/BioShellEnvironment.hh>
#include <utils/LogManager.hh>
#include <utils/exit.hh>

using namespace core::calc::clustering;

static utils::Logger l("ex_HierarchicalClustering1B");

std::string program_info = R"(

Example showing how to use hierarchical clustering method - 1 byte version.

HierarchicalClustering1B is a specialized version of HierarchicalClustering which uses as little memory
as possible. Distance values must be integers in the range 0-255 (both inclusive); the user is responsible
for an appropriate and relevant conversion.

The program uses the Complete Link strategy. Once clustering is done, it prints medoids - elements located
in the centers of their clusters, corresponding to the given distance cutoff. The default cutoff value is set to 195.
The clustering tree is printed on stderr.

USAGE:
    ex_HierarchicalClustering1B input.txt

EXAMPLE:
    ex_HierarchicalClustering1B fasta_distances

REFERENCE:
Dominik Gront, Andrzej Koliński. "HCPM–program for hierarchical clustering of protein models."
Bioinformatics, 21 (2005):3179–80 doi:10.1093/bioinformatics/bti450

)";

const int MAX = 10; // --- longest sequence id string

DistanceByValues1B read_distance_matrix(const std::string &distance_file, std::set<std::string> & labels) {

  core::index1 val;
  std::string line;
  std::ifstream infile(distance_file);
  char name_i[MAX], name_j[MAX];

  // --- first pass: collect the labels of all clustered elements
  while (std::getline(infile, line)) {
    if(scanf_row(line, name_i, name_j, val)==3) {
      labels.insert(name_i);
      labels.insert(name_j);
    }
  }
  infile.close();
  infile.open(distance_file);

  std::vector<std::string> v( labels.begin(), labels.end() );
  core::algorithms::UnionFindSI4 uf(labels.size());
  for (const std::string &s:v) uf.add_element(s);

  // --- second pass: fill the distance matrix; elements at distance 255 are also joined by UnionFind
  DistanceByValues1B d(v);
  while (std::getline(infile, line)) {
    if (scanf_row(line, name_i, name_j, val) == 3) {
      size_t i_index = d.at(name_i);
      size_t j_index = d.at(name_j);
      d.set(i_index, j_index, val);
      d.set(j_index, i_index, val);
      if(val==255) uf.union_set(i_index, j_index);
    }
  }

  std::cout << "# UnionFind groups larger than 2 (representative PDB-ID, group size):\n";
  const auto sets = uf.retrieve_sets();
  for (const auto &set: sets) {
    if (set.second.size() > 2)
      std::cout << uf.element(set.first) << " : " << set.second.size() << "\n";
  }
  std::cout << sets.size() << "\n";

  return d;
}

/** @brief Example showing how to use hierarchical clustering method - 1 byte version.
 *
 * CATEGORIES: core::calc::clustering::HierarchicalClustering
 * KEYWORDS: clustering; hierarchical clustering
 */
int main(int cnt, char *argv[]) {

  if (cnt < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  std::set<std::string> labels_set;
  DistanceByValues1B d = read_distance_matrix(argv[1], labels_set);
  std::cout << labels_set.size() << " items for clustering\n";
  utils::LogManager::INFO();

  HierarchicalClustering1B hac(d.labels(), "");
  CompleteLink1B merge;
  hac.run_clustering(d, merge);

  // --- write the clustering steps to a stream
  hac.write_merging_steps(std::cerr);
}
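
The description above leaves the conversion of distances to the 0-255 range to the user. One possible conversion, shown here purely as an assumption and not as something BioShell prescribes, is linear scaling followed by rounding and clamping. The function name `to_byte` and its `max_value` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Linearly maps a value from [0, max_value] onto the integer range 0-255.
// Values outside the input range are clamped to the nearest bound.
std::uint8_t to_byte(double value, double max_value) {
  assert(max_value > 0.0);
  const double scaled = std::round(255.0 * value / max_value);
  return static_cast<std::uint8_t>(std::min(255.0, std::max(0.0, scaled)));
}
```

With this scaling a resolution of max_value / 255 is all that survives the conversion, so the input range should be chosen to cover only the distances that matter for the clustering cutoff.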

ex_Interpolate1D¶
Unit test which reads a file with two columns of data and calculates interpolated values for a given number of steps.
USAGE:
./ex_Interpolate1D file.txt n_points
EXAMPLE:
./ex_Interpolate1D input.txt 100
Keywords:
Categories:
- core::calc::numeric::Interpolate1D
Input files:
Output files:
Program source:
#include <cmath>
#include <cstdlib>
#include <iostream>
#include <vector>

#include <core/calc/numeric/interpolators.hh>
#include <core/calc/numeric/Interpolate1D.hh>
#include <core/data/io/DataTable.hh>
#include <utils/exit.hh>

using namespace core::calc::numeric;

std::string program_info = R"(

Unit test which reads a file with two columns of data and calculates interpolated values
for a given number of steps.

USAGE:
    ./ex_Interpolate1D file.txt n_points

EXAMPLE:
    ./ex_Interpolate1D input.txt 100

)";

/** @brief Reads a file with two columns of data and calculates interpolated values
 *
 * CATEGORIES: core::calc::numeric::Interpolate1D;
 * KEYWORDS: interpolation
 */
int main(int argc, char* argv[]) {

  if(argc < 3) utils::exit_OK_with_message(program_info);

  std::vector<double> vx; // --- vector of function arguments: X axis
  std::vector<double> vy; // --- vector of function values: Y axis
  core::data::io::DataTable in_data(argv[1]);
  in_data.column(0, vx);
  in_data.column(1, vy);

  // --- Create the actual interpolator object
  CatmullRomInterpolator<double> cri;
  Interpolate1D<std::vector<double>, double, CatmullRomInterpolator<double> > ip(vx,vy, cri);

  double step = (vx.back() - vx.front())/ atof(argv[2]);
  for (double x = vx[0]; x <= vx.back(); x += step) {
    std::cout << x << " " << ip(x) << "\n";
  }
}
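
The example delegates the arithmetic to `CatmullRomInterpolator`. As a point of reference, the textbook uniform Catmull-Rom spline evaluates a cubic between two knots `p1` and `p2`, using their outer neighbours `p0` and `p3` to set the slope. The standalone function below sketches that formula; it assumes equally spaced knots and is not taken from the BioShell sources, whose implementation may differ in detail.

```cpp
#include <cassert>

// Uniform Catmull-Rom spline segment between p1 (t = 0) and p2 (t = 1);
// p0 and p3 are the neighbouring knot values before p1 and after p2
double catmull_rom(double p0, double p1, double p2, double p3, double t) {
  assert(t >= 0.0 && t <= 1.0);
  const double t2 = t * t, t3 = t2 * t;
  return 0.5 * (2.0 * p1
                + (p2 - p0) * t
                + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t2
                + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t3);
}
```

The curve passes through `p1` at t = 0 and through `p2` at t = 1, and it reproduces straight lines exactly, which makes it a convenient sanity check for tabulated data.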

ex_InterpolatePeriodic1D¶
A simple example that creates an interpolating polynomial for the sin(x) function and computes the maximal interpolation error (i.e. the maximum of the difference between the true function and its interpolator)
USAGE:
./ex_InterpolatePeriodic1D
Keywords:
Categories:
- core::calc::numeric::InterpolatePeriodic1D
Output files:
Program source:
#include <cmath>
#include <iostream>
#include <vector>

#include <core/calc/numeric/interpolators.hh>
#include <core/calc/numeric/InterpolatePeriodic1D.hh>
#include <core/calc/structural/angles.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

A simple example that creates an interpolating polynomial for the sin(x) function and computes
the maximal interpolation error (i.e. the maximum of the difference between the true function and its interpolator)

USAGE:
    ./ex_InterpolatePeriodic1D

)";

using namespace core::calc::numeric;

/** @brief Simple test for interpolation of a periodic 1D function
 *
 * CATEGORIES: core::calc::numeric::InterpolatePeriodic1D;
 * KEYWORDS: interpolation
 */
int main(int cnt, char *argv[]) {

  if ((cnt > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  std::vector<float> x, y;                                // --- vectors for points used for interpolation
  double step = M_PI / 50.0;                              // --- interpolation data step
  for (double ix = 0.0; ix < 2 * M_PI; ix += step) {      // --- generate data for interpolation knots
    x.push_back(ix);
    y.push_back(sin(ix));
  }

  CatmullRomInterpolator<float> cri;                      // --- Interpolating engine
  InterpolatePeriodic1D<std::vector<float>, float, CatmullRomInterpolator<float> > i1d2(x, y, cri, 2*M_PI);

  double max_error = 0.0;                                 // --- used to keep track of the maximum interpolation error
  double worst_x = 0.0;
  // --- The function has been tabulated over one period, i.e. in the range \f$[0,2\pi)\f$
  // --- interpolation goes in the range \f$[-20\pi,40\pi]\f$
  for (float x = -20 * M_PI; x <= 40 * M_PI; x += 0.1 * step) {
    double error = fabs(sin(x) - i1d2(x));
    if (error > max_error) {
      max_error = error;
      worst_x = x;
    }
  }
  std::cout << "Maximum interpolation error:" << max_error << " for x= " << worst_x << "\n";
}
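
The example evaluates the interpolator far outside the tabulated range, which works only because the argument can be folded back into a single period. A minimal sketch of such folding in plain C++ is given below. `wrap_to_period` is a hypothetical helper; how `InterpolatePeriodic1D` handles this internally is not shown in the example and is not assumed here.

```cpp
#include <cassert>
#include <cmath>

// Maps x onto the interval [x0, x0 + period) by adding or removing whole periods
double wrap_to_period(double x, double x0, double period) {
  assert(period > 0.0);
  double shifted = std::fmod(x - x0, period);   // result has the sign of (x - x0)
  if (shifted < 0.0) shifted += period;
  return x0 + shifted;
}
```

For a periodic function f with the given period, f(wrap_to_period(x, x0, period)) equals f(x) up to rounding, so a table covering one period is enough to evaluate the function anywhere.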

ex_InterpolatePeriodic2D¶
A simple example that creates an interpolating polynomial for the 2D function sin(x) * cos(y) and computes the maximal interpolation error (i.e. the maximum of the difference between the true function and its interpolator)
USAGE:
./ex_InterpolatePeriodic2D
Keywords:
Categories:
- core::calc::numeric::InterpolatePeriodic2D
Output files:
Program source:
#include <algorithm>
#include <cmath>
#include <iostream>
#include <memory>
#include <sstream>
#include <vector>

#include <core/calc/numeric/interpolators.hh>
#include <core/calc/numeric/InterpolatePeriodic2D.hh>
#include <core/calc/structural/angles.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

A simple example that creates an interpolating polynomial for the 2D function sin(x) * cos(y) and computes
the maximal interpolation error (i.e. the maximum of the difference between the true function and its interpolator)

USAGE:
    ./ex_InterpolatePeriodic2D

)";

using namespace core::calc::numeric;

/** @brief Simple test for interpolation of a periodic 2D function
 *
 * CATEGORIES: core::calc::numeric::InterpolatePeriodic2D;
 * KEYWORDS: interpolation
 */
int main(int cnt, char* argv[]) {

  if ((cnt > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  double inter_step = core::calc::structural::to_radians(5.0); // --- interpolation step : 5 degrees
  const double EPS = 0.0001;
  std::vector<double> vx; // --- vector of function arguments: X axis
  std::vector<double> vy; // --- vector of function arguments: Y axis
  for (double x = -M_PI; x < M_PI-EPS; x += inter_step) {
    vx.push_back(x);
    vy.push_back(x); // --- we use the same values both for X and Y but in general they may differ
  }
  core::index2 nx = vx.size(); // the number of grid points

  // --- Prepare data to be interpolated
  std::shared_ptr<core::data::basic::Array2D<double>> data_periodic =
      std::make_shared<core::data::basic::Array2D<double>>(nx,nx);
  for (size_t ix = 0; ix < vx.size(); ++ix)
    for (size_t iy = 0; iy < vy.size(); ++iy)
      data_periodic->set(ix, iy, sin(vx[ix]) * cos(vy[iy]));

  // --- Create the actual interpolator object
  CatmullRomInterpolator<double> cri;
  InterpolatePeriodic2D<double, CatmullRomInterpolator<double> > ip(-M_PI,inter_step,nx,-M_PI,inter_step,nx,
      data_periodic, cri);

  double max_error = 0.0;
  for (double x = -5; x <= 4.0; x += inter_step / 3.0) {
    for (double y = -5.0; y <= 4.0; y += inter_step / 3.0) {
      double v = sin(x) * cos(y);   // --- calculate the true value of the interpolated function ...
      double vi = ip(x, y);         // --- also the interpolated value
      double err = fabs(v - vi);
      max_error = std::max(err, max_error);
    }
  }
  std::cout << "Maximum interpolation error: " << max_error << "\n";
}

ex_Ising2D¶
Simple but fully functional Ising simulating program.
Keywords:
- Mover
- Simulated Annealing
- Ising2D
Categories:
- simulations::systems::ising::Ising2D
Output files:
Program source:
#include <iostream>
#include <fstream>
#include <memory>
#include <vector>

#include <simulations/systems/ising/Ising2D.hh>
#include <simulations/movers/ising/SingleFlip2D.hh>
#include <simulations/movers/ising/WolffMove2D.hh>
#include <simulations/observers/ObserveEvaluators.hh>
#include <simulations/movers/MoversSetSweep.hh>
#include <simulations/sampling/SimulatedAnnealing.hh>

/** @brief Simple but fully functional Ising simulating program.
 *
 * Runs simulated annealing simulations for a 10x10 system
 *
 * CATEGORIES: simulations::systems::ising::Ising2D
 * KEYWORDS: Mover;Simulated Annealing;Ising2D
 */

using namespace simulations::systems::ising;
using namespace simulations::movers::ising;

int main(const int argc, const char *argv[]) {

  /* Simulation controlling variables */
  int n_cols = 10, n_rows = 10; // size of system

  /* Other settings necessary for the simulation */
  int seed = 12345; // seed for rng
  core::calc::statistics::Random::seed(seed);

  /* Initializing the system */
  std::shared_ptr<Ising2D<core::index1,core::index2>> system =
      std::make_shared<Ising2D<core::index1,core::index2>>(n_rows, n_cols);
  system->initialize(); // Populate system with random spins

  /* Movers: single spin flips and Wolff cluster moves */
  simulations::movers::MoversSet_SP movers = std::make_shared<simulations::movers::MoversSetSweep>();
  movers->add_mover( std::make_shared<SingleFlip2D<core::index1,core::index2>>(*system),system->count_spins());
  movers->add_mover( std::make_shared<WolffMove2D<core::index1,core::index2>>(*system),system->count_spins()*0.2);

  /* Annealing schedule */
  std::vector<float> temperatures = {5,4,3,2.5,2.25,2,1.75,1.5,1};
  simulations::sampling::SimulatedAnnealing sa(movers,temperatures);
  sa.cycles(100,100);

  simulations::observers::ObserveEvaluators_SP observations =
      std::make_shared<simulations::observers::ObserveEvaluators>("");
  observations->add_evaluator(system);
  sa.outer_cycle_observer(observations);

  sa.run();
  observations->finalize();
}
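
The program above hides the physics inside the mover classes. To show what a single-flip move does, here is a self-contained sketch of Metropolis sampling for the 2D Ising model in plain C++. It assumes coupling J = 1, Boltzmann constant 1 and periodic boundaries; the `Lattice` struct and `metropolis_sweep` are hypothetical names and do not mirror the interface of `Ising2D` or `SingleFlip2D`.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Hypothetical minimal 2D Ising lattice with periodic boundaries (J = 1, k_B = 1)
struct Lattice {
  int n_rows, n_cols;
  std::vector<int> spins;   // row-major storage, each entry is +1 or -1

  Lattice(int rows, int cols) : n_rows(rows), n_cols(cols), spins(rows * cols, 1) {
    assert(rows > 0 && cols > 0);
  }

  // Periodic access: valid for indices in the range [-n, 2n)
  int at(int i, int j) const {
    return spins[((i + n_rows) % n_rows) * n_cols + (j + n_cols) % n_cols];
  }

  int neighbors_sum(int i, int j) const {
    return at(i - 1, j) + at(i + 1, j) + at(i, j - 1) + at(i, j + 1);
  }

  // Total energy; every bond is counted once (lower and right neighbour of each site)
  int energy() const {
    int e = 0;
    for (int i = 0; i < n_rows; ++i)
      for (int j = 0; j < n_cols; ++j)
        e -= at(i, j) * (at(i + 1, j) + at(i, j + 1));
    return e;
  }
};

// One Metropolis sweep: n_rows * n_cols single-spin flip attempts at the given temperature
void metropolis_sweep(Lattice &s, double temperature, std::mt19937 &rng) {
  std::uniform_int_distribution<int> pick_row(0, s.n_rows - 1), pick_col(0, s.n_cols - 1);
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  for (int k = 0; k < s.n_rows * s.n_cols; ++k) {
    const int i = pick_row(rng), j = pick_col(rng);
    const int delta_e = 2 * s.at(i, j) * s.neighbors_sum(i, j);  // energy change if the spin is flipped
    if (delta_e <= 0 || uniform(rng) < std::exp(-delta_e / temperature))
      s.spins[i * s.n_cols + j] *= -1;                           // accept the flip
  }
}
```

Simulated annealing then amounts to calling `metropolis_sweep` repeatedly while lowering the temperature step by step, as the `temperatures` vector does in the example.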

ex_IterateIJ¶
Unit test which shows how to use IterateIJ class, which is an iterator to a 2D container, e.g. a 2D array.
USAGE:
./ex_IterateIJ
Keywords:
Categories:
- core::algorithms::IterateIJ
Output files:
Program source:
#include <iostream>

#include <core/algorithms/IterateIJ.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which shows how to use IterateIJ class, which is an iterator to a 2D container, e.g. a 2D array.

USAGE:
    ./ex_IterateIJ

)";

/** @brief A simple example shows how to use IterateIJ
 *
 *
 * CATEGORIES: core::algorithms::IterateIJ
 * KEYWORDS: data structures; algorithms
 */
int main(const int argc, const char *argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  // ---------- Here we declare an iterator that generates indexes for a square 5x5 2D array
  core::algorithms::IterateIJ ij1(5, false);
  for (auto ij:ij1)
    std::cout << ij.first << " " << ij.second << "\n";
  std::cout <<"\n";

  // ---------- Here we declare an iterator that generates indexes for the UR triangle of a 5x5 2D array
  core::algorithms::IterateIJ ij2(5, true);
  for (auto ij:ij2)
    std::cout << ij.first << " " << ij.second << "\n";
  std::cout <<"\n";

  // ---------- Now only selected rows of a square 5x5 2D array
  core::algorithms::IterateIJ ij3(5, false);
  ij3.add_selected_row(1).add_selected_row(2);
  for (auto ij:ij3)
    std::cout << ij.first << " " << ij.second << "\n";
  std::cout <<"\n";

  // ---------- Now only selected rows and only UR triangle of a 5x5 2D array
  core::algorithms::IterateIJ ij4(5, true);
  ij4.add_selected_row(2).add_selected_row(3);
  for (auto ij:ij4)
    std::cout << ij.first << " " << ij.second << "\n";
}
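
The index pairs that such an iterator produces can also be written out with explicit loops, which may help when reading the output of the example. The sketch below is plain C++ and makes one assumption that the example does not confirm: the upper-right triangle is taken to exclude the diagonal (j > i). `index_pairs` is a hypothetical function, not a BioShell call.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Lists (i, j) index pairs of an n x n array. When upper_only is true, only pairs with
// j > i are returned (assumption: the diagonal is excluded). A non-empty rows vector
// restricts the output to the selected row indices.
std::vector<std::pair<std::size_t, std::size_t>> index_pairs(
    std::size_t n, bool upper_only, const std::vector<std::size_t> &rows = {}) {

  std::vector<std::size_t> selected = rows;
  if (selected.empty())
    for (std::size_t i = 0; i < n; ++i) selected.push_back(i);   // no selection means all rows

  std::vector<std::pair<std::size_t, std::size_t>> out;
  for (std::size_t i : selected) {
    assert(i < n);
    for (std::size_t j = upper_only ? i + 1 : 0; j < n; ++j)
      out.emplace_back(i, j);
  }
  return out;
}
```

Under this assumption a 5x5 array yields 25 pairs for the full square and 10 pairs for the triangle; the triangle form is the usual choice for symmetric quantities such as pairwise distances.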

ex_JsonNode¶
Demonstrates how to handle JSON data.
USAGE:
./ex_JsonNode
Keywords:
- JSON
Categories:
- core::data::io::JsonNode
Output files:
Program source:
#include <iostream>
#include <memory>
#include <string>

#include <core/data/io/json_io.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Demonstrates how to handle JSON data.

USAGE:
    ./ex_JsonNode

)";

/** @brief Demo for handling JSON data
 *
 * The example tests whether a JSON data is parsed and printed correctly.
 *
 * CATEGORIES: core::data::io::JsonNode
 * KEYWORDS: JSON
 */
int main(const int argc, const char* argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  using namespace core::data::io;

  // Create JsonNode object using constructors
  JsonNode_SP n0 = std::make_shared<JsonNode>(std::make_shared<JsonValue>("options"),1);
  n0->add_branch(std::make_shared<JsonNode>(std::make_shared<JsonValue>("size","56"),2));
  n0->add_branch(std::make_shared<JsonNode>(std::make_shared<JsonValue>("color","red"),3));
  std::cout << n0; // This is how to print a full JSON tree

  std::string json = R"({"options" : {"size" : 56 , "color" : {"fg" : "red", "feature" : "none",
    "bg" : {"r":25,"g":124,"b":19} }, "do" : "all"}})" ;
  JsonNode_SP root = read_json(json); // Here the JSON string is parsed
  std::cout << root;                  // and here it is sent back to a stream

  // Here a JsonArray instance is created; an empty array at first ...
  std::shared_ptr<JsonArray> jv1 = std::make_shared<JsonArray>("res-1");
  // and now that empty array is filled with data; <code>"011"</code> string means that the first token (i.e. 37) does not
  // require quotes (hence logical 0) and the two latter tokens do (logical 1)
  jv1->values("011",37,"ALA",'A');
  JsonNode_SP jv2 = create_json_node("res-2","011",38,"PHE",'A');
  std::cout << (*jv1) << " " << (jv2) << "\n";

  // Create JsonNode object using helper methods, provided by <code>json_io.hh</code>
  JsonNode_SP another_root = create_json_node();
  another_root->add_branch( create_json_node(jv1) );
  another_root->add_branch( jv2 );
  std::cout << another_root;
}

ex_KDE_1D¶
Reads one column of observations and calculates a Kernel Density Estimator (KDE) with the given bandwidth value for the data. If the optional parameters min and max are given, they define the evaluation range. The last optional argument is the word ‘periodic’, which makes the program treat the estimated distribution as periodic.
USAGE:
ex_KDE_1D normal.txt 0.25 [min max periodic]
REFERENCE: Davis, Richard A., Keh-Shin Lii, Dimitris N. Politis. “Remarks on some nonparametric estimates of a density function.” Selected Works of Murray Rosenblatt. Springer, New York, NY, 2011. 95-100. doi:10.1214/aoms/1177728190.
Parzen, Emanuel. “On estimation of a probability density function and mode.” The annals of mathematical statistics 33.3 (1962): 1065-1076.
Keywords:
- statistics
- estimation
- data table
Categories:
- core/calc/statistics/KDE_1D
Input files:
Output files:
Program source:
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>

#include <core/data/io/Pdb.hh>
#include <core/data/io/DataTable.hh>
#include <core/calc/statistics/KDE_1D.hh>
#include <utils/string_utils.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Reads one column of observations and calculates a Kernel Density Estimator (KDE) with the given bandwidth value
for the data. If the optional parameters min and max are given, they define the evaluation range.
The last optional argument is the word 'periodic', which makes the program treat the estimated distribution as periodic.

USAGE:
    ex_KDE_1D normal.txt 0.25 [min max periodic]

REFERENCE:
Davis, Richard A., Keh-Shin Lii, Dimitris N. Politis. "Remarks on some nonparametric estimates of a density function."
Selected Works of Murray Rosenblatt. Springer, New York, NY, 2011. 95-100. doi:10.1214/aoms/1177728190.

Parzen, Emanuel. "On estimation of a probability density function and mode."
The annals of mathematical statistics 33.3 (1962): 1065-1076.

)";

/** @brief Reads one column of observations and calculates Kernel Density Estimator (KDE) for the data
 *
 * CATEGORIES: core/calc/statistics/KDE_1D
 * KEYWORDS:   statistics; estimation; data table
 */
int main(const int argc, const char* argv[]) {

  if(argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  core::data::io::DataTable input_data(argv[1]); // --- Here we read a text file with data values in columns
  std::vector<double> col1;                      // --- empty vector to hold the data from a file
  // --- the 'periodic' flag can only be an optional argument, i.e. one that follows the bandwidth
  bool is_periodic = (argc > 3 && argv[argc-1][0] == 'p');

  // --- Here we create a KDE estimator for the data in the first column of the input file
  core::calc::statistics::KDE_Kernel kernel_type = core::calc::statistics::normal_kernel;
  core::calc::statistics::KDE_1D kde(input_data.column(0, col1), atof(argv[2]), kernel_type, is_periodic);

  // --- find max and min value in the file; tabulate the estimator in 300 steps
  double min = (argc < 5) ? *std::min_element(col1.begin(), col1.end()) : atof(argv[3]);
  double max = (argc < 5) ? *std::max_element(col1.begin(), col1.end()) : atof(argv[4]);
  double step = (max - min) / 300.0;
  for (double x = min; x <= max; x += step)
    std::cout << utils::string_format("%8.3f %8.3f\n", x, kde(x, is_periodic));
}
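
What a kernel density estimator computes can be sketched in a few lines of standalone C++. The function below, `gaussian_kde`, is a hypothetical illustration and not the `KDE_1D` implementation: it assumes a Gaussian (normal) kernel and ignores the periodic case.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not a BioShell function): Gaussian kernel density estimate at point x,
//   f(x) = 1/(n*h) * sum_i K((x - x_i)/h),   K(u) = exp(-u*u/2) / sqrt(2*pi)
// where h is the bandwidth and x_i are the observations.
double gaussian_kde(const std::vector<double> &sample, double bandwidth, double x) {
  assert(!sample.empty() && bandwidth > 0.0);
  const double inv_sqrt_2pi = 0.3989422804014327; // 1 / sqrt(2*pi)
  double sum = 0.0;
  for (double xi : sample) {
    const double u = (x - xi) / bandwidth;
    sum += inv_sqrt_2pi * std::exp(-0.5 * u * u);
  }
  return sum / (sample.size() * bandwidth);
}
```

A larger bandwidth smooths the estimate, while a smaller one follows the individual observations more closely.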

ex_LBFGS¶
Unit test which shows how to use the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) function minimizer.
USAGE:
./ex_LBFGS
REFERENCE: Fletcher, Roger. Practical methods of optimization. John Wiley & Sons, 2013.
Keywords:
- numerical methods
Categories:
- core/calc/numeric/Bfgs
Output files:
Program source:
#include <iostream>
#include <string>
#include <vector>

#include <core/index.hh>
#include <core/calc/numeric/LBFGS.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which shows how to use the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) function minimizer.

USAGE:
    ./ex_LBFGS

REFERENCE:
Fletcher, Roger. Practical methods of optimization. John Wiley & Sons, 2013.

)";

using namespace core::calc::numeric;

/// Rosenbrock-like test function f(x) = t1^2 + t2^2, where t1 = a - x[0] and t2 = b * (x[1] - x[0]^2)
class TestFunction : public DerivableFunction<double> {
public:
  double a, b;

  TestFunction() : a(1.0), b(30.0) {}

  /// Returns the function value; uses the same definition as the overload that also computes the gradient
  virtual double operator()(const std::vector<double> &x) {
    double t1 = a - x[0];
    double t2 = b * (x[1] - x[0] * x[0]);
    return t1 * t1 + t2 * t2;
  }

  /// Returns the function value and stores its gradient in the given vector
  virtual double operator()(const std::vector<double> &x, std::vector<double> &gradient) {
    double t1 = a - x[0];                  // double rather than float, to avoid losing precision
    double t2 = b * (x[1] - x[0] * x[0]);
    gradient[1] = 2 * b * t2;
    gradient[0] = -2.0 * (x[0] * gradient[1] + t1);
    return t1 * t1 + t2 * t2;
  }

  core::index2 dim() const { return 2; }
};

/** @brief Example shows how to use BFGS function minimizer
 *
 *
 * CATEGORIES: core/calc/numeric/Bfgs
 * KEYWORDS:   numerical methods
 */
int main(const int argc, const char *argv[]) {

  TestFunction f;
  LBFGS<double> minimizer(2);
  std::vector<double> x{0.0,0.0};   // --- the starting point
  double val;
  core::index2 nit = minimizer.minimize(f, x, val);

  std::cout << "Minimum value " << val << " found at [";
  for (const double v:x) std::cout << v << ' ';
  std::cout << "] after " << nit << " iterations\n";

  return 0;
}
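
A gradient-based minimizer works reliably only when the function value and its analytical gradient describe the same function. One common way to verify this is to compare the analytical gradient with central finite differences. The sketch below is standalone and uses hypothetical names (`rosenbrock`, `rosenbrock_gradient`, `numerical_gradient`); it is written for the classic Rosenbrock form (a - x0)^2 + b (x1 - x0^2)^2 and does not call the BioShell API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Classic Rosenbrock function: f(x) = (a - x0)^2 + b * (x1 - x0^2)^2, minimum f = 0 at (a, a^2)
double rosenbrock(const std::vector<double> &x, double a = 1.0, double b = 30.0) {
  assert(x.size() == 2);
  const double t1 = a - x[0];
  const double t2 = x[1] - x[0] * x[0];
  return t1 * t1 + b * t2 * t2;
}

// Analytical gradient: df/dx0 = -2(a - x0) - 4 b x0 (x1 - x0^2),  df/dx1 = 2 b (x1 - x0^2)
std::vector<double> rosenbrock_gradient(const std::vector<double> &x, double a = 1.0, double b = 30.0) {
  assert(x.size() == 2);
  const double t2 = x[1] - x[0] * x[0];
  return {-2.0 * (a - x[0]) - 4.0 * b * x[0] * t2, 2.0 * b * t2};
}

// Central finite-difference approximation of the gradient, used only as a cross-check
std::vector<double> numerical_gradient(const std::vector<double> &x, double h = 1e-6) {
  std::vector<double> g(x.size());
  for (std::size_t k = 0; k < x.size(); ++k) {
    std::vector<double> xp(x), xm(x);
    xp[k] += h;
    xm[k] -= h;
    g[k] = (rosenbrock(xp) - rosenbrock(xm)) / (2.0 * h);
  }
  return g;
}
```

If the two gradients disagree by much more than the finite-difference error, the value and gradient routines most likely implement different functions.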

ex_MC_Ar¶
The program runs an isothermal Monte Carlo (MC) simulation of argon gas. By default it starts from a regular lattice conformation, unless an input file (PDB) with an initial conformation is provided.
USAGE:
ex_MC_Ar n_atoms density temperature small_cycles big_cycles [max_jump]
ex_MC_Ar starting.pdb density temperature small_cycles big_cycles [max_jump]
Keywords:
- no_keywords
Categories:
- no_categories
Output files:
Program source:
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <thread>

#include <core/data/basic/Vec3I.hh>
#include <core/data/io/Pdb.hh>
#include <core/BioShellVersion.hh>
#include <utils/string_utils.hh>
#include <utils/options/Option.hh>
#include <utils/options/OptionParser.hh>
#include <utils/options/output_options.hh>
#include <utils/options/sampling_options.hh>

#include <simulations/systems/CartesianAtoms.hh>
#include <simulations/systems/BuildFluidSystem.hh>
#include <simulations/systems/SingleAtomType.hh>
#include <simulations/movers/TranslateAtom.hh>
#include <simulations/forcefields/cartesian/LJEnergySWHomogenic.hh>
#include <simulations/sampling/IsothermalMC.hh>
#include <simulations/observers/ObserveEvaluators.hh>
#include <simulations/observers/cartesian/PdbObserver.hh>
#include <simulations/evaluators/CallEvaluator.hh>
#include <simulations/observers/ObserveMoversAcceptance.hh>
#include <simulations/observers/TriggerEveryN.hh>
#include <simulations/observers/AdjustMoversAcceptance.hh>
#include <simulations/observers/cartesian/SimplePdbFormatter.hh>

using namespace core::data::basic;

utils::Logger logs("ex_MC_Ar");

std::string program_info = R"(

The program runs an isothermal MC simulation of argon gas. By default it starts from a regular lattice conformation,
unless an input file (PDB) with an initial conformation is provided

USAGE:
    ex_MC_Ar n_atoms density temperature small_cycles big_cycles [max_jump]
    ex_MC_Ar starting.pdb density temperature small_cycles big_cycles [max_jump]

)";

const double EPSILON = 1.654E-21;                 // [J] per molecule
const double EPSILON_BY_K = EPSILON / 1.381E-23;  // = 119.6 in Kelvins
const double SIGMA = 3.4;                         // in Angstroms

/** @brief Isothermal Monte Carlo simulation of argon gas.
 *
 */
int main(const int argc,const char* argv[]) {

  using core::data::basic::Vec3I;
  using namespace simulations::systems;
  using namespace simulations::movers;               // for MoversSet
  using namespace simulations::observers::cartesian; // for all observers

  logs << utils::LogLevel::INFO << "BioShell version:\n"<<core::BioShellVersion().to_string() << "\n";

  // --- Default parameters, used when too few command line arguments are given
  core::index4 n_outer_cycles = 1000;
  core::index4 n_inner_cycles = 100;
  double density = 0.5;       // density of the system controls how many atoms will be contained in the box
  double temperature = 97;    // in Kelvins
  core::index4 n_atoms = 64;
  double max_jump = 0.5;      // Random move range (in Angstroms)

  core::data::structural::Structure_SP argon_structure = nullptr;
  core::data::structural::PdbAtom_SP ar_atom = std::make_shared<core::data::structural::PdbAtom>(1," AR");

  if (argc < 6) std::cerr << program_info;
  else {
    if (utils::is_integer(argv[1])) n_atoms = atoi(argv[1]);
    else {  // --- read an input file if given
      core::data::io::Pdb reader(argv[1]);
      argon_structure = reader.create_structure(0);
      n_atoms = argon_structure->count_atoms();
    }
    density = atof(argv[2]);
    temperature = atof(argv[3]);
    n_inner_cycles = atoi(argv[4]);
    n_outer_cycles = atoi(argv[5]);
    if (argc == 7) max_jump = atof(argv[6]);
  }

  double ar_volume = 4.0 / 3.0 * M_PI * SIGMA * SIGMA * SIGMA * n_atoms;
  double box_len = pow(ar_volume / density, 0.33333333333333);

  // --- Initialize periodic boundary conditions
  core::data::basic::Vec3I::set_box_len(box_len);
  logs << utils::LogLevel::INFO << "box width for " << int(n_atoms) << " atoms set to : " << box_len << "\n";

  // --- Create the system and distribute atoms in the box
  AtomTypingInterface_SP ar_type = std::make_shared<SingleAtomType>(" AR");
  CartesianAtoms ar(ar_type, n_atoms);
  core::calc::statistics::Random::seed(1234);
  if(argon_structure != nullptr) {  // --- read coordinates from a PDB file if provided
    set_conformation(argon_structure->first_const_atom(), argon_structure->last_const_atom(), ar);
  } else {                          // --- otherwise generate coordinates
    const auto grid = std::make_shared<SimpleCubicGrid>(box_len, n_atoms);
    BuildFluidSystem::generate(ar, *ar_atom, grid);
  }
  CartesianAtoms ar_backup(ar); // --- make a backup system

  // --- Create energy function - just LJ potential
  simulations::forcefields::cartesian::LJEnergySWHomogenic lj_energy(ar, SIGMA, EPSILON_BY_K);

  // --- Create a mover, which is a random perturbation of an atom in this case, and place it in a movers' set
  std::shared_ptr<TranslateAtom> translate = std::make_shared<TranslateAtom>(ar, ar_backup, lj_energy);
  translate->max_move_range_allowed(1.5);
  MoversSet_SP movers = std::make_shared<simulations::movers::MoversSetSweep>();
  movers->add_mover(translate, n_atoms);
  translate->max_move_range(max_jump); // --- set the maximum distance a single atom can be moved by a single MC perturbation

  // --- create an isothermal Monte Carlo sampler
  simulations::sampling::IsothermalMC mc(movers,temperature);

  // ---------- Create an observer which calls energy calculation and prints it on the screen
  std::shared_ptr<simulations::observers::ObserveEvaluators> obs
      = std::make_shared<simulations::observers::ObserveEvaluators>("");
  std::function<double(void)> recent_energy = [&lj_energy, &ar]() { return lj_energy.energy(ar); };
  std::function<double(void)> nbl_updates = [&lj_energy]() { return lj_energy.non_bonded_neighbors().updates_ratio(); };
  obs->add_evaluator(
    std::make_shared<simulations::evaluators::CallEvaluator<std::function<double(void)>>>(recent_energy, "energy", 8));
  obs->add_evaluator(
    std::make_shared<simulations::evaluators::CallEvaluator<std::function<double(void)>>>(nbl_updates, "updates_ratio", 6));

//  std::shared_ptr<simulations::observers::ObserveMoversAcceptance> observe_moves
//      = std::make_shared<simulations::observers::ObserveMoversAcceptance>(*movers,"movers.dat");
  std::shared_ptr<simulations::observers::AdjustMoversAcceptance> observe_moves
      = std::make_shared<simulations::observers::AdjustMoversAcceptance>(*movers,"movers.dat", 0.4);
  observe_moves->observe_header();

  std::shared_ptr<AbstractPdbFormatter> fmt = std::make_shared<SimplePdbFormatter>(" AR ", "AR ", "AR");
  auto observe_trajectory = std::make_shared<simulations::observers::cartesian::PdbObserver>(ar, fmt, "ar_tra.pdb");
  observe_trajectory->trigger(std::make_shared<simulations::observers::TriggerEveryN>(10));
  mc.outer_cycle_observer(observe_trajectory); // --- comment this line out to save disk space
  mc.outer_cycle_observer(observe_moves);
  mc.outer_cycle_observer(obs);

  mc.cycles(n_inner_cycles,n_outer_cycles,1);
  mc.run();

  simulations::observers::cartesian::PdbObserver final(ar, fmt, "final.pdb");
  final.observe();

  logs << utils::LogLevel::INFO << "Final energy " << lj_energy.energy(ar) << "\n";
}
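
The two ingredients of such a simulation, a pairwise Lennard-Jones energy and the Metropolis acceptance rule, can be sketched independently of BioShell. The functions below are hypothetical illustrations: they assume the plain 12-6 Lennard-Jones form with energies expressed in Kelvin (as with `EPSILON_BY_K` above), and they are not the `LJEnergySWHomogenic` or `IsothermalMC` implementations, which may add cutoffs or other refinements.

```cpp
#include <cassert>
#include <cmath>

// Plain 12-6 Lennard-Jones pair energy: E(r) = 4 * eps * [ (sigma/r)^12 - (sigma/r)^6 ]
// Default parameters follow the argon constants of the example: sigma in Angstroms, epsilon in Kelvin.
double lj_pair_energy(double r, double sigma = 3.4, double epsilon = 119.6) {
  assert(r > 0.0);
  const double s6 = std::pow(sigma / r, 6);
  return 4.0 * epsilon * (s6 * s6 - s6);
}

// Metropolis criterion: probability of accepting a move that changes the energy by delta_e.
// Both delta_e and temperature are in Kelvin, so the Boltzmann constant is already absorbed.
double metropolis_acceptance(double delta_e, double temperature) {
  assert(temperature > 0.0);
  return (delta_e <= 0.0) ? 1.0 : std::exp(-delta_e / temperature);
}
```

In an actual sampler a move is accepted when a uniform random number from [0, 1) is smaller than the returned probability; moves that lower the energy are always accepted.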

ex_MMAtomTyping¶
Reads a PDB file and assigns a molecular mechanics (MM) atom type to every atom of the given protein, according to the given force field parametrisation file. If no .par file is given, the AMBER03 force field is used.
USAGE:
./ex_MMAtomTyping [param-file] input.pdb
EXAMPLE:
./ex_MMAtomTyping 2gb1.pdb
./ex_MMAtomTyping amber03_atoms.par 2gb1.pdb
Keywords:
- PDB input
- atom typing
- force field
Categories:
- simulations/forcefields/mm/MMAtomTyping
Input files:
Output files:
Program source:
#include <iostream>
#include <string>
#include <vector>

#include <core/data/io/Pdb.hh>
#include <core/BioShellEnvironment.hh>
#include <simulations/forcefields/mm/MMForceField.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Reads a PDB file and assigns MM atom typing for every atom of the given protein according to the given
force field parametrisation file. If no .par file was given, AMBER03 force field will be used.

USAGE:
    ./ex_MMAtomTyping [param-file] input.pdb

EXAMPLE:
    ./ex_MMAtomTyping 2gb1.pdb
    ./ex_MMAtomTyping amber03_atoms.par 2gb1.pdb

)";

/** @brief Assigns MM atom typing for every atom of the given protein according to the given force field parametrisation file
 *
 * CATEGORIES: simulations/forcefields/mm/MMAtomTyping
 * KEYWORDS:   PDB input; atom typing; force field
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace simulations::forcefields::mm;

  // --- Note: in this listing the optional param-file argument is not read; AMBER03 is always loaded
  MMForceField mmff("AMBER03");
  const MMAtomTyping & atom_typing = *(mmff.mm_atom_typing());

  // --- The PDB file name is the last argument, whether or not a param-file was given
  core::data::io::Pdb pdb((argc == 2) ? argv[1] : argv[2]);
  core::data::structural::Structure_SP s = pdb.create_structure(0);

  for (auto chain: *s) {
    for (core::index2 ires = 0; ires < chain->size(); ++ires) {
      for (auto atom : *(*chain)[ires]) {
        core::index2 t = atom_typing.atom_type(*atom);
        std::cout << *(*atom).owner() << " " << (*atom).atom_name() << " : " << t << " "
                  << atom_typing.atom_internal_name(t) << "\n";
      }
    }
  }
}

ex_MMBondEnergy¶
Keywords:
- no_keywords
Categories:
- no_categories
Input files:
Output files:
Program source:
#include <cstdlib>
#include <iostream>

#include <core/data/basic/Vec3.hh>
#include <core/data/io/Pdb.hh>
#include <core/data/structural/Structure.hh>
#include <utils/LogManager.hh>
#include <utils/string_utils.hh>

#include <simulations/forcefields/mm/MMBondEnergy.hh>
#include <simulations/forcefields/mm/MMBondedParameters.hh>
#include <simulations/forcefields/mm/MMForceField.hh>

int main(const int argc, const char *argv[]) {

  using namespace simulations::forcefields::mm; // for all MM - related force field classes
  using namespace core::data::structural;       // for Structure, Residue, PdbAtom
  using namespace core::data::basic;            // for Vec3

  utils::LogManager::get().set_level("FINE");

  // --- argv[2] and argv[3] are read below, so all three parameters are required
  if (argc < 4) {
    std::cerr << "USAGE:\n\t./ex_MMBondEnergy atom_typing ff_bonded.par input.pdb\n\n";
    exit(0);
  }

  // --- Load the force field and the parameters of bonded interactions
  MMForceField mmff("AMBER03");
  MMBondedParameters bond_manager(argv[2],mmff.mm_atom_typing());

  // --- Read the structure for which the bond energy is calculated
  core::data::io::Pdb reader(argv[3]); // file name (PDB format, may be gzip-ped)
  core::data::structural::Structure_SP strctr = reader.create_structure(0);
  auto rc = CartesianChains(mmff.mm_atom_typing(), *strctr);

  // --- Create molecular mechanic bond energy object
  simulations::forcefields::mm::MMBondEnergy bond_energy(rc,bond_manager);
  const auto & bonds = bond_energy.get_bonds();

  std::cout << "#   i     j     E(d0)    d0    d\n";

  // --- Calculate energy for each bond in the structure
  double total = 0.0;
  for (auto it = bonds.cbegin(); it != bonds.cend(); ++it) {
    double en = bond_energy.calculate(*it);
    total += en;
    double dreal = (rc)[it->i].distance_to((rc)[it->j]);
    std::cout << utils::string_format("%5d %5d %10.3f %4.2f %4.2f\n", (*it).i, (*it).j, en,(*it).d0,dreal);
  }

  // --- Calculate the total bond energy of the whole structure; both numbers printed below should agree
  double total_en = bond_energy.energy(rc);
  std::cout << utils::string_format("%10.3f %10.3f\n", total, total_en);
}
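
The table printed by this program lists, for every bond, its energy, the equilibrium length d0 and the actual length d. How these quantities relate can be sketched under the assumption of the AMBER-style harmonic form E = k (d - d0)^2; whether `MMBondEnergy` uses exactly this expression is not shown in the listing. The names `Point3`, `distance_between` and `harmonic_bond_energy` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Minimal 3D point used only for this sketch
struct Point3 { double x, y, z; };

// Euclidean distance between two points
double distance_between(const Point3 &a, const Point3 &b) {
  const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Harmonic bond energy, assuming the AMBER-style form E = k * (d - d0)^2 (no factor of 1/2);
// k is the force constant and d0 the equilibrium bond length.
double harmonic_bond_energy(const Point3 &a, const Point3 &b, double k, double d0) {
  assert(k >= 0.0 && d0 > 0.0);
  const double delta = distance_between(a, b) - d0;
  return k * delta * delta;
}
```

Summing this term over all bonds gives the total bond energy, which is what the last line of the program output compares against the value returned for the whole structure.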

ex_MMEnergy¶
Keywords:
- no_keywords
Categories:
- no_categories
Input files:
Output files:
Program source:
#include <cstdlib>
#include <iostream>
#include <string>
#include <chrono>

#include <core/data/basic/Vec3.hh>
#include <core/data/io/Pdb.hh>
#include <utils/string_utils.hh>
#include <utils/LogManager.hh>

#include <simulations/forcefields/mm/MMNonBonded.hh>
#include <simulations/forcefields/mm/MMBondType.hh>
#include <simulations/forcefields/mm/MMBondEnergy.hh>
#include <simulations/forcefields/mm/MMPlanarEnergy.hh>
#include <simulations/forcefields/mm/MMDihedralEnergy.hh>
#include <simulations/forcefields/mm/MMBondedParameters.hh>
#include <simulations/forcefields/mm/MMForceField.hh>

int main(const int argc, const char *argv[]) {

  using namespace core::data::structural;       // for Structure, Residue, PdbAtom
  using namespace core::data::basic;            // for Vec3
  using namespace simulations::forcefields::mm; // for all MM - related force field classes
  using namespace simulations::systems;         // for ResidueChain

  utils::LogManager::get().set_level("INFO");

  // --- argv[2] and argv[3] are read below, so all three parameters are required
  if (argc < 4) {
    std::cerr << "USAGE:\n\t./ex_MMEnergy atoms.par ff_bonded.par 2gb1.pdb\n\n";
    exit(0);
  }

  // --- Read atom typing and bond parameters
  MMForceField mmff("AMBER03");
  simulations::forcefields::mm::MMBondedParameters ff(argv[2],mmff.mm_atom_typing());

  // --- Read the structure for which the energy is calculated
  core::data::io::Pdb reader(argv[3]); // file name (PDB format, may be gzip-ped)
  Structure_SP strctr = reader.create_structure(0);
  auto rc = CartesianChains(mmff.mm_atom_typing(), *strctr);

  size_t i = 0;
//  for (auto atom_it = strctr->first_const_atom(); atom_it != strctr->last_const_atom(); ++atom_it) {
//    auto at = atom_typing->atom_type(**atom_it);
//    (*rc)[i].register_ = (1.0 + at.charge()) * 10000.0;
//    ++i;
//  }

  // --- Create molecular mechanic bond energy object
  simulations::forcefields::mm::MMBondEnergy bond_energy(rc, ff);
  // --- Create molecular mechanic planar energy objects
  simulations::forcefields::mm::MMPlanarEnergy planar_energy(rc, ff, bond_energy);
  // --- Create molecular mechanic dihedral energy objects
  simulations::forcefields::mm::MMDihedralEnergy dihedral_energy(rc, ff, bond_energy);
  // --- Create non-bonded energy; pass the reference to bond energy object so non-bonded energy can exclude bonds
  MMNonBonded nb_energy(rc, bond_energy);
//  nb_energy.get_excluded_pairs().print(std::cout);

  // --- Evaluate all four energy terms and measure how long it takes
  auto start = std::chrono::high_resolution_clock::now();
  std::cout<<"#bond_energy planar_energy dihedral_energy nb_energy\n";
  std::cout << utils::string_format(" %10.3f    %10.3f      %10.3f %10.3f\n",
      bond_energy.energy(rc), planar_energy.energy(rc), dihedral_energy.energy(rc), nb_energy.energy(rc));
  auto end = std::chrono::high_resolution_clock::now();
  std::cerr << "# computed in " << std::chrono::duration<double>(end - start).count() << " [s]\n";
}

ex_MMNonBonded¶
Keywords:
- no_keywords
Categories:
- no_categories
Input files:
Output files:
Program source:
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>
#include <tuple>

#include <utils/string_utils.hh>
#include <utils/LogManager.hh>
#include <core/data/io/Pdb.hh>
#include <core/data/structural/Structure.hh>

#include <simulations/forcefields/mm/MMBondType.hh>
#include <simulations/forcefields/mm/MMBondEnergy.hh>
#include <simulations/forcefields/mm/MMBondedParameters.hh>
#include <simulations/forcefields/mm/MMNonBondedSW.hh>
#include <simulations/forcefields/mm/MMNonBonded.hh>
#include <simulations/forcefields/mm/MMForceField.hh>

/** \todo_code Fix this demo once MM topology files parser is ready
 *
 */
int main(const int argc, const char *argv[]) {

  using namespace simulations::forcefields::mm; // for all MM - related force field classes
  using namespace core::data::structural;       // for Structure, Residue, PdbAtom
  using namespace core::data::basic;            // for Vec3
  using namespace simulations::systems;         // for ResidueChain

  utils::LogManager::get().set_level("FINE");

  // --- argv[2] and argv[3] are read below, so all three parameters are required
  if (argc < 4) {
    std::cerr << "USAGE:\n\t./ex_MMNonBonded data_atom_typing data_bond_typing input.pdb\n\n";
    exit(0);
  }

  // --- Read atom typing and bond parameters
  MMForceField mmff("AMBER03");
  MMBondedParameters bonded_manager(argv[2],mmff.mm_atom_typing());

  // --- Read the structure for which the energy is calculated
  core::data::io::Pdb reader(argv[3]); // file name (PDB format, may be gzip-ped)
  Structure_SP strctr = reader.create_structure(0);
  auto rc = CartesianChains(mmff.mm_atom_typing(), *strctr);

  // --- Create molecular mechanic bond energy object
  MMBondEnergy bond_energy(rc, bonded_manager);

  // --- Create non-bonded energy; pass the reference to bond energy object so non-bonded energy can exclude bonds
  MMNonBonded nb_energy(rc,bond_energy);
  std::cout<<"# list of atom pairs excluded from pairwise calculation\n";
//  nb_energy.get_neighbor_list().print(std::cout);

  // --- Calculate total non-bonded energy of the structure
  double energy_by_str = nb_energy.energy(rc);
  std::cout << utils::string_format("%10.3f\n", energy_by_str);
}

ex_MeanFieldDistributions¶
Prints values for a given Mean Field (MF) potential so it can be plotted nicely. The parameters of the program are: MF potential file name, pseudocounts, min_x, max_x, potential name [other potential names ..]
USAGE:
ex_MeanFieldDistributions "forcefield/cabs/R13_cabs.dat" 0.01 2.0 15.0 AD.HH
(note that the quotation marks may be required; otherwise bash may not pass the arguments correctly)
Keywords:
- force field
Categories:
- simulations/forcefields/mf/MeanFieldDistributions
Output files:
Program source:
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include <simulations/forcefields/mf/MeanFieldDistributions.hh>
#include <utils/string_utils.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Prints values for a given Mean Field potential so it can be plotted nicely. The parameters of the program are:
MF potential file name, pseudocounts, min_x, max_x, potential name [other potential names ..]

USAGE:
    ex_MeanFieldDistributions "forcefield/cabs/R13_cabs.dat" 0.01 2.0 15.0 AD.HH

(note that the quotation marks may be required; otherwise bash may not pass the arguments correctly)

)";

/** @brief Prints values for a given Mean Field potential so it can be plotted nicely.
 *
 * The program works for any potential stored in the Bioshell row-wise format; both CABS and SURPASS potentials may be
 * plotted with this utility. Program usage:
 *
 * ex_MeanFieldDistributions "forcefield/cabs/R13_cabs.dat" 0.01 2.0 15.0 AD.HH
 *
 * where the parameters of the program are:
 * MF potential file name, pseudocounts, min_x, max_x, potential name [other potential names ..]
 *
 * CATEGORIES: simulations/forcefields/mf/MeanFieldDistributions
 * KEYWORDS:   force field
 */
int main(const int argc, const char *argv[]) {

  using namespace simulations::forcefields::mf;

  if(argc < 5) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  double pseudocounts = utils::from_string<double>(argv[2]);
  double min_x = utils::from_string<double>(argv[3]);
  double max_x = utils::from_string<double>(argv[4]);
  std::shared_ptr<MeanFieldDistributions> mf = load_1D_distributions(argv[1], pseudocounts);

  // --- Select the requested potentials; when no name is given, all known potentials are printed
  std::vector<EnergyComponent_SP> terms;
  if (argc > 5) {
    for (int i = 5; i < argc; ++i) {
      std::string label(argv[i]);
      if(mf->contains_distribution(label)) terms.push_back(mf->at(label));
      else std::cerr <<"Key "<<label<<" not found!\n";
    }
  } else {
    for (const std::string & label:mf->known_distributions()) terms.push_back(mf->at(label));
  }

  // --- Tabulate every selected potential on a regular grid of distances
  for (double d = min_x; d <= max_x; d += 0.0125) {
    std::cout << utils::string_format("%7.3f",d);
    for (const auto & e : terms) std::cout << utils::string_format(" %8.3f", (*e)(d));
    std::cout << "\n";
  }
}

ex_Monomer¶
Unit test which demonstrates the functionality of the core::chemical::Monomer data type.
USAGE:
./ex_Monomer
Keywords:
- monomers
Categories:
- core::chemical::Monomer
Output files:
Program source:
#include <iostream>
#include <string>
#include <algorithm> // for std::count_if
#include <iterator>  // for std::distance

#include <utils/string_utils.hh>
#include <core/chemical/Monomer.hh>
#include <core/chemical/monomer_io.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which demonstrates functionality of core::chemical::Monomer data type.

USAGE:
    ./ex_Monomer

)";

// First we declare a predicate function used to count how many monomers have type = 'P' i.e. "protein"
bool IsAA(const core::chemical::Monomer &m) {
  return (m.type == 'P') || (core::chemical::Monomer::get(m.parent_id).type == 'P');
}

/** @brief Example demonstrates functionality of core::chemical::Monomer data type.
 *
 * CATEGORIES: core::chemical::Monomer
 * KEYWORDS:   monomers
 */
int main(const int argc, const char *argv[]) {

  using namespace core::chemical;

  // First we iterate over all monomers and count how many of them are actually amino acids
  // See how bioshell's iterators work together with std library
  int n_aa = std::count_if(Monomer::cbegin(), Monomer::cend(), IsAA);
  std::cout << std::distance(Monomer::cbegin(), Monomer::cend()) << " standard monomers found, including " << n_aa
            << " peptide-forming.\n";

  std::cout << "The order of standard amino acid residues is:\n";
  for (core::index2 i = 0; i < n_aa; ++i) {
    const Monomer &m = Monomer::get(i);
    std::cout << utils::string_format("%2d %c %3s %c\n", i, m.code1, m.code3.c_str(), m.type);
  }

  load_monomers_from_db(); // --- load the database of all known monomers

  // Count amino acid monomers again
  n_aa = std::count_if(Monomer::cbegin(), Monomer::cend(), IsAA);
  std::cout << "Monomer database loaded; " << std::distance(Monomer::cbegin(), Monomer::cend())
            << " monomers found, including " << n_aa << " peptide-forming.\n";

  // Now let's count how many non-standard residues are derived from ALA
  // Simply a parent_id of a monomer must be equal to ALA.id
  // This time we use a lambda expression rather than a named function
  n_aa = std::count_if(Monomer::cbegin(), Monomer::cend(),
                       [](const Monomer &m) { return m.parent_id == Monomer::ALA.id; });
  std::cout << "There are " << n_aa << " derived from alanine\n";
}

ex_MonomerStructure¶
Unit test which shows how to read CIF files.
USAGE:
ex_MonomerStructure file.cif
EXAMPLE:
ex_MonomerStructure AA3.cif
Keywords:
Categories:
- core/chemical/MonomerStructure
Input files:
Output files:
Program source:
#include <iostream>
#include <string>

#include <utils/Logger.hh>
#include <utils/LogManager.hh>
#include <core/chemical/MonomerStructure.hh>
#include <core/chemical/MonomerStructureFactory.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which shows how to read CIF files.

USAGE:
    ex_MonomerStructure file.cif

EXAMPLE:
    ex_MonomerStructure AA3.cif

)";

/** @brief ex_MonomerStructure tests reading CIF files
 *
 * CATEGORIES: core/chemical/MonomerStructure
 * KEYWORDS:   CIF input
 */
int main(const int argc, const char *argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  utils::LogManager::FINE(); // --- INFO is the default logging level; set it to FINE to see more

  // --- Read a monomer from a CIF file and list its atoms by chemical role
  core::chemical::MonomerStructure_SP str = core::chemical::MonomerStructure::from_cif(argv[1]);

  std::cout<<"\nPOLAR H: ";
  for (auto i: str->polar_hydrogens()) std::cout<<i->atom_name()<<" ";

  std::cout<<"\nNONPOLAR H: ";
  for (auto i: str->nonpolar_hydrogens()) std::cout<<i->atom_name()<<" ";

  std::cout<<"\nNONPOLAR: ";
  for (auto i: str->nonpolar_heavy()) std::cout<<i->atom_name()<<" ";

  std::cout<<"\nDONORS: ";
  for (auto i: str->hydrogen_donors()) std::cout<<i->atom_name()<<" ";

  std::cout<<"\nACCEPTORS: ";
  for (auto i: str->hydrogen_acceptors()) std::cout<<i->atom_name()<<" ";
  std::cout<<"\n";

  // --- Fetch a monomer (here: proline) by its three-letter code from the factory
  core::chemical::MonomerStructureFactory m = core::chemical::MonomerStructureFactory::get_instance();
  core::chemical::MonomerStructure_SP mstr = m.get("PRO");
  std::cout<<mstr->code3<<"\n";

  std::cout<<"\nPOLAR H: ";
  for (auto i: mstr->polar_hydrogens()) std::cout<<i->atom_name()<<" ";
}

ex_NeighborGrid3D¶
ex_NeighborGrid3D finds possible structural neighbors of a given residue using a 3D hashing grid
USAGE:
ex_NeighborGrid3D 2gb1.pdb 4.0 A:12
where 2gb1.pdb is the input file, 4.0 is the grid size, and A:12 is the selector of the query residue
Keywords:
Categories:
- core::calc::structural::NeighborGrid3D
Input files:
Output files:
Program source:
#include <iostream>
#include <algorithm>
#include <cstdlib>
#include <set>
#include <string>
#include <vector>

#include <core/data/io/Pdb.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <core/calc/structural/NeighborGrid3D.hh>
#include <utils/exit.hh>
#include <utils/LogManager.hh>

std::string program_info = R"(

ex_NeighborGrid3D finds possible structural neighbors of a given residue using a 3D hashing grid

USAGE:
    ex_NeighborGrid3D 2gb1.pdb 4.0 A:12

where 2gb1.pdb is an input file, 4.0 - grid size, A:12 - selector of a query residue

)";

/** @brief Finds possible structural neighbors of a given atom
 *
 *
 * CATEGORIES: core::calc::structural::NeighborGrid3D
 * KEYWORDS:   PDB input; structure selectors
 */
int main(const int argc, const char* argv[]) {

  // --- argv[3] is read below, so all three parameters are required
  if (argc < 4) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  utils::LogManager::FINE();
  using namespace core::data::io;
  Pdb reader(argv[1], all_true(is_not_water,is_not_alternative)); // --- file name (PDB format, may be gzip-ped)
  utils::Logger logs("ex_NeighborGrid3D");

  using namespace core::data::structural;
  Structure_SP strctr = reader.create_structure(0);
  selectors::SelectChainResidues select_query(argv[3]);

  // --- Here we find the query residue, selected by a selector string
  Residue_SP query_resid = nullptr;
  for(auto it=strctr->first_residue();it!=strctr->last_residue();++it) {
    if (select_query(*it)) {
      query_resid = *it;
      logs << utils::LogLevel::INFO << "selecting spatial neighbors of " << " "
           << query_resid->residue_type().code3 << query_resid->residue_id() << "\n";
      break;
    }
  }
  // --- Stop here if the selector did not match any residue; the pointer is dereferenced below
  if (query_resid == nullptr) {
    std::cerr << "No residue matches the selector " << argv[3] << "\n";
    return 1;
  }

  // --- Create a 3D grid object
  double grid_mesh = atof(argv[2]);
  core::calc::structural::NeighborGrid3D grid(*strctr,grid_mesh);

  // --- Print content of the grid
  std::cout << "# Atoms as they are located on the grid\n";
  std::cout << "# Grid center is: " << grid.cx() << " " << grid.cy() << " " << grid.cz() << "\n";
  core::index2 ix, iy, iz;
  for(const auto & hash : grid.filled_cells()) {
    grid.xyz_from_hash(hash, ix, iy, iz);
    for(const PdbAtom_SP & at : grid.get_cell(hash)) {
      std::cout << "# " << hash << " " << at->owner()->residue_type().code3 << at->owner()->id() << " " << *at
                << " ix: " << ix << " iy: " << iy << " iz: " << iz << "\n";
    }
  }

  // --- Print neighbor cells of the selected residue
  core::index4 hash_ca = grid.hash(*query_resid->find_atom_safe(" CA "));
  std::vector<core::index4> neighb_hash;
  grid.get_neighbor_cells(hash_ca, neighb_hash);
  std::cout << "# neighbors of a cell " << hash_ca << "\n# ";
  for (core::index4 n:neighb_hash) std::cout << " " << n;
  std::cout << "\n";

  // --- Mark the selection on a PDB file by setting B-factor to 10.0 (all other atoms to 0.0)
  float max_distance = 0;
  std::vector< PdbAtom_SP> result;
  for(auto it=strctr->first_atom();it!=strctr->last_atom();++it) (**it).b_factor(0.0);
  for(PdbAtom_SP a : *query_resid) {
    result.clear();
    grid.get_neighbors(*a,result);
    for(auto atom_sp:result) {
      atom_sp->b_factor(10.0);
      max_distance = std::max(max_distance,atom_sp->distance_to(*a));
    }
  }
  for(auto it=strctr->first_atom();it!=strctr->last_atom();++it)
    std::cout << (**it).to_pdb_line() << "\n";

  std::cout << "# Max distance: " <<max_distance << "\n";
}
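
The idea of a 3D hashing grid can be sketched as follows: space is divided into cubic cells of a given mesh size, every cell gets an integer hash computed from its (ix, iy, iz) indexes, and neighbor candidates of an atom are looked up only in its own cell and in the adjacent cells. The code below is a standalone illustration with hypothetical names (`GridSpec`, `cell_hash`, `neighbor_cells`); the hashing scheme actually used by `NeighborGrid3D` may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical grid description: origin corner, cell size (mesh) and number of cells along each axis
struct GridSpec {
  double min_x, min_y, min_z;
  double mesh;
  std::uint32_t nx, ny, nz;
};

// Maps a point to the hash of the cell that contains it: hash = (ix * ny + iy) * nz + iz
std::uint32_t cell_hash(const GridSpec &g, double x, double y, double z) {
  const double fx = std::floor((x - g.min_x) / g.mesh);
  const double fy = std::floor((y - g.min_y) / g.mesh);
  const double fz = std::floor((z - g.min_z) / g.mesh);
  assert(fx >= 0 && fx < g.nx && fy >= 0 && fy < g.ny && fz >= 0 && fz < g.nz);
  return (static_cast<std::uint32_t>(fx) * g.ny + static_cast<std::uint32_t>(fy)) * g.nz
         + static_cast<std::uint32_t>(fz);
}

// Returns hashes of the given cell and of all adjacent cells (up to 27), clipped at the grid boundary
std::vector<std::uint32_t> neighbor_cells(const GridSpec &g, std::uint32_t hash) {
  const int iz = static_cast<int>(hash % g.nz);
  const int iy = static_cast<int>((hash / g.nz) % g.ny);
  const int ix = static_cast<int>(hash / (g.ny * g.nz));
  std::vector<std::uint32_t> cells;
  for (int dx = -1; dx <= 1; ++dx)
    for (int dy = -1; dy <= 1; ++dy)
      for (int dz = -1; dz <= 1; ++dz) {
        const int jx = ix + dx, jy = iy + dy, jz = iz + dz;
        if (jx < 0 || jy < 0 || jz < 0) continue;
        if (jx >= static_cast<int>(g.nx) || jy >= static_cast<int>(g.ny) || jz >= static_cast<int>(g.nz)) continue;
        cells.push_back((static_cast<std::uint32_t>(jx) * g.ny + static_cast<std::uint32_t>(jy)) * g.nz
                        + static_cast<std::uint32_t>(jz));
      }
  return cells;
}
```

Because only adjacent cells are searched, two atoms reported as neighbors can be up to two cell diagonals apart; this is why the program above reports "possible" neighbors and prints the maximum distance it found.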

ex_NormalDistribution¶
Unit test for the NormalDistribution class. The example draws 1000 random numbers from a normal distribution and then estimates the distribution parameters from that sample.
USAGE:
./ex_NormalDistribution
Keywords:
- NormalDistribution
- random numbers
Categories:
- core/calc/statistics/NormalDistribution; core/calc/statistics/Random
Output files:
Program source:
#include <iostream>
#include <random>
#include <string>
#include <vector>
#include <math.h>

#include <core/calc/statistics/NormalDistribution.hh>
#include <core/calc/statistics/Random.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test for the NormalDistribution class. The example draws 1000 random numbers from a normal distribution
and then estimates the distribution parameters from that sample.

USAGE:
    ./ex_NormalDistribution

)";

/** @brief Unit test for NormalDistribution class.
 *
 * CATEGORIES: core/calc/statistics/NormalDistribution; core/calc/statistics/Random
 * KEYWORDS:   NormalDistribution; random numbers
 */
int main(const int argc, const char *argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  using namespace core::calc::statistics;

  Random rd = core::calc::statistics::Random::get();
  rd.seed(12345); // --- seed the generator for repeatable results
  std::mt19937 gen(rd());
  std::normal_distribution<> d(10.5, 2.0); // --- The original distribution we take a sample from

  // --- Note that the container for samples is two-dimensional! Each sample is placed in a separate row
  std::vector<std::vector<double> > data;
  std::vector<double> row(1);
  for (unsigned short i = 0; i < 1000; ++i) {
    row[0] = (d(gen));
    data.push_back(row); // --- push_back() stores a copy of the row, so the same row object may be reused
  }

  // --- Here we estimate the distribution parameters: average and standard deviation of the sample
  core::calc::statistics::NormalDistribution n(0.0, 1.0);
  std::vector<double> E = n.estimate(data);
  std::cout << "True values: average = 10.5; stdev = 2.0\n";
  std::cout << "Estimated values: average = " << E[0] << " sdev = " << E[1] << "\n";
}
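For comparison, the same estimation can be sketched with the standard library alone. This is a minimal illustration, not the BioShell implementation: `estimate_normal()` and `draw_normal_sample()` are hypothetical helper names, and the sketch assumes the maximum-likelihood estimator (division by N); `NormalDistribution::estimate()` may use a different normalisation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Returns {mean, standard deviation} of a sample; assumes the maximum-likelihood form (divides by N)
std::pair<double, double> estimate_normal(const std::vector<double> &sample) {
  assert(!sample.empty());
  const double n = static_cast<double>(sample.size());
  double sum = 0.0;
  for (double x : sample) sum += x;
  const double mean = sum / n;
  double sum_sq = 0.0;
  for (double x : sample) sum_sq += (x - mean) * (x - mean);
  return std::make_pair(mean, std::sqrt(sum_sq / n));
}

// Draws n values from a normal distribution with the given mean and standard deviation
std::vector<double> draw_normal_sample(std::size_t n, double mean, double sdev, unsigned seed) {
  assert(sdev > 0.0);
  std::mt19937 gen(seed);
  std::normal_distribution<double> d(mean, sdev);
  std::vector<double> out(n);
  for (double &x : out) x = d(gen);
  return out;
}
```

Calling `estimate_normal(draw_normal_sample(1000, 10.5, 2.0, 12345))` should return values close to 10.5 and 2.0; with 1000 observations the standard error of the mean is about 0.06.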

ex_OptionParser¶
Shows how to use the BioShell command-line parser in your own program.
Keywords:
- option parsing
Categories:
- utils::options::OptionParser
Output files:
Program source:
#include <iostream>
#include <vector>

#include <utils/options/Option.hh>
#include <utils/options/OptionParser.hh>
#include <utils/LogManager.hh>

using namespace utils::options;

/** @brief Shows how to use BioShell command line parser in your own program
 *
 * To test this program, run:
 * ./ex_OptionParser -n=4 -nn=1,2,3,4
 *
 * CATEGORIES: utils::options::OptionParser
 * KEYWORDS:   option parsing
 */
int main(const int cnt, const char *argv[]) {

  // --- Limit the amount of logging messages printed on stderr
  utils::LogManager::WARNING();

  // --- First get the parser instance (it's a singleton)
  utils::options::OptionParser &cmd = OptionParser::get("ex_OptionParser");

  // --- This is how to register an option that has already been declared in BioShell library
  cmd.register_option(utils::options::verbose, help);

  // --- User can also declare non-standard options
  Option number("-n", "-number", "returns an integer");
  Option numbers("-nn", "-numbers", "returns a vector of integers");
  Option value("-x", "-value_x", "returns a real value of X");

  // --- Options that have been declared must also be registered
  // --- (Declaration doesn't mean automatic registration)
  cmd.register_option(number, numbers, value);

  // --- after all the relevant options were registered, we parse a program command line
  cmd.parse_cmdline(cnt, argv);

  // --- User should not check for -help and -verbose flags: this is automatically done by OptionParser

  // --- Once command line has been parsed, we may check for a program parameter and retrieve its value:
  if (number.was_used()) std::cout << "A number given: " << option_value<int>(number) << "\n";

  // --- Options may be also accessed by their long name
  // --- (but not by the abbreviated name, so cmd.was_used("-x") won't work)
  if (cmd.was_used("-value_x"))
    std::cout << "A real value given: " << option_value<double>("-value_x") << "\n";

  // --- This shows how to read a vector of values
  if (cmd.was_used(numbers)) {
    std::vector<int> v = option_value<std::vector<int>, int>(numbers);
    std::cout << "Given set of values: ";
    for (auto vi : v) std::cout << vi << " ";
    std::cout << "\n";
    // --- This is how the raw string given at the command line may be accessed:
    std::cout << "The raw string associated with -numbers option was: " << cmd.value_string(numbers) << "\n";
  }
}

ex_P2QuantileEstimation¶
ex_P2QuantileEstimation reads a file with real values and calculates a quantile using the P-square algorithm. If no input file is provided, the program calculates the 0.25, 0.5 and 0.75 quantiles of a random sample drawn from a normal distribution.
USAGE:
ex_P2QuantileEstimation [infile p_value]
EXAMPLE:
ex_P2QuantileEstimation random_normal.txt 0.5
REFERENCE: Jain, Raj, Imrich Chlamtac. “The P2 algorithm for dynamic calculation of quantiles and histograms without storing observations.” Communications of the ACM 28.10 (1985): 1076-1085. doi:10.1145/4372.4378
Keywords:
Categories:
- core/calc/statistics/OnlineStatistics; core/calc/statistics/Random
Input files:
- random_N(0,1).txt
Output files:
Program source:
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <random>
#include <string>

#include <core/index.hh>
#include <core/calc/statistics/Random.hh>
#include <core/calc/statistics/P2QuantileEstimation.hh>

std::string program_info = R"(

ex_P2QuantileEstimation reads a file with real values and calculates a quantile using the P-square algorithm.
If no input file is provided, the program calculates the 0.25, 0.5 and 0.75 quantiles of a random sample
drawn from a normal distribution.

USAGE:
    ex_P2QuantileEstimation [infile p_value]

EXAMPLE:
    ex_P2QuantileEstimation random_normal.txt 0.5

REFERENCE:
Jain, Raj, Imrich Chlamtac. "The P2 algorithm for dynamic calculation of quantiles and histograms
without storing observations." Communications of the ACM 28.10 (1985): 1076-1085. doi:10.1145/4372.4378

)";

/** @brief Reads a file with real values and calculates a quantile using the P-square algorithm.
 *
 * If no input file is provided, the program calculates three quartiles of a random sample
 *
 * CATEGORIES: core/calc/statistics/OnlineStatistics; core/calc/statistics/Random
 * KEYWORDS:   statistics
 */
int main(const int argc, const char *argv[]) {

  if (argc < 3) { // --- complain about missing program parameters
    std::cerr << program_info;

    // ---------- Use the random engine if no data is provided
    core::calc::statistics::Random r = core::calc::statistics::Random::get();
    r.seed(12345); // --- seed the generator for repeatable results
    std::normal_distribution<double> normal_random;

    core::calc::statistics::P2QuantileEstimation quartile1(0.25), quartile2(0.5), quartile3(0.75);
    for (core::index4 n = 0; n < 10000; ++n) {
      double rr = normal_random(r);
      quartile1(rr);
      quartile2(rr);
      quartile3(rr);
    }
    std::cout << "Quantile 0.25 :" << quartile1.p_value() << "\n"; // Should be close to -0.675
    std::cout << "Quantile 0.50 :" << quartile2.p_value() << "\n"; // Should be close to 0.0
    std::cout << "Quantile 0.75 :" << quartile3.p_value() << "\n"; // Should be close to 0.675
  } else {
    std::ifstream in(argv[1]);
    core::calc::statistics::P2QuantileEstimation stats(atof(argv[2]));
    double r;
    core::index4 cnt = 0;
    while (in >> r) {
      ++cnt;
      stats(r);
    }
    std::cout << "Quantile " << atof(argv[2]) << " " << stats.p_value() << " based on " << cnt << " observations\n";
  }
}
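The P-square algorithm itself fits in a short class. The sketch below follows the five-marker scheme from the Jain and Chlamtac paper cited above; it is an independent illustration, not the code of `core::calc::statistics::P2QuantileEstimation`, and the names `P2Quantile`, `add()` and `value()` are hypothetical. It assumes at least five observations are supplied before the estimate is read.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-alone P-square estimator of a single quantile p (0 < p < 1)
class P2Quantile {
public:
  explicit P2Quantile(double p) : p_(p), count_(0) {
    assert(p > 0.0 && p < 1.0);
    const double dn[5] = {0.0, p / 2.0, p, (1.0 + p) / 2.0, 1.0};       // increments of desired positions
    const double np[5] = {0.0, 2.0 * p, 4.0 * p, 2.0 + 2.0 * p, 4.0};   // desired positions (0-based)
    for (int i = 0; i < 5; ++i) { dn_[i] = dn[i]; np_[i] = np[i]; n_[i] = i; q_[i] = 0.0; }
  }

  void add(double x) {
    if (count_ < 5) {                       // the first five observations initialise the markers
      q_[count_++] = x;
      if (count_ == 5) std::sort(q_, q_ + 5);
      return;
    }
    ++count_;
    int k;                                  // index of the marker just below x
    if (x < q_[0]) { q_[0] = x; k = 0; }
    else if (x < q_[1]) k = 0;
    else if (x < q_[2]) k = 1;
    else if (x < q_[3]) k = 2;
    else if (x <= q_[4]) k = 3;
    else { q_[4] = x; k = 3; }
    for (int i = k + 1; i < 5; ++i) n_[i] += 1.0;   // shift actual positions of the markers above x
    for (int i = 0; i < 5; ++i) np_[i] += dn_[i];   // advance desired positions
    for (int i = 1; i <= 3; ++i) {                  // adjust the three inner markers if they drifted
      const double d = np_[i] - n_[i];
      if ((d >= 1.0 && n_[i + 1] - n_[i] > 1.0) || (d <= -1.0 && n_[i - 1] - n_[i] < -1.0)) {
        const int s = (d > 0.0) ? 1 : -1;
        const double qp = parabolic(i, s);
        q_[i] = (q_[i - 1] < qp && qp < q_[i + 1]) ? qp : linear(i, s);
        n_[i] += s;
      }
    }
  }

  double value() const { assert(count_ >= 5); return q_[2]; }  // the middle marker tracks the quantile
  double p() const { return p_; }
  std::size_t count() const { return count_; }

private:
  // Piecewise-parabolic prediction of the new height of marker i moved by s (+1 or -1)
  double parabolic(int i, int s) const {
    return q_[i] + s / (n_[i + 1] - n_[i - 1]) *
        ((n_[i] - n_[i - 1] + s) * (q_[i + 1] - q_[i]) / (n_[i + 1] - n_[i]) +
         (n_[i + 1] - n_[i] - s) * (q_[i] - q_[i - 1]) / (n_[i] - n_[i - 1]));
  }
  // Linear fallback used when the parabolic prediction would break marker ordering
  double linear(int i, int s) const {
    return q_[i] + s * (q_[i + s] - q_[i]) / (n_[i + s] - n_[i]);
  }

  double p_;
  std::size_t count_;
  double q_[5], n_[5], np_[5], dn_[5];   // marker heights, positions, desired positions, increments
};
```

Only five marker heights and positions are stored regardless of how many values are processed, which is why the estimator suits long data streams; the result is an approximation, not the exact sample quantile.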

ex_PairwiseAlignment¶
Simple example showing how to work with the PairwiseAlignment data structure, e.g. how to retrieve arbitrary data according to a sequence alignment object.
USAGE:
./ex_PairwiseAlignment
Keywords:
Categories:
- core::alignment::PairwiseAlignment
Output files:
Program source:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

#include <core/alignment/PairwiseAlignment.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Simple example showing how to work with the PairwiseAlignment data structure,
e.g. how to retrieve arbitrary data according to a sequence alignment object

USAGE:
    ./ex_PairwiseAlignment

)";

/** @brief Simple example showing how to retrieve arbitrary data according to a sequence alignment object
 *
 * CATEGORIES: core::alignment::PairwiseAlignment;
 * KEYWORDS:   sequence alignment
 */
int main(const int argc, const char *argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  // ---------- The two sequences are already globally aligned, therefore indexes in both sequences start from 0
  core::alignment::PairwiseAlignment ali("FTFTALILL-AVAV", 0, "--FTAL-LLAAV--", 0);

  // ---------- query "objects": these may be C-alpha atoms, residues, etc; here just chars
  // --- all the characters of the query sequence
  std::vector<char> query_chars = {'F', 'T', 'F', 'T', 'A', 'L', 'I', 'L', 'L', 'A', 'V', 'A', 'V'};
  // --- all the characters of the template sequence
  std::vector<char> tmplt_chars = {'F', 'T', 'A', 'L', 'L', 'L', 'A', 'A', 'V'};

  // ---------- containers for the expected result
  std::vector<char> query_chars_aligned;
  std::vector<char> tmplt_chars_aligned;

  // ---------- set up query "objects" in the order as they appear in the alignment
  ali.get_aligned_query(query_chars, '-', query_chars_aligned);

  // ---------- show results (they should be identical to the query row of the original alignment)
  std::copy(query_chars_aligned.begin(), query_chars_aligned.end(), std::ostream_iterator<char>(std::cout, ""));
  std::cout << "\n"; // ---------- Should print FTFTALILL-AVAV

  // ---------- Now we extract both query and template objects; only the mutually aligned positions (no gaps)
  query_chars_aligned.clear();
  ali.get_aligned_query_template(query_chars, tmplt_chars, query_chars_aligned, tmplt_chars_aligned);

  // ---------- show results
  std::copy(query_chars_aligned.begin(), query_chars_aligned.end(), std::ostream_iterator<char>(std::cout, ""));
  std::cout << "\n"; // ---------- Should print FTALLLAV
  std::copy(tmplt_chars_aligned.begin(), tmplt_chars_aligned.end(), std::ostream_iterator<char>(std::cout, ""));
  std::cout << "\n"; // ---------- Should also print FTALLLAV

  // ---------- ... and finally print the alignment as a path
  std::cout << ali.to_path() << "\n";
}
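How the mutually aligned positions are selected can be shown with plain strings. The following sketch is an illustration only: `aligned_pairs()` and `pick_aligned()` are hypothetical helpers, not BioShell functions. It walks over the alignment columns, counts non-gap characters in each row, and records an index pair whenever neither row has a gap.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::size_t, std::size_t> > IndexPairs;

// Returns (query_index, template_index) for every column where neither row holds a gap
IndexPairs aligned_pairs(const std::string &query_row, const std::string &tmplt_row, char gap = '-') {
  assert(query_row.size() == tmplt_row.size());
  IndexPairs pairs;
  std::size_t i_query = 0, i_tmplt = 0;   // running indexes into the ungapped sequences
  for (std::size_t col = 0; col < query_row.size(); ++col) {
    const bool query_here = query_row[col] != gap;
    const bool tmplt_here = tmplt_row[col] != gap;
    if (query_here && tmplt_here) pairs.push_back(std::make_pair(i_query, i_tmplt));
    if (query_here) ++i_query;
    if (tmplt_here) ++i_tmplt;
  }
  return pairs;
}

// Picks characters of an ungapped sequence at the aligned positions (query or template side)
std::string pick_aligned(const std::string &sequence, const IndexPairs &pairs, bool from_query) {
  std::string out;
  for (std::size_t k = 0; k < pairs.size(); ++k) {
    const std::size_t idx = from_query ? pairs[k].first : pairs[k].second;
    assert(idx < sequence.size());
    out += sequence[idx];
  }
  return out;
}
```

For the alignment used in the program above, `aligned_pairs("FTFTALILL-AVAV", "--FTAL-LLAAV--")` yields eight pairs, and picking those positions from `FTFTALILLAVAV` and `FTALLLAAV` gives `FTALLLAV` in both cases. The same index pairs could select C-alpha atoms or residues instead of characters.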

ex_PairwiseSequenceAlignment¶
Simple example showing how to work with the PairwiseSequenceAlignment data structure, e.g. how to create such an object and how to print it in different formats.
USAGE:
./ex_PairwiseSequenceAlignment
Keywords:
Categories:
- core::alignment::PairwiseAlignment; core::alignment::PairwiseSequenceAlignment
Output files:
Program source:
#include <iostream>
#include <memory>
#include <string>

#include <core/BioShellEnvironment.hh>
#include <core/alignment/on_alignment_computations.hh>
#include <core/alignment/PairwiseAlignment.hh>
#include <core/alignment/PairwiseSequenceAlignment.hh>
#include <core/data/io/alignment_io.hh>
#include <core/data/sequence/Sequence.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Simple example showing how to work with the PairwiseSequenceAlignment data structure, e.g. how to create
such an object and how to print it in different formats.

USAGE:
    ./ex_PairwiseSequenceAlignment

)";

std::string Q52825_1 = "IDVLLGADDGSLAFVPSEFSISPGEKIVFKNNAGFPHNIVFDEDSIPSGVDASKISMSEEDLLNAKGETFEVALSNKGEYSFYCSPHQGAGMVGKVTV";
std::string P80401_2 = "VQMLNKGTDGAMVFEPGFLKIAPGDTVTFIPTDKS-HNVETFKGLIPDGV---------PDFKSKPNEQYQVKFDIPGAYVLKCTPHVGMGMVALIQV";

/** @brief Simple example showing how to work with PairwiseSequenceAlignment data structure.
 *
 * CATEGORIES: core::alignment::PairwiseAlignment; core::alignment::PairwiseSequenceAlignment;
 * KEYWORDS:   sequence alignment
 */
int main(const int argc, const char *argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  using namespace core::alignment;
  using namespace core::alignment::scoring;
  using namespace core::data::sequence; // for core::data::sequence::Sequence

  // ---------- Test for global alignment ----------
  // --- Alignment defined as path: '-' and '|' mean a gap in a template and in a query sequence, respectively;
  // --- '*' is a match
  PairwiseAlignment_SP ali = std::make_shared<PairwiseAlignment>(0, 0, 0, "--****-**|**--");

  // ---------- The two sequences that will be aligned
  Sequence_SP query = std::make_shared<Sequence>("query", "ITFTALILLAVAV", 1);
  Sequence_SP tmplt = std::make_shared<Sequence>("tmplt", "FTALLLAAV", 1);
  PairwiseSequenceAlignment seq_ali(ali, query, tmplt);

  // --- Show alignment as a path
  std::cout << "Alignment path:\n" << ali->to_path() << "\n\n";

  // ---------- Print the alignment in Edinburgh format
  core::index2 identity = sum_identical(seq_ali);
  core::index2 n_gaps = seq_ali.alignment->length() - seq_ali.alignment->n_aligned();
  std::cout << "# score: " << seq_ali.alignment_score() << " length: " << seq_ali.alignment->length()
            << " n_identical: " << identity << " n_gaps: " << n_gaps << "\n";
  core::data::io::write_edinburgh(seq_ali, std::cout, 80);

  // ---------- Test for local alignment ----------
  // --- Alignment defined as path: '-' and '|' mean a gap in a template and in a query sequence, respectively;
  // --- '*' is a match
  PairwiseAlignment_SP loc_ali = std::make_shared<PairwiseAlignment>(2, 0, 0.0, "****-**|**");
  PairwiseSequenceAlignment loc_seq_ali(loc_ali, query, tmplt);

  // ---------- Print the alignment in Edinburgh format
  identity = sum_identical(loc_seq_ali);
  n_gaps = loc_seq_ali.alignment->length() - loc_seq_ali.alignment->n_aligned();
  std::cout << "# score: " << loc_seq_ali.alignment_score() << " length: " << loc_seq_ali.alignment->length()
            << " n_identical: " << identity << " n_gaps: " << n_gaps << "\n";
  core::data::io::write_edinburgh(loc_seq_ali, std::cout, 80);

  // ---------- An alignment may also be created directly from two gapped sequence strings
  PairwiseSequenceAlignment loc_seq_ali2("Q52825_1", Q52825_1, 0, "P80401_2", P80401_2, 0, 0.0);
  loc_seq_ali2.template_sequence->first_pos(28);
  loc_seq_ali2.query_sequence->first_pos(1);
  core::data::io::write_edinburgh(loc_seq_ali2, std::cout, 80);
}

ex_Pca3¶
Unit test that orients 3D points along the axes using the PCA algorithm.
USAGE:
./ex_Pca3
REFERENCE: Pearson, Karl. “On lines and planes of closest fit to systems of points in space.” Philosophical Magazine 2 (1901): 559-572. doi:10.1080/14786440109462720.
Keywords:
- PCA
- transformations
Categories:
- core/calc/numeric/basic_algebra.hh
Output files:
Program source:
#include <iostream>
#include <random>
#include <string>
#include <vector>

#include <core/index.hh>
#include <core/calc/numeric/basic_algebra.hh>
#include <core/calc/numeric/Pca3.hh>
#include <utils/exit.hh>
#include <utils/options/OptionParser.hh>

std::string program_info = R"(

Unit test that orients 3D points along the axes using the PCA algorithm.

USAGE:
    ./ex_Pca3

REFERENCE:
Pearson, Karl. "On lines and planes of closest fit to systems of points in space."
Philosophical Magazine 2 (1901): 559-572. doi:10.1080/14786440109462720.

)";

/** @brief Orients 3D points along the axes using PCA algorithm
 *
 * CATEGORIES: core/calc/numeric/basic_algebra.hh
 * KEYWORDS:   PCA; transformations
 */
int main(const int argc, const char *argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  using namespace core::data::basic; // --- for Vec3 and Array2D

  // --- Generate 100 random points; mixing x, y and z makes the cloud elongated and tilted
  std::mt19937 gen;
  std::uniform_real_distribution<double> r(0, 1);
  std::vector<Vec3> points3d;
  for (core::index2 i = 0; i < 100; ++i) {
    double x = r(gen), y = r(gen), z = r(gen);
    points3d.emplace_back(x + 0.3 * y + 0.6 * z, 0.4 * x + 1.9 * y + 0.7 * z, 0.3 * x + 0.1 * y + 0.7 * z);
  }

  // --- Run PCA and create the transformation that orients the points
  core::calc::numeric::Pca3 pca3(points3d);
  auto rt = pca3.create_transformation();

  // --- Print every point before and after the transformation
  for (auto &p : points3d) {
    std::cout << p << " ";
    rt.apply(p);
    std::cout << p << "\n";
  }
}
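The core of PCA in three dimensions is a 3x3 covariance matrix and its eigenvectors. The sketch below is a simplified stand-in, not the `Pca3` implementation: `covariance()` and `principal_axis()` are hypothetical helpers, and only the dominant axis is extracted by power iteration rather than by a full eigendecomposition. It assumes the starting vector (1,1,1) is not orthogonal to the dominant eigenvector.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

typedef std::array<double, 3> Vec3d;   // hypothetical point type
typedef std::array<Vec3d, 3> Mat3d;    // 3x3 matrix stored row by row

// Sample covariance matrix of a set of 3D points (divides by N - 1)
Mat3d covariance(const std::vector<Vec3d> &points) {
  assert(points.size() > 1);
  const double n = static_cast<double>(points.size());
  Vec3d mean = {{0.0, 0.0, 0.0}};
  for (const Vec3d &p : points)
    for (int k = 0; k < 3; ++k) mean[k] += p[k] / n;
  Mat3d c = Mat3d();                   // value-initialised, i.e. filled with zeros
  for (const Vec3d &p : points)
    for (int i = 0; i < 3; ++i)
      for (int j = 0; j < 3; ++j) c[i][j] += (p[i] - mean[i]) * (p[j] - mean[j]) / (n - 1.0);
  return c;
}

// Dominant eigenvector of a symmetric matrix found by power iteration; returns a unit vector
Vec3d principal_axis(const Mat3d &c, int n_iterations = 200) {
  const double s = 1.0 / std::sqrt(3.0);
  Vec3d v = {{s, s, s}};
  for (int it = 0; it < n_iterations; ++it) {
    Vec3d w = {{0.0, 0.0, 0.0}};
    for (int i = 0; i < 3; ++i)
      for (int j = 0; j < 3; ++j) w[i] += c[i][j] * v[j];
    const double norm = std::sqrt(w[0] * w[0] + w[1] * w[1] + w[2] * w[2]);
    if (norm == 0.0) break;            // degenerate case: all points coincide
    for (int i = 0; i < 3; ++i) v[i] = w[i] / norm;
  }
  return v;
}
```

For points lying on the line y = 2x, z = 0, the returned axis is parallel to (1, 2, 0). Orienting the points along the coordinate axes then amounts to rotating them so that the principal axes coincide with x, y and z.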

ex_Pdb¶
Unit test which shows how to read a PDB file and create a Structure object. The program reads a given file with a PDB line filter that passes only backbone atoms, then prints the experimental method, resolution, R-value and R-free of the input structure.
USAGE:
ex_Pdb 5edw.pdb
Keywords:
Categories:
- core::data::io::Pdb
Input files:
Output files:
Program source:
#include <iostream>
#include <iterator>
#include <string>

#include <core/data/io/Pdb.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which shows how to read a PDB file and create a Structure object.
The program reads a given file with a PDB line filter that passes only backbone atoms,
then prints the experimental method, resolution, R-value and R-free of the input structure.

USAGE:
    ex_Pdb 5edw.pdb

)";

/** @brief Reads a PDB file and creates a Structure object.
 *
 * Input PDB data is filtered so only protein backbone atoms are loaded
 *
 * CATEGORIES: core::data::io::Pdb;
 * KEYWORDS:   PDB input; PDB line filter; Structure
 */
int main(const int argc, const char* argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  for (int i = 1; i < argc; ++i) {
    core::data::io::Pdb reader(argv[i],           // file name (PDB format, may be gzip-ped)
        core::data::io::is_bb,                    // a predicate to read only the ATOM lines corresponding to backbone atoms
        core::data::io::only_ss_from_header,
        true);                                    // parse PDB header
    core::data::structural::Structure_SP backbone = reader.create_structure(0);

    std::cout << "protein " << reader.pdb_code() << " has " << backbone->count_chains() << " chain(s), "
              << backbone->count_residues() << " residues and " << backbone->count_atoms() << " backbone atoms\n";

    std::cout << "title : " << backbone->title() << "\n";
    std::cout << "compound : " << backbone->compound() << "\n";
    std::cout << "classification : " << backbone->classification() << "\n";
    std::cout << "deposited : " << backbone->deposition_date() << "\n";
    std::cout << "Is XRAY? : " << ((backbone->is_xray()) ? "YES\n" : "No\n");
    std::cout << "Is NMR? : " << ((backbone->is_nmr()) ? "YES\n" : "No\n");
    std::cout << "Is EM? : " << ((backbone->is_em()) ? "YES\n" : "No\n");
    std::cout << "resolution : " << backbone->resolution() << "\n";
    std::cout << "R-value : " << backbone->r_value() << "\n";
    std::cout << "R-free : " << backbone->r_free() << "\n";

    // --- Print keywords as a comma-separated list; the loop is skipped when there are no keywords
    if (backbone->keywords().size() > 0) {
      std::cout << "keywords : " << backbone->keywords()[0];
      for (auto it = std::next(backbone->keywords().cbegin()); it != backbone->keywords().cend(); ++it)
        std::cout << ", " << *it;
    }
    std::cout << "\n";
  }
}

ex_Quaternion¶
ex_Quaternion illustrates how to use the Quaternion class: it reads a PDB file and rotates every atom by a random quaternion.
Keywords:
- algebra
Categories:
- core::calc::numeric::Quaternion
Input files:
Output files:
Program source:
#include <iostream>

#include <core/calc/numeric/Quaternion.hh>
#include <core/data/basic/Vec3.hh>
#include <core/data/io/Pdb.hh>
#include <core/calc/statistics/Random.hh>

/** @brief ex_Quaternion illustrates how to use Quaternion class
 *
 * CATEGORIES: core::calc::numeric::Quaternion
 * KEYWORDS:   algebra
 */
int main(const int argc, const char* argv[]) {

  if (argc < 2) { // --- a PDB file with a structure to be rotated is required
    std::cerr << "USAGE: ex_Quaternion input.pdb\n";
    return 1;
  }

  using namespace core::calc::numeric;
  using namespace core::data::basic; // --- for Vec3

  Quaternion p, rot;
  core::calc::statistics::Random::seed(0);
  rot.random(); // --- Random rotation
  //rot = Quaternion::create_from_axis_angle(1,0,0,M_PI/3.0);

  // ---------- Read a structure to be rotated
  core::data::io::Pdb reader(argv[1], core::data::io::is_not_alternative, core::data::io::only_ss_from_header, true);
  const auto strctr_sp = reader.create_structure(0);

  // ---------- Iterate over atoms and rotate them one by one
  for (auto a_it = strctr_sp->first_atom(); a_it != strctr_sp->last_atom(); ++a_it) {
    // --- use quaternion rotation method
    p.i = (**a_it).x;
    p.j = (**a_it).y;
    p.k = (**a_it).z;
    p.rotate_by(rot);
    std::cout << p.i << " " << p.j << " " << p.k << "\n";
    // --- and now the same, but with apply() method - output should be numerically the same
    rot.apply(**a_it);
    std::cout << (**a_it).to_pdb_line() << "\n";
  }

  return 0;
}
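What a quaternion rotation computes can be written out in a few lines. The sketch below is independent of BioShell: `Quat`, `from_axis_angle()`, `multiply()` and `rotate()` are hypothetical names, and the internal conventions of `core::calc::numeric::Quaternion` may differ. A vector v is rotated by a unit quaternion q through the product q v q*, where v is treated as a quaternion with zero real part and q* is the conjugate of q.

```cpp
#include <array>
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

struct Quat { double w, x, y, z; };   // hypothetical quaternion: w is the real part

// Unit quaternion describing a rotation by `angle` radians around the given axis
Quat from_axis_angle(double ax, double ay, double az, double angle) {
  const double len = std::sqrt(ax * ax + ay * ay + az * az);
  assert(len > 0.0);
  const double s = std::sin(angle / 2.0) / len;
  const Quat q = {std::cos(angle / 2.0), ax * s, ay * s, az * s};
  return q;
}

// Hamilton product a * b
Quat multiply(const Quat &a, const Quat &b) {
  const Quat r = {a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
                  a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
                  a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
                  a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w};
  return r;
}

// Rotates vector v by unit quaternion q: q * (0, v) * conjugate(q)
std::array<double, 3> rotate(const Quat &q, const std::array<double, 3> &v) {
  const Quat p = {0.0, v[0], v[1], v[2]};
  const Quat conj = {q.w, -q.x, -q.y, -q.z};
  const Quat r = multiply(multiply(q, p), conj);
  const std::array<double, 3> out = {{r.x, r.y, r.z}};
  return out;
}
```

Rotating (1, 0, 0) by 90 degrees around the z axis, `rotate(from_axis_angle(0, 0, 1, kPi / 2), v)`, gives (0, 1, 0) up to rounding.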

ex_REMC_Ar¶
The program runs a Replica Exchange MC simulation of argon gas. By default it starts from a regular lattice conformation, unless an input file (PDB) with an initial conformation is provided.
USAGE:
ex_REMC_Ar n_atoms density small_cycles big_cycles n_exchange temperature_1 temperature_2 [temperature_3 ...]
ex_REMC_Ar starting.pdb density small_cycles big_cycles n_exchange temperature_1 temperature_2 [temperature_3 ...]
Keywords:
- Mover
- Replica Exchange
- Monte Carlo
- sampling
Categories:
- simulations::sampling::ReplicaExchangeMC
Output files:
Program source:
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include <core/data/basic/Vec3Cubic.hh>
#include <core/data/io/Pdb.hh>

#include <utils/string_utils.hh>
#include <utils/options/Option.hh>
#include <utils/options/OptionParser.hh>
#include <utils/options/output_options.hh>
#include <utils/options/sampling_options.hh>

#include <simulations/systems/CartesianAtomsSimple.hh>
#include <simulations/systems/BuildFluidSystem.hh>
#include <simulations/systems/SingleAtomType.hh>
#include <simulations/movers/TranslateAtom.hh>
#include <simulations/forcefields/cartesian/LJEnergySWHomogenic.hh>
#include <simulations/sampling/IsothermalMC.hh>
#include <simulations/observers/ObserveMoversAcceptance.hh>
#include <simulations/observers/TriggerEveryN.hh>
#include <simulations/observers/ObserveReplicaFlow.hh>
#include <simulations/observers/ObserveEnergyComponents.hh>
#include <simulations/observers/UpdateSystemTags.hh>
#include <simulations/observers/AdjustMoversAcceptance.hh>
#include <simulations/observers/cartesian/SimplePdbFormatter.hh>
#include <utils/exit.hh>

using namespace core::data::basic;

utils::Logger logs("ex_REMC_Ar");

std::string program_info = R"(

The program runs a Replica Exchange MC simulation of argon gas. By default it starts from a regular lattice
conformation, unless an input file (PDB) with an initial conformation is provided

USAGE:
    ex_REMC_Ar n_atoms density small_cycles big_cycles n_exchange temperature_1 temperature_2 [temperature_3 ...]
    ex_REMC_Ar starting.pdb density small_cycles big_cycles n_exchange temperature_1 temperature_2 [temperature_3 ...]

)";

const double EPSILON = 1.654E-21;                 // [J] per molecule
const double EPSILON_BY_K = EPSILON / 1.381E-23;  // = 119.6 in Kelvins
const double SIGMA = 3.4;                         // in Angstroms

/** @brief A helper function that creates a dummy structure of n_ar argon atoms.
 *
 * The structure is necessary to be able to save the system state in the PDB format. It will not be used
 * in the simulation - just for output formatting. The atoms may be stored in multiple chains, because PDB file
 * format allows a single chain to have at most 9999 atoms.
 *
 * CATEGORIES: simulations::sampling::ReplicaExchangeMC
 * KEYWORDS:   Mover;Replica Exchange; Monte Carlo; sampling
 */
core::data::structural::Structure_SP create_argon_structure(const core::index4 n_ar) {

  using namespace core::data::structural;
  Structure_SP s = std::make_shared<Structure>("");
  core::index2 n_chains = n_ar / 9999;
  core::index4 n_atoms = 0;
  // --- chains that are completely filled with 9999 atoms
  for (core::index2 ic = 0; ic < n_chains; ++ic) {
    Chain_SP chain = std::make_shared<Chain>(std::string{utils::letters[ic]});
    for (core::index4 i = 0; i < 9999; ++i) {
      Residue_SP res = std::make_shared<Residue>(i + 1, " AR");
      res->push_back(std::make_shared<PdbAtom>(i + 1, " AR ", core::chemical::AtomicElement::ARGON.z));
      chain->push_back(res);
      ++n_atoms;
    }
    s->push_back(chain);
  }
  // --- the last chain holds the remaining atoms
  if (n_atoms < n_ar) {
    Chain_SP chain = std::make_shared<Chain>(std::string{utils::letters[n_chains]});
    for (core::index4 i = n_atoms; i < n_ar; ++i) {
      Residue_SP res = std::make_shared<Residue>(i + 1 - n_atoms, " AR");
      res->push_back(std::make_shared<PdbAtom>(i + 1 - n_atoms, " AR ", core::chemical::AtomicElement::ARGON.z));
      chain->push_back(res);
    }
    s->push_back(chain);
  }

  return s;
}

/** @brief Replica Exchange Monte Carlo simulation of argon gas.
 *
 */
int main(const int argc, const char* argv[]) {

  using core::data::basic::Vec3Cubic;
  using namespace simulations::systems;
  using namespace simulations::observers::cartesian;
  using namespace simulations::forcefields; // for CalculateEnergyBase, NeighborList
  using namespace simulations::movers;      // for MoversSet

  // ---------- Define some new types so the program is easier to read and lines get shorter
  typedef typename cartesian::LJEnergySWHomogenic LjEnergy;
  typedef typename simulations::observers::ObserveEnergyComponents<TotalEnergy> EnergyObserverType;
  typedef std::shared_ptr<EnergyObserverType> EnergyObserverType_SP;

  core::index4 n_atoms;
  if (argc < 8) utils::exit_OK_with_message(program_info);

  std::vector<core::data::structural::Structure_SP> argon_structures;
  bool input_from_file = false;
  if (utils::is_integer(argv[1])) {
    n_atoms = atoi(argv[1]);
    argon_structures.push_back(create_argon_structure(n_atoms));
  } else { // --- read an input file if given
    core::data::io::Pdb reader(argv[1]);
    for (core::index2 i_str = 0; i_str < reader.count_models(); ++i_str)
      argon_structures.push_back(reader.create_structure(i_str));
    n_atoms = argon_structures[0]->count_atoms();
    input_from_file = true;
  }
  double density = atof(argv[2]);
  core::index4 n_inner_cycles = atoi(argv[3]);
  core::index4 n_outer_cycles = atoi(argv[4]);
  core::index4 n_exchanges = atoi(argv[5]);

  double ar_volume = 4.0 / 3.0 * M_PI * SIGMA * SIGMA * SIGMA * n_atoms;
  double box_len = pow(ar_volume / density, 0.33333333333333);

  core::calc::statistics::Random::seed(1234);

  // --- Initialize periodic boundary conditions for the box length
  core::data::basic::Vec3Cubic::set_box_len(box_len);
  logs << utils::LogLevel::INFO << "box width for " << int(n_atoms) << " atoms : " << box_len << "\n";

  // --- Every temperature given at the command line defines one replica
  std::vector<double> temperatures;
  for (int i = 6; i < argc; ++i) temperatures.push_back(atof(argv[i]));
  AtomTypingInterface_SP ar_type = std::make_shared<SingleAtomType>(" AR");
  std::vector<std::shared_ptr<CartesianAtomsSimple<Vec3Cubic>>> systems;
  core::data::structural::PdbAtom_SP ar_atom = std::make_shared<core::data::structural::PdbAtom>(1," AR");
  std::shared_ptr<AbstractPdbFormatter> fmt = std::make_shared<SimplePdbFormatter>(" AR ", "AR ", "AR");

  std::vector<simulations::sampling::IsothermalMC_SP> replica_samplers;
  std::vector<CalculateEnergyBase_SP> energies;
  std::vector<simulations::observers::ObserverInterface_SP> mover_adjusters;
  for (core::index2 irepl = 0; irepl < temperatures.size(); ++irepl) {
    // ---------- Create the systems to be sampled ----------
    CartesianAtoms ar(ar_type, n_atoms);
    systems.push_back(ar);
    // ---------- Distribute atoms in the box or use coordinates from provided PDB file
    if (input_from_file) {
      const auto strctr = argon_structures[irepl % argon_structures.size()];
      core::index2 ia = 0;
      for (auto a_it = strctr->first_const_atom(); a_it != strctr->last_const_atom(); ++a_it) {
        (*ar)[ia].set(**a_it);
        ++ia;
      }
    } else
      BuildFluidSystem<Vec3Cubic>::generate(*ar, *ar_atom, n_atoms);

    // ---------- Create energy function - just LJ potential
    std::shared_ptr<NeighborList_OBSOLETE<Vec3Cubic>> nbl =
        std::make_shared<NeighborList_OBSOLETE<Vec3Cubic>>(*ar, 9.0, 3.0);
    std::shared_ptr<LjEnergy> lj_energy_term = std::make_shared<LjEnergy>(*ar, *nbl, SIGMA, EPSILON_BY_K);
    std::shared_ptr<TotalEnergy_OBSOLETE<ByAtomEnergy_OBSOLETE>> lj_energy =
        std::make_shared<TotalEnergy_OBSOLETE<ByAtomEnergy_OBSOLETE>>();
    lj_energy->add_component(lj_energy_term, 1.0);
    energies.push_back(std::static_pointer_cast<class CalculateEnergyBase>(lj_energy));

    // --- Create a mover, which is a random perturbation of an atom in this case, and place it in a movers' set
    std::shared_ptr<TranslateAtom<Vec3Cubic>> translate =
        std::make_shared<TranslateAtom<Vec3Cubic>>(*ar, *lj_energy_term);
    MoversSet_SP movers = std::make_shared<simulations::movers::MoversSetSweep>();
    movers->add_mover(translate, n_atoms);
    translate->max_move_range(0.5); // --- set the maximum distance a single atom can be moved by a single MC perturbation

    // --- Create an observer to record and adjust movers acceptance rate
    simulations::observers::AdjustMoversAcceptance_SP adj =
        std::make_shared<simulations::observers::AdjustMoversAcceptance>(
            *movers, utils::string_format("movers-%d.dat", irepl), 0.4);
    mover_adjusters.push_back(adj);
    adj->observe_header();

    // ---------- create an isothermal Monte Carlo sampler
    auto mc = std::make_shared<simulations::sampling::IsothermalMC>(movers, temperatures[irepl]);

    // ---------- Create an observer for energy components
    EnergyObserverType_SP obs_en =
        std::make_shared<EnergyObserverType>(*lj_energy, utils::string_format("energy-%d.dat", irepl));
    obs_en->observe_header();

    // ---------- Create an observer for trajectory in PDB format
    auto observe_trajectory = std::make_shared<simulations::observers::cartesian::PdbObserver_OBSOLETE<Vec3Cubic>>(
        *ar, fmt, utils::string_format("ar_tra-%d.pdb", irepl));
    observe_trajectory->trigger(std::make_shared<simulations::observers::TriggerEveryN>(10));
    mc->outer_cycle_observer(observe_trajectory);
    mc->outer_cycle_observer(obs_en);

    mc->cycles(n_inner_cycles, n_outer_cycles, 1);
    replica_samplers.push_back(mc);
  }

  // ---------- Combine isothermal samplers into a replica exchange sampler and attach its observers
  bool replica_isothermal_observation_mode = true;
  simulations::sampling::ReplicaExchangeMC remc(replica_samplers, energies, replica_isothermal_observation_mode);
  auto remc_flow = std::make_shared<simulations::observers::ObserveReplicaFlow>(remc, "replica_flow.dat");
  remc.exchange_observer(remc_flow);
  auto tag_updater =
      std::make_shared<simulations::observers::UpdateSystemTags<CartesianAtomsSimple<Vec3Cubic>>>(systems, remc);
  tag_updater->observe();
  remc.exchange_observer(tag_updater);
  remc.replica_exchanges(n_exchanges);
  for (auto o : mover_adjusters) remc.exchange_observer(o);
  remc.run();

  // ---------- Write the final conformation of every replica to a single PDB file
  simulations::observers::cartesian::PdbObserver_OBSOLETE<Vec3Cubic> final(*systems[0], fmt, "final.pdb");
  final.observe();
  for (core::index2 i_system = 1; i_system < systems.size(); ++i_system) {
    for (core::index4 i_atom = 0; i_atom < systems[0]->n_atoms; ++i_atom)
      systems[0]->operator[](i_atom) = systems[i_system]->operator[](i_atom);
    final.observe();
  }
  final.finalize();
}
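The exchange step that distinguishes Replica Exchange MC from independent isothermal runs can be illustrated separately. The sketch below uses the standard Metropolis-type criterion for swapping two replicas; it is not taken from the BioShell sources, so `exchange_probability()` and `accept_exchange()` are hypothetical names and `ReplicaExchangeMC` may organise this differently. Energies are assumed to be expressed in temperature units (as with `EPSILON_BY_K` above), so the Boltzmann constant is 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// Probability of swapping conformations between replica i and replica j:
// min(1, exp[(1/T_i - 1/T_j) * (E_i - E_j)]), with energies given in temperature units
double exchange_probability(double energy_i, double temperature_i, double energy_j, double temperature_j) {
  assert(temperature_i > 0.0 && temperature_j > 0.0);
  const double delta = (1.0 / temperature_i - 1.0 / temperature_j) * (energy_i - energy_j);
  return std::min(1.0, std::exp(delta));
}

// Draws a uniform random number and decides whether the swap is accepted
template <typename Rng>
bool accept_exchange(double energy_i, double temperature_i, double energy_j, double temperature_j, Rng &rng) {
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  return uniform(rng) < exchange_probability(energy_i, temperature_i, energy_j, temperature_j);
}
```

A swap that moves the lower-energy conformation to the colder replica is always accepted; the opposite swap is accepted with a probability that decays with both the energy gap and the difference of inverse temperatures. This is why neighbouring temperatures passed on the command line should not be too far apart.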

ex_ReduceSequenceAlphabet¶
If no input is given, ex_ReduceSequenceAlphabet lists all reduced amino acid alphabets registered in the BioShell library. Alternatively, the user can provide an alphabet name; in this case the relevant mapping is also printed on the screen.
USAGE:
ex_ReduceSequenceAlphabet [alphabet_name]
EXAMPLES:
ex_ReduceSequenceAlphabet
ex_ReduceSequenceAlphabet lz-mj.16
Keywords:
- reduced alphabet
Categories:
- core::data::sequence::ReduceSequenceAlphabet
Output files:
Program source:
#include <iostream>
#include <iomanip>

#include <core/data/sequence/ReduceSequenceAlphabet.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

If no input is given, ex_ReduceSequenceAlphabet lists all reduced amino acid alphabets registered
in BioShell library. Alternatively, user can provide an alphabet name; in this case the relevant
mapping is printed on the screen.

USAGE:
    ex_ReduceSequenceAlphabet [alphabet_name]

EXAMPLEs:
    ex_ReduceSequenceAlphabet
    ex_ReduceSequenceAlphabet lz-mj.16

)";

/** @brief ex_ReduceSequenceAlphabet lists all reduced amino acid alphabets registered in BioShell library
 *
 * CATEGORIES: core::data::sequence::ReduceSequenceAlphabet
 * KEYWORDS: reduced alphabet
 */
int main(const int argc, const char* argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  using core::data::sequence::ReduceSequenceAlphabet;

  // --- List the names of all registered alphabets, ten per row
  int i = 1;
  std::cout << "Known alphabets:\n";
  for (auto it = ReduceSequenceAlphabet::cbegin(); it != ReduceSequenceAlphabet::cend(); ++it) {
    std::cout << std::setw(8) << it->first << ((i % 10 == 0) ? "\n" : " ");
    ++i;
  }
  std::cout << "\n";

  // --- Print the mapping of every alphabet requested on the command line
  for (int i = 1; i < argc; ++i) {
    std::cout << "Listing alphabet " << argv[i] << ":\n";
    core::data::sequence::ReduceSequenceAlphabet_SP alph = ReduceSequenceAlphabet::get_alphabet(argv[i]);
    std::cout << (*alph) << "\n";
  }
}
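To make the idea of a reduced alphabet concrete, here is a minimal standard-library sketch that does not use BioShell at all. The three-class grouping and the names `make_toy_alphabet` and `reduce_sequence` are assumptions made for illustration; they are not one of the alphabets registered in BioShell.

```cpp
#include <map>
#include <string>

// Hypothetical three-class alphabet (hydrophobic / polar / charged);
// NOT one of the alphabets registered in BioShell
std::map<char, char> make_toy_alphabet() {
  std::map<char, char> m;
  for (char c : std::string("AVLIMFWC")) m[c] = 'H';  // hydrophobic
  for (char c : std::string("STNQYGP")) m[c] = 'P';   // polar or small
  for (char c : std::string("DEKRH")) m[c] = 'C';     // charged
  return m;
}

// Rewrites a sequence in the reduced alphabet; letters missing from the mapping become 'X'
std::string reduce_sequence(const std::string &seq, const std::map<char, char> &alphabet) {
  std::string out;
  out.reserve(seq.size());
  for (char c : seq) {
    auto it = alphabet.find(c);
    out.push_back(it != alphabet.end() ? it->second : 'X');
  }
  return out;
}
```

Calling `reduce_sequence("MKV", make_toy_alphabet())` maps each residue to its class letter, which is the kind of mapping the program prints for a named alphabet.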

ex_Residue¶
A simple example that reads a PDB file and checks whether all amino acid residues have a complete backbone.
EXAMPLE:
ex_Residue 5edw.pdb
Keywords:
- PDB input
- pre-processing
Categories:
- core::data::structural::Structure; core::data::structural::Residue
Input files:
Output files:
Program source:
#include <iostream>
#include <iomanip>

#include <core/data/io/Pdb.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Simple example reads a PDB file and checks if all amino acid residues have complete backbone.

EXAMPLE:
    ex_Residue 5edw.pdb

)";

/** @brief Reads a PDB file and checks if all amino acid residues have complete backbone
 *
 * CATEGORIES: core::data::structural::Structure; core::data::structural::Residue
 * KEYWORDS: PDB input; pre-processing
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  core::data::io::Pdb reader(argv[1]); // file name (PDB format, may be gzip-ped)
  core::data::structural::Structure_SP strctr = reader.create_structure(0);

  // Iterate over all residues in the structure
  // Atom names below are padded to four characters, as in the PDB format
  bool is_OK = true;
  for (auto it_resid = strctr->first_residue(); it_resid!=strctr->last_residue(); ++it_resid) {
    const core::chemical::Monomer & m = (*it_resid)->residue_type();
    core::data::structural::PdbAtom_SP atom_sp;
    if(m.type=='P') { // --- only amino acid residues are tested
      atom_sp = (*it_resid)->find_atom(" N  ");
      if(atom_sp==nullptr) { std::cout << "Missing backbone atom N \n"; is_OK = false; }
      atom_sp = (*it_resid)->find_atom(" CA ");
      if(atom_sp==nullptr) { std::cout << "Missing backbone atom CA \n"; is_OK = false; }
      atom_sp = (*it_resid)->find_atom(" C  ");
      if(atom_sp==nullptr) { std::cout << "Missing backbone atom C \n"; is_OK = false; }
      atom_sp = (*it_resid)->find_atom(" O  ");
      if(atom_sp==nullptr) { std::cout << "Missing backbone atom O \n"; is_OK = false; }
    }
  }
  if(is_OK) std::cout << "Backbone complete!\n";
}

ex_RobustDistributionDecorator¶
Example showing how to create and use a RobustDistributionDecorator, which facilitates robust parameter estimation for any probability distribution function defined in BioShell. This example estimates the parameters of a normal distribution from noisy data using regular and robust methods.
USAGE:
./ex_RobustDistributionDecorator [data.txt]
REFERENCE: Kim Seong-Ju “The Metrically Trimmed Mean as a Robust Estimator of Location”, The Annals of Statistics (1992) 20 1534-1547
Keywords:
- estimation
Categories:
- core::calc::statistics::NormalDistribution; core::calc::statistics::RobustDistributionDecorator
Output files:
Program source:
#include <math.h>

#include <fstream>
#include <iostream>
#include <random>
#include <vector>

#include <core/calc/statistics/NormalDistribution.hh>
#include <core/calc/statistics/RobustDistributionDecorator.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Example showing how to create and use a RobustDistributionDecorator, which facilitates distribution
estimation of any probability distribution function that is defined in BioShell. This example
estimates parameters of a normal distribution from a noised data using regular and robust methods.

USAGE:
    ./ex_RobustDistributionDecorator [data.txt]

REFERENCE:
Kim Seong-Ju "The Metrically Trimmed Mean as a Robust Estimator of Location",
The Annals of Statistics (1992) 20 1534-1547

)";

/** @brief Example showing how to create and use a RobustDistributionDecorator
 *
 * CATEGORIES: core::calc::statistics::NormalDistribution; core::calc::statistics::RobustDistributionDecorator
 * KEYWORDS: estimation
 */
int main(const int argc, const char* argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  using namespace core::calc::statistics;

  unsigned int rd = 9876543; // --- fixed seed, so the results are reproducible
  std::mt19937 gen(rd);

  core::index4 N = 10000;    //--- the number of random points to use in tests
  core::index4 Nnoise = 10;  //--- the number of random points from the noise distribution

  // ---------- The two distributions used in this test
  std::normal_distribution<> base(2.0, 0.5);  // --- the "base" distribution
  std::normal_distribution<> noise(2.0, 50);  // --- the "noise" distribution

  // ---------- Every data point is stored as a one-element vector
  std::vector<std::vector<double> > random_points;
  if (argc==1) {
    // ---------- Generate random sample
    for (core::index4 i = 0; i < N; ++i)
      random_points.emplace_back( std::initializer_list<double>{base(gen)} );
    for (core::index4 i = 0; i < Nnoise; ++i)
      random_points.emplace_back( std::initializer_list<double>{noise(gen)} );
  } else {
    // ---------- Read data from a file
    std::fstream infile(argv[1], std::ios_base::in);
    double a;
    while (infile >> a) {
      random_points.emplace_back(std::initializer_list<double>{a});
    }
  }

  std::vector<double> init_params{1.0,0.0}; // --- initial parameters of the estimated distribution

  // ---------- Regular estimation
  NormalDistribution n(init_params);
  n.estimate(random_points);
  std::cout << "Estimated: " << n << "\n";

  // ---------- Robust estimation
  RobustDistributionDecorator<NormalDistribution> rn(init_params, 0.05);
  rn.estimate(random_points);
  std::cout << "Estimated (robust): "<< rn << "\n";

  if (argc==1) std::cout << "True distribution: 2.0 0.5\n";
}
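The effect of robust estimation can be illustrated without BioShell. The sketch below averages only the points that lie closest to the median, in the spirit of the trimmed mean cited above. It is an assumption-laden illustration: the function names are hypothetical, and it should not be read as what RobustDistributionDecorator does internally.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a non-empty sample (the argument is taken by value because it gets sorted)
double median(std::vector<double> v) {
  std::sort(v.begin(), v.end());
  const std::size_t n = v.size();
  return (n % 2 == 1) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Mean of the points that remain after discarding the trim_fraction share
// of a non-empty sample that lies farthest from the median
double trimmed_mean(std::vector<double> v, double trim_fraction) {
  const double med = median(v);
  // Order the points by their distance from the median, closest first
  std::sort(v.begin(), v.end(), [med](double a, double b) {
    return std::fabs(a - med) < std::fabs(b - med);
  });
  const std::size_t n_drop = static_cast<std::size_t>(std::floor(v.size() * trim_fraction));
  const std::size_t n_keep = v.size() - n_drop;
  double sum = 0.0;
  for (std::size_t i = 0; i < n_keep; ++i) sum += v[i];
  return sum / n_keep;
}
```

A single outlier dominates the plain average of a small sample, while the trimmed mean stays close to the bulk of the data; this is the same contrast the program shows between its regular and robust estimates.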

ex_SelectChainBreaks¶
Reads a PDB file and prints the list of chain breaks found in every chain.
EXAMPLE:
ex_SelectChainBreaks 4mcb.pdb
Keywords:
- structure selectors
- PDB input
Categories:
- core::data::structural::SelectChainBreaks
Input files:
Output files:
Program source:
#include <algorithm>
#include <iostream>
#include <iterator>

#include <core/data/io/Pdb.hh>
#include <core/data/structural/selectors/SelectChainBreaks.hh>
#include <utils/exit.hh>
#include <utils/LogManager.hh>

std::string program_info = R"(

Reads a PDB file and prints list of chain breaks found in every chain

EXAMPLE:
    ex_SelectChainBreaks 4mcb.pdb

)";

/** @brief Reads a PDB file and prints a list of chain breaks found in every chain
 *
 * CATEGORIES: core::data::structural::SelectChainBreaks
 * KEYWORDS: structure selectors; PDB input
 */
int main(const int argc, const char *argv[]) {

  using namespace core::data::structural::selectors;

  utils::LogManager::INFO();
  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  // ---------- Read a PDB file and create a Structure object; only C-alpha atoms are loaded
  core::data::io::PdbLineFilter filt1 = core::data::io::all_true(core::data::io::is_ca,
      core::data::io::is_standard_atom,core::data::io::is_not_alternative);
  core::data::io::PdbLineFilter filt2 = core::data::io::all_true(core::data::io::is_ca,
      core::data::io::is_hetero_atom,core::data::io::is_not_alternative);
  core::data::io::PdbLineFilter filt3 = core::data::io::one_true(filt1, filt2);
  core::data::io::Pdb reader(argv[1],filt3);
  auto strctr = reader.create_structure(0);

  SelectChainBreaks sel;
  for (const auto &chain : *strctr) {
    bool chain_is_OK = true; // --- reset for every chain, because the "OK" line is printed per chain
    if (std::distance(chain->begin(),chain->terminal_residue()) < 3) {
      std::cerr << "Chain " << chain->id() << " of " << strctr->code() << " is too short\n";
      continue;
    }
    if (chain->count_aa_residues() < chain->size() * 0.8) {
      std::cerr << "Chain " << chain->id() << " of " << strctr->code() << " is not a protein\n";
      continue;
    }
    std::cerr << "# Processing " << strctr->code() << " chain " << chain->id() << "\n";

    // --- skip leading residues that are not amino acids
    size_t first_aa = 0;
    while ((*chain)[first_aa]->residue_type().type != 'P') ++first_aa;
    const auto last_it = chain->terminal_residue() - 1;
    for (auto res_it = chain->begin() + first_aa + 1; res_it != last_it; ++res_it) {
      auto next = (**res_it).next();
      if(next== nullptr) {
        next = *(++std::find(chain->begin(),chain->end(),*res_it));
        std::cerr << "Residue following "<<(**res_it)<<" is not an amino acid!\n";
      }
      auto prev = (**res_it).previous();
      if(prev == nullptr) {
        prev = *(--std::find(chain->begin(),chain->end(),*res_it));
        std::cerr << "Residue preceding "<<(**res_it)<<" is not an amino acid!\n";
      }
      if (sel(*res_it)) {
        chain_is_OK = false;
        if (sel.last_chainbreak_type == RIGHT) {
          std::cout << utils::string_format("%s %4s %4d%c %4d%c %6.2f\n", strctr->code().c_str(),
              chain->id().c_str(), (*res_it)->id(), (*res_it)->icode(),next->id(), next->icode(),
              sel.right_side_distance);
          ++res_it;
          if (res_it == last_it) break;
        }
        if (sel.last_chainbreak_type == LEFT) {
          std::cout << utils::string_format("%s %4s %4d%c %4d %c %6.2f\n", strctr->code().c_str(),
              chain->id().c_str(), prev->id(), prev->icode(), (*res_it)->id(), (*res_it)->icode(),
              sel.left_side_distance);
        }
        if (sel.last_chainbreak_type == BOTH) {
          // --- each format string takes exactly one distance, matching the arguments passed
          std::cout << utils::string_format("%s %4s %4d%c %4d %c %6.2f\n", strctr->code().c_str(),
              chain->id().c_str(), prev->id(), prev->icode(), (*res_it)->id(), (*res_it)->icode(),
              sel.left_side_distance);
          std::cout << utils::string_format("%s %4s %4d%c %4d %c %6.2f\n", strctr->code().c_str(),
              chain->id().c_str(), (*res_it)->id(), (*res_it)->icode(), next->id(), next->icode(),
              sel.right_side_distance);
          ++res_it;
          if (res_it == last_it) break;
        }
      }
    }
    if(chain_is_OK)
      std::cout << utils::string_format("%s %4s OK\n", strctr->code().c_str(), chain->id().c_str());
  }
}
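The geometric idea behind chain-break detection can be sketched with the standard library alone. Consecutive C-alpha atoms of an intact chain are roughly 3.8 Å apart, so a much larger distance suggests missing residues. The 4.2 Å default cutoff and the names `Point`, `ca_distance` and `find_chain_breaks` are assumptions made for this sketch; the criterion actually used by SelectChainBreaks may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// A C-alpha position in Angstroms
struct Point { double x, y, z; };

double ca_distance(const Point &a, const Point &b) {
  const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Returns every index i for which CA(i) and CA(i+1) are farther apart than max_ca_distance.
// The default cutoff is an assumed value, chosen slightly above the typical 3.8 A spacing.
std::vector<std::size_t> find_chain_breaks(const std::vector<Point> &ca, double max_ca_distance = 4.2) {
  std::vector<std::size_t> breaks;
  for (std::size_t i = 0; i + 1 < ca.size(); ++i)
    if (ca_distance(ca[i], ca[i + 1]) > max_ca_distance) breaks.push_back(i);
  return breaks;
}
```

The returned indices mark the residue on the left side of each gap, which corresponds to the RIGHT break type reported by the program above.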

ex_SelectResidueRange¶
Simple example showing how to select a structural fragment based on residue IDs
USAGE:
./ex_SelectResidueRange
Keywords:
- structure selectors
- PDB input
- STL
- algorithms
Categories:
- core::data::structural::SelectResidueRange
Output files:
Program source:
#include <algorithm>
#include <iostream>
#include <sstream>

#include <core/data/io/Pdb.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Simple example showing how to select a structural fragment based on residue IDs

USAGE:
    ./ex_SelectResidueRange

)";

/** @brief Shows how to select a structural fragment based on residue IDs
 *
 * CATEGORIES: core::data::structural::SelectResidueRange
 * KEYWORDS: structure selectors; PDB input; STL; algorithms
 */

// --- Only C-alpha atoms are listed here to keep this example short and simple
// --- (columns are laid out according to the fixed-width PDB ATOM record)
std::string fragment = R"(ATOM    312  CA  ALA A  -1     -10.035   4.811   1.920  1.00  0.24           C
ATOM    322  CA  VAL A   0     -13.437   5.248   0.258  1.00  0.33           C
ATOM    338  CA  ASP A   1     -12.201   3.975  -3.121  1.00  0.24           C
ATOM    350  CA  ALA A   1A     -9.237   2.226  -4.777  1.00  0.18           C
ATOM    360  CA  ALA A   1B     -7.956   5.461  -6.338  1.00  0.24           C
ATOM    370  CA  THR A   2      -7.460   7.449  -3.135  1.00  0.21           C
ATOM    384  CA  ALA A   3      -6.080   4.229  -1.648  1.00  0.12           C
)" ;

int main(const int argc, const char* argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  using namespace core::data::structural;

  std::stringstream in(fragment); // Create an input stream that will provide data from a string
  core::data::io::Pdb reader(in);
  Structure_SP strctr = reader.create_structure(0);

  std::cout << "IDs of the residues available for selection:";
  std::for_each(strctr->first_residue(), strctr->last_residue(),
      [](Residue_SP r) {std::cout << r->residue_id()<<" ";});
  std::cout << "\n";

  // --- a range given by residue numbers only
  selectors::SelectResidueRange range0("-1-1");
  std::cout << "selector " << range0.selector_string() << " selects: "
            << std::count_if(strctr->first_residue(), strctr->last_residue(), range0) << " residues\n";

  // --- a range that ends at a residue with an insertion code
  selectors::SelectResidueRange range1("-1-1A");
  std::cout << "selector " << range1.selector_string() << " selects: "
            << std::count_if(strctr->first_residue(), strctr->last_residue(), range1) << " residues\n";

  // --- a wildcard
  selectors::SelectResidueRange range2("*");
  std::cout << "selector " << range2.selector_string() << " selects: "
            << std::count_if(strctr->first_residue(), strctr->last_residue(), range2) << " residues\n";
}
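Residue IDs such as 1A and 1B combine a number with an insertion code, so a range test has to compare both parts. The sketch below shows one possible ordering, by number first and then by insertion code, with a blank code sorting before any letter. This ordering and the names `ResidueId`, `in_range` and `count_in_range` are assumptions; the sketch makes no claim about the counts that SelectResidueRange itself reports.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

// Residue number plus insertion code; ' ' means "no insertion code"
struct ResidueId {
  int number;
  char icode;
};

// Assumed ordering: by number first, then by insertion code (' ' < 'A' < 'B' ...)
bool operator<(const ResidueId &a, const ResidueId &b) {
  return std::tie(a.number, a.icode) < std::tie(b.number, b.icode);
}

// True when id lies in the closed range [first, last]
bool in_range(const ResidueId &id, const ResidueId &first, const ResidueId &last) {
  return !(id < first) && !(last < id);
}

// Counts the residues that fall into the closed range [first, last]
std::size_t count_in_range(const std::vector<ResidueId> &ids, const ResidueId &first, const ResidueId &last) {
  return static_cast<std::size_t>(std::count_if(ids.begin(), ids.end(),
      [&](const ResidueId &id) { return in_range(id, first, last); }));
}
```

Under this ordering, a range ending at residue 1 excludes 1A and 1B, while a range ending at 1A includes 1A but not 1B.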

ex_SemiglobalAligner¶
Example that calculates a semiglobal alignment, i.e. the optimal global alignment where trailing gaps are not penalized. The program also shows how to define a custom scoring function for calculating an alignment.
USAGE:
./ex_SemiglobalAligner
Keywords:
- sequence alignment
Categories:
- core::alignment::SemiglobalAligner; core::alignment::PairwiseSequenceAlignment
Output files:
Program source:
#include <iostream>
#include <chrono>
#include <algorithm>

#include <core/data/io/fasta_io.hh>
#include <core/data/sequence/Sequence.hh>
#include <core/alignment/SemiglobalAligner.hh>
#include <core/alignment/PairwiseAlignment.hh>
#include <core/alignment/PairwiseSequenceAlignment.hh>
#include <core/alignment/scoring/SimilarityMatrix.hh>
#include <core/alignment/scoring/SimilarityMatrixScore.hh>
#include <utils/options/OptionParser.hh>

std::string program_info = R"(

Example that calculates semiglobal alignment i.e. the optimal global alignment where trailing gaps
are not penalized. The program also shows how one can define their own scoring function
to calculate an alignment

USAGE:
    ./ex_SemiglobalAligner

)";

using namespace core::data::sequence;
using namespace core::alignment::scoring;

/// An example score function used by BioShell pairwise sequence alignment methods.
/** Such a scoring object must provide three components:
 *    - a scoring operator, whose arguments are positions in the scored sequences (query and template, respectively)
 *    - query_length() method, and
 *    - tmplt_length() method
 */
struct IdentityScore {

  IdentityScore(const Sequence & query,const Sequence & tmplt) : q(query.sequence), t(tmplt.sequence) {}

  /// Alignment score is 1 when the two compared letters are identical and 0 otherwise
  short operator()(const core::index2 i,const core::index2 j) const { return q[i]==t[j]; }

  /// Returns the length of a template sequence
  core::index2 tmplt_length() const { return t.length(); }

  /// Returns the length of a query sequence
  core::index2 query_length() const { return q.length(); }

  const std::string & q;
  const std::string & t;
};

/** @brief Calculate a pairwise sequence alignment between two sequences with identity scoring method.
 *
 * The program calculates semiglobal alignment i.e. the optimal global alignment where trailing gaps are not penalized.
 * The program also shows how one can define their own scoring function to calculate an alignment
 *
 * CATEGORIES: core::alignment::SemiglobalAligner; core::alignment::PairwiseSequenceAlignment
 * KEYWORDS: sequence alignment
 */
int main(const int argc, const char* argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  Sequence_SP query = std::make_shared<Sequence>("query","CATACGTCGACGGCT",1);
  Sequence_SP tmplt = std::make_shared<Sequence>("tmplt","ACGACGT",1);

  // --- create aligner object
  core::index2 max_len = std::max(query->length(),tmplt->length());
  core::alignment::SemiglobalAligner<short, IdentityScore> aligner(max_len);

  // --- find score of the alignment; just the score - this is faster than aligning and keeping backtracking info
  IdentityScore s(*query, *tmplt);
  short result1 = aligner.align_for_score(-10, -1, s);

  // --- now the full alignment, followed by backtracking
  short result2 = aligner.align(-10, -1, s);
  core::alignment::PairwiseAlignment_SP ali = aligner.backtrace();
  std::cerr << ali->query_length() << " " << query->sequence << " "
            << ali->template_length() << " " << tmplt->sequence << "\n";
  core::alignment::PairwiseSequenceAlignment seq_ali(ali,query,tmplt);

  // --- the score must not change when query and template are swapped
  IdentityScore s2(*tmplt, *query);
  short result3 = aligner.align(-10, -1, s2);

  std::cout << "The three scores below should be identical:\n"
            << result1 << " " << result2 << " " << result3 << "\n" << seq_ali << "\n";
}
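For readers who want to see the recurrence behind a semiglobal alignment, the following standard-library sketch computes only the score. It is a simplified illustration under stated assumptions: it uses identity scoring (1 for a match, 0 otherwise), a single linear gap penalty instead of the separate gap-opening and gap-extension penalties passed to the aligner above, and it treats both leading and trailing end gaps as free. The function name `semiglobal_score` is hypothetical and is not part of BioShell.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Score of the best semiglobal alignment of q and t.
// gap is the (negative) penalty for every gapped position inside the alignment.
int semiglobal_score(const std::string &q, const std::string &t, int gap) {
  const std::size_t n = q.size(), m = t.size();
  // First row and first column stay at 0, which makes leading end gaps free
  std::vector<std::vector<int>> H(n + 1, std::vector<int>(m + 1, 0));
  for (std::size_t i = 1; i <= n; ++i) {
    for (std::size_t j = 1; j <= m; ++j) {
      const int match = H[i - 1][j - 1] + (q[i - 1] == t[j - 1] ? 1 : 0);
      H[i][j] = std::max({match, H[i - 1][j] + gap, H[i][j - 1] + gap});
    }
  }
  // Trailing end gaps are free: take the best cell of the last row and the last column
  int best = H[n][m];
  for (std::size_t i = 0; i <= n; ++i) best = std::max(best, H[i][m]);
  for (std::size_t j = 0; j <= m; ++j) best = std::max(best, H[n][j]);
  return best;
}
```

Because end gaps cost nothing, a short template that occurs inside a long query scores as many points as it has matching letters, regardless of where in the query it sits.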

ex_Seqres¶
Reads a PDB file and prints the sequences stored in its SEQRES fields. In many cases these sequences differ from the sequences extracted from the coordinates section.
EXAMPLE:
./ex_Seqres 2kwi.pdb
Keywords:
- PDB input
- Structure
- sequence
Categories:
- core::data::io::Pdb
Input files:
Output files:
Program source:
#include <iostream>

#include <core/data/io/Pdb.hh>
#include <core/data/io/fasta_io.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Reads a PDB file and prints the sequences stored in its SEQRES fields
These sequences in many cases differ from the sequences extracted from coordinates section

EXAMPLE:
    ./ex_Seqres 2kwi.pdb

)";

/** @brief Reads a PDB file and extracts its SEQRES sequence(s)
 * CATEGORIES: core::data::io::Pdb
 * KEYWORDS: PDB input; Structure; sequence
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::io;
  Pdb reader(argv[1],  // file name (PDB format, may be gzip-ped)
      is_ca,           // read only CA atoms
      keep_all, true); // parse PDB header !

  std::shared_ptr<Seqres> seq_res = std::static_pointer_cast<Seqres>(reader.header.find("SEQRES")->second);

  // --- Print every sufficiently long protein chain in FASTA format
  for(const auto & chain_seq : seq_res->sequences) {
    const std::string header = reader.pdb_code()+" : "+chain_seq.first;
    core::data::sequence::Sequence_SP s = seq_res->create_sequence(chain_seq.first,header);
    if((s->length()>20) && (s->get_monomer(2).type=='P'))
      std::cout << core::data::io::create_fasta_string(*s,-1)<<"\n";
  }
}

ex_Sequence¶
Unit test which reads a PDB file and prints a requested sequence fragment.
USAGE:
./ex_Sequence input.pdb chain from to
EXAMPLE:
./ex_Sequence 3wn7.pdb A 366 405
Keywords:
- PDB input
- sequence
Categories:
- core/data/io/Sequence
Input files:
Output files:
Program source:
#include <cstdlib>
#include <iostream>

#include <core/data/io/Pdb.hh>
#include <core/data/io/fasta_io.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which reads a PDB file and prints a requested sequence fragment.

USAGE:
    ./ex_Sequence input.pdb chain from to

EXAMPLE
    ./ex_Sequence 3wn7.pdb A 366 405

)";

/** @brief Reads a PDB file and prints a fragment of its sequence.
 *
 * CATEGORIES: core/data/io/Sequence
 * KEYWORDS: PDB input; sequence
 */
int main(const int argc, const char* argv[]) {

  // --- all four parameters (file, chain, from, to) are required, hence argc must be at least 5
  if(argc < 5) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::io;

  // --- Try this test program with 3wn7 as the input structure !
  Pdb reader(argv[1],                          // file name (PDB format, may be gzip-ped)
      all_true(is_not_hydrogen,is_not_water),  // don't read hydrogens, skip water molecules
      core::data::io::keep_all, true);         // parse PDB header !

  core::index2 from = atoi(argv[3]);
  core::index2 to = atoi(argv[4]);
  auto structure = reader.create_structure(0);
  auto chain = structure->get_chain(argv[2][0]);
  auto sequence = chain->create_sequence();

  // --- Should be 324 (324 is the ID of the very first residue of 3wn7 chain A)
  std::cout << "Id of the first residue: " << sequence->first_pos() << "\n";

  Sequence fragment(*sequence, from, to); // Cut from residue whose ID is 366
  std::cout << fragment.first_pos() << " " << fragment.sequence << "\n";
}

ex_SimulatedAnnealing¶
A simple example showing how to use Monte Carlo simulated annealing.
Keywords:
- Monte Carlo
- Mover
- sampling
- simulated annealing
- evaluators
Categories:
- simulations/generic/evaluators/EchoEvaluator; simulations/generic/evaluators/CallEvaluator
Output files:
Program source:
#include <functional>
#include <memory>
#include <iostream>
#include <random>
#include <string>

#include <core/calc/statistics/Random.hh>
#include <simulations/movers/Mover.hh>
#include <simulations/movers/MoversSetSweep.hh>
#include <simulations/evaluators/EchoEvaluator.hh>
#include <simulations/evaluators/CallEvaluator.hh>
#include <simulations/sampling/SimulatedAnnealing.hh>
#include <simulations/observers/ObserveEvaluators.hh>

/** @brief Models a point particle in 1D space with harmonic energy.
 * This simple class implements the Mover interface. One has to implement just two
 * essential methods: <code>move()</code> and <code>undo()</code>. For simplicity, in this example energy calculation
 * is implemented within the <code>move()</code> method.
 */
class HarmonicSystemMover : public simulations::movers::Mover {
public:
  const double x0;       ///< Initial position of the particle, where energy is 0.0
  double x;              ///< Actual position of the particle at the end of the spring
  double recent_energy;  ///< Actual energy of the spring

  /// The particle starts at x0, so every member has a defined value before the first move
  HarmonicSystemMover() : simulations::movers::Mover("HarmonicSystemMover"),
      x0(0.0), x(0.0), recent_energy(0.0), delta_x(0.0) {}

  /** @brief Moves the particle randomly in either direction.
   * This method is declared abstract in Mover class and must be implemented here
   */
  virtual bool move(simulations::sampling::AbstractAcceptanceCriterion &mc_scheme) {

    double old_en = (x - x0) * (x - x0);
    // --- store the step in the member variable (not in a local one), so undo() can revert it
    delta_x = 0.1 - 0.2 * rand_coordinate(generator);
    x += delta_x;
    double new_en = (x - x0) * (x - x0);
    inc_move_counter();
    if (!mc_scheme.test(old_en, new_en)) {
      undo();
      recent_energy = old_en;
      return false;
    } else {
      recent_energy = new_en;
      return true;
    }
  }

  /** @brief Back up the most recent move.
   * This method is declared abstract in Mover class and must be implemented here
   */
  inline void undo() {
    dec_move_counter();
    x -= delta_x;
  }

  /// Yet another method inherited from the base class
  virtual const std::string &name() const { return name_; }

  /// Does nothing, but must be implemented since it's been declared as virtual in the base class
  void max_move_range(const double max_range) {}

  /// Reads the maximum range for a move
  virtual double max_move_range() const { return 0.1; }

private:
  double delta_x; ///< The most recent displacement, kept for undo()
  std::uniform_real_distribution<double> rand_coordinate;
  core::calc::statistics::Random &generator = core::calc::statistics::Random::get();
  static std::string name_;
};

std::string HarmonicSystemMover::name_ = "HarmonicSystemMover";

/** @brief A simple example shows how to use Monte Carlo simulated annealing.
 *
 * This demo also shows how to implement a simple mover and how to hook it up to sampling protocol.
 * CATEGORIES: simulations/generic/movers/Mover; simulations/generic/sampling/SimulatedAnnealing; simulations/generic/evaluators/EchoEvaluator;
 * CATEGORIES: simulations/generic/evaluators/EchoEvaluator; simulations/generic/evaluators/CallEvaluator
 * KEYWORDS: Monte Carlo; Mover; sampling; simulated annealing; evaluators
 */
int main(const int argc, const char *argv[]) {

  using namespace simulations::evaluators;

  // ---------- Here we create a system to be modelled
  std::shared_ptr<HarmonicSystemMover> harmonic_ptr = std::make_shared<HarmonicSystemMover>();

  // --- You need a mover set to use SimulatedAnnealing protocol, even if the set contains just one mover
  simulations::movers::MoversSet_SP moves = std::make_shared<simulations::movers::MoversSetSweep>();
  moves->add_mover(harmonic_ptr,1);
  simulations::sampling::SimulatedAnnealing sa(moves,{2.0,1.5,1.0});

  // ---------- Create an observer which calls evaluators and writes the observations on the screen
  std::shared_ptr<simulations::observers::ObserveEvaluators> obs =
      std::make_shared<simulations::observers::ObserveEvaluators>("");

  // --- Create an evaluator which just returns the X position of the particle
  // --- Add the evaluator to the observer. Obviously there might be many evaluators added to this observer;
  // --- then a nice table will be printed with a column corresponding to each evaluator.
  obs->add_evaluator(std::make_shared<EchoEvaluator<double>>(harmonic_ptr->x,"position X"));
  obs->add_evaluator(std::make_shared<EchoEvaluator<double>>(harmonic_ptr->recent_energy,"energy"));
  std::function<double(void)> get_temperature = [&sa]() { return sa.temperature(); };
  obs->add_evaluator(
      std::make_shared<CallEvaluator<std::function<double(void)>>>(get_temperature, "temperature"));

  sa.outer_cycle_observer(obs);
  sa.cycles(100,100);
  sa.run();
}
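The acceptance test hidden behind `mc_scheme.test()` can be illustrated with a few lines of standard C++. The sketch below assumes the usual Metropolis criterion and a step size of at most 0.1, mirroring the mover above; the function names are hypothetical and the code is not taken from the BioShell sampling classes.

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// Metropolis criterion: always accept a downhill move; accept an uphill move
// with probability exp(-(new_en - old_en) / temperature). u is a uniform random number from [0, 1)
bool metropolis_accept(double old_en, double new_en, double temperature, double u) {
  if (new_en <= old_en) return true;
  return u < std::exp(-(new_en - old_en) / temperature);
}

// Anneals a particle in the harmonic potential E(x) = x * x through the given
// temperature schedule; returns the lowest energy visited during the run
double anneal_harmonic(double x_start, const std::vector<double> &temperatures,
                       int steps_per_temperature, unsigned seed) {
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  double x = x_start;
  double best = x * x;
  for (double t : temperatures) {
    for (int i = 0; i < steps_per_temperature; ++i) {
      const double x_new = x + 0.1 - 0.2 * uniform(gen);  // random step, at most 0.1 either way
      if (metropolis_accept(x * x, x_new * x_new, t, uniform(gen))) x = x_new;
      best = std::min(best, x * x);
    }
  }
  return best;
}
```

Lowering the temperature makes uphill moves less likely, which is why the schedule {2.0, 1.5, 1.0} used in the program gradually confines the particle to the bottom of the well.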

ex_Structure¶
ex_Structure reads a PDB file and prints a list of all atoms, grouped by the residues they belong to.
EXAMPLE:
./ex_Structure 5edw.pdb
Keywords:
- PDB input
- Structure
- Chain
- Residue
- PdbAtom
- STL
Categories:
- core::data::structural::Structure
Input files:
Output files:
Program source:
#include <iostream>
#include <iomanip>

#include <core/algorithms/predicates.hh>
#include <core/data/io/Pdb.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ex_Structure reads a PDB file and prints a list of all atoms grouped by residues they belong to

EXAMPLE:
    ./ex_Structure 5edw.pdb

)";

/** @brief Reads a PDB file and prints a list of all atoms grouped by residues they belong to.
 *
 * CATEGORIES: core::data::structural::Structure
 * KEYWORDS: PDB input; Structure; Chain; Residue; PdbAtom; STL
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::io;
  core::data::io::Pdb reader(argv[1],is_not_alternative); // file name (PDB format, may be gzip-ped)
  core::data::structural::Structure_SP strctr = reader.create_structure(0);

  // Iterate over all chains
  for (auto it_chain = strctr->begin(); it_chain!=strctr->end(); ++it_chain) {
    std::cout << "---------- chain " << (*it_chain)->id() << " ----------\n";
    // Iterate over all residues
    for (auto it_res = (*it_chain)->begin(); it_res!=(*it_chain)->end(); ++it_res) {
      std::cout << std::setw(5)<<(*it_res)->id()<<" "<<(*it_res)->residue_type().code3<<" :";
      // Iterate over all atoms; print only the first alternate location of each atom
      for (auto it_atom = (*it_res)->begin(); it_atom!=(*it_res)->end(); ++it_atom) {
        if (((*it_atom)->alt_locator() == ' ') || ((*it_atom)->alt_locator() == 'A'))
          std::cout << " " << (*it_atom)->atom_name();
      }
      std::cout <<"\n";
    }
  }
}

ex_ThreadPool¶
Unit test which shows how to use a ThreadPool class.
USAGE:
./ex_ThreadPool
Keywords:
- concurrency
- multi-threading
Categories:
- utils/ThreadPool
Output files:
Program source:
#include <future>
#include <iostream>
#include <string>
#include <type_traits>
#include <vector>

#include <utils/ThreadPool.hh>
#include <utils/string_utils.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which shows how to use a ThreadPool class.

USAGE:
    ./ex_ThreadPool

)";

/// Operator called by each thread (pretends to run very time consuming calculations)
struct Op {
  std::string operator()(int i) { return "Hello " + utils::to_string(i); }
};

/** @brief Simple test for a ThreadPool class
 *
 * CATEGORIES: utils/ThreadPool
 * KEYWORDS: concurrency; multi-threading
 */
int main(const int cnt, const char *argv[]) {

  if ((cnt > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  utils::ThreadPool pool(4); // --- Create a pool of four threads = four jobs can be executed at a time

  typedef typename std::result_of<Op(int)>::type return_type;

  Op o;
  std::vector<std::future<return_type>> futures;

  // --- Here we start 10 jobs which will be executed by four workers (threads)
  for (int i = 0; i < 10; ++i)
    futures.push_back(pool.enqueue(o, i));

  // --- Collect the results; get() blocks until the respective job is done.
  // --- The expected string is printed next to the one returned by a worker
  for (int i = 0; i < 10; ++i)
    std::cout << std::string("Hello " + utils::to_string(i)) << " " << futures[i].get() << "\n";
}

ex_ThreadSafeMap¶
Shows how to use ThreadSafeMap class
Keywords:
- data container
Categories:
- core::data::basic::ThreadSafeMap
Program source:
#include <iostream>
#include <string>

#include <core/data/basic/ThreadSafeMap.hh>

/** @brief Shows how to use ThreadSafeMap class
 *
 * CATEGORIES: core::data::basic::ThreadSafeMap
 * KEYWORDS: data container
 */
int main(const int argc, const char* argv[]) {

  core::data::basic::ThreadSafeMap<std::string,int> map;
  int one = 1, two = 2;
  map.insert_or_assign("one",one);
  map.insert_or_assign("two",two);
}
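The example above only inserts two entries, so it may help to see what makes such a container thread safe. The sketch below guards a `std::map` with a mutex. The class name `GuardedMap` and its `find()` and `size()` methods are assumptions made for illustration; only `insert_or_assign()` appears in the BioShell example, and the real ThreadSafeMap may be implemented differently.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// A map whose every operation holds a lock, so concurrent callers never see a half-updated state
template <typename K, typename V>
class GuardedMap {
public:
  // Inserts a new entry or overwrites the value stored under an existing key
  void insert_or_assign(const K &key, const V &value) {
    std::lock_guard<std::mutex> lock(mutex_);
    map_[key] = value;
  }

  // Copies the value stored under key into out; returns false when the key is absent
  bool find(const K &key, V &out) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = map_.find(key);
    if (it == map_.end()) return false;
    out = it->second;
    return true;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return map_.size();
  }

private:
  mutable std::mutex mutex_;  // mutable, so const methods can lock it as well
  std::map<K, V> map_;
};
```

Returning a copy through `out` rather than a reference or an iterator is a deliberate choice here: a reference could be invalidated by another thread as soon as the lock is released.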

ex_ThreeDTree¶
A simple example showing how to use BioShell kd-tree routines.
Keywords:
- neighborhood detection
- data structures
- algorithms
Categories:
- core::algorithms::trees::BinaryTreeNode; core::algorithms::trees::kd_tree.hh
Output files:
Program source:
#include <memory>
#include <iostream>
#include <random>
#include <vector>

#include <core/algorithms/trees/kd_tree.hh>
#include <core/algorithms/trees/BinaryTreeNode.hh>
#include <core/algorithms/trees/algorithms.hh>
#include <core/data/basic/Vec3.hh>
#include <core/calc/statistics/Random.hh>

using core::data::basic::Vec3;
using namespace core::algorithms::trees;
using namespace core::calc::statistics;

/// Tree traversal operation that prints each node on the screen
struct PrintPoint {
  void operator()(std::shared_ptr<BinaryTreeNode<KDTreeNode<Vec3>> > node) {
    std::cout << node->element.element << " " << node->element.level << "\n";
  }
};

/** @brief A simple example that shows how to use BioShell kd-tree routines.
 *
 * The program generates N=5000 random points and partitions them into a kd-tree. Later the tree is used to find
 * spatial neighbors in 3D space.
 *
 * CATEGORIES: core::algorithms::trees::BinaryTreeNode; core::algorithms::trees::kd_tree.hh
 * KEYWORDS: neighborhood detection; data structures; algorithms
 */
int main(const int argc, const char* argv[]) {

  // ---------- Here we generate N random points in 3D space to be partitioned
  const unsigned short N = 5000;
  Random::seed(0);                            // seed random number generator
  Random & gen = Random::get();               // get rnd generator singleton
  UniformRealRandomDistribution<double> rnd;  // uniform distribution will be used to assign coordinates
  std::vector<Vec3> atoms;                    // container for the points
  // ---------- We use emplace_back() to create Vec3 objects directly in the container
  for(unsigned int i=0;i<N;++i) atoms.emplace_back(rnd(gen),rnd(gen),rnd(gen));

  // ---------- Here the actual kd-tree is constructed
  std::shared_ptr<BinaryTreeNode<KDTreeNode<Vec3>> > root =
    create_kd_tree<Vec3,std::vector<Vec3>::iterator, CompareAsReferences<Vec3>>(atoms.begin(),atoms.end());

  // ---------- Here we divide all the points into \f$2^2=4\f$ groups
  std::vector<std::shared_ptr<BinaryTreeNode<KDTreeNode<Vec3>>>> node_groups;
  collect_given_level(root, 2, node_groups);
  unsigned short subcluster_id = 0;
  for(const auto node:node_groups) { // then we mark all nodes in each subtree by a distinct ID number
    depth_first_preorder(node, [subcluster_id](std::shared_ptr<BinaryTreeNode<KDTreeNode<Vec3>>> node) {
      node->element.level = subcluster_id;
    });
    ++subcluster_id;
  }
  breadth_first_preorder(root, PrintPoint()); // finally, each node is printed

  // ---------- Here is an example how to search the tree
  Vec3 query(0.7,0.7,0.4); // a query point

  // ---------- Here is an example how to find the closest element
  Vec3 best_point;
  double d = search_kd_tree(root, [](const Vec3 &v1, const Vec3 &v2) { return v1.distance_to(v2); }, query, best_point);
  std::cout << "point closest to query: " << best_point << " : " << d << "\n";

  // ---------- Here is an example how to find all points inside a box
  Vec3 q_low(0.68,0.68,0.38);
  Vec3 q_up(0.8,0.8,0.45);
  std::vector<Vec3> hits;
  search_kd_tree(root, q_low, q_up, 3, hits);
  std::cout << "points within a box bounded by: " << q_low << " and " << q_up << ":\n";
  for (const auto &p:hits) std::cout << p << "\n";
}
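For readers who want to see the idea behind the kd-tree calls without the BioShell headers, the following is a minimal standalone sketch. It does not use or reproduce the BioShell API: the names `Point`, `build_kd`, `nearest_kd` and `nearest` are hypothetical, and the tree is stored implicitly in a sorted array rather than in `BinaryTreeNode` objects. It shows median splitting along cycling axes and a nearest-neighbor query with pruning.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::array<double, 3>;  // hypothetical point type, not core::data::basic::Vec3

// Squared Euclidean distance between two points
inline double dist2(const Point &a, const Point &b) {
  double s = 0.0;
  for (int k = 0; k < 3; ++k) s += (a[k] - b[k]) * (a[k] - b[k]);
  return s;
}

// Rearranges pts[lo, hi) so that the median along the current axis sits at the middle index;
// the left and right halves are then processed recursively with the next axis
void build_kd(std::vector<Point> &pts, std::size_t lo, std::size_t hi, int depth = 0) {
  if (hi - lo <= 1) return;
  const int axis = depth % 3;
  const std::size_t mid = lo + (hi - lo) / 2;
  std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                   [axis](const Point &a, const Point &b) { return a[axis] < b[axis]; });
  build_kd(pts, lo, mid, depth + 1);
  build_kd(pts, mid + 1, hi, depth + 1);
}

// Recursive nearest-neighbor search; `best` and `best_d2` hold the best candidate found so far
void nearest_kd(const std::vector<Point> &pts, std::size_t lo, std::size_t hi, int depth,
                const Point &q, std::size_t &best, double &best_d2) {
  if (lo >= hi) return;
  const int axis = depth % 3;
  const std::size_t mid = lo + (hi - lo) / 2;
  const double d2 = dist2(pts[mid], q);
  if (d2 < best_d2) { best_d2 = d2; best = mid; }
  const double delta = q[axis] - pts[mid][axis];
  if (delta < 0) {  // query lies on the "left" side: search it first
    nearest_kd(pts, lo, mid, depth + 1, q, best, best_d2);
    // the other side can hold a closer point only if the splitting plane is closer than the best hit
    if (delta * delta < best_d2) nearest_kd(pts, mid + 1, hi, depth + 1, q, best, best_d2);
  } else {
    nearest_kd(pts, mid + 1, hi, depth + 1, q, best, best_d2);
    if (delta * delta < best_d2) nearest_kd(pts, lo, mid, depth + 1, q, best, best_d2);
  }
}

// Returns the index (in the rearranged vector) of the point closest to q
std::size_t nearest(const std::vector<Point> &pts, const Point &q) {
  std::size_t best = 0;
  double best_d2 = std::numeric_limits<double>::max();
  nearest_kd(pts, 0, pts.size(), 0, q, best, best_d2);
  return best;
}
```

Call `build_kd(points, 0, points.size())` once, then `nearest(points, query)` for each query. The array-based layout is a design choice made here for brevity; a pointer-based tree such as the one used in the example above makes it easier to attach per-node data like the `level` field.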

ex_TreeNode¶
Unit test which shows how to use a TreeNode data structure defined in BioShell
USAGE:
./ex_TreeNode
Keywords:
Categories:
- core::algorithms::trees::TreeNode
Output files:
Program source:
#include <memory>
#include <iostream>
#include <string>

#include <core/algorithms/trees/TreeNode.hh>
#include <core/algorithms/trees/algorithms.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which shows how to use a TreeNode data structure defined in BioShell

USAGE:
    ./ex_TreeNode

)";

/** @brief Simple demo for TreeNode class
 *
 * This program creates a small tree with 7 nodes
 *
 * CATEGORIES: core::algorithms::trees::TreeNode
 * KEYWORDS: algorithms; data structures; graphs
 */
int main(const int argc, const char* argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  using namespace core::algorithms::trees;

  // ---------- Create seven nodes, each holding a string
  std::shared_ptr<TreeNode<std::string>> n1 = std::make_shared<TreeNode<std::string>>("A",1);
  std::shared_ptr<TreeNode<std::string>> n2 = std::make_shared<TreeNode<std::string>>("B",2);
  std::shared_ptr<TreeNode<std::string>> n3 = std::make_shared<TreeNode<std::string>>("C",3);
  std::shared_ptr<TreeNode<std::string>> n4 = std::make_shared<TreeNode<std::string>>("D",4);
  std::shared_ptr<TreeNode<std::string>> n5 = std::make_shared<TreeNode<std::string>>("E",5);
  std::shared_ptr<TreeNode<std::string>> n6 = std::make_shared<TreeNode<std::string>>("F",6);
  std::shared_ptr<TreeNode<std::string>> n7 = std::make_shared<TreeNode<std::string>>("G",7);

  // ---------- Connect the nodes into a tree rooted at n1
  n1->add_branch(n2);
  n1->add_branch(n3);
  n2->add_branch(n4);
  n2->add_branch(n5);
  n2->add_branch(n7);
  n5->add_branch(n6);

  // ---------- Visit every node in depth-first preorder and report the tree size
  depth_first_preorder(n1,[](std::shared_ptr<TreeNode<std::string>> n){ std::cout << n->id<< "\n";});
  std::cout << "Size of the tree: " << size(n1)<<"\n";

  return 0;
}
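The traversal used above can be illustrated without BioShell. The sketch below is a standalone approximation, not the library's `TreeNode`: the `Node` struct and the functions `preorder` and `count_nodes` are hypothetical names introduced only for this illustration. It shows what a depth-first preorder visit and a node count do on a general (non-binary) tree.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical general tree node: a label and any number of children
struct Node {
  std::string id;
  std::vector<std::shared_ptr<Node>> children;
  explicit Node(std::string name) : id(std::move(name)) {}
};

// Depth-first preorder: visit the node itself first, then each subtree from left to right
void preorder(const std::shared_ptr<Node> &node, const std::function<void(const Node &)> &visit) {
  if (!node) return;
  visit(*node);
  for (const auto &child : node->children) preorder(child, visit);
}

// Number of nodes in the subtree rooted at `node`, including the root
std::size_t count_nodes(const std::shared_ptr<Node> &node) {
  if (!node) return 0;
  std::size_t n = 1;
  for (const auto &child : node->children) n += count_nodes(child);
  return n;
}
```

For a tree wired like the one in the example (A has children B and C, B has children D, E and G, E has child F), this sketch visits the labels in the order A, B, D, E, F, G, C and counts 7 nodes.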

ex_UnionFind¶
Unit test which shows how to use the Union-Find algorithm.
USAGE:
./ex_UnionFind
REFERENCE: https://en.wikipedia.org/wiki/Disjoint-set_data_structure
Keywords:
Categories:
- core::algorithms::UnionFind
Output files:
Program source:
#include <cmath>
#include <cstdlib>
#include <memory>
#include <iostream>
#include <string>
#include <vector>

#include <core/algorithms/UnionFind.hh>
#include <core/calc/statistics/Random.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which shows how to use the Union-Find algorithm.

USAGE:
    ./ex_UnionFind

REFERENCE:
https://en.wikipedia.org/wiki/Disjoint-set_data_structure

)";

// ---------- Data type of objects that will be clustered
struct Point2D {
  float x, y;

  Point2D(float nx, float ny) : x(nx), y(ny) {}

  float distance_to(const Point2D &p) const {
    return std::sqrt((x - p.x) * (x - p.x) + (y - p.y) * (y - p.y));
  }
};

// ---------- this operator is necessary because core::algorithms::UnionFind keeps std::map of data points
// ---------- y breaks ties so that two distinct points sharing the same x are not treated as equal keys
bool operator<(const Point2D &lhs, const Point2D &rhs) {
  return (lhs.x < rhs.x) || ((lhs.x == rhs.x) && (lhs.y < rhs.y));
}

/** @brief A simple example that shows how to use the UnionFind algorithm.
 *
 * The program calculates greedy clustering of points in 2D. The number of points can be provided from command line
 *
 * CATEGORIES: core::algorithms::UnionFind;
 * KEYWORDS: data structures; algorithms
 */
int main(const int argc, const char *argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  using namespace core::calc::statistics;

  // ---------- Here we generate N random points in 2D space to be partitioned
  const unsigned short N = (argc > 1) ? atoi(argv[1]) : 500;
  const float cutoff = (argc > 2) ? atof(argv[2]) : 0.05;
  std::vector<Point2D> points;                // container for the points
  Random::seed(0);                            // seed random number generator
  Random &gen = Random::get();                // get rnd generator singleton
  UniformRealRandomDistribution<double> rnd;  // uniform distribution will be used to assign coordinates

  core::algorithms::UnionFind<Point2D, core::index2> uf;
  for (unsigned int i = 0; i < N; ++i) {
    points.emplace_back(rnd(gen), rnd(gen)); // we use emplace_back() to create point objects directly in the container
    uf.add_element(points.back());
    // ---------- merge the new point with every earlier point closer than the cutoff
    for (unsigned int j = 0; j < i; ++j) {
      if (points[i].distance_to(points[j]) < cutoff) uf.union_set(i, j);
    }
  }

  std::cout << "# x-coord y-coord cluster_assignment\n";
  for (unsigned int i = 0; i < N; ++i)
    std::cout << points[i].x << " " << points[i].y << " " << uf.find_set(i) << "\n";
}
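The disjoint-set structure behind this example can be sketched in a few lines of standard C++. The class below is not `core::algorithms::UnionFind`; the name `DisjointSets` and its methods `find` and `unite` are hypothetical and work on plain integer indices. It shows the two classic optimizations described in the referenced article: path compression (here in its path-halving form) and union by size.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical minimal disjoint-set forest over the indices 0 .. n-1
class DisjointSets {
public:
  explicit DisjointSets(std::size_t n) : parent_(n), size_(n, 1) {
    std::iota(parent_.begin(), parent_.end(), std::size_t{0});  // every element starts as its own root
  }

  // Returns the representative of the set containing i, shortening the path on the way (path halving)
  std::size_t find(std::size_t i) {
    while (parent_[i] != i) {
      parent_[i] = parent_[parent_[i]];
      i = parent_[i];
    }
    return i;
  }

  // Merges the sets containing a and b; returns false when they were already in the same set
  bool unite(std::size_t a, std::size_t b) {
    a = find(a);
    b = find(b);
    if (a == b) return false;
    if (size_[a] < size_[b]) std::swap(a, b);  // attach the smaller tree under the larger one
    parent_[b] = a;
    size_[a] += size_[b];
    return true;
  }

private:
  std::vector<std::size_t> parent_;
  std::vector<std::size_t> size_;
};
```

In a clustering loop like the one above, `unite(i, j)` would be called for every pair of points closer than the cutoff, and `find(i)` would then give the cluster label of point `i`.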

ex_WL_Ar¶
The program runs a Wang-Landau MC simulation of argon gas. By default it starts from a regular lattice conformation unless an input file (PDB) with an initial conformation is provided.
USAGE:
ex_WL_Ar n_atoms density temperature small_cycles big_cycles [max_jump]
ex_WL_Ar starting.pdb density temperature small_cycles big_cycles [max_jump]
Keywords:
- no_keywords
Categories:
- no_categories
Program source:
#include <cmath>
#include <cstdlib>
#include <functional>
#include <iostream>
#include <memory>
#include <string>
#include <thread>

#include <core/data/basic/Vec3Cubic.hh>
#include <utils/string_utils.hh>
#include <utils/options/OptionParser.hh>
#include <simulations/systems/CartesianAtoms.hh>
#include <simulations/systems/BuildFluidSystem.hh>
#include <simulations/systems/SingleAtomType.hh>
#include <simulations/movers/TranslateAtom.hh>
#include <simulations/forcefields/cartesian/LJEnergySWHomogenic.hh>
#include <simulations/sampling/WangLandauSampler.hh>
#include <simulations/observers/ObserveEvaluators.hh>
#include <simulations/observers/cartesian/PdbObserver.hh>
#include <simulations/observers/ObserveWLSampling.hh>
#include <simulations/observers/AdjustMoversAcceptance.hh>
#include <simulations/observers/TriggerEveryN.hh>
#include <simulations/observers/cartesian/SimplePdbFormatter.hh>
#include <simulations/evaluators/CallEvaluator.hh>

using namespace core::data::basic;

utils::Logger logs("ex_WL_Ar");

std::string program_info = R"(

The program runs a Wang-Landau MC simulation of argon gas.
By default it starts from a regular lattice conformation unless an input file (PDB) with initial conformation is provided

USAGE:
    ex_WL_Ar n_atoms density temperature small_cycles big_cycles [max_jump]
    ex_WL_Ar starting.pdb density temperature small_cycles big_cycles [max_jump]

)";

const double EPSILON = 1.654E-21;                 // [J] per molecule
const double EPSILON_BY_K = EPSILON / 1.381E-23;  // = 119.6 in Kelvins
const double SIGMA = 3.4;                         // in Angstroms

/// Turns energy of the system into an index of an energy bin; each bin is 100 energy units wide
inline int bin_from_energy(double E) { return (int)(E / 100); }

/** @brief Wang-Landau Monte Carlo simulation of argon gas.
 *
 */
int main(const int argc,const char* argv[]) {

  using core::data::basic::Vec3Cubic;
  using namespace simulations::systems;
  using namespace simulations::movers;                // for MoversSet
  using namespace simulations::observers::cartesian;  // for all observers

  core::index4 n_outer_cycles = 1000;
  core::index4 n_inner_cycles = 1000;
  double density = 0.5;      // density of the system controls how many atoms will be contained in the box
  double temperature = 97;   // in Kelvins
  core::index4 n_atoms = 256;
  double max_jump = 0.5;     // Random move range (in Angstroms)

  core::data::structural::Structure_SP argon_structure = nullptr;
  core::data::structural::PdbAtom_SP ar_atom = std::make_shared<core::data::structural::PdbAtom>(1," AR");

  if (argc < 6) std::cerr << program_info;
  else {
    if (utils::is_integer(argv[1]))
      n_atoms = atoi(argv[1]);
    else { // --- read an input file if given
      core::data::io::Pdb reader(argv[1]);
      argon_structure = reader.create_structure(0);
      n_atoms = argon_structure->count_atoms();
    }
    density = atof(argv[2]);
    temperature = atof(argv[3]);
    n_inner_cycles = atoi(argv[4]);
    n_outer_cycles = atoi(argv[5]);
    if (argc == 7) max_jump = atof(argv[6]);
  }

  // --- The box is sized so that spheres of radius SIGMA fill the requested fraction of its volume
  double ar_volume = 4.0 / 3.0 * M_PI * SIGMA * SIGMA * SIGMA * n_atoms;
  double box_len = pow(ar_volume / density, 0.33333333333333);

  // --- Initialize periodic boundary conditions
  core::data::basic::Vec3Cubic::set_box_len(box_len);
  logs << utils::LogLevel::INFO << "box width for " << int(n_atoms) << " atoms : " << box_len << "\n";

  // --- Create the system and distribute atoms in the box
  AtomTypingInterface_SP ar_type = std::make_shared<SingleAtomType>(" AR");
  CartesianAtoms ar(ar_type, n_atoms);
  core::calc::statistics::Random::seed(1234);
  if(argon_structure != nullptr) { // --- read coordinates from a PDB file if provided
    set_conformation(argon_structure->first_const_atom(), argon_structure->last_const_atom(), ar);
  } else {                         // --- otherwise generate coordinates
    const auto grid = std::make_shared<SimpleCubicGrid>(box_len, n_atoms);
    BuildFluidSystem::generate(ar, *ar_atom, grid);
  }

  CartesianAtoms ar_backup(ar);    // --- make a backup system

  // --- Create energy function - just LJ potential
  simulations::forcefields::cartesian::LJEnergySWHomogenic lj_energy(ar, SIGMA, EPSILON_BY_K);

  // --- Create a mover, which is a random perturbation of an atom in this case, and place it in a movers' set
  std::shared_ptr<TranslateAtom> translate = std::make_shared<TranslateAtom>(ar, ar_backup, lj_energy);
  translate->max_move_range_allowed(1.5);
  MoversSet_SP movers = std::make_shared<simulations::movers::MoversSetSweep>();
  movers->add_mover(translate, n_atoms);
  translate->max_move_range(max_jump); // --- set the maximum distance a single atom can be moved by a single MC perturbation

  // --- create a Wang-Landau Monte Carlo sampler
  double initial_energy = lj_energy.energy(ar);
  logs << utils::LogLevel::INFO << "Initial energy of the system (used to limit WL sampling) : " << initial_energy << "\n";
  simulations::sampling::WangLandauSampler sampler(movers, initial_energy, bin_from_energy, initial_energy + 1);

  // ---------- Create an observer which calls energy calculation and prints it on the screen
  std::shared_ptr<simulations::observers::ObserveEvaluators> obs = std::make_shared<simulations::observers::ObserveEvaluators>("");
  std::function<double(void)> recent_energy = [&lj_energy,&ar]() { return lj_energy.energy(ar); };
  obs->add_evaluator(
    std::make_shared<simulations::evaluators::CallEvaluator<std::function<double(void)>>>(recent_energy, "energy", 8));

  // ---------- Create an observer which writes the trajectory in the PDB format
  std::shared_ptr<AbstractPdbFormatter> fmt = std::make_shared<SimplePdbFormatter>(" AR ", "AR ", "AR");
  auto observe_trajectory = std::make_shared<simulations::observers::cartesian::PdbObserver>(ar, fmt, "ar_tra.pdb");
  observe_trajectory->trigger(std::make_shared<simulations::observers::TriggerEveryN>(1));
  sampler.outer_cycle_observer(observe_trajectory); // --- comment this line out to save disk space

  // ---------- Create an observer which records and adjusts the acceptance rate of the movers
  std::shared_ptr<simulations::observers::AdjustMoversAcceptance> observe_moves
    = std::make_shared<simulations::observers::AdjustMoversAcceptance>(*movers,"movers.dat", 0.4);
  sampler.outer_cycle_observer(observe_moves);
  sampler.outer_cycle_observer(obs);

  sampler.cycle_size(1000);
  sampler.inner_cycles(n_inner_cycles);
  sampler.outer_cycles(n_outer_cycles);
  sampler.outer_cycle_observer(std::make_shared<simulations::observers::ObserveWLSampling>(sampler, "wl.dat"));

  sampler.run();

  // ---------- Save the final conformation and report its energy
  simulations::observers::cartesian::PdbObserver final(ar, fmt, "final.pdb");
  final.observe();
  logs << utils::LogLevel::INFO << "Final energy " << lj_energy.energy(ar) << "\n";
}
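The listing relies on `WangLandauSampler` to do the actual sampling. As background, the core bookkeeping of the textbook Wang-Landau scheme can be sketched as below. This is a generic illustration, not BioShell code: the function names `wl_acceptance`, `wl_update` and `is_flat`, as well as the 0.8 flatness tolerance, are assumptions made for this sketch, and the BioShell sampler may differ in its details.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Probability of accepting a move from a state with ln g = ln_g_old to a state with ln g = ln_g_new,
// where g(E) is the running estimate of the density of states: min(1, g_old / g_new)
inline double wl_acceptance(double ln_g_old, double ln_g_new) {
  return std::min(1.0, std::exp(ln_g_old - ln_g_new));
}

// After every step (accepted or rejected) the bin of the current state is penalized and counted
inline void wl_update(std::vector<double> &ln_g, std::vector<unsigned long> &hist,
                      std::size_t bin, double ln_f) {
  ln_g[bin] += ln_f;
  ++hist[bin];
}

// The histogram is considered flat when its least visited bin reaches `tolerance` times the mean count;
// at that point ln_f is typically halved and the histogram is reset
inline bool is_flat(const std::vector<unsigned long> &hist, double tolerance = 0.8) {
  if (hist.empty()) return false;
  const double mean = std::accumulate(hist.begin(), hist.end(), 0.0) / hist.size();
  if (mean <= 0.0) return false;
  const unsigned long lowest = *std::min_element(hist.begin(), hist.end());
  return lowest >= tolerance * mean;
}
```

The bin index passed to `wl_update` plays the same role as the value returned by `bin_from_energy` in the example, which maps an energy to an integer bin.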

ex_WL_Ising¶
The program runs a Wang-Landau Monte Carlo simulation of a simple 2D Ising system (spin glass)
Keywords:
Categories:
- simulations/sampling/WangLandauSampler; simulations/systems/ising/Ising2D
Program source:
#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>

#include <simulations/observers/ObserveWLSampling.hh>
#include <simulations/movers/ising/SingleFlip2D.hh>
#include <simulations/movers/ising/WolffMove2D.hh>
#include <simulations/observers/ObserveMoversAcceptance.hh>
#include <simulations/observers/TriggerEveryN.hh>
#include <simulations/sampling/WangLandauSampler.hh>
#include <simulations/systems/ising/Ising2D.hh>

using namespace core::data::basic;

utils::Logger logs("ex_WL_Ising");

std::string program_info = R"(

The program runs a Wang-Landau Monte Carlo simulation of a simple 2D Ising system (spin glass)

)";

/** @brief Turns energy of a system into an energy bin index (integer)
 * @param energy - system's energy
 * @return integer assigned to a bin; may be negative
 */
inline int bfe(double energy) { return (int) energy; }

/** @brief The program runs a Wang-Landau Monte Carlo simulation of a simple 2D Ising system (spin glass).
 *
 * This example shows how to set up a WL simulation
 *
 * CATEGORIES: simulations/sampling/WangLandauSampler; simulations/systems/ising/Ising2D
 * KEYWORDS: observer; simulation
 */
int main(const int argc, const char *argv[]) {

  using namespace simulations::systems::ising;
  using namespace simulations::movers::ising;
  using namespace simulations::observers;

  core::index4 n_outer_cycles = 1;
  core::index4 n_inner_cycles = 10000;
  core::index2 system_size = 10;

  if (argc < 2) std::cerr << program_info;
  else {
    system_size = atoi(argv[1]);
    if (argc == 4) {
      n_inner_cycles = atoi(argv[2]);
      n_outer_cycles = atoi(argv[3]);
    }
  }
  core::calc::statistics::Random::get().seed(12345); // --- seed the generator for repeatable results

  // ---------- Create the system to be sampled ----------
  std::shared_ptr<Ising2D<core::index1, core::index2>> system =
    std::make_shared<Ising2D<core::index1, core::index2>>(system_size, system_size);
  system->initialize(); // Populate system with random spins

  // ---------- Movers definition ----------
  simulations::movers::MoversSet_SP movers = std::make_shared<simulations::movers::MoversSetSweep>();
  movers->add_mover(std::make_shared<SingleFlip2D<core::index1, core::index2>>(*system), system->count_spins());
  movers->add_mover(std::make_shared<WolffMove2D<core::index1, core::index2>>(*system), system->count_spins() * 0.2);

  // ---------- Create the sampler ----------
  const double initial_energy = system->calculate();
  simulations::sampling::WangLandauSampler sampler(movers, initial_energy, bfe, 13);

  sampler.inner_cycles(n_inner_cycles);
  sampler.outer_cycles(n_outer_cycles);
  sampler.inner_cycle_observer(std::make_shared<ObserveWLSampling>(sampler, "wl.dat"));

  sampler.run();
}
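The quantity binned by `bfe()` is the energy of the spin lattice. The standalone sketch below shows how such an energy is commonly computed for a 2D Ising model with periodic boundaries. It is an assumption-laden illustration rather than the `Ising2D` implementation: the function names `ising_energy` and `energy_bin`, the row-major storage of +1/-1 spins and the coupling constant `J = 1` are all choices made here, and BioShell may use a different convention.

```cpp
#include <cstddef>
#include <vector>

// Energy E = -J * sum over nearest-neighbor pairs of s_i * s_j on an n_rows x n_cols periodic lattice.
// Spins are stored row by row and take the values +1 or -1. Each bond is counted once by looking
// only at the right and the lower neighbor of every site.
int ising_energy(const std::vector<int> &spins, std::size_t n_rows, std::size_t n_cols, int J = 1) {
  int energy = 0;
  for (std::size_t r = 0; r < n_rows; ++r) {
    for (std::size_t c = 0; c < n_cols; ++c) {
      const int s = spins[r * n_cols + c];
      const int right = spins[r * n_cols + (c + 1) % n_cols];
      const int down = spins[((r + 1) % n_rows) * n_cols + c];
      energy -= J * s * (right + down);
    }
  }
  return energy;
}

// Maps an energy to an integer bin index by truncation, as bfe() does in the example above
inline int energy_bin(double energy) { return static_cast<int>(energy); }
```

With this convention a fully aligned 4x4 lattice has energy -32 and a checkerboard pattern has energy +32, so unit-width bins cover the whole energy range of a small system.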

ex_XML¶
Demonstrates how to parse XML with BioShell utilities. The test runs on predefined XML data.
USAGE:
./ex_XML
Keywords:
Categories:
- core/data/io/XML
Output files:
Program source:
#include <iostream>
#include <sstream>
#include <string>

#include <core/data/io/XML.hh>
#include <core/data/io/XMLElement.hh>
#include <utils/LogManager.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Demonstrate how to parse XML with BioShell utilities. The test runs on a predefined XML data

USAGE:
    ./ex_XML

)";

using namespace core::data::io;

std::string xml_data = R"(<product>
    <id>15</id>
    <name>Widgets</name>
    <description>Example text.</description>
    <options type="color">
        <item value="Purple" shade="bright" />
        <item>Green</item>
        <item>Orange</item>
    </options>
</product>
)";

/** @brief Simple demo for XML I/O utils.
 *
 * CATEGORIES: core/data/io/XML
 * KEYWORDS: XML
 */
int main(int argc, char *argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  utils::LogManager::FINE();

  XML xxx;
  if (argc == 1) {   // --- no input file: parse the predefined XML string
    std::istringstream input(xml_data);
    xxx.load_data(input);
  } else             // --- otherwise parse the file given at the command line
    xxx.load_data(argv[1]);

  std::cout << xxx.document_root();

  return 0;
}

ex_alignment_io¶
Unit test which reads an alignment in Edinburgh format or calculates a global sequence alignment for two predefined sequences. It saves the output alignment in Edinburgh format.
USAGE:
./ex_alignment_io [alignment]
EXAMPLE:
./ex_alignment_io example.edinb
Keywords:
Categories:
- core/data/io/alignment_io.hh
Input files:
Output files:
Program source:
#include <algorithm>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include <core/data/io/alignment_io.hh>
#include <core/BioShellEnvironment.hh>
#include <core/alignment/NWAligner.hh>
#include <core/alignment/on_alignment_computations.hh>
#include <core/alignment/PairwiseAlignment.hh>
#include <core/alignment/PairwiseSequenceAlignment.hh>
#include <core/alignment/scoring/SimilarityMatrix.hh>
#include <core/alignment/scoring/SimilarityMatrixScore.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which reads alignment in Edinburgh format or calculates a global sequence alignment
for two predefined sequences. It saves output alignment in Edinburgh format.

USAGE:
    ./ex_alignment_io [alignment]

EXAMPLE:
    ./ex_alignment_io example.edinb

)";

/** @brief Read alignment in Edinburgh format or calculate a new one from given sequences; write Edinburgh.
 *
 * CATEGORIES: core/data/io/alignment_io.hh
 * KEYWORDS: sequence alignment
 */
int main(const int argc, const char* argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  using namespace core::alignment;
  using namespace core::alignment::scoring;

  if (argc > 1) { // If there was an input alignment file given, read it!
    std::vector<PairwiseSequenceAlignment_SP> alignments;
    auto ali = core::data::io::read_edinburgh(argv[1], alignments);
    for (const PairwiseSequenceAlignment_SP & ali : alignments)
      core::data::io::write_edinburgh(*ali, std::cout, 80);
  } else {        // otherwise, align the two sequences defined below
    // ---------- The two sequences that will be aligned
    std::string query = "MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVAYAVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLI";
    std::string tmplt = "MIYLYLLCAIFAEVVATSLLKSTEGFTRLWPTVGCLVGYGIAFALLALSISHGMQTDVAYALWSAIGTAAIVLVAVLFLGSPISVMKVVGVGLI";

    // ---------- Gap penalties
    short int open = -10;
    short int extend = -2;

    // ---------- load BLOSUM matrix from bioshell's library; the directory must be defined as a shell variable
    const std::shared_ptr<SimilarityMatrix<short int>> b62_matrix =
      SimilarityMatrix<short int>::from_ncbi_file("alignments/BLOSUM62");
    const SimilarityMatrixScore<short int> b62_score(query, tmplt, *b62_matrix);

    NWAligner<short int, SimilarityMatrixScore<short int>> global(std::max(query.length(), tmplt.length()));

    // ---------- Compute and backtrace the alignment
    global.align(open, extend, b62_score);
    const PairwiseAlignment_SP ali = global.backtrace();

    // ---------- Convert the abstract alignment to a pairwise sequence alignment object
    core::data::sequence::Sequence query_seq("Q7B1Y7_SALEN",query);
    core::data::sequence::Sequence tmplt_seq("MMR_MYCTU",tmplt);
    core::data::io::write_edinburgh(query_seq, *ali, tmplt_seq, std::cout, 80);
  }
}
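To make the global alignment step less of a black box, here is a minimal Needleman-Wunsch score calculation in plain C++. It is a deliberately simplified sketch and not the `NWAligner` implementation: the function name `nw_score` is hypothetical, it uses a flat match/mismatch score instead of BLOSUM62, and a single linear gap penalty instead of the separate open and extend penalties used above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns the optimal global alignment score of a and b under a simple scoring scheme.
// h[i][j] holds the best score for aligning the first i letters of a with the first j letters of b.
int nw_score(const std::string &a, const std::string &b, int match = 1, int mismatch = -1, int gap = -1) {
  std::vector<std::vector<int>> h(a.size() + 1, std::vector<int>(b.size() + 1, 0));
  // aligning a prefix against nothing costs one gap per letter
  for (std::size_t i = 1; i <= a.size(); ++i) h[i][0] = static_cast<int>(i) * gap;
  for (std::size_t j = 1; j <= b.size(); ++j) h[0][j] = static_cast<int>(j) * gap;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    for (std::size_t j = 1; j <= b.size(); ++j) {
      const int diag = h[i - 1][j - 1] + ((a[i - 1] == b[j - 1]) ? match : mismatch);
      const int up = h[i - 1][j] + gap;    // letter of a aligned with a gap
      const int left = h[i][j - 1] + gap;  // letter of b aligned with a gap
      h[i][j] = std::max({diag, up, left});
    }
  }
  return h[a.size()][b.size()];
}
```

A full aligner also stores which of the three moves produced each cell so that the alignment itself can be recovered by backtracing, which is the job of `global.backtrace()` in the example.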

ex_basic_algebra¶
Unit test that calculates eigenvalues and eigenvectors for a 3x3 matrix
USAGE:
./ex_basic_algebra
Keywords:
Categories:
- core/calc/numeric/basic_algebra.hh
Output files:
Program source:
#include <iostream>
#include <string>
#include <vector>

#include <core/data/basic/Vec3.hh>
#include <core/calc/numeric/basic_algebra.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test that calculates eigenvalues and eigenvectors for a 3x3 matrix

USAGE:
    ./ex_basic_algebra

)";

/** @brief ex_basic_algebra illustrates how to calculate eigenvalues and eigenvectors for a 3x3 matrix
 *
 * CATEGORIES: core/calc/numeric/basic_algebra.hh
 * KEYWORDS: numerical methods
 */
int main(const int argc, const char* argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  using namespace core::calc::numeric;
  using namespace core::data::basic; // --- for Array2D

  Array2D<double> m3x3(3,3,{2, -1, 0, -1, 2, -1, 0, -1, 2}); // --- input matrix to be solved
  std::cout << "\nOriginal matrix:\n";
  m3x3.print("%8.3f", std::cout);

  // --- eigenvalues are computed first; eigenvectors are then derived from them
  std::vector<double> eigenval;
  core::calc::numeric::eigenvalues3(m3x3, eigenval);
  std::cout << "\nEigenvalues: " << eigenval[0] << " " << eigenval[1] << " " << eigenval[2] << "\n\n";

  auto eigenv = eigenvectors3(m3x3, eigenval);
  std::cout << "\nEigenvectors:\n" << eigenv[0] << "\n" << eigenv[1] << "\n" << eigenv[2] << "\n\n";
}
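For the symmetric matrix used in this example the eigenvalues can be checked independently. The sketch below applies the well-known closed-form trigonometric solution for real symmetric 3x3 matrices. It is not the code behind `eigenvalues3`, whose algorithm is not described here; the names `Mat3`, `det3` and `symmetric_eigenvalues3` are hypothetical, and the function is valid only for symmetric input.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <functional>

using Mat3 = std::array<std::array<double, 3>, 3>;  // hypothetical matrix type, not Array2D

// Determinant of a 3x3 matrix by cofactor expansion along the first row
inline double det3(const Mat3 &m) {
  return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
       - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
       + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Eigenvalues of a real symmetric 3x3 matrix, returned in descending order
std::array<double, 3> symmetric_eigenvalues3(const Mat3 &a) {
  std::array<double, 3> eig = {a[0][0], a[1][1], a[2][2]};
  const double p1 = a[0][1] * a[0][1] + a[0][2] * a[0][2] + a[1][2] * a[1][2];
  if (p1 == 0.0) {  // the matrix is already diagonal
    std::sort(eig.begin(), eig.end(), std::greater<double>());
    return eig;
  }
  const double q = (a[0][0] + a[1][1] + a[2][2]) / 3.0;  // mean of the eigenvalues
  const double p2 = (a[0][0] - q) * (a[0][0] - q) + (a[1][1] - q) * (a[1][1] - q)
                  + (a[2][2] - q) * (a[2][2] - q) + 2.0 * p1;
  const double p = std::sqrt(p2 / 6.0);
  Mat3 b = a;  // b = (a - q * I) / p has eigenvalues 2 * cos(angle)
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j) b[i][j] = (a[i][j] - ((i == j) ? q : 0.0)) / p;
  double r = det3(b) / 2.0;
  r = std::max(-1.0, std::min(1.0, r));  // guard against rounding slightly outside [-1, 1]
  const double phi = std::acos(r) / 3.0;
  const double two_pi_over_3 = 2.0 * std::acos(-1.0) / 3.0;
  eig[0] = q + 2.0 * p * std::cos(phi);                  // largest
  eig[2] = q + 2.0 * p * std::cos(phi + two_pi_over_3);  // smallest
  eig[1] = 3.0 * q - eig[0] - eig[2];                    // the trace fixes the middle one
  return eig;
}
```

For the matrix {2, -1, 0, -1, 2, -1, 0, -1, 2} this gives 2 + sqrt(2), 2 and 2 - sqrt(2), which is a handy reference when comparing against the program output; the order in which `eigenvalues3` reports them may differ.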

ex_cabs_representation¶
Unit test which reads an all-atom structure from a PDB file and produces a structure in CABS representation.
USAGE:
./ex_cabs_representation input.pdb
EXAMPLE:
./ex_cabs_representation 2gb1.pdb
REFERENCE: Kolinski, Andrzej. “Protein modeling and structure prediction with a reduced representation.” Acta Biochimica Polonica 51 (2004).
Kmiecik, Sebastian, et al. “Coarse-grained protein models and their applications.” Chemical reviews 116.14 (2016): 7898-7936. doi: 10.1021/acs.chemrev.6b00163
Keywords:
- PDB input
- CABS
- representation
Categories:
- simulations::representations::cabs::cabs_utils
Input files:
Output files:
Program source:
#include <iostream>
#include <iomanip>
#include <string>

#include <core/data/io/Pdb.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <simulations/representations/cabs/cabs_utils.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which reads an all-atom structure from a PDB file and produces a structure in CABS representation.

USAGE:
    ./ex_cabs_representation input.pdb

EXAMPLE:
    ./ex_cabs_representation 2gb1.pdb

REFERENCE:
Kolinski, Andrzej. "Protein modeling and structure prediction with a reduced representation."
Acta Biochimica Polonica 51 (2004).

Kmiecik, Sebastian, et al. "Coarse-grained protein models and their applications."
Chemical reviews 116.14 (2016): 7898-7936. doi: 10.1021/acs.chemrev.6b00163

)";

using namespace core::data::structural;
using namespace core::data::io;

/** @brief Reads an all-atom structure from a PDB file and produces a structure in CABS representation.
 *
 * CATEGORIES: simulations::representations::cabs::cabs_utils
 * KEYWORDS: PDB input; CABS; representation
 */
int main(const int argc, const char* argv[]) {

  // --- An input file is mandatory: print the help message when it is missing or when help was requested
  if ((argc < 2) || utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  // --- Read the input PDB and create a structure object
  core::data::io::Pdb reader(argv[1], is_not_alternative, only_ss_from_header, true);
  core::data::structural::Structure_SP strctr = reader.create_structure(0);

  // --- Check whether loaded structure is in the CABS representation
  if (simulations::representations::is_cabs_model(*strctr))
    std::cerr<<"Loaded structure of "<<argv[1]<<" has CABS representation! Load all-atom model.\n";
  else if (simulations::representations::is_cabsbb_model(*strctr)) {
    std::cerr<<"Loaded structure of "<<argv[1]<<" has CABSBB representation! Load fullatom model.\n";
  } else {
    // --- Convert the Structure into CABS representation and write the result in the PDB format
    Structure_SP structure_sp = simulations::representations::cabs_representation(*strctr);
    for (auto atom_sp = structure_sp->first_atom(); atom_sp != structure_sp->last_atom(); ++atom_sp)
      std::cout << (*atom_sp)->to_pdb_line() << "\n";

    // --- Here we generate CONECT lines so the PDB file displays nicely in PyMOL
    auto prev_ca_sp = *structure_sp->first_atom(); // --- the very first CA in the structure
    for (auto res_sp_it = (++(structure_sp->first_residue())); res_sp_it != structure_sp->last_residue(); ++res_sp_it) {
      auto ca = *((*res_sp_it)->cbegin()); // --- the CA atom of this residue
      core::data::io::Conect cn(prev_ca_sp->id(), ca->id());
      std::cout << cn.to_pdb_line();
      if ((*res_sp_it)->count_atoms() > 2) { // --- it also has CB
        auto cb = *((*res_sp_it)->cbegin() + 2); // --- CB is always the third one
        core::data::io::Conect cn_cb(ca->id(), cb->id());
        std::cout << cn_cb.to_pdb_line();
        if ((*res_sp_it)->count_atoms() > 3) { // --- it also has SC
          auto sc = *((*res_sp_it)->cbegin() + 3); // --- SC is always the fourth one
          core::data::io::Conect cn_sc(cb->id(), sc->id());
          std::cout << cn_sc.to_pdb_line();
        }
      }
      prev_ca_sp = ca;
    }
  }
}

ex_cabs_rotamers¶
Keywords:
- no_keywords
Categories:
- no_categories
Input files:
Output files:
Program source:
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

#include <core/chemical/ChiAnglesDefinition.hh>
#include <core/data/io/Pdb.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <core/data/structural/selectors/SelectChainBreaks.hh>
#include <core/data/structural/selectors/SelectClashingResidues.hh>
#include <core/data/structural/selectors/SelectContiguousResidues.hh>
#include <core/calc/structural/angles.hh>
#include <core/calc/structural/protein_angles.hh>
#include <core/calc/structural/transformations/CartesianToSpherical.hh>
#include <core/calc/structural/transformations/transformation_utils.hh>
#include <core/calc/structural/transformations/Rototranslation.hh>
#include <utils/io_utils.hh>
#include <utils/LogManager.hh>

utils::Logger logger("ex_cabs_rotamers");

int main(const int argc, const char* argv[]) {

  using namespace core::chemical;
  using namespace core::data::structural;
  using namespace core::calc::structural;
  using namespace core::calc::structural::transformations;

  utils::LogManager::get().set_level("FINE");

  if(argc==1) {
    return 0;
  }

  core::data::io::Pdb reader(argv[1],
    core::data::io::all_true(core::data::io::is_not_alternative,core::data::io::is_not_hydrogen),
    core::data::io::keep_all, false);                                        // Create a PDB reader for a given file
  core::data::structural::Structure_SP str = reader.create_structure(0);   // Create a structure object from the first model

  // ---------- Selectors we use to be sure that the residue is correct
  selectors::IsAA is_aa;
  selectors::IsBBCB atom_is_bb_cb;
  selectors::ResidueHasBBCB has_bb_cb;
  selectors::ResidueHasAllHeavyAtoms all_atoms;
  selectors::SelectChainBreaks breaks;
  selectors::SelectClashingResidues clash_test(str,2.3);
  selectors::SelectContiguousResidues check_resids;

  std::cout << "# r alpha theta alpha theta x y z ss res_id aa chain prot rotamer chi angles\n";
  CartesianToSpherical acs;
  std::vector<std::string> lcs_atom_names = {" N ", " CA ", " C "};

  auto ires = str->first_residue();
  ++ires;
  auto next_res = str->first_residue();
  ++(++next_res);
  for (; next_res != str->last_residue(); ++ires,++next_res) { // iterate over all residues starting from the second one
    if (!is_aa(**ires)) continue;
    if (((**ires).residue_type() == Monomer::GLY) || ((**ires).residue_type() == Monomer::ALA)) continue;
    if(!has_bb_cb(**ires)) {
      logger << utils::LogLevel::WARNING << "Residue "<<(**ires) << " has incomplete backbone or lacks its CB atom\n";
      continue;
    }
    if(!all_atoms(**ires)) {
      logger << utils::LogLevel::WARNING << "Residue side chain "<<(**ires) << " is incomplete\n";
      continue;
    }
    if(!check_resids(**ires)) {
      logger << utils::LogLevel::WARNING << "Residue "<<(**ires) << " is broken\n";
      continue;
    }
    if(breaks(**ires)) {
      logger << utils::LogLevel::WARNING << "Residue "<<(**ires) << " is at chain break\n";
      continue;
    }
    if(clash_test(**ires)) {
      logger << utils::LogLevel::WARNING << "Residue "<<(**ires) << " is in a steric clash\n";
      continue;
    }

    PdbAtom_SP CA = (*ires)->find_atom_safe(" CA ");
    PdbAtom_SP N = (*ires)->find_atom_safe(" N ");
    PdbAtom_SP C = (*ires)->find_atom_safe(" C ");
    PdbAtom_SP CB = (*ires)->find_atom_safe(" CB ");

    // ---------- Reject residues with distorted geometry at the CA atom
    double t = core::calc::structural::evaluate_dihedral_angle(*N, *C, *CB, *CA) * 180.0 / 3.14159;
    if ((t < -50) || (t > -20)) {
      logger << utils::LogLevel::WARNING << "Residue " << (**ires) << " has incorrect geometry at CA. Dihedral angle: " << t << "\n";
      continue;
    }

    // ---------- Center of mass of the side chain atoms (everything except the backbone and CB)
    core::data::basic::Vec3 cm;
    double n = 0;
    std::for_each((*ires)->cbegin(),(*ires)->cend(),[&](const PdbAtom_SP a){ if(!atom_is_bb_cb(*a)) { cm+=(*a); ++n;} });
    if (n == 0) continue;
    cm /= n;

    // ---------- Express the center of mass in the local coordinate system of the residue, then in spherical coordinates
    Rototranslation_SP lcs = local_coordinates_three_atoms(**ires, lcs_atom_names);
    Vec3 sph;
    lcs->apply(cm);
    acs.apply(cm, sph);
    std::cout << utils::string_format("%7.4f %7.2f %7.2f %7.4f %7.4f %6.3f %6.3f %6.3f ",
      sph.x, to_degrees(sph.y), to_degrees(sph.z), sph.y, sph.z, cm.x, cm.y, cm.z);
    std::cout << utils::string_format("%c %6d %3s %s %s %4s", (*ires)->ss(), (*ires)->id(),
      (*ires)->residue_type().code3.c_str(), (*ires)->owner()->id().c_str(),
      utils::basename(str->code()).c_str(), core::calc::structural::define_rotamer(**ires).c_str());
    for (unsigned short i = 1; i <= core::chemical::ChiAnglesDefinition::count_chi_angles((*ires)->residue_type()); ++i)
      std::cout << utils::string_format(" %6.1f", core::calc::structural::evaluate_chi(**ires, i) * 180.0 / 3.1415);
    std::cout << "\n";
  }
}

ex_cabsbb_representation¶
Converts all-atom protein structure to CABS-bb representation
USAGE:
ex_cabsbb_representation input.pdb
EXAMPLE:
ex_cabsbb_representation 2gb1.pdb
Keywords:
- PDB input
- CABS-bb
Categories:
- simulations/representations/cabs/cabs_utils
Input files:
Output files:
Program source:
#include <iostream>

#include <core/data/io/Pdb.hh>
#include <utils/exit.hh>
#include <simulations/representations/cabs/cabs_utils.hh>

using namespace core::data::structural;
using namespace core::data::io;

/** @brief Reads an all-atom structure from a PDB file and produces a structure in CABS-BB representation.
 */
std::string program_info = R"(

Converts all-atom protein structure to CABS-bb representation

USAGE:
    ex_cabsbb_representation input.pdb

EXAMPLE:
    ex_cabsbb_representation 2gb1.pdb

)";

/** @brief Converts all-atom protein structure to CABS-bb representation
 *
 * CATEGORIES: simulations/representations/cabs/cabs_utils;
 * KEYWORDS:   PDB input; CABS-bb
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  // --- Read the input PDB and create a structure object
  core::data::io::Pdb reader(argv[1], all_true(is_not_hydrogen,is_not_water,is_not_alternative), keep_all, true);
  core::data::structural::Structure_SP strctr = reader.create_structure(0);

  // --- Check whether loaded structure is in the CABSBB representation
  if (simulations::representations::is_cabsbb_model(*strctr))
    std::cerr << "Loaded structure of " << argv[1] << " has CABSBB representation! Load fullatom model.\n";
  else if (simulations::representations::is_cabs_model(*strctr))
    std::cerr << "Loaded structure of " << argv[1] << " has CABS representation! Load fullatom model.\n";
  else {
    // --- Convert the Structure into CABSBB representation and write the result in the PDB format
    core::data::structural::Structure_SP structure_sp = simulations::representations::cabsbb_representation(*strctr);
    for (auto atom_sp = structure_sp->first_atom(); atom_sp != structure_sp->last_atom(); ++atom_sp)
      std::cout << (*atom_sp)->to_pdb_line() << "\n";

    // --- Here we generate CONNECT lines so the PDB file displays nicely in PyMOL
    for (auto it = structure_sp->first_const_residue(); it != structure_sp->last_const_residue(); ++it) {
      if ((*it)->count_atoms() == 6) {       // --- if this CABSBB residue has 6 atoms, it must have the SC atom
        auto cb = *((*it)->cbegin() + 4);    // --- CB is always the fifth one
        auto sc = *((*it)->cbegin() + 5);    // --- SC is always the sixth one
        core::data::io::Conect cn(cb->id(),sc->id());
        std::cout << cn.to_pdb_line();
      }
    }
  }
}
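The program above writes CONECT records through core::data::io::Conect. As a rough illustration of what such a record looks like in the PDB format, the sketch below formats a two-atom record by hand. The helper name conect_line is hypothetical and not part of BioShell; the column layout (record name in columns 1-6, then serial numbers in 5-character fields) follows the PDB format, and whether BioShell pads its output identically is an assumption.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not a BioShell function): formats a CONECT record
// that links two atoms identified by their PDB serial numbers.
std::string conect_line(int serial_a, int serial_b) {
  char buffer[32];
  // "CONECT" occupies columns 1-6; each serial is right-justified in 5 columns
  std::snprintf(buffer, sizeof(buffer), "CONECT%5d%5d\n", serial_a, serial_b);
  return std::string(buffer);
}
```

Calling conect_line(5, 6) yields a line that tells a molecular viewer to draw a bond between atoms 5 and 6, which is what makes the CB-SC pseudo-bond visible in PyMOL.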

ex_chi2_independence_test¶
Performs a chi-square test: calculates the p-value for a given chi-square value and number of degrees of freedom (DOFs). Alternatively, it can read a contingency matrix from a file and test the independence of its first two rows. When no input data is provided, the example runs the chi-square independence test on built-in test data.
USAGE:
ex_chi2_independence_test [n_dofs chi2_value]
ex_chi2_independence_test [input_contingency_matrix_file]
EXAMPLE:
ex_chi2_independence_test 4 1.52
ex_chi2_independence_test matrix.dat
REFERENCE: Pearson, Karl. “X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling.” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 50.302 (1900): 157-175.
Keywords:
- statistics
- data table
Categories:
- core::calc::statistics::chi2_independence_test
Input files:
Output files:
Program source:
#include <iostream>

#include <core/data/basic/Array2D.hh>
#include <core/calc/statistics/simple_statistics.hh>

std::string program_info = R"(

Performs a chi-square test: calculates the p-value for a given number of DOFs. Alternatively, it can read
a contingency matrix from a file and test the independence of its first two rows.
When no input data is provided, the example runs the chi-square independence test on built-in test data.

USAGE:
    ex_chi2_independence_test [n_dofs chi2_value]
    ex_chi2_independence_test [input_contingency_matrix_file]

EXAMPLE:
    ex_chi2_independence_test 4 1.52
    ex_chi2_independence_test matrix.dat

REFERENCE:
Pearson, Karl. "X. On the criterion that a given system of deviations from the probable in the case
of a correlated system of variables is such that it can be reasonably supposed to have arisen
from random sampling." The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science
50.302 (1900): 157-175.

)";

/** @brief Chi-square test for independence.
 *
 * CATEGORIES: core::calc::statistics::chi2_independence_test;
 * KEYWORDS:   statistics; data table
 */
int main(const int argc, const char* argv[]) {

  using core::data::basic::Array2D;

  if(argc==3) { // --- calculate chi-square test for given chi-square statistics value and the number of DOFs
    int k = utils::from_string<int>(argv[1]);
    double crit = utils::from_string<double>(argv[2]);
    std::cout << "# DOFs: " << k << "\n";
    std::cout << "# chi2 value: " << crit << "\n";
    std::cout << "# p-value: " << core::calc::numeric::chi_square_pvalue(k,crit) << "\n";
    return 0;
  }

  if(argc==2) { // --- calculate chi-square test for given data (test the independence of the two first rows)
    Array2D<core::index4> m = Array2D<core::index4>::from_file(argv[1]);
    int k = (m.count_rows() - 1) * (m.count_columns() - 1);
    double crit = core::calc::statistics::chi2_independence_test(m);
    std::cout << "# DOFs: " << k << "\n";
    std::cout << "# chi2 value: " << crit << "\n";
    std::cout << "# p-value: " << core::calc::numeric::chi_square_pvalue(k, crit) << "\n";
    return 0;
  }

  // --- No arguments: print the help message and run the test on built-in data
  std::cerr << program_info;
  std::vector<core::index4> data = {71,154,398,4992,2808,2737};
  Array2D<core::index4> m(2,3, data);
  int k = 2; // (2-1)*(3-1) = 2
  double crit = core::calc::statistics::chi2_independence_test(m);
  std::cout << "# DOFs: " << k << "\n";
  std::cout << "# chi2 value: " << crit << "\n";
  std::cout << "# p-value: " << core::calc::numeric::chi_square_pvalue(k,crit) << "\n";
}
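The statistic reported by the program above can be sketched without BioShell. The block below is a minimal stand-alone illustration of Pearson's chi-square statistic for a contingency table, where the expected count of each cell is (row sum × column sum) / total. The names chi2_statistic and chi2_dofs are hypothetical, and this is not claimed to match how core::calc::statistics::chi2_independence_test is implemented internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: Pearson chi-square statistic for a table of observed counts.
double chi2_statistic(const std::vector<std::vector<double>> & observed) {
  const std::size_t n_rows = observed.size();
  assert(n_rows > 0);
  const std::size_t n_cols = observed[0].size();

  // --- Marginal sums and the grand total
  std::vector<double> row_sum(n_rows, 0.0), col_sum(n_cols, 0.0);
  double total = 0.0;
  for (std::size_t i = 0; i < n_rows; ++i) {
    assert(observed[i].size() == n_cols);   // the table must be rectangular
    for (std::size_t j = 0; j < n_cols; ++j) {
      row_sum[i] += observed[i][j];
      col_sum[j] += observed[i][j];
      total += observed[i][j];
    }
  }

  // --- Sum of (observed - expected)^2 / expected over all cells
  double chi2 = 0.0;
  for (std::size_t i = 0; i < n_rows; ++i) {
    for (std::size_t j = 0; j < n_cols; ++j) {
      const double expected = row_sum[i] * col_sum[j] / total;
      if (expected > 0.0) {                 // cells with no expected counts are skipped
        const double diff = observed[i][j] - expected;
        chi2 += diff * diff / expected;
      }
    }
  }
  return chi2;
}

// Degrees of freedom of the independence test: (rows - 1) * (columns - 1)
std::size_t chi2_dofs(const std::vector<std::vector<double>> & observed) {
  return (observed.size() - 1) * (observed[0].size() - 1);
}
```

A table whose rows are proportional to each other gives a statistic of zero, while a strongly diagonal table gives a large value; the p-value then follows from the chi-square distribution with the DOFs computed above.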

ex_consecutive_find¶
Unit test which shows how to find islands of consecutive elements in a container.
USAGE:
./ex_consecutive_find
Keywords:
- algorithms
- data structures
Categories:
- core/algorithms/basic_algorithms.hh
Output files:
Program source:
#include <vector>
#include <iostream>
#include <iterator>

#include <core/algorithms/basic_algorithms.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which shows how to find islands of consecutive elements in a container.

USAGE:
    ./ex_consecutive_find

)";

struct AreConsecutive {
  bool operator()(int i, int n) { return (n - i) == 1; }
};

struct SSranges {
  bool operator()(char ci, char cn) { return (ci==cn); }
};

/** @brief Shows how to find islands of consecutive elements in a container
 *
 * CATEGORIES: core/algorithms/basic_algorithms.hh
 * KEYWORDS:   algorithms; data structures
 */
int main(const int argc, const char *argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  std::vector<int> v{-3, 1, 2, 4, 5, 7, 8, 9, 10, 12, 12, 13, 16, 18, 20, 21, 22, 23};
  std::vector<std::pair<int, int>> islands;
  int n_islands = core::algorithms::consecutive_find(v.begin(), v.end(), AreConsecutive(), islands);
  for (const auto &is : islands) {
    for (int i = is.first; i <= is.second; ++i) std::cout << v[i] << " " ;
    std::cout << "\n";
  }

  std::string ss("CEEEEEECCCCCCEEEEEECCHHHHHHHHHHHHHHHCCCCCEEEEECCCCEEEEEC");
  std::vector<char> v_ss(ss.begin(), ss.end());
  islands.clear();
  n_islands = core::algorithms::consecutive_find(v_ss.begin(), v_ss.end(), SSranges(), islands);
  for (const auto &is : islands)
    std::cout << v_ss[is.first] << " " << is.first << " " << is.second << "\n";
}
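The idea of an "island" used above can be sketched in a few lines: walk the container once and close the current run whenever two neighbours fail the linking predicate. The function find_islands below is a hypothetical stand-in, not the BioShell routine. It assumes that islands are reported as inclusive (first, last) index pairs, as the loops in the program suggest, and that single-element runs are reported too, which may differ from what core::algorithms::consecutive_find does.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: returns inclusive index ranges of maximal runs in which
// every pair of neighbouring elements satisfies are_linked(previous, next).
template <typename T, typename Pred>
std::vector<std::pair<std::size_t, std::size_t>>
find_islands(const std::vector<T> & v, Pred are_linked) {
  std::vector<std::pair<std::size_t, std::size_t>> islands;
  if (v.empty()) return islands;

  std::size_t start = 0;                       // index where the current run began
  for (std::size_t i = 1; i < v.size(); ++i) {
    if (!are_linked(v[i - 1], v[i])) {         // the run is broken between i-1 and i
      islands.emplace_back(start, i - 1);
      start = i;
    }
  }
  islands.emplace_back(start, v.size() - 1);   // close the last run
  return islands;
}
```

With a predicate such as "next minus previous equals one" this finds stretches of consecutive integers; with an equality predicate on characters it finds secondary-structure segments, mirroring the two use cases in the program.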

ex_count_residues_by_type¶
Reads a Multiple Sequence Alignment (MSA) in ClustalW format and counts residues by type.
EXAMPLE:
./ex_count_residues_by_type cyped.CYP109.aln
Keywords:
- clustal input
- MSA
Categories:
- core::data::sequence::sequence_utils
Input files:
Output files:
Program source:
#include <iostream>
#include <map>
#include <vector>

#include <core/data/io/clustalw_io.hh>
#include <core/data/sequence/sequence_utils.hh>
#include <utils/exit.hh>
#include <utils/io_utils.hh>

std::string program_info = R"(

Reads a Multiple Sequence Alignment (MSA) in ClustalW format and counts residues by type.

EXAMPLE:
    ./ex_count_residues_by_type cyped.CYP109.aln

)";

/** @brief Reads a MSA in ClustalW format and prints by-residue counts
 *
 * CATEGORIES: core::data::sequence::sequence_utils
 * KEYWORDS:   clustal input; MSA
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::io;
  using namespace core::data::sequence;

  std::vector<Sequence_SP> msa; // --- Sequence_SP is just a shorter name for std::shared_ptr<Sequence>
  core::data::io::read_clustalw_file(argv[1],msa);

  std::map<core::chemical::Monomer,core::index4> counts = core::data::sequence::count_residues_by_type(msa);
  for (const auto &key_val : counts)
    std::cout << key_val.first.code3 << " " << key_val.second << "\n";
}
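The counting step itself is simple enough to sketch on plain strings. The block below tallies one-letter symbols over all rows of an alignment and skips gap characters. The function count_symbols is hypothetical; it keys the map by char rather than by core::chemical::Monomer, and treating both '-' and '.' as gaps is an assumption made for this illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: counts residue symbols in aligned sequences, ignoring gaps.
std::map<char, std::size_t> count_symbols(const std::vector<std::string> & aligned_rows) {
  std::map<char, std::size_t> counts;
  for (const std::string & row : aligned_rows)
    for (const char c : row)
      if (c != '-' && c != '.') ++counts[c];   // gap symbols do not count as residues
  return counts;
}
```

Because std::map keeps its keys ordered, iterating over the result prints the counts in a stable, alphabetical order.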

ex_define_rotamer¶
Prints rotamer type (M-P-T code) for each amino acid residue in the input PDB structure
USAGE:
ex_define_rotamer input.pdb
EXAMPLE:
ex_define_rotamer 5edw.pdb
OUTPUT (fragment):
 277 ASP 2   TP
 278 LYS 4 incomplete
 279 ARG 4 TTMT
 280 ILE 2   MM
 281 PRO 3  PMP
 282 LYS 4 MTMM
 283 ALA 0
 284 ILE 2   TT
Keywords:
- PDB input
- structural properties
- rotamers
- STL
Categories:
- core::chemical::ChiAnglesDefinition
- core::data::structural::ResidueHasAllHeavyAtoms
Input files:
Output files:
Program source:
#include <iostream>
#include <iomanip>

#include <core/data/io/Pdb.hh>
#include <core/calc/structural/protein_angles.hh>
#include <core/chemical/ChiAnglesDefinition.hh>
#include <utils/exit.hh>
#include <core/data/structural/selectors/structure_selectors.hh>

std::string program_info = R"(

Prints rotamer type (M-P-T code) for each amino acid residue in the input PDB structure

USAGE:
    ex_define_rotamer input.pdb

EXAMPLE:
    ex_define_rotamer 5edw.pdb

OUTPUT (fragment):
 277 ASP 2   TP
 278 LYS 4 incomplete
 279 ARG 4 TTMT
 280 ILE 2   MM
 281 PRO 3  PMP
 282 LYS 4 MTMM
 283 ALA 0
 284 ILE 2   TT

)";

/** @brief Prints rotamer type (M-P-T code) for each amino acid residue in the input PDB structure
 *
 * CATEGORIES: core::chemical::ChiAnglesDefinition; core::data::structural::ResidueHasAllHeavyAtoms
 * KEYWORDS:   PDB input; structural properties; rotamers; STL
 */
int main(const int argc, const char *argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::io;
  using namespace core::data::structural;

  Pdb reader(argv[1]); // file name (PDB format, may be gzip-ped)
  Structure_SP strctr = reader.create_structure(0);
  selectors::ResidueHasAllHeavyAtoms has_full_sc;
  selectors::IsAA is_aa;

  // Iterate over all residues
  for (auto ires = strctr->first_residue(); ires != strctr->last_residue(); ++ires) {
    core::data::structural::Residue &res_sp = (**ires);
    if (!is_aa(res_sp)) continue;
    std::cout << std::setw(4) << res_sp.id() << " " << res_sp.residue_type().code3 << " "
              << core::chemical::ChiAnglesDefinition::count_chi_angles(res_sp.residue_type());
    if (has_full_sc(res_sp))
      std::cout << std::setw(5) << core::calc::structural::define_rotamer(res_sp);
    else
      std::cout << " incomplete";
    std::cout << "\n";
  }
}
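To make the M-P-T code in the output above more concrete, the sketch below turns a list of chi angles (in degrees) into such a string. It assumes the usual three-well convention: P for angles near +60 (from 0 up to 120), M for angles near -60 (from -120 up to 0) and T for the remaining trans region around 180. These bin boundaries and the helper names are assumptions made for illustration; the exact ranges used by core::calc::structural::define_rotamer are not stated in this document.

```cpp
#include <cmath>
#include <string>
#include <vector>

// Hypothetical helper: maps one dihedral angle in degrees to 'M', 'P' or 'T'.
char rotamer_letter(double chi_deg) {
  double a = std::fmod(chi_deg, 360.0);   // now within (-360, 360)
  if (a >= 180.0) a -= 360.0;             // wrap into [-180, 180)
  if (a < -180.0) a += 360.0;
  if (a >= 0.0 && a < 120.0) return 'P';    // "plus" well, around +60
  if (a < 0.0 && a >= -120.0) return 'M';   // "minus" well, around -60
  return 'T';                               // "trans" well, around 180
}

// Hypothetical helper: one letter per chi angle, in the order chi1, chi2, ...
std::string rotamer_code(const std::vector<double> & chi_angles_deg) {
  std::string code;
  for (const double chi : chi_angles_deg) code += rotamer_letter(chi);
  return code;
}
```

Under these assumptions a residue with chi1 = -65 and chi2 = 175 is labelled MT, and a residue without chi angles, such as alanine, gets an empty code, consistent with the blank entry for ALA in the output fragment.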

ex_expectation_maximization¶
Example showing how to use the expectation-maximization (EM) method to fit mixtures of normal distributions to randomly generated data.
USAGE:
./ex_expectation_maximization
REFERENCE: Dempster, Arthur P., Nan M. Laird, and Donald B. Rubin. “Maximum likelihood from incomplete data via the EM algorithm.” Journal of the Royal Statistical Society: Series B 39 (1977): 1-22. doi: 10.1111/j.2517-6161.1977.tb01600.x
Keywords:
- estimation
- expectation-maximization
Categories:
- core::calc::statistics::NormalDistribution
- core::calc::statistics::BivariateNormal
- core/calc/statistics/expectation_maximization.hh
Output files:
Program source:
#include <math.h>

#include <iostream>
#include <random>
#include <vector>

#include <core/calc/statistics/NormalDistribution.hh>
#include <core/calc/statistics/BivariateNormal.hh>
#include <core/calc/statistics/Combined_1D_2D_Normal.hh>
#include <core/calc/statistics/TrivariateNormal.hh>
#include <core/calc/statistics/expectation_maximization.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Example showing how to use the expectation-maximization method

USAGE:
    ./ex_expectation_maximization

REFERENCE:
Dempster, Arthur P., Nan M. Laird, and Donald B. Rubin.
"Maximum likelihood from incomplete data via the EM algorithm."
Journal of the Royal Statistical Society: Series B 39 (1977): 1-22. doi: 10.1111/j.2517-6161.1977.tb01600.x

)";

/** @brief Example showing how to use expectation-maximization method
 *
 * CATEGORIES: core::calc::statistics::NormalDistribution; core::calc::statistics::BivariateNormal;core/calc/statistics/expectation_maximization.hh
 * KEYWORDS:   estimation; expectation-maximization
 */
int main(const int argc, const char* argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  using namespace core::calc::statistics;

  const unsigned int seed = 9876543;   // --- fixed seed, so every run gives the same data
  std::mt19937 gen(seed);
  core::index4 N = 10000; //--- the number of random points to use in tests

  // ---------- a few distributions to play with
  double ave1 = 2.0, ave2 = 4.0, ave3 = 6.0;
  double std1 = 0.2, std2 = 0.3, std3 = 0.5;
  std::normal_distribution<> m(ave1, std1);
  std::normal_distribution<> p(ave2, std2);
  std::normal_distribution<> t(ave3, std3);

  // ---------- First let's solve a 1D problem : mixture of 3 Gaussians
  std::vector<std::vector<double> > random_points;
  std::vector<double> r1(1), r2(1), r3(1);
  for (core::index4 i = 0; i < N; ++i) {
    r1[0] = (m(gen));
    r2[0] = (p(gen));
    r3[0] = (t(gen));
    random_points.push_back(r1);
    random_points.push_back(r2);
    random_points.push_back(r3);
  }

  std::vector<core::calc::statistics::NormalDistribution> distributions_1D;
  std::vector<core::index1> index_1D; // --- Distribution assignment computed by EM will be stored here
  core::calc::statistics::NormalDistribution d1(ave1, std1), d2(ave2, std2), d3(ave3, std1);
  distributions_1D.push_back(d1);
  distributions_1D.push_back(d2);
  distributions_1D.push_back(d3);

  std::cout << "1D distributions: starting params\n";
  //print parameters_ of all distributions
  for (const auto & d : distributions_1D) std::cout << d << "\n";

  double score = expectation_maximization(random_points, distributions_1D, index_1D, true);

  std::cout << "1D distributions: resulting params\n";
  //print parameters_ of all distributions
  for (const auto & d : distributions_1D) std::cout << d << "\n";
  std::cout << "\n";

  // ---------- now a mixture of 2 Gaussians in 2D
  std::vector<core::calc::statistics::BivariateNormal> distributions_2D;
  std::vector<core::index1> index_2D;
  core::calc::statistics::BivariateNormal d4(ave2, ave3, std2, std3, 0.1), d5(ave3, ave2, std3, std2, 0.1);
  distributions_2D.push_back(d4);
  distributions_2D.push_back(d5);

  std::vector<std::vector<double> > data_2D;
  std::vector<double> rr1(2), rr2(2);
  for (core::index4 i = 0; i < N; ++i) {
    rr1[0] = (p(gen));
    rr1[1] = (t(gen));
    rr2[0] = (t(gen));
    rr2[1] = (p(gen));
    data_2D.push_back(rr1);
    data_2D.push_back(rr2);
  }

  std::cout << "2D distributions: starting params\n";
  //print parameters_ of all distributions
  for (const auto & d : distributions_2D) std::cout << d << "\n";

  expectation_maximization(data_2D, distributions_2D, index_2D, true);

  std::cout << "2D distributions: resulting params\n";
  //print parameters_ of all distributions
  for (const auto & d : distributions_2D) std::cout << d << "\n";
}
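For readers who want to see what happens inside an EM fit, here is a minimal stand-alone sketch for a one-dimensional Gaussian mixture. It alternates the E-step (responsibility of each component for each point) with the M-step (weighted re-estimation of weights, means and standard deviations). The names Gaussian1D, normal_pdf and em_fit are hypothetical; the fixed iteration count and the lower bound on sigma are simplifications chosen for this illustration, and the code is not a description of BioShell's expectation_maximization() routine.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for one mixture component
struct Gaussian1D { double weight, mean, sigma; };

// Normal probability density at x
double normal_pdf(double x, double mean, double sigma) {
  const double z = (x - mean) / sigma;
  return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * 3.14159265358979323846));
}

// Runs n_iter EM iterations; the parameters in mix are updated in place
void em_fit(const std::vector<double> & data, std::vector<Gaussian1D> & mix, int n_iter) {
  const std::size_t n = data.size(), k = mix.size();
  std::vector<std::vector<double>> resp(n, std::vector<double>(k, 0.0));

  for (int iter = 0; iter < n_iter; ++iter) {
    // --- E-step: responsibility of component j for point i
    for (std::size_t i = 0; i < n; ++i) {
      double total = 0.0;
      for (std::size_t j = 0; j < k; ++j) {
        resp[i][j] = mix[j].weight * normal_pdf(data[i], mix[j].mean, mix[j].sigma);
        total += resp[i][j];
      }
      total = std::max(total, 1e-300);          // guard against division by zero
      for (std::size_t j = 0; j < k; ++j) resp[i][j] /= total;
    }
    // --- M-step: re-estimate the parameters from the weighted points
    for (std::size_t j = 0; j < k; ++j) {
      double sum_r = 0.0, sum_x = 0.0;
      for (std::size_t i = 0; i < n; ++i) {
        sum_r += resp[i][j];
        sum_x += resp[i][j] * data[i];
      }
      if (sum_r < 1e-12) continue;              // empty component: keep its old parameters
      const double mean = sum_x / sum_r;
      double sum_sq = 0.0;
      for (std::size_t i = 0; i < n; ++i)
        sum_sq += resp[i][j] * (data[i] - mean) * (data[i] - mean);
      mix[j].weight = sum_r / static_cast<double>(n);
      mix[j].mean = mean;
      mix[j].sigma = std::max(std::sqrt(sum_sq / sum_r), 1e-3);  // keep sigma away from zero
    }
  }
}
```

As in the program above, the quality of the result depends on the starting parameters: components initialised far from any data may end up empty, which is why the example starts from values close to the true ones.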

ex_find_side_group¶
Reads a PDB file and prints the names of all atoms in each residue's side chain. The find_side_group() function, tested by this program, creates a molecular graph to detect a side chain and returns copies of the side chain atoms in a vector.
USAGE:
ex_find_side_group 2gb1.pdb
Keywords:
- data_structures
- graphs
- residue side chains
Categories:
- core/chemical/find_side_group
Input files:
Output files:
Program source:
#include <iostream>
#include <iomanip>

#include <core/data/io/Pdb.hh>
#include <core/chemical/Molecule.hh>
#include <core/chemical/molecule_utils.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Reads a PDB file and prints the names of all atoms in each residue's side chain. The find_side_group() function,
tested by this program, creates a molecular graph to detect a side chain and returns
copies of the side chain atoms in a vector.

USAGE:
    ex_find_side_group 2gb1.pdb

)";

/** @brief A simple example shows how to select a chemical group of a molecule using find_side_group() method.
 *
 * This example prints atoms for each side chain in a protein
 * CATEGORIES: core/chemical/find_side_group;
 * KEYWORDS: data_structures;graphs ; residue side chains
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::structural;

  core::data::io::Pdb reader(argv[1],core::data::io::is_not_alternative); // file name (PDB format, may be gzip-ped)
  Structure_SP strctr = reader.create_structure(0); // create a Structure object from the first deposit found in the input file

  // --- Here we create a molecule object; 0.1 is the tolerance for bond lengths (used to detect bonds)
  auto molecule_sp = core::chemical::create_molecule(strctr->first_atom(),strctr->last_atom(),0.1);

  // --- Iterate over all residues in the structure
  for(auto res_it = strctr->first_residue();res_it!=strctr->last_residue();++res_it) {
    auto ca = (*res_it)->find_atom(" CA "); // alpha carbon is the preceding atom
    auto cb = (*res_it)->find_atom(" CB "); // beta carbon is the atom where a side chain is attached
    if((ca== nullptr)||(cb== nullptr)) continue;
    std::vector<PdbAtom_SP> sc;
    core::chemical::find_side_group<PdbAtom_SP>(ca,cb,*molecule_sp,sc);
    std::cout << utils::string_format("%4d %s :",(*res_it)->id(),(*res_it)->residue_type().code3.c_str());
    for(const PdbAtom_SP & a : sc) std::cout << " " << a->atom_name();
    std::cout << "\n";
  }
}
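The graph-based detection described above can be illustrated with a breadth-first search over a bond graph: start at the atom where the group is attached (CB) and never step back over the preceding atom (CA). The function side_group below is a hypothetical sketch on integer atom indices and an adjacency list. Whether BioShell's find_side_group() includes the starting atom in its result, or uses breadth-first rather than depth-first traversal, is an assumption here.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical sketch: collects all atoms reachable from `root` in the bond graph
// without passing through `anchor`. bonds[i] lists the atoms bonded to atom i.
std::vector<std::size_t> side_group(const std::vector<std::vector<std::size_t>> & bonds,
                                    std::size_t anchor, std::size_t root) {
  std::vector<bool> visited(bonds.size(), false);
  visited[anchor] = true;    // blocks the way back to the rest of the molecule
  visited[root] = true;

  std::vector<std::size_t> group;
  std::queue<std::size_t> to_visit;
  to_visit.push(root);
  while (!to_visit.empty()) {
    const std::size_t atom = to_visit.front();
    to_visit.pop();
    group.push_back(atom);
    for (const std::size_t next : bonds[atom]) {
      if (!visited[next]) {
        visited[next] = true;
        to_visit.push(next);
      }
    }
  }
  std::sort(group.begin(), group.end());   // report atoms in index order
  return group;
}
```

Note that in a ring-closing residue such as proline, where the side chain bonds back to the backbone nitrogen, a plain traversal like this one would leak into the backbone, so a real implementation needs extra care for that case.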

ex_goodman_kruskal_rank_correlation¶
The program reads a contingency matrix from a file and calculates Goodman and Kruskal’s gamma parameter, which is a measure of rank correlation.
USAGE:
ex_goodman_kruskal_rank_correlation input_contingency_matrix_file
EXAMPLE:
ex_goodman_kruskal_rank_correlation contingency_matrix.txt
REFERENCE: Kruskal, William H., and Leo Goodman. “Measures of association for cross classifications.” Journal of the American Statistical Association 49 (1954): 732-764. doi:10.2307/2281536.
Keywords:
- statistics
- data table
Categories:
- core::calc::statistics::goodman_kruskal_rank_correlation
Input files:
Output files:
Program source:
#include <iostream>

#include <core/data/basic/Array2D.hh>
#include <core/calc/statistics/simple_statistics.hh>
#include <utils/exit.hh>

std::string program_info = R"(

The program reads a contingency matrix from a file and calculates Goodman and Kruskal's gamma parameter,
which is a measure of rank correlation.

USAGE:
    ex_goodman_kruskal_rank_correlation input_contingency_matrix_file

EXAMPLE:
    ex_goodman_kruskal_rank_correlation contingency_matrix.txt

REFERENCE:
Kruskal, William H., and Leo Goodman. "Measures of association for cross classifications."
Journal of the American Statistical Association 49 (1954): 732-764. doi:10.2307/2281536.

)";

/** @brief Calculates Goodman and Kruskal's gamma parameters
 *
 * CATEGORIES: core::calc::statistics::goodman_kruskal_rank_correlation;
 * KEYWORDS:   statistics; data table
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using core::data::basic::Array2D;

  // --- Read an input file - data table format
  Array2D<core::index4> m = Array2D<core::index4>::from_file(argv[1]);
  std::cout << core::calc::statistics::goodman_kruskal_rank_correlation(m)<<"\n";
}
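Goodman and Kruskal's gamma is defined as (C - D) / (C + D), where C and D are the numbers of concordant and discordant pairs of observations. For a contingency table both can be counted cell by cell, as the sketch below shows. The function name goodman_kruskal_gamma is hypothetical, and returning 0 when there are no untied pairs is a choice made for this illustration rather than documented BioShell behaviour.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: gamma = (C - D) / (C + D) for a table of counts whose
// rows and columns are both ordered categories.
double goodman_kruskal_gamma(const std::vector<std::vector<double>> & table) {
  double concordant = 0.0, discordant = 0.0;
  for (std::size_t i = 0; i < table.size(); ++i) {
    for (std::size_t j = 0; j < table[i].size(); ++j) {
      // --- pair cell (i, j) with every cell in a lower row
      for (std::size_t k = i + 1; k < table.size(); ++k) {
        for (std::size_t l = 0; l < table[k].size(); ++l) {
          if (l > j) concordant += table[i][j] * table[k][l];       // both ranks increase
          else if (l < j) discordant += table[i][j] * table[k][l];  // ranks move in opposite directions
        }
      }
    }
  }
  const double total = concordant + discordant;
  return (total > 0.0) ? (concordant - discordant) / total : 0.0;
}
```

A purely diagonal table gives gamma = 1, an anti-diagonal table gives gamma = -1, and a uniform table gives 0.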

ex_greedy_clustering¶
Example showing how to use greedy clustering method.
Keywords:
- clustering
Categories:
- core::calc::clustering::greedy_clustering()
Output files:
Program source:
#include <cmath>
#include <vector>
#include <iostream>
#include <random>

#include <core/calc/clustering/greedy_clustering.hh>

/// A distance operator calculates the distance between two points indexed by <code>i</code> and <code>j</code>
struct PointDistance {

  std::vector<double> & points;

  /// Constructor just copies the reference of a data vector
  PointDistance(std::vector<double> & pts) : points(pts) {}

  /// Call-operator computes the distance
  double operator()(const size_t i,const size_t j) const { return std::fabs(points[i]-points[j]); }
};

/** @brief Example showing how to use greedy clustering method.
 *
 * CATEGORIES: core::calc::clustering::greedy_clustering()
 * KEYWORDS:   clustering
 */
int main(const int argc, const char* argv[]) {

  // --- Prepare random number generators
  std::mt19937 gen(1234567);
  std::normal_distribution<> d1(10.5, 2.0);
  std::normal_distribution<> d2(-0.5, 2.0);
  std::vector<double> data;

  // --- Generate 20 random values
  for (unsigned short i = 0; i < 10; ++i) {
    data.push_back(d1(gen));
    data.push_back(d2(gen));
  }

  std::vector<size_t> clusters;        // --- Clusters will be stored here
  std::vector<size_t> cluster_members; // --- vector for members assigned to clusters
  PointDistance distance(data);        // --- instance of the distance operator
  core::calc::clustering::greedy_clustering(data,distance,5.0,clusters,cluster_members);

  // --- Show results
  std::cout << "n_clusters: " << clusters.size() << "\n";
  std::cout << "cluster assignment: ";
  for (const size_t i : cluster_members) std::cout << i << " ";
  std::cout << "\n";
}
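One common form of greedy clustering is the so-called leader scheme: points are visited in order, each point joins the first cluster whose representative lies within a cutoff, and a point that fits nowhere opens a new cluster. The sketch below implements that scheme for one-dimensional data. It is only an illustration under that assumption; the document does not state which greedy strategy core::calc::clustering::greedy_clustering() actually uses, and the name leader_clustering is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of leader-style greedy clustering on 1D points.
// Returns the cluster index of every point; `representatives` receives the
// indices of the points that opened each cluster.
std::vector<std::size_t> leader_clustering(const std::vector<double> & points, double cutoff,
                                           std::vector<std::size_t> & representatives) {
  representatives.clear();
  std::vector<std::size_t> assignment(points.size());
  for (std::size_t i = 0; i < points.size(); ++i) {
    // --- look for the first cluster whose representative is close enough
    std::size_t c = 0;
    while (c < representatives.size() &&
           std::fabs(points[i] - points[representatives[c]]) > cutoff)
      ++c;
    if (c == representatives.size()) representatives.push_back(i);  // open a new cluster
    assignment[i] = c;
  }
  return assignment;
}
```

The result of such a scheme depends on the order of the input and on the cutoff, which plays the same role as the value 5.0 passed to the BioShell call above.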

ex_hssp_to_fasta¶
Simple test which reads a Multiple Sequence Alignment in the HSSP file format and writes it in the FASTA format.
USAGE:
./ex_hssp_to_fasta input.hssp
EXAMPLE:
./ex_hssp_to_fasta 1crn.hssp
REFERENCE: Soding, J and Biegert, A and Lupas, A. N., “The HHpred interactive server for protein homology detection and structure prediction.” Nucleic acids research (2005) 33 W244–W248
Keywords:
- sequence alignment
- FASTA
- HSSP
Categories:
- core::data::io::hssp_io
Input files:
Output files:
Program source:
#include <iostream>
#include <memory>
#include <vector>

#include <core/data/io/hssp_io.hh>
#include <core/data/io/fasta_io.hh>
#include <utils/exit.hh>
#include <utils/options/OptionParser.hh>

std::string program_info = R"(

Simple test which reads a Multiple Sequence Alignment in the HSSP file format and writes it in the FASTA format.

USAGE:
    ./ex_hssp_to_fasta input.hssp

EXAMPLE:
    ./ex_hssp_to_fasta 1crn.hssp

REFERENCE:
Soding, J and Biegert, A and Lupas, A. N., "The HHpred interactive server for protein homology detection
and structure prediction." Nucleic acids research (2005) 33 W244--W248

)";

/** @brief Reads an MSA in HSSP format and writes a FASTA file.
 *
 * USAGE:
 *     ex_hssp_to_fasta 1crn.hssp
 *
 * CATEGORIES: core::data::io::hssp_io;
 * KEYWORDS:   sequence alignment; FASTA; HSSP
 * GROUP:      File processing; Format conversion
 */
int main(const int argc, const char *argv[]) {

  // --- print the help message when the input file is missing or help was requested
  if ((argc < 2) || utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  using namespace core::data::io;

  std::vector<std::shared_ptr<core::data::sequence::Sequence>> sink;
  core::data::io::read_hssp_file(argv[1], sink);
  for(const auto & seq:sink) {
    std::cout << create_fasta_string(seq->header(), seq->sequence)<<"\n";
  }
}
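The output side of the conversion is easy to sketch: a FASTA record is a header line starting with '>' followed by the sequence, usually wrapped to a fixed line width. The function to_fasta below is a hypothetical helper written for this illustration; the default width of 60 characters is an assumption and may differ from what BioShell's create_fasta_string() produces.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: formats a single FASTA record with wrapped sequence lines.
std::string to_fasta(const std::string & header, const std::string & sequence,
                     std::size_t width = 60) {
  std::string out = ">" + header + "\n";
  if (width == 0) width = sequence.size();   // width 0 means "do not wrap"
  for (std::size_t pos = 0; pos < sequence.size(); pos += width)
    out += sequence.substr(pos, width) + "\n";
  return out;
}
```

Every aligned row read from an HSSP file, gaps included, can be passed through such a function to obtain a multi-record FASTA file.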

ex_intersect_sorted¶
Unit test which shows how to find the intersection of two sorted vectors of data.
USAGE:
./ex_intersect_sorted
Keywords:
- algorithms
- data structures
Categories:
- core/algorithms/basic_algorithms.hh
Output files:
Program source:
#include <vector>
#include <iostream>
#include <iterator>

#include <core/algorithms/basic_algorithms.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which shows how to find the intersection of two sorted vectors of data

USAGE:
    ./ex_intersect_sorted

)";

/** @brief Shows how to find an intersection of two sorted vectors of data
 *
 * CATEGORIES: core/algorithms/basic_algorithms.hh
 * KEYWORDS:   algorithms; data structures
 */
int main(const int argc, const char* argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  std::vector<int> range1({1,2,3,5,6,7}), range2({5,6,7,8,9,10}), repeated;

  // Note that both <code>range1</code> and <code>range2</code> are already sorted!
  core::algorithms::intersect_sorted(range1.begin(), range1.end(), range2.begin(), range2.end(), repeated);

  // Print the element found as the intersection between the two ranges
  std::copy(repeated.begin(), repeated.end(), std::ostream_iterator<int>(std::cout, " "));
  std::cout << "\n";
}
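The reason the inputs must be sorted is that the intersection can then be found in a single pass with two cursors, advancing whichever one points at the smaller value. The sketch below shows that technique; intersect_sorted_sketch is a hypothetical name and the code is an illustration, not the BioShell implementation. The standard library offers the same operation as std::set_intersection in <algorithm>.

```cpp
#include <vector>

// Hypothetical sketch: two-cursor intersection of two sorted vectors.
std::vector<int> intersect_sorted_sketch(const std::vector<int> & a, const std::vector<int> & b) {
  std::vector<int> common;
  auto ia = a.begin();
  auto ib = b.begin();
  while (ia != a.end() && ib != b.end()) {
    if (*ia < *ib) ++ia;          // *ia cannot appear later in b
    else if (*ib < *ia) ++ib;     // *ib cannot appear later in a
    else {                        // equal values: part of the intersection
      common.push_back(*ia);
      ++ia;
      ++ib;
    }
  }
  return common;
}
```

The cost is linear in the total length of both inputs, compared with the quadratic cost of testing every pair of elements.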

ex_local_BBQ_coordinates¶
Unit test which reads a PDB file and prints local coordinates for side chain atoms. The example uses the BBQ definition of a local coordinate system, which is based on three consecutive alpha carbon atoms.
USAGE:
./ex_local_BBQ_coordinates input.pdb
EXAMPLE:
./ex_local_BBQ_coordinates 5edw.pdb
REFERENCE: D. Gront, S. Kmiecik, A. Kolinski . “Backbone building from quadrilaterals: A fast and accurate algorithm for protein backbone reconstruction from alpha carbon coordinates.” J Comput Chem (2007) 1593-1597. doi:10.1002/jcc.20624
Keywords:
- PDB input
- local coordinates
Categories:
- core::calc::structural::transformations::local_BBQ_coordinates
Input files:
Output files:
Program source:
#include <iostream>
#include <stdexcept>

#include <core/data/io/Pdb.hh>
#include <core/calc/structural/transformations/Rototranslation.hh>
#include <core/calc/structural/transformations/transformation_utils.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which reads a PDB file and prints local coordinates for side chain atoms.
The example uses the BBQ definition of a local coordinate system, which is based on three consecutive alpha carbon atoms.

USAGE:
    ./ex_local_BBQ_coordinates input.pdb

EXAMPLE:
    ./ex_local_BBQ_coordinates 5edw.pdb

REFERENCE:
D. Gront, S. Kmiecik, A. Kolinski . "Backbone building from quadrilaterals:
A fast and accurate algorithm for protein backbone reconstruction from alpha carbon coordinates."
J Comput Chem (2007) 1593-1597. doi:10.1002/jcc.20624

)";

/** @brief Reads a PDB file and prints local coordinates for side chain atoms
 *
 * CATEGORIES: core::calc::structural::transformations::local_BBQ_coordinates
 * KEYWORDS:   PDB input; local coordinates
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::structural;
  using namespace core::calc::structural::transformations;

  core::data::io::Pdb reader(argv[1]);
  Structure_SP strctr = reader.create_structure(0);

  Rototranslation_SP rt = nullptr;
  for(auto a_chain : *strctr) {
    for (core::index2 i_residue = 1; i_residue < a_chain->count_residues() - 1; ++i_residue) {
      const Residue & the_residue = *(*a_chain)[i_residue];
      const Residue & prev_residue = *(*a_chain)[i_residue - 1];
      const Residue & next_residue = *(*a_chain)[i_residue + 1];
      try {
        rt = local_BBQ_coordinates(*prev_residue.find_atom_safe(" CA "), *the_residue.find_atom_safe(" CA "),
                                   *next_residue.find_atom_safe(" CA "));
      } catch (const utils::exceptions::AtomNotFound &) {   // --- caught by const reference to avoid a copy
        i_residue+=2;
        continue;
      }
      Vec3 tmp_atom;
      for (auto i_atom : the_residue) {
        tmp_atom = *i_atom;
        rt->apply(*i_atom);
        std::cout << i_atom->to_pdb_line() << "\n";
        // --- Here we test if the inverse transformation really moves an atom to its original location
        rt->apply_inverse(*i_atom);
        if(tmp_atom.distance_to(*i_atom)>0.001)
          throw std::runtime_error("Incorrect position after transformation!");
      }
    }
  }
}

ex_local_coordinates_three_atoms¶
Unit test which reads a PDB file and prints local coordinates of every atom. For every residue, a local coordinate system (LCS) is constructed based on its N, C-alpha and C atoms. Then the program prints coordinates of all the atoms of that residue defined in the respective LCS.
USAGE:
./ex_local_coordinates_three_atoms input.pdb
EXAMPLE:
./ex_local_coordinates_three_atoms 5edw.pdb
Keywords:
- PDB input
- local coordinates
- rototranslation
Categories:
- core::calc::structural::transformations::local_coordinates_three_atoms
Input files:
Output files:
Program source:
#include <iostream>
#include <stdexcept>

#include <core/data/io/Pdb.hh>
#include <core/calc/structural/transformations/Rototranslation.hh>
#include <core/calc/structural/transformations/transformation_utils.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which reads a PDB file and prints local coordinates of every atom.
For every residue, a local coordinate system (LCS) is constructed based on its N, C-alpha and C atoms.
Then the program prints coordinates of all the atoms of that residue defined in the respective LCS.

USAGE:
    ./ex_local_coordinates_three_atoms input.pdb

EXAMPLE:
    ./ex_local_coordinates_three_atoms 5edw.pdb

)";

/** @brief Reads a PDB file and prints local coordinates for sidechain atoms
 *
 * CATEGORIES: core::calc::structural::transformations::local_coordinates_three_atoms
 * KEYWORDS:   PDB input; local coordinates; rototranslation
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::structural;

  core::data::io::Pdb reader(argv[1]);
  core::data::structural::Structure_SP strctr = reader.create_structure(0);

  for (auto it_resid = strctr->first_residue(); it_resid != strctr->last_residue(); ++it_resid) {
    PdbAtom_SP n = (*it_resid)->find_atom(" N  ");
    PdbAtom_SP ca = (*it_resid)->find_atom(" CA ");
    PdbAtom_SP c = (*it_resid)->find_atom(" C  ");
    if ((n == nullptr)||(ca == nullptr)||(c == nullptr)) {
      std::cout << "Missing backbone atom\n";
      continue;
    }
    core::calc::structural::transformations::Rototranslation_SP rt =
        core::calc::structural::transformations::local_coordinates_three_atoms(*n,*ca,*c);

    Vec3 tmp_atom;
    for (auto i_atom : **it_resid) {
      tmp_atom = *i_atom;
      rt->apply(*i_atom);
      std::cout << i_atom->to_pdb_line() << "\n";
      // --- Here we test if the inverse transformation really moves an atom to its original location
      rt->apply_inverse(*i_atom);
      if(tmp_atom.distance_to(*i_atom)>0.001)
        throw std::runtime_error("Incorrect position after transformation!");
    }
  }
}
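A local coordinate system built from three atoms can be sketched with elementary vector algebra: one atom becomes the origin, the direction to a second atom gives the first axis, and cross products supply the remaining two. The block below is a stand-alone illustration. The types Point3 and LocalFrame, the helper names, and the particular axis convention (origin at C-alpha, x towards N, z perpendicular to the N-CA-C plane) are all assumptions; BioShell's local_coordinates_three_atoms() may orient its axes differently.

```cpp
#include <cmath>

// Hypothetical minimal 3D point type
struct Point3 { double x, y, z; };

Point3 subtract(const Point3 & a, const Point3 & b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(const Point3 & a, const Point3 & b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Point3 cross(const Point3 & a, const Point3 & b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Point3 normalized(const Point3 & a) {
  const double len = std::sqrt(dot(a, a));
  return {a.x / len, a.y / len, a.z / len};
}

// Hypothetical orthonormal frame: an origin and three unit axes
struct LocalFrame { Point3 origin, ex, ey, ez; };

// Builds a right-handed frame from three non-collinear atoms
LocalFrame make_frame(const Point3 & n, const Point3 & ca, const Point3 & c) {
  LocalFrame f;
  f.origin = ca;
  f.ex = normalized(subtract(n, ca));                // x axis points from CA to N
  f.ez = normalized(cross(f.ex, subtract(c, ca)));   // z axis is normal to the N-CA-C plane
  f.ey = cross(f.ez, f.ex);                          // y axis completes the right-handed set
  return f;
}

// Global -> local: project the shifted point on the three axes
Point3 to_local(const LocalFrame & f, const Point3 & p) {
  const Point3 d = subtract(p, f.origin);
  return {dot(d, f.ex), dot(d, f.ey), dot(d, f.ez)};
}

// Local -> global: the inverse transformation
Point3 to_global(const LocalFrame & f, const Point3 & q) {
  return {f.origin.x + q.x * f.ex.x + q.y * f.ey.x + q.z * f.ez.x,
          f.origin.y + q.x * f.ex.y + q.y * f.ey.y + q.z * f.ez.y,
          f.origin.z + q.x * f.ex.z + q.y * f.ey.z + q.z * f.ez.z};
}
```

Applying to_local and then to_global must return the original point, which is the same round-trip check the program above performs with apply() and apply_inverse().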

ex_mmCif¶
Unit test which shows how to read CIF files.
USAGE:
ex_Cif file.cif
EXAMPLE:
ex_Cif AA3.cif
Keywords:
- CIF input
Categories:
- core/data/io/Cif
Input files:
Output files:
Program source:
#include <iostream>

#include <core/data/io/Cif.hh>
#include <core/data/io/mmCif.hh>
#include <utils/Logger.hh>
#include <utils/LogManager.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which shows how to read CIF files.

USAGE:
    ex_Cif file.cif

EXAMPLE:
    ex_Cif AA3.cif

)";

/** @brief ex_Cif tests reading CIF files
 *
 * CATEGORIES: core/data/io/Cif
 * KEYWORDS:   CIF input
 */
int main(const int argc, const char *argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  utils::LogManager::INFO(); // --- INFO is the default logging level; set it to FINE to see more

  core::data::io::mmCif reader(argv[1]);
  std::cout<<reader.pdb_code()<<"\n";

  // --- Create a structure from the first model and print all its atoms in the PDB format
  core::data::structural::Structure_SP strc = reader.create_structure(0);
  for (auto a=strc->first_atom();a!=strc->last_atom();++a)
    std::cout<< (*a)->to_pdb_line()<<"\n";
}

ex_monomer_io¶
The program converts a monomer structure from CIF format to the internal formats used by BioShell. Use it to register your own monomer that is missing from the BioShell library. The program is also used to create the ‘monomers.txt’ file of the BioShell distribution (located in the ./data/ directory). To do so, download a fresh repository of monomers in CIF format from http://ligand-expo.rcsb.org/dictionaries/Components-pub.cif and run the program, then replace the released monomers.txt file with the new one.
USAGE:
./ex_monomer_io -in::monomers::cif=HEM.cif -out:file=hem.txt
./ex_monomer_io -in::monomers::cif=Components-pub.cif
Keywords:
- monomers
- option parsing
Categories:
- core/chemical/Monomer; utils/options/OptionParser; utils/options/Option
Input files:
Output files:
Program source:
#include <fstream>
#include <iostream>

#include <core/chemical/Monomer.hh>
#include <core/chemical/monomer_io.hh>
#include <utils/options/Option.hh>
#include <utils/options/OptionParser.hh>
#include <utils/options/input_options.hh>
#include <utils/options/output_options.hh>
#include <utils/exit.hh>

using namespace core::chemical;

std::string program_info = R"(

The program converts a monomer structure from CIF format to internal formats used by BioShell.
Use it to register your own monomer which is missing in BioShell library.

The program is also used to create 'monomers.txt' file from BioShell distribution (located in ./data/ directory).
In order to do so, download the fresh repository of monomers in CIF format from:
http://ligand-expo.rcsb.org/dictionaries/Components-pub.cif
and run the program. Then replace the released monomers.txt file with the new one

USAGE:
    ./ex_monomer_io -in::monomers::cif=HEM.cif -out:file=hem.txt
    ./ex_monomer_io -in::monomers::cif=Components-pub.cif

)";

/** @brief The program converts a monomer structure from CIF format to internal formats used by BioShell.
 *
 * CATEGORIES: core/chemical/Monomer; utils/options/OptionParser; utils/options/Option
 * KEYWORDS:   monomers; option parsing
 */
int main(const int argc, const char* argv[]) {

  using namespace utils::options;

  utils::options::OptionParser & cmd = OptionParser::get();
  cmd.register_option(utils::options::help);
  cmd.register_option(verbose, mute);
  cmd.register_option(db_path);
  cmd.register_option(input_bin_monomers, input_cif_monomers, input_txt_monomers);
  cmd.register_option(output_file);
  cmd.program_info(program_info);
  if (!cmd.parse_cmdline(argc, argv)) return 1;

  if (input_cif_monomers.was_used())
    read_monomers_cif(option_value<std::string>(input_cif_monomers));
  if (input_txt_monomers.was_used())
    read_monomers_txt(option_value<std::string>(input_txt_monomers));
  if (input_bin_monomers.was_used())
    read_monomers_binary(option_value<std::string>(input_bin_monomers));

  write_monomers_txt(option_value<std::string>(output_file, "monomers.txt"));
}

ex_pdb_to_fasta¶
Unit test which reads a PDB file and writes protein sequence(s) in FASTA format. Unlike the ap_pdb_to_fasta_ss application, it does not print secondary structure strings.
USAGE:
./ex_pdb_to_fasta input.pdb
EXAMPLE:
./ex_pdb_to_fasta 5edw.pdb
Keywords:
- PDB input
Categories:
- core::data::io::Pdb
Input files:
Output files:
Program source:
#include <iostream>

#include <core/data/io/Pdb.hh>
#include <core/data/structural/Structure.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which reads a PDB file and writes protein sequence(s) in FASTA format.
Unlike the ap_pdb_to_fasta_ss.cc application, this doesn't print secondary structure strings

USAGE:
    ./ex_pdb_to_fasta input.pdb

EXAMPLE:
    ./ex_pdb_to_fasta 5edw.pdb

)";

/** @brief Reads a PDB file and writes protein sequence(s) in FASTA format.
 *
 * This is a simplified version of ap_pdb_to_fasta_ss.cc application
 * USAGE:
 *     ex_pdb_to_fasta 5edw.pdb
 *
 * CATEGORIES: core::data::io::Pdb
 * KEYWORDS:   PDB input
 * GROUP:      File processing; Format conversion
 */
int main(const int argc, const char *argv[]) {

  // --- print the help message when the input file is missing or help was requested
  if ((argc < 2) || utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  using namespace core::data::io; // Pdb and create_fasta_string live there

  Pdb reader(argv[1], is_not_alternative, only_ss_from_header, true);
  core::data::structural::Structure_SP strctr = (reader.create_structure(0));

  // Iterate over all chains
  for (int ic = 0; ic < strctr->count_chains(); ++ic)
    std::cout << "> " << strctr->code() << (*strctr)[ic]->id() << "\n"  // --- e.g. prints "> 2gb1 A"
              << (*strctr)[ic]->create_sequence()->sequence << "\n";    // --- prints the sequence itself
}
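
What happens conceptually when a chain is turned into a FASTA record can be sketched in a few lines of standalone C++. The snippet below is not BioShell code; `to_one_letter` and `fasta_record` are hypothetical helpers, and writing unknown residues as 'X' is an assumption made here.

```cpp
#include <map>
#include <string>
#include <vector>

// Converts three-letter residue codes to a one-letter sequence.
// Residues outside the 20 standard amino acids become 'X' (assumed convention).
std::string to_one_letter(const std::vector<std::string> &codes3) {
  static const std::map<std::string, char> table = {
      {"ALA", 'A'}, {"ARG", 'R'}, {"ASN", 'N'}, {"ASP", 'D'}, {"CYS", 'C'},
      {"GLN", 'Q'}, {"GLU", 'E'}, {"GLY", 'G'}, {"HIS", 'H'}, {"ILE", 'I'},
      {"LEU", 'L'}, {"LYS", 'K'}, {"MET", 'M'}, {"PHE", 'F'}, {"PRO", 'P'},
      {"SER", 'S'}, {"THR", 'T'}, {"TRP", 'W'}, {"TYR", 'Y'}, {"VAL", 'V'}};
  std::string seq;
  for (const std::string &code : codes3) {
    const auto it = table.find(code);
    seq += (it == table.end()) ? 'X' : it->second;
  }
  return seq;
}

// Formats a single FASTA record: a header line starting with '>' and the sequence line
std::string fasta_record(const std::string &header, const std::string &sequence) {
  return "> " + header + "\n" + sequence + "\n";
}
```

In the BioShell program the same two steps are hidden behind create_sequence() and the stream output in the loop over chains.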

ex_peptide_hydrogen¶
ex_peptide_hydrogen reconstructs peptide hydrogen atoms using the BioShell algorithm, in which the amide H is placed relative to its N atom. The resulting coordinates are printed on the screen. The program also computes the amide H positions using the DSSP approach and calculates the average distance (in Angstroms) between the positions given by the two methods.
USAGE:
ex_peptide_hydrogen input.pdb
EXAMPLE:
ex_peptide_hydrogen 5edw.pdb
REFERENCE: Kabsch, Wolfgang, and Christian Sander. “Dictionary of protein secondary structure: pattern recognition of hydrogen‐bonded and geometrical features.” Biopolymers 22 (1983): 2577-2637. doi:10.1002/bip.360221211
Keywords:
- PDB input
- hydrogen reconstruction
Categories:
- core::calc::structural::peptide_hydrogen()
Input files:
Output files:
Program source:
#include <iostream>

#include <core/data/io/Pdb.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <core/calc/structural/interactions/BackboneHBondCollector.hh>
#include <utils/exit.hh>

using namespace core::data::structural;
using namespace core::data::io;
using namespace core::data::basic;

std::string program_info = R"(

ex_peptide_hydrogen reconstructs peptide hydrogen atoms using BioShell algorithm, where amide H is placed
in reference to its N atom. Resulting coordinates are printed on the screen. The program also computes
the amide-H positions using DSSP approach and calculates the average error (in Angstroms) between the two methods.

USAGE:
    ex_peptide_hydrogen input.pdb

EXAMPLE:
    ex_peptide_hydrogen 5edw.pdb

REFERENCE:
Kabsch, Wolfgang, and Christian Sander.
"Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features."
Biopolymers 22 (1983): 2577-2637. doi:10.1002/bip.360221211

)";

/** @brief Reconstructs peptide hydrogen atoms using two methods and compares the error between them.
 *
 * CATEGORIES: core::calc::structural::peptide_hydrogen()
 * KEYWORDS:   PDB input; hydrogen reconstruction
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  core::data::io::Pdb reader(argv[1], all_true(is_not_alternative, is_not_water)); // --- Read in a PDB file
  core::data::structural::Structure_SP strctr = reader.create_structure(0); // --- create a Structure object from the first model

  double err = 0, n = 0;
  auto res_it = ++(strctr->first_residue());  // --- The residue being reconstructed
  auto prev_res_it = strctr->first_residue(); // --- preceding residue
  for (; res_it != strctr->last_residue(); ++res_it) {
    std::cerr << "# reconstructing:" << (**prev_res_it) << " and " << (**res_it) << "\n";
    if (((*prev_res_it)->residue_type().parent_id > 20) ||
        ((*res_it)->residue_type().parent_id > 20)) { // --- it's not an amino acid
      ++prev_res_it;
      continue;
    }
    try {
      if ((*prev_res_it)->owner() != (*res_it)->owner()) {
        std::cerr << "# Chain break between residues:" << (**prev_res_it) << " and " << (**res_it) << "\n";
        ++prev_res_it;
        continue;
      }
      // --- Rebuild the peptide hydrogen in a residue pointed by res_it iterator.
      // --- This method actually adds the newly created H atom to the residue
      core::calc::structural::interactions::peptide_hydrogen(*prev_res_it, *res_it);
      auto new_H = (*res_it)->find_atom_safe(" H  ");

      // --- Here we reconstruct amide H knowing the relevant atoms, but now according to the DSSP approach.
      // --- Resulting H is not inserted
      auto prev_O = (*prev_res_it)->find_atom_safe(" O  ");
      auto prev_C = (*prev_res_it)->find_atom_safe(" C  ");
      auto this_N = (*res_it)->find_atom_safe(" N  ");
      PdbAtom other_H;
      core::calc::structural::interactions::peptide_hydrogen_dssp(*prev_C, *prev_O, *this_N, other_H);
      err += other_H.distance_to(*new_H);
      n++;
      for (const auto &atom : **res_it)
        std::cout << atom->to_pdb_line() << "\n"; // --- print all atoms in the current residue (in PDB format)
      ++prev_res_it; // --- advance one of the iterators by one residue; the other iterator is advanced by the loop
    } catch (const utils::exceptions::AtomNotFound & e) { // --- catch by reference to avoid slicing
      std::cerr << e.what() << "\n";
      ++prev_res_it;
    }
  }
  // --- report the average only if at least one hydrogen was reconstructed
  if (n > 0) std::cout << "# difference between two methods: " << err / n << "\n";
}
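
The DSSP-style placement compared against in this example is simple enough to write out. As commonly described for DSSP, the amide H is put 1.0 Angstrom from the backbone N, along the direction pointing from the carbonyl O to the carbonyl C of the preceding residue. The standalone sketch below follows that description; `P3`, `distance` and `amide_h_dssp` are hypothetical names, and this is not the body of BioShell's peptide_hydrogen_dssp().

```cpp
#include <cmath>

// Plain 3D point used only in this sketch
struct P3 { double x, y, z; };

// Euclidean distance between two points
double distance(const P3 &a, const P3 &b) {
  const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Places the amide hydrogen 1.0 Angstrom from N, along the unit vector O->C
// of the preceding residue's carbonyl group (DSSP-style placement).
P3 amide_h_dssp(const P3 &prev_c, const P3 &prev_o, const P3 &n) {
  const double co = distance(prev_c, prev_o); // length of the C=O bond
  return {n.x + (prev_c.x - prev_o.x) / co,
          n.y + (prev_c.y - prev_o.y) / co,
          n.z + (prev_c.z - prev_o.z) / co};
}
```

The average reported by the program is then the mean of `distance(h_bioshell, h_dssp)` over all residues for which both positions could be built.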

ex_protein_peptide_interface¶
ex_protein_peptide_interface finds atomic contacts between a receptor and a peptide found in an input PDB file. The peptide is defined as a protein chain no longer than 35 residues, while the receptor must consist of at least 40 amino acids. Output provides: protein residue name and ID, protein chain ID, peptide residue name and ID, peptide chain ID and the minimum distance between the two residues, e.g.:
ILE 36 A ARG 104 X 5.92977
LEU 44 A ARG 104 X 5.92685
LEU 44 A LEU 108 X 5.57779
GLU 45 A THR 102 X 6.81994
USAGE:
ex_protein_peptide_interface file.pdb cutoff-distance
EXAMPLE:
ex_protein_peptide_interface 1dt7.pdb 7.0
where 1dt7.pdb is an input file and 7.0 is the contact distance in Angstroms.
Keywords:
- PDB input
- contact map
- peptide
- STL
Categories:
- core::data::structural::Structure
Input files:
Output files:
Program source:
#include <iostream>
#include <map>

#include <core/data/io/Pdb.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ex_protein_peptide_interface finds atomic contacts between a receptor and a peptide found in an input PDB file.
The peptide is defined as a protein chain shorter than 35 residues, while the receptor must
consist of at least 40 amino acids.
Output provides: protein residue name and ID, protein chain ID, peptide residue name and ID,
peptide chain ID, minimum distance between the residues, e.g.:

ILE 36 A ARG 104 X 5.92977
LEU 44 A ARG 104 X 5.92685
LEU 44 A LEU 108 X 5.57779
GLU 45 A THR 102 X 6.81994

USAGE:
    ex_protein_peptide_interface file.pdb cutoff-distance

EXAMPLE:
    ex_protein_peptide_interface 1dt7.pdb 7.0

where 1dt7.pdb is an input file and 7.0 - contact distance in Angstroms.

)";

const unsigned int MAX_PEPTIDE_LENGTH = 35;
const unsigned int MIN_PROTEIN_LENGTH = 40;

/** @brief Finds atomic contacts between a receptor and a peptide.
 *
 * CATEGORIES: core::data::structural::Structure
 * KEYWORDS:   PDB input; contact map; peptide; STL
 */
int main(const int argc, const char* argv[]) {

  if (argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::io;
  Pdb reader(argv[1], all_true(is_not_water, is_not_alternative, is_not_hydrogen)); // --- file name (PDB format, may be gzip-ped)
  core::data::structural::Structure_SP strctr = reader.create_structure(0);

  double cutoff = utils::from_string<double>(argv[2]); // The second parameter is the contact distance (in Angstroms)

  for (auto protein_chain_sp : *strctr) { // --- protein_chain_sp is a shared pointer to a chain
    if (protein_chain_sp->size() < MIN_PROTEIN_LENGTH) continue; // --- too short for a receptor
    for (auto i_residue_sp : *protein_chain_sp) { // --- i_residue_sp is a shared pointer to a residue
      for (auto peptide_chain_sp : *strctr) {
        if (peptide_chain_sp->size() > MAX_PEPTIDE_LENGTH) continue; // --- too long for a peptide
        for (auto j_residue_sp : *peptide_chain_sp) {
          double d = (i_residue_sp)->min_distance(j_residue_sp);
          if (d < cutoff)
            std::cout << (*i_residue_sp) << " " << (*i_residue_sp).owner()->id() << " "
                      << (*j_residue_sp) << " " << (*j_residue_sp).owner()->id() << " " << d << "\n";
        }
      }
    }
  }
}
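
The core of the interface search is the minimum distance between two residues, taken over all pairs of their atoms. The following standalone sketch shows that calculation and the pair listing with plain containers; `Atom`, `Residue`, `min_distance` and `interface_pairs` are hypothetical stand-ins defined here, not BioShell types, and the brute-force double loop is an assumption about the simplest possible approach.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

struct Atom { double x, y, z; };
using Residue = std::vector<Atom>; // a residue is just a list of its atoms here

// Shortest distance between any atom of residue a and any atom of residue b
double min_distance(const Residue &a, const Residue &b) {
  double best = std::numeric_limits<double>::max();
  for (const Atom &ai : a)
    for (const Atom &bj : b) {
      const double dx = ai.x - bj.x, dy = ai.y - bj.y, dz = ai.z - bj.z;
      best = std::min(best, std::sqrt(dx * dx + dy * dy + dz * dz));
    }
  return best;
}

// Returns (receptor index, peptide index) for every residue pair closer than cutoff
std::vector<std::pair<std::size_t, std::size_t>>
interface_pairs(const std::vector<Residue> &receptor,
                const std::vector<Residue> &peptide, double cutoff) {
  std::vector<std::pair<std::size_t, std::size_t>> pairs;
  for (std::size_t i = 0; i < receptor.size(); ++i)
    for (std::size_t j = 0; j < peptide.size(); ++j)
      if (min_distance(receptor[i], peptide[j]) < cutoff) pairs.emplace_back(i, j);
  return pairs;
}
```

For large complexes a spatial index would cut down the number of residue pairs tested, but for a receptor and a short peptide the exhaustive scan is usually adequate.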

ex_ramachandran_kd_tree¶
ex_ramachandran_kd_tree partitions observations from a Ramachandran map
USAGE:
ex_ramachandran_kd_tree phi_psi.dat n_level width
where phi_psi.dat is an input file with two columns of data (Phi and Psi angles), n_level is the maximum level of the kd-tree to assign (default: 4) and width is the width of a square range used for counting neighbors (default: 5).
The output consists of lines such as:
-65.08 125.25 15 684
where the first two columns contain the Phi and Psi angles loaded from the input file. The third value (15 in the example) is the index of the rectangular area resulting from the 2D-tree construction. Finally, 684 is the number of points found in a square area of width x width centered at the given point. That value gives an insight into how probable a given Phi, Psi observation is.
Keywords:
- neighborhood detection
- data structures
- algorithms
Categories:
- core::algorithms::trees::BinaryTreeNode; core::algorithms::trees::kd_tree.hh
Input files:
Output files:
Program source:
#include <cmath>
#include <cstdlib>
#include <memory>
#include <iostream>
#include <random>
#include <vector>

#include <core/algorithms/trees/kd_tree.hh>
#include <core/algorithms/trees/BinaryTreeNode.hh>
#include <core/algorithms/trees/algorithms.hh>
#include <core/data/io/DataTable.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ex_ramachandran_kd_tree partitions observations from a Ramachandran map

USAGE:
    ex_ramachandran_kd_tree phi_psi.dat n_level width

where phi_psi.dat is an input file with two columns of data (Phi and Psi angles), width - width of a square range
for counting neighbors and n_level - maximum level on the kd-tree to assign.

The output consists of lines as below:

 -65.08  125.25  15  684

where the first two columns contain the Phi,Psi angles, respectively, that have been loaded from the input file.
The third value (15 in the example) is the number of a rectangular area resulting from the 2D-tree construction.
Finally 684 is the number of points found in a square area width x width centered at the given point.
That value provides an insight into how probable a given Phi, Psi observation is

)";

using namespace core::algorithms::trees;

class Point : public std::pair<float, float> {
public:
  Point(float phi, float psi) : std::pair<float, float>(phi, psi) {}

  float operator[](const size_t k) const {
    if (k == 0) return first;
    else return second;
  }
};

/// Operation that computes the distance between points on Ramachandran map
struct PhiPsiDistance {

  /// This operation will be called at every tree node considered during a tree traversal
  float operator()(const Point & n1, const Point & n2) const {
    float d = (n1.first - n2.first);
    float d2 = d * d;
    d = (n1.second - n2.second);
    d2 += d * d;
    return std::sqrt(d2);
  }
};

/** @brief Tree traversal operation prints a given point along with its node level and the number of neighbors
 */
struct PrintPoint {

  PrintPoint(std::shared_ptr<BinaryTreeNode<KDTreeNode<Point>>> root, float width) : root_(root), w_(width / 2.0) {}

  /// this operator prints a visited node on the screen
  void operator()(std::shared_ptr<BinaryTreeNode<KDTreeNode<Point>>> node) {
    Point q_low(node->element.element[0] - w_, node->element.element[1] - w_);
    Point q_up(node->element.element[0] + w_, node->element.element[1] + w_);
    std::vector<Point> hits;
    search_kd_tree(root_, q_low, q_up, 2, hits);
    std::cout << utils::string_format("%7.2f %7.2f %3d %4d\n", node->element.element.first,
        node->element.element.second, node->element.level, static_cast<int>(hits.size()));
  }

private:
  std::shared_ptr<BinaryTreeNode<KDTreeNode<Point>>> root_;
  float w_;
};

/** @brief A simple example shows how to use BioShell kd-tree routines.
 *
 * The program reads a file with Phi, Psi observations and partitions them in a kd-tree.
 *
 * CATEGORIES: core::algorithms::trees::BinaryTreeNode; core::algorithms::trees::kd_tree.hh
 * KEYWORDS:   neighborhood detection; data structures; algorithms
 */
int main(const int argc, const char* argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameters

  core::index2 n_level = (argc > 2) ? atoi(argv[2]) : 4;
  // --- width is kept as a real number; storing it in an integer type would truncate e.g. 2.5 to 2
  float width = (argc > 3) ? static_cast<float>(atof(argv[3])) : 5.0f;

  // ---------- First we read a file with Phi, Psi observations
  core::data::io::DataTable dt;
  dt.load(argv[1]);
  std::vector<Point> points; // container for the points
  for (const auto & row : dt) points.emplace_back(row.get<float>(0), row.get<float>(1));

  // ---------- Here the actual kd-tree is constructed
  std::shared_ptr<BinaryTreeNode<KDTreeNode<Point>>> root =
      create_kd_tree<Point, std::vector<Point>::iterator, CompareAsReferences<Point>>(points.begin(), points.end(), 2);

  // ---------- Every subtree rooted at the requested level gets its own group index
  std::vector<std::shared_ptr<BinaryTreeNode<KDTreeNode<Point>>>> node_group_representatives;
  collect_given_level(root, n_level, node_group_representatives, 0);
  core::index2 group_id = 0;
  for (const auto node : node_group_representatives) {
    depth_first_preorder(node, [group_id](std::shared_ptr<BinaryTreeNode<KDTreeNode<Point>>> node) { node->element.level = group_id; });
    ++group_id;
  }

  // ---------- Here we print each node
  PrintPoint pp(root, width);
  breadth_first_preorder(root, pp); // finally, each node is printed
}
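
To make the neighbor counting step more concrete, here is a minimal standalone 2-d tree with a box query, written without BioShell. It is a sketch of the general technique, not of BioShell's create_kd_tree() and search_kd_tree(): the tree is stored implicitly in a sorted vector, and `Pt`, `build_kd` and `count_in_box` are hypothetical names introduced here.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

using Pt = std::array<float, 2>; // (phi, psi)

// Rearranges pts[lo, hi) so that the median along the splitting axis sits in the middle
// and recurses into both halves. The vector becomes an implicit balanced 2-d tree:
// the node of a range is its middle element, the splitting axis alternates with depth.
void build_kd(std::vector<Pt> &pts, std::size_t lo, std::size_t hi, int depth = 0) {
  if (hi - lo <= 1) return;
  const int axis = depth % 2;
  const std::size_t mid = lo + (hi - lo) / 2;
  std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                   [axis](const Pt &a, const Pt &b) { return a[axis] < b[axis]; });
  build_kd(pts, lo, mid, depth + 1);
  build_kd(pts, mid + 1, hi, depth + 1);
}

// Counts points inside the axis-aligned box [low, up], borders included
std::size_t count_in_box(const std::vector<Pt> &pts, std::size_t lo, std::size_t hi,
                         const Pt &low, const Pt &up, int depth = 0) {
  if (lo >= hi) return 0;
  const int axis = depth % 2;
  const std::size_t mid = lo + (hi - lo) / 2;
  const Pt &p = pts[mid];
  std::size_t n = 0;
  if (p[0] >= low[0] && p[0] <= up[0] && p[1] >= low[1] && p[1] <= up[1]) ++n;
  // The left half holds values <= p[axis], the right half values >= p[axis],
  // so a half is visited only when the box can reach into it.
  if (low[axis] <= p[axis]) n += count_in_box(pts, lo, mid, low, up, depth + 1);
  if (up[axis] >= p[axis]) n += count_in_box(pts, mid + 1, hi, low, up, depth + 1);
  return n;
}
```

After `build_kd(pts, 0, pts.size())`, the count for a point (phi, psi) and a given width corresponds to a query with `low = {phi - width/2, psi - width/2}` and `up = {phi + width/2, psi + width/2}`. Note that this sketch ignores the periodicity of the angles at +/-180 degrees.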

ex_random_vector_on_sphere¶
Simple test which shows that random_vector_on_sphere() really produces a uniform distribution.
Keywords:
- no_keywords
Categories:
- simulations/movers/random_vector_on_sphere
Output files:
Program source:
#include <iostream>

#include <core/data/basic/Vec3.hh>
#include <simulations/movers/movers_utils.hh>

using namespace core::data::basic;
using namespace simulations;

/** @brief Simple test shows that random_vector_on_sphere() really produces a uniform distribution
 *
 * CATEGORIES: simulations/movers/random_vector_on_sphere;
 */
int main(int argc, char *argv[]) {

  core::data::basic::Vec3 v;
  core::data::basic::Vec3 s;
  core::index4 N = 100000;
  for (size_t i = 0; i < N; ++i) {
    simulations::movers::random_vector_on_sphere(v);
    s += v;
    std::cout << v << "\n";
  }
  s /= N; // --- after the division s holds the average vector, which should be close to (0, 0, 0)
  std::cout << "# sum: " << s << "\n";
}
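
The same check can be reproduced without BioShell. The sketch below draws unit vectors by normalising three independent Gaussian deviates, a standard way of sampling the sphere uniformly; whether random_vector_on_sphere() uses this method is not stated in the source, so treat it as an assumption. `random_unit_vector` and `mean_of_samples` are hypothetical names.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <random>

// Draws a point uniformly distributed on the unit sphere by normalising
// a vector of three independent standard normal deviates.
std::array<double, 3> random_unit_vector(std::mt19937 &gen) {
  std::normal_distribution<double> gauss(0.0, 1.0);
  double x, y, z, len;
  do { // reject the (extremely unlikely) near-zero vector, which cannot be normalised
    x = gauss(gen);
    y = gauss(gen);
    z = gauss(gen);
    len = std::sqrt(x * x + y * y + z * z);
  } while (len < 1e-12);
  return std::array<double, 3>{{x / len, y / len, z / len}};
}

// Average of n (n > 0) random unit vectors; for a uniform distribution
// every component tends to zero as n grows.
std::array<double, 3> mean_of_samples(std::size_t n, unsigned seed) {
  std::mt19937 gen(seed);
  std::array<double, 3> mean = {{0.0, 0.0, 0.0}};
  for (std::size_t i = 0; i < n; ++i) {
    const std::array<double, 3> v = random_unit_vector(gen);
    for (int k = 0; k < 3; ++k) mean[k] += v[k];
  }
  for (int k = 0; k < 3; ++k) mean[k] /= static_cast<double>(n);
  return mean;
}
```

A mean close to zero is a necessary but not a sufficient sign of uniformity; plotting the printed points, as the BioShell example allows, is a stronger visual check.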

ex_read_properties_file¶
Simple test for the read_properties_file() function, which reads a file given on the command line. The program expects a file in Java’s .properties format.
USAGE:
ex_read_properties_file input_file.properties
REFERENCE: https://en.wikipedia.org/wiki/.properties
Keywords:
- file utils
- properties file
Categories:
- utils/read_properties_file
Input files:
Output files:
Program source:
#include <iostream>

#include <utils/io_utils.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Simple test for read_properties_file() function reads a file given from command line.
The program expects a file in JAVA's .properties file format

USAGE:
    ex_read_properties_file input_file.properties

REFERENCE:
https://en.wikipedia.org/wiki/.properties

)";

/** @brief Simple test reads .properties file and prints these settings on the screen
 *
 * CATEGORIES: utils/read_properties_file
 * KEYWORDS:   file utils;properties file
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  for (int i = 1; i < argc; ++i) {
    // In the single line below we read the properties file
    auto mapa = utils::read_properties_file(argv[i]);

    // Here we print the content of the map in the same format (i.e. .properties)
    for (auto it = mapa.cbegin(); it != mapa.cend(); ++it) {
      std::cout << it->first << " : ";
      for (auto it2 = it->second.cbegin(); it2 != it->second.cend(); ++it2)
        std::cout << (*it2) << " ";
      std::cout << "\n";
    }
  }
}
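
For readers unfamiliar with the format, the sketch below parses a simplified subset of .properties: one `key = value` or `key : value` pair per line, with `#` and `!` starting a comment line. Splitting the value into whitespace-separated tokens mirrors the map of string lists printed by the example, but it is an assumption about how read_properties_file() treats values; line continuations and escape sequences are not handled. `parse_properties` and `trim` are hypothetical helpers.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Removes leading and trailing blanks, tabs and carriage returns
static std::string trim(const std::string &s) {
  const auto b = s.find_first_not_of(" \t\r");
  if (b == std::string::npos) return "";
  const auto e = s.find_last_not_of(" \t\r");
  return s.substr(b, e - b + 1);
}

// Parses a simplified .properties stream into key -> list of value tokens
std::map<std::string, std::vector<std::string>> parse_properties(std::istream &in) {
  std::map<std::string, std::vector<std::string>> out;
  std::string line;
  while (std::getline(in, line)) {
    line = trim(line);
    if (line.empty() || line[0] == '#' || line[0] == '!') continue; // blank or comment
    const auto sep = line.find_first_of("=:");
    if (sep == std::string::npos) continue; // no separator: skipped in this sketch
    const std::string key = trim(line.substr(0, sep));
    std::vector<std::string> &values = out[key]; // registers the key even if it has no value
    std::istringstream tokens(line.substr(sep + 1));
    std::string token;
    while (tokens >> token) values.push_back(token);
  }
  return out;
}
```

Reading from a `std::istream` keeps the parser testable with `std::istringstream`; a file can be passed in the same way through `std::ifstream`.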

ex_selection_protocols¶
Simple test which shows how to use an AtomSelector with the selection protocols. As an example, it keeps only the atoms that belong to nucleic acid residues.
USAGE:
ex_selection_protocols 5edw.pdb
Keywords:
- PDB input
- Selection protocols
- structure selectors
Categories:
- core::protocols::keep_selected_atoms()
Input files:
Output files:
Program source:
#include <iostream>
#include <memory>
#include <sstream>

#include <core/data/io/Pdb.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <core/protocols/selection_protocols.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Simple test shows how to use AtomSelector from selection protocols set.
As an example, selects atoms that belong to nucleic acid residues.

USAGE:
    ex_selection_protocols 5edw.pdb

)";

/** @brief Shows how to use selection protocols functions
 *
 * CATEGORIES: core::protocols::keep_selected_atoms()
 * KEYWORDS:   PDB input; Selection protocols; structure selectors
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::structural;
  using namespace core::data::structural::selectors;
  using namespace core::protocols;

  core::data::io::Pdb reader(argv[1]);

  { // --- section which tests selecting nucleotides
    Structure_SP strctr = reader.create_structure(0);
    std::shared_ptr<AtomSelector> select_nt = std::make_shared<IsNT>();
    keep_selected_atoms(*select_nt, *strctr);
    for (auto chain_sp : *strctr)
      std::cout << utils::string_format("\tchain %s has %3d residues satisfying the selector\n",
          chain_sp->id().c_str(), chain_sp->size());
  }
}

ex_seq_io¶
Unit test which reads a SEQ file and prints its content in FASTA format.
USAGE:
./ex_seq_io SEQ-file
EXAMPLE:
./ex_seq_io 2gb1.seq
Keywords:
- sequence
- FASTA output
- secondary structure
- Format conversion
Categories:
- core/data/io/read_seq
Input files:
Output files:
Program source:
#include <iostream>

#include <core/data/io/fasta_io.hh>
#include <core/data/io/seq_io.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which reads a SEQ file and prints its content in FASTA format.

USAGE:
    ./ex_seq_io SEQ-file

EXAMPLE:
    ./ex_seq_io 2gb1.seq

)";

/** @brief Example reads SEQ file and prints the data stored there in FASTA format
 *
 * CATEGORIES: core/data/io/read_seq
 * KEYWORDS:   sequence; FASTA output; secondary structure; Format conversion
 */
int main(const int argc, const char* argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  core::data::sequence::SecondaryStructure_SP ss = core::data::io::read_seq(argv[1], "");
  ss->header(argv[1]); // --- use the input file name as the FASTA header
  std::cout << core::data::io::create_fasta_string(*ss, 80) << "\n";           // --- amino acid sequence
  std::cout << core::data::io::create_fasta_secondary_string(*ss, 80) << "\n"; // --- secondary structure string
}

ex_set_dihedral¶
Sets particular values of the Phi, Psi and Omega angles at a certain residue in a protein.
USAGE:
ex_set_dihedral file.pdb res-id phi psi omega
EXAMPLE:
ex_set_dihedral 2gb1.pdb 18 -80.4 90.4 180.0
where 2gb1.pdb is the protein structure to be modified, 18 is the residue ID and the three following real values are the Phi, Psi and Omega dihedrals in degrees (in the range [-180.0,180.0]). The result is printed in PDB format.
Keywords:
- PDB input
- rototranslation
- structural properties
Categories:
- core/calc/structural/transformations/Rototranslation
Input files:
Output files:
Program source:
#include <algorithm>
#include <cmath>
#include <cstring>
#include <iostream>

#include <core/data/io/Pdb.hh>
#include <core/data/structural/Structure.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <core/calc/structural/protein_angles.hh>
#include <core/calc/structural/transformations/Rototranslation.hh>
#include <utils/exit.hh>

using namespace core::data::structural;
using namespace core::data::io;
using namespace core::data::basic;

std::string program_info = R"(

Sets particular values of Phi, Psi and Omega angles at a certain residue in a protein.

USAGE:
    ex_set_dihedral file.pdb res-id phi psi omega

EXAMPLE:
    ex_set_dihedral 2gb1.pdb 18 -80.4 90.4 180.0

where 2gb1.pdb is the protein structure to be modified, 18 is the residue ID and the three following
real values are Phi, Psi and omega dihedrals (in the range [-180.0,180.0]). The result is printed in PDB format

)";

/** @brief Sets particular values of Phi, Psi and Omega angles at a certain residue in a protein.
 *
 * CATEGORIES: core/calc/structural/transformations/Rototranslation
 * KEYWORDS:   PDB input; rototranslation; structural properties
 */
int main(const int argc, const char *argv[]) {

  if(argc < 3) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  // --- The first parameter of the program is the PDB file name
  core::data::io::Pdb reader(argv[1], is_not_alternative, keep_all, false); // --- Read in a PDB file
  // --- create a Structure object from the first model and extract the first chain (indexed as 'A') from it
  core::data::structural::Structure_SP strctr = reader.create_structure(0);
  core::data::structural::Chain & chain = *(strctr->get_chain('A'));

  // --- The second parameter points to the residue whose angles will be changed
  core::index2 res_idx = utils::from_string<core::index2>(argv[2]);
  core::calc::structural::transformations::Rototranslation rt;

  // --- Phi rotation; the new value of the Phi angle (in degrees) is the third parameter of this program
  if (argc > 3 && strlen(argv[3]) > 1) {
    double phi = core::calc::structural::evaluate_phi(*chain[res_idx - 1], *chain[res_idx]);
    std::cerr << "Phi angle before change: " << phi * 180.0 / M_PI << " degrees\n";
    phi = utils::from_string<double>(argv[3]) * M_PI / 180.0 - phi; // --- rotation needed to reach the target
    PdbAtom & N = *chain[res_idx]->find_atom_safe(" N  ");
    PdbAtom & CA = *chain[res_idx]->find_atom_safe(" CA ");
    core::calc::structural::transformations::Rototranslation::around_axis(N, CA, phi, CA, rt);
    for (core::index2 ires = 0; ires < res_idx; ++ires) { // --- rotate all preceding residues
      for (PdbAtom_SP ai : *chain[ires]) rt.apply(*ai);
    }
    if (chain[res_idx]->find_atom(" H  ") != nullptr) rt.apply(*chain[res_idx]->find_atom(" H  "));
  }

  // --- Psi rotation; the new value of the Psi angle (in degrees) is the fourth parameter of this program
  if (argc > 4 && strlen(argv[4]) > 1) {
    double psi = core::calc::structural::evaluate_psi(*chain[res_idx], *chain[res_idx + 1]);
    std::cerr << "Psi angle before change: " << psi * 180.0 / M_PI << " degrees\n";
    psi = utils::from_string<double>(argv[4]) * M_PI / 180.0 - psi;
    PdbAtom & C = *chain[res_idx]->find_atom_safe(" C  ");
    PdbAtom & CA = *chain[res_idx]->find_atom_safe(" CA ");
    core::calc::structural::transformations::Rototranslation::around_axis(CA, C, psi, C, rt);
    rt.apply(*chain[res_idx]->find_atom_safe(" O  "));
    for (core::index2 ires = res_idx + 1; ires < chain.count_residues(); ++ires) { // --- rotate all following residues
      for (PdbAtom_SP ai : *chain[ires]) rt.apply(*ai);
    }
  }

  // --- Omega rotation; the new value of the Omega angle (in degrees) is the fifth parameter of this program
  if (argc > 5 && strlen(argv[5]) > 1) {
    double omega = core::calc::structural::evaluate_omega(*chain[res_idx], *chain[res_idx + 1]);
    std::cerr << "Omega angle before change: " << omega * 180.0 / M_PI << " degrees\n";
    omega = utils::from_string<double>(argv[5]) * M_PI / 180.0 - omega;
    PdbAtom & C = *chain[res_idx]->find_atom_safe(" C  ");
    PdbAtom & N = *chain[res_idx + 1]->find_atom_safe(" N  ");
    core::calc::structural::transformations::Rototranslation::around_axis(C, N, omega, N, rt);
    for (core::index2 ires = res_idx + 1; ires < chain.count_residues(); ++ires) {
      for (PdbAtom_SP ai : *chain[ires]) rt.apply(*ai);
    }
  }

  std::for_each(chain.first_atom(), chain.last_atom(), [](PdbAtom_SP ai) { std::cout << ai->to_pdb_line() << "\n"; });
}
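
The geometry behind this program is a rotation by the difference between the requested and the current dihedral angle, applied around the bond that defines the angle. The standalone sketch below shows both ingredients: a signed dihedral angle and a rotation around an arbitrary axis (Rodrigues' formula). It does not use BioShell; `V3`, `dihedral` and `rotate_about_axis` are hypothetical names, and the sign convention of BioShell's around_axis() is not assumed to be the same.

```cpp
#include <cmath>

// Minimal 3D vector type for this sketch (not BioShell's Vec3)
struct V3 { double x, y, z; };

V3 operator-(const V3 &a, const V3 &b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
V3 operator+(const V3 &a, const V3 &b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
V3 operator*(const V3 &a, double s) { return {a.x * s, a.y * s, a.z * s}; }
double dot(const V3 &a, const V3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
V3 cross(const V3 &a, const V3 &b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
V3 unit(const V3 &a) { return a * (1.0 / std::sqrt(dot(a, a))); }

// Signed dihedral angle defined by the points a-b-c-d, in radians, in [-pi, pi]
double dihedral(const V3 &a, const V3 &b, const V3 &c, const V3 &d) {
  const V3 b1 = b - a, b2 = c - b, b3 = d - c;
  const V3 n1 = cross(b1, b2), n2 = cross(b2, b3); // normals of the two planes
  return std::atan2(dot(unit(b2), cross(n1, n2)), dot(n1, n2));
}

// Rotates point p by `angle` radians around the axis running from axis_from to axis_to
// (Rodrigues' rotation formula, right-handed with respect to the axis direction)
V3 rotate_about_axis(const V3 &p, const V3 &axis_from, const V3 &axis_to, double angle) {
  const V3 k = unit(axis_to - axis_from);
  const V3 v = p - axis_from;
  const double c = std::cos(angle), s = std::sin(angle);
  const V3 rotated = v * c + cross(k, v) * s + k * (dot(k, v) * (1.0 - c));
  return axis_from + rotated;
}
```

With these definitions, rotating the last point d around the b->c axis by `target - dihedral(a, b, c, d)` brings the dihedral to `target`. In a protein chain the same rotation would be applied to every atom downstream of the rotated bond.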

ex_simpson_integration¶
Unit test for Simpson numerical integration routine.
USAGE:
ex_simpson_integration
REFERENCE: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery (1992) Numerical Recipes in C: The Art of Scientific Computing, Cambridge Univ. Press.
Keywords:
- numerical methods
Categories:
- core/calc/numeric/simpson_integration
Output files:
Program source:
#include <math.h>
#include <iostream>

#include <core/calc/numeric/numerical_integration.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test for Simpson numerical integration routine.

USAGE:
    ex_simpson_integration

REFERENCE:
W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery (1992)
Numerical Recipes in C: The Art of Scientific Computing, Cambridge Univ. Press.

)";

/// First of the two functions integrated in this example
struct Sin {
  double operator()(double x) { return sin(x); }
} sin_func;

/// Second of the two functions integrated in this example
struct X2 {
  double operator()(double x) { return x * x; }
} x_square_func;

/** @brief Example for numerical integration with Simpson method
 *
 * CATEGORIES: core/calc/numeric/simpson_integration
 * KEYWORDS:   numerical methods
 */
int main(const int argc, const char *argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  // --- integral of sin(x) over [0, pi]
  std::cout << core::calc::numeric::simpson_integration(sin_func, 0, M_PI, 1000) << "\n";
  // --- integral of x^2 over [0, 1]
  std::cout << core::calc::numeric::simpson_integration(x_square_func, 0.0, 1.0, 1000) << "\n";
}

ex_split_fasta¶
ex_split_fasta reads a FASTA file and writes every sequence from it to a separate file. Each output file is named after the first word of the sequence header, with the .fasta extension.
EXAMPLE:
./ex_split_fasta 5edw.fasta
Keywords:
- FASTA input
- FASTA output
- sequence
- FASTA
- pre-processing
Categories:
- core/data/io/fasta_io.hh
Input files:
Output files:
Program source:
#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

#include <core/data/io/fasta_io.hh>
#include <utils/string_utils.hh>
#include <utils/exit.hh>

std::string program_info = R"(

ex_split_fasta reads a FASTA file and writes every sequence from it in a separate file

EXAMPLE:
    ./ex_split_fasta 5edw.fasta

)";

/** @brief Reads a file with sequences in FASTA format and writes each sequence to a separate FASTA file.
 *
 * CATEGORIES: core/data/io/fasta_io.hh;
 * KEYWORDS:   FASTA input; FASTA output; sequence; FASTA; pre-processing
 */
int main(const int argc, const char *argv[]) {

  if(argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using core::data::sequence::Sequence_SP; // --- Sequence_SP is just a std::shared_ptr to core::data::sequence::Sequence type
  using namespace core::data::io;          // --- for FASTA I/O

  // --- Create a container where the sequences will be stored
  std::vector<Sequence_SP> sequences;

  // --- Read a file with FASTA sequences
  core::data::io::read_fasta_file(argv[1], sequences);

  // --- Write them in separate FASTA files
  for (const Sequence_SP & s : sequences) {
    std::string header = s->header();
    std::replace(header.begin(), header.end(), '|', ' '); // --- fix ncbi-style header in FASTA files
    auto words = utils::split(header, {' '});
    // --- We take the very first word of the FASTA header as a file name; hopefully it is sth meaningful, e.g. a gene name
    std::ofstream out(words[0] + ".fasta");
    out << "> " << s->header() << "\n" << s->sequence << "\n";
    out.close();
  }
}
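
The two steps of this program, reading FASTA records and deriving a file name from each header, can be tried out with the standalone sketch below. It works on streams and strings only and does not call BioShell; `FastaRecord`, `read_fasta` and `file_name_for` are hypothetical names, and the fallback name "unnamed" for an empty header is an assumption made here.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct FastaRecord {
  std::string header;   // header line without the leading '>'
  std::string sequence; // sequence with line breaks removed
};

// Reads all records from a FASTA stream; multi-line sequences are concatenated
std::vector<FastaRecord> read_fasta(std::istream &in) {
  std::vector<FastaRecord> records;
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back(); // tolerate Windows line ends
    if (line.empty()) continue;
    if (line[0] == '>') {
      FastaRecord rec;
      const auto b = line.find_first_not_of(" \t", 1);
      if (b != std::string::npos) rec.header = line.substr(b);
      records.push_back(rec);
    } else if (!records.empty()) {
      records.back().sequence += line;
    }
  }
  return records;
}

// File name made of the first word of the header, with '|' treated as a separator
std::string file_name_for(const std::string &header) {
  std::string h = header;
  std::replace(h.begin(), h.end(), '|', ' ');
  std::istringstream words(h);
  std::string first;
  words >> first;
  return (first.empty() ? std::string("unnamed") : first) + ".fasta";
}
```

Because only the first word is used, two records whose headers start with the same word map to the same file name, and the later record overwrites the earlier file.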

ex_structure_iterators¶
Example that shows how to iterate through structural components
USAGE:
ex_structure_iterators 1dt7.pdb
where 1dt7.pdb is an input file (PDB format)
Keywords:
- PDB input
- Structure
- Chain
- Residue
- PdbAtom
- STL
Categories:
- core/data/structural/Structure
Input files:
Output files:
Program source:
#include <iostream>
#include <iomanip>

#include <core/data/io/Pdb.hh>
#include <core/chemical/Molecule.hh>
#include <core/chemical/molecule_utils.hh>
#include <core/data/structural/PdbAtom.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Example that shows how to iterate through structural components

USAGE:
    ex_structure_iterators 1dt7.pdb

where 1dt7.pdb is an input file (PDB format)

)";

/** @brief Shows how to iterate through structural components (residues, atoms, etc)
 *
 * CATEGORIES: core/data/structural/Structure
 * KEYWORDS:   PDB input; Structure; Chain; Residue; PdbAtom; STL
 */
int main(const int argc, const char* argv[]) {

  if (argc < 2) utils::exit_OK_with_message(program_info); // --- complain about missing program parameter

  using namespace core::data::structural;

  core::data::io::Pdb reader(argv[1], core::data::io::is_not_alternative); // file name (PDB format, may be gzip-ped)
  Structure_SP strctr = reader.create_structure(0);

  // ------- Directly iterate over atoms of a structure, jump over chains, residues, etc.
  // ------- atom_it is an iterator, which points to a shared pointer to an atom
  int n_atoms_1 = 0;
  for (auto atom_it = strctr->first_atom(); atom_it != strctr->last_atom(); ++atom_it) ++n_atoms_1;

  // ------- Iterate over chains, residues, atoms
  int n_atoms_2 = 0;
  int n_chains = 0, n_residues = 0;
  for (auto chain_sp : *strctr) {         // --- chain_sp is already a shared pointer to a chain
    ++n_chains;
    for (auto residue_sp : *chain_sp) {   // --- residue_sp is already a shared pointer to a residue
      ++n_residues;
      for (auto atom_sp : *residue_sp)    // --- atom_sp is already a shared pointer to an atom
        ++n_atoms_2;
    }
  }

  int n_residues_2 = 0;
  // ------- Iterate over residues of a structure, jump over chains
  // ------- iter_res_i is an iterator, which points to a shared pointer to a residue
  for (auto iter_res_i = strctr->first_residue(); iter_res_i != strctr->last_residue(); ++iter_res_i)
    ++n_residues_2;

  std::cout << "These three atom counts should be equal: " << n_atoms_1 << ", " << n_atoms_2
            << " and " << strctr->count_atoms() << "\n";
  std::cout << "These three residue counts should be equal: " << n_residues << ", " << n_residues_2
            << " and " << strctr->count_residues() << "\n";
  std::cout << "These two chain counts should be equal: " << n_chains
            << " and " << strctr->count_chains() << "\n";
}

ex_structure_to_molecule¶
Unit test which creates a Molecule object from a given PDB file. As a test, the program lists all covalent bonds between a given ligand and the rest of the protein.
USAGE:
./ex_structure_to_molecule input.pdb ligand-code
EXAMPLE:
./ex_structure_to_molecule 4rm4.pdb HEM
Keywords:
Categories:
- core::chemical::structure_to_molecule
Input files:
Output files:
Program source:
```cpp
#include <iostream>
#include <memory>

#include <core/algorithms/graph_algorithms.hh>
#include <core/chemical/Molecule.hh>
#include <core/chemical/molecule_utils.hh>
#include <core/calc/structural/angles.hh>
#include <core/data/io/Pdb.hh>
#include <core/data/structural/PdbAtom.hh>
#include <core/data/structural/selectors/structure_selectors.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which creates a Molecule object from a given PDB file. As a test, the program lists
all covalent bonds between a given ligand and the rest of the protein.

USAGE:
    ./ex_structure_to_molecule input.pdb ligand-code

EXAMPLE:
    ./ex_structure_to_molecule 4rm4.pdb HEM

)";

/** @brief Creates a Molecule object from a given PDB file.
 *
 * As a test, the program lists all covalent bonds between a given ligand and the rest of the protein
 *
 * CATEGORIES: core::chemical::structure_to_molecule
 * KEYWORDS:   molecule
 */
int main(const int argc, const char *argv[]) {

  if (argc < 3) utils::exit_OK_with_message(program_info);

  using namespace core::chemical;
  using namespace core::data::structural;

  PdbMolecule_SP molecule;

  // --- Read structure that we use to build a molecule
  core::data::io::Pdb reader(argv[1]); // file name (PDB format, may be gzip-ped)
  core::data::structural::Structure_SP strctr = reader.create_structure(0);

  // --- Create molecule object
  molecule = structure_to_molecule(*strctr);

  // --- Find the ligand object(s) for a given 3-letter code
  selectors::SelectResidueByName ligand_by_name(argv[2]);
  std::vector<Residue_SP> ligand;
  strctr->find_residues(ligand_by_name, ligand);

  for (const auto &l:ligand) { // --- iterate over ligands found
    std::cout << "Bonds between " << l->residue_type().code3 << " and the rest of the protein:\n";
    for (const auto &atom : *l) { // --- iterate over atoms of a ligand, find all its bonded partners
      for (auto it = molecule->cbegin_atom(atom); it != molecule->cend_atom(atom); ++it) {
        if ((**it).owner() != l) // --- if the two atoms belong to different residues - print the output
          std::cout << (*atom).atom_name() << " - " << (**it).atom_name() << " "
                    << (**it).owner()->residue_type().code3 << " " << (**it).owner()->residue_id() << " "
                    << (**it).owner()->owner()->id() << "\n";
      }
    }
  }
}
```

ex_test_gzip¶
Unit test which gzips and un-gzips a string data.
USAGE:
./ex_test_gzip
Keywords:
- GZIP
Categories:
- utils/io_utils
Output files:
Program source:
```cpp
#include <iostream>
#include <sstream>

#include <utils/io_utils.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test which gzips and un-gzips a string data.

USAGE:
    ./ex_test_gzip

)";

// --- The input data to be compressed
std::string ala_cif_data = R"(data_ALA
#
_chem_comp.id ALA
_chem_comp.name ALANINE
_chem_comp.type "L-PEPTIDE LINKING"
_chem_comp.pdbx_type ATOMP
_chem_comp.formula "C3 H7 N O2"
_chem_comp.mon_nstd_parent_comp_id ?
_chem_comp.pdbx_synonyms ?
_chem_comp.pdbx_formal_charge 0
_chem_comp.pdbx_initial_date 1999-07-08
_chem_comp.pdbx_modified_date 2011-06-04
_chem_comp.pdbx_ambiguous_flag N
_chem_comp.pdbx_release_status REL
_chem_comp.pdbx_replaced_by ?
_chem_comp.pdbx_replaces ?
_chem_comp.formula_weight 89.093
_chem_comp.one_letter_code A
_chem_comp.three_letter_code ALA
_chem_comp.pdbx_model_coordinates_details ?
_chem_comp.pdbx_model_coordinates_missing_flag N
_chem_comp.pdbx_ideal_coordinates_details ?
_chem_comp.pdbx_ideal_coordinates_missing_flag N
_chem_comp.pdbx_model_coordinates_db_code ?
_chem_comp.pdbx_subcomponent_list ?
_chem_comp.pdbx_processing_site RCSB
)";

/** @brief Simple test to gzip and un-gzip a string data
 *
 * CATEGORIES: utils/io_utils
 * KEYWORDS:   GZIP
 */
int main(int cnt, char* argv[]) {

  if ((cnt > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  std::string zipped,result;
  // --- here we compress ala_cif_data string with ZIP and store the result in another string
  utils::zip_string(ala_cif_data,zipped);
  // --- here we un-zip it back and store the output in the string "result"
  utils::unzip_string(zipped,result);

  if (result == ala_cif_data) {
    std::cout << "GZIP OK :-)\n";
    std::cout << "compressed from " << ala_cif_data.size() << " to " << zipped.size() << " bytes\n";
  } else std::cout << "GZIP ERROR !!!\n";

  // --- Here we un-zip directly to a stream
  std::stringstream ss;
  utils::unzip_string(zipped,ss);
  if (ss.str() == ala_cif_data) std::cout << "GZIP OK :-)\n";
  else std::cout << "GZIP ERROR !!!\n";
}
```

ex_uniquify¶
Unit test for uniquify() method which removes redundant objects from a container
USAGE:
./ex_uniquify
Keywords:
Categories:
- core/algorithms/basic_algorithms.hh
Output files:
Program source:
```cpp
#include <iostream>
#include <vector>

#include <core/algorithms/basic_algorithms.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Unit test for uniquify() method which removes redundant objects from a container

USAGE:
    ./ex_uniquify

)";

/** @brief Tests uniquify() method which removes redundant objects from a container.
 *
 * CATEGORIES: core/algorithms/basic_algorithms.hh
 * KEYWORDS:   data structures; algorithms
 */
int main(int cnt, char* argv[]) {

  if ((cnt > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  std::vector<int> datum = std::vector<int> { 1, 8, 4, 5, 9, 4, 5 };

  // ---------- Below we define >is_equal< operator
  auto eq = [](std::vector<int>::iterator a, std::vector<int>::iterator b) -> bool { return *a == *b; };
  // ---------- Below we define >less_than< operator
  auto lt = [](std::vector<int>::iterator a, std::vector<int>::iterator b) -> bool { return *a < *b; };

  // ---------- Below we apply uniquify() operation on a range of integers
  datum.erase(core::algorithms::uniquify(datum.begin(), datum.end(), eq, lt), datum.end());
  for (int c:datum) std::cout << c << " ";
  std::cout << "\n";

  // ---------- Below we apply uniquify() operation on a range of characters
  std::vector<char> chars = std::vector<char> {'a', 'g', 'd', 'r', 'a', 'd'};
  chars.erase(core::algorithms::uniquify(chars.begin(), chars.end()), chars.end());
  for (char c:chars) std::cout << c << " ";
  std::cout << "\n";
}
```

ex_web_client¶
Simple test for web_client methods; it downloads the 2GB1 protein from the rcsb.org website
USAGE:
ex_web_client
Keywords:
- WWW
Categories:
- ui/www/web_client
Output files:
Program source:
```cpp
#include <iostream>

#include <ui/www/web_client.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Simple test for web_client methods downloads 2GB1 protein from rcsb.org website

USAGE:
    ex_web_client

)";

/** @brief Simple test for web_client methods downloads 2GB1 protein from rcsb.org website
 *
 * CATEGORIES: ui/www/web_client
 * KEYWORDS:   WWW
 */
int main(const int argc, const char *argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  using namespace ui::www;

  http_t *request = http_get("http://files.rcsb.org/view/2GB1.pdb", NULL);
  if (!request) {
    std::cerr << "Invalid request.\n";
    return 1;
  }

  // --- Poll the request until it either completes or fails
  http_status_t status = HTTP_STATUS_PENDING;
  int prev_size = -1;
  while (status == HTTP_STATUS_PENDING) {
    status = http_process(request);
    if (prev_size != (int) request->response_size) {
      std::cout << utils::string_format("%d byte(s) received.\n", (int) request->response_size);
      prev_size = (int) request->response_size;
    }
  }

  if (status == HTTP_STATUS_FAILED) {
    std::cerr << utils::string_format("HTTP request failed (%d): %s.\n", request->status_code, request->reason_phrase);
    http_release(request);
    return 1;
  }

  std::cout << "\nContent type: " << request->content_type << "\n\n" << (char const *) request->response_data << "\n";
  http_release(request);
  return 0;
}
```

ex_z_matrix_to_cartesian¶
Test for the z_matrix_to_cartesian() function; it recovers Cartesian coordinates of fluoroethylene from a Z-matrix (internal coordinates)
USAGE:
./ex_z_matrix_to_cartesian
Keywords:
Categories:
- core::calc::structural::z_matrix_to_cartesian
Output files:
Program source:
```cpp
#include <iostream>

#include <core/calc/structural/protein_angles.hh>
#include <core/calc/structural/angles.hh>
#include <core/data/structural/PdbAtom.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

std::string program_info = R"(

Test for z_matrix_to_cartesian() function recovers cartesian coordinates of a fluoroethylene
from Z-matrix (internal coordinates)

USAGE:
    ./ex_z_matrix_to_cartesian

)";

/** @brief Test for z_matrix_to_cartesian() function
 *
 * This test recovers fluoroethylene from Z-matrix (internal coordinates)
 * CATEGORIES: core::calc::structural::z_matrix_to_cartesian
 * KEYWORDS:   internal coordinates
 */
int main(const int argc, const char *argv[]) {

  if ((argc > 1) && utils::options::call_for_help(argv[1]))
    utils::exit_OK_with_message(program_info);

  using core::data::structural::PdbAtom;
  using namespace core::calc::structural;

  PdbAtom F(1, " F  ", -1.0606, 0.1723, 0.0001, core::chemical::AtomicElement::FLUORINE.z);
  PdbAtom C1(2, " C1 ", 0.1319, -0.4627, -0.0005, core::chemical::AtomicElement::CARBON.z);
  PdbAtom C2(3, " C2 ", 1.2458, 0.2325, 0.0001, core::chemical::AtomicElement::CARBON.z);
  PdbAtom H11(2, " H11", 0.1690, -1.5420, 0.0030, core::chemical::AtomicElement::HYDROGEN.z);
  PdbAtom H21(3, " H21", 2.1991, -0.2751, -0.0004, core::chemical::AtomicElement::HYDROGEN.z);
  PdbAtom H22(3, " H22", 1.2087, 1.3119, 0.0010, core::chemical::AtomicElement::HYDROGEN.z);

  PdbAtom H11_(2, " H11", 0,0,0, core::chemical::AtomicElement::HYDROGEN.z);
  PdbAtom H21_(2, " H21", 0,0,0, core::chemical::AtomicElement::HYDROGEN.z);
  PdbAtom H22_(2, " H22", 0,0,0, core::chemical::AtomicElement::HYDROGEN.z);

  core::calc::structural::z_matrix_to_cartesian(F, C2, C1, 1.0, to_radians(120), to_radians(180), H11_);
  core::calc::structural::z_matrix_to_cartesian(F, C1, C2, 1.0, to_radians(120), to_radians(180), H21_);
  core::calc::structural::z_matrix_to_cartesian(H21_, C1, C2, 1.0, to_radians(120), to_radians(180), H22_);

  std::cout << C2.to_pdb_line() << "\n";
  std::cout << C1.to_pdb_line() << "\n";
  std::cout << F.to_pdb_line() << "\n";
  std::cout << H11_.to_pdb_line() << "\n";
  std::cout << H21_.to_pdb_line() << "\n";
  std::cout << H22_.to_pdb_line() << "\n";

  double error = H11_.distance_to(H11);
  error += H21_.distance_to(H21);
  error += H22_.distance_to(H22);
  std::cout << "# Average error on the three hydrogen atoms: "<<error/3.0<<"\n";
}
```

Alphabetical list of all BioShell examples, grouped by the ap_*, ex_* and *.py categories.
Examples by functionality¶
File processing¶
BioShell supports the following file formats, holding bioinformatics data:
- PDB
- FASTA
- CIF
- ALN (ClustalW output with multiple sequence alignment)
- HHPred output [1]
- PIR
- XML (most notably these produced by blast+)
- SS2 (PsiPred output that holds secondary structure with predicted probabilities)
- CHK (legacy blast profiles, binary files)
- MAT (PSSM files produced by PsiBlast that contains PSSM)
BioShell offers reading and processing of these files, which includes substructure extraction, format conversion and data filtering.
Alignments¶
Sequence alignment and multiple sequence alignment calculations include the Smith & Waterman [2] and Needleman & Wunsch [3] algorithms, both available in \(O(N^2)\) and \(O(N^3)\) implementations. These algorithms are implemented as C++ templates, which facilitates alignment of virtually any kind of data, provided that an appropriate scoring method is supplied.
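The quadratic-time dynamic programming behind global (Needleman & Wunsch) alignment can be sketched in a few lines of plain Python. This is only an illustration of the technique, not BioShell's C++ template code: the function name, the linear gap penalty and the default +1/-1 scores are assumptions made for this sketch. The `score` callable plays the role of the user-provided scoring method, so any sequences of comparable items can be aligned.

```python
def needleman_wunsch_score(a, b, score=None, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    score(x, y) scores aligning element x with element y; gap is the
    linear penalty for aligning an element with a gap.
    Illustrative helper, not part of the BioShell API.
    """
    if score is None:
        # Assumed default: +1 for identical elements, -1 otherwise
        score = lambda x, y: 1 if x == y else -1
    # prev[j] is the best score of aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            curr[j] = max(prev[j - 1] + score(a[i - 1], b[j - 1]),  # match or mismatch
                          prev[j] + gap,                            # a[i-1] aligned with a gap
                          curr[j - 1] + gap)                        # b[j-1] aligned with a gap
        prev = curr
    return prev[len(b)]
```

Only two rows of the dynamic programming matrix are kept, which is enough for the score; recovering the alignment itself would require storing the full matrix or traceback pointers.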
Sequence calculations¶
BioShell can calculate protein pI as well as hydrophobicity according to several scales. It also creates, writes and handles sequence profiles. It can also convert an amino acid sequence to one of over 16 reduced alphabets [4] obtained from the work by Peterson et al. [5].
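As an illustration of a scale-based hydrophobicity calculation, the sketch below averages the Kyte-Doolittle hydropathy values over a sequence. The choice of this particular scale and the function name `mean_hydropathy` are assumptions made for the example; the text above does not say which scales BioShell ships, and this is not BioShell code.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence, scale=KYTE_DOOLITTLE):
    """Average per-residue hydropathy of a one-letter amino acid sequence.

    Illustrative helper; raises KeyError for letters missing from the scale.
    """
    if not sequence:
        raise ValueError("empty sequence")
    return sum(scale[aa] for aa in sequence.upper()) / len(sequence)
```

Passing a different dictionary as `scale` switches to another hydrophobicity scale without changing the function.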
Structure calculations¶
Since its origin, the main role of BioShell has been structure-based calculations. The package can calculate a very broad selection of structural parameters, including:
- distances and distance maps
- contacts and contact maps
- hydrogen bonds
- dihedral angle by name (e.g. Phi or Chi1) or based on arbitrary atoms
- structural superimpositions (Kabsch algorithm) and rmsd value on an arbitrary set of atoms
- structure similarity measures such as: GDT, LGS and TM-score
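To make one of the items above concrete, here is a minimal plain-Python sketch of a dihedral angle computed from four arbitrary points (for Phi these would be the C, N, CA and C atoms of consecutive residues). The function name is hypothetical and the code does not use or mirror BioShell's own implementation; it only shows the standard projection-based formula.

```python
import math


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle (degrees, in (-180, 180]) defined by four 3D points."""
    b0 = [p0[k] - p1[k] for k in range(3)]
    b1 = [p2[k] - p1[k] for k in range(3)]
    b2 = [p3[k] - p2[k] for k in range(3)]

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    # Unit vector along the central bond
    norm = math.sqrt(dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Components of the outer bonds perpendicular to the central bond
    v = [b0[k] - dot(b0, b1) * b1[k] for k in range(3)]
    w = [b2[k] - dot(b2, b1) * b1[k] for k in range(3)]
    # Cross product b1 x v gives the sign of the rotation
    cross = [b1[1] * v[2] - b1[2] * v[1],
             b1[2] * v[0] - b1[0] * v[2],
             b1[0] * v[1] - b1[1] * v[0]]
    return math.degrees(math.atan2(dot(cross, w), dot(v, w)))
```

Using `atan2` rather than `acos` keeps the sign of the angle and avoids numerical trouble near 0 and 180 degrees.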
Statistical & numerical analysis¶
This includes:
- hierarchical agglomerative clustering with arbitrary distance and four merging scenarios: Single Link, Complete Link, Average Link and Ward’s method
- spline approximation
- kernel density estimation
- expectation-maximization
- simple non-parametric statistics such as mean, variance, bootstrap estimation, robust estimation
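Bootstrap estimation, the last item above, can be illustrated with the standard library alone. The sketch below estimates the standard error of a statistic by resampling the data with replacement; the function name, the number of resamples and the fixed seed are assumptions chosen for the example and say nothing about how BioShell implements it.

```python
import random
import statistics


def bootstrap_std_error(data, statistic=statistics.mean, n_resamples=1000, seed=0):
    """Bootstrap estimate of the standard error of `statistic` on `data`.

    Illustrative helper: draws n_resamples samples of len(data) items with
    replacement and returns the standard deviation of the resulting estimates.
    """
    rng = random.Random(seed)  # fixed seed makes the sketch reproducible
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in data]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)
```

Any callable that maps a list of numbers to a number, e.g. `statistics.median`, can be passed as `statistic`.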
Footnotes
[1] | Soding, J and Biegert, A and Lupas, A. N., “The HHpred interactive server for protein homology detection and structure prediction.” Nucleic acids research (2005) 33 W244–W248 |
[2] |
|
[3] |
|
[4] |
|
[5] | Peterson, Kondev, Theriot and Phillips. “Reduced amino acid alphabets exhibit an improved sensitivity and selectivity in fold assignment”. Bioinformatics 2009 25:1356-1362 |
This file has been automatically generated on Jul 19 2023 12:57:25
BioShell ap_* examples grouped by their functionality.
Examples by keywords¶
CIF input¶
Chain¶
DSSP¶
FASTA¶
FASTA input¶
FASTA output¶
Format conversion¶
Hydrogen bonds¶
MSA¶
Monte Carlo¶
Mover¶
Needleman-Wunsch¶
PDB input¶
PDB line filter¶
PDB output¶
PIR¶
Protein structure features¶
Rosetta scorefile¶
STL¶
Structure¶
XML¶
algorithms¶
clustal input¶
clustering¶
contact map¶
crmsd¶
data structures¶
data table¶
docking¶
estimation¶
expectation-maximization¶
graphs¶
hierarchical clustering¶
interactions¶
internal coordinates¶
interpolation¶
ligand¶
molecule¶
numerical methods¶
observer¶
pre-processing¶
random numbers¶
rototranslation¶
sampling¶
secondary structure¶
sequence¶
sequence alignment¶
sequence profile¶
simulation¶
statistics¶
structural properties¶
structure selectors¶
structure validation¶
BioShell examples coming from all three categories (ap_*, ex_* and *.py), sorted by keywords. Typically an example has more than one keyword assigned and thus appears more than once on the list.
BioShell C++ library¶
BioShell is a versatile C++11 library for structural bioinformatics. Its structure is shown in the figure below:

See the API documentation generated with Doxygen.
Reading and processing PDB files¶
Reading PDB files into a BioShell program is divided into two steps:
- loading a text file into memory, and
- parsing its content and creating Structure object(s)
Loading a PDB file¶
You have to create a reader object to read a PDB file. In the simplest case this looks as below:
core::data::io::Pdb reader("infile.pdb");
This reader will skip water molecules and hydrogen atoms. You can control which PDB lines are omitted during reading by providing a PdbLineFilter instance to the constructor, e.g.
core::data::io::Pdb reader("infile.pdb",
core::data::io::all_true(core::data::io::is_not_water,
core::data::io::is_not_alternative));
PdbLineFilter objects can dramatically limit the number of PDB lines to be parsed and thus shorten the time spent on loading a PDB file.
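The idea of composing line filters can be sketched in plain Python. The predicates below only mimic the names used in the C++ snippet above; they are hypothetical stand-ins written for this illustration, not BioShell code, and their exact rules (for example, keeping alternate location "A") are assumptions. The column positions follow the fixed-width PDB format: alternate location indicator in column 17 and residue name in columns 18-20.

```python
def is_not_water(line):
    """True unless the residue name field (columns 18-20) is HOH."""
    return line[17:20] != "HOH"


def is_not_alternative(line):
    """True when the alternate location field (column 17) is blank or 'A' (assumed rule)."""
    return line[16] in (" ", "A")


def all_true(*predicates):
    """Combine predicates into one filter that accepts a line only if all accept it."""
    return lambda line: all(p(line) for p in predicates)


def filter_atom_lines(lines, line_filter):
    """Keep ATOM/HETATM lines accepted by line_filter; other records are dropped."""
    return [l for l in lines if l.startswith(("ATOM", "HETATM")) and line_filter(l)]


def make_line(record, serial, name, alt, res, chain, resid):
    """Build the leading fixed-width fields of a PDB atom line (demo data only)."""
    return "{:<6}{:>5} {:<4}{}{:>3} {}{:>4}".format(record, serial, name, alt, res, chain, resid)
```

Because rejected lines never reach the parser, the cost of a filter is a few string comparisons per line, which is where the loading speed-up comes from.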
Creating Structure object¶
Once a file is loaded, you can create a Structure object from one of its models:
core::data::structural::Structure_SP model = reader.create_structure(0);
The very first model is indexed by 0. Every time the create_structure() method is called, a new Structure object is created, which includes the necessary memory allocation. Creating new atom objects is in fact the slowest part of this call. Sometimes it is possible to recycle an old structure by filling it with new coordinates rather than creating a new one from scratch. This can be done as in the ap_contact_map program; the relevant fragment is shown below:
1 2 3 4 5 6 7 8 | } if (std::strcmp(argv[1],"CB")==0) { core::data::io::PdbLineFilter filter = core::data::io::is_cb; selector = std::make_shared<core::data::structural::selectors::IsNamedAtom>(" CB "); } double cutoff = utils::from_string<double>(argv[3]); // The third parameter is the contact distance (in Angstroms) core::data::io::Pdb reader(argv[2],filter); // --- file name (PDB format, may be gzip-ped) |
Coordinates of a new structure must fit into the existing structure, i.e. the new structure must be composed of the same number of chains, residues and atoms as the old one. In practice this is most useful when a multi-model PDB file must be loaded, as in this example:
- in line 1 a PDB file is loaded with a filter instance defined somewhere earlier
- in line 3 a Structure object is created based on the first model defined in the file
- in line 4 a ContactMap object is created and the first structure is loaded into it
- finally, in lines 5-8 a loop iterates over all the remaining models; in line 6 the coordinates of each model are loaded into the existing structure (the one created in line 3)
Residue, PdbAtom and Chain objects are created only once, when the structure at index 0 is loaded. After that the loop only substitutes the coordinates of this structure.
BioShell Python library¶
BioShell 3.0 also comes with Python bindings, i.e. BioShell classes can also be used as Python modules. Let’s consider the following C++ program that reads a PDB file and creates a Structure object that represents a biomacromolecular complex. Then it writes a FASTA sequence for every chain in the structure:
```cpp
#include <iostream>

#include <core/data/io/Pdb.hh>
#include <core/data/structural/Structure.hh>
#include <utils/options/OptionParser.hh>
#include <utils/exit.hh>

int main(const int argc, const char *argv[]) {

  using namespace core::data::io; // Pdb and create_fasta_string live there
  Pdb reader(argv[1], is_not_alternative, only_ss_from_header, true);
  core::data::structural::Structure_SP strctr = (reader.create_structure(0));

  // Iterate over all chains
  for (int ic = 0; ic < strctr->count_chains(); ++ic)
    std::cout << "> " << strctr->code() << (*strctr)[ic]->id() << "\n" // --- e.g. prints "> 2gb1 A"
              << (*strctr)[ic]->create_sequence()->sequence << "\n";   // --- prints the sequence itself
}
```
The same program written in Python looks simpler. It calls nearly the same BioShell C++ objects as the one above, but thanks to the simplicity of Python, the script is a bit shorter:
```python
import sys
from pybioshell.core.data.io import find_pdb

GROUP: File processing; Format conversion
    """)
  sys.exit()

for pdb_fname in sys.argv[1:] :
  structure = find_pdb(pdb_fname, "./").create_structure(0)
  for ic in range(structure.count_chains()) :
    chain = structure[ic]
    print(">",structure.code(), chain.id())
    print(chain.create_sequence(IF_EXCLUDE_LIGANDS).sequence)
```
Reading and writing PDB files¶
Reading PDB files¶
Reading PDB data is a two-stage process. First you create a reader that loads PDB content into memory:
pdb = Pdb(pdb_code,"",False)
The first argument is a string with the path to a PDB file. The second is a string that names a PdbLineFilter object, e.g. "is_ca" (the reader will read only CA atoms) or "is_not_water" (the reader will read everything that is not water). In general, filtering PDB lines may considerably speed up loading. The third argument is a flag that tells whether the reader should read the header of a PDB file. The header is necessary to read some additional information, e.g. about secondary structure, connectivity (CONECT records), etc.
Then the content is parsed according to user’s requests. In the example below the FASTA string is printed out.
```python
structure = pdb.create_structure(0)
for ic in range(structure.count_chains()) :
  chain = structure[ic]
  print(">",structure.code(), chain.id())
  print(chain.create_sequence().sequence)
```
Writing PDB files¶
The PdbAtom class provides the to_pdb_line() method. In the following example four nested loops iterate over models, chains, residues and (finally) atoms:
```python
for i_model in range(pdb.count_models()) :
  structure = pdb.create_structure(i_model)
  for ic in range(structure.count_chains()):
    chain = structure[ic]
    for ir in range(chain.count_residues()):
      resid = chain[ir]
      for ia in range(resid.count_atoms()):
        # PDB atom names may be padded with spaces (e.g. " CA "), so strip them before comparing
        if resid[ia].atom_name().strip() in ("CA", "CB"):
          print(resid[ia].to_pdb_line())
```
The pybioshell.core.data.io module also contains the write_pdb() method, which in the example below writes model no. 5 of a structure object (note: models count from 0) to a file with a given name. Existing files will be overwritten.
```python
reader = Pdb(pdb_fname,"is_ca",False)
structure = reader.create_structure(0)
write_pdb(structure, out_fname, 5)
```
Help! My script crashes!¶
Getting more logs from BioShell library¶
There are nine levels of importance for log messages reported by BioShell methods. Seven of them are used for general reporting, in the order of decreasing importance: CRITICAL, SEVERE, WARNING, INFO, FINE, FINER and FINEST. The two additional levels, HTTP and FILE, are used to report HTTP messages and information about disk I/O, respectively. By default, the logging level is set to INFO, which means that only INFO and more important messages show up. To see more logs, increase the verbosity as follows:
from pybioshell.utils import LogManager
LogManager.FINEST()
Checking C++ exceptions from Python¶
Occasionally the PyBioShell library throws an exception, which stops a Python script. To find out the reason, wrap a BioShell call in a try / except block and print out the exception information as below:
try:
    pdb = find_pdb(pdb_fname, path, True, False)
except:
    sys.stderr.write(str(sys.exc_info()[0])+" "+str(sys.exc_info()[1]))
Using PyBioShell in PyMOL¶
PyBioShell can be loaded by a Python interpreter like any other library. This also applies to the interpreter that is built into PyMOL, a molecular visualization system [1].
Loading PyBioShell¶
Load a PyBioShell module by typing a respective command in a PyMOL command input area, the same as you would use in a Python script. E.g. you can try the following:
from pybioshell.core import BioShellVersion
print(BioShellVersion().to_string())
which should print information about your BioShell version, as you can see below:

After a successful import, you can use any PyBioShell module inside PyMOL. But how to transfer data that is visible in PyMOL 3D window to BioShell?
Loading PDB data from PyMOL¶
Fortunately, PyMOL can export a desired part of a 3D view in the PDB format, which can be directly parsed by the core.data.io.Pdb reader, e.g.:
from pybioshell.core.data.io import Pdb
cmd.fetch("2gb1")
pdb_txt = cmd.get_pdbstr('all')
pdb = Pdb(pdb_txt, "")
strstr = pdb.create_structure(0)
The detailed description how to read in and process PDB data is given here
References
[1] | https://pymol.org/ |
These pages provide documentation for the BioShell package. API documentation is given here. To answer the most common questions, we have a list of shortcuts below:
- Description of BioShell package components: What is BioShell
- Overview of BioShell functionality: Examples by functionality
- How to install BioShell package: Installation
- How to install PyBioShell package (Python bindings to BioShell): PyBioShell Installation
Our laboratory protocols, both related to BioShell and Rosetta, are provided on our labnotes website.
SURPASS model¶
- SURPASS model
- Single United Residue per Pre-Averaged Secondary Structure fragment is a coarse-grained low resolution model for protein simulations.
- See SURPASS representation
- Read about: SURPASS force field
- Necessary and optional Input files
- Resulting Output files
- surpass_annealing command line program
SURPASS force field¶
The generic force field for the SURPASS model describes the most fundamental properties of globular proteins. The only sequence-dependent parameters come from secondary structure. The force field was derived from regularities observed in real protein structures. The statistics are based on a non-redundant set of 4600 protein chains, representing all known protein families, with resolution not lower than 1.6Å and a sequence identity not greater than 60%. The analysis of these statistical data, described below, defines the SURPASS force field, which consists of knowledge-based statistical potentials.
> [Figure 1. Schematic illustration of the terms included in the SURPASS force field.]
Terms to create regular secondary structure (close in sequence)¶
1. Short range interactions¶
The lack of atomic detail in the strongly simplified and pre-averaged SURPASS chain may cause an incorrect local geometry of the structure. To avoid this, it is necessary to transfer the structural regularities of the atomistic models onto the corresponding sets of united atoms. All generic terms (R12, R13, R14 and R15) are prepared in six variants (HH, EE, CC, HE, HC, EC), depending on the secondary structure assignments for pairs of residues located at key positions. All short-range interactions have been implemented in the force field as potentials of mean force (PMF), using a one-dimensional kernel density estimator (KDE) to estimate the density of the empirical distribution.
> [Table 1. Secondary structure dependent short range interactions. Columns: term | statistic plots (6 variants) | energy plot (all-in-one); table of 4 rows x 8 columns]
> [equation and description]
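Since the equation itself is missing here, the sketch below only illustrates the general technique named above: a one-dimensional Gaussian kernel density estimate turned into an energy by Boltzmann inversion, \(E(x) = -kT \ln p(x)\). The Gaussian kernel, the fixed bandwidth, the absence of a reference state and both function names are assumptions made for this example; the actual SURPASS terms may be defined differently.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the probability density of 1D samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian kernels centered at every observed value
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def potential_of_mean_force(density, x, kT=1.0):
    """Boltzmann inversion of a density: energy in units of kT (assumed form)."""
    return -kT * math.log(density(x))
```

For example, with distances observed between pseudo residues i and i+2 as `samples`, `potential_of_mean_force(gaussian_kde(samples, 0.2), r)` is low where the observed distances are dense and grows as `r` moves away from them.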
2. Model of hydrogen bonding¶
In the SURPASS model only the hydrogen bonds between residues that are distant in the sequence, especially in extended structure fragments, are modeled more directly. Therefore, the formation of model hydrogen bonds depends on the fulfillment of a few simple geometrical conditions:
- the length of the model hydrogen bond is in a range of 3.8Å to 6.0Å, and the most probable length is 4.65Å;
- the maximum number of connections for each pseudo residue in the β-strand is 2; if there are more potential candidates for hydrogen bond formation, the best two are chosen according to the following angular criteria:
  - a hydrogen bond should be perpendicular to the main chain of both interacting β-strands and the permitted angle range is from 70˚ to 115˚;
  - the maximum allowable twist of the beta sheet, measured as the planar angle between the main chains of two adjacent β-strands, is not greater than 55˚;
  - for a pseudo residue that forms two hydrogen bonds (with two different β-strands), the planar angle between these bonds must be greater than 125˚, and 180˚ is the best orientation.
> [Figure 2. Statistical analysis of the geometry of the model hydrogen bond: A – length of hydrogen pseudobonds extracted from the RDF of distance between i-th and j-th pseudoresidues in two beta strands. B – angle between two β-strands connected by a hydrogen bond. C – twist of the β-sheet measured as a planar angle between the main chains of two adjacent β-strands; D – angle between two hydrogen bonds of three connecting β-strands.]
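The geometric conditions listed above can be turned into a simple acceptance test. The sketch below is an illustration under stated assumptions, not the SURPASS implementation: the function names are hypothetical, the local main chain direction of each strand is passed in as a ready-made vector (how it is derived from neighbouring pseudo residues is not specified here), and the twist is made independent of strand direction by taking the smaller of the angle and its supplement.

```python
import math


def angle_deg(u, v):
    """Planar angle between two 3D vectors, in degrees."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    cosine = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp against rounding errors
    return math.degrees(math.acos(cosine))


def is_model_hbond(r_i, r_j, axis_i, axis_j):
    """Check length, perpendicularity and twist criteria for one candidate bond."""
    bond = [b - a for a, b in zip(r_i, r_j)]
    length = math.sqrt(sum(c * c for c in bond))
    if not 3.8 <= length <= 6.0:
        return False
    # The bond should be roughly perpendicular to both strands (70 to 115 degrees)
    if any(not 70.0 <= angle_deg(bond, axis) <= 115.0 for axis in (axis_i, axis_j)):
        return False
    # Assumption: twist does not depend on whether strands run parallel or antiparallel
    twist = angle_deg(axis_i, axis_j)
    return min(twist, 180.0 - twist) <= 55.0


def two_bonds_compatible(bond_a, bond_b):
    """Two bonds formed by one pseudo residue must open an angle above 125 degrees."""
    return angle_deg(bond_a, bond_b) > 125.0
```

Choosing "the best two" candidates would additionally need a ranking, for instance by how close each bond length is to 4.65Å; that ranking is left out because the text does not define it.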
3. Helix stiffness¶
Terms to control local packing (close in space)¶
1. Local repulsive interactions¶
2. Local attractive interactions: Excluded Volume & Contacts¶
- pseudo atom H (helix-like) for helical (HHHH) or almost helical (HHHC, CHHH) fragments
- pseudo atom S (β-strand-like) representing centers of mass of EEEE, EEEC or CEEE fragments
- pseudo atom C (coil-like) for all remaining secondary structure combinations (H, E and C)
Input files¶
A secondary structure profile file (*.ss2 file format) is the only mandatory input to the program. An example input file for the 2GB1 protein can be found here.
Optionally, a starting conformation (in the PDB format) may be provided with the -model::pdb flag.
Please note that these input files must be in all-residues representation, even though SURPASS models are shorter by 3 residues.
An input SS2 file may be conveniently generated from a PDB file as long as it contains secondary structure information in its header.
The following command uses the seqc program of the BioShell package:
seqc -in::pdb=2gb1.pdb -select:chains=A -in:pdb:header -out:ss2
Output files¶
After every outer cycle (see options below), surpass_annealing makes an observation of the current state of the simulated system. Typically this means recording the energy of the system, various evaluators, the topology of the protein, and the coordinates in the .pdb file format.
energy.dat
- The file provides energy components for every observed frame
movers.dat
- The file provides movers acceptance ratio and range
observers.dat
- The file provides various measurements for every observed frame, such as elapsed time, temperature, radius of gyration, crmsd, etc.
topology.dat
- The file provides the topology footprint for every observed frame
trajectory.pdb
- The file contains coordinates of the system recorded at every observation event (the file name may be changed with the -out:pdb option)
SURPASS representation¶
SURPASS is a new coarse-grained model of protein structure. The deep reduction in the number of atoms in the representation results in a substantial computational speed-up and places the model among low-resolution representations.
The number of pseudoresidues in the modeled system follows from the polypeptide chain length and equals N-3, where N is the number of amino acids in the sequence. The positions of pseudoresidues are defined by averaging the coordinates of short secondary structure fragments; each fragment of four consecutive residues is replaced by a single center of interactions. The choice of four-residue averaging is crucial for the local geometry of the model because it leads to an almost linear shape of the SURPASS fragments representing helices or beta strands.
The SURPASS representation assumes three types of pseudo atoms depending on secondary structure assignment of the averaged fragments of protein structure:
- pseudo atom H (helix-like) for helical (HHHH) or almost helical (HHHC, CHHH) fragments
- pseudo atom S (like β-strand) representing centers of mass of EEEE, EEEC or CEEE fragments
- pseudo atom C (coil-like) for all remaining secondary structure combinations (H, E and C)
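The mapping described above can be sketched in a few lines of Python. This is an illustration only, not the BioShell implementation: it assumes one representative position per residue (for example the Cα atom, which the text does not specify) and uses hypothetical helper names `surpass_type` and `to_surpass`.

```python
def surpass_type(window):
    """Classify a four-letter secondary structure window as H, S or C."""
    if window in ("HHHH", "HHHC", "CHHH"):
        return "H"
    if window in ("EEEE", "EEEC", "CEEE"):
        return "S"
    return "C"  # every remaining combination of H, E and C


def to_surpass(coords, ss):
    """Average every four consecutive positions into one pseudo atom.

    coords: list of (x, y, z) tuples, one per residue (assumed C-alpha positions)
    ss:     secondary structure string (H/E/C), one letter per residue
    Returns a list of (type, (x, y, z)) with len(coords) - 3 elements.
    """
    if len(coords) != len(ss):
        raise ValueError("coords and ss must describe the same residues")
    pseudo = []
    for i in range(len(coords) - 3):
        fragment = coords[i:i + 4]
        # Arithmetic mean of the four positions, computed per axis.
        center = tuple(sum(axis) / 4.0 for axis in zip(*fragment))
        pseudo.append((surpass_type(ss[i:i + 4]), center))
    return pseudo


# Seven residues on a straight line, with a short helix in the middle.
coords = [(float(i), 0.0, 0.0) for i in range(7)]
model = to_surpass(coords, "CHHHHCC")
print(len(model), "".join(t for t, _ in model))
```

Seven residues yield four pseudo atoms (N-3), and only the last window (HHCC) falls into the coil-like class.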
surpass_annealing program¶
You need just a .ss2 file for your target protein to run the program. Provide it using the -in:ss2 command line option.
Other options are used to:
- specify the starting conformation
- specify the temperature set for the simulated annealing run
-sample:t_start, -sample:t_end and -sample:t_steps define a set of N+1 temperatures distributed uniformly between the starting and the ending temperature
- specify the length of the simulation (Monte Carlo steps)
-sample:mc_inner_cycles defines the amount of sampling between frames that are recorded
-sample:mc_cycle_factor makes every inner MC cycle longer (multiplying them by a given factor)
-sample:mc_outer_cycles defines the number of frames recorded for every temperature value
The total number of MC steps is then \(N_{temperatures} \times N_{inner} \times N_{factor} \times N_{outer}\)
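The relation above can be checked with a short Python sketch. It assumes that the N in "N+1 temperatures" is the value passed to -sample:t_steps, which the text does not state explicitly; the function names are hypothetical and are not part of BioShell.

```python
def temperature_schedule(t_start, t_end, t_steps):
    """Return t_steps + 1 temperatures spaced uniformly from t_start to t_end."""
    if t_steps < 1:
        raise ValueError("t_steps must be at least 1")
    step = (t_end - t_start) / t_steps
    return [t_start + i * step for i in range(t_steps + 1)]


def total_mc_steps(n_temperatures, inner, factor, outer):
    """Total MC steps: N_temperatures x N_inner x N_factor x N_outer."""
    return n_temperatures * inner * factor * outer


# Values taken from the example commands in this section.
temps = temperature_schedule(2.2, 0.9, 15)
steps = total_mc_steps(len(temps), inner=10, factor=10, outer=100)
print(len(temps), steps)
```

Under that assumption the example settings give 16 temperatures and 160000 MC steps in total, with 100 frames recorded at each temperature.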
Example parameters for ab-initio simulation of a protein:
./surpass_annealing -model:random \
-in:ss2=test_inputs/2gb1A.ss2 \
-sample:t_start=2.2 \
-sample:t_end=0.9 \
-sample:t_steps=15 \
-sample:mc_outer_cycles=100 \
-sample:mc_inner_cycles=10 \
-sample:mc_cycle_factor=10 \
-sample:perturb:range=0.7
Example parameters to relax an input structure:
./surpass_annealing -model:pdb=2gb1A.pdb \
-in:ss2=test_inputs/2gb1A.ss2 \
-sample:t_start=2.2 \
-sample:t_end=0.9 \
-sample:t_steps=15 \
-sample:mc_outer_cycles=100 \
-sample:mc_inner_cycles=10 \
-sample:mc_cycle_factor=10 \
-sample:perturb:range=0.7
Notes for BioShell developers¶
Do you want to participate in the project? Do you have a brilliant idea for a cool extension? Or maybe you need a specific feature?
This page will help you with rolling your own copy of BioShell!
1. Don't create a branch, fork instead
Make your own fork of the BioShell repository.
2. Work as usual
3. Merge with the upstream repository often
Remember to sync your fork of the BioShell source tree to keep it up-to-date with the upstream repository. Use the command below:
git pull git@bitbucket.org:dgront/bioshell.git
This will only update your local copy of the repository. Use git push to update your fork on BitBucket.
4. Create a pull request
When your work is done, you may contribute your changes to the main BioShell repository. Simply push your development branch to your forked remote repository and create a pull request.