chemfp.rdkit_toolkit module¶
The chemfp toolkit API wrapper for the RDKit toolkit.
This module is also available as chemfp.rdkit
-
chemfp.rdkit_toolkit.
name
¶ The string “rdkit”.
-
chemfp.rdkit_toolkit.
software
¶ The string used in output file metadata to describe this version of RDKit. For example, “RDKit/2023.03.1”.
-
chemfp.rdkit_toolkit.
avalon
¶ The available version of the ‘RDKit-Avalon’ fingerprint type, for example, an instance of
chemfp.rdkit_types.RDKitAvalonFingerprintType_v1
with the full type: RDKit-Avalon/1 fpSize=512 isQuery=0 bitFlags=15761407
-
chemfp.rdkit_toolkit.
maccs166
¶ The available version of the ‘RDKit-MACCS166’ fingerprint type, for example, an instance of
chemfp.rdkit_types.RDKitMACCSFingerprintType_v2
with the full type: RDKit-MACCS166/2
-
chemfp.rdkit_toolkit.
morgan
¶ An alias for
chemfp.rdkit_toolkit.morgan2
.
-
chemfp.rdkit_toolkit.
morgan1
¶ The available version of the ‘RDKit-Morgan radius=1’ fingerprint type, for example, an instance of
chemfp.rdkit_types.RDKitMorganFingerprintType_v1
with the full type: RDKit-Morgan/1 radius=1 fpSize=2048 useFeatures=0 useChirality=0 useBondTypes=1
-
chemfp.rdkit_toolkit.
morgan2
¶ The available version of the ‘RDKit-Morgan radius=2’ fingerprint type, for example, an instance of
chemfp.rdkit_types.RDKitMorganFingerprintType_v1
with the full type: RDKit-Morgan/1 radius=2 fpSize=2048 useFeatures=0 useChirality=0 useBondTypes=1
-
chemfp.rdkit_toolkit.
morgan3
¶ The available version of the ‘RDKit-Morgan radius=3’ fingerprint type, for example, an instance of
chemfp.rdkit_types.RDKitMorganFingerprintType_v1
with the full type: RDKit-Morgan/1 radius=3 fpSize=2048 useFeatures=0 useChirality=0 useBondTypes=1
-
chemfp.rdkit_toolkit.
morgan4
¶ The available version of the ‘RDKit-Morgan radius=4’ fingerprint type, for example, an instance of
chemfp.rdkit_types.RDKitMorganFingerprintType_v1
with the full type: RDKit-Morgan/1 radius=4 fpSize=2048 useFeatures=0 useChirality=0 useBondTypes=1
-
chemfp.rdkit_toolkit.
atom_pair
¶ The available version of the ‘RDKit-AtomPair’ fingerprint type, for example, an instance of
chemfp.rdkit_types.RDKitAtomPairFingerprint_v2
with the full type: RDKit-AtomPair/2 fpSize=2048 minLength=1 maxLength=30
-
chemfp.rdkit_toolkit.
pattern
¶ The available version of the ‘RDKit-Pattern’ fingerprint type, for example, an instance of
chemfp.rdkit_types.RDKitPatternFingerprint_v4
with the full type: RDKit-Pattern/4 fpSize=2048
-
chemfp.rdkit_toolkit.
rdk
¶ The available version of the ‘RDKit-Fingerprint’ fingerprint type, for example, an instance of
chemfp.rdkit_types.RDKitFingerprintType_v2
with the full type: RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=2048 nBitsPerHash=2 useHs=1
-
chemfp.rdkit_toolkit.
secfp
¶ The available version of the ‘RDKit-SECFP’ fingerprint type, for example, an instance of
chemfp.rdkit_types.RDKitSECFPFingerprintType_v1
with the full type: RDKit-SECFP/1 fpSize=2048 radius=3 rings=1 isomeric=0 kekulize=0 min_radius=1
-
chemfp.rdkit_toolkit.
torsion
¶ The available version of the ‘RDKit-Torsion’ fingerprint type, for example, an instance of
chemfp.rdkit_types.RDKitTorsionFingerprintType_v3
with the full type: RDKit-Torsion/3 fpSize=2048 targetSize=4
-
chemfp.rdkit_toolkit.
is_licensed
()¶ Return True - RDKit is always licensed
Returns: True
-
chemfp.rdkit_toolkit.
get_formats
(include_unavailable=False)¶ Get the list of structure formats that RDKit supports
If include_unavailable is True then also include RDKit formats which aren’t available to this specific version of RDKit, such as the InChI formats if your RDKit installation wasn’t compiled with InChI support.
Parameters: include_unavailable (True or False) – include unavailable formats?
Returns: a list of Format objects
-
chemfp.rdkit_toolkit.
get_input_formats
()¶ Get the list of supported RDKit input formats
Returns: a list of chemfp.base_toolkit.Format
objects
-
chemfp.rdkit_toolkit.
get_output_formats
()¶ Get the list of supported RDKit output formats
Returns: a list of chemfp.base_toolkit.Format
objects
-
chemfp.rdkit_toolkit.
get_format
(format)¶ Get the named format, or raise a ValueError
This will raise a ValueError if RDKit does not implement the format format_name or that format is not available.
Parameters: format_name (a string) – the format name
Returns: a
chemfp.base_toolkit.Format
object
-
chemfp.rdkit_toolkit.
get_input_format
(format)¶ Get the named input format, or raise a ValueError
This will raise a ValueError if RDKit does not implement the format format_name or that format is not an input format.
Parameters: format_name (a string) – the format name
Returns: a
chemfp.base_toolkit.Format
object
-
chemfp.rdkit_toolkit.
get_output_format
(format)¶ Get the named output format, or raise a ValueError
This will raise a ValueError if RDKit does not implement the format format_name or that format is not an output format.
Parameters: format_name (a string) – the format name
Returns: a
chemfp.base_toolkit.Format
object
-
chemfp.rdkit_toolkit.
get_input_format_from_source
(source=None, format=None)¶ Get the most appropriate format given the available source and format information
If format is a
chemfp.base_toolkit.Format
then return it. If it’s a Format-like object with “name” and “compression” attributes use it to make a real Format object with the same attributes. If it’s a string then use it to create a Format object.
If format is None, use the source to auto-detect the format. If auto-detection is not possible, assume it’s an uncompressed SMILES file.
Parameters: - source (a filename (as a string), a file object, or None to read from stdin) – the structure data source.
- format (a Format(-like) object, string, or None) – format information, if known.
Returns: a
chemfp.base_toolkit.Format
object
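The resolution order described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not chemfp's implementation: `resolve_format`, `split_name`, the `(name, compression)` tuple used in place of a Format object, and the extension tables are all hypothetical, and file objects are treated as carrying no usable filename.

```python
import os

# Hypothetical extension tables, for illustration only.
COMPRESSIONS = {".gz": "gz", ".zst": "zst"}
FORMAT_NAMES = {".smi": "smi", ".sdf": "sdf", ".inchi": "inchi"}


def split_name(filename):
    """Split a name like 'x.sdf.gz' into (format name or None, compression)."""
    root, ext = os.path.splitext(filename.lower())
    compression = COMPRESSIONS.get(ext, "")
    if compression:
        # Strip the compression suffix and look at the next extension.
        root, ext = os.path.splitext(root)
    return FORMAT_NAMES.get(ext), compression


def resolve_format(source=None, format=None):
    """Return a (name, compression) pair following the documented order."""
    if format is not None:
        if isinstance(format, str):
            # A string such as "sdf.gz" names the format directly.
            name, compression = split_name("x." + format)
            if name is None:
                raise ValueError("Unknown format %r" % (format,))
            return name, compression
        # A Format-like object: copy its two attributes.
        return format.name, format.compression
    if isinstance(source, str):
        name, compression = split_name(source)
        if name is not None:
            return name, compression
    # No usable hint (stdin, a file object, or an unknown extension):
    # fall back to uncompressed SMILES.
    return "smi", ""
```

In this sketch an explicit format always wins over the filename, and the SMILES fallback only applies when neither argument gives a usable hint.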
-
chemfp.rdkit_toolkit.
get_output_format_from_destination
(destination=None, format=None)¶ Get the most appropriate format given the available destination and format information
If format is a
chemfp.base_toolkit.Format
then return it. If it’s a Format-like object with “name” and “compression” attributes use it to make a real Format object with the same attributes. If it’s a string then use it to create a Format object.
If format is None, use the destination to auto-detect the format. If auto-detection is not possible, assume it’s an uncompressed SMILES file.
Parameters: - destination (a filename (as a string), a file object, or None to write to stdout) – the structure data destination.
- format (a Format(-like) object, string, or None) – format information, if known.
Returns: a
chemfp.base_toolkit.Format
object
-
chemfp.rdkit_toolkit.
read_molecules
(source=None, format=None, id_tag=None, reader_args=None, errors='strict', location=None, encoding='utf8', encoding_errors='strict')¶ Return an iterator that reads RDKit molecules from a structure file
Iterate through the format structure records in source. If format is None then auto-detect the format based on the source. For SD files, use id_tag to get the record id from the given SD tag instead of the title line. (read_molecules() will ignore the id_tag. It exists to make it easier to switch between reader functions.)
Note: the reader returns a new RDKit molecule each time.
The reader_args dictionary parameters depend on the format. These include:
- SMILES
- delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
- has_header - True or False
- sanitize - True or default sanitizes; False for unsanitized processing
- InChI
- delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
- sanitize - True or default sanitizes; False for unsanitized processing
- removeHs - True or default removes explicit hydrogens; False leaves them in the structure
- logLevel - an integer log level
- treatWarningAsError - True raises an exception on error; False or default keeps processing
- SDF
- sanitize - True or default sanitizes; False for unsanitized processing
- removeHs - True or default removes explicit hydrogens; False leaves them in the structure
- strictParsing - True or default for strict parsing; False for lenient parsing
The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.
The location parameter takes a
chemfp.io.Location
instance. If None then a default Location will be created.
See
chemfp.rdkit_toolkit.read_ids_and_molecules()
if you want (id, molecule) pairs instead of just the molecules.
Parameters: - source (a filename, file object, or None to read from stdin) – the structure source
- format (a format name string, or Format object, or None to auto-detect) – the input structure format
- id_tag (string, or None to use the record title) – SD tag containing the record id
- reader_args (a dictionary) – reader parameters passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
- location (a
chemfp.io.Location
object, or None) – object used to track parser state information
Returns: a
chemfp.base_toolkit.MoleculeReader
iterating RDKit molecules
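The three errors modes can be pictured with a small stand-alone loop. This is a sketch of the documented behavior rather than chemfp's own code: `iter_parsed` and the `parse` callback are hypothetical names, and a `ValueError` stands in for whatever exception a real structure parser would raise.

```python
import sys


def iter_parsed(records, parse, errors="strict"):
    """Yield parse(record) for each record, handling failures per `errors`."""
    if errors not in ("strict", "report", "ignore"):
        raise ValueError("Unsupported errors value: %r" % (errors,))
    for recno, record in enumerate(records, 1):
        try:
            result = parse(record)
        except ValueError as err:
            if errors == "strict":
                raise  # stop at the first bad record
            if errors == "report":
                # Describe the problem on stderr, then keep going.
                sys.stderr.write("record %d: %s\n" % (recno, err))
            continue  # "report" and "ignore" both skip to the next record
        yield result
```

With `int` as a toy parser, `list(iter_parsed(["1", "x", "3"], int, errors="ignore"))` skips the unparseable middle record, while the default "strict" mode raises as soon as it reaches it.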
-
chemfp.rdkit_toolkit.
read_molecules_from_string
(content, format, id_tag=None, reader_args=None, errors='strict', location=None)¶ Return an iterator that reads RDKit molecules from a string containing structure records
content is a string containing 0 or more records in the format format. See
chemfp.rdkit_toolkit.read_molecules()
for details about the other parameters. See
chemfp.rdkit_toolkit.read_ids_and_molecules_from_string()
if you want to read (id, RDKit molecule) pairs instead of just molecules.
Parameters: - content (a string) – the string containing structure records
- format (a format name string, or Format object) – the input structure format
- id_tag (string, or None to use the record title) – SD tag containing the record id
- reader_args (a dictionary) – reader arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
- location (a
chemfp.io.Location
object, or None) – object used to track parser state information
Returns: a
chemfp.base_toolkit.MoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
read_ids_and_molecules
(source=None, format=None, id_tag=None, reader_args=None, errors='strict', location=None, encoding='utf8', encoding_errors='strict')¶ Return an iterator that reads (id, RDKit molecule) pairs from a structure file
See
chemfp.rdkit_toolkit.read_molecules()
for full parameter details. The major difference is that this returns an iterator of (id, RDKit molecule) pairs instead of just the molecules.
Parameters: - source (a filename, file object, or None to read from stdin) – the structure source
- format (a format name string, or Format object, or None to auto-detect) – the input structure format
- id_tag (string, or None to use the record title) – SD tag containing the record id
- reader_args (a dictionary) – reader arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
- location (a
chemfp.io.Location
object, or None) – object used to track parser state information
Returns: a
chemfp.base_toolkit.IdAndMoleculeReader
iterating (id, RDKit molecule) pairs
-
chemfp.rdkit_toolkit.
read_csv_ids_and_molecules
(source, *, id_column=1, mol_column=2, dialect=None, has_header=True, compression='auto', format='smi', id_tag=None, reader_args=None, errors='report', csv_errors='strict', location=None, encoding='utf8', encoding_errors='strict')¶ Read ids and molecules from column(s) of a CSV file using RDKit.
Read from source, which may be a filename, a file-like object, or None to read from stdin.
Use id_column and mol_column to specify the columns containing the record identifier and molecule record. By default the identifiers come from column 1 (the first column) and the molecules from column 2 (the second column). Columns can be specified by integer position (starting with 1), or by a string matching the title from the header line. If id_column is None then the molecule id will come from parsing the molecule record.
Use dialect to specify the type of CSV file. The default of None infers the dialect from the filename extension: *.csv for comma-separated and *.tsv for tab-separated. The dialect can be specified directly as “csv” or “tsv”, as the name of a registered Python csv dialect (see https://docs.python.org/3/library/csv.html; “excel” is the same as “csv” and “excel-tab” is the same as “tsv”), or as a csv.Dialect or CSVDialect instance.
If has_header is True then the first line/record contains column titles, and if False then there are no column titles.
Use compression to specify the file compression format. The default “auto” uses the filename extension. Other options are “gz” and “zst”, or the empty string “” to mean no compression.
Use format to specify the structure format for how to parse the molecule column. The default of ‘smi’ will parse it as a SMILES string and, if id_column=None, will also parse any identifier.
The id_tag and reader_args arguments contain additional format configuration parameters.
The errors and csv_errors parameters describe how to handle failures in molecule parsing and CSV parsing, respectively. The default is to report molecule parse failures to stderr, and to stop parsing if a CSV row does not contain enough columns.
The location parameter takes a
chemfp.io.Location
instance. If None then a default Location will be created.
The encoding and encoding_errors are strings describing the input file character encoding, and how to handle decoding errors. See https://docs.python.org/3/library/codecs.html#standard-encodings and https://docs.python.org/3/library/codecs.html#error-handlers for details.
Parameters: - source (a filename, file object, or None to read from stdin) – the CSV source
- id_column (integer position (starting from 1), string, or None) – the column position or column title containing the identifier
- mol_column (integer position (starting from 1), string) – the column position or column title containing the structure record
- dialect (None, a string name, or a Dialect instance) – the CSV dialect
- has_header (bool) – True if the first record contains titles, False if it does not
- compression (string or None) – file compression format
- format (a format name string, or Format object) – the molecule structure format
- id_tag (string, or None to use the record title) – SD tag containing the record id
- reader_args (a dictionary) – reader arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle molecule parse errors
- csv_errors (one of "strict", "report", or "ignore") – specify how to handle CSV errors
- location (a
chemfp.io.Location
object, or None) – object used to track parser state information
- encoding (string) – the name of the file’s character encoding
- encoding_errors (string) – the method used to handle decoding errors
Returns: a
chemfp.base_toolkit.IdAndMoleculeReader
iterating (id, RDKit molecule) pairs
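The column and dialect rules above can be illustrated with the standard csv module alone. This is a hedged sketch, not chemfp's implementation: `infer_dialect`, `resolve_column`, and `read_id_and_record_columns` are hypothetical helpers, the id_column=None case is left out, and the function yields the raw molecule record strings that would then be handed to the structure parser.

```python
import csv
import io


def infer_dialect(filename):
    """Map a filename to a csv module dialect name; '*.tsv' is tab-separated."""
    return "excel-tab" if filename.lower().endswith(".tsv") else "excel"


def resolve_column(column, header):
    """Convert a 1-based position or a header title into a 0-based index."""
    if isinstance(column, int):
        if column < 1:
            raise ValueError("column positions start at 1")
        return column - 1
    if header is None:
        raise ValueError("column titles require a header line")
    return header.index(column)  # raises ValueError for an unknown title


def read_id_and_record_columns(text, filename, id_column=1, mol_column=2,
                               has_header=True):
    """Yield (id, molecule record) string pairs from CSV text."""
    rows = csv.reader(io.StringIO(text), dialect=infer_dialect(filename))
    header = next(rows, None) if has_header else None
    id_idx = resolve_column(id_column, header)
    mol_idx = resolve_column(mol_column, header)
    for row in rows:
        yield row[id_idx], row[mol_idx]
```

For example, `read_id_and_record_columns(text, "x.tsv", id_column="name", mol_column="smiles")` selects the columns by their header titles, while integer positions work with or without a header line.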
-
chemfp.rdkit_toolkit.
read_ids_and_molecules_from_string
(content, format, id_tag=None, reader_args=None, errors='strict', location=None)¶ Return an iterator that reads (id, RDKit molecule) pairs from a string containing structure records
content is a string containing 0 or more records in the format format. See
chemfp.rdkit_toolkit.read_molecules()
for details about the other parameters. See
chemfp.rdkit_toolkit.read_molecules_from_string()
if you just want to read the RDKit molecules instead of (id, molecule) pairs.
Parameters: - content (a string) – the string containing structure records
- format (a format name string, or Format object) – the input structure format
- id_tag (string, or None to use the record title) – SD tag containing the record id
- reader_args (a dictionary) – reader arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
- location (a
chemfp.io.Location
object, or None) – object used to track parser state information
Returns: a
chemfp.base_toolkit.IdAndMoleculeReader
iterating (id, RDKit molecule) pairs
-
chemfp.rdkit_toolkit.
make_id_and_molecule_parser
(format, id_tag=None, reader_args=None, errors='strict')¶ Create a specialized function which takes a record and returns an (id, RDKit molecule) pair
The returned function is optimized for reading many records from individual strings because it only does parameter validation once. However, I haven’t really noticed much of a performance difference between this and
chemfp.rdkit_toolkit.parse_id_and_molecule()
so I suggest you use that function directly instead of making a specialized function. (Let me know if making a specialized function is useful.)
See
chemfp.rdkit_toolkit.read_molecules()
for details about the other parameters.
Parameters: - format (a format name string, or Format object) – the input structure format
- id_tag (string, or None to use the record title) – SD tag containing the record id
- reader_args (a dictionary) – reader arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
Returns: a function of the form
parser(record string) -> (id, RDKit molecule)
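The "validate once, parse many times" idea behind a specialized parser can be shown with a closure. The sketch below is a toy under stated assumptions: `make_id_and_smiles_parser` is a hypothetical stand-in that returns the SMILES text rather than an RDKit molecule, so it runs without RDKit, and unlike a real SMILES reader it requires every record to contain both a SMILES and an id.

```python
def make_id_and_smiles_parser(delimiter="whitespace", errors="strict"):
    """Validate the arguments once and return a per-record parser function."""
    if errors not in ("strict", "ignore"):
        raise ValueError("Unsupported errors value: %r" % (errors,))
    separators = {"whitespace": None, "tab": "\t", "space": " "}
    if delimiter not in separators:
        raise ValueError("Unsupported delimiter: %r" % (delimiter,))
    sep = separators[delimiter]

    def parser(record):
        # Only the per-record work happens here; no argument checks.
        fields = record.rstrip("\n").split(sep, 1)
        if len(fields) != 2 or not fields[0]:
            if errors == "strict":
                raise ValueError("Cannot parse record %r" % (record,))
            return None, None
        smiles, id = fields
        return id, smiles

    return parser
```

The returned function has the same shape as the documented `parser(record string) -> (id, molecule)` form: all argument checking happens when the parser is made, not on each call.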
-
chemfp.rdkit_toolkit.
parse_molecule
(content, format, id_tag=None, reader_args=None, errors='strict')¶ Parse the first structure record from the content string and return an RDKit molecule.
content is a string containing a single structure record in format format. (Additional records are ignored). See
chemfp.rdkit_toolkit.read_molecules()
for details about the other parameters. See
chemfp.rdkit_toolkit.parse_id_and_molecule()
if you want the (id, RDKit molecule) pair instead of just the molecule.
Parameters: - content (a string) – the string containing a structure record
- format (a format name string, or Format object) – the input structure format
- id_tag (string, or None to use the record title) – SD tag containing the record id
- reader_args (a dictionary) – reader arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
Returns: an RDKit molecule
-
chemfp.rdkit_toolkit.
parse_id_and_molecule
(content, format, id_tag=None, reader_args=None, errors='strict')¶ Parse the first structure record from content and return the (id, RDKit molecule) pair.
content is a string containing a single structure record in format format. (Additional records are ignored). See
chemfp.rdkit_toolkit.read_molecules()
for details about the other parameters. See
chemfp.rdkit_toolkit.parse_molecule()
if you just want the RDKit molecule and not the (id, RDKit molecule) pair.
Parameters: - content (a string) – the string containing a structure record
- format (a format name string, or Format object) – the input structure format
- id_tag (string, or None to use the record title) – SD tag containing the record id
- reader_args (a dictionary) – reader arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
Returns: an (id, RDKit molecule) pair
-
chemfp.rdkit_toolkit.
create_string
(mol, format, id=None, writer_args=None, errors='strict')¶ Convert an RDKit molecule into a structure record in the given format as a Unicode string
If id is not None then use it instead of the molecule’s own title. Warning: this may briefly modify the molecule, so may not be thread-safe.
Parameters: - mol (an RDKit molecule) – the molecule to use for the output
- format (a format name string, or Format object) – the output structure format
- id (a string, or None to use the molecule's own id) – an alternate record id
- writer_args (a dictionary) – writer arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
Returns: a Unicode string
-
chemfp.rdkit_toolkit.
create_bytes
(mol, format, id=None, writer_args=None, errors='strict', level=None)¶ Convert an RDKit molecule into a structure record in the given format as a byte string
If id is not None then use it instead of the molecule’s own title. Warning: this may briefly modify the molecule, so may not be thread-safe.
Parameters: - mol (an RDKit molecule) – the molecule to use for the output
- format (a format name string, or Format object) – the output structure format
- id (a string, or None to use the molecule's own id) – an alternate record id
- writer_args (a dictionary) – writer arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
- level (None, a positive integer, or one of the strings 'min', 'default', or 'max') – compression level to use for compressed formats
Returns: a byte string
-
chemfp.rdkit_toolkit.
translate_record
(content, in_format='smi', out_format='smi', *, id_tag=None, reader_args=None, writer_args=None, id=None, errors='strict')¶ Translate a molecule record from one format to another
Use the RDKit toolkit to parse the content as format in_format (default: “smi”) and translate it into out_format (default: “smi”). For an SDF record, use id_tag to get the record id from the given SD tag instead of the title line. Use reader_args and writer_args to configure format-specific parameters. Use id to set the id of the output record.
The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.
Parameters: - content (a string) – the string containing a structure record
- in_format (a format name string, or Format object) – the input structure format
- out_format (a format name string, or Format object) – the output structure format
- id_tag (string, or None to use the record title) – SD tag containing the record id
- reader_args (a dictionary, or None) – reader arguments for the specified in_format
- writer_args (a dictionary, or None) – writer arguments for the specified out_format
- id (a string, or None to use the default) – the record id to use for the output record
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
Returns: a string
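A call might look like the following minimal sketch, which relies only on the signature shown above. The wrapper name `smiles_record_to_sdf` and its `new_id` argument are hypothetical, and the import sits inside the function so the snippet can be loaded on a machine where chemfp and RDKit are not installed.

```python
def smiles_record_to_sdf(record, new_id=None):
    """Convert one SMILES record to an SDF record with chemfp's RDKit wrapper."""
    # Imported here so this module can be loaded without chemfp installed.
    from chemfp import rdkit_toolkit

    return rdkit_toolkit.translate_record(
        record,
        in_format="smi",   # parse the input as a SMILES record
        out_format="sdf",  # write the output as an SDF record
        id=new_id,         # None keeps the id parsed from the input record
        errors="strict",   # raise an exception if the record cannot be parsed
    )
```

With chemfp and RDKit available, `smiles_record_to_sdf("CCO ethanol", new_id="EtOH")` would be expected to return an SDF record for ethanol titled with the replacement id.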
-
chemfp.rdkit_toolkit.
open_molecule_writer
(destination=None, format=None, writer_args=None, errors='strict', location=None, encoding='utf8', encoding_errors='strict', level=None)¶ Return a MoleculeWriter which can write RDKit molecules to a destination.
A
chemfp.base_toolkit.MoleculeWriter
has the methods write_molecule, write_molecules, and write_ids_and_molecules, which write an RDKit molecule, an RDKit molecule iterator, or an (id, RDKit molecule) pair iterator to a file.
Molecules are written to destination. The output format can be a string like “sdf.gz” or “smi”, a
chemfp.base_toolkit.Format
, or a Format-like object with “name” and “compression” attributes, or None to auto-detect based on the destination. If auto-detection is not possible, the output will be written as uncompressed SMILES.
The writer_args dictionary parameters depend on the format. These include:
- SMILES
- delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
- isomericSmiles - True to generate isomeric SMILES
- kekuleSmiles - True to generate SMILES in Kekule form
- canonical - True to generate a canonical SMILES
- allBondsExplicit - True to write explicit ‘-’ and ‘:’ bonds, even if they can be inferred; default is False
- allHsExplicit - True to write explicit hydrogen counts; default is False
- cxsmiles - True to include CXSMILES annotations; default is False
- InChI and InChIKey
- delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
- include_id - True or default to include the id as the second column; False has no id column
- options - an options string passed to the underlying InChI library
- logLevel - an integer log level
- treatWarningAsError - True raises an exception on error; False or default keeps processing
- SDF
- includeStereo - True to include stereo information; False or default does not
- kekulize - True or default creates the connection table with bonds in Kekule form
- v3k - True to always export in V3000 format
The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.
The location parameter takes a
chemfp.io.Location
instance. If None then a default Location will be created.
Parameters: - destination (a filename, file object, or None to write to stdout) – the structure destination
- format (a format name string, or Format(-like) object, or None to auto-detect) – the output structure format
- writer_args (a dictionary) – writer parameters passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
- location (a
chemfp.io.Location
object, or None) – object used to track writer state information
- level (None, a positive integer, or one of the strings 'min', 'default', or 'max') – compression level to use for compressed formats
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting RDKit molecules
-
chemfp.rdkit_toolkit.
open_molecule_writer_to_string
(format, writer_args=None, errors='strict', location=None)¶ Return a MoleculeStringWriter which can write molecule records in the given format to a string.
See
chemfp.rdkit_toolkit.open_molecule_writer()
for full parameter details. Use the writer’s
chemfp.base_toolkit.MoleculeStringWriter.getvalue()
to get the output as a Unicode string.
Parameters: - format (a format name string, or Format(-like) object, or None to auto-detect) – the output structure format
- writer_args (a dictionary) – writer arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
- location (a
chemfp.io.Location
object, or None) – object used to track writer state information
Returns: a
chemfp.base_toolkit.MoleculeStringWriter
expecting RDKit molecules
-
chemfp.rdkit_toolkit.
open_molecule_writer_to_bytes
(format, writer_args=None, errors='strict', location=None, level=None)¶ Return a MoleculeStringWriter which can write molecule records in the given format to a byte string.
See
chemfp.rdkit_toolkit.open_molecule_writer()
for full parameter details. Use the writer’s
chemfp.base_toolkit.MoleculeStringWriter.getvalue()
to get the output as a byte string.
Parameters: - format (a format name string, or Format(-like) object, or None to auto-detect) – the output structure format
- writer_args (a dictionary) – writer arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
- location (a
chemfp.io.Location
object, or None) – object used to track writer state information
- level (None, a positive integer, or one of the strings 'min', 'default', or 'max') – compression level to use for compressed formats
Returns: a
chemfp.base_toolkit.MoleculeStringWriter
expecting RDKit molecules
-
chemfp.rdkit_toolkit.
copy_molecule
(mol)¶ Return a new RDKit molecule which is a copy of the given molecule
Parameters: mol (an RDKit molecule) – the molecule to copy
Returns: a new RDKit Mol instance
-
chemfp.rdkit_toolkit.
add_tag
(mol, tag, value)¶ Add an SD tag value to the RDKit molecule
Parameters: - mol (an RDKit molecule) – the molecule
- tag (string) – the SD tag name
- value (string) – the text for the tag
Returns: None
-
chemfp.rdkit_toolkit.
get_tag
(mol, tag)¶ Get the named SD tag value, or None if it doesn’t exist
Parameters: - mol (an RDKit molecule) – the molecule
- tag (string) – the SD tag name
Returns: a string, or None
-
chemfp.rdkit_toolkit.
get_tag_pairs
(mol)¶ Get a list of all SD tag (name, value) pairs for the molecule
Parameters: mol (an RDKit molecule) – the molecule
Returns: a list of (string name, string value) pairs
-
chemfp.rdkit_toolkit.
get_id
(mol)¶ Get the molecule’s id from RDKit’s _Name property
Parameters: mol (an RDKit molecule) – the molecule
Returns: a string
-
chemfp.rdkit_toolkit.
set_id
(mol, id)¶ Set the molecule’s id as RDKit’s _Name property
Parameters: - mol (an RDKit molecule) – the molecule
- id (string) – the new id
Returns: None
-
chemfp.rdkit_toolkit.
parse_smistring
(content: Union[str, bytes], *, sanitize: bool = True, cxsmiles: bool = True, errors: str = 'strict')¶ Parse a SMILES string using the RDKit toolkit
This is equivalent to calling:
parse_molecule(content, "smistring", reader_args={...}, errors=errors)
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: an RDKit molecule object
-
chemfp.rdkit_toolkit.
create_smistring
(mol: Any, *, id: Optional[str, None] = None, isomericSmiles: bool = True, kekuleSmiles: bool = False, canonical: bool = True, allBondsExplicit: bool = False, allHsExplicit: bool = False, cxsmiles: bool = False, errors: str = 'strict') → Optional[str, None]¶ Generate a SMILES string from an RDKit molecule
This is equivalent to calling:
create_string(mol, "smistring", id=id, writer_args={...}, errors=errors)
Parameters: - mol (an RDKit molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- isomericSmiles (Boolean (default: True)) – If true, generate an isomeric SMILES
- kekuleSmiles (Boolean (default: False)) – If true, generate Kekule SMILES
- canonical (Boolean (default: True)) – If true, generate a canonical SMILES
- allBondsExplicit (Boolean (default: False)) – If true, include bond symbols even for single and aromatic bonds
- allHsExplicit (Boolean (default: False)) – If true, include hydrogen counts for every atom
- cxsmiles (Boolean (default: False)) – If true, generate CXSmiles
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a string, or None if errors are ignored
-
chemfp.rdkit_toolkit.
parse_smi
(content: Union[str, bytes], *, sanitize: bool = True, cxsmiles: bool = True, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, errors: str = 'strict')¶ Parse a SMILES string and its id using the RDKit toolkit
This is equivalent to calling:
parse_molecule(content, "smi", reader_args={...}, errors=errors)
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: an RDKit molecule object
-
chemfp.rdkit_toolkit.
create_smi
(mol: Any, *, id: Optional[str, None] = None, isomericSmiles: bool = True, kekuleSmiles: bool = False, canonical: bool = True, allBondsExplicit: bool = False, allHsExplicit: bool = False, cxsmiles: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, errors: str = 'strict') → Optional[str, None]¶ Generate a SMILES string and its id from an RDKit molecule
This is equivalent to calling:
create_string(mol, "smi", id=id, writer_args={...}, errors=errors)
Parameters: - mol (an RDKit molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- isomericSmiles (Boolean (default: True)) – If true, generate an isomeric SMILES
- kekuleSmiles (Boolean (default: False)) – If true, generate Kekule SMILES
- canonical (Boolean (default: True)) – If true, generate a canonical SMILES
- allBondsExplicit (Boolean (default: False)) – If true, include bond symbols even for single and aromatic bonds
- allHsExplicit (Boolean (default: False)) – If true, include hydrogen counts for every atom
- cxsmiles (Boolean (default: False)) – If true, generate CXSMILES
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a string, or None if errors are ignored
-
chemfp.rdkit_toolkit.
read_smi_molecules
(source: Union[None, str, BinaryIO], *, sanitize: bool = True, cxsmiles: bool = True, has_header: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, errors: str = 'strict')¶ Read molecules from a SMILES file using the RDKit toolkit
This is mostly equivalent to calling:
read_molecules(source, "smi", reader_args={...}, errors=errors)
along with decompression based on the source filename’s extension.
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string
- has_header (Boolean (default: False)) – If true, treat the first line of the SMILES file as a header
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeReader
iterating RDKit molecules
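Decompression based on the filename extension can be sketched with the standard library. This is only an illustration of the idea: open_text_by_extension is a hypothetical helper, and the assumption that ".gz" means gzip is not taken from this page, which does not list the supported extensions.

```python
import gzip
import os
import tempfile


def open_text_by_extension(filename):
    """Open a text file, decompressing it when the name ends with '.gz'."""
    if filename.endswith(".gz"):
        # Assumption: a ".gz" extension means gzip-compressed content.
        return gzip.open(filename, "rt", encoding="utf-8")
    return open(filename, "r", encoding="utf-8")


# Round trip: write a small gzip-compressed SMILES file, then read it back.
with tempfile.TemporaryDirectory() as dirname:
    path = os.path.join(dirname, "example.smi.gz")
    with gzip.open(path, "wt", encoding="utf-8") as outfile:
        outfile.write("CCO ethanol\n")
    with open_text_by_extension(path) as infile:
        lines = infile.read().splitlines()
```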
-
chemfp.rdkit_toolkit.
read_smi_ids_and_molecules
(source: Union[None, str, BinaryIO], *, sanitize: bool = True, cxsmiles: bool = True, has_header: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, errors: str = 'strict')¶ Read ids and molecules from a SMILES file using the RDKit toolkit
This is mostly equivalent to calling:
read_ids_and_molecules(source, "smi", reader_args={...}, errors=errors)
along with decompression based on the source filename’s extension.
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string
- has_header (Boolean (default: False)) – If true, treat the first line of the SMILES file as a header
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.IdAndMoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
read_smi_molecules_from_string
(content: Union[str, bytes], *, sanitize: bool = True, cxsmiles: bool = True, has_header: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, errors: str = 'strict')¶ Read molecules from a string containing a SMILES file using the RDKit toolkit
This is equivalent to calling:
read_molecules_from_string(content, "smi", reader_args={...}, errors=errors)
Use read_molecules_from_string() if the content is compressed.
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string
- has_header (Boolean (default: False)) – If true, treat the first line of the SMILES file as a header
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
read_smi_ids_and_molecules_from_string
(content: Union[str, bytes], *, sanitize: bool = True, cxsmiles: bool = True, has_header: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, errors: str = 'strict')¶ Read ids and molecules from a string containing a SMILES file using the RDKit toolkit
This is equivalent to calling:
read_ids_and_molecules_from_string(content, "smi", reader_args={...}, errors=errors)
Use read_ids_and_molecules_from_string() if the content is compressed.
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string
- has_header (Boolean (default: False)) – If true, treat the first line of the SMILES file as a header
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.IdAndMoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
open_smi_writer
(destination: Union[None, str, BinaryIO], *, isomericSmiles: bool = True, kekuleSmiles: bool = False, canonical: bool = True, allBondsExplicit: bool = False, allHsExplicit: bool = False, cxsmiles: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, errors: str = 'strict')¶ Open a SMILES file to write RDKit molecules
This is mostly equivalent to calling:
open_molecule_writer(destination, "smi", writer_args={...}, errors=errors)
along with compression based on the destination filename’s extension.
Parameters: - destination (None, a filename string, or a file-like object) – where to write the molecules
- isomericSmiles (Boolean (default: True)) – If true, generate an isomeric SMILES
- kekuleSmiles (Boolean (default: False)) – If true, generate Kekule SMILES
- canonical (Boolean (default: True)) – If true, generate a canonical SMILES
- allBondsExplicit (Boolean (default: False)) – If true, include bond symbols even for single and aromatic bonds
- allHsExplicit (Boolean (default: False)) – If true, include hydrogen counts for every atom
- cxsmiles (Boolean (default: False)) – If true, generate CXSMILES
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting RDKit molecules
-
chemfp.rdkit_toolkit.
open_smi_writer_to_string
(*, isomericSmiles: bool = True, kekuleSmiles: bool = False, canonical: bool = True, allBondsExplicit: bool = False, allHsExplicit: bool = False, cxsmiles: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, errors: str = 'strict')¶ Open a SMILES file to write RDKit molecules to an in-memory string
This is equivalent to calling:
open_molecule_writer_to_string("smi", writer_args={...}, errors=errors)
Use write_molecules_to_string() to write compressed output.
Parameters: - isomericSmiles (Boolean (default: True)) – If true, generate an isomeric SMILES
- kekuleSmiles (Boolean (default: False)) – If true, generate Kekule SMILES
- canonical (Boolean (default: True)) – If true, generate a canonical SMILES
- allBondsExplicit (Boolean (default: False)) – If true, include bond symbols even for single and aromatic bonds
- allHsExplicit (Boolean (default: False)) – If true, include hydrogen counts for every atom
- cxsmiles (Boolean (default: False)) – If true, generate CXSMILES
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting RDKit molecules
-
chemfp.rdkit_toolkit.
parse_sdf
(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, strictParsing: bool = True, includeTags: bool = True, errors: str = 'strict')¶ Parse an SDF record using the RDKit toolkit
This is equivalent to calling:
parse_molecule(content, "sdf", reader_args={...}, errors=errors)
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- strictParsing (Boolean (default: True)) – If true, require stricter adherence to the SDF specification
- includeTags (Boolean (default: True)) – if true, extract the structure data tag fields
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: an RDKit molecule object
-
chemfp.rdkit_toolkit.
create_sdf
(mol: Any, *, id: Optional[str, None] = None, includeStereo: bool = False, kekulize: bool = True, v3k: bool = False, errors: str = 'strict') → Optional[str, None]¶ Generate an SDF record from an RDKit molecule
This is equivalent to calling:
create_string(mol, "sdf", id=id, writer_args={...}, errors=errors)
Parameters: - mol (an RDKit molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- includeStereo (Boolean (default: False)) – if true, include stereochemistry information in the record
- kekulize (Boolean (default: True)) – if true, Kekulize the molecule before creating the record
- v3k (Boolean (default: False)) – if true, always write in V3000 format
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a string, or None if errors are ignored
-
chemfp.rdkit_toolkit.
read_sdf_molecules
(source: Union[None, str, BinaryIO], *, sanitize: bool = True, removeHs: bool = True, strictParsing: bool = True, includeTags: bool = True, errors: str = 'strict')¶ Read molecules from an SDF file using the RDKit toolkit
This is mostly equivalent to calling:
read_molecules(source, "sdf", reader_args={...}, errors=errors)
along with decompression based on the source filename’s extension.
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- strictParsing (Boolean (default: True)) – If true, require stricter adherence to the SDF specification
- includeTags (Boolean (default: True)) – if true, extract the structure data tag fields
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
read_sdf_ids_and_molecules
(source: Union[None, str, BinaryIO], *, id_tag: Optional[None, str] = None, sanitize: bool = True, removeHs: bool = True, strictParsing: bool = True, includeTags: bool = True, errors: str = 'strict')¶ Read ids and molecules from an SDF file using the RDKit toolkit
This is mostly equivalent to calling:
read_ids_and_molecules(source, "sdf", id_tag=id_tag, reader_args={...}, errors=errors)
along with decompression based on the source filename’s extension.
Parameters: - id_tag (a string, or None to use the title) – get the id from the named data item instead of using the record title
- sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- strictParsing (Boolean (default: True)) – If true, require stricter adherence to the SDF specification
- includeTags (Boolean (default: True)) – if true, extract the structure data tag fields
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.IdAndMoleculeReader
iterating RDKit molecules
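The difference between the record title and a named data item can be sketched in pure Python. This is an illustration of the id_tag idea only, not chemfp's parser: get_sdf_id is a hypothetical helper, it reads only the first line of a data item's value, and returning None for a missing tag is an assumption.

```python
def get_sdf_id(record, id_tag=None):
    """Return the id of one SDF record: the title line, or a data item value."""
    lines = record.splitlines()
    if id_tag is None:
        # The title is the first line of the record.
        return lines[0].strip() if lines else None
    for i, line in enumerate(lines):
        # Data headers look like "> <TAG>"; the value starts on the next line.
        if line.startswith(">") and f"<{id_tag}>" in line:
            return lines[i + 1].strip() if i + 1 < len(lines) else None
    # Assumption: a missing tag gives no id.
    return None
```

For a record titled "ethanol" with a "> <CHEMBL_ID>" item holding "CHEMBL545", the default returns "ethanol" and id_tag="CHEMBL_ID" returns "CHEMBL545".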
-
chemfp.rdkit_toolkit.
read_sdf_molecules_from_string
(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, strictParsing: bool = True, includeTags: bool = True, errors: str = 'strict')¶ Read molecules from a string containing an SDF file using the RDKit toolkit
This is equivalent to calling:
read_molecules_from_string(content, "sdf", reader_args={...}, errors=errors)
Use read_molecules_from_string() if the content is compressed.
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- strictParsing (Boolean (default: True)) – If true, require stricter adherence to the SDF specification
- includeTags (Boolean (default: True)) – if true, extract the structure data tag fields
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
read_sdf_ids_and_molecules_from_string
(content: Union[str, bytes], *, id_tag: Optional[None, str] = None, sanitize: bool = True, removeHs: bool = True, strictParsing: bool = True, includeTags: bool = True, errors: str = 'strict')¶ Read ids and molecules from a string containing an SDF file using the RDKit toolkit
This is equivalent to calling:
read_ids_and_molecules_from_string(content, "sdf", id_tag=id_tag, reader_args={...}, errors=errors)
Use read_ids_and_molecules_from_string() if the content is compressed.
Parameters: - id_tag (a string, or None to use the title) – get the id from the named data item instead of using the record title
- sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- strictParsing (Boolean (default: True)) – If true, require stricter adherence to the SDF specification
- includeTags (Boolean (default: True)) – if true, extract the structure data tag fields
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.IdAndMoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
open_sdf_writer
(destination: Union[None, str, BinaryIO], *, includeStereo: bool = False, kekulize: bool = True, v3k: bool = False, errors: str = 'strict')¶ Open an SDF file to write RDKit molecules
This is mostly equivalent to calling:
open_molecule_writer(destination, "sdf", writer_args={...}, errors=errors)
along with compression based on the destination filename’s extension.
Parameters: - destination (None, a filename string, or a file-like object) – where to write the molecules
- includeStereo (Boolean (default: False)) – if true, include stereochemistry information in the record
- kekulize (Boolean (default: True)) – if true, Kekulize the molecule before creating the record
- v3k (Boolean (default: False)) – if true, always write in V3000 format
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting RDKit molecules
-
chemfp.rdkit_toolkit.
open_sdf_writer_to_string
(*, includeStereo: bool = False, kekulize: bool = True, v3k: bool = False, errors: str = 'strict')¶ Open an SDF file to write RDKit molecules to an in-memory string
This is equivalent to calling:
open_molecule_writer_to_string("sdf", writer_args={...}, errors=errors)
Use write_molecules_to_string() to write compressed output.
Parameters: - includeStereo (Boolean (default: False)) – if true, include stereochemistry information in the record
- kekulize (Boolean (default: True)) – if true, Kekulize the molecule before creating the record
- v3k (Boolean (default: False)) – if true, always write in V3000 format
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting RDKit molecules
-
chemfp.rdkit_toolkit.
create_sdf3k
(mol: Any, *, id: Optional[str, None] = None, includeStereo: bool = False, kekulize: bool = True, v3k: bool = True, errors: str = 'strict') → Optional[str, None]¶ Generate an SDF record in V3000 format from an RDKit molecule
This is equivalent to calling:
create_string(mol, "sdf3k", id=id, writer_args={...}, errors=errors)
Parameters: - mol (an RDKit molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- includeStereo (Boolean (default: False)) – if true, include stereochemistry information in the record
- kekulize (Boolean (default: True)) – if true, Kekulize the molecule before creating the record
- v3k (Boolean (default: True)) – if true, always write in V3000 format
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a string, or None if errors are ignored
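One way to check which format a generated record uses is to look at the version field on the counts line, which is the fourth line of a molfile block. The helper below is a pure-Python sketch with a hypothetical name, molfile_version. It inspects text only and does not call chemfp or RDKit.

```python
def molfile_version(record):
    """Return 'V2000' or 'V3000' from a molfile/SDF record, or None if unmarked."""
    lines = record.splitlines()
    if len(lines) < 4:
        raise ValueError("a molfile needs at least four lines")
    # Three header lines come first; the counts line is the fourth.
    counts_line = lines[3]
    for version in ("V2000", "V3000"):
        if counts_line.rstrip().endswith(version):
            return version
    # Some older files omit the version field.
    return None
```

A record written with v3k=True would be expected to report "V3000" here.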
-
chemfp.rdkit_toolkit.
open_sdf3k_writer
(destination: Union[None, str, BinaryIO], *, includeStereo: bool = False, kekulize: bool = True, v3k: bool = True, errors: str = 'strict')¶ Open an SDF file in V3000 format to write RDKit molecules
This is mostly equivalent to calling:
open_molecule_writer(destination, "sdf3k", writer_args={...}, errors=errors)
along with compression based on the destination filename’s extension.
Parameters: - destination (None, a filename string, or a file-like object) – where to write the molecules
- includeStereo (Boolean (default: False)) – if true, include stereochemistry information in the record
- kekulize (Boolean (default: True)) – if true, Kekulize the molecule before creating the record
- v3k (Boolean (default: True)) – if true, always write in V3000 format
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting RDKit molecules
-
chemfp.rdkit_toolkit.
open_sdf3k_writer_to_string
(*, includeStereo: bool = False, kekulize: bool = True, v3k: bool = True, errors: str = 'strict')¶ Open an SDF file in V3000 format to write RDKit molecules to an in-memory string
This is equivalent to calling:
open_molecule_writer_to_string("sdf3k", writer_args={...}, errors=errors)
Use write_molecules_to_string() to write compressed output.
Parameters: - includeStereo (Boolean (default: False)) – if true, include stereochemistry information in the record
- kekulize (Boolean (default: True)) – if true, Kekulize the molecule before creating the record
- v3k (Boolean (default: True)) – if true, always write in V3000 format
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting RDKit molecules
-
chemfp.rdkit_toolkit.
parse_molfile
(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, strictParsing: bool = True, errors: str = 'strict')¶ Parse a molfile using the RDKit toolkit
This is equivalent to calling:
parse_molecule(content, "molfile", reader_args={...}, errors=errors)
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- strictParsing (Boolean (default: True)) – If true, require stricter adherence to the SDF specification
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: an RDKit molecule object
-
chemfp.rdkit_toolkit.
create_molfile
(mol: Any, *, id: Optional[str, None] = None, includeStereo: bool = False, kekulize: bool = True, v3k: bool = False, errors: str = 'strict') → Optional[str, None]¶ Generate a molfile from an RDKit molecule
This is equivalent to calling:
create_string(mol, "molfile", id=id, writer_args={...}, errors=errors)
Parameters: - mol (an RDKit molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- includeStereo (Boolean (default: False)) – if true, include stereochemistry information in the record
- kekulize (Boolean (default: True)) – if true, Kekulize the molecule before creating the record
- v3k (Boolean (default: False)) – if true, always write in V3000 format
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a string, or None if errors are ignored
-
chemfp.rdkit_toolkit.
parse_rdbinmol
(content: Union[str, bytes], *, errors: str = 'strict')¶ Parse an RDKit binary molecule byte string using the RDKit toolkit
This is equivalent to calling:
parse_molecule(content, "rdbinmol", reader_args={...}, errors=errors)
Parameters: - errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: an RDKit molecule object
-
chemfp.rdkit_toolkit.
create_rdbinmol
(mol: Any, *, id: Optional[str, None] = None, errors: str = 'strict') → Optional[str, None]¶ Generate an RDKit binary molecule byte string from an RDKit molecule
This is equivalent to calling:
create_string(mol, "rdbinmol", id=id, writer_args={...}, errors=errors)
Parameters: - mol (an RDKit molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a string, or None if errors are ignored
-
chemfp.rdkit_toolkit.
parse_fasta
(content: Union[str, bytes], *, sanitize: bool = True, flavor: Literal[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] = 0, errors: str = 'strict')¶ Parse a FASTA record using the RDKit toolkit
This is equivalent to calling:
parse_molecule(content, "fasta", reader_args={...}, errors=errors)
Possible flavor values are:
- 0 = L-amino acids
- 1 = D-amino acids
- 2 = RNA, no cap
- 3 = RNA, 5’ cap
- 4 = RNA, 3’ cap
- 5 = RNA, both caps
- 6 = DNA, no cap
- 7 = DNA, 5’ cap
- 8 = DNA, 3’ cap
- 9 = DNA, both caps
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- flavor (integer from 0-9, inclusive (default: 0)) – The sequence type (amino acid, RNA, or DNA), and how to handle caps
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: an RDKit molecule object
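The flavor numbering listed above follows a regular pattern, which the following pure-Python sketch makes explicit. The helper name flavor_for and its parameters are hypothetical; only the numeric values come from the list above.

```python
def flavor_for(kind, five_prime_cap=False, three_prime_cap=False, d_amino=False):
    """Return the flavor integer (0-9) for a sequence type and cap choice."""
    if kind == "protein":
        # 0 = L-amino acids, 1 = D-amino acids
        return 1 if d_amino else 0
    if kind not in ("RNA", "DNA"):
        raise ValueError(f"unknown sequence kind: {kind!r}")
    # RNA flavors start at 2 and DNA flavors at 6.
    base = 2 if kind == "RNA" else 6
    # Within each block: +1 for a 5' cap, +2 for a 3' cap, +3 for both.
    return base + (1 if five_prime_cap else 0) + (2 if three_prime_cap else 0)
```

For example, flavor_for("RNA", five_prime_cap=True) gives 3 and flavor_for("DNA", five_prime_cap=True, three_prime_cap=True) gives 9, matching the list.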
-
chemfp.rdkit_toolkit.
create_fasta
(mol: Any, *, id: Optional[str, None] = None, errors: str = 'strict') → Optional[str, None]¶ Generate a FASTA record from an RDKit molecule
This is equivalent to calling:
create_string(mol, "fasta", id=id, writer_args={...}, errors=errors)
Parameters: - mol (an RDKit molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a string, or None if errors are ignored
-
chemfp.rdkit_toolkit.
read_fasta_molecules
(source: Union[None, str, BinaryIO], *, sanitize: bool = True, flavor: Literal[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] = 0, errors: str = 'strict')¶ Read molecules from a FASTA file using the RDKit toolkit
This is mostly equivalent to calling:
read_molecules(source, "fasta", reader_args={...}, errors=errors)
along with decompression based on the source filename’s extension.
Possible flavor values are:
- 0 = L-amino acids
- 1 = D-amino acids
- 2 = RNA, no cap
- 3 = RNA, 5’ cap
- 4 = RNA, 3’ cap
- 5 = RNA, both caps
- 6 = DNA, no cap
- 7 = DNA, 5’ cap
- 8 = DNA, 3’ cap
- 9 = DNA, both caps
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- flavor (integer from 0-9, inclusive (default: 0)) – The sequence type (amino acid, RNA, or DNA), and how to handle caps
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
read_fasta_ids_and_molecules
(source: Union[None, str, BinaryIO], *, sanitize: bool = True, flavor: Literal[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] = 0, errors: str = 'strict')¶ Read ids and molecules from a FASTA file using the RDKit toolkit
This is mostly equivalent to calling:
read_ids_and_molecules(source, "fasta", reader_args={...}, errors=errors)
along with decompression based on the source filename’s extension.
Possible flavor values are:
- 0 = L-amino acids
- 1 = D-amino acids
- 2 = RNA, no cap
- 3 = RNA, 5’ cap
- 4 = RNA, 3’ cap
- 5 = RNA, both caps
- 6 = DNA, no cap
- 7 = DNA, 5’ cap
- 8 = DNA, 3’ cap
- 9 = DNA, both caps
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- flavor (integer from 0-9, inclusive (default: 0)) – The sequence type (amino acid, RNA, or DNA), and how to handle caps
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.IdAndMoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
read_fasta_molecules_from_string
(content: Union[str, bytes], *, sanitize: bool = True, flavor: Literal[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] = 0, errors: str = 'strict')¶ Read molecules from a string containing a FASTA file using the RDKit toolkit
This is equivalent to calling:
read_molecules_from_string(content, "fasta", reader_args={...}, errors=errors)
Use read_molecules_from_string() if the content is compressed.
Possible flavor values are:
- 0 = L-amino acids
- 1 = D-amino acids
- 2 = RNA, no cap
- 3 = RNA, 5’ cap
- 4 = RNA, 3’ cap
- 5 = RNA, both caps
- 6 = DNA, no cap
- 7 = DNA, 5’ cap
- 8 = DNA, 3’ cap
- 9 = DNA, both caps
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- flavor (integer from 0-9, inclusive (default: 0)) – The sequence type (amino acid, RNA, or DNA), and how to handle caps
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
read_fasta_ids_and_molecules_from_string
(content: Union[str, bytes], *, sanitize: bool = True, flavor: Literal[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] = 0, errors: str = 'strict')¶ Read ids and molecules from a string containing a FASTA file using the RDKit toolkit
This is equivalent to calling:
read_ids_and_molecules_from_string(content, "fasta", reader_args={...}, errors=errors)
Use read_ids_and_molecules_from_string() if the content is compressed.
Possible flavor values are:
- 0 = L-amino acids
- 1 = D-amino acids
- 2 = RNA, no cap
- 3 = RNA, 5’ cap
- 4 = RNA, 3’ cap
- 5 = RNA, both caps
- 6 = DNA, no cap
- 7 = DNA, 5’ cap
- 8 = DNA, 3’ cap
- 9 = DNA, both caps
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- flavor (integer from 0-9, inclusive (default: 0)) – The sequence type (amino acid, RNA, or DNA), and how to handle caps
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.IdAndMoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
open_fasta_writer
(destination: Union[None, str, BinaryIO], *, errors: str = 'strict')¶ Open a FASTA file to write RDKit molecules
This is mostly equivalent to calling:
open_molecule_writer(destination, "fasta", writer_args={...}, errors=errors)
along with compression based on the destination filename’s extension.
Parameters: - destination (None, a filename string, or a file-like object) – where to write the molecules
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting RDKit molecules
-
chemfp.rdkit_toolkit.
open_fasta_writer_to_string
(*, errors: str = 'strict')¶ Open a FASTA file to write RDKit molecules to an in-memory string
This is equivalent to calling:
open_molecule_writer_to_string("fasta", writer_args={...}, errors=errors)
Use write_molecules_to_string() to write compressed output.
Parameters: - errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting RDKit molecules
-
chemfp.rdkit_toolkit.
parse_sequence
(content: Union[str, bytes], *, sanitize: bool = True, flavor: Literal[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] = 0, errors: str = 'strict')¶ Parse an IUPAC sequence using the RDKit toolkit
This is equivalent to calling:
parse_molecule(content, "sequence", reader_args={...}, errors=errors)
Possible flavor values are:
- 0 = L-amino acids
- 1 = D-amino acids
- 2 = RNA, no cap
- 3 = RNA, 5’ cap
- 4 = RNA, 3’ cap
- 5 = RNA, both caps
- 6 = DNA, no cap
- 7 = DNA, 5’ cap
- 8 = DNA, 3’ cap
- 9 = DNA, both caps
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- flavor (integer from 0-9, inclusive (default: 0)) – The sequence type (amino acid, RNA, or DNA), and how to handle caps
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: an RDKit molecule object
-
chemfp.rdkit_toolkit.
create_sequence
(mol: Any, *, id: Optional[str, None] = None, errors: str = 'strict') → Optional[str, None]¶ Generate an IUPAC sequence from an RDKit molecule
This is equivalent to calling:
create_string(mol, "sequence", id=id, writer_args={...}, errors=errors)
Parameters: - mol (an RDKit molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a string, or None if errors are ignored
-
chemfp.rdkit_toolkit.
parse_helm
(content: Union[str, bytes], *, sanitize: bool = True, errors: str = 'strict')¶ Parse a HELM string using the RDKit toolkit
This is equivalent to calling:
parse_molecule(content, "helm", reader_args={...}, errors=errors)
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: an RDKit molecule object
-
chemfp.rdkit_toolkit.
create_helm
(mol: Any, *, id: Optional[str, None] = None, errors: str = 'strict') → Optional[str, None]¶ Generate a HELM string from an RDKit molecule
This is equivalent to calling:
create_string(mol, "helm", id=id, writer_args={...}, errors=errors)
Parameters: - mol (an RDKit molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a string, or None if errors are ignored
-
chemfp.rdkit_toolkit.
parse_pdb
(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, flavor: Literal[0] = 0, proximityBonding: bool = True, errors: str = 'strict')¶ Parse a PDB record using the RDKit toolkit
This is equivalent to calling:
parse_molecule(content, "pdb", reader_args={...}, errors=errors)
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- flavor (0) – The value 0 (may change in the future)
- proximityBonding (Boolean (default: True)) – If true, connect atoms based on a proximity search
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: an RDKit molecule object
-
chemfp.rdkit_toolkit.
create_pdb
(mol: Any, *, id: Optional[str, None] = None, flavor: Union[int, str] = 0, errors: str = 'strict') → Optional[str, None]¶ Generate a PDB record from an RDKit molecule
This is equivalent to calling:
create_string(mol, "pdb", id=id, writer_args={...}, errors=errors)
Available bit flag flavors are:
- 1 = ‘MODEL’ = write MODEL/ENDMDL
- 2 = ‘NO_CONECT’ = no CONECT records
- 4 = ‘BOTH_CONECT’ = CONECT records in both directions
- 8 = ‘NO_BOND_ORDER’ = use only one CONECT even for higher bond orders
- 16 = ‘MASTER’ = write MASTER record
- 32 = ‘TER’ = write TER record
Parameters: - mol (an RDKit molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- flavor (An integer or string of '|'- or ','-separated terms) – Output flavor bit flags
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a string, or None if errors are ignored
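The flavor terms listed above combine as bit flags. The following is a minimal sketch of how a '|'- or ','-separated string could map to its integer form. The names `PDB_FLAVORS` and `flavor_to_int` are hypothetical and not part of chemfp, and chemfp's own parser may differ in details such as whitespace or case handling.

```python
import re

# Flag names and values copied from the flavor list above.
PDB_FLAVORS = {
    "MODEL": 1,
    "NO_CONECT": 2,
    "BOTH_CONECT": 4,
    "NO_BOND_ORDER": 8,
    "MASTER": 16,
    "TER": 32,
}

def flavor_to_int(flavor):
    """Combine flavor terms into one integer (illustrative helper, not chemfp API)."""
    if isinstance(flavor, int):
        # Integers are assumed to already be a combined bit-flag value.
        return flavor
    value = 0
    for term in re.split(r"[|,]", flavor):
        term = term.strip()
        if not term:
            continue
        if term not in PDB_FLAVORS:
            raise ValueError(f"unknown PDB flavor term: {term!r}")
        value |= PDB_FLAVORS[term]
    return value

print(flavor_to_int("MODEL|TER"))         # 1 | 32
print(flavor_to_int("NO_CONECT,MASTER"))  # 2 | 16
```

Under this reading, passing `flavor="MODEL|TER"` and `flavor=33` would request the same output.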
-
chemfp.rdkit_toolkit.
read_pdb_molecules
(source: Union[None, str, BinaryIO], *, sanitize: bool = True, removeHs: bool = True, flavor: Literal[0] = 0, proximityBonding: bool = True, errors: str = 'strict')¶ Read molecules from a PDB file using the RDKit toolkit
This is mostly equivalent to calling:
read_molecules(source, "pdb", reader_args={...}, errors=errors)
along with decompression based on the source filename’s extension.
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- flavor (0) – The value 0 (may change in the future)
- proximityBonding (Boolean (default: True)) – If true, connect atoms based on a proximity search
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
read_pdb_ids_and_molecules
(source: Union[None, str, BinaryIO], *, sanitize: bool = True, removeHs: bool = True, flavor: Literal[0] = 0, proximityBonding: bool = True, errors: str = 'strict')¶ Read ids and molecules from a PDB file using the RDKit toolkit
This is mostly equivalent to calling:
read_ids_and_molecules(source, "pdb", reader_args={...}, errors=errors)
along with decompression based on the source filename’s extension.
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- flavor (0) – The value 0 (may change in the future)
- proximityBonding (Boolean (default: True)) – If true, connect atoms based on a proximity search
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.IdAndMoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
read_pdb_molecules_from_string
(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, flavor: Literal[0] = 0, proximityBonding: bool = True, errors: str = 'strict')¶ Read molecules from a string containing a PDB file using the RDKit toolkit
This is equivalent to calling:
read_molecules_from_string(content, "pdb", reader_args={...}, errors=errors)
Use read_molecules_from_string() if the content is compressed.
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- flavor (0) – The value 0 (may change in the future)
- proximityBonding (Boolean (default: True)) – If true, connect atoms based on a proximity search
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeReader
iterating RDKit molecules
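As a rough sketch of how the PDB reader and `create_pdb` might be combined, the helper below parses PDB text and regenerates each record with a chosen output flavor. It uses only the functions and keyword arguments documented in this module. The helper name `rewrite_pdb_records` is hypothetical, and the example assumes chemfp and RDKit are installed.

```python
def rewrite_pdb_records(pdb_text, flavor="NO_CONECT"):
    """Parse PDB text and regenerate each record with the given output flavor."""
    # Imported lazily so this file can be loaded where chemfp or RDKit are missing.
    from chemfp import rdkit_toolkit as T

    records = []
    for mol in T.read_pdb_molecules_from_string(pdb_text):
        record = T.create_pdb(mol, flavor=flavor, errors="ignore")
        if record is not None:  # create_pdb returns None when an error is ignored
            records.append(record)
    return records
```

The reader keeps the default `errors="strict"` here, so a record that cannot be parsed raises an exception instead of being skipped.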
-
chemfp.rdkit_toolkit.
read_pdb_ids_and_molecules_from_string
(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, flavor: Literal[0] = 0, proximityBonding: bool = True, errors: str = 'strict')¶ Read ids and molecules from a string containing a PDB file using the RDKit toolkit
This is equivalent to calling:
read_ids_and_molecules_from_string(content, "pdb", reader_args={...}, errors=errors)
Use read_ids_and_molecules_from_string() if the content is compressed.
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- flavor (0) – The value 0 (may change in the future)
- proximityBonding (Boolean (default: True)) – If true, connect atoms based on a proximity search
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.IdAndMoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
open_pdb_writer
(destination: Union[None, str, BinaryIO], *, flavor: Union[int, str] = 0, errors: str = 'strict')¶ Open a PDB file to write RDKit molecules
This is mostly equivalent to calling:
open_molecule_writer(destination, "pdb", writer_args={...}, errors=errors)
along with compression based on the destination filename’s extension.
Available bit flag flavors are:
- 1 = ‘MODEL’ = write MODEL/ENDMDL
- 2 = ‘NO_CONECT’ = no CONECT records
- 4 = ‘BOTH_CONECT’ = CONECT records in both directions
- 8 = ‘NO_BOND_ORDER’ = use only one CONECT even for higher bond orders
- 16 = ‘MASTER’ = write MASTER record
- 32 = ‘TER’ = write TER record
Parameters: - destination (None, a filename string, or a file-like object) – where to write the molecules
- flavor (An integer or string of '|'- or ','-separated terms) – Output flavor bit flags
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting RDKit molecules
-
chemfp.rdkit_toolkit.
open_pdb_writer_to_string
(*, flavor: Union[int, str] = 0, errors: str = 'strict')¶ Open a PDB file to write RDKit molecules to an in-memory string
This is equivalent to calling:
open_molecule_writer_to_string("pdb", writer_args={...}, errors=errors)
Use write_molecules_to_string() to write compressed output.
Available bit flag flavors are:
- 1 = ‘MODEL’ = write MODEL/ENDMDL
- 2 = ‘NO_CONECT’ = no CONECT records
- 4 = ‘BOTH_CONECT’ = CONECT records in both directions
- 8 = ‘NO_BOND_ORDER’ = use only one CONECT even for higher bond orders
- 16 = ‘MASTER’ = write MASTER record
- 32 = ‘TER’ = write TER record
Parameters: - flavor (An integer or string of '|'- or ','-separated terms) – Output flavor bit flags
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting RDKit molecules
-
chemfp.rdkit_toolkit.
parse_inchi
(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, logLevel: Optional[int, None] = None, treatWarningAsError: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, errors: str = 'strict')¶ Parse an InChI string and its id using the RDKit toolkit
This is equivalent to calling:
parse_molecule(content, "inchi", reader_args={...}, errors=errors)
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
- treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: an RDKit molecule object
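The delimiter names suggest how a record line is divided into its structure field and its id. The sketch below is one plausible reading of those names, not chemfp's parser. The function `split_record` is hypothetical, treating None like 'to_eol' is an assumption, and 'native' is left out because its behavior depends on the toolkit.

```python
def split_record(line, delimiter=None):
    """Split one record line into (structure, id); id is None when absent.

    Illustrative only: the rules here are assumptions, not chemfp's parser.
    """
    line = line.rstrip("\r\n")
    if delimiter in (None, "to_eol"):
        # Assumed: the structure ends at the first whitespace and
        # the id is the rest of the line.
        parts = line.split(None, 1)
    elif delimiter == "whitespace":
        # Assumed: fields are separated by runs of whitespace and
        # the id is the second field.
        parts = line.split()[:2]
    else:
        separators = {"tab": "\t", "space": " ", "comma": ",", "\t": "\t", " ": " "}
        sep = separators.get(delimiter)
        if sep is None:
            raise ValueError(f"unsupported delimiter: {delimiter!r}")
        parts = line.split(sep)[:2]
    structure = parts[0] if parts else ""
    record_id = parts[1] if len(parts) > 1 else None
    return structure, record_id

line = "InChI=1S/CH4/h1H4 methane gas\n"
print(split_record(line))                # id keeps the embedded space
print(split_record(line, "whitespace"))  # id is only the second word
```

The practical difference shows up with ids that contain spaces: under this reading the 'to_eol' style keeps the whole remainder of the line, while 'whitespace' and 'space' keep only the next field.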
-
chemfp.rdkit_toolkit.
create_inchi
(mol: Any, *, id: Optional[str, None] = None, options: str = '', logLevel: Optional[int, None] = None, treatWarningAsError: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, include_id: bool = True, errors: str = 'strict') → Optional[str, None]¶ Generate an InChI string and its id from an RDKit molecule
This is equivalent to calling:
create_string(mol, "inchi", id=id, writer_args={...}, errors=errors)
Parameters: - mol (an RDKit molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- options (a string (default: "")) – a configuration string to pass to the InChI API
- logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
- treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
- include_id (Boolean (default: True)) – if true, include the molecule id in the output
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a string, or None if errors are ignored
-
chemfp.rdkit_toolkit.
read_inchi_molecules
(source: Union[None, str, BinaryIO], *, sanitize: bool = True, removeHs: bool = True, logLevel: Optional[int, None] = None, treatWarningAsError: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, errors: str = 'strict')¶ Read molecules from an InChI file (with InChI and optional id) using the RDKit toolkit
This is mostly equivalent to calling:
read_molecules(source, "inchi", reader_args={...}, errors=errors)
along with decompression based on the source filename’s extension.
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
- treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
read_inchi_ids_and_molecules
(source: Union[None, str, BinaryIO], *, sanitize: bool = True, removeHs: bool = True, logLevel: Optional[int, None] = None, treatWarningAsError: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, errors: str = 'strict')¶ Read ids and molecules from an InChI file (with InChI and optional id) using the RDKit toolkit
This is mostly equivalent to calling:
read_ids_and_molecules(source, "inchi", reader_args={...}, errors=errors)
along with decompression based on the source filename’s extension.
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
- treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.IdAndMoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
read_inchi_molecules_from_string
(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, logLevel: Optional[int, None] = None, treatWarningAsError: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, errors: str = 'strict')¶ Read molecules from a string containing an InChI file (with InChI and optional id) using the RDKit toolkit
This is equivalent to calling:
read_molecules_from_string(content, "inchi", reader_args={...}, errors=errors)
Use read_molecules_from_string() if the content is compressed.
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
- treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
read_inchi_ids_and_molecules_from_string
(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, logLevel: Optional[int, None] = None, treatWarningAsError: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, errors: str = 'strict')¶ Read ids and molecules from a string containing an InChI file (with InChI and optional id) using the RDKit toolkit
This is equivalent to calling:
read_ids_and_molecules_from_string(content, "inchi", reader_args={...}, errors=errors)
Use read_ids_and_molecules_from_string() if the content is compressed.
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
- treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.IdAndMoleculeReader
iterating RDKit molecules
-
chemfp.rdkit_toolkit.
open_inchi_writer
(destination: Union[None, str, BinaryIO], *, options: str = '', logLevel: Optional[int, None] = None, treatWarningAsError: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, include_id: bool = True, errors: str = 'strict')¶ Open an InChI file (with InChI and optional id) to write RDKit molecules
This is mostly equivalent to calling:
open_molecule_writer(destination, "inchi", writer_args={...}, errors=errors)
along with compression based on the destination filename’s extension.
Parameters: - destination (None, a filename string, or a file-like object) – where to write the molecules
- options (a string (default: "")) – a configuration string to pass to the InChI API
- logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
- treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
- include_id (Boolean (default: True)) – if true, include the molecule id in the output
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting RDKit molecules
-
chemfp.rdkit_toolkit.
open_inchi_writer_to_string
(*, options: str = '', logLevel: Optional[int, None] = None, treatWarningAsError: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, include_id: bool = True, errors: str = 'strict')¶ Open an InChI file (with InChI and optional id) to write RDKit molecules to an in-memory string
This is equivalent to calling:
open_molecule_writer_to_string("inchi", writer_args={...}, errors=errors)
Use write_molecules_to_string() to write compressed output.
Parameters: - options (a string (default: "")) – a configuration string to pass to the InChI API
- logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
- treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
- include_id (Boolean (default: True)) – if true, include the molecule id in the output
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting RDKit molecules
-
chemfp.rdkit_toolkit.
parse_inchistring
(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, logLevel: Optional[int, None] = None, treatWarningAsError: bool = False, errors: str = 'strict')¶ Parse an InChI string using the RDKit toolkit
This is equivalent to calling:
parse_molecule(content, "inchistring", reader_args={...}, errors=errors)
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
- treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: an RDKit molecule object
-
chemfp.rdkit_toolkit.
create_inchistring
(mol: Any, *, id: Optional[str, None] = None, options: str = '', logLevel: Optional[int, None] = None, treatWarningAsError: bool = False, errors: str = 'strict') → Optional[str, None]¶ Generate an InChI string from an RDKit molecule
This is equivalent to calling:
create_string(mol, "inchistring", id=id, writer_args={...}, errors=errors)
Parameters: - mol (an RDKit molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- options (a string (default: "")) – a configuration string to pass to the InChI API
- logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
- treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a string, or None if errors are ignored
-
chemfp.rdkit_toolkit.
create_inchikey
(mol: Any, *, id: Optional[str, None] = None, options: str = '', logLevel: Optional[int, None] = None, treatWarningAsError: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, include_id: bool = True, errors: str = 'strict') → Optional[str, None]¶ Generate an InChIKey string and its id from an RDKit molecule
This is equivalent to calling:
create_string(mol, "inchikey", id=id, writer_args={...}, errors=errors)
Parameters: - mol (an RDKit molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- options (a string (default: "")) – a configuration string to pass to the InChI API
- logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
- treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChIKey and the id
- include_id (Boolean (default: True)) – if true, include the molecule id in the output
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a string, or None if errors are ignored
-
chemfp.rdkit_toolkit.
open_inchikey_writer
(destination: Union[None, str, BinaryIO], *, options: str = '', logLevel: Optional[int, None] = None, treatWarningAsError: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, include_id: bool = True, errors: str = 'strict')¶ Open an InChIKey file (with InChIKey and optional id) to write RDKit molecules
This is mostly equivalent to calling:
open_molecule_writer(destination, "inchikey", writer_args={...}, errors=errors)
along with compression based on the destination filename’s extension.
Parameters: - destination (None, a filename string, or a file-like object) – where to write the molecules
- options (a string (default: "")) – a configuration string to pass to the InChI API
- logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
- treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChIKey and the id
- include_id (Boolean (default: True)) – if true, include the molecule id in the output
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting RDKit molecules
-
chemfp.rdkit_toolkit.
open_inchikey_writer_to_string
(*, options: str = '', logLevel: Optional[int, None] = None, treatWarningAsError: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, include_id: bool = True, errors: str = 'strict')¶ Open an InChIKey file (with InChIKey and optional id) to write RDKit molecules to an in-memory string
This is equivalent to calling:
open_molecule_writer_to_string("inchikey", writer_args={...}, errors=errors)
Use write_molecules_to_string() to write compressed output.
Parameters: - options (a string (default: "")) – a configuration string to pass to the InChI API
- logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
- treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChIKey and the id
- include_id (Boolean (default: True)) – if true, include the molecule id in the output
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting RDKit molecules
-
chemfp.rdkit_toolkit.
create_inchikeystring
(mol: Any, *, id: Optional[str, None] = None, options: str = '', logLevel: Optional[int, None] = None, treatWarningAsError: bool = False, errors: str = 'strict') → Optional[str, None]¶ Generate an InChIKey string from an RDKit molecule
This is equivalent to calling:
create_string(mol, "inchikeystring", id=id, writer_args={...}, errors=errors)
Parameters: - mol (an RDKit molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- options (a string (default: "")) – a configuration string to pass to the InChI API
- logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
- treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a string, or None if errors are ignored
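The "inchistring" and "inchikeystring" formats work on the bare string, without an id field. A minimal sketch of converting an InChI to its InChIKey with the functions documented above follows. The helper name `inchi_to_inchikey` is hypothetical, and the example assumes chemfp and an RDKit build with InChI support are installed.

```python
def inchi_to_inchikey(inchi):
    """Convert a bare InChI string into an InChIKey string via an RDKit molecule."""
    # Imported lazily so this file can be loaded where chemfp or RDKit are missing.
    from chemfp import rdkit_toolkit as T

    # With the default errors="strict", a failure raises an exception
    # instead of returning None.
    mol = T.parse_inchistring(inchi)
    return T.create_inchikeystring(mol)
```

A call such as `inchi_to_inchikey("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")` would then return the key for ethanol.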
-
chemfp.rdkit_toolkit.
parse_smiles
(content: Union[str, bytes], *, sanitize: bool = True, cxsmiles: bool = True, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, errors: str = 'strict')¶ Parse a SMILES string and its id using the RDKit toolkit
This is equivalent to calling:
parse_molecule(content, "smi", reader_args={...}, errors=errors)
Parameters: - sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: an RDKit molecule object
-
chemfp.rdkit_toolkit.
create_smiles
(mol: Any, *, id: Optional[str, None] = None, isomericSmiles: bool = True, kekuleSmiles: bool = False, canonical: bool = True, allBondsExplicit: bool = False, allHsExplicit: bool = False, cxsmiles: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, errors: str = 'strict') → Optional[str, None]¶ Generate a SMILES string and its id from an RDKit molecule
This is equivalent to calling:
create_string(mol, "smi", id=id, writer_args={...}, errors=errors)
Parameters: - mol (an RDKit molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- isomericSmiles (Boolean (default: True)) – If true, generate an isomeric SMILES
- kekuleSmiles (Boolean (default: False)) – If true, generate Kekule SMILES
- canonical (Boolean (default: True)) – If true, generate a canonical SMILES
- allBondsExplicit (Boolean (default: False)) – If true, include bond symbols even for single and aromatic bonds
- allHsExplicit (Boolean (default: False)) – If true, include hydrogen counts for every atom
- cxsmiles (Boolean (default: False)) – If true, generate CXSMILES
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a string, or None if errors are ignored
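As a sketch of how `parse_smiles` and `create_smiles` might be paired, the helper below re-generates a SMILES record in canonical form, optionally dropping stereochemistry and isotope information. The helper name `canonicalize` and its `keep_stereo` parameter are hypothetical, and the example assumes chemfp and RDKit are installed.

```python
def canonicalize(smiles, keep_stereo=True):
    """Return a canonical SMILES record for the input SMILES record."""
    # Imported lazily so this file can be loaded where chemfp or RDKit are missing.
    from chemfp import rdkit_toolkit as T

    mol = T.parse_smiles(smiles)
    # isomericSmiles=False asks RDKit to omit stereo and isotope information.
    return T.create_smiles(mol, isomericSmiles=keep_stereo, canonical=True)
```

Because these functions handle a SMILES string together with its id, the returned record may also carry the id. The exact layout of that record is not specified in this section.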