Bio Jade Project Documentation¶
Bio-Jade¶
A repository for modules and applications to aid in the design and analysis of Biological molecules, especially when working with Rosetta or PyRosetta.
- Free software: BSD license
- Documentation: https://bio-jade.readthedocs.io.
Features¶
- A suite of modules for working with biological molecules in Python.
- A suite of public and pilot applications to make day-to-day tasks easier.
- Commonly used ones include score_analysis.py, get_seq.py, and RunRosettaMPI.py
Caveats¶
This package is still under heavy development, and test code coverage is limited at the moment.
Credits¶
This package was in part created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
Installation¶
Stable release¶
To install Jade, run this command in your terminal:
$ pip install bio-jade
This is the preferred method to install Jade, as it will always install the most recent stable release.
If you don’t have pip installed, this Python installation guide can guide you through the process.
From sources¶
The sources for Jade can be downloaded from the GitHub repo.
You can either clone the public repository:
$ git clone https://github.com/SchiefLab/Jade
Or download the tarball:
$ curl -OL https://github.com/SchiefLab/Jade/tarball/master
Once you have a copy of the source, you can install it with:
$ pip install -e .
This will symlink your cloned code instead of copying it - very useful for development.
Don’t forget to source your bashrc/bash_profile or whatever other shell profile you have.
Scripts¶
All scripts and applications in jade/apps will be installed in your bin directory and available for use.
API Documentation¶
Jade API¶
Subpackages¶
jade.antibody package¶
Subpackages¶
-
class
jade.antibody.decoy_data.DecoyData.
DecoyData
(name, has_real_values=True, reverse_top=False)[source]¶ Bases:
object
-
add_data
(strategy, con)[source]¶ Baseclass method - needs to be overridden in subclass :param strategy: Strategy to which we are adding data. :param con: Sqlite3 Connection object
-
get_concatonated_map
(by_score_tuple=False)[source]¶ Returns a defaultdict.
- Default: decoy: DecoyDataTriple
- by_score_tuple (for sorting on score and allowing possible redundancy): [score, decoy]: DecoyDataTriple
-
get_data_for_decoy
(strategy, decoy)[source]¶ Get the held data for the decoy :param strategy: Strategy Name :param decoy: Decoy name (with dir and suffix) :rtype: DecoyDataTriple
-
get_ordered_decoy_list
(strategy, top_n=None)[source]¶ Get an ordered array of decoy names by energy for a particular strategy :rtype: list of str
-
get_ordered_decoy_list_all
(top_n=None)[source]¶ Get an ordered array of decoy names by energy over all the strategies :rtype: list of str
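As a rough illustration of what ordering decoys by energy involves, the sketch below sorts decoy names by score and optionally keeps the first top_n. The nested dict layout, the ordered_decoys helper, and the strategy and decoy names are assumptions made for this example; they are not taken from the internals of DecoyData.

```python
def ordered_decoys(scores_by_strategy, strategy=None, top_n=None):
    """Return decoy names sorted from lowest to highest energy.

    scores_by_strategy is a hypothetical {strategy: {decoy_name: score}} map.
    If strategy is None, decoys from all strategies are pooled first.
    """
    if strategy is not None:
        pooled = list(scores_by_strategy[strategy].items())
    else:
        pooled = [
            item
            for decoys in scores_by_strategy.values()
            for item in decoys.items()
        ]
    # Lower energy is treated as better, so sort ascending by score.
    pooled.sort(key=lambda item: item[1])
    names = [name for name, _score in pooled]
    return names if top_n is None else names[:top_n]


# Made-up scores for two strategies.
scores = {
    "strategy_a": {"a/decoy_1.pdb": -310.5, "a/decoy_2.pdb": -305.0},
    "strategy_b": {"b/decoy_1.pdb": -320.1},
}
top_two = ordered_decoys(scores, top_n=2)
only_a = ordered_decoys(scores, strategy="strategy_a")
```

Passing strategy mirrors the per-strategy ordering, while leaving it as None mirrors the ordering over all strategies.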
-
get_pandas_dataframe
(top_n=None, drop_dir_prfix=False)[source]¶ Gets all data as a pandas dataframe. Uses the set name as the score. You can then order, or select specific ones using the data frame. :return: pandas.DataFrame
-
get_strategy_data
(strategy, by_score_tuple=False)[source]¶ For a particular strategy: Return a dictionary of decoy:DataTriple or if by_score_tuple:
[score, decoy] = DataTriple
-
get_top_all_data
(top_n, by_score_tuple=False)[source]¶ Over all the strategies: Return a dictionary of decoy:DataTriple or if by_score_tuple:
[score, decoy] = DataTriple
-
-
class
jade.antibody.decoy_data.DecoyDataTypes.
CombinedStrDecoyData
(filters, filt_name)[source]¶ Bases:
jade.antibody.decoy_data.DecoyData.DecoyData
DecoyData class that has value as a string of the 3 main scores. Value held in DecoyDataTriple is a string: dG::total::dSASA for reference.
-
class
jade.antibody.decoy_data.DecoyDataTypes.
IntHbondDecoyData
[source]¶ Bases:
jade.antibody.decoy_data.DecoyData.DecoyData
New way for int hbonds - added directly from IAM.
-
class
jade.antibody.decoy_data.DecoyDataTypes.
InterfaceHBondDecoyDataLoader
[source]¶ Bases:
jade.antibody.decoy_data.DecoyData.DecoyData
DecoyData class that holds the number of LH_A or L_H interface Hbonds, and energies. Very Slow to get this information.
- SO - Subsequent Hbond classes accept this on construction and then parse its information
-
class
jade.antibody.decoy_data.DecoyDataTypes.
InterfaceHbondCountDecoyData
[source]¶
-
class
jade.antibody.decoy_data.DecoyDataTypes.
InterfaceHbondEnergyDecoyData
[source]¶
-
class
jade.antibody.decoy_data.DecoyDataTypes.
dGTotalScoreSubset
[source]¶ Bases:
jade.antibody.decoy_data.DecoyData.DecoyData
dG of the top x percent of total score (for each strategy)
jade.antibody.CDRClusterer module¶
-
class
jade.antibody.CDRClusterer.
CDRClusterer
(bio_pose)[source]¶ A simple class for calculating a CDRs cluster from dihedrals or a renumbered pose.
-
get_fullcluster
(cdr_name, chain=None, region=None)[source]¶ Rewritten from C++ code. Identifies the cluster of a known CDR type given either custom dihedrals or dihedrals calculated from a pose. If dihedrals are already set (for example, from an earlier use of the same instance), the same dihedrals will be used again. Returns a pair of [cdr_cluster, distance]. region is [int start, int end, chain]; this way you can cluster without renumbering if you want.
Return type: list[str, float]
-
set_custom_dihedrals
(dihedrals)[source]¶ Dihedrals is a dict: [‘phi’]=[x, y, z]; [‘psi’] = [x, y, z]; [‘omega’] = [x, y, z (degrees)]
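The dictionary layout described above can be sketched as follows. The angle values are invented for a three-residue loop, and check_dihedrals is a hypothetical helper (not part of Jade) that only verifies each key carries one angle per residue.

```python
# Hypothetical dihedral angles, in degrees, for a three-residue loop.
dihedrals = {
    "phi": [-60.0, -120.0, -75.0],
    "psi": [-45.0, 130.0, 150.0],
    "omega": [180.0, 180.0, 179.0],
}


def check_dihedrals(dihedrals):
    """Return True if phi, psi and omega all hold the same number of angles."""
    lengths = {len(dihedrals[key]) for key in ("phi", "psi", "omega")}
    return len(lengths) == 1


is_consistent = check_dihedrals(dihedrals)
```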
-
jade.antibody.ClusterData module¶
-
class
jade.antibody.ClusterData.
CDRData
(ab_db, cdr_dir, limit_to_known=True)[source]¶ Bases:
jade.antibody.ClusterData.Data
-
get_cluster_data
(cdr_name, length, cluster)[source]¶ Parameters: - cdr_name – string
- length – int
- cluster – string
Returns: CDRClusterData
-
jade.antibody.ab_db module¶
-
jade.antibody.ab_db.
get_all_clusters_for_length
(db, cdr, length, limit_to_known=True, res_cutoff=2.8, rfac_cutoff=0.3)[source]¶ Get all unique clusters for a length and a cdr
-
jade.antibody.ab_db.
get_all_lengths
(db, cdr, limit_to_known=True, res_cutoff=2.8, rfac_cutoff=0.3)[source]¶ Get all unique lengths for a CDR
-
jade.antibody.ab_db.
get_cdr_rmsd_for_entry
(db, pdb, original_chain, cdr, length, fullcluster)[source]¶
-
jade.antibody.ab_db.
get_center_dih_degrees_for_cluster_and_length
(db, cdr, length, cluster)[source]¶ Returns a dictionary of center dihedral angles in positional order. Or returns False if not found. result[“phis”] = [phis as floats] result[“psis”] = [Psis as floats] result[“omegas”] = [Omegas as floats]
-
jade.antibody.ab_db.
get_center_for_cluster_and_length
(db, cdr, length, cluster, data_names_array)[source]¶ Get the center for a particular cluster and length
-
jade.antibody.ab_db.
get_data_for_cluster_and_length
(db, cdr, length, cluster, data_names_array, limit_to_known=True, res_cutoff=2.8, rfac_cutoff=0.3)[source]¶ Get a set of data of a particular length, cdr, and cluster. data_names_array is a list of the types of data. Can include DISTINCT keyword
Example: data_names_array = [“PDB”, “original_chain”, “new_chain”, “sequence”]
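To show how a list such as data_names_array can turn into a query, here is a minimal sketch using the standard sqlite3 module. The table name cdr_data, its column names, the select_cdr_data helper, and the use of "less than or equal" for the cutoffs are all assumptions for illustration; the actual schema and query used by jade.antibody.ab_db are not shown in this documentation.

```python
import sqlite3


def select_cdr_data(con, cdr, length, cluster, data_names_array,
                    res_cutoff=2.8, rfac_cutoff=0.3):
    """Select the requested columns for one CDR, length and cluster."""
    # Column names cannot be bound as SQL parameters, so they are joined
    # into the statement directly; only pass trusted names here.
    columns = ", ".join(data_names_array)
    query = (
        "SELECT " + columns + " FROM cdr_data "
        "WHERE CDR = ? AND length = ? AND fullcluster = ? "
        "AND resolution <= ? AND rfactor <= ?"
    )
    params = (cdr, length, cluster, res_cutoff, rfac_cutoff)
    return con.execute(query, params).fetchall()


# Build a tiny in-memory database with a hypothetical schema.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE cdr_data (PDB TEXT, original_chain TEXT, CDR TEXT, "
    "length INTEGER, fullcluster TEXT, resolution REAL, rfactor REAL)"
)
con.executemany(
    "INSERT INTO cdr_data VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("1abc", "H", "H1", 13, "H1-13-1", 2.0, 0.20),
        ("1abc", "I", "H1", 13, "H1-13-1", 2.0, 0.20),
        ("2xyz", "H", "H1", 13, "H1-13-1", 3.5, 0.25),  # fails res_cutoff
    ],
)

# The DISTINCT keyword travels inside the column list.
rows = select_cdr_data(con, "H1", 13, "H1-13-1", ["DISTINCT PDB"])
```

Here the two chains of the same entry collapse into one row because of DISTINCT, and the low-resolution entry is dropped by the cutoff.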
-
jade.antibody.ab_db.
get_dihedral_string_for_centers
(db, limit_to_known=True)[source]¶ Get a string of the dihedral angles for all centers
-
jade.antibody.ab_db.
get_pdb_chain_subset
(db, gene, use_cutoffs=False, res_cutoff=2.8, rfac_cutoff=0.3)[source]¶ Return a list of tuples of [pdb, chain] of the particular gene
jade.antibody.outliers module¶
jade.antibody.split_structure module¶
-
jade.antibody.split_structure.
run_split_proto_CDR4
(ab_dir, output_dir, overhang=0, skip_present=False)[source]¶
-
jade.antibody.split_structure.
run_split_proto_CDR4_by_gene
(db, ab_dir, output_dir, overhang=0, skip_present=False, res_cutoff=2.8, rfac_cutoff=0.3)[source]¶
-
jade.antibody.split_structure.
separate_pdb
(pdb_path, output_dir, only_dimer=True)[source]¶ Determine if we have Fv or FAB. If FAB, split into parts: Fc, Fv, linker.
-
jade.antibody.split_structure.
separate_proto_CDR4
(pdb_path, output_dir, chain, overhang=0, skip_present=True)[source]¶
-
jade.antibody.split_structure.
split_Fc
(parent_PDB)[source]¶ Split Fc from FAB. Return new pdb_map to save
-
jade.antibody.split_structure.
split_Fv
(parent_PDB)[source]¶ Split Fv from FAB. Return new pdb_map to save
-
jade.antibody.split_structure.
split_linker
(parent_PDB)[source]¶ Split 4 Residue linker from FAB. Linker may be longer than this, but I think this is about it.
jade.basic package¶
Subpackages¶
-
class
jade.basic.filters.DataFilters.
H3ExtendedFilter
[source]¶ Bases:
jade.basic.filters.DataFilter.DataFilter
Filter to remove kinked H3 structures
-
class
jade.basic.filters.DataFilters.
TotalScoreCutoffFilter
(value)[source]¶ Bases:
jade.basic.filters.DataFilter.DataFilter
Filter to remove structures with total_score greater than a particular value
-
class
jade.basic.filters.DataFilters.
dGCutoffFilter
(value)[source]¶ Bases:
jade.basic.filters.DataFilter.DataFilter
Filter to remove structures with LH_A dG greater than a particular value
-
class
jade.basic.filters.DataFilters.
dSASACutoffFilter
(value)[source]¶ Bases:
jade.basic.filters.DataFilter.DataFilter
Filter to remove dSASA greater than some value
-
class
jade.basic.pandas.PandasDataFrame.
GeneralPandasDataFrame
(data=None, index=None, columns=None, dtype=None, copy=False)[source]¶ Bases:
pandas.core.frame.DataFrame
-
get_matches
(column, to_match)[source]¶ Get all the rows that match a particular element of a column. :param column: str :param to_match: str :rtype: pandas.DataFrame
-
get_row_matches
(column1, to_match, column2)[source]¶ Get the elements of the rows that match a particular column. If one element, this can be converted easily enough :param column1: str :param to_match: str :param column2: str :rtype: pandas.Series
-
n_matches
(column, to_match)[source]¶ Return the number of matches. :param column: str :param to_match: str :rtype: int
-
to_tsv
(path_or_buf=None, na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression=None, quoting=None, quotechar='"', line_terminator='\n', chunksize=None, tupleize_cols=False, date_format=None, doublequote=True, escapechar=None, decimal='.')[source]¶
-
-
jade.basic.pandas.PandasDataFrame.
detect_numeric
(df)[source]¶ Detect numeric components
Parameters: df – pd.DataFrame Return type: pd.DataFrame
-
jade.basic.pandas.PandasDataFrame.
drop_duplicate_columns
(df)[source]¶ Drop Duplicate columns from the DataFrame. Return DF
Parameters: df – pandas.DataFrame Return type: pandas.DataFrame
-
jade.basic.pandas.PandasDataFrame.
get_columns
(df, columns)[source]¶ Get a new dataframe of only the columns
Parameters: - df – pandas.DataFrame
- columns – list
Return type: pd.DataFrame
-
jade.basic.pandas.PandasDataFrame.
get_match_by_array
(df, column, match_array)[source]¶ Get a new dataframe of all dataframes of the subset series, match_array
- Note: This will result in a dataframe, but there may be strange issues when you go to plot the data in seaborn
- No idea why.
Parameters: - df – pd.DataFrame
- column – str
- match_array – pd.Series
Return type: pd.DataFrame
-
jade.basic.pandas.PandasDataFrame.
get_matches
(df, column, to_match)[source]¶ Get all the rows that match a particular element of a column.
Parameters: - df – pandas.DataFrame
- column – str
- to_match – str
Return type: pd.DataFrame
-
jade.basic.pandas.PandasDataFrame.
get_multiple_matches
(df, column, to_match_array)[source]¶ Get all the rows that match any of the values in to_match_array.
Parameters: - df – pandas.DataFrame
- column – str
- to_match_array – list
Return type: pd.DataFrame
-
jade.basic.pandas.PandasDataFrame.
get_n_matches
(df, column, to_match)[source]¶ Get the number of matches :param df: pd.DataFrame :param column: str :param to_match: :rtype: int
-
jade.basic.pandas.PandasDataFrame.
get_row_matches
(df, column1, to_match, column2)[source]¶ Get the elements of the rows that match a particular column. If one element, this can be converted easily enough :param df: pd.DataFrame :param column1: str :param to_match: str :param column2: str :rtype: pd.Series
-
jade.basic.pandas.PandasDataFrame.
get_value
(df, column)[source]¶ Get a single value from a one-row df. This is to help for implicit docs, since the iloc syntax is so strange.
Parameters: - df – pd.DataFrame
- column – str
Returns: value
-
jade.basic.pandas.PandasDataFrame.
multi_tab_excel
(df_list, sheet_list, file_name)[source]¶ Writes multiple dataframes as separate sheets in an output excel file.
If directory of output does not exist, it will create it.
Author: Tom Dobbs http://stackoverflow.com/questions/32957441/putting-many-python-pandas-dataframes-to-one-excel-worksheet
Parameters: - df_list – [pd.Dataframe]
- sheet_list – [str]
- file_name – str
-
jade.basic.pandas.stats.
calculate_stddev
(df, x, y, hue=None)[source]¶ Calculates standard deviations for a normal distribution (numerical data) over X and Hue categories.
If hue is given, the hue column will be added, and the overall will be of ‘ALL’
Example DataFrame output (x=’exp’, y=’length_recovery_freq’, hue=’cdr’):
    SD        cdr  exp                   y
20  6.739596  H2   ALL                   length_recovery_freq
21  7.373650  H2   min.remove_antigen-F  length_recovery_freq
22  6.400637  ALL  min.remove_antigen-T  length_recovery_freq
Parameters: - df – pandas.DataFrame
- x – str
- y – str
- total_column – str
- hue – str
Return type: pandas.DataFrame
-
jade.basic.pandas.stats.
calculate_stddev_binomial_distribution
(df, x, y, total_column, y_mean_column, hue=None)[source]¶ Calculates standard deviations for a binomial distribution (like experiment True/False values) over X and Hue categories.
Typically used for bar-plot.
If hue is given the hue column will be added, and the overall will be of ‘ALL’, plus that of Hue
Example DataFrame output (x=’exp’, y=’length_recovery_freq’, hue=’cdr’):
    SD        cdr  exp                   y
20  6.739596  H2   ALL                   length_recovery_freq
21  7.373650  H2   min.remove_antigen-F  length_recovery_freq
22  6.400637  ALL  min.remove_antigen-T  length_recovery_freq
Parameters: - df – pandas.DataFrame
- x – str
- y – str
- total_column – str
- hue – str
Return type: pandas.DataFrame
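For reference, the textbook standard deviations of a binomial variable can be sketched as below. The source does not state which form calculate_stddev_binomial_distribution uses or how it scales the result, so both helper functions here are illustrative assumptions rather than Jade's implementation.

```python
import math


def binomial_sd_of_proportion(p, n):
    """Standard deviation of an observed frequency p measured over n trials."""
    return math.sqrt(p * (1.0 - p) / n)


def binomial_sd_of_count(p, n):
    """Standard deviation of the number of successes in n trials."""
    return math.sqrt(n * p * (1.0 - p))


# A recovery frequency of 0.5 observed over 100 entries.
sd_freq = binomial_sd_of_proportion(0.5, 100)
sd_count = binomial_sd_of_count(0.5, 100)
```

In a bar plot of frequencies, the proportion form is the one typically drawn as the error bar, with n playing the role of a total-entries column.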
-
class
jade.basic.plotting.MakeFigure.
MakeFigure
(rows=1, columns=1, share_x=True, share_y=True)[source]¶ Deprecated. NOW - GO Checkout SEABORN instead of this class! Essentially, this is an interface to a facet grid. Seaborn does this awesomely.
My take on a plotting interface. Because I think matplotlib’s interface sucks.
I wrote this before I knew of pandas.
- You need to know the number of plots ahead of time by passing the grid.
1x1 will make one plot. 2x2 will make a grid of 4 plots. 1x3 is 3 columns of grids horizontally. 3x1 is a list of figures.
share_x and share_y tell the full subplot to share the axis.
-
fill_subplot
(title, labels, x_axis_label=None, y_axis_label=None, index=None, grid=None, add_legend=False, linestyle='--', marker='^', colors=None)[source]¶ This will add data to a particular subplot/plot.
: title: : labels: : x_axis_label: : y_axis_label: : specify_index: : add_legend: : linestyle: : marker: : colors: :return:
-
jade.basic.plotting.MakeFigure.
pad_single_title
(ax, x=0.5, y=1.05)[source]¶ Move the Title up in reference to the plot, essentially adding padding. SINGLE AXES :param ax:Axes :param x: :param y: :return:
-
jade.basic.plotting.MakeFigure.
plot_general_pandas
(df, title, outpath, plot_type, x, y=None, z=None, top_p=0.95, reverse=True)[source]¶ Plot anything in pandas. Make it look decent. Save the figure.
- If you are doing this multiple times in a Notebook:
- Don’t forget to call (matplotlib.pyplot)
- plot.show() plot.close()
Parameters: - df – pandas.DataFrame
- title – str
- outpath – str
- plot_type – str
- x – str
- y – str
- z – str
- top_p – float
- reverse – bool
Return type: matplotlib.Axes
-
jade.basic.plotting.MakeFigure.
plot_x_vs_y_sea_with_regression
(df, title, outpath, x, y, top_p=0.95, reverse=True)[source]¶ Plot X vs Y using a Pandas DataFrame and Seaborn, with a regression line. Saves the figure and returns the Axes.
- If you are doing this multiple times in a Notebook:
- Don’t forget to call (matplotlib.pyplot)
- plot.show() plot.close()
Parameters: - df – pandas.DataFrame
- title – str
- outpath – str
- x – str
- y – str
- top_p – float
- reverse – bool
Return type: matplotlib.Axes
-
jade.basic.plotting.correlations.
annotate_r_value
(data, x, y, ax, func=<function pearsonr>, template=None, stat=None, loc='best')[source]¶ Forked from seaborn JointPlot for use with regplot, scatter, etc. Woot. Needs to actually go into Seaborn Now!
Annotate the plot with a statistic about the relationship.
data: pandas.DataFrame x: str y: str ax: matplotlib.Axes
- func : callable
- Statistical function that maps the x, y vectors either to (val, p) or to val.
- template : string format template, optional
- The template must have the format keys “stat” and “val”; if func returns a p value, it should also have the key “p”.
- stat : string, optional
- Name to use for the statistic in the annotation, by default it uses the name of func.
- loc : string or int, optional
- Matplotlib legend location code; used to place the annotation.
-
jade.basic.plotting.error_bars.
calculate_set_errorbars_hist
(ax, data, x, y, binomial_distro=True, total_column='total_entries', y_freq_column=None, x_order=None, hue_order=None, hue=None, caps=True, color='k', linewidth=0.75, base_columnwidth=0.8, full=True)[source]¶ Calculates the standard deviation of the data and sets error bars for a bar chart. The default base_columnwidth for seaborn plots is 0.8.
Optionally give x_order and/or hue_order for the ordering of the columns. Make sure to pass this while plotting.
Note: If hue is enabled, this base is divided by the number of hue_names for the final width used for plotting.
Parameters: - ax – mpl.Axes
- data – pandas.DataFrame
- x – str
- y – str
- binomial_distro – bool
- total_column – str
- y_freq_column – str
- x_order – list
- hue_order – list
- hue – str
- caps – bool
- color – str
- linewidth – float
- base_columnwidth – float
- full – bool
Return type: None
-
class
jade.basic.sequence.ClustalRunner.
ClustalRunner
(fasta_path, clustal_name='clustal_omega', clustal_dir=None)[source]¶ A very simple class wrapper to run clustal omega.
-
output_alignment
(out_dir, out_name, parellel_process=False)[source]¶ Configure command line and Run Clustal Omega
-
set_extra_options
(extra_options='')[source]¶ Set any extra options as a string which will be added to the end of the command line.
-
-
class
jade.basic.sequence.PDBConsensusInfo.
PDBConsensusInfo
(resinfo_list)[source]¶ Class to compute frequency and probability from an array of PDBInfo classes. The sequences within PDBInfo do not necessarily need to be the same length. A given sequence position is identified and stored in the data maps by its [pdb_num, chain, and icode] -> Use get_position_from_residue(residue) to get this position from a Residue instance.
-
compute_stats
()[source]¶ Compute frequency and probability (0-1) for each position for each amino acid
-
get_probability
(residue, aa)[source]¶ Get probability of the current position (starting from 0) and aa
-
-
class
jade.basic.sequence.SequenceInfo.
SequenceInfo
[source]¶ Simple class for holding + accessing sequence metadata
Original class for sequence info. Basically deprecated by SequenceStats and PDBConsensusInfo.
-
class
jade.basic.sequence.SequenceResults.
SequenceResults
[source]¶ Simple class for holding, calculating, + accessing result data Residue Numbers are in Rosetta numbering.
Original class for sequence stats. Basically deprecated by SequenceStats and PDBConsensusInfo.
-
get_all_reference_percent_observed
()[source]¶ Returns array of triplets of [position, one_letter_code, percent] of reference amino acid found.
-
get_decoys_with_aa
(resnum, one_letter_code)[source]¶ Returns all decoys with a specific mutation at a position.
-
-
class
jade.basic.sequence.SequenceStats.
SequenceStats
(sequence_list)[source]¶ Class for getting data from an array of strings of sequences (one letter code) of equal length.
-
compute_stats
()[source]¶ Compute frequency and probability (0-1) for each position for each amino acid
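The per-position bookkeeping described here can be sketched in plain Python. The position_stats function and its return layout (one dict per position) are assumptions for illustration; the data structures SequenceStats actually keeps are not documented on this page.

```python
from collections import Counter


def position_stats(sequences):
    """Return (frequency, probability) lists with one dict per position.

    sequences is a list of one-letter-code strings of equal length.
    """
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")

    frequency = []
    probability = []
    # zip(*sequences) yields one column of residues per sequence position.
    for column in zip(*sequences):
        counts = Counter(column)
        frequency.append(dict(counts))
        probability.append(
            {aa: n / len(sequences) for aa, n in counts.items()}
        )
    return frequency, probability


freq, prob = position_stats(["ACD", "ACE", "GCD", "ACD"])
```

Positions are indexed from 0 in this sketch, so prob[0] holds the amino acid probabilities for the first residue of every sequence.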
-
-
jade.basic.sequence.fasta.
chain_fasta_files_from_pose
(pose, prefix, outdir)[source]¶ Creates fasta for each chain in the pose. Returns a list of paths for each fasta.
-
jade.basic.sequence.fasta.
chain_fasta_from_biostructure
(structure, outname, outdir)[source]¶ Creates a single fasta from biopython structure, split by individual chains.
-
jade.basic.sequence.fasta.
chain_fasta_from_pose
(pose, outname, outdir)[source]¶ Creates a single fasta from pose, split by individual chains.
-
jade.basic.sequence.fasta.
fasta_from_pose
(pose, fasta_label, outname, outdir)[source]¶ Creates a fasta from the pose.
-
jade.basic.sequence.fasta.
fasta_from_sequences
(sequences, outdir, outname)[source]¶ Output a general fasta, with tag being 1_outname etc. Use write_fasta for more control. Returns path to Fasta File written
-
jade.basic.sequence.fasta.
get_label_from_fasta
(fasta_path)[source]¶ Gets the first chainID found - Should be a single chain fasta file.
-
jade.basic.sequence.fasta.
output_fasta_from_pdbs_biopython
(path_header_dict, out_path, native_path=None, native_label='native', is_camelid=False)[source]¶ Used only for L and H chains! Concatenates the L and H in order if present, otherwise assumes camelid at H.
-
jade.basic.sequence.fasta.
output_weblogo
(alignment_path, outdir, outname, tag='Dunbrack Lab - Antibody Database Team')[source]¶
-
jade.basic.sequence.fasta.
output_weblogo_for_sequences
(sequences, outdir, outname, tag='Dunbrack Lab - Antibody Database Team')[source]¶
-
jade.basic.sequence.fasta.
read_header_data_from_fasta
(fasta_path)[source]¶ Reads > from fasta (PDBAA) and returns a defaultdict of pdb_chain: [method, residues, resolution, R factor]
-
class
jade.basic.structure.BasicPose.
BasicPose
(pdb_file_path='')[source]¶ -
-
change_occupancy
()[source]¶ Changes ALL occupancies in a PDB dictionary to 1.00 Returns PDB Dictionary.
-
clean_PDB
()[source]¶ Removes HSD, Waters: Tries to fix atom and residue name inconsistencies. HAS worked for changing a single MD pdb (NAMD) frame to Rosetta file. PLEASE Expand if possible to alias all residues for Rosetta compatibility. NOT guaranteed, but SHOULD work ok.
-
combine_pdb
(py_pdb)[source]¶ Combines pdb_map from instance of PyPDB to this one. Does not do any checks.
-
copy_all_but_chains_into_pdb_map
(py_pdb, chains)[source]¶ Copies all data from one pdb_map of a py_pdb of all data except the specified chains into this one. Useful for reordering chains.
-
copy_chain_into_pdb_map
(py_pdb, chain)[source]¶ Copies all data from one pdb_map of a py_pdb of a chain into the one held in this class. Useful for reordering chains.
-
morph_line_in_pdb_map_to_pdb_line
(entry)[source]¶ Oh What fun. ;) Magic Numbers?: (6,5,4,3,1,4,8,8,8,4,5);
-
pdb_alias
(pairs, element)[source]¶ Replaces ALL occurrences of old element with new from pair. pair is a dictionary. In C++ it would be an array of pairs. [string old]:[string new] For Specific functions, please see below.
-
pdb_atom_alias
(line_num, pair)[source]¶ Replaces atom_names with ones Rosetta is happy with. pair is a dictionary. In C++ it would be an array of pairs. [string MD atom_name]:[string rosetta atom_name]
-
pdb_chain_alias
(pairs)[source]¶ Replaces ALL occurrences of old chain with new chain. pair is a dictionary. In C++ it would be an array of pairs. [string old chain]:[string new chain]
-
pdb_residue_alias
(pairs)[source]¶ Replaces ALL occurrences of old residue with new residue. pair is a dictionary. In C++ it would be an array of pairs. [string old residue_name]:[string new residue_name]
-
read_file_and_replace_b_factors
(deliminator, filename='', resnum_column=1, chain_column=2, data_column=3, atomname_column=False)[source]¶ This function reads a delimited file with data and inserts the data into the B-factor column. Used to visualize arbitrary data. Use function options to control which column the data is in as well as where your resnums and chains are located. If atomname_column is given, will insert by atom instead of by residue.
-
remove_alternate_residues
()[source]¶ Removes any alternate residue codes and renumbers by renumbering from 1 and integrating any inserts.
-
remove_element_column
()[source]¶ Removes the extra stuff in the element column, but not the element itself.
-
replace_atom_b_factor
(resnum, chain, atomname, data)[source]¶ Replaces the b factor of an atom. Can be all string representations or not.
-
replace_residue_b_factor
(resnum, chain, data)[source]¶ Replaces the b factor of each atom in the residue with data. Can be all string representations or not.
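In the fixed-width PDB format, the temperature factor occupies columns 61 to 66 of an ATOM or HETATM record. The sketch below rewrites that field on a single line; replace_b_factor is a hypothetical stand-alone helper, not BasicPose's method, which works on its own pdb_map structure.

```python
def replace_b_factor(line, value):
    """Return a PDB ATOM/HETATM line with its B-factor field set to value."""
    if not line.startswith(("ATOM", "HETATM")):
        return line  # leave other record types untouched
    # Pad short lines so that columns 61-66 (0-based slice 60:66) exist.
    padded = line.rstrip("\n").ljust(66)
    return padded[:60] + "{:6.2f}".format(float(value)) + padded[66:]


# A 66-character ATOM record ending in occupancy 1.00 and B-factor 9.67.
EXAMPLE_LINE = (
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67"
)
new_line = replace_b_factor(EXAMPLE_LINE, "42")
```

Because the value is passed through float(), string and numeric inputs are both accepted, in the spirit of "can be all string representations or not".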
-
-
class
jade.basic.structure.BioPose.
BioPose
(path, model_num=0)[source]¶ Bases:
object
This is my biopython meta class. Because biopython’s interface kinda sucks. This is a little cleaner.
The other way is to subclass each Biopython class structure, which I’m not ready to do.
Right now, you need a path as I don’t know how we would use this from sequence, etc as you do in Rosetta. :path: Is a path to an RCSB file. PDB (.pdb), mmCIF(.cif), and gzipped (.gz) versions.
-
atom
(atom_name, resnum, chain_id, icode=' ', alt=' ', model_num=0)[source]¶ Get a Bio Atom of the stored structure
Parameters: - atom_name – str
- resnum – int
- chain_id – str
- icode – str
- alt – str
- model_num – int
Return type: bio.PDB.Atom.Atom
-
atoms
(resnum, chain_id, icode=' ', alt=' ', model_num=0)[source]¶ Get a list of Bio Atoms :param resnum: int :param chain_id: str :param icode: str :param alt: str :param model_num: int :rtype: list[bio.PDB.Atom.Atom]
-
chain
(chain_id, model_num=0)[source]¶ Get a Bio Chain of the stored structure :param chain_id: str :param model_num: int :rtype: bio.PDB.Chain.Chain
-
chains
(model_num=0)[source]¶ Get a list of Bio Chains :param model_num: int :rtype: list[bio.PDB.Chain.Chain]
-
get_chain_ids
(model_num)[source]¶ Get all chain IDS for a model. :param model_num: int :rtype: list[str]
-
get_chain_length
(chain_id, model_num=0)[source]¶ Get the number of AA in a chain - Not including alternate res locations :param chain_id: str :rtype: int
-
get_sequence
(chain_id, model_num=0)[source]¶ Get a sequence of a chain - Not including alternate res locations
Parameters: - chain_id – str
- model_num – int
Return type: str
-
load_from_file
(path)[source]¶ Load a file from PDB or mmCIF. .gz is supported.
Parameters: path – Path to PDB or mmCIF file Return type: tuple(bio.PDB.Structure.Structure, dict)
-
model
(model_num=0)[source]¶ Get a Bio Model of the stored structure :param id: int :rtype: bio.PDB.Model.Model
-
omega
(i, rosetta_definitions=True)[source]¶ Get the Omega Angle of i in radians Omega is defined as the dihedral angle between the peptide bond of i and i + 1, as in Rosetta. If rosetta_definitions are False, omega is then treated as being between i and i -1
Parameters: - i – int
- reverse_rosetta_definitions – bool
Return type: float
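With the Rosetta-style definition, omega of residue i is the dihedral over the atoms CA(i), C(i), N(i+1), CA(i+1). A dependency-free sketch of the underlying dihedral calculation follows; the dihedral function and the toy coordinates are illustrative assumptions and are not BioPose's code, which works on Biopython atoms.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in radians defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.atan2(_dot(_cross(b1, v), w), _dot(v, w))


# Toy coordinates standing in for CA(i), C(i), N(i+1), CA(i+1).
trans_omega = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                       (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
cis_omega = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                     (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
```

A planar trans arrangement gives a magnitude of pi radians and a cis arrangement gives 0, which matches the usual expectation for peptide bonds.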
-
reload_from_file
(path, model_num=0)[source]¶ Reload a BioPose from a file path. :param path: str :param model_num: int :return:
-
res_bond_distance
(resi)[source]¶ Get the stored bond distances between residue and residue+1 :param res: int :rtype: float
-
residue
(resnum, chain_id, icode=' ', alt=' ', model_num=0)[source]¶ Get a Bio Residue of the stored structure. Adds a chain_id attribute.
Parameters: - resnum – int
- chain_id – str
- icode – str
- alt – str
- model_num – int
Return type: bio.PDB.Residue.Residue
-
residues
(chain_id, model_num=0, include_alt=False)[source]¶ Get residues, including or not including residues with alternate location codes - which can be a PITA Adds chain_id attribute to residue.
Parameters: - chain_id – str
- model_num – int
- include_alt – bool
Return type: list[bio.PDB.Residue.Residue]
-
-
class
jade.basic.structure.SQLPose.
PDB_database
(database)[source]¶ This class is specifically for if we already have a database. Note: This is not a ROSETTA database. If you need to convert this, use ROSETTA (Which now works in PyRosetta!) Functions are to output the database as a PDB, output specific pieces of protein as a pdb and query the database.
-
save_cur_as_pdb
(outpath, supress_modelSep=False)[source]¶ Saves the DB at the current cursor to a file. Make sure cursor is on the pdb table.
-
save_whole_db_as_db
(filename, seperate_structures=False)[source]¶ Saves the whole database in MEMORY to a file….
-
scrub
(table_name)[source]¶ This should help protect from sql injection. Not that it’s important now, but… Author:OrangeOctopus from stack overflow
-
-
class
jade.basic.structure.SQLPose.
SQLPose
(pdbID, modelID, structID, memory=False, path='')[source]¶ -
fetch_and_read_pdb_into_database
(pdbID, read_header=False, header_only=False)[source]¶ Uses the PDB file specified, grabs it from the PDB, and reads the data in.
-
read_pdb_into_database_flat
(filePath, specific_chain=False, read_header=False, header_only=False)[source]¶ Reads the flat filepath specified into a database structure. This can then be parsed using the PDB_Database class. NOTE: Reading of header not implemented. if header_only is True, only loads the header. Useful for just getting specific information. More useful to D/L it from the pdb if possible. If Header only, reads the header into the database.
-
-
class
jade.basic.structure.Structure.
AntibodyResidueRecord
(aa, pdb_res_num, chain, icode=' ')[source]¶ Bases:
jade.basic.structure.Structure.ResidueRecord
Extension of Residue used to hold and access extra data used for renumbering/printing renumbering info. I could backport Python Enums (used in Python 3.4), which would be incredibly useful here, but I don’t want to require the additional step.
-
class
jade.basic.structure.Structure.
AntibodyStructure
[source]¶ Simple class for accessing Modified_AHO antibody numbering information outside of Rosetta.
-
class
jade.basic.structure.Structure.
PDBInfo
[source]¶ Bases:
object
Analogous to Rosetta PDBInfo Class I should start at 1
-
class
jade.basic.structure.Structure.
ResidueRecord
(one_letter_aa, pdb_num, chain, icode=' ')[source]¶ Bases:
object
Basic class to PDBInfo
-
jade.basic.structure.util.
atomic_distance
(res1, res2, res1_atom_name, res2_atom_name)[source]¶ Return the atomic distance between two arbitrary Bio residues and two arbitrary atom names. :param res1: Bio.PDB.Residue.Residue :param res2: Bio.PDB.Residue.Residue :param res1_atom_name: str :param res2_atom_name: str :rtype: float
-
class
jade.basic.threading.Threader.
Threader
(print_interval=0)[source]¶ Bases:
object
Class for starting 2 new threads. One that runs a system process and one that waits and prints info to std::out or whatever you currently have set as std::out. Use print interval to set the wait time between prints. Useful for GUI subprocessing.
jade.basic.RestypeDefinitions module¶
-
class
jade.basic.RestypeDefinitions.
ResTypeSergey
(ignore_groups=[])[source]¶ Bases:
object
Residue Types corresponding to Sergey Menis’ definition of groups.
jade.basic.general module¶
-
jade.basic.general.
extract_score_from_decoy
(pdb_path)[source]¶ Extract total score from a rosetta decoy (gzipped or otherwise)
If score is not found, it will return 0.
Parameters: pdb_path – Returns: float
-
jade.basic.general.
fix_input_args
()[source]¶ Enables options to be passed to ArgumentParser with dashes, but not single-character ones. Example:
--rosetta_args "-out:prefix test -out:path:all my/dir/"
- Normally, this would fail if you had declared an -o option to the ArgumentParser.
- This happens because although the quotes are being parsed correctly, the system is looking for options using the starting ‘-‘ character. If you give a quote and then a space, you will receive no error.
- This code essentially checks for single dashes and puts a space in front of them. Note that this does not work with single-character options you are hoping to pass with a quote. Because there is no way to grab the input string from the system and fix it myself, for these it will have to have a space after the quotes. This at least fixes the most common use cases (mostly for use with Rosetta).
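One plausible reading of "checks for single dashes and puts a space in front of them" is sketched below. This is an assumption about the behavior, not the function's source: pad_quoted_dash_args is a hypothetical helper that works on a list instead of modifying sys.argv, and it treats any argument containing a space as a quoted block.

```python
def pad_quoted_dash_args(argv):
    """Prefix a space to quoted argument blocks that start with a single dash."""
    fixed = []
    for arg in argv:
        looks_quoted = " " in arg  # the shell only keeps spaces inside quotes
        single_dash = arg.startswith("-") and not arg.startswith("--")
        fixed.append(" " + arg if looks_quoted and single_dash else arg)
    return fixed


argv = ["--rosetta_args", "-out:prefix test -out:path:all my/dir/"]
padded = pad_quoted_dash_args(argv)
```

With the leading space in place, an argument parser no longer mistakes the quoted Rosetta options for one of its own flags.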
-
jade.basic.general.
get_all_combos
(list_of_lists)[source]¶ Get all the position-specific combos of a list of lists.
- This is taken directly from Stack Overflow:
- http://stackoverflow.com/questions/798854/all-combinations-of-a-list-of-lists
Parameters: list_of_lists – A list of lists we would like combos of. Return type: list[list]
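Position-specific combinations of this kind can be produced with itertools.product from the standard library. The sketch below assumes the result should be a list of lists, as the return type suggests; all_position_combos is a hypothetical name, not the Jade function.

```python
import itertools


def all_position_combos(list_of_lists):
    """Return every combination that takes one item from each inner list."""
    return [list(combo) for combo in itertools.product(*list_of_lists)]


combos = all_position_combos([["A", "B"], [1, 2], ["x"]])
```

Each inner list contributes exactly one element per combination, in positional order.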
-
jade.basic.general.
get_platform
()[source]¶ Get OS of the particular platform the toolkit is being run on.
-
jade.basic.general.
get_rosetta_program
(program, mpi=True, compiler='gcc')[source]¶ Get the set program
-
jade.basic.general.
match_patterns
(search_string, patterns)[source]¶ Uses RE to match multiple patterns. Returns boolean of success
Parameters: - search_string – str
- patterns – [str]
Return type: boolean
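A minimal sketch of multi-pattern matching with the re module. It assumes that success means at least one pattern matches; the documentation above does not say whether any or all patterns must match, and matches_any is a hypothetical name.

```python
import re


def matches_any(search_string, patterns):
    """Return True if at least one regular expression matches the string."""
    return any(re.search(pattern, search_string) for pattern in patterns)


decoy_patterns = [r"\.pdb$", r"\.pdb\.gz$"]
is_decoy = matches_any("ab_design_0001.pdb.gz", decoy_patterns)
is_score_file = matches_any("score.sc", decoy_patterns)
```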
-
jade.basic.general.
merge_dicts
(*dict_args)[source]¶ Given any number of dicts, shallow copy and merge into a new dict, precedence goes to key value pairs in latter dicts. (Pre-Python 3.5) (http://stackoverflow.com/questions/38987/how-to-merge-two-python-dictionaries-in-a-single-expression)
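The merge semantics described above (shallow copy, later dicts win) can be sketched with dict.update. merge_dicts_sketch is an illustrative stand-in, not the Jade source.

```python
def merge_dicts_sketch(*dict_args):
    """Shallow-merge any number of dicts; later dicts take precedence."""
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result


defaults = {"nstruct": 1, "mpi": True}
overrides = {"nstruct": 100}
merged = merge_dicts_sketch(defaults, overrides)
```

The inputs are left unmodified because the values are copied into a fresh dict.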
jade.basic.numeric module¶
-
jade.basic.numeric.
distance
(x1, y1, z1, x2, y2, z2)[source]¶ Get the distance between variables. :param x1: float :param y1: float :param z1: float :param x2: float :param y2: float :param z2: float :rtype: float
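For reference, the Euclidean distance between two 3D points can be written with the standard library alone. This is a sketch of the formula, assuming that is what distance computes; euclidean_distance is a hypothetical name.

```python
import math


def euclidean_distance(x1, y1, z1, x2, y2, z2):
    """Straight-line distance between (x1, y1, z1) and (x2, y2, z2)."""
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2 + (z2 - z1) ** 2)


d = euclidean_distance(0.0, 0.0, 0.0, 1.0, 2.0, 2.0)
```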
-
jade.basic.numeric.
distance_numpy
(array1, array2)[source]¶ Get the distance between two points :param array1: numpy.Array :param array2: numpy.Array :rtype: float
-
jade.basic.numeric.
geometric_mean
(data)[source]¶ Get the geometric mean of the data. Useful for numbers that range from 0 upward and represent a type of enrichment of the data.
Parameters: data – numpy.Array Returns: float
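The geometric mean is the exponential of the mean of the logarithms. The sketch below uses only the standard library and assumes strictly positive inputs. The Jade function takes a numpy array and may handle zeros differently.

```python
import math


def geometric_mean_sketch(data):
    """Geometric mean of strictly positive numbers."""
    if not data or any(value <= 0 for value in data):
        raise ValueError("geometric mean needs a non-empty list of positive numbers")
    return math.exp(sum(math.log(value) for value in data) / len(data))


gm = geometric_mean_sketch([1.0, 4.0, 16.0])
```

For ratio-like data such as risk ratios, this treats a two-fold enrichment and a two-fold depletion symmetrically, which an arithmetic mean does not.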
jade.basic.path module¶
-
jade.basic.path.
get_database_testing_path
()[source]¶ Get the path to the database testing file. :return:
-
jade.basic.path.
get_decoy_extension
(decoy)[source]¶ Return the extension of the decoy. .pdb, .pdb.gz, .cif, .cif.gz, etc. :param decoy: str :rtype: str
-
jade.basic.path.
get_decoy_name
(decoy)[source]¶ Get the decoy name from path or name, whether .pdb, .pdb.gz or no extension. :param decoy: :rtype:str
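A sketch of how the extension and name lookups could work for compressed and uncompressed decoys. The extension list, the choice to strip the directory, and both function names are assumptions made for illustration.

```python
import os

# Longest extensions first so ".pdb.gz" is not mistaken for ".gz" or ".pdb".
KNOWN_EXTENSIONS = (".pdb.gz", ".cif.gz", ".pdb", ".cif")


def decoy_extension(decoy):
    """Return the known decoy extension, or an empty string if none matches."""
    for ext in KNOWN_EXTENSIONS:
        if decoy.endswith(ext):
            return ext
    return ""


def decoy_name(decoy):
    """Return the file name without directory or decoy extension."""
    base = os.path.basename(decoy)
    ext = decoy_extension(base)
    return base[: -len(ext)] if ext else base


name = decoy_name("decoys/ab_design_0001.pdb.gz")
```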
-
jade.basic.path.
get_decoy_path
(decoy, alternate_paths=None)[source]¶ Search .pdb, .pdb.gz, .cif, .cif.gz, .xml, .xml.gz In addition, Search alternative search paths. Return found path or NONE.
Parameters: - decoy –
- alternate_paths –
:rtype:str
-
jade.basic.path.
get_directories_recursively
(inpath)[source]¶ Get a list of directories recursively in a path. Skips hidden directories. :param inpath: str :rtype: list
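One way to collect directories recursively while skipping hidden ones is to prune the directory list that os.walk yields. This is a sketch under the assumption that hidden means a name starting with a dot; visible_directories is a hypothetical name.

```python
import os


def visible_directories(inpath):
    """List all non-hidden directories below inpath, parents before children."""
    found = []
    for root, dirs, _files in os.walk(inpath):
        # Prune in place so os.walk never descends into hidden directories.
        dirs[:] = sorted(d for d in dirs if not d.startswith("."))
        found.extend(os.path.join(root, d) for d in dirs)
    return found
```

Pruning in place also skips everything nested inside a hidden directory, for example the contents of a .git folder.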
-
jade.basic.path.
get_file_paths
(pattern, dir, ext='.pdb')[source]¶ Get file paths matching the exact pattern and extension. :param pattern: :param dir: :param ext: :return:
-
jade.basic.path.
get_make_get_dirs
(root, dirs)[source]¶ Recursively make dirs and return the final path :param root: :param dirs: :rtype: str
-
jade.basic.path.
get_matching_pdbs
(directory, pattern, ext='.pdb')[source]¶ Get pdbs in a directory matching a pattern. :param directory: :param pattern: :param ext: :return:
-
jade.basic.path.
get_rosetta_features_root
()[source]¶ Get the path to Rosetta features directory through set ROSETTA3_DB env variable. :rtype: str
-
jade.basic.path.
get_rosetta_features_run_script
()[source]¶ Get the path to Rosetta features script dir through the set ROSETTA3_DB env variable. :rtype: str
-
jade.basic.path.
get_testing_inputs_path
()[source]¶ Get the path to testing inputs (PDBs,fasta,etc.) :rtype:str
-
jade.basic.path.
get_xml_scripts_path
()[source]¶ Get the path to the Rosetta xml script directory. Useful for variable substitutions. :rtype: str
jade.clustering package¶
jade.clustering.CaliburRunner module¶
-
class
jade.clustering.CaliburRunner.
CaliburWrapper
(caliburPath=<type 'set'>)[source]¶
jade.machine_learning package¶
jade.pymol_jade package¶
jade.pymol_jade.PyMolScriptWriter module¶
-
class
jade.pymol_jade.PyMolScriptWriter.
PyMolScriptWriter
(outdir)[source]¶ Class to help build PyMol scripts using arbitrary lists of PDBs.
Example for loading all top models into PyMol, aligning them to the native, and labeling them:
scripter = PyMolScriptWriter(outpath)
if native_path:
    scripter.add_load_pdb(native_path, "native_"+os.path.basename(native_path))
scripter.add_load_pdbs(pdb_path_list, load_as_list)
scripter.add_align_all_to(scripter.get_final_names()[0])
scripter.add_show("cartoon")
scripter.add_line("center")
scripter.add_save_session(pse_path)
scripter.write_script("load_align_top.pml")
run_pymol_script(top_dir+"/"+"load_align_top.pml")
-
add_align_all
(sele1='', sele2='', limit_to_bb=True, pair_fit=False)[source]¶ Align all to the first model
-
add_align_all_to
(model, sele1='', sele2='', limit_to_bb=True, pair_fit=False)[source]¶ Align all to a particular model
-
add_align_to
(model1, model2, sele1='', sele2='', limit_to_bb=True, pair_fit=False)[source]¶ Align one model to another, optionally specifying a selection. Recommended to use superimpose instead
-
add_antibody_script
()[source]¶ Add running the color cdrs pymol script. Antibody must be in AHO numbering
-
add_color
(sele, color)[source]¶ Add color to a selection. sele: PyMol Selection color: Particular color.
See Also self.colors
-
add_group_object
(name, new_group_name)[source]¶ Group a single object to another. Useful for meta-groups.
-
add_load_pdb
(pdb_path, load_as=None, group=None)[source]¶ Add line to load a PDB Path into PyMol Optionally load them as a particular name Will then set the final names PyMol uses to the object.
-
add_load_pdbs
(pdb_paths, load_as=None, group=None)[source]¶ Add lines to load the list of PDB paths into PyMol Optionally load them as a particular name Will then set the final names PyMol uses to the object.
-
get_sele
(chain, resid_array)[source]¶ Get a selection from an array of residue IDs and a particular chain. If a residue ID is a two-element tuple, add a selection spanning the first through the last element.
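The selection logic can be illustrated with a small standalone function. The exact string that get_sele emits is not documented here, so the format below (PyMOL's resi syntax with + for lists and - for ranges) and the name build_selection are assumptions.

```python
def build_selection(chain, resid_array):
    """Build a PyMOL selection string from residue IDs and (first, last) tuples."""
    parts = []
    for resid in resid_array:
        if isinstance(resid, tuple):
            # A two-element tuple is treated as an inclusive residue range.
            parts.append("%s-%s" % (resid[0], resid[-1]))
        else:
            parts.append(str(resid))
    return "chain %s and resi %s" % (chain, "+".join(parts))


sele = build_selection("H", [24, (26, 30), 52])
```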
-
jade.pymol_jade.PyMolScriptWriter.
make_pymol_session_on_top
(pdb_path_list, load_as_list, script_dir, session_dir, out_name, top_num=None, native_path=None, antibody=True)[source]¶ Make a pymol session on a set of decoys. Usually an ordered decoy list. :param top_dir: :param pdb_path_list: List of PDB Paths :param load_as_list: List of PDB Path names for pymol. :param outdir: :param out_name: :param top_num: :param native_path: :return:
-
jade.pymol_jade.PyMolScriptWriter.
make_pymol_session_on_top_ab_include_native_cdrs
(pdb_path_list, load_as_list, script_dir, session_dir, out_name, cdr_dir, top_num=None, native_path=None)[source]¶ Make a pymol session on a set of decoys. These decoys should have REMARK CDR_origin. These origin pdbs will be aligned and included in the pymol session :param top_dir: :param pdb_path_list: List of PDB Paths :param load_as_list: List of PDB Path names for pymol. :param cdr_dir: The directory of antibody CDRs from PyIgClassify. :return:
-
jade.pymol_jade.PyMolScriptWriter.
make_pymol_session_on_top_scored
(pdbpaths_scores, script_dir, session_dir, out_name, top_num=-1, native_path=None, antibody=True, parellel=True, super='', run_pymol=True, model_names=[])[source]¶ Make a pymol session on a set of decoys with a tuple of [[score, pdb], … ]. Optionally, it can be a 3-length tuple with the model name to use as the last element:
[[score, pdb, model_name], … ]
If run_pymol is False, will not run pymol.
Pymol names will be: model_n_RosettaModelNumber_score. Score will be truncated to two decimal places.
Returns configured PyMol Scripter for extra use.
Parameters: - pdbpaths_scores – tuple of [[score, pdb], … ]
- script_dir – Path to output PyMol script
- session_dir – Path to output Session
- out_name – name of the Pymol session
- top_num – Optional - Only output TOP N models
- native_path – Optional - Path to any input native to add to pymol session
- parellel – Optional - Run in parallel (so many pymol sessions can be created at once)
- super – Optional - Super to THIS particular selection instead of align_all to.
- run_pymol – Optional - Run Pymol using script? Default true
Return type:
jade.RAbD package¶
Subpackages¶
jade.RAbD.AnalyzeAntibodyDesigns module¶
-
class
jade.RAbD.AnalyzeAntibodyDesigns.
CompareAntibodyDesignStrategies
(db_dir, out_dir_name, strategies=[], jsons=[])[source]¶ Class mainly for comparing different Antibody Design strategies using our Features Databases.
-
get_csv_data
(top=False, summary=False)[source]¶ Get data by converting everything to a pandas dataframe first. For now, one function pretty much does everything.
Return type: [pandas.DataFrame],[str]
-
get_top_dataframe_by_all_scores
()[source]¶ Get a pandas DataFrame for top, grouped by the type of score that is on. :rtype: pandas.DataFrame
-
get_top_from_dataframe
(score_name)[source]¶ Gets a pandas Dataframe for top :rtype: pandas.DataFrame
-
output_csv_data
(top=False, summary=False)[source]¶ Output a CSV file of combined or individual data.
-
-
class
jade.RAbD.AnalyzeAntibodyDesigns.
Perc
(count, total)[source]¶ Simple class for holding enrichment/recovery information
-
jade.RAbD.AnalyzeAntibodyDesigns.
calculate_enrichments
(all_decoy_data, cdr, decoy_list=None)[source]¶ Returns defaultdict of [count_type] : Perc
-
jade.RAbD.AnalyzeAntibodyDesigns.
calculate_observed_value
(value, all_decoy_data, cdr, decoy_list=None)[source]¶ Calculate the enrichment of some value
jade.RAbD_BM package¶
jade.RAbD_BM.AnalysisInfo module¶
-
class
jade.RAbD_BM.AnalysisInfo.
AnalysisInfo
(json_path)[source]¶ Simple class that parses a json file which defines (USING RELATIVE PATHS):
- exp - The name of the experiment - whatever you want it to be.
- decoy_dir - the directory of the decoys.
- features_db - the db where the features reporters have been run.
The class will store this information, and parse the benchmark info in the decoy dir, storing a BenchmarkInfo object. Benchmark classes and scripts will take lists to these analysis files and use them to generate plots and data.
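An analysis JSON of the kind described above might look like the following. The three keys come from the description; every value is a made-up placeholder, and the snippet only shows how such a file parses, not how AnalysisInfo itself reads it.

```python
import json

# Hypothetical file contents; paths are relative, as the class expects.
analysis_json = """
{
    "exp": "even_cluster_mc.relax",
    "decoy_dir": "decoys/even_cluster_mc.relax",
    "features_db": "databases/even_cluster_mc.relax.db3"
}
"""

info = json.loads(analysis_json)
experiment_name = info["exp"]
```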
jade.RAbD_BM.AnalyzeRecovery module¶
-
class
jade.RAbD_BM.AnalyzeRecovery.
AnalyzeRecovery
(pyig_design_db_path, analysis_info, native_info, cdrs=None)[source]¶ Pools Recovery and RR data, outputs to DB
-
jade.RAbD_BM.AnalyzeRecovery.
calculate_exp_rr_and_recovery
(exp, result_df)[source]¶ Calculate the overall recovery and risk ratio. :param exp: :param result_df: :rtype: pandas.DataFrame
-
jade.RAbD_BM.AnalyzeRecovery.
calculate_per_cdr_rr_and_recovery
(exp, cdrs, result_df)[source]¶ Calculate the recovery and risk-ratios PER CDR. :rtype: pandas.DataFrame
-
jade.RAbD_BM.AnalyzeRecovery.
calculate_recovery_and_risk_ratios
(top_recovery_df, observed_df)[source]¶ Calculate the Risk Ratio and Recovery Percent for each pdb/cdr given dataframes output by the calculators below.
Return a merged dataframe of the top recovery and observed, with the resulting risk ratio data.
Parameters: - top_recovery_df – pandas.DataFrame
- observed_df – pandas.DataFrame
Return type: pandas.DataFrame
jade.RAbD_BM.RunBenchmarksRAbD module¶
-
class
jade.RAbD_BM.RunBenchmarksRAbD.
RunBenchmarksRAbD
[source]¶ Bases:
jade.rosetta_jade.RunRosettaBenchmarks.RunRosettaBenchmarks
Benchmark class specifically for RAbD
Details:
ALL INPUT PDBs should go into
project_root/datasets
Typically, you will have multiple directories - native, relaxed, etc.
This is specified as a benchmark using ‘input_pdb_type’ in your json file.
ALL PDBLISTs for benchmarking should go into
project_root/datasets/pdblists
jade.RAbD_BM.benchmark_plotting module¶
-
class
jade.RAbD_BM.benchmark_plotting.
NativeCDRData
(datatype, native_path, data_table='cdr_metrics')[source]¶
jade.RAbD_BM.recovery_rr_tools module¶
-
jade.RAbD_BM.recovery_rr_tools.
calculate_geometric_means_rr
(df, x, y, hue=None)[source]¶ Example use:
rr_data_lengths = calculate_geometric_means_rr(df_all, x='cdr', y='length_rr', hue='exp')
rr_data_clusters = calculate_geometric_means_rr(df_all, x='cdr', y='cluster_rr', hue='exp')
-
jade.RAbD_BM.recovery_rr_tools.
calculate_rr_errors
(df_all_errors)[source]¶ Calculates the risk ratio errors for clusters and lengths using error propagation equations applied to the errors calculated for the recovery itself. The result is the same for percentages as it would be for raw data, as the N cancels out in the equations. http://lectureonline.cl.msu.edu/~mmp/labs/error/e2.htm
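The standard propagation rule for a quotient of two independent quantities adds the relative errors in quadrature. The sketch below applies it to a ratio of two frequencies. Treating the risk ratio as recovered frequency divided by observed frequency is an assumption made here, and ratio_with_error is a hypothetical name.

```python
import math


def ratio_with_error(a, sigma_a, b, sigma_b):
    """Return a / b and its propagated standard error (a and b non-zero)."""
    ratio = a / b
    relative_error = math.sqrt((sigma_a / a) ** 2 + (sigma_b / b) ** 2)
    return ratio, ratio * relative_error


# The same data expressed as fractions and as raw counts out of N = 100.
rr_fraction, err_fraction = ratio_with_error(0.30, 0.03, 0.10, 0.01)
rr_counts, err_counts = ratio_with_error(30.0, 3.0, 10.0, 1.0)
```

Both calls give the same ratio and error, which illustrates why the N cancels out: only the relative errors enter the formula.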
-
jade.RAbD_BM.recovery_rr_tools.
calculate_set_errorbars_hist
(ax, data, x, y, binomial_distro=True, total_column='total_entries', y_freq_column=None, x_order=None, hue_order=None, hue=None, caps=False, color='k', linewidth=0.75, base_columnwidth=0.8, full=True)[source]¶ Calculates the standard deviation of the data, sets error bars for a histogram. Default base_columnwidth for seaborn plots is .8
Optionally give x_order and/or hue_order for the ordering of the columns. Make sure to pass this while plotting.
- Notes:
- If Hue is enabled, this base is divided by the number of hue_names for the final width used for plotting.
- Caps are the horizontal lines in the errorbar.
- ‘full’ means error bars on both vertical sides of the histogram bar.
- Warning:
- linewidth of .5 does not show up in all PDFs for all bars.
-
jade.RAbD_BM.recovery_rr_tools.
calculate_set_errorbars_scatter
(ax, data, x, y, binomial_distro=False, total_column='total_entries', caps=False, color='k', lw=1.5)[source]¶ (Untested) - Calculates the standard deviation of the data, sets error bars for a typical scatter plot
-
jade.RAbD_BM.recovery_rr_tools.
calculate_stddev_binomial_distribution2
(df, x, y, total_column, y_mean_column, hue=None, percent=True)[source]¶ Calculates standard deviations for a binomial distribution. Returns a dataframe of standard deviations. If percent=True, we divide by the total to normalize the standard deviation. SD of ‘mean’ = SQRT(n*p*q) where p is probability of success and q is probability of failure.
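The formula SQRT(n*p*q) can be checked with a few lines of standard-library Python. This sketch works on a single count rather than a dataframe, and returning the normalized value as a fraction (rather than multiplying by 100) is an assumption. binomial_stddev is a hypothetical name.

```python
import math


def binomial_stddev(successes, total, percent=True):
    """Standard deviation of a binomial count, optionally divided by the total."""
    p = successes / float(total)  # probability of success
    q = 1.0 - p                   # probability of failure
    sd = math.sqrt(total * p * q)
    return sd / total if percent else sd


sd_counts = binomial_stddev(25, 100, percent=False)
sd_fraction = binomial_stddev(25, 100, percent=True)
```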
-
jade.RAbD_BM.recovery_rr_tools.
load_precomputed_recoveries
(db_path='data/all_recovery_and_risk_ratio_data.db', table='full_data')[source]¶ Reads recovery data from a database created via script.
Return type: pandas.DataFrame
-
jade.RAbD_BM.recovery_rr_tools.
order_by_row_group
(df, column, groups)[source]¶ Order a dataframe by groups. Return the dataframe. Probably a better way to do this already, but I don’t know what it is.
-
jade.RAbD_BM.recovery_rr_tools.
remove_pdb_and_cdr
(df, pdbid, cdr)[source]¶ Removes a particular pdbid and cdr from the db. Returns the new df.
-
jade.RAbD_BM.recovery_rr_tools.
set_errorbars_bar
(ax, data, x, y, error_dfs, x_order=None, hue_order=None, hue=None, caps=False, color='k', linewidth=0.75, base_columnwidth=0.8, full=True)[source]¶ Sets error bars for a bar chart.
Default base_columnwidth for seaborn plots is .8
Optionally give x_order and/or hue_order for the ordering of the columns. Make sure to pass this while plotting.
- Notes:
- If Hue is enabled, this base is divided by the number of hue_names for the final width used for plotting.
- Caps are the horizontal lines in the errorbar.
- ‘full’ means error bars on both vertical sides of the histogram bar.
- Warning:
- linewidth of .5 does not show up in all PDFs for all bars.
-
jade.RAbD_BM.recovery_rr_tools.
set_errorbars_bar_rr
(ax, data, x, y, error_dfs, x_order=None, hue_order=None, hue=None, caps=False, color='k', linewidth=0.75, base_columnwidth=0.8, full=True)[source]¶ Sets error bars for a bar chart.
Default base_columnwidth for seaborn plots is .8
Optionally give x_order and/or hue_order for the ordering of the columns. Make sure to pass this while plotting.
- Notes:
- If Hue is enabled, this base is divided by the number of hue_names for the final width used for plotting.
- Caps are the horizontal lines in the errorbar.
- ‘full’ means error bars on both vertical sides of the histogram bar.
- Warning:
- linewidth of .5 does not show up in all PDFs for all bars.
jade.RAbD_BM.tools module¶
jade.RAbD_BM.tools_ab_db module¶
-
jade.RAbD_BM.tools_ab_db.
get_all_clusters_for_length
(db, cdr, length, limit_to_known=True, res_cutoff=2.8, rfac_cutoff=0.3)[source]¶ Get all unique clusters for a length and a cdr
-
jade.RAbD_BM.tools_ab_db.
get_all_lengths
(db, cdr, limit_to_known=True, res_cutoff=2.8, rfac_cutoff=0.3)[source]¶ Get all unique lengths for a CDR
-
jade.RAbD_BM.tools_ab_db.
get_cdr_data_table_df
(db_path)[source]¶ Get a dataframe with typical info from the cdr_data table in the PyIgClassify db. :param db_con: sqlite3.con :rtype: pandas.DataFrame
-
jade.RAbD_BM.tools_ab_db.
get_cdr_rmsd_for_entry
(db, pdb, original_chain, cdr, length, fullcluster)[source]¶
-
jade.RAbD_BM.tools_ab_db.
get_center_dih_degrees_for_cluster_and_length
(db, cdr, length, cluster)[source]¶ Returns a dictionary of center dihedral angles in positional order, or returns False if not found.
result["phis"] = [phis as floats]
result["psis"] = [psis as floats]
result["omegas"] = [omegas as floats]
-
jade.RAbD_BM.tools_ab_db.
get_center_for_cluster_and_length
(db, cdr, length, cluster, data_names_array)[source]¶
-
jade.RAbD_BM.tools_ab_db.
get_cluster_enrichment
(df, gene, cdr, cluster)[source]¶ Get the number of matches in the df and pdbid to the cdr and cluster :param df: pandas.DataFrame :rtype: int
-
jade.RAbD_BM.tools_ab_db.
get_cluster_matches
(df, gene, cdr, cluster)[source]¶ Get a dataframe of the matching (“Recovered”) rows (DataFrame).
Parameters: df – pandas.DataFrame Return type: pandas.DataFrame:
-
jade.RAbD_BM.tools_ab_db.
get_data_for_cluster_and_length
(db, cdr, length, cluster, data_names_array, limit_to_known=True, res_cutoff=2.8, rfac_cutoff=0.3)[source]¶ Get a set of data of a particular length, cdr, and cluster. data_names_array is a list of the types of data. Can include DISTINCT keyword
Example: data_names_array = [“PDB”, “original_chain”, “new_chain”, “sequence”]
-
jade.RAbD_BM.tools_ab_db.
get_length_enrichment
(df, gene, cdr, length)[source]¶ Get the number of matches in the df and pdbid to the cdr and length
Parameters: - df – pandas.DataFrame
- length – int
Return type: int
-
jade.RAbD_BM.tools_ab_db.
get_length_matches
(df, gene, cdr, length)[source]¶ Get a dataframe of the matching (“Recovered”) rows (DataFrame).
Parameters: - df – pandas.DataFrame
- length – int
Return type: pandas.DataFrame
-
jade.RAbD_BM.tools_ab_db.
get_pdb_chain_subset
(db, gene)[source]¶ Return a list of tuples of [pdb, chain] of the particular gene
-
jade.RAbD_BM.tools_ab_db.
get_stem_rmsd_for_entry
(db, pdb, original_chain, cdr, length, fullcluster)[source]¶
jade.RAbD_BM.tools_features_db module¶
-
jade.RAbD_BM.tools_features_db.
get_all_entries
(df, pdbid, cdr)[source]¶ Get all entries of a given PDBID and CDR. :param df: pandas.DataFrame :rtype: pandas.DataFrame
-
jade.RAbD_BM.tools_features_db.
get_cdr_cluster_df
(db_path)[source]¶ Get a dataframe with typical cluster info in it, which was generated by the features reporter framework. :param db_con: sqlite3.con :rtype: pandas.DataFrame
-
jade.RAbD_BM.tools_features_db.
get_cluster
(df, pdbid, cdr)[source]¶ Get the fullcluster from the dataframe for native or experimental data
Parameters: df – pandas.DataFrame Return type: str
-
jade.RAbD_BM.tools_features_db.
get_cluster_matches
(df, pdbid, cdr, cluster)[source]¶ Get a dataframe of the matching (“Recovered”) rows (DataFrame).
Parameters: df – pandas.DataFrame Return type: pandas.DataFrame:
-
jade.RAbD_BM.tools_features_db.
get_cluster_recovery
(df, pdbid, cdr, cluster)[source]¶ Get the number of matches in the df and pdbid to the cdr and cluster :param df: pandas.DataFrame :rtype: int
-
jade.RAbD_BM.tools_features_db.
get_length
(df, pdbid, cdr)[source]¶ Get the length from the dataframe for native or experimental data
Parameters: df – pandas.DataFrame Return type: int
-
jade.RAbD_BM.tools_features_db.
get_length_matches
(df, pdbid, cdr, length)[source]¶ Get a dataframe of the matching (“Recovered”) rows (DataFrame).
Parameters: - df – pandas.DataFrame
- length – int
Return type: pandas.DataFrame
jade.rosetta_jade package¶
jade.rosetta_jade.BenchmarkInfo module¶
-
class
jade.rosetta_jade.BenchmarkInfo.
BenchmarkInfo
(decoy_path, full_name, final_name, scorefunction='talaris2014')[source]¶ Simple Class for holding info for a particular benchmark. Parses the RUN_SETTINGS.txt file in the decoy directory. This file is output by RunRosettaBenchmarks.
The settings dictionary then holds key/value pairs. Here is an example of this file for RAbD:
CDR = ALL
DATASET = bm2_ten
DOCK = False
INNER_CYCLE_ROUNDS = 1
INPUT_PDB_TYPE = pareto
L_CHAIN = kappa
MINTYPE = relax
OUTER_CYCLE_ROUNDS = 100
PAPER_AB_DB = True
PROTOCOL = even_cluster_mc
RANDOM_START = True
REMOVE_ANTIGEN = True
SEPARATE_CDRS = False
-
jade.rosetta_jade.BenchmarkInfo.
get_run_settings
(dir, fname='RUN_SETTINGS.txt')[source]¶ Gets a dict of the settings used to run the benchmark in the directory.
The settings file looks like this, and is output by RunRosettaBenchmarks into the decoy directory:
CDR = ALL
DATASET = bm2_ten
DOCK = False
INNER_CYCLE_ROUNDS = 1
INPUT_PDB_TYPE = pareto
L_CHAIN = kappa
MINTYPE = relax
OUTER_CYCLE_ROUNDS = 100
PAPER_AB_DB = True
PROTOCOL = even_cluster_mc
RANDOM_START = True
REMOVE_ANTIGEN = True
SEPARATE_CDRS = False
Parameters: dir – str Return type: defaultdict
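A settings file in this KEY = VALUE layout can be parsed in a few lines. The sketch below keeps every value as a string and uses a defaultdict with an empty-string default. Both choices are assumptions, as is the name parse_run_settings.

```python
from collections import defaultdict


def parse_run_settings(text):
    """Parse 'KEY = VALUE' lines into a defaultdict of strings."""
    settings = defaultdict(str)
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


example = """CDR = ALL
DATASET = bm2_ten
DOCK = False
OUTER_CYCLE_ROUNDS = 100
PROTOCOL = even_cluster_mc
"""
settings = parse_run_settings(example)
```

Because values stay as strings, a caller that needs a number or boolean has to convert explicitly, for example int(settings["OUTER_CYCLE_ROUNDS"]).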
jade.rosetta_jade.FeaturesJsonCreator module¶
-
class
jade.rosetta_jade.FeaturesJsonCreator.
JsonCreator
(out_path, script_type)[source]¶ Basic implementation of a simple JsonCreator to create Jsons. Could be expanded to not load jsons with pre-set scripts. A nicer implementation would be a GUI for running the FeaturesReporter scripts.
-
jade.rosetta_jade.FeaturesJsonCreator.
run_features_json
(json_path, backround=False, outpath='')[source]¶ Convenience function. Outputs an R script for running a JSON file, and runs it. Works with the new Library structure of the Features Reporter Framework.
jade.rosetta_jade.Region module¶
jade.rosetta_jade.RunRosetta module¶
-
class
jade.rosetta_jade.RunRosetta.
RunRosetta
(program=None, parser=None, db_mode=False, json_run=None)[source]¶ Bases:
object
-
jade.rosetta_jade.RunRosetta.
get_option_strings
(cmd)[source]¶ Get the options as a string to be printed or saved to a file. :param cmd: :rtype: str
-
jade.rosetta_jade.RunRosetta.
run_on_qsub
(cmd, queue_dir, name, print_only=False, extra_opts='')[source]¶
jade.rosetta_jade.RunRosettaBenchmarks module¶
-
class
jade.rosetta_jade.RunRosettaBenchmarks.
RunRosettaBenchmarks
(program=None, parser=None)[source]¶
jade.rosetta_jade.ScoreFiles module¶
-
class
jade.rosetta_jade.ScoreFiles.
ScoreFile
(filename)[source]¶ -
get_Dataframe
(scoreterms=None, order_by='total_score', top_n=-1, reverse=True)[source]¶ Get data as a pandas dataframe. Definitely preferred now. :param scoreterms: list :param order_by: str :param top_n: int :param reverse: bool :rtype: pandas.DataFrame
-
get_ordered_decoy_list
(scoreterm, decoy_names=None, top_n=-1, reverse=False)[source]¶ Get an ordered tuple of [[score, decoy_name], …] Will automatically order some known scoreterms (hbonds_int, dSASA_int)
Return type: list[list]
-
-
jade.rosetta_jade.ScoreFiles.
get_scorefiles
(indir='/home/docs/checkouts/readthedocs.org/user_builds/bio-jade/checkouts/latest/docs')[source]¶ Get Score files from a directory. Walk through all directories in directory. :param indir: str :rtype: list
-
jade.rosetta_jade.ScoreFiles.
plot_score_vs_rmsd
(df, title, outpath, score='total_score', rmsd='looprms', top_p=0.95, reverse=True)[source]¶ Plot a typical Score VS RMSD using matplotlib, save it somewhere. Return the axes. By default, plot the top 95% :param df: pandas.DataFrame :param outpath: str :param score: str :param rmsd: str :rtype: matplotlib.Axes
-
jade.rosetta_jade.ScoreFiles.
pymol_session_on_top_df
(df, outdir, decoy_dir=None, scoreterm='total_score', top_n=10, decoy_column='decoy', native_path=None, out_prefix_override=None, ab_structure=False, superimpose=False, run_pymol=True)[source]¶ Make a PyMol session (or setup a scripter) on top X using a dataframe. Return the scripter for extra control.
df should have an attribute of ‘name’ or out_prefix_override should be set.
Parameters: - df – pandas.DataFrame
- outdir – str
- decoy_dir – str
- scoreterm – str
- top_n – int
- decoy_column – str
- native_path – str
- out_prefix_override – str
- ab_structure – boolean
- superimpose – boolean
Return type:
jade.rosetta_jade.SetupRosettaOptionsBenchmark module¶
-
class
jade.rosetta_jade.SetupRosettaOptionsBenchmark.
SetupRosettaOptionsBenchmark
(json_file)[source]¶ Bases:
jade.rosetta_jade.SetupRosettaOptionsGeneral.SetupRosettaOptionsGeneral
Class for setting up Rosetta Benchmarks. See database/rosetta/benchmark_jsons_rabd/nstruct_test.json for an example.
Basically, a set of benchmarks and rosetta options are given in the JSON. Other keys can be specified for specific benchmarks (like the instructions file stuff in the above file.)
This can be used to use a single JSON file and run RosettaMPI on ALL combinations of benchmarks given.
-
get_benchmark_names
(only_rosetta=False)[source]¶ Get the names of all the benchmarks we will run.
Each benchmark must have a dictionary that defines ‘benchmarks’ as a list. You may optionally give the rosetta_option. Currently, your subclass of RunRosetta will need to code how all this is run. Hopefully, that will change.
If only_rosetta is true, will only give the benchmark names that are based on rosetta options.
For example:
"outer_cycle_rounds":{
    "rosetta_option":"-outer_cycle_rounds",
    "benchmarks":[ 25, 50, 75, 100]
},
Return type: list
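To make the benchmark JSON layout concrete, the sketch below parses two benchmark entries, lists their names with and without the only_rosetta filter, and enumerates all combinations. It assumes benchmark entries sit at the top level of the JSON. benchmark_names is a hypothetical stand-in for the method, and the input_pdb_type values are illustrative.

```python
import itertools
import json

benchmark_json = """
{
    "outer_cycle_rounds": {
        "rosetta_option": "-outer_cycle_rounds",
        "benchmarks": [25, 50, 75, 100]
    },
    "input_pdb_type": {
        "benchmarks": ["native", "relaxed"]
    }
}
"""
config = json.loads(benchmark_json)


def benchmark_names(config, only_rosetta=False):
    """Names of entries that define 'benchmarks', optionally Rosetta-only."""
    names = []
    for name, entry in sorted(config.items()):
        if not isinstance(entry, dict) or "benchmarks" not in entry:
            continue
        if only_rosetta and "rosetta_option" not in entry:
            continue
        names.append(name)
    return names


all_names = benchmark_names(config)
rosetta_names = benchmark_names(config, only_rosetta=True)

# One run per combination of benchmark values, in the order of all_names.
combos = list(itertools.product(*(config[name]["benchmarks"] for name in all_names)))
```

With two input types and four cycle counts this yields eight runs, which is the "ALL combinations" behaviour the class description refers to.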
-
get_benchmarks_of_key
(benchmark_name)[source]¶ Get the list of benchmarks for a particular benchmark key. :param benchmark_name: str :rtype: list
-
get_non_rosetta_option_benchmark_names
()[source]¶ Similar to get_benchmark_names, but only for options which do not have the tag rosetta_option
Return type: list
-
get_rosetta_option_of_key
(benchmark_name)[source]¶ Get the Rosetta option :param benchmark_name: :rtype: str
-
jade.rosetta_jade.SetupRosettaOptionsGeneral module¶
-
class
jade.rosetta_jade.SetupRosettaOptionsGeneral.
SetupRosettaOptionsGeneral
(cluster_json_file)[source]¶ Bases:
object
Class for setting up more general Rosetta options for benchmarking and repeatable runs on different clusters. Useful for benchmarking. Subclass for adding more benchmarking settings for specific benchmarks.
jade.rosetta_jade.alignment module¶
-
jade.rosetta_jade.alignment.
align_to_second_pose_save_pdb
(pose_name, pose, second_pose, outdir, overhang=0, stem_align=False)[source]¶
jade.rosetta_jade.features module¶
-
jade.rosetta_jade.features.
create_features_db
(pdb_list, xml_name, compiler, score_weights, out_db_name, out_db_batch, outdir, use_present_dbs, indir='', mpi=True, np=5)[source]¶
old_db_name = outdir+'/'+out_db_name+'.'+score_weights+".db3"
new_db_name = outdir+'/'+out_db_name+'.'+xml_name+'.'+score_weights+".db3"
if os.path.exists(old_db_name):
    os.system('mv '+old_db_name+' '+new_db_name)
    print "Old db name already exists. Moving."
    return
jade.rosetta_jade.flag_util module¶
jade.utility package¶
jade.utility.string_util module¶
-
jade.utility.string_util.
deduce_str_type
(s)[source]¶ Deduce the type of a string. Return the value as the corresponding Python literal if it can be parsed as one; otherwise return the original string. http://stackoverflow.com/questions/13582142/deduce-the-type-of-data-in-a-string
Parameters: s – str Returns:
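One common way to implement this kind of type deduction is ast.literal_eval, which safely evaluates Python literals without executing code. The sketch below assumes that approach; it is not confirmed to be what Jade does, and deduce_str_type_sketch is a hypothetical name.

```python
import ast


def deduce_str_type_sketch(s):
    """Return s converted to a Python literal, or s itself if that fails."""
    try:
        return ast.literal_eval(s)
    except (ValueError, SyntaxError):
        return s


rounds = deduce_str_type_sketch("100")
dock = deduce_str_type_sketch("False")
protocol = deduce_str_type_sketch("even_cluster_mc")
```

This is useful for settings files where every value arrives as text but numbers and booleans are wanted downstream.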
Public Apps¶
public.antibody_benchmark_utils¶
RunRosettaBenchmarksMPI.py¶
This program runs Rosetta MPI locally or on a cluster using slurm or qsub. Relative paths are accepted.
usage: RunRosettaBenchmarksMPI.py [-h]
bm-RAbD_Jade.py¶
This program is a GUI used for benchmarking Rosetta Antibody Design. Before running this application, you will probably want to run ‘run_rabd_features_for_benchmarks.py’ to create the databases required.
usage: bm-RAbD_Jade.py [-h] [--main_dir MAIN_DIR] [--out_dir OUT_DIR] --jsons
[JSONS [JSONS ...]]
Named Arguments¶
--main_dir | Main working directory. Not Required. Default = PWD Default: “/home/docs/checkouts/readthedocs.org/user_builds/bio-jade/checkouts/latest/docs” |
--out_dir | Output data directory. Not Required. Default = pooled_data Default: “pooled_data” |
--jsons, -j | Analysis JSONs to use. See RAbD_BM.AnalysisInfo for more on what is in the JSON. The JSON allows us to specify the final name, decoy directory, and features db associated with the benchmark as well as all options that went into it. |
bm-calculate_graft_closure_rabd.py¶
Calculate the frequency of graft closures.
usage: bm-calculate_graft_closure_rabd.py [-h] [--dir DIR] [--outfile OUTFILE]
[--use_ensemble]
[--match_name MATCH_NAME]
Named Arguments¶
--dir, -i | Input directory |
--outfile, -o | Path to outfile |
--use_ensemble | Use ensembles in calculation Default: False |
--match_name | Match a subexperiment in the file name such as relax |
bm-calculate_recoveries_and_risk_ratios.py¶
Calculates and plots monte carlo acceptance values for antibody design benchmarking.
usage: bm-calculate_recoveries_and_risk_ratios.py [-h] --jsons
[JSONS [JSONS ...]]
[--data_outdir DATA_OUTDIR]
Named Arguments¶
--jsons, -j | Analysis JSONs to use. See RAbD_BM.AnalysisInfo for more on what is in the JSON. The JSON allows us to specify the final name, decoy directory, and features db associated with the benchmark as well as all options that went into it. |
--data_outdir, -o | Path to outfile. DEFAULT = data Default: “data” |
bm-output_all_clusters.py¶
Calculates and plots monte carlo acceptance values for antibody design benchmarking.
usage: bm-output_all_clusters.py [-h] --jsons [JSONS [JSONS ...]]
[--data_outdir DATA_OUTDIR]
Named Arguments¶
--jsons, -j | Analysis JSONs to use. See RAbD_BM.AnalysisInfo for more on what is in the JSON. The JSON allows us to specify the final name, decoy directory, and features db associated with the benchmark as well as all options that went into it. |
--data_outdir, -o | Path to outfile. DEFAULT = data Default: “data” |
bm-plot_features.py¶
Calculates and plots monte carlo acceptance values for antibody design benchmarking.
usage: bm-plot_features.py [-h] --jsons [JSONS [JSONS ...]]
[--plot_outdir PLOT_OUTDIR]
Named Arguments¶
--jsons, -j | Analysis JSONs to use. See RAbD_BM.AnalysisInfo for more on what is in the JSON. The JSON allows us to specify the final name, decoy directory, and features db associated with the benchmark as well as all options that went into it. |
--plot_outdir, -p | DIR for plots. DEFAULT = plots Default: “plots” |
bm-run_rabd_benchmarks.py¶
This program runs Rosetta MPI locally or on a cluster using slurm or qsub. Relative paths are accepted.
usage: bm-run_rabd_benchmarks.py [-h]
public.antibody_utils¶
RAbD_Jade.py¶
GUI application to analyze designs output by RosettaAntibodyDesign. Designs should first be analyzed by both the AntibodyFeatures and CDRClusterFeatures reporters into sqlite3 databases.
usage: RAbD_Jade.py [-h] [--db_dir DB_DIR] [--analysis_name ANALYSIS_NAME]
[--native NATIVE] [--root_dir ROOT_DIR]
[--cdrs [{L1,H1,L1,H2,L3,H3} [{L1,H1,L1,H2,L3,H3} ...]]]
[--pyigclassify_dir PYIGCLASSIFY_DIR]
[--jsons [JSONS [JSONS ...]]]
Named Arguments¶
--db_dir | Directory with databases to compare. DEFAULT = databases Default: “databases” |
--analysis_name | Main directory to complete analysis. DEFAULT = prelim_analysis Default: “prelim_analysis” |
--native | Any native structure to compare to |
--root_dir | Root directory to run analysis from Default: “/home/docs/checkouts/readthedocs.org/user_builds/bio-jade/checkouts/latest/docs” |
--cdrs | Possible choices: L1, H1, L1, H2, L3, H3 A list of CDRs for the analysis (Not used for Features Reporters) Default: [‘L1’, ‘L2’, ‘L3’, ‘H1’, ‘H2’, ‘H3’] |
--pyigclassify_dir | Optional PyIgClassify Root Directory with DBOUT. Used for debugging. Default: “” |
--jsons, -j | Analysis JSONs to use. See RAbD_BM.AnalysisInfo for more on what is in the JSON. The JSON allows us to specify the final name, decoy directory, and features db associated with the benchmark as well as all options that went into it. |
convert_IMGT_to_fasta.py¶
This script converts an IMGT output file (5_AA-seqs.csv) to a FASTA. All Framework and CDRs are concatenated. * is skipped. The FASTA file can then be used by PyIgClassify.
usage: convert_IMGT_to_fasta.py [-h] --inpath INPATH --outpath OUTPATH
Named Arguments¶
--inpath, -i | Input IMGT file path |
--outpath, -o | Output Fasta outfile path. |
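The concatenation step described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name and the example fragments are hypothetical, and reading the real IMGT file (whose column layout is not described here) is left out.

```python
def regions_to_fasta(name, regions):
    """Join framework/CDR fragments into one FASTA record, skipping '*'."""
    # Assumption: fragments arrive in N-to-C order (FR1, CDR1, FR2, ...).
    sequence = "".join(regions).replace("*", "")
    return ">{}\n{}\n".format(name, sequence)


# Hypothetical fragments, not real IMGT output.
record = regions_to_fasta("ab1", ["EVQLVES", "GFTF*SSY", "WVRQAPG"])
print(record)
```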
create_features_json.py¶
This script creates either a cluster features or an antibody features JSON for use in the Features R scripts. Example Cmd-line: python create_features_json.py --databases databases/baseline_comparison.txt --script cluster
usage: create_features_json.py [-h] [--databases [DATABASES [DATABASES ...]]]
[--script {cluster,antibody,interface,antibody_minimal}]
[--db_path DB_PATH] [--outdir OUTDIR]
[--outname OUTNAME]
[--add_comparison_to_this_json ADD_COMPARISON_TO_THIS_JSON]
[--run]
Named Arguments¶
--databases, -l | |
List of dbs: db_name,short_name,ref keyword if the reference database. Separated by white space. Default: [] | |
--script, -s | Possible choices: cluster, antibody, interface, antibody_minimal Script type. Will setup the appropriate output formats and R scripts Default: “antibody_minimal” |
--db_path, -p | Path to databases. Default: the databases directory under the current working directory (pwd/databases) |
--outdir, -o | Where to put the result of the analysis scripts. Currently unsupported by the features framework. Default: the plots directory under the current working directory |
--outname, -n | Output file name of json file Default: “local_json_compare_ss.json” |
--add_comparison_to_this_json, -a | |
Add all this data to this json as more sample sources. | |
--run, -r | Go ahead and run compare_sample_sources.R. Must be in path!! Default: False |
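A minimal sketch of how one --databases entry could be split into its parts. It assumes that the third field is the literal keyword ref and that it is only present for the reference database; the function name and the returned dictionary keys are hypothetical and are not taken from the script.

```python
def parse_database_spec(spec):
    """Split 'db_name,short_name[,ref]' into its parts."""
    fields = spec.split(",")
    if len(fields) not in (2, 3):
        raise ValueError("expected db_name,short_name[,ref]: " + spec)
    # Assumption: a third field equal to 'ref' marks the reference database.
    is_reference = len(fields) == 3 and fields[2] == "ref"
    return {
        "db_name": fields[0],
        "short_name": fields[1],
        "reference": is_reference,
    }


# Entries are separated by white space on the command line.
specs = "native.db3,native,ref design.db3,design".split()
parsed = [parse_database_spec(s) for s in specs]
print(parsed)
```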
generate_rabd_features_dbs.py¶
Generates RAbD Features DBs using RunRosettaMPI in db mode.
usage: generate_rabd_features_dbs.py [-h]
match_antibody_structures.py¶
This App aims to make pymol alignments using the PyIgClassify database and structures, matching specific criteria.
usage: match_antibody_structures.py [-h] --db DB --ab_dir AB_DIR --where WHERE
[--outdir OUTDIR] [--prefix PREFIX]
[--cdr CDR] [--native NATIVE]
Required Arguments¶
--db, -d |
|
--ab_dir, -b | Directory with renumbered antibody PDBs (Full or CDRs-only) |
--where, -w | Your where clause for the db in quotes. Not including WHERE. Use single quotes (‘ ‘) for string matches |
Other Arguments¶
--outdir, -o | Output directory. Default: current working directory |
--prefix, -p | Output prefix |
--cdr, -c | Optionally load the CDR PDBs of the given type in the ab_dir. If this option is set, the ab_dir should be of CDRs only from PyIgClassify. |
--native, -n | Align everything to this PDB, the native or something you are interested in. |
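To illustrate how a --where value combines with a query, here is a small sqlite3 sketch. The table name cdr_data, its columns, and the rows are hypothetical stand-ins for the PyIgClassify database; only the pattern (clause passed without the WHERE keyword, string values in single quotes) comes from the option description above.

```python
import sqlite3

# Hypothetical in-memory stand-in for the PyIgClassify database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cdr_data (PDB TEXT, CDR TEXT, length INTEGER)")
con.executemany(
    "INSERT INTO cdr_data VALUES (?, ?, ?)",
    [("1abc", "H3", 12), ("2xyz", "H3", 9), ("1abc", "L1", 11)],
)

# The clause as it would be passed to --where: no WHERE keyword,
# single quotes around string values.
where = "CDR = 'H3' AND length > 10"
rows = con.execute("SELECT PDB FROM cdr_data WHERE " + where).fetchall()
print(rows)
```

On the shell, the whole clause would be wrapped in double quotes so the inner single quotes survive.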
order_ab_chains.py¶
Reorders PDB files in a directory according to A_LH order for Rosetta Antibody Design benchmarking. Removes HETATM records.
usage: order_ab_chains.py [-h] [--in_dir IN_DIR] [--in_pdblist IN_PDBLIST]
[--in_single IN_SINGLE] [--out_dir OUT_DIR]
[--reverse]
Named Arguments¶
--in_dir, -i | Input Directory of PDB files listed in any passed PDBLIST. Default: current working directory (PWD) |
--in_pdblist, -l | |
Input PDBList file. Assumes PDBList has no paths and requires an input directory as if we run Rosetta. Default: “” | |
--in_single, -s | |
Path to Input PDB File, instead of list. Default: “” | |
--out_dir, -d | Output Directory. Resultant PDB files will go here. Default: “reordered” |
--reverse, -r | Reverse order (LH_A instead of A_LH). Used for snugdock Default: False |
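The reordering can be sketched in plain Python. This is an illustration under assumptions, not the script's implementation: it treats every letter of the order string as a literal chain ID, reads the chain ID from column 22 of each ATOM record, and drops everything that is not an ATOM record. The helper atom_line exists only to build example input.

```python
def atom_line(record, serial, chain):
    """Build a minimal PDB coordinate line with the chain ID in column 22."""
    return "{:<6}{:>5}  CA  ALA {}{:>4}".format(record, serial, chain, 1)


def reorder_chains(pdb_lines, order="A_LH"):
    """Return ATOM lines grouped by chain in the requested order."""
    chain_order = [c for c in order if c != "_"]
    # Keeping only ATOM records also removes HETATM lines.
    atoms = [line for line in pdb_lines if line.startswith("ATOM")]
    reordered = []
    for chain in chain_order:
        reordered.extend(line for line in atoms if line[21] == chain)
    return reordered


lines = [
    atom_line("ATOM", 1, "H"),
    atom_line("ATOM", 2, "L"),
    atom_line("HETATM", 3, "L"),
    atom_line("ATOM", 4, "A"),
]
print([line[21] for line in reorder_chains(lines)])
print([line[21] for line in reorder_chains(lines, "LH_A")])
```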
split_antibody_components.py¶
Script for splitting AHO renumbered antibodies into Fv, Fc, and linker regions
usage: split_antibody_components.py [-h] [--any_structure] --ab_dir AB_DIR
--output_dir OUTPUT_DIR
Named Arguments¶
--any_structure | |
By default, we only output structures with both L/H. Pass this option to split structures that are L or H only. Default: False | |
--ab_dir, -a | Antibody Directory with AHO-renumbered structures to split. Can be .pdb, or .pdb.gz |
--output_dir, -o | |
Output Directory for antibody structures. |
public.general¶
canceljobs.py¶
Call scancel to cancel a consecutive set of cluster job numbers
usage: canceljobs.py [-h]
convert_fig.py¶
Converts images to TIFF figures at 300 DPI for publication using sips. Arguments: INFILE OUTFILE
usage: convert_fig.py [-h]
genscript_to_fasta.py¶
This script outputs fasta files from a genscript format. Pass the --format option to select which genscript format is used as input. Ex: python genscript_mut_to_fasta.py --format mutagenesis MutagenesisFormatU68
usage: genscript_to_fasta.py [-h] --format {mutagenesis,GeneSynth} infile
Positional Arguments¶
infile | The mutagenesis format file. |
Named Arguments¶
--format | Possible choices: mutagenesis, GeneSynth The genscript file format |
get_seq.py¶
Uses Biopython to print sequence information. Example: get_seq.py --pdb 2j88_A.pdb --format fasta --outpath test.txt
usage: get_seq.py [-h] [--pdb PDB] [--pdblist PDBLIST]
[--pdblist_input_dir PDBLIST_INPUT_DIR] [--chain CHAIN]
[--cdr CDR]
[--format {basic,fasta,general_order,IgG_order,IgG_order_lambda,IgG_order_kappa,IgG_order_heavy}]
[--outpath OUTPATH] [--prefix PREFIX] [--region REGION]
[--strip_c_term STRIP_C_TERM] [--pad_c_term PAD_C_TERM]
[--output_original_seq]
Named Arguments¶
--pdb, -s | Input PDB path |
--pdblist, -l | Input PDB List |
--pdblist_input_dir, -i | |
Input directory if needed for PDB list | |
--chain, -c | A specific chain to output Default: “” |
--cdr | Pass a specific CDR to output alignments of. Default: “” |
--format | Possible choices: basic, fasta, general_order, IgG_order, IgG_order_lambda, IgG_order_kappa, IgG_order_heavy The output format required. Default: “fasta” |
--outpath, -o | Output path. If none is specified it will write to screen. |
--prefix, -t | Tag to add before chain Default: “” |
--region | specify a particular region, start:end:chain |
--strip_c_term | Strip this sequence off the C-term of resulting sequences. (Useful for antibodies) |
--pad_c_term | Pad this sequence with some C-term (Useful for antibodies) |
--output_original_seq | |
Output the original sequence and the stripped sequence if stripped. Default FALSE. Default: False |
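A short sketch of two of the options above: parsing a --region value in start:end:chain form, and stripping a C-terminal tail as --strip_c_term describes. Both helper names are hypothetical, and the exact-suffix matching is an assumption about how the stripping behaves; the real script works on structures read through Biopython.

```python
def parse_region(region):
    """Split 'start:end:chain' into (start, end, chain)."""
    start, end, chain = region.split(":")
    return int(start), int(end), chain


def strip_c_term(sequence, tail):
    """Remove tail from the C-terminus if the sequence ends with it."""
    # Assumption: only an exact suffix match is stripped.
    if tail and sequence.endswith(tail):
        return sequence[: -len(tail)]
    return sequence


print(parse_region("24:42:L"))
print(strip_c_term("EVQLVTVSSAS", "AS"))
```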
rename_designs.py¶
Renames original files to new names for design ordering. Copy all models going to be ordered into a single directory first. Run from directory with pdb files already copied in!
usage: rename_designs.py [-h] -i NEW_NAMES
Named Arguments¶
-i, --new_names | |
File with new to old names. Example line: new_name * filename. Can have lines that don’t have all three. Will only rename if it has a star in the second column. |
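The file format described above (new_name * filename, renaming only when the second column is a star) can be parsed like this. The function name and the example lines are hypothetical; the sketch only builds the old-to-new mapping and does not touch any files.

```python
def parse_rename_lines(lines):
    """Map old filename -> new name for lines marked with a star."""
    mapping = {}
    for line in lines:
        fields = line.split()
        # Lines without all three columns, or without the star, are skipped.
        if len(fields) == 3 and fields[1] == "*":
            new_name, _, old_name = fields
            mapping[old_name] = new_name
    return mapping


example_lines = [
    "design_01 * model_0042.pdb",
    "design_02 model_0007.pdb",
    "",
    "design_03 * model_0100.pdb",
]
renames = parse_rename_lines(example_lines)
print(renames)
```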
public.pdb_utils¶
place_TERs.py¶
This script places TERs between ATOM/HETATM columns. This is currently needed to reload symmetrized glycan poses created by the god-awful make_symm_file.pl Rosetta script. USE: place_TERs.py my_pdb - Does it in place.
usage: place_TERs.py [-h] [pdb_files [pdb_files ...]]
Positional Arguments¶
pdb_files | Path to PDB files we will be stripping. |
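One possible reading of the TER placement is sketched below: a TER record is inserted wherever the record type switches between ATOM and HETATM. That interpretation, the function name, and the bare TER line are assumptions; the sketch works on a list of lines and leaves file handling aside.

```python
def place_ters(pdb_lines):
    """Insert 'TER' wherever coordinate records switch ATOM <-> HETATM."""
    out = []
    previous = None  # record type of the last coordinate line seen
    for line in pdb_lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            if previous in ("ATOM", "HETATM") and record != previous:
                out.append("TER")
            previous = record
        elif record == "TER":
            previous = None  # an existing TER already separates the blocks
        out.append(line)
    return out


example = ["ATOM      1", "ATOM      2", "HETATM    3", "HETATM    4"]
with_ters = place_ters(example)
print(with_ters)
```

Because existing TER records reset the state, running the function on its own output adds nothing new.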
public.pyrosetta¶
build_loop_pyrosetta.py¶
This script builds a loop between two places in a structure with the given sequence, and closes the loop. It is not meant to be the last modeling step, just to create missing density or to prepare for loop modeling.
usage: build_loop_pyrosetta.py [-h] --start START --stop STOP --sequence
SEQUENCE [--out_prefix OUT_PREFIX]
[--retain_aligned_roots] --pdb PDB [--kic]
[--dump_midpoints]
Named Arguments¶
--start | Starting resnum. Ex: 24L |
--stop | Ending resnum. Ex. 42L. |
--sequence | Sequence of the loop |
--out_prefix | Any prefix to give results. Default: “loop_built_” |
--retain_aligned_roots | |
Attempt to keep any aligned root residues during the build Default: False | |
--pdb, -s | Input model |
--kic | Run KIC perturber after closing the loop? Default: False |
--dump_midpoints | |
Dump midpoint PDBs? Default: False |
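The --start and --stop values combine a residue number and a chain ID (for example 24L). A minimal parser for that form might look like the following; the function name is hypothetical, and insertion codes are deliberately not handled.

```python
import re


def parse_resnum(resnum):
    """Split a value such as '24L' into (24, 'L')."""
    match = re.fullmatch(r"(-?\d+)([A-Za-z])", resnum)
    if match is None:
        raise ValueError("expected <number><chain>, got: " + resnum)
    return int(match.group(1)), match.group(2)


start = parse_resnum("24L")
stop = parse_resnum("42L")
print(start, stop)
```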
find_my_glycans.py¶
This app is the PyRosetta equivalent of GlycanInfo. Print carbohydrate info about the pose. Pass the pose in as an argument
usage: find_my_glycans.py [-h]
find_my_residues.py¶
Simple app to scan a PDB file and print PDB info and Rosetta understood chains and resnums.
usage: find_my_residues.py [-h] [--chain CHAIN] [--echo_input] pdb_file
Positional Arguments¶
pdb_file | The PDB file to scan. |
Named Arguments¶
--chain, -c | Specify only a single chain to scan. |
--echo_input, -e | |
Echo the input structure as output. This is to check how Rosetta worked reading it. Default: False |
get_mutation_energy.py¶
Basic app to get mutation energy of each residue in a particular region using PyRosetta
usage: get_mutation_energy.py [-h] [--pdb PDB] [--outpath OUTPATH]
[--filename FILENAME] [--region REGION]
[--relax_whole_structure] [--alanine_scan]
Named Arguments¶
--pdb, -s | Path to PDB file. Required. |
--outpath, -o | Full output directory path. Default is pwd/RESULTS Default: “/RESULTS” |
--filename, -n | The filename of the results file Default: “mutation_energies.txt” |
--region, -r | (region designated as start:end:chain) If none is given, will use whole PDB |
--relax_whole_structure, -m | |
Relax the whole structure? Default is to only relax chain under question. If no region is set, will default to true Default: False | |
--alanine_scan, -a | |
Trigger the script to do an alanine scan of the mutations instead of a full mutational scan. Default: False |
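The difference between a full mutational scan and --alanine_scan comes down to which target residues are tried at each position. The sketch below only enumerates mutation labels; the function name and the label format are hypothetical, and the energy evaluation that the app performs with PyRosetta is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutations_for(position, native, alanine_scan=False):
    """List mutation labels such as 'S24A' for one position."""
    targets = "A" if alanine_scan else AMINO_ACIDS
    # The native residue itself is not a mutation, so it is skipped.
    return [
        "{}{}{}".format(native, position, aa)
        for aa in targets
        if aa != native
    ]


print(mutations_for(24, "S", alanine_scan=True))
print(len(mutations_for(24, "S")))
```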
public.rosetta¶
score_analysis.py¶
This utility parses and extracts data from score files in JSON format
usage: score_analysis.py [-h] [-s [SCORETYPES [SCORETYPES ...]]] [-n TOP_N]
[--top_n_by_10 TOP_N_BY_10]
[--top_n_by_10_scoretype TOP_N_BY_10_SCORETYPE]
[--decoy_names [DECOY_NAMES [DECOY_NAMES ...]]]
[--list_scoretypes] [--pdb_dir PDB_DIR] [--summary]
[--csv] [--make_pdblist] [--pymol_session]
[--plot [PLOT [PLOT ...]]] [--copy_top_models]
[--prefix PREFIX] [--outdir OUTDIR]
[--plot_type {line,scatter,bar,hist,box,kde,area,pie,hexbin}]
[--plot_filter PLOT_FILTER] [--native NATIVE]
[--ab_structure] [--super SUPER]
[scorefiles [scorefiles ...]]
Positional Arguments¶
scorefiles | A list of scorefiles |
Named Arguments¶
-s, --scoretypes | |
List of score terms to extract Default: [‘dSASA_int’, ‘delta_unsatHbonds’, ‘hbonds_int’, ‘total_score’, ‘dG_separated’, ‘top_n_by_10’] | |
-n, --top_n | Only list Top N when doing top scoring decoys or making pymol sessions. Default is to print all of them. Default: -1 |
--top_n_by_10 | Top N by 10 percent total score to print out. Default: 10 |
--top_n_by_10_scoretype | |
Scoretype to use for any top N by 10 printing. If scoretype not present, won’t do anything. Default: “dG_separated” | |
--decoy_names | Decoy names to use Default: [] |
--list_scoretypes | |
List score term names Default: False | |
--pdb_dir, -d | Directory for PDBs if different than the directory of the scorefile |
OUTPUT¶
General output options.
--summary, -S | Compute stats summarizing data Default: False |
--csv, -c | Output selected columns, top, and decoys as CSV. Default: False |
--make_pdblist | Output PDBlist file(s) Default: False |
--pymol_session | |
Make pymol session(s) of the scoretypes specified Default: False | |
--plot | Plot one score type vs another. Save the plot. 2 or 3 Arguments. [X, Y, ‘Title’] OR [X, ‘Title’]. If title has spaces, use quotes. Nothing special, just used for quick info. Default: [] |
--copy_top_models | |
Copy the top -n to the output directory for each scorefile passed. Default: False | |
--prefix, -p | Prefix to use for any file output. Do not include any _ Default: “” |
--outdir, -o | Output dir. Default: current working directory |
PLOTTING¶
Options for plot output
--plot_type | Possible choices: line, scatter, bar, hist, box, kde, area, pie, hexbin The type of plot we are outputting. Default: “scatter” |
--plot_filter | Filter X to top Percent of this - useful to remove outliers. Default: 1.0 |
PYMOL¶
Options for pymol session output
--native | Native structure to use for pymol sessions. |
--ab_structure | Specify if the model is a renumbered antibody structure. Will run pymol script for ab-specific selection Default: False |
--super | Super this selection instead of align all to. |
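A rough sketch of the ranking behind --top_n and --top_n_by_10, written without Jade itself. It assumes the score file holds one JSON object per line with a decoy key, that lower scores are better, and that "top N by 10 percent" means: keep the best 10 percent by total_score, then rank those by the chosen score type. All three points are assumptions, as are the function names and the synthetic data.

```python
import json


def load_scores(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def top_n(decoys, scoretype, n=-1):
    """Rank decoys by a score term, lowest first; n < 0 keeps all."""
    ranked = sorted(
        (d for d in decoys if scoretype in d), key=lambda d: d[scoretype]
    )
    return ranked if n < 0 else ranked[:n]


def top_n_by_10(decoys, scoretype="dG_separated", n=10):
    """Rank the best 10 percent by total_score on another score term."""
    by_total = top_n(decoys, "total_score")
    cutoff = max(1, len(by_total) // 10)
    return top_n(by_total[:cutoff], scoretype, n)


# Synthetic score file with 20 decoys; values are made up.
example = "\n".join(
    json.dumps(
        {"decoy": "d%02d" % i, "total_score": -100.0 + i, "dG_separated": -float(i)}
    )
    for i in range(20)
)
decoys = load_scores(example)
best_total = [d["decoy"] for d in top_n(decoys, "total_score", 3)]
best_by_10 = [d["decoy"] for d in top_n_by_10(decoys)]
print(best_total, best_by_10)
```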
RunRosettaMPI.py¶
This program runs Rosetta MPI locally or on a cluster using slurm or qsub. Relative paths are accepted.
usage: RunRosettaMPI.py [-h]
RunRosettaDBMode.py¶
This program runs Rosetta MPI locally or on a cluster using slurm or qsub. Relative paths are accepted.
usage: RunRosettaDBMode.py [-h]
util¶
check_missing_rosetta_nstruct.py¶
This extremely simple script checks nstruct of the input files and outputs which nstruct number is missing.
usage: check_missing_rosetta_nstruct.py [-h] [-n NSTRUCT]
[--pdb_files [PDB_FILES [PDB_FILES ...]]]
[--pdblist PDBLIST] [--dir DIR]
Named Arguments¶
-n, --nstruct | Default: 1000 |
--pdb_files | Path to PDB files we will be checking. |
--pdblist, -l | Optional INPUT PDBLIST (without 00s, etc.) for which to check |
--dir | The Directory to check. As opposed to a list of pdb files. |
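The check itself is simple enough to sketch: collect the numeric suffixes of the output files and report which numbers from 1 to nstruct are absent. The sketch assumes Rosetta-style names ending in _NNNN.pdb or _NNNN.pdb.gz and a single input structure; the function name is hypothetical.

```python
import re


def missing_nstruct(filenames, nstruct):
    """Return the nstruct numbers (1..nstruct) with no matching file."""
    found = set()
    for name in filenames:
        # Assumption: decoys are named <prefix>_<number>.pdb[.gz].
        match = re.search(r"_(\d+)\.pdb(\.gz)?$", name)
        if match:
            found.add(int(match.group(1)))
    return [i for i in range(1, nstruct + 1) if i not in found]


missing = missing_nstruct(["ab_0001.pdb", "ab_0002.pdb.gz", "ab_0004.pdb"], 5)
print(missing)
```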
create_score_json_from_scored_decoys.py¶
This script creates a Rosetta score file from a set of structures - by parsing the score from them. Pass a directory, a PDBLIST, and/or a list of filenames
usage: create_score_json_from_scored_decoys.py [-h] [--prefix PREFIX]
[decoys [decoys ...]]
Positional Arguments¶
decoys | A directory, a PDBLIST, and/or a list of filenames Default: [] |
Named Arguments¶
--prefix | Any prefix to use. Default: “” |
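Parsing the score out of a scored decoy can be sketched as below. It assumes the pose energies table that Rosetta appends to output PDB files, with a label line naming the terms and a pose line holding the whole-pose values; the function name and the tiny example text are hypothetical, and the script's actual output format is not reproduced.

```python
def parse_pose_energies(pdb_text):
    """Return {score_term: value} from the pose energies table, if any."""
    labels, values = None, None
    in_table = False
    for line in pdb_text.splitlines():
        if line.startswith("#BEGIN_POSE_ENERGIES_TABLE"):
            in_table = True
        elif line.startswith("#END_POSE_ENERGIES_TABLE"):
            break
        elif in_table and line.startswith("label "):
            labels = line.split()[1:]
        elif in_table and line.startswith("pose "):
            values = [float(v) for v in line.split()[1:]]
    if labels is None or values is None:
        return {}
    return dict(zip(labels, values))


# Made-up table with three terms; real tables have many more columns.
example_pdb = """ATOM      1  CA  ALA A   1
#BEGIN_POSE_ENERGIES_TABLE decoy_0001.pdb
label fa_atr fa_rep total
weights 1 0.55 NA
pose -10.5 2.25 -8.25
#END_POSE_ENERGIES_TABLE decoy_0001.pdb
"""
scores = parse_pose_energies(example_pdb)
print(scores)
```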
insert_natives_table_into_features_db.py¶
This script takes a PDBLIST of natives and then adds a new table to the database with struct_id as proper foreign primary key and the native structure based solely on a search of the name tag.
usage: insert_native_table_into_features_db.py [-h] [--pdblist PDBLIST]
[--db DB]
Named Arguments¶
--pdblist | PDBLIST of native structures used. |
--db | The database we are working on. |
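The table insertion can be illustrated with an in-memory sqlite3 database. The structures table here is a simplified stand-in for the one in a Rosetta features database, and the natives table name, its columns, and the substring match on the tag are all assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified stand-in for the features database 'structures' table.
con.execute("CREATE TABLE structures (struct_id INTEGER PRIMARY KEY, tag TEXT)")
con.executemany(
    "INSERT INTO structures VALUES (?, ?)",
    [(1, "1abc_design_0001"), (2, "2xyz_design_0001"), (3, "1abc_design_0002")],
)

# Hypothetical natives table keyed on struct_id.
con.execute(
    "CREATE TABLE natives ("
    "struct_id INTEGER PRIMARY KEY, native TEXT, "
    "FOREIGN KEY (struct_id) REFERENCES structures (struct_id))"
)

natives = ["1abc", "2xyz"]  # names taken from a PDBLIST of natives
for struct_id, tag in con.execute("SELECT struct_id, tag FROM structures").fetchall():
    for native in natives:
        if native in tag:  # match purely on the name tag
            con.execute("INSERT INTO natives VALUES (?, ?)", (struct_id, native))
            break

rows = con.execute("SELECT struct_id, native FROM natives ORDER BY struct_id").fetchall()
print(rows)
```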
Pilot Apps¶
apps.pilot.jadolfbr¶
copy_top_each_strategy.py¶
usage: copy_top_each_strategy.py [-h] [-n N] -i INDIR -o OUTDIR
[-s [STRATEGIES [STRATEGIES ...]]]
Named Arguments¶
-n | Number of models to copy. DEFAULT = 2 Default: 2 |
-i, --indir | Input directory |
-o, --outdir | Output directory |
-s, --strategies | |
The type of strategies we are interested in Default: [‘delta_unsats_per_1000_dSASA’, ‘dG_top_Ptotal’] |
glycan_basic_LCM_protocol.py¶
usage: glycan_basic_LCM_protocol.py [-h] --infile INFILE
--glycosylation_position
GLYCOSYLATION_POSITION
[--glycosylation_name GLYCOSYLATION_NAME]
[--nstruct NSTRUCT] [--cycles CYCLES]
Named Arguments¶
--infile, -s | Input PDB. |
--glycosylation_position, -g | |
Glycosylation site. Rosetta resnum or resnumChain, ex: 463G | |
--glycosylation_name, -n | |
Glycosylation name Default: “man5” | |
--nstruct | Number of output structures Default: 1 |
--cycles, -c | Total number of cycles to attempt using the LCM Default: 75 |