Bio2BEL Entrez

A Bio2BEL package for Entrez Gene and HomoloGene.

Manager

Manager for Bio2BEL Entrez.

class bio2bel_entrez.manager.Manager(*args, **kwargs)[source]

Genes and orthologies.

namespace_model

alias of bio2bel_entrez.models.Gene

is_populated() → bool[source]

Check if the database is already populated.

get_or_create_species(taxonomy_id: str, **kwargs) → bio2bel_entrez.models.Species[source]

Get or create a Species model.

Parameters

taxonomy_id – NCBI taxonomy identifier
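The get-or-create pattern behind this method (and behind get_or_create_gene and get_or_create_homologene) can be sketched roughly as follows. This is only an illustration using the standard-library sqlite3 module in place of the package's SQLAlchemy session; the table layout, the `name` column, and the helper itself are assumptions made for the example, not the package's actual implementation.

```python
import sqlite3
from typing import Optional, Tuple


def get_or_create_species(
    conn: sqlite3.Connection,
    taxonomy_id: str,
    name: Optional[str] = None,
) -> Tuple[int, str, Optional[str]]:
    """Return the (id, taxonomy_id, name) row, inserting it first if it is absent."""
    row = conn.execute(
        "SELECT id, taxonomy_id, name FROM species WHERE taxonomy_id = ?",
        (taxonomy_id,),
    ).fetchone()
    if row is not None:
        # The species already exists, so return it unchanged.
        return row
    cursor = conn.execute(
        "INSERT INTO species (taxonomy_id, name) VALUES (?, ?)",
        (taxonomy_id, name),
    )
    return (cursor.lastrowid, taxonomy_id, name)


# Hypothetical in-memory table standing in for the real Species model.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE species (id INTEGER PRIMARY KEY, taxonomy_id TEXT UNIQUE, name TEXT)"
)
first = get_or_create_species(conn, "9606", name="human")
second = get_or_create_species(conn, "9606")
```

The second call finds the row created by the first call, so no duplicate species is inserted.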

get_gene_by_entrez_id(entrez_id: str) → Optional[bio2bel_entrez.models.Gene][source]

Get a gene with the given Entrez Gene identifier, if it exists.

Parameters

entrez_id – Entrez Gene identifier

get_genes_by_name(name: str) → List[bio2bel_entrez.models.Gene][source]

Get a list of genes with the given name (case insensitive).

Parameters

name – A gene name

get_gene_by_rgd_name(name: str) → Optional[bio2bel_entrez.models.Gene][source]

Get a gene by its RGD name.

Parameters

name – RGD gene symbol

get_gene_by_mgi_name(name: str) → Optional[bio2bel_entrez.models.Gene][source]

Get a gene by its MGI name.

Parameters

name – MGI gene symbol

get_gene_by_hgnc_name(name: str) → Optional[bio2bel_entrez.models.Gene][source]

Get a gene by its HGNC gene symbol.

get_or_create_gene(entrez_id: str, **kwargs) → bio2bel_entrez.models.Gene[source]

Get or create a Gene model.

Parameters

entrez_id – Entrez Gene identifier

get_or_create_homologene(homologene_id: str, **kwargs) → bio2bel_entrez.models.Homologene[source]

Get or create a HomoloGene model.

Parameters

homologene_id – HomoloGene identifier

populate_homologene(url=None, cache=True, force_download=False, tax_id_filter=None) → None[source]

Populate the database with HomoloGene data.

Parameters
  • url (Optional[str]) – HomoloGene data URL

  • cache (bool) – If true, the data is downloaded to the file system, else it is loaded from the internet

  • force_download (bool) – If true, overwrites a previously cached file

  • tax_id_filter (Optional[Iterable[str]]) – Species to keep

populate_gene_info(url: Optional[str] = None, cache: bool = True, force_download: bool = False, interval: Optional[int] = None, tax_id_filter: Iterable[str] = None)[source]

Populate the database with Entrez Gene information.

Parameters
  • url – A custom url to download

  • interval – The number of records to commit at a time

  • cache – If true, the data is downloaded to the file system, else it is loaded from the internet

  • force_download – If true, overwrites a previously cached file

  • tax_id_filter – Species to keep
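The interval parameter is described as the number of records to commit at a time. A minimal sketch of that batching idea is shown here, assuming that an interval of None means a single batch at the end; the `batched` helper and the sample records are hypothetical and only illustrate the chunking, not the database commit itself.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional


def batched(records: Iterable[Dict[str, str]], interval: Optional[int]) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `interval` records, or one list of everything if interval is None."""
    iterator = iter(records)
    if interval is None:
        yield list(iterator)
        return
    while True:
        batch = list(islice(iterator, interval))
        if not batch:
            return
        yield batch


# Placeholder records; a real run would stream rows parsed from gene_info.gz.
records = [{"GeneID": str(i)} for i in range(1, 6)]
batch_sizes = [len(batch) for batch in batched(records, interval=2)]
single_batch_sizes = [len(batch) for batch in batched(records, interval=None)]
```

In a loader built this way, each yielded batch would be added to the session and committed before the next batch is read, which bounds memory use on a large file.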

populate(gene_info_url: Optional[str] = None, interval: Optional[int] = None, tax_id_filter: Iterable[str] = ('9606', '10090', '10116', '7227', '4932', '6239', '7955', '9913', '9615'), homologene_url: Optional[str] = None)[source]

Populate the database with both Entrez Gene and HomoloGene data.

Parameters
  • gene_info_url – A custom url to download

  • interval – The number of records to commit at a time

  • tax_id_filter – Species to keep. Defaults to 9606 (human), 10090 (mouse), 10116 (rat), 7227 (fly), 4932 (yeast), 6239 (C. elegans), 7955 (zebrafish), 9913 (cattle), and 9615 (dog). Explicitly set to None to get all taxonomies.

  • homologene_url – A custom url to download
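The effect of tax_id_filter can be illustrated with a small filter over parsed rows. This is a sketch under the assumption that each row is a dictionary keyed by the gene_info column names; the `keep_species` helper is hypothetical, and the second sample row is a made-up placeholder rather than a real gene record.

```python
from typing import Dict, Iterable, Iterator, Optional

# Default value copied from the populate() signature above.
DEFAULT_TAX_IDS = ('9606', '10090', '10116', '7227', '4932', '6239', '7955', '9913', '9615')


def keep_species(
    rows: Iterable[Dict[str, str]],
    tax_id_filter: Optional[Iterable[str]] = DEFAULT_TAX_IDS,
) -> Iterator[Dict[str, str]]:
    """Yield only rows whose '#tax_id' is in the filter; None keeps every row."""
    if tax_id_filter is None:
        yield from rows
        return
    allowed = set(tax_id_filter)
    for row in rows:
        if row['#tax_id'] in allowed:
            yield row


rows = [
    {'#tax_id': '9606', 'GeneID': '1', 'Symbol': 'A1BG'},
    # Placeholder row for a taxonomy (3702) outside the default filter.
    {'#tax_id': '3702', 'GeneID': '0', 'Symbol': 'EXAMPLE'},
]
default_kept = [row['Symbol'] for row in keep_species(rows)]
all_kept = [row['Symbol'] for row in keep_species(rows, tax_id_filter=None)]
```

Converting the filter to a set keeps the membership test constant-time, which matters when the input has many millions of rows.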

lookup_node(node: pybel.dsl.node_classes.BaseEntity) → Optional[bio2bel_entrez.models.Gene][source]

Look up a gene from a PyBEL node.

iter_genes(graph: pybel.struct.graph.BELGraph, use_tqdm: bool = False) → Iterable[Tuple[pybel.dsl.node_classes.BaseEntity, bio2bel_entrez.models.Gene]][source]

Iterate over genes in the graph that can be mapped to an Entrez gene.

normalize_genes(graph: pybel.struct.graph.BELGraph, use_tqdm: bool = False) → None[source]

Add identifiers to all Entrez genes.

enrich_genes_with_homologenes(graph: pybel.struct.graph.BELGraph) → None[source]

Enrich the nodes in a graph with their HomoloGene parents.

enrich_equivalences(graph: pybel.struct.graph.BELGraph) → None[source]

Add equivalent node information.

enrich_orthologies(graph: pybel.struct.graph.BELGraph) → None[source]

Add ortholog relationships to graph.

add_homologene_namespace_to_graph(graph: pybel.struct.graph.BELGraph) → pybel.manager.models.Namespace[source]

Add the homologene namespace to the graph.

count_genes() → int[source]

Count the genes in the database.

count_homologenes() → int[source]

Count the HomoloGenes in the database.

count_species() → int[source]

Count the species in the database.

list_species() → List[bio2bel_entrez.models.Species][source]

List all species in the database.

list_homologenes() → List[bio2bel_entrez.models.Homologene][source]

List all HomoloGenes in the database.

summarize() → Dict[str, int][source]

Return a summary dictionary over the content of the database.

list_genes(limit: Optional[int] = None, offset: Optional[int] = None) → List[bio2bel_entrez.models.Gene][source]

List genes in the database.
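The limit and offset parameters suggest the usual paging semantics of SQL LIMIT and OFFSET. The following sketch assumes exactly that and emulates it on a plain list; `list_page` and the identifier list are hypothetical stand-ins, not part of the package.

```python
from typing import List, Optional, Sequence


def list_page(items: Sequence[str], limit: Optional[int] = None, offset: Optional[int] = None) -> List[str]:
    """Return up to `limit` items after skipping `offset` items (None means no bound)."""
    start = offset or 0
    end = None if limit is None else start + limit
    return list(items[start:end])


# Placeholder identifiers standing in for Gene rows ordered by primary key.
gene_ids = ['1', '2', '3', '4', '5']
page = list_page(gene_ids, limit=2, offset=2)
everything = list_page(gene_ids)
```

Under these assumptions, requesting successive pages means increasing offset by limit on each call until an empty list comes back.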

Models

SQLAlchemy models for Bio2BEL Entrez.

class bio2bel_entrez.models.Species(**kwargs)[source]

Represents a Species.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

taxonomy_id

NCBI Taxonomy Identifier

class bio2bel_entrez.models.Homologene(**kwargs)[source]

Represents a HomoloGene Group.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

as_bel(func: Optional[str] = None) → pybel.dsl.node_classes.CentralDogma[source]

Make a PyBEL DSL object from this HomoloGene.

class bio2bel_entrez.models.Gene(**kwargs)[source]

Represents a gene.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

entrez_id

NCBI Entrez Gene Identifier

name

Entrez Gene Symbol

description

Gene Description

type_of_gene

Type of Gene

property bel_encoding

Return the BEL encoding.

as_bel(func=None) → pybel.dsl.node_classes.CentralDogma[source]

Make a PyBEL DSL object from this gene.

property is_transcribed

Return if this gene can be transcribed to an RNA.

property is_translated

Return if this gene can be translated to a protein.

to_json() → Mapping[str, int][source]

Return this Gene as a JSON dictionary.

class bio2bel_entrez.models.Xref(**kwargs)[source]

Represents a database cross reference.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

database

Database name

value

Database entry name
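The database and value fields correspond naturally to the entries of the dbXrefs column in gene_info, which are separated by pipes and written as database:value, with a single hyphen marking an empty field. A minimal parsing sketch follows; splitting on the first colon only is an assumption about how the package stores these entries, and `parse_dbxrefs` is a hypothetical helper.

```python
from typing import List, Tuple


def parse_dbxrefs(dbxrefs: str) -> List[Tuple[str, str]]:
    """Split a gene_info dbXrefs field into (database, value) pairs."""
    if dbxrefs == '-':
        # gene_info uses a lone hyphen for a missing value.
        return []
    pairs = []
    for entry in dbxrefs.split('|'):
        # Split on the first colon only, since values such as 'HGNC:5' contain one.
        database, _, value = entry.partition(':')
        pairs.append((database, value))
    return pairs


xrefs = parse_dbxrefs('MIM:138670|HGNC:HGNC:5|Ensembl:ENSG00000121410')
```

Each resulting pair could then populate one Xref row attached to the gene.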

Constants

Constants for Bio2BEL Entrez.

bio2bel_entrez.constants.GENE_INFO_COLUMNS = ['#tax_id', 'GeneID', 'Symbol', 'dbXrefs', 'description', 'type_of_gene']

Columns from gene_info.gz that are used
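To show what selecting these columns looks like, here is a sketch that reads a tab-separated sample with the standard-library csv module and keeps only the listed fields. The sample is a heavily abbreviated stand-in for gene_info.gz (the real file has more columns), `read_used_columns` is a hypothetical helper, and the package itself may read the file with a different tool.

```python
import csv
import io
from typing import Dict, Iterator, TextIO

# Copied from bio2bel_entrez.constants as documented above.
GENE_INFO_COLUMNS = ['#tax_id', 'GeneID', 'Symbol', 'dbXrefs', 'description', 'type_of_gene']

# Abbreviated sample: a header line plus one record, with one extra column (LocusTag).
SAMPLE = (
    "#tax_id\tGeneID\tSymbol\tLocusTag\tdbXrefs\tdescription\ttype_of_gene\n"
    "9606\t1\tA1BG\t-\tMIM:138670|HGNC:HGNC:5\talpha-1-B glycoprotein\tprotein-coding\n"
)


def read_used_columns(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dictionary per record, restricted to GENE_INFO_COLUMNS."""
    reader = csv.DictReader(handle, delimiter='\t')
    for row in reader:
        yield {column: row[column] for column in GENE_INFO_COLUMNS}


parsed = list(read_used_columns(io.StringIO(SAMPLE)))
```

For the compressed file, the same function would work on a handle opened with gzip.open in text mode.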

bio2bel_entrez.constants.CONSORTIUM_SPECIES_MAPPING = {'FLYBASE': '7227', 'HGNC': '9606', 'MGI': '10090', 'RGD': '10116', 'SGD': '4932', 'WORMBASE': '6239', 'ZFIN': '7955'}

All namespace codes (in lowercase) that can map to ncbigene
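One way this mapping might be used is to resolve the species behind a consortium namespace. The sketch below assumes a case-insensitive lookup, since the keys as printed are uppercase; `taxonomy_for_namespace` is a hypothetical helper and not a function of the package.

```python
from typing import Optional

# Copied from bio2bel_entrez.constants as documented above.
CONSORTIUM_SPECIES_MAPPING = {
    'FLYBASE': '7227',
    'HGNC': '9606',
    'MGI': '10090',
    'RGD': '10116',
    'SGD': '4932',
    'WORMBASE': '6239',
    'ZFIN': '7955',
}


def taxonomy_for_namespace(namespace: str) -> Optional[str]:
    """Return the NCBI taxonomy identifier for a consortium namespace, or None if unknown."""
    return CONSORTIUM_SPECIES_MAPPING.get(namespace.upper())


human = taxonomy_for_namespace('hgnc')
mouse = taxonomy_for_namespace('MGI')
unknown = taxonomy_for_namespace('chebi')
```

A lookup like this pairs naturally with the species-specific getters on the manager, such as get_gene_by_hgnc_name, get_gene_by_mgi_name, and get_gene_by_rgd_name.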
