ssmpy.calculate_information_content_intrinsic(df, max_freq)
Calculates the information content of a dataframe of entries.
df – pandas DataFrame
max_freq – maximum frequency in the ontology
df with an extra column ‘IC’
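The following stand-alone sketch shows how such an IC column could be computed. Plain dictionaries stand in for the DataFrame. It assumes, hypothetically, that each row carries a `freq` count and that IC uses the smoothed form -log((freq + 1) / (max_freq + 1)). The exact formula ssmpy applies is not stated here and may differ.

```python
import math


def add_intrinsic_ic(rows, max_freq):
    """Return copies of `rows`, each with an extra 'IC' key.

    Hypothetical layout: every row carries a 'freq' count (for example the
    number of descendants of the concept). IC is computed with the smoothed
    form -log((freq + 1) / (max_freq + 1)), so a concept with freq equal to
    max_freq gets IC 0 and rarer concepts get larger values.
    """
    result = []
    for row in rows:
        ic = -math.log((row["freq"] + 1) / (max_freq + 1))
        result.append({**row, "IC": ic})
    return result


rows = [
    {"name": "chemical entity", "freq": 9},
    {"name": "metal", "freq": 4},
    {"name": "gold", "freq": 0},
]
for row in add_intrinsic_ic(rows, max_freq=9):
    print(row["name"], round(row["IC"], 3))
```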
ssmpy.common_ancestors(entry1, entry2)
Get the common ancestors of two semantic base entries.
entry1 (int) – first semantic base ID
entry2 (int) – second semantic base ID
List of common ancestors
list
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.common_ancestors(gold, silver)
[6, 2, 10]
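Conceptually, the common ancestors are the intersection of the two concepts' ancestor sets. The sketch below builds those sets from hypothetical rows of a transitive-closure table, stored as (descendant, ancestor) pairs. Each concept is also listed as its own ancestor, which matches the get_ancestors example in which gold's own ID appears. The data is illustrative and is not taken from metals.db.

```python
from collections import defaultdict

# Hypothetical (descendant, ancestor) rows; every concept is its own ancestor.
TRANSITIVE = [
    ("gold", "gold"), ("gold", "precious metal"), ("gold", "metal"),
    ("silver", "silver"), ("silver", "precious metal"), ("silver", "metal"),
    ("precious metal", "precious metal"), ("precious metal", "metal"),
    ("metal", "metal"),
]


def build_ancestor_index(rows):
    """Map each descendant to the set of its ancestors."""
    index = defaultdict(set)
    for descendant, ancestor in rows:
        index[descendant].add(ancestor)
    return index


def common_ancestors(index, c1, c2):
    """Ancestors shared by c1 and c2 (a plain set intersection)."""
    return index[c1] & index[c2]


idx = build_ancestor_index(TRANSITIVE)
print(sorted(common_ancestors(idx, "gold", "silver")))  # → ['metal', 'precious metal']
```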
ssmpy.create_connection(db_file)
Create a database connection to the SQLite database specified by db_file.
db_file – database file
Connection object or None
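The sketch below implements the documented contract, a Connection or None on failure, using only the standard sqlite3 module. How ssmpy itself reports the error is an assumption; this version simply prints it.

```python
import sqlite3


def create_connection(db_file):
    """Return a sqlite3 connection to db_file, or None if opening fails."""
    try:
        return sqlite3.connect(db_file)
    except sqlite3.Error as exc:
        print(exc)  # report the problem and signal failure with None
        return None


conn = create_connection(":memory:")
print(conn.execute("SELECT 1").fetchone())  # → (1,)
conn.close()
```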
ssmpy.create_semantic_base(owl_file, sb_file, name_prefix, relation, annotation_file='')
Create an sqlite3 semantic base from an OWL file.
owl_file (string) – File name of the ontology in OWL format
sb_file (string) – File name of database where semantic base will be stored
name_prefix (string) – Prefix of the concepts to be extracted from the ontology
relation (string) – Type of relation to be extracted from the ontology
annotation_file (string) – File containing ontology concepts to use as annotations and calculate concept frequency. Empty string if this file is not available.
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("http://purl.obolibrary.org/obo/go.owl", "go.owl")[0]
'go.owl'
>>> ssmpy.create_semantic_base("go.owl", "go.db", "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "")
loading the ontology go.owl
calculating transitive closure at distance: 1
calculating transitive closure at distance: 2
calculating transitive closure at distance: 3
calculating transitive closure at distance: 4
calculating transitive closure at distance: 5
calculating transitive closure at distance: 6
calculating transitive closure at distance: 7
calculating transitive closure at distance: 8
calculating transitive closure at distance: 9
calculating transitive closure at distance: 10
calculating transitive closure at distance: 11
calculating transitive closure at distance: 12
calculating transitive closure at distance: 13
calculating transitive closure at distance: 14
calculating transitive closure at distance: 15
calculating transitive closure at distance: 16
calculating the descendents
calculating the hierarchical frequency
the end
>>> ssmpy.semantic_base("go.db")
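The progress messages above suggest that the transitive closure is built one distance level at a time. The sketch below applies that idea to an in-memory edge list. The function name and data layout are hypothetical, and ssmpy builds its closure inside SQLite rather than in Python sets.

```python
def transitive_closure(edges):
    """Compute {(descendant, ancestor): distance} level by level.

    `edges` holds direct (child, parent) pairs, which form distance 1.
    Each later level extends the previous level's pairs by one direct
    edge. The loop stops when a level adds no new pair, and each pair
    keeps the shortest distance at which it was found.
    """
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)

    closure = {(c, p): 1 for c, p in edges}
    frontier = set(closure)
    distance = 1
    while frontier:
        distance += 1
        new_pairs = set()
        for child, mid in frontier:
            for parent in parents.get(mid, ()):
                if (child, parent) not in closure:
                    new_pairs.add((child, parent))
        for pair in new_pairs:
            closure[pair] = distance
        frontier = new_pairs
    return closure


edges = [("gold", "precious metal"), ("precious metal", "metal"),
         ("metal", "chemical entity")]
closure = transitive_closure(edges)
print(closure[("gold", "chemical entity")])  # → 3
```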
ssmpy.db_select_entry(conn, entry_list)
Query all rows in the entry table whose name is in entry_list.
conn – the Connection object
entry_list – list of entity names
pandas DataFrame with all columns in entry
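A query of the form "name IN (list)" can be built safely with one `?` placeholder per value. The sketch below shows this with the standard sqlite3 module. It returns tuples instead of a DataFrame and uses a minimal hypothetical `entry(id, name)` table; the real table may have more columns. SQLite caps the number of placeholders per statement, so very long lists may need to be split into chunks.

```python
import sqlite3


def select_entries_by_name(conn, names):
    """Rows of a hypothetical entry(id, name) table whose name is in `names`."""
    if not names:
        return []
    placeholders = ", ".join("?" for _ in names)
    query = f"SELECT id, name FROM entry WHERE name IN ({placeholders}) ORDER BY id"
    return conn.execute(query, list(names)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO entry (id, name) VALUES (?, ?)",
                 [(1, "metal"), (2, "gold"), (3, "silver")])
print(select_entries_by_name(conn, ["gold", "silver"]))  # → [(2, 'gold'), (3, 'silver')]
```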
ssmpy.db_select_entry_by_id(conn, entry_list)
Query all rows in the entry table whose id is in entry_list.
conn – the Connection object
entry_list – list of entity ids
pandas DataFrame with all columns in entry
ssmpy.db_select_transitive(conn, ids_list)
Query all rows in the transitive table whose id is in ids_list.
conn – the Connection object
ids_list – list of entity ids
pandas DataFrame with all columns in transitive
ssmpy.fast_jc(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)
Calculates the Jiang-Conrath (JC) similarity between it1 and it2, using the MICA and intrinsic information content.
all_ancestors – pandas DataFrame of all ancestors (from table transitive)
df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC
df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC
it1 – entity 1 (id)
it2 – entity 2 (id)
list: [e1, e2, sim_jc]
ssmpy.fast_lin(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)
Calculates Lin's similarity between it1 and it2, using the MICA and intrinsic information content.
all_ancestors – pandas DataFrame of all ancestors (from table transitive)
df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC
df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC
it1 – entity 1 (id)
it2 – entity 2 (id)
list: [e1, e2, sim_lin]
ssmpy.fast_resn_lin_jc(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)
Calculates the Resnik, Lin and Jiang-Conrath (JC) similarities between it1 and it2, using the MICA and intrinsic information content.
all_ancestors – pandas DataFrame of all ancestors (from table transitive)
df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC
df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC
it1 – entity 1 (id)
it2 – entity 2 (id)
list: [e1, e2, sim_resnik, sim_lin, sim_jc]
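Once the ICs of the two concepts and of their most informative common ancestor (MICA) are known, the three measures follow from the textbook definitions. The sketch below computes them from plain floats.

Taking the JC similarity as 1 / distance is our assumption, although it is consistent with the gold/silver outputs of ssm_resnik, ssm_lin and ssm_jiang_conrath on this page. A shared IC of 0.5878 with a Lin value of 0.3908 implies a JC distance of about 1.833, whose reciprocal is about 0.546.

```python
import math


def resnik_lin_jc(ic1, ic2, ic_mica):
    """Return (resnik, lin, jc) from the ICs of two concepts and their MICA."""
    # Resnik: the IC of the most informative common ancestor itself.
    resnik = ic_mica
    # Lin: the shared IC relative to the average IC of the two concepts.
    total = ic1 + ic2
    lin = 2 * ic_mica / total if total > 0 else 0.0
    # Jiang-Conrath distance, turned into a similarity as 1 / distance
    # (an assumption); identical concepts (distance 0) give infinity.
    distance = total - 2 * ic_mica
    jc = 1 / distance if distance > 0 else math.inf
    return resnik, lin, jc


print(resnik_lin_jc(2.0, 2.0, 1.0))  # → (1.0, 0.5, 0.5)
```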
ssmpy.fast_resnik(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)
Calculates Resnik's similarity between it1 and it2, using the MICA and intrinsic information content.
all_ancestors – pandas DataFrame of all ancestors (from table transitive)
df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC
df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC
it1 – entity 1 (id)
it2 – entity 2 (id)
list: [e1, e2, sim_resnik]
ssmpy.get_all_commom_ancestors(all_ancestors, it1, it2)
Get all common ancestors of it1 and it2.
all_ancestors – pandas DataFrame of all ancestors
it1 – entity 1 (id)
it2 – entity 2 (id)
pandas DataFrame of common ancestors, or zero
ssmpy.get_ancestors(entry)
Get the ancestors of a given semantic base entry.
entry (int) – semantic base ID
List of ancestors
list
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.get_ancestors(gold)
[3, 6, 2, 10]
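In the example above, gold's own ID (3) appears in its ancestor list, so the concept counts as its own ancestor. The sketch below reproduces that behaviour with a breadth-first walk over a hypothetical child-to-parents dictionary, whereas ssmpy reads the precomputed transitive table.

```python
from collections import deque


def get_ancestors(parents, concept):
    """Breadth-first walk up `parents` (child -> list of direct parents).

    The concept itself comes first, and each ancestor is listed once,
    even when it can be reached through several paths.
    """
    seen = {concept}
    order = [concept]
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


parents = {
    "gold": ["precious metal", "coinage metal"],
    "precious metal": ["metal"],
    "coinage metal": ["metal"],
    "metal": [],
}
print(get_ancestors(parents, "gold"))
```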
ssmpy.get_id(name)
Get the semantic base ID of an ontology concept by its original label (name).
name (string) – ontology label (depends on the ontology)
semantic base ID or -1 if not found
int
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> ssmpy.get_id("gold")
3
ssmpy.get_name(cid)
Get the ontology label (name) for a given semantic base ID.
cid (int) – semantic base ID
ontology label (name)
string
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> ssmpy.get_name(3)
'gold'
ssmpy.get_uniprot_annotations(protein_acc)
Retrieve the GO annotations of a UniProt ID using the UniProt API.
protein_acc (string) – UniProt protein ID
list of GO terms
list
>>> import ssmpy
>>> import urllib.request
>>> import gzip
>>> import shutil
>>> urllib.request.urlretrieve("http://labs.rd.ciencias.ulisboa.pt/dishin/go201907.db.gz", "go.db.gz")[0]
'go.db.gz'
>>> with gzip.open('go.db.gz', 'rb') as f_in:
... with open('go.db', 'wb') as f_out:
... shutil.copyfileobj(f_in, f_out)
>>> ssmpy.semantic_base("go.db")
>>> l = sorted(ssmpy.get_uniprot_annotations("Q12345"))
>>> l
[1746, 9044, 17053, 21566, 24341, 57621, 95359]
ssmpy.information_content(entry)
Get the information content of a semantic base entry, intrinsic or extrinsic depending on the value of ssmpy.ssm.intrinsic.
entry (int) – semantic base ID
information content
float
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.ssm.intrinsic = True
>>> ssmpy.information_content(gold)
1.5040773967762742
ssmpy.information_content_extrinsic(entry)
Get the extrinsic information content of a semantic base entry.
The values are precomputed when the semantic base is created, from the annotation file provided.
entry (int) – semantic base ID
extrinsic information content
float
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.information_content_extrinsic(gold)
1.2992829841302609
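Extrinsic IC is classically derived from annotation frequencies. A concept's hierarchical frequency counts the annotations to the concept and to all its descendants, and IC = -log(freq / max_freq). The sketch below follows that textbook definition. Whether ssmpy uses the same normalisation, smoothing or log base is an assumption we have not checked.

```python
import math
from collections import Counter


def extrinsic_ic(annotations, ancestors):
    """Annotation-based IC for every concept that receives a count.

    annotations: iterable of annotated concepts, one item per annotation.
    ancestors: concept -> set of its ancestors, including the concept itself.
    Each annotation adds 1 to the frequency of the concept and of all its
    ancestors (the hierarchical frequency); IC = -log(freq / max_freq).
    """
    freq = Counter()
    for concept in annotations:
        for anc in ancestors[concept]:
            freq[anc] += 1
    max_freq = max(freq.values())
    return {c: -math.log(f / max_freq) for c, f in freq.items()}


ancestors = {
    "gold": {"gold", "metal"},
    "silver": {"silver", "metal"},
    "metal": {"metal"},
}
ic = extrinsic_ic(["gold", "gold", "silver", "metal"], ancestors)
print({name: round(value, 3) for name, value in ic.items()})
```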
ssmpy.information_content_intrinsic(entry)
Get the intrinsic information content of a semantic base entry.
entry (int) – semantic base ID
intrinsic information content
float
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.information_content_intrinsic(gold)
1.5040773967762742
ssmpy.light_similarity(conn, entry_ids_1, entry_ids_2, metric, cpu_cores)
Compute the semantic similarity between every entry in entry_ids_1 and every entry in entry_ids_2.
conn – database connection
entry_ids_1 – first list of entries
entry_ids_2 – second list of entries
metric – ‘lin’, ‘resnick’, ‘jc’ or ‘all’
cpu_cores – number of CPU cores to use
list of results, each [e1, e2, similarity] or, for ‘all’, [e1, e2, Resnik similarity, Lin similarity, JC similarity]
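>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("http://purl.obolibrary.org/obo/doid.owl", "doid.owl")[0]
'doid.owl'
>>> ssmpy.create_semantic_base('doid.owl', 'doid.db', "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "")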
>>> conn = ssmpy.create_connection('doid.db')
>>> list1 = ['DOID_10587', 'DOID_2841']
>>> list2 = ['DOID_1927', 'DOID_1324']
>>> ssmpy.light_similarity(conn, list1, list2, 'all', 4)
[[['DOID_10587', 'DOID_1324', -0.0, -0.0, 0.068819810490695],
['DOID_10587', 'DOID_1927', 5.937536205082426, 0.8269561090992177, 0.28695173228265203]],
[['DOID_2841', 'DOID_1324', 3.703943983575332, 0.659762410973656, 0.20745912457314464],
['DOID_2841', 'DOID_1927', -0.0, -0.0, 0.07658496040867407]]]
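The sketch below shows the general shape of an all-pairs computation spread over several workers. It is a sketch only: `toy_similarity` is a hypothetical stand-in for a real measure, and threads are used to keep the example self-contained. For CPU-bound scoring, `ProcessPoolExecutor` with a module-level similarity function is the usual swap. The result has the same nesting (one inner list per e1) as the output above, though ssmpy's ordering within each group may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def pairwise_similarity(ids_1, ids_2, similarity, workers=4):
    """Score all (e1, e2) pairs with `similarity`, using `workers` threads.

    Returns one inner list per e1, each holding [e1, e2, score] rows.
    """
    def score(pair):
        e1, e2 = pair
        return [e1, e2, similarity(e1, e2)]

    pairs = list(product(ids_1, ids_2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flat = list(pool.map(score, pairs))  # map() keeps input order
    n = len(ids_2)
    return [flat[i * n:(i + 1) * n] for i in range(len(ids_1))]


def toy_similarity(a, b):
    # Hypothetical stand-in for a real measure: 1.0 for equal IDs, else 0.0.
    return 1.0 if a == b else 0.0


print(pairwise_similarity(["x", "y"], ["x", "z"], toy_similarity, workers=2))
```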
ssmpy.num_paths(entry1, ancestor)
Get the number of paths (edges) between two concepts.
entry1 (int) – Child concept
ancestor (int) – Parent concept
number of edges between the two concepts
int
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> metal = ssmpy.get_id("metal")
>>> ssmpy.num_paths(gold, metal)
5
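The entry speaks of both paths and edges. If the result is read as the number of distinct upward paths from a concept to one of its ancestors, it can be counted in a DAG with a memoised depth-first walk. The sketch below does this over a hypothetical parent dictionary. Whether ssmpy counts exactly this quantity is not confirmed here.

```python
def count_paths(parents, child, ancestor):
    """Number of distinct upward paths from `child` to `ancestor` in a DAG.

    `parents` maps each concept to its direct parents. A concept has one
    (empty) path to itself; unrelated concepts have zero.
    """
    memo = {}

    def walk(node):
        if node == ancestor:
            return 1
        if node not in memo:
            memo[node] = sum(walk(p) for p in parents.get(node, ()))
        return memo[node]

    return walk(child)


parents = {
    "gold": ["precious metal", "coinage metal"],
    "precious metal": ["metal"],
    "coinage metal": ["metal"],
    "metal": [],
}
print(count_paths(parents, "gold", "metal"))  # → 2
```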
ssmpy.run_query(query, params)
Run any query on the semantic base.
query (string) – query to run on the semantic base
params (tuple) – query parameters
query result
sqlite3.Cursor
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> query = "SELECT id FROM entry WHERE name = ?"
>>> ssmpy.run_query(query, ("gold",)).fetchone()
(3,)
ssmpy.semantic_base(sb_file, **kwargs)
Initialize the global connection object.
You can also pass other keyword arguments to be given to the sqlite3.connect method, for example check_same_thread.
After this method is called, the other functions operate on this semantic base.
sb_file (string) – sqlite database filename
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
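The sketch below illustrates the pattern described above: a module-level connection that later queries reuse, with extra keyword arguments forwarded to sqlite3.connect. The names `init_base`, `query_base` and `_connection`, and the error raised before initialisation, are hypothetical. ssmpy's internals may differ.

```python
import sqlite3

_connection = None  # hypothetical module-level connection reused by query_base


def init_base(sb_file, **kwargs):
    """Open sb_file; extra keyword arguments are passed to sqlite3.connect."""
    global _connection
    _connection = sqlite3.connect(sb_file, **kwargs)


def query_base(query, params=()):
    """Run a parameterised query on the connection opened by init_base."""
    if _connection is None:
        raise RuntimeError("init_base() has not been called")
    return _connection.execute(query, params)


init_base(":memory:", check_same_thread=False)
query_base("CREATE TABLE entry (id INTEGER PRIMARY KEY, name TEXT)")
query_base("INSERT INTO entry (id, name) VALUES (?, ?)", (3, "gold"))
print(query_base("SELECT id FROM entry WHERE name = ?", ("gold",)).fetchone())  # → (3,)
```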
ssmpy.shared_ic(entry1, entry2)
Calculate the shared information content of two concepts according to the value of ssmpy.ssm.mica.
Previously computed values are stored in memory for faster computation.
entry1 (int) – First concept
entry2 (int) – Second concept
Shared information content
float
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm.mica = True
>>> ssmpy.shared_ic(gold, silver)
0.587786664902119
ssmpy.shared_ic_dca(entry1, entry2)
Calculate the shared information content of two concepts using disjunctive common ancestors.
entry1 (int) – First concept
entry2 (int) – Second concept
Shared information content
float
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.shared_ic_dca(gold, silver)
0.587786664902119
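Our reading of the disjunctive-common-ancestor idea behind DiShIn works in three steps:

1. For each common ancestor, compute the difference between the numbers of paths that reach it from each concept.
2. Group the ancestors by that difference and keep the most informative ancestor in each group.
3. Average the ICs of the ancestors kept.

The sketch below implements that reading over plain dictionaries. All inputs are hypothetical, and it illustrates the idea rather than reproducing ssmpy's exact code.

```python
def shared_ic_dca(common, ic, paths1, paths2):
    """Shared IC from disjunctive common ancestors (illustrative sketch).

    common: common ancestors of the two concepts.
    ic: ancestor -> information content.
    paths1, paths2: ancestor -> number of paths from each concept to it.
    Ancestors are grouped by |paths1 - paths2|; the highest IC in each
    group is kept, and the result is the mean of those maxima.
    """
    best = {}
    for anc in common:
        diff = abs(paths1[anc] - paths2[anc])
        best[diff] = max(best.get(diff, float("-inf")), ic[anc])
    if not best:
        return 0.0
    return sum(best.values()) / len(best)


common = ["metal", "precious metal", "chemical entity"]
ic = {"metal": 1.0, "precious metal": 2.0, "chemical entity": 0.0}
paths1 = {"metal": 2, "precious metal": 1, "chemical entity": 2}
paths2 = {"metal": 1, "precious metal": 1, "chemical entity": 1}
print(shared_ic_dca(common, ic, paths1, paths2))  # → 1.5
```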
ssmpy.shared_ic_mica(entry1, entry2)
Calculate the shared information content of two concepts using the most informative common ancestor.
entry1 (int) – First concept
entry2 (int) – Second concept
Shared information content
float
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.shared_ic_mica(gold, silver)
0.587786664902119
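With the MICA approach, the shared IC is simply the highest IC among the common ancestors. The ssm_resnik example on this page returns the same value for gold and silver (0.587786664902119). The dictionary-based sketch below uses hypothetical inputs.

```python
def shared_ic_mica(common, ic):
    """IC of the most informative common ancestor, or 0.0 if there is none."""
    return max((ic[a] for a in common), default=0.0)


print(shared_ic_mica(["metal", "precious metal"],
                     {"metal": 1.0, "precious metal": 2.0}))  # → 2.0
```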
ssmpy.ssm_jiang_conrath(entry1, entry2)
Calculate Jiang and Conrath's (JC) semantic similarity.
entry1 (int) – First concept
entry2 (int) – Second concept
Semantic similarity
float
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm_jiang_conrath(gold, silver)
0.5456783339686456
ssmpy.ssm_lin(entry1, entry2)
Calculate Lin's semantic similarity.
entry1 (int) – First concept
entry2 (int) – Second concept
Semantic similarity
float
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm_lin(gold, silver)
0.39079549108439265
ssmpy.ssm_multiple(m, entry1_list, entry2_list)
Calculate the semantic similarity over two lists of concepts.
m – semantic similarity function
entry1_list – First concept list
entry2_list – Second concept list
Aggregate Similarity Measure
float
>>> import ssmpy
>>> import urllib.request
>>> import gzip
>>> import shutil
>>> urllib.request.urlretrieve("http://labs.rd.ciencias.ulisboa.pt/dishin/go201907.db.gz", "go.db.gz")[0]
'go.db.gz'
>>> with gzip.open('go.db.gz', 'rb') as f_in:
... with open('go.db', 'wb') as f_out:
... shutil.copyfileobj(f_in, f_out)
>>> ssmpy.semantic_base("go.db")
>>> e1 = ssmpy.get_uniprot_annotations("Q12345")
>>> e2 = ssmpy.get_uniprot_annotations("Q12346")
>>> ssmpy.ssm_multiple(ssmpy.ssm_resnik, e1, e2)
1.653493583942882
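The entry does not say how the pairwise scores are combined into one value. Best-match average is one common aggregation for comparing sets of annotations. The sketch below implements it as an illustration only, and it may not be what ssm_multiple computes. `toy_similarity` is a hypothetical stand-in for a measure such as ssm_resnik.

```python
def best_match_average(sim, list1, list2):
    """Best-match average of `sim` over two concept lists.

    For each concept, take its best score against the other list. Average
    those maxima in each direction, then average the two directions.
    """
    if not list1 or not list2:
        return 0.0
    best1 = [max(sim(a, b) for b in list2) for a in list1]
    best2 = [max(sim(a, b) for a in list1) for b in list2]
    return (sum(best1) / len(best1) + sum(best2) / len(best2)) / 2


def toy_similarity(a, b):
    # Hypothetical stand-in: 1.0 for identical concepts, else 0.0.
    return 1.0 if a == b else 0.0


print(best_match_average(toy_similarity, ["x", "y"], ["x"]))  # → 0.75
```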
ssmpy.ssm_resnik(entry1, entry2)
Calculate Resnik's semantic similarity.
entry1 (int) – First concept
entry2 (int) – Second concept
Semantic similarity
float
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm_resnik(gold, silver)
0.587786664902119