Welcome to PDXIntegrator’s documentation!

Setting up PDXIntegrator

PDXIntegrator is a Java program that requires at least Java version 8.

Note that for now, we are using a private fork of the ontolib library for parsing the NCIT obo file. Go to https://github.com/monarch-initiative/OLPG and enter the following commands (notify Peter if you need access).

$ git clone https://github.com/monarch-initiative/OLPG
$ cd OLPG
$ mvn install

This will put the OLPG library files into your local maven repository (.m2 directory) and let you build PDXIntegrator.

PDXIntegrator is provided as a maven project. The quickest way to set it up is to clone the project from the GitHub site at https://github.com/TheJacksonLaboratory/PDXintegrator and then to use maven to build the project. In the following, we show a command that will display a help message–if you see this, then you have successfully built the program.

$ git clone https://github.com/TheJacksonLaboratory/PDXintegrator
$ cd PDXintegrator
$ mvn package
$ java -jar target/PdxIntegrator.jar

[ERROR] no command

Program: PdxIntegrator (Common Knowledge Graph PdxModel for PDXNet)
Version: 0.0.2

usage: java -jar PdxIntegrator.jar command [-c <arg>] [-d <arg>] [-o <arg>]
Available commands:

download
    java -jar PdxIntegrator.jar download [-d directory]: Download NCI files to directory at -d (default="data").

simulate
    java -jar PdxIntegrator.jar simulate [-d directory]: Requires NCI files in directory at -d (default="data").

map
    java -jar PdxIntegrator.jar map [-d directory]: todo.

Setting up drugbank

Before running the simulation command, the Drugbank XML file needs to be downloaded and processed. See drugbank.

Running the simulation command

Currently, we are building out a complete RDF model for the PDX Minimal Information standard (https://www.ncbi.nlm.nih.gov/pubmed/29092942). Our strategy for development is to generate random PDX cases, write the corresponding RDF code to file (with the simulate command). Then, to test the query-ability of the model, we ingest the RDF file and query it using SPARQL queries (with the query command). This is inteded to allow collaborators to view the model and make suggestions for improvement. The end game would be to develop this code to allow ETL of PDX data into this model. The following sections show how to set up and run the simulation

Drugbank

We propose to use the Drugbank resource to track medications. According to the Drugbank website (2018-02-26), The latest release of DrugBank (version 5.0.11, released 2017-12-20) contains 10,999 drug entries including 2,504 approved small molecule drugs, 942 approved biotech (protein/peptide) drugs, 109 nutraceuticals and over 5,108 experimental drugs. A recent Nucleic Acid Research database article provides further details. Drugbank is licensed under a a Creative Common’s Attribution-NonCommercial 4.0 International License, and therefore can be used freely for non-commercial applications.

Our usage of drugbank

We used the Java JAXB tool xjc to create Java classes that mirror the XSD XML schema definition for Drugbase (the schema is available here: https://www.drugbank.ca/releases/latest). To use PDXIntegrator, you will need to download the Drugbase XML file (which requires free registration). Then follow these steps to unpack the XML file and to remove the space character from the file name.

$ unzip drugbank_all_full_database.xml.zip
    Archive:  drugbank_all_full_database.xml.zip
    inflating: full database.xml
$ mv full\ database.xml fulldatabase.xml

Now we can extract the contents of the XML file for use by PDXIntegrator. The following command does that and creates a new file in the data directory that is also used by PDXIntegrator to store downloaded files.

$ java -jar target/PdxIntegrator.jar drugbank --drugbank fulldatabase.xml

Adjust the path to fulldatabase.xml as necessary.

This will generate a new file in the data directory called drugbank.tab. The contents of this file look like this.

Denileukin diftitox     173146-27-5     DB00004 ANTINEOPLASTIC AND IMMUNOMODULATING AGENTS
Etanercept      185243-69-0     DB00005 ANTINEOPLASTIC AND IMMUNOMODULATING AGENTS
Bivalirudin     128270-60-0     DB00006 BLOOD AND BLOOD FORMING ORGANS
Leuprolide      53714-56-0      DB00007 ANTINEOPLASTIC AND IMMUNOMODULATING AGENTS
Peginterferon alfa-2a   198153-51-4     DB00008 ANTINEOPLASTIC AND IMMUNOMODULATING AGENTS
Alteplase       105857-23-6     DB00009 BLOOD AND BLOOD FORMING ORGANS
Sermorelin      86168-78-7      DB00010 SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS
Interferon alfa-n1      74899-72-2      DB00011 ANTINEOPLASTIC AND IMMUNOMODULATING AGENTS

For the purposes of this demonstration program, we will use the antineoplastic agents. ToDO–if we choose to stick with DrugBank, we can exploit the hierarchy and will need to emit some RDF to represent the hierarchy that is recorded in the XML file (for now, we will treat these medications as literals for demo purposes).

NCIT

We will use the NCIT thesaurus to provide IDs for the concepts in PDXnet. /

Parsing the NCIT ontology file with OWL-API

The National Cancer Institute (NCI) has developed a thesaurus (NCIT) of many kinds of cancer-relevant data. The Monarch Initiative has developed a version of this ontology as an OBO file. This Wiki provides details of this effort.

PDXIntegrator will download the OBO file with the following command.

$ java -jar target/PdxIntegrator.jar download

The ncit.obo file will be stored in a newly created directory called data. If the file is already present, PDXIntegrator will emit a warning message (delete the file if you want PDXIntegrator to download a fresh copy of the file).

PDXIntegrator uses the code in the class NcitOwlApiParser to parse the Neoplasm terms of NCIT (only). Currently, it only saves the IDs and the labels (although it may be useful in the future to also ingest the synonyms in case this code will be used to drive an autocomplete. This is sufficient to cause the simulated patients to have real NCIT diagnosis codes for the cancers.

For the SPARQL queries, it will be necessary to include the subclass definitions of NCIT to enable queries that use the NCIT hierarchy.

Clinical/Patient Module

The PDX-MI provides a clinical module that is divided into two submodules: “clinical/patient” and “clinical/tumor.” “Clinical/patient” requires information about the patient from which the engrafted tumor originates, including age, sex, ethnicity, and disease diagnosis.

Field Rec Description Example PDXNet
Submitter Patient ID D Unique ID. Tumor id if not supplied PAT-123 CenterID:PatientID
Sex D Patient Sex female NCIT:C16576
Age at diagnosis D Age at diagnosis 30 binned in 5 year age groups
Age at collection D Age at specimen collection 30 binned in 5 year age groups
Submitted Diagnosis E Initial Clinical Diagnosis invasive breast cancer invasive breast cancer
Diagnosis E Initial Clinical Diagnosis Term invasive breast cancer NCIT:C6257
Consent to share E Patient Consent to share data   PDXNET:consent_ACADEMIC_ONLY
Ethnicity D   caucasian currently string literal
Race D     currently string literal
Current treatment drug D   everolimus CHEMBL83
Current trtmnt protocol D   afinitor;10 mg/day  
Prior treatment protocol D   afinitor;10 mg/day  
Prior treatment response D     RECIST
Virology Status D   HIV-/HBV-/HCV+/HTLV-/EBV+ NCIT

Table 1. Rec: Recommendation; E: essential; D:desirable. Desireable fields will be shown as ‘Not Reported’ if no data is provided.

1. Submitter Patient ID. Used when possible for identifying multiple models from the same patient. Patient 123 from JAX would be shown on the PDXNet website as JAX:PAT-123. Will use supplied PDXNet abbreviations to prefix any ID. If no ID is supplied, the ID will be based on Tumor ID.

2. Sex. Patient sex. We will use this field to record biological sex. We will use the NCIT terms:

  • Female (Code NCIT:C16576): A person who belongs to the sex that normally produces ova. The term is used to indicate biological sex distinctions, or cultural gender role distinctions, or both.
  • Male (Code NCIT:C20197): A person who belongs to the sex that normally produces sperm. The term is used to indicate biological sex distinctions, cultural gender role distinctions, or both.

3. Age at Diagnosis. Patient age at time of initial clinical diagnosis in years. For display : to reduce the possibility of patient identification, PDX-MI recommends grouping ages into 5-year groups, although more granular groupings may be used in cases such as pediatric tumors if approved by a contributor’s Institutional Review Board. Here, we have implemented binned age groups as follows.

PDXNET:PAT-1511  PDXNET:ageBinLowerRange 55 ;
    PDXNET:ageBinUpperRange      59 ;
    (...) .

In the simulation code, we simulate using 5 year bins, but any ranges could be used in real code. For now, the age is understood to be in years, and if there is a need to be more precise we would need to change the model. TODO it may be better to include the word “year” in the predicate name?

Note that there is a mistake in the PDX-MI, which uses a six year range instead of a five year range: 30–35 (binned in 5-year age groups)

4. Age at Collection. Patient age when specimen was collected in years. Will be binned for dispaly as above.

  1. Submitted Diagnosis. The initial clinical diagnosis provided as free text.

6. Diagnosis. Initial clinical diagnosis. Submitted diagnosis mapped to NCIT term Note this represents the initial diagnosis and may be less precise than the histological diagnosis used in the second module. We will take the diagnosis codes from NCIT. The following shows an example triple for an individual with a diagnosis of Central Nervous System Histiocytic Sarcoma (NCIT:C129807).

PDXNET:PAT-1511 PDXNET:hasDiagnosis   NCIT:C129807 .

7. Consent to share data. Patient consent. Reporting on consent is essential. We are using the following codes.

  • PDXNET:consent_NO
  • PDXNET:consent_YES
  • PDXNET:consent_ACADEMIC_ONLY

TODO 1 define what we mean by these categories and add more categories as necessary. TODO 2. Define relation to NCIT term for Consent (NCIT:C25460).

8. Ethnicity Patient ethnicity. TODO we need to decide which reference terminology to use. One option is to adopt the NCIT terminology, which includes

9. Race Patient race. TODO we need to decide which reference terminology to use. One option is to adopt the NCIT terminology, which includes

10. Current treatment drug Patient treatment at time of specimen sample. We would like to have a resource to that represents classes, ingredients, brand names, dosage forms, etc., in a computable manner. There are several contenders, including ChEMBL and RxNorm (a standardized drug nomenclature maintained by the National Library of Medicine), but I think that DrugBank is the best option because it is comprehensive, it combines detailed drug data with comprehensive drug target information, it has an open source (Creative Common’s Attribution-NonCommercial 4.0 International License) license, and it is easy to use. A newer resource DrugCentral can also be used to map between many resources. TODO this is an issue that will require thought and consensus building. Please communicate ideas/comments to Peter. Currently, the PDXNet simulation is showing data from DrugBank as a literal (String).

PDXNET:PAT-248 PDXNET:currentTreatmentDrug  "Leuprolide[DB00007;53714-56-0]" .

The String currently shows the name (Leuprolide), the DrugBank ID (DB00007), and the CAS id (53714-56-0). If we decide to go with DrugBank, then probable the triple should be formed like this.

PDXNET:PAT-248 PDXNET:currentTreatmentDrug  drugbank:DB00007 .

By adding other information from DrugBank to the RDF data available in our query engine, it would be possible to formulate expressive queries about PDX models that have been treated by drugs that correspond to some overall treatment category (e.g., Leuprolide corresponds to L02AE - Gonadotropin releasing hormone analogues), have certain indications (e.g., Leuprolide is indicated for Advanced Prostate Cancer), interact with certain drugs (e.g., Allicin; The therapeutic efficacy of Allicin can be decreased when used in combination with Leuprolide), etc.

11. Current treatment protocol (dose; details) There is currently no ontology that I know of for representing dosages. There are many ways of representing dosages, e.g., 10 mg/day or 5 mg b.i.d. TODO discuss what methodology would work best for PDX centers.

12. Prior treatment protocol The medication data should be represented as above. The surgery data could be represented using MedDRA codes (a rich and highly specific standardised medical terminology to facilitate sharing of regulatory information internationally for medical products used by humans), but MedDRA does not have an open license and it may be difficult to reuse/redistribute, and so if we want to use MedDRA we would need to come to an agreement with them. MeSH would be an option, although MeSH is not always ontologically well structured, but there are a large number of terms. The NCI thesaurus has a hierarchy of terms for Intervention or Procedure, including Cancer Diagnostic or Therapeutic procedure, including terms for operations such as Mastectomy (NICT:C15277).

This is probably sufficient for our needs, and I would suggest we use this.

TODO – decide if the NCIT codes are sufficient for our needs. I suggest that we examine the subhierarchy underneath the term Cancer Diagnostic or Therapeutic Procedure (Code C79426).

13. Response to prior treatment progressive disease (RECIST1.1) These items can be represented in the NCIT, which has a subhierarchy for Clinical Course of Disease (Code C35461), which includes items such as “Complete remission”, “Progressive disease” and many more. Currently, the PDXIntegrator uses the following five terms

  • notAssessed
  • completeResponse
  • partialResponse
  • stableDisease
  • progressiveDisease

TODO Decide on whether we want to limit this category to a small number of terms (like the above), to allow any term from the NCIT Clinical Course of Disease subhierarchy, or choose some other scheme. Currently, I am using the PDXNET namespace for these terms in the RDF code, but we should use the NCIT namespace once we have decided where to take this.

14. Virology status Probably the NCIT subhierarchy of Viral infection (Code C3439), (which includes these viruses and many more) would be best. We can represent this in RDX using a scheme such as this.

PDXNET:PAT-248 PDXNET:virologyStatus  NCIT:C141405 .

where NCIT:C141405 is the code for Hepatitis B Virus Positive (Code C141405). Note that we may either want to use the terms for virus infection (which is a clinical diagnosis) or for serology (as in this example, with the term coming from the Laboratory Finding subhierarchy of NCIT). It depends on how we want to model this. TODO Determine the terminology and the depth of detail we want to capture.

Clinical/Tumor module

The following table shows the recommendations from the PDX-MI manuscript.

Field Rec Description Example PDXNet
Submitter Tumor ID E Tissue ID TUM-123 Unique ID
Primary Tumor Tissue E Primary Tumor Tissue breast UBERON code
Primary, Met, Recurrence E Disease Progression Recurrence enumeration
Specimen Tumor Tissue E Sampled Tissue breast UBERON code
Tissue Histology D Histologic Diagnosis invasive ductal carcinoma NCIT code
Tumor Grade D Tumor Grade grade 3 AJCC Grade
Stage; T N M D Tumor Stage   AJCC TNM Stages
Diagnostic Markers D Clinical BioMarkers ER+, PR+, HER2+;  
Treatment Naive Patient D   yes/no enumeration
Tumor Sample Type D Collection Procedure biopsy enumeration
Subline of   Subline of model PDX-123 Model Identifier
Subline reason   Why a subline Lost Cisplatin Resist. String

1. Submitter Tumor ID Display as CenterID:TumorID to act as a primary key. If not provided, patient ID can be used. Tumor 123 from JAX would be shown on the PDXNet website as JAX:TUM-123

2. Primary tumor tissue The tissue of the primary tumor. For now, the simulation shows a random UBERON code for an anatomical entity. We will use the uberon cross-species anatomy ontology [8] that is developed by Monarch Initiative (M. Haendel, C. Mungall).

3. Primary, metastasis,recurrence Is the specimen tissue from the primary tumor, a metastasis or a recurrance For now, we are using PDXNet entities, but we should use the NCIT terms for these items. This would allow users to enter a more specific NCIT term such as Distant metastasis (C18206), which is a child of Metastasis (C19151)

4. Specimen tumor tissue Tissue from which the specimen was collected. Same as Primary tissue if the tumor is not metastatic. Uberon as above. TODO For melanoma do we want to capture specimen location?

5. Tissue histology This is the pathologist’s diagnosis and may often represent a refinement of the clinical diagnosis given in the Patient/Clinical module. Should use the same terminology as diagnosis, but represent the pathologist’s findings.

6. Tumor Grade For now we are using PDXNet codes, but we will switch to the NCIT subhierarchy, although I think they may need some TLC. We will work with NCIT to revise these terms as a part of Monarch’s ongoing collaboration with NCIT.

7. Disease Stage; classification T3N2M1; TNM or Non applicable (example blood cancer) Should follow Tumor Grade; classification standard Use AJCC. This will be seperated into pT,pN,PM and stage

8. Specific markers (diagnostic linked) Clinically relevant bio markers. Pairs of Marker:Status where status can be “positive”, “negative”, a percent value, or a variant and optional platform Most of the assays such as IHC are covered by the NCIT under the subhierarchy “Laboratory Procedure”. That NCIT subhierachy also includes items for Receptor status (e.g., HER2/Neu positive), and these will be linked to external representations of genes/proteins by the Monarch collaboration.

9. Treatment naive patient? yes/no (enumeration) Yes means patient has never had neoadjuvant treatment. per TCGA

10. Original tumor sample type The process used to collect the sample. biopsy, surgical sample, punch NCIT has a subhierarchy of terms for biopsy and biopsy locations, that will be linked to uberon etc by the Monarch collaboration. Some terms appear to be missing, e.g., “ascites fluid”, but will be added to NCIT as needed for PDXNet.

11. Subline of If this model is created as a subline of an existing model indicate which model it is a subline of.

12. Subline reason We may need to create our own mini-terminology to describe the reasons for using a subline

Model creation module

The following table shows the recommendations from the PDX-MI manuscript.

Field Rec Description Example PDXNet
Submitter PDX ID E Unique ID for the PDX Model PDX#123 CenterID:PatientID
Passage E Tissue Passsage. P0 tissue is from PT P1 create PDXNet classes?
Mouse strain E Name of strain used for engraftment NOD.Cg-Prkdc<scid> Il2rg<tm1Wj>l/SzJ create PDXNet classes?
Mouse source E Institution supplying strain The Jackson Laboratory create PDXNet classes?
Mouse sex D Sex of mouse used for engaftment Male  
Mouse immune system humanised? E   yes/no  
Type of humanisation E   CD34+hematopoietic stem cell-engrafted  
Engraftment Procedure D   suspension ?
Engraftment Method D     ?
Engraftment Site D   right flank ?
Mouse treatment for engraftment D   estrogen treatment ?
Engraftment rate D   80% literal
Engraftment time D   8 weeks literal

1. Submitter PDX ID. This field is analogous to PatientID. We display as CenterID:PDXID and keep an internal ID that will not be shown externally to act as a primary key. For instance, PDX 123 from JAX would be shown on the PDXNet website as JAX:PDX-123

2. Passage. We neet to have a standard for this. Suggesting that P0 would indicate engrafted tissue is from the patient. P1 tissue comes from P0. P2 from P1 etc.

3. Mouse Strain. We will allow strains to be denoted according to the MGI guidelines.

4. Mouse Source. Institution providing the strain. An enumeration.

5 Mouse Sex.

6. Strain Immune System Humanised? Yes or No

7. Type of Humanisation. Description of humanisation method.

8. Engraftmet Procedure. Enumeration (TODO Need input: list of all the methods and whether any vocabulary exists?)

9. Engraftment Method. Enumeration (TODO Need input: list of all the methods and whether any vocabulary exists?)

10. Engraftment Site. Subcutaenous, Mammary Fat Pad, Orthotopic, etc. Where the tissue is engrafted to the mouse.

11. Mouse Treatment for Engraftment. estrogen treatment Enumeration (TODO Need input: list of all the methods and whether any vocabulary exists?)

12. Engraftment Rate. 80%. Would it be better to state “n of m” rather than a percentage? TODO Use percent or N of M?

13. Engraftment Time. 8 weeks Number of weeks/days PDMR uses “Estimated days from implant to 500 mm3” Both Rate and Time would benefit from noting if tissue was cryopreserved. Should this be captured as a field?

Model quality assurance

The model quality assurance module captures information about tissue provenance and fidelity of the passaged tumor with respect to key characteristics of the patient tumor.

The following table shows the recommendations from the PDX-MI manuscript.

In terms of RDF modeling, we will add these items as properties of the Pdx Sample.

Field Rec Example PDXNet
Quality control method(s) E histology and IHC create PDXNet classes?
Quality control results E   Text
Animal health status D SPF/SOPF  
Passage QA performed D P4 Integer

Table 2.4. Model creation Q/A module. Rec: Recommendation; E: essential; D:desirable.

1. Quality control method TODO Need to get a set of QC methods.

2. Quality control results Description of the results of the QA/QC method

3. Animal health status Pertains to the status of mouse room models are kept in.

4. Passage QA performed The passage or passages on which QA was performed. As models are repeately passaged QA status may change. If QA/QC is done on multiple passages multiple QA sections can be added

Model study module

Tumors from PDX often undergo comprehensive genomic characterization and/or treatment in controlled dosing studies to define therapeutic response and resistance. PDX-MI includes desirable fields in the reporting of these studies that supplement existing guidelines for reporting on in vivo biomedical research ( Meehan et al., 2017).

Field Rec Example PDXNet
Study name or identifier   PDX-123P3 Pertuzmab/Trastuzumab Needs to be unique to attach files
Treatment D pertuzumab in combination with trastuzumab; CHEMBL2007641 List of 1 or more generic drugs
Treatment protocol D trastuzumab (30 mg/kg loading dose, 15 mg/kg weekly); Additional module to capture Drug,Dose,Route,Frequency
Treatment Response D   RECIST Term
Passage D P2 Integer
Metastasis D Yes Yes or No
Metastasized to D Liver Uberon
Metastasis in passage D P3 Integer
Lag time/doubling time D 48h separate elements

Table 2.5. Model study module. Rec: Recommendation; E: essential; D:desirable.

1. Study Name or identifier (if not available could be model + passage + treatment) A way to uniquly identify the study. Human readable or unique ID

2. Treatment List of treatment(s) as generic terms for medications.

3. Treatment protocol For each treatment drug provide drug name, dosage, route and frequency.

4. Treatment response complete response, partial response, stable disease, progressive disease.

5 Model Passage Passage(s) of models used in study. Assumption is P0 modles have tissue directly from patient. P1 is engrafted with tissue from P0 etc.

Tumor OMICS: This was removed. It will be populated based on types of files uploaded for Study/Model.

6,7,8 Development of metastases in strain We will code this as Yes/no; site as uberon; passage as enumeration

9. Lag time/doubling time of tumor We will code this as the number of hours.

Demonstration of simulating cases and performing SPARQL queries

This page demonstrates how to run the demonstration query.

First we run the simulate command of PDXIntegrator to produce an RDF file with “randomized” cases. By default, 5 random cases are produced.

$ java -jar target/PdxIntegrator.jar simulate

This will produce a file called simulatedCases.rdf in the current working directory. This file will use the RDF/XML format, but the program will also emit the same RDF data in the Turtle format. For instance,

@prefix PDXNET: <http://pdxnetwork/pdxmodel#> .
@prefix NCIT:  <http://purl.obolibrary.org/obo/NCIT#> .
@prefix UBERON: <http://purl.obolibrary.org/obo/UBERON#> .

PDXNET:PAT-1  PDXNET:age_group  "0-4 years" ;
    PDXNET:consent       PDXNET:consent_YES ;
    PDXNET:ethnicity     "Sephardic" ;
    PDXNET:gender        PDXNET:female ;
    PDXNET:hasDiagnosis  NCIT:C130038 ;
    PDXNET:hasTumor      PDXNET:TUMOR-PAT-1 ;
    PDXNET:patient_id    "PAT-1" .

PDXNET:TUMOR-PAT-1  PDXNET:hasSubmitterTumorId
            "TUMOR-PAT-1" ;
    PDXNET:stage                NCIT:C19251 ;
    PDXNET:tissueOfOrigin       UBERON:35975 ;
    PDXNET:tumorCategory        NCIT:C3352 ;
    PDXNET:tumorGrade           NCIT:C121173 ;
    PDXNET:tumorHistology       NCIT:C130038 .

PDXNET:PAT-0  PDXNET:age_group  "15-19 years" ;
    PDXNET:consent       PDXNET:consent_NO ;
    PDXNET:ethnicity     "hispanic or latino" ;
    PDXNET:gender        PDXNET:male ;
    PDXNET:hasDiagnosis  NCIT:C7326 ;
    PDXNET:hasTumor      PDXNET:TUMOR-PAT-0 ;
    PDXNET:patient_id    "PAT-0" .

PDXNET:TUMOR-PAT-0  PDXNET:hasSubmitterTumorId
            "TUMOR-PAT-0" ;
    PDXNET:stage                NCIT:C19251 ;
    PDXNET:tissueOfOrigin       UBERON:4146 ;
    PDXNET:tumorCategory        NCIT:C8509 ;
    PDXNET:tumorGrade           NCIT:C48934 ;
    PDXNET:tumorHistology       NCIT:C7326 .

SPARQL Queries

We will use the corresponding RDF/XML file to perform demonstration SQPARL queries. For this, we use the query command, which produces output like this.

PREFIX pdxnet: <http://pdxnetwork/pdxmodel_>
PREFIX ncit: <http://purl.obolibrary.org/obo/NCIT_>
SELECT ?patient_id ?consent ?diagnosis
WHERE {
  ?x pdxnet:patient_id ?patient_id .
  ?x pdxnet:consent ?consent .
  ?x pdxnet:hasDiagnosis ?diagnosis .
  }
LIMIT 5
Lock : main
Lock : main
----------------------------------------------------------
| patient_id | consent                      | diagnosis  |
==========================================================
| "PAT-846"  | pdxnet:consent_NO            | ncit:C5235 |
| "PAT-1256" | pdxnet:consent_ACADEMIC_ONLY | ncit:C4887 |
| "PAT-127"  | pdxnet:consent_NO            | ncit:C5656 |
| "PAT-179"  | pdxnet:consent_YES           | ncit:C7811 |
| "PAT-1477" | pdxnet:consent_ACADEMIC_ONLY | ncit:C7965 |
----------------------------------------------------------

###########  Next Query  ###########  Next Query

PREFIX pdxnet: <http://pdxnetwork/pdxmodel_>
PREFIX ncit: <http://purl.obolibrary.org/obo/NCIT_>
PREFIX uberon: <http://purl.obolibrary.org/obo/UBERON_>
SELECT ?patient_id ?currentTreatmentDrug ?diagnosis
WHERE {
  ?x pdxnet:patient_id ?patient_id .
  ?x pdxnet:currentTreatmentDrug ?currentTreatmentDrug .
  ?x pdxnet:gender pdxnet:female .
  ?x pdxnet:hasDiagnosis ?diagnosis .
  }
LIMIT 5
Lock : main
Lock : main
--------------------------------------------------------------------------
| patient_id | currentTreatmentDrug                         | diagnosis  |
==========================================================================
| "PAT-846"  | "Goserelin[DB00014;65807-02-5]"              | ncit:C5235 |
| "PAT-1256" | "Sargramostim[DB00020;123774-72-1]"          | ncit:C4887 |
| "PAT-1477" | "Peginterferon alfa-2a[DB00008;198153-51-4]" | ncit:C7965 |
| "PAT-1770" | "Cetuximab[DB00002;205923-56-4]"             | ncit:C7061 |
| "PAT-1676" | "Cetuximab[DB00002;205923-56-4]"             | ncit:C8834 |
--------------------------------------------------------------------------

###########  Next Query  ###########  Next Query

PREFIX pdxnet: <http://pdxnetwork/pdxmodel_>
PREFIX ncit: <http://purl.obolibrary.org/obo/NCIT_>
PREFIX uberon: <http://purl.obolibrary.org/obo/UBERON_>
SELECT ?patient_id ?currentTreatmentDrug ?diagnosis ?age_lowerrange ?age_upperrange
WHERE {
  ?x pdxnet:patient_id ?patient_id .
  ?x pdxnet:currentTreatmentDrug ?currentTreatmentDrug .
  ?x pdxnet:gender pdxnet:female .
  ?x pdxnet:hasDiagnosis ?diagnosis .
  ?x pdxnet:ageBinLowerRange ?age_lowerrange .
  ?x pdxnet:ageBinUpperRange ?age_upperrange .
  FILTER (?age_lowerrange > 55) .
  }
LIMIT 5
Lock : main
Lock : main
-----------------------------------------------------------------------------------------------------------
| patient_id | currentTreatmentDrug                       | diagnosis   | age_lowerrange | age_upperrange |
===========================================================================================================
| "PAT-1256" | "Sargramostim[DB00020;123774-72-1]"        | ncit:C4887  | 75             | 79             |
| "PAT-1770" | "Cetuximab[DB00002;205923-56-4]"           | ncit:C7061  | 105            | 109            |
| "PAT-75"   | "Denileukin diftitox[DB00004;173146-27-5]" | ncit:C5631  | 75             | 79             |
| "PAT-1765" | "Pegfilgrastim[DB00019;208265-92-3]"       | ncit:C7964  | 80             | 84             |
| "PAT-851"  | "Leuprolide[DB00007;53714-56-0]"           | ncit:C27754 | 65             | 69             |
-----------------------------------------------------------------------------------------------------------

Development plans

Currently, there are prototype versions of all modules but one. We will go through the entire PDX-MI ontology specification in this document : https://docs.google.com/document/d/1M81y8wbT5gegUe35RZwS92bvHLYJrVPhaFnnkECgbto/edit and will implement RDF patterns, and will test the ability to query the data with SPARQL. Once this is mature and tested, we will adapt the code to provide ETL and Q/C functionalities.

Visualization

This is a nice tool for visualizing RDF graphs: http://visgraph3.org/

PDXIntegrator

PDXIntegrator is inteded to provide a common semantic model for PDXNet. The project is currently in a preliminary stage and we do not recommend use of any of this code in production environments. Instead this site is intended to provide information about current plans for collaborators. Please visit https://github.com/TheJacksonLaboratory/PDXintegrator to view the current code base.

To get a very first impression of this project, we recommend that you go through the Demonstration of simulating cases and performing SPARQL queries.