MedCo Technical Documentation

System Administrator Guide

Specifications

We recommend the following specifications for running MedCo:

  • Network Bandwidth: >100 Mbps (ideal), >10 Mbps (minimum), symmetrical

  • Ports Opening and IP Restrictions: see Network Architecture

  • Hardware

    • CPU: 8 cores (ideal), 4 cores (minimum)
    • RAM: >16 GB (ideal), >8 GB (minimum)
    • Storage: >100 GB, depending on the data loaded
  • Software

    • OS: any flavor of Linux, physical or virtualized (tested with Ubuntu 16.04, Ubuntu 18.04 and Fedora 29)
    • Tools: OpenSSL, Docker (tested with Docker 18.09.1) and Docker Compose (tested with docker-compose 1.23.2), Git and Git LFS
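
Before deploying, a quick way to verify the software prerequisites is to check that each tool is on the PATH. The sketch below is a hypothetical helper (not part of medco-deployment); it assumes a POSIX shell and the command names openssl, docker, docker-compose, git and git-lfs, and it does not check versions (compare those yourself with docker --version and docker-compose --version).

```shell
#!/bin/sh
# Sketch: print the tools from the argument list that are not found on PATH.
# An empty output means every tool is available. Versions are not checked.
missing_tools() (
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Drop the leading space added by the loop.
  echo "${missing# }"
)

# Tools from the software requirements (docker-compose is the 1.x binary).
missing=$(missing_tools openssl docker docker-compose git git-lfs)
if [ -n "$missing" ]; then
  echo "Missing required tools: $missing" >&2
fi
```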

Deployment

Local Test Deployment

Profile test-local-3nodes

This test profile deploys 3 MedCo nodes on a single machine for test purposes. It can be used either on your local machine or on any other machine to which you have access. The Docker images used are the latest released versions. For example, this profile is used for the MedCo public demo.

MedCo Node Deployment (except IRCT)

The first step is to download the latest release of MedCo Deployment:

$ cd ~
$ wget https://github.com/lca1/medco-deployment/archive/v0.1.1c.tar.gz
$ tar xvzf v0.1.1c.tar.gz
$ mv medco-deployment-0.1.1c medco-deployment

The next step is to download and build the Docker images:

$ cd ~/medco-deployment/compose-profiles/test-local-3nodes
$ docker-compose pull
$ docker-compose build

The final step is to run the nodes. They run simultaneously, and the console stays attached to the logs of the running containers. No configuration changes are needed in this scenario before running the nodes. To run them:

$ docker-compose up

Wait for the containers to finish initializing, i.e. until the message “i2b2-medco-srv… - Started x of y services (z services are lazy, passive or on-demand)” appears. This can take up to 10 minutes; subsequent startups are faster.
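
Instead of watching the console, the startup message can also be detected from a second terminal. This is a minimal sketch: the function names are hypothetical, the regular expression only assumes the “Started x of y services” wording quoted above, and the compose service name passed to wait_for_service (e.g. i2b2-medco-srv0, inferred from the log prefix) is an assumption to check against docker-compose ps.

```shell
#!/bin/sh
# Sketch: succeed if the log text on stdin contains the
# "Started x of y services" line printed once initialization is done.
is_started() {
  grep -Eq 'Started [0-9]+ of [0-9]+ services'
}

# Poll the logs of one compose service every 10 seconds until the message
# appears, giving up after a timeout in seconds (default 600).
# Run it from the compose profile folder.
wait_for_service() (
  service=$1
  timeout=${2:-600}
  waited=0
  until docker-compose logs --no-color "$service" 2>/dev/null | is_started; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $service" >&2
      exit 1
    fi
    sleep 10
    waited=$((waited + 10))
  done
  echo "$service is up"
)
```

For example, from ~/medco-deployment/compose-profiles/test-local-3nodes:

$ wait_for_service i2b2-medco-srv0 600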

IRCT Deployment and Configuration

The first step is to clone the IRCT repository with the correct branch.

$ cd ~
$ git clone -b MedCo-v0.1.1 https://github.com/lca1/IRCT.git

Currently, IRCT must be deployed separately (this will change in the future):

$ cd ~/IRCT/deployments
$ docker-compose -f docker-compose.medco.test-local-3nodes.yml build

Next, if you are running on a machine other than the local host, a configuration file must be changed; on the local host, the default settings can be kept. Edit the file ~/medco-deployment/compose-profiles/test-local-3nodes/.env to reflect your configuration. For example:

MEDCO_NODE_URL=https://medco-demo.epfl.ch
HTTP_SCHEME=https

MEDCO_NODE_URL should include the protocol and the fully qualified domain name of the host, and HTTP_SCHEME should be either http or https.
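
The edit can also be scripted. The helper below is a hypothetical sketch (not part of medco-deployment) that replaces an existing KEY=VALUE line or appends it if absent; it assumes the value contains no | or & characters, which sed would interpret.

```shell
#!/bin/sh
# Sketch (hypothetical helper): set KEY=VALUE in a .env file.
# An existing "KEY=..." line is replaced; otherwise the line is appended.
# Assumes VALUE contains no "|" or "&", which sed would interpret.
set_env_var() (
  file=$1 key=$2 value=$3
  if grep -q "^$key=" "$file"; then
    # Write through a temporary file so this works with GNU and BSD sed.
    sed "s|^$key=.*|$key=$value|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
)
```

Usage with the example values above:

$ env_file=~/medco-deployment/compose-profiles/test-local-3nodes/.env
$ set_env_var "$env_file" MEDCO_NODE_URL https://medco-demo.epfl.ch
$ set_env_var "$env_file" HTTP_SCHEME https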

Follow HTTPS Configuration to set up the certificates needed for HTTPS. If you are deploying without HTTPS on a host other than the local host, see Disabling HTTPS requirement for external connections.

In a separate terminal run the IRCT container:

$ chmod -R a+rw ../
$ docker-compose -f docker-compose.medco.test-local-3nodes.yml up

Again, the initial startup takes a few minutes, as IRCT is compiled at that point (wait for the message “irct_1… - Started x of y services (z services are lazy, passive or on-demand)”).

In order to stop the containers, simply hit Ctrl+C in all the active windows.

Keycloak Configuration

Follow the instructions from Keycloak Configuration; you should then be able to log in to Glowing Bear.

Test the deployment

In order to test that the local test deployment of MedCo is working, access Glowing Bear in your web browser at http://<domain name> (or https) and use the credentials previously configured during the Keycloak Configuration. If you are new to Glowing Bear you can watch the Glowing Bear user interface walkthrough video.

By default, MedCo loads specific test data; refer to Description of the default test data for the expected query results. To load a dataset, follow the guide Loading Data. For reference, the database address (host) to use during loading is <domain name>:5432, with the databases i2b2medcosrv0, i2b2medcosrv1 and i2b2medcosrv2.

Network Test Deployment

Profile test-network

This test profile deploys an arbitrary set of MedCo nodes independently on different machines, which together form a MedCo network. This deployment assumes that each node is deployed on a single dedicated machine. All the machines must be reachable from each other. The nodes must agree beforehand on a network name and on individual indexes (used to assign each node a UID). The following steps must be executed individually by each node of the network.

This guide is for the latest released version of the docker images.

Preliminaries

The first step is to download the latest release of MedCo Deployment on each node:

$ cd ~
$ wget https://github.com/lca1/medco-deployment/archive/v0.1.1b.tar.gz
$ tar xvzf v0.1.1b.tar.gz
$ mv medco-deployment-0.1.1b medco-deployment

Generation of the Deployment Profile

Next the compose and configuration profiles must be generated using a script. This script is executed in two steps.

  • Step 1: each node generates its keys and certificates, and shares its public information with the other nodes
  • Step 2: each node collects the public keys and certificates of all the other nodes

For step 1, the network name must be the same for all the nodes. <node domain name> is the domain name of the machine on which the node is deployed. As mentioned before, the parties must have agreed beforehand on the members of the network and assigned each node an index to construct its UID (from 0 to n-1, n being the total number of nodes).

$ cd ~/medco-deployment/resources/profile-generation-scripts/test-network
$ bash step1.sh <network name> <node index> <node domain name>

This script generates part of the configuration profile, including a file srv<node index>-public.tar.gz. This file must be shared with the other nodes, and each of them must place it in its own configuration profile folder, ~/medco-deployment/configuration-profiles/test-network-<network name>-node<node index> (where <node index> is the receiving node's own index), so that the files inside srv<node index>-public.tar.gz are in the same location on every node.
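
Before running step 2, it can help to confirm that the public archives of all n nodes are present. This sketch assumes the srv<i>-public.tar.gz files are copied unextracted into the configuration profile folder and that indexes run from 0 to n-1; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: check that DIR contains srv<i>-public.tar.gz for every node
# index i from 0 to N-1, reporting each missing archive on stderr.
check_public_archives() (
  dir=$1 n=$2 i=0 status=0
  while [ "$i" -lt "$n" ]; do
    if [ ! -f "$dir/srv$i-public.tar.gz" ]; then
      echo "missing: srv$i-public.tar.gz" >&2
      status=1
    fi
    i=$((i + 1))
  done
  exit "$status"
)
```

For example, with n being the total number of nodes:

$ check_public_archives ~/medco-deployment/configuration-profiles/test-network-<network name>-node<node index> <n> && bash step2.sh <network name> <node index>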

Once this is done, step 2 can be executed:

$ bash step2.sh <network name> <node index>

The deployment profile is now ready to be used.

MedCo Node Deployment (except IRCT)

Next, download and build the Docker images, and run the node:

$ cd ~/medco-deployment/compose-profiles/test-network-<network name>-node<node index>
$ docker-compose pull
$ docker-compose build
$ docker-compose up

Wait for the containers to finish initializing, i.e. until the message “- Started x of y services (z services are lazy, passive or on-demand)” appears. This can take up to 10 minutes; subsequent startups are faster.

IRCT Deployment and Configuration

Currently IRCT must be configured manually and deployed separately in each of the nodes. This will change in the future.

$ cd ~
$ git clone -b MedCo-v0.1.1 https://github.com/lca1/IRCT.git

Edit the file ~/IRCT/deployments/.env and adjust for each node:

MEDCO_NODE_URL=https://<node domain name>
MEDCO_NODE_IDX=<node index>
MEDCO_PROFILE_NAME=test-network-<network name>-node<node index>
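
To avoid inconsistencies between nodes, the three values can be derived from the network name, node index and node domain name. The hypothetical helper below only prints the lines; copy them into ~/IRCT/deployments/.env, replacing the existing values.

```shell
#!/bin/sh
# Sketch: print the IRCT .env values for one node of a test-network
# deployment. Arguments: <network name> <node index> <node domain name>.
irct_env() {
  printf 'MEDCO_NODE_URL=https://%s\n' "$3"
  printf 'MEDCO_NODE_IDX=%s\n' "$2"
  printf 'MEDCO_PROFILE_NAME=test-network-%s-node%s\n' "$1" "$2"
}
```

For example, for node 1 of a network named mynetwork (hypothetical names):

$ irct_env mynetwork 1 node1.example.org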

Copy all the certificates obtained from the previous step to the folder ~/IRCT/deployments/irct/volumes/certificates/:

$ cp ~/medco-deployment/configuration-profiles/test-network-<network name>-node<node index>/*.crt ~/IRCT/deployments/irct/volumes/certificates/

Then, build and run the IRCT container:

$ cd ~/IRCT/deployments
$ docker-compose -f docker-compose.medco.test-network.yml build
$ chmod -R a+rw ../
$ docker-compose -f docker-compose.medco.test-network.yml up

Use the pgAdmin tool to add the IRCT configuration (see The PostgreSQL database). With the query tool, execute the following SQL on the irct database, adapting the values to your network:

select add_i2b2_medco_resource(
    'i2b2-medco-test-network',
    'https://<node 0 domain name>/i2b2/services/,https://<node 1 domain name>/i2b2/services/,...',
    'i2b2medco,i2b2medco,i2b2medco',
    'medcouser',
    'demouser',
    'true',
    'false',
    'edu.harvard.hms.dbmi.bd2k.irct.ri.medco.I2B2MedCoResourceImplementation',
    'TREE'
);
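
The second and third arguments grow with the number of nodes. The sketch below builds them from the node domain names, given in node index order, and prints the complete statement with the remaining arguments copied unchanged from above. It assumes one i2b2medco entry per node, as in the three-node example; the function names are hypothetical.

```shell
#!/bin/sh
# One "https://<domain>/i2b2/services/" entry per node, comma-separated.
medco_url_list() (
  sep=""
  for domain in "$@"; do
    printf '%s%s' "$sep" "https://$domain/i2b2/services/"
    sep=","
  done
  printf '\n'
)

# One "i2b2medco" entry per node (assumption based on the example above).
medco_domain_list() (
  sep=""
  for node in "$@"; do
    printf '%si2b2medco' "$sep"
    sep=","
  done
  printf '\n'
)

# Print the full statement to paste into the pgAdmin query tool.
medco_resource_sql() {
  cat <<EOF
select add_i2b2_medco_resource(
    'i2b2-medco-test-network',
    '$(medco_url_list "$@")',
    '$(medco_domain_list "$@")',
    'medcouser',
    'demouser',
    'true',
    'false',
    'edu.harvard.hms.dbmi.bd2k.irct.ri.medco.I2B2MedCoResourceImplementation',
    'TREE'
);
EOF
}
```

For example, for three nodes (hypothetical domain names):

$ medco_resource_sql node0.example.org node1.example.org node2.example.org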

Finally, restart IRCT to apply the new configuration: press Ctrl+C in the IRCT terminal and start it again:

$ docker-compose -f docker-compose.medco.test-network.yml up

In order to stop the containers, simply hit Ctrl+C in all the active windows.

Keycloak Configuration

Follow the instructions from Keycloak Configuration; you should then be able to log in to Glowing Bear.

Data Loading

Unlike with the other deployment profiles, the default test data does not work here (queries on it fail), since it is not encrypted with the newly generated collective key (the encryption key derived from all the nodes’ public keys). Run the MedCo loader (see Loading Data) to be able to test this deployment. For reference, the database address (host) to use during loading is <domain name>:5432 and the database is i2b2medco.

Test the deployment

In order to test that the network deployment of MedCo is working, access Glowing Bear in your web browser at http://<node domain name> and use the credentials previously configured during the Keycloak Configuration. If you are new to Glowing Bear you can watch the Glowing Bear user interface walkthrough video.

Note that by default the certificates generated by the script are self-signed and thus, when using Glowing Bear, the browser will issue a security warning. To use your own valid certificates, see HTTPS Configuration.

Local Development Deployment

Profile dev-local-3nodes

This deployment profile deploys 3 MedCo nodes on a single machine for development purposes. It is meant to be used only on your local machine, i.e. localhost. The Docker images used are all dev versions, i.e. built from the development versions of the different source codes. They are either pulled from Docker Hub or built locally.

MedCo Node Deployment (except IRCT)

The first step is to clone the medco-deployment repository with the correct branch. This example clones it into the home directory of the current user, but another location can be used.

$ cd ~
$ git clone -b dev https://github.com/lca1/medco-deployment.git

The next step is to download or build the Docker images:

$ cd ~/medco-deployment/compose-profiles/dev-local-3nodes
$ docker-compose pull
$ docker-compose build

The next step is to run the nodes. They run simultaneously, and the console stays attached to the logs of the running containers. No configuration changes are needed in this scenario before running the nodes. To run them:

$ docker-compose up

Wait for the containers to finish initializing, i.e. until the message “i2b2-medco-srv… - Started x of y services (z services are lazy, passive or on-demand)” appears. This can take up to 10 minutes; subsequent startups are faster.

IRCT Deployment and Configuration

The first step is to clone the IRCT repository with the correct branch.

$ cd ~
$ git clone -b fork/thehyve https://github.com/lca1/IRCT.git

Currently, IRCT must be deployed separately (this will change in the future):

$ cd ~/IRCT/deployments
$ docker-compose -f docker-compose.medco.dev-local-3nodes.yml build

In a separate terminal run the IRCT container:

$ chmod -R a+rw ../
$ docker-compose -f docker-compose.medco.dev-local-3nodes.yml up

Again, the initial startup takes a few minutes, as IRCT is compiled at that point (wait for the message “irct_1… - Started x of y services (z services are lazy, passive or on-demand)”).

Glowing Bear Deployment and Configuration

The first step is to clone the glowing-bear repository with the correct branch.

$ cd ~
$ git clone -b picsure https://github.com/lca1/glowing-bear-medco.git

Glowing Bear is deployed separately for development, as we use its very practical development server:

$ cd ~/glowing-bear-medco/deployment
$ docker-compose build dev-server

In yet another terminal, run the Glowing Bear development server:

$ docker-compose up dev-server

In order to stop the containers, simply hit Ctrl+C in all the active windows.

Keycloak Configuration

Follow the instructions from Keycloak Configuration; you should then be able to log in to Glowing Bear.

Test the deployment

In order to test that the development deployment of MedCo is working, access Glowing Bear in your web browser at http://localhost:4200 and use the credentials previously configured during the Keycloak Configuration. If you are new to Glowing Bear you can watch the Glowing Bear user interface walkthrough video.

By default, MedCo loads specific test data; refer to Description of the default test data for the expected query results. To load a dataset, follow the guide Loading Data. For reference, the database address (host) to use during loading is localhost:5432, with the databases i2b2medcosrv0, i2b2medcosrv1 and i2b2medcosrv2.

These pages explain how to deploy MedCo in different scenarios. Each deployment scenario corresponds to a deployment profile, as described below. All these instructions use the deployment scripts from the medco-deployment repository.

If you are new to MedCo…

… and want to try to deploy the system on a single machine to test it, you should follow the Local Test Deployment guide.

… and want to create or join a MedCo network, you should follow the Network Test Deployment guide.

… and want to develop around MedCo, you should follow the Local Development Deployment guide.

Deployment Profiles

A deployment profile is composed of two things:

  • a compose profile in ~/medco-deployment/compose-profiles/<profile name>/: docker-compose file and parameters like ports to expose, log level, etc.
  • a configuration profile in ~/medco-deployment/configuration-profiles/<profile name>/: files mounted in the docker containers, containing the cryptographic keys, the certificates, etc.

Some profiles are provided by default, for development or testing purposes. Those must not be used in a production scenario with real data, as their private keys are set by default and are thus not private. Other types of profiles must be generated using the scripts in ~/medco-deployment/resources/profile-generation-scripts/<profile name>/.

The different profiles are the following:

  • test-local-3nodes (Local Test Deployment)

    • for test on a single machine (used by the MedCo live demo)
    • 3 nodes on any host
    • using the latest release of the source codes
    • no debug logging
    • profile pre-generated
  • test-network (Network Test Deployment)

    • for test on several different hosts
    • a single node on a host part of a MedCo network
    • using the latest release of the source codes
    • no debug logging
    • profile must be generated prior to use with the provided scripts
  • dev-local-3nodes (Local Development Deployment)

    • for software development
    • 3 nodes on the local host
    • using development version of source codes
    • debug logging enabled
    • profile pre-generated

The database is pre-loaded with some encrypted test data using a key that is pre-generated from the combination of all the participating nodes’ public keys. For the test-network deployment profile this data will not be correctly encrypted, since the public key of each node is generated independently, and, as such, the data must be re-loaded.

Configuration

Keycloak Configuration

Here follow some MedCo-specific instructions for the administration of Keycloak. For anything else, please refer to the Keycloak Server Administration Guide.

Accessing the web administration interface

In the case of the development profile dev-local-3nodes (i.e. without reverse proxy), the address is http://localhost:8081/auth/admin. In the other cases (with the reverse proxy), the address is http://<node domain name>/auth/admin. The credentials are:

  • User keycloak
  • Password keycloak by default, or whatever else was configured at the initial deployment.

Disabling HTTPS requirement for external connections

When deploying the test-local-3nodes profile without HTTPS on a machine other than localhost, the administration interface will refuse to load. To solve this, access pgAdmin (see The PostgreSQL database) and execute the following SQL on the keycloak database:

update REALM set ssl_required = 'NONE' where id = 'master';

You need to restart the Keycloak docker container to enable the changes.

Manually add an authorized user

  • Go to the configuration panel Users and click on Add user.
  • Fill in the Username field, toggle the Email Verified button to ON and click Save.
  • In the next window, click on Credentials, enter the user’s password twice, toggle the Temporary button to OFF if desired, and click Reset Password.

Add the default OpenID Connect client configuration for MedCo

  • Go to the configuration panel Clients and click on Create.
  • In Client ID, specify the value i2b2-local (or another value if previously configured) and click Save.
  • In the next window, fill in Valid Redirect URIs and Web Origins according to the deployment profile, as listed below, and click Save.

  • test-local-3nodes

    • Valid Redirect URIs: http(s)://<node domain name>/glowing-bear
    • Web Origins: http(s)://<node domain name>
  • test-network

    • Valid Redirect URIs: https://<node domain name>/glowing-bear
    • Web Origins: https://<node domain name>
  • dev-local-3nodes

    • Valid Redirect URIs: http://localhost:4200
    • Web Origins: http://localhost:4200
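
When configuring several nodes, these values can be produced by a small lookup. This hypothetical helper prints the Valid Redirect URIs value followed by the Web Origins value; for test-local-3nodes it takes the scheme in use as an optional third argument, defaulting to http (an assumption: pass https if HTTPS is configured).

```shell
#!/bin/sh
# Sketch: print "<Valid Redirect URIs> <Web Origins>" for a deployment
# profile, following the values above.
# Arguments: <profile> <node domain name> [scheme, test-local-3nodes only]
keycloak_client_uris() {
  case "$1" in
    test-local-3nodes)
      scheme=${3:-http}  # assumption: http unless HTTPS was configured
      echo "$scheme://$2/glowing-bear $scheme://$2" ;;
    test-network)
      echo "https://$2/glowing-bear https://$2" ;;
    dev-local-3nodes)
      echo "http://localhost:4200 http://localhost:4200" ;;
    *)
      echo "unknown profile: $1" >&2
      return 1 ;;
  esac
}
```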

HTTPS Configuration

HTTPS is supported for the profiles test-local-3nodes and test-network.

Certificate

The certificates are held in the configuration profile folder (e.g., ~/medco-deployment/configuration-profiles/test-local-3nodes):

  • certificate.key: private key
  • certificate.crt: certificate of own node
  • srv0-certificate.crt, srv1-certificate.crt, …: certificates of all nodes of the network
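
When replacing certificate.key and certificate.crt, a mismatched pair is an easy mistake. The sketch below compares the public key embedded in the certificate (openssl x509 -pubkey) with the one derived from the private key (openssl pkey -pubout); it assumes an unencrypted PEM private key, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if the certificate (first argument) and the unencrypted
# PEM private key (second argument) share the same public key.
cert_matches_key() (
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null) || exit 1
  key_pub=$(openssl pkey -in "$2" -pubout 2>/dev/null) || exit 1
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
)
```

For example, in the configuration profile folder, also printing the certificate subject and expiry date:

$ cert_matches_key certificate.crt certificate.key && echo match
$ openssl x509 -in certificate.crt -noout -subject -enddate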

Enable HTTPS for the Test Local Deployment

To enable HTTPS for the profile test-local-3nodes, replace the files certificate.key and certificate.crt from the configuration profile folder with your own versions. Such a certificate can be obtained for example through Let’s Encrypt.

Then edit the file .env of the compose profile, replace http with https (in MEDCO_NODE_URL and HTTP_SCHEME), and restart the deployment.
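
The .env change can be scripted as follows. This sketch assumes the file contains the MEDCO_NODE_URL and HTTP_SCHEME variables shown in the Local Test Deployment guide; other lines are left untouched, and running it twice is harmless. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: switch MEDCO_NODE_URL and HTTP_SCHEME in a compose profile .env
# from http to https. Writes through a temporary file so it works with
# both GNU and BSD sed.
enable_https_env() (
  file=$1
  sed -e 's|^MEDCO_NODE_URL=http://|MEDCO_NODE_URL=https://|' \
      -e 's|^HTTP_SCHEME=http$|HTTP_SCHEME=https|' \
      "$file" > "$file.tmp" && mv "$file.tmp" "$file"
)
```

For example, followed by stopping the deployment with Ctrl+C and running docker-compose up again:

$ cd ~/medco-deployment/compose-profiles/test-local-3nodes
$ enable_https_env .env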

Configure HTTPS for the Test Network Deployment

Coming soon

The PostgreSQL database

Administration with PgAdmin

PgAdmin can be accessed through http://<node domain name>/pgadmin with username admin and password admin (by default). To access the test database just create a server with the name MedCo, the address postgresql, username postgres and password postgres.

Figure: pgAdmin

Loading Data

v0 (Genomic Data)

The v0 loader expects an ontology, along with mutation and clinical data, in the MAF format. As the ontology data, you must use ~/medco-loader/data/genomic/tcga_cbio/clinical_data.csv and ~/medco-loader/data/genomic/tcga_cbio/mutation_data.csv. For the clinical and mutation data to load, you can keep using the same two files or a subset of the data (e.g. 8_clinical_data.csv). More information about how to generate sample data files can be found below. After the following script is executed, all the data is encrypted and ‘deterministically tagged’ in compliance with the MedCo data model.

Loading from the same host

If you are using the same host machine to deploy MedCo and to load the data, use the values below to adapt some of the script parameters to the deployment scenario. This includes the test-network scenario where each node loads data from its own hosting machine. Repeat the loading process for all nodes, modifying the arguments “network”, “entryPointIdx” and “dbName”.

  • test-local-3nodes

    • --network: test-local-3nodes_medco-network and test-local-3nodes_medco-srv<node index>
    • -v (volumes): ~/medco-loader/data/genomic:/dataset and ~/medco-deployment/configuration-profiles/test-local-3nodes/group.toml:/group.toml
    • --dbHost: postgresql
    • --dbName: i2b2medcosrv<node index>
  • test-network

    • --network: test-network-<network name>-node<node index>_default
    • -v (volumes): ~/medco-loader/data/genomic:/dataset and ~/medco-deployment/configuration-profiles/test-network-<network name>-node<node index>/group.toml:/group.toml
    • --dbHost: postgresql
    • --dbName: i2b2medco
  • dev-local-3nodes

    • --network: dev-local-3nodes_medco-network and dev-local-3nodes_medco-srv<node index>
    • -v (volumes): ~/medco-loader/data/genomic:/dataset and ~/medco-deployment/configuration-profiles/dev-local-3nodes/group.toml:/group.toml
    • --dbHost: postgresql
    • --dbName: i2b2medcosrv<node index>
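
When loading several nodes, the values above can be looked up per node. This is a hypothetical helper that prints one name=value line per column (space-separated when a column has two values); the optional fourth argument selects the data folder under ~/medco-loader/data, genomic by default, as an assumption for reuse with the v1 loader (i2b2).

```shell
#!/bin/sh
# Sketch: print the same-host loading parameters listed above for one node.
# Arguments: <profile> <node index> [<network name>] [<data folder>]
# The data folder defaults to "genomic" (v0); use "i2b2" for the v1 loader.
loader_params() (
  profile=$1 idx=$2 network_name=$3 data=${4:-genomic}
  conf="$HOME/medco-deployment/configuration-profiles"
  case "$profile" in
    test-local-3nodes|dev-local-3nodes)
      echo "network=${profile}_medco-network ${profile}_medco-srv$idx"
      echo "volumes=$HOME/medco-loader/data/$data:/dataset $conf/$profile/group.toml:/group.toml"
      echo "dbHost=postgresql"
      echo "dbName=i2b2medcosrv$idx" ;;
    test-network)
      p="test-network-$network_name-node$idx"
      echo "network=${p}_default"
      echo "volumes=$HOME/medco-loader/data/$data:/dataset $conf/$p/group.toml:/group.toml"
      echo "dbHost=postgresql"
      echo "dbName=i2b2medco" ;;
    *)
      echo "unknown profile: $profile" >&2
      exit 1 ;;
  esac
)
```

For example, for node 1 of the development deployment:

$ loader_params dev-local-3nodes 1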

Loading from a different host

If you are using an external machine (e.g. your laptop) to load the data into one of the nodes, use the values below to adapt some of the script parameters to the deployment scenario. In this case, you do not need to specify the --network parameters. Repeat the loading process for all nodes, modifying the arguments “entryPointIdx”, “dbName” and, for test-network, “dbHost”.

  • test-local-3nodes

    • -v (volumes): ~/medco-loader/data/genomic:/dataset and ~/medco-deployment/configuration-profiles/test-local-3nodes/group.toml:/group.toml
    • --dbHost: <domain name>
    • --dbName: i2b2medcosrv<node index>
  • test-network

    • -v (volumes): ~/medco-loader/data/genomic:/dataset and ~/medco-deployment/configuration-profiles/test-network-<network name>-node<node index>/group.toml:/group.toml
    • --dbHost: <domain name>
    • --dbName: i2b2medco

Example

The following example loads data into node 0 of a running MedCo development deployment (dev-local-3nodes). For the two other nodes, adapt the arguments network, entryPointIdx and dbName accordingly.

cd ~/medco-loader/deployment
docker run --network="dev-local-3nodes_medco-network" --network="dev-local-3nodes_medco-srv0" \
    -v ~/medco-loader/data/genomic:/dataset \
    -v ~/medco-deployment/configuration-profiles/dev-local-3nodes/group.toml:/group.toml \
    medco/medco-loader:v0.1.1 medco-loader -debug 2 v0 --group /group.toml --entryPointIdx 0 \
    --ont_clinical /dataset/tcga_cbio/8_clinical_data.csv --sen /dataset/sensitive.txt \
    --ont_genomic /dataset/tcga_cbio/8_mutation_data.csv --clinical /dataset/tcga_cbio/8_clinical_data.csv \
    --genomic /dataset/tcga_cbio/8_mutation_data.csv --output /dataset/ --dbHost postgresql --dbPort 5432 \
    --dbName i2b2medcosrv0 --dbUser i2b2 --dbPassword i2b2

Explanation of the arguments:

NAME:
    medco-loader v0 - Load genomic data (e.g. tcga_bio dataset)

USAGE:
    medco-loader v0 [command options] [arguments...]

OPTIONS:
    --group value, -g value               UnLynx group definition file
    --entryPointIdx value, --entry value  Index (relative to the group definition file) of the collective authority server to load the data
    --sensitive value, --sen value        File containing a list of sensitive concepts
    --dbHost value, --dbH value           Database hostname
    --dbPort value, --dbP value           Database port (default: 0)
    --dbName value, --dbN value           Database name
    --dbUser value, --dbU value           Database user
    --dbPassword value, --dbPw value      Database password
    --ont_clinical value, --oc value      Clinical ontology to load
    --ont_genomic value, --og value       Genomic ontology to load
    --clinical value, --cl value          Clinical file to load
    --genomic value, --gen value          Genomic file to load
    --output value, -o value              Output path to the .csv files

Data Manipulation

Inside ~/medco-loader/data/scripts/ you can find a small Python application to extract (or replicate) data from the original tcga_cbio dataset. You can decide which patients you want to include in your ‘new’ dataset or simply pick a random sample.

To check that the loading worked, you can query for:

-> MedCo Genomic Ontology -> Gene Name -> BRPF3

For the small dataset 8_xxxx, you should obtain 3 matching subjects (one at each site).

v1 (I2B2 Demodata)

The v1 loader expects an already existing i2b2 database (in .csv format) that will be converted in a way that is compliant with the MedCo data model. This involves encrypting and ‘deterministically tagging’ some of the data.

List of input (‘original’) files:

  • all i2b2metadata files (e.g. i2b2.csv)
  • dummy_to_patient.csv
  • patient_dimension.csv
  • visit_dimension.csv
  • concept_dimension.csv
  • modifier_dimension.csv
  • observation_fact.csv
  • table_access.csv
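
A quick presence check for these input files can save a failed loader run. The sketch assumes all the files sit directly in one folder (your files.toml may point elsewhere) and skips the i2b2metadata files, whose names vary; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: check that DIR contains the fixed-name i2b2 input files of the
# v1 loader, reporting each missing file on stderr.
check_v1_inputs() (
  dir=$1 status=0
  for f in dummy_to_patient.csv patient_dimension.csv visit_dimension.csv \
           concept_dimension.csv modifier_dimension.csv observation_fact.csv \
           table_access.csv; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f" >&2
      status=1
    fi
  done
  exit "$status"
)
```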

Loading in the same host

If you are using the same host machine to deploy MedCo and to load the data, use the values below to adapt some of the script parameters to the deployment scenario. This includes the test-network scenario where each node loads data from its own hosting machine. Repeat the loading process for all nodes, modifying the arguments “network”, “entryPointIdx” and “dbName”.

  • test-local-3nodes

    • --network: test-local-3nodes_medco-network and test-local-3nodes_medco-srv<node index>
    • -v (volumes): ~/medco-loader/data/i2b2:/dataset and ~/medco-deployment/configuration-profiles/test-local-3nodes/group.toml:/group.toml
    • --dbHost: postgresql
    • --dbName: i2b2medcosrv<node index>
  • test-network

    • --network: test-network-<network name>-node<node index>_default
    • -v (volumes): ~/medco-loader/data/i2b2:/dataset and ~/medco-deployment/configuration-profiles/test-network-<network name>-node<node index>/group.toml:/group.toml
    • --dbHost: postgresql
    • --dbName: i2b2medco
  • dev-local-3nodes

    • --network: dev-local-3nodes_medco-network and dev-local-3nodes_medco-srv<node index>
    • -v (volumes): ~/medco-loader/data/i2b2:/dataset and ~/medco-deployment/configuration-profiles/dev-local-3nodes/group.toml:/group.toml
    • --dbHost: postgresql
    • --dbName: i2b2medcosrv<node index>

Loading in a different host

If you are using an external machine (e.g. your laptop) to load the data into one of the nodes, use the values below to adapt some of the script parameters to the deployment scenario. In this case, you do not need to specify the --network parameters. Repeat the loading process for all nodes, modifying the arguments “entryPointIdx”, “dbName” and, for test-network, “dbHost”.

  • test-local-3nodes

    • -v (volumes): ~/medco-loader/data/i2b2:/dataset and ~/medco-deployment/configuration-profiles/test-local-3nodes/group.toml:/group.toml
    • --dbHost: <domain name>
    • --dbName: i2b2medcosrv<node index>
  • test-network

    • -v (volumes): ~/medco-loader/data/i2b2:/dataset and ~/medco-deployment/configuration-profiles/test-network-<network name>-node<node index>/group.toml:/group.toml
    • --dbHost: <domain name>
    • --dbName: i2b2medco

Dummy Generation

The provided example data set files come with dummy data pre-generated. Those data are random dummy entries whose purpose is to prevent frequency attacks. For more information on how this dummy generation is done please refer to ~/medco-loader/data/scripts/import-tool/report/report.pdf. In a future release, the generation will be done dynamically by the loader.

Example

The following example loads data into node 0 of a running MedCo development deployment (dev-local-3nodes). For the two other nodes, adapt the arguments network, entryPointIdx and dbName accordingly.

cd ~/medco-loader/deployment
docker run --network="dev-local-3nodes_medco-network" --network="dev-local-3nodes_medco-srv0" \
    -v ~/medco-loader/data/i2b2:/dataset -v ~/medco-deployment/configuration-profiles/dev-local-3nodes/group.toml:/group.toml \
    medco/medco-loader:v0.1.1 medco-loader -debug 2 v1 --group /group.toml --entryPointIdx 0 --sen /dataset/sensitive.txt \
    --files /dataset/files.toml --dbHost postgresql --dbPort 5432 --dbName i2b2medcosrv0 --dbUser i2b2 --dbPassword i2b2

Explanation of the arguments:

NAME:
    medco-loader v1 - Convert existing i2b2 data model

USAGE:
    medco-loader v1 [command options] [arguments...]

OPTIONS:
    --group value, -g value               UnLynx group definition file
    --entryPointIdx value, --entry value  Index (relative to the group definition file) of the collective authority server to load the data
    --sensitive value, --sen value        File containing a list of sensitive concepts
    --dbHost value, --dbH value           Database hostname
    --dbPort value, --dbP value           Database port (default: 0)
    --dbName value, --dbN value           Database name
    --dbUser value, --dbU value           Database user
    --dbPassword value, --dbPw value      Database password
    --files value, -f value               Configuration toml with the path of the all the necessary i2b2 files
    --empty, -e                           Empty patient and visit dimension tables (y/n)

To check that it is working you can query for:

-> Diagnoses -> Neoplasm -> Benign neoplasm -> Benign neoplasm of breast

You should obtain 2 matching subjects.

The current version offers two different loading alternatives: (v0) loading of clinical and genomic data based on MAF datasets, and (v1) loading of generic i2b2 data. Currently, each of these two loaders supports one dataset: the tcga_cbio dataset for v0 and the i2b2 demodata for v1.

Future releases of this software will allow for other arbitrary data sources, given that they follow a specific structure (e.g. BAM format).

Pre-Requisites

First, get the repository containing the MedCo loader software, which already contains some test data for you to work with. Note that you need Git LFS for these data to be retrieved with the repository.

$ cd ~
$ git clone -b v0.1.1 https://github.com/lca1/medco-loader.git
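
If Git LFS was not installed when cloning, the data files are small text pointers instead of the real data. This sketch lists such files by checking whether their first line is the version line of the Git LFS pointer format (version https://git-lfs.github.com/spec/v1); the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list files under a folder that are still Git LFS pointer files,
# i.e. whose first line is the pointer format's version line.
lfs_pointers() {
  find "$1" -type f | while IFS= read -r f; do
    if [ "$(head -n 1 "$f")" = "version https://git-lfs.github.com/spec/v1" ]; then
      echo "$f"
    fi
  done
}
```

If the following command lists any file, run git lfs install and then git lfs pull inside ~/medco-loader:

$ lfs_pointers ~/medco-loader/data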

Building Application

To get the MedCo loader application, pull it with Docker:

docker pull medco/medco-loader:v0.1.1

Network Architecture

Figure: MedCo network architecture

External Entities

Entities that need to connect to a machine running MedCo can be categorized as follows:

  • System administrators: persons administrating the MedCo node. Likely to remain inside the clinical site internal network.
  • End-users: researchers using MedCo to access the shared data. Likely to remain inside the clinical site internal network.
  • Other MedCo nodes: MedCo nodes belonging to other clinical sites of the network.

Firewall Ports Opening

The following ports should be accessible by the listed entities, which makes IP address white-listing possible:

  • Port 22, 5432 (TCP): System Administrators
  • Port 80 (TCP): End-Users (HTTP automatic redirect to HTTPS (443))
  • Port 443 (TCP): System Administrators, End-Users, Other MedCo Nodes
  • Ports 2000-2001 (TCP): Other MedCo Nodes
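
On hosts using ufw, the list above translates into per-source rules. This sketch only prints the commands so they can be reviewed before being applied (e.g. by piping them into sudo sh); ufw is one firewall option among others, and the three space-separated IP lists passed as arguments are placeholders to fill in.

```shell
#!/bin/sh
# Sketch: print ufw commands opening the MedCo ports per source, following
# the list above. Arguments are space-separated IP addresses or CIDR ranges:
#   <system administrators> <end-users> <other MedCo nodes>
medco_ufw_rules() {
  for ip in $1; do
    for port in 22 5432 443; do
      echo "ufw allow from $ip to any port $port proto tcp"
    done
  done
  for ip in $2; do
    for port in 80 443; do
      echo "ufw allow from $ip to any port $port proto tcp"
    done
  done
  for ip in $3; do
    for port in 443 2000:2001; do
      echo "ufw allow from $ip to any port $port proto tcp"
    done
  done
}
```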

This guide explains the deployment and configuration of MedCo instances.

Developer Guide

System Architecture

Figure: MedCo system architecture

Containers

medco-unlynx

The software executing the distributed cryptographic protocols, based on Unlynx.

i2b2-medco

The i2b2 stack (all the cells), with the addition of the MedCo i2b2 cell to process the queries. This cell communicates with medco-unlynx to execute the distributed cryptographic protocols.

irct

The query translation and broadcasting layer.

glowing-bear

Nginx web server serving Glowing Bear and the crypto module.

keycloak

OpenID Connect identity provider.

postgresql

The SQL database used by all the other services; it contains all the data.

pg-admin

A web-based administration tool for the PostgreSQL database.

nginx

Web server and (HTTPS-enabled) reverse proxy.

php-fpm

PHP processor running with FPM (FastCGI Process Manager), used by Nginx. Executes the PHP code needed to serve the genomic annotations.

Description of the default test data

Coming soon

If you are interested in developing around MedCo, the first thing you might want to do is to follow the Local Development Deployment guide to set up the development version of MedCo.

User Guide

Coming soon

Disclaimer: MedCo is still experimental software under development and should not, at this point, be used with real sensitive data.

Releases

  • 0.1.1, 23rd Jan. 2019
    Deployment for test purposes on several machines, enhancements of documentation and deployment infrastructure, Nginx reverse proxy with HTTPS support, Keycloak update.
  • 0.1, 1st Dec. 2018
    First public release of MedCo, running with i2b2 v1.7, PIC-SURE/IRCT v1.4 and centralized OpenID Connect authentication. Deployment for development and test purposes on a single machine.

Resources

Contact

For assistance with deploying MedCo or any other technical questions, send an email to medco-dev@listes.epfl.ch or to any of the contributors.

License

MedCo is licensed under an End User Software License Agreement (‘EULA’) for non-commercial use. If you need more information, please contact us.