Welcome to COMMAND>_’s documentation!¶
COMMAND>_ is a web-based application used to download, collect and manage gene expression data from public databases.
Main features are:
- Easy installation and update using Docker Compose technology.
- Graphical User Interface (GUI) for parsing and importing gene expression data.
- Default Python scripts for easy parsing/importing of the most common microarray platforms (Affymetrix, Nimblegen, two-color, etc.) and a dedicated scripting editor to allow flexible importing of any kind of gene expression data.
- Automatic pre-processing (downloading, trimming, mapping and counting) of bulk RNA-Seq data.
- Exporting of the collected data.
Note
Give it a try on https://command.fmach.it:4242 using:
- username:
guest
- password:
demo
Check out the Use Cases!
What is COMMAND>_?¶
COMMAND>_ [1] is an acronym that stands for COMpendia MANagement Desktop. It is the software used for the creation of several gene expression compendia such as COLOMBOS [2] and VESPUCCI [3]. Although it has been in use since 2010, it was made publicly available only in 2018, after having been completely rewritten. COMMAND>_ was originally conceived for the collection (and integration) of prokaryotic microarray experiments. Over time it has evolved to allow RNA-seq experiments to be imported and other species to be managed. With the current implementation COMMAND>_ is still meant for gene expression data collection, but it can easily be extended to support other kinds of quantitative measurement technologies (have a look at COMMAND>_ for developers).
COMMAND>_ is a Python web application developed using the Django framework for the backend, while the web interface has been developed using ExtJS with a look and feel typical of desktop applications. With COMMAND>_ you can search and download experiments from public gene expression databases, such as GEO, ArrayExpress or SRA, parse downloaded files to extract only the valuable information, preview parsed data and import experiment data into a database. The pivotal point is the use of custom Python scripts to mine only the relevant information. Scripts can be created or modified directly within the interface and are responsible for parsing input files and populating each part of the data model (see Database schema), i.e. measurement data and metadata for experiments, platforms and samples.
For microarray platforms probes need to be mapped to genes, but before this step the genes themselves have to be imported. COMMAND>_ allows you to perform both steps. The latter is simply a matter of uploading a FASTA file with gene sequences (see data_collection), while for the former a BLAST alignment followed by a two-step filtering is performed. In this way the microarray gets annotated with the latest available information, enhancing homogeneity since all microarrays are annotated using the same gene list (see also map_feature).
References
[1] Moretto, M. et al. (2019). First step toward gene expression data integration: transcriptomic data acquisition with COMMAND>_. BMC Bioinformatics, 20(1), 54.
[2] Moretto, M. et al. (2015). COLOMBOS v3.0: leveraging gene expression compendia for cross-species analyses. Nucleic Acids Research, 44(D1), D620-D623.
[3] Moretto, M. et al. (2016). VESPUCCI: exploring patterns of gene expression in grapevine. Frontiers in Plant Science, 7, 633.
Getting started with COMMAND>_¶
Getting my user id and password¶
If you are using the public COMMAND>_ instance on https://command.fmach.it:4242 you can login using:
- username:
guest
- password:
demo
This is a user with restricted privileges meant for demonstration purposes only. If you have your own running instance of COMMAND>_ (see Deploy) you will initially be able to log in using:
- username:
admin
- password:
admin
Now you can change the admin password, create new users and assign them privileges following the instructions in Admin.

Set up and select a compendium¶
The first thing to do is to create a new empty compendium. Go to Admin (top bar) > Compendium Manager > Create Compendium (bottom-left corner + icon) and follow the instructions at Admin.

Now that a new compendium has been set up you need to retrieve a FASTA file containing the gene ids and sequences for the species you want to study.
Tip
For example, you can visit the NCBI Nucleotide database and get the coding sequences for the organism of interest. This file is mandatory: it is used for BLASTing probes in microarray experiments and for mapping reads in RNA-Seq experiments. To import it into COMMAND>_ go to Data collection (on the top-left corner) > Biological features, then select Import biological features from the bottom-left + icon.

Now your gene annotation file has been imported and you can start looking for interesting experiments (both microarray and RNA-Seq) related to the organism of interest.
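If you prefer to script this retrieval, the following minimal sketch (not part of COMMAND>_) uses Biopython's Entrez module to download the coding sequences of an organism as a FASTA file; the accession used is only an example (E. coli K-12 MG1655), replace it with your organism of interest.

# fetch_cds.py -- illustrative sketch, not part of COMMAND>_
from Bio import Entrez

Entrez.email = "you@example.org"  # NCBI requires a contact e-mail

# rettype="fasta_cds_na" returns one FASTA record per annotated coding sequence
handle = Entrez.efetch(db="nuccore", id="U00096.3",
                       rettype="fasta_cds_na", retmode="text")
with open("coding_sequences.fasta", "w") as out:
    out.write(handle.read())
handle.close()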
Searching public databases¶
After a new empty compendium has been created and a species of interest selected the user can start looking for collections of samples (from one or more experiments) from public databases (GEO, ArrayExpress or, in case of RNA-Seq experiments, SRA): Go to > Data collection (on the top-left corner) then > Experiments > New Experiment (on the bottom right corner) > from public DB.

- In the Search options field of the dialog ‘Download from Public DB’ select the DB (here GEO) and the term of interest, either a description (e.g. Leukemia b-cell, Vitis vinifera, etc.) or directly a GSE ID.
- From the list select an experiment of interest and click the download button.
Tip
You can download multiple experiments at the same time.

- After a while, depending on the number of samples in the selected experiment(s), your experiment will be downloaded.
Tip
- Check Message log frequently.
- Inspect the Experiments section to see which experiments are available, yet to be parsed or already imported.
Now you can start parsing and importing some experiments (see Use Cases).
Message log¶
The Message Log page (Top > Options > Message Log) allows the user to keep an eye on every activity of COMMAND>_. Check it frequently!
Admin¶
The admin interface is visible only to admin users, who have complete access to COMMAND>_ functionalities and compendia.
Users/Group manager¶
The user menu allows you to create, remove and modify users. Moreover, an admin user can assign users to groups and set privileges for them. Group privileges are compendium-specific, i.e. we can, for example, restrict access to only some compendia and prevent users belonging to a group from seeing the others. For those compendia we can also limit some functionalities, for example we could prevent users from running Python scripts or importing experiments.
Compendium manager¶
From this page a compendium can be created, modified, deleted and initialized. From a technical point of view a compendium is nothing more than a database schema. When an admin user creates a new compendium they will be asked to add all the information necessary to connect to such a database.
Once the connection information is saved and the new compendium appears in the grid, it is possible to initialize it, i.e. to create the database schema.
Note
In this way it is possible to have compendia hosted on different database servers. If the database does not exist yet, COMMAND>_ can create it on the fly, but you will be asked to provide the username and password of a database admin user. The default database admin user is postgres with password postgres.
The Compendium Type section is read-only and at the moment contains only gene expression, since it is the only type of compendium you are allowed to create. To extend COMMAND>_ and allow other kinds of quantitative data to be collected, please have a look at COMMAND>_ for developers.
Deploy¶
COMMAND>_ is a complex application and relies on several other software components to work. To ease the deployment process a docker-compose.yml file is provided, so assuming you have a working Docker Compose environment, deployment is a matter of running a few commands.
In case you want to manually deploy COMMAND>_ in your environment there are more steps you will need to take care of, such as installing the web server, the DBMS, etc.
Requirements¶
Have a look at the requirements.txt file for details on COMMAND>_'s main dependencies.
COMMAND>_ also uses several external tools that you'll need to download separately.
Docker Compose¶
Assuming that you have Docker Compose correctly installed, you should be able to perform the following steps:
# 1. clone the repository
git clone https://github.com/marcomoretto/command.git
# 2. copy external dependencies (check figure below)
# 3. build
docker-compose build
# 4. start docker
docker-compose up -d
# 5. create database schema
docker-compose exec web python manage.py migrate
# 6. create admin user
docker-compose exec web python manage.py init_admin
# 7. create initial options
docker-compose exec web python manage.py init_options
# 8. create demo compendium
docker-compose exec web python manage.py init_demo_compendium demo
# 9. run daphne
docker-compose exec -d daphne daphne -b 0.0.0.0 -p 8001 cport.asgi:channel_layer
# 10. run worker
docker-compose exec -d worker python3 manage.py runworker
- That’s it! You should be able to point your browser to http://localhost and log into COMMAND>_ using:
- username:
admin
- password:
admin
Note
You should have the following directory structure for the external tools

Note
You might need to rename the directory from command to cport before doing step # 2.
Manual Deploy¶
One easy way to understand what you need to do to manually deploy COMMAND>_ is to have a look at 2 files:
- the Dockerfile
- the docker-compose.yml file
In a nutshell, after having installed and configured Nginx (or another web-server to run Django applications), PostgreSQL, Redis, RabbitMQ and Celery, you’ll have to run:
pip3 install --upgrade pip
pip3 install Cython==0.28.1
pip3 install -r requirements.txt
Now you should be ready to configure Django (check the documentation for details), create the database schema and run the application.
python manage.py migrate
python manage.py init_admin
python manage.py init_options
python manage.py init_demo_compendium demo
daphne -b 0.0.0.0 -p 8001 cport.asgi:channel_layer
python3 manage.py runworker
Note
COMMAND>_ is a Django application, so refer to the Django docs for database configuration: https://docs.djangoproject.com/en/1.11/ref/settings/
Database schema¶

Use Cases¶
In this section we show how to both parse and import experiments from various gene expression platforms, technologies and sources (both public databases and local files) using the provided default scripts.
Use Case - Affymetrix from GEO¶
Import Gene Annotations¶
We want to look for experiments related to Yeast: the Saccharomyces Genome Database is the proper choice for retrieving the sequences associated with Yeast's genes (from this link). Go to > Data collection (on the top left corner) then > Biological features > Import biological feature (+ symbol on the bottom left) > Type: FASTA, File name: select the annotation file you downloaded before > Import Biological features. Wait for the import to finish.


We start by selecting Experiments from Data collection (top left corner), then we highlight the experiment of interest (previously retrieved from GEO following Searching public databases), here GSE8536, an expression analysis study that investigates the response of Saccharomyces cerevisiae to stress throughout a 15-day wine fermentation.
Parse Experiment, Platform and Samples¶
Since we have a new platform (GPL90), never imported before into COMMAND>_ for this compendium, we retrieve the sequences associated with the Affymetrix probe ids (YG_S98 probes) for this platform from the Affymetrix Support website.
From Experiments (Data collection menu) we highlight the selected experiment (GSE8536 here) and click Parse/Import experiment on the bottom bar. On the main window you can see that the Experiment tab is populated with metadata gathered from the public DB (GEO here). Now we can start parsing the Experiment, the Platform(s) and the Samples.


Since this is a dataset retrieved from GEO, we take advantage of the downloaded .soft file (see the GEO documentation for a description of this type of file):
Select GSE8536_family.soft and click the Use assignment script to assign files to experiment entities icon on the bottom-right. A dialog will show up:
- Script > assign_all.py > Only selected files
- Experiment tab > Script: > soft_experiment.py, Execution order: 1
- Platform tab > Script: > soft_platform.py, Execution order: 1
- Sample tab > Script: > soft_sample.py, Execution order: 1
- Run assignment script

Now, in order to parse the new platform, we are going to use the sequences associated with the Affymetrix probe ids we have already downloaded. We import the annotation (YG_S98 probes) in the File assignment section of Experiment files by clicking the upload icon on the bottom of the page.
Now we associate the file to the platform:
- In Experiment files Section > File Assignment select the uploaded file (YG_S98.probe_tab) and click Use assignment script to assign files to experiment entities. On the Assign files dialog:
- Script: assign_all.py
- Param:
- Only selected files checked (default)
- Platform tab > Script: gpr_platform.py , Parameters: 0,Probe X|Probe Y,Probe Sequence , Execution order: 2
- Run assignment script

Note
- The Parameters assigned to the gpr_platform.py script specify not to skip any line, to use the combination of the Probe X and Probe Y columns to create a unique id for the CEL files, and to indicate that the probe sequences are in the Probe Sequence column.
- The parsing of the Platform is a one-time procedure: from now on we can use this platform for all related experiments.
Now we parse the Affymetrix cel files (sample files):
- In Experiment files Section > File Assignment we use CEL as filter and select all files > click the Use assignment script to assign files to experiment entities icon on the bottom-right corner and the Assign files and scripts to experiment structure dialog will pop up:
- Script > match_entity_name.py
- Only selected files (default) checked
- Sample tab > Script: cel_sample.py, Execution order: 2
- Run assignment script

Finally, in the Preview Section (Preview of GSE8536 here) click Run Selected (bottom-right corner). After a while your samples will be parsed.
Now you can Import both the Platform (since it is the first time we use this specific one) and the Experiment.
Tip
Check that both the platform and the samples have been properly parsed from the Preview interface of the Parse Experiment section by clicking on the platform and on each sample.
Click the Import button on the bottom-right corner and select Import whole experiment. After a while the experiment and the platform (in this case) will be imported.

Use Case - Nimblegen from ArrayExpress¶
In COMMAND>_ the preferred way to import experiments from public databases is by using GEO, which provides the most convenient interface out of the box. In case an experiment is not included in GEO it is possible to import it from ArrayExpress. Start by searching the experiment of interest following the procedure described in Searching public databases, selecting E-GEOD-58806 as Term and ArrayExpress as Database. Go to the experiment grid on the left, select the experiment of interest (here E-GEOD-58806) and click >_ Parse/Import experiment. On the main window you can see that the Experiment tab is populated with metadata gathered from the public DB (ArrayExpress here).
Import Platform from GEO¶
COMMAND>_ can use a previously imported platform from a different public database (either ArrayExpress or GEO) and assign it as Reporter platform (in the preview main section of Parsing) for the current experiment.
In our case we want to parse and import an experiment from ArrayExpress using a previously imported platform from GEO. In order to do so we import ONLY the platform from another experiment (here GSE32561) which uses the same platform as the experiment of interest. After selecting the new experiment using the Searching public databases procedure, we use the Nimblegen ndf file, which allows probes to be associated with sequences, for the platform GPL14649.
Experiment files > File Assignment > Select GPL14649_071112_Ecoli_K12_EXP.ndf and in the Assign files dialog:
- Script: match_entity_name.py
- Param: platform
- Only selected files checked (default)
- Platform tab > Script: > gpr_platform.py; Parameters: 0,X|Y,PROBE_ID; Execution order: 2
- Run assignment script

Now we can import this platform from the Platform section of Preview:

Parse Experiment, Platform and Samples¶
Now the Platform is available and can be used to import the experiment retrieved from ArrayExpress. Go to Experiments > Parse Experiment E-GEOD-58806 > Experiment Files > Platform, then click on A-GEOD-14649 in the Reporter Platform field and select the previously imported GPL14649.

Finally, you parse and import the Nimblegen .pair files:
- In Experiment files Section > File Assignment > Filter .pair and select all files
- click the Use assignment script to assign files to experiment entities icon on the bottom-right and the Assign files and scripts to experiment structure dialog will pop up:
- Script: match_sample_name.py > Only selected files
- Sample: Script: > pair_sample.py, Execution order: 2
- Run assignment script
Use Case - Multiplatform Experiment¶
It is standard practice for gene expression experiments to make use of multiple platforms for the same organism in the same experiment: usually this happens when multiple single experiments performed in different conditions/times are combined. Here, we select from GEO the GSE13713 experiment regarding phenotypic and transcriptomic analyses of mildly and severely salt-stressed Bacillus cereus ATCC. It is related to two platforms: GPL7634 and GPL7636.
Import Gene Annotation¶
Since the platforms related to the selected experiment were never imported before into COMMAND>_, we need the gene sequences in order to properly import our probes at the gene level. We get the gene/sequence list from NCBI: go here and from the top-right button select Send to: Coding sequences, Format: FASTA Nucleotide and Choose destination: File. In COMMAND>_ go to > Data Collection (on the top left corner) then > Bio features (genes) > Import biological feature (+ symbol on the bottom left) > Type: FASTA, File name: select the annotation file you downloaded before > Import Biological features.

Parse Platforms and Samples¶
In order to parse the two platforms, we need both the soft file related to the experiment and the soft_platform.py script.
In Experiment files Section > File Assignment > Select the GSE13713_family.soft file and on the Assign files dialog:
- Script: match_all.py
- Param: platform
- Only selected files checked
- Platform tab > Script: > soft_platform.py, parameters: True, Execution order: 1
In Experiment files Section > File Assignment > Select the .txt files (all Sultana in the Filter field) and on the Assign files dialog:
- Script: match_entity_name.py
- Parameters: ch1
- Only selected files checked
- Sample tab > Script: gpr_sample.py, Parameters: Gene name,Spot Mean Intensity (Cyanine5_060909_1136(1)),0, Execution order: 2
Do the same again for ch2, but use as Parameters for the Sample tab:
- Sample tab > Script: gpr_sample.py, Parameters: Gene name,Spot Mean Intensity (Cyanine3_060909_1136(1)),0, Execution order: 2
For Platform GPL10439:
In Experiment files Section > File Assignment > Select the .ndf file and on the Assign files dialog:
- Script: match_entity_type_param.py
- Param: platform
- Only selected files checked
- Platform tab > Script: soft_platform.py, Execution order: 2
In Experiment files Section > File Assignment > Select the .txt files (all pair files) and on the Assign files dialog:
- Script: match_entity_name.py
- Parameters: ch1
- Only selected files checked
- Sample tab > Script: gpr_sample.py, Execution order: 2
- Parameters: ID_REF,Spot Mean Intensity (Alexa555_101810_0935(1)),0
Do the same again for ch2:
- Parameters: ch2
- Only selected files checked
- Sample tab > Script: gpr_sample.py, Execution order: 2
- Parameters: ID_REF,Spot Mean Intensity (Alexa647_111510_1227(1))
Use Case - Import experiment from local file¶
In order to import an experiment which is not available from public repositories the user needs to provide:
- a yaml file (see an example here) containing the description of the experiment to be imported: the first row contains the Experiment id, the other rows start with the Platform id followed by the Sample ids.
- a single compressed file (either zip or tar.gz) containing the raw data.
Go to Experiments > New Experiment (bottom-left) > From local file

Fill in the form that popped up, starting with Experiment ID (the same contained in the yaml file, GSE13713 for the embedded example), then upload the yaml file (the system will check that the format is ok), and finally upload the compressed data. After a while your experiment will be imported.



Use Case - RNA-Seq¶
Similarly to the microarray cases, RNA-Seq experiments can be retrieved from public databases, specifically the Sequence Read Archive (SRA), from the New Experiment/From public DB interface (bottom-left border icon). Here we select a small RNA-Seq experiment from SRA (PRJNA471071) in which the authors employed a computational model of underground metabolism and laboratory evolution experiments to examine the role of enzyme promiscuity in the acquisition and optimization of growth on predicted non-native substrates in E. coli K-12 MG1655.

Indexing¶
The first step is to build the index for the quasi-alignment mapper (kallisto here [1]): select demo.fasta. It contains the sequences for the genes of the Escherichia coli genome and it is automatically built by COMMAND>_ when you begin parsing the data.
- Use Assignment Script (bottom-right corner icon) > from the dialog: match_entity_name.py > Only selected files
- Experiment tab > Script: kallisto_index.py, Execution order: 1
- Run assignment script
RNA-Seq pre-processing and summarization¶
Since the experiment is paired-end, the default script for preprocessing and summarization requires you to indicate only one of the two paired files. You can do this using the filter and selecting *1.fastq; the script will take care of the rest.
- Use Assignment Script (bottom-right corner icon) > from the dialog: match_entity_name.py > Only selected files
- Experiment tab > Script: trim_quantify.py, Execution order: 1, Parameters: 1 (being paired-end)

- Run assignment script

After a while all the samples will be preprocessed and summarized, and the experiment can be imported from the Preview section: bottom-right corner > Import whole experiment.
Mapping probes and export the gene expression matrix¶
If you are done with importing experiments you can now map the probes to genes using BLAST [2] and the double filtering GUI of COMMAND>_. Go to Platform, select the platform to be mapped (e.g. GPL90 from the Affymetrix use case) and click the chain icon (map platform to biological features) on the bottom left corner.
Now you can use the dialog to run BLAST and filter the data (here we use the default settings).
When you are satisfied with the filtering you can use one of the selected filtered objects and download the expression matrix by going to Options > Export.


Tip
You can filter the data with different parameters; each set of parameters is saved in a specific slot.
References
[1] Nicolas L Bray, Harold Pimentel, Páll Melsted and Lior Pachter, Near-optimal probabilistic RNA-seq quantification, Nature Biotechnology 34, 525–527 (2016), doi:10.1038/nbt.3519
[2] Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) “Basic local alignment search tool.” J. Mol. Biol. 215:403-410.
Python parsing scripts¶
The Experiment Object, Platform Object and Sample Object are Python objects used as proxies to import a new experiment into the database.
- The file name of the experiment, platform or sample is stored in the variable named INPUT_FILE.
- The name of the entity (experiment, platform or sample name) is stored in the variable named ENTITY_NAME.
- To access parameters passed to each script use the list PARAMETERS.
- Within each entity (experiment, platform or sample) you can choose the execution order of the script using the Order column.
To access Experiment Object use the EXPERIMENT_OBJECT variable in the Python script used with experiment files.
EXPERIMENT_OBJECT variables¶
- EXPERIMENT_OBJECT.experiment_access_id: (string) the experiment access id
- EXPERIMENT_OBJECT.experiment_name: (string) the experiment name
- EXPERIMENT_OBJECT.scientific_paper_ref: (string) publication associated with the experiment
- EXPERIMENT_OBJECT.description: (string) the experiment description
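As a purely illustrative sketch (the attributes above are the real proxy fields, but the parsing below is simplified and is not the bundled soft_experiment.py), an experiment script could populate them like this:

# sketch of an experiment parsing script (illustrative only)
# INPUT_FILE and EXPERIMENT_OBJECT are provided by COMMAND>_ at runtime
with open(INPUT_FILE) as soft:
    for line in soft:
        if line.startswith('!Series_geo_accession'):
            EXPERIMENT_OBJECT.experiment_access_id = line.split('=', 1)[1].strip()
        elif line.startswith('!Series_title'):
            EXPERIMENT_OBJECT.experiment_name = line.split('=', 1)[1].strip()
        elif line.startswith('!Series_summary'):
            EXPERIMENT_OBJECT.description = line.split('=', 1)[1].strip()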
To access Platform Object use the PLATFORM_OBJECT variable in the Python script used with platform files.
PLATFORM_OBJECT variables¶
- PLATFORM_OBJECT.platform_access_id: (string) the platform access id
- PLATFORM_OBJECT.platform_name: (string) the platform name
- PLATFORM_OBJECT.platform_type: (string) 'microarray' or 'rna-seq'
- PLATFORM_OBJECT.description: (string) the platform description
- PLATFORM_OBJECT.add_bio_feature_reporter_data(name, description, **kwargs): add a reporter to the platform
**kwargs are platform_type dependent, i.e. for 'microarray' they are probe_access_id, probe_set_name, probe_type and sequence.
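For example, a minimal platform script could add reporters from a tab-delimited annotation file (a sketch, assuming the probe id is in the first column and the sequence in the second; the bundled gpr_platform.py is more general):

# sketch of a platform parsing script (illustrative only)
# INPUT_FILE and PLATFORM_OBJECT are provided by COMMAND>_ at runtime
with open(INPUT_FILE) as annotation:
    next(annotation)  # skip the header line
    for line in annotation:
        probe_id, sequence = line.rstrip('\n').split('\t')[:2]
        PLATFORM_OBJECT.add_bio_feature_reporter_data(
            probe_id, '',                # reporter name and (empty) description
            probe_access_id=probe_id,    # 'microarray' kwargs documented above
            sequence=sequence)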
To access Sample Object use the SAMPLE_OBJECT variable in the Python script used with sample files.
SAMPLE_OBJECT variables and methods¶
- SAMPLE_OBJECT.sample_name: (string) the sample name
- SAMPLE_OBJECT.description: (string) the sample description
- SAMPLE_OBJECT.add_raw_data(bio_feature_reporter_name, value): add raw data of this sample
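Similarly, a sample script could feed raw values (again a sketch, assuming a tab-delimited file with a probe id column followed by an intensity column):

# sketch of a sample parsing script (illustrative only)
# INPUT_FILE and SAMPLE_OBJECT are provided by COMMAND>_ at runtime
with open(INPUT_FILE) as raw:
    next(raw)  # skip the header line
    for line in raw:
        probe_id, value = line.rstrip('\n').split('\t')[:2]
        SAMPLE_OBJECT.add_raw_data(probe_id, float(value))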
parsing_scripts package¶
Subpackages¶
parsing_scripts.experiment package¶
Submodules¶
parsing_scripts.experiment.kallisto_index module¶
parsing_scripts.experiment.kallisto_index.main()¶
Create an index file for the KALLISTO software using the current BIOLOGICAL FEATURES.
Biological features for this compendium are put into a FASTA file that is then indexed to be used for RNA-seq quantification with KALLISTO.
- PARAMETERS:
- None
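Conceptually this boils down to running kallisto index on the compendium FASTA; a rough sketch of the equivalent call (an assumption, not the actual script code) is:

# sketch: build a kallisto index from the compendium FASTA (assumed invocation)
import subprocess

def build_kallisto_index(fasta_file, index_file='kallisto.idx'):
    # kallisto index -i <index_file> <fasta_file>
    subprocess.run(['kallisto', 'index', '-i', index_file, fasta_file], check=True)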
parsing_scripts.experiment.soft_experiment module¶
parsing_scripts.experiment.soft_experiment.main()¶
Parse a SOFT file and extract EXPERIMENT information.
Looks for accession number, experiment name, scientific paper, experiment description.
- PARAMETERS:
- None
Module contents¶
parsing_scripts.file_assignment package¶
Submodules¶
parsing_scripts.file_assignment.assign_all module¶
parsing_scripts.file_assignment.assign_all.assign(input_files, entity, entity_type, parameters)¶
Assign the selected input files (or all the files if checked) to every selected ENTITY.
For each ENTITY (experiment, platforms or samples) for which a parsing script is selected, all the (selected) input files will be assigned regardless.
- PARAMETERS:
- None
parsing_scripts.file_assignment.match_entity_name module¶
parsing_scripts.file_assignment.match_entity_name.assign(input_files, entity, entity_type, parameters)¶
Assign the selected input files (or all the files if checked) to every ENTITY with matching NAME.
For each ENTITY (experiment, platforms or samples) for which a parsing script is selected, only the (selected) input files with a name that matches the one of the entity will be assigned (for example a file named GSE123.soft would match the experiment entity GSE123).
- PARAMETERS:
- None
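For illustration only, a name-matching assignment function could look like the sketch below; the exact objects passed in and the expected return value are defined by COMMAND>_ itself, so check the bundled assign_all.py and match_entity_name.py before writing your own.

# illustrative name-matching assignment function; entity.name and the return
# contract are assumptions, see the bundled scripts for the exact interface
def assign(input_files, entity, entity_type, parameters):
    # keep only the files whose name contains the entity name, e.g. a file
    # called GSE123.soft would be assigned to the experiment entity GSE123
    return [f for f in input_files if entity.name in str(f)]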
Module contents¶
parsing_scripts.platform package¶
Submodules¶
parsing_scripts.platform.adf_platform module¶
parsing_scripts.platform.adf_platform.main()¶
Parse an ADF file and extract PLATFORM information.
Looks for accession number, platform name, platform type and platform description.
- PARAMETERS:
param1 (string): The original probe id field. If it is composed by more than one field, put all of them separated with a |. For example X|Y
param2 (bool): If True (or 1 or a non-empty string) the probe information (sequence) will be added
parsing_scripts.platform.cdf_platform module¶
parsing_scripts.platform.cdf_platform.main()¶
Parse a CDF file (Affymetrix) and extract PLATFORM information.
Looks for probe set name and probe id. Please note that CDF does not contain probe sequence, for that information refer to cdf_platform_fasta.py.
- PARAMETERS:
- None
parsing_scripts.platform.cdf_platform_fasta module¶
parsing_scripts.platform.cdf_platform_fasta.main()¶
Parse a FASTA file containing probe sequences.
This script is usually used before cdf_platform.py in order to get the probe sequence information that a CDF file doesn’t provide.
- PARAMETERS:
- None
parsing_scripts.platform.csv_platform module¶
parsing_scripts.platform.csv_platform.main()¶
Parse a CSV file containing probe sequences.
A CSV file containing probe information is parsed and probes get added to the platform. This script is usually used together with other PLATFORM scripts.
- PARAMETERS:
param1 (string): The probe id field
param2 (string): The probe sequence
parsing_scripts.platform.gpr_platform module¶
parsing_scripts.platform.gpr_platform.main()¶
Parse a GPR file containing PLATFORM information and probe sequences.
A GPR file is a TAB-delimited file with headers and complete platform information (descriptions and probe sequences).
- PARAMETERS:
param1 (int): Number of lines to skip
param2 (string): The column header to parse out the original probe id field. If it is composed by more than one field, put all of them separated with a |. For example X|Y (actual probe ids will be concatenated with dots . in that case)
param3 (string): The column header of the probe sequence you want to parse out
param4 (string): DEPRECATED - The column header to parse out the DB ‘gene_map_content’ field; if multiple, separate with a pipe | (actual probe ids will be concatenated with dots . in that case)
param5 (string): The column header to parse out probe name field. If it is composed by more than one field, put all of them separated with a |. For example X|Y (actual probe ids will be concatenated with dots . in that case)
param6 (string): The column header to parse out probe set name field
param7 (bool): Ensure that the original probe id in SAMPLE_OBJECT will be unique (defaults to False)
parsing_scripts.platform.ndf_platform module¶
parsing_scripts.platform.ndf_platform.main()¶
Parse an NDF file containing probe sequences.
An NDF file is an ArrayExpress file that contains probe sequences. It has a header with X and Y positions for the probe, the SEQUENCE field and a PROBE_ID field. The combination X.Y is used to store the probe id and ensure that it is a unique name.
- PARAMETERS:
- param1 (int): Number of lines to skip
parsing_scripts.platform.soft_platform module¶
parsing_scripts.platform.soft_platform.main()¶
Parse a SOFT file and extract PLATFORM information.
Looks for accession number, platform name, platform type and platform description. If True is passed as parameter it will look for probe sequence information in the data table part of the file.
- PARAMETERS:
- param1 (bool): Read the data table information (default False)
Module contents¶
parsing_scripts.sample package¶
Submodules¶
parsing_scripts.sample.cel_sample module¶
parsing_scripts.sample.cel_sample.main()¶
Parse a CEL (Affymetrix) file and extract SAMPLE raw data.
The probe original id is given by X.Y.
- PARAMETERS:
- None
parsing_scripts.sample.gpr_sample module¶
parsing_scripts.sample.gpr_sample.main()¶
Parse a GPR file and extract SAMPLE raw data.
A GPR file is a TAB-delimited file with headers and complete sample raw data information.
- PARAMETERS:
param1 (string): The column header of the original probe id to parse out. If it is composed by more than one field, put all of them separated with a |. For example X|Y (actual probe ids will be concatenated with dots . in that case)
param2 (string): The column header of the data value you want to parse out
param3 (int): Number of lines to skip
param4 (int): The sample channel (optional)
parsing_scripts.sample.pair_sample module¶
parsing_scripts.sample.pair_sample.main()¶
Parse a PAIR file and extract SAMPLE raw data.
A PAIR file is a TAB-delimited file with headers and complete sample raw data information. The probe id is given by X.Y to ensure uniqueness and the raw data value is taken from the PM column.
- PARAMETERS:
- param1 (int): Number of lines to skip
parsing_scripts.sample.soft_sample module¶
parsing_scripts.sample.soft_sample.main()¶
Parse a SOFT file and extract SAMPLE description and optionally raw data.
- PARAMETERS:
- param1 (string): The raw data value field, if empty it will be assigned automatically using the sample_column_identifier function
parsing_scripts.sample.trim_quantify module¶
parsing_scripts.sample.trim_quantify.main()¶
Trim a FASTQ file using Trimmomatic and quantify using KALLISTO.
The resulting counts will be added to the SAMPLE OBJECT.
- PARAMETERS:
- param1 (bool): True if this FASTQ file has a PAIRED file (forward or reverse), default False
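Conceptually the step chains Trimmomatic and kallisto quant; the sketch below shows an assumed equivalent pipeline (command lines and file names are illustrative, the bundled script may invoke the tools differently).

# conceptual sketch of the trim/quantify step (assumed commands and file names)
import subprocess

def trim_and_quantify(fastq_1, fastq_2, index_file, output_dir):
    # 1. quality-trim the paired FASTQ files with Trimmomatic
    subprocess.run([
        'java', '-jar', 'trimmomatic.jar', 'PE', fastq_1, fastq_2,
        'trimmed_1P.fastq', 'trimmed_1U.fastq',
        'trimmed_2P.fastq', 'trimmed_2U.fastq',
        'SLIDINGWINDOW:4:15', 'MINLEN:36'], check=True)
    # 2. quantify the trimmed reads against the kallisto index
    subprocess.run([
        'kallisto', 'quant', '-i', index_file, '-o', output_dir,
        'trimmed_1P.fastq', 'trimmed_2P.fastq'], check=True)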
Module contents¶
parsing_scripts.utils package¶
Submodules¶
parsing_scripts.utils.column_identifier module¶
parsing_scripts.utils.column_identifier.sample_column_identifier(query, header)¶
Tries to automatically identify the header column that contains the raw data, given some query information (like the dye color).
Multi-channel arrays might have different dye colors on different samples (dye-swap), and thus it would be tedious to manually define the column for each single sample. This function tries to do it for you and is typically invoked for the SOFT sample files.
- PARAMETERS:
query (string): The query string is usually something that contains information about the color i.e. cy3, red, green etc.
header (list): The header is a list of strings from which to choose the one that matches the query
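A minimal sketch of what such a column identifier could do (assumed logic, not the actual COMMAND>_ implementation):

# illustrative column-matching helper (assumed logic)
def sample_column_identifier(query, header):
    query = query.lower()
    for column in header:
        if query in column.lower():
            return column  # first column whose name contains the query string
    return None

# e.g. sample_column_identifier('Cyanine5', ['Gene name',
#     'Spot Mean Intensity (Cyanine5_060909_1136(1))']) returns the intensity column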
parsing_scripts.utils.rnaseq module¶
parsing_scripts.utils.rnaseq.create_fasta(file, compendium)¶
Create a FASTA file using the BIOLOGICAL FEATURE of the current Organism.
- PARAMETERS:
file (string): The output FASTA file name
compendium (string): The organism (nick) name
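A hypothetical call (the file and compendium names below are placeholders) would look like:

# hypothetical usage; 'demo.fasta' and 'demo' are placeholder values
from parsing_scripts.utils.rnaseq import create_fasta

create_fasta('demo.fasta', 'demo')  # write the biological feature sequences of the
                                    # 'demo' compendium to demo.fasta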
Module contents¶
Module contents¶
COMMAND>_ for developers¶
In order to add new features to COMMAND>_ you’ll need to stick to the whole framework. As a demonstration we will create a basic page to retrieve some data from the database and show it in a grid within COMMAND>_. We will take care of:
- create the ExtJS interface;
- create the Python view;
- create the permission to access the view;
- make an AJAX call passing parameters;
- perform a job on celery to run in background;
- handle websocket to show the results on a grid;
We will also see how to extend COMMAND>_ functionalities, such as how to add a new public database that users can search, how to add a new platform type and so on.
Note
For anything else related to the interface design please refer to the ExtJS documentation, while to properly add new models and extend the Data Model, please refer to the Django documentation.
Add brand new feature in COMMAND>_¶
Create the ExtJS interface¶
COMMAND>_ is a single-page application, so everything you see runs within one HTML file and the Javascript code needed to display the interface is loaded and managed by the ExtJS framework.
All ExtJS interface files (views) live within the directory command/static/command/js/ext-js/app/view. So let’s create a test directory in there and, within that directory, let’s create 2 files: Test.js and TestController.js.
Let’s fill these two files with some basic code like the following:
// Test.js
Ext.define('command.view.test.Test', {
extend: 'Ext.Component',
xtype: 'test',
title: 'Test',
requires: [
'Ext.panel.Panel',
'command.view.test.TestController'
],
controller: 'test',
store: null,
alias: 'widget.test',
itemId: 'test',
reference: 'test',
viewModel: {},
html: 'TEST',
listeners: {
//
},
initComponent: function() {
this.callParent();
},
destroy: function() {
this.callParent();
}
});
// TestController.js
Ext.define('command.view.test.TestController', {
extend: 'Ext.app.ViewController',
alias: 'controller.test'
});
Now you will need to run the command sencha app build from within the command/static/command/js/ext-js directory.
Note
To use the sencha app build command you will need to download and install Sencha CMD.
Now you should be able to point your browser to http://localhost/#view/test and see that the Test panel has been correctly loaded as a tab within the main application panel. To make it reachable with a button and to add a small icon next to the tab name we need to edit two files, Main.js (here) and Application.js (here).
// Main.js
// Add the ``Test`` menu button
},{
text: 'Test',
itemId: 'test_menu_item',
iconCls: null,
glyph: 'xf11b',
listeners: {
click: {
fn: 'onAction',
hash: 'view/test',
glyph: 'xf11b',
panel: 'test'
}
}
},{
text: 'Options',
...
// Application.js
// Add the ``test`` glyph
version: null,
panel_glyph: {
'test': 'xf11b',
...
You should see something like the following:

Create the Python View code¶
Now let’s create a grid, a basic double-click event and a link to a Python view. First of all we need to create the test.py file within the views directory (here). The basic view file should look something like this:
# test.py
import json
from django.http import HttpResponse
from django.views import View
from command.lib.utils.decorators import forward_exception_to_http
class TestView(View):
def get(self, request, operation, *args, **kwargs):
method = getattr(self, operation)
return method(request, *args, **kwargs)
def post(self, request, operation, *args, **kwargs):
method = getattr(self, operation)
return method(request, *args, **kwargs)
@staticmethod
@forward_exception_to_http
def test(request, *args, **kwargs):
return HttpResponse(json.dumps({'success': True}),
content_type="application/json")
The test function does nothing at the moment and is meant to respond to an Ajax call. We’ll see that within the same TestView class we will put the code to manage both Ajax and WebSocket requests. Before we add any business logic we need to tell COMMAND>_ that the ExtJS view test will make requests to the Python view TestView and that users need no specific privileges to do that (for the moment). So let’s add one line in the consumer.py script (here):
# consumer.py
class Dispatcher:
dispatcher = {
...
ExportDataView: ['export_data'],
TestView: ['test']
}
Add a grid to the ExtJS interface¶
So far, so good. Let’s remove the HTML code from the Test.js file and add a grid to show all the experiments of the selected compendium. The file will now look like this:
// Test.js
Ext.define('command.view.test.Test', {
extend: 'command.Grid',
xtype: 'test',
title: 'Test',
requires: [
'Ext.panel.Panel',
'command.view.test.TestController'
],
controller: 'test',
store: null,
alias: 'widget.test',
itemId: 'test',
reference: 'test',
viewModel: {},
mixins: {
getRequestObject: 'RequestMixin'
},
command_view: 'test',
command_read_operation: 'test_read',
listeners: {
//
},
columns: [{
text: 'Accession',
flex: 2,
sortable: true,
dataIndex: 'experiment_access_id',
}, {
text: 'Experiment name',
flex: 2,
sortable: true,
tdCls: 'command_tooltip',
dataIndex: 'experiment_name'
}, {
text: 'Scientific paper',
flex: 2,
sortable: true,
dataIndex: 'scientific_paper_ref'
}, {
text: 'Description',
flex: 2,
sortable: true,
tdCls: 'command_tooltip',
dataIndex: 'description'
}],
initComponent: function() {
this.store = Ext.create('command.store.Experiments');
this.callParent();
},
destroy: function() {
this.callParent();
}
});
Please note that:
- we extend command.Grid;
- with command_view we tell COMMAND>_ which view to use;
- with command_read_operation we declare the default read operation (i.e. the default Python function to be called);
- in initComponent we declare the ExtJS store to use.
Link the ExtJS grid to the Python code via WebSocket¶
The test.py Python view file will have a test_read function that will look like the following:
# test.py
@staticmethod
@forward_exception_to_channel
def test_read(channel_name, view, request, user):
channel = Channel(channel_name)
start = 0
end = None
compendium = CompendiumDatabase.objects.get(id=request['compendium_id'])
if request['page_size']:
start = (request['page'] - 1) * request['page_size']
end = start + request['page_size']
order = ''
if request['ordering'] == 'DESC':
order = '-'
query_response = Experiment.objects.using(compendium.compendium_nick_name). \
filter(Q(experiment_access_id__icontains=request['filter']) |
Q(scientific_paper_ref__icontains=request['filter']) |
Q(description__icontains=request['filter']) |
Q(experiment_name__icontains=request['filter']))
try:
query_response = query_response.order_by(order + request['ordering_value'])
except Exception as e:
pass
total = query_response.count()
query_response = query_response[start:end]
channel.send({
'text': json.dumps({
'stream': view,
'payload': {
'request': request,
'data': {
'experiments': [exp.to_dict() for exp in query_response],
'total': total
}
}
})
})
If you refresh your browser, you should now see something like the following:

As a final step in this brief tutorial, let’s add a double-click event on the grid to call the test function defined in the TestView Python view and run an empty job on the Celery task manager. When the job is done we’ll have a callback function to show a message back on the interface. The first thing to do is to add the event listener.
Create the Ajax call on double-click event¶
// Test.js
listeners: {
itemdblclick: 'onTestDoubleClick'
},
Then we’ll need to implement onTestDoubleClick in TestController.js:
// TestController.js
onTestDoubleClick: function(dv, record, item, index, e) {
var grid = dv.up('grid');
var gridSelection = grid.getSelection();
var request = grid.getRequestObject('test');
request.values = JSON.stringify(gridSelection[0].data);
Ext.Ajax.request({
url: request.view + '/' + request.operation,
params: request,
success: function (response) {
command.current.checkHttpResponse(response);
},
failure: function (response) {
console.log('Server error', response);
}
});
}
Manage asynchronous code using Celery and WebSocket¶
The request object is configured to automatically retrieve the view name (request.view) and is set to call the test function in the Python TestView.
# test.py
@staticmethod
@forward_exception_to_http
def test(request, *args, **kwargs):
values = json.loads(request.POST['values'])
comp_id = request.POST['compendium_id']
channel_name = request.session['channel_name']
view = request.POST['view']
operation = request.POST['operation']
test.test_task.apply_async(
(request.user.id, comp_id, values['id'], channel_name, view, operation)
)
return HttpResponse(json.dumps({'success': True}),
content_type="application/json")
With test.test_task.apply_async we are calling the test_task function from the Celery task file test.py (not to be confused with the Python view file that has the same name). We need to create this file and implement the functionality. So let’s create a file named test.py in the command/command/lib/task directory (here). The file will look like this:
# test.py
from __future__ import absolute_import, unicode_literals
from time import sleep
import celery
from channels import Channel
from command.lib.utils.message import Message
class TestCallbackTask(celery.Task):
def on_success(self, retval, task_id, args, kwargs):
user_id, compendium_id, path, channel_name, view, operation = args
channel = Channel(channel_name)
message = Message(type='info', title='Hello world!',
message='Hi there!'
)
message.send_to(channel)
def on_failure(self, exc, task_id, args, kwargs, einfo):
pass
@celery.task(base=TestCallbackTask, bind=True)
def test_task(self, user_id, compendium_id, exp_id, channel_name, view, operation):
sleep(1)
The test_task function simply waits for one second. When it’s done, the on_success callback function gets called; it retrieves the WebSocket channel name and sends back a simple message. That message will be captured on the client side and a pop-up will appear. Before trying it out we need to inform Celery that there’s an extra file to search for when calling a task. This is done in the Django settings file, here.
# settings.py
CELERY_IMPORTS = (
'command.lib.tasks.experiment_public',
'command.lib.tasks.experiment_local',
'command.lib.tasks.uncompress_file',
'command.lib.tasks.run_file_assignment_script',
'command.lib.tasks.run_parsing_script',
'command.lib.tasks.parse_bio_feature_file',
'command.lib.tasks.run_platform_mapper',
'command.lib.tasks.import_experiment',
'command.lib.tasks.import_platform_mapping',
'command.lib.tasks.export_data',
'command.lib.tasks.test',
)
You should now be able to double-click on a grid value and see something like this.

Add new public database manager¶
At the time of writing, COMMAND>_ is able to search GEO, ArrayExpress and SRA.

To add a new database to this list, you will need to add a line to a database table and extend one class. In the command_datasource database table you should add the source name and the class that handles it.

The class should be defined extending the class PublicDatabase that is defined here. This is an abstract class and to extend it you will need to implement three methods:
- search: performs the actual search on the public database (through a REST call or FTP, for example) and creates one ExperimentSearchResult for each retrieved entry to be stored in the database;
- download_experiment_files: is responsible for getting all the data files related to one single ExperimentSearchResult and saving them in the output directory;
- create_experiment_structure: starting from the information of the downloaded files, this method should create the experiment, platform and sample structures and save them using the Experiment, Platform and Sample Django models.
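A skeleton of such a handler could look like the sketch below; the import path, class name and method signatures are assumptions, so check the PublicDatabase abstract class referenced above for the exact contract.

# sketch of a new data source handler (module path and signatures are assumed)
from command.lib.coll.public_database import PublicDatabase  # assumed import path

class MyNewSourceDatabase(PublicDatabase):

    def search(self, term):
        # query the remote source (REST, FTP, ...) and create one
        # ExperimentSearchResult per retrieved entry
        raise NotImplementedError

    def download_experiment_files(self, experiment_search_result, output_directory):
        # fetch all raw/metadata files of a single ExperimentSearchResult
        # and save them into output_directory
        raise NotImplementedError

    def create_experiment_structure(self, experiment_search_result):
        # build Experiment, Platform and Sample objects from the downloaded
        # files and save them using the Django models
        raise NotImplementedError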
Add new compendium type¶
This is by far the easiest thing to do, since it’s just a matter of adding one tuple to the command DB. The table to be modified is the command_compendiumtype table. At the moment the only compendium type defined is the gene expression one. The fields are name, description and the biological feature name, so respectively gene_expression, Gene expression compendium and gene.

Add new biological feature file importer¶
All the classes related to importing biological features are located here. The first thing to do is to inform the dispatcher in the importers.py file which classes are responsible for managing the different file types. For example, genes will be imported using FASTA files. The second step is to actually implement the class, extending the BaseImporter class. The newly defined class will need to implement the parse method and redefine the FILE_TYPE_NAME variable.
# fasta_file_importer.py
# external-library imports shown for clarity; the model and BaseImporter imports
# are part of the COMMAND>_ code base
from django.db import transaction
from Bio import SeqIO
class FastaFileImporter(BaseImporter):
FILE_TYPE_NAME = 'FASTA'
def parse(self, filename):
sequence_field = BioFeatureFields.objects.using(self.compendium).get(name='sequence')
with transaction.atomic(using=self.compendium):
with open(filename, 'rU') as handle:
for record in SeqIO.parse(handle, 'fasta'):
gene = BioFeature()
gene.name = record.id
gene.description = record.description
gene.save(using=self.compendium)
bf_value = BioFeatureValues()
bf_value.bio_feature = gene
bf_value.bio_feature_field = sequence_field
bf_value.value = str(record.seq)
bf_value.save(using=self.compendium)

Add new platform type¶
To add a new platform type there are several steps to take, and they mostly depend on the kind of platform that is going to be added.
Database entry
To add a new platform type for a single compendium (organism) you will need to add a tuple with name, description, bio feature reporter name and the compendium type ID, for example microarray, MicroArray, probe and 1, to the command_platformtype table. If you want every new compendium you are going to create to have such a new platform you will need to add the same tuple to the command_platformtypeadmin table in the command DB.

Reporters ExtJS GUI
The next step is to inform the GUI how to behave when the user wants to see the biological feature reporters associated with the new platform. For example, in the case of microarrays the biological feature reporters are the probes. The file to modify is PlatformController.js (defined here). onViewBioFeatureReporter is the method to modify, adding a new case for the new platform. For example, in the case of RNA-seq we simply display a message saying that there are no associated biological feature reporters, since the gene expression measurement in this case is directly given by read counts. For microarrays instead we have probes, and thus we open a new window to show the probes associated with this platform, the window_bio_feature_reporter window.
// PlatformsController.js
onViewBioFeatureReporter: function (me) {
var selection = me.up('grid').getSelectionModel().getSelection()[0].data;
var comp = JSON.parse(localStorage.getItem("current_compendium"));
if (selection.platform_type) {
switch (selection.platform_type.name) {
case 'rnaseq':
Ext.MessageBox.show({
title: 'RNA-seq platform',
msg: 'For RNA-seq platform ' + selection.platform_access_id + ', ' + comp.compendium_type.bio_feature_name + ' is/are directly measured',
buttons: Ext.MessageBox.OK,
icon: Ext.MessageBox.INFO,
fn: function (a) {
}
});
break
case 'microarray':
var win = Ext.create({
xtype: 'window_bio_feature_reporter',
title: 'Microarray platform ' + selection.platform_access_id + ': ' +
comp.compendium_type.bio_feature_name + ' feature reporters (' + selection.platform_type.bio_feature_reporter_name + ')',
platform: selection
});
break
}
}
}
Add new platform mapper¶
When a platform has biological feature reporters associated with it, these must be mapped to the biological features. In the case of a gene expression compendium the biological features are genes. So, to give a concrete example, we need to associate microarray probes to genes. This step is very platform-dependent, so a lot of freedom is left to the developer in designing the GUI. There are just a few things to keep in mind in order to have everything working correctly within the COMMAND>_ framework.
Mapper ExtJS GUI
The first thing is to inform the GUI how to behave when the user wants to map the platform reporters to the biological features. The file to modify is again PlatformController.js (defined here), but this time we are going to modify the onMapPlatformToBioFeature method, adding a new case for the new platform. For microarrays we defined a new window, window_map_microarray_platform, here. Again, in this case the developer is left completely free to design it as they want.
// PlatformsController.js
onMapPlatformToBioFeature: function (me) {
var selection = me.up('grid').getSelectionModel().getSelection()[0].data;
var comp = JSON.parse(localStorage.getItem("current_compendium"));
if (selection.platform_type) {
switch (selection.platform_type.name) {
case 'rnaseq':
Ext.MessageBox.show({
title: 'RNA-seq platform',
msg: 'RNA-seq platform ' + selection.platform_access_id + ' is automatically mapped to ' + comp.compendium_type.bio_feature_name,
buttons: Ext.MessageBox.OK,
icon: Ext.MessageBox.INFO,
fn: function (a) {
}
});
break
case 'microarray':
command.current.createWin({
xtype: 'window_map_microarray_platform',
title: 'Map microarray platform ' + selection.platform_access_id + ' to ' + comp.compendium_type.bio_feature_name,
platform: selection
});
break
}
}
}
Mapper Django View
The associated Django view is defined in the platform.py view file here, and for microarrays this is the MicroarrayPlatformView class. This is a pretty standard view, as described previously.
Mapper code
The actual code is stored in a class that extends the BaseMapper (placeholder) class. For microarrays this class is MicroarrayMapper and it is located here. The last step is to inform the mapper dispatcher which class to invoke, and this is done in the mappers.py file located here.
# mappers.py
from command.lib.coll.platform.microarray_mapper import MicroarrayMapper
platform_mapper = {
'microarray': MicroarrayMapper
}
Contribute & Support¶
Author¶
To send me an e-mail about anything else related to COMMAND>_ write to
License¶
The project is licensed under the GPLv3 license.
How to cite¶
If you find COMMAND>_ useful for your work please cite
Moretto, M., Sonego, P., Villaseñor-Altamirano, A. B., & Engelen, K. (2019). First step toward gene expression data integration: transcriptomic data acquisition with COMMAND>_. BMC Bioinformatics, 20(1), 54.