Genomics Workspace

Genomics Workspace is an open-source project created by the i5k Workspace at NAL.

The project is a Django website that provides common sequence search and alignment tools, including BLAST, HMMER, and Clustal.

Because it builds on the Django admin page and on a task queue backed by RabbitMQ and Celery, managing sequence databases and providing services to end users is much easier.

All source code for Genomics Workspace is in our GitHub repo.

Note

You can try genomics workspace on our live services:

The live services listed above run a customized version of Genomics Workspace. Its source code is in a separate GitHub repo: NAL-genomics-workspace.

Pre-requisites

  • git
  • Python 2.7
  • npm
  • RabbitMQ
  • PostgreSQL
  • mod_wsgi (optional, only for production)
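Before starting the setup, it can help to see which of these tools are already installed. The following is a minimal sketch, not part of Genomics Workspace, that looks for each tool on the PATH. The executable names in REQUIRED (for example psql for PostgreSQL and rabbitmq-server for RabbitMQ) are assumptions and may differ on your system; rabbitmq-server in particular is often installed in an sbin directory that is not on a regular user's PATH.

```python
import os


def find_on_path(name, path=None):
    """Return the full path of an executable called `name`, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(names, path=None):
    """Return the names that cannot be found on the search path."""
    return [name for name in names if find_on_path(name, path) is None]


# Assumed executable names for the pre-requisites listed above.
REQUIRED = ["git", "python2.7", "npm", "rabbitmq-server", "psql"]

if __name__ == "__main__":
    for tool in missing_tools(REQUIRED):
        print("missing: " + tool)
```

A tool reported as missing may still be installed under a different name or outside the PATH, so treat the output as a hint rather than a verdict.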

Setup Guide

This is our introduction to this project.

Setup Guide (CentOS)

This setup guide is for CentOS. It was tested on CentOS 6.7 and CentOS 7.2, and should also work, with adjustments, on other modern Linux distributions.

Note: The following variables may be used in path names; substitute as appropriate:

<user>      :  the name of the user performing the setup.
<user-home> :  the user's home directory, e.g., /home/<user>
<git-home>  :  the directory containing the genomics-workspace checkout; the `.git/` folder is located there.

Project Applications

Clone or refresh the genomics-workspace:

git clone https://github.com/NAL-i5K/genomics-workspace

# Or, if the repository already exists:
cd <git-home>
git fetch

Yum

Generate metadata cache:

yum makecache

Python

Install necessary packages:

sudo yum -y groupinstall "Development tools"
sudo yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel
sudo yum -y install readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel python-devel

Install Python 2.7.13 from source:

cd <user-home>
wget http://www.python.org/ftp/python/2.7.13/Python-2.7.13.tar.xz
tar -xf Python-2.7.13.tar.xz

# Configure as a shared library:
cd Python-2.7.13
./configure --prefix=/usr/local --enable-unicode=ucs4 --enable-shared LDFLAGS="-Wl,-rpath /usr/local/lib"

# Compile and install:
make
sudo make altinstall

# Update PATH:
export PATH="/usr/local/bin:$PATH"

# Check the Python version (output should be: Python 2.7.13):
python2.7 -V

# Cleanup if desired:
cd ..
rm -rf Python-2.7.13.tar.xz Python-2.7.13

Install pip and virtualenv:

wget https://bootstrap.pypa.io/ez_setup.py
sudo /usr/local/bin/python2.7 ez_setup.py

wget https://bootstrap.pypa.io/get-pip.py
sudo /usr/local/bin/python2.7 get-pip.py

sudo /usr/local/bin/pip2.7 install virtualenv

Build a separate virtualenv:

cd <git-home>

# Create a virtual environment called py2.7 and activate:
virtualenv -p python2.7 py2.7
source py2.7/bin/activate

RabbitMQ

Install RabbitMQ Server:

cd <user-home>

# Install RHEL/CentOS 6.8 64-Bit Extra Packages for Enterprise Linux (EPEL).
# The 6.8 EPEL release caters for CentOS 6.*:
wget https://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
sudo rpm -ivh epel-release-6-8.noarch.rpm

# For RHEL/CentOS 7.* :
# wget http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-10.noarch.rpm
# and change other commands accordingly

# Install Erlang:
sudo yum -y install erlang

# Install RabbitMQ server:
sudo yum -y install rabbitmq-server

# To start the daemon by default when system boots run:
sudo chkconfig rabbitmq-server on

# Start the server:
sudo /sbin/service rabbitmq-server start

# Clean up:
rm epel-release-6-8.noarch.rpm

Memcached

Install and activate memcached:

sudo yum -y install memcached

# Set to start at boot time:
sudo chkconfig memcached on

Database

Install PostgreSQL:

# Add line to yum repository:
echo 'exclude=postgresql*' | sudo tee -a /etc/yum.repos.d/CentOS-Base.repo

# Install the PostgreSQL Global Development Group (PGDG) RPM file:
sudo yum -y install http://yum.postgresql.org/9.5/redhat/rhel-6-x86_64/pgdg-centos95-9.5-2.noarch.rpm

# Install PostgreSQL 9.5:
sudo yum -y install postgresql95-server postgresql95-contrib postgresql95-devel

# Initialize (uses default data directory: /var/lib/pgsql):
sudo service postgresql-9.5 initdb

# Startup at boot:
sudo chkconfig postgresql-9.5 on

# Control:
# sudo service postgresql-9.5 <command>
#
# where <command> can be:
#
#     start   : start the database.
#     stop    : stop the database.
#     restart : stop/start the database; used to read changes to core configuration files.
#     reload  : reload pg_hba.conf file while keeping database running.

# Start:
sudo service postgresql-9.5 start

#
#  (To remove everything: sudo yum erase postgresql95*)
#

# Create django database and user:
sudo su - postgres
psql

# At the prompt 'postgres=#' enter:
create database django;
create user django;
grant all on database django to django;
ALTER USER django CREATEDB;

# Connect to django database:
\c django

# Create extension hstore:
create extension hstore;

# Exit psql and postgres user:
\q
exit

# Add the PostgreSQL 9.5 binaries to PATH (edit pg_hba.conf first if authentication needs adjusting):
cd <git-home>
export PATH=/usr/pgsql-9.5/bin:$PATH

# Restart:
sudo service postgresql-9.5 restart
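The database and role created above are both named django. As a hedged illustration of how a Django project typically points at such a database, here is a hypothetical fragment in the style of a settings file. Check i5k/settings.py in the repository before changing anything, since it may already define DATABASES with different values. No password was set in the steps above, so whether the connection is accepted depends on pg_hba.conf.

```python
# Hypothetical settings fragment; the values mirror the psql commands above.
DATABASES = {
    "default": {
        # PostgreSQL backend name used by the Django releases that support Python 2.7.
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "django",   # database created with: create database django;
        "USER": "django",   # role created with: create user django;
        "HOST": "",         # empty string: connect through the local Unix socket
        "PORT": "",         # empty string: use the default port (5432)
    }
}
```

With an empty HOST, PostgreSQL applies the "local" rules of pg_hba.conf; setting HOST to 127.0.0.1 makes the "host" rules apply instead.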

Python Modules and Packages

Install additional Python packages:

cd <git-home>
pip install -r requirements.txt

Chrome Driver

Celery

Configure celery:

# Run celery manually
celery -A i5k worker --loglevel=info --concurrency=3
# Run celery beat manually as well (in a separate terminal)
celery -A i5k beat --loglevel=info
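The Celery worker needs RabbitMQ to be reachable when it starts. The sketch below, which is not part of the project, checks whether the services installed in the previous steps accept TCP connections on localhost. The port numbers are the usual defaults (5672 for RabbitMQ, 11211 for memcached, 5432 for PostgreSQL) and are assumptions if your configuration was changed.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True


# Default ports; adjust if the services were configured differently.
SERVICES = {"RabbitMQ": 5672, "memcached": 11211, "PostgreSQL": 5432}

if __name__ == "__main__":
    for name, port in sorted(SERVICES.items()):
        state = "up" if port_is_open("127.0.0.1", port) else "not reachable"
        print("%s (port %d): %s" % (name, port, state))
```

An open port only shows that something is listening there; it does not prove that the service accepts the credentials Celery or Django will use.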

Migrate Schema to PostgreSQL

Run migrate:

cd <git-home>
python manage.py migrate

Install Binary Files and Front-end Scripts

This step installs binary files (for BLAST, HMMER and Clustal) and front-end scripts (.js, .css files):

npm run build

Start development server

To run the development server:

cd <git-home>
python manage.py runserver

Setup Guide (MacOS)

This setup guide was tested on macOS Sierra (10.12) and macOS High Sierra (10.13), and should also work on other recent macOS versions.

Note: The following variables may be used in path names; substitute as appropriate:

<user>      :  the name of the user performing the setup.
<user-home> :  the user's home directory, e.g., /Users/<user>
<git-home>  :  the directory containing the genomics-workspace checkout; the `.git/` folder is located there.

Project Applications

Clone or refresh the genomics-workspace:

git clone https://github.com/NAL-i5K/genomics-workspace

# Or, if the repository already exists:
cd <git-home>
git fetch

Homebrew

We recommend using Homebrew as the package manager. Installation steps can be found at https://brew.sh/.

Python

Install virtualenv:

pip install virtualenv

Build a separate virtualenv:

# Change into the project root, where the virtualenv will be created:
cd genomics-workspace

# Create a virtual environment called py2.7 and activate:
virtualenv -p python2.7 py2.7
source py2.7/bin/activate

RabbitMQ

Install and run RabbitMQ Server:

brew install rabbitmq
# Make sure /usr/local/sbin is in your $PATH
rabbitmq-server

Memcached

Install and activate memcached:

brew install memcached
memcached

Database

Install PostgreSQL:

brew install postgres
psql postgres

# At the prompt 'postgres=#' enter:
create database django;
create user django;
grant all on database django to django;
ALTER USER django CREATEDB;

# Connect to django database:
\c django

# Create extension hstore:
create extension hstore;

# Exit psql (there is no separate postgres shell to leave here):
\q

Python Modules and Packages

Install additional Python packages:

cd <git-home>
pip install -r requirements.txt

Chrome Driver

Celery

Configure celery:

# Run celery manually
celery -A i5k worker --loglevel=info --concurrency=3
# Run celery beat manually as well (in a separate terminal)
celery -A i5k beat --loglevel=info

Migrate Schema to PostgreSQL

Run migrate:

cd <git-home>
python manage.py migrate

Install Binary Files and Front-end Scripts

This step installs binary files (for BLAST, HMMER and Clustal) and front-end scripts (.js, .css files):

npm run build

Start development server

To run the development server:

cd <git-home>
python manage.py runserver

Advanced Setup

JBrowse/Apollo Linkout Integration

Genomics Workspace includes a linkout integration between BLAST and JBrowse/Apollo: clicking an entry in the BLAST result table takes you directly to the corresponding sequence location. To enable it, set ENABLE_JBROWSE_INTEGRATION in i5k/settings.py:

ENABLE_JBROWSE_INTEGRATION = True
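If the same checkout serves both a development machine and a deployment, the flag could instead be read from an environment variable. This is a sketch of an optional pattern, not something the project is known to do: the helper env_flag is hypothetical, and reusing the setting name as the environment variable name is an assumption.

```python
import os


def env_flag(name, default=False, environ=os.environ):
    """Interpret an environment variable as a boolean switch."""
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


# Hypothetical replacement for the hard-coded line in a settings file:
ENABLE_JBROWSE_INTEGRATION = env_flag("ENABLE_JBROWSE_INTEGRATION")
```

With this in place, running `export ENABLE_JBROWSE_INTEGRATION=true` before starting the server would turn the integration on, and leaving the variable unset would keep it off.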

User Guide

BLAST, HMMER, and Clustal are the main functions of genomics-workspace. Each is implemented as a separate Django app.

This section goes through how to configure each of them.

In short, you need to configure databases for BLAST and HMMER, but nothing needs to be configured for Clustal.

Note

This page is for users who want to set up genomics-workspace by creating a new admin user and configuring it in the admin page. If you want to know how to use the services provided by genomics-workspace, see these tutorials:

To get started, you need to set up an admin account:

python manage.py createsuperuser

Follow the instructions shown in your terminal, then browse to the genomics-workspace admin page and log in. Usually, the admin page is at http://127.0.0.1:8000/admin/.

BLAST Database Configuration

There are five steps to create a BLAST database.

  • Add Organism (click the Organism icon in the sidebar, then click Add organism):
    • Display name should be the scientific name.
    • Short name is used by the system as an abbreviation.
    • Descriptions and NCBI taxa ID are filled in automatically.
  • Add Sequence types:
    • Used to classify BLAST DBs into distinct categories.
    • Two molecule types are available: Nucleotide and Peptide.
  • Add Sequence
  • Add BLAST DB
    • Choose Organism
    • Choose Type (Sequence type)
    • Enter the location of the FASTA file in FASTA file path (it should be in <git-home>/media/blast/db/)
    • Enter a Title name (shown on the BLAST page).
    • Enter Descriptions.
    • Check Is shown; if it is not checked, this database will not be shown on the BLAST page.
    • Save
  • Browse to http://127.0.0.1:8000/blast/; you should be able to see the page with the dataset shown there.
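Before typing a path into the FASTA file path field, it may be worth checking the file from the command line. The sketch below is a hypothetical helper, not part of genomics-workspace: it checks that the file exists, that it sits under <git-home>/media/blast/db/ as the steps above require, and that it contains at least one FASTA header line. The function names are made up for illustration.

```python
import os


def is_inside(directory, path):
    """Return True if `path` resolves to a location inside `directory`."""
    directory = os.path.realpath(directory)
    path = os.path.realpath(path)
    return path.startswith(directory + os.sep)


def count_fasta_records(path):
    """Count header lines (those starting with '>') in a FASTA file."""
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count


def check_blast_fasta(git_home, fasta_path):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    db_dir = os.path.join(git_home, "media", "blast", "db")
    if not os.path.isfile(fasta_path):
        problems.append("file does not exist: " + fasta_path)
        return problems
    if not is_inside(db_dir, fasta_path):
        problems.append("file is not under " + db_dir)
    if count_fasta_records(fasta_path) == 0:
        problems.append("no FASTA header lines ('>') found")
    return problems
```

For example, `check_blast_fasta("/path/to/genomics-workspace", "/path/to/genomics-workspace/media/blast/db/example.fa")` returns an empty list when the file passes all three checks; both paths here are placeholders.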

HMMER Database Configuration

As with BLAST, HMMER databases must be configured before they can be searched.

Go to the Django admin page and click Hmmer in the left menu bar. You need to create a HMMER db instance (Hmmer dbs) for each FASTA file.

  • Choose Organism
  • Enter the location of the peptide FASTA file in FASTA file path
  • Enter a Title name (shown on the HMMER page).
  • Enter Descriptions.
  • Check Is shown; if it is not checked, this database will not be shown on the HMMER page.
  • Save
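Because the HMMER database expects a peptide FASTA file, a quick sanity check can catch a nucleotide file supplied by mistake. The following is a heuristic sketch only, not part of genomics-workspace: it reads the first record and guesses the molecule type from the share of A/C/G/T/U/N letters. The 0.9 threshold and both function names are assumptions chosen for illustration.

```python
def read_first_sequence(path):
    """Return the residues of the first record in a FASTA file as one string."""
    chunks = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    break  # second header reached: the first record is complete
                seen_header = True
            elif seen_header and line:
                chunks.append(line)
    return "".join(chunks)


def guess_molecule_type(sequence, threshold=0.9):
    """Guess 'Nucleotide' or 'Peptide' from the share of nucleotide letters."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    nucleotide_like = sum(1 for c in letters if c in "ACGTUN")
    if float(nucleotide_like) / len(letters) >= threshold:
        return "Nucleotide"
    return "Peptide"
```

A short or unusual peptide made up mostly of A, C, G, and T residues would be misclassified, so a "Nucleotide" result is a prompt to look at the file, not proof that it is wrong.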

How to Deploy

In short, you need to set up the following tools and services: Apache HTTP server with mod_wsgi, RabbitMQ, and Celery with celerybeat.

Because Genomics Workspace is a standard Django website, deploying it is much like deploying any other Django project. We recommend deploying it with Apache and mod_wsgi.

You may also want to take a look at the Django project's documentation on deployment.

Apache HTTP server and mod_wsgi

See the Django documentation. You can also find example Apache and mod_wsgi settings files in our GitHub repo.

RabbitMQ

Use the rabbitmq-server command.

Celery and celerybeat

Here are example setup steps for Linux:

  1. Copy files:

    # when using CentOS 7.*
    # copy celeryd.sysconfig and celerybeat.sysconfig to /etc/default instead.
    sudo cp celeryd /etc/init.d
    sudo cp celerybeat /etc/init.d
    sudo cp celeryd.sysconfig /etc/sysconfig/celeryd
    sudo cp celerybeat.sysconfig /etc/sysconfig/celerybeat
    
  2. Edit ‘/etc/sysconfig/celeryd’ as follows:

    CELERYD_CHDIR="<git-home>"
    CELERYD_MULTI="<git-home>/py2.7/bin/celery multi"
    
  3. Edit ‘/etc/sysconfig/celerybeat’ as follows:

    CELERYBEAT_CHDIR="<git-home>"
    CELERY_BIN="<git-home>/py2.7/bin/celery"
    
  4. Set both services to start at boot:

    sudo chkconfig celeryd on
    sudo chkconfig celerybeat on
    

For more details, or for setup on Mac, see the Celery documentation. The example files mentioned above (celery*) are also in our GitHub repo.

Trouble Shooting

Q: I get an error message like: FATAL: Ident authentication failed. How can I fix this?

A: This is caused by the authentication settings of the PostgreSQL database. Try modifying the config file pg_hba.conf. For example, in PostgreSQL 9.5, the file is at /var/lib/pgsql/9.5/data/pg_hba.conf. Make sure the relevant part of its content looks something like:

local   all             all                               peer
host    all             all             127.0.0.1/32      ident
host    all             all             ::1/128           md5
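To see which authentication method applies to each connection type, the rules can be listed with a small script. This is a simplified sketch rather than a full pg_hba.conf parser: it assumes addresses are written in CIDR notation (as in the lines above), ignores quoted values and include directives, and the function names are hypothetical. Rules that use ident are the ones that can produce the error message in the question.

```python
def parse_pg_hba(text):
    """Return (type, database, user, address, method) tuples, one per rule."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if fields[0] == "local":
            # "local" rules (Unix socket connections) have no address column.
            conn_type, database, user, method = fields[:4]
            address = None
        else:
            conn_type, database, user, address, method = fields[:5]
        rules.append((conn_type, database, user, address, method))
    return rules


def rules_using(rules, method):
    """Return the rules whose authentication method equals `method`."""
    return [rule for rule in rules if rule[4] == method]


if __name__ == "__main__":
    # Path taken from the answer above; reading it usually requires sudo.
    with open("/var/lib/pgsql/9.5/data/pg_hba.conf") as handle:
        for rule in rules_using(parse_pg_hba(handle.read()), "ident"):
            print(rule)
```

After editing pg_hba.conf, reload or restart PostgreSQL so that the change takes effect.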

About i5k Workspace at NAL

The i5k Workspace at NAL is a platform for communities around ‘orphaned’ arthropod genome projects to access, visualize, curate and disseminate their data.

For more information, please see the i5k Workspace@NAL website.
