AnnoTree¶
Caution
This documentation server is in an early stage of development.
AnnoTree with Custom Data¶
The use of custom taxonomic, phylogenetic, proteomic, or annotation data only affects the database preparation step. All other application components operate the same way as the production version of AnnoTree.
Prepare Database¶
Documentation for database preparation is located within the AnnoTree Database Scripts section. Keep track of the name of the new database for the backend configuration.
Configure Backend¶
Documentation for backend configuration is located within the AnnoTree Backend section. Keep track of the URL that the backend is being served to for the frontend configuration.
Configure Frontend¶
Documentation for the frontend configuration is located within the AnnoTree Frontend section.
.. setup-annotree_data:
AnnoTree Production Version¶
Linux Server¶
- You need at least 150GB of disk space to properly load databases. We recommend 250GB.
- You should always use a locally attached SSD with >= 40MB/s IO speed for decent query time. Hard drives, network attached SSDs are proven to be too slow for some larger queries (can take 2 minutes). Some Cloud VMs by default will use network attached SSD and will not suit this task.
- It is recommended to create another user
annotree_user
and change user and group to that user:chown -R annotree_user:annotree_user <this repository>
- Downloading and loading database can take a long time, you might find it easier to run scripts related to them first
- It can take 30 minutes to 1 hour to follow all instructions (excluding loading time)
The following is a log of setting up a server on Feb 25, 2019, using (Ubuntu 16.04, with 2 core CPU, 8GB of RAM and 250GB of disk space):
Setting up MySQL: (If you are using Google Cloud, or other cloud services see bottom before proceeding to this section)
# download SQL dump, please refer to https://bitbucket.org/doxeylabcrew/annotree-database/src/master/ for a list of URLs
# e.g. wget <my-dump-url>
sudo apt-get update && sudo apt-get install -y mysql-server mysql-client
# <enter root password>
# The following would take a while, we recommend you use "screen" cmd to avoid terminal interruption
tar -vzxOf <path to .sql.tar.gz file> | mysql -u root -p --default-character-set=utf8
# give full permission to gtdb_user
mysql_username=annotree
mysql_password=<CHOOSE YOUR OWN PASSWORD>
echo "CREATE USER '$mysql_username'@'%' IDENTIFIED BY '$mysql_password';GRANT ALL PRIVILEGES ON *.* TO '$mysql_username'@'%'; FLUSH PRIVILEGES;" | mysql -u root -p --default-character-set=utf8
# in case you want to save it
sudo echo $mysql_password > /root/mysql_annotree_password
Now log in to the database, for sanity check
mysql -u annotree -p
# <enter your password>
show databases;
## should show everything that's loaded, keep note of the database names
use <any of the database name, e.g. gtdb_bacteria>
show tables;
## should see a list of tables
SELECT COUNT(*) FROM node;
## should show the size of node table
SHOW INDEX FROM pfam_top_hits;
## should list pfam_id and gtdb_id indices, we encountered an issue in the past when disk space ran out in /tmp and indices were not loaded
Setting up server:
sudo apt-get update && apt-get install -y git python-pip libmysqlclient-dev python-dev build-essential
sudo mkdir -p /app
sudo useradd annotree_user
sudo passwd annotree_user
<enter password>
sudo mkhomedir_helper annotree_user
sudo chown -R annotree_user:annotree_user /app
sudo su - annotree_user
cd /app
git clone --branch latest-release --depth=1 https://bitbucket.org/doxeylabcrew/annotree-backend.git
cd annotree-backend
pip install -r requirements.txt
Now you can update config.py
in backend
sudo su - annotree_user # make sure you are annotree_user, skip if you already are
cd /app/annotree-backend
cp sample-config.py config.py
vi config.py
## change mysql username and password
## change bacterial and archaeal database names as shown when you checked database
## normally they should be gtdb_bacteria and gtdb_archaea
## you may want to make sure config.py has secure permissions
chmod 440 config.py # this ensures only annotree_user and group annotree_user can read config.py
We will also show how to set up landing page and frontend here.
Landing page
sudo su - annotree_user # make sure you are annotree_user, skip if you already are
cd /app
git clone --depth=1 https://bitbucket.org/doxeylabcrew/annotree-landing-page.git
cd annotree-landing-page
# Check annotree-landing-page for newest set up instructions, what's recorded here may be outdated
sudo easy_install nodeenv # switch to your own user if necessary
sudo su - annotree_user && cd /app/annotree-landing-page # do NOT use root from now on
nodeenv node6 --node=6.14.4 # this will be stuck if you use root
. node6/bin/activate
# you have node 6 now
node --version
# should say 6.14.4
npm install
node node_modules/gulp/bin/gulp.js
# the default job compiles all SCSS files and minifies javascript, you should be good to go
sudo chown -R annotree_user:annotree_user /app/annotree-landing-page # fix permission if you accidentally installed by root
Frontend
# use your own user account with sudo access
curl -sL https://deb.nodesource.com/setup_11.x | sudo -E bash -
sudo apt-get install -y nodejs
sudo su - annotree_user # make sure you are annotree_user, skip if you already are
cd /app
git clone --branch latest-release --depth=1 https://bitbucket.org/doxeylabcrew/annotree-frontend.git
cd annotree-frontend
# Check frontend repo for specific instructions, the following instructions may be outdated
npm install
cp app/Config.js.sample app/Config.js
vi app/Config.js
# change SERVER_BASE_URL to "http://<MY IP OR DOMAIN ADRESS>/annotree-api"
npm run build
# you should see a public/ folder, this is where all html files are
# we will symlink this to inside the landing page
ln -s /app/annotree-frontend/public /app/annotree-landing-page/app
Then, to serve frontend and backend using apache WSGI module:
sudo apt-get install -y apache2 libapache2-mod-wsgi
sudo a2enmod wsgi
We will symlink annotree-landing-page
sudo ln -s /app/annotree-landing-page /var/www/html/annotree
Change apache config file using the following as an example:
sudo vi /etc/apache2/sites-available/000-default.conf
# replace with the following
Define annotree_backend_dir /app/annotree-backend
<VirtualHost *:80>
ServerAdmin annotree_backend
ServerName annotree_backend
WSGIDaemonProcess dummy.com user=annotree_user group=annotree_user processes=2 threads=25
WSGIScriptAlias /annotree-api ${annotree_backend_dir}/app.wsgi
DocumentRoot /var/www/html/annotree
<Directory ${annotree_backend_dir}>
Options Indexes FollowSymLinks
WSGIProcessGroup dummy.com
WSGIApplicationGroup %{GLOBAL}
Order deny,allow
Allow from all
Require all granted
</Directory>
ErrorLog ${APACHE_LOG_DIR}/annotree_error.log
CustomLog ${APACHE_LOG_DIR}/annotree_access.log combined
</VirtualHost>
Request to http://example.com/annotree-api/gtdb_bacteria/tree
will be converted to /gtdb_bacteria/tree
and sent to app.wsgi
. You can modify API prefix by changing WSGIScriptAlias
directive.
You will need to check frontend app/Config.js
to make sure api URL prefix matches.
Enable changes and restart:
sudo service apache2 restart
Now check if everything is working:
curl localhost # you should see the landing page html
curl localhost/app # the main app
curl localhost/annotree-api/gtdb_bacteria/tree # check database and backend
Google Cloud¶
Google Cloud by default, uses network attached SSDs and will not satisfy our database access needs.We need to allocate a local SSD, which will be removed after instance is stopped (but not restarted). Extra caution is suggested. Database will reside on this drive instead of the default.
We will also need to make sure cloud service has the correct network configuration, this can be done by checking “Allow HTTP/Allow HTTPS” traffic in the VM instance EDIT tab (not included in the following instructions)
The following has been tested on Ubuntu 16.04 LTS:
Steps to set up database on Google Cloud
First go to https://cloud.google.com/compute/docs/disks/local-ssd#creating_a_local_ssd to allocate a SSD for your machine. Then run the following
# the following mounts up SSD drive, to /mnt/disks/ssd
lsblk # lists all attached drives, usually "sdb" is the local SSD
sudo mkfs.ext4 -F /dev/sdb
sudo mkdir -p /mnt/disks/ssd
sudo mount /dev/sdb /mnt/disks/ssd
sudo chmod a+w /mnt/disks/ssd
We will install MYSQL server then move to SSD
sudo apt-get update && sudo apt-get install -y mysql-server mysql-client
# ENTER "root" as password
sudo service mysql stop
sudo mv /var/lib/mysql /mnt/disks/ssd/mysql
sudo ln -s /mnt/disks/ssd/mysql /var/lib/mysql
# Edit config, change "tmpdir" from /tmp to /tmp/mysql
sudo vi /etc/mysql/mysql.conf.d/mysqld.cnf
mkdir -p /mnt/disks/ssd/tmp-mysql
ln -s /mnt/disks/ssd/tmp-mysql /tmp/mysql
chmod a+rw /mnt/disks/ssd/tmp-mysql
Google Cloud uses app armor to manage application read/write permissions
sudo echo "alias /var/lib/mysql/ -> /mnt/disks/ssd/mysql/," >> /etc/apparmor.d/tunables/alias
sudo vi /etc/apparmor.d/usr.sbin.mysqld
Add the following to /etc/apparmor.d/usr.sbin.mysqld
; source from: https://support.plesk.com/hc/en-us/articles/360004185293-Unable-to-start-MySQL-on-Ubuntu-AVC-apparmor-DENIED-operation-open-
/proc/*/status r,
/sys/devices/system/node/ r,
/sys/devices/system/node/node*/meminfo r,
/sys/devices/system/node/*/* r,
/sys/devices/system/node/* r,
/tmp/mysql/ r,
/tmp/mysql/** rwk,
/mnt/disks/ssd/tmp-mysql/ r,
/mnt/disks/ssd/tmp-mysql/** rwk,
/mnt/disks/ssd/mysql/ r,
/mnt/disks/ssd/mysql/** rwk,
sudo apparmor_parser -r /etc/apparmor.d/usr.sbin.mysqld
sudo service mysql start
echo "Sanity check to make sure everything is working"
echo "CREATE DATABASE temp; DROP DATABASE temp;" | mysql -u root -proot
# should say ok
Now MySQL is good to go, you can continue in previous session
If you are interested in testing speed, use the following command. (It runs a large PFAM query that would normally take ~40s to >1minute on other machines):
echo "SELECT node_id
FROM gtdb_node gn
JOIN
(
SELECT
gtdb_id,
COUNT(DISTINCT pfam_id) AS num_hit_per_genome
FROM pfam_top_hits
WHERE pfam_id IN ('PF00252') AND eval <= 1.0
GROUP BY gtdb_id
HAVING num_hit_per_genome >= 1
) g
ON gn.gtdb_id = g.gtdb_id;" | mysql -u root -proot -D gtdb_bacteria_RS86
Updating
- Please check each repository (annotree-landing-page, annotree-frontend, annotree-backend) for possible instructions, what’s listed here may not be up to date
- First pull newest code to each repository by running
git pull origin latest-release:latest-release
; or for landing pagegit pull origin master
- For database, run
mysql -u annotree -p
andSHOW DATABASES;
in MySql to check if all data are loaded properly, do not forget to update backendconfig.py
in case of DB name change - In case of server domain change, you need to change that in frontend:
vi /app/annotree-frontend/app/Config.js
to make sure new domain name matches. - For backend, you may need to run
pip install -r requirements.txt
again, andnpm install && npm run build
for frontend; any change to frontend code must be followed bynpm run build
for it to compile. - Finally check firewall settings, both on your local machine and on network (You need to allow incoming traffic on any cloud services)
- run
service apache2 restart
service mysql restart
to bring up databases
AnnoTree Database Scripts¶
This repository contains all the scripts for creating a MySQL database that can be queried by the front-end AnnoTree application for the visualization of functional data on a phylogenetic tree. The scripts were written for implementation with data provided by the GTDB but any data files that use the same format may be used.
Installation Requirements¶
- You must be running a Unix-based operating system (ie. Linux or Mac)
- Install MySQL 5.7+
- Install Python 3.+
- Install required Python modules. To do this, download the
requirements.txt
file in the root of this repo and run the following:pip install -r requirements.txt
Configuration YAML and Data Files:¶
Use gtdb_database/test_data/db_config_example.yml
as a template and change the fields accordingly. It is suggested that this file be password-protected in order to hide the secure database information it contains. Consult the gtdb_database/test_data
directory for example data files.
Field | Description |
---|---|
database_name |
The name to be assigned to the database in MySQL [default: gtdb_bacteria ] |
host |
MySQL database host. This may differ from default if running in a Docker image [default: localhost ] |
port |
MySQL port. This may differ from default if running in a Docker container [default: 3306 ] |
user |
MySQL user with database creation privileges |
password |
Password for the MySQL user |
kegg_counts |
Path to count matrix with KEGG ID's as column names and genome ID's as row names. Row names must match the leaf names in newick_tree [Example: gtdb_kegg_table.test.tsv ] |
kegg_tophits_dir |
Path to directory containing KEGG hit scores for each genome. Each file must follow the naming format [genome ID]_ko_hits.tsv and have the same header and format as the files in the example directory [Example directory: ko_tophits.test ] |
metadata |
Path to metadata file supplied by the GTDB. So far only the accession (ie. genome ID), ncbi_taxonomy , gtdb_taxonomy , and ncbi_taxid fields are used so a file containing only these fields should work [Example: bac_metadata.test.tsv ] |
newick_tree |
Path to phylogenetic tree in Newick format. The tree must contain branch lengths, bootstrap values, and labels at internal nodes with taxonomic ranks following Greengenes taxonomy formatting (ie. 'p__Firmicutes'). Leaf names should represent genome ID's [Example: tsv-to-json/gtdb_r80_bac120.20171025.tree ] |
pfam_counts |
Path to count matrix with Pfam ID's as column names and genome ID's as row names. Row names must match the leaf names in newick_tree [Example: gtdb_pfam_table.test.tsv ] |
pfam_ftp_dir_url |
URL to the Pfam FTP directory corresponding to the Pfam version you would like to download [Example for v32.0: ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam32.0/database_files ] |
pfam_tophits_dir |
Path to directory containing Pfam hit scores for each genome. Each file must follow the naming format [genome ID]_pfam_tophit.tsv and have the same header and format as the files in the example directory [Example directory: pfam_tophits.test ] |
protein_seq_dir |
Path to directory containing protein FASTA files for each genome. Each file must follow the naming format [genome ID]_protein.faa . Encoded STOP codons (* ) are permitted but are removed for database loading [Example directory: protein_files.test ] |
gtdb_taxonomy |
Path to file containing the taxonomy information for each genome ID with similar headers and delimiters as those in the taxonomy files provided by the GTDB [Example: tsv-to-json/gtdb_bac_r80_20171025.tsv ] |
json_tree |
Desired output path for the JSON tree generated from data in the gtdb_taxonomy and newick_tree files. It is recommended that this file be given a version number associated with the data files used to generate it [Example: bac_r80_tree.json ] |
pfamA_sql |
Desired output path of the MySQL dump file for the pfamA table of the Pfam database. It is recommended that this file be given a version number associated with the Pfam version that was given in pfam_ftp_dir_url . If 'current_release' was used, you should verify the version [Example: pfamA_v32_0.sql ] |
pfamA_txt |
Desired output path of the MySQL data file for the pfamA table of the Pfam database. If you are running MySQL with the --secure-file-priv option (true by default), the file must be in the secure directory. It is recommended that this file be given a version number associated with the Pfam version that was given in pfam_ftp_dir_url . If 'current_release' was used, you should verify the version [Example: pfamA_v32_0.txt ] |
pfam_taxonomy_sql |
Desired output path of the MySQL dump file for the taxonomy table of the Pfam database. It is recommended that this file be given a version number associated with the Pfam version that was given in pfam_ftp_dir_url . If 'current_release' was used, you should verify the version [Example: pfam_taxonomy_v32_0.sql ] |
pfam_taxonomy_txt |
Desired output path of the MySQL data file for the taxonomy table of the Pfam database. If you are running MySQL with the --secure-file-priv option (true by default), the file must be in the secure directory. It is recommended that this file be given a version number associated with the Pfam version that was given in pfam_ftp_dir_url . If 'current_release' was used, you should verify the version [Example: pfamA_v32_0.txt ] |
tigrfam_counts |
Path to count matrix with TIGRFAM ID's as column names and genome ID's as row names. Row names must match the leaf names in newick_tree [Example: gtdb_tigrfam_table.test.tsv |
tigrfam_tophits_dir |
Path to directory containing TIGRFAM hit scores for each genome. Each file must follow the naming format [genome ID]_tigrfam_tophit.tsv and have the same header and format as the files in the example directory [Example directory: tigrfam_tophits.test ] |
tigrfam_info_dir |
Path to directory containing a .INFO file for each TIGRFAM ID. The directory can be obtained from the JCVI FTP site: ftp://ftp.jcvi.org/pub/data/TIGRFAMs/ . [Example directory: TIGRFAMs_INFO.test derived from the TIGRFAMs_15.0_INFO.tar.gz file at the JCVI FTP site] |
NOTE: All file paths must be full paths or be relative to the directory that you are running make_db.py
.
Running Instructions¶
Once you have satisfied all of the installation requirements, all data is formatted correctly, and you have generated a configuration file, run the wrapper script: python make_db.py --config path/to/config.yaml
It will generate and populate a new MySQL database and output progress to the screen and make_db.log
.
AnnoTree Backend¶
This is the backend part for AnnoTree.
Install¶
Clone this repo, then
pip install -r requirements.txt
Connection to database is in config.py
. Create config.py
, using sample-config.py
as an example, for security purpose please turn off read access for other group: chmod o-r config.py
Development server and debug
python app.py
to start development server.
App is served on localhost:5001
, you can change that in app.run
line.
Note, normally port 5001 is blocked by firewall, to allow remote access, do:
sudo iptables -I input -p tcp --dport 5001 -j ACCEPT
, BE CAREFUL THIS MAY OPEN SECURITY VULNERABILITIES
Production¶
- You need at least 150GB of disk space to properly load databases. We recommend 250GB.
- You should always use a locally attached SSD with >= 40MB/s IO speed for decent query time. Hard drives, network attached SSDs are proven to be too slow for some larger queries (can take 2 minutes). Some Cloud VMs by default will use network attached SSD and will not suit this task.
- It is recommended to create another user
annotree_user
and change user and group to that user:chown -R annotree_user:annotree_user <this repository>
- Downloading and loading database can take a long time, you might find it easier to run scripts related to them first
- It can take 30 minutes to 1 hour to follow all instructions (excluding loading time)
The following is a log of setting up a server on Feb 25, 2019, using (Ubuntu 16.04, with 2 core CPU, 8GB of RAM and 250GB of disk space):
Setting up MySQL: (If you are using Google Cloud, or other cloud services see bottom before proceeding to this section)
# download SQL dump, please refer to https://bitbucket.org/doxeylabcrew/annotree-database/src/master/ for a list of URLs
# e.g. wget <my-dump-url>
sudo apt-get update && sudo apt-get install -y mysql-server mysql-client
# <enter root password>
# The following would take a while, we recommend you use "screen" cmd to avoid terminal interruption
tar -vzxOf <path to .sql.tar.gz file> | mysql -u root -p --default-character-set=utf8
# give full permission to gtdb_user
mysql_username=annotree
mysql_password=<CHOOSE YOUR OWN PASSWORD>
echo "CREATE USER '$mysql_username'@'%' IDENTIFIED BY '$mysql_password';GRANT ALL PRIVILEGES ON *.* TO '$mysql_username'@'%'; FLUSH PRIVILEGES;" | mysql -u root -p --default-character-set=utf8
# in case you want to save it
sudo echo $mysql_password > /root/mysql_annotree_password
Now log in to the database, for sanity check
mysql -u annotree -p
# <enter your password>
show databases;
## should show everything that's loaded, keep note of the database names
use <any of the database name, e.g. gtdb_bacteria>
show tables;
## should see a list of tables
SELECT COUNT(*) FROM node;
## should show the size of node table
SHOW INDEX FROM pfam_top_hits;
## should list pfam_id and gtdb_id indices, we encountered an issue in the past when disk space ran out in /tmp and indices were not loaded
Setting up server:
sudo apt-get update && apt-get install -y git python-pip libmysqlclient-dev python-dev build-essential
sudo mkdir -p /app
sudo useradd annotree_user
sudo passwd annotree_user
<enter password>
sudo mkhomedir_helper annotree_user
sudo chown -R annotree_user:annotree_user /app
sudo su - annotree_user
cd /app
git clone --branch latest-release --depth=1 https://bitbucket.org/doxeylabcrew/annotree-backend.git
cd annotree-backend
pip install -r requirements.txt
Now you can update config.py
in backend
sudo su - annotree_user # make sure you are annotree_user, skip if you already are
cd /app/annotree-backend
cp sample-config.py config.py
vi config.py
## change mysql username and password
## change bacterial and archaeal database names as shown when you checked database
## normally they should be gtdb_bacteria and gtdb_archaea
## you may want to make sure config.py has secure permissions
chmod 440 config.py # this ensures only annotree_user and group annotree_user can read config.py
We will also show how to set up landing page and frontend here.
Landing page
sudo su - annotree_user # make sure you are annotree_user, skip if you already are
cd /app
git clone --depth=1 https://bitbucket.org/doxeylabcrew/annotree-landing-page.git
cd annotree-landing-page
# Check annotree-landing-page for newest set up instructions, what's recorded here may be outdated
sudo easy_install nodeenv # switch to your own user if necessary
sudo su - annotree_user && cd /app/annotree-landing-page # do NOT use root from now on
nodeenv node6 --node=6.14.4 # this will be stuck if you use root
. node6/bin/activate
# you have node 6 now
node --version
# should say 6.14.4
npm install
node node_modules/gulp/bin/gulp.js
# the default job compiles all SCSS files and minifies javascript, you should be good to go
sudo chown -R annotree_user:annotree_user /app/annotree-landing-page # fix permission if you accidentally installed by root
Frontend
# use your own user account with sudo access
curl -sL https://deb.nodesource.com/setup_11.x | sudo -E bash -
sudo apt-get install -y nodejs
sudo su - annotree_user # make sure you are annotree_user, skip if you already are
cd /app
git clone --branch latest-release --depth=1 https://bitbucket.org/doxeylabcrew/annotree-frontend.git
cd annotree-frontend
# Check frontend repo for specific instructions, the following instructions may be outdated
npm install
cp app/Config.js.sample app/Config.js
vi app/Config.js
# change SERVER_BASE_URL to "http://<MY IP OR DOMAIN ADRESS>/annotree-api"
npm run build
# you should see a public/ folder, this is where all html files are
# we will symlink this to inside the landing page
ln -s /app/annotree-frontend/public /app/annotree-landing-page/app
Then, to serve frontend and backend using apache WSGI module:
sudo apt-get install -y apache2 libapache2-mod-wsgi
sudo a2enmod wsgi
We will symlink annotree-landing-page
sudo ln -s /app/annotree-landing-page /var/www/html/annotree
Change apache config file using the following as an example:
sudo vi /etc/apache2/sites-available/000-default.conf
# replace with the following
Define annotree_backend_dir /app/annotree-backend
<VirtualHost *:80>
ServerAdmin annotree_backend
ServerName annotree_backend
WSGIDaemonProcess dummy.com user=annotree_user group=annotree_user processes=2 threads=25proper permissions
WSGIScriptAlias /annotree-api ${annotree_backend_dir}/app.wsgi
DocumentRoot /var/www/html/annotree
<Directory ${annotree_backend_dir}>
Options Indexes FollowSymLinks
WSGIProcessGroup dummy.com
WSGIApplicationGroup %{GLOBAL}
Order deny,allow
Allow from all
Require all granted
</Directory>
ErrorLog ${APACHE_LOG_DIR}/annotree_error.log
CustomLog ${APACHE_LOG_DIR}/annotree_access.log combined
</VirtualHost>
Request to http://example.com/annotree-api/gtdb_bacteria/tree
will be converted to /gtdb_bacteria/tree
and sent to app.wsgi
. You can modify API prefix by changing WSGIScriptAlias
directive.
You will need to check frontend app/Config.js
to make sure api URL prefix matches.
Enable changes and restart:
sudo service apache2 restart
Now check if everything is working:
curl localhost # you should see the landing page html
curl localhost/app # the main app
curl localhst/annotree-api/gtdb_bacteria/tree # check database and backend
Google Cloud¶
Google Cloud by default, uses network attached SSDs and will not satisfy our database access needs.We need to allocate a local SSD, which will be removed after instance is stopped (but not restarted). Extra caution is suggested. Database will reside on this drive instead of the default.
We will also need to make sure cloud service has the correct network configuration, this can be done by checking “Allow HTTP/Allow HTTPS” traffic in the VM instance EDIT tab (not included in the following instructions)
The following has been tested on Ubuntu 16.04 LTS:
Steps to set up database on Google Cloud
First go to https://cloud.google.com/compute/docs/disks/local-ssd#creating_a_local_ssd to allocate a SSD for your machine. Then run the following
# the following mounts up SSD drive, to /mnt/disks/ssd
lsblk # lists all attached drives, usually "sdb" is the local SSD
sudo mkfs.ext4 -F /dev/sdb
sudo mkdir -p /mnt/disks/ssd
sudo mount /dev/sdb /mnt/disks/ssd
sudo chmod a+w /mnt/disks/ssd
We will install MYSQL server then move to SSD
sudo apt-get update && sudo apt-get install -y mysql-server mysql-client
# ENTER "root" as password
sudo service mysql stop
sudo mv /var/lib/mysql /mnt/disks/ssd/mysql
sudo ln -s /mnt/disks/ssd/mysql /var/lib/mysql
# Edit config, change "tmpdir" from /tmp to /tmp/mysql
sudo vi /etc/mysql/mysql.conf.d/mysqld.cnf
mkdir -p /mnt/disks/ssd/tmp-mysql
ln -s /mnt/disks/ssd/tmp-mysql /tmp/mysql
chmod a+rw /mnt/disks/ssd/tmp-mysql
Google Cloud uses app armor to manage application read/write permissions
sudo echo "alias /var/lib/mysql/ -> /mnt/disks/ssd/mysql/," >> /etc/apparmor.d/tunables/alias
sudo vi /etc/apparmor.d/usr.sbin.mysqld
Add the following to /etc/apparmor.d/usr.sbin.mysqld
; source from: https://support.plesk.com/hc/en-us/articles/360004185293-Unable-to-start-MySQL-on-Ubuntu-AVC-apparmor-DENIED-operation-open-
/proc/*/status r,
/sys/devices/system/node/ r,
/sys/devices/system/node/node*/meminfo r,
/sys/devices/system/node/*/* r,
/sys/devices/system/node/* r,
/tmp/mysql/ r,
/tmp/mysql/** rwk,
/mnt/disks/ssd/tmp-mysql/ r,
/mnt/disks/ssd/tmp-mysql/** rwk,
/mnt/disks/ssd/mysql/ r,
/mnt/disks/ssd/mysql/** rwk,
sudo apparmor_parser -r /etc/apparmor.d/usr.sbin.mysqld
sudo service mysql start
echo "Sanity check to make sure everything is working"
echo "CREATE DATABASE temp; DROP DATABASE temp;" | mysql -u root -proot
# should say ok
Now MySQL is good to go, you can continue in previous session
If you are interested in testing speed, use the following command. (It runs a large PFAM query that would normally take ~40s to >1minute on other machines):
echo "SELECT node_id
FROM gtdb_node gn
JOIN
(
SELECT
gtdb_id,
COUNT(DISTINCT pfam_id) AS num_hit_per_genome
FROM pfam_top_hits
WHERE pfam_id IN ('PF00252') AND eval <= 1.0
GROUP BY gtdb_id
HAVING num_hit_per_genome >= 1
) g
ON gn.gtdb_id = g.gtdb_id;" | mysql -u root -proot -D gtdb_bacteria_RS86
Updating
- Please check each repository (annotree-landing-page, annotree-frontend, annotree-backend) for possible instructions, what’s listed here may not be up to date
- First pull newest code to each repository by running
git pull origin latest-release:latest-release
; or for landing pagegit pull origin master
- For database, run
mysql -u annotree -p
andSHOW DATABASES;
in MySql to check if all data are loaded properly, do not forget to update backendconfig.py
in case of DB name change - In case of server domain change, you need to change that in frontend:
vi /app/annotree-frontend/app/Config.js
to make sure new domain name matches. - For backend, you may need to run
pip install -r requirements.txt
again, andnpm install && npm run build
for frontend; any change to frontend code must be followed bynpm run build
for it to compile. - Finally check firewall settings, both on your local machine and on network (You need to allow incoming traffic on any cloud services)
- run
service apache2 restart
service mysql restart
to bring up databases
Issues and contributing¶
Please feel free to open an issue on Bitbucket page for developers to review.
AnnoTree Frontend¶
This is the frontend of AnnoTree, that is used to browse and explore GTDB microbial genome data. Full website is at: http://annotree.uwaterloo.ca
Installation¶
To create a standalone instance of AnnoTree that include frontend, backend, and database, it is recommended to use our docker installer.
The following is for developer use.
Install
npm install
Set up app/Config.js
cp app/Config.js.sample app/Config.js
# You need to change parameters in Config.js
# if you have the backend in debug mode running in 10.123.45.78:5001, then change SERVER_BASE_URL to "http://10.123.45.78:5001"
# NO TRAILING SLASHES
Start the application in development mode
npm run start
If npm run start
failed, try switching node version to 6.x.
Open http://localhost:8080 in your browser.
Build for production
npm run build
This will generate many html,js files in the public
folder. You can point Apache DocumentRoot
to public
folder to serve web pages, or symlink /var/www/html/annotree to public folder
Here is a sample apache config:
<VirtualHost example.com:80>
ServerAdmin webmaster@localhost
# here we created a symlink /var/www/html/annotree that points to public folder
DocumentRoot /var/www/html/
# alternatively use DocumentRoot /var/www/html/annotree
ErrorLog ${APACHE_LOG_DIR}/gtdb_frontend_error.log
CustomLog ${APACHE_LOG_DIR}/gtdb_frontend_access.log combined
</VirtualHost>
Now you can visit example.com/annotree
to see served pages.
Please make sure that app/Config.js
is pointed to the correct backend URL.
Issues and contributing¶
Please feel free to open an issue on Bitbucket page for developers to review.
AnnoTree Landing Page¶
This is project is based on Freelancer landing page
#Installing and Updating
Making sure you have node6
Gulp-sass, part of the build tool here works well with node 6.x
use node --version
to check your version, if different, proceed to https://github.com/ekalinin/nodeenv to install nodeenv that customizes your node environment. Specifically:
sudo easy_install nodeenv
# do NOT use root user now, otherwise script will be stuck
nodeenv node6 --node=6.14.4
. node6/bin/activate
# you have node 6 now
node --version
# should say 6.14.4
npm install
node node_modules/gulp/bin/gulp.js
# the default job compiles all SCSS files and minifies javascript, you should be good to go
Running in dev mode, real time update in browser
node node_modules/gulp/bin/gulp.js dev
Making production files
node node_modules/gulp/bin/gulp.js dev
Linking with main app¶
Symlink in this folder app/ to the public/ folder in annotree-frontend, so that all requests to <base url>/app/
gets redirected