Welcome to Mineotaur’s documentation!¶
Mineotaur is a web application to share and visually analyse high-throughput/high-content microscopy screens developed at the Carazo Salas lab of the University of Cambridge, Department of Genetics.
The project website can be found at http://www.mineotaur.org. Please cite the following paper when using Mineotaur: B. Antal, A. Chessel, R. E. Carazo Salas: Mineotaur: interactive visual analytics for high-content microscopy screens, under revision.
Getting started¶
Motivation¶
Despite the ground-breaking discoveries in genomics, the genomes of most organisms remain black boxes, with the function of the majority of genes and gene products still unknown. Moreover, many genes and proteins play roles in multiple biological processes. High-throughput/high-content microscopy-based screening (HT/HCS) provides an increasingly powerful tool to discover and functionally annotate genes and biological pathways, and has already led to several important discoveries, such as the systematic identification of genes important for mitosis, endocytosis and other fundamental processes. However, specialised large-scale image and data analysis methods are needed to produce phenotypic data, which limits such functional genomic annotation techniques to research groups that possess that expertise. As a result, the community at large has limited access to the data and limited ability to mine it further after publication, which reduces the impact of expensive HT/HC screens. Overall, while technical advances have led to an explosion in the amount of data being acquired, suitable data handling, visualization and analysis techniques are still lagging behind.
What is Mineotaur?¶
Here we propose a novel data visualization tool called Mineotaur (http://www.mineotaur.org), which will allow the community to mine further the raw multidimensional feature data and knowledge from published HT/HC screens leading to a better exploitation of experimental results. The user interface allows the members of the community without any computational knowledge to extract meaningful information from the data. The web interface can be used for querying the data and the results are visualized as plots (e.g. scatter plot, histogram) in real-time. The tool is based on a novel data model allowing the visualization and analysis of extremely large amounts of data.
About the documentation¶
Installation describes how to generate a new Mineotaur instance. To use an existing Mineotaur instance, see Using the web interface. If you want to understand the technical aspects of Mineotaur better or would like to contribute to it, go to Developing Mineotaur.
Installation¶
Requirements¶
Mineotaur requires Java 8 or higher, which can be downloaded here: http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html
If you want to build Mineotaur from source, you will also need Maven: https://maven.apache.org/
Generating a Mineotaur instance from text files¶
To generate a Mineotaur instance, you have to provide three input files: a data file containing all the measurements you want to include in Mineotaur, a label file containing the annotations assigned to the objects in Mineotaur, and an options file with the settings for the instance. A sample of each input file can be downloaded here.
Data file¶
The input data file can be in any ?SV (? Separated Values) format, where ? is an appropriate separator set in the options file (e.g. TSV - Tab Separated Values). Each line describes a set of measurements for a descriptive object, which is a unique object of interest in the experiment. Each descriptive object should be connected to a group object. Examples: descriptive object - cell, group object - gene. The file consists of a header, an object descriptor, a type descriptor and the data lines.
Header¶
The first line of the data file. The header describes the names of the properties to be stored in Mineotaur. Each name must be unique for a given object type and should contain only alphanumerical characters.
Object descriptor¶
The second line of the data file. The object descriptor describes what kind of real-world object the respective column belongs to. The object descriptors can be any string. However, it is advisable to use semantically relevant names for future use. Examples: Gene, Cell, Experiment.
Type descriptor¶
The third line of the data file. The type descriptor describes the data type for each column. The following types are accepted:
- ID: identifier for a given object. An object type can have multiple IDs.
- NUMBER: numerical data. Each numerical column of the descriptive object can be queried.
- TEXT: non-numerical data.
Data lines¶
Each line starting from the fourth should contain the actual measurements for a descriptive object and other metadata connecting it to experimental conditions.
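As a minimal sketch of the layout described above, the following Python snippet builds a small tab-separated data file: header, object descriptor line, type descriptor line, then the data lines. The property names (geneID, cellID, area, stage) and the values are made up for illustration; only the three-line structure and the ID/NUMBER/TEXT types come from this documentation.

```python
import csv
import io


def build_data_file(rows, separator="\t"):
    """Return the text of a data file: header, object line, type line, data lines."""
    header = ["geneID", "cellID", "area", "stage"]   # hypothetical property names
    objects = ["GENE", "CELL", "CELL", "CELL"]       # object each column belongs to
    types = ["ID", "ID", "NUMBER", "TEXT"]           # accepted types: ID, NUMBER, TEXT
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=separator, lineterminator="\n")
    writer.writerow(header)     # line 1: header
    writer.writerow(objects)    # line 2: object descriptor
    writer.writerow(types)      # line 3: type descriptor
    writer.writerows(rows)      # line 4 onwards: one descriptive object per line
    return buffer.getvalue()


# Made-up measurements: three cells belonging to two genes.
sample_rows = [
    ["gene1", "cell1", 12.5, "G1"],
    ["gene1", "cell2", 14.0, "S"],
    ["gene2", "cell3", 9.75, "G1"],
]
data_text = build_data_file(sample_rows)
print(data_text)
```

Writing `data_text` to a file gives an input that follows the structure above; the separator must match the one set in the options file.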
Label file¶
The label file contains the annotations for the group level objects, for example which genes were picked up as hits in a study. The label file consists of a header line and multiple label lines.
Header¶
The first line of the label file. The first column contains the name of the group object ID property from the data file, while the rest of the columns contain the annotations.
Label lines¶
Each line starting from the second contains a group object ID followed by one value per annotation: 1 if the annotation is assigned to the group object and 0 otherwise.
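The label file layout can be sketched in the same way. The snippet below is only an illustration: the annotation names (shape_hit, cell_cycle_hit) and the gene IDs are hypothetical, and the ID column name geneID is taken from the documented default.

```python
def build_label_file(group_id_name, annotations, labels, separator="\t"):
    """Build label file text.

    labels maps each group object ID to the set of annotation names assigned to it.
    """
    # Header: group object ID property name, then one column per annotation.
    lines = [separator.join([group_id_name] + annotations)]
    for group_id in sorted(labels):
        # 1 if the annotation is assigned to this group object, 0 otherwise.
        flags = ["1" if name in labels[group_id] else "0" for name in annotations]
        lines.append(separator.join([group_id] + flags))
    return "\n".join(lines) + "\n"


label_text = build_label_file(
    "geneID",
    ["shape_hit", "cell_cycle_hit"],   # hypothetical annotation names
    {
        "gene1": {"shape_hit"},
        "gene2": {"shape_hit", "cell_cycle_hit"},
        "gene3": set(),
    },
)
print(label_text)
```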
Metadata wizard¶
Mineotaur also provides a graphical interface to provide the metadata required for a standard data file by starting the wizard from the command line:
java -jar <path_to_jar file> -metadata <data_file> <separator_character>
Options file¶
The options file describes metadata for the instance generation. All options are in the following format: option_name = option_value. The following options can be set:
- (REQUIRED) name: name of the instance
- group: name of the group object (same as described in the data file). Default: GENE
- groupName: name of the group object ID property (same as described in the data file). Default: geneID
- descriptive: name of the descriptive object (same as described in the data file). Default: CELL
- total_memory: the amount of memory that can be used by Neo4j. Default: 4G
- separator: character used to separate columns in the data and the label files. Default: \t
- overwrite: whether to overwrite the current instance with the same name. Default: true
Please note that the different object caching methods of the operating systems might affect the performance of Neo4j, so it is advisable to choose the total memory setting after some experimentation. Under OS X, it is also advisable to perform a memory clean from time to time, since many objects are kept in memory, which degrades performance in the long run.
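As a rough illustration of the option_name = option_value format, the following Python sketch writes and re-reads an options file. The instance name sample_screen is hypothetical, the other values are the documented defaults, and the separator option is left out so that its default applies. The helper functions are not part of Mineotaur.

```python
def build_options(name, **overrides):
    """Return options file text; 'name' is the only required option."""
    options = {"name": name}
    options.update(overrides)
    return "".join(f"{key} = {value}\n" for key, value in options.items())


def parse_options(text):
    """Read 'option_name = option_value' lines back into a dictionary."""
    options = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition("=")
            options[key.strip()] = value.strip()
    return options


options_text = build_options(
    "sample_screen",          # hypothetical instance name
    group="GENE",
    groupName="geneID",
    descriptive="CELL",
    total_memory="4G",
    overwrite="true",
)
print(options_text)
```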
Generation from command line¶
- Download the latest jar file from http://www.mineotaur.org.
- Create an options file, a data file and a label file (see the sections above and the example input data).
- Start the data import with the following command:
java -jar <path_to_jar file> -import mineotaur.input chia_sample.tsv chia_labels.tsv
- After the database creation is completed you can start your Mineotaur instance with the following command:
java -jar <path_to_jar file> -start <instance_name>
- You can start querying at http://127.0.0.1:8080 in your browser.
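For scripted setups, the two commands above could be assembled as follows. This is a sketch only: the jar path mineotaur.jar and the instance name sample_screen are placeholders, and the argument order after -import (options file, data file, label file) is inferred from the file names in the example command rather than stated explicitly.

```python
import subprocess


def import_command(jar_path, options_file, data_file, label_file):
    """Argument list for the data import step (order assumed from the example)."""
    return ["java", "-jar", jar_path, "-import", options_file, data_file, label_file]


def start_command(jar_path, instance_name):
    """Argument list for starting a generated instance."""
    return ["java", "-jar", jar_path, "-start", instance_name]


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


commands = [
    import_command("mineotaur.jar", "mineotaur.input", "chia_sample.tsv", "chia_labels.tsv"),
    start_command("mineotaur.jar", "sample_screen"),
]
for command in commands:
    print(" ".join(command))   # inspect the commands; call run_all(commands) to execute them
```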
Generation using the wizard¶
- Download the latest jar file from http://www.mineotaur.org.
- Create an options file, a data file and a label file (see the sections above and the example input data).
- Start the data import with the following command:
java -jar <path_to_jar file> -wizard
- After the database creation is completed you can start your Mineotaur instance with the following command:
java -jar <path_to_jar file> -start <instance_name>
- You can start querying at http://127.0.0.1:8080 in your browser.
Using the web interface¶
Layout¶
Each Mineotaur instance uses the same web interface layout.
Query panel¶
In each query panel, a variety of options can be set to customize the query.
For more details, please go to the Query tools, Scatterplots and the Distribution plots pages.
Plot area¶
The plot showing the requested data will be shown below the query panel.
After the query, the Tools menu is activated, which allows different actions regarding the plot and the queried data.
For more details, please go to the Tools menu, Scatterplots and the Distribution plots pages.
Help¶
(Optional) If provided with the Mineotaur instance, clicking on the Help link shows information on the elements shown on the current page.
Query tools¶
Variable selection¶
First, the two variables (properties) to be shown need to be selected.
Clicking on the selection panel shows the available variables.
(Group level scatterplot only) The group level variables are aggregated from the descriptive level data. The aggregation mode selection panel allows you to choose how the group level values are calculated.
Finally, the descriptive data can be filtered via a filter property. For example, only those cells that are in the selected cell cycle stage are used in the query.
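The idea of aggregating descriptive level values into group level values, with an optional filter, can be sketched as follows. This is not Mineotaur's implementation: the mode names, the property names (geneID, area, stage) and the sample values are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical aggregation modes; the modes offered by an instance may differ.
AGGREGATORS = {"average": mean, "median": median, "minimum": min, "maximum": max}


def aggregate_by_group(cells, value_key, group_key="geneID", mode="average", keep=None):
    """Aggregate one descriptive level property per group object.

    keep is an optional predicate acting as the filter property.
    """
    grouped = defaultdict(list)
    for cell in cells:
        if keep is None or keep(cell):
            grouped[cell[group_key]].append(cell[value_key])
    return {group: AGGREGATORS[mode](values) for group, values in grouped.items()}


cells = [
    {"geneID": "gene1", "area": 10.0, "stage": "G1"},
    {"geneID": "gene1", "area": 14.0, "stage": "S"},
    {"geneID": "gene2", "area": 9.0, "stage": "G1"},
]
all_cells = aggregate_by_group(cells, "area")
g1_only = aggregate_by_group(cells, "area", keep=lambda cell: cell["stage"] == "G1")
print(all_cells, g1_only)
```

Each group object ends up with a single value per variable, which is what a group level scatterplot places on each axis.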
Filtering by annotation¶
The queried group objects can be filtered by their annotations. For example, only genes with a certain hit type associated are shown.
Group selection¶
Finally, we can select which group objects to show on the plot. For descriptive level scatterplots, this is done by selecting the object of interest from a list.
For group level scatterplots, objects are selected from a selection menu.
The search box at the top of the menu enables quick lookup of the objects included in the screen.
As an alternative, one could use the free text input by clicking the “Enter a list of genes” link below the selection box.
The entered gene names will be validated and the ones included in the screen will be selected. Once every option is set, click the Submit button. If you want to start over, click the Reset button, which returns every option to its default setting.
Scatterplots¶
A scatter plot shows two variables against each other in a 2D coordinate system. In a Mineotaur instance, there are two kinds of scatterplots: group level and descriptive level scatterplots. The query plot for a group level scatterplot looks like this:
Using the scatterplot¶
Once you hit the Submit button, the query is sent to the server and, if data is returned, the plot is displayed.
Coloring¶
The coloring of the data points is based on the colors associated with each annotation (hit) type, which are shown in the legend in the top right corner of the plot.
The nodes are also transparent, which enables the visual representation of multiple annotations (for which the coloring is the addition of the colors) as well as showing the distribution of the data points.
Exploring individual data points¶
Name and values¶
To see the name of the underlying data point and the respective values for the queried variables, hover the mouse over the data point.
External resource¶
(Optional) Left clicking on the data points will open an external link associated to the object, e.g. the raw images used for analysis. This option only works if the external resource is provided during the instance generation.
Subqueries¶
By invoking the context menu (e.g. right-click in Windows or CMD+click in OSX) a subquery for the selected node can be created. That is, we can see the distribution of one of the queried variables or a descriptive scatterplot.
To go back to the original scatterplot, use the browser’s back button.
Plot tools¶
Plot tools contain several tools to transform or analyze plots.
Logarithm¶
Clicking on the Logarithm checkbox transforms the axes of the plot to logarithmic scale.
To go back to the original scale, untick the checkbox.
Transpose¶
Clicking on the Transpose checkbox swaps the X-axis and the Y-axis.
To go back to the original orientation, untick the checkbox.
Regression¶
Clicking on the Regression checkbox fits a regression line on the data shown in the current plot. The type of the regression line can be selected from the selection box next to the checkbox.
To see the correlation coefficient of the regression line, hover the mouse over the line.
To remove the regression line, untick the checkbox.
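To make the regression line and its correlation coefficient concrete, here is a minimal sketch of an ordinary least squares fit together with the Pearson correlation coefficient. It assumes a simple linear fit, which may be only one of the regression types offered in the selection box, and it is not the code used by the web interface.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Least squares line y = slope * x + intercept, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)   # correlation coefficient
    return slope, intercept, r


# Made-up points lying exactly on y = 2x + 1.
slope, intercept, r = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept, r)
```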
Select area¶
To analyze a specific area of the plot, use the Select area tool. Checking the box transforms the cursor into an area selection tool, which you can use to draw a rectangle around the area to be selected.
If you are satisfied with the selection, hover over the area and click on the Analyze button.
Then, a plot showing the data points from the selected area is shown. To go back to the previous plot, use the browser’s Back button.
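Conceptually, the area selection keeps the data points whose coordinates fall inside the drawn rectangle. A minimal sketch of that filtering step, with made-up point names and coordinates and a hypothetical helper function, could look like this:

```python
def points_in_rectangle(points, corner_a, corner_b):
    """Return the points inside the rectangle spanned by two opposite corners."""
    x_min, x_max = sorted((corner_a[0], corner_b[0]))
    y_min, y_max = sorted((corner_a[1], corner_b[1]))
    return [
        point for point in points
        if x_min <= point["x"] <= x_max and y_min <= point["y"] <= y_max
    ]


points = [
    {"name": "gene1", "x": 1.0, "y": 2.0},
    {"name": "gene2", "x": 3.5, "y": 0.5},
    {"name": "gene3", "x": 2.0, "y": 2.5},
]
# The corners can be given in any order, as when dragging in any direction.
selected = points_in_rectangle(points, (2.5, 3.0), (0.5, 1.5))
print([point["name"] for point in selected])
```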
Visual filtering of nodes¶
Since scatterplots can be overcrowded, it might be hard to find individual objects on a plot. Objects of interest, for example genes, can be highlighted by selecting them from the provided list and clicking on the Filter link.
The highlighting can be reset by using the Reset link.
Setting the opacity¶
To enable the visual inspection of crowded areas, one can use the opacity slider to set the right amount of transparency.
Distribution plots¶
Distribution plots provide a graphical representation of the distribution of a variable. In Mineotaur, there are three distribution plot types, which can be selected from the Plot type selection box in the Distribution plot query menu.
Histogram¶
Histograms show the frequency of variable values across the selected dataset. The binning of the histogram is automatically calculated based on the data.
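This documentation does not state which rule Mineotaur uses to choose the bins. As an illustration of data-driven binning only, the sketch below uses Sturges' rule (an assumption, not the documented method) and equal-width bins over made-up values.

```python
from math import ceil, log2


def histogram(values, bins=None):
    """Equal-width histogram; the bin count defaults to Sturges' rule."""
    if bins is None:
        bins = ceil(log2(len(values))) + 1   # Sturges' rule (assumed, see above)
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0       # avoid zero width if all values are equal
    counts = [0] * bins
    for value in values:
        # The maximum value falls into the last bin rather than past it.
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return edges, counts


edges, counts = histogram([1, 2, 2, 3, 3, 3, 4, 8])
print(edges, counts)
```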
Multihistogram¶
(Group level only) Multihistograms show the frequency of variable values across the selected dataset, where the data is split according to the annotations assigned to the data points. The histograms belonging to the different annotations are shown in different colors. The legend is provided in the top right corner.
Kernel Density Estimation¶
Kernel Density Estimation plots show a continuous approximation of the distribution, estimated with a Gaussian kernel. In group level plots, the different colors refer to the data point annotations. The legend is provided in the top right corner.
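For readers unfamiliar with the technique, a Gaussian kernel density estimate can be sketched in a few lines. The bandwidth is passed in explicitly here because this documentation does not say how Mineotaur chooses it, and the sample values are made up.

```python
from math import exp, pi, sqrt


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of samples with Gaussian kernels."""
    norm = 1.0 / (len(samples) * bandwidth * sqrt(2 * pi))

    def density(x):
        # Sum of one Gaussian bump per sample, centred on that sample.
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


density = gaussian_kde([1.0, 1.5, 2.0, 4.0], bandwidth=0.5)
# Evaluate the smooth curve on a grid from -2 to 7, as one would for plotting.
grid = [i * 0.01 - 2.0 for i in range(901)]
curve = [density(x) for x in grid]
area = sum(curve) * 0.01   # simple Riemann sum; should be close to 1
print(round(area, 3))
```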
Developing Mineotaur¶
Software used to create Mineotaur¶
Server side:¶
- Programming language: Java 8
- Build system: Apache Maven
- Database: Neo4j
- Web framework: Spring Boot
- Template engine: Thymeleaf
- Test framework: TestNG
- Continuous integration tool: Travis CI
- Bytecode manipulation tool: Javassist
- Command line parsing: Apache Commons CLI
- Math package: Apache Commons Math
Client side:¶
- Script and markup languages: Javascript, HTML 5, CSS
- Front-end framework: Twitter Bootstrap
- Visualization: D3
- HTML manipulation: jQuery
- jQuery UI library: jQuery-UI
- jQuery spinner: spin.js
- jQuery blockUI plugin: jquery.blockui.js
- jQuery form plugin: jquery.form.js
- jQuery history plugin: history.js
- jQuery context menu: jeegoocontext
- jQuery modal widget: Magnific Popup
- jQuery multiselect widget: jQuery UI MultiSelect widget
- AMD framework: RequireJS
- General utility collection: Underscore
- Math library: numbers.js
- Regression library: regression.js
- ZLib Javascript library: Pako
Architecture of Mineotaur¶
The Mineotaur web server can be accessed both from a web interface and programmatically using REST. The web server handles the interaction with the graph database containing the HT/HCS data.
Server side architecture¶
The web server is based on the Spring Model-View-Controller (MVC) framework, using Thymeleaf as a template engine. The data is stored in the Neo4j graph database. A web client can access the content by making an HTTP request to the server, which will query the appropriate data from the database and render a web page from a Thymeleaf template.
Client side architecture¶
On the client side, all interaction is done using a JavaScript application. The application is modular, with different modules responsible for handling events (Controller), carrying data values (Context), manipulating web pages (UI), generating plots (Plot) and providing general functionality (Utilities).
Features planned to be included in further releases of Mineotaur¶
- Omero integration
- REST client libraries
- Time-lapse data handling
- Network data handling
If you have any other suggestions, please let us know at info@mineotaur.org!
Licence¶
Mineotaur: a visual analytics tool for high-throughput microscopy screens Copyright (C) 2014 Bálint Antal (University of Cambridge)
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.