
paplot documentation


About

paplot is a suite of programs to create various dynamic and interactive reports for cancer genome analysis.
_images/mutation_list.PNG

Available reports

  1. Quality Control (QC) Report

QC Reports present the quality of each sequencing dataset (sequencing coverage, alignment ratio, insert sizes, etc.).

_images/qc_dummy.PNG
  2. Chromosomal Aberration (CA) Report

CA Reports present the sample-wise landscape of chromosomal aberrations (e.g., structural variations and gene fusions).

_images/sv_dummy.PNG
  3. Mutation Matrix Report

Mutation Matrix Reports present the mutation status of each gene (vertical axis) and sample (horizontal axis).

_images/mut_dummy.PNG
  4. Mutational Signature Report

Mutational Signature Reports present the mutation signatures identified in the cohort and contribution ratios of signatures per sample.

_images/sig_dummy.PNG

Representation by pmsignature is also possible.

_images/pmsig_dummy.PNG

Quick Start

In this section, we will learn to

  1. Install paplot
  2. Execute paplot with simple example data
  3. View the output reports
  4. Modify configuration files and use your own data.

1. Install paplot

paplot has been confirmed to work on Python 2.7 and Python 3.5.
At present, paplot requires only the Python standard library.
cd {the directory where you want to install paplot}
# for v0.5.4
wget https://github.com/Genomon-Project/paplot/archive/v0.5.4.zip
unzip v0.5.4.zip
cd paplot-0.5.4/

python setup.py build install

Confirmation of installation

Execute the following command.
paplot --version
If the following message appears, the installation was successful.
paplot-0.5.4
For more detailed information on installation, please consult Install.

2. Execute paplot with simple example data

The basic commands of paplot are as follows:
For a detailed description of the options, please consult paplot command.
paplot subcommand [--config_file CONFIG_FILE] [--title TITLE]
                  [--ellipsis ELLIPSIS] [--overview OVERVIEW]
                  [--remarks REMARKS]
                  input output_dir project_name

Required arguments

subcommand:

The type of report to generate. Select from the following:

  • qc
  • ca
  • mutation
  • signature
  • pmsignature
input:

The input data table.

output_dir:

The directory wherein the output files of paplot are generated.

project_name:

The project name (used as the title of the output files).

Please execute paplot using the prepared sample data.

cd {the path where paplot is installed}

# QC Report
paplot qc example/qc_brush/data.csv ./tmp demo

# Chromosomal Aberration Report
paplot ca example/ca_option/data.csv ./tmp demo

# Mutation Matrix Report
paplot mutation example/mutation_option/data.csv ./tmp demo

# Mutational Signature Report
paplot signature "example/signature_stack/data*.json" ./tmp demo

# pmsignature Report
paplot pmsignature "example/pmsignature_stack/data*.json" ./tmp demo

3. View the output reports

You will find the following directory structure:

The directory specified by the {output_dir} argument
  ├ demo
  │   ├ graph_ca.html            <--- Chromosomal Aberration Report
  │   ├ graph_mut.html           <--- Mutation Matrix Report
  │   ├ graph_pmsignature2.html  <--- pmsignature Report (with varying number of mutation signatures)
  │   ├ graph_pmsignature3.html
  │   ├ graph_pmsignature4.html
  │   ├ graph_pmsignature5.html
  │   ├ graph_pmsignature6.html
  │   ├ graph_qc.html            <--- QC Report
  │   ├ graph_signature2.html    <--- Mutational Signature Report (with varying number of mutation signatures)
  │   ├ graph_signature3.html
  │   ├ graph_signature4.html
  │   ├ graph_signature5.html
  │   └ graph_signature6.html
  │
  ├ js          <--- These four directories are required to display the HTML files. Do not remove them.
  ├ layout
  ├ lib
  ├ style
  │
  └ index.html             <--- Open this file in a web browser.
Open index.html file in a web browser, and you will find the following reports.

QC Report
_images/qc_dummy.PNG
Chromosomal Aberration Report
_images/sv_dummy.PNG
Mutation Matrix Report
_images/mut_dummy.PNG
Mutational Signature Report
_images/sig_dummy.PNG
pmsignature Report
_images/pmsig_dummy.PNG
For how to interpret each graph, please refer to HOW TO USE GRAPHS.

4. Modify configuration files and use your own data

Please consult the following links to set up your own data and configuration files.

[For basic use]

Install

paplot runs on:
  • Linux
  • MacOS X
  • Windows
paplot requires Python 2.7 or Python 3.5 (earlier versions of Python have not been tested).

For Linux

1. Install paplot

Download the latest source code (zip file) from the paplot website (https://github.com/Genomon-Project/paplot/releases/).
cd {the directory where you want to install paplot}
# For v0.5.4
wget https://github.com/Genomon-Project/paplot/archive/v0.5.4.zip
unzip v0.5.4.zip
cd paplot-0.5.4/

python setup.py build install

# If you get an error with the above command, try this
export PATH=~/.local/bin/:$PATH
export LD_LIBRARY_PATH=~/.local/lib/:$LD_LIBRARY_PATH
python setup.py build install --user
Verify the installation. If the following message appears, the installation was successful.
paplot --version
paplot-0.5.4
After installation, open Quick Start.

Note

Set PATH

Add the following two lines to ~/.bashrc or ~/.bash_profile.
export PATH=~/.local/bin/:$PATH
export LD_LIBRARY_PATH=~/.local/lib/:$LD_LIBRARY_PATH

For MacOS X

1. Download source files

Download the latest source code (zip file) from the paplot website (https://github.com/Genomon-Project/paplot/releases/).
Alternatively, if git is installed, you can type git clone -b master https://github.com/Genomon-Project/paplot.git.

2. Install paplot

Launch Terminal and change to the directory where the source files were downloaded.
cd {the directory where paplot source files are downloaded}
Install paplot.
python setup.py build install --user

3. Setting PATH

Add the directory containing the paplot executable to PATH in Terminal.
Generally, the paplot executable is installed in the following directory:

/Users/<user name>/Library/Python/2.7/bin

export PATH={the directory where paplot is installed}/bin:$PATH
export LD_LIBRARY_PATH={the directory where paplot is installed}/lib:$LD_LIBRARY_PATH

# Mostly you can set up by adding the following lines (replace <user name> with your user name).
# export PATH=/Users/<user name>/Library/Python/2.7/bin:$PATH
# export LD_LIBRARY_PATH=/Users/<user name>/Library/Python/2.7/lib:$LD_LIBRARY_PATH
Verify the installation.
paplot --version
paplot-0.5.4
If this message appears, the installation was successful. Then, open Quick Start.

For Windows

1. Install Python

To run paplot on Windows, we recommend using either WinPython or Python(x,y).
Alternatively, you can use Cygwin (in that case, refer to For Linux).
paplot has been verified with Python 2.7.10 and Python 3.5.3.

2. Install paplot

Download the latest source code (zip file) from the paplot website (https://github.com/Genomon-Project/paplot/releases/),
and unzip it into any folder.
Launch Command Prompt and change to the directory where the paplot source files were unzipped.
cd {the directory where the source files are unzipped}
Execute the command for installing paplot.

Attention

The following command is for the case where WinPython-64bit-2.7.10.3 is installed.

> C:\WinPython-64bit-2.7.10.3\python-2.7.10.amd64\python.exe setup.py build install
Then, execute the test command.
> C:\WinPython-64bit-2.7.10.3\python-2.7.10.amd64\python.exe paplot --version
paplot-0.5.4
If this message appears, the installation was successful.

After installation, open Quick Start.

paplot command

1. Basic usage

paplot subcommand [--config_file CONFIG_FILE] [--title TITLE]
                  [--ellipsis ELLIPSIS] [--overview OVERVIEW]
                  [--remarks REMARKS]
                  input output_dir project_name

Required arguments

subcommand:

The type of report to generate. Select from the following:

  • qc
  • ca
  • mutation
  • signature
  • pmsignature
input:

Input files. If you wish to process multiple files (usually divided by individual samples), please refer to Processing multiple input files.

# for single input file
paplot mutation {unzip_path}/example/mutation_minimal/data.csv ./tmp mutation_minimal \
--config_file {unzip_path}/example/mutation_minimal/paplot.cfg

# for multiple input files, delimit them by comma
paplot mutation \
{unzip_path}/example/mutation_split_file/SAMPLE00.data.csv,{unzip_path}/example/mutation_split_file/SAMPLE01.data.csv \
./tmp mutation_split_file1 --config_file {unzip_path}/example/mutation_split_file/paplot.cfg

# paplot also accepts wildcards. In this case, enclose the input in double quotation marks
paplot mutation "{unzip_path}/example/mutation_split_file/*.csv" ./tmp mutation_split_file2 \
--config_file {unzip_path}/example/mutation_split_file/paplot.cfg
output_dir: Output directory path. Refer to Output directory for details on its contents.
project_name: Project name (used as the title of the output files).
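The quotation marks matter because, without them, the shell itself expands *.csv into several space-separated arguments before paplot runs. The following Python sketch shows one way a comma-separated, wildcard-aware input argument could be expanded into a file list. It is an illustration only; expand_inputs is a hypothetical helper, not part of paplot's API.

```python
import glob


def expand_inputs(arg):
    """Expand a comma-separated input argument into a list of file paths.

    Hypothetical helper for illustration: entries containing wildcard
    characters are matched with glob (sorted); other entries are kept as-is.
    """
    paths = []
    for entry in arg.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty items, e.g. from a trailing comma
        if any(ch in entry for ch in "*?["):
            paths.extend(sorted(glob.glob(entry)))
        else:
            paths.append(entry)
    return paths


# Comma-separated files are returned in the order given.
print(expand_inputs("SAMPLE00.data.csv,SAMPLE01.data.csv"))
```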

2. Output directory

You will find the following directory structure:

{output_dir}
  ├ {project_name}
  │   └ graph_*.html      <--- Each report
  │
  ├ js          <--- These four directories are required to display the HTML files. Do not remove them.
  ├ layout
  ├ lib
  ├ style
  │
  └ index.html             <--- Open this file in a web browser.

If you wish to move the output, move the entire output directory. For the usage of each report, please refer to HOW TO USE GRAPHS.

3. Options

You can add the following optional arguments:

--config_file Path to the configuration file. If it is not specified, the default file is used.
--title Title of the graph.
--ellipsis Abbreviated name of the graph used in file names (e.g., ca in graph_ca.html). This is convenient when writing multiple reports to the same directory.
--overview Outline of the graph (displayed in the index.html file).
--remarks Text displayed in the remarks section of the index.html file (the default value is set by the remarks option in the [style] section of the configuration file).

The default values are as follows:

subcommand   title            ellipsis     overview                                       remarks
qc           QC graphs        qc           Quality Control of bam.                        None
ca           CA graphs        ca           Chromosomal Aberration.                        None
mutation     Mutation Matrix  mutation     Gene-sample mutational profiles.               None
signature    Signature        signature    Mutational Signatures.                         None
pmsignature  PMSignature      pmsignature  Express mutational signatures in pmsignature.  None

Quality Control (QC) Report

The QC Report displays the quality of each sequencing dataset, such as sequencing depth and coverage, mapping ratio, and duplicate ratio.

When you hover the mouse over a bar, detailed information on that sample and property is displayed in a pop-up.

The light grey barplot at the top shows the average depth of each sequencing dataset. [*]
Selecting a range in this panel zooms all the graphs below in on the selected samples, so that an easily viewable number of samples is displayed.
[*]The graph for selecting samples can be changed in the configuration file. Please refer to Graph for selecting samples.
_images/qc_operation.PNG

Chromosomal Aberration (CA) Report

CA Report displays a landscape of chromosomal aberrations, such as structural variations (typically identified by genome sequence data) and gene fusions (by transcriptome sequence data).
  • The barplot at the top panel displays the distribution of the breakpoints of the CAs in the cohort.
  • The circular plots below illustrate the CIRCOS-like profile of the CAs for each sample, where two edges of a curved line represent the breakpoints of each CA.
When you select a region in the barplot at the top, samples with any breakpoint in the selected region are highlighted (when Style of selected graphs is set to “Highlight selected graphs”), or samples without any breakpoint in the region are hidden (when it is set to “Hide non-selected graphs”).
_images/sv_operation1.PNG
In the default setting, CAs are categorized as Inter-chromosome (two breakpoints are on different chromosomes) or Intra-chromosome (two breakpoints are on the same chromosome). [*] If you uncheck any of the groups, the CAs within the group will disappear.
[*]

About categorization

The categorization can be changed in the configuration file. Please refer to Customizing categorization.

_images/sv_operation2.PNG
Click the circular plot of a sample to enlarge it.
View the details by hovering the mouse over the lines connecting the breakpoints.
_images/sv_operation3.PNG

Mutation Matrix Report

Mutation Matrix Report displays a landscape of mutation status across genes (vertical axis) and samples (horizontal axis).

Horizontal bar chart (Sample):
Displays the total number of mutations detected for each sample.
Vertical bar chart (Gene):
Displays the number of mutations and the fractions of mutation types (e.g., nonsynonymous, stopgain) for each gene.

- If a sample has multiple mutations in the same gene, they are counted as one.
- If a sample has multiple mutations of different types in the same gene, only the mutation type with the highest priority is counted. For example, in the default setting, stopgain has a higher priority than nonsynonymous. The order of priority can be modified in the configuration file.
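The two counting rules above can be sketched in Python as follows. This is not paplot's code: the priority list PRIORITY and the function count_gene_mutations are hypothetical, and in paplot the real order of priority comes from the configuration file.

```python
# Hypothetical priority order (highest first), for illustration only;
# paplot reads the real order from its configuration file.
PRIORITY = ["stopgain", "nonsynonymous", "synonymous"]


def count_gene_mutations(records, priority=PRIORITY):
    """Count mutated samples per gene and per mutation type.

    records: iterable of (sample, gene, mutation_type) tuples.
    Each (sample, gene) pair is counted once, under its highest-priority type.
    Returns {gene: {mutation_type: number_of_samples}}.
    """
    rank = {mtype: i for i, mtype in enumerate(priority)}
    lowest = len(rank)  # types missing from the list rank below all listed ones
    best = {}
    for sample, gene, mtype in records:
        current = best.get((sample, gene))
        if current is None or rank.get(mtype, lowest) < rank.get(current, lowest):
            best[(sample, gene)] = mtype

    counts = {}
    for (sample, gene), mtype in best.items():
        per_type = counts.setdefault(gene, {})
        per_type[mtype] = per_type.get(mtype, 0) + 1
    return counts


records = [
    ("SAMPLE00", "GATA3", "nonsynonymous"),
    ("SAMPLE00", "GATA3", "stopgain"),  # same sample and gene: stopgain wins
    ("SAMPLE01", "GATA3", "nonsynonymous"),
    ("SAMPLE01", "NRAS", "synonymous"),
]
print(count_gene_mutations(records))
```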
Mutation type:
The mutation type is displayed with a distinct color. If you wish to hide a specific type of mutations, uncheck them in this section.
Subplot:
If meta information is available for the samples (e.g., clinical information), it can be displayed as a subplot. The file containing it must be specified in the configuration file before executing the paplot command.
_images/mut_operation1.PNG

How to view

_images/mut_operation2.PNG _images/mut_operation2_2.PNG

1. X-axis sort

Change the order of items on the horizontal axis:

  • None ... Default order
  • ASC ... Ascending order
  • DESC ... Descending order

Items can be sorted by the following keys (multiple keys can be combined):

Sample ID:
Sort by sample name.
Mutation number:
Sort by the number of mutations per sample.
Genes:
Sort by the mutation status of the selected genes. After selecting either ASC or DESC, select the gene to add from the [Select gene name] list box, and click the [Add sort key] button.
Automatic Gantt-chart:
A Gantt chart can be created automatically. Enter the number of genes to display [*] in the horizontal edit box, and click the [Gantt-chart] button.

How the Gantt chart is generated

  1. First, sort the genes in descending order of the number of mutations.
  2. Then, divide the samples into two groups according to the mutation status of the first gene, placing the group with the mutation on the left and the other group on the right. Repeat this procedure within each group for the second gene, the third gene, and so on.
[*]Ideally, all the detected genes would be displayed. However, processing becomes heavier as the number of genes increases, so in many cases it is practical to narrow down the gene list.
_images/mut_operation3.PNG
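The two Gantt-chart steps described above amount to a lexicographic sort of the samples, which the following Python sketch makes explicit. gantt_order and its inputs are hypothetical; the sketch treats a gene as mutated in a sample if it carries any mutation, and it is not taken from paplot's source.

```python
def gantt_order(samples, mutated, genes):
    """Order samples Gantt-chart style (mutated samples to the left).

    samples: sample IDs in their current order.
    mutated: set of (sample, gene) pairs that carry at least one mutation.
    genes:   genes to use for sorting.
    """
    def n_mutated(gene):
        return sum(1 for s in samples if (s, gene) in mutated)

    # Step 1: genes in descending order of the number of mutated samples.
    ranked = sorted(genes, key=n_mutated, reverse=True)

    # Step 2: splitting by the first gene, then by the second gene within each
    # group, and so on, is the same as sorting lexicographically on the
    # mutation-status vector, with 0 (mutated) placed before 1 (not mutated).
    def status(sample):
        return tuple(0 if (sample, g) in mutated else 1 for g in ranked)

    return sorted(samples, key=status)


mutated = {("S1", "KRAS"), ("S2", "TP53"), ("S3", "TP53"), ("S3", "KRAS"), ("S4", "TP53")}
print(gantt_order(["S1", "S2", "S3", "S4"], mutated, ["KRAS", "TP53"]))
```

Because Python's sort is stable, samples with identical mutation status keep their previous relative order.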

2. Y-axis sort

Change the order of items on the vertical axis.

  • None ... Default order
  • ASC ... Ascending order
  • DESC ... Descending order

Items can be sorted only by the following keys (multiple keys can be combined):

Mutation number:
Sort by the number of mutations per gene.
Gene name:
Sort by gene name.

3. Sample filter

Sets the maximum value of the vertical axis of the horizontal bar chart.

In some cases, a few samples have far more mutations than the others.
In such cases, setting a threshold for the maximum number of mutations makes the graph much easier to read.
Enter the threshold value in the horizontal edit box, then click the [Update filter] button.
In the default setting (blank), the maximum of the axis is automatically set to the largest number of mutations among the samples in the cohort.

Before and after filter application

Example of display when maximum value is set to 200:

_images/mut_operation4.PNG

4. Genes filter

Sets the filter for the genes displayed on the vertical axis.

Rate:
Percentage of samples with mutations in each gene (%). The initial value is 0% (no filtering).
Display maximum:
Maximum number of genes to display.

After setting the above items, please click the [update filter] button.
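A minimal sketch of the gene filter, assuming that Rate acts as a minimum percentage of mutated samples and that Display maximum keeps the most frequently mutated genes. filter_genes and the counts in the example are hypothetical.

```python
def filter_genes(mutated_samples, n_samples, min_rate=0.0, max_genes=None):
    """Hypothetical gene filter.

    mutated_samples: {gene: number of samples with a mutation in that gene}.
    min_rate: minimum percentage of mutated samples (0 means no filtering).
    max_genes: keep at most this many genes, most frequently mutated first.
    """
    kept = [gene for gene, n in mutated_samples.items()
            if 100.0 * n / n_samples >= min_rate]
    kept.sort(key=lambda gene: mutated_samples[gene], reverse=True)
    if max_genes is not None:
        kept = kept[:max_genes]
    return kept


counts = {"TP53": 8, "KRAS": 3, "GATA3": 1}  # made-up counts for 10 samples
print(filter_genes(counts, 10, min_rate=20))
```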

Mutational Signature Report

The Mutational Signature Report displays the “mutation signatures” (see e.g., Alexandrov et al., Nature, 2013) as bargraphs and the estimated “contributions” of each signature to the mutations of each sample as stacked bargraphs.

Upper panel (mutation signature graph):
Mutation signatures are displayed (typically as barplots with 96 elements).
Lower panel (signature contribution graph):
For each sample, the contributions of the mutation signatures to its mutations are displayed.
_images/sig_dummy.PNG

In addition, you can change the display mode using the list box below the stacked graph.

View mode:
Rate: The percentage (%) of each signature's contribution, normalized under a sum-to-one constraint.
Count: The contributions scaled by the actual number of mutations.
Sort by:
Sample ID: Sort by sample ID.
Mutation count: Descending order of the number of mutations (applicable only when the view mode is Count).
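The difference between the two view modes can be sketched as follows, assuming that Count simply scales each sample's normalized contributions by its number of mutations. signature_bars and the example numbers are hypothetical.

```python
def signature_bars(contributions, mutation_count, mode="rate"):
    """Convert one sample's signature contributions into stacked-bar heights.

    contributions: estimated contribution of each signature (any scale).
    mutation_count: total number of mutations in the sample.
    mode: "rate"  -> percentages that add up to 100 (sum-to-one normalization)
          "count" -> the same fractions scaled to the number of mutations
    """
    total = float(sum(contributions))
    fractions = [c / total for c in contributions]
    if mode == "rate":
        return [100.0 * f for f in fractions]
    if mode == "count":
        return [f * mutation_count for f in fractions]
    raise ValueError("mode must be 'rate' or 'count'")


print(signature_bars([2, 1, 1], 200, "rate"))
print(signature_bars([2, 1, 1], 200, "count"))
```

In Rate mode every bar has the same height, which is why sorting by mutation count only makes sense in Count mode.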

Display example when the view mode is [Count] and sorting is by [Mutation count].

_images/sig_operation1.PNG

A similar representation is adopted for the case with pmsignature.

_images/pmsig_dummy.PNG

QC Report

Here, we describe the procedure to generate a QC Report using the sample data [*].

[*]The sample data are included in the example directory of the paplot source directory.

1. Minimal dataset

To generate a QC Report with paplot, the sample ID (Sample) and at least one QC item are required. In this example, we use the mean sequencing depth (AverageDepth).

example/qc_minimal/data.csv
Sample,AverageDepth
SAMPLE1,70.0474
SAMPLE2,65.7578
SAMPLE3,63.3750
SAMPLE4,70.9654
SAMPLE5,69.9653

First, set the column names in the [result_format_qc] section of the configuration file.

example/qc_minimal/paplot.cfg
[result_format_qc]
col_opt_id = Sample
col_opt_key1 = AverageDepth

The column names of optional items can be set as col_opt_{keyword} = {actual column name}.

For a more detailed description on keyword, please refer to About keyword.
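To make the col_opt_{keyword} convention concrete, the following Python 3 sketch reads a [result_format_qc] section with the standard configparser module and builds a keyword-to-column mapping. It is an illustration, not paplot's own parser. Note that configparser lowercases option names by default, so the sketch sets optionxform to keep keywords such as keyA1 intact.

```python
import configparser  # Python 3 (the Python 2.7 module is named ConfigParser)

CONFIG_TEXT = """
[result_format_qc]
col_opt_id = Sample
col_opt_keyA1 = AverageDepth
col_opt_keyB1 = ReadLengthR1
col_opt_keyB2 = ReadLengthR2
"""


def keyword_columns(config_text, section="result_format_qc"):
    """Map each {keyword} to its column from the col_opt_{keyword} options."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive (keyA1, not keya1)
    parser.read_string(config_text)
    prefix = "col_opt_"
    return {
        name[len(prefix):]: value
        for name, value in parser.items(section)
        if name.startswith(prefix)
    }


print(keyword_columns(CONFIG_TEXT))
```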

Then, add the [qc_chart_1] section to the configuration file and fill in its contents.

example/qc_minimal/paplot.cfg
[qc_chart_1]

# Title of the graph
title = Average depth

# Label of the Y axis
title_y = Average of depth

# Items for the stacked bargraph
# In this example, only one item is used, so the graph is displayed as a non-stacked bargraph
stack1 = {key1}

# Color and legend of the graph
name_set = Average depth:#2478B4

# Pop-up information
tooltip_format1 = Sample:{id}
tooltip_format2 = {key1:.2}

Note

Here, the {key1} placeholder used above refers to the keyword specified in the [result_format_qc] section.

  • For a more detailed description on the procedure to set name_set, please refer to How to set name_set.
  • For a more detailed description on the procedure to set tooltip_format, please refer to User defined format.

Then, execute paplot.

paplot qc {unzip_path}/example/qc_minimal/data.csv ./tmp qc_minimal \
--config_file {unzip_path}/example/qc_minimal/paplot.cfg

2. Without header

example/qc_noheader/data.csv
SAMPLE1,70.0474
SAMPLE2,65.7578
SAMPLE3,63.3750
SAMPLE4,70.9654
SAMPLE5,69.9653

When the input data has no header (column names), you must set the column number (starting from 1) for each key in the [result_format_qc] section of the configuration file.

example/qc_noheader/paplot.cfg
[result_format_qc]
# Set the value of the header option to False
header = False

col_opt_id = 1
col_opt_average_depth = 2
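With header = False, each key points to a 1-based column number (col_opt_id = 1 selects the first column). The following Python sketch, using the standard csv module, shows how such numbers map to fields. read_headerless is a hypothetical helper, not paplot's reader.

```python
import csv
import io

# First rows of example/qc_noheader/data.csv
DATA = "SAMPLE1,70.0474\nSAMPLE2,65.7578\n"

# The same 1-based column numbers as in the configuration above
COLUMNS = {"id": 1, "average_depth": 2}


def read_headerless(text, columns):
    """Read CSV text without a header, picking fields by 1-based column number."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if fields:  # skip blank lines
            rows.append({key: fields[num - 1] for key, num in columns.items()})
    return rows


print(read_headerless(DATA, COLUMNS))
```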

Then, execute paplot.

paplot qc {unzip_path}/example/qc_noheader/data.csv ./tmp qc_noheader \
--config_file {unzip_path}/example/qc_noheader/paplot.cfg

3. Stacked bargraph

Here, we generate a report with a stacked bargraph as well as a normal bargraph (generated in the minimal dataset example).

example/qc_stack/data.csv
Sample,AverageDepth,ReadLengthR1,ReadLengthR2
SAMPLE1,70.0474,265,270
SAMPLE2,65.7578,140,200
SAMPLE3,63.375,120,175
SAMPLE4,70.9654,120,140
SAMPLE5,69.9653,230,110

  • chart_1 [normal bargraph] AverageDepth (the same as the minimal dataset example)
  • chart_2 [stacked bargraph] ReadLengthR1, ReadLengthR2

First, add these columns to the [result_format_qc] section in the configuration file.

example/qc_multi_plot/paplot.cfg
[result_format_qc]
col_opt_id = Sample

# Column used in the chart_1
col_opt_keyA1 = AverageDepth

# Column used in the chart_2
col_opt_keyB1 = ReadLengthR1
col_opt_keyB2 = ReadLengthR2

The column names of optional items can be set as col_opt_{keyword} = {actual column name}.

For a more detailed description on keyword, please refer to About keyword.

Next, add the [qc_chart_1] and [qc_chart_2] sections to the configuration file and fill in their contents.

To add more graphs to the QC Report, add more [qc_chart_*] sections.
Replace * with an index starting from 1 (qc_chart_1, qc_chart_2, ...).

For the completed configuration file, please refer to example/qc_stack/paplot.cfg.

3-1. Normal bargraph

The [qc_chart_1] section is for a normal bargraph, and the contents should be filled as in the minimal dataset example.

3-2. Stacked bargraph

The [qc_chart_2] section is for a stacked bargraph.

example/qc_multi_plot/paplot.cfg
[qc_chart_2]

# Titles
title = Chart 2: Read length
title_y = Read length

# Items for the stacked bargraph
# Items are stacked in the order of stack1 → 2 → ...
stack1 = {keyB1}
stack2 = {keyB2}

# Color and legend
# List them in the order of stack1 → 2 → ..., and join them by commas ','.
name_set = Read length r1:#2478B4, Read length r2:#FF7F0E

# Pop-up information
tooltip_format1 = Sample:{id}
tooltip_format2 = Read1: {keyB1:,}
tooltip_format3 = Read2: {keyB2:,}

Note

Here, the {key*} placeholders used above refer to the keywords specified in the [result_format_qc] section.

  • For a more detailed description on the procedure to set name_set, please refer to How to set name_set.
  • For a more detailed description on the procedure to set tooltip_format, please refer to User defined format.

Then, execute paplot.

paplot qc {unzip_path}/example/qc_multi_plot/data.csv ./tmp qc_multi_plot \
--config_file {unzip_path}/example/qc_multi_plot/paplot.cfg

3-3. How to set name_set

Define the legends and their colors.

Write {legend}:{color} for each item in the stacked bargraph (colors can be omitted).

name_set = average_depth:#2478B4

# When there are multiple items, join them by commas ','.
name_set = Read length r1:#2478B4, Read length r2:#FF7F0E

When colors are omitted, the default colors defined in the following file are used:

_images/default_color.PNG
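The {legend}:{color} syntax can be parsed as in the Python sketch below. parse_name_set is a hypothetical helper, and FALLBACK_COLORS is a placeholder palette for illustration only; paplot's actual default colors are the ones shown in the figure above.

```python
# Placeholder palette for illustration only (not paplot's defaults).
FALLBACK_COLORS = ["#111111", "#222222", "#333333"]


def parse_name_set(value, fallback=FALLBACK_COLORS):
    """Split a name_set value into (legend, color) pairs.

    Items are separated by commas; each item is "legend:color" or just
    "legend", in which case a color is taken from the fallback palette
    by position.
    """
    items = [item.strip() for item in value.split(",") if item.strip()]
    pairs = []
    for i, item in enumerate(items):
        legend, sep, color = item.rpartition(":")
        if not sep:  # no color given
            legend, color = item, fallback[i % len(fallback)]
        pairs.append((legend.strip(), color.strip()))
    return pairs


print(parse_name_set("Read length r1:#2478B4, Read length r2:#FF7F0E"))
```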

4. Various graphs

In the previous example, we generated a report with one normal bargraph and one stacked bargraph. Here, we generate more graphs.

example/qc_variation/data.csv
Sample,AverageDepth,ReadLengthR1,ReadLengthR2,TotalReads,MappedReads,2xRatio,10xRatio,20xRatio,30xRatio
SAMPLE1,70.0474,265,270,94315157,56262203,0.9796,0.768,0.6844,0.6747
SAMPLE2,65.7578,140,200,50340277,33860998,0.8489,0.7725,0.7655,0.6131
SAMPLE3,63.375,120,175,90635480,88010999,0.9814,0.8236,0.6045,0.5889
SAMPLE4,70.9654,120,140,72885114,89163960,0.9047,0.8303,0.7032,0.6801
SAMPLE5,69.9653,230,110,92572101,28793615,0.9776,0.9452,0.672,0.6518

  • chart_1 [normal bargraph] AverageDepth (the same as the minimal dataset example)
  • chart_2 [stacked bargraph] ReadLengthR1, ReadLengthR2 (the same as the previous example)
  • chart_3 [normal bargraph] MappedReads divided by TotalReads (mapping ratio)
  • chart_4 [stacked bargraph] 2xRatio, 10xRatio, 20xRatio, 30xRatio (subtracting the values of items below)

First, add these columns to the [result_format_qc] section in the configuration file.

example/qc_variation/paplot.cfg
[result_format_qc]
col_opt_id = Sample

# Columns used in the chart_1
col_opt_average_depth = AverageDepth

# Columns used in the chart_2
col_opt_read_length_r1 = ReadLengthR1
col_opt_read_length_r2 = ReadLengthR2

# Columns used in the chart_3
col_opt_mapped_reads = MappedReads
col_opt_total_reads = TotalReads

# Columns used in the chart_4
col_opt_ratio_2x = 2xRatio
col_opt_ratio_10x = 10xRatio
col_opt_ratio_20x = 20xRatio
col_opt_ratio_30x = 30xRatio

The column names of optional items can be set as col_opt_{keyword} = {actual column name}.

For a more detailed description on keyword, please refer to About keyword.

Next, add the [qc_chart_1], [qc_chart_2], [qc_chart_3], and [qc_chart_4] sections to the configuration file and fill in their contents.

For the completed configuration file, please refer to example/qc_variation/paplot.cfg.

4-1. Simple normal bargraph

The [qc_chart_1] section is for a normal bargraph, and the contents should be filled as in the minimal dataset example.

4-2. Simple stacked bargraph

The [qc_chart_2] section is for a stacked bargraph, and the contents should be filled as in the previous example.

4-3. Normal bargraph (with numeric operations on columns)

The [qc_chart_3] section is a graph for mapping ratio (Mapped reads divided by Total reads).

example/qc_variation/paplot.cfg
[qc_chart_3]

# Titles
title = Mapped reads/Total reads
title_y = Rate

# Items for the graph
stack1 = {mapped_reads/total_reads}

# Colors and legends
name_set = Mapped reads/Total reads:#2478B4

# Pop-up information
tooltip_format1 = Sample:{id}
tooltip_format2 = {mapped_reads/total_reads:.2}
In the above example, we used division (stack1 = {mapped_reads/total_reads}).
We can also use subtraction (e.g., {mapped_reads-total_reads}) and addition (e.g., {mapped_reads+total_reads}).

Numerical operations can also be used in the pop-up information:
tooltip_format2 = {mapped_reads/total_reads:.2}

To display the value of each column separately, set:
tooltip_format2 = Mapped: {mapped_reads}, Total: {total_reads}

For more detailed description on how to set pop-up information, please refer to User defined format.
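The column arithmetic in stack1 = {mapped_reads/total_reads} can be sketched in Python as below. evaluate is a hypothetical helper that supports a single /, - or + between two keywords, which covers the examples in this section; it is not paplot's implementation.

```python
import operator

# The three operators mentioned in this section.
OPERATORS = {"/": operator.truediv, "-": operator.sub, "+": operator.add}


def evaluate(expression, row):
    """Evaluate "{a/b}"-style expressions (at most one operator) against a row."""
    body = expression.strip("{}")
    for symbol, func in OPERATORS.items():
        if symbol in body:
            left, right = body.split(symbol, 1)
            return func(float(row[left.strip()]), float(row[right.strip()]))
    return float(row[body.strip()])  # a plain keyword such as {total_reads}


# SAMPLE1 from example/qc_variation/data.csv (values are read as strings)
row = {"mapped_reads": "56262203", "total_reads": "94315157"}
print(round(evaluate("{mapped_reads/total_reads}", row), 2))
```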

4-4. Stacked bargraph (with numerical operations on columns)

The [qc_chart_4] section is a graph for sequencing depth coverage.

example/qc_variation/paplot.cfg
[qc_chart_4]

# Title
title = Depth coverage
title_y = Coverage

# Items for the graph
stack1 = {ratio_30x}
stack2 = {ratio_20x-ratio_30x}
stack3 = {ratio_10x-ratio_20x}
stack4 = {ratio_2x-ratio_10x}

# Colors and legends
name_set = Ratio 30x:#2478B4, Ratio 20x:#FF7F0E, Ratio 10x:#2CA02C, Ratio 2x:#D62728

# Pop-up information
tooltip_format1 = ID:{id}
tooltip_format2 = ratio__2x: {ratio_2x:.2}
tooltip_format3 = ratio_10x: {ratio_10x:.2}
tooltip_format4 = ratio_20x: {ratio_20x:.2}
tooltip_format5 = ratio_30x: {ratio_30x:.2}

Here, we set the first stack (stack1) to ratio_30x, the second stack (stack2) to ratio_20x minus ratio_30x, and so on.
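Assuming each NxRatio column is the fraction of positions covered at N-fold depth or more (so the values decrease as N grows), the four stacks split ratio_2x into non-overlapping depth bands. The Python sketch below uses SAMPLE1 from example/qc_variation/data.csv to show that the stacked total equals ratio_2x; coverage_stacks is a hypothetical helper.

```python
def coverage_stacks(row):
    """Split cumulative coverage ratios into the four stacks of [qc_chart_4]."""
    return [
        row["ratio_30x"],                     # stack1: 30x or more
        row["ratio_20x"] - row["ratio_30x"],  # stack2: 20x up to 30x
        row["ratio_10x"] - row["ratio_20x"],  # stack3: 10x up to 20x
        row["ratio_2x"] - row["ratio_10x"],   # stack4: 2x up to 10x
    ]


# SAMPLE1 from example/qc_variation/data.csv
sample1 = {"ratio_2x": 0.9796, "ratio_10x": 0.768, "ratio_20x": 0.6844, "ratio_30x": 0.6747}
segments = coverage_stacks(sample1)
print([round(s, 4) for s in segments])
print(round(sum(segments), 4))  # equals ratio_2x
```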

Then, execute paplot.

paplot qc {unzip_path}/example/qc_variation/data.csv ./tmp qc_variation \
--config_file {unzip_path}/example/qc_variation/paplot.cfg

5. Graph for selecting samples

Here, we add the graph for selecting samples (using the column AverageDepth). If you wish to use other columns, they must be registered in the [result_format_qc] section of the configuration file (as col_opt_{name}).

Only one graph for selecting samples can be included. Add the [qc_chart_brush] section to the configuration file and fill the contents within it.

example/qc_brush/paplot.cfg
[qc_chart_brush]
stack = {average_depth}
name_set = average:#E3E5E9

Then, execute paplot.

paplot qc {unzip_path}/example/qc_brush/data.csv ./tmp qc_brush \
--config_file {unzip_path}/example/qc_brush/paplot.cfg

Chromosomal Aberration Report

Here, we describe the procedure to generate a Chromosomal Aberration Report using the sample data [*].

[*]The sample data are included in the example directory of the paplot source directory.

1. Minimal dataset

For generating Chromosomal Aberration Report using paplot, at least the following five items are necessary:

  • Sample ID (Sample)
  • Chromosome of the breakpoint 1 (Chr1)
  • Coordinate of the breakpoint 1 (Break1)
  • Chromosome of the breakpoint 2 (Chr2)
  • Coordinate of the breakpoint 2 (Break2)
example/ca_minimal/data.csv
Sample,Chr1,Break1,Chr2,Break2,
SAMPLE1,14,16019088,12,62784483,
SAMPLE1,9,99412502,7,129302434,
SAMPLE1,13,84663781,18,52991509,
SAMPLE2,11,101374238,22,26701405,
SAMPLE2,2,121708638,7,137424167,
SAMPLE3,22,34268355,10,19871820,
SAMPLE3,8,107868940,hs37d5,20517614,
SAMPLE4,8,135644313,3,116748248,
SAMPLE4,7,6037836,21,34855497,
SAMPLE4,7,109724564,14,106387943,

Set the column names in the [result_format_ca] section of the configuration file.

example/ca_minimal/paplot.cfg
[result_format_ca]
col_chr1 = Chr1
col_break1 = Break1
col_chr2 = Chr2
col_break2 = Break2
col_opt_id = Sample

Then, execute paplot.

paplot ca {unzip_path}/example/ca_minimal/data.csv ./tmp ca_minimal \
--config_file {unzip_path}/example/ca_minimal/paplot.cfg

2. Without header

example/ca_noheader/data.csv
SAMPLE00,intronic,GATA3
SAMPLE00,UTR3,CDH1
SAMPLE00,exonic,GATA3
SAMPLE01,splicing,WASF3
SAMPLE01,intronic,WASF3
SAMPLE01,exonic,NRAS
SAMPLE02,intronic,FBXW7
SAMPLE02,intronic,GATA3
SAMPLE02,ncRNA_intronic,ACVR2B
SAMPLE03,exonic,CAP2
SAMPLE03,intronic,PIK3CA
SAMPLE03,downstream,SEPT12

When the input data has no header (column names), you must set the column number (starting from 1) for each key in the [result_format_ca] section of the configuration file.

example/ca_noheader/paplot.cfg
[result_format_ca]
# Set the value of the header option to False
header = False

col_chr1 = 2
col_break1 = 3
col_chr2 = 4
col_break2 = 5
col_opt_id = 1

Then execute paplot.

paplot ca {unzip_path}/example/ca_noheader/data.csv ./tmp ca_noheader \
--config_file {unzip_path}/example/ca_noheader/paplot.cfg

3. Customizing categorization

In the minimal dataset, chromosomal aberrations are categorized into intra-chromosomal (green) and inter-chromosomal (purple). We can customize the categorization.
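The default categorization depends only on whether the two breakpoints lie on the same chromosome, as the following Python sketch illustrates. default_category is a hypothetical helper; the category labels follow the CA Report description.

```python
from collections import Counter


def default_category(chr1, chr2):
    """Return the default CA category for a pair of breakpoint chromosomes."""
    # Compare as strings so that 7 and "7" are treated alike.
    return "Intra-chromosome" if str(chr1) == str(chr2) else "Inter-chromosome"


# (Chr1, Chr2) pairs: the first three are from example/ca_minimal/data.csv,
# the last one is a made-up intra-chromosomal example.
pairs = [("14", "12"), ("9", "7"), ("8", "hs37d5"), ("7", "7")]
print(Counter(default_category(a, b) for a, b in pairs))
```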

example/ca_group/data.csv
Sample,Chr1,Break1,Chr2,Break2,Label
SAMPLE1,14,16019088,12,62784483,C
SAMPLE1,9,99412502,7,129302434,B
SAMPLE1,13,84663781,18,52991509,A
SAMPLE2,11,101374238,22,26701405,B
SAMPLE2,2,121708638,7,137424167,C
SAMPLE2,16,43027789,22,23791492,C
SAMPLE3,22,34268355,10,19871820,A
SAMPLE3,14,56600342,hs37d5,5744957,B
SAMPLE3,Y,12191863,hs37d5,29189687,A
SAMPLE4,8,135644313,3,116748248,D
SAMPLE4,7,6037836,21,34855497,D
SAMPLE4,7,109724564,14,106387943,A

In the example data above, a new column, Label, is included in addition to Sample, Chr1, Break1, Chr2, and Break2. First, set Label as the column used for categorization in the [result_format_ca] section of the configuration file.

example/ca_group/paplot.cfg
[result_format_ca]
col_opt_group = Label

Moreover, the color for each category can be set.

example/ca_group/paplot.cfg
[ca]
# Set {Value}:{the name of color or RGB value} for each category and join them by comma ','.
group_color = A:#66C2A5,B:#FC8D62,C:#8DA0CB,D:#E78AC3

# Only categories registered below will be displayed.
limited_group =

# Categories registered below will not be displayed.
nouse_group =
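The following Python sketch shows how these three options could be interpreted. It assumes that limited_group and nouse_group take comma-separated lists like group_color and that an empty limited_group means no restriction; parse_group_color and visible_groups are hypothetical helpers, not paplot functions.

```python
def parse_group_color(value):
    """Parse a group_color value such as "A:#66C2A5,B:#FC8D62" into a dict."""
    colors = {}
    for item in value.split(","):
        if item.strip():
            name, _, color = item.partition(":")
            colors[name.strip()] = color.strip()
    return colors


def visible_groups(groups, limited_group="", nouse_group=""):
    """Return the groups to draw (assumed semantics, see the text above)."""
    limited = {g.strip() for g in limited_group.split(",") if g.strip()}
    excluded = {g.strip() for g in nouse_group.split(",") if g.strip()}
    return [g for g in groups
            if (not limited or g in limited) and g not in excluded]


colors = parse_group_color("A:#66C2A5,B:#FC8D62,C:#8DA0CB,D:#E78AC3")
print(colors["C"])
print(visible_groups(["A", "B", "C", "D"], nouse_group="D"))
```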

Then, execute paplot.

paplot ca {unzip_path}/example/ca_group/data.csv ./tmp ca_group \
--config_file {unzip_path}/example/ca_group/paplot.cfg

4. Customizing pop-up information

We can customize the pop-up information that appears upon mouseover events. In the minimal dataset, the pop-up information is displayed as illustrated below:

Before customization
_images/data_ca1.png

By customizing the pop-up information, we can view more detailed information on each chromosomal aberration.

After customization

_images/data_ca2.png
example/ca_option/data.csv
Sample,Chr1,Break1,Dir1,Chr2,Break2,Dir2,MutationType,Gene1,Gene2
SAMPLE1,14,16019088,-,12,62784483,+,deletion,LS7T1EG444,4GRRIO5AVR
SAMPLE1,9,99412502,-,7,129302434,+,translocation,FQFW16UF5U,QP779MLPNV
SAMPLE1,13,84663781,+,18,52991509,-,deletion,Q9VX1I9U3I,7XM09ETN40
SAMPLE1,1,153160367,+,22,33751554,+,inversion,CEE2SPV1R1,PVYYQIVS8G
SAMPLE1,18,12249358,-,3,146222593,+,translocation,HH9OL7CK6G,XD80LI4E6Q
SAMPLE1,21,8658030,+,X,133492043,-,tandem_duplication,I20EVP15ZM,WPE8O5H237
SAMPLE1,12,120178477,+,1,155354923,-,deletion,IMYXD3TCA4,3MNN5J0MDN
SAMPLE2,11,101374238,+,22,26701405,+,translocation,FZ7LOS66RD,9WYBJR57E0
SAMPLE2,2,121708638,-,7,137424167,-,translocation,5655M5E46B,HB14VJXDHV
SAMPLE2,16,43027789,+,22,23791492,-,inversion,REFSIL0H2M,L5EA31R8U0
SAMPLE2,19,3862589,-,16,37135239,+,deletion,1IRWHVZLH8,6FUR9YMZOH
SAMPLE2,20,50294222,+,1,164250235,-,inversion,DOH5G0YRQ9,9TWYMR5CZ2
SAMPLE2,X,67392415,+,15,3327412,+,translocation,EM36MRX9B3,G4FPLN527D
SAMPLE3,22,34268355,+,10,19871820,+,tandem_duplication,9SVRQCFVCO,2BEWSO91FZ

In this example, the following five optional columns are included in addition to the five required columns:

  • Mutation type (MutationType)
  • Gene affected by the breakpoint 1 (Gene1)
  • Gene affected by the breakpoint 2 (Gene2)
  • Direction of the breakpoint 1 (Dir1)
  • Direction of the breakpoint 2 (Dir2)

First, add these columns to the [result_format_ca] section in the configuration file.

example/ca_option/paplot.cfg
[result_format_ca]
col_opt_dir1 = Dir1
col_opt_dir2 = Dir2
col_opt_type = MutationType
col_opt_gene_name1 = Gene1
col_opt_gene_name2 = Gene2

The column names of the optional items can be set as col_opt_{keyword} = {actual column name}.

For details on keywords, please refer to About keyword.

Then, modify the [ca] section in the configuration file.

example/ca_option/paplot.cfg
[ca]
# before customization
# tooltip_format = [{chr1}] {break1:,}; [{chr2}] {break2:,}
# after customization
tooltip_format = [{chr1}] {break1:,} ({dir1}) {gene_name1}; [{chr2}] {break2:,} ({dir2}) {gene_name2}; {type}
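The keywords use the same {key} and {key:,} notation as Python's str.format, so a template can be previewed locally. This Python sketch is only an approximation (paplot renders the pop-up itself), and it assumes the breakpoint positions are integers so that the "," separator applies. The values come from the first row of example/ca_option/data.csv.

```python
# Template from example/ca_option/paplot.cfg (tooltip_format in the [ca] section).
TOOLTIP_FORMAT = ("[{chr1}] {break1:,} ({dir1}) {gene_name1}; "
                  "[{chr2}] {break2:,} ({dir2}) {gene_name2}; {type}")

# Keyword values taken from the first row of example/ca_option/data.csv.
# Breakpoints are integers so that the ',' (thousands separator) spec applies.
first_row = {
    "chr1": "14", "break1": 16019088, "dir1": "-", "gene_name1": "LS7T1EG444",
    "chr2": "12", "break2": 62784483, "dir2": "+", "gene_name2": "4GRRIO5AVR",
    "type": "deletion",
}

preview = TOOLTIP_FORMAT.format(**first_row)
print(preview)
# [14] 16,019,088 (-) LS7T1EG444; [12] 62,784,483 (+) 4GRRIO5AVR; deletion
```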

Then, execute paplot.

paplot ca {unzip_path}/example/ca_option/data.csv ./tmp ca_option \
--config_file {unzip_path}/example/ca_option/paplot.cfg

For a more detailed description on the procedure to set pop-up information (tooltip_format), please refer to User defined format.

Mutation Matrix Report

Here, we describe the procedure to generate a Mutation Matrix Report using sample data [*].

[*] The sample data is provided in the example directory under the paplot directory.

1. Minimal dataset

To generate a Mutation Matrix Report with paplot, at least the sample ID (Sample), gene name (Gene), and mutation type (MutationType) are required.

example/mutation_minimal/data.csv
Sample,MutationType,Gene
SAMPLE00,intronic,GATA3
SAMPLE00,UTR3,CDH1
SAMPLE00,exonic,GATA3
SAMPLE01,splicing,WASF3
SAMPLE01,intronic,WASF3
SAMPLE01,exonic,NRAS
SAMPLE02,intronic,FBXW7
SAMPLE02,intronic,GATA3
SAMPLE02,ncRNA_intronic,ACVR2B
SAMPLE03,exonic,CAP2
SAMPLE03,intronic,PIK3CA
SAMPLE03,downstream,SEPT12

Although the column names here are Sample, MutationType, and Gene, they can be changed arbitrarily.

Set the column names in the [result_format_mutation] section of the configuration file.

example/mutation_minimal/paplot.cfg
[result_format_mutation]
col_group = MutationType
col_gene = Gene
col_opt_id = Sample

Then, execute paplot.

paplot mutation {unzip_path}/example/mutation_minimal/data.csv ./tmp mutation_minimal \
--config_file {unzip_path}/example/mutation_minimal/paplot.cfg

2. Without header

example/mutation_noheader/data.csv
SAMPLE00,intronic,GATA3
SAMPLE00,UTR3,CDH1
SAMPLE00,exonic,GATA3
SAMPLE01,splicing,WASF3
SAMPLE01,intronic,WASF3
SAMPLE01,exonic,NRAS
SAMPLE02,intronic,FBXW7
SAMPLE02,intronic,GATA3
SAMPLE02,ncRNA_intronic,ACVR2B
SAMPLE03,exonic,CAP2
SAMPLE03,intronic,PIK3CA
SAMPLE03,downstream,SEPT12

When the input data has no header (column names), it is necessary to set the column number to each key in the [result_format_mutation] section of the configuration file.

example/mutation_noheader/paplot.cfg
[result_format_mutation]
# Set the value of the header option to False
header = False

col_group = 2
col_gene = 3
col_opt_id = 1

Then execute paplot.

paplot mutation {unzip_path}/example/mutation_noheader/data.csv ./tmp mutation_noheader \
--config_file {unzip_path}/example/mutation_noheader/paplot.cfg

3. Customizing pop-up information

We can customize the pop-up information that appears upon mouseover events. In the minimal dataset, the pop-up information displays sample, gene, and mutation type as illustrated below:

Before customization

_images/data_mut1.png

By customizing the configuration file, the information of positions and substitution types can be incorporated.

After customization

_images/data_mut2.png
example/mutation_option/data.csv
Sample,Chr,Start,Ref,Alt,MutationType,Gene
SAMPLE00,chr10,8114472,A,C,intronic,GATA3
SAMPLE00,chr13,28644892,G,-,intronic,FLT3
SAMPLE00,chr13,28664636,-,G,intronic,FLT3
SAMPLE00,chr16,68795521,-,T,UTR3,CDH1
SAMPLE00,chr10,8117068,G,T,exonic,GATA3
SAMPLE00,chr3,178906688,G,A,intronic,PIK3CA
SAMPLE00,chr13,28603715,G,-,intergenic,FLT3
SAMPLE00,chr14,103368263,G,C,intronic,TRAF3

In the example data above, the following four optional items are included in addition to the sample ID, gene name, and mutation type (the required items):

  • Chromosome (Chr)
  • Variant start position (Start)
  • Reference base (Ref)
  • Alternative base (Alt)

First, add these columns to the [result_format_mutation] section in the configuration file.

example/mutation_option/paplot.cfg
[result_format_mutation]
col_opt_chr = Chr
col_opt_start = Start
col_opt_ref = Ref
col_opt_alt = Alt

The column names of optional items can be set as col_opt_{keyword} = {actual column name}.

For details on keywords, please refer to About keyword.

Then, modify the [mutation] section in the configuration file.

example/mutation_option/paplot.cfg
[mutation]
# before customization
# tooltip_format_checker_partial = Mutation Type[{group}]
# after customization
tooltip_format_checker_partial = Mutation Type[{group}] {chr}:{start:,} [{ref} -> {alt}]

Then, execute paplot.

paplot mutation {unzip_path}/example/mutation_option/data.csv ./tmp mutation_option \
--config_file {unzip_path}/example/mutation_option/paplot.cfg

Here, we have described the procedure to customize the pop-up for each element in the main grid. For customizing the other pop-ups, please refer to the following figure.

Six pop-up formats can be set, one for each display location; all of them are written in the same way.

Correspondence between setting items and display

_images/conf_mut4.PNG

The following special keywords can also be used:

{#number_id}: the number of mutations per sample
{#number_gene}: the number of mutations per gene
{#number_mutaion}: the number of mutations (multiple detections of the same gene in the same sample are counted as 1)
{#sum_mutaion}: the total number of mutations
{#item_value}: the value of one item of a stacked graph
{#sum_item_value}: the total value of a stacked graph
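The "counted as 1" rule for {#number_mutaion} can be illustrated with the minimal dataset. This Python sketch is not paplot's implementation. It only contrasts counting every row with counting each (sample, gene) pair once, and it does not claim which of these counts the other keywords use.

```python
from collections import Counter

# (Sample, MutationType, Gene) rows of example/mutation_minimal/data.csv
ROWS = [
    ("SAMPLE00", "intronic", "GATA3"), ("SAMPLE00", "UTR3", "CDH1"),
    ("SAMPLE00", "exonic", "GATA3"), ("SAMPLE01", "splicing", "WASF3"),
    ("SAMPLE01", "intronic", "WASF3"), ("SAMPLE01", "exonic", "NRAS"),
    ("SAMPLE02", "intronic", "FBXW7"), ("SAMPLE02", "intronic", "GATA3"),
    ("SAMPLE02", "ncRNA_intronic", "ACVR2B"), ("SAMPLE03", "exonic", "CAP2"),
    ("SAMPLE03", "intronic", "PIK3CA"), ("SAMPLE03", "downstream", "SEPT12"),
]

# Counting every row: SAMPLE00 has three rows.
rows_per_sample = Counter(sample for sample, _, _ in ROWS)

# Counting each (sample, gene) pair once: SAMPLE00's two GATA3 rows become one.
pairs = {(sample, gene) for sample, _, gene in ROWS}
pairs_per_sample = Counter(sample for sample, _ in pairs)
samples_per_gene = Counter(gene for _, gene in pairs)

print(len(ROWS), len(pairs))  # 12 10
```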

Moreover, for a more detailed description of the procedure to set pop-up information, please refer to User defined format.

Mutational Signature Report

Here, we describe the procedure to generate Mutation Signature Report using sample data [*].

[*] The sample data is provided in the example directory under the paplot directory.

1. Input data format

To generate a Mutation Signature Report with paplot, input data in JSON format is required.

example/signature_stack/data2.json
{
  "signature":[
                [ # signature 1
                  [0.0018,0.0003,0.0002,0.0005,0.0014,0.0008,0.0002,0.0007,0.0012,0.0003,0.0002,0.0004,0.0271,0.0107,0.0016,0.0145],  # C -> A
                  [0.0023,0.0007,0.0001,0.002,0.0027,0.0005,0.0004,0.0032,0.0007,0.0004,0.0001,0.0013,0.1546,0.0306,0.0055,0.1931],   # C -> G
                  [0.0043,0.0016,0.0027,0.0019,0.0096,0.0026,0.0046,0.0053,0.0045,0.0021,0.0034,0.0028,0.2612,0.0517,0.0284,0.1335],  # C -> T
                  [0.0012,0.0007,0.0004,0.0003,0.0003,0.0003,0,0,0.0003,0.0001,0.0003,0,0.0005,0.0001,0.0001,0.0002],                 # T -> A
                  [0.0008,0.0003,0.0008,0.0007,0.0002,0.0004,0.0009,0.0005,0.0004,0.0003,0.0006,0.0003,0.0003,0.0004,0.0002,0.0004],  # T -> C
                  [0.0001,0.0001,0.0001,0.0001,0,0.0001,0.0001,0,0.0001,0.0001,0.0009,0.0002,0.0001,0,0.0001,0.0005]                  # T -> G
                ],
                [ # signature 2
                  [0.0266,0.0222,0.0026,0.02,0.0205,0.0145,0.0012,0.0155,0.0155,0.0094,0.0009,0.011,0.0224,0.0177,0.0019,0.0307],
                  [0.0127,0.0079,0.0035,0.0145,0.0058,0.0048,0.0015,0.0115,0.0034,0.0032,0,0.0071,0.0047,0.0145,0.0006,0.0246],
                  [0.0232,0.0099,0.042,0.0184,0.014,0.0108,0.0219,0.02,0.0137,0.0102,0.0264,0.0128,0.0048,0.0186,0.0153,0.0165],
                  [0.0096,0.0084,0.0094,0.0175,0.0075,0.0076,0.0046,0.0123,0.0044,0.0035,0.0028,0.008,0.0176,0.0047,0.0031,0.0139],
                  [0.0245,0.0087,0.0144,0.0235,0.0098,0.0096,0.0051,0.0102,0.0105,0.0053,0.0042,0.0108,0.0114,0.0081,0.0038,0.0098],
                  [0.0046,0.0006,0.0036,0.0035,0.0025,0.0009,0.0028,0.0082,0.0023,0.0005,0.004,0.0048,0.0041,0.0012,0.0056,0.0104]
                ]
              ],
  "id":["PD3851a","PD3890a","PD3904a"],
  "mutation":[[0,0,0.0594],[0,1,0.7677],[0,2,0.1727],[1,0,0.1474],[1,1,0.4064],[1,2,0.4461]],
  "mutation_count":[4001,7174,5804]
}

Elements of the input data for Mutation Signature Report

signature:
Probability masses for each mutation pattern.
Input the probability value for each combination of mutation signature, substitution pattern (e.g., C > A), and context (e.g., TpCpA > TpApA).
The number of bases (the substituted base and its flanking bases) should be three or five.
The number of contexts for each substitution pattern should be identical (16 and 256 when the number of bases is three and five, respectively).

As the number of bases is three in the above example data, the probability values for the 16 contexts should be listed in the following order:

ANA,ANC,ANG,ANT,CNA,CNC,CNG,CNT,GNA,GNC,GNG,GNT,TNA,TNC,TNG,TNT

When the number of bases is five, the 256 context values should be listed in the following order:

AANAA,AANAC,AANAG,AANAT,AANCA,AANCC,AANCG,AANCT,AANGA,AANGC,AANGG,AANGT,AANTA,AANTC,AANTG,AANTT,
ACNAA,ACNAC,ACNAG,ACNAT,ACNCA,ACNCC,ACNCG,ACNCT,ACNGA,ACNGC,ACNGG,ACNGT,ACNTA,ACNTC,ACNTG,ACNTT,
AGNAA,AGNAC,AGNAG,AGNAT,AGNCA,AGNCC,AGNCG,AGNCT,AGNGA,AGNGC,AGNGG,AGNGT,AGNTA,AGNTC,AGNTG,AGNTT,
ATNAA,ATNAC,ATNAG,ATNAT,ATNCA,ATNCC,ATNCG,ATNCT,ATNGA,ATNGC,ATNGG,ATNGT,ATNTA,ATNTC,ATNTG,ATNTT,
CANAA,CANAC,CANAG,CANAT,CANCA,CANCC,CANCG,CANCT,CANGA,CANGC,CANGG,CANGT,CANTA,CANTC,CANTG,CANTT,
CCNAA,CCNAC,CCNAG,CCNAT,CCNCA,CCNCC,CCNCG,CCNCT,CCNGA,CCNGC,CCNGG,CCNGT,CCNTA,CCNTC,CCNTG,CCNTT,
CGNAA,CGNAC,CGNAG,CGNAT,CGNCA,CGNCC,CGNCG,CGNCT,CGNGA,CGNGC,CGNGG,CGNGT,CGNTA,CGNTC,CGNTG,CGNTT,
CTNAA,CTNAC,CTNAG,CTNAT,CTNCA,CTNCC,CTNCG,CTNCT,CTNGA,CTNGC,CTNGG,CTNGT,CTNTA,CTNTC,CTNTG,CTNTT,
GANAA,GANAC,GANAG,GANAT,GANCA,GANCC,GANCG,GANCT,GANGA,GANGC,GANGG,GANGT,GANTA,GANTC,GANTG,GANTT,
GCNAA,GCNAC,GCNAG,GCNAT,GCNCA,GCNCC,GCNCG,GCNCT,GCNGA,GCNGC,GCNGG,GCNGT,GCNTA,GCNTC,GCNTG,GCNTT,
GGNAA,GGNAC,GGNAG,GGNAT,GGNCA,GGNCC,GGNCG,GGNCT,GGNGA,GGNGC,GGNGG,GGNGT,GGNTA,GGNTC,GGNTG,GGNTT,
GTNAA,GTNAC,GTNAG,GTNAT,GTNCA,GTNCC,GTNCG,GTNCT,GTNGA,GTNGC,GTNGG,GTNGT,GTNTA,GTNTC,GTNTG,GTNTT,
TANAA,TANAC,TANAG,TANAT,TANCA,TANCC,TANCG,TANCT,TANGA,TANGC,TANGG,TANGT,TANTA,TANTC,TANTG,TANTT,
TCNAA,TCNAC,TCNAG,TCNAT,TCNCA,TCNCC,TCNCG,TCNCT,TCNGA,TCNGC,TCNGG,TCNGT,TCNTA,TCNTC,TCNTG,TCNTT,
TGNAA,TGNAC,TGNAG,TGNAT,TGNCA,TGNCC,TGNCG,TGNCT,TGNGA,TGNGC,TGNGG,TGNGT,TGNTA,TGNTC,TGNTG,TGNTT,
TTNAA,TTNAC,TTNAG,TTNAT,TTNCA,TTNCC,TTNCG,TTNCT,TTNGA,TTNGC,TTNGG,TTNGT,TTNTA,TTNTC,TTNTG,TTNTT
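Both orders follow the same rule: the flanking bases run through A, C, G, T, with the leftmost base varying slowest and the rightmost fastest. The Python sketch below regenerates the lists above, which can help when checking a hand-built input file. The function context_order is hypothetical and not part of paplot.

```python
from itertools import product

BASES = "ACGT"


def context_order(n_bases):
    """Return the contexts in the order listed above; N marks the substituted base."""
    if n_bases not in (3, 5):
        raise ValueError("the number of bases should be three or five")
    flank = (n_bases - 1) // 2
    # product() varies the rightmost base fastest, matching the listed order.
    return ["".join(combo[:flank]) + "N" + "".join(combo[flank:])
            for combo in product(BASES, repeat=2 * flank)]


three = context_order(3)
five = context_order(5)
print(len(three), len(five))  # 16 256
print(three[:5])  # ['ANA', 'ANC', 'ANG', 'ANT', 'CNA']
```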

Elements for signature contribution graph

This graph is optional.

The signature contribution graph presents the amount of mutations associated with each mutation signature. It is generated when id, mutation, and mutation_count are set in the input JSON file.

id:
List of samples. For each sample, sample indices are assigned (in this example, PD3851a = 0, PD3890a = 1, PD3904a = 2, etc.).
mutation_count:
The number of mutations for each sample (the mutation number for PD3851a = 4001, that for PD3890a = 7174, etc.).
mutation:
Contribution ratio of each mutation signature to each sample ([sample index, signature index, value]).

Signature indices are assigned in the order in which the signatures are listed under the signature key.
In the above example, signature 1 = 0, signature 2 = 1, and signature 3 = 2.
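One reading of "the amount of mutations" is each contribution ratio multiplied by the sample's mutation_count. That interpretation is an assumption and is not confirmed by this page. The Python sketch below computes these estimated amounts using values from example/signature_stack/data2.json. The function name contributions is hypothetical.

```python
# id, mutation, and mutation_count from example/signature_stack/data2.json
data = {
    "id": ["PD3851a", "PD3890a", "PD3904a"],
    "mutation": [[0, 0, 0.0594], [0, 1, 0.7677], [0, 2, 0.1727],
                 [1, 0, 0.1474], [1, 1, 0.4064], [1, 2, 0.4461]],
    "mutation_count": [4001, 7174, 5804],
}


def contributions(data):
    """Map sample name -> {signature index: ratio * mutation_count} (assumed meaning)."""
    result = {}
    for sample_idx, sig_idx, ratio in data["mutation"]:
        sample = data["id"][sample_idx]
        amount = ratio * data["mutation_count"][sample_idx]
        result.setdefault(sample, {})[sig_idx] = amount
    return result


amounts = contributions(data)
# The ratios of each sample sum to about 1 (0.9998 for PD3851a).
ratio_sum = sum(ratio for s, _, ratio in data["mutation"] if s == 0)
print(round(amounts["PD3851a"][1], 1))  # 3071.6
```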

Note

The keys in the input json file can be modified by changing the contents in the [result_format_signature] section of the configuration file.

example/signature_stack/paplot.cfg
[result_format_signature]
# the keys in input json file
key_signature = signature
key_id = id
key_mutation = mutation
key_mutation_count = mutation_count

Note

One way to validate the JSON file format

paplot reads input files with the Python json package. If the input file can be loaded with the load() function of the json package, it is valid JSON.

For example, if the file name is “data2.json”:

$ python
>>> import json
>>> json.load(open("data2.json"))
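Beyond syntax, it can help to check the array shapes described above. The Python sketch below loads a file with json.load and reports the parser's message on failure. It then checks that each signature has six substitution patterns and that every signature uses the same context count, either 16 or 256. The helper names are hypothetical, not paplot functions. Standard JSON does not allow comments, so annotations like the # comments in the example above make json.load fail.

```python
import json
import sys


def load_json(path):
    """Load an input file, exiting with the parser's message if it is not valid JSON."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except ValueError as err:  # json.JSONDecodeError (Python 3.5+) subclasses ValueError
        sys.exit("{}: invalid JSON: {}".format(path, err))


def signature_shape_problems(obj, key="signature"):
    """List shape problems: 6 substitution patterns per signature, 16 or 256 contexts each."""
    problems = []
    lengths = set()
    for number, signature in enumerate(obj[key], start=1):
        if len(signature) != 6:
            problems.append("signature {}: {} substitution patterns (expected 6)"
                            .format(number, len(signature)))
        lengths.update(len(row) for row in signature)
    if len(lengths) > 1 or not lengths <= {16, 256}:
        problems.append("context counts {} (expected all 16 or all 256)"
                        .format(sorted(lengths)))
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    for problem in signature_shape_problems(load_json(sys.argv[1])):
        print(problem)
```

If the script is saved under a hypothetical name such as check_signature.py, run it as python check_signature.py data2.json. It prints nothing when no problems are found.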

2. Minimal dataset

For the format of input data, please refer to 1. Input data format.

Input data file (the number of mutation signatures is two)

example/signature_minimal/data.json
{
  "signature":[
    # signature 1
    [
      [0.0021,0.0006,0.0002,0.0007,0.0017,0.001,0.0003,0.0009,0.0014,0.0006,0.0003,0.0006,0.027,0.0108,0.0016,0.0147],
      [0.0025,0.0009,0.0002,0.0022,0.0029,0.0007,0.0005,0.0034,0.0009,0.0006,0.0002,0.0014,0.1504,0.0301,0.0053,0.1884],
      [0.0046,0.0018,0.0031,0.0021,0.0097,0.0029,0.0049,0.0055,0.0047,0.0024,0.0037,0.003,0.2557,0.0513,0.0286,0.1312],
      [0.0014,0.0009,0.0007,0.0006,0.0004,0.0005,0.0003,0.0003,0.0004,0.0003,0.0005,0.0002,0.0008,0.0003,0.0003,0.0005],
      [0.001,0.0004,0.0011,0.001,0.0003,0.0007,0.0012,0.0008,0.0006,0.0004,0.0007,0.0005,0.0005,0.0007,0.0004,0.0007],
      [0.0003,0.0003,0.0003,0.0003,0.0001,0.0003,0.0003,0.0003,0.0002,0.0002,0.0011,0.0004,0.0003,0.0002,0.0003,0.0009]
    ],
    # signature 2
    [
      [0.022,0.0183,0.0028,0.0171,0.0192,0.0148,0.0026,0.0157,0.0143,0.0108,0.0018,0.0116,0.0181,0.016,0.0021,0.0246],
      [0.0133,0.0088,0.0037,0.0136,0.0095,0.008,0.003,0.0131,0.0065,0.0063,0.0016,0.0095,0.0044,0.0135,0.0016,0.0171],
      [0.0195,0.0098,0.0283,0.0159,0.0138,0.0112,0.0156,0.0183,0.0128,0.0108,0.0186,0.0127,0,0.0146,0.0095,0.0115],
      [0.0095,0.0085,0.0102,0.0155,0.0077,0.0102,0.0096,0.0135,0.0054,0.0052,0.0058,0.0089,0.0145,0.0076,0.0058,0.016],
      [0.0192,0.0089,0.0135,0.0198,0.0089,0.0113,0.0092,0.0117,0.0092,0.0063,0.0064,0.01,0.0107,0.0096,0.0061,0.0123],
      [0.0059,0.0028,0.0068,0.0063,0.0039,0.0044,0.0076,0.0101,0.004,0.0028,0.007,0.0064,0.006,0.0046,0.008,0.0132]
    ]
  ]
}

Configuration file

example/signature_minimal/paplot.cfg
[signature]
tooltip_format_signature_title = {sig}
tooltip_format_signature_partial = {route}: {#sum_item_value:6.2}

signature_y_max = -1

alt_color_CtoA = #1BBDEB
alt_color_CtoG = #211D1E
alt_color_CtoT = #E62623
alt_color_TtoA = #CFCFCF
alt_color_TtoC = #ACD577
alt_color_TtoG = #EDC7C4

[result_format_signature]
format = json
background = False
key_signature = signature

Execute paplot.

paplot signature signature_minimal/data.json ./tmp signature_minimal \
--config_file ./signature_minimal/paplot.cfg

Then the report is generated in the tmp directory.

The file name (here, graph_signature2.html) is determined by the number of mutation signatures, which paplot infers automatically from the input data.

./tmp
  ┗ signature_minimal
      ┗ graph_signature2.html

3. Mutation signature with multiple numbers of signatures


For the format of input data, please refer to 1. Input data format.

To generate Mutation Signature Reports with various numbers of signatures, an input data file for each number of signatures and a configuration file are required.

In this example dataset, the following files are prepared:

example/signature_multi_class/

   # Input data files
  ┣ data2.json  # signature num = 2
  ┣ data3.json  # signature num = 3
  ┣ data4.json  # signature num = 4
  ┣ data5.json  # signature num = 5
  ┣ data6.json  # signature num = 6

   # Configuration file
  ┗ paplot.cfg

Execute paplot for each mutation signature number.

paplot signature signature_multi_class/data2.json ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg

paplot signature signature_multi_class/data3.json ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg

paplot signature signature_multi_class/data4.json ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg

paplot signature signature_multi_class/data5.json ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg

paplot signature signature_multi_class/data6.json ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg

Alternatively, process all input files at once with a wildcard pattern (enclose it in double quotes):

paplot signature "signature_multi_class/data*.json" ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg

Then, the report is generated in the tmp directory.

The file names (e.g., graph_signature2.html) are determined by the number of mutation signatures, which paplot infers automatically from the input data.

./tmp
  ┗ signature_multi_class
      ┣ graph_signature2.html
      ┣ graph_signature3.html
      ┣ graph_signature4.html
      ┣ graph_signature5.html
      ┗ graph_signature6.html

4. Signature contribution graph


Here, we add a signature contribution graph.

For the format of input data, please refer to 1. Input data format.

For generating reports with various numbers of signatures, please refer to 3. Mutation signature with multiple numbers of signatures.

Execute paplot.

paplot signature "signature_stack/data*.json" ./tmp signature_stack \
--config_file ./signature_stack/paplot.cfg

pmsignature Report

Here, we describe the procedure to generate a pmsignature Report using sample data [*].

[*] The sample data is provided in the example directory under the paplot directory.

1. Input data format

To generate a pmsignature Report with paplot, input data in JSON format is required.

example/pmsignature_stack/data2.json
{
  "ref":[
          [ # pmsignature 1
            [0.338,0.15,0.183,0.327],  # ref1 (A,C,G,T)
            [0.362,0.191,0.177,0.267], # ref2 (A,C,G,T)
            [0,0.731,0,0.268],         # ref3 (A,C,G,T)
            [0.31,0.165,0.251,0.272],  # ref4 (A,C,G,T)
            [0.295,0.193,0.168,0.341]  # ref5 (A,C,G,T)
          ],
          [ # pmsignature 2
            [0.179,0.414,0.084,0.321],
            [0.007,0.025,0.004,0.962],
            [0,0.999,0,0],
            [0.472,0.104,0.041,0.381],
            [0.277,0.175,0.284,0.262]
          ]
        ],
  "alt":[
          [ # pmsignature 1
            [0,0,0,0],                 # altA (A,C,G,T)
            [0.194,0,0.091,0.445],     # altC (A,C,G,T)
            [0,0,0,0],                 # altG (A,C,G,T)
            [0.093,0.163,0.011,0]      # altT (A,C,G,T)
          ],
          [ # pmsignature 2
            [0,0,0,0],
            [0.059,0,0.437,0.502],
            [0,0,0,0],
            [0,0,0,0]
          ]
        ],
  "strand":[
            [0.461,0.538],  # pmsignature 1
            [0.512,0.487]   # pmsignature 2
           ],
  "id":["PD3851a","PD3890a","PD3904a"],
  "mutation":[[0,0,0.535],[0,1,0.038],[0,2,0.426],[1,0,0.186],[1,1,0.156],[1,2,0.656]],
  "mutation_count":[702,2312,2096]
}
_images/exec_pmsig1.PNG

Elements of the input data for pmsignature Report

ref:
Values for the reference bases (in the order A, C, G, T) at each position, for each mutation signature.
The values need not sum to one (they are normalized within the program).
In this example, the number of bases is five; however, it can be changed to any odd number (e.g., 3 or 7).
alt:
Values for the alternative base, for each mutation signature.
Four values (in the order A, C, G, T) are given for each central reference base A, C, G, and T, so 16 values in total are required for each mutation signature.
Usually, the central base is fixed to C or T, so the values for reference bases A and G contribute negligibly to the visualization (and can be set to zero).
strand:
Values for the strand (in the order plus, minus) for each mutation signature.
When strand bias is not taken into account, set [0, 0].
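The following Python sketch checks the shapes described above for one pmsignature. It also shows one way to scale ref values so that they sum to one. It assumes simple division by the row total; paplot's own normalization is not shown on this page and may differ. The function names are hypothetical, and the values are pmsignature 1 from example/pmsignature_stack/data2.json.

```python
# pmsignature 1 from example/pmsignature_stack/data2.json
ref = [[0.338, 0.15, 0.183, 0.327], [0.362, 0.191, 0.177, 0.267],
       [0, 0.731, 0, 0.268], [0.31, 0.165, 0.251, 0.272],
       [0.295, 0.193, 0.168, 0.341]]
alt = [[0, 0, 0, 0], [0.194, 0, 0.091, 0.445], [0, 0, 0, 0], [0.093, 0.163, 0.011, 0]]
strand = [0.461, 0.538]


def normalize(values):
    """Scale values to sum to one; all-zero rows (e.g. strand [0, 0]) are returned unchanged."""
    total = float(sum(values))
    return [v / total for v in values] if total > 0 else list(values)


def pmsignature_problems(ref, alt, strand):
    """Check the shapes described above for one pmsignature."""
    problems = []
    if len(ref) % 2 == 0:
        problems.append("ref: the number of bases should be odd, got {}".format(len(ref)))
    if any(len(row) != 4 for row in ref):
        problems.append("ref: each position needs 4 values (A, C, G, T)")
    if len(alt) != 4 or any(len(row) != 4 for row in alt):
        problems.append("alt: expected 4 x 4 = 16 values")
    if len(strand) != 2:
        problems.append("strand: expected 2 values (plus, minus)")
    return problems


print(pmsignature_problems(ref, alt, strand))  # []
normalized_ref = [normalize(row) for row in ref]
```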

Elements for signature contribution graph

This graph is optional.

The signature contribution graph presents the amount of mutations associated with each mutation signature. It is generated when id, mutation, and mutation_count are set in the input JSON file.

id:
List of samples. For each sample, sample indices are assigned (in this example, PD3851a = 0, PD3890a = 1, PD3904a = 2, etc.).
mutation_count:
The number of mutations for each sample (in this example, the mutation number for PD3851a = 702, that for PD3890a = 2312, etc.).
mutation:
Contribution ratio of each mutation signature to each sample ([sample index, signature index, value]).

Signature indices are assigned in the order in which the signatures are listed in the input data (ref, alt, and strand).
In the above example, signature 1 = 0, signature 2 = 1, and signature 3 = 2.

Note

The keys in the input json file can be modified by changing the contents in the [result_format_pmsignature] section of the configuration file.

example/pmsignature_stack/paplot.cfg
[result_format_pmsignature]
format = json
background = True
key_ref = ref
key_alt = alt
key_strand = strand
key_id = id
key_mutation = mutation
key_mutation_count = mutation_count

Note

One way to validate the JSON file format

paplot reads input files with the Python json package. If the input file can be loaded with the load() function of the json package, it is valid JSON.

For example, if the file name is “data2.json”:

$ python
>>> import json
>>> json.load(open("data2.json"))

2. Minimal dataset

For the format of input data, please refer to 1. Input data format.

example/pmsignature_minimal/data.json
{
  "ref":[[[0.189,0.395,0.088,0.326],[0.019,0.029,0.01,0.94],[0,0.999,0,0],[0.467,0.103,0.054,0.374],[0.278,0.175,0.276,0.268]]],
  "alt":[[[0,0,0,0],[0.063,0,0.415,0.521],[0,0,0,0],[0,0,0,0]]],
  "strand":[[0.514,0.485]]
}

Configuration file

example/pmsignature_minimal/paplot.cfg
[pmsignature]
tooltip_format_ref1 = A: {a:.2}
tooltip_format_ref2 = C: {c:.2}
tooltip_format_ref3 = G: {g:.2}
tooltip_format_ref4 = T: {t:.2}
tooltip_format_alt1 = C -> A: {ca:.2}
tooltip_format_alt2 = C -> G: {cg:.2}
tooltip_format_alt3 = C -> T: {ct:.2}
tooltip_format_alt4 = T -> A: {ta:.2}
tooltip_format_alt5 = T -> C: {tc:.2}
tooltip_format_alt6 = T -> G: {tg:.2}
tooltip_format_strand = + {plus:.2} - {minus:.2}

color_A = #06B838
color_C = #609CFF
color_G = #B69D02
color_T = #F6766D
color_plus = #00BEC3
color_minus = #F263E2

[result_format_pmsignature]
format = json
background = True
key_ref = ref
key_alt = alt
key_strand = strand

Execute paplot.

paplot pmsignature pmsignature_minimal/data.json ./tmp pmsignature_minimal \
--config_file ./pmsignature_minimal/paplot.cfg

Then, the report is generated in the tmp directory.

The file name (here, graph_pmsignature2.html) is determined by the number of mutation signatures, which paplot infers automatically from the input data.

./tmp
  ┗ pmsignature_minimal
      ┗ graph_pmsignature2.html

Note

Since one signature is assigned to the background in this example, the last signature in the contribution graph is the background signature.


3. Mutation signature with multiple numbers of signatures


For the format of input data, please refer to 1. Input data format.

To generate pmsignature Reports with various numbers of signatures, an input data file for each number of signatures and a configuration file are required.

In this example dataset, the following files are prepared:

example/pmsignature_multi_class/

   # Input data files
  ┣ data2.json  # pmsignature num = 2
  ┣ data3.json  # pmsignature num = 3
  ┣ data4.json  # pmsignature num = 4
  ┣ data5.json  # pmsignature num = 5
  ┣ data6.json  # pmsignature num = 6

   # Configuration file
  ┗ paplot.cfg

Execute paplot for each mutation signature number.

paplot pmsignature pmsignature_multi_class/data2.json ./tmp pmsignature_multi_class \
--config_file ./pmsignature_multi_class/paplot.cfg

paplot pmsignature pmsignature_multi_class/data3.json ./tmp pmsignature_multi_class \
--config_file ./pmsignature_multi_class/paplot.cfg

paplot pmsignature pmsignature_multi_class/data4.json ./tmp pmsignature_multi_class \
--config_file ./pmsignature_multi_class/paplot.cfg

paplot pmsignature pmsignature_multi_class/data5.json ./tmp pmsignature_multi_class \
--config_file ./pmsignature_multi_class/paplot.cfg

paplot pmsignature pmsignature_multi_class/data6.json ./tmp pmsignature_multi_class \
--config_file ./pmsignature_multi_class/paplot.cfg

Alternatively, process all input files at once with a wildcard pattern (enclose it in double quotes):

paplot pmsignature "pmsignature_multi_class/data*.json" ./tmp pmsignature_multi_class \
--config_file ./pmsignature_multi_class/paplot.cfg

Then, the report is generated in the tmp directory.

The file names (e.g., graph_pmsignature2.html) are determined by the number of mutation signatures, which paplot infers automatically from the input data.

./tmp
  ┗ pmsignature_multi_class
      ┣ graph_pmsignature2.html
      ┣ graph_pmsignature3.html
      ┣ graph_pmsignature4.html
      ┣ graph_pmsignature5.html
      ┗ graph_pmsignature6.html

Note

Since one signature is assigned to the background in this example, the last signature in the contribution graph is the background signature.


4. Signature contribution graph


Here, we add a signature contribution graph.

For the format of input data, please refer to 1. Input data format.

For generating reports with various numbers of signatures, please refer to 3. Mutation signature with multiple numbers of signatures.

Execute paplot.

paplot pmsignature "pmsignature_stack/data*.json" ./tmp pmsignature_stack \
--config_file ./pmsignature_stack/paplot.cfg

Note

Since one signature is assigned to the background in this example, the last signature in the contribution graph is the background signature.


5. Without background

Here, we generate a pmsignature Report without background.

Set the background option to False in the configuration file.

example/pmsignature_nobackground/paplot.cfg
[result_format_pmsignature]
background = False

Then, execute paplot.

paplot pmsignature pmsignature_nobackground/data.json ./tmp pmsignature_nobackground \
--config_file ./pmsignature_nobackground/paplot.cfg

Common Issues

Here, we describe several common issues in generating reports, using sample data [*].

[*] The sample data is provided in the example directory under the paplot directory.

1. Delimiter for the input data

When the delimiter for the input data is a tab or a space character, modify the configuration file as follows:

# For the case of Mutation Matrix Report, tab-delimited input
[result_format_mutation]
sept = \t

# For the case of Mutation Matrix Report, space-delimited input
[result_format_mutation]
sept = " "

For QC and Chromosomal Aberration Reports, make the same change in the [result_format_qc] and [result_format_ca] sections, respectively.


2. Comment line

# This is comment.
# Please skip this line.

ID,Type,Gene
SAMPLE00,intronic,GATA3
SAMPLE00,UTR3,CDH1

By default, a line starting with the character “#” is treated as a comment line, and paplot ignores it. To change the comment character, modify the configuration file as follows:

# For the case of Mutation Matrix Report
[result_format_mutation]
comment = #

For QC and Chromosomal Aberration Reports, make the same change in the [result_format_qc] and [result_format_ca] sections, respectively.
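The comment and delimiter settings described above can be mimicked with Python's standard csv module. This sketch is not paplot's parser. It assumes that a comment line is any line whose first non-blank character is the comment character, and that blank lines are ignored. The function name read_rows is hypothetical; its sept parameter plays the role of the sept setting.

```python
import csv

TEXT = """# This is comment.
# Please skip this line.

ID,Type,Gene
SAMPLE00,intronic,GATA3
SAMPLE00,UTR3,CDH1
"""


def read_rows(lines, comment="#", sept=","):
    """Parse delimited lines into dicts, skipping blank lines and comment lines."""
    data_lines = [line for line in lines
                  if line.strip() and not line.lstrip().startswith(comment)]
    return list(csv.DictReader(data_lines, delimiter=sept))


rows = read_rows(TEXT.splitlines())
print([row["Gene"] for row in rows])  # ['GATA3', 'CDH1']
```

For tab-delimited input, pass sept="\t".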


3. Processing multiple input data

A cancer genome study generally uses sequence data from multiple samples, and the reports generated by paplot combine information from those samples. There are two approaches to preparing input data with multiple samples for paplot:

  • case1: Single merged input data

    In this case, the data should contain a column for the sample name (set its column name in the col_opt_id key of the configuration file).

  • case2: Multiple input data divided among individual samples

    In this case, paplot determines sample names from the file names (by removing the string set in the suffix key of the configuration file). Alternatively, include a column for the sample name in each input file (and set col_opt_id in the configuration file).

In the previous examples, we generally used merged input data (case 1 above). Here, we describe the procedure for generating a report using multiple input data (case 2).

example/mutation_split_file/

   # Input data files
  ┣ SAMPLE00.data.csv  # input data for SAMPLE00
  ┣ SAMPLE01.data.csv  # input data for SAMPLE01
  ┣ SAMPLE02.data.csv  # input data for SAMPLE02
  ┣ SAMPLE03.data.csv  # input data for SAMPLE03
  ┣ SAMPLE04.data.csv  # input data for SAMPLE04

   # Configuration file
  ┗ paplot.cfg
example/mutation_split_file/SAMPLE00.data.csv
MutationType,Gene
intronic,GATA3
intronic,FLT3
intronic,FLT3
UTR3,CDH1
exonic,GATA3

Set the suffix key in the configuration file.

example/mutation_split_file/paplot.cfg
[result_format_mutation]
suffix = .data.csv

# Do not use the col_opt_id
col_opt_id =

When the suffix key is set, the part of the file name before the suffix becomes the sample name.

_images/id_suffix.PNG
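The naming rule above amounts to stripping the suffix from the file name. Below is a minimal Python sketch of that rule. The function sample_name is hypothetical, and the handling of files that do not end with the suffix is an assumption.

```python
import os


def sample_name(path, suffix=".data.csv"):
    """Return the file name without its directory and suffix (SAMPLE00.data.csv -> SAMPLE00)."""
    name = os.path.basename(path)
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name  # assumption: names without the suffix are kept unchanged


print(sample_name("example/mutation_split_file/SAMPLE00.data.csv"))  # SAMPLE00
```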

Then, execute paplot.

# For the case of Mutation Matrix Report

# To specify multiple input files, join them with ','.
paplot mutation {unzip_path}/example/mutation_split_file/SAMPLE00.data.csv,{unzip_path}/example/mutation_split_file/SAMPLE01.data.csv ./tmp mutation_split_file \
--config_file {unzip_path}/example/mutation_split_file/paplot.cfg

# Alternatively, a wildcard can be used (enclose the path in double quotes).
paplot mutation "{unzip_path}/example/mutation_split_file/*.csv" ./tmp mutation_split_file \
--config_file {unzip_path}/example/mutation_split_file/paplot.cfg

For QC and Chromosomal Aberration Reports, make the same change in the [result_format_qc] and [result_format_ca] sections, respectively.

4. Keyword

4-1. About keyword

In the configuration file, a keyword can be assigned to each column. Keywords are used, for example, to customize pop-up information.

Configuration file

[result_format_mutation]
# Required items
# col_{key} = {actual column name}
#
col_gene = Gene
col_group = MutationType

# Optional items
# col_opt_{key} = {actual column name}
#
col_opt_id = Sample
col_opt_start = Start
col_opt_end = End

In each col_{keyword} = {actual column name} or col_opt_{keyword} = {actual column name} entry, {keyword} becomes the keyword.

Please note the following points:

  • The keywords are case-insensitive. For example, CHR, Chr, and chr are considered identical.
  • The {keyword} part of an optional item can be set arbitrarily; however, the entry must always start with col_opt_.
  • col_opt_id is reserved for the sample ID.
  • For Mutation Matrix and Chromosomal Aberration Reports, col_opt_group is also reserved for grouping and cannot be used for other purposes.
  • Mutational Signature Report and pmsignature Report do not use these keywords.
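As a rough illustration of these rules, the Python sketch below reads the example section with the standard configparser module (Python 3) and derives a keyword from each col_ or col_opt_ entry. Keywords are lowercased because they are case-insensitive. This is a hypothetical helper, not paplot's code.

```python
import configparser

# The [result_format_mutation] example above
CONFIG = """
[result_format_mutation]
col_gene = Gene
col_group = MutationType
col_opt_id = Sample
col_opt_start = Start
col_opt_end = End
"""


def column_keywords(config_text, section="result_format_mutation"):
    """Map each keyword to its column name from col_{keyword} / col_opt_{keyword} entries."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    keywords = {}
    for option, column in parser.items(section):
        # Check the longer prefix first so col_opt_id yields 'id', not 'opt_id'.
        for prefix in ("col_opt_", "col_"):
            if option.startswith(prefix):
                keywords[option[len(prefix):].lower()] = column
                break
    return keywords


print(column_keywords(CONFIG))
# {'gene': 'Gene', 'group': 'MutationType', 'id': 'Sample', 'start': 'Start', 'end': 'End'}
```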

5. User defined format

We can customize the pop-up information that appears upon mouseover events.

The contents of the pop-up information must be set separately for each report and graph; however, they are all written in the same way.

Configuration file

tooltip_format_checker_partial = type[{func}], {chr}:{start}:{end}, [{ref} -> {alt}]

# will be displayed as:
type[exome], chr1:2000:2001, [A -> T]

Words enclosed in {} are keywords; when the pop-up is displayed, each keyword is replaced by its actual value.
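A minimal Python sketch of this substitution, assuming one row of input data is available as a dictionary keyed by keyword. `render_tooltip` is hypothetical and only illustrates the replacement rule; it does not handle the calculations or format options of 5-1 and 5-2.

```python
import re

# Matches a {keyword} placeholder (no nested braces).
_KEYWORD = re.compile(r"\{([^{}]+)\}")


def render_tooltip(template, row):
    """Replace each {keyword} in template with the matching value from row.

    Lookup is case-insensitive, following the keyword rules in 4-1.
    """
    lowered = {key.lower(): value for key, value in row.items()}

    def substitute(match):
        return str(lowered[match.group(1).strip().lower()])

    return _KEYWORD.sub(substitute, template)


row = {"func": "exome", "chr": "chr1", "start": 2000, "end": 2001, "ref": "A", "alt": "T"}
print(render_tooltip("type[{func}], {chr}:{start}:{end}, [{ref} -> {alt}]", row))
# type[exome], chr1:2000:2001, [A -> T]
```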

For how keywords are defined, see 4-1. About keyword.

5-1. Numerical calculation

paplot can perform numerical calculations on one or more keywords written inside the braces.

{key1/key2*100}%

# will be displayed as (no rounding)
3.33333333333333%

To round the result, append a colon, a period, and the number of digits to display after the decimal point inside the braces; e.g., append :.2 to display two digits after the decimal point.

{key1/key2*100:.2}%

# will be displayed as (with rounding)
3.33%
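The following Python sketch (Python 3.8+) shows one way such an expression field could be evaluated, assuming only the four basic arithmetic operators and the :.N rounding option. `render` is a hypothetical helper; it parses the expression with `ast` instead of calling `eval`, so only arithmetic on keywords and numeric constants is accepted. The number of unrounded digits it prints follows Python's float formatting and may differ from paplot's display.

```python
import ast
import operator
import re

# Operators accepted by this sketch (an assumption; paplot may allow others).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

# Matches {expression} or {expression:.N}.
_FIELD = re.compile(r"\{([^{}:]+)(?::\.(\d+))?\}")


def _evaluate(node, values):
    """Evaluate an arithmetic syntax tree, looking up names in `values`."""
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left, values),
                                   _evaluate(node.right, values))
    if isinstance(node, ast.Name):
        # Keyword values may come from a text file, so convert to float.
        return float(values[node.id.lower()])
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    raise ValueError("unsupported expression: " + ast.dump(node))


def render(template, values):
    """Replace every {expression} or {expression:.N} field in template."""
    lowered = {key.lower(): value for key, value in values.items()}

    def substitute(match):
        tree = ast.parse(match.group(1).strip(), mode="eval")
        result = _evaluate(tree.body, lowered)
        if match.group(2) is None:
            return str(result)  # no rounding
        return "%.*f" % (int(match.group(2)), result)  # N digits after the point

    return _FIELD.sub(substitute, template)


values = {"key1": 1, "key2": 30}
print(render("{key1/key2*100}%", values))     # unrounded
print(render("{key1/key2*100:.2}%", values))  # 3.33%
```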

5-2. Digit separators

To insert a comma every three digits, append :, to the keyword inside the braces.

{key1}

# will be displayed as (with no digit separator)
123456789

{key1:,}

# will be displayed as (with digit separator)
123,456,789
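A corresponding Python sketch for the :, option, again using a hypothetical `render` helper and assuming the keyword value is numeric. Python's own "," format specifier produces the same grouping.

```python
import re

# Matches {keyword} or {keyword:,}.
_FIELD = re.compile(r"\{(\w+)(:,)?\}")


def render(template, values):
    """Replace {keyword} with its value; ':,' adds a comma every three digits."""
    lowered = {key.lower(): value for key, value in values.items()}

    def substitute(match):
        value = lowered[match.group(1).lower()]
        if match.group(2) is None:
            return str(value)
        number = float(value)
        if number.is_integer():
            number = int(number)  # avoid a trailing ".0" for whole numbers
        # Python's "," format specifier inserts the thousands separator.
        return format(number, ",")

    return _FIELD.sub(substitute, template)


values = {"key1": "123456789"}
print(render("{key1}", values))    # 123456789
print(render("{key1:,}", values))  # 123,456,789
```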

Release Note

Attention

Configuration files are not compatible across versions.
When updating, use the configuration file that matches the installed version.

v0.5.4

  • Added several example data sets.
  • Modified several items in the configuration file.

v0.5.3

Added options related to font size.

v0.5.2

  • Added functions to generate the Mutational Signature Report and pmsignature Report.
  • Renamed the subcommand sv to ca.

v0.4.0

Added a function for saving images.

v0.3.1

Fixed several bugs in the Mutation Matrix Report.

v0.3.0

Added functions to generate the Mutation Matrix Report.

v0.2.8

Changed the specification of the function for merging multiple files.

v0.2.7

Added functions to generate the QC Report and SV (Structural Variation) Report.

License

paplot is licensed under the MIT License.

Copyright (c) 2017 Ai Okada and Yuichi Shiraishi
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Contact

E-mail: genomon.devel@gmail.com

JavaScript Libraries

paplot uses the following JavaScript packages.