paplot documentation¶
About¶
Available reports¶
- Quality Control (QC) Report
QC Reports present quality metrics for each sequencing dataset (sequencing coverage, alignment ratio, insert sizes, etc.).
- Chromosomal Aberration (CA) Report
CA Reports present the sample-wise landscape of chromosomal aberrations (e.g., structural variations and gene fusions).
- Mutation Matrix Report
Mutation Matrix Reports present the mutation status of each gene (vertical axis) and sample (horizontal axis).
- Mutational Signature Report
Mutational Signature Reports present the mutational signatures identified in the cohort and the contribution ratio of each signature in each sample.
Signatures in the pmsignature representation can also be displayed.
Quick Start¶
In this section, we will learn to
- Install paplot
- Execute paplot with simple example data
- View the output reports
- Modify configuration files and use your own data.
1. Install paplot¶
cd {the directory where you want to install paplot}
# for v0.5.4
wget https://github.com/Genomon-Project/paplot/archive/v0.5.4.zip
unzip v0.5.4.zip
cd paplot-0.5.4/
python setup.py build install
Confirm the installation:
paplot --version
paplot-0.5.4
2. Execute paplot with simple example data¶
paplot subcommand [--config_file CONFIG_FILE] [--title TITLE]
[--ellipsis ELLIPSIS] [--overview OVERVIEW]
[--remarks REMARKS]
input output_dir project_name
Required arguments
subcommand: | The type of report to generate: qc, ca, mutation, signature, or pmsignature. |
---|---|
input: | The input data table. |
output_dir: | The directory where paplot writes its output files. |
project_name: | The project name (used as the title of the output files). |
Please execute paplot using the prepared sample data.
cd {the path where paplot is installed}
# QC Report
paplot qc example/qc_brush/data.csv ./tmp demo
# Chromosomal Aberration Report
paplot ca example/ca_option/data.csv ./tmp demo
# Mutation Matrix Report
paplot mutation example/mutation_option/data.csv ./tmp demo
# Mutational Signature Report
paplot signature "example/signature_stack/data*.json" ./tmp demo
# pmsignature Report
paplot pmsignature "example/pmsignature_stack/data*.json" ./tmp demo
3. View the output file¶
You will find the following directory structure:
The directory specified by the {output_dir} argument
├ demo
│ ├ graph_ca.html <--- Chromosomal Aberration Report
│ ├ graph_mut.html <--- Mutation Matrix Report
│ ├ graph_pmsignature2.html <--- pmsignature Report (with varying number of mutation signatures)
│ ├ graph_pmsignature3.html
│ ├ graph_pmsignature4.html
│ ├ graph_pmsignature5.html
│ ├ graph_pmsignature6.html
│ ├ graph_qc.html <--- QC Report
│ ├ graph_signature2.html <--- Mutational Signature Report (with varying number of mutation signatures)
│ ├ graph_signature3.html
│ ├ graph_signature4.html
│ ├ graph_signature5.html
│ └ graph_signature6.html
│
├ js <--- The following four directories are required to display the HTML files. Do not remove them.
├ layout
├ lib
├ style
│
└ index.html <--- Open this file in a web browser.
Modify configuration files and use your own data
Install¶
- Linux
- MacOS X
- Windows
For Linux¶
1. Install paplot¶
Download the source code (zip) from the paplot website (https://github.com/Genomon-Project/paplot/releases/), or fetch it with wget as follows.
cd {the directory where you want to install paplot}
# For v0.5.4
wget https://github.com/Genomon-Project/paplot/archive/v0.5.4.zip
unzip v0.5.4.zip
cd paplot-0.5.4/
python setup.py build install
# If the above command fails, install into your user directory (~/.local) instead
export PATH=~/.local/bin/:$PATH
export LD_LIBRARY_PATH=~/.local/lib/:$LD_LIBRARY_PATH
python setup.py build install --user
paplot --version
paplot-0.5.4
Note
When installing with --user, set PATH and LD_LIBRARY_PATH:
export PATH=~/.local/bin/:$PATH
export LD_LIBRARY_PATH=~/.local/lib/:$LD_LIBRARY_PATH
For MacOS X¶
1. Download source files¶
Download the source code (zip) from the paplot website (https://github.com/Genomon-Project/paplot/releases/). If git is installed, you can instead type:
git clone -b master https://github.com/Genomon-Project/paplot.git
2. Install paplot¶
cd {the directory where paplot source files are downloaded}
python setup.py build install --user
3. Setting PATH¶
Add the bin and lib directories of the paplot installation to PATH and LD_LIBRARY_PATH. With Python 2.7 and --user, the bin directory is usually /Users/<user name>/Library/Python/2.7/bin.
export PATH={the directory where paplot is installed}/bin:$PATH
export LD_LIBRARY_PATH={the directory where paplot is installed}/lib:$LD_LIBRARY_PATH
# In most cases, the following lines work (replace <user name> with your user name).
# export PATH=/Users/<user name>/Library/Python/2.7/bin:$PATH
# export LD_LIBRARY_PATH=/Users/<user name>/Library/Python/2.7/lib:$LD_LIBRARY_PATH
paplot --version
paplot-0.5.4
For Windows¶
1. Install Python¶
- winPython http://winpython.github.io/
- Python(x,y) http://python-xy.github.io/
2. Install paplot¶
Download the source code (zip) from the paplot website (https://github.com/Genomon-Project/paplot/releases/), unzip it, and move to the unzipped directory:
cd {the directory where the source files are unzipped}
Attention
The following commands assume that WinPython-64bit-2.7.10.3 is installed. Adjust the path to python.exe to match your installation.
> C:\WinPython-64bit-2.7.10.3\python-2.7.10.amd64\python.exe setup.py build install
> C:\WinPython-64bit-2.7.10.3\python-2.7.10.amd64\python.exe paplot --version
paplot-0.5.4
paplot command¶
1. Basic usage¶
paplot subcommand [--config_file CONFIG_FILE] [--title TITLE]
[--ellipsis ELLIPSIS] [--overview OVERVIEW]
[--remarks REMARKS]
input output_dir project_name
Required arguments
subcommand: | The type of report to generate: qc, ca, mutation, signature, or pmsignature. |
---|---|
input: | Input files. To process multiple files (usually split by individual samples), delimit them with commas or use a wildcard, as in the following examples. |
# for single input file
paplot mutation {unzip_path}/example/mutation_minimal/data.csv ./tmp mutation_minimal \
--config_file {unzip_path}/example/mutation_minimal/paplot.cfg
# for multiple input files, delimit them by comma
paplot mutation \
{unzip_path}/example/mutation_split_file/SAMPLE00.data.csv,{unzip_path}/example/mutation_split_file/SAMPLE01.data.csv \
./tmp mutation_split_file1 --config_file {unzip_path}/example/mutation_split_file/paplot.cfg
# paplot also accepts wildcards. In this case, enclose the input in double quotes
paplot mutation "{unzip_path}/example/mutation_split_file/*.csv" ./tmp mutation_split_file2 \
--config_file {unzip_path}/example/mutation_split_file/paplot.cfg
output_dir: | Output directory path. See Output directory for details of the directory contents. |
project_name: | Project name (used as the title of the output files). |
2. Output directory¶
You will find the following directory structure:
{output_dir}
├ {project_name}
│ └ graph_*.html <--- Each report
│
├ js <--- The following four directories are required to display the HTML files. Do not remove them.
├ layout
├ lib
├ style
│
└ index.html <--- Open this file in a web browser.
To move the output, move the entire output directory. For how to use each report, please refer to HOW TO USE GRAPHS.
3. Options¶
You can add the following optional arguments:
option | description |
---|---|
--config_file | Path to the configuration file. If it is not specified, the default file is used. |
--title | Title of the graph. |
--ellipsis | Abbreviated name of the graph used in file names (e.g., ca in graph_ca.html). It is convenient when writing multiple reports to the same directory. |
--overview | Outline of the graph (displayed in the index.html file). |
--remarks | Text displayed in the remarks section of the index.html file (the default is the remarks option in the [style] section of the configuration file). |
The default values are as follows:
subcommand | title | ellipsis | overview | remarks |
---|---|---|---|---|
qc | QC graphs | qc | Quality Control of bam. | None |
ca | CA graphs | ca | Chromosomal Aberration. | None |
mutation | Mutation Matrix | mutation | Gene-sample mutational profiles. | None |
signature | Signature | signature | Mutational Signatures. | None |
pmsignature | PMSignature | pmsignature | Express mutational signatures in pmsignature. | None |
Quality Control (QC) Report¶
[*] | The graph for selecting samples can be changed in the configuration file. Please refer to Graph for selecting samples. |
Chromosomal Aberration (CA) Report¶
- The barplot at the top panel displays the distribution of the breakpoints of the CAs in the cohort.
- The circular plots below show a Circos-like profile of the CAs in each sample, where the two ends of a curved line represent the breakpoints of a CA.
[*] | About categorization: the categorization can be changed in the configuration file. Please refer to Customizing categorization. |
Mutation Matrix Report¶
Mutation Matrix Report displays a landscape of mutation status across genes (vertical axis) and samples (horizontal axis).
Horizontal bar chart (Sample): | Displays the total number of mutations detected in each sample. |
---|---|
Vertical bar chart (Gene): | Displays the number of mutations and the fractions of mutation types (e.g., nonsynonymous, stopgain) for each gene. |
- If the same sample has multiple mutations in the same gene, they are counted as 1.
- If the same sample has multiple mutations of different types in the same gene, only the type with the highest priority is counted. For example, in the default setting, stopgain has a higher priority than nonsynonymous. The priority order can be changed in the configuration file.
Mutation type: | Each mutation type is displayed in a distinct color. To hide a specific mutation type, uncheck it in this section. |
Subplot: | If meta information is available for the samples (e.g., clinical information), it can be displayed as a subplot. The file must be specified in the configuration file before running paplot. |
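The two counting rules for the vertical bar chart can be illustrated with a short Python sketch. It is not paplot's implementation. The function name and the priority list are hypothetical, since paplot reads its actual priority order from the configuration file, and the records are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical priority order, highest first. paplot reads its own order
# from the configuration file; this list is for illustration only.
PRIORITY = ["stopgain", "nonsynonymous", "splicing", "intronic"]


def count_gene_mutations(records, priority=PRIORITY):
    """Return {gene: Counter({mutation_type: n_samples})}.

    Each (sample, gene) pair is counted once, labeled with the
    highest-priority mutation type observed for that pair.
    """
    rank = {mut_type: i for i, mut_type in enumerate(priority)}
    lowest = len(rank)  # types missing from the list get the lowest priority
    best = {}
    for sample, gene, mut_type in records:
        current = best.get((sample, gene))
        if current is None or rank.get(mut_type, lowest) < rank.get(current, lowest):
            best[(sample, gene)] = mut_type
    per_gene = defaultdict(Counter)
    for (_sample, gene), mut_type in best.items():
        per_gene[gene][mut_type] += 1
    return per_gene


records = [
    ("SAMPLE1", "GATA3", "nonsynonymous"),
    ("SAMPLE1", "GATA3", "stopgain"),       # same sample and gene: one count, as stopgain
    ("SAMPLE2", "GATA3", "nonsynonymous"),
    ("SAMPLE2", "GATA3", "nonsynonymous"),  # duplicate: still one count
]
counts = count_gene_mutations(records)
print(dict(counts["GATA3"]))  # {'stopgain': 1, 'nonsynonymous': 1}
```

Under these rules, the height of a gene's bar equals the number of samples with at least one mutation in that gene.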
How to view
1. X-axis sort¶
Change the order of items on the horizontal axis:
- None ... Default order
- ASC ... Ascending order
- DESC ... Descending order
Samples can be sorted by the following keys (multiple keys can be combined):
Sample ID: | Sort by sample name. |
---|---|
Mutation number: | Sort by the number of mutations per sample. |
Genes: | Sort by the mutation status of the selected genes. After selecting either ASC or DESC, select a gene from the [Select gene name] list box, and click the [Add sort key] button. |
Automatic Gantt-chart: | Create a Gantt chart automatically. Enter the number of genes to display [*] in the horizontal edit box, and click the [Gantt-chart] button. |
How the Gantt chart is generated
- First, sort the genes in descending order of the number of mutations.
- Then, divide the samples into two groups according to the mutation status of the first gene, placing the group with the mutation on the left and the other group on the right. Within each group, repeat this procedure for the second gene, the third gene, and so on.
[*] | Ideally, all detected genes would be displayed. However, processing becomes heavier as the number of genes increases, so in many cases it is practical to narrow down the gene list. |
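The Gantt-chart procedure above amounts to sorting samples lexicographically by their mutation status over the ranked genes. Mutated samples come first for the first gene. Within each group, mutated samples come first for the second gene, and so on. The Python sketch below illustrates that reading. It is not paplot's code, the function name is hypothetical, and breaking ties in gene counts by gene name is an assumption. The sample and gene pairs are taken from the minimal Mutation Matrix example data.

```python
from collections import Counter


def gantt_order(mutated, top_n):
    """mutated: {sample ID: set of mutated genes}.

    Returns (genes used for sorting, sample IDs in Gantt-chart order).
    """
    # Step 1: rank genes by the number of mutated samples, descending
    # (ties broken by gene name, an assumption of this sketch).
    counts = Counter(gene for genes in mutated.values() for gene in set(genes))
    genes = sorted(counts, key=lambda g: (-counts[g], g))[:top_n]

    # Step 2: mutated samples go left (key 0) and the others right (key 1),
    # gene by gene. Comparing these tuples lexicographically splits by the
    # first gene, then by the second gene within each group, and so on.
    def status(sample):
        return tuple(0 if gene in mutated[sample] else 1 for gene in genes)

    return genes, sorted(mutated, key=status)


mutated = {
    "SAMPLE00": {"GATA3", "CDH1"},
    "SAMPLE01": {"WASF3", "NRAS"},
    "SAMPLE02": {"FBXW7", "GATA3", "ACVR2B"},
    "SAMPLE03": {"CAP2", "PIK3CA", "SEPT12"},
}
genes, order = gantt_order(mutated, top_n=3)
print(genes)  # ['GATA3', 'ACVR2B', 'CAP2']
print(order)  # ['SAMPLE02', 'SAMPLE00', 'SAMPLE03', 'SAMPLE01']
```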
2. Y-axis sort¶
Change the order of items on the vertical axis.
- None ... Default order
- ASC ... Ascending order
- DESC ... Descending order
Genes can be sorted by the following keys (multiple keys can be combined):
Mutation number: | Sort by the number of mutations per gene. |
---|---|
Gene name: | Sort by gene name. |
3. Sample filter¶
Set the maximum value of the vertical axis of the horizontal bar chart.
(Figure: before and after applying the filter)
4. Genes filter¶
Set the filters for the genes displayed on the vertical axis.
Rate: | Frequency (%) of samples with mutations in each gene. The initial value is 0% (no filtering). |
---|---|
Display maximum: | Maximum number of genes to display. |
After setting the above items, please click the [update filter] button.
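The two filter settings can be sketched in Python as follows. The sketch makes two assumptions for illustration, not a description of paplot's internals. First, Rate hides genes mutated in fewer than the given percentage of samples. Second, Display maximum keeps the most frequently mutated genes. The function name is hypothetical.

```python
def filter_genes(mutated, rate_percent=0.0, display_max=None):
    """mutated: {sample ID: set of mutated genes}.

    Returns gene names, most frequently mutated first, keeping genes mutated in
    at least rate_percent % of samples and at most display_max genes.
    """
    if not mutated:
        return []
    counts = {}
    for genes in mutated.values():
        for gene in set(genes):
            counts[gene] = counts.get(gene, 0) + 1
    kept = [g for g, c in counts.items() if 100.0 * c / len(mutated) >= rate_percent]
    kept.sort(key=lambda g: (-counts[g], g))  # ties broken by gene name (assumption)
    return kept if display_max is None else kept[:display_max]


mutated = {
    "SAMPLE00": {"GATA3", "CDH1"},
    "SAMPLE01": {"WASF3", "NRAS"},
    "SAMPLE02": {"FBXW7", "GATA3", "ACVR2B"},
    "SAMPLE03": {"CAP2", "PIK3CA", "SEPT12"},
}
print(filter_genes(mutated, rate_percent=50))  # ['GATA3']
print(filter_genes(mutated, display_max=3))    # ['GATA3', 'ACVR2B', 'CAP2']
```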
Mutational Signature Report¶
Mutational Signature Report displays the "mutation signatures" (see e.g., Alexandrov et al., Nature, 2013) as bar graphs and the estimated "contributions" of each signature to the mutations of each sample as stacked bar graphs.
Upper panel (mutation signature graph): | Mutation signatures are displayed (typically as bar plots with 96 elements). |
---|---|
Lower panel (signature contribution graph): | For each sample, the contribution ratios of the mutation signatures are displayed. |
In addition, you can change the display mode using the list boxes below the stacked graph.
View mode: | Display mode of the stacked graph (e.g., [Count]). |
---|---|
Sort by: | Sort key for the samples (e.g., [Mutation count]). |
Display example when the view mode is [Count] and sorting is by [Mutation count].
A similar representation is adopted for the case with pmsignature.
QC Report¶
Here, we describe the procedure to generate a QC Report using sample data [*].
[*] | The sample data is included in the example directory under the paplot directory. |
1. Minimal dataset¶
To generate a QC Report with paplot, a sample ID (Sample) and at least one QC item are required. In this example, we use the mean sequencing depth (AverageDepth).
Sample,AverageDepth
SAMPLE1,70.0474
SAMPLE2,65.7578
SAMPLE3,63.3750
SAMPLE4,70.9654
SAMPLE5,69.9653
First, set the column names in the [result_format_qc] section of the configuration file.
[result_format_qc]
col_opt_id = Sample
col_opt_key1 = AverageDepth
The column names of optional items can be set as col_opt_{keyword} = {actual column name}. For a more detailed description of keywords, please refer to About keyword.
Then, add the [qc_chart_1] section to the configuration file and fill in its contents.
[qc_chart_1]
# Title of the graph
title = Average depth
# Label of the Y axis
title_y = Average of depth
# Items for the stacked bargraph
# In this example, only one item is used and the graph is displayed as non-stacked bargraph
stack1 = {key1}
# Color and legend of the graph
name_set = Average depth:#2478B4
# Pop-up information
tooltip_format1 = Sample:{id}
tooltip_format2 = {key1:.2}
Note
Here, {key1} used above is the {keyword} specified in the [result_format_qc] section.
- For a more detailed description of how to set name_set, please refer to How to set name_set.
- For a more detailed description of how to set tooltip_format, please refer to User defined format.
Then, execute paplot.
paplot qc {unzip_path}/example/qc_minimal/data.csv ./tmp qc_minimal \
--config_file {unzip_path}/example/qc_minimal/paplot.cfg
2. Without header¶
SAMPLE1,70.0474
SAMPLE2,65.7578
SAMPLE3,63.3750
SAMPLE4,70.9654
SAMPLE5,69.9653
When the input data has no header (column names), set the column number of each key in the [result_format_qc] section of the configuration file.
[result_format_qc]
# Set the value of the header option to False
header = False
col_opt_id = 1
col_opt_average_depth = 2
Then, execute paplot.
paplot qc {unzip_path}/example/qc_noheader/data.csv ./tmp qc_noheader \
--config_file {unzip_path}/example/qc_noheader/paplot.cfg
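The Python sketch below illustrates the difference between the two configurations. It reads a table using a [result_format_qc] section. With a header, each col_opt_{keyword} value is a column name. With header = False, each value is a 1-based column number. The helper read_table is hypothetical, built on the standard configparser and csv modules rather than on paplot's own reader. Treating a missing header option as True is also an assumption.

```python
import configparser
import csv
import io


def read_table(config_text, data_text, section_name="result_format_qc"):
    """Return one {keyword: value} dict per data row (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep keyword case as written (e.g. keyA1)
    parser.read_string(config_text)
    section = parser[section_name]
    has_header = section.getboolean("header", fallback=True)
    prefix = "col_opt_"
    mapping = {k[len(prefix):]: v for k, v in section.items() if k.startswith(prefix)}
    rows = csv.reader(io.StringIO(data_text))
    if has_header:
        header = next(rows)
        # With a header, each value is a column name.
        index = {kw: header.index(col) for kw, col in mapping.items()}
    else:
        # Without a header, each value is a 1-based column number.
        index = {kw: int(col) - 1 for kw, col in mapping.items()}
    return [{kw: row[i] for kw, i in index.items()} for row in rows]


HEADER_CFG = "[result_format_qc]\ncol_opt_id = Sample\ncol_opt_key1 = AverageDepth\n"
NOHEADER_CFG = "[result_format_qc]\nheader = False\ncol_opt_id = 1\ncol_opt_average_depth = 2\n"

print(read_table(HEADER_CFG, "Sample,AverageDepth\nSAMPLE1,70.0474\n"))
# [{'id': 'SAMPLE1', 'key1': '70.0474'}]
print(read_table(NOHEADER_CFG, "SAMPLE1,70.0474\nSAMPLE2,65.7578\n"))
# [{'id': 'SAMPLE1', 'average_depth': '70.0474'}, {'id': 'SAMPLE2', 'average_depth': '65.7578'}]
```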
3. Stacked bargraph¶
Here, we generate a report with a stacked bargraph as well as a normal bargraph (generated in the minimal dataset example).
Sample,AverageDepth,ReadLengthR1,ReadLengthR2
SAMPLE1,70.0474,265,270
SAMPLE2,65.7578,140,200
SAMPLE3,63.375,120,175
SAMPLE4,70.9654,120,140
SAMPLE5,69.9653,230,110
- chart_1 [normal bargraph] AverageDepth (the same as the minimal dataset example)
- chart_2 [stacked bargraph] ReadLengthR1, ReadLengthR2
First, add these columns to the [result_format_qc] section in the configuration file.
[result_format_qc]
col_opt_id = Sample
# Column used in the chart_1
col_opt_keyA1 = AverageDepth
# Column used in the chart_2
col_opt_keyB1 = ReadLengthR1
col_opt_keyB2 = ReadLengthR2
The column names of optional items can be set as col_opt_{keyword} = {actual column name}. For a more detailed description of keywords, please refer to About keyword.
Next, add the [qc_chart_1] and [qc_chart_2] sections to the configuration file and fill in their contents.
Add one [qc_chart_*] section per graph; the number * should start from 1. For the completed configuration file, please refer to example/qc_stack/paplot.cfg.
3-1. Normal bargraph¶
The [qc_chart_1] section is for a normal bargraph, and its contents should be filled in as in the minimal dataset example.
3-2. Stacked bargraph¶
The [qc_chart_2] section is for a stacked bargraph.
[qc_chart_2]
# Titles
title = Chart 2: Read length
title_y = Read length
# Items for the stacked bargraph
# Items are stacked in the order of stack1 → 2 → ...
stack1 = {keyB1}
stack2 = {keyB2}
# Color and legend
# Write down in the order of stack1 → 2 → ..., and join them by commas ','.
name_set = Read length r1:#2478B4, Read length r2:#FF7F0E
# Pop-up information
tooltip_format1 = Sample:{id}
tooltip_format2 = Read1: {keyB1:,}
tooltip_format3 = Read2: {keyB2:,}
Note
Here, {key*} used above is the {keyword} specified in the [result_format_qc] section.
- For a more detailed description of how to set name_set, please refer to How to set name_set.
- For a more detailed description of how to set tooltip_format, please refer to User defined format.
Then, execute paplot.
paplot qc {unzip_path}/example/qc_multi_plot/data.csv ./tmp qc_multi_plot \
--config_file {unzip_path}/example/qc_multi_plot/paplot.cfg
3-3. How to set name_set¶
Define the legends and their colors.
Write {legend}:{color} for each item in the stacked bargraph (colors can be omitted).
name_set = average_depth:#2478B4
# When there are multiple items, join them by commas ','.
name_set = Read length r1:#2478B4, Read length r2:#FF7F0E
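To make the syntax concrete, here is a hypothetical Python parser for name_set values. It is not paplot's code. It assumes legends contain no commas and splits each item at its last colon, so the color can be omitted. The resulting pairs line up with stack1, stack2, and so on, in order.

```python
def parse_name_set(value):
    """Parse 'legend:color, legend:color' into (legend, color) pairs.

    Hypothetical helper: items are split at commas, and each item at its last
    colon; color is None when it is omitted.
    """
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        legend, sep, color = item.rpartition(":")
        if not sep:  # no colon: only the legend was given
            legend, color = item, ""
        pairs.append((legend.strip(), color.strip() or None))
    return pairs


print(parse_name_set("Read length r1:#2478B4, Read length r2:#FF7F0E"))
# [('Read length r1', '#2478B4'), ('Read length r2', '#FF7F0E')]
print(parse_name_set("average_depth"))
# [('average_depth', None)]
```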
When colors are omitted, the default colors defined in the following file are used:
4. Various graphs¶
In the previous example, we generated a report with one normal bargraph and one stacked bargraph. Here, we generate more graphs.
Sample,AverageDepth,ReadLengthR1,ReadLengthR2,TotalReads,MappedReads,2xRatio,10xRatio,20xRatio,30xRatio
SAMPLE1,70.0474,265,270,94315157,56262203,0.9796,0.768,0.6844,0.6747
SAMPLE2,65.7578,140,200,50340277,33860998,0.8489,0.7725,0.7655,0.6131
SAMPLE3,63.375,120,175,90635480,88010999,0.9814,0.8236,0.6045,0.5889
SAMPLE4,70.9654,120,140,72885114,89163960,0.9047,0.8303,0.7032,0.6801
SAMPLE5,69.9653,230,110,92572101,28793615,0.9776,0.9452,0.672,0.6518
- chart_1 [normal bargraph] AverageDepth (the same as the minimal dataset example)
- chart_2 [stacked bargraph] ReadLengthR1, ReadLengthR2 (the same as the previous example)
- chart_3 [normal bargraph] MappedReads divided by TotalReads (mapping ratio)
- chart_4 [stacked bargraph] 2xRatio, 10xRatio, 20xRatio, 30xRatio (each segment shows its ratio minus the ratio of the segment below it)
First, add these columns to the [result_format_qc] section in the configuration file.
[result_format_qc]
col_opt_id = Sample
# Columns used in the chart_1
col_opt_average_depth = AverageDepth
# Columns used in the chart_2
col_opt_read_length_r1 = ReadLengthR1
col_opt_read_length_r2 = ReadLengthR2
# Columns used in the chart_3
col_opt_mapped_reads = MappedReads
col_opt_total_reads = TotalReads
# Columns used in the chart_4
col_opt_ratio_2x = 2xRatio
col_opt_ratio_10x = 10xRatio
col_opt_ratio_20x = 20xRatio
col_opt_ratio_30x = 30xRatio
The column names of optional items can be set as col_opt_{keyword} = {actual column name}. For a more detailed description of keywords, please refer to About keyword.
Next, add the [qc_chart_1], [qc_chart_2], [qc_chart_3], and [qc_chart_4] sections to the configuration file and fill in their contents.
For the completed configuration file, please refer to example/qc_variation/paplot.cfg.
4-1. Simple normal bargraph¶
The [qc_chart_1] section is for a normal bargraph, and its contents should be filled in as in the minimal dataset example.
4-2. Simple stacked bargraph¶
The [qc_chart_2] section is for a stacked bargraph, and its contents should be filled in as in the previous example.
4-3. Normal bargraph (with numeric operations on columns)¶
The [qc_chart_3] section is a graph of the mapping ratio (mapped reads divided by total reads).
[qc_chart_3]
# Titles
title = Mapped reads/Total reads
title_y = Rate
# Items for the graph
stack1 = {mapped_reads/total_reads}
# Colors and legends
name_set = Mapped reads/Total reads:#2478B4
# Pop-up information
tooltip_format1 = Sample:{id}
tooltip_format2 = {mapped_reads/total_reads:.2}
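Numerical operations on columns can be written inside the braces, as in stack1 = {mapped_reads/total_reads}. Besides division, subtraction (e.g., {mapped_reads-total_reads}) and addition (e.g., {mapped_reads+total_reads}) can be used. The same expressions can also be used in pop-up information with a format specification (e.g., tooltip_format2 = {mapped_reads/total_reads:.2}), and several values can be combined in one line (e.g., tooltip_format2 = Mapped: {mapped_reads}, Total: {total_reads}).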
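paplot evaluates expressions such as {mapped_reads/total_reads} itself. To sanity-check the values before plotting, the same arithmetic can be reproduced in Python on the sample data above. The sketch below uses only the standard csv module.

```python
import csv
import io

# Sample data from the "Various graphs" example above.
DATA = """Sample,AverageDepth,ReadLengthR1,ReadLengthR2,TotalReads,MappedReads,2xRatio,10xRatio,20xRatio,30xRatio
SAMPLE1,70.0474,265,270,94315157,56262203,0.9796,0.768,0.6844,0.6747
SAMPLE2,65.7578,140,200,50340277,33860998,0.8489,0.7725,0.7655,0.6131
SAMPLE3,63.375,120,175,90635480,88010999,0.9814,0.8236,0.6045,0.5889
SAMPLE4,70.9654,120,140,72885114,89163960,0.9047,0.8303,0.7032,0.6801
SAMPLE5,69.9653,230,110,92572101,28793615,0.9776,0.9452,0.672,0.6518
"""

# Equivalent of stack1 = {mapped_reads/total_reads} for each sample.
ratios = {
    row["Sample"]: int(row["MappedReads"]) / int(row["TotalReads"])
    for row in csv.DictReader(io.StringIO(DATA))
}
for sample, ratio in ratios.items():
    print(f"{sample}: {ratio:.2f}")
```

In this sample data, SAMPLE4 has more mapped reads than total reads, so its ratio exceeds 1.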
4-4. Stacked bargraph (with numerical operations on columns)¶
The chart_4 section is a graph for sequence coverage.
[qc_chart_4]
# Title
title = Depth coverage
title_y = Coverage
# Items for the graph
stack1 = {ratio_30x}
stack2 = {ratio_20x-ratio_30x}
stack3 = {ratio_10x-ratio_20x}
stack4 = {ratio_2x-ratio_10x}
# Colors and legends
name_set = Ratio 30x:#2478B4, Ratio 20x:#FF7F0E, Ratio 10x:#2CA02C, Ratio 2x:#D62728
# Pop-up information
tooltip_format1 = ID:{id}
tooltip_format2 = ratio__2x: {ratio_2x:.2}
tooltip_format3 = ratio_10x: {ratio_10x:.2}
tooltip_format4 = ratio_20x: {ratio_20x:.2}
tooltip_format5 = ratio_30x: {ratio_30x:.2}
Here, we set the first stack (stack1) to ratio_30x, the second stack (stack2) to ratio_20x minus ratio_30x, and so on.
Then, execute paplot.
paplot qc {unzip_path}/example/qc_variation/data.csv ./tmp qc_variation \
--config_file {unzip_path}/example/qc_variation/paplot.cfg
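Each segment is the difference between adjacent coverage ratios, so the differences telescope and the total height of the stacked bar equals ratio_2x. The small Python check below uses SAMPLE1 from the data above. The helper name is hypothetical.

```python
def coverage_stacks(ratio_2x, ratio_10x, ratio_20x, ratio_30x):
    """Segment heights of chart_4 in stacking order (stack1 first)."""
    return [
        ratio_30x,              # stack1 = {ratio_30x}
        ratio_20x - ratio_30x,  # stack2 = {ratio_20x-ratio_30x}
        ratio_10x - ratio_20x,  # stack3 = {ratio_10x-ratio_20x}
        ratio_2x - ratio_10x,   # stack4 = {ratio_2x-ratio_10x}
    ]


# SAMPLE1: 2xRatio, 10xRatio, 20xRatio, 30xRatio
stacks = coverage_stacks(0.9796, 0.768, 0.6844, 0.6747)
print([round(s, 4) for s in stacks])  # [0.6747, 0.0097, 0.0836, 0.2116]
print(round(sum(stacks), 4))          # 0.9796, i.e. ratio_2x
```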
5. Graph for selecting samples¶
Here, we add the graph for selecting samples (using the column AverageDepth).
To use other columns, register them in the [result_format_qc] section of the configuration file (as col_opt_{name}).
Only one graph for selecting samples can be included.
Add the [qc_chart_brush] section to the configuration file and fill in its contents.
[qc_chart_brush]
stack = {average_depth}
name_set = average:#E3E5E9
Then, execute paplot.
paplot qc {unzip_path}/example/qc_brush/data.csv ./tmp qc_brush \
--config_file {unzip_path}/example/qc_brush/paplot.cfg
Chromosomal Aberration Report¶
Here, we describe the procedure to generate a Chromosomal Aberration Report using sample data [*].
[*] | The sample data is included in the example directory under the paplot directory. |
1. Minimal dataset¶
To generate a Chromosomal Aberration Report with paplot, at least the following five items are required:
- Sample ID (Sample)
- Chromosome of the breakpoint 1 (Chr1)
- Coordinate of the breakpoint 1 (Break1)
- Chromosome of the breakpoint 2 (Chr2)
- Coordinate of the breakpoint 2 (Break2)
Sample,Chr1,Break1,Chr2,Break2,
SAMPLE1,14,16019088,12,62784483,
SAMPLE1,9,99412502,7,129302434,
SAMPLE1,13,84663781,18,52991509,
SAMPLE2,11,101374238,22,26701405,
SAMPLE2,2,121708638,7,137424167,
SAMPLE3,22,34268355,10,19871820,
SAMPLE3,8,107868940,hs37d5,20517614,
SAMPLE4,8,135644313,3,116748248,
SAMPLE4,7,6037836,21,34855497,
SAMPLE4,7,109724564,14,106387943,
Set the column names in the [result_format_ca] section of the configuration file.
[result_format_ca]
col_chr1 = Chr1
col_break1 = Break1
col_chr2 = Chr2
col_break2 = Break2
col_opt_id = Sample
Then, execute paplot.
paplot ca {unzip_path}/example/ca_minimal/data.csv ./tmp ca_minimal \
--config_file {unzip_path}/example/ca_minimal/paplot.cfg
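In the resulting report, each aberration in this minimal dataset is categorized as intra-chromosomal or inter-chromosomal (see Customizing categorization). The rule can be sketched in Python as below. This is an illustration rather than paplot's code. SAMPLE9 is a hypothetical row, added because every row of the minimal dataset joins two different chromosomes.

```python
import csv
import io

# Two rows of the minimal dataset, plus one hypothetical intra-chromosomal row (SAMPLE9).
DATA = """Sample,Chr1,Break1,Chr2,Break2,
SAMPLE1,14,16019088,12,62784483,
SAMPLE3,8,107868940,hs37d5,20517614,
SAMPLE9,5,1000000,5,2000000,
"""


def categorize(row):
    """Intra-chromosomal when both breakpoints lie on the same chromosome."""
    return "intra" if row["Chr1"] == row["Chr2"] else "inter"


labels = {row["Sample"]: categorize(row) for row in csv.DictReader(io.StringIO(DATA))}
print(labels)  # {'SAMPLE1': 'inter', 'SAMPLE3': 'inter', 'SAMPLE9': 'intra'}
```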
2. Without header¶
SAMPLE1,14,16019088,12,62784483,
SAMPLE1,9,99412502,7,129302434,
SAMPLE1,13,84663781,18,52991509,
SAMPLE2,11,101374238,22,26701405,
SAMPLE2,2,121708638,7,137424167,
SAMPLE3,22,34268355,10,19871820,
SAMPLE3,8,107868940,hs37d5,20517614,
SAMPLE4,8,135644313,3,116748248,
SAMPLE4,7,6037836,21,34855497,
SAMPLE4,7,109724564,14,106387943,
When the input data has no header (column names), set the column number of each key in the [result_format_ca] section of the configuration file.
[result_format_ca]
# Set the value of the header option to False
header = False
col_chr1 = 2
col_break1 = 3
col_chr2 = 4
col_break2 = 5
col_opt_id = 1
Then execute paplot.
paplot ca {unzip_path}/example/ca_noheader/data.csv ./tmp ca_noheader \
--config_file {unzip_path}/example/ca_noheader/paplot.cfg
3. Customizing categorization¶
In the minimal dataset, chromosomal aberrations are categorized into intra-chromosomal (green) and inter-chromosomal (purple). We can customize the categorization.
Sample,Chr1,Break1,Chr2,Break2,Label
SAMPLE1,14,16019088,12,62784483,C
SAMPLE1,9,99412502,7,129302434,B
SAMPLE1,13,84663781,18,52991509,A
SAMPLE2,11,101374238,22,26701405,B
SAMPLE2,2,121708638,7,137424167,C
SAMPLE2,16,43027789,22,23791492,C
SAMPLE3,22,34268355,10,19871820,A
SAMPLE3,14,56600342,hs37d5,5744957,B
SAMPLE3,Y,12191863,hs37d5,29189687,A
SAMPLE4,8,135644313,3,116748248,D
SAMPLE4,7,6037836,21,34855497,D
SAMPLE4,7,109724564,14,106387943,A
In the example data above, a new column, Label, is included apart from Sample, Chr1, Break1, Chr2, and Break2.
First, set Label as the column used for categorization in the [result_format_ca] section of the configuration file.
[result_format_ca]
col_opt_group = Label
Moreover, the color for each category can be set.
[ca]
# Set {Value}:{the name of color or RGB value} for each category and join them by comma ','.
group_color = A:#66C2A5,B:#FC8D62,C:#8DA0CB,D:#E78AC3
# Only categories registered below will be displayed.
limited_group =
# Categories registered below will not be displayed.
nouse_group =
Then, execute paplot.
paplot ca {unzip_path}/example/ca_group/data.csv ./tmp ca_group \
--config_file {unzip_path}/example/ca_group/paplot.cfg
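The group settings above can be illustrated with a hypothetical Python sketch. It is not paplot's code. It assumes that limited_group and nouse_group take comma-separated category values, like group_color, and that an empty limited_group means no restriction.

```python
def split_list(value):
    """Split a comma-separated setting into stripped, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_group_color(value):
    """'A:#66C2A5,B:#FC8D62' -> {'A': '#66C2A5', 'B': '#FC8D62'}"""
    colors = {}
    for item in split_list(value):
        name, _, color = item.partition(":")
        colors[name.strip()] = color.strip()
    return colors


def visible_groups(groups, limited_group="", nouse_group=""):
    """Apply limited_group (only these, empty = all) and nouse_group (never these)."""
    limited = set(split_list(limited_group))
    nouse = set(split_list(nouse_group))
    return [g for g in groups if (not limited or g in limited) and g not in nouse]


colors = parse_group_color("A:#66C2A5,B:#FC8D62,C:#8DA0CB,D:#E78AC3")
print(colors["C"])  # #8DA0CB
print(visible_groups(["A", "B", "C", "D"]))                       # ['A', 'B', 'C', 'D']
print(visible_groups(["A", "B", "C", "D"], limited_group="A,B"))  # ['A', 'B']
print(visible_groups(["A", "B", "C", "D"], nouse_group="D"))      # ['A', 'B', 'C']
```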
4. Customizing pop-up information¶
We can customize the pop-up information that appears upon mouseover events. In the minimal dataset, the pop-up information is displayed as illustrated below:
Before customization

By customizing the pop-up information, we can view more detailed information on each chromosomal aberration.
After customization

Sample,Chr1,Break1,Dir1,Chr2,Break2,Dir2,MutationType,Gene1,Gene2
SAMPLE1,14,16019088,-,12,62784483,+,deletion,LS7T1EG444,4GRRIO5AVR
SAMPLE1,9,99412502,-,7,129302434,+,translocation,FQFW16UF5U,QP779MLPNV
SAMPLE1,13,84663781,+,18,52991509,-,deletion,Q9VX1I9U3I,7XM09ETN40
SAMPLE1,1,153160367,+,22,33751554,+,inversion,CEE2SPV1R1,PVYYQIVS8G
SAMPLE1,18,12249358,-,3,146222593,+,translocation,HH9OL7CK6G,XD80LI4E6Q
SAMPLE1,21,8658030,+,X,133492043,-,tandem_duplication,I20EVP15ZM,WPE8O5H237
SAMPLE1,12,120178477,+,1,155354923,-,deletion,IMYXD3TCA4,3MNN5J0MDN
SAMPLE2,11,101374238,+,22,26701405,+,translocation,FZ7LOS66RD,9WYBJR57E0
SAMPLE2,2,121708638,-,7,137424167,-,translocation,5655M5E46B,HB14VJXDHV
SAMPLE2,16,43027789,+,22,23791492,-,inversion,REFSIL0H2M,L5EA31R8U0
SAMPLE2,19,3862589,-,16,37135239,+,deletion,1IRWHVZLH8,6FUR9YMZOH
SAMPLE2,20,50294222,+,1,164250235,-,inversion,DOH5G0YRQ9,9TWYMR5CZ2
SAMPLE2,X,67392415,+,15,3327412,+,translocation,EM36MRX9B3,G4FPLN527D
SAMPLE3,22,34268355,+,10,19871820,+,tandem_duplication,9SVRQCFVCO,2BEWSO91FZ
In this example, the following five (optional) columns are incorporated apart from the five required columns:
- Mutation type (MutationType)
- Gene affected by the breakpoint 1 (Gene1)
- Gene affected by the breakpoint 2 (Gene2)
- Direction of the breakpoint 1 (Dir1)
- Direction of the breakpoint 2 (Dir2)
First, add these columns to the [result_format_ca] section in the configuration file.
[result_format_ca]
col_opt_dir1 = Dir1
col_opt_dir2 = Dir2
col_opt_type = MutationType
col_opt_gene_name1 = Gene1
col_opt_gene_name2 = Gene2
The column names of optional items can be set as col_opt_{keyword} = {actual column name}. For a more detailed description of keywords, please refer to About keyword.
Then, modify the [ca] section in the configuration file.
[ca]
# before customization
# tooltip_format = [{chr1}] {break1:,}; [{chr2}] {break2:,}
# after customization
tooltip_format = [{chr1}] {break1:,} ({dir1}) {gene_name1}; [{chr2}] {break2:,} ({dir2}) {gene_name2}; {type}
Then, execute paplot.
paplot ca {unzip_path}/example/ca_option/data.csv ./tmp ca_option \
--config_file {unzip_path}/example/ca_option/paplot.cfg
For a more detailed description of how to set pop-up information (tooltip_format), please refer to User defined format.
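The {name} and {name:,} placeholders resemble Python's str.format syntax, where , inserts thousands separators. Assuming that correspondence, the customized tooltip for the first row of the example data can be previewed in Python. This only approximates the displayed text and is not how paplot renders pop-ups. The breakpoint values must be numbers for the , format to apply.

```python
# Values from the first data row (SAMPLE1), with breakpoints converted to int.
row = {
    "chr1": "14", "break1": 16019088, "dir1": "-", "gene_name1": "LS7T1EG444",
    "chr2": "12", "break2": 62784483, "dir2": "+", "gene_name2": "4GRRIO5AVR",
    "type": "deletion",
}
tooltip_format = ("[{chr1}] {break1:,} ({dir1}) {gene_name1}; "
                  "[{chr2}] {break2:,} ({dir2}) {gene_name2}; {type}")
text = tooltip_format.format(**row)
print(text)
# [14] 16,019,088 (-) LS7T1EG444; [12] 62,784,483 (+) 4GRRIO5AVR; deletion
```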
Mutation Matrix Report¶
Here, we describe the procedure to generate a Mutation Matrix Report using sample data [*].
[*] | The sample data is included in the example directory under the paplot directory. |
1. Minimal dataset¶
To generate a Mutation Matrix Report with paplot, at least the sample ID (Sample), gene name (Gene), and mutation type (MutationType) are required.
Sample,MutationType,Gene
SAMPLE00,intronic,GATA3
SAMPLE00,UTR3,CDH1
SAMPLE00,exonic,GATA3
SAMPLE01,splicing,WASF3
SAMPLE01,intronic,WASF3
SAMPLE01,exonic,NRAS
SAMPLE02,intronic,FBXW7
SAMPLE02,intronic,GATA3
SAMPLE02,ncRNA_intronic,ACVR2B
SAMPLE03,exonic,CAP2
SAMPLE03,intronic,PIK3CA
SAMPLE03,downstream,SEPT12
Although the column names here are Sample, MutationType, and Gene, any names can be used.
Set the column names in the [result_format_mutation] section of the configuration file.
[result_format_mutation]
col_group = MutationType
col_gene = Gene
col_opt_id = Sample
Then, execute paplot.
paplot mutation {unzip_path}/example/mutation_minimal/data.csv ./tmp mutation_minimal \
--config_file {unzip_path}/example/mutation_minimal/paplot.cfg
2. Without header¶
SAMPLE00,intronic,GATA3
SAMPLE00,UTR3,CDH1
SAMPLE00,exonic,GATA3
SAMPLE01,splicing,WASF3
SAMPLE01,intronic,WASF3
SAMPLE01,exonic,NRAS
SAMPLE02,intronic,FBXW7
SAMPLE02,intronic,GATA3
SAMPLE02,ncRNA_intronic,ACVR2B
SAMPLE03,exonic,CAP2
SAMPLE03,intronic,PIK3CA
SAMPLE03,downstream,SEPT12
When the input data has no header (column names), set the column number of each key in the [result_format_mutation] section of the configuration file.
[result_format_mutation]
# Set the value of the header option to False
header = False
col_group = 2
col_gene = 3
col_opt_id = 1
Then execute paplot.
paplot mutation {unzip_path}/example/mutation_noheader/data.csv ./tmp mutation_noheader \
--config_file {unzip_path}/example/mutation_noheader/paplot.cfg
3. Customizing pop-up information¶
We can customize the pop-up information that appears upon mouseover events. In the minimal dataset, the pop-up information displays sample, gene, and mutation type as illustrated below:
Before customization

By customizing the configuration file, the information of positions and substitution types can be incorporated.
After customization

Sample,Chr,Start,Ref,Alt,MutationType,Gene
SAMPLE00,chr10,8114472,A,C,intronic,GATA3
SAMPLE00,chr13,28644892,G,-,intronic,FLT3
SAMPLE00,chr13,28664636,-,G,intronic,FLT3
SAMPLE00,chr16,68795521,-,T,UTR3,CDH1
SAMPLE00,chr10,8117068,G,T,exonic,GATA3
SAMPLE00,chr3,178906688,G,A,intronic,PIK3CA
SAMPLE00,chr13,28603715,G,-,intergenic,FLT3
SAMPLE00,chr14,103368263,G,C,intronic,TRAF3
In the example data above, the following four (optional) items are included apart from the sample ID, gene name, and mutation type (required items).
- Chromosome (Chr)
- Variant start position (Start)
- Reference base (Ref)
- Alternative base (Alt)
First, add these columns to the [result_format_mutation] section in the configuration file.
[result_format_mutation]
col_opt_chr = Chr
col_opt_start = Start
col_opt_ref = Ref
col_opt_alt = Alt
The column names of optional items can be set as col_opt_{keyword} = {actual column name}. For a more detailed description of keywords, please refer to About keyword.
Then, modify the [mutation] section in the configuration file.
[mutation]
# before customization
# tooltip_format_checker_partial = Mutation Type[{group}]
# after customization
tooltip_format_checker_partial = Mutation Type[{group}] {chr}:{start:,} [{ref} -> {alt}]
Then, execute paplot.
paplot mutation {unzip_path}/example/mutation_option/data.csv ./tmp mutation_option \
--config_file {unzip_path}/example/mutation_option/paplot.cfg
The steps above customize the pop-up for each element in the main grid. For other pop-ups, refer to the following figure. Six types are set, one for each display location, but all of them are written in the same way.
(Figure: correspondence between configuration items and display locations)
The following can also be used as special keywords:
{#number_id}: | the number of mutations per sample |
---|---|
{#number_gene}: | the number of mutations per gene |
{#number_mutaion}: | the number of mutations (if the same sample is detected multiple times with the same gene, it counts as 1) |
{#sum_mutaion}: | the total number of mutations |
{#item_value}: | the value of one item of the stacked graph |
{#sum_item_value}: | the total value of the stacked graph |
Moreover, for a more detailed description of the procedure to set pop-up information, please refer to User defined format.
Mutational Signature Report¶
Here, we describe the procedure to generate a Mutational Signature Report using sample data [*].
[*] The sample data is provided in the example directory of the paplot directory.
1. Input data format¶
To generate a Mutational Signature Report using paplot, input data in JSON format is required.
{
"signature":[
[ # signature 1
[0.0018,0.0003,0.0002,0.0005,0.0014,0.0008,0.0002,0.0007,0.0012,0.0003,0.0002,0.0004,0.0271,0.0107,0.0016,0.0145], # C -> A
[0.0023,0.0007,0.0001,0.002,0.0027,0.0005,0.0004,0.0032,0.0007,0.0004,0.0001,0.0013,0.1546,0.0306,0.0055,0.1931], # C -> G
[0.0043,0.0016,0.0027,0.0019,0.0096,0.0026,0.0046,0.0053,0.0045,0.0021,0.0034,0.0028,0.2612,0.0517,0.0284,0.1335], # C -> T
[0.0012,0.0007,0.0004,0.0003,0.0003,0.0003,0,0,0.0003,0.0001,0.0003,0,0.0005,0.0001,0.0001,0.0002], # T -> A
[0.0008,0.0003,0.0008,0.0007,0.0002,0.0004,0.0009,0.0005,0.0004,0.0003,0.0006,0.0003,0.0003,0.0004,0.0002,0.0004], # T -> C
[0.0001,0.0001,0.0001,0.0001,0,0.0001,0.0001,0,0.0001,0.0001,0.0009,0.0002,0.0001,0,0.0001,0.0005] # T -> G
],
[ # signature 2
[0.0266,0.0222,0.0026,0.02,0.0205,0.0145,0.0012,0.0155,0.0155,0.0094,0.0009,0.011,0.0224,0.0177,0.0019,0.0307],
[0.0127,0.0079,0.0035,0.0145,0.0058,0.0048,0.0015,0.0115,0.0034,0.0032,0,0.0071,0.0047,0.0145,0.0006,0.0246],
[0.0232,0.0099,0.042,0.0184,0.014,0.0108,0.0219,0.02,0.0137,0.0102,0.0264,0.0128,0.0048,0.0186,0.0153,0.0165],
[0.0096,0.0084,0.0094,0.0175,0.0075,0.0076,0.0046,0.0123,0.0044,0.0035,0.0028,0.008,0.0176,0.0047,0.0031,0.0139],
[0.0245,0.0087,0.0144,0.0235,0.0098,0.0096,0.0051,0.0102,0.0105,0.0053,0.0042,0.0108,0.0114,0.0081,0.0038,0.0098],
[0.0046,0.0006,0.0036,0.0035,0.0025,0.0009,0.0028,0.0082,0.0023,0.0005,0.004,0.0048,0.0041,0.0012,0.0056,0.0104]
]
],
"id":["PD3851a","PD3890a","PD3904a"],
"mutation":[[0,0,0.0594],[0,1,0.7677],[0,2,0.1727],[1,0,0.1474],[1,1,0.4064],[1,2,0.4461]],
"mutation_count":[4001,7174,5804]
}
Elements of the input data for Mutational Signature Report
signature: | Probability masses for each mutation pattern. Input the probability value for each mutation signature, substitution pattern (e.g., C > A), and context (e.g., TpCpA > TpApA). The number of bases must be three or five. The number of contexts must be identical for every substitution pattern (16 when the number of bases is three, and 256 when it is five). |
---|---|
As the number of bases is three in the above example data, the probability values for the 16 contexts should be listed in the following order:
ANA,ANC,ANG,ANT,CNA,CNC,CNG,CNT,GNA,GNC,GNG,GNT,TNA,TNC,TNG,TNT
When the number of bases is five, the 256 context values should be listed in the following order:
AANAA,AANAC,AANAG,AANAT,AANCA,AANCC,AANCG,AANCT,AANGA,AANGC,AANGG,AANGT,AANTA,AANTC,AANTG,AANTT,
ACNAA,ACNAC,ACNAG,ACNAT,ACNCA,ACNCC,ACNCG,ACNCT,ACNGA,ACNGC,ACNGG,ACNGT,ACNTA,ACNTC,ACNTG,ACNTT,
AGNAA,AGNAC,AGNAG,AGNAT,AGNCA,AGNCC,AGNCG,AGNCT,AGNGA,AGNGC,AGNGG,AGNGT,AGNTA,AGNTC,AGNTG,AGNTT,
ATNAA,ATNAC,ATNAG,ATNAT,ATNCA,ATNCC,ATNCG,ATNCT,ATNGA,ATNGC,ATNGG,ATNGT,ATNTA,ATNTC,ATNTG,ATNTT,
CANAA,CANAC,CANAG,CANAT,CANCA,CANCC,CANCG,CANCT,CANGA,CANGC,CANGG,CANGT,CANTA,CANTC,CANTG,CANTT,
CCNAA,CCNAC,CCNAG,CCNAT,CCNCA,CCNCC,CCNCG,CCNCT,CCNGA,CCNGC,CCNGG,CCNGT,CCNTA,CCNTC,CCNTG,CCNTT,
CGNAA,CGNAC,CGNAG,CGNAT,CGNCA,CGNCC,CGNCG,CGNCT,CGNGA,CGNGC,CGNGG,CGNGT,CGNTA,CGNTC,CGNTG,CGNTT,
CTNAA,CTNAC,CTNAG,CTNAT,CTNCA,CTNCC,CTNCG,CTNCT,CTNGA,CTNGC,CTNGG,CTNGT,CTNTA,CTNTC,CTNTG,CTNTT,
GANAA,GANAC,GANAG,GANAT,GANCA,GANCC,GANCG,GANCT,GANGA,GANGC,GANGG,GANGT,GANTA,GANTC,GANTG,GANTT,
GCNAA,GCNAC,GCNAG,GCNAT,GCNCA,GCNCC,GCNCG,GCNCT,GCNGA,GCNGC,GCNGG,GCNGT,GCNTA,GCNTC,GCNTG,GCNTT,
GGNAA,GGNAC,GGNAG,GGNAT,GGNCA,GGNCC,GGNCG,GGNCT,GGNGA,GGNGC,GGNGG,GGNGT,GGNTA,GGNTC,GGNTG,GGNTT,
GTNAA,GTNAC,GTNAG,GTNAT,GTNCA,GTNCC,GTNCG,GTNCT,GTNGA,GTNGC,GTNGG,GTNGT,GTNTA,GTNTC,GTNTG,GTNTT,
TANAA,TANAC,TANAG,TANAT,TANCA,TANCC,TANCG,TANCT,TANGA,TANGC,TANGG,TANGT,TANTA,TANTC,TANTG,TANTT,
TCNAA,TCNAC,TCNAG,TCNAT,TCNCA,TCNCC,TCNCG,TCNCT,TCNGA,TCNGC,TCNGG,TCNGT,TCNTA,TCNTC,TCNTG,TCNTT,
TGNAA,TGNAC,TGNAG,TGNAT,TGNCA,TGNCC,TGNCG,TGNCT,TGNGA,TGNGC,TGNGG,TGNGT,TGNTA,TGNTC,TGNTG,TGNTT,
TTNAA,TTNAC,TTNAG,TTNAT,TTNCA,TTNCC,TTNCG,TTNCT,TTNGA,TTNGC,TTNGG,TTNGT,TTNTA,TTNTC,TTNTG,TTNTT
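Both orderings follow one rule: the central (substituted) base is written as N, and the flanking bases cycle through A, C, G, T with the rightmost position changing fastest. The following minimal Python sketch generates the labels so you can check the order of your own data; `context_order` is a hypothetical helper, not part of paplot.

```python
from itertools import product

BASES = "ACGT"


def context_order(n_bases):
    """Return context labels such as 'ANC' in the order listed above.

    The central (substituted) base is written as 'N'; the flanking bases
    cycle through A, C, G, T with the rightmost position changing fastest.
    """
    if n_bases not in (3, 5):
        raise ValueError("the number of bases must be three or five")
    flank = (n_bases - 1) // 2
    labels = []
    for combo in product(BASES, repeat=n_bases - 1):
        left = "".join(combo[:flank])
        right = "".join(combo[flank:])
        labels.append(left + "N" + right)
    return labels


print(",".join(context_order(3)))
print(len(context_order(5)))
```

Row k of each substitution pattern in your signature array should then hold the value for `context_order(n)[k]`.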
Elements for signature contribution graph
This graph is optional.
The signature contribution graph presents the number of mutations associated with each mutation signature. It is generated when id, mutation, and mutation_count are set in the input JSON file.
id: | List of samples. Each sample is assigned an index in list order (in this example, PD3851a = 0, PD3890a = 1, PD3904a = 2). |
---|---|
mutation_count: | The number of mutations in each sample (in this example, 4001 for PD3851a, 7174 for PD3890a, etc.). |
mutation: | Contribution ratio of each mutation signature to each sample, given as [sample index, signature index, value]. Signature indices are assigned in the order in which the signatures are listed under the signature key (in the above example, signature1 = 0, signature2 = 1, signature3 = 2). |
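If your signature analysis produces a per-sample matrix of contribution ratios, the mutation triples can be built from it directly. This sketch assumes such a matrix (rows are samples, columns are signatures) and uses a hypothetical helper, `contribution_triplets`; with the ratios from the example it reproduces the mutation entries shown above.

```python
import json


def contribution_triplets(ratios):
    """Flatten ratios[sample][signature] into [sample index, signature index, value].

    Sample indices follow the order of the id list; signature indices follow
    the order in which the signatures are listed in the input data.
    """
    return [
        [sample, signature, value]
        for sample, row in enumerate(ratios)
        for signature, value in enumerate(row)
    ]


# Contribution ratios taken from the "mutation" entries of the example above.
ratios = [
    [0.0594, 0.7677, 0.1727],
    [0.1474, 0.4064, 0.4461],
]
contribution = {
    "id": ["PD3851a", "PD3890a", "PD3904a"],
    "mutation": contribution_triplets(ratios),
    "mutation_count": [4001, 7174, 5804],
}
print(json.dumps(contribution))
```

The resulting dictionary can be merged with the signature array before writing the input file with `json.dump`.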
Note
The keys used in the input JSON file can be changed in the [result_format_signature] section of the configuration file.
[result_format_signature]
# the keys in input json file
key_signature = signature
key_id = id
key_mutation = mutation
key_mutation_count = mutation_count
Note
One way to validate the JSON file format
paplot reads input files with the json Python package. If the input file can be loaded with the load() function of the json package, the file is valid JSON.
For example, when the file name is “data2.json”:
$ python
>>> import json
>>> json.load(open("data2.json"))
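Note that JSON does not allow comments, so the `#` annotations in the examples above are for explanation only; json.load() rejects a file that contains them. Loading succeeds as soon as the syntax is valid, even if the array shapes are wrong. The sketch below adds shape checks based on the description above (six substitution rows per signature, each holding 16 or 256 values). These checks and the helper names are assumptions for illustration, not paplot's own validation.

```python
import json

# Six substitution patterns per signature, as in the example above:
# C>A, C>G, C>T, T>A, T>C, T>G.
N_SUBSTITUTIONS = 6
VALID_CONTEXT_COUNTS = {16, 256}  # three or five bases


def check_signature_data(data, key_signature="signature"):
    """Check the shape of the signature array and return the number of signatures."""
    signatures = data[key_signature]
    for i, sig in enumerate(signatures):
        if len(sig) != N_SUBSTITUTIONS:
            raise ValueError(f"signature {i}: expected {N_SUBSTITUTIONS} rows, got {len(sig)}")
        sizes = {len(row) for row in sig}
        if len(sizes) != 1 or not sizes <= VALID_CONTEXT_COUNTS:
            raise ValueError(f"signature {i}: every row must hold 16 or 256 values")
    return len(signatures)


def check_signature_file(path, key_signature="signature"):
    """Load a JSON file (raises json.JSONDecodeError if invalid) and check its shape."""
    with open(path) as f:
        data = json.load(f)
    return check_signature_data(data, key_signature)
```

For example, `check_signature_file("data2.json")` would return 2 for a valid file with two signatures. Pass `key_signature` if you changed `key_signature` in the configuration file.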
2. Minimal dataset¶
For the format of input data, please refer to 1. Input data format.
Input data file (the number of mutation signatures is two)
{
"signature":[
# signature 1
[
[0.0021,0.0006,0.0002,0.0007,0.0017,0.001,0.0003,0.0009,0.0014,0.0006,0.0003,0.0006,0.027,0.0108,0.0016,0.0147],
[0.0025,0.0009,0.0002,0.0022,0.0029,0.0007,0.0005,0.0034,0.0009,0.0006,0.0002,0.0014,0.1504,0.0301,0.0053,0.1884],
[0.0046,0.0018,0.0031,0.0021,0.0097,0.0029,0.0049,0.0055,0.0047,0.0024,0.0037,0.003,0.2557,0.0513,0.0286,0.1312],
[0.0014,0.0009,0.0007,0.0006,0.0004,0.0005,0.0003,0.0003,0.0004,0.0003,0.0005,0.0002,0.0008,0.0003,0.0003,0.0005],
[0.001,0.0004,0.0011,0.001,0.0003,0.0007,0.0012,0.0008,0.0006,0.0004,0.0007,0.0005,0.0005,0.0007,0.0004,0.0007],
[0.0003,0.0003,0.0003,0.0003,0.0001,0.0003,0.0003,0.0003,0.0002,0.0002,0.0011,0.0004,0.0003,0.0002,0.0003,0.0009]
],
# signature 2
[
[0.022,0.0183,0.0028,0.0171,0.0192,0.0148,0.0026,0.0157,0.0143,0.0108,0.0018,0.0116,0.0181,0.016,0.0021,0.0246],
[0.0133,0.0088,0.0037,0.0136,0.0095,0.008,0.003,0.0131,0.0065,0.0063,0.0016,0.0095,0.0044,0.0135,0.0016,0.0171],
[0.0195,0.0098,0.0283,0.0159,0.0138,0.0112,0.0156,0.0183,0.0128,0.0108,0.0186,0.0127,0,0.0146,0.0095,0.0115],
[0.0095,0.0085,0.0102,0.0155,0.0077,0.0102,0.0096,0.0135,0.0054,0.0052,0.0058,0.0089,0.0145,0.0076,0.0058,0.016],
[0.0192,0.0089,0.0135,0.0198,0.0089,0.0113,0.0092,0.0117,0.0092,0.0063,0.0064,0.01,0.0107,0.0096,0.0061,0.0123],
[0.0059,0.0028,0.0068,0.0063,0.0039,0.0044,0.0076,0.0101,0.004,0.0028,0.007,0.0064,0.006,0.0046,0.008,0.0132]
]
]
}
Configuration file
[signature]
tooltip_format_signature_title = {sig}
tooltip_format_signature_partial = {route}: {#sum_item_value:6.2}
signature_y_max = -1
alt_color_CtoA = #1BBDEB
alt_color_CtoG = #211D1E
alt_color_CtoT = #E62623
alt_color_TtoA = #CFCFCF
alt_color_TtoC = #ACD577
alt_color_TtoG = #EDC7C4
[result_format_signature]
format = json
background = False
key_signature = signature
Execute paplot.
paplot signature signature_minimal/data.json ./tmp signature_minimal \
--config_file ./signature_minimal/paplot.cfg
Then, the report is generated in the tmp directory.
The file name (graph_signature2.html) is determined by the number of mutation signatures, which paplot infers automatically from the input data.
./tmp
┗ signature_minimal
┗ graph_signature2.html
3. Mutation signature with multiple numbers of signatures¶
For the format of input data, please refer to 1. Input data format.
To generate Mutational Signature Reports for several numbers of signatures, one input data file per signature number and a configuration file are required.
In this example dataset, the following files are prepared:
example/signature_multi_class/
# Input data files
┣ data2.json # signature num = 2
┣ data3.json # signature num = 3
┣ data4.json # signature num = 4
┣ data5.json # signature num = 5
┣ data6.json # signature num = 6
# Configuration file
┗ paplot.cfg
Execute paplot for each mutation signature number.
paplot signature signature_multi_class/data2.json ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg
paplot signature signature_multi_class/data3.json ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg
paplot signature signature_multi_class/data4.json ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg
paplot signature signature_multi_class/data5.json ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg
paplot signature signature_multi_class/data6.json ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg
Or, execute the following batch command:
paplot signature "signature_multi_class/data*.json" ./tmp signature_multi_class \
--config_file ./signature_multi_class/paplot.cfg
Then, the report is generated in the tmp directory.
The file names (graph_signature2.html to graph_signature6.html) are determined by the number of mutation signatures, which paplot infers automatically from each input data file.
./tmp
┗ signature_multi_class
┣ graph_signature2.html
┣ graph_signature3.html
┣ graph_signature4.html
┣ graph_signature5.html
┗ graph_signature6.html
4. Signature contribution graph¶
Here, we add a signature contribution graph.
For the format of input data, please refer to 1. Input data format.
For generating reports with various numbers of signatures, please refer to 3. Mutation signature with multiple numbers of signatures.
Execute paplot.
paplot signature "signature_stack/data*.json" ./tmp signature_stack \
--config_file ./signature_stack/paplot.cfg
pmsignature Report¶
Here, we describe the procedure to generate a pmsignature Report using sample data [*].
[*] The sample data is provided in the example directory of the paplot directory.
1. Input data format¶
To generate a pmsignature Report using paplot, input data in JSON format is required.
{
"ref":[
[ # pmsignature 1
[0.338,0.15,0.183,0.327], # ref1 (A,C,G,T)
[0.362,0.191,0.177,0.267], # ref2 (A,C,G,T)
[0,0.731,0,0.268], # ref3 (A,C,G,T)
[0.31,0.165,0.251,0.272], # ref4 (A,C,G,T)
[0.295,0.193,0.168,0.341] # ref5 (A,C,G,T)
],
[ # pmsignature 2
[0.179,0.414,0.084,0.321],
[0.007,0.025,0.004,0.962],
[0,0.999,0,0],
[0.472,0.104,0.041,0.381],
[0.277,0.175,0.284,0.262]
]
],
"alt":[
[ # pmsignature 1
[0,0,0,0], # altA (A,C,G,T)
[0.194,0,0.091,0.445], # altC (A,C,G,T)
[0,0,0,0], # altG (A,C,G,T)
[0.093,0.163,0.011,0] # altT (A,C,G,T)
],
[ # pmsignature 2
[0,0,0,0],
[0.059,0,0.437,0.502],
[0,0,0,0],
[0,0,0,0]
]
],
"strand":[
[0.461,0.538], # pmsignature 1
[0.512,0.487] # pmsignature 2
],
"id":["PD3851a","PD3890a","PD3904a"],
"mutation":[[0,0,0.535],[0,1,0.038],[0,2,0.426],[1,0,0.186],[1,1,0.156],[1,2,0.656]],
"mutation_count":[702,2312,2096]
}
Elements of the input data for pmsignature Report
ref: | Values of the reference bases (in the order A, C, G, T) at each position for each mutation signature. The values need not sum to one; they are normalized within the program. In this example, the number of bases is five, but any odd number (e.g., 3, 7) can be used. |
---|---|
alt: | Values of the alternative base for each central reference base, for each mutation signature. Four values (in the order A, C, G, T) are given for each reference base A, C, G, and T, so 16 values in total are required for each mutation signature. Usually, the central base is fixed to C or T; therefore, the rows whose reference base is A or G contribute negligibly to the visualization and can be set to zero. |
strand: | Values for the strand (in the order plus, minus) for each mutation signature. When strand bias is not taken into account, set [0, 0]. |
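Because ref values are normalized within the program, you do not need to rescale them yourself. The sketch below only illustrates what the description implies: the four values at each position scaled to sum to one (paplot's exact normalization is not documented here, so this is an assumption), and how the six C/T substitutions are read from the 4×4 alt matrix. The substitution names match the {ca}, {cg}, ... tooltip keys used in the pmsignature configuration file later in this section; the helper names are hypothetical.

```python
BASES = "ACGT"
# Substitutions shown in the pop-ups, named like the tooltip keys {ca}, {cg}, ...
SUBSTITUTIONS = ["CA", "CG", "CT", "TA", "TC", "TG"]


def normalize_rows(rows):
    """Scale each row to sum to one; all-zero rows are returned unchanged."""
    normalized = []
    for row in rows:
        total = sum(row)
        normalized.append([v / total for v in row] if total else list(row))
    return normalized


def substitution_values(alt):
    """Read the six C/T substitutions from a 4x4 alt matrix.

    alt[i][j] is the value for reference base BASES[i] -> alternative base BASES[j].
    """
    index = {base: i for i, base in enumerate(BASES)}
    return {s.lower(): alt[index[s[0]]][index[s[1]]] for s in SUBSTITUTIONS}


# pmsignature 1 from the example above
ref = [[0.338, 0.15, 0.183, 0.327], [0.362, 0.191, 0.177, 0.267],
       [0, 0.731, 0, 0.268], [0.31, 0.165, 0.251, 0.272], [0.295, 0.193, 0.168, 0.341]]
alt = [[0, 0, 0, 0], [0.194, 0, 0.091, 0.445], [0, 0, 0, 0], [0.093, 0.163, 0.011, 0]]

print([round(sum(row), 6) for row in normalize_rows(ref)])
print(substitution_values(alt))
```

The zero rows for reference bases A and G are never read, which is why they can be left as zeros.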
Elements for signature contribution graph
This graph is optional.
The signature contribution graph presents the number of mutations associated with each mutation signature. It is generated when id, mutation, and mutation_count are set in the input JSON file.
id: | List of samples. Each sample is assigned an index in list order (in this example, PD3851a = 0, PD3890a = 1, PD3904a = 2). |
---|---|
mutation_count: | The number of mutations in each sample (in this example, 702 for PD3851a, 2312 for PD3890a, etc.). |
mutation: | Contribution ratio of each mutation signature to each sample, given as [sample index, signature index, value]. Signature indices are assigned in the order in which the signatures are listed in the input data (in the above example, signature1 = 0, signature2 = 1, signature3 = 2). |
Note
The keys used in the input JSON file can be changed in the [result_format_pmsignature] section of the configuration file.
[result_format_pmsignature]
format = json
background = True
key_ref = ref
key_alt = alt
key_strand = strand
key_id = id
key_mutation = mutation
key_mutation_count = mutation_count
Note
One way to validate the JSON file format
paplot reads input files with the json Python package. If the input file can be loaded with the load() function of the json package, the file is valid JSON.
For example, when the file name is “data2.json”:
$ python
>>> import json
>>> json.load(open("data2.json"))
2. Minimal dataset¶
For the format of input data, please refer to 1. Input data format.
{
"ref":[[[0.189,0.395,0.088,0.326],[0.019,0.029,0.01,0.94],[0,0.999,0,0],[0.467,0.103,0.054,0.374],[0.278,0.175,0.276,0.268]]],
"alt":[[[0,0,0,0],[0.063,0,0.415,0.521],[0,0,0,0],[0,0,0,0]]],
"strand":[[0.514,0.485]]
}
Configuration file
[pmsignature]
tooltip_format_ref1 = A: {a:.2}
tooltip_format_ref2 = C: {c:.2}
tooltip_format_ref3 = G: {g:.2}
tooltip_format_ref4 = T: {t:.2}
tooltip_format_alt1 = C -> A: {ca:.2}
tooltip_format_alt2 = C -> G: {cg:.2}
tooltip_format_alt3 = C -> T: {ct:.2}
tooltip_format_alt4 = T -> A: {ta:.2}
tooltip_format_alt5 = T -> C: {tc:.2}
tooltip_format_alt6 = T -> G: {tg:.2}
tooltip_format_strand = + {plus:.2} - {minus:.2}
color_A = #06B838
color_C = #609CFF
color_G = #B69D02
color_T = #F6766D
color_plus = #00BEC3
color_minus = #F263E2
[result_format_pmsignature]
format = json
background = True
key_ref = ref
key_alt = alt
key_strand = strand
Execute paplot.
paplot pmsignature pmsignature_minimal/data.json ./tmp pmsignature_minimal \
--config_file ./pmsignature_minimal/paplot.cfg
Then, the report is generated in the tmp directory.
The file name (graph_pmsignature2.html) is determined by the number of mutation signatures, which paplot infers automatically from the input data.
./tmp
┗ pmsignature_minimal
┗ graph_pmsignature2.html
Note
Since one signature is assigned to the background in this example, the last signature in the contribution graph is the background signature.
3. Mutation signature with multiple numbers of signatures¶
For the format of input data, please refer to 1. Input data format.
To generate pmsignature Reports for several numbers of signatures, one input data file per signature number and a configuration file are required.
In this example dataset, the following files are prepared:
example/pmsignature_multi_class/
# Input data files
┣ data2.json # pmsignature num = 2
┣ data3.json # pmsignature num = 3
┣ data4.json # pmsignature num = 4
┣ data5.json # pmsignature num = 5
┣ data6.json # pmsignature num = 6
# Configuration file
┗ paplot.cfg
Execute paplot for each mutation signature number.
paplot pmsignature pmsignature_multi_class/data2.json ./tmp pmsignature_multi_class \
--config_file ./pmsignature_multi_class/paplot.cfg
paplot pmsignature pmsignature_multi_class/data3.json ./tmp pmsignature_multi_class \
--config_file ./pmsignature_multi_class/paplot.cfg
paplot pmsignature pmsignature_multi_class/data4.json ./tmp pmsignature_multi_class \
--config_file ./pmsignature_multi_class/paplot.cfg
paplot pmsignature pmsignature_multi_class/data5.json ./tmp pmsignature_multi_class \
--config_file ./pmsignature_multi_class/paplot.cfg
paplot pmsignature pmsignature_multi_class/data6.json ./tmp pmsignature_multi_class \
--config_file ./pmsignature_multi_class/paplot.cfg
Or, execute the following batch command:
paplot pmsignature "pmsignature_multi_class/data*.json" ./tmp pmsignature_multi_class \
--config_file ./pmsignature_multi_class/paplot.cfg
Then, the report is generated in the tmp directory.
The file names (graph_pmsignature2.html to graph_pmsignature6.html) are determined by the number of mutation signatures, which paplot infers automatically from each input data file.
./tmp
┗ pmsignature_multi_class
┣ graph_pmsignature2.html
┣ graph_pmsignature3.html
┣ graph_pmsignature4.html
┣ graph_pmsignature5.html
┗ graph_pmsignature6.html
Note
Since one signature is assigned to the background in this example, the last signature in the contribution graph is the background signature.
4. Signature contribution graph¶
Here, we add a signature contribution graph.
For the format of input data, please refer to 1. Input data format.
For generating reports with various numbers of signatures, please refer to 3. Mutation signature with multiple numbers of signatures.
Execute paplot.
paplot pmsignature "pmsignature_stack/data*.json" ./tmp pmsignature_stack \
--config_file ./pmsignature_stack/paplot.cfg
Note
Since one signature is assigned to the background in this example, the last signature in the contribution graph is the background signature.
5. Without background¶
Here, we generate a pmsignature Report without background.
Set the background option to False in the configuration file.
[result_format_pmsignature]
background = False
Then, execute paplot.
paplot pmsignature pmsignature_nobackground/data.json ./tmp pmsignature_nobackground \
--config_file ./pmsignature_nobackground/paplot.cfg
Common Issues¶
Here, we describe solutions to several common issues encountered when generating reports, using sample data [*].
[*] The sample data is provided in the example directory of the paplot directory.
1. Delimiter for the input data¶
When the input data is delimited by tab or space characters, modify the configuration file as follows:
# For the case of Mutation Matrix Report
[result_format_mutation]
# for tab-delimited input
sept = \t
# for space-delimited input, use this line instead of the one above
# sept = " "
For QC and Chromosomal Aberration Reports, change the [result_format_qc] and [result_format_ca] sections, respectively.
2. Comment line¶
# This is comment.
# Please skip this line.
ID,Type,Gene
SAMPLE00,intronic,GATA3
SAMPLE00,UTR3,CDH1
In the default setting, the character “#” marks the start of a comment line, and paplot ignores such lines. To change the comment character, modify the configuration file as follows:
# For the case of Mutation Matrix Report
[result_format_mutation]
comment = #
For QC and Chromosomal Aberration Reports, change the [result_format_qc] and [result_format_ca] sections, respectively.
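To preview how an input file will be split before running paplot, the two settings above (sept and comment) can be imitated with Python's standard csv module. This is a sketch, not paplot's own parser; it assumes a comment line is any line that begins with the comment character, and `read_table` is a hypothetical helper.

```python
import csv


def read_table(text, sept=",", comment="#"):
    """Parse delimited text into a list of dicts keyed by the header row.

    Lines that start with the comment character are dropped before parsing.
    """
    lines = [line for line in text.splitlines()
             if not (comment and line.startswith(comment))]
    return list(csv.DictReader(lines, delimiter=sept))


data = """# This is comment.
# Please skip this line.
ID,Type,Gene
SAMPLE00,intronic,GATA3
SAMPLE00,UTR3,CDH1"""

for row in read_table(data):
    print(row["ID"], row["Type"], row["Gene"])
```

For a tab-delimited file, pass `sept="\t"` (a real tab character in Python, whereas the configuration file uses the two characters `\t`).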
3. Processing multiple input data¶
Generally, cancer genome studies use multiple sequence data, and the reports generated by paplot consist of information from multiple samples. There are two approaches to preparing input data with multiple samples for paplot:
case1: Single merged input data
In this case, the input data must contain a column for the sample name, and that column must be set with the col_opt_id key in the configuration file.
case2: Multiple input data divided among individual samples
In this case, paplot determines sample names from the file names (by removing the characters set with the suffix key in the configuration file). Alternatively, each input file can contain a column for the sample name (set with the col_opt_id key in the configuration file).
In the previous examples, we generally used merged input data (case 1 above). Here, we describe the procedure for generating a report using multiple input data (case 2).
example/mutation_split_file/
# Input data files
┣ SAMPLE00.data.csv # input data for SAMPLE00
┣ SAMPLE01.data.csv # input data for SAMPLE01
┣ SAMPLE02.data.csv # input data for SAMPLE02
┣ SAMPLE03.data.csv # input data for SAMPLE03
┣ SAMPLE04.data.csv # input data for SAMPLE04
# Configuration file
┗ paplot.cfg
MutationType,Gene
intronic,GATA3
intronic,FLT3
intronic,FLT3
UTR3,CDH1
exonic,GATA3
Set the suffix key in the configuration file.
[result_format_mutation]
suffix = .data.csv
# Do not use the col_opt_id
col_opt_id =
When the suffix key is set, the part of the file name before the suffix characters becomes the sample name (e.g., SAMPLE00.data.csv becomes SAMPLE00).
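The naming rule amounts to stripping the suffix from the base name of each file. This minimal Python sketch, using a hypothetical helper `sample_name` that is not part of paplot, previews which sample names will be derived:

```python
import os


def sample_name(path, suffix=".data.csv"):
    """Return the part of the file name before the suffix, e.g. SAMPLE00."""
    name = os.path.basename(path)
    if not name.endswith(suffix):
        raise ValueError(f"{name!r} does not end with {suffix!r}")
    # Slice by length so that an empty suffix keeps the whole name.
    return name[: len(name) - len(suffix)]


paths = [
    "example/mutation_split_file/SAMPLE00.data.csv",
    "example/mutation_split_file/SAMPLE01.data.csv",
]
print([sample_name(p) for p in paths])
```

A file whose name does not end with the suffix raises an error here, which makes a mistyped suffix setting easy to spot.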
Then, execute paplot.
# For the case of Mutation Matrix Report
# When setting each input file, join them by ','.
paplot mutation {unzip_path}/example/mutation_split_file/SAMPLE00.data.csv,{unzip_path}/example/mutation_split_file/SAMPLE01.data.csv ./tmp mutation_split_file \
--config_file {unzip_path}/example/mutation_split_file/paplot.cfg
# Wildcard characters can also be used (enclose the path in double quotes).
paplot mutation "{unzip_path}/example/mutation_split_file/*.csv" ./tmp mutation_split_file \
--config_file {unzip_path}/example/mutation_split_file/paplot.cfg
For QC and Chromosomal Aberration Reports, change the [result_format_qc] and [result_format_ca] sections, respectively.
4. Keyword¶
4-1. About keyword¶
A keyword can be assigned to each column name in the configuration file. Keywords are used for customizing pop-up information, etc.
Configuration file
[result_format_mutation]
# Required items
# col_{key} = {actual column name}
#
col_gene = Gene
col_group = MutationType
# Optional items
# col_opt_{key} = {actual column name}
#
col_opt_id = Sample
col_opt_start = Start
col_opt_end = End
In col_{keyword} = {actual column name} or col_opt_{keyword} = {actual column name} entries, {keyword} becomes the keyword.
Please note the following points:
- The keywords are case-insensitive. For example, CHR, Chr, and chr are considered identical.
- The {keyword} part can be set arbitrarily. However, entries for optional items must always start with col_opt_.
- col_opt_id is to be used only for the sample ID.
- For Mutation Matrix and Chromosomal Aberration Reports, col_opt_group is also reserved for grouping and cannot be used for other purposes.
- Mutational Signature Report and pmsignature Report do not use these keywords.
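The mapping from entries to keywords can be expressed in a few lines of Python. This sketch reads the [result_format_mutation] example above with the standard configparser module, which lower-cases option names by default and so also gives the case-insensitive behavior. Whether paplot itself parses the file this way is an assumption, and `column_keywords` is a hypothetical helper.

```python
import configparser

CONFIG = """
[result_format_mutation]
# Required items
col_gene = Gene
col_group = MutationType
# Optional items
col_opt_id = Sample
col_opt_start = Start
col_opt_end = End
"""


def column_keywords(config_text, section="result_format_mutation"):
    """Map each {keyword} to its column name from col_ / col_opt_ entries.

    configparser lower-cases option names by default, so CHR, Chr and chr
    all end up as the same keyword.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    keywords = {}
    for option, column in parser.items(section):
        for prefix in ("col_opt_", "col_"):  # check the longer prefix first
            if option.startswith(prefix):
                keywords[option[len(prefix):]] = column
                break
    return keywords


print(column_keywords(CONFIG))
```

Checking col_opt_ before col_ matters: otherwise col_opt_id would be read as the keyword opt_id.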
5. User defined format¶
We can customize the pop-up information that appears upon mouseover events.
The pop-up information must be set up separately for each report and graph; however, the syntax is the same.
Configuration file
tooltip_format_checker_partial = type[{func}], {chr}:{start}:{end}, [{ref} -> {alt}]
# will be displayed as:
type[exome], chr1:2000:2001, [A -> T]
The words surrounded by {} are keywords; when the pop-up information is displayed, keywords will be replaced by the actual value.
5-1. Numerical calculation¶
paplot can use one or more keywords to perform numerical calculations.
{key1/key2*100}%
# will be displayed as (no rounding)
3.33333333333333%
To round off decimals, add :.N after the keyword or expression, where N is the number of digits to display after the decimal point; e.g., add :.2 to display two digits after the decimal point.
{key1/key2*100:.2}%
# will be displayed as (with rounding)
3.33%
5-2. Separated digits¶
To insert a comma every three digits, add :, after the keyword.
{key1}
# will be displayed as (with no digit separator)
123456789
{key1:,}
# will be displayed as (with digit separator)
123,456,789
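Taken together, the format strings work like a small template language: keywords are looked up case-insensitively, +, -, *, and / are evaluated, and an optional :.N or :, controls the display. The following Python sketch imitates the behavior described in this section so that you can preview a format string. It is not paplot's implementation. The `render` function, the case-insensitive lookup, and the reading of a number before the point (as in {#sum_item_value:6.2} earlier) as a field width are assumptions. Python's own ".2" format spec means two significant digits, so the sketch uses fixed-point ".2f" to match the "digits after the decimal point" behavior described above. The unrounded output may show a different number of digits than the example.

```python
import ast
import operator
import re

# Arithmetic allowed inside {...}; anything else is rejected.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}
# A keyword: optional leading '#', then an identifier not preceded by a digit, letter or '.'.
_NAME = re.compile(r"(?<![0-9A-Za-z_.])#?[A-Za-z_][0-9A-Za-z_]*")
# {expression} or {expression:format}
_FIELD = re.compile(r"\{([^{}:]+)(?::([^{}]*))?\}")


def _evaluate(node, env):
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, env)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left, env), _evaluate(node.right, env))
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    raise ValueError("unsupported expression")


def _format(value, spec):
    if not spec:
        return str(value)
    if spec == ",":
        return f"{value:,}"  # digit separator
    m = re.fullmatch(r"(\d*)\.(\d+)", spec)
    if m:  # optional width, then digits after the decimal point
        return f"{value:{m.group(1)}.{m.group(2)}f}"
    raise ValueError(f"unsupported format {spec!r}")


def render(template, values):
    """Replace each {expression[:format]} in template; keyword lookup ignores case."""
    lowered = {key.lower(): val for key, val in values.items()}

    def replace(field):
        env = {}

        def to_var(name):
            # Swap each keyword (which may start with '#') for a plain variable.
            var = f"v{len(env)}"
            env[var] = lowered[name.group(0).lower()]
            return var

        code = _NAME.sub(to_var, field.group(1).strip())
        return _format(_evaluate(ast.parse(code, mode="eval"), env), field.group(2))

    return _FIELD.sub(replace, template)


print(render("{key1/key2*100:.2}%", {"key1": 1, "key2": 30}))
print(render("{key1:,}", {"key1": 123456789}))
```

A missing keyword raises KeyError, which surfaces typos in a format string early.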
Release Note¶
Attention
v0.5.4¶
- Added several example data.
- Modified several items in the configuration file.
v0.5.3¶
Added options related to font-size.
v0.5.2¶
- Added the functions to generate Mutation Signature Report and pmsignature Report.
- Modified the name of subcommand sv -> ca.
v0.4.0¶
Added the function for saving the image.
v0.3.1¶
Fixed certain bugs related to Mutation Matrix Report.
v0.3.0¶
Incorporated functions to generate Mutation Matrix Report.
v0.2.8¶
Modified the specification of the function for merging multiple files.
v0.2.7¶
Incorporated functions to generate QC Report and SV (Structural Variation) Report.
License¶
paplot is licensed under the MIT License.
Contact¶
E-mail: | genomon.devel@gmail.com |
---|