nanopolish¶
nanopolish is a software package for signal-level analysis of Oxford Nanopore sequencing data. Nanopolish can calculate an improved consensus sequence for a draft genome assembly, detect base modifications, call SNPs and indels with respect to a reference genome and more (see Nanopolish modules, below).
Installation¶
Dependencies¶
A compiler that supports C++11 is needed to build nanopolish. Development of the code is performed using gcc-4.8.
By default, nanopolish will download and compile all of its required dependencies. Some users however may want to use system-wide versions of the libraries. To turn off the automatic installation of dependencies set HDF5=noinstall, EIGEN=noinstall or HTS=noinstall parameters when running make as appropriate. The current versions and compile options for the dependencies are:
- libhdf5-1.8.14 compiled with multi-threading support
--enable-threadsafe
- eigen-3.2.5
- htslib-1.4
Additionally the helper scripts require biopython and pysam.
Installing the latest code from github (recommended)¶
You can download and compile the latest code from github as follows
git clone --recursive https://github.com/jts/nanopolish.git
cd nanopolish
make
Installing a particular release¶
When major features have been added or bugs fixed, we will tag and release a new version of nanopolish. If you wish to use a particular version, you can checkout the tagged version before compiling
git clone --recursive https://github.com/jts/nanopolish.git
cd nanopolish
git checkout v0.7.1
make
To run using docker¶
First build the image from the dockerfile:
docker build .
Note the uuid given upon successful build. Then you can run nanopolish from the image:
docker run -v /path/to/local/data/data/:/data/ -it :image_id ./nanopolish eventalign -r /data/reads.fa -b /data/alignments.sorted.bam -g /data/ref.fa
Quickstart - how to polish a genome assembly¶
The original purpose of nanopolish was to improve the consensus accuracy of an assembly of Oxford Nanopore Technology sequencing reads. Here we provide a step-by-step tutorial to help you get started.
Requirements:
Download example dataset¶
You can download the example dataset we will use here:
wget http://s3.climb.ac.uk/nanopolish_tutorial/ecoli_2kb_region.tar.gz
tar -xvf ecoli_2kb_region.tar.gz
cd ecoli_2kb_region
Details:
- Sample : E. coli str. K-12 substr. MG1655
- Instrument : MinION sequencing R9.4 chemistry
- Basecaller : Albacore v2.0.1
- Region: “tig00000001:200000-202000”
- Note: Ligation-mediated PCR amplification performed
This is a subset of reads that aligned to a 2kb region in the E. coli draft assembly. To see how we generated these files please refer to the tutorial creating_example_dataset.
You should find the following files:
reads.fasta
: subset of basecalled readsdraft.fa
: draft genome assemblydraft.fa.fai
: draft genome assembly indexfast5_files/
: a directory containing FAST5 filesecoli_2kb_region.log
: a log file for how the dataset was created with nanopolish helper script (scripts/extract_reads_aligned_to_region.py
)
For the evaluation step you will need the reference genome:
curl -o ref.fa https://ftp.ncbi.nih.gov/genomes/archive/old_genbank/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid225/U00096.ffn
Analysis workflow¶
The pipeline below describes the recommended analysis workflow for larger datasets. In this tutorial, we will run through the basic steps of the pipeline for this smaller (2kb) dataset.

Data preprocessing¶
nanopolish needs access to the signal-level data measured by the nanopore sequencer. To begin, we need to create an index readdb
file that links read ids with their signal-level data in the FAST5 files:
nanopolish index -d fast5_files/ reads.fasta
We get the following files: reads.fasta.index
, reads.fasta.index.fai
, reads.fasta.index.gzi
, and reads.fasta.index.readdb
.
Compute the draft genome assembly using canu¶
As computing the draft genome assembly takes a few hours we have included the pre-assembled data for you (draft.fa
).
We used the following parameters with canu:
canu \
-p ecoli -d outdir genomeSize=4.6m \
-nanopore-raw albacore-2.0.1-merged.fastq
Compute a new consensus sequence for a draft assembly¶
Now that we have reads.fasta
indexed with nanopolish index
, and have a draft genome assembly draft.fa
, we can begin to improve the assembly with nanopolish. Let us get started!
First step, is to index the draft genome assembly. We can do that with the following command:
minimap2 -d draft.mmi draft.fa
Next, we align the original reads (reads.fasta
) to the draft assembly (draft.fa
) and sort alignments:
minimap2 -ax map-ont -t 8 draft.fa reads.fasta | samtools sort -o reads.sorted.bam -T reads.tmp
samtools index reads.sorted.bam
Checkpoint: we can do a quick check to see if this step worked. The bam file should not be empty.
samtools view reads.sorted.bam | head
Then we run the consensus algorithm. For larger datasets we use nanopolish_makerange.py
to split the draft genome assembly into 50kb segments, so that we can run the consensus algorithm on each segment in parallel. The output would be the polished segments in fasta
format.
Since our dataset is only covering a 2kb region, we skip this step and use the following command:
nanopolish variants --consensus polished.fa \
-w "tig00000001:200000-202000" \
-r reads.fasta \
-b reads.sorted.bam \
-g draft.fa
We are left with our desired output: polished.fa
.
Evaluate the assembly¶
To analyze how nanopolish performed improving the accuracy we use MUMmer. MUMmer contains “dnadiff”, a program that enables us to see a report on alignment statistics. With dnadiff we can compare the two different assemblies.
mkdir analysis
MUMmer3.23/dnadiff --prefix analysis/draft.dnadiff ref.fa draft.fa
MUMmer3.23/dnadiff --prefix analysis/polished.dnadiff ref.fa polished.fa
This generates draft.dnadiff.report
and polished.dnadiff.report
along with other files. The metric we are interested in is AvgIdentity
under [ Alignments ] 1-to-1
, which is a measurement of how similar the genome assemblies are to the reference genome. We expect to see a higher value for the polished assembly than the draft ( 99.90
vs 99.53
), concluding that the nanopolish consensus algorithm worked successfully.
Note
The example dataset was PCR amplified causing a loss of methylation information. We recommend using the -q dam,dcm
with nanopolish variants --consensus
if you have data with methylation information to account for known bacterial methyltransferases.
Quickstart - how to align events to a reference genome¶
The eventalign module in nanopolish is used to align events or “squiggles” to a reference genome. We (the developers of nanopolish) use this feature extensively when we want to see what the low-level signal information looks like. It helps us model the signal and discover differences in current that might hint at base modifications. Here we provide a step-by-step tutorial to help you get started with the nanopolish eventalign module.
For more information about eventalign:
- Blog post: “Aligning Nanopore Events to a Reference”
- Paper: “A complete bacterial genome assembled de novo using only nanopore sequencing data”
Requirements:
Download example dataset¶
You can download the example dataset we will use here:
wget http://s3.climb.ac.uk/nanopolish_tutorial/ecoli_2kb_region.tar.gz
tar -xvf ecoli_2kb_region.tar.gz
cd ecoli_2kb_region
Details:
- Sample : E. coli str. K-12 substr. MG1655
- Instrument : MinION sequencing R9.4 chemistry
- Basecaller : Albacore v2.0.1
- Region: “tig00000001:200000-202000”
- Note: Ligation-mediated PCR amplification performed
This is a subset of reads that aligned to a 2kb region in the E. coli draft assembly. To see how we generated these files please refer to this section: Tutorial - using extraction helper script to create example datsets.
You should find the following files:
reads.fasta
: subset of basecalled readsfast5_files/
: a directory containing FAST5 files
You will need the E. coli reference genome:
curl -o ref.fa https://ftp.ncbi.nih.gov/genomes/archive/old_genbank/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid225/U00096.ffn
Align the reads with minimap2¶
In order to run minimap2 we first need to index the reference genome:
minimap2 -d ref.mmi ref.fa
Output files: ref.mmi
.
We will need to index the reads as well:
nanopolish index -d fast5_files/ reads.fasta
Output files: reads.fasta.index
, reads.fasta.index.fai
, reads.fasta.index.gzi
, and reads.fasta.index.readdb
.
Then we can align the reads to the reference:
minimap2 -ax map-ont -t 8 ref.fa reads.fasta | samtools sort -o reads-ref.sorted.bam -T reads.tmp
samtools index reads-ref.sorted.bam
Output files: reads-ref.sorted.bam
and reads-ref.sorted.bam.bai
.
Checkpoint: Let’s see if the bam file is not truncated. This will check that the beginning of the file contains a valid header, and checks if the EOF is present. This will exit with a non-zero exit code if the conditions were not met:
samtools quickcheck reads-ref.sorted.bam
Align the nanopore events to a reference¶
Now we are ready to run nanopolish to align the events to the reference genome:
nanopolish eventalign \
--reads reads.fasta \
--bam reads-ref.sorted.bam \
--genome ref.fa \
--scale-events > reads-ref.eventalign.txt
Assess the eventalign output¶
If we take a peek at the first few lines of reads-ref.eventalign.txt
this is what we get:
contig position reference_kmer read_index strand event_index event_level_mean event_stdv event_length model_kmer model_mean model_stdv standardized_level
gi|545778205|gb|U00096.3|:c514859-514401 3 ATGGAG 0 t 16538 89.82 3.746 0.00100 CTCCAT 92.53 2.49 -0.88
gi|545778205|gb|U00096.3|:c514859-514401 3 ATGGAG 0 t 16537 88.89 2.185 0.00100 CTCCAT 92.53 2.49 -1.18
gi|545778205|gb|U00096.3|:c514859-514401 3 ATGGAG 0 t 16536 94.96 2.441 0.00125 CTCCAT 92.53 2.49 0.79
gi|545778205|gb|U00096.3|:c514859-514401 3 ATGGAG 0 t 16535 81.63 2.760 0.00150 NNNNNN 0.00 0.00 inf
gi|545778205|gb|U00096.3|:c514859-514401 7 AGTTAA 0 t 16534 78.96 2.278 0.00075 TTAACT 75.55 3.52 0.79
gi|545778205|gb|U00096.3|:c514859-514401 8 GTTAAT 0 t 16533 98.81 4.001 0.00100 ATTAAC 95.87 3.30 0.72
gi|545778205|gb|U00096.3|:c514859-514401 9 TTAATG 0 t 16532 96.92 1.506 0.00150 CATTAA 95.43 3.32 0.36
gi|545778205|gb|U00096.3|:c514859-514401 10 TAATGG 0 t 16531 70.86 0.402 0.00100 CCATTA 68.99 3.70 0.41
gi|545778205|gb|U00096.3|:c514859-514401 11 AATGGT 0 t 16530 91.24 4.256 0.00175 ACCATT 85.84 2.74 1.60
Example plots¶
In Figure 1 of our methylation detection paper we show a histogram of event_level_mean
for a selection of k-mers to demonstrate how methylation changes the observed current. The data for these figures was generated by eventalign, which we subsequently plotted in R using ggplot2.
Quickstart - calling methylation with nanopolish¶
Oxford Nanopore sequencers are sensitive to base modifications. Here we provide a step-by-step tutorial to help you get started with detecting base modifications using nanopolish.
For more information about our approach:
- Simpson, Jared T., et al. “Detecting DNA cytosine methylation using nanopore sequencing.” Nature Methods (2017).
Requirements:
Download example dataset¶
In this tutorial we will use a subset of the NA12878 WGS Consortium data. You can download the example dataset we will use here (warning: the file is about 2GB):
wget http://s3.climb.ac.uk/nanopolish_tutorial/methylation_example.tar.gz
tar -xvf methylation_example.tar.gz
cd methylation_example
Details:
- Sample : Human cell line (NA12878)
- Basecaller : Albacore v2.0.2
- Region: chr20:5,000,000-10,000,000
In the extracted example data you should find the following files:
albacore_output.fastq
: the subset of the basecalled readsreference.fasta
: the chromsome 20 reference sequencefast5_files/
: a directory containing signal-level FAST5 files
The reads were basecalled using this albacore command:
read_fast5_basecaller.py -c r94_450bps_linear.cfg -t 8 -i fast5_files -s basecalled/ -o fastq
After the basecaller finished, we merged all of the fastq files together into a single file:
cat basecalled/workspace/pass/*.fastq > albacore_output.fastq
Data preprocessing¶
nanopolish needs access to the signal-level data measured by the nanopore sequencer. To begin, we need to create an index file that links read ids with their signal-level data in the FAST5 files:
nanopolish index -d fast5_files/ albacore_output.fastq
We get the following files: albacore_output.fastq.index
, albacore_output.fastq.index.fai
, albacore_output.fastq.index.gzi
, and albacore_output.fastq.index.readdb
.
Aligning reads to the reference genome¶
Next, we need to align the basecalled reads to the reference genome. We use minimap2 as it is fast enough to map reads to the human genome. In this example we’ll pipe the output directly into samtools sort
to get a sorted bam file:
minimap2 -a -x map-ont reference.fasta albacore_output.fastq | samtools sort -T tmp -o albacore_output.sorted.bam
samtools index albacore_output.sorted.bam
Calling methylation¶
Now we’re ready to use nanopolish to detect methylated bases (in this case 5-methylcytosine in a CpG context). The command is fairly straightforward - we have to tell it what reads to use (albacore_output.fastq
), where the alignments are (albacore_output.sorted.bam
), the reference genome (reference.fasta
) and what region of the genome we’re interested in (chr20:5,000,000-10,000,000
):
nanopolish call-methylation -t 8 -r albacore_output.fastq -b albacore_output.sorted.bam -g reference.fasta -w "chr20:5,000,000-10,000,000" > methylation_calls.tsv
The output file contains a lot of information including the position of the CG dinucleotide on the reference genome, the ID of the read that was used to make the call, and the log-likelihood ratio calculated by our model:
chromosome start end read_name log_lik_ratio log_lik_methylated log_lik_unmethylated num_calling_strands num_cpgs sequence
chr20 4980553 4980553 c1e202f4-e8f9-4eb8-b9a6-d79e6fab1e9a 3.70 -167.47 -171.17 1 1 TGAGACGGGGT
chr20 4980599 4980599 c1e202f4-e8f9-4eb8-b9a6-d79e6fab1e9a 2.64 -98.87 -101.51 1 1 AATCTCGGCTC
chr20 4980616 4980616 c1e202f4-e8f9-4eb8-b9a6-d79e6fab1e9a -0.61 -95.35 -94.75 1 1 ACCTCCGCCTC
chr20 4980690 4980690 c1e202f4-e8f9-4eb8-b9a6-d79e6fab1e9a -2.99 -99.58 -96.59 1 1 ACACCCGGCTA
chr20 4980780 4980780 c1e202f4-e8f9-4eb8-b9a6-d79e6fab1e9a 5.27 -135.45 -140.72 1 1 CACCTCGGCCT
chr20 4980807 4980807 c1e202f4-e8f9-4eb8-b9a6-d79e6fab1e9a -2.95 -89.20 -86.26 1 1 ATTACCGGTGT
chr20 4980820 4980822 c1e202f4-e8f9-4eb8-b9a6-d79e6fab1e9a 7.47 -90.63 -98.10 1 2 GCCACCGCGCCCA
chr20 4980899 4980901 c1e202f4-e8f9-4eb8-b9a6-d79e6fab1e9a 3.17 -96.40 -99.57 1 2 GTATACGCGTTCC
chr20 4980955 4980955 c1e202f4-e8f9-4eb8-b9a6-d79e6fab1e9a 0.33 -92.14 -92.47 1 1 AGTCCCGATAT
A positive value in the log_lik_ratio
column indicates support for methylation. We have provided a helper script that can be used to calculate how often each reference position was methylated:
scripts/calculate_methylation_frequency.py -i methylation_calls.tsv > methylation_frequency.tsv
The output is another tab-separated file, this time summarized by genomic position:
chromosome start end num_cpgs_in_group called_sites called_sites_methylated methylated_frequency group_sequence
chr20 5036763 5036763 1 21 20 0.952 split-group
chr20 5036770 5036770 1 21 20 0.952 split-group
chr20 5036780 5036780 1 21 20 0.952 split-group
chr20 5037173 5037173 1 13 5 0.385 AAGGACGTTAT
In the example data set we have also included bisulfite data from ENCODE for the same region of chromosome 20. We can use the included compare_methylation.py
helper script to do a quick comparison between the nanopolish methylation output and bisulfite:
python compare_methylation.py bisulfite.ENCFF835NTC.example.tsv methylation_frequency.tsv > bisulfite_vs_nanopolish.tsv
We can use R to visualize the results - we observe good correlation between the nanopolish methylation calls and bisulfite:
library(ggplot2)
library(RColorBrewer)
data <- read.table("bisulfite_vs_nanopolish.tsv", header=T)
# Set color palette for 2D heatmap
rf <- colorRampPalette(rev(brewer.pal(11,'Spectral')))
r <- rf(32)
c <- cor(data$frequency_1, data$frequency_2)
title <- sprintf("N = %d r = %.3f", nrow(data), c)
ggplot(data, aes(frequency_1, frequency_2)) +
geom_bin2d(bins=25) + scale_fill_gradientn(colors=r, trans="log10") +
xlab("Bisulfite Methylation Frequency") +
ylab("Nanopolish Methylation Frequency") +
theme_bw(base_size=20) +
ggtitle(title)
Here’s what the output should look like:
Helping us debug nanopolish¶
Overview¶
Running into errors with nanopolish? To help us debug, we need to be able to reproduce the errors. We can do this by packaging a subset of the files that were used by a nanopolish. We have provided scripts/extract_reads_aligned_to_region.py
and this tutorial to help you do exactly this.
Briefly, this script will:
- extract reads that align to a given region in the draft genome assembly
- rewrite a new BAM, BAI, FASTA file with these reads
- extract the FAST5 files associated with these reads
- save all these files into a tar.gz file
Workflow¶
- Narrow down a problematic region by running
nanopolish variants --consensus [...]
and changing the-w
parameter. - Run the
scripts/extract_reads_aligned_to_region.py
. - Send the resulting
tar.gz
file to us by hosting either a dropbox or google drive.
Tutorial - using extraction helper script to create example datsets¶
We extracted a subset of reads for a 2kb region to create the example dataset for the eventalign and consensus tutorial using scripts/extract_reads_aligned_to_region.py
. Here is how:
Generated basecalled --reads
file:
Basecalled reads with albacore:
read_fast5_basecaller.py -c r94_450bps_linear.cfg -t 8 -i /path/to/raw/fast5/files -s /path/to/albacore-2.0.1/output/directory -o fastq
Merged the different albacore fastq outputs:
cat /path/to/albacore-2.0.1/output/directory/workspace/pass/*.fastq > albacore-2.0.1-merged.fastq
Converted the merged fastq to fasta format:
paste - - - - < albacore-2.0.1-merged.fastq | cut -f 1,2 | sed 's/^@/>/' | tr "\t" "\n" > reads.fasta
Generated --bam
file with the draft genome assembly (-g
):
Ran canu to create draft genome assembly:
canu \ -p ecoli -d outdir genomeSize=4.6m \ -nanopore-raw reads.fasta \
Index draft assembly:
bwa index ecoli.contigs.fasta samtools faidx ecoli.contigs.fasta
Aligned reads to draft genome assembly with bwa (v0.7.12):
bwa mem -x ont2d ecoli.contigs.fasta reads.fasta | samtools sort -o reads.sorted.bam -T reads.tmp samtools index reads.sorted.bam
Selected a --window
:
Identified the first contig name and chose a random start position:
head -3 ecoli.contigs.fasta
Output:
>tig00000001 len=4376233 reads=23096 covStat=7751.73 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
AGATGCTTTGAAAGAAACGCAGAATAGATCTCTATGTAATGATATGGAATACTCTGGTATTGTCTGTAAAGATACTAATGGAAAATATTTTGCATCTAAG
GCAGAAACTGATAATTTAAGAAAGGAGTCATATCCTCTGAAAAGAAAATGTCCCACAGGTACAGATAGAGTTGCTGCTTATCATACTCACGGTGCAGATA
As we wanted a 2kb region, we selected a random start position (200000) so our end position was 202000. Therefore our --window
was “tig00000001:200000-202000”.
Using the files we created, we ran scripts/extract_reads_aligned_to_region.py
, please see usage example below.
Note
Make sure nanopolish still reproduces the same error on this subset before sending it to us.
Usage example¶
python extract_reads_aligned_to_region.py \
--reads reads.fasta \
--genome ecoli.contigs.fasta \
--bam reads.sorted.bam \
--window "tig00000001:200000-202000" \
--output_prefix ecoli_2kb_region --verbose
Argument name(s) | Req. | Default value | Description |
---|---|---|---|
-b , --bam |
Y | NA | Sorted bam file created by aligning reads to the draft genome. |
-g , --genome |
Y | NA | Draft genome assembly |
-r , --reads |
Y | NA | Fasta, fastq, fasta.gz, or fastq.gz file containing basecalled reads. |
-w , --window |
Y | NA | Draft genome assembly coordinate string ex. “contig:start-end”. It is essential that you wrap the coordinates in quotation marks (“). |
-o , --output_prefix |
N | reads_subset | Prefix of output tar.gz and log file. |
-v , --verbose |
N | False | Use for verbose output with info on progress. |
Script overview¶
- Parse input files
- Assumes readdb file name from input reads file
- Validates input
- checks that input bam, readdb, fasta/q, draft genome assembly, draft genome assembly index file exist, are not empy, and are readable
- With user input draft genome assembly coordinates, extracts all reads that aligned within these coordinates stores the read_ids. This information can be found from the input BAM.
- uses pysam.AlignmentFile
- uses samfile.fetch(region=draft_ga_coords) to get all reads aligned to region
- if reads map to multiple sections within draft ga it is not added again
- Parses through the input readdb file to find the FAST5 files associated with each region that aligned to region
- stores in dictionary region_fast5_files; key = read_id, value = path/to/fast5/file
- path to fast5 file is currently dependent on the user’s directory structure
- Make a BAM and BAI file for this specific region
- creates a new BAM file called
region.bam
- with pysam.view we rewrite the new bam with reads that aligned to the region…
- with pysam.index we create a new BAI file
- creates a new BAM file called
- Now to make a new FASTA file with this subset of reads
- the new fasta file is called
region.fasta
- this first checks what type of sequences file is given {
fasta
,fastq
,fasta.gz
,fastq.gz
} - then handles based on type of seq file using SeqIO.parse
- then writes to a new fasta file
- the new fasta file is called
- Let’s get to tarring
- creates a
tar.gz
file with the output prefix - saves the fast5 files in directory
output_prefix/fast5_files/
- creates a
- Adds the new fasta, new bam, and new bai file with the subset of reads
- Adds the draft genome asssembly and associated fai index file
- Performs a check
- the number of reads in the new BAM file, new FASTA file, and the number of files in the fast5_files directory should be equal
- Outputs a
tar.gz
andlog
file. FIN!
Manual¶
Modules available:
nanopolish extract: extract reads in FASTA or FASTQ format from a directory of FAST5 files
nanopolish call-methylation: predict genomic bases that may be methylated
nanopolish variants: detect SNPs and indels with respect to a reference genome
nanopolish variants --consensus: calculate an improved consensus sequence for a draft genome assembly
nanopolish eventalign: align signal-level events to k-mers of a reference genome
nanopolish phase-reads: Phase reads using heterozygous SNVs with respect to a reference genome
extract¶
Overview¶
This module is used to extract reads in FASTA or FASTQ format from a directory of FAST5 files.
Input¶
- path to a directory of FAST5 files modified to contain basecall information
Output¶
- sequences of reads in FASTA or FASTQ format
Usage example¶
nanopolish extract [OPTIONS] <fast5|dir>
Argument name(s) | Required | Default value | Description |
---|---|---|---|
<fast5|dir> | Y | NA | FAST5 or path to directory of FAST5 files. |
-r , --recurse |
N | NA | Recurse into subdirectories |
-q , --fastq |
N | fasta format | Use when you want to extract to FASTQ format |
-t , --type=TYPE |
N | 2d-or-template | The type of read either: {template, complement, 2d, 2d-or-template, any} |
-b , --basecaller=NAME[:VERSION] |
N | NA | consider only data produced by basecaller NAME, optionally with given exact VERSION |
-o , --output=FILE |
N | stdout | Write output to FILE |
index¶
Overview¶
Build an index mapping from basecalled reads to the signals measured by the sequencer
Input¶
- path to directory of raw nanopore sequencing data in FAST5 format
- basecalled reads
Output¶
- gzipped FASTA format of basecalled reads
- index files (fai, gzi, readdb)
Readdb file format¶
Readdb file is a tab-separated file that contains two columns. One column represents read ids and the other column represents the corresponding path to FAST5 file:
read_id_1 /path/to/fast5/containing/reads_id_1/signals
read_id_2 /path/to/fast5/containing/read_id_2/signals
Usage example¶
nanopolish index [OPTIONS] -d nanopore_raw_file_directory reads.fastq
Argument name(s) | Required | Default value | Description |
---|---|---|---|
-d , --directory |
Y | NA | FAST5 or path to directory of FAST5 files containing ONT sequencing raw signal information. |
-f , --fast5-fofn |
N | NA | file containing the paths to each fast5 for the run |
call-methylation¶
Overview¶
Classify nucleotides as methylated or not.
Input¶
- Basecalled ONT reads in FASTA format
Output¶
- tab-separated file containing per-read log-likelihood ratios
Usage example¶
nanopolish call-methylation [OPTIONS] <fast5|dir>
Argument name(s) | Required | Default value | Description |
---|---|---|---|
-r , --reads=FILE |
Y | NA | the ONT reads are in fasta FILE |
-b , --bam=FILE |
Y | NA | the reads aligned to the genome assembly are in bam FILE |
-g , --genome=FILE |
Y | NA | the genome we are computing a consensus for is in FILE |
-t , --threads=NUM |
N | 1 | use NUM threads |
--progress |
N | NA | print out a progress message |
variants¶
Overview¶
This module is used to call single nucleotide polymorphisms (SNPs) using a signal-level HMM.
Input¶
- basecalled reads
- alignment info
- genome assembly
Output¶
- VCF file
Usage example¶
nanopolish variants [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa
Argument name(s) | Required | Default value | Description |
---|---|---|---|
--snps |
N | NA | use flag to only call SNPs |
--consensus=FILE |
N | NA | run in consensus calling mode and write polished sequence to FILE |
--fix-homopolymers |
N | NA | use flag to run the experimental homopolymer caller |
--faster |
N | NA | minimize compute time while slightly reducing consensus accuracy |
-w , --window=STR |
N | NA | find variants in window STR (format: <chromsome_name>:<start>-<end>) |
-r , --reads=FILE |
Y | NA | the ONT reads are in fasta FILE |
-b , --bam=FILE |
Y | NA | the reads aligned to the reference genome are in bam FILE |
-e , --event-bam=FILE |
Y | NA | the events aligned to the reference genome are in bam FILE |
-g , --genome=FILE |
Y | NA | the reference genome is in FILE |
-o , --outfile=FILE |
N | stdout | write result to FILE |
-t , --threads=NUM |
N | 1 | use NUM threads |
-m , --min-candidate-frequency=F |
N | 0.2 | extract candidate variants from the aligned reads when the variant frequency is at least F |
-d , --min-candidate-depth=D |
N | 20 | extract candidate variants from the aligned reads when the depth is at least D |
-x , --max-haplotypes=N |
N | 1000 | consider at most N haplotypes combinations |
--max-rounds=N |
N | 50 | perform N rounds of consensus sequence improvement |
-c , --candidates=VCF |
N | NA | read variants candidates from VCF, rather than discovering them from aligned reads |
-a , --alternative-basecalls-bam=FILE |
N | NA | if an alternative basecaller was used that does not output event annotations then use basecalled sequences from FILE. The signal-level events will still be taken from the -b bam |
--calculate-all-support |
N | NA | when making a call, also calculate the support of the 3 other possible bases |
--models-fofn=FILE |
N | NA | read alternatives k-mer models from FILE |
event align¶
Overview¶
Align nanopore events to reference k-mers
Input¶
- basecalled reads
- alignment information
- assembled genome
Usage example¶
nanopolish eventalign [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa
Argument name(s) | Required | Default value | Description |
---|---|---|---|
--sam |
N | NA | use to write output in SAM format |
-w, --window=STR |
N | NA | Compute the consensus for window STR (format : ctg:start_id-end_id) |
-r, --reads=FILE |
Y | NA | the ONT reads are in fasta FILE |
-b, --bam=FILE |
Y | NA | the reads aligned to the genome assembly are in bam FILE |
-g, --genome=FILE |
Y | NA | the genome we are computing a consensus for is in FILE |
-t, --threads=NUM |
N | 1 | use NUM threads |
--scale-events |
N | NA | scale events to the model, rather than vice-versa |
--progress |
N | NA | print out a progress message |
-n , --print-read-names |
N | NA | print read names instead of indexes |
--summary=FILE |
N | NA | summarize the alignment of each read/strand in FILE |
--samples |
N | NA | write the raw samples for the event to the tsv output |
--models-fofn=FILE |
N | NA | read alternative k-mer models from FILE |
phase-reads - (experimental)¶
Overview¶
Phase reads using heterozygous SNVs with respect to a reference genome
Input¶
- basecalled reads
- alignment information
- assembled genome
- variants (from nanopolish variants or from other sources eg. Illumina VCF)
Usage example¶
nanopolish phase-reads [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa variants.vcf
Publications¶
- Loman, Nicholas J., Joshua Quick, and Jared T. Simpson. “A complete bacterial genome assembled de novo using only nanopore sequencing data.” Nature methods 12.8 (2015): 733-735.
- Quick, Joshua, et al. “Real-time, portable genome sequencing for Ebola surveillance.” Nature 530.7589 (2016): 228-232.
- Simpson, Jared T., et al. “Detecting DNA cytosine methylation using nanopore sequencing.” nature methods 14.4 (2017): 407-410.