CIPHER¶
Installation¶
CIPHER requires no manual installation. Just download and unzip the tar file from our GitHub or git clone the repository. The download may be large.
The only required manual software installation on your local computer / cluster are Docker and Nextflow.
By default, CIPHER will run off a Docker container, but this can be optionally turned off by removing the option in the config file. For more information, please read the ‘Nextflow Config File’ documentation.
Installing Nextflow¶
For Linux:
- We recommend installing Nextflow through the Anaconda package manager.
wget https://repo.continuum.io/archive/Anaconda2-4.3.1-Linux-x86_64.sh bash Anaconda2-4.3.1-Linux-x86_64.sh
- Install Nextflow
conda install -c bioconda nextflow
For macOS:
- We recommend installing Nextflow through the Anaconda package manager.
wget https://repo.continuum.io/archive/Anaconda2-4.3.1-MacOSX-x86_64.sh bash Anaconda2-4.3.1-Linux-x86_64.sh
- Install Nextflow
conda install -c bioconda nextflow
For Windows:
CIPHER has not been tested on the Windows operating system. However, Windows 10 has introduced the ‘Ubuntu Bash Shell’ sub-system, which potentially can be used to run CIPHER.
Install Ubuntu Bash on your Windows 10 computer. Follow the instructions here: https://www.howtogeek.com/249966/how-to-install-and-use-the-linux-bash-shell-on-windows-10/
- Using the Ubuntu Bash terminal we recommend installing nextflow through the Anaconda package manager.
wget https://repo.continuum.io/archive/Anaconda2-4.3.1-Linux-x86_64.sh bash Anaconda2-4.3.1-Linux-x86_64.sh
- Install Nextflow
conda install -c bioconda nextflow
Installing Docker¶
Installing Docker can be tricky. Because the instructions for correct installation are constantly changing, we provide links to Docker’s official installation process. You’ll want to use Linux’s installation instructions for Windows via the Ubuntu Bash terminal.
For Linux:
https://docs.docker.com/engine/installation/#supported-platforms
For macS:
Manual Installation¶
Some universities may not want to install Docker for security reasons. And others may not want to use Docker at all on their local desktop. This section lists how to manually install all the dependencies for CIPHER.
We recommend using the Anaconda package manager.
For Linux:
- We recommend installing Nextflow through the Anaconda package manager.
wget https://repo.continuum.io/archive/Anaconda2-4.3.1-Linux-x86_64.sh bash Anaconda2-4.3.1-Linux-x86_64.sh
- Install bioconda packages
conda install -c bioconda nextflow fastqc bbmap star hisat2 bowtie2 bwa multiqc macs2 deeptools epic preseq samtools sambamba bedtools bedops stringtie subread
- Install R packages
R install.packages(c("data.table", "ggplot2", "gplots")) install.packages("http://hartleys.github.io/QoRTs/QoRTs_LATEST.tar.gz", repos=NULL, type="source") source("https://bioconductor.org/biocLite.R") biocLite() biocLite(c("ChIPQC", "RUVSeq", "ChIPseeker"))
For macOS:
- We recommend installing Nextflow through the Anaconda package manager.
wget https://repo.continuum.io/archive/Anaconda2-4.3.1-MacOSX-x86_64.sh bash Anaconda2-4.3.1-Linux-x86_64.sh
- Install bioconda packages
conda install -c bioconda nextflow fastqc bbmap star hisat2 bowtie2 bwa multiqc macs2 deeptools epic preseq samtools sambamba bedtools bedops stringtie subread
- Install R packages
R install.packages(c("data.table", "ggplot2", "gplots")) install.packages("http://hartleys.github.io/QoRTs/QoRTs_LATEST.tar.gz", repos=NULL, type="source") source("https://bioconductor.org/biocLite.R") biocLite() biocLite(c("ChIPQC", "RUVSeq", "ChIPseeker"))
Parameters¶
The following table lists the currently available parameters for CIPHER along with a brief description. Updated: May 2017
REQUIRED PARAMETERS
Parameter | Description |
---|---|
–mode | Choose from available: ‘chip’, ‘rna’, ‘gro’, ‘mnase’, ‘dnase’ , ‘atac’, ‘analysis’. |
–config | Tab separated config file with sample information. Check README for more information. |
–fasta | Reference genome in FASTA format. |
–gtf | Reference genome in GTF format. |
–lib | Library information. “s” for single-stranded data, “p” for pair-ended data. |
–readLen | The length of your reads. |
RNA-seq MODE ONLY
Parameter | Description | |||
---|---|---|---|---|
|
||||
|
ANALYSIS MODE ONLY
Parameter | Description | |||
---|---|---|---|---|
|
OPTIONAL PARAMETERS
Parameter | Description | Default Value |
---|---|---|
–threads | Number of threads to use. | 1 |
–aligner | The alignment software for mapping reads. Available: bbmap, star, hisat2, bwa, bowtie2. | bbmap |
–minid | Minimum alignment identity to look for during BBMap mapping. Higher is faster and less sensitive. | 0.76 |
–qvalue | Minimum FDR cutoff for peak detection in MACS2 ad EPIC. | 0.01 |
–epic_w | Window size ued to scan the genome for peak detection in EPIC. | 200 |
–epic_g | A multiple of epic_w used to determine the gap size in EPIC. | 3 |
–maxindel | Maximum indel length searched during mapping. 200k recommended for vertebrate genomes. | 200k |
–intronlen | Maximum intron length during mapping. 20 recommended for vertebrate genomes. | 20 |
–egs | The effective genome size of your species. Is automatically calculated by default. 80GB of RAM. | Auto |
–egs_ratio | Effective genome as fraction of genome size. Must be between 0 and 1. Automatically calculated. | Auto |
–outdir | Name of ourput directory. | results |
ALIGNER-SPECIFIC OPTIONAL PARAMETERS
BWA
Parameter | Description | Default Value |
---|---|---|
–bwa_k | See -k option in BWA user manual for more information. | 19 |
–bwa_w | See -w option in BWA user manual for more information. | 100 |
–bwa_d | See -d option in BWA user manual for more information. | 19 |
–bwa_r | See -r option in BWA user manual for more information. | 1.5 |
–bwa_c | See -c option in BWA user manual for more information. | 10000 |
–bwa_A | See -A option in BWA user manual for more information. | 1 |
–bwa_B | See -B option in BWA user manual for more information. | 4 |
–bwa_O | See -O option in BWA user manual for more information. | 6 |
–bwa_E | See -E option in BWA user manual for more information. | 1 |
–bwa_L | See -L option in BWA user manual for more information. | 5 |
–bwa_U | See -U option in BWA user manual for more information. | 9 |
–bwa_T | See -T option in BWA user manual for more information. | 30 |
BOWTIE2
Parameter | Description | Default Value |
---|---|---|
–bt2_D | See -D option in Bowtie2 user manual for more information. | 20 |
–bt2_R | See -R option in Bowtie2 user manual for more information. | 3 |
–bt2_N | See -N option in Bowtie2 user manual for more information. | 0 |
–bt2_L | See -L option in Bowtie2 user manual for more information. | 20 |
–bt2_i | See -i option in Bowtie2 user manual for more information. | S,5,1,0.50 |
–bt2_trim5 | See –trim5 option in Bowtie2 user manual for more information. | 0 |
–bt2_trim3 | See –trim3 option in Bowtie2 user manual for more information. | 0 |
–local | Set this parameter to map alignments in local mode. | false |
HISAT2
Parameter | Description | Default Value |
---|---|---|
–hs_k | See -k option in HISAT2 user manual for more information. | 5 |
–hs_trim5 | See –trim5 option in HISAT2 user manual for more information. | 0 |
–hs_trim3 | See –trim3 option in HISAT2 user manual for more information. | 0 |
–hs_mp | See –mp option in HISAT2 user manual for more information. | 6,2 |
–hs_sp | See –sp option in HISAT2 user manual for more information. | 2,1 |
–hs_np | See –np option in HISAT2 user manual for more information. | 1 |
–hs_rdg | See –rdg option in HISAT2 user manual for more information. | 5,3 |
–hs_rfg | See –rfg option in HISAT2 user manual for more information. | 5,3 |
–hs_pen_cansplice | See –pen-cansplice option in HISAT2 user manual for more information. | 0 |
–hs_pen_noncansplice | See –pen-noncansplice option in HISAT2 user manual for more information. | 12 |
–hs_min_intronlen | See –min-intronlen option in HISAT2 user manual for more information. | 20 |
–hs_max_intronlen | See –max-intronlen option in HISAT2 user manual for more information. | 500000 |
–hs_max_seeds | See –max-seeds option in HISAT2 user manual for more information. | 5 |
STAR
Parameter | Description | Default Value |
---|---|---|
–star_clip3pNbases | See –clip3pNbases option in STAR user manual for more information. | 0 |
–star_clip5pNbases | See –clip5pNbases option in STAR user manual for more information. | 0 |
–star_outFilterMultimapScoreRange | See –outFilterMultimapScoreRange option in STAR user manual for more information. | 1 |
–star_outFilterMultimapNmax | See –outFilterMultimapNmax option in STAR user manual for more information. | 10 |
–star_outFilterMismatchNmax | See –outFilterMismatchNmax option in STAR user manual for more information. | 10 |
–star_outFilterScoreMin | See –outFilterScoreMin option in STAR user manual for more information. | 0 |
–star_alignEndsType | See –alignEndsType option in STAR user manual for more information. | Local |
–star_winAnchorMultimapNmax | See –winAnchorMultimapNmax option in STAR user manual for more information. | 50 |
–star_quantMode | See –quantMode option in STAR user manual for more information. | |
–star_twopassMode | See –twopassMode option in STAR user manual for more information. | None |
FOR OPTIMIZING AND TESTING
Parameter | Description | Default Value |
---|---|---|
–subsample | Set this flag to subsample reads for teting. | false |
ChIP-seq Workflow¶
All workflows require the following files:
- A config file (described below)
- Reference genome FASTA file
- Reference genome GTF file
Config File A config file is a tab separated text file that includes information regarding the name, location, and input of your experiment.
Single-End Config This is the config file format for single-ended data.
sample1 sample1_rep1 /path/to/sample1_rep1.fastq.gz control1 sample1 sample1 sample1_rep2 /path/to/sample1_rep2.fastq.gz control1 sample1 sample2 sample2_rep1 /path/to/sample2_rep1.fastq.gz control2 sample2 sample2 sample2_rep2 /path/to/sample2_rep2.fastq.gz control2 sample2 control1 control1_rep1 /path/to/control1_rep1.fastq.gz control1 input control2 control2_rep1 /path/to/control2_rep1.fastq.gz control2 input
The columns represent:
- MergeID: The merge ID that will be used should your files be merged together. Should be the same for all replicates.
- ID: The ID that will be used to name the majority of your files that are not merged. Recommended to be used to differentiate between different technical replicates.
- Path: The path to the fastq file to be processed. Can be gzipped or not.
- ControlID: The ID indicating what control file to be used for peak calling and other downstream analysis. Use “-” (without quotes) if there is no control for a particular sample.
- Mark: The ID that signifies the type of mark or histone being processed. Use “input” if the line refers to a control. If the line is NOT a control, then use the MergeID name.
Pair-End Config This is the config file format for pair-ended data.
sample1 sample1_rep1 /path/to/sample1_rep1_R1.fastq.gz /path/to/sample1_rep1_R2.fastq.gz control1 sample1 sample1 sample1_rep2 /path/to/sample1_rep2_R1.fastq.gz /path/to/sample1_rep2_R2.fastq.gz control1 sample1 sample2 sample2_rep1 /path/to/sample2_rep1_R1.fastq.gz /path/to/sample2_rep1_R2.fastq.gz control2 sample2 sample2 sample2_rep2 /path/to/sample2_rep2_R1.fastq.gz /path/to/sample2_R2_rep1.fastq.gz control2 sample2 control1 control1_rep1 /path/to/control1_rep1_R1.fastq.gz /path/to/control1_rep1_R2.fastq.gz control1 input control2 control2_rep1 /path/to/control2_rep1_R1.fastq.gz /path/to/control2_rep1_R2.fastq.gz control2 input
The columns represent:
- MergeID: The merge ID that will be used should your files be merged together. Should be the same for all replicates.
- ID: The ID that will be used to name the majority of your files that are not merged. Recommended to be used to differentiate between different technical replicates.
- Path1: The path to the first fastq file to be processed. Can be gzipped or not.
- Path2: The path to the second fastq file to be processed. Can be gzipped or not.
- ControlID: The ID indicating what control file to be used for peak calling and other downstream analysis. Use “-” (without quotes) if there is no control for a particular sample.
- Mark: The ID that signifies the type of mark or histone being processed. Use “input” if the line refers to a control. If the line is NOT a control, then use the MergeID name.
- Simple ChIP-seq Tutorial (single-end, 75 length reads)
nextflow run /path/to/main.nf --mode chip --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75
- Simple ChIP-seq Tutorial (pair-end, 75 length reads)
nextflow run /path/to/main.nf --mode chip --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib p --readLen 75
- Simple ChIP-seq Tutorial (single-end, 75 length reads, use bowtie2 aligner instead of default bbmap, use 5 threads)
nextflow run /path/to/main.nf --mode chip --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75 --aligner bowtie2 --threads 5
RNA-seq Workflow¶
RNA-seq workflows require the following files:
- A config file (described below)
- Reference genome FASTA file
- Reference genome GTF file
- A experimental config file (described below)
Config File A config file is a tab separated text file that includes information regarding the name, location, and input of your experiment.
Single-End Config This is the config file format for single-ended data.
sample1 sample1_rep1 /path/to/sample1_rep1.fastq.gz sample1 sample1_rep2 /path/to/sample1_rep2.fastq.gz sample1 sample1_rep3 /path/to/sample1_rep2.fastq.gz sample2 sample2_rep1 /path/to/sample2_rep1.fastq.gz sample2 sample2_rep2 /path/to/sample2_rep2.fastq.gz sample2 sample2_rep3 /path/to/sample1_rep2.fastq.gz
The columns represent:
- MergeID: The merge ID that will be used should your files be merged together. Should be the same for all replicates.
- ID: The ID that will be used to name the majority of your files that are not merged. Recommended to be used to differentiate between different technical replicates.
- Path: The path to the fastq file to be processed. Can be gzipped or not.
Pair-End Config This is the config file format for pair-ended data.
sample1 sample1_rep1 /path/to/sample1_rep1_R1.fastq.gz /path/to/sample1_rep1_R2.fastq.gz sample1 sample1_rep2 /path/to/sample1_rep2_R1.fastq.gz /path/to/sample1_rep2_R2.fastq.gz sample1 sample1_rep3 /path/to/sample1_rep3_R1.fastq.gz /path/to/sample1_rep3_R2.fastq.gz sample2 sample2_rep1 /path/to/sample2_rep1_R1.fastq.gz /path/to/sample2_rep1_R2.fastq.gz sample2 sample2_rep2 /path/to/sample2_rep2_R1.fastq.gz /path/to/sample2_rep2_R2.fastq.gz sample2 sample2_rep3 /path/to/sample2_rep1_R1.fastq.gz /path/to/sample2_rep3_R2.fastq.gz
The columns represent:
- MergeID: The merge ID that will be used should your files be merged together. Should be the same for all replicates.
- ID: The ID that will be used to name the majority of your files that are not merged. Recommended to be used to differentiate between different technical replicates.
- Path1: The path to the first fastq file to be processed. Can be gzipped or not.
- Path2: The path to the second fastq file to be processed. Can be gzipped or not.
Experimental Config File A experimental config file is a tab separated text file that has sample and condition information. Your experimental config file must be ordered in the same way that your config file is. (sample1 in config file == sample1 in expInfo file). Additionally you must include the headers in the expInfo config file.
sample condition ctrl1 WT ctrl2 WT ctrl3 WT ko1 KO ko2 KO ko3 KO
The headers “sample” and “condition” must be included.
The columns represent:
- SampleID: Refers to the ID in your config file. Used to differentiate between different replicates.
- Condition: Refers to the condition or experimental variable of your sample. CIPHER currently only supports two condition DGE analysis.
- Simple RNA-seq Tutorial (single-end, 75 length reads, frFirstStrand)
nextflow run /path/to/main.nf --mode rna --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75 --strandInfo frFirstStrand --expInfo exp_config.txt
- Simple RNA-seq Tutorial (pair-end, 75 length reads)
nextflow run /path/to/main.nf --mode rna --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib p --readLen 75 --strandInfo frFirstStrand --expInfo exp_config.txt
- Simple RNA-seq Tutorial (single-end, 75 length reads, use star aligner instead of default bbmap, use 5 threads)
nextflow run /path/to/main.nf --mode rna --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75 --strandInfo frFirstStrand --expInfo exp_config.txt --aligner star --threads 5
DNase-seq Workflow¶
All workflows require the following files:
- A config file (described below)
- Reference genome FASTA file
- Reference genome GTF file
Config File A config file is a tab separated text file that includes information regarding the name, location, and input of your experiment.
Single-End Config This is the config file format for single-ended data.
sample1 sample1_rep1 /path/to/sample1_rep1.fastq.gz - sample1 sample1 sample1_rep2 /path/to/sample1_rep2.fastq.gz - sample1 sample2 sample2_rep1 /path/to/sample2_rep1.fastq.gz - sample2 sample2 sample2_rep2 /path/to/sample2_rep2.fastq.gz - sample2
The columns represent:
- MergeID: The merge ID that will be used should your files be merged together. Should be the same for all replicates.
- ID: The ID that will be used to name the majority of your files that are not merged. Recommended to be used to differentiate between different technical replicates.
- Path: The path to the fastq file to be processed. Can be gzipped or not.
- ControlID: The ID indicating what control file to be used for peak calling and other downstream analysis. Use “-” (without quotes) if there is no control for a particular sample.
- Mark: The ID that signifies the type of mark or histone being processed. Use “input” if the line refers to a control. If the line is NOT a control, then use the MergeID name.
Pair-End Config This is the config file format for pair-ended data.
sample1 sample1_rep1 /path/to/sample1_rep1_R1.fastq.gz /path/to/sample1_rep1_R2.fastq.gz - sample1 sample1 sample1_rep2 /path/to/sample1_rep2_R1.fastq.gz /path/to/sample1_rep2_R2.fastq.gz - sample1 sample2 sample2_rep1 /path/to/sample2_rep1_R1.fastq.gz /path/to/sample2_rep1_R2.fastq.gz - sample2 sample2 sample2_rep2 /path/to/sample2_rep2_R1.fastq.gz /path/to/sample2_R2_rep1.fastq.gz - sample2
The columns represent:
- MergeID: The merge ID that will be used should your files be merged together. Should be the same for all replicates.
- ID: The ID that will be used to name the majority of your files that are not merged. Recommended to be used to differentiate between different technical replicates.
- Path1: The path to the first fastq file to be processed. Can be gzipped or not.
- Path2: The path to the second fastq file to be processed. Can be gzipped or not.
- ControlID: The ID indicating what control file to be used for peak calling and other downstream analysis. Use “-” (without quotes) if there is no control for a particular sample.
- Mark: The ID that signifies the type of mark or histone being processed. Use “input” if the line refers to a control. If the line is NOT a control, then use the MergeID name.
- Simple MNase-seq Tutorial (single-end, 75 length reads)
nextflow run /path/to/main.nf --mode dnase --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75
- Simple MNase-seq Tutorial (pair-end, 75 length reads)
nextflow run /path/to/main.nf --mode dnase --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib p --readLen 75
- Simple MNase-seq Tutorial (single-end, 75 length reads, use bowtie2 aligner instead of default bbmap, use 5 threads)
nextflow run /path/to/main.nf --mode dnase --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75 --aligner bowtie2 --threads 5
MNase-seq Workflow¶
All workflows require the following files:
- A config file (described below)
- Reference genome FASTA file
- Reference genome GTF file
Config File A config file is a tab separated text file that includes information regarding the name, location, and input of your experiment.
Single-End Config This is the config file format for single-ended data.
sample1 sample1_rep1 /path/to/sample1_rep1.fastq.gz - sample1 sample1 sample1_rep2 /path/to/sample1_rep2.fastq.gz - sample1 sample2 sample2_rep1 /path/to/sample2_rep1.fastq.gz - sample2 sample2 sample2_rep2 /path/to/sample2_rep2.fastq.gz - sample2
The columns represent:
- MergeID: The merge ID that will be used should your files be merged together. Should be the same for all replicates.
- ID: The ID that will be used to name the majority of your files that are not merged. Recommended to be used to differentiate between different technical replicates.
- Path: The path to the fastq file to be processed. Can be gzipped or not.
- ControlID: The ID indicating what control file to be used for peak calling and other downstream analysis. Use “-” (without quotes) if there is no control for a particular sample.
- Mark: The ID that signifies the type of mark or histone being processed. Use “input” if the line refers to a control. If the line is NOT a control, then use the MergeID name.
Pair-End Config This is the config file format for pair-ended data.
sample1 sample1_rep1 /path/to/sample1_rep1_R1.fastq.gz /path/to/sample1_rep1_R2.fastq.gz - sample1 sample1 sample1_rep2 /path/to/sample1_rep2_R1.fastq.gz /path/to/sample1_rep2_R2.fastq.gz - sample1 sample2 sample2_rep1 /path/to/sample2_rep1_R1.fastq.gz /path/to/sample2_rep1_R2.fastq.gz - sample2 sample2 sample2_rep2 /path/to/sample2_rep2_R1.fastq.gz /path/to/sample2_R2_rep1.fastq.gz - sample2
The columns represent:
- MergeID: The merge ID that will be used should your files be merged together. Should be the same for all replicates.
- ID: The ID that will be used to name the majority of your files that are not merged. Recommended to be used to differentiate between different technical replicates.
- Path1: The path to the first fastq file to be processed. Can be gzipped or not.
- Path2: The path to the second fastq file to be processed. Can be gzipped or not.
- ControlID: The ID indicating what control file to be used for peak calling and other downstream analysis. Use “-” (without quotes) if there is no control for a particular sample.
- Mark: The ID that signifies the type of mark or histone being processed. Use “input” if the line refers to a control. If the line is NOT a control, then use the MergeID name.
- Simple MNase-seq Tutorial (single-end, 75 length reads)
nextflow run /path/to/main.nf --mode mnase --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75
- Simple MNase-seq Tutorial (pair-end, 75 length reads)
nextflow run /path/to/main.nf --mode mnase --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib p --readLen 75
- Simple MNase-seq Tutorial (single-end, 75 length reads, use bowtie2 aligner instead of default bbmap, use 5 threads)
nextflow run /path/to/main.nf --mode mnase --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75 --aligner bowtie2 --threads 5
GRO-seq Workflow¶
All workflows require the following files:
- A config file (described below)
- Reference genome FASTA file
- Reference genome GTF file
Config File A config file is a tab separated text file that includes information regarding the name, location, and input of your experiment.
Single-End Config This is the config file format for single-ended data.
sample1 sample1_rep1 /path/to/sample1_rep1.fastq.gz - sample1 sample1 sample1_rep2 /path/to/sample1_rep2.fastq.gz - sample1 sample2 sample2_rep1 /path/to/sample2_rep1.fastq.gz - sample2 sample2 sample2_rep2 /path/to/sample2_rep2.fastq.gz - sample2
The columns represent:
- MergeID: The merge ID that will be used should your files be merged together. Should be the same for all replicates.
- ID: The ID that will be used to name the majority of your files that are not merged. Recommended to be used to differentiate between different technical replicates.
- Path: The path to the fastq file to be processed. Can be gzipped or not.
- ControlID: The ID indicating what control file to be used for peak calling and other downstream analysis. Use “-” (without quotes) if there is no control for a particular sample.
- Mark: The ID that signifies the type of mark or histone being processed. Use “input” if the line refers to a control. If the line is NOT a control, then use the MergeID name.
Pair-End Config This is the config file format for pair-ended data.
sample1 sample1_rep1 /path/to/sample1_rep1_R1.fastq.gz /path/to/sample1_rep1_R2.fastq.gz - sample1 sample1 sample1_rep2 /path/to/sample1_rep2_R1.fastq.gz /path/to/sample1_rep2_R2.fastq.gz - sample1 sample2 sample2_rep1 /path/to/sample2_rep1_R1.fastq.gz /path/to/sample2_rep1_R2.fastq.gz - sample2 sample2 sample2_rep2 /path/to/sample2_rep2_R1.fastq.gz /path/to/sample2_R2_rep1.fastq.gz - sample2
The columns represent:
- MergeID: The merge ID that will be used should your files be merged together. Should be the same for all replicates.
- ID: The ID that will be used to name the majority of your files that are not merged. Recommended to be used to differentiate between different technical replicates.
- Path1: The path to the first fastq file to be processed. Can be gzipped or not.
- Path2: The path to the second fastq file to be processed. Can be gzipped or not.
- ControlID: The ID indicating what control file to be used for peak calling and other downstream analysis. Use “-” (without quotes) if there is no control for a particular sample.
- Mark: The ID that signifies the type of mark or histone being processed. Use “input” if the line refers to a control. If the line is NOT a control, then use the MergeID name.
- Simple MNase-seq Tutorial (single-end, 75 length reads)
nextflow run /path/to/main.nf --mode gro --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75
- Simple MNase-seq Tutorial (pair-end, 75 length reads)
nextflow run /path/to/main.nf --mode gro --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib p --readLen 75
- Simple MNase-seq Tutorial (single-end, 75 length reads, use bowtie2 aligner instead of default bbmap, use 5 threads)
nextflow run /path/to/main.nf --mode gro --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75 --aligner bowtie2 --threads 5
ATAC-seq Workflow¶
All workflows require the following files:
- A config file (described below)
- Reference genome FASTA file
- Reference genome GTF file
Config File A config file is a tab separated text file that includes information regarding the name, location, and input of your experiment.
Single-End Config This is the config file format for single-ended data.
sample1 sample1_rep1 /path/to/sample1_rep1.fastq.gz - sample1 sample1 sample1_rep2 /path/to/sample1_rep2.fastq.gz - sample1 sample2 sample2_rep1 /path/to/sample2_rep1.fastq.gz control2 sample2 sample2 sample2_rep2 /path/to/sample2_rep2.fastq.gz control2 sample2
The columns represent:
- MergeID: The merge ID that will be used should your files be merged together. Should be the same for all replicates.
- ID: The ID that will be used to name the majority of your files that are not merged. Recommended to be used to differentiate between different technical replicates.
- Path: The path to the fastq file to be processed. Can be gzipped or not.
- ControlID: The ID indicating what control file to be used for peak calling and other downstream analysis. Use “-” (without quotes) if there is no control for a particular sample.
- Mark: The ID that signifies the type of mark or histone being processed. Use “input” if the line refers to a control. If the line is NOT a control, then use the MergeID name.
Pair-End Config This is the config file format for pair-ended data.
sample1 sample1_rep1 /path/to/sample1_rep1_R1.fastq.gz /path/to/sample1_rep1_R2.fastq.gz - sample1 sample1 sample1_rep2 /path/to/sample1_rep2_R1.fastq.gz /path/to/sample1_rep2_R2.fastq.gz - sample1 sample2 sample2_rep1 /path/to/sample2_rep1_R1.fastq.gz /path/to/sample2_rep1_R2.fastq.gz - sample2 sample2 sample2_rep2 /path/to/sample2_rep2_R1.fastq.gz /path/to/sample2_R2_rep1.fastq.gz - sample2
The columns represent:
- MergeID: The merge ID that will be used should your files be merged together. Should be the same for all replicates.
- ID: The ID that will be used to name the majority of your files that are not merged. Recommended to be used to differentiate between different technical replicates.
- Path1: The path to the first fastq file to be processed. Can be gzipped or not.
- Path2: The path to the second fastq file to be processed. Can be gzipped or not.
- ControlID: The ID indicating what control file to be used for peak calling and other downstream analysis. Use “-” (without quotes) if there is no control for a particular sample.
- Mark: The ID that signifies the type of mark or histone being processed. Use “input” if the line refers to a control. If the line is NOT a control, then use the MergeID name.
- Simple MNase-seq Tutorial (single-end, 75 length reads)
nextflow run /path/to/main.nf --mode atac --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75
- Simple MNase-seq Tutorial (pair-end, 75 length reads)
nextflow run /path/to/main.nf --mode atac --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib p --readLen 75
- Simple MNase-seq Tutorial (single-end, 75 length reads, use bowtie2 aligner instead of default bbmap, use 5 threads)
nextflow run /path/to/main.nf --mode atac --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75 --aligner bowtie2 --threads 5
Indices and tables¶
CIPHER is a workflow platform written in the Nextflow DSL that was developed to enhance reproducibility among research, and to simplify data processing for non-computational scientists. CIPHER can be used for the efficient pre-processing and analysis of high-throughput sequencing data including: ChIP-seq, RNA-seq, DNase-seq, MNase-seq, GRO-seq, and ATAC-seq.
For support, questions, or feature requests contact: cag104@ucsd.edu or submit an issue at our github.
CIPHER is run from the command line, can be run on a local desktop or HPC and only requires the installation of Nextflow and Docker (optionally). Please visit the ‘Installation’ page for more details.