Welcome to Metagenomics Training Material RoundUp’s documentation!¶
Sections¶
- Metagenomics Training Available
- Learning Objectives
List of Training Materials¶
Background Material¶
Factors influencing course design¶
- Organisms
- Audience/Student expertise
- Course length
- Compute resources
- Trainer knowledge
- Environment/Biome
- Available Datasets
Links to resources that cover several of the above topics¶
Background¶
- Experimental Design
- Computer Literacy
- Cloud Computing
- Sequencing options
- Statistical power
Links to resources that cover several of the above topics¶
Pre-Processing¶
Data treatment/pre-processing¶
- Quality
- Filtering Reads
- Trimming Reads
- Contamination
- Barcodes / multiplexing
- Read Merging
Links to resources that cover several of the above topics¶
Assembly¶
- Short reads
- Long reads
- Hybrid
- Data reduction / Clustering / Dereplication
Links to resources that cover several of the above topics¶
Types of Analysis¶
Marker gene amplicons¶
- 16S rRNA, 18S rRNA, ITS1/2
- Taxonomic profiling
- OTU clustering
- taxonomy -> function inferences
Links to resources that cover several of the above topics¶
Shotgun metagenomics & metatranscriptomics¶
- To assemble or not?
- To gene call or not?
- Kmers (“reference-free”)
- Taxonomic profiling
- Coverage estimation
- Reference mapping / alignment
- Gene Catalogues
- Marker-Based approaches
- Read-based approaches
- Functional profiling/annotation
- Functional hierarchies / ontologies
- Depth comparisons
- Binning
- Whole genome assembly and evaluation
- Pathway Analysis / Metabolic modeling
- Cross assembly
Links to resources that cover several of the above topics¶
Data treatment/pre-processing¶
- Quality
- Filtering Reads
- Trimming Reads
- Contamination
- Barcodes / multiplexing
- Read Merging
Learning Objectives¶
- Assess the overall quality of NGS (FastQ format) sequence reads
- Visualise the quality of reads, and other associated metrics, to decide on filters and cutoffs for cleaning up data ready for downstream analysis
- Clean up adaptors and pre-process the sequence data for further analysis
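The objectives above can be illustrated with a minimal sketch. It assumes Phred+33 encoded FASTQ quality strings, and the function names and the Q20 cutoff are hypothetical choices made for illustration, not taken from any particular tool.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(quality_line):
    """Arithmetic mean of the per-base Phred scores of one read."""
    scores = phred_scores(quality_line)
    return sum(scores) / len(scores)


def trim_3prime(sequence, quality_line, min_q=20):
    """Remove trailing bases whose Phred score is below min_q (hypothetical cutoff)."""
    scores = phred_scores(quality_line)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return sequence[:end], quality_line[:end]


# Toy read: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
seq, qual = "ACGTACGT", "IIIIII##"
print(mean_quality(qual))      # 30.5
print(trim_3prime(seq, qual))  # ('ACGTAC', 'IIIIII')
```

Dedicated quality-control and trimming tools do considerably more (adaptor detection, sliding windows, paired-end handling); the sketch only shows how a quality string maps to scores and how a cutoff turns into a trimming decision.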
Examples of Tutorials with Specific Learning Objectives¶
Amplicon analyses¶
Marker gene amplicons¶
- 16S rRNA, 18S rRNA, ITS1/2
- Taxonomic profiling
- OTU clustering
- taxonomy -> function inferences
Links to resources that cover several of the above topics¶
Learning objectives¶
- Define a marker gene and its taxonomic relevance.
- Distinguish between amplicon and shotgun approaches.
- Apply a marker gene analysis workflow
- Analyse the taxonomic composition of an environment.
- Interpret results in biological context
- Describe and execute an amplicon workflow
- Install software for amplicon analysis
- Assemble paired-end reads
- Execute a shell script to automate a process
- Describe input and output files for amplicon workflows and scripts
- Describe the structure and components of a good mapping/metadata file
- Move sequences from compute resources to local computer
- Obtain summary information about sequence files (fasta, fna, fastq)
- Define operational taxonomic units (OTUs)
- Align sequences, assign taxonomy, and build a tree with representative sequences from OTU definitions
- Calculate similarity of two samples (similarity matrices)
- Visualize comparative diversity across a priori categorical groups
- Convert .biom formatted OTU tables to text files for use outside of QIIME
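As one illustration of the objective on obtaining summary information about sequence files, the following sketch counts sequences and reports their lengths for FASTA input. The function name `fasta_summary` is hypothetical, and the input is a small in-memory example rather than a real dataset.

```python
import io


def fasta_summary(handle):
    """Return (number of sequences, total bases, shortest, longest) for a FASTA stream."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    if not lengths:
        return 0, 0, 0, 0
    return len(lengths), sum(lengths), min(lengths), max(lengths)


# Toy input held in memory; a real file would be passed as an open file handle.
toy = io.StringIO(">seq1\nACGTAC\nGT\n>seq2\nACG\n")
print(fasta_summary(toy))  # (2, 11, 3, 8)
```

The same loop works for `.fna` files, which use the FASTA layout. FASTQ needs a different parser because each record spans four lines.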
Shotgun Analyses¶
Shotgun metagenomics & metatranscriptomics¶
- To assemble or not?
- To gene call or not?
- Kmers (“reference-free”)
- Taxonomic profiling
- Coverage estimation
- Reference mapping / alignment
- Gene Catalogues
- Marker-Based approaches
- Read-based approaches
- Functional profiling/annotation
- Functional hierarchies / ontologies
- Depth comparisons
- Binning
- Whole genome assembly and evaluation
- Pathway Analysis / Metabolic modeling
- Cross assembly
Links to resources that cover several of the above topics¶
Learning objectives¶
Shotgun metagenomics & metatranscriptomics (Intro):
- Discern the difference between genetic potential and expressed genes.
- Evaluate the advantages and disadvantages of metaT and metaG
- Be aware of the differences in cost and sequencing depth between metaT and metaG
Reference mapping / alignment:
- Understand approaches to the elimination of host-associated or contaminating material
- Apply reference mapping to assess community structure
- Interpret the mapping outcomes as representative abundance of taxa.
- Compare the mapping approach with the “reference-free” approach.
Taxonomic profiling:
- Compare taxonomic profiling using metagenomics and amplicon-based approaches.
- Utilise both marker-based and all-read methods for determining taxonomic profiles from metagenomic data
- Apply one or more methods to create a taxonomic profile
- Distinguish reference taxonomies and database limitations.
- Apply appropriate statistical methods for comparative analysis.
- Demonstrate knowledge of the limitations of quantifying microbes as relative abundances
Functional profiling:
- Recognise the role and application of different reference databases in functional assignments
- Use tools to perform functional annotations of nucleic acid and protein sequences found in metagenomics datasets.
- Demonstrate an understanding of the relationship between ontologies (e.g. GO) and functional assignments as a means of hierarchically viewing the data.
- Critically assess the functional assignments in terms of confidence, with clear understanding about the limitations of the algorithms and databases used.
- Apply pathway gap-filling methods to overcome low coverage sequencing
Statistics and visualization - Comparative Analyses¶
- Metadata integration
- Sample comparison
- Ecological measurements/indexes
- Multivariate and comparative statistics
- Machine Learning
Links to resources that cover several of the above topics¶
Learning objectives¶
Normalization / appropriate comparisons:
- Basic: applying DESeq for normalization
- More advanced: raising awareness of, and discussing, the possible limitations of DESeq and rarefaction for metagenomics analysis
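To make the discussion of rarefaction concrete, here is a minimal sketch of subsampling a count table to an even depth, alongside simple total-sum scaling for contrast. This is not DESeq, which fits a statistical model in R. The function names, the fixed seed, and the toy counts are all assumptions made for illustration.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample a {taxon: count} table to `depth` reads without replacement."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)  # fixed seed so the toy example is repeatable
    return dict(Counter(rng.sample(pool, depth)))


def relative_abundance(counts):
    """Total-sum scaling: divide every count by the sample total."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


sample = {"taxon_a": 50, "taxon_b": 30, "taxon_c": 20}
print(rarefy(sample, depth=40))
print(relative_abundance(sample))
```

One limitation worth raising in class is visible here: rarefying discards reads, and a different seed gives a different table, so rare taxa can appear or vanish between runs.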
Ecological measurements/indexes:
Alpha diversity (Chao, Simpson, etc.)
- Basic: knowing that alpha diversity refers to the diversity of a single sample. Understanding what the inputs are, and that it can be computed/reasoned about at different taxonomic levels (species, genus, …). Ability to use a software package to compute it.
- More advanced: Interpreting the values, knowing which measures are more robust to low-abundance taxa, different sampling depths, etc. Understanding the effects of “dark matter” when using reference-based methods (i.e., microbes that do not match the reference being employed would not be counted towards diversity).
- Advanced: knowing the formulas and understanding the underlying ecological/mathematical assumptions.
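For the "knowing the formulas" level, a short sketch of three common alpha diversity measures may help. It assumes the Gini-Simpson form of the Simpson index (1 minus the sum of squared proportions), the natural-log Shannon index, and the bias-corrected Chao1 estimator. Other variants exist, and software packages differ in which one they report.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index: 1 - sum(p_i ** 2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)  # F1: taxa seen exactly once
    doubletons = sum(1 for c in counts if c == 2)  # F2: taxa seen exactly twice
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


even = [10, 10, 10, 10]   # four equally abundant taxa
skewed = [5, 1, 1, 2]     # two singletons and one doubleton
print(shannon(even), simpson(even), chao1(even))
print(shannon(skewed), simpson(skewed), chao1(skewed))
```

Chao1 depends only on the singleton and doubleton counts beyond the observed richness, which is why it is sensitive to sequencing depth and to how low-abundance taxa are filtered.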
Beta diversity (Bray-Curtis distances, UniFrac, etc.)
- Basic: knowing that beta diversity refers to diversity across samples (definitions differ in the field). Similar to alpha diversity, it can be computed at different taxonomic (or functional) levels. Ability to use a software package.
- Intermediate: Understanding the effects of different normalization procedures.
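A minimal sketch of the Bray-Curtis dissimilarity shows how the choice of normalization changes the result. The sample dictionaries and function names are hypothetical. UniFrac is not shown because it additionally requires a phylogenetic tree.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity: 1 - 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b))."""
    taxa = set(sample_a) | set(sample_b)
    shared = sum(min(sample_a.get(t, 0), sample_b.get(t, 0)) for t in taxa)
    total = sum(sample_a.values()) + sum(sample_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1.0 - 2.0 * shared / total


def relative_abundance(counts):
    """Scale counts so that each sample sums to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Same composition, tenfold difference in sequencing depth.
shallow = {"taxon_a": 10, "taxon_b": 10}
deep = {"taxon_a": 100, "taxon_b": 100}

print(bray_curtis(shallow, deep))  # large: raw counts differ in depth
print(bray_curtis(relative_abundance(shallow), relative_abundance(deep)))  # 0.0
```

On raw counts the two samples look very different purely because of depth; after scaling to relative abundances they are identical.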
Multivariate statistics (Differential expression, comparison)
- General statistics knowledge, not specific to metagenomics (p-values, multiple comparison correction, difference between statistical significance and effect size, training/testing split in machine learning).
- Knowing how auto-correlation can lead to spurious correlations (ecological studies).
- More advanced: knowing which methods attempt to correct for auto-correlation and how to apply them in software.
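Multiple comparison correction, mentioned above, can be sketched with the Benjamini-Hochberg procedure, one widely used way to control the false discovery rate. It is chosen here only as an example; the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values from testing four taxa for differential abundance.
p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(p))
```

With thousands of taxa or genes tested at once, an unadjusted 0.05 threshold would be expected to yield many false positives, which is the motivation for teaching the correction alongside p-values.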
Machine Learning
- Basic: knowing that an ordination plot represents distance between samples in a low dimensional space. Knowing that the output represents the input distance matrix (i.e., different input matrices would lead to different results: see discussion on beta diversity). Being able to generate a plot in software.
- Intermediate: knowing what the “fraction of explained variance” represents.
- Advanced: understanding the difference between the different methods.
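The "fraction of explained variance" can be illustrated with a small principal coordinates analysis (PCoA) sketch in plain Python. It assumes Euclidean-like distances, so that the centered matrix has no negative eigenvalues and its trace equals the total variance. Power iteration is used here only to avoid external dependencies; it is not how ordination packages compute eigenvalues, and the function names are hypothetical.

```python
import math


def gower_center(dist):
    """Double-center -0.5 * D**2, the first step of classical PCoA."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]


def top_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric positive semi-definite matrix (power iteration)."""
    n = len(matrix)
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the converged unit vector.
    return sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n))


def first_axis_fraction(dist):
    """Fraction of total variance captured by the first PCoA axis."""
    b = gower_center(dist)
    total = sum(b[i][i] for i in range(len(b)))  # trace = sum of eigenvalues
    if total == 0.0:
        raise ValueError("all samples are identical")
    return top_eigenvalue(b) / total


# Three samples lying on a single gradient: one axis captures all the variation.
on_a_line = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
print(first_axis_fraction(on_a_line))
```

Feeding a different distance matrix (for example Bray-Curtis on raw counts versus on relative abundances) into the same function gives a different fraction, which is the point made above about the output reflecting the input matrix.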