Welcome to Metagenomics Training Material RoundUp’s documentation!

Sections

List of Training Materials

Background Material

Factors influencing course design
  • Organisms
  • Audience/Student expertise
  • Course length
  • Compute resources
  • Trainer knowledge
  • Environment/Biome
  • Available Datasets
Background
  • Experimental Design
  • Computer Literacy
  • Cloud Computing
  • Sequencing options
  • Statistical power

Pre-Processing

Data treatment/pre-processing
  • Quality
  • Filtering Reads
  • Trimming Reads
  • Contamination
  • Barcodes / multiplexing
  • Read Merging
Links to resources that cover several of the above topics
Assembly
  • Short reads
  • Long reads
  • Hybrid
  • Data reduction / Clustering / Dereplication
Links to resources that cover several of the above topics

Types of Analysis

Marker gene amplicons
  • 16S rRNA, 18S rRNA, ITS1/2
  • Taxonomic profiling
  • OTU clustering
  • taxonomy -> function inferences
Links to resources that cover several of the above topics
Shotgun metagenomics & metatranscriptomics
  • To assemble or not?
  • To gene call or not?
  • Kmers (“reference-free”)
  • Taxonomic profiling
  • Coverage estimation
  • Reference mapping / alignment
  • Gene Catalogues
  • Marker-Based approaches
  • Read-based approaches
  • Functional profiling/annotation
  • Functional hierarchies / ontologies
  • Depth comparisons
  • Binning
  • Whole genome assembly and evaluation
  • Pathway Analysis / Metabolic modeling
  • Cross assembly

Comparative Analyses

Statistics and visualization
  • Metadata integration
  • Sample comparison
  • Ecological measurements/indexes
  • Multivariate and comparative statistics
  • Machine Learning
Links to resources that cover several of the above topics

Data treatment/pre-processing

  • Quality
  • Filtering Reads
  • Trimming Reads
  • Contamination
  • Barcodes / multiplexing
  • Read Merging

Learning Objectives

  • Assess the overall quality of NGS (FastQ format) sequence reads
  • Visualise the quality, and other associated metrics, of reads to decide on filters and cutoffs for cleaning up data ready for downstream analysis
  • Clean up adaptors and pre-process the sequence data for further analysis
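
As an illustration of the first two objectives, the following is a minimal sketch of per-read quality assessment. It assumes a plain four-line-per-record FastQ file with Phred+33 encoded qualities; the function names and the mean-quality cutoff of 20 are illustrative choices, not part of any course material listed here. Dedicated tools cover far more (adaptor detection, per-position plots) and should be preferred in practice.

```python
import io
from statistics import mean


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FastQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def phred_scores(quality_string, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def filter_by_mean_quality(records, min_mean_q=20):
    """Keep reads whose mean Phred score reaches the (hypothetical) cutoff."""
    for name, seq, qual in records:
        if qual and mean(phred_scores(qual)) >= min_mean_q:
            yield name, seq, qual


# Two toy reads: 'I' encodes Q40 and '!' encodes Q0 under Phred+33.
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\n!!!!\n"
)
kept = [name for name, _, _ in filter_by_mean_quality(read_fastq(example))]
print(kept)
```

Looking at the distribution of mean scores before choosing a cutoff is the point of the second objective; the fixed threshold here only keeps the example short.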

Examples of Tutorial with Specific Learning Objectives

crAssphage Assembly Workshop

Amplicon analyses

Marker gene amplicons

  • 16S rRNA, 18S rRNA, ITS1/2
  • Taxonomic profiling
  • OTU clustering
  • taxonomy -> function inferences

Learning objectives

  • Define a marker gene and its taxonomic relevance.
  • Distinguish between amplicon vs. shotgun approach.
  • Apply a marker gene analysis workflow
  • Analyse the taxonomic composition of an environment.
  • Interpret results in biological context
  • Describe and execute an amplicon workflow
  • Install software for amplicon analysis
  • Assemble paired-end reads
  • Execute a shell script to automate a process
  • Describe input and output files for amplicon workflows and scripts
  • Describe the structure and components of a good mapping/metadata file
  • Move sequences from compute resources to local computer
  • Obtain summary information about sequence files (fasta, fna, fastq)
  • Define operational taxonomic units (OTUs)
  • Align sequences, assign taxonomy, and build a tree with representative sequences from OTU definitions
  • Calculate similarity of two samples (similarity matrices)
  • Visualize comparative diversity across a priori categorical groups
  • Convert .biom formatted OTU tables to text files for use outside of QIIME
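
The objective "Obtain summary information about sequence files" can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the input is plain FASTA (fasta/fna), the identifier is taken as the first whitespace-delimited token of the header, and the function names are hypothetical rather than drawn from QIIME or any listed tutorial.

```python
import io


def fasta_lengths(handle):
    """Return {sequence_id: length} for a FASTA stream (multi-line records allowed)."""
    lengths = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def summarize(lengths):
    """Summarise sequence count and length statistics."""
    values = sorted(lengths.values())
    if not values:
        return {"n_seqs": 0, "total_bases": 0}
    total = sum(values)
    return {
        "n_seqs": len(values),
        "total_bases": total,
        "min_len": values[0],
        "max_len": values[-1],
        "mean_len": total / len(values),
    }


example = io.StringIO(">seq1 sample=A\nACGTAC\nGT\n>seq2\nACG\n")
summary = summarize(fasta_lengths(example))
print(summary)
```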

Shotgun Analyses

Shotgun metagenomics & metatranscriptomics

  • To assemble or not?
  • To gene call or not?
  • Kmers (“reference-free”)
  • Taxonomic profiling
  • Coverage estimation
  • Reference mapping / alignment
  • Gene Catalogues
  • Marker-Based approaches
  • Read-based approaches
  • Functional profiling/annotation
  • Functional hierarchies / ontologies
  • Depth comparisons
  • Binning
  • Whole genome assembly and evaluation
  • Pathway Analysis / Metabolic modeling
  • Cross assembly

Learning objectives

Shotgun metagenomics & metatranscriptomics (Intro):

  • Discern the difference between genetic potential and expressed genes.
  • Evaluate the advantages and disadvantages of metatranscriptomics (metaT) and metagenomics (metaG)
  • Be aware of the differences in cost and sequencing depth between metaT and metaG

Reference mapping / alignment:

  • Understand approaches to the elimination of host-associated or contaminating material
  • Apply reference mapping to assess community structure
  • Interpret the mapping outcomes as representative abundance of taxa.
  • Compare the mapping approach with the “reference-free” approach.

Taxonomic profiling:

  • Compare taxonomic profiling using metagenomics and amplicon-based approaches.
  • Utilise both marker-based and all-read methods for determining taxonomic profiles from metagenomic data
  • Apply one or more methods to create a taxonomic profile
  • Distinguish reference taxonomies and database limitations.
  • Apply appropriate statistical methods for comparative analysis.
  • Demonstrate knowledge of the limitations of quantifying microbes as relative abundances
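
The last objective, on the limits of relative abundances, can be made concrete with a small sketch. The taxa and counts below are invented for illustration: one taxon triples in absolute terms while the other does not change, yet the unchanged taxon appears to decline once the data are expressed as proportions.

```python
def relative_abundance(counts):
    """Convert absolute counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical absolute cell counts in two samples.
before = {"taxon_A": 100, "taxon_B": 100}
after = {"taxon_A": 300, "taxon_B": 100}  # only taxon_A actually changed

rel_before = relative_abundance(before)
rel_after = relative_abundance(after)

# taxon_B is constant in absolute terms but its proportion halves.
print(rel_before["taxon_B"], rel_after["taxon_B"])
```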

Functional profiling:

  • Recognise the role and application of different reference databases in functional assignments
  • Use tools to perform functional annotations of nucleic acid and protein sequences found in metagenomics datasets.
  • Demonstrate an understanding of the relationship between ontologies (e.g. GO) and functional assignments as a means of hierarchically viewing the data.
  • Critically assess the functional assignments in terms of confidence, with clear understanding about the limitations of the algorithms and databases used.
  • Apply pathway gap-filling methods to overcome low coverage sequencing

Statistics and visualization - Comparative Analyses

  • Metadata integration
  • Sample comparison
  • Ecological measurements/indexes
  • Multivariate and comparative statistics
  • Machine Learning

Learning objectives

Normalization / appropriate comparisons:

  • Basic: applying DESeq for normalization
  • More advanced: raise awareness of, and discuss, the possible limitations of DESeq and rarefaction for metagenomics analysis
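
To make the discussion of rarefaction concrete, here is a minimal sketch of rarefying one sample: reads are subsampled without replacement to a fixed depth. It illustrates rarefaction only, not DESeq; the function name, the count table, and the seed are assumptions made for the example.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample a {taxon: count} table to `depth` reads without replacement."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"cannot rarefy {total} reads to depth {depth}")
    # Expand the table into one entry per read, then draw `depth` of them.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


sample = {"taxon_A": 50, "taxon_B": 30, "taxon_C": 20}  # hypothetical counts
rarefied = rarefy(sample, depth=40, seed=1)
print(rarefied)
```

Rare taxa can drop out entirely at low depths and reads above the chosen depth are discarded, which is one of the limitations the more advanced objective asks students to discuss.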

Ecological measurements/indexes:

Alpha diversity (Chao, Simpson, etc.)

  • Basic: knowing that alpha diversity refers to the diversity of a single sample. Understanding what the inputs are, and that it can be computed/reasoned about at different taxonomic levels (species, genus, …). Ability to use a software package to compute it.
  • More advanced: interpreting the values, knowing which measures are more robust to low-abundance taxa, different sampling depths, etc. Understanding the effects of “dark matter” when using reference-based methods (i.e., microbes that do not match the reference being employed would not be counted towards diversity).
  • Advanced: knowing the formulas and understanding the underlying ecological/mathematical assumptions.
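
For the "knowing the formulas" level, the sketch below computes three common alpha diversity measures from a list of per-taxon counts for a single sample. It assumes the Shannon index with natural logarithm, the Gini-Simpson form of the Simpson index (1 minus the sum of squared proportions), and the bias-corrected Chao1 estimator; other conventions exist, so results from a given software package may differ.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def gini_simpson(counts):
    """Gini-Simpson index: 1 - sum(p^2)."""
    total = sum(counts)
    return 1.0 - sum((n / total) ** 2 for n in counts)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1))."""
    observed = sum(1 for n in counts if n > 0)
    singletons = sum(1 for n in counts if n == 1)
    doubletons = sum(1 for n in counts if n == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


even = [10, 10, 10, 10]  # hypothetical evenly distributed community
skewed = [37, 1, 1, 1]   # same richness and depth, one dominant taxon

for name, counts in (("even", even), ("skewed", skewed)):
    print(name, shannon(counts), gini_simpson(counts), chao1(counts))
```

Both toy samples have the same observed richness, but the skewed one scores lower on Shannon and Gini-Simpson, while its singletons push the Chao1 estimate above the observed richness.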

Beta diversity (Bray-Curtis distances, UniFrac, etc.)

  • Basic: knowing that beta diversity refers to diversity across samples (differing definitions in the field). Similar to alpha diversity, it can be computed at different taxonomic (or functional) levels. Ability to use a software package.
  • Intermediate: Understanding the effects of different normalization procedures.
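
A minimal sketch of Bray-Curtis dissimilarity, which also shows why the choice of normalization matters. The two samples below are invented: they have identical composition but differ tenfold in sequencing depth. The function names are illustrative.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: abundance} tables."""
    taxa = set(a) | set(b)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    diff = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    return diff / total


def to_proportions(counts):
    """Total-sum scaling: divide each count by the sample total."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


shallow = {"taxon_X": 6, "taxon_Y": 4}   # 10 reads
deep = {"taxon_X": 60, "taxon_Y": 40}    # 100 reads, same proportions

raw = bray_curtis(shallow, deep)
scaled = bray_curtis(to_proportions(shallow), to_proportions(deep))
print(raw, scaled)
```

On raw counts the samples look very different purely because of depth; after scaling to proportions the dissimilarity is zero.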

Multivariate statistics (Differential expression, comparison)

  • General statistics knowledge, not specific to metagenomics (p-values, multiple comparison correction, difference between statistical significance and effect size, training/testing split in machine learning).
  • Knowing how auto-correlation can lead to spurious correlations (ecological studies).
  • More advanced: knowing which methods attempt to correct for auto-correlation and how to apply them in software.
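
Multiple comparison correction, mentioned in the first point above, can be illustrated with the Benjamini-Hochberg procedure, one common way to control the false discovery rate when many taxa or genes are tested at once. This is a sketch of that one method only; the p-values are invented, and statistical packages provide tested implementations.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw_p = [0.01, 0.04, 0.03, 0.20]  # hypothetical per-taxon p-values
adj_p = benjamini_hochberg(raw_p)
print(adj_p)
```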

Machine Learning

  • Basic: knowing that an ordination plot represents distance between samples in a low dimensional space. Knowing that the output represents the input distance matrix (i.e., different input matrices would lead to different results: see discussion on beta diversity). Being able to generate a plot in software.
  • Intermediate: knowing what the “fraction of explained variance” represents.
  • Advanced: understanding the difference between the different methods.
