A short list of tools for shotgun metagenomics

We are currently finalizing our analysis of metagenomes from soils sampled in the Etosha National Park, Namibia. These soils are interesting because zebra blood containing the nasty bug Bacillus anthracis had poured into them. Our aim was to follow the changes in the soil microbial communities over 30 days using shotgun metagenomics. Once our paper is published, I will describe the results of that work in detail on this blog. Here I will just list the tools we used.

This list is far from complete, and I urge you to test other methods as well. Shotgun metagenomic analysis is a rapidly changing field and new tools come out every week, so use my list as a starting point and then move on.


Taxonomic classification of metagenomic contigs (> 1000 bp) using the IMG/ER system. Shown are phyla belonging to Bacteria, Archaea, Eukaryotes and Viruses.

Overview of methods used in our work:

Data clean-up
• Removal of low-quality data & adapters: Cutadapt (Martin, 2011).
• Removal of artificial duplicates & low complexity sequences: Prinseq (Schmieder and Edwards, 2011).
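
To make the second clean-up step more concrete, here is a minimal Python sketch of what "removing duplicates and low-complexity sequences" can mean. It is not how Prinseq is implemented: the function names, the use of plain base-composition entropy and the threshold `min_entropy=1.0` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the base composition of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def filter_reads(reads, min_entropy=1.0):
    """Drop exact duplicates and reads with low base-composition entropy.

    `reads` is an iterable of (name, sequence) pairs. `min_entropy` is an
    illustrative threshold, not a Prinseq default.
    """
    seen = set()
    kept = []
    for name, seq in reads:
        key = seq.upper()
        if key in seen:
            continue  # exact duplicate of a sequence seen earlier
        seen.add(key)
        if shannon_entropy(key) < min_entropy:
            continue  # low complexity, e.g. a homopolymer run
        kept.append((name, seq))
    return kept
```

Only exact duplicates are caught here. For real data, use the dedicated tools listed above.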

Mapping of metagenome reads to reference genomes
BWA with the aln algorithm, not allowing mismatches (Li, 2013).
The presence of closely related species (B. anthracis / B. cereus) requires stringent mapping. We obtained strange results with BWA-MEM (standard settings).
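
To illustrate what "not allowing mismatches" means at the level of the alignments, here is a small Python sketch that filters SAM text and keeps only reads aligned over their full length with an edit distance of zero. This is an assumed post-hoc filter, not the mapping command itself. The function names are hypothetical, and the sketch relies on the optional NM tag being present in the SAM output.

```python
import re


def is_perfect_match(sam_line):
    """True if a SAM alignment is mapped, ungapped, unclipped and has NM:i:0."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return False  # not a complete alignment record
    flag = int(fields[1])
    if flag & 0x4:
        return False  # bit 0x4 marks an unmapped read
    if not re.fullmatch(r"\d+M", fields[5]):
        return False  # CIGAR contains indels or clipping
    return "NM:i:0" in fields[11:]  # edit distance to the reference is zero


def keep_perfect(sam_lines):
    """Yield header lines and perfectly matching alignments only."""
    for line in sam_lines:
        if line.startswith("@") or is_perfect_match(line):
            yield line
```

With closely related genomes such as B. anthracis and B. cereus, even a single tolerated mismatch can let reads from one species map to the other, which is why such a strict criterion is useful.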

Extraction and classification of 16S / 18S rRNA from metagenomes
Metaxa2 (Bengtsson-Palme et al., 2015)

Taxonomic and functional classification of metagenomic shotgun sequences
DIAMOND (blastx) and MEGAN6 (Buchfink et al., 2014; Huson et al., 2007)

Other tools worth trying with your data:
• One Codex (online: https://www.onecodex.com/ )
• Kraken (Wood and Salzberg, 2014)
• MetaPhlAn2 (Truong et al., 2015)
• PhyloSift (Darling et al., 2014), which can also be used to check your metagenome bins / contigs (see below).

Average genome size estimation
MicrobeCensus estimates the average genome size and the number of genome equivalents (the number of genomes sequenced). These values can be used to normalize counts when comparing different metagenomes (Nayfach and Pollard, 2015).
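
As a rough sketch of how such a normalization works: the number of genome equivalents is the total number of sequenced bases divided by the average genome size, and a gene count can then be expressed as reads per kilobase per genome equivalent (RPKG). The Python below only shows the arithmetic. The function names and the numbers in the usage example are made up for illustration and are not results from our data.

```python
def genome_equivalents(total_bases, average_genome_size):
    """Number of genomes sequenced: total bases divided by average genome size (bp)."""
    if average_genome_size <= 0:
        raise ValueError("average_genome_size must be positive")
    return total_bases / average_genome_size


def rpkg(mapped_reads, gene_length_bp, genome_equivs):
    """Reads per kilobase of gene per genome equivalent."""
    if gene_length_bp <= 0 or genome_equivs <= 0:
        raise ValueError("gene_length_bp and genome_equivs must be positive")
    return mapped_reads / (gene_length_bp / 1000) / genome_equivs


# Hypothetical example: 3 Gbp of sequence and an average genome size of 3 Mbp
# correspond to 1000 genome equivalents.
ge = genome_equivalents(3_000_000_000, 3_000_000)
abundance = rpkg(mapped_reads=300, gene_length_bp=1500, genome_equivs=ge)
```

Because the value is scaled by genome equivalents rather than by the total read count, two metagenomes with different average genome sizes become comparable.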

Metagenome assembly
Megahit (Li et al., 2015)

Metagenome assembly binning
Maxbin2 (Wu et al., 2015)

Metagenome bin checking and quality control
CheckM (Parks et al., 2015)

Interesting resources:
Publications that are worth reading when using shotgun metagenomics to obtain complete genomes:
• Albertsen et al., 2013 (see also the excellent tutorial: http://madsalbertsen.github.io/multi-metagenome/docs/step9.html )
• Speth et al., 2016


  • Albertsen M, Hugenholtz P, Skarshewski A, Nielsen KL, Tyson GW, Nielsen PH. (2013). Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat Biotechnol 31: 533–538.
  • Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, et al. (2015). METAXA2: improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Mol Ecol Resour 15: 1403–1414.
  • Buchfink B, Xie C, Huson DH. (2014). Fast and sensitive protein alignment using DIAMOND. Nat Meth 12: 59–60.
  • Darling AE, Jospin G, Lowe E, Matsen FA, Bik HM, Eisen JA. (2014). PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2: e243.
  • Huson DH, Auch AF, Qi J, Schuster SC. (2007). MEGAN analysis of metagenomic data. Genome Research 17: 377–386.
  • Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. (2015). MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31: 1674–1676.
  • Li H. (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997.
  • Martin M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17: 10–12.
  • Nayfach S, Pollard KS. (2015). Average genome size estimation improves comparative metagenomics and sheds light on the functional ecology of the human microbiome. Genome Biol 16: 51.
  • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research 25: 1043–1055.
  • Schmieder R, Edwards R. (2011). Quality control and preprocessing of metagenomic datasets. Bioinformatics 27: 863–864.
  • Speth DR, In ‘t Zandt MH, Guerrero-Cruz S, Dutilh BE, Jetten MSM. (2016). Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system. Nat Commun 7: 11172.
  • Truong DT, Franzosa EA, Tickle TL, Scholz M, Weingart G, Pasolli E, et al. (2015). MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat Meth 12: 902–903.
  • Wood DE, Salzberg SL. (2014). Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 15: R46.
  • Wu Y-W, Simmons BA, Singer SW. (2015). MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics btv638.

About Thomas Haverkamp

A microbial ecologist, an amateur photographer and a proud father of a wonderful girl.