Location: Immunity and Disease Prevention Research
Title: Pre- and post-sequencing recommendations for functional annotation of human fecal metagenomes
|TREIBER, MICHELLE - University Of California, Davis|
|TAFT, DIANA - University Of California, Davis|
|KORF, IAN - University Of California, Davis|
|MILLS, DAVID - University Of California, Davis|
Submitted to: BMC Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/17/2020
Publication Date: 2/24/2020
Citation: Treiber, M.L., Taft, D.H., Korf, I., Mills, D.A., Lemay, D.G. 2020. Pre- and post-sequencing recommendations for functional annotation of human fecal metagenomes. BMC Bioinformatics. (2020)21:74. https://doi.org/10.1186/s12859-020-3416-y.
Interpretive Summary: To understand the interaction between diet, the gut microbiome, and human health, it is necessary to assess the functional capacity of gut microbes. This can be done by sequencing all the DNA (metagenomes) in human fecal samples and then analyzing those sequences. The purpose of the current study was to determine how practical choices made before and after sequencing impact the ability to quantify the functional capacity of gut microbes. Two types of experiments were conducted to evaluate how long the sequencing reads should be, how many sequencing reads are needed per sample, whether the DNA should be size-selected before sequencing, which protein database should be used for analysis, and which thresholds should be used for detection. The first type of experiment involved the collection of protein sequences of known function, the simulation of metagenome data from these known sequences, and the evaluation of how parameter choices impacted the ability to accurately detect these known sequences. The second type of experiment involved the use of real human fecal metagenomes to assess other parameters. This work resulted in specific recommendations for assessing the functional capacity of human fecal microbiomes: a minimum of 5 million mergeable reads from DNA sequenced in a 2x150 bp format, with size selection to enable merging of overlapping paired-end reads, which would then be mapped to a custom database with a detection threshold matched to the merged read length.
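The link between size selection and mergeable reads comes down to simple arithmetic: two 150 bp reads from opposite ends of a fragment can only be merged if the fragment is short enough for them to overlap. The sketch below illustrates that relationship. The function name and the default minimum overlap of 10 bp are assumptions made for illustration, not values taken from the study.

```python
from typing import Optional


def merged_length(read_len: int, fragment_len: int, min_overlap: int = 10) -> Optional[int]:
    """Return the merged read length for one read pair, or None if it cannot merge.

    read_len     -- length of each read in the pair (150 for a 2x150 bp run)
    fragment_len -- length of the sequenced DNA fragment (insert)
    min_overlap  -- hypothetical minimum overlap a merging tool would require
    """
    # Forward and reverse reads together cover 2 * read_len bases;
    # whatever exceeds the fragment length is the overlapping region.
    overlap = 2 * read_len - fragment_len
    if overlap < min_overlap:
        return None  # fragment too long: the two reads do not overlap enough
    # A merged pair spans the whole fragment.
    return fragment_len


# A 250 bp fragment leaves a 50 bp overlap, so the pair merges to 250 bp.
print(merged_length(150, 250))
# A 400 bp fragment leaves a gap between the reads, so the pair cannot merge.
print(merged_length(150, 400))
```

Under these assumptions, size selection for fragments comfortably shorter than 300 bp is what makes most pairs from a 2x150 bp run mergeable.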
Technical Abstract: Shotgun metagenomes are often assembled prior to annotation of genes, which biases the functional capacity of a community towards its few most abundant members. For an unbiased assessment of community function, short reads need to be mapped directly to a gene or protein database. The ability to detect genes in short-read sequences is dependent on pre- and post-sequencing decisions. The objective of the current study was to determine how library size selection, read length and format, protein database, e-value threshold, and sequencing depth impact gene-centric analysis of human fecal microbiomes when using DIAMOND, an alignment tool that is up to 20,000 times faster than BLASTX.
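To make the role of the e-value threshold concrete, the following minimal sketch filters alignment hits in the standard 12-column BLAST tabular layout, which DIAMOND also writes by default. The function name, the sample hits, and the cutoff of 1e-5 are placeholders for illustration; the abstract does not state the thresholds the authors recommend.

```python
import csv
import io

# Column names of the standard 12-column BLAST tabular format.
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tsv_text: str, max_evalue: float) -> list:
    """Keep only hits whose e-value is at or below max_evalue."""
    kept = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(BLAST6_COLUMNS, row))
        if float(hit["evalue"]) <= max_evalue:
            kept.append(hit)
    return kept


# Two made-up hits: one strong alignment and one weak alignment.
sample = (
    "read1\tprotA\t98.0\t80\t1\t0\t1\t240\t10\t89\t1e-20\t150.0\n"
    "read2\tprotB\t35.0\t30\t19\t0\t1\t90\t5\t34\t0.5\t25.0\n"
)
hits = filter_hits(sample, max_evalue=1e-5)  # placeholder cutoff
print([h["qseqid"] for h in hits])
```

A stricter cutoff discards more spurious matches but also more true hits from short reads, which is why the threshold has to be chosen with the read length in mind.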