Submitted to: Data in Brief
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/13/2016
Publication Date: 7/19/2016
Citation: Myer, P.R., Kim, M.S., Freetly, H.C., Smith, T.P. 2016. Metagenomic and near full-length 16S rRNA sequence data in support of the phylogenetic analysis of the rumen bacterial community in steers. Data in Brief. 8:1048-1053. doi: 10.1016/j.dib.2016.07.027.
Interpretive Summary: The development of high-throughput sequencing platforms over the past decade has enabled the analysis of microbial communities by targeting a single gene, the 16S ribosomal RNA gene, that all known bacteria share. Amplifying and sequencing this gene from DNA purified from the complex community of microbes in a specific environment, such as the rumen of cattle, yields a profile of the microbial content, because some areas of the 16S gene are shared among many bacteria while other regions are specific to genera or species. There are nine such variable regions, but the existing “short read” high-throughput sequencing platforms capture only one to three of them, limiting the resolution with which the community can be divided into component species. This manuscript evaluates new technologies for examining up to eight of the nine variable regions of the 16S gene, to characterize the increase in resolution that may be possible, specifically in studies of the bovine rumen. The manuscript documents and quantifies the improvement obtained with longer-read technology and suggests an improved path for characterizing rumen bacterial populations.
Technical Abstract: Next-generation sequencing technologies have vastly changed how the 16S rRNA gene is sequenced for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplification primer selection, and read length, which can affect the apparent microbial community. In this study, we compared short-read sequences of 16S rRNA variable regions V1-V3 with near full-length sequences spanning V1-V8, using highly diverse steer rumen microbial communities, in order to examine the impact of technology selection on phylogenetic profiles. Short paired-end reads from the Illumina MiSeq platform were used to generate V1-V3 sequence, while long "circular consensus" reads from the Pacific Biosciences RSII instrument were used to generate V1-V8 data. The two platforms predicted similar microbial operational taxonomic units (OTUs) as well as similar species richness, Good's coverage, and Shannon diversity. However, the V1-V8 amplified ruminal community showed significant increases in several taxa, such as the phyla Proteobacteria and Verrucomicrobia (P < 0.05). Taxonomic classification accuracy was also greater with the near full-length reads. UniFrac distance matrices with jackknifed UPGMA clustering also showed differences between the communities. These data support the consensus that longer reads yield a finer phylogenetic resolution that may not be achieved with shorter 16S rRNA gene fragments. Our work on the cattle rumen bacterial community demonstrates that near full-length 16S reads may be useful for conducting a more thorough study, or for developing a niche-specific database to use in analyzing data from shorter-read technologies when budgetary constraints preclude near full-length 16S sequencing.
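The abstract compares the two platforms on species richness, Good's coverage, and Shannon diversity but does not say how these were computed. As a minimal sketch, assuming the standard textbook definitions (observed richness as the number of OTUs with at least one read, Good's coverage as 1 - F1/N where F1 is the number of singleton OTUs and N the total read count, and Shannon diversity with the natural logarithm), the three metrics can be derived from a vector of per-OTU read counts. The function names and the example counts are hypothetical and are not taken from the study.

```python
import math


def observed_richness(counts):
    """Number of OTUs observed at least once."""
    return sum(1 for c in counts if c > 0)


def goods_coverage(counts):
    """Good's coverage: 1 - (singleton OTUs / total reads)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one read is required")
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / total


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over OTUs with reads.

    Assumption: natural log is used; some pipelines report log base 2.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one read is required")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


if __name__ == "__main__":
    # Made-up per-OTU read counts for one sample (illustration only).
    sample = [50, 30, 15, 3, 1, 1]
    print("richness:", observed_richness(sample))
    print("Good's coverage:", goods_coverage(sample))
    print("Shannon diversity:", shannon_diversity(sample))
```

Because all three metrics depend on sequencing depth, samples are usually rarefied to a common read count before platforms are compared; that step is omitted here.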