Submitted to: Frontiers in Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/25/2018
Publication Date: 2/26/2018
Citation: Keel, B.N., Snelling, W.M. 2018. Comparison of Burrows-Wheeler transform-based mapping algorithms used in high-throughput whole-genome sequencing: application to Illumina data for livestock genomes. Frontiers in Genetics. 9:35. https://doi.org/10.3389/fgene.2018.00035.
Interpretive Summary: Ongoing technological developments and declining costs have led to increasingly widespread application of next-generation sequencing (NGS), which has greatly advanced the fields of genetics and genomics. The evolution of NGS has been paralleled by the development of software to analyze the growing quantity of data being produced. A fundamental step in the analysis of NGS data is the alignment of sequence reads to a reference genome. Fast, accurate alignment is essential because the quality of this step determines the overall quality of all downstream analyses. More than 60 programs are currently available for aligning reads to a reference genome, so choosing an appropriate alignment tool has proven to be a challenging task. ARS scientists used simulated NGS data to evaluate the computational efficiency and accuracy of three aligners that are widely used in analyses of large mammalian genomes. The results show that no single alignment tool is ideal in all scenarios; rather, the choice of aligner should be driven by the application and the sequencing technology. This work provides ARS scientists and other researchers with guidelines for selecting an accurate and efficient alignment tool for Illumina sequencing data from large mammalian genomes, which should improve the quality of subsequent data analyses.
Technical Abstract: Ongoing technological developments and declining costs have led to increasingly widespread application of next-generation sequencing (NGS), which has greatly advanced the fields of genetics and genomics. Mapping sequence reads onto a reference genome is a fundamental step in the analysis of NGS data. Fast, accurate alignment of the reads to the reference genome is essential because the quality of this step determines the overall quality of all downstream analyses. In this study, we evaluate the performance of three Burrows-Wheeler transform-based mappers, BWA, Bowtie2, and HISAT2, in the context of paired-end Illumina whole-genome sequencing of livestock. We used simulated sequence data sets with varying read lengths, insert sizes, and levels of genomic coverage, as well as five real data sets. The mappers were evaluated on two criteria: computational resource and time requirements, and robustness of mapping. Our results show that BWA and Bowtie2 tended to be more robust than HISAT2, whereas HISAT2 was significantly faster and used less memory than either BWA or Bowtie2. We conclude that no single mapper is ideal in all scenarios; rather, the choice of alignment tool should be driven by the application and the sequencing technology.