
Research Project: Grapevine Genetics, Genomics and Molecular Breeding for Disease Resistance, Abiotic Stress Tolerance, and Improved Fruit Quality

Location: Grape Genetics Research Unit (GGRU)

Title: Comparison of short-read sequence aligners indicates strengths and weaknesses for biologists to consider

Author
MUSICH, RYAN - ROCHESTER INSTITUTE OF TECHNOLOGY
Cadle-Davidson, Lance
OSIER, MICHAEL - ROCHESTER INSTITUTE OF TECHNOLOGY

Submitted to: Frontiers in Plant Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/29/2021
Publication Date: 4/16/2021
Citation: Musich, R., Cadle-Davidson, L.E., Osier, M.V. 2021. Comparison of short-read sequence aligners indicates strengths and weaknesses for biologists to consider. Frontiers in Plant Science. 12:657240. https://doi.org/10.3389/fpls.2021.657240.
DOI: https://doi.org/10.3389/fpls.2021.657240

Interpretive Summary: Aligning short DNA sequences to a long reference DNA sequence is the first step in most genomic analyses, but not all software programs perform equally. Choosing among the growing body of available alignment programs can be challenging. Here, we discuss the merits of common alignment programs in a way that should be approachable to biologists with limited experience in bioinformatics. To compare alignment programs (Bowtie2, BWA, HISAT2, MUMmer4, STAR, and TopHat2), we used an RNA-seq dataset containing data from 48 geographically distinct samples of the grapevine powdery mildew fungus. All aligners performed well with the exception of TopHat2. BWA perhaps had the best performance, except for genes longer than 500 base pairs, for which HISAT2 and STAR performed well. HISAT2 ran ~3-fold faster than the next-fastest aligner. This direct comparison of commonly used aligners can help biologists decide which tool to use for their specific sequencing data and objectives. No single tool meets all needs for every user, and there are many good aligners available.

Technical Abstract: Aligning short-read sequences is the foundational step in most genomic and transcriptomic analyses, but not all tools perform equally, and choosing among the growing body of available tools can be daunting. Here, we discuss the merits of common algorithms and programs in a way that should be approachable to biologists with limited experience in bioinformatics. We consider only in passing the effects of data cleanup, a precursor analysis to most alignment tools, and give no consideration to downstream processing of the aligned fragments. To compare aligners (Bowtie2, BWA, HISAT2, MUMmer4, STAR, and TopHat2), we used an RNA-seq dataset containing data from 48 geographically distinct samples of the grapevine powdery mildew fungus Erysiphe necator. Based on alignment rate and gene coverage, all aligners performed well with the exception of TopHat2, which HISAT2 superseded. BWA perhaps had the best performance in these metrics, except for longer transcripts (>500 bp), for which HISAT2 and STAR performed well. HISAT2 ran ~3-fold faster than the next-fastest aligner, although we consider runtime a secondary factor in most alignments. Ultimately, this direct comparison of commonly used aligners illustrates key considerations when choosing which tool to use for the specific sequencing data and objectives. No single tool meets all needs for every user, and there are many quality aligners available.
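
The abstract compares aligners on alignment rate and gene coverage, with runtime as a secondary factor. The sketch below shows one plausible way such metrics could be computed from per-run summary counts. It is an illustration only: the class `AlignerRun`, the functions, the definitions of the two metrics, and all numbers are assumptions made for this example and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AlignerRun:
    """Summary counts for one aligner on one sample (hypothetical structure)."""
    name: str
    total_reads: int
    aligned_reads: int
    runtime_seconds: float


def alignment_rate(run: AlignerRun) -> float:
    """Fraction of input reads that the aligner placed on the reference."""
    if run.total_reads == 0:
        return 0.0
    return run.aligned_reads / run.total_reads


def gene_coverage(covered_positions: Iterable[int], gene_start: int, gene_end: int) -> float:
    """Fraction of bases in [gene_start, gene_end) hit by at least one aligned read.

    This is an assumed definition; the study may define coverage differently.
    """
    length = gene_end - gene_start
    if length <= 0:
        return 0.0
    # Deduplicate positions, then count those that fall inside the gene.
    inside = {pos for pos in covered_positions if gene_start <= pos < gene_end}
    return len(inside) / length


def rank_runs(runs: Iterable[AlignerRun]) -> List[AlignerRun]:
    """Sort by alignment rate (highest first); runtime only breaks ties."""
    return sorted(runs, key=lambda r: (-alignment_rate(r), r.runtime_seconds))


if __name__ == "__main__":
    # Placeholder values, not results from the paper.
    runs = [
        AlignerRun("aligner_a", total_reads=1000, aligned_reads=900, runtime_seconds=120.0),
        AlignerRun("aligner_b", total_reads=1000, aligned_reads=950, runtime_seconds=300.0),
    ]
    for run in rank_runs(runs):
        print(f"{run.name}: rate={alignment_rate(run):.2f}, runtime={run.runtime_seconds:.0f}s")
```

Treating runtime purely as a tie-breaker mirrors the abstract's view that speed is a secondary factor. A user with different priorities could weight it more heavily.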