Location: Grape Genetics Research Unit (GGRU)
Title: Comparison of short-read sequence aligners indicates strengths and weaknesses for biologists to consider
Author
MUSICH, RYAN - Rochester Institute Of Technology
Cadle-Davidson, Lance
OSIER, MICHAEL - Rochester Institute Of Technology
Submitted to: Frontiers in Plant Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/29/2021
Publication Date: 4/16/2021
Citation: Musich, R., Cadle-Davidson, L.E., Osier, M.V. 2021. Comparison of short-read sequence aligners indicates strengths and weaknesses for biologists to consider. Frontiers in Plant Science. 12:657240.
DOI: https://doi.org/10.3389/fpls.2021.657240
Interpretive Summary: Aligning short DNA sequences to a long, reference DNA sequence is the first step to most genomic analyses, but not all software programs perform equally. Choosing among the growing body of available alignment programs can be challenging. Here, we discuss the merits of common alignment programs in a way that should be approachable to biologists with limited experience in bioinformatics. To compare alignment programs (Bowtie2, BWA, HISAT2, MUMmer4, STAR, and TopHat2), an RNA-seq dataset was used containing data from 48 geographically distinct samples of the grapevine powdery mildew fungus. All aligners performed well with the exception of TopHat2. BWA perhaps had the best performance, except for genes longer than 500 base pairs, for which HISAT2 and STAR performed well. HISAT2 was ~3-fold faster than the next fastest aligner in runtime. This direct comparison of commonly used aligners can help biologists decide which tool to use for the specific sequencing data and objectives. No single tool meets all needs for every user, and there are many good aligners available.
Technical Abstract: Aligning short-read sequences is the foundational step to most genomic and transcriptomic analyses, but not all tools perform equally, and choosing among the growing body of available tools can be daunting. Here, we discuss the merits of common algorithms and programs in a way that should be approachable to biologists with limited experience in bioinformatics.
We consider only in passing the effects of data cleanup, a precursor analysis to most alignment tools, and give no consideration to downstream processing of the aligned fragments. To compare aligners (Bowtie2, BWA, HISAT2, MUMmer4, STAR, and TopHat2), an RNA-seq dataset was used containing data from 48 geographically distinct samples of the grapevine powdery mildew fungus Erysiphe necator. Based on alignment rate and gene coverage, all aligners performed well with the exception of TopHat2, which has been superseded by HISAT2. BWA perhaps had the best performance in these metrics, except for longer transcripts (>500 bp), for which HISAT2 and STAR performed well. HISAT2 ran ~3-fold faster than the next fastest aligner, although we consider runtime a secondary factor in most alignments. In the end, this direct comparison of commonly used aligners illustrates key considerations when choosing which tool to use for the specific sequencing data and objectives. No single tool meets all needs for every user, and there are many quality aligners available.