Publication #386566

Research Project: Improving Crop Efficiency Using Genomic Diversity and Computational Modeling

Location: Plant, Soil and Nutrition Research

Title: A multiple genome alignment workflow shows the impact of repeat masking and parameter tuning on alignment of functional regions in plants

Author
WU, YAOYAO - Cornell University
JOHNSON, LYNN - Cornell University
SONG, BAOXING - Cornell University
ROMAY, MARIA CINTA - Cornell University
STITZER, MICHELLE - Cornell University
SIEPEL, ADAM - Cold Spring Harbor Laboratory
Buckler, Edward - Ed
SCHEBEN, ARMIN - Cold Spring Harbor Laboratory

Submitted to: bioRxiv
Publication Type: Other
Publication Acceptance Date: 6/2/2021
Publication Date: 6/21/2021
Citation: Wu, Y., Johnson, L., Song, B., Romay, M., Stitzer, M., Siepel, A., Buckler IV, E.S., Scheben, A. 2021. A multiple genome alignment workflow shows the impact of repeat masking and parameter tuning on alignment of functional regions in plants. bioRxiv. https://doi.org/10.1101/2021.06.01.446647.
DOI: https://doi.org/10.1101/2021.06.01.446647

Interpretive Summary: To identify the underlying genetic causes of differences between species, one must first compare their genome sequences. Existing tools can accomplish this but require specialist knowledge to implement. The many requirements and types of software involved can make the seemingly straightforward task of multiple sequence comparison technically challenging for individual researchers. We developed the msa_pipeline workflow (https://bitbucket.org/bucklerlab/msa_pipeline) to allow comparison of diverged plant genomes with minimal user input. The msa_pipeline leverages existing tools to provide a practical solution for rapid multiple alignment of genomes. As the pace of genome sequencing and assembly accelerates, comparison of the genomes of tens to hundreds of species will drive biological discovery in plants. The workflow presented here provides a practical first step toward performing these comparisons.

Technical Abstract: To identify the underlying genetic causes of differences between species, one must first compare their genome sequences. Existing tools can accomplish this but require specialist knowledge to implement. The many requirements and types of software involved can make the seemingly straightforward task of multiple sequence comparison technically challenging for individual researchers. We developed the msa_pipeline workflow (https://bitbucket.org/bucklerlab/msa_pipeline) to allow comparison of diverged plant genomes with minimal user input. The msa_pipeline leverages existing tools to provide a practical solution for rapid multiple alignment of genomes. As the pace of genome sequencing and assembly accelerates, comparison of the genomes of tens to hundreds of species will drive biological discovery in plants. The workflow presented here provides a practical first step toward performing these comparisons.