|WANG, LIYA - Cold Spring Harbor Laboratory|
|LU, ZHENGYUAN - Cold Spring Harbor Laboratory|
|REGULSKI, MICHAEL - Cold Spring Harbor Laboratory|
|JIAO, YINPING - Texas Tech University|
Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 8/4/2020
Publication Date: 2/1/2021
Citation: Wang, L., Lu, Z., Regulski, M., Jiao, Y., Chen, J., Ware, D., Xin, Z. 2021. BSAseq: An interactive and integrated web-based workflow for identification of causal mutations in bulked F2 populations. Bioinformatics. 37(3):382-387. https://doi.org/10.1093/bioinformatics/btaa709.
Interpretive Summary: Genome editing promises to revolutionize plant breeding. However, applying this technology effectively requires knowing which genes to edit. In collaboration with scientists from the Cold Spring Harbor Laboratory, ARS scientists from Lubbock, Texas constructed a free online workflow that can rapidly and affordably discover genes controlling many agricultural traits in sorghum. The workflow has been tested with 11 published data sets. In each case, the workflow detected the gene correctly. Finding a gene costs less than $200. This workflow can be used in other organisms for rapid gene discovery and for finding targets for genome editing.
Technical Abstract: With advances in Next-Generation Sequencing (NGS) technologies and their falling cost, Bulked Segregant Analysis (BSA) has become not only a powerful tool for mapping quantitative trait loci (QTL) but also a useful way to identify causal gene mutations underlying phenotypes of interest. However, due to the presence of background mutations and errors in sequencing, genotyping, and reference assembly, it is often difficult to distinguish the true causal mutation from background mutations. In this work, we have developed a BSAseq workflow, which includes an automated bioinformatics analysis pipeline with a probabilistic model for estimating the segregation region and an interactive Shiny web app for visualizing the results. Furthermore, we deeply sequenced a male sterile parental line (ms8) to capture the majority of the background mutations. We applied the workflow to 11 bulked F2 populations and identified the true causal mutation in each population. The workflow is intuitive and easy to use for users with or without bioinformatics analysis skills. We expect the workflow to find wide application in the identification of causal mutations for many phenotypes of interest.
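The general idea behind bulked segregant analysis can be illustrated with a minimal sketch. It is not the BSAseq pipeline or its probabilistic model, whose details are not given in this record. It assumes a simple heuristic instead: discard variants already seen in the sequenced parental line, compute the alternate-allele frequency of each remaining variant in the bulk, and report the fixed-width window with the highest mean frequency. The names `Variant`, `alt_frequency`, `remove_background`, and `best_window`, along with the toy read counts, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One variant call in the bulked pool (hypothetical record layout)."""
    chrom: str
    pos: int
    ref_reads: int
    alt_reads: int


def alt_frequency(v):
    """Fraction of reads carrying the alternate (mutant) allele."""
    depth = v.ref_reads + v.alt_reads
    return v.alt_reads / depth if depth else 0.0


def remove_background(variants, parent_sites):
    """Drop variants whose (chrom, pos) was already observed in the parent line."""
    return [v for v in variants if (v.chrom, v.pos) not in parent_sites]


def best_window(variants, window_bp):
    """Return (chrom, start, end, mean_freq) for the window with the highest
    mean alternate-allele frequency. Windows start at each variant position.
    This is a simple heuristic stand-in, not a probabilistic model."""
    by_chrom = {}
    for v in variants:
        by_chrom.setdefault(v.chrom, []).append(v)

    best = None
    for chrom, vs in by_chrom.items():
        vs.sort(key=lambda v: v.pos)
        for i, first in enumerate(vs):
            end = first.pos + window_bp
            in_window = [v for v in vs[i:] if v.pos < end]
            mean = sum(alt_frequency(v) for v in in_window) / len(in_window)
            if best is None or mean > best[3]:
                best = (chrom, first.pos, end, mean)
    return best


# Toy data (made up for illustration only).
bulk = [
    Variant("chr1", 100, 2, 8),
    Variant("chr1", 200, 0, 10),
    Variant("chr1", 5000, 5, 5),
    Variant("chr2", 100, 0, 10),  # also present in the parent: background
]
parent_sites = {("chr2", 100)}

candidates = remove_background(bulk, parent_sites)
region = best_window(candidates, window_bp=500)
print(region)
```

In a real analysis the window would also need a minimum number of variants and a minimum read depth, since a single high-frequency site can otherwise dominate, as it does in this toy example.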