Location: Hard Winter Wheat Genetics Research
Title: Haplocatcher: A package for prediction of haplotypes
|WINN, ZACHARY - Colorado State University|
|HUDSON-ARNS, EMILY - Colorado State University|
|HAMMERS, MIKAYLA - Colorado State University|
|DEWITT, NOAH - Louisiana State University|
|LYERLY, JEANETTE - North Carolina State University|
|St Amand, Paul|
|HALEY, SCOTT - Colorado State University|
|MASON, RICHARD - Colorado State University|
Submitted to: The Plant Genome
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/15/2023
Publication Date: 7/24/2023
Citation: Winn, Z.J., Hudson-Arns, E., Hammers, M., Dewitt, N., Lyerly, J., Bai, G., St Amand, P.C., Haley, S., Mason, R.E. 2023. Haplocatcher: A package for prediction of haplotypes. The Plant Genome. https://doi.org/10.1101/2023.07.20.549744.
Interpretive Summary: Breeders use molecular markers to identify lines possessing beneficial haplotypes. Breeding programs may leverage genome-wide PCR-based marker information to make inferences about target haplotypes in newly sequenced lines. In this study, we developed "HaploCatcher", an R package, to predict haplotypes of interest in lines genotyped with genome-wide markers. Using data from 1,056 wheat breeding lines that had both genome-wide markers and the Sst1 marker, the package predicted the Sst1 haplotypes of 292 new breeding lines with high accuracy. The package is freely available and can be used to predict haplotypes in whole-genome sequenced, early-generation materials.
Technical Abstract: Wheat (Triticum aestivum L.) is crucial to global food security but is often threatened by diseases, pests, and environmental stresses. Marker-assisted selection uses molecular markers to identify lines possessing beneficial haplotypes. Breeding programs have invested heavily in genome-wide genotyping platforms that produce high-volume, non-targeted molecular information. Early-stage lines for which non-targeted genotypes are available are not characterized for beneficial haplotypes. This implies that breeding programs may leverage genome-wide polymerase chain reaction (PCR)-based marker information to make inferences about haplotypes in newly sequenced lines. In this study, an R package titled "HaploCatcher" was developed to predict specific haplotypes of interest in lines genotyped with genome-wide markers. A training population of 1,056 lines genotyped for the Sst1 locus and genome-wide markers was curated to predict the Sst1 haplotypes of 292 lines from the Colorado State University wheat breeding program. Sst1 haplotypes predicted from the training population were compared with marker-derived haplotypes. Our results indicated that the training set was substantially predictive, with kappa scores of 0.83 for k-nearest neighbors and 0.88 for random forest models. Forward validation on newly developed Colorado State lines demonstrated that a random forest model, trained on all available training data, had accuracy comparable to that estimated in cross-validation. Estimated group means of lines classified by haplotypes from PCR-derived markers and from predictive modeling were not significantly different. The HaploCatcher package is freely available and may be used by breeding programs to predict haplotypes in whole-genome sequenced, early-generation materials.
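The kappa scores above measure agreement between predicted and marker-derived haplotype calls after correcting for agreement expected by chance. As a rough illustration only, the following Python sketch computes such a score, assuming the statistic reported is Cohen's kappa. It is not part of HaploCatcher (which is an R package); the function name `cohens_kappa` and the haplotype labels are hypothetical and the calls are invented for demonstration.

```python
from collections import Counter


def cohens_kappa(observed, predicted):
    """Chance-corrected agreement between two lists of class labels.

    Hypothetical helper for illustration; not HaploCatcher's API.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(observed)

    # Observed agreement: fraction of lines where both calls match.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n

    # Expected agreement under independence, from the marginal class counts.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)

    if p_e == 1.0:
        # Both lists contain a single identical class; kappa is undefined.
        return float("nan")

    return (p_o - p_e) / (1.0 - p_e)


# Invented example: marker-derived calls versus model predictions for 10 lines.
observed = ["present"] * 4 + ["absent"] * 6
predicted = ["present"] * 3 + ["absent"] * 7  # one "present" line is missed

kappa = cohens_kappa(observed, predicted)
print(f"kappa = {kappa:.2f}")  # about 0.78
```

In this toy case, 9 of 10 calls agree (accuracy 0.90), yet kappa is lower because a share of that agreement would be expected by chance given the class frequencies. This is why kappa is often reported alongside accuracy when haplotype classes are unbalanced.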