
Title: Impact of marker ascertainment bias on genomic selection accuracy and estimates of genetic diversity

HESLOT, NICOLAS - Cornell University
RUTKOSKI, JESSICA - Cornell University
Poland, Jesse
Jannink, Jean-Luc
SORRELLS, MARK - Cornell University

Submitted to: PLOS ONE
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 8/9/2013
Publication Date: 9/5/2013
Publication URL:
Citation: Heslot, N., Rutkoski, J., Poland, J.A., Jannink, J., Sorrells, M.E. 2013. Impact of marker ascertainment bias on genomic selection accuracy and estimates of genetic diversity. PLoS One. 8(9): e74612.

Interpretive Summary: To use molecular DNA markers in a breeding program for predicting traits and selecting superior lines, or to understand the diversity in germplasm collections, it is important that the molecular markers accurately represent the overall population under study. Marker bias can arise when molecular markers are discovered in one set of material and then applied to a different set of lines. Newly developed genotyping methods that rely on DNA sequencing have the advantage of discovering polymorphisms at the same time they are assayed in the population. Using this ‘genotyping-by-sequencing’ (GBS) approach, a set of 365 winter wheat breeding lines was evaluated. These lines were also genotyped using DArT (Diversity Array Technology) markers, a fixed-array platform that has formed the basis of most of our knowledge about cereal genetic diversity and is used for genomic selection. It was found that the GBS markers gave higher prediction accuracy for genomic selection and that, relative to DArT markers, the GBS markers captured more of the genetic diversity in the population. Because there are many more GBS markers than DArT markers, an equal number of markers from each set was also compared. When using an equal number of markers, there was no difference in prediction accuracy between GBS and DArT, suggesting that the increased accuracy is largely due to having more markers in the GBS dataset. We conclude that GBS markers are a usable platform for genomic selection and a preferable platform for assessing genetic diversity due to the simultaneous discovery and typing of DNA polymorphisms.

Technical Abstract: Genome-wide molecular markers are readily being applied to evaluate genetic diversity in germplasm collections and for making genomic selections in breeding programs. To accurately predict phenotypes and assay genetic diversity, molecular markers should assay a representative sample of the polymorphisms in the population under study. Ascertainment bias arises when marker data is not obtained from a random sample of the polymorphisms in the population of interest. Genotyping-by-sequencing (GBS) is rapidly emerging as a low-cost genotyping platform, even for the large, complex, and polyploid wheat (Triticum aestivum L.) genome. With GBS, marker discovery is simultaneous with genotyping of large populations, resulting in minimal ascertainment bias. The previous platform of choice for whole-genome genotyping in many species such as wheat was DArT (Diversity Array Technology), which has formed the basis of most of our knowledge about cereal genetic diversity. This study compared GBS and DArT marker platforms for measuring genetic diversity and genomic selection (GS) accuracy in elite U.S. winter wheat. From a set of 365 breeding lines, 38,412 single nucleotide polymorphism (SNP) GBS markers were discovered and genotyped. The GBS SNPs gave a higher GS accuracy than 1,544 DArT markers on the same lines, despite 43.9% missing data for the GBS markers. Using a bootstrap approach, we observed significantly more clustering of markers and ascertainment bias with DArT markers relative to GBS SNPs. The minor allele frequency (MAF) distribution of DArT markers was significantly skewed, with an excess of medium-frequency variants and a deficit of rare variants compared to GBS. Despite the ascertainment bias of the DArT markers, GS accuracy for three traits out of four was not significantly different when an equal number of markers was used for each platform. This suggests that the gain in accuracy observed using GBS compared to DArT markers is mainly due to a large increase in the number of markers available for the analysis.
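
The two quantities the abstract relies on, per-marker minor allele frequency (MAF) in the presence of missing calls and a comparison using an equal number of markers from each platform, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the 0/1/2 genotype coding (copies of the alternate allele, with `None` for a missing call), and the toy data are all assumptions made for illustration.

```python
import random


def minor_allele_frequency(calls):
    """MAF for one biallelic marker.

    `calls` holds 0, 1, or 2 copies of the alternate allele per line,
    or None for a missing call (hypothetical coding, not from the paper).
    """
    observed = [c for c in calls if c is not None]  # skip missing calls
    if not observed:
        return None  # marker has no usable data
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1 - alt_freq)


def subsample_markers(marker_ids, n, seed=0):
    """Draw n markers at random, without replacement, so that two
    platforms can be compared with the same number of markers."""
    rng = random.Random(seed)
    return rng.sample(list(marker_ids), n)


# Toy genotype table: marker name -> calls across six lines (made-up data).
genotypes = {
    "snp_a": [0, 0, 2, None, 0, 0],
    "snp_b": [0, 2, 2, 0, None, None],
    "snp_c": [2, 2, 2, 2, 0, 2],
}

mafs = {name: minor_allele_frequency(calls) for name, calls in genotypes.items()}
equal_sized_subset = subsample_markers(genotypes, 2)
```

Repeating the subsampling with different seeds and comparing the resulting MAF histograms or prediction accuracies is one way to separate the effect of marker number from the effect of ascertainment bias; the bootstrap procedure actually used in the study is not described in this record.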