Title: Optimal design of low-density SNP arrays for genomic prediction: algorithm and applications

Author
WU, XIAO-LIN - Geneseek Inc, A Neogen Company
XU, JIAQI - Geneseek Inc, A Neogen Company
FENG, GUOFEI - Geneseek Inc, A Neogen Company
Wiggans, George
TAYLOR, JEREMY - University Of Missouri
HE, JUN - Hunan Agricultural University
QIAN, CHANGSONG - Collaborator
QIU, JIANSHENG - Geneseek Inc, A Neogen Company
SIMPSON, BARRY - Geneseek Inc, A Neogen Company
WALKER, JEREMY - Geneseek Inc, A Neogen Company
BAUCK, STEWART - Geneseek Inc, A Neogen Company

Submitted to: PLOS ONE
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 8/10/2016
Publication Date: 9/1/2016
Citation: Wu, X., Xu, J., Feng, G., Wiggans, G.R., Taylor, J.F., He, J., Qian, C., Qiu, J., Simpson, B., Walker, J., Bauck, S. 2016. Optimal design of low-density SNP arrays for genomic prediction: Algorithm and applications. PLoS One. 11(9):e0161719.

Interpretive Summary: A multiple-objective, local optimization algorithm was developed to select single nucleotide polymorphisms (SNPs) for low-density chips. The objective was to facilitate accurate imputation to medium-density or high-density SNP genotypes for genomic prediction. The optimization balances the competing goals of high minor allele frequency and even spacing. The method allows mandatory SNPs to be included, and it optionally places SNPs more densely at the ends of the chromosomes to improve imputation accuracy. It gives measurably higher imputation accuracy, which translates into more accurate genomic evaluation, than an equal number of SNPs selected on minor allele frequency or with uniform spacing.
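
The preference for high minor allele frequency can be read in information terms: the Technical Abstract measures the information content of a chip with Shannon entropy. The sketch below is only an illustration of that idea, not the published implementation. It assumes entropy is computed from the two allele frequencies of a biallelic SNP using a base-2 logarithm, and the function names `allele_entropy` and `locus_averaged_entropy` are hypothetical.

```python
import math


def allele_entropy(maf: float) -> float:
    """Shannon entropy (in bits) of a biallelic locus.

    Assumption: entropy is taken over the two allele frequencies,
    maf and 1 - maf. A monomorphic locus carries no information.
    """
    if maf <= 0.0 or maf >= 1.0:
        return 0.0
    return -(maf * math.log2(maf) + (1.0 - maf) * math.log2(1.0 - maf))


def locus_averaged_entropy(mafs) -> float:
    """Mean per-locus entropy over the SNPs selected for a chip."""
    mafs = list(mafs)
    if not mafs:
        raise ValueError("at least one SNP is required")
    return sum(allele_entropy(m) for m in mafs) / len(mafs)
```

Under these assumptions the entropy peaks at 1 bit when the minor allele frequency is 0.5 and falls to 0 as the locus approaches fixation, which is why a selection driven only by informativeness tends to cluster on high-frequency SNPs and has to be balanced against even spacing.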

Technical Abstract: Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for their optimal design. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip. System information is computed either as locus-averaged Shannon entropy (LASE) or haplotype-averaged Shannon entropy (HASE) and is adjusted for uniformity of the SNP distribution. Use of HASE increased system information over LASE when <=1,000 SNPs were selected, but the difference diminished when >5,000 SNPs were selected. Optimization was conditional on the presence of obligatory SNPs on each chromosome. The location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the non-uniform design, a tunable empirical Beta distribution was used to guide the location distribution of framework SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized as the objective function was locally and empirically maximized. The MOLO algorithm was capable of selecting a set of approximately evenly spaced and highly informative SNPs that led to increased imputation accuracy compared with selection solely of evenly spaced SNPs or a design that optimized SNP minor allele frequency (MAF). Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with >=3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error.
Our results show that about 25% of the imputation error rate was propagated to genomic prediction in an Angus population. With this algorithm, loss of prediction accuracy was minimal because imputation accuracy was quite high.
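
The non-uniform design described above, in which a Beta distribution guides framework SNP locations so that chromosome ends are enriched, can be sketched as follows. This is a hedged illustration rather than the MOLO implementation: the abstract does not state the Beta parameters, so the sketch fixes them at Beta(0.5, 0.5) purely because that U-shaped case has a closed-form quantile function, and the function name `framework_positions` and its arguments are hypothetical.

```python
import math


def framework_positions(n_snps: int, chrom_length_bp: int,
                        enrich_ends: bool = True) -> list:
    """Target base-pair positions for n_snps framework SNPs on one chromosome.

    Assumption: end enrichment uses Beta(0.5, 0.5), whose quantile function
    is sin^2(pi * p / 2). The published algorithm uses a tunable Beta
    distribution; these parameters are chosen only for illustration.
    """
    if n_snps < 1:
        raise ValueError("n_snps must be at least 1")
    positions = []
    for i in range(n_snps):
        # Midpoint of the i-th of n equal-probability bins.
        p = (i + 0.5) / n_snps
        if enrich_ends:
            # Beta(0.5, 0.5) quantile: dense near 0 and 1, sparse in the middle.
            x = math.sin(math.pi * p / 2.0) ** 2
        else:
            # Uniform design: evenly spaced targets.
            x = p
        positions.append(round(x * chrom_length_bp))
    return positions
```

With `enrich_ends=True` the gaps between neighboring targets are narrowest near both chromosome ends and widest in the middle; with `enrich_ends=False` the targets are evenly spaced. In a full design these targets would only seed the search, since the abstract states that the final SNP distribution is set by locally maximizing the objective function.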