Title: SSGP: SNP-set based genomic prediction to incorporate biological information

Authors:
Jiang, Jicai - University of Maryland
O'Connell, Jeffrey - University of Maryland
Van Raden, Paul
Ma, Li - University of Maryland

Submitted to: Journal of Dairy Science
Publication Type: Abstract Only
Publication Acceptance Date: 3/15/2017
Publication Date: 6/24/2017
Citation: Jiang, J., O'Connell, J.R., Van Raden, P.M., Ma, L. 2017. SSGP: SNP-set based genomic prediction to incorporate biological information. Journal of Dairy Science. 100(Suppl. 2):412–413 (abstr. 470).

Technical Abstract: Genomic prediction has emerged as an effective approach in plant and animal breeding and in precision medicine. Much research has been devoted to improving the accuracy of genomic prediction, and one promising route is to incorporate biological information. Because of the statistical and computational challenges of large genomics studies, however, a fast and flexible method for incorporating such external information is still lacking. Here, we propose a linear mixed model that incorporates biological information in a flexible way, and we developed a fast variational Bayes-based software package named SSGP. In our model, whole-genome markers can be split into groups in a user-defined manner, and each group of markers is given a common effect variance. Because previous functional genomics studies have accumulated much evidence on which genes, genomic regions, or pathways are more or less important for a trait of interest, genome-wide SNPs can be divided into groups according to their level of importance, and these predefined SNP sets can then be used in SSGP. Additionally, each marker has a pre-specified weight, and the rule for assigning weights is flexible, e.g., based on minor allele frequency or linkage disequilibrium pattern. The proposed model was implemented with the parameter-expanded variational Bayesian method, which makes it fast enough to analyze very large datasets. SSGP was written in C++ with the Intel MKL library. For testing purposes, we analyzed a large cattle dataset consisting of ~20k bulls and ~760k whole-genome SNP markers. Even when markers were grouped simply by proximity, SSGP performed better than Bayes A for all five milk traits analyzed, with an increase of up to 10% in prediction accuracy. Meanwhile, each trait took only ~5 h to analyze with 20 threads. We also analyzed many simulated datasets and the WTCCC heterogeneous stock mice dataset, for which the results of many existing methods have been reported. In general, SSGP achieved prediction performance similar to that of the best approaches reported, even though only proximity was used to group SNPs. Collectively, the method and software show great potential to increase the accuracy of genomic prediction, particularly as more useful biological information becomes available.
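
The grouping and weighting ideas described in the abstract can be illustrated with a small sketch. This is not SSGP's code or interface. The function names, the fixed set size, the MAF-based weight rule, and the choice to let the weight scale a marker's prior variance are all assumptions made for illustration. The abstract states only that each group shares a common effect variance and that each marker carries a pre-specified weight. The variational Bayes fitting step is not shown.

```python
import numpy as np


def group_by_proximity(n_snps, set_size):
    """Assign consecutive SNPs (assumed sorted by position) to sets.

    Hypothetical rule: every `set_size` neighbouring markers form one set.
    """
    return np.arange(n_snps) // set_size


def maf_weights(genotypes):
    """Hypothetical weight rule based on minor allele frequency (MAF).

    `genotypes` is an (individuals x SNPs) array coded 0/1/2.
    Returns 1 / (2 * maf * (1 - maf)) per SNP, clamped to avoid division by zero.
    """
    p = genotypes.mean(axis=0) / 2.0   # frequency of the counted allele
    maf = np.minimum(p, 1.0 - p)       # minor allele frequency
    return 1.0 / np.maximum(2.0 * maf * (1.0 - maf), 1e-8)


def marker_prior_variances(groups, group_variances, weights):
    """Combine group-level variances and marker weights.

    Assumption: the prior variance of marker j is w_j * sigma2[group(j)].
    """
    return weights * group_variances[groups]


# Toy data: 50 individuals, 10 SNPs, sets of 4 neighbouring SNPs.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(50, 10))

groups = group_by_proximity(n_snps=10, set_size=4)
weights = maf_weights(genotypes)
group_variances = np.array([0.5, 0.1, 0.02])  # one made-up variance per set
prior_var = marker_prior_variances(groups, group_variances, weights)
```

In a real analysis the group variances would be estimated from the data rather than fixed, and the sets could come from gene, region, or pathway annotation instead of position alone.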